The Great Escape > Reverse Engineering

by David Thomas on

Table of Contents

  1. Rippity Doo-Dah: Starting to Pull the Game Apart
  2. Commence Disassemblifications
  3. Discovering SkoolKit
  4. Reversing
  5. Now Rewrite it All…

Rippity Doo-Dah: Starting to Pull the Game Apart

Ideally we want to transform a dump of the original game’s memory into a fully-commented assembly language listing that we can use to rebuild the game. First, is that even possible? Fortunately The Great Escape is self-contained: the game loads everything it needs then runs. If it was a multi-load game, where individual levels need to load from tape, or a 128K-only game, which uses bank switching to get around the Z80’s 64K address space limit, then things would be considerably more difficult.

How do we get started?

Locating Graphics

For The Great Escape, I located the graphics in the game binary by using the Windows version of SGE (Spectrum Graphics Editor) by Richard Jordan. (Richard created the Skool Daze tribute/remake called Klass of ‘99 and is now working on an isometric Atic Atac remake called Melkhior’s Mansion.)

WinSGE analysing Chase HQ, Where Time Stood Still and Batman

Finding graphics in game binaries is a process of scrolling up and down in an ripping tool like SGE while looking for the “stripes” which give away the presence of graphics data. Since we don’t know the width of the graphics we can then try various common widths like 16 or 24 pixels wide. Once the correct width is found the image will snap into focus, albeit possibly offset. None of this is a hard and fast rule however - game authors are free to encode their graphics in whatever form they choose.

Once located, each graphic’s offset and dimensions will need to be recorded. Sometimes it’s a little difficult to decide exactly where the image starts and ends if there is an amount of unused space at the top or bottom of an image, which could equally be space allocated to an adjacent graphic.

Other Clues

Aside from the graphics, what other clues can we find?

Strings of Text

We can hunt through the binary for text we recognise. For The Great Escape I knew that it must contain the strings for the input devices (KEYBOARD, KEMPSTON, SINCLAIR, …) we see on the main menu. Also the news bulletins that appear at the bottom left of the screen when events happen in the game. However upon inspection I could find no ASCII strings in the binary…

When I located the font data the reason for this became apparent: to save space the font omits the glyph for letter “O” and re-uses the glyph for zero. This saves 8 bytes, but means that all text has to be encoded to match the font data. So “KEYBOARD” is $14,$0E,$21,$0B,$00,$0A,$1A,$0D (where $ denotes hex). Once decoded to ASCII it looks like KEYB0ARD.

Other Patterns

Sometimes when scrolling through the graphics we’ll see regular patterns of data which we can’t resolve into any recognisable graphic. For instance, commonly needed is a table of 256 bit-reversed bytes, which are used for flipping characters left-right. This can be spotted when dumping graphics. Anything seen in the binary like this could be data, like maps, masks or lookup tables. Anything which looks like random noise is probably going to be instructions. I’ll write below about a better way of establishing which locations contain real instructions.

Commence Disassemblifications

With the binary roughly mapped out we can start disassembling the game’s instructions.

For my initial disassembly I used IDA Pro. IDA Pro is an interactive disassembler - an eye wateringly expensive one. Although more commonly used these days for reversing malware it can handle many different CPUs including the Spectrum’s Z80. Fortunately my (then) employer had a licensed copy that I could use.

IDA Pro disassembling The Great Escape

IDA can be nimble up to a point, but I quickly got annoyed with it repeatedly crashing and not having undo. I wrote an IDAPython script to automate the marking up of the known parts of the executable so I could start from scratch each time it assploded.

Identifying Instructions

We’ve now got a semi-opaque binary. We know what some locations contain, but even with IDA Pro’s code analysis it’s still unclear: which of the remaining locations contains genuine instructions, and which are data? I used the Free Unix Spectrum Emulator (FUSE) with profiling output enabled to see which locations really are instructions. To have a representative run of the game I ran an RZX recording of the game from http://www.rzxarchive.co.uk/ in the emulator to capture the profile. The RZX recording ran the game all the way to completion. This is important for coverage: we want to make sure that every instruction in the game gets executed.

I captured this profile and pulled it into IDA Pro as comments (look at the “Hits: …” comments in the screenshot). This took me down routes like:

This is typical for a space-constrained game. Once the graphics are on-screen they’re never modified again and don’t ever have to be re-plotted, so we can reclaim that valuable space in RAM for buffers. Later on we see that the same is true of anything on the main menu - the title music plays once and is never heard again, for example, so its space becomes one of the various rendering buffers. Likewise the input handling routines can only be chosen once so are copied into place once chosen, then gets reused as a buffer.

I/O

Some of the earliest functions I spotted were those which talk to hardware. These are easy to spot due to the Z80’s IN and OUT mnemonics making it obvious. This includes:

Discovering SkoolKit

By this point IDA Pro was really starting to grate. I started looking for an alternative Z80 diassembler and discovered that the guy who reverse engineered the Spectrum classics Skool Daze, Back to Skool and Contact Sam Cruise (and rewrote them in Python) had built a toolkit called SkoolKit to assist in pulling apart and documenting these games.

SkoolKit is a meta-assembler of sorts. You build your disassembly in a Z80 assembly listing-like file called a .skool file. This can then be transformed into either a listing for assembly into a working game, or into a cross-referenced HTML format disassembly.

We can also supplement the disassembly with Python scripts to decode data structures from a snapshot of the game’s RAM. We can use this to generate images of game maps, sprites, rooms, and decode other data like the routes the characters take.

I immediately switched allegiance.

The Great Escape & Skool Daze

There is much in common between Skool Daze and The Great Escape. Both feature an open world with free roaming, automatic characters with “AI” and take a sequence of steps to complete the game.

SkoolKit’s .ctls, .skools & .sfts

SkoolKit offers three formats (though since version 7.2 one format is now deprecated).

The .ctl is a control file which is a way to annotate addresses, but not include any of the original game data (important if you want to avoid redistributing copyrighted data…)

The .skool file is the full-fat version which includes everything necessary to rebuild asm files and HTML disassemblies.

The third option is a .sft file which is a Skool File Template. These are deprecated now that .ctls can do just about everything .sfts can do - which wasn’t always the case.

The SkoolKit workflow is to start with a .ctl file, and start annotating addresses as each memory location’s purpose is decoded. Past some threshold the idea is that the ctl becomes awkward and you move onto using a skool file full-time. I struggled with that and tried to stick with .ctl files for ages.

Pseudocode

Knowing that I wanted to create a portable C language version of the game I decided to not write plain English comments on the assembly but to instead use a C-styled pseudocode.

This was valuable up to a point, but caused problems when SkoolKit’s use of braces conflicted with C’s. I ended up bending the notation and it became quite awkward. In the previous year I have rewritten the entire disassembly in plain English.

Reversing

With Skoolkit setup, from here on the job is mainly to slog away, pulling each function apart and decoding the game’s data formats.

When stuck on something we can open up FUSE and (attempt to) use its debugger, or sometimes just poke values into game snapshots to see what happens.

It’s a scatty approach: discoveries in one function can impact elsewhere in the code. That in turn can impact elsewhere. You can end up hopping around the disassembly making notes but risk ending up with a giant set of diffs with little cohesion - a hairball patch - then having to spend time splitting the patch up into meaningful commits.

Complications

Function boundaries aren’t always clear in this type of code. Some functions have multiple entry points. Some fall through into the following function. Some have a function buried in their middle. In one case a routine had its RET return instruction missing so (harmlessly) fell into another. Even late on in the process I was changing my mind about what constituted a routine.

The game uses some self-modifying code which is a pain in the bum, but often worse was trying to follow register use across code which switches between the Z80’s two register banks.

Now Rewrite it All…

By January 2016 the diassembly was sufficient to reassemble the original game. This was useful for making experimental tweaks to the original game, but the disassembly commentary really wasn’t up to scratch. My original choice to write it in a C-style pseudocode proved to be awkward. In 2018 I embarked on a rewrite of every single function in the disassembly. It took about a year to rewrite it all in plain English prose.