Note: This is an article from my old dev blog. External links have been updated, but the text is otherwise reposted verbatim.
So, that tweet went a little bit viral. It’s the classic Game Boy Advance boot-up screen, with the text changed to the oh-so-relatable “I’m Gay”. I could have created this as an animation, but instead I’d spent a couple of days poring over documentation and disassembly to actually modify the sprites in the system’s BIOS file. I thought it might be interesting to share the technical details about that.
For all of my testing I was using the VisualBoyAdvance emulator. It’s got some very nice debug views to visualise the state of the VRAM, a memory viewer, and very helpfully the disassembly of the active program code, along with the ability to step instructions one-by-one.
My initial assumption was that the graphics data would exist in an obvious format in the BIOS, and that I’d be able to spot it just by dumping out the BIOS as an image, mapping each byte to a pixel. I’ve used this technique on other reverse-engineering projects and it’s usually very helpful. In this case, however, I turned up nothing but entropy - no obvious patterned data at all.
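A minimal Python sketch of that byte-per-pixel dump, for illustration only (it is not the original tooling). The function name, the default width, and the choice of the PGM format are assumptions made here so the example needs no image library.

```python
def bytes_to_pgm(data: bytes, width: int = 256) -> bytes:
    """Render raw bytes as a binary PGM (P5) greyscale image, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = -(-len(data) // width)  # ceiling division
    # Pad the final row with black so the pixel count matches width * height.
    padded = data.ljust(width * height, b"\x00")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + padded

# Hypothetical usage (file names are placeholders):
#   with open("gba_bios.bin", "rb") as f:
#       image = bytes_to_pgm(f.read(), width=128)
#   with open("bios.pgm", "wb") as f:
#       f.write(image)
```

Trying a few different widths is usually worthwhile, since uncompressed tile or bitmap data only lines up visually when the row width matches its stride.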
I tried zeroing out various parts of the BIOS data, seeing if I could deduce the location of the sprite data. This didn’t work very well - I managed to break the audio chime and later managed to crash the BIOS entirely, so I scrapped that idea pretty quickly.
I reached the conclusion that the data must be compressed in some form, and started looking around for resources about GBA data compression techniques. I stumbled across a project called dsdecmp which contained code for compression and decompression with various algorithms used by the GBA and DS systems, and thought it might be useful.
I tried running dsdecmp’s LZ77 decompressor on the BIOS, starting at each point in the BIOS that could feasibly match the LZ77 data header, in the hopes that I could find the compressed sprite data by sheer brute force, but this also turned up a dead end.
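The brute-force search can be sketched roughly as below. This is not dsdecmp's code; it only finds offsets that look like a GBA LZ77 header (a type byte of $10 followed by a 24-bit little-endian decompressed size, as documented in GBATEK). The function name, the 4-byte alignment, and the `max_size` cut-off are assumptions chosen for illustration.

```python
from typing import Iterator, Tuple

def lz77_header_candidates(blob: bytes, max_size: int = 0x8000,
                           align: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield (offset, decompressed_size) for every plausible LZ77 header."""
    for off in range(0, len(blob) - 3, align):
        if blob[off] != 0x10:  # type nibble 1 = LZ77, low nibble reserved (0)
            continue
        size = int.from_bytes(blob[off + 1:off + 4], "little")
        if 0 < size <= max_size:  # discard absurd sizes
            yield off, size
```

Each candidate offset would then be handed to a real decompressor, and the output inspected by eye. A scan like this cannot find data whose LZ77 header is itself hidden inside another compression layer.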
Eventually I realised I was going to have to get my hands dirty, and by stepping through the BIOS code one instruction at a time using VBA’s disassembler, I was able to identify the following data flow:
- Copy $370 bytes from $0000332C to $03000564
- Decompress $370 bytes from $03000564 into $3C0 bytes at $03001564
- Decompress $3C0 bytes from $03001564 into $800 bytes at $03000564
- Expand $800 bytes of 2bit graphics data from $03000564 into $2000 bytes of 8bit graphics data at $06000040
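The final expansion step is easy to check by hand: each source byte holds four 2bit pixels, so $800 bytes become $800 × 4 = $2000 bytes at 8 bits per pixel. A minimal Python sketch of that expansion follows. The function name is made up here, and it assumes pixels are read from the least significant bits of each byte first and that the data offset is added to non-zero pixels only, which is how GBATEK describes BitUnPack when the zero-data flag is clear.

```python
def expand_to_8bit(src: bytes, src_width: int = 2, data_offset: int = 0) -> bytes:
    """Expand packed src_width-bit units into one byte per unit."""
    if src_width not in (1, 2, 4, 8):
        raise ValueError("unsupported source width")
    mask = (1 << src_width) - 1
    out = bytearray()
    for byte in src:
        # Assumption: units are taken from the low bits upwards.
        for shift in range(0, 8, src_width):
            unit = (byte >> shift) & mask
            # Zero units stay zero; the offset applies to non-zero units only.
            out.append((unit + data_offset) & 0xFF if unit else 0)
    return bytes(out)
```

With the sizes from the list above, `len(expand_to_8bit(bytes(0x800)))` comes out as `0x2000`.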
A quick note about the GBA memory layout. The BIOS is mapped at address range $00000000-$00003FFF, there’s some general-purpose RAM starting at $03000000, and VRAM starts at $06000000. There are various other parts of addressable memory but they’re not relevant here. (source: GBATEK)
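As a small illustration, the three regions can be written down as a lookup table. The start addresses come from the text; the end addresses for the on-chip work RAM (labelled IWRAM here, 32 KiB) and VRAM (96 KiB) are taken from GBATEK, and the function name is hypothetical.

```python
from typing import Optional

# (name, first address, last address), inclusive ranges
REGIONS = [
    ("BIOS", 0x00000000, 0x00003FFF),
    ("IWRAM", 0x03000000, 0x03007FFF),
    ("VRAM", 0x06000000, 0x06017FFF),
]

def region_of(addr: int) -> Optional[str]:
    """Return the name of the region containing addr, or None if unlisted."""
    for name, start, end in REGIONS:
        if start <= addr <= end:
            return name
    return None
```

Running the addresses from the data flow through it gives `BIOS` for $0000332C, `IWRAM` for both $03000564 and $03001564, and `VRAM` for $06000040.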
So it’s copying some compressed data from the BIOS into IWRAM, decompressing it twice in IWRAM, and then expanding it while copying into VRAM. After a little while reading the GBATEK documentation and comparing against the compressed data, I was able to determine from the header bytes that the first compression pass is Huffman and the second pass is LZ77. So I think the BIOS is actually performing the following steps using the BIOS decompression functions:
MemCopy($0000332C, $03000564, $370); // likely using CpuSet or CpuFastSet
HuffUnCompReadNormal($03000564, $03001564);
LZ77UnCompReadNormalWrite8bit($03001564, $03000564);
BitUnPack($03000564, $06000040, {
    sourceLength: $800,
    sourceWidth: 2,
    destWidth: 8,
    dataOffset: 0
});
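To make the header check and the LZ77 pass concrete, here is a rough Python sketch based on the format described in GBATEK: a 32-bit header whose type nibble is 1 for LZ77 or 2 for Huffman, followed by a 24-bit decompressed size. It is not the BIOS routine or the original C# code, the function names are invented for this example, and the Huffman decoder is left out to keep it short.

```python
from typing import Tuple

KINDS = {1: "LZ77", 2: "Huffman", 3: "RLE"}

def parse_header(data: bytes) -> Tuple[str, int]:
    """Return (compression kind, decompressed size) from a 4-byte header."""
    kind = KINDS.get(data[0] >> 4, "unknown")
    size = int.from_bytes(data[1:4], "little")
    return kind, size

def lz77_decompress(src: bytes) -> bytes:
    """Decompress a GBA LZ77 (type $10) stream."""
    kind, size = parse_header(src)
    if kind != "LZ77":
        raise ValueError("not an LZ77 stream")
    out = bytearray()
    pos = 4
    while len(out) < size:
        flags = src[pos]
        pos += 1
        # Each flag byte describes up to 8 blocks, most significant bit first.
        for bit in range(7, -1, -1):
            if len(out) >= size:
                break
            if flags & (1 << bit):
                # Back-reference: 4-bit (length - 3), 12-bit (distance - 1).
                b0, b1 = src[pos], src[pos + 1]
                pos += 2
                length = (b0 >> 4) + 3
                distance = (((b0 & 0x0F) << 8) | b1) + 1
                for _ in range(length):
                    out.append(out[-distance])
            else:
                # Literal byte, copied as-is.
                out.append(src[pos])
                pos += 1
    return bytes(out[:size])
```

For example, the eight-byte stream `10 08 00 00 40 41 40 00` holds one literal `A` followed by a back-reference that repeats it seven times.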
I was able to bodge together some C# code to extract the sprite data and dump it out to an image file. I then bodged together some more code to read the image file, cut it down to 2 bits per pixel, and compress the data in the manner the BIOS expects. I could then just modify the image file, run the code, and I’d get a modified BIOS file with the new sprites.
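The "cut it down to 2 bits per pixel" step might look something like the Python sketch below. It is an illustration rather than the original C# code. The function name is hypothetical, and it assumes the input is already one byte per pixel with values 0 to 3, packed with the first pixel in the lowest bits of each output byte.

```python
def pack_2bpp(pixels: bytes) -> bytes:
    """Pack one-byte-per-pixel data (values 0..3) into 2 bits per pixel."""
    if len(pixels) % 4:
        raise ValueError("pixel count must be a multiple of 4")
    if any(p > 3 for p in pixels):
        raise ValueError("pixel values must fit in 2 bits")
    out = bytearray()
    for i in range(0, len(pixels), 4):
        a, b, c, d = pixels[i:i + 4]
        # Assumption: first pixel goes in bits 0-1, last pixel in bits 6-7.
        out.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(out)
```

A full $2000-byte image packs down to $800 bytes this way, which would then still need the LZ77 and Huffman passes before it can go back into the BIOS.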
This doesn’t work all the time though. If the sprites have too much entropy, the compression won’t be able to keep the data under $370 bytes, and I think the halfway-stage compressed data has an upper size limit too. Thankfully I managed to get the data I wanted under the size limit, but I did have a couple of failed attempts while experimenting.
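A guard for that size limit could be sketched in Python as follows. The offset and the $370-byte budget are the figures quoted earlier in the post; the function name is hypothetical, and padding the unused tail with zero bytes is an assumption that has not been checked against real hardware or an emulator.

```python
SPRITE_OFFSET = 0x332C  # where the compressed sprite data sits in the BIOS
SPRITE_BUDGET = 0x370   # number of bytes the BIOS copies out

def patch_bios(bios: bytes, compressed: bytes) -> bytes:
    """Return a copy of bios with the compressed sprite block replaced."""
    if len(compressed) > SPRITE_BUDGET:
        raise ValueError(
            f"compressed data is {len(compressed):#x} bytes, "
            f"limit is {SPRITE_BUDGET:#x}"
        )
    # Assumption: trailing zero padding is ignored by the decompressor,
    # since the decompressed size is stored in the stream header.
    padded = compressed.ljust(SPRITE_BUDGET, b"\x00")
    return bios[:SPRITE_OFFSET] + padded + bios[SPRITE_OFFSET + SPRITE_BUDGET:]
```

Failing loudly when the data is too large is the safer choice, because writing past the budget would overwrite whatever follows the sprite block in the BIOS.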
While I’m sure plenty of you want my tooling for this, I won’t be releasing it. It’s a hacky and buggy mess I’m not particularly proud of, and I don’t really feel like tidying it up or fielding support requests. This should have given you enough detail to build a comparable tool yourself if you’re really determined though ;)
Oh, and there was a bonus GDPR joke tweet that blew up a bit too, made with the same techniques.