Sunday, January 8, 2012

Going Open Source

Actually, I already included the source to my Ruby ISO 9660 parser in the file I attached to my previous post. I just thought I'd make the project official by creating a Git repository for it. You can check it out at github.com/An0Hit0/Ruby-ISO-9660. I'm not going to be doing further updates to the library in the immediate future, but it already does several useful things, and it wouldn't be difficult to add more functionality if someone wanted to take on the project. Either way, I'm eager to see if anyone besides me gets any use out of it.

Tuesday, January 3, 2012

Oh look, I did something constructive.

It happens every once in a while, despite my best efforts. Over the past week I spent some time, on and off, trying to undub an old PSP game called Gurumin: A Monstrous Adventure. For those of you not familiar with undubs, an undub is a hack designed to take a game that has been mangled by horrible English voice acting and restore it to its former Japanese glory. Maybe people make undubs for games that originate in other languages too, but I've never heard of one. Gurumin may not be the best example of a game to undub, but hopefully by reading this you will learn a little about how the process works, and about hacking in general. I've tried my best to write this article so that someone without much hacking experience can still understand it.

Gurumin is sort of an interesting case as far as undubbing is concerned. For one thing, it already has a feature to enable the Japanese voice acting. You may ask yourself why I would try to undub a game that already has a Japanese voice feature. A couple of reasons, actually. First, you can't enable the feature without entering a special code, and the code can't be used unless you've already beaten the game. This is a somewhat nitpicky issue, I guess. You could just download a completed game save and copy it to your PSP; doing so is neither difficult nor complicated. Still, I find it distasteful that they made the Japanese voice acting such a hidden feature that only someone who had done some serious research could use it on their first playthrough. And besides, there's a much more significant, although still somewhat nitpicky, problem: even if you enable the Japanese voice acting, the voices that play while you are actually on the game field still come from the English voice set. This is just plain annoying. It's one thing to be forced to play the game in English only; it's entirely another to have the voice actors switching back and forth constantly. It doesn't make for a very coherent gaming experience. My only question was, could I fix it?

I will freely admit I am not the greatest hacker in the world. I have some experience, but there is plenty of stuff that is over my head. Undubbing a game that is already more than half undubbed, however, did not seem like one of those things. In many cases a successful undub can be accomplished just by moving some files from the Japanese version of the game into the English version and rebuilding the ISO image. In this case the files I needed were already on the disc, which meant I could make an undub patch without even violating any copyright laws. It seemed like a pretty straightforward proposition to me.

The first thing I tried was to use a well-known PSP ISO manipulation tool called UMDGen to simply overwrite the files in the English voice acting folder with the files from the Japanese voice acting folder. The game couldn't very well use the English voice acting files if they were no longer on the disc, and I knew the game engine could handle the Japanese files since it already had an option to use them. What could possibly go wrong? The first sign of trouble was that the rebuilt image was about 100MB smaller than the original. The second sign was that when I tried to run the game on my PSP, it hung at the loading screen about a second after starting. Well shit, this wasn't going to be as easy as I thought.

Based on what I already knew about games that behave this way when their disc image is rebuilt, the problem was most likely that the game was loading its data from locations hard-coded into its program, instead of looking up file locations in the ISO 9660 header at the beginning of the disc image. Overwriting files with files of different sizes inevitably forced UMDGen to move data around in the image layout when it rebuilt the image. As a result, the game was now looking for important files in locations where they no longer existed. Man, what a pain in the butt. Now I was going to have to do some actual work to make this thing happen.

The next thing I tried, which was perhaps a bit misguided, was to write some software that would let me make changes to the ISO 9660 header. While I was pretty sure the game wasn't using the header, I wasn't completely sure it never used it. It was possible that it loaded some files using the information in the ISO header and others by other methods. If it ever used the ISO header to locate a voice acting file, that header data would have to be changed regardless of anything else I had to do. Plus, I figured it would be handy to have software that let me easily script edits to an ISO image, and indeed this proved to be the case. All I knew was that there was no way I was manually editing the 2053 file entries it would take to make the necessary changes.

It would have been nice if software already existed to do what I wanted, but sadly it didn't. Due to the nature of the ISO 9660 specification, editing an ISO image without rebuilding the entire ISO header is tricky, and it's something the format was never really designed to handle gracefully, since ISO images were never intended to be altered after they were generated. Needless to say, when I set out to write my library, I pretty much had to start from scratch. Lucky for me, the specification summary by the good people at OSDev was more than enough to tell me everything I needed to know.
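One detail worth knowing before editing ISO 9660 directory records in place: every numeric field, including a file's extent location (LBA) and data length, is stored twice, once little-endian and once big-endian, so any edit has to keep both copies in sync. The Ruby sketch below decodes a single record using the field offsets from ECMA-119 section 9.1, the standard the OSDev page summarizes. The helper names are hypothetical and do not come from the Ruby-ISO-9660 library.

```ruby
# Minimal sketch: decode one ISO 9660 directory record from a binary string.
# Offsets follow ECMA-119 section 9.1; names are hypothetical, not a library API.
DirRecord = Struct.new(:length, :extent_lba, :data_length, :flags, :name)

def parse_dir_record(bytes, offset = 0)
  len = bytes.getbyte(offset)
  return nil if len.nil? || len.zero?   # zero length: padding, no more records in this sector

  rec = bytes.byteslice(offset, len)
  lba_le, lba_be   = rec.byteslice(2, 8).unpack("VN")    # extent location, LE then BE
  size_le, size_be = rec.byteslice(10, 8).unpack("VN")   # data length, LE then BE
  raise "extent halves disagree" unless lba_le == lba_be
  raise "length halves disagree" unless size_le == size_be

  name_len = rec.getbyte(32)
  DirRecord.new(len, lba_le, size_le, rec.getbyte(25), rec.byteslice(33, name_len))
end

# Build a record for testing (recording date left zeroed, volume sequence 1).
def build_dir_record(name, lba, size, flags = 0)
  pad = name.bytesize.even? ? 1 : 0     # pad byte keeps the record length even
  len = 33 + name.bytesize + pad
  rec = [len, 0, lba, lba, size, size].pack("CCVNVN")
  rec << ("\0" * 7).b                   # bytes 18-24: recording date and time
  rec << [flags, 0, 0, 1, 1, name.bytesize].pack("CCCvnC")
  rec << name.b << ("\0" * pad).b
end

rec = build_dir_record("A00_001_.VAG;1", 12_345, 67_890)
entry = parse_dir_record(rec)
puts "#{entry.name} at LBA #{entry.extent_lba}, #{entry.data_length} bytes"
```

The made-up record round-trips through the parser; a real record would come from reading a directory extent out of the image.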

I decided to write my library in Ruby because... I like Ruby. Even though it isn't particularly well suited to messing with binary files, I like that my code will run on almost any platform, and that anyone can easily modify and use it for their own purposes without dealing with a complex or platform-specific build process. And Ruby is really fun to program in too. ;)

After several days of coding on and off (a faster programmer might have written the whole thing in a day, but I wasn't exactly working on it 24/7), I had implemented enough of the ISO 9660 specification to give my idea a shot. Here is the code I used for the first step:
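# Assumes the Ruby-ISO-9660 library (which provides ISO) is already loaded
stream = File.open("gurumin.iso", "rb+")
iso = ISO.new(stream)

# Directories holding the Japanese and English voice acting files
jap = iso.root["PSP_GAME"]["USRDIR"]["vag_jp"]
usa = iso.root["PSP_GAME"]["USRDIR"]["vag"]

jap.entries.each do |name|
  jap_entry = jap.entry(name)
  usa_entry = usa.entry(name)

  # Point the English entry at the Japanese file's data
  usa_entry.extent_lba = jap_entry.extent_lba
  usa_entry.data_length = jap_entry.data_length

  # Write the modified directory record back over the original
  stream.pos = usa_entry.position
  usa_entry.dump(stream)
end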
What this does is simple enough to understand. It takes the location and size fields from the entries for the Japanese voice acting files and uses them to update the entries for the English voice acting files. After running this script on my original ISO, I started up the game to find out... absolutely nothing had changed.
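The script leaves the byte-level write to `dump`. Whatever that method does internally, an in-place update of a directory record comes down to rewriting two both-endian fields at fixed offsets. Here is a minimal sketch, assuming the record is held as a binary string; the helper names are hypothetical.

```ruby
# Hypothetical helper (not the Ruby-ISO-9660 API): overwrite the extent
# location and data length of a raw ISO 9660 directory record in place.
def patch_extent!(record, lba, size)
  record.force_encoding(Encoding::BINARY)                # index by bytes, not characters
  record[2, 16] = [lba, lba, size, size].pack("VNVN")   # each value stored LE, then BE
  record
end

# Read the four copies back: [lba LE, lba BE, size LE, size BE].
def extent_of(record)
  record.byteslice(2, 16).unpack("VNVN")
end

# Example: point a dummy 34-byte record at another file's data.
record = ([34, 0] + [0] * 32).pack("C*")
patch_extent!(record, 0x1A2B, 4096)
p extent_of(record)   # prints [6699, 6699, 4096, 4096]
```

Because the replacement is exactly 16 bytes, the record keeps its length and nothing after it moves.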

The reason I don't consider myself the best hacker in the world is not that I lack knowledge or ability. It's that I lack patience. Hacking something often means banging your head against the wall repeatedly after you run into dead ends. In this case, I could have gotten lucky, and the game could have used the ISO header to load the voice acting files. In reality, however, it did not, and after working on the problem for days and accomplishing nothing, I was about ready to give up. But when I tried to sleep that night, I just couldn't get the project out of my head. I really wanted to put this one in the win column, and I still had a lot of ideas left. Even if I wasn't 100% sure any of them would work, I wasn't ready to give up.

The problem, of course, is that unlike when I hack things on the PC, I don't have access to the debugging tools I need to properly analyze how a PSP game works. I could load the whole thing into a disassembler and go through it line by line (and I did end up needing a disassembler later on), but that would be far too time-consuming to justify for what I was actually trying to accomplish. No, I needed some low-hanging fruit.

I decided to load the game's ISO into WinHex for some further analysis. In my experience, the first rule of hacking binary files is that strings are paydirt. The first thing I did after loading the file was to search the data for the name of the first file in the voice acting folder ("a00_001_"). The first matches I ran into were, unsurprisingly, entries in the ISO header. But then I found what looked like a pretty interesting table. It had 16-byte entries that followed a consistent pattern. The first 8 bytes were the name of a file or folder. The next 4 bytes appeared to be the size of the file. The final 4 bytes I couldn't quite figure out. I assumed they had something to do with the file's location, but nothing else in the table indicated how to resolve those bytes to an actual location on the disc. Since the table had entries for every file in both the Japanese and English voice acting folders, as well as several other key folders in the game's directory structure, it seemed I'd found what I was looking for.
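Once the layout is known, a table like that is easy to decode. The sketch below assumes the names are NUL-padded ASCII and that the size field is little-endian, which is only a guess based on the PSP's CPU being little-endian; the last four bytes stay an uninterpreted integer. The sample entries are made up for illustration.

```ruby
# One 16-byte table entry: 8-byte name, 4-byte size, 4 unexplained bytes.
TableEntry = Struct.new(:name, :size, :unknown)

def parse_table(data)
  data = data.b
  (0...data.bytesize / 16).map do |i|
    # Z8: NUL-padded name; V: 32-bit little-endian (assumed byte order)
    name, size, unknown = data.byteslice(i * 16, 16).unpack("Z8VV")
    TableEntry.new(name, size, unknown)
  end
end

# Two made-up entries in the same layout.
sample = ["a00_001_", 51_200, 0x400].pack("a8VV") + ["vag", 0, 0].pack("a8VV")
parse_table(sample).each { |e| puts "#{e.name}: size #{e.size}, unknown 0x#{e.unknown.to_s(16)}" }
```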

By looking for an entry in the ISO header that matched the location of the data I was looking at, I figured out that the table was located in a file called "aaaa.lst". Why would anyone put a table of file sizes and locations in some randomly named file instead of just looking up the files in the ISO header? ...I don't know, but you'd be surprised how often you find this type of questionable design decision when hacking something. I edited the file so that the entries for the English files matched the entries for the Japanese files, and patched the file into the ISO using the following code:
entry = iso.root["PSP_GAME"]["USRDIR"].entry("aaaa.lst")
buffer = File.binread("aaaa.lst")               # the edited copy of the table file
stream.pos = entry.extent_lba * iso.lba_size    # byte offset of the file's first sector
stream.write(buffer)
I skipped updating the file size in the entry because I didn't actually change the size of the file.
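Tracing a hit in the hex editor back to a file, as described above for aaaa.lst, amounts to turning each file's extent into a byte range and checking which range contains the offset. A sketch, assuming the standard 2048-byte logical block size and a hypothetical list of files already collected from the directory tree (the values below are made up):

```ruby
LBA_SIZE = 2048   # standard ISO 9660 logical block size (assumed for this image)

# Return [file_name, offset_within_file] for the file whose extent contains
# the absolute image offset, or nil if no listed file covers it.
def file_at_offset(files, offset)
  files.each do |name, lba, length|
    start = lba * LBA_SIZE
    return [name, offset - start] if offset >= start && offset < start + length
  end
  nil
end

# Hypothetical [name, extent_lba, data_length] triples, as one might collect
# them by walking the directory tree (values made up for illustration).
files = [["EBOOT.BIN", 100, 500_000], ["aaaa.lst", 400, 30_000]]
p file_at_offset(files, 400 * 2048 + 16)   # prints ["aaaa.lst", 16]
```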

So I loaded up the ISO on my PSP again and the result was... nothing changed. The voice acting in the opening was still entirely in English, and everything worked as if nothing had happened. OK, seriously, what the hell? I had already gone through the ISO file earlier looking for more references to files in the voice acting directory, and other than that table there was nothing interesting. But more importantly, what does that file even do if changing it doesn't do anything? This is another one of those situations where patience pays off. I could have turned off the game at this point and never given it another look. But I decided to play through it a bit and see if anything had changed. To my surprise, the field voices were now in Japanese.

If I had just been doing this for myself, I probably would have called it a win and put the project to bed. I had wanted the game to default to Japanese, but at least it was now possible to play the game properly with full Japanese voices by using a completed game save and the Japanese voice acting code. Really, that's what was most important. Except... I did want to make this a public release, and to make matters worse, after playing further into the game I found there was still some English voice acting left, which played when your character used a heal point. I'm just too much of a perfectionist to let something like that stand.

However, I was running out of ideas. Not only had I changed everything in the game I could find that seemed meaningful to change, I was thoroughly confused about why my results were so inconsistent. How exactly had I managed to change every field voice except one? And why was the cutscene voice acting so resistant to change? When I was editing aaaa.lst, I did skip a few entries, since there were a few files in the English voice set that weren't in the Japanese set. That could explain why one of the voices wasn't changed. I decided to check the Japanese version of the game to find out whether there was supposed to be a voice sample for heal points. As it turns out, there was. And there were no files in the voice acting folder of the original Japanese game that weren't also in the Japanese voice acting folder of the US release. There was really only one thing left to do: I was going to have to disassemble the game's main executable and figure out what was going on.

The thing about PSP executables is that they're encrypted to prevent tampering and piracy. That would have been a show-stopper except for two things: a mistake on Sony's part, and custom firmware. Due to that mistake, early PSP games actually left an unencrypted copy of the game's executable on the disc image. The unencrypted file is called "BOOT.BIN", and the encrypted version is called "EBOOT.BIN". But while the unencrypted file is left on the disc, it is never actually used; the PSP only loads "EBOOT.BIN" to start the game. The next step was fairly obvious:
eboot_entry = iso.root["PSP_GAME"]["SYSDIR"].entry("EBOOT.BIN")
boot_entry = iso.root["PSP_GAME"]["SYSDIR"].entry("BOOT.BIN")

# Make EBOOT.BIN's directory record point at BOOT.BIN's unencrypted data
eboot_entry.extent_lba = boot_entry.extent_lba
eboot_entry.data_length = boot_entry.data_length

stream.pos = eboot_entry.position
eboot_entry.dump(stream)
The nice thing is that when the PSP OS loads files, such as the main executable it needs in order to start a game, it does use the ISO header. I guess my library was pretty useful after all. And thanks to the magic of custom firmware, my PSP has absolutely no issue loading an EBOOT file that isn't actually encrypted.
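A quick way to confirm which kind of executable a directory entry now points at is to look at its first four bytes. This sketch relies on the usual signatures, stated here as an assumption: encrypted PSP executables typically begin with "~PSP", while an unencrypted BOOT.BIN is a standard ELF file beginning with 0x7F followed by "ELF". The function name is hypothetical.

```ruby
# Classify a PSP executable by its leading magic bytes (a heuristic, not a
# full header check).
def executable_kind(header)
  case header.b[0, 4]
  when "~PSP".b    then :encrypted   # encrypted PSP executable header
  when "\x7FELF".b then :plain_elf   # ordinary, unencrypted ELF
  else :unknown
  end
end

# With the ISO open as in the snippets above, one could check the data
# EBOOT.BIN now points at (library calls as used earlier in this post):
#   stream.pos = eboot_entry.extent_lba * iso.lba_size
#   p executable_kind(stream.read(4))

p executable_kind("~PSP\x00\x00".b)   # prints :encrypted
p executable_kind("\x7FELF\x01".b)    # prints :plain_elf
```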

Now all I had to do was figure out how the game was loading the voice files. Simple, right? Actually, yes, yes it was. Remember when I said strings are pay-dirt? The first thing I did after loading the file up in my disassembler was to look over the list of strings. Most things on the list seemed pretty useless, but two entries stuck out:
"vag/%s.vag"
"vag_jp/%s.vag"
Anyone familiar with the printf() function could tell you that those strings were very likely used at some point by the game to build the paths of files in the voice acting folders. Could it really be that simple? Could I just swap the string referring to the English folder for the string referring to the Japanese folder and be done with the whole mess, without familiarizing myself with MIPS assembly language? I could, except that swapping the strings is not that simple. If the string for the English folder were longer than the one for the Japanese folder, there would be no problem: I could just overwrite one string with the other and put a terminating null at the end. But since the string I needed to replace was shorter than the string I wanted to replace it with, I had a problem. Here is what things looked like in the hex editor:
vag/%s.vag..vag_
jp/%s.vag.......
(Note that the "."s represent null characters in this example, except for the ones between "%s" and "vag", which really are periods.) If I had simply copied the string for the Japanese folder over the one for the English folder, this is what would have happened:
vag_jp/%s.vag.g_
jp/%s.vag.......
It probably would have worked for changing the English voice acting to Japanese. It's just too sloppy for my taste. If you ever did enter the Japanese voice code, the game would try to use the string "g" to load voice files. At best this would cause no voice to play, and at worst it would crash the game. Thankfully, there was a better way.
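The collision is easy to reproduce on the 32 bytes from the dump. This sketch rebuilds them as a Ruby string, performs the naive overwrite, and reads each C string back the way C code would, stopping at the first null byte. The helper name is hypothetical.

```ruby
# Read a NUL-terminated (C-style) string starting at a byte offset.
def c_string_at(buf, offset)
  buf.byteslice(offset, buf.bytesize - offset).unpack("Z*").first
end

# The two rows from the hex dump: the English string at offset 0, the
# Japanese one at offset 12, then padding nulls (32 bytes total).
original = "vag/%s.vag\0\0vag_jp/%s.vag\0\0\0\0\0\0\0".b

# Naive patch: copy the 13-byte Japanese string and its terminator over the
# 10-byte English one, spilling into the next string.
patched = original.dup
patched[0, 14] = "vag_jp/%s.vag\0".b

puts patched.tr("\0", ".")[0, 16]   # prints vag_jp/%s.vag.g_
p c_string_at(patched, 0)           # prints "vag_jp/%s.vag"
p c_string_at(patched, 12)          # prints "g"
```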

By the way, for anyone wondering why I would have had to write over the data after the string instead of inserting a few bytes to make room: executable files are, like the Gurumin ISO, very dependent on hard-coded locations. If adding bytes shifts everything that comes after the insertion point, all of a sudden the executable is looking for those things a few bytes earlier than where they actually are. For various reasons it's really difficult to patch an executable to fix this, so you just have to take it as a given that you can't insert bytes into the middle of an executable file.
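A toy model makes the problem concrete. Here a hard-coded offset stands in for an address baked into the executable; after three bytes are inserted in front of the data, the string it was meant to find has moved, but the offset has not. The values are illustrative only.

```ruby
# Toy model: the "program" always reads its path string from offset 12.
HARD_CODED_OFFSET = 12

def read_string(buf)
  buf.byteslice(HARD_CODED_OFFSET, buf.bytesize - HARD_CODED_OFFSET).unpack("Z*").first
end

data = "vag/%s.vag\0\0vag_jp/%s.vag\0".b
p read_string(data)              # prints "vag_jp/%s.vag"

grown = data.dup
grown.insert(0, "XYZ".b)         # inserting bytes shifts everything after them
p grown.index("vag_jp")          # prints 15: the string moved...
p read_string(grown)             # prints "g": ...but the offset did not
```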

Anyway, if you are using a decent disassembler, odds are it can take a string and find every place the program uses it. As it turns out, in this case each string was used in 8 different places. Not too difficult to deal with. To make things even easier, the instructions that used the strings were always the same:
la $a1, aVagS_vag #"vag/%s.vag"
la $a1, aVag_jpS_vag #"vag_jp/%s.vag"
What these instructions do isn't particularly important. What is important is that if you can change every instance of the first instruction into the second, you can make the game always load the Japanese voice files.

If you actually look at the bytes for these instructions in a hex editor, here is what they look like:
la $a1, aVagS_vag #"vag/%s.vag" -> 8C 33 A5 24
la $a1, aVag_jpS_vag #"vag_jp/%s.vag" -> 98 33 A5 24
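For the curious, the one-byte difference has a simple explanation. `la` is an assembler pseudo-instruction that expands to a `lui` (load the upper 16 bits of the address) followed by an `addiu` (add the lower 16 bits), and the four bytes above are the little-endian encoding of that `addiu`. Since changing this half alone was enough, both strings evidently share the same upper address bits, so only the immediate differs, and the difference, 12, is exactly the distance between the two strings in the dump (10 characters plus 2 nulls). A small decoding sketch:

```ruby
# Decode a little-endian 32-bit MIPS I-type instruction into its fields.
def decode_itype(bytes)
  word = bytes.b.unpack("V").first   # the PSP's CPU is little-endian
  {
    opcode: word >> 26,              # 0x09 is addiu
    rs:     (word >> 21) & 0x1F,     # source register (5 is $a1)
    rt:     (word >> 16) & 0x1F,     # destination register (5 is $a1)
    imm:    word & 0xFFFF            # 16-bit immediate (low half of the address)
  }
end

en = decode_itype("\x8C\x33\xA5\x24")
jp = decode_itype("\x98\x33\xA5\x24")
printf("opcode 0x%02X, $%d <- $%d + 0x%04X\n", en[:opcode], en[:rt], en[:rs], en[:imm])
# prints: opcode 0x09, $5 <- $5 + 0x338C
puts jp[:imm] - en[:imm]             # prints 12
```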
Hmm... those instructions are pretty similar in hexadecimal form. In fact, they differ by only one byte. All I had to do was locate those instructions in my hex editor and make a one-byte change to each:
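# Read BOOT.BIN (which EBOOT.BIN now points to) into memory
stream.pos = boot_entry.extent_lba * iso.lba_size
buffer = stream.read(boot_entry.data_length)

# Offsets (relative to the start of BOOT.BIN) of the eight "la $a1, aVagS_vag"
# instructions; changing 0x8C to 0x98 turns each into "la $a1, aVag_jpS_vag".
# String#setbyte is needed because buffer[i] = 0x98 raises a TypeError in Ruby 1.9+.
[473956, 474240, 474600, 474932, 475348, 475700, 476116, 476840].each do |offset|
  raise "unexpected byte at #{offset}" unless buffer.getbyte(offset) == 0x8C
  buffer.setbyte(offset, 0x98)
end

# Write the patched executable back in place
stream.pos = boot_entry.extent_lba * iso.lba_size
stream.write(buffer)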
Mission accomplished. I can now boot up Gurumin and hear Japanese voices no matter what I do. Totally worth it.
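The eight offsets above are specific to this BOOT.BIN. A more general approach is to scan the executable for the instruction bytes and patch every 4-byte-aligned match. This is a sketch under one loud assumption: every aligned occurrence of those bytes really is a reference to the English string, which should be confirmed in a disassembler before writing anything back. The constant and method names are hypothetical.

```ruby
EN_ADDIU = "\x8C\x33\xA5\x24".b   # addiu $a1, $a1, 0x338C ("vag/%s.vag")
JP_LOW   = 0x98                   # low byte of 0x3398 ("vag_jp/%s.vag")

# Return every offset where pattern occurs on a 4-byte boundary
# (MIPS instructions are word-aligned).
def find_aligned(buffer, pattern)
  hits = []
  pos = 0
  while (i = buffer.index(pattern, pos))
    hits << i if (i % 4).zero?
    pos = i + 1
  end
  hits
end

# Rewrite the low immediate byte of each match; returns the patched offsets.
def patch_to_japanese!(buffer)
  hits = find_aligned(buffer, EN_ADDIU)
  hits.each { |off| buffer.setbyte(off, JP_LOW) }
  hits
end

# Usage with a small fake buffer: one aligned match, one unaligned.
buf = ("\0" * 8).b + EN_ADDIU + ("\0" * 2).b + EN_ADDIU + ("\0" * 2).b
p patch_to_japanese!(buf)   # prints [8]
```

On the real image, `buffer` would be the BOOT.BIN contents read as in the script above, and the returned offsets could be compared against the hard-coded list before writing.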

If you'd like to patch your ISO, go ahead and try my script. I linked to it at the bottom of this post. Be sure to read the Readme.txt file for instructions. And don't be afraid to leave a comment if you have any issues.

Gurumin Undub Patch 1.02