In memory of Pocho, the Panther. RIP buddy.
Last Ekoparty I got the chance to try my luck solving one of Core Security challenges. For a while I have been hearing great things about angr so I wanted to get to the solution using it.
Angr is a Python binary analysis framework that provides a quite clean API to a powerful symbolic executor (built around vex intermediate code) and a solving back-end (provided by a z3 wrapper called claripy).
I have to admit that it took me a while to solve the challenge (I did not win :( ), I have only my own very limited intellect and lack of familiarity with the tool to blame. But what I lack in intelligence I have in perseverance so let us solve this with, as always, maximum effort.
The Sokohash V2 challenge imitates the rules of the known Sokoban puzzle game (https://en.wikipedia.org/wiki/Sokoban). Through a Rogue-like interface you get four "boxes" (x,y,z,w) that the player has to move to their corresponding correct positions.
Unlike Sokoban there are no limits to the number of steps or pushes the player can make. In Sokohash, the game assigns a hash to each position on the board, then on each movement it takes the hashes of each box and calculates a global hash with the current state of the board. The objective of the challenge is to have that global hash match the winning hash:
As I want to get a sense of what angr can do for me, I’ll keep my binary analysis to a minimum. I basically have the following needs:
- Identify were the global hash is calculated
- Identify were the hash was stored.
- Identify the function input and verify how it was related to the boxes coordinates/hashes.
- Model the hash calculation with angr.
First we need to locate the function that calculates the hash. This is a simple step, we take a look at the strings with IDA and "Calculating hash..." takes us to the right spot.
After that, It was pretty straightforward to locate the function input by putting a breakpoint at the correct place, in my case 0x40101a (*). After the canary the function copies 32 bytes from stack and a uses them as local data. Just by looking at the process screen is evident that those 32 bytes correspond to the hashes of each of the boxes, 8 bytes each, the position of the player seems to be irrelevant.
(*) I run the Win32 binary with Wine and debugged with GDB. Why? Because I was feeling wild and reckless.
Then we look for the place were the calculated hash is stored, this is also easy by using IDA/GDB.
So now we know what the input is and where the output is stored. We might as well spin angr and see what it says. We are going to tell angr to find a path between start_addr and to_find. As the input values that will cause the function to walk that path will evidently be not unique (there are barely any code constraints between the start and end) we will also need to tell angr that we expect the input values to produce the winning hash by setting specific logical constraints on the resulting path.
As the output hash is not stored in memory in a linear fashion I just made a helper Python function help set up the constraints on the memory output.
If we run it we get:
Bingo! But if we check the solution on the actual game we won't get the winning hash. We can think that maybe there are more than one solution and just ask the solver for more of them by using the line:
possible = found.se.any_n_int(buffer, 10) # Give me 10 solutions if possible
With lots of solutions checking each one will be very time consuming and we may never arrive at the correct one. As we narrow the output using constraints maybe we also need to narrow the input. To do this we need to ask ourselves the question: What is the relationship between the position hash and the box coordinates?
Turns out that by looking at the caller of "x_calc_hash" we can see that the coordinates are used to read data from memory. If we take a look at the memory region where the data is read from it is full of hashes, turns out that each hash correspond to a board position. Then it will be ideal to find were those hashes are stored, dump them and then using them to add the proper constraints. We do that finding the first and last hash and then finding their memory address with GDB.
We can easily map each hash to a coordinate using python but first we need the x,y coordinate pairs. The board is 27x17 but we also need to consider there are positions on the board that cannot be reached (the portal, the letter in "CORE", inside letters). To get an accurate coordinate map we need to exclude these positions from it, for example, using the function below:
After we just need to load our memory dump and read the hashes, I will also store the dump in symbolic memory in case angr accesses it.
After this we just map hashes to coordinates:
So now we have a map of hashes to coordinate pairs. The only thing remaining to tell angr is that each box hash is in that map and that each box has to have a different hash (boxes cannot be stacked), to do this we apply constrains to each of the 8 byte input hashes.
We run it and…
This is by no means the only solution and I have barely scraped what Angr has to offer (multi-arch symbolic debugging with integrated solving seems fancy :> ) but I can say is joy to play with and to add to one's toolbox.
Below is the full solution: