Wednesday, November 2, 2016

Solving Sokohashv2.0 full of Angr on Ekoparty 2016



In memory of Pocho, the Panther. RIP buddy.


Last Ekoparty I got the chance to try my luck solving one of Core Security challenges. For a while I have been hearing great things about angr so I wanted to get to the solution using it.

Angr is a Python binary analysis framework that provides a quite clean API to a powerful symbolic executor (built around vex intermediate code) and a solving back-end (provided by a z3 wrapper called claripy). 

I have to admit that it took me a while to solve the challenge (I did not win :( ), I have only my own very limited intellect and lack of familiarity with the tool to blame. But what I lack in intelligence I have in perseverance so let us solve this with, as always, maximum effort.

The Sokohash V2 challenge imitates the rules of the known Sokoban puzzle game (https://en.wikipedia.org/wiki/Sokoban). Through a Rogue-like interface you get four "boxes" (x,y,z,w) that the player has to move to their corresponding correct positions. 


Sokohash interface - it is a bit off because I ran it in Wine :P
Unlike Sokoban there are no limits to the number of steps or pushes the player can make. In Sokohash, the game assigns a hash to each position on the board, then on each movement it takes the hashes of each box and calculates a global hash with the current state of the board. The objective of the challenge is to have that global hash match the winning hash:

C03922D0206DC3A33016010D6C66936E953ABAB9000010AE805CE8463CBE9A2D

As I want to get a sense of what angr can do for me, I’ll keep my binary analysis to a minimum. I basically have the following needs:

  • Identify were the global hash is calculated
  • Identify were the hash was stored.
  • Identify the function input and verify how it was related to the boxes coordinates/hashes.
  • Model the hash calculation with angr.

First we need to locate the function that calculates the hash. This is a simple step, we take a look at the strings with IDA and "Calculating hash..." takes us to the right spot.






After that, It was pretty straightforward to locate the function input by putting a breakpoint at the correct place, in my case 0x40101a (*). After the canary the function copies 32 bytes from stack and a uses them as local data. Just by looking at the process screen is evident that those 32 bytes correspond to the hashes of each of the boxes, 8 bytes each, the position of the player seems to be irrelevant. 

(*) I run the Win32 binary with Wine and debugged with GDB. Why? Because I was feeling wild and reckless.



Then we look for the place were the calculated hash is stored, this is also easy by using IDA/GDB. 






So now we know what the input is and where the output is stored. We might as well spin angr and see what it says. We are going to tell angr to find a path between start_addr and to_find. As the input values that will cause the function to walk that path will evidently be not unique (there are barely any code constraints between the start and end) we will also need to tell angr that we expect the input values to produce the winning hash by setting specific logical constraints on the resulting path.

As the output hash is not stored in memory in a linear fashion I just made a helper Python function help set up the constraints on the memory output.

import angr
import sys
import simuvex
import struct
from itertools import combinations, product

WIN_HASH = "C03922D0206DC3A33016010D6C66936E953ABAB9000010AE805CE8463CBE9A2D".decode("hex")

def do_nothing(state):
    pass

def do_repmovsd(state):
    # angr does not like rep movsd
    # we do it by hand
    buffer = state.memory.load(state.regs.esi, 8 * 4)
    state.memory.store(state.regs.edi, buffer)

def get_hash_map(init_addr):
    # Helper function to get each address were a byte of
    # of the calculated hash is going to be stored in the 
    # proper order
    addr = init_addr
    hash_map = []
    for i in xrange(0, len(WIN_HASH), 2):
        pair = WIN_HASH[i:i+2]
        hash_map.append((addr, ord(pair[1])))
        hash_map.append((addr+1, ord(pair[0])))
        addr += 8    

    return hash_map


def main():
    proj = angr.Project('sokohashv2.0.exe', use_sim_procedures=True, load_options={"auto_load_libs": False})

    # ADDRS 
    main = 0x401013 # The start of our path, just after the canary
    to_find = 0x0040123E # The end, just before the security check
    hash_addr = 0x04216C0 # The address were the output hash will be stored

    # HOOKS - We avoid any function call that does not alter
    # our result
    func_hooks = [0x0040102C, 0x0401033]
    for addr in func_hooks:
        proj.hook(addr, do_nothing, length=6) 

    func_hooks = [0x401215, 0x40121E, 0x401239, 0x40123C]
    for addr in func_hooks:
        proj.hook(addr, do_nothing, length=2) 

    # We need to hook the rep movsd because the symbolic execution
    # fails to resolve it. We model it by hand
    proj.hook(0x0401028, do_repmovsd, length=2)
    proj.hook(0x0401253, do_nothing, length=5) 
    proj.hook(0x040103E, do_nothing, length=5) 
    proj.hook(0x0401225, do_nothing, length=5) 
    proj.hook(0x0401243, do_nothing, length=5) 

    # initial state
    init = proj.factory.blank_state(addr=main)
    
    # we set some registers to get a context closer to real world
    init.regs.ebp = init.regs.esp + 0x78
    buffer = init.memory.load(init.regs.ebp + 0x8, 0x20)
        
    pg = proj.factory.path_group(init, threads=8)
    pg.explore(find=to_find)

    path = pg.found[0]

    found = path.state

    # We print the expected hash for verification
    conds = []
    expected = []
    hash_map = get_hash_map(hash_addr)
    for addr, value in hash_map:       
        memory = found.memory.load(addr, 1, endness=proj.arch.memory_endness) 

        # Here we declare that each byte in the output hash must be
        # part of the winning hash
        conds.append((memory == value))
        expected.append((hex(addr), hex(value)))
    print "Expected is '%s'\n\n" % expected

    found.add_constraints(init.se.And(*conds))

    # We print the resulting hash for verification
    result = []
    hash_map = get_hash_map(hash_addr)
    for addr, value in hash_map:       
        buf_ptr = found.memory.load(addr, 1)
        possible = found.se.any_int(buf_ptr)
        result.append((hex(addr), "0x%x" % possible))
    print "Result is '%s'\n\n" % result

    # Print solutions
    possible = found.se.any_n_int(buffer, 1) # We ask for the first solution
    for i, f in enumerate(possible):
        out = "%x" % f
        if len(out) < (0x20*2):
            continue

        names = ["x","y","z","w"]
        values = []
        for j in xrange(0, len(out), 16):
            value = out[j:j+16]
            unpk_value = struct.unpack("<Q", value.decode("hex"))[0]

            values.append((names[j//16], "%.16x" % unpk_value))
        print "\tSolution %d: %s" % (i, values)


if __name__ == '__main__':
    #angr.path_group.l.setLevel('DEBUG')
    main()

If we run it we get:



Bingo! But if we check the solution on the actual game we won't get the winning hash. We can think that maybe there are more than one solution and just ask the solver for more of them by using the line:

...
possible = found.se.any_n_int(buffer, 10) # Give me 10 solutions if possible
...





With lots of solutions checking each one will be very time consuming and we may never arrive at the correct one. As we narrow the output using constraints maybe we also need to narrow the input. To do this we need to ask ourselves the question: What is the relationship between the position hash and the box coordinates? 

Turns out that by looking at the caller of "x_calc_hash" we can see that the coordinates are used to read data from memory. If we take a look at the memory region where the data is read from it is full of hashes, turns out that each hash correspond to a board position. Then it will be ideal to find were those hashes are stored, dump them and then using them to add the proper constraints. We do that finding the first and last hash and then finding their memory address with GDB.





We can easily map each hash to a coordinate using python but first we need the x,y coordinate pairs. The board is 27x17 but we also need to consider there are positions on the board that cannot be reached (the portal, the letter in "CORE", inside letters). To get an accurate coordinate map we need to exclude these positions from it, for example, using the function below:

def get_valid_coords():
    var = """#                            #
#            O               #
#         x                  #
#                   w        #
#           *                #
# ##    ####   ######   ###  #
##  #  ##  ##  ##  ##  ##    #
##     ##  ##  ##  ##  ##    #
##     ##  ##  #####   ####  #
##     ##  ##  ## ##   ##    #
##  #  ##  ##  ##  ##  ##    #
# ##    ####   ##  ##   ###  #
#                            #
#                            #
#                      yz    #
#                     O      #
/                            #
#                            #
"""

    valid = []
    invalid = (list(product(range(7,12),[9,10])) +
               list(product([7,8],[17,18])) )

    x = 1
    for line in var.splitlines():
        line = line.strip()
        line = line[1:len(line)-1]
        y = 1
        for i in line:
            if i not in ["O", "#", "/"]:
                if (x,y) not in invalid:
                    valid.append((x,y))
            y += 1
        x += 1   

    return valid

After we just need to load our memory dump and read the hashes, I will also store the dump in symbolic memory in case angr accesses it. 

def do_memset(state):
    addr = 0x417490
    with open("matrix.bin","rb") as f:
        content = f.read()
        for i in content:            
            state.memory.store(addr, state.se.BVV(ord(i), 8 * 1))
            addr += 1

    start_off = 0x41d450 - addr
    end_off = 0x41e0c8 - addr
    coords = []
    for i in xrange(start_off, end_off+8, 8):        
        coords.append(struct.unpack("<Q", content[i:i+8])[0])

    return coords 

After this we just map hashes to coordinates:

    coords = do_memset(init)
    coord_dict = {}
    count = 0
    for i in get_valid_coords():
        #print "%s = %.16x" % (i, pos[count])
        coord_dict[coords[count]] = i
        count += 1

So now we have a map of hashes to coordinate pairs. The only thing remaining to tell angr is that each box hash is in that map and that each box has to have a different hash (boxes cannot be stacked), to do this we apply constrains to each of the 8 byte input hashes.

   # search only for possible coords
    variables = []
    for i in xrange(0, 4):
        var = init.memory.load(init.regs.ebp + 0x8 + (0x8*i), 0x8, endness=proj.arch.memory_endness) 
        variables.append(var)
        conds = []
        for p in coords:
            conds.append(p == var)
        init.add_constraints(init.se.Or(*conds))

    # each coordinate must be distinct
    for v1,v2 in combinations(variables, 2):
        init.add_constraints(v1 != v2)


We run it and…




:) 

This is by no means the only solution and I have barely scraped what Angr has to offer (multi-arch symbolic debugging with integrated solving seems fancy :> ) but I can say is joy to play with and to add to one's toolbox. 

Facuman
@facuman

---
Below is the full solution:

import angr
import sys
import struct
from itertools import combinations, product

WIN_HASH = "C03922D0206DC3A33016010D6C66936E953ABAB9000010AE805CE8463CBE9A2D".decode("hex")


def get_valid_coords():
    var = """#                            #
#            O               #
#         x                  #
#                   w        #
#           *                #
# ##    ####   ######   ###  #
##  #  ##  ##  ##  ##  ##    #
##     ##  ##  ##  ##  ##    #
##     ##  ##  #####   ####  #
##     ##  ##  ## ##   ##    #
##  #  ##  ##  ##  ##  ##    #
# ##    ####   ##  ##   ###  #
#                            #
#                            #
#                      yz    #
#                     O      #
/                            #
#                            #
"""

    valid = []
    invalid = (list(product(range(7,12),[9,10])) +
               list(product([7,8],[17,18])) )

    x = 1
    for line in var.splitlines():
        line = line.strip()
        line = line[1:len(line)-1]
        y = 1
        for i in line:
            if i not in ["O", "#", "/"]:
                if (x,y) not in invalid:
                    valid.append((x,y))
            y += 1
        x += 1   

    return valid

def do_memset(state):
    addr = 0x417490
    with open("matrix.bin","rb") as f:
        content = f.read()
        for i in content:            
            state.memory.store(addr, state.se.BVV(ord(i), 8 * 1))
            addr += 1

    start_off = 0x41d450 - addr
    end_off = 0x41e0c8 - addr
    coords = []
    for i in xrange(start_off, end_off+8, 8):        
        coords.append(struct.unpack("<Q", content[i:i+8])[0])

    return coords

def do_repmovsd(state):
    # angr does not like rep movsd
    # we do it by hand
    buffer = state.memory.load(state.regs.esi, 8 * 4)
    state.memory.store(state.regs.edi, buffer)

def do_nothing(state):
    pass

def get_hash_map(init_addr):
    addr = init_addr
    hash_map = []
    for i in xrange(0, len(WIN_HASH), 2):
        pair = WIN_HASH[i:i+2]
        hash_map.append((addr, ord(pair[1])))
        hash_map.append((addr+1, ord(pair[0])))
        addr += 8    

    return hash_map


def main():
    proj = angr.Project('sokohashv2.0.exe', use_sim_procedures=True, load_options={"auto_load_libs": False})

    # addrs 
    main = 0x401013
    to_find = 0x0040123E
    hash_addr = 0x04216C0

    # hooks
    func_hooks = [0x0040102C, 0x0401033]
    for addr in func_hooks:
        proj.hook(addr, do_nothing, length=6) 

    func_hooks = [0x401215, 0x40121E, 0x401239, 0x40123C]
    for addr in func_hooks:
        proj.hook(addr, do_nothing, length=2) 

    proj.hook(0x0401028, do_repmovsd, length=2)
    proj.hook(0x0401253, do_nothing, length=5) 
    proj.hook(0x040103E, do_nothing, length=5) 
    proj.hook(0x0401225, do_nothing, length=5) 
    proj.hook(0x0401243, do_nothing, length=5) 

    # initial state
    init = proj.factory.blank_state(addr=main)
    
    coords = do_memset(init)
    coord_dict = {}
    count = 0
    for i in get_valid_coords():
        #print "%s = %.16x" % (i, pos[count])
        coord_dict[coords[count]] = i
        count += 1

    init.regs.ebp = init.regs.esp + 0x78

    # search only for possible coords
    variables = []
    for i in xrange(0, 4):
        var = init.memory.load(init.regs.ebp + 0x8 + (0x8*i), 0x8, endness=proj.arch.memory_endness) 
        variables.append(var)
        conds = []
        for p in coords:
            conds.append(p == var)
        init.add_constraints(init.se.Or(*conds))

    # each coordinate must be distinct
    for v1,v2 in combinations(variables, 2):
        init.add_constraints(v1 != v2)

    buffer = init.memory.load(init.regs.ebp + 0x8, 0x20)
        
    pg = proj.factory.path_group(init, threads=8, save_unconstrained=True)
    pg.explore(find=to_find)

    path = pg.found[0]

    found = path.state

    # Resulting hash must be winning hash
    # Print expected hash and resulting hash for verification
    conds = []
    expected = []
    hash_map = get_hash_map(hash_addr)
    for addr, value in hash_map:       
        memory = found.memory.load(addr, 1, endness=proj.arch.memory_endness) 
        conds.append((memory == value))
        expected.append((hex(addr), hex(value)))
    print "Expected is '%s'\n\n" % expected

    found.add_constraints(init.se.And(*conds))

    result = []
    hash_map = get_hash_map(hash_addr)
    for addr, value in hash_map:       
        buf_ptr = found.memory.load(addr, 1)
        possible = found.se.any_int(buf_ptr)
        result.append((hex(addr), "0x%x" % possible))
    print "Result is '%s'\n\n" % result


    # Print solutions
    possible = found.se.any_n_int(buffer, 1)
    for i, f in enumerate(possible):
        out = "%x" % f
        if len(out) < (0x20*2):
            continue

        names = ["x","y","z","w"]
        values = []
        for j in xrange(0, len(out), 16):
            value = out[j:j+16]
            unpk_value = struct.unpack("<Q", value.decode("hex"))[0]

            values.append((names[j//16], coord_dict[unpk_value]))
        print "\tSolution %d: %s" % (i, values)


if __name__ == '__main__':
    #angr.path_group.l.setLevel('DEBUG')
    main()