Identifying Unreachable Basic Blocks

The below IDAPython code can be used to identify unreachable code in a function.

def unreachable(address, colorme = True, verbose = True):
    'identfies basic blocks that do not have an entry point and are not the entry point of a function' 
    child = set([])
    curr = 0
    orph = []
    f = idaapi.FlowChart(idaapi.get_func(address))
    for block in f:
        for succ_block in block.succs():
            child.add(succ_block.id)     
        for pred_block in block.preds():
            child.add(pred_block.id)
    for block in f:
        if block.id not in child and  block.id != 0:
            if verbose:
                print "Unreachable Code - %x - %x" % (block.startEA, block.endEA)
            if colorme:
                curr = block.startEA
                while curr != block.endEA:
                    SetColor(curr, CIC_ITEM, 0x80ffff)
                    curr = NextAddr(curr)
            orph.append((block.startEA, block.endEA))
    return orph
             
  
    

Some obfuscation techniques use unreachable code and dead code variables to clutter the output of a disassembler.  The goal of this is to either hinder or slow down the analysis of the code. In the grey code below we can see a basic block of unreachable code.

The basic block has no entry point to it. Typically the only time we will see a basic block that doesn't have an entry will be at the entry point of the function. Using IDA's FlowChart function can we get all the blocks ids and their exit point(s). If all the exit points are saved into a set; we will also have a set of corresponding entry points for other basic blocks. The last block which is usually the function epilogue will not have an exit point (from a control flow [1] function chunk association view ).  Using the output from Elias Bachaalany's ex_gdl_qflow_chart.py (which my code is just a hack of) we can see the block ids and their corresponding exit points.


Python>cls_main()
1002a5d0 - 1002a626 [0]:
  1002a626 - 1002a62b [1]:
  1002a63f - 1002a671 [3]:
1002a626 - 1002a62b [1]:
  1002a684 - 1002a68a [5]:
1002a62b - 1002a63f [2]:
  1002a63f - 1002a671 [3]:
1002a63f - 1002a671 [3]:
  1002a684 - 1002a68a [5]:
1002a671 - 1002a684 [4]:
  1002a684 - 1002a68a [5]:
1002a684 - 1002a68a [5]:  

Notice the last block in the output does not contain an exit point. Since we have all the entry points we now just need to iterate through all the block ids and see if it's id is in the exit point set. All the block ids (except for the function entry point) should be an exit point from another block id. Or simply, all exit points are another blocks entry point.  If the block id is not the function entry point and is not entry point from another basic block we know that block is unreachable code. Let's see an example run.


The function needs only one argument which is the address. A second argument can be True or False for colors, and True or False for verbose. unreachable(here(), False, False) would only return a list of tuples (start address, end address) for the unreachable basic block. The code can be easily modified to loop through every function to print out all unreachable code in an executable. Or could be used to highlight functions via a hot-key similar to deroko's color eip redirection script. Just a quick post. Cheers.

[1] "The IDA SDK's notion of basic block membership within a function is based upon function chunk association and not control flow." - via Rolf on RE Reddit - source

1 comment:

  1. Neat stuff. Maybe this could be used to help identify the compiler/linker used, etc.
    Probably show some neat statistical insights if nothing else.
    Thanks for sharing.

    ReplyDelete