The first thing that we will need is the root parent function. Usually this function is identified during the reversing process. I have been working on a script that will automatically identify large block of functions that are linked libraries or stand alone subroutine-blocks. This will be described later on in this post. Let's use Gozi as an example. One of it's commands in Gozi is called "GET_CERTS". This function will grab a set of certificates such as AuthRoot, Certificate Authority and a couple of others then write them to the %TEMP% directory, zip up the files and where they will wait till they are processed. The zip library is statically linked to the Gozi binary.
The above image is the cross-reference from call graph in IDA for the function that I identified as the root parent for the zip functionality. I don't know if the term is correct but I have started calling the root parent function and all the child nodes, subroutine-blocks. All the functions for a linked library could be thought of as one subroutine-block. They are all used to support some similar functionality or all the child nodes can be traced back to a root function. In the Gozi zip example we have 86 nodes. Let's describe real quick how we could rename all of these functions. First we will need to find the root parent function (check), then a string of what we want to label the function with, then we will need to recursively create a set that contains all the functions down, we will need to remove all APIs names from the set and then we will rename all the functions in the set.
Now let's describe how we can do that using IDAPython. Getting the function and label/tag can be done in IDA with the AskStr('example_name', "Text to ask"). We can recursively "graph_down" by stealing code from Carlos G. Prado on his awesome Milf-Plugin for IDA. Then we can get a set of all the imported API by using idaapi.get_import_module_name() and idaapi.enum_import_names(). Once we have a set of the code path and a set of all the API names we can remove all intersections from the code path set. This will remove any imported API names because we don't want to rename them. Then from there it's a simple for loop to rename the function name with the MakeNameEx() function. Let's go over how this would look visually in IDA.
Entering the root parent function name that we would like to recursively rename all the child nodes.
Add the tag/label that we would like to add to the start of the function name. Note if the function has already been labeled with the tag it will not be renamed.
Now all the subroutine-block names have been tagged. For people who don't want to wait for IDAScope to be released you can download the source code below. Previously in the post I mentioned that I have been working on a script that will automatically identify subroutine-blocks for stand alone code or linked libraries. A side effect of child functions that are part of subroutine-blocks is they are not often called outside of their subroutine-block. This can be seen easily using IDA's proximity view.
Yeah, that was the first time I found use of proximity view also, kind of cool. We can see from the graph all of our zip functions that were previously renamed. If we review all the functions we will notice there are not any calls to and from outside of the subroutine-block. A way that we can automatically identify these subroutine-blocks is to to enumerate all functions, remove all imported functions, for each function we create a self that includes all child functions/nodes that we will call the path, then for each function in the path we check if the cross-reference to the function are outside of the path. If the cross-reference is outside of the path then we know it's not a subroutine-block because it's not self contained. This scenario is actually extremely rare because its too rigid. In order to help with this we can add a threshold. The threshold is a Python set that is updated if a function contained in the subroutine-block is cross-referenced by another function outside of the subroutine-block. If we allow a threshold of one or two than we can successfully identify sub-routine blocks.
Disclaimer: I don' have much background in algorithms. Nor do I have a background in graph theory. My approach is basically a brute-force approach. I'm sure this could be speed up or optimized. As Dan mentioned to me there are some issues with linear complexity in my algorithm. The speed is decent on IDBs that contain less that 1k functions. After that it can run a little slow. But then again it's much quicker than trying to find it by hand.
Time for an example. Note: ALT-F9 is your friend if you are developing scripts in IDA.
The above output is from a python script that can be found below. At the bottom of the output window you can see that we identified the parent function of what was thought to be the start of the subroutine-block for the zip functions. Hopefully by being able to recursively rename functions and automatically being able to identify those subroutine-blocks it will help you with speeding up the reversing process. Below is the code for anyone who doesn't want to wait for IDAScope to be released.
rec_rename.py - recursively rename function - Download
## This script helps with renaming functions ## See the following link for more info ## Created by alexander.hanel@gmail.com from idaapi import * import idautils import idc import sys imports_list = [] # The following two functions are used to get the import API names. # ret_list_of_imports() returns the api names in a list def imp_cb(ea, name, ord): global imports_list if not name: pass else: imports_list.append(name) return True def ret_list_of_imports(): global imports_list nimps = idaapi.get_import_module_qty() for i in xrange(0,nimps): name = idaapi.get_import_module_name(i) if not name: pass idaapi.enum_import_names(i, imp_cb) return imports_list def graph_down(ea, graph = {}, path = set([])): # This function was borrowed from Carlos G. Prado. Check out his Milf-Plugin for IDA on Google Code. graph[ea] = list() # Create a new entry on the graph dictionary {node: [child1, child2, ...], ...} path.add(ea) # This is a set, therefore the add() method # Iterate through all function instructions and take only call instructions for x in [x for x in FuncItems(ea) if is_call_insn(x)]: # Take the call elements for xref in XrefsFrom(x, XREF_FAR): if not xref.iscode: continue if xref.to not in path: # Eliminates recursions graph[ea].append(xref.to) graph_down(xref.to, graph, path) return path def main(): # Get function name as input. func_name = LocByName(AskStr("sub_0xxxxx", "Enter Function Name")) if func_name == 0xffffffff: Warning("[ERROR] Bad Function Name [ERROR]") return tag = AskStr("string", "Function Tag") if tag == None: Warning("[ERROR] Tag cannot be None [ERROR]") return list_imports = ret_list_of_imports() # graph down needs the address of the function passed. nodes_xref_down = graph_down(func_name,graph = {}, path = set([])) # graph_down returns the int address needs to be converted tmp = [] tmp1 = '' for func in nodes_xref_down: tmp1 = GetFunctionName(func) if tmp1 != '': tmp.append(tmp1) nodes_xref_down = tmp # Remove the APIs from the xref list for xx in set(list_imports).intersection(set(nodes_xref_down)): nodes_xref_down.remove(xx) for rename in nodes_xref_down: func_addr = LocByName(rename) if tag not in rename: MakeNameEx(func_addr, str(tag) + str('_') + rename, SN_NOWARN)
=============================================================================
sub_blocks.py - subroutine-blocks finder - Download
if __name__ == "__main__": main()
## This script will find subroutine-blocks using IDA ## See the following link for more info ## Created by alexander.hanel@gmail.com from idaapi import * import idautils import idc import operator import sys imports_list = [] sys.setrecursionlimit(2000) # Get every function name. Returns the function names in a list. This is basically # the output from the "Function name" Tab/Window def get_func_names(): func_name_list = [] for x in idautils.Functions(): func_name_list.append(GetFunctionName(x)) return func_name_list # The following two functions are used to get the import API names. # ret_list_of_imports() returns the api names in a list def imp_cb(ea, name, ord): global imports_list if not name: pass else: imports_list.append(name) return True # 2nd function as described above def ret_list_of_imports(): global imports_list nimps = idaapi.get_import_module_qty() for i in xrange(0,nimps): name = idaapi.get_import_module_name(i) if not name: pass idaapi.enum_import_names(i, imp_cb) return imports_list # Returns a set of of calls that are found xrefed from def graph_down(ea, graph = {}, path = set([])): # This function was borrowed from Carlos G. Prado. Check out his Milf-Plugin for IDA on Google Code. graph[ea] = list() # Create a new entry on the graph dictionary {node: [child1, child2, ...], ...} path.add(ea) # This is a set, therefore the add() method # Iterate through all function instructions and take only call instructions for x in [x for x in FuncItems(ea) if is_call_insn(x)]: # Take the call elements for xref in XrefsFrom(x, XREF_FAR): if not xref.iscode: continue if xref.to not in path: # Eliminates recursions graph[ea].append(xref.to) graph_down(xref.to, graph, path) return path def get_nodes(func_name, list_imports): threshold = set([]) # Create list of each node in the xref from nodes_xref_down = graph_down(LocByName(func_name),graph = {}, path = set([])) # graph_down returns the int address needs to be converted tmp = [] tmp1 = '' for func in nodes_xref_down: tmp1 = GetFunctionName(func) if tmp1 != '': tmp.append(tmp1) nodes_xref_down = tmp # Do not want the parent function xrefs to, just the child functions nodes_xref_down.remove(func_name) # Remove the APIs from the xref list for xx in set(list_imports).intersection(set(nodes_xref_down)): nodes_xref_down.remove(xx) for nodes in nodes_xref_down: # Get all code xrefs to ref_addr = CodeRefsTo(LocByName(nodes),0) # For each code xrefs to is not in nodes for func_ref in ref_addr: # if func is not in all nodes if GetFunctionName(func_ref) not in nodes_xref_down and GetFunctionName(func_ref) != func_name: threshold.add(GetFunctionName(func_ref)) if len(threshold) == 3: return ret = [] if len(threshold) < 3 and len(threshold) != 0: ret.append(func_name) ret.append(len(nodes_xref_down)) for each in threshold: if each == '': ret.append('Unknown Function Address') else: ret.append(each) return ret def sortByColumn(bigList, *args): bigList.sort(key=operator.itemgetter(*args)) # Uncomment and comment above if sort from high to low, rather high to low #bigList.sort(key=operator.itemgetter(*args), reverse = True) # sorts the list in place def main(): data_o = [] k = '' # List of all import APIs list_imports = ret_list_of_imports() # List of all function names list_func_names = get_func_names() # Some API names will be labeled functions by IDA. The below for loop # deletes all intersections that reside in both lists. for intersection_function_name in set(list_imports).intersection(set(list_func_names)): list_func_names.remove(intersection_function_name) # For each func_name in set for func_name in list_func_names: k = get_nodes(func_name,list_imports) if k != None and k != []: data_o.append(k) # Sort by number of child nodes sortByColumn(data_o,1) for results in data_o: print 'Subroutine-Block: %s' % results[0] print '\tChild Nodes: %s' % results[1] if len(results) >= 3: print '\tThreshold Function: %s' % results[2] if len(results) == 4: print '\tThreshold Function: %s' % results[3] print 'Completed' if __name__ == "__main__": main()
No comments:
Post a Comment