tsa4ida.py - Rule Based Function Profiler for IDA

Reverse engineering malware is typically a repetitive task that requires a lot of background knowledge. A large part of that knowledge consists of understanding assembly and APIs. An area that I would like to master is having a complete understanding of how assembly can be converted back to its original language. Sadly, I'm nowhere near that point. Most experienced reverse engineers can glance at a function's assembly, APIs, arguments and flow path and infer what it does in a matter of seconds, especially for functionality that is common in malware. This is not even the hard part. The hard part comes when we have a large function that has no APIs and is just raw assembly. These functions could be crypto algorithms, compression algorithms or other painful libraries. But then again, these functions are where the learning, analysis and fun happen. Once the complicated code is mastered, all the other code is just a walk in the park.

In order to understand the complicated code we need more time for analysis. To free up more time we might as well automate the knowledge that we already have. The following is a proof of concept to help with documenting that knowledge for IDAScope. Dan and I have been talking about this for months. Even users are starting to call us out on it. The lag on implementation is my fault. Luckily Hurricane Sandy gave me some free time for coding by taking me away from my comfortable and easily distracting apartment.

What is tsa4ida.py? The script is a rule-based function profiler. It uses the Python library ConfigParser to read user-defined rules from a configuration file and then scans each function for those rules. If a function matches a rule, the user has the option to rename the function or add a function comment. A rule can contain two types of values: plain strings and simple regular expressions.

[Cache]
value1 = FindFirstUrlCacheEntryA
value2 = FindNextUrlCacheEntryA
value3 = DeleteUrlCacheEntry

[inject]
value1 = OpenProcess
value2 = WriteProcessMemory
value3 = CreateRemoteThread

[Imported Function Call]
regex1 = call\s*(eax|ebx|ecx|edx|esi|edi)
value = GetProcAddress

The first line of a rule contains its name between brackets, "[ RULE ]". Note that if the rule is used to rename a function in IDA, the name must not contain blank spaces or non-standard characters. Each following line contains a variable that holds a search string. Each variable name within a rule must be unique, and quotes are not needed. Every value in the rule is searched for in the instructions of a function. In the example above, if a function contains the strings "FindFirstUrlCacheEntryA", "FindNextUrlCacheEntryA" and "DeleteUrlCacheEntry", it will be labeled or renamed "Cache". The same syntax applies to the "inject" rule; a new pair of brackets starts a new rule. The third rule, "Imported Function Call", contains a simple regular expression that searches for a dynamic call instruction such as "call eax", along with the string "GetProcAddress". If both the regex and the string are found, the function will be renamed or labeled "Imported Function Call". The string "regex" must appear in the name of the variable to mark its value as a regex. For now it is recommended to keep rules simple. My original intention was to use Yara for the rule parsing and scanning, but I was unable to get Yara to import from IDAPython.
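The matching logic described above can be tried outside IDA. The following is a minimal sketch, not the script itself: it assumes Python 3's `configparser` (the script uses the Python 2 `ConfigParser` module), and the names `load_rules`, `matching_rules`, `SAMPLE_RULES` and `FAKE_FUNCTION` are made up for illustration. The disassembly text is hand-written rather than taken from a real sample.

```python
import configparser
import re

# Two of the rules from this post, kept in a raw string so "\s" stays literal.
SAMPLE_RULES = r"""
[Cache]
value1 = FindFirstUrlCacheEntryA
value2 = FindNextUrlCacheEntryA
value3 = DeleteUrlCacheEntry

[Imported Function Call]
regex1 = call\s*(eax|ebx|ecx|edx|esi|edi)
value = GetProcAddress
"""


def load_rules(text):
    """Split every section into (plain strings, regex patterns)."""
    # interpolation=None keeps any '%' in a value literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    rules = {}
    for section in parser.sections():
        strings, patterns = [], []
        for key, value in parser.items(section):
            # A key with "regex" in its name marks the value as a regex.
            if "regex" in key:
                patterns.append(value)
            else:
                strings.append(value)
        rules[section] = (strings, patterns)
    return rules


def matching_rules(disassembly, rules):
    """Return the names of rules whose every string and regex is found."""
    hits = []
    for name, (strings, patterns) in rules.items():
        strings_ok = all(s in disassembly for s in strings)
        regexes_ok = all(re.search(p, disassembly, re.S) for p in patterns)
        if strings_ok and regexes_ok:
            hits.append(name)
    return hits


# Hypothetical disassembly of one function, one instruction per line.
FAKE_FUNCTION = "push offset aSleep\ncall ds:GetProcAddress\ncall eax\n"
print(matching_rules(FAKE_FUNCTION, load_rules(SAMPLE_RULES)))
```

With the hand-written function above, only "Imported Function Call" matches: the buffer contains both "GetProcAddress" and a "call eax", while none of the cache APIs are present.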

In order to call tsa4ida, the rule file (sigs.ini by default) needs to be located in the same directory as the script. Then we call the script using IDA.

Once the script runs we will see any hits on our rules.

The code and sample rules for tsa4ida.py can be found on BitBucket. I will be making rules specifically for banking malware over the next couple of weeks. Please make sure to check out the repo every now and then. If you would like to add rules, please email me (the address is in the source code). Please leave any thoughts or suggestions in the comments, or feel free to email me or contact me on Twitter.

## tsa4ida.py - rule based function profiler
## Created by alexanderhanelgmailcom
## Version 1.0 - Thanks to PNX, Kernel Sanders and CB. 
## To do 
## Use Yara to replace ConfigParser
##     [status] - Yara can not successfully be imported via IDAPython

import ConfigParser
import idautils
import idc
import os
import re

class Profiler():    
    def __init__(self, config_filename=None):
        self.config_filename = "sigs.ini"
        if config_filename:
            self.config_filename = config_filename
        self.script_file_path = \
            os.path.realpath(__file__)[:os.path.realpath(__file__).rfind(os.sep) + 1]
        self.error = False
        self.function_eas = []
        self.parser = ConfigParser.SafeConfigParser()
        self.comment = False
        self.rename = False

    def getFunctions(self):
        'get a list of function addresses'
        for func in idautils.Functions():
            # Ignore Library Code
            flags = idc.GetFunctionFlags(func)
            if flags & idc.FUNC_LIB:
                continue
            self.function_eas.append(func)

    def getInstructions(self, function):
        'get all instruction in a function'
        buff = ''
        for x in idautils.FuncItems(function):
            buff = buff + idc.GetDisasm(x) + '\n'
        return buff

    def addToFunction(self, address, comment):
        'add comment to function or rename function'
        if self.rename == True:
            if comment not in idc.GetFunctionName(address):
                idc.MakeNameEx(address, str(comment) + str('_') + idc.GetFunctionName(address), idc.SN_NOWARN)
        if self.comment == True:
            curCmt = idc.GetFunctionCmt(address,1)
            if comment not in curCmt:
                comment = comment + ' ' + curCmt
                idc.SetFunctionCmt(address, comment, 1)

    def parseConfig(self):
        'parse the config file'
        config_path = os.path.join(self.script_file_path, self.config_filename)
        # the rule file must sit next to the script
        if not os.path.isfile(config_path):
            print 'Error: Could not find %s' % self.config_filename
            self.error = True
            return
        try:
            self.parser.read(config_path)
        except ConfigParser.ParsingError, err:
            print 'Error: Could not parse %s' % err
            self.error = True

    def getRuleNames(self):
        'gets name of all the rules in the config'
        rules = []
        for rule in self.parser.sections():
            rules.append(rule)
        return rules

    def checkValues(self, buffer, section_name):
        'run rules against instruction buffer'
        values = []
        regexs = []
        # Get values from the rules; "regex" in the name marks a regex
        for x, value in self.parser.items(section_name):
            if 'regex' in x:
                regexs.append(value)
            else:
                values.append(value)
        # every string must be present in the instruction buffer
        for item in values:
            if item not in buffer:
                return False
        # We can return because there are no regexs
        if len(regexs) == 0:
            return True
        # every regex must match somewhere in the instruction buffer
        for item in regexs:
            try:
                regex = re.compile(item, re.S)
            except re.error:
                print "Error: Invalid Regular Expression Pattern"
                return False
            if re.search(regex, buffer) is None:
                return False
        return True

    def run(self):
        # do nothing if the config could not be loaded
        if self.error is True:
            return
        print '_Status: Started'
        # loop through each function
        for function_addr in self.function_eas:
            instBuffer = self.getInstructions(function_addr)
            # loop through each rule
            for section_name in self.parser.sections():
                status = self.checkValues(instBuffer, section_name)
                if status == True:
                    self.addToFunction(function_addr, section_name)
                    print "Rule:", section_name, "found at", hex(function_addr)
        print '_Status: Completed'

if __name__ == '__main__':
    profiler = Profiler()
    profiler.comment = True
    profiler.rename = False
    profiler.getFunctions()
    profiler.parseConfig()
    profiler.run()


[Imported Function Call]
regex1 = call\s*(eax|ebx|ecx|edx|esi|edi)
value = GetProcAddress

[Cache]
value1 = FindFirstUrlCacheEntryA
value2 = FindNextUrlCacheEntryA
value3 = DeleteUrlCacheEntry

[inject]
value1 = OpenProcess
value2 = WriteProcessMemory
value3 = CreateRemoteThread

[Adjust Privileges]
value1 = SeShutdownPrivilege
value2 = LookupPrivilegeValue
value3 = AdjustTokenPrivileges

[Windows File Protection Related]
value1 = sfc_os.dll
value2 = LoadLibrary

[Restart Machine]
value1 = SeShutdownPrivilege
value2 = LookupPrivilegeValue
value3 = AdjustTokenPrivileges
value4 = ExitWindows

[Enumerate Processes]
value1 = CreateToolhelp32Snapshot
value2 = Process32First

[Firefox Hook APIs]
value1 = nspr4.dll
value2 = PR_Write
value3 = PR_Read
value4 = PR_Close

[Get Firefox APIs]
value1 = PR_OpenTCPSocket
value2 = PR_Close
value3 = PR_Read
value4 = PR_Write
value5 = GetProcAddress

[Search for File]
value1 = FindFirstFileA
value2 = FindClose

[Check if installed]
value1 = 1F0001h
value2 = OpenMutexA
value3 = ExitProcess

[Kill Machine]
value1 = \\\\.\\PHYSICALDRIVE0
value2 = CreateFile
value3 = WriteFile

[Delete Restore Point API]
value1 = SrClient.dll
value2 = SRRemoveRestorePoint    

[Disable Restore Point Registry]
value = DisableSR
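
A rules file like the one above can be sanity-checked outside IDA before it is used. The sketch below is an illustration and not part of tsa4ida.py: it assumes Python 3's `configparser`, whose default strict mode rejects a variable name that is repeated inside one rule, and the helper names `lint_rules` and `ida_safe_name` are hypothetical. The character set kept by `ida_safe_name` is a deliberately conservative guess at what is safe in a function name, not IDA's exact naming rules.

```python
import configparser
import re


def lint_rules(text):
    """Return a list of human-readable problems found in a rules file."""
    parser = configparser.ConfigParser(interpolation=None)
    try:
        # Strict parsing fails on duplicate keys, duplicate sections
        # and values that appear before any [section] header.
        parser.read_string(text)
    except configparser.Error as err:
        return ["parse error: %s" % err]
    problems = []
    for section in parser.sections():
        items = parser.items(section)
        if not items:
            problems.append("[%s] has no values" % section)
        for key, value in items:
            # Only values whose key contains "regex" are compiled.
            if "regex" in key:
                try:
                    re.compile(value)
                except re.error as err:
                    problems.append("[%s] %s: bad regex (%s)" % (section, key, err))
    return problems


def ida_safe_name(rule_name):
    """Replace anything outside A-Z, a-z, 0-9 and '_' with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", rule_name)
```

For example, `ida_safe_name("Imported Function Call")` gives `Imported_Function_Call`, which could then be used as the prefix when renaming is enabled.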
