#include <stdio.h>
#include <windows.h>

/* MessageBoxA returns an int, so the function pointer type does too. */
typedef int (WINAPI *PGNSI)(HWND, LPCSTR, LPCSTR, UINT);

int main(void)
{
    PGNSI pGNSI;
    HMODULE hdll = LoadLibrary("User32.dll");

    if (NULL != hdll) {
        /* %p is the correct conversion for a handle/pointer value. */
        printf("User32 loaded at %p\n", (void *)hdll);
        pGNSI = (PGNSI)GetProcAddress(hdll, "MessageBoxA");
        if (NULL != pGNSI)
            pGNSI(NULL, "MessageBoxA Was Called", "Yep", 0);
    }
    return 0;
}
Calling Locally Allocated Variables
Calling Global Variables
APIs resolved through run-time dynamic linking can typically be recovered in one of three ways. The first method covers global variables. We first locate an instruction where the dword that is being called is the destination (mov dword_00XXXX, eax). From there we trace the source register (eax) back to where it was populated. If the register we are interested in is itself the destination of an earlier move, we start tracing that move's source register instead. This continues until our source register is eax and the previous call is GetProcAddress. Once we have found GetProcAddress we read its second argument (lpProcName), and we can then label or comment the call with the name of the API. An example of this can be seen in the Calling Global Variables image above.
The second method is similar to the first, except that the call goes through a local variable rather than a dword. An example of this can be seen in the Calling Locally Allocated Variables image from the MessageBoxA example.
The third method is when LoadLibrary and GetProcAddress are handled within a function and the address is passed or returned back to the caller. This method does not follow a fixed pattern. Each programmer can decide to implement it in their own way: the function could return the address, save the address in a buffer passed as an argument, and so on. This method gets more complicated because we have to reverse the function and understand how the address is being returned or saved off.
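The backward trace used by the first method can be sketched outside IDA on a toy listing. This is a minimal sketch and not the code reiat.py uses: the (mnemonic, destination, source) tuples, the contents of LISTING, and the function name find_getprocaddress are all made up for illustration.

```python
# Toy listing: (mnemonic, destination, source). All values are hypothetical.
LISTING = [
    ("push", "offset aMessageBoxA", None),  # 0: lpProcName
    ("push", "esi", None),                  # 1: hModule
    ("call", "GetProcAddress", None),       # 2
    ("mov", "ebx", "eax"),                  # 3
    ("mov", "dword_403000", "ebx"),         # 4: global that is later called
]


def find_getprocaddress(listing, store_index):
    """Walk backward from a `mov dword, reg` and return the index of the
    GetProcAddress call that produced the stored value, or None."""
    tracked = listing[store_index][2]       # source register of the store
    for i in range(store_index - 1, -1, -1):
        mnem, dst, src = listing[i]
        if mnem == "mov" and dst == tracked:
            tracked = src                   # tracked register was a destination
        elif mnem == "call" and tracked == "eax":
            # eax holds the return value of the nearest preceding call
            return i if dst == "GetProcAddress" else None
    return None
```

Here find_getprocaddress(LISTING, 4) returns 2. The second push before that call (index 0) is lpProcName, which supplies the name for dword_403000.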
With regard to these methods, we only care about two things: the second argument of GetProcAddress, and where the value returned by GetProcAddress is called. Since the returned address is not being calculated or manipulated (xor, mul, add, etc.), all we need to do is track where the data is moved. The process of tracking how data is used is called data-flow analysis, and it is a fundamental concept in compilers. Note: I learned about this concept last week. I found it by asking myself that great question of "What would a person smarter than myself call this?". Wikipedia for the win. Data-flow analysis is an extremely powerful tool in static analysis. We can use it to estimate or infer how data would be used at run time and to trace where data originated. Okay, time for some examples and code.
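As a rough illustration of tracking moves forward from the GetProcAddress return value, here is a minimal sketch that runs without IDA. It assumes a toy instruction format of (mnemonic, destination, source) tuples; AFTER_CALL and propagate are hypothetical names. It also keeps a set of every location that holds the pointer, which is a simplification chosen for this example rather than the single tracked variable reiat.py follows.

```python
# Instructions that follow `call GetProcAddress` in a made-up function.
AFTER_CALL = [
    ("mov", "[ebp+var_4]", "eax"),   # 0: save the returned pointer in a local
    ("mov", "eax", "0"),             # 1: eax is overwritten
    ("push", "0", None),             # 2
    ("call", "[ebp+var_4]", None),   # 3: indirect call through the local
    ("call", "eax", None),           # 4: unrelated call
]


def propagate(listing, api_name):
    """Return {index: api_name} for each call made through a location
    that still holds the pointer returned by GetProcAddress."""
    holders = {"eax"}                # the return value starts out in eax
    notes = {}
    for i, (mnem, dst, src) in enumerate(listing):
        if mnem == "mov":
            if src in holders:
                holders.add(dst)     # copy: dst now holds the pointer too
            else:
                holders.discard(dst) # overwrite: dst no longer holds it
        elif mnem == "call":
            if dst in holders:
                notes[i] = api_name  # this call can be commented
            holders.discard("eax")   # a call replaces eax with its own return
    return notes
```

With this listing, propagate(AFTER_CALL, "MessageBoxA") returns {3: "MessageBoxA"}: only the call through the local variable is annotated, because eax was overwritten before the last call.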
reiat.py is a script that uses data-flow analysis to rename or add comments to calls to APIs that were loaded via run-time dynamic linking. If we run reiat.py on the MessageBoxA example, we see the following comment added where the function is called.
MessageBoxA Comment Added by reiat.py
dwords renamed to the APIs
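Time for an example of where the script fails. When the script fails, the output contains an ERROR message with the address of the GetProcAddress call and, when the problem is a missing argument or string, a mention of lpProcName. This should help with tracking the problem down. Here are two examples.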
Odds are I will continue working on data-flow analysis in IDA. If you know of any good papers or code, or have comments, please send me an email (address in the source code below), ping me on Twitter, or leave a comment. I'm still working on parts of the code, but I'm hopeful it will be useful to others.
BitBucket Repository - LINK
Please download code from the repo link. New versions will not be updated below.
'''
Name:
    reiat.py
Version:
    0.2
Description:
    Renames and adds comments to APIs that are called via run-time dynamic
    linking in IDA. To execute the script just call it in IDA.
Author:
    alexander<dot>hanel<at>gmail<dot>com
License:
    reiat.py is free software: you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation, either version 3 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program. If not, see <http://www.gnu.org/licenses/>.
'''
from idaapi import *
import idautils
import idc


class getProcAddresser():
    def __init__(self):
        self.getProcAddressRefs = []
        self.registers = ['eax', 'ebx', 'ecx', 'edx', 'esi', 'edi', 'esp', 'ebp']

    def getRefs(self):
        'get all addresses of GetProcAddress'
        for addr in CodeRefsTo(LocByName("GetProcAddress"), 0):
            self.getProcAddressRefs.append(addr)

    def getlpProcName(self, GetProcAddress):
        'returns the address of the 2nd argument to GetProcAddress'
        pushcount = 0
        argPlacement = 2
        instructionMax = 10 + argPlacement
        currAddress = PrevHead(GetProcAddress, minea=0)
        while pushcount <= argPlacement and instructionMax != 0:
            if 'push' in GetDisasm(currAddress):
                pushcount += 1
                if pushcount == argPlacement:
                    return currAddress
            # the push that feeds GetModuleHandle is not an argument of
            # GetProcAddress, so do not count it
            if 'GetModuleHandle' in GetDisasm(currAddress):
                pushcount -= 1
            instructionMax -= 1
            currAddress = PrevHead(currAddress, minea=0)
        return None

    def getString(self, address):
        'reads the string value that is the second push'
        # note it will be useful to include back tracing code for variable reference.
        # operand 0 of the push holds the string address; read it as a C string
        api = GetString(GetOperandValue(address, 0), -1, ASCSTR_C)
        if api == None:
            return None
        else:
            return api

    def traceBack(self, address):
        funcStart = GetFunctionAttr(address, FUNCATTR_START)
        var = GetOpnd(address, 0)
        # return if digit is being pushed, likely an error on parsing
        if var.isdigit():
            return None
        # return, value is not being passed as a register. already checked
        # if offset in calling function
        if var not in self.registers:
            return None
        # get previous address
        currentAddress = PrevHead(address)
        # get dism
        dism = GetDisasm(currentAddress)
        # walk backward until the start of the function
        # Example:
        #   mov     ebp, offset aInternetconnec ; "InternetConnectA"
        #   push    ebp
        while(currentAddress >= funcStart):
            # var = 'ebp',
            if var in dism:
                # if operand == ebp, our tracked var is the destination
                if GetOpnd(currentAddress, 0) == var:
                    mnem = GetMnem(currentAddress)
                    # if our tracked var is having something moved into it
                    if 'mov' in mnem or 'lea' in mnem:
                        # 4 scenarios on mov: string, digit, register, unknown
                        # 1. Check if source is a string
                        # read operand 1 value, get address of "offset aInternetconnec"
                        value = GetOperandValue(currentAddress, 1)
                        if value != None:
                            api = GetString(value, -1)
                            if api != None:
                                return api
                        # 2. Check if register
                        var = GetOpnd(currentAddress, 1)
                        # 3. Check if digit
                        if var.isdigit() == True:
                            return None
                        # 4. Unknown
                        if var == None:
                            return None
            currentAddress = PrevHead(currentAddress)
            dism = GetDisasm(currentAddress)
        return None

    def traceForwardRename(self, address, apiString):
        'address is call GetProcAddress, apiString is the API name'
        currentAddress = NextHead(address)
        funcEnd = GetFunctionAttr(address, FUNCATTR_END)
        var = 'eax'
        lastref = ''
        lastrefAddress = None
        while currentAddress < funcEnd:
            dism = GetDisasm(currentAddress)
            # if we are not referencing the return from GetProcAddress
            # continue to next instruction
            if var not in dism:
                currentAddress = NextHead(currentAddress)
                continue
            # mov     dword_1000F224, eax
            # call    esi ; GetProcAddress
            # push    offset aHttpaddreque_0 ; "HttpAddRequestHeadersW"
            # push    dword_1000FD08 ; hModule
            # mov     dword_1000F228, eax
            # if we have the above instructions after GetProcAddress the code
            # is saving off the address of HttpAddRequestHeadersW.
            if GetMnem(currentAddress) == 'mov' and GetOpnd(currentAddress, 1) == var and GetOpType(currentAddress, 0) == 2:
                # rename dword address
                status = True
                status = MakeNameEx(GetOperandValue(currentAddress, 0), apiString, SN_NOWARN)
                if status == False:
                    # some api names are already in use. Will need to be renamed to something generic.
                    # IDA will typically add a number to the function or api name. GetProcAddress_0
                    status = MakeNameEx(GetOperandValue(currentAddress, 0), str("__" + apiString), SN_NOWARN)
                    if status == False:
                        return None
                return currentAddress
            # tracked data is being moved into another destination
            if GetMnem(currentAddress) == 'mov' and GetOpnd(currentAddress, 1) == var:
                lastref = var
                lastrefAddress = currentAddress
                var = GetOpnd(currentAddress, 0)
            # add comments for call var
            # example:
            #   call    ds:GetProcAddress
            #   ...
            #   call    eax
            if GetMnem(currentAddress) == 'call' and GetOpnd(currentAddress, 0) == var:
                cmt = GetFunctionCmt(currentAddress, 1)
                if apiString not in cmt:
                    cmt = cmt + ' ' + apiString
                    MakeComm(currentAddress, cmt)
                    return currentAddress
            # eax is usually overwritten by the return value
            if GetMnem(currentAddress) == 'call' and var == 'eax':
                return None
            currentAddress = NextHead(currentAddress)
        return None

    def rename(self):
        self.getRefs()
        for addr in self.getProcAddressRefs:
            lpProcNameAddr = self.getlpProcName(addr)
            if lpProcNameAddr == None:
                print "ERROR: Address of lpProcName at %s was not found" % hex(addr)
                continue
            lpProcName = self.getString(lpProcNameAddr)
            if lpProcName == None:
                lpProcName = self.traceBack(lpProcNameAddr)
            if lpProcName == None:
                print "ERROR: String of lpProcName at %s was not found" % hex(addr)
                continue
            status = self.traceForwardRename(addr, lpProcName)
            if status == None:
                print "ERROR: Could not rename address at %s " % hex(addr)
                continue
            else:
                print "RENAMED %s at %s" % (lpProcName, hex(status))


if __name__ == "__main__":
    ok = getProcAddresser()
    ok.rename()
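The push counting in getlpProcName can be tried outside IDA. The following is a minimal sketch that mirrors that logic on a list of disassembly strings instead of IDA addresses. The DISASM listing and the name find_lpprocname_push are hypothetical, and the sketch assumes the GetModuleHandle call takes exactly one pushed argument.

```python
# Made-up listing in which hModule comes straight from GetModuleHandleA.
DISASM = [
    'push offset aInternetconnec ; "InternetConnectA"',  # 0: lpProcName
    'push offset aWininet ; "wininet.dll"',              # 1: GetModuleHandleA arg
    'call ds:GetModuleHandleA',                          # 2
    'push eax ; hModule',                                # 3
    'call ds:GetProcAddress',                            # 4
]


def find_lpprocname_push(disasm, call_index, max_instructions=12):
    """Walk backward from the GetProcAddress call and return the index of
    the push that supplies its second argument, or None if not found."""
    pushes = 0
    i = call_index - 1
    while i >= 0 and max_instructions > 0:
        line = disasm[i]
        if 'push' in line:
            pushes += 1
            if pushes == 2:
                return i
        if 'GetModuleHandle' in line:
            # The next push we meet belongs to GetModuleHandle, so cancel
            # it out in advance.
            pushes -= 1
        max_instructions -= 1
        i -= 1
    return None
```

For this listing find_lpprocname_push(DISASM, 4) returns 0. Without the GetModuleHandle adjustment the walk would stop at index 1 and read "wininet.dll" as the API name.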
Thank you for this useful script.
Feature request: Oftentimes, some DLLs stash away pointers using EncodePointer() and later (at time of use) call DecodePointer() to get the real pointer. It may be possible to enhance your script to "see through" this encode-decode step and annotate the final decoded pointer with the real function name.
Nicely done, very useful, thanks for the script ;)