Hello,
I forgot to post a recent IDAPython plugin that I created for viewing Microsoft SDK documentation in IDA. Here is an example screenshot of msdocsviewer.
The repository for the plugin can be found here.
Function Trapper Keeper is an IDA plugin for writing and storing function notes in IDBs. It’s a middle ground between function comments and IDA’s Notepad, and it’s a tool that I have wanted for a while. To understand why I wanted Function Trapper Keeper, it might be worth describing my process of reverse engineering a binary in IDA.
Upon opening a binary, I always take note of the code-to-data ratio. This can be inferred by looking at the navigation band in IDA. If there is more data than code in the binary, it can hint that the binary is packed or encrypted. If so, I typically stop the triage of the binary and start searching for cross-references to the data. In many instances the cross-references lead to code used for decompressing or decrypting the data. For example, if the binary is a loader, it would contain the second-stage payload encrypted or obfuscated in some other way. By cross-referencing the data and finding the decryption routine of the loader, I can quickly pivot to extracting the payload. Another notable pattern is when the data and code are not laid out consistently. If the band changes from data to code and back, it is likely that IDA’s analysis found inconsistencies in the disassembled functions. This could be from anti-disassembly tricks, flawed memory dumps or something else that needs attention.
After the ratios, I look at the strings. I look for the presence of compiler strings, strings related to DLLs and APIs, and user-defined strings or the lack of them. If user-defined strings are missing, I’ll start searching for encrypted strings and then cross-reference their usage. This can help find the function responsible for string decryption. If I can’t find the string decryption routine, I’ll use some automation to find all references to XOR instructions.
After reviewing strings, I’ll do a quick triage of the imported functions. I like to look for sets of APIs that I know are related to certain functionality. For example, if I see calls to VirtualAlloc, VirtualProtect and CreateRemoteThread, I can infer that process injection is potentially present in the binary.
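The import triage described above can be sketched in a few lines of Python. This is only an illustration: the CAPABILITIES table and the triage_imports helper are hypothetical names, the "networking" group is an assumption added for the example, and the import names are expected to come from whatever tool you already use to list them.

```python
# Hypothetical capability map. Only the process injection set comes from the
# text above; the networking set is an illustrative assumption.
CAPABILITIES = {
    "process injection": {"VirtualAlloc", "VirtualProtect", "CreateRemoteThread"},
    "networking": {"InternetOpenA", "InternetOpenUrlA", "InternetReadFile"},
}


def triage_imports(imports, min_hits=2):
    """Return {capability: sorted matching APIs} for sets with >= min_hits matches."""
    seen = set(imports)
    hits = {}
    for capability, apis in CAPABILITIES.items():
        matched = sorted(apis & seen)
        if len(matched) >= min_hits:
            hits[capability] = matched
    return hits


if __name__ == "__main__":
    # Import names as they might be listed by a PE parser or by IDA.
    print(triage_imports(["VirtualAlloc", "CreateRemoteThread", "Sleep"]))
```

A match only hints at functionality. The APIs still need to be cross-referenced to confirm how they are used.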
After the previously described triage, I have a high-level overview of the binary and usually know if I should do a deep dive of the binary or if I need to focus on certain functionality (encrypted strings, unpacking, etc.). If I’m doing a deep dive, I like to label all functions. For my IDBs, the name of the function hints at my level of understanding of the function. The more descriptive the function name, the more I know about it. If I know the function does process injection into explorer.exe, I might name it “inject_VirtRemoteThreadExplorer”. If I don’t care about the function but I need to note that it’s related to strings and memory allocation, I might label it “str_mem”. If I’m super lazy I might name the function “str_mem_??”, and yes, you can use “?” in IDA’s function names. This is a reminder that I should probably double check the function if it’s used a lot. Once I have all the functions labeled, I can be confident of the general functionality of the binary. This is when I start digging deeper into the functions.
This can vary, but with lots of malware families a handful of functions contain the majority of the notable functionality. This is commonly where I spend most of my time reversing. I have said it before in a previous post: if you aren’t writing, then you aren’t reversing. Since I spend lots of time in these functions, I like to have my notes close by. Notes can be added as function comments, but the text disappears once you scroll down the function, the text can’t be formatted and the comments can’t be easily exported. IDA’s Notepad suffers from the same issues (minus the export). Having all the function notes in a single pane and being able to export them to Markdown is super helpful. My favorite feature of the plugin is that when I scroll from function to function, the text refreshes for each function. The plugin can be seen on the right of the following image.
Having a description accessible minimizes the amount of time I have to read code I already reversed, which is useful when opening up old IDBs. I hope others find it as useful as I do.
Here is a link to the repo.
For more information on the Navigation band in IDA check out Igor’s post.
Please leave a comment, ping me on Twitter or Mastodon, or create an issue on GitHub.
A common question when first reverse engineering ransomware is “what is a good resource for learning cryptography?”. Having an understanding of cryptography is essential when reversing ransomware. Most reverse engineers need to know how to identify the encryption algorithm, be able to follow the key generation, understand key storage and ensure the encryption implementation isn’t flawed. To accomplish these items it is essential to have a good foundational knowledge of cryptography. The following are some recommendations that I have found beneficial on my path to learning cryptography.
One of the most important skills is having an understanding of how common encryption algorithms work. The best introductory book on cryptography is Understanding Cryptography: A Textbook for Students and Practitioners. It was written in a way that “teaches modern applied cryptography to readers with a technical background but without an education in pure mathematics” (source). The book also covers the modern crypto schemes in common use. One of the best parts about the book is that each chapter has a matching lecture on YouTube taught by the authors. This format is useful because it reinforces the concepts and adds more detail on some of the more difficult topics.
gopep (Go Lang Portable Executable Parser) is a project I have been working on for learning about Windows Portable Executables (PE) compiled in Go. As most malware analysts have noticed, there has been an uptick in malware (particularly ransomware) compiled in Go. At first glance, reverse engineering Go PE files can be intimidating. The files are commonly over 3MB in size, contain thousands of functions and have a unique calling convention that can return multiple values. The first time I opened up an executable in IDA, I was lucky because the plugin IDAGolangHelper was able to identify everything. The second time, I wasn't so lucky. This motivated me to port IDAGolangHelper to IDA 7.5 and Python 3, convert the GUI to PyQt and include some code that parsed the Go source code and added the Go function comments to the IDB. After everything was done, my code didn't fix up the IDB. This led me to write gopep. In IDAGolangHelper's defense, the issue was that the hard-coded bytes used to identify the Go version had not been updated for a couple of years. I should have checked this first or checked one of the multiple pull requests.
gopep is a Python script that can parse Go-compiled PE files without using Go. The script only relies on pefile. There are similar scripts that are excellent for ELF executables, but during my analysis I noticed they threw exceptions when parsing PE files. Below are the command line options that gopep currently supports. It can also be used as a class.
C:\Users\null\Documents\repo\gopep>python gopep.py -h
usage: gopep.py [-h] [-c C_DIR] [-e E_FILE] [-x EA_DIR] [-v IN_FILE] [-m MD_FILE] [-t T_FILE] [-ev ET_FILE]

gopep Go Portable Executable Parser

optional arguments:
  -h, --help            show this help message and exit
  -c C_DIR, --cluster C_DIR
                        cluster directory of files
  -e E_FILE, --export E_FILE
                        export results of file to JSON
  -x EA_DIR, --export_all EA_DIR
                        export results of directory to JSONs
  -v IN_FILE, --version IN_FILE
                        print version
  -m MD_FILE, --module-data MD_FILE
                        print module data details
  -t T_FILE, --triage T_FILE
                        triage file, print interesting attributes
  -ev ET_FILE, --everything ET_FILE
                        print EVERYTHING!
gopep is primarily for exploring structures within PE files compiled in Go, but it also supports clustering. The clustering algorithm is similar to import hashing but uses sets of symbol names and file paths that are unique to executables compiled in Go. As with most executable clustering algorithms, it can be broken by compressing the executable. The clustering can be done by passing the -c option and a directory of files that should be clustered. I would not recommend clustering too many files using my code. You'd be better off exporting the hashes using the -x option, parsing the JSONs and then querying that way.
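To make the idea concrete, here is a minimal sketch of hashing a set of features in the spirit of import hashing. It is not gopep's actual algorithm: the cluster_hash and cluster helpers, the choice of SHA-256 and the shape of the input are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def cluster_hash(symbol_names, file_paths):
    """Hash the de-duplicated, sorted feature set so that ordering does not matter."""
    features = sorted(set(symbol_names) | set(file_paths))
    joined = ",".join(features).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


def cluster(samples):
    """samples: {file_name: (symbol_names, file_paths)} -> {hash: [file_name, ...]}"""
    groups = defaultdict(list)
    for name, (symbols, paths) in sorted(samples.items()):
        groups[cluster_hash(symbols, paths)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical feature sets for three files.
    samples = {
        "a.exe": (["main.main", "main.init"], ["C:/src/main.go"]),
        "b.exe": (["main.init", "main.main"], ["C:/src/main.go"]),
        "c.exe": (["main.run"], ["C:/src/other.go"]),
    }
    for digest, names in cluster(samples).items():
        print(digest[:16], names)
```

Because the hash is exact, a single changed symbol or path puts a file in a different cluster, which is the same limitation import hashing has.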
The README for the project has more details on the fields parsed, my notes and a great set of references for anyone wanting to read up on what happens when Go compiles an executable.
https://github.com/alexander-hanel/gopep
For anyone deobfuscating languages interpreted by wscript.exe, I would recommend investigating API hooking. Most of the APIs that need to be hooked can be identified by using an API monitor. Hooking also allows you to control what the APIs return. This is useful if you want to recover all the URLs that a sample might want to connect to. I'll try to post some example code in the next week or two.
[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)]
Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin
def hamming_distance(bytes_a, bytes_b):
    return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))
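As a quick sanity check, Cryptopals Challenge 6 gives a known test pair for the Hamming distance: the strings "this is a test" and "wokka wokka!!!" differ by 37 bits. The Python 3 sketch below reproduces that value. Note that zip stops at the shorter input, so extra bytes in the longer one are ignored.

```python
def hamming_distance(bytes_a: bytes, bytes_b: bytes) -> int:
    """Count the bits that differ between two byte strings."""
    # Iterating over bytes in Python 3 yields integers, so XOR works directly.
    return sum(bin(a ^ b).count("1") for a, b in zip(bytes_a, bytes_b))


if __name__ == "__main__":
    print(hamming_distance(b"this is a test", b"wokka wokka!!!"))  # 37
```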
substr_counter = Counter(message[i: i+size] for i in range(len(message) - size))
sub_count = substr_counter.most_common(32)
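The reason counting substrings helps is that XORing a key against runs of null bytes, which are common in PE files, leaves the key itself in the ciphertext. The most frequent n-gram of the right size is therefore a good key candidate. Here is a small Python 3 sketch of the counting step. The common_ngrams helper is a hypothetical name, and unlike the snippet above it includes the final window (the + 1 in the range).

```python
from collections import Counter


def common_ngrams(message: bytes, size: int, top: int = 32):
    """Return the `top` most frequent `size`-byte substrings with their counts."""
    windows = (message[i:i + size] for i in range(len(message) - size + 1))
    return Counter(windows).most_common(top)


if __name__ == "__main__":
    # Null bytes XORed with the key b"KEY" leave the key repeated in the data.
    data = b"KEY" * 4 + b"abc"
    print(common_ngrams(data, 3, top=1))  # [(b'KEY', 4)]
```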
def xor_mb(message, key):
    return ''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key)))
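The function above is written for Python 2 strings. A Python 3 equivalent that works on bytes could look like the following sketch. The name xor_bytes is mine and not from the original script.

```python
from itertools import cycle


def xor_bytes(message: bytes, key: bytes) -> bytes:
    """XOR every byte of message with the key, repeating the key as needed."""
    # With an empty key, cycle() yields nothing and the result is b"".
    return bytes(m ^ k for m, k in zip(message, cycle(key)))


if __name__ == "__main__":
    encoded = xor_bytes(b"MZ\x90\x00", b"AB")
    # XOR is its own inverse, so applying the same key again restores the input.
    print(xor_bytes(encoded, b"AB") == b"MZ\x90\x00")  # True
```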
def pe_carv(data):
    '''carve out executable using pefile's trim'''
    c = 1
    for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]:
        # slice out executable
        temp_buff = data[offset:]
        try:
            pe = pefile.PE(data=temp_buff)
        except:
            continue
        return pe.trim()
    return None
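The carving above leaves all validation to pefile. If you want a cheap pre-filter before handing a buffer to pefile, the sketch below checks that each "MZ" hit has an e_lfanew field (the 4-byte value at offset 0x3C) that points at a "PE\0\0" signature. It uses only the standard library, pe_candidates is a hypothetical helper name, and it does not replace the parsing or trimming that pefile does.

```python
import re
import struct


def pe_candidates(data: bytes):
    """Yield offsets where an MZ header points at a PE signature."""
    for match in re.finditer(b"MZ", data):
        start = match.start()
        # The DOS header is 0x40 bytes; skip hits too close to the end.
        if start + 0x40 > len(data):
            continue
        # e_lfanew: little-endian offset of the PE header, stored at 0x3C.
        (e_lfanew,) = struct.unpack_from("<I", data, start + 0x3C)
        pe_offset = start + e_lfanew
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            yield start


if __name__ == "__main__":
    header = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
    print(list(pe_candidates(b"junkMZjunk" + header)))  # [10]
```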
""" Author: Alexander Hanel Name: pe_ham_brute.py Purpose: - POC that searches for n-grams and uses them as the XOR key. - Also uses hamming distance to guess key size. Check out cryptopals Challenge 6 for more details https://cryptopals.com/sets/1/challenges/6 Example: pe_ham_brute.py ba5aa03d724d17312d9b65a420f91285caff711e2f891b3699093cc990fdaae0 Hamming distances & calculated key sizes [(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)] Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin """ import base64 import string import sys import collections import pefile import re import hashlib from cStringIO import StringIO from collections import Counter from itertools import cycle from itertools import product DEBUG = True def xor_mb(message, key): return''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key))) def hamming_distance(bytes_a, bytes_b): return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b))) def key_len(message, key_size): """"returns [(dist, key_size),(dist, key_size)]""" avg = [] for k in xrange(2,key_size): hd = [] for n in xrange(len(message)/k-1): hd.append(hamming_distance(message[k*n:k*(n+1)],message[k*(n+1):k*(n*2)])/k) if hd: avg.append((sum(hd) / float(len(hd)), k)) return sorted(avg)[:10] def pe_carv(data): '''carve out executable using pefile's trim''' c = 1 for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]: # slice out executable temp_buff = data[offset:] try: pe = pefile.PE(data=temp_buff) except: continue return pe.trim() return None def write_file(data, key): m = hashlib.md5() m.update(data) name = m.hexdigest() key_name = "key-" + name + ".bin" file_name = name + ".bin" print "Length: %s, 
Key: %s File Name: %s" % (len(key),key, file_name) f = open(file_name, "wb") fk = open(key_name , "wb") f.write(data) fk.write(key) f.close() fk.close() def run(message): key_sizes = key_len(message, 64) if DEBUG: print "Hamming distances & calculated key sizes" print key_sizes for temp_sz in key_sizes: size = temp_sz[1] substr_counter = Counter(message[i: i+size] for i in range(len(message) - size)) sub_count = substr_counter.most_common(32) for temp in sub_count: key, count = temp if count == 1: break temp = xor_mb(message, key) pe_c = pe_carv(temp) if pe_c: write_file(pe_c, key) return data = open(sys.argv[1],'rb').read() run(data)
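The script targets Python 2. For readers on Python 3, the key-size estimation in key_len can be sketched as below. This is my reading of the Cryptopals approach rather than a line-for-line port: the key_sizes name is hypothetical, it uses true division, and it always compares strictly adjacent blocks, so its scores will not match the sample output in the docstring exactly.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def key_sizes(message: bytes, max_size: int, top: int = 10):
    """Return up to `top` (normalized distance, key size) pairs, best first."""
    scores = []
    for k in range(2, max_size):
        # Split the message into full k-byte blocks.
        blocks = [message[i:i + k] for i in range(0, len(message) - k + 1, k)]
        # Compare each block with its neighbor, normalized by block size.
        dists = [hamming_distance(x, y) / k for x, y in zip(blocks, blocks[1:])]
        if dists:
            scores.append((sum(dists) / len(dists), k))
    return sorted(scores)[:top]


if __name__ == "__main__":
    # Null bytes XORed with the 5-byte key b"ABCDE" are just the key repeated.
    ciphertext = b"ABCDE" * 40
    print(key_sizes(ciphertext, 12)[:2])  # [(0.0, 5), (0.0, 10)]
```

Multiples of the real key length also score well, which is why the script tries the most common n-grams for each of the top candidate sizes instead of trusting only the first one.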
__@___:~/hachoir-subfile crsenvironscan.xls
[+] Start search on 126444 bytes (123.5 KB)
[+] File at 0 size=80384 (78.5 KB): Microsoft Office document
[+] File at 2584 size=52039 (50.8 KB): Macromedia Flash data: version 9
[+] End of search -- offset=126444 (123.5 KB)
Total time: 1 sec 478 ms -- global rate: 83.5 KB/sec

__@___:~/$ python ExtractSubFile.py crsenvironscan.xls
[+] Start search on 126444 bytes (123.5 KB)
[+] File at 0 size=80384 (78.5 KB): Microsoft Office document => /home/file-0001.doc
[+] File at 2584 size=52039 (50.8 KB): Macromedia Flash data: version 9 => /home/file-0002.swf
[+] End of search -- offset=126444 (123.5 KB)