Hooked on Mnemonics Worked for Me

msdocsviewer

Hello,

I forgot to post a recent IDAPython plugin that I created for viewing Microsoft SDK documentation in IDA. Here is an example screenshot of msdocsviewer.

The repository for the plugin can be found here.

Function Trapper Keeper - An IDA Plugin For Function Notes

Function Trapper Keeper is an IDA plugin for writing and storing function notes in IDBs. It’s a middle ground between function comments and IDA’s Notepad, and a tool that I have wanted for a while. To understand why I wanted Function Trapper Keeper, it might be worth describing my process of reverse engineering a binary in IDA.

Upon opening a binary, I always take note of the code-to-data ratio. This can be inferred by looking at the Navigation band in IDA. If there is more data than code in the binary, it can hint that the binary is packed or encrypted. If so, I typically stop the triage of the binary and start searching for cross-references to the data. In many instances the cross-references lead to code used for decompressing or decrypting the data. For example, if the binary is a loader, it may contain the second-stage payload encrypted or obfuscated in some other way. By cross-referencing the data and finding the loader’s decryption routine, I can quickly pivot to extracting the payload. Another notable pattern is when code and data are not laid out consistently. If the band switches from code to data and back again, it is likely that IDA’s analysis found inconsistencies in the disassembled functions. This could be caused by anti-disassembly tricks, flawed memory dumps or something else that needs attention.

After the ratios, I look at the strings. I look for the presence of compiler strings, strings related to DLLs and APIs, user-defined strings or the lack of user-defined strings. If the latter, I’ll start searching for encrypted strings and then cross-referencing their usage. This can help find the function responsible for string decryption. If I can’t find the string decryption routine, I’ll use some automation to find all references to XOR instructions.

After reviewing strings, I’ll do a quick triage of the imported functions. I like to look for sets of APIs that I know are related to certain functionality. For example, if I see calls to VirtualAlloc, VirtualProtect and CreateRemoteThread, I can infer that process injection is potentially present in the binary.
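The import check is easy to automate. Below is a minimal sketch, assuming you already have the list of imported API names. The rule table is hypothetical (only the VirtualAlloc, VirtualProtect and CreateRemoteThread set comes from the paragraph above), and a match is a hint to dig further, not proof.

```python
# Hypothetical rule table: each behavior maps to a set of Windows API names
# that, when all imported together, hint at that behavior.
API_RULES = {
    "process injection": {"VirtualAlloc", "VirtualProtect", "CreateRemoteThread"},
    # Illustrative extra rule (an assumption, not from the post).
    "process injection (WriteProcessMemory variant)": {
        "OpenProcess", "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    },
}


def match_api_sets(imported_names, rules=API_RULES):
    """Return the behaviors whose entire API set appears in the imports."""
    imported = set(imported_names)
    return sorted(name for name, apis in rules.items() if apis <= imported)


imports = ["GetProcAddress", "VirtualAlloc", "VirtualProtect", "CreateRemoteThread"]
print(match_api_sets(imports))  # ['process injection']
```

With pefile, the names can be gathered from pe.DIRECTORY_ENTRY_IMPORT, where each entry's imports have a name attribute (bytes, or None for imports by ordinal).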

After the previously described triage, I have a high-level overview of the binary and usually know if I should do a deep dive of the binary or if I need to focus on certain functionality (encrypted strings, unpacking, etc.). If I’m doing a deep dive, I like to label all functions. For my IDBs, the name of the function hints at my level of understanding of the function. The more descriptive the function name, the more I know about it. If I know the function does process injection into explorer.exe, I might name it “inject_VirtRemoteThreadExplorer”. If I don’t care about the function but I need to note it’s related to strings and memory allocation, I might label it “str_mem”. If I’m super lazy, I might name the function “str_mem_??”, and yes, you can use “?” in IDA’s function names. This is a reminder that I should probably double-check the function if it’s used a lot. Once I have all the functions labeled, I can be confident of the general functionality of the binary. This is when I start digging deeper into the functions.

This can vary, but with lots of malware families a handful of functions contain the majority of the notable functionality. This is commonly where I spend most of my time reversing. As I have said in a previous post, if you aren’t writing, then you aren’t reversing. Since I spend lots of time in these functions, I like to have my notes close by. Notes can be added as function comments, but the text disappears once you scroll down the function, the text can’t be formatted and the comments can’t be easily exported. IDA’s Notepad suffers from the same issues (minus the export). Having all the function notes in a single pane and being able to export them to Markdown is super helpful. My favorite feature of the plugin is that when I scroll from function to function, the text refreshes for each function. The plugin can be seen on the right of the following image.
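As an illustration of exporting function notes to Markdown, here is a minimal sketch. It assumes the notes are kept in a plain dictionary keyed by function name; that structure and the notes_to_markdown helper are hypothetical and not how Function Trapper Keeper actually stores notes in the IDB.

```python
def notes_to_markdown(notes):
    """Render {function_name: note_text} as one Markdown document.

    `notes` is a hypothetical structure; the plugin's own storage may differ.
    """
    lines = ["# Function Notes", ""]
    for name in sorted(notes):
        lines.append("## " + name)  # one heading per function
        lines.append("")
        lines.append(notes[name].strip())
        lines.append("")
    return "\n".join(lines)


notes = {
    "inject_VirtRemoteThreadExplorer": "Injects into explorer.exe via CreateRemoteThread.",
    "str_mem_??": "Allocates a buffer and copies a string. Double check.",
}
print(notes_to_markdown(notes))
```

Sorting by name keeps the exported file stable between runs, which makes it easier to diff notes from old and new IDBs.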

Having a description accessible minimizes the amount of time I have to read code I already reversed, which is useful when opening up old IDBs. I hope others find it as useful as I do.

Here is a link to the repo.

For more information on the Navigation band in IDA check out Igor’s post.

Please leave a comment, ping me on Twitter or Mastodon or create an issue on GitHub.

Recommended Resources for Learning Cryptography: RE Edition

A common question when first reverse engineering ransomware is “what is a good resource for learning cryptography?”. Having an understanding of cryptography is essential when reversing ransomware. Most reverse engineers need to know how to identify the encryption algorithm, be able to follow the key generation, understand key storage and ensure the encryption implementation isn’t flawed. To accomplish these items it is essential to have a good foundational knowledge of cryptography. The following are some recommendations that I have found beneficial on my path to learning cryptography.

One of the most important skills is having an understanding of how common encryption algorithms work. The best introductory book on cryptography is Understanding Cryptography: A Textbook for Students and Practitioners. It was written in a way that “teaches modern applied cryptography to readers with a technical background but without an education in pure mathematics” (source). The book also covers the modern crypto schemes in common use. One of the best parts about the book is that each chapter has a lecture on YouTube taught by the authors. This format is useful because it reinforces the concepts or adds more details to some of the more difficult topics.

After Understanding Cryptography I’d recommend a non-textbook approach using the cryptopals crypto challenges. It is basically a set of problems that progressively get harder. You can solve the problems using a programming language of your choice. I have yet to complete the challenges but I’d recommend attempting and solving the first two sets of problems. They introduce you to a lot of foundational concepts that can actually be applied. From what I learned in the first set, I was able to easily crack XOR encrypted executable payloads. I love cryptopals so much that I created a mirror of the site and converted it to markdown so I can easily download everything via git.
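For a taste of the first set, here is a rough sketch of breaking single-byte XOR by trying all 256 keys and keeping the output that looks most like English. The scoring function is a deliberately simple assumption (count ASCII letters and spaces); cryptopals itself suggests character frequency scoring, which is more robust on real text.

```python
import string

# Bytes counted as "English-looking": ASCII letters and the space character.
ENGLISH_BYTES = set((string.ascii_letters + " ").encode("ascii"))


def score(data):
    """Count how many bytes look like English text."""
    return sum(1 for b in data if b in ENGLISH_BYTES)


def break_single_byte_xor(ciphertext):
    """Try every single-byte key; return (key, plaintext) with the best score."""
    best_key = max(range(256), key=lambda k: score(bytes(b ^ k for b in ciphertext)))
    return best_key, bytes(b ^ best_key for b in ciphertext)


secret = b"the quick brown fox jumps over the lazy dog"
ciphertext = bytes(b ^ 0x42 for b in secret)
key, plaintext = break_single_byte_xor(ciphertext)
print(hex(key), plaintext)  # 0x42 b'the quick brown fox jumps over the lazy dog'
```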

Once a foundational knowledge of cryptography has been established, it is useful to see how the algorithms look when compiled. I came across this while I was reversing a family of ransomware and couldn’t correctly decrypt the data. I was able to recover the private RSA key, use it to decrypt the AES key and decrypt files using AES in CTR mode, but after a certain number of decrypted bytes the data would be corrupted. In response, I repeatedly reversed the code, studied AES and all its different modes, compiled multiple versions of AES, opened them up in a disassembler and diffed the results, but the data was still corrupted. Everything pointed to AES in CTR mode. Eventually I identified that the CTR loop had an off-by-one error, which, as a colleague pointed out, didn’t matter because they also stored the extra byte of the key. It was only when I accounted for the off-by-one error in my decryptor that I was able to successfully decrypt files.
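To illustrate why a counter bug only shows up after a certain number of bytes, here is a toy sketch of CTR mode. It is an assumption-heavy stand-in: Python's standard library has no AES, so SHA-256 of the key, nonce and counter plays the role of the block cipher, and the skip_at parameter is a hypothetical counter bug, not the one in the ransomware described above.

```python
import hashlib

BLOCK_SIZE = 16


def block_function(key, nonce, counter):
    """Stand-in for AES_k(nonce || counter); NOT AES, just truncated SHA-256."""
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()[:BLOCK_SIZE]


def ctr_xcrypt(key, nonce, data, skip_at=None):
    """Encrypt or decrypt `data` in CTR mode (both are the same XOR operation).

    skip_at is a hypothetical bug: when the counter reaches that value it is
    bumped once more, so the keystream is shifted by one block from there on.
    """
    out = bytearray()
    counter = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        if skip_at is not None and counter == skip_at:
            counter += 1  # simulated off-by-one in the counter loop
        keystream = block_function(key, nonce, counter)
        chunk = data[offset:offset + BLOCK_SIZE]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
        counter += 1
    return bytes(out)


key, nonce = b"K" * 16, b"N" * 8
plaintext = bytes(range(64))  # four 16-byte blocks
ciphertext = ctr_xcrypt(key, nonce, plaintext)
good = ctr_xcrypt(key, nonce, ciphertext)
bad = ctr_xcrypt(key, nonce, ciphertext, skip_at=2)
print(good == plaintext)           # True
print(bad[:32] == plaintext[:32])  # True: blocks 0 and 1 decrypt cleanly
print(bad[32:] == plaintext[32:])  # False: corrupted from block 2 onward
```

Because each keystream block depends only on its counter value, every block before the counters diverge decrypts correctly and every block after that point is garbage.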

After this incident, whenever I come across a new encryption algorithm that I don’t understand or want to learn more about, I search for references, search for source code, add them to README.md, compile the executables and upload the .exes along with the PDBs to a repository named asm-examples. I find the exploration of the disassembled code along with symbols and names from the PDB to be valuable. It aids in being able to quickly identify encryption algorithms and makes the disassembled or decompiled code less intimidating.

To recap, my go-to resources for learning cryptography are Understanding Cryptography: A Textbook for Students and Practitioners, cryptopals and comparing compiled binaries to the source code. This isn’t the most in-depth approach to learning cryptography, but for supporting malware analysis and reverse engineering ransomware it works well.

gopep (Go Lang Portable Executable Parser)

gopep (Go Lang Portable Executable Parser) is a project I have been working on for learning about Windows Portable Executable (PE) files compiled in Go. As most malware analysts have noticed, there has been an uptick in malware (particularly ransomware) compiled in Go. At first glance, reverse engineering Go PE files can be intimidating. The files are commonly over 3MB in size, contain thousands of functions and use a unique calling convention that can return multiple values. The first time I opened up a Go executable in IDA, I was lucky because the plugin IDAGolangHelper was able to identify everything. The second time, I wasn't so lucky. This motivated me to port IDAGolangHelper to IDA 7.5 and Python 3, convert the GUI to PyQt and include some code that parsed the Go source code and added the Go function comments to the IDB. After everything was done, my code still didn't fix up the IDB. This led me to write gopep. In IDAGolangHelper's defense, the issue was that the hard-coded bytes used to identify the Go version had not been updated for a couple of years. I should have checked this first or checked one of the multiple pull requests.
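One common way to guess a Go version range is to look for the pclntab header magic. The sketch below is an assumption and not necessarily how IDAGolangHelper or gopep does it. The magic values come from the Go runtime's runtime/symtab.go, and newer Go releases may add new ones. It only sanity-checks the bytes after the magic (two zero bytes, the instruction size quantum and the pointer size), so false positives are possible.

```python
import struct

# pclntab header magic values from the Go runtime, stored little-endian
# on x86/x64 Windows targets.
PCLNTAB_MAGICS = {
    0xFFFFFFFB: "Go 1.2 - 1.15",
    0xFFFFFFFA: "Go 1.16 - 1.17",
    0xFFFFFFF0: "Go 1.18 - 1.19",
    0xFFFFFFF1: "Go 1.20+",
}


def guess_go_version(data):
    """Scan raw file bytes for a plausible pclntab header; return a version range."""
    for magic, label in PCLNTAB_MAGICS.items():
        needle = struct.pack("<I", magic)
        start = data.find(needle)
        while start != -1:
            header = data[start + 4:start + 8]
            # After the magic: two zero pad bytes, PC quantum (1, 2 or 4),
            # pointer size (4 or 8).
            if (len(header) == 4 and header[0] == 0 and header[1] == 0
                    and header[2] in (1, 2, 4) and header[3] in (4, 8)):
                return label
            start = data.find(needle, start + 1)
    return None
```

Usage is simply guess_go_version(open(path, "rb").read()). Parsing the PE sections first and only scanning likely data sections would cut down on false positives.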

gopep is a Python script that can parse Go-compiled PE files without using Go. The script only relies on pefile. There are similar scripts that are excellent for ELF executables, but during my analysis I noticed they threw exceptions when parsing PE files. Below are the command line options that gopep currently supports; it can also be used as a class.

C:\Users\null\Documents\repo\gopep>python gopep.py -h
usage: gopep.py [-h] [-c C_DIR] [-e E_FILE] [-x EA_DIR] [-v IN_FILE] [-m MD_FILE] [-t T_FILE] [-ev ET_FILE]

gopep Go Portable Executable Parser

optional arguments:
  -h, --help            show this help message and exit
  -c C_DIR, --cluster C_DIR
                        cluster directory of files
  -e E_FILE, --export E_FILE
                        export results of file to JSON
  -x EA_DIR, --export_all EA_DIR
                        export results of directory to JSONs
  -v IN_FILE, --version IN_FILE
                        print version
  -m MD_FILE, --module-data MD_FILE
                        print module data details
  -t T_FILE, --triage T_FILE
                        triage file, print interesting attributes
  -ev ET_FILE, --everything ET_FILE
                        print EVERYTHING!

gopep is primarily for exploring structures within PE files compiled in Go, but it also supports clustering. The clustering algorithm is similar to import hashing but uses sets of symbol names and file paths that are unique to executables compiled in Go. As with most executable clustering algorithms, it can be broken by compressing the executable. Clustering is done by passing the -c option and a directory of files that should be clustered. I would not recommend clustering too many files using my code. You'd be better off exporting the hashes using the -x option, parsing the JSONs and then querying them that way.
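The following is a minimal sketch of the clustering idea, not gopep's actual implementation: hash an order-independent set of symbol names and file paths, then group files that share a hash. Which symbols and paths to feed in (for example only main-package symbols rather than Go runtime ones) is an assumption left to the caller, and the sample data is made up.

```python
import hashlib
from collections import defaultdict


def symbol_set_hash(symbol_names, file_paths):
    """MD5 of the sorted, de-duplicated symbols and paths (order-independent)."""
    # Prefixes keep a symbol from colliding with an identically named path.
    items = sorted({"sym:" + s for s in symbol_names} | {"path:" + p for p in file_paths})
    return hashlib.md5("\n".join(items).encode("utf-8")).hexdigest()


def cluster(samples):
    """Group sample names by hash. `samples` maps name -> (symbols, paths)."""
    groups = defaultdict(list)
    for name, (symbols, paths) in samples.items():
        groups[symbol_set_hash(symbols, paths)].append(name)
    return [sorted(members) for members in groups.values()]


samples = {
    "a.exe": (["main.main", "main.encrypt"], ["C:/dev/ransom/main.go"]),
    "b.exe": (["main.encrypt", "main.main"], ["C:/dev/ransom/main.go"]),
    "c.exe": (["main.main"], ["/home/user/loader/main.go"]),
}
print(cluster(samples))  # [['a.exe', 'b.exe'], ['c.exe']]
```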

The README for the project has more details on the fields parsed, my notes and a great set of references for anyone wanting to read up on what happens when Go compiles an executable.

https://github.com/alexander-hanel/gopep

Updates

Hello,

Some real quick updates. I have released a new version of The Beginner's Guide to IDAPython. It has been rewritten to cover changes that IDA 7.0 introduced. I plan on adding a couple more chapters in the upcoming months on Hex-Rays, structures, GUIs and (the chapter I'm most excited about) using the Unicorn Engine within IDA.

Sorry for the lack of updates. Lately, I have been using a combination of Twitter and Markdown for posting content. I can be found on Twitter @nullandnull. If you don't want to follow me, add me to a list. I swear lists are the most underrated feature of Twitter. I also created an account on GitHub. I still need to transfer all my old stuff from my Bitbucket account. Two fairly recent projects I created are asmdec and capstool. The README.md files within the repos contain all the needed details.

As always feel free to send me an email at alexander dot hanel at gmail dot com or ping me on Twitter.

Cheers.

A Primer on Cracking XOR Encoded Executables

A while back, Locky JS downloaders were downloading executable payloads encrypted with XOR. In the infection chain, a victim double-clicked on a JS (JavaScript), JSE (Encoded JavaScript), WSH (Windows Script Host) or other JScript-based file; the script would then connect to a compromised website, download a binary file, decrypt it using XOR and then execute the decrypted executable. At the time I was relying on one of two analysis approaches to retrieve the decrypted payload. The first was using automated malware analysis systems to recover the dropped payload. The second was reversing the obfuscated JavaScript (or other languages interpreted by wscript.exe) to find the XOR key. Once I found the key, I would decrypt the network traffic carved from a PCAP to recover the Locky executable. Both of these approaches are laborious because I was either relying on an automated malware analysis system or on successfully deobfuscating the script to recover the key.

Side Note:

For anyone deobfuscating languages interpreted by wscript.exe, I would recommend investigating API hooking. Most of the APIs that need to be hooked can be identified by using an API monitor. Hooking also allows you to control what the APIs return. This is useful if you want to recover all URLs that a sample might want to connect to. I'll try to post some example code in the next week or two.

Since the attackers were using XOR on a Portable Executable (PE) file, I decided to crack it. This is not very difficult because XOR with a repeating key is not a secure cipher, and when it is used on a PE file, the file's null-byte padding effectively leaks the key. Cracking XOR is a four-step process: recover the key size, recover the key, decrypt the data with the found key and finally check that the decrypted data is correct.

To recover the key size, Hamming distance can be used. The Hamming distance between two equal-length strings is the number of substitutions needed to change one into the other; here it is computed on bits, so it counts the bits that differ. From a XOR cracking standpoint, if the data is split into blocks of a candidate key size, the candidate with the smallest normalized Hamming distance between neighboring blocks is likely the XOR key size or a multiple of it. I say a multiple of it because sometimes the smallest Hamming distance belongs to two times the key size or another multiple. For example, the output below contains a list of (Hamming distance, key size) tuples. The actual key size was 29, but the lowest Hamming distance was found at 58.

[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)]
Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin

Here is the code for computing the Hamming distance. Note that the two strings should be the same length; zip() stops at the end of the shorter one, so any extra bytes are silently ignored.

def hamming_distance(bytes_a, bytes_b):
    return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))
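The key size search itself can be sketched in Python 3 as follows. It is a variant of the approach described above (and of cryptopals Challenge 6), not a copy of the script's key_len function: split the data into consecutive blocks of each candidate size, average the Hamming distance between neighboring blocks and normalize it by the block size.

```python
def hamming_distance(bytes_a, bytes_b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(a ^ b).count("1") for a, b in zip(bytes_a, bytes_b))


def guess_key_sizes(data, max_size=64, top=10):
    """Rank candidate key sizes by average normalized Hamming distance.

    Smaller averages suggest the size is the key size or a multiple of it.
    """
    results = []
    for k in range(2, max_size + 1):
        # Only full k-byte blocks are compared.
        blocks = [data[i:i + k] for i in range(0, len(data) - k + 1, k)]
        if len(blocks) < 2:
            continue
        distances = [hamming_distance(a, b) / k for a, b in zip(blocks, blocks[1:])]
        results.append((sum(distances) / len(distances), k))
    return sorted(results)[:top]


print(hamming_distance(b"this is a test", b"wokka wokka!!!"))  # 37
```

On real data a multiple of the key size can score slightly lower than the key size itself, as in the output above, so it is worth trying several of the top candidates.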

Identifying the key size is very important. Earlier versions of my script used standard key sizes of 16, 32, 64, etc., but shortly after I released my code, some Locky downloaders started using a 29-byte XOR key. This broke my code because I was not using Hamming distance to check for the key size.

The second step is recovering the key. When a Portable Executable is linked, one option is /FILEALIGN:number. The number specifies the alignment of sections in the compiled PE file. It can be found in the PE file format in the OptionalHeader under FileAlignment. All sections within the executable start at a file offset that is a multiple of the FileAlignment value. If the FileAlignment is 0x200 and the size of the data is 0x201, then the next section will start at offset 0x400. The gap between the end of the data and the start of the next section is padded with null bytes ("\x00"). File alignment padding therefore introduces a large number of null bytes into the executable. Because any byte XORed with 0x00 is unchanged, the encrypted padding contains the key repeated over and over. Searching for the most common recurring byte patterns in a XOR-encoded executable can therefore be used to recover the key. The following code finds the 32 most common substrings of length size, where message is the encrypted data and size is a candidate key size:

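from collections import Counter

# count every substring of length `size`, starting at every offset
substr_counter = Counter(message[i: i+size] for i in range(len(message) - size + 1))
sub_count = substr_counter.most_common(32)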
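One wrinkle with the most-common-substring approach: over null padding, the most common substring is the key rotated by however far into the key the substring starts, so it only decrypts correctly if it happens to be aligned. The complete script below sidesteps this by trying the top 32 candidates and letting pefile reject the bad ones. An alternative, sketched here under the assumption that the top substring comes from encrypted null padding, is to undo the rotation using the substring's file offset. recover_key is a hypothetical helper, not part of the original script.

```python
from collections import Counter


def recover_key(ciphertext, key_size):
    """Recover a repeating XOR key, assuming the most common key_size-byte
    substring comes from encrypted null padding."""
    counts = Counter(ciphertext[i:i + key_size]
                     for i in range(len(ciphertext) - key_size + 1))
    candidate, _ = counts.most_common(1)[0]
    # Over null bytes ciphertext[i] == key[i % key_size], so the candidate is
    # the key rotated by (offset % key_size); undo that rotation.
    shift = ciphertext.find(candidate) % key_size
    return bytes(candidate[(m - shift) % key_size] for m in range(key_size))


key = b"K3y!Z"
plain = b"MZ" + b"\x90" * 10 + bytes(400) + b"end"  # mostly null padding
cipher = bytes(p ^ key[i % len(key)] for i, p in enumerate(plain))
print(recover_key(cipher, 5))  # b'K3y!Z'
```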

The third step is to XOR the data. The following code can be used to XOR data with single- or multi-byte keys. If you don’t understand the code, I would recommend walking through each part of it. This is personally one of my favorite pieces of Python code. It covers a number of Python concepts, including generator expressions, logical operations and standard functions.

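from itertools import cycle

def xor_mb(message, key):
    # pair each message character with a key character, repeating the key as needed
    return ''.join(chr(ord(m_byte) ^ ord(k_byte)) for m_byte, k_byte in zip(message, cycle(key)))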
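xor_mb is written for Python 2, where file data is a str. In Python 3, data read in binary mode is bytes and iterating over bytes yields integers, so a rough equivalent (xor_bytes is a hypothetical name) drops the chr and ord calls:

```python
from itertools import cycle


def xor_bytes(message, key):
    """XOR `message` with a single- or multi-byte `key`, repeating the key."""
    return bytes(m ^ k for m, k in zip(message, cycle(key)))


encrypted = xor_bytes(b"MZ\x90\x00", b"\x41\x42")
print(encrypted)                          # b'\x0c\x18\xd1B'
print(xor_bytes(encrypted, b"\x41\x42"))  # b'MZ\x90\x00'
```

Because XOR is its own inverse, the same function both encrypts and decrypts.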

The last step is to verify that the key and decrypted data are correct. Since the decrypted payload is an executable file with a known file structure, I used pefile to verify that the data had been decrypted correctly. If the PE structure is invalid, pefile raises an exception (typically pefile.PEFormatError).

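import re
import pefile

def pe_carv(data):
    '''carve out executable using pefile's trim'''
    # try every "MZ" offset as a possible start of the executable
    for offset in [temp.start() for temp in re.finditer(b'\x4d\x5a', data)]:
        # slice out executable
        temp_buff = data[offset:]
        try:
            pe = pefile.PE(data=temp_buff)
        except Exception:
            # pefile rejected this offset (usually PEFormatError), keep searching
            continue
        # trim() drops any overlay data past the end of the PE
        return pe.trim()
    return None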
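Before handing every "MZ" offset to pefile, a cheap pre-filter can skip offsets that cannot be a PE. This sketch (looks_like_pe is a hypothetical helper) only checks the DOS header's e_lfanew field at offset 0x3C and the "PE\0\0" signature it points to. It is much weaker than pefile's parsing and is meant to reduce work, not replace the check.

```python
import struct


def looks_like_pe(data):
    """Cheap sanity check: 'MZ' at offset 0 and the PE signature at e_lfanew."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew is a little-endian 32-bit value at offset 0x3C of the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

In pe_carv, calling looks_like_pe(temp_buff) before pefile.PE would skip most random "MZ" matches inside the decrypted data.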

Complete code with example output:

"""
    Author: 
        Alexander Hanel
    Name: 
        pe_ham_brute.py
    Purpose:
         - POC that searches for n-grams and uses them as the XOR key.
         - Also uses hamming distance to guess key size. Check out cryptopals Challenge 6
         for more details https://cryptopals.com/sets/1/challenges/6
    Example: 
    
pe_ham_brute.py ba5aa03d724d17312d9b65a420f91285caff711e2f891b3699093cc990fdaae0
Hamming distances & calculated key sizes
[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)]
Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin
"""

import sys
import re
import hashlib
import pefile

from collections import Counter
from itertools import cycle

DEBUG = True

def xor_mb(message, key):
    return''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key)))


def hamming_distance(bytes_a, bytes_b):
    return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))


def key_len(message, key_size):
    """returns [(dist, key_size), (dist, key_size)] sorted by normalized Hamming distance"""
    avg = []
    for k in xrange(2, key_size):
        hd = []
        # compare each k-byte block with the block that immediately follows it
        for n in xrange(len(message) / k - 1):
            hd.append(hamming_distance(message[k*n:k*(n+1)], message[k*(n+1):k*(n+2)]) / float(k))
        if hd:
            avg.append((sum(hd) / float(len(hd)), k))
    return sorted(avg)[:10]


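def pe_carv(data):
    '''carve out executable using pefile's trim'''
    # try every "MZ" offset as a possible start of the executable
    for offset in [temp.start() for temp in re.finditer(b'\x4d\x5a', data)]:
        # slice out executable
        temp_buff = data[offset:]
        try:
            pe = pefile.PE(data=temp_buff)
        except Exception:
            # pefile rejected this offset (usually PEFormatError), keep searching
            continue
        # trim() drops any overlay data past the end of the PE
        return pe.trim()
    return None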

def write_file(data, key):
    m = hashlib.md5()
    m.update(data)
    name = m.hexdigest()
    key_name = "key-" + name + ".bin"
    file_name = name + ".bin"
    print "Length: %s, Key: %s File Name: %s" % (len(key),key, file_name)
    f =  open(file_name, "wb")
    fk = open(key_name , "wb")
    f.write(data)
    fk.write(key)
    f.close()
    fk.close()

def run(message):
    key_sizes = key_len(message, 64)
    if DEBUG:
        print "Hamming distances & calculated key sizes"
        print key_sizes
    for temp_sz in key_sizes:
        size = temp_sz[1]
        substr_counter = Counter(message[i: i+size] for i in range(len(message) - size + 1))
        sub_count = substr_counter.most_common(32)
        for temp in sub_count:
            key, count = temp
            if count == 1:
                break
            temp = xor_mb(message, key)
            pe_c = pe_carv(temp)
            if pe_c:
                write_file(pe_c, key)
                return
    
if __name__ == "__main__":
    with open(sys.argv[1], 'rb') as f:
        data = f.read()
    run(data)

For anyone else interested in learning about crypto, I'd recommend checking out Understanding Cryptography. It is a great beginner book without a lot of math. Each chapter has corresponding video lectures on YouTube. Another resource is The Cryptopals Crypto Challenges. I cannot recommend the cryptopals challenges enough. Here are my solutions so far. At one point I contemplated quitting my job so I could focus only on the challenges. Not one of my most practical ideas, but the challenges exposed many of my weaknesses in programming and mathematics. It's pretty rare to find something that points you in the direction of what you need to learn and gives you a definitive signal (cracking the challenge) for when you can move on to the next area of study. Pretty awesome. If you have any questions or comments, you can ping me on Twitter, leave a comment or send me an email at alexander dot hanel at gmail dot com.

ObfStrReplacer & ExtractSubfile Snippets

ObfStrReplacer is a script that replaces obfuscated variable names with easier-to-read strings. Some obfuscation techniques rely on similar-looking strings to make the code difficult to read. For example, the string Illl1III111I11 is hard to distinguish from lIll1III111I11. ObfStrReplacer takes a regular expression as an argument to match obfuscated strings; it then adds all matches to a set and replaces each match with a unique string. 11ll1III111I11 would become _carat. All renamed strings start with "_". In the image above we can see the obfuscated code on the left and the deobfuscated code on the right.
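The renaming approach can be sketched with a single re.sub call and a callback. This is an illustration, not ObfStrReplacer's source: the generated names (_v1, _v2, ...) and the example pattern are assumptions, and like the original it does not consider variable scope.

```python
import re


def replace_obfuscated(source, pattern):
    """Replace each distinct match of `pattern` with a unique name, reusing the
    same name every time the same obfuscated string appears."""
    mapping = {}

    def rename(match):
        original = match.group(0)
        if original not in mapping:
            mapping[original] = "_v%d" % (len(mapping) + 1)
        return mapping[original]

    return re.sub(pattern, rename, source), mapping


code = "var Illl1III111I11 = lIll1III111I11 + Illl1III111I11;"
clean, names = replace_obfuscated(code, r"\b[Il1]{6,}\b")
print(clean)  # var _v1 = _v2 + _v1;
```

Returning the mapping as well makes it possible to keep a record of which readable name corresponds to which original identifier.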

Please see the command line example in the source code for usage details. I have confirmed it works well on obfuscated ActionScript. The code blindly replaces matches; it does not check for the reuse of variable names within the scope of different functions. I plan on adding this at a later date. Please leave a VT hash in the comments if you have an example.

ObfStrReplacer Source Code

ExtractSubfile is a simple modification of hachoir-subfile's search.py. It is used to extract embedded files. The carving functionality was already included in hachoir-subfile but not exposed.

__@___:~/$ hachoir-subfile crsenvironscan.xls 
[+] Start search on 126444 bytes (123.5 KB)

[+] File at 0 size=80384 (78.5 KB): Microsoft Office document
[+] File at 2584 size=52039 (50.8 KB): Macromedia Flash data: version 9

[+] End of search -- offset=126444 (123.5 KB)
Total time: 1 sec 478 ms -- global rate: 83.5 KB/sec
__@___:~/$ python ExtractSubFile.py  crsenvironscan.xls 
[+] Start search on 126444 bytes (123.5 KB)

[+] File at 0 size=80384 (78.5 KB): Microsoft Office document => /home/file-0001.doc
[+] File at 2584 size=52039 (50.8 KB): Macromedia Flash data: version 9 => /home/file-0002.swf

[+] End of search -- offset=126444 (123.5 KB)

In the two lines ending with "=>" and a file path, we can see that a document and a SWF file were carved.
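For comparison, the simplest form of carving is scanning for magic bytes. The sketch below (find_signatures and its small signature table are assumptions) only reports where known signatures start. Unlike hachoir-subfile, it does not parse the embedded files to work out their sizes, and short signatures such as "MZ" will produce false positives.

```python
# A few well-known magic values; this small table is illustrative only.
SIGNATURES = {
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 / Microsoft Office document",
    b"FWS": "uncompressed SWF",
    b"CWS": "zlib-compressed SWF",
    b"MZ": "DOS/PE executable",
}


def find_signatures(data):
    """Return sorted (offset, description) pairs for every signature occurrence."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, description))
            start = data.find(magic, start + 1)
    return sorted(hits)


blob = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 10 + b"FWS\x09" + b"\x00" * 4
print(find_signatures(blob))
# [(0, 'OLE2 / Microsoft Office document'), (18, 'uncompressed SWF')]
```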

ExtractSubFile Source Code