Side Note:
For anyone doing deobfuscation of languages interpreted by wscript.exe, I would recommend investigating hooking APIs. Most of the APIs that need to be hooked can be identified by using an API monitor. Also with hooking it allows you to control what the APIs return. This is useful if you want to recover all URLS that sample might want to connect to. I'll try to post some example code in the next week or two.
Since the attackers were using XOR on an Portable Executable (PE) file I decided to crack it. This is not very difficult because XOR is not a secure cipher and when used on a portable executable file a padding attack is introduced. Cracking XOR is a four step process. The first is recovering the key size, second is recovering the key, then decrypting the data with the found key and finally checking for the correct decrypted data.
To recover the key size Hamming distance can be used. Hamming distance can be used to calculate the number of substitutions needed to change one string into the other. From a XOR cracking standpoint, the smallest hamming distance found in a XOR file is likely the XOR key size or a multiple of it. I say a multiple of it because sometimes the smallest hamming distance could be the key size times 2 or another value. For example the below output contains a list of tuples that has the hamming distance and the key size. The actual key size was 29 but the lowest hamming distance found was 58.
[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)] Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin
Here is the code for computing the hamming distance. Note, the two strings must have the same size.
def hamming_distance(bytes_a, bytes_b): return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))
Identifying the key size is very important. Earlier versions of my script used standard key sizes of 16,32, 64, etc but shortly after releasing my code some Locky downloaders started using a 29 byte XOR key size. This broke my code because I was not using Hamming distance to check for the key size.
The second step is recovering the key. When a Portable executable is compiled one flag is /filealign:number. The number specifies the alignment of sections in the compiled PE file. It can be found in the Portalble Executable file format in OptionalHeader under FileAlignment. All sections within the executable will need to start at an address that is a multiple of the value defined within the FileAlignment. If the FileAlignment is 0x200, and the size of a data is 0x201 then the next section will start at offset 0x400. In between the data and the start of the section is padded with NULL bytes represented as "\x00". The file alignment padding introduces a large amount of null bytes into the executable. When null bytes are XORed the encoded data will contain the key. Searching for the most common recurring byte patterns in a XOR encoded executable can be used to recover the key. The following code can be used to find the 32 most common occurring bytes in an executable
substr_counter = Counter(message[i: i+size] for i in range(len(message) - size)) sub_count = substr_counter.most_common(32)
The third step is XOR the data. The following code can be used to XOR data with single or multibyte keys. If you don’t understand the code I would recommend walking through each section of it. This is personally one of my favorite pieces of Python code. It covers a number of Python concepts from list comprehension, logical operations and standard functions.
def xor_mb(message, key): return''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key)))
The last step is to verify that the key and decrypted data is correct. Since the decrypted payload is an executable file with a known file structure I used pefile to verify the data has been decrypted correctly. If the PE structure is invalid Pefile would throw an exception.
def pe_carv(data): '''carve out executable using pefile's trim''' c = 1 for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]: # slice out executable temp_buff = data[offset:] try: pe = pefile.PE(data=temp_buff) except: continue return pe.trim() return None
Complete code with example output - link
""" Author: Alexander Hanel Name: pe_ham_brute.py Purpose: - POC that searches for n-grams and uses them as the XOR key. - Also uses hamming distance to guess key size. Check out cryptopals Challenge 6 for more details https://cryptopals.com/sets/1/challenges/6 Example: pe_ham_brute.py ba5aa03d724d17312d9b65a420f91285caff711e2f891b3699093cc990fdaae0 Hamming distances & calculated key sizes [(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)] Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin """ import base64 import string import sys import collections import pefile import re import hashlib from cStringIO import StringIO from collections import Counter from itertools import cycle from itertools import product DEBUG = True def xor_mb(message, key): return''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key))) def hamming_distance(bytes_a, bytes_b): return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b))) def key_len(message, key_size): """"returns [(dist, key_size),(dist, key_size)]""" avg = [] for k in xrange(2,key_size): hd = [] for n in xrange(len(message)/k-1): hd.append(hamming_distance(message[k*n:k*(n+1)],message[k*(n+1):k*(n*2)])/k) if hd: avg.append((sum(hd) / float(len(hd)), k)) return sorted(avg)[:10] def pe_carv(data): '''carve out executable using pefile's trim''' c = 1 for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]: # slice out executable temp_buff = data[offset:] try: pe = pefile.PE(data=temp_buff) except: continue return pe.trim() return None def write_file(data, key): m = hashlib.md5() m.update(data) name = m.hexdigest() key_name = "key-" + name + ".bin" file_name = name + ".bin" print "Length: %s, Key: %s File Name: %s" % (len(key),key, file_name) f = open(file_name, "wb") fk = open(key_name , "wb") f.write(data) fk.write(key) f.close() fk.close() def run(message): key_sizes = key_len(message, 64) if DEBUG: print "Hamming distances & calculated key sizes" print key_sizes for temp_sz in key_sizes: size = temp_sz[1] substr_counter = Counter(message[i: i+size] for i in range(len(message) - size)) sub_count = substr_counter.most_common(32) for temp in sub_count: key, count = temp if count == 1: break temp = xor_mb(message, key) pe_c = pe_carv(temp) if pe_c: write_file(pe_c, key) return data = open(sys.argv[1],'rb').read() run(data)
For anyone else interested in learning about crypto I'd recommend checking out Understanding Cryptography. It is a great beginner book with not a lot of math. Each chapter has corresponding video lectures on YouTube. Another resource is attempting The Cryptopals Crypto Challenges. I can not recommend the CryptoPals challenge enough. Here are my solutions so far. At one point I contemplated quitting my job so I could just focus only on the challenges. Not one of my most practical ideas but the challenges exposed many of my weaknesses in programming and mathematics. It's pretty rare to find something that points you in the direction of what you need to learn and gives you a definitive answer (cracking the challenge) when you can move on to the next area of study. Pretty awesome. If you have any questions or comments you can ping me on Twitter, leave a comment or send me an email at alexander dot hanel at gmail dot com.
Hamming distance seems useful.
ReplyDelete