A Primer on Cracking XOR Encoded Executables

A while back Locky JS downloaders were downloading executable payloads encrypted with XOR. The infection chain consisted of a victim double clicking on a JS (JavaScript), JSE (Encoded JavaScript), WSH (Windows Script Host ) or another Jscript based interpreted language, the script would then connect to a compromised website, download a binary file, decrypt the binary file using XOR and then execute the decrypted executable file.  At the time I was relying on one of two analysis approaches to retrieve the decrypted payload. The first was using automated malware analysis systems to recover the dropped payload. The second was reversing obfuscated JavaScript or other languages interpreted by wscrpt.exe to find the XOR key. Once I found the key I would decrypt the network traffic carved from a PCAP to recover the Locky executable. Both of these approaches are laborious because either I was relying on automated malware analysis system or successfully deobfuscating the script to recover the key.

Side Note:
For anyone doing deobfuscation of languages interpreted by wscript.exe, I would recommend investigating  hooking APIs. Most of the APIs that need to be hooked can be identified by using an API monitor. Also with hooking it allows you to control what the APIs return. This is useful if you want to recover all URLS that sample might want to connect to. I'll try to post some example code in the next week or two. 

Since the attackers were using XOR on an Portable Executable (PE) file I decided to crack it. This is not very difficult because XOR is not a secure cipher and when used on a portable executable file a  padding attack is introduced. Cracking XOR is a four step process. The first is recovering the key size, second is recovering the key, then decrypting the data with the found key and finally checking for the correct decrypted data.

To recover the key size Hamming distance can be used. Hamming distance can be used to calculate the number of substitutions needed to change one string into the other. From a XOR cracking standpoint, the smallest hamming distance found in a XOR file is likely the XOR key size or a multiple of it. I say a multiple of it because sometimes the smallest hamming distance could be the key size times 2 or another value. For example the below output contains a list of tuples that has the hamming distance and the key size. The actual key size was 29 but the lowest hamming distance found was 58.


[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)]
Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin

Here is the code for computing the hamming distance. Note, the two strings must have the same size.


def hamming_distance(bytes_a, bytes_b):
    return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))

Identifying the key size is very important. Earlier versions of my script used standard key sizes of 16,32, 64, etc but shortly after releasing my code some Locky downloaders started using a 29 byte XOR key size. This broke my code because I was not using Hamming distance to check for the key size.

The second step is recovering the key. When a Portable executable is compiled one flag is /filealign:number. The number specifies the alignment of sections in the compiled PE file. It can be found in the Portalble Executable file format in OptionalHeader under FileAlignment. All sections within the executable will need to start at an address that is a multiple of the value defined within the FileAlignment. If the FileAlignment is 0x200, and the size of a data is 0x201 then the next section will start at offset 0x400. In between the data and the start of the section is padded with NULL bytes represented as "\x00".  The file alignment padding introduces a large amount of null bytes into the executable. When null bytes are XORed the encoded data will contain the key.  Searching for the most common recurring byte patterns in a XOR encoded executable can be used to recover the key. The following code can be used to find the 32 most common occurring bytes in an executable


substr_counter = Counter(message[i: i+size] for i in range(len(message) - size))
sub_count = substr_counter.most_common(32)


The third step is XOR the data. The following code can be used to XOR data with single or multibyte keys. If you don’t understand the code I would recommend walking through each section of it. This is personally one of my favorite pieces of Python code. It covers a number of Python concepts from list comprehension, logical operations and standard functions.


def xor_mb(message, key):

    return''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key)))


The last step is to verify that the key and decrypted data is correct. Since the decrypted payload is an executable file with a known file structure I used pefile to verify the data has been decrypted correctly. If the PE structure is invalid Pefile would throw an exception.


def pe_carv(data):
    '''carve out executable using pefile's trim'''
    c = 1
    for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]:
        # slice out executable 
        temp_buff = data[offset:]
        try:
            pe = pefile.PE(data=temp_buff)
        except:
            continue
        return pe.trim()
    return None


Complete code with example output - link


"""
    Author: 
        Alexander Hanel
    Name: 
        pe_ham_brute.py
    Purpose:
         - POC that searches for n-grams and uses them as the XOR key.
         - Also uses hamming distance to guess key size. Check out cryptopals Challenge 6
         for more details https://cryptopals.com/sets/1/challenges/6
    Example: 
    
pe_ham_brute.py ba5aa03d724d17312d9b65a420f91285caff711e2f891b3699093cc990fdaae0
Hamming distances & calculated key sizes
[(2.6437784522003036, 58), (2.6952976867652634, 29), (3.2587556654305727, 63), (3.270363951473137, 53), (3.285315243415802, 61), (3.2863494886616276, 34), (3.29136690647482, 55), (3.300850228907783, 50), (3.306188371302278, 26), (3.309218485361723, 37)]
Length: 58, Key: IUN0mhqDx239nW3vpeL9YWBPtHC0HIUN0mhqDx239nW3vpeL9YWBPtHC0H File Name: dc53de4f4f022e687908727570345aba.bin
"""

import base64
import string
import sys
import collections
import pefile
import re
import hashlib

from cStringIO import StringIO
from collections import Counter
from itertools import cycle 
from itertools import product

DEBUG = True

def xor_mb(message, key):
    return''.join(chr(ord(m_byte)^ord(k_byte)) for m_byte,k_byte in zip(message, cycle(key)))


def hamming_distance(bytes_a, bytes_b):
    return sum(bin(i ^ j).count("1") for i, j in zip(bytearray(bytes_a), bytearray(bytes_b)))


def key_len(message, key_size):
    """"returns [(dist, key_size),(dist, key_size)]"""
    avg = []
    for k in xrange(2,key_size): 
        hd = []
        for n in xrange(len(message)/k-1):
            hd.append(hamming_distance(message[k*n:k*(n+1)],message[k*(n+1):k*(n*2)])/k)
        if hd:
            avg.append((sum(hd) / float(len(hd)), k))
    return sorted(avg)[:10]


def pe_carv(data):
    '''carve out executable using pefile's trim'''
    c = 1
    for offset in [temp.start() for temp in re.finditer('\x4d\x5a',data)]:
        # slice out executable 
        temp_buff = data[offset:]
        try:
            pe = pefile.PE(data=temp_buff)
        except:
            continue
        return pe.trim()
    return None

def write_file(data, key):
    m = hashlib.md5()
    m.update(data)
    name = m.hexdigest()
    key_name = "key-" + name + ".bin"
    file_name = name + ".bin"
    print "Length: %s, Key: %s File Name: %s" % (len(key),key, file_name)
    f =  open(file_name, "wb")
    fk = open(key_name , "wb")
    f.write(data)
    fk.write(key)
    f.close()
    fk.close()

def run(message):
    key_sizes = key_len(message, 64)
    if DEBUG:
        print "Hamming distances & calculated key sizes"
        print key_sizes
    for temp_sz in key_sizes:
        size = temp_sz[1]
        substr_counter = Counter(message[i: i+size] for i in range(len(message) - size))
        sub_count = substr_counter.most_common(32)
        for temp in sub_count:
            key, count = temp
            if count == 1:
                break
            temp = xor_mb(message, key)
            pe_c = pe_carv(temp)
            if pe_c:
                write_file(pe_c, key)
                return
    
data = open(sys.argv[1],'rb').read()
run(data) 

For anyone else interested in learning about crypto I'd recommend checking out Understanding Cryptography. It is a great beginner book with not a lot of math. Each chapter has corresponding video lectures on YouTube. Another resource is attempting The Cryptopals Crypto Challenges. I can not recommend the CryptoPals challenge enough. Here are my solutions so far. At one point I contemplated quitting my job so I could just focus only on the challenges. Not one of my most practical ideas but the challenges exposed many of my weaknesses in programming and mathematics. It's pretty rare to find something that points you in the direction of what you need to learn and gives you a definitive answer (cracking the challenge) when you can move on to the next area of study.   Pretty awesome. If you have any questions or comments you can ping me on Twitter, leave a comment or send me an email at alexander dot hanel at gmail dot com.

1 comment: