PE Skeletons

I know this topic has been beaten to death, but I thought I'd share a technique for detecting executables encoded with a single-byte XOR key in file streams. Recently, while looking at a file stream, I instantly knew there was an XOR-encoded executable in it. This got me thinking: why can I spot this, and can I script it up? Since executable files have a defined structure, they have a standard skeleton. It's not always easy to see the skeleton if we just look at the hex bytes.

We can add some color via the following Python code. 

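import matplotlib.pyplot as plt
import numpy as np
import sys

def main():
        # read the first 512 bytes of the file named on the command line
        data = open(sys.argv[1], 'rb').read()[:512]
        dlist = bytearray(data)
        print(len(dlist))
        # lay the bytes out as 32 rows of 16, like a hex dump (needs a full 512 bytes)
        plotters = np.array(dlist)
        plotters.shape = (32,16)
        # draw one colored cell per byte; pcolor is an assumption, it puts row 0 at the bottom
        plt.pcolor(plotters)
        plt.axis([0,16,0, 32])
        plt.show()

if __name__ == '__main__':
        main()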

The code reads the first 512 bytes of a file, puts each byte into a bytearray and then plots each byte as a color in the same 16-byte-wide layout as a hex dump. If we were to pass an executable file to this script we would get the following pretty picture.
The bottom left-hand corner byte is 0x4d 'M', the second is 0x5a 'Z', and so on. If we were to XOR the executable with 0x88 we would get the following image.
XOR 0x88 Key
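To reproduce the second image, the file can be XORed with a single-byte key before plotting. Here is a minimal sketch; `xor_bytes` is a hypothetical helper name, not something from the plotting script above.

```python
def xor_bytes(data, key):
    """Return a copy of data with every byte XORed with a single-byte key."""
    return bytearray(b ^ key for b in bytearray(data))

# first four bytes of a typical DOS header, used here only as sample input
header = bytearray(b"MZ\x90\x00")
encoded = xor_bytes(header, 0x88)
# 'M' (0x4d) ^ 0x88 = 0xc5 and 'Z' (0x5a) ^ 0x88 = 0xd2
print(" ".join("%02x" % b for b in encoded))
# XORing again with the same key restores the original bytes
print(xor_bytes(encoded, 0x88) == header)
```

Note that every 0x00 byte becomes the key itself (0x88 here), which is why the large runs of null bytes in the header change color as a block while the skeleton keeps its shape.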
If we think of the red in the first image and the blue in the second image as negative space, we can see the PE skeleton. Okay, that was cute, now let's see if we can detect this in a file stream. The Portable Executable format has a standard structure: the file starts with 'MZ'; at offset 0x3C, four bytes hold the offset of the PE header (lfanew); and "PE" should be found at that offset. This is a heavily dumbed-down version. Check out PE101 by Ange Albertini for an awesome introduction if my definition is unclear. Since these are standard steps, all we have to do is check for the same structure but with XORed data.
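For reference, the unencoded version of those steps might look like the sketch below. `looks_like_pe` and its `base` parameter are hypothetical names, and the sketch checks the full four-byte "PE\0\0" signature rather than just the two letters. The XOR-aware scan that follows applies the same checks after decoding each byte with the candidate key.

```python
import struct

def looks_like_pe(data, base=0):
    """Check for 'MZ' at base and a PE signature at base + lfanew."""
    data = bytes(data)
    # the lfanew field occupies bytes 0x3c..0x3f of the DOS header
    if len(data) < base + 0x40 or data[base:base + 2] != b"MZ":
        return False
    # lfanew is a little-endian 32-bit offset, relative to the 'M'
    (lfanew,) = struct.unpack_from("<I", data, base + 0x3c)
    pe = base + lfanew
    # an out-of-range offset yields a short or empty slice, so this is False
    return data[pe:pe + 4] == b"PE\x00\x00"

# build a tiny fake header: MZ, lfanew = 0x80, PE signature at 0x80
sample = bytearray(0x100)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3c, 0x80)
sample[0x80:0x84] = b"PE\x00\x00"
print(looks_like_pe(sample))
```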

import sys
import struct

# read file into a bytearray
byte = bytearray(open(sys.argv[1], 'rb').read())

# for each byte in the file stream, excluding the last 256 bytes
for i in range(0, len(byte) - 256):
        # KEY ^ VALUE ^ KEY = VALUE; simple way to get the key
        key = byte[i] ^ ord('M')
        # skip non-XOR encoded MZ
        if key == 0:
                continue
        # byte[i] decodes to 'M' by construction; verify the next byte is 'Z'
        if byte[i + 1] ^ key != ord('Z'):
                continue
        # read four bytes, offset to PE aka lfanew, and decode them with key
        lfanew = bytearray(x ^ key for x in byte[i + 0x3c : i + 0x3c + 4])
        # convert the four little-endian bytes to a signed int
        pe_offset = struct.unpack('<i', bytes(lfanew))[0]
        # verify result is not negative and the read stays inside the file
        if pe_offset < 0 or i + pe_offset + 1 >= len(byte):
                continue
        # verify the two decoded bytes are 'P' & 'E'
        if byte[i + pe_offset] ^ key == ord('P') and byte[i + pe_offset + 1] ^ key == ord('E'):
                print("Encoded PE Found, Key %x, Offset %x" % (key, i))
Speed, false-positive testing, etc. are all probably areas of improvement for the code.

If we were to run this on the executable XORed with 0x88 we would be presented with the following output: Encoded PE Found, Key 88, Offset 0. Here is a script that will automatically find an XOR-encoded executable and carve it out using the above code and pefile.
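As a rough illustration of the carving step (this is not the script mentioned above), once a key and offset are reported the stream can be decoded from that offset onward. The sketch assumes a single-byte key and simply decodes to the end of the stream; trimming the result to the executable's real size is the part a PE parser such as pefile would handle. `carve_xor_pe` and the file names in the comments are hypothetical.

```python
def carve_xor_pe(stream, offset, key):
    """Decode everything from offset to the end of stream with a single-byte key."""
    return bytearray(b ^ key for b in bytearray(stream)[offset:])

# Example with the values reported above (key 0x88, offset 0):
# decoded = carve_xor_pe(open("encoded.bin", "rb").read(), 0, 0x88)
# open("carved.exe", "wb").write(decoded)
```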

It's kind of a cool technique to use the Portable Executable structure to find XORed exes. As written it only works for single-byte keys. It could be modified for keys of 2 or 4 bytes; I'm not sure about anything longer. A brute-force approach would probably be better for keys longer than 4 bytes. The skeleton is still visible when an executable is XORed with a key five bytes in size. Using gray tones can help show the skeleton because the contrast is dulled.
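One way the two-byte modification might look is sketched below. The known plaintext 'M' and 'Z' gives both key bytes directly, so 'MZ' no longer verifies anything by itself; to compensate, the sketch checks the full four-byte "PE\0\0" signature. `find_xor2_pe` is a hypothetical name, the reported key is relative to the position of the 'M', and the false-positive rate is untested.

```python
import struct

def find_xor2_pe(data):
    """Yield (offset, key) where data decodes to MZ ... PE with a 2-byte XOR key."""
    data = bytearray(data)
    for i in range(len(data) - 0x40):
        # the known plaintext 'M', 'Z' yields both key bytes directly
        key = (data[i] ^ ord("M"), data[i + 1] ^ ord("Z"))
        if key == (0, 0):
            continue  # skip non-XOR encoded MZ

        # decode n bytes starting rel bytes after the 'M'; the key repeats from there
        def dec(rel, n):
            return bytearray(data[i + rel + k] ^ key[(rel + k) % 2] for k in range(n))

        # lfanew: little-endian signed offset to the PE signature
        lfanew = struct.unpack("<i", bytes(dec(0x3c, 4)))[0]
        if lfanew < 0 or i + lfanew + 4 > len(data):
            continue
        if dec(lfanew, 4) == bytearray(b"PE\x00\x00"):
            yield i, key
```

A four-byte variant would need two more known plaintext bytes, and the bytes after 'MZ' are not fixed across executables, so guessing them is where the approach starts to get shaky.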

Side Note:
I have to admit I'm a huge fan of using bytearrays now. I wish I had learned of them sooner. They are very useful for writing decoders. They remove a lot of the foreplay of checking the computed size (value & 0xFF), using ord() and using chr().
