dism-this.py

dism-this.py is a Python script for analyzing disassembled data within file objects. The script relies on pydasm by Ero Carrera for linear disassembly of the data. The tool runs four simple tests that count the presence of aberrant instructions. It can also display the disassembly of raw data and of ASCII blobs. dism-this.py needs a file passed to it as an argument. Below is an example of the script being run against itself. The file does not contain any valid instructions, but it is a good example of the output.
dism-this.py dism-this.py
Analysis:
        Info: Instructions Disassembled Count 2166
        Error: Invalid Disassembly Count 71
                * Example: ?? jna 0x129
        Invalid: Static Offset Count 97
                * Example: sub [0xd218000a], ecx
        Invalid: Segment Register Use Count 127
                * Example: fs daa
        Anomaly: Infrequent Instruction Use Count 1135
                * Example: arpl [ebp+ecx+0xa],bp
The first line of the analysis is a count of how many instructions were disassembled. The second is a count of how many lines pydasm could not disassemble because they were not valid instructions. The third counts the use of static offsets. The fourth counts the use of segment registers. The latter two are not typically seen. The FS register is used for traversing the PEB to get the base address of Kernel32.dll, but the script only checks the first three characters of the disassembled line. The last test checks whether the instruction is infrequent. Most instructions in executable code are one of twenty-one mnemonics. We can run the following Python code in IDA to get the most frequent mnemonics in the code.
instr = []
ea = ScreenEA()
for funcea in Functions(SegStart(ea), SegEnd(ea)):
    E = list(FuncItems(funcea))
    for e in E:
        instr.append(GetMnem(e))

count = {}
for mnem in instr:
    if mnem in count:
        count[mnem] += 1
    else:
        count[mnem]  = 1
        

popMnem = sorted(count, key = count.get, reverse = True)

print len(popMnem[:35])
print popMnem[:35]
Output of the command on an IDB.
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] 
IDAPython v1.5.5 final (serial 0) (c) The IDAPython Team <idapython@googlegroups.com>
--------------------------------------------------------------------------------------
21
['push', 'call', 'mov', 'pop', 'add', 'inc', 'and', 'movzx', 'cdq', 'idiv', 'shr', 'test', 'or', 'xor', 'sub', 'jz', 'retn', 'jnz', 'jmp', 'jnb', 'cmp']
Note that shellcode uses instructions that are not always seen in normal executable code. These were not included in the list because I did not want to use them as signatures. I'm trying to do this more generically.
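The tally itself does not need IDA. As a rough sketch, the same ranking can be produced with collections.Counter from any list of mnemonic strings. The function name top_mnemonics and the sample list are made up for illustration and are not part of dism-this.py.

```python
from collections import Counter


def top_mnemonics(mnemonics, limit=35):
    """Return up to `limit` mnemonics, most frequent first."""
    counts = Counter(mnemonics)
    return [mnem for mnem, _ in counts.most_common(limit)]


# Hypothetical sample input; inside IDA this list would be built with GetMnem().
sample = ['push', 'mov', 'push', 'call', 'mov', 'push', 'retn']
print(top_mnemonics(sample))
```

The limit of 35 mirrors the slice used in the IDA snippet. If the code contains fewer distinct mnemonics than the limit, the result is simply shorter.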
dism-this.py -h
Usage: dism-this.py [options] data.file

Options:
  -h, --help            show this help message and exit
  -v, --verbose         print disassembly
  -s SKIP, --skip=SKIP  skip n input bytes
  -c COUNT, --count=COUNT
                        disassembly only n input blocks
  -a, --ascii_blob      disassembly ascii blob
dism-this.py has four options besides -h. -v or --verbose prints the disassembly from pydasm. -s or --skip skips n bytes of input. -c or --count reads only n bytes. -a or --ascii_blob disassembles an ASCII blob. An example of an ASCII blob would be '9090', which translates to nop and nop once it has been converted from hex text to binary.
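To make the -s, -c and -a behavior concrete, here is a minimal sketch of the same steps outside the script: slice the raw text first, then decode it two hex characters at a time and stop at the first pair that is not hex. It is written for Python 3, and the names select_region and ascii_blob_to_bytes are hypothetical, not functions from dism-this.py.

```python
def select_region(raw, skip=None, count=None):
    """Mimic -s and -c: drop `skip` leading characters, keep at most `count`."""
    start = skip or 0
    end = None if count is None else start + count
    return raw[start:end]


def ascii_blob_to_bytes(text):
    """Mimic -a: decode hex text pair by pair, stopping at the first bad pair."""
    out = bytearray()
    for i in range(0, len(text), 2):
        try:
            out.append(int(text[i:i + 2], 16))
        except ValueError:
            break
    return bytes(out)


print(ascii_blob_to_bytes('9090'))  # b'\x90\x90', i.e. nop; nop

# Hypothetical file content: a 5-character prefix, the blob, then a trailer.
raw = 'junk:9090cc;trailer'
print(ascii_blob_to_bytes(select_region(raw, skip=5, count=6)))
```

Because the slice happens before decoding, skip and count are measured in characters of the hex text, not in decoded bytes.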

Hex of ASCII blob
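For example, let's say we have the following data that we think is shellcode. The highlighted section is the ASCII blob. The data starts at offset 0x14 and ends at 0xA9. The length of the data is 0x96 bytes. The script applies -s and -c to the raw file before the hex text is decoded, so both values are given in characters of ASCII text.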
dism-this.py -a -s 0x14 -c 0x96 -v asm.txt
Disassembly:
        xor edx,edx
        push edx
        push dword 0x636c6163
        mov esi,esp
        push edx
        push esi
        mov esi,fs:[edx+0x30]
        mov esi,[esi+0xc]
        mov esi,[esi+0xc]
        lodsd
        mov esi,[eax]
        mov edi,[esi+0x18]
        mov ebx,[edi+0x3c]
        mov ebx,[edi+ebx+0x78]
        mov esi,[edi+ebx+0x20]
        add esi,edi
        mov ecx,[edi+ebx+0x24]
        add ecx,edi
        inc edx
        lodsd
        cmp dword [edi+eax],0x456e6957
        jnz 0x2f
        movzx edx,[ecx+edx*2-0x2]
        mov esi,[edi+ebx+0x1c]
        add esi,edi
        add edi,[esi+edx*4]
        call edi
        int3

Analysis:
        Info: Instructions Disassembled Count 28
        Error: Invalid Disassembly Count 0
                * Example: ?? jna 0x129
        Invalid: Static Offset Count 0
                * Example: sub [0xd218000a], ecx
        Invalid: Segment Register Use Count 0
                * Example: fs daa
        Anomaly: Infrequent Instruction Use Count 3
                * Example: arpl [ebp+ecx+0xa],bp
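As a sketch of how these counts come about, the four tests can be re-run on the disassembly lines themselves without pydasm. This is a Python 3 approximation of the checks in the source further down, not the script itself. The names analyze, is_static_offset and LISTING are made up for illustration, and the segment list here uses 'es' without the leading space found in the original source.

```python
import re

POPULAR = ['push', 'call', 'mov', 'pop', 'add', 'inc', 'and', 'movzx', 'cdq',
           'idiv', 'shr', 'test', 'or', 'xor', 'sub', 'jz', 'retn', 'jnz',
           'jmp', 'jnb', 'cmp']
SEGMENTS = ['ds', 'cs', 'ss', 'es', 'gs', 'fs']
BRACKETS = re.compile(r'\[.+?\]')


def is_static_offset(line):
    """True when the first bracketed operand is a bare hex number."""
    match = BRACKETS.search(line)
    if match is None:
        return False
    try:
        int(match.group(0)[1:-1], 16)
        return True
    except ValueError:
        return False


def analyze(lines):
    """Apply the four tests in the same order as the script."""
    counts = {'total': len(lines), 'invalid': 0, 'static': 0,
              'segment': 0, 'infrequent': 0}
    for line in lines:
        if '??' in line:
            counts['invalid'] += 1
            continue
        if is_static_offset(line):
            counts['static'] += 1
            continue
        # Only the first three characters are checked for a segment register.
        counts['segment'] += sum(1 for seg in SEGMENTS if seg in line[:3])
        # Only the first five characters are checked for a popular mnemonic.
        if not any(mnem in line[:5] for mnem in POPULAR):
            counts['infrequent'] += 1
    return counts


# The 28 lines of the disassembly shown above.
LISTING = [
    'xor edx,edx', 'push edx', 'push dword 0x636c6163', 'mov esi,esp',
    'push edx', 'push esi', 'mov esi,fs:[edx+0x30]', 'mov esi,[esi+0xc]',
    'mov esi,[esi+0xc]', 'lodsd', 'mov esi,[eax]', 'mov edi,[esi+0x18]',
    'mov ebx,[edi+0x3c]', 'mov ebx,[edi+ebx+0x78]', 'mov esi,[edi+ebx+0x20]',
    'add esi,edi', 'mov ecx,[edi+ebx+0x24]', 'add ecx,edi', 'inc edx',
    'lodsd', 'cmp dword [edi+eax],0x456e6957', 'jnz 0x2f',
    'movzx edx,[ecx+edx*2-0x2]', 'mov esi,[edi+ebx+0x1c]', 'add esi,edi',
    'add edi,[esi+edx*4]', 'call edi', 'int3',
]
print(analyze(LISTING))
```

Run on this listing, the sketch flags lodsd twice and int3, which lines up with the Infrequent Instruction Use Count of 3. The fs: override in mov esi,fs:[edx+0x30] is not counted because it does not appear in the first three characters of the line.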
Please email me if you find any bugs or have any questions. My email address can be found in the comments of the code. I have created a Bitbucket repo. Please download the script from the repo, as the code below is not the most current.

Source Code - BitBucket Repo

#!/usr/bin/env python

# dism-this.py is a script that analyzes data for the possible detection of shellcode or instructions.  
# Written by alexander dot hanel at gmail dot com    

import re
import sys    
from optparse import OptionParser
try:
    import pydasm
except ImportError:
    print "Error: Pydasm Can Not be Found"
    sys.exit()

class CKASM():
    def __init__(self):
        self.brRegex = re.compile(r'\[.+?\]')
        self.registers = ['eax', 'ebx', 'ecx', 'edx', 'esi', 'edi', 'esp', 'ebp', 'ax', 'bx', 'cx', 'dx', 'ah', 'al', 'bh', 'bp', 'bl', 'ch', 'cl', 'dh', 'dl', 'di', 'si', 'sp', 'ip']
        self.popMnem = ['push', 'call', 'mov', 'pop', 'add', 'inc', 'and', 'movzx', 'cdq', 'idiv', 'shr', 'test', 'or', 'xor', 'sub', 'jz', 'retn', 'jnz', 'jmp', 'jnb', 'cmp']
        self.segment = ['ds', 'cs', 'ss', 'es', 'gs', 'fs']
        self.segmentCount = 0 
        self.errorCount = 0
        self.skip =  None
        self.count = None 
        self.buffer = None
        self.ascii = False
        self.verbose = False
        self.fhandle = None
        self.parser = None
        self.callParser()
        self.checkFileArgs()
        self.getBuffer()
        self.asciiBlob()
        self.errorStaticCount = 0
        self.errorStatic = []
        self.errorInvalidInstCount = 0
        self.errorInvalidInst = []
        self.outcastInstr = 0
        
    def dis(self, buff):
        'disassembles buffer using pydasm, returns assembly in buffer'
        offset = 0
        outDis = []
        while offset < len(buff):
            i = pydasm.get_instruction(buff[offset:],pydasm.MODE_32)
            tmp = pydasm.get_instruction_string(i,pydasm.FORMAT_INTEL,offset)
            outDis.append(tmp)
            if not i:
                return outDis
            offset +=  i.length
        return outDis

    def callParser(self):
        'parses the command line arguments'
        self.parser = OptionParser()
        usage = 'usage: %prog [options] <data.file>'
        self.parser = OptionParser(usage=usage)
        # command options
        self.parser.add_option('-v', '--verbose', action='store_true', dest='verbose', help="print disassembly")
        self.parser.add_option('-s', '--skip', type="int", dest='skip', help='skip n input bytes')
        self.parser.add_option('-c' , '--count', type="int", dest='count', help='disassembly only n input blocks')
        self.parser.add_option('-a', '--ascii_blob', action='store_true', dest='ascii', help='disassembly ascii blob')
        (options, args) = self.parser.parse_args()
        # Assigns passed variables 
        if options.verbose == True:
            self.verbose = True
        if options.skip != None:
            self.skip = options.skip
        if options.count != None:
            self.count = options.count
        if options.ascii != None:
            self.ascii = options.ascii
        
    def analyzeInstr(self, line):
        'add instruction analysis here' 
        if None == line:
            return
        elif '??' in line:
            self.errorInvalidInstCount += 1
            self.errorInvalidInst.append(line)
            return 
        elif '[' in line and ']' in line:
            if self.staticOffset(line) != None:
                self.errorStaticCount += 1
                self.errorStatic.append(line)
                return 
        self.segmentCheck(line)
        self.outcast(line)
        return 
        
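    def checkOffsetBounds(self, line):
        'unused helper, reports static offsets above 0xfffff'
        if line == None or self.brRegex.search(line) == None:
            return
        offset = self.staticOffset(line)
        if offset != None and offset > 0xfffff:
            print "Invalid: Offset %s" % line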
            
    def staticOffset(self, line):
        value = re.search(self.brRegex, line).group(0)[1:-1]
        try:
            tmp = int(value,16)
            return tmp 
        except:
            return None 
    
    def segmentCheck(self,line):
        for seg in self.segment:
            if seg in line[0:3]:
                self.segmentCount += 1
                
    def outcast(self,line):
        'counts the line as infrequent unless a popular mnemonic is in its first five characters'
        for mnem in self.popMnem:
            if mnem in line[0:5]:
                return
        self.outcastInstr += 1
        
    def checkFileArgs(self):
        'janky way for checking file arguments'
        if len(sys.argv) == 1:
            self.parser.print_help()
            sys.exit()
        try:
            # the data file is expected to be the last argument
            self.fhandle = open(sys.argv[-1], 'rb')
        except IOError:
            print "Error: Could not access the file"
            sys.exit()
        
    def asciiBlob(self):
        'converts ascii blobs to binary two bytes at a time'
        if self.ascii == False:
            return 
        from StringIO import StringIO
        tmpBuff = StringIO(self.buffer)
        buff = ''
        b = tmpBuff.read(2)
        while b != '':
            try:
                buff = buff + chr(int(b,16))
                b = tmpBuff.read(2)
            except ValueError:
                break
        self.buffer = buff
        
    def getBuffer(self):
        'checks the skip and count contents then reads the data to a buffer'
        if self.skip != None:
            self.fhandle.seek(self.skip)
        if self.count != None:
            self.buffer = self.fhandle.read(int(self.count))
            return
        self.buffer = self.fhandle.read()
        return 
        
    def start(self):
        'disneyland'
        disO = self.dis(self.buffer)
        for assemblyLine in list(disO):
            self.analyzeInstr(assemblyLine)
        if self.verbose == True:
            self.verbosed(disO)
        self.output(disO)
            
    def output(self,disO):
        'print output of analysis'
        print "Analysis:"
        print "\tInfo: Instructions Disassembled Count %s" % len(disO)
        print "\tError: Invalid Disassembly Count %s" % self.errorInvalidInstCount
        print "\t\t* Example: ?? jna 0x129"    
        print "\tInvalid: Static Offset Count %s " % self.errorStaticCount
        print "\t\t* Example: sub [0xd218000a], ecx"    
        print "\tInvalid: Segment Register Use Count %s " % self.segmentCount
        print "\t\t* Example: fs daa"
        print "\tAnomaly: Infrequent Instruction Use Count %s " % self.outcastInstr
        print "\t\t* Example: arpl [ebp+ecx+0xa],bp"
        print
    
    def verbosed(self, disO):
        'print disassembly'
        print 'Disassembly:'
        for assemblyLine in list(disO):
            print '\t' + assemblyLine
        print 
    

def main():
    ck = CKASM()
    ck.start()

if __name__ == "__main__":
    main()
