xxxswf.py

Note: Please check the xxxswf.py repo for the most current version.  The current version handles extracting and decompressing LZMA (ZWS) embedded SWFs.

xxxswf.py is a Python script for carving, scanning, compressing, decompressing and analyzing Flash SWF files. The script can be used on a single SWF, on one or more SWFs embedded in a file stream, or on all files in a directory. The tool could be useful for system administrators, incident responders, exploit analysts, malware analysts or web developers.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -h
Usage: xxxswf.py [options] file.bad

Options:
  -h, --help            show this help message and exit
  -x, --extract         Extracts the embedded SWF(s), names it MD5HASH.swf &
                        saves it in the working dir. No addition args needed
  -y, --yara            Scans the SWF(s) with yara. If the SWF(s) is
                        compressed it will be deflated. No addition args
                        needed
  -s, --md5scan         Scans the SWF(s) for MD5 signatures. Please see func
                        checkMD5 to define hashes. No addition args needed
  -H, --header          Displays the SWFs file header. No addition args needed
  -d, --decompress      Deflates compressed SWFS(s)
  -r PATH, --recdir=PATH
                        Will recursively scan a directory for files that
                        contain SWFs. Must provide path in quotes
  -c, --compress        Compresses the SWF using Zlib
Running xxxswf.py with no options on a file gives very simple output. The [SUMMARY] line shows the number of embedded SWFs plus the MD5 and name of the scanned file. Each [ADDR] line shows the offset of an embedded SWF and its header type: FWS is an uncompressed SWF and CWS is a SWF compressed with zlib.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py test.swf
[SUMMARY] 1 SWF(s) in MD5:7ca4ab177f480503653702b33366111f:test.swf
        [ADDR] SWF 1 at 0xa18  - CWS Header
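For reference, every SWF starts with an 8-byte header: a three-byte signature (FWS or CWS), a one-byte version, and a four-byte little-endian file length that counts the uncompressed SWF including the header. A minimal Python 3 sketch of reading it with the struct module follows; parse_swf_header is a hypothetical helper, not part of xxxswf.py (which targets Python 2).

```python
import struct

def parse_swf_header(data, offset=0):
    """Return (signature, version, length) for the SWF header at offset,
    or None if there are too few bytes or the signature is not FWS/CWS."""
    header = data[offset:offset + 8]
    if len(header) < 8:
        return None
    # 3-byte signature, unsigned byte version, little-endian UI32 length
    signature, version, length = struct.unpack('<3sBI', header)
    if signature not in (b'FWS', b'CWS'):
        return None
    return signature.decode('ascii'), version, length

# A made-up FWS header: version 10, declared length 21 bytes
sample = b'FWS' + bytes([10]) + struct.pack('<I', 21)
print(parse_swf_header(sample))
```

Anything that is not FWS/CWS, or is too short to hold a full header, comes back as None.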
With the -x (--extract) option, each embedded SWF is carved and saved to the working directory. The file name is the MD5 of the decompressed SWF plus the '.swf' extension. If a file with the same MD5 already exists, the new file is named MD5.count.swf; the count only goes up to 50. A useful example of this is given later.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -x x.bin
[SUMMARY] 2 SWF(s) in MD5:32fed596fa850057211121488f6c6b75:x.bin
        [ADDR] SWF 1 at 0x0  - FWS Header
                [FILE] Carved SWF MD5: c46299a5015c6d31ad5766cb49e4ab4b.swf
        [ADDR] SWF 2 at 0x7774  - FWS Header
                [FILE] Carved SWF MD5: c46299a5015c6d31ad5766cb49e4ab4b.2.swf
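The naming scheme above can be sketched in Python 3 as follows. carved_name and its directory/limit parameters are hypothetical; the sketch returns None once MD5.swf through MD5.50.swf are all taken, so a caller can skip the write instead of overwriting an existing file.

```python
import hashlib
import os

def carved_name(data, directory='.', ext='swf', limit=50):
    """Return a free file name for data: MD5.ext first, then MD5.2.ext,
    MD5.3.ext, ... up to MD5.<limit>.ext. Returns None once all are taken."""
    md5 = hashlib.md5(data).hexdigest()
    candidate = os.path.join(directory, '%s.%s' % (md5, ext))
    if not os.path.exists(candidate):
        return candidate
    for count in range(2, limit + 1):
        candidate = os.path.join(directory, '%s.%d.%s' % (md5, count, ext))
        if not os.path.exists(candidate):
            return candidate
    # Every name from MD5.ext to MD5.<limit>.ext already exists
    return None
```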
The -r or --recdir option recursively searches a directory for files containing SWFs and, combined with -x, carves them all out. This could be used on a browser's temporary internet files directory or on a repository of malicious documents. It's recommended to redirect the output to a text file. The path needs to be in quotes. A scan can take a few minutes, depending on the size of the directory and the speed of your processor.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -x -r "C:\Documents and Settings\XOR\Desktop\samples\mal" > out.txt

vi out.txt

[SUMMARY] 1 SWF(s) in MD5:93d63b5f9167d7ab579ca9bd70d1dd3e:C:\Documents and Settings\XOR\Desktop\samples\mal\301.xls=
    [ADDR] SWF 1 at 0x13ef81 - [ERROR]: Zlib decompression error. Invalid CWS SWF

[SUMMARY] 1 SWF(s) in MD5:d2cad99c92a1a43b8ed0c217b6a501af:C:\Documents and Settings\XOR\Desktop\samples\mal\CVE-2009-3129.xls
    [ADDR] SWF 1 at 0x13ef81 - [ERROR]: Zlib decompression error. Invalid CWS SWF

[SUMMARY] 1 SWF(s) in MD5:358895e898866ef0432391b931096209:C:\Documents and Settings\XOR\Desktop\samples\mal\CWS.swf
    [ADDR] SWF 1 at 0x0  - CWS Header
        [FILE] Carved SWF MD5: f05ba07d32e9a7b47a18aa3f172ad4e5.swf

[SUMMARY] 1 SWF(s) in MD5:c46299a5015c6d31ad5766cb49e4ab4b:C:\Documents and Settings\XOR\Desktop\samples\mal\simple.swf
    [ADDR] SWF 1 at 0x0  - FWS Header
        [FILE] Carved SWF MD5: c46299a5015c6d31ad5766cb49e4ab4b.3.swf

[SUMMARY] 7 SWF(s) in MD5:7089ec4198e70f58f09547201ae4e185:C:\Documents and Settings\XOR\Desktop\samples\mal\swfxxx.py
    [ADDR] SWF 1 at 0x607  - [ERROR] Invalid SWF Version
    [ADDR] SWF 2 at 0x60b  - [ERROR] Invalid SWF Version
    [ADDR] SWF 3 at 0x958  - [ERROR] Invalid SWF Version
    [ADDR] SWF 4 at 0x981  - [ERROR] Invalid SWF Size
    [ADDR] SWF 5 at 0x18d0  - [ERROR] Invalid SWF Size
    [ADDR] SWF 6 at 0x1c45  - [ERROR] Invalid SWF Size
    [ADDR] SWF 7 at 0x1cc3  - [ERROR] Invalid SWF Size
....
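The -r walk boils down to os.walk plus the same signature regex, run on every readable file. A Python 3 sketch follows, where walk_for_swfs is a hypothetical name; it only collects candidate offsets, each of which still has to be verified as described next.

```python
import os
import re

# Same raw signatures the script searches for
SIG = re.compile(b'CWS|FWS')

def walk_for_swfs(path):
    """Yield (file_path, [offsets]) for each file under path that contains
    at least one 'CWS' or 'FWS' signature. Unreadable files are skipped."""
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                with open(full, 'rb') as fh:
                    data = fh.read()
            except OSError:
                continue  # skip files we can't open and keep walking
            offsets = [m.start() for m in SIG.finditer(data)]
            if offsets:
                yield full, offsets
```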
The search for embedded SWFs is done simply with a regular expression that matches "FWS" or "CWS". This generic search returns false positives, so each hit is verified by checking for a valid version, a valid size and, if compressed, valid zlib decompression (see the function verifySWF()). This approach is slow, but it works. The output above shows the different errors this produces; every error line contains the string "[ERROR]". In a large enough sample set, the same carved MD5 file names are likely to recur, so xxxswf.py can be used to classify or alert on commonly seen SWFs by MD5. The function checkMD5 can be edited to alert on specific MD5s.
def checkMD5(md5):
# checks if MD5 has been seen in MD5 Dictionary 
# MD5Dict contains the MD5 and the CVE
# For { 'MD5':'CVE', 'MD5-1':'CVE-1', 'MD5-2':'CVE-2'}
    MD5Dict = {'c46299a5015c6d31ad5766cb49e4ab4b':'CVE-XXXX-XXXX'}
    if MD5Dict.get(md5):
        print '\t[BAD] MD5 Match on', MD5Dict.get(md5)
    return
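The regex-then-verify approach described above can be sketched in Python 3 as below. find_valid_swfs is a hypothetical name, and its checks approximate verifySWF() rather than copy it: a full 8-byte header, a version no higher than 20, a declared size of at least 8 bytes, a complete zlib stream for CWS hits, and (stricter than the script) an FWS body that fits inside the data.

```python
import re
import struct
import zlib

def find_valid_swfs(data, max_version=20):
    """Return (offset, swf_bytes) for each 'CWS'/'FWS' hit that passes the
    checks. CWS hits are returned decompressed with an FWS signature."""
    results = []
    for m in re.finditer(b'CWS|FWS', data):
        off = m.start()
        header = data[off:off + 8]
        if len(header) < 8:
            continue  # signature too close to the end for a full header
        sig, version, size = struct.unpack('<3sBI', header)
        if version > max_version or size < 8:
            continue
        if sig == b'CWS':
            d = zlib.decompressobj()
            try:
                body = d.decompress(data[off + 8:])
            except zlib.error:
                continue  # not a valid zlib stream
            if not d.eof:
                continue  # stream is truncated
            swf = b'FWS' + header[3:] + body
        else:
            if off + size > len(data):
                continue  # declared size runs past the end of the data
            swf = data[off:off + size]
        results.append((off, swf))
    return results
```

Trailing bytes after a CWS stream (the rest of the carrier file) end up in the decompressor's unused_data and are ignored.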
The MD5 "c46299a5015c6d31ad5766cb49e4ab4b" is the SWF carved from x.bin in the -x example above. MD5 scanning is enabled by passing -s or --md5scan. All hash or signature alerts contain the string [BAD].
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -s x.bin
[SUMMARY] 2 SWF(s) in MD5:32fed596fa850057211121488f6c6b75:x.bin
        [ADDR] SWF 1 at 0x0  - FWS Header
        [BAD] MD5 Match on CVE-XXXX-XXXX
        [ADDR] SWF 2 at 0x7774  - FWS Header
        [BAD] MD5 Match on CVE-XXXX-XXXX
xxxswf.py can decompress a single SWF with the -d (--decompress) option. The decompressed SWF is saved to the working directory, named by its MD5.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -d test.swf
[SUMMARY] 1 SWF(s) in MD5:7ca4ab177f480503653702b33366111f:test.swf
        [ADDR] SWF 1 at 0xa18  - CWS Header
                [FILE] Carved SWF MD5: f0f40a975ef68cf6358f84515a8f103e.4.swf
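Decompressing a CWS SWF only changes what follows the header: the signature becomes FWS, the version and length bytes are kept, and everything after byte 8 is inflated with zlib. A Python 3 sketch under those assumptions; decompress_swf is a hypothetical helper, and the length check relies on the header length being the uncompressed size including the 8 header bytes.

```python
import struct
import zlib

def decompress_swf(cws):
    """Turn a CWS (zlib) SWF into an FWS SWF."""
    if cws[:3] != b'CWS':
        raise ValueError('not a CWS SWF')
    body = zlib.decompress(cws[8:])
    declared = struct.unpack('<I', cws[4:8])[0]
    if declared != 8 + len(body):
        # The header length should equal the uncompressed size incl. header
        print('warning: header says %d bytes, got %d' % (declared, 8 + len(body)))
    return b'FWS' + cws[3:8] + body
```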
It can compress SWFs with the -c (--compress) option. Note: in testing, I wasn't able to decompress a SWF downloaded from the internet and compress it again to get a matching MD5; a single byte is off. If someone could give me a clue on this one or recommend another technique, please let me know.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -c f0f40a975ef68cf6358f84515a8f103e.2.swf
[SUMMARY] 1 SWF(s) in MD5:f0f40a975ef68cf6358f84515a8f103e:f0f40a975ef68cf6358f84515a8f103e.2.swf
        [ADDR] SWF 1 at 0x0  - FWS Header
                [FILE] Compressed SWF MD5: e9e6c13c461dc38006ff7d26c18e904e.swf
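On the single-byte mismatch: one thing worth checking (an assumption, not verified against the sample in question) is the zlib compression level. The second byte of a zlib stream, FLG, records the level class, so compressSWF(), which uses Python's default level 6, always starts its stream with 78 9c, while a SWF compressed at level 9 starts with 78 da; the deflate data that follows can also differ between levels. The Python 3 sketch below, with a hypothetical compress_swf helper and level parameter, re-compresses an FWS SWF at several levels and prints the two zlib header bytes.

```python
import zlib

def compress_swf(fws, level=zlib.Z_DEFAULT_COMPRESSION):
    """Turn an FWS SWF into a CWS SWF: swap the signature, keep the version
    and length fields, and zlib-compress everything after byte 8."""
    if fws[:3] != b'FWS':
        raise ValueError('not an FWS SWF')
    return b'CWS' + fws[3:8] + zlib.compress(fws[8:], level)

body = bytes(100)
fws = b'FWS' + bytes([10]) + (8 + len(body)).to_bytes(4, 'little') + body
for level in (1, 6, 9):
    cws = compress_swf(fws, level)
    # cws[8:10] is the zlib stream header (CMF, FLG); FLG encodes the level class
    print(level, cws[8:10].hex())
```

With the standard zlib library this prints 7801, 789c and 78da for levels 1, 6 and 9. Comparing those two bytes in the original and the re-compressed file is a quick way to see whether the level is part of the difference.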
The SWF header information can be displayed with -H or --header.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -H 11cc16d78597fe9999b7f6b714727ac3.10.swf
[SUMMARY] 1 SWF(s) in MD5:11cc16d78597fe9999b7f6b714727ac3:11cc16d78597fe9999b7f6b714727ac3.10.swf
        [ADDR] SWF 1 at 0x0  - FWS Header
        [HEADER] File header: FWS
        [HEADER] File version: 7
        [HEADER] File size: 52647
        [HEADER] Rect Nbit: 15
        [HEADER] Rect Xmin: 0
        [HEADER] Rect Xmax: 11000
        [HEADER] Rect Ymin: 0
        [HEADER] Rect Ymax: 3600
        [HEADER] Frame Rate: 7936
        [HEADER] Frame Count: 1
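The Rect and Frame Rate lines above come from bit-packed fields that follow the 8-byte header. A Python 3 sketch of decoding them from an uncompressed SWF, based on the layout in Adobe's SWF file format specification: RECT is a 5-bit Nbits value followed by four signed Nbits-wide fields, then FrameRate is a little-endian 8.8 fixed-point UI16 and FrameCount a UI16. parse_rect_and_rates is a hypothetical name.

```python
import struct

def parse_rect_and_rates(data):
    """Decode RECT, frame rate and frame count from an uncompressed (FWS) SWF.
    RECT fields (in twips) are packed MSB first and padded to a whole byte."""
    nbits = data[8] >> 3
    rect_len = (5 + 4 * nbits + 7) // 8          # bytes used by the RECT
    rect = int.from_bytes(data[8:8 + rect_len], 'big')
    total_bits = rect_len * 8

    def field(index):
        # Extract field 'index' (0..3), counting from the RECT's first bit
        start = 5 + index * nbits
        value = (rect >> (total_bits - start - nbits)) & ((1 << nbits) - 1)
        if nbits and value >> (nbits - 1):        # sign bit set
            value -= 1 << nbits                   # two's complement
        return value

    xmin, xmax, ymin, ymax = (field(i) for i in range(4))
    pos = 8 + rect_len
    rate_raw, frame_count = struct.unpack('<HH', data[pos:pos + 4])
    return {'nbits': nbits, 'xmin': xmin, 'xmax': xmax, 'ymin': ymin,
            'ymax': ymax, 'frame_rate': rate_raw / 256.0,  # 8.8 fixed point
            'frame_count': frame_count}
```

For a header holding the values shown above (Nbits 15, Xmax 11000, Ymax 3600, raw rate 7936), this returns a frame rate of 31.0.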
The Rect values are in twips (20 twips per pixel), so the example above describes a 550 x 180 pixel stage. Frame Rate is printed as the raw 8.8 fixed-point value from the header; 7936 / 256 = 31 frames per second. The script can also scan the decompressed SWF(s) with Yara using -y or --yara. This makes it easy to create signatures for malicious SWF files that do not have static MD5s. Because only the SWF data is scanned, rather than the whole carrier file, the signatures can be a little more generic. Let's walk through an example using information gathered from the excellent write-up by Microsoft.

http://blogs.technet.com/b/mmpc/archive/2011/03/17/a-technical-analysis-on-the-cve-2011-0609-adobe-flash-player-vulnerability.aspx 

After reading the write-up, we know some key things: what triggers the exploit (a bytecode verification error), that there is some shellcode, and that there is some code for creating the heap spray. The analysis gives a nice clue about the exploit and what to target. From the analysis: "The Adobe Flash file embedded inside the Excel file is another carrier for the exploit. It loads shellcode inside memory, performs heap-spraying, and loads a Flash byte stream from memory to exploit the 0-day vulnerability". If you look closely at the byte stream in the screenshot, you will notice the string "43575309". What would this string, or Flash byte stream, look like if it were actual binary data rather than a string?
import sys
s = "43575309"
# Convert each pair of hex digits back into the byte it represents
for i in range(0, len(s), 2):
    sys.stdout.write(chr(int(s[i:i+2], 16)))

CWS
As mentioned earlier, 'CWS' is the header of a compressed SWF, and the next byte, 0x09 (nine), is the Flash version. So we have an embedded SWF that is stored in a byte array as ASCII hex and then converted from hex to binary. Let's create a Yara signature targeting this. Note: the string hexToBin is the name of a function and is somewhat arbitrary. It's better to go after the code or data related to triggering the exploit, but this exploit is a little more difficult because the trigger is embedded in a compressed SWF stored as ASCII hex. For more information, please see my poor-grammar-non-proof-read post called An Intro to Creating Anti-Virus Signatures.
rule CVE_2011_0609
{  
strings:  
    $CWSHeader = "435753"
    $FWSHeader = "465753"
    $hex2bin = "hexToBin"
    
condition:  
    ($CWSHeader or $FWSHeader) and $hex2bin
}
Saved in the working directory as rules.yar, which is the file name yaraScan() compiles.
C:\Documents and Settings\XOR\My Documents\Projects\swfxxx>python xxxswf.py -y "CVE-2011-0609_.xls__"
[SUMMARY] 1 SWF(s) in MD5:4bb64c1da2f73da11f331a96d55d63e2:CVE-2011-0609_.xls__
        [ADDR] SWF 1 at 0xa18  - FWS Header
        [BAD] Yara Signature Hit: CVE_2011_0609
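In this sample the inner SWF is stored as ASCII hex inside the outer SWF, so the CWS|FWS regex used by findSWF() never sees it; the Yara rule only flags it. A Python 3 sketch for pulling such candidates out, assuming the hex digits sit in one contiguous run (for example a string constant in the decompressed outer SWF): it looks for runs that start with the hex for CWS or FWS and converts them back to bytes with binascii.unhexlify. extract_hex_swfs is a hypothetical name, and the decoded candidates still need the same version, size and zlib checks as raw SWFs.

```python
import binascii
import re

# "CWS" (435753) or "FWS" (465753) written as ASCII hex, then more hex pairs
HEX_SWF = re.compile(rb'(?:435753|465753)(?:[0-9A-Fa-f]{2})+')

def extract_hex_swfs(data):
    """Return the decoded bytes of every hex-encoded SWF candidate in data."""
    return [binascii.unhexlify(m.group()) for m in HEX_SWF.finditer(data)]
```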
If you would like to import this script, there is a function called bad() that scans the SWFs in a file with both the MD5 list and Yara. It needs to be passed an open file handle, and its output then needs to be parsed for lines containing [BAD]. If you're interested in Yara and MD5 signatures, feel free to contact me. I won't be posting my signature sets, but I might be able to share depending on the organization or group.
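If importing the Python 2 script isn't convenient, another option is to run it as a separate process and keep only the [BAD] lines. A Python 3 sketch; bad_lines and scan_with_xxxswf are hypothetical helpers, and the interpreter name python2, the script path and the presence of rules.yar in the working directory are all assumptions.

```python
import subprocess

def bad_lines(output):
    """Return the stripped [BAD] alert lines from xxxswf.py output text."""
    return [line.strip() for line in output.splitlines() if '[BAD]' in line]

def scan_with_xxxswf(path, script='xxxswf.py', python='python2'):
    """Run xxxswf.py with -s (MD5) and -y (Yara) on one file and return
    its [BAD] lines. Interpreter name and script location are assumptions."""
    result = subprocess.run([python, script, '-s', '-y', path],
                            capture_output=True, text=True)
    return bad_lines(result.stdout)
```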

Summary
The goal of this tool is to be able to work with embedded SWF files in an easy and quick way. This script is a work in progress. With a recent move to NYC I needed a new project. If you find any bugs or have some comments please contact me or leave a comment.


xxxswf.py - download

# xxxswf.py was created by alexander dot hanel at gmail dot com
# version 0.1 
# Date - 12-07-2011 
# To do list
#   - Tag Parser
#   - ActionScript Decompiler

import fnmatch 
import hashlib
import imp
import math
import os
import re
import struct
import sys
import time
from StringIO import StringIO
from optparse import OptionParser
import zlib

def checkMD5(md5):
# checks if MD5 has been seen in MD5 Dictionary 
# MD5Dict contains the MD5 and the CVE
# For { 'MD5':'CVE', 'MD5-1':'CVE-1', 'MD5-2':'CVE-2'}
    MD5Dict = {'c46299a5015c6d31ad5766cb49e4ab4b':'CVE-XXXX-XXXX'}
    if MD5Dict.get(md5):
        print '\t[BAD] MD5 Match on', MD5Dict.get(md5)
    return    

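def bad(f):
    # Scans every SWF found in the open file handle f with Yara and the
    # MD5 list. Results are printed; look for lines containing [BAD].
    # findSWF reads from the current position, so start at the beginning.
    f.seek(0)
    for x in findSWF(f):
        tmp = verifySWF(f,x)
        if tmp != None:
            yaraScan(tmp)
            checkMD5(hashBuff(tmp))
    return 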
    
def yaraScan(d):
# d = buffer of the read file 
# Scans SWF using Yara
    # test if yara module is installed
    # if not Yara can be downloaded from http://code.google.com/p/yara-project/
    try:
        imp.find_module('yara')
        import yara 
    except ImportError:
        print '\t[ERROR] Yara module not installed - aborting scan'
        return
    # test for yara compile errors
    try:
        r = yara.compile(r'rules.yar')
    except Exception:
        print '\t[ERROR] Yara compile error - aborting scan'
        return
    # get matches
    m = r.match(data=d)
    # print matches
    for X in m:
        print '\t[BAD] Yara Signature Hit:', X
    return

def findSWF(d):
# d = buffer of the read file 
# Search for SWF Header Sigs in files
    return [tmp.start() for tmp in re.finditer('CWS|FWS', d.read())]

def hashBuff(d):
# d = buffer of the read file 
# This function hashes the buffer
# source: http://stackoverflow.com/q/5853830
    if type(d) is str:
      d = StringIO(d)
    md5 = hashlib.md5()
    while True:
        data = d.read(128)
        if not data:
            break
        md5.update(data)
    return md5.hexdigest()

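def verifySWF(f,addr):
    # f = open file handle (or StringIO), addr = offset of a candidate
    # 'CWS'/'FWS' signature. Returns the SWF (decompressed if it was CWS)
    # as a string, or None if the candidate is not a valid SWF.
    # Start of SWF
    f.seek(addr)
    # Read the 8 byte header: signature (3), version (1), size (4)
    raw = f.read(8)
    # A signature near the end of the stream can't hold a full header
    if len(raw) < 8:
        print ' - [ERROR] Invalid SWF Size'
        return None
    header = raw[:3]
    # Version is an unsigned byte
    ver = struct.unpack('<B', raw[3])[0]
    # SWF Size is the uncompressed length of the whole SWF, header included
    size = struct.unpack('<i', raw[4:8])[0]
    # A negative size, or one smaller than the header, is never valid
    if size < 8:
        print ' - [ERROR] Invalid SWF Size'
        return None
    # Start of SWF
    f.seek(addr)
    try:
        # Read SWF into buffer. If compressed, this reads the uncompressed
        # size, which normally covers the whole compressed stream.
        t = f.read(size)
    except Exception:
        # Error check for invalid SWF (e.g. a huge bogus size)
        print ' - [ERROR] Invalid SWF Size'
        return None
    f = StringIO(t)
    # Error check for version above 20
    if ver > 20:
        print ' - [ERROR] Invalid SWF Version'
        return None
    
    if header == 'CWS':
        try:
            f.read(3)
            # Keep version and size, swap the signature, inflate the rest
            tmp = 'FWS' + f.read(5) + zlib.decompress(f.read())
            print ' - CWS Header'
            return tmp
        except zlib.error:
            print '- [ERROR]: Zlib decompression error. Invalid CWS SWF'
            return None
        
    elif header == 'FWS':
        tmp = f.read(size)
        print ' - FWS Header'
        return tmp
        
    else:
        print ' - [Error] Logic Error Blame Programmer'
        return None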
    
def headerInfo(f):
# f is the already opened file handle (or the SWF as a string)
# Yes, the format is a rip-off of SWFDump. Can you blame me? Their tool is awesome.
    # SWFDump FORMAT    
    # [HEADER]        File version: 8
    # [HEADER]        File is zlib compressed. Ratio: 52%
    # [HEADER]        File size: 37536
    # [HEADER]        Frame rate: 18.000000
    # [HEADER]        Frame count: 323
    # [HEADER]        Movie width: 217.00
    # [HEADER]        Movie height: 85.00
    if type(f) is str:
      f = StringIO(f)
    sig = f.read(3)             
    print '\t[HEADER] File header:', sig
    if 'C' in sig:
        print '\t[HEADER] File is zlib compressed.'
    version = struct.unpack('<b', f.read(1))[0]
    print '\t[HEADER] File version:', version
    size = struct.unpack('<i', f.read(4))[0]
    print '\t[HEADER] File size:', size
    # deflate compressed SWF
    if 'C' in sig:
        f = verifySWF(f,0)
        if type(f) is str:
            f = StringIO(f)
        f.seek(0, 0)
        x = f.read(8)
    ta = f.tell()
    # RECT: a 5 bit Nbits field followed by four signed Nbits wide fields
    # (Xmin, Xmax, Ymin, Ymax), packed MSB first and padded to a whole byte
    nbit = struct.unpack('<B', f.read(1))[0] >> 3
    print '\t[HEADER] Rect Nbit:', nbit
    f.seek(ta)
    rect = f.read(int(math.ceil((5 + nbit*4)/8.0)))
    # Expand the RECT bytes into a string of bits (bin requires Python 2.6+)
    bits = ''.join(bin(ord(c))[2:].zfill(8) for c in rect)
    def sbits(s):
        # Two's complement value of a bit string
        if not s:
            return 0
        v = int(s, 2)
        if s[0] == '1':
            v -= 1 << len(s)
        return v
    xmin = sbits(bits[5:5+nbit])
    print '\t[HEADER] Rect Xmin:', xmin
    xmax = sbits(bits[5+nbit:5+nbit*2])
    print '\t[HEADER] Rect Xmax:', xmax
    ymin = sbits(bits[5+nbit*2:5+nbit*3])
    print '\t[HEADER] Rect Ymin:', ymin
    ymax = sbits(bits[5+nbit*3:5+nbit*4])
    print '\t[HEADER] Rect Ymax:', ymax
    # Frame Rate is an 8.8 fixed point value; the raw UI16 is printed
    # (divide by 256.0 for frames per second)
    framerate = struct.unpack('<H', f.read(2))[0]
    print '\t[HEADER] Frame Rate:', framerate
    framecount = struct.unpack('<H', f.read(2))[0]
    print '\t[HEADER] Frame Count:', framecount
       
def walk4SWF(path):
    # Returns a list of [file-path, [addr1, addr2, ...]] for every file
    # under path that contains at least one 'CWS'/'FWS' signature.
    r = []
    if not os.path.isdir(path):
        print '\t[ERROR] walk4SWF path must be a dir.'
        return r
    for root, dirs, files in os.walk(path):
        for name in files:
            fpath = os.path.join(root, name)
            try:
                x = open(fpath, 'rb')
            except IOError:
                # Skip unreadable files but keep walking the directory
                continue
            y = findSWF(x)
            x.close()
            if len(y) != 0:
                # Path of the file and the offsets of each SWF header
                r.append([fpath, y])
    return r

def tagsInfo(f):
    return

def fileExist(n, ext):
    # Checks the working dir to see if the file is
    # already in the dir. If it exists the file will
    # be named name.count.ext (n.c.ext). No more than
    # 50 files with a matching MD5 will be written to the
    # dir; once that limit is hit None is returned.
    if not os.path.exists(n + '.' + ext):
        return n + '.' + ext
    c = 2
    while os.path.exists(n + '.' + str(c) + '.' + ext):
        c = c + 1
        if c > 50:
            print '\t[ERROR] Skipped 50 Matching MD5 SWFs'
            return None
    return n + '.' + str(c) + '.' + ext
    
def CWSize(f):
    # The file size in the header is of the uncompressed SWF.
    # To estimate the size of the compressed data, we can grab
    # the length, read that amount, deflate the data, then
    # compress the data again, and then call len(). This will
    # give us the length of the compressed SWF. 
    return

def compressSWF(f):
    # Swaps the FWS signature for CWS and zlib compresses everything
    # after the 8 byte header (version and size are kept as-is)
    if type(f) is str:
        f = StringIO(f)
    try:
        f.read(3)
        tmp = 'CWS' + f.read(5) + zlib.compress(f.read())
        return tmp
    except zlib.error:
        print '\t[ERROR] SWF Zlib Compression Failed'
        return None

def writeSWF(swf, label):
    # Writes swf to MD5.swf (or MD5.count.swf) in the working dir
    name = fileExist(hashBuff(swf), 'swf')
    if name == None:
        return
    print '\t\t[FILE] %s SWF MD5: %s' % (label, name)
    try:
        o = open(name, 'wb+')
    except IOError, e:
        print '\t[ERROR] Could Not Create %s ' % e
        return
    o.write(swf)
    o.close()

def disneyland(f,filename, options):
    # because this is where the magic happens
    # but seriously I did the recursion part last..
    retfindSWF = findSWF(f)
    f.seek(0)
    print '\n[SUMMARY] %d SWF(s) in MD5:%s:%s' % ( len(retfindSWF),hashBuff(f), filename )
    # for each SWF in file 
    for idx, x in enumerate(retfindSWF):
        print '\t[ADDR] SWF %d at %s' % (idx+1, hex(x)),
        swf = verifySWF(f,x)
        if swf == None:
            continue
        if options.extract != None:
            writeSWF(swf, 'Carved')
        if options.yara != None:
            yaraScan(swf)
        if options.md5scan != None:
            checkMD5(hashBuff(swf))
        if options.decompress != None:
            # verifySWF already returned the decompressed SWF
            writeSWF(swf, 'Carved')
        if options.header != None:
            headerInfo(swf)
        if options.compress != None:
            cws = compressSWF(swf)
            if cws != None:
                writeSWF(cws, 'Compressed')

def main():
    # Scenarios:
    # Scan file for SWF(s)
    # Scan file for SWF(s) and extract them 
    # Scan file for SWF(s) and scan them with Yara
    # Scan file for SWF(s), extract them and scan with Yara
    # Scan directory recursively for files that contain SWF(s) 
    # Scan directory recursively for files that contain SWF(s) and extract them
    
    usage = 'usage: %prog [options] <file.bad>'
    parser = OptionParser(usage=usage)
    parser.add_option('-x', '--extract', action='store_true', dest='extract', help='Extracts the embedded SWF(s), names it MD5HASH.swf & saves it in the working dir. No addition args needed')
    parser.add_option('-y', '--yara', action='store_true', dest='yara', help='Scans the SWF(s) with yara. If the SWF(s) is compressed it will be deflated. No addition args needed')
    parser.add_option('-s', '--md5scan', action='store_true', dest='md5scan', help='Scans the SWF(s) for MD5 signatures. Please see func checkMD5 to define hashes. No addition args needed')
    parser.add_option('-H', '--header', action='store_true', dest='header', help='Displays the SWFs file header. No addition args needed')
    parser.add_option('-d', '--decompress', action='store_true', dest='decompress', help='Deflates compressed SWFS(s)')
    parser.add_option('-r', '--recdir', dest='PATH', type='string', help='Will recursively scan a directory for files that contain SWFs. Must provide path in quotes')
    parser.add_option('-c', '--compress', action='store_true', dest='compress', help='Compresses the SWF using Zlib')
    
    (options, args) = parser.parse_args()

    # Print help if no arguments are passed
    if len(sys.argv) < 2:
        parser.print_help()
        return

    # Note files can't start with '-'
    if sys.argv[-1].startswith('-') and options.PATH == None:
        parser.print_help()
        return
    
    # Recursive Search
    if options.PATH != None:
        paths = walk4SWF(options.PATH)
        for y in paths or []:
            try:
                # Read-only access is enough for scanning and carving
                t = open(y[0], 'rb')
            except IOError:
                continue
            disneyland(t, y[0], options)
            t.close()
        return 
        
    # try to open file 
    filename = sys.argv[-1]
    try:
        f = open(filename, 'rb')
    except IOError:
        print '[ERROR] File can not be opened/accessed'
        return

    disneyland(f,filename,options)
    f.close()
    return 
        
if __name__ == '__main__':
   main() 

4 comments:

  1. Superb practical work showing through this blog and i am really glade to join this blog through this commenting.

  2. What does it mean that the header says [SUMMARY] 0 SWF(s)?

    1. Do you have an example? My email is in the code. Send me an email and we can check it out.

  3. what is the license under which the xxxswf.py is available.
