Hooked on Mnemonics Worked for Me: Malware Randomization Via Resource Compilers

A while back I was searching for a set of Gozi samples. The main indicator I had to search on was an IP address, and with it I was able to track down 40+ samples with unique MD5 hashes. When reviewing the samples, something immediately caught my eye: the variation in file sizes. Typically variants deviate in size by only a couple of bytes. Ssdeep had only three matches across all the files in the set. I was curious how these files were different. When I started to dig deeper into the executables, I noticed that the malware authors were using resource compilers to randomize the files. This post is an overview of resource compilers, changeable resources, the steps the malware authors took, and data on the randomized files. For starters, we need to understand how resource compilers affect an executable.

Resource Compilers Introduction

Resource compilers are tools that allow users to include specified read-only resources in their executables. These resources include cursors, icons, bitmaps, dialog boxes, fonts, HTML documents, strings and executable file version data. Resource compilers take resource-definition scripts (.rc files) that describe the files and settings to be compiled. The syntax of RC scripts is similar to that of the Microsoft C/C++ compiler, except that it supports only a subset of the preprocessor directives, defines, and pragmas. Once compiled, the resource files (.res) can be linked into an executable. [1] Malware authors can use scripting languages, word dictionaries, icons and command-line resource compilers to create randomized executables. Some resource editing tools can modify an executable file directly rather than during the compiling and linking process. ResHacker is one such notable tool.

For this introduction we will go over the subset of resources that the malware authors changed to break signatures based on hashes and file version strings. The resources are Dialog Boxes, Strings, Icons and Version Information. Using Resedit to extract resources from an executable, we will go over the fields and data from a sample. The comments give details of what these values are used for.

Dialog

Used to define the position and dimension of the dialog box and style.

///////////////////////////////////////////////////////////////////////////////////////////
LANGUAGE 0, SUBLANG_NEUTRAL     
    //  Default custom locale language
104 DIALOGEX 0, 0, 210, 414     
    // 104 = name ID, the unique identifier of the dialog box; 0 = x, 0 = y, 210 = width, 414 = height 
STYLE DS_MODALFRAME | DS_SHELLFONT | WS_CAPTION | WS_POPUP | WS_SYSMENU 
    // dialog box template style 
    // DS_MODALFRAME = dialog box with a modal dialog-box frame, DS_SHELLFONT = use the system font
    // WS_CAPTION = window has a title bar, WS_POPUP = pop-up window, WS_SYSMENU = window menu on the title bar
CAPTION "Darkness key, brick they."  
    // A character string enclosed in double quotation marks that is the caption for the dialog box.
FONT 8, "MS Shell Dlg", 400, 0, 1   
    // Font, 8 = point size, "MS Shell Dlg" = type face, 400 = font weight (400 is default), 
    //  0 = italic BOOL, 1 = character set ( default )
{
    EDITTEXT        1001, 75, 153, 49, 14, ES_AUTOHSCROLL
    // EDITTEXT = rectangular region in which the user can type and edit text
    // 1001 = id, 75 = x, 153 = y, 49 = width, 14 = height 
    LTEXT           "Stone muscle bone kept.", -1, 139, 306, 184, 8, SS_LEFT
    // LTEXT  = left-aligned text control. Simple rectangle displaying the given text left-aligned in the rectangle
    // "Stone muscle bone kept." = text, -1 = id, 139 = x, 306 = y, 184 = width, 8 = height,  SS_LEFT = style 
}
    References: [2], [3], [4], [5] 
//////////////////////////////////////////////////////////////////////////////////

These settings are used to add random data to the executable file.
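
To see what the STYLE line amounts to once compiled, here is a minimal Python sketch that ORs the named flags into the single 32-bit style value stored in the dialog template. The numeric constants are the values from the Windows SDK header winuser.h; `STYLE_FLAGS` and `parse_style` are hypothetical names made up for this illustration, and the table only covers the five flags used in the sample above.

```python
# Flag values as defined in the Windows SDK header winuser.h.
STYLE_FLAGS = {
    "DS_MODALFRAME": 0x00000080,
    "DS_SHELLFONT": 0x00000048,  # DS_SETFONT (0x40) | DS_FIXEDSYS (0x08)
    "WS_CAPTION": 0x00C00000,
    "WS_POPUP": 0x80000000,
    "WS_SYSMENU": 0x00080000,
}


def parse_style(statement):
    """Combine the flags of an RC STYLE statement into one 32-bit value."""
    keyword, _, flags = statement.strip().partition(" ")
    if keyword != "STYLE":
        raise ValueError("not a STYLE statement")
    value = 0
    for name in flags.split("|"):
        value |= STYLE_FLAGS[name.strip()]  # KeyError on a flag not in the table
    return value


style = parse_style(
    "STYLE DS_MODALFRAME | DS_SHELLFONT | WS_CAPTION | WS_POPUP | WS_SYSMENU"
)
print(hex(style))  # 0x80c800c8
```

The style is only five flags, so it carries little entropy on its own. The random data comes mostly from the caption, the control text and the control coordinates.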

String Table

Used to define one or more string resources for an application.

///////////////////////////////////////////////////////////////////////////////////////////
LANGUAGE 0, SUBLANG_NEUTRAL
STRINGTABLE
{
    101                           "Her bowl July tool towards slightly gate."
    102                           "From hit fish."
    103                           "In flew these balloon\?"
}
    // 101 = stringid, "Her bowl July tool towards slightly gate." = string
    References: [6]
///////////////////////////////////////////////////////////////////////////////////////////

These settings are used to add random data to the executable file.
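
When comparing many samples, it helps to pull the string tables out of the extracted .rc text so they can be diffed side by side. The following is a rough Python sketch, assuming the layout that Resedit produced above (one numeric ID and one quoted string per line). `parse_stringtable` is a hypothetical helper, not part of any tool mentioned in this post, and it does not handle escaped quotes or multi-line strings.

```python
import re

RC_TEXT = '''
LANGUAGE 0, SUBLANG_NEUTRAL
STRINGTABLE
{
    101                           "Her bowl July tool towards slightly gate."
    102                           "From hit fish."
}
'''

# One entry per line: numeric string ID, whitespace, quoted string.
ENTRY = re.compile(r'^\s*(\d+)\s+"(.*)"\s*$')


def parse_stringtable(rc_text):
    """Return {string_id: text} for every STRINGTABLE entry in rc_text."""
    entries = {}
    inside = False
    for line in rc_text.splitlines():
        stripped = line.strip()
        if stripped == "STRINGTABLE":
            inside = True
        elif stripped == "}":
            inside = False
        elif inside:
            match = ENTRY.match(line)
            if match:
                entries[int(match.group(1))] = match.group(2)
    return entries


strings = parse_stringtable(RC_TEXT)
for string_id, text in sorted(strings.items()):
    print(string_id, len(text.split()), "words:", text)
```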

Icon

Defines a bitmap that sets the shape of the icon used for a given application, or an animated icon. The first icon in the resource is the one displayed by explorer.exe.

///////////////////////////////////////////////////////////////////////////////////////////
LANGUAGE 0, SUBLANG_NEUTRAL
    //  Default custom locale language
MAINICON           ICON     
    // MAINICON = Default Icon 
{
  '00 00 01 00 01 00 20 20 00 00 01 00 20 00 A8 10'
  '00 00 16 00 00 00 28 00 00 00 20 00 00 00 40 00'
  '00 00 01 00 20 00 00 00 00 00 00 10 00 00 00 00'
  '00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'
  'AA 3E 00 00 AA 7F 00 00 AA 86 00 00 AA 86 00 00'
  .....
  'FF FF 83 FF FF FF A1 FF FF FF FF FF FF FF'
}
    // HEX Encoded icon file. 

Second ICON
   
LANGUAGE 0, SUBLANG_NEUTRAL
108                ICON  
    // 108 = Icon ID
{
  '00 00 01 00 01 00 20 20 00 00 01 00 20 00 A8 10'
  '00 00 16 00 00 00 28 00 00 00 20 00 00 00 40 00'
  '00 00 01 00 20 00 00 00 00 00 00 10 00 00 00 00'
  .......
  'FF FF FF FF FF FF FF FF FF FF FF FF FF FF'
}

    References:[7]   
///////////////////////////////////////////////////////////////////////////////////////////

These settings are used to add random data to an executable file. Because of the size of the bitmap data, multiple icons can substantially change the file size of the executable. The image below contains all the icons that were extracted from the 40+ files. In the red box we can see that five icons were embedded in a single executable.
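
The hex dump can be read by hand to see how much each icon contributes to the file size. Below is a small Python sketch that decodes the first 22 bytes shown in the MAINICON listing, assuming they follow the standard .ico layout (a 6-byte ICONDIR followed by a 16-byte ICONDIRENTRY), which the bytes are consistent with. `parse_ico_header` is a hypothetical name used only for this example.

```python
import struct

# First 22 bytes of the MAINICON hex dump shown above.
HEADER_HEX = "00 00 01 00 01 00 20 20 00 00 01 00 20 00 A8 10 00 00 16 00 00 00"


def parse_ico_header(data):
    """Decode the ICONDIR and the first ICONDIRENTRY of an .ico file."""
    # ICONDIR: reserved (0), type (1 = icon), number of images
    _reserved, ico_type, count = struct.unpack_from("<HHH", data, 0)
    # ICONDIRENTRY: width, height, colour count, reserved, planes,
    # bits per pixel, size of the image data, offset of the image data
    width, height, _colors, _res, planes, bpp, size, offset = struct.unpack_from(
        "<BBBBHHII", data, 6
    )
    return {
        "type": ico_type,
        "count": count,
        "width": width,
        "height": height,
        "planes": planes,
        "bits_per_pixel": bpp,
        "image_bytes": size,
        "image_offset": offset,
    }


header = parse_ico_header(bytes.fromhex(HEADER_HEX))
print(header)
# One 32x32 image at 32 bits per pixel, 0x10A8 = 4264 bytes of image data
# starting at offset 22.
```

At roughly 4 KB per 32x32 icon, embedding between one and five icons is enough to explain size differences of several kilobytes between otherwise similar samples.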

Version Information

Defines version information for an executable file. The resource contains such information about the file as its version number, its intended operating system, and its original file name.

///////////////////////////////////////////////////////////////////////////////////////////
LANGUAGE 0, SUBLANG_NEUTRAL
1 VERSIONINFO
        // 1 = Version Information. This value must be 1. 
    FILEVERSION     7,6,3,3
        // Binary version number for the file. 7,6,3,3 = version 
    PRODUCTVERSION  7,6,3,3
        // Binary version number for the product with which the file is distributed. 7,6,3,3 = version
    FILEOS          VOS_NT_WINDOWS32
        // Operating system for which this file was designed.
    FILETYPE        VFT_APP
        // VFT_APP = File contains an application.
    FILESUBTYPE     VFT2_UNKNOWN
        // VFT2_UNKNOWN = Driver type is unknown.
    FILEFLAGSMASK   0x0000003F
        // Indicates which bits in the FILEFLAGS statement are valid.
    FILEFLAGS       0x00000000
        // Attributes of the file. The fileflags parameter must be the combination of all the file flags that are valid at compile time.
{
    BLOCK "StringFileInfo"
    // Defines a string information block.
    // Layout: BLOCK "StringFileInfo" { BLOCK "lang-charset" {VALUE "string-name", "value" . . . }}
    {
        BLOCK "000004B0"
        // 000004B0 = lang-charset: language ID 0x0000 (neutral) followed by code page 0x04B0 (1200, Unicode)
        {
            VALUE "CompanyName", "Stayonline Hampton Inn Bloomington"
            // "CompanyName" = string-name, "Stayonline Hampton Inn Bloomington" = value
            VALUE "FileDescription", "Catch under We may, chamber learn."
            VALUE "FileVersion", "7,6,3,3"
            VALUE "InternalName", "mice.exe"
            VALUE "LegalCopyright", "Copyright (C) 2010"
            VALUE "OriginalFilename", "mice.exe"
            VALUE "ProductName", "Catch under We may, chamber learn."
            VALUE "ProductVersion", "7,6,3,3"
        }
    }
    BLOCK "VarFileInfo"
    // Represents the organization of data in a file-version resource. 
    {
        VALUE "Translation", 0x0000, 0x04B0
        // 0x0000 = language ID (neutral) and 0x04B0 = character-set ID (1200, Unicode)
    }
}

    References: [8], [9]
///////////////////////////////////////////////////////////////////////////////////////////
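
The eight hex digits naming the StringFileInfo block and the two values in the Translation entry encode the same pair: a language ID and a code page. Here is a short Python sketch of that relationship. The function names are hypothetical and only illustrate the arithmetic.

```python
def decode_lang_charset(block_name):
    """Split an 8-hex-digit block name into (language ID, code page)."""
    if len(block_name) != 8:
        raise ValueError("expected 8 hex digits, e.g. '000004B0'")
    lang_id = int(block_name[:4], 16)
    code_page = int(block_name[4:], 16)
    return lang_id, code_page


def matches_translation(block_name, lang_id, charset_id):
    """Check that a block name agrees with a VarFileInfo Translation pair."""
    return block_name.upper() == "%04X%04X" % (lang_id, charset_id)


lang_id, code_page = decode_lang_charset("000004B0")
primary_lang = lang_id & 0x3FF  # low 10 bits of a LANGID
sub_lang = lang_id >> 10        # high 6 bits of a LANGID
print(lang_id, code_page, primary_lang, sub_lang)  # 0 1200 0 0
print(matches_translation("000004B0", 0x0000, 0x04B0))  # True
```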

Once the malware authors have a process for modifying the resources, it's trivial to modify the strings to break hashes, but producing different file sizes takes a couple more steps. First the authors compile their DLL. Once this is done they compress it with UPX, then use a resource compiler to add random strings, random icons, a random number of embedded icons and random dialog boxes. The DLL is then embedded in an executable dropper, which is packed with UPX and randomized again. This ensures that certain attributes of the portable executable will be different for each sample. Below are some graphs demonstrating how each attribute is different. For people wanting to follow along at home, the data can be found here (Google Docs Link).
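
For anyone collecting the same kind of data on their own sample set, a per-sample table of hash and size is the natural starting point. The Python sketch below shows one way to do it. The two byte strings are placeholders, not real samples or data from this post, and `profile` is a hypothetical helper name.

```python
import hashlib


def profile(samples):
    """Summarize a {name: file bytes} mapping by MD5 and file size."""
    rows = [
        (name, hashlib.md5(data).hexdigest(), len(data))
        for name, data in sorted(samples.items())
    ]
    sizes = [size for _name, _md5, size in rows]
    return {
        "rows": rows,
        "count": len(rows),
        "unique_md5": len({md5 for _name, md5, _size in rows}),
        "min_size": min(sizes),
        "max_size": max(sizes),
    }


# Placeholder data: same "payload" prefix, different amounts of filler.
samples = {
    "sample_a.bin": b"PAYLOAD" + b"\x00" * 10,
    "sample_b.bin": b"PAYLOAD" + b"\x01" * 300,
}
summary = profile(samples)
for name, md5, size in summary["rows"]:
    print(name, md5, size)
print("size spread:", summary["max_size"] - summary["min_size"], "bytes")
```

On a real set, the mapping could be built by reading each file in a directory with `pathlib.Path.read_bytes()`.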

Why is knowing about this technique valuable?

When it comes to creating reliable detection signatures, it's important to understand that data related to resources is cheap and unreliable. Malware authors can automatically create resources based on arbitrary values, dictionaries, or by mimicking trusted software. Targeting expensive data is a more reliable approach for detection. Expensive can be defined as anything that requires the programmer to manually change the code. An example of an expensive string would be a server command such as "httpgrabber". This command would be static and hardcoded into both the executable and the server. Modifying this string would take multiple steps by the malware authors. They would first need to change the source code of the executable or builder and then change the source code of the server. Since these strings cannot be arbitrarily changed, they are more expensive than the resource strings.

The most valuable part of an executable from a detection standpoint is the original unpacked code in memory or the dumped process. This removes all the foreplay of packers, obfuscators and installers, and allows the scanner to target the most expensive part of the code. If the code is in a packed state, the executable is useless for classification and detection: the packer is being targeted, not the expensive code. This is a good example of why scanning files on disk is only useful if the expensive code is not compressed. The best time to scan a file is when it has unpacked itself in memory. If we were to run two of these executables, dump the payload DLLs from memory and then scan them with a tool like Cospare, we would have a 72.17% match in code. Odds are it would be closer if I spent some time updating my old tools or had access to Bindiff.
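
As a rough illustration of targeting an expensive string, the Python sketch below searches a memory dump for a command string in both ASCII and UTF-16LE form, since either encoding may show up in an unpacked Windows binary. The "httpgrabber" string is the example from above. The dump bytes are a made-up placeholder and `find_marker` is a hypothetical helper, not a production scanner.

```python
def find_marker(data, marker):
    """Return (encoding, offset) for every occurrence of marker in data."""
    hits = []
    needles = (
        ("ascii", marker.encode("ascii")),
        ("utf-16le", marker.encode("utf-16-le")),
    )
    for label, needle in needles:
        start = 0
        while True:
            index = data.find(needle, start)
            if index == -1:
                break
            hits.append((label, index))
            start = index + 1
    return hits


# Placeholder "dump": filler bytes, then the command in both encodings.
dump = b"\x90" * 4 + b"httpgrabber" + b"\x00" + "httpgrabber".encode("utf-16-le")
hits = find_marker(dump, "httpgrabber")
print(hits)  # [('ascii', 4), ('utf-16le', 16)]
```

Run against a UPX-packed file on disk, a search like this would be expected to find nothing, which is the point made above about scanning the unpacked code instead.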

From a defense perspective, this is a good example of why network mitigations are very important. All of the 40+ samples that I analyzed had the same command and control (C2). By blocking the C2 at the perimeter we would be able to break one of the most important links in the attack chain [10]. Without that block, if anti-virus software failed to detect one of the samples, the malware would still be live in an enterprise environment.

References:
