Figure: Stage 1, non-tabbed; Stage 2, tabs added
```javascript
function sHOGG(c, d, e) {
    var idx = d % c.length;
    var s = "";
    while (s.length < c.length) {
        s += c[idx];
        idx = (idx + e) % c.length;
    }
    return s;
}
```
Code has a visual flow to it. If we look at the black text as negative space, we can see the flow: Tab, Tab, Tab, Tab>Tab, Tab>Tab, Tab, etc. Anyone who has programmed in Python understands this flow.
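One way to make that flow measurable is to record how many tab stops each line is indented. This is an illustration only: `indent_profile` and `show_flow` are hypothetical helpers, not part of the script at the end of the post, and the sketch assumes a tab width of four.

```python
# Hypothetical helpers (not from the original script): measure the
# indentation "flow" of a piece of code, one value per non-blank line.
def indent_profile(source: str, tab_width: int = 4) -> list[int]:
    """Return the indentation depth, in tab stops, of every non-blank line."""
    depths = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        expanded = line.expandtabs(tab_width)
        leading = len(expanded) - len(expanded.lstrip(" "))
        depths.append(leading // tab_width)
    return depths


def show_flow(source: str) -> None:
    """Print one '>' per indentation level ('.' for none), one row per line."""
    for depth in indent_profile(source):
        print(">" * depth if depth else ".")


example = (
    "function f(a) {\n"
    "\tif (a) {\n"
    "\t\treturn 1;\n"
    "\t}\n"
    "\treturn 0;\n"
    "}\n"
)
show_flow(example)
```

On the small example this prints `.`, `>`, `>>`, `>`, `>`, `.`, which is the Tab>Tab pattern in text form. The same code squeezed onto one line would produce a single `.` and no flow at all.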
Most programmers indent their code because it makes it easier to read. Sometimes code has its newline characters stripped to save space, but it can be cleaned up with jsbeautifier, which restores something close to the original layout. Even when structurally cleaned up, most obfuscation destroys the flow. How does it destroy it? Well, let's graph the code and find out. Note: all Python code can be found at the end of the post.
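The graph is simply line number against line length. Here is a minimal sketch of that measurement, with a text bar chart standing in for matplotlib. `line_lengths` and `ascii_graph` are hypothetical names, and unlike the script at the end of the post this version does not count the trailing newline.

```python
# Hypothetical helpers for illustration; the full script at the end of the
# post does the same measurement with numpy and matplotlib.
def line_lengths(source: str) -> list[int]:
    """Return the length of every line, without its newline character."""
    return [len(line) for line in source.splitlines()]


def ascii_graph(source: str, scale: int = 1) -> None:
    """Print a sideways bar graph: one row per line, one '#' per `scale` chars."""
    for number, length in enumerate(line_lengths(source)):
        print(f"{number:4d} | {'#' * (length // scale)}")


formatted = "var a = 1;\nif (a) {\n    a += 1;\n}\n"
packed = "var a=1;if(a){a+=1;}"

ascii_graph(formatted)  # several short bars: the structure is visible
ascii_graph(packed)     # a single long bar: the flow is gone
```

The formatted snippet gives bars of 10, 8, 11 and 1 characters, while the packed version collapses into one 20-character bar. That is why the script beautifies the JS before measuring anything.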
Okay, time for the disclaimer. I wrote all the code and came up with the concept on three hours of sleep after a late night. I almost didn't post it, but it got me thinking about lexical analysis, graph theory and all the cool stuff people smarter than myself are doing. Hopefully it does the same for others.
Figure: jquery.cycle.lite[1].js
Figure: MicrosoftAjax[1].js
Figure: PDF JS
Figure: Second PDF JS
Figure: JS from CVE-2013-0641
Figure: Structure Example Code, 0 through 5,800
Figure: Structure Example Code, 5,800 through 6,400
Figure: Structure Example Code, 6,400 to EOF
Here are some examples pulled from PDFs with obfuscated JavaScript:
Mean 621.395348837 - Median 10.0
Mean 341.941176471 - Median 18.5
Mean 92.0138528139 - Median 42.0 (CVE-2013-0641)
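Dividing each mean by its median makes the gap concrete. The first two labels below are my own placeholders, since only the third result is tied to a named file in the list above.

```python
# Mean and median line lengths as reported above for the three PDF samples.
# "first sample" and "second sample" are placeholder labels.
samples = {
    "first sample": (621.395348837, 10.0),
    "second sample": (341.941176471, 18.5),
    "CVE-2013-0641": (92.0138528139, 42.0),
}

for name, (mean, median) in samples.items():
    ratio = mean / median
    print(f"{name}: mean/median = {ratio:.2f}")
```

All three ratios come out above two (roughly 62.1, 18.5 and 2.2), although the CVE-2013-0641 sample clears the bar only narrowly.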
A mean-to-median ratio greater than two seems to be a decent threshold for detecting suspicious code. Time for the play-at-home version. The following is an example of the commands for the script below. We will need scipy, matplotlib, numpy and jsbeautifier.
```python
>>> n = GraphMe()                     # create instance
>>> n.process(open('pdf1.out', 'r'))  # open JS file
>>> n.plot()                          # plot it
>>> n.outlier()                       # check if the JS is suspicious
Suspicious: mean 642.476190476 median 11.5
```
I might create a repo for it. I couldn't think of a name, which seems to be the hardest part of creating a repo.
```python
## created by alexander<dot>hanel<at>gmail<dot>com
## 2/21/2013
## No license, free game to use, just give credit or you suck.
import sys
from io import StringIO

import matplotlib.pyplot as plt
import numpy as num
import jsbeautifier  # https://github.com/einars/js-beautify


class GraphMe:
    def __init__(self):
        self.fullData = []
        self.bjs = False  # beautify the input on the next pass
        self.PS = True    # True until the first pass through process()
        self.x = []       # line numbers
        self.y = []       # line lengths (including the newline)

    def beautifier(self, buffer):
        'clean up the JS'
        try:
            temp = jsbeautifier.beautify(buffer.read())
        except Exception:
            print("ERROR: jsbeautifier")
            print("EXITING....")
            sys.exit()
        return temp

    def process(self, data):
        'beautify the JS and record the length of every line'
        if self.bjs:
            data = self.beautifier(data)
        if isinstance(data, str):
            data = StringIO(data)
        self.fullData = data.readlines()
        # clean up JS that is all one line; only re-run once, otherwise code
        # that is still one line after beautifying would recurse forever
        if not self.bjs and (len(self.fullData) == 1 or self.PS):
            self.PS = False
            self.bjs = True
            data.seek(0)
            self.process(data)
            return  # the recursive call already filled self.x and self.y
        for t in range(len(self.fullData)):
            self.x.append(t)
        for t in self.fullData:
            self.y.append(len(t))

    def outlier(self):
        'flag the JS as suspicious if mean/median > 2'
        mean, median = num.mean(self.y), num.median(self.y)
        if mean / median > 2:
            print("Suspicious: mean %s median %s" % (mean, median))

    def graph(self):
        'create bar graph of the JS'
        fig = plt.figure()
        ax = fig.add_subplot(1, 1, 1)
        ax.bar(self.x, self.y)
        plt.show()

    def plot(self):
        'create plot of the JS'
        plt.plot(self.y, 'ro-')
        plt.xlabel('Line number')
        plt.ylabel('Line length')
        plt.show()
```
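If you only want the yes/no answer without matplotlib, the `outlier()` test can be approximated with Python's standard `statistics` module. This is a stripped-down sketch, not part of the original script: `is_suspicious` and its `threshold` parameter are my own names, and it skips the jsbeautifier step, so feed it code that has already been beautified (one-line input always gives a ratio of exactly 1 and is never flagged).

```python
import statistics


def is_suspicious(source: str, threshold: float = 2.0) -> bool:
    """Flag code whose mean line length exceeds `threshold` times the median."""
    # Keep the line terminator, as readlines() does in GraphMe, so that
    # no line has length zero and the median can never be 0.
    lengths = [len(line) for line in source.splitlines(keepends=True)]
    if not lengths:
        return False  # nothing to measure
    mean = statistics.mean(lengths)
    median = statistics.median(lengths)
    if mean / median > threshold:
        print(f"Suspicious: mean {mean} median {median}")
        return True
    return False


# Nine ordinary lines, then the same lines plus one 2,000-character blob.
normal = "\n".join(["var a = 1;"] * 9) + "\n"
packed = normal + "x" * 2000 + "\n"

is_suspicious(normal)  # all lines the same length: ratio 1, not flagged
is_suspicious(packed)  # one huge line drags the mean far above the median
```

The median is the design choice that makes this work: a few enormous lines barely move it, while they pull the mean far upward, so the ratio isolates exactly the kind of packed payload seen in the PDF samples.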
Statistical(ly) Suspects, which could be shortened to StaSus. Cool?
I like it. Now that I have a name I just need to create the repo. Thanks.
There is way too much legitimate obfuscated code out there... you have to account for the false positives.
I completely agree, except for instances of obfuscated code in PDFs.