Example Data:
0000090: f0ed e0b8 0000 0000 0000 0000 0000 0000 ................ 00000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000b0: 576b 6a70 236a 7023 6e62 646a 6023 736f Wkjp#jp#nbdj`#so 00000c0: 6260 6600 0000 0000 0000 0000 0000 0000 b`f............. 00000d0: 635f 5e44 175e 4417 5617 4352 4f43 1b17 c_^D.^D.V.CROC.. 00000e0: 5859 5b4e 1756 1743 5244 4319 1906 0504 XY[N.V.CRDC..... 00000f0: 0300 0000 0000 0000 0000 0000 0000 0000 ................
As most people know XOR is substitution cypher. A major flaw of substitution cyphers is they can be broken either by frequency analysis or bruteforcing. I have my doubts that frequency analysis would be a good way to tackle this problem. Frequency analysis is useful for dictionary text but what about a URL such as http://google.com? It would be an interesting task to do frequency analysis of ascii strings extracted from compiled exeuctables files. If we were to dump all the ASCII strings in *.dlls and *.exe files in system32 we would get about 60 mb of strings. Maybe another post. This leaves us with the option of bruteforcing. The main problem with bruteforcing data for valid strings is the how do we know the range of the encoded data and if it's a valid string and not a false positives. With our data example above we know that the encoded data range starts and ends with '\x00'. We could create a regular expression to find all data that falls between those characters. Our regular expression pattern would be '\x00(?!\x00).+?\x00'. If the ranges were different we would need another regular expression pattern. First problem solved. Now lets assume that we have XORed each byte in the block of data with a static key between 0x1 and 0xff. Now we are at the second problem. How do we know if the XORed data is valid text? We know that a valid string must be printable ascii. If we XOR a block of data and the data contains a non-printable ASCII characters we can ignore that set. That will eliminate about 0x9b or 155 values between 0x0 - 0xff. This will make our set of false positives a little bit smaller.
Function for testing if valid Ascii:
def valid_ascii(char): if char in string.printable[:-3]: return True else: return None Output of string.printable >>> string.printable '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
Let's see what the output would be from bruteforcing the example above would look like.
0xaf key 0x1 Vjkq"kq"oceka"rncag 0xaf key 0x2 Uihr!hr!l`fhb!qm`bd 0xaf key 0x3 This is magic place !!! Valid String 0xaf key 0x4 Sont'nt'jf`nd'wkfdb 0xaf key 0x5 Rnou&ou&kgaoe&vjgec 0xaf key 0x6 Qmlv%lv%hdblf%uidf` 0xaf key 0x7 Plmw$mw$iecmg$thega 0xaf key 0x8 _cbx+bx+fjlbh+{gjhn 0xaf key 0x9 ^bcy*cy*gkmci*zfkio 0xaf key 0xa ]a`z)`z)dhn`j)yehjl 0xaf key 0xb \`a{(a{(eioak(xdikm 0xaf key 0xd Zfg}.g}.coigm.~bomk 0xaf key 0xe Yed~-d~-`ljdn-}alnh 0xaf key 0x12 Eyxb1xb1|pvxr1a}prt 0xaf key 0x13 Dxyc0yc0}qwys0`|qsu 0xaf key 0x16 A}|f5|f5xtr|v5eytvp 0xaf key 0x17 @|}g4}g4yus}w4dxuwq 0xaf key 0x18 Osrh;rh;vz|rx;kwzx~ 0xaf key 0x1a Mqpj9pj9tx~pz9iuxz| 0xaf key 0x1c Kwvl?vl?r~xv|?os~|z 0xaf key 0x1e Iutn=tn=p|zt~=mq|~x 0xaf key 0x29 ~BCY CY GKMCI ZFKIO 0xaf key 0x2a }A@Z @Z DHN@J YEHJL 0xaf key 0x5d 67-~7-~3?97=~.2?=; 0xaf key 0x5e 54.}4.}0<:4>}-1<>8 0xcf key 0x22 A}|f5|f5t5apma95z{yl5t5apfa;;$'&! 0xcf key 0x23 @|}g4}g4u4`ql`84{zxm4u4`qg`::%&' 0xcf key 0x25 Fz{a2{a2s2fwjf>2}|~k2s2fwaf<<# !& 0xcf key 0x28 Kwvl?vl?~?kzgk3?pqsf?~?kzlk11.-,+ 0xcf key 0x2a Iutn=tn=|=ixei1=rsqd=|=ixni33,/.) 0xcf key 0x2b Htuo<uo<}<hydh0<srpe<}<hyoh22-./( 0xcf key 0x2c Osrh;rh;z;o~co7;tuwb;z;o~ho55*)(/ 0xcf key 0x2e Mqpj9pj9x9m|am59vwu`9x9m|jm77(+*- 0xcf key 0x2f Lpqk8qk8y8l}`l48wvta8y8l}kl66)*+, 0xcf key 0x32 Qmlv%lv%d%q`}q)%jki|%d%q`vq++4761 0xcf key 0x33 Plmw$mw$e$pa|p($kjh}$e$pawp**5670 0xcf key 0x34 Wkjp#jp#b#wf{w/#lmoz#b#wfpw--2107 0xcf key 0x35 Vjkq"kq"c"vgzv."mln{"c"vgqv,,3016 0xcf key 0x36 Uihr!hr!`!udyu-!nomx!`!udru//0325 0xcf key 0x37 This is a text, only a test..1234 !!! Valid String 0xcf key 0x38 [gf|/f|/n/{jw{#/`acv/n/{j|{!!>=<; 0xcf key 0x39 Zfg}.g}.o.zkvz".a`bw.o.zk}z ?<=: 0xcf key 0x3a Yed~-d~-l-yhuy!-bcat-l-yh~y##<?>9 0xcf key 0x3d ^bcy*cy*k*~or~&*edfs*k*~oy~$$;89> 0xcf key 0x3e ]a`z)`z)h)}lq}%)fgep)h)}lz}''8;:= 0xcf key 0x3f \`a{(a{(i(|mp|$(gfdq(i(|m{|&&9:;< 0xcf key 0x69 67-~7-~?~*;&*r~102'~?~*;-*ppolmj 0xcf key 0x6a 54.}4.}<})8%)q}231$}<})8.)ssloniThe above output was created using iheartxor (see below)
Yeah, that's a lot of extra data. We have a 456 bytes file with an output size of 2,090 bytes. That's a 400% increase in size. How can we slim this down? Let's say we did this on a large file such as Notepad.exe with the same data pasted at the end. The original Notepad.exe has a size of 69,280 bytes and the output is 117,400 bytes. That's a 69% increase. Who want's to look through that much data for a couple of XORed strings? This is exactly why people use string searches. To be continued...I think I have to go ask Reddit.
Introducing iheartxor
iheartxor is a tool that can be used to bruteforce xor encoded strings within a user defined data block via a regular expression pattern (-r). The default search pattern is a regular expression that searches for data between null bytes ('\x00'). The tool can also be used to do a straight xor on a file with -f file.name -k value. The value must between 0x0-0x255. The tool is still in development. Please leave comments or ideas. I'm still trying to figure out the best way to detect valid ASCII text without using a search strings such as xorsearch.
No comments:
Post a Comment