Examining a piece of malware for strings (sequences of printable characters) can reveal a few clues about what the malware does, or what it is capable of doing. Most malware is packed or otherwise obfuscated these days, and this series of articles demonstrates one of the reasons why.
I was fortunate enough to have my Dionaea honeypot capture a piece of malware that hadn’t been packed or otherwise obfuscated, and decided to use it to show how much information strings can reveal.
During static analysis of this malware sample, I ran the UNIX strings(1) command to extract all the strings that were at least four characters long. I prefer working under UNIX (which, given my background, is hardly surprising) as it enables me to script tasks reasonably easily.
If you prefer working under Windows, I believe Mark Russinovich has created a Windows ‘strings’ command as part of the Sysinternals suite of utilities which, incidentally, I’ve found quite useful for forensic/malware analysis of Windows systems.
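For anyone curious about what the strings command is doing under the hood, here is a minimal Python sketch of the idea. It is a simplification rather than a reimplementation of strings(1): it only looks for runs of printable ASCII (0x20 to 0x7e), and the function name and sample bytes are made up for illustration.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_len bytes."""
    # 0x20 (space) through 0x7e (~) is the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Made-up bytes for illustration only; these are not taken from the sample.
blob = b"\x00\x01MZ\x90\x00hello world\x00\xffabc\x00%s\\ipc$\x00"

for offset, text in extract_strings(blob):
    # Print the offset in hex, similar in spirit to strings -tx.
    print(f"{offset:x} {text}")
```

Runs shorter than four characters (such as 'MZ' and 'abc' in the made-up data) are skipped, which is why the minimum length matters when deciding how much noise you are prepared to wade through.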
Now, this is where some experience comes in handy. By default, the strings command lists every string that is at least four characters long. It is up to you to interpret what they may mean, and for this, remembering where you’ve seen similar strings before can be the key. I had a look at some of the strings from the malware sample by running strings <filename>:
H:mm:ss
dddd, MMMM dd, yyyy
M/d/yy
These strings look like they describe the format of times and dates, and possibly US-style ones at that (given that the month appears before the day of the month).
Following that, I found a reverse list of full month names (December, November, October, …), a reverse list of days of the week (Saturday, Friday, Thursday, …), and a list of short name versions:
SunMonTueWedThuFriSat
JanFebMarAprMayJunJulAugSepOctNovDec
These latter two strings are handy for converting between day/month names and numbers: take the first three characters of the full name, find the position (counting from one) at which they occur in the relevant string above, then add two and divide by three (or subtract one and divide by three for zero-based numbering). For example, ‘Feb’ starts at position 4 of the month string, and (4 + 2) / 3 = 2.
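The lookup trick can be sketched in a few lines of Python. This only illustrates the arithmetic; whether the malware (or the runtime library it was linked against) uses the strings this way is an assumption, and the function name is hypothetical.

```python
DAYS = "SunMonTueWedThuFriSat"
MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec"

def name_to_number(full_name: str, table: str, zero_based: bool = False) -> int:
    """Convert a full day/month name to its number using a packed 3-letter table."""
    prefix = full_name[:3]
    # str.index() counts from zero, so add one to get the position counted from one.
    # It raises ValueError if the abbreviation is not in the table.
    position = table.index(prefix) + 1
    if zero_based:
        return (position - 1) // 3
    return (position + 2) // 3

print(name_to_number("October", MONTHS))                    # 10
print(name_to_number("Wednesday", DAYS, zero_based=True))   # 3
```

Packing the abbreviations into one string means a single search replaces a table of separate names, which is presumably why this layout turns up in compiled code.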
Next in this exciting list of strings, I found what looked like Win32 API function names and DLL names. These are more than likely the API names in the PE file’s import directory (the offsets from strings -tx match the VMAs in the import table dump from objdump -x, which strengthens that suspicion).
Now this is where it really gets exciting (I was just pulling your leg in the last paragraph) as next up I saw a series of strings, which included:
zxcv yxcv test123 test temp123 temp sybase super shadow server secret root qwerty
That list, and the fact that it goes on to include the strings ‘password’, ‘passwd’, ‘pass’, ‘mypc123’, ‘mypass123’, and ‘mypass’, suggested to me that these strings are a list of words used to guess passwords.
I noticed the string ‘baseball’ in there, and the only other string pertaining to a sport is ‘golf’. This also led me to think that the malware may have originated in the USA (or is expecting to target the USA, or baseball fanatics) as Top End Sports’ Popularity of Baseball Around the World shows that baseball is more popular in the USA, by far (and it’s on the Internet so it must be true!).
These strings also suggest that the malware has some code to brute force (keep guessing) passwords. This is an example of a situation where combining findings from multiple sources can yield extra information. For instance, a network capture showing how this malware infected the honeypot can suggest which passwords it might be attempting to brute force (I know because I’ve seen it, but that is a topic for a separate blog posting).
I then found the string ‘PHIME2008’ which looks like it could be used as the name of a mutex (an object used for mutual exclusion, typically used to make sure that only one thread of code accesses a data object at one time). Malware often uses a mutex to indicate a previous infection by the same malware strain/version.
However, given that the next string in the list was ‘Software\Microsoft\Windows\CurrentVersion\Run’, the former could be the name of a subkey underneath the ‘Run’ key, used to cause Windows to start the malware on boot (persistence).
There is the string ‘ /SYNC’, which looks like a command line option, and the leading space character suggests it will be appended to another string. Its placement immediately (string wise) after the ‘Run’ key name suggests that it may be added to the command line to indicate that the malware was auto-started and not being run to infect the target. This will be something to look for when disassembling the malware.
%s\ipc$ %s TaskOK %s %s %s %s CopyOK %s %s %s %s\admin$\system32\dnsapi.exe %s LoginOK %s %s %s \\%s %d.%d.%d.%d
The ‘%s\ipc$’ string looks like a known inbuilt Windows share, as does ‘%s\admin$’ further down. Again, looking at a network capture of the infection process will help explain why these are here. The ‘TaskOK’, ‘CopyOK’, and ‘LoginOK’ lines look like they could be either log messages, or responses to commands.
The ‘TaskOK’ could refer to a Windows Scheduler task, and that could explain why the ‘%s\ipc$’ share is in there. The fact that the ‘CopyOK’ line is next to ‘%s\admin$\system32\dnsapi.exe’, which looks like a Windows share and filename, suggests that the filename could be the source or target of a copy.
The ‘\\%s’ string looks like a format string for the server part of a UNC name. The ‘%d.%d.%d.%d’ following it looks like a format string for an IP address. These two format strings could be used to produce a UNC server name consisting of an IP address, which is the kind of thing that would come in handy if you’ve just found the IP address of a Windows host that you would like to try to infect.
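To make that guess concrete, here is a small Python sketch of how those format strings could be chained together. Python's % operator behaves like C's printf for these simple cases. The order in which the strings are combined is my assumption (it has not been confirmed by disassembly), the function names are hypothetical, and the address is from a range reserved for documentation.

```python
IP_FORMAT = "%d.%d.%d.%d"
SERVER_FORMAT = "\\\\%s"    # the string \\%s as seen in the strings output
IPC_FORMAT = "%s\\ipc$"     # the string %s\ipc$ as seen in the strings output

def unc_server(octets):
    """Build a UNC server name such as \\\\192.0.2.10 from four integers."""
    ip_address = IP_FORMAT % tuple(octets)
    return SERVER_FORMAT % ip_address

def ipc_share(server):
    """Assumed usage: prefix the ipc$ share with the UNC server name."""
    return IPC_FORMAT % server

server = unc_server((192, 0, 2, 10))   # documentation address, not a real target
print(server)              # \\192.0.2.10
print(ipc_share(server))   # \\192.0.2.10\ipc$
```

Seeing the pieces slot together like this is only circumstantial evidence, but it is a useful hypothesis to carry into the disassembly.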
“Are we there yet?” I hear you say… almost.
%04d%02d%02d%02d%02d%02d
I don’t know about you, but I reckon that could be a format string for a date time stamp in the format yyyyMMddhhmmss, possibly for logging purposes.
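As a quick sanity check of that reading, the same format string can be fed a date and time in Python. The argument order (year, month, day, hour, minute, second) is an assumption based on the field widths, and the date is made up.

```python
from datetime import datetime

STAMP_FORMAT = "%04d%02d%02d%02d%02d%02d"

def make_stamp(moment: datetime) -> str:
    """Format a datetime using the format string found in the sample."""
    # Assumed argument order: year, month, day, hour, minute, second.
    return STAMP_FORMAT % (moment.year, moment.month, moment.day,
                           moment.hour, moment.minute, moment.second)

example = datetime(2011, 3, 7, 9, 5, 2)   # made-up date for illustration
print(make_stamp(example))                # 20110307090502
```

The four-digit first field followed by five zero-padded two-digit fields is what makes the date/time interpretation the most natural one.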
Next up, I found what look like HTTP related strings:
HTTP/1.1 Host: [censored].jp lg1=%s&lg2=%s&lg3=%s&lg4=%s&lg5=%s&lg6=%s 1.003 GET /updata/TPDA.php? 555.206.117.59 lg1=%s&lg2=%s&lg3=%s&lg4=%s&lg5=%s&lg6=%s&lg7=%d GET /updata/TPDB.php? NONE
I have censored the first part of the hostname (mainly due to language which some readers may find offensive if they accidentally stumble across this whilst searching for something else) and I’ve obviously concealed the first octet of the IP address.
This suggests that the malware may issue two HTTP requests, one for TPDA.php and one for TPDB.php. The trailing ‘?’ suggests a query string will follow, which doesn’t come as much of a surprise as the two strings starting with ‘lg1’ look just like HTTP query strings.
I’m thinking the ‘1.003’ looks like a version number, and the reason for the ‘NONE’ is unclear at this point, but given that it is near the HTTP requests and query string, it could be used to indicate that it is unable to get a piece of information to include in the query string.
The path of the HTTP requests starts with ‘updata’ (which could be short for ‘upload data’), the filename looks like a PHP script (which would make it easy to process input from the malware), and the URLs have a trailing query string. Taken together, these suggest that the malware may issue these HTTP requests to leak information.
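If a request like this turns up in a network capture, picking it apart is straightforward. The sketch below parses a request line in Python using the standard urllib.parse module. The request line is one I assembled from the strings above on the assumption that they are concatenated in the obvious way, and the field values are made up.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request_line(line: str):
    """Split an HTTP request line into method, path and a dict of query fields."""
    method, target, _version = line.split(" ", 2)
    parts = urlsplit(target)
    # keep_blank_values=True so that empty fields (e.g. lg3=) are not dropped.
    fields = parse_qs(parts.query, keep_blank_values=True)
    # parse_qs returns a list per key; keep the first value of each.
    return method, parts.path, {key: values[0] for key, values in fields.items()}

# Assembled for illustration; the values a to f are placeholders, not captured data.
sample = "GET /updata/TPDA.php?lg1=a&lg2=b&lg3=&lg4=d&lg5=e&lg6=f HTTP/1.1"

method, path, fields = parse_request_line(sample)
print(method, path)
for name in sorted(fields):
    print(name, "=", repr(fields[name]))
```

Lining up the decoded fields from several captured requests is one way to start working out what each of lg1 to lg7 carries.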
http://[censored].jp/updata/ACCl3.jpg \msupd.exe %s%s
The host name in the URL was the same as that in the ‘Host:’ header above. Here we have what looks like a URL to download an image file, the file name of a known Windows executable file, and a format string.
Could it be that the JPG file is really a Windows PE executable file which is downloaded to \msupd.exe? Could the ‘%s%s’ format string be used to append one string to the end of another (you’ll see why I thought that in part two)? How do ‘dnsapi.exe’ and ‘msupd.exe’ fit into this picture? Is the cuppa that I’ve just remembered making, now cold?
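On the first of those questions, if you ever get hold of such a download, checking whether a ‘.jpg’ is really a PE file only takes a few bytes. The Python sketch below checks for the ‘MZ’ DOS header and the ‘PE\0\0’ signature it points to, and for the usual JPEG start-of-image marker. It is a rough sniff test rather than a full parser, the function names are hypothetical, and the test data is synthetic.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with an MZ header that points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 32-bit little-endian value at offset 0x3c is the offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

def looks_like_jpeg(data: bytes) -> bool:
    """Return True if data starts with the JPEG start-of-image marker."""
    return data[:3] == b"\xff\xd8\xff"

# Synthetic minimal headers for illustration; not taken from any real file.
fake_pe = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40) + b"PE\x00\x00"
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16

print(looks_like_pe(fake_pe), looks_like_jpeg(fake_pe))      # True False
print(looks_like_pe(fake_jpeg), looks_like_jpeg(fake_jpeg))  # False True
```

The UNIX file(1) command does the same job with far more thoroughness, so this is mainly useful when you want the check inside a larger script.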
If these questions have you on the edge of your seat, then tune in for part two where I will ramble on a tad more on how to find the functions that use these strings, and from that, have a bit of a stab at what each of those functions may do.