I was just in the middle of doing a post on analysing a malware sample and I thought that I should start it off by documenting my setup. It then occurred to me that doing so was making my post somewhat longer, and since the setup would apply pretty much to all of my malware analysis work, I should document it separately. So here it is — my malware analysis setup.
I have a laptop running Linux (Debian), which I do a lot of the static analysis on, and which hosts both VirtualBox and kvm virtuals. I’m trying to migrate from VirtualBox to kvm, because then I can run my analysis virtuals on another box of mine which is running my virtual honeypots, and automate submission of samples from the honeypots to the analysis host (which will then no longer be a laptop that may or may not be somewhere else at the time). As far as I know, it’s not possible to run both VirtualBox and kvm virtuals on the same host at the same time.
I’ll describe what software I’m using, but not how to install it. The reason is that there’s a reasonable chance you’re running a different Linux distribution from mine, with a different window manager and different ways of configuring it. As far as I can remember, none of the software required building from source, nor anything else potentially finicky like that, and if you’re an administrator of a host then you should know how to find and install software on it. Having said that, I believe that I found Debian packages for most, if not all, of the software that I’m using, apart from some of the Python libraries required by Cuckoo, which I installed using pip.
Host Setup
Virtualisation software
I’m in the early days of having switched from VirtualBox to kvm. I can’t get shared folders to work with kvm and Windows guests, which makes copying files and log data between the virtuals and the host annoying, but that’s not too hard to work around.
Analysis software
Cuckoo Sandbox
Cuckoo Sandbox is software to aid dynamic analysis of malware by running a sample and producing a report showing the sample’s behaviour. It documents, amongst other things, which files the sample dropped, registry keys that were changed, and processes that it started.
Cuckoo has a client / server architecture. The server component is written in Python and, in my case, runs on the virtualisation host. The client side, which in my case runs on the virtual guest (the dynamic analysis machine), consists of two parts: a Python ‘agent’ script, which listens for XML-RPC connections from the server, and a DLL, which the ‘agent’ script injects into the malware sample. The injected DLL hooks and logs calls to various Windows API functions.
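To make the agent / server split a bit more concrete, here’s a toy sketch of the XML-RPC pattern using Python 3’s standard-library xmlrpc modules. It is not Cuckoo’s agent: the method names (get_status, add_file) are made up for illustration, and a real agent does far more (saving files to disk, starting the analysis, and so on).

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def start_toy_agent(host="127.0.0.1", port=0):
    """Start a toy XML-RPC 'agent' in a background thread and return the server.

    port=0 lets the OS pick a free port; a real agent listens on a fixed one.
    The exposed methods are hypothetical, not Cuckoo's actual agent API.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    state = {"status": "idle"}

    def get_status():
        return state["status"]

    def add_file(name, data):
        # `data` arrives as an xmlrpc.client.Binary; a real agent would save it.
        state["status"] = "received %s (%d bytes)" % (name, len(data.data))
        return True

    server.register_function(get_status)
    server.register_function(add_file)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    agent = start_toy_agent()
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % agent.server_address[1])
    proxy.add_file("sample.exe", xmlrpc.client.Binary(b"MZ\x90\x00"))
    print(proxy.get_status())  # received sample.exe (4 bytes)
    agent.shutdown()
    agent.server_close()
```

The server side only ever has to know the guest’s address and port; everything it wants the guest to do is a remote procedure call over HTTP.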
The server and client components need to be able to talk with each other over the network, as mentioned in the Cuckoo documentation.
Wireshark and tshark
Wireshark and tshark are good for monitoring the network traffic that the guest machine produces while running the malware sample. Wireshark gives you a graphical user interface, which makes it easier to use. tshark, on the other hand, is a text-based command that is awfully nifty for extracting data from specific fields (from various protocol stack layers) in the packets, which makes it ideal for use in scripts. tshark is similar to tcpdump, but with the ability to control which fields are displayed.
I have mentioned tshark in previous blog posts as it is awfully handy for getting a list of DNS queries / responses, or syslog messages, for instance, out of a capture file in a way that can easily be used by another command — hence it’s good for scripting.
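As a sketch of the scripting side, the following Python wraps tshark’s field output. It assumes tshark is on the PATH and recent enough to accept -Y for display filters; capture.pcap is a placeholder, and dns.qry.name and syslog.msg are Wireshark’s field names for DNS query names and syslog message text.

```python
import subprocess


def tshark_fields_cmd(pcap_path, display_filter, fields):
    """Build a tshark command that prints the given fields, tab-separated, one packet per line."""
    cmd = ["tshark", "-r", pcap_path, "-Y", display_filter, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def tshark_fields(pcap_path, display_filter, fields):
    """Run tshark and return one list of field values per matching packet."""
    result = subprocess.run(tshark_fields_cmd(pcap_path, display_filter, fields),
                            capture_output=True, text=True, check=True)
    # -T fields separates fields with a tab by default.
    return [line.split("\t") for line in result.stdout.splitlines() if line]


# Example usage (capture.pcap is a placeholder):
#   queries = tshark_fields("capture.pcap", "dns.flags.response == 0", ["ip.src", "dns.qry.name"])
#   syslog = tshark_fields("capture.pcap", "syslog", ["ip.src", "syslog.msg"])
```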
Binutils
Binutils is the GNU binary utilities suite. It contains commands used in software development, so your distribution will more than likely have it packaged, if not already installed. A few of its commands are useful for static analysis.
objdump(1) can do a number of things and, even though it runs on Linux, it can recognise Windows PE files. If your particular version wasn’t compiled with support for Windows PE files, then you should be able to get the MinGW version of Binutils; your distribution may already have this packaged (Debian does).
I usually use objdump with the ‘-x’ option to dump all header information. This gives you information from the PE header, such as the compilation time stamp, image base address and entry point offset; lists the data directories that are present; and dumps the imports, exports, relocations, and section information. Anything packed with UPX stands out, because you’ll see the UPX sections in the section information. objdump can also disassemble code in both ELF and PE binaries, as well as arbitrary opcodes that aren’t contained within a known executable file format, such as those found in shellcode and the like.
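For the curious, here’s a minimal Python sketch of where some of those objdump -x fields live in a PE file. It only reads the COFF header, the start of the optional header, and the section names; it skips the data directories, imports and exports entirely, does little validation, and is no replacement for objdump. Bear in mind that the time stamp can be forged, or zeroed, by the malware author.

```python
import struct
from datetime import datetime, timezone


def pe_summary(data: bytes) -> dict:
    """Read a few of the header fields that `objdump -x` reports from a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ header")
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]   # offset of the PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    coff = e_lfanew + 4
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes in total).
    machine, nsections, timestamp, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    opt = coff + 20
    magic = struct.unpack_from("<H", data, opt)[0]
    entry_rva = struct.unpack_from("<I", data, opt + 16)[0]  # AddressOfEntryPoint
    if magic == 0x10B:        # PE32: 4-byte ImageBase at offset 28
        image_base = struct.unpack_from("<I", data, opt + 28)[0]
    elif magic == 0x20B:      # PE32+: 8-byte ImageBase at offset 24
        image_base = struct.unpack_from("<Q", data, opt + 24)[0]
    else:
        raise ValueError("unknown optional header magic 0x%x" % magic)
    # 40-byte section headers follow the optional header; the name is the first 8 bytes.
    first = opt + opt_size
    sections = [data[first + 40 * i:first + 40 * i + 8].rstrip(b"\0").decode("ascii", "replace")
                for i in range(nsections)]
    return {
        "machine": machine,
        "compiled": datetime.fromtimestamp(timestamp, timezone.utc),
        "image_base": image_base,
        "entry_point_rva": entry_rva,
        "sections": sections,
        "upx_sections": [s for s in sections if s.startswith("UPX")],
    }
```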
strings(1) is another useful command provided by Binutils, which lists all of the strings (funnily enough) contained in the binary, that is, runs of at least four printable characters (four is the default; the -n option changes it). This will often show things like URLs, domain names, user names, IRC commands, SMTP commands, FTP commands, and other strings (a number of which I’ve put on a t-shirt) that can give you a clue as to a sample’s functionality.
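A rough Python equivalent of strings(1) is shown below, assuming that printable ASCII plus tab counts as ‘text’ (close to GNU strings’ default). It also looks for UTF-16LE (‘wide’) strings, which are common in Windows binaries and which strings(1) only reports if you ask for them with -e l.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return (offset, text) for ASCII and UTF-16LE runs of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e\t]\x00){%d,}" % min_len)   # char, NUL, char, NUL...
    found = [(m.start(), m.group().decode("ascii")) for m in ascii_re.finditer(data)]
    found += [(m.start(), m.group().decode("utf-16-le")) for m in wide_re.finditer(data)]
    return sorted(found)
```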
od/hexdump/bvi
Some sort of hex dumping command or application is useful for examining data that isn’t printable ASCII, for looking for patterns in binary data, and even for determining which bytes differ between two samples of binary data.
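For comparing two samples, cmp -l does the job; the following Python sketch does the same kind of comparison and returns the differing offsets and byte values, with None where one input is shorter than the other.

```python
def differing_offsets(a: bytes, b: bytes):
    """List (offset, byte_in_a, byte_in_b) wherever the two inputs differ.

    Bytes beyond the end of the shorter input are reported with None on that side.
    """
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    for i in range(min(len(a), len(b)), max(len(a), len(b))):
        diffs.append((i, a[i] if i < len(a) else None, b[i] if i < len(b) else None))
    return diffs
```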
It is sometimes possible to have a stab at the type of data by looking at a hexadecimal representation of it. For instance, audio data over a VoIP connection often looks like a large collection of bytes fluctuating around the value 0x80.
You can also search through hex data looking for the bytes of known IP addresses, and for values that look like they could be seconds-since-epoch time stamps (annoyingly, I missed the moment when the four-byte seconds-since-epoch value, interpreted as an ASCII string, spelt my name).
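Both of those searches are easy to script. This sketch assumes that an IP address may be stored in network byte order or reversed, and that time stamps are little-endian 32-bit values (as on x86); the lo / hi bounds are whatever window of dates you consider plausible for the sample.

```python
import ipaddress
import struct


def find_ip(data: bytes, ip: str):
    """Offsets where the IPv4 address appears in network byte order or reversed."""
    packed = ipaddress.IPv4Address(ip).packed
    hits = []
    for order, needle in (("network", packed), ("reversed", packed[::-1])):
        start = data.find(needle)
        while start != -1:
            hits.append((start, order))
            start = data.find(needle, start + 1)
    return sorted(hits)


def find_timestamps(data: bytes, lo: int, hi: int):
    """(offset, value) for every little-endian 32-bit value in [lo, hi], at any alignment."""
    hits = []
    for off in range(len(data) - 3):
        value = struct.unpack_from("<I", data, off)[0]
        if lo <= value <= hi:
            hits.append((off, value))
    return hits
```

Each hit can be turned into a readable date with datetime.fromtimestamp(value, timezone.utc) from the datetime module. Expect false positives, since any four bytes can look like a date.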
Python
Python is a scripting language that is used by a number of malware analysis / reverse engineering tools (and debuggers), and Cuckoo in particular. I’ve also used it to de-obfuscate JavaScript and macros found in Microsoft Office documents.
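As one illustration of that kind of de-obfuscation (a sketch, not the scripts I actually used), the following replaces a common JavaScript trick, String.fromCharCode(...) with decimal arguments, with the equivalent string literal. It doesn’t handle hex arguments, computed values, or the many other obfuscation tricks out there.

```python
import json
import re

# String.fromCharCode( n, n, ... ) with plain decimal arguments only.
FROM_CHAR_CODE = re.compile(r"String\.fromCharCode\(\s*(\d+(?:\s*,\s*\d+)*)\s*\)")


def deobfuscate_js(source: str) -> str:
    """Replace String.fromCharCode(...) calls with the string literal they produce."""
    def to_literal(match):
        text = "".join(chr(int(code)) for code in match.group(1).split(","))
        return json.dumps(text)   # a double-quoted, escaped literal that is valid JavaScript
    return FROM_CHAR_CODE.sub(to_literal, source)
```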
yara
yara is a pattern matching tool designed to help malware analysts ‘identify and classify malware samples’. This is something that I need to make more use of, and I have some ideas.
Network setup
As for the network setup, I created a bridge interface on the host using brctl(8). This bridge interface is given an IP address on the host, and that address is configured as the default gateway in the virtual machine’s (guest’s) operating system. The virtual machine is then configured to bridge the guest’s network interface to this bridge interface on the host.
iptables(8) rules are used to protect the host and other local networks from the malware sample(s). I also have iptables(8) rules to ‘masquerade’ the guest’s traffic so that it leaves the local network with the same IP address as that of the host. This saves setting up routing to route traffic back to the guest.
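Pulling the bridge and firewall steps together, here’s a Python sketch that builds the commands for one possible version of this setup without running any of them. The interface names (br0, eth0), the 192.168.56.1/24 bridge address and the 192.168.0.0/16 ‘protected’ range are placeholders, and host_ports should list anything on the host that the guest legitimately needs to reach, such as the result server port set in your cuckoo.conf. Review the output before running any of it as root.

```python
import ipaddress
import shlex


def sandbox_net_commands(bridge="br0", bridge_cidr="192.168.56.1/24", wan="eth0",
                         protected=("192.168.0.0/16",), host_ports=()):
    """Return argument lists for a bridge + iptables setup (placeholder values, not executed)."""
    guest_net = str(ipaddress.ip_interface(bridge_cidr).network)
    established = ["-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"]
    cmds = [
        ["brctl", "addbr", bridge],
        ["ip", "addr", "add", bridge_cidr, "dev", bridge],
        ["ip", "link", "set", bridge, "up"],
        # Protect the host: guests only get replies plus any explicitly allowed TCP ports.
        ["iptables", "-A", "INPUT", "-i", bridge] + established,
    ]
    for port in host_ports:
        cmds.append(["iptables", "-A", "INPUT", "-i", bridge, "-p", "tcp",
                     "--dport", str(port), "-j", "ACCEPT"])
    cmds.append(["iptables", "-A", "INPUT", "-i", bridge, "-j", "DROP"])
    # Protect local networks before anything is allowed out to the Internet.
    for net in protected:
        cmds.append(["iptables", "-A", "FORWARD", "-i", bridge, "-d", net, "-j", "DROP"])
    cmds += [
        ["iptables", "-A", "FORWARD", "-i", bridge, "-o", wan, "-j", "ACCEPT"],
        ["iptables", "-A", "FORWARD", "-i", wan, "-o", bridge] + established,
        ["iptables", "-A", "FORWARD", "-o", bridge, "-j", "DROP"],
        # Masquerade so guest traffic leaves with the host's address.
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-s", guest_net, "-o", wan,
         "-j", "MASQUERADE"],
    ]
    return cmds


if __name__ == "__main__":
    for cmd in sandbox_net_commands():
        print(shlex.join(cmd))
```

Rule order matters here: iptables takes the first matching rule, so the drops for local networks have to come before the rule that lets guest traffic out to the Internet interface.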
IP forwarding needs to be enabled so that the host will route packets between its Internet-facing interface and the bridge interface that the guest is using. This involves setting /proc/sys/net/ipv4/conf/<interface>/forwarding to ‘1’ both for the bridge interface that the virtual machine(s) are using and for the interface that the host uses to get to the Internet.
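The same can be done with sysctl -w net.ipv4.conf.<interface>.forwarding=1, or by writing to /proc directly, as in this Python sketch. It has to run as root; the interface names in the usage comment are placeholders, and proc_root only exists so that the function can be tried out against a scratch directory.

```python
from pathlib import Path


def enable_forwarding(interfaces, proc_root="/proc/sys"):
    """Write '1' to net/ipv4/conf/<iface>/forwarding for each interface.

    Equivalent to: sysctl -w net.ipv4.conf.<iface>.forwarding=1
    """
    written = []
    for iface in interfaces:
        path = Path(proc_root, "net", "ipv4", "conf", iface, "forwarding")
        path.write_text("1\n")
        written.append(path)
    return written


# Usage (as root, placeholder interface names):
#   enable_forwarding(["br0", "eth0"])
```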
That covers the basic TCP/IP networking: it will allow your guest to connect to the Internet while preventing it from connecting to any of your local hosts / networks. There are, however, a few other things to consider which will come in handy.
Configure the host as the default router / gateway in the guest. This will allow the guest to at least spit IP packets out without the host having to forward them to its (the host’s) default router. That is, you can operate without having to be connected to a physical network, and with forwarding disabled on the host to prevent any malware from connecting back to its command and control server, or spreading to other Internet connected hosts (you may not always want to give a malware sample access to the Internet).
I suspect that you could also put the host’s physical external network interface (be it Ethernet, wireless, PPP, USB) into the bridge used by the guest, and then set the host’s default route out of the bridge interface, or configure the virtual machine to bridge the guest’s network directly to your external interface. However, if you do that, then the host won’t be able to firewall the guest’s traffic with ordinary iptables(8) rules, as the packets (or frames) will be bridged at layer 2 (Ethernet) rather than routed at layer 3 (IP), where iptables operates.
It’s also worth noting that UDP doesn’t require any connection setup procedure (like TCP’s three-way handshake) before sending data. That means that some servers, like syslog and DNS servers, don’t actually have to exist. The DNS server obviously does have to exist, though, if you want the guest to get a correct DNS reply so that it can connect to the Internet.
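To show how little a UDP sender cares, here’s a Python sketch that builds a minimal BSD-style syslog message (PRI = facility * 8 + severity, then a tag and the message text; a full RFC 3164 message would also carry a time stamp and hostname) and fires it at a server. The sendto call succeeds as long as the sender has a route to the destination, whether or not anything is listening there.

```python
import socket


def syslog_packet(message, tag="sandbox", facility=1, severity=6):
    """Build a minimal BSD-style syslog payload: <PRI>TAG: MESSAGE.

    facility 1 = user-level, severity 6 = informational, so PRI = 1 * 8 + 6 = 14.
    """
    pri = facility * 8 + severity
    return ("<%d>%s: %s" % (pri, tag, message)).encode("utf-8")


def send_udp(payload, server, port=514):
    """Send one UDP datagram; there's no handshake, so nothing needs to be listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (server, port))
```

For example, send_udp(syslog_packet("test"), "192.0.2.1") (a placeholder address) will show up in a capture on any host the packet passes through, even though no syslog server exists.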
However, if you are spoofing the Internet to make the guest think that it is Internet connected when in fact it isn’t, then the DNS server doesn’t necessarily have to exist (see dnsspoof(8) in Dug Song’s dsniff).
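If you’d rather point the guest at a stand-in resolver than sniff queries and race forged replies the way dnsspoof does, a fake DNS server that answers every A query with the same address is only a few lines of Python. This is a sketch under simplifying assumptions: it only looks at the first question, assumes the question name isn’t compressed, answers non-A queries with an empty (NOERROR) reply, and ignores EDNS. The answer address is whatever host you want the malware to talk to, and binding port 53 needs root.

```python
import socket
import struct


def parse_qname(msg: bytes, offset: int = 12):
    """Return (name, offset just past the name) for an uncompressed question name."""
    labels = []
    while msg[offset] != 0:
        length = msg[offset]
        labels.append(msg[offset + 1:offset + 1 + length].decode("ascii", "replace"))
        offset += 1 + length
    return ".".join(labels), offset + 1


def fake_dns_reply(query: bytes, answer_ip: str) -> bytes:
    """Answer the first question: A queries get answer_ip, anything else gets no answers."""
    txid, flags = struct.unpack_from("!HH", query, 0)
    _, qname_end = parse_qname(query)
    qtype = struct.unpack_from("!H", query, qname_end)[0]
    question = query[12:qname_end + 4]           # QNAME + QTYPE + QCLASS
    is_a = qtype == 1
    reply_flags = 0x8080 | (flags & 0x0100)      # QR=1, RA=1, copy the client's RD bit
    header = struct.pack("!HHHHHH", txid, reply_flags, 1, 1 if is_a else 0, 0, 0)
    answer = b""
    if is_a:
        # Name = pointer to offset 12 (the question name); TYPE A, CLASS IN, TTL 60 s.
        answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4) + socket.inet_aton(answer_ip)
    return header + question + answer


def serve(answer_ip: str, bind=("0.0.0.0", 53)):
    """Log each queried name and reply with answer_ip (runs until interrupted)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(bind)
        while True:
            query, client = sock.recvfrom(512)
            try:
                name, _ = parse_qname(query)
                sock.sendto(fake_dns_reply(query, answer_ip), client)
            except (IndexError, struct.error):
                continue                          # malformed packet; ignore it
            print(client[0], name)
```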
A syslog server certainly doesn’t have to exist. If you are going to run with a non-existent server though, you will need network sniffing / monitoring tools such as tcpdump, wireshark, tshark installed — the idea then is that you pluck the data (requested DNS names and syslog log messages) out of the network packets (tshark is especially good for this).
Also note that if you are going to send packets to a non-existent host (DNS server and / or syslog server for instance), then the destination IP address that you choose will have to be on an IP subnet other than the local one (as far as the guest operating system is concerned).
If the destination address is on the guest operating system’s local (IP) network, then you won’t actually see IP packets, but rather ARP packets as the operating system attempts to determine the MAC (most likely Ethernet) address of the destination host. It also makes things easier if you pick a destination address whose traffic will be routed, or at least bridged, through the virtualisation host (or whichever host is doing the network traffic monitoring).
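Checking whether a candidate address for a non-existent server falls on the guest’s local subnet is a one-liner with Python’s ipaddress module. The guest address in the tests is a placeholder; one option for a fake destination is an address from a documentation range such as 192.0.2.0/24 (RFC 5737), which shouldn’t be routed on the Internet.

```python
import ipaddress


def on_local_subnet(dest, guest_interface):
    """True if `dest` is on the guest's directly connected subnet.

    If it is, the guest will ARP for `dest` instead of sending the packet to its
    default gateway. guest_interface is the guest's address plus prefix length,
    e.g. "192.168.56.101/24".
    """
    return ipaddress.ip_address(dest) in ipaddress.ip_interface(guest_interface).network
```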
If running more than one virtual machine on the same IP subnet, it is worth considering the use of ebtables(8) to prevent each virtual machine from being able to connect (at layer 2, Ethernet) to any host other than its default gateway. This stops an infected / rogue virtual machine from being able to attack other virtual machines on the same IP subnet (or, technically, the same layer 2 network), even if the virtual machine’s host-based firewall (if any) has been compromised or modified (you can’t rely on a host-based firewall once the host has been compromised).
Guest Setup
I have one virtual machine that I use to run the malware samples for dynamic analysis. The guests (virtual machines) have Cuckoo, WinAppDbg, and my unpacking script installed on them so that I can use the same machine to do a number of different tests on each sample.
Operating System
I’ll say Windows 7 Professional, although I haven’t been using it for long as I’m just on the tail end of migrating my dynamic analysis machines over from Windows XP to Windows 7 Professional (after recently discovering that analysis software running on the host no longer supported the cryptography algorithms that Windows XP was using). It is also the operating system that I used for my latest analysis work for which I’m currently drafting a separate post.
I was using Cuckoo v1.2 on Windows XP (yes, I know!), and upgrading to Cuckoo v2.0-rc1 seems to have fixed an issue that I was having with v1.2 on Windows 7.
As the Cuckoo documentation says, turn Windows Firewall, Windows Updates, and User Account Control (UAC) off; it makes things a lot easier! I do the firewalling on the host, and should I run more than one virtual machine on the same network interface, I’ll use ebtables(8) to prevent the various guests from being able to communicate with each other.
Windows Firewall can get in the way of Cuckoo’s network communications, but it may also get in the way of your malware sample and prevent it from connecting to the Internet. Granted, you may not want it to have Internet connectivity, but if you block it at the (virtualisation) host instead, then your packet capture and other monitoring software on the host will at least see the connection attempts.
If you use Windows Firewall on the guest to stop the malware from accessing the Internet, then monitoring software on the host won’t see the connection attempts. Although you can configure Windows Firewall to log such attempts, you have to pull the log data from the guest, and the log is yet another modified file that wasn’t modified by the malware sample (and hence may taint the analysis report from Cuckoo, or other forensic analyses).
Windows Updates can be useful if you want to keep Windows up to date, for instance to test how malware will behave on your latest corporate desktops. If you want to do that, revert to your clean snapshot, log in to Windows, manually check for and apply the updates, then delete the old snapshot and take a new one, but leave automatic updating turned off. That way your Cuckoo analysis, Windows logs, and network monitoring of malware samples aren’t tainted by Windows system processes looking for updates.
User Account Control is something that keeps tripping me up. With my UNIX background, I keep thinking that if I log in as an administrator then any process that I run has full admin rights, but with UAC under Windows 7 this isn’t the case. A process can be granted the admin rights that the user has (using the ‘Run as administrator’ mechanism), but it isn’t automatically granted them at runtime.
I’ve also enabled just about all of the auditing that I could (under the ‘Local Security Policy’ applet in Control Panel). This may produce more logging data than I know what to do with, in which case I’ll scale it back a bit. The most important audit events are process creation and process termination events, as these will tell us if the malware starts any other processes, and when.
Applications
eventlog-to-syslog
This is really useful. It takes Windows Event Log entries and forwards them over syslog, in a much nicer format. As a bonus, the syslog messages are sent over UDP (standard syslog behaviour), which means that you don’t even need to install an actual syslog server, nor send the messages to one: the messages are still sent and will show up in any packet captures. This, coupled with the tshark command from the Wireshark suite, makes it easy to get Windows Event Log messages in a nice, easy-to-read text format (certainly easier than trying to get them out of Event Viewer, or even out of the .evt/.evtx files).
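If tshark isn’t to hand, the following pure-Python sketch pulls UDP payloads (for example, syslog messages sent to port 514) out of a capture file. It assumes a classic libpcap file, such as tcpdump -w writes, rather than pcapng, and Ethernet framing; it skips VLAN-tagged frames and doesn’t reassemble IP fragments.

```python
import struct


def udp_payloads(pcap: bytes, port: int = 514):
    """Yield (source IP, payload) for IPv4/UDP packets sent to `port`.

    Handles classic microsecond libpcap files with Ethernet framing only.
    """
    magic = pcap[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"                              # little-endian file
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"                              # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    if struct.unpack_from(endian + "I", pcap, 20)[0] != 1:
        raise ValueError("only Ethernet (link type 1) captures are handled")
    offset = 24                                   # skip the global header
    while offset + 16 <= len(pcap):
        incl_len = struct.unpack_from(endian + "I", pcap, offset + 8)[0]
        frame = pcap[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len                   # record header + captured bytes
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue                              # too short, or not IPv4
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4                  # IPv4 header length in bytes
        frag_offset = struct.unpack_from("!H", ip, 6)[0] & 0x1FFF
        if ip[9] != 17 or frag_offset or len(ip) < ihl + 8:
            continue                              # not UDP, or a non-first fragment
        src = ".".join(str(b) for b in ip[12:16])
        udp = ip[ihl:]
        dst_port, udp_len = struct.unpack_from("!HH", udp, 2)
        if dst_port == port:
            yield src, udp[8:udp_len]
```

Usage would be along the lines of reading capture.pcap (a placeholder name) in binary mode and printing each payload decoded with errors="replace".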
Don’t forget to enable auditing events in the Local Security Policy Control Panel applet so that Windows actually logs some of the more useful information (such as process creation and termination events for instance).
Cuckoo Sandbox
As mentioned above, Cuckoo consists of a server component and a client component. Just about all of the distributed Cuckoo files are installed on the server. The only files that need to be copied to the client are the agent.py and / or agent.pyw files. The .py file is used if you want to see a console window for the agent script, and the .pyw file is used if you don’t.
Seeing the console window for agent.py can be useful initially, as you can see log messages and the like to help you determine the cause of any problems. However, this window can also get in the way of any malware windows, making them invisible or obscured in any screen captures that Cuckoo takes.
The agent script listens on port 8000/tcp for connections from the cuckoo.py script running on the server. The cuckoo.py script will send all of the necessary files to the agent.py script including the malware sample and Cuckoo DLL file. This makes it easy to install / upgrade, as you only need to upgrade the files on the server and then copy the desired agent script to the client.
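A quick way to confirm that the host can reach the agent before submitting anything is to try a TCP connection to port 8000 on the guest. A Python sketch (the guest address in the usage comment is a placeholder):

```python
import socket


def agent_reachable(host: str, port: int = 8000, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                               # refused, unreachable, or timed out
        return False


# Usage: agent_reachable("192.168.56.101")
```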
WinAppDbg
This is a Python wrapper to the Windows Debugging API. It was originally developed for instrumentation of applications but because it nicely wraps a lot (if not all) of the Windows Debugging API, and I believe adds some extra functionality on top, it is awfully useful for malware analysis. It can be used to hook Windows API functions, set code and memory hardware and software breakpoints, read process memory, and more. My Unpacker (unpack.py) script (see below) uses it.
I’ve installed both the 32-bit and 64-bit versions, each using the corresponding version of Python. WinAppDbg issues a warning if you try to use the 64-bit version with a 32-bit binary.
Python
Python is a scripting language, and it is required for both Cuckoo and WinAppDbg. Both of them want version 2.7. You’ll need Python installed on both the virtualisation host and the virtual guests, as both Cuckoo’s server and client components (apart from the DLL) are written in Python.
I’ve installed both the 32-bit version and the 64-bit version, allowing me to use both the 32-bit and 64-bit versions of WinAppDbg and hence allowing me to analyse both 32 and 64-bit binaries.
I installed the 32-bit version to the default c:\Python27\ directory and then put the 64-bit version in the c:\Python27_x64\ directory. When installing the 64-bit version, turn off the ‘feature’ that registers extensions. That way the default version for running .py files will be the 32-bit version.
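With two interpreters side by side, a small helper can pick the right one for a given sample by reading the Machine field of its PE header, for instance when launching a WinAppDbg script. This is a sketch: the interpreter paths are the install locations described above, and only plain x86 and x64 binaries are recognised.

```python
import struct

# Install locations described above; adjust to suit.
PYTHON_32 = r"c:\Python27\python.exe"
PYTHON_64 = r"c:\Python27_x64\python.exe"


def pe_machine(data: bytes) -> int:
    """Return the COFF Machine field of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ header")
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    return struct.unpack_from("<H", data, e_lfanew + 4)[0]


def python_for(data: bytes) -> str:
    """Choose the interpreter whose bitness matches the sample."""
    machine = pe_machine(data)
    if machine == 0x014C:                         # IMAGE_FILE_MACHINE_I386
        return PYTHON_32
    if machine == 0x8664:                         # IMAGE_FILE_MACHINE_AMD64
        return PYTHON_64
    raise ValueError("unsupported machine type 0x%04x" % machine)
```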
Unpacker (unpack.py)
This is a Python script that I developed after noticing a common trend amongst a number of malware samples that my honeypot was catching. The samples were unpacking and running malicious code. The original version of this script used WinAppDbg to automatically capture the unpacked code and to provide information on its entry point and the loop of code used to unpack it.
I’ve since added more functionality to it (but still haven’t come up with a sensible name for it), and written a couple of posts about it:
Automated Unpacking: A Behaviour Based Approach
Beyond Automated Unpacking: Extracting Decrypted/Decompressed Memory Blocks
Debuggers
I’ve also installed a couple of debuggers on the Cuckoo analysis virtual machine. I use these if I want to load a sample into a debugger and have a look at it, or see what it is doing in a more interactive way. For instance, I’ll use them if my unpacking script doesn’t log anything, or logs a whole load of unhandled exceptions, or if the sample generally doesn’t appear to be doing anything under Cuckoo.
The debuggers help me figure out what the malware is doing in more detail when I don’t get enough information from Cuckoo or from my unpacking script, or when I’m just curious about it. This can also give me ideas for new functionality to add to my unpacking script.
OllyDbg
This was the first debugger that I found when I got back into disassembling and reversing after the good old days of debug.exe on MS-DOS, and it’s free. Thanks, Oleh (Olly)!
OllyDbg nicely went through and showed cross-references to strings and all sorts of other useful information that I didn’t get from debug.exe. I’ve found that there are some situations where OllyDbg is more useful, or works better, than the free version of IDA Pro. I believe OllyDbg is scriptable, although I haven’t yet tried this functionality.
IDA Pro (free version)
I installed the free version of IDA Pro. This has some limitations, but since I am only doing this as a hobby at the moment and trying to build my skills up, it is a good option — thanks Hex-Rays for providing a free version.
IDA Pro will analyse the execution flow of an executable file, showing the basic blocks and the links between them. This is nice functionality as it makes it easier to identify high-level language constructs such as if-then-else blocks, switch blocks, and the like. It will also produce a function call graph which is awfully nifty. IDA Pro is scriptable with Python.
Happy Analysing
I believe that that is just about it, as far as my setup goes (which is good because it’s gone dinner time and I’m hungry). Look out for my upcoming post where I’ll describe how I used my analysis setup to analyse a piece of malware. Until then, happy analysing.