Right, it’s time to start the fun stuff — dynamic analysis, where we run the malware sample and see what happens. What could possibly go wrong?!
Overview
I started off by running the malware sample with my malware analysis script, which has provided a bit of an overview of what’s going on:
INFO:[*] <832:828> 0x405052: VirtualAlloc(0x0,0x14538 (83256),0x1000,0x040) = 0x1a0000
INFO:[-] Request for EXECUTEable memory
INFO:[*] VirtualAlloc()d memory address 0x1a0000 written from 0x405083 (sample!0x5083): mov byte ptr [ecx + edi], al
INFO:[*] VirtualAlloc()d memory address 0x1a0000 accessed (1) from 0x405083
INFO:[-] Enabling tracing
INFO:[E] Reached tracing limit of 250000 instructions
INFO:[D] Single-step instruction limit reached -- stopping tracing
INFO:[*] VirtualAlloc()d memory address 0x1a6c82 accessed (1) from 0x405083
INFO:[-] Enabling tracing
INFO:[E] Reached tracing limit of 250000 instructions
INFO:[D] Single-step instruction limit reached -- stopping tracing
INFO:[*] VirtualAlloc()d memory address 0x1ad904 accessed (1) from 0x405083
INFO:[-] Enabling tracing
INFO:[E] Reached tracing limit of 250000 instructions
INFO:[D] Single-step instruction limit reached -- stopping tracing
INFO:[*] VirtualAlloc()d memory address 0x1a0000 written from 0x404dff (sample!0x4dff): mov dword ptr [edi], eax
INFO:[*] VirtualAlloc()d memory address 0x1a0000 accessed (1) from 0x404dff
INFO:[-] Enabling tracing
INFO:[*] VirtualAlloc()d memory address 0x1a0004 written from 0x404e11 (sample!0x4e11): mov dword ptr [edi + 4], ebx
INFO:[E] Reached tracing limit of 250000 instructions
INFO:[D] Single-step instruction limit reached -- stopping tracing
INFO:[*] VirtualAlloc()d memory address 0x1a0448 accessed (1) from 0x404dff
INFO:[-] Enabling tracing
INFO:[E] Reached tracing limit of 250000 instructions
INFO:[D] Single-step instruction limit reached -- stopping tracing
INFO:[*] VirtualAlloc()d memory address 0x1a0890 accessed (1) from 0x404dff
INFO:[-] Enabling tracing
INFO:[E] Reached tracing limit of 250000 instructions
INFO:[D] Single-step instruction limit reached -- stopping tracing
It looks like the sample is doing something in a loop around 0x405083, and again in a loop around 0x404dff, because the VirtualAlloc()d memory is accessed a number of times from those addresses (as indicated by the ‘INFO:[*] VirtualAlloc()d memory address 0x1a0000 written from …’ lines in the script output).
The second loop is logged a number of times, with the accessed address increasing by 0x448 each time until it accesses memory at address 0x1b4118. That got me wondering why it would be accessing memory in 0x448-byte steps, so I went looking for the significance of 0x448 (an encryption algorithm block size, or something like that), but couldn’t find anything.
It turns out that the memory address jumping by 0x448 was a consequence of my script disabling single-stepping after 250,000 instructions, for performance reasons. This is a good example of why it’s important to know how your tools work — so you can understand what their output means and what they are actually telling you!
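The same effect is visible in the first loop, where the numbers can be checked by hand. The sketch below is only a rough model and makes two assumptions that the script output does not confirm: that the script re-arms its memory watch when tracing stops (so the next logged access is the first write after the 250,000-instruction limit), and that the `jne` at 0x405086 is always taken, so each pass through the loop at 0x405067 - 0x405091 (listed further down) executes nine instructions. The function name is hypothetical.

```python
import math

TRACE_LIMIT = 250_000      # instructions traced before single-stepping is disabled
INSNS_PER_ITERATION = 9    # 0x405067..0x405091 with `call ebx` skipped (assumption)
BYTES_PER_ITERATION = 1    # `mov byte ptr [ecx + edi], al` writes one byte per pass


def expected_stride(limit, insns_per_iter, bytes_per_iter):
    """Bytes the loop advances while `limit` instructions are being traced."""
    return math.ceil(limit / insns_per_iter) * bytes_per_iter


stride = expected_stride(TRACE_LIMIT, INSNS_PER_ITERATION, BYTES_PER_ITERATION)

# Addresses logged for the first loop in the script output above.
logged = [0x1a0000, 0x1a6c82, 0x1ad904]
observed = [b - a for a, b in zip(logged, logged[1:])]

print(hex(stride), [hex(d) for d in observed])
```

Under those assumptions the predicted gap and both observed gaps agree, which supports the tracing limit being the cause. The same check can’t be repeated for the second loop from the output alone, because its loop body isn’t listed.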
While rereading the log output as I reviewed this post, I had another lesson in why it’s important to know how your tools work, and why you need to think carefully about your log output too. I ended up reading more into my log output than I should have after noticing that there are two different log lines there: one saying ‘VirtualAlloc()d memory address … written from …’ and one saying ‘VirtualAlloc()d memory address … accessed (1) from …’.
The difference in the log message wording was making me think that the first log entry meant that the address was written, and the second meant that the memory was read. Looking at my script again I noticed that that isn’t the case. In fact, the ‘(1)’ after the ‘accessed’ is EXCEPTION_RECORD.ExceptionInformation[0], which is an indication of the type of operation that caused the fault, with ‘1’ signifying a memory write operation (and ‘0’ signifying a read operation). It’s been a while since I worked on my script as I’ve been working on my newer C++ malware debugger instead.
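As a small illustration of that field, here is a sketch of how the first two ExceptionInformation entries of an access violation could be turned into readable text. The values 0 (read), 1 (write) and 8 (execute, a DEP violation) are the ones Windows documents for EXCEPTION_ACCESS_VIOLATION; the function name and the output format are hypothetical and are not taken from the script.

```python
# Meaning of EXCEPTION_RECORD.ExceptionInformation[0] for an access violation.
ACCESS_TYPES = {
    0: "read",
    1: "write",
    8: "execute (DEP violation)",
}


def describe_access(exception_information):
    """Describe an access violation from its ExceptionInformation array.

    Element 0 is the access type, element 1 is the faulting virtual address.
    """
    kind = ACCESS_TYPES.get(exception_information[0], "unknown")
    address = exception_information[1]
    return f"{kind} of 0x{address:x}"


print(describe_access([1, 0x1a0000]))
```

Logging the decoded word rather than the raw number would have avoided the misreading described above.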
So, we seem to have two loops that manipulate the VirtualAlloc()d memory at 0x1a0000, one modifying it from 0x405083 (mov byte ptr [ecx + edi], al) and one modifying it from 0x404dff (mov dword ptr [edi], eax). The former looks like a classic base address plus index reference, modifying a byte at a time.
Next we see that the unpacked code is executed, and we have the entry point and the unpacking loop:
INFO:[*] Found unpacked entry point at 0x1a3f7a called from 0x778a0a22 (push ebp) (after executing -1 instructions)
INFO:[-] Unpacking loop at 0x405067 - 0x405091 (83255 iterations)
INFO: 0x405067: cmp dword ptr [start+0x3d7f9c], 0xa8
INFO: 0x405071: mov eax, dword ptr [start+0x3d8a00]
INFO: 0x405076: mov al, byte ptr [eax + edi + 0x1134b]
INFO: 0x40507d: mov ecx, dword ptr [start+0x3b9cbc]
INFO: 0x405083: mov byte ptr [ecx + edi], al
INFO: 0x405086: jne 0x40508a
INFO: 0x405088: call ebx
INFO: 0x40508a: inc edi
INFO: 0x40508b: cmp edi, dword ptr [start+0x3d7f9c]
INFO: 0x405091: jb 0x405067
INFO:[-] Dumping 83256 bytes of memory range 0x1a0000 - 0x1b4537
The first thing we notice there is the interesting address from which the unpacked code was called, 0x778a0a22 — that looks like DLL memory. It also looks like a bug in my script, because the instruction at that address was decoded as ‘push ebp’ which isn’t going to initiate a jump/call (unless it generates an exception because of the value of ss:esp, or because ss:esp points to non-writable memory for instance). We’ll need to manually analyse the sample to find where the unpacked code is called from.
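Going back to the unpacking loop in that output, it can be read as a plain byte-by-byte copy. The sketch below mimics it in Python under some assumptions that the output does not show: edi starts at zero, the three `start+...` globals hold the byte count, the source base and the destination base, and the `call ebx` branch (taken only when the count equals 0xa8) is left out. The function name is hypothetical.

```python
SRC_OFFSET = 0x1134b  # displacement from `mov al, byte ptr [eax + edi + 0x1134b]`


def first_stage_copy(src, size):
    """Mimic the loop at 0x405067 - 0x405091: dst[edi] = src[edi + 0x1134b]."""
    dst = bytearray(size)
    edi = 0                                  # assumed initial value
    while True:
        dst[edi] = src[edi + SRC_OFFSET]     # 0x405076 load, 0x405083 store
        edi += 1                             # 0x40508a: inc edi
        if not edi < size:                   # 0x40508b cmp / 0x405091 jb
            break
    return bytes(dst)


# Synthetic input: padding up to the displacement, followed by a short payload.
packed = bytes(SRC_OFFSET) + b"packed!"
print(first_stage_copy(packed, 7))
```

Because the test sits at the bottom of the loop, the body runs once more than the backward jump is taken. That may be why the script reports 83255 iterations while dumping 83256 bytes.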
We then see that the block of unpacked code requests a second block of memory:
INFO:root:[*] <5000:4916> 0x1a4433: VirtualAlloc(0x0,0x12a67 (76391),0x1000,0x040) = 0x1c0000
INFO:root:[-] Request for EXECUTEable memory
There is some modification of the second block of requested memory before my script detects control passing to it from 0x1a4467 (the first block of requested memory):
INFO:[*] Found unpacked entry point at 0x1c0000 called from 0x1a4467 (jmp dword ptr [ebp - 4]) (after executing 82010 instructions)
INFO:[-] Unpacking loop at 0x1a44a2 - 0x1a44a9 (18 iterations)
INFO: 0x1a44a2: mov dl, byte ptr [ecx]
INFO: 0x1a44a4: mov byte ptr [eax], dl
INFO: 0x1a44a6: inc eax
INFO: 0x1a44a7: inc ecx
INFO: 0x1a44a8: dec edi
INFO: 0x1a44a9: jne 0x1a44a2
INFO:[-] Dumping 76391 bytes of memory range 0x1c0000 - 0x1d2a66
The process then exits after an exception:
INFO:[*] <5000:4916> Unhandled exception at 0xff50ffff (0xff50ffff): EXCEPTION_ACCESS_VIOLATION
INFO:[*] <5000:4916> Unhandled exception at 0xff50ffff (0xff50ffff): EXCEPTION_ACCESS_VIOLATION
INFO:[*] <5000:4916> Exit process event for C:\Users\bobby\sample.exe: 0xc0000005
0xc0000005 is the exit code from the process, and in this case it is the value STATUS_ACCESS_VIOLATION corresponding to the exception that occurred. It is interesting that the process tried to access memory at 0xff50ffff. This could be a bug, either inherent in the malware sample, or caused by my analysis script interacting with the malware sample, or it could be an anti-debug technique where the malware sample sets up an exception handler then deliberately generates an exception to see if its exception handler gets control. If not, it is likely a debugger has control. Again, something to check manually with a debugger.
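For reference, an NTSTATUS value such as 0xc0000005 is a packed 32-bit field rather than an arbitrary number. The sketch below splits it using the documented layout (two severity bits, a customer bit, a reserved bit, a 12-bit facility and a 16-bit code); the function name is hypothetical.

```python
def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": (value >> 30) & 0x3,    # 0 success, 1 informational, 2 warning, 3 error
        "customer": (value >> 29) & 0x1,    # 0 means Microsoft-defined
        "facility": (value >> 16) & 0xFFF,  # 0 is the default facility
        "code": value & 0xFFFF,
    }


print(decode_ntstatus(0xC0000005))
```

For 0xc0000005 this gives severity 3 (error), a Microsoft-defined value, facility 0 and code 5, which is STATUS_ACCESS_VIOLATION.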
So the sample requests a block of memory (0x1a0000), which two loops (one around the instruction at 0x405083 and the other around the instruction at 0x404dff) then populate and modify to create executable code, and then control is passed to it (entry point 0x1a3f7a). That ‘unpacked’ code requests a second block of memory (0x1c0000) which is then modified before control is passed to it (entry point 0x1c0000).
This blog post will focus on the first of those two loops, and the next post will focus on the second loop.
Manual Analysis Using x32dbg (x64dbg)
Right, so what have we got? We now have something to start with. Let’s revert our virtual machine (VM) to a snapshot (my malware analysis script actually runs the malware so we can’t trust the VM in its current state) and load the sample into a debugger and have a look at it. I’m going to use the x32dbg (the 32-bit version of x64dbg) debugger.
Let’s break after the first ‘unpacking’ loop (the one that writes to 0x1a0000) to see what we’ve got, by setting a breakpoint at 0x405093, immediately after the unpacking loop at 0x405067 - 0x405091 (see my script output above). We’ll see if we can resolve 0x778a0a22 to a name at this point, since my script seemed to think that this is where control passes to the unpacked code (I have my doubts!). If we can’t, we’ll try again once control has passed to the unpacked code: if the unpacked code really was called from 0x778a0a22, whatever lives there must be loaded by that point. It does look like a DLL address though, so the DLL may be loaded at a different address in this session.
I usually like to find the instruction that passes control to the unpacked code, as that is a convenient place to set a breakpoint to speed up manual analysis, but it’s not strictly necessary if you just want to determine what the malware is doing to the system. However, since I have my doubts about the unpacked code being called from 0x778a0a22, I want to try and find where this unpacked code is actually called from.
Once we hit the breakpoint after the first unpacking loop (0x405067-0x405091), we’ll have a look and see what we have at 0x1a0000, being the block of memory returned by the VirtualAlloc() call. My script output told us that that loop modifies the newly allocated memory block (at 0x1a0000 in the script output), so we should have some presumably unpacked code there that we can look at.
VirtualAlloc() isn’t guaranteed to give us a block of memory at the same address every time though. In fact, it allocates memory at a different address with the malware sample running in x32dbg than it does when my script runs it. Consequently we need to find the address of the allocated memory. This address is returned by VirtualAlloc() and hence we’ll find it in the eax register when VirtualAlloc() returns. Turns out we can do something a bit spiffy in x32dbg.
To get the address of the allocated memory we’ll put a logging-only breakpoint at 0x405052, the instruction immediately after the VirtualAlloc() call at 0x40504c. (The first line of my script output above shows 0x405052 rather than the address of the call instruction, because the script can only see the return address from VirtualAlloc(), not the call itself.) This breakpoint will log the address returned by VirtualAlloc(), so when we break at 0x405093, after the unpacking loop, we know the address of the newly allocated memory.
Once we have the address returned by VirtualAlloc() we can set a memory execute breakpoint on that block of memory. Setting the breakpoint on the whole block of memory is a bit overkill because we’ve been told (by my script) that it starts executing at 0x1a3f7a, but this will confirm that my script found the correct entry point. Also, if the malware uses a call instruction to execute the unpacked code then we’ll find a return address on the stack once that breakpoint is hit.
We’re not going to get that far in this post, but we’ll set the breakpoint now while we’re examining that part of the code. Having it in place also stops the malware from running unchecked should something go wrong and we fail to stop it before it gets that far [1].
Now, here’s the spiffy x32dbg bit. We can add an x32dbg command (membp eax,0,x, where the trailing x asks for an execute breakpoint) to the breakpoint at 0x405052 (being the return address from VirtualAlloc()) so that x32dbg automatically sets a memory execute breakpoint on the address returned by VirtualAlloc() for us: “the things we can do with technology today” (I don’t seem to be able to find that advert any more). The x64dbg documentation says that ‘If command condition is set, evaluate the expression (defaults to 1)’, but I found that I had to explicitly set it to ‘1’ to get the Command Text to run. If we can’t see a return address on the stack when the memory breakpoint is hit then we’ll have to do a bit more work to find where the unpacked code is called from.
Go to 0x405052 (the address after the call to VirtualAlloc() just before the unpacking loop starts at 0x405067) in x32dbg and press SHIFT-F2 to set a conditional breakpoint. Set the following parameters:
Break Condition | 0 (don’t halt program execution) |
Log Text | VirtualAlloc({[ESP - 10]}, {[ESP - 0c]}, {[ESP - 08]}, {[ESP - 04]}): {EAX} |
Command Text | membp eax,0,x |
Command Condition | 1 |
Note that the negative stack pointer (ESP) offsets are because this breakpoint is set at the instruction immediately after VirtualAlloc() returns, and VirtualAlloc() pops the arguments off the stack before returning. The negative stack pointer offsets are looking at the four values that were just popped (they are still accessible because nothing was pushed to the stack since VirtualAlloc() returned). In other words, it is accessing the four arguments that were passed to VirtualAlloc().
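To see why those negative offsets line up with the arguments, here is a toy model of the stack around the call. It assumes the usual stdcall behaviour (arguments pushed right to left, callee removes them on return); the starting ESP value is arbitrary and the class is purely illustrative.

```python
class Stack:
    """Tiny model of a 32-bit x86 stack: dword slots keyed by address, plus ESP."""

    def __init__(self, esp):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def ret(self, arg_bytes=0):
        """Pop the return address, then discard arg_bytes (stdcall `ret n`)."""
        target = self.mem[self.esp]
        self.esp += 4 + arg_bytes
        return target


stack = Stack(esp=0x12ff00)                          # arbitrary starting ESP
for arg in reversed([0x0, 0x14538, 0x1000, 0x40]):   # arguments pushed right to left
    stack.push(arg)
stack.push(0x405052)                                 # `call` pushes the return address
eip = stack.ret(arg_bytes=16)                        # callee removes four dword arguments

# The popped values are still in memory just below ESP.
args = [stack.mem[stack.esp - off] for off in (0x10, 0x0c, 0x08, 0x04)]
print(hex(eip), [hex(a) for a in args])
```

After the return, ESP is back where it was before the first push, so the first argument sits at [ESP - 10] and the last at [ESP - 04], matching the Log Text above.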
We should now have two breakpoints set. One at 0x405052, immediately after the VirtualAlloc() call, and one at 0x405093, immediately after the first unpacking loop.
Address | State | Disassembly | Summary |
00405052 | Enabled | mov ebx,dword ptr ds:[<&SetFileApisToANSI>] | breakif(0), log("VirtualAlloc({[ESP - 10]}, {[ESP - 0C]}, {[ESP - 08]}, {[ESP - 04]}): {EAX}"), cmdif(1, "membp eax,0,x") |
00405093 | Enabled | mov dword ptr ss:[ebp-14],esi |
Let’s run the malware sample and see what happens. The following snippet is the x32dbg log after hitting the entry point breakpoint, setting the above two breakpoints, and then running the malware sample:
INT3 breakpoint "entry breakpoint" at <sample.EntryPoint> (00408716)!
Breakpoint at 00405093 set!
Breakpoint at 00405052 set!
VirtualAlloc(0, 14538, 1000, 40): 190000
Memory breakpoint at 00190000 set!
INT3 breakpoint at sample.00405093!
You can see there that x32dbg has stopped at our 0x405093 breakpoint, the one after the first unpacking loop. The fourth line (‘VirtualAlloc(…)’) is the output from the logging-only breakpoint at 0x405052. As you can see, it has logged the VirtualAlloc() arguments and also the return value, which is the address of the allocated memory. In this case VirtualAlloc() allocated memory at 0x190000, which demonstrates what I was saying about the address differing between my script running the sample and x32dbg running the sample.
The fifth line in the above x32dbg log output is the x32dbg membp() command setting the breakpoint at the address returned by the VirtualAlloc() call.
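Since the allocation base differs between the two runs, every address from the script output has to be shifted before it can be used in x32dbg. A minimal helper for that translation might look like this (the function name is hypothetical; the two base addresses are the ones logged above):

```python
SCRIPT_BASE = 0x1a0000   # allocation base when the analysis script ran the sample
X32DBG_BASE = 0x190000   # allocation base logged by the x32dbg breakpoint


def rebase(address, old_base=SCRIPT_BASE, new_base=X32DBG_BASE):
    """Translate an address from one run's allocation to the other's."""
    return address - old_base + new_base


print(hex(rebase(0x1a3f7a)))  # entry point of the first unpacked block
print(hex(rebase(0x1a4467)))  # instruction that jumps to the second block
```

The first of those is the address to disassemble from in this x32dbg session, and the second is where a later breakpoint would go.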
When the breakpoint at 0x405093 is hit, the sample has just exited the first unpacking loop. We know from the 0x405052 breakpoint’s log message (and from my script’s output) that we should have some unpacked code/data at address 0x190000. Let’s have a look:
00190000 | 031CE2 | add ebx,dword ptr ds:[edx] |
00190003 | 61 | popad |
00190004 | A2 06CEBF46 | mov byte ptr ds:[46BFCE06],al |
00190009 | 58 | pop eax |
0019000A | F3:E7 AA | out AA,eax |
0019000D | 4D | dec ebp |
0019000E | 3626:61 | popad |
00190011 | E2 31 | loop 190044 |
00190013 | AE | scasb |
00190014 | C8 71A2 93 | enter A271,93 |
That doesn’t look like valid code. It could be data — if it is, it’s not ASCII characters. Let’s disassemble from the known entry point, as that should definitely be valid code:
00193F7A | 5D | pop ebp |
00193F7B | 0103 | add dword ptr ds:[ebx],eax |
00193F7D | 9C | pushfd |
00193F7E | 5A | pop edx |
00193F7F | 695A 4F 70A6B224 | imul ebx,dword ptr ds:[edx+4F],24B2A670 |
00193F86 | 3D C42983F3 | cmp eax,F38329C4 |
00193F8B | F0 | ??? |
00193F8C | AC | lodsb |
00193F8D | 4D | dec ebp |
00193F8E | D6 | salc |
00193F8F | F9 | stc |
00193F90 | 0A08 | or cl,byte ptr ds:[eax] |
00193F92 | AC | lodsb |
00193F93 | 0F1EEC | nop esp,ebp |
00193F96 | 5B | pop ebx |
00193F97 | 1C EF | sbb al,EF |
Ditto — that doesn’t look like valid code either. Remember though, that log messages from my unpacking script were suggesting that there were two unpacking loops because it noticed two different instructions modifying the newly allocated memory block at 0x190000/0x1a0000:
INFO:[*] VirtualAlloc()d memory address 0x1a0000 written from 0x405083 (sample!0x5083): mov byte ptr [ecx + edi], al
...
INFO:[*] VirtualAlloc()d memory address 0x1a0000 written from 0x404dff (sample!0x4dff): mov dword ptr [edi], eax
The fact that there are two instructions modifying the same target address suggests that there is more unpacking to do. Well that and the fact that there doesn’t appear to be executable code at the unpacked entry point after the first loop. Of course it could be that my script got the entry point incorrect, but we can verify that by letting the sample run until that memory execute breakpoint is hit, and then examining the eip register.
It is worth noting at this point that my unpacking script actually dumps unpacked memory to files, and we can use the UNIX objdump(1) command to disassemble them. However, we can’t see an instruction at offset 0x3f7a (corresponding to address 0x193f7a/0x1a3f7a):
$ objdump -b binary -m i386 -D sample.exe.memblk0x001a0000
...
3f78: df 79 e8 fistpll -0x18(%ecx)
3f7b: 01 00 add %eax,(%eax)
3f7d: 00 00 add %al,(%eax)
3f7f: c3 ret
3f80: 55 push %ebp
3f81: 8b ec mov %esp,%ebp
...
Just explaining those objdump command line arguments: -b specifies the file format (elf, pe, pdb, srec, ihex, etc.). Since we just have raw opcodes with no file format headers, we specify ‘binary’. -m specifies the architecture, which we need to give explicitly because there is no file header for objdump to read it from. We specify i386 here to tell objdump that the binary data is i386 (or 80x86) opcodes. -D says to disassemble all sections, as compared to -d which will only disassemble sections that are expected to contain code (like the .text section).
The reason we can’t see an instruction at offset 0x3f7a is that the 80x86 processor has variable length instructions: different instructions are encoded in differing numbers of bytes. Consequently, if we don’t have consecutive valid instructions from offset 0x0, we can’t guarantee that the instructions displayed above are correct, and they certainly don’t look correct apart from what looks like a function prologue at offset 0x3f80.
This is a property of the 80x86 processors that also allows for some anti-disassembly tricks, like jumping into the middle of an instruction. That’s something I was playing around with back in the nineties (back when I didn’t have lots of other stuff to get done), just to see if it was possible.
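The effect is easy to reproduce with the eleven bytes objdump printed from offset 0x3f78. The sketch below is a toy linear sweep, not a real disassembler: its opcode tables cover only the opcodes that occur in this snippet, it handles just the basic 32-bit ModRM cases, and the function names are hypothetical.

```python
CODE = bytes.fromhex("df79e801000000c3558bec")  # bytes shown by objdump from 0x3f78
BASE = 0x3f78                                   # file offset of CODE[0]

FIXED_LENGTH = {0xC3: 1, 0x55: 1, 0xE8: 5}      # ret, push ebp, call rel32
HAS_MODRM = {0x00, 0x01, 0x8B, 0xDF}            # opcodes followed by a ModRM byte


def insn_len(buf, i):
    """Length of the instruction at buf[i] (toy decoder, this snippet only)."""
    opcode = buf[i]
    if opcode in FIXED_LENGTH:
        return FIXED_LENGTH[opcode]
    if opcode not in HAS_MODRM:
        raise ValueError(f"opcode 0x{opcode:02x} not covered by this sketch")
    mod, rm = buf[i + 1] >> 6, buf[i + 1] & 7
    length = 2                                  # opcode + ModRM
    if mod != 3 and rm == 4:
        length += 1                             # SIB byte
    if mod == 1:
        length += 1                             # 8-bit displacement
    elif mod == 2 or (mod == 0 and rm == 5):
        length += 4                             # 32-bit displacement
    return length


def sweep(buf, start, base=BASE):
    """Return the offsets at which a linear sweep from `start` decodes instructions."""
    offsets = []
    i = start - base
    while i < len(buf):
        offsets.append(base + i)
        i += insn_len(buf, i)
    return offsets


print([hex(o) for o in sweep(CODE, 0x3f78)])
print([hex(o) for o in sweep(CODE, 0x3f7a)])
```

Starting at 0x3f78, the three-byte instruction swallows the e8 at 0x3f7a and the sweep never lands on that offset. Starting at 0x3f7a, the e8 is decoded as the start of a five-byte call, and the two sweeps only fall back into step at the ret at 0x3f7f.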
To get around this problem we can give objdump a starting address (or offset), and let’s go all out and tell objdump where the code was located in memory too. --adjust-vma=0x190000 tells objdump to add 0x190000 to addresses, so that they match the addresses where the code will end up when it is unpacked in x32dbg; and --start-address=0x193f7a tells objdump to start disassembling from what is now address 0x193f7a (the file offset of 0x3f7a plus the adjust-vma value of 0x190000) [2]:
$ objdump -b binary -m i386 -D --adjust-vma=0x190000 --start-address=0x193f7a sample.exe.memblk0x001a0000
sample.exe.memblk0x001a0000: file format binary
Disassembly of section .data:
00193f7a <.data+0x3f7a>:
193f7a: e8 01 00 00 00 call 0x193f80
193f7f: c3 ret
193f80: 55 push %ebp
193f81: 8b ec mov %esp,%ebp
193f83: 8d 45 c4 lea -0x3c(%ebp),%eax
193f86: 83 ec 3c sub $0x3c,%esp
193f89: 50 push %eax
193f8a: e8 0d 00 00 00 call 0x193f9c
...
Now we can see a valid instruction at address 0x193f7a. Using --adjust-vma to add the memory base address to all the file offsets means that all the relative addresses, like those in the call and jmp instructions for instance, are shown as the actual memory addresses, which will make it easier to follow along in a debugger.
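The call targets in that listing can be checked by hand, because a `call rel32` stores its destination as a signed 32-bit offset from the end of the instruction. Here is a short sketch of the arithmetic, using the two call encodings from the listing above (the function name is hypothetical):

```python
import struct


def call_target(insn_address, insn_bytes):
    """Resolve the destination of a 5-byte `call rel32` (opcode 0xE8)."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    (rel32,) = struct.unpack("<i", insn_bytes[1:])      # little-endian, signed
    return (insn_address + 5 + rel32) & 0xFFFFFFFF      # relative to the next instruction


VMA = 0x190000  # value passed to --adjust-vma

print(hex(call_target(VMA + 0x3f7a, bytes.fromhex("e801000000"))))
print(hex(call_target(VMA + 0x3f8a, bytes.fromhex("e80d000000"))))
```

Without the adjustment the same arithmetic would give file offsets (0x3f80 and 0x3f9c), which then have to be rebased manually before they can be used in the debugger.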
You’ll notice a difference in the instruction decoding done by objdump versus that done by x32dbg. objdump uses the AT&T style, which orders operands as <src>, <dst> and uses parentheses rather than brackets for the memory indirect addressing mode, whereas x32dbg uses the Intel style, which orders operands as <dst>, <src>. That can get confusing when switching between the two!
I did try to find what was at 0x778a0a22 (the address from which my script claimed the unpacked code was called) after loading the sample into x32dbg and running it to the first unpacked code entry point (0x1a3f7a in my script’s run, 0x193f7a in x32dbg), but all I got was bitter and a nasty little rash. It turned out that 0x778a0a22 wasn’t actually mapped in the x32dbg session, so I couldn’t see what was there when my script ran. As far as analysing the sample goes though, it’s largely irrelevant, but it is something that I should figure out to debug my script/algorithm.
Once the unpacked code starts running, and hence we know (or at least are more confident) that it is completely unpacked, we can set a breakpoint at 0x194467 (address corrected for x32dbg execution), where the second block of unpacked code is called from (‘Found unpacked entry point at 0x1c0000 called from 0x1a4467’). We should be able to automate that with x32dbg breakpoints.
Now we know from my script output that this unpacked code also includes an unpacking loop, but to keep you in suspense (more so that this post doesn’t get too long… well, even longer), let’s look at that in the next blog post.
[1] We can’t set a code breakpoint when we start our debugging session because that memory won’t exist (it’s dynamically allocated with VirtualAlloc()). We also can’t set a breakpoint as soon as that memory is VirtualAlloc()d either, because our breakpoint instruction (int 3) will be overwritten by the unpacking code in the malware. This would, however, be a good reason to use a hardware breakpoint rather than a software one. Let’s use a memory execute breakpoint (guard page) though in case, as stated above, the memory is modified more than once during unpacking.
[2] The filename contains 0x001a0000 because my script created the file based on the memory address (0x1a0000) that was unpacked to when my script ran the sample. We’re using 0x190000 in the objdump command so that the addresses in the objdump output match what we’ll see in x32dbg (where the code will be unpacked to 0x190000).