Self modifying code is a phrase used to describe programs that are able to change themselves. Using self modifying code can make it harder to reverse engineer a program, largely because the ‘actual code may differ from that shown’. That is, the code that ends up executing may be different from the code that is first shown during disassembly. Join me as I attempt to relive my youth and create some self modifying code, only this time it will have to work under the memory protection schemes implemented by the 80x86 processors and MS Windows.
The last time I was playing with self modifying code was back in the days of MS-DOS. I used to have fun writing assembly language with good old debug.exe, and conjuring up different ways my code could change itself. Needless to say, I didn’t get invited to many parties.
Neither MS-DOS, nor the 8086 processor on which my code was running, implemented any form of memory protection — applications were free to write all over each other’s memory. There is even a programming game around this concept — if you haven’t already seen it, have a look at Corewars.
However, in MS Windows (at least these days), memory blocks have permissions associated with them. These permissions are enforced with the help of the 80x86 (80286/80386 and above, when running in protected mode) processor. Memory blocks can be readable, writeable, and executable.
In order for code to change itself, the memory page that it resides in must be both executable (so that the code can run to make the changes to itself) and writeable (so that it can change itself). By default, memory that Windows allocates for code (the .text section of an EXE file, for instance) is generally read only, and memory that Windows allocates for data is generally non-executable.
This is generally a good thing, as you shouldn’t try to run data (unless you are performing something like a buffer overflow attack, or you are male and trying to impress a girl), and most programmers would probably agree that writing code to change code is generally considered bad programming. Partly because it is hard to understand, but mostly because it annoys the Vogons.
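The permission rule above can be written down against the Win32 page protection constants. This is only a minimal sketch: the numeric values are copied by hand from winnt.h so that it compiles without the Windows headers, and the helper names (prot_allows_exec, prot_allows_write, can_self_modify) are made up for illustration.

```c
#include <stdint.h>

/* Values copied from winnt.h so the sketch compiles without Windows headers. */
enum {
    PAGE_NOACCESS          = 0x01,
    PAGE_READONLY          = 0x02,
    PAGE_READWRITE         = 0x04,
    PAGE_WRITECOPY         = 0x08,
    PAGE_EXECUTE           = 0x10,
    PAGE_EXECUTE_READ      = 0x20,
    PAGE_EXECUTE_READWRITE = 0x40,
    PAGE_EXECUTE_WRITECOPY = 0x80
};

/* Hypothetical helper: all four executable protections live in the high nibble. */
int prot_allows_exec(uint32_t protect)
{
    return (protect & 0xF0) != 0;
}

/* Hypothetical helper: the low byte holds the basic protection value;
   modifier flags such as PAGE_GUARD sit in higher bits and are masked off. */
int prot_allows_write(uint32_t protect)
{
    uint32_t basic = protect & 0xFF;
    return basic == PAGE_READWRITE || basic == PAGE_WRITECOPY ||
           basic == PAGE_EXECUTE_READWRITE || basic == PAGE_EXECUTE_WRITECOPY;
}

/* Code can only patch itself in place if its page is executable and writeable. */
int can_self_modify(uint32_t protect)
{
    return prot_allows_exec(protect) && prot_allows_write(protect);
}
```

With these definitions, can_self_modify(PAGE_EXECUTE_READ) and can_self_modify(PAGE_READWRITE) are both false, which is the default situation described above for code and data respectively, while can_self_modify(PAGE_EXECUTE_READWRITE) is true.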
Despite that, I’m going to see how self modifying code can work under MS Windows, so grab yourself a stiff drink and a suitable assembler, as time is fleeting and I’d like to get through this before madness takes its toll. Read this closely, hopefully not for very much longer (believe it or not, I am trying to write shorter blog entries), as I’ve got to write some code.
I’m going to use the MinGW32 GNU C cross compiler, gcc, running on Linux (because hey, why do something the easy way when there is a harder way to do it). gcc will take an assembly language file with a .s file extension and build a standalone Windows EXE file for me.
I should probably mention that gcc uses the AT&T style of assembly language notation, rather than the Intel style used by NASM, TASM, etc. You can probably figure out most of the differences, but one difference worth noting is that the operands are reversed. AT&T style places the source operand on the left, and the destination operand on the right. Intel style places the destination operand on the left, and the source operand on the right.
Also, the trailing ‘l’ character on a number of the instructions isn’t because this code is brought to you by the letter ‘l’, but rather to specify the size of the operands. In this case ‘l’ for long (that is 32 bits).
If you would like to follow along at home, I have annotated my code with the letters A to Z in the comments describing each block of code.
If you read my code you will see that I started, at A, with a function (vrtqry()) to make calling _VirtualQuery@12() easier. Well actually, if I’m honest, I created it later when I realised that I wanted to call _VirtualQuery@12() twice and didn’t want to repeat all the supporting code.
My vrtqry() function takes two parameters, on the stack. The first is a memory address to query, and the second is a pointer to a buffer for the MEMORY_BASIC_INFORMATION structure.
It just sets up a stack frame (A), saves some registers (B) which are modified by _VirtualQuery@12(), pushes its arguments on to the stack and calls _VirtualQuery@12() (C), and then cleans up (D) before removing the stack frame and returning (E).
Next, at F, is the start of _WinMain@16() with the usual opening prologue.
The subl $28,%esp instruction at G is to allocate space on the stack for the MEMORY_BASIC_INFORMATION structure. The following pushl $0x0 instruction is to push the value 0 on to the stack. The code at V checks for this to see if it is on the second iteration of the loop and should exit.
Now here, at H, is where it gets exciting. This block of code contains an instruction which you often see in malware (and shellcode) and that is a call instruction to a pop instruction, which is often the next instruction. ‘Has madness set in?’ I hear you ask. Not only is this code transferring control to an instruction that was going to execute next anyway, but that instruction pops something off the stack when nothing was pushed. Or was it?
Think about what a call instruction does. It pushes the return address, that is the address of the next instruction, on to the stack and then transfers control to the specified address (or offset from the next instruction in this case, as gcc spat out a call relative address opcode).
The instruction at the specified address pops the last item off the stack, which in this case is the return address of the call instruction, in to the eax register. The upshot of this is that the eax register ends up containing the address of the very popl instruction that loaded it. This construct is often used by malware to find out its address in memory.
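If the call and pop dance is easier to follow in C, here is a toy model of it. It is only a sketch: the stack is a small array rather than real machine state, the names (toy_stack, call_then_pop) are made up, and the address and the 5 byte length of a call with a 32 bit relative offset are supplied by the caller as assumptions.

```c
#include <stdint.h>

/* Toy stack model; not real machine state. */
typedef struct {
    uint32_t slots[8];
    int top;            /* index of the next free slot */
} toy_stack;

void toy_push(toy_stack *s, uint32_t v) { s->slots[s->top++] = v; }
uint32_t toy_pop(toy_stack *s) { return s->slots[--s->top]; }

/* Models "call nxtinstr" followed immediately by "nxtinstr: popl %eax".
   call_addr is a hypothetical address for the call instruction and
   call_len its encoded length. Returns the value left in eax. */
uint32_t call_then_pop(toy_stack *s, uint32_t call_addr, uint32_t call_len)
{
    uint32_t return_addr = call_addr + call_len; /* address of the next instruction */
    toy_push(s, return_addr);                    /* this is what call does */
    /* control now transfers to return_addr, which is where the popl sits */
    return toy_pop(s);                           /* eax = address of the popl */
}
```

For example, with a call placed at the hypothetical address 0x401000, call_then_pop(&s, 0x401000, 5) returns 0x401005 and leaves the toy stack exactly as it found it.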
Here’s J! The nop instruction at J is there because I realised that it is an instruction consisting entirely of sequential letters of our alphabet, and I wanted to share my excitement with someone.
On a more serious note, the nop instruction is there as a place holder. This code is going to end up attempting to change itself by changing that nop instruction to an inc %ebx instruction which, although it doesn’t consist entirely of sequential letters of our alphabet, is more useful and a better instruction to be stranded in a desert memory page with.
Next up, at K, we’ll do what the comment says (experience has taught me that it is often a good idea to comment assembly language code) and call our vrtqry() function to get the base address and current permissions of the memory page within which our nop instruction is sitting.
That wasn’t too hard. The lea instruction at L is loading the address of that 28 byte buffer that we created with subl $28,%esp at the start of _WinMain@16(), and then pushing it on to the stack as an argument to my vrtqry() function.
M pushes another argument on to the stack and calls vrtqry(). This argument is the address of the memory which we want to query. N simply removes the arguments from the stack.
O saves registers on the stack, as _printf() modifies them.
At P, I add various constants to the edx register to get the addresses of the various elements of the MEMORY_BASIC_INFORMATION structure, push them on to the stack along with a format string, and call _printf(). This step isn’t really necessary. It just provides some information about the memory page and was mainly so that I could tell that the upcoming _VirtualProtect@16() call (now I’ve gone and ruined the suspense) was in fact changing the protection.
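For reference, the constants added to edx would correspond to member offsets like the ones below. This is a hand-written mirror of the 32 bit layout, not the real winnt.h definition: the name mbi32_t is made up, and uint32_t stands in for PVOID, SIZE_T and DWORD, all of which are 4 bytes wide on 32 bit Windows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-written mirror of the 32-bit MEMORY_BASIC_INFORMATION layout.
   Not the real winnt.h definition; every member is 4 bytes wide here. */
typedef struct {
    uint32_t BaseAddress;       /* offset  0: base of the region of pages */
    uint32_t AllocationBase;    /* offset  4: base of the original allocation */
    uint32_t AllocationProtect; /* offset  8: protection at allocation time */
    uint32_t RegionSize;        /* offset 12: size of the region in bytes */
    uint32_t State;             /* offset 16: committed, reserved or free */
    uint32_t Protect;           /* offset 20: current protection */
    uint32_t Type;              /* offset 24: image, mapped or private */
} mbi32_t;

/* Compile-time checks: the array size goes negative if a condition fails. */
typedef char mbi32_size_is_28[sizeof(mbi32_t) == 28 ? 1 : -1];
typedef char mbi32_protect_at_20[offsetof(mbi32_t, Protect) == 20 ? 1 : -1];
```

The total of 28 bytes matches the subl $28,%esp at G, and the Protect member at offset 20 is the one to watch when checking whether a later protection change took effect.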
Q restores the registers that we saved at O.
R is where we will make a _VirtualProtect@16() call to change the protection on the memory page containing the nop instruction. S is where we print the results. First though, we’ll leave these blocks commented out to see what happens with the default memory protection in place.
T is a small block of code that just calls _getchar(). The only reason for doing this, as you will see it rudely discards the result, is to pause the program to make it somewhat easier to attach a kernel debugger to it.
The reason for attaching a kernel debugger is so that I can see the descriptor table entries that correspond to the selectors in the segment registers. This is to satisfy my curiosity regarding how it is possible to create memory that is both writeable and executable when the 80x86 segment descriptors do not allow for such a memory segment.
The type field of the segment descriptor uses the same bit to indicate whether a code segment is readable/not readable as it does to indicate whether a data segment is writeable/read only. Therefore it is not possible to indicate that a code segment is writeable. I have a hunch about how this is achieved, but that is probably best left for a future blog post — I don’t want to pack too much excitement in to one post.
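The shared bit is easier to see when the type field is decoded. Below is a small sketch of the 4 bit type field for code and data segment descriptors; the constant and function names are made up for illustration, and only the bit positions come from the processor documentation.

```c
#include <stdint.h>

/* Type field (4 bits) of a code or data segment descriptor. */
enum {
    SEG_TYPE_ACCESSED = 0x1, /* bit 0: set by the processor on first access */
    SEG_TYPE_RW       = 0x2, /* bit 1: readable (code) or writeable (data) */
    SEG_TYPE_CE       = 0x4, /* bit 2: conforming (code) or expand-down (data) */
    SEG_TYPE_CODE     = 0x8  /* bit 3: 1 = code segment, 0 = data segment */
};

int seg_is_executable(uint8_t type)
{
    return (type & SEG_TYPE_CODE) != 0;
}

/* Data segments are always readable; code segments only when bit 1 is set. */
int seg_is_readable(uint8_t type)
{
    return !seg_is_executable(type) || (type & SEG_TYPE_RW) != 0;
}

/* Only data segments can be writeable, and only when bit 1 is set. */
int seg_is_writeable(uint8_t type)
{
    return !seg_is_executable(type) && (type & SEG_TYPE_RW) != 0;
}
```

Run all sixteen possible type values through these functions and none of them comes out as both executable and writeable, which is exactly the puzzle.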
U, and time to print out some information such as the address of our unsuspecting little nop instruction, and the contents of the ebx register.
This information is printed both before and after modifying the nop to inc %ebx (that is, on the first and second iterations of the loop) so that we can see whether the modification worked. Not because we don’t trust the processor, but rather because we’re expecting the memory protection to get in our way.
This next little trick at V is to decide whether it is time to quit. Remember that, at the end of block G and just before the call nxtinstr instruction at H, we pushed the constant 0 on to the stack, and then went to great lengths to preserve the contents of the eax register ever since popping it immediately after the call? Well this is where we finally do something with it.
The value in eax, at V, will either be the address of the popl %eax instruction at nxtinstr: (that is, the return address of the call instruction), or, if we have already jumped back to nxtinstr: once and executed that popl %eax instruction, eax will contain the 0 from the pushl $0x0 instruction at the end of block G.
This is where we check for that 0. If we find it, we exit. If we don’t find it then we know that this is the first run through this code and so we continue to W.
The movb $0x43,0x1(%eax) instruction at W is what would be written as mov byte ptr [eax + 0x1],0x43 in Intel style, and it will sneak up on the nop instruction, render it unconscious, drag it out of the way, and replace it with an inc %ebx instruction; all before you can exclaim ‘Oh my God, they killed noppy’.
The constant operand of 0x43 is the opcode for inc %ebx (it is also the ASCII code for the C character, but that is largely inconsequential and my stating it just serves to highlight why people tend not to talk to me much).
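For the curious, 0x43 is not a magic number: the one byte form of inc on a 32 bit register is 0x40 plus the register number, and ebx is register 3. A short sketch of that arithmetic follows, along with a patch helper that works on an ordinary byte buffer rather than live code; the names inc_r32_opcode and patch_nop are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit register numbers as used in x86 instruction encodings. */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX, REG_ESP, REG_EBP, REG_ESI, REG_EDI };

#define OPCODE_NOP 0x90

/* One-byte "inc r32" is 0x40 + register number. This holds in 32-bit mode
   only; in 64-bit mode the same byte values are REX prefixes. */
uint8_t inc_r32_opcode(int reg)
{
    return (uint8_t)(0x40 + reg);
}

/* Hypothetical helper: overwrite code[offset] only if it currently holds a nop.
   Returns 1 if the byte was patched, 0 otherwise. */
int patch_nop(uint8_t *code, size_t len, size_t offset, uint8_t opcode)
{
    if (offset >= len || code[offset] != OPCODE_NOP)
        return 0;
    code[offset] = opcode;
    return 1;
}
```

Given a buffer holding 0x58 (popl %eax) followed by 0x90 (nop), patching offset 1 with inc_r32_opcode(REG_EBX) turns it in to 0x58 0x43, which is the same byte layout that the store at W is aiming for.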
The jmp *%eax instruction at X simply jumps back to the address in the eax register (which we’ve gone to great lengths to preserve), which will be the return address of the call instruction at H.
Finally, the moment you’ve all been waiting for — the end, where I describe what this code aims to achieve.
The code at Y just causes _WinMain@16() to return 0, which will cause the program to exit with an exit code of 0 (which the command prompt exposes as ERRORLEVEL).
The purpose of this code then, is to push the constant 0 on to the stack. This will later signal that it is time to exit. It then uses the call instruction to push the address of the popl %eax instruction (at label nxtinstr:), on to the stack.
The first time through, the instruction after the popl %eax instruction, at J, will be a nop. The code at U will print the value of the ebx register. It then checks the eax register and sees that it equals the address of the instruction at nxtinstr: (i.e. it isn’t 0) so it replaces the nop at J with inc %ebx and jumps to the address in the eax register (which is the address of the popl %eax instruction at nxtinstr:).
The eax register is popped again, which is where that constant 0 that we pushed earlier, enters the picture. eax now equals 0. However, the instruction after the popl %eax is now (as we won’t get this far if the instruction to replace the nop fails) inc %ebx. We print the value of the ebx register again to confirm that it is one more than last time, before noticing that the eax register now equals 0 and exiting.
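The two passes described above can be modelled in plain C. To be clear, this is a toy walk-through and not the assembly itself: the nop is a variable rather than a byte in a code page, NXTINSTR is a made up address, and the writable flag simply stands in for whatever the memory protection decides to do with the store at W.

```c
#include <stdint.h>

#define OP_NOP     0x90
#define OP_INC_EBX 0x43

/* Hypothetical address standing in for the nxtinstr: label. */
#define NXTINSTR 0x401005u

typedef struct {
    uint32_t ebx_seen[2]; /* value of ebx printed on each pass */
    int passes;           /* how many times the loop body ran */
} run_log;

/* Toy model of blocks G to X. writable says whether the store at W succeeds.
   Returns 0 on a clean exit, -1 to model the access violation. */
int run_model(uint32_t ebx, int writable, run_log *log)
{
    uint32_t stack[2];
    int top = 0;
    uint8_t slot = OP_NOP;               /* the byte at J */
    uint32_t eax;

    log->passes = 0;
    stack[top++] = 0;                    /* G: pushl $0x0 */
    stack[top++] = NXTINSTR;             /* H: call pushes the return address */

    for (;;) {
        eax = stack[--top];              /* nxtinstr: popl %eax */
        if (slot == OP_INC_EBX)          /* J: nop, or inc %ebx once patched */
            ebx++;
        log->ebx_seen[log->passes++] = ebx; /* U: print ebx */
        if (eax == 0)                    /* V: found the 0, time to exit */
            return 0;
        if (!writable)                   /* W: the store faults on a read only page */
            return -1;
        slot = OP_INC_EBX;               /* W: movb $0x43,0x1(%eax) */
        /* X: jmp *%eax, back around to nxtinstr */
    }
}
```

Starting with ebx set to 7 and writable set to 1, the model records 7 and then 8 before returning 0. With writable set to 0 it records 7 once and returns -1, never reaching the second pass.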
So ‘what’s the point?’ I imagine you asking. The point is to show what happens when a program tries to change its own code, with default memory protection applied.
If you assemble the code and run the executable file, you should see a dialog box stating that ‘memtst.exe has encountered a problem and needs to close’.
For once, this is actually a good thing and if you click on the ‘click here’ to ‘see what data this error report contains’, and then ‘click here’ to ‘view technical information about the error report’, you can see why.
The exception code of 0xc0000005 is access violation (winbase.h defines this as STATUS_ACCESS_VIOLATION) which generally means a memory protection issue. If you make a note of the address (which was 0x40141d on mine) and then disassemble memtst.exe with a debugger, or quicker (assuming you have an objdump with support for the pei-i386 target installed) with the following UNIX command line, you can see which instruction caused the access violation:
$ objdump -D memtst.exe | grep -i 40141d
  40141d: c6 40 01 43 movb $0x43,0x1(%eax)
As you can see, it is the instruction to replace the nop instruction with the inc %ebx instruction, and this means that Windows, with help from the processor, has stopped a program from modifying memory marked as containing non-writeable code. This is good because this would almost always suggest a serious problem with the code (like attempting to write to memory referenced by a corrupted pointer for instance).
Unfortunately for us though, this spoils our fun.
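As an aside, the four bytes c6 40 01 43 of the faulting instruction can be picked apart by hand. The sketch below splits the second byte (the ModRM byte) in to its fields and recognises just this one instruction shape; it is nowhere near a general disassembler, and the function names are made up.

```c
#include <stdint.h>

typedef struct {
    uint8_t mod;  /* bits 7-6: addressing mode */
    uint8_t reg;  /* bits 5-3: register or opcode extension */
    uint8_t rm;   /* bits 2-0: base register */
} modrm_fields;

modrm_fields split_modrm(uint8_t byte)
{
    modrm_fields f;
    f.mod = (uint8_t)(byte >> 6);
    f.reg = (uint8_t)((byte >> 3) & 7);
    f.rm  = (uint8_t)(byte & 7);
    return f;
}

/* Hypothetical check: is this 4-byte sequence "movb $imm8, disp8(%eax)"?
   On success stores the displacement and immediate and returns 1. */
int decode_movb_imm8_disp8_eax(const uint8_t bytes[4], int8_t *disp, uint8_t *imm)
{
    modrm_fields f = split_modrm(bytes[1]);

    if (bytes[0] != 0xC6)                       /* opcode: mov r/m8, imm8 */
        return 0;
    if (f.mod != 1 || f.reg != 0 || f.rm != 0)  /* [eax + disp8], extension /0 */
        return 0;
    *disp = (int8_t)bytes[2];                   /* 0x01: one byte past eax */
    *imm  = bytes[3];                           /* 0x43: the inc %ebx opcode */
    return 1;
}
```

Fed the bytes from the objdump output, it reports a displacement of 1 and an immediate of 0x43, that is, a store of the inc %ebx opcode to the byte just after the address held in eax.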
Now is a good time to go back and uncomment the blocks of code at R and at S, as a call to _VirtualProtect@16() would be pretty handy right about now. Reassemble the new code and run it.
This time the code prints the address of the nop instruction (or more accurately, the contents of the eax register, which is 0 on the second iteration), and the contents of the ebx register. However, this time, the instruction at W actually succeeds in modifying the nop instruction, and the jmp *%eax instruction at X transfers control back to the popl %eax instruction at nxtinstr:.
The ebx register is then printed again, and note that this time through it is one more than the value printed on the last iteration, as we replaced the nop instruction with an inc %ebx instruction before looping back.
That then is one way that a program can change itself — use VirtualProtect() to add write permissions to its code page(s) before writing to them. I have also discovered another way which allows us to do away with the VirtualProtect() call, but that is a topic for another time.