Malware often extracts an embedded PE (Portable Executable) file from within itself, and either overwrites its original process image, or starts and overwrites a new process (process hollowing), with the embedded image. What if you want to save a copy of this extracted PE file so that you can analyse it using something other than the debugger that you were running the sample in?
While looking at Tofsee I noticed that it extracted an embedded PE file and overwrote its original process image in memory (at 0x400000) with the extracted PE file. It would be good to save a copy of that so that I can analyse it in Ghidra, or in a different debugger, or to just save the extracted PE as a malware sample without all the packing around it. You never know, we may then find different malware samples that end up unpacking the same embedded PE file.
With the malware sample loaded into x64dbg, go to the memory map and select (hold ‘Shift’ and left-click) the memory block corresponding to the process name (typically the block at 0x400000) together with the memory blocks corresponding to the .text, .data, .rsrc, and .reloc sections. Then right-click and select ‘Dump Memory to File’:
Address | Size | Party | Info | Content | Type | Protection | Initial |
00400000 | 00001000 | User | sample.exe | | IMG | ER--- | ERWC- |
00401000 | 0002E000 | User | “.text” | Executable code | IMG | ER--- | ERWC- |
0042F000 | 01FEF000 | User | “.data” | Initialized data | IMG | -RWC- | ERWC- |
0241E000 | 00028000 | User | “.rsrc” | Resources | IMG | -R--- | ERWC- |
02446000 | 0000B000 | User | “.reloc” | Base relocations | IMG | -R--- | ERWC- |
You’ll notice that this creates a huge file (33,886,208 bytes) when the original PE file was only 398,848 bytes! That is because x64dbg has saved the sections to the file with the same spacing as they have in memory. That is, the address of the first memory block (0x400000 in this case) becomes offset 0 in the file, and each of the other memory blocks is saved at its memory offset from 0x400000. For example, the second memory block (.text) is at 0x401000, so we’ll find that at offset 0x1000 (0x401000 - 0x400000) in the file. The .data section is at 0x42f000, so we’ll find that at offset 0x2f000 (0x42f000 - 0x400000) in the file. Hence we end up with a large file with a lot of zero bytes (padding between the sections) in it.
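The offset arithmetic can be checked directly in the shell. This is only a small sketch: the helper name `dump_offset` is made up for illustration, and the addresses are the ones from the memory map of this particular sample.

```shell
# Hypothetical helper: where does a memory address land in the combined dump?
# The dump starts at the first selected block, so offset = address - base.
dump_offset() {
    base=$(($1))
    addr=$(($2))
    printf '0x%x\n' $((addr - base))
}

dump_offset 0x400000 0x401000    # .text
dump_offset 0x400000 0x42f000    # .data
dump_offset 0x400000 0x2446000   # .reloc

# Total dump size: end of the last block (.reloc start + its size) minus the base.
echo $((0x2446000 + 0xb000 - 0x400000))   # 33886208
```

The last line reproduces the 33,886,208 byte size of the combined dump from the block addresses and sizes alone.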
If you remember back to the PE file header, you’ll recall that the section table holds, for each section, both its memory load address and its file offset. We can use this information to rebuild the PE file from the various memory sections.
If, instead of selecting the multiple memory blocks at once and dumping to a file, we select each one in turn and dump it to a file, then we get the memory of each section in a separate file. We can then put them back together, according to the section table in the file header, using some UNIX™ jiggery pokery[1]. Let’s first list the section headers from the PE file header in the first block of memory, which was at 0x400000:
$ objdump -h sample_00400000.bin
BFD: error: sample_00400000.bin(.text) is too large (0x2d06c bytes)
sample_00400000.bin: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0002d06c 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00001e00 0042f000 0042f000 0002d600 2**2
CONTENTS, ALLOC, LOAD, DATA
2 .rsrc 00027978 0241e000 0241e000 0002f400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .reloc 0000a6bc 02446000 02446000 00056e00 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Ignoring that error, which I’m guessing is objdump telling us that the .text section is located past the end of the file (because the file only contains the PE file header(s) and none of the sections), we can see the memory addresses and the file offsets for each of the PE file sections. x64dbg has saved the memory address in the file name when it dumped each memory section, so we can now start putting the separate sections together, using the dd(1) command, to build a PE file.
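Picking the section names and file offsets out of the objdump(1) output by eye is fine for four sections, but it can also be scripted. The following is a rough sketch, not a robust parser: `section_offsets` is a made-up name, and it assumes the seven-column layout of the `objdump -h` output shown above.

```shell
# Hypothetical helper: read `objdump -h` output on stdin and print each
# section's name, load address and file offset (the latter in decimal).
section_offsets() {
    while read -r idx name size vma lma off algn; do
        case "$idx" in
            '' | *[!0-9]*) continue ;;   # skip anything that is not a section row
        esac
        [ -n "$algn" ] || continue       # a section row has seven fields
        echo "$name vma=0x$vma file_offset=$((0x$off))"
    done
}

# Real use would be: objdump -h sample_00400000.bin 2>/dev/null | section_offsets
# Here it is fed two of the rows from the output above instead.
section_offsets <<'EOF'
Idx Name Size VMA LMA File off Algn
0 .text 0002d06c 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .data 00001e00 0042f000 0042f000 0002d600 2**2
CONTENTS, ALLOC, LOAD, DATA
EOF
```

The load address maps to the x64dbg dump file name, and the decimal file offset is what the dd(1) seek argument is derived from.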
I’ll explain the command line arguments for the dd(1) command:
if | input file name |
of | output file name |
bs | block size (default 512 bytes) |
seek | number of blocks to skip in the output file. dd(1) will lseek(2)/fseek(3) to this location in the output file before writing to it. This argument is used to start writing the various sections at their correct location in the output (PE) file. Without this argument, dd(1) will start writing at the start of the output file and clobber any existing content |
I’ll explain the whole block size thing. dd(1) reads and writes blocks (512 byte blocks by default), so we need to convert the offset used in the seek argument to the number of blocks rather than the number of bytes. Well, technically we don’t, because we can just specify a bs of ‘1’ (byte) and then specify the number of bytes in the seek argument.
Using a block size (bs) of ‘1’ (byte) is inefficient though, because dd(1) reads and writes a block at a time and hence a bs argument of ‘1’ will cause dd(1) to read(fd, buf, 1) and write(fd, buf, 1), which means more system calls (511 extra read() calls and 511 extra write() calls for every 512 bytes of data!). This PE file is reasonably small so there probably won’t be any really noticeable difference, but when dealing with large files (USB thumb drive/CD/DVD images for instance), the larger the block size you can use, the better. In this case we are restricted by the offset in the PE file where we need to start writing, because we need to give dd(1) an integer number of blocks (rather than a number of bytes) to skip. That is why I change bs from ‘1024’ to ‘512’ in some of the commands below: the offset of 0x2d600 (185856) is not divisible by 1024, but is divisible by 512.
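The choice of block size can be worked out mechanically: it is the largest power of two that divides the file offset. Here is a minimal sketch of that idea; `pick_bs` is a hypothetical helper, and the 1 MiB cap is an arbitrary assumption to stop the loop for an offset of 0.

```shell
# Hypothetical helper: given a file offset, print the largest power-of-two
# block size (capped at 1 MiB) that divides it, and the matching seek count.
pick_bs() {
    offset=$(($1))
    bs=1
    while [ "$bs" -lt 1048576 ] && [ $((offset % (bs * 2))) -eq 0 ]; do
        bs=$((bs * 2))
    done
    echo "bs=$bs seek=$((offset / bs))"
}

pick_bs 0x400     # bs=1024 seek=1
pick_bs 0x2d600   # bs=512 seek=363
pick_bs 0x2f400   # bs=1024 seek=189
pick_bs 0x56e00   # bs=512 seek=695
```

For the four file offsets in this section table, that gives block sizes of 1024, 512, 1024 and 512.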
The next obstacle is that objdump(1) gives us the file offsets in hexadecimal (hex), but dd(1) wants decimal values, so we need to convert. You can (if you have to) use bash(1) to do this. I’ve been using UNIX™ for some time so prefer to use something that is closer to being POSIXLY_CORRECT[2], like bc(1).
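As a sketch of the bc(1) route: bc wants upper-case hex digits and no 0x prefix when ibase is 16, so the offset needs a little massaging first. The helper name `hex2dec` is made up, and the printf(1) fallback is my own addition for systems where bc is not installed.

```shell
# Hypothetical helper: convert a hex value (with or without a 0x prefix)
# to decimal.
hex2dec() {
    # Strip any 0x prefix and upper-case the digits, as bc(1) requires.
    digits=$(printf '%s' "${1#0x}" | tr 'a-f' 'A-F')
    if command -v bc >/dev/null 2>&1; then
        echo "ibase=16; $digits" | bc
    else
        printf '%d\n' "0x$digits"   # fallback when bc(1) is not installed
    fi
}

hex2dec 0x2d600   # 185856
hex2dec 56e00     # 355840
```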
Here, then, are the commands to reconstruct a valid PE file from the memory sections. I’ve done the hex to decimal conversion and shown my working (an ode to maths teachers) in the comments preceding each dd(1) command:
# start with the PE file header(s)
$ cp -a sample_00400000.bin pefile.exe
# add the .text section from 0x401000 to 0x400
# 0x400 == 1024
# 1024 / 1024 (bs) == 1
$ dd if=sample_00401000.bin of=pefile.exe bs=1024 seek=1
184+0 records in
184+0 records out
188416 bytes (188 kB, 184 KiB) copied, 0.00115347 s, 163 MB/s
# add the .data section from 0x42f000 to 0x2d600
# 0x2d600 == 185856
# 185856 / 512 (bs) == 363
$ dd if=sample_0042F000.bin of=pefile.exe bs=512 seek=363
65400+0 records in
65400+0 records out
33484800 bytes (33 MB, 32 MiB) copied, 0.141891 s, 236 MB/s
# add the .rsrc section from 0x241e000 to 0x2f400
# 0x2f400 == 193536
# 193536 / 1024 (bs) == 189
$ dd if=sample_0241E000.bin of=pefile.exe bs=1024 seek=189
160+0 records in
160+0 records out
163840 bytes (164 kB, 160 KiB) copied, 0.00103226 s, 159 MB/s
# add the .reloc section from 0x2446000 to 0x56e00
# 0x56e00 == 355840
# 355840 / 512 (bs) == 695
$ dd if=sample_02446000.bin of=pefile.exe bs=512 seek=695
88+0 records in
88+0 records out
45056 bytes (45 kB, 44 KiB) copied, 0.000521471 s, 86.4 MB/s
If we compare the output file (of), pefile.exe, with the original PE file, sample.exe, loaded into x64dbg, we can see an obvious difference:
-r-------- 1 user group 398848 Oct 7 2023 sample.exe
-rw------- 1 user group 400896 Jun 10 10:25 pefile.exe
Should they not be the same size?! Let’s think about what’s going on here. We dumped the memory sections from x64dbg and pasted them back together at the correct file offsets in pefile.exe. However, memory is allocated in pages (of typically 4096 bytes on 80x86 processors), and x64dbg is dumping the whole block of memory which is hence going to be an integer number of pages (that is, an integer multiple of 4096 bytes).
We are, however, specifying the offset where we want the sections to be written in the output file, so the extra padding at the end of the section dumps shouldn’t make any difference: by default (that is, without conv=notrunc) dd(1) truncates the output file at the seek offset before it starts writing, so each section is cut back to the same size as in the original PE file when we place the next section at its correct offset. The exception is the last section, which isn’t truncated because we’re not writing another section after it.
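The truncating behaviour is easy to see on a toy file. This sketch uses throwaway files in a temporary directory (the file names are arbitrary) and compares dd(1) with and without conv=notrunc:

```shell
tmp=$(mktemp -d)
printf 'BBBB' > "$tmp/section.bin"            # 4 bytes to write at offset 4

# Default behaviour: everything in the output file past the seek offset is discarded.
printf 'AAAAAAAAAAAAAAAA' > "$tmp/out.bin"    # 16 bytes of existing content
dd if="$tmp/section.bin" of="$tmp/out.bin" bs=4 seek=1 2>/dev/null
default_result=$(cat "$tmp/out.bin")

# With conv=notrunc the tail of the existing content survives.
printf 'AAAAAAAAAAAAAAAA' > "$tmp/out.bin"
dd if="$tmp/section.bin" of="$tmp/out.bin" bs=4 seek=1 conv=notrunc 2>/dev/null
notrunc_result=$(cat "$tmp/out.bin")

echo "default: $default_result"   # default: AAAABBBB
echo "notrunc: $notrunc_result"   # notrunc: AAAABBBBAAAAAAAA
rm -rf "$tmp"
```

This is why the 33 MB .data dump does not leave the rebuilt file at 33 MB: the next dd(1) command cuts the file back to the .rsrc offset before writing.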
The section table is telling us that the last section (.reloc) is 0xa6bc (42,684) bytes, but the file that x64dbg dumped for the last section is 45,056 (11 x 4096) bytes. Taking that difference (2,372 bytes) into consideration and subtracting it from the size of the rebuilt PE file (400,896 bytes), we get 398,524 bytes. The original PE file (at 398,848 bytes) is larger by 324 bytes, which looks like it could be padding (something that I could have done with when I came off the front of a 36″ unicycle doing around 20 km/h a few weeks ago). 398,848 / 512 is exactly 779, whereas 398,524 / 512 is 778.3671875, so the original PE file appears to have been padded up to the next multiple of 512 bytes. That would be consistent with the FileAlignment field in the PE optional header, to which each section’s raw data is padded and which is commonly 512 bytes (assuming this sample uses that common value). Or possibly it is padded in case it crashes. Padding is handy in a crash.
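The padding theory can be checked with a little arithmetic. The sketch below assumes a file alignment of 0x200 (512) bytes, which is a common value but is not something read from this sample’s header. `align_up` is a hypothetical helper.

```shell
# Hypothetical helper: round a size up to the next multiple of an alignment.
align_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

reloc_offset=$((0x56e00))              # file offset of .reloc from the section table
reloc_raw=$(align_up 0xa6bc 0x200)     # .reloc size padded to the ASSUMED 512-byte alignment
echo "$reloc_raw"                      # 43008
echo $((reloc_offset + reloc_raw))     # 398848
```

The result, 398,848, matches the size of sample.exe. If a byte-for-byte match in length were wanted, something like `dd if=pefile.exe of=trimmed.exe bs=512 count=779` should cut the rebuilt file back to that size (trimmed.exe being an arbitrary name).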
So there we have it: we’ve reconstructed a PE file from memory. We can now analyse this with Ghidra, and other tools of choice. Let’s do some sanity checking and compare the objdump(1) output for the original sample file with that for the reconstructed PE file. We’ll see that the only difference is the file name:
$ objdump -x sample.exe > original
$ objdump -x pefile.exe > reconstructed
$ diff original reconstructed
2,3c2,3
< sample.exe: file format pei-i386
< sample.exe
---
> pefile.exe: file format pei-i386
> pefile.exe
To demonstrate why we need to go to all that trouble to reconstruct the PE file, let’s run objdump(1) on the large (33,886,208 byte) memory dump file that we started off with (the memory dump file containing all of the sections):
There is an import table in .text at 0x42d550
The Import Tables (interpreted .text section contents)
vma: Hint Time Forward DLL First
Table Stamp Chain Name Thunk
0002d550 004086c3 004086d7 fffffffe 00000000 ffffffd4
PE File Base Relocations (interpreted .reloc section contents)
There is a debug directory in .text at 0x401230
Type Size Rva Offset
0 Unknown 00000000 00000000 00000000
The .rsrc Resource Directory section:
000 Type Table: Char: 0, Time: 00000000, Ver: 0/0, Num Names: 0, IDs: 0
WARNING: Extra data in .rsrc section - it will be ignored by Windows:
218 Type Table: Char: -622912640, Time: 2520e47f, Ver: 5/0, Num Names: 2, IDs: 0
228 Entry: <corrupt string offset: 0x401cd0>
Corrupt .rsrc section detected!
Notice how objdump(1) fails to get most of the information, like imports, that is contained in one of the sections rather than in the PE header(s). This is because the sections are not at the correct offsets, as specified in the section table, in the PE file. Compare this with objdump(1) output on the reconstructed PE file:
There is an import table in .text at 0x42d550
The Import Tables (interpreted .text section contents)
vma: Hint Time Forward DLL First
Table Stamp Chain Name Thunk
0002d550 0002d5dc 00000000 00000000 0002dac4 00001014
DLL Name: KERNEL32.dll
vma: Hint/Ord Member-Name Bound-To
2d806 973 SetEndOfFile
2d816 313 FindResourceW
2d826 700 InterlockedDecrement
2d83e 698 InterlockedCompareExchange
2d85c 6 AddConsoleAliasW
...
Now my main reason for doing that was to test that I could rebuild a PE file from an unpacked PE file in memory, and hence save the embedded PE file that the Tofsee sample unpacks. I could then load it into Ghidra to see what Ghidra makes of it. It is worth noting, though, that you can also load the raw memory dumps into Ghidra; you just need to help it out by telling it what the bytes are that you’re giving it (and maintain a valid PE file structure if the memory dump is of a PE file header and sections). So, now let’s return from this little side-track back to our Tofsee analysis.
[1] I grew up thinking ‘jiggery pokery’ was just some kind of clever/fancy tricks; however, when I looked it up to verify that, I found out that it is actually used to mean deceptive/sleight-of-hand trickery. I’m going to leave it in though because I like the phrase, just not to mean what its formal/common definition seems to mean.
[2] POSIX™: Portable Operating System Interface defines system APIs and commands for portability. Consequently, shell scripts written on one POSIX™ system should run without modification on another POSIX™ system.