Increase in MySQL Attacks: Extracting the Binary Files

Seeing an increase in MySQL attacks hitting your network and interested in knowing more about them? If so, then these posts are for you. They cover everything from noticing the increase in traffic to extracting malware from a packet capture and analysing it. If you like the thrills and spills of scripting information processing tasks, read on: this post shows how to extract the binary files from the MySQL commands.

Updating the Capture File

Before going any further, I decided to update the packet capture files, as a couple of weeks had gone by since I generated the weekly pcap files. A new packet capture file will show whether the attacks have changed much since I first noticed them in early November.

I used the Honeywall’s Walleye web interface to generate the new pcap file, since I only need less than a month’s worth of packets. This can be done from the Data Analysis tab: change the date/time in the Time Start text box and submit the form. Optionally, set IP Proto to TCP and enter 3306 into the Either Port text field to get only MySQL packets, which reduces the size of the capture file somewhat. I called this capture file november.pcap.

Extracting the Binary Files

This is where it gets interesting. I used my extractbins.sh script to extract the binaries from the MySQL commands contained in the november.pcap capture file. This script uses the tcpick command to extract the payload data from TCP connections.
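extractbins.sh isn’t reproduced here, but its final step, turning the hex in a MySQL command back into a binary file, can be sketched in a few lines of shell. This is a minimal sketch rather than the script itself. It assumes that the binary travels as a single 0x-prefixed hex literal in a saved text dump of the commands, that the longest such literal in the file is the payload, and that it runs under bash; the function name hex_literal_to_bin is made up for illustration.

```shell
# hex_literal_to_bin SRC DST (hypothetical helper; written for bash)
# Assumption: the payload is the longest 0x... hex literal in SRC.
hex_literal_to_bin() {
    local src=$1 dst=$2 hex
    hex=$(grep -oE '0x[0-9A-Fa-f]+' "$src" |
          awk 'length($0) > max { max = length($0); s = $0 }
               END { print substr(s, 3) }')          # strip the "0x"
    if [ -z "$hex" ] || [ $(( ${#hex} % 2 )) -ne 0 ]; then
        echo "$src: no usable hex literal" >&2
        return 1
    fi
    # Map each hex pair to a \0ooo octal escape, then let printf %b
    # write the raw bytes (NULs included) to DST.
    printf '%b' "$(printf '%s\n' "$hex" | awk '{
        digits = "0123456789abcdef"; s = tolower($0)
        for (i = 1; i < length(s); i += 2) {
            hi = index(digits, substr(s, i, 1)) - 1
            lo = index(digits, substr(s, i + 1, 1)) - 1
            printf "\\0%03o", hi * 16 + lo
        }
    }')" > "$dst"
}
```

Running md5sum on the output then gives a name that can be compared across extraction methods, which is exactly the kind of cross-check described next.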

As a sanity check, I also generated the attack_* files, as in the previous post in this series, and extracted the binaries from the MySQL commands in those files. The extractbins.sh script found eight different binary files (from the ‘autocommit’ attacks), whereas the commands in the attack_* files only yielded two different binary files. These latter two were the most frequently occurring of the eight binaries from extractbins.sh, with each of the remaining six only appearing once. Something was amiss.

s/long story/short story/ (to make a long story short): I found that the tcpick command wasn’t reconstructing the TCP data stream correctly when there were missing and/or retransmitted packets. Extracting the hex data from the MySQL commands saved from a ‘Follow TCP Stream’ window in Wireshark yielded a binary file identical to one of the binary files extracted from the attack_* files.

This taught me two things: firstly, tcpick can’t reliably get data from TCP connections with dropped or retransmitted packets; and secondly, neither can tshark, as the attack_* files corresponding to this connection were missing the hex data altogether.

I’m going to have to write something to reconstruct TCP streams, I think. That is a story for another time though.
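For the record, the core of such a tool is small. The following is a minimal sketch, not a finished reassembler. It assumes that one direction of one connection has already been dumped as lines of "relative sequence number, hex payload" (a hypothetical format; tshark’s tcp.seq and tcp.payload fields may be able to produce something like it, depending on the Wireshark version), and it ignores sequence number wraparound. Segments are ordered by sequence number, fully retransmitted segments are dropped, overlapping segments contribute only their new bytes, and gaps are reported rather than silently skipped. reassemble_hex is a hypothetical name.

```shell
# reassemble_hex (hypothetical helper)
# stdin:  "SEQ HEXPAYLOAD" lines for one direction of one TCP connection
# stdout: the reassembled payload as a single hex string
# stderr: a note for every gap (missing bytes are NOT filled in)
reassemble_hex() {
    sort -n -k1,1 | awk '
        NF < 2 { next }                       # pure ACKs carry no payload
        {
            seq = $1; data = $2; len = length(data) / 2
            if (!started) { expect = seq; started = 1 }
            if (seq > expect) {
                printf("gap: %d bytes missing at seq %d\n", seq - expect, expect) > "/dev/stderr"
                expect = seq
            }
            if (seq + len <= expect) next     # fully retransmitted segment
            # skip any leading bytes already seen (partial overlap)
            out = out substr(data, (expect - seq) * 2 + 1)
            expect = seq + len
        }
        END { print out }'
}
```

The hex string it prints can be fed to the same hex-to-binary conversion used for the MySQL commands.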

So, the two different binaries that I got out of those MySQL connections were:

cna12.dll (a922d55a873d4ad0bbbbbc8147a3a65a)
cna12.dll (8981d24d223d8996d97a80f41a5a468e)

The cna12.dll filename came from the following MySQL command:

set @dir2 = concat('select data from yongger2 into DUMPFILE "',@@plugin_dir,'\\cna12.dll"')'
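To check which DLL names turn up across all of the captured attacks, a quick tally is enough. A minimal sketch, assuming the MySQL commands in the attack_* files are stored as plain text; because the binaries themselves appear only as hex literals, strings inside them shouldn’t produce false matches. dll_names is a hypothetical helper name.

```shell
# dll_names FILE... (hypothetical helper)
# Count every "<name>.dll" string that appears in the given text files,
# most frequent first.
dll_names() {
    grep -ohE '[[:alnum:]_.-]+\.dll' "$@" | sort | uniq -c | sort -rn
}
```

Usage would be along the lines of `dll_names attack_*`.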

Quick Comparison

Quickly comparing the PE header information to get an idea of any similarity between these two files, we see:

$ objdump -x 8981d24d223d8996d97a80f41a5a468e > 8.hdr
$ objdump -x a922d55a873d4ad0bbbbbc8147a3a65a > a.hdr
$ diff 8.hdr a.hdr
2,3c2,3
< 8981d24d223d8996d97a80f41a5a468e:     file format pei-i386
< 8981d24d223d8996d97a80f41a5a468e
---
> a922d55a873d4ad0bbbbbc8147a3a65a:     file format pei-i386
> a922d55a873d4ad0bbbbbc8147a3a65a
6c6
< start address 0x10006730
---
> start address 0x10006620
15c15
< Time/Date             Tue May 22 01:46:20 2012
---
> Time/Date             Fri Jul 13 10:34:22 2012
22c22
< AddressOfEntryPoint   00006730
---
> AddressOfEntryPoint   00006620
97c97
< Time/Date stamp               4fba634c
---
> Time/Date stamp               4fff6d0e
111c111
<       [   1] +base[   2] 1350 Export RVA
---
>       [   1] +base[   2] 1330 Export RVA
123c123
<   1 UPX1          00000a00  10006000  10006000  00000400  2**2
---
>   1 UPX1          00000800  10006000  10006000  00000400  2**2
125c125
<   2 UPX2          00000200  10007000  10007000  00000e00  2**2
---
>   2 UPX2          00000200  10007000  10007000  00000c00  2**2

This tells us that there aren’t many differences beyond the filenames (which are just the MD5 hashes), the timestamps, the entry point address, one export RVA, the size of the UPX1 section and the file offset of the UPX2 section. Given this information, and the fact that both files were delivered by very similar-looking attacks, I’d say that a922d55a873d4ad0bbbbbc8147a3a65a, which has the later timestamp, could just be a later version of 8981d24d223d8996d97a80f41a5a468e. It will be interesting to see what differences we find during analysis.
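objdump’s Time/Date stamp comes from the TimeDateStamp field of the COFF file header, and it can also be read straight from the file, something the session below deliberately skips. In a PE file, the 4-byte value at offset 0x3C (e_lfanew) gives the offset of the "PE\0\0" signature, and TimeDateStamp sits 8 bytes past the start of that signature (after the signature, Machine and NumberOfSections). A minimal sketch, assuming a well-formed PE file and a little-endian host, since od reads multi-byte values in host byte order; pe_timestamp is a hypothetical helper name.

```shell
# pe_timestamp FILE (hypothetical helper)
# Print the COFF TimeDateStamp of a PE file as 8 lowercase hex digits.
# Assumes a little-endian host: od -t u4/x4 use host byte order.
pe_timestamp() {
    local f=$1 pe_off sig
    # e_lfanew, at offset 0x3C (60), holds the file offset of "PE\0\0"
    pe_off=$(od -An -t u4 -j 60 -N 4 "$f" | tr -d ' \n')
    sig=$(od -An -c -j "$pe_off" -N 2 "$f" | tr -d ' \n')
    if [ "$sig" != "PE" ]; then
        echo "$f: no PE signature at offset $pe_off" >&2
        return 1
    fi
    # COFF header: Machine (2 bytes), NumberOfSections (2), TimeDateStamp (4)
    od -An -t x4 -j $((pe_off + 8)) -N 4 "$f" | tr -d ' \n'
    echo
}
```

With GNU date, `date -d @$((0x$(pe_timestamp a922d55a873d4ad0bbbbbc8147a3a65a)))` should then print the same local time as objdump’s Time/Date line.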

Let’s locate that timestamp in the hex encoded version of the binary file, and see if we can notice a pattern with regard to when the attacks were using each of the two different binaries.
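The date and bc conversions, plus the byte reversal needed for a little-endian search, can be wrapped into one small function. A minimal sketch; epoch_hex is a hypothetical helper name, and printf’s %08X does the decimal-to-hex step that bc does in the session.

```shell
# epoch_hex SECONDS (hypothetical helper)
# Print a 32-bit value as big-endian hex and as the little-endian byte
# sequence it would occupy in a PE header.
epoch_hex() {
    local be le
    be=$(printf '%08X' "$1")
    # reverse the byte order: AABBCCDD -> DDCCBBAA
    le=$(printf '%s\n' "$be" | sed 's/\(..\)\(..\)\(..\)\(..\)/\4\3\2\1/')
    printf '%s %s\n' "$be" "$le"
}
```

For example, `epoch_hex 1337615180` prints `4FBA634C 4C63BA4F`. It can be combined with GNU date, as in `epoch_hex "$(date -d 'May 22 01:46:20 2012' '+%s')"`, which interprets the time in the local time zone.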

# convert the human readable timestamps to seconds since epoch
$ date -d 'May 22 01:46:20 2012' '+%s'
 1337615180
$ date -d 'Jul 13 10:34:22 2012' '+%s'
 1342139662

# convert the seconds since epoch values from decimal to hexadecimal
$ bc
 bc 1.06.95
 Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
 This is free software with ABSOLUTELY NO WARRANTY.
 For details type `warranty'.
 obase=16
 1337615180
 4FBA634C
 1342139662
 4FFF6D0E

# no real surprise that these match the hex Time/Date stamp values from
# objdump.
# as these are Intel (i386) binaries, and Windows was first developed
# on an Intel platform, those values are likely stored in little-endian
# byte order in the PE headers.
# it would be prudent to check for both little-endian and big-endian
# versions though, to increase confidence that it is actually the timestamp
# that we have found, since we're not bothering to locate the part of the
# PE file that holds the timestamp (I shall, however, do so in a later
# exercise).
$ grep 4C63BA4F attack_* | cut -d: -f1,2
attack_1352440662.781644000_555.0.14.6:1633
attack_1352507392.778021000_555.0.14.6:4846
attack_1352753753.003280000_555.55.16.4:42233
attack_1352838143.708208000_555.0.14.6:2969
attack_1353269681.904467000_555.0.14.6:2643
$ grep 4FBA634C attack_* | cut -d: -f1,2
$
$ grep 0E6DFF4F attack_* | cut -d: -f1,2
[ 55 lines of output ]
$ grep 4FFF6D0E attack_* | cut -d: -f1,2
$

# sanity check -- confirm that the 60 (55 + 5) files matched by grep
# equal the number of attack files that contain binary data
# (i.e. those larger than 34 bytes)
$ find attack_* -size +34c -print | wc -l
60

# look for a trend. I was expecting that the binary would have changed
# over time with the later timestamped one replacing the earlier
# timestamped one.
# we'll look for a trend by extracting the seconds since epoch timestamp
# and IP address from the filenames, and appending a tag of either 'E' for
# the binaries with the earlier timestamp, or 'L' for the binaries with the
# later timestamp.
# the first s command in the sed script removes the fractional seconds
# from the timestamp part of the filename.
$ grep -i "4C63BA4F" attack_* | cut -d: -f1| cut -d_ -f2,3 | sed 's/\.[0-9]*_/ /;s/$/ E/' > earlier
$ grep -i "0E6DFF4F" attack_* | cut -d: -f1| cut -d_ -f2,3 | sed 's/\.[0-9]*_/ /;s/$/ L/' > later

# now, if the trend is based on time, that is, one of the binaries
# replaced the other, then the output from the following command should
# have all the 'E' lines together, and all the 'L' lines together, as the
# output is sorted by timestamp
$ cat earlier later | sort
[ 60 lines of output ]

# as it turns out, the 'E' and 'L' lines are intermingled.
# let's sort on the second field (-k2), being the source IP address, and
# see if it is a different source address that is giving us the different
# binaries
$ cat earlier later | sort -k2
[ 60 lines of output ]

# that's better -- the 'E' and 'L' lines are grouped by IP address, and
# tell us that there are two particular IP addresses which are sending 
# the older timestamped binary, and that the other addresses are sending 
# the newer one.

# did any of the attacks originate from somewhere other than China?
$ cat earlier later | cut -d\  -f2 | sort | uniq | while read ipaddr; do
>     whois $ipaddr > ${ipaddr}.whois
>     sleep 1
> done
$ grep "^country:" *whois | cut -d: -f2- | sort | uniq -c
      2 country:        cn
     81 country:        CN

# seemingly not
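Eyeballing 60 sorted lines works, but the question of which hosts send which binary can also be answered directly from the earlier and later files. A minimal sketch, assuming the "timestamp IP tag" line format produced by the commands above; both helper names are hypothetical. per_source_summary counts attacks per (IP, tag) pair, and mixed_sources prints any IP that delivered both binaries, so empty output supports the conclusion that each host sticks to one version.

```shell
# per_source_summary / mixed_sources (hypothetical helpers)
# stdin: "EPOCH IP TAG" lines, as in the earlier and later files
per_source_summary() {
    # number of attacks for each (source IP, E/L tag) pair
    cut -d' ' -f2,3 | sort | uniq -c
}
mixed_sources() {
    # source IPs that delivered both the E and the L binary
    cut -d' ' -f2,3 | sort -u | cut -d' ' -f1 | sort | uniq -d
}
```

Usage would be `cat earlier later | mixed_sources`.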

Conclusions

These ‘autocommit’ MySQL attacks are delivering two different, but on initial inspection very similar, DLL files. Given the CREATE FUNCTION xpdl3 RETURNS STRING SONAME ‘cna12.dll’ MySQL command sent by the attacker, I’d say that cna12.dll most likely implements a MySQL UDF (User Defined Function) called xpdl3. As we shall see in the next post, this gives us some idea of the functions we can expect to find within each of the two cna12.dll files.

The two binaries are distributed by different hosts: two hosts always distribute the older timestamped version, while the other hosts always distribute the later timestamped version (determined by concatenating the earlier and later files generated by the commands above and sorting by IP address).

Join me for the next post where I do some static analysis of one of the DLL files, before poking around it with a debugger.

Malware Musings

Thoughts on malware and malware analysis