So there I was happily running mergecap in a script to merge the honeywall’s hourly pcap files together, when it aborted with an error and reported that the capture file ‘appears to be damaged or corrupt’. This problem means that I will be missing some packets in the merged capture file which will potentially hinder my analysis, so I decided to get out a hex editor and play with the capture file to see if I could see what was going on and at least recover the remaining packets from the corrupt file.
Despite not knowing the format of a libpcap capture file, I was able to intuitively work out what was going on. Granted, I couldn’t figure out why it had happened, but I could figure out enough to recover the remaining packets from the file.
The error that I received from mergecap was:
mergecap: Error reading 1351904462/log: The file appears to be damaged or corrupt. (pcap: File has 1922760704-byte packet, bigger than maximum of 65535)
You may recognise that directory name as a seconds since epoch timestamp, and you’d be correct. The Honeywall’s /etc/init.d/hw-pcap script, which runs at one minute past each hour, creates a directory under /var/log/pcap/ using the current seconds since epoch timestamp as the name. The log file therein is a pcap file containing an hour’s worth of captured traffic.
So, looking at that error message we see that mergecap was under the impression that there was a 1922760704 byte packet, which is ludicrous. Assuming that it was getting that information from header information in the pcap file itself, especially given the capture file wasn’t that large, I loaded the troubled pcap file in to bvi (binary vi — a hex editor) to have a look at it.
Next, I used bc (a calculator) to convert 1922760704 to hex so that I could search for it in the pcap file:
$ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
obase=16
1922760704
729B0000
The bc obase command sets the output base, in this case to 16 (hexadecimal). The values that we give it are still read in the default input base of 10 (decimal); note bc’s ibase command later. This tells me that 1922760704 is 0x729b0000 in hex (I prefer lower case hex digits in hex numbers, however bc only accepts upper case hex digits). I can now use ‘\’ in bvi to search for the hex value 0x729b in our pcap file (as opposed to ‘/’, which would search for text).
bvi doesn’t find the value 0x729b, but remember, values requiring more than one byte to represent them are stored in either big-endian or little-endian format. This means that 1922760704 could be stored as either 0x729b0000 or as 0x00009b72, so I’ll try a search for 0x9b72. Bingo. 0x9b72 appears at offset 0x1000. The two bytes before it are 0x0000, so that gives us 0x00009b72. Here is a screenshot of my bvi session on the off chance that this is more exciting than whatever you happen to be doing at the moment, in which case you can follow along.
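For anyone following along without bvi, here is a minimal Python sketch of the same idea: pack the value from the error message in both byte orders, then look for the interesting bytes in the file contents. The helper name find_all is made up for illustration, and the search assumes the whole capture file fits comfortably in memory.

```python
import struct

value = 1922760704  # the packet size reported by mergecap

big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print(f"{value:#010x}")  # 0x729b0000
print(big.hex(" "))      # 72 9b 00 00
print(little.hex(" "))   # 00 00 9b 72


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which pattern occurs in data."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets
```

Calling find_all(data, b"\x9b\x72") on the bytes of the capture file would list every place where those two bytes appear, which is roughly what repeating the ‘\’ search in bvi does.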
So I have found what could be the incorrect packet size in the corrupt pcap file, but how can I be sure? For all I know, that data could be in the middle of a packet. This is where knowledge of Ethernet frame headers comes in handy, along with the ability to recognise common values.
Ethernet frames look like this:
0x0000: Destination MAC address (6 bytes)
0x0006: Source MAC address (6 bytes)
0x000c: Ethertype/length (2 bytes)
0x000e: Payload data
Since all the Ethernet frames in these capture files are travelling between KVM virtual hosts using the default MAC addresses, all the MAC addresses start with the OUI value of 52:54:00 (those values are hex by the way). You may notice that those first two hex values are in the range of ASCII printable characters and that they correspond to ‘R’ and ‘T’ respectively. That is handy because it means that we (now that I have included a screenshot) can find the start of Ethernet frames by searching through the text on the right in bvi, for two ‘RT’ strings with four bytes (the 0x00 on the end of the OUI, and the remaining three bytes of the MAC address) between them. Go on — I’ll race you.
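The ‘RT’, four bytes, ‘RT’ search can also be sketched in Python with a regular expression over the raw bytes. This is only an illustration of the pattern described above: the function name candidate_frame_starts is hypothetical, and the pattern assumes both MAC addresses carry the 52:54:00 OUI.

```python
import re

# 0x52 0x54 are ASCII "R" and "T"; the OUI 52:54:00 is therefore b"RT\x00".
# The pattern is a destination MAC (OUI plus any three bytes) followed
# immediately by a source MAC starting with the same OUI. A lookahead is
# used so that overlapping candidates are all reported.
FRAME_START = re.compile(rb"(?=RT\x00.{3}RT\x00)", re.DOTALL)


def candidate_frame_starts(data: bytes) -> list[int]:
    """Return offsets that look like the start of an Ethernet frame."""
    return [match.start() for match in FRAME_START.finditer(data)]
```

A match is only a candidate. The same byte pattern could turn up inside a payload, so each hit still wants the sanity check against the Ethernet frame layout.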
Looking back to find the nearest couple of ‘RT’ strings before 0x1000, we see one at offset 0xfb4, and the second at 0xfba. This suggests that offset 0xfb4 is the start of an Ethernet frame, however, let’s use our knowledge of the Ethernet frame format to sanity check our findings and make sure that this at least looks like a valid Ethernet frame:
0x0fb4: Destination MAC address (52:54:00:0d:5e:c5)
0x0fba: Source MAC address (52:54:00:da:2c:4c)
0x0fc0: Ethertype/length (0x0806)
That looks good. The MAC addresses look sane, and the 0x0806 Ethertype value corresponds to ARP. Looks like we’ve found ourselves an ARP packet, which is good as that will explain the third ‘RT’ in between offset 0xfb4 and 0x1000 (ARP packets have MAC addresses in their payload).
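As a rough cross-check of that manual decoding, the first fourteen bytes of a frame can be unpacked in Python. The function name parse_ethernet_header is made up for this sketch, and the sample bytes are simply the values read off above rather than a copy of the file.

```python
import struct


def parse_ethernet_header(frame: bytes) -> tuple[str, str, int]:
    """Split the first 14 bytes into destination MAC, source MAC, Ethertype."""
    # "!" means network (big-endian) byte order, which Ethernet headers use.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex(":"), src.hex(":"), ethertype


# The header bytes as read from offsets 0xfb4 to 0xfc1 in the text.
frame = bytes.fromhex("5254000d5ec5" "525400da2c4c" "0806")
dst, src, ethertype = parse_ethernet_header(frame)
print(dst, src, hex(ethertype))  # 52:54:00:0d:5e:c5 52:54:00:da:2c:4c 0x806
```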
Since packets displayed in wireshark/tcpdump have a timestamp associated with them, I think it is probably safe to say that a timestamp value is stored along with each packet. Hence we can expect some sort of header before each frame in a pcap file. I’ll refer to this header as the pcap frame header to distinguish it from the Ethernet frame header. Let’s see if we can figure this header out.
Ok, so it looks like we’ve found an ARP packet starting at offset 0xfb4, and the packet before it looks like a syslog message that ends at the end of the syslog message text (I’m assuming that the Ethernet frame wasn’t padded, and that the pcap file format doesn’t add any trailing data), being offset 0xfa3. That gives us sixteen bytes (between 0xfa4 and 0xfb3 inclusive) of unknown data to account for.
Let’s start with the last eight bytes, as that looks like the value 0x0000003c (values are little-endian remember) repeated. This is a reasonably small value that could be something like the size of the packet. Converting 0x3c to decimal we get (3 * 16) + 12 == 60. Is it coincidence that 60 is also the smallest allowable frame size on an Ethernet network (64 bytes less the 4-byte frame check sequence, which captures usually don’t include)? Why does it appear twice?
If you’ve used wireshark to look at packets in a pcap file, you may have noticed that it mentions the size twice. Once to indicate the size of the packet on the network, and once to mention how many bytes it actually captured. This could explain why 0x0000003c occurs twice.
So what of the eight bytes between 0xfa4 and 0xfab? Remember there is a timestamp associated with each packet. Let’s see what the 4-byte (little-endian) value at 0xfa4 looks like when converted to decimal:
$ bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
ibase=16
50947142
1351905602
In this case, the ibase command sets the input base to 16 (hexadecimal), to say that any values that we enter are hexadecimal values. Any values that it outputs though will still be using the default base of 10 (decimal). That value of 1351905602 looks remarkably like a seconds since epoch timestamp. The 0x00056E62 value following it at 0xfa8 however, is 355938 in decimal and doesn’t look so familiar.
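The sixteen unknown bytes can be decoded in one go with Python’s struct module, as a sketch of the layout guessed at so far: four little-endian 32-bit values. The field names are my own labels, and the sample bytes are reconstructed from the values quoted in the text (assuming little-endian storage), not copied from the file.

```python
import struct


def decode_record_header(raw: bytes) -> tuple[int, int, int, int]:
    """Decode 16 bytes as four little-endian unsigned 32-bit integers."""
    ts_sec, ts_frac, length_a, length_b = struct.unpack("<IIII", raw[:16])
    return ts_sec, ts_frac, length_a, length_b


# 0x50947142, 0x00056e62, 0x0000003c, 0x0000003c written least significant
# byte first, as they would appear between offsets 0xfa4 and 0xfb3.
raw = bytes.fromhex("42719450" "626e0500" "3c000000" "3c000000")
print(decode_record_header(raw))  # (1351905602, 355938, 60, 60)
```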
Let’s load the capture file in to wireshark and see how we went. The screenshot shows part of the wireshark window. The display stops at the point of corruption, and highlights packet #38 (which looks corrupt) to display its contents in the bottom two panes.
Firstly, notice that the packet before it, #37, is a 60 byte ARP packet. If we use the date command (I think only GNU’s date supports the ‘@<seconds>’ format) to convert our seconds since epoch looking value to a time, thusly:
$ date -d '@1351905602'
Saturday 3 November 12:20:02 EST 2012
we see that it matches that displayed by wireshark as the timestamp for packet #37 — the 60 byte ARP packet. What of the 355938 value which we didn’t recognise? wireshark has appended it to the timestamp as a sub-second measurement of time.
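The same conversion can be sketched in Python, folding in the second value as a sub-second part. Treating that value as microseconds is an assumption on my part (it fits a six-digit value like 355938), and record_time is a hypothetical helper name. The result is printed in UTC to keep the example independent of the local time zone.

```python
from datetime import datetime, timedelta, timezone


def record_time(ts_sec: int, ts_usec: int) -> datetime:
    """Combine whole seconds since the epoch with a microsecond part."""
    whole = datetime.fromtimestamp(ts_sec, tz=timezone.utc)
    return whole + timedelta(microseconds=ts_usec)


print(record_time(1351905602, 355938).isoformat())
# 2012-11-03T01:20:02.355938+00:00
```

01:20:02 UTC is the same instant as the 12:20:02 shown by date above for a local zone eleven hours ahead of UTC.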
So, what’s gone wrong? Let’s go back to our hex editor to try to see why the pcap file is supposedly corrupt. We’ll start by looking at the pcap frame header of the last known good packet, that is the 60 byte ARP packet that we identified, at offset 0xfa4.
Skipping over the two four byte timestamp values we get to the two length values of 0x0000003c. Length values in headers sometimes include the length of the header, and sometimes don’t. In this case, since it is 60 and 60 bytes is a likely frame size (minimum Ethernet frame size), not to mention the size of the frame as reported by wireshark, we’ll conclude that the length specified in the pcap frame header is that of the captured frame and doesn’t include the length of the pcap frame headers.
Adding 0x3c on to 0xfb4 (the start of the Ethernet frame data), we get 0xff0 (it still takes me a while, and usually involves counting on my fingers, to do maths in hex).
Jumping down to offset 0xff0 we can expect to see, from what we’ve learnt so far, two 4-byte timestamps, two 4-byte length values, then the start of an Ethernet frame. We certainly see what looks like two 4-byte timestamps and two 4-byte length fields. However, at offset 0x1000, where the Ethernet frame data supposedly starts, we see 0x9b729450 (little-endian), which is 0x5094729b after converting it to big-endian. That looks more like a seconds since epoch timestamp, a few minutes later than the 0x50947142 value that we were converting earlier, than it does a MAC address at the start of an Ethernet frame.
In fact, it looks like offset 0x1000 is the start of another pcap frame header with two timestamp values and two length values, then the start of an Ethernet frame at offset 0x1010. wireshark has gone ahead with the notion that there is an Ethernet frame at offset 0x1000 and attempted to decode it. You can see this in the wireshark screenshot fragment, above, with what looks like the second pcap frame header down in the hex dump of the frame. Hence the decoded data in the top pane is not recognisable (Ethertype of 0x3600).
If we go along with the information in the pcap frame header and say that there is a 60 byte Ethernet frame starting at offset 0x1000, that means the next pcap frame header will be at 0x1000 + 0x3c == 0x103c. Jumping down to 0x103c and decoding the data as the next pcap frame header, we get a timestamp value of 0x02500000 (big-endian), which is 38797312 in decimal and corresponds to a time of ‘Friday 26 March 11:01:52 EST 1971’.
Moving along, we skip the sub-second timestamp and get to the length value which is 0x729b0000 (big-endian), which is 1922760704 in decimal and the same value as in the mergecap error message. That explains the corruption.
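To see how a single stray header produces that exact number, here is a small Python sketch that walks record headers the way a reader of the file presumably does, run against synthetic data. Nothing here is mergecap’s actual code: first_bad_record and record are made-up helpers, the frames are zero-filled, and the 65535 limit is taken from the error message.

```python
import struct

RECORD_HEADER = struct.Struct("<IIII")  # two timestamp fields, two length fields
MAX_LEN = 65535                         # limit quoted in the mergecap error


def first_bad_record(data: bytes, start: int = 0):
    """Walk record headers from start; return (offset, length) of the first
    header whose length exceeds MAX_LEN, or None if none is found."""
    offset = start
    while offset + RECORD_HEADER.size <= len(data):
        _sec, _frac, length, _length2 = RECORD_HEADER.unpack_from(data, offset)
        if length > MAX_LEN:
            return offset, length
        offset += RECORD_HEADER.size + length
    return None


def record(sec: int, length: int, with_frame: bool = True) -> bytes:
    """Build a synthetic record: a header plus a zero-filled frame."""
    header = RECORD_HEADER.pack(sec, 0, length, length)
    return header + (bytes(length) if with_frame else b"")


# Synthetic data only: a good 60-byte record, a stray header with no frame
# behind it, then two 54-byte records.
data = (
    record(0x50947142, 60)
    + record(0x50947142, 60, with_frame=False)
    + record(0x5094729B, 54)
    + record(0x5094729B, 54)
)

print(first_bad_record(data))  # (152, 1922760704)
```

The walker trusts the stray header’s length of 60, lands six bytes short of the next real header, and reads two trailing frame bytes plus the first two timestamp bytes (0x9b 0x72) as a length. On a real capture, start would need to skip the file’s own header, which this post doesn’t cover (24 bytes in the classic pcap format, as far as I know).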
What we are going to do now is to remove one of those two consecutive pcap frame headers. That should make the pcap file valid again and allow us to process the remaining packets in it.
The question now, is which of the two consecutive headers should we remove? Given the first one specifies a length of 60 bytes, and the second one specifies a length of 54 bytes, I would say that the first one is more likely correct. The length of the smallest transmittable Ethernet frame is 60 bytes. This is to make sure that the transmitting station is transmitting for long enough to allow all stations on the network to detect a collision (back in the days when an Ethernet network consisted of multiple stations sharing a run of coaxial cable). The length of 54 could be from a runt which has otherwise been ignored/dropped.
Having said that, when writing to the file the pcap frame header is more likely written at the same time as the frame itself. This would suggest that the second header is the most likely correct header of the two. Plus the pcap files show a few packets that are only 54 bytes in length (maybe network cards aren’t worrying about the 60 byte minimum size now that they are using point-to-point UTP cabling to another Ethernet device, usually a switch, instead of a shared coaxial cable).
Let’s use what we’ve learnt so far and search for the start of the Ethernet frame after the one that actually starts at 0x1010. We find ‘RT’, four bytes, ‘RT’ at offset 0x1056.
Working back over the two 4-byte length values and the two 4-byte timestamp values, we find the start of the pcap frame header at offset 0x1046. Since we know that a pcap frame header starts immediately after the end of the data from the previous frame (we found the timestamp values immediately following the end of the syslog message text), we can calculate the length of the frame at 0x1010 as 0x1046 - 0x1010 == 0x36 bytes. Therefore we need to keep the second of the two consecutive pcap frame headers and remove the sixteen bytes from 0xff0 to 0xfff inclusive (being the first pcap frame header with a length value of 0x3c).
Before we change anything though, let’s take a backup copy of the pcap file so that we have the original to refer to, and evidence that this happened. After taking a copy, open the copy and scroll down to offset 0xff0. In order to remove bytes (or otherwise alter the size of the file) in bvi, you need to issue the :set memmove command. Make sure that your text cursor is over the byte at offset 0xff0 (either on the hex side or the text side of the display). You can remove 16 bytes in bvi the same way that you would remove 16 characters in vi, by typing 16x. The 16 is a repetition count (we want to repeat the following command 16 times), and the x is the command, in this case delete character/byte.
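If bvi isn’t to hand, the same surgery can be sketched in a few lines of Python. This is an alternative to the bvi steps rather than what I actually did: remove_bytes and repair_copy are hypothetical helper names, and the file names in the usage comment are placeholders. The original file is only ever read, so it doubles as the backup.

```python
from pathlib import Path


def remove_bytes(data: bytes, offset: int, count: int) -> bytes:
    """Return data with count bytes removed, starting at offset."""
    if offset < 0 or count < 0 or offset + count > len(data):
        raise ValueError("range to remove falls outside the data")
    return data[:offset] + data[offset + count:]


def repair_copy(original: str, repaired: str, offset: int, count: int) -> None:
    """Write a repaired copy of original, leaving original untouched."""
    data = Path(original).read_bytes()
    Path(repaired).write_bytes(remove_bytes(data, offset, count))


# Hypothetical usage, dropping the sixteen bytes from 0xff0 to 0xfff:
# repair_copy("log", "log.repaired", 0xFF0, 16)
```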
Before we save the file, let’s just have a look to make sure that it looks sane. We have the pcap frame header at offset 0xff0 which specifies that the frame is 0x36 (54) bytes long. Adding 0x36 on to 0x1000 (the start of the frame data) we get 0x1036.
At 0x1036 we have 0x5094729b (big-endian) which looks like a timestamp value from the start of the next pcap frame header. Adding 16 on to skip over the pcap frame header (two 4-byte timestamp values and two 4-byte length values) we see what looks like the start of another Ethernet frame. I think that our work here is done, so let’s save it using the :w command, exit bvi using the :q command, load the pcap file in to wireshark, then never speak of this again.
Brilliant. The capture file now loads and displays packets all the way down to #157, captured at 13:00:58. Given that the /etc/init.d/hw-pcap script restarts the tcpdump process at one minute past every hour, and that wireshark didn’t complain when loading it, I’d say that it now looks like a complete packet capture. The fact that the corruption occurred would suggest that there is at least one packet missing, but that is better than missing forty minutes of packets from the point of corruption onwards.
Given the timestamp in the deleted pcap header was 12:20:02, it was 60 bytes long, and packet #37 is an ARP request which goes unanswered, I’d say that there is a reasonable chance that our missing packet is the ARP reply to the ARP request in packet #37. We can’t say for certain though. Nor can we be certain that only one packet is missing.
Now I can go back to my mergecap script and get on with the MySQL attack analysis that I was going to use these pcap files for this morning. As for what caused the corruption, I can’t say. It looks like tcpdump (or something) was possibly interrupted in the middle of writing a frame to the file (the 60 byte one whose header we ended up deleting) because either it, or something else, wanted to write another frame. Either that, or it wrote the pcap frame header and then received an error when requesting the Ethernet frame data.
If it was a thread synchronisation issue, then the first thread should have continued writing the frame at some point, in which case we’d expect to see corruption later in the file. I’m wondering if a write() call to write the missing frame’s data to the file took so long (due to my disk lying about its physical block size and hence causing the operating system to issue inefficient block writes) that the buffer containing the most recently loaded frame got overwritten with the next frame causing an error from the libpcap call to fetch the frame data. I know libpcap programs use a loop to fetch captured packets, but it has been a while since I’ve done any libpcap/WinPcap programming so I can’t remember much more than that.
I’m digressing… On with the mergecap script and the next blog post in the MySQL attack analysis series.