Extracting Data From MS-SQL TDS Streams in a libpcap File

Sometimes in life you find yourself wanting to have a quiet afternoon in front of a computer, extracting login credentials and SQL commands from captured MS-SQL TCP connections. Other times you may find yourself needing to do so to analyse some MS-SQL attacks. Whatever your reason, this post explains how to use my parsetds.py script to extract such information.

My parsetds.py script will only process and parse TDS (Tabular Data Stream) data, it won’t process and parse TCP/IP packets. So to start off, we need to extract the MS-SQL application data from the network captures. To do that, I used tcpflow.

$ tcpflow -r  'tcp dst port 1433'

I used tcpflow because, as its man(ual) page puts it, ‘tcpflow understands TCP sequence numbers and will correctly reconstruct data streams regardless of retransmissions or out-of-order delivery’.

However, I noticed a problem with tcpflow. While it understands TCP sequence numbers, it either ignores the start/end of connections, or simply reuses the same output file for different connections using the same ports.

A TCP connection is uniquely identified by its IP address and TCP port pairs, however, this is only true for connections that are active at the same time. That is, you can’t have two different active TCP connections with the same IP address and TCP port pairs. You can, however, have a TCP connection close, and a later TCP connection use the same IP address and TCP port pairs as the (now) closed connection.

The TCP has what is known as the TIME_WAIT state, which is a state in which the host initiating the connection waits for a given amount of time, to make sure that there aren’t any TCP segments belonging to this connection still in existence on the network (which can happen with retransmitted packets, for instance).

It could be that tcpflow is aware of this, and that the problem is with TCP connections reusing ports before the TIME_WAIT time is up (I have noticed Wireshark flagging reused TCP ports). Unfortunately for us, and the point to me bringing this up (I got to it eventually), is that if TCP ports are reused within a capture file, it can cause tcpflow to create huge output files, and suggests that there could be cases where tcpflow may corrupt (overwrite) data from earlier connections with data from later connections.

Presumably, the large files are created because of the large difference in TCP sequence numbers between the first and later connections. tcpflow believes that the packets in between were lost, and fills the void with 0 bytes. This would be the case, if the connection wasn’t closed and opened again.

Backing this theory is the fact that the size of the file ends up being the difference between the first TCP sequence number of the first connection and the last TCP sequence number of the second connection, plus the size of the data payload in the last TCP data packet. .

This reuse of the TCP ports could be legitimate TCP behaviour, if the sequence numbers of the latter connections are larger than those of the former connections. Either way, I’m not going to dwell on this much more, and am only mentioning it because it causes some problems when it comes to parsing tcpflow‘s output files, as you’re about to see.

Note that the file globbing pattern of ‘*01433’ is because I’m only interested in TDS messages from the client to the server, and given that I only implemented what I needed and haven’t included support for TDS message type 4 (being the server response messages), there isn’t much point using a different glob pattern anyway.

$ for f in *01433; do
  echo "$f"
  python parsetds.py "$f" > "$f.sql"
done

My original version of parsetds.py would hang on the large problem files output by tcpflow. The reason being, TDS data streams consist of a series of messages. Each message has a length value used to indicate its size. parsetds.py was adding the length value to the current offset in the stream, to find the next message. When tcpflow filled what it thought was a gap caused by missing TCP packets, with 0 bytes, the length value at what would be the start of the next TDS message was set to 0. parsetds.py got stuck in a loop continually adding this length of 0 on to the current offset and as a result, never reached the end of the TDS stream, and hence never exited the loop.

As a result of this, you’ll notice a part of the script toward the bottom (lines 171-175) which checks to see if the packet length and packet type are both 0, and if so it crawls (as it turns out, slowly) through the file looking for the start of another TDS message.

parsetds.py currently recognises two types of TDS message, a TDS7 login message and an SQLBatch message. It will output lines of one of three types:

TDS7 Login

These lines show various values from a TDS7 login message and start with the string Login7:.

Login7:SERVER|sa|\ua592\ua5b3\ua592\ua5b3\ua592\ua5b3|OSQL-32|555.12.34.56|ODBC|

The fields are separated by a ‘|’ (pipe) character, and are as follows:

Client name
User name
Password
Application name
Server name (obfuscated IP address in the example)
Library name
Database name (blank in the example)

The above example is someone attempting to login to the ‘sa’ account using a password consisting of Unicode (each character starts with ‘\u’) characters.

SQLBatch

These lines show SQL commands from SQLBatch messages. They are SQL commands which parsetds.py prefixes with the string SQLBatch:.

SQLBatch:sp_configure 'show advanced options', 1;

Warning

These lines indicate conditions that parsetds.py detected, rather than data from the TDS connection. There are two warnings:

Warning:Found tcpflow reused ports problem. Searching file for next TDS message…
This warning indicates that parsetds.py has found a TDS message with both length and type fields of 0. This suggests that its input file contains TCP stream data from at least two TCP connections between the same two hosts, using the same pair of TCP ports. It is notifying you that it has detected the tcpflow problem described above.
Warning:Resuming
This warning is notifying you that parsetds.py, after finding a TDS message with length and type fields of 0, has searched through the rest of its input file, found another TDS message, and that it is resuming parsing.

So there you have it. The script itself is pretty simple in that it is just fetching data from particular offsets in the stream, and printing it. The two consecutive while loops at lines 173 and 174, are my attempt to speed up the search for the next TDS message. The first loop searches eight bytes at a time until it comes across a block of eight bytes that isn’t all zero, and then it searches one byte at a time. In practice though, this didn’t seem to offer much of a speed improvement.

See my next post for a practical example of parsetds.py being used to analyse some MS-SQL attacks.

Malware Musings

Thoughts on malware and malware analysis