Remember my parsetds.py script to extract data from MS-SQL TDS streams? Well, here is a bit of an introduction to the Bro Network Security Monitoring software, in which I re-implement my parsetds.py functionality using Bro's scripting language.
If you find monitoring networks for security-related artefacts interesting, or it's your job, then read on as I demonstrate some of Bro's scripting flexibility by using it to extract MS-SQL commands and login information from network traffic.
I attended a conference in late 2013 which demonstrated the use of Bro to extract information, and files, from network traffic. This whetted my appetite, so I installed Bro at home and started playing with it.
On an unrelated note, it is also the 6th of March, which is the day (Michelangelo's birthday) that the Michelangelo virus would reformat your hard disk.
Something that became evident pretty early on is that Bro would be ideal to take over the role of a number of things I had put in place, including my parsetds.py script, which extracts login information and MS-SQL commands (and more, but I don't want to kill the suspense) from the MS-SQL attacks that my honeynet has been receiving.
A bit of background information (a.k.a. how I was doing this)
I was running tcpflow to capture network traffic destined for the MS-SQL server (actually Dionaea emulating an MS-SQL server). tcpflow was writing the TCP stream data to individual files, with each file name containing the source and destination addresses and ports of the connection.
Next, I used Suricata to monitor MS-SQL connections for the string 0x4D5A, which you may recognise as the ASCII codes for 'MZ', the signature at the start of a Windows PE executable file. These strings were arriving in a series of MS-SQL attacks hitting my honeynet.
Following this, I ran logsurfer, and configured it to watch for the alerts that I’d configured Suricata to generate when it saw the 0x4D5A string in the inbound MS-SQL connections. When logsurfer saw such alerts, I got it to run my runparsetds.sh script.
I then wrote a script, runparsetds.sh, which took the IP address and port information from the Suricata log entry (matched and passed to my script by logsurfer), mangled it with awk to create the file name that tcpflow used to store that TCP stream, and then ran my parsetds.py script on the tcpflow file.
runparsetds.sh then took the output from my parsetds.py script, looked for the line containing the 0x4D5A string, mangled it with sed to leave just the hex digits, and passed that to another awk script which converted the hex digits back to binary characters and spat them out to a file.
With me so far?! Basically I was using tcpflow, Suricata, logsurfer, a custom Bourne shell script, sed, awk, and a custom Python script, to extract a hex encoded Windows PE file from commands in MS-SQL attacks.
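The final hex-to-binary stage of that pipeline can be sketched in a few lines of Python (a hypothetical stand-in for the sed/awk steps, not the actual runparsetds.sh code):

```python
import binascii

def hex_line_to_binary(line):
    """Strip an optional 0x prefix and decode the remaining hex digits to bytes."""
    hexdigits = line.strip()
    if hexdigits.lower().startswith("0x"):
        hexdigits = hexdigits[2:]
    return binascii.unhexlify(hexdigits)

# A hex-encoded PE file starts with the MZ signature (0x4D5A).
print(hex_line_to_binary("0x4D5A90000300"))  # b'MZ\x90\x00\x03\x00'
```

The decoded bytes would then be written out to a file for later analysis.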
Why am I telling you all this? So that you can appreciate how much easier Bro made it (or how difficult/over engineered I made the problem).
The Bro Solution
One script, my parsetds.bro.
To start with, I tried to learn, understand, and write a protocol analyser for Bro, given that there wasn’t one for TDS/MS-SQL connections. This was taking a while because whenever I searched for something to try and get information, I kept getting links to the source code rather than to documentation. That and the fact that I have to go to work.
So, determined to regain my youthful inquisitiveness and produce something which I, at least, thought was pretty funky (not to mention useful), I set about placing fprintf(stderr,…) statements throughout the Bro source code, especially the automatically generated C++ code, and print statements throughout the Bro scripts, to figure out what was going on.
I eventually got enough of an analyser working to give me the SQL commands from an MS-SQL connection, and I was about to blog about it when, in order to document how I did it, I went to restore the original source code and realised that version 2.2 had been released. Figuring that I should use the latest version, I downloaded version 2.2 and found out that they’d re-jigged the protocol analyser code.
So, I set about modifying my code to work with version 2.2. I was almost there, but it was taking me a while, and I figured that I could probably implement my parsetds.py functionality using Bro's scripting language, given that it has a means of passing just the TCP stream data to scripts. Plus, a Bro script would be easier for people to integrate into existing Bro installations, as it won't require patching and recompiling the Bro source code.
Getting to the point now, let me talk you through my Bro script.
How parsetds.bro works
parsetds.bro is pretty similar to my parsetds.py script; some of the lines are almost the same. However, it does have to do a little bit of TCP work, because although Bro delivers the raw TCP stream data to scripts, it does so in chunks of an indeterminate size (which makes sense, as it is capable of monitoring live interfaces and has limited buffer space).
To get around this problem, we create a data structure (record) which is used to buffer the TCP data until a complete TDS packet has arrived. This data structure is defined at lines 84 – 88, and a corresponding variable is declared and initialised at line 102. The pktlen and pktoff variables in the structure are used to keep track of where the next-to-be-processed TDS packet starts in the buffer, and also the length of the TDS packet (since it isn’t, and can’t be without Bro being TDS aware, specified in each chunk of data presented to the script).
A similar problem exists in that an SQL Batch request can span multiple TDS packets. This is handled by a second buffer, sqlbatch, declared and initialised at line 107. An SQL Batch request always ends at the end of a TDS packet, each TDS packet is buffered (by the script itself) and presented as a complete packet, and the last TDS packet in a request is marked by a flag in the TDS header (bit 0, End of Message). Consequently, we don't need to keep track of where we are up to, nor of how long the request is, so we can get away with using a simple string to buffer the SQL Batch request. I should, however, be keeping track of how long my sentences and paragraphs are getting.
Now for a special Bro variable. This variable is defined in $BRO_HOME/share/bro/base/init-bare.bro as:
const tcp_content_delivery_ports_orig: table[port] of bool = {} &redef;
A Bro table is essentially an associative array, in this case indexed by a special data type of port. The contents of each table/array element are Boolean values, where a value of T (True) indicates that TCP content should be delivered for that port, and a value of F (False), or the lack of a table entry for a port, indicates that TCP content shouldn’t be delivered.
The &redef keyword indicates that the value of tcp_content_delivery_ports_orig can be modified by other scripts. The const declares the symbol as a constant (since it can't be variable if it is constant), which means that its value can only be modified before runtime. parsetds.bro seizes its chance and sets an entry for port 1433/tcp to T, which causes Bro to call the tcp_contents() event handler with data from MS-SQL TCP streams (payloads). This gives parsetds.bro the same data that tcpflow was giving to parsetds.py, albeit in small chunks rather than in one file.
We’ll come back to the extract_varlen_fields() function. It is used to extract variable length fields from a TDS7 Login packet.
Here's the exciting bit: the Bro event handlers. Bro's protocol analysers, and other internal code, generate 'events' when certain things happen. For instance, the HTTP protocol analyser generates an http_request event when it parses an HTTP request, and similarly an http_reply event when it parses an HTTP reply.
This is what I am aiming for: a TDS protocol analyser which generates events when it parses each type of TDS packet (and a TDS packet itself). Until then, though, we can do it with a Bro script. I also want to develop a MySQL protocol analyser, but after reading the specification for the MySQL protocol and realising how many conditions need checking just to parse it, I quickly realised that an MS-SQL/TDS parser would be an easier place to start, especially given that I had no idea how to write a protocol analyser for Bro.
Continuing on then, the Event handlers section contains event handlers to handle some interesting Bro events. Event handlers are pretty much like functions, only there can be multiple event handlers for the same event, with differing priorities. Event handler definitions start with the event keyword.
The bro_init event handler is called, funnily enough, when Bro initialises itself. parsetds.bro uses the bro_init handler to create log streams (lines 185 – 189), which allow it to log information.
The ‘TDS::’ preceding the log ‘tag’ is our module name (which is defined back at line 19). The $columns parameter specifies the columns that are to appear in each log file, and it is passed the record type that will be used to store the information to be logged. These record types are defined in lines 40 – 74.
Now, for the other event handlers, we'll skip to the bottom and work up, as I'm used to C, where functions have to be declared before they can be referenced.
tcp_contents()
Apart from the bro_init event handler, the script first gets control when it is time to process some TCP data. This happens when the tcp_contents event handler is called, with the following arguments:
c: a connection record containing information about the connection, such as the IP addresses and ports.
is_orig: indicates the direction of the data (T means from the connection originator to the responder, F means to the connection originator).
seq: the TCP sequence number.
contents: the bytes of the TCP payload data.
tcp_contents first checks (at line 338) that the data is from the client (connection originator) to the server (is_orig == T), and that the TCP connection is going to port 1433/tcp (resp_p being the port on the connection responder, i.e. the server). This is necessary because if other scripts have requested the TCP contents for other ports, then we will also see that content, so we need to check that this call is for content that we want.
If the TCP data belongs to a port 1433/tcp connection, then it is added to our TDS packet buffer, tdspktbuffer$pkt. The '$' is the operator used to reference elements of a record, in a similar fashion to the '.' in C (and, I believe, in Pascal too). Consequently, the c$id$resp_p construct used to check the destination port means the resp_p element of the id record, which is an element of the c record, which is the connection argument passed to the event handler.
Next (lines 342 – 345), we add the contents of the TCP stream to the packet data in our TDS packet buffer, tdspktbuffer. The block of code from lines 351 – 371 determines the length of the new TDS packet. The bytestring_to_count() function is an inbuilt Bro function that treats its argument as an unsigned int (the Bro count type) and returns the result. It is similar to the Python ord() function, which you can see being used for the same purpose in my parsetds.py script.
To determine the length of the new TDS packet, the script checks to see whether we have just finished processing a TDS packet (or if this is our first TDS packet) and hence we are waiting for the header of a new TDS packet (line 351). This is indicated by the pktlen element of our tdspktbuffer record (structure) being 0. If pktlen is 0, then check to see if we have received at least 8 bytes — a complete TDS packet header — of the new packet. Technically, to calculate the length, we only need the first five bytes of the header.
Given that the length field is at a different location in a non-TLS TDS packet header than it is in a TLS TDS packet, we need to check the packet type (line 361). This is also why we need to read at least five bytes of the header to determine the packet length — a non-TLS TDS packet has the length at offset 2 – 3, but a TLS TDS packet has the length at offset 3 – 4.
The Length field in the TLS TDS packet doesn’t include the length of the header itself, which is why five is added to it (line 366). Once the length is calculated, it is stored in the tdspktbuffer record (structure).
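That length calculation can be sketched in Python as follows (the field offsets follow the description above; treating the TLS record content types 0x14 – 0x17 as the TLS case is my assumption, not necessarily the exact check the script performs):

```python
import struct

def tds_packet_length(header):
    """Return the total packet length from the first five header bytes."""
    if 0x14 <= header[0] <= 0x17:
        # TLS record: big-endian length at offset 3 - 4, which excludes
        # the five-byte record header itself, so add 5.
        return struct.unpack(">H", header[3:5])[0] + 5
    # Plain TDS packet: big-endian total length (header included) at
    # offset 2 - 3.
    return struct.unpack(">H", header[2:4])[0]

print(tds_packet_length(bytes([0x01, 0x01, 0x00, 0x2A, 0x00])))  # 42
print(tds_packet_length(bytes([0x16, 0x03, 0x01, 0x00, 0x10])))  # 21
```

Either way, only the first five bytes of the buffered data are needed to decide how many bytes make up the packet.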
The block of code from lines 376 – 414 is largely buffer management. It first checks to see if we have received the complete TDS packet (line 376), and if so, creates a tds_packet event (line 386) containing the connection information (passed to the tcp_contents event handler) and the complete TDS packet (including header). This causes Bro to call the tds_packet event handler.
After generating a tds_packet event, the script removes the TDS packet from the buffer (lines 395 – 397), leaving any trailing bytes as they will belong to the next TDS packet. If there aren’t any extra bytes, then it just empties the buffer (line 405).
Writing about what my script is doing is actually making me think about it again, and I've just thought of something which I obviously didn't think about when I wrote it. parsetds.bro doesn't call tds_packet, but rather raises a tds_packet event, which then leaves it up to Bro's event handling code to call tds_packet.
Bro will also call any tds_packet event handlers in any of the loaded scripts. These calls may be done asynchronously (as a background task to the parsetds.bro script), which is making me wonder whether it is actually safe to remove the current TDS packet from the buffer immediately after raising the event. This will depend on whether a copy of the data, or a reference to the data, is passed to the event handlers. That’s probably a question for the Bro developers.
The final part of tcp_contents (lines 412 – 413) sets the pktoff and pktlen elements of our tdspktbuffer to 0, to indicate that the next TDS packet starts at offset 0 of our TDS packet buffer (tdspktbuffer$pkt), and that we haven’t yet extracted its length from its header. We know this because the assignment statement at line 397 copies any remaining data to the start of the buffer.
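Putting the tcp_contents buffering together, here is a minimal, self-contained Python sketch of the same reassembly logic (the names are mine, and only plain, non-TLS TDS packets are handled):

```python
import struct

class TDSReassembler:
    """Minimal sketch of the tdspktbuffer logic (non-TLS packets only)."""

    def __init__(self):
        self.buf = b""     # buffered stream data (tdspktbuffer$pkt)
        self.pktlen = 0    # length of the packet being assembled; 0 = unknown

    def feed(self, chunk):
        """Buffer one TCP chunk and yield each complete TDS packet."""
        self.buf += chunk
        while True:
            # We need a full 8-byte header before we can learn the length.
            if self.pktlen == 0:
                if len(self.buf) < 8:
                    return
                self.pktlen = struct.unpack(">H", self.buf[2:4])[0]
            if len(self.buf) < self.pktlen:
                return
            # Hand on the complete packet; any trailing bytes belong to
            # the next TDS packet, so keep them in the buffer.
            yield self.buf[:self.pktlen]
            self.buf = self.buf[self.pktlen:]
            self.pktlen = 0
```

Feeding a 10-byte packet in two chunks, for example, yields nothing on the first chunk and the whole packet once the second chunk arrives, with any surplus bytes retained for the next packet.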
tds_packet()
There are two tds_packet event handlers in parsetds.bro, each with a different priority. The first one (lines 231 – 236) has a lower priority (-5), and is consequently called after the second tds_packet handler with a higher priority (0). The lower priority handler simply uses the Log::write() function to log the packet data to a Bro log file.
The second tds_packet event handler (lines 242 – 324) is the one that actually parses the TDS packet. It currently ignores TLS packet types, and processes SQL Batch packets and TDS7 Login packets. The TDS7 Login packets are easy to handle (lines 315 – 322), as they consist of a single TDS packet, so we can just extract the payload from the packet and create a tds_tds7login event.
The SQL Batch packets however are a little trickier, as a single SQL Batch request can be made up of multiple SQL Batch packets. Consequently we need our old friend Mr Buffer again, as mentioned in the tcp_contents explanation above. The buffering is somewhat simpler for SQL Batch requests though.
TDS packets contain an End of Message status bit in the TDS header which tells us whether there are any more packets (in this case SQL Batch packets) that make up this transaction. We know that there won’t be any trailing SQL Batch data belonging to another SQL Batch packet (from the protocol definition). Consequently, we can simply join all of the SQL Batch data together and then generate a tds_sqlbatch event after we have received the last SQL Batch packet.
The tds_packet() code between lines 269 – 310 does just that. If the packet type is an SQL Batch packet (0x01), then check to see if it is the first packet of the request. If it is, we have to skip over the ALL_HEADERS section, which is done by continuing on from the offset specified in the ALL_HEADERS TotalLength header field (line 282). Note that the value of ‘True’ (T) in the bytestring_to_count() call (line 281), specifies that the argument is in little endian byte order (that is, least significant byte first).
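That SQL Batch handling can be sketched in Python roughly as follows (names are hypothetical; the ALL_HEADERS TotalLength is read little-endian from the start of the first packet's payload, exactly as described above):

```python
import struct

sqlbatch = b""  # accumulates the (still UCS-2) query text across packets

def handle_sqlbatch_packet(payload, status, first_packet):
    """Buffer one SQL Batch payload; return the full request on End of Message."""
    global sqlbatch
    if first_packet:
        # Skip the ALL_HEADERS section: its TotalLength field (which
        # includes its own four bytes) is little-endian at offset 0.
        total_len = struct.unpack("<I", payload[:4])[0]
        payload = payload[total_len:]
    sqlbatch += payload
    if status % 2 == 1:  # bit 0 of the TDS status byte: End of Message
        request = sqlbatch
        sqlbatch = b""
        return request
    return None
```

Each non-final packet returns None; the packet with the End of Message bit set returns the assembled request and empties the buffer.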
Most, if not all, of the strings in TDS packets (at least as used by MS-SQL) are Unicode characters 'encoded in UCS-2' (see section 2.2 of the TDS protocol specification). Since Bro isn't Unicode aware (that I've noticed to date), we need to convert these strings to ASCII. We're potentially going to lose data here, as Unicode can represent more characters than ASCII can.
The first two blocks of the Basic Multilingual Plane — C0 Controls and Basic Latin, and C1 Controls and Latin-1 Supplement — include the English alphabet and the accented characters used in some European alphabets. I don’t know about you, but that covers more language(s) than I’m capable of reading and understanding.
The advantage of the first two blocks (of the Basic Multilingual Plane) is that the first byte of each character is 0x00 and, for the first block at least, the second byte matches the corresponding ASCII character. As such, we can cheat and simply remove null bytes from the strings (line 294). This leaves readable English strings (and, I suspect, also covers some of the European languages), and leaves Bro to represent Unicode characters from other blocks as escaped hex strings of the form \xnn\xnn. The proper way to do this, though, would be to use a Unicode-aware function to convert the strings, but I couldn't find one in Bro's scripting reference.
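The difference between the cheat and a proper conversion is easy to see in Python, which (unlike Bro's script language) can decode UCS-2/UTF-16 directly:

```python
raw = b"S\x00E\x00L\x00E\x00C\x00T\x00"  # "SELECT" in UCS-2 (little-endian)

stripped = raw.replace(b"\x00", b"")  # the script's cheat: drop null bytes
decoded = raw.decode("utf-16-le")     # a Unicode-aware conversion

print(stripped)  # b'SELECT'
print(decoded)   # SELECT
```

For characters in the first block the two approaches agree; outside it, only the real decode preserves the text.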
After ‘converting’ the Unicode string, it is appended to the SQL Batch request buffer, sqlbatch (line 299). The if block between lines 306 – 309 checks to see if this is the last packet in the SQL transaction and if so, generates a tds_sqlbatch event before emptying the SQL Batch request buffer.
The if condition at line 306 is a method of performing a bit test operation without the bitwise operators (&, |, ^). It is based on the fact that dividing by 2 is the same as shifting bits to the right by one bit (as (2^n) / 2 == 2^(n-1)). We don’t have a bitwise operator, but we do have an operator that will give us the remainder of a division — mod (%).
This means that we can test bit 0 by dividing by two and checking for a remainder. If we have a remainder, then bit 0 was set, if not, bit 0 was clear. That is what the test at line 306 is doing. Note that this will only work for bit 0, because a remainder after a division actually tells you if any of the bits below the bit represented by the divisor (2 in this case, being bit 1 (2 ^ 1 == 2)) are set.
Any bit can be tested by first dividing by the power of two represented by the bit that you wish to test, and then doing the mod 2 operation. For instance, to test bit 5 (which has the value 32 (2 ^ 5 == 32)), use the following expression (x / 32) % 2, or more generally, (x / (2 ^ n)) % 2, which will test bit n in x.
Lines 315 – 322 I’ve mentioned above, and they simply check for a TDS7 Login packet and create a tds_tds7login event.
tds_sqlbatch()
The tds_sqlbatch event handler (lines 195 – 198) simply populates the fields of the tdssqlbatchinfo record and passes it to the Log::write() function, which is a Bro built-in function to log to files. I have already started work on an extension to this event handler, but I have left it out of this article for the sake of brevity. I’ll finish it off and create another article to cover it.
tds_tds7login()
The tds_tds7login event handler (lines 204 – 225) is a tad more complicated, although it is merely performing the same function of extracting information from the packet, populating the tdstds7logininfo record, and calling Log::write() to log it.
The problem with the TDS7 Login packet is that a lot of the information is in variable length fields. The TDS packet contains two sections, one (OffsetLength) containing the offset and length of each value, and the other section (Data) containing the values. See section 2.2.6.3 of the TDS protocol specification for more details.
The tds_tds7login event handler starts off by extracting some of the fixed-length fields from the TDS7 Login header (lines 208 – 211), before extracting the OffsetLength section into the variable login_lenoff (line 216). If you've seen my parsetds.py script, then you'll notice that I've replaced the seven three-line blocks with a function (extract_varlen_fields) and a vector of count.
The extract_varlen_fields() function takes the OffsetLength section (in the login_lenoff variable), the Data section (in the data variable), and a list of indices of the fields we'd like to extract (in the fieldidxs variable). In this case we are after the Data section fields at index 0 (Hostname), 1 (UserName), 2 (Password), 3 (AppName), 4 (ServerName), 6 (CltIntName), and 8 (Database). The values of the requested fields are returned in a vector of string, with the vector index representing the field's index into the Data section. That is, a field's index in the returned vector will be the same as the index used to request the field.
For example, the Database field is the ninth (the first is index 0, so index 8 is the ninth) field in the variable length field section, so it is requested by inserting 8 into the fieldidxs vector passed to the extract_varlen_fields() function.
The extract_varlen_fields() function will return with the value of the Database field at index ‘8’ despite it being the seventh field requested. That is, it is the index of the field in the Data section that determines its index in the returned vector, not the number of fields preceding it in the returned vector. Or to put it another way, the indices in the returned vector, for fields that weren’t requested (like fields 5 and 7 in this case), will be empty/non-existent.
Once the variable length fields have been extracted, the tds_tds7login event handler transfers the data to the tdstds7logininfo record and passes it to the Log::write() function for logging.
extract_varlen_fields()
The extract_varlen_fields() function (lines 124 – 175), as previously mentioned, extracts the values of the requested variable length fields. This function is reasonably straightforward. It steps through each of the indices requested in the fieldidxs parameter (line 132) and calculates the offset, into the OffsetLength section, of each field's offset into the Data section. That is, it finds the offset of the ib* fields in the OffsetLength section (line 136) by multiplying the index by 4 (each field has a two-byte offset and a two-byte length in the OffsetLength section). It then adds 2 (line 141) to get the offset of the length of the field (the cch* fields in the OffsetLength section).
These offsets are then used to extract the offset (line 146) of the field value in the Data section, and the length (line 151) of the particular field value. These values are then used to calculate the offset of the last byte of the field value (line 156). Note that the length is multiplied by 2 because the length specifies the number of characters, not the number of bytes. Since we are dealing with Unicode (UCS-2), each character is represented by two bytes.
The code at lines 158 – 171 checks to see if the field length is greater than 0, as non-present optional fields will have a length of 0 to signify that they are not included in the Data section. If the field is present (length > 0), then extract it from the Data section, and badly convert it to ASCII by removing null bytes (line 163).
If the field is not present, then its value in the returned vector is set to an empty string ("") (line 170), to stop the calling code from generating an error when it references it. A better approach, though, would be to leave the value unset and have the caller check for this condition before using it. That way the caller can tell the difference between a value that isn't set and a value that is explicitly set to a null string.
I may improve this in the next version, but for now all the calling code does is log the field values, and we need to provide a value for each field that Log::write() has been told to log. Otherwise it seems to silently quit without logging anything (and some earlier troubleshooting suggests that it also fails to return to the calling code).
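For reference, the function can be sketched in Python like this (I'm assuming little-endian 16-bit offset/length pairs, and using a dict in place of Bro's vector of string; the names are mine):

```python
import struct

def extract_varlen_fields(lenoff, data, fieldidxs):
    """Return {index: value} for each requested variable-length field."""
    fields = {}
    for idx in fieldidxs:
        ib = idx * 4   # offset of the field's two-byte offset (ib*)
        cch = ib + 2   # offset of the field's two-byte length (cch*)
        offset = struct.unpack("<H", lenoff[ib:ib + 2])[0]
        length = struct.unpack("<H", lenoff[cch:cch + 2])[0]
        if length > 0:
            # The length counts UCS-2 characters, i.e. two bytes each;
            # 'convert' to ASCII by stripping the null bytes.
            fields[idx] = data[offset:offset + length * 2].replace(b"\x00", b"")
        else:
            fields[idx] = b""  # absent optional field
    return fields
```

A field with a zero length comes back as an empty string, mirroring the script's current behaviour rather than the leave-it-unset alternative discussed above.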
The end (finally!)
So there you have it: a Bro script to capture some data from MS-SQL attacks. This version was written back in January 2014. I posted it to a blog page, but didn't write a post about it because it seemed to be missing some MS-SQL connections.
I noticed that it would work if run against a pcap file, but seemed to miss some connections if run on a live interface. I was thinking that this was due to a problem with my script (which is why I held off on the blog post), but I'm starting to wonder if it is a problem with Bro dropping packets due to resource constraints on my system.
If you get a chance to try this script out, I’d be interested to know if you notice any problems with it. Meanwhile, I’ll work on a second (and hopefully shorter!) post which talks about the extra functionality that I’ve since added to this script.