SSH’s TCP forwarding feature allows users to tunnel arbitrary TCP connections over an encrypted SSH connection, which in turn can allow them to make connections to internal hosts from outside your network — connections which can be hard to detect as SSH traffic is encrypted. So, is it possible to infer what an SSH connection is being used for when you don’t have access to the unencrypted data?
While SSH’s TCP forwarding feature can be quite useful, it can also pose a serious security risk.
The security risk arises because, if an SSH connection can be opened to a remote host that is also accessible from outside your network, then SSH can be configured to forward a port on that remote host back to any host and port reachable from the local side of the SSH connection. The upshot is that any host connecting to the forwarded port on the remote host can get a connection to the (more than likely) internal host to which that remote port was forwarded.
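To make that concrete, here’s a minimal sketch of the kind of command involved. Every host name and port in it is a made-up placeholder, and the script only prints the command rather than running it.

```shell
# All host names and ports below are hypothetical placeholders.
remote_host="user@public.example.org"   # host that is reachable from outside
remote_port=8080                        # port to open on that remote host
internal_host="intranet.example.com"    # host only reachable from the local side
internal_port=80

# -N: don't run a remote command, just forward.
# -R: listen on remote_port at the remote end and send any connections back
#     through the tunnel to internal_host:internal_port.
cmd="ssh -N -R ${remote_port}:${internal_host}:${internal_port} ${remote_host}"

# Print the command instead of executing it, so this is safe to run as-is.
echo "$cmd"
```

Whether hosts other than the remote host itself can connect to the forwarded port depends on the server’s GatewayPorts setting, as OpenSSH binds remote forwards to the loopback address by default.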
The sshd_config(5) man page has this to say about the AllowTcpForwarding configuration option:
AllowTcpForwarding Specifies whether TCP forwarding is permitted. The available options are ``yes'' or ``all'' to allow TCP forwarding, ``no'' to prevent all TCP forwarding, ``local'' to allow local (from the perspective of ssh(1)) forwarding only or ``remote'' to allow remote forwarding only. The default is ``yes''. Note that disabling TCP forwarding does not improve security unless users are also denied shell access, as they can always install their own forwarders.
Smashing. So I started wondering whether or not we could detect this. After all, if a user enables port forwarding, then the SSH client and server have to negotiate and configure it. Can we detect such negotiation even if we can’t decrypt the SSH payload?
Does the fact that the client and server have to negotiate port forwarding mean that they send some extra packets that they wouldn’t otherwise send? Note that this isn’t necessarily going to be the case, as TCP forwarding could be negotiated as a configuration option, like a bit/flag for instance, set within a packet (say a configuration packet) that is already being sent as part of setting up the SSH connection.
So, let’s get technical, technical, I wanna get technical, let’s get into technical, let me see your server talk. Oh dear. Hey, if you thought that was bad, you should see the video clip.
Seriously, let’s grab ourselves a hot drink (because it’s the middle of winter, 6.1C outside, and houses around here really weren’t built for cold climates), and get stuck in to some technical details by having a look at a simple SSH connection.
If you’re following along at home, what you see may be different depending on the SSH client/server software that you’re using (or more precisely, on how your SSH software has implemented the SSH protocol(s)), and the particular version thereof. I’m using OpenSSH on both the client and on the server. I may try experimenting with different client/server combinations at a later date, but I figure that OpenSSH is reasonably popular so isn’t a bad one to test.
The reason that caveat sprang to mind is that when I started reading the SSH RFCs yesterday, I began with RFC 4253: The Secure Shell (SSH) Transport Layer Protocol and, knowing what I was intending to do, I was paying particular attention to features of the protocol that would likely rain on my little parade.
The first few drops of parade-ruining rain came in section 5.3, Packet Size and Overhead, with the mention of padding. Padding is just like its name suggests, and is similar to things like bubble wrap and those bags of air that you get to protect — or ‘pad’ — the contents of boxes during shipping.
Why is padding in SSH packets bad for us? Padding is bad for us because we are planning on using the size of the TCP payload, which we hope will correlate with the amount of encrypted SSH data, to infer what type of data is in the encrypted SSH payload(s).
Padding will allow the SSH implementation to break that direct correlation between the size of the TCP payload and the amount of encrypted SSH data, in the same way that a large cardboard box could contain a large item with very little bubble wrap, or it could contain a small item with a large amount of bubble wrap. A point which I later discovered the RFC authors actually point out to the reader.
After reading that, and starting to lose hope, I got to section 6, Binary Packet Protocol, which states that each SSH packet (carried in the TCP payload) starts with five bytes of header: the first four specify the packet length (which counts the padding length byte, the payload, and the padding, but not the MAC or the length field itself), and the fifth specifies the padding length. Brilliant, there’s hope yet! We can work out the payload lengths from the SSH packet headers.
This hope was short lived, however, because before we even get to section 6.1, section 6 also states:
Note that the 'packet_length' field is also encrypted, and processing it requires special care when sending or receiving packets. Also note that the insertion of variable amounts of 'random padding' may help thwart traffic analysis.
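To get a feel for how much room the RFC leaves an implementation, here’s a small sketch of the section 6 padding rule: at least 4 and at most 255 bytes of padding, with the length field, padding length byte, payload, and padding together adding up to a multiple of the cipher block size or 8, whichever is larger. The function name and the example sizes are my own assumptions, and the MAC is left out of the totals.

```shell
# Usage: valid_packet_sizes PAYLOAD_LEN BLOCK_SIZE
# Prints every packet size the padding rule permits (MAC excluded),
# one per line, smallest first.
valid_packet_sizes() {
    awk -v payload="$1" -v block="$2" 'BEGIN {
        align = (block > 8) ? block : 8        # multiple of max(block size, 8)
        for (pad = 4; pad <= 255; pad++) {     # 4 to 255 bytes of padding allowed
            total = 4 + 1 + payload + pad      # length field + padding length byte + payload + padding
            if (total % align == 0) print total
        }
    }'
}

# The three smallest sizes allowed for an 11 byte payload and a 16 byte block.
valid_packet_sizes 11 16 | head -n 3
```

An implementation that always picks the smallest size leaks the most about the payload, while one that picks at random from the whole list gives us the bubble wrap problem described above.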
Undeterred, I decided to press on anyway, as the amount of encrypted data isn’t the only thing that we can use to infer what is going on over an encrypted connection — we also have traffic patterns, that is, how many packets are sent and at what stage(s) during the connection.
As we continue, we’ll see that (in the case of OpenSSH at least) neither the SSH padding nor the encryption of the length fields poses a problem for us, because OpenSSH appears not to implement the anti-traffic-analysis strategy of randomising the amount of padding in SSH packets.
So, since the payload (and the useful headers, it turns out) of the SSH packets will, for the most part, be encrypted (SSH doesn’t seem to mess around — it exchanges version information, does key negotiation, then starts encrypting), most of the data that we use will be from the TCP layer, and what better way to capture packet header information in a nice, easy to play with line based text format, than by using tshark (part of the Wireshark package). Here’s a command line that I prepared earlier, and shoved in to a script (to save me from having to remember it) which takes the network interface name as the first (and only) parameter:
[code autolinks=”false”]
#!/bin/sh
tshark -ni "$1" -T fields -Eheader=y -Eseparator=, -e frame.time_epoch -e eth.type -e tcp.time_delta -e _ws.col.Source -e tcp.srcport -e _ws.col.Destination -e tcp.dstport -e tcp.stream -e tcp.flags -e tcp.len 'tcp'
[/code]
That tshark command was actually created to capture information on all TCP connections, not just SSH connections, because when I started dabbling with traffic analysis I was looking at all the connections that my host was making. We can modify it easily enough so that it only captures SSH, and I’d also be tempted to modify the filter string so that it only captures packets that contain TCP data (that is, ignore packets that are only acknowledging the receipt of data). I used awk on some already captured data to pull out SSH packets containing TCP data, but since we’re not going to be looking at any other TCP traffic for the moment, it would be easier to modify the filter in the tshark command above:
[code autolinks=”false”]
tshark -lni "$1" -T fields -Eheader=y -Eseparator=, -e frame.time_epoch -e _ws.col.Source -e tcp.srcport -e _ws.col.Destination -e tcp.dstport -e tcp.stream -e tcp.len -Y 'tcp.len > 0' 'tcp port 22'
[/code]
Note the use of a display filter (the ‘-Y’ option). This is because, as this ‘tshark capture filter for len parameter’ answer reminded me, there isn’t a payload length field in the TCP header. We can’t use the IP length field because, while it is dependent on the length of the TCP payload, it is also dependent on things like IP options and TCP options.
Conveniently enough, Wireshark and tshark actually calculate the length of the TCP payload for us and make it available to display filters, so rather than doing a convoluted filter to extract and calculate the TCP payload length, we have the option of using a display filter, so let’s do just that.
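For the curious, the calculation being done on our behalf is simple enough once the header fields have been pulled out. Here’s a minimal sketch for IPv4, with a hypothetical function name and hypothetical example values.

```shell
# Usage: tcp_payload_len IP_TOTAL_LENGTH IHL TCP_DATA_OFFSET
# The IHL and the TCP data offset are both counted in 32-bit words,
# so each is multiplied by 4 to get a header length in bytes.
tcp_payload_len() {
    echo $(( $1 - $2 * 4 - $3 * 4 ))
}

# Hypothetical packet: an 80 byte IPv4 datagram with no IP options (IHL 5)
# and a TCP header carrying 12 bytes of options (data offset 8).
tcp_payload_len 80 5 8
```

The awkward part in a capture filter is not this arithmetic but extracting the two header lengths, which is exactly what using tcp.len saves us from.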
I also added the ‘-l’ option, because I like the letter ‘l’. No, seriously, the ‘-l’ option tells tshark to not buffer multiple lines of standard output, but instead to generate output after each matching packet. This stops us getting confused because we’re typing away in our SSH session but aren’t seeing tshark log any SSH packets.
I then removed the eth.type field (which shows the Ethertype field from the Ethernet header) because we’re only gathering TCP packets so this will always either be 0x800 for IPv4 or 0x86dd for IPv6, which doesn’t add any information that we don’t already know.
Similarly, I removed the tcp.flags field because we are looking at packets where the TCP data length is > 0, so the TCP flags don’t really add any extra info either.
I also removed the tcp.time_delta field because I never saw that actually show any values and ended up writing an awk script to calculate it. It is irrelevant for this exercise anyway.
The tcp.stream field is a piece of metadata created by Wireshark/tshark to identify a particular TCP stream. You can see Wireshark using it if you use its Follow TCP Stream feature. If you notice this value change, then you are looking at a different TCP connection.
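If you do want the time between packets, it can be worked out from the captured output afterwards. This is a minimal sketch rather than the script I actually used, and it assumes the column order produced by the command above (time in column 1, tcp.stream in column 6). The function name is made up.

```shell
# Appends the delay, in milliseconds, since the previous packet in the
# same TCP stream to each row of the CSV read from standard input.
add_time_delta() {
    awk -F, '
        NR == 1 { print $0 ",delta_ms"; next }     # pass the header row through
        {
            delta = ($6 in last) ? ($1 - last[$6]) * 1000 : 0
            last[$6] = $1                           # remember this stream’s latest timestamp
            printf "%s,%.1f\n", $0, delta
        }'
}

# Three sample rows in the format that the tshark command prints.
add_time_delta <<'EOF'
frame.time_epoch,_ws.col.Source,tcp.srcport,_ws.col.Destination,tcp.dstport,tcp.stream,tcp.len
1436668476.780197000,<client>,35358,<server>,22,0,32
1436668476.786903000,<server>,22,<client>,35358,0,32
1436668476.787840000,<server>,22,<client>,35358,0,920
EOF
```

Keying the previous timestamp on tcp.stream keeps the delays meaningful when more than one connection ends up in the same capture.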
If you run the script/tshark command (you’ll obviously need to replace the ‘$1’ with the name of your network interface if you are running the command outside of a script) and then establish an SSH connection and log in to a remote host, you’ll see something similar to the following:
[code autolinks=”false”]
frame.time_epoch,_ws.col.Source,tcp.srcport,_ws.col.Destination,tcp.dstport,tcp.stream,tcp.len
1436668476.780197000,<client>,35358,<server>,22,0,32
1436668476.786903000,<server>,22,<client>,35358,0,32
1436668476.787840000,<server>,22,<client>,35358,0,920
1436668476.790111000,<client>,35358,<server>,22,0,1428
1436668476.790414000,<client>,35358,<server>,22,0,540
1436668476.799173000,<client>,35358,<server>,22,0,48
1436668476.805548000,<server>,22,<client>,35358,0,624
1436668476.818890000,<client>,35358,<server>,22,0,16
1436668476.858482000,<client>,35358,<server>,22,0,44
1436668476.858840000,<server>,22,<client>,35358,0,44
1436668476.859211000,<client>,35358,<server>,22,0,60
1436668476.860929000,<server>,22,<client>,35358,0,44
1436668479.029158000,<client>,35358,<server>,22,0,140
1436668479.046631000,<server>,22,<client>,35358,0,28
1436668479.046983000,<client>,35358,<server>,22,0,104
1436668479.062577000,<server>,22,<client>,35358,0,44
1436668479.062799000,<client>,35358,<server>,22,0,436
1436668479.067751000,<server>,22,<client>,35358,0,84
1436668479.067795000,<server>,22,<client>,35358,0,108
1436668479.067806000,<server>,22,<client>,35358,0,28
1436668479.067813000,<server>,22,<client>,35358,0,28
1436668479.067820000,<server>,22,<client>,35358,0,108
1436668479.067849000,<server>,22,<client>,35358,0,28
1436668479.067911000,<server>,22,<client>,35358,0,92
1436668479.067922000,<server>,22,<client>,35358,0,28
1436668479.067981000,<server>,22,<client>,35358,0,76
1436668479.067991000,<server>,22,<client>,35358,0,28
1436668479.068107000,<server>,22,<client>,35358,0,120
1436668479.068129000,<server>,22,<client>,35358,0,28
1436668479.068182000,<server>,22,<client>,35358,0,60
1436668479.068195000,<server>,22,<client>,35358,0,28
1436668479.068207000,<server>,22,<client>,35358,0,44
1436668479.068313000,<server>,22,<client>,35358,0,28
1436668479.068341000,<server>,22,<client>,35358,0,124
1436668479.068355000,<server>,22,<client>,35358,0,28
1436668479.075859000,<server>,22,<client>,35358,0,76
[/code]
Now compare that with the output that the SSH client displayed:
[code autolinks=”false”]
user01@client01:~$ ssh server1
user01@client01’s password:
Linux server1 4.0.0-2-amd64 #1 SMP Debian 4.0.5-1 (2015-06-16) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
No mail.
Last login: Sun Jul 12 11:50:55 2015 from randclint.****************************.***
[/code]
Right away we can notice some patterns in the TCP packets. You can tell which is the client and which is the server by the TCP port numbers — the server will be the side with a TCP port of 22 (the well-known port assigned for SSH). You then see a two-way conversation, but then see it drop off to a rather one-sided conversation with the server doing all of the talking.
To explain this, let’s think about what is happening. We are establishing a connection to an SSH server. The client and server exchange version information (two-way conversation), then they perform a key exchange (two-way conversation), the user authenticates to the server (two-way conversation), and then the server logs the user in to an interactive login session which results in a number of lines of text being sent back to, and displayed by, the client (one-sided conversation with the server doing all of the talking). So we can now identify the connection initialisation part of the connection, and the part where the user has just logged in and the server is sending a number of lines of greeting text to the client.
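The change from a two-way conversation to a one-sided one is easier to see if the capture is collapsed into runs of consecutive packets from the same side. Here’s a rough sketch that does that, assuming the column order used above and that the server is whichever side is using port 22. The function name is hypothetical.

```shell
# Prints one line per run of consecutive packets sent by the same side.
summarise_turns() {
    awk -F, '
        NR == 1 { next }                                # skip the header row
        {
            side = ($3 == 22) ? "server" : "client"    # source port 22 means the server sent it
            if (side != prev) {
                if (prev != "") print prev, count       # the previous run has ended
                prev = side; count = 0
            }
            count++
        }
        END { if (prev != "") print prev, count }'
}

# An excerpt from the capture above, around the point where the client goes quiet.
summarise_turns <<'EOF'
frame.time_epoch,_ws.col.Source,tcp.srcport,_ws.col.Destination,tcp.dstport,tcp.stream,tcp.len
1436668479.046983000,<client>,35358,<server>,22,0,104
1436668479.062577000,<server>,22,<client>,35358,0,44
1436668479.062799000,<client>,35358,<server>,22,0,436
1436668479.067751000,<server>,22,<client>,35358,0,84
1436668479.067795000,<server>,22,<client>,35358,0,108
1436668479.067806000,<server>,22,<client>,35358,0,28
1436668479.067813000,<server>,22,<client>,35358,0,28
1436668479.067820000,<server>,22,<client>,35358,0,108
EOF
```

Short alternating runs are the back-and-forth of connection setup and authentication, and the long server run at the end is the greeting text.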
At this point in the proceedings, it would be useful to determine what sort of overhead the SSH protocol adds to the data. We already know from RFC 4253 that there is a four byte packet length field, one byte padding length field, padding, and a message authentication code added to the actual payload (our data). It would be good to get an idea of the actual number of bytes of overhead so that we can determine roughly how much data plus padding there might be in the SSH payload of each packet. The amount of overhead will depend on which algorithms are used and whether or not the SSH client and/or server are varying the amount of padding.
Let’s see how much TCP data the client and server create, to send a single character of SSH data. Try pressing ‘.’ (without the quotes and without pressing the Enter/Return key) in the SSH session and examine the output from tshark:
[code autolinks=”false”]
1436673138.589161000,<client>,35358,<server>,22,0,28
1436673138.589805000,<server>,22,<client>,35358,0,28
[/code]
28 bytes. You’ll notice that the shell on the server echoes the character back to you. This will be the 28 byte packet that comes back to the client. Did you notice that the collection of packets corresponding to the greeting text that the server sent back to us after logging in, contained a number of 28 byte packets? What’s the deal with them?
Could it be that the server is sending output back line-by-line (which, given that we are talking to an interactive shell, is reasonably likely) and sending the trailing newline character separately? Let’s think about the post-login text packets.
We see client to server SSH packets stop before the string of server to client packets, so let’s hypothesise (because I don’t use that word very often) that the client stops talking after it sends our credentials to the server, and that the subsequent string of packets from the server to the client may contain some status/configuration information (such as whether or not the authentication was successful), followed by the post-login text.
Let’s also suspect that the 28 byte packets contain a single character, and that the server is sending back each line, followed by a single newline character. How can we test this hypothesis? Let’s look at the packets after the last packet sent by the client, and if we count the number of 28 byte packets, we get 9, and then a 76 byte packet tacked on to the end.
If we are correct, and the 28 byte packet corresponds to a newline character, then that would mean that we have 9 lines of post-login text. Counting them up, we have 10 lines (including blank lines), so close, but not quite correct.
Let’s not give up just yet though, as that is close enough to suggest some correlation between the 28 byte packets and the newline characters, and it is possible that the operating system’s TCP/IP implementation may have buffered or combined multiple SSH packets in to a single TCP packet. After all, TCP is unaware that SSH also has its own packets — to TCP, the SSH ‘packets’ are just a single stream of data to be sent.
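Counting packets by eye gets old quickly, so here’s a sketch that does the count used in this test. It assumes the column order from the tshark command above, and the function name is made up. The example feeds it the packet lengths from the login capture, with the timestamps replaced by a placeholder ‘t’ to keep it short.

```shell
# Usage: count_after_last_client SIZE
# Counts the server packets of the given TCP length that were sent after
# the last packet from the client.
count_after_last_client() {
    awk -F, -v size="$1" '
        NR == 1 { next }                 # skip the header row
        $3 != 22 { n = 0; next }         # a packet from the client resets the count
        $7 == size { n++ }
        END { print n + 0 }'
}

{
    echo "frame.time_epoch,_ws.col.Source,tcp.srcport,_ws.col.Destination,tcp.dstport,tcp.stream,tcp.len"
    echo "t,<client>,35358,<server>,22,0,436"       # the last packet from the client
    for len in 84 108 28 28 108 28 92 28 76 28 120 28 60 28 44 28 124 28 76; do
        echo "t,<server>,22,<client>,35358,0,$len"
    done
} | count_after_last_client 28
```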
Let’s do another test. Let’s get the SSH server to output most of that post-login text again. Most of the post-login text is contained within the /etc/motd (Message Of The Day, although I can’t recall anyone actually changing it, let alone changing it daily), so let’s output it again and see what tshark tells us.
Type ‘cat /etc/motd‘ but don’t enter it yet. Instead, hit enter a few times in the terminal (shell) containing the tshark output — this will create some blank lines so that we can see where we’re up to. That is, it will create some blank lines to separate the SSH packets corresponding to us typing the command in, and the SSH packets corresponding to the output of the command — the /etc/motd file.
Now press enter to run the ‘cat /etc/motd‘ command and tshark should feed you something similar to the following:
[code autolinks=”false”]
1436701740.511347000,<client>,35358,<server>,22,0,28
1436701740.512447000,<server>,22,<client>,35358,0,28
1436701740.513378000,<server>,22,<client>,35358,0,28
1436701740.513428000,<server>,22,<client>,35358,0,108
1436701740.513447000,<server>,22,<client>,35358,0,28
1436701740.513614000,<server>,22,<client>,35358,0,92
1436701740.513632000,<server>,22,<client>,35358,0,28
1436701740.513647000,<server>,22,<client>,35358,0,76
1436701740.513673000,<server>,22,<client>,35358,0,28
1436701740.513720000,<server>,22,<client>,35358,0,28
1436701740.513877000,<server>,22,<client>,35358,0,92
1436701740.513926000,<server>,22,<client>,35358,0,28
1436701740.513944000,<server>,22,<client>,35358,0,60
1436701740.513958000,<server>,22,<client>,35358,0,28
1436701740.514678000,<server>,22,<client>,35358,0,76
[/code]
Now, the first 28 byte packet from the client (line 1) will be us hitting enter to run the cat command. The next 28 byte packet (line 2) will be the server echoing the newline character back to us. The remaining packets (lines 3-15) will be the server sending us the contents of the /etc/motd file which, incidentally, does start with a blank line (which is why we have a 28 byte packet at line 3).
Notice something different (apart from it starting with three 28 byte packets) about this tshark output compared to the tshark output corresponding to the post-login text when we logged in? There is a second occurrence of two consecutive 28 byte packets (lines 9 and 10), which is good because the motd file has a blank line in the middle of it and hence we’d expect to see two consecutive newline characters. Do we now have the same number of 28 byte packets as we have lines of output?
Let’s count the number of 28 byte packets starting with the packet at line 3 (being the start of the motd file), and see if this now matches the number of lines of text that the server sent after we entered the cat command. We have seven 28 byte packets, and seven lines of text returned as a result of running the cat command. This is good, as it backs up our hypothesis.
Why does the tshark output end with a 76 byte packet rather than a 28 byte packet? The server sends the shell prompt after running the command, and the shell prompt does not have a newline character after it (if it did, any commands that you typed wouldn’t appear immediately after the prompt, but rather on the line after the prompt). It looks like the 76 byte packet is the shell prompt.
Also notice that at the point of mismatch between the post-login text packets and the /etc/motd packets, the former has a 28 byte, a 120 byte, then a 28 byte packet (lines 28-30). The latter has a 28 byte, a 28 byte, a 92 byte, then a 28 byte packet (lines 9-12). Notice anything? If you add the 92 bytes from the /etc/motd packet at line 11 to the 28 bytes from the packet immediately before it (line 10), then you get 120 bytes.
This difference between the post-login text packets and the subsequent ‘cat /etc/motd’ packets suggests that the mismatch between the number of 28 byte packets and the number of newlines in the post-login text could be due to the operating system buffering TCP data before sending it out on to the network. That buffering would result in the newline character (28 byte TCP packet) being combined with the ‘Debian GNU/Linux …’ line (92 byte TCP packet) in one TCP packet.
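The same comparison can be done mechanically. The sketch below walks two lists of TCP lengths and reports any packet in the first list that is the sum of several consecutive packets in the second. The function name is hypothetical, and the two lists are the motd portions of the login capture and of the ‘cat /etc/motd’ capture as I read them off the output above.

```shell
# Usage: find_coalesced "COMBINED LENGTHS" "SEPARATE LENGTHS"
find_coalesced() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, A, " "); nb = split(b, B, " ")
        j = 1
        for (i = 1; i <= na; i++) {
            sum = 0; parts = ""; count = 0
            # Consume packets from the second list until they add up to A[i].
            while (j <= nb && sum < A[i]) {
                sum += B[j]
                parts = parts (count ? " + " : "") B[j]
                count++; j++
            }
            if (sum != A[i]) { print "mismatch at packet " i; exit 1 }
            if (count > 1) print A[i] " = " parts
        }
        if (j <= nb) { print "unmatched trailing packets"; exit 1 }
    }'
}

login="28 108 28 92 28 76 28 120 28 60 28"
cat_motd="28 108 28 92 28 76 28 28 92 28 60 28"
find_coalesced "$login" "$cat_motd"
```

The matching is greedy, which is fine for a quick check like this but could be fooled by captures where several different groupings add up to the same total.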
We can test the hypothesis about the TCP packets corresponding to line-by-line output with the newline character output separately, by matching the TCP packet sizes up with the output from the ‘cat /etc/motd‘ command. I can feel another script coming on. Let’s use awk to prefix each input line with the length of the line:
[code autolinks=”false”]
awk '{ print length($0) "\t" $0; }' /etc/motd
[/code]
Now manually prefix each line with the TCP packet lengths, starting from line 3 (where the motd output starts) of the ‘cat /etc/motd‘ tshark output. Leaving the 28 byte packets out, as these appear to correspond to the newline characters at the end of each line, we get:
[code autolinks=”false”]
TCP  line  line
len  len   text
     0
108  73    The programs included with the Debian GNU/Linux system are free software;
92   66    the exact distribution terms for each program are described in the
76   47    individual files in /usr/share/doc/*/copyright.
     0
92   65    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
60   28    permitted by applicable law.
76
[/code]
We could have scripted that by using the paste command, but we’d need to extract the relevant lines from the tshark output first, which isn’t worth the overhead for seven lines of text.
There won’t be a byte-for-byte correlation between the TCP length and the line length, as can be seen from the two 92 byte TCP length lines (lines 5 and 8) being 66 characters and 65 characters. This is because RFC 4253 states that the random padding must be such that the total length of the SSH packet is a multiple of the cipher block size or of 8, whichever is larger (section 6, Binary Packet Protocol). That means that the TCP length is never going to increase by less than 8, so if 1 byte of data is added, the padding has to be adjusted such that the TCP length either remains the same (as it does in this case), meaning that there is less padding, or increases by at least 8 (more if the cipher block size is larger than 8), meaning that there is more padding.
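As a sanity check, here’s a sketch of a size model that happens to reproduce the numbers in the table. The constants are assumptions of mine that I picked because they fit this capture, not something that the capture tells us directly: a 9 byte SSH channel data header in front of the text (one byte of message type, four of channel number, four of string length, per RFC 4254), a 16 byte cipher block, the minimum 4 bytes of padding, a 4 byte length field counted outside the aligned block, and an 8 byte MAC. The function name is made up too.

```shell
# Usage: predict_tcp_len DATA_BYTES
# Prints the TCP length that the assumed model predicts for one SSH packet
# carrying DATA_BYTES bytes of terminal output.
predict_tcp_len() {
    awk -v n="$1" 'BEGIN {
        body = 1 + 9 + n + 4                  # padding length byte + header + data + minimum padding
        body = int((body + 15) / 16) * 16     # round up to a whole number of 16 byte blocks
        print 4 + body + 8                    # length field + aligned block + MAC
    }'
}

# A single character, then the line lengths from the table above.
for n in 1 73 66 65 47 28; do
    echo "$n -> $(predict_tcp_len "$n")"
done
```

If the model is right, the 66 and 65 character lines land in the same 16 byte step, which would be why they both show up as 92 bytes. A different cipher or MAC would need different constants.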
Just quickly (or not so quickly, as it turned out) taking this a tad further, mainly because I’m curious as to why the SSH packets seem to be sent one per line with the newline sent separately, I’ll try using the strace command on the sshd process that is running as me (as opposed to the privilege separation one that is running as root).
I used the following strace command to display the read() and write() system calls that sshd is making:
[code autolinks=”false”]
strace -fp <pid> -e read,write
[/code]
That resulted in the following output when I hit enter to run ‘cat /etc/motd‘:
[code autolinks=”false”]
read(3, "\0\0\0\20\244\366J\340\267\204\220*\355\tI\272\230\36\327\r\375y,\307Q\265WN", 16384) = 28
write(9, "\r", 1) = 1
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20(5\350\3\t\304\322\v\311\321\232\2203Y7\240\351\305\300\261\365\332sc", 28) = 28
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20R\250\21H\31\353\312\262b\365\236;\345\271)*N/$\n8\304 q", 28) = 28
read(11, "The programs included with the D"…, 16384) = 73
write(3, "\0\0\0``\347\231S\364s\4\327\34u\341\22V\33u\244B\26\345\23i\246D\17\202\212J\243"…, 108) = 108
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20\247\370\272\20|\334\216pv\300#\1z\202\347!^\342)\371\334\0\273^", 28) = 28
read(11, "the exact distribution terms for"…, 16384) = 66
write(3, "\0\0\0P\\z$d\266_-\240o\22\307\324;\253\370\260j\30\257\311\v\310)]h\340G\204"…, 92) = 92
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20\317*\3057q\361qb\ngVd\263\264\237i\355\234~\237\217\212\26\376", 28) = 28
read(11, "individual files in /usr/share/d"…, 16384) = 47
write(3, "\0\0\0@D\303<\22\32\3622\254\303V\263Z4\37A\3269\t\241\r\233\346\301~\323\5a\236"…, 76) = 76
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20\321-\355\335\35\21\205\261_f\224\361\207\0\330W\n\0373\222]\2452\32", 28) = 28
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20_\262`\204\261\201\356D#A\243\234RoS\302\35%\237~\"I\371\317", 28) = 28
read(11, "Debian GNU/Linux comes with ABSO"…, 16384) = 65
write(3, "\0\0\0P\264\265#\205o\333\376\214\200>\351K=h\177\t\220\24`\355\274\335#-\307\370\303`"…, 92) = 92
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20n\342\241<\257\25\7\330\301\t\2\t\3318\331g\243\211\267\31\2625\325\371", 28) = 28
read(11, "permitted by applicable law.", 16384) = 28
write(3, "\0\0\0000\6\350\357\251\31K\2427\344`#4\351B\360,\f\\\227\2\364\267\370\0106\345\272r"…, 60) = 60
read(11, "\r\n", 16384) = 2
write(3, "\0\0\0\20\204\333RNh\230q\310\32\257\326\270\236\340\351$lU\17|\326\206\365\213", 28) = 28
read(11, "\33]0;user01@server1: ~\7user01@ser"…, 16384) = 40
write(3, "\0\0\0@\335\333\230Z\7\2351\362\220\32\266\373n8\236I\25\234\375\345\356\363\312\nX\335\277\373"…, 76) = 76
[/code]
Now, peeking at the /proc/<pid>/fd/ directory for this sshd process, I see that file descriptor 11 is /dev/ptmx which, according to a ptmx(4) man page that I found, is a ‘pseudoterminal master and slave’.
From reading the man page, this ptmx setup seems to be like a pipe between a pseudo-terminal master device and a pseudo-terminal slave device, in that anything written to the slave device can be read by the master device, and vice versa. The slave devices present as normal terminal devices, and are allocated from (or created in) the /dev/pts/ directory — this device is the standard input, standard output, and standard error used by the login shell (/dev/pts/0 for instance).
Now things are starting to fall in to place. sshd requests a pseudo-terminal master/slave pair. It opens the master, and the login shell opens the slave. When the shell writes to its standard output (the slave device), say to output the contents of the /etc/motd file, the data ends up at the master device. This allows sshd to read the output from the shell by reading from the master device (file descriptor 11 in the strace output above). It then encrypts this data and sends it over the network (file descriptor 3 above).
Notice anything about the read() calls reading from file descriptor 11 (the pseudo-terminal master device)? They seem to read the contents of the line, and then read the ‘\r\n‘ (carriage return and newline) characters in a separate read() call.
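To line the two sides up, here’s a sketch that pairs each read() from the pseudo-terminal with the write() to the network that follows it. The file descriptor numbers (11 and 3) are the ones from my strace output and will almost certainly differ elsewhere, the function name is hypothetical, and the sample lines have had their string arguments shortened.

```shell
# Prints "<bytes read from the pty> -> <bytes written to the socket>"
# for each read(11, ...) that is followed by a write(3, ...).
pair_reads_with_writes() {
    awk '
        index($0, "read(11,") == 1 { plain = $NF; next }    # return value is the last field
        index($0, "write(3,") == 1 && plain != "" {
            print plain " -> " $NF
            plain = ""
        }'
}

# A few lines from the strace output, with the encrypted strings cut down to "...".
pair_reads_with_writes <<'EOF'
read(11, "\r\n", 16384) = 2
write(3, "...", 28) = 28
read(11, "The programs included with the D"..., 16384) = 73
write(3, "...", 108) = 108
read(11, "permitted by applicable law.", 16384) = 28
write(3, "...", 60) = 60
EOF
```

Each read() from the terminal turns into exactly one write() to the socket here, which is why the line structure of the output survives all the way on to the wire.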
If sshd was reading up to and including a newline character, then it would be annoying to use an interactive shell in an SSH session, because an interactive shell echoes back characters as you type them. If sshd was reading up to and including a newline character, then you wouldn’t see what you had typed until you hit enter.
Given that running strace on ‘cat /etc/motd’ showed cat issuing what is basically a single write() call for the whole motd file, it isn’t cat that is breaking the output up in to lines. Nor is it the read() call itself: read() isn’t aware of how the application has called it (whether directly, or as the result of a library call such as fscanf() for instance), so it isn’t naturally going to stop reading at white-space the way the fscanf() ‘%s’ format token does.
I figure that it must be the terminal driver breaking the data from the pseudo-terminal up in to lines. I had a quick look at the stty command to see if I could put the terminal into line-by-line mode (which I remember reading somewhere — it might have been telnet though) to see if that changed the pattern of read() calls, but I couldn’t find an option to do so (I think it was a telnet thing actually).
So that’s my reasoning for why OpenSSH is sending each line in one go with the trailing newline character sent separately — terminal I/O, and the fact that you want the shell on the other end of the connection to be interactive and respond as you type. That makes it easy for us to perform traffic analysis to identify interactive SSH sessions. Thinking about it, the SSH server can’t buffer the output that it sends back for precisely the same reason that it can’t wait for a newline character — the user expects to see what they are typing, when they type it.
OpenSSH could (and should), however, randomise the amount of padding that it adds to its SSH packets — that would make it pretty difficult to identify the output text followed by end of line pattern that we’ve observed when SSH is used for an interactive shell session. That would stop us from being able to use the TCP length of packets to identify a pattern, and we’d then have to resort to a different method such as trying to find a pattern in the timing of the packets.
So there we have it, we’ve not only identified a likely pattern of packets for interactive shell sessions over SSH, but also identified a possible reason for that pattern, and one which suggests that that pattern of packets is pretty much inevitable (due to terminal I/O and user expectations).
That’s it for this post (which ended up somewhat longer than I’d originally intended). If you thought that that was fun then you won’t want to miss the next post in the series where I’ll see if we can identify patterns of packets generated by scp and sftp. I’ll then follow that up by looking for differences when we add TCP forwarding to the different SSH use cases that we’ve covered.