If your computer, mobile phone, or any other piece of I.T. equipment starts behaving differently, then it is probably wise to investigate further, as it could indicate a malware infection. I was reminded of this lesson on Friday when I was left stranded by the side of the road with car trouble, after ignoring what I suspect, with hindsight, was a subtle change in my car’s behaviour.
Not being particularly mechanically minded, I was shocked when my car just stalled whilst I was driving around town on Friday, and even more surprised to find that it wouldn’t start again. Being an I.T. software type of person, I spent a lot of time on Saturday downloading (and in one case compiling) open source software to talk nicely to my car’s engine management system and convince it to tell me what was wrong. (It is possible to get diagnostic trouble codes from a vehicle’s engine management system if the vehicle is OBD (On-Board Diagnostics) compliant.)
After a while of not seeing anything apart from error messages, I concluded that my car isn’t exactly OBD-II compliant (the ELM327 OBD-II to USB device that I was using wasn’t able to talk to the car, as far as I could tell), and that I should have listened to my more mechanically minded friend and done some good old-fashioned mechanical troubleshooting by checking to see how far the fuel was getting from the tank to the engine.
So, just like putting a network sniffer at various points in the network to see how far your network packets are getting, we started trying to undo the fuel hose near the engine to see how far the fuel was getting. Unlike putting a network sniffer at various points in the network, this seemed to require some physical strength.
I gave up on that, as fifteen years of sitting at a desk doing I.T. work hasn’t exactly endowed me with much in the way of physical strength. It could have been because I was using a network switch instead of a sword, but raising it above my head and shouting ‘By the power of Grayskull’ didn’t seem to give me any either. Plus, from what I read, attempting to pull those particular fuel lines may have damaged them, so we turned our attention to the fuel lines closer to the fuel tank, which actually had clamps on them.
It turns out that the fuel pump in a car is important, and that mine wasn’t living up to its name: it wasn’t pumping any fuel. As I was sitting by the car watching what would have been close to a full 68 litre tank’s worth of fuel (worth close to AU$100) leak out into a bucket (the fuel pump was in the fuel tank), I started to ponder whether some of the subtly different behaviour that my car had exhibited over the previous month or so may have actually been a sign that one day my fuel pump was going to surprise me by stopping the car in the middle of a road. Behaviour such as the engine occasionally sounding like it was going to stall while idling, for instance, was possibly a sign that it wasn’t quite getting enough fuel.
It is similar with I.T. devices (now I finally get to the point). Changes in behaviour should make users and administrators suspicious, provided the behaviour is not normal for that particular device (a lot of disk activity on a network file server, for instance, is generally not that suspicious). Suspicious types of behaviour include:
Disk light(s) flashing more often than normal/expected
This indicates increased disk activity and could mean that malware is scanning for files to infect; that it is actually infecting files; that it is scanning for documents/files to leak out over the network; or that it is destroying files/data.
Network light(s) flashing more often than normal/expected
This indicates increased network activity and could mean that malware is scanning for other network hosts to infect; that it is receiving a large malware payload/update from a (malicious) server; that it is attempting a denial-of-service attack on a target; that it is leaking a lot of information; or that the host itself is the victim of a denial-of-service attack (less likely, unless the firewall allows external connections to the host/device).
Having said that, it could also be due to the host downloading Windows updates. Be careful of this one: the download can involve a large amount of data and is done by a background process, so it isn’t always obvious that it is happening, yet it is a legitimate reason for a lot of network activity.
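A flashing light is a rough gauge, so it can help to put a number on network activity. The following is a minimal sketch, assuming a Linux host, where the kernel exposes per-interface byte counters in /proc/net/dev. The function names are my own for illustration, and a Windows host would need a different source for the counters.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 9:
            continue  # not a counter line we understand
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def bytes_per_second(before, after, seconds):
    """Average combined rx+tx rate between two samples of one interface."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = (after[0] - before[0]) + (after[1] - before[1])
    # A negative delta means the counters were reset (a reboot, for instance).
    return max(delta, 0) / seconds


def sample(path="/proc/net/dev"):
    """Read the current counters from the running system (Linux only)."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

Calling sample() twice, a known number of seconds apart, and passing one interface's two readings to bytes_per_second() gives a rate that can be compared with what is normal for that host.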
CPU/laptop fan louder than normal/expected (increased CPU usage)
This indicates a hotter CPU, which in turn indicates higher CPU usage. This generally only happens when running code that is CPU intensive, such as code to process video or perform large amounts of mathematical calculations.
However, badly written software which comes across an unexpected error can also cause this, by running in an infinite loop that constantly retries an operation which is quick to fail. An example of such an operation would be attempting to open a TCP connection when the network adapter has lost the network link to the switch. Such an operation fails pretty quickly: the network hardware no longer sees the electrical connection to/from the switch, so it knows the operation cannot succeed and can return an error immediately.
This is contrary to an operation such as attempting to open a TCP connection to a host which is down, or to which the connection in question is blocked by a firewall. In this case, the host initiating the connection will have to wait for a timeout period to expire to be (reasonably) sure that the remote host is not going to respond.
Granted, depending on network hardware and firewall configuration, either of these situations can also fail quickly. A router may know that a host is down because it has stopped responding to the router’s ARP requests. In this situation, the router can send back an ICMP host unreachable message in response to a packet addressed to the unavailable host.
Also, a firewall policy may say to ‘reject’, rather than to ‘drop’, a connection. In such a situation the firewall will usually send back either an ICMP destination unreachable message (a classic Cisco device response was, I believe, the ‘communication administratively prohibited’ code) or, in the case of a TCP connection, a TCP RST packet.
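Whichever way a connection attempt fails, the retry loop around it should pause between attempts, so that a quick failure does not turn into a CPU-hungry spin. Here is a minimal sketch of one common approach, exponential backoff. The function name and parameters are my own, and treating every OSError as retryable is a simplifying assumption.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Call operation() until it succeeds, pausing longer after each failure."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # give up and let the caller see the last error
            # Sleeping here is what stops a quick-to-fail operation
            # from being retried in a tight loop.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The operation could be something like lambda: socket.create_connection((host, port), timeout=10), with host and port supplied by the caller. The sleep parameter is only there so the pauses can be replaced in tests.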
I have seen some malware which uses a tight loop and which hence has the potential to increase CPU usage. A tight loop is a loop with few instructions in its body. It generally runs quickly because those few instructions typically don’t need much in the way of I/O and are hence quick to execute.
A good example of this was the SQL Slammer worm. It was small enough to fit inside a single UDP packet. UDP is connectionless, so no packets are needed to set anything up before sending data (unlike TCP, which requires a three-packet exchange of SYN, SYN+ACK, and ACK first; we’ll ignore TCP Fast Open). Consequently, SQL Slammer was able to just sit in a tight loop: randomly generate an IP address, spit out a UDP packet with itself inside it, rinse and repeat. As such it was very quick, and I suspect it soon highlighted network routing loops by completely flooding the links involved.
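To illustrate the point about UDP needing no set-up, here is a small, harmless sketch that sends one datagram to itself over the loopback interface. The function name is my own, and the only assumption is that loopback UDP is available on the machine running it.

```python
import socket


def send_one_datagram(payload, timeout=2.0):
    """Send payload over loopback UDP and return what the receiver got."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)
        # No connect() and no handshake: the first packet sent is the data.
        sender.sendto(payload, receiver.getsockname())
        data, _address = receiver.recvfrom(4096)
        return data
```

With TCP, by comparison, the sender would have to wait for the three-packet handshake to complete before its first byte of data could be delivered.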
Spontaneous reboots
If a Windows host (and I suspect a similar thing is possible with Linux) suddenly reboots for no clear reason, then that is obviously cause for concern. I’m not talking about a clean Windows shutdown and reboot, but rather the computer acting as if someone had pushed the reset button.
The operating system kernel should capture all processor exceptions generated by a user mode application, and handle them accordingly. The ‘… has encountered a problem and needs to close.’ message is an example of this. The kernel, together with the processor’s protected mode, should also prevent user mode code from transferring control to the processor start-up address of FFFF:0000 or otherwise causing an abnormal Windows shutdown/computer reboot.
This means that a computer spontaneously rebooting will normally indicate an unhandled problem in kernel code. Since most of the kernel code is generally stable and tested, the most likely culprit is code that handles external/unpredictable events, that is, events caused by external devices. The code that handles (or, in the case of spontaneous reboots, fails to handle) external events from devices is in the device drivers.
In the case of legitimate device drivers, spontaneous reboots and other issues are generally caused by race conditions or hardware interrupts occurring at unexpected times. For instance, an interrupt (most likely a timer interrupt given that these occur, or used to, 18.2 times a second) occurring in between modifying the ss and esp registers. An interrupt causes the processor to access the stack (to save the CPU flags and return address), so if only one of the ss and esp registers has been updated at the time we can expect some trouble to ensue, and since we don’t have a valid stack, a reboot is more than likely imminent. (x86 processors hold off interrupts for one instruction after ss is loaded for exactly this reason, so the risk arises when esp is not updated by the very next instruction.)
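Two of the numbers above, the FFFF:0000 start-up address and the 18.2 timer interrupts a second, follow from simple arithmetic. The sketch below works them out, assuming the classic PC conventions: a real-mode address is the segment times 16 plus the offset, and the legacy timer chip divides an input clock of roughly 1.193182 MHz by 65,536 by default. The names are my own.

```python
def real_mode_linear(segment, offset):
    """Linear address of a real-mode segment:offset pair (segment * 16 + offset)."""
    return (segment << 4) + offset


PIT_INPUT_HZ = 1_193_182      # legacy PC timer input clock, about 1.193182 MHz
PIT_DEFAULT_DIVISOR = 65_536  # default 16-bit divisor


def timer_ticks_per_second(input_hz=PIT_INPUT_HZ, divisor=PIT_DEFAULT_DIVISOR):
    """Interrupt rate produced by dividing the input clock by the divisor."""
    return input_hz / divisor


print(hex(real_mode_linear(0xFFFF, 0x0000)))  # 0xffff0
print(round(timer_ticks_per_second(), 1))     # 18.2
```

The same linear address is often written as F000:FFF0, which is just a different segment:offset pair for the same location, 16 bytes below the 1 MB mark.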
I was once asked to investigate my neighbour’s PC, as it kept spontaneously rebooting. It turned out that Windows was actually catching the exception, creating a mini dump file, and then rebooting, rather quickly. I explained that the reboot was possibly the result of a hardware failure, or something more exciting like some malware. At this point my neighbour explained that my idea of exciting obviously differed from theirs.
Examining the mini dump in WinDbg showed that there was a problem in lzx32.sys, which wasn’t showing up in the registry under Windows, but did show up in reglookup output under Linux. Registry entries not showing up under Windows, but showing up when viewing the registry files ‘offline’ (that is, on another system which isn’t running using those registry files) suggests the presence of a rootkit to hide the malware’s activity, and indeed, lzx32.sys was later identified as being the Rustock.B rootkit.
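The comparison described above boils down to a set difference: anything listed in the offline view but absent from the live view deserves a closer look. A minimal sketch follows, assuming the two listings have already been collected as lists of names. How they are collected is not shown, the function name is my own, and every name in the example other than lzx32 is purely illustrative.

```python
def hidden_entries(live_view, offline_view):
    """Names present in the offline view but missing from the live one."""
    # Registry key names are case-insensitive, so compare in lower case.
    live = {name.lower() for name in live_view}
    return sorted(name for name in set(offline_view) if name.lower() not in live)


live = ["Tcpip", "Dhcp"]               # what the running system reports
offline = ["tcpip", "Dhcp", "lzx32"]   # what the registry files contain
print(hidden_entries(live, offline))   # ['lzx32']
```

An empty result does not prove the host is clean. It only means this particular hiding technique was not spotted.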
Having said all that, there are some situations where a sudden increase in disk/network/CPU usage is to be expected. For instance, Windows hosts downloading Windows updates, Linux hosts running updatedb to update the database used by the locate command, downloading large files, processing video, and generating encryption keys.
The important thing is to know what is normal behaviour and what isn’t. This is probably true for just about anything, and knowing it can alert you to a number of potential issues including the presence of malware, the presence of hardware failures/software bugs, and the impending failure of a car’s fuel pump (or other components).
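Knowing what is normal can be made a little more concrete by recording a baseline and asking how far a new reading strays from it. The following is a minimal sketch, assuming the readings are plain numbers (disk operations per minute, bytes per second, CPU percentage, or whatever is being watched). The function name and the threshold of three standard deviations are arbitrary choices of mine, not a recommendation.

```python
from statistics import mean, stdev


def is_unusual(history, current, threshold=3.0):
    """True if current is more than `threshold` standard deviations from the baseline mean."""
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    centre = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a departure.
        return current != centre
    return abs(current - centre) / spread > threshold
```

A check like this only flags a reading for investigation. As with the Windows updates example, an unusual value may well have a perfectly legitimate cause.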