A honeynet is a nifty way of collecting malware from the Internet, and consists of a network of one or more honeypots together with the supporting network infrastructure. If this sounds like something that you’ve been dying to do, or you are simply interested in what I’m using to capture malware samples, then grab yourself a cuppa (after last night, I’d suggest something other than two cups of Chai Vanilla tea if you are planning on getting some sleep anytime soon) and let’s begin.
Honeynets consist of three main components:
- Network infrastructure
- Honeypot host(s)
- Management host(s)
This ‘Building a Honeynet on Linux’ series covers these components in three parts. It will take you through my process of building a honeynet environment on a Linux platform, using KVM to run the virtual hosts and (virtual/software) network bridges to create the virtual honeynet network.
The focus will be on the methods and configurations that I used to build my lab. However, it will also describe my understanding of honeynets for the benefit of any readers not familiar with them. And if I don’t start getting more sleep than I did last night, it might also be largely incomprehensible.
As I haven’t been keeping up to date with honeynet technologies, it won’t offend me if readers not familiar with honeynets want to read more definitive documentation from someone actively involved in honeynet research and/or development. The Honeynet Project is a good source of honeynet related documentation and projects.
There are two main concepts in the design of honeynets:
- Data Control
- Data Capture
Data control is all about controlling the flow of data into and out of the honeynet. As a honeynet’s main goal is to capture malicious and/or suspicious files, we obviously need to take certain precautions. Controlling both inbound and outbound traffic protects not only our own internal systems but also other Internet users, as we certainly don’t want our own systems being used to attack others.
Data capture is about capturing as much data as we can. Some of the most important data are the inbound network traffic, any outbound network traffic, log files containing timestamps (so that logs and other data can be correlated), artifact information (basically, changes) from the honeypot and, of course, any malicious/suspicious files. Basically, we want to know as much as we can about what happened.
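One common way to enforce data control on outbound traffic is to cap how many new connections the honeypot may open within a time window, so that a compromised host can make a few connections (which we get to observe) but cannot mass-scan or flood other people. The sketch below shows only that sliding-window policy; the class name and the example numbers (3 connections per hour) are assumptions for illustration, and in a real honeynet the enforcement would live in the gateway’s firewall rather than in a Python object.

```python
from __future__ import annotations

import time
from collections import deque


class OutboundLimiter:
    """Sliding-window cap on new outbound connections from a honeypot (hypothetical helper)."""

    def __init__(self, max_conns: int, window_seconds: float) -> None:
        self.max_conns = max_conns
        self.window_seconds = window_seconds
        self._allowed: deque[float] = deque()  # start times of permitted connections

    def allow(self, now: float | None = None) -> bool:
        """Return True if a new outbound connection may proceed at time `now`."""
        if now is None:
            now = time.monotonic()
        # Forget connections that have aged out of the window.
        while self._allowed and now - self._allowed[0] >= self.window_seconds:
            self._allowed.popleft()
        if len(self._allowed) < self.max_conns:
            self._allowed.append(now)
            return True
        return False  # over the cap: the caller should drop this connection
```

With `OutboundLimiter(max_conns=3, window_seconds=3600)`, the first three connections in an hour are allowed, the fourth is refused, and new connections are allowed again once the oldest ones fall outside the hour.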
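The point of keeping timestamps in every log is that the separate sources (network capture, honeypot logs, file artifacts) can later be merged into one timeline of the attack. A minimal sketch, assuming each source has already been parsed into `(timestamp, source, message)` tuples in time order; the parsing step, the function name and the example entries are all made up for illustration.

```python
from __future__ import annotations

import heapq
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# An event is (timestamp, source, message); each input log is already in time order.
Event = Tuple[datetime, str, str]


def correlate(*logs: Iterable[Event]) -> Iterator[Event]:
    """Merge several time-ordered logs into a single timeline."""
    return heapq.merge(*logs, key=lambda event: event[0])


if __name__ == "__main__":
    # Made-up example entries using documentation IP ranges.
    capture = [
        (datetime(2009, 1, 1, 12, 0, 0), "capture", "203.0.113.7 -> honeypot 445/tcp"),
        (datetime(2009, 1, 1, 12, 0, 5), "capture", "honeypot -> 198.51.100.20 80/tcp"),
    ]
    honeypot = [
        (datetime(2009, 1, 1, 12, 0, 2), "honeypot", "exploit attempt on emulated service"),
    ]
    for when, source, message in correlate(capture, honeypot):
        print(when.isoformat(), source, message)
```

`heapq.merge` never loads whole logs into memory, which matters once packet captures get large.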
A honeynet also needs a way of being restored to a known clean state after a potential infection. If a honeypot is using software to emulate vulnerable services, then this step is generally not required, as nothing other than what we intend to change should have changed on the honeypot host(s).
However, if a honeypot is using real vulnerabilities in an operating system installed in a virtual machine (or on a physical host for that matter), then it will be necessary to restore that operating system to a known clean state after each infection. This is typically achieved using snapshots (for virtual machines) or re-imaging software (for physical machines).
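For KVM guests managed through libvirt, the snapshot approach boils down to a couple of `virsh` commands: take a snapshot while the guest is known to be clean, then power it off and revert to that snapshot after each infection. The sketch below only builds (and, by default, prints) those commands; the domain name `honeypot1` and snapshot name `clean` are hypothetical, and it assumes the guest’s storage supports snapshots (e.g. qcow2).

```python
from __future__ import annotations

import subprocess

# Hypothetical names: adjust to your own libvirt domain and snapshot.
DOMAIN = "honeypot1"
CLEAN_SNAPSHOT = "clean"


def snapshot_cmd(domain: str, snapshot: str) -> list[str]:
    """virsh command that records the guest's current (known-clean) state."""
    return ["virsh", "snapshot-create-as", domain, snapshot]


def revert_cmds(domain: str, snapshot: str) -> list[list[str]]:
    """virsh commands that throw away the infected state and restore the clean one."""
    return [
        ["virsh", "destroy", domain],  # hard power-off; the guest's state is untrusted
        ["virsh", "snapshot-revert", domain, snapshot, "--running"],  # restore and boot
    ]


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # `virsh destroy` fails if the guest is already off, so don't treat that as fatal.
            subprocess.run(cmd, check=cmd[1] != "destroy")


if __name__ == "__main__":
    run(revert_cmds(DOMAIN, CLEAN_SNAPSHOT))  # dry run: only prints the commands
```

`destroy` is used rather than `shutdown` because a graceful shutdown asks the (possibly compromised) guest to cooperate.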
To be continued…
There’s more musing to be done, so don’t miss ‘Building a Honeynet on Linux: Network Infrastructure’.
How do you know when your honeypot has been infected by malware? How long after the event before you know?
Hi John,
Good question, and something which is quite important to know.
When I first started using honeypots, I did this manually by sitting and watching a network capture for telltale signs such as outbound network connections, in which case you know pretty much instantly.
Usually, after a successful infection, you will see connection attempts to lots of different IP addresses on the same destination port that the attack came in on. This is why it is generally not a good idea to allow your public web servers to make outbound connections on port 80/tcp, and a good idea to split your inbound and outbound email servers and stop your public inbound mail server from making outbound connections on port 25/tcp. Doing so restricts the ability of a compromised host to compromise other (remote) hosts, but I digress.
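The telltale sign described above (the honeypot suddenly connecting to many different addresses on the port it was attacked on) is easy to turn into a simple check. A minimal sketch, assuming the capture has already been reduced to `(src, dst, dport)` records; the `Flow` type, the honeypot address and the threshold of 10 distinct targets are illustrative assumptions, not tuned values.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Flow(NamedTuple):
    """One outbound connection attempt seen in the network capture."""
    src: str
    dst: str
    dport: int


def looks_infected(flows: Iterable[Flow], honeypot: str, attack_port: int,
                   threshold: int = 10) -> bool:
    """Heuristic: is the honeypot reaching out to many hosts on the port it was attacked on?"""
    targets = {f.dst for f in flows if f.src == honeypot and f.dport == attack_port}
    return len(targets) >= threshold
```

Counting distinct destinations rather than raw connections avoids flagging a single host that is simply being retried.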
That manual approach was annoyingly arduous, so these days I use software that emulates vulnerabilities and carries out what it believes the malware is trying to do. This emulation software (which will be covered in the part subtitled ‘Honeypot Host’) is configured to log to a SURFcert IDS database. I notice new malware downloads and attacks by occasionally checking the database using the SURFcert IDS web interface.
As my honeypot uses software to emulate the vulnerabilities, it doesn’t actually get infected (unless the emulation software itself is explicitly targeted), and hence I feel that this is sufficient.
However, if you are going to be running a live operating system (as opposed to emulating vulnerabilities) on your honeypot, then unless you’ve got nothing better to do, you will want to automate the processes of detection, archiving the malware sample and log files, shutting down the honeypot, cleaning up (reverting a virtual machine to a snapshot, or re-imaging a physical machine), and restarting the honeypot. Include some method of notification as part of the automation, and you will be notified close to instantaneously.
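The automated cycle described above can be sketched as a fixed sequence of steps. This is only a skeleton under the assumption that each step is supplied as a function of your own (for example, a traffic check for detection and `virsh` commands for cleaning); the class and field names are hypothetical and none of the steps are implemented here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class CleanupCycle:
    """Wires together the steps; each callable is a hypothetical hook you supply."""
    detect: Callable[[], bool]      # e.g. watch outbound traffic from the honeypot
    archive: Callable[[], None]     # copy the malware sample and log files off the honeypot
    shutdown: Callable[[], None]
    clean: Callable[[], None]       # revert a VM snapshot or re-image a physical machine
    restart: Callable[[], None]
    notify: Callable[[str], None]   # e-mail, instant message, etc.

    def run_once(self) -> bool:
        """Run one pass; return True if an infection was detected and cleaned up."""
        if not self.detect():
            return False
        self.notify("honeypot infection detected")
        self.archive()  # must happen before cleaning, or the evidence is lost
        self.shutdown()
        self.clean()
        self.restart()
        self.notify("honeypot restored to a clean state")
        return True
```

Calling `run_once()` from a loop with a short sleep between passes gives the near-instant notification mentioned above; the order of the steps matters mostly in that archiving has to come before cleaning.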
Having said that, if you are just starting out with honeypots then it can be interesting to take the manual approach of sitting and watching the network traffic, as it gives you the opportunity to go back afterwards and watch the attack unfold.
I started discussing ways in which this automation could be done, but that created a humongous comment reply, so I figured that it would be a good topic for a future blog entry.
I hope that this has answered your questions.
Musingly,
Karl.
Thanks, Karl. That’s an excellent answer. You obviously know your stuff! And yes, I would use your answer as a blog post in its own right!
Thanks.