If you connect a computer to the open Internet without any kind of inbound traffic filter, it will typically start receiving attack traffic within minutes. Such is the scale of the automated scanning and attacks constantly happening on the Internet. The vast majority of this traffic is completely automated: bots scanning the Internet and throwing random payloads at whatever they find to see if anything interesting happens.
Of course, if you run a web server or any other kind of server, it needs to be connected to the Internet. Depending on your use case, you might be able to hide it behind something like a VPN so that only authorised users can reach it. If you want your server to be publicly accessible, however, it has to face the open Internet, and that means it is going to face attacks.
It might seem like you just have to accept this as par for the course. Thankfully, there are some things you can do.
Implement a honeypot
A honeypot is a tool designed to lure in attackers. It promises juicy information, or vulnerabilities that could lead to such information, but is in fact a trap deliberately set up to bait attackers. There are a few different varieties depending on what you want to achieve.
A high-interaction honeypot is an advanced, complex system that offers plenty to keep an attacker busy. These are mostly used for security research: they let the owner watch how an attacker behaves in real time, which can inform current or even future defences. The goal of a high-interaction honeypot is to keep the attacker occupied for as long as possible without giving the game away. The trade-off is that they are complex to set up and maintain.
A low-interaction honeypot is essentially a place-and-forget trap. They are typically simple and not designed for deep research or analysis. Instead, a low-interaction honeypot is meant to detect that someone is trying to do something they shouldn't, and to then simply block them outright. This sort of honeypot is easy to set up and implement, but can be prone to false positives if not carefully planned.
An example of a low-interaction honeypot
If you run a website, one piece of functionality you may be familiar with is robots.txt. robots.txt is a plain text file placed in the root directory of a web server. By convention, bots, especially search engine crawlers, know to check this file. You can configure it with a list of pages or directories that you do or don’t want a bot to crawl and index. Legitimate bots, such as search engine crawlers, will respect the instructions in this file.
The format is typically along the lines of “you can look at these pages, but don’t crawl anything else”. Sometimes, though, a website has many pages to allow and only a few it wants to keep out of the index. The site then takes a shortcut and says “don’t look at this, but you can crawl anything else”. Malicious bots and hackers will see “don’t look here” and do the exact opposite. So instead of keeping your admin login page out of Google, you’ve pointed attackers straight to it.
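For example, a robots.txt file that takes this shortcut might look like the following (the paths here are purely illustrative):

```
User-agent: *
Disallow: /admin/
Disallow: /backup/
```

A legitimate crawler skips those directories; an attacker reads the same two lines as a map of your most sensitive paths.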
Given that this behaviour is well known, it’s pretty easy to exploit. If you set up a honeypot page with a sensitive-looking name and then list it in robots.txt under “don’t look here”, many bots and hackers will head straight for it. It is then quite simple to log the IP address of anything that interacts with the honeypot in any way and block it outright.
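A minimal sketch of that log-and-block logic, stripped of any web framework (the honeypot path and status codes are illustrative assumptions, not a specific product’s API):

```python
# Paths listed in robots.txt as Disallow but linked from nowhere legitimate.
HONEYPOT_PATHS = {"/backup/"}

def handle_request(ip, path, blocklist):
    """Return an HTTP-style status code, banning any IP that touches the honeypot."""
    if ip in blocklist:
        return 403                 # already caught: refuse everything
    if path in HONEYPOT_PATHS:
        blocklist.add(ip)          # one touch is enough to ban this IP
        return 403
    return 200                     # normal traffic proceeds as usual
```

In a real deployment this check would sit in middleware or in front of the web server, and the blocklist would be persisted or fed to something like a firewall rule rather than held in memory.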
Avoiding false positives
In a good number of cases, this sort of hidden honeypot can automatically block traffic from IP addresses that would otherwise go on to attack. Care must be taken, though, to ensure that legitimate users of the site never end up at the honeypot. An automated system like this can’t tell the difference between an attacker and a legitimate user, so you need to make sure that no legitimate resources link to the honeypot at all.
You could include a comment in the robots.txt file indicating that the honeypot entry is a honeypot. This should dissuade legitimate users from trying to sate their curiosity. The downside is that it would also tip off hackers who are manually probing your site (and potentially rile them up), and some bots may even have systems in place to detect this sort of marker.
Another way to reduce false positives is to require more in-depth interaction with the honeypot. Instead of blocking anyone who merely loads the honeypot page, you block anyone who then interacts with it further. Again, the idea is to make it look legitimate while it actually leads nowhere. A login form at /admin is a good candidate, as long as it can’t actually log you into anything at all. Having it then “log in” to what looks like a real system, but is in fact just a deeper layer of the honeypot, would tip it over into high-interaction territory.
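The earlier log-and-block idea can be adjusted so that only submitting the decoy form triggers the ban (again, the path, method check, and status codes are illustrative assumptions):

```python
def handle_admin(ip, method, path, blocklist):
    """Serve the decoy form freely; ban only on an actual login attempt."""
    if ip in blocklist:
        return 403                 # already banned: refuse everything
    if path == "/admin" and method == "POST":
        blocklist.add(ip)          # submitting credentials to the fake form = attacker
        return 403
    return 200                     # merely loading the page doesn't ban anyone
```

A curious user who stumbles onto the page sees a login form and leaves; only someone who actually tries credentials against it gets blocked.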
Conclusion
A honeypot is a trap. It’s designed to look useful to a hacker while actually being useless. Basic systems simply block the IP address of anyone who interacts with the honeypot. More advanced setups lead the attacker on, potentially for a long period of time. The former is typically a security tool; the latter is more of a security research tool, as it can give insights into the attacker’s techniques. In either case, care must be taken to keep legitimate users away from the honeypot, since their interacting with it would either get them blocked or muddy the collected data. The honeypot should therefore be unrelated to any real functionality, yet locatable with some basic effort.
A honeypot can also be a separate device deployed on a network, entirely apart from any legitimate functionality. Again, it would be designed to look like it holds interesting or sensitive data to someone scanning the network, but no legitimate user should ever encounter it. Anyone who interacts with it is therefore worthy of review.
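As a minimal illustration, a network honeypot can be as simple as a TCP listener on a port no legitimate service uses: any connection at all is suspicious. A sketch using only the Python standard library (the port choice is an assumption; real network honeypots are usually dedicated appliances):

```python
import socket

def open_honeypot(host="0.0.0.0", port=2222):
    """Bind a listener on a port nothing legitimate uses (e.g. a fake SSH port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv

def log_one_hit(srv, hits):
    """Accept a single connection, record the peer IP for review, and drop it."""
    conn, (peer_ip, _) = srv.accept()
    hits.append(peer_ip)           # in practice: raise an alert or feed a blocklist
    conn.close()
```

Because no legitimate host has any reason to connect, every entry in the hit log is worth investigating.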
Did this help? Let us know!