What is Account Harvesting?

There are many different types of data breaches. Some involve huge amounts of time, planning, and effort on the attacker’s part. This can take the form of learning how a system works before crafting a convincing phishing message and sending it to an employee that has enough access to allow the attacker to steal sensitive details. This sort of attack can result in a huge amount of lost data. Source code and company data are common targets. Other targets include user data such as usernames, passwords, payment details, and PII such as social security numbers and phone numbers.

Some attacks aren’t anywhere near that complicated though. Admittedly, they also don’t have such a large impact on everyone affected. That doesn’t mean that they’re not an issue though. One example is called account harvesting, or account enumeration.

Contents

1 Account enumeration
2 Subtlety in account harvesting
3 The devil is in the details
4 Putting the details together
5 Effects of account harvesting
6 Conclusion

Account enumeration

Have you ever tried to sign into a website only for it to tell you that your password was wrong? That’s rather a specific error message, isn’t it? If you then deliberately make a typo in your username or email address, the website may tell you that an “account with that email doesn’t exist” or something to that effect. See the difference between those two error messages? Websites that do this are vulnerable to account enumeration or account harvesting. Simply put, by providing two different error messages for the two different scenarios, the website makes it possible to determine whether a username or email address has a valid account with the service.

There are many different ways this sort of issue can be identified. The above scenario of the two different error messages is fairly visible. It’s also easy to fix: simply provide a generic error message for both cases. Something like “The username or password you entered was incorrect”.
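
As a rough illustration of that fix, here is a minimal Python sketch. The `USERS` store, the `login_message` function, and the example account are all hypothetical, and passwords are kept in plaintext only to keep the sketch short; a real service would compare password hashes instead.

```python
from typing import Dict

GENERIC_LOGIN_ERROR = "The username or password you entered was incorrect"

# Hypothetical in-memory user store, for illustration only.
# Plaintext passwords keep the sketch short; never store them this way.
USERS: Dict[str, str] = {"alice@example.com": "correct horse battery staple"}


def login_message(username: str, password: str) -> str:
    """Return the message shown to the user after a login attempt."""
    stored = USERS.get(username)
    if stored is None or stored != password:
        # The same message covers "no such account" and "wrong password",
        # so the text alone does not reveal which one happened.
        return GENERIC_LOGIN_ERROR
    return "Welcome back!"
```

With this shape, `login_message("nobody@example.com", "x")` and `login_message("alice@example.com", "wrong")` return exactly the same text.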

Other ways that accounts can be harvested include password reset forms. Being able to recover your account if you forget your password is handy. A poorly secured website, though, might again provide two different messages depending on whether the username you requested a password reset for exists. Imagine: “Account does not exist” and “Password reset sent, check your email”. Again in this scenario, it’s possible to determine if an account exists by comparing the responses. The solution is the same too. Provide a generic response, something like: “A password reset email has been sent” even if there is no account to send it to.
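
The same idea can be sketched for a password reset form. Everything named here (`REGISTERED`, `OUTBOX`, `request_password_reset`) is a hypothetical stand-in, with a plain list taking the place of a real mail queue.

```python
from typing import List, Set

GENERIC_RESET_MESSAGE = "A password reset email has been sent"

# Hypothetical stand-ins for a user database and an outgoing mail queue.
REGISTERED: Set[str] = {"alice@example.com"}
OUTBOX: List[str] = []


def request_password_reset(email: str) -> str:
    """Queue a reset email if the account exists; always reply identically."""
    if email in REGISTERED:
        OUTBOX.append(email)
    # The reply does not depend on whether the account exists.
    return GENERIC_RESET_MESSAGE
```

Only the real account ends up in `OUTBOX`, but the person filling in the form sees the same message either way.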

Subtlety in account harvesting

Both of the above methods are somewhat loud in terms of their footprint. If an attacker tries to perform either attack at scale, it will show up quite easily in basically any logging system. The password reset method also explicitly sends an email to any account that does actually exist. Being loud isn’t the best idea if you’re trying to be sneaky.

Some websites allow direct user interaction or visibility. In this case, simply by browsing the website, you can gather the screen names of every account you run across. The screen name can often be the username. In many other cases, it can give a big hint as to what usernames to guess as people commonly use variations of their names in their email addresses. This type of account harvesting does interact with the service but is essentially indistinguishable from standard usage, and so is a lot more subtle.

A great way to be subtle is to never touch the website under attack at all. If an attacker were trying to gain access to an employee-only corporate website, they might be able to do exactly that. Rather than checking the site itself for user enumeration issues, they can go elsewhere. By trawling sites like Facebook, Twitter, and especially LinkedIn, it can be possible to build up a pretty good list of employees of a company. If the attacker can then determine the company’s email format, such as firstname.lastname@company.com, then they can in fact harvest a large number of accounts without ever connecting to the website they plan to attack with them.

Little can be done against either of these account harvesting techniques. They are less reliable than the first methods but can be used to inform more active methods of account enumeration.

The devil is in the details

A generic error message is generally the solution to preventing active account enumeration. Sometimes though, it’s the little details that give the game away. As part of the HTTP standard, web servers provide status codes when responding to requests. 200 is the status code for “OK”, meaning success, and 500 is an “internal server error”. A website should have a generic message indicating that a password reset was sent, even if it actually wasn’t because there was no account with the provided username or email address. In some cases, though, the server will still send the 500 error code, even if the website displays a success message. To an attacker paying attention to the details, this is enough to tell whether the account really exists.
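
One way this can go wrong, and one way to avoid it, can be sketched as follows. The sketch assumes a hypothetical `send_reset_email` helper that raises an exception for unknown accounts; left unhandled, that kind of failure is what typically surfaces as an error status code. The handler catches it so that the status code and body are identical in both cases.

```python
from typing import Set, Tuple

RESET_BODY = "A password reset email has been sent"

# Hypothetical user database, for illustration only.
REGISTERED: Set[str] = {"alice@example.com"}


def send_reset_email(email: str) -> None:
    """Hypothetical mailer that fails when the account does not exist."""
    if email not in REGISTERED:
        raise LookupError("no such account")
    # A real implementation would queue the email here.


def reset_response(email: str) -> Tuple[int, str]:
    """Return (status_code, body) for a password reset request."""
    try:
        send_reset_email(email)
    except LookupError:
        # Swallow the failure so it cannot leak out as an error status code.
        pass
    return 200, RESET_BODY
```

Comparing `reset_response` for a real and a made-up address is also a handy self-check: if the two results differ in any field, there is something for an attacker to notice.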

When it comes to usernames and passwords, even time can be a factor. A website needs to store your password, but to avoid leaking it in the event they are compromised, or have a rogue insider, the standard practice is to hash the password. A cryptographic hash is a one-way mathematical function that, if given the same input, always gives the same output, but if even a single character in the input changes, the entire output changes completely. By storing the output of the hash, then hashing the password you submit and comparing the result to the stored hash, it’s possible to verify that you submitted the correct password without the website ever storing the password itself.
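
The store-then-compare flow described above might look roughly like this in Python. The choice of PBKDF2 from the standard library, the per-user random salt, and the iteration count are all assumptions made for this sketch rather than details from the scenario above; the function names are hypothetical too.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Illustrative work factor only; pick a value based on current guidance.
ITERATIONS = 100_000


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a slow, one-way hash of the password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def new_record(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, hash) to store instead of the password itself."""
    salt = os.urandom(16)  # random salt, an addition assumed for this sketch
    return salt, hash_password(password, salt)


def verify_password(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    candidate = hash_password(password, salt)
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(candidate, stored_hash)
```

A typical use would be `salt, stored = new_record("hunter2")` at sign-up, then `verify_password(submitted, salt, stored)` at each login.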

Putting the details together

Good password hashing algorithms take some time to complete, typically less than a tenth of a second. This is enough to make it difficult to brute force but not so long as to be unwieldy when you only want to check a single value. It might be tempting for a website engineer to cut a corner and not bother to hash the password if the username doesn’t exist. After all, there’s no real point as there’s nothing to compare it to. The problem is time.

Web requests typically see a response in a few tens or even a hundred or so milliseconds. If the password hashing process takes 100 milliseconds to complete and the developer skips it… that can be noticeable. In this case, an authentication request for an account that doesn’t exist would get a response in roughly 50ms because of communication latency. An authentication request for a valid account with an invalid password might take roughly 150ms. This includes the communication latency as well as the 100ms while the server hashes the password. By simply checking how long it took for a response to come back, the attacker can determine fairly reliably whether an account exists.
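
A common way to close this gap is to do the hashing work even when the username is unknown, by comparing against a dummy record. The sketch below assumes PBKDF2 with a random salt and an illustrative iteration count; `USERS`, `DUMMY`, and `authenticate` are hypothetical names. It is not a guarantee of perfectly equal timing, only a way to avoid skipping the expensive step.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

ITERATIONS = 100_000  # illustrative work factor, an assumption of this sketch


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def _record(password: str) -> Tuple[bytes, bytes]:
    salt = os.urandom(16)
    return salt, _hash(password, salt)


# Hypothetical user store mapping username -> (salt, password hash).
USERS: Dict[str, Tuple[bytes, bytes]] = {"alice@example.com": _record("correct horse")}

# Dummy record used for unknown usernames so the hash is still computed.
DUMMY: Tuple[bytes, bytes] = _record("placeholder")


def authenticate(username: str, password: str) -> bool:
    """Check credentials, doing the same hashing work for unknown usernames."""
    salt, stored = USERS.get(username, DUMMY)
    matches = hmac.compare_digest(_hash(password, salt), stored)
    # A match against the dummy record must never count as a login.
    return matches and username in USERS
```

Both the "unknown account" path and the "wrong password" path now run one full hash, so the roughly 100ms difference described above should largely disappear.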

Detail-oriented enumeration opportunities like these two can be just as effective as the more obvious methods of harvesting valid user accounts.

Effects of account harvesting

On the face of it, being able to identify if an account exists or doesn’t exist on a site may not seem like too much of an issue. It’s not like the attacker was able to gain access to the account or anything. The problems tend to be a little wider in scope. Usernames tend to be email addresses, pseudonyms, or based on real names. A real name can easily be tied to an individual. Both email addresses and pseudonyms also tend to be reused by a single individual, allowing them to be tied to a specific person.

So, imagine if an attacker can determine that your email address has an account on a divorce solicitor’s website. What about on a website about niche political affiliations, or specific health conditions? That sort of thing could actually leak some sensitive information about you. Information that you may not want out there.

Furthermore, many people still reuse passwords across multiple websites. This is despite pretty much everyone being aware of the security advice to use unique passwords for everything. If your email address is involved in a big data breach, it’s possible that the hash of your password might be included in that breach. If an attacker is able to use brute force to guess your password from that data breach, they may try to use it elsewhere. At that point, an attacker would know your email address and a password that you might use. If they can enumerate accounts on a site that you do have an account on, they may try that password. If you’ve reused that password on that site, then the attacker can get into your account. This is why it’s recommended to use unique passwords for everything.

Conclusion

Account harvesting, also referred to as account enumeration, is a security issue. An account enumeration vulnerability allows an attacker to determine if an account exists or not. As an information disclosure vulnerability, its direct effect isn’t necessarily severe. The problem is that when combined with other information the situation can get a lot worse. This can result in sensitive or private details being tied to a specific person. It can also be used in combination with third-party data breaches to gain access to accounts.

There’s also no legitimate reason for a website to leak this information. If a user makes a mistake in either their username or password, they only have to check two things to see where they made the mistake. The risk caused by account enumeration vulnerabilities is much greater than the extremely minor benefit they can provide a user who made a typo in the username or password.