The basics of encryption algorithms are fairly easy to understand. An input or plaintext is taken along with a key and processed by the algorithm. The output is encrypted and known as the ciphertext. A critical part of an encryption algorithm though is that you can reverse the process. If you have a ciphertext and the decryption key, you can run the algorithm again and get the plaintext back. Some types of encryption algorithms require the same key to be used to both encrypt and decrypt data. Others require a pair of keys, one to encrypt, and another to decrypt.
The concept of a hashing algorithm is related but has a few critical differences. The most important difference is the fact that a hashing algorithm is a one-way function. You put plaintext into a hash function and get a hash digest out, but there is no way to turn that hash digest back into the original plaintext.
Note: The output of a hash function is known as a hash digest, not a ciphertext. The term hash digest is also commonly shortened to “hash”, though that can be ambiguous because “hash” is also used for the function itself and for the act of running it. For example, in an authentication process, you hash the submitted password and compare the resulting hash to the hash stored in the database.
Another key feature of a hash function is that the hash digest is always the same if you supply the same plaintext input. Additionally, if you make even a small change to the plaintext, the hash digest output is completely different. The combination of these two features makes hashing algorithms useful in cryptography. A common use is with passwords.
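Both properties are easy to see in practice. The short Python sketch below uses SHA-256 from the standard library purely as an example; the article does not name a specific algorithm, and the helper name sha256_hex is made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


first = sha256_hex("password1")
again = sha256_hex("password1")    # same input as before
changed = sha256_hex("password2")  # one character changed

# Count how many hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(first, changed))

print(first == again)    # True: same input, same digest
print(first == changed)  # False: a small change gives a different digest
print(differing, "of", len(first), "hex characters differ")
```

Running it should show that the two digests of the same input are identical, while a one-character change alters most of the digest.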
Password hashing algorithms
When you sign into a website, you provide it with your username and password. At the surface level, the website then checks that the details you entered match the details it has on file. The process isn’t quite that simple though.
Data breaches are relatively common, and it’s quite likely that you’ve already been affected by one. Customer data is one of the big targets in a data breach, because lists of usernames and passwords can be traded and sold. To make this more difficult for hackers, websites generally run every password through a hashing algorithm and store only the hash of the password rather than the password itself.
This works because when a user tries to authenticate, the website can also hash the submitted password and compare that to the stored hash. If they match, then it knows that the same password was submitted even if it doesn’t know what the actual password was. Additionally, if the database with the password hashes stored in it is breached by hackers, they can’t immediately see what the actual passwords are.
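The store-then-compare flow can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the function names are hypothetical, and the choice of PBKDF2 with a random per-user salt and 100,000 iterations is an assumption that goes beyond what the article describes.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and salt using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(password: str) -> tuple[bytes, bytes]:
    """Return the (salt, hash) pair a site would store instead of the password."""
    salt = os.urandom(16)  # random per-user salt (an assumption of this sketch)
    return salt, hash_password(password, salt)


def verify(password: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the submitted password and compare it to the stored hash."""
    candidate = hash_password(password, salt)
    # compare_digest compares in constant time to avoid leaking timing details.
    return hmac.compare_digest(candidate, stored_hash)


salt, stored = register("correct horse battery staple")
print(verify("correct horse battery staple", salt, stored))  # True
print(verify("password1", salt, stored))                     # False
```

Note that the site never needs to keep the password itself; it only needs the stored values and the ability to repeat the same calculation at sign-in.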
Strong hashes
If a hacker has access to password hashes, there’s not much they can do with them straight away. There’s no reverse function to decrypt the hashes and see the original passwords. Instead, they must try to crack the hash. This basically involves a brute-force process of making many password guesses and seeing if any of the hashes match the ones stored in the database.
There are two factors when it comes to the strength of a hash: the strength of the hashing function itself and the strength of the password that was hashed. Assuming a strong password and a strong hashing algorithm are used, a hacker would need to make roughly as many guesses as half of the entire hash output space to have about a 50/50 chance of cracking any single hash.
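To get a feel for how large that number is, here is a rough back-of-the-envelope calculation in Python. The 256-bit output size and the guessing rate of one trillion hashes per second are assumptions picked for illustration, not figures from any particular system.

```python
OUTPUT_BITS = 256                      # assumed digest size (e.g. SHA-256)
GUESSES_PER_SECOND = 10**12            # hypothetical attacker speed
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Half of the output space: 2^256 / 2 = 2^255 guesses.
half_space = 2 ** (OUTPUT_BITS - 1)

# Time needed to make that many guesses at the assumed rate.
years = half_space / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

print(f"Guesses needed: about {half_space:.2e}")
print(f"Time at the assumed rate: about {years:.2e} years")
```

Even at that generous rate, the result is vastly longer than any practical timescale, which is why attackers focus on weak algorithms and weak passwords instead.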
The amount of processing needed can be dramatically reduced if the hashing algorithm has weaknesses that either leak information about the input or make it more likely that two different inputs produce the same hash, known as a collision.
Brute-force attacks can be slow, as there are a huge number of possible passwords to try. Unfortunately, people tend to be quite predictable when coming up with passwords. This means that educated guesses can be made using lists of commonly used passwords. If you choose a weak password, it may be guessed long before a hacker has worked through anything close to half of the hash output space.
This is why it is important to use strong passwords. If the hash of your password is involved in a data breach, it doesn’t matter whether the website used the best, most secure hashing algorithm available. If your password is “password1”, it will still be guessed nearly instantly.
Conclusion
A hashing algorithm is a one-way function. It always produces the same output if provided with the same input. Even minor differences in input significantly change the output, meaning you can’t tell if you were close to the right input. Hash functions can’t be reversed. There’s no way to tell what input was used to generate any given output without just guessing. A cryptographic hash function is one designed to hold up against deliberate attack, making it suitable for uses that need that sort of security. A common use case is to hash passwords. Another is hashing files to verify their integrity.