In computer programming, values that need to be acted on or processed are stored in variables. A variable is simply a name and value pair. People use many naming schemes, and languages generally place few limitations on the actual names. The value of a variable can change over the course of the program.
In many modern programming languages, everything about a variable value can be changed, from a number to a text string to an array of strings and numbers. Many older programming languages don’t quite allow this much flexibility, requiring variables to be defined with a type, such as an integer variable for numbers or a string variable for text.
Languages that offer this kind of flexibility are typically high-level languages; Python is one example. They are called high-level because the compiler or interpreter does much of the heavy lifting with complicated things like memory management. These languages have a layer of abstraction from the machine code that helps keep writing code simple. This approach does have some limitations, though, so it is often still preferable to write software in older, low-level programming languages.
Note: Not all old languages are low-level, and not all high-level languages are modern. There has generally been a shift towards higher-level languages, though both remain popular.
Low-level programming languages such as C are much less flexible and require more effort to achieve the same functionality. One of the standard features in low-level languages is the ability to manage memory directly. And while this can be useful, it’s also been a common source of a class of security vulnerability known as a buffer overflow.
How to Fill a Buffer
A variable that temporarily holds data, either while other data is being processed or before it is written to a file, is often called a buffer. When declaring variables in C, you need to state explicitly how much memory to allocate to each one. This is relatively easy when you only use the variable for one thing, especially if you already know the value it will hold.
Setting aside the right amount of memory gets a little more complicated when you don’t have a known value to add straight away or if the content size could vary. Thankfully it is possible to adjust the memory allocation if needed, but a problem arises if that isn’t done.
For example, let’s imagine we have two variables, Name and Age. Name has been defined as a string, while Age is an integer. Let’s also suppose you’ve only ever known people with short names: Dave, Sean, and Mary. You only need four ASCII characters to store each of those names, but to be safe, you add an extra character and set the length of the Name variable to five characters. Thankfully, this also holds other names such as James, Barry, and Becky.
Name              | Age
B | e | c | k | y | 3 | 2
D | a | v | e |   | 4 | 4
In this scenario, both variables are stored right after each other in memory as this is an efficient use of limited memory space.
How to Overflow a Buffer
The problem arises when the unexpected happens and someone with a longer name shows up. Caitlyn is about to cause a bunch of trouble. When she enters her name, the software does exactly what it’s told to and stores it. By default, C doesn’t check whether a value fits in the memory allocated for it. It also doesn’t truncate the value if it runs out of space; it just writes it, no matter what.
Name              | Age
C | a | i | t | l | y | n
Now the written data has extended beyond the allocated buffer, overflowing it and overwriting Caitlyn’s age. This isn’t immediately a problem, as nothing has tried using the values yet. The difficulties start when you want to read them. Let’s try printing Caitlyn’s name to the screen: when the value of the name is read, it prints “Caitl.” It’s rude to cut off the name, but not a big issue. There’s a bigger problem when it comes to checking the age.
Let’s say the software checks that the person is old enough to buy something restricted to ages 18 and older. To do so, it might run a check like “age >= 18”. With a number, this operation is easy: is the age at least 18? Both Becky and Dave pass; others might be too young. Something different happens to Caitlyn, though. The software tries to check if “yn” is at least 18. As you might expect, the software throws an error because it can’t do that.
Error Severity
The sort of error in the example is relatively tame. It might even be handled by the software and allow you to edit the age to fix it. In a worse scenario, the error might not be tolerated. In this case, it will cause the whole program to crash.
The thing is, there’s no guarantee that something as simple as age is what gets overwritten. What if what follows Name is pathToFileToDelete or pathToOtherSoftwareToRun? In these examples, an overflow could fundamentally change what the software does. Now imagine a hacker who knows exactly how these attacks work and how to execute them reliably. They may be able to use them to delete sensitive user files or to run other, potentially malicious software.
This entire class of vulnerability has been a gold mine for high-severity security vulnerabilities. Most offer the ability to crash the program, but many can be much worse. Numerous examples of buffer overflow vulnerabilities enable code execution or privilege escalation.
Prevention of Buffer Overflows
Some techniques minimize the risks, such as adjusting the memory allocation size based on the data being added or simply checking whether the data fits within the intended memory allocation. Unfortunately, these steps are optional extras that add complication and increase development time. Sometimes they are skipped because “what’s the worst that could happen?” Sometimes they are ignored because the code was written 30 years ago, and “if it ain’t broke, don’t fix it.”
High-level languages largely avoid this issue because they abstract memory allocation away from the developer and manage it automatically.
Conclusion
A buffer overflow is a class of security vulnerability in which the data placed in a variable exceeds the memory area allocated for it. Instead of being truncated to fit, the data is written as is, overwriting whatever was in memory directly after the affected variable. In many cases, this causes memory corruption and software crashes due to uncaught errors. Some buffer overflow vulnerabilities can, however, be more dangerous. In the right (or wrong) circumstances and with careful execution, it can be possible to use a buffer overflow to change the software’s functionality, often to something malicious.
Buffer overflow vulnerabilities stem from poor memory allocation management. They occur almost exclusively in low-level languages that offer or require manual memory management. While these languages provide tools to prevent overflows, those tools have to be used explicitly, something that doesn’t always happen. High-level languages abstract memory management away from the developer, essentially preventing this class of vulnerability.