Historically, all computer programs were written in an utterly sequential fashion. This is simple to read, write, and understand. It’s also simple for a computer to execute and requires relatively simple hardware. With this design paradigm, the only two ways to increase system performance are to write more efficient code and increase CPU speed. Increasing code efficiency may be possible, but it is generally a complex process with often limited results.
For decades, performance could be increased simply by waiting for new, faster CPUs. Moore’s law observes that the number of transistors on a chip doubles roughly every two years, and for a long time CPU performance rose at a similar pace. Unfortunately, most of these performance gains came from using ever-smaller manufacturing nodes. Modern technology has been struggling to decrease node size at the historical rate because of the material difficulties of working at the scale of nanometers.
To get around this, modern CPU architects have opted to add multiple processor cores to CPUs. Each processor core can act independently on a different task. While the cores can’t automatically combine their efforts on a single sequential task, they can work on two tasks simultaneously. This fundamental architectural change provides lots of extra performance, but it doesn’t directly benefit individual processes, though it does reduce contention for processor time.
To take advantage of multi-core CPUs, code must be written in a multi-threaded fashion. Each thread can then be run concurrently, scaling the performance benefit by the number of available threads and CPU cores. Doing this, though, runs into a new challenge, the “race condition.”
Note: Some tasks can’t be multi-threaded, while others can be massively multi-threaded. The possible performance benefits do rely on the work being done.
Multi-threaded software can take advantage of multiple cores. Dangers are lurking in those waters, ready to trap the inexperienced programmer. A race condition can occur when two different threads interact with the same bit of memory.
A simple example could be two threads trying to check and increment a variable simultaneously. Let’s say that a=0. Two different threads then perform their functions and, at some point, check a and increment it by one. Generally, you’d expect the result of two threads adding one to zero to be two. Most of the time, this should be the case. You can get a different result if both threads go through that specific functionality at precisely the right time.
In this case, the first thread reads the value of a. Before the first thread can increment the value of a, though, the second thread reads it as well. The first thread adds one to zero and writes back 1, but the second thread still believes the value to be zero, so it also adds one to zero and writes back 1. The result is that the final value of a is 1, not 2.
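The interleaving described above can be replayed by hand. The following is a minimal Python sketch that does not start real threads; it simply performs the reads and writes in the unlucky order, so the outcome is deterministic. The names thread_1_view and thread_2_view are made up for this illustration.

```python
# Shared variable, as in the example above.
a = 0

# Step 1: both "threads" read a before either has written back.
thread_1_view = a  # the first thread sees 0
thread_2_view = a  # the second thread also sees 0

# Step 2: each thread adds one to the value it read and stores the result.
a = thread_1_view + 1  # the first thread writes 1
a = thread_2_view + 1  # the second thread overwrites it with 1 again

print(a)  # prints 1, not 2: one increment has been lost
```

In a real program, the same ordering only happens occasionally, which is what makes race conditions so hard to reproduce and debug.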
Racing to the Worst-Case Scenario
While the example above might not sound particularly bad, it can have dramatic effects. What if the value of a selects the mode of operation of a machine? What if specific modes of operation of that machine can be dangerous or even life-threatening?
Race conditions also don’t need to be that simple. For example, it can be possible for one thread to read a memory section at the same time that another thread is writing to it. In this case, the reading thread may get a weird mix of the data from both before and after. Let’s say that the check is a simple true/false check.
If the variable said true at the start of the read but was in the process of being overwritten to the word false, the result of the read operation might be something like “trlse.” This isn’t “true” or “false.” Not being either of the two options in a binary choice would almost certainly result in the application crashing. This memory corruption can lead to many security issues, such as denial of service and privilege escalation.
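The “trlse” result can be reproduced with a small Python sketch. It again replays the timing by hand rather than using real threads, and it assumes the value is stored as text in a five-byte buffer (a detail invented for this illustration): the reader copies the first two bytes, the writer then overwrites the whole buffer, and the reader finishes copying.

```python
# Five-byte buffer holding "true" plus one padding space.
buffer = bytearray(b"true ")

# The reader starts copying and gets the first two bytes of the old value.
reader_copy = bytearray()
reader_copy += buffer[0:2]  # b"tr"

# Before the reader finishes, the writer replaces the whole buffer.
buffer[0:5] = b"false"

# The reader resumes and copies the remaining bytes, now from the new value.
reader_copy += buffer[2:5]  # b"lse"

print(reader_copy.decode())  # prints trlse
```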
Locking out the Race
Knowing which pieces of memory in a program are shared between different threads is essential to preventing a race condition. Nothing needs to be done if a variable is only ever controlled and accessible by a single thread. If two or more threads can access a variable, then you must ensure that all operations on that piece of memory are completed independently of one another, with each one finishing before the next begins.
This independence is achieved with a lock. When writing a function that operates on a shared piece of memory, you acquire a lock before touching that memory and release it afterward. While the lock is held, any other thread that tries to take it is blocked from accessing that piece of memory until the lock is released.
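As a minimal sketch of the idea, here is how the shared counter from earlier could be protected using the Lock class from Python’s standard threading module. The SafeCounter class and the worker function are hypothetical names chosen for this example.

```python
import threading


class SafeCounter:
    """A counter whose increments are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can be inside this block, so the
        # read, add, and write steps can no longer interleave.
        with self._lock:
            self.value += 1


def worker(counter, times):
    for _ in range(times):
        counter.increment()


counter = SafeCounter()
threads = [
    threading.Thread(target=worker, args=(counter, 10_000)) for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.value)  # prints 40000
```

The lock only helps if every piece of code that touches the shared value takes it first. A thread that modifies the value without acquiring the lock brings the race condition straight back.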
The lock isn’t the most elegant of solutions. For one thing, it has memory overheads. It also can force a thread to hang, waiting for a lock to be released. Depending on the situation, the lock may not be released for a very long time or may not be released at all. In a worst-case scenario, unlocking a lock could depend on something happening in another blocked thread, leading to a deadlock.
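One way to limit the damage from a lock that is never released is to put an upper bound on how long a thread will wait. The sketch below uses the timeout parameter of Lock.acquire from Python’s threading module; the main thread stands in for a lock holder that never lets go, and the 0.1 second limit and the function name are arbitrary choices for this illustration.

```python
import threading

shared_lock = threading.Lock()
results = []


def try_to_work():
    # Wait at most 0.1 seconds for the lock instead of blocking forever.
    acquired = shared_lock.acquire(timeout=0.1)
    if acquired:
        try:
            results.append("did the work")
        finally:
            shared_lock.release()
    else:
        results.append("gave up waiting")


# The main thread holds the lock for the worker's whole lifetime,
# standing in for a lock that is never released.
with shared_lock:
    worker = threading.Thread(target=try_to_work)
    worker.start()
    worker.join()

print(results)  # prints ['gave up waiting']
```

A timeout doesn’t fix the underlying design problem, but it turns a silent hang into a condition the program can detect and handle.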
It’s essential to optimize the use of locks. You can control how granular the lock is. For example, if you’re editing data in a table, you could lock the entire table or lock just the edited row. Locking the whole table would be a coarse granularity lock. It minimizes the overhead from implementing too many locks but increases the chance that another thread gets blocked by the lock. Locking just the row would be a fine granularity lock. This is much less likely to interfere with other threads, but it means more locks will be needed, increasing the total overhead.
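The two granularities can be sketched side by side in Python. The table here is just a list of numbers, and CoarseTable, FineTable, hammer, and run are hypothetical names for this example. Both versions give the same final result; the difference is how many locks exist and how often threads editing different rows get in each other’s way.

```python
import threading


class CoarseTable:
    """One lock guards every row (coarse granularity)."""

    def __init__(self, row_count):
        self.rows = [0] * row_count
        self._table_lock = threading.Lock()

    def add_to_row(self, index, amount):
        # Any edit blocks all other edits, even to unrelated rows.
        with self._table_lock:
            self.rows[index] += amount


class FineTable:
    """Each row has its own lock (fine granularity)."""

    def __init__(self, row_count):
        self.rows = [0] * row_count
        self._row_locks = [threading.Lock() for _ in range(row_count)]

    def add_to_row(self, index, amount):
        # Only edits to the same row block each other.
        with self._row_locks[index]:
            self.rows[index] += amount


def hammer(table, index, times):
    for _ in range(times):
        table.add_to_row(index, 1)


def run(table):
    # Four threads: two edit row 0 and two edit row 1.
    threads = [
        threading.Thread(target=hammer, args=(table, i % 2, 5_000))
        for i in range(4)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return table.rows


coarse_rows = run(CoarseTable(2))
fine_rows = run(FineTable(2))
print(coarse_rows, fine_rows)  # prints [10000, 10000] [10000, 10000]
```

With two rows, the fine-grained table needs two locks instead of one. With a million rows it would need a million, which is the overhead trade-off described above.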
A memory lock is a code tool that is used to ensure the atomicity of in-memory operations in a multi-threaded environment. By locking a piece of memory before operating on it, you can be sure that no unexpected behavior can occur because of a race condition. Memory locks come with a memory overhead and can also cause blocking.
Blocking is where another thread attempts to operate on a locked piece of memory. The thread sits there, blocked until the lock is released. This can cause issues if releasing the lock requires another thread to do something, as that thread may itself become blocked before it can complete the prerequisite for releasing the lock. Memory locks can be avoided by writing non-blocking code. Doing so, however, can be complex and less performant than utilizing locks.