In the early days of computing, CPUs were purely sequential machines. This kept designs simple, but it also limited performance. Many processes need to request data from system RAM or the hard drive. While system RAM is fast, it’s still not as fast as the CPU, so the CPU sits idle until the response comes back from the RAM. The situation is even worse for data requested from the hard drive, a storage device much slower than RAM. Here the CPU can be idle for significant periods, waiting for a response. Unfortunately, with sequential processors, this issue is simply unavoidable.
Thankfully, modern CPUs are no longer purely sequential. They offer many advanced features, such as out-of-order execution and multi-threading. Out-of-order execution allows the CPU to analyze upcoming instructions and reorder them to maximize efficiency. Multi-threading allows the CPU to have many different threads or processes in progress at once.
Outside of having multiple cores, the CPU can’t run more than one thread at a time. It can, however, make it look as though it does by switching between threads regularly so that each one gets a fair share of CPU time. The process of switching between threads is called a context switch.
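To make the idea of sharing one core concrete, here is a minimal sketch of round-robin time slicing. It is only an illustration: the thread names, the amounts of work, and the `round_robin` function are made up for this example, and real schedulers are considerably more sophisticated.

```python
from collections import deque


def round_robin(work, time_slice):
    """Simulate one core sharing its time between several threads.

    `work` maps a hypothetical thread name to the time units it still needs.
    Returns the run order as (name, units_run) pairs and the number of
    context switches that were needed.
    """
    ready = deque(work.items())
    timeline = []
    switches = 0
    while ready:
        name, remaining = ready.popleft()
        ran = min(time_slice, remaining)
        # A context switch happens whenever the core moves to a different thread.
        if timeline and timeline[-1][0] != name:
            switches += 1
        timeline.append((name, ran))
        if remaining > ran:
            # Not finished yet: go to the back of the queue and wait for another turn.
            ready.append((name, remaining - ran))
    return timeline, switches


timeline, switches = round_robin({"A": 3, "B": 5, "C": 2}, time_slice=2)
print(timeline)
print("context switches:", switches)
```

Each thread makes steady progress even though only one of them is ever running at any given moment. A shorter time slice makes the sharing look smoother but increases the number of context switches.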
How Does a Context Switch Work?
A context switch consists of two parts: switching out the previous thread and switching in the new one. To switch out the old thread, the CPU must save its current state to a Process Control Block or switch frame. This includes the values of any relevant CPU registers and always includes the value of the program counter. Once the thread’s state has been stored, a handle to it can be added to a ready queue so that it can be restored when needed.
Switching in the next thread is the same process in reverse. A thread is selected either from the ready queue, depending on weighting, or by an interrupt indicating that an event the thread was waiting on is now ready or complete. The saved data for the thread is then copied back into the correct registers, and the thread is restored. At this point, the new thread is ready to continue operation from where it stopped.
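The two halves of a context switch can be sketched in a few lines. This is a toy model rather than how any particular operating system does it: the `PCB` and `CPU` classes, the register names, and the `switch_out` and `switch_in` functions are all invented for illustration, and the next thread is picked in plain first-in, first-out order instead of by weighting or interrupt.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PCB:
    """Hypothetical Process Control Block: the saved state of one thread."""
    tid: int
    program_counter: int = 0
    registers: dict = field(default_factory=dict)


class CPU:
    """Toy CPU with a program counter and two general-purpose registers."""
    def __init__(self):
        self.program_counter = 0
        self.registers = {"r0": 0, "r1": 0}


def switch_out(cpu, pcb, ready_queue):
    # Save the running thread's state, then queue it so it can be resumed later.
    pcb.program_counter = cpu.program_counter
    pcb.registers = dict(cpu.registers)
    ready_queue.append(pcb)


def switch_in(cpu, ready_queue):
    # Pick the next thread (plain FIFO here) and copy its state back into the CPU.
    pcb = ready_queue.popleft()
    cpu.program_counter = pcb.program_counter
    cpu.registers = dict(pcb.registers)
    return pcb


cpu = CPU()
ready_queue = deque([PCB(tid=2, program_counter=400, registers={"r0": 7, "r1": 9})])
running = PCB(tid=1)

# Thread 1 runs for a while and changes the CPU state.
cpu.program_counter = 104
cpu.registers["r0"] = 42

switch_out(cpu, running, ready_queue)
running = switch_in(cpu, ready_queue)   # thread 2 resumes from its saved state
print(running.tid, cpu.program_counter, cpu.registers)

switch_out(cpu, running, ready_queue)
running = switch_in(cpu, ready_queue)   # thread 1 picks up exactly where it left off
print(running.tid, cpu.program_counter, cpu.registers)
```

Because the program counter and registers are restored exactly as they were saved, the thread itself can’t tell that it was ever paused.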
Reading and writing data when switching a thread in or out takes some time, though not much, as the memory used is typically high-speed. There are, however, further performance costs. When switching threads, the data left in the CPU caches and buffers by the previous thread may not be relevant to the new thread. This can lead to a significant increase in TLB (Translation Lookaside Buffer) and cache misses.
This effect isn’t significant if the two threads were spawned by the same process, as they share an address space and are likely to have much of their memory in common. When switching between threads from different processes, however, the TLB generally must be flushed entirely. This leads to a 100% TLB miss rate immediately after the switch, while the hit rate of the CPU cache is also significantly reduced.
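The difference between the two cases can be shown with a toy model of a TLB. This sketch makes several simplifying assumptions that are not from any real hardware: the `TLB` class is hypothetical, the page mappings are made up, and the second thread is assumed to touch the same eight pages as the first when both belong to the same process.

```python
class TLB:
    """Toy TLB: caches virtual-page to frame translations for one address space."""
    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, page, page_table):
        if page in self.entries:
            self.hits += 1
        else:
            self.misses += 1          # a real CPU would walk the page table here
            self.entries[page] = page_table[page]
        return self.entries[page]

    def flush(self):
        self.entries.clear()


def miss_rate_after_switch(same_process):
    page_table = {page: page + 100 for page in range(8)}   # made-up mappings
    tlb = TLB()
    for page in range(8):             # the first thread warms up the TLB
        tlb.translate(page, page_table)
    if not same_process:
        tlb.flush()                   # new address space: old entries are invalid
        page_table = {page: page + 200 for page in range(8)}
    tlb.hits = tlb.misses = 0         # only count lookups made after the switch
    for page in range(8):             # the second thread's first pass over its pages
        tlb.translate(page, page_table)
    return tlb.misses / (tlb.hits + tlb.misses)


print("same process:     ", miss_rate_after_switch(same_process=True))
print("different process:", miss_rate_after_switch(same_process=False))
```

In this model the thread from the same process finds every translation already cached, while the thread from a different process misses on every first access and has to rebuild the TLB contents from scratch.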
While some CPUs offer hardware support for context switching, operating systems typically don’t use it. Hardware context switching lacks awareness of which data is relevant. Therefore, it stores and restores every register it covers, whether or not the thread needs it, increasing the time taken and the storage space required.
Additionally, hardware context switching doesn’t store the data from floating point registers, which a thread may well need. Software context switching is, therefore, generally used. It can keep the data from all registers, including floating point registers. Software context switches also understand which data is relevant, so they can pick and choose which registers to store as needed.
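The "pick and choose" behavior of a software context switch might look roughly like the sketch below. It is an assumption-laden illustration: the register groups, the `save_context` function, and the `used_fpu` flag are all hypothetical, and real kernels track floating point usage with their own mechanisms.

```python
def save_context(cpu_state, used_fpu):
    """Return copies of only the register groups worth saving for this thread."""
    saved = {"general": dict(cpu_state["general"])}   # always needed
    if used_fpu:
        # Only pay for the floating point registers if the thread touched them.
        saved["float"] = dict(cpu_state["float"])
    return saved


cpu_state = {
    "general": {"pc": 104, "sp": 8000, "r0": 42},
    "float": {"f0": 1.5, "f1": 2.25},
}

integer_only = save_context(cpu_state, used_fpu=False)
numeric = save_context(cpu_state, used_fpu=True)
print(sorted(integer_only))
print(sorted(numeric))
```

A thread that never used floating point ends up with a smaller saved state and a quicker switch, while a thread that did use it still has everything preserved.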
A context switch is the process by which a modern CPU changes which thread it is running. The process involves storing the current thread’s relevant data and restoring the new thread’s. Context switching comes with a performance cost: the time needed to perform the switch, plus an increased rate of cache and TLB misses, as the contents of the caches and TLB are not saved with the thread. Context switches happen either to ensure that all threads get a fair share of CPU time or because of an interrupt indicating that an event a thread was waiting on is complete.