Modern computers are highly complex machines, and no part is more complicated than the CPU, which often sits at the cutting edge of what can be manufactured in bulk. A modern CPU actively reorders the instructions it receives, has multiple cores, deep pipelines, multiple pipelines per core, a high success rate at predicting which branch to execute speculatively, and, through register renaming, the ability to use more physical registers than the instruction set exposes. Gone are the sequential days when each instruction was processed to completion before the next one started.
This complexity requires a lot of management to keep everything working as intended. That management is performed by the control unit, which is responsible for ensuring that all instructions get executed.
Memory and Interrupts
One of the core functions of a control unit is to ensure that each instruction has the data it needs available. Thanks to high-speed cache memory, this often doesn’t take too long. Data in the CPU cache is usually available within a few CPU cycles for the fastest tier, and somewhat longer for the larger, slower tiers. If the data is not in the cache, it must be retrieved from main memory, which takes much longer. During this time, the control unit could stall the pipeline and wait for the data to arrive. This can, however, cost hundreds of CPU cycles in which many other instructions could have been completed.
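The effect of cache misses on the average wait can be sketched with the standard average-access-time formula: hit time plus miss rate times miss penalty. The cycle counts below (a 4-cycle cache hit and a 200-cycle trip to main memory) are assumed round numbers chosen for illustration, not measurements of any particular CPU, and the function name is hypothetical.

```python
def average_access_cycles(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per memory access: hit time + miss rate * miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Assumed, illustrative latencies: 4 cycles for a cache hit,
# 200 extra cycles when the data has to come from main memory.
for rate in (0.99, 0.95, 0.80):
    cycles = average_access_cycles(rate, 4, 200)
    print(f"hit rate {rate:.0%}: {cycles:.1f} cycles on average")
```

Even under these rough assumptions, a small drop in hit rate raises the average wait noticeably, which is why stalling on every miss is so costly.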
To get around this, a task facing a long wait, such as for I/O, will generally be side-lined through a context switch. This allows other work that is ready to run to be completed while that task waits for the data it needs. When the data is available, the task needs to be resumed. This happens through the use of an interrupt. An interrupt pauses the current instruction stream so that the waiting task, with its data now ready, can be brought back in for execution.
Most modern CPUs are now out-of-order processors. Rather than executing instructions strictly in program order, they hold a window of upcoming instructions in a reorder buffer. The reorder buffer allows the CPU scheduler, a part of the control unit, to see the upcoming instructions and reorder them for improved efficiency. For example, an instruction that needs to load data from memory may be moved up the queue so that the request to main memory starts sooner.
Out-of-order processing is highly complex. The control unit must keep track of not only the order in which memory events should have happened but also the order in which they actually did. That way, it can ensure that the writeback stage at the end of the pipeline commits results in the correct program order, even if they were not computed in that order. This also allows a processor to prioritize specific tasks over others, which can be helpful in foreground vs. background task management, where responsiveness is more important for the foreground task.
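The idea of finishing work out of order while still committing it in program order can be illustrated with a toy model. This is a minimal sketch, not a description of how any real CPU is built: the function name, the instruction labels, and the completion timing are all hypothetical, and real reorder buffers track far more state.

```python
from collections import deque


def retire_in_order(program, completion_order):
    """Return the order in which instructions retire (commit their results).

    program: instruction names in program order.
    completion_order: the order in which they finish executing.
    """
    rob = deque(program)  # toy reorder buffer, oldest instruction on the left
    finished = set()
    retired = []
    for name in completion_order:
        finished.add(name)  # result is ready, but not yet committed
        # Commit from the head only while the oldest instruction has finished.
        while rob and rob[0] in finished:
            retired.append(rob.popleft())
    return retired


program = ["load", "add", "mul", "store"]
# Hypothetical timing: the load misses the cache, so "add" and "mul" finish first.
completion = ["add", "mul", "load", "store"]
print(retire_in_order(program, completion))
```

In this model, "add" and "mul" finish early but wait in the buffer until the older "load" completes, so the retired sequence always matches the program order.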
Branch Prediction and Speculative Execution
One of the significant issues with pipelined computers comes from conditional branches. A branch is a point in the code, such as an “IF” statement, where execution can go one of two ways: down one path if the condition is true, down the other if it is false. The problem is that the CPU only determines which path to take at the execution stage. In a pipelined CPU, though, the next instruction, and potentially many others, should already have been loaded and moved through much of the pipeline by that point.
So, what do you do? You could stall the pipeline whenever a branch is encountered. This would always lead to performance loss, though. Another option would be to start executing both outcomes and then drop the wrong one. This approach would roughly halve the performance loss, but a loss would remain because half of the work is always thrown away. The final option is to perform branch prediction and speculatively execute the predicted branch. If you guess right, you lose no performance, but if you guess wrong, you have to flush the pipeline, losing a bit more performance than you would have by simply stalling.
Performance relating to branch prediction and speculative execution depends on the success rate of the prediction algorithm. Modern algorithms, managed by the control unit, can have hit rates in the high 90% range, so they are pretty accurate. The cost of the remaining mispredictions can be partly reduced with out-of-order execution. By resolving a branch condition as early as possible, you can minimize the need to predict the branch at all.
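The trade-off between always stalling and predicting can be put into a rough expected-cost model. The cycle counts here (10 cycles to resolve a branch, 15 cycles to flush after a wrong guess) are assumptions picked only to make the arithmetic concrete, and the function names are hypothetical.

```python
def stall_cost(resolve_cycles):
    """Cycles lost per branch when the pipeline always waits for the outcome."""
    return float(resolve_cycles)


def predict_cost(accuracy, flush_penalty_cycles):
    """Expected cycles lost per branch when predicting: only wrong guesses cost."""
    return (1.0 - accuracy) * flush_penalty_cycles


# Assumed numbers: a stall costs 10 cycles, a misprediction flush costs 15.
for accuracy in (0.50, 0.90, 0.97):
    print(
        f"accuracy {accuracy:.0%}: "
        f"stall {stall_cost(10):.2f} vs predict {predict_cost(accuracy, 15):.2f} cycles"
    )
```

Under these assumptions, prediction beats stalling whenever the accuracy is above one third, since a flush costs more than a stall but is paid only on a wrong guess.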
A control unit is the part of a CPU that manages the flow of instructions in various ways. It ensures that each instruction has the data it needs available. If an instruction needs to be delayed, it manages the interrupt process that resumes that instruction. It manages the instruction reordering needed for out-of-order execution and performs branch prediction. Without a highly complex control unit, CPUs would still be limited to slow sequential processing, though even that would require a control unit, just a less complicated one.