Computers are programmed with programming languages. These languages are generally human-readable and let the programmer specify what the computer does. The code then needs to be compiled into machine instructions. The exact details vary depending on the Instruction Set Architecture, or ISA, that the target computer uses. This is why there are often separate download links for x86 CPUs from Intel and AMD and for ARM CPUs as used in modern Apple devices. The x86 and ARM ISAs are different, so software must be compiled separately for each. As Apple has shown, it is possible to build a translation layer that runs software compiled for one ISA on another; it’s just not common to do so.
You might think that the CPU simply takes the instructions it is given and executes them in order. In practice, modern CPUs use many tricks, including out-of-order execution, which lets the CPU reorder work on the fly to optimize performance. One clever trick that is pretty well hidden is the micro-operation.
The Pipeline to Micro-Operation
Individual instructions in machine code can be called instructions or operations; the terms are interchangeable. One of the difficulties with Complex Instruction Set Computing, or CISC, architectures like x86 is that instructions vary in length, that is, in how many bytes they take to represent. In x86, an instruction can be as short as one byte or as long as 15. Compare this to the RISC architecture used by modern 64-bit ARM CPUs, which has fixed-length 4-byte instructions.
Tip: RISC stands for Reduced Instruction Set Computing.
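To illustrate why instruction length matters, here is a minimal sketch of finding where each instruction starts. The variable-length encoding is a made-up toy in which the first byte of every instruction stores its total length; real x86 has no such length byte and its decoder has to work the length out from prefixes, opcodes, and operand bytes. The function names are hypothetical.

```python
def fixed_boundaries(code: bytes, width: int = 4) -> list[int]:
    """Start offsets for a fixed-width encoding.

    Every boundary is known up front, without reading a single byte.
    """
    return list(range(0, len(code), width))


def variable_boundaries(code: bytes) -> list[int]:
    """Start offsets for a toy variable-length encoding (assumption:
    the first byte of each instruction is its total length, 1 to 15)."""
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        length = code[pos]
        if not 1 <= length <= 15:
            raise ValueError(f"invalid length {length} at offset {pos}")
        # The next boundary is only known after inspecting this instruction.
        pos += length
    return offsets


print(fixed_boundaries(bytes(12)))
print(variable_boundaries(bytes([1, 3, 0, 0, 2, 0])))
```

With a fixed width, all boundaries can be computed independently of each other. With a variable width, each boundary depends on the instruction before it, which makes decoding several instructions at once harder.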
One of the implications of this difference in structure is that RISC architectures tend to be a lot easier to pipeline efficiently. Each instruction passes through multiple stages, each of which uses different hardware. Pipelining runs multiple instructions through these stages simultaneously, with at most one instruction in each stage at a time. Pipelining offers a considerable performance boost when used efficiently. One key factor in using a pipeline efficiently is keeping every stage busy at the same time. This keeps everything running through the pipeline smoothly.
RISC instructions are not only the same length; they also tend to require about the same processing time as each other. In a CISC architecture like x86, however, some instructions can take much longer to complete than others. This creates a big efficiency issue when pipelining a CPU. Every time a longer instruction comes along, it occupies a pipeline stage for longer. This causes a bubble and holds up everything behind it. Micro-operations are the solution to this.
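The cost of such a bubble can be sketched with a rough timing model. The four-stage pipeline, the stage names, and the latencies below are assumptions chosen for illustration; the model is strictly in-order and ignores hazards, caches, and branch prediction.

```python
STAGES = 4   # assumed toy pipeline: fetch, decode, execute, write-back
EXECUTE = 2  # index of the execute stage, the only one with variable latency


def pipeline_cycles(execute_latencies: list[int]) -> int:
    """Cycles needed to push every instruction through the toy pipeline."""
    # finished[s] is the cycle at which stage s completed its last instruction.
    finished = [0] * STAGES
    for latency in execute_latencies:
        ready = 0  # cycle at which this instruction left the previous stage
        for stage in range(STAGES):
            cost = latency if stage == EXECUTE else 1
            # Wait for both the instruction and the stage to be available.
            start = max(ready, finished[stage])
            ready = start + cost
            finished[stage] = ready
    return finished[-1]


uniform = pipeline_cycles([1, 1, 1, 1, 1])
with_slow_instruction = pipeline_cycles([1, 1, 4, 1, 1])
print(uniform, with_slow_instruction)
```

In this model, five uniform instructions finish in 8 cycles, while a single instruction that spends 4 cycles in the execute stage pushes the total to 11, because the instructions behind it have to wait.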
Instead of treating each instruction as the only level of operation that can be performed, micro-operations introduce a new lower layer. Each instruction can be split into one or more micro-operations. By designing the micro-operations carefully, you can optimize the pipeline.
Interestingly, this offers a new advantage. While the overall ISA, say x86, remains the same across many different CPU generations, the micro-operations can be custom designed for each generation of hardware. This can be done with a deep understanding of how much performance can be squeezed out of each pipeline stage for each micro-operation.
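As a minimal sketch of that lower layer, decoding can be pictured as a table lookup. The instruction spellings, micro-operation names, and splits below are hypothetical and do not describe any real CPU, whose micro-operations are generally not publicly documented in this form.

```python
# Hypothetical decode table: one instruction maps to one or more micro-operations.
DECODE_TABLE = {
    "add reg, reg": ["alu_add"],
    "add reg, [mem]": ["load", "alu_add"],
    "add [mem], reg": ["load", "alu_add", "store"],
}


def decode(program: list[str]) -> list[str]:
    """Flatten a list of instructions into a single stream of micro-operations."""
    uops = []
    for instruction in program:
        uops.extend(DECODE_TABLE[instruction])
    return uops


print(decode(["add reg, reg", "add [mem], reg"]))
```

The instruction that reads and writes memory becomes three small steps, each of which does a similar amount of work to the simple register-to-register add.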
In the early days, micro-operations were implemented as hard-wired connections that enabled or disabled specific functionality depending on the micro-operation. In modern CPU design, each micro-operation is added to a reorder buffer. It’s within this buffer that the CPU performs its efficiency-oriented reordering. It’s micro-operations, not actual instructions, that are reordered.
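The reordering idea can be sketched with a toy scheduler that, each cycle, issues the oldest waiting micro-operation whose inputs are ready. The MicroOp fields, register names, and latencies are assumptions for illustration. The sketch also assumes that each register is written only once and that every micro-operation reads only registers written by older micro-operations or never written at all; it leaves out in-order retirement and register renaming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    name: str
    dest: str                 # register written by this micro-operation
    sources: tuple[str, ...]  # registers it reads
    latency: int              # cycles until the result is available


def issue_order(uops: list[MicroOp]) -> list[str]:
    """Order in which a one-per-cycle toy scheduler issues the micro-operations."""
    # Results of micro-operations that have not issued yet are never "ready".
    ready_at = {uop.dest: float("inf") for uop in uops}
    waiting = list(uops)
    order = []
    cycle = 0
    while waiting:
        for uop in waiting:  # oldest first
            # Registers nobody writes are treated as ready from cycle 0.
            if all(ready_at.get(src, 0) <= cycle for src in uop.sources):
                ready_at[uop.dest] = cycle + uop.latency
                order.append(uop.name)
                waiting.remove(uop)
                break  # at most one issue per cycle
        cycle += 1
    return order


program = [
    MicroOp("load", dest="r1", sources=(), latency=3),
    MicroOp("add", dest="r2", sources=("r1",), latency=1),
    MicroOp("mov", dest="r3", sources=(), latency=1),
]
print(issue_order(program))
```

Here the independent mov is issued ahead of the add, which has to wait for the slow load to deliver its result.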
In some cases, especially with more advanced CPUs, even more can be done. Micro-op fusion is where multiple micro-operations are combined into one. For example, a sequence of simple micro-operations may perform an action that can be performed with a single, more complex micro-operation. By reducing the number of micro-operations performed, the process can complete faster. This also reduces the number of state changes, which reduces power consumption. Full instructions may even be analyzed and combined into more efficient micro-operation structures.
Some CPUs make use of a micro-operation cache. This stores fully decoded micro-operation sequences so that they can be reused, without decoding again, if the same instructions are executed again. Typically, the size of such a cache is referred to by the number of micro-operations it can store rather than by byte capacity.
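A minimal sketch of fusion is a pass that replaces known adjacent pairs with a single combined micro-operation. The pairs and names in the table are hypothetical; which combinations a real CPU can fuse depends on the specific microarchitecture.

```python
# Hypothetical fusion table: adjacent pair -> single fused micro-operation.
FUSIBLE = {
    ("load", "alu_add"): "load_add",
    ("cmp", "jump_if_equal"): "cmp_jump_if_equal",
}


def fuse(uops: list[str]) -> list[str]:
    """Replace each fusible adjacent pair with its fused micro-operation."""
    fused = []
    i = 0
    while i < len(uops):
        pair = tuple(uops[i:i + 2])
        if pair in FUSIBLE:
            fused.append(FUSIBLE[pair])
            i += 2  # both halves of the pair are consumed
        else:
            fused.append(uops[i])
            i += 1
    return fused


print(fuse(["load", "alu_add", "store", "cmp", "jump_if_equal"]))
```

Five micro-operations go in and three come out, so fewer entries have to be tracked and scheduled.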
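The idea can be sketched as a small cache keyed by instruction address, with its capacity counted in micro-operations. The class name, the least-recently-used eviction policy, and the decode callback are assumptions made for this example; real micro-operation caches are organized differently and their details vary by design.

```python
from collections import OrderedDict


class MicroOpCache:
    """Toy micro-operation cache whose capacity is counted in micro-operations."""

    def __init__(self, capacity_uops: int):
        self.capacity_uops = capacity_uops
        self.entries = OrderedDict()  # address -> tuple of decoded micro-operations
        self.hits = 0
        self.misses = 0

    def _stored(self) -> int:
        return sum(len(uops) for uops in self.entries.values())

    def lookup(self, address: int, decode) -> tuple:
        """Return the micro-operations for address, decoding only on a miss."""
        if address in self.entries:
            self.hits += 1
            self.entries.move_to_end(address)  # mark as most recently used
            return self.entries[address]
        self.misses += 1
        uops = tuple(decode(address))
        if len(uops) <= self.capacity_uops:
            # Evict least recently used entries until the new sequence fits.
            while self._stored() + len(uops) > self.capacity_uops:
                self.entries.popitem(last=False)
            self.entries[address] = uops
        return uops


def toy_decode(address: int) -> list[str]:
    # Stand-in decoder: every address decodes to the same two micro-operations.
    return ["load", "alu_add"]


cache = MicroOpCache(capacity_uops=4)
for address in (0x10, 0x10, 0x20, 0x30):
    cache.lookup(address, toy_decode)
print(cache.hits, cache.misses, list(cache.entries))
```

The second lookup of address 0x10 is a hit and skips the decoder. Once the four-micro-operation capacity is full, the oldest entry is evicted to make room.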
Micro-operations are the CPU-specific building blocks used to implement an instruction set. Instructions are decoded into a series of micro-operations. These micro-operations are significantly easier to pipeline efficiently and thus make better use of CPU resources. As micro-operations are not part of the instruction set itself, they can be customized to the specific hardware of each generation of CPU. Micro-operations are often shortened to micro-ops or even μops. That uses the Greek letter μ (pronounced “mu”), the SI symbol for the micro prefix.