Computers are really fast, but one thing you may not know is how hard they work to hide how slow memory is. A modern CPU runs at a clock speed approaching 6 billion ticks per second, and on each cycle every processing core can complete multiple instructions. Keeping those cores fed with data is critical to performance.
Storage media contain all of the data the CPU could want. The main problem is that they're slow, really slow compared to the CPU. To hide that slowness from the CPU, system RAM is used to store the data of all running programs. Still, compared to the CPU, system RAM is slow, with latencies on the order of 400 clock cycles.
Caching for speed
To hide that latency from the CPU, a tiered CPU cache is used. Typically the cache has three tiers, referred to as L1, L2, and L3. L1 is the fastest tier, able to return results in around 5 clock cycles; L2 may take 20 cycles and L3 around 200. While you might think it would make sense to give every CPU a lot of L1 cache, this is impossible. The performance of the L1 cache comes from a number of factors, each of which prevents it from being both large and fast.
The smaller the cache, the less time it takes to find and return a result, so a large L1 cache would take longer to return anything. Part of the speed also comes from the L1 cache being placed within the block of the processing core itself, accessible only to that core; space there is limited, and taking up more of it would impact the rest of the core. The actual memory cells used for the L1 cache are quite large, which makes them faster to access; shrinking them to fit more in the same area slows them down. Finally, silicon die area is expensive, and allocating more of it to cache increases the cost.
All of these factors have led to the tiered caching system. Each tier strikes a different balance between capacity and speed, adjusting those factors for the best access time and hit ratio.
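To see why this tiering works, you can estimate the average memory access time from each tier's latency and hit ratio. This is a minimal sketch; the latencies match the figures above, but the hit ratios are illustrative assumptions, not measurements of any particular CPU.

```python
# Rough average memory access time (AMAT) for a tiered cache.
# Hit ratios below are invented for illustration.

def amat(levels, memory_latency):
    """levels: list of (latency_cycles, hit_ratio) from L1 outward."""
    expected = 0.0
    reach_probability = 1.0  # chance a request gets this far down the tiers
    for latency, hit_ratio in levels:
        expected += reach_probability * hit_ratio * latency
        reach_probability *= (1.0 - hit_ratio)
    # Anything that misses every cache tier goes to main memory.
    expected += reach_probability * memory_latency
    return expected

tiers = [(5, 0.90), (20, 0.95), (200, 0.99)]  # (cycles, hit ratio) for L1-L3
print(f"Average access latency: {amat(tiers, 400):.1f} cycles")
```

Even with main memory at 400 cycles, the high hit ratios in the fast tiers pull the average down to single-digit cycles, which is the whole point of the hierarchy.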
Weirdly, none of this optimisation would matter at all without another factor: tag memory. Modern computers use virtual memory. Each application is allocated its own memory space with its own addressing system. This provides security, prevents memory issues in one program from affecting others, and hides the underlying architecture of the RAM from the software. Unfortunately, it also means that every virtual address must be translated to a physical address. So in order to read data from memory, you need to make two requests: one to translate the virtual address to the physical address, and one to read the data from that physical address.
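The two-request cost can be sketched in a few lines. This toy model uses an invented single-level page table and a 4 KiB page size for illustration; real page tables are multi-level structures walked by hardware.

```python
# Sketch of why virtual memory doubles memory traffic: every load first
# needs a page-table lookup to translate the virtual address.

PAGE_SIZE = 4096  # 4 KiB pages, a common choice

# Toy single-level page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3, 2: 12}

memory_requests = 0

def translate(virtual_address):
    """First memory request: read the page-table entry from RAM."""
    global memory_requests
    memory_requests += 1
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset

def load(virtual_address):
    """Second memory request: read the data at the physical address."""
    global memory_requests
    physical = translate(virtual_address)
    memory_requests += 1
    return physical  # stand-in for the data stored there

load(4096 + 42)          # virtual address in page 1 -> frame 3, offset 42
print(memory_requests)   # prints 2: two trips to memory for one load
```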
This downside even affects the cache: to check the cache, you'd first have to go all the way to main memory for the translation. Thankfully, tag memory prevents that. Tag memory is arranged differently from normal memory. Instead of providing an address and waiting for the result, you provide some data and it tells you whether it has a match. When properly optimised, tag memory can be extremely fast, returning a result in less than a clock cycle. Part of this speed, again, comes from the fact that it's tiny, even smaller than the L1 cache. Thankfully, it's very space efficient, and even at that size it can achieve a hit ratio higher than 99%.
Tag memory basically stores memory address translations. Because it can be searched so quickly, the CPU can immediately determine whether the data it wants is in the cache or whether it really does have to go to main memory.
Tag memory is a form of memory that is accessed differently from standard memory. Instead of requesting an address and getting back the data, some content is provided and the tag memory searches for a match. It is used in and around the CPU cache to translate virtual memory addresses into physical memory addresses, enabling fast access to the cache. Without this, the translation would have to be fetched from main memory even when the data was already in the cache, massively increasing memory latency. In modern computers, this tag memory is referred to as the Translation Lookaside Buffer, or TLB.
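The content-matching behaviour described above can be modelled as a tiny lookup structure. This is only a sketch: the capacity and the simple FIFO eviction are illustrative assumptions, and a real TLB is a hardware structure whose entries are searched in parallel, not sequentially.

```python
# Toy TLB: present a virtual page number and it answers "hit, here's the
# frame" without touching RAM. Capacity and FIFO eviction are invented.

from collections import OrderedDict

class TLB:
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical frame
        self.hits = 0
        self.lookups = 0

    def lookup(self, vpn):
        """Content match: returns the frame on a hit, None on a miss."""
        self.lookups += 1
        frame = self.entries.get(vpn)
        if frame is not None:
            self.hits += 1
        return frame

    def insert(self, vpn, frame):
        """Cache a translation, evicting the oldest entry if full."""
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = frame

tlb = TLB()
tlb.insert(1, 3)
tlb.lookup(1)  # hit: translation found, no trip to main memory
tlb.lookup(2)  # miss: would require a page-table walk in main memory
print(f"hit ratio: {tlb.hits / tlb.lookups:.0%}")
```

On a hit, the CPU gets the physical frame immediately and can check the cache straight away; only on a miss does it pay the cost of walking the page table in main memory.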