When requesting data from any source, there’s always some delay. Ping to web servers is measured in milliseconds, storage access times sit in the microseconds, and RAM latency is measured in CPU clock cycles. Of course, these sorts of speeds would have been unthinkable merely a few decades ago, but today they’re never fast enough. Access speed is regularly a performance bottleneck, and one of the ways this can be addressed is with caching.
Caching is the process of storing a temporary copy of a resource so that it can be accessed faster than it could be from its original location. There is a huge range of implementations, both in software and in hardware. Caches can act as read caches, write caches, or both.
In a read cache, data that has been requested previously is stored for faster access. In some scenarios, the cache may even be pre-emptively loaded with data, allowing the first request to be served from the cache rather than just subsequent requests.
The read cache that you’re most likely to be familiar with is the browser cache. Here the browser stores a local copy of requested resources. If and when the webpage is reloaded, or a similar page is loaded that uses much of the same content, that content can be served from the cache rather than the web server. Not only does this mean that the webpage can load faster, it also reduces the load on the web server and the amount of data the user needs to download, which can be important on metered connections.
RAM itself also acts as a read cache for data on the hard drive. In this case, data for a running program is pre-emptively loaded into RAM so that the CPU can access it faster. Data from RAM is then cached again in the CPU cache, though the process for this is a lot more complex, as CPU caches are measured in megabytes, not gigabytes.
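The read-cache pattern described above can be sketched in a few lines. This is a minimal illustration, not any particular browser or OS implementation; `fetch_from_origin` is a hypothetical stand-in for the slow data source (a web server, a disk, and so on).

```python
import time

def fetch_from_origin(key):
    # Placeholder for an expensive lookup (network request, disk read, ...).
    time.sleep(0.1)
    return f"value-for-{key}"

cache = {}

def cached_read(key):
    # Serve from the cache on a hit; on a miss, fall back to the origin
    # and remember the result for next time.
    if key in cache:
        return cache[key]
    value = fetch_from_origin(key)
    cache[key] = value
    return value

first = cached_read("page.html")   # slow: goes to the origin
second = cached_read("page.html")  # fast: served from the cache
```

The first call pays the full cost of the origin fetch; every later call for the same key is just a dictionary lookup.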
A write cache is a cache that can absorb data being written to a slower device. A common example of this would be the SLC cache in modern SSDs. This cache doesn’t allow data to be read any faster; however, it is much faster to write to than the TLC or QLC flash that makes up the rest of the SSD. The SLC cache can absorb high-speed write operations and then offloads that data as soon as it can to the TLC or QLC flash, which offers much better storage density but is also a lot slower to write to. Using the flash memory in this way optimises it for both fast write speeds and high storage density.
There are many ways to handle caches that allow them to act as both a read and a write cache. Each of these methods handles write operations differently and has its own benefits and drawbacks. The three options are write-around, write-through, and write-back. A write-around cache skips the cache entirely when writing; a write-through cache writes to the cache but only considers the operation complete once the data has also been written to storage; a write-back cache writes to the cache, considers the operation complete, and relies on the cache to transfer the data to storage later if it’s needed.
Write-around can be useful if you’re expecting a large volume of writes, as it minimises cache churn. It does, however, mean that an operation that then reads any of that written data will face at least one cache miss the first time. Write-through caches immediately cache write operations, meaning that the result can be served from the cache the first time it is requested. To be considered complete, though, a write operation also needs to write the data to disk, which adds latency. A write-back cache has the same benefit as a write-through, allowing written data to be immediately served from the cache, but it doesn’t require write operations to reach disk before being considered complete. This reduces write latency but comes with the risk of data loss if the cache is volatile and doesn’t finish writing the data back to storage before power is lost.
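The three policies can be sketched side by side. This is a toy model, assuming one dict as the fast "cache" tier and another as the slow "storage" tier; real implementations live in firmware or kernels, but the ordering of the updates is the point.

```python
cache = {}
storage = {}
dirty = set()  # keys written to the cache but not yet flushed (write-back)

def write_around(key, value):
    # Skip the cache entirely; a later read will miss and go to storage.
    storage[key] = value
    cache.pop(key, None)  # drop any now-stale cached copy

def write_through(key, value):
    # Update both tiers; the operation is only complete once storage has it.
    cache[key] = value
    storage[key] = value

def write_back(key, value):
    # Update only the cache and mark the entry dirty. Fast, but if the
    # cache is volatile and power is lost before flush(), the data is gone.
    cache[key] = value
    dirty.add(key)

def flush():
    # Copy all dirty entries down to storage, then mark them clean.
    for key in dirty:
        storage[key] = cache[key]
    dirty.clear()
```

Note how write-back returns before `storage` is touched at all: that gap between the cached copy and the stored copy is exactly the data-loss window described above.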
How to remove data from the cache?
One of the limiting factors of any cache is capacity. A large cache takes a long time to search, negating a good portion of the advantage of using a cache in the first place. Memory technologies used for caching also tend to be more expensive than the memory they’re caching from; if this weren’t the case, it’s likely that memory tier would simply have switched to the faster technology to improve performance. Both of these factors mean that caches tend to be relatively small, especially when compared with the storage medium they’re caching from. RAM has less capacity than storage, the CPU cache has less capacity than RAM, and the SLC cache has less capacity than the TLC flash.
All of this means that it is often necessary to cycle data out of the cache to free up space for new data that needs to be cached. There are a range of different approaches to this. “Least frequently used” prefers to evict cache entries that have the lowest access count. This can be useful for predicting which entries will cause the fewest future cache misses, but it also counts very recently added entries as having a low access count, which may lead to cache churn.
“Least recently used” prefers to evict cache entries that haven’t been used in a while. This assumes that they aren’t being used currently, but doesn’t take into account whether they were heavily used a while back. “Most recently used” prefers to evict the most recently used cache entries, assuming that they’ve served their purpose and won’t be needed again. The best approach is generally a combination of all three, informed by usage statistics.
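Of the three policies, least recently used is the most common, and it has a neat implementation trick: keep entries in access order, so the eviction victim is always at the front. A minimal sketch using Python’s `collections.OrderedDict` (the class name `LRUCache` is just illustrative):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # front = least recently used

    def get(self, key):
        if key not in self.entries:
            return None  # cache miss
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry from the front.
            self.entries.popitem(last=False)

lru = LRUCache(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")      # "a" is now the most recently used
lru.put("c", 3)   # over capacity: evicts "b", the least recently used
```

A most-recently-used policy is the same structure with the eviction taken from the other end (`popitem(last=True)`), and least frequently used would track a counter per entry instead of access order.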
Stale information and security risks
The main risk of caches is that the information they contain can become stale. A cache entry is considered stale when the original data has been updated, leaving the cached copy out of date. It’s important to regularly verify that the cached copy being served still matches the live data.
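One common way to bound how stale an entry can get is to give each one a time-to-live (TTL) and refetch once it expires. The sketch below assumes hypothetical names throughout (`TTL_SECONDS`, a caller-supplied `fetch` function); it trades some freshness for speed rather than verifying every read.

```python
import time

TTL_SECONDS = 60.0
ttl_cache = {}  # key -> (value, timestamp)

def get(key, fetch):
    # Serve the cached value while it is younger than the TTL;
    # otherwise treat it as stale and refetch from the source.
    now = time.monotonic()
    entry = ttl_cache.get(key)
    if entry is not None:
        value, stored_at = entry
        if now - stored_at < TTL_SECONDS:
            return value  # still considered fresh
    value = fetch(key)
    ttl_cache[key] = (value, now)
    return value
```

The TTL is a tuning knob: a short one keeps data fresher at the cost of more origin fetches, a long one does the opposite.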
A cache is a portion of memory that stores recently used data in a form that is faster to access than repeating the normal data access process. A cache is typically limited in capacity, meaning it needs to evict entries once it is full. Caches are generally transparent to the user, meaning that latency is the only indication that a result was served from a cache.