Understanding LRU Caches: Theory and Data Structures (Part I)
When I think of the word cache, it evokes images of mystical inventory items found in an MMORPG. Think Diablo¹, Runescape² or maybe even Star Wars: The Old Republic³ for those fans of space operas.

But no, unfortunately this post isn't about those types of caches. This is about something you'll come across in computing: the LRU (Least Recently Used) cache.
If this article isn't about those, then what is it about? Well, it's about understanding LRU caches from the ground up. This is Part 1 of a 3-part series where we'll explore the theory, implementation, and practical applications of LRU caches. By the end of this series, I'll show you how LRU caches, Rust, WebAssembly, and Pokémon all connect in an interesting way!
What Are LRU Caches?
A better question to answer first is: what is a cache? Simply put, a cache is:
the hardware or software component that stores data so that future requests for that data can be served faster
In this case we are dealing with a software cache, and the type of cache in question is a Least Recently Used cache.
The type defines how we deal with maintaining the size of the cache⁴. An LRU cache evicts items based on when an item was accessed. As the name implies, items that have not been accessed recently will be evicted when the cache exceeds its maximum number of items.
In the example above, the capacity of the cache (the maximum number of items) is 4. Items are accessed and added, and there are two iterations of interest:
- Iteration 5: We replace A with E, as A had been the least recently used (accessed 0 times)
- Iteration 7: In a similar vein, E is replaced with F, as E had been the least recently used (accessed 1 time)
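To make the eviction behavior concrete before we build anything ourselves, here is a quick sketch (my own, not from the original example) using Python's collections.OrderedDict, which conveniently remembers the order in which keys were inserted or moved:

```python
from collections import OrderedDict

# A tiny demonstration of LRU eviction. OrderedDict keeps keys
# in insertion order, so the first key is the least recently used.
cache = OrderedDict()
capacity = 2  # small capacity chosen for illustration

def put(key, value):
    if key in cache:
        cache.move_to_end(key)  # mark as most recently used
    cache[key] = value
    if len(cache) > capacity:
        cache.popitem(last=False)  # evict the least recently used

put("A", 1)
put("B", 2)
cache.move_to_end("A")  # accessing A makes B the LRU item
put("C", 3)             # B is evicted, not A

print(list(cache))  # ['A', 'C']
```

Note that it is recency of access, not frequency, that decides who gets evicted: A survived only because it was touched after B.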
Makes sense. But how do we go about implementing an LRU cache? Perhaps there are a couple of data structures that can come in handy for this type of problem.
Implementing an LRU Cache
Thinking through what's needed for an LRU cache, two things come to mind:
- Fast access, which implies constant-time lookups
- At the same time, a way to keep track of which items have been used and to evict them easily
As with any great problem, there are many ways to solve this one. In the spirit of brevity, however, we will only explore one of them. It involves two fundamental data structures: hashmaps and linked lists, both of which lend themselves elegantly to the properties above.
Hashmaps, Linked Lists and Caches
Putting the puzzle pieces together to create an LRU cache works as follows: we create a hashmap with items in the cache as keys and the values pointing to nodes in a doubly linked list.
A hashmap satisfies the fast-access requirement, but why a doubly linked list? Doubly-ness aside, a linked list is useful here because we can keep track of the head and tail nodes.
Whenever we access an item, we move it to the head, as it has become the most recent. Once the linked list exceeds its capacity, we can remove the tail, as this will be the least recently used item. The new item becomes the head.
Bringing back the doubly-ness, the utility of this property is that it makes removing nodes simpler: we always have a pointer to the previous and next nodes relative to the current node, and we need this information when removing nodes.
The code⁵ for this in Python is pretty simple and would look something like this:
```python
def delete_node(self, dele):
    # Base case
    if self.head is None or dele is None:
        return
    # If the node to be deleted is the head node
    if self.head == dele:
        self.head = dele.next
    # Change next only if the node to be deleted is NOT
    # the last node
    if dele.next is not None:
        dele.next.prev = dele.prev
    # Change prev only if the node to be deleted is NOT
    # the first node
    if dele.prev is not None:
        dele.prev.next = dele.next
```

Putting it all together, we have something that resembles the diagram below, with our first sighting of Pokémon. Here we have a map of Pokémon pointing to linked list nodes that can contain any relevant data that we need.
What's Next?
Now that we understand the theory behind LRU caches and the fundamental data structures involved, we're ready to dive into implementation. In Part 2, we'll explore how to implement this in Rust, the challenges we'll face with linked lists in Rust's ownership system, and how we'll adapt our approach using different data structures. We'll also discover how to compile our Rust code to WebAssembly for use in web applications.
Stay tuned for the next part where theory meets practice!
Contact
If you have any comments or thoughts you can reach me at any of the places below.
***
Footnotes
- This one is more pertinent to the type of cache we are talking about ↩
- More specifically we say the eviction policy tells us what type the cache is ↩
A big thanks to Deirdre Murray-Wallace, Joe Eid, Spike Jensen and Martin Rabone for reviewing this post and making it better.