Implementing LRU Cache in Rust and WebAssembly (Part II)

Welcome back! In Part 1, we explored the theory behind LRU caches and the fundamental data structures that make them tick. Now it's time to get our hands dirty with implementation. But here's the twist: we're going to implement it in Rust and then compile it to WebAssembly for use in web applications.

An Implementation in Rust

Firstly, why Rust? I enjoy systems programming and I have wanted to experiment with WebAssembly (Wasm) and thought this would be a good idea. Boy was I wrong. Before we dive in, let's quickly sidestep and ask ourselves why this was a bad idea.

Linked Lists and Rust

Apparently these two don't mesh well together. You can find discussions on linked lists in Rust1, tutorials on how to create them in Rust2 and explanations of why they are hard to write in Rust3. For a hobby and educational project I clearly bit off more than I could chew. I didn't want to use a crate4 for the linked list because I wanted to implement something myself to maximise my learning.

If I wanted to continue to use Rust for this project, I'd have to adapt and find new ways to implement the cache. Moments like these teach you more deeply about what a programming language is well suited for. Rust clearly is not well suited5 for linked lists.

This leaves us no choice but to explore different data structures for the Rust implementation. Enter: the double-ended queue (deque).

Deques

A deque is a data structure similar to a queue6 but it allows inserting and deleting at both ends. This gives us the same functionality as having access to the head and tail of a doubly linked list.

a deque, showing pop and push operations on both ends

However, what does the hashmap point to now? The items in the deque are not nodes anymore with left and right pointers. You might be thinking: "we can just store the indexes of each item in the hashmap for constant access!".

You'd be right to think that we are going to store the indexes, and only slightly right in thinking that we maintain our constant time. Because we have introduced indexes, removing an item means shifting all of the indexes up to the item we are processing by 1, a worst case O(n) operation.

Some of you will probably be raging into the screen thinking that there are better ways to do this. You'd be absolutely correct but at this point I didn't want to get bogged down into making this ultra efficient. It is after all an educational project and meant to be somewhat fun. Plus, we've got Pokémon to attend to!

Implementation in Rust... Again

Before we dive into the code, let's zoom out. What kind of API should our cache have? Fortunately it's pretty simple. Only two operations:

  • get(key) -> value
  • put(key)

We want to get values from the cache and put values into the cache. So, without further ado, let's get started!

The Code

We want to define a struct that will hold the model of our cache. It will have a capacity defined by the user, hence the only public field is pub capacity: usize.

lib.rs
use std::collections::{HashMap, VecDeque};

pub struct LRU {
    pub capacity: usize,
    map: HashMap<String, usize>,
    q: VecDeque<String>,
}
a struct that represents the LRU cache

I've chosen to simply map a String to the index as I'm going to use uuids7 to identify any arbitrary item in the cache.

I've defined two helper methods to manage the state of the cache as we get and put into it.

lib.rs
fn update_map_values(&mut self) {
    for (idx, key) in self.q.iter().enumerate() {
        self.map.insert(key.to_string(), idx);
    }
}
 
fn make_node_head(&mut self, idx: usize, key: &str) {
    self.q.remove(idx);
    self.q.push_front(key.to_string());
 
    self.update_map_values();
    self.map.insert(key.to_string(), 0);
}
helper methods

The first lazily updates all the hashmap indexes with the new state of the queue. The second moves the key that has just been put or got to the head node8.

The final methods are the actual user facing get and put.

lib.rs
pub fn get(&mut self, key: &str) -> Option<String> {
    if let Some(&idx) = self.map.get(key) {
        self.make_node_head(idx, key);
        Some(key.to_string())
    } else {
        None
    }
}
 
pub fn put(&mut self, key: &str) {
    if self.q.len() == self.capacity {
        let remove_key = self.q.pop_back().unwrap();
        self.map.remove(&remove_key);
    }
 
    self.q.push_front(key.to_string());
    self.update_map_values();
    self.map.insert(key.to_string(), 0);
}
the juicy stuff

The get simply returns the value if it's there and makes it the head, otherwise it returns None. Performing a put will, if the cache is at capacity, remove the tail node9, then insert the item and make it the head. That's it!

All together it looks like this:

lib.rs
use std::collections::{HashMap, VecDeque};
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct LRU {
    pub capacity: usize,
    map: HashMap<String, usize>,
    q: VecDeque<String>,
}
 
#[wasm_bindgen]
impl LRU {
    #[wasm_bindgen(constructor)]
    pub fn new(capacity: usize) -> LRU {
        utils::set_panic_hook();
        LRU {
            capacity,
            map: HashMap::new(),
            q: VecDeque::new(),
        }
    }
 
    fn update_map_values(&mut self) {
        for (idx, key) in self.q.iter().enumerate() {
            self.map.insert(key.to_string(), idx);
        }
    }
 
    fn make_node_head(&mut self, idx: usize, key: &str) {
        self.q.remove(idx);
        self.q.push_front(key.to_string());
 
        self.update_map_values();
        self.map.insert(key.to_string(), 0);
    }
 
    #[wasm_bindgen(getter, skip_typescript)]
    pub fn queue(&self) -> JsValue {
        serde_wasm_bindgen::to_value(&self.q).unwrap()
    }
 
    #[wasm_bindgen(getter, skip_typescript)]
    pub fn map(&self) -> JsValue {
        serde_wasm_bindgen::to_value(&self.map).unwrap()
    }
 
    pub fn get(&mut self, key: &str) -> Option<String> {
        if let Some(&idx) = self.map.get(key) {
            self.make_node_head(idx, key);
            Some(key.to_string())
        } else {
            None
        }
    }
 
    pub fn put(&mut self, key: &str) {
        if self.q.len() == self.capacity {
            let remove_key = self.q.pop_back().unwrap();
            self.map.remove(&remove_key);
        }
 
        self.q.push_front(key.to_string());
        self.update_map_values();
        self.map.insert(key.to_string(), 0);
    }
}

You may have noticed that there are some extra methods and macros10 I haven't talked about. Not to worry, these link into the next part of the puzzle: WebAssembly.

WebAssembly

WebAssembly (Wasm) is a binary instruction format that can run in your browser11 at near-native speeds12. C, C++, Go and Rust, amongst others in the ecosystem, can all compile to it. This sounds like a great deal for those looking to improve performance on compute heavy tasks, but it seems the jury is still out13.

Nonetheless, I thought trying to run my Rust LRU code in this project could be a cool thing to experiment with. Curiosity is the name of the game here. But how do I go about doing this? Luckily, there is quite a handy crate that can help us out: wasm-pack.

Wasm-Pack

Wasm-pack is a great open-source tool that helps compile Rust code to Wasm and package it so we can use it in our web applications. It's recommended on the mdn web docs14 and makes the process of using wasm in your application a lot easier. This is one of the major benefits of using Rust as the language of choice to compile into Wasm. The ecosystem is more mature and supported than some of the other languages15.

It works by packaging your Rust code as an npm package that we can directly install into our web app by adding it into the package.json. Easy as that! Running wasm-pack build --target bundler will yield something similar to:

$ wasm-pack build --target bundler
[INFO]: 🎯  Checking for the Wasm target...
[INFO]: 🌀  Compiling to Wasm...
   Compiling lru-wasm v0.1.0 (/repos/wasm-lru)
    Finished `release` profile [optimized] target(s) in 2.17s
[INFO]: ⬇️   Installing wasm-bindgen...
[INFO]: Optimizing wasm binaries with `wasm-opt`...
[INFO]: Optional fields missing from Cargo.toml: 'description', 'repository', and 'license'. These are not necessary, but recommended
[INFO]: ✨   Done in 3.72s
[INFO]: 📦   Your wasm pkg is ready to publish at /repos/wasm-lru/pkg.
running the wasm-pack tool to generate an npm package

You may be asking: "why set the target as bundler?". Well, in my case I am using Next.js, which uses webpack, so I could target a bundler; it also makes using our package much simpler16.

It's all well and good that wasm-pack bundles our code ready to be imported into a web app but how does it interface with Rust when you are writing Rust code, and for that matter TypeScript?

Using Wasm with TypeScript

Wasm-pack greatly simplifies the process of compiling Rust to Wasm and using it in a Javascript or TypeScript project.

Remember those macros I said I'd talk about, the ones labelled #[wasm_bindgen]? Let's revisit them in this section. They are what expose the functions that will be accessible to both Rust and Javascript17. They can not only be applied to functions but also to structs as per the Rust code above.

So it's as easy as that! Slap on #[wasm_bindgen] and reap the rewards of using those Rust functions in Javascript. Well... not exactly.

You see, the LRU struct has the HashMap<String, usize> and VecDeque<String> fields, which aren't supported types in wasm-bindgen18.

lib.rs
#[wasm_bindgen]
pub struct LRU {
    pub capacity: usize,
    map: HashMap<String, usize>,
    q: VecDeque<String>,
}
problematic map and q fields

However, we aren't entirely stuck here: instead we can use the serde19 crate, which handles sending arbitrary data to Javascript20 for use.

lib.rs
#[wasm_bindgen(getter, skip_typescript)]
pub fn queue(&self) -> JsValue {
    serde_wasm_bindgen::to_value(&self.q).unwrap()
}
 
#[wasm_bindgen(getter, skip_typescript)]
pub fn map(&self) -> JsValue {
    serde_wasm_bindgen::to_value(&self.map).unwrap()
}
how we choose to expose the deque and map

You might be wondering why I've added this new macro, #[wasm_bindgen(getter, skip_typescript)], which looks like it's telling the packager to skip TypeScript when this section's heading promises an explanation of how to use TypeScript with Wasm.

Creating seamless automated typing from Rust to TypeScript with the current ecosystem is hard21. wasm-bindgen does provide lots of attributes22 that can help export some useful typing for us, but it falls short when it comes to the non-standard data types I'm using here. If you run wasm-pack you'll get the file:

lru_wasm.d.ts
/* tslint:disable */
/* eslint-disable */
/**
*/
export class LRU {
  free(): void;
/**
* @param {number} capacity
*/
  constructor(capacity: number);
/**
* @param {string} key
* @returns {string | undefined}
*/
  get(key: string): string | undefined;
/**
* @param {string} key
*/
  put(key: string): void;
/**
*/
  capacity: number;
}
the current typing

Evidently, the map: HashMap<String, usize> and q: VecDeque<String> fields are missing from the type definitions, as we specified to skip them with skip_typescript. If we chose to include them (i.e. removing the skip_typescript) then our types would look like this:

lru_wasm.d.ts
/* tslint:disable */
/* eslint-disable */
/**
*/
export class LRU {
  free(): void;
/**
* @param {number} capacity
*/
  constructor(capacity: number);
/**
* @param {string} key
* @returns {string | undefined}
*/
  get(key: string): string | undefined;
/**
* @param {string} key
*/
  put(key: string): void;
/**
*/
  capacity: number;
 
  readonly queue: any; 
  readonly map: any; 
}
the typing with the queue and map getters included

which isn't very useful in TypeScript. How do we get around this? It's not very straightforward. I tried multiple solutions using different crates23 and using typescript_type24, but nothing suited what I needed.

Instead I chose to generate what I could with the bindings (as seen in the first lru_wasm.d.ts) and extend the class on the frontend.

export class LRU extends WasmLru {
  readonly map: Map<string, number>;
  readonly queue: Array<string>;
}
a bit janky but it worked

Finally, we have packaged the wasm module and have it typed. What's next is to actually use it.

Wasm and MDX

I use MDX to write blog posts. It allows for more interesting posts that can utilise the magic of React components and embed them into your markdown. It's awesome. I decided to try my hand at including a React component that uses the wasm module built above in an interesting way.

As has been the central theme to this project, it wasn't easy to integrate the wasm module into MDX on Next.js. Thankfully I came across an article25 that explains a workaround for using wasm on Next.js, however this wasn't the end of the woes.

Since MDX is rendered on the server, I was getting an error when bundling26 my wasm component with my MDX files. The solution here was to load the module dynamically and to not render it on the server.

import dynamic from "next/dynamic";
 
const DynamicWasmModule = dynamic(
  () =>
    import("@components/mdx/WasmModule").then(
      (mod) => mod.WasmComponent
    ),
  {
    ssr: false,
  }
);
dynamic imports that then get bundled with mdx

Now the show can finally begin: running wasm code on the client used in this very mdx file.

What's Next?

We've successfully implemented an LRU cache in Rust and compiled it to WebAssembly for use in web applications. We've overcome the challenges of Rust's ownership system by adapting our approach with deques, and we've navigated the complexities of TypeScript integration.

In Part 3, we'll put our WebAssembly LRU cache to work in a practical and fun application: building an interactive Pokédex! We'll explore how LRU caches apply to real-world scenarios and create a visual demonstration that brings together everything we've learned.

Contact

If you have any comments or thoughts you can reach me at any of the places below.

***

Footnotes

  1. They are quite contentious apparently

  2. Further reading

  3. This is one way to learn a new language

  4. A crate is Rust's name for a package or library, installed via Cargo

  5. Not simply, anyway

  6. If you want to know more about queues, just visit wikipedia

  7. Pretty much a unique identifier

  8. They aren't really nodes as we are using a deque now, but thought I'd keep the semantics consistent

  9. The least recently used node!!!

  10. More on macros here

  11. The official docs

  12. Apparently not too bad

  13. A great critical read about why maybe performance isn't the best feature of wasm

  14. See here for the full tutorial

  15. I'm looking at you Go

  16. If you don't target a bundler, you pretty much have to manually init the module in a useEffect instead of simply importing and using the package.

  17. They are one of the most important things to know about wasm-pack

  18. See here for supported types

  19. Useful for serialising and deserialising Rust data structures

  20. Here are the docs that explain how to do it

  21. I was banging my head with tsify and ts-rs, searching online and using Claude, and couldn't really find something that worked easily

  22. For instance I used skip_typescript, getter and constructor, see the docs here

  23. Libraries include tsify and ts-rs

  24. This did look promising; however, going with this approach would mean re-writing the whole type from scratch, and I wanted a mix of automated and patched typings

  25. Thank god for this

  26. I use Kent Dodds' blazingly fast mdx-bundler

A big thanks to Deirdre Murray-Wallace, Joe Eid, Spike Jensen and Martin Rabone for reviewing this post and making it better.