Lately I’ve been working on my first big Rust project, a part of which involves
fetching remote assets over HTTP and caching them into a
std::collections::HashMap
.
Previously, I had been threading the HashMap
throughout my code, making
requests and inserting entries as necessary. Recently, though, I’ve decided to
further abstract this into a struct where one can declare that it wants the
content of an asset—if it hasn’t been fetched yet, then the request is made
and the response is cached. Otherwise, the cached response is returned.
This means that code that deals with the content of assets does not have to care about having to fetch it. Yay, separation of concerns! Abstractions! Hopefully “zero cost” ones! (…joking.)
A simplified rendition of this abstraction looks a bit like this:
use std::collections::HashMap;
struct Cacher {
cache: HashMap<String, String>,
}
We start off with our Cacher
struct. The key is an opaque asset ID that can be
any UTF-8 string. The asset content is also a UTF-8 string. It is appropriate
for both to be owned by the struct itself, so we use String
instead of &str
.
This also implies that we cannot have any binary asset content.
To make this more clear, we can use type aliases instead:
type AssetId = String;
type AssetContent = String;
struct Cacher {
cache: HashMap<AssetId, AssetContent>,
}
That’s better.
Now that fetching happens implicitly, any operation that obtains asset content
is now fallible, since the (potential) HTTP request can fail. This means that
we’ll have to return a Result<T, E>
.
I’ll assume you know about this, and the
?
(try)
operator too. If you don’t, check out the link(s) above. Rust docs 👍
Since I’m just trying to give an example here, I’ll use the unit type as our error:
// amazing error type :100:
type Error = ();
Implementing Cacher
Let’s sketch out the actual implementation of our Cacher
. Again, we want the
somewhat declarative behavior where user code just says “hey, I want the content
of this asset!” and it’ll be fetched and cached (if necessary), returning the
content to whoever is being pesky about it.
// it's important that this lives outside of the `Cacher` struct so we don't
// have to borrow `self` again when fetching which could be a problem
//
// PLUS fetching has nothing to do with caching so separation of concerns x2!
fn fetch(id: &str) -> Result<String, Error> {
// how we exactly do this isn't important yaknow
Ok(String::from("remember to hug your blahaj"))
}
impl Cacher {
fn content(&mut self, id: &str) -> Result<&str, Error> {
match self.cache.get(id) {
// If we have the cached asset content, just return it. This can't
// fail.
Some(content) => Ok(content),
// We haven't cached the asset yet!
None => {
// Fetch the asset.
//
// If something goes wrong, propagate that error by returning
// early.
let content = fetch(id)?;
// Cache, so we don't have fetch this asset again.
//
// Here, we use `to_owned` to copy the asset ID so the `HashMap`
// can own it as a `String`.
self.cache.insert(id.to_owned(), content);
// Return a reference to the asset content we just fetched.
//
// If it's not there for whatever reason, panic with an error
// message.
Ok(&self
.cache
.get(id) // kinda sucks that we have to do this awkward dance,
// cuz we end up hashing `id` *twice* overall if
// we take this path
// aaand this is a little pointless too.
.expect("key we just inserted is not there?!")
)
}
}
}
}
Dissecting the signature of content
a bit, we take an asset ID &str
and
return another yet another &str
. We use string slices because we only want
read-only views into strings.
A view we accept and a view we return. We don’t need to own the requested asset ID string, and it’d be awfully inefficient to copy the asset content every time we returned it.
The self.cache.get
call borrows the value for us, returning an
Option<&String>
, an optional reference to the cache’s value. Due to deref
coercion though, we can treat it as if it were a Option<&str>
.
What about entry
?
You may notice that we are expressing the following behavior: “if this key
isn’t present, insert some value beforehand; then, get key
’s value”. This
procedure is so common that HashMap
has
an entry API that lets you do exactly that:
let mut map: HashMap<&str, u32> = HashMap::new();
map.entry("poneyland").or_insert(3);
assert_eq!(map["poneyland"], 3);
However, this particular API isn’t well-suited for our use case, as we’d have to
make the request every time in order to pass it as an argument. Luckily, there
is an or_insert_with
API, which is similar to or_insert
, but takes a closure
instead:
let mut map: HashMap<&str, String> = HashMap::new();
let s = "hoho".to_string();
map.entry("poneyland").or_insert_with(|| s);
The closure is only called if the entry is vacant, which seems perfect!
Unfortunately, we can’t use this API either, because you can’t propagate any
errors that happen during fetching inside of the closure. It’d try to return the
error from inside the closure, and the or_insert_with
isn’t expecting a
Result
. Darn.
The error
Okay, now we know we can’t use the or_insert
or or_insert_with
APIs to get
around this issue. Let’s try to compile our Cacher
implementation:
error[E0502]: cannot borrow `self.cache` as mutable because it is also borrowed as immutable
--> src/lib.rs:19:17
|
14 | fn content(&mut self, id: &str) -> Result<&str, Error> {
| - let's call the lifetime of this reference `'1`
15 | match self.cache.get(id) {
| ---------- immutable borrow occurs here
16 | Some(content) => Ok(content),
| ----------- returning this value requires that `self.cache` is borrowed for `'1`
...
19 | self.cache.insert(id.to_owned(), content);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
Examining the code again:
fn content(&mut self, id: &str) -> Result<&str, Error> {
match self.cache.get(id) {
Some(content) => Ok(content),
None => {
let content = self.fetch(id)?;
self.cache.insert(id.to_owned(), content);
Ok(&self
.cache
.get(id)
.expect("key we just inserted is not there?!"))
}
}
}
To check for a potential entry in the cache, we use self.cache.get
, which
borrows the HashMap
, because it’s syntax sugar for
HashMap::get(&self.cache, id)
. If it’s there, then we return the reference to
it, wrapped in Ok
.
However, the borrow “lives” into the None
arm of the match
, preventing us
from calling self.cache.insert
, because you can’t borrow mutably and immutably
at the same time.
By the time we reach the None
arm, though, we don’t need the borrow from the
get
call anymore. So what gives?
The issue
Our issue arises from the fact that the lifetime of the &str
we return is tied
to the lifetime of the borrow from the self.cache.get
call. We can make this
more obvious if we expand out the lifetimes ourselves.
// "lifetimes are just boneless generics"
// vvvvvvvvvvvvvvvvvvvvv
fn content<'ourself, 'asset_id>(
// `self` contains the all-important hashmap we care about
&'ourself mut self,
// the asset id that is being requested
id: &'asset_id str
// the string slice we return has the same lifetime as `self`
// since it comes from the hashmap on `self`
// vvvvvvvvvvvvv
) -> Result<&'ourself str, Error> {
// borrowing 'ourself here...
// vvvvvvvvvvvvvvvvvv
match self.cache.get(id) {
// ...and returning it here...
// vvvvvvv
Some(content) => Ok(content),
// [snip]
}
}
Both &mut self
and the &str
we return have to have the same lifetime
'ourself
because the reference to the hash map data can only live as long as
the Cacher
struct (and therefore the HashMap
inside) is still around. We
don’t really care for how long the id
string lives, because we’re only using
it in the lookup.
Returning the content
we get from the cache.get
call means that the borrow
has to live for the duration of whole function, meaning that it doesn’t even
matter that we don’t need the borrow in the None
arm of the match
!
This is more evident when we examine the lifetimes in the signature. mut self
is annotated with 'ourself
, and the return value has 'ourself
, too. That
means if we borrow self
(which we do in the self.cache.get
call) and return
something from that borrow, the borrow would have to live until the end of the
function.
// these things have the same lifetimes; because we borrow
// `self` and return something from that borrow, the borrow
// would have to "last" for the whole function
// vvvvvvvvvvvvv vvvvvvv
fn content<'a, 'b>(&'a mut self, id: &'b str) -> Result<&'a str, Error> {
// ------- this lifetime is unimportant
OK, I know I basically just restated what rustc
said. But I think it’s good to
explain the details behind that error, because I was pretty confused on why it
was erroring in the first place.
Why NLL doesn’t help
If you happen to know about Rust’s “non-lexical lifetimes”, you might’ve been confused as to why the code wasn’t accepted by the borrow checker.
As a Compiler That’s Trying Its Best, rustc
is inherently conservative in that
it’s perfectly willing to reject programs that cannot be adequately proven to be
safe under the compiler’s verification model, even if they are actually safe
in reality. This can be frustrating, especially in situations where we simply
know better. NLL was an effort to loosen some of these restrictions by making
the compiler smarter:
fn print(string: &str) {
println!("{string}");
}
fn main() {
let mut cool = String::from("hello,");
// we borrow `cool` mutably…
let exclusive_borrow = &mut cool;
exclusive_borrow.push_str(" world!");
// since we never use `exclusive_borrow` again, we can borrow immutably
print(&cool);
}
Traditionally, this code would be rejected because the borrow would last until the end of the enclosing block (hence, lexical lifetimes).
OK, so how do we fix it?
We can fix this with HashMap
’s entry API, but not with the or_insert
nor the
or_insert_with
methods. Instead of those helpers, we use the Entry
enum
instead.
fn content(&mut self, id: &str) -> Result<&str, Error> {
use std::collections::hash_map::Entry;
match self.cache.entry(id.to_owned()) {
Entry::Occupied(entry) => Ok(entry.into_mut()),
Entry::Vacant(entry) => Ok(entry.insert(fetch(id)?))
}
}
entry
returns an Entry
. If occupied, we call
into_mut
to get a mutable reference to the value with the same lifetime as the HashMap
itself. (We don’t use
get
because that’d borrow from the entry itself instead, and that’s going to be
dropped by the end of the function.) Mutable references are coercible to
immutable references. OK!
If vacant,
insert
into the entry and return the value simultaneously.
Now we don’t need to call get
again and deal with the Option
, nice!
It’s a bit weird that we need to clone the asset ID when grabbing the entry, but that seems to be a hard problem.
Conclusion
I hope you took something away from this glorified error dissection! Here’s a small review outline you might find useful:
- Use type aliases or newtypes to increase readability and establish API boundaries.
str
is a UTF-8 text region,&str
is an immutable reference to it.String
is an owned UTF-8 string buffer.String
impl
sDeref<Target=str>
, soString
getsstr
’s methods and you can treat&String
as&str
.- NLL permits some programs that strict lexical lifetimes would reject, but it has its limitations.
- Returning something from a borrow with a lifetime in both the
self
and return position means that borrow lives for the whole function. - The
Entry
API allows for fluent and performantHashMap
usage, letting succinct patterns emerge. If you run into ownership problems vis-à-visHashMap
, see ifEntry
can help.
Hopefully the tangents and asides weren’t too disorienting, but I’d like this post to function as a branching point for other useful resources, ideas, and whatnot. Plus some things are just too important to gloss over!