Today we are going to try to implement self-references in safe Rust!
source: https://pixabay.com/fr/photos/fleur-fleur-blanche-ros%C3%A9e-3402550/ [CC0 licence]

First what is a self-reference. It’s a reference that points into the same struct where the reference lives. For example:

struct World {
    entities: Container<Entity>,
    player: &'??? mut Entity, // player is a reference to one of the `Entity` stored into `entities`
}

We want player to point into entities. But if so, what is the lifetime of that reference? It’s the same lifetime as the lifetime of the object itself. You can’t express this currently in Rust.

Do we really need language support for self-references (&'self)?

First of all, let assume that &'self T allow to create a reference pointing into the same structure. Let’s take a look of the operations that a self reference must support using some imaginary syntax:

// declaration
struct World {
    entities: Container<Entity>,
    player: &'self mut Entity, // player is a reference to one of the `Entity` stored into `entities`
}

// initialization
let w = World {
    entities: 
    player: self.entities[0],
}

// shared access
let _: &Entity = w.player;
display(&w.player);

// mutable access
let _: &mut Entity = w.player;
set(&mut w.player, ...);
*w.player = ...;

// reset the reference
w.player = w.entities[2];

First of all, how do we represent it? Container is something very simple like a struct, a tuple or an array, a simple integer representing the offset in bytes starting from the address of the variable of type World is enough. If Container is something more complex like a Box, Vec or HashMap we need a constant offset to find the pointer that we need to dereference, then another offset to access inside the linear memory of that container. The same logic can be re-used if Container has multiple level of dereference like a 2D Matrix represented by a Vec<Vec<T>>. In all those cases, our self reference is just a list of offset (in bytes) known at compile time. However more complex node-based Containor (like Map or List) cannot be represented by a constant number of offset. The number of indirection can only be known at run-time.

So let’s take a totally different approach. Instead of storing one (or more) integer representing the offset from the address of the base, let’s represent our self reference with a function that describe how to retrieve this reference. And since such operation can be arbitrary complex (especially for a node-based container), we could use a function pointer: fn(&'a Base) -> &'a Target.

let w = World {
    entities: 
    player: |this| this.entities[0],
}

What is very nice is that we don’t even need a special syntax for the initialization. However this example works well because the index is a constant. If it was a variable, we could use a closure that capture the variable used for indexing, but this would make the size of the object unspecified. Instead we can explicitly store the index next to it:

let index = 0;
let w = World {
    entities: 
    player: (|this, idx| this.entities[idx], &index);
}

Using a function pointer is not really ergonomic:

let entity: &Entity = (w.player.0)(&w, w.player.1);

So, let’s refactor it into a struct:

struct SelfReference<Base, Idx, Target> {
    get: for <'a> fn (&'a Base, Idx) -> &'a Target,
    idx: Idx,
}
impl <Base, Index, Target> SelfReference<Base, Index, Target>
where Index: Copy {
    fn get<'a>(&self, base: &'a Base) -> &'a Target {
        (self.get)(base, self.idx)
    }
    fn set(&mut self, idx: Index) {
        self.idx = idx;
    }
}

Perfect! Now we can access (read-only) and reset our self-reference. We can even have quite easily multiple self-references in the same struct.

let current_monster_idx = 1;
let mut w = World {
    entities: vec![Entity("player"), Entity("monster"), Entity("boss")],
    player: SelfReference {
        get: |this, &idx| &this.entities[idx],
        idx: 0,
    },
    monster: SelfReference {
        get: |this, &idx| &this.entities[idx],
        idx: current_monster_idx,
    },
};

// shared access
let _: &Entity = dbg!(w.player.get(&w));
println!("{:?}", &w.monster.get(&w));

// reset the reference
w.monster.set(&2);
dbg!(w.monster.get(&w));

The current approach will have to be modified a bit if we want to support mutable access. Indeed, if we suppose that we have a function get_mut with the same interface as get (with some extra mut), we would not be able to use it the same way:

error[E0502]: cannot borrow `w` as mutable because it is also borrowed as immutable
  --> src/main.rs:52:43
   |
52 |     let e: &mut Entity = w.player.get_mut(&mut w);
   |                          -----------------^^^^^^-
   |                          |        |       |
   |                          |        |       mutable borrow occurs here
   |                          |        immutable borrow later used by call
   |                          immutable borrow occurs here

So we will have to modify the interface to only pass the container, instead of the whole object to get_mut (and get for uniformity).

let e: &mut Entity = w.player.get_mut(&mut w.entities);

A full working example can be found on the playground.

struct SelfReference<Container, Idx, Target> {
    get: for <'a> fn (&'a Container, Idx) -> &'a Target,
    get_mut: for <'a> fn (&'a mut Container, Idx) -> &'a mut Target,
    idx: Idx,
}
impl <Container, Index, Target> SelfReference<Container, Index, Target>
where Index: Copy,
{
    fn get<'a>(&self, container: &'a Container) -> &'a Target {
        (self.get)(container, self.idx)
    }
    fn get_mut<'a>(&self, container: &'a mut Container) -> &'a mut Target {
        (self.get_mut)(container, self.idx)
    }
    fn set(&mut self, idx: Index) {
        self.idx = idx;
    }
}
struct World {
    entities: Container<Entity>,
    player: SelfReference<Container<Entity>, usize, Entity>,
    monster: SelfReference<Container<Entity>, usize, Entity>,
}

fn main() {
    let mut w = World {
        entities: vec![Entity("player"), Entity("monster"), Entity("boss")],
        player: SelfReference {
            get: |container, idx| &container[idx],
            get_mut: |container, idx| &mut container[idx],
            idx: 0,
        },
        monster: SelfReference {
            get: |container, idx| &container[idx],
            get_mut: |container, idx| &mut container[idx],
            idx: 1,
        },
    };

    // shared access
    let _: &Entity = dbg!(w.player.get(&w.entities));
    println!("{:?}", &w.player.get(&w.entities));

    // mutable access
    let e: &mut Entity = w.player.get_mut(&mut w.entities);
    e.0 = "frightened player";

    fn set(entity: &mut Entity) {
        entity.0 = "wounded player"
    }
    set(w.player.get_mut(&mut w.entities));

    w.player.get_mut(&mut w.entities).0 = "ghost of player";

    // reset the reference
    w.monster.set(2);
}

This is the point where I should stop this post, since we have demonstrated that we can have the semantic of a self-reference without needed language support.

The rest of this posts will discuss what modifications to the language could make self-references more ergonomic. This is going to be more controversial, but fortunately none of those language modifications are needed. And consider all of those ideas as a draft. They are at best at the pre-pre RFC stage.

Forwarding mut

The most useful addition to the language would probably be the ability to forward mut from an input reference to an output reference, as long as the function itself only require an immutable reference. For example fn get(input: & ~mut Input) -> & ~mut Output would return an &Output if we call it with an &Input and would return an &mut Output if we call it with an &mut Input. In term of syntax I’m using & ~mut to match ~const, but I personally would have preferred ?mut. Side note: I’m not sure it’s possible to extent it to also return a value if a value is passed as input.

Being able to forward mut would help immensely to reduce the duplication of get/get_mut pair in API design, where both function generate the same assembly. If I’m not mistaken, it’s usually the case with any kind of container. The fact that Index and IndexMut, as well as Deref and DerefMut uses the same syntax seems good indices that such proposal would be useful.

If forwarding mut was adopted, it would probably be possible to store a single function pointer, and expose a single fn get<'a>(&self, &'a ~mut Container) -> &'a ~mut Target function.

Non-opaque capturing closure

The type of capturing closure is currently opaque. If it was possible to make it visible, it would be possible to store a capturing closure without using something like Box<&dyn Fn(...) -> ...> since such type would be Sized. Let’s assume that the syntax is fn(..., move foo: usize) -> ... for a function pointer capturing an usize (named foo), and |..., move foo: usize| ... a closure that capture an usize named foo.

If the language had non-opaque capture closure, and especially if forwarding mut was adopted too, SelfReference would just be a simple fn(& ~mut Container, move Idx) -> & ~mut Target, and reseting it would be much more natural: w.monster = |& ~mut entities, move idx| & ~mut entities[idx].

Without forwarding mut it would be much less useful, because you would still have to repeat the code of the closure, and the compiler would have somewhat understand that both function pointer share the same captures.

Direct language support for self-references

What was not really nice, was that we had to manually pass the container into the closure (w.player.get_mut(&mut w.entities)) instead of just being able to call w.player.get_mut(). If Rust had language support, it should probably be possible to be able to write the later, and have the compiler add automatically the implicit reference to the container. This would also allow to write the closure more naturally (|self, idx| &mut self.entities[idx] instead of |container, idx| &mut container[idx]) without the borrow checker complaining that the base (World) is captured both immutably (when accessing immutably the self-reference) and mutably (accessing (possibly mutably) the Entity inside the container).

Having language support would also allow to implement Deref and DerefMut. However, I’m not 100% convinced that it would be a good idea since the closure used to access to the self-reference can be arbitrary complex.

Everything together

type Container<T> = Vec<T>;

#[derive(Debug)]
struct Entity(&'static str);

struct World {
    entities: Container<Entity>,
    player: fn(&~mut self, move usize) -> &~mut Entity,
    monster: fn(&~mut self, move usize) -> &~mut Entity,
}

fn main() {
    let mut w = World {
        entities: vec![Entity("player"), Entity("monster"), Entity("boss")],
        player: |self, move idx=0| &~mut self.container[idx],
        monster: |self, move idx=1| &~mut self.container[idx],
    };

    // shared access
    let _: &Entity = dbg!(*w.player);
    println!("{:?}", *w.player);

    // mutable access
    let e: &mut Entity = *w.player;
    e.0 = "frightened player";
    println!("{:?}", *w.player);

    fn set(entity: &mut Entity) {
        entity.0 = "wounded player"
    }
    set(*w.player);
    println!("{:?}", *w.player);

    w.player.0 = "ghost of player";
    println!("{:?}", *w.player);

    // reset the reference
    w.monster = |self, move idx=2| &~mut self.container[idx],
    println!("{:?}", *w.monster);
}

EDIT: I realized that there is a use-case that I totally didn’t covered. My solution works only when creating the reference is cheap (like when indexing into a Vec a HashMap or a Map), but not when it’s expensive (like parsing a file). It would be extremely costly to use the solution I proposed to the folowing snippet (you would have to reparse the object each time you want to access parsed_file):

struct OwningFile {
    data: Vec<u8>,
    parsed_file: object::File<'self>,
}

OwningFile {
     data: large_blob,
     parsed_file: object::File::parse(&self.data).unwrap(),
}

For those interested, you can also take a look at self_cell and ouroboros

Discuss-it on reddit on on the user forum.

Creative Commons License