Transient trees in C
              
              You may have heard about persistent data structures, that is immutable structures where every update returns a new version of the structure instead of modifying it in place.
I’ve been interested in persistent structures lately since I’ve been toying with the idea of building a text editor, and they have a couple of very desirable properties:
- implementing undo is trivial, since you can just store a list of previous versions of the structure; 
- they are safe to access from multiple threads, so syntax highlighting on large files can be offloaded to another thread while the buffer remains editable. 
The downside is that persistent operations require allocating and copying; a smart implementation can mitigate the performance impact by sharing as many parts as possible between different versions, but some overhead is inevitable. For this reason sometimes it may be useful to forgo immutability. In Clojure, this is achieved by using transients.
In pseudocode, usage looks like this:
let n = 1000000
let v = new persistent vector
// persisent
for i from 0 to n
    v = v.push(i)
// transient
t = v.transient()
for i from 0 to n
    t.push(i)
v = t.persistent()what happens is that the first loop will create (light) copies of v at each
              iteration, while the second may modify the vector directly1 - in hot paths
              this may end up saving a lot of CPU time. Naturally, transients don’t have the
              properties I mentioned above (in particular they are not thread-safe), so you
              usually want to come back to a persistent version as soon as possible.
Making transience the default
Clojure’s approach is nice in a functional context, but I thought that turning it inside out could make it fit better with an imperative language.
The idea is this: we treat the structure as mutable by default, but with the option of taking a persistent snapshot; updates on said snapshots have a copy-on-write semantics. It’s worth noting that mutable structures may have some immutable subparts, but this is handled transparently by the update procedures.
Let’s see an example of how this could be implemented in practice in C.
A transient binary search tree
This is the definition of our tree:
struct tree {
    atomic_int rc;
    bool is_const;
    int key;
    char const *value;
    struct tree *left, *right;
};As you can see it’s a BST augmented with two fields: a reference count3 and
              a boolean flag. is_const is handled by following these rules:
- if - is_constis true, the node is immutable. Changing its fields other than- rcis forbidden, so a mutable copy will be made if updating is required;
- is_constis propagated lazily, so that making a tree persistent is just a matter of setting- is_constto true on the root. When we descend down the tree to add a leaf or edit a value, if we encounter a node coming from an immutable parent, we’ll set that node’s flag (see- tree_putlater). Additionally, if a node is a shared child of multiple parents and at least one of them is immutable, then it too needs to be immutable4 (this will be relevant in the implementation of- tree_clone).
Here are some simple functions to create, increase the reference count, and free tree nodes:
struct tree *
tree_new(int k, char const *v, struct tree *l, struct tree *r)
{
    struct tree *t = malloc(sizeof(*t));
    if (!t) return 0;
    *t = (struct tree){
        .rc = 1,
        .key = k,
        .value = v,
        .left = l,
        .right = r
    };
    return t;
}
              
void
tree_hold(struct tree *t)
{
    if (atomic_fetch_add(&t->rc, 1) == 0)
        exit(1);
}void
tree_free(struct tree *t)
{
    if (atomic_fetch_sub(&t->rc, 1) != 1)
        return;
    if (t->left)
        tree_free(t->left);
    if (t->right)
        tree_free(t->right);
    free(t);
}There’s nothing interesting going on here. Nodes are created with is_const
              set to false and rc to 1.
Getting a value is no different than what we’d have in a mutable BST:
char const *
tree_get(struct tree *t, int k)
{
    while (t) {
        if (t->key == k)
            return t->value;
        t = k > t->key ? t->right : t->left;
    }
    return 0;
}The meat of the problem is, of course, insertion. We’ll need an helper function that creates a mutable shallow copy of a node:
struct tree *
tree_clone(struct tree *t)
{
    struct tree *res = malloc(sizeof(*res));
    if (!res) return 0;
    memcpy(res, t, sizeof(*t));
    res->rc = 1;
    res->is_const = false;
    if (res->left) tree_hold(res->left);
    if (res->right) tree_hold(res->right);
    if (t->is_const) {
        if (t->left)
            t->left->is_const = true;
        if (t->right)
            t->right->is_const = true;
    }
    return res;
}There’s a couple of things to note here:
- by copying - twe’re creating new references to its children, so we need to increase their reference counts;
- as mentioned before, if the original node was immutable then - is_constmust be propagated to its children (again, ⁴).
Now we’re ready to define tree_put. This is how it works:
- descend down the tree, selecting the appropriate child as usual in a BST; 
- if the current node - tis immutable, create a mutable copy with- tree_cloneand use it to replace- tin- t’s parent (we’re deleting a reference to- t, so call- tree_freeon it);
- if a node is found that matches the provided key, write the desired value on that node and return; 
- otherwise, when a leaf is reached add a new node with the provided key-value pair as a child. 
You may have noticed that these steps require modifying some nodes. However, step 2 means that we maintain the following invariant:
At the start of each iteration,
t’s parent is mutable.
This guarantees that step 2 and 4 are valid. Additionally, step 3 is executed
              after step 2, so t itself is mutable and changing its value field is
              allowed.
tree_put’s return value is the provided tree if it is mutable, its mutable
              copy if it isn’t.
Here’s the code:
struct tree *
tree_put(struct tree *t, int k, char const *v)
{
    struct tree *root = t;
    struct tree **p = &root;
    while (t) {
        if (t->is_const) {
            t = tree_clone(t);
            if (!t) return 0;
            tree_free(*p);
            *p = t;
        }
        if (t->key == k) {
            t->value = v;
            return root;
        }
        p = k > t->key ? &t->right : &t->left;
        t = *p;
    }
    *p = tree_new(k, v, 0, 0);
    if (!*p) return 0;
    return root;
}There’s only one piece missing: a way to make a tree persistent. This is very
              easy thanks to the lazy propagation of is_const:
void
tree_persist(struct tree *t)
{
    t->is_const = true;
    tree_hold(t);
}Using the tree
Let’s see an example of how we might use our transient BST6:
struct tree *t = tree_new(0, "root", 0, 0);
t = tree_put(t, -10, "foo");
t = tree_put(t, -5, "bar");
t = tree_put(t, 10, "baz");
t = tree_put(t, 5, "fizz");
t = tree_put(t, 15, "buzz");
t = tree_put(t, 7, "pippo");This creates the following tree:
At the moment the tree is still completely mutable, so the value returned by
              tree_put is t itself:
struct tree *u
u = tree_put(t, 0, "tree");
assert(t == v);We can take a persistent snaphost:
tree_persist(t);
u = tree_put(t, 6, "pluto");Now we have the following:
But what’s intersting is that t and u share some structure (left is t,
              right is u):
If we keep modifying u, they will gradually diverge:
u = tree_put(u, -5, "car");u’s nodes that also belong to t are immutable, the rest is mutable.
Remember to clean up after yourself:
tree_free(t);
tree_free(u);Conclusions
First of all, you can find the full code on Codeberg: dalz/transient-tree.
I haven’t shown it here, but after calling tree_persist on a tree, that tree
              becomes safe to pass between threads.
Modifying a BST is pretty simple, but things get more hairy with complex data
              structures that have multiple, more involved update procedures (in contrast
              with a single tree_put). Tracking immutability and reference counts in your
              head starts to be annoying in that case. This could be solved with a stronger
              type system that provides better static guarantees7; but this is C, so just
              be smarter?
- Just to be clear, this does not mean that modifying - twill change- v: as you can see we need to assign- t’s value to- vafter the loop.↩︎
- As the - valuefield is a pointer, other parts of the code could modify the value it points to without calling our procedures. This would mean that changes to the value would not respect persistence. You can remove the- constqualifier if this is fine for your use case.↩︎
- Not strictly necessary; you can replace it with a garbage collector, or even ignore memory management if you know you’ll keep every snapshot for the whole execution of the program.↩︎ 
- In case it’s not clear why, consider the following example: node C is child of mutable node A and immutable node B. If we update C coming from B we’ll see it as immutable (since we need to propagate - is_const), but if we did the same from A we’d treat it as mutable. Obviously C can’t be both mutable and immutable at the same time.↩︎
- If at the time - tree_holdis called the reference count is 0, the tree is being freed and holding it is not possible. A serious program could try to handle this gracefully, but here we’ll just exit.↩︎
- I’m not checking the return value of - tree_newand- tree_putfor null, but you should.↩︎
- We could for instance define different types for mutable and immutable nodes.↩︎