A Tour of Immutable Data Structures

An immutable data structure is one that cannot be modified after it is created. This seems to imply that any changes to the data structure require the entire object to be copied into the new, modified object. However, that's not the case. It's easiest to see why with a simple example:

Linked Lists

Consider an immutable singly-linked linked list:

d -> c -> b -> a

Let's call this list L. L can be stored with just a reference to the head of the list, the element d. Suppose we want to add an element ("e") to the head of the list, creating the new list L':

e -> d -> c -> b -> a

We now have two lists: L' (starting at the element e) and L (starting at the element d). We did not have to copy L to a new list to create L'

Now, suppose we want to add a list to the end of L: If we add it directly:

d -> c -> b -> a -> x

Then we will have modified L, because anyone accessing L (starting at d) now has an extra element on the the end of their list. To add an element to the end of L, we need to fully copy it:

d -> c -> b -> a
d -> c -> b -> a -> x

What's the difference between adding to the head of the list, and adding to the tail? No elements in L have pointers to the element a, but all the elements in L have pointers to the element x.

General Immutable Data Structures

We can generalize idea into a rule for general immutable data structures, as follows. Given two elements x, y in a data structure, say that y is reachable from x if there is a directed path of pointers from x to y. In the linked list L above, a is reachable from d, but c is not reachable from b. The rule is:

When modifying an immutable data structure at a given position, we must copy all elements from which that position is reachable.

In our linked list example, a modification to b requires d and c to be copied--but not a.

This idea turns out to be quite powerful. It implies that the immutable data structures which take the least time and space to modify are those containing only short paths. In a binary tree, modifying (that is, inserting, deleting, or changing the value of) a node requires that only the ancestors of the node being modified be copied to the new structure. This implies that creating a new binary tree with a modified element takes O(log n) time and space (assuming the original tree is balanced). Trees with higher fanout are even better: A tree with a fan-out of 64 and height of 10 can contain over 1017 elements, but modifying such a tree to create a new one takes only O(1) time and space. In fact, Scala's immutable Vector data structure use a structure like this.

Why You Should Use Immutable Data Structures

  • They're inherently thread-safe.
  • You can always copy an immutable object "by reference", because it will never change on you.
  • Code using immutable object is easier to read. After an immutable object is assigned to a value, it will never change.
  • As discussed above, performance is often comparable to that of mutable data structures.