I implemented a Trie
data structure for Effect. During the process I learned all the details of what is required to build a usable and efficient data structure in Typescript:
- What features to consider when implementing a data structure
- Performance (time and space)
- Stack safety
- Immutability
- How to make the data structure inspectable (
JSON.stringify
,toString
) - How to compare equality by value
- How to make the data structure
Iterable
Data structure features
Some feature of a data structure will radically change its implementation.
While the algorithm behind the data structure may be straightforward, adding some requirements can make the actual implementation more challenging
For the Trie
I implemented I considered 3 requirements:
- Performance
- Stack safety
- Immutability
Performance
This is the obvious (and crucial) feature.
The point of a new data structure is to be more performant for a specific usecase.
Therefore, big-O notation is the first aspect to consider.
Fortunately, most of the times you do not need to calculate this metric yourself, you can find plenty of analysis on the web ππΌββοΈ
β οΈ Important: The performance in the end depends on the internal implementation.
Stack safety
This instead is definitely not obvious (it was not obvious for me ππΌββοΈ).
Every time a recursive function is called some space on the stack (memory) is allocated.
If the recursion is too deep, we risk to reach the limit of the stack and the app will crash π€―
I prevented this issue by not using recursion at all. This introduces some challenges and makes the code more complex.
It's always possible to convert a recursive implementation to non-recursive using an array (manual stack)
let stack: Node<V> = [];
stack.push(root); // Start from the root of the Trie
// Continue until there is something in the stack π
while (stack.length > 0) {
// Pop off end of stack.
const obj = stack.pop();
// Do stuff.
// Push other objects on the stack as needed.
...
}
Immutability
An immutable data structure cannot be modified in-place: methods like insert
and delete
will not update the original variable but return a new updated value.
This requirement can have an impact on performance.
You cannot create a full copy of all the nodes every time you update some value. Instead, I used a technique called Path Copying:
Path Copying: create a copy only of the nodes in the path from the root to the updated node, while keeping all the other nodes shared
Find the node to add, update or delete, trace the path from the root, and create only the updated nodes
In practice this adds even more complexity to the implementation ππΌββοΈ
There is more π€©
Timeless coding principles, practices, and tools that make a difference, regardless of your language or framework, delivered in your inbox every week.
Data structure implementation
The first aspect to consider is the API. For a Trie
these where the main methods:
insert
remove
get
longestPrefixOf
keysWithPrefix
size
Each of these methods require a specific and efficient algorithm.
Fortunately, most of the times you do not need to invent the algorithm yourself
You can search online and find plenty of algorithms with examples. You job is to implement the algorithm considering all the requirements.
Inspectable
Making a data structure inspectable in Typescript requires to implement the following interface:
export interface Inspectable {
toString(): string
toJSON(): unknown
}
In Effect the implementation is the following:
export const toJSON = (x: unknown): unknown => {
if (
hasProperty(x, "toJSON") && isFunction(x["toJSON"]) &&
x["toJSON"].length === 0
) {
return x.toJSON()
} else if (Array.isArray(x)) {
return x.map(toJSON)
}
return x
}
export const format = (x: unknown): string => JSON.stringify(x, null, 2)
const TrieProto: TR.Trie<unknown> = {
// ...
toString() {
return format(this.toJSON())
},
toJSON() {
return {
_id: "Trie",
values: Array.from(this).map(toJSON)
}
},
/// ...
}
Iterable
By implementing the iterable protocol it is possible to allow using a for...of
loop and Array.from
to iterate the values of the data structure:
interface Iterable<T> {
[Symbol.iterator](): Iterator<T>;
}
You can also implement the iterator protocol to allow using the data structure in generator functions (yield
):
interface IterableIterator<T> extends Iterator<T> {
[Symbol.iterator](): IterableIterator<T>;
}
I defined a TrieIterator
that satisfies the iterable protocol:
The
next()
method returns one-by-one all the values when iterating over the data structure
class TrieIterator<in out V, out T> implements IterableIterator<T> {
next(): IteratorResult<T> {
// ...
}
[Symbol.iterator](): IterableIterator<T> {
return new TrieIterator(this.trie)
}
}
Equality
When you compare two instances you usually want the comparison to be based on their values (not by reference).
This requires a custom equality implementation. In Effect this is achieved using the Equal
module:
export const symbolHash: unique symbol = Symbol.for("Hash")
export interface Hash {
[symbolHash](): number
}
export const symbolEqual: unique symbol = Symbol.for("Equal")
export interface Equal extends Hash {
[symbolEqual](that: Equal): boolean
}
For Trie
we have the following:
const TrieProto: TR.Trie<unknown> = {
// ...
[Hash.symbol](): number {
let hash = Hash.hash(TrieSymbolKey)
for (const item of this) {
// Compute the hash for each item in the Trie π
hash ^= pipe(Hash.hash(item[0]), Hash.combine(Hash.hash(item[1])))
}
return hash
},
[Equal.symbol]<V>(this: TrieImpl<V>, that: unknown): boolean {
if (isTrie(that)) {
const entries = Array.from(that)
// Compare each value inside the Trie π
return Array.from(this).every((itemSelf, i) => {
const itemThat = entries[i]
return Equal.equals(itemSelf[0], itemThat[0]) && Equal.equals(itemSelf[1], itemThat[1])
})
}
return false
},
// ...
}
Pipable
Data structures in Effect are also pipable: you have access to a pipe
function to chain method calls:
const trie = Trie.empty<number>().pipe(Trie.insert("sandro", 500))
For reference, you can view here below the standard implementation of the Pipeable
interface:
This can then be used by simply passing an instance of the data structure to pipe
:
const TrieProto: TR.Trie<unknown> = {
// ...
pipe() {
return pipeArguments(this, arguments)
}
}
Refinement
You also want to check if a value is a valid instance of your data structure (similar to isArray
).
We achieve this by assigning a Symbol
parameter internal to the data structure:
const TrieSymbolKey = "effect/Trie"
export const TrieTypeId: TypeId = Symbol.for(TrieSymbolKey) as TypeId
const TypeId: unique symbol = TrieTypeId as TypeId
export type TypeId = typeof TypeId
We then assign TypeId
to the data structure while also defining its type variance:
export interface Trie<out Value> extends Iterable<[string, Value]>, Equal, Pipeable, Inspectable {
readonly [TypeId]: {
readonly _Value: Types.Invariant<Value>
}
}
You can read more about type variance here: Covariant, Contravariant, and Invariant in Typescript
This allows to define an isTrie
refinement function that simply checks the existence of the TypeId
property:
export const isTrie = (u: unknown): u is TR.Trie<unknown> => hasProperty(u, TrieTypeId)
Prototype and Object.create
Having done all the above we can finally define the complete prototype instance for the data structure:
const TrieProto: TR.Trie<unknown> = {
[TrieTypeId]: trieVariance,
[Symbol.iterator]<V>(this: TrieImpl<V>): Iterator<[string, V]> { /** */ },
[Hash.symbol](): number { /** */ },
[Equal.symbol]<V>(this: TrieImpl<V>, that: unknown): boolean { /** */ },
toString() { /** */ },
toJSON() { /** */ },
[NodeInspectSymbol]() { /** */ },
pipe() { /** */ }
}
We can then create a new instance of Trie
by using Object.create
:
Object.create()
creates a new object using an existing object as the prototype
const makeImpl = <V>(root: Node.Node<V> | undefined): TrieImpl<V> => {
const trie = Object.create(TrieProto)
trie._root = root
trie._count = root?.count ?? 0
return trie
}
This is all!
Internal structure of the final implementation of Trie.
There are indeed a lot of aspect to consider when implementing your own data structure.
Nonetheless, remember that the hard part is then the actual implementation. I needed to make sure to satisfy all the requirements.
You can view and read the final implementation on Github:
If you are interested to learn more, every week I publish a new open source project and share notes and lessons learned in my newsletter. You can subscribe here below π
Thanks for reading.