The Merk Tree — A Merkle AVL Tree
The Merk tree is the fundamental building block of GroveDB. Every subtree in the grove is a Merk tree — a self-balancing binary search tree where every node is cryptographically hashed, producing a single root hash that authenticates the entire tree's contents.
What is a Merk Node?
Unlike many Merkle tree implementations where data lives only at the leaves, in a Merk tree every node stores a key-value pair. This means there are no "empty" internal nodes — the tree is both a search structure and a data store simultaneously.
graph TD
subgraph TreeNode
subgraph inner["inner: Box<TreeNodeInner>"]
subgraph kv["kv: KV"]
KEY["<b>key:</b> Vec<u8><br/><i>e.g. b"alice"</i>"]
VAL["<b>value:</b> Vec<u8><br/><i>serialized Element bytes</i>"]
FT["<b>feature_type:</b> TreeFeatureType<br/><i>BasicMerkNode | SummedMerkNode(n) | ...</i>"]
VH["<b>value_hash:</b> [u8; 32]<br/><i>H(varint(value.len) ‖ value)</i>"]
KVH["<b>hash:</b> [u8; 32] — the kv_hash<br/><i>H(varint(key.len) ‖ key ‖ value_hash)</i>"]
end
LEFT["<b>left:</b> Option<Link>"]
RIGHT["<b>right:</b> Option<Link>"]
end
OLD["<b>old_value:</b> Option<Vec<u8>> — previous value for cost deltas"]
KNOWN["<b>known_storage_cost:</b> Option<KeyValueStorageCost>"]
end
LEFT -->|"smaller keys"| LC["Left Child<br/><i>Link::Reference | Modified<br/>Uncommitted | Loaded</i>"]
RIGHT -->|"larger keys"| RC["Right Child<br/><i>Link::Reference | Modified<br/>Uncommitted | Loaded</i>"]
style kv fill:#eaf2f8,stroke:#2980b9
style inner fill:#fef9e7,stroke:#f39c12
style TreeNode fill:#f9f9f9,stroke:#333
style LC fill:#d5f5e3,stroke:#27ae60
style RC fill:#d5f5e3,stroke:#27ae60
In code (merk/src/tree/mod.rs):
#![allow(unused)] fn main() { pub struct TreeNode { pub(crate) inner: Box<TreeNodeInner>, pub(crate) old_value: Option<Vec<u8>>, // Previous value for cost tracking pub(crate) known_storage_cost: Option<KeyValueStorageCost>, } pub struct TreeNodeInner { pub(crate) left: Option<Link>, // Left child (smaller keys) pub(crate) right: Option<Link>, // Right child (larger keys) pub(crate) kv: KV, // The key-value payload } }
The Box<TreeNodeInner> keeps the node on the heap, which is essential since child
links can recursively contain entire TreeNode instances.
The KV Structure
The KV struct holds both the raw data and its cryptographic digests
(merk/src/tree/kv.rs):
#![allow(unused)] fn main() { pub struct KV { pub(super) key: Vec<u8>, // The lookup key pub(super) value: Vec<u8>, // The stored value pub(super) feature_type: TreeFeatureType, // Aggregation behavior pub(crate) value_defined_cost: Option<ValueDefinedCostType>, pub(super) hash: CryptoHash, // kv_hash pub(super) value_hash: CryptoHash, // H(value) } }
Two important points:
-
Keys are not stored on disk as part of the encoded node. They are stored as the RocksDB key. When a node is decoded from storage, the key is injected from the outside. This avoids duplicating key bytes.
-
Two hash fields are maintained. The
value_hashisH(value)and thehash(kv_hash) isH(key, value_hash). Keeping both allows the proof system to choose how much information to reveal.
The Semi-Balanced Nature — How AVL "Wobbles"
A Merk tree is an AVL tree — the classic self-balancing binary search tree invented by Adelson-Velsky and Landis. The key invariant is:
For every node, the height difference between its left and right subtrees is at most 1.
This is expressed as the balance factor:
balance_factor = right_height - left_height
Valid values: {-1, 0, 1}
#![allow(unused)] fn main() { // merk/src/tree/mod.rs pub const fn balance_factor(&self) -> i8 { let left_height = self.child_height(true) as i8; let right_height = self.child_height(false) as i8; right_height - left_height } }
But here's the subtle point: while each individual node can only lean by one level, these leanings can compound through the tree. This is why we call it "semi-balanced" — the tree isn't perfectly balanced like a complete binary tree.
Consider a tree of 10 nodes. A perfectly balanced tree would have height 4 (⌈log₂(10+1)⌉). But an AVL tree might have height 5:
Perfectly balanced (height 4) — every level fully packed:
graph TD
N5["5<br/><small>bf=0</small>"]
N3["3<br/><small>bf=0</small>"]
N8["8<br/><small>bf=0</small>"]
N2["2<br/><small>bf=0</small>"]
N4["4<br/><small>bf=0</small>"]
N6["6<br/><small>bf=0</small>"]
N9["9<br/><small>bf=+1</small>"]
N10["10<br/><small>bf=0</small>"]
N5 --- N3
N5 --- N8
N3 --- N2
N3 --- N4
N8 --- N6
N8 --- N9
N9 --- N10
style N5 fill:#d4e6f1,stroke:#2980b9
AVL-valid "wobble" (height 5) — each node leans at most 1, but it compounds:
graph TD
N4["4<br/><small>bf=+1</small>"]
N2["2<br/><small>bf=-1</small>"]
N7["7<br/><small>bf=+1</small>"]
N1["1<br/><small>bf=-1</small>"]
N3["3<br/><small>bf=0</small>"]
N5["5<br/><small>bf=0</small>"]
N9["9<br/><small>bf=-1</small>"]
N0["0<br/><small>bf=0</small>"]
N8["8<br/><small>bf=0</small>"]
N10["10<br/><small>bf=0</small>"]
N4 --- N2
N4 --- N7
N2 --- N1
N2 --- N3
N7 --- N5
N7 --- N9
N1 --- N0
N9 --- N8
N9 --- N10
style N4 fill:#fadbd8,stroke:#e74c3c
Height 5 vs the perfect 4 — that's the "wobble". Worst case: h ≤ 1.44 × log₂(n+2).
Both trees are valid AVL trees! The worst-case height of an AVL tree is:
h ≤ 1.4404 × log₂(n + 2) − 0.3277
So for n = 1,000,000 nodes:
- Perfect balance: height 20
- AVL worst case: height ≈ 29
This ~44% overhead is the price of AVL's simple rotation rules. In practice, random insertions produce trees much closer to perfectly balanced.
Here's what valid and invalid trees look like:
VALID — all balance factors in {-1, 0, +1}:
graph TD
subgraph balanced["Balanced (bf=0)"]
D1["D<br/>bf=0"] --- B1["B<br/>bf=0"]
D1 --- F1["F<br/>bf=0"]
B1 --- A1["A"] & C1["C"]
F1 --- E1["E"] & G1["G"]
end
subgraph rightlean["Right-leaning (bf=+1)"]
D2["D<br/>bf=+1"] --- B2["B<br/>bf=0"]
D2 --- F2["F<br/>bf=0"]
B2 --- A2["A"] & C2["C"]
F2 --- E2["E"] & G2["G"]
end
subgraph leftlean["Left-leaning (bf=-1)"]
D3["D<br/>bf=-1"] --- B3["B<br/>bf=-1"]
D3 --- E3["E"]
B3 --- A3["A"]
end
style balanced fill:#d5f5e3,stroke:#27ae60
style rightlean fill:#d5f5e3,stroke:#27ae60
style leftlean fill:#d5f5e3,stroke:#27ae60
INVALID — balance factor = +2 (needs rotation!):
graph TD
B["B<br/><b>bf=+2 ✗</b>"]
D["D<br/>bf=+1"]
F["F<br/>bf=0"]
B --- D
D --- F
style B fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
The right subtree is 2 levels taller than the left (which is empty). This triggers a left rotation to restore the AVL invariant.
Rotations — Restoring Balance
When an insertion or deletion causes a balance factor to reach ±2, the tree must be rotated to restore the AVL invariant. There are four cases, reducible to two fundamental operations.
Single Left Rotation
Used when a node is right-heavy (bf = +2) and its right child is right-heavy or balanced (bf ≥ 0):
Before (bf=+2):
graph TD
A["A<br/><small>bf=+2</small>"]
t1["t₁"]
B["B<br/><small>bf≥0</small>"]
X["X"]
C["C"]
A --- t1
A --- B
B --- X
B --- C
style A fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
After left rotation — B promoted to root:
graph TD
B2["B<br/><small>bf=0</small>"]
A2["A"]
C2["C"]
t12["t₁"]
X2["X"]
B2 --- A2
B2 --- C2
A2 --- t12
A2 --- X2
style B2 fill:#d5f5e3,stroke:#27ae60,stroke-width:3px
Steps: (1) Detach B from A. (2) Detach X (B's left child). (3) Attach X as A's right child. (4) Attach A as B's left child. The subtree rooted at B is now balanced.
In code (merk/src/tree/ops.rs):
#![allow(unused)] fn main() { fn rotate<V>(self, left: bool, ...) -> CostResult<Self, Error> { // Detach child on the heavy side let (tree, child) = self.detach_expect(left, ...); // Detach grandchild from opposite side of child let (child, maybe_grandchild) = child.detach(!left, ...); // Attach grandchild to original root tree.attach(left, maybe_grandchild) .maybe_balance(...) .flat_map_ok(|tree| { // Attach original root as child of promoted node child.attach(!left, Some(tree)) .maybe_balance(...) }) } }
Note how maybe_balance is called recursively — the rotation itself might create
new imbalances that need further correction.
Double Rotation (Left-Right)
Used when a node is left-heavy (bf = -2) but its left child is right-heavy (bf > 0). A single rotation would not fix this:
Step 0: Before — C is left-heavy (bf=-2) but its left child A leans right (bf=+1). A single rotation won't fix this:
graph TD
C0["C<br/><small>bf=-2</small>"]
A0["A<br/><small>bf=+1</small>"]
t4["t₄"]
t1["t₁"]
B0["B"]
t2["t₂"]
t3["t₃"]
C0 --- A0
C0 --- t4
A0 --- t1
A0 --- B0
B0 --- t2
B0 --- t3
style C0 fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
Step 1: Left-rotate child A — now both C and B lean left, fixable by a single rotation:
graph TD
C1["C<br/><small>bf=-2</small>"]
B1["B"]
t41["t₄"]
A1["A"]
t31["t₃"]
t11["t₁"]
t21["t₂"]
C1 --- B1
C1 --- t41
B1 --- A1
B1 --- t31
A1 --- t11
A1 --- t21
style C1 fill:#fdebd0,stroke:#e67e22,stroke-width:2px
Step 2: Right-rotate root C — balanced!
graph TD
B2["B<br/><small>bf=0</small>"]
A2["A"]
C2["C"]
t12["t₁"]
t22["t₂"]
t32["t₃"]
t42["t₄"]
B2 --- A2
B2 --- C2
A2 --- t12
A2 --- t22
C2 --- t32
C2 --- t42
style B2 fill:#d5f5e3,stroke:#27ae60,stroke-width:3px
The algorithm detects this case by comparing the parent's lean direction with the child's balance factor:
#![allow(unused)] fn main() { fn maybe_balance<V>(self, ...) -> CostResult<Self, Error> { let balance_factor = self.balance_factor(); if balance_factor.abs() <= 1 { return Ok(self); // Already balanced } let left = balance_factor < 0; // true if left-heavy // Double rotation needed when child leans opposite to parent let tree = if left == (self.tree().link(left).unwrap().balance_factor() > 0) { // First rotation: rotate child in opposite direction self.walk_expect(left, |child| child.rotate(!left, ...).map_ok(Some), ... ) } else { self }; // Second (or only) rotation tree.rotate(left, ...) } }
Batch Operations — Building and Applying
Rather than inserting elements one at a time, Merk supports batch operations that apply multiple changes in a single pass. This is critical for efficiency: a batch of N operations on a tree of M elements takes O((M + N) log(M + N)) time, versus O(N log M) for sequential inserts.
The MerkBatch Type
#![allow(unused)] fn main() { type MerkBatch<K> = [(K, Op)]; enum Op { Put(Vec<u8>, TreeFeatureType), // Insert or update with value and feature type PutWithSpecializedCost(...), // Insert with predefined cost PutCombinedReference(...), // Insert reference with combined hash Replace(Vec<u8>, TreeFeatureType), Patch { .. }, // Partial value update Delete, // Remove key DeleteLayered, // Remove with layered cost DeleteMaybeSpecialized, // Remove with optional specialized cost } }
Strategy 1: build() — Building from Scratch
When the tree is empty, build() constructs a balanced tree directly from the
sorted batch using a median-split algorithm:
Input batch (sorted): [A, B, C, D, E, F, G] — pick middle (D) as root, recurse on each half:
graph TD
D["<b>D</b><br/><small>root = mid(0..6)</small>"]
B["<b>B</b><br/><small>mid(A,B,C)</small>"]
F["<b>F</b><br/><small>mid(E,F,G)</small>"]
A["A"]
C["C"]
E["E"]
G["G"]
D --- B
D --- F
B --- A
B --- C
F --- E
F --- G
style D fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style B fill:#d5f5e3,stroke:#27ae60
style F fill:#d5f5e3,stroke:#27ae60
Result: perfectly balanced tree with height = 3 = ⌈log₂(7)⌉.
#![allow(unused)] fn main() { fn build(batch: &MerkBatch<K>, ...) -> CostResult<Option<TreeNode>, Error> { let mid_index = batch.len() / 2; let (mid_key, mid_op) = &batch[mid_index]; // Create root node from middle element let mid_tree = TreeNode::new(mid_key.clone(), value.clone(), None, feature_type)?; // Recursively build left and right subtrees let left = Self::build(&batch[..mid_index], ...); let right = Self::build(&batch[mid_index + 1..], ...); // Attach children mid_tree.attach(true, left).attach(false, right) } }
This produces a tree with height ⌈log₂(n)⌉ — perfectly balanced.
Strategy 2: apply_sorted() — Merging into Existing Tree
When the tree already has data, apply_sorted() uses binary search to find
where each batch operation belongs, then recursively applies operations to the left
and right subtrees:
Existing tree with batch [(B, Put), (F, Delete)]:
Binary search: B < D (go left), F > D (go right).
Before:
graph TD
D0["D"] --- C0["C"]
D0 --- E0["E"]
E0 --- F0["F"]
style D0 fill:#d4e6f1,stroke:#2980b9
After applying batch and rebalancing:
graph TD
D1["D"] --- B1["B"]
D1 --- E1["E"]
B1 --- C1["C"]
style D1 fill:#d5f5e3,stroke:#27ae60
B inserted as left subtree, F deleted from right subtree.
maybe_balance()confirms bf(D) = 0.
#![allow(unused)] fn main() { fn apply_sorted(self, batch: &MerkBatch<K>, ...) -> CostResult<...> { let search = batch.binary_search_by(|(key, _)| key.cmp(self.tree().key())); match search { Ok(index) => { // Key matches this node — apply operation directly // (Put replaces value, Delete removes node) } Err(mid) => { // Key not found — mid is the split point // Recurse on left_batch[..mid] and right_batch[mid..] } } self.recurse(batch, mid, exclusive, ...) } }
The recurse method splits the batch and walks left and right:
#![allow(unused)] fn main() { fn recurse(self, batch: &MerkBatch<K>, mid: usize, ...) { let left_batch = &batch[..mid]; let right_batch = &batch[mid..]; // or mid+1 if exclusive // Apply left batch to left subtree let tree = self.walk(true, |maybe_left| { Self::apply_to(maybe_left, left_batch, ...) }); // Apply right batch to right subtree let tree = tree.walk(false, |maybe_right| { Self::apply_to(maybe_right, right_batch, ...) }); // Re-balance after modifications tree.maybe_balance(...) } }
Node Removal
When deleting a node with two children, Merk promotes the edge node from the taller subtree. This minimizes the chance of needing additional rotations:
Before — deleting D (has two children, right subtree height ≥ left):
graph TD
D["D ✗ delete"]
B0["B"]
F0["F"]
A0["A"]
C0["C"]
E0["E ← successor"]
G0["G"]
D --- B0
D --- F0
B0 --- A0
B0 --- C0
F0 --- E0
F0 --- G0
style D fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style E0 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
After — E (leftmost in right subtree = in-order successor) promoted to D's position:
graph TD
E1["E"]
B1["B"]
F1["F"]
A1["A"]
C1["C"]
G1["G"]
E1 --- B1
E1 --- F1
B1 --- A1
B1 --- C1
F1 --- G1
style E1 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
Rule: If left height > right → promote right-edge of left subtree. If right height ≥ left → promote left-edge of right subtree. This minimizes post-deletion rebalancing.
#![allow(unused)] fn main() { pub fn remove(self, ...) -> CostResult<Option<Self>, Error> { let has_left = tree.link(true).is_some(); let has_right = tree.link(false).is_some(); let left = tree.child_height(true) > tree.child_height(false); if has_left && has_right { // Two children: promote edge of taller child let (tree, tall_child) = self.detach_expect(left, ...); let (_, short_child) = tree.detach_expect(!left, ...); tall_child.promote_edge(!left, short_child, ...) } else if has_left || has_right { // One child: promote it directly self.detach_expect(left, ...).1 } else { // Leaf node: just remove None } } }