Introduction — What is GroveDB?
The Core Idea
GroveDB is a hierarchical authenticated data structure — essentially a grove (tree of trees) built on Merkle AVL trees. Each node in the database is part of a cryptographically authenticated tree, and each tree can contain other trees as children, forming a deep hierarchy of verifiable state.
graph TD
subgraph root["Root Merk Tree"]
R_contracts[""contracts"<br/><i>Tree</i>"]
R_identities[""identities"<br/><i>Tree</i>"]
R_balances[""balances"<br/><i>SumTree</i>"]
R_contracts --- R_identities
R_contracts --- R_balances
end
subgraph ident["Identities Merk"]
I_bob[""bob"<br/><i>Tree</i>"]
I_alice[""alice"<br/><i>Tree</i>"]
I_carol[""carol"<br/><i>Item</i>"]
I_bob --- I_alice
I_bob --- I_carol
end
subgraph contracts["Contracts Merk"]
C_c2[""C2"<br/><i>Item</i>"]
C_c1[""C1"<br/><i>Item</i>"]
C_c3[""C3"<br/><i>Item</i>"]
C_c2 --- C_c1
C_c2 --- C_c3
end
subgraph balances["Balances SumTree — sum=5300"]
B_bob[""bob"<br/>SumItem(2500)"]
B_al[""alice"<br/>SumItem(2000)"]
B_eve[""eve"<br/>SumItem(800)"]
B_bob --- B_al
B_bob --- B_eve
end
subgraph alice_merk["Alice Merk"]
A_name[""name" → Alice"]
A_bal[""balance" → 1000"]
end
subgraph bob_merk["Bob Merk"]
Bo_name[""name" → Bob"]
end
R_identities -.->|subtree| ident
R_contracts -.->|subtree| contracts
R_balances -.->|subtree| balances
I_alice -.->|subtree| alice_merk
I_bob -.->|subtree| bob_merk
style root fill:#e8f4fd,stroke:#2980b9,stroke-width:2px
style ident fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style contracts fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style balances fill:#fdedec,stroke:#e74c3c,stroke-width:2px
style alice_merk fill:#eafaf1,stroke:#27ae60,stroke-width:2px
style bob_merk fill:#eafaf1,stroke:#27ae60,stroke-width:2px
Each colored box is a separate Merk tree. Dashed arrows show the subtree relationship — a Tree element in the parent contains the root key of the child Merk.
In a traditional database, you might store data in a flat key-value store with a single Merkle tree on top for authentication. GroveDB takes a different approach: it nests Merkle trees inside Merkle trees. This gives you:
- Efficient secondary indexes — query by any path, not just primary key
- Compact cryptographic proofs — prove the existence (or absence) of any data
- Aggregate data — trees can automatically sum, count, or otherwise aggregate their children
- Atomic cross-tree operations — batch operations span multiple subtrees
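These properties rest on one structural idea — trees containing trees. A minimal sketch makes the nesting concrete (plain Rust, not GroveDB's API; Value, get, and demo_root are illustrative names): a path like identities → alice → balance resolves by descending through nested tree values.

```rust
use std::collections::BTreeMap;

// Toy model of a grove: each subtree is a sorted map, and a Value::Tree
// holds a child map. Real GroveDB stores each subtree as a Merk tree and
// authenticates it with hashes; this sketch only shows the nesting.
enum Value {
    Item(Vec<u8>),
    Tree(BTreeMap<Vec<u8>, Value>),
}

// Resolve a path such as ["identities", "alice", "balance"] by walking
// Tree values from the root.
fn get<'a>(root: &'a BTreeMap<Vec<u8>, Value>, path: &[&[u8]]) -> Option<&'a Value> {
    let (first, rest) = path.split_first()?;
    let value = root.get(*first)?;
    if rest.is_empty() {
        return Some(value);
    }
    match value {
        Value::Tree(child) => get(child, rest),
        _ => None, // tried to descend through a non-tree element
    }
}

fn demo_root() -> BTreeMap<Vec<u8>, Value> {
    let mut alice = BTreeMap::new();
    alice.insert(b"balance".to_vec(), Value::Item(b"1000".to_vec()));
    let mut identities = BTreeMap::new();
    identities.insert(b"alice".to_vec(), Value::Tree(alice));
    let mut root = BTreeMap::new();
    root.insert(b"identities".to_vec(), Value::Tree(identities));
    root
}

fn main() {
    let root = demo_root();
    match get(&root, &[b"identities" as &[u8], b"alice", b"balance"]) {
        Some(Value::Item(v)) => assert_eq!(v, &b"1000".to_vec()),
        _ => panic!("path should resolve to an Item"),
    }
    // A missing key (or descending through a plain Item) yields nothing.
    assert!(get(&root, &[b"identities" as &[u8], b"bob"]).is_none());
}
```

The real system adds what this sketch omits: each nested map is a Merk tree whose root hash is bound into its parent, so a single root hash authenticates every path.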
Why GroveDB Exists
GroveDB was designed for Dash Platform, a decentralized application platform where every piece of state must be:
- Authenticated: Any node can prove any piece of state to a light client
- Deterministic: Every node computes exactly the same state root
- Efficient: Operations must complete within block time constraints
- Queryable: Applications need rich queries, not just key lookups
Traditional approaches fall short:
| Approach | Problem |
|---|---|
| Plain Merkle Tree | Only supports key lookups, no range queries |
| Ethereum MPT | Expensive rebalancing, large proof sizes |
| Flat key-value + single tree | No hierarchical queries, single proof covers everything |
| B-tree | Not naturally Merklized, complex authentication |
GroveDB solves these by combining the proven balance guarantees of AVL trees with hierarchical nesting and a rich element type system.
Architecture Overview
GroveDB is organized into distinct layers, each with a clear responsibility:
graph TD
APP["<b>Application Layer</b><br/>Dash Platform, etc.<br/><code>insert() get() query() prove() apply_batch()</code>"]
GROVE["<b>GroveDB Core</b> — <code>grovedb/src/</code><br/>Hierarchical subtree management · Element type system<br/>Reference resolution · Batch ops · Multi-layer proofs"]
MERK["<b>Merk Layer</b> — <code>merk/src/</code><br/>Merkle AVL tree · Self-balancing rotations<br/>Link system · Blake3 hashing · Proof encoding"]
STORAGE["<b>Storage Layer</b> — <code>storage/src/</code><br/>RocksDB OptimisticTransactionDB<br/>4 column families · Blake3 prefix isolation · Batched writes"]
COST["<b>Cost Layer</b> — <code>costs/src/</code><br/>OperationCost tracking · CostContext monad<br/>Worst-case & average-case estimation"]
APP ==>|"writes ↓"| GROVE
GROVE ==>|"tree ops"| MERK
MERK ==>|"disk I/O"| STORAGE
STORAGE -.->|"cost accumulation ↑"| COST
COST -.-> MERK
MERK -.-> GROVE
GROVE ==>|"reads ↑"| APP
style APP fill:#f5f5f5,stroke:#999,stroke-width:1px
style GROVE fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style MERK fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style STORAGE fill:#fdebd0,stroke:#e67e22,stroke-width:2px
style COST fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
Data flows down through these layers during writes and up during reads. Every operation accumulates costs as it traverses the stack, enabling precise resource accounting.
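The cost-accumulation arrow can be made concrete with a toy sketch. This is not the real costs crate API — the actual OperationCost tracks more dimensions and CostContext has more combinators — but it shows the monadic shape: every operation returns its value together with its cost, and composing operations merges their costs automatically.

```rust
// Illustrative cost threading, loosely modeled on GroveDB's cost layer.
// Field and function names here are assumptions for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct OperationCost {
    seek_count: u32,
    hash_node_calls: u32,
}

impl OperationCost {
    fn add(&mut self, other: OperationCost) {
        self.seek_count += other.seek_count;
        self.hash_node_calls += other.hash_node_calls;
    }
}

// A value paired with the cost of computing it.
struct CostContext<T> {
    value: T,
    cost: OperationCost,
}

impl<T> CostContext<T> {
    // Monadic bind: run the next step and fold its cost into ours.
    fn flat_map<U>(self, f: impl FnOnce(T) -> CostContext<U>) -> CostContext<U> {
        let mut next = f(self.value);
        next.cost.add(self.cost);
        next
    }
}

// A pretend storage read: costs one seek.
fn read(key: u32) -> CostContext<u32> {
    CostContext { value: key * 2, cost: OperationCost { seek_count: 1, hash_node_calls: 0 } }
}

// A pretend hash step: costs one hash call (XOR is self-inverse, so two
// applications return the original value).
fn hash(v: u32) -> CostContext<u32> {
    CostContext { value: v ^ 0xabcd, cost: OperationCost { seek_count: 0, hash_node_calls: 1 } }
}

fn main() {
    // Costs accumulate as the operation traverses the stack.
    let result = read(21).flat_map(hash).flat_map(hash);
    assert_eq!(result.value, 42);
    assert_eq!(result.cost, OperationCost { seek_count: 1, hash_node_calls: 2 });
}
```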
The Merk Tree — A Merkle AVL Tree
The Merk tree is the fundamental building block of GroveDB. Every subtree in the grove is a Merk tree — a self-balancing binary search tree where every node is cryptographically hashed, producing a single root hash that authenticates the entire tree's contents.
What is a Merk Node?
Unlike many Merkle tree implementations where data lives only at the leaves, in a Merk tree every node stores a key-value pair. This means there are no "empty" internal nodes — the tree is both a search structure and a data store simultaneously.
graph TD
subgraph TreeNode
subgraph inner["inner: Box<TreeNodeInner>"]
subgraph kv["kv: KV"]
KEY["<b>key:</b> Vec<u8><br/><i>e.g. b"alice"</i>"]
VAL["<b>value:</b> Vec<u8><br/><i>serialized Element bytes</i>"]
FT["<b>feature_type:</b> TreeFeatureType<br/><i>BasicMerkNode | SummedMerkNode(n) | ...</i>"]
VH["<b>value_hash:</b> [u8; 32]<br/><i>H(varint(value.len) ‖ value)</i>"]
KVH["<b>hash:</b> [u8; 32] — the kv_hash<br/><i>H(varint(key.len) ‖ key ‖ value_hash)</i>"]
end
LEFT["<b>left:</b> Option<Link>"]
RIGHT["<b>right:</b> Option<Link>"]
end
OLD["<b>old_value:</b> Option<Vec<u8>> — previous value for cost deltas"]
KNOWN["<b>known_storage_cost:</b> Option<KeyValueStorageCost>"]
end
LEFT -->|"smaller keys"| LC["Left Child<br/><i>Link::Reference | Modified<br/>Uncommitted | Loaded</i>"]
RIGHT -->|"larger keys"| RC["Right Child<br/><i>Link::Reference | Modified<br/>Uncommitted | Loaded</i>"]
style kv fill:#eaf2f8,stroke:#2980b9
style inner fill:#fef9e7,stroke:#f39c12
style TreeNode fill:#f9f9f9,stroke:#333
style LC fill:#d5f5e3,stroke:#27ae60
style RC fill:#d5f5e3,stroke:#27ae60
In code (merk/src/tree/mod.rs):
```rust
pub struct TreeNode {
    pub(crate) inner: Box<TreeNodeInner>,
    pub(crate) old_value: Option<Vec<u8>>, // Previous value for cost tracking
    pub(crate) known_storage_cost: Option<KeyValueStorageCost>,
}

pub struct TreeNodeInner {
    pub(crate) left: Option<Link>,  // Left child (smaller keys)
    pub(crate) right: Option<Link>, // Right child (larger keys)
    pub(crate) kv: KV,              // The key-value payload
}
```
The Box<TreeNodeInner> places the payload behind a pointer, which is required because the type is recursive — child links can contain entire TreeNode instances, so without the indirection the type would have no finite size.
The KV Structure
The KV struct holds both the raw data and its cryptographic digests
(merk/src/tree/kv.rs):
```rust
pub struct KV {
    pub(super) key: Vec<u8>,                  // The lookup key
    pub(super) value: Vec<u8>,                // The stored value
    pub(super) feature_type: TreeFeatureType, // Aggregation behavior
    pub(crate) value_defined_cost: Option<ValueDefinedCostType>,
    pub(super) hash: CryptoHash,              // kv_hash
    pub(super) value_hash: CryptoHash,        // H(value)
}
```
Two important points:
- Keys are not stored on disk as part of the encoded node. They are stored as the RocksDB key; when a node is decoded from storage, the key is injected from the outside. This avoids duplicating key bytes.
- Two hash fields are maintained. The value_hash is H(value), and the hash (the kv_hash) is H(key ‖ value_hash). Keeping both allows the proof system to choose how much information to reveal.
The Semi-Balanced Nature — How AVL "Wobbles"
A Merk tree is an AVL tree — the classic self-balancing binary search tree invented by Adelson-Velsky and Landis. The key invariant is:
For every node, the height difference between its left and right subtrees is at most 1.
This is expressed as the balance factor:
balance_factor = right_height - left_height
Valid values: {-1, 0, 1}
```rust
// merk/src/tree/mod.rs
pub const fn balance_factor(&self) -> i8 {
    let left_height = self.child_height(true) as i8;
    let right_height = self.child_height(false) as i8;
    right_height - left_height
}
```
But here's the subtle point: while each individual node can only lean by one level, these leanings can compound through the tree. This is why we call it "semi-balanced" — the tree isn't perfectly balanced like a complete binary tree.
Consider a tree of 12 nodes. A perfectly balanced tree would have height 4 (⌈log₂(12+1)⌉), but a valid AVL tree over the same keys can reach height 5 — and 12 is the smallest node count where this happens:
Perfectly balanced (height 4):
graph TD
N7["7<br/><small>bf=0</small>"]
N4["4<br/><small>bf=0</small>"]
N10["10<br/><small>bf=0</small>"]
N2["2<br/><small>bf=0</small>"]
N5["5<br/><small>bf=+1</small>"]
N9["9<br/><small>bf=-1</small>"]
N11["11<br/><small>bf=+1</small>"]
N1["1<br/><small>bf=0</small>"]
N3["3<br/><small>bf=0</small>"]
N6["6<br/><small>bf=0</small>"]
N8["8<br/><small>bf=0</small>"]
N12["12<br/><small>bf=0</small>"]
N7 --- N4
N7 --- N10
N4 --- N2
N4 --- N5
N10 --- N9
N10 --- N11
N2 --- N1
N2 --- N3
N5 --- N6
N9 --- N8
N11 --- N12
style N7 fill:#d4e6f1,stroke:#2980b9
AVL-valid "wobble" (height 5) — each node leans at most 1, but it compounds:
graph TD
W8["8<br/><small>bf=-1</small>"]
W5["5<br/><small>bf=-1</small>"]
W11["11<br/><small>bf=-1</small>"]
W3["3<br/><small>bf=-1</small>"]
W6["6<br/><small>bf=+1</small>"]
W10["10<br/><small>bf=-1</small>"]
W12["12<br/><small>bf=0</small>"]
W2["2<br/><small>bf=-1</small>"]
W4["4<br/><small>bf=0</small>"]
W7["7<br/><small>bf=0</small>"]
W9["9<br/><small>bf=0</small>"]
W1["1<br/><small>bf=0</small>"]
W8 --- W5
W8 --- W11
W5 --- W3
W5 --- W6
W11 --- W10
W11 --- W12
W3 --- W2
W3 --- W4
W6 --- W7
W10 --- W9
W2 --- W1
style W8 fill:#fadbd8,stroke:#e74c3c
Height 5 vs the perfect 4 — that's the "wobble". Worst case: h ≤ 1.44 × log₂(n+2).
Both trees are valid AVL trees! The worst-case height of an AVL tree is:
h ≤ 1.4404 × log₂(n + 2) − 0.3277
So for n = 1,000,000 nodes:
- Perfect balance: height 20
- AVL worst case: height 28 (the bound gives ≈ 28.4)
This ~44% overhead is the price of AVL's simple rotation rules. In practice, random insertions produce trees much closer to perfectly balanced.
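Both numbers can be checked exactly. The sparsest AVL tree of a given height is the classic Fibonacci tree — each node's subtrees differ in height by exactly 1 — giving the recurrence min_nodes(h) = 1 + min_nodes(h−1) + min_nodes(h−2). A quick computation (illustrative code, not from the codebase; heights counted in levels):

```rust
// Worst-case AVL height for n nodes: the tallest height whose minimal
// node count (Fibonacci-tree recurrence) still fits in n.
fn worst_height(n: u64) -> u32 {
    if n == 0 {
        return 0;
    }
    let (mut prev, mut cur) = (0u64, 1u64); // min_nodes(0), min_nodes(1)
    let mut h = 1;
    loop {
        let next = 1 + cur + prev; // min_nodes(h + 1)
        if next > n {
            return h;
        }
        prev = cur;
        cur = next;
        h += 1;
    }
}

// Perfect-balance height in levels: ⌈log₂(n + 1)⌉.
fn perfect_height(n: u64) -> u32 {
    64 - n.leading_zeros()
}

fn main() {
    assert_eq!(perfect_height(12), 4);
    assert_eq!(worst_height(12), 5); // the 12-node wobble above
    assert_eq!(perfect_height(1_000_000), 20);
    assert_eq!(worst_height(1_000_000), 28); // bound: 1.4404·log₂(n+2) ≈ 28.4
}
```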
Here's what valid and invalid trees look like:
VALID — all balance factors in {-1, 0, +1}:
graph TD
subgraph balanced["Balanced (bf=0)"]
D1["D<br/>bf=0"] --- B1["B<br/>bf=0"]
D1 --- F1["F<br/>bf=0"]
B1 --- A1["A"] & C1["C"]
F1 --- E1["E"] & G1["G"]
end
subgraph rightlean["Right-leaning (bf=+1)"]
D2["D<br/>bf=+1"] --- B2["B<br/>bf=0"]
D2 --- F2["F<br/>bf=0"]
B2 --- A2["A"] & C2["C"]
F2 --- E2["E"] & G2["G"]
end
subgraph leftlean["Left-leaning (bf=-1)"]
D3["D<br/>bf=-1"] --- B3["B<br/>bf=-1"]
D3 --- E3["E"]
B3 --- A3["A"]
end
style balanced fill:#d5f5e3,stroke:#27ae60
style rightlean fill:#d5f5e3,stroke:#27ae60
style leftlean fill:#d5f5e3,stroke:#27ae60
INVALID — balance factor = +2 (needs rotation!):
graph TD
B["B<br/><b>bf=+2 ✗</b>"]
D["D<br/>bf=+1"]
F["F<br/>bf=0"]
B --- D
D --- F
style B fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
The right subtree is 2 levels taller than the left (which is empty). This triggers a left rotation to restore the AVL invariant.
Rotations — Restoring Balance
When an insertion or deletion causes a balance factor to reach ±2, the tree must be rotated to restore the AVL invariant. There are four cases, reducible to two fundamental operations.
Single Left Rotation
Used when a node is right-heavy (bf = +2) and its right child is right-heavy or balanced (bf ≥ 0):
Before (bf=+2):
graph TD
A["A<br/><small>bf=+2</small>"]
t1["t₁"]
B["B<br/><small>bf≥0</small>"]
X["X"]
C["C"]
A --- t1
A --- B
B --- X
B --- C
style A fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
After left rotation — B promoted to root:
graph TD
B2["B<br/><small>bf=0</small>"]
A2["A"]
C2["C"]
t12["t₁"]
X2["X"]
B2 --- A2
B2 --- C2
A2 --- t12
A2 --- X2
style B2 fill:#d5f5e3,stroke:#27ae60,stroke-width:3px
Steps: (1) Detach B from A. (2) Detach X (B's left child). (3) Attach X as A's right child. (4) Attach A as B's left child. The subtree rooted at B is now balanced.
In code (merk/src/tree/ops.rs):
```rust
fn rotate<V>(self, left: bool, ...) -> CostResult<Self, Error> {
    // Detach child on the heavy side
    let (tree, child) = self.detach_expect(left, ...);
    // Detach grandchild from opposite side of child
    let (child, maybe_grandchild) = child.detach(!left, ...);

    // Attach grandchild to original root
    tree.attach(left, maybe_grandchild)
        .maybe_balance(...)
        .flat_map_ok(|tree| {
            // Attach original root as child of promoted node
            child.attach(!left, Some(tree)).maybe_balance(...)
        })
}
```
Note how maybe_balance is called recursively — the rotation itself might create
new imbalances that need further correction.
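The four steps above can be replayed on a toy tree. This sketch uses plain owned nodes — none of Merk's Link, cost, or lazy-loading machinery — but performs the same detach/attach sequence as rotate():

```rust
// Toy owned binary tree, just enough to demonstrate a left rotation.
#[derive(Debug)]
struct Node {
    key: u8,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn leaf(key: u8) -> Option<Box<Node>> {
    Some(Box::new(Node { key, left: None, right: None }))
}

// Left rotation: the right child B is promoted to root, and B's former
// left subtree X becomes the old root A's right child.
fn rotate_left(mut a: Box<Node>) -> Box<Node> {
    let mut b = a.right.take().expect("left rotation needs a right child");
    a.right = b.left.take(); // detach X, reattach under A
    b.left = Some(a);        // A becomes B's left child
    b
}

fn main() {
    // The A(bf=+2) → B → C chain from the "INVALID" example (keys 1, 2, 3).
    let a = Box::new(Node {
        key: 1,
        left: None,
        right: Some(Box::new(Node { key: 2, left: None, right: leaf(3) })),
    });
    let root = rotate_left(a);
    // B is now the root with A on the left and C on the right.
    assert_eq!(root.key, 2);
    assert_eq!(root.left.as_ref().unwrap().key, 1);
    assert_eq!(root.right.as_ref().unwrap().key, 3);
}
```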
Double Rotation (Left-Right)
Used when a node is left-heavy (bf = -2) but its left child is right-heavy (bf > 0). A single rotation would not fix this:
Step 0: Before — C is left-heavy (bf=-2) but its left child A leans right (bf=+1). A single rotation won't fix this:
graph TD
C0["C<br/><small>bf=-2</small>"]
A0["A<br/><small>bf=+1</small>"]
t4["t₄"]
t1["t₁"]
B0["B"]
t2["t₂"]
t3["t₃"]
C0 --- A0
C0 --- t4
A0 --- t1
A0 --- B0
B0 --- t2
B0 --- t3
style C0 fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
Step 1: Left-rotate child A — now both C and B lean left, fixable by a single rotation:
graph TD
C1["C<br/><small>bf=-2</small>"]
B1["B"]
t41["t₄"]
A1["A"]
t31["t₃"]
t11["t₁"]
t21["t₂"]
C1 --- B1
C1 --- t41
B1 --- A1
B1 --- t31
A1 --- t11
A1 --- t21
style C1 fill:#fdebd0,stroke:#e67e22,stroke-width:2px
Step 2: Right-rotate root C — balanced!
graph TD
B2["B<br/><small>bf=0</small>"]
A2["A"]
C2["C"]
t12["t₁"]
t22["t₂"]
t32["t₃"]
t42["t₄"]
B2 --- A2
B2 --- C2
A2 --- t12
A2 --- t22
C2 --- t32
C2 --- t42
style B2 fill:#d5f5e3,stroke:#27ae60,stroke-width:3px
The algorithm detects this case by comparing the parent's lean direction with the child's balance factor:
```rust
fn maybe_balance<V>(self, ...) -> CostResult<Self, Error> {
    let balance_factor = self.balance_factor();
    if balance_factor.abs() <= 1 {
        return Ok(self); // Already balanced
    }
    let left = balance_factor < 0; // true if left-heavy

    // Double rotation needed when child leans opposite to parent
    let tree = if left == (self.tree().link(left).unwrap().balance_factor() > 0) {
        // First rotation: rotate child in opposite direction
        self.walk_expect(left, |child| child.rotate(!left, ...).map_ok(Some), ...)
    } else {
        self
    };

    // Second (or only) rotation
    tree.rotate(left, ...)
}
```
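The parity check is compact enough to test in isolation. This sketch reproduces the left == (child_bf > 0) comparison for the four canonical ±1 cases (Rebalance and needed are illustrative names, not Merk code):

```rust
// Case analysis for AVL rebalancing: a double rotation is needed exactly
// when the heavy child leans in the opposite direction to its parent.
#[derive(Debug, PartialEq)]
enum Rebalance {
    Single, // one rotation at the parent
    Double, // rotate the child first, then the parent
}

fn needed(parent_bf: i8, heavy_child_bf: i8) -> Rebalance {
    debug_assert!(parent_bf.abs() == 2);
    let left_heavy = parent_bf < 0;
    // Same comparison as `left == (child.balance_factor() > 0)` above.
    if left_heavy == (heavy_child_bf > 0) {
        Rebalance::Double
    } else {
        Rebalance::Single
    }
}

fn main() {
    assert_eq!(needed(-2, -1), Rebalance::Single); // left-left
    assert_eq!(needed(-2, 1), Rebalance::Double);  // left-right
    assert_eq!(needed(2, 1), Rebalance::Single);   // right-right
    assert_eq!(needed(2, -1), Rebalance::Double);  // right-left
}
```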
Batch Operations — Building and Applying
Rather than inserting elements one at a time, Merk supports batch operations that apply multiple changes in a single pass. This is critical for efficiency: a batch of N operations on a tree of M elements takes O((M + N) log(M + N)) time, versus O(N log M) for sequential inserts.
The MerkBatch Type
```rust
type MerkBatch<K> = [(K, Op)];

enum Op {
    Put(Vec<u8>, TreeFeatureType), // Insert or update with value and feature type
    PutWithSpecializedCost(...),   // Insert with predefined cost
    PutCombinedReference(...),     // Insert reference with combined hash
    Replace(Vec<u8>, TreeFeatureType),
    Patch { .. },                  // Partial value update
    Delete,                        // Remove key
    DeleteLayered,                 // Remove with layered cost
    DeleteMaybeSpecialized,        // Remove with optional specialized cost
}
```
Strategy 1: build() — Building from Scratch
When the tree is empty, build() constructs a balanced tree directly from the
sorted batch using a median-split algorithm:
Input batch (sorted): [A, B, C, D, E, F, G] — pick middle (D) as root, recurse on each half:
graph TD
D["<b>D</b><br/><small>root = mid(0..6)</small>"]
B["<b>B</b><br/><small>mid(A,B,C)</small>"]
F["<b>F</b><br/><small>mid(E,F,G)</small>"]
A["A"]
C["C"]
E["E"]
G["G"]
D --- B
D --- F
B --- A
B --- C
F --- E
F --- G
style D fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style B fill:#d5f5e3,stroke:#27ae60
style F fill:#d5f5e3,stroke:#27ae60
Result: perfectly balanced tree with height = 3 = ⌈log₂(7)⌉.
```rust
fn build(batch: &MerkBatch<K>, ...) -> CostResult<Option<TreeNode>, Error> {
    let mid_index = batch.len() / 2;
    let (mid_key, mid_op) = &batch[mid_index];

    // Create root node from middle element
    let mid_tree = TreeNode::new(mid_key.clone(), value.clone(), None, feature_type)?;

    // Recursively build left and right subtrees
    let left = Self::build(&batch[..mid_index], ...);
    let right = Self::build(&batch[mid_index + 1..], ...);

    // Attach children
    mid_tree.attach(true, left).attach(false, right)
}
```
This produces a tree with height ⌈log₂(n + 1)⌉ — perfectly balanced.
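The median split is easy to replay on a plain sorted slice. This toy build (illustrative only — no values, feature types, or costs) picks the middle element as root and recurses, and the resulting height matches ⌈log₂(n + 1)⌉:

```rust
// Median-split construction over a sorted batch, shaped like Merk's build().
struct Node {
    key: char,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

fn build(batch: &[char]) -> Option<Box<Node>> {
    if batch.is_empty() {
        return None;
    }
    let mid = batch.len() / 2;
    Some(Box::new(Node {
        key: batch[mid],
        left: build(&batch[..mid]),       // smaller keys
        right: build(&batch[mid + 1..]),  // larger keys
    }))
}

// Height in levels (empty tree = 0).
fn height(node: &Option<Box<Node>>) -> u32 {
    node.as_ref()
        .map(|n| 1 + height(&n.left).max(height(&n.right)))
        .unwrap_or(0)
}

fn main() {
    let tree = build(&['A', 'B', 'C', 'D', 'E', 'F', 'G']).unwrap();
    assert_eq!(tree.key, 'D'); // middle element becomes the root
    assert_eq!(tree.left.as_ref().unwrap().key, 'B');
    assert_eq!(tree.right.as_ref().unwrap().key, 'F');
    assert_eq!(height(&Some(tree)), 3); // ⌈log₂(7 + 1)⌉ = 3
}
```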
Strategy 2: apply_sorted() — Merging into Existing Tree
When the tree already has data, apply_sorted() uses binary search to find
where each batch operation belongs, then recursively applies operations to the left
and right subtrees:
Existing tree with batch [(B, Put), (F, Delete)]:
Binary search: B < D (go left), F > D (go right).
Before:
graph TD
D0["D"] --- C0["C"]
D0 --- E0["E"]
E0 --- F0["F"]
style D0 fill:#d4e6f1,stroke:#2980b9
After applying batch and rebalancing:
graph TD
D1["D"] --- B1["B"]
D1 --- E1["E"]
B1 --- C1["C"]
style D1 fill:#d5f5e3,stroke:#27ae60
B inserted into the left subtree, F deleted from the right subtree; maybe_balance() confirms bf(D) = 0.
```rust
fn apply_sorted(self, batch: &MerkBatch<K>, ...) -> CostResult<...> {
    let search = batch.binary_search_by(|(key, _)| key.cmp(self.tree().key()));

    match search {
        Ok(index) => {
            // Key matches this node — apply operation directly
            // (Put replaces value, Delete removes node)
        }
        Err(mid) => {
            // Key not found — mid is the split point
            // Recurse on left_batch[..mid] and right_batch[mid..]
        }
    }

    self.recurse(batch, mid, exclusive, ...)
}
```
The recurse method splits the batch and walks left and right:
```rust
fn recurse(self, batch: &MerkBatch<K>, mid: usize, ...) {
    let left_batch = &batch[..mid];
    let right_batch = &batch[mid..]; // or mid+1 if exclusive

    // Apply left batch to left subtree
    let tree = self.walk(true, |maybe_left| {
        Self::apply_to(maybe_left, left_batch, ...)
    });

    // Apply right batch to right subtree
    let tree = tree.walk(false, |maybe_right| {
        Self::apply_to(maybe_right, right_batch, ...)
    });

    // Re-balance after modifications
    tree.maybe_balance(...)
}
```
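The split itself can be demonstrated directly with the standard-library primitive that apply_sorted() relies on (split_at_node is an illustrative helper, not Merk code):

```rust
// Split a sorted batch around the current node's key using binary search:
// ops left of the split go to the left subtree, ops right of it to the
// right subtree, and an exact hit applies to this node itself.
fn split_at_node<'a>(
    batch: &'a [(&'a str, &'a str)],
    node_key: &str,
) -> (&'a [(&'a str, &'a str)], Option<&'a (&'a str, &'a str)>, &'a [(&'a str, &'a str)]) {
    match batch.binary_search_by(|probe| probe.0.cmp(node_key)) {
        // Exact hit: the op at `index` targets this node.
        Ok(index) => (&batch[..index], Some(&batch[index]), &batch[index + 1..]),
        // Miss: `mid` is the split point between left- and right-bound ops.
        Err(mid) => (&batch[..mid], None, &batch[mid..]),
    }
}

fn main() {
    // Batch [(B, Put), (F, Delete)] against a node whose key is "D":
    let batch = [("B", "Put"), ("F", "Delete")];
    let (left, here, right) = split_at_node(&batch, "D");
    assert_eq!(left, &[("B", "Put")][..]);     // bound for the left subtree
    assert!(here.is_none());                   // "D" itself is untouched
    assert_eq!(right, &[("F", "Delete")][..]); // bound for the right subtree

    // An exact match peels off the op for the current node:
    let (_, here, _) = split_at_node(&batch, "F");
    assert_eq!(here, Some(&("F", "Delete")));
}
```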
Node Removal
When deleting a node with two children, Merk promotes the edge node from the taller subtree. This minimizes the chance of needing additional rotations:
Before — deleting D (has two children, right subtree height ≥ left):
graph TD
D["D ✗ delete"]
B0["B"]
F0["F"]
A0["A"]
C0["C"]
E0["E ← successor"]
G0["G"]
D --- B0
D --- F0
B0 --- A0
B0 --- C0
F0 --- E0
F0 --- G0
style D fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style E0 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
After — E (leftmost in right subtree = in-order successor) promoted to D's position:
graph TD
E1["E"]
B1["B"]
F1["F"]
A1["A"]
C1["C"]
G1["G"]
E1 --- B1
E1 --- F1
B1 --- A1
B1 --- C1
F1 --- G1
style E1 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
Rule: If left height > right → promote right-edge of left subtree. If right height ≥ left → promote left-edge of right subtree. This minimizes post-deletion rebalancing.
```rust
pub fn remove(self, ...) -> CostResult<Option<Self>, Error> {
    let has_left = tree.link(true).is_some();
    let has_right = tree.link(false).is_some();
    let left = tree.child_height(true) > tree.child_height(false);

    if has_left && has_right {
        // Two children: promote edge of taller child
        let (tree, tall_child) = self.detach_expect(left, ...);
        let (_, short_child) = tree.detach_expect(!left, ...);
        tall_child.promote_edge(!left, short_child, ...)
    } else if has_left || has_right {
        // One child: promote it directly
        self.detach_expect(left, ...).1
    } else {
        // Leaf node: just remove
        None
    }
}
```
The Link System — Lazy Loading Architecture
Loading an entire Merk tree into memory would be prohibitively expensive for large trees. The Link system solves this by representing child connections in four possible states, enabling lazy loading — children are fetched from storage only when actually needed.
Four Link States
```rust
// merk/src/tree/link.rs
pub enum Link {
    Reference {
        // Pruned: only metadata, no tree in memory
        hash: CryptoHash,
        child_heights: (u8, u8),
        key: Vec<u8>,
        aggregate_data: AggregateData,
    },
    Modified {
        // Recently changed, hash not yet computed
        pending_writes: usize,
        child_heights: (u8, u8),
        tree: TreeNode,
    },
    Uncommitted {
        // Hashed but not yet persisted to storage
        hash: CryptoHash,
        child_heights: (u8, u8),
        tree: TreeNode,
        aggregate_data: AggregateData,
    },
    Loaded {
        // Fully loaded from storage
        hash: CryptoHash,
        child_heights: (u8, u8),
        tree: TreeNode,
        aggregate_data: AggregateData,
    },
}
```
State Transition Diagram
stateDiagram-v2
[*] --> Reference : decode from storage<br/>(hash + key + child_heights)
Reference --> Loaded : fetch()<br/>load from RocksDB
Loaded --> Modified : put / delete<br/>any mutation invalidates hash
Modified --> Uncommitted : commit()<br/>recompute hashes bottom-up
Uncommitted --> Loaded : write to disk<br/>persist to RocksDB
Loaded --> Reference : into_reference()<br/>prune to free memory
state Reference {
[*] : hash ✓ · tree ✗ · key ✓
}
state Loaded {
[*] : hash ✓ · tree ✓
}
state Modified {
[*] : hash ✗ INVALID · tree ✓ dirty<br/>pending_writes: n
}
state Uncommitted {
[*] : hash ✓ fresh · tree ✓ clean
}
What Each State Stores
| State | Hash? | Tree in Memory? | Purpose |
|---|---|---|---|
| Reference | Yes | No | Compact on-disk representation. Only stores key, hash, child heights, and aggregate data. |
| Modified | No | Yes | After any mutation. Tracks pending_writes count for batch optimization. |
| Uncommitted | Yes | Yes | After hash computation but before storage write. Intermediate state during commit. |
| Loaded | Yes | Yes | Fully materialized. Ready for reads or further modifications. |
The pending_writes field in Modified is noteworthy:
```rust
// Computed as: 1 + left_pending_writes + right_pending_writes
pending_writes: 1
    + tree.child_pending_writes(true)
    + tree.child_pending_writes(false),
```
This count helps the commit phase decide how to order writes for optimal performance.
The Fetch Callback Pattern
The Link system uses a Fetch trait to abstract over how child nodes are loaded:
```rust
pub trait Fetch {
    fn fetch(
        &self,
        link: &Link,
        value_defined_cost_fn: Option<
            &impl Fn(&[u8], &GroveVersion) -> Option<ValueDefinedCostType>,
        >,
        grove_version: &GroveVersion,
    ) -> CostResult<TreeNode, Error>;
}
```
Different fetch implementations serve different purposes:
- StorageFetch: Loads from RocksDB (the normal path)
- PanicSource: Used in tests where fetching should never happen
- MockSource: Returns controlled test data
This pattern allows the tree operations to be storage-agnostic — the same balancing and mutation logic works regardless of where the data comes from.
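A stripped-down version of the pattern (illustrative trait and types, far simpler than the real Fetch signature) shows why this works — the same caller runs against a map-backed source or a panicking test source:

```rust
use std::collections::HashMap;

// Storage-agnostic fetching: tree-side code asks the source for bytes by
// key and does not care where they come from.
trait Fetch {
    fn fetch(&self, key: &[u8]) -> Result<Vec<u8>, String>;
}

// Normal path: look up in a (mock) storage map.
struct MapSource(HashMap<Vec<u8>, Vec<u8>>);
impl Fetch for MapSource {
    fn fetch(&self, key: &[u8]) -> Result<Vec<u8>, String> {
        self.0.get(key).cloned().ok_or_else(|| "missing".into())
    }
}

// Test-only path: panics if anything is fetched, like Merk's PanicSource.
struct PanicSource;
impl Fetch for PanicSource {
    fn fetch(&self, _key: &[u8]) -> Result<Vec<u8>, String> {
        panic!("fetch should never be called in this test");
    }
}

// Generic tree-side code works against any source.
fn load_child(source: &impl Fetch, key: &[u8]) -> Result<Vec<u8>, String> {
    source.fetch(key)
}

fn main() {
    let mut map = HashMap::new();
    map.insert(b"alice".to_vec(), b"serialized node".to_vec());
    let source = MapSource(map);
    assert_eq!(load_child(&source, b"alice").unwrap(), b"serialized node".to_vec());
    // PanicSource is only safe when the code under test never fetches.
    let _never_used = PanicSource;
}
```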
The Walker Pattern
The Walker wraps a TreeNode with a Fetch source, providing safe tree
traversal with automatic lazy loading (merk/src/tree/walk/mod.rs):
```rust
pub struct Walker<S: Fetch + Sized + Clone> {
    tree: Owner<TreeNode>,
    source: S,
}
```
The Walker provides three key operations:
walk() — Detach a child, transform it, and reattach:
```rust
pub fn walk<F, T>(self, left: bool, f: F, ...) -> CostResult<Self, Error>
where
    F: FnOnce(Option<Self>) -> CostResult<Option<T>, Error>,
    T: Into<TreeNode>,
```
detach() — Remove a child, loading it from storage if necessary:
```rust
pub fn detach(self, left: bool, ...) -> CostResult<(Self, Option<Self>), Error>
```
If the child is a Link::Reference (pruned), detach will call fetch() to load
it first. If the child is already in memory (Modified, Uncommitted, Loaded),
it simply takes ownership.
attach() — Connect a child to a parent:
```rust
pub fn attach(self, left: bool, maybe_child: Option<Self>) -> Self
```
Attaching always creates a Link::Modified since the parent-child relationship
has changed.
Memory Efficiency Through Pruning
After committing changes, the tree can prune loaded subtrees back to
Link::Reference, freeing memory while retaining the hash needed for proof
generation:
Before pruning — all 7 nodes in memory:
graph TD
D["D<br/><small>Loaded</small>"]
B["B<br/><small>Loaded</small>"]
F["F<br/><small>Loaded</small>"]
A["A<br/><small>Loaded</small>"]
C["C<br/><small>Loaded</small>"]
E["E<br/><small>Loaded</small>"]
G["G<br/><small>Loaded</small>"]
D --- B & F
B --- A & C
F --- E & G
style D fill:#d4e6f1,stroke:#2980b9
style B fill:#d4e6f1,stroke:#2980b9
style F fill:#d4e6f1,stroke:#2980b9
style A fill:#d4e6f1,stroke:#2980b9
style C fill:#d4e6f1,stroke:#2980b9
style E fill:#d4e6f1,stroke:#2980b9
style G fill:#d4e6f1,stroke:#2980b9
After pruning — only root in memory, children are Link::Reference (just hash + key):
graph TD
D2["D<br/><small>Loaded (in memory)</small>"]
B2["B<br/><small>Reference<br/>hash + key only</small>"]
F2["F<br/><small>Reference<br/>hash + key only</small>"]
D2 --- B2 & F2
style D2 fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style B2 fill:#f5f5f5,stroke:#999,stroke-dasharray: 5 5
style F2 fill:#f5f5f5,stroke:#999,stroke-dasharray: 5 5
Link::Loaded holds hash + child_heights + tree (a full TreeNode). Link::Reference holds only hash + child_heights + key — the TreeNode itself is released from memory.
The transformation is simple:
```rust
pub fn into_reference(self) -> Link {
    Link::Reference {
        hash: self.hash(),
        child_heights: self.child_heights(),
        key: self.key().to_vec(),
        aggregate_data: self.aggregate_data(),
    }
}
```
This is crucial for keeping memory usage bounded in large trees — only the nodes actively being accessed need to be in memory.
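A reduced sketch (only two of the four link states, with simplified fields) captures the essence of pruning — the conversion keeps the hash that proofs need and drops the heavy TreeNode:

```rust
// Simplified from merk/src/tree/link.rs: child_heights and aggregate_data
// omitted, TreeNode reduced to its key.
type Hash = [u8; 32];

struct TreeNode {
    key: Vec<u8>,
    // ... children, value, etc.
}

enum Link {
    Reference { hash: Hash, key: Vec<u8> }, // pruned: metadata only
    Loaded { hash: Hash, tree: TreeNode },  // fully materialized
}

impl Link {
    // Prune: keep the hash and key, release the in-memory node.
    fn into_reference(self) -> Link {
        match self {
            Link::Loaded { hash, tree } => Link::Reference { hash, key: tree.key },
            other => other, // already pruned
        }
    }

    fn hash(&self) -> &Hash {
        match self {
            Link::Reference { hash, .. } | Link::Loaded { hash, .. } => hash,
        }
    }
}

fn main() {
    let loaded = Link::Loaded {
        hash: [7u8; 32],
        tree: TreeNode { key: b"alice".to_vec() },
    };
    let pruned = loaded.into_reference();
    // The hash needed for proof generation survives; the TreeNode is dropped.
    assert_eq!(pruned.hash(), &[7u8; 32]);
    assert!(matches!(pruned, Link::Reference { .. }));
}
```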
Hashing — Cryptographic Integrity
Every node in a Merk tree is hashed to produce a root hash — a single 32-byte value that authenticates the entire tree. Any change to any key, value, or structural relationship will produce a different root hash.
Three-Level Hash Hierarchy
Merk uses a three-level hashing scheme, from innermost to outermost:
Example: key = "bob" (3 bytes), value = "hello" (5 bytes):
graph LR
subgraph level1["Level 1: value_hash"]
V_IN["varint(5) ‖ "hello""]
V_BLAKE["Blake3"]
V_OUT(["value_hash<br/><small>32 bytes</small>"])
V_IN --> V_BLAKE --> V_OUT
end
subgraph level2["Level 2: kv_hash"]
K_IN["varint(3) ‖ "bob" ‖ value_hash"]
K_BLAKE["Blake3"]
K_OUT(["kv_hash<br/><small>32 bytes</small>"])
K_IN --> K_BLAKE --> K_OUT
end
subgraph level3["Level 3: node_hash"]
N_LEFT(["left_child_hash<br/><small>32B (or NULL_HASH)</small>"])
N_KV(["kv_hash<br/><small>32B</small>"])
N_RIGHT(["right_child_hash<br/><small>32B (or NULL_HASH)</small>"])
N_BLAKE["Blake3<br/><small>96B input = 2 blocks</small>"]
N_OUT(["node_hash<br/><small>32 bytes</small>"])
N_LEFT --> N_BLAKE
N_KV --> N_BLAKE
N_RIGHT --> N_BLAKE
N_BLAKE --> N_OUT
end
V_OUT -.-> K_IN
K_OUT -.-> N_KV
style level1 fill:#eaf2f8,stroke:#2980b9
style level2 fill:#fef9e7,stroke:#f39c12
style level3 fill:#fdedec,stroke:#e74c3c
The ROOT of the tree = node_hash of the root node — it authenticates every key, value, and structural relationship. Missing children use NULL_HASH = [0x00; 32].
Level 1: value_hash
```rust
// merk/src/tree/hash.rs
pub fn value_hash(value: &[u8]) -> CostContext<CryptoHash> {
    let mut hasher = blake3::Hasher::new();
    let val_length = value.len().encode_var_vec(); // Varint encoding
    hasher.update(val_length.as_slice());
    hasher.update(value);
    // ...
}
```
The value's length is varint-encoded and prepended. This is critical for
collision resistance — without it, H("AB" ‖ "C") would equal H("A" ‖ "BC").
Level 2: kv_hash
```rust
pub fn kv_hash(key: &[u8], value: &[u8]) -> CostContext<CryptoHash> {
    let mut hasher = blake3::Hasher::new();
    let key_length = key.len().encode_var_vec();
    hasher.update(key_length.as_slice());
    hasher.update(key);
    let vh = value_hash(value);
    hasher.update(vh.as_slice()); // Nested hash
    // ...
}
```
This binds the key to the value. For proof verification, there's also a variant that takes a pre-computed value_hash:
```rust
pub fn kv_digest_to_kv_hash(key: &[u8], value_hash: &CryptoHash) -> CostContext<CryptoHash>
```
This is used when the verifier already has the value_hash (e.g., for subtrees where value_hash is a combined hash).
Level 3: node_hash
```rust
pub fn node_hash(
    kv: &CryptoHash,
    left: &CryptoHash,
    right: &CryptoHash,
) -> CostContext<CryptoHash> {
    let mut hasher = blake3::Hasher::new();
    hasher.update(kv);    // 32 bytes
    hasher.update(left);  // 32 bytes
    hasher.update(right); // 32 bytes — total 96 bytes
    // Always exactly 2 hash operations (96 bytes / 64-byte block = 2)
}
```
If a child is absent, its hash is the NULL_HASH — 32 zero bytes:
```rust
pub const NULL_HASH: CryptoHash = [0; HASH_LENGTH]; // [0u8; 32]
```
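The three levels compose mechanically. This sketch swaps Blake3 for std's SipHash (8-byte digests) purely to keep the example dependency-free — the structure of the computation, including the length prefixes and NULL_HASH for absent children, is the same:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in digest: 8 bytes from SipHash instead of Blake3's 32 bytes.
type Digest = [u8; 8];
const NULL_HASH: Digest = [0; 8];

fn h(input: &[u8]) -> Digest {
    let mut hasher = DefaultHasher::new();
    hasher.write(input);
    hasher.finish().to_be_bytes()
}

// Level 1: H(varint(value.len) ‖ value). A single byte stands in for the
// varint, which is enough for short inputs.
fn value_hash(value: &[u8]) -> Digest {
    let mut input = vec![value.len() as u8];
    input.extend_from_slice(value);
    h(&input)
}

// Level 2: H(varint(key.len) ‖ key ‖ value_hash)
fn kv_hash(key: &[u8], value: &[u8]) -> Digest {
    let mut input = vec![key.len() as u8];
    input.extend_from_slice(key);
    input.extend_from_slice(&value_hash(value));
    h(&input)
}

// Level 3: H(kv_hash ‖ left ‖ right); absent children use NULL_HASH.
fn node_hash(kv: &Digest, left: &Digest, right: &Digest) -> Digest {
    let mut input = Vec::with_capacity(24);
    input.extend_from_slice(kv);
    input.extend_from_slice(left);
    input.extend_from_slice(right);
    h(&input)
}

fn main() {
    let root = node_hash(&kv_hash(b"bob", b"hello"), &NULL_HASH, &NULL_HASH);
    // Any change to key or value changes the node hash...
    let tampered = node_hash(&kv_hash(b"bob", b"hellp"), &NULL_HASH, &NULL_HASH);
    assert_ne!(root, tampered);
    // ...while recomputing from the same inputs is deterministic.
    assert_eq!(root, node_hash(&kv_hash(b"bob", b"hello"), &NULL_HASH, &NULL_HASH));
}
```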
Blake3 as the Hash Function
GroveDB uses Blake3 for all hashing. Key properties:
- 256-bit output (32 bytes)
- Block size: 64 bytes
- Speed: ~3x faster than SHA-256 on modern hardware
- Streaming: Can incrementally feed data
The hash operation cost is calculated based on how many 64-byte blocks are processed:
```rust
let hashes = 1 + (hasher.count() - 1) / 64; // Number of hash operations
```
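The block-count formula is worth pinning down with a few inputs (hash_block_ops is an illustrative stand-in for the inline expression above):

```rust
// Cost accounting for Blake3: one "hash operation" per 64-byte block.
fn hash_block_ops(input_len: u64) -> u64 {
    1 + input_len.saturating_sub(1) / 64
}

fn main() {
    assert_eq!(hash_block_ops(64), 1);  // one full block
    assert_eq!(hash_block_ops(96), 2);  // node_hash input: 3 × 32 bytes
    assert_eq!(hash_block_ops(104), 2); // node_hash_with_count: 96 + 8 bytes
    assert_eq!(hash_block_ops(128), 2); // exactly two blocks
    assert_eq!(hash_block_ops(129), 3);
}
```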
Length-Prefix Encoding for Collision Resistance
Every variable-length input is prefixed with its length using varint encoding:
graph LR
subgraph bad["Without length prefix — VULNERABLE"]
BAD1["H("AB" + "C") = H(0x41 0x42 0x43)"]
BAD2["H("A" + "BC") = H(0x41 0x42 0x43)"]
BAD1 --- SAME["SAME HASH!"]
BAD2 --- SAME
end
subgraph good["With length prefix — collision resistant"]
GOOD1["H([02] 0x41 0x42)<br/><small>varint(2) ‖ "AB"</small>"]
GOOD2["H([03] 0x41 0x42 0x43)<br/><small>varint(3) ‖ "ABC"</small>"]
GOOD1 --- DIFF["DIFFERENT"]
GOOD2 --- DIFF
end
style bad fill:#fadbd8,stroke:#e74c3c
style good fill:#d5f5e3,stroke:#27ae60
style SAME fill:#e74c3c,color:#fff,stroke:#c0392b
style DIFF fill:#27ae60,color:#fff,stroke:#229954
value_hash input: [varint(value.len)] [value bytes]
kv_hash input: [varint(key.len)] [key bytes] [value_hash: 32 bytes]
Without length prefixes, an attacker could craft different key-value pairs that hash to the same digest. The length prefix makes this cryptographically infeasible.
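The attack and the defense can both be shown without any cryptography — the point is purely about the byte stream fed to the hasher (unprefixed and length_prefixed are illustrative helpers):

```rust
// Without prefixes, different splits concatenate to identical bytes.
fn unprefixed(parts: &[&[u8]]) -> Vec<u8> {
    parts.concat()
}

// With prefixes, each part carries its length, so the streams differ.
fn length_prefixed(parts: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for part in parts {
        out.push(part.len() as u8); // single-byte varint, enough for short inputs
        out.extend_from_slice(part);
    }
    out
}

fn main() {
    let ab_c: [&[u8]; 2] = [b"AB", b"C"];
    let a_bc: [&[u8]; 2] = [b"A", b"BC"];
    // "AB" ‖ "C" and "A" ‖ "BC" feed the hasher identical bytes...
    assert_eq!(unprefixed(&ab_c), unprefixed(&a_bc));
    // ...unless each part is length-prefixed first.
    assert_ne!(length_prefixed(&ab_c), length_prefixed(&a_bc));
}
```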
Combined Hashing for Special Elements
For subtrees and references, the value_hash is not simply H(value).
Instead, it's a combined hash that binds the element to its target:
graph LR
subgraph item["Regular Item"]
I_val["value bytes"] --> I_hash["H(len ‖ bytes)"] --> I_vh(["value_hash"])
end
subgraph subtree["Subtree Element"]
S_elem["tree element bytes"] --> S_hash1["H(len ‖ bytes)"]
S_root(["child Merk<br/>root hash"])
S_hash1 --> S_combine["combine_hash()<br/><small>Blake3(a ‖ b)</small>"]
S_root --> S_combine
S_combine --> S_vh(["value_hash"])
end
subgraph reference["Reference Element"]
R_elem["ref element bytes"] --> R_hash1["H(len ‖ bytes)"]
R_target["target value"] --> R_hash2["H(len ‖ bytes)"]
R_hash1 --> R_combine["combine_hash()"]
R_hash2 --> R_combine
R_combine --> R_vh(["value_hash"])
end
style item fill:#eaf2f8,stroke:#2980b9
style subtree fill:#fef9e7,stroke:#f39c12
style reference fill:#fdedec,stroke:#e74c3c
Subtree: binds the child Merk's root hash into the parent. Reference: binds both the reference path AND the target value. Changing either changes the root hash.
The combine_hash function:
```rust
pub fn combine_hash(hash_one: &CryptoHash, hash_two: &CryptoHash) -> CostContext<CryptoHash> {
    let mut hasher = blake3::Hasher::new();
    hasher.update(hash_one); // 32 bytes
    hasher.update(hash_two); // 32 bytes — total 64 bytes, exactly 1 hash op
    // ...
}
```
This is what allows GroveDB to authenticate the entire hierarchy through a single root hash — each parent tree's value_hash for a subtree element includes the child tree's root hash.
Aggregate Hashing for ProvableCountTree
ProvableCountTree nodes include the aggregate count in the node hash:
pub fn node_hash_with_count(
    kv: &CryptoHash,
    left: &CryptoHash,
    right: &CryptoHash,
    count: u64,
) -> CostContext<CryptoHash> {
    let mut hasher = blake3::Hasher::new();
    hasher.update(kv);                   // 32 bytes
    hasher.update(left);                 // 32 bytes
    hasher.update(right);                // 32 bytes
    hasher.update(&count.to_be_bytes()); // 8 bytes — total 104 bytes
    // Still exactly 2 hash ops (104 < 128 = 2 × 64)
}
This means a proof of count doesn't require revealing the actual data — the count is baked into the cryptographic commitment.
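A toy version of the layout (with `DefaultHasher` standing in for Blake3, illustration only) shows why a tampered count cannot verify:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in for node_hash_with_count: only the input layout is mirrored.
fn toy_node_hash(kv: &[u8; 32], left: &[u8; 32], right: &[u8; 32], count: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(kv);
    h.write(left);
    h.write(right);
    h.write(&count.to_be_bytes()); // the count is part of the commitment
    h.finish()
}

fn main() {
    let (kv, l, r) = ([1u8; 32], [2u8; 32], [3u8; 32]);
    // Same children, different claimed count → different node hash, so a
    // verifier comparing against the committed root rejects a forged count.
    assert_ne!(toy_node_hash(&kv, &l, &r, 5), toy_node_hash(&kv, &l, &r, 6));
}
```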
The Element System
While Merk deals with raw key-value pairs, GroveDB operates at a higher level using Elements — typed values that carry semantic meaning. Every value stored in GroveDB is an Element.
The Element Enum
// grovedb-element/src/element/mod.rs
pub enum Element {
    Item(Vec<u8>, Option<ElementFlags>),                                               // [0]
    Reference(ReferencePathType, MaxReferenceHop, Option<ElementFlags>),               // [1]
    Tree(Option<Vec<u8>>, Option<ElementFlags>),                                       // [2]
    SumItem(SumValue, Option<ElementFlags>),                                           // [3]
    SumTree(Option<Vec<u8>>, SumValue, Option<ElementFlags>),                          // [4]
    BigSumTree(Option<Vec<u8>>, BigSumValue, Option<ElementFlags>),                    // [5]
    CountTree(Option<Vec<u8>>, CountValue, Option<ElementFlags>),                      // [6]
    CountSumTree(Option<Vec<u8>>, CountValue, SumValue, Option<ElementFlags>),         // [7]
    ProvableCountTree(Option<Vec<u8>>, CountValue, Option<ElementFlags>),              // [8]
    ItemWithSumItem(Vec<u8>, SumValue, Option<ElementFlags>),                          // [9]
    ProvableCountSumTree(Option<Vec<u8>>, CountValue, SumValue, Option<ElementFlags>), // [10]
    CommitmentTree(u64, u8, Option<ElementFlags>),                                     // [11]
    MmrTree(u64, Option<ElementFlags>),                                                // [12]
    BulkAppendTree(u64, u8, Option<ElementFlags>),                                     // [13]
    DenseAppendOnlyFixedSizeTree(u16, u8, Option<ElementFlags>),                       // [14]
}
The discriminant numbers (shown in brackets) are used during serialization.
Type aliases used throughout:
pub type ElementFlags = Vec<u8>;       // Arbitrary metadata per element
pub type MaxReferenceHop = Option<u8>; // Optional hop limit for references
pub type SumValue = i64;               // 64-bit signed sum
pub type BigSumValue = i128;           // 128-bit signed sum
pub type CountValue = u64;             // 64-bit unsigned count
Item — Basic Key-Value Storage
The simplest element. Stores arbitrary bytes:
Element::Item(value: Vec<u8>, flags: Option<ElementFlags>)
graph TD
subgraph item["Element::Item — discriminant 0x00"]
VALUE["<b>value:</b> Vec<u8><br/><small>arbitrary bytes: serialized proto, JSON, raw u64, etc.</small>"]
FLAGS["<b>flags:</b> Option<ElementFlags><br/><small>epoch info, storage credits, metadata</small>"]
FEAT["Merk feature: <b>BasicMerkNode</b> (no aggregation)"]
end
style item fill:#eaf2f8,stroke:#2980b9
Constructors:
Element::new_item(b"hello world".to_vec())
Element::new_item_with_flags(b"data".to_vec(), Some(vec![0x01, 0x02]))
Items participate in sum aggregation: within a SumTree, an Item contributes a default sum of 0. A SumItem contributes its explicit value.
Tree — Containers for Subtrees
A Tree element is a portal to another Merk tree. It stores the root key of the child tree (if any):
Element::Tree(root_key: Option<Vec<u8>>, flags: Option<ElementFlags>)
graph TD
subgraph parent["Parent Merk Tree"]
settings[""settings"<br/>Element::Item"]
logs[""logs"<br/>Element::Item"]
users[""users"<br/>Element::Tree<br/>root_key = "bob""]
settings --> logs
settings --> users
end
subgraph child[""users" Child Merk Tree"]
bob[""bob"<br/>(root_key)"]
alice[""alice"<br/>Item"]
carol[""carol"<br/>Item"]
bob --> alice
bob --> carol
end
users -.->|"portal: root_key<br/>links to child tree"| bob
style parent fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style child fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style users fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style bob fill:#fef9e7,stroke:#f39c12,stroke-width:2px
The Tree element in the parent Merk stores the root_key of the child Merk tree. This creates a portal — a link from one Merk tree into another.
When a tree is empty, root_key is None. The constructor Element::empty_tree() creates Element::Tree(None, None).
SumItem / SumTree — Aggregate Sums
A SumTree automatically maintains the sum of all its direct children's sum-contributions:
Element::SumTree(root_key: Option<Vec<u8>>, sum: SumValue, flags: Option<ElementFlags>)
Element::SumItem(value: SumValue, flags: Option<ElementFlags>)
graph TD
subgraph sumtree["SumTree "balances" — sum=350"]
bob[""bob"<br/>SumItem(150)<br/><i>subtree_sum: 150+100+100 = 350</i>"]
alice[""alice"<br/>SumItem(100)<br/><i>subtree_sum: 100</i>"]
carol[""carol"<br/>SumItem(100)<br/><i>subtree_sum: 100</i>"]
bob --> alice
bob --> carol
end
style sumtree fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style bob fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style alice fill:#d4e6f1,stroke:#2980b9
style carol fill:#d4e6f1,stroke:#2980b9
Aggregation formula:
node_sum = own_value + left_child_sum + right_child_sum

Bob: 150 + 100 (alice) + 100 (carol) = 350. The root sum (350) is stored in the parent's SumTree element.
The sum is maintained at the Merk level through the TreeFeatureType::SummedMerkNode(i64)
feature type. During tree propagation, each node's aggregate data is recomputed:
aggregate_sum = own_sum + left_child_sum + right_child_sum
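The propagation can be sketched with a toy node type (a hypothetical struct — the real Merk TreeNode carries far more state):

```rust
// Toy node, just to illustrate how SummedMerkNode aggregates upward.
struct Node {
    own_sum: i64,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// node_sum = own_value + left_child_sum + right_child_sum
fn subtree_sum(node: &Node) -> i64 {
    let left = node.left.as_deref().map_or(0, subtree_sum);
    let right = node.right.as_deref().map_or(0, subtree_sum);
    node.own_sum + left + right
}

fn main() {
    // The "balances" example: bob(150) with children alice(100) and carol(100).
    let bob = Node {
        own_sum: 150,
        left: Some(Box::new(Node { own_sum: 100, left: None, right: None })),
        right: Some(Box::new(Node { own_sum: 100, left: None, right: None })),
    };
    assert_eq!(subtree_sum(&bob), 350); // the root sum stored in the parent element
}
```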
CountTree, CountSumTree, BigSumTree
Additional aggregate tree types:
| Element Type | Merk Feature Type | Aggregates |
|---|---|---|
| CountTree | CountedMerkNode(u64) | Number of elements |
| CountSumTree | CountedSummedMerkNode(u64, i64) | Both count and sum |
| BigSumTree | BigSummedMerkNode(i128) | 128-bit sum for large values |
| ProvableCountTree | ProvableCountedMerkNode(u64) | Count baked into hash |
| ProvableCountSumTree | ProvableCountedSummedMerkNode(u64, i64) | Count in hash + sum |
ProvableCountTree is special: its count is included in the node_hash
computation (via node_hash_with_count), so a proof can verify the count without
revealing any values.
Element Serialization
Elements are serialized using bincode with big-endian byte order:
pub fn serialize(&self, grove_version: &GroveVersion) -> Result<Vec<u8>, ElementError> {
    let config = config::standard().with_big_endian().with_no_limit();
    bincode::encode_to_vec(self, config)
}
The first byte is the discriminant, allowing O(1) type detection:
pub fn from_serialized_value(value: &[u8]) -> Option<ElementType> {
    match value.first()? {
        0 => Some(ElementType::Item),
        1 => Some(ElementType::Reference),
        2 => Some(ElementType::Tree),
        3 => Some(ElementType::SumItem),
        // ... etc
    }
}
TreeFeatureType and Aggregate Data Flow
The TreeFeatureType enum bridges the gap between GroveDB Elements and Merk nodes:
pub enum TreeFeatureType {
    BasicMerkNode,                           // No aggregation
    SummedMerkNode(i64),                     // Sum aggregation
    BigSummedMerkNode(i128),                 // Large sum
    CountedMerkNode(u64),                    // Count
    CountedSummedMerkNode(u64, i64),         // Count + sum
    ProvableCountedMerkNode(u64),            // Count in hash
    ProvableCountedSummedMerkNode(u64, i64), // Count in hash + sum
}
Aggregate data flows upward through the tree:
graph TD
subgraph counttree["CountTree "users" — count = 5"]
C["C<br/>own=1, total=<b>5</b>"]
B["B<br/>own=1, total=2"]
D["D<br/>own=1, total=2"]
A["A<br/>own=1, total=1"]
E["E<br/>own=1, total=1"]
C --> B
C --> D
B --> A
D --> E
end
style counttree fill:#e8daef,stroke:#8e44ad,stroke-width:2px
style C fill:#fef9e7,stroke:#f39c12,stroke-width:2px
Aggregation table: Each node's aggregate = own(1) + left_aggregate + right_aggregate
| Node | own | left_agg | right_agg | total |
|---|---|---|---|---|
| A | 1 | 0 | 0 | 1 |
| B | 1 | 1 (A) | 0 | 2 |
| E | 1 | 0 | 0 | 1 |
| D | 1 | 0 | 1 (E) | 2 |
| C | 1 | 2 (B) | 2 (D) | 5 (root) |
The count stored at each node represents the total count in the subtree rooted at that node, including itself. The root node's count is the total for the entire tree.
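The table can be checked with a small recursive sketch (a toy node type, not the real Merk node):

```rust
// Toy node for illustrating count propagation.
struct Node {
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Each node's aggregate count = own (1) + left aggregate + right aggregate.
fn count(node: &Node) -> u64 {
    1 + node.left.as_deref().map_or(0, count) + node.right.as_deref().map_or(0, count)
}

fn leaf() -> Option<Box<Node>> {
    Some(Box::new(Node { left: None, right: None }))
}

fn main() {
    // The CountTree example: C at the root, B → A on the left, D → E on the right.
    let root = Node {
        left: Some(Box::new(Node { left: leaf(), right: None })),  // B, with A below
        right: Some(Box::new(Node { left: None, right: leaf() })), // D, with E below
    };
    assert_eq!(count(&root), 5); // matches the root row of the table
}
```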
The AggregateData enum carries this through the Link system:
pub enum AggregateData {
    NoAggregateData,
    Sum(i64),
    BigSum(i128),
    Count(u64),
    CountAndSum(u64, i64),
    ProvableCount(u64),
    ProvableCountAndSum(u64, i64),
}
CommitmentTree — Sinsemilla Commitment Tree
A CommitmentTree provides a depth-32 Sinsemilla Merkle tree for tracking
note commitment anchors, as used in Zcash's Orchard shielded protocol. It wraps
incrementalmerkletree::Frontier<MerkleHashOrchard, 32> for O(1) append and
root computation:
Element::CommitmentTree(
    total_count: u64, // Number of commitments appended
    chunk_power: u8,  // BulkAppendTree compaction size (chunk_size = 2^chunk_power)
    flags: Option<ElementFlags>,
) // discriminant [11]
Note: The Sinsemilla frontier root hash is NOT stored in the Element. It is persisted in data storage and flows through the Merk child hash mechanism (insert_subtree's subtree_root_hash parameter). Any change to the frontier automatically propagates up through the GroveDB Merk hierarchy.
graph TD
subgraph ct["CommitmentTree — BulkAppendTree + Sinsemilla Frontier"]
ROOT["sinsemilla_root<br/><small>depth-32 Merkle root using<br/>MerkleHashOrchard hashing</small>"]
FRONTIER["<b>Frontier</b><br/><small>~1KB: rightmost path only<br/>O(1) append, O(1) root</small>"]
ITEMS["Data namespace:<br/>BulkAppendTree entries + frontier<br/>(buffer + chunks + MMR + __ct_data__)"]
ROOT --- FRONTIER
FRONTIER --- ITEMS
end
style ct fill:#e8daef,stroke:#8e44ad,stroke-width:2px
Architecture:
- The frontier (rightmost path of the Merkle tree, ~1KB constant size) is stored in the data namespace, keyed by COMMITMENT_TREE_DATA_KEY
- The actual note data (cmx || ciphertext) is stored via a BulkAppendTree in the data namespace — chunk-compacted, retrievable by position
- Historical anchors are tracked by Platform in a separate provable tree
- The Sinsemilla root is NOT stored in the Element — it flows as the Merk child hash through the GroveDB hash hierarchy
Operations:
- commitment_tree_insert(path, key, cmx, ciphertext, tx) — Typed append accepting TransmittedNoteCiphertext<M>; returns (new_root, position)
- commitment_tree_anchor(path, key, tx) — Get current Orchard Anchor
- commitment_tree_get_value(path, key, position, tx) — Retrieve value by position
- commitment_tree_count(path, key, tx) — Get total item count
MemoSize generic: CommitmentTree<S, M: MemoSize = DashMemo> validates that
ciphertext payloads match the expected size for M. For Dash (36-byte memos):
epk_bytes (32) + enc_ciphertext (104) + out_ciphertext (80) = 216 bytes.
Cost tracking: Sinsemilla hash operations are tracked via
cost.sinsemilla_hash_calls. Root computation always traverses 32 levels.
Ommer merges cascade through trailing_ones() of the previous position.
BulkAppendTree operations add Blake3 hash costs.
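The ommer-merge rule can be illustrated directly (a sketch of the trailing_ones rule as stated above, not of the incrementalmerkletree internals):

```rust
// Appending after position p merges one ommer per trailing 1-bit of p:
// each trailing 1 marks a perfect subtree that the new leaf completes.
fn ommer_merges(prev_position: u64) -> u32 {
    prev_position.trailing_ones()
}

fn main() {
    assert_eq!(ommer_merges(0b0), 0);   // after position 0: nothing to merge
    assert_eq!(ommer_merges(0b1), 1);   // after position 1: one pair completes
    assert_eq!(ommer_merges(0b111), 3); // after position 7: a cascade of 3 merges
}
```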
MmrTree — Merkle Mountain Range
An MmrTree stores data in an append-only Merkle Mountain Range (MMR) using Blake3 hashing. MMR nodes are stored in the data column (same as Merk nodes), not in a child Merk subtree. See Chapter 13 for a comprehensive deep-dive into how MMRs work, how they fill up, how proofs are generated and verified, and how MmrTree integrates with GroveDB.
Element::MmrTree(
    mmr_size: u64, // Internal MMR size (nodes, not leaves)
    flags: Option<ElementFlags>,
) // discriminant [12]
Note: The MMR root hash is NOT stored in the Element. It flows as the Merk child hash via insert_subtree's subtree_root_hash parameter.
Operations: mmr_tree_append, mmr_tree_root_hash, mmr_tree_get_value,
mmr_tree_leaf_count. Proofs: V1 proofs (see §9.6 and §13.9).
BulkAppendTree — Two-Level Append-Only Structure
A BulkAppendTree combines a dense Merkle tree buffer with a chunk-level MMR for efficient high-throughput appends with provable range queries. It is a non-Merk tree — data lives in the data namespace, not in a child Merk subtree. See Chapter 14 for a comprehensive deep-dive into the two-level architecture, chunk compaction, proof generation, verification, and GroveDB integration.
Element::BulkAppendTree(
    total_count: u64, // Total values appended
    chunk_power: u8,  // Dense tree height (buffer capacity = 2^chunk_power - 1)
    flags: Option<ElementFlags>,
) // discriminant [13]
Note: The state root (blake3("bulk_state" || mmr_root || dense_tree_root)) is NOT stored in the Element. It flows as the Merk child hash via insert_subtree's subtree_root_hash parameter.
Operations: bulk_append, bulk_get_value, bulk_get_chunk,
bulk_get_buffer, bulk_count, bulk_chunk_count.
Proofs: V1 range proofs (see §9.6 and §14.10).
DenseAppendOnlyFixedSizeTree — Dense Fixed-Capacity Storage
A DenseAppendOnlyFixedSizeTree is a complete binary tree of fixed height h where every node (internal and leaf) stores a data value. Positions are filled in level-order (BFS). The root hash is recomputed on the fly — no intermediate hashes are persisted. See Chapter 16 for the full deep-dive.
Element::DenseAppendOnlyFixedSizeTree(
    count: u16, // Number of values stored (max 65,535)
    height: u8, // Tree height (1..=16, immutable), capacity = 2^h - 1
    flags: Option<ElementFlags>,
) // discriminant [14]
Note: The root hash is NOT stored in the Element — it is recomputed on the fly and flows as the Merk child hash. The count field is u16 (not u64), limiting trees to 65,535 positions. Heights are restricted to 1..=16.
Operations: dense_tree_insert, dense_tree_get, dense_tree_root_hash,
dense_tree_count.
Proofs: Element-level only (no subquery proofs yet).
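The capacity arithmetic is easy to check (a sketch of the formula, not a GroveDB API):

```rust
// Capacity of a complete binary tree of height h where every node
// (internal and leaf) stores a value: 2^h - 1 positions.
// Heights 1..=16 keep the maximum (65,535) within the u16 count field.
fn capacity(height: u8) -> u16 {
    assert!((1..=16).contains(&height));
    ((1u32 << height) - 1) as u16
}

fn main() {
    assert_eq!(capacity(1), 1);       // a single root node
    assert_eq!(capacity(4), 15);
    assert_eq!(capacity(16), 65_535); // the u16 ceiling mentioned above
}
```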
Non-Merk Trees — Common Patterns
CommitmentTree, MmrTree, BulkAppendTree, and DenseAppendOnlyFixedSizeTree share a common architectural pattern that distinguishes them from the Merk-based tree types (Tree, SumTree, CountTree, etc.):
| Property | Merk-based trees | Non-Merk trees |
|---|---|---|
| Child Merk subtree | Yes (root_key = Some(...)) | No (no root_key field) |
| Data storage | Merk key-value pairs | Data column blobs (non-Merk keys) |
| Root hash binding | combine_hash(elem_hash, child_root_hash) | combine_hash(elem_hash, type_specific_root) |
| Type-specific root | Maintained by Merk AVL | Flows as Merk child hash (NOT in element bytes) |
| Proof format | V0 (layer-by-layer Merk) | V1 (type-specific proof) |
| TreeFeatureType | BasicMerkNode (no aggregation) | BasicMerkNode |
Storage column note: All four non-Merk tree types (MmrTree, CommitmentTree, BulkAppendTree, DenseAppendOnlyFixedSizeTree) store their data in the data column using non-Merk keys. CommitmentTree stores its Sinsemilla frontier alongside BulkAppendTree entries in the same data column (key b"__ct_data__").
The type-specific root (sinsemilla root, MMR root, state root, or dense tree root hash) is
NOT stored in the Element. Instead, it flows as the Merk child hash via
insert_subtree's subtree_root_hash parameter. The Merk combined_value_hash
becomes combine_hash(value_hash(element_bytes), type_specific_root).
Any change to the type-specific root changes the child hash, which changes
the combined_value_hash, which propagates up through the GroveDB hash
hierarchy — maintaining cryptographic integrity.
The Hierarchical Grove — Tree of Trees
How Subtrees Nest Inside Parent Trees
The defining feature of GroveDB is that a Merk tree can contain elements that are themselves Merk trees. This creates a hierarchical namespace:
graph TD
subgraph root["ROOT MERK TREE — path: []"]
contracts[""contracts"<br/>Tree"]
identities[""identities"<br/>Tree"]
balances[""balances"<br/>SumTree, sum=0"]
contracts --> identities
contracts --> balances
end
subgraph ident["IDENTITIES MERK — path: ["identities"]"]
bob456[""bob456"<br/>Tree"]
alice123[""alice123"<br/>Tree"]
eve[""eve"<br/>Item"]
bob456 --> alice123
bob456 --> eve
end
subgraph bal["BALANCES MERK (SumTree) — path: ["balances"]"]
bob_bal[""bob456"<br/>SumItem(800)"]
end
subgraph alice_tree["ALICE123 MERK — path: ["identities", "alice123"]"]
name[""name"<br/>Item("Al")"]
balance_item[""balance"<br/>SumItem(1000)"]
docs[""docs"<br/>Tree"]
name --> balance_item
name --> docs
end
identities -.-> bob456
balances -.-> bob_bal
alice123 -.-> name
docs -.->|"... more subtrees"| more["..."]
style root fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style ident fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style bal fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style alice_tree fill:#e8daef,stroke:#8e44ad,stroke-width:2px
style more fill:none,stroke:none
Each colored box is a separate Merk tree. Dashed arrows represent the portal links from Tree elements to their child Merk trees. The path to each Merk is shown in its label.
Path Addressing System
Every element in GroveDB is addressed by a path — a sequence of byte strings that navigate from the root through subtrees to the target key:
Path: ["identities", "alice123", "name"]
Step 1: In root tree, look up "identities" → Tree element
Step 2: Open identities subtree, look up "alice123" → Tree element
Step 3: Open alice123 subtree, look up "name" → Item("Alice")
Paths are represented as Vec<Vec<u8>> or using the SubtreePath type for
efficient manipulation without allocation:
// The path to the element (all segments except the last)
let path: &[&[u8]] = &[b"identities", b"alice123"];
// The key within the final subtree
let key: &[u8] = b"name";
Blake3 Prefix Generation for Storage Isolation
Each subtree in GroveDB gets its own isolated storage namespace in RocksDB. The namespace is determined by hashing the path with Blake3:
// storage/src/rocksdb_storage/storage.rs
pub type SubtreePrefix = [u8; 32];
// The prefix is computed by hashing the path segments
For example:
Path: ["identities", "alice123"]
Prefix: Blake3(["identities", "alice123"]) = [0xab, 0x3f, ...] (32 bytes)
In RocksDB, keys for this subtree are stored as:
[prefix: 32 bytes][original_key]
So "name" in this subtree becomes:
[0xab, 0x3f, ...][0x6e, 0x61, 0x6d, 0x65] ("name")
This ensures:
- No key collisions between subtrees (32-byte prefix = 256-bit isolation)
- Efficient prefix computation (single Blake3 hash over the path bytes)
- Subtree data is colocated in RocksDB for cache efficiency
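The key layout can be sketched as follows (the exact path-to-bytes serialization fed to Blake3 is implementation-specific; the fixed 32-byte prefix is the point):

```rust
// Build a RocksDB key for a subtree: [prefix: 32 bytes][original_key].
fn prefixed_key(prefix: &[u8; 32], key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(32 + key.len());
    out.extend_from_slice(prefix);
    out.extend_from_slice(key);
    out
}

fn main() {
    let prefix = [0xab; 32]; // stand-in for Blake3(path)
    let k = prefixed_key(&prefix, b"name");
    assert_eq!(k.len(), 36);       // 32-byte prefix + 4-byte key
    assert_eq!(&k[..32], &prefix); // prefix isolates the subtree namespace
    assert_eq!(&k[32..], b"name"); // original key preserved verbatim
}
```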
Root Hash Propagation Through the Hierarchy
When a value changes deep in the grove, the change must propagate upward to update the root hash:
Change: Update "name" to "ALICE" in identities/alice123/
Step 1: Update value in alice123's Merk tree
→ alice123 tree gets new root hash: H_alice_new
Step 2: Update "alice123" element in identities tree
→ identities tree's value_hash for "alice123" =
combine_hash(H(tree_element_bytes), H_alice_new)
→ identities tree gets new root hash: H_ident_new
Step 3: Update "identities" element in root tree
→ root tree's value_hash for "identities" =
combine_hash(H(tree_element_bytes), H_ident_new)
→ ROOT HASH changes
graph TD
subgraph step3["STEP 3: Update root tree"]
R3["Root tree recalculates:<br/>value_hash for "identities" =<br/>combine_hash(H(tree_bytes), H_ident_NEW)<br/>→ new ROOT HASH"]
end
subgraph step2["STEP 2: Update identities tree"]
R2["identities tree recalculates:<br/>value_hash for "alice123" =<br/>combine_hash(H(tree_bytes), H_alice_NEW)<br/>→ new root hash: H_ident_NEW"]
end
subgraph step1["STEP 1: Update alice123 Merk"]
R1["alice123 tree recalculates:<br/>value_hash("ALICE") → new kv_hash<br/>→ new root hash: H_alice_NEW"]
end
R1 -->|"H_alice_NEW flows up"| R2
R2 -->|"H_ident_NEW flows up"| R3
style step1 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style step2 fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style step3 fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
Before vs After — changed nodes marked with red:
graph TD
subgraph before["BEFORE"]
B_root["Root: aabb1122"]
B_ident[""identities": cc44.."]
B_contracts[""contracts": 1234.."]
B_balances[""balances": 5678.."]
B_alice[""alice123": ee55.."]
B_bob[""bob456": bb22.."]
B_name[""name": 7f.."]
B_docs[""docs": a1.."]
B_root --- B_ident
B_root --- B_contracts
B_root --- B_balances
B_ident --- B_alice
B_ident --- B_bob
B_alice --- B_name
B_alice --- B_docs
end
subgraph after["AFTER"]
A_root["Root: ff990033"]
A_ident[""identities": dd88.."]
A_contracts[""contracts": 1234.."]
A_balances[""balances": 5678.."]
A_alice[""alice123": 1a2b.."]
A_bob[""bob456": bb22.."]
A_name[""name": 3c.."]
A_docs[""docs": a1.."]
A_root --- A_ident
A_root --- A_contracts
A_root --- A_balances
A_ident --- A_alice
A_ident --- A_bob
A_alice --- A_name
A_alice --- A_docs
end
style A_root fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style A_ident fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style A_alice fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style A_name fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
Only the nodes on the path from the changed value up to the root are recomputed. Siblings and other branches remain unchanged.
The propagation is implemented by propagate_changes_with_transaction, which walks
up the path from the modified subtree to the root, updating each parent's element
hash along the way.
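The chain of combine_hash calls can be sketched with a stand-in hash (DefaultHasher instead of Blake3, illustration only):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in for Blake3 over raw bytes.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

// Stand-in for combine_hash: bind an element hash to a child root hash.
fn combine(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&a.to_be_bytes());
    h.write(&b.to_be_bytes());
    h.finish()
}

fn main() {
    let elem = toy_hash(b"tree element bytes");
    // Step 1: the child Merk's root changes when a value changes.
    let child_old = toy_hash(b"name=Alice");
    let child_new = toy_hash(b"name=ALICE");
    // Steps 2-3: each parent's value_hash binds the child root, so the
    // change propagates upward into a new root hash.
    assert_ne!(combine(elem, child_old), combine(elem, child_new));
}
```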
Multi-Level Grove Structure Example
Here's a complete example showing how Dash Platform structures its state:
graph TD
ROOT["GroveDB Root"]
ROOT --> contracts["[01] "data_contracts"<br/>Tree"]
ROOT --> identities["[02] "identities"<br/>Tree"]
ROOT --> balances["[03] "balances"<br/>SumTree"]
ROOT --> pools["[04] "pools"<br/>Tree"]
contracts --> c1["contract_id_1<br/>Tree"]
contracts --> c2["contract_id_2<br/>Tree"]
c1 --> docs[""documents"<br/>Tree"]
docs --> profile[""profile"<br/>Tree"]
docs --> note[""note"<br/>Tree"]
profile --> d1["doc_id_1<br/>Item"]
profile --> d2["doc_id_2<br/>Item"]
note --> d3["doc_id_3<br/>Item"]
identities --> id1["identity_id_1<br/>Tree"]
identities --> id2["identity_id_2<br/>Tree"]
id1 --> keys[""keys"<br/>Tree"]
id1 --> rev[""revision"<br/>Item(u64)"]
keys --> k1["key_id_1<br/>Item(pubkey)"]
keys --> k2["key_id_2<br/>Item(pubkey)"]
balances --> b1["identity_id_1<br/>SumItem(balance)"]
balances --> b2["identity_id_2<br/>SumItem(balance)"]
style ROOT fill:#2c3e50,stroke:#2c3e50,color:#fff
style contracts fill:#d4e6f1,stroke:#2980b9
style identities fill:#d5f5e3,stroke:#27ae60
style balances fill:#fef9e7,stroke:#f39c12
style pools fill:#e8daef,stroke:#8e44ad
Each box is a separate Merk tree, authenticated all the way up to a single root hash that validators agree on.
The Reference System
Why References Exist
In a hierarchical database, you often need the same data accessible from multiple paths. For example, documents might be stored under their contract but also queryable by owner identity. References are GroveDB's answer — they are pointers from one location to another, similar to symbolic links in a filesystem.
graph LR
subgraph primary["Primary Storage"]
item["contracts/C1/docs/D1<br/><b>Item</b>(data)"]
end
subgraph secondary["Secondary Index"]
ref["identities/alice/docs/D1<br/><b>Reference</b>"]
end
ref -->|"points to"| item
style primary fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style secondary fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style ref fill:#fef9e7,stroke:#f39c12,stroke-width:2px
Key properties:
- References are authenticated — the reference's value_hash includes both the reference itself and the referenced element
- References can be chained — a reference can point to another reference
- Cycle detection prevents infinite loops
- A configurable hop limit prevents resource exhaustion
The Seven Reference Types
// grovedb-element/src/reference_path/mod.rs
pub enum ReferencePathType {
    AbsolutePathReference(Vec<Vec<u8>>),
    UpstreamRootHeightReference(u8, Vec<Vec<u8>>),
    UpstreamRootHeightWithParentPathAdditionReference(u8, Vec<Vec<u8>>),
    UpstreamFromElementHeightReference(u8, Vec<Vec<u8>>),
    CousinReference(Vec<u8>),
    RemovedCousinReference(Vec<Vec<u8>>),
    SiblingReference(Vec<u8>),
}
Let's walk through each with diagrams.
AbsolutePathReference
The simplest type. Stores the full path to the target:
graph TD
subgraph root["Root Merk — path: []"]
A["A<br/>Tree"]
P["P<br/>Tree"]
end
subgraph merkA["Merk [A]"]
B["B<br/>Tree"]
end
subgraph merkP["Merk [P]"]
Q["Q<br/>Tree"]
end
subgraph merkAB["Merk [A, B]"]
X["X = Reference<br/>AbsolutePathRef([P, Q, R])"]
end
subgraph merkPQ["Merk [P, Q]"]
R["R = Item<br/>"target""]
end
A -.-> B
P -.-> Q
B -.-> X
Q -.-> R
X ==>|"resolves to [P, Q, R]"| R
style merkAB fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style merkPQ fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style X fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style R fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
X stores the full absolute path
[P, Q, R]. No matter where X is located, it always resolves to the same target.
UpstreamRootHeightReference
Keeps the first N segments of the current path, then appends a new path:
graph TD
subgraph resolve["Resolution: keep first 2 segments + append [P, Q]"]
direction LR
curr["current: [A, B, C, D]"] --> keep["keep first 2: [A, B]"] --> append["append: [A, B, <b>P, Q</b>]"]
end
subgraph grove["Grove Hierarchy"]
gA["A (height 0)"]
gB["B (height 1)"]
gC["C (height 2)"]
gD["D (height 3)"]
gX["X = Reference<br/>UpstreamRootHeight(2, [P,Q])"]
gP["P (height 2)"]
gQ["Q (height 3) — target"]
gA --> gB
gB --> gC
gB -->|"keep first 2 → [A,B]<br/>then descend [P,Q]"| gP
gC --> gD
gD -.-> gX
gP --> gQ
end
gX ==>|"resolves to"| gQ
style resolve fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style gX fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style gQ fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
UpstreamRootHeightWithParentPathAdditionReference
Like UpstreamRootHeight, but re-appends the last segment of the current path:
Reference at path [A, B, C, D, E] key=X
UpstreamRootHeightWithParentPathAdditionReference(2, [P, Q])
Current path: [A, B, C, D, E]
Keep first 2: [A, B]
Append [P, Q]: [A, B, P, Q]
Re-append last: [A, B, P, Q, E] ← "E" from original path added back
Useful for: indexes where the parent key should be preserved
UpstreamFromElementHeightReference
Discards the last N segments, then appends:
Reference at path [A, B, C, D] key=X
UpstreamFromElementHeightReference(1, [P, Q])
Current path: [A, B, C, D]
Discard last 1: [A, B, C]
Append [P, Q]: [A, B, C, P, Q]
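The two upstream rules can be sketched as path transformations (a toy resolver over string segments, not the GroveDB API):

```rust
// UpstreamRootHeightReference(n, append):
// keep the first n segments of the current path, then append.
fn upstream_root_height(current: &[&str], n: usize, append: &[&str]) -> Vec<String> {
    current[..n].iter().chain(append.iter()).map(|s| s.to_string()).collect()
}

// UpstreamFromElementHeightReference(n, append):
// discard the last n segments of the current path, then append.
fn upstream_from_element_height(current: &[&str], n: usize, append: &[&str]) -> Vec<String> {
    current[..current.len() - n].iter().chain(append.iter()).map(|s| s.to_string()).collect()
}

fn main() {
    let current = ["A", "B", "C", "D"];
    // Keep first 2, append [P, Q]:
    assert_eq!(upstream_root_height(&current, 2, &["P", "Q"]), ["A", "B", "P", "Q"]);
    // Discard last 1, append [P, Q]:
    assert_eq!(
        upstream_from_element_height(&current, 1, &["P", "Q"]),
        ["A", "B", "C", "P", "Q"]
    );
}
```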
CousinReference
Replaces only the immediate parent with a new key:
graph TD
subgraph resolve["Resolution: pop last 2, push cousin C, push key X"]
direction LR
r1["path: [A, B, M, D]"] --> r2["pop last 2: [A, B]"] --> r3["push C: [A, B, C]"] --> r4["push key X: [A, B, C, X]"]
end
subgraph merkAB["Merk [A, B]"]
M["M<br/>Tree"]
C["C<br/>Tree<br/>(cousin of M)"]
end
subgraph merkABM["Merk [A, B, M]"]
D["D<br/>Tree"]
end
subgraph merkABMD["Merk [A, B, M, D]"]
Xref["X = Reference<br/>CousinReference(C)"]
end
subgraph merkABC["Merk [A, B, C]"]
Xtarget["X = Item<br/>(target)"]
end
M -.-> D
D -.-> Xref
C -.-> Xtarget
Xref ==>|"resolves to [A, B, C, X]"| Xtarget
style resolve fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style Xref fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style Xtarget fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style M fill:#fadbd8,stroke:#e74c3c
style C fill:#d5f5e3,stroke:#27ae60
The "cousin" is a sibling subtree of the reference's grandparent. The reference navigates up two levels, then descends into the cousin subtree.
RemovedCousinReference
Like CousinReference but replaces the parent with a multi-segment path:
Reference at path [A, B, C, D] key=X
RemovedCousinReference([M, N])
Current path: [A, B, C, D]
Pop parent C: [A, B]
Append [M, N]: [A, B, M, N]
Push key X: [A, B, M, N, X]
SiblingReference
The simplest relative reference — just changes the key within the same parent:
graph TD
subgraph merk["Merk [A, B, C] — same tree, same path"]
M_sib["M"]
X_sib["X = Reference<br/>SiblingRef(Y)"]
Y_sib["Y = Item<br/>(target)"]
Z_sib["Z = Item"]
M_sib --> X_sib
M_sib --> Y_sib
end
X_sib ==>|"resolves to [A, B, C, Y]"| Y_sib
style merk fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style X_sib fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style Y_sib fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
The simplest reference type. X and Y are siblings in the same Merk tree — the resolution just changes the key while keeping the same path.
Reference Following and the Hop Limit
When GroveDB encounters a Reference element, it must follow it to find the actual value. Since references can point to other references, this involves a loop:
// grovedb/src/reference_path.rs
pub const MAX_REFERENCE_HOPS: usize = 10;

pub fn follow_reference(...) -> CostResult<ResolvedReference, Error> {
    let mut hops_left = MAX_REFERENCE_HOPS;
    let mut visited = HashSet::new();
    while hops_left > 0 {
        // Resolve reference path to absolute path
        let target_path = current_ref.absolute_qualified_path(...);
        // Check for cycles
        if !visited.insert(target_path.clone()) {
            return Err(Error::CyclicReference);
        }
        // Fetch element at target
        let element = Element::get(target_path);
        match element {
            Element::Reference(next_ref, ..) => {
                // Still a reference — keep following
                current_ref = next_ref;
                hops_left -= 1;
            }
            other => {
                // Found the actual element!
                return Ok(ResolvedReference { element: other, ... });
            }
        }
    }
    Err(Error::ReferenceLimit) // Exceeded 10 hops
}
Cycle Detection
The visited HashSet tracks all paths we've seen. If we encounter a path we've
already visited, we have a cycle:
graph LR
A["A<br/>Reference"] -->|"step 1"| B["B<br/>Reference"]
B -->|"step 2"| C["C<br/>Reference"]
C -->|"step 3"| A
style A fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style B fill:#fef9e7,stroke:#f39c12
style C fill:#fef9e7,stroke:#f39c12
Cycle detection trace:

| Step | Follow | visited set | Result |
|---|---|---|---|
| 1 | Start at A | { A } | A is Ref → follow |
| 2 | A → B | { A, B } | B is Ref → follow |
| 3 | B → C | { A, B, C } | C is Ref → follow |
| 4 | C → A | — | A already in visited! Error::CyclicReference |

Without cycle detection, this would loop forever. MAX_REFERENCE_HOPS = 10 also caps traversal depth for long chains.
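The visited-set check can be sketched with a toy follow loop, where a map of path → next path stands in for reference resolution:

```rust
use std::collections::{HashMap, HashSet};

// Follow "references" through a next-hop map; a revisited node is a cycle.
fn has_cycle(edges: &[(&str, &str)], start: &str) -> bool {
    let next: HashMap<&str, &str> = edges.iter().copied().collect();
    let mut visited = HashSet::new();
    let mut current = start;
    while let Some(&target) = next.get(current) {
        if !visited.insert(current) {
            return true; // same path seen twice → the CyclicReference case
        }
        current = target;
    }
    false // reached a node with no outgoing reference (an actual element)
}

fn main() {
    // A → B → C → A loops; A → B → C terminates.
    assert!(has_cycle(&[("A", "B"), ("B", "C"), ("C", "A")], "A"));
    assert!(!has_cycle(&[("A", "B"), ("B", "C")], "A"));
}
```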
References in Merk — Combined Value Hashes
When a Reference is stored in a Merk tree, its value_hash must authenticate
both the reference structure and the referenced data:
// merk/src/tree/kv.rs
pub fn update_hashes_using_reference_value_hash(
    mut self,
    reference_value_hash: CryptoHash,
) -> CostContext<Self> {
    // Hash the reference element's own bytes
    let actual_value_hash = value_hash(self.value_as_slice());
    // Combine: H(reference_bytes) ⊕ H(referenced_data)
    let combined = combine_hash(&actual_value_hash, &reference_value_hash);
    self.value_hash = combined;
    self.hash = kv_digest_to_kv_hash(self.key(), self.value_hash());
    // ...
}
This means changing either the reference itself OR the data it points to will change the root hash — both are cryptographically bound.
The Storage Layer
RocksDB with OptimisticTransactionDB
GroveDB uses RocksDB as its storage backend, specifically the
OptimisticTransactionDB variant that supports transactions:
// storage/src/rocksdb_storage/storage.rs
pub(crate) type Db = OptimisticTransactionDB;
pub(crate) type Tx<'db> = Transaction<'db, Db>;

pub struct RocksDbStorage {
    db: OptimisticTransactionDB,
}
Optimistic transactions work by assuming there won't be conflicts. If two transactions modify the same data, the second one to commit will fail and can be retried. This is more efficient than pessimistic locking for workloads where conflicts are rare.
RocksDB options are tuned for GroveDB's workload:
lazy_static! {
    static ref DEFAULT_OPTS: rocksdb::Options = {
        let mut opts = rocksdb::Options::default();
        opts.create_if_missing(true);
        opts.increase_parallelism(num_cpus::get() as i32);
        opts.set_allow_mmap_writes(true);
        opts.set_allow_mmap_reads(true);
        opts.create_missing_column_families(true);
        opts.set_atomic_flush(true);
        opts
    };
}
Four Column Families
RocksDB column families act like separate key-value namespaces within a single database. GroveDB uses four:
graph TD
subgraph rocksdb["RocksDB (OptimisticTransactionDB)"]
subgraph cf_main["Column Family: "default" (main)"]
main_desc["All Merk tree nodes<br/>Key: [prefix: 32B][node_key]<br/>Value: encoded TreeNode"]
end
subgraph cf_aux["Column Family: "aux""]
aux_desc["Per-subtree auxiliary data<br/>Key: [prefix: 32B][aux_key]<br/>Used for application-defined metadata"]
end
subgraph cf_roots["Column Family: "roots""]
roots_desc["Subtree existence + root keys<br/>Key: [prefix: 32B]<br/>Value: root_key bytes"]
end
subgraph cf_meta["Column Family: "meta""]
meta_desc["Global metadata<br/>NOT prefixed — shared across entire DB<br/>Grove version, feature flags, etc."]
end
end
style rocksdb fill:#f8f9fa,stroke:#2c3e50,stroke-width:3px
style cf_main fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style cf_aux fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style cf_roots fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style cf_meta fill:#e8daef,stroke:#8e44ad,stroke-width:2px
Example: Key `[ab3fc2...][6e616d65]` in the "default" CF maps to `TreeNode{key: "name", val: "Al"}`, where `ab3fc2...` is `Blake3(path)` and `6e616d65` is `"name"` in bytes.
```rust
pub(crate) const AUX_CF_NAME: &str = "aux";
pub(crate) const ROOTS_CF_NAME: &str = "roots";
pub(crate) const META_CF_NAME: &str = "meta";
// Main data uses the default column family
```
Prefixed Storage Contexts
Each subtree gets its own prefixed storage context — a wrapper that automatically prepends the 32-byte Blake3 prefix to all keys:
Subtree path: ["identities", "alice"]
Prefix: Blake3(path) = [0xab, 0x3f, 0xc2, ...] (32 bytes)
When subtree stores key "name" with value "Alice":
RocksDB key: [0xab 0x3f 0xc2 ... (32 bytes) | 0x6e 0x61 0x6d 0x65]
\_________prefix________/ \_____"name"_____/
RocksDB value: [encoded TreeNode with value "Alice"]
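The key layout above can be modeled in a few lines — here with a stand-in prefix derivation in place of `Blake3(path)`; the `toy_prefix` and `prefixed_key` helpers are illustrative, not GroveDB's:

```rust
// Sketch of prefixed key construction: a 32-byte prefix derived from the
// subtree path is prepended to every key the subtree stores, giving each
// subtree its own namespace in the shared column family.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Derive a 32-byte prefix from a subtree path (stand-in for Blake3(path)).
fn toy_prefix(path: &[&[u8]]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        (i as u64).hash(&mut h);
        for segment in path {
            segment.hash(&mut h);
        }
        chunk.copy_from_slice(&h.finish().to_be_bytes());
    }
    out
}

// RocksDB key = [32-byte prefix | subtree-local key]
fn prefixed_key(prefix: &[u8; 32], key: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(32 + key.len());
    out.extend_from_slice(prefix);
    out.extend_from_slice(key);
    out
}
```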
The context types:
Without transaction:
PrefixedRocksDbImmediateStorageContext
└── Reads/writes directly to DB with prefix
With transaction:
PrefixedRocksDbTransactionContext
└── Reads/writes through a Transaction with prefix
Both implement the StorageContext trait:
```rust
pub trait StorageContext<'db> {
    fn get(&self, key: &[u8]) -> CostResult<Option<Vec<u8>>, Error>;
    fn get_aux(&self, key: &[u8]) -> CostResult<Option<Vec<u8>>, Error>;
    fn get_root(&self, key: &[u8]) -> CostResult<Option<Vec<u8>>, Error>;
    fn get_meta(&self, key: &[u8]) -> CostResult<Option<Vec<u8>>, Error>;
    fn put(&self, key: &[u8], value: &[u8], ...) -> CostResult<(), Error>;
    fn put_aux(&self, key: &[u8], value: &[u8], ...) -> CostResult<(), Error>;
    fn put_root(&self, key: &[u8], value: &[u8], ...) -> CostResult<(), Error>;
    fn put_meta(&self, key: &[u8], value: &[u8], ...) -> CostResult<(), Error>;
    fn delete(&self, key: &[u8], ...) -> CostResult<(), Error>;
    // ...
}
```
Write Batches and Transaction Model
For performance, GroveDB accumulates writes into batches:
graph LR
subgraph individual["Individual Writes (3 I/O ops)"]
i1["put("a","1")"] --> d1["DISK fsync"]
i2["put("b","2")"] --> d2["DISK fsync"]
i3["put("c","3")"] --> d3["DISK fsync"]
end
subgraph batched["Batched Writes (1 atomic I/O op)"]
b1["put("a","1")"] --> buf["Memory<br/>Buffer"]
b2["put("b","2")"] --> buf
b3["put("c","3")"] --> buf
buf --> d4["DISK<br/>fsync once"]
end
style individual fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
style batched fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style buf fill:#fef9e7,stroke:#f39c12
3 disk syncs vs 1 disk sync = ~3x faster. Batched writes are also atomic (all-or-nothing).
The StorageBatch accumulates operations that are flushed together:
```rust
pub struct StorageBatch {
    operations: RefCell<Vec<AbstractBatchOperation>>,
}
```
The Critical commit_local() Pattern
When using transactions, there's a critical pattern that must be followed. Writes within a transaction are buffered — they're not visible until committed:
```rust
// CORRECT pattern:
{
    let tx = db.start_transaction();
    let storage_ctx = db.get_transactional_storage_context(path, &tx);
    storage_ctx.put(key, value); // Writes to transaction buffer
    drop(storage_ctx);           // Release borrow on tx
    tx.commit_local();           // Flush transaction to DB
}

// INCORRECT — data is lost:
{
    let tx = db.start_transaction();
    let storage_ctx = db.get_transactional_storage_context(path, &tx);
    storage_ctx.put(key, value); // Writes to transaction buffer
    // tx drops here without commit_local()!
    // All writes are ROLLED BACK!
}
```
This is especially important because the storage_ctx borrows the transaction.
You must drop(storage_ctx) before you can call tx.commit_local().
The Proof System
GroveDB's proof system lets any party verify the correctness of query results without having the full database. A proof is a compact representation of the relevant tree structure that allows reconstruction of the root hash.
Stack-Based Proof Operations
Proofs are encoded as a sequence of operations that reconstruct a partial tree using a stack machine:
```rust
// merk/src/proofs/mod.rs
pub enum Op {
    Push(Node),         // Push a node onto the stack (ascending key order)
    PushInverted(Node), // Push a node (descending key order)
    Parent,             // Pop parent, pop child → attach child as LEFT of parent
    Child,              // Pop child, pop parent → attach child as RIGHT of parent
    ParentInverted,     // Pop parent, pop child → attach child as RIGHT of parent
    ChildInverted,      // Pop child, pop parent → attach child as LEFT of parent
}
```
Execution uses a stack:
Proof ops: Push(B), Push(A), Parent, Push(C), Child
| Step | Operation | Stack (top→right) | Action |
|---|---|---|---|
| 1 | Push(B) | [ B ] | Push B onto stack |
| 2 | Push(A) | [ B , A ] | Push A onto stack |
| 3 | Parent | [ A{left:B} ] | Pop A (parent), pop B (child), B → LEFT of A |
| 4 | Push(C) | [ A{left:B} , C ] | Push C onto stack |
| 5 | Child | [ A{left:B, right:C} ] | Pop C (child), pop A (parent), C → RIGHT of A |
Final result — one tree on the stack:
graph TD
A_proof["A<br/>(root)"]
B_proof["B<br/>(left)"]
C_proof["C<br/>(right)"]
A_proof --> B_proof
A_proof --> C_proof
style A_proof fill:#fef9e7,stroke:#f39c12,stroke-width:2px
The verifier computes `node_hash(A) = Blake3(kv_hash_A || node_hash_B || node_hash_C)` and checks that it matches the expected root hash.
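The walkthrough above can be condensed into a toy stack machine — a sketch of the idea only, since Merk's real `execute` (shown next) also tracks hashes and costs:

```rust
// Minimal stack machine for proof ops. Parent pops the parent first
// (it is on top), then the child; Child pops the child first.
#[derive(Debug, PartialEq)]
struct ProofTree {
    key: &'static str,
    left: Option<Box<ProofTree>>,
    right: Option<Box<ProofTree>>,
}

enum Op {
    Push(&'static str),
    Parent, // pop parent, pop child → child becomes LEFT of parent
    Child,  // pop child, pop parent → child becomes RIGHT of parent
}

fn execute_ops(ops: &[Op]) -> ProofTree {
    let mut stack: Vec<ProofTree> = Vec::new();
    for op in ops {
        match op {
            Op::Push(key) => stack.push(ProofTree { key: *key, left: None, right: None }),
            Op::Parent => {
                let mut parent = stack.pop().expect("parent");
                let child = stack.pop().expect("child");
                parent.left = Some(Box::new(child));
                stack.push(parent);
            }
            Op::Child => {
                let child = stack.pop().expect("child");
                let mut parent = stack.pop().expect("parent");
                parent.right = Some(Box::new(child));
                stack.push(parent);
            }
        }
    }
    assert_eq!(stack.len(), 1, "a valid proof leaves exactly one tree");
    stack.pop().unwrap()
}
```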
This is the execute function (merk/src/proofs/tree.rs):
```rust
pub fn execute<I, F>(ops: I, collapse: bool, mut visit_node: F) -> CostResult<Tree, Error>
where
    I: IntoIterator<Item = Result<Op, Error>>,
    F: FnMut(&Node) -> Result<(), Error>,
{
    let mut stack: Vec<Tree> = Vec::with_capacity(32);

    for op in ops {
        match op? {
            Op::Parent => {
                let (mut parent, child) = (try_pop(&mut stack), try_pop(&mut stack));
                parent.left = Some(Child { tree: Box::new(child), hash: child.hash() });
                stack.push(parent);
            }
            Op::Child => {
                let (child, mut parent) = (try_pop(&mut stack), try_pop(&mut stack));
                parent.right = Some(Child { tree: Box::new(child), hash: child.hash() });
                stack.push(parent);
            }
            Op::Push(node) => {
                visit_node(&node)?;
                stack.push(Tree::from(node));
            }
            // ... Inverted variants swap left/right
        }
    }
    // Final item on stack is the root
}
```
Node Types in Proofs
Each Push carries a Node that contains just enough information for
verification:
```rust
pub enum Node {
    // Minimum info — just the hash. Used for distant siblings.
    Hash(CryptoHash),
    // KV hash for nodes on the path but not queried.
    KVHash(CryptoHash),
    // Full key-value for queried items.
    KV(Vec<u8>, Vec<u8>),
    // Key, value, and pre-computed value_hash.
    // Used for subtrees where value_hash = combine_hash(...)
    KVValueHash(Vec<u8>, Vec<u8>, CryptoHash),
    // KV with feature type — for ProvableCountTree or chunk restoration.
    KVValueHashFeatureType(Vec<u8>, Vec<u8>, CryptoHash, TreeFeatureType),
    // Reference: key, dereferenced value, hash of reference element.
    KVRefValueHash(Vec<u8>, Vec<u8>, CryptoHash),
    // For items in ProvableCountTree.
    KVCount(Vec<u8>, Vec<u8>, u64),
    // KV hash + count for non-queried ProvableCountTree nodes.
    KVHashCount(CryptoHash, u64),
    // Reference in ProvableCountTree.
    KVRefValueHashCount(Vec<u8>, Vec<u8>, CryptoHash, u64),
    // For boundary/absence proofs in ProvableCountTree.
    KVDigestCount(Vec<u8>, CryptoHash, u64),
    // Key + value_hash for absence proofs (regular trees).
    KVDigest(Vec<u8>, CryptoHash),
}
```
The choice of Node type determines what information the verifier needs:
Query: "Get value for key 'bob'"
graph TD
dave["dave<br/><b>KVHash</b><br/>(on path, not queried)"]
bob["bob<br/><b>KVValueHash</b><br/>key + value + value_hash<br/><i>THE QUERIED NODE</i>"]
frank["frank<br/><b>Hash</b><br/>(distant sibling,<br/>32-byte hash only)"]
alice["alice<br/><b>Hash</b><br/>(32-byte hash only)"]
carol["carol<br/><b>Hash</b><br/>(32-byte hash only)"]
dave --> bob
dave --> frank
bob --> alice
bob --> carol
style bob fill:#d5f5e3,stroke:#27ae60,stroke-width:3px
style dave fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style frank fill:#e8e8e8,stroke:#999
style alice fill:#e8e8e8,stroke:#999
style carol fill:#e8e8e8,stroke:#999
Green = queried node (full data revealed). Yellow = on the path (kv_hash only). Gray = siblings (just 32-byte node hashes).
Encoded as proof ops:
| # | Op | Effect |
|---|---|---|
| 1 | Push(Hash(alice_node_hash)) | Push alice hash |
| 2 | Push(KVValueHash("bob", value, value_hash)) | Push bob with full data |
| 3 | Parent | alice becomes left child of bob |
| 4 | Push(Hash(carol_node_hash)) | Push carol hash |
| 5 | Child | carol becomes right child of bob |
| 6 | Push(KVHash(dave_kv_hash)) | Push dave kv_hash |
| 7 | Parent | bob subtree becomes left of dave |
| 8 | Push(Hash(frank_node_hash)) | Push frank hash |
| 9 | Child | frank becomes right child of dave |
Multi-Layer Proof Generation
Since GroveDB is a tree of trees, proofs span multiple layers. Each layer proves the relevant portion of one Merk tree, and the layers are connected by the combined value_hash mechanism:
Query: Get ["identities", "alice", "name"]
graph TD
subgraph layer0["LAYER 0: Root tree proof"]
L0["Proves "identities" exists<br/>Node: KVValueHash<br/>value_hash = combine_hash(<br/> H(tree_element_bytes),<br/> identities_root_hash<br/>)"]
end
subgraph layer1["LAYER 1: Identities tree proof"]
L1["Proves "alice" exists<br/>Node: KVValueHash<br/>value_hash = combine_hash(<br/> H(tree_element_bytes),<br/> alice_root_hash<br/>)"]
end
subgraph layer2["LAYER 2: Alice subtree proof"]
L2["Proves "name" = "Alice"<br/>Node: KV (full key + value)<br/>Result: <b>"Alice"</b>"]
end
state_root["Known State Root"] -->|"verify"| L0
L0 -->|"identities_root_hash<br/>must match"| L1
L1 -->|"alice_root_hash<br/>must match"| L2
style layer0 fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style layer1 fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style layer2 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style state_root fill:#2c3e50,stroke:#2c3e50,color:#fff
Chain of trust:
known_state_root → verify Layer 0 → verify Layer 1 → verify Layer 2 → "Alice". Each layer's reconstructed root hash must match the value_hash from the layer above.
The verifier checks each layer, confirming that:
- The layer proof reconstructs to the expected root hash
- The root hash matches the value_hash from the parent layer
- The top-level root hash matches the known state root
Proof Verification
Verification follows the proof layers bottom-up or top-down, using the execute
function to reconstruct each layer's tree. The Tree::hash() method in the proof
tree computes the hash based on the node type:
```rust
impl Tree {
    pub fn hash(&self) -> CostContext<CryptoHash> {
        match &self.node {
            Node::Hash(hash) => *hash, // Already a hash, return directly
            Node::KVHash(kv_hash) =>
                node_hash(kv_hash, &self.child_hash(true), &self.child_hash(false)),
            Node::KV(key, value) => kv_hash(key, value)
                .flat_map(|kv_hash| node_hash(&kv_hash, &left, &right)),
            Node::KVValueHash(key, _, value_hash) => kv_digest_to_kv_hash(key, value_hash)
                .flat_map(|kv_hash| node_hash(&kv_hash, &left, &right)),
            Node::KVValueHashFeatureType(key, _, value_hash, feature_type) => {
                let kv = kv_digest_to_kv_hash(key, value_hash);
                match feature_type {
                    ProvableCountedMerkNode(count) =>
                        node_hash_with_count(&kv, &left, &right, *count),
                    _ => node_hash(&kv, &left, &right),
                }
            }
            Node::KVRefValueHash(key, referenced_value, ref_element_hash) => {
                let ref_value_hash = value_hash(referenced_value);
                let combined = combine_hash(ref_element_hash, &ref_value_hash);
                let kv = kv_digest_to_kv_hash(key, &combined);
                node_hash(&kv, &left, &right)
            }
            // ... other variants
        }
    }
}
```
Absence Proofs
GroveDB can prove that a key does not exist. This uses boundary nodes — the nodes that would be adjacent to the missing key if it existed:
Prove: "charlie" does NOT exist
graph TD
dave_abs["dave<br/><b>KVDigest</b><br/>(right boundary)"]
bob_abs["bob"]
frank_abs["frank<br/>Hash"]
alice_abs["alice<br/>Hash"]
carol_abs["carol<br/><b>KVDigest</b><br/>(left boundary)"]
missing["(no right child!)<br/>"charlie" would be here"]
dave_abs --> bob_abs
dave_abs --> frank_abs
bob_abs --> alice_abs
bob_abs --> carol_abs
carol_abs -.->|"right = None"| missing
style carol_abs fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style dave_abs fill:#fadbd8,stroke:#e74c3c,stroke-width:3px
style missing fill:none,stroke:#e74c3c,stroke-dasharray:5 5
style alice_abs fill:#e8e8e8,stroke:#999
style frank_abs fill:#e8e8e8,stroke:#999
Binary search: alice < bob < carol < "charlie" < dave < frank. "charlie" would be between carol and dave. Carol's right child is `None`, proving nothing exists between carol and dave. Therefore "charlie" cannot exist in this tree.
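The boundary argument can be modeled as a pure function over the keys a proof reveals — an illustrative simplification (`absence_boundaries` is not GroveDB's API; real absence proofs reason about tree structure, not a flat sorted list):

```rust
// Given the sorted keys revealed by a proof, find the boundary pair that
// brackets `missing`. Absence holds if the boundaries are adjacent in the
// revealed ordering and the missing key is not among them.
fn absence_boundaries<'a>(revealed: &[&'a str], missing: &str) -> Option<(&'a str, &'a str)> {
    // Index of the first revealed key >= missing.
    let idx = revealed.partition_point(|k| *k < missing);
    if idx == 0 || idx == revealed.len() {
        return None; // missing key falls outside the revealed range
    }
    if revealed[idx] == missing {
        return None; // key actually exists — no absence to prove
    }
    Some((revealed[idx - 1], revealed[idx]))
}
```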
For range queries, absence proofs show that there are no keys within the queried range that were not included in the result set.
V1 Proofs — Non-Merk Trees
The V0 proof system works exclusively with Merk subtrees, descending layer by layer through the grove hierarchy. However, CommitmentTree, MmrTree, BulkAppendTree, and DenseAppendOnlyFixedSizeTree elements store their data outside a child Merk tree. They have no child Merk to descend into — their type-specific root hash flows as the Merk child hash instead.
The V1 proof format extends V0 to handle these non-Merk trees with type-specific proof structures:
```rust
/// Which proof format a layer uses.
pub enum ProofBytes {
    Merk(Vec<u8>),           // Standard Merk proof ops
    MMR(Vec<u8>),            // MMR membership proof
    BulkAppendTree(Vec<u8>), // BulkAppendTree range proof
    DenseTree(Vec<u8>),      // Dense tree inclusion proof
    CommitmentTree(Vec<u8>), // Sinsemilla root (32 bytes) + BulkAppendTree proof
}

/// One layer of a V1 proof.
pub struct LayerProof {
    pub merk_proof: ProofBytes,
    pub lower_layers: BTreeMap<Vec<u8>, LayerProof>,
}
```
V0/V1 selection rule: If every layer in the proof is a standard Merk tree, `prove_query` produces a `GroveDBProof::V0` (backward compatible). If any layer involves a CommitmentTree, MmrTree, BulkAppendTree, or DenseAppendOnlyFixedSizeTree, it produces a `GroveDBProof::V1`.
How Non-Merk Tree Proofs Bind to the Root Hash
The parent Merk tree proves the element's serialized bytes via a standard Merk
proof node (KVValueHash). The type-specific root (e.g., mmr_root or
state_root) flows as the Merk child hash — it is NOT embedded in the
element bytes:
combined_value_hash = combine_hash(
Blake3(varint(len) || element_bytes), ← contains count, height, etc.
type_specific_root ← mmr_root / state_root / dense_root
)
The type-specific proof then proves that the queried data is consistent with the type-specific root that was used as the child hash.
MMR Tree Proofs
An MMR proof demonstrates that specific leaves exist at known positions within the MMR, and that the MMR's root hash matches the child hash stored in the parent Merk node:
```rust
pub struct MmrProof {
    pub mmr_size: u64,
    pub proof: MerkleProof, // ckb_merkle_mountain_range::MerkleProof
    pub leaves: Vec<MmrProofLeaf>,
}

pub struct MmrProofLeaf {
    pub position: u64,   // MMR position
    pub leaf_index: u64, // Logical leaf index
    pub hash: [u8; 32],  // Leaf hash
    pub value: Vec<u8>,  // Leaf value bytes
}
```
graph TD
subgraph parent_merk["Parent Merk (V0 layer)"]
elem[""my_mmr"<br/><b>KVValueHash</b><br/>child hash = mmr_root"]
end
subgraph mmr_proof["MMR Proof (V1 layer)"]
peak1["Peak 1<br/>hash"]
peak2["Peak 2<br/>hash"]
leaf_a["Leaf 5<br/><b>proved</b><br/>value = 0xABCD"]
sibling["Sibling<br/>hash"]
peak2 --> leaf_a
peak2 --> sibling
end
elem -->|"mmr_root must match<br/>MMR root from peaks"| mmr_proof
style parent_merk fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style mmr_proof fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style leaf_a fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
Query keys are positions: Query items encode positions as big-endian u64
bytes (which preserves sort order). QueryItem::RangeInclusive with BE-encoded
start/end positions selects a contiguous range of MMR leaves.
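The big-endian encoding trick is small enough to verify directly — byte-wise ordering of `to_be_bytes` output matches numeric ordering, which little-endian encoding would break:

```rust
// Encode an MMR position as a query key. Big-endian bytes sort the same
// way as the underlying u64 values, so byte-range query items select a
// contiguous numeric range of positions.
fn position_key(pos: u64) -> [u8; 8] {
    pos.to_be_bytes()
}
```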
Verification:
- Reconstruct `MmrNode` leaves from the proof
- Verify the ckb `MerkleProof` against the expected MMR root taken from the parent Merk child hash
- Cross-validate that `proof.mmr_size` matches the element's stored size
- Return the proved leaf values
BulkAppendTree Proofs
BulkAppendTree proofs are more complex because data lives in two places: sealed chunk blobs and the in-progress buffer. A range proof must return:
- Full chunk blobs for any completed chunk overlapping the query range
- Individual buffer entries for positions still in the buffer
```rust
pub struct BulkAppendTreeProof {
    pub chunk_power: u8,
    pub total_count: u64,
    pub chunk_blobs: Vec<(u64, Vec<u8>)>,       // (chunk_index, blob_bytes)
    pub chunk_mmr_size: u64,
    pub chunk_mmr_proof_items: Vec<[u8; 32]>,   // MMR sibling hashes
    pub chunk_mmr_leaves: Vec<(u64, [u8; 32])>, // (mmr_pos, dense_merkle_root)
    pub buffer_entries: Vec<Vec<u8>>,           // ALL current buffer (dense tree) entries
    pub chunk_mmr_root: [u8; 32],
}
```
graph TD
subgraph verify["Verification Steps"]
step1["1. For each chunk blob:<br/>compute dense_merkle_root<br/>verify matches chunk_mmr_leaves"]
step2["2. Verify chunk MMR proof<br/>against chunk_mmr_root"]
step3["3. Recompute dense_tree_root<br/>from ALL buffer entries<br/>using dense Merkle tree"]
step4["4. Verify state_root =<br/>blake3("bulk_state" ||<br/>chunk_mmr_root ||<br/>dense_tree_root)"]
step5["5. Extract entries in<br/>queried position range"]
step1 --> step2 --> step3 --> step4 --> step5
end
style verify fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style step4 fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
Why include ALL buffer entries? The buffer is a dense Merkle tree whose root hash commits to every entry. To verify the `dense_tree_root`, the verifier must rebuild the tree from all entries. Since the buffer is bounded by `capacity` entries (at most 65,535), this is acceptable.
Limit accounting: Each individual value (within a chunk or the buffer) counts toward the query limit — not each chunk blob as a whole. If a query has `limit: 100` and a chunk contains 1024 entries of which 500 overlap the queried range, all 500 overlapping entries are charged against the limit, not one unit for the blob.
DenseAppendOnlyFixedSizeTree Proofs
A dense tree proof demonstrates that specific positions hold specific values,
authenticated against the tree's root hash (which flows as the Merk child hash).
All nodes use blake3(H(value) || H(left) || H(right)), so ancestor nodes on the
auth path only need their 32-byte value hash — not the full value.
```rust
pub struct DenseTreeProof {
    pub entries: Vec<(u16, Vec<u8>)>,            // proved (position, value)
    pub node_value_hashes: Vec<(u16, [u8; 32])>, // ancestor value hashes on auth path
    pub node_hashes: Vec<(u16, [u8; 32])>,       // precomputed sibling subtree hashes
}
```
`height` and `count` come from the parent Element (authenticated by the Merk hierarchy), not from the proof itself.
graph TD
subgraph parent_merk["Parent Merk (V0 layer)"]
elem[""my_dense"<br/><b>KVValueHash</b><br/>child hash = root_hash"]
end
subgraph dense_proof["Dense Tree Proof (V1 layer)"]
root["Position 0<br/>node_value_hashes<br/>H(value[0])"]
node1["Position 1<br/>node_value_hashes<br/>H(value[1])"]
hash2["Position 2<br/>node_hashes<br/>H(subtree)"]
hash3["Position 3<br/>node_hashes<br/>H(node)"]
leaf4["Position 4<br/><b>entries</b><br/>value[4] (proved)"]
root --> node1
root --> hash2
node1 --> hash3
node1 --> leaf4
end
elem -->|"root_hash must match<br/>recomputed H(0)"| dense_proof
style parent_merk fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style dense_proof fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style leaf4 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
Verification is a pure function requiring no storage:
- Build lookup maps from `entries`, `node_value_hashes`, and `node_hashes`
- Recursively recompute the root hash from position 0:
  - Position has a precomputed hash in `node_hashes` → use it directly
  - Position has a value in `entries` → `blake3(blake3(value) || H(left) || H(right))`
  - Position has a hash in `node_value_hashes` → `blake3(hash || H(left) || H(right))`
  - Position `>= count` or `>= capacity` → `[0u8; 32]`
- Compare the computed root with the expected root hash from the parent element
- Return the proved entries on success
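The recursive recomputation can be sketched over a heap-layout value array (children of position `i` at `2i+1` and `2i+2`), with `DefaultHasher` standing in for Blake3 and `u64` digests for 32-byte hashes — `dense_root` is an illustrative name, not GroveDB's verifier:

```rust
// Toy dense-tree root: every node hashes H(value) together with the
// hashes of its left and right subtrees; positions past the end of the
// value array contribute a zero hash (stand-in for [0u8; 32]).
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn toy_hash(parts: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    for p in parts {
        p.hash(&mut h);
    }
    h.finish()
}

fn dense_root(values: &[&[u8]], pos: u64) -> u64 {
    if pos >= values.len() as u64 {
        return 0; // position >= count → empty hash
    }
    let value_hash = toy_hash(&[values[pos as usize]]);
    let left = dense_root(values, 2 * pos + 1);
    let right = dense_root(values, 2 * pos + 2);
    // node = H(H(value) || H(left) || H(right))
    toy_hash(&[&value_hash.to_be_bytes(), &left.to_be_bytes(), &right.to_be_bytes()])
}
```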
Multi-position proofs merge overlapping auth paths: shared ancestors and their values appear only once, making them more compact than independent proofs.
The Query System
PathQuery Structure
GroveDB queries use the PathQuery type, which combines a path (where to look)
with a query (what to select):
```rust
pub struct PathQuery {
    pub path: Vec<Vec<u8>>, // Starting path in the grove
    pub query: SizedQuery,  // What to select
}

pub struct SizedQuery {
    pub query: Query,        // The selection criteria
    pub limit: Option<u16>,  // Maximum number of results
    pub offset: Option<u16>, // Skip first N results
}
```
The Query Type
```rust
pub struct Query {
    pub items: Vec<QueryItem>, // What to match
    pub default_subquery_branch: SubqueryBranch,
    pub conditional_subquery_branches: Option<IndexMap<QueryItem, SubqueryBranch>>,
    pub left_to_right: bool,               // Iteration direction
    pub add_parent_tree_on_subquery: bool, // Include parent tree element in results (v2)
}
```
`add_parent_tree_on_subquery` (v2): When `true`, the parent tree element (e.g., a CountTree or SumTree) is included in query results alongside its children's values. This lets you retrieve both aggregate values and individual elements in one query.
QueryItems — What to Select
Each QueryItem specifies a key or range to match:
```rust
pub enum QueryItem {
    Key(Vec<u8>),                                   // Exact key match
    Range(Range<Vec<u8>>),                          // Exclusive range [start..end)
    RangeInclusive(RangeInclusive<Vec<u8>>),        // Inclusive range [start..=end]
    RangeFull(RangeFull),                           // All keys
    RangeFrom(RangeFrom<Vec<u8>>),                  // [start..)
    RangeTo(RangeTo<Vec<u8>>),                      // [..end)
    RangeToInclusive(RangeToInclusive<Vec<u8>>),    // [..=end]
    RangeAfter(RangeFrom<Vec<u8>>),                 // (start..) exclusive start
    RangeAfterTo(Range<Vec<u8>>),                   // (start..end) exclusive both
    RangeAfterToInclusive(RangeInclusive<Vec<u8>>), // (start..=end]
}
```
Example queries:
Merk tree (sorted): alice bob carol dave eve frank
| Query | Selection | Result |
|---|---|---|
| `Key("bob")` | alice [bob] carol dave eve frank | bob |
| `RangeInclusive("bob"..="dave")` | alice [bob carol dave] eve frank | bob, carol, dave |
| `RangeAfter("carol"..)` | alice bob carol [dave eve frank] | dave, eve, frank |
| `RangeFull`, limit=2 | [alice bob] carol dave eve frank (stopped by limit) | alice, bob |
| `RangeFull`, limit=2, right-to-left | alice bob carol dave [eve frank] (stopped by limit) | frank, eve |
Subqueries and Conditional Branches
The real power of GroveDB queries is subqueries — when a query matches a Tree element, the query can automatically descend into that subtree:
graph TD
subgraph contracts["Merk ["contracts"]"]
cA[""contract_A"<br/>Tree"]
cB[""contract_B"<br/>Tree"]
end
subgraph merkCA["Merk [contracts, contract_A]"]
f1a["field1 → "value1""]
f2a["field2 → "value2""]
end
subgraph merkCB["Merk [contracts, contract_B]"]
f1b["field1 → "value3""]
f2b["field2 → "value4""]
end
cA -.-> f1a
cB -.-> f1b
style contracts fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style merkCA fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style merkCB fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style f1a fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style f1b fill:#fef9e7,stroke:#f39c12,stroke-width:2px
PathQuery: `path: ["contracts"]`, `query: RangeFull` with `default_subquery: Key("field1")`

Execution:
1. `RangeFull` on ["contracts"] → matches contract_A, contract_B
2. Both are Tree elements → descend with subquery `Key("field1")`
3. contract_A → "value1", contract_B → "value3"

Result: `["value1", "value3"]`
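The descent can be modeled with nested maps — a toy `Elem`/`range_full_with_subquery` model of the default-subquery behavior, not GroveDB's actual types:

```rust
// A grove level maps keys to either a value (Item) or a nested level
// (Tree). A RangeFull match that hits a Tree descends with the subquery.
use std::collections::BTreeMap;

enum Elem {
    Item(&'static str),
    Tree(BTreeMap<&'static str, Elem>),
}

fn range_full_with_subquery(
    level: &BTreeMap<&'static str, Elem>,
    subquery_key: &str,
) -> Vec<&'static str> {
    let mut results = Vec::new();
    for elem in level.values() {
        match elem {
            // Descend: apply the subquery inside the child tree.
            Elem::Tree(child) => {
                if let Some(Elem::Item(v)) = child.get(subquery_key) {
                    results.push(*v);
                }
            }
            // A plain Item matches directly; no descent needed.
            Elem::Item(v) => results.push(*v),
        }
    }
    results
}
```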
Conditional subqueries let you apply different subqueries based on which key was matched:
```rust
conditional_subquery_branches: Some(indexmap! {
    QueryItem::Key(b"contract_A".to_vec()) => SubqueryBranch {
        subquery: Some(Query { items: vec![Key(b"field1".to_vec())] }),
        ..
    },
    QueryItem::Key(b"contract_B".to_vec()) => SubqueryBranch {
        subquery: Some(Query { items: vec![Key(b"field2".to_vec())] }),
        ..
    },
})
```
This would get field1 from contract_A but field2 from contract_B.
Sized Queries — Limit and Offset
The SizedQuery wrapper adds pagination:
graph LR
subgraph offset_zone["Offset (skip 2)"]
oA["A"]
oB["B"]
end
subgraph result_zone["Result (limit = 3)"]
rC["C"]
rD["D"]
rE["E"]
end
subgraph remaining["Remaining (not returned)"]
xF["F"]
xG["G"]
xH["H"]
end
oA --> oB --> rC --> rD --> rE --> xF --> xG --> xH
style offset_zone fill:#e8e8e8,stroke:#999,stroke-dasharray:5 5
style result_zone fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style remaining fill:#fadbd8,stroke:#e74c3c,stroke-dasharray:5 5
`SizedQuery { query: RangeFull, limit: Some(3), offset: Some(2) }` → Result: [C, D, E]
When combined with left_to_right: false, the iteration is reversed:
SizedQuery {
query: Query { items: [RangeFull], left_to_right: false, .. },
limit: Some(3),
offset: None
}
Result: [H, G, F]
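The pagination semantics reduce to reverse-then-skip-then-take, which can be checked directly — an illustrative `paginate` helper, not the GroveDB API:

```rust
// Apply SizedQuery-style pagination to an already-ordered result stream:
// reverse when iterating right-to-left, then skip `offset`, then take `limit`.
fn paginate<T: Clone>(
    ordered: &[T],
    left_to_right: bool,
    offset: usize,
    limit: Option<usize>,
) -> Vec<T> {
    let iter: Box<dyn Iterator<Item = &T>> = if left_to_right {
        Box::new(ordered.iter())
    } else {
        Box::new(ordered.iter().rev())
    };
    iter.skip(offset)
        .take(limit.unwrap_or(usize::MAX))
        .cloned()
        .collect()
}
```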
Query Merging
Multiple PathQueries can be merged into a single query for efficiency. The merge algorithm finds common path prefixes and combines query items:
Query A: path=["users"], query=Key("alice")
Query B: path=["users"], query=Key("bob")
Merged: path=["users"], query=items=[Key("alice"), Key("bob")]
Aggregate Sum Queries
Overview
Aggregate Sum Queries are a specialized query type designed for SumTrees in GroveDB. While regular queries retrieve elements by key or range, aggregate sum queries iterate through elements and accumulate their sum values until a sum limit is reached.
This is useful for questions like:
- "Give me transactions until the running total exceeds 1000"
- "Which items contribute to the first 500 units of value in this tree?"
- "Collect sum items up to a budget of N"
Core Concepts
How It Differs from Regular Queries
| Feature | PathQuery | AggregateSumPathQuery |
|---|---|---|
| Target | Any element type | SumItem / ItemWithSumItem elements |
| Stop condition | Limit (count) or end of range | Sum limit (running total) and/or item limit |
| Returns | Elements or keys | Key-sum value pairs |
| Subqueries | Yes (descend into subtrees) | No (single tree level) |
| References | Resolved by GroveDB layer | Optionally followed or ignored |
The AggregateSumQuery Structure
```rust
pub struct AggregateSumQuery {
    pub items: Vec<QueryItem>, // Keys or ranges to scan
    pub left_to_right: bool,   // Iteration direction
    pub sum_limit: u64,        // Stop when running total reaches this
    pub limit_of_items_to_check: Option<u16>, // Max number of matching items to return
}
```
The query is wrapped in an AggregateSumPathQuery to specify where in the grove to look:
```rust
pub struct AggregateSumPathQuery {
    pub path: Vec<Vec<u8>>, // Path to the SumTree
    pub aggregate_sum_query: AggregateSumQuery,
}
```
Sum Limit — The Running Total
The sum_limit is the central concept. As elements are scanned, their sum values are
accumulated. Once the running total meets or exceeds the sum limit, iteration stops:
graph LR
subgraph scanned["Scanned (sum_limit = 15)"]
A["a: 7"]
B["b: 5"]
C["c: 3"]
end
subgraph remaining["Not reached"]
D["d: 11"]
end
A -->|"total: 7"| B -->|"total: 12"| C -->|"total: 15 ≥ 15 → stop"| D
style scanned fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style remaining fill:#fadbd8,stroke:#e74c3c,stroke-dasharray:5 5
Result: `[(a, 7), (b, 5), (c, 3)]` — iteration stops because 7 + 5 + 3 = 15 ≥ sum_limit.
Negative sum values are supported. A negative value increases the remaining budget:
sum_limit = 12, elements: a(10), b(-3), c(5)
a: total = 10, remaining = 2
b: total = 7, remaining = 5 ← negative value gave us more room
c: total = 12, remaining = 0 ← stop
Result: [(a, 10), (b, -3), (c, 5)]
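Both examples above can be reproduced with a minimal scan loop — `scan_until_sum_limit` is an illustrative sketch (using `i64` throughout for simplicity), not GroveDB's implementation:

```rust
// Accumulate signed sum values in key order and stop once the running
// total reaches the sum limit. Negative values lower the total,
// extending the scan.
fn scan_until_sum_limit(
    items: &[(&'static str, i64)],
    sum_limit: i64,
) -> Vec<(&'static str, i64)> {
    let mut total: i64 = 0;
    let mut results = Vec::new();
    for &(key, value) in items {
        results.push((key, value));
        total += value;
        if total >= sum_limit {
            break; // running total reached the sum limit
        }
    }
    results
}
```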
Query Options
The AggregateSumQueryOptions struct controls query behavior:
```rust
pub struct AggregateSumQueryOptions {
    pub allow_cache: bool, // Use cached reads (default: true)
    pub error_if_intermediate_path_tree_not_present: bool, // Error on missing path (default: true)
    pub error_if_non_sum_item_found: bool, // Error on non-sum elements (default: true)
    pub ignore_references: bool, // Skip references (default: false)
}
```
Handling Non-Sum Elements
SumTrees may contain a mix of element types: SumItem, Item, Reference, ItemWithSumItem,
and others. By default, encountering a non-sum, non-reference element produces an error.
When error_if_non_sum_item_found is set to false, non-sum elements are silently skipped
without consuming a user limit slot:
Tree contents: a(SumItem=7), b(Item), c(SumItem=3)
Query: sum_limit=100, limit_of_items_to_check=2, error_if_non_sum_item_found=false
Scan: a(7) → returned, limit=1
b(Item) → skipped, limit still 1
c(3) → returned, limit=0 → stop
Result: [(a, 7), (c, 3)]
Note: ItemWithSumItem elements are always processed (never skipped), because they carry
a sum value.
Reference Handling
By default, Reference elements are followed — the query resolves the reference chain
(up to 3 intermediate hops) to find the target element's sum value:
Tree contents: a(SumItem=7), ref_b(Reference → a)
Query: sum_limit=100
ref_b is followed → resolves to a(SumItem=7)
Result: [(a, 7), (ref_b, 7)]
When ignore_references is true, references are silently skipped without consuming a limit
slot, similar to how non-sum elements are skipped.
Reference chains deeper than 3 intermediate hops produce a ReferenceLimit error.
The Result Type
Queries return an AggregateSumQueryResult:
```rust
pub struct AggregateSumQueryResult {
    pub results: Vec<(Vec<u8>, i64)>, // Key-sum value pairs
    pub hard_limit_reached: bool,     // True if system limit truncated results
}
```
The hard_limit_reached flag indicates whether the system's hard scan limit (default: 1024
elements) was reached before the query completed naturally. When true, more results may
exist beyond what was returned.
Three Limit Systems
Aggregate sum queries have three stopping conditions:
| Limit | Source | What it counts | Effect when reached |
|---|---|---|---|
| sum_limit | User (query) | Running total of sum values | Stops iteration |
| limit_of_items_to_check | User (query) | Matching items returned | Stops iteration |
| Hard scan limit | System (GroveVersion, default 1024) | All elements scanned (including skipped) | Stops iteration, sets hard_limit_reached |
The hard scan limit prevents unbounded iteration when no user limit is set. Skipped elements
(non-sum items with error_if_non_sum_item_found=false, or references with
ignore_references=true) count against the hard scan limit but not against the user's
limit_of_items_to_check.
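The interaction of the three limits can be sketched in one loop — `aggregate_scan` is an illustrative model where `None` marks a skippable non-sum element, not GroveDB's implementation:

```rust
// Three stopping conditions: sum limit, user item limit, and the system
// hard scan limit. Skipped elements consume only the hard scan budget.
struct ScanOutcome {
    results: Vec<(&'static str, i64)>,
    hard_limit_reached: bool,
}

fn aggregate_scan(
    items: &[(&'static str, Option<i64>)], // None = skippable non-sum element
    sum_limit: i64,
    item_limit: Option<u16>,
    hard_scan_limit: u32,
) -> ScanOutcome {
    let mut total = 0i64;
    let mut scanned = 0u32;
    let mut returned = 0u16;
    let mut results = Vec::new();
    let mut hard_limit_reached = false;
    for &(key, value) in items {
        if scanned >= hard_scan_limit {
            hard_limit_reached = true; // system limit truncated the scan
            break;
        }
        scanned += 1;
        let value = match value {
            Some(v) => v,
            None => continue, // skipped: counts against the hard budget only
        };
        results.push((key, value));
        returned += 1;
        total += value;
        if total >= sum_limit {
            break; // sum limit reached
        }
        if item_limit.map_or(false, |l| returned >= l) {
            break; // user item limit reached
        }
    }
    ScanOutcome { results, hard_limit_reached }
}
```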
API Usage
Simple Query
```rust
use grovedb::AggregateSumPathQuery;
use grovedb_merk::proofs::query::AggregateSumQuery;

// "Give me items from this SumTree until the total reaches 1000"
let query = AggregateSumQuery::new(1000, None);
let path_query = AggregateSumPathQuery {
    path: vec![b"my_tree".to_vec()],
    aggregate_sum_query: query,
};

let result = db.query_aggregate_sums(
    &path_query,
    true,  // allow_cache
    true,  // error_if_intermediate_path_tree_not_present
    None,  // transaction
    grove_version,
).unwrap().expect("query failed");

for (key, sum_value) in &result.results {
    println!("{}: {}", String::from_utf8_lossy(key), sum_value);
}
```
Query with Options
```rust
use grovedb::{AggregateSumPathQuery, AggregateSumQueryOptions};
use grovedb_merk::proofs::query::AggregateSumQuery;

// Skip non-sum items and ignore references
let query = AggregateSumQuery::new(1000, Some(50));
let path_query = AggregateSumPathQuery {
    path: vec![b"mixed_tree".to_vec()],
    aggregate_sum_query: query,
};

let result = db.query_aggregate_sums_with_options(
    &path_query,
    AggregateSumQueryOptions {
        error_if_non_sum_item_found: false, // skip Items, Trees, etc.
        ignore_references: true,            // skip References
        ..AggregateSumQueryOptions::default()
    },
    None,
    grove_version,
).unwrap().expect("query failed");

if result.hard_limit_reached {
    println!("Warning: results may be incomplete (hard limit reached)");
}
```
Key-Based Queries
Instead of scanning a range, you can query specific keys:
```rust
// Check the sum value of specific keys
let query = AggregateSumQuery::new_with_keys(
    vec![b"alice".to_vec(), b"bob".to_vec(), b"carol".to_vec()],
    u64::MAX, // no sum limit
    None,     // no item limit
);
```
Descending Queries
Iterate from the highest key to the lowest:
```rust
let query = AggregateSumQuery::new_descending(500, Some(10));
// Or: query.left_to_right = false;
```
Constructor Reference
| Constructor | Description |
|---|---|
| new(sum_limit, limit) | Full range, ascending |
| new_descending(sum_limit, limit) | Full range, descending |
| new_single_key(key, sum_limit) | Single key lookup |
| new_with_keys(keys, sum_limit, limit) | Multiple specific keys |
| new_with_keys_reversed(keys, sum_limit, limit) | Multiple keys, descending |
| new_single_query_item(item, sum_limit, limit) | Single QueryItem (key or range) |
| new_with_query_items(items, sum_limit, limit) | Multiple QueryItems |
Batch Operations at the Grove Level
GroveOp Variants
At the GroveDB level, operations are represented as GroveOp:
```rust
pub enum GroveOp {
    // User-facing operations:
    InsertOnly { element: Element },
    InsertOrReplace { element: Element },
    Replace { element: Element },
    Patch { element: Element, change_in_bytes: i32 },
    RefreshReference {
        reference_path_type,
        max_reference_hop,
        flags,
        trust_refresh_reference,
    },
    Delete,
    DeleteTree(TreeType), // Parameterized by tree type

    // Non-Merk tree append operations (user-facing):
    CommitmentTreeInsert { cmx: [u8; 32], payload: Vec<u8> },
    MmrTreeAppend { value: Vec<u8> },
    BulkAppend { value: Vec<u8> },
    DenseTreeInsert { value: Vec<u8> },

    // Internal operations (created by preprocessing/propagation, rejected by from_ops):
    ReplaceTreeRootKey { hash, root_key, aggregate_data },
    InsertTreeWithRootHash { hash, root_key, flags, aggregate_data },
    ReplaceNonMerkTreeRoot { hash: [u8; 32], meta: NonMerkTreeMeta },
    InsertNonMerkTree { hash, root_key, flags, aggregate_data, meta: NonMerkTreeMeta },
}
```
NonMerkTreeMeta carries tree-type-specific state through batch processing:
```rust
pub enum NonMerkTreeMeta {
    CommitmentTree { total_count: u64, chunk_power: u8 },
    MmrTree { mmr_size: u64 },
    BulkAppendTree { total_count: u64, chunk_power: u8 },
    DenseTree { count: u16, height: u8 },
}
```
Each operation is wrapped in a QualifiedGroveDbOp that includes the path:
```rust
pub struct QualifiedGroveDbOp {
    pub path: KeyInfoPath,    // Where in the grove
    pub key: Option<KeyInfo>, // Which key (None for append-only tree ops)
    pub op: GroveOp,          // What to do
}
```
Note: The key field is Option<KeyInfo> — it is None for append-only tree operations (CommitmentTreeInsert, MmrTreeAppend, BulkAppend, DenseTreeInsert), where the tree key is the last segment of path instead.
Two-Phase Processing
Batch operations are processed in two phases:
graph TD
input["Input: Vec<QualifiedGroveDbOp>"]
subgraph phase1["PHASE 1: VALIDATION"]
v1["1. Sort by path + key<br/>(stable sort)"]
v2["2. Build batch structure<br/>(group ops by subtree)"]
v3["3. Validate element types<br/>match targets"]
v4["4. Resolve & validate<br/>references"]
v1 --> v2 --> v3 --> v4
end
v4 -->|"validation OK"| phase2_start
v4 -->|"validation failed"| abort["Err(Error)<br/>abort, no changes"]
subgraph phase2["PHASE 2: APPLICATION"]
phase2_start["Start application"]
a1["1. Open all affected<br/>subtrees (TreeCache)"]
a2["2. Apply MerkBatch ops<br/>(deferred propagation)"]
a3["3. Propagate root hashes<br/>upward (leaf → root)"]
a4["4. Commit transaction<br/>atomically"]
phase2_start --> a1 --> a2 --> a3 --> a4
end
input --> v1
style phase1 fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style phase2 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style abort fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
style a4 fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
TreeCache and Deferred Propagation
During batch application, GroveDB uses a TreeCache to defer root hash propagation until all operations in a subtree are complete:
graph TD
subgraph without["WITHOUT TreeCache (naive)"]
w1["Op 1: Insert A in X"]
w1p["Propagate X → parent → root"]
w2["Op 2: Insert B in X"]
w2p["Propagate X → parent → root"]
w3["Op 3: Insert C in X"]
w3p["Propagate X → parent → root"]
w1 --> w1p --> w2 --> w2p --> w3 --> w3p
end
subgraph with_tc["WITH TreeCache (deferred)"]
t1["Op 1: Insert A in X<br/>→ buffered"]
t2["Op 2: Insert B in X<br/>→ buffered"]
t3["Op 3: Insert C in X<br/>→ buffered"]
tp["Propagate X → parent → root<br/>(walk up ONCE)"]
t1 --> t2 --> t3 --> tp
end
style without fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
style with_tc fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style w1p fill:#fadbd8,stroke:#e74c3c
style w2p fill:#fadbd8,stroke:#e74c3c
style w3p fill:#fadbd8,stroke:#e74c3c
style tp fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
3 propagations × O(depth) vs 1 propagation × O(depth) = 3x faster for this subtree.
This is a significant optimization when many operations target the same subtree.
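The saving can be stated as a one-line counting exercise (toy helpers, not GroveDB code — each propagation is one walk from subtree to root):

```rust
// Without TreeCache: one root-hash walk per operation.
fn naive_propagations(ops_per_subtree: &[u32]) -> u32 {
    ops_per_subtree.iter().sum()
}

// With TreeCache: one root-hash walk per touched subtree,
// regardless of how many ops were buffered into it.
fn deferred_propagations(ops_per_subtree: &[u32]) -> u32 {
    ops_per_subtree.iter().filter(|&&n| n > 0).count() as u32
}
```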
Atomic Cross-Subtree Operations
A key property of GroveDB batches is atomicity across subtrees. A single batch can modify elements in multiple subtrees, and either all changes commit or none do:
Batch:
1. Delete ["balances", "alice"] (remove balance)
2. Insert ["balances", "bob"] = 100 (add balance)
3. Update ["identities", "bob", "rev"] = 2 (update revision)
Three subtrees affected: balances, identities, identities/bob
If ANY operation fails → ALL operations are rolled back
If ALL succeed → ALL are committed atomically
The batch processor handles this by:
- Collecting all affected paths
- Opening all needed subtrees
- Applying all operations
- Propagating all root hashes in dependency order
- Committing the entire transaction
Batch Preprocessing for Non-Merk Trees
CommitmentTree, MmrTree, BulkAppendTree, and DenseAppendOnlyFixedSizeTree operations
require access to storage contexts outside the Merk, which is not available inside the
standard execute_ops_on_path method (it only has access to the Merk). These operations
use a preprocessing pattern: before the main apply_body phase, the entry
points scan for non-Merk tree ops and convert them to standard internal ops.
```rust
pub enum GroveOp {
    // ... standard ops ...

    // Non-Merk tree operations (user-facing):
    CommitmentTreeInsert { cmx: [u8; 32], payload: Vec<u8> },
    MmrTreeAppend { value: Vec<u8> },
    BulkAppend { value: Vec<u8> },
    DenseTreeInsert { value: Vec<u8> },

    // Internal ops (produced by preprocessing):
    ReplaceNonMerkTreeRoot { hash: [u8; 32], meta: NonMerkTreeMeta },
}
```
graph TD
subgraph preprocess["PREPROCESSING PHASE"]
scan["Scan ops for<br/>CommitmentTreeInsert<br/>MmrTreeAppend<br/>BulkAppend<br/>DenseTreeInsert"]
load["Load current state<br/>from storage"]
mutate["Apply append to<br/>in-memory structure"]
save["Write updated state<br/>back to storage"]
convert["Convert to<br/>ReplaceNonMerkTreeRoot<br/>with new root hash + meta"]
scan --> load --> mutate --> save --> convert
end
subgraph apply["STANDARD APPLY_BODY"]
body["execute_ops_on_path<br/>sees ReplaceNonMerkTreeRoot<br/>(non-Merk tree update)"]
prop["Propagate root hash<br/>upward through grove"]
body --> prop
end
convert --> body
style preprocess fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style apply fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
Why preprocessing? The execute_ops_on_path function operates on a single
Merk subtree and has no access to self.db or broader storage contexts.
Preprocessing in the entry points (apply_batch_with_element_flags_update,
apply_partial_batch_with_element_flags_update) has full access to the database,
so it can load/save data and then hand off a simple ReplaceNonMerkTreeRoot
to the standard batch machinery.
Each preprocessing method follows the same pattern:
- preprocess_commitment_tree_ops — Loads the frontier and BulkAppendTree from data storage, appends to both, saves back, converts to ReplaceNonMerkTreeRoot with the updated combined root and CommitmentTree { total_count, chunk_power } meta
- preprocess_mmr_tree_ops — Loads the MMR from data storage, appends values, saves back, converts to ReplaceNonMerkTreeRoot with the updated MMR root and MmrTree { mmr_size } meta
- preprocess_bulk_append_ops — Loads the BulkAppendTree from data storage, appends values (may trigger chunk compaction), saves back, converts to ReplaceNonMerkTreeRoot with the updated state root and BulkAppendTree { total_count, chunk_power } meta
- preprocess_dense_tree_ops — Loads the DenseFixedSizedMerkleTree from data storage, inserts values sequentially, recomputes the root hash, saves back, converts to ReplaceNonMerkTreeRoot with the updated root hash and DenseTree { count, height } meta
The ReplaceNonMerkTreeRoot op carries the new root hash and a NonMerkTreeMeta enum
so the element can be fully reconstructed after processing.
Cost Tracking
The OperationCost Structure
Every operation in GroveDB accumulates costs, measured in computational resources:
```rust
// costs/src/lib.rs
pub struct OperationCost {
    pub seek_count: u32,            // Number of storage seeks
    pub storage_cost: StorageCost,  // Bytes added/replaced/removed
    pub storage_loaded_bytes: u64,  // Bytes read from disk
    pub hash_node_calls: u32,       // Number of Blake3 hash operations
    pub sinsemilla_hash_calls: u32, // Number of Sinsemilla hash operations (EC ops)
}
```
Sinsemilla hash calls track elliptic curve hash operations for CommitmentTree anchors. These are significantly more expensive than Blake3 node hashes.
Storage costs break down further:
```rust
// costs/src/storage_cost/mod.rs
pub struct StorageCost {
    pub added_bytes: u32,                   // New data written
    pub replaced_bytes: u32,                // Existing data overwritten
    pub removed_bytes: StorageRemovedBytes, // Data freed
}
```
The CostContext Pattern
All operations return their result wrapped in a CostContext:
```rust
pub struct CostContext<T> {
    pub value: T,            // The operation result
    pub cost: OperationCost, // Resources consumed
}

pub type CostResult<T, E> = CostContext<Result<T, E>>;
```
This creates a monadic cost-tracking pattern — costs flow through chains of operations automatically:
```rust
// Unwrap a result, adding its cost to an accumulator
let result = expensive_operation().unwrap_add_cost(&mut total_cost);

// Chain operations, accumulating costs
let final_result = op1()
    .flat_map(|x| op2(x))  // Costs from op1 + op2
    .flat_map(|y| op3(y)); // + costs from op3
```
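A minimal standalone sketch of the pattern (field names taken from the structs above; the flat_map and unwrap_add_cost behavior is assumed from the text, not copied from the crate):

```rust
// Two representative cost counters, enough to show accumulation.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct OperationCost {
    seek_count: u32,
    hash_node_calls: u32,
}

impl OperationCost {
    fn add(self, other: OperationCost) -> OperationCost {
        OperationCost {
            seek_count: self.seek_count + other.seek_count,
            hash_node_calls: self.hash_node_calls + other.hash_node_calls,
        }
    }
}

struct CostContext<T> {
    value: T,
    cost: OperationCost,
}

impl<T> CostContext<T> {
    // Chain another costed operation, summing both costs.
    fn flat_map<U>(self, f: impl FnOnce(T) -> CostContext<U>) -> CostContext<U> {
        let next = f(self.value);
        CostContext { value: next.value, cost: self.cost.add(next.cost) }
    }

    // Extract the value, folding this context's cost into an accumulator.
    fn unwrap_add_cost(self, acc: &mut OperationCost) -> T {
        *acc = acc.add(self.cost);
        self.value
    }
}

// Hypothetical costed operation for demonstration.
fn op(v: u32, seeks: u32) -> CostContext<u32> {
    CostContext { value: v, cost: OperationCost { seek_count: seeks, hash_node_calls: 1 } }
}
```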
The cost_return_on_error! Macro
The most common pattern in GroveDB code is the cost_return_on_error! macro,
which acts like ? but preserves costs on early return:
```rust
macro_rules! cost_return_on_error {
    ( &mut $cost:ident, $($body:tt)+ ) => {{
        let result_with_cost = { $($body)+ };
        let result = result_with_cost.unwrap_add_cost(&mut $cost);
        match result {
            Ok(x) => x,
            Err(e) => return Err(e).wrap_with_cost($cost),
        }
    }};
}
```
In practice:
```rust
fn insert_element(&self, path: &[&[u8]], key: &[u8], element: Element) -> CostResult<(), Error> {
    let mut cost = OperationCost::default();

    // Each macro call adds the operation's cost to `cost`
    // and returns the Ok value (or early-returns with accumulated cost on Err)
    let merk = cost_return_on_error!(&mut cost, self.open_merk(path));
    cost_return_on_error!(&mut cost, merk.insert(key, element));
    cost_return_on_error!(&mut cost, self.propagate_changes(path));

    Ok(()).wrap_with_cost(cost)
    // `cost` now contains the sum of all three operations' costs
}
```
Storage Cost Breakdown
When a value is updated, the cost depends on whether the new value is larger, smaller, or the same size:
graph LR
subgraph case1["CASE 1: Same Size (old=100, new=100)"]
c1_old["old: 100B"]
c1_new["new: 100B"]
c1_cost["replaced_bytes += 100"]
end
subgraph case2["CASE 2: Growing (old=100, new=120)"]
c2_old["old: 100B"]
c2_new["new: 120B"]
c2_replaced["replaced: 100B"]
c2_added["added: +20B"]
c2_cost["replaced_bytes += 100<br/>added_bytes += 20"]
end
subgraph case3["CASE 3: Shrinking (old=100, new=70)"]
c3_old["old: 100B"]
c3_new["new: 70B"]
c3_replaced["replaced: 70B"]
c3_removed["removed: 30B"]
c3_cost["replaced_bytes += 70<br/>removed_bytes += 30"]
end
style case1 fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
style case2 fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style c2_added fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style case3 fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
style c3_removed fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
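The three cases reduce to a simple split of the update into replaced, added, and removed bytes. A sketch with the semantics assumed from the diagram (helper name hypothetical):

```rust
// Split an update of old_size -> new_size bytes into
// (replaced_bytes, added_bytes, removed_bytes), per the three cases.
fn update_cost(old_size: u32, new_size: u32) -> (u32, u32, u32) {
    if new_size >= old_size {
        // Same size or growing: every old byte is overwritten,
        // the excess counts as newly added.
        (old_size, new_size - old_size, 0)
    } else {
        // Shrinking: only new_size bytes are overwritten,
        // the remainder of the old value is freed.
        (new_size, 0, old_size - new_size)
    }
}
```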
Hash Operation Costs
Hash costs are measured in "hash node calls" — the number of Blake3 block compressions:
| Operation | Input Size | Hash Calls |
|---|---|---|
| value_hash(small) | < 64 bytes | 1 |
| value_hash(medium) | 64-127 bytes | 2 |
| kv_hash | key + value_hash | varies |
| node_hash | 96 bytes (3 × 32) | 2 (always) |
| combine_hash | 64 bytes (2 × 32) | 1 (always) |
| node_hash_with_count | 104 bytes (3 × 32 + 8) | 2 (always) |
| Sinsemilla (CommitmentTree) | Pallas curve EC op | tracked separately via sinsemilla_hash_calls |
The general formula for Blake3, using integer division:
hash_calls = 1 + (input_bytes - 1) / 64
Worst-Case and Average-Case Estimation
GroveDB provides functions to estimate operation costs before executing them. This is crucial for blockchain fee calculation — you need to know the cost before committing to pay for it.
```rust
// Worst-case cost for reading a node
pub fn add_worst_case_get_merk_node(
    cost: &mut OperationCost,
    not_prefixed_key_len: u32,
    max_element_size: u32,
    node_type: NodeType,
) {
    cost.seek_count += 1; // One disk seek
    cost.storage_loaded_bytes += TreeNode::worst_case_encoded_tree_size(
        not_prefixed_key_len,
        max_element_size,
        node_type,
    ) as u64;
}

// Worst-case propagation cost
pub fn add_worst_case_merk_propagate(
    cost: &mut OperationCost,
    input: &WorstCaseLayerInformation,
) {
    let levels = match input {
        MaxElementsNumber(n) => ((*n + 1) as f32).log2().ceil() as u32,
        NumberOfLevels(n) => *n,
    };
    let mut nodes_updated = levels;
    // AVL rotations may update additional nodes
    if levels > 2 {
        nodes_updated += 2; // At most 2 extra nodes for rotations
    }
    cost.storage_cost.replaced_bytes += nodes_updated * MERK_BIGGEST_VALUE_SIZE;
    cost.storage_loaded_bytes +=
        nodes_updated as u64 * (MERK_BIGGEST_VALUE_SIZE + MERK_BIGGEST_KEY_SIZE) as u64;
    cost.seek_count += nodes_updated;
    cost.hash_node_calls += nodes_updated * 2;
}
```
Constants used:
```rust
pub const MERK_BIGGEST_VALUE_SIZE: u32 = u16::MAX as u32; // 65535
pub const MERK_BIGGEST_KEY_SIZE: u32 = 256;
```
The MMR Tree — Append-Only Authenticated Logs
The MmrTree is GroveDB's append-only authenticated data structure, built on a Merkle Mountain Range (MMR) with Blake3 hashing. While Merk AVL trees (Chapter 2) excel at random key-value operations with O(log N) updates, MMRs are purpose-built for the append-only case: they offer zero rotations, O(1) amortized hash cost per append, and sequential I/O patterns.
This chapter covers the MMR data structure in depth — how it grows, how nodes are stored, how appends cascade, and how the proof system lets any third party verify that a specific value was appended at a specific position.
Why a Separate Tree Type?
GroveDB's standard Merk trees handle ordered key-value data well, but append-only logs have different requirements:
| Property | Merk AVL Tree | MMR |
|---|---|---|
| Operations | Insert, update, delete | Append only |
| Rebalancing | O(log N) rotations per write | None |
| I/O pattern | Random (rebalancing touches many nodes) | Sequential (new nodes always at end) |
| Total hashes for N inserts | O(N log N) | O(N) |
| Structure | Determined by insertion order | Determined only by leaf count |
| Proofs | Path from root to leaf | Sibling + peak hashes |
For use cases like transaction logs, event streams, or any monotonically growing data, the MMR is strictly better: simpler, faster, and more predictable.
The MMR Data Structure
An MMR is a forest of perfect binary trees (called "peaks") that grows left to right. Each peak is a complete binary tree of some height h, containing exactly 2^h leaves.
The key insight: the binary representation of the leaf count determines the peak structure. Each 1-bit in the binary form corresponds to one peak:
Leaf count Binary Peaks
───────── ────── ─────
1 1 one peak h=0
2 10 one peak h=1
3 11 peaks h=1, h=0
4 100 one peak h=2
5 101 peaks h=2, h=0
6 110 peaks h=2, h=1
7 111 peaks h=2, h=1, h=0
8 1000 one peak h=3
This means the MMR structure is fully determined by a single number — the leaf count. Two MMRs with the same number of leaves always have the same shape, regardless of what values were appended.
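The peak decomposition is just the set bits of the leaf count, read from high to low. A small illustration of the table above (helper name hypothetical, not library code):

```rust
// Peak heights for a given leaf count: one peak per 1-bit,
// highest bit (tallest peak, leftmost) first.
fn peak_heights(leaf_count: u64) -> Vec<u32> {
    (0..64u32)
        .rev()
        .filter(|&h| leaf_count & (1u64 << h) != 0)
        .collect()
}
```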
How the MMR Fills Up
Each node in the MMR has a position (0-indexed). Leaves and internal nodes are interleaved in a specific pattern. Here is step-by-step growth:
After 1 leaf (mmr_size = 1):
pos: 0
leaf₀ ← one peak at height 0
After 2 leaves (mmr_size = 3):
pos: 2 ← internal: blake3(leaf₀.hash || leaf₁.hash)
/ \
0 1 ← leaves
One peak at height 1. Positions 0 and 1 are leaves, position 2 is internal.
When leaf₁ was appended, it created a height-0 peak. But there was already a height-0 peak (leaf₀), so they merged into a height-1 peak.
After 3 leaves (mmr_size = 4):
pos: 2 3 ← peak h=1, peak h=0
/ \
0 1
Two peaks. No merge — heights 1 and 0 are different.
(Binary: 3 = 11₂ → one peak per 1-bit)
After 4 leaves (mmr_size = 7):
pos: 6 ← internal: merge of nodes 2 and 5
/ \
2 5 ← internal nodes
/ \ / \
0 1 3 4 ← leaves
One peak at height 2.
This is where it gets interesting. Appending leaf₃ (position 4) creates node₅ (merging positions 3 and 4). But now node₅ (height 1) and node₂ (height 1) are equal-height adjacent peaks, so they merge into node₆. A cascade of two merges from a single append.
After 5 leaves (mmr_size = 8):
pos: 6 7 ← peak h=2, peak h=0
/ \
2 5
/ \ / \
0 1 3 4
Two peaks. (Binary: 5 = 101₂)
After 7 leaves (mmr_size = 11):
pos:       6          9    10   ← peak h=2, peak h=1, peak h=0
         /   \       / \
        2     5     7   8
       / \   / \
      0   1 3   4
Three peaks. (Binary: 7 = 111₂)
After 8 leaves (mmr_size = 15):
pos: 14 ← single peak h=3
/ \
6 13
/ \ / \
2 5 9 12
/ \ / \ / \ / \
0 1 3 4 7 8 10 11
One peak at height 3. Three cascading merges from appending leaf₇.
graph TD
subgraph mmr5["MMR with 5 leaves (mmr_size = 8)"]
pos6["pos 6<br/>H(2,5)<br/><b>peak h=2</b>"]
pos2["pos 2<br/>H(0,1)"]
pos5["pos 5<br/>H(3,4)"]
pos0["pos 0<br/>leaf₀"]
pos1["pos 1<br/>leaf₁"]
pos3["pos 3<br/>leaf₂"]
pos4["pos 4<br/>leaf₃"]
pos7["pos 7<br/>leaf₄<br/><b>peak h=0</b>"]
pos6 --> pos2
pos6 --> pos5
pos2 --> pos0
pos2 --> pos1
pos5 --> pos3
pos5 --> pos4
end
style pos6 fill:#d4e6f1,stroke:#2980b9,stroke-width:3px
style pos7 fill:#d4e6f1,stroke:#2980b9,stroke-width:3px
style pos0 fill:#d5f5e3,stroke:#27ae60
style pos1 fill:#d5f5e3,stroke:#27ae60
style pos3 fill:#d5f5e3,stroke:#27ae60
style pos4 fill:#d5f5e3,stroke:#27ae60
style pos7 fill:#d5f5e3,stroke:#27ae60
Blue = peaks (roots of perfect binary subtrees). Green = leaf nodes.
The Merge Cascade
When a new leaf is appended, it may trigger a chain of merges. The number of merges equals the number of trailing 1-bits in the binary representation of the current leaf count:
| Leaf count (before push) | Binary | trailing 1s | Merges | Total hashes |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 (leaf only) |
| 1 | 1 | 1 | 1 | 2 |
| 2 | 10 | 0 | 0 | 1 |
| 3 | 11 | 2 | 2 | 3 |
| 4 | 100 | 0 | 0 | 1 |
| 5 | 101 | 1 | 1 | 2 |
| 6 | 110 | 0 | 0 | 1 |
| 7 | 111 | 3 | 3 | 4 |
Total hashes per push = 1 + trailing_ones(leaf_count):
- 1 hash for the leaf itself: blake3(value)
- N hashes for the merge cascade: blake3(left.hash || right.hash) for each merge
This is how GroveDB tracks hash costs for each append. The implementation:
```rust
pub fn hash_count_for_push(leaf_count: u64) -> u32 {
    1 + leaf_count.trailing_ones()
}
```
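A usage sketch tying this back to the cascade table: each push hashes every node it creates exactly once, so summing the per-push counts over n pushes should reproduce the total node count, mmr_size = 2n - popcount(n) (the summing helper here is illustrative, not part of the crate):

```rust
fn hash_count_for_push(leaf_count: u64) -> u32 {
    1 + leaf_count.trailing_ones()
}

// Total hashes across pushes 0..n — equals the number of nodes created,
// i.e. the final mmr_size.
fn total_hashes_for_n_pushes(n: u64) -> u64 {
    (0..n).map(|k| hash_count_for_push(k) as u64).sum()
}
```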
MMR Size vs Leaf Count
The MMR stores both leaves and internal nodes in a flat position space, so
mmr_size is always larger than the leaf count. The exact relationship is:
mmr_size = 2 * leaf_count - popcount(leaf_count)
where popcount is the number of 1-bits (i.e., the number of peaks). Each
internal node merges two subtrees, reducing the node count by one per merge.
The reverse computation — leaf count from mmr_size — uses the peak positions:
```rust
fn mmr_size_to_leaf_count(mmr_size: u64) -> u64 {
    // Each peak at height h contains 2^h leaves
    get_peaks(mmr_size)
        .iter()
        .map(|&peak_pos| 1u64 << pos_height_in_tree(peak_pos))
        .sum()
}
```
| mmr_size | leaf_count | peaks |
|---|---|---|
| 0 | 0 | (empty) |
| 1 | 1 | h=0 |
| 3 | 2 | h=1 |
| 4 | 3 | h=1, h=0 |
| 7 | 4 | h=2 |
| 8 | 5 | h=2, h=0 |
| 10 | 6 | h=2, h=1 |
| 11 | 7 | h=2, h=1, h=0 |
| 15 | 8 | h=3 |
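The forward direction of the relationship can be checked directly against this table (an illustrative helper; the real reverse computation goes through the ckb peak functions shown above):

```rust
// mmr_size = 2 * leaf_count - popcount(leaf_count), from the text.
// Each of the leaf_count-1 merges removes one "node slot" relative to
// a perfect doubling, leaving one slot per peak (per 1-bit).
fn leaf_count_to_mmr_size(leaf_count: u64) -> u64 {
    2 * leaf_count - leaf_count.count_ones() as u64
}
```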
GroveDB stores mmr_size in the Element (not leaf count) because the ckb MMR
library uses positions internally. The mmr_tree_leaf_count operation derives
the leaf count on the fly.
MMR Root Hash — Bagging the Peaks
An MMR has multiple peaks (one per 1-bit in the leaf count). To produce a single 32-byte root hash, the peaks are "bagged" right-to-left:
root = bag_rhs_peaks(peaks):
start with rightmost peak
fold leftward: blake3(left_peak || accumulated_right)
With 1 peak, the root is just that peak's hash. With 3 peaks:
graph LR
subgraph bag["Bagging 3 peaks (7 leaves = 111₂)"]
P0["peak h=2<br/>(4 leaves)"]
P1["peak h=1<br/>(2 leaves)"]
P2["peak h=0<br/>(1 leaf)"]
B1["blake3(P1 || P2)"]
P1 --> B1
P2 --> B1
B0["blake3(P0 || B1)<br/><b>= MMR root</b>"]
P0 --> B0
B1 --> B0
end
style B0 fill:#d4e6f1,stroke:#2980b9,stroke-width:3px
style bag fill:#fef9e7,stroke:#f39c12,stroke-width:2px
The root hash changes with every append, even when no merges occur, because the rightmost peak changes and bagging must be recomputed.
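The fold itself can be sketched with a pluggable merge function (GroveDB's merge is blake3(left || right); a generic closure is used here so the example needs no hash crate — string concatenation stands in for hashing):

```rust
// Bag peaks right-to-left: start from the rightmost peak, then fold
// leftward with merge(left_peak, accumulated_right). Returns None for
// an empty MMR.
fn bag_rhs_peaks<T: Clone>(peaks: &[T], merge: impl Fn(&T, &T) -> T) -> Option<T> {
    let mut iter = peaks.iter().rev();
    let first = iter.next()?.clone();
    Some(iter.fold(first, |acc, left| merge(left, &acc)))
}
```

With a single peak the fold never runs, so the root is that peak's hash unchanged, matching the text.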
Node Structure and Serialization
Each MMR node is an MmrNode:
```rust
struct MmrNode {
    hash: [u8; 32],         // Blake3 hash
    value: Option<Vec<u8>>, // Some for leaves, None for internal nodes
}
```
Leaf node: hash = blake3(value_bytes), value = Some(value_bytes)
Internal node: hash = blake3(left.hash || right.hash), value = None
The merge function is straightforward — concatenate two 32-byte hashes and Blake3 the result:
```rust
fn blake3_merge(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut input = [0u8; 64];
    input[..32].copy_from_slice(left);
    input[32..].copy_from_slice(right);
    *blake3::hash(&input).as_bytes()
}
```
Note on PartialEq: MmrNode implements PartialEq by comparing only the hash field, not the value. This is critical for proof verification: the ckb verifier compares a reconstructed root (value = None) against the expected root. If PartialEq compared the value field, single-leaf MMR proofs would always fail because the leaf has value: Some(...) but the root reconstruction produces value: None.
Serialization format:
Internal: [0x00] [hash: 32 bytes] = 33 bytes
Leaf: [0x01] [hash: 32 bytes] [value_len: 4 BE] [value...] = 37 + len bytes
The flag byte distinguishes internal nodes from leaves. Deserialization validates exact length — no trailing bytes allowed.
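An encoder sketch for this layout (hypothetical helper name; the byte counts match the format above — 33 bytes for internal nodes, 37 + value length for leaves):

```rust
// Serialize an MMR node per the stated layout:
//   Internal: [0x00] [hash: 32]
//   Leaf:     [0x01] [hash: 32] [value_len: 4 BE] [value...]
fn serialize_node(hash: &[u8; 32], value: Option<&[u8]>) -> Vec<u8> {
    let mut out = Vec::new();
    match value {
        None => {
            out.push(0x00); // internal node flag
            out.extend_from_slice(hash);
        }
        Some(v) => {
            out.push(0x01); // leaf flag
            out.extend_from_slice(hash);
            out.extend_from_slice(&(v.len() as u32).to_be_bytes());
            out.extend_from_slice(v);
        }
    }
    out
}
```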
Storage Architecture
MmrTree stores its nodes in the data column (the same column family used
by Merk nodes), not in a child Merk subtree. The Element has no root_key
field — the MMR root hash flows as the Merk child hash via
insert_subtree(subtree_root_hash), authenticating the MMR state.
Storage keys are position-based:
key = 'm' || position_as_be_u64 (9 bytes: prefix + u64 BE)
So position 42 is stored at key [0x6D, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2A].
Looking up leaf i requires computing the MMR position first:
pos = leaf_index_to_pos(i), then reading data key m{pos}.
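The key layout, together with the leaf-index-to-position identity pos = 2i - popcount(i) (which matches the position diagrams earlier in this chapter), can be sketched as:

```rust
// Data-column key for the MMR node at `position`: 'm' prefix + BE u64.
fn mmr_node_key(position: u64) -> [u8; 9] {
    let mut key = [0u8; 9];
    key[0] = b'm'; // 0x6D
    key[1..].copy_from_slice(&position.to_be_bytes());
    key
}

// Flat position of leaf i: a standard MMR identity, pos = 2i - popcount(i).
fn leaf_index_to_pos(i: u64) -> u64 {
    2 * i - i.count_ones() as u64
}
```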
Write-through cache: During appends, newly written nodes must be
immediately readable for subsequent merges in the same push. Because GroveDB's
transactional storage defers writes to a batch (they aren't visible to reads
until commit), an MmrStore adapter wraps the storage context with an
in-memory HashMap cache:
graph LR
subgraph store["MmrStore"]
read["get_elem(pos)"] --> cache_check{"In cache?"}
cache_check -->|"Yes"| cache_hit["Return cached node"]
cache_check -->|"No"| data_read["Read from data storage"]
write["append(pos, nodes)"] --> data_write["Write to data storage"]
data_write --> cache_write["Also insert into cache"]
end
style store fill:#fef9e7,stroke:#f39c12,stroke-width:2px
This ensures that when appending leaf₃ triggers a merge cascade (creating internal nodes at positions 5 and 6), node₅ is immediately available when computing node₆, even though node₅ hasn't been committed to RocksDB yet.
Root hash propagation to the GroveDB state root:
combined_value_hash = blake3(
blake3(varint(len) || element_bytes), ← value_hash from serialized Element
mmr_root_hash ← child_hash = type-specific root
)
The MMR root hash flows as the Merk child hash via insert_subtree. Any change
to the MMR state produces a different combined_value_hash, which propagates up
through the parent Merk hierarchy all the way to the GroveDB state root.
GroveDB Operations
MmrTree provides four operations:
```rust
// Append a value — returns (new_mmr_root, leaf_index)
db.mmr_tree_append(path, key, value, tx, version)

// Read the current root hash (from Element, no storage access)
db.mmr_tree_root_hash(path, key, tx, version)

// Get a leaf value by 0-based index
db.mmr_tree_get_value(path, key, leaf_index, tx, version)

// Get the number of leaves appended
db.mmr_tree_leaf_count(path, key, tx, version)
```
Append Flow
The append operation is the most complex, performing 8 steps:
graph TD
subgraph append["mmr_tree_append(path, key, value, tx)"]
A1["1. Read Element at path/key<br/>→ get mmr_size, flags"]
A2["2. Open data storage at<br/>path/key subtree"]
A3["3. Create MmrNode::leaf(value)<br/>hash = blake3(value)"]
A4["4. Push leaf into ckb MMR<br/>→ cascading merges write<br/>new internal nodes"]
A5["5. Commit → flush new<br/>nodes to storage"]
A6["6. Bag peaks → new mmr_root"]
A7["7. Update Element with<br/>new mmr_root and mmr_size"]
A8["8. Propagate changes up<br/>through GroveDB Merk<br/>hierarchy + commit tx"]
A1 --> A2 --> A3 --> A4 --> A5 --> A6 --> A7 --> A8
end
style append fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style A6 fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
Step 4 may write 1 node (leaf only) or 1 + N nodes (leaf + N internal merge
nodes). Step 5 calls mmr.commit() which flushes the ckb MemStore to the
MmrStore. Step 7 calls insert_subtree with the new MMR root as the child hash
(via subtree_root_hash), since MmrTree has no child Merk.
Read Operations
mmr_tree_root_hash computes the root from MMR data in storage.
mmr_tree_leaf_count derives the leaf count from mmr_size in the Element.
No data storage access needed.
mmr_tree_get_value computes pos = leaf_index_to_pos(leaf_index), reads
the single data storage entry at m{pos}, deserializes the MmrNode, and
returns node.value.
Batch Operations
Multiple MMR appends can be batched using GroveOp::MmrTreeAppend { value }.
Because the standard batch execute_ops_on_path function only has access to
the Merk (not the MMR's storage context), MMR appends use a preprocessing phase:
graph TD
subgraph batch["Batch Processing"]
subgraph pre["1. PREPROCESSING (preprocess_mmr_tree_ops)"]
P1["Group MmrTreeAppend ops<br/>by (path, key)"]
P2["For each group:<br/>load MMR from data storage"]
P3["Push ALL values<br/>into the MMR"]
P4["Save updated nodes<br/>back to data storage"]
P5["Replace group with single<br/>ReplaceNonMerkTreeRoot op<br/>carrying new mmr_root<br/>and mmr_size"]
P1 --> P2 --> P3 --> P4 --> P5
end
subgraph body["2. STANDARD apply_body"]
B1["execute_ops_on_path sees<br/>ReplaceNonMerkTreeRoot<br/>(standard Merk update)"]
B2["Propagate root hash<br/>upward through grove"]
B1 --> B2
end
P5 --> B1
end
style pre fill:#fef9e7,stroke:#f39c12,stroke-width:2px
style body fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
Example: A batch with 3 appends to the same MMR:
```rust
vec![
    QualifiedGroveDbOp { path: p, key: k, op: MmrTreeAppend { value: v1 } },
    QualifiedGroveDbOp { path: p, key: k, op: MmrTreeAppend { value: v2 } },
    QualifiedGroveDbOp { path: p, key: k, op: MmrTreeAppend { value: v3 } },
]
```
Preprocessing loads the MMR once, pushes v1, v2, v3 (creating all intermediate
nodes), saves everything to data storage, then emits a single ReplaceNonMerkTreeRoot
with the final mmr_root and mmr_size. The standard batch machinery handles
the rest.
Proof Generation
MMR proofs are V1 proofs — they use the ProofBytes::MMR variant in the
layered proof structure (see §9.6). The proof demonstrates that specific leaf
values exist at specific positions within the MMR, and that their hashes are
consistent with the mmr_root stored in the parent element.
Query Encoding
Query keys encode positions as big-endian u64 bytes. This preserves
lexicographic sort order (since BE encoding is monotonic), allowing all standard
QueryItem variants to work:
QueryItem::Key([0,0,0,0,0,0,0,5]) → leaf index 5
QueryItem::RangeInclusive([..2]..=[..7]) → leaf indices [2, 3, 4, 5, 6, 7]
QueryItem::RangeFrom([..10]..) → leaf indices [10, 11, ..., N-1]
QueryItem::RangeFull → all leaves [0..leaf_count)
A safety cap of 10,000,000 indices prevents memory exhaustion from unbounded range queries. An empty MMR (zero leaves) returns an empty proof.
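The reason BE encoding "just works" with range QueryItems is that fixed-width big-endian bytes compare lexicographically in the same order as the numbers they encode. A small illustration (helper names hypothetical):

```rust
// Encode a leaf index as a fixed-width big-endian key.
fn encode_leaf_index(i: u64) -> [u8; 8] {
    i.to_be_bytes()
}

// Decode, rejecting keys that are not exactly 8 bytes.
fn decode_leaf_index(key: &[u8]) -> Option<u64> {
    Some(u64::from_be_bytes(key.try_into().ok()?))
}
```

Note that variable-width encodings would break this: as strings, "10" sorts before "2", but the 8-byte BE forms sort numerically.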
The MmrTreeProof Structure
```rust
struct MmrTreeProof {
    mmr_size: u64,               // MMR size at proof time
    leaves: Vec<(u64, Vec<u8>)>, // (leaf_index, value) for each proved leaf
    proof_items: Vec<[u8; 32]>,  // Sibling/peak hashes for verification
}
```
The proof_items contain the minimal set of hashes needed to reconstruct
paths from the proved leaves up to the MMR root. These are the sibling nodes
at each level and the uninvolved peak hashes.
Generation Flow
graph TD
subgraph gen["generate_mmr_layer_proof"]
G1["1. Get subquery items<br/>from PathQuery"]
G2["2. Decode BE u64 keys<br/>→ leaf indices"]
G3["3. Open data storage<br/>at subtree path"]
G4["4. Load all MMR nodes<br/>into read-only MemNodeStore"]
G5["5. Call ckb gen_proof(positions)<br/>→ MerkleProof"]
G6["6. Read each proved leaf's<br/>full value from storage"]
G7["7. Extract proof_items<br/>as [u8; 32] hashes"]
G8["8. Encode MmrTreeProof<br/>→ ProofBytes::MMR(bytes)"]
G1 --> G2 --> G3 --> G4 --> G5 --> G6 --> G7 --> G8
end
style gen fill:#fef9e7,stroke:#f39c12,stroke-width:2px
Step 4 uses a MemNodeStore — a read-only BTreeMap that pre-loads all MMR
nodes from data storage. The ckb proof generator needs random access, so all
nodes must be in memory.
Step 5 is where the ckb library does the heavy lifting: given the MMR size and the positions to prove, it determines which sibling and peak hashes are needed.
Worked Example
Proving leaf 2 in a 5-leaf MMR (mmr_size = 8):
MMR structure:
pos: 6 7
/ \
2 5
/ \ / \
0 1 3 4
Leaf index 2 → MMR position 3
To verify leaf at position 3:
1. Hash the claimed value: leaf_hash = blake3(value)
2. Sibling at position 4: node₅ = blake3(leaf_hash || proof[pos 4])
3. Sibling at position 2: node₆ = blake3(proof[pos 2] || node₅)
4. Peak at position 7: root = bag(node₆, proof[pos 7])
5. Compare: root == expected mmr_root ✓
proof_items = [hash(pos 4), hash(pos 2), hash(pos 7)]
leaves = [(2, original_value_bytes)]
The proof size for this example is: 3 hashes (96 bytes) + 1 leaf value + metadata. In general, proving K leaves from an N-leaf MMR requires O(K * log N) sibling hashes.
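The leaf-index-to-position mapping used in the example above has a standard MMR closed form, pos = 2i − popcount(i). The helper below is illustrative only, not the actual grovedb-mmr API:

```rust
/// Map a 0-based leaf index to its MMR position.
/// Standard closed form: pos = 2*i - popcount(i).
/// (Hypothetical helper for illustration; the real code relies on the
/// ckb merkle-mountain-range crate's position utilities.)
fn leaf_index_to_pos(i: u64) -> u64 {
    2 * i - u64::from(i.count_ones())
}

fn main() {
    // Matches the worked example: leaf index 2 sits at MMR position 3.
    assert_eq!(leaf_index_to_pos(2), 3);
    // The five leaves of the 5-leaf MMR above land at positions 0, 1, 3, 4, 7.
    let positions: Vec<u64> = (0..5).map(leaf_index_to_pos).collect();
    assert_eq!(positions, vec![0, 1, 3, 4, 7]);
    println!("{:?}", positions);
}
```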
Proof Verification
Verification is pure — it requires no database access. The verifier needs only the proof bytes and the expected MMR root hash (which it extracts from the parent element proven in the Merk layer above).
Verification Steps
graph TD
subgraph verify["verify_mmr_lower_layer"]
V1["1. Deserialize MmrTreeProof<br/>(bincode, 100MB limit)"]
V2["2. Cross-validate:<br/>proof.mmr_size == element.mmr_size"]
V3["3. For each proved leaf:<br/>reconstruct MmrNode::leaf(value)<br/>hash = blake3(value)"]
V4["4. Reconstruct proof_items<br/>as MmrNode::internal(hash)"]
V5["5. Build ckb MerkleProof<br/>(mmr_size, proof_nodes)"]
V6["6. Call proof.verify(<br/>root = MmrNode::internal(mmr_root),<br/>leaves = [(pos, leaf_node), ...]<br/>)"]
V7["7. Return verified<br/>(leaf_index, value) pairs"]
V1 --> V2 --> V3 --> V4 --> V5 --> V6 --> V7
end
V6 -->|"root mismatch"| FAIL["Err: InvalidProof<br/>'MMR proof root hash mismatch'"]
V6 -->|"root matches"| V7
style verify fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style FAIL fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
The ckb MerkleProof::verify function reconstructs the root from the leaves
and proof items, then compares it (using PartialEq, which checks only the
hash) against the expected root.
Chain of Trust
The full chain from GroveDB state root to a verified leaf value:
GroveDB state_root (known/trusted)
│
├─ V0 Merk proof layer 0: proves subtree exists at root
│ └─ root_hash matches state_root ✓
│
├─ V0 Merk proof layer 1: proves MmrTree element at path/key
│ └─ KVValueHash node: element_bytes contain mmr_root
│ └─ combined_hash = combine_hash(H(element_bytes), mmr_root)
│ └─ root_hash matches parent layer ✓
│
└─ V1 MMR proof: proves leaf values are in the MMR
└─ Reconstruct paths from leaves through siblings to peaks
└─ Bag peaks → reconstructed root
└─ reconstructed root == mmr_root from element_bytes ✓
└─ Result: leaf₂ = [verified value bytes]
Security Properties
- mmr_size cross-validation: The proof's mmr_size must match the element's mmr_size. A mismatch indicates the proof was generated against a different state and is rejected.
- Bincode size limit: Deserialization uses a 100MB limit to prevent crafted length headers from causing huge allocations.
- Limit accounting: Each proved leaf decrements the overall query limit by 1, using saturating_sub to prevent underflow.
- Child hash return: The verifier returns the computed MMR root as the child hash for the combine_hash computation in the parent layer.
- V0 rejection: Attempting a subquery into an MmrTree with V0 proofs returns Error::NotSupported. Only V1 proofs can descend into non-Merk trees.
Cost Tracking
MMR operations track costs with precision:
| Operation | Hash calls | Storage operations |
|---|---|---|
| Append 1 leaf | 1 + trailing_ones(leaf_count) | 1 leaf write + N internal writes |
| Root hash | 0 (cached in Element) | 1 Element read |
| Get value | 0 | 1 Element read + 1 data read |
| Leaf count | 0 | 1 Element read |
The hash count formula 1 + trailing_ones(N) gives the exact number of Blake3
calls: 1 for the leaf hash, plus one merge hash per cascade level.
Amortized analysis: Over N appends, the total hash count is:
Σ (1 + trailing_ones(i)) for i = 0..N-1
= N + Σ trailing_ones(i) for i = 0..N-1
= N + (N - popcount(N))
≈ 2N
So the amortized cost per append is approximately 2 Blake3 hash calls — constant and independent of the tree size. Compare this to Merk AVL trees where each insert requires O(log N) hashes for the path plus potential rotation hashes.
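The identity behind this derivation, Σ trailing_ones(i) for i = 0..N−1 equals N − popcount(N), is easy to verify numerically. A quick sketch:

```rust
/// Hashes for the i-th append (0-based): 1 leaf hash plus one merge
/// per trailing 1-bit of the previous leaf count (the cascade).
fn hashes_for_append(i: u64) -> u64 {
    1 + u64::from(i.trailing_ones())
}

fn main() {
    // Check the closed form N + (N - popcount(N)) against a direct sum.
    for n in [1u64, 7, 64, 1000] {
        let total: u64 = (0..n).map(hashes_for_append).sum();
        assert_eq!(total, n + (n - u64::from(n.count_ones())));
        // The amortized cost stays below 2 hashes per append.
        assert!(total < 2 * n);
    }
    println!("identity holds");
}
```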
Storage cost: Each append writes 1 leaf node (37 + value_len bytes) plus 0 to log₂(N) internal nodes (33 bytes each). The amortized storage write per append is approximately 33 + 37 + value_len bytes ≈ 70 + value_len bytes.
Implementation Files
| File | Purpose |
|---|---|
| grovedb-mmr/src/node.rs | MmrNode struct, Blake3 merge, serialization |
| grovedb-mmr/src/grove_mmr.rs | GroveMmr wrapper around ckb MMR |
| grovedb-mmr/src/util.rs | mmr_node_key, hash_count_for_push, mmr_size_to_leaf_count |
| grovedb-mmr/src/proof.rs | MmrTreeProof generation and verification |
| grovedb-mmr/src/dense_merkle.rs | Dense Merkle tree roots (used by BulkAppendTree) |
| grovedb/src/operations/mmr_tree.rs | GroveDB operations + MmrStore adapter + batch preprocessing |
| grovedb/src/operations/proof/generate.rs | V1 proof generation: generate_mmr_layer_proof, query_items_to_leaf_indices |
| grovedb/src/operations/proof/verify.rs | V1 proof verification: verify_mmr_lower_layer |
| grovedb/src/tests/mmr_tree_tests.rs | 28 integration tests |
Comparison with Other Authenticated Structures
| | MMR (MmrTree) | Merk AVL (Tree) | Sinsemilla (CommitmentTree) |
|---|---|---|---|
| Use case | Append-only logs | Key-value store | ZK-friendly commitments |
| Hash function | Blake3 | Blake3 | Sinsemilla (Pallas curve) |
| Operations | Append, read by index | Insert, update, delete, query | Append, witness |
| Amortized hash/write | ~2 | O(log N) | ~33 (32 levels + ommers) |
| Proof type | V1 (MMR sibling hashes) | V0 (Merk path proof) | Witness (Merkle auth path) |
| ZK-friendly | No | No | Yes (Halo 2 circuits) |
| Rebalancing | None | AVL rotations | None |
| Delete support | No | Yes | No |
The BulkAppendTree — High-Throughput Append-Only Storage
The BulkAppendTree is GroveDB's answer to a specific engineering challenge: how do you build a high-throughput append-only log that supports efficient range proofs, minimises per-write hashing, and produces immutable chunk snapshots suitable for CDN distribution?
While an MmrTree (Chapter 13) is ideal for individual leaf proofs, the BulkAppendTree is designed for workloads where thousands of values arrive per block and clients need to sync by fetching ranges of data. It achieves this with a two-level architecture: a dense Merkle tree buffer that absorbs incoming appends, and a chunk-level MMR that records finalized chunk roots.
The Two-Level Architecture
┌────────────────────────────────────────────────────────────────┐
│ BulkAppendTree │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Chunk MMR │ │
│ │ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │ │
│ │ │ R0 │ │ R1 │ │ R2 │ │ H │ ← Dense Merkle roots │ │
│ │ └────┘ └────┘ └────┘ └────┘ of each chunk blob │ │
│ │ peak hashes bagged together = MMR root│ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Buffer (DenseFixedSizedMerkleTree, capacity = 2^h - 1) │ │
│ │ ┌───┐ ┌───┐ ┌───┐ │ │
│ │ │v_0│ │v_1│ │v_2│ ... (fills in level-order) │ │
│ │ └───┘ └───┘ └───┘ │ │
│ │ dense_tree_root = recomputed root hash of dense tree │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ state_root = blake3("bulk_state" || mmr_root || dense_tree_root) │
│ │
└────────────────────────────────────────────────────────────────┘
Level 1 — The Buffer. Incoming values are written to a DenseFixedSizedMerkleTree
(see Chapter 16). The buffer capacity is 2^height - 1 positions. The dense tree's
root hash (dense_tree_root) updates after every insert.
Level 2 — The Chunk MMR. When the buffer fills (reaches chunk_size entries),
all entries are serialized into an immutable chunk blob, a dense Merkle root is
computed over those entries, and that root is appended as a leaf to the chunk MMR.
The buffer is then cleared.
The state root combines both levels into a single 32-byte commitment that changes on every append, ensuring the parent Merk tree always reflects the latest state.
How Values Fill the Buffer
Each call to append() follows this sequence:
Step 1: Write value to dense tree buffer at next position
dense_tree.insert(value, store)
Step 2: Increment total_count
total_count += 1
Step 3: Check if buffer is full (dense tree at capacity)
if dense_tree.count() == capacity:
→ trigger compaction (§14.3)
Step 4: Compute new state root (+1 blake3 call)
dense_tree_root = dense_tree.root_hash(store)
state_root = blake3("bulk_state" || mmr_root || dense_tree_root)
The buffer IS a DenseFixedSizedMerkleTree (see Chapter 16). Its root hash changes after every insert, providing a commitment to all current buffer entries. This root hash is what flows into the state root computation.
Chunk Compaction
When the buffer fills (reaches chunk_size entries), compaction fires automatically:
Compaction Steps:
─────────────────
1. Read all chunk_size buffer entries
2. Compute dense Merkle root
- Hash each entry: leaf[i] = blake3(entry[i])
- Build complete binary tree bottom-up
- Extract root hash
Hash cost: chunk_size + (chunk_size - 1) = 2 * chunk_size - 1
3. Serialize entries into chunk blob
- Auto-selects fixed-size or variable-size format (§14.6)
- Store as: store.put(chunk_key(chunk_index), blob)
4. Append dense Merkle root to chunk MMR
- MMR push with merge cascade (see Chapter 13)
Hash cost: ~2 amortized (trailing_ones pattern)
5. Reset the dense tree (clear all buffer entries from storage)
- Dense tree count reset to 0
After compaction, the chunk blob is permanently immutable — it never changes again. This makes chunk blobs ideal for CDN caching, client sync, and archival storage.
Example: 4 appends with chunk_power=2 (chunk_size=4)
Append v_0: dense_tree=[v_0], dense_root=H(v_0), total=1
Append v_1: dense_tree=[v_0,v_1], dense_root=H(v_0,v_1), total=2
Append v_2: dense_tree=[v_0..v_2], dense_root=H(v_0..v_2), total=3
Append v_3: dense_tree=[v_0..v_3], dense_root=H(v_0..v_3), total=4
→ COMPACTION:
chunk_blob_0 = serialize([v_0, v_1, v_2, v_3])
dense_root_0 = dense_merkle_root(v_0..v_3)
mmr.push(dense_root_0)
dense tree cleared (count=0)
Append v_4: dense_tree=[v_4], dense_root=H(v_4), total=5
→ state_root = blake3("bulk_state" || mmr_root || dense_root)
The State Root
The state root binds both levels into one hash:
fn compute_state_root(
    mmr_root: &[u8; 32],        // Chunk MMR root (or [0; 32] if empty)
    dense_tree_root: &[u8; 32], // Root hash of current buffer (dense tree)
) -> [u8; 32] {
    blake3("bulk_state" || mmr_root || dense_tree_root)
}
The total_count and chunk_power are not included in the state root because
they are already authenticated by the Merk value hash — they are fields of the
serialized Element stored in the parent Merk node. The state root captures only the
data-level commitments (mmr_root and dense_tree_root). This is the hash that
flows as the Merk child hash and propagates up to the GroveDB root hash.
The Dense Merkle Root
When a chunk compacts, the entries need a single 32-byte commitment. The BulkAppendTree uses a dense (complete) binary Merkle tree:
Given entries [e_0, e_1, e_2, e_3]:
Level 0 (leaves): blake3(e_0) blake3(e_1) blake3(e_2) blake3(e_3)
\__________/ \__________/
Level 1: blake3(h_0 || h_1) blake3(h_2 || h_3)
\____________________/
Level 2 (root): blake3(h_01 || h_23) ← this is the dense Merkle root
Because chunk_size is always a power of 2 (by construction: 1u32 << chunk_power),
the tree is always complete (no padding or dummy leaves needed). The hash count is
exactly 2 * chunk_size - 1:
- chunk_size leaf hashes (one per entry)
- chunk_size - 1 internal node hashes
The dense Merkle root implementation lives in grovedb-mmr/src/dense_merkle.rs and
provides two functions:
- compute_dense_merkle_root(hashes) — from pre-hashed leaves
- compute_dense_merkle_root_from_values(values) — hashes values first, then builds the tree
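The 2 * chunk_size - 1 hash count can be checked with a short calculation that counts invocations level by level (a sketch; it counts hash calls rather than performing any Blake3 hashing):

```rust
/// Closed-form hash count for a complete binary Merkle tree over a
/// power-of-two number of entries: one hash per leaf, one per internal node.
fn dense_merkle_hash_count(chunk_size: u64) -> u64 {
    2 * chunk_size - 1
}

fn main() {
    for power in 1..=10u32 {
        let chunk_size = 1u64 << power;
        // Count invocations level by level: leaves first, then each merge level.
        let mut count = chunk_size; // one leaf hash per entry
        let mut width = chunk_size;
        while width > 1 {
            width /= 2;
            count += width; // one internal hash per pair at this level
        }
        assert_eq!(count, dense_merkle_hash_count(chunk_size));
    }
    println!("2 * chunk_size - 1 confirmed");
}
```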
Chunk Blob Serialization
Chunk blobs are the immutable archives produced by compaction. The serializer auto-selects the most compact wire format based on entry sizes:
Fixed-size format (flag 0x01) — when all entries have the same length:
┌──────┬──────────┬─────────────┬─────────┬─────────┬─────────┐
│ 0x01 │ count │ entry_size │ entry_0 │ entry_1 │ ... │
│ 1B │ 4B (BE) │ 4B (BE) │ N bytes │ N bytes │ │
└──────┴──────────┴─────────────┴─────────┴─────────┴─────────┘
Total: 1 + 4 + 4 + (count × entry_size) bytes
Variable-size format (flag 0x00) — when entries have different lengths:
┌──────┬──────────┬─────────┬──────────┬─────────┬─────────────┐
│ 0x00 │ len_0 │ entry_0 │ len_1 │ entry_1 │ ... │
│ 1B │ 4B (BE) │ N bytes │ 4B (BE) │ M bytes │ │
└──────┴──────────┴─────────┴──────────┴─────────┴─────────────┘
Total: 1 + Σ(4 + len_i) bytes
The fixed-size format saves 4 bytes per entry compared to variable-size, which adds up significantly for large chunks of uniform-size data (like 32-byte hash commitments). For 1024 entries of 32 bytes each:
- Fixed: 1 + 4 + 4 + 32,768 = 32,777 bytes
- Variable: 1 + 1024 × (4 + 32) = 36,865 bytes
- Savings: ~11%
Storage Key Layout
All BulkAppendTree data lives in the data namespace, keyed with single-character prefixes:
| Key pattern | Format | Size | Purpose |
|---|---|---|---|
| M | 1 byte | 1B | Metadata key |
| b + {index} | b + u32 BE | 5B | Buffer entry at index |
| e + {index} | e + u64 BE | 9B | Chunk blob at index |
| m + {pos} | m + u64 BE | 9B | MMR node at position |
Metadata stores mmr_size (8 bytes BE). The total_count and chunk_power are
stored in the Element itself (in the parent Merk), not in data namespace metadata.
This split means reading the count is a simple element lookup without opening the
data storage context.
Buffer keys use u32 indices (0 to chunk_size - 1) because the buffer capacity is
limited by chunk_size (a u32, computed as 1u32 << chunk_power). Chunk keys use u64
indices because the number of completed chunks can grow indefinitely.
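The key layout can be sketched as small constructors (names mirror the table above; the real implementations live in tree/keys.rs and may differ in detail):

```rust
/// Data-namespace keys per the table above (illustrative sketch).
fn buffer_key(index: u32) -> Vec<u8> {
    let mut k = vec![b'b'];
    k.extend_from_slice(&index.to_be_bytes()); // u32 BE: buffer is chunk-bounded
    k
}
fn chunk_key(index: u64) -> Vec<u8> {
    let mut k = vec![b'e'];
    k.extend_from_slice(&index.to_be_bytes()); // u64 BE: chunks grow without bound
    k
}
fn mmr_node_key(pos: u64) -> Vec<u8> {
    let mut k = vec![b'm'];
    k.extend_from_slice(&pos.to_be_bytes());
    k
}

fn main() {
    assert_eq!(buffer_key(0).len(), 5);  // 1-byte prefix + u32 BE
    assert_eq!(chunk_key(0).len(), 9);   // 1-byte prefix + u64 BE
    assert_eq!(mmr_node_key(0).len(), 9);
    // Big-endian encoding keeps keys sorted by index under byte-wise ordering.
    assert!(buffer_key(1) < buffer_key(256));
    println!("ok");
}
```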
The BulkAppendTree Struct
pub struct BulkAppendTree<S> {
    pub total_count: u64,                         // Total values ever appended
    pub dense_tree: DenseFixedSizedMerkleTree<S>, // The buffer (dense tree)
}
The buffer IS a DenseFixedSizedMerkleTree — its root hash is dense_tree_root.
Accessors:
- capacity() -> u16: dense_tree.capacity() (= 2^height - 1)
- epoch_size() -> u64: capacity + 1 (= 2^height, the number of entries per chunk)
- height() -> u8: dense_tree.height()
Derived values (not stored):
| Value | Formula |
|---|---|
| chunk_count | total_count / epoch_size |
| buffer_count | dense_tree.count() |
| mmr_size | leaf_count_to_mmr_size(chunk_count) |
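The derived-value formulas are simple enough to illustrate directly. The closed form used for leaf_count_to_mmr_size below, 2n − popcount(n), is an assumption here, but it is consistent with the 5-leaf / mmr_size = 8 example in the MMR chapter:

```rust
/// Completed chunks, derived from the stored total count.
fn chunk_count(total_count: u64, epoch_size: u64) -> u64 {
    total_count / epoch_size
}

/// MMR size for n leaves: each leaf adds one node plus one parent per
/// carry, giving the closed form 2n - popcount(n) (assumed formula).
fn leaf_count_to_mmr_size(n: u64) -> u64 {
    2 * n - u64::from(n.count_ones())
}

fn main() {
    // With epoch_size = 4 and 10 appends: 2 completed chunks, 2 values buffered.
    assert_eq!(chunk_count(10, 4), 2);
    assert_eq!(10 % 4, 2);
    // 5 chunk roots in the MMR occupy 8 positions, matching the earlier example.
    assert_eq!(leaf_count_to_mmr_size(5), 8);
    println!("ok");
}
```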
GroveDB Operations
The BulkAppendTree integrates with GroveDB through six operations defined in
grovedb/src/operations/bulk_append_tree.rs:
bulk_append
The primary mutating operation. Follows the standard GroveDB non-Merk storage pattern:
1. Validate element is BulkAppendTree
2. Open data storage context
3. Load tree from store
4. Append value (may trigger compaction)
5. Update element in parent Merk with new state_root + total_count
6. Propagate changes up through Merk hierarchy
7. Commit transaction
The AuxBulkStore adapter wraps GroveDB's get_aux/put_aux/delete_aux calls and
accumulates OperationCost in a RefCell for cost tracking. Hash costs from the
append operation are added to cost.hash_node_calls.
Read operations
| Operation | What it returns | Aux storage? |
|---|---|---|
| bulk_get_value(path, key, position) | Value at global position | Yes — reads from chunk blob or buffer |
| bulk_get_chunk(path, key, chunk_index) | Raw chunk blob | Yes — reads chunk key |
| bulk_get_buffer(path, key) | All current buffer entries | Yes — reads buffer keys |
| bulk_count(path, key) | Total count (u64) | No — reads from element |
| bulk_chunk_count(path, key) | Completed chunks (u64) | No — computed from element |
The get_value operation transparently routes by position:
if position < completed_chunks × chunk_size:
chunk_idx = position / chunk_size
intra_idx = position % chunk_size
→ read chunk blob, deserialize, return entry[intra_idx]
else:
buffer_idx = position % chunk_size
→ read buffer_key(buffer_idx)
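The routing logic above can be sketched as follows (the Location enum is hypothetical, for illustration; the real operation returns the value bytes directly):

```rust
/// Where a global position's value lives: a compacted chunk or the live buffer.
enum Location {
    Chunk { chunk_idx: u64, intra_idx: u64 },
    Buffer { buffer_idx: u64 },
}

/// Route a global position, mirroring the pseudocode above.
fn route(position: u64, completed_chunks: u64, chunk_size: u64) -> Location {
    if position < completed_chunks * chunk_size {
        Location::Chunk {
            chunk_idx: position / chunk_size,
            intra_idx: position % chunk_size,
        }
    } else {
        Location::Buffer { buffer_idx: position % chunk_size }
    }
}

fn main() {
    // 2 completed chunks of 4 entries: positions 0..8 are in chunks, 8.. in buffer.
    match route(6, 2, 4) {
        Location::Chunk { chunk_idx, intra_idx } => {
            assert_eq!((chunk_idx, intra_idx), (1, 2));
        }
        _ => panic!("expected chunk"),
    }
    match route(9, 2, 4) {
        Location::Buffer { buffer_idx } => assert_eq!(buffer_idx, 1),
        _ => panic!("expected buffer"),
    }
    println!("ok");
}
```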
Batch Operations and Preprocessing
BulkAppendTree supports batch operations through the GroveOp::BulkAppend variant.
Since execute_ops_on_path doesn't have access to the data storage context, all BulkAppend ops
must be preprocessed before apply_body.
The preprocessing pipeline:
Input: [BulkAppend{v1}, Insert{...}, BulkAppend{v2}, BulkAppend{v3}]
↑ same (path,key) as v1
Step 1: Group BulkAppend ops by (path, key)
group_1: [v1, v2, v3]
Step 2: For each group:
a. Read existing element → get (total_count, chunk_power)
b. Open transactional storage context
c. Load BulkAppendTree from store
d. Load existing buffer into memory (Vec<Vec<u8>>)
e. For each value:
tree.append_with_mem_buffer(store, value, &mut mem_buffer)
f. Save metadata
g. Compute final state_root
Step 3: Replace all BulkAppend ops with one ReplaceNonMerkTreeRoot per group
carrying: hash=state_root, meta=BulkAppendTree{total_count, chunk_power}
Output: [ReplaceNonMerkTreeRoot{...}, Insert{...}]
The append_with_mem_buffer variant avoids read-after-write issues: buffer entries
are tracked in a Vec<Vec<u8>> in memory, so compaction can read them even though
the transactional storage hasn't committed yet.
The BulkStore Trait
pub trait BulkStore {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, String>;
    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), String>;
    fn delete(&self, key: &[u8]) -> Result<(), String>;
}
Methods take &self (not &mut self) to match GroveDB's interior mutability pattern
where writes go through a batch. The GroveDB integration implements this via
AuxBulkStore which wraps a StorageContext and accumulates OperationCost.
The MmrAdapter bridges BulkStore to the ckb MMR's MMRStoreReadOps/
MMRStoreWriteOps traits, adding a write-through cache for read-after-write
correctness.
Proof Generation
BulkAppendTree proofs support range queries over positions. The proof structure captures everything needed for a stateless verifier to confirm that specific data exists in the tree:
pub struct BulkAppendTreeProof {
    pub chunk_power: u8,
    pub total_count: u64,
    pub chunk_blobs: Vec<(u64, Vec<u8>)>,       // Full chunk blobs
    pub chunk_mmr_size: u64,
    pub chunk_mmr_proof_items: Vec<[u8; 32]>,   // MMR sibling hashes
    pub chunk_mmr_leaves: Vec<(u64, [u8; 32])>, // (leaf_idx, dense_root)
    pub buffer_entries: Vec<Vec<u8>>,           // ALL buffer entries
    pub chunk_mmr_root: [u8; 32],
}
Generation steps for a range [start, end) (with chunk_size = 1u32 << chunk_power):
1. Determine overlapping chunks
first_chunk = start / chunk_size
last_chunk = min((end-1) / chunk_size, completed_chunks - 1)
2. Read chunk blobs for overlapping chunks
For each chunk_idx in [first_chunk, last_chunk]:
chunk_blobs.push((chunk_idx, store.get(chunk_key(idx))))
3. Compute dense Merkle root for each chunk blob
For each blob:
deserialize → values
dense_root = compute_dense_merkle_root_from_values(values)
chunk_mmr_leaves.push((chunk_idx, dense_root))
4. Generate MMR proof for those chunk positions
positions = chunk_indices.map(|idx| leaf_to_pos(idx))
proof = mmr.gen_proof(positions)
chunk_mmr_proof_items = proof.proof_items().map(|n| n.hash)
5. Get chunk MMR root
6. Read ALL buffer entries (bounded by chunk_size)
for i in 0..buffer_count:
buffer_entries.push(store.get(buffer_key(i)))
Why include ALL buffer entries? The buffer is a dense Merkle tree whose root hash
commits to every entry. The verifier must rebuild the tree from all entries to verify
the dense_tree_root. Since the buffer is bounded by capacity (at most 65,535
entries), this is a reasonable cost.
Proof Verification
Verification is a pure function — no database access needed. It performs five checks:
Step 0: Metadata consistency
- chunk_power <= 31
- buffer_entries.len() == total_count % chunk_size
- MMR leaf count == completed_chunks
Step 1: Chunk blob integrity
For each (chunk_idx, blob):
values = deserialize(blob)
computed_root = dense_merkle_root(values)
assert computed_root == chunk_mmr_leaves[chunk_idx]
Step 2: Chunk MMR proof
Reconstruct MmrNode leaves and proof items
proof.verify(chunk_mmr_root, leaves) == true
Step 3: Buffer (dense tree) integrity
Rebuild DenseFixedSizedMerkleTree from buffer_entries
dense_tree_root = compute root hash of rebuilt tree
Step 4: State root
computed = blake3("bulk_state" || chunk_mmr_root || dense_tree_root)
assert computed == expected_state_root
After verification succeeds, the BulkAppendTreeProofResult provides a
values_in_range(start, end) method that extracts specific values from the verified
chunk blobs and buffer entries.
How It Ties to the GroveDB Root Hash
The BulkAppendTree is a non-Merk tree — it stores data in the data namespace, not in a child Merk subtree. In the parent Merk, the element is stored as:
Element::BulkAppendTree(total_count, chunk_power, flags)
The state_root flows as the Merk child hash (via insert_subtree's
subtree_root_hash parameter). The parent Merk node hash is:
combine_hash(value_hash(element_bytes), state_root)
Any change to the state root therefore propagates up through the GroveDB Merk
hierarchy to the root hash.
In V1 proofs (§9.6), the parent Merk proof proves the element bytes and the child
hash binding, and the BulkAppendTreeProof proves that the queried data is consistent
with the state_root used as the child hash.
Cost Tracking
Each operation's hash cost is tracked explicitly:
| Operation | Blake3 calls | Notes |
|---|---|---|
| Single append (no compaction) | 3 | 2 for buffer hash chain + 1 for state root |
| Single append (with compaction) | 3 + 2C - 1 + ~2 | Chain + dense Merkle (C=chunk_size) + MMR push + state root |
| get_value from chunk | 0 | Pure deserialization, no hashing |
| get_value from buffer | 0 | Direct key lookup |
| Proof generation | Depends on chunk count | Dense Merkle root per chunk + MMR proof |
| Proof verification | 2C·K - K + B·2 + 1 | K chunks, B buffer entries, C chunk_size |
Amortized cost per append: For chunk_size=1024 (chunk_power=10), the compaction overhead of ~2047 hashes (dense Merkle root) is amortized over 1024 appends, adding ~2 hashes per append. Combined with the 3 per-append hashes, the amortized total is ~5 blake3 calls per append — very efficient for a cryptographically authenticated structure.
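As a sanity check on the amortization claim, under the stated assumptions (3 hashes per append, plus 2C − 1 dense-Merkle hashes and ~2 MMR-push hashes per compaction, spread over C appends):

```rust
/// Amortized Blake3 calls per append: 3 per-append hashes plus the
/// compaction cost ((2C - 1) dense-Merkle hashes + ~2 MMR-push hashes)
/// spread over the C appends that fill a chunk.
fn amortized_hashes_per_append(chunk_size: f64) -> f64 {
    3.0 + ((2.0 * chunk_size - 1.0) + 2.0) / chunk_size
}

fn main() {
    let cost = amortized_hashes_per_append(1024.0);
    // ~5 hashes per append at chunk_size = 1024, as the text states.
    assert!((cost - 5.0).abs() < 0.01);
    println!("{cost:.3}");
}
```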
Comparison with MmrTree
| | BulkAppendTree | MmrTree |
|---|---|---|
| Architecture | Two-level (buffer + chunk MMR) | Single MMR |
| Per-append hash cost | 3 (+ amortized ~2 for compaction) | ~2 |
| Proof granularity | Range queries over positions | Individual leaf proofs |
| Immutable snapshots | Yes (chunk blobs) | No |
| CDN-friendly | Yes (chunk blobs cacheable) | No |
| Buffer entries | Yes (need all for proof) | N/A |
| Best for | High-throughput logs, bulk sync | Event logs, individual lookups |
| Element discriminant | 13 | 12 |
| TreeType | 9 | 8 |
Choose MmrTree when you need individual leaf proofs with minimal overhead. Choose BulkAppendTree when you need range queries, bulk synchronization, and chunk-based snapshots.
Implementation Files
| File | Purpose |
|---|---|
| grovedb-bulk-append-tree/src/lib.rs | Crate root, re-exports |
| grovedb-bulk-append-tree/src/tree/mod.rs | BulkAppendTree struct, state accessors, metadata persistence |
| grovedb-bulk-append-tree/src/tree/append.rs | append(), append_with_mem_buffer(), compact_entries() |
| grovedb-bulk-append-tree/src/tree/hash.rs | compute_state_root |
| grovedb-bulk-append-tree/src/tree/keys.rs | META_KEY, buffer_key, chunk_key, mmr_node_key |
| grovedb-bulk-append-tree/src/tree/query.rs | get_value, get_chunk, get_buffer |
| grovedb-bulk-append-tree/src/tree/mmr_adapter.rs | MmrAdapter with write-through cache |
| grovedb-bulk-append-tree/src/chunk.rs | Chunk blob serialization (fixed + variable formats) |
| grovedb-bulk-append-tree/src/proof.rs | BulkAppendTreeProof generation and verification |
| grovedb-bulk-append-tree/src/store.rs | BulkStore trait |
| grovedb-bulk-append-tree/src/error.rs | BulkAppendError enum |
| grovedb/src/operations/bulk_append_tree.rs | GroveDB operations, AuxBulkStore, batch preprocessing |
| grovedb/src/tests/bulk_append_tree_tests.rs | 27 integration tests |
The CommitmentTree — Sinsemilla Commitment Anchors
The CommitmentTree is GroveDB's bridge between authenticated storage and zero-knowledge proof systems. It combines a BulkAppendTree (Chapter 14) for efficient chunk-compacted data storage with a Sinsemilla frontier in the data namespace for ZK-compatible anchors. Like MmrTree and BulkAppendTree, it has no child Merk — the combined root hash flows as the Merk child hash. Both the BulkAppendTree entries and the Sinsemilla frontier live in the data namespace.
This chapter covers the Sinsemilla hash function and why it matters for zero-knowledge circuits, the frontier data structure and its compact serialization, the dual-namespace storage architecture, the GroveDB operations, batch preprocessing, client-side witness generation, and how proofs work.
Why a ZK-Friendly Tree?
GroveDB's standard trees use Blake3 hashing. Blake3 is fast in software, but expensive inside zero-knowledge circuits. When a spender needs to prove "I know a note at position P in the commitment tree" without revealing P, they must evaluate the Merkle hash function 32 times (once per tree level) inside a ZK circuit.
Sinsemilla (specified in ZIP-244 for the Zcash Orchard protocol) is designed for exactly this use case — it provides efficient in-circuit hashing over the Pallas elliptic curve, one half of the Pasta curve cycle used by the Halo 2 proof system.
| Property | Blake3 | Sinsemilla |
|---|---|---|
| Circuit cost | ~25,000 constraints per hash | ~800 constraints per hash |
| Software speed | Very fast (~2 GB/s) | Slow (~10,000 hashes/s) |
| Algebraic structure | None (bitwise) | Pallas curve point operations |
| Primary purpose | General hashing, Merkle trees | In-circuit Merkle proofs |
| Used by | GroveDB Merk trees, MMR, Bulk | Orchard shielded protocol |
| Output size | 32 bytes | 32 bytes (Pallas field element) |
The CommitmentTree uses Sinsemilla for the Merkle tree that ZK circuits reason about, while still using Blake3 for the GroveDB Merk hierarchy above it. Items inserted into the tree are stored via a BulkAppendTree in the data namespace (chunk-compacted, retrievable by position) and simultaneously appended to the Sinsemilla frontier (producing a ZK-provable anchor).
The Data-Namespace Architecture
The CommitmentTree stores all data in the data namespace at the same subtree
path. Like MmrTree and BulkAppendTree, it has no child Merk (no root_key
field — the type-specific root flows as the Merk child hash). The BulkAppendTree entries and the Sinsemilla frontier coexist
in the data namespace using distinct key prefixes:
┌──────────────────────────────────────────────────────────────┐
│ CommitmentTree │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Data Namespace │ │
│ │ │ │
│ │ BulkAppendTree storage (Chapter 14): │ │
│ │ Buffer entries → chunk blobs → chunk MMR │ │
│ │ value = cmx (32) || rho (32) || ciphertext (216) │ │
│ │ │ │
│ │ Sinsemilla Frontier (~1KB): │ │
│ │ key: b"__ct_data__" (COMMITMENT_TREE_DATA_KEY) │ │
│ │ Depth-32 incremental Merkle tree │ │
│ │ Stores only the rightmost path (leaf + ommers) │ │
│ │ O(1) append, O(1) root computation │ │
│ │ Produces Orchard-compatible Anchor for ZK proofs │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ sinsemilla_root embedded in Element bytes │
│ → flows through Merk value_hash → GroveDB state root │
└──────────────────────────────────────────────────────────────┘
Why two structures? The BulkAppendTree provides efficient, chunk-compacted storage and retrieval for potentially millions of encrypted notes. The Sinsemilla frontier provides ZK-compatible anchors that can be proved inside a Halo 2 circuit. Both are updated in lockstep on every append.
Compare with the other non-standard tree types:
| | CommitmentTree | MmrTree | BulkAppendTree |
|---|---|---|---|
| Child Merk | No | No | No |
| Data namespace | BulkAppendTree entries + frontier | MMR nodes | Buffer + chunks + MMR |
| Aux namespace | — | — | — |
| Items queryable | Via V1 proofs | Via V1 proofs | Via V1 proofs |
| Hash function | Sinsemilla + Blake3 | Blake3 | Blake3 |
The Sinsemilla Frontier
The frontier is a depth-32 incremental Merkle tree implemented by the
incrementalmerkletree crate's Frontier<MerkleHashOrchard, 32> type. Instead
of storing all 2^32 possible leaves, it stores only the information needed to
append the next leaf and compute the current root: the rightmost leaf and
its ommers (sibling hashes needed for root computation).
root (level 32)
/ \
... ...
/ \
(level 2) (level 2)
/ \ / \
(level 1) (level 1) (level 1) ?
/ \ / \ / \
L0 L1 L2 L3 L4 ? ← frontier stores L4
+ ommers at levels
where left sibling exists
The frontier stores:
- leaf: the most recently appended value (a Pallas field element)
- ommers: the left-sibling hashes at each level where the frontier path goes right (at most 32 ommers for a depth-32 tree)
- position: the 0-indexed position of the leaf
Key properties:
- O(1) append: insert a new leaf, update ommers, recompute root
- O(1) root: traverse the stored ommers from leaf to root
- ~1KB constant size: regardless of how many leaves have been appended
- Deterministic: two frontiers with the same sequence of appends produce the same root
The EMPTY_SINSEMILLA_ROOT constant is the root of an empty depth-32 tree,
precomputed as MerkleHashOrchard::empty_root(Level::from(32)).to_bytes():
0xae2935f1dfd8a24aed7c70df7de3a668eb7a49b1319880dde2bbd9031ae5d82f
How Appending Works — The Ommer Cascade
When a new commitment is appended, the previous frontier leaf is folded into
the ommers; the number of merge hashes this takes equals trailing_ones(P) — the
number of trailing 1-bits in the binary representation of the previous
position P. This is the same pattern as the MMR merge cascade (§13.4), but
operating on ommers rather than peaks.
Worked example — appending 4 leaves:
Position 0 (no previous position, cascade = 0):
    frontier = { leaf: L0, ommers: [], position: 0 }
    Sinsemilla hashes: 32 (root computation) + 0 (no ommer merges) = 32
Position 1 (previous position 0, binary 0, trailing_ones = 0):
    frontier = { leaf: L1, ommers: [H(L0,L1) at level 1], position: 1 }
    Sinsemilla hashes: 32 + 0 = 32
Position 2 (previous position 1, binary 1, trailing_ones = 1):
    frontier = { leaf: L2, ommers: [level1_hash], position: 2 }
    Sinsemilla hashes: 32 + 1 = 33
Position 3 (previous position 2, binary 10, trailing_ones = 0):
    frontier = { leaf: L3, ommers: [level1_hash, level2_hash], position: 3 }
    Sinsemilla hashes: 32 + 0 = 32
The total Sinsemilla hashes per append is:
32 (root computation always traverses all 32 levels)
+ trailing_ones(previous position) (ommer cascade)
On average, trailing_ones is ~1 (it follows a geometric distribution), so the
average cost is ~33 Sinsemilla hashes per append. The worst case — a cascade
that spans all 32 levels — is 32 + 32 = 64 hashes.
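The per-append cost can be checked numerically. This sketch counts 32 root hashes plus an ommer cascade equal to the trailing ones of the previous position, which reproduces the per-position counts in the worked example above:

```rust
/// Sinsemilla hashes to append the leaf at `position` (0-based):
/// 32 for the root recomputation plus one ommer merge per trailing
/// 1-bit of the previous position (assumed model, matching the example).
fn sinsemilla_hashes_for_append(position: u64) -> u64 {
    let cascade = if position == 0 {
        0
    } else {
        u64::from((position - 1).trailing_ones())
    };
    32 + cascade
}

fn main() {
    // Matches the worked example: 32, 32, 33, 32.
    assert_eq!(sinsemilla_hashes_for_append(0), 32);
    assert_eq!(sinsemilla_hashes_for_append(1), 32);
    assert_eq!(sinsemilla_hashes_for_append(2), 33);
    assert_eq!(sinsemilla_hashes_for_append(3), 32);
    // The average over many appends approaches ~33.
    let n = 1u64 << 16;
    let total: u64 = (0..n).map(sinsemilla_hashes_for_append).sum();
    let avg = total as f64 / n as f64;
    assert!(avg > 32.9 && avg < 33.1);
    println!("avg = {avg:.3}");
}
```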
The Frontier Serialization Format
The frontier is stored in data storage at key b"__ct_data__". The wire
format is:
┌──────────────────────────────────────────────────────────────────┐
│ has_frontier: u8 │
│ 0x00 → empty tree (no more fields) │
│ 0x01 → non-empty (fields follow) │
├──────────────────────────────────────────────────────────────────┤
│ position: u64 BE (8 bytes) — 0-indexed leaf position │
├──────────────────────────────────────────────────────────────────┤
│ leaf: [u8; 32] — Pallas field element bytes │
├──────────────────────────────────────────────────────────────────┤
│ ommer_count: u8 — number of ommers (0..=32) │
├──────────────────────────────────────────────────────────────────┤
│ ommers: [ommer_count × 32 bytes] — Pallas field elements │
└──────────────────────────────────────────────────────────────────┘
Size analysis:
| State | Size | Breakdown |
|---|---|---|
| Empty | 1 byte | 0x00 flag only |
| 1 leaf, 0 ommers | 42 bytes | 1 + 8 + 32 + 1 |
| ~16 ommers (average) | 554 bytes | 1 + 8 + 32 + 1 + 16×32 |
| 32 ommers (maximum) | 1,066 bytes | 1 + 8 + 32 + 1 + 32×32 |
The frontier size is bounded by ~1.1KB regardless of how many millions of commitments have been appended. This makes the load→modify→save cycle very cheap (1 seek to read, 1 seek to write).
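The size table follows directly from the wire format. A small sketch, with field widths taken from the layout above:

```rust
/// Serialized frontier size per the wire format above:
/// flag (1) [+ position (8) + leaf (32) + ommer_count (1) + ommers (32 each)].
fn frontier_size(empty: bool, ommer_count: u64) -> u64 {
    if empty {
        1 // just the 0x00 flag
    } else {
        1 + 8 + 32 + 1 + ommer_count * 32
    }
}

fn main() {
    assert_eq!(frontier_size(true, 0), 1);      // empty tree
    assert_eq!(frontier_size(false, 0), 42);    // 1 leaf, 0 ommers
    assert_eq!(frontier_size(false, 16), 554);  // typical mid-life frontier
    assert_eq!(frontier_size(false, 32), 1_066); // maximum for depth 32
    println!("ok");
}
```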
Element Representation
CommitmentTree(
    u64,                   // total_count: number of appended items
    u8,                    // chunk_power: dense tree height for BulkAppendTree buffer
    Option<ElementFlags>,  // flags: optional metadata
)
The chunk_power parameter controls the BulkAppendTree buffer's dense tree height;
chunk_power must be in the range 1..=16 (see §14.1 and §16).
Type identifiers:
| Identifier | Value |
|---|---|
| Element discriminant | 11 |
| TreeType | CommitmentTree = 7 |
| ElementType | 11 |
| COMMITMENT_TREE_COST_SIZE | 12 bytes (8 total_count + 1 chunk_power + 1 discriminant + 2 overhead) |
The Sinsemilla root is NOT stored in the Element. It flows as the Merk child hash
through the insert_subtree mechanism. When the parent Merk computes its
combined_value_hash, the Sinsemilla-derived root is included as the child hash:
combined_value_hash = blake3(value_hash || child_hash)
↑ sinsemilla/BulkAppendTree combined root
This means any change to the Sinsemilla frontier automatically propagates through the GroveDB Merk hierarchy to the state root.
Constructor methods:
| Method | Creates |
|---|---|
| Element::empty_commitment_tree(chunk_power) | Empty tree, count=0, no flags |
| Element::empty_commitment_tree_with_flags(chunk_power, flags) | Empty tree with flags |
| Element::new_commitment_tree(total_count, chunk_power, flags) | All fields explicit |
Storage Architecture
The CommitmentTree stores all its data in a single data namespace at the subtree path. BulkAppendTree entries and the Sinsemilla frontier coexist in the same column using distinct key prefixes. No aux namespace is used.
┌──────────────────────────────────────────────────────────────────┐
│ Data Namespace (all CommitmentTree storage) │
│ │
│ BulkAppendTree storage keys (see §14.7): │
│ b"m" || pos (u64 BE) → MMR node blobs │
│ b"b" || index (u64 BE)→ buffer entries (cmx || rho || ciphertext) │
│ b"e" || chunk (u64 BE)→ chunk blobs (compacted buffer) │
│ b"M" → BulkAppendTree metadata │
│ │
│ Sinsemilla frontier: │
│ b"__ct_data__" → serialized CommitmentFrontier (~1KB) │
│ │
│ No Merk nodes — this is a non-Merk tree. │
│ Data authenticated via BulkAppendTree state_root (Blake3). │
│ Sinsemilla root authenticates all cmx values via Pallas curve. │
└──────────────────────────────────────────────────────────────────┘
The load→modify→save pattern: Every mutating operation loads the frontier from data storage, modifies it in memory, and writes it back. Since the frontier is at most ~1KB, this is an inexpensive pair of I/O operations (1 seek to read, 1 seek to write). Simultaneously, the BulkAppendTree is loaded, appended to, and saved.
Root hash propagation: When an item is inserted, two things change:
- The BulkAppendTree state changes (new entry in buffer or chunk compaction)
- The Sinsemilla root changes (new commitment in the frontier)
Both are captured in the updated CommitmentTree element. The parent Merk node
hash becomes:
combined_hash = combine_hash(
value_hash(element_bytes), ← includes total_count + chunk_power
child_hash(combined_root) ← sinsemilla/BulkAppendTree combined root
)
Like MmrTree and BulkAppendTree, the type-specific root flows as the Merk child hash. All data authentication flows through this child hash binding.
Non-Merk data storage implications: Because the data namespace contains
BulkAppendTree keys (not Merk nodes), operations that iterate storage as
Merk elements — such as find_subtrees, is_empty_tree, and
verify_merk_and_submerks — must special-case CommitmentTree (and other
non-Merk tree types). The uses_non_merk_data_storage() helper on both
Element and TreeType identifies these tree types. Delete operations clear
the data namespace directly instead of iterating it, and verify_grovedb skips
sub-merk recursion for these types.
GroveDB Operations
CommitmentTree provides four operations. The insert operation is generic over
M: MemoSize (from the orchard crate), which controls ciphertext payload
size validation. The default M = DashMemo gives a 216-byte payload
(32 epk + 104 enc + 80 out).
// Insert a commitment (typed) — returns (sinsemilla_root, position)
// M controls ciphertext size validation
db.commitment_tree_insert::<_, _, M>(path, key, cmx, rho, ciphertext, tx, version)

// Insert a commitment (raw bytes) — validates
// payload.len() == ciphertext_payload_size::<DashMemo>()
db.commitment_tree_insert_raw(path, key, cmx, rho, payload_vec, tx, version)

// Get the current Orchard Anchor
db.commitment_tree_anchor(path, key, tx, version)

// Retrieve a value by global position
db.commitment_tree_get_value(path, key, position, tx, version)

// Get the current item count
db.commitment_tree_count(path, key, tx, version)
The typed commitment_tree_insert accepts a TransmittedNoteCiphertext<M> and
serializes it internally. The raw commitment_tree_insert_raw (pub(crate))
accepts Vec<u8> and is used by batch preprocessing where payloads are already
serialized.
commitment_tree_insert
The insert operation updates both the BulkAppendTree and the Sinsemilla frontier in a single atomic operation:
Step 1: Validate element at path/key is a CommitmentTree
→ extract total_count, chunk_power, flags
Step 2: Build ct_path = path ++ [key]
Step 3: Open data storage context at ct_path
Load CommitmentTree (frontier + BulkAppendTree)
Serialize ciphertext → validate payload size matches M
Append cmx||rho||ciphertext to BulkAppendTree
Append cmx to Sinsemilla frontier → get new sinsemilla_root
Track Blake3 + Sinsemilla hash costs
Step 4: Save updated frontier to data storage
Step 5: Open parent Merk at path
Write updated CommitmentTree element:
new total_count, same chunk_power, same flags
Child hash = combined_root (sinsemilla + bulk state)
Step 6: Propagate changes from parent upward through Merk hierarchy
Step 7: Commit storage batch and local transaction
Return (sinsemilla_root, position)
graph TD
A["commitment_tree_insert(path, key, cmx, rho, ciphertext)"] --> B["Validate: is CommitmentTree?"]
B --> C["Open data storage, load CommitmentTree"]
C --> D["Serialize & validate ciphertext size"]
D --> E["BulkAppendTree.append(cmx||rho||payload)"]
E --> F["frontier.append(cmx)"]
F --> G["Save frontier to data storage"]
G --> H["Update parent CommitmentTree element<br/>new sinsemilla_root + total_count"]
H --> I["Propagate changes upward"]
I --> J["Commit batch + transaction"]
J --> K["Return (sinsemilla_root, position)"]
style F fill:#fadbd8,stroke:#e74c3c,stroke-width:2px
style E fill:#d5f5e3,stroke:#27ae60,stroke-width:2px
style H fill:#d4e6f1,stroke:#2980b9,stroke-width:2px
Red = Sinsemilla operations. Green = BulkAppendTree operations. Blue = element update bridging both.
commitment_tree_anchor
The anchor operation is a read-only query:
Step 1: Validate element at path/key is a CommitmentTree
Step 2: Build ct_path = path ++ [key]
Step 3: Load frontier from data storage
Step 4: Return frontier.anchor() as orchard::tree::Anchor
The Anchor type is the Orchard-native representation of the Sinsemilla root,
suitable for passing directly to orchard::builder::Builder when constructing
spend authorization proofs.
commitment_tree_get_value
Retrieves a stored value (cmx || rho || payload) by its global position:
Step 1: Validate element at path/key is a CommitmentTree
→ extract total_count, chunk_power
Step 2: Build ct_path = path ++ [key]
Step 3: Open data storage context, wrap in CachedBulkStore
Step 4: Load BulkAppendTree, call get_value(position)
Step 5: Return Option<Vec<u8>>
This follows the same pattern as bulk_get_value (§14.9) — the BulkAppendTree
transparently retrieves from the buffer or a compacted chunk blob depending on
where the position falls.
commitment_tree_count
Returns the total number of items appended to the tree:
Step 1: Read element at path/key
Step 2: Verify it is a CommitmentTree
Step 3: Return total_count from element fields
This is a simple element field read — no storage access beyond the parent Merk.
Batch Operations
CommitmentTree supports batch inserts through the GroveOp::CommitmentTreeInsert
variant:
GroveOp::CommitmentTreeInsert {
    cmx: [u8; 32],     // extracted note commitment
    rho: [u8; 32],     // nullifier of the spent note
    payload: Vec<u8>,  // serialized ciphertext (216 bytes for DashMemo)
}
Two constructors create this op:
// Raw constructor — caller serializes payload manually
QualifiedGroveDbOp::commitment_tree_insert_op(path, cmx, rho, payload_vec)

// Typed constructor — serializes TransmittedNoteCiphertext<M> internally
QualifiedGroveDbOp::commitment_tree_insert_op_typed::<M>(path, cmx, rho, &ciphertext)
Multiple inserts targeting the same tree are allowed in a single batch. Since
execute_ops_on_path doesn't have access to data storage, all CommitmentTree
ops must be preprocessed before apply_body.
The preprocessing pipeline (preprocess_commitment_tree_ops):
Input: [CTInsert{cmx1}, Insert{...}, CTInsert{cmx2}, CTInsert{cmx3}]
↑ same (path,key) as cmx1
Step 1: Group CommitmentTreeInsert ops by (path, key)
group_1: [cmx1, cmx2, cmx3]
Step 2: For each group:
a. Read existing element → verify CommitmentTree, extract chunk_power
b. Open transactional storage context at ct_path
c. Load CommitmentTree from data storage (frontier + BulkAppendTree)
d. For each (cmx, rho, payload):
- ct.append_raw(cmx, rho, payload) — validates size, appends to both
e. Save updated frontier to data storage
Step 3: Replace all CTInsert ops with one ReplaceNonMerkTreeRoot per group
carrying: hash=bulk_state_root (combined root),
meta=NonMerkTreeMeta::CommitmentTree {
total_count: new_count,
chunk_power,
}
Output: [ReplaceNonMerkTreeRoot{...}, Insert{...}]
The first CommitmentTreeInsert op in each group is replaced by the
ReplaceNonMerkTreeRoot; subsequent ops for the same (path, key) are dropped.
The standard batch machinery then handles element update and root hash
propagation.
MemoSize Generic and Ciphertext Handling
The CommitmentTree<S, M> struct is generic over M: MemoSize (from the
orchard crate). This controls the size of encrypted note ciphertexts stored
alongside each commitment.
pub struct CommitmentTree<S, M: MemoSize = DashMemo> {
    frontier: CommitmentFrontier,
    pub bulk_tree: BulkAppendTree<S>,
    _memo: PhantomData<M>,
}
The default M = DashMemo means existing code that doesn't care about memo
size (like verify_grovedb, commitment_tree_anchor, commitment_tree_count)
works without specifying M.
Serialization helpers (public free functions):
| Function | Description |
|---|---|
| ciphertext_payload_size::<M>() | Expected payload size for a given MemoSize |
| serialize_ciphertext::<M>(ct) | Serialize TransmittedNoteCiphertext<M> to bytes |
| deserialize_ciphertext::<M>(data) | Deserialize bytes back to TransmittedNoteCiphertext<M> |
Payload validation: The append_raw() method validates that
payload.len() == ciphertext_payload_size::<M>() and returns
CommitmentTreeError::InvalidPayloadSize on mismatch. The typed append()
method serializes internally, so size is always correct by construction.
Stored Record Layout (280 bytes for DashMemo)
Each entry in the BulkAppendTree stores the complete encrypted note record. The full layout, accounting for every byte:
┌─────────────────────────────────────────────────────────────────────┐
│ Offset Size Field │
├─────────────────────────────────────────────────────────────────────┤
│ 0 32 cmx — extracted note commitment (Pallas base field)│
│ 32 32 rho — nullifier of the spent note │
│ 64 32 epk_bytes — ephemeral public key (Pallas point) │
│ 96 104 enc_ciphertext — encrypted note plaintext + MAC │
│ 200 80 out_ciphertext — encrypted outgoing data + MAC │
├─────────────────────────────────────────────────────────────────────┤
│ Total: 280 bytes │
└─────────────────────────────────────────────────────────────────────┘
The first two fields (cmx and rho) are unencrypted protocol fields —
they are public by design. The remaining three fields (epk_bytes,
enc_ciphertext, out_ciphertext) form the TransmittedNoteCiphertext and
are the encrypted payload.
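The layout above can be expressed as a slicing sketch; the `Record` struct and `split_record` function are illustrative names, not the crate's accessors:

```rust
// Split a stored DashMemo record into its five fields, using the byte
// offsets from the layout above.
struct Record<'a> {
    cmx: &'a [u8],
    rho: &'a [u8],
    epk_bytes: &'a [u8],
    enc_ciphertext: &'a [u8],
    out_ciphertext: &'a [u8],
}

fn split_record(record: &[u8; 280]) -> Record<'_> {
    Record {
        cmx: &record[0..32],             // public protocol field
        rho: &record[32..64],            // public protocol field
        epk_bytes: &record[64..96],      // encrypted payload starts here
        enc_ciphertext: &record[96..200],
        out_ciphertext: &record[200..280],
    }
}

fn main() {
    let raw = [0u8; 280];
    let r = split_record(&raw);
    assert_eq!(r.cmx.len(), 32);
    assert_eq!(r.rho.len(), 32);
    assert_eq!(r.epk_bytes.len(), 32);
    assert_eq!(r.enc_ciphertext.len(), 104);
    assert_eq!(r.out_ciphertext.len(), 80);
}
```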
Field-by-Field Breakdown
cmx (32 bytes) — The extracted note commitment, a Pallas base field element. This is the leaf value appended to the Sinsemilla frontier. It commits to all note fields (recipient, value, randomness) without revealing them. The cmx is what makes the note "findable" in the commitment tree.
rho (32 bytes) — The nullifier of the note being spent in this action.
Nullifiers are already public on the blockchain (they must be to prevent
double-spending). Storing rho alongside the commitment allows light clients
performing trial decryption to verify esk = PRF(rseed, rho) and confirm
epk' == epk without a separate nullifier lookup. This field sits between
cmx and the ciphertext as an unencrypted protocol-level association.
epk_bytes (32 bytes) — The ephemeral public key, a serialized Pallas curve
point. Derived deterministically from the note's rseed via:
rseed → esk = ToScalar(PRF^expand(rseed, [4] || rho))
esk → epk = [esk] * g_d (scalar multiplication on Pallas)
epk → epk_bytes = Serialize(epk)
where g_d = DiversifyHash(d) is the diversified base point for the
recipient's diversifier. The epk enables the recipient to compute the shared
secret for decryption: shared_secret = [ivk] * epk. It is transmitted in the
clear because it reveals nothing about sender or recipient without knowing
either esk or ivk.
enc_ciphertext (104 bytes for DashMemo) — The encrypted note plaintext, produced by ChaCha20-Poly1305 AEAD encryption:
enc_ciphertext = ChaCha20-Poly1305.Encrypt(key, nonce=[0;12], aad=[], plaintext)
= ciphertext (88 bytes) || MAC tag (16 bytes) = 104 bytes
The symmetric key is derived via ECDH:
key = BLAKE2b-256("Zcash_OrchardKDF", shared_secret || epk_bytes).
When decrypted by the recipient (using ivk), the note plaintext
(88 bytes for DashMemo) contains:
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 1 | version | Always 0x02 (Orchard, post-ZIP-212) |
| 1 | 11 | diversifier (d) | Recipient's diversifier, derives the base point g_d |
| 12 | 8 | value (v) | 64-bit little-endian note value in duffs |
| 20 | 32 | rseed | Random seed for deterministic derivation of rcm, psi, esk |
| 52 | 36 | memo | Application-layer memo data (DashMemo: 36 bytes) |
| Total | 88 | | |
The first 52 bytes (version + diversifier + value + rseed) are the compact note — light clients can trial-decrypt just this portion using the ChaCha20 stream cipher (without verifying the MAC) to check whether the note belongs to them. If it does, they decrypt the full 88 bytes and verify the MAC.
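A minimal parse of the decrypted 88-byte plaintext, using the offsets from the table above (the struct and function names are hypothetical):

```rust
// Fields of the decrypted DashMemo note plaintext, per the table above.
struct NotePlaintext<'a> {
    version: u8,
    diversifier: &'a [u8], // 11 bytes
    value: u64,            // little-endian duffs
    rseed: &'a [u8],       // 32 bytes
    memo: &'a [u8],        // 36 bytes (DashMemo)
}

fn parse_note_plaintext(p: &[u8; 88]) -> NotePlaintext<'_> {
    NotePlaintext {
        version: p[0],
        diversifier: &p[1..12],
        value: u64::from_le_bytes(p[12..20].try_into().unwrap()),
        rseed: &p[20..52],
        memo: &p[52..88], // everything past the 52-byte compact note
    }
}

fn main() {
    let mut buf = [0u8; 88];
    buf[0] = 0x02; // Orchard version byte
    buf[12..20].copy_from_slice(&1000u64.to_le_bytes());
    let note = parse_note_plaintext(&buf);
    assert_eq!(note.version, 0x02);
    assert_eq!(note.value, 1000);
    assert_eq!(note.rseed.len(), 32);
    assert_eq!(note.memo.len(), 36);
}
```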
out_ciphertext (80 bytes) — The encrypted outgoing data, allowing the sender to recover the note after the fact. Encrypted with the Outgoing Cipher Key:
ock = BLAKE2b-256("Zcash_Orchardock", ovk || cv_net || cmx || epk)
out_ciphertext = ChaCha20-Poly1305.Encrypt(ock, nonce=[0;12], aad=[], plaintext)
= ciphertext (64 bytes) || MAC tag (16 bytes) = 80 bytes
When decrypted by the sender (using ovk), the outgoing plaintext
(64 bytes) contains:
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 32 | pk_d | Diversified transmission key (recipient's public key) |
| 32 | 32 | esk | Ephemeral secret key (Pallas scalar) |
| Total | 64 | | |
With pk_d and esk, the sender can reconstruct the shared secret, decrypt
enc_ciphertext, and recover the full note. If the sender sets ovk = null,
the outgoing plaintext is filled with random bytes before encryption, making
recovery impossible even for the sender (non-recoverable output).
Encryption Scheme: ChaCha20-Poly1305
Both enc_ciphertext and out_ciphertext use ChaCha20-Poly1305 AEAD (RFC 8439):
| Parameter | Value |
|---|---|
| Key size | 256 bits (32 bytes) |
| Nonce | [0u8; 12] (safe because each key is used exactly once) |
| AAD | Empty |
| MAC tag | 16 bytes (Poly1305) |
The zero nonce is safe because the symmetric key is derived from a fresh Diffie-Hellman exchange per note — each key encrypts exactly one message.
DashMemo vs ZcashMemo Size Comparison
| Component | DashMemo | ZcashMemo | Notes |
|---|---|---|---|
| Memo field | 36 bytes | 512 bytes | Application data |
| Note plaintext | 88 bytes | 564 bytes | 52 fixed + memo |
| enc_ciphertext | 104 bytes | 580 bytes | plaintext + 16 MAC |
| Ciphertext payload (epk+enc+out) | 216 bytes | 692 bytes | Transmitted per note |
| Full stored record (cmx+rho+payload) | 280 bytes | 756 bytes | BulkAppendTree entry |
DashMemo's smaller memo (36 vs 512 bytes) reduces each stored record by 476 bytes — significant when storing millions of notes.
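The arithmetic behind these rows can be verified with a short sketch; the constants are taken from this chapter's figures, not imported from the crate:

```rust
// Fixed component sizes, per this chapter.
const CMX: usize = 32;
const RHO: usize = 32;
const EPK: usize = 32;
const COMPACT_NOTE: usize = 52;  // version + diversifier + value + rseed
const MAC: usize = 16;           // Poly1305 tag
const OUT_PLAINTEXT: usize = 64; // pk_d + esk

fn enc_ciphertext(memo: usize) -> usize { COMPACT_NOTE + memo + MAC }
fn payload(memo: usize) -> usize { EPK + enc_ciphertext(memo) + OUT_PLAINTEXT + MAC }
fn stored_record(memo: usize) -> usize { CMX + RHO + payload(memo) }

fn main() {
    // DashMemo (36-byte memo)
    assert_eq!(enc_ciphertext(36), 104);
    assert_eq!(payload(36), 216);
    assert_eq!(stored_record(36), 280);
    // ZcashMemo (512-byte memo)
    assert_eq!(enc_ciphertext(512), 580);
    assert_eq!(payload(512), 692);
    assert_eq!(stored_record(512), 756);
}
```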
Trial Decryption Flow (Light Client)
A light client scanning for its own notes performs this sequence for each stored record:
1. Read record: cmx (32) || rho (32) || epk (32) || enc_ciphertext (104) || out_ciphertext (80)
2. Compute shared_secret = [ivk] * epk (ECDH with incoming viewing key)
3. Derive key = BLAKE2b-256("Zcash_OrchardKDF", shared_secret || epk)
4. Trial-decrypt compact note (first 52 bytes of enc_ciphertext):
→ version (1) || diversifier (11) || value (8) || rseed (32)
5. Reconstruct esk = PRF(rseed, rho) ← rho is needed here!
Verify: [esk] * g_d == epk ← confirms this is our note
6. If match: decrypt full enc_ciphertext (88 bytes + 16 MAC):
→ compact_note (52) || memo (36)
Verify MAC tag for authenticity
7. Reconstruct full Note from (diversifier, value, rseed, rho)
This note can later be spent by proving knowledge of it in ZK
Step 5 is why rho must be stored alongside the ciphertext — without it,
the light client cannot verify the ephemeral key binding during trial
decryption.
Client-Side Witness Generation
The grovedb-commitment-tree crate provides a client-side tree for wallets
and test harnesses that need to generate Merkle witness paths for spending notes.
Enable the client feature to use it:
grovedb-commitment-tree = { version = "4", features = ["client"] }
pub struct ClientMemoryCommitmentTree {
    inner: ShardTree<MemoryShardStore<MerkleHashOrchard, u32>, 32, 4>,
}
The ClientMemoryCommitmentTree wraps ShardTree — a full commitment tree (not just
a frontier) that keeps complete history in memory. This allows generating
authentication paths for any marked leaf, which the frontier alone cannot do.
API:
| Method | Description |
|---|---|
| new(max_checkpoints) | Create empty tree with checkpoint retention limit |
| append(cmx, retention) | Append a commitment with retention policy |
| checkpoint(id) | Create a checkpoint at current state |
| max_leaf_position() | Position of most recently appended leaf |
| witness(position, depth) | Generate MerklePath for spending a note |
| anchor() | Current root as orchard::tree::Anchor |
Retention policies control which leaves can be witnessed later:
| Retention | Meaning |
|---|---|
| Retention::Ephemeral | Leaf cannot be witnessed (other people's notes) |
| Retention::Marked | Leaf can be witnessed (your own notes) |
| Retention::Checkpoint { id, marking } | Create a checkpoint, optionally mark |
Server vs Client comparison:
| | CommitmentFrontier (server) | ClientMemoryCommitmentTree (client) | ClientPersistentCommitmentTree (sqlite) |
|---|---|---|---|
| Storage | ~1KB frontier in data storage | Full tree in memory | Full tree in SQLite |
| Can witness | No | Yes (marked leaves only) | Yes (marked leaves only) |
| Can compute anchor | Yes | Yes | Yes |
| Anchor matches | Same sequence → same anchor | Same sequence → same anchor | Same sequence → same anchor |
| Persists across restarts | Yes (GroveDB data storage) | No (lost on drop) | Yes (SQLite database) |
| Use case | GroveDB server-side anchor tracking | Testing, ephemeral wallets | Production wallets |
| Feature flag | server | client | sqlite |
All three produce identical anchors for the same sequence of appends. This is
verified by the test_frontier_and_client_same_root test.
Persistent Client — SQLite-Backed Witness Generation
The in-memory ClientMemoryCommitmentTree loses all state on drop. For
production wallets that must survive restarts without re-scanning the entire
blockchain, the crate provides ClientPersistentCommitmentTree backed by
SQLite. Enable the sqlite feature:
grovedb-commitment-tree = { version = "4", features = ["sqlite"] }
pub struct ClientPersistentCommitmentTree {
    inner: ShardTree<SqliteShardStore, 32, 4>,
}
Three constructor modes:
| Constructor | Description |
|---|---|
| open(conn, max_checkpoints) | Takes ownership of an existing rusqlite::Connection |
| open_on_shared_connection(arc, max_checkpoints) | Shares an Arc<Mutex<Connection>> with other components |
| open_path(path, max_checkpoints) | Convenience — opens/creates a SQLite DB at the given file path |
The bring-your-own-connection constructors (open, open_on_shared_connection)
allow the wallet to use its existing database for commitment tree storage.
The SqliteShardStore creates its tables with a commitment_tree_ prefix, so
it coexists safely alongside other application tables.
API is identical to ClientMemoryCommitmentTree:
| Method | Description |
|---|---|
| append(cmx, retention) | Append a commitment with retention policy |
| checkpoint(id) | Create a checkpoint at current state |
| max_leaf_position() | Position of most recently appended leaf |
| witness(position, depth) | Generate MerklePath for spending a note |
| anchor() | Current root as orchard::tree::Anchor |
SQLite schema (4 tables, created automatically):
commitment_tree_shards -- Shard data (serialized prunable trees)
commitment_tree_cap -- Tree cap (single-row, top of shard tree)
commitment_tree_checkpoints -- Checkpoint metadata (position or empty)
commitment_tree_checkpoint_marks_removed -- Marks removed per checkpoint
Persistence example:
use grovedb_commitment_tree::{ClientPersistentCommitmentTree, Retention, Position};

// First session: append notes and close
let mut tree = ClientPersistentCommitmentTree::open_path("wallet.db", 100)?;
tree.append(cmx_0, Retention::Marked)?;
tree.append(cmx_1, Retention::Ephemeral)?;
let anchor_before = tree.anchor()?;
drop(tree);

// Second session: reopen, state is preserved
let tree = ClientPersistentCommitmentTree::open_path("wallet.db", 100)?;
let anchor_after = tree.anchor()?;
assert_eq!(anchor_before, anchor_after); // same anchor, no re-scan needed
Shared connection example (for wallets with an existing SQLite database):
use std::sync::{Arc, Mutex};
use grovedb_commitment_tree::rusqlite::Connection;

let conn = Arc::new(Mutex::new(Connection::open("wallet.db")?));
// conn is also used by other wallet components...
let mut tree = ClientPersistentCommitmentTree::open_on_shared_connection(
    conn.clone(),
    100,
)?;
The grovedb-commitment-tree crate re-exports rusqlite under the sqlite
feature flag, so downstream consumers do not need to add rusqlite as a
separate dependency.
SqliteShardStore internals:
The SqliteShardStore implements all 18 methods of the ShardStore trait.
Shard trees are serialized using a compact binary format:
Nil: [0x00] — 1 byte
Leaf: [0x01][hash: 32][flags: 1] — 34 bytes
Parent: [0x02][has_ann: 1][ann?: 32][left][right] — recursive
LocatedPrunableTree adds an address prefix: [level: 1][index: 8][tree_bytes].
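The tag-prefixed format can be sketched as a recursive serializer. This is a simplified model under the assumption that annotations and flags are opaque bytes; the real `PrunableTree` carries richer retention semantics:

```rust
// Simplified model of the shard-tree node format described above.
enum Tree {
    Nil,
    Leaf { hash: [u8; 32], flags: u8 },
    Parent { ann: Option<[u8; 32]>, left: Box<Tree>, right: Box<Tree> },
}

fn serialize(t: &Tree, out: &mut Vec<u8>) {
    match t {
        Tree::Nil => out.push(0x00), // 1 byte
        Tree::Leaf { hash, flags } => {
            out.push(0x01);          // 34 bytes total
            out.extend_from_slice(hash);
            out.push(*flags);
        }
        Tree::Parent { ann, left, right } => {
            out.push(0x02);          // recursive
            match ann {
                Some(a) => { out.push(1); out.extend_from_slice(a); }
                None => out.push(0),
            }
            serialize(left, out);
            serialize(right, out);
        }
    }
}

fn main() {
    let mut buf = Vec::new();
    serialize(&Tree::Leaf { hash: [0u8; 32], flags: 0 }, &mut buf);
    assert_eq!(buf.len(), 34); // 1 tag + 32 hash + 1 flags

    let parent = Tree::Parent {
        ann: None,
        left: Box::new(Tree::Leaf { hash: [0u8; 32], flags: 0 }),
        right: Box::new(Tree::Nil),
    };
    buf.clear();
    serialize(&parent, &mut buf);
    assert_eq!(buf.len(), 37); // tag + has_ann + leaf(34) + nil(1)
}
```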
The ConnectionHolder enum abstracts over owned vs shared connections:
enum ConnectionHolder {
    Owned(Connection),              // exclusive access
    Shared(Arc<Mutex<Connection>>), // shared with other components
}
All database operations acquire the connection through a with_conn helper that
transparently handles both modes, locking the mutex only when shared.
Proof Integration
CommitmentTree supports two proof paths:
1. Sinsemilla anchor proof (ZK path):
GroveDB root hash
↓ Merk proof (V0, standard)
Parent Merk node
↓ combined_value_hash = blake3(value_hash || child_hash)
CommitmentTree element bytes + child hash
↓ child hash is the Sinsemilla/BulkAppendTree combined root
Sinsemilla root (Orchard Anchor)
↓ ZK proof (Halo 2 circuit, off-chain)
Note commitment at position P
- The parent Merk proof demonstrates that the CommitmentTree element exists at the claimed path/key, with specific bytes.
- The proved node's combined hash binds the Sinsemilla-derived root as the Merk child hash (the root is not stored in the element bytes themselves).
- The client (wallet) independently constructs a Merkle witness in the Sinsemilla tree using ClientMemoryCommitmentTree::witness() (testing) or ClientPersistentCommitmentTree::witness() (production, SQLite-backed).
- The ZK circuit verifies the witness against the anchor (sinsemilla_root).
2. Item retrieval proof (V1 path):
Individual items (cmx || rho || payload) can be queried by position and proved using V1 proofs (§9.6), the same mechanism used by standalone BulkAppendTree. The V1 proof includes the BulkAppendTree authentication path for the requested position, chained to the parent Merk proof for the CommitmentTree element.
Cost Tracking
CommitmentTree introduces a dedicated cost field for Sinsemilla operations:
pub struct OperationCost {
    pub seek_count: u32,
    pub storage_cost: StorageCost,
    pub storage_loaded_bytes: u64,
    pub hash_node_calls: u32,
    pub sinsemilla_hash_calls: u32, // ← new field for CommitmentTree
}
The sinsemilla_hash_calls field is separate from hash_node_calls because
Sinsemilla hashes are dramatically more expensive than Blake3 in both CPU time
and ZK circuit cost.
Per-append cost breakdown:
| Component | Average case | Worst case |
|---|---|---|
| Sinsemilla hashes | 33 (32 root + 1 ommer avg) | 64 (32 root + 32 ommers) |
| Frontier I/O seeks | 2 (get + put) | 2 |
| Frontier bytes loaded | 554 (~16 ommers) | 1,066 (32 ommers) |
| Frontier bytes written | 554 | 1,066 |
| BulkAppendTree hashes | ~5 Blake3 (amortized, see §14.15) | O(chunk_size) on compaction |
| BulkAppendTree I/O | 2-3 seeks (metadata + buffer) | +2 on chunk compaction |
Cost estimation constants (from average_case_costs.rs and
worst_case_costs.rs):
// Average case
const AVG_FRONTIER_SIZE: u32 = 554;     // ~16 ommers
const AVG_SINSEMILLA_HASHES: u32 = 33;  // 32 root levels + 1 avg ommer

// Worst case
const MAX_FRONTIER_SIZE: u32 = 1066;    // 32 ommers (max depth)
const MAX_SINSEMILLA_HASHES: u32 = 64;  // 32 root levels + 32 ommers
The BulkAppendTree component cost is tracked alongside the Sinsemilla cost,
combining both Blake3 hashes (from BulkAppendTree buffer/chunk operations) and
Sinsemilla hashes (from frontier append) into a single OperationCost.
The Orchard Key Hierarchy and Re-exports
The grovedb-commitment-tree crate re-exports the full Orchard API needed to
construct and verify shielded transactions. This allows Platform code to import
everything from a single crate.
Key management types:
SpendingKey
├── SpendAuthorizingKey → SpendValidatingKey
└── FullViewingKey
├── IncomingViewingKey (decrypt received notes)
├── OutgoingViewingKey (decrypt sent notes)
└── Address (= PaymentAddress, derive recipient addresses)
Note types:
| Type | Purpose |
|---|---|
| Note | Full note with value, recipient, randomness |
| ExtractedNoteCommitment | The cmx extracted from a note (32 bytes) |
| Nullifier | Unique tag that marks a note as spent |
| Rho | Nullifier derivation input (links spend to prior note) |
| NoteValue | 64-bit note value |
| ValueCommitment | Pedersen commitment to a note value |
Proof and bundle types:
| Type | Purpose |
|---|---|
| ProvingKey | Halo 2 proving key for Orchard circuits |
| VerifyingKey | Halo 2 verifying key for Orchard circuits |
| BatchValidator | Batch verification of multiple Orchard bundles |
| Bundle<T, V> | Collection of Actions forming a shielded transfer |
| Action | Single spend/output pair within a bundle |
| Authorized | Bundle authorization state (signatures + ZK proof) |
| Flags | Bundle flags (spends enabled, outputs enabled) |
| Proof | The Halo 2 proof within an authorized bundle |
Builder types:
| Type | Purpose |
|---|---|
| Builder | Constructs an Orchard bundle from spends and outputs |
| BundleType | Configures padding strategy for the bundle |
Tree types:
| Type | Purpose |
|---|---|
| Anchor | Sinsemilla root as an Orchard-native type |
| MerkleHashOrchard | Sinsemilla hash node in the commitment tree |
| MerklePath | 32-level authentication path for witness generation |
Implementation Files
| File | Purpose |
|---|---|
| grovedb-commitment-tree/src/lib.rs | CommitmentFrontier struct, serialization, EMPTY_SINSEMILLA_ROOT, re-exports |
| grovedb-commitment-tree/src/commitment_tree/mod.rs | CommitmentTree<S, M> struct, typed/raw append, ciphertext ser/de helpers |
| grovedb-commitment-tree/src/commitment_frontier/mod.rs | CommitmentFrontier (Sinsemilla frontier wrapping Frontier) |
| grovedb-commitment-tree/src/error.rs | CommitmentTreeError (including InvalidPayloadSize) |
| grovedb-commitment-tree/src/client/mod.rs | ClientMemoryCommitmentTree, in-memory witness generation |
| grovedb-commitment-tree/src/client/sqlite_store.rs | SqliteShardStore, ShardStore impl over SQLite, tree serialization |
| grovedb-commitment-tree/src/client/client_persistent_commitment_tree.rs | ClientPersistentCommitmentTree, SQLite-backed witness generation |
| grovedb-commitment-tree/Cargo.toml | Feature flags: server, client, sqlite |
| grovedb-element/src/element/mod.rs | Element::CommitmentTree variant (3 fields: u64, u8, Option<ElementFlags>) |
| grovedb-element/src/element/constructor.rs | empty_commitment_tree(chunk_power), new_commitment_tree_with_all() |
| grovedb-element/src/element/helpers.rs | uses_non_merk_data_storage() helper |
| merk/src/tree_type/costs.rs | COMMITMENT_TREE_COST_SIZE = 12 |
| merk/src/tree_type/mod.rs | TreeType::CommitmentTree = 7, uses_non_merk_data_storage() |
| grovedb/src/operations/commitment_tree.rs | GroveDB operations: typed insert, raw insert, anchor, get_value, count, batch preprocessing |
| grovedb/src/operations/delete/mod.rs | Non-Merk tree type delete handling |
| grovedb/src/batch/mod.rs | GroveOp::CommitmentTreeInsert, commitment_tree_insert_op_typed constructor |
| grovedb/src/batch/estimated_costs/average_case_costs.rs | Average case cost model |
| grovedb/src/batch/estimated_costs/worst_case_costs.rs | Worst case cost model |
| grovedb/src/tests/commitment_tree_tests.rs | 32 integration tests |
Comparison with Other Tree Types
| | CommitmentTree | MmrTree | BulkAppendTree | DenseTree |
|---|---|---|---|---|
| Element discriminant | 11 | 12 | 13 | 14 |
| TreeType | 7 | 8 | 9 | 10 |
| Has child Merk | No | No | No | No |
| Data namespace | BulkAppendTree entries + Sinsemilla frontier | MMR nodes | Buffer + chunks + MMR | Values by position |
| Hash function | Sinsemilla + Blake3 | Blake3 | Blake3 | Blake3 |
| Proof type | V1 (Bulk) + ZK (Sinsemilla) | V1 (MMR proof) | V1 (Bulk proof) | V1 (DenseTree proof) |
| Hashes per append | ~33 Sinsemilla + ~5 Blake3 | ~2 Blake3 | ~5 Blake3 (amortized) | O(n) Blake3 |
| Cost size | 12 bytes | 11 bytes | 12 bytes | 6 bytes |
| Capacity | Unlimited | Unlimited | Unlimited | Fixed (2^h - 1) |
| ZK-friendly | Yes (Halo 2) | No | No | No |
| Chunk compaction | Yes (configurable chunk_power) | No | Yes | No |
| Use case | Shielded note commitments | Event/transaction logs | High-throughput bulk logs | Small bounded structures |
Choose CommitmentTree when you need ZK-provable anchors for shielded protocols with efficient chunk-compacted storage. Choose MmrTree when you need a simple append-only log with individual leaf proofs. Choose BulkAppendTree when you need high-throughput range queries with chunk-based snapshots. Choose DenseAppendOnlyFixedSizeTree when you need a compact, fixed-capacity structure where every position stores a value and the root hash is always recomputed on the fly.
The DenseAppendOnlyFixedSizeTree — Dense Fixed-Capacity Merkle Storage
The DenseAppendOnlyFixedSizeTree is a complete binary tree of a fixed height where every node — both internal and leaf — stores a data value. Positions are filled sequentially in level-order (BFS): root first (position 0), then left-to-right at each level. No intermediate hashes are persisted; the root hash is recomputed on the fly by recursively hashing from leaves to root.
This design is ideal for small, bounded data structures where the maximum capacity is known in advance and you need O(1) append, O(1) retrieval by position, and a compact 32-byte root hash commitment that changes after every insert.
Tree Structure
A tree of height h has capacity 2^h - 1 positions. Positions use 0-based level-order
indexing:
Height 3 tree (capacity = 7):
pos 0 ← root (level 0)
/ \
pos 1 pos 2 ← level 1
/ \ / \
pos 3 pos 4 pos 5 pos 6 ← level 2 (leaves)
Navigation:
left_child(i) = 2i + 1
right_child(i) = 2i + 2
parent(i) = (i - 1) / 2
is_leaf(i) = 2i + 1 >= capacity
Values are appended sequentially: the first value goes to position 0 (root), then position 1, 2, 3, and so on. This means the root always has data, and the tree fills in level-order — the most natural traversal order for a complete binary tree.
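The navigation rules translate directly into index arithmetic. A minimal sketch (function names are illustrative, not GroveDB's API):

```rust
/// Capacity of a height-`h` complete binary tree: 2^h - 1 positions.
fn capacity(height: u8) -> u64 {
    (1u64 << height) - 1
}

// 0-based level-order navigation, as in the rules above.
fn left_child(i: u64) -> u64 { 2 * i + 1 }
fn right_child(i: u64) -> u64 { 2 * i + 2 }
fn parent(i: u64) -> u64 { (i - 1) / 2 }
fn is_leaf(i: u64, cap: u64) -> bool { 2 * i + 1 >= cap }

fn main() {
    let cap = capacity(3); // the height-3 tree from the diagram
    assert_eq!(cap, 7);
    // Position 1's children are 3 and 4; its parent is the root.
    assert_eq!(left_child(1), 3);
    assert_eq!(right_child(1), 4);
    assert_eq!(parent(4), 1);
    // Level-2 positions are leaves; level-1 positions are not.
    assert!(is_leaf(3, cap));
    assert!(!is_leaf(2, cap));
}
```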
Hash Computation
The root hash is not stored separately — it is recomputed from scratch whenever needed. The recursive algorithm visits only filled positions:
hash(position, store):
    if position is unfilled (>= count):
        return [0; 32]                    ← empty sentinel
    value = store.get(position)
    value_hash = blake3(value)
    left_hash  = hash(2 * position + 1, store)
    right_hash = hash(2 * position + 2, store)
    return blake3(value_hash || left_hash || right_hash)
Key properties:
- All nodes (leaf and internal): blake3(blake3(value) || H(left) || H(right))
- Leaf nodes: left_hash and right_hash are both [0; 32] (unfilled children)
- Unfilled positions: [0u8; 32] (zero hash)
- Empty tree (count = 0): [0u8; 32]
No leaf/internal domain separation tags are used. The tree structure (height
and count) is externally authenticated in the parent Element::DenseAppendOnlyFixedSizeTree,
which flows through the Merk hierarchy. The verifier always knows exactly which
positions are leaves vs internal nodes from the height and count, so an attacker
cannot substitute one for the other without breaking the parent authentication chain.
This means the root hash encodes a commitment to every stored value and its exact position in the tree. Changing any value (if it were mutable) would cascade through all ancestor hashes up to the root.
Hash cost: Computing the root hash visits all filled positions plus any unfilled children. For a tree with n values, worst case is O(n) blake3 calls. This is acceptable because the tree is designed for small, bounded capacities (max height 16, max 65,535 positions).
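For illustration, the recursion can be written out with the hash function swapped for a non-cryptographic stand-in so the sketch is dependency-free (the real implementation uses blake3; `toy_hash` and the slice-backed store are assumptions of this sketch):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for blake3 — NOT cryptographic, for illustration only.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    let x = h.finish().to_be_bytes();
    let mut out = [0u8; 32];
    for (i, b) in out.iter_mut().enumerate() {
        *b = x[i % 8];
    }
    out
}

/// Recursive root hash mirroring the algorithm above. Positions fill
/// sequentially, so `values.len()` is the count; unfilled positions
/// hash to the [0; 32] sentinel.
fn node_hash(pos: u64, values: &[Vec<u8>]) -> [u8; 32] {
    if pos >= values.len() as u64 {
        return [0u8; 32]; // unfilled position
    }
    let value_hash = toy_hash(&values[pos as usize]);
    let left = node_hash(2 * pos + 1, values);
    let right = node_hash(2 * pos + 2, values);
    let mut input = Vec::with_capacity(96);
    input.extend_from_slice(&value_hash);
    input.extend_from_slice(&left);
    input.extend_from_slice(&right);
    toy_hash(&input)
}

fn main() {
    // Empty tree (count = 0) yields the sentinel itself.
    assert_eq!(node_hash(0, &[]), [0u8; 32]);
    // Every append changes the root commitment.
    let one = vec![b"a".to_vec()];
    let two = vec![b"a".to_vec(), b"b".to_vec()];
    assert_ne!(node_hash(0, &one), node_hash(0, &two));
}
```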
The Element Variant
```rust
Element::DenseAppendOnlyFixedSizeTree(
    u16,                  // count — number of values stored (max 65,535)
    u8,                   // height — immutable after creation (1..=16)
    Option<ElementFlags>, // flags — storage flags
)
```
| Field | Type | Description |
|---|---|---|
| `count` | `u16` | Number of values inserted so far (max 65,535) |
| `height` | `u8` | Tree height (1..=16), immutable after creation |
| `flags` | `Option<ElementFlags>` | Optional storage flags |
The root hash is NOT stored in the Element — it flows as the Merk child hash
via insert_subtree's subtree_root_hash parameter.
Discriminant: 14 (ElementType), TreeType = 10
Cost size: DENSE_TREE_COST_SIZE = 6 bytes (2 count + 1 height + 1 discriminant
+ 2 overhead)
Storage Layout
Like MmrTree and BulkAppendTree, the DenseAppendOnlyFixedSizeTree stores data in the
data namespace (not a child Merk). Values are keyed by their position as a big-endian u64:
Subtree path: blake3(parent_path || key)
Storage keys:
[0, 0, 0, 0, 0, 0, 0, 0] → value at position 0 (root)
[0, 0, 0, 0, 0, 0, 0, 1] → value at position 1
[0, 0, 0, 0, 0, 0, 0, 2] → value at position 2
...
The Element itself (stored in the parent Merk) carries the count and height.
The root hash flows as the Merk child hash. This means:
- Reading the root hash requires recomputation from storage (O(n) hashing)
- Reading a value by position is O(1) — single storage lookup
- Inserting is O(n) hashing — one storage write + full root hash recomputation
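The position-to-key mapping is a plain big-endian encoding; a sketch (`storage_key` is an illustrative name, not GroveDB's API):

```rust
/// Values are keyed by their position as a big-endian u64
/// in the data namespace.
fn storage_key(position: u64) -> [u8; 8] {
    position.to_be_bytes()
}

fn main() {
    assert_eq!(storage_key(0), [0, 0, 0, 0, 0, 0, 0, 0]);
    assert_eq!(storage_key(2), [0, 0, 0, 0, 0, 0, 0, 2]);
    // Big-endian keys sort in position order under a byte-wise
    // key comparator, so iteration walks positions 0, 1, 2, ...
    assert!(storage_key(9) < storage_key(10));
}
```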
Operations
dense_tree_insert(path, key, value, tx, grove_version)
Appends a value to the next available position. Returns (root_hash, position).
Step 1: Read element, extract (count, height)
Step 2: Check capacity: if count >= 2^height - 1 → error
Step 3: Build subtree path, open storage context
Step 4: Write value to position = count
Step 5: Reconstruct DenseFixedSizedMerkleTree from state
Step 6: Call tree.insert(value, store) → (root_hash, position, hash_calls)
Step 7: Update element with count + 1 (the new root_hash flows onward as the Merk child hash)
Step 8: Propagate changes up through Merk hierarchy
Step 9: Commit transaction
dense_tree_get(path, key, position, tx, grove_version)
Retrieves the value at a given position. Returns None if position >= count.
dense_tree_root_hash(path, key, tx, grove_version)
Returns the tree's root hash — the hash computed during the most recent insert, which flows through the parent Merk as the child hash, so no recomputation from the stored values is needed.
dense_tree_count(path, key, tx, grove_version)
Returns the number of values stored (the count field from the element).
Batch Operations
The GroveOp::DenseTreeInsert variant supports batch insertion through the standard
GroveDB batch pipeline:
```rust
let ops = vec![
    QualifiedGroveDbOp::dense_tree_insert_op(
        vec![b"parent".to_vec()],
        b"my_dense_tree".to_vec(),
        b"value_data".to_vec(),
    ),
];
db.apply_batch(ops, None, None, grove_version)?;
```
Preprocessing: Like all non-Merk tree types, DenseTreeInsert ops are preprocessed
before the main batch body executes. The preprocess_dense_tree_ops method:
- Groups all DenseTreeInsert ops by (path, key)
- For each group, executes the inserts sequentially (reading the element, inserting each value, updating the root hash)
- Converts each group into a ReplaceNonMerkTreeRoot op that carries the final root_hash and count through the standard propagation machinery
Multiple inserts to the same dense tree within a single batch are supported — they are processed in order and the consistency check allows duplicate keys for this op type.
Propagation: The root hash and count flow through the NonMerkTreeMeta::DenseTree
variant in ReplaceNonMerkTreeRoot, following the same pattern as MmrTree and
BulkAppendTree.
Proofs
DenseAppendOnlyFixedSizeTree supports V1 subquery proofs via the ProofBytes::DenseTree
variant. Individual positions can be proved against the tree's root hash using inclusion
proofs that carry ancestor values and sibling subtree hashes.
Auth Path Structure
Because internal nodes hash their own value (not just child hashes), the
authentication path differs from a standard Merkle tree. To verify a leaf at position
p, the verifier needs:
- The leaf value (the proved entry)
- Ancestor value hashes for every internal node on the path from p to the root (only the 32-byte hash, not the full value)
- Sibling subtree hashes for every child that is NOT on the path
Because all nodes use blake3(H(value) || H(left) || H(right)) (no domain tags),
the proof only carries 32-byte value hashes for ancestors — not full values. This
keeps proofs compact regardless of how large individual values are.
```rust
pub struct DenseTreeProof {
    pub entries: Vec<(u16, Vec<u8>)>,            // proved (position, value) pairs
    pub node_value_hashes: Vec<(u16, [u8; 32])>, // ancestor value hashes on the auth path
    pub node_hashes: Vec<(u16, [u8; 32])>,       // precomputed sibling subtree hashes
}
```
Note: height and count are not in the proof struct — the verifier gets them from the parent Element, which is authenticated by the Merk hierarchy.
Walkthrough Example
Tree with height=3, capacity=7, count=5, proving position 4:
0
/ \
1 2
/ \ / \
3 4 5 6
Path from 4 to root: 4 → 1 → 0. Expanded set: {0, 1, 4}.
The proof contains:
- entries: [(4, value[4])] — the proved position
- node_value_hashes: [(0, H(value[0])), (1, H(value[1]))] — ancestor value hashes (32 bytes each, not full values)
- node_hashes: [(2, H(subtree_2)), (3, H(node_3))] — siblings not on the path

Verification recomputes the root hash bottom-up:
1. H(4) = blake3(blake3(value[4]) || [0;32] || [0;32]) — leaf (children are unfilled)
2. H(3) — from node_hashes
3. H(1) = blake3(H(value[1]) || H(3) || H(4)) — internal node uses the value hash from node_value_hashes
4. H(2) — from node_hashes
5. H(0) = blake3(H(value[0]) || H(1) || H(2)) — root uses the value hash from node_value_hashes
6. Compare H(0) against the expected root hash
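The expanded-set computation above is plain index arithmetic; a sketch (function names are illustrative):

```rust
/// Positions on the path from `pos` up to the root, inclusive.
fn path_to_root(mut pos: u16) -> Vec<u16> {
    let mut path = vec![pos];
    while pos > 0 {
        pos = (pos - 1) / 2; // parent
        path.push(pos);
    }
    path
}

/// Siblings of path nodes that are NOT themselves on the path —
/// exactly the subtree hashes the proof must carry in node_hashes.
fn sibling_positions(path: &[u16]) -> Vec<u16> {
    let mut sibs = Vec::new();
    for &p in path {
        if p == 0 {
            continue; // the root has no sibling
        }
        // Left children (2i+1) are odd, right children (2i+2) are even.
        let sib = if p % 2 == 1 { p + 1 } else { p - 1 };
        if !path.contains(&sib) {
            sibs.push(sib);
        }
    }
    sibs.sort();
    sibs
}

fn main() {
    // The walkthrough: proving position 4 in a height-3 tree.
    let path = path_to_root(4);
    assert_eq!(path, vec![4, 1, 0]); // expanded set {0, 1, 4}
    assert_eq!(sibling_positions(&path), vec![2, 3]); // node_hashes positions
}
```

For multi-position proofs, taking the union of each position's path yields the merged expanded set, so shared ancestors appear once.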
Multi-Position Proofs
When proving multiple positions, the expanded set merges overlapping auth paths. Shared ancestors are included only once, making multi-position proofs more compact than independent single-position proofs.
V0 Limitation
V0 proofs cannot descend into dense trees. If a V0 query matches a
DenseAppendOnlyFixedSizeTree with a subquery, the system returns
Error::NotSupported directing the caller to use prove_query_v1.
Query Key Encoding
Dense tree positions are encoded as big-endian u16 (2-byte) query keys, unlike
MmrTree and BulkAppendTree which use u64. All standard QueryItem range types
are supported.
Comparison with Other Non-Merk Trees
| | DenseTree | MmrTree | BulkAppendTree | CommitmentTree |
|---|---|---|---|---|
| Element discriminant | 14 | 12 | 13 | 11 |
| TreeType | 10 | 8 | 9 | 7 |
| Capacity | Fixed (2^h - 1, max 65,535) | Unlimited | Unlimited | Unlimited |
| Data model | Every position stores a value | Leaf-only | Dense tree buffer + chunks | Leaf-only |
| Hash in Element? | No (flows as child hash) | No (flows as child hash) | No (flows as child hash) | No (flows as child hash) |
| Insert cost (hashing) | O(n) blake3 | O(1) amortized | O(1) amortized | ~33 Sinsemilla |
| Cost size | 6 bytes | 11 bytes | 12 bytes | 12 bytes |
| Proof support | V1 (Dense) | V1 (MMR) | V1 (Bulk) | V1 (CommitmentTree) |
| Best for | Small bounded structures | Event logs | High-throughput logs | ZK commitments |
When to choose DenseAppendOnlyFixedSizeTree:
- The maximum number of entries is known at creation time
- You need every position (including internal nodes) to store data
- You want the simplest possible data model with no unbounded growth
- O(n) root hash recomputation is acceptable (small tree heights)
When NOT to choose it:
- You need unlimited capacity → use MmrTree or BulkAppendTree
- You need ZK compatibility → use CommitmentTree
Usage Example
```rust
use grovedb::Element;
use grovedb_version::version::GroveVersion;

let grove_version = GroveVersion::latest();

// Create a dense tree of height 4 (capacity = 15 values)
db.insert(
    &[b"state"],
    b"validator_slots",
    Element::empty_dense_tree(4),
    None,
    None,
    grove_version,
)?;

// Append values — positions filled 0, 1, 2, ...
let (root_hash, pos) = db.dense_tree_insert(
    &[b"state"],
    b"validator_slots",
    validator_pubkey.to_vec(),
    None,
    grove_version,
)?;
// pos == 0; root_hash = blake3(blake3(validator_pubkey) || [0; 32] || [0; 32])

// Read back by position
let value = db.dense_tree_get(
    &[b"state"],
    b"validator_slots",
    0, // position
    None,
    grove_version,
)?;
assert_eq!(value, Some(validator_pubkey.to_vec()));

// Query metadata
let count = db.dense_tree_count(&[b"state"], b"validator_slots", None, grove_version)?;
let hash = db.dense_tree_root_hash(&[b"state"], b"validator_slots", None, grove_version)?;
```
Implementation Files
| File | Contents |
|---|---|
| `grovedb-dense-fixed-sized-merkle-tree/src/lib.rs` | DenseTreeStore trait, DenseFixedSizedMerkleTree struct, recursive hash |
| `grovedb-dense-fixed-sized-merkle-tree/src/proof.rs` | DenseTreeProof struct, generate(), encode_to_vec(), decode_from_slice() |
| `grovedb-dense-fixed-sized-merkle-tree/src/verify.rs` | DenseTreeProof::verify() — pure function, no storage needed |
| `grovedb-element/src/element/mod.rs` | Element::DenseAppendOnlyFixedSizeTree (discriminant 14) |
| `grovedb-element/src/element/constructor.rs` | empty_dense_tree(), new_dense_tree() |
| `merk/src/tree_type/mod.rs` | TreeType::DenseAppendOnlyFixedSizeTree = 10 |
| `merk/src/tree_type/costs.rs` | DENSE_TREE_COST_SIZE = 6 |
| `grovedb/src/operations/dense_tree.rs` | GroveDB operations, AuxDenseTreeStore, batch preprocessing |
| `grovedb/src/operations/proof/generate.rs` | generate_dense_tree_layer_proof(), query_items_to_positions() |
| `grovedb/src/operations/proof/verify.rs` | verify_dense_tree_lower_layer() |
| `grovedb/src/operations/proof/mod.rs` | ProofBytes::DenseTree variant |
| `grovedb/src/batch/estimated_costs/average_case_costs.rs` | Average case cost model |
| `grovedb/src/batch/estimated_costs/worst_case_costs.rs` | Worst case cost model |
| `grovedb/src/tests/dense_tree_tests.rs` | 22 integration tests |
Quantum Cryptography — Post-Quantum Threat Analysis
This chapter analyzes how quantum computers would affect the cryptographic primitives used in GroveDB and the shielded transaction protocols built on top of it (Orchard, Dash Platform). It covers which components are vulnerable, which are safe, what "harvest now, decrypt later" means for stored data, and what mitigation strategies exist — including hybrid KEM designs.
Two Quantum Algorithms That Matter
Only two quantum algorithms are relevant to cryptography in practice:
Shor's algorithm solves the discrete logarithm problem (and integer factoring) in polynomial time. For a 255-bit elliptic curve like Pallas, this requires roughly 510 logical qubits — but with error correction overhead, the real requirement is approximately 4 million physical qubits. Shor's algorithm completely breaks all elliptic curve cryptography regardless of key size.
Grover's algorithm provides a quadratic speedup for brute-force search. A 256-bit symmetric key effectively becomes 128-bit. However, the circuit depth for Grover's on a 128-bit key space is still 2^64 quantum operations — many cryptographers believe this will never be practical on real hardware due to decoherence limits. Grover's reduces security margins but does not break well-parameterized symmetric cryptography.
| Algorithm | Targets | Speedup | Practical impact |
|---|---|---|---|
| Shor's | ECC discrete log, RSA factoring | Polynomial time (exponential speedup over classical) | Total break of ECC |
| Grover's | Symmetric key search, hash preimage | Quadratic (halves key bits) | 256-bit → 128-bit (still safe) |
GroveDB's Cryptographic Primitives
GroveDB and the Orchard-based shielded protocol use a mix of elliptic curve and symmetric/hash-based primitives. The table below classifies every primitive by its quantum vulnerability:
Quantum-Vulnerable (Shor's algorithm — 0 bits post-quantum)
| Primitive | Where used | What breaks |
|---|---|---|
| Pallas ECDLP | Note commitments (cmx), ephemeral keys (epk/esk), viewing keys (ivk), payment keys (pk_d), nullifier derivation | Recover any private key from its public counterpart |
| ECDH key agreement (Pallas) | Deriving symmetric encryption keys for note ciphertexts | Recover shared secret → decrypt all notes |
| Sinsemilla hash | Commitment tree Merkle paths, in-circuit hashing | Collision resistance depends on ECDLP; degrades when Pallas breaks |
| Halo 2 IPA | ZK proof system (polynomial commitment over Pasta curves) | Forge proofs for false statements (counterfeit, unauthorized spends) |
| Pedersen commitments | Value commitments (cv_net) hiding transaction amounts | Recover hidden amounts; forge balance proofs |
Quantum-Safe (Grover's algorithm — 128+ bits post-quantum)
| Primitive | Where used | Post-quantum security |
|---|---|---|
| Blake3 | Merk tree node hashes, MMR nodes, BulkAppendTree state roots, subtree path prefixes | 128-bit preimage, 128-bit second-preimage |
| BLAKE2b-256 | KDF for symmetric key derivation, outgoing cipher key, PRF^expand | 128-bit preimage |
| ChaCha20-Poly1305 | Encrypts enc_ciphertext and out_ciphertext (256-bit keys) | 128-bit key search (safe, but key derivation path through ECDH is not) |
| PRF^expand (BLAKE2b-512) | Derives esk, rcm, psi from rseed | 128-bit PRF security |
GroveDB Infrastructure: Believed Quantum-Safe Under Current Assumptions
All of GroveDB's own data structures rely exclusively on Blake3 hashing, which is believed to be quantum-resistant under current cryptographic assumptions:
- Merk AVL trees — node hashes, combined_value_hash, child hash propagation
- MMR trees — internal node hashes, peak computation, root derivation
- BulkAppendTree — buffer hash chains, dense Merkle roots, epoch MMR
- CommitmentTree state root — blake3("ct_state" || sinsemilla_root || bulk_state_root)
- Subtree path prefixes — Blake3 hashing of path segments
- V1 proofs — authentication chains through Merk hierarchy
No changes needed based on known attacks. GroveDB's Merk tree proofs, MMR consistency checks, BulkAppendTree epoch roots, and all V1 proof authentication chains are believed to remain secure against quantum computers. Hash-based infrastructure is the strongest part of the system post-quantum, though assessments may evolve with new cryptanalytic techniques.
Retroactive vs Real-Time Threats
This distinction is critical for prioritizing what to fix and when.
Retroactive threats compromise data that is already stored. An adversary records data today and decrypts it when quantum computers become available. These threats cannot be mitigated after the fact — once the data is on-chain, it cannot be re-encrypted or recalled.
Real-time threats only affect transactions created in the future. An adversary with a quantum computer could forge signatures or proofs, but only for new transactions. Old transactions were already validated and confirmed by the network.
| Threat | Type | What's exposed | Urgency |
|---|---|---|---|
| Note decryption (enc_ciphertext) | Retroactive | Note contents: recipient, amount, memo, rseed | High — stored forever |
| Value commitment opening (cv_net) | Retroactive | Transaction amounts (but not sender/receiver) | Medium — amounts only |
| Sender recovery data (out_ciphertext) | Retroactive | Sender's recovery keys for sent notes | High — stored forever |
| Spend authorization forgery | Real-time | Could forge new spend signatures | Low — upgrade before QC arrives |
| Halo 2 proof forgery | Real-time | Could forge new proofs (counterfeit) | Low — upgrade before QC arrives |
| Sinsemilla collision | Real-time | Could forge new Merkle paths | Low — subsumed by proof forgery |
| Binding signature forgery | Real-time | Could forge new balance proofs | Low — upgrade before QC arrives |
What Exactly Gets Revealed?
If note encryption is broken (the primary HNDL threat):
A quantum adversary recovers esk from the stored epk via Shor's algorithm,
computes the ECDH shared secret, derives the symmetric key, and decrypts
enc_ciphertext. This reveals the full note plaintext:
| Field | Size | What it reveals |
|---|---|---|
| version | 1 byte | Protocol version (not sensitive) |
| diversifier | 11 bytes | Recipient's address component |
| value | 8 bytes | Exact transaction amount |
| rseed | 32 bytes | Enables nullifier linkage (deanonymizes transaction graph) |
| memo | 36 bytes (DashMemo) | Application data, potentially identifying |
With rseed and rho (stored alongside the ciphertext), the adversary can
compute esk = PRF(rseed, rho) and verify the ephemeral key binding. Combined
with the diversifier, this links inputs to outputs across the entire transaction
history — full deanonymization of the shielded pool.
If only value commitments are broken (secondary HNDL threat):
The adversary recovers v from cv_net = [v]*V + [rcv]*R by solving ECDLP.
This reveals transaction amounts but not sender or receiver identities.
The adversary sees "someone sent 5.0 Dash to someone" but cannot link the
amount to any address or identity without also breaking note encryption.
On its own, amounts without linkage are limited in usefulness. But combined with external data (timing, known invoices, amounts matching public requests), correlation attacks become possible.
The "Harvest Now, Decrypt Later" Attack
This is the most urgent and practical quantum threat.
Attack model: A state-level adversary (or any party with sufficient storage) records all on-chain shielded transaction data today. This data is publicly available on the blockchain and immutable. The adversary waits for a cryptographically relevant quantum computer (CRQC), then:
Step 1: Read stored record from CommitmentTree BulkAppendTree:
cmx (32) || rho (32) || epk (32) || enc_ciphertext (104) || out_ciphertext (80)
Step 2: Solve ECDLP on Pallas via Shor's algorithm:
epk = [esk] * g_d → recover esk
Step 3: Compute shared secret:
shared_secret = [esk] * pk_d
Step 4: Derive symmetric key (BLAKE2b is quantum-safe, but input is compromised):
K_enc = BLAKE2b-256("Zcash_OrchardKDF", shared_secret || epk)
Step 5: Decrypt enc_ciphertext using ChaCha20-Poly1305:
→ version || diversifier || value || rseed || memo
Step 6: With rseed + rho, link nullifiers to note commitments:
esk = PRF(rseed, rho)
→ full transaction graph reconstruction
Key insight: The symmetric encryption (ChaCha20-Poly1305) is perfectly quantum-safe. The vulnerability is entirely in the key derivation path — the symmetric key is derived from an ECDH shared secret, and ECDH is broken by Shor's algorithm. The attacker doesn't break the encryption; they recover the key.
Retroactivity: This attack is fully retroactive. Every encrypted note ever stored on-chain can be decrypted once a CRQC exists. The data cannot be re-encrypted or protected after the fact. This is why it must be addressed before data is stored, not after.
Mitigation: Hybrid KEM (ML-KEM + ECDH)
The defense against HNDL is to derive the symmetric encryption key from two independent key agreement mechanisms, such that breaking only one is insufficient. This is called a hybrid KEM.
ML-KEM-768 (CRYSTALS-Kyber)
ML-KEM is the NIST-standardized (FIPS 203, August 2024) post-quantum key encapsulation mechanism based on the Module Learning With Errors (MLWE) problem.
| Parameter | ML-KEM-512 | ML-KEM-768 | ML-KEM-1024 |
|---|---|---|---|
| Public key (ek) | 800 bytes | 1,184 bytes | 1,568 bytes |
| Ciphertext (ct) | 768 bytes | 1,088 bytes | 1,568 bytes |
| Shared secret | 32 bytes | 32 bytes | 32 bytes |
| NIST Category | 1 (128-bit) | 3 (192-bit) | 5 (256-bit) |
ML-KEM-768 is the recommended choice — it is the parameter set used by X-Wing, Signal's PQXDH, and Chrome/Firefox TLS hybrid key exchange. Category 3 provides a comfortable margin against future lattice cryptanalysis advances.
How the Hybrid Scheme Works
Current flow (vulnerable):
Sender:
esk = PRF(rseed, rho) // deterministic from note
epk = [esk] * g_d // Pallas curve point
shared_secret = [esk] * pk_d // ECDH (broken by Shor's)
K_enc = BLAKE2b("Zcash_OrchardKDF", shared_secret || epk)
enc_ciphertext = ChaCha20(K_enc, note_plaintext)
Hybrid flow (quantum-resistant):
Sender:
esk = PRF(rseed, rho) // unchanged
epk = [esk] * g_d // unchanged
ss_ecdh = [esk] * pk_d // same ECDH
(ct_pq, ss_pq) = ML-KEM-768.Encaps(ek_pq) // NEW: lattice-based KEM
// ek_pq from recipient's address
K_enc = BLAKE2b( // MODIFIED: combines both secrets
"DashPlatform_HybridKDF",
ss_ecdh || ss_pq || ct_pq || epk
)
enc_ciphertext = ChaCha20(K_enc, note_plaintext) // unchanged
Recipient decryption:
Recipient:
ss_ecdh = [ivk] * epk // same ECDH (using incoming viewing key)
ss_pq = ML-KEM-768.Decaps(dk_pq, ct_pq) // NEW: KEM decapsulation
K_enc = BLAKE2b("DashPlatform_HybridKDF", ss_ecdh || ss_pq || ct_pq || epk)
note_plaintext = ChaCha20.Decrypt(K_enc, enc_ciphertext)
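The essential step is the combiner: the symmetric key commits to both shared secrets plus the key-agreement transcript. A sketch of just that step, with a non-cryptographic stand-in hash so it runs without crypto crates (the real scheme uses BLAKE2b over fixed-width inputs; `hybrid_kdf` and `toy_hash` are illustrative names):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for BLAKE2b — NOT cryptographic, for illustration only.
fn toy_hash(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish()
}

/// Hybrid KDF sketch: the derived key depends on BOTH shared secrets
/// and the transcript (ct_pq, epk). Real inputs are fixed-width, so
/// plain concatenation is unambiguous.
fn hybrid_kdf(ss_ecdh: &[u8], ss_pq: &[u8], ct_pq: &[u8], epk: &[u8]) -> u64 {
    let mut input = Vec::new();
    input.extend_from_slice(b"DashPlatform_HybridKDF"); // domain separator
    input.extend_from_slice(ss_ecdh);
    input.extend_from_slice(ss_pq);
    input.extend_from_slice(ct_pq);
    input.extend_from_slice(epk);
    toy_hash(&input)
}

fn main() {
    let k = hybrid_kdf(b"ecdh_secret", b"pq_secret", b"ct", b"epk");
    // An attacker recovering only ONE of the two secrets derives
    // a different key — breaking one mechanism is insufficient.
    assert_ne!(k, hybrid_kdf(b"ecdh_secret", b"wrong", b"ct", b"epk"));
    assert_ne!(k, hybrid_kdf(b"wrong", b"pq_secret", b"ct", b"epk"));
}
```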
Security Guarantee
The combined KEM is IND-CCA2 secure if either component KEM is secure. This is formally proven by Giacon, Heuer, and Poettering (2018) for KEM combiners using a PRF (BLAKE2b qualifies), and independently by the X-Wing security proof.
| Scenario | ECDH | ML-KEM | Combined key | Status |
|---|---|---|---|---|
| Classical world | Secure | Secure | Secure | Both intact |
| Quantum breaks ECC | Broken | Secure | Secure | ML-KEM protects |
| Lattice advances break ML-KEM | Secure | Broken | Secure | ECDH protects (same as today) |
| Both broken | Broken | Broken | Broken | Requires two simultaneous breakthroughs |
Size Impact
The hybrid KEM adds the ML-KEM-768 ciphertext (1,088 bytes) to each stored note and expands the outgoing ciphertext to include the ML-KEM shared secret for sender recovery:
Stored record per note:
┌──────────────────────────────────────────────────────────────────┐
│ Current (280 bytes) Hybrid (1,400 bytes) │
│ │
│ cmx: 32 cmx: 32 │
│ rho: 32 rho: 32 │
│ epk: 32 epk: 32 │
│ enc_ciphertext: 104 ct_pq: 1,088 ← NEW │
│ out_ciphertext: 80 enc_ciphertext: 104 │
│ out_ciphertext: 112 ← +32 │
│ ───────────────── ────────────────────── │
│ Total: 280 Total: 1,400 (5.0x) │
└──────────────────────────────────────────────────────────────────┘
Storage at scale:
| Notes | Current (280 B) | Hybrid (1,400 B) | Delta |
|---|---|---|---|
| 100,000 | 26.7 MB | 133 MB | +106 MB |
| 1,000,000 | 267 MB | 1.33 GB | +1.07 GB |
| 10,000,000 | 2.67 GB | 13.3 GB | +10.7 GB |
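The per-note and aggregate figures follow directly from the field sizes above:

```rust
fn main() {
    // Per-note record sizes from the layout diagram.
    let current = 32 + 32 + 32 + 104 + 80; // cmx, rho, epk, enc_ct, out_ct
    let hybrid = 32 + 32 + 32 + 1_088 + 104 + 112; // + ct_pq, wider out_ct
    assert_eq!(current, 280);
    assert_eq!(hybrid, 1_400);
    // The hybrid scheme adds 1,120 bytes per stored note.
    assert_eq!(hybrid - current, 1_120);
    // At 1,000,000 notes that is 1.4 billion bytes total (≈ 1.3 GiB).
    let total: u64 = 1_000_000 * hybrid as u64;
    assert_eq!(total, 1_400_000_000);
}
```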
Address size:
Current: diversifier (11) + pk_d (32) = 43 bytes
Hybrid: diversifier (11) + pk_d (32) + ek_pq (1,184) = 1,227 bytes
The 1,184-byte ML-KEM public key must be included in the address so senders can perform encapsulation. At ~1,960 Bech32m characters, this is large but still fits in a QR code (a version 40 code holds up to 2,953 bytes, or 4,296 characters in alphanumeric mode).
Key Management
The ML-KEM keypair derives deterministically from the spending key:
SpendingKey (sk) [32 bytes]
|
+-> ... (all existing Orchard key derivation unchanged)
|
+-> ml_kem_d = PRF^expand(sk, [0x09])[0..32]
+-> ml_kem_z = PRF^expand(sk, [0x0A])[0..32]
|
+-> (ek_pq, dk_pq) = ML-KEM-768.KeyGen(d=ml_kem_d, z=ml_kem_z)
ek_pq: 1,184 bytes (public, included in address)
dk_pq: 2,400 bytes (private, part of viewing key)
No backup changes needed. The existing 24-word seed phrase covers the ML-KEM key because it derives from the spending key deterministically. Wallet recovery works as before.
Diversified addresses all share the same ek_pq because ML-KEM has no
natural diversification mechanism like Pallas scalar multiplication. This means
an observer with two of a user's addresses can link them by comparing ek_pq.
Trial Decryption Performance
| Step | Current | Hybrid | Delta |
|---|---|---|---|
| Pallas ECDH | ~100 us | ~100 us | — |
| ML-KEM-768 Decaps | — | ~40 us | +40 us |
| BLAKE2b KDF | ~0.5 us | ~1 us | — |
| ChaCha20 (52 bytes) | ~0.1 us | ~0.1 us | — |
| Total per note | ~101 us | ~141 us | +40% overhead |
Scanning 100,000 notes: ~10.1 sec → ~14.1 sec. The overhead is meaningful but not prohibitive. ML-KEM decapsulation is constant-time with no batching advantage (unlike elliptic curve operations), so it scales linearly.
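The scan-time figures are straight multiplication of the per-note costs:

```rust
fn main() {
    // Per-note trial-decryption cost in microseconds (from the table above).
    let current_us = 101u64;
    let hybrid_us = 141u64;
    let notes = 100_000u64;
    assert_eq!(current_us * notes, 10_100_000); // ≈ 10.1 s for a full scan
    assert_eq!(hybrid_us * notes, 14_100_000);  // ≈ 14.1 s with hybrid KEM
    // ML-KEM decapsulation adds a constant ~40 us per note and scales
    // linearly — there is no batching advantage to amortize it.
    assert_eq!(hybrid_us - current_us, 40);
}
```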
Impact on ZK Circuits
None. The hybrid KEM is entirely in the transport/encryption layer. The Halo 2 circuit proves note existence, nullifier correctness, and value balance — it does not prove anything about encryption. No changes to proving keys, verifying keys, or circuit constraints.
Comparison with Industry
| System | Approach | Status |
|---|---|---|
| Signal (PQXDH) | X25519 + ML-KEM-768, mandatory for all users | Deployed (2023) |
| Chrome/Firefox TLS | X25519 + ML-KEM-768 hybrid key exchange | Deployed (2024) |
| X-Wing (IETF draft) | X25519 + ML-KEM-768, purpose-built combiner | Draft standard |
| Zcash | Quantum recoverability draft ZIP (fund recovery, not encryption) | Discussion only |
| Dash Platform | Pallas ECDH + ML-KEM-768 (proposed) | Design phase |
When to Deploy
The Timeline Question
- Current state (2026): No quantum computer can break 255-bit ECC. Largest demonstrated quantum factoring: ~50 bits. Gap: orders of magnitude.
- Near-term (2030-2035): Hardware roadmaps from IBM, Google, Quantinuum target millions of qubits. ML-KEM implementations and parameter sets will have matured.
- Medium-term (2035-2050): Most estimates place CRQC arrival in this window. HNDL data collected today is at risk.
- Long-term (2050+): Consensus among cryptographers: large-scale quantum computers are a matter of "when," not "if."
Recommended Strategy
1. Design for upgradability now. Ensure the stored record format, the
TransmittedNoteCiphertext struct, and the BulkAppendTree entry layout are
versioned and extensible. This is low-cost and preserves the option to add
hybrid KEM later.
2. Deploy hybrid KEM when ready, make it mandatory. Do not offer two pools (classical and hybrid). Splitting the anonymity set defeats the purpose of shielded transactions — users hiding among a smaller group are less private, not more. When deployed, every note uses the hybrid scheme.
3. Target the 2028-2030 window. This is well before any realistic quantum threat but after ML-KEM implementations and parameter sizes have stabilized. It also allows learning from Zcash's and Signal's deployment experience.
4. Monitor trigger events:
- NIST or NSA mandating post-quantum migration deadlines
- Significant advances in quantum hardware (>100,000 physical qubits with error correction)
- Cryptanalytic advances against lattice problems (would affect ML-KEM choice)
What Does Not Need Urgent Action
| Component | Why it can wait |
|---|---|
| Spend authorization signatures | Forgery is real-time, not retroactive. Upgrade to ML-DSA/SLH-DSA before CRQC arrives. |
| Halo 2 proof system | Proof forgery is real-time. Migrate to STARK-based system when needed. |
| Sinsemilla collision resistance | Only useful for new attacks, not retroactive. Subsumed by proof system migration. |
| GroveDB Merk/MMR/Blake3 infrastructure | Already quantum-safe under current cryptographic assumptions. No action needed based on known attacks. |
Post-Quantum Alternatives Reference
For Encryption (replacing ECDH)
| Scheme | Type | Public key | Ciphertext | NIST Category | Notes |
|---|---|---|---|---|---|
| ML-KEM-768 | Lattice (MLWE) | 1,184 B | 1,088 B | 3 (192-bit) | FIPS 203, industry standard |
| ML-KEM-512 | Lattice (MLWE) | 800 B | 768 B | 1 (128-bit) | Smaller, lower margin |
| ML-KEM-1024 | Lattice (MLWE) | 1,568 B | 1,568 B | 5 (256-bit) | Overkill for hybrid |
For Signatures (replacing RedPallas/Schnorr)
| Scheme | Type | Public key | Signature | NIST Category | Notes |
|---|---|---|---|---|---|
| ML-DSA-65 (Dilithium) | Lattice | 1,952 B | 3,309 B | 3 | FIPS 204, fast |
| SLH-DSA (SPHINCS+) | Hash-based | 32-64 B | 7,856-49,856 B | 1-5 | FIPS 205, conservative |
| XMSS/LMS | Hash-based (stateful) | 60 B | 2,500 B | varies | Stateful — reuse = break |
For ZK Proofs (replacing Halo 2)
| System | Assumption | Proof size | Post-quantum | Notes |
|---|---|---|---|---|
| STARKs | Hash functions (collision resistance) | ~100-400 KB | Yes | Used by StarkNet |
| Plonky3 | FRI (hash-based polynomial commitment) | ~50-200 KB | Yes | Active development |
| Halo 2 (current) | ECDLP on Pasta curves | ~5 KB | No | Current Orchard system |
| Lattice SNARKs | MLWE | Research | Yes | Not production-ready |
Rust Crate Ecosystem
| Crate | Source | FIPS 203 | Verified | Notes |
|---|---|---|---|---|
| `libcrux-ml-kem` | Cryspen | Yes | Formally verified (hax/F*) | Highest assurance |
| `ml-kem` | RustCrypto | Yes | Constant-time, not audited | Ecosystem compatibility |
| `fips203` | integritychain | Yes | Constant-time | Pure Rust, no_std |
Summary
┌─────────────────────────────────────────────────────────────────────┐
│ QUANTUM THREAT SUMMARY FOR GROVEDB + ORCHARD │
│ │
│ SAFE UNDER CURRENT ASSUMPTIONS (hash-based): │
│ ✓ Blake3 Merk trees, MMR, BulkAppendTree │
│ ✓ BLAKE2b KDF, PRF^expand │
│ ✓ ChaCha20-Poly1305 symmetric encryption │
│ ✓ All GroveDB proof authentication chains │
│ │
│ FIX BEFORE DATA IS STORED (retroactive HNDL): │
│ ✗ Note encryption (ECDH key agreement) → Hybrid KEM │
│ ✗ Value commitments (Pedersen) → amounts revealed │
│ │
│ FIX BEFORE QUANTUM COMPUTERS ARRIVE (real-time only): │
│ ~ Spend authorization → ML-DSA / SLH-DSA │
│ ~ ZK proofs → STARKs / Plonky3 │
│ ~ Sinsemilla → hash-based Merkle tree │
│ │
│ RECOMMENDED TIMELINE: │
│ 2026-2028: Design for upgradability, version stored formats │
│ 2028-2030: Deploy mandatory hybrid KEM for note encryption │
│ 2035+: Migrate signatures and proof system if needed │
└─────────────────────────────────────────────────────────────────────┘
Appendix A: Complete Element Type Reference
| Discriminant | Variant | TreeType | Fields | Cost Size | Purpose |
|---|---|---|---|---|---|
| 0 | Item | N/A | (value, flags) | varies | Basic key-value storage |
| 1 | Reference | N/A | (path, max_hop, flags) | varies | Link between elements |
| 2 | Tree | 0 (NormalTree) | (root_key, flags) | TREE_COST_SIZE | Container for subtrees |
| 3 | SumItem | N/A | (value, flags) | varies | Contributes to parent sum |
| 4 | SumTree | 1 (SumTree) | (root_key, sum, flags) | SUM_TREE_COST_SIZE | Maintains sum of descendants |
| 5 | BigSumTree | 4 (BigSumTree) | (root_key, sum128, flags) | BIG_SUM_TREE_COST_SIZE | 128-bit sum tree |
| 6 | CountTree | 2 (CountTree) | (root_key, count, flags) | COUNT_TREE_COST_SIZE | Element counting tree |
| 7 | CountSumTree | 3 (CountSumTree) | (root_key, count, sum, flags) | COUNT_SUM_TREE_COST_SIZE | Combined count + sum |
| 8 | ItemWithSumItem | N/A | (value, sum, flags) | varies | Item with sum contribution |
| 9 | ProvableCountTree | 5 | (root_key, count, flags) | COUNT_TREE_COST_SIZE | Provable count tree |
| 10 | ProvableCountSumTree | 6 | (root_key, count, sum, flags) | COUNT_SUM_TREE_COST_SIZE | Provable count + sum |
| 11 | CommitmentTree | 7 | (total_count: u64, chunk_power: u8, flags) | 12 | ZK-friendly Sinsemilla + BulkAppendTree |
| 12 | MmrTree | 8 | (mmr_size: u64, flags) | 11 | Append-only MMR log |
| 13 | BulkAppendTree | 9 | (total_count: u64, chunk_power: u8, flags) | 12 | High-throughput append-only log |
| 14 | DenseAppendOnlyFixedSizeTree | 10 | (count: u16, height: u8, flags) | 6 | Dense fixed-capacity Merkle storage |
Notes:
- Discriminants 11–14 are non-Merk trees: data lives outside a child Merk subtree
- All four store non-Merk data in the data column
- `CommitmentTree` stores its Sinsemilla frontier alongside BulkAppendTree entries in the same data column (key `b"__ct_data__"`)
- Non-Merk trees do NOT have a `root_key` field — their type-specific root hash flows as the Merk child hash via `insert_subtree`
- `CommitmentTree` uses Sinsemilla hashing (Pallas curve); all others use Blake3
- Cost behavior for non-Merk trees follows `NormalTree` (BasicMerkNode, no aggregation)
- `DenseAppendOnlyFixedSizeTree` count is `u16` (max 65,535); heights restricted to 1..=16
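The discriminant column above can be captured as a small enum. The following Python sketch mirrors the table; the enum and the `is_non_merk` helper are illustrative and are not GroveDB's actual serialization code:

```python
from enum import IntEnum

class ElementType(IntEnum):
    """Discriminants from the Appendix A table (illustrative only)."""
    ITEM = 0
    REFERENCE = 1
    TREE = 2
    SUM_ITEM = 3
    SUM_TREE = 4
    BIG_SUM_TREE = 5
    COUNT_TREE = 6
    COUNT_SUM_TREE = 7
    ITEM_WITH_SUM_ITEM = 8
    PROVABLE_COUNT_TREE = 9
    PROVABLE_COUNT_SUM_TREE = 10
    COMMITMENT_TREE = 11
    MMR_TREE = 12
    BULK_APPEND_TREE = 13
    DENSE_APPEND_ONLY_FIXED_SIZE_TREE = 14

# Discriminants 11-14 are the non-Merk tree types: their data lives
# outside a child Merk subtree, in the data column.
NON_MERK_TYPES = {
    ElementType.COMMITMENT_TREE,
    ElementType.MMR_TREE,
    ElementType.BULK_APPEND_TREE,
    ElementType.DENSE_APPEND_ONLY_FIXED_SIZE_TREE,
}

def is_non_merk(discriminant: int) -> bool:
    return ElementType(discriminant) in NON_MERK_TYPES
```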
Appendix B: Proof System Quick Reference
V0 Proofs (Merk-only)
Generate:
1. Start at root Merk, execute query → collect matching elements
2. For each matching tree element with subquery:
Open child Merk → prove subquery recursively
3. Serialize as GroveDBProof::V0(root_layer, Vec<(key, GroveDBProof)>)
Verify:
1. Verify root Merk proof → root_hash, elements
2. For each sub-proof:
Verify child Merk proof → child_hash
Check: combine_hash(value_hash(parent_element), child_hash) == expected
3. Final root_hash must match known GroveDB root
V1 Proofs (Mixed Merk + Non-Merk)
When any layer involves a CommitmentTree, MmrTree, BulkAppendTree, or DenseAppendOnlyFixedSizeTree, a V1 proof is generated:
Generate:
1. Same as V0 for Merk layers
2. When descending into CommitmentTree:
→ generate_commitment_tree_layer_proof(query_items) → ProofBytes::CommitmentTree(bytes)
(sinsemilla_root (32 bytes) || BulkAppendTree proof bytes)
3. When descending into MmrTree:
→ generate_mmr_layer_proof(query_items) → ProofBytes::MMR(bytes)
4. When descending into BulkAppendTree:
→ generate_bulk_append_layer_proof(query_items) → ProofBytes::BulkAppendTree(bytes)
5. When descending into DenseAppendOnlyFixedSizeTree:
→ generate_dense_tree_layer_proof(query_items) → ProofBytes::DenseTree(bytes)
6. Store as LayerProof { merk_proof, lower_layers }
Verify:
1. Same as V0 for Merk layers
2. For ProofBytes::CommitmentTree: extract sinsemilla_root, verify inner
BulkAppendTree proof, recompute combined root hash
3. For ProofBytes::MMR: verify MMR proof against MMR root from parent child hash
4. For ProofBytes::BulkAppendTree: verify against state_root from parent child hash
5. For ProofBytes::DenseTree: verify against root_hash from parent child hash
(recompute root from entries + ancestor values + sibling hashes)
6. Type-specific root flows as Merk child hash (not NULL_HASH)
V0/V1 selection: If all layers are Merk, produce V0 (backward compatible). If any layer is a non-Merk tree, produce V1.
Appendix C: Hash Computation Reference
Merk Node Hash
node_hash = blake3(key_len(1) || key || value_hash(32) || left_hash(32) || right_hash(32))
GroveDB Subtree Prefix
prefix = blake3(parent_prefix || key) → 32 bytes
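Both formulas can be sketched in a few lines of Python. Since Blake3 is not in the Python standard library, `hashlib.blake2b` with a 32-byte digest stands in for it here; the example key, value, and empty parent prefix are illustrative, not values GroveDB produces:

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for Blake3 (not in the Python stdlib); GroveDB uses real Blake3.
    return hashlib.blake2b(data, digest_size=32).digest()

def node_hash(key: bytes, value_hash: bytes, left: bytes, right: bytes) -> bytes:
    # node_hash = blake3(key_len(1) || key || value_hash(32) || left_hash(32) || right_hash(32))
    assert len(key) < 256
    assert len(value_hash) == len(left) == len(right) == 32
    return h(bytes([len(key)]) + key + value_hash + left + right)

def subtree_prefix(parent_prefix: bytes, key: bytes) -> bytes:
    # prefix = blake3(parent_prefix || key) -> 32 bytes
    return h(parent_prefix + key)

EMPTY = b"\x00" * 32
leaf = node_hash(b"alice", h(b"value bytes"), EMPTY, EMPTY)
# Assuming, for illustration, an empty prefix at the root:
prefix = subtree_prefix(b"", b"identities")
assert len(leaf) == 32 and len(prefix) == 32
```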
State Root — BulkAppendTree
state_root = blake3("bulk_state" || mmr_root || dense_tree_root)
The buffer is a DenseFixedSizedMerkleTree — dense_tree_root is its root hash.
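As a minimal sketch of the domain-separated combination (again substituting `hashlib.blake2b` for Blake3, which is not in the Python stdlib; the input roots are placeholders):

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for Blake3 (not in the Python stdlib)
    return hashlib.blake2b(data, digest_size=32).digest()

def state_root(mmr_root: bytes, dense_tree_root: bytes) -> bytes:
    # state_root = blake3("bulk_state" || mmr_root || dense_tree_root)
    return h(b"bulk_state" + mmr_root + dense_tree_root)

root = state_root(h(b"chunk mmr root"), h(b"buffer tree root"))
assert len(root) == 32
```

The `"bulk_state"` tag domain-separates this combination from plain two-child node merges.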
Dense Merkle Root — BulkAppendTree Chunks
leaves[i] = blake3(entry[i])
internal[parent] = blake3(left_child || right_child)
root = internal[0] (top of complete binary tree)
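The chunk-root computation can be sketched as a bottom-up fold over levels, using `hashlib.blake2b` as a stdlib stand-in for Blake3. The assumption that a chunk holds a power-of-two number of entries (so the binary tree is complete) follows from the `chunk_power` field:

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for Blake3 (not in the Python stdlib)
    return hashlib.blake2b(data, digest_size=32).digest()

def chunk_root(entries: list) -> bytes:
    # leaves[i] = blake3(entry[i]); each internal node = blake3(left || right).
    # Assumes a complete chunk: len(entries) is a power of two.
    level = [h(e) for e in entries]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = chunk_root([b"e0", b"e1", b"e2", b"e3"])
assert len(root) == 32
```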
Dense Tree Root Hash — DenseAppendOnlyFixedSizeTree
Recursive from root (position 0):
all nodes with data: blake3(blake3(value) || H(left) || H(right))
(no domain separation tags — structure externally authenticated)
leaf nodes: left_hash = right_hash = [0; 32]
unfilled position: [0; 32]
empty tree: [0; 32]
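The recursion above can be sketched directly: positions are filled densely from 0, children of position p sit at 2p+1 and 2p+2, and any out-of-range or unfilled position hashes to all zeros. `hashlib.blake2b` again stands in for Blake3:

```python
import hashlib

ZERO = b"\x00" * 32  # hash for unfilled positions and the empty tree

def h(data: bytes) -> bytes:
    # Stand-in for Blake3 (not in the Python stdlib)
    return hashlib.blake2b(data, digest_size=32).digest()

def dense_root(values: list, height: int, pos: int = 0) -> bytes:
    # values[0..count-1] are the densely filled positions of a complete
    # binary tree with capacity 2^height - 1.
    capacity = (1 << height) - 1
    if pos >= capacity or pos >= len(values):
        return ZERO  # unfilled position; also yields [0; 32] for the empty tree
    left = dense_root(values, height, 2 * pos + 1)
    right = dense_root(values, height, 2 * pos + 2)
    # node hash = blake3(blake3(value) || H(left) || H(right));
    # leaves fall out naturally because both children are ZERO
    return h(h(values[pos]) + left + right)

assert dense_root([], 3) == ZERO
assert len(dense_root([b"a", b"b"], 3)) == 32
```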
Dense Tree Proof Verification — DenseAppendOnlyFixedSizeTree
Given: entries, node_value_hashes, node_hashes, expected_root
Pre-checks:
1. Validate height in [1, 16]
2. Validate count <= capacity (= 2^height - 1)
3. Reject if any field has > 100,000 elements (DoS prevention)
4. Reject duplicate positions within entries, node_value_hashes, node_hashes
5. Reject overlapping positions between the three fields
6. Reject node_hashes at ancestors of any proved entry (prevents forgery)
recompute_hash(pos):
if pos >= capacity or pos >= count → [0; 32]
if pos in node_hashes → return precomputed hash
value_hash = blake3(entries[pos]) or node_value_hashes[pos]
left_hash = recompute_hash(2*pos+1)
right_hash = recompute_hash(2*pos+2)
return blake3(value_hash || left_hash || right_hash)
Verify: recompute_hash(0) == expected_root
GroveDB cross-validation:
proof.height must match element.height
proof.count must match element.count
MMR Node Merge — MmrTree / BulkAppendTree Chunk MMR
parent.hash = blake3(left.hash || right.hash)
Sinsemilla Root — CommitmentTree
Sinsemilla hash over Pallas curve (see Zcash ZIP-244)
MerkleHashOrchard::empty_root(Level::from(32)) for empty tree
combine_hash (Parent-Child Binding)
combine_hash(value_hash, child_hash) = blake3(value_hash || child_hash)
For Merk trees: child_hash = child Merk root hash
For non-Merk trees: child_hash = type-specific root (mmr_root, state_root, dense_root, etc.)
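A short sketch shows why this binding matters (with `hashlib.blake2b` standing in for Blake3; the input byte strings are hypothetical placeholders, not real serialized elements):

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for Blake3 (not in the Python stdlib)
    return hashlib.blake2b(data, digest_size=32).digest()

def combine_hash(value_hash: bytes, child_hash: bytes) -> bytes:
    # combine_hash(value_hash, child_hash) = blake3(value_hash || child_hash)
    return h(value_hash + child_hash)

parent_value_hash = h(b"serialized Tree element")  # hypothetical
child_root = h(b"child merk root hash")            # hypothetical
bound = combine_hash(parent_value_hash, child_root)

# Any change to the child's root changes the combined hash, so the parent
# element cryptographically commits to its child's entire state.
tampered = combine_hash(parent_value_hash, h(b"forged child root"))
assert bound != tampered
```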
The GroveDB Book — documenting the internals of GroveDB for developers and researchers. Based on the GroveDB source code.