Hashing — Cryptographic Integrity
Every node in a Merk tree is hashed to produce a root hash — a single 32-byte value that authenticates the entire tree. Any change to any key, value, or structural relationship will produce a different root hash.
Three-Level Hash Hierarchy
Merk uses a three-level hashing scheme, from innermost to outermost:
Example: key = "bob" (3 bytes), value = "hello" (5 bytes):
graph LR
subgraph level1["Level 1: value_hash"]
V_IN["varint(5) ‖ "hello""]
V_BLAKE["Blake3"]
V_OUT(["value_hash<br/><small>32 bytes</small>"])
V_IN --> V_BLAKE --> V_OUT
end
subgraph level2["Level 2: kv_hash"]
K_IN["varint(3) ‖ "bob" ‖ value_hash"]
K_BLAKE["Blake3"]
K_OUT(["kv_hash<br/><small>32 bytes</small>"])
K_IN --> K_BLAKE --> K_OUT
end
subgraph level3["Level 3: node_hash"]
N_LEFT(["left_child_hash<br/><small>32B (or NULL_HASH)</small>"])
N_KV(["kv_hash<br/><small>32B</small>"])
N_RIGHT(["right_child_hash<br/><small>32B (or NULL_HASH)</small>"])
N_BLAKE["Blake3<br/><small>96B input = 2 blocks</small>"]
N_OUT(["node_hash<br/><small>32 bytes</small>"])
N_LEFT --> N_BLAKE
N_KV --> N_BLAKE
N_RIGHT --> N_BLAKE
N_BLAKE --> N_OUT
end
V_OUT -.-> K_IN
K_OUT -.-> N_KV
style level1 fill:#eaf2f8,stroke:#2980b9
style level2 fill:#fef9e7,stroke:#f39c12
style level3 fill:#fdedec,stroke:#e74c3c
The ROOT of the tree =
node_hashof the root node — authenticates every key, value, and structural relationship. Missing children useNULL_HASH = [0x00; 32].
Level 1: value_hash
#![allow(unused)] fn main() { // merk/src/tree/hash.rs pub fn value_hash(value: &[u8]) -> CostContext<CryptoHash> { let mut hasher = blake3::Hasher::new(); let val_length = value.len().encode_var_vec(); // Varint encoding hasher.update(val_length.as_slice()); hasher.update(value); // ... } }
The value's length is varint-encoded and prepended. This is critical for
collision resistance — without it, H("AB" ‖ "C") would equal H("A" ‖ "BC").
Level 2: kv_hash
#![allow(unused)] fn main() { pub fn kv_hash(key: &[u8], value: &[u8]) -> CostContext<CryptoHash> { let mut hasher = blake3::Hasher::new(); let key_length = key.len().encode_var_vec(); hasher.update(key_length.as_slice()); hasher.update(key); let vh = value_hash(value); hasher.update(vh.as_slice()); // Nested hash // ... } }
This binds the key to the value. For proof verification, there's also a variant that takes a pre-computed value_hash:
#![allow(unused)] fn main() { pub fn kv_digest_to_kv_hash(key: &[u8], value_hash: &CryptoHash) -> CostContext<CryptoHash> }
This is used when the verifier already has the value_hash (e.g., for subtrees where value_hash is a combined hash).
Level 3: node_hash
#![allow(unused)] fn main() { pub fn node_hash( kv: &CryptoHash, left: &CryptoHash, right: &CryptoHash, ) -> CostContext<CryptoHash> { let mut hasher = blake3::Hasher::new(); hasher.update(kv); // 32 bytes hasher.update(left); // 32 bytes hasher.update(right); // 32 bytes — total 96 bytes // Always exactly 2 hash operations (96 bytes / 64-byte block = 2) } }
If a child is absent, its hash is the NULL_HASH — 32 zero bytes:
#![allow(unused)] fn main() { pub const NULL_HASH: CryptoHash = [0; HASH_LENGTH]; // [0u8; 32] }
Blake3 as the Hash Function
GroveDB uses Blake3 for all hashing. Key properties:
- 256-bit output (32 bytes)
- Block size: 64 bytes
- Speed: ~3x faster than SHA-256 on modern hardware
- Streaming: Can incrementally feed data
The hash operation cost is calculated based on how many 64-byte blocks are processed:
#![allow(unused)] fn main() { let hashes = 1 + (hasher.count() - 1) / 64; // Number of hash operations }
Length-Prefix Encoding for Collision Resistance
Every variable-length input is prefixed with its length using varint encoding:
graph LR
subgraph bad["Without length prefix — VULNERABLE"]
BAD1["H("AB" + "C") = H(0x41 0x42 0x43)"]
BAD2["H("A" + "BC") = H(0x41 0x42 0x43)"]
BAD1 --- SAME["SAME HASH!"]
BAD2 --- SAME
end
subgraph good["With length prefix — collision resistant"]
GOOD1["H([02] 0x41 0x42)<br/><small>varint(2) ‖ "AB"</small>"]
GOOD2["H([03] 0x41 0x42 0x43)<br/><small>varint(3) ‖ "ABC"</small>"]
GOOD1 --- DIFF["DIFFERENT"]
GOOD2 --- DIFF
end
style bad fill:#fadbd8,stroke:#e74c3c
style good fill:#d5f5e3,stroke:#27ae60
style SAME fill:#e74c3c,color:#fff,stroke:#c0392b
style DIFF fill:#27ae60,color:#fff,stroke:#229954
value_hash input:
[varint(value.len)] [value bytes]kv_hash input:[varint(key.len)] [key bytes] [value_hash: 32 bytes]
Without length prefixes, an attacker could craft different key-value pairs that hash to the same digest. The length prefix makes this cryptographically infeasible.
Combined Hashing for Special Elements
For subtrees and references, the value_hash is not simply H(value).
Instead, it's a combined hash that binds the element to its target:
graph LR
subgraph item["Regular Item"]
I_val["value bytes"] --> I_hash["H(len ‖ bytes)"] --> I_vh(["value_hash"])
end
subgraph subtree["Subtree Element"]
S_elem["tree element bytes"] --> S_hash1["H(len ‖ bytes)"]
S_root(["child Merk<br/>root hash"])
S_hash1 --> S_combine["combine_hash()<br/><small>Blake3(a ‖ b)</small>"]
S_root --> S_combine
S_combine --> S_vh(["value_hash"])
end
subgraph reference["Reference Element"]
R_elem["ref element bytes"] --> R_hash1["H(len ‖ bytes)"]
R_target["target value"] --> R_hash2["H(len ‖ bytes)"]
R_hash1 --> R_combine["combine_hash()"]
R_hash2 --> R_combine
R_combine --> R_vh(["value_hash"])
end
style item fill:#eaf2f8,stroke:#2980b9
style subtree fill:#fef9e7,stroke:#f39c12
style reference fill:#fdedec,stroke:#e74c3c
Subtree: binds the child Merk's root hash into the parent. Reference: binds both the reference path AND the target value. Changing either changes the root hash.
The combine_hash function:
#![allow(unused)] fn main() { pub fn combine_hash(hash_one: &CryptoHash, hash_two: &CryptoHash) -> CostContext<CryptoHash> { let mut hasher = blake3::Hasher::new(); hasher.update(hash_one); // 32 bytes hasher.update(hash_two); // 32 bytes — total 64 bytes, exactly 1 hash op // ... } }
This is what allows GroveDB to authenticate the entire hierarchy through a single root hash — each parent tree's value_hash for a subtree element includes the child tree's root hash.
Aggregate Hashing for ProvableCountTree
ProvableCountTree nodes include the aggregate count in the node hash:
#![allow(unused)] fn main() { pub fn node_hash_with_count( kv: &CryptoHash, left: &CryptoHash, right: &CryptoHash, count: u64, ) -> CostContext<CryptoHash> { let mut hasher = blake3::Hasher::new(); hasher.update(kv); // 32 bytes hasher.update(left); // 32 bytes hasher.update(right); // 32 bytes hasher.update(&count.to_be_bytes()); // 8 bytes — total 104 bytes // Still exactly 2 hash ops (104 < 128 = 2 × 64) } }
This means a proof of count doesn't require revealing the actual data — the count is baked into the cryptographic commitment.