Count-Offset Paginated Queries
Overview
A count-offset paginated query lets a caller paginate through the keys
inside a ProvableCountTree or ProvableCountSumTree, asking:
"Skip the first N in-range items, then return the next M items."
…with a single proof whose size is proportional to M + log(skipped),
not M + skipped. The skipped region collapses to one hash-bound
op per skipped subtree.
This is the missing pagination primitive for provable queries. Where aggregate count queries ask "how many?", count-offset paginated queries ask "give me page N of this range, proving I skipped exactly the items between the start of the range and the page".
Concretely the prover honors SizedQuery::offset and SizedQuery::limit
for the duration of a single-range query against a count tree:
#![allow(unused)] fn main() { let mut q = Query::new(); q.insert_range_inclusive(b"a".to_vec()..=b"z".to_vec()); let path_query = PathQuery::new( vec![b"my_count_tree".to_vec()], SizedQuery::new(q, /* limit */ Some(20), /* offset */ Some(40)), ); let proof_bytes = db.prove_query(&path_query, None, grove_version)?; let (root_hash, results) = GroveDb::verify_query_raw( &proof_bytes, &path_query, grove_version, )?; // `results` holds in-range items 41..=60 (offset 40, then limit 20). }
Inside the offset window, the prover never emits the skipped items' values, only proofs that those items exist and contribute their full count to the skipped total. Inside the limit window it emits the actual value-bearing nodes. Past the limit it goes back to digest-only nodes. The verifier independently re-derives the offset / limit accounting from the proof shape — it never trusts the prover's numbers; it computes its own and demands they match.
Eligibility
Count-offset paginated proofs are accepted only for queries that pass the
SizedQuery::validate_count_offset_paginated gate:
SizedQuery::offsetisSome(n)withn > 0. Offset = 0 is not pagination — the regular query path already covers that case.- Exactly one item in
Query::items. Multi-item queries are out of scope for the initial implementation. - The item is a true range variant —
Range,RangeInclusive,RangeFrom,RangeFull,RangeTo,RangeToInclusive,RangeAfter,RangeAfterTo, orRangeAfterToInclusive.QueryItem::Key(_)is rejected because it matches at most one key, sooffset > 0is guaranteed to return zero items — a useless query that almost always indicates user error. - No subqueries.
default_subquery_branch.subquery.is_none()/subquery_path.is_none()/ noconditional_subquery_branches. - No aggregate wrappers.
AggregateCountOnRangeandAggregateSumOnRangehave their own paginated semantics; we reject the combination so the two flows don't shadow each other. - The
PathQuery::pathis non-empty. The GroveDB root is always aNormalTree, never a count tree, so a root-level count-offset query has no valid target. - The leaf merk's
tree_typeisProvableCountTreeorProvableCountSumTree. This is checked at the top of the proof generator by opening the merk and reading its tree type — anything else surfaces asError::InvalidQuerywith a clear "only valid against ProvableCountTree / ProvableCountSumTree" message.
Violating any of 1–6 returns Error::InvalidQuery(...) from
SizedQuery::validate_count_offset_paginated. Violating 7 returns the same
error from check_count_offset_target_tree_type (the prover's leaf-merk
precheck).
V1-only — V0 proofs do not support this
This is a V1-proof-only feature. The V0 proof envelope is a shipped wire
format used by grove versions v1 and v2 in production; adding new
accepted query shapes there would be a consensus-breaking change for
already-deployed validators. V0's prover unconditionally rejects any
non-zero offset, and the verifier's V0/V1 split (in verify_proof_internal
and verify_proof_raw_internal) unconditionally rejects offset queries
against a V0 envelope while routing V1 envelopes through
validate_count_offset_paginated.
The prove_count_offset_on_range method on Merk is gated on
MerkProofVersions::prove_count_offset_on_range (initial implementation
version 0, set across all grove versions). The version field exists so a
coordinated prover/verifier change in a future grove version can bump it
to 1+ without breaking older callers.
Why this works only on count trees that bind count into the hash
Same reasoning as aggregate count queries:
only ProvableCountTree and ProvableCountSumTree use
node_hash_with_count(kv_hash, left, right, count) for their node-hash
computation, so a proof that asserts a particular count for a skipped
subtree is cryptographically bound — a forged count produces a
different reconstructed root hash and the chain check fails.
For ProvableCountSumTree the node hash binds only the count (not the
sum) — the sum is stored on the node but isn't in the hash, by the same
design choice that makes AggregateCountOnRange work uniformly for both
variants. So count-offset paginated proofs commit only the count too; the
sum is not part of this feature.
Plain CountTree / CountSumTree track counts but don't bind them to the
hash. Pagination proofs against them would be unverifiable.
NormalTree, SumTree, etc. don't even track counts. All are rejected at
the prover's leaf-merk precheck.
How the proof is built
The proof generator carries two pieces of state through the recursion:
offset_remaining: u64— how many in-range items the prover still needs to skip.limit_remaining: Option<u64>— how many in-range items the prover can still return (None= unlimited).
At each subtree, the prover classifies it
against the inner range — Disjoint, Contained, or Boundary — and
decides whether to collapse the entire subtree into a single
HashWithCount op or descend into it per-element. The decision is
direction-aware: for ascending walks (left-to-right) the prover visits
the left child first, then self, then right; for descending walks
(right-to-left) it visits right, then self, then left, so "the first N
in-range keys" matches the user-facing iteration order.
Collapse rules
| Classification | Condition | Emitted op | State mutation |
|---|---|---|---|
| Disjoint | always | HashWithCount(kv_hash, l_h, r_h, c) | none — no in-range items |
| Contained | subtree_count ≤ offset_remaining | HashWithCount(...) | offset_remaining −= subtree_count |
| Contained | offset_remaining == 0 && limit_remaining == Some(0) | HashWithCount(...) (past-limit) | none |
| Contained | otherwise (partial-skip or partial-limit) | descend per-element | — |
| Boundary | always | descend per-element | — |
The first row is shared with AggregateCountOnRange: a Disjoint subtree
contributes zero to the in-range total but its structural count still has
to be hash-bound for the parent's own-count derivation (see "Why
HashWithCount is self-verifying" in the aggregate-count chapter).
The middle two rows are what's new for offset queries:
- Whole-subtree skip (
subtree_count ≤ offset_remaining): the prover emits oneHashWithCountop for an entire subtree and decrementsoffset_remainingby that subtree's count. This is the optimization the feature exists for — an offset of, say, 10,000 over a tree of 100,000 items pays log-of-skipped proof size, not 10,000 ops. - Whole-subtree past-limit collapse: once the limit is exhausted, any
remaining Contained subtree is emitted as one
HashWithCountwith no state change. The verifier reaches it via the same collapse rule and accepts.
Per-element emission inside a descent
When the prover descends into a Boundary node (or a Contained subtree that's too big to fully skip), each node it visits emits one of:
| Per-node disposition | Emitted op |
|---|---|
| Out-of-range key (Boundary path node) | KVDigestCount(key, value_hash, count) |
In-range NonCounted-wrapped entry (own_count = 0) | KVDigestCount(key, value_hash, count) |
| In-range counted entry, inside the offset window | KVDigestCount(key, value_hash, count) (skip) |
| In-range counted entry, past the limit | KVDigestCount(key, value_hash, count) (no return) |
| In-range counted entry, inside the limit window — returned | KVCount(key, value, count) or KVValueHashFeatureType(key, value, value_hash, ft) |
The same KVDigestCount op is used for four conceptually-different
positions; the verifier disambiguates by re-running the prover's state
machine in directional order and matching the op shape to the disposition
its current state implies. Out-of-range keys and NonCounted entries
naturally contribute own_count = 0 to the state machine and the
verifier expects no state mutation for them.
The returned-item flavor depends on what's stored in the count tree:
- Items / SumItems →
KVCount(key, value, count). - Trees / References →
KVValueHashFeatureType(key, value, value_hash, feature_type). - References under a count tree get the same merk-level shape as
trees; GroveDB's reference-resolution post-pass rewrites them to
KVRefValueHashCountwith the dereferenced value.
Verifier shape walk
The verifier mirrors the prover's state machine in directional order. It maintains:
offset_remaining— initialized to the caller-passed offset, decremented whenever the verifier independently observes a count-bound skip.limit_remaining— initialized to the caller-passed limit, decremented for each returned item.skipped— running count of items the prover skipped, computed entirely from the proof shape (the prover's claimed number is not trusted).returned: Vec<CountOffsetReturnedItem>— items the verifier reconstructs from value-bearing nodes inside the limit window.
For each node it visits:
- Classify the position (Disjoint / Contained / Boundary) using the same inherited-bounds logic the prover used.
- Validate the op shape against the classification + own state. For
example:
- A
HashWithCountat a Disjoint position must be a leaf in the proof (no attached children). A child here would mean the prover is hiding counted entries under a hash-only node. - A
HashWithCountat a Contained position is valid only whenstate.offset_remaining > 0(skip-mode) orstate.limit_remaining == Some(0)(past-limit). Anywhere else means the prover should have descended. - A value-bearing node (
KVCount,KVValueHashFeatureType) is valid only whenstate.offset_remaining == 0 && state.limit_remaining != Some(0).
- A
- Derive
own_countfor the current node in O(1) from its immediate children's count fields:own = aggregate − left_aggregate − right_aggregate. This is what lets the in-order state machine know — before recursing into the second-direction child — whether this position contributes a slot to offset or limit. - Apply the per-position state mutation in directional order: left → self → right for ascending, right → self → left for descending.
- Bubble the structural count back up so the parent can re-derive its
own
own_count.
At the end the verifier returns (root_hash, returned_items, skipped). The
caller compares root_hash against their trusted root and trusts the rest.
Why we don't trust the prover's offset accounting
A malicious prover could try several attacks:
| Attack | Detection |
|---|---|
Forge HashWithCount.count to under-count a skipped subtree | node_hash_with_count recomputation gives a different reconstructed root hash → chain check fails |
Forge HashWithCount.count to over-count a skipped subtree | Same as above |
Substitute a value-bearing node for KVDigestCount mid-skip | Verifier sees a value at a position where state.offset_remaining > 0 → "value emitted with offset remaining" |
Substitute KVDigestCount for a value mid-limit | Verifier sees a digest at offset_remaining == 0 && limit_remaining > 0 → "digest at offset=0 with limit free" |
Attach children to a leaf-position HashWithCount | Shape-walk check rejects: "HashWithCount at Contained/Disjoint must be a leaf" |
Emit a KVDigestCount with a key outside its inherited bounds | key_strictly_inside check rejects |
| Emit children whose aggregates exceed the parent's | own_count = aggregate − left − right underflow rejects |
Inject a non-count node kind (e.g. Hash, KVHash) | execute_with_options visit-node allowlist rejects |
These rejection branches all have dedicated forging tests in
merk/src/proofs/query/count_offset/tests.rs.
Unsupported in-range value shapes (P1 / P2)
The count-offset proof flow's scope is plain Item / SumItem /
ItemWithSumItem and empty trees inside a count tree. Three shapes
are explicitly rejected by both the prover and the verifier:
| Rejected shape | Primary defense |
|---|---|
NonCounted-wrapped entry | Rejected at insert time by PR #672 — NonCounted cannot be stored inside a ProvableCountTree / ProvableCountSumTree at all. The merk-level prover and the verifier still reject defensively (against pre-#672 data on disk or any lower-level builder that bypasses the insert restriction). |
Reference / ReferenceWithSumItem | The regular flow's reference post-pass dereferences these to the target's value bytes. The count-offset short-circuit returns before that post-pass, so a verified result would expose the raw Element::Reference rather than the dereferenced target. Prover rejects at descent; verifier rejects in returned items. |
| Non-empty tree (any tree variant) | V1 strict-mode requires a KVValueHashFeatureTypeWithChildHash proof node for these, which the count-offset prover doesn't emit. Accepting one without that node would silently bypass the child-hash invariant the regular flow enforces. |
Why the NonCounted rejection is enforced at insert time
A ProvableCountTree binds its count aggregate into every node hash
via node_hash_with_count. The HashWithCount collapse rule in the
prover (Contained subtree + subtree_count ≤ offset_remaining) folds
an entire subtree into one self-verifying op whose committed count
field is what consumes the offset budget.
NonCounted children contribute own_count = 0, so they don't show up
in subtree_count — but they are visible to regular GroveDB
pagination. Allowing them in a ProvableCountTree would mean a
contained subtree like [counted-a, NonCounted-b, counted-c] with
offset = 2, limit = 1 could collapse as HashWithCount(count = 2)
and verify with returned = [], while regular pagination would return
[c]. That's a silent semantic divergence.
#672 closes the gap at the only place it can be closed without changing
the proof wire format: the Element::insert / batch path refuses to
store NonCounted inside a Provable* count parent. With that
invariant in place, subtree_count always equals the actual entry
count for these trees, and the collapse rule is safe.
Lifting the remaining restrictions (follow-up work)
- Non-empty trees: emit
KVValueHashFeatureTypeWithChildHash(mirroring the regular V1 prover) and drop the verifier-side rejection. - References: apply the same reference-post-pass the regular V1
prover uses, rewriting
Reference/ReferenceWithSumItemvalue nodes intoKVRefValueHashCountwith the dereferenced target's bytes. - NonCounted entries are unlikely to become legal here, since the
whole
ProvableCountTreemodel depends onsubtree_count == entry count. If that semantic is ever wanted, the right path is a different tree type, not relaxing the insert rule.
API surface
Count-offset paginated queries go through the same prove_query /
verify_query_raw / verify_query_with_options entry points as every
other path query — there is no dedicated entry point. The query envelope
(PathQuery with a SizedQuery carrying a non-zero offset) is what
selects the count-offset dispatch.
Prover side:
#![allow(unused)] fn main() { // Same entry point as every other path query. GroveDb::prove_query(&path_query, prove_options, grove_version) -> CostResult<Vec<u8>, Error> }
Internally prove_subqueries_v1 short-circuits at the leaf when
path_query.path.len() == current_path.len() && path_query.has_non_zero_offset(),
calls Merk::prove_count_offset_on_range, and wraps the bytes in a
LayerProof with empty lower_layers.
Verifier side:
#![allow(unused)] fn main() { // Same entry points as every other path query. GroveDb::verify_query_with_options(proof, &path_query, options, grove_version) GroveDb::verify_query_raw(proof, &path_query, grove_version) }
verify_proof_internal / verify_proof_raw_internal enforce the V0/V1
split on offset, then verify_layer_proof_v1 short-circuits at the leaf
to run_count_offset_layer_dispatch, which:
- Rejects unexpected
lower_layers(an honest count-offset leaf proof has none — the validator forbade subqueries). - Calls
verify_count_offset_on_range_prooffor the merk-level shape walk. - Rejects any non-empty tree returned item.
- Translates each surviving returned item into a
ProvedPathKeyOptionalValueusing the merk-surfaced value-hash andchild_hash_verifiedflag (not synthesized — see comment inCountOffsetReturnedItem).
The merk-level type that the verifier emits per item is
#![allow(unused)] fn main() { pub struct CountOffsetReturnedItem { pub key: Vec<u8>, pub value: Vec<u8>, /// `H(value)` for `KVCount`; proof-carried value_hash for /// `KVValueHashFeatureType` / `KVValueHash` (tree-flavored entries /// store `combine_hash(H(value), child_root)`). pub value_hash: CryptoHash, /// Always `false` for now — the current prover never emits the /// with-child-hash variant. pub child_hash_verified: bool, } }
Comparison table
| Regular query | Aggregate count | Count-offset paginated | |
|---|---|---|---|
| Return type | items | u64 count | items (subset of range) |
Honors SizedQuery::offset? | yes (but proofs reject it) | no | yes (V1 only, count trees only) |
Honors SizedQuery::limit? | yes | leaf: no, carrier: yes | yes |
| Direction-aware? | yes | no (counting is direction-agnostic) | yes |
| Supported on V0 proofs? | yes | yes | no (V1 only) |
| Allowed tree types | any | ProvableCount / ProvableCountSum | ProvableCount / ProvableCountSum |
| Proof size vs offset | O(offset + limit) | n/a | O(log(offset) + limit) |
Future work
- Multi-item queries / subqueries. Currently rejected by
validate_count_offset_paginated. The merk machinery extends naturally if the per-level state-machine accounting can be re-derived correctly. - Non-empty tree returns. Requires the prover to emit
KVValueHashFeatureTypeWithChildHashfor tree children of a count tree, same as the regular V1 prover does. The verifier's child-hash check would then accept those entries instead of rejecting them in the GroveDB layer. - Aggregate-style multi-layer paginated proofs. The
AggregateCountOnRangecarrier shape (an outer multi-key walk with inner-aggregate leaves) could be extended to paginated outer walks with count-offset inner leaves. Out of scope for the initial PR.