Document Count Trees
Counting the documents that match a query used to mean fetching them and calling .len(). From protocol v12 (Platform 3.1) onward, document types can opt into a different primary-key tree variant that maintains a running count inside the tree itself, turning count(*)-style queries into an O(1) lookup. This chapter explains the three tree variants, how a document type selects one, and the two query endpoints that expose the feature.
Why Count Trees Exist
The default primary-key tree for a document type is a NormalTree. To count the documents in it, Drive walks the subtree, deserializes every record, and returns the length of the resulting collection. That is fine for small types but becomes painful as soon as a UI needs "how many widgets are there?" on a contract with millions of widgets.
GroveDB has two count-aware tree variants. Both are provable — the count is committed to the Merkle root in each case — but they differ in where counts are stored inside the tree, and that controls which kinds of count queries can be answered without enumerating leaves:
CountTree— stores a singleu64count, at the root of the tree. The total document count is one read; any per-subtree count requires walking down to that subtree's root and reading its (separate) tree element.ProvableCountTree— stores au64count at every internal node, not just the root. Each node's count covers everything in the subtree below it, so range queries like "how many items between key A and key B?" or "how many items per value of an indexed property?" can be answered by walking the boundary nodes and summing their pre-computed counts, without touching any leaf.
GroveDB merk trees are binary — each internal node has exactly a left and a right child:
The dashed box is the wrapping Element (the "tree" in grovedb terms) and contains the root node of the merk tree. Both variants store the total count on the wrapping element — that's the O(1) field Drive reads for total counts. The difference is what's inside: in a CountTree the merk root and the rest of the tree don't carry the count, so only the wrapper has it. In a ProvableCountTree the count is also stored on the root node itself and on every internal merk-tree node, so it's committed into the merk root hash and provable per-subtree.
flowchart LR
subgraph CT ["CountTree"]
direction TB
subgraph CT_ELEM ["Tree element c=3"]
direction TB
A["root"]:::node
end
A --> B["·"]:::node
A --> C["x"]:::leaf
B --> D["x"]:::leaf
B --> E["x"]:::leaf
end
subgraph PCT ["ProvableCountTree"]
direction TB
subgraph PCT_ELEM ["Tree element c=3"]
direction TB
H["root c=3"]:::countnode
end
H --> I["c=2"]:::countnode
H --> J["c=1"]:::leaf
I --> K["c=1"]:::leaf
I --> L["c=1"]:::leaf
end
CT ~~~ PCT
classDef node fill:#6e7681,color:#fff,stroke:#6e7681;
classDef countnode fill:#3fb950,color:#0d1117,stroke:#3fb950,stroke-width:2px;
classDef leaf fill:#21262d,color:#c9d1d9,stroke:#484f58;
style CT_ELEM fill:none,stroke:#1f6feb,stroke-width:2px,stroke-dasharray: 6 4,color:#1f6feb
style PCT_ELEM fill:none,stroke:#1f6feb,stroke-width:2px,stroke-dasharray: 6 4,color:#1f6feb
In a CountTree, the only count-bearing node is the root. To compute "how many items per value of property P?" you'd have to navigate to each value-keyed subtree (a separate grovedb tree, not a child node of the binary structure above), read its root count, and pay for a separate proof per read — N reads for N distinct values. In a ProvableCountTree, every internal node along the binary path already carries the count of its left and right subtrees, so a range query like "items in [a, b]" or "items per value of P" walks only the boundary path and sums the pre-committed sub-counts in a single traversal and a single proof.
A document type opts in via two schema flags:
documentsCountable: true→ primary-key tree is aCountTree. Enables O(1) total-count for the document type; sufficient forGetDocumentsCountwith nowherefilter.rangeCountable: true→ primary-key tree is aProvableCountTree. ImpliesdocumentsCountable. The same flag is also accepted per-index, where it controls range-count storage layout (see below) and is required for anyGetDocumentsCountrequest that carries a range where-clause.
How a Document Type Picks Its Tree Variant
Selection lives in packages/rs-drive/src/drive/document/primary_key_tree_type.rs:
#![allow(unused)] fn main() { impl DocumentTypePrimaryKeyTreeType for DocumentTypeRef<'_> { fn primary_key_tree_type( &self, platform_version: &PlatformVersion, ) -> Result<TreeType, Error> { match platform_version .drive .methods .document .primary_key_tree_type { 0 => { if self.range_countable() { Ok(TreeType::ProvableCountTree) } else if self.documents_countable() { Ok(TreeType::CountTree) } else { Ok(TreeType::NormalTree) } } version => Err(Error::Drive(DriveError::UnknownVersionMismatch { method: "DocumentTypeRef::primary_key_tree_type".to_string(), known_versions: vec![0], received: version, })), } } } }
primary_key_tree_type() is the single source of truth — every Drive code path that needs to know which tree variant to read from or write to routes through this helper, including:
- Contract insert and update (to
CREATEthe right tree element when the document type is added). - Document insert / delete (to know how to update the count alongside the document).
- Cost estimation (so fees match the variant that will actually be used).
The contract insert/update paths use three thin Drive helpers parallel to the existing batch_insert_empty_tree / batch_insert_empty_sum_tree:
batch_insert_empty_tree— NormalTree.batch_insert_empty_count_tree— CountTree, used whendocuments_countable() && !range_countable().batch_insert_empty_provable_count_tree— ProvableCountTree, used whenrange_countable().
Each helper goes through LowLevelDriveOperation::for_known_path_key_empty_*_tree (or its _estimated_path_key_* cousin in cost-estimation paths), so the contract setup, document operations, and proof generation all see the same on-disk shape.
Storage-Layout Invariants
Because the tree variant is fixed at contract-creation time and baked into how the tree element is laid out on disk, two flags are immutable across a contract update:
- Changing
documents_countablefrom any state to any other state on avalidate_configupdate returnsDocumentTypeUpdateError. - Same for
range_countable.
Tests pinning these guards live in packages/rs-dpp/src/data_contract/document_type/methods/validate_update/v0/mod.rs. Don't relax them: if a NormalTree-backed document type were silently switched to CountTree mid-contract, every subsequent insert or delete would update a count value attached to a tree element that physically isn't a count tree, leading to grovedb element-shape errors at best and consensus drift at worst.
Counting Documents at Query Time
A single unified gRPC endpoint exposes the feature: GetDocumentsCount. The response shape varies by request mode (total / per-In-value / per-distinct-value-in-range / total-over-range), see Range Modes below. The wire-level shape makes that split explicit: on the no-proof path the response's CountResults carries an inner oneof variant { uint64 aggregate_count; CountEntries entries; } — total-count and range-without-distinct modes return aggregate_count (a single u64), per-In-value and per-distinct-value-in-range modes return entries (a list of CountEntry { optional bytes in_key; bytes key; uint64 count } where in_key is the prefix value for compound In + range shapes and absent for flat queries). The endpoint has two underlying paths (prove vs. no-prove); every mode — including return_distinct_counts_in_range = true — is valid on both paths. The prove path uses two different proof shapes depending on whether you want a single aggregate or per-distinct-value entries (see Prove (Client-Side Verify-Then-Aggregate or Aggregate-Count Proof) below).
No-Prove (Server-Side O(1) or O(log n))
When prove=false, drive-abci calls into DriveDocumentCountQuery (in packages/rs-drive/src/query/drive_document_count_query/mod.rs). The handler picks a path based on the where clauses:
Unfiltered total (no where clauses) on a documentsCountable: true document type (Drive::read_primary_key_count_tree):
The doctype's primary-key tree at [contract_doc, contract_id, 1, doctype, 0] is itself a CountTree. One grovedb read gives count_value — the total document count. O(1).
Equal/In only (execute_no_proof):
- Pick a
countable: trueindex whose properties exactly match the Equal/In where-clause fields — every index property has a matching clause, no orphan clauses, no uncovered properties. If no such index exists the request rejects withWhereClauseOnNonIndexedProperty(the strict-coverage contract; see "Index design" below). - Walk the tree from the root down to the terminal level, pushing
prop_nameandserialize_value_for_key(prop_name, value)at each step.Equalextends one path;Inclones the current path once per value in its array (a cartesian fork) and the per-branch counts are summed. - Read the
CountTreeelement at the resulting path and return itscount_value. O(1) per branch.
If the request carries an In clause, the response is the entries variant — one CountEntry per In value (the per-value split mode). Otherwise the response is the aggregate_count variant — a single u64.
Index design contract: a countable: true index counts exactly its declared properties. Want count(*) WHERE color = X? Define a [color] countable index. Want count(*) WHERE color = X AND shape = Y? Define a [color, shape] countable index. Want both? Define both. Partial coverage (e.g. color = X against a [color, shape] index) is rejected — define a more specific countable index, or set documentsCountable: true on the document type for unfiltered total counts. The prove path enforces the same contract, so prove=true and prove=false reject in the same situations with the same error.
Range (execute_range_count_no_proof):
- Pick a
range_countable: trueindex where the Equal/In clauses cover the prefix and the range operator hits the index's last property. - Build the path
[contract_doc, doctype, prefix..., range_prop_name]— pointing at the property-nameProvableCountTree. - Issue a grovedb path query with the converted range
QueryItem(>,>=,<,<=,Range,RangeInclusive,RangeAfter,RangeAfterTo,RangeAfterToInclusive) and walk the children whose keys lie inside the range. - Each child's
count_value_or_default()is the doc count at that property value. Either sum all per-value counts and return as theaggregate_countvariant (summed mode), or emit them as per-valueCountEntrys under theentriesvariant (distinct mode), then apply order / cursor / limit.
Prove (Client-Side Verify-Then-Aggregate or Aggregate-Count Proof)
When prove=true, the proof shape depends on whether the query carries a range clause.
With a range clause: the handler picks one of two prove sub-paths based on return_distinct_counts_in_range:
-
Aggregate (
return_distinct_counts_in_range = false, default): drive-abci builds a grovedbAggregateCountOnRangepath query against the property-nameProvableCountTree, andget_proved_path_queryproduces an aggregate-count proof. The client verifies viaGroveDb::verify_aggregate_count_queryand recovers(root_hash, count)directly — proof size is O(log n) regardless of how many keys match. No documents are ever materialized. -
Distinct (
return_distinct_counts_in_range = true): drive-abci builds a regular range path query (noAggregateCountOnRangewrapper) against the sameProvableCountTree. Because the leaf is aProvableCountTree, merk emits oneNode::KVCount(key, value, count)op per matched in-range key, with eachcountcryptographically committed to the merk root vianode_hash_with_count(kv_hash, l_hash, r_hash, count)— same forge-resistance as the aggregate path'sHashWithCountcollapse. The SDK'sdrive_proof_verifier::verify_distinct_count_proofruns the standard hash-chain check, then walks the proof's op stream to extract the counts as aBTreeMap<Vec<u8>, u64>. Trade-off vs. the aggregate path: proof size is O(distinct values matched) rather than O(log n), because each distinct in-range key emits its ownKVCountop instead of being collapsed into a boundary subtree. Acceptable for typical histograms (a few dozen distinct values in range); for "give me a single count" use the aggregate path instead.
Without a range clause (point-lookup with prove): two sub-paths based on the request shape.
-
Unfiltered total +
documentsCountable: true: drive-abci proves the doctype's primary-keyCountTreeelement at[contract_doc, contract_id, 1, doctype, 0]. One merk path proof; the SDK'sdrive_proof_verifier::verify_primary_key_count_tree_proofreadscount_valueoff the verified element. O(log n) bytes. -
Equal/In against a fully-covering
countable: trueindex: drive-abci proves oneElement::CountTreeper covered branch. Two sub-shapes:- Equal-only fully-covered → one element at
[..., last_field, last_value, 0]. Inat any index position (with any number of trailing Equals) → one element per In value, fetched via outer Query + a subquery whoseset_subquery_pathcarries the post-In Equal segments (zero of them when In is on the last property; one or more when In sits earlier in the index). The subquery'sKey([0])picks off the CountTree at[..., in_field, in_value, <trailing equals>, 0]for each matched In branch.
The In position rule for count queries is more permissive than the regular document query path's
Index::matchesrule (which restricts In to last-or-before-last because of a positional path-construction assumption — seeDriveDocumentQuery::get_non_primary_key_path_queryfor the layout that forces it). The count path doesn't have that constraint: there's no document-key terminator descent, noorder_byinterpretation, and nolimit/offsetsemantics — it's a pure CountTree-element lookup, soset_subquery_pathwith an arbitrary trailing tail works. Both no-proof and prove count executors route through a singlepoint_lookup_count_path_querybuilder (no-proof runs the path query viagrove.queryand sums the emittedCountTreeelements' counts; prove signs the same path query viaget_proved_path_query), so they accept the same query shapes by construction. The SDK'sdrive_proof_verifier::verify_point_lookup_count_proofverifies and extractscount_value_or_default()from each verified element. - Equal-only fully-covered → one element at
Both sub-paths share the proof shape: each CountTree element's count_value is cryptographically bound to the merk root via node_hash_with_count(kv_hash, l_hash, r_hash, count), same forge-resistance guarantee the range-distinct path relies on. Neither materializes documents or runs per-key bookkeeping client-side.
Proof size: O(k × log n) where k is the number of covered branches (1 for the documents_countable fast path and Equal-only fully-covered case; ≤ |In values| for Equal-prefix + In-on-last).
Symmetric rejection contract: prove count requires a countable: true index whose properties exactly match the where clauses — same requirement as the no-proof Total / PerInValue modes. Partial coverage (where the where clauses are a strict prefix of the index, or the index has uncovered properties) rejects with a WhereClauseOnNonIndexedProperty-class error pointing the caller at the index-design fix. The documents_countable: true fast path handles unfiltered total counts in O(log n) proof bytes when set on the document type. No silent fallback to materializing matching documents — that path doesn't exist anymore.
Implementation reference:
- Path query:
DriveDocumentCountQuery::point_lookup_count_path_query— shared by prover and verifier. - Server executor:
DriveDocumentCountQuery::execute_point_lookup_count_with_proof. - Verifier:
DriveDocumentCountQuery::verify_point_lookup_count_proof; SDK wrapperdrive_proof_verifier::verify_point_lookup_count_proofcomposes tenderdash signature verification on top.
Supported Where Operators
The no-prove fast path covers three operator shapes:
Equal(==) — single point lookup against the count tree at a fully-resolved index path. Picked byfind_countable_index_for_where_clauses.In(in) — cartesian fork. Each value in theInarray becomes its own index path; their counts are summed (or, for split counts, merged by split key). AnInclause withkvalues costskpoint lookups, not a tree walk. TheInclause also doubles as the per-value split signal in the unifiedGetDocumentsCountendpoint — at most oneInper request.- Range (
>,>=,<,<=,between*,startsWith) — walks the property-nameProvableCountTree's children whose keys lie inside the range, reading each childCountTree's count value. Picked byfind_range_countable_index_for_where_clauses; requires the index to haverange_countable: trueAND the range property to be the index's last property (the IndexLevel terminator).startsWith "p"becomes the half-open range[serialize("p"), serialize("p") with last byte +1)— the same byte-incremented encoding the normal docs path uses (seeconditions.rs'sStartsWitharm), valid for UTF-8 string keys since UTF-8 never contains0xFF.
Through the unified GetDocumentsCount request handler, range queries take a single range terminator clause plus a prefix of Equal clauses and/or one In clause. In on a prefix property exercises grovedb's native subquery primitive — each emitted entry then carries both the in_key (the In value for that fork) and the key (the terminator value within the range). Per-fork counts are NOT merged server-side — see No-Merge Compound Semantics below for rationale.
Range Modes
A range query in the unified endpoint produces one of two response shapes, controlled by return_distinct_counts_in_range:
return_distinct_counts_in_range = false(default) —CountResults.aggregate_countcarrying the sum of the per-valueCountTreecounts within the range. Use for "how many widgets have color in[red, tomato]?".return_distinct_counts_in_range = true—CountResults.entrieswith oneCountEntryper distinct property value within the range (key= serialized terminator value,count=CountTreecount for that value,in_key= the In-fork value for compound queries or absent for flat queries). Use for "show me a histogram of widgets by color in[red, tomato]".
No-Merge Compound Semantics
For compound queries (In on a prefix property + range on the terminator), the entries are returned unmerged — one CountEntry per emitted (in_key, key) pair. The server does NOT collapse them down to a flat histogram keyed only by key. This is a load-bearing design choice:
- Correctness under
limit. Pushing alimitinto grovedb's path query truncates the emitted elements before any merge could run. With cross-fork merging this can undercount the merged sums (e.g.brand IN (acme, contoso)+color > x+limit=1could returnacme/red, count=2and silently dropcontoso/red, count=3so the mergedredcount comes out as2instead of5). Without merge,limitand the user's "number of entries returned" mean the same thing. - Proof verification stays straightforward. A malicious server omitting one
Inbranch shows up as missing entries with thatin_keyrather than as a silent undercount in a merged total. The caller can detect "I asked for 3 In values but only got entries for 2" directly from the response shape. - No information loss. A caller who wanted the merged histogram can compute
result.fold(by=key, sum=count)client-side trivially. A caller who wanted per-(in_key, key)counts can't reverse a merged histogram.
The rs-sdk surfaces this via DocumentSplitCounts.0: Vec<VerifiedSplitCount>. Callers wanting the historical flat-map shape can call DocumentSplitCounts::into_flat_map() which sums across in_key forks.
Flat queries (no In on prefix) have in_key = None on every entry; for those callers the API behaves identically to the pre-no-merge shape.
Pagination
Distinct mode accepts pagination knobs:
| Field | Effect |
|---|---|
order_by | CBOR-encoded list of [field, "asc"|"desc"] clauses, same shape as GetDocumentsRequestV0.order_by. First clause's direction controls split-mode entry ordering; ascending (default) walks the range in BTreeMap natural order, descending reverses. Only meaningful in split modes (per-In-value, per-distinct-value-in-range, prove-distinct); total-count and aggregate-prove responses are scalar and have no entry ordering. |
limit | Truncate after min(requested, max_query_limit) entries; applied last (after order). Unset (None) is normalized to default_query_limit before the cap is applied — the server never walks an unbounded distinct-mode result set, even if the client omits the field. Clients that want a tight working-set should still set this explicitly. |
For pagination, clients narrow the underlying range itself rather than passing a cursor — page 2 is just color > <last-key-from-page-1> with the same limit. There's no cursor field on the request because a single-bytes cursor would be ambiguous for compound (In + range + distinct) queries whose natural sort is (in_key, key), and range narrowing has the same expressivity for the simple cases.
These knobs are ignored on summed mode (they have no defined meaning for a single aggregate).
Range Queries on the Prove Path
When prove = true and the query carries a range clause, the handler picks one of two prove sub-paths based on return_distinct_counts_in_range. The aggregate sub-path (default) builds a grovedb AggregateCountOnRange proof — verified via GroveDb::verify_aggregate_count_query, recovering (root_hash, count) without materializing any matching documents. Proof size is O(log n) regardless of how many documents match. The distinct sub-path (return_distinct_counts_in_range = true) builds a regular range proof against the property-name ProvableCountTree — the leaf merk emits per-(in_key, key) KVCount ops, each bound to the merk root via node_hash_with_count, and the SDK extracts them as a Vec<VerifiedSplitCount> (preserving the unmerged compound shape per No-Merge Compound Semantics). Distinct proof size is O(distinct (in_key, key) pairs matched) instead of the aggregate's O(log n) — pick the aggregate path when you want one number, the distinct path when you want a histogram.
In on a prefix property is supported on the distinct sub-path: grovedb's outer Query enumerates Key(in_value) entries at the In-bearing prop's property-name subtree, set_subquery_path carries any post-In Equal pairs + terminator name, and set_subquery is the range item. The aggregate sub-path still rejects In on prefix because AggregateCountOnRange is a single-range merk primitive that can't fork at the merk layer — for compound aggregates, callers use return_distinct_counts_in_range = true and reduce client-side via DocumentSplitCounts::into_flat_map.
A "desc" direction in the first order_by clause is supported on the distinct sub-path. The derived direction flows into grovedb's Query.left_to_right on both the outer In-keys Query and the inner range subquery, so descending iteration walks (in_key_desc, key_desc) tuples. The prover and verifier MUST agree on this direction — the path query bytes include it, and disagreement breaks merk-root recomputation. The SDK derives left_to_right from the first request.document_query.order_by_clauses direction, matching the server's derivation in drive_dispatcher, so the two stay in lockstep by construction. Combined with limit, descending order returns the LAST limit matched entries (the largest keys) rather than the first limit reversed — exactly what callers paginating from the end expect.
For point-lookup count proofs (no range clause), drive emits a CountTree element proof against the covering countable index — proof size is O(k × log n) where k is the number of covered branches, with no cap on the underlying document count. See the Prove path section above for the symmetric-rejection contract.
Range Queries and ProvableCountTree
Range count queries (>, <, between*) over an index with range_countable: true are answered in O(log n) by walking the property-name ProvableCountTree's boundary nodes. The proof path uses grovedb's AggregateCountOnRange, which lets clients verify a range count without ever materializing the underlying documents.
Offset-style queries ("the next 50 items starting after item 7") are a separate primitive that will likely build on the same
ProvableCountTreeshape. They are not exposed viaGetDocumentsCounttoday — pagination of distinct-mode entries is done by narrowing the range itself (e.g.color > <last-key-from-previous-page>), not by offsetting into the underlying documents.
Why Internal-Node Counts Make Range Counts O(log n)
In a sorted merk tree the keys partition into a left (smaller) and right (larger) subtree at every internal node. To answer a question like "how many items have a key strictly greater than 7?" you walk the boundary between "below 7" and "above 7" from the root down, and at each step you can decide what to do with the other subtree — the one not on the boundary path — based on a single read:
- If a subtree lies entirely above the cutoff, add its full count and don't descend into it.
- If it lies entirely below, ignore it (contributes 0) and don't descend.
- If it straddles the cutoff, recurse into it (it is then the next step on the boundary path).
On a ProvableCountTree every internal node carries the count of its left and right subtrees, so the "add the full count" step is a single O(1) read of the node we're already touching. The whole walk visits one node per tree level — O(log n) — and every visited node is on the boundary path. The total ends up as a sum of pre-committed sub-counts plus zero or one straddle leaf at the bottom.
Concretely, picture a ProvableCountTree of 8 items with sorted integer keys 1, 3, 5, 7, 9, 11, 13, 15 — three full levels of internal nodes plus a leaf row:
flowchart TB
R["root c=8"]:::countroot
R --> L1["c=4"]:::countnode
R --> R1["c=4"]:::countnode
L1 --> LL["c=2"]:::countnode
L1 --> LR["c=2"]:::countnode
R1 --> RL["c=2"]:::countnode
R1 --> RR["c=2"]:::countnode
LL --> x1["key=1, c=1"]:::leaf
LL --> x3["key=3, c=1"]:::leaf
LR --> x5["key=5, c=1"]:::leaf
LR --> x7["key=7, c=1"]:::leaf
RL --> x9["key=9, c=1"]:::leaf
RL --> x11["key=11, c=1"]:::leaf
RR --> x13["key=13, c=1"]:::leaf
RR --> x15["key=15, c=1"]:::leaf
classDef countroot fill:#1f6feb,color:#fff,stroke:#1f6feb,stroke-width:2px;
classDef countnode fill:#3fb950,color:#0d1117,stroke:#3fb950,stroke-width:2px;
classDef leaf fill:#21262d,color:#c9d1d9,stroke:#484f58;
For "give me the count of items with key > 6":
- root (c=8): 6 falls inside the left subtree (which holds 1–7). Read both children's sub-counts. Right subtree's keys are all > 6 → take its full
c=4and don't descend. Recurse into left. - left (c=4): 6 falls inside its right subtree (which holds 5,7). Read both children. Left-left's keys (1,3) are both ≤ 6 → contribute 0 and don't descend. Recurse into left-right.
- left-right (c=2): 6 splits this leaf-pair. Read both leaves. Key 5 ≤ 6 → contribute 0. Key 7 > 6 → contribute 1.
- Total = 4 (right of root) + 0 (left-left) + 0 (key=5) + 1 (key=7) = 5.
We visited 4 internal nodes on the boundary path (root → left → left-right → key=7) and read sub-counts off 3 siblings (right, left-left, key=5) without descending into them. Six of the eight items were never enumerated: their counts were summed straight out of the committed sub-count fields. The walk is O(log n) in tree depth regardless of how many items live under each skipped subtree.
Why This Is Provable
A merk proof of the same boundary walk includes:
- The boundary path from root to the leaf adjacent to the cutoff.
- The siblings of every node on the boundary path (so the verifier can recompute hashes up to the merk root).
Each sibling node, on a ProvableCountTree, ships its committed sub-count alongside its hash. The verifier walks the same logic the server did — "this sibling lies entirely above 7, add its c=… value" — and ends up with the same total without enumerating the sibling subtrees. Verification is also O(log n).
The same primitive answers any range query of the form [A, B]: walk to the cutoff at A, then to the cutoff at B, and combine sub-counts along the way. [A, ∞) and (-∞, B] are special cases.
Authoring a Contract That Uses Count Trees
There are two opt-in surfaces in the document meta-schema. They're independent and can be used together:
- Top-level flags on the document type control the primary-key tree variant — the tree that stores documents keyed by document ID. This is what
GetDocumentsCount(with no equality predicates) reads. - A per-index
countable: trueflag controls whether that specific index's tree carries counts. This is what enables the no-prove fast path for queries that filter by the index's leading equality columns.
Primary-Key Tree Flags
Set at the same level as type / properties / indices on a document type:
{
"widget": {
"type": "object",
"documentsCountable": true,
"properties": {
"name": { "type": "string", "position": 0, "maxLength": 64 },
"color": { "type": "string", "position": 1, "maxLength": 16 }
},
"additionalProperties": false
}
}
That contract gets a CountTree for the widget primary-key tree. GetDocumentsCount for widget with no where filter is now an O(1) lookup of the tree element's count value.
To opt into a ProvableCountTree for the primary-key tree instead — useful if you want range queries on the primary key in the future, or if you intend to use this document type behind range proof primitives — set rangeCountable: true at the document-type level. It implies documentsCountable, so you don't need both:
{
"widget": {
"type": "object",
"rangeCountable": true,
"properties": {
"name": { "type": "string", "position": 0, "maxLength": 64 },
"color": { "type": "string", "position": 1, "maxLength": 16 }
},
"additionalProperties": false
}
}
These two flags are immutable across a contract update. You pick the tree variant at contract creation; you can't switch to a different one later without creating a new document type. (See Storage-Layout Invariants above.)
Per-Index Countable Flag
Set on a single entry in the document type's indices array:
{
"widget": {
"type": "object",
"documentsCountable": true,
"properties": {
"name": { "type": "string", "position": 0, "maxLength": 64 },
"color": { "type": "string", "position": 1, "maxLength": 16 }
},
"indices": [
{
"name": "byColor",
"properties": [{ "color": "asc" }],
"countable": true
}
],
"additionalProperties": false
}
}
With byColor.countable: true the byColor index's tree carries counts, so GetDocumentsCount with where: [["color", "==", "red"]] reaches the count via that index in O(1). Without the flag, find_countable_index_for_where_clauses skips the index and the query rejects with WhereClauseOnNonIndexedProperty — there's no slow fallback, only fast counts on properly-indexed properties.
The countable field accepts three forms:
| JSON value | Tree variant | Capabilities |
|---|---|---|
false (or omitted, or "notCountable") | NormalTree | No count fast path |
true (or "countable") | CountTree | O(1) totals at the root |
"countableAllowingOffset" | ProvableCountTree | O(1) totals plus per-node counts that will enable future O(log n) range / offset queries on this index |
The boolean true / false form is kept for back-compat with contracts written before the enum form was introduced; new contracts should prefer the explicit string variants for clarity, especially "countableAllowingOffset" when range/offset queries are wanted.
A few notes about the index-level flag:
- Setting any countable variant increases storage cost — every insert and delete updates the index tree's count alongside the document.
"countableAllowingOffset"costs more than plain"countable"(every internal node carries count metadata, not just the root). Don't sprinkle it on every index; opt in for the ones you'll actually count by, and use the cheaper variant unless you specifically need the offset capability. - The flag is on the whole index, not per-property. The index handles
count(*)queries whose equalitywhereclauses cover the index's properties exactly — every index property has a matching==(orin) clause, and every clause's field appears in the index. A["color", "size"]countable index gives you O(1) counts forWHERE color = X AND size = Y— butWHERE color = Xalone is rejected withWhereClauseOnNonIndexedPropertybecause that index doesn't claim to count by color alone. If you want both single-column-by-color counts AND compound color+size counts, define both["color"]and["color", "size"]countable indexes (or just["color"]if size-filtered counts aren't a hot path). The picker is strict by design: each countable index represents a deliberate decision about which count queries the contract supports. - Index-level countable is independent of the primary-key flags. You can have
documentsCountable: trueon the document type ANDcountable: trueon a specific index — the first gives you fast totals, the second gives you fast filtered counts that match that index. countableon auniqueindex is mostly a no-op, but not always. A unique index stores its terminal as a bare reference at key[0]rather than wrapping it in a count tree, so for documents whose indexed fields are all non-null the flag has no storage effect — insertion bypasses the count-tree code entirely. It does still do meaningful work for null-bearing entries: when a document has any null value among the indexed properties, insertion takes the same count-tree branch a non-unique index uses (because uniqueness can't be enforced on null), and the count tree at that path aggregates them. Socountableon a unique index is worth setting when at least one of the indexed properties is optional in the schema and you expect null values; otherwise it's an inert flag. Counts on all-non-null exact matches still return correctly (1 if present, 0 if not) because the on-disk reference reads as count 1 via grovedb's default-aggregate semantics.
Choosing What to Set
| You want | Set |
|---|---|
Fast count(*) for the whole document type | documentsCountable: true on the document type |
O(1) filtered count: count(*) WHERE col = X | countable: true on an index whose properties are exactly ["col"]. A composite index whose leading column is col (e.g. ["col", "other"]) does NOT answer this query — partial coverage rejects with WhereClauseOnNonIndexedProperty. Define a separate ["col"] countable index if you want this count. |
Per-In-value sub-counts: one CountEntry per value in an In clause | countable: true on an index whose properties exactly match the query's == clauses plus the In field. The In field may sit at any position in the index — both the no-proof and prove count paths use set_subquery_path to descend through any trailing Equals after the In, which is strictly more permissive than the regular document query path's last-or-before-last rule. E.g. WHERE color IN [...] needs ["color"]; WHERE brand = X AND color IN [...] needs ["brand", "color"]; WHERE brand IN [...] AND model = X AND year = 2024 needs ["brand", "model", "year"] with In on brand (position 0 of 3). |
O(log n) range count: count(*) WHERE col BETWEEN A AND B | rangeCountable: true on an index whose last property is col and whose other properties cover any equality predicates as a prefix. Implies countable: true. |
Per-distinct-value range histogram: one CountEntry per distinct value in a range | Same rangeCountable: true index as above, plus return_distinct_counts_in_range = true on the request. Available on both prove and no-prove paths; the prove path returns a regular range proof against the property-name ProvableCountTree and the SDK extracts per-key counts from the proof's KVCount ops via drive_proof_verifier::verify_distinct_count_proof. |
Range count proof (prove = true + range clause) | Same rangeCountable: true index. The handler uses grovedb's AggregateCountOnRange proof primitive — proof is O(log n), no cap on matched docs. |
| Future offset-style range queries (not yet released — see above) | rangeCountable: true on the document type |
| Nothing count-aware (default) | Don't set any of these flags. Primary-key tree stays a NormalTree. |
A migration check from dapi-grpc server logic: every count query requires either documentsCountable: true (for unfiltered totals) or a countable: true / rangeCountable: true index whose properties exactly match the query's where-clause fields. No covering index → the call returns a clear InvalidArgument describing what the picker was looking for ("requires a range_countable: true index whose last property matches the range field" for range queries, "requires a countable index whose properties exactly match the where clause fields" for Equal/In queries). Pick your indexes deliberately at contract creation time — per-index countable: true / rangeCountable: true flags can't be added later (contract indexes are immutable post-creation).
SDK Access at Three Layers
rs-sdk (native Rust)
Both shapes land on the standard Fetch trait against a single DocumentCountQuery:
#![allow(unused)] fn main() { use dash_sdk::platform::documents::document_count_query::DocumentCountQuery; use dash_sdk::platform::Fetch; use drive::query::{WhereClause, WhereOperator}; use drive_proof_verifier::{DocumentCount, DocumentSplitCounts}; // Total count: no In clause. let DocumentCount(count) = DocumentCount::fetch( &sdk, DocumentCountQuery::new(contract.clone(), "widget")?, ) .await? .expect("DocumentCount::fetch always returns a value on success"); // Split count: signal split by including an `In` clause whose field // is the split property. The In's values enumerate the keys to count. let split_query = DocumentCountQuery::new(contract, "widget")? .with_where(WhereClause { field: "color".to_string(), operator: WhereOperator::In, value: platform_value::Value::Array(vec![ "red".into(), "blue".into(), "green".into(), ]), }); let splits = DocumentSplitCounts::fetch(&sdk, split_query) .await? .expect("DocumentSplitCounts::fetch always returns a value on success"); // `splits` is `DocumentSplitCounts(Vec<SplitCountEntry>)` — for the // flat-histogram view, collapse via `splits.into_flat_map()`. }
DocumentCountQuery wraps an internal DocumentQuery (so it reuses where-clause / order-by / contract-id machinery) and exposes with_where(WhereClause) + with_order_by(OrderClause) builders. The SDK picks the request mode (total / per-In-value / total-range / per-distinct-range) from query shape — Equal/In/range operators in the where clauses — plus explicit request flags. return_distinct_counts_in_range = true (set via .with_distinct_counts_in_range(true)) selects per-distinct-range over the default total-range when a range clause is present; without it a range query returns a single sum.
wasm-sdk (browser)
Two methods on the WasmSdk JS class — one entry per [plain | withProofInfo] variant covers every count mode, because the underlying DocumentSplitCounts::fetch dispatches on the query shape:
sdk.getDocumentsCount(
query: DocumentsQuery,
): Promise<Map<string, bigint>>;
sdk.getDocumentsCountWithProofInfo(
query: DocumentsQuery,
): Promise<ProofMetadataResponseTyped<Map<string, bigint>>>;
Result shapes:
- No
where, or Equal-onlywhere— single map entry with the empty-string key carrying the total count. whereincludes anInclause — one entry per (deduped) In value, keyed by the hex-encoded canonical bytes of that value.whereincludes a range clause +returnDistinctCountsInRange: true— one entry per distinct property value in the range. For compoundIn + range + distinctqueries, entries are summed by terminatorkeyinto a flat map (callers needing the unmerged per-(in_key, key) view should use a richer binding).
Map keys are always hex-encoded bytes matching the canonical serialize_value_for_key encoding of each property value, so callers that need a typed key ("red", 42, etc.) need to hex-decode and interpret per the contract's index-property type. The hex-encoded shape matches the no-prove server response, so merging or comparing count maps from prove and no-prove paths needs no transformation.
rs-sdk-ffi (iOS / native bindings)
#![allow(unused)] fn main() { dash_sdk_document_count( sdk, data_contract, document_type, where_json, // null or JSON [{field, operator, value}] order_by_json, // null or JSON [{field, direction}] return_distinct_counts_in_range, // bool limit, // i64; -1 = server default, >= 0 = explicit cap ) -> JSON {"counts": {"<hex-key>": <u64>, ...}} }
Single FFI entry covers every count mode — the result is always {"counts": {...}} with hex-encoded keys. For total counts (no where/In, distinct flag off), the map carries a single entry with the empty-string key. where_json is the same JSON shape dash_sdk_document_search already accepts ([{field, operator, value}]), so iOS callers can reuse their where-clause encoding. order_by_json is optional and controls split-mode entry ordering only (per-In-value and per-distinct-value-in-range results); pass null for total counts and aggregate range counts where ordering has no defined meaning. The endpoint returns its result as a JSON-encoded C string allocated on the heap — caller frees it via the standard SDK string-free routine.