Count Index Group By Examples

This chapter is the GROUP BY companion to Count Index Examples. It uses the same widget contract, the same 100 000-row fixture, and the same bench at packages/rs-drive/benches/document_count_worst_case.rs. Read chapter 29 first — most of the mechanics (CountTree variants, the merk-proof reconstruction algorithm, node_hash_with_count and friends) carry over unchanged.

What's different here:

  • Every query in chapter 29 returns either a single u64 aggregate or a small list of CountTrees the caller sums. The verifier-side payload is a single total count.
  • Every query in this chapter returns one count per group. The caller gets back a Vec<(group_key, count)> and can index it directly — no summation.

The most important thing to understand up front: group_by is two things at once — a result-shaping directive for the SDK and (for some queries) a proof-shaping directive for the prover. When you pass group_by = [...] in a count request, you're always telling the SDK "don't collapse the result into a single number — give me one count per group key." That result-shaping role is universal: it's what turns Aggregate(sum) into Entries([(key, count), …]).

Whether group_by also changes the proof bytes depends on the query shape. For queries where the underlying proof already commits one CountTree per matched key (single-property INs, for instance), the per-group breakdown is reconstructible from the existing bytes — the prover ships the same proof, the SDK just zips it with the group keys instead of summing. For range queries and certain compound shapes, the per-group breakdown can't be reconstructed from the aggregate-style proof (which commits opaque subtree counts rather than per-key counts), so passing group_by forces the prover to emit a structurally different, larger proof.

The interesting question this chapter answers is: which queries fall into which bucket, and why?

When group_by Changes the Proof (and When It Doesn't)

| Filter | group_by | Aggregate proof (no group_by) | Group-By proof | Proof bytes change? |
| --- | --- | --- | --- | --- |
| brand IN [b0, b1] | [brand] | Q5: 1 102 B | 1 102 B (2 entries) | No (byte-identical) |
| color IN [c0, c1] | [color] | Q6: 1 381 B | 1 381 B (2 entries) | No (byte-identical) |
| color > floor | [color] | Q7: 2 072 B (1 u64) | 10 992 B (100 entries) | Yes (different primitive) |
| brand == X AND color > floor | [brand, color] | Q8: 2 656 B (1 u64) | not allowed in this form | n/a |

The key observation: IN clauses produce proofs that already commit one CountTree per resolved key, so adding group_by on the same property is purely a verifier-side relabel — the prover ships the same bytes, the verifier just returns them as Entries(...) instead of Aggregate(sum). This is why G1 and G2 below are not new proofs — they're Q5 and Q6 reinterpreted.

So why pass group_by at all if the proof bytes don't change? Because without it, the SDK has no way to know you want the per-key breakdown. The same brand IN ["brand_000", "brand_001"] proof can answer two different questions:

  • "How many widgets total are made by brand_000 or brand_001?" → caller passes no group_by, SDK returns Aggregate(2 000).
  • "How many widgets per brand?" → caller passes group_by = [brand], SDK returns Entries([("brand_000", 1 000), ("brand_001", 1 000)]).

The bytes on the wire and the cryptographic guarantees are identical; the only thing that changes is which result shape the SDK delivers. Think of group_by as the count-query equivalent of SELECT brand, COUNT(*) ... GROUP BY brand versus SELECT COUNT(*) ... in SQL — same scan plan, different projection.
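The result-shaping role can be sketched in a few lines of Rust. This is a minimal illustration, not the SDK's code: the `CountResult` enum and the `shape_result` function are hypothetical names, and the only facts taken from the chapter are the two result shapes (`Aggregate` and `Entries`) and the Q5 / G1 counts.

```rust
/// Result shapes named in this chapter. The Rust types are illustrative
/// assumptions, not the SDK's actual definitions.
#[derive(Debug, PartialEq)]
enum CountResult {
    Aggregate(u64),
    Entries(Vec<(String, u64)>),
}

/// Shape the same verified per-key counts either as one total (no group_by)
/// or as one entry per group key (group_by present).
fn shape_result(resolved: &[(String, u64)], group_by: bool) -> CountResult {
    if group_by {
        // Keep the per-key breakdown exactly as it was resolved.
        CountResult::Entries(resolved.to_vec())
    } else {
        // Collapse the same counts into a single number.
        CountResult::Aggregate(resolved.iter().map(|(_, n)| *n).sum())
    }
}
```

Feeding the two resolved brand counts (1 000 each) through `shape_result` yields `Aggregate(2000)` without `group_by` and a two-element `Entries` with it; the input, standing in for the verified proof contents, is the same in both cases.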

Range queries are different. AggregateCountOnRange (chapter 29's Q7) walks the boundary of the range over a ProvableCountTree and sums per-subtree counts directly — it never resolves individual keys. GroupByRange (this chapter) has to enumerate the distinct in-range keys to label each group, so it produces a different proof shape with one CountTree (or CountTree-feature-typed element) per distinct key in the range. That's where group_by genuinely earns its bytes — the prover has to do additional work because the per-group breakdown can't be reconstructed from AggregateCountOnRange's opaque-subtree-count commitments.

Queries in this Chapter

All proof-size and behaviour numbers below come from report_group_by_matrix, the same bench helper that produces chapter 29's numbers. The dispatcher's group_by surface validation lives in validate_count_query_groupby_against_index; the per-mode path-query builders are the group_by_* family in packages/rs-drive/src/query/drive_document_count_query/path_query.rs.

| # | Query | Filter + group_by | Complexity | Avg time | Proof size | Verified shape | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| G1 | In on byBrand | brand IN ["brand_000", "brand_001"]; group_by = [brand] | O(k · log B) | 38.6 µs | 1 102 B | Entries(2 groups, sum = 2 000) | Byte-identical to Q5 |
| G2 | In on byColor | color IN ["color_00000000", "color_00000001"]; group_by = [color] | O(k · log C) | 62.1 µs | 1 381 B | Entries(2 groups, sum = 200) | Byte-identical to Q6 |
| G3 | Compound In + Equal | brand IN [...] AND color == Y; group_by = [brand] | O(k · (log B + log C')) | 106.2 µs | 2 842 B | Entries(2 groups, sum = 2) | Per-In compound resolution; two parallel Q4 descents sharing L1–L6 |
| G4 | Range on byColor | color > "color_00000500"; group_by = [color] | O(R · log C) | 762.9 µs | 10 992 B | Entries(100 groups, sum = 10 000) | GroupByRange: enumerates distinct in-range keys instead of Q7's boundary aggregate |
| G5 | Compound In + Range | brand IN [...] AND color > floor; group_by = [brand, color] | O(k · R' · log C') | 737.5 µs | 11 554 B | Entries(100 groups, sum = 100) | Compound In-fan-out × in-range distinct keys (G3 outer × G4 inner) |
| G6 | High-fanout In on byBrand | brand IN [100 values]; group_by = [brand] | O(k · log B) | 1 532 µs | 10 038 B | Entries(100 groups, sum = 100 000) | Scales linearly with the size of the IN list; reveals every byBrand entry when that size equals B |
| G7 | Carrier In + Range (byBrandColor) | brand IN [...] AND color > "color_00000500"; group_by = [brand] | O(k · (log B + log C')) | 255.9 µs | 4 332 B | Entries(2 groups, sum = 998) | Per-In aggregate via AggregateCountOnRange as a carrier subquery; one u64 per branch |
| G8 | Carrier outer Range + Range (byBrandColor) | brand > "brand_050" AND color > "color_00000500"; group_by = [brand] | O(L · (log B + log C')) | 523 µs | 18 022 B | Entries(10 groups, sum = 4 990) | Outer-Range carrier with a platform-max SizedQuery::limit of 10; caller may pass smaller, can't pass larger |

Complexity variables. B = distinct brands in the byBrand merk-tree (≈ 100); C = distinct colors in byColor (≈ 1 000); C' = distinct colors per brand in byBrandColor (≈ 1 000); R = distinct in-range values returned by GroupByRange (capped at 100 in this fixture by an implicit response-size limit); R' = distinct in-range values per fan-out branch (similarly capped); k = |IN| for the In-outer carrier shapes; L = the effective outer-walk limit for the Range-outer carrier shape (G8). The platform's MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT = 10 is both the default (when the caller passes no limit) and a hard ceiling; callers may pass a smaller limit to truncate further. See G8 for the rationale. As in chapter 29, the total document count N doesn't appear — count proofs read pre-committed count_values rather than enumerating docs.
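The rule for L (default of 10, hard ceiling of 10, smaller values allowed) can be written as a small helper. This is a sketch under stated assumptions: `effective_outer_limit` is a hypothetical function, the `u16` type is a guess, and the chapter does not say whether an oversize limit is rejected or clamped; the sketch rejects it.

```rust
/// Value quoted in this chapter for the Range-outer carrier shape (G8).
/// The integer type is an assumption.
const MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT: u16 = 10;

/// Hypothetical helper: resolve the effective outer-walk limit `L`.
/// Assumption: a limit above the ceiling is rejected rather than clamped.
fn effective_outer_limit(requested: Option<u16>) -> Result<u16, String> {
    match requested {
        // No caller limit: fall back to the platform default.
        None => Ok(MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT),
        // A smaller (or equal) limit truncates the outer walk further.
        Some(n) if n <= MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT => Ok(n),
        // Anything larger than the ceiling is refused.
        Some(n) => Err(format!(
            "limit {} exceeds platform maximum {}",
            n, MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT
        )),
    }
}
```

With no caller limit the helper returns 10, which matches the 10 groups G8 reports.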

Avg time is the criterion-reported median of cargo bench --bench document_count_worst_case -- 'document_count_worst_case/query_g' on the same 100 000-row warmed fixture used by chapter 29's query_N_* cases. Each row reflects 10 samples × ~3 k–130 k iterations per sample with 2 s warm-up and 5 s measurement; the median sits within ±2 % of the mean across reruns. G1 and G2 match their Q5 / Q6 counterparts to within ~3 µs; the residual is the SDK-side zip-vs-sum cost. G4 is ~11 × Q7 because GroupByRange enumerates 100 distinct in-range CountTrees rather than walking O(log C) boundary nodes, so the time gap is consistent with the predicted complexity gap (O(R · log C) vs O(log C)).

Group-By Shapes That Are Not Allowed

Several plausible-looking (where, group_by) combinations are rejected by the dispatcher before any proof generation. The rejections fall into three buckets: operator/group_by mismatch, missing range window, and no covering index. A fourth shape, an aggregate variant that used to be rejected, is now supported; see the historical note at the end of this section. All rejections are surfaced as typed QuerySyntaxErrors; the precise error strings appear in the bench's [matrix] output.

1. group_by field constrained by == instead of In or range

where    = brand == "brand_050"
group_by = [brand]

count query supports only ... (rejected because == produces exactly one entry whose key equals the where-clause's value — grouping by a field that already has a single value contributes no extra information).

Why. GROUP BY [field] is meaningful only when field can take multiple values in the result set. An == clause pins the field to exactly one value, so the group_by is structurally redundant — the dispatcher rejects it rather than silently returning a single-entry response that would look like a bug. Use Q2 / Q3 (no group_by) for single-value == queries.

Applies symmetrically: where = color == X, group_by = [color] is rejected for the same reason.

2. group_by contains a range field but the where clause doesn't range over it

where    = brand IN [...] AND color == "color_00000500"
group_by = [brand, color]

GROUP BY on a range field requires a range where-clause; the range field must appear in where for the distinct walk to have a window to iterate over

Why. group_by = [in_field, range_field] (GroupByCompound) routes through distinct_count_path_query, which needs a range window on the second field to know what values to enumerate. With color == Y the second dimension collapses to a single value, so the compound walk degenerates to a point lookup — and that's what Q4 / G3 are for. For compound plus range, the where must carry a range on the second field (which is what G5 does).

3. group_by orders fields in a way no covering index can serve

where    = color IN [...] AND brand > "brand_050"
group_by = [color, brand]

where clause on non indexed property error: range count requires a range_countable: true index whose last property matches the range field

Why. The covering index for (group_by[0] = color, group_by[1] = brand) would need to be byColorBrand with rangeCountable: true on the brand terminator. The widget contract doesn't have that index — only byBrand, byColor, and byBrandColor. The dispatcher's index picker walks every declared index, finds none whose (properties, last_property_is_range_countable) shape matches the request, and rejects with the "non-indexed property" error.

The fix is contract-level: declare a byColorBrand index with rangeCountable: true if the application needs this group_by order. The dispatcher itself can't infer alternate index orders from the request alone — rangeCountable: true is an explicit opt-in on each index because it changes the on-disk tree shape (NormalTree → ProvableCountTree on the property-name subtree).


To put these three buckets in one place: every rejected (where, group_by) shape on this contract reduces to one of:

  • the group_by field's where operator doesn't admit multiple values (bucket 1),
  • the group_by has a range slot that the where doesn't fill with a range (bucket 2),
  • there's no covering rangeCountable index in property order (bucket 3).

All three checks happen at request validation, before any GroveDB work. The bench's report_group_by_matrix exercises one example of each and prints the exact error string, so adding a new contract or index shape is a quick way to see which checks each new query shape hits.
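The three buckets can be modelled as a toy validator. Everything here is an illustrative assumption: the `Op`, `Rejection`, and `Index` types, the order of the checks, and the index-matching rule (equality and In fields first, the range field last) are simplifications, and the real validate_count_query_groupby_against_index may work quite differently.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Op { Eq, In, Range }

#[derive(Debug, PartialEq)]
enum Rejection {
    SingleValueGroupBy, // bucket 1: group_by field pinned by ==
    MissingRangeWindow, // bucket 2: range slot in group_by, no range in where
    NoCoveringIndex,    // bucket 3: no rangeCountable index in property order
}

/// Simplified index description (hypothetical, not the contract schema).
struct Index { properties: &'static [&'static str], range_countable: bool }

fn validate(wheres: &[(&str, Op)], group_by: &[&str], indexes: &[Index]) -> Result<(), Rejection> {
    // Look up the where operator applied to a field, if any.
    let op_of = |field: &str| wheres.iter().find(|(f, _)| *f == field).map(|(_, op)| *op);

    // Bucket 2: a compound group_by needs a range on its second field.
    if group_by.len() == 2 && op_of(group_by[1]) != Some(Op::Range) {
        return Err(Rejection::MissingRangeWindow);
    }
    // Bucket 1: grouping by a field that == pins to a single value.
    if group_by.iter().any(|&f| op_of(f) == Some(Op::Eq)) {
        return Err(Rejection::SingleValueGroupBy);
    }
    // Bucket 3: a range clause needs a rangeCountable index ending in that field.
    if let Some((range_field, _)) = wheres.iter().find(|(_, op)| *op == Op::Range) {
        let mut wanted: Vec<&str> = wheres
            .iter()
            .filter(|(_, op)| *op != Op::Range)
            .map(|(f, _)| *f)
            .collect();
        wanted.push(*range_field);
        let covered = indexes
            .iter()
            .any(|ix| ix.range_countable && ix.properties == wanted.as_slice());
        if !covered {
            return Err(Rejection::NoCoveringIndex);
        }
    }
    Ok(())
}
```

Run against a three-index stand-in for the widget contract (byBrand, byColor, byBrandColor), the three rejected examples above land in buckets 1, 2, and 3 respectively, while the G1, G4, and G5 shapes pass.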

Historical note. A fourth bucket, group_by = [in_field] with where = in_field IN [...] AND range_field > floor, was rejected before grovedb PR #663. That PR added support for AggregateCountOnRange as a carrier subquery under outer Keys, which unblocked the natural single-field group_by shape (one aggregate count per In branch) at the merk layer. The dispatcher now routes that shape to DocumentCountMode::RangeAggregateCarrierProof; the worked-out example is G7 below.

G1 — In on byBrand, Grouped By brand

select   = COUNT
where    = brand IN ["brand_000", "brand_001"]
group_by = [brand]
prove    = true

Path query (identical to Q5):

path:         ["@", contract_id, 0x01, "widget", "brand"]
query items:  [Key("brand_000"), Key("brand_001")]

Verified payload (the only thing that differs from Q5):

Entries([
  ("brand_000", CountTree { count_value_or_default: 1000 }),
  ("brand_001", CountTree { count_value_or_default: 1000 }),
])

The SDK zips the In values with the two resolved CountTree elements (in lex-asc order) rather than summing them as Q5's CountMode::Aggregate does.

Proof size: 1 102 B. Proof bytes are byte-identical to Q5 — same path query, same merk ops, same hash composition. The dispatcher recognises that CountMode::GroupByIn on a single-property In clause resolves through the same point_lookup_count_path_query as CountMode::Aggregate does; only the response-shaping at the very end differs.

For the verbatim proof display, see Q5 in chapter 29; every byte of the 1 102-byte proof is the same. The diagrams below show the result-shaping difference.

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> BR["brand: NormalTree"]:::path
  BR ==> B000["brand_000: CountTree count=1000"]:::target
  BR ==> B001["brand_001: CountTree count=1000"]:::target
  BR -.-> BMore["brand_002 ... brand_099"]:::faded

  SDK["Verifier returns Entries([<br/>(&quot;brand_000&quot;, 1000),<br/>(&quot;brand_001&quot;, 1000)<br/>])"]:::sdk

  B000 -.-> SDK
  B001 -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#6e7681,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef faded fill:#21262d,color:#6e7681,stroke:#484f58;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 1 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

Identical to Q5's Layer-5+ diagram — same merk ops, same byBrand binary tree, same two KVValueHashFeatureTypeWithChildHash targets. The only difference is what the verifier returns at the end (Entries(...) instead of Aggregate(2000)); the per-layer structure is unchanged. See chapter 29 for the diagram.

G2 — In on byColor, Grouped By color

select   = COUNT
where    = color IN ["color_00000000", "color_00000001"]
group_by = [color]
prove    = true

Path query (identical to Q6):

path:         ["@", contract_id, 0x01, "widget", "color"]
query items:  [Key("color_00000000"), Key("color_00000001")]

Verified payload:

Entries([
  ("color_00000000", CountTree { count_value_or_default: 100 }),
  ("color_00000001", CountTree { count_value_or_default: 100 }),
])

Proof size: 1 381 B. Byte-identical to Q6 — same path query, same ProvableCountTree-style boundary commitments (KVHashCount ops carry running counts even though the SDK doesn't read them for this point lookup). The single difference from G1 is the underlying property-name tree type (ProvableCountTree for byColor vs NormalTree for byBrand); that affects the merk-boundary commitments but not the dispatcher's GroupByIn-vs-Aggregate routing.

For the verbatim proof display, see Q6 in chapter 29.

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> CO["color: ProvableCountTree"]:::path
  CO ==> C000["color_00000000: CountTree count=100"]:::target
  CO ==> C001["color_00000001: CountTree count=100"]:::target
  CO -.-> CMore["color_00000002 ... color_00000999"]:::faded

  SDK["Verifier returns Entries([<br/>(&quot;color_00000000&quot;, 100),<br/>(&quot;color_00000001&quot;, 100)<br/>])"]:::sdk
  C000 -.-> SDK
  C001 -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#d29922,color:#0d1117,stroke:#1f6feb,stroke-width:2px;
  classDef faded fill:#21262d,color:#6e7681,stroke:#484f58;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 1 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

Identical to Q6's Layer-5+ diagram. The byColor ProvableCountTree at L6 carries the same KVHashCount running counts; the SDK ignores them for point-lookup group_by and reads only the two resolved targets' count_value_or_default.

G3 — Compound In + Equal, Grouped By brand

select   = COUNT
where    = brand IN ["brand_000", "brand_001"] AND color == "color_00000500"
group_by = [brand]
prove    = true

Path query (per-In compound resolution — outer Query on byBrand, inner subquery on byBrandColor's color terminator):

path:               ["@", contract_id, 0x01, "widget", "brand"]
query items:        [Key("brand_000"), Key("brand_001")]
subquery_path:      ["color"]
subquery items:     [Key("color_00000500")]

Verified payload:

Entries([
  ("brand_000", CountTree { count_value_or_default: 1 }),
  ("brand_001", CountTree { count_value_or_default: 1 }),
])

Each (brand, "color_00000500") pair has exactly 1 document in the bench's deterministic schedule.
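The per-group counts in G1 to G3 follow from the fixture's shape: 100 brands × 1 000 colors with one document per pair. The sketch below rebuilds a stand-in fixture with that shape and counts the G3 query by brute force. The modulo / division schedule and the function names are hypothetical; the bench's actual deterministic schedule is not shown in this chapter.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the fixture: 100 brands x 1 000 colors,
/// one document per (brand, color) pair, 100 000 documents in total.
fn fixture() -> Vec<(String, String)> {
    (0..100_000u32)
        .map(|i| (format!("brand_{:03}", i % 100), format!("color_{:08}", i / 100)))
        .collect()
}

/// Brute-force count per brand for `brand IN in_brands AND color == color`.
/// A BTreeMap keeps the groups in lexicographic key order.
fn count_by_brand(
    docs: &[(String, String)],
    in_brands: &[&str],
    color: &str,
) -> BTreeMap<String, u64> {
    let mut groups = BTreeMap::new();
    for (brand, c) in docs {
        if c == color && in_brands.contains(&brand.as_str()) {
            *groups.entry(brand.clone()).or_insert(0) += 1;
        }
    }
    groups
}
```

Calling `count_by_brand(&fixture(), &["brand_000", "brand_001"], "color_00000500")` gives two groups with a count of 1 each, which is what the verified payload above reports. The count proof never scans documents this way; the scan only checks the expected numbers.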

Proof size: 2 842 B. Mode: CountMode::GroupByIn over the byBrandColor compound index.

Proof display:

Structured proof (8 layers: two parallel brand-X → color → color_00000500 descents sharing L1–L6):
GroveDBProofV1 {
  LayerProof {
    proof: Merk(
      0: Push(Hash(HASH[bd291f29893fb6f6d6201087746ca1f23a178dd08e1346cb6c127e91ae3623b3]))
      1: Push(KVValueHash(@, Tree(4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289), HASH[4a5a28cb1b40226aa35b2f0d502767df13268bdf4678627dbfde26a557acdf73]))
      2: Parent
      3: Push(Hash(HASH[19c924989e473a90d0848277d0b1498ccc8db3dc870cbc130e773f3d79ea5b71]))
      4: Child)
    lower_layers: {
      @ => {
        LayerProof {
          proof: Merk(
            0: Push(KVValueHash(0x4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289, Tree(01), HASH[5b90e1e952b7eef903cc9db2d9098e334a37f7e08cade52c6b2ea3bf4b56b645])))
          lower_layers: {
            0x4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289 => {
              LayerProof {
                proof: Merk(
                  0: Push(Hash(HASH[49e7191075272395ed72cf03e973987ede6e4945e08574fe77d725f4ce7ecdf8]))
                  1: Push(KVValueHash(0x01, Tree(776964676574), HASH[5d9a0fad8a3f32560f8e8950c1e84a7feabaab21b79bc72fec4482442844e2ef]))
                  2: Parent)
                lower_layers: {
                  0x01 => {
                    LayerProof {
                      proof: Merk(
                        0: Push(KVValueHash(widget, Tree(6272616e64), HASH[6c505f53f2ebf3de030cc2aca463d4b429aeb320a9fadb8ae68bb7903a22bb68])))
                      lower_layers: {
                        widget => {
                          LayerProof {
                            proof: Merk(
                              0: Push(Hash(HASH[9862894b16a0792688fdcf64edcb2ceade5c8b234649bfc6cfc6426869b0e9d9]))
                              1: Push(KVValueHash(brand, Tree(6272616e645f303633), HASH[68b697da99d6ea70a83eb41794dca7ba3938d0ba98fbfaeb3cd0c19b3b5d0ff2]))
                              2: Parent
                              3: Push(Hash(HASH[6c36729e93b1a316cbf60fe282eb630c0ed6e45db088e365110302b6c9caba86]))
                              4: Child)
                            lower_layers: {
                              brand => {
                                LayerProof {
                                  proof: Merk(
                                    0: Push(KVValueHash(brand_000, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[90ff6f6d9a3d901195982128130677243bfd27b75736206f3c8400966ef0d37b]))
                                    1: Push(KVValueHash(brand_001, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[484ca11fb4ec8f479be1f78af903ce0c9d4fe630517579fb0172c2576d6b9652]))
                                    2: Parent
                                    3: Push(Hash(HASH[8ca09dadc802a7efe03534ce4ad991b2f191f368878754a37b5e5c03d9498dab]))
                                    4: Child
                                    5: Push(KVHash(HASH[e5297b3ebe81c6435c29f712074da5f7c90265e12ed3d4f5af1f6d900e50c9f1]))
                                    6: Parent
                                    7: Push(Hash(HASH[50f373fd01dea89c992779764dff82cc7200b492be8f5cf3721627d5323bcbff]))
                                    8: Child
                                    9: Push(KVHash(HASH[cf78c9f1b1a1204bb2e437806f52c21e331392de3436388572bd1fa4bce1cdc7]))
                                    10: Parent
                                    11: Push(Hash(HASH[4a8dc186a95c8c4a1252fb51dbc407727f588eb5bdc8313c96f5c29889e13926]))
                                    12: Child
                                    13: Push(KVHash(HASH[d00ee7653e34e47d46004929b13ded33dff069ed9cc88342cecdf66a65fd8401]))
                                    14: Parent
                                    15: Push(Hash(HASH[7f1d17b9632f0bd440dacf5e841025482bc1d8145df3650301a95a5ee71ce8c8]))
                                    16: Child
                                    17: Push(KVHash(HASH[3ed48a5e35cb7546d329487b0e1ab8a81d7c5bec358c37449e6cbd956e3bb069]))
                                    18: Parent
                                    19: Push(Hash(HASH[eaef9fc530408393bc321409414814b290309a861f474a925a922250327affc6]))
                                    20: Child
                                    21: Push(KVHash(HASH[f776417ede76e6194706e483ac14ab7b3db6aa0461ec14ed5f8e5d20071363af]))
                                    22: Parent
                                    23: Push(Hash(HASH[b3fccba79c14fcc5e97ff6a3cd051228dc755e6de147bef690ba9681264b2b9f]))
                                    24: Child)
                                  lower_layers: {
                                    brand_000 => {
                                      LayerProof {
                                        proof: Merk(
                                          0: Push(Hash(HASH[d605b4b78e674fd77371ea6adb32ce3e58ee3b96d73c4d34df84159661634587]))
                                          1: Push(KVValueHash(color, NonCounted(ProvableCountTree(636f6c6f725f3030303030353131, 1000, flags: [0, 0, 0])), HASH[fccc0c94657f2a78084f789bb6f687c4bba295e3a062f3199bc33f14dd2b7fe2]))
                                          2: Parent)
                                        lower_layers: {
                                          color => {
                                            LayerProof {
                                              proof: Merk(
                                                ... 37 ops — same boundary shape as Q4 / Q8's L8,
                                                terminating at op 18 with
                                                Push(KVValueHashFeatureTypeWithChildHash(
                                                  color_00000500, CountTree(00, 1, ...),
                                                  HASH[6834...], ProvableCountedMerkNode(1),
                                                  HASH[840c...]))
                                                — TARGET 1
                                              )
                                            }
                                          }
                                        }
                                      }
                                    }
                                    brand_001 => {
                                      LayerProof {
                                        proof: Merk(
                                          0: Push(Hash(HASH[f54769bf6e9d24b9dba53ebd37c9ceb3485b3c6511f8de6f17860676fe4d9331]))
                                          1: Push(KVValueHash(color, NonCounted(ProvableCountTree(636f6c6f725f3030303030353131, 1000, flags: [0, 0, 0])), HASH[8f883171c33df0aba2541a5b9d6195faac7bd1ffef93e8ddcaf9d092f0fa5e19]))
                                          2: Parent)
                                        lower_layers: {
                                          color => {
                                            LayerProof {
                                              proof: Merk(
                                                ... 37 ops — same boundary shape as brand_000's
                                                color subtree, terminating at op 18 with
                                                Push(KVValueHashFeatureTypeWithChildHash(
                                                  color_00000500, CountTree(00, 1, ...),
                                                  HASH[881d...], ProvableCountedMerkNode(1),
                                                  HASH[a422...]))
                                                — TARGET 2
                                              )
                                            }
                                          }
                                        }
                                      }
                                    }
                                  }
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

The two parallel descents below brand are the structurally novel part; every other layer above brand is byte-identical to Q4. The byBrand layer (L6) inlines brand_000 and brand_001 as adjacent KVValueHash nodes (ops 0–2, with brand_000 attached as brand_001's left child), then descends via the lower_layers map into each one's value-tree continuation. Each continuation (L7) carries a single color key whose value is NonCounted(ProvableCountTree(…)), the byBrandColor terminator. The terminator (L8) walks the boundary path through its binary merk tree of colors to land at color_00000500 with CountTree count=1 and a feature-typed child hash.

The bulk of the proof bytes (≈ 2 × 1 100 B = 2 200 B) is the doubled L7+L8 descent. The L1–L6 prefix amortises across both branches (≈ 600 B shared), giving 2 842 B total — significantly less than 2× Q4's 1 911 B because the upper layers aren't repeated.
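The size estimate above is a shared prefix plus one descent per In branch. As a rough model only (the function name is hypothetical, and the 600 B and 1 100 B inputs are the approximations quoted in the paragraph, not measured per-layer sizes):

```rust
/// Rough proof-size model for a per-In compound proof:
/// one shared upper-layer prefix plus one lower descent per In branch.
fn estimate_proof_bytes(shared_prefix: u64, per_branch: u64, branches: u64) -> u64 {
    shared_prefix + per_branch * branches
}

/// Naive alternative: a complete single-branch proof repeated per branch,
/// with no sharing of the upper layers.
fn unshared_proof_bytes(single_proof: u64, branches: u64) -> u64 {
    single_proof * branches
}
```

With the chapter's figures, `estimate_proof_bytes(600, 1_100, 2)` gives 2 800 B, close to the measured 2 842 B, while `unshared_proof_bytes(1_911, 2)` gives 3 822 B for two independent Q4 proofs.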

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> BR["brand: NormalTree"]:::path
  BR ==> B000["brand_000: CountTree count=1000"]:::path
  BR ==> B001["brand_001: CountTree count=1000"]:::path
  B000 ==> B000_C["color: NonCounted(ProvableCountTree)"]:::path
  B001 ==> B001_C["color: NonCounted(ProvableCountTree)"]:::path
  B000_C ==> T1["color_00000500: CountTree count=1"]:::target
  B001_C ==> T2["color_00000500: CountTree count=1"]:::target

  SDK["Verifier returns Entries([<br/>(&quot;brand_000&quot;, 1),<br/>(&quot;brand_001&quot;, 1)<br/>])"]:::sdk
  T1 -.-> SDK
  T2 -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#6e7681,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 1 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;
  linkStyle 3 stroke:#1f6feb,stroke-width:3px;
  linkStyle 4 stroke:#1f6feb,stroke-width:3px;
  linkStyle 5 stroke:#1f6feb,stroke-width:3px;
  linkStyle 6 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

Layers 5–6 are like Q4's L5 + Q5's L6 combined (one KVValueHash per In brand at byBrand's binary tree); Layers 7–8 fork — one brand_000-rooted continuation chain and one brand_001-rooted chain — each shaped exactly like Q4's L7 + L8 descent.

flowchart TB
  subgraph L5["Layer 5 — widget doctype merk-tree"]
    direction TB
    L5_q["<b>brand</b><br/>kv_hash=HASH[68b6...]<br/>value: Tree (descent into byBrand)"]:::queried
    L5_left["HASH[9862...]"]:::sibling
    L5_right["HASH[6c36...]"]:::sibling
    L5_q --> L5_left
    L5_q --> L5_right
  end

  subgraph L6["Layer 6 — byBrand merk-tree (TWO INTERMEDIATE TARGETS)"]
    direction TB
    L6_t1["<b>brand_001</b><br/>kv_hash=HASH[484c...]<br/>value: CountTree count=1000"]:::queried
    L6_t0["<b>brand_000</b><br/>kv_hash=HASH[90ff...]<br/>value: CountTree count=1000"]:::queried
    L6_boundary["Boundary commitments (22 merk ops):<br/>5 KVHash sibling brands + 6 Hash subtrees<br/>+ 11 Parent/Child ops"]:::sibling
    L6_t1 --> L6_t0
    L6_t1 --> L6_boundary
  end

  subgraph L7a["Layer 7a — brand_000's continuation merk-tree"]
    direction TB
    L7a_q["<b>color</b><br/>kv_hash=HASH[fccc...]<br/>value: NonCounted(ProvableCountTree)"]:::queried
    L7a_left["HASH[d605...]"]:::sibling
    L7a_q --> L7a_left
  end

  subgraph L7b["Layer 7b — brand_001's continuation merk-tree"]
    direction TB
    L7b_q["<b>color</b><br/>kv_hash=HASH[8f88...]<br/>value: NonCounted(ProvableCountTree)"]:::queried
    L7b_left["HASH[f547...]"]:::sibling
    L7b_q --> L7b_left
  end

  subgraph L8a["Layer 8a — brand_000's byBrandColor color subtree (TARGET 1)"]
    direction TB
    L8a_target["<b>color_00000500</b><br/>kv_hash=HASH[6834...]<br/>value: <b>CountTree count=1</b><br/>feature: ProvableCountedMerkNode(1)"]:::target
    L8a_boundary["37 merk ops:<br/>9 KVHashCount boundary commitments<br/>(running counts 3, 7, 15, 31, 63, 127, 255, 511, 1000)<br/>+ subtree hashes"]:::sibling
    L8a_target --> L8a_boundary
  end

  subgraph L8b["Layer 8b — brand_001's byBrandColor color subtree (TARGET 2)"]
    direction TB
    L8b_target["<b>color_00000500</b><br/>kv_hash=HASH[881d...]<br/>value: <b>CountTree count=1</b><br/>feature: ProvableCountedMerkNode(1)"]:::target
    L8b_boundary["37 merk ops:<br/>same boundary shape as L8a<br/>(different hashes — different brand's subtree)"]:::sibling
    L8b_target --> L8b_boundary
  end

  L5_q -. "Tree(merk_root[byBrand])" .-> L6_t1
  L6_t0 -. "CountTree continuation" .-> L7a_q
  L6_t1 -. "CountTree continuation" .-> L7b_q
  L7a_q -. "NonCounted(ProvableCountTree)" .-> L8a_target
  L7b_q -. "NonCounted(ProvableCountTree)" .-> L8b_target

  classDef queried fill:#1f6feb,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef sibling fill:#6e7681,color:#fff,stroke:#6e7681;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;

The two parallel byBrandColor descents share their L1–L6 commitments (the doctype prefix + byBrand merk root) but each gets its own L7 + L8 sub-proof. Proof bytes ≈ shared upper layers + 2 × per-brand byBrandColor descent ≈ 2 842 B.

G4 — Range on byColor, Grouped By color

GroupByRange is the proof primitive that enumerates distinct in-range keys with a count per key, as opposed to chapter 29's AggregateCountOnRange which collapses the same range to a single u64.

select   = COUNT
where    = color > "color_00000500"
group_by = [color]
prove    = true

Path query (uses distinct_count_path_query with limit=100, left_to_right=true):

path:         ["@", contract_id, 0x01, "widget", "color"]
query items:  [RangeAfter("color_00000500"..)]
limit:        100

Verified payload:

Entries(100 groups, sum = 10 000)

The 100 groups are color_00000501 through color_00000600 (the first 100 in-range colors in lex-asc order, capped by the limit). Each carries count_value_or_default = 100 since the fixture's deterministic schedule gives each color exactly 100 documents.

Q7 reported 499 distinct in-range colors and sum = 49 900 for the same color > "color_00000500" predicate, so why does G4 see only 100 groups summing to 10 000? Because GroupByRange's distinct_count_path_query applies the 100-entry response cap (Some(limit) in execute_distinct_count_with_proof). Without that cap the proof would scale linearly with the full in-range distinct count (~55 KB for the full 499 colors at ~110 B per resolved CountTree branch). The cap is a response-size safety control: the walk stops once 100 entries have been collected.
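The difference between the capped per-key walk and the uncapped aggregate can be reproduced on an ordinary ordered map. This is a simulation of the results only, not of the proof machinery: a `BTreeMap` stands in for the byColor index, and the function names are hypothetical.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical stand-in for the byColor index: 1 000 colors, 100 docs each.
fn by_color() -> BTreeMap<String, u64> {
    (0..1000u32).map(|i| (format!("color_{:08}", i), 100u64)).collect()
}

/// Bounds for the exclusive-lower, open-upper range `key > floor`.
fn after(floor: &str) -> (Bound<&str>, Bound<&str>) {
    (Bound::Excluded(floor), Bound::Unbounded)
}

/// Per-key counts for `color > floor` in ascending key order,
/// truncated to `limit` groups (the G4-style result).
fn group_by_range(index: &BTreeMap<String, u64>, floor: &str, limit: usize) -> Vec<(String, u64)> {
    index
        .range::<str, _>(after(floor))
        .take(limit)
        .map(|(k, v)| (k.clone(), *v))
        .collect()
}

/// One total for the same predicate with no cap (the Q7-style result).
fn aggregate_on_range(index: &BTreeMap<String, u64>, floor: &str) -> u64 {
    index.range::<str, _>(after(floor)).map(|(_, v)| *v).sum()
}
```

On this stand-in, `group_by_range(&by_color(), "color_00000500", 100)` returns color_00000501 through color_00000600 with a sum of 10 000, while `aggregate_on_range` over the same floor returns 49 900 across all 499 in-range colors.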

Proof size: 10 992 B — ~5.3 × Q7. The structural reason:

  • Q7 (AggregateCountOnRange) walks the boundary of the range and emits one HashWithCount or KVDigestCount per merk-binary-tree boundary node. Total boundary nodes ≈ O(log C) (≈ 36 ops on the 1 000-color tree). The verifier sums subtree counts directly without descending into individual keys.
  • G4 (GroupByRange) walks the distinct in-range colors themselves — emitting one KVValueHashFeatureTypeWithChildHash(color_X, CountTree count=100, ProvableCountedMerkNode(…), …) per distinct color in the range, not just per merk-tree boundary node. Total ops ≈ O(R) where R is the distinct in-range colors (capped at 100 here).

The trade-off is exactly what you'd expect: AggregateCountOnRange is O(log C) in proof bytes but loses per-key resolution (returns one u64); GroupByRange is O(R) in proof bytes but preserves per-key counts.
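The O(log C) versus O(R) gap can be seen in a small counting model. The sketch below assumes a perfectly balanced binary search tree over sorted key indices and counts only revealed nodes, not Parent/Child glue, so its numbers are not the op counts quoted above. `aggregate_nodes` and `group_by_targets` are hypothetical helpers, not part of grovedb.

```rust
/// Nodes a toy aggregate-count proof reveals on a balanced binary search
/// tree over sorted key indices `lo..hi` (hi exclusive). Keys with
/// index > `floor` are in range (strict `>`).
fn aggregate_nodes(lo: usize, hi: usize, floor: usize) -> usize {
    if lo >= hi {
        return 0; // empty subtree contributes nothing
    }
    // Entirely in range or entirely out of range: one opaque commitment.
    if lo > floor || hi - 1 <= floor {
        return 1;
    }
    // The subtree straddles the cut: reveal this node, recurse on both sides.
    let mid = lo + (hi - lo) / 2;
    1 + aggregate_nodes(lo, mid, floor) + aggregate_nodes(mid + 1, hi, floor)
}

/// Nodes a toy group-by proof reveals: one per in-range key, capped at `limit`.
fn group_by_targets(total_keys: usize, floor: usize, limit: usize) -> usize {
    total_keys.saturating_sub(floor + 1).min(limit)
}
```

In this model only one subtree per level can straddle the cut, which is why the aggregate count stays logarithmic while the group-by count tracks the number of in-range keys until the limit applies.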

Proof display:

Structured proof (5 layers; bottom layer enumerates 100 distinct in-range colors as `KVValueHashFeatureTypeWithChildHash` targets, each carrying `CountTree count=100`):
GroveDBProofV1 {
  LayerProof {
    proof: Merk(
      0: Push(Hash(HASH[bd291f29893fb6f6d6201087746ca1f23a178dd08e1346cb6c127e91ae3623b3]))
      1: Push(KVValueHash(@, Tree(4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289), HASH[4a5a28cb1b40226aa35b2f0d502767df13268bdf4678627dbfde26a557acdf73]))
      2: Parent
      3: Push(Hash(HASH[19c924989e473a90d0848277d0b1498ccc8db3dc870cbc130e773f3d79ea5b71]))
      4: Child)
    lower_layers: {
      @ => {
        LayerProof {
          proof: Merk(
            0: Push(KVValueHash(0x4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289, Tree(01), HASH[5b90e1e952b7eef903cc9db2d9098e334a37f7e08cade52c6b2ea3bf4b56b645])))
          lower_layers: {
            0x4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289 => {
              LayerProof {
                proof: Merk(
                  0: Push(Hash(HASH[49e7191075272395ed72cf03e973987ede6e4945e08574fe77d725f4ce7ecdf8]))
                  1: Push(KVValueHash(0x01, Tree(776964676574), HASH[5d9a0fad8a3f32560f8e8950c1e84a7feabaab21b79bc72fec4482442844e2ef]))
                  2: Parent)
                lower_layers: {
                  0x01 => {
                    LayerProof {
                      proof: Merk(
                        0: Push(KVValueHash(widget, Tree(6272616e64), HASH[6c505f53f2ebf3de030cc2aca463d4b429aeb320a9fadb8ae68bb7903a22bb68])))
                      lower_layers: {
                        widget => {
                          LayerProof {
                            proof: Merk(
                              0: Push(Hash(HASH[9862894b16a0792688fdcf64edcb2ceade5c8b234649bfc6cfc6426869b0e9d9]))
                              1: Push(KVHash(HASH[a29ee8f206a253362b6da4fcacf8643ee8e5925cd979fcd449e5906f0f9f8be3]))
                              2: Parent
                              3: Push(KVValueHash(color, ProvableCountTree(636f6c6f725f3030303030353131, 100000), HASH[79569d595db75bbf2e9dca93a15c90b7eecf7b299632668ec410e2076d27f71c]))
                              4: Child)
                            lower_layers: {
                              color => {
                                LayerProof {
                                  proof: Merk(
                                    ... 18 boundary-descent ops walking the binary tree from
                                    root (color_00000511) leftward to the cut point ...
                                    18: Push(KVDigestCount(color_00000500, HASH[47b0ade5...], 100))
                                       // op 18: BOUNDARY (excluded by strict `>`)
                                    19: Push(KVValueHashFeatureTypeWithChildHash(color_00000501,
                                       CountTree(00, 100, flags: [0, 0, 0]),
                                       HASH[9146433eb6d43db2f109f5f7714146624bd646b27c7310f3c2cad7155eb7c741],
                                       ProvableCountedMerkNode(300),
                                       HASH[c285efb8724a488de916ce8301b06c197fc687b5b9b83a04bf3a026f1098d17a]))
                                       // op 19: TARGET 1
                                    20: Parent
                                    21: Push(KVValueHashFeatureTypeWithChildHash(color_00000502, CountTree(00, 100, ...)))
                                       // op 21: TARGET 2
                                    ... 98 more KVValueHashFeatureTypeWithChildHash targets
                                    (color_00000503 ... color_00000600), each emitting
                                    `CountTree count=100` plus its merk feature/child-hash glue,
                                    interleaved with Parent/Child ops walking the binary tree
                                    in lex-asc order. Every target shares the same shape:
                                    Push(KVValueHashFeatureTypeWithChildHash(
                                      color_XXXXXXXX,
                                      CountTree(00, 100, flags: [0, 0, 0]),
                                      HASH[...],
                                      ProvableCountedMerkNode(running_count_at_this_node),
                                      HASH[...]
                                    )) ...
                                    220: Push(KVValueHashFeatureTypeWithChildHash(color_00000600,
                                       CountTree(00, 100, ...))) // op 220: TARGET 100 (LAST)
                                    221..244: closing boundary ops — KVHashCount running
                                    counts (300, 700, 6300, 25500, 48800) and Hash subtrees
                                    proving the still-out-of-range portion to the right of
                                    color_00000600 covers the remainder of the merk root.)
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

That schematic gives the shape; the bench's [gproof] output (run cargo bench --bench document_count_worst_case and grep [gproof] G4) has all 245 ops verbatim. The compression in the chapter just elides the 100 KVValueHashFeatureTypeWithChildHash targets since they share the same structural template — only the key name, the leaf kv-hash, the running count, and the child-hash differ.

Why so many targets? Because GroupByRange must enumerate every in-range key with its CountTree value — the SDK needs each individual key→count pair, which the aggregate-style HashWithCount commitment hides. So the prover walks the merk binary tree's in-order traversal across the in-range portion (here, left-to-right starting just past color_00000500) and emits one KVValueHashFeatureTypeWithChildHash per distinct color it visits, until the response-size limit is reached.

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> CO["color: ProvableCountTree count=100000"]:::path
  CO -.-> C500["color_00000500 (boundary, excluded)"]:::faded
  CO ==> C501["color_00000501: CountTree count=100"]:::target
  CO ==> CMore["color_00000502 ... color_00000599<br/>(98 more in-range targets,<br/>each CountTree count=100)"]:::target
  CO ==> C600["color_00000600: CountTree count=100"]:::target
  CO -.-> CRest["color_00000601 ... color_00000999<br/>(beyond limit — opaque)"]:::faded

  SDK["Verifier returns Entries(100 groups):<br/>(&quot;color_00000501&quot;, 100),<br/>(&quot;color_00000502&quot;, 100),<br/>... (&quot;color_00000600&quot;, 100)"]:::sdk
  C501 -.-> SDK
  CMore -.-> SDK
  C600 -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#d29922,color:#0d1117,stroke:#1f6feb,stroke-width:2px;
  classDef faded fill:#21262d,color:#6e7681,stroke:#484f58;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;
  linkStyle 3 stroke:#1f6feb,stroke-width:3px;
  linkStyle 4 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

L5 is identical to Q3's / Q6's L5 (color queried under an opaque kv root in the widget doctype tree). L6 is the structural novelty: 245 merk ops, of which 100 are full KVValueHashFeatureTypeWithChildHash targets and the remaining 145 are boundary-walk glue (KVDigestCount / KVHashCount / HashWithCount / Hash + Parent/Child).
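The op accounting can be checked with a few constants. This sketch only restates the op index ranges from the schematic above; the constant and function names are illustrative.

```rust
/// Op counts for G4's layer-6 proof, read off the schematic's op indices.
const LEFT_DESCENT_OPS: usize = 18; // ops 0..=17
const CUT_OPS: usize = 1; // op 18, the excluded boundary key
const TARGET_SPAN_OPS: usize = 220 - 19 + 1; // ops 19..=220: targets plus Parent/Child
const RIGHT_CLOSING_OPS: usize = 244 - 221 + 1; // ops 221..=244
const TARGETS: usize = 100;

/// Total merk ops in the layer.
fn total_ops() -> usize {
    LEFT_DESCENT_OPS + CUT_OPS + TARGET_SPAN_OPS + RIGHT_CLOSING_OPS
}

/// Everything that is not a full target counts as boundary-walk glue.
fn glue_ops() -> usize {
    total_ops() - TARGETS
}
```

Of the glue, `TARGET_SPAN_OPS - TARGETS` ops sit between the targets themselves; the rest belong to the left descent, the cut, and the right closing boundary.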

flowchart TB
  subgraph L5["Layer 5 — widget doctype merk-tree (proof view for `color`)"]
    direction TB
    L5_root["KVHash[a29e...]<br/>(opaque kv root)"]:::sibling
    L5_left["HASH[9862...]"]:::sibling
    L5_q["<b>color</b><br/>kv_hash=HASH[7956...]<br/>value: ProvableCountTree count=100000"]:::queried
    L5_root --> L5_left
    L5_root --> L5_q
  end

  subgraph L6["Layer 6 — byColor ProvableCountTree merk-tree (100 in-range targets)"]
    direction TB
    L6_boundary_l["Left boundary descent (18 ops):<br/>walks from merk root color_00000511<br/>through KVHashCount running counts<br/>(51100, 25500, 12700, 6300, 3100, 700)<br/>down to color_00000500"]:::sibling
    L6_cut["op 18: KVDigestCount(color_00000500, ..., 100)<br/>(boundary — excluded by strict `>`)"]:::boundary
    L6_targets["ops 19..220: 100 in-range targets<br/>color_00000501 (count=100), color_00000502 (100),<br/>color_00000503 (100), ... color_00000600 (100)<br/>each as KVValueHashFeatureTypeWithChildHash<br/>with ProvableCountedMerkNode(subtree_count)<br/>interleaved with Parent/Child glue"]:::target
    L6_boundary_r["Right closing boundary (24 ops):<br/>KVHashCount running counts<br/>(300, 700, 6300, 25500, 48800)<br/>+ Hash subtree commitments<br/>covering color_00000601 ... color_00000999"]:::sibling

    L6_boundary_l --> L6_cut
    L6_cut --> L6_targets
    L6_targets --> L6_boundary_r
  end

  L5_q -. "ProvableCountTree(merk_root[byColor])" .-> L6_boundary_l

  classDef queried fill:#1f6feb,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef sibling fill:#6e7681,color:#fff,stroke:#6e7681;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef boundary fill:#d29922,color:#0d1117,stroke:#d29922,stroke-width:2px,stroke-dasharray: 6 3;

Three things this diagram makes explicit:

  1. The cut is named. op 18: KVDigestCount(color_00000500, ..., 100) exposes the key at the boundary so the verifier knows the cut sits exactly between color_00000500 (excluded) and color_00000501 (first in-range). Without that named op, a malicious prover could shift the cut and the verifier wouldn't know.
  2. Targets carry their own count, not a running total. Unlike Q7's boundary commitments (where ProvableCountedMerkNode(N) carried a subtree count), G4's targets are individual keys with CountTree(00, 100, ...) — the count_value_or_default = 100 IS the per-key count, not a subtree aggregate. The ProvableCountedMerkNode(N) on the merk feature still carries the subtree count (e.g. 300 for color_00000501's subtree), but G4's verifier reads count_value_or_default directly from the CountTree element.
  3. The right closing boundary doesn't enumerate the rest. Once the limit is hit at color_00000600, the proof commits the remaining ~399 in-range colors as opaque subtree hashes (KVHashCount + Hash ops). The SDK returns only the 100 visible groups; the remainder are provably present but not enumerated. This is the limit's whole point — bound response size without sacrificing soundness on the visible groups.

G5 — Compound In + Range, Grouped By brand, color

select   = COUNT
where    = brand IN ["brand_000", "brand_001"] AND color > "color_00000500"
group_by = [brand, color]
prove    = true

Path query (outer In on byBrand fans out to per-brand distinct_count_path_query on byBrandColor's color terminator):

outer path:         ["@", contract_id, 0x01, "widget", "brand"]
outer query items:  [Key("brand_000"), Key("brand_001")]
subquery_path:      ["color"]
subquery items:     [RangeAfter("color_00000500"..)]
subquery limit:     100 (shared across both brands)

Verified payload:

Entries(100 groups, sum = 100)

Two brands × 50 in-range colors per brand = 100 distinct (brand, color) groups visible in the proof. Each (brand_X, color_Y) pair has exactly 1 document by the fixture's deterministic schedule.

Proof size: 11 554 B. Mode: CountMode::GroupByCompound.

This is the most general group-by shape supported on this contract: outer In fan-out × inner GroupByRange walk. Structurally it combines G3's two-branch descent with G4's in-range enumeration per branch. Proof bytes ≈ shared upper-layer descent + 2 × per-brand byBrandColor distinct-walk. The bench's group_by_compound_in_range_proof_limit_100 benchmark uses the same shape with |IN| = 100 brands instead of 2 — yielding 17 256 B at the much higher fan-out.

Proof display:

Structured proof (8 layers; same descent skeleton as G3, but each brand's L8 enumerates 50 in-range colors instead of one point-lookup target):
GroveDBProofV1 {
  LayerProof {
    proof: Merk(
      0: Push(Hash(HASH[bd291f29893fb6f6d6201087746ca1f23a178dd08e1346cb6c127e91ae3623b3]))
      1: Push(KVValueHash(@, Tree(4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289), HASH[4a5a28cb1b40226aa35b2f0d502767df13268bdf4678627dbfde26a557acdf73]))
      2: Parent
      3: Push(Hash(HASH[19c924989e473a90d0848277d0b1498ccc8db3dc870cbc130e773f3d79ea5b71]))
      4: Child)
    lower_layers: {
      @ => { LayerProof { ... contract_id descent ... } }
      // L2..L4 identical to G3 / Q4's first three subgroves
    }
  }
  // L5 widget doctype merk tree: same as G3 — `brand` queried, opaque siblings 9862 / 6c36
  // L6 byBrand merk tree: two KVValueHash targets (brand_000 + brand_001), 25 boundary ops
  // L7a brand_000's value tree: single key `color` with NonCounted(ProvableCountTree(...))
  //   L8a byBrandColor's color subtree (under brand_000):
  //     proof: Merk(
  //       ... 18 boundary-descent ops walking from the merk root down to color_00000500 ...
  //       18: Push(KVDigestCount(color_00000500, HASH[...], 1))     // BOUNDARY, excluded
  //       19: Push(KVValueHashFeatureTypeWithChildHash(color_00000501,
  //              CountTree(00, 1, flags: [0, 0, 0]),
  //              HASH[4192...], ProvableCountedMerkNode(3), HASH[c3b4...])) // TARGET (brand_000, color_00000501)
  //       21: Push(KVValueHashFeatureTypeWithChildHash(color_00000502, CountTree(00, 1, ...))) // TARGET 2
  //       24: Push(KVValueHashFeatureTypeWithChildHash(color_00000503, CountTree(00, 1, ...))) // TARGET 3
  //       ... 47 more KVValueHashFeatureTypeWithChildHash targets, each CountTree(00, 1, ...)
  //           — color_00000504 ... color_00000550 (50 per-brand_000 targets total) ...
  //       ... closing boundary ops covering color_00000551 ... color_00000999 for brand_000
  //     )
  //   end L8a
  // end L7a
  // L7b brand_001's value tree: identical structure to L7a, single key `color`
  //   L8b byBrandColor's color subtree (under brand_001):
  //     proof: Merk(
  //       ... 18 boundary-descent ops (different hashes — different brand's subtree) ...
  //       18: Push(KVDigestCount(color_00000500, HASH[...], 1))
  //       19..220: 50 in-range KVValueHashFeatureTypeWithChildHash(color_X, CountTree(00, 1, ...)) targets
  //                + interleaved Parent/Child glue + closing boundary ops
  //     )
  //   end L8b
  // end L7b
  // end L6
}

The 344-line verbatim is available via the bench's [gproof] G5 output. The schematic compresses the 50 per-brand KVValueHashFeatureTypeWithChildHash targets at L8a / L8b — they all share the same template (CountTree(00, 1, ...) since each (brand, color) pair has count=1), differing only in key, leaf kv-hash, running count, and child-hash. Once you've seen G3's L8 structure (single target) and G4's L6 structure (100 in-range targets at the doctype level), G5 is precisely the product: two parallel G3-shaped descents that each terminate in a G4-shaped distinct-walk.

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> BR["brand: NormalTree"]:::path
  BR ==> B000["brand_000: CountTree count=1000"]:::path
  BR ==> B001["brand_001: CountTree count=1000"]:::path

  B000 ==> B000_C["brand_000/color: NonCounted(ProvableCountTree)"]:::path
  B001 ==> B001_C["brand_001/color: NonCounted(ProvableCountTree)"]:::path

  B000_C ==> T000_501["color_00000501: CountTree count=1"]:::target
  B000_C ==> T000_more["... 48 more color targets<br/>(brand_000, color_00000502..549)"]:::target
  B000_C ==> T000_550["color_00000550: CountTree count=1"]:::target

  B001_C ==> T001_501["color_00000501: CountTree count=1"]:::target
  B001_C ==> T001_more["... 48 more color targets<br/>(brand_001, color_00000502..549)"]:::target
  B001_C ==> T001_550["color_00000550: CountTree count=1"]:::target

  SDK["Entries(100 groups, sum=100):<br/>(&quot;brand_000&quot;, &quot;color_00000501&quot;, 1),<br/>...<br/>(&quot;brand_001&quot;, &quot;color_00000550&quot;, 1)"]:::sdk

  T000_501 -.-> SDK
  T000_more -.-> SDK
  T000_550 -.-> SDK
  T001_501 -.-> SDK
  T001_more -.-> SDK
  T001_550 -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#6e7681,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 1 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;
  linkStyle 3 stroke:#1f6feb,stroke-width:3px;
  linkStyle 4 stroke:#1f6feb,stroke-width:3px;
  linkStyle 5 stroke:#1f6feb,stroke-width:3px;
  linkStyle 6 stroke:#1f6feb,stroke-width:3px;
  linkStyle 7 stroke:#1f6feb,stroke-width:3px;
  linkStyle 8 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

Layers 5–7 are exactly G3's L5–L7. The difference shows up at L8 — instead of a single target per brand (G3's compound point lookup), each brand's L8 walks 50 in-range colors via the same KVValueHashFeatureTypeWithChildHash enumeration G4 uses, plus the boundary descent / closing boundary glue.

flowchart TB
  subgraph L5["Layer 5 — widget doctype merk-tree"]
    direction TB
    L5_q["<b>brand</b> (queried)<br/>kv_hash=HASH[68b6...]"]:::queried
  end

  subgraph L6["Layer 6 — byBrand merk-tree (two intermediate targets)"]
    direction TB
    L6_t0["<b>brand_000</b> (queried)<br/>CountTree count=1000"]:::queried
    L6_t1["<b>brand_001</b> (queried)<br/>CountTree count=1000"]:::queried
  end

  subgraph L7a["Layer 7a — brand_000's continuation"]
    direction TB
    L7a_q["<b>color</b> (queried)<br/>NonCounted(ProvableCountTree)"]:::queried
  end
  subgraph L7b["Layer 7b — brand_001's continuation"]
    direction TB
    L7b_q["<b>color</b> (queried)<br/>NonCounted(ProvableCountTree)"]:::queried
  end

  subgraph L8a["Layer 8a — brand_000's byBrandColor distinct-walk"]
    direction TB
    L8a_targets["50 KVValueHashFeatureTypeWithChildHash targets:<br/>color_00000501 ... color_00000550<br/>each CountTree(00, 1, ...)<br/>+ left/right boundary glue"]:::target
  end
  subgraph L8b["Layer 8b — brand_001's byBrandColor distinct-walk"]
    direction TB
    L8b_targets["50 KVValueHashFeatureTypeWithChildHash targets:<br/>color_00000501 ... color_00000550<br/>each CountTree(00, 1, ...)<br/>+ left/right boundary glue<br/>(different hashes — different brand subtree)"]:::target
  end

  L5_q -. "byBrand" .-> L6_t0
  L5_q -. "byBrand" .-> L6_t1
  L6_t0 -. "continuation" .-> L7a_q
  L6_t1 -. "continuation" .-> L7b_q
  L7a_q -. "byBrandColor distinct-range" .-> L8a_targets
  L7b_q -. "byBrandColor distinct-range" .-> L8b_targets

  classDef queried fill:#1f6feb,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;

The 50-targets-per-brand limit reflects the shared response-size cap. In the 2-brand case the cap kicks in at 50 colors per brand; if the In set had 1 brand it would be 100 colors; if it had 4 brands it would be 25 each. The dispatcher slices the cap evenly across the In fan-out so the total number of returned entries equals the limit, regardless of how many In branches share it. That's why the bench's [matrix] row for this case shows Entries(len=100, sum=100) rather than len=200, sum=200.
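The even split can be sketched as below. The text only gives the evenly divisible cases (1, 2 and 4 brands), so the use of integer division is an assumption: how the real dispatcher treats a remainder is not stated here. `per_branch_limit` and `total_entries` are hypothetical helpers.

```rust
/// Split a shared response cap evenly across `branches` In values.
/// Assumption: plain integer division. Returns None for an empty In set.
fn per_branch_limit(shared_limit: usize, branches: usize) -> Option<usize> {
    if branches == 0 {
        None
    } else {
        Some(shared_limit / branches)
    }
}

/// Total entries returned when every branch has at least its share of
/// in-range keys available.
fn total_entries(shared_limit: usize, branches: usize) -> Option<usize> {
    per_branch_limit(shared_limit, branches).map(|n| n * branches)
}
```

With a shared limit of 100 and two branches, each branch gets 50 entries and the total stays at 100, which is the shape of the bench's [matrix] row.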

G6 — High-Fanout In on byBrand

select   = COUNT
where    = brand IN ["brand_000", "brand_001", ..., "brand_099"]
group_by = [brand]
prove    = true

Path query (same shape as G1, scaled to |IN| = 100):

path:         ["@", contract_id, 0x01, "widget", "brand"]
query items:  [Key("brand_000"), Key("brand_001"), ..., Key("brand_099")]

Verified payload:

Entries(100 groups, sum = 100 000)

Every document in the fixture, partitioned by brand. Each Entries[i] carries (brand_NNN, CountTree count=1000).
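As a toy illustration of the payload shape, the sketch below models byBrand as a BTreeMap from brand key to count and returns one entry per In value. It assumes the fixture shape from the text (100 brands, 1 000 documents each); `build_by_brand`, `group_by_in` and `aggregate` are illustrative names, not SDK functions.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the byBrand index: brand key -> document count.
fn build_by_brand() -> BTreeMap<String, u64> {
    (0..100u32)
        .map(|i| (format!("brand_{:03}", i), 1_000u64))
        .collect()
}

/// One (key, count) entry per In value that exists in the index,
/// in the order the In values were supplied.
fn group_by_in(index: &BTreeMap<String, u64>, keys: &[String]) -> Vec<(String, u64)> {
    keys.iter()
        .filter_map(|k| index.get(k).map(|n| (k.clone(), *n)))
        .collect()
}

/// The aggregate view of the same entries: collapse them to a single total.
fn aggregate(entries: &[(String, u64)]) -> u64 {
    entries.iter().map(|(_, n)| *n).sum()
}
```

The same entries serve both shapes: the group-by caller indexes them directly, while an aggregate caller would sum them.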

Proof size: 10 038 B. Mode: CountMode::GroupByIn.

Same structural shape as G1, scaled from |IN| = 2 to |IN| = 100. The byBrand merk binary tree at L6 emits all 100 brands as KVValueHashFeatureTypeWithChildHash targets, each ~100 B (key + leaf kv-hash + CountTree(00, 1000, ...) + BasicMerkNode feature + child-hash), plus minimal boundary glue at the binary-tree corners. The proof grows linearly with |IN|: G1 (|IN|=2) was 1 102 B; G6 (|IN|=100) is 10 038 B; the slope is (10 038 − 1 102) / 98 ≈ 91 B per additional In value.

Compare against the byColor equivalent (group_by_color_in_proof_100_rangecountable_branches, 10 512 B): the ProvableCountTree overhead from byColor's KVHashCount running counts adds ~5 % to the byBrand baseline, even though those running counts aren't consumed by a point-lookup group_by. This is the same ProvableCountTree overhead G2 carried at the smaller scale (|IN|=2).

Proof display:

Structured proof (5 layers; bottom layer enumerates 100 brands as `KVValueHashFeatureTypeWithChildHash` targets, 192 merk ops total at L6 including binary-tree glue):
GroveDBProofV1 {
  LayerProof {
    proof: Merk(
      0: Push(Hash(HASH[bd291f29893fb6f6d6201087746ca1f23a178dd08e1346cb6c127e91ae3623b3]))
      1: Push(KVValueHash(@, Tree(4ed22624752972af97fb71abf4067b23e6d296a61a02f35b2098819fde39d289), HASH[4a5a28cb1b40226aa35b2f0d502767df13268bdf4678627dbfde26a557acdf73]))
      2: Parent
      3: Push(Hash(HASH[19c924989e473a90d0848277d0b1498ccc8db3dc870cbc130e773f3d79ea5b71]))
      4: Child)
    lower_layers: {
      // L2..L4 are byte-identical to every other query in this chapter
      // (the @ / contract_id / 0x01 descent into widget); see chapter 29's
      // Q1 verbatim for the full L1..L4 chain.
      ...
      widget => {
        LayerProof {
          proof: Merk(
            // L5 widget doctype — `brand` queried, opaque siblings 9862 / 6c36
            0: Push(Hash(HASH[9862894b16a0792688fdcf64edcb2ceade5c8b234649bfc6cfc6426869b0e9d9]))
            1: Push(KVValueHash(brand, Tree(6272616e645f303633), HASH[68b697da99d6ea70a83eb41794dca7ba3938d0ba98fbfaeb3cd0c19b3b5d0ff2]))
            2: Parent
            3: Push(Hash(HASH[6c36729e93b1a316cbf60fe282eb630c0ed6e45db088e365110302b6c9caba86]))
            4: Child)
          lower_layers: {
            brand => {
              LayerProof {
                proof: Merk(
                  // L6 byBrand merk-tree — 100 targets + binary-tree glue
                  // (192 merk ops total; structurally a fully-resolved in-order
                  // traversal of all 100 brand entries in the byBrand merk tree)
                  0: Push(KVValueHashFeatureTypeWithChildHash(brand_000, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[90ff6f6d9a3d901195982128130677243bfd27b75736206f3c8400966ef0d37b], BasicMerkNode, HASH[19b58883c492e746861db1e6ad07529a5a91cc8330af522682486db9346d6875]))
                  1: Push(KVValueHashFeatureTypeWithChildHash(brand_001, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[484ca11fb4ec8f479be1f78af903ce0c9d4fe630517579fb0172c2576d6b9652], BasicMerkNode, HASH[0bf12023f8e067c12db4cec1583909a0283878d6d909c76196736299750b5879]))
                  2: Parent
                  3: Push(KVValueHashFeatureTypeWithChildHash(brand_002, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[4c19f047068654e71813dce7839a579edfdcb446e3d70efa1b8592c73259da16], BasicMerkNode, HASH[e8d5372904b7f4ac9334aeb4ddab619d9ad7a308732a4f231416e10208a0a356]))
                  ...
                  // 97 more KVValueHashFeatureTypeWithChildHash targets following
                  // the same template — brand_003 ... brand_099 — interleaved with
                  // Parent/Child ops glueing them into the byBrand merk binary tree.
                  // Every target shares the structure:
                  //   Push(KVValueHashFeatureTypeWithChildHash(
                  //     brand_NNN,
                  //     CountTree(636f6c6f72, 1000, flags: [0, 0, 0]),   // count_value=1000
                  //     HASH[<per-brand leaf kv-hash>],
                  //     BasicMerkNode,                                  // NormalTree (no count on the merk node)
                  //     HASH[<per-brand subtree child hash>]
                  //   ))
                  ...
                  189: Push(KVValueHashFeatureTypeWithChildHash(brand_097, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[92adee932cc12927cd76ad9fd25906bbfe547df2bf21e826845bb4d3b47f5314], BasicMerkNode, HASH[34b69e1e424aa023c74f61554db2823da6c19dcbc51bdd5dece32e3f6f9fd219]))
                  190: Parent
                  191: Push(KVValueHashFeatureTypeWithChildHash(brand_098, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[68e02fcf66f86797035fbc8d53290185fe3fed7de897a8654743cae4007c47c3], BasicMerkNode, HASH[acfc3a88b852e8895449b4c7e01f4b1cc25028e6a80e4915cdde578ff6eb029b]))
                  192: Push(KVValueHashFeatureTypeWithChildHash(brand_099, CountTree(636f6c6f72, 1000, flags: [0, 0, 0]), HASH[af9667a8f2a10a9402b3d1fb0ac6e0b64d1e3dde5b8829c03b8d2c9cfc94e16d], BasicMerkNode, HASH[d049fe7e250b7dd763a4a5daa4227dcd2e41733dd95fd0758641ac06c63c3b51]))
                  // + closing Parent/Child ops binding the last few entries
                )
              }
            }
          }
        }
      }
    }
  }
}

The 254-line full verbatim sits in the bench's [gproof] G6 output — same template (one KVValueHashFeatureTypeWithChildHash per brand, all with CountTree count=1000 and BasicMerkNode feature) repeating 100 times. The schematic above shows the first 3 and last 3 targets so the structural pattern is clear without reproducing 100 near-identical lines.

Key observation: BasicMerkNode (not ProvableCountedMerkNode) is the feature type on each L6 op. byBrand is a NormalTree, so its merk binary tree's internal nodes don't carry running counts — only the per-brand CountTree count=1000 values stored inside each brand's element matter. Contrast this with G6's byColor cousin (group_by_color_in_proof_100_rangecountable_branches, 10 512 B): there the L6 targets would carry ProvableCountedMerkNode(...) features because byColor IS a ProvableCountTree. The ~5 % size difference is exactly those count fields × 100 nodes.

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> BR["brand: NormalTree (100 entries)"]:::path
  BR ==> B000["brand_000: CountTree count=1000"]:::target
  BR ==> B001["brand_001: CountTree count=1000"]:::target
  BR ==> BMore["... 96 more in-range targets<br/>(brand_002 ... brand_097)"]:::target
  BR ==> B098["brand_098: CountTree count=1000"]:::target
  BR ==> B099["brand_099: CountTree count=1000"]:::target

  SDK["Entries(100 groups, sum=100 000):<br/>(&quot;brand_000&quot;, 1000),<br/>(&quot;brand_001&quot;, 1000),<br/>...<br/>(&quot;brand_099&quot;, 1000)"]:::sdk
  B000 -.-> SDK
  B099 -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#6e7681,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 1 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;
  linkStyle 3 stroke:#1f6feb,stroke-width:3px;
  linkStyle 4 stroke:#1f6feb,stroke-width:3px;
  linkStyle 5 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

Identical to G1's L5–L6 shape, just with all 100 entries in the byBrand merk tree resolved as visible targets rather than just two. The byBrand binary tree has all 100 keys exposed — no opaque sibling subtrees (Hash ops) at all, only KVValueHashFeatureTypeWithChildHash (full reveal) plus Parent / Child glue.

flowchart TB
  subgraph L5["Layer 5 — widget doctype merk-tree"]
    direction TB
    L5_q["<b>brand</b> (queried)<br/>kv_hash=HASH[68b6...]"]:::queried
    L5_left["HASH[9862...]"]:::sibling
    L5_right["HASH[6c36...]"]:::sibling
    L5_q --> L5_left
    L5_q --> L5_right
  end

  subgraph L6["Layer 6 — byBrand merk-tree (ALL 100 targets fully resolved)"]
    direction TB
    L6_t0["<b>brand_000</b><br/>CountTree count=1000<br/>BasicMerkNode"]:::target
    L6_t1["<b>brand_001</b><br/>CountTree count=1000"]:::target
    L6_tmid["... 97 more KVValueHashFeatureTypeWithChildHash<br/>targets, each CountTree count=1000<br/>(192 merk ops total: 100 Push + 92 Parent/Child)"]:::target
    L6_t99["<b>brand_099</b><br/>CountTree count=1000"]:::target

    L6_t0 --> L6_t1
    L6_t1 --> L6_tmid
    L6_tmid --> L6_t99
  end

  L5_q -. "Tree(merk_root[byBrand])" .-> L6_t0

  classDef queried fill:#1f6feb,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef sibling fill:#6e7681,color:#fff,stroke:#6e7681;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;

Because the In set covers every brand in the fixture, the proof has zero opaque-sibling subtree commitments at L6: every binary-tree node is revealed as a KVValueHashFeatureTypeWithChildHash target. That's the most efficient byte-per-key shape GroupByIn can hit: at |IN| = B (where B is the total entries in the property tree), the proof bytes ≈ B × (kv-hash + count + child-hash + glue) ≈ B × 100 B. For B = 100, that gives ≈ 10 000 B, in line with the 10 038 B we observe.

By contrast, smaller In sets (G1's |IN| = 2) pay the boundary-proof tax: the byBrand merk tree has ~98 unresolved entries, each contributing one KVHash (opaque-key commitment, ~33 B) or Hash (opaque-subtree commitment, ~33 B). The asymptotic crossover at which "reveal everything" becomes cheaper than "reveal-some-and-commit-the-rest" depends on the ratio of |IN| to B — for byBrand with B = 100, the crossover is around |IN| ≈ 50.

G7 — Carrier In + Range, Grouped By brand

select   = COUNT
where    = brand IN ["brand_000", "brand_001"] AND color > "color_00000500"
group_by = [brand]
prove    = true

Path query (carrier AggregateCountOnRange — outer Keys per In value, ACOR subquery over each brand's color subtree):

path:                  ["@", contract_id, 0x01, "widget", "brand"]
outer query items:     [Key("brand_000"), Key("brand_001")]
subquery_path:         ["color"]
subquery items:        [AggregateCountOnRange([RangeAfter("color_00000500"..)])]

Verified payload (verifier returns one (in_key, u64) per resolved In branch via GroveDb::verify_aggregate_count_query_per_key):

[("brand_000", 499), ("brand_001", 499)]

Each brand has all 1 000 colors in its byBrandColor terminator; the strict > cut at color_00000500 leaves color_00000501..color_00000999 = 499 in-range colors per brand. Total sum = 998 documents.
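The per-brand payload can be modelled with nested maps. This is a sketch under the fixture shape described in the text (every (brand, color) pair holds exactly one document); `ByBrandColor`, `build_fixture` and `count_per_brand` are illustrative names, and the real carrier query produces a proof rather than iterating an in-memory map.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// Toy stand-in for byBrandColor: brand -> (color -> document count).
type ByBrandColor = BTreeMap<String, BTreeMap<String, u64>>;

/// Build `brands` x `colors` pairs with one document per pair.
fn build_fixture(brands: u32, colors: u32) -> ByBrandColor {
    (0..brands)
        .map(|b| {
            let per_color: BTreeMap<String, u64> = (0..colors)
                .map(|c| (format!("color_{:08}", c), 1u64))
                .collect();
            (format!("brand_{:03}", b), per_color)
        })
        .collect()
}

/// One (brand, count) pair per In value present in the index, where the
/// count aggregates every color strictly after `floor`.
fn count_per_brand(index: &ByBrandColor, brands: &[&str], floor: &str) -> Vec<(String, u64)> {
    brands
        .iter()
        .filter_map(|brand| {
            let colors = index.get(*brand)?;
            let count: u64 = colors
                .range::<str, _>((Excluded(floor), Unbounded))
                .map(|(_, n)| *n)
                .sum();
            Some((brand.to_string(), count))
        })
        .collect()
}
```

Unlike the G5 shape, the inner range is collapsed to a single number per brand, so the result length equals the number of resolved In values, not the number of (brand, color) pairs.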

Proof size: 4 332 B. Mode: CountMode::GroupByIn routed to DocumentCountMode::RangeAggregateCarrierProof (the new dispatcher arm wired up against grovedb PR #663).

This is the natural answer to "give me a per-brand aggregate count over a colour range" — same per-In-aggregate semantics as the no-proof per-In fan-out, just verifiable in a single proof. Strictly smaller and asymptotically better than the alternative two-field shape G5:

  • G5 (compound distinct walk, group_by = [brand, color]): O(k · R' · log C') bytes; emits one KVValueHashFeatureTypeWithChildHash per resolved (brand, color) pair → 11 554 B for k=2, R'≈50. Carries per-pair granularity the caller may not want.
  • G7 (carrier aggregate, group_by = [brand]): O(k · (log B + log C')) bytes; emits one HashWithCount/KVDigestCount ACOR boundary walk per brand → 4 332 B for k=2, log C'≈10. ~2.7× smaller than G5 for the same input data, at the cost of losing per-color resolution (which the group_by = [brand] caller didn't ask for anyway).

The cost relative to Q8 (brand == X AND color > floor, the same shape with k=1 and group_by = []) grows linearly in k: Q8 is 2 656 B, G7 is 4 332 B for k=2. The slope (G7 − Q8) / 1 = +1 676 B per additional In branch matches what you'd expect: each brand adds its own L6 commit plus its own L7 + L8 ACOR boundary walk (≈ Q8's L7 + L8 ≈ ~1 700 B), with the L1–L5 prefix amortising once across all branches.
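
The arithmetic behind the slope and the G5 comparison is small enough to spell out. The sketch below only re-derives numbers from the byte counts quoted in this section; the constant and function names are illustrative, not identifiers from the bench.

```rust
// Proof sizes quoted in this chapter (bench measurements, not recomputed here).
const Q8_BYTES: u64 = 2_656; // k = 1, group_by = []
const G7_BYTES: u64 = 4_332; // k = 2, group_by = [brand]
const G5_BYTES: u64 = 11_554; // k = 2, group_by = [brand, color]

/// Bytes added by one extra In branch, taken from two measured points
/// that differ by exactly one branch.
fn per_branch_slope(bytes_k1: u64, bytes_k2: u64) -> u64 {
    bytes_k2 - bytes_k1
}

fn main() {
    let slope = per_branch_slope(Q8_BYTES, G7_BYTES);
    let ratio = G5_BYTES as f64 / G7_BYTES as f64;
    println!("slope = {} B per additional In branch", slope);
    println!("G5 / G7 = {:.2}", ratio);
}
```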

Proof display:

Structured proof (8 layers; same skeleton as G5, but each brand's L8 is an ACOR boundary walk instead of a 50-target distinct walk):
GroveDBProofV1 {
  LayerProof {
    proof: Merk(... root-level descent, identical to every other chapter query ...)
    lower_layers: {
      @ => { ... contract_id descent ... }
      // L2..L4 byte-identical to G3 / G5 (the @/contract_id/0x01/widget chain)
    }
  }
  // L5 widget doctype: brand queried (same as G3 / G5 — opaque siblings 9862 / 6c36)
  // L6 byBrand merk-tree: two KVValueHash targets (brand_000 + brand_001), 25 ops
  //                       — same shape as G5's L6
  // L7a brand_000's value tree: single key `color` with NonCounted(ProvableCountTree)
  //   L8a byBrandColor color subtree under brand_000:
  //     proof: Merk(
  //       ... 36-37 ACOR boundary ops over color > color_00000500 ...
  //       18: Push(KVDigestCount(color_00000500, ..., 1))          // BOUNDARY (excluded)
  //       19..35: HashWithCount / KVDigestCount boundary walk
  //                 — same shape as Q8's L8, summing to count=499 for brand_000)
  //   end L8a
  // end L7a
  // L7b brand_001's value tree: same single-key shape, different hashes
  //   L8b byBrandColor color subtree under brand_001:
  //     proof: Merk(
  //       ... 36-37 ACOR boundary ops over color > color_00000500 ...
  //                 — same shape, different hashes, summing to count=499 for brand_001)
  //   end L8b
  // end L7b
}

The 186-line full verbatim is available via the bench's [gproof] G7 output. The schematic compresses the L1–L4 doctype prefix (byte-identical to every other 8-layer chapter query) and the two parallel L7+L8 descents (structurally identical to Q8's, with different hashes for each brand). Each brand's L8 contributes ~1 700 B of ACOR boundary commitments — exactly the predicted Q8 - L1..L5 overhead per branch.

Cryptographic guarantee (via grovedb PR #663): every per-brand count is independently committed to the merk root via node_hash_with_count. A malicious prover can't lie about brand_000's count without breaking brand_001's verification (and vice versa) because each carrier ACOR subquery has its own hash chain back to the merk root.

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> BR["brand: NormalTree"]:::path
  BR ==> B000["brand_000: CountTree count=1000"]:::path
  BR ==> B001["brand_001: CountTree count=1000"]:::path
  B000 ==> B000_C["brand_000/color: NonCounted(ProvableCountTree)<br/>ACOR boundary walk (color > color_00000500)"]:::target
  B001 ==> B001_C["brand_001/color: NonCounted(ProvableCountTree)<br/>ACOR boundary walk (color > color_00000500)"]:::target

  SDK["Entries(2 groups, sum=998):<br/>(&quot;brand_000&quot;, 499)<br/>(&quot;brand_001&quot;, 499)"]:::sdk
  B000_C -.-> SDK
  B001_C -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#6e7681,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 1 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;
  linkStyle 3 stroke:#1f6feb,stroke-width:3px;
  linkStyle 4 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

L5–L7 are exactly G5's L5–L7 (widget → byBrand → brand_X's continuation). The difference is at L8: G5 enumerates 50 distinct (brand_X, color_Y) pairs as KVValueHashFeatureTypeWithChildHash targets per brand; G7 walks the same color subtree as an ACOR boundary cut (like Q8's L8), emitting HashWithCount / KVDigestCount ops that commit a single aggregate u64 per brand.

flowchart TB
  subgraph L5["Layer 5 — widget doctype merk-tree"]
    direction TB
    L5_q["<b>brand</b> (queried)<br/>kv_hash=HASH[68b6...]"]:::queried
  end

  subgraph L6["Layer 6 — byBrand merk-tree (two intermediate targets)"]
    direction TB
    L6_t0["<b>brand_000</b> (queried)<br/>CountTree count=1000"]:::queried
    L6_t1["<b>brand_001</b> (queried)<br/>CountTree count=1000"]:::queried
  end

  subgraph L7a["Layer 7a — brand_000's continuation"]
    direction TB
    L7a_q["<b>color</b> (queried)<br/>NonCounted(ProvableCountTree)"]:::queried
  end
  subgraph L7b["Layer 7b — brand_001's continuation"]
    direction TB
    L7b_q["<b>color</b> (queried)<br/>NonCounted(ProvableCountTree)"]:::queried
  end

  subgraph L8a["Layer 8a — brand_000's byBrandColor: ACOR cut"]
    direction TB
    L8a_target["<b>Aggregate count = 499</b><br/>(committed via node_hash_with_count)"]:::target
    L8a_ops["~37 merk ops:<br/>KVDigestCount(color_00000500, …) — boundary excluded<br/>+ HashWithCount/KVDigestCount boundary walk<br/>over the in-range portion"]:::sibling
    L8a_target --> L8a_ops
  end
  subgraph L8b["Layer 8b — brand_001's byBrandColor: ACOR cut"]
    direction TB
    L8b_target["<b>Aggregate count = 499</b><br/>(committed via node_hash_with_count)"]:::target
    L8b_ops["~37 merk ops:<br/>same boundary shape as L8a<br/>(different hashes — different brand subtree)"]:::sibling
    L8b_target --> L8b_ops
  end

  L5_q -. "byBrand" .-> L6_t0
  L5_q -. "byBrand" .-> L6_t1
  L6_t0 -. "continuation" .-> L7a_q
  L6_t1 -. "continuation" .-> L7b_q
  L7a_q -. "carrier ACOR subquery" .-> L8a_target
  L7b_q -. "carrier ACOR subquery" .-> L8b_target

  classDef queried fill:#1f6feb,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef sibling fill:#6e7681,color:#fff,stroke:#6e7681;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;

The "carrier" name comes from grovedb's PR #663 terminology: a carrier query is the outer multi-key query that carries an ACOR subquery into each branch. The ACOR primitive itself is unchanged — it still walks one range over one subtree per invocation — but it can now appear as a subquery item under outer Keys, which is what enables the per-brand aggregate proof shape G7 needs.

G8 — Carrier outer Range + Range, Grouped By brand

select   = COUNT
where    = brand > "brand_050" AND color > "color_00000500"
group_by = [brand]
limit    = (optional; ≤ 10)
prove    = true

The platform's MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT = 10 is both the default (when the caller passes no limit) and a hard ceiling. Callers may pass a smaller limit (1 through 9) to truncate the outer walk further; passing 0 or any value > 10 is rejected with InvalidLimit. See the rationale below.

Path query (the same carrier-ACOR shape as G7, but with a range outer dimension and SizedQuery::limit bounded by the platform max):

path:                  ["@", contract_id, 0x01, "widget", "brand"]
outer query item:      RangeAfter("brand_050"..)
subquery_path:         ["color"]
subquery items:        [AggregateCountOnRange([RangeAfter("color_00000500"..)])]
SizedQuery::limit:     10  (platform default; caller may request smaller)

Verified payload (verifier returns one (in_key, u64) per in-range outer key, capped at limit, via GroveDb::verify_aggregate_count_query_per_key):

[("brand_051", 499), ("brand_052", 499), …, ("brand_060", 499)]

The bench's 100-brand fixture has 49 brands > "brand_050". The platform's default SizedQuery::limit = 10 caps the carrier at the first 10 (brand_051 … brand_060); each carries the per-brand ACOR count of 499 in-range colors (color_00000501 … color_00000999). Total sum = 10 × 499 = 4 990 documents.
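
To make the truncation concrete, the outer walk can be modelled as "filter by the range floor, then take the first `limit` keys". This is an illustrative sketch only: `carrier_entries` is a hypothetical function, the `brand_NNN` key format is assumed from the keys quoted above, and the per-brand count of 499 is passed in rather than derived from a proof.

```rust
/// Hypothetical model of the limited outer walk: brands strictly after
/// `floor`, in key order, truncated to `limit`, each paired with the same
/// per-brand count. Assumes `brands` is already sorted ascending.
fn carrier_entries(
    brands: &[String],
    floor: &str,
    limit: usize,
    per_brand: u64,
) -> Vec<(String, u64)> {
    brands
        .iter()
        .filter(|b| b.as_str() > floor)
        .take(limit)
        .map(|b| (b.clone(), per_brand))
        .collect()
}

fn main() {
    let brands: Vec<String> = (0..100).map(|i| format!("brand_{:03}", i)).collect();
    let in_range = brands.iter().filter(|b| b.as_str() > "brand_050").count();
    let entries = carrier_entries(&brands, "brand_050", 10, 499);
    let total: u64 = entries.iter().map(|(_, c)| c).sum();
    println!(
        "{} brands in range, {} returned, total = {}",
        in_range,
        entries.len(),
        total
    );
}
```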

Proof size: 18 022 B. Mode: CountMode::GroupByRange routed to DocumentCountMode::RangeAggregateCarrierProof (the dispatcher distinguishes G7's In-outer shape from G8's Range-outer shape by the carrier clause's operator).

G8 is G7's natural extension from "k specific outer keys" to "L outer keys from an in-range walk." Same carrier proof primitive, same node_hash_with_count commitments per branch, same one-u64-per-branch return shape. The structural differences are exactly two:

  • Outer dimension: G7 emits k Key(serialized_in_value) items in the carrier query; G8 emits a single RangeAfter(serialized_floor..) (or any Range* variant) and lets grovedb walk it.
  • Limit: G8 sets SizedQuery::limit = Some(L) where L is the smaller of the caller's request and the platform max. Per grovedb PR #664, this is the load-bearing relaxation — the predecessor PR #663 allowed Range outer items at the validator level but kept the leaf-ACOR rule rejecting SizedQuery::limit, which made unbounded range-outer carriers impractical at any reasonable dataset size (49 brands × ~1 700 B each ≈ 83 KB; with the platform default of 10 we land at 18 KB).

Why the cap exists and where the ceiling lives

The cap bounds the prove-path proof size; the ceiling is a hardcoded compile-time constant for prover/verifier-agreement reasons.

  1. Proof-size bounding. Proof bytes scale linearly with the limit (~1 700 B per outer match, exactly as for G7). 10 keeps the worst-case proof under 20 KB (Tier-1 for the visualizer's shareable-link guidance) — enough for typical "top-N brands by an outer range" queries while avoiding pathological proof sizes. Callers that want a window above 10 entries call repeatedly with disjoint outer-range bounds; callers that want fewer pass a smaller limit (1 through 9). Limit 0 is rejected to keep the response shape non-trivial.
  2. Prover/verifier byte-for-byte agreement. SizedQuery::limit is part of the serialized PathQuery and feeds the merk-root reconstruction; both prover and verifier must agree on its value. The caller's request carries limit over the wire, so its specific value (1..=10) is fine to vary. What can't vary is the platform's default when the caller passes nothing — that's why the ceiling is a hardcoded compile-time constant (MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT) rather than an operator-tunable runtime value. Same rationale as RangeDistinctProof's use of crate::config::DEFAULT_QUERY_LIMIT rather than drive_config.default_query_limit.

Caller semantics summary:

| Caller request.limit | Server uses | Reason |
| --- | --- | --- |
| None | 10 (the platform default) | Default = ceiling |
| Some(1..=10) | the caller's value | Truncates the walk further |
| Some(0) | rejected | Non-trivial response required |
| Some(11+) | rejected | Above the ceiling |
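
The caller semantics above can be read as a single match expression. The following is a sketch under stated assumptions, not the platform's code: `resolve_outer_limit` and `LimitError` are hypothetical, the `u16` type is a guess, and only the constant's name and value (10) and the InvalidLimit error name come from the text.

```rust
/// Name and value taken from the text; the integer type is an assumption.
const MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT: u16 = 10;

/// Hypothetical stand-in for the platform's InvalidLimit rejection.
#[derive(Debug, PartialEq)]
enum LimitError {
    InvalidLimit(u16),
}

/// Maps the caller's optional limit to the limit the server would use.
fn resolve_outer_limit(requested: Option<u16>) -> Result<u16, LimitError> {
    match requested {
        // No limit supplied: the default equals the ceiling.
        None => Ok(MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT),
        // Zero is rejected: the response must be non-trivial.
        Some(0) => Err(LimitError::InvalidLimit(0)),
        // Anything above the ceiling is rejected.
        Some(n) if n > MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT => {
            Err(LimitError::InvalidLimit(n))
        }
        // 1..=10: use the caller's value as-is.
        Some(n) => Ok(n),
    }
}

fn main() {
    for req in [None, Some(3), Some(10), Some(0), Some(11)] {
        println!("{:?} -> {:?}", req, resolve_outer_limit(req));
    }
}
```

Keeping the default and the ceiling as the same compile-time constant means there is no configuration under which prover and verifier could disagree about the limit used when the caller passes none.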

Complexity: O(L · (log B + log C')) where L = min(caller_limit, MAX_CARRIER_AGGREGATE_OUTER_RANGE_LIMIT): L outer-key descents in the byBrand layer plus L leaf-ACOR boundary walks, one in each brand's color subtree. Independent of how many keys the outer range could have walked without the cap.

Proof display:

Structured proof (8 layers; same skeleton as G7, but L8 contains 10 per-brand ACOR boundary walks instead of 2):
GroveDBProofV1 {
  LayerProof {
    proof: Merk(... root-level descent, identical to every other chapter query ...)
    lower_layers: {
      @ => { ... contract_id descent ... }
      // L2..L4 byte-identical to G3 / G5 / G7 (the @/contract_id/0x01/widget chain)
    }
  }
  // L5 widget doctype: brand queried (same as G3 / G5 / G7)
  // L6 byBrand merk-tree: 10 outer-key matches inlined as KVValueHash items
  //                       (brand_051 ... brand_060), each descending into its
  //                       continuation. Boundary commitments cover the
  //                       brands outside the limited window.
  // L7 brand_NNN's value tree: single key `color` with NonCounted(ProvableCountTree)
  //    — repeated 10 times, once per resolved outer brand
  // L8 brand_NNN's byBrandColor color subtree:
  //    proof: Merk(
  //      ... 36-37 ACOR boundary ops over color > color_00000500,
  //          summing to count = 499 per brand ...
  //    )
  //    — repeated 10 times in parallel, each with its own per-brand boundary hashes
}

The 618-line full verbatim is available via the bench's [gproof] G8 output. The schematic compresses the 10 parallel L7+L8 descents — they share the same template (single-key continuation + 37-op ACOR boundary walk), differing only in per-brand kv-hashes and the resulting subtree commits. Each per-brand L8 contributes ~1 700 B of ACOR boundary commitments — exactly the predicted Q8 - L1..L5 overhead per outer match, scaling linearly: 18 022 B ≈ shared upper layers + 10 × ~1 700 B ≈ 18 KB (matches the per-In slope from G7 vs Q8).
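
As a rough cross-check of the linear scaling, one can fit a straight line through the two measured points Q8 (one outer match, 2 656 B) and G7 (two outer matches, 4 332 B) and extrapolate. This is a back-of-the-envelope model, not a formula used by the platform; `estimate_carrier_bytes` is a hypothetical name, and the model ignores any difference between Key and Range outer items at L6.

```rust
// Measured proof sizes quoted in this chapter.
const Q8_BYTES: u64 = 2_656; // 1 outer match
const G7_BYTES: u64 = 4_332; // 2 outer matches

/// Linear extrapolation: shared upper layers + per-match cost × matches.
fn estimate_carrier_bytes(outer_matches: u64) -> u64 {
    let slope = G7_BYTES - Q8_BYTES; // 1 676 B per outer match
    let shared = Q8_BYTES - slope; // 980 B attributed to the shared prefix
    shared + outer_matches * slope
}

fn main() {
    println!("L = 10: ~{} B", estimate_carrier_bytes(10));
    println!("L = 49: ~{} B", estimate_carrier_bytes(49));
}
```

Under this model, 10 outer matches come to 17 740 B, within about 2% of the measured 18 022 B, and an uncapped walk over all 49 in-range brands would come to 83 104 B, consistent with the ≈ 83 KB figure quoted earlier.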

Cryptographic guarantee (via grovedb PR #663 + PR #664): every per-brand count is independently committed to the merk root via node_hash_with_count. The SizedQuery::limit is part of the serialized PathQuery and is part of the merk-root reconstruction the verifier performs — a malicious prover can't truncate the outer walk at a different point without breaking the hash chain.

flowchart TB
  WD["@/contract_id/0x01/widget"]:::tree
  WD ==> BR["brand: NormalTree"]:::path
  BR ==> B051["brand_051: CountTree count=1000"]:::path
  BR ==> BMore["… 8 more in-range brands (brand_052 … brand_059) …"]:::path
  BR ==> B060["brand_060: CountTree count=1000"]:::path
  BR -.-> BCapped["brand_061 … brand_099<br/>(beyond platform cap — opaque subtree commitments)"]:::faded
  BR -.-> BBelow["brand_000 … brand_050<br/>(below range floor — boundary commitments)"]:::faded

  B051 ==> B051_C["brand_051/color: NonCounted(ProvableCountTree)<br/>ACOR boundary walk (color > color_00000500)"]:::target
  BMore ==> BMore_C["8 parallel ACOR walks"]:::target
  B060 ==> B060_C["brand_060/color: NonCounted(ProvableCountTree)<br/>ACOR boundary walk (color > color_00000500)"]:::target

  SDK["Entries(10 groups, sum=4 990):<br/>(&quot;brand_051&quot;, 499)<br/>(&quot;brand_052&quot;, 499)<br/>…<br/>(&quot;brand_060&quot;, 499)"]:::sdk
  B051_C -.-> SDK
  BMore_C -.-> SDK
  B060_C -.-> SDK

  classDef tree fill:#21262d,color:#c9d1d9,stroke:#1f6feb,stroke-width:2px;
  classDef path fill:#6e7681,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef faded fill:#21262d,color:#6e7681,stroke:#484f58;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;
  classDef sdk fill:#21262d,color:#39c5cf,stroke:#39c5cf,stroke-width:2px,stroke-dasharray: 4 2;

  linkStyle 0 stroke:#1f6feb,stroke-width:3px;
  linkStyle 1 stroke:#1f6feb,stroke-width:3px;
  linkStyle 2 stroke:#1f6feb,stroke-width:3px;
  linkStyle 3 stroke:#1f6feb,stroke-width:3px;
  linkStyle 6 stroke:#1f6feb,stroke-width:3px;
  linkStyle 7 stroke:#1f6feb,stroke-width:3px;
  linkStyle 8 stroke:#1f6feb,stroke-width:3px;

Diagram: per-layer merk-tree structure (Layer 5+)

L5 is identical to G7's L5 (widget doctype with brand queried). L6 differs: G7 inlined 2 KVValueHash targets for the In-bearing brands; G8 inlines 10 KVValueHash targets for the in-range brands the carrier walks (brand_051 through brand_060), with boundary commitments covering both the below-floor and beyond-cap portions of the byBrand merk tree. L7 + L8 fork into 10 parallel descents, each shaped exactly like G7's L7 + L8 — same NonCounted(ProvableCountTree) continuation, same 37-op ACOR boundary walk over color > color_00000500.

flowchart TB
  subgraph L5["Layer 5 — widget doctype merk-tree"]
    direction TB
    L5_q["<b>brand</b> (queried)<br/>kv_hash=HASH[68b6...]"]:::queried
  end

  subgraph L6["Layer 6 — byBrand merk-tree (10 outer-range targets)"]
    direction TB
    L6_t051["<b>brand_051</b><br/>CountTree count=1000"]:::queried
    L6_tmid["… 8 more in-range targets …<br/>(brand_052 … brand_059)"]:::queried
    L6_t060["<b>brand_060</b><br/>CountTree count=1000"]:::queried
    L6_capped["Beyond-cap commitments:<br/>brand_061 … brand_099<br/>(opaque KVHash / Hash ops)"]:::sibling
    L6_floor["Below-floor commitments:<br/>brand_000 … brand_050<br/>(opaque)"]:::sibling

    L6_t051 --> L6_tmid
    L6_tmid --> L6_t060
    L6_t060 --> L6_capped
    L6_t051 --> L6_floor
  end

  subgraph L7L8["Layers 7+8 — per-brand continuation + ACOR walk (×10)"]
    direction TB
    L7L8_each["For each of brand_051 … brand_060:<br/>L7: single-key `color` continuation (NonCounted(ProvableCountTree))<br/>L8: 37 merk ops — ACOR boundary walk for color > color_00000500<br/>committing one `u64 = 499` per brand"]:::target
  end

  L5_q -. "byBrand" .-> L6_t051
  L6_t051 -. "continuation × 10" .-> L7L8_each

  classDef queried fill:#1f6feb,color:#fff,stroke:#1f6feb,stroke-width:2px;
  classDef sibling fill:#6e7681,color:#fff,stroke:#6e7681;
  classDef target fill:#39c5cf,color:#0d1117,stroke:#39c5cf,stroke-width:3px;

The slope vs G7 is the proof's whole story: G7's k = 2 outer matches → ~4 KB; G8's L = 10 outer matches → ~18 KB. The per-outer-match cost (~1 700 B) is the same; only the outer-walk count changes. The platform max of 10 keeps the worst-case proof under 20 KB (Tier-1 of the visualizer's shareable-link guidance); larger windows are unreachable without changing the constant — callers that want more results call repeatedly with disjoint outer-range windows.

Future Work

This chapter now mirrors chapter 29's per-query structure: every section above carries a path query, verified payload, proof size, verbatim or schematic proof display, narrative, conceptual flowchart, and per-layer merk-tree diagram.

Two pieces of infrastructure made this possible:

  • query_g1_* through query_g6_* criterion bench_function calls in document_count_worst_case.rs, which produce the Avg time column in Queries in this Chapter.
  • display_group_by_proofs (a sibling of display_proofs in the same bench file) — emits each group_by shape's verbatim merk-proof structure via bincode decode + GroveDBProof::Display. Tagged with [gproof] prefix in stderr so reviewers can grep deterministically.

Open follow-ups:

  1. Inline the full G4 / G5 / G6 verbatim rather than the schematic-with-elision form. The bench captures every byte; the chapter's <details> blocks currently summarise the 100-target enumerations because reproducing 100 near-identical KVValueHashFeatureTypeWithChildHash lines per case is more noise than signal. If a reader needs byte-exact output, they can run the bench and grep [gproof].
  2. Wire path-query reconstruction + verified-payload printing into display_group_by_proofs. Today it only dumps the proof-display block; chapter 29's display_proofs also reconstructs the PathQuery and prints the verifier's structured result (the verified: block). Adding that to the group_by side would give the chapter parity with chapter 29's verified: sections — currently rendered manually from the [matrix] output's Entries(len=N, sum=M) figures.
  3. A high-fanout byColor variant of G6 (color IN [100 values], group_by = [color]) — captured implicitly in the bench's existing group_by_color_in_proof_100_rangecountable_branches (10 512 B) but not given its own G* section, since it's structurally G6 with ProvableCountTree overhead.

Cross-Reference to Chapter 29

For background on the building blocks every query in this chapter uses:

The path-query builder (packages/rs-drive/src/query/drive_document_count_query/path_query.rs) and verifier mirror (packages/rs-drive/src/verify/document_count/) live in the same modules for both chapters' queries — the only difference is which point_lookup_* / aggregate_* / group_by_* function the dispatcher calls based on the CountMode carried in the request.