Skip to content

fix: HashTable dynamic rehash and group count limit for GROUP BY aggregation#48174

Merged
sre-ci-robot merged 1 commit intomilvus-io:masterfrom
MrPresent-Han:hashtable-rehash-master
Mar 23, 2026
Merged

fix: HashTable dynamic rehash and group count limit for GROUP BY aggregation#48174
sre-ci-robot merged 1 commit intomilvus-io:masterfrom
MrPresent-Han:hashtable-rehash-master

Conversation

@MrPresent-Han
Copy link
Copy Markdown
Contributor

Summary

  • HashTable now dynamically rehashes (doubles capacity) when load factor exceeds 7/8, fixing crash when GROUP BY cardinality > ~1792
  • Added configurable queryNode.segcore.maxGroupByGroups (default 100K) to cap total groups and prevent OOM on both C++ (per-segment HashTable) and Go (cross-segment agg reducer) layers
  • Added 4 C++ unit tests covering rehash basic/correctness, max groups limit, and multiple rehash rounds

issue: #47569

Test plan

  • C++ unit tests: --gtest_filter="*HashTableRehash*:*MaxGroups*"
  • E2E: GROUP BY aggregation with >2K unique values should succeed
  • E2E: Set queryNode.segcore.maxGroupByGroups to small value, verify clear error message

🤖 Generated with Claude Code

@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Mar 10, 2026
@mergify mergify bot added dco-passed DCO check passed. kind/bug Issues or changes related a bug labels Mar 10, 2026
@sre-ci-robot
Copy link
Copy Markdown
Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-build-all // for ci-v2/build-all (multi-arch builds)
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm
  • /ci-rerun-e2e-default // for ci-v2/e2e-default
  • /ci-rerun-ciloop // for ci-v2/ciloop (build + unit tests in one pipeline)

If you have any questions or requests, please contact @zhikunyao.

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Mar 10, 2026

@MrPresent-Han go-sdk check failed, comment rerun go-sdk can trigger the job again.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 92.20779% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.60%. Comparing base (aed7c8b) to head (d694f37).
⚠️ Report is 7 commits behind head on master.

Files with missing lines Patch % Lines
internal/core/src/segcore/segcore_init_c.cpp 0.00% 4 Missing ⚠️
internal/agg/aggregate_reducer.go 75.00% 1 Missing and 2 partials ⚠️
pkg/util/paramtable/component_param.go 76.92% 2 Missing and 1 partial ⚠️
internal/core/src/exec/HashTable.cpp 96.42% 1 Missing ⚠️
internal/core/src/segcore/SegcoreConfig.h 50.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #48174      +/-   ##
==========================================
- Coverage   77.60%   77.60%   -0.01%     
==========================================
  Files        2107     2107              
  Lines      349666   349809     +143     
==========================================
+ Hits       271368   271465      +97     
- Misses      70035    70068      +33     
- Partials     8263     8276      +13     
Components Coverage Δ
Client 79.25% <ø> (ø)
Core 83.98% <95.23%> (+0.02%) ⬆️
Go 75.77% <78.57%> (-0.02%) ⬇️
Files with missing lines Coverage Δ
internal/core/src/exec/HashTable.h 98.14% <100.00%> (+7.58%) ⬆️
...l/core/src/exec/operator/query-agg/GroupingSet.cpp 100.00% <100.00%> (ø)
internal/core/unittest/test_query_group_by.cpp 99.88% <100.00%> (+0.01%) ⬆️
internal/util/initcore/query_node.go 79.23% <100.00%> (+0.49%) ⬆️
internal/core/src/exec/HashTable.cpp 97.35% <96.42%> (+3.85%) ⬆️
internal/core/src/segcore/SegcoreConfig.h 80.95% <50.00%> (-1.55%) ⬇️
internal/agg/aggregate_reducer.go 59.00% <75.00%> (+0.69%) ⬆️
pkg/util/paramtable/component_param.go 97.21% <76.92%> (-0.05%) ⬇️
internal/core/src/segcore/segcore_init_c.cpp 17.59% <0.00%> (-0.68%) ⬇️

... and 40 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label Mar 10, 2026
@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch 2 times, most recently from 4bb6ab8 to c1e903b Compare March 11, 2026 04:05
@sre-ci-robot sre-ci-robot removed the low-code-coverage add test-label from zhikun, diff coverage > 80% label Mar 11, 2026
@mergify mergify bot added the ci-passed label Mar 11, 2026
@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from c1e903b to d47f238 Compare March 12, 2026 02:21
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed ci-passed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Mar 12, 2026
@mergify mergify bot added the ci-passed label Mar 12, 2026
@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from d47f238 to 8330542 Compare March 12, 2026 12:28
@MrPresent-Han
Copy link
Copy Markdown
Contributor Author

has verified by cc reviewers team

@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed ci-passed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Mar 12, 2026
@mergify mergify bot added the ci-passed label Mar 12, 2026
@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from 8330542 to d75ad20 Compare March 16, 2026 12:51
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Mar 16, 2026

@MrPresent-Han go-sdk check failed, comment rerun go-sdk can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from d75ad20 to 010ea9a Compare March 17, 2026 02:21
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed low-code-coverage add test-label from zhikun, diff coverage > 80% labels Mar 17, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Mar 20, 2026

@MrPresent-Han go-sdk check failed, comment rerun go-sdk can trigger the job again.

@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from ed7eb41 to 52954ae Compare March 20, 2026 03:08
@sre-ci-robot sre-ci-robot added low-code-coverage add test-label from zhikun, diff coverage > 80% and removed lgtm labels Mar 20, 2026
buildParallelRate: 0.5 # the ratio of building interim index parallel matched with cpu num
multipleChunkedEnable: true # Deprecated. Enable multiple chunked search
enableGeometryCache: false # Enable geometry cache for geometry data
maxGroupByGroups: 100000 # Maximum number of groups allowed in GROUP BY aggregation, enforced both per segment and during cross-segment merge. Exceeding this limit fails the query.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proxy might use this config? should put it somewhere or seprate to two configs

@xiaofan-luan
Copy link
Copy Markdown
Collaborator

/lgtm
/approve

@sre-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liliu-z, MrPresent-Han, xiaofan-luan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [liliu-z,xiaofan-luan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from 52954ae to 0a4fc85 Compare March 22, 2026 02:59
@sre-ci-robot sre-ci-robot removed lgtm low-code-coverage add test-label from zhikun, diff coverage > 80% labels Mar 22, 2026
@MrPresent-Han
Copy link
Copy Markdown
Contributor Author

/ci-rerun-ciloop

@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch 3 times, most recently from 002a0c4 to b156e18 Compare March 22, 2026 09:00
@sre-ci-robot sre-ci-robot added the low-code-coverage add test-label from zhikun, diff coverage > 80% label Mar 22, 2026
@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from b156e18 to a3b47e8 Compare March 22, 2026 11:46
…egation (milvus-io#47569)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
@MrPresent-Han MrPresent-Han force-pushed the hashtable-rehash-master branch from a3b47e8 to d694f37 Compare March 23, 2026 02:18
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Mar 23, 2026

@MrPresent-Han go-sdk check failed, comment rerun go-sdk can trigger the job again.

@sre-ci-robot sre-ci-robot removed the low-code-coverage add test-label from zhikun, diff coverage > 80% label Mar 23, 2026
@MrPresent-Han
Copy link
Copy Markdown
Contributor Author

rerun go-sdk

@mergify mergify bot added the ci-passed label Mar 23, 2026
Copy link
Copy Markdown
Member

@liliu-z liliu-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@sre-ci-robot sre-ci-robot merged commit cc9c60e into milvus-io:master Mar 23, 2026
18 of 23 checks passed
houseme pushed a commit to heihutu/milvus that referenced this pull request Mar 31, 2026
…egation (milvus-io#48174)

## Summary
- HashTable now dynamically rehashes (doubles capacity) when load factor
exceeds 7/8, fixing crash when GROUP BY cardinality > ~1792
- Added configurable `queryNode.segcore.maxGroupByGroups` (default 100K)
to cap total groups and prevent OOM on both C++ (per-segment HashTable)
and Go (cross-segment agg reducer) layers
- Added 4 C++ unit tests covering rehash basic/correctness, max groups
limit, and multiple rehash rounds

issue: milvus-io#47569

## Test plan
- [ ] C++ unit tests: `--gtest_filter="*HashTableRehash*:*MaxGroups*"`
- [ ] E2E: GROUP BY aggregation with >2K unique values should succeed
- [ ] E2E: Set `queryNode.segcore.maxGroupByGroups` to small value,
verify clear error message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: heihutu <heihutu@gmail.com>
houseme pushed a commit to heihutu/milvus that referenced this pull request Mar 31, 2026
…egation (milvus-io#48174)

## Summary
- HashTable now dynamically rehashes (doubles capacity) when load factor
exceeds 7/8, fixing crash when GROUP BY cardinality > ~1792
- Added configurable `queryNode.segcore.maxGroupByGroups` (default 100K)
to cap total groups and prevent OOM on both C++ (per-segment HashTable)
and Go (cross-segment agg reducer) layers
- Added 4 C++ unit tests covering rehash basic/correctness, max groups
limit, and multiple rehash rounds

issue: milvus-io#47569

## Test plan
- [ ] C++ unit tests: `--gtest_filter="*HashTableRehash*:*MaxGroups*"`
- [ ] E2E: GROUP BY aggregation with >2K unique values should succeed
- [ ] E2E: Set `queryNode.segcore.maxGroupByGroups` to small value,
verify clear error message

🤖 Generated with [Claude Code](https://claude.com/claude-code)


Signed-off-by: Chun Han <116052805+MrPresent-Han@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved ci-passed dco-passed DCO check passed. kind/bug Issues or changes related a bug lgtm size/L Denotes a PR that changes 100-499 lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants