fix: HashTable dynamic rehash and group count limit for GROUP BY aggregation#48174
Conversation
|
[ci-v2-notice] To rerun ci-v2 checks, comment with:
If you have any questions or requests, please contact @zhikunyao. |
|
@MrPresent-Han go-sdk check failed, comment |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #48174 +/- ##
==========================================
- Coverage 77.60% 77.60% -0.01%
==========================================
Files 2107 2107
Lines 349666 349809 +143
==========================================
+ Hits 271368 271465 +97
- Misses 70035 70068 +33
- Partials 8263 8276 +13
🚀 New features to boost your workflow:
|
4bb6ab8 to
c1e903b
Compare
c1e903b to
d47f238
Compare
d47f238 to
8330542
Compare
|
has verified by cc reviewers team |
8330542 to
d75ad20
Compare
|
@MrPresent-Han go-sdk check failed, comment |
d75ad20 to
010ea9a
Compare
|
@MrPresent-Han go-sdk check failed, comment |
ed7eb41 to
52954ae
Compare
configs/milvus.yaml
Outdated
| buildParallelRate: 0.5 # the ratio of building interim index parallel matched with cpu num | ||
| multipleChunkedEnable: true # Deprecated. Enable multiple chunked search | ||
| enableGeometryCache: false # Enable geometry cache for geometry data | ||
| maxGroupByGroups: 100000 # Maximum number of groups allowed in GROUP BY aggregation, enforced both per segment and during cross-segment merge. Exceeding this limit fails the query. |
There was a problem hiding this comment.
proxy might use this config? should put it somewhere or seprate to two configs
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liliu-z, MrPresent-Han, xiaofan-luan The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
52954ae to
0a4fc85
Compare
|
/ci-rerun-ciloop |
002a0c4 to
b156e18
Compare
b156e18 to
a3b47e8
Compare
…egation (milvus-io#47569) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: MrPresent-Han <chun.han@gmail.com>
a3b47e8 to
d694f37
Compare
|
@MrPresent-Han go-sdk check failed, comment |
|
rerun go-sdk |
…egation (milvus-io#48174) ## Summary - HashTable now dynamically rehashes (doubles capacity) when load factor exceeds 7/8, fixing crash when GROUP BY cardinality > ~1792 - Added configurable `queryNode.segcore.maxGroupByGroups` (default 100K) to cap total groups and prevent OOM on both C++ (per-segment HashTable) and Go (cross-segment agg reducer) layers - Added 4 C++ unit tests covering rehash basic/correctness, max groups limit, and multiple rehash rounds issue: milvus-io#47569 ## Test plan - [ ] C++ unit tests: `--gtest_filter="*HashTableRehash*:*MaxGroups*"` - [ ] E2E: GROUP BY aggregation with >2K unique values should succeed - [ ] E2E: Set `queryNode.segcore.maxGroupByGroups` to small value, verify clear error message 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: MrPresent-Han <chun.han@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: heihutu <heihutu@gmail.com>
…egation (milvus-io#48174) ## Summary - HashTable now dynamically rehashes (doubles capacity) when load factor exceeds 7/8, fixing crash when GROUP BY cardinality > ~1792 - Added configurable `queryNode.segcore.maxGroupByGroups` (default 100K) to cap total groups and prevent OOM on both C++ (per-segment HashTable) and Go (cross-segment agg reducer) layers - Added 4 C++ unit tests covering rehash basic/correctness, max groups limit, and multiple rehash rounds issue: milvus-io#47569 ## Test plan - [ ] C++ unit tests: `--gtest_filter="*HashTableRehash*:*MaxGroups*"` - [ ] E2E: GROUP BY aggregation with >2K unique values should succeed - [ ] E2E: Set `queryNode.segcore.maxGroupByGroups` to small value, verify clear error message 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Chun Han <116052805+MrPresent-Han@users.noreply.github.com>
Summary
queryNode.segcore.maxGroupByGroups(default 100K) to cap total groups and prevent OOM on both C++ (per-segment HashTable) and Go (cross-segment agg reducer) layersissue: #47569
Test plan
--gtest_filter="*HashTableRehash*:*MaxGroups*"queryNode.segcore.maxGroupByGroupsto small value, verify clear error message🤖 Generated with Claude Code