
feat: prune bulk memtable parts by first tag#7911

Merged
evenyag merged 8 commits into GreptimeTeam:main from evenyag:feat/bulk-part-prune on Apr 21, 2026

Conversation

@evenyag
Contributor

@evenyag evenyag commented Apr 3, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

Prune the parts in the bulk memtable using the first tag's min/max statistics. Currently, we only collect statistics for the first tag because they are very cheap to gather.

This can reduce the scan cost when the min/max statistics prune some parts. On my dataset, it cut the time spent scanning the memtable by 20%.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@github-actions github-actions Bot added the size/S and docs-not-required (This change does not impact docs.) labels Apr 3, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces batch-level pruning for bulk memtables by extracting min/max statistics from the first tag of encoded primary keys. It implements a BatchStats structure and a PruningStatistics adapter to leverage DataFusion's pruning logic during scans. The review feedback identifies a logic error when using sparse primary key encoding, suggests caching statistics earlier in the write path to avoid redundant computations during every scan, and recommends lowering the log level for pruning events to reduce performance overhead and noise.

Comment thread src/mito2/src/memtable/bulk/part.rs
Comment thread src/mito2/src/memtable/bulk.rs
Comment thread src/mito2/src/memtable/bulk/part.rs Outdated
@evenyag evenyag force-pushed the feat/bulk-part-prune branch from 9db208b to fbb038a on April 8, 2026 10:43
@github-actions github-actions Bot added size/M and removed size/S labels Apr 8, 2026
@evenyag evenyag force-pushed the feat/bulk-part-prune branch from 5ef57f8 to bc435a7 on April 9, 2026 06:29
evenyag added 6 commits April 9, 2026 14:34
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Document sparse encoding format in SparsePrimaryKeyCodec and add
comment explaining why primary_key.first() works for both encodings.
Remove noisy info-level pruning logs from the read path.

Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
@evenyag evenyag force-pushed the feat/bulk-part-prune branch from bc435a7 to 4db4fac on April 9, 2026 06:34
@evenyag evenyag marked this pull request as ready for review April 9, 2026 06:37
@evenyag evenyag requested review from a team, v0y4g3r and waynexia as code owners April 9, 2026 06:37

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4db4fac805


Comment thread src/mito2/src/memtable/bulk/part.rs
Comment thread src/mito2/src/memtable/bulk/part.rs
Comment thread src/mito-codec/src/row_converter/sparse.rs
Comment thread src/mito2/src/memtable/bulk/part.rs
Comment thread src/mito2/src/memtable/bulk/part.rs
@killme2008 killme2008 requested a review from discord9 April 9, 2026 11:10
Signed-off-by: evenyag <realevenyag@gmail.com>
Contributor

@v0y4g3r v0y4g3r left a comment

Rest LGTM

Comment thread src/mito2/src/memtable/bulk/part.rs Outdated
Signed-off-by: evenyag <realevenyag@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4ee98a13d4


Comment thread src/mito2/src/memtable/bulk/part.rs
Member

@killme2008 killme2008 left a comment

LGTM

@evenyag evenyag added this pull request to the merge queue Apr 21, 2026
Merged via the queue into GreptimeTeam:main with commit 555741a Apr 21, 2026
46 checks passed
@evenyag evenyag deleted the feat/bulk-part-prune branch April 21, 2026 10:48

Labels

docs-not-required (This change does not impact docs.), size/M

4 participants