feat: implement last row cache reader for flat format by evenyag · Pull Request #7757 · GreptimeTeam/greptimedb

evenyag · 2026-03-04T10:28:35Z

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

- Add FlatRowGroupLastRowCachedReader and FlatLastTimestampSelector to support last-row selection and caching for the flat SST format, mirroring the existing primary-key format cache
- Extend SelectorResultValue with a SelectorResult enum to store either primary-key Batch or flat RecordBatch cache entries
- Wire the flat last-row reader into the file range read path, enabling it when the row group is PUT-only and all rows are selected
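
As a rough illustration of the enabling condition in the last item, the check can be pictured as a single predicate. This is a minimal sketch, not the code in file_range.rs: `RowGroupInfo`, its fields, and `use_last_row_reader` are hypothetical names, and `TimeSeriesRowSelector` here is a local stand-in for the real type.

```rust
/// Local stand-in for the row selector hint carried by the scan (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeSeriesRowSelector {
    LastRow,
}

/// Hypothetical summary of what is known about a row group before reading it.
struct RowGroupInfo {
    /// True if the row group contains only PUT operations.
    put_only: bool,
    /// Total number of rows in the row group.
    num_rows: usize,
    /// Rows that remain selected after pruning.
    selected_rows: usize,
}

/// Returns true when the last-row cached reader may be used:
/// the scan asks for the last row, the row group is PUT-only,
/// and no row of the row group has been filtered out.
fn use_last_row_reader(selector: Option<TimeSeriesRowSelector>, rg: &RowGroupInfo) -> bool {
    matches!(selector, Some(TimeSeriesRowSelector::LastRow))
        && rg.put_only
        && rg.selected_rows == rg.num_rows
}
```

If any of the three conditions fails, the sketch falls back to the regular reader, which matches the "enabling it when" wording above.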

TSBS lastpoint without cache:

min:  2087.04ms, med:  2094.78ms, mean:  2111.82ms, max: 2264.96ms, stddev:    51.22ms, sum:  21.1sec, count: 10
all queries             :
min:  2087.04ms, med:  2094.78ms, mean:  2111.82ms, max: 2264.96ms, stddev:    51.22ms, sum:  21.1sec, count: 10

TSBS lastpoint with cache:

min:   385.46ms, med:   395.38ms, mean:   579.50ms, max: 2256.26ms, stddev:   558.91ms, sum:   5.8sec, count: 10
all queries             :
min:   385.46ms, med:   395.38ms, mean:   579.50ms, max: 2256.26ms, stddev:   558.91ms, sum:   5.8sec, count: 10
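
To put the two runs side by side, the speedup can be computed from the reported numbers. The helper below is only illustrative arithmetic; `speedup` is a hypothetical name, not part of the benchmark tooling.

```rust
/// Ratio of the latency without the cache to the latency with the cache.
fn speedup(without_cache_ms: f64, with_cache_ms: f64) -> f64 {
    without_cache_ms / with_cache_ms
}
```

With the figures above, `speedup(2094.78, 395.38)` gives about 5.3x for the median and `speedup(2111.82, 579.50)` gives about 3.6x for the mean. The mean with cache is pulled up by the single 2256.26ms maximum, which is close to the uncached latency; this would be consistent with one query running before the cache is populated, although the PR does not state that.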

Details

The existing last-row cache (RowGroupLastRowCachedReader) only works with the primary-key SST
format. This PR adds equivalent support for the flat format:

- FlatLastTimestampSelector: Processes sorted flat RecordBatches, tracking primary key boundaries to emit only the row with the latest timestamp per series. Handles primary keys that span multiple batches and deduplicates timestamps across batch boundaries.
- FlatRowGroupLastRowCachedReader: Cache-aware reader that returns cached flat results on a hit, or reads through FlatRowGroupReader and populates the cache on a miss.
- SelectorResult enum: The cache value type now distinguishes between PrimaryKey(Vec<Batch>) and Flat(Vec<RecordBatch>) variants. Cache lookups validate that both the format and the projection match before returning hits.
- FileRange::flat_reader now accepts an optional TimeSeriesRowSelector and creates the last-row reader when applicable (PUT-only, all rows selected).
- Primary key arrays are compacted during selection to reduce memory usage in cached results.
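
The selection logic described for FlatLastTimestampSelector can be sketched with plain vectors instead of Arrow RecordBatches. Everything below is a simplified model under stated assumptions: `Row`, `LastRowSelector`, `on_next`, and `finish` are hypothetical names, rows are assumed to arrive sorted by (primary key, timestamp) ascending, and when several rows share the latest timestamp the sketch keeps the first one it sees. The real selector may resolve duplicates differently.

```rust
/// Simplified row: encoded primary key, timestamp, and one field value.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    pk: Vec<u8>,
    ts: i64,
    value: f64,
}

/// Keeps the latest row of the series currently being scanned.
/// The state survives across batches, so a series may span several batches.
#[derive(Default)]
struct LastRowSelector {
    pending: Option<Row>,
}

impl LastRowSelector {
    /// Feeds one sorted batch; rows of series that are complete go to `out`.
    fn on_next(&mut self, batch: &[Row], out: &mut Vec<Row>) {
        for row in batch {
            let same_series = matches!(&self.pending, Some(p) if p.pk == row.pk);
            if same_series {
                if let Some(p) = self.pending.as_mut() {
                    // Replace only on a strictly newer timestamp, so a
                    // duplicated last timestamp is emitted once.
                    if row.ts > p.ts {
                        *p = row.clone();
                    }
                }
            } else if let Some(done) = self.pending.replace(row.clone()) {
                // Primary key boundary: the previous series is complete.
                out.push(done);
            }
        }
    }

    /// Flushes the last pending series and resets the state.
    fn finish(&mut self, out: &mut Vec<Row>) {
        if let Some(done) = self.pending.take() {
            out.push(done);
        }
    }
}
```

Feeding the batches `[a@1, a@2, b@1]` and `[b@3, b@3, c@5]` and then calling `finish` yields one row each for `a@2`, `b@3`, and `c@5`, even though series `b` crosses the batch boundary and its last timestamp appears twice.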

Future Work

The ScanHint rule ensures that the last row hint is passed only when the query has tag filters alone, so the last row cache does not consider field filters for now. We can support a more flexible last row cache in the future.

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

I have written the necessary rustdoc comments.
I have added the necessary unit tests and integration tests.
This PR requires documentation updates.
API changes are backward compatible.
Schema or data changes are backward compatible.

Signed-off-by: evenyag <realevenyag@gmail.com>

gemini-code-assist · 2026-03-04T10:31:16Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces crucial performance enhancements for data retrieval in the flat SST format by adding a dedicated last-row caching mechanism. It extends the existing caching infrastructure to support RecordBatches, ensuring that queries requesting only the latest data per time series can benefit from cached results, thereby reducing redundant computations and improving read efficiency for this specific data layout.

Highlights

- Flat Format Last Row Caching: Implemented FlatRowGroupLastRowCachedReader and FlatLastTimestampSelector to provide last-row selection and caching capabilities for the flat SST format, mirroring the existing primary-key format's caching mechanism.
- Extended Selector Result Cache: SelectorResultValue was extended with a SelectorResult enum to store cached entries in either the primary-key Batch format or the new flat RecordBatch format, allowing the cache to support both data representations.
- Integration into Read Path: The new flat last-row reader is integrated into the file range read path, enabling its use when a row group contains only PUT operations and all rows are selected, optimizing queries for the latest data.
- Primary Key Compaction: Primary key arrays are now compacted during selection to reduce memory usage in the cached results, specifically for the flat format.
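
The idea behind primary key compaction can be illustrated with a toy dictionary-encoded column. This is a minimal sketch that assumes the primary key column is dictionary encoded; `DictColumn` and `compact` are hypothetical names, and the real code operates on Arrow arrays rather than on these vectors.

```rust
use std::collections::HashMap;

/// Toy dictionary-encoded column: `keys[i]` is an index into `values`.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<Vec<u8>>,
}

/// Rebuilds the dictionary so that it holds only the values that `keys`
/// still references, and remaps the keys accordingly.
fn compact(col: &DictColumn) -> DictColumn {
    let mut remap: HashMap<u32, u32> = HashMap::new();
    let mut values: Vec<Vec<u8>> = Vec::new();
    let mut keys = Vec::with_capacity(col.keys.len());
    for &old in &col.keys {
        let new = *remap.entry(old).or_insert_with(|| {
            // First time this dictionary entry is seen: copy it over.
            values.push(col.values[old as usize].clone());
            (values.len() - 1) as u32
        });
        keys.push(new);
    }
    DictColumn { keys, values }
}
```

After last-row selection only a few rows remain, so a dictionary that still carries every primary key of the source batch would keep unused entries alive in the cache. Compacting leaves one entry per key that is actually referenced.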

Changelog

src/mito2/src/cache.rs
- Imported RecordBatch and record_batch_estimated_size for flat format support.
- Introduced SelectorResult enum to differentiate between primary-key Batch and flat RecordBatch cache entries.
- Updated SelectorResultValue to use SelectorResult and added a new_flat constructor.
- Modified estimated_size to correctly calculate memory usage for both primary-key and flat result types.
src/mito2/src/read/last_row.rs
- Added FlatRowGroupLastRowCachedReader to manage cached last rows for the flat format.
- Implemented FlatLastRowCacheReader for iterating over cached flat results.
- Created FlatRowGroupLastRowReader to read and cache last rows from flat format row groups.
- Developed FlatLastTimestampSelector to process RecordBatches, identify primary key boundaries, and select the latest timestamped rows.
- Added helper functions primary_key_bytes_at, last_timestamp_start, and rebuild_pk_dictionary_for_key for flat format primary key handling.
- Updated RowGroupLastRowCachedReader to verify the result format (PrimaryKey) during cache hits.
- Adjusted LastRowCacheReader to only process PrimaryKey results.
- Added comprehensive unit tests for FlatLastTimestampSelector.
src/mito2/src/read/prune.rs
- Imported FlatRowGroupLastRowCachedReader for integration.
- Extended FlatSource enum to include LastRow variant for flat last-row cached readers.
- Added new_with_last_row_reader constructor to FlatPruneReader for initializing with the new flat last-row reader.
- Updated metrics method in FlatPruneReader to merge metrics from the inner FlatRowGroupLastRowCachedReader.
src/mito2/src/read/scan_util.rs
- Modified flat_reader call to pass the series_row_selector from the stream context, enabling conditional last-row reading.
src/mito2/src/sst/parquet/file_range.rs
- Modified flat_reader to accept an optional TimeSeriesRowSelector parameter.
- Implemented logic to conditionally use FlatRowGroupLastRowCachedReader if the selector is LastRow, the row group is PUT-only, and all rows are selected.
- Instantiated FlatPruneReader with either a standard FlatRowGroupReader or the new FlatRowGroupLastRowCachedReader based on the selection criteria.
src/mito2/src/sst/parquet/format.rs
- Increased visibility of primary_key_offsets function to pub(crate) to allow its use by the new flat format components.
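
The cache.rs and last_row.rs items above can be pictured roughly as follows. This is a simplified sketch, not the actual definitions: `Batch` and `RecordBatch` are placeholder structs, the `projection` field and the `flat_hit` method are assumptions made for illustration, and only the `SelectorResult` variant names and the `new_flat` constructor name come from the PR text.

```rust
/// Placeholder for the primary-key format batch type.
#[derive(Debug, Clone, PartialEq)]
struct Batch(Vec<u8>);

/// Placeholder for an Arrow record batch in the flat format.
#[derive(Debug, Clone, PartialEq)]
struct RecordBatch(Vec<u8>);

/// Cached selector output in one of the two SST read formats.
#[derive(Debug, Clone, PartialEq)]
enum SelectorResult {
    PrimaryKey(Vec<Batch>),
    Flat(Vec<RecordBatch>),
}

/// Cache value: the result plus the projection it was computed with.
struct SelectorResultValue {
    result: SelectorResult,
    projection: Vec<usize>,
}

impl SelectorResultValue {
    /// Builds a cache value holding flat-format batches.
    fn new_flat(result: Vec<RecordBatch>, projection: Vec<usize>) -> Self {
        Self { result: SelectorResult::Flat(result), projection }
    }

    /// Returns the cached flat batches only if both the format and the
    /// projection match; anything else is treated as a miss.
    fn flat_hit(&self, projection: &[usize]) -> Option<&[RecordBatch]> {
        match &self.result {
            SelectorResult::Flat(batches) if self.projection.as_slice() == projection => {
                Some(batches.as_slice())
            }
            _ => None,
        }
    }
}
```

Treating a format or projection mismatch as a miss means a flat reader never consumes an entry written by the primary-key reader, or one computed for a different set of columns.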

Activity

The pull request introduces new functionality for flat format last-row caching and integrates it into the read path. No specific human activity (comments, reviews, approvals) is provided in the context.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a last-row cache reader for the flat SST format, which is a valuable performance optimization for a common query pattern. The changes are well-structured, with a new SelectorResult enum to handle different cache entry formats and a FlatLastTimestampSelector for the core logic. The integration into the existing read path appears correct. I have one suggestion to improve the clarity and robustness of the new reader implementation, as detailed in the comment.

src/mito2/src/read/last_row.rs

Signed-off-by: evenyag <realevenyag@gmail.com>

- Replace VecDeque with BatchBuffer struct for output buffering
- Remove rebuild_pk_dictionary_for_key as batches go directly into buffer
- Remove unused push method and make BatchBuffer pub(crate)
- Remove debug logging in maybe_update_cache

Signed-off-by: evenyag <realevenyag@gmail.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c86866ddc0


src/mito2/src/sst/parquet/file_range.rs

v0y4g3r

LGTM

src/mito2/src/read/last_row.rs

src/mito2/src/read/prune.rs

Signed-off-by: evenyag <realevenyag@gmail.com>

evenyag added 8 commits March 4, 2026 12:20

feat: initial implementation

a4d6397

Signed-off-by: evenyag <realevenyag@gmail.com>

fix: handle multiple series

3620b67

Signed-off-by: evenyag <realevenyag@gmail.com>

fix: reset state in finish()

ddd507a

Signed-off-by: evenyag <realevenyag@gmail.com>

fix: handle duplicated last timestamps across batches

e2a8b26

Signed-off-by: evenyag <realevenyag@gmail.com>

perf: compact primary key array

fcf49f8

Signed-off-by: evenyag <realevenyag@gmail.com>

fix(mito2): simplify flat last timestamp selector state

7331175

Signed-off-by: evenyag <realevenyag@gmail.com>

refactor(mito2): rebuild flat pk dictionary from selector state

f720e5a

Signed-off-by: evenyag <realevenyag@gmail.com>

test: reduce tests

b697445

Signed-off-by: evenyag <realevenyag@gmail.com>

github-actions bot added the size/M and docs-not-required labels Mar 4, 2026

evenyag changed the title ~~Feat/flat last row cache~~ feat: implement last row cache reader for flat format Mar 4, 2026

chore: update comment

7958bad

Signed-off-by: evenyag <realevenyag@gmail.com>

gemini-code-assist bot reviewed Mar 4, 2026

src/mito2/src/read/last_row.rs (resolved)

evenyag closed this Mar 4, 2026

evenyag added 3 commits March 4, 2026 20:19

chore: more logs to debug

6b87f13

Signed-off-by: evenyag <realevenyag@gmail.com>

feat: concat batches in last row reader

620ffc9

Signed-off-by: evenyag <realevenyag@gmail.com>

evenyag reopened this Mar 4, 2026

evenyag marked this pull request as ready for review March 4, 2026 15:17

evenyag requested review from v0y4g3r and waynexia as code owners March 4, 2026 15:17

chatgpt-codex-connector bot reviewed Mar 4, 2026

src/mito2/src/sst/parquet/file_range.rs (resolved)

evenyag mentioned this pull request Mar 5, 2026

Unify read paths for scanners in mito #7732

Open

7 tasks

v0y4g3r approved these changes Mar 6, 2026

discord9 reviewed Mar 6, 2026

src/mito2/src/read/last_row.rs (outdated, resolved)

discord9 reviewed Mar 6, 2026

src/mito2/src/read/prune.rs (resolved)

discord9 approved these changes Mar 6, 2026

chore: address comments

ebc554e

Signed-off-by: evenyag <realevenyag@gmail.com>

evenyag enabled auto-merge March 6, 2026 11:41

evenyag added this pull request to the merge queue Mar 6, 2026

Merged via the queue into GreptimeTeam:main with commit 93c48a0 Mar 6, 2026
43 of 45 checks passed

evenyag deleted the feat/flat-last-row-cache branch March 6, 2026 12:43

evenyag merged 13 commits into GreptimeTeam:main from evenyag:feat/flat-last-row-cache
