feat(mito2): add partition range cache infrastructure #7798

Open
evenyag wants to merge 9 commits into GreptimeTeam:main from evenyag:pr/partition-range-cache-infra

Conversation

@evenyag evenyag commented Mar 12, 2026

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR adds the infrastructure for partition range scan result caching in mito2.

The main changes are:

  • add a dedicated range result cache to CacheManager and expose it through CacheStrategy
  • introduce read/range_cache.rs to define cache keys, request fingerprints, cached values, and
    memory estimation
  • add scan fingerprint construction in scan_region.rs for cache-eligible reads
  • include schema-sensitive fields in the fingerprint, including read column types and
    partition_expr_version
  • add unit tests for cache behavior and fingerprint eligibility/normalization
  • derive Hash for MergeMode so it can participate in cache keys

This branch is intentionally infrastructure-focused. It prepares the cache key model and cache
container, but it does not yet wire the scan execution path to actually read from or write to the
new range result cache.
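
To illustrate why schema-sensitive fields belong in the fingerprint, here is a minimal sketch of a cache key model. The type names, field names, and the `make_key` and `lookup_demo` helpers are hypothetical and are not the definitions in read/range_cache.rs; the sketch only assumes that the key derives `Hash` and `Eq` and embeds the fingerprint.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the request fingerprint: anything that can
/// change the rows a scan returns should be part of it.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Fingerprint {
    /// Names and type names of the columns being read.
    pub read_columns: Vec<(String, String)>,
    /// Version of the partition expression in effect for the region.
    pub partition_expr_version: u64,
}

/// Hypothetical cache key: a fingerprint plus the partition range it covers.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct RangeKey {
    pub region_id: u64,
    pub range_index: usize,
    pub fingerprint: Fingerprint,
}

/// Builds a key that reads a single column named "value" of the given type.
pub fn make_key(region_id: u64, range_index: usize, column_type: &str, version: u64) -> RangeKey {
    RangeKey {
        region_id,
        range_index,
        fingerprint: Fingerprint {
            read_columns: vec![("value".to_string(), column_type.to_string())],
            partition_expr_version: version,
        },
    }
}

/// Stores one entry, then probes it with an identical key, a key whose column
/// type changed, and a key whose partition expression version changed.
pub fn lookup_demo() -> (bool, bool, bool) {
    let mut cache: HashMap<RangeKey, Vec<u64>> = HashMap::new();
    cache.insert(make_key(1, 0, "Float64", 7), vec![10, 20, 30]);
    let same = cache.contains_key(&make_key(1, 0, "Float64", 7));
    let retyped = cache.contains_key(&make_key(1, 0, "Int64", 7));
    let repartitioned = cache.contains_key(&make_key(1, 0, "Float64", 8));
    (same, retyped, repartitioned)
}
```

In this sketch only the identical key hits. A changed column type or a bumped `partition_expr_version` produces a different key, so stale results are never returned for them.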

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

evenyag added 3 commits March 11, 2026 20:11
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
@github-actions bot added the size/M and docs-not-required (This change does not impact docs.) labels Mar 12, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the foundational infrastructure for caching partition range scan results within the mito2 storage engine. It establishes the necessary data structures for cache keys and values, integrates a new cache type into the existing CacheManager and CacheStrategy, and provides a mechanism to generate unique fingerprints for scan requests. This setup is a prerequisite for future work that will wire the actual scan execution path to utilize this new caching layer, aiming to improve read performance for repeated range queries.

Highlights

  • Range Result Cache Infrastructure: Added a dedicated range result cache to CacheManager and exposed it through CacheStrategy.
  • Cache Key and Value Definitions: Introduced read/range_cache.rs to define cache keys (RangeScanCacheKey), request fingerprints (ScanRequestFingerprint), cached values (RangeScanCacheValue), and memory estimation for these structures.
  • Scan Fingerprint Construction: Implemented scan fingerprint construction in scan_region.rs to identify cache-eligible reads, incorporating schema-sensitive fields like read column types and partition_expr_version.
  • Unit Tests: Included comprehensive unit tests to verify cache behavior, fingerprint eligibility, and normalization.
  • MergeMode Hashing: Derived the Hash trait for MergeMode to allow its participation in cache keys.
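
As a rough illustration of what fingerprint normalization can mean, the sketch below uses a builder that trims, sorts, and de-duplicates filter expressions so that equivalent requests compare equal. The `FingerprintBuilder` and `NormalizedFingerprint` names and the specific normalization rules are assumptions made for this example; the rules actually applied by `ScanRequestFingerprintBuilder` are not shown in this conversation.

```rust
/// Hypothetical normalized fingerprint holding canonicalized filter strings.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct NormalizedFingerprint {
    filters: Vec<String>,
}

/// Hypothetical builder that collects filters in any order.
#[derive(Debug, Default)]
pub struct FingerprintBuilder {
    filters: Vec<String>,
}

impl FingerprintBuilder {
    /// Records one filter expression, ignoring surrounding whitespace.
    pub fn filter(mut self, expr: &str) -> Self {
        self.filters.push(expr.trim().to_string());
        self
    }

    /// Sorts and de-duplicates the filters so that insertion order and
    /// repetition do not affect equality or hashing.
    pub fn build(mut self) -> NormalizedFingerprint {
        self.filters.sort();
        self.filters.dedup();
        NormalizedFingerprint {
            filters: self.filters,
        }
    }
}
```

With this approach, two requests that list the same predicates in a different order share a fingerprint, which raises the hit rate without affecting correctness as long as the predicates are combined with AND.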

Changelog

  • src/mito2/src/cache.rs
    • Imported RangeScanCacheKey and RangeScanCacheValue for the new cache.
    • Defined RANGE_RESULT_TYPE constant for metrics.
    • Added get_range_result and put_range_result methods to CacheStrategy to interact with the new cache.
    • Introduced range_result_cache field to the CacheManager struct.
    • Implemented get_range_result and put_range_result methods within CacheManager to manage the range result cache.
    • Added range_result_cache_size to CacheManagerBuilder for configuration.
    • Provided a setter range_result_cache_size for the builder.
    • Initialized the range_result_cache within CacheManagerBuilder::build.
    • Included range_result_cache in the CacheManager struct instantiation.
    • Added range_result_cache_weight function for cache memory estimation.
    • Defined RangeResultCache type alias.
    • Updated test imports to include new range cache types.
    • Added test_range_result_cache unit test to verify the new cache's functionality.
  • src/mito2/src/read.rs
    • Added pub(crate) mod range_cache; to expose the new range cache module.
  • src/mito2/src/read/range_cache.rs
    • Added new file range_cache.rs to define types for range scan caching.
    • Defined ScanRequestFingerprint and ScanRequestFingerprintBuilder to capture request-relevant scan options.
    • Defined RangeScanCacheKey for uniquely identifying cached range scan outputs.
    • Defined RangeScanCacheValue to store cached record batches.
    • Implemented estimated_size methods for ScanRequestFingerprint, RangeScanCacheKey, and RangeScanCacheValue for cache sizing.
    • Added unit tests to ensure correct normalization and handling of time filters in scan fingerprints.
  • src/mito2/src/read/scan_region.rs
    • Imported ScanRequestFingerprint for use in scan processing.
    • Changed the time_range field in ScanInput to pub(crate) for broader access.
    • Added build_scan_fingerprint function to generate a ScanRequestFingerprint for cache-eligible scans.
    • Introduced new_scan_input helper function for testing purposes.
    • Added unit tests to validate the behavior and eligibility criteria of build_scan_fingerprint.
  • src/mito2/src/region/options.rs
    • Derived the Hash trait for the MergeMode enum to allow its use in hash-based data structures like cache keys.
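
The need for the `Hash` derive can be sketched as follows: a struct can only derive `Hash` if every field type implements it, so an enum embedded in a cache key has to derive it too. The variant names below are written from memory and should be treated as an assumption, and `KeyPart` and `hash_of` are hypothetical helpers for this example only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the merge mode enum; the variant names are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum MergeMode {
    LastRow,
    LastNonNull,
}

/// Hypothetical fragment of a cache key. Deriving `Hash` here compiles only
/// because `MergeMode` itself implements `Hash`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct KeyPart {
    pub append_mode: bool,
    pub merge_mode: Option<MergeMode>,
}

/// Hashes any `Hash` value with the standard library's default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Removing `Hash` from the derive list on `MergeMode` in this sketch makes the derive on `KeyPart` fail to compile, which is the situation the one-line change in options.rs avoids.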
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the necessary infrastructure for caching partition range scan results in mito2. The changes are well-structured, adding a new cache to the CacheManager, defining cache keys and values in a new range_cache module, and implementing the logic for creating scan fingerprints. The code is consistent with existing patterns and includes comprehensive unit tests. I've found one minor issue regarding memory estimation for cache keys, which could lead to the cache using slightly more memory than its configured limit. Overall, this is a solid contribution that lays the groundwork for future performance improvements.
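
To make the memory-estimation concern concrete, here is a minimal sketch of a weight-bounded cache whose weigher counts the heap bytes owned by the key as well as the value. Everything in it is hypothetical: `Key`, `weight`, and `WeightedCache` are illustration-only names, eviction is plain FIFO, and it uses only the standard library rather than whatever cache implementation backs `CacheManager`.

```rust
use std::collections::{HashMap, VecDeque};
use std::mem::size_of;

/// Hypothetical key that owns heap data.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Key {
    pub filters: Vec<String>,
}

impl Key {
    /// Inline size plus the heap bytes owned by the key. Lengths are used
    /// instead of capacities so a clone weighs the same as the original.
    pub fn estimated_size(&self) -> usize {
        size_of::<Self>()
            + self.filters.len() * size_of::<String>()
            + self.filters.iter().map(|f| f.len()).sum::<usize>()
    }
}

/// Weigher that charges an entry for both its key and its value.
pub fn weight(key: &Key, value: &[u8]) -> usize {
    key.estimated_size() + value.len()
}

/// Cache bounded by total weight, evicting the oldest entries first.
pub struct WeightedCache {
    capacity: usize,
    used: usize,
    order: VecDeque<Key>,
    entries: HashMap<Key, Vec<u8>>,
}

impl WeightedCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, used: 0, order: VecDeque::new(), entries: HashMap::new() }
    }

    /// Inserts an entry, evicting older ones until the weight limit holds.
    pub fn put(&mut self, key: Key, value: Vec<u8>) {
        // Drop any previous entry for the same key before re-inserting.
        if let Some(old) = self.entries.remove(&key) {
            self.used -= weight(&key, &old);
            self.order.retain(|k| k != &key);
        }
        let w = weight(&key, &value);
        if w > self.capacity {
            return; // The entry can never fit, so it is not cached.
        }
        while self.used + w > self.capacity {
            let Some(oldest) = self.order.pop_front() else { break };
            if let Some(v) = self.entries.remove(&oldest) {
                self.used -= weight(&oldest, &v);
            }
        }
        self.used += w;
        self.order.push_back(key.clone());
        self.entries.insert(key, value);
    }

    pub fn get(&self, key: &Key) -> Option<&Vec<u8>> {
        self.entries.get(key)
    }

    /// Total weight currently accounted for.
    pub fn used(&self) -> usize {
        self.used
    }
}
```

If `weight` ignored the key's heap allocations, `used` would under-report real memory by that amount for every entry, and the cache could exceed its configured limit by the sum of the uncounted key bytes.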

evenyag added 3 commits March 12, 2026 17:28
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
@evenyag evenyag marked this pull request as ready for review March 12, 2026 12:59
Signed-off-by: evenyag <realevenyag@gmail.com>
evenyag added 2 commits March 13, 2026 13:23
- Remove TimeSeriesDistribution from fingerprint as it only affects yield order
- Disable range cache when dyn filters are present since they change at runtime

Signed-off-by: evenyag <realevenyag@gmail.com>
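
The two rules in the commit message above can be sketched like this: a scan with dynamic filters yields no fingerprint at all, and the distribution setting is left out of the fingerprint because it only affects yield order. `ScanRequest`, `Fingerprint`, and `build_fingerprint` are hypothetical simplifications and not the signature of `build_scan_fingerprint`; the `Distribution` variant names are likewise an assumption.

```rust
/// Stand-in for the distribution hint; it changes output order only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Distribution {
    TimeWindowed,
    PerSeries,
}

/// Hypothetical, simplified scan request.
pub struct ScanRequest {
    pub filters: Vec<String>,
    /// True when some filters are only resolved at runtime.
    pub has_dyn_filters: bool,
    pub distribution: Option<Distribution>,
}

/// Hypothetical fingerprint: note that it has no distribution field.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Fingerprint {
    pub filters: Vec<String>,
}

/// Returns `None` for scans that must not use the range cache.
pub fn build_fingerprint(req: &ScanRequest) -> Option<Fingerprint> {
    if req.has_dyn_filters {
        // Dynamic filters can change while the query runs, so a result
        // cached under a static key could be wrong for the next request.
        return None;
    }
    Some(Fingerprint {
        filters: req.filters.clone(),
    })
}
```

Returning `Option` keeps the eligibility decision in one place: callers that receive `None` simply bypass the cache.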
Signed-off-by: evenyag <realevenyag@gmail.com>
@evenyag evenyag requested a review from discord9 March 13, 2026 06:52
@evenyag evenyag requested a review from waynexia March 16, 2026 03:00