Introduce ephemeral index cache #2901
- Add a complete ephemeral indexing subsystem
- core/src/ops/core/ephemeral_status with input/output and query types
- core/src/ops/indexing/ephemeral with arena, cache, registry,
index_cache, types
- expose EphemeralIndexCache and EphemeralIndex through core modules
- EphemeralIndexCache supports
get/insert/create_for_indexing/mark_indexing_complete, eviction, and
stats
- Implement EphemeralIndex data structures for memory-efficient storage
- NodeArena, NameCache, NameRegistry, and related types
- Add EphemeralIndex status API
- EphemeralCacheStatusInput and EphemeralCacheStatusQuery
- EphemeralCacheStatus with per-index details
- Wire ephemeral indexing into the indexing flow
- Change default Ephemeral Indexer behavior to shallow mode
- Align code to EphemeralIndex usage across the codebase
- Enhance content kind detection in UI
- Add getContentKind(file) helper (prefers content_identity.kind, then
content_kind)
- Use getContentKind in Explorer utilities and UI components
- Invalidate directory listings when location index_mode changes
- Add useLocationChangeInvalidation to trigger refetches for ephemeral
vs persistent indexing transitions
- Misc refactors and formatting to accommodate the new modules and APIs
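The description names NodeArena and NameCache as the memory-efficient storage types but does not show their layout. As a rough illustration of the general technique (an arena of nodes addressed by small integer ids, plus interned file names), here is a minimal sketch. Every type and method in it (`Arena`, `Interner`, `NodeId`, `NameId`, `insert`, `path_of`) is hypothetical and is not taken from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical 4-byte handle into the arena, used instead of an owned path.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(u32);

/// Hypothetical handle into the interned name table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NameId(u32);

/// Stores each distinct file name once and hands out small ids.
#[derive(Default)]
pub struct Interner {
    names: Vec<String>,
    lookup: HashMap<String, NameId>,
}

impl Interner {
    pub fn intern(&mut self, name: &str) -> NameId {
        if let Some(&id) = self.lookup.get(name) {
            return id; // name already stored, reuse its id
        }
        let id = NameId(self.names.len() as u32);
        self.names.push(name.to_owned());
        self.lookup.insert(name.to_owned(), id);
        id
    }

    pub fn resolve(&self, id: NameId) -> &str {
        &self.names[id.0 as usize]
    }

    pub fn len(&self) -> usize {
        self.names.len()
    }
}

/// One file or directory: a name id and an optional parent id.
pub struct Node {
    pub name: NameId,
    pub parent: Option<NodeId>,
}

/// All nodes live in one contiguous Vec.
#[derive(Default)]
pub struct Arena {
    nodes: Vec<Node>,
    names: Interner,
}

impl Arena {
    pub fn insert(&mut self, name: &str, parent: Option<NodeId>) -> NodeId {
        let name = self.names.intern(name);
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(Node { name, parent });
        id
    }

    /// Rebuilds a path on demand by walking parent links up to the root.
    pub fn path_of(&self, id: NodeId) -> String {
        let mut parts = Vec::new();
        let mut cur = Some(id);
        while let Some(n) = cur {
            let node = &self.nodes[n.0 as usize];
            parts.push(self.names.resolve(node.name));
            cur = node.parent;
        }
        parts.reverse();
        parts.join("/")
    }

    pub fn distinct_names(&self) -> usize {
        self.names.len()
    }
}
```

In this sketch a name such as `README.md` that appears in many directories is stored once, and full paths are reconstructed only when requested.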
PR Summary
Introduces a unified in-memory ephemeral index with a status query and integrates it into indexing, directory listing, and UI; adds memory‑efficient storage, CLI status command, and improved content kind handling.
Written by Cursor Bugbot for commit b6779d7.
- Remove TTL-based ephemeral cache and switch to a permanent in-memory cache.
- Reuse ephemeral UUIDs when creating persistent entries to preserve continuity of user data.
- Populate ephemeral UUIDs during the processing phase and expose get_ephemeral_uuid in the indexer state.
- Remove the location invalidation hook and related UI usage.

- Implemented a mechanism to clear stale entries for a directory's children during re-indexing to prevent ghost files.
- Updated the `create_for_indexing` method to remove previously indexed paths and ensure a clean slate for new indexing operations.
- Added logging for the number of cleared entries to aid in debugging and monitoring.

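The stale-entry cleanup described in this commit can be sketched roughly as follows. This assumes a simplified index that maps full paths to a size; the `Index` type and its `insert`/`contains` helpers are hypothetical stand-ins, and only the name `clear_directory_children` comes from the PR (its real signature may differ).

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the real index: full path -> size in bytes.
#[derive(Default)]
pub struct Index {
    entries: HashMap<PathBuf, u64>,
}

impl Index {
    pub fn insert(&mut self, path: impl Into<PathBuf>, size: u64) {
        self.entries.insert(path.into(), size);
    }

    pub fn contains(&self, path: &Path) -> bool {
        self.entries.contains_key(path)
    }

    /// Removes every entry whose direct parent is `dir` and returns how many
    /// were cleared, so a re-index starts from a clean slate and files deleted
    /// on disk do not linger as ghosts. Deeper descendants are left alone
    /// (an assumption of this sketch).
    pub fn clear_directory_children(&mut self, dir: &Path) -> usize {
        let before = self.entries.len();
        self.entries.retain(|path, _| path.parent() != Some(dir));
        before - self.entries.len()
    }
}
```

The returned count is the value the commit says is logged for debugging.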
…cate processing
- Introduced a shared `seen_paths` structure using `RwLock` to manage paths across all workers, addressing symlink loops and duplicate directory processing.
- Updated the `discovery_worker_rayon` function to utilize the shared `seen_paths`, enhancing efficiency and correctness in the discovery phase.

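A shared seen-set of this kind might look like the sketch below. It uses plain `std::thread` workers rather than rayon to stay self-contained; `SeenPaths`, `claim`, and `count_unique_claims` are hypothetical names, and the sketch assumes callers pass canonicalized paths so that a symlink loop resolves to a path that has already been claimed.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// Hypothetical shared set of directories that some worker has already claimed.
#[derive(Clone, Default)]
pub struct SeenPaths {
    inner: Arc<RwLock<HashSet<PathBuf>>>,
}

impl SeenPaths {
    /// Returns true only for the first caller to claim `path`.
    pub fn claim(&self, path: &Path) -> bool {
        // Fast path: a shared read lock is enough to reject known paths.
        let already_seen = self.inner.read().unwrap().contains(path);
        if already_seen {
            return false;
        }
        // `insert` returns false if another worker claimed the path between
        // releasing the read lock and taking the write lock.
        self.inner.write().unwrap().insert(path.to_path_buf())
    }
}

/// Runs `workers` threads that all try to claim every path and returns the
/// total number of successful claims across all of them.
pub fn count_unique_claims(paths: &[PathBuf], workers: usize) -> usize {
    let seen = SeenPaths::default();
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let seen = seen.clone();
            let paths = paths.to_vec();
            std::thread::spawn(move || paths.iter().filter(|p| seen.claim(p)).count())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Because each path is claimed exactly once regardless of how many workers race for it, the total equals the number of distinct paths.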
… ephemeral indexing cleanup
- Moved the job phase logic into a new `run_job_phases` method for better organization and clarity.
- Updated the `run` method to always mark ephemeral indexing as complete, even on failure, preventing stuck indexing flags.
- Enhanced logging to provide feedback on the completion status of ephemeral indexing.

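The split between `run` and `run_job_phases` can be illustrated with a small sketch. It models only the in-progress flag; the `InProgress` type, the closure-based signature, and the `String` error type are assumptions made for the example, not the PR's actual job API.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical tracker for paths that are currently being indexed.
#[derive(Default)]
pub struct InProgress {
    paths: HashSet<PathBuf>,
}

impl InProgress {
    pub fn start(&mut self, path: &Path) {
        self.paths.insert(path.to_path_buf());
    }

    pub fn finish(&mut self, path: &Path) {
        self.paths.remove(path);
    }

    pub fn is_indexing(&self, path: &Path) -> bool {
        self.paths.contains(path)
    }
}

/// Runs the phases, then clears the in-progress flag whether or not they
/// succeeded, so a failed job cannot leave the path stuck as "indexing".
pub fn run<F>(state: &mut InProgress, path: &Path, run_job_phases: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    state.start(path);
    let result = run_job_phases();

    // Cleanup happens before the result is returned, on both branches.
    state.finish(path);
    match &result {
        Ok(()) => println!("ephemeral indexing finished for {}", path.display()),
        Err(e) => eprintln!("ephemeral indexing failed for {}: {e}", path.display()),
    }
    result
}
```

Keeping the cleanup in the outer function means early returns inside the phase logic cannot skip it.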
- Updated `EphemeralIndex` and `NodeArena` to return `std::io::Result` for better error handling during creation and insertion.
- Implemented memory-mapped storage in `NodeArena` to efficiently manage large indexes, preventing out-of-memory errors.
- Refactored `EphemeralIndexCache` to handle initialization errors gracefully.
- Improved tests to validate new error handling and memory management features.

@cursor review again please, I have made several commits since your review
    e
    );
    // Mark indexing as not in progress since job failed
    cache.mark_indexing_complete(&local_path);
Bug: Failed indexing wrongly marks path as successfully indexed
When ephemeral indexer job dispatch fails, the code calls cache.mark_indexing_complete(&local_path) which not only removes the path from indexing_in_progress but also adds it to indexed_paths. This incorrectly marks the path as successfully indexed even though no indexing occurred. Subsequent queries will return the global index for this path, but it will contain no entries for the directory. The code should only clear the in-progress state on failure without marking the path as indexed. A separate cancel_indexing method that only removes from indexing_in_progress would be appropriate here.
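The distinction the review asks for could be sketched as follows. The names `indexing_in_progress`, `indexed_paths`, `create_for_indexing`, `mark_indexing_complete`, and `cancel_indexing` come from the comment above; the `CacheState` type, the `&mut self` receivers, and the query helpers are simplifying assumptions (the real cache presumably uses interior locking).

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical reduction of the cache to its two path sets.
#[derive(Default)]
pub struct CacheState {
    indexing_in_progress: HashSet<PathBuf>,
    indexed_paths: HashSet<PathBuf>,
}

impl CacheState {
    /// Called when an indexing job is about to be dispatched.
    pub fn create_for_indexing(&mut self, path: &Path) {
        self.indexed_paths.remove(path);
        self.indexing_in_progress.insert(path.to_path_buf());
    }

    /// Success path: clears the in-progress flag AND records the path as indexed.
    pub fn mark_indexing_complete(&mut self, path: &Path) {
        self.indexing_in_progress.remove(path);
        self.indexed_paths.insert(path.to_path_buf());
    }

    /// Failure path suggested by the review: clears the in-progress flag only,
    /// so later queries do not treat an empty result as a finished index.
    pub fn cancel_indexing(&mut self, path: &Path) {
        self.indexing_in_progress.remove(path);
    }

    pub fn is_indexing(&self, path: &Path) -> bool {
        self.indexing_in_progress.contains(path)
    }

    pub fn is_indexed(&self, path: &Path) -> bool {
        self.indexed_paths.contains(path)
    }
}
```

With this split, the dispatch-failure branch would call `cancel_indexing(&local_path)` instead of `mark_indexing_complete(&local_path)`.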
- Updated `EphemeralIndex` to preserve explicitly browsed subdirectories during re-indexing, preventing loss of user navigation context.
- Modified `clear_directory_children` to return the count of cleared entries and a list of deleted browsed directories.
- Introduced `EphemeralIndexCache` enhancements to support filesystem watching, allowing paths to be monitored for changes.
- Added methods for registering, unregistering, and checking watched paths, improving the responsiveness of the indexing system.
- Updated documentation and tests to reflect new functionality and ensure reliability.

…ng support
- Introduced a new `handler.rs` module to manage filesystem change events for both persistent and ephemeral indexing.
- Added a trait-based `ChangeHandler` interface to abstract operations for different storage backends.
- Enhanced `EphemeralIndexCache` to support filesystem watching, allowing paths to be monitored for changes.
- Implemented methods for registering and unregistering watched paths, improving responsiveness to filesystem events.
- Updated the `LocationWatcher` to handle ephemeral watches and process events accordingly.
- Added tests and documentation to ensure reliability and clarity of the new functionality.

- Removed the `handler.rs` module and integrated its functionality into the new `change_detection` module, which now handles both persistent and ephemeral change processing.
- Implemented a `ChangeDetector` for batch indexing scans, allowing efficient detection of new, modified, moved, and deleted entries.
- Introduced a `ChangeHandler` trait to abstract operations for both persistent and ephemeral storage, ensuring consistent behavior across different backends.
- Enhanced the `EphemeralChangeHandler` and `PersistentChangeHandler` to utilize the new change detection infrastructure.
- Updated the `apply_batch` function to streamline event processing and improve responsiveness to filesystem changes.
- Added comprehensive tests and documentation to validate the new structure and functionality.

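The trait-based design described here, one event loop driving different storage backends, could be sketched as follows. Only the names `ChangeHandler` and `apply_batch` appear in the PR; the `Change` enum, the method set, and the two example backends (`InMemoryHandler`, `RecordingHandler`) are hypothetical.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Hypothetical filesystem change events.
pub enum Change {
    Created(PathBuf),
    Removed(PathBuf),
    Moved { from: PathBuf, to: PathBuf },
}

/// Operations every storage backend must provide (method set is assumed).
pub trait ChangeHandler {
    fn create(&mut self, path: &Path);
    fn remove(&mut self, path: &Path);

    /// Default rename: remove the old path, then create the new one.
    fn rename(&mut self, from: &Path, to: &Path) {
        self.remove(from);
        self.create(to);
    }
}

/// Stand-in for an ephemeral, in-memory backend.
#[derive(Default)]
pub struct InMemoryHandler {
    pub paths: BTreeSet<PathBuf>,
}

impl ChangeHandler for InMemoryHandler {
    fn create(&mut self, path: &Path) {
        self.paths.insert(path.to_path_buf());
    }
    fn remove(&mut self, path: &Path) {
        self.paths.remove(path);
    }
}

/// Stand-in for a persistent backend: records the writes it would issue.
#[derive(Default)]
pub struct RecordingHandler {
    pub ops: Vec<String>,
}

impl ChangeHandler for RecordingHandler {
    fn create(&mut self, path: &Path) {
        self.ops.push(format!("INSERT {}", path.display()));
    }
    fn remove(&mut self, path: &Path) {
        self.ops.push(format!("DELETE {}", path.display()));
    }
}

/// Applies every change through the trait, so the event-processing logic is
/// written once for both backends. Returns the number of changes processed.
pub fn apply_batch(handler: &mut dyn ChangeHandler, batch: &[Change]) -> usize {
    for change in batch {
        match change {
            Change::Created(p) => handler.create(p),
            Change::Removed(p) => handler.remove(p),
            Change::Moved { from, to } => handler.rename(from, to),
        }
    }
    batch.len()
}
```

A backend that can rename in place would override `rename` rather than rely on the remove-then-create default.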
- Replaced calls to `delete_subtree_internal` with `EntryProcessor::delete_subtree` in the `entry`, `location`, and `manager` modules to streamline the deletion process.
- Introduced a new `delete_subtree` method in `EntryProcessor` that handles the deletion of an entry and its descendants without creating tombstones, improving efficiency in database operations.
- Removed the deprecated `delete_subtree_internal` function from the `responder` module, consolidating deletion logic into the `EntryProcessor`.
- Updated documentation and tests to reflect the changes in deletion handling and ensure reliability.

- Introduced a new `ephemeral` module to encapsulate the `EphemeralIndex` functionality, enhancing organization and clarity.
- Moved `EphemeralIndex` and related types to the new module, ensuring a cleaner separation of concerns.
- Updated documentation across the `hierarchy`, `job`, and `persistence` modules to reflect the new structure and improve clarity on ephemeral indexing operations.
- Removed deprecated references to `EphemeralIndex` in favor of the new module path, streamlining code references.
- Enhanced comments and documentation to provide better context and understanding of the ephemeral indexing system.

- Replace EntryProcessor with DBWriter across indexing
- Introduce EphemeralWriter to unify ephemeral indexing logic
- Update IndexPersistence to abstract over writers and adjust modules

- Rename indexing backend: DBWriter to DatabaseStorage
- Replace EphemeralWriter with MemoryAdapter across watcher and ephemeral components
- Update module paths and imports in core indexing code, job, and persistence layers to use DatabaseStorage and MemoryAdapter
- Update docs to reflect new names (DatabaseStorage, MemoryAdapter)

Adds a new ephemeral indexing subsystem with in-memory cache and status API.
What's New
Ephemeral Index System
- `EphemeralIndexCache` with get/insert/create_for_indexing/mark_indexing_complete, eviction, and stats
- `NodeArena`, `NameCache`, `NameRegistry`
- `EphemeralCacheStatusQuery` and `EphemeralCacheStatus`
Indexing Changes
UI Improvements
- `getContentKind(file)` helper that prefers `content_identity.kind` over `content_kind`
- `useLocationChangeInvalidation` hook to refetch when location `index_mode` changes (ephemeral ↔ persistent transitions)
Files Changed
- `core/src/ops/core/ephemeral_status/` - status query types and output
- `core/src/ops/indexing/ephemeral/` - arena, cache, registry, index_cache, types
- `packages/interface/src/hooks/useLocationChangeInvalidation.ts` - new hook