@jamiepine
Member

Adds a new ephemeral indexing subsystem with in-memory cache and status API.

What's New

Ephemeral Index System

  • New EphemeralIndexCache with get/insert/create_for_indexing/mark_indexing_complete, eviction, and stats
  • Memory-efficient data structures: NodeArena, NameCache, NameRegistry
  • Status API via EphemeralCacheStatusQuery and EphemeralCacheStatus
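A minimal sketch of how a path-keyed in-memory cache with lookup, eviction, and stats might be shaped. This is not the code from this PR: the field layout, the `evict` and `stats` method names, and the `CacheStats` type are assumptions for illustration, and `EphemeralIndex` is reduced to a stand-in that only carries an entry count.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock};

/// Stand-in for the real index type; only the entry count matters here.
#[derive(Debug, Default)]
pub struct EphemeralIndex {
    pub entries: usize,
}

/// Hypothetical snapshot of cache contents, as a status query might report it.
#[derive(Debug, PartialEq, Eq)]
pub struct CacheStats {
    pub cached_indexes: usize,
    pub total_entries: usize,
}

/// Hypothetical cache keyed by the root path that was indexed.
#[derive(Default)]
pub struct EphemeralIndexCache {
    indexes: RwLock<HashMap<PathBuf, Arc<EphemeralIndex>>>,
}

impl EphemeralIndexCache {
    /// Stores (or replaces) the index for `root`.
    pub fn insert(&self, root: PathBuf, index: EphemeralIndex) {
        self.indexes.write().unwrap().insert(root, Arc::new(index));
    }

    /// Returns a shared handle to the index for `root`, if one is cached.
    pub fn get(&self, root: &Path) -> Option<Arc<EphemeralIndex>> {
        self.indexes.read().unwrap().get(root).cloned()
    }

    /// Drops the index for `root`; returns true if something was removed.
    pub fn evict(&self, root: &Path) -> bool {
        self.indexes.write().unwrap().remove(root).is_some()
    }

    /// Summarizes what is currently held in memory.
    pub fn stats(&self) -> CacheStats {
        let guard = self.indexes.read().unwrap();
        CacheStats {
            cached_indexes: guard.len(),
            total_entries: guard.values().map(|index| index.entries).sum(),
        }
    }
}
```

Handing out `Arc` handles lets readers keep using an index after it has been evicted, at the cost of the memory staying alive until the last handle is dropped.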

Indexing Changes

  • Default ephemeral indexer now uses shallow mode
  • Integrated ephemeral indexing throughout the indexing flow

UI Improvements

  • getContentKind(file) helper that prefers content_identity.kind over content_kind
  • useLocationChangeInvalidation hook to refetch when location index_mode changes (ephemeral ↔ persistent transitions)

Files Changed

  • core/src/ops/core/ephemeral_status/ - status query types and output
  • core/src/ops/indexing/ephemeral/ - arena, cache, registry, index_cache, types
  • packages/interface/src/hooks/useLocationChangeInvalidation.ts - new hook
  • Various updates to Explorer components for content kind detection

- Add a complete ephemeral indexing subsystem
  - core/src/ops/core/ephemeral_status with input/output and query types
  - core/src/ops/indexing/ephemeral with arena, cache, registry,
    index_cache, types
  - expose EphemeralIndexCache and EphemeralIndex through core modules
  - EphemeralIndexCache supports
    get/insert/create_for_indexing/mark_indexing_complete, eviction, and
    stats
- Implement EphemeralIndex data structures for memory-efficient storage
  - NodeArena, NameCache, NameRegistry, and related types
- Add EphemeralIndex status API
  - EphemeralCacheStatusInput and EphemeralCacheStatusQuery
  - EphemeralCacheStatus with per-index details
- Wire ephemeral indexing into the indexing flow
  - Change default Ephemeral Indexer behavior to shallow mode
  - Align code to EphemeralIndex usage across the codebase
- Enhance content kind detection in UI
  - Add getContentKind(file) helper (prefers content_identity.kind, then
    content_kind)
  - Use getContentKind in Explorer utilities and UI components
- Invalidate directory listings when location index_mode changes
  - Add useLocationChangeInvalidation to trigger refetches for ephemeral
    vs persistent indexing transitions
- Misc refactors and formatting to accommodate the new modules and APIs
@jamiepine jamiepine marked this pull request as ready for review December 8, 2025 03:58
@jamiepine jamiepine requested review from a team December 8, 2025 03:58
@cursor

cursor bot commented Dec 8, 2025

PR Summary

Introduces a unified in-memory ephemeral index with a status query and integrates it into indexing, directory listing, and UI; adds memory‑efficient storage, CLI status command, and improved content kind handling.

  • Core/Indexing:
    • Add memory‑efficient ephemeral index subsystem: ops/indexing/ephemeral/* (NodeArena via memmap2, NameCache, NameRegistry, EphemeralIndexCache).
    • Expose cache via context; new status query core.ephemeral_status (EphemeralCacheStatus*).
    • Update directory listing to read from cache and trigger on‑demand indexing; preserve ephemeral UUIDs; add in‑memory sorting helper.
    • Enhance IndexerJob to share ephemeral index, default ephemeral mode to Shallow, and mark indexing complete.
  • CLI:
    • Add index ephemeral-cache command to print shared cache stats and indexed paths.
  • UI:
    • Add getContentKind(file) and update Explorer/Quick Preview/Inspector to prefer content_identity.kind.
    • DnD/preview refinements and job count refetch fix.
  • Misc:
    • Add deps (async-channel, memmap2, smallvec, parking_lot, num_cpus) and minor fixes (mobile native module loading, formatting).

Written by Cursor Bugbot for commit b6779d7.

- Remove TTL-based ephemeral cache and switch to a permanent in-memory
  cache.
- Reuse ephemeral UUIDs when creating persistent entries to preserve
  continuity of user data.
- Populate ephemeral UUIDs during the processing phase and expose
  get_ephemeral_uuid in the indexer state.
- Remove the location invalidation hook and related UI usage.
- Implemented a mechanism to clear stale entries for a directory's children during re-indexing to prevent ghost files.
- Updated the `create_for_indexing` method to remove previously indexed paths and ensure a clean slate for new indexing operations.
- Added logging for the number of cleared entries to aid in debugging and monitoring.
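The stale-entry cleanup described in this commit can be sketched roughly as follows. The name `clear_directory_children` appears elsewhere in this PR, but the signature and the flat path-keyed map used here are assumptions, not the actual index layout.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Removes every entry whose direct parent is `dir` and returns how many
/// entries were cleared, so the caller can log the count.
///
/// Assumption: the index is modelled as a flat map from path to some value.
pub fn clear_directory_children<V>(entries: &mut HashMap<PathBuf, V>, dir: &Path) -> usize {
    let before = entries.len();
    // Keep everything that is not a direct child of `dir`.
    entries.retain(|path, _| path.parent() != Some(dir));
    before - entries.len()
}
```

Clearing only direct children before a shallow re-scan means a file deleted on disk cannot linger as a ghost entry, while entries in other directories stay untouched.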
…cate processing

- Introduced a shared `seen_paths` structure using `RwLock` to manage paths across all workers, addressing symlink loops and duplicate directory processing.
- Updated the `discovery_worker_rayon` function to utilize the shared `seen_paths`, enhancing efficiency and correctness in the discovery phase.
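As an illustration of the shared `seen_paths` idea, here is a minimal sketch using only the standard library. The `SeenPaths` wrapper and its `claim` method are hypothetical names; the real worker presumably also resolves symlinks (for example by canonicalizing) before checking, which this sketch leaves out.

```rust
use std::collections::HashSet;
use std::path::PathBuf;
use std::sync::{Arc, RwLock};

/// Hypothetical handle to a set of already-visited paths shared by all workers.
#[derive(Clone, Default)]
pub struct SeenPaths {
    inner: Arc<RwLock<HashSet<PathBuf>>>,
}

impl SeenPaths {
    /// Returns true only for the first caller that claims `path`.
    /// Every later caller, on any thread, gets false and should skip it.
    pub fn claim(&self, path: PathBuf) -> bool {
        // Fast path: a shared read lock is enough to reject known paths.
        if self.inner.read().unwrap().contains(&path) {
            return false;
        }
        // Slow path: `insert` re-checks under the write lock, so two workers
        // racing past the read check cannot both win.
        self.inner.write().unwrap().insert(path)
    }
}
```

A worker would call `claim` before descending into a directory and skip the directory when it returns false.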
… ephemeral indexing cleanup

- Moved the job phase logic into a new `run_job_phases` method for better organization and clarity.
- Updated the `run` method to always mark ephemeral indexing as complete, even on failure, preventing stuck indexing flags.
- Enhanced logging to provide feedback on the completion status of ephemeral indexing.
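The "always mark complete, even on failure" pattern from this commit could look roughly like the sketch below. The free function, the `AtomicBool` flag, and the `String` error type are simplifications chosen for illustration; the real `run` method operates on job state and the cache.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Runs the job phases and clears the in-progress flag on both the success
/// and the error path, so a failed job cannot leave the flag stuck.
pub fn run<F>(indexing_in_progress: &AtomicBool, run_job_phases: F) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    indexing_in_progress.store(true, Ordering::SeqCst);

    // Capture the result instead of returning early with `?`.
    let result = run_job_phases();

    // Cleanup happens unconditionally, before the result is propagated.
    indexing_in_progress.store(false, Ordering::SeqCst);
    result
}
```

This version does not cover panics inside the phases; a guard type that clears the flag in its `Drop` implementation would handle that case as well.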
- Updated `EphemeralIndex` and `NodeArena` to return `std::io::Result` for better error handling during creation and insertion.
- Implemented memory-mapped storage in `NodeArena` to efficiently manage large indexes, preventing out-of-memory errors.
- Refactored `EphemeralIndexCache` to handle initialization errors gracefully.
- Improved tests to validate new error handling and memory management features.
@jamiepine
Member Author

@cursor review again please, I have made several commits since your review

e
);
// Mark indexing as not in progress since job failed
cache.mark_indexing_complete(&local_path);

Bug: Failed indexing wrongly marks path as successfully indexed

When ephemeral indexer job dispatch fails, the code calls cache.mark_indexing_complete(&local_path) which not only removes the path from indexing_in_progress but also adds it to indexed_paths. This incorrectly marks the path as successfully indexed even though no indexing occurred. Subsequent queries will return the global index for this path, but it will contain no entries for the directory. The code should only clear the in-progress state on failure without marking the path as indexed. A separate cancel_indexing method that only removes from indexing_in_progress would be appropriate here.
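A minimal sketch of the distinction this comment draws, assuming the two sets it names (`indexing_in_progress` and `indexed_paths`) are plain hash sets. The `IndexingState` struct, `begin_indexing`, and the query helpers are hypothetical; `cancel_indexing` is the method the comment proposes, not one confirmed to exist in the PR.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical bookkeeping for which paths are being indexed or are done.
#[derive(Default)]
pub struct IndexingState {
    indexing_in_progress: HashSet<PathBuf>,
    indexed_paths: HashSet<PathBuf>,
}

impl IndexingState {
    pub fn begin_indexing(&mut self, path: &Path) {
        self.indexing_in_progress.insert(path.to_path_buf());
    }

    /// Success path: clears the in-progress flag AND records the path as indexed.
    pub fn mark_indexing_complete(&mut self, path: &Path) {
        self.indexing_in_progress.remove(path);
        self.indexed_paths.insert(path.to_path_buf());
    }

    /// Failure path: clears the in-progress flag only, so a later request
    /// for this path triggers indexing again instead of reading an empty index.
    pub fn cancel_indexing(&mut self, path: &Path) {
        self.indexing_in_progress.remove(path);
    }

    pub fn is_indexing(&self, path: &Path) -> bool {
        self.indexing_in_progress.contains(path)
    }

    pub fn is_indexed(&self, path: &Path) -> bool {
        self.indexed_paths.contains(path)
    }
}
```

With this split, the dispatch-failure branch would call `cancel_indexing(&local_path)` rather than `mark_indexing_complete(&local_path)`.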


- Updated `EphemeralIndex` to preserve explicitly browsed subdirectories during re-indexing, preventing loss of user navigation context.
- Modified `clear_directory_children` to return the count of cleared entries and a list of deleted browsed directories.
- Introduced `EphemeralIndexCache` enhancements to support filesystem watching, allowing paths to be monitored for changes.
- Added methods for registering, unregistering, and checking watched paths, improving the responsiveness of the indexing system.
- Updated documentation and tests to reflect new functionality and ensure reliability.
…ng support

- Introduced a new `handler.rs` module to manage filesystem change events for both persistent and ephemeral indexing.
- Added a trait-based `ChangeHandler` interface to abstract operations for different storage backends.
- Enhanced `EphemeralIndexCache` to support filesystem watching, allowing paths to be monitored for changes.
- Implemented methods for registering and unregistering watched paths, improving responsiveness to filesystem events.
- Updated the `LocationWatcher` to handle ephemeral watches and process events accordingly.
- Added tests and documentation to ensure reliability and clarity of the new functionality.
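To illustrate the trait-based approach, here is a small sketch of one event loop driving two storage backends through a shared trait. Only the `ChangeHandler` name comes from this PR; the trait methods, the `FsEvent` enum, both handler types, and `dispatch_events` are assumptions made up for the example.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Hypothetical, simplified filesystem event.
pub enum FsEvent {
    Created(PathBuf),
    Removed(PathBuf),
}

/// Operations every storage backend must provide (assumed method set).
pub trait ChangeHandler {
    fn on_create(&mut self, path: &Path);
    fn on_remove(&mut self, path: &Path);
}

/// In-memory backend: keeps the set of known paths.
#[derive(Default)]
pub struct MemoryHandler {
    pub paths: BTreeSet<PathBuf>,
}

impl ChangeHandler for MemoryHandler {
    fn on_create(&mut self, path: &Path) {
        self.paths.insert(path.to_path_buf());
    }
    fn on_remove(&mut self, path: &Path) {
        self.paths.remove(path);
    }
}

/// Stand-in for a database backend: records the operations it would perform.
#[derive(Default)]
pub struct LoggingHandler {
    pub log: Vec<String>,
}

impl ChangeHandler for LoggingHandler {
    fn on_create(&mut self, path: &Path) {
        self.log.push(format!("insert {}", path.display()));
    }
    fn on_remove(&mut self, path: &Path) {
        self.log.push(format!("delete {}", path.display()));
    }
}

/// Shared event loop: the same code path serves either backend.
pub fn dispatch_events<H: ChangeHandler>(handler: &mut H, events: &[FsEvent]) {
    for event in events {
        match event {
            FsEvent::Created(path) => handler.on_create(path),
            FsEvent::Removed(path) => handler.on_remove(path),
        }
    }
}
```

Keeping the event-matching logic in one generic function is what lets persistent and ephemeral storage react to the same events consistently.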
- Removed the `handler.rs` module and integrated its functionality into the new `change_detection` module, which now handles both persistent and ephemeral change processing.
- Implemented a `ChangeDetector` for batch indexing scans, allowing efficient detection of new, modified, moved, and deleted entries.
- Introduced a `ChangeHandler` trait to abstract operations for both persistent and ephemeral storage, ensuring consistent behavior across different backends.
- Enhanced the `EphemeralChangeHandler` and `PersistentChangeHandler` to utilize the new change detection infrastructure.
- Updated the `apply_batch` function to streamline event processing and improve responsiveness to filesystem changes.
- Added comprehensive tests and documentation to validate the new structure and functionality.
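One way a batch scan can classify entries as new, modified, moved, or deleted is to diff two snapshots and use a stable file identity to pair up renames. The sketch below assumes inode numbers as that identity and modification time as the change signal; the `Meta`, `Change`, and `detect_changes` names are hypothetical and the PR's `ChangeDetector` may work differently.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical per-entry metadata captured during a scan.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct Meta {
    pub inode: u64,
    pub mtime: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    New(PathBuf),
    Modified(PathBuf),
    Moved { from: PathBuf, to: PathBuf },
    Deleted(PathBuf),
}

/// Compares the previous snapshot with the current scan.
/// The order of the returned changes is unspecified.
pub fn detect_changes(
    old: &HashMap<PathBuf, Meta>,
    new: &HashMap<PathBuf, Meta>,
) -> Vec<Change> {
    // Paths that disappeared, keyed by inode so a rename can be recognized.
    let mut vanished: HashMap<u64, &PathBuf> = old
        .iter()
        .filter(|(path, _)| !new.contains_key(*path))
        .map(|(path, meta)| (meta.inode, path))
        .collect();

    let mut changes = Vec::new();
    for (path, meta) in new {
        match old.get(path) {
            // Same path, different mtime: content changed.
            Some(prev) if prev.mtime != meta.mtime => {
                changes.push(Change::Modified(path.clone()));
            }
            // Same path, same mtime: nothing to report.
            Some(_) => {}
            // Unknown path: a move if its inode vanished elsewhere, else new.
            None => match vanished.remove(&meta.inode) {
                Some(from) => changes.push(Change::Moved {
                    from: from.clone(),
                    to: path.clone(),
                }),
                None => changes.push(Change::New(path.clone())),
            },
        }
    }
    // Whatever vanished without reappearing under a new path was deleted.
    changes.extend(vanished.into_values().map(|path| Change::Deleted(path.clone())));
    changes
}
```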
- Replaced calls to `delete_subtree_internal` with `EntryProcessor::delete_subtree` in the `entry`, `location`, and `manager` modules to streamline the deletion process.
- Introduced a new `delete_subtree` method in `EntryProcessor` that handles the deletion of an entry and its descendants without creating tombstones, improving efficiency in database operations.
- Removed the deprecated `delete_subtree_internal` function from the `responder` module, consolidating deletion logic into the `EntryProcessor`.
- Updated documentation and tests to reflect the changes in deletion handling and ensure reliability.
- Introduced a new `ephemeral` module to encapsulate the `EphemeralIndex` functionality, enhancing organization and clarity.
- Moved `EphemeralIndex` and related types to the new module, ensuring a cleaner separation of concerns.
- Updated documentation across the `hierarchy`, `job`, and `persistence` modules to reflect the new structure and improve clarity on ephemeral indexing operations.
- Removed deprecated references to `EphemeralIndex` in favor of the new module path, streamlining code references.
- Enhanced comments and documentation to provide better context and understanding of the ephemeral indexing system.
- Replace EntryProcessor with DBWriter across indexing
- Introduce EphemeralWriter to unify ephemeral indexing logic
- Update IndexPersistence to abstract over writers and adjust modules
@jamiepine jamiepine changed the title from "Introduce ephemeral index cache and status API" to "Introduce ephemeral index cache" Dec 8, 2025
- Rename indexing backend: DBWriter to DatabaseStorage
- Replace EphemeralWriter with MemoryAdapter across watcher and
  ephemeral components
- Update module paths and imports in core indexing code, job, and
  persistence layers to use DatabaseStorage and MemoryAdapter
- Update docs to reflect new names
  (DatabaseStorage, MemoryAdapter)
@jamiepine jamiepine merged commit 89becd5 into main Dec 9, 2025
2 of 7 checks passed
@jamiepine jamiepine deleted the ephemeral-cache branch December 9, 2025 01:38
2 participants