feat: Add semantic caching (both generic and scoped) to the demo workflow (optional) #24

Open
vishal-bala wants to merge 12 commits into domain/airline-support from feat/semantic-caching

@vishal-bala vishal-bala commented May 6, 2026

Motivation

Semantic caching is one of our most popular demo stories, and Context Surfaces has become another core part of how we present the product. Until now, those two stories have been separate. This change connects them so semantic caching can be demonstrated directly inside the Context Surfaces experience, with the cache behavior visible in the same traced workflow as the underlying tool calls.

This PR also makes scoped semantic caching concrete. We have talked about different groups of users seeing different cache behavior, but we did not yet have a demo that showed that end to end. This change adds that capability and turns it into a supported flow in the airline-support domain.

Changes

Semantic caching implementation

This PR adds a shared semantic-cache runtime to the backend and integrates it into the context_surfaces chat flow. For eligible prompts on fresh single-turn threads, the backend now checks the semantic cache before running the agent. If a matching answer is found, it reuses that response directly and persists the cached turn into LangGraph state. If no hit is found, the request proceeds normally and the system then evaluates whether the final answer is safe to write back to the cache.
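To make the read-then-write ordering concrete, here is a minimal sketch of the control flow described above. It is not the PR's implementation: `ToyCache`, `handle_turn`, the event names, and the callables passed in are all hypothetical, and the toy cache matches on normalized text rather than embedding similarity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToyCache:
    """Stand-in for a semantic cache (assumption: exact match on normalized
    text instead of vector similarity, to keep the sketch dependency-free)."""
    entries: Dict[str, str] = field(default_factory=dict)

    def lookup(self, prompt: str) -> Optional[str]:
        return self.entries.get(prompt.strip().lower())

    def store(self, prompt: str, answer: str) -> None:
        self.entries[prompt.strip().lower()] = answer


def handle_turn(
    prompt: str,
    thread_history: List[Tuple[str, str]],
    cache: ToyCache,
    run_agent: Callable[[str], Tuple[str, List[str]]],
    is_safe_to_cache: Callable[[List[str]], bool],
) -> Tuple[str, List[str]]:
    """Return (answer, trace_events) for one chat turn."""
    events: List[str] = []
    fresh = len(thread_history) == 0  # only fresh single-turn threads are eligible

    if fresh:
        cached = cache.lookup(prompt)
        if cached is not None:
            # Hit: skip the agent, but still persist the turn into thread state.
            events.append("cache_hit")
            thread_history.extend([("user", prompt), ("assistant", cached)])
            return cached, events
        events.append("cache_miss")
    else:
        events.append("cache_skip")  # follow-up turn, not eligible for lookup

    answer, tools_used = run_agent(prompt)
    thread_history.extend([("user", prompt), ("assistant", answer)])

    if fresh:
        if is_safe_to_cache(tools_used):
            cache.store(prompt, answer)
            events.append("cache_write")
        else:
            events.append("cache_skip")  # answer depends on unsafe sources
    return answer, events
```

In this sketch a second user asking the same question on a new thread gets a `cache_hit` and never reaches `run_agent`, while a follow-up on an existing thread bypasses the cache entirely.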

Cache reads and writes are filtered by domain, mode, model, and access class, with explicit support for both public and group-scoped reuse. The runtime also tracks provenance from internal tools and MCP tools so only answers backed by safe sources are stored. Responses that depend on booking, itinerary, disruption, or other user-specific records are intentionally excluded from cache writes. Cache hit, miss, skip, and write events are surfaced in the trace so the behavior is visible during the demo instead of remaining an implementation detail.
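The provenance gate can be sketched as a "strictest tool wins" classification. This is an illustration under stated assumptions, not the code in this PR: `AccessClass`, `TOOL_ACCESS`, `classify_answer`, and `write_scope` are hypothetical names, every tool name except `get_current_service_tier_context` is invented, and treating unknown tools or tool-free answers as non-cacheable is an assumed default.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class AccessClass(IntEnum):
    """Ordered so that a larger value means a stricter class."""
    PUBLIC = 0
    GROUP = 1
    NON_CACHEABLE = 2


# Hypothetical per-tool access metadata; only the tier-context tool name
# appears in the PR description, the others are placeholders.
TOOL_ACCESS: Dict[str, AccessClass] = {
    "search_policy": AccessClass.PUBLIC,
    "get_flight_status": AccessClass.PUBLIC,
    "get_current_service_tier_context": AccessClass.GROUP,
    "get_booking": AccessClass.NON_CACHEABLE,
}


def classify_answer(tools_used: List[str]) -> AccessClass:
    """An answer is only as shareable as its most restricted source."""
    if not tools_used:
        # Assumption: no provenance means no evidence that reuse is safe.
        return AccessClass.NON_CACHEABLE
    return max(
        TOOL_ACCESS.get(tool, AccessClass.NON_CACHEABLE) for tool in tools_used
    )


def write_scope(tools_used: List[str], cache_group: Optional[str]) -> Optional[dict]:
    """Return the metadata to store with a cache entry, or None to skip the write."""
    access = classify_answer(tools_used)
    if access == AccessClass.PUBLIC:
        return {"access_class": "public", "cache_group": None}
    if access == AccessClass.GROUP and cache_group:
        return {"access_class": "group", "cache_group": cache_group}
    return None  # record-backed, unknown, or group answer without a group
```

Taking the maximum over the tools used means a single record-backed call is enough to keep an otherwise public answer out of the cache.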

Scoped semantic caching and multi-user configuration

This PR introduces scoped semantic caching by adding multi-user demo configuration to the project. The domain contract now supports semantic-cache settings, internal-tool access metadata, and domain-provided demo-user definitions. The chat request payload includes a selected demo_user_id, the backend resolves that into request-scoped user context, and the active user now determines the cache group available for semantic-cache reads and writes.
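One way to picture the request-scoped user context is with Python's `contextvars`. The sketch below assumes a small in-memory user table; `DemoUser`, `DEMO_USERS`, the user IDs, the `basic_en` group, the thread-ID format, and the fallback to a default user are all hypothetical and may differ from what the PR does.

```python
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DemoUser:
    user_id: str
    cache_group: Optional[str] = None


# Hypothetical domain-provided demo users.
DEMO_USERS: Dict[str, DemoUser] = {
    "pax_anna": DemoUser("pax_anna", cache_group="senator_en"),
    "pax_cara": DemoUser("pax_cara", cache_group="basic_en"),
}
DEFAULT_DEMO_USER_ID = "pax_anna"

# Holds the active user for the current request only.
current_user: ContextVar[Optional[DemoUser]] = ContextVar("current_user", default=None)


def resolve_demo_user(demo_user_id: Optional[str]) -> DemoUser:
    """Map the request's demo_user_id to a user (assumed: None means default)."""
    if demo_user_id is None:
        return DEMO_USERS[DEFAULT_DEMO_USER_ID]
    try:
        return DEMO_USERS[demo_user_id]
    except KeyError:
        raise ValueError(f"unknown demo_user_id: {demo_user_id!r}") from None


def namespaced_thread_id(user: DemoUser, thread_id: str) -> str:
    """Prefix the thread ID so two users never share conversation state."""
    return f"{user.user_id}:{thread_id}"


def handle_request(demo_user_id: Optional[str], thread_id: str) -> dict:
    user = resolve_demo_user(demo_user_id)
    token = current_user.set(user)
    try:
        active = current_user.get()
        return {
            "thread_id": namespaced_thread_id(active, thread_id),
            "cache_group": active.cache_group,
        }
    finally:
        current_user.reset(token)  # never leak identity into the next request
```

Resetting the context variable in `finally` keeps the identity strictly per request, which is the property that environment-only configuration cannot provide.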

In the airline-support domain, this is used to model cohort-aware reuse explicitly. Shared policy guidance and public flight-status lookups can participate in caching, while record-backed or profile-backed paths remain non-cacheable. The result is a concrete scoped-cache demo: two passengers in the same cohort can share a cached answer for the same prompt, while a passenger in a different cohort will miss and generate a separate result. This also required moving identity resolution away from environment-only configuration and into per-request demo-user state.
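The cohort behavior can be illustrated with a small visibility check on the read side. This is a sketch only: `CacheEntry`, `is_visible`, `lookup`, the passenger IDs, and the `basic_en` group are hypothetical, and exact prompt matching stands in for the semantic similarity search.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DemoUser:
    user_id: str
    cache_group: Optional[str]


@dataclass(frozen=True)
class CacheEntry:
    prompt: str
    answer: str
    access_class: str  # "public" or "group"
    cache_group: Optional[str] = None


def is_visible(entry: CacheEntry, user: DemoUser) -> bool:
    """Public entries are visible to everyone; group entries only to that group."""
    if entry.access_class == "public":
        return True
    return (
        entry.access_class == "group"
        and entry.cache_group is not None
        and entry.cache_group == user.cache_group
    )


def lookup(entries: List[CacheEntry], prompt: str, user: DemoUser) -> Optional[str]:
    # Assumption: exact prompt equality replaces vector similarity here.
    for entry in entries:
        if entry.prompt == prompt and is_visible(entry, user):
            return entry.answer
    return None


anna = DemoUser("pax_anna", "senator_en")
ben = DemoUser("pax_ben", "senator_en")
cara = DemoUser("pax_cara", "basic_en")

entries = [
    # Written after anna's run, scoped to her cohort.
    CacheEntry("How do I cancel?", "Tier-specific cancellation steps.", "group", "senator_en"),
    # Written from a public flight-status lookup.
    CacheEntry("Is flight XX123 on time?", "Yes, on time.", "public"),
]

same_cohort = lookup(entries, "How do I cancel?", ben)     # reuses anna's answer
other_cohort = lookup(entries, "How do I cancel?", cara)   # miss, agent must run
shared = lookup(entries, "Is flight XX123 on time?", cara)  # public hit
```

Here `ben` reuses the answer cached for `anna`, `cara` misses on the same prompt, and both cohorts hit the public flight-status entry.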

Airline support demo extension

The airline-support demo has been extended to showcase the new semantic-caching behavior directly. The domain now exposes multiple demo passengers, including users who deliberately share a cache cohort and others who do not, so the demo can show both cohort-local reuse and cross-cohort misses. The identity tool has also been expanded to return tier, service-permission, and cache-group context that supports these flows cleanly.

The scripted airline demo paths and supporting dataset were updated to make the new behavior legible. There is now a tier-based cancellation-help flow that demonstrates scoped cache reuse within a passenger cohort, and a shared flight-status flow that demonstrates public cache reuse across passengers. The flagship disruption path remains intentionally non-cacheable, which helps reinforce the distinction between shared guidance, scoped guidance, and record-specific answers.

Additional changes

  • Added SemanticCacheService with RedisVL-backed cache lookup, write, warmup, and cleanup behavior.
  • Added semantic-cache configuration and internal-tool access-control metadata to the domain contract.
  • Added request-scoped demo-user context and demo_user_id support in the chat API.
  • Exposed semantic_cache_enabled, demo_users, and default_demo_user_id through /api/domain-config.
  • Added a passenger selector to the frontend and trace labeling/styling for semantic-cache events.
  • Hardened Redis and LangGraph connection handling with shared connection settings and cleanup hooks.
  • Updated airline-support prompts, demo-path documentation, and generated policy/data fixtures for the new cache flows.
  • Added tests covering demo-user resolution, cache grouping, tool classification, filter construction, and cached-turn persistence.

Note

Medium Risk
Touches the core chat/SSE path, LangGraph thread identity, and Redis-backed cache/checkpointer lifecycle; misclassification could cache or reuse answers across the wrong cohort or for personalized prompts, though heuristics and provenance gates aim to prevent that.

Overview
Adds optional semantic caching to the Context Surfaces chat path and wires it into the airline-support demo so cache behavior shows up in the same SSE tool trace as MCP/internal calls.

The shared backend now supports domain-configured semantic cache settings, internal-tool access classes (public / group / non-cacheable), and demo_user_id on chat requests. The selected passenger resolves to a request-scoped identity (no longer taken only from .env), a namespaced LangGraph thread ID, and a cache group for cohort-scoped reads and writes. On fresh single-turn threads, eligible prompts get a RedisVL semantic lookup (public + optional group filters); hits short-circuit the agent and persist the turn into checkpoint state. After a full run, answers are written back only when tool provenance is safe, meaning public (e.g. shared flight status, policy search) or group (tier context), and writes are skipped for booking/itinerary/profile-style or user-specific prompts.

Airline-support is extended with multiple demo passengers (a shared senator_en cohort vs others), richer profile/tier fields, get_current_service_tier_context, updated prompts/paths/docs, and a UI passenger selector plus trace styling for cache events. The change also adds the sentence-transformers dependency, Redis pool tuning, and broad tests for cache classification, filtering, and stream behavior.

Reviewed by Cursor Bugbot for commit 425195a. Bugbot is set up for automated code reviews on this repo.

@vishal-bala vishal-bala self-assigned this May 6, 2026
jit-ci Bot commented May 6, 2026

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR


Security scan by Jit

@vishal-bala vishal-bala marked this pull request as ready for review May 15, 2026 13:39
Comment thread backend/app/main.py
Comment thread backend/app/semantic_cache.py Outdated
Comment thread backend/app/main.py Outdated
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread backend/app/main.py
return
_background_resources_cleaned = True

semantic_cache_service.close()
Cleanup flag set before actual resource cleanup completes

Low Severity

In _cleanup_process_resources, the _background_resources_cleaned flag is set to True inside the lock, but semantic_cache_service.close() is called after the lock is released. This means a concurrent caller (e.g., the atexit handler racing with shutdown_resources) could observe the flag as True and return early, even though close() hasn't executed yet. The close() call belongs inside the with _cleanup_lock: block so the guarded flag accurately reflects whether cleanup has actually finished.
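A minimal sketch of the suggested fix, with `close()` moved inside the locked region so the flag only becomes true once cleanup has actually run. The `ProcessResources` class and `FakeService` are hypothetical wrappers used to keep the example self-contained; `_lock` and `_cleaned` play the roles of `_cleanup_lock` and `_background_resources_cleaned` in backend/app/main.py.

```python
import threading


class ProcessResources:
    """Idempotent, thread-safe cleanup for a single closable service."""

    def __init__(self, service) -> None:
        self._service = service
        self._lock = threading.Lock()
        self._cleaned = False

    def cleanup(self) -> None:
        with self._lock:
            if self._cleaned:
                return
            # close() runs while the lock is held, so a concurrent caller
            # blocks here until cleanup has really finished.
            self._service.close()
            # The flag is set only after close() succeeds.
            self._cleaned = True


class FakeService:
    """Hypothetical stand-in for semantic_cache_service."""

    def __init__(self) -> None:
        self.close_calls = 0

    def close(self) -> None:
        self.close_calls += 1


service = FakeService()
resources = ProcessResources(service)

# Simulate an atexit handler racing with a shutdown hook.
threads = [threading.Thread(target=resources.cleanup) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Setting the flag after `close()` also means that if `close()` raises, a later caller will retry rather than silently skip cleanup; whether that retry is desirable depends on how `close()` fails in practice.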
