feat: Add semantic caching (both generic and scoped) to the demo workflow (optional) #24
vishal-bala wants to merge 12 commits into
🛡️ Jit Security Scan Results
✅ No security findings were detected in this PR
Security scan by Jit
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 425195a.
            return
        _background_resources_cleaned = True

    semantic_cache_service.close()
Cleanup flag set before actual resource cleanup completes
Low Severity
In _cleanup_process_resources, the _background_resources_cleaned flag is set to True inside the lock, but semantic_cache_service.close() is called after the lock is released. This means a concurrent caller (e.g., the atexit handler racing with shutdown_resources) could observe the flag as True and return early, even though close() hasn't executed yet. The close() call belongs inside the with _cleanup_lock: block so the guarded flag accurately reflects whether cleanup has actually finished.


Motivation
Semantic caching is one of our most popular demo stories, and context surfaces has become another core part of how we present the product. Until now, those two stories have been separate. This change connects them so semantic caching can be demonstrated directly inside the context-surfaces experience, with the cache behavior visible in the same traced workflow as the underlying tool calls.
This PR also makes scoped semantic caching concrete. We have talked about different groups of users seeing different cache behavior, but we did not yet have a demo that showed that end to end. This change adds that capability and turns it into a supported flow in the airline-support domain.
Changes
Semantic caching implementation
This PR adds a shared semantic-cache runtime to the backend and integrates it into the context_surfaces chat flow. For eligible prompts on fresh single-turn threads, the backend now checks the semantic cache before running the agent, and if a matching answer is found it reuses that response directly while persisting the cached turn into LangGraph state. If no hit is found, the request proceeds normally and the system evaluates whether the final answer is safe to write back to cache.
Cache reads and writes are filtered by domain, mode, model, and access class, with explicit support for both public and group-scoped reuse. The runtime also tracks provenance from internal tools and MCP tools so only answers backed by safe sources are stored. Responses that depend on booking, itinerary, disruption, or other user-specific records are intentionally excluded from cache writes. Cache hit, miss, skip, and write events are surfaced in the trace so the behavior is visible during the demo instead of remaining an implementation detail.
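The read-then-write flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `handle_turn`, `TurnResult`, and the trace event names are hypothetical, a plain dict with normalized exact-match keys stands in for the RedisVL vector lookup, and the domain/mode/model filters are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

NON_CACHEABLE = "non_cacheable"  # hypothetical access-class label


@dataclass
class TurnResult:
    answer: str
    trace: List[str] = field(default_factory=list)


def handle_turn(
    prompt: str,
    is_fresh_single_turn: bool,
    cache: Dict[str, str],
    run_agent: Callable[[str], Tuple[str, List[str]]],
) -> TurnResult:
    """Check the cache before the agent; write back only safe answers.

    run_agent returns (answer, access classes of the tools it used).
    """
    trace: List[str] = []
    if not is_fresh_single_turn:
        # Ineligible turn: bypass the cache entirely.
        trace.append("cache_skip")
        answer, _ = run_agent(prompt)
        return TurnResult(answer, trace)

    # Exact match on a normalized prompt, standing in for semantic similarity.
    key = prompt.strip().lower()
    cached = cache.get(key)
    if cached is not None:
        trace.append("cache_hit")
        return TurnResult(cached, trace)  # short-circuit: agent never runs

    trace.append("cache_miss")
    answer, tool_classes = run_agent(prompt)
    # Provenance gate (assumption: an answer with no tool provenance is not stored).
    if tool_classes and NON_CACHEABLE not in tool_classes:
        cache[key] = answer
        trace.append("cache_write")
    else:
        trace.append("cache_skip")
    return TurnResult(answer, trace)
```

In this sketch a single non-cacheable tool call is enough to block the write, which mirrors the rule that record-backed answers are excluded from the cache.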
Scoped semantic caching and multi-user configuration
This PR introduces scoped semantic caching by adding multi-user demo configuration to the project. The domain contract now supports semantic-cache settings, internal-tool access metadata, and domain-provided demo-user definitions. The chat request payload includes a selected demo_user_id, the backend resolves that into request-scoped user context, and the active user now determines the cache group available for semantic-cache reads and writes.
In the airline-support domain, this is used to model cohort-aware reuse explicitly. Shared policy guidance and public flight-status lookups can participate in caching, while record-backed or profile-backed paths remain non-cacheable. The result is a concrete scoped-cache demo: two passengers in the same cohort can share a cached answer for the same prompt, while a passenger in a different cohort will miss and generate a separate result. This also required moving identity resolution away from environment-only configuration and into per-request demo-user state.
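A small sketch of cohort-scoped reads may help make this concrete. Everything here is illustrative: the passenger IDs, the `basic_en` cohort, and the `lookup` helper are hypothetical, and exact prompt equality stands in for the semantic match. Only `demo_user_id` and the `senator_en` cohort name come from the PR description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DemoUser:
    user_id: str
    cache_group: Optional[str]


@dataclass(frozen=True)
class CacheEntry:
    prompt: str
    answer: str
    access_class: str  # "public" or "group"
    cache_group: Optional[str] = None


# Hypothetical demo passengers; two share a cohort, one does not.
DEMO_USERS: Dict[str, DemoUser] = {
    "passenger_a": DemoUser("passenger_a", "senator_en"),
    "passenger_b": DemoUser("passenger_b", "senator_en"),
    "passenger_c": DemoUser("passenger_c", "basic_en"),
}


def lookup(entries: List[CacheEntry], demo_user_id: str, prompt: str) -> Optional[str]:
    """Return a cached answer visible to this user, or None on a miss."""
    user = DEMO_USERS[demo_user_id]  # resolve request-scoped user context
    for entry in entries:
        if entry.prompt != prompt:
            continue
        if entry.access_class == "public":
            return entry.answer  # public entries are reusable by everyone
        if (
            entry.access_class == "group"
            and user.cache_group is not None
            and entry.cache_group == user.cache_group
        ):
            return entry.answer  # group entries only within the same cohort
    return None
```

With a group-scoped entry written for `senator_en`, `passenger_a` and `passenger_b` both hit on the same prompt while `passenger_c` misses; a public entry is a hit for all three.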
Airline support demo extension
The airline-support demo has been extended to showcase the new semantic-caching behavior directly. The domain now exposes multiple demo passengers, including users who deliberately share a cache cohort and others who do not, so the demo can show both cohort-local reuse and cross-cohort misses. The identity tool has also been expanded to return tier, service-permission, and cache-group context that supports these flows cleanly.
The scripted airline demo paths and supporting dataset were updated to make the new behavior legible. There is now a tier-based cancellation-help flow that demonstrates scoped cache reuse within a passenger cohort, and a shared flight-status flow that demonstrates public cache reuse across passengers. The flagship disruption path remains intentionally non-cacheable, which helps reinforce the distinction between shared guidance, scoped guidance, and record-specific answers.
Additional changes
- SemanticCacheService with RedisVL-backed cache lookup, write, warmup, and cleanup behavior.
- demo_user_id support in the chat API.
- semantic_cache_enabled, demo_users, and default_demo_user_id through /api/domain-config.
Note
Medium Risk
Touches the core chat/SSE path, LangGraph thread identity, and Redis-backed cache/checkpointer lifecycle; misclassification could cache or reuse answers across the wrong cohort or for personalized prompts, though heuristics and provenance gates aim to prevent that.
Overview
Adds optional semantic caching to the Context Surfaces chat path and wires it into the airline-support demo so cache behavior shows up in the same SSE tool trace as MCP/internal calls.
The shared backend now supports domain-configured semantic cache settings, internal-tool access classes (public / group / non-cacheable), and demo_user_id on chat requests. Selected passengers resolve to request-scoped identity (not only .env), namespaced LangGraph thread IDs, and a cache group for cohort-scoped reads/writes. On fresh single-turn threads, eligible prompts get a RedisVL semantic lookup (public + optional group filters); hits short-circuit the agent and persist the turn into checkpoint state. After a full run, answers are written back only when tool provenance is safe, either public (e.g. shared flight status, policy search) or group (tier context), and are skipped for booking/itinerary/profile-style or user-specific prompts.
Airline-support is extended with multiple demo passengers (shared senator_en cohort vs others), richer profile/tier fields, get_current_service_tier_context, updated prompts/paths/docs, and a UI passenger selector plus trace styling for cache events. Docs/deps add sentence-transformers, Redis pool tuning, and broad tests for cache classification, filtering, and stream behavior.