feat: Add semantic caching (both generic and scoped) to the demo workflow (optional) by vishal-bala · Pull Request #24 · redis/redis-iris-demos

vishal-bala · 2026-05-06T17:09:24Z

Motivation

Semantic caching is one of our most popular demo stories, and Context Surfaces has become another core part of how we present the product. Until now, the two stories have been separate. This change connects them so semantic caching can be demonstrated directly inside the Context Surfaces experience, with the cache behavior visible in the same traced workflow as the underlying tool calls.

This PR also makes scoped semantic caching concrete. We have talked about different groups of users seeing different cache behavior, but we did not yet have a demo that showed that end to end. This change adds that capability and turns it into a supported flow in the airline-support domain.

Changes

Semantic caching implementation

This PR adds a shared semantic-cache runtime to the backend and integrates it into the context_surfaces chat flow. For eligible prompts on fresh single-turn threads, the backend now checks the semantic cache before running the agent. If a matching answer is found, the backend reuses that response directly and persists the cached turn into LangGraph state. If no hit is found, the request proceeds normally, and the system then evaluates whether the final answer is safe to write back to the cache.
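The control flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `InMemoryCache`, `handle_turn`, `persist_turn`, `is_safe_to_cache`, and the trace event names are hypothetical, and the stand-in cache matches on normalized prompt text rather than vector similarity.

```python
from typing import Callable, Dict, List, Optional


class InMemoryCache:
    """Hypothetical stand-in: exact-match lookup instead of vector similarity."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def lookup(self, prompt: str) -> Optional[str]:
        return self._entries.get(prompt.strip().lower())

    def write(self, prompt: str, answer: str) -> None:
        self._entries[prompt.strip().lower()] = answer


def handle_turn(
    prompt: str,
    prior_turns: int,
    *,
    cache: InMemoryCache,
    run_agent: Callable[[str], str],
    is_safe_to_cache: Callable[[str], bool],
    persist_turn: Callable[[str, str], None],
    trace: List[str],
) -> str:
    # Assumption: "fresh single-turn thread" means no earlier turns exist.
    if prior_turns > 0:
        trace.append("cache_skip")
        return run_agent(prompt)

    cached = cache.lookup(prompt)
    if cached is not None:
        # Hit: reuse the answer and record the turn in conversation state.
        trace.append("cache_hit")
        persist_turn(prompt, cached)
        return cached

    # Miss: run the agent, then write back only if the answer is safe.
    trace.append("cache_miss")
    answer = run_agent(prompt)
    if is_safe_to_cache(answer):
        cache.write(prompt, answer)
        trace.append("cache_write")
    else:
        trace.append("cache_skip")
    return answer
```

In this sketch the agent runs once for two equivalent prompts on fresh threads, and a prompt arriving on a thread with history bypasses the cache entirely.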

Cache reads and writes are filtered by domain, mode, model, and access class, with explicit support for both public and group-scoped reuse. The runtime also tracks provenance from internal tools and MCP tools so only answers backed by safe sources are stored. Responses that depend on booking, itinerary, disruption, or other user-specific records are intentionally excluded from cache writes. Cache hit, miss, skip, and write events are surfaced in the trace so the behavior is visible during the demo instead of remaining an implementation detail.
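One way to picture the filtering and provenance rules is the sketch below. It assumes that the most restrictive tool used in a run decides the access class of the answer, and that an answer with no recorded provenance is not cached; both are assumptions for illustration. The function names, the dictionary layout, and the example field values are hypothetical and do not reflect the RedisVL filter syntax used by the actual runtime.

```python
from enum import IntEnum
from typing import Iterable, Optional


class AccessClass(IntEnum):
    """Ordered from least to most restrictive."""
    PUBLIC = 0
    GROUP = 1
    NON_CACHEABLE = 2


def answer_access_class(tool_classes: Iterable[AccessClass]) -> AccessClass:
    """The most restrictive tool in the run decides how the answer is cached."""
    classes = list(tool_classes)
    if not classes:
        # Assumption: no recorded provenance means the answer is not cached.
        return AccessClass.NON_CACHEABLE
    return max(classes)


def build_read_filter(domain: str, mode: str, model: str,
                      cache_group: Optional[str]) -> dict:
    """Public entries are always readable; group entries only for the caller's group."""
    scopes = [("public", None)]
    if cache_group:
        scopes.append(("group", cache_group))
    return {"domain": domain, "mode": mode, "model": model, "scopes": scopes}


def entry_matches(entry: dict, read_filter: dict) -> bool:
    """True when a stored entry is visible under the given read filter."""
    same_context = all(
        entry[key] == read_filter[key] for key in ("domain", "mode", "model")
    )
    scope = (entry["access_class"], entry.get("cache_group"))
    return same_context and scope in read_filter["scopes"]
```

Under these assumptions, a run that touches any non-cacheable tool (for example a booking lookup) is never written, and a group-scoped entry is invisible to callers without a matching cache group.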

Scoped semantic caching and multi-user configuration

This PR introduces scoped semantic caching by adding multi-user demo configuration to the project. The domain contract now supports semantic-cache settings, internal-tool access metadata, and domain-provided demo-user definitions. The chat request payload includes a selected demo_user_id, the backend resolves that into request-scoped user context, and the active user now determines the cache group available for semantic-cache reads and writes.

In the airline-support domain, this is used to model cohort-aware reuse explicitly. Shared policy guidance and public flight-status lookups can participate in caching, while record-backed or profile-backed paths remain non-cacheable. The result is a concrete scoped-cache demo: two passengers in the same cohort can share a cached answer for the same prompt, while a passenger in a different cohort will miss and generate a separate result. This also required moving identity resolution away from environment-only configuration and into per-request demo-user state.

Airline support demo extension

The airline-support demo has been extended to showcase the new semantic-caching behavior directly. The domain now exposes multiple demo passengers, including users who deliberately share a cache cohort and others who do not, so the demo can show both cohort-local reuse and cross-cohort misses. The identity tool has also been expanded to return tier, service-permission, and cache-group context that supports these flows cleanly.

The scripted airline demo paths and supporting dataset were updated to make the new behavior legible. There is now a tier-based cancellation-help flow that demonstrates scoped cache reuse within a passenger cohort, and a shared flight-status flow that demonstrates public cache reuse across passengers. The flagship disruption path remains intentionally non-cacheable, which helps reinforce the distinction between shared guidance, scoped guidance, and record-specific answers.

Additional changes

Added SemanticCacheService with RedisVL-backed cache lookup, write, warmup, and cleanup behavior.
Added semantic-cache configuration and internal-tool access-control metadata to the domain contract.
Added request-scoped demo-user context and demo_user_id support in the chat API.
Exposed semantic_cache_enabled, demo_users, and default_demo_user_id through /api/domain-config.
Added a passenger selector to the frontend and trace labeling/styling for semantic-cache events.
Hardened Redis and LangGraph connection handling with shared connection settings and cleanup hooks.
Updated airline-support prompts, demo-path documentation, and generated policy/data fixtures for the new cache flows.
Added tests covering demo-user resolution, cache grouping, tool classification, filter construction, and cached-turn persistence.

Note

Medium Risk
Touches the core chat/SSE path, LangGraph thread identity, and Redis-backed cache/checkpointer lifecycle; misclassification could cache or reuse answers across the wrong cohort or for personalized prompts, though heuristics and provenance gates aim to prevent that.

Overview
Adds optional semantic caching to the Context Surfaces chat path and wires it into the airline-support demo so cache behavior shows up in the same SSE tool trace as MCP/internal calls.

The shared backend now supports domain-configured semantic cache settings, internal-tool access classes (public / group / non-cacheable), and demo_user_id on chat requests. Selected passengers resolve to request-scoped identity (not only .env), namespaced LangGraph thread IDs, and a cache group for cohort-scoped reads/writes. On fresh single-turn threads, eligible prompts get a RedisVL semantic lookup (public + optional group filters); hits short-circuit the agent and persist the turn into checkpoint state. After a full run, answers are written back only when tool provenance is safe—public (e.g. shared flight status, policy search) or group (tier context)—and skipped for booking/itinerary/profile-style or user-specific prompts.
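As a rough illustration of namespaced thread IDs, the sketch below prefixes the client-supplied thread ID with the selected passenger so that two passengers never share checkpoint state. The `user:thread` format and both helper names are assumptions; only the `{"configurable": {"thread_id": ...}}` config shape is the standard LangGraph checkpointer convention.

```python
def namespaced_thread_id(demo_user_id: str, thread_id: str) -> str:
    """Combine user and thread so checkpoint state is isolated per passenger."""
    if not demo_user_id or not thread_id:
        raise ValueError("demo_user_id and thread_id are both required")
    # Assumption: a simple "user:thread" join; the PR's real format may differ.
    return f"{demo_user_id}:{thread_id}"


def checkpoint_config(demo_user_id: str, thread_id: str) -> dict:
    """Build the config dict a LangGraph checkpointer keys its state on."""
    return {
        "configurable": {
            "thread_id": namespaced_thread_id(demo_user_id, thread_id)
        }
    }
```

Switching passengers in the UI while reusing the same client thread ID would then start from a fresh thread, which also keeps the "fresh single-turn thread" cache eligibility check meaningful per passenger.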

Airline-support is extended with multiple demo passengers (shared senator_en cohort vs others), richer profile/tier fields, get_current_service_tier_context, updated prompts/paths/docs, and UI passenger selector plus trace styling for cache events. Docs/deps add sentence-transformers, Redis pool tuning, and broad tests for cache classification, filtering, and stream behavior.

Reviewed by Cursor Bugbot for commit 425195a. Bugbot is set up for automated code reviews on this repo.

jit-ci · 2026-05-06T17:12:01Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

Security scan by Jit

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


cursor · 2026-05-26T16:42:50Z

+            return
+        _background_resources_cleaned = True
+
+    semantic_cache_service.close()


Cleanup flag set before actual resource cleanup completes

Low Severity

In _cleanup_process_resources, the _background_resources_cleaned flag is set to True inside the lock, but semantic_cache_service.close() is called after the lock is released. This means a concurrent caller (e.g., the atexit handler racing with shutdown_resources) could observe the flag as True and return early, even though close() hasn't executed yet. The close() call belongs inside the with _cleanup_lock: block so the guarded flag accurately reflects whether cleanup has actually finished.


vishal-bala added 6 commits May 5, 2026 09:55

feat(backend): add shared semantic cache runtime

cbc9d1b

feat(frontend): expose semantic cache gate state

1bb4d1f

feat(backend): harden semantic cache runtime

a4facc3

feat(frontend): refine passenger selector styling

1e0465b

feat(airline-support): add passenger semantic cache demo

229d651

docs(airline-support): refresh demo paths

19c01d0

vishal-bala self-assigned this May 6, 2026

vishal-bala added 4 commits May 8, 2026 22:45

fix: harden semantic cache scoping

01e0fae

fix: fail open on semantic cache errors

e14c7a2

fix: harden airline semantic cache flows

9f70247

merge: resolve domain/airline-support conflicts

504bb19

vishal-bala marked this pull request as ready for review May 15, 2026 13:39

cursor Bot reviewed May 15, 2026


Comment thread backend/app/main.py

Comment thread backend/app/semantic_cache.py Outdated

fix: address semantic cache review comments

2776b78

cursor Bot reviewed May 26, 2026


Comment thread backend/app/main.py Outdated

fix: normalize cache guard apostrophes

425195a

cursor Bot reviewed May 26, 2026

vishal-bala wants to merge 12 commits into domain/airline-support from feat/semantic-caching
