[Refactor] New memory layout for AIBrix KVCache #1174

DwyaneShi · 2025-06-04T00:35:00Z

Pull Request Description

- Legacy layout embedded tokens directly in the key, which could result in very long keys for cache blocks with long prefixes
- New layout uses hash as the key and stores tokens as part of the value
- L2Cache get operation now uses hash key and verifies token match after retrieving value
- Enable new memory layout in vLLM v0.8.5
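
To make the key/value split described above concrete, here is a minimal sketch, not the code from this PR. It assumes a plain in-memory dict in place of the real L2 backend and uses `hashlib.blake2b` from the standard library instead of the `FarmHasher` the PR adds. The names `hash_key` and `HashKeyedStore` are hypothetical. The point it illustrates: the key stays a fixed size however long the prefix is, and `get` compares the stored tokens with the requested ones so that a hash collision is treated as a miss.

```python
import hashlib
from typing import Dict, Optional, Sequence, Tuple


def hash_key(prefix: Sequence[int], tokens: Sequence[int]) -> bytes:
    """Derive a fixed-size key from the prefix tokens plus the block tokens.

    Assumption: token ids are non-negative and fit in 32 bits.
    """
    h = hashlib.blake2b(digest_size=16)
    for tok in (*prefix, *tokens):
        h.update(tok.to_bytes(4, "little"))
    return h.digest()


class HashKeyedStore:
    """Hypothetical stand-in for an L2 store: hash as key, tokens in the value."""

    def __init__(self) -> None:
        # key -> (all tokens covered by the entry, opaque KV payload)
        self._data: Dict[bytes, Tuple[Tuple[int, ...], bytes]] = {}

    def put(self, prefix: Sequence[int], tokens: Sequence[int], kv: bytes) -> None:
        key = hash_key(prefix, tokens)
        self._data[key] = (tuple(prefix) + tuple(tokens), kv)

    def get(self, prefix: Sequence[int], tokens: Sequence[int]) -> Optional[bytes]:
        entry = self._data.get(hash_key(prefix, tokens))
        if entry is None:
            return None
        stored_tokens, kv = entry
        # Verify after retrieval: a matching hash alone is not proof of a hit.
        if stored_tokens != tuple(prefix) + tuple(tokens):
            return None
        return kv
```

With the legacy layout the key would grow with the prefix; here a 10,000-token prefix and a 4-token prefix both produce a 16-byte key.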

Related Issues

Resolves: #[Insert issue number(s)]

Copilot

Pull Request Overview

Refactors the cache key layout to use fixed‐size hash keys (via TokenCacheKey) and move token storage into the cache value, updates L1/L2 flows and eviction policies to operate on byte capacities, and removes the old raw/hex key builders.

Remove RawKeyBuilder/HexKeyBuilder in favor of hash‐based keys
Introduce TokenCacheKey and MemoryRegionCacheEntry for key/value handling
Update L1Cache, KVCacheManager, connectors, and eviction policies to use byte‐based capacities
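
As an illustration of what a byte-based capacity means for an eviction policy, the following is a minimal sketch of an LRU policy that tracks bytes rather than entry counts. It is not the PR's `lru.py`: the class name `ByteCapacityLRU` and its methods are hypothetical, and plain `bytes` values stand in for `MemoryRegionCacheEntry`.

```python
from collections import OrderedDict
from typing import Optional


class ByteCapacityLRU:
    """Hypothetical LRU policy whose capacity is expressed in bytes."""

    def __init__(self, capacity_nbytes: int) -> None:
        self.capacity_nbytes = capacity_nbytes
        self.used_nbytes = 0
        # Ordered from least recently used (front) to most recently used (back).
        self._entries: "OrderedDict[bytes, bytes]" = OrderedDict()

    def put(self, key: bytes, value: bytes) -> bool:
        if len(value) > self.capacity_nbytes:
            return False  # can never fit, even in an empty cache
        old = self._entries.pop(key, None)
        if old is not None:
            self.used_nbytes -= len(old)
        # Evict least recently used entries until the new value fits.
        while self.used_nbytes + len(value) > self.capacity_nbytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_nbytes -= len(evicted)
        self._entries[key] = value
        self.used_nbytes += len(value)
        return True

    def get(self, key: bytes) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value
```

Because entries can differ in size, a single `put` may evict several small entries or none at all, which a count-based capacity cannot express.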

Reviewed Changes

Copilot reviewed 41 out of 41 changed files in this pull request and generated 1 comment.

File	Description
key_builders/raw_key_builder.py	Removed legacy raw key builder
key_builders/hex_key_builder.py	Removed legacy hex key builder
key_builders/key_builder.py	Switched abstract builder to use `np.ndarray` and byte‐based keys
key_builders/hasher.py	Added new `FarmHasher`
key_builders/__init__.py	Export `FarmHasher`, drop raw/hex builders
connectors/mock.py	Relaxed key type to `Any`, return actual feature object
connectors/infinistore.py	Relaxed key type hints for new key format
connectors/hpkv.py	Fixed use of `desc.reg_buf` in HPKV SGL
connectors/connector.py	Relaxed key type hints to `Any`
l1/l1_cache.py	Refactored to accept `TokenCacheKey`, byte capacity, batch sizes
l1/eviction_policy/s3fifo.py	Updated S3FIFO to track bytes, use `MemoryRegionCacheEntry`
l1/eviction_policy/lru.py	Updated LRU to track bytes, use `MemoryRegionCacheEntry`
l1/eviction_policy/fifo.py	Updated FIFO to track bytes, use `MemoryRegionCacheEntry`
l1/eviction_policy/base_eviction_policy.py	Refactored base class to use byte capacity and hashed entries
envs.py	Removed deprecated evict‐size setting
config.py	Added `ModelSpec` field to KVCacheConfig
cache_manager.py	Overhauled to use `TokenCacheKey`, `allocate_for`, new get/put
cache_hashable.py	New: Defines `TokenCacheKey` and `MemoryRegionCacheEntry`
__init__.py	Export `TokenCacheKey`

python/aibrix_kvcache/aibrix_kvcache/cache_manager.py

- Legacy layout embedded tokens directly in the key, which could result in very long keys for cache blocks with long prefixes
- New layout uses hash as the key and stores tokens as part of the value
- L2Cache get operation now uses hash key and verifies token match after retrieving value

Signed-off-by: Haiyang Shi <[email protected]>

Jeffwan

let's merge this one first and unblock future improvements. I will leave my comments today and we can address in future PRs.

Jeffwan · 2025-06-26T13:29:03Z

/gemini review

gemini-code-assist

Code Review

The pull request refactors the AIBrix KVCache to use a new memory layout, where a hash is used as the key and tokens are stored as part of the value. This change aims to improve efficiency by avoiding very long keys for cache blocks with long prefixes in the legacy layout. The L2Cache get operation is updated to use the hash key and verify token matches. The new memory layout is enabled in vLLM v0.8.5. The changes involve updates to cache management, eviction policies, and key builders to support the new hash-based keying and memory region handling.

DwyaneShi requested a review from Jeffwan June 4, 2025 00:35

Jeffwan requested a review from Copilot June 4, 2025 04:46

Copilot AI reviewed Jun 4, 2025

DwyaneShi force-pushed the haiyang/kvcache-layout branch from bf4cc21 to b1efd56 June 14, 2025 05:56

DwyaneShi force-pushed the haiyang/kvcache-layout branch 2 times, most recently from bb2929c to 6d76d82 June 25, 2025 17:12

Haiyang Shi added 7 commits June 26, 2025 09:11

[Integration] Enable new memory layout in vLLM v0.8.5

cdab299

Signed-off-by: Haiyang Shi <[email protected]>

[Chore] Add hpkv dependency

c811fc9

Signed-off-by: Haiyang Shi <[email protected]>

[Fix] Fix typing errors with python3.11

b17f9c0

Signed-off-by: Haiyang Shi <[email protected]>

[Fix] Fix BaseKVCacheManager

50aaa4a

Signed-off-by: Haiyang Shi <[email protected]>

[Chore] Optimize L2Cache tokens comparison

c8d54f2

Signed-off-by: Haiyang Shi <[email protected]>

[Feature] KVCache layout: compact layout

bdaac12

Signed-off-by: Haiyang Shi <[email protected]>

Jeffwan force-pushed the haiyang/kvcache-layout branch from 6d76d82 to bdaac12 June 26, 2025 01:11

Jeffwan approved these changes Jun 26, 2025

DwyaneShi merged commit fa39b73 into vllm-project:main Jun 26, 2025
12 checks passed

gemini-code-assist bot reviewed Jun 26, 2025