
[Refactor] New memory layout for AIBrix KVCache #1174


Merged
DwyaneShi merged 7 commits into vllm-project:main from haiyang/kvcache-layout
Jun 26, 2025

Conversation

DwyaneShi
Collaborator

Pull Request Description

  • Legacy layout embedded tokens directly in the key, which could result in very long keys
    for cache blocks with long prefixes
  • New layout uses hash as the key and stores tokens as part of the value
  • L2Cache get operation now uses hash key and verifies token match after
    retrieving value
  • Enable new memory layout in vLLM v0.8.5
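The lookup flow described above (hash as key, tokens stored in the value, token verification after retrieval) might be sketched as follows. This is a minimal illustration, not the AIBrix implementation: `hash_tokens` and `HashKeyedStore` are hypothetical names, and `hashlib.blake2b` is only a stand-in for whatever hasher the project actually uses.

```python
import hashlib
from typing import Dict, Optional, Sequence, Tuple


def hash_tokens(prefix: Sequence[int], tokens: Sequence[int]) -> bytes:
    """Return a fixed-size key for a block, however long the prefix is.

    Hypothetical helper: blake2b stands in for the project's real hasher.
    """
    h = hashlib.blake2b(digest_size=16)
    for t in (*prefix, *tokens):
        h.update(int(t).to_bytes(8, "little", signed=True))
    return h.digest()


class HashKeyedStore:
    """Toy store: key = hash, value = (all tokens, KV payload)."""

    def __init__(self) -> None:
        self._data: Dict[bytes, Tuple[Tuple[int, ...], bytes]] = {}

    def put(self, prefix: Sequence[int], tokens: Sequence[int], kv: bytes) -> None:
        key = hash_tokens(prefix, tokens)
        # Tokens travel with the value so a later get can verify them.
        self._data[key] = (tuple(prefix) + tuple(tokens), kv)

    def get(self, prefix: Sequence[int], tokens: Sequence[int]) -> Optional[bytes]:
        entry = self._data.get(hash_tokens(prefix, tokens))
        if entry is None:
            return None
        stored_tokens, kv = entry
        # Verify after retrieval: a hash collision is treated as a miss.
        if stored_tokens != tuple(prefix) + tuple(tokens):
            return None
        return kv
```

The key stays 16 bytes regardless of prefix length, which is the point of the new layout; the cost is one extra token comparison on every hit.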

Related Issues

Resolves: #[Insert issue number(s)]


@DwyaneShi DwyaneShi requested a review from Jeffwan June 4, 2025 00:35
@Jeffwan Jeffwan requested a review from Copilot June 4, 2025 04:46
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

Refactors the cache key layout to use fixed‐size hash keys (via TokenCacheKey) and move token storage into the cache value, updates L1/L2 flows and eviction policies to operate on byte capacities, and removes the old raw/hex key builders.

  • Remove RawKeyBuilder/HexKeyBuilder in favor of hash‐based keys
  • Introduce TokenCacheKey and MemoryRegionCacheEntry for key/value handling
  • Update L1Cache, KVCacheManager, connectors, and eviction policies to use byte‐based capacities
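As a rough illustration of byte-based capacity, the sketch below shows an LRU policy that evicts until the new entry fits within a byte budget instead of counting entries. `ByteLRU` and its fields are hypothetical names for this example and do not mirror the classes touched by this PR.

```python
from collections import OrderedDict
from typing import Optional


class ByteLRU:
    """Toy LRU whose capacity is measured in bytes, not entry count."""

    def __init__(self, capacity_nbytes: int) -> None:
        self.capacity_nbytes = capacity_nbytes
        self.used_nbytes = 0
        self._entries: "OrderedDict[bytes, bytes]" = OrderedDict()

    def put(self, key: bytes, value: bytes) -> bool:
        if len(value) > self.capacity_nbytes:
            return False  # can never fit, so reject up front
        old = self._entries.pop(key, None)
        if old is not None:
            self.used_nbytes -= len(old)
        # Evict least recently used entries until the new value fits.
        while self.used_nbytes + len(value) > self.capacity_nbytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_nbytes -= len(evicted)
        self._entries[key] = value
        self.used_nbytes += len(value)
        return True

    def get(self, key: bytes) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value
```

Tracking bytes matters once values carry variable-size payloads (here, tokens plus KV data), because an entry count no longer bounds memory use.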

Reviewed Changes

Copilot reviewed 41 out of 41 changed files in this pull request and generated 1 comment.

File                                        Description
key_builders/raw_key_builder.py             Removed legacy raw key builder
key_builders/hex_key_builder.py             Removed legacy hex key builder
key_builders/key_builder.py                 Switched abstract builder to use np.ndarray and byte‐based keys
key_builders/hasher.py                      Added new FarmHasher
key_builders/__init__.py                    Export FarmHasher, drop raw/hex builders
connectors/mock.py                          Relaxed key type to Any, return actual feature object
connectors/infinistore.py                   Relaxed key type hints for new key format
connectors/hpkv.py                          Fixed use of desc.reg_buf in HPKV SGL
connectors/connector.py                     Relaxed key type hints to Any
l1/l1_cache.py                              Refactored to accept TokenCacheKey, byte capacity, batch sizes
l1/eviction_policy/s3fifo.py                Updated S3FIFO to track bytes, use MemoryRegionCacheEntry
l1/eviction_policy/lru.py                   Updated LRU to track bytes, use MemoryRegionCacheEntry
l1/eviction_policy/fifo.py                  Updated FIFO to track bytes, use MemoryRegionCacheEntry
l1/eviction_policy/base_eviction_policy.py  Refactored base class to use byte capacity and hashed entries
envs.py                                     Removed deprecated evict‐size setting
config.py                                   Added ModelSpec field to KVCacheConfig
cache_manager.py                            Overhauled to use TokenCacheKey, allocate_for, new get/put
cache_hashable.py                           New: Defines TokenCacheKey and MemoryRegionCacheEntry
__init__.py                                 Export TokenCacheKey

@DwyaneShi DwyaneShi force-pushed the haiyang/kvcache-layout branch from bf4cc21 to b1efd56 Compare June 14, 2025 05:56
@DwyaneShi DwyaneShi force-pushed the haiyang/kvcache-layout branch 2 times, most recently from bb2929c to 6d76d82 Compare June 25, 2025 17:12
Haiyang Shi added 7 commits June 26, 2025 09:11
- Legacy layout embedded tokens directly in the key, which could result in very long keys
  for cache blocks with long prefixes
- New layout uses hash as the key and stores tokens as part of the value
- L2Cache get operation now uses hash key and verifies token match after
  retrieving value

Signed-off-by: Haiyang Shi <[email protected]>
Signed-off-by: Haiyang Shi <[email protected]>
Signed-off-by: Haiyang Shi <[email protected]>
@Jeffwan Jeffwan force-pushed the haiyang/kvcache-layout branch from 6d76d82 to bdaac12 Compare June 26, 2025 01:11
Collaborator

@Jeffwan Jeffwan left a comment


Let's merge this one first to unblock future improvements. I will leave my comments today, and we can address them in future PRs.

@DwyaneShi DwyaneShi merged commit fa39b73 into vllm-project:main Jun 26, 2025
12 checks passed
Collaborator

Jeffwan commented Jun 26, 2025

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request refactors the AIBrix KVCache to use a new memory layout, where a hash is used as the key and tokens are stored as part of the value. This change aims to improve efficiency by avoiding very long keys for cache blocks with long prefixes in the legacy layout. The L2Cache get operation is updated to use the hash key and verify token matches. The new memory layout is enabled in vLLM v0.8.5. The changes involve updates to cache management, eviction policies, and key builders to support the new hash-based keying and memory region handling.
