Problem Statement
GuideLLM currently lacks support for the Mooncake trace format, which provides real datasets with synthetic tokens and time-based request rates. As a result, users cannot use Mooncake traces to simulate realistic customer workloads.
Mooncake traces are becoming an industry standard for KV cache testing and LLM inference benchmarking. Without native support for this trace format, GuideLLM users cannot:
- Replay production-representative workloads that use Mooncake's real dataset with synthetic token distributions
- Leverage Mooncake's time-based request rate patterns for realistic load generation
- Benchmark and demonstrate performance against industry-standard KV cache testing methodologies
This gap directly affects the ability to evaluate and compare LLM deployment performance against widely adopted industry benchmarks.
Proposed Solution
Add native Mooncake trace format support to GuideLLM's data ingestion pipeline. This would involve:
- Trace Parser: Implement a Mooncake trace parser that can read and interpret Mooncake's trace file format, extracting request timestamps, token counts (input/output), and request metadata.
- Data Source Integration: Add a new --data source type (e.g., --data mooncake:<path_to_trace>) that allows users to load Mooncake trace files directly as workload definitions.
- Time-Based Rate Replay: Support Mooncake's time-based request rate patterns so that GuideLLM can replay requests at the exact inter-arrival times specified in the trace, enabling faithful production workload reproduction.
- Synthetic Token Mapping: Map Mooncake's synthetic token specifications to GuideLLM's internal token representation, ensuring that input/output token distributions from the trace are accurately reflected in the benchmark requests.
- Documentation: Add documentation and examples showing how to use Mooncake traces with GuideLLM for KV cache performance testing.
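A minimal sketch of the Trace Parser and Data Source Integration items, written in Python since GuideLLM is a Python project. It assumes the JSONL schema of the traces published in the Mooncake repository: one JSON object per line with timestamp (arrival time, assumed to be milliseconds), input_length, output_length, and hash_ids (KV cache block identifiers). The mooncake: prefix is the syntax proposed in this issue; MooncakeRequest, parse_data_spec, parse_mooncake_lines, and load_mooncake_trace are hypothetical names, not existing GuideLLM APIs.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterable

# Proposed --data prefix from this issue; not an existing GuideLLM option.
MOONCAKE_PREFIX = "mooncake:"


@dataclass
class MooncakeRequest:
    """One trace record (hypothetical type; field names assume the Mooncake JSONL schema)."""

    timestamp_ms: int  # arrival time, assumed to be milliseconds from trace start
    input_length: int  # number of prompt tokens
    output_length: int  # number of tokens to generate
    hash_ids: list[int] = field(default_factory=list)  # KV cache block identifiers


def parse_data_spec(spec: str) -> Path | None:
    """Return the trace path for a 'mooncake:<path>' spec, or None for other --data values."""
    if spec.startswith(MOONCAKE_PREFIX):
        return Path(spec[len(MOONCAKE_PREFIX):])
    return None


def parse_mooncake_lines(lines: Iterable[str]) -> list[MooncakeRequest]:
    """Parse JSONL lines into records sorted by arrival time."""
    requests: list[MooncakeRequest] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            rec = json.loads(line)
            requests.append(
                MooncakeRequest(
                    timestamp_ms=int(rec["timestamp"]),
                    input_length=int(rec["input_length"]),
                    output_length=int(rec["output_length"]),
                    hash_ids=[int(h) for h in rec.get("hash_ids", [])],
                )
            )
        except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
            # Report the offending line instead of failing deep inside the benchmark.
            raise ValueError(f"line {line_no}: invalid Mooncake record: {exc}") from exc
    requests.sort(key=lambda r: r.timestamp_ms)
    return requests


def load_mooncake_trace(path: Path) -> list[MooncakeRequest]:
    """Read a trace file from disk and parse it."""
    with path.open(encoding="utf-8") as f:
        return parse_mooncake_lines(f)
```

Sorting by timestamp lets a later replay stage assume monotonic arrivals even if a trace file is not strictly ordered.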
Alternatives Considered
Manual trace conversion: Users could manually convert Mooncake trace files into GuideLLM's existing supported formats (e.g., custom JSON or HuggingFace datasets). However, this is error-prone, loses the time-based request rate information that is central to Mooncake's value, and creates a significant barrier for users who want to quickly benchmark with industry-standard traces.
External preprocessing scripts: A standalone script could preprocess Mooncake traces into a compatible format. This adds toolchain complexity and maintenance burden, and still loses fidelity in the time-based scheduling that Mooncake traces provide.
Using other benchmarking tools: Other LLM benchmarking tools may already support Mooncake traces, but they lack GuideLLM's evaluation capabilities, such as sweep-based rate testing and rich reporting.
Usage Examples
# Run GuideLLM benchmark using a Mooncake trace file
guidellm benchmark run --target http://localhost:8000/v1 --data mooncake:./traces/mooncake_trace.jsonl
# Use Mooncake trace with time-based request replay
guidellm benchmark run --target http://localhost:8000/v1 --data mooncake:./traces/mooncake_trace.jsonl --rate-type trace
# Combine Mooncake trace with sweep mode to find saturation point
guidellm benchmark run --target http://localhost:8000/v1 --data mooncake:./traces/mooncake_trace.jsonl --rate-type sweep
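The --rate-type trace example implies an open-loop scheduler: each request is sent at its recorded offset from the start of the trace, whether or not earlier requests have completed. The sketch below shows one way to do that with asyncio, assuming trace timestamps in milliseconds. arrival_offsets, replay, the send callback, and the time_scale knob (for compressing or stretching a trace) are hypothetical illustrations, not GuideLLM APIs.

```python
from __future__ import annotations

import asyncio
from typing import Any, Callable, Coroutine, Sequence


def arrival_offsets(timestamps_ms: Sequence[int], time_scale: float = 1.0) -> list[float]:
    """Convert trace timestamps (assumed ms) into send offsets in seconds from the first request.

    time_scale is a hypothetical knob: 0.5 replays the trace twice as fast, 2.0 half as fast.
    """
    if not timestamps_ms:
        return []
    t0 = min(timestamps_ms)
    return [(t - t0) / 1000.0 * time_scale for t in timestamps_ms]


async def replay(
    timestamps_ms: Sequence[int],
    send: Callable[[int], Coroutine[Any, Any, None]],
    time_scale: float = 1.0,
) -> None:
    """Open-loop replay: start send(i) at request i's offset without waiting for earlier responses."""
    offsets = arrival_offsets(timestamps_ms, time_scale)
    loop = asyncio.get_running_loop()
    start = loop.time()
    tasks: list[asyncio.Task[None]] = []
    for i, offset in sorted(enumerate(offsets), key=lambda item: item[1]):
        # Sleep until the absolute target time so per-request delays do not accumulate drift.
        delay = start + offset - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(send(i)))
    await asyncio.gather(*tasks)
```

In a real benchmark, send would issue the HTTP request for trace record i. Sleeping until start + offset, rather than for each inter-arrival gap, keeps scheduling error from growing over long traces.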
Additional Context
Background on Mooncake Traces:
Mooncake is an open-source LLM serving platform that has published production trace datasets. These traces provide real dataset characteristics with synthetic tokens and time-based request rates, making them valuable for realistic workload simulation and KV cache performance testing.
Industry Adoption:
Mooncake traces are increasingly being adopted as a standard benchmark format for evaluating LLM inference performance, particularly for KV cache optimization and disaggregated prefill/decode architectures. Adding support would allow GuideLLM to be used in direct comparisons with results from other tools in the ecosystem.
Related Work:
- Mooncake project: https://github.com/kvcache-ai/Mooncake
- The trace format includes fields for request arrival times, input/output token counts, and request metadata that map well to GuideLLM's existing data pipeline concepts.
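For Synthetic Token Mapping, the hash_ids field matters as much as the token counts: requests that share leading hash IDs share a prompt prefix, which is what exercises the KV cache. The sketch below assumes each hash ID stands for a 512-token block (the block size described for the published Mooncake traces) and that each word in a small stand-in vocabulary encodes to one token; a real implementation would generate text against the target model's tokenizer. Seeding a generator with the hash ID gives identical IDs identical text. BLOCK_SIZE, VOCAB, block_tokens, synthesize_prompt_tokens, and synthesize_prompt are hypothetical names.

```python
from __future__ import annotations

import random

# Assumed tokens per hash_id block in the Mooncake traces.
BLOCK_SIZE = 512

# Stand-in vocabulary; each word is assumed (not guaranteed) to encode to a single token.
VOCAB = [
    "apple", "river", "stone", "cloud", "green", "table", "light", "music",
    "paper", "water", "house", "train", "field", "glass", "chair", "bread",
]


def block_tokens(hash_id: int, n: int) -> list[str]:
    """Deterministic pseudo-tokens for one block: equal hash_ids yield equal text."""
    rng = random.Random(hash_id)
    return [rng.choice(VOCAB) for _ in range(n)]


def synthesize_prompt_tokens(
    hash_ids: list[int], input_length: int, block_size: int = BLOCK_SIZE
) -> list[str]:
    """Build exactly input_length pseudo-tokens, one block per hash_id.

    The last block is usually partial; because each block comes from a seeded
    stream, a partial block is a prefix of the full block with the same ID.
    """
    tokens: list[str] = []
    for hash_id in hash_ids:
        remaining = input_length - len(tokens)
        if remaining <= 0:
            break
        tokens.extend(block_tokens(hash_id, min(block_size, remaining)))
    # If hash_ids cover fewer tokens than input_length, pad with filler from an
    # unseeded generator so the padding is unlikely to match any other request.
    remaining = input_length - len(tokens)
    if remaining > 0:
        rng = random.Random()
        tokens.extend(rng.choice(VOCAB) for _ in range(remaining))
    return tokens


def synthesize_prompt(hash_ids: list[int], input_length: int) -> str:
    """Join pseudo-tokens into a prompt string."""
    return " ".join(synthesize_prompt_tokens(hash_ids, input_length))
```

Under the one-word-one-token assumption, this mapping carries the trace's prefix-sharing pattern into the benchmark requests, so server-side cache behavior should follow the trace rather than the prompt generator.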