The official backend system powering the LLM-perf Leaderboard. This repository contains the infrastructure and tools needed to run standardized benchmarks for Large Language Models (LLMs) across different hardware configurations and optimization backends.
LLM-perf Backend is designed to:
- Run automated benchmarks for the LLM-perf leaderboard
- Ensure consistent and reproducible performance measurements
- Support multiple hardware configurations and optimization backends
- Generate standardized performance metrics for latency, throughput, memory usage, and energy consumption
- Standardized benchmarking pipeline using Optimum-Benchmark (see the sketch after this list)
- Support for multiple hardware configurations (CPU, GPU)
- Multiple backend implementations (PyTorch, ONNX Runtime, etc.)
- Automated metric collection:
  - Latency and throughput measurements
  - Memory usage tracking
  - Energy consumption monitoring
- Quality metrics integration with the Open LLM Leaderboard
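Under the hood, each benchmark run is expressed as an Optimum-Benchmark configuration that pairs a launcher, a scenario, and a backend. The snippet below is a minimal sketch based on Optimum-Benchmark's Python API; the model, device, and scenario flags are illustrative placeholders and may not match the exact settings llm-perf-backend uses.

```python
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)

if __name__ == "__main__":
    # Launcher: run the benchmark in an isolated process.
    launcher_config = ProcessConfig()
    # Scenario: inference benchmark tracking latency, memory, and energy.
    scenario_config = InferenceConfig(latency=True, memory=True, energy=True)
    # Backend: PyTorch on CPU; no_weights uses randomly initialized weights
    # so the example does not need to download a checkpoint.
    backend_config = PyTorchConfig(model="gpt2", device="cpu", no_weights=True)

    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        launcher=launcher_config,
        scenario=scenario_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)
    benchmark_report.log()  # print the latency/memory/energy sections of the report
```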
- Clone the repository:

```bash
git clone https://github.com/huggingface/llm-perf-backend
cd llm-perf-backend
```

- Create and activate a Python virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate
```

- Install the package with the required dependencies:

```bash
pip install -e "."
# or
pip install -e ".[all]"  # to install optional dependencies like ONNX Runtime
```
Run benchmarks using the CLI tool:

```bash
llm-perf run-benchmark --hardware cpu --backend pytorch
```

View all the options with:

```bash
llm-perf run-benchmark --help
```
- `--hardware`: Target hardware platform (cpu, cuda)
- `--backend`: Backend framework to use (pytorch, onnxruntime, etc.)
Results are published to the official dataset: optimum-benchmark/llm-perf-leaderboard
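To inspect published results locally, they can typically be pulled with the `datasets` library. This is only a sketch: it assumes the dataset id above resolves directly on the Hugging Face Hub, and the actual repository may split results into several per-hardware/per-backend configurations with a different layout.

```python
from datasets import load_dataset

# Hypothetical usage: the dataset id comes from the section above, but the
# available configurations and splits may differ from this sketch.
results = load_dataset("optimum-benchmark/llm-perf-leaderboard", split="train")
print(results.column_names)
print(results[0])
```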
All benchmarks follow these standardized settings:
- Single GPU usage to avoid communication-dependent results
- Energy monitoring via CodeCarbon
- Memory tracking (see the sketch after this list):
  - Maximum allocated memory
  - Maximum reserved memory
  - Maximum used memory (via PyNVML for GPU)
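As a point of reference, the snippet below is an illustrative sketch of how energy and GPU memory readings of this kind can be collected with CodeCarbon, PyTorch, and PyNVML. It is not the repository's actual tracking code (Optimum-Benchmark handles this internally), and `run_workload` is a placeholder for the benchmarked inference.

```python
import pynvml
import torch
from codecarbon import EmissionsTracker


def run_workload() -> None:
    # Placeholder for the benchmarked inference workload.
    pass


# Energy monitoring with CodeCarbon.
tracker = EmissionsTracker(measure_power_secs=1)
tracker.start()
run_workload()
emissions_kg = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(f"Estimated emissions: {emissions_kg} kg CO2eq")

if torch.cuda.is_available():
    # Maximum allocated / reserved memory as seen by the PyTorch allocator.
    print(f"Max allocated: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")
    print(f"Max reserved:  {torch.cuda.max_memory_reserved() / 1024**2:.0f} MiB")

    # Device-level used memory via PyNVML (a single reading; in practice it is
    # sampled repeatedly during the run to obtain the maximum).
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"Device memory used: {mem_info.used / 1024**2:.0f} MiB")
    pynvml.nvmlShutdown()
```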