
Commit fa884a5

Merge pull request #8 from huggingface/make-backend-public
Make backend public
2 parents 9aabcf9 + 86dcef2 commit fa884a5

16 files changed: +108 −79 lines

.github/workflows/benchmark_cpu_onnxruntime.yaml

Lines changed: 1 addition & 4 deletions

@@ -3,10 +3,7 @@ name: Benchmark CPU Onnxruntime
 on:
   workflow_dispatch:
   schedule:
-    - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
+    - cron: "0 12 * * *"
   pull_request:
 
 concurrency:
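The schedule change above moves the nightly trigger from midnight to noon UTC (and drops the push trigger). As a rough illustration only (this helper is not part of the repo), a standard 5-field cron expression can be split into named fields to make such schedule diffs easier to read:

```python
# Hypothetical helper (not in this repo): split a standard 5-field cron
# expression into named fields for side-by-side comparison.
FIELDS = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def parse_cron(expr: str) -> dict:
    values = expr.split()
    if len(values) != 5:
        raise ValueError("expected a standard 5-field cron expression")
    return dict(zip(FIELDS, values))

old = parse_cron("0 0 * * *")   # removed schedule: daily at 00:00 UTC
new = parse_cron("0 12 * * *")  # new schedule: daily at 12:00 UTC
print(old["hour"], new["hour"])
```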

.github/workflows/benchmark_cpu_openvino.yaml

Lines changed: 0 additions & 3 deletions

@@ -4,9 +4,6 @@ on:
   workflow_dispatch:
   schedule:
     - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
   pull_request:
 
 concurrency:

.github/workflows/benchmark_cpu_pytorch.yaml

Lines changed: 0 additions & 3 deletions

@@ -4,9 +4,6 @@ on:
   workflow_dispatch:
   schedule:
     - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
   pull_request:
 
 concurrency:

.github/workflows/benchmark_cuda_pytorch.yaml

Lines changed: 2 additions & 5 deletions

@@ -3,10 +3,7 @@ name: Benchmark CUDA PyTorch
 on:
   workflow_dispatch:
   schedule:
-    - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
+    - cron: "0 3 * * *"
   pull_request:
 
 concurrency:
@@ -33,7 +30,7 @@ jobs:
     strategy:
       fail-fast: false
      matrix:
-        subset: [unquantized, bnb, awq, gptq]
+        subset: [torchao]
 
        machine:
          [

.github/workflows/update_llm_perf_leaderboard.yaml

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@ name: Update LLM Perf Leaderboard
 on:
   workflow_dispatch:
   schedule:
-    - cron: "0 */6 * * *"
+    - cron: "0 0 * * *"
 push:
   branches:
     - main

.gitignore

Lines changed: 3 additions & 1 deletion

@@ -187,4 +187,6 @@ outputs/
 wip/
 
 *.csv
-optimum-benchmark/
+optimum-benchmark/
+
+*.egg-info/

Makefile

Lines changed: 7 additions & 11 deletions

@@ -1,5 +1,5 @@
 # Style and Quality checks
-.PHONY: style quality
+.PHONY: style quality install install-dev run_cpu_container run_cuda_container run_rocm_container cpu-pytorch-container cpu-openvino-container collector-container
 
 quality:
 	ruff check .
@@ -9,17 +9,13 @@ style:
 	ruff format .
 	ruff check --fix .
 
-.PHONY: install
-
 install:
 	pip install .
 
 install-dev:
 	DEBUG=1 uv pip install -e .
 
-# Running containers
-.PHONY: run_cpu_container run_cuda_container run_rocm_container
-
+# Running optimum-benchmark containers
 run_cpu_container:
 	docker run -it --rm --pid host --volume .:/llm-perf-backend --workdir /llm-perf-backend ghcr.io/huggingface/optimum-benchmark:latest-cpu
 
@@ -29,15 +25,15 @@ run_cuda_container:
 run_rocm_container:
 	docker run -it --rm --shm-size 64G --device /dev/kfd --device /dev/dri --volume .:/llm-perf-backend --workdir /llm-perf-backend ghcr.io/huggingface/optimum-benchmark:latest-rocm
 
+# Running llm-perf backend containers
 cpu-pytorch-container:
 	docker build -t cpu-pytorch -f docker/cpu-pytorch/Dockerfile .
-	# docker run -it --rm --pid host cpu-pytorch /bin/bash
 	docker run -it --rm --pid host cpu-pytorch
 
-collector-container:
-	docker build -t collector -f docker/collector/Dockerfile .
-	docker run -it --rm --pid host collector
-
 cpu-openvino-container:
 	docker build -t cpu-openvino -f docker/cpu-openvino/Dockerfile .
 	docker run -it --rm --pid host cpu-openvino
+
+collector-container:
+	docker build -t collector -f docker/collector/Dockerfile .
+	docker run -it --rm --pid host collector

README.md

Lines changed: 73 additions & 10 deletions

@@ -1,15 +1,78 @@
-# llm-perf-backend
-The backend of [the LLM-perf leaderboard](https://huggingface.co/spaces/optimum/llm-perf-leaderboard)
+# LLM-perf Backend 🏋️
 
-## Why
-this runs all the benchmarks to get results for the leaderboard
+The official backend system powering the [LLM-perf Leaderboard](https://huggingface.co/spaces/optimum/llm-perf-leaderboard). This repository contains the infrastructure and tools needed to run standardized benchmarks for Large Language Models (LLMs) across different hardware configurations and optimization backends.
 
-## How to install
-git clone
-pip install -e .[openvino]
+## About 📝
 
-## How to use the cli
-llm-perf run-benchmark --hardware cpu --backend openvino
+LLM-perf Backend is designed to:
+- Run automated benchmarks for the LLM-perf leaderboard
+- Ensure consistent and reproducible performance measurements
+- Support multiple hardware configurations and optimization backends
+- Generate standardized performance metrics for latency, throughput, memory usage, and energy consumption
+
+## Key Features 🔑
+
+- Standardized benchmarking pipeline using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark)
+- Support for multiple hardware configurations (CPU, GPU)
+- Multiple backend implementations (PyTorch, Onnxruntime, etc.)
+- Automated metric collection:
+  - Latency and throughput measurements
+  - Memory usage tracking
+  - Energy consumption monitoring
+  - Quality metrics integration with Open LLM Leaderboard
+
+## Installation 🛠️
+
+1. Clone the repository:
+```bash
+git clone https://github.com/huggingface/llm-perf-backend
+cd llm-perf-backend
+```
+
+2. Create a Python environment:
+```bash
+python -m venv .venv
+source .venv/bin/activate
+```
+
+3. Install the package with required dependencies:
+```bash
+pip install -e "."
+# or
+pip install -e ".[all]"  # to install optional dependencies like Onnxruntime
+```
+
+## Usage 📋
+
+### Command Line Interface
+
+Run benchmarks using the CLI tool:
+
+```bash
 llm-perf run-benchmark --hardware cpu --backend pytorch
+```
+
+### Configuration Options
+
+View all the options with:
+```bash
+llm-perf run-benchmark --help
+```
+
+- `--hardware`: Target hardware platform (cpu, cuda)
+- `--backend`: Backend framework to use (pytorch, onnxruntime, etc.)
+
+## Benchmark Dataset 📊
+
+Results are published to the official dataset:
+[optimum-benchmark/llm-perf-leaderboard](https://huggingface.co/datasets/optimum-benchmark/llm-perf-leaderboard)
+
+## Benchmark Specifications 📑
 
-https://huggingface.co/datasets/optimum-benchmark/llm-perf-leaderboard
+All benchmarks follow these standardized settings:
+- Single GPU usage to avoid communication-dependent results
+- Energy monitoring via CodeCarbon
+- Memory tracking:
+  - Maximum allocated memory
+  - Maximum reserved memory
+  - Maximum used memory (via PyNVML for GPU)
llm_perf/benchmark_runners/cuda/update_llm_perf_cuda_pytorch.py

Lines changed: 11 additions & 0 deletions

@@ -191,6 +191,17 @@ def _get_weights_configs(self, subset) -> Dict[str, Dict[str, Any]]:
                 },
             },
         }
+        elif subset == "torchao":
+            return {
+                "torchao-int4wo-128": {
+                    "torch_dtype": "bfloat16",
+                    "quant_scheme": "torchao",
+                    "quant_config": {
+                        "quant_type": "int4_weight_only",
+                        "group_size": 128,
+                    },
+                },
+            }
         else:
             raise ValueError(f"Unknown subset: {subset}")
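Taken on its own, the new branch maps the `torchao` subset to a single named weight configuration. A minimal standalone sketch of that dispatch (simplified from `_get_weights_configs` above; the other subsets are elided here):

```python
from typing import Any, Dict

def get_weights_configs(subset: str) -> Dict[str, Dict[str, Any]]:
    # Simplified sketch: only the torchao branch added in this commit is
    # shown; the unquantized/bnb/awq/gptq branches are elided.
    if subset == "torchao":
        return {
            "torchao-int4wo-128": {
                "torch_dtype": "bfloat16",
                "quant_scheme": "torchao",
                "quant_config": {
                    "quant_type": "int4_weight_only",
                    "group_size": 128,
                },
            },
        }
    raise ValueError(f"Unknown subset: {subset}")
```

The config name encodes the scheme: int4 weight-only quantization with a group size of 128.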

llm_perf/hardware.yaml

Lines changed: 3 additions & 0 deletions

@@ -5,6 +5,7 @@
   - awq
   - bnb
   - gptq
+  - torchao
   backends:
   - pytorch
 
@@ -15,6 +16,7 @@
   - awq
   - bnb
   - gptq
+  - torchao
   backends:
   - pytorch
 
@@ -25,6 +27,7 @@
   - awq
   - bnb
   - gptq
+  - torchao
   backends:
   - pytorch
llm_perf/update_llm_perf_leaderboard.py

Lines changed: 6 additions & 7 deletions

@@ -4,7 +4,6 @@
 import pandas as pd
 from huggingface_hub import create_repo, snapshot_download, upload_file, repo_exists
 from optimum_benchmark import Benchmark
-import requests
 import json
 
 from llm_perf.common.hardware_config import load_hardware_configs
@@ -19,6 +18,7 @@
 PERF_DF = "perf-df-{backend}-{hardware}-{subset}-{machine}.csv"
 LLM_DF = "llm-df.csv"
 
+
 def patch_json(file):
     """
     Patch a JSON file by adding a 'stdev_' key with the same value as 'stdev' for all occurrences,
@@ -37,7 +37,7 @@ def patch_json(file):
     """
     with open(file, "r") as f:
         data = json.load(f)
-    
+
     def add_stdev_(obj):
         if isinstance(obj, dict):
             new_items = []
@@ -53,10 +53,11 @@ def add_stdev_(obj):
                 add_stdev_(item)
 
     add_stdev_(data)
-    
+
     with open(file, "w") as f:
         json.dump(data, f, indent=4)
 
+
 def gather_benchmarks(subset: str, machine: str, backend: str, hardware: str):
     """
     Gather the benchmarks for a given machine
@@ -99,7 +100,6 @@ def gather_benchmarks(subset: str, machine: str, backend: str, hardware: str):
     # return response.status_code == 200
 
 
-
 def update_perf_dfs():
     """
     Update the performance dataframes for all machines
@@ -116,19 +116,18 @@ def update_perf_dfs():
                 backend,
                 hardware_config.hardware,
             )
-        except Exception as e:
+        except Exception:
             print("Dataset not found for:")
             print(f"  • Backend: {backend}")
             print(f"  • Subset: {subset}")
             print(f"  • Machine: {hardware_config.machine}")
             print(f"  • Hardware Type: {hardware_config.hardware}")
             url = f"{PERF_REPO_ID.format(subset=subset, machine=hardware_config.machine, backend=backend, hardware=hardware_config.hardware)}"
-            
+
             does_exist = repo_exists(url, repo_type="dataset")
 
             if does_exist:
                 print(f"Dataset exists: {url} but could not be processed")
-
 
 
 scrapping_script = """
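Per its docstring, `patch_json` mirrors every `stdev` key as `stdev_` throughout a nested JSON document. A minimal self-contained sketch of that traversal (my own simplified version; the repo's `add_stdev_` helper may differ in its exact bookkeeping):

```python
import json

def add_stdev_(obj):
    # Recursively walk dicts and lists, mirroring every "stdev" key as
    # "stdev_" with the same value (sketch of patch_json's helper).
    if isinstance(obj, dict):
        for key in list(obj.keys()):  # snapshot keys before mutating
            if key == "stdev" and "stdev_" not in obj:
                obj["stdev_"] = obj[key]
            add_stdev_(obj[key])
    elif isinstance(obj, list):
        for item in obj:
            add_stdev_(item)

data = {"latency": {"stdev": 0.1, "values": [{"stdev": 0.2}]}}
add_stdev_(data)
print(json.dumps(data))
```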

optimum-benchmark

Lines changed: 0 additions & 1 deletion
This file was deleted.

pyproject.toml.bak

Lines changed: 0 additions & 29 deletions
This file was deleted.

setup.py

Lines changed: 1 addition & 0 deletions

@@ -32,6 +32,7 @@
         "auto-gptq",
         "bitsandbytes",
         "autoawq",
+        "torchao",
     ],
 }
test.py

Whitespace-only changes.

test.sh

Lines changed: 0 additions & 4 deletions
This file was deleted.
