Make backend public #8

Merged (10 commits) on Dec 13, 2024
Changes from all commits
5 changes: 1 addition & 4 deletions .github/workflows/benchmark_cpu_onnxruntime.yaml
@@ -3,10 +3,7 @@ name: Benchmark CPU Onnxruntime
 on:
   workflow_dispatch:
   schedule:
-    - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
+    - cron: "0 12 * * *"
   pull_request:

 concurrency:
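The schedules above use the standard five-field cron layout (minute, hour, day-of-month, month, day-of-week), which GitHub Actions evaluates in UTC — so `"0 12 * * *"` fires daily at 12:00 UTC. A small sketch of splitting an expression into named fields (the helper name is made up for illustration):

```python
# Split a five-field cron expression into named fields. GitHub Actions
# evaluates cron schedules in UTC, so "0 12 * * *" means daily at 12:00 UTC.
FIELD_NAMES = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def parse_cron(expr: str) -> dict:
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))

schedule = parse_cron("0 12 * * *")
print(schedule["hour"])  # "12"
```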
3 changes: 0 additions & 3 deletions .github/workflows/benchmark_cpu_openvino.yaml
@@ -4,9 +4,6 @@ on:
   workflow_dispatch:
   schedule:
     - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
   pull_request:

 concurrency:
3 changes: 0 additions & 3 deletions .github/workflows/benchmark_cpu_pytorch.yaml
@@ -4,9 +4,6 @@ on:
   workflow_dispatch:
   schedule:
     - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
   pull_request:

 concurrency:
7 changes: 2 additions & 5 deletions .github/workflows/benchmark_cuda_pytorch.yaml
@@ -3,10 +3,7 @@ name: Benchmark CUDA PyTorch
 on:
   workflow_dispatch:
   schedule:
-    - cron: "0 0 * * *"
-  push:
-    branches:
-      - '*'
+    - cron: "0 3 * * *"
   pull_request:

 concurrency:
@@ -33,7 +30,7 @@ jobs:
   strategy:
     fail-fast: false
     matrix:
-      subset: [unquantized, bnb, awq, gptq]
+      subset: [torchao]

       machine:
         [
2 changes: 1 addition & 1 deletion .github/workflows/update_llm_perf_leaderboard.yaml
@@ -3,7 +3,7 @@ name: Update LLM Perf Leaderboard
 on:
   workflow_dispatch:
   schedule:
-    - cron: "0 */6 * * *"
+    - cron: "0 0 * * *"
   push:
     branches:
       - main
4 changes: 3 additions & 1 deletion .gitignore
@@ -187,4 +187,6 @@ outputs/
 wip/

 *.csv
-optimum-benchmark/
+optimum-benchmark/
+
+*.egg-info/
18 changes: 7 additions & 11 deletions Makefile
@@ -1,5 +1,5 @@
# Style and Quality checks
.PHONY: style quality
.PHONY: style quality install install-dev run_cpu_container run_cuda_container run_rocm_container cpu-pytorch-container cpu-openvino-container collector-container

quality:
ruff check .
@@ -9,17 +9,13 @@ style:
ruff format .
ruff check --fix .

.PHONY: install

install:
pip install .

install-dev:
DEBUG=1 uv pip install -e .

# Running containers
.PHONY: run_cpu_container run_cuda_container run_rocm_container

# Running optimum-benchmark containers
run_cpu_container:
docker run -it --rm --pid host --volume .:/llm-perf-backend --workdir /llm-perf-backend ghcr.io/huggingface/optimum-benchmark:latest-cpu

@@ -29,15 +25,15 @@ run_cuda_container:
run_rocm_container:
docker run -it --rm --shm-size 64G --device /dev/kfd --device /dev/dri --volume .:/llm-perf-backend --workdir /llm-perf-backend ghcr.io/huggingface/optimum-benchmark:latest-rocm

# Running llm-perf backend containers
cpu-pytorch-container:
docker build -t cpu-pytorch -f docker/cpu-pytorch/Dockerfile .
# docker run -it --rm --pid host cpu-pytorch /bin/bash
docker run -it --rm --pid host cpu-pytorch

collector-container:
docker build -t collector -f docker/collector/Dockerfile .
docker run -it --rm --pid host collector

cpu-openvino-container:
docker build -t cpu-openvino -f docker/cpu-openvino/Dockerfile .
docker run -it --rm --pid host cpu-openvino

collector-container:
docker build -t collector -f docker/collector/Dockerfile .
docker run -it --rm --pid host collector
83 changes: 73 additions & 10 deletions README.md
@@ -1,15 +1,78 @@
# llm-perf-backend
The backend of [the LLM-perf leaderboard](https://huggingface.co/spaces/optimum/llm-perf-leaderboard)
# LLM-perf Backend 🏋️

## Why
this runs all the benchmarks to get results for the leaderboard
The official backend system powering the [LLM-perf Leaderboard](https://huggingface.co/spaces/optimum/llm-perf-leaderboard). This repository contains the infrastructure and tools needed to run standardized benchmarks for Large Language Models (LLMs) across different hardware configurations and optimization backends.

## How to install
git clone
pip install -e .[openvino]
## About 📝

## How to use the cli
llm-perf run-benchmark --hardware cpu --backend openvino
LLM-perf Backend is designed to:
- Run automated benchmarks for the LLM-perf leaderboard
- Ensure consistent and reproducible performance measurements
- Support multiple hardware configurations and optimization backends
- Generate standardized performance metrics for latency, throughput, memory usage, and energy consumption

## Key Features 🔑

- Standardized benchmarking pipeline using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark)
- Support for multiple hardware configurations (CPU, GPU)
- Multiple backend implementations (PyTorch, Onnxruntime, etc.)
- Automated metric collection:
- Latency and throughput measurements
- Memory usage tracking
- Energy consumption monitoring
- Quality metrics integration with Open LLM Leaderboard

## Installation 🛠️

1. Clone the repository:
```bash
git clone https://github.com/huggingface/llm-perf-backend
cd llm-perf-backend
```

2. Create a Python environment:
```bash
python -m venv .venv
source .venv/bin/activate
```

3. Install the package with required dependencies:
```bash
pip install -e "."
# or
pip install -e ".[all]"  # installs optional dependencies like Onnxruntime
```

## Usage 📋

### Command Line Interface

Run benchmarks using the CLI tool:

```bash
llm-perf run-benchmark --hardware cpu --backend pytorch
```

### Configuration Options

View all available options with:
```bash
llm-perf run-benchmark --help
```

- `--hardware`: Target hardware platform (cpu, cuda)
- `--backend`: Backend framework to use (pytorch, onnxruntime, etc.)
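The flags above could be wired up in several ways; as a hedged sketch (not the actual llm-perf implementation, and the choice lists are assumptions), a `run-benchmark` subcommand with `argparse` might look like:

```python
import argparse

# Hedged sketch of a "run-benchmark" style CLI. The real llm-perf entry
# point may be implemented differently; these choice lists are assumptions.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llm-perf")
    subcommands = parser.add_subparsers(dest="command", required=True)
    run = subcommands.add_parser("run-benchmark", help="run one benchmark sweep")
    run.add_argument("--hardware", choices=["cpu", "cuda"], required=True)
    run.add_argument("--backend", choices=["pytorch", "onnxruntime", "openvino"], required=True)
    return parser

args = build_parser().parse_args(
    ["run-benchmark", "--hardware", "cpu", "--backend", "pytorch"]
)
print(args.command, args.hardware, args.backend)
```

Using `choices` makes invalid hardware/backend combinations fail at parse time with a usage message rather than deep inside a benchmark run.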

## Benchmark Dataset 📊

Results are published to the official dataset:
[optimum-benchmark/llm-perf-leaderboard](https://huggingface.co/datasets/optimum-benchmark/llm-perf-leaderboard)

## Benchmark Specifications 📑

https://huggingface.co/datasets/optimum-benchmark/llm-perf-leaderboard
All benchmarks follow these standardized settings:
- Single GPU usage to avoid communication-dependent results
- Energy monitoring via CodeCarbon
- Memory tracking:
- Maximum allocated memory
- Maximum reserved memory
- Maximum used memory (via PyNVML for GPU)
11 changes: 11 additions & 0 deletions llm_perf/benchmark_runners/cuda/update_llm_perf_cuda_pytorch.py
@@ -191,6 +191,17 @@ def _get_weights_configs(self, subset) -> Dict[str, Dict[str, Any]]:
                 },
             },
         }
+        elif subset == "torchao":
+            return {
+                "torchao-int4wo-128": {
+                    "torch_dtype": "bfloat16",
+                    "quant_scheme": "torchao",
+                    "quant_config": {
+                        "quant_type": "int4_weight_only",
+                        "group_size": 128,
+                    },
+                },
+            }
         else:
             raise ValueError(f"Unknown subset: {subset}")

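The new `torchao` branch of the subset dispatch can be sketched standalone (the surrounding class and the other subset branches are omitted here):

```python
from typing import Any, Dict

# Standalone sketch of the subset -> weights-config dispatch from the diff
# above; only the torchao branch is reproduced.
def get_weights_configs(subset: str) -> Dict[str, Dict[str, Any]]:
    if subset == "torchao":
        return {
            "torchao-int4wo-128": {
                "torch_dtype": "bfloat16",
                "quant_scheme": "torchao",
                "quant_config": {
                    "quant_type": "int4_weight_only",  # 4-bit weight-only quantization
                    "group_size": 128,                 # quantization group size
                },
            },
        }
    raise ValueError(f"Unknown subset: {subset}")

config = get_weights_configs("torchao")["torchao-int4wo-128"]
print(config["quant_config"]["quant_type"])  # int4_weight_only
```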
3 changes: 3 additions & 0 deletions llm_perf/hardware.yaml
@@ -5,6 +5,7 @@
 - awq
 - bnb
 - gptq
+- torchao
 backends:
   - pytorch

@@ -15,6 +16,7 @@
 - awq
 - bnb
 - gptq
+- torchao
 backends:
   - pytorch

@@ -25,6 +27,7 @@
 - awq
 - bnb
 - gptq
+- torchao
 backends:
   - pytorch

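Each `hardware.yaml` entry maps a machine to its allowed subsets and backends; once parsed, checking whether a requested combination is supported is a simple lookup. A hedged sketch on plain dicts (the machine name below is a placeholder, not one used by the real backend):

```python
# Hedged sketch: hardware.yaml entries after YAML parsing. The machine name
# is a placeholder; the real config lists the leaderboard's machines.
HARDWARE_CONFIGS = [
    {
        "machine": "example-cuda-machine",
        "subsets": ["unquantized", "awq", "bnb", "gptq", "torchao"],
        "backends": ["pytorch"],
    },
]

def is_supported(machine: str, subset: str, backend: str) -> bool:
    """Return True if the machine's config lists both the subset and backend."""
    for config in HARDWARE_CONFIGS:
        if config["machine"] == machine:
            return subset in config["subsets"] and backend in config["backends"]
    return False

print(is_supported("example-cuda-machine", "torchao", "pytorch"))  # True
```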
13 changes: 6 additions & 7 deletions llm_perf/update_llm_perf_leaderboard.py
@@ -4,7 +4,6 @@
import pandas as pd
from huggingface_hub import create_repo, snapshot_download, upload_file, repo_exists
from optimum_benchmark import Benchmark
import requests
import json

from llm_perf.common.hardware_config import load_hardware_configs
@@ -19,6 +18,7 @@
PERF_DF = "perf-df-{backend}-{hardware}-{subset}-{machine}.csv"
LLM_DF = "llm-df.csv"


def patch_json(file):
"""
Patch a JSON file by adding a 'stdev_' key with the same value as 'stdev' for all occurrences,
@@ -37,7 +37,7 @@ def patch_json(file):
"""
with open(file, "r") as f:
data = json.load(f)

def add_stdev_(obj):
if isinstance(obj, dict):
new_items = []
@@ -53,10 +53,11 @@ def add_stdev_(obj):
add_stdev_(item)

add_stdev_(data)

with open(file, "w") as f:
json.dump(data, f, indent=4)


def gather_benchmarks(subset: str, machine: str, backend: str, hardware: str):
"""
Gather the benchmarks for a given machine
@@ -99,7 +100,6 @@ def gather_benchmarks(subset: str, machine: str, backend: str, hardware: str):
# return response.status_code == 200



def update_perf_dfs():
"""
Update the performance dataframes for all machines
@@ -116,19 +116,18 @@
backend,
hardware_config.hardware,
)
except Exception as e:
except Exception:
print("Dataset not found for:")
print(f" • Backend: {backend}")
print(f" • Subset: {subset}")
print(f" • Machine: {hardware_config.machine}")
print(f" • Hardware Type: {hardware_config.hardware}")
url = f"{PERF_REPO_ID.format(subset=subset, machine=hardware_config.machine, backend=backend, hardware=hardware_config.hardware)}"

does_exist = repo_exists(url, repo_type="dataset")

if does_exist:
print(f"Dataset exists: {url} but could not be processed")



scrapping_script = """
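The `patch_json` helper shown in this file adds a `stdev_` twin key wherever a `stdev` key occurs. Its recursive pass can be sketched standalone on an in-memory object (a reimplementation of the idea, not the exact source):

```python
# Standalone sketch of the recursive 'stdev_' patch from patch_json: every
# dict containing a "stdev" key gets a "stdev_" key with the same value,
# recursing through nested dicts and lists. This works on an in-memory
# object rather than a JSON file.
def add_stdev_(obj):
    if isinstance(obj, dict):
        for key in list(obj.keys()):  # snapshot keys before mutating
            if key == "stdev":
                obj["stdev_"] = obj[key]
            add_stdev_(obj[key])
    elif isinstance(obj, list):
        for item in obj:
            add_stdev_(item)

report = {"latency": {"mean": 1.2, "stdev": 0.1}, "runs": [{"stdev": 0.3}]}
add_stdev_(report)
print(report["latency"]["stdev_"], report["runs"][0]["stdev_"])  # 0.1 0.3
```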
1 change: 0 additions & 1 deletion optimum-benchmark
Submodule optimum-benchmark deleted from de1e79
29 changes: 0 additions & 29 deletions pyproject.toml.bak

This file was deleted.

1 change: 1 addition & 0 deletions setup.py
@@ -32,6 +32,7 @@
         "auto-gptq",
         "bitsandbytes",
         "autoawq",
+        "torchao",
     ],
 }

Empty file removed test.py
Empty file.
4 changes: 0 additions & 4 deletions test.sh

This file was deleted.
