LLM-perf Backend 🏋️

The official backend system powering the LLM-perf Leaderboard. This repository contains the infrastructure and tools needed to run standardized benchmarks for Large Language Models (LLMs) across different hardware configurations and optimization backends.

About 📝

LLM-perf Backend is designed to:

  • Run automated benchmarks for the LLM-perf leaderboard
  • Ensure consistent and reproducible performance measurements
  • Support multiple hardware configurations and optimization backends
  • Generate standardized performance metrics for latency, throughput, memory usage, and energy consumption

Key Features 🔑

  • Standardized benchmarking pipeline using Optimum-Benchmark (see the sketch after this list)
  • Support for multiple hardware configurations (CPU, GPU)
  • Multiple backend implementations (PyTorch, ONNX Runtime, etc.)
  • Automated metric collection:
    • Latency and throughput measurements
    • Memory usage tracking
    • Energy consumption monitoring
    • Quality metrics integration with the Open LLM Leaderboard
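
Since the pipeline is built on Optimum-Benchmark, a single benchmark can also be launched programmatically. Below is a minimal sketch using Optimum-Benchmark's Python API; the model, device, and settings are illustrative placeholders, not the leaderboard's exact configuration, and exact class names may differ slightly across Optimum-Benchmark versions.

from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    ProcessConfig,
    PyTorchConfig,
)

if __name__ == "__main__":
    # Illustrative single-run configuration: PyTorch backend on CPU,
    # collecting the latency / memory / energy metrics listed above.
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        launcher=ProcessConfig(),  # run the benchmark in an isolated process
        scenario=InferenceConfig(latency=True, memory=True, energy=True),
        backend=PyTorchConfig(model="gpt2", device="cpu"),
    )
    report = Benchmark.launch(benchmark_config)  # returns a benchmark report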

Installation 🛠️

  1. Clone the repository:
git clone https://github.com/huggingface/llm-perf-backend
cd llm-perf-backend
  2. Create and activate a Python virtual environment:
python -m venv .venv
source .venv/bin/activate
  3. Install the package with its required dependencies:
pip install -e .
# or
pip install -e ".[all]" # to install optional dependencies like ONNX Runtime

Usage 📋

Command Line Interface

Run benchmarks using the CLI tool:

llm-perf run-benchmark --hardware cpu --backend pytorch

Configuration Options

View all available options with:

llm-perf run-benchmark --help

  • --hardware: Target hardware platform (cpu, cuda)
  • --backend: Backend framework to use (pytorch, onnxruntime, etc.)

Benchmark Dataset 📊

Results are published to the official dataset: optimum-benchmark/llm-perf-leaderboard
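
The dataset's file layout is not documented here, so a safe way to explore the published results is to list the repository's files first. A minimal sketch using the huggingface_hub client:

from huggingface_hub import HfApi, hf_hub_download

repo_id = "optimum-benchmark/llm-perf-leaderboard"

# List the files the dataset repository actually contains.
files = HfApi().list_repo_files(repo_id=repo_id, repo_type="dataset")
print(files)

# Download one of the listed result files locally.
local_path = hf_hub_download(repo_id=repo_id, repo_type="dataset", filename=files[0])
print(local_path)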

Benchmark Specifications 📑

All benchmarks follow these standardized settings (see the measurement sketch after this list):

  • Single GPU usage to avoid communication-dependent results
  • Energy monitoring via CodeCarbon
  • Memory tracking:
    • Maximum allocated memory
    • Maximum reserved memory
    • Maximum used memory (via PyNVML for GPU)
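
For illustration, here is a minimal sketch of taking the kinds of readings listed above with CodeCarbon and PyNVML directly. The backend instruments its runs through Optimum-Benchmark, so this is not its exact code, only the underlying libraries it relies on.

import pynvml
from codecarbon import EmissionsTracker

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # single-GPU setup: device 0

tracker = EmissionsTracker(measure_power_secs=1)  # CodeCarbon energy monitor
tracker.start()

# ... run the model workload being measured here ...

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # used VRAM as PyNVML reports it
print(f"used GPU memory: {mem.used / 1024**2:.0f} MiB")

emissions = tracker.stop()  # estimated kg CO2eq for the tracked span
print(f"estimated emissions: {emissions:.6f} kg CO2eq")

pynvml.nvmlShutdown()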
