Commit e6639fe

sayakpaul, DN6, and McPatate authored
[benchmarks] overhaul benchmarks (#11565)
* start overhauling the benchmarking suite.
* fixes
* fixes
* checking.
* checking
* fixes.
* error handling and logging.
* add flops and params.
* add more models.
* utility to fire execution of all benchmarking scripts.
* utility to push to the hub.
* push utility improvement
* seems to be working.
* okay
* add torchprofile dep.
* remove total gpu memory
* fixes
* fix
* need a big gpu
* better
* what's happening.
* okay
* separate requirements and make it nightly.
* add db population script.
* update secret name
* update secret.
* population db update
* disable db population for now.
* change to every monday
* Update .github/workflows/benchmark.yml

Co-authored-by: Dhruv Nair <[email protected]>

* quality improvements.
* reparate hub upload step.
* repository
* remove csv
* check
* update
* update
* threading.
* update
* update
* updaye
* update
* update
* update
* remove peft dep
* upgrade runner.
* fix
* fixes
* fix merging csvs.
* push dataset to the Space repo for analysis.
* warm up.
* add a readme
* Apply suggestions from code review

Co-authored-by: Luc Georges <[email protected]>

* address feedback
* Apply suggestions from code review
* disable db workflow.
* update to bi weekly.
* enable population
* enable
* updaye
* update
* metadata
* fix

---------

Co-authored-by: Dhruv Nair <[email protected]>
Co-authored-by: Luc Georges <[email protected]>
1 parent 8c938fb commit e6639fe

22 files changed: 947 additions and 761 deletions

.github/workflows/benchmark.yml

Lines changed: 31 additions & 10 deletions
@@ -11,17 +11,18 @@ env:
   HF_HOME: /mnt/cache
   OMP_NUM_THREADS: 8
   MKL_NUM_THREADS: 8
+  BASE_PATH: benchmark_outputs

 jobs:
-  torch_pipelines_cuda_benchmark_tests:
+  torch_models_cuda_benchmark_tests:
     env:
       SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_BENCHMARK }}
-    name: Torch Core Pipelines CUDA Benchmarking Tests
+    name: Torch Core Models CUDA Benchmarking Tests
     strategy:
       fail-fast: false
       max-parallel: 1
     runs-on:
-      group: aws-g6-4xlarge-plus
+      group: aws-g6e-4xlarge
     container:
       image: diffusers/diffusers-pytorch-cuda
       options: --shm-size "16gb" --ipc host --gpus 0
@@ -35,27 +36,47 @@ jobs:
           nvidia-smi
       - name: Install dependencies
         run: |
+          apt update
+          apt install -y libpq-dev postgresql-client
           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
           python -m uv pip install -e [quality,test]
-          python -m uv pip install pandas peft
-          python -m uv pip uninstall transformers && python -m uv pip install transformers==4.48.0
+          python -m uv pip install -r benchmarks/requirements.txt
       - name: Environment
         run: |
           python utils/print_env.py
       - name: Diffusers Benchmarking
         env:
-          HF_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
-          BASE_PATH: benchmark_outputs
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
         run: |
-          export TOTAL_GPU_MEMORY=$(python -c "import torch; print(torch.cuda.get_device_properties(0).total_memory / (1024**3))")
-          cd benchmarks && mkdir ${BASE_PATH} && python run_all.py && python push_results.py
+          cd benchmarks && python run_all.py
+
+      - name: Push results to the Hub
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
+        run: |
+          cd benchmarks && python push_results.py
+          mkdir $BASE_PATH && cp *.csv $BASE_PATH

       - name: Test suite reports artifacts
         if: ${{ always() }}
         uses: actions/upload-artifact@v4
         with:
           name: benchmark_test_reports
-          path: benchmarks/benchmark_outputs
+          path: benchmarks/${{ env.BASE_PATH }}
+
+      # TODO: enable this once the connection problem has been resolved.
+      - name: Update benchmarking results to DB
+        env:
+          PGDATABASE: metrics
+          PGHOST: ${{ secrets.DIFFUSERS_BENCHMARKS_PGHOST }}
+          PGUSER: transformers_benchmarks
+          PGPASSWORD: ${{ secrets.DIFFUSERS_BENCHMARKS_PGPASSWORD }}
+          BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
+        run: |
+          git config --global --add safe.directory /__w/diffusers/diffusers
+          commit_id=$GITHUB_SHA
+          commit_msg=$(git show -s --format=%s "$commit_id" | cut -c1-70)
+          cd benchmarks && python populate_into_db.py "$BRANCH_NAME" "$commit_id" "$commit_msg"

       - name: Report success status
         if: ${{ success() }}
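
The new "Push results to the Hub" step copies the CSV files in `benchmarks/` into `$BASE_PATH`, and the artifact upload step then reads from `benchmarks/${{ env.BASE_PATH }}`. As a rough illustration only, the same collection could be written in Python as below. The function name `collect_csvs` is hypothetical and not part of this commit; the default directory name simply mirrors the `BASE_PATH` value set in the workflow.

```python
import shutil
from pathlib import Path


def collect_csvs(src_dir: Path, base_path: str = "benchmark_outputs") -> list[Path]:
    """Copy every top-level CSV in `src_dir` into `src_dir / base_path`.

    Hypothetical helper: a Python counterpart to
    `mkdir $BASE_PATH && cp *.csv $BASE_PATH` from the workflow step.
    """
    out_dir = src_dir / base_path
    # Unlike a bare `mkdir`, this does not fail if the directory already exists.
    out_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    # `glob("*.csv")` is non-recursive, so files already in `out_dir` are skipped.
    for csv_file in sorted(src_dir.glob("*.csv")):
        copied.append(Path(shutil.copy2(csv_file, out_dir / csv_file.name)))
    return copied
```

One difference from the shell version is worth noting: the sketch tolerates a pre-existing output directory, whereas `mkdir` without `-p` exits with an error in that case.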

benchmarks/README.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
+# Diffusers Benchmarks
+
+Welcome to Diffusers Benchmarks. These benchmarks are used to obtain latency and memory information of the most popular models across different scenarios such as:
+
+* Base case i.e., when using `torch.bfloat16` and `torch.nn.functional.scaled_dot_product_attention`.
+* Base + `torch.compile()`
+* NF4 quantization
+* Layerwise upcasting
+
+Instead of full diffusion pipelines, only the forward pass of the respective model classes (such as `FluxTransformer2DModel`) is tested with the real checkpoints (such as `"black-forest-labs/FLUX.1-dev"`).
+
+The entrypoint to running all the currently available benchmarks is in `run_all.py`. However, one can run the individual benchmarks, too, e.g., `python benchmarking_flux.py`. It should produce a CSV file containing various information about the benchmarks run.
+
+The benchmarks are run on a weekly basis and the CI is defined in [benchmark.yml](../.github/workflows/benchmark.yml).
+
+## Running the benchmarks manually
+
+First set up `torch` and install `diffusers` from the root of the directory:
+
+```sh
+pip install -e ".[quality,test]"
+```
+
+Then make sure the other dependencies are installed:
+
+```sh
+cd benchmarks/
+pip install -r requirements.txt
+```
+
+We need to be authenticated to access some of the checkpoints used during benchmarking:
+
+```sh
+huggingface-cli login
+```
+
+We use an L40 GPU with 128GB RAM to run the benchmark CI. As such, the benchmarks are configured to run on NVIDIA GPUs. So, make sure you have access to a similar machine (or modify the benchmarking scripts accordingly).
+
+Then you can either launch the entire benchmarking suite by running:
+
+```sh
+python run_all.py
+```
+
+Or, you can run the individual benchmarks.
+
+## Customizing the benchmarks
+
+We define "scenarios" to cover the most common ways in which these models are used. You can
+define a new scenario, modifying an existing benchmark file:
+
+```py
+BenchmarkScenario(
+    name=f"{CKPT_ID}-bnb-8bit",
+    model_cls=FluxTransformer2DModel,
+    model_init_kwargs={
+        "pretrained_model_name_or_path": CKPT_ID,
+        "torch_dtype": torch.bfloat16,
+        "subfolder": "transformer",
+        "quantization_config": BitsAndBytesConfig(load_in_8bit=True),
+    },
+    get_model_input_dict=partial(get_input_dict, device=torch_device, dtype=torch.bfloat16),
+    model_init_fn=model_init_fn,
+)
+```
+
+You can also configure a new model-level benchmark and add it to the existing suite. To do so, just defining a valid benchmarking file like `benchmarking_flux.py` should be enough.
+
+Happy benchmarking 🧨
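
To make the shape of a scenario more concrete, here is a minimal, self-contained sketch of how such a container and a timing loop with warm-up could look. It is not the implementation from this commit. The `BenchmarkScenario` fields are inferred only from the keyword arguments in the README example above, while `run_scenario`, `ToyModel`, the warm-up and iteration counts, and the `model_init_fn(model_cls, **kwargs)` calling convention are assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class BenchmarkScenario:
    """Hypothetical stand-in; field names follow the README example only."""

    name: str
    model_cls: Any
    model_init_kwargs: Dict[str, Any]
    get_model_input_dict: Callable[[], Dict[str, Any]]
    # Assumed convention: called as model_init_fn(model_cls, **model_init_kwargs).
    model_init_fn: Optional[Callable[..., Any]] = None


def run_scenario(scenario: BenchmarkScenario, warmup: int = 1, iters: int = 3) -> Dict[str, Any]:
    """Build the model, run warm-up passes, then average the forward-pass time."""
    init = scenario.model_init_fn or (lambda cls, **kwargs: cls(**kwargs))
    model = init(scenario.model_cls, **scenario.model_init_kwargs)
    inputs = scenario.get_model_input_dict()

    for _ in range(warmup):  # warm-up runs are excluded from the measurement
        model(**inputs)

    start = time.perf_counter()
    for _ in range(iters):
        model(**inputs)
    elapsed = time.perf_counter() - start
    return {"scenario": scenario.name, "time_s": elapsed / iters}


class ToyModel:
    """Tiny CPU-only placeholder so the sketch runs without torch or a GPU."""

    def __init__(self, scale: float = 1.0):
        self.scale = scale

    def __call__(self, x):
        return [self.scale * value for value in x]


if __name__ == "__main__":
    toy = BenchmarkScenario(
        name="toy-base",
        model_cls=ToyModel,
        model_init_kwargs={"scale": 2.0},
        get_model_input_dict=lambda: {"x": [1.0, 2.0, 3.0]},
    )
    print(run_scenario(toy))
```

On a GPU, a wall-clock measurement like this would also need to synchronize the device before reading the timer (for example with `torch.cuda.synchronize()`), and memory usage would have to be tracked separately.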

benchmarks/__init__.py

Whitespace-only changes.
