
[CI][Benchmark] Optimize performance benchmark workflow #1039


Merged
merged 31 commits on Jun 3, 2025
Changes from 26 commits
62 changes: 44 additions & 18 deletions .github/workflows/nightly_benchmarks.yaml
@@ -15,20 +15,22 @@
# limitations under the License.
#

name: 'run benchmarks main'
name: 'Benchmarks / Performance'
# This workflow runs nightly benchmarks for vllm-ascend.

on:
schedule:
# Run at 24:00 every day
- cron: '00 16 * * *'
workflow_dispatch:

# after merged, secrets will be available
# pull_request:
# branches:
# - 'main'
# - '*-dev'
# paths:
# - '.github/workflows/nightly_benchmarks.yaml'
pull_request:
branches:
- 'main'
- '*-dev'
paths:
- '.github/workflows/nightly_benchmarks.yaml'
pull_request_target:
types: [labeled]


# Bash shells do not use ~/.profile or ~/.bashrc so these shells need to be explicitly
@@ -38,9 +40,13 @@ defaults:
run:
shell: bash -el {0}

concurrency:
group: pr-${{ github.event.pull_request.number }}
cancel-in-progress: true

jobs:
test:
name: run benchmarks main
name: Benchmarks/vLLM=${{ matrix.vllm_branch }}, vLLM-Ascend=${{ matrix.vllm_ascend_branch }}
runs-on: 'linux-arm64-npu-static-8'
strategy:
matrix:
@@ -64,6 +70,7 @@ jobs:
env:
HF_ENDPOINT: https://hf-mirror.com
HF_TOKEN: ${{ secrets.HF_TOKEN }}
HF_HOME: /github/home/.cache/huggingface
Collaborator: Why does this need to be specified explicitly?

ES_OM_DOMAIN: ${{ secrets.ES_OM_DOMAIN }}
ES_OM_AUTHORIZATION: ${{ secrets.ES_OM_AUTHORIZATION }}
steps:
@@ -90,7 +97,8 @@ jobs:
- name: Checkout vllm-project/vllm-ascend repo
uses: actions/checkout@v4
with:
ref: ${{ matrix.vllm_ascend_branch }}
ref: dev-bench
repository: Potabk/vllm-ascend

- name: Checkout vllm-project/vllm repo
uses: actions/checkout@v4
@@ -109,25 +117,43 @@
pip install -e .
pip install -r benchmarks/requirements-bench.txt

- name: Checkout cosdt/elastic-tool
uses: actions/checkout@v4
- name: Run current commit benchmarks
if: github.event_name != 'schedule'
run: |
# Sometimes we only want to run benchmarks on the current commit
# This is useful for debugging or a release benchmark
bash benchmarks/scripts/run-performance-benchmarks.sh
# Convert the benchmark results to markdown format
python3 benchmarks/scripts/convert_json_to_markdown.py

- name: Generate step summary
if: github.event_name != 'schedule'
run: |
cat ./benchmarks/results/benchmark_results.md >> $GITHUB_STEP_SUMMARY

- name: Upload benchmark artifacts
if: github.event_name != 'schedule'
uses: actions/upload-artifact@v4
with:
repository: cosdt/elastic-tool
path: ./elastic_tool
ref: 0.1.0-dev
name: "benchmark-performance-${{ matrix.vllm_branch }}-${{ matrix.vllm_ascend_branch }}-report"
path: ./benchmarks/results/benchmark_results.md
if-no-files-found: warn
retention-days: 90
overwrite: true

- name: Install elastic_tool
working-directory: ./elastic_tool
run: |
pip install -e .
pip install escli-tool==0.2.0

- name: Collect pr info from vllm-project/vllm-ascend
if: github.event_name == 'schedule'
run: |
# Only get the pull requests which may influence performance
git log --pretty=format:"%H %s" -- '**/*.py' ':!docs/*' ':!tests/*' ':!examples/*' > commit_log.txt
escli check commit_log.txt

- name: Run benchmark iteration
if: github.event_name == 'schedule'
run: |
while IFS= read -r line || [[ -n "$line" ]]; do
commit_id=${line%% *}
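
A note on the "Run benchmark iteration" step above: commit_log.txt holds one "<commit hash> <subject>" line per commit (from the git log --pretty=format:"%H %s" command shown earlier), and the shell expansion ${line%% *} keeps only the hash. Below is a minimal Python sketch of the same parsing, purely for illustration; the remainder of the shell loop is collapsed in this diff.

# Sketch only: mirrors the visible line-splitting done by the shell loop,
# which reads commit_log.txt and takes the text before the first space
# as the commit id.
with open("commit_log.txt") as f:
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue
        commit_id, _, subject = line.partition(" ")
        print(commit_id, subject)
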
4 changes: 3 additions & 1 deletion benchmarks/requirements-bench.txt
@@ -1,3 +1,5 @@
pandas
datasets
modelscope
modelscope
libcst
tabulate
183 changes: 183 additions & 0 deletions benchmarks/scripts/convert_json_to_markdown.py
@@ -0,0 +1,183 @@
import argparse
import json
import os
from pathlib import Path

import pandas as pd
from tabulate import tabulate

CUR_PATH = Path(__file__).parent.resolve()
# latency results and the keys that will be printed into markdown
latency_results = []
latency_column_mapping = {
"test_name": "Test name",
"avg_latency": "Mean latency (ms)",
"P50": "Median latency (ms)",
"P99": "P99 latency (ms)",
}

# throughput tests and the keys that will be printed into markdown
throughput_results = []
throughput_results_column_mapping = {
"test_name": "Test name",
"num_requests": "Num of reqs",
"total_num_tokens": "Total num of tokens",
"elapsed_time": "Elapsed time (s)",
"requests_per_second": "Tput (req/s)",
"tokens_per_second": "Tput (tok/s)",
}

# serving results and the keys that will be printed into markdown
serving_results = []
serving_column_mapping = {
"test_name": "Test name",
"request_rate": "Request rate (req/s)",
"request_throughput": "Tput (req/s)",
"output_throughput": "Output Tput (tok/s)",
"median_ttft_ms": "TTFT (ms)",
"median_tpot_ms": "TPOT (ms)",
"median_itl_ms": "ITL (ms)",
}


def read_markdown(file):
if os.path.exists(file):
with open(file) as f:
return f.read() + "\n"
else:
return f"{file} not found.\n"


def results_to_json(latency, throughput, serving):
return json.dumps({
'latency': latency.to_dict(),
'throughput': throughput.to_dict(),
'serving': serving.to_dict()
})


if __name__ == "__main__":
parser = argparse.ArgumentParser(
description="Process the results of the benchmark tests.")
parser.add_argument(
"--results_folder",
type=str,
default="../results/",
help="The folder where the benchmark results are stored.")
parser.add_argument(
"--output_folder",
type=str,
default="../results/",
help="The folder where the benchmark results are stored.")
parser.add_argument("--markdown_template",
type=str,
default="./perf_result_template.md",
help="The template file for the markdown report.")
parser.add_argument("--tag",
default="main",
help="Tag to be used for release message.")
parser.add_argument("--commit_id",
default="",
help="Commit ID to be used for release message.")

args = parser.parse_args()
results_folder = (CUR_PATH / args.results_folder).resolve()
output_folder = (CUR_PATH / args.output_folder).resolve()
markdown_template = (CUR_PATH / args.markdown_template).resolve()

# collect results
for test_file in results_folder.glob("*.json"):

with open(test_file) as f:
raw_result = json.loads(f.read())

if "serving" in str(test_file):
# this result is generated via `benchmark_serving.py`

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# add the result to serving_results
serving_results.append(raw_result)
continue

elif "latency" in f.name:
# this result is generated via `benchmark_latency.py`

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# get different percentiles
for perc in [10, 25, 50, 75, 90, 99]:
# Multiply by 1000 to convert the time unit from s to ms
raw_result.update(
{f"P{perc}": 1000 * raw_result["percentiles"][str(perc)]})
raw_result["avg_latency"] = raw_result["avg_latency"] * 1000

# add the result to latency_results
latency_results.append(raw_result)
continue

elif "throughput" in f.name:
# this result is generated via `benchmark_throughput.py`

# update the test name of this result
raw_result.update({"test_name": test_file.stem})

# add the result to throughput_results
throughput_results.append(raw_result)
continue

print(f"Skipping {test_file}")
serving_results.sort(key=lambda x: (len(x['test_name']), x['test_name']))

latency_results = pd.DataFrame.from_dict(latency_results)
serving_results = pd.DataFrame.from_dict(serving_results)
throughput_results = pd.DataFrame.from_dict(throughput_results)

raw_results_json = results_to_json(latency_results, throughput_results,
serving_results)

# remap the keys for visualization purposes
if not latency_results.empty:
latency_results = latency_results[list(
latency_column_mapping.keys())].rename(
columns=latency_column_mapping)
if not serving_results.empty:
serving_results = serving_results[list(
serving_column_mapping.keys())].rename(
columns=serving_column_mapping)
if not throughput_results.empty:
throughput_results = throughput_results[list(
throughput_results_column_mapping.keys())].rename(
columns=throughput_results_column_mapping)

processed_results_json = results_to_json(latency_results,
throughput_results,
serving_results)

# get markdown tables
latency_md_table = tabulate(latency_results,
headers='keys',
tablefmt='pipe',
showindex=False)
serving_md_table = tabulate(serving_results,
headers='keys',
tablefmt='pipe',
showindex=False)
throughput_md_table = tabulate(throughput_results,
headers='keys',
tablefmt='pipe',
showindex=False)

# document the result
print(output_folder)
with open(output_folder / "benchmark_results.md", "w") as f:

results = read_markdown(markdown_template)
results = results.format(
latency_tests_markdown_table=latency_md_table,
throughput_tests_markdown_table=throughput_md_table,
serving_tests_markdown_table=serving_md_table,
benchmarking_results_in_json_string=processed_results_json)
f.write(results)
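
For reference, a minimal sketch of the table-rendering step used above, with the same tabulate arguments as the script (headers='keys', tablefmt='pipe', showindex=False). The row values are made up purely for illustration; real runs take them from the benchmark JSON files.

import pandas as pd
from tabulate import tabulate

# Illustrative row only; the real script builds the DataFrame from the
# collected latency/throughput/serving results.
rows = [{
    "Test name": "latency_example_tp1",
    "Mean latency (ms)": 512.3,
    "Median latency (ms)": 498.7,
    "P99 latency (ms)": 601.2,
}]
df = pd.DataFrame(rows)

# Same call pattern as convert_json_to_markdown.py: pipe-format table,
# header row from the DataFrame columns, no index column.
print(tabulate(df, headers="keys", tablefmt="pipe", showindex=False))
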
67 changes: 67 additions & 0 deletions benchmarks/scripts/patch_benchmark_dataset.py
@@ -0,0 +1,67 @@
from argparse import ArgumentParser

import libcst as cst
import libcst.matchers as m

# Patch the benchmark_dataset.py file to set streaming=False in load_dataset calls


class StreamingFalseTransformer(cst.CSTTransformer):

def __init__(self):
self.in_target_class = False
self.in_target_func = False

def visit_ClassDef(self, node):
if node.name.value == "HuggingFaceDataset":
self.in_target_class = True

def leave_ClassDef(self, original_node, updated_node):
self.in_target_class = False
return updated_node

def visit_FunctionDef(self, node):
if self.in_target_class and node.name.value == "load_data":
self.in_target_func = True

def leave_FunctionDef(self, original_node, updated_node):
self.in_target_func = False
return updated_node

def leave_Call(self, original_node, updated_node):
if self.in_target_class and self.in_target_func:
if m.matches(updated_node.func, m.Name("load_dataset")):
new_args = []
for arg in updated_node.args:
if arg.keyword and arg.keyword.value == "streaming":
new_arg = arg.with_changes(value=cst.Name("False"))
new_args.append(new_arg)
else:
new_args.append(arg)
return updated_node.with_changes(args=new_args)
return updated_node


def patch_file(path):
with open(path, "r", encoding="utf-8") as f:
source = f.read()

module = cst.parse_module(source)
modified = module.visit(StreamingFalseTransformer())

with open(path, "w", encoding="utf-8") as f:
f.write(modified.code)

print(f"Patched: {path}")


if __name__ == '__main__':
parser = ArgumentParser(
description=
"Patch benchmark_dataset.py to set streaming=False in load_dataset calls"
)
parser.add_argument("--path",
type=str,
help="Path to the benchmark_dataset.py file")
args = parser.parse_args()
patch_file(args.path)
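
A quick way to sanity-check the transformer is to run it on an in-memory snippet instead of a file. The sketch below assumes the script is importable as patch_benchmark_dataset; the dataset name in the snippet is made up, while HuggingFaceDataset, load_data, and streaming are the names the transformer actually looks for.

import libcst as cst

# Assumes the file above is importable as patch_benchmark_dataset.
from patch_benchmark_dataset import StreamingFalseTransformer

# Minimal input matching what the transformer targets: a load_dataset call
# inside HuggingFaceDataset.load_data with streaming=True.
SOURCE = '''\
class HuggingFaceDataset:
    def load_data(self):
        return load_dataset("some/dataset", split="train", streaming=True)
'''

module = cst.parse_module(SOURCE)
patched = module.visit(StreamingFalseTransformer())
print(patched.code)  # streaming=True is rewritten to streaming=False
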