
aiter test workflow enhance#2905

Draft
kiran-thumma wants to merge 47 commits into main from kithumma/aiter-test-workflow-enhance

Conversation

@kiran-thumma
Collaborator

Motivation

  • Add a hard wheel smoke gate so downstream GPU suites fail fast when the published AITER wheel is broken.
  • Normalize GPU labels and expose skip toggles so each suite only consumes the GPUs it needs.

Technical Details

  • .github/workflows/nightly.yaml: introduced skip_wheel_smoke, skip_sglang, skip_vllm, skip_atom, routed the smoke gate through test-whl.yaml, and short-circuited matrices when their suite is disabled while keeping the dependency chain intact.
  • .github/workflows/test-whl.yaml: added published-wheel fallback download, MI300X/MI35X × Python 3.10/3.12 coverage, and a no-op path when callers skip smoke.
  • .github/configs/vllm_models.json, .github/configs/vllm_tests.json, .github/scripts/run_vllm.sh, .github/scripts/run_vllm_test.sh, .github/configs/vllm_pins.json: normalized runner labels and enforced vLLM pin usage.
  • index.html: refreshed the dashboard to reflect the wheel gate, job counts, and new skip knobs.
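As a rough sketch of the fail-fast behavior described above (the probe and messages here are illustrative assumptions, not the actual contents of test-whl.yaml), the wheel smoke gate amounts to:

```shell
# Illustrative sketch only: a hard smoke gate that blocks downstream GPU
# suites when the published wheel is broken. The real gate lives in
# .github/workflows/test-whl.yaml; this function's probe is an assumption.
set -euo pipefail

smoke_gate() {
  # $1: "ok" if the published wheel imported cleanly, anything else otherwise
  if [ "$1" != "ok" ]; then
    echo "wheel smoke failed: blocking downstream GPU suites" >&2
    return 1
  fi
  echo "wheel smoke passed"
}
```

A non-zero return here is what lets nightly.yaml short-circuit the sglang/vllm/atom matrices while keeping the dependency chain intact.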

Test Plan

  • Static workflow inspection.

Test Result

  • Not run (workflow-only change).

Submission Checklist

  - Add run_sglang, run_vllm, run_atom workflow_dispatch toggles
  - Create modular scripts: run_sglang.sh, run_vllm.sh, run_atom.sh
  - Wheels go to devreleases from non-schedule triggers
  - Promote only when all selected integration tests pass
  - Create modular scripts for Docker setup and test execution
  - Model configs in JSON files for easy maintenance
  - ATOM: all 15 accuracy models loaded from atom_models.json
  - vLLM: 7 latency benchmarks loaded from vllm_models.json
  - SGLang: dispatches full scout to sgl-project/sglang
  - Add skip_build toggle to bypass build for faster testing
  - Add wheel_url input to use pre-built wheel directly
  - Add JSON model configs (atom_models.json, vllm_models.json)
  - Fix integration tests not depending on build when skipped
  - Add sglang_job_filter dropdown
  - SGLang: run on aiter-1gpu-runner instead of dispatching to external repo
  - ATOM: fix accuracy results path, log file path, workspace mount
  - Fix artifact names with illegal characters
  - Add cleanup traps to all scripts
  - Add atom_models.json and vllm_models.json configs
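The skip/run toggles listed above might gate a suite roughly like this (a minimal sketch; the real gating lives in the workflow's job-level conditions, and everything beyond the toggle names is an assumption):

```shell
# Hypothetical sketch of a workflow_dispatch toggle gating a test suite.
# The toggle names (run_sglang, run_vllm, run_atom / skip_*) come from the
# PR description; the gating logic itself is illustrative.
set -euo pipefail

should_run() {
  # $1: suite name, $2: value of the corresponding skip toggle ("true"/"false")
  if [ "$2" = "true" ]; then
    echo "skipping $1 suite"
    return 1
  fi
  echo "running $1 suite"
}
```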
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or `gh pr edit 2905 --add-label <label>`

@gyohuangxin
Member

@kiran-thumma Can we reuse current test workflows instead of adding so many new tests?
cc @valarLip

@kiran-thumma kiran-thumma reopened this Apr 24, 2026

@kiran-thumma
Collaborator Author

@kiran-thumma Can we reuse current test workflows instead of adding so many new tests? cc @valarLip

I'm reusing the existing test-workflow.yml and nightly.yml workflows and adding more tests; it's not yet ready for review.

kiran-thumma and others added 23 commits April 25, 2026 16:12
The escaped quotes \"$filter\" and \"$jq_filter\" inside the $()
command substitution caused jq to receive malformed strings. Inside $(),
the shell handles quoting correctly with plain "$var"; no escaping
is needed.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
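The quoting pitfall above can be demonstrated directly (variable names here are placeholders, not the workflow's actual filters):

```shell
# Inside $( ), plain "$var" preserves the value exactly; escaping the
# quotes instead passes literal quote characters through to the command.
set -euo pipefail

filter='hello world'
good=$(echo "$filter")     # value preserved: hello world
bad=$(echo \"$filter\")    # literal quotes included: "hello world"
```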
Inside any($allowed[]; cond), the dot refers to each $allowed string
element, not the parent JSON object. Capture .runner as $r first so
the comparison works correctly.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
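The jq scoping fix above can be checked directly (the JSON shape and the allowed-runner list here are illustrative assumptions, not the repo's actual config):

```shell
# Inside any($allowed[]; cond), "." is each element of $allowed, not the
# parent object, so .runner must be captured as $r before entering any().
set -euo pipefail

json='{"runner":"mi300x"}'
allowed='["mi300x","mi35x"]'
result=$(echo "$json" | jq --argjson allowed "$allowed" \
  '.runner as $r | any($allowed[]; . == $r)')
echo "$result"
```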
ATOM tests, SGLang GPU-aware image selection:

  - vllm_pins.json: update base_image to vllm/vllm-openai-rocm:v0.19.1
  - atom_models.json: add MI300X (gfx942) entries with proper tp-to-runner mapping
  - run_sglang.sh: auto-detect GPU arch, pick the mi30x/mi35x image, use rocm720 builds
  - fallback version pin, ABI smoke test, source build guard
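The arch-to-image selection described above might look like the following sketch (the image tags and the gfx pattern matching are illustrative assumptions, not the actual contents of run_sglang.sh):

```shell
# Hypothetical mapping from detected GPU architecture to a container image.
# gfx942 is MI300X-class, gfx950 is MI35X-class; tags here are made up.
pick_image() {
  case "$1" in
    gfx942)  echo "sglang-mi30x-rocm720" ;;
    gfx950*) echo "sglang-mi35x-rocm720" ;;
    *) echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}
```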