
aiter test workflow enhance#2905

Draft
kiran-thumma wants to merge 47 commits into main from kithumma/aiter-test-workflow-enhance

Conversation

@kiran-thumma
Collaborator

Motivation

  • Add a hard wheel smoke gate so downstream GPU suites fail fast when the published AITER wheel is broken.
  • Normalize GPU labels and expose skip toggles so each suite only consumes the GPUs it needs.

Technical Details

  • .github/workflows/nightly.yaml: introduced skip_wheel_smoke, skip_sglang, skip_vllm, skip_atom, routed the smoke gate through test-whl.yaml, and short-circuited matrices when their suite is disabled while keeping the dependency chain intact.
  • .github/workflows/test-whl.yaml: added published-wheel fallback download, MI300X/MI35X × Python 3.10/3.12 coverage, and a no-op path when callers skip smoke.
  • .github/configs/vllm_models.json, .github/configs/vllm_tests.json, .github/scripts/run_vllm.sh, .github/scripts/run_vllm_test.sh, .github/configs/vllm_pins.json: normalized runner labels and enforced vLLM pin usage.
  • index.html: refreshed the dashboard to reflect the wheel gate, job counts, and new skip knobs.
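As a rough sketch of the fail-fast behavior described above (the probe and messages here are illustrative assumptions, not the actual contents of test-whl.yaml), the wheel smoke gate amounts to:

```shell
# Illustrative sketch only: a hard smoke gate that blocks downstream GPU
# suites when the published wheel is broken. The real gate lives in
# .github/workflows/test-whl.yaml; this function's probe is an assumption.
set -euo pipefail

smoke_gate() {
  # $1: "ok" if the published wheel imported cleanly, anything else otherwise
  if [ "$1" != "ok" ]; then
    echo "wheel smoke failed: blocking downstream GPU suites" >&2
    return 1
  fi
  echo "wheel smoke passed"
}
```

A non-zero return here is what lets nightly.yaml short-circuit the sglang/vllm/atom matrices while keeping the dependency chain intact.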

Test Plan

  • Static workflow inspection.

Test Result

  • Not run (workflow-only change).

Submission Checklist

  - Add run_sglang, run_vllm, run_atom workflow_dispatch toggles
  - Create modular scripts: run_sglang.sh, run_vllm.sh, run_atom.sh
  - Wheels go to devreleases from non-schedule triggers
  - Promote only when all selected integration tests pass
  - Create modular scripts for Docker setup and test execution
  - Model configs in JSON files for easy maintenance
  - ATOM: all 15 accuracy models loaded from atom_models.json
  - vLLM: 7 latency benchmarks loaded from vllm_models.json
  - SGLang: dispatches full scout to sgl-project/sglang
  - Add skip_build toggle to bypass build for faster testing
  - Add wheel_url input to use pre-built wheel directly
  - Add JSON model configs (atom_models.json, vllm_models.json)
  - Fix integration tests not depending on build when skipped
  - Add sglang_job_filter dropdown
  - SGLang: run on aiter-1gpu-runner instead of dispatching to external repo
  - ATOM: fix accuracy results path, log file path, workspace mount
  - Fix artifact names with illegal characters
  - Add cleanup traps to all scripts
  - Add atom_models.json and vllm_models.json configs
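The skip/run toggles listed above might gate a suite roughly like this (a minimal sketch; the real gating lives in the workflow's job-level conditions, and everything beyond the toggle names is an assumption):

```shell
# Hypothetical sketch of a workflow_dispatch toggle gating a test suite.
# The toggle names (run_sglang, run_vllm, run_atom / skip_*) come from the
# PR description; the gating logic itself is illustrative.
set -euo pipefail

should_run() {
  # $1: suite name, $2: value of the corresponding skip toggle ("true"/"false")
  if [ "$2" = "true" ]; then
    echo "skipping $1 suite"
    return 1
  fi
  echo "running $1 suite"
}
```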
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-300x | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or `gh pr edit 2905 --add-label <label>`

@gyohuangxin
Member

@kiran-thumma Can we reuse current test workflows instead of adding so many new tests?
cc @valarLip

@kiran-thumma kiran-thumma reopened this Apr 24, 2026

@kiran-thumma
Collaborator Author

@kiran-thumma Can we reuse current test workflows instead of adding so many new tests? cc @valarLip

I'm reusing the existing test-workflow.yml and nightly.yml workflows and adding more tests; it's not yet ready for review.

kiran-thumma and others added 23 commits April 25, 2026 16:12
The escaped quotes \"$filter\" and \"$jq_filter\" inside the $()
command substitution caused jq to receive malformed strings. Inside $(),
the shell handles quoting correctly with plain "$var"; no escaping
is needed.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
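The quoting pitfall above can be demonstrated directly (variable names here are placeholders, not the workflow's actual filters):

```shell
# Inside $( ), plain "$var" preserves the value exactly; escaping the
# quotes instead passes literal quote characters through to the command.
set -euo pipefail

filter='hello world'
good=$(echo "$filter")     # value preserved: hello world
bad=$(echo \"$filter\")    # literal quotes included: "hello world"
```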
Inside any($allowed[]; cond), the dot refers to each $allowed string
element, not the parent JSON object. Capture .runner as $r first so
the comparison works correctly.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
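The jq scoping fix above can be checked directly (the JSON shape and the allowed-runner list here are illustrative assumptions, not the repo's actual config):

```shell
# Inside any($allowed[]; cond), "." is each element of $allowed, not the
# parent object, so .runner must be captured as $r before entering any().
set -euo pipefail

json='{"runner":"mi300x"}'
allowed='["mi300x","mi35x"]'
result=$(echo "$json" | jq --argjson allowed "$allowed" \
  '.runner as $r | any($allowed[]; . == $r)')
echo "$result"
```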
ATOM tests, SGLang GPU-aware image selection:

  - vllm_pins.json: update base_image to vllm/vllm-openai-rocm:v0.19.1
  - atom_models.json: add MI300X (gfx942) entries with proper tp-to-runner mapping
  - run_sglang.sh: auto-detect GPU arch, pick the mi30x/mi35x image, use rocm720 builds
  - fallback version pin, ABI smoke test, source build guard
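The arch-to-image selection described above might look like the following sketch (the image tags and the gfx pattern matching are illustrative assumptions, not the actual contents of run_sglang.sh):

```shell
# Hypothetical mapping from detected GPU architecture to a container image.
# gfx942 is MI300X-class, gfx950 is MI35X-class; tags here are made up.
pick_image() {
  case "$1" in
    gfx942)  echo "sglang-mi30x-rocm720" ;;
    gfx950*) echo "sglang-mi35x-rocm720" ;;
    *) echo "unsupported arch: $1" >&2; return 1 ;;
  esac
}
```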