[AMD][Kernel][BugFix] Use correct scale in concat_and_cache_ds_mla_kernel when on gfx942 #32976
Purpose
This PR updates `concat_and_cache_ds_mla_kernel` to use the correct scale divisor when running on `gfx942` architectures on ROCm. The `tile_size` divisor was `448.0`, which does not work on AMD platforms with the `gfx942` arch, e.g. MI300 and MI325. This PR changes the divisor to `224.0` on `gfx942`. Additionally, I consolidated the scale divisor into a `constexpr float`.
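For illustration, below is a minimal sketch of what the consolidated constant could look like. It assumes a compile-time architecture check via the `__gfx942__` macro that the AMDGPU compiler defines for that target; the constant name (`FP8_SCALE_DIVISOR`) and the exact gating used in the actual patch may differ.

```cpp
// Sketch only: consolidate the FP8 scale divisor into a single constexpr,
// selected per target architecture. The name FP8_SCALE_DIVISOR is
// illustrative, not necessarily the one used in the PR.
#if defined(__HIP_PLATFORM_AMD__) && defined(__gfx942__)
// gfx942 (MI300/MI325) uses a different FP8 E4M3 variant than the OCP
// format, so the 448.0 divisor is not valid there; the PR uses 224.0.
constexpr float FP8_SCALE_DIVISOR = 224.0f;
#else
constexpr float FP8_SCALE_DIVISOR = 448.0f;
#endif

// Inside concat_and_cache_ds_mla_kernel, the per-tile scale would then be
// derived against this constant rather than a hard-coded literal, e.g.:
//   float scale = amax / FP8_SCALE_DIVISOR;
```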
Test Plan
Use `lm_eval` to check accuracy on the `DeepSeek-R1` model. Command to run:

`lm_eval --model vllm --model_args pretrained=/models/DeepSeek-R1,max_length=8192,tensor_parallel_size=8 --batch_size auto --tasks gsm8k --num_fewshot 5`

Test Result
I did a comparison using `lm_eval` with and without this PR. I ran:

`lm_eval --model vllm --model_args pretrained=/models/DeepSeek-R1,max_length=8192,tensor_parallel_size=8 --batch_size auto --tasks gsm8k --num_fewshot 5`

on MI300 and got the following results:
with this PR:
without this PR: