
@Aznix07 Aznix07 commented Nov 3, 2025

What does this PR do?

Fixes the Qwen3 model configuration to use NormalizedTextConfigWithGQA instead of NormalizedTextConfig.

Problem

Qwen3 models use a Grouped Query Attention (GQA) architecture similar to Qwen2, with (values for Qwen3-0.6B):

  • num_attention_heads: 16
  • num_key_value_heads: 8
  • head_dim: 128

The previous configuration used NormalizedTextConfig, which lacks the num_key_value_heads attribute needed for GQA models. This caused:

  1. Incorrect head dimension calculations (64 instead of 128; see the sketch below)
  2. ONNX export failures with the error "index: 3 Got: 64 Expected: 128" (#2379)
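
To make the mismatch concrete, here is a minimal sketch. This is not optimum's actual code; the values are assumed from Qwen3-0.6B's config.json.

```python
# Geometry of Qwen/Qwen3-0.6B (assumed from its config.json).
hidden_size = 1024
num_attention_heads = 16
num_key_value_heads = 8   # GQA: fewer KV heads than query heads
head_dim = 128            # Qwen3 sets head_dim explicitly

# A config class that ignores the explicit head_dim falls back to:
fallback_head_dim = hidden_size // num_attention_heads
print(fallback_head_dim)  # 64 -> matches the "Got: 64 Expected: 128" error

# The per-layer KV cache the ONNX export declares has shape
# (batch, num_key_value_heads, sequence_length, head_dim) = (b, 8, s, 128),
# so both GQA fields are needed to build correct dummy inputs.
```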

Solution

Changed the normalized config class for qwen3 from NormalizedTextConfig to NormalizedTextConfigWithGQA in optimum/utils/normalized_config.py (line 314).
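
For reference, a minimal sketch of the shape of the change, assuming the mapping in NormalizedConfigManager is a plain dict keyed by model type (surrounding entries elided):

```python
from optimum.utils.normalized_config import (
    NormalizedTextConfig,
    NormalizedTextConfigWithGQA,
)

# Before: qwen3 mapped to the plain text config, which carries no
# num_key_value_heads mapping.
registry = {"qwen3": NormalizedTextConfig}

# After: qwen3 maps to the GQA-aware variant.
registry = {"qwen3": NormalizedTextConfigWithGQA}
```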

Testing

✅ Tested with Qwen/Qwen3-0.6B - all assertions passed (a sketch of the checks follows the list):

  • Correct normalized config class
  • head_dim = 128
  • GQA structure validated
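
A sketch of what those checks can look like; the exact script is not shown in this thread, and it assumes NormalizedConfigManager resolves model types to normalized config classes:

```python
from transformers import AutoConfig
from optimum.utils.normalized_config import (
    NormalizedConfigManager,
    NormalizedTextConfigWithGQA,
)

config = AutoConfig.from_pretrained("Qwen/Qwen3-0.6B")

# The registry should now hand back the GQA-aware class for qwen3.
normalized_cls = NormalizedConfigManager.get_normalized_config_class("qwen3")
assert normalized_cls is NormalizedTextConfigWithGQA

# The normalized view should expose the GQA fields from the raw config.
normalized = normalized_cls(config)
assert normalized.num_key_value_heads == 8
assert config.head_dim == 128  # explicit head_dim, not hidden_size // num_heads
```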

Fixes #2379

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Note: This is a configuration fix, so no user-facing documentation changes are needed. The fix was manually verified to work correctly with Qwen3 models.

Who can review?

@echarlaix @JingyaHuang @michaelbenayoun @IlyasMoutawwakil

@IlyasMoutawwakil
Member

Hi! Thanks for the catch! What tests did you run exactly? Shouldn't this be done in optimum-onnx at the config level, in https://github.com/huggingface/optimum-onnx/blob/c3db0acb978a916cf418350272242bb817276758/optimum/exporters/onnx/model_configs.py#L491?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Aznix07
Author

Aznix07 commented Nov 12, 2025

Hi @IlyasMoutawwakil!

Thanks for pointing me to the right location! I've updated the fix as you suggested.

What I did

Updated optimum/exporters/onnx/model_configs.py (around line 491) to add NORMALIZED_CONFIG_CLASS = NormalizedTextConfigWithGQA to:

  • Qwen2OnnxConfig (line 486)
  • Qwen3OnnxConfig (line 492)
  • Qwen3MoeOnnxConfig (line 500)

All three inherit from LlamaOnnxConfig, which uses NormalizedTextConfig by default, but the Qwen models use GQA and need the GQA-aware config.
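
Roughly, the change has this shape (a sketch only; the real classes in model_configs.py carry other members that are elided here):

```python
from optimum.exporters.onnx.model_configs import LlamaOnnxConfig
from optimum.utils.normalized_config import NormalizedTextConfigWithGQA


class Qwen2OnnxConfig(LlamaOnnxConfig):
    # Override the default inherited from LlamaOnnxConfig so the exporter
    # sees num_key_value_heads when building KV cache dummy inputs.
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfigWithGQA


class Qwen3OnnxConfig(LlamaOnnxConfig):
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfigWithGQA


class Qwen3MoeOnnxConfig(LlamaOnnxConfig):
    NORMALIZED_CONFIG_CLASS = NormalizedTextConfigWithGQA
```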

Tests Run

  1. Config verification: Confirmed all three configs use NormalizedTextConfigWithGQA (sketched after this list)
  2. Architecture check: Verified Qwen2.5-7B (28 attn heads, 4 kv heads) and Qwen3-235B-A22B MoE (64 attn heads, 4 kv heads) both use GQA
  3. ONNX config test: Confirmed normalized config is correctly applied
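
A minimal sketch of check (1), assuming all three classes are importable from optimum-onnx's model_configs module:

```python
from optimum.exporters.onnx.model_configs import (
    Qwen2OnnxConfig,
    Qwen3OnnxConfig,
    Qwen3MoeOnnxConfig,
)
from optimum.utils.normalized_config import NormalizedTextConfigWithGQA

# Each Qwen ONNX config should now carry the GQA-aware normalized config.
for cls in (Qwen2OnnxConfig, Qwen3OnnxConfig, Qwen3MoeOnnxConfig):
    assert cls.NORMALIZED_CONFIG_CLASS is NormalizedTextConfigWithGQA, cls.__name__
```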

All tests passed ✅. Full test output is in the PR description.

Let me know if you need anything else.
Thank you.

@Aznix07
Author

Aznix07 commented Nov 12, 2025

This PR is being closed. The fix has been implemented in the correct repository:

New PR: huggingface/optimum-onnx#97

As suggested by @IlyasMoutawwakil, the fix needed to be done in optimum-onnx at the config level in optimum/exporters/onnx/model_configs.py, not in the base optimum repo.

Please see the new PR for the complete implementation and testing.

Thank you!

@Aznix07 Aznix07 closed this Nov 12, 2025


Development

Successfully merging this pull request may close these issues.

qwen3-0.6 to onnx : index: 3 Got: 64 Expected: 128
