
[Enhance]: Add validation for expert parallelism settings #1199


Open
jianzs wants to merge 5 commits into main from ehance/ep_check

Conversation

jianzs
Collaborator

@jianzs jianzs commented Jun 12, 2025

Add validation to prevent simultaneous use of `--enable-expert-parallel` and `expert-tensor-parallel-size` configurations. These settings are mutually exclusive. Implementing this check prevents unexpected behavior and improves error tracing.

If both settings are enabled concurrently, the system now throws an error, making it easier to identify and resolve configuration issues.

Signed-off-by: Jade Zheng <[email protected]>
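
For illustration only, here is a minimal standalone sketch of the kind of mutual-exclusion check described above; the function name and error message are assumptions, not the exact code added in this PR:

```python
def validate_expert_parallel_settings(enable_expert_parallel: bool,
                                      expert_tensor_parallel_size: int) -> None:
    """Reject the mutually exclusive combination up front (illustrative helper)."""
    if enable_expert_parallel and expert_tensor_parallel_size > 1:
        raise ValueError(
            "--enable-expert-parallel cannot be used together with "
            "expert_tensor_parallel_size > 1; please set only one of them.")


# Example: this combination now fails fast with a clear error.
# validate_expert_parallel_settings(enable_expert_parallel=True,
#                                   expert_tensor_parallel_size=4)
```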
@jianzs jianzs force-pushed the ehance/ep_check branch from e13e50c to fefdc38 on June 12, 2025 at 17:53
jianzs added 3 commits June 13, 2025 01:53
Signed-off-by: Jade Zheng <[email protected]>
Signed-off-by: Jade Zheng <[email protected]>
Signed-off-by: Jade Zheng <[email protected]>
@jianzs jianzs requested a review from wangxiyuan June 13, 2025 04:12
@jianzs jianzs added the ready (read for review) label Jun 13, 2025
@jianzs jianzs requested a review from Copilot June 15, 2025 07:34

@Copilot Copilot AI left a comment


Pull Request Overview

Adds a validation step in check_ascend_config to prevent simultaneous use of --enable-expert-parallel and an expert_tensor_parallel_size greater than 1, avoiding conflicting configurations.

  • Import TYPE_CHECKING and annotate vllm_config with VllmConfig for better type safety.
  • Introduce a runtime check that raises an error when expert parallelism flags conflict.
  • Preserve existing experimental ACL graph warnings.
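
The first bullet above refers to a standard typing idiom; a minimal sketch of that pattern follows (assuming `VllmConfig` is imported from `vllm.config` only for type checkers, which is the common arrangement, not necessarily the exact code in this PR):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only while type checking, so there is no runtime import cost
    # or circular-import risk.
    from vllm.config import VllmConfig


def check_ascend_config(vllm_config: "VllmConfig", enforce_eager):
    ...
```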
Comments suppressed due to low confidence (1)

vllm_ascend/ascend_config.py:171

  • Consider adding a unit test to verify that enabling both enable_expert_parallel and expert_tensor_parallel_size > 1 reliably raises the intended error.
# for expert parallelism

Signed-off-by: Jade Zheng <[email protected]>

Co-authored-by: Copilot <[email protected]>
@@ -164,3 +167,10 @@ def check_ascend_config(vllm_config, enforce_eager):
"ACL Graph is currently experimental. Please "
"raise an issue on https://github.com/vllm-project/vllm-ascend/issues"
" if you encourage any Error")

# for expert parallelism
Collaborator

move this validation into check_and_update_config in platform.py?

Collaborator Author

This function is called within check_and_update_config, so the validation already runs as part of that step.

@jianzs
Collaborator Author

jianzs commented Jun 17, 2025

@Yikun @wangxiyuan ready to merge.

@wangxiyuan
Collaborator

can you update the tests/ut/test_ascend_config.py as well?
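
To illustrate the kind of test being asked for here, a pytest-style sketch written against the standalone helper shown earlier in the thread; a real test in tests/ut/test_ascend_config.py would build a proper config and call check_ascend_config instead (names below are illustrative assumptions):

```python
import pytest


def validate_expert_parallel_settings(enable_expert_parallel: bool,
                                      expert_tensor_parallel_size: int) -> None:
    # Same illustrative helper as in the earlier sketch.
    if enable_expert_parallel and expert_tensor_parallel_size > 1:
        raise ValueError(
            "--enable-expert-parallel cannot be used together with "
            "expert_tensor_parallel_size > 1; please set only one of them.")


def test_conflicting_expert_parallel_settings_raise():
    # Enabling both settings at once should fail fast.
    with pytest.raises(ValueError):
        validate_expert_parallel_settings(enable_expert_parallel=True,
                                          expert_tensor_parallel_size=4)


def test_single_expert_parallel_setting_is_accepted():
    # Either setting on its own passes validation.
    validate_expert_parallel_settings(enable_expert_parallel=True,
                                      expert_tensor_parallel_size=1)
    validate_expert_parallel_settings(enable_expert_parallel=False,
                                      expert_tensor_parallel_size=4)
```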


github-actions bot commented Jul 3, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@github-actions github-actions bot added the merge-conflicts label and removed the ready (read for review) label Jul 3, 2025