[None][feat] Add DeepSeek-V4-Pro curated configs #15919

Draft: lfr-0531 wants to merge 1 commit into NVIDIA:main from lfr-0531:user/fanrongl/add-deepseek-v4-curated-configs

lfr-0531 (Collaborator) commented Jul 3, 2026

Description

This PR adds curated inference configs for DeepSeek-V4-Pro (DeepseekV4ForCausalLM) under examples/configs/curated/, following the same convention as the existing DeepSeek-R1 curated configs (see #11653).

Two scenarios are provided, both tuned on B200 with TP=8 / EP=8 and MTP speculative decoding:

  • deepseek-v4-pro-latency.yaml: Min Latency (max_batch_size=128, MTP max_draft_len=3, attention DP disabled, wide CUDA graph batch-size sweep).
  • deepseek-v4-pro-throughput.yaml: Max Throughput (max_batch_size=32, MTP max_draft_len=1, attention DP + balance enabled, LM-head TP in ADP).
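
To make the contrast between the two scenarios easier to scan, here is a minimal Python sketch that mirrors the values listed above and prints only the settings that differ. Only `max_batch_size` and `max_draft_len` are names taken from this description; `tensor_parallel`, `expert_parallel`, and `attention_dp` are placeholder labels chosen for illustration and are not confirmed LlmArgs field names. The actual YAML files may lay these options out differently.

```python
# Illustrative only: values come from the PR description above. Key names other
# than max_batch_size and max_draft_len are placeholders, not confirmed fields.
LATENCY = {
    "tensor_parallel": 8,
    "expert_parallel": 8,
    "max_batch_size": 128,
    "max_draft_len": 3,
    "attention_dp": False,
}

THROUGHPUT = {
    "tensor_parallel": 8,
    "expert_parallel": 8,
    "max_batch_size": 32,
    "max_draft_len": 1,
    "attention_dp": True,
}


def diff_scenarios(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for key, (lat, thr) in diff_scenarios(LATENCY, THROUGHPUT).items():
        print(f"{key}: latency={lat} throughput={thr}")
```

The shared TP=8 / EP=8 settings drop out of the diff, leaving the batch size, draft length, and attention DP toggle as the knobs that separate the two scenarios in this sketch.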

Both are registered in examples/configs/curated/lookup.yaml with arch: DeepseekV4ForCausalLM, model deepseek-ai/DeepSeek-V4-Pro, and gpu_compatibility: "B200".
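
As a rough illustration of how such a registration could be consumed, the sketch below mirrors the two new entries as Python dicts and selects configs by model and GPU. The `arch`, `model`, and `gpu_compatibility` keys and their values come from this description; the `config_path` key and the `find_configs` helper are hypothetical and may not match the real lookup.yaml schema or any TensorRT-LLM API. Splitting `gpu_compatibility` on commas is also an assumption, made so that a multi-GPU value such as "B200, GB200" would still match.

```python
# Hypothetical in-memory mirror of the two new lookup.yaml entries.
# "config_path" is an assumed key name used only for this illustration.
LOOKUP = [
    {
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "arch": "DeepseekV4ForCausalLM",
        "gpu_compatibility": "B200",
        "config_path": "deepseek-v4-pro-latency.yaml",
    },
    {
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "arch": "DeepseekV4ForCausalLM",
        "gpu_compatibility": "B200",
        "config_path": "deepseek-v4-pro-throughput.yaml",
    },
]


def find_configs(entries, model, gpu):
    """Return config paths whose model matches and whose GPU list contains gpu."""
    matches = []
    for entry in entries:
        # Assumption: gpu_compatibility is a comma-separated string of GPU names.
        gpus = [g.strip() for g in entry["gpu_compatibility"].split(",")]
        if entry["model"] == model and gpu in gpus:
            matches.append(entry["config_path"])
    return matches


if __name__ == "__main__":
    print(find_configs(LOOKUP, "deepseek-ai/DeepSeek-V4-Pro", "B200"))
```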

Test Coverage

Covered by the existing curated-config validation suite in tests/unittest/llmapi/test_config_database.py, which loads every entry in curated/lookup.yaml, validates it against LlmArgs, runs a trtllm-serve sanity check, and asserts no unnecessary default values are present. The two new entries are picked up automatically via the arch field.
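
The "no unnecessary default values" assertion can be pictured with a small, self-contained sketch. This is not the code in test_config_database.py; it is a hypothetical helper that walks a config dict and reports every key whose value equals the corresponding default. The default values in the example are made up for illustration and are not the real LlmArgs defaults.

```python
def redundant_defaults(config, defaults):
    """Return sorted dotted paths in config whose values equal the defaults."""
    found = []

    def walk(cfg, dfl, prefix):
        for key, value in cfg.items():
            path = f"{prefix}.{key}" if prefix else key
            if key not in dfl:
                continue  # no known default, so nothing to compare against
            if isinstance(value, dict) and isinstance(dfl[key], dict):
                walk(value, dfl[key], path)  # recurse into nested sections
            elif value == dfl[key]:
                found.append(path)

    walk(config, defaults, "")
    return sorted(found)


if __name__ == "__main__":
    # Made-up defaults, for illustration only.
    defaults = {
        "max_batch_size": 64,
        "kv_cache_config": {"dtype": "auto", "free_gpu_memory_fraction": 0.9},
    }
    config = {
        "max_batch_size": 128,
        "kv_cache_config": {"dtype": "auto", "free_gpu_memory_fraction": 0.8},
    }
    print(redundant_defaults(config, defaults))
```

Under these assumed defaults, a curated config would pass such a check only if the returned list is empty, meaning every key it sets actually overrides something.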

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (config validation runs automatically over curated/lookup.yaml).

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. (No API changes — configs only.)

  • Any new dependencies have been scanned for license and vulnerabilities. (None.)

  • CODEOWNERS updated if ownership changes. (N/A.)

  • Documentation updated as needed.

  • Update tava architecture diagram if there is a significant design change in PR. (N/A.)

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

lfr-0531 requested a review from a team as a code owner July 3, 2026 10:06
lfr-0531 requested review from QiJune and kaiyux July 3, 2026 10:06

coderabbitai (Bot, Contributor) commented Jul 3, 2026

📝 Walkthrough

Adds two new curated YAML configuration files for DeepSeek V4 Flash (latency-optimized and throughput-optimized variants) defining CUDA graph, KV cache, MoE, parallelism, and speculative decoding settings, and registers both in lookup.yaml with corresponding model mapping entries.

Changes

DeepSeek V4 Flash curated configs and lookup registration

| Layer / File(s) | Summary |
|---|---|
| Latency and throughput config files: examples/configs/curated/deepseek-v4-flash-latency.yaml, examples/configs/curated/deepseek-v4-flash-throughput.yaml | New YAML configs define CUDA graph batch sizes, KV cache dtype/memory settings, MoE/TRTLLM backend parameters, parallelism (TP/PP/EP), and speculative decoding options tuned for latency vs. throughput scenarios. |
| Lookup registration for new configs: examples/configs/curated/lookup.yaml | Adds two deepseek-ai/DeepSeek-V4-Flash entries mapping to the new latency and throughput configs, both using DeepseekV4ForCausalLM and gpu_compatibility: "B200, GB200". |

Estimated code review effort: 2 (Simple) | ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title names DeepSeek-V4-Pro, but the changes add DeepSeek-V4-Flash curated configs, so it is misleading. | Rename the title to match the actual DeepSeek-V4-Flash curated configs and keep the [None][feat] format. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The description includes the required Description, Test Coverage, and PR Checklist sections and is mostly complete. |

Comment @coderabbitai help to get the list of available commands.

lfr-0531 marked this pull request as draft July 3, 2026 11:23
lfr-0531 force-pushed the user/fanrongl/add-deepseek-v4-curated-configs branch from f9b1c55 to 3da2bca July 3, 2026 11:24
lfr-0531 changed the title from "[None][feat] Add DeepSeek-V4-Flash curated configs" to "[None][feat] Add DeepSeek-V4-Pro curated configs" Jul 3, 2026
Add max-throughput and min-latency curated configs for DeepSeek-V4-Pro
on B200 (8xTP, 8xEP, MTP), and register both in curated/lookup.yaml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
lfr-0531 force-pushed the user/fanrongl/add-deepseek-v4-curated-configs branch from 3da2bca to 5945e34 July 3, 2026 11:26