[None][feat] Add DeepSeek-V4-Pro curated configs by lfr-0531 · Pull Request #15919 · NVIDIA/TensorRT-LLM

lfr-0531 · 2026-07-03T10:06:36Z

Description

This PR adds curated inference configs for DeepSeek-V4-Pro (DeepseekV4ForCausalLM) under examples/configs/curated/, following the same convention as the existing DeepSeek-R1 curated configs (see #11653).

Two scenarios are provided, both tuned on B200 with TP=8 / EP=8 and MTP speculative decoding:

- `deepseek-v4-pro-latency.yaml`: Min Latency (max_batch_size=128, MTP max_draft_len=3, attention DP disabled, wide CUDA graph batch-size sweep).
- `deepseek-v4-pro-throughput.yaml`: Max Throughput (max_batch_size=32, MTP max_draft_len=1, attention DP + balance enabled, LM-head TP in ADP).

Both are registered in examples/configs/curated/lookup.yaml with arch: DeepseekV4ForCausalLM, model deepseek-ai/DeepSeek-V4-Pro, and gpu_compatibility: "B200".
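As a rough illustration of how such a registration can be consumed, the sketch below mirrors the two new entries as Python dicts and filters them by model and GPU. The `arch`, model name, `gpu_compatibility` value, and file names come from this description; the `model` and `config_path` key names and the `find_configs` helper are hypothetical and may not match the real lookup.yaml schema.

```python
# Hypothetical in-memory mirror of the two new lookup.yaml entries.
# The key names "model" and "config_path" are assumptions for illustration.
LOOKUP_ENTRIES = [
    {
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "arch": "DeepseekV4ForCausalLM",
        "gpu_compatibility": "B200",
        "config_path": "deepseek-v4-pro-latency.yaml",
    },
    {
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "arch": "DeepseekV4ForCausalLM",
        "gpu_compatibility": "B200",
        "config_path": "deepseek-v4-pro-throughput.yaml",
    },
]


def find_configs(entries, model, gpu):
    """Return config paths whose model matches and whose GPU list includes gpu."""
    matches = []
    for entry in entries:
        # gpu_compatibility may hold a comma-separated list such as "B200, GB200".
        gpus = [g.strip() for g in entry["gpu_compatibility"].split(",")]
        if entry["model"] == model and gpu in gpus:
            matches.append(entry["config_path"])
    return matches
```

Calling `find_configs(LOOKUP_ENTRIES, "deepseek-ai/DeepSeek-V4-Pro", "B200")` returns both file names, while a GPU that is not listed returns an empty list.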

Test Coverage

Covered by the existing curated-config validation suite in tests/unittest/llmapi/test_config_database.py, which loads every entry in curated/lookup.yaml, validates it against LlmArgs, runs a trtllm-serve sanity check, and asserts no unnecessary default values are present. The two new entries are picked up automatically via the arch field.
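The "no unnecessary default values" assertion can be pictured as a recursive comparison between a curated config and the defaults. The sketch below is not the code in test_config_database.py; `redundant_defaults` is a hypothetical helper, and the default values in the example are placeholders rather than the real LlmArgs defaults.

```python
def redundant_defaults(config, defaults, prefix=""):
    """Return dotted paths of keys in config whose value equals the default."""
    redundant = []
    for key, value in config.items():
        if key not in defaults:
            # No known default for this key, so it cannot be redundant.
            continue
        path = f"{prefix}{key}"
        default = defaults[key]
        if isinstance(value, dict) and isinstance(default, dict):
            # Descend into nested sections such as kv_cache_config.
            redundant.extend(redundant_defaults(value, default, prefix=f"{path}."))
        elif value == default:
            redundant.append(path)
    return redundant


# Placeholder defaults and config, for illustration only.
EXAMPLE_DEFAULTS = {
    "max_batch_size": 2048,
    "enable_attention_dp": False,
    "kv_cache_config": {"enable_block_reuse": True},
}
EXAMPLE_CONFIG = {
    "max_batch_size": 32,
    "enable_attention_dp": False,
    "kv_cache_config": {"enable_block_reuse": True, "dtype": "fp8"},
}
```

With these placeholder inputs, `redundant_defaults(EXAMPLE_CONFIG, EXAMPLE_DEFAULTS)` flags `enable_attention_dp` and `kv_cache_config.enable_block_reuse`, which a curated config would be expected to omit.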

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why.
PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (config validation runs automatically over curated/lookup.yaml).
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. (No API changes — configs only.)
Any new dependencies have been scanned for license and vulnerabilities. (None.)
CODEOWNERS updated if ownership changes. (N/A.)
Documentation updated as needed.
Update tava architecture diagram if there is a significant design change in PR. (N/A.)
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-07-03T10:11:18Z

📝 Walkthrough

Adds two new curated YAML configuration files for DeepSeek V4 Flash (latency-optimized and throughput-optimized variants) defining CUDA graph, KV cache, MoE, parallelism, and speculative decoding settings, and registers both in lookup.yaml with corresponding model mapping entries.

Changes

DeepSeek V4 Flash curated configs and lookup registration

| Layer / File(s) | Summary |
| --- | --- |
| Latency and throughput config files `examples/configs/curated/deepseek-v4-flash-latency.yaml`, `examples/configs/curated/deepseek-v4-flash-throughput.yaml` | New YAML configs define CUDA graph batch sizes, KV cache dtype/memory settings, MoE/TRTLLM backend parameters, parallelism (TP/PP/EP), and speculative decoding options tuned for latency vs. throughput scenarios. |
| Lookup registration for new configs `examples/configs/curated/lookup.yaml` | Adds two `deepseek-ai/DeepSeek-V4-Flash` entries mapping to the new latency and throughput configs, both using `DeepseekV4ForCausalLM` and `gpu_compatibility: "B200, GB200"`. |

Estimated code review effort: 2 (Simple) | ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title names DeepSeek-V4-Pro, but the changes add DeepSeek-V4-Flash curated configs, so it is misleading. | Rename the title to match the actual DeepSeek-V4-Flash curated configs and keep the [None][feat] format. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The description includes the required Description, Test Coverage, and PR Checklist sections and is mostly complete. |

Comment `@coderabbitai help` to get the list of available commands.

Add max-throughput and min-latency curated configs for DeepSeek-V4-Pro on B200 (8xTP, 8xEP, MTP), and register both in curated/lookup.yaml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

lfr-0531 requested a review from a team as a code owner July 3, 2026 10:06

lfr-0531 requested review from QiJune and kaiyux July 3, 2026 10:06

github-actions Bot assigned lfr-0531 Jul 3, 2026

lfr-0531 marked this pull request as draft July 3, 2026 11:23

lfr-0531 force-pushed the user/fanrongl/add-deepseek-v4-curated-configs branch from f9b1c55 to 3da2bca Compare July 3, 2026 11:24

lfr-0531 changed the title from ~~[None][feat] Add DeepSeek-V4-Flash curated configs~~ to [None][feat] Add DeepSeek-V4-Pro curated configs Jul 3, 2026

lfr-0531 force-pushed the user/fanrongl/add-deepseek-v4-curated-configs branch from 3da2bca to 5945e34 Compare July 3, 2026 11:26

lfr-0531 wants to merge 1 commit into NVIDIA:main from lfr-0531:user/fanrongl/add-deepseek-v4-curated-configs
