v0.9.0
What's Changed
New Features
- Constrained decoding integration by @ajindal1 in #1381
- Update constrained decoding by @ajindal1 in #1477
- Enable TRT multi profile option through provider option by @anujj in #1493
- Add support for Machine Translation model by @apsonawane in #1482
- Overlap prompt processing KV cache update for WindowedKeyValueCache in DecoderOnlyPipelineState by @edgchen1 in #1526
- Add basic support for tracing by @edgchen1 in #1524
- Logging SetLogCallback + Debugging cleanup by @RyanUnderhill in #1471
- Support loading models from memory by @baijumeswani in #1571
- Add SLM Engine support for function calling by @kinfey in #1582
- Pass the batch_size through the Overlay by @anujj in #1627
- Enable GPU-based sampling for TRT-RTX by @gaugarg-nv in #1650 (see the sampling sketch after this list)
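With #1650, the sampling step itself can run on the GPU for the TRT-RTX EP. A minimal sketch of how sampling is driven through the Python API's search options, following the repository's example scripts; the model path is a placeholder:

```python
import onnxruntime_genai as og

# Placeholder path: a folder produced by the model builder,
# containing the ONNX model and genai_config.json.
model = og.Model("path/to/model")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
# Sampling is controlled through search options; with #1650 the
# sampling step can execute on the GPU for the TRT-RTX EP.
params.set_search_options(do_sample=True, top_k=50, top_p=0.95, max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))
```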
Model Builder Changes
- Whisper Redesigned Solution by @kunal-vaishnavi in #1229
- [Builder] Add support for Olive quantized models by @jambayk in #1647
- Add Qwen3 to model builder by @xenova in #1428
- Model builder: Add ability to exclude a node from quantization by @sushraja-msft in #1436
- Support k_quant in model builder by @jiafatom in #1444
- Add final norm for LoRA models by @kunal-vaishnavi in #1446
- Add bfloat16 support in model builder by @kunal-vaishnavi in #1447
- Fix accuracy issues with Gemma models by @kunal-vaishnavi in #1448
- Always cast bf16 logits to fp32 by @nenad1002 in #1479
- Add NvTensorRtRtx EP option to GenAI model builder by @BLSharda in #1453
- Add Gemma3 Model support for NvTensorRtRtx execution provider by @anujj in #1520
- Use IRv10 in the model builder by @justinchuby in #1547
- [Builder] Rename methods make_value and make_initializer by @justinchuby in #1554
- Always use opset21 in builder by @justinchuby in #1548
- Clamp KV Cache Size to Sliding Window for NvTensorRtRtx EP by @BLSharda in #1523
- [Builder] Fix output name in make_rotary_embedding_multi_cache by @justinchuby in #1562
- [Builder] Use lazy tensor by @justinchuby in #1556
- [Builder] Fix KeyError for torch.uint8 in dtype mapping for MoE quantization by @Copilot in #1561
- [Builder] Fix 1d constant creation by @justinchuby in #1568
- [Builder] Create progress bar by @justinchuby in #1559
- [Builder] Use packed 4bit tensors directly by @justinchuby in #1566
- [Builder] Simplify constant creation by @justinchuby in #1569
- [Builder] Add cuda-bfloat16 entry to valid_gqa_configurations by @justinchuby in #1585
- [Builder] Use dtype conversion helpers from onnx_ir by @justinchuby in #1587
- [Model builder] Add support for Ernie 4.5 models by @xenova in #1608
- Whisper: Allow session options to be used for encoder by @RyanMetcalfeInt8 in #1622
- Make default top_k=50 in model builder by @jiafatom in #1642
- Update builder.py by @lnigam in #1665
- Change IO dtype for INT4 CUDA models by @kunal-vaishnavi in #1629 (see the builder invocation sketch after this list)
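The changes above land in the `onnxruntime_genai.models.builder` module. A minimal sketch of invoking it from Python to export an int4 CUDA model; the model id and output folder are placeholders, and the flags follow the builder's CLI:

```python
import subprocess
import sys

# Export an int4 CUDA model with the GenAI model builder.
# Flags: -m model id, -o output dir, -p precision, -e execution provider.
subprocess.run(
    [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", "Qwen/Qwen3-0.6B",    # Qwen3 support landed in #1428
        "-o", "./qwen3-int4-cuda",  # receives the ONNX model + genai_config.json
        "-p", "int4",               # bf16 is also accepted since #1447
        "-e", "cuda",               # NvTensorRtRtx became selectable in #1453
    ],
    check=True,
)
```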
Bug Fixes
- CUDA Top K / Top P Fixes by @aciddelgado in #1371
- Persist provider options across ClearProviders, AppendProvider where possible by @baijumeswani in #1454
- Add enable_skip_layer_norm_strict_mode flag by @nenad1002 in #1462
- Avoid adding providers if not requested by @baijumeswani in #1464
- Fix array eos_token_id handling by @RyanUnderhill in #1463
- Remove BF16 CPU from valid GQA configuration by @nenad1002 in #1469
- Address QNN specific regressions by @baijumeswani in #1470
- Fix how torch tensors are saved by @kunal-vaishnavi in #1476
- Fix model chat example for rewind by @ajindal1 in #1480
- Correctly iterate over the providers to check if graph capture is enabled by @baijumeswani in #1497
- Fix missing parameter name by @xadupre in #1502
- Fix from pretrained method for quantized models by @kunal-vaishnavi in #1503
- Remove position_id and fix context phase KV shapes for in-place cache buffer support by @anujj in #1505
- Fix last layer generation for text-only models by @nenad1002 in #1513
- [Fix] Remove references to TensorProto by @justinchuby in #1549
- Fix make_layernorm_casts usage of value infos by @justinchuby in #1551
- Fix DML Memory Leak by @aciddelgado in #1578
- [DML] Bind the dml global objects to the Model by @baijumeswani in #1590
- NvTensorRTRTx: Enable CUDA graph via config and fix attention_mask shape handling by @anujj in #1594
- Append eos token to the end of input sequence for marian models by @apsonawane in #1630
- Use two-step softmax for CUDA sampling by @jiafatom in #1617
- Use two-step softmax for CPU sampling by @jiafatom in #1631 (see the softmax sketch after this list)
- Use last windowed input ids to update logits by @baijumeswani in #1636
- Fix attention-mask stride bug for static masking (batch > 1) by @anujj in #1639
- Add open bytes functionality for C# by @ajindal1 in #1634
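#1617 and #1631 move CUDA and CPU sampling to a two-step softmax. Assuming "two-step" refers to the standard numerically stable formulation (one pass for the maximum, one for the shifted exponentials), a minimal NumPy sketch of the idea:

```python
import numpy as np

def two_step_softmax(logits: np.ndarray) -> np.ndarray:
    # Step 1: find the row maximum so the exponent never overflows.
    m = logits.max(axis=-1, keepdims=True)
    # Step 2: exponentiate the shifted logits and normalize.
    e = np.exp(logits - m)
    return e / e.sum(axis=-1, keepdims=True)

# np.exp(1000.0) alone overflows to inf; the shifted form stays finite.
print(two_step_softmax(np.array([1000.0, 1001.0, 1002.0])))
```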
Packaging/Testing/Pipelines
- Sign macos binaries by @baijumeswani in #1439
- Add chat template tests by @sayanshaw24 in #1457
- Update triggers by @snnn in #1490
- Add support for building a cuda + dml package by @baijumeswani in #1600
- NvTensorRtRtx: Pass the dynamic shapes (ISL and batch_size) to the EP at runtime as an NV profile by @anujj in #1614
- Update docker image by @snnn in #1633
- Sign all GenAI DLLs in both onnxruntime-genai and Python targets by @vortex-captain in #1635
- Fix all packaging pipelines by @baijumeswani in #1641
- Update the benchmark scripts to account for the time spent in sampling by @gaugarg-nv in #1646
- Add date for nightly packages by @ajindal1 in #1668
Compliance
- Enable policheck in packaging pipeline by @apsonawane in #1449
- Add third party notices in file exclusion by @apsonawane in #1459
- Enable tsa options in packaging pipelines by @apsonawane in #1460
- Update windows packaging pipelines to use build.py by @aciddelgado in #1468
Documentation and Examples
- Update OnnxRuntimeGenAIChatClient with chat template and guidance by @stephentoub in #1533
- Update SimpleGenAI.java docs by @edgchen1 in #1532
- Make OnnxRuntime GenAI Examples Simpler by @baijumeswani in #1615
- Update extensions commit and update example script for translation model by @apsonawane in #1623
- Add instructions for macOS by @baijumeswani in #1625
- Add nightly build badge to README by @natke in #1653
- Add NvTensorRtRtx EP in example file by @anujj in #1656
- Update main README for 0.9.0 release by @kunal-vaishnavi in #1660
- Update C++, C# and Python Examples by @sayanshaw24 in #1664
Tokenizer/Templating Changes
- Set `add_special_tokens` to false by default in Encode by @sayanshaw24 in #1442
- Remove prompt templates from GenAI config by @kunal-vaishnavi in #1445
- Update Extensions Commit to Support Chat Template Override for Unsupported Models by @sayanshaw24 in #1452
- Integrate `tools` input into Chat Template API by @sayanshaw24 in #1472 (see the chat-template sketch after this list)
- Update Chat Template Examples for Tools API change by @sayanshaw24 in #1506
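For #1472, a sketch of passing tools through the chat template before encoding. The keyword names (`messages`, `tools`, `add_generation_prompt`) are assumptions inferred from the PR titles, not a confirmed signature, and the model path and tool schema are placeholders; consult the Python API reference for the exact form:

```python
import json
import onnxruntime_genai as og

model = og.Model("path/to/model")  # placeholder path
tokenizer = og.Tokenizer(model)

messages = json.dumps([
    {"role": "user", "content": "What's the weather in Seattle?"}
])
tools = json.dumps([{
    "name": "get_weather",  # hypothetical tool for illustration
    "description": "Look up the current weather for a city",
    "parameters": {"city": {"type": "string"}},
}])

# Keyword names below are assumptions inferred from the PR titles,
# not a confirmed signature.
prompt = tokenizer.apply_chat_template(
    messages=messages, tools=tools, add_generation_prompt=True
)
# Since #1442, Encode no longer adds special tokens by default,
# so the templated prompt is tokenized exactly as rendered.
tokens = tokenizer.encode(prompt)
```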
Dependency Updates
- Update to M.E.AI 9.4.3-preview.1.25230.7 by @stephentoub in #1443
- Update to stable release of Microsoft.Extensions.AI.Abstractions by @stephentoub in #1489
- Use ONNX IR for model builder by @justinchuby in #1416
- Automatically install java maven artifact in the local maven repository by @asoldano in #1570
- Bump onnx-ir to 0.1.2 by @jiafatom in #1579
- Update OnnxRuntimeGenAIChatClient to M.E.AI.Abstractions 9.7.0 by @stephentoub in #1612
- Update ORT Extensions Commit by @sayanshaw24 in #1667
New Contributors
- @xenova made their first contribution in #1428
- @sushraja-msft made their first contribution in #1436
- @anujj made their first contribution in #1493
- @xadupre made their first contribution in #1502
- @satreysa made their first contribution in #1483
- @Copilot made their first contribution in #1516
- @asoldano made their first contribution in #1510
- @justinchuby made their first contribution in #1416
- @microsoft-github-policy-service[bot] made their first contribution in #1552
- @mattleibow made their first contribution in #1572
- @kinfey made their first contribution in #1582
- @gaugarg-nv made their first contribution in #1646
- @lnigam made their first contribution in #1665
Full Changelog: v0.8.3...v0.9.0