Releases: microsoft/onnxruntime-genai

v0.11.2

18 Nov 12:53
25962b0

What's Changed

Full Changelog: v0.11.1...v0.11.2

v0.11.1

17 Nov 03:39
ec0f733

What's Changed

Full Changelog: v0.11.0...v0.11.1

v0.11.0

14 Nov 02:51
e0e02a9

What's Changed

New Contributors

Full Changelog: v0.10.0...v0.11.0

v0.10.0

10 Oct 17:26
6deb570

What's Changed

New Contributors

Full Changelog: v0.9.2...v0.10.0

v0.9.2

16 Sep 07:27

This release fixes a pre-processing bug with Phi-4 multimodal.

Full Changelog: v0.9.1...v0.9.2

v0.9.1

09 Sep 22:53
41211b8

🚀 Features

Support for Continuous Batching (#1580) by @baijumeswani
RegisterExecutionProviderLibrary (#1628) by @vortex-captain
Enable CUDA graph for LLMs for NvTensorRtRtx EP (#1645) by @anujj
Add support for smollm3 (#1666) by @xenova
Add OpenAI's gpt-oss to ONNX Runtime GenAI (#1678) by @kunal-vaishnavi
Add custom ops library path resolution using EP metadata (#1707) by @psakhamoori
Use OnnxRuntime API wrapper for EP device operations (#1719) by @psakhamoori

🛠 Improvements

Update Extensions Commit to Support Strft Custom Function for Chat Template (#1670) by @sayanshaw24
Add parameters to chat template in chat example (#1673) by @kunal-vaishnavi
Update how Hugging Face's config files are processed (#1693) by @kunal-vaishnavi
Tie embedding weight sharing (#1690) by @jiafatom
Improve top-k sampling CUDA kernel (#1708) by @gaugarg-nv

🐛 Bug Fixes

Fix accessing final norm for Gemma-3 models (#1687) by @kunal-vaishnavi
Fix runtime bugs with multi-modal models (#1701) by @kunal-vaishnavi
Fix BF16 CUDA version of OpenAI's gpt-oss (#1706) by @kunal-vaishnavi
Fix benchmark_e2e (#1702) by @jiafatom
Fix benchmark_multimodal (#1714) by @jiafatom
Fix pad vs. eos token misidentification (#1694) by @aciddelgado

⚡ Performance & EP Enhancements

NvTensorRtRtx: Support num_beam > 1 (#1688) by @anujj
NvTensorRtRtx: Skip if node of Phi4 models (#1696) by @anujj
Remove QDQ and Opset Coupling for TRT RTX EP (#1692) by @xiaoyu-work

🔒 Build & CI

Enable Security Protocols in MSVC for BinSkim (#1672) by @sayanshaw24
Explicitly specify setup-java architecture in win-cpu-arm64-build.yml (#1685) by @edgchen1
Use dotnet instead of nuget in mac build (#1717) by @natke

📦 Versioning & Release

Update version to 0.10.0 (#1676) by @ajindal1
Cherrypick 0: Forgot to change versions (#1721) by @aciddelgado
Cherrypick 1... Becomes RC1 (#1726) by @aciddelgado
Cherrypick 2 (#1743) by @aciddelgado

🙌 New Contributors

@xiaoyu-work (#1692)
@psakhamoori (#1707)

✅ Full Changelog: v0.9.0...v0.9.1

v0.9.0

06 Aug 17:30

What's Changed

New Features

Model Builder Changes

Bug fixes

Packaging/Testing/Pipelines

Compliance

Documentation and Examples

  • Update OnnxRuntimeGenAIChatClient with chat template and guidance by @stephentoub in #1533
  • Update SimpleGenAI.java docs by @edgc...

v0.8.3

03 Jul 20:37
dc2d850

This release addresses regressions with DML.

Fixes include:

v0.8.2

05 Jun 23:03
fea4e96

What's changed

New features

Bug fixes

  • Remove position_id and fix context phase KV shapes for in-place cache buffer support by @anujj (#1505)
  • Update Extensions Commit for 0.8.2 by @sayanshaw24 (#1519)
  • Update Extensions Commit for another DeepSeek Fix by @sayanshaw24 (#1521)

Packaging and testing

Full Changelog: v0.8.1...v0.8.2

v0.8.1

30 May 22:14
caba648

What's changed

New features

  • NvTensorRtRtx EP option in GenAI - model builder by @BLSharda (#1453)
  • Enable TRT multi profile option through provider option by @anujj (#1493)

Bug fixes

Examples and documentation

Model builder changes

Dependency updates

Full Changelog: v0.8.0...v0.8.1