Releases: microsoft/onnxruntime-genai
v0.8.0
What's Changed
New Features
- Add Chat Template API Changes by @sayanshaw24 in #1398
- Add Python and C# bindings for Chat Template API by @sayanshaw24 in #1411 (see the sketch after this list)
- Support for gemma3 model by @baijumeswani in #1374
- Support more QNN models with different model structures by @baijumeswani in #1322
- Add ability to load audio from bytes, to match images API by @RyanUnderhill in #1304
- Add support for DML Graph Capture to improve speed by @aciddelgado in #1305
- Added OnnxRuntimeGenAIChatClient ctor with Config. by @azchohfi in #1364
- Extensible AppendExecutionProvider and expose OrtSessionOptions::AddConfigEntry directly by @RyanUnderhill in #1384
- OpenVINO: Model Managed KVCache by @RyanMetcalfeInt8 in #1399
- Changes how the device OrtAllocators work, use a global OrtSession instead by @RyanUnderhill in #1378
- Remove audio attention mask processing and update ort-extensions by @baijumeswani in #1319
- Simplify the C API definitions and prevent any type mismatches going forward by @RyanUnderhill in #1365
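The Chat Template API lets callers render a structured conversation into the model's own prompt format instead of hand-building prompt strings. A minimal sketch of how the Python binding might be used; the model path and the exact keyword names are illustrative assumptions, not confirmed by these notes:

```python
import json

import onnxruntime_genai as og

model = og.Model("path/to/model")  # hypothetical model folder
tokenizer = og.Tokenizer(model)

# Conversation encoded as a JSON string of role/content pairs.
messages = json.dumps([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does ONNX Runtime GenAI do?"},
])

# Render the conversation with the model's chat template and append the
# generation prompt for the assistant turn; keyword names are assumptions.
prompt = tokenizer.apply_chat_template(messages=messages, add_generation_prompt=True)
input_tokens = tokenizer.encode(prompt)
```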
Model builder updates
- Quark Quantizer Support by @shobrienDMA in #1207
- Add Gemma 3 to model builder by @kunal-vaishnavi in #1359
- Initial support for VitisAI EP by @AnanyaA-9 in #1370
- [OVEP] feat: Adding OpenVINO EP in ORT-GenAI by @ankitm3k in #1389
- Initial support for NV EP by @BLSharda in #1404
- Adapt to MatMulNBitsQuantizer in ort by @jiafatom in #1426
- Fix LM head for Gemma-2 by @kunal-vaishnavi in #1420
Bug Fixes
- Fix mismatch in Java bindings by @CaptainIRS in #1307
- Fix type mismatch in Java bindings by @CaptainIRS in #1313
- Update ort-extensions to fix tokenizer bug for phi4 by @baijumeswani in #1331
- Windows: Show more useful DLL load errors to say exactly what DLL is missing by @RyanUnderhill in #1345
- Deprecate graph capture by @aciddelgado in #1338
- Support load/unload of models to avoid QNN errors on deepseek r1 1.5B by @baijumeswani in #1346
- Add missing 'value_stats' to logging API, and fix wrong default by @RyanUnderhill in #1353
- Convert tokens to list for concat by @ajindal1 in #1358
- Improve and Fix TopKTopP by @jiafatom in #1363
- Switch the order of softmax on CPU Top K by @aciddelgado in #1354
- Update pybind and fix rpath for macos and check for nullptr by @baijumeswani in #1367
- Iterate over the providers by @baijumeswani in #1486
- Correctly iterate over the providers to check if graph capture is enabled by @baijumeswani in #1487
Examples and Documentation
- Update README.md by @RyanUnderhill in #1372
- Add slm engine example by @avijit-chakroborty in #1242
- Added cancellation to the streaming method of OnnxRuntimeGenAIChatClient. by @azchohfi in #1289
- Update nuget README with latest API by @natke in #1326
- Update C examples downloads by @ajindal1 in #1332
- Add Q&A Test Example in Nightly by @ajindal1 in #1277
- docs: update the doc of slm_engine to ensure consistency with the code by @dennis2030 in #1386
- C++ and python samples: follow_config support by @RyanMetcalfeInt8 in #1413
- Fix Do Sample example by @ajindal1 in #1337
- Make phi3 example Q&A rather than chat by @ajindal1 in #1392
- Fix broken link in package description by @rogerbarreto in #1360
Packaging and Testing
- Remove DirectML.dll dependency by @baijumeswani in #1342
- Add support to creating a custom nuget in the packaging pipeline by @baijumeswani in #1315
- Remove onnxruntime-genai-static library (non-trivial change) by @RyanUnderhill in #1264
- Add macosx to custom nuget package by @baijumeswani in #1419
- Update the C++ clang-format lint workflow to use clang 20 by @snnn in #1418
- Add model_benchmark options to specify prompt to use. by @edgchen1 in #1328
- Add value_stats logging option to show statistical information about … by @RyanUnderhill in #1352
- Fixed the MacOS build and updated the test script. by @avijit-chakroborty in #1310
- Fix iOS packaging pipeline after static library removal by @RyanUnderhill in #1316
- Fix bug in Python benchmark script by @thevishalagarwal in #1206
- Fix macos package by @baijumeswani in #1347
- Missing *.dylib in package_data, so Mac would not package our shared libraries by @RyanUnderhill in #1341
Dependency Updates
- Update upload Artifact version by @ajindal1 in #1274
- Update to M.E.AI 9.3.0-preview.1.25161.3 by @stephentoub in #1317
- Update android min sdk version to 24 by @baijumeswani in #1324
- Update torch to 2.5.1 by @baijumeswani in #1343
- Update Pipelines for S360 by @ajindal1 in #1323
- Update Nuget pkg name by @ajindal1 in #1351
- Update version to 0.8.0 by @baijumeswani in #1376
- Update custom nuget packaging logic by @baijumeswani in #1377
- Update Microsoft.Extensions.AI.Abstractions to 9.4.0-preview.1.25207.5 by @stephentoub in #1388
- Bump torch from 2.5.1 to 2.6.0 in /test/python/macos/torch by @dependabot in #1408
- Bump torch from 2.5.1+cu124 to 2.6.0+cu124 in /test/python/cuda/torch by @dependabot in #1409
- Bump torch from 2.5.1+cpu to 2.7.0 in /test/python/cpu/torch by @dependabot in #1422
- Pin CMake version by @snnn in #1424
New Contributors
- @avijit-chakroborty made their first contribution in #1242
- @CaptainIRS made their first contribution in #1307
- @AnanyaA-9 made their first contribution in #1370
- @dennis2030 made their first contribution in #1386
- @ankitm3k made their first contribution in #1389
- @RyanMetcalfeInt8 made their first contribution in #1399
Full Changelog: v0.7.1...v0.8.0
v0.7.1
Release Notes
- Add AMD Quark Quantizer Support #1207
- Added Gemma 3 to model builder #1359
- Updated Phi-3 Python Q&A example to be consistent with C++ example #1392
- Updated Microsoft.Extensions.AI.Abstractions to 9.4.0-preview.1.25207.5 #1388
- Added OnnxRuntimeGenAIChatClient constructor with Config #1364
- Improve and Fix TopKTopP #1363
- Switch the order of softmax on CPU Top K #1354
- Updated custom nuget packaging logic #1377
- Updated pybind and fix rpath for macos and check for nullptr #1367
- Convert tokens to list for concat to accommodate breaking API change in tokenizer #1358
v0.7.0
Release Notes
We are excited to announce the release of onnxruntime-genai version 0.7.0. Below are the key updates included in this release:
- Support for a wider variety of QNN NPU models (such as Deepseek R1)
- Remove onnxruntime-genai-static library. All language bindings now interface with onnxruntime-genai through the onnxruntime-genai shared library.
- All return types from the onnxruntime-genai Python package are now numpy arrays. Previously, tokenizer.encode returned a Python list; this broke examples/python/model-qa.py, which used '+' to concatenate two lists. Use np.concatenate instead in such cases (see the sketch below).
- Abstract execution-provider-specific code into shared libraries of their own (for example, onnxruntime-genai-cuda for CUDA and onnxruntime-genai-dml for DML). This allows, for example, the onnxruntime-genai-cuda package to also work on machines without CUDA.
- Support for multi-modal models (text, speech, and vision) such as Phi-4 multimodal.
- Add an IChatClient implementation to the onnxruntime-genai C# bindings.
- Expose the model type through the Python bindings.
- Code and performance improvements for DML EP.
This release also includes several bug fixes that resolve issues reported by users.
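As a concrete illustration of the tokenizer return-type change above, a minimal sketch (the model path is hypothetical):

```python
import numpy as np
import onnxruntime_genai as og

model = og.Model("path/to/model")  # hypothetical model folder
tokenizer = og.Tokenizer(model)

first = tokenizer.encode("Hello")   # now a numpy array, previously a Python list
second = tokenizer.encode(" world")

# tokens = first + second          # '+' concatenated lists; on arrays it adds element-wise
tokens = np.concatenate([first, second])  # join token sequences instead
```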
v0.6.0
Release Notes
We are excited to announce the release of onnxruntime-genai version 0.6.0. Below are the key updates included in this release:
- Support for contextual or continuous decoding, which allows users to carry out multi-turn, conversation-style generation (see the sketch below).
- Support for new models such as Deepseek R1, AMD OLMo, IBM Granite and others.
- Python 3.13 wheels have been introduced
- Support for generation with models sourced from Qualcomm's AI Hub. This work also includes publishing a nuget package, Microsoft.ML.OnnxRuntimeGenAI.QNN, for the QNN EP.
- Support for the WebGPU EP.
This release also includes performance improvements to optimize memory usage and speed. In addition, there are several bug fixes that resolve issues reported by users.
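A minimal sketch of continuous decoding: tokens from each new user turn are appended to the same generator, so the KV cache carries the conversation forward. The model path and the exact loop shape are illustrative assumptions rather than an API reference:

```python
import onnxruntime_genai as og

model = og.Model("path/to/model")        # hypothetical model folder
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
generator = og.Generator(model, params)  # one generator spans every turn

for user_text in ["Hi, who are you?", "Now say that in five words."]:
    # Append only the new turn; earlier turns remain in the KV cache.
    generator.append_tokens(tokenizer.encode(user_text))
    while not generator.is_done():
        generator.generate_next_token()
    print(tokenizer.decode(generator.get_sequence(0)))
```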
v0.5.2
Release Notes
Patch release 0.5.2 adds:
- Fixed bugs #1074 and #1092 via PRs #1065 and #1070
- Fixed the NuGet sample in the package README to show correct disposal of objects
- Added extra validation via PRs #1050 and #1066
Features in 0.5.0:
- Support for MultiLoRA
- Support for multi-frame for Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extend quantization support to embedding and LM head layers
- Mac support in published packages
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
v0.5.1
Release Notes
In addition to the features in the 0.5.0 release, this release adds:
- Add ability to choose provider and modify options at runtime
- Fixed data leakage bug with KV caches
Features in 0.5.0:
- Support for MultiLoRA
- Support for multi-frame for Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation
- Soft capping support for Group Query Attention
- Extend quantization support to embedding and LM head layers
- Mac support in published packages
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
v0.5.0
Release Notes
- Support for MultiLoRA
- Support for multi-frame for Phi-3 vision and Phi-3.5 vision models
- Support for the Phi-3 MoE model
- Support for NVIDIA Nemotron model
- Support for the Qwen model
- Addition of the Set Terminate feature, which allows users to cancel mid-generation (see the sketch at the end of this section)
- Soft capping support for Group Query Attention
- Extend quantization support to embedding and LM head layers
- Mac support in published packages
Known issues
- Models running with DirectML do not support batching
- Python 3.13 is not supported in this release
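The Set Terminate feature lets a caller cancel a running generation from another thread. A hedged sketch of what such usage could look like; the cancellation call shown (set_runtime_option) is an assumption for illustration and may not match the binding's real name or signature:

```python
import threading

import onnxruntime_genai as og

model = og.Model("path/to/model")        # hypothetical model folder
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Write a very long story."))

# Assumed cancellation entry point; the real binding may expose Set
# Terminate under a different name or signature.
threading.Timer(2.0, lambda: generator.set_runtime_option("terminate_session", "1")).start()

while not generator.is_done():
    generator.generate_next_token()      # stops once terminate takes effect
```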
v0.4.0
Release Notes
- Support for new models such as Qwen 2, LLaMA 3.1, Gemma 2, Phi-3 small on CPU
- Support for building models that have already been quantized with AWQ or GPTQ (see the builder sketch at the end of this section)
- Performance improvements for Intel and Arm CPU
- Packaging and language bindings
  - Added Java bindings (build from source)
  - Separated OnnxRuntime.dll and DirectML.dll out of the GenAI package to improve usability
  - Published packages for Windows Arm
  - Support for Android (build from source)
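A hedged sketch of converting an already-quantized AWQ or GPTQ model with the model builder. The flag names follow the builder's usual -i/-o/-p/-e convention but should be checked against `python -m onnxruntime_genai.models.builder --help`; paths are hypothetical:

```python
import subprocess

# Convert a checkpoint that was already quantized with AWQ or GPTQ into
# an ONNX Runtime GenAI model, keeping the int4 precision.
subprocess.run(
    [
        "python", "-m", "onnxruntime_genai.models.builder",
        "-i", "path/to/awq_or_gptq_model",  # hypothetical quantized checkpoint
        "-o", "path/to/onnx_output",
        "-p", "int4",  # keep the quantized precision
        "-e", "cpu",
    ],
    check=True,
)
```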
v0.3.0
Release Notes
- Phi-3 Vision model support for DML EP.
- Addressed DML memory leak issue and crashes on long prompts.
- Addressed crashes and slowness on CPU EP GQA on long prompts due to integer overflow issues.
- Added the import lib for windows C API package.
- Addressed a bug with get_output('logits') so that it returns the logits for the entire prompt and not just for the last generated token (see the sketch below).
- Addressed a bug with querying the device type of the model so that it won't crash.
- Added NetStandard 2.0 compatibility.
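For context on the get_output fix above, a minimal sketch of reading raw logits from a generator, written against the current Python binding shapes (the model path is hypothetical):

```python
import onnxruntime_genai as og

model = og.Model("path/to/model")        # hypothetical model folder
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("The capital of France is"))
generator.generate_next_token()

# After the fix, this holds logits for the entire prompt (roughly
# [batch, sequence_length, vocab_size]) rather than only the last token.
logits = generator.get_output("logits")
print(logits.shape)
```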
ONNX Runtime GenAI v0.3.0-rc2
Release Notes
- Added support for the Phi-3-Vision model.
- Added support for the Phi-3-Small model.
- Removed usage of std::filesystem to avoid runtime issues when loading incompatible symbols from stdc++ and stdc++fs.