Release v0.11.0 · microsoft/onnxruntime-genai

What's Changed

ADO - Update WinML build pipeline by @chrisdMSFT in #1768
Fix CMakeLists.txt auto-detection of library directory by @anujj in #1774
Fix new/delete override and Enable cuda kernel test in Windows by @tianleiwu in #1772
Use abbreviation for TensorRT RTX EP by @kunal-vaishnavi in #1763
Add trust remote code option to model builder by @kunal-vaishnavi in #1766
Support block-wise quant in qmoe op by @apsonawane in #1746
Change the status for TRT-RTX EP by @gaugarg-nv in #1780
Cherry-Pick changes from rel 0.10.0 back to main. by @chrisdMSFT in #1782
Fix /CETCOMPAT Usage for Cross-Compiling by @sayanshaw24 in #1779
Provide distributed version of improved TopK kernel by @hariharans29 in #1710
[TRT-RTX] Disable KV cache re-computation for Phi models by @gaugarg-nv in #1787
[CUDA] Add high-performance Top-K kernels and online benchmarking by @tianleiwu in #1748
Change shared indices array type from float to int by @hariharans29 in #1789
Enable bfloat16 multi-modal models by @kunal-vaishnavi in #1786
Disable lmhead while prompt processing by @qti-ashimaj in #1762
Introduce support for dynamic batching by @baijumeswani in #1662
Generate pyd type info by @chemwolf6922 in #1742
Add trt-rtx c packages in c example by @anujj in #1794
[CUDA] Fix build with CUDA >= 12.9 by @tianleiwu in #1802
[CUDA] topk kernels v2 by @tianleiwu in #1798
Add prefill Chunking Support for NvTensorRtRtx and Cuda Providers by @anujj in #1765
Add TRT-RTX EP support, keep NvTensorRtRtx as user facing name, and force QDQ by @anujj in #1791
[CUDA] Add static assert to suppress windows build warnings by @tianleiwu in #1804
Revert "Generate pyd type info" by @baijumeswani in #1805
[QNN] Support continuous decoding by @baijumeswani in #1808
ADO Pipeline - nuget_winml_package_reference_version is configured at build time. by @chrisdMSFT in #1811
Update version to 0.11.0-dev by @baijumeswani in #1815
Add Support For Tokenizer Options by @sayanshaw24 in #1785
Fix exit call in README example by @justinchuby in #1823
Add tokenizer APIs for accessing important ids by @kunal-vaishnavi in #1822
Use correct classes for config-only usage in model builder by @kunal-vaishnavi in #1828
Fix packaging pipeline by @baijumeswani in #1829
Add missing tokenizer methods in java by @baijumeswani in #1833
Add run options to ONNX Runtime GenAI by @kunal-vaishnavi in #1795
Avoid Processing EOS Token During Continuous Decoding by @baijumeswani in #1814
Fix nuget packaging pipeline for dev builds by @baijumeswani in #1837
Add tool normalization for tool calling by @kunal-vaishnavi in #1838
Refactor past_present_share_buffer logic into reusable function by @anujj in #1839
Fix nuget packaging pipeline by @baijumeswani in #1841
Add enable_webgpu_graph in extra_options by @qjia7 in #1788
Update tool normalization in ORT GenAI by @kunal-vaishnavi in #1842
Support RotaryEmbedding in GQA for webgpu ep by @xiaofeihan1 in #1847
Enable guidance ff tokens for faster inference by @JC1DA in #1803
Support pre-registered plug-in cuda execution provider library by @baijumeswani in #1850
ADO: Update pipeline to publish onnxruntime-genai. for relwithdebinfo builds. by @chrisdMSFT in #1855
Layer-wise KV Cache Allocation for Models with Alternating Attention Patterns by @anujj in #1832
Mpasumarthi/nvtrt test suite by @mpasumarthi-git in #1756
bugfix: fix a memory issue in Whisper by @fs-eire in #1859
Add disable cuda graph when num_beams > 1 and fix set_provider_option bug by @anujj in #1846
Mixed precision export support for gptq quantized model by @rM-planet in #1853
Enable If Node Support for TRT-RTX in Phi-3.5/Phi-4 LongRoPE Models by @anujj in #1851
Fix handling EOS token id detection by @kunal-vaishnavi in #1849
Ensure Consistent Tool Calling JSON Serialization and Deserialization by @sayanshaw24 in #1863
Add C# binding for GetNextTokens by @kunal-vaishnavi in #1865
Set version as 0.11.0 by @kunal-vaishnavi in #1866

New Contributors

@hariharans29 made their first contribution in #1710
@qti-ashimaj made their first contribution in #1762
@chemwolf6922 made their first contribution in #1742
@qjia7 made their first contribution in #1788
@xiaofeihan1 made their first contribution in #1847
@JC1DA made their first contribution in #1803
@mpasumarthi-git made their first contribution in #1756
@rM-planet made their first contribution in #1853

Full Changelog: v0.10.0...v0.11.0
