v0.11.0
What's Changed
- ADO - Update WinML build pipeline by @chrisdMSFT in #1768
- Fix CMakeLists.txt auto-detection of library directory by @anujj in #1774
- Fix new/delete override and Enable cuda kernel test in Windows by @tianleiwu in #1772
- Use abbreviation for TensorRT RTX EP by @kunal-vaishnavi in #1763
- Add trust remote code option to model builder by @kunal-vaishnavi in #1766
- Support block-wise quant in qmoe op by @apsonawane in #1746
- Change the status for TRT-RTX EP by @gaugarg-nv in #1780
- Cherry-pick changes from rel 0.10.0 back to main by @chrisdMSFT in #1782
- Fix /CETCOMPAT Usage for Cross-Compiling by @sayanshaw24 in #1779
- Provide distributed version of improved TopK kernel by @hariharans29 in #1710
- [TRT-RTX] Disable KV cache re-computation for Phi models by @gaugarg-nv in #1787
- [CUDA] Add high-performance Top-K kernels and online benchmarking by @tianleiwu in #1748
- Change shared indices array type from float to int by @hariharans29 in #1789
- Enable bfloat16 multi-modal models by @kunal-vaishnavi in #1786
- Disable lmhead while prompt processing by @qti-ashimaj in #1762
- Introduce support for dynamic batching by @baijumeswani in #1662
- Generate pyd type info by @chemwolf6922 in #1742
- Add trt-rtx c packages in c example by @anujj in #1794
- [CUDA] Fix build with CUDA >= 12.9 by @tianleiwu in #1802
- [CUDA] topk kernels v2 by @tianleiwu in #1798
- Add prefill Chunking Support for NvTensorRtRtx and Cuda Providers by @anujj in #1765
- Add TRT-RTX EP support, keep NvTensorRtRtx as user facing name, and force QDQ by @anujj in #1791
- [CUDA] Add static assert to suppress windows build warnings by @tianleiwu in #1804
- Revert "Generate pyd type info" by @baijumeswani in #1805
- [QNN] Support continuous decoding by @baijumeswani in #1808
- ADO Pipeline - nuget_winml_package_reference_version is configured at build time. by @chrisdMSFT in #1811
- Update version to 0.11.0-dev by @baijumeswani in #1815
- Add Support For Tokenizer Options by @sayanshaw24 in #1785
- Fix exit call in README example by @justinchuby in #1823
- Add tokenizer APIs for accessing important ids by @kunal-vaishnavi in #1822
- Use correct classes for config-only usage in model builder by @kunal-vaishnavi in #1828
- Fix packaging pipeline by @baijumeswani in #1829
- Add missing tokenizer methods in java by @baijumeswani in #1833
- Add run options to ONNX Runtime GenAI by @kunal-vaishnavi in #1795
- Avoid Processing EOS Token During Continuous Decoding by @baijumeswani in #1814
- Fix nuget packaging pipeline for dev builds by @baijumeswani in #1837
- Add tool normalization for tool calling by @kunal-vaishnavi in #1838
- Refactor past_present_share_buffer logic into reusable function by @anujj in #1839
- Fix nuget packaging pipeline by @baijumeswani in #1841
- Add enable_webgpu_graph in extra_options by @qjia7 in #1788
- Update tool normalization in ORT GenAI by @kunal-vaishnavi in #1842
- Support RotaryEmbedding in GQA for webgpu ep by @xiaofeihan1 in #1847
- Enable guidance ff tokens for faster inference by @JC1DA in #1803
- Support pre-registered plug-in cuda execution provider library by @baijumeswani in #1850
- ADO: Update pipeline to publish onnxruntime-genai. for relwithdebinfo builds. by @chrisdMSFT in #1855
- Layer-wise KV Cache Allocation for Models with Alternating Attention Patterns by @anujj in #1832
- Mpasumarthi/nvtrt test suite by @mpasumarthi-git in #1756
- bugfix: fix a memory issue in Whisper by @fs-eire in #1859
- Disable CUDA graph when num_beams > 1 and fix set_provider_option bug by @anujj in #1846
- Mixed precision export support for gptq quantized model by @rM-planet in #1853
- Enable If Node Support for TRT-RTX in Phi-3.5/Phi-4 LongRoPE Models by @anujj in #1851
- Fix handling EOS token id detection by @kunal-vaishnavi in #1849
- Ensure Consistent Tool Calling JSON Serialization and Deserialization by @sayanshaw24 in #1863
- Add C# binding for GetNextTokens by @kunal-vaishnavi in #1865
- Set version as 0.11.0 by @kunal-vaishnavi in #1866
New Contributors
- @hariharans29 made their first contribution in #1710
- @qti-ashimaj made their first contribution in #1762
- @chemwolf6922 made their first contribution in #1742
- @qjia7 made their first contribution in #1788
- @xiaofeihan1 made their first contribution in #1847
- @JC1DA made their first contribution in #1803
- @mpasumarthi-git made their first contribution in #1756
- @rM-planet made their first contribution in #1853
Full Changelog: v0.10.0...v0.11.0