Commit efab081

aciddelgado, shobrienDMA, BowenBao, kunal-vaishnavi, and ajindal1 authored
Initial cherry-pick (#1394)
Co-authored-by: shobrienDMA <[email protected]>
Co-authored-by: Bowen Bao <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: Abhishek Jindal <[email protected]>
Co-authored-by: Roger Barreto <[email protected]>
Co-authored-by: David Fan <[email protected]>
Co-authored-by: Alexandre Zollinger Chohfi <[email protected]>
Co-authored-by: Baiju Meswani <[email protected]>
Co-authored-by: Ryan Hill <[email protected]>
Co-authored-by: Stephen Toub <[email protected]>
1 parent 8a48d7b commit efab081

File tree

24 files changed: +574 −250 lines

.github/workflows/linux-cpu-arm64-build.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -74,7 +74,7 @@ jobs:
       run: |
         docker run --rm \
           --volume $GITHUB_WORKSPACE:/onnxruntime_src \
-          -w /onnxruntime_src ort_genai_linux_arm64_gha bash -c "/usr/bin/cmake --preset linux_gcc_cpu_release"
+          -w /onnxruntime_src ort_genai_linux_arm64_gha bash -c "python3 --version && /usr/bin/cmake --preset linux_gcc_cpu_release"
 
     - name: Docker -- Build with CMake and GCC
       run: |
```

.pipelines/stages/jobs/custom-nuget-packaging-job.yml

Lines changed: 6 additions & 6 deletions

```diff
@@ -49,12 +49,12 @@ jobs:
       foreach ($file in $artifacts) {
         $a = $file.Name
         Write-Host "Extracting " $a
-        $rid_match = $a -match "onnxruntime-genai-\d+\.\d+\.\d+(?:-[^-\s]+-)(.+?)-?(?:cuda|dml)?(\.zip|\.tar\.gz)"
-        if ($rid_match) {
-          $rid = $Matches.1
-        }
-        else {
-          Write-Host "Invalid artifact name" $file
+        if ($a -like "*win-x64*") {
+          $rid = "win-x64"
+        } elseif ($a -like "*win-arm64*") {
+          $rid = "win-arm64"
+        } else {
+          Write-Host "Unknown artifact name" $a
           return
         }
```
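The change above swaps a brittle filename regex for simple wildcard checks when deriving the runtime identifier (RID) from an artifact name. As a rough illustration of the same matching logic outside the pipeline, here is a minimal Python sketch (the function name and sample filenames are hypothetical, not part of the pipeline):

```python
from fnmatch import fnmatch

def rid_for_artifact(name):
    """Mirror the pipeline's wildcard checks: first match wins."""
    if fnmatch(name, "*win-x64*"):
        return "win-x64"
    if fnmatch(name, "*win-arm64*"):
        return "win-arm64"
    return None  # the pipeline logs "Unknown artifact name" and returns

assert rid_for_artifact("onnxruntime-genai-0.7.1-win-x64.zip") == "win-x64"
assert rid_for_artifact("onnxruntime-genai-0.7.1-win-arm64.zip") == "win-arm64"
```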

README.md

Lines changed: 6 additions & 49 deletions

````diff
@@ -1,7 +1,12 @@
 # ONNX Runtime GenAI
 
-## *Main branch contains new API changes and examples in main branch reflect these changes. For example scripts compatible with current release (0.5.2), [see release branch](https://github.com/microsoft/onnxruntime-genai/tree/rel-0.5.2).*
+Note: between release candidate 0.7.0-rc2 and release 0.7.0 there is a breaking Python API change in `tokenizer.encode(prompt)`: the method previously returned a Python list and now returns a numpy array. When concatenating the tokens generated by two prompts (e.g. a system prompt and a user prompt) to pass to `append_tokens`, use the following instead of `system_tokens + input_tokens`:
 
+```python
+system_tokens = tokenizer.encode(system_prompt)
+input_tokens = tokenizer.encode(prompt)
+generator.append_tokens(np.concatenate([system_tokens, input_tokens]))
+```
 
 [![Latest version](https://img.shields.io/nuget/vpre/Microsoft.ML.OnnxRuntimeGenAI.Managed?label=latest)](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntimeGenAI.Managed/absoluteLatest)
 
@@ -51,8 +56,6 @@ See [installation instructions](https://onnxruntime.ai/docs/genai/howto/install)
 3. Run the model
 
-### Build from source / Next release (0.6.0)
-
 ```python
 import onnxruntime_genai as og
 
@@ -97,52 +100,6 @@ See [installation instructions](https://onnxruntime.ai/docs/genai/howto/install)
 del generator
 ```
 
-### Current release (until 0.5.x)
-
-```python
-import onnxruntime_genai as og
-
-model = og.Model('cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4')
-tokenizer = og.Tokenizer(model)
-tokenizer_stream = tokenizer.create_stream()
-
-# Set the max length to something sensible by default,
-# since otherwise it will be set to the entire context length
-search_options = {}
-search_options['max_length'] = 2048
-
-chat_template = '<|user|>\n{input} <|end|>\n<|assistant|>'
-
-text = input("Input: ")
-if not text:
-    print("Error, input cannot be empty")
-    exit
-
-prompt = f'{chat_template.format(input=text)}'
-
-input_tokens = tokenizer.encode(prompt)
-
-params = og.GeneratorParams(model)
-params.set_search_options(**search_options)
-
-generator = og.Generator(model, params)
-generator.append_tokens(input_tokens)
-
-print("Output: ", end='', flush=True)
-
-try:
-    while not generator.is_done():
-        generator.generate_next_token()
-
-        new_token = generator.get_next_tokens()[0]
-        print(tokenizer_stream.decode(new_token), end='', flush=True)
-except KeyboardInterrupt:
-    print("  --control+c pressed, aborting generation--")
-
-print()
-del generator
-```
-
 ### Choosing the Right Examples: Release vs. Main Branch
 
 Due to the evolving nature of this project and ongoing feature additions, examples in the `main` branch may not always align with the latest stable release. This section outlines how to ensure compatibility between the examples and the corresponding version. The majority of the steps remain the same; only the package installation and the model example file change.
````
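The breaking change noted in the README hunk above is easy to trip over because `+` is still valid on the new return type, just with different semantics: concatenation for Python lists, elementwise addition for numpy arrays. A minimal standalone sketch (plain numpy, no GenAI runtime required):

```python
import numpy as np

# Old behavior: tokenizer.encode returned a Python list, where + concatenates.
assert [1, 2] + [3, 4] == [1, 2, 3, 4]

# New behavior: encode returns a numpy array, where + adds elementwise
# (or raises on mismatched shapes) -- silently wrong for token sequences.
a = np.array([1, 2])
b = np.array([3, 4])
print(a + b)                   # [4 6]  -- elementwise sum, not a longer sequence
print(np.concatenate([a, b]))  # [1 2 3 4]  -- the intended token concatenation
```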

VERSION_INFO

Lines changed: 1 addition & 1 deletion

```diff
@@ -1 +1 @@
-0.7.0
+0.7.1
```

cmake/deps.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -10,7 +10,7 @@
 #not affect built binaries.
 #
 # NOTE: You must run deps_update_and_upload.py and generate_cgmanifest.py when ready to test your changes in a CI.
-pybind11;https://github.com/pybind/pybind11/archive/refs/tags/v2.10.1.zip;769b6aa67a77f17a770960f604b727645b6f6a13
+pybind11;https://github.com/pybind/pybind11/archive/refs/tags/v2.13.6.zip;f780292da9db273c8ef06ccf5fd4b623624143e9
 googletest;https://github.com/google/googletest/archive/530d5c8c84abd2a46f38583ee817743c9b3a42b4.zip;5e3a61db2aa975cfd0f97ba92c818744e7fa7034
 microsoft_wil;https://github.com/microsoft/wil/archive/refs/tags/v1.0.230629.1.zip;e4a542a323c070376f7c2d1973d0f7ddbc1d2fa5
 directx_headers;https://github.com/microsoft/DirectX-Headers/archive/refs/tags/v1.613.1.zip;47653509a3371eabb156360f42faf582f314bf2e
```
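Each entry in `cmake/deps.txt` appears to be a semicolon-separated triple of dependency name, archive URL, and SHA1 hash, so a bump like the pybind11 one above changes both the URL and the hash. A small illustrative parse (the format is inferred from the entries above, not from documentation):

```python
# Parse one deps.txt entry: name;download_url;sha1_hash
line = ("pybind11;https://github.com/pybind/pybind11/archive/refs/tags/v2.13.6.zip;"
        "f780292da9db273c8ef06ccf5fd4b623624143e9")
name, url, sha1 = line.split(";")
print(name, url, sha1)
```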

examples/csharp/HelloPhi/HelloPhi.csproj

Lines changed: 3 additions & 3 deletions

```diff
@@ -10,9 +10,9 @@
   </PropertyGroup>
 
   <ItemGroup>
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug' OR '$(Configuration)' == 'Release' " />
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug_Cuda' OR '$(Configuration)' == 'Release_Cuda' " />
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug_DirectML' OR '$(Configuration)' == 'Release_DirectML' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug' OR '$(Configuration)' == 'Release' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug_Cuda' OR '$(Configuration)' == 'Release_Cuda' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug_DirectML' OR '$(Configuration)' == 'Release_DirectML' " />
   </ItemGroup>
 
   <ItemGroup>
```

examples/csharp/HelloPhi3V/HelloPhi3V.csproj

Lines changed: 3 additions & 3 deletions

```diff
@@ -9,9 +9,9 @@
   </PropertyGroup>
 
   <ItemGroup>
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug' OR '$(Configuration)' == 'Release' " />
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug_Cuda' OR '$(Configuration)' == 'Release_Cuda' " />
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug_DirectML' OR '$(Configuration)' == 'Release_DirectML' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug' OR '$(Configuration)' == 'Release' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug_Cuda' OR '$(Configuration)' == 'Release_Cuda' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug_DirectML' OR '$(Configuration)' == 'Release_DirectML' " />
   </ItemGroup>
 
 </Project>
```

examples/csharp/HelloPhi4MM/HelloPhi4MM.csproj

Lines changed: 3 additions & 3 deletions

```diff
@@ -9,9 +9,9 @@
   </PropertyGroup>
 
   <ItemGroup>
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug' OR '$(Configuration)' == 'Release' " />
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug_Cuda' OR '$(Configuration)' == 'Release_Cuda' " />
-    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" Version="0.7.0" Condition=" '$(Configuration)' == 'Debug_DirectML' OR '$(Configuration)' == 'Release_DirectML' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug' OR '$(Configuration)' == 'Release' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.Cuda" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug_Cuda' OR '$(Configuration)' == 'Release_Cuda' " />
+    <PackageReference Include="Microsoft.ML.OnnxRuntimeGenAI.DirectML" Version="0.7.1" Condition=" '$(Configuration)' == 'Debug_DirectML' OR '$(Configuration)' == 'Release_DirectML' " />
   </ItemGroup>
 
 </Project>
```

examples/python/model-qa.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -1,6 +1,7 @@
 import onnxruntime_genai as og
 import argparse
 import time
+import numpy as np
 
 def main(args):
     if args.verbose: print("Loading model...")
@@ -96,7 +97,7 @@ def main(args):
     if args.verbose: print("Generator created")
 
     # Append system and input tokens to the generator
-    generator.append_tokens(system_tokens + input_tokens)
+    generator.append_tokens(np.concatenate([system_tokens, input_tokens]))
 
     if args.verbose: print("Running generation loop ...")
     if args.timings:
```
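Putting the model-qa.py fix in context, here is a condensed sketch of the updated flow. The model path and prompt templates are placeholders, and `max_length=2048` mirrors the README default rather than the script's actual argument handling:

```python
import numpy as np
import onnxruntime_genai as og

model = og.Model('path/to/model')   # placeholder model directory
tokenizer = og.Tokenizer(model)

# Placeholder templates; the real script builds these from CLI arguments.
system_tokens = tokenizer.encode('<|system|>\nYou are a helpful assistant.<|end|>\n')
input_tokens = tokenizer.encode('<|user|>\nHello<|end|>\n<|assistant|>')

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)
generator = og.Generator(model, params)

# encode() now returns numpy arrays, so concatenate rather than using +
generator.append_tokens(np.concatenate([system_tokens, input_tokens]))
```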

examples/python/phi3-qa.py

Lines changed: 4 additions & 4 deletions

```diff
@@ -30,10 +30,6 @@ def main(args):
 
     chat_template = '<|user|>\n{input} <|end|>\n<|assistant|>'
 
-    params = og.GeneratorParams(model)
-    params.set_search_options(**search_options)
-    generator = og.Generator(model, params)
-
     # Keep asking for input prompts in a loop
     while True:
         text = input("Input: ")
@@ -48,6 +44,10 @@ def main(args):
 
         input_tokens = tokenizer.encode(prompt)
 
+        params = og.GeneratorParams(model)
+        params.set_search_options(**search_options)
+        generator = og.Generator(model, params)
+
         generator.append_tokens(input_tokens)
         if args.verbose: print("Generator created")
```
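The phi3-qa.py change moves generator construction inside the prompt loop, so each question starts from a fresh `og.Generator` instead of appending every prompt to one long-lived generator's state. A condensed sketch of the resulting loop shape (the model path is a placeholder; the API calls reuse those that appear in this repo's examples):

```python
import onnxruntime_genai as og

model = og.Model('path/to/model')   # placeholder path
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()
search_options = {'max_length': 2048}
chat_template = '<|user|>\n{input} <|end|>\n<|assistant|>'

while True:
    text = input("Input: ")
    if not text:
        break
    input_tokens = tokenizer.encode(chat_template.format(input=text))

    # A new generator per prompt: no state carried over from earlier turns.
    params = og.GeneratorParams(model)
    params.set_search_options(**search_options)
    generator = og.Generator(model, params)
    generator.append_tokens(input_tokens)

    # Stream the answer token by token.
    while not generator.is_done():
        generator.generate_next_token()
        print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end='', flush=True)
    print()
    del generator
```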
