# ONNX Runtime GenAI

-## *Main branch contains new API changes and examples in main branch reflect these changes. For example scripts compatible with current release (0.5.2), [see release branch](https://github.com/microsoft/onnxruntime-genai/tree/rel-0.5.2).*
+Note: between release candidate 0.7.0-rc2 and release 0.7.0 there is a breaking Python API change in `tokenizer.encode(prompt)`. Previously this method returned a Python list; it now returns a numpy array. When concatenating the tokens generated from two prompts to pass to `append_tokens`, e.g. a system prompt and a user prompt, use the following instead of `system_tokens + input_tokens`:

+```python
+import numpy as np
+
+system_tokens = tokenizer.encode(system_prompt)
+input_tokens = tokenizer.encode(prompt)
+generator.append_tokens(np.concatenate([system_tokens, input_tokens]))
+```
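With Python lists, `+` concatenated the two sequences; on numpy arrays, `+` performs elementwise addition, so `np.concatenate` is now required to join the token ids.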

[NuGet package](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntimeGenAI.Managed/absoluteLatest)

@@ -51,8 +56,6 @@ See [installation instructions](https://onnxruntime.ai/docs/genai/howto/install) |

3. Run the model

- ### Build from source / Next release (0.6.0)
-
```python
import onnxruntime_genai as og

@@ -97,52 +100,6 @@ See [installation instructions](https://onnxruntime.ai/docs/genai/howto/install) |
del generator
```

- ### Current release (until 0.5.x)
-
- ```python
- import onnxruntime_genai as og
-
- model = og.Model('cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4')
- tokenizer = og.Tokenizer(model)
- tokenizer_stream = tokenizer.create_stream()
-
- # Set the max length to something sensible by default,
- # since otherwise it will be set to the entire context length
- search_options = {}
- search_options['max_length'] = 2048
-
- chat_template = '<|user|>\n{input} <|end|>\n<|assistant|>'
-
- text = input("Input: ")
- if not text:
-     print("Error, input cannot be empty")
-     exit
-
- prompt = f'{chat_template.format(input=text)}'
-
- input_tokens = tokenizer.encode(prompt)
-
- params = og.GeneratorParams(model)
- params.set_search_options(**search_options)
-
- generator = og.Generator(model, params)
- generator.append_tokens(input_tokens)
-
- print("Output: ", end='', flush=True)
-
- try:
-     while not generator.is_done():
-         generator.generate_next_token()
-
-         new_token = generator.get_next_tokens()[0]
-         print(tokenizer_stream.decode(new_token), end='', flush=True)
- except KeyboardInterrupt:
-     print(" --control+c pressed, aborting generation--")
-
- print()
- del generator
- ```
-
### Choosing the Right Examples: Release vs. Main Branch

Due to the evolving nature of this project and ongoing feature additions, examples in the `main` branch may not always align with the latest stable release. This section outlines how to ensure compatibility between the examples and the corresponding version. The majority of the steps remain the same; only the package installation and the model example file change.
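As a quick way to confirm which package version is installed before choosing an examples branch, the standard library can report it. A minimal sketch, assuming the package was installed from PyPI as `onnxruntime-genai` and that release branches follow the `rel-<version>` naming used above:

```python
from importlib.metadata import version

# Report the installed onnxruntime-genai build so the matching
# examples branch (e.g. rel-0.5.2) can be checked out.
installed = version("onnxruntime-genai")
print(f"Installed package: onnxruntime-genai {installed}")
print(f"Matching examples branch: rel-{installed}")
```

If the reported version carries a dev or pre-release suffix, the examples on `main` are the better match.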