
Commit 6518b82

Keep example flags and repo names consistent
Parent: 22c0c32

17 files changed: +41 additions, −49 deletions


.pipelines/stages/jobs/steps/python-validation-step.yml

Lines changed: 4 additions & 4 deletions
@@ -43,9 +43,9 @@ steps:
     python -m pip install --no-index --find-links=$(Build.BinariesDirectory)/wheel $(pip_package_name)
 
     if ("$(ep)" -eq "directml") {
-      python ${{ parameters.PythonScriptName }} -m .\${{ parameters.LocalFolder }}\${{ parameters.ModelFolder }} --provider dml --non-interactive
+      python ${{ parameters.PythonScriptName }} -m .\${{ parameters.LocalFolder }}\${{ parameters.ModelFolder }} -e dml --non-interactive
     } else {
-      python ${{ parameters.PythonScriptName }} -m .\${{ parameters.LocalFolder }}\${{ parameters.ModelFolder }} --provider $(ep) --non-interactive
+      python ${{ parameters.PythonScriptName }} -m .\${{ parameters.LocalFolder }}\${{ parameters.ModelFolder }} -e $(ep) --non-interactive
     }
   displayName: 'Run ${{ parameters.PythonScriptName }} With Artifact on Windows'
   workingDirectory: '$(Build.Repository.LocalPath)'
@@ -72,7 +72,7 @@ steps:
     $python_exe -m pip install -r /ort_genai_src/test/python/cuda/ort/requirements.txt && \
     cd /ort_genai_src/${{ parameters.PythonScriptFolder }} && \
     $python_exe -m pip install --no-index --find-links=/ort_genai_binary/wheel $(pip_package_name) && \
-    $python_exe ${{ parameters.PythonScriptName }} -m ./${{ parameters.LocalFolder }}/${{ parameters.ModelFolder }} --provider $(ep) --non-interactive"
+    $python_exe ${{ parameters.PythonScriptName }} -m ./${{ parameters.LocalFolder }}/${{ parameters.ModelFolder }} -e $(ep) --non-interactive"
 
   displayName: 'Run ${{ parameters.PythonScriptName }} With Artifact on Linux CUDA'
   workingDirectory: '$(Build.Repository.LocalPath)'
@@ -91,7 +91,7 @@ steps:
     fi
     cd ${{ parameters.PythonScriptFolder }}
     python -m pip install --no-index --find-links=$(Build.BinariesDirectory)/wheel $(pip_package_name)
-    python ${{ parameters.PythonScriptName }} -m ./${{ parameters.LocalFolder }}/${{ parameters.ModelFolder }} --provider $(ep) --non-interactive
+    python ${{ parameters.PythonScriptName }} -m ./${{ parameters.LocalFolder }}/${{ parameters.ModelFolder }} -e $(ep) --non-interactive
   displayName: 'Run ${{ parameters.PythonScriptName }} With Artifact on Linux/macOS CPU'
   workingDirectory: '$(Build.Repository.LocalPath)'
   condition: and(or(eq(variables['os'], 'linux'), eq(variables['os'], 'osx')), eq(variables['ep'], 'cpu'))
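Note that the Windows branch above translates the pipeline's `directml` value into the `dml` name the example scripts accept, while the other platforms pass `$(ep)` straight through. A minimal sketch of that mapping in Python (the helper name is illustrative, not part of the repo):

```python
# Hypothetical helper mirroring the pipeline's PowerShell branch: the
# pipeline variable "directml" maps to the "dml" value accepted by the
# scripts' -e/--execution_provider flag; cpu and cuda pass through as-is.
def provider_flag(ep: str) -> str:
    return "dml" if ep == "directml" else ep

assert provider_flag("directml") == "dml"
assert provider_flag("cuda") == "cuda"
assert provider_flag("cpu") == "cpu"
```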

README.md

Lines changed: 2 additions & 4 deletions
@@ -1,18 +1,16 @@
-# ONNX Runtime generate() API
+# ONNX Runtime GenAI
 
 ## *Main branch contains new API changes and examples in main branch reflect these changes. For example scripts compatible with current release (0.5.2), [see release branch](https://github.com/microsoft/onnxruntime-genai/tree/rel-0.5.2).*
 
 
 [![Latest version](https://img.shields.io/nuget/vpre/Microsoft.ML.OnnxRuntimeGenAI.Managed?label=latest)](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntimeGenAI.Managed/absoluteLatest)
 
-Run Llama, Phi, Gemma, Mistral with ONNX Runtime.
+Run generative AI models with ONNX Runtime.
 
 This API gives you an easy, flexible and performant way of running LLMs on device.
 
 It implements the generative AI loop for ONNX models, including pre and post processing, inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.
 
-You can call a high level `generate()` method to generate all of the output at once, or stream the output one token at a time.
-
 See documentation at https://onnxruntime.ai/docs/genai.
 
 |Support matrix|Supported now|Under development|On the roadmap|
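The generative AI loop the README describes (pre/post processing, inference, logits processing, search and sampling, KV cache management) is what the Python examples in this commit drive token by token. A minimal streaming sketch against the main-branch Python API; the model path and prompt are placeholders, and users pinned to 0.5.2 should consult the release branch since the main-branch API differs:

```python
import onnxruntime_genai as og

# Placeholder path: the folder must contain genai_config.json and model.onnx.
model = og.Model("genai_models/phi2-int4-cpu")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=50)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("my favorite movie is"))

# Stream the output one token at a time.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```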

examples/c/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# ONNX Runtime generate() API C Example
+# ONNX Runtime GenAI C Example
 
 ## Setup
examples/csharp/HelloPhi/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Generate() API C# example
+# ONNX Runtime GenAI C# example
 
 ## Obtain a model
examples/python/README.md

Lines changed: 6 additions & 10 deletions
@@ -1,4 +1,4 @@
-# Generate() API Python Examples
+# ONNX Runtime GenAI API Python Examples
 
 ## Install the onnxruntime-genai library
 
@@ -12,19 +12,15 @@ If you bring your own model, you need to provide the configuration. See the [con
 
 To generate the model with model builder:
 
-1. Install the model builder script dependencies
+1. Install the model builder's dependencies
 
 ```bash
 pip install numpy transformers torch onnx onnxruntime
 ```
 
-2. Choose a model. Examples of supported ones are:
-   - Phi-2
-   - Mistral
-   - Gemma 2B IT
-   - LLama 7B
+2. Choose a model. Examples of supported ones are listed on the repo's main README.
 
-3. Run the model builder script to export, optimize, and quantize the model. More details can be found [here](../../src/python/py/models/README.md)
+3. Run the model builder to export, optimize, and quantize the model. More details can be found [here](../../src/python/py/models/README.md)
 
 ```bash
 cd examples/python
@@ -41,6 +37,6 @@ The `model-qa` script streams the output text token by token.
 
 To run the python examples...
 ```bash
-python model-generate.py -m {path to model folder} -pr {input prompt}
-python model-qa.py -m {path to model folder}
+python model-generate.py -m {path to model folder} -e {execution provider} -pr {input prompt}
+python model-qa.py -m {path to model folder} -e {execution provider}
 ```

examples/python/awq-quantized-model.md

Lines changed: 4 additions & 4 deletions
@@ -1,9 +1,9 @@
-# Create AWQ-quantized and optimized ONNX models from PyTorch models with AutoAWQ + ONNX Runtime generate() API
+# Create AWQ-quantized and optimized ONNX models from PyTorch models with AutoAWQ + ONNX Runtime GenAI
 
 ## Steps
 1. [Download your PyTorch model](#1-download-your-pytorch-model)
 2. [Install AutoAWQ](#2-install-autoawq)
-3. [Install the generate() API](#3-install-the-generate-api)
+3. [Install ONNX Runtime GenAI](#3-install-onnx-runtime-genai)
    - [CPU](#cpu)
    - [CUDA](#cuda)
    - [DirectML](#directml)
@@ -13,7 +13,7 @@
 
 Activation-aware Weight Quantization (AWQ) works by identifying the top 1% most salient weights that are most important for maintaining accuracy and quantizing the remaining 99% of weights. This leads to less accuracy loss from quantization compared to many other quantization techniques. For more on AWQ, see [here](https://arxiv.org/abs/2306.00978).
 
-This tutorial downloads the Phi-3 mini short context PyTorch model, applies AWQ quantization, generates the corresponding optimized & quantized ONNX model, and runs the ONNX model with ONNX Runtime's generate() API. If you would like to use another model, please change the model name in the instructions below.
+This tutorial downloads the Phi-3 mini short context PyTorch model, applies AWQ quantization, generates the corresponding optimized & quantized ONNX model, and runs the ONNX model with ONNX Runtime GenAI. If you would like to use another model, please change the model name in the instructions below.
 
 ## 1. Download your PyTorch model
 
@@ -47,7 +47,7 @@ $ pip install -e .
 
 Note: You can try to install AutoAWQ directly with `pip install autoawq`. However, AutoAWQ will try to auto-detect the CUDA version installed on your machine. If the CUDA version it detects is incorrect, the `.whl` file that `pip` will choose will be incorrect. This will cause an error during runtime when trying to quantize. Thus, it is recommended to install AutoAWQ from source to get the right `.whl` file.
 
-## 3. Install the generate() API
+## 3. Install ONNX Runtime GenAI
 
 Based on your desired hardware target, pick from one of the following options to install ONNX Runtime GenAI.
 
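For orientation on the tutorial's quantization step, a hedged sketch of the AutoAWQ call sequence; the model id, quant_config values, and output folder are illustrative, not the tutorial's exact commands:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "microsoft/Phi-3-mini-4k-instruct"  # illustrative model id
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# AWQ preserves the ~1% most salient weights identified during calibration
# and quantizes the remaining weights to 4 bits, as the tutorial describes.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("phi-3-mini-awq")
tokenizer.save_pretrained("phi-3-mini-awq")
```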
Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # Description: Example of generate end-to-end usage, including model building and running
 pip install numpy transformers torch onnx onnxruntime
 python3 -m onnxruntime_genai.models.builder -m microsoft/phi-2 -o genai_models/phi2-int4-cpu -p int4 -e cpu -c hf_cache
-python3 model-generate.py -m genai_models/phi2-int4-cpu -pr "my favorite movie is" "write a function that always returns True" "I am very happy" -p 0.0 -k 1 -v
+python3 model-generate.py -m genai_models/phi2-int4-cpu -e cpu -pr "my favorite movie is" "write a function that always returns True" "I am very happy" -p 0.0 -k 1 -v

examples/python/model-generate.py

Lines changed: 5 additions & 5 deletions
@@ -7,10 +7,10 @@ def main(args):
     if hasattr(og, 'Config'):
         config = og.Config(args.model_path)
         config.clear_providers()
-        if args.provider != "cpu":
+        if args.execution_provider != "cpu":
             if args.verbose:
-                print(f"Setting model to {args.provider}...")
-            config.append_provider(args.provider)
+                print(f"Setting model to {args.execution_provider}...")
+            config.append_provider(args.execution_provider)
         model = og.Model(config)
     else:
         model = og.Model(args.model_path)
@@ -80,8 +80,8 @@ def main(args):
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS, description="End-to-end token generation loop example for gen-ai")
-    parser.add_argument('-m', '--model_path', type=str, required=True, help='Onnx model folder path (must contain config.json and model.onnx)')
-    parser.add_argument("-p", "--provider", type=str, required=True, choices=["cpu", "cuda", "dml"], help="Provider to run model")
+    parser.add_argument('-m', '--model_path', type=str, required=True, help='Onnx model folder path (must contain genai_config.json and model.onnx)')
+    parser.add_argument("-e", "--execution_provider", type=str, required=True, choices=["cpu", "cuda", "dml"], help="Provider to run model")
     parser.add_argument('-pr', '--prompts', nargs='*', required=False, help='Input prompts to generate tokens from. Provide this parameter multiple times to batch multiple prompts')
     parser.add_argument('-i', '--min_length', type=int, default=25, help='Min number of tokens to generate including the prompt')
     parser.add_argument('-l', '--max_length', type=int, default=50, help='Max number of tokens to generate including the prompt')
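Taken in isolation, the provider-selection flow this diff renames looks like the following; the path and provider value are placeholders, while the `hasattr` guard and config calls come straight from the script:

```python
import onnxruntime_genai as og

model_path = "genai_models/phi2-int4-cpu"  # placeholder model folder
execution_provider = "cuda"                # one of: cpu, cuda, dml

# Newer builds expose og.Config for execution-provider overrides;
# older builds construct the model directly from the folder path.
if hasattr(og, "Config"):
    config = og.Config(model_path)
    config.clear_providers()
    if execution_provider != "cpu":
        config.append_provider(execution_provider)
    model = og.Model(config)
else:
    model = og.Model(model_path)
```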

examples/python/model-qa.py

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ def main(args):
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS, description="End-to-end AI Question/Answer example for gen-ai")
-    parser.add_argument('-m', '--model', type=str, required=True, help='Onnx model folder path (must contain config.json and model.onnx)')
+    parser.add_argument('-m', '--model_path', type=str, required=True, help='Onnx model folder path (must contain genai_config.json and model.onnx)')
     parser.add_argument('-e', '--execution_provider', type=str, required=True, choices=["cpu", "cuda", "dml"], help="Execution provider to run ONNX model with")
     parser.add_argument('-i', '--min_length', type=int, help='Min number of tokens to generate including the prompt')
     parser.add_argument('-l', '--max_length', type=int, help='Max number of tokens to generate including the prompt')

examples/python/phi-3-tutorial.md

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-# Run the Phi-3 models with the ONNX Runtime generate() API
+# Run the Phi-3 models with ONNX Runtime GenAI
 
 ## Steps
 1. [Setup](#setup)
@@ -56,7 +56,7 @@ Are you on a Windows machine with GPU?
 This command downloads the model into a folder called `directml`.
 
 
-2. Install the generate() API
+2. Install ONNX Runtime GenAI
 
 ```bash
 pip install onnxruntime-genai-directml
@@ -97,7 +97,7 @@ Are you on a Windows machine with GPU?
 
 This command downloads the model into a folder called `cuda`.
 
-2. Install the generate() API
+2. Install ONNX Runtime GenAI
 
 ```bash
 pip install onnxruntime-genai-cuda
@@ -130,7 +130,7 @@ Are you on a Windows machine with GPU?
 
 This command downloads the model into a folder called `cpu_and_mobile`
 
-2. Install the generate() API for CPU
+2. Install ONNX Runtime GenAI
 
 ```bash
 pip install onnxruntime-genai
