### Updates
This PR adds support for the ORT Extensions change introduced in [this
PR](microsoft/onnxruntime-extensions#998), which
allows passing an **options map** to `OrtxCreateTokenizerWithOptions`
when creating a tokenizer. It also adds a new `OrtxUpdateTokenizerOptions`
method that updates the options map on an existing tokenizer
object (including one created with `OrtxCreateTokenizer`), enabling
more flexible configuration.
It additionally removes the previously added `OrtxTokenizeWithOptions`
and `OrtxDetokenize1DWithOptions` functions, which are now redundant.
With the new design, **options are set once on the tokenizer object
itself**, so there’s no longer a need to pass ad-hoc option sets into
individual tokenize/detokenize calls. This reduces API clutter and
simplifies the C interface.
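The options-on-the-object design can be illustrated with a small, self-contained sketch. `ToyTokenizer` and its whitespace-splitting `encode` are hypothetical stand-ins written only for this illustration; they are not the ORT Extensions or GenAI implementation, and the `"<s>"` token is an assumption.

```python
class ToyTokenizer:
    """Hypothetical stand-in; not the ORT Extensions or GenAI tokenizer."""

    def __init__(self, **options):
        # Options live on the object, mirroring a create-with-options call.
        self._options = dict(options)

    def update_options(self, **options):
        # Merge new entries into the stored map; untouched keys keep their values.
        self._options.update(options)

    def encode(self, text):
        # No per-call options argument: behaviour comes from the stored map.
        tokens = text.split()
        if self._options.get("add_bos_token") == "true":
            tokens.insert(0, "<s>")  # "<s>" is an assumed BOS marker
        return tokens


tok = ToyTokenizer()
before = tok.encode("hello world")
tok.update_options(add_bos_token="true")
after = tok.encode("hello world")
```

Because the option is stored once, every later `encode` call picks it up without any extra argument, which is the simplification the removed `...WithOptions` functions made redundant.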
In addition to the C API updates, it also adds bindings for C++, C# and
Python.
### Sample Usage
C++
```
auto tokenizer = OgaTokenizer::Create(*model);
// Define tokenizer options as C-style arrays
const char* keys[] = {"add_bos_token", "trim_offsets"};
const char* values[] = {"true", "false"};
// Update tokenizer options
tokenizer->UpdateOptions(keys, values, 2);
```
C#
```
var tokenizer = new Tokenizer(model);
// Update tokenizer options using a dictionary
var options = new Dictionary<string, string>
{
    { "add_bos_token", "true" },
    { "trim_offsets", "false" }
};
tokenizer.UpdateOptions(options);
```
Python
```
tokenizer = Tokenizer(model)
options = {
    "add_bos_token": "true",
    "trim_offsets": "false"
}
tokenizer.update_options(**options)
```
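The samples pass option values as the strings `"true"` and `"false"`. If the calling code holds Python booleans, a small helper along these lines could do the conversion first. `to_option_strings` is a hypothetical name, and the assumption that every value should be a lowercase string is drawn only from the samples above.

```python
def to_option_strings(options):
    """Convert Python values to the string form used in the samples above."""
    converted = {}
    for key, value in options.items():
        if isinstance(value, bool):
            # Assumed convention: booleans become lowercase "true"/"false".
            converted[key] = "true" if value else "false"
        else:
            converted[key] = str(value)
    return converted


options = to_option_strings({"add_bos_token": True, "trim_offsets": False})
# tokenizer.update_options(**options)  # tokenizer as created in the Python sample
```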
---------
Co-authored-by: Sayan Shaw <[email protected]>