|
| 1 | +# Using ONNX to deploy a finetuned model |
| 2 | + |
| 3 | +Suppose you have used Finetuner to finetune a model for your search problem on data from your own domain. What comes next? |
| 4 | + |
| 5 | +Naturally, you would want to deploy your {term}`embedding model` in a service and use it to |
| 6 | +encode data as part of a bigger search application. [Jina](https://docs.jina.ai/) |
| 7 | +provides the infrastructure layer to help you build and deploy neural search components. |
| 8 | +By implementing custom `Executor`s or using existing ones from [Jina Hub](https://hub.jina.ai/) |
| 9 | +you can define the building blocks of your search application and glue everything |
| 10 | +together in a fully-fledged search pipeline. |
| 11 | + |
| 12 | +To use your model with Jina, you would have to build a new `Executor` tailored to your model and upload it to Jina Hub. You can then use a Jina `Flow` to deploy your search |
| 13 | +application locally or in the cloud using Kubernetes. |
| 14 | + |
| 15 | +It is difficult to provide a unified `Executor` that can support all models trained with Finetuner, |
| 16 | +since Finetuner supports different computational frameworks and each model differs in terms |
| 17 | +of the pre-processing required, the input type, and so on. |
| 18 | + |
| 19 | +There is an alternative, though, that can help standardize your deployment |
| 20 | +procedure and unify different models. Finetuner can convert finetuned models |
| 21 | +to the ONNX format, whether they were trained with PyTorch, |
| 22 | +TensorFlow or PaddlePaddle. The ONNX format is an open standard for representing |
| 23 | +neural network models. You can find out more on the [ONNX webpage](https://onnx.ai/). |
| 24 | + |
| 25 | +After converting a trained model to the ONNX format, you can use the |
| 26 | +[ONNXEncoder](https://hub.jina.ai/executor/2cuinbko) from Jina Hub to deploy your {term}`embedding |
| 27 | +model`. The encoder simply loads your {term}`embedding model` in ONNX format and uses the ONNX |
| 28 | +runtime for inference. The `Document` tensors are fed to the ONNX model as input |
| 29 | +and the output NumPy vectors are assigned to the `Document` as embeddings. |
| 30 | + |
| 31 | +The `ONNXEncoder` takes care of the embedding step, so the only thing left is to add a custom |
| 32 | +`Executor` for pre-processing if your model needs one. Additionally, you may gain |
| 33 | +efficiency, since ONNX Runtime can apply graph optimizations and use execution providers suited to your hardware. |
| 34 | + |
| 35 | +This tutorial walks you through finetuning a simple model, converting it to |
| 36 | +ONNX, and using the `ONNXEncoder` to deploy it in a Jina `Flow`. |
| 37 | + |
| 38 | +````{info} |
| 39 | +Jina, onnx, torch and torchvision are required to run this example. You can install the packages using: |
| 40 | + |
| 41 | +```bash |
| 42 | +pip install 'jina>=3.0' onnx torch torchvision |
| 43 | +``` |
| 44 | +```` |
| 45 | + |
| 46 | + |
| 47 | +## Exporting to ONNX |
| 48 | + |
| 49 | +Let's finetune a ResNet18 on a customized CelebA dataset. |
| 50 | +[Download the dataset](https://static.jina.ai/celeba/celeba-img.zip) and decompress it to |
| 51 | +`'./img_align_celeba'`. |
| 52 | + |
| 53 | +```python |
| 54 | +import torchvision |
| 55 | +from docarray import DocumentArray |
| 56 | + |
| 57 | +import finetuner as ft |
| 58 | + |
| 59 | +data = DocumentArray.from_files('img_align_celeba/*.jpg') |
| 60 | + |
| 61 | + |
| 62 | +def preprocess(doc): |
| 63 | + return ( |
| 64 | + doc.load_uri_to_image_tensor(224, 224) |
| 65 | + .set_image_tensor_normalization() |
| 66 | + .set_image_tensor_channel_axis(-1, 0) |
| 67 | + ) |
| 68 | + |
| 69 | + |
| 70 | +data.apply(preprocess) |
| 71 | + |
| 72 | +resnet = torchvision.models.resnet18(pretrained=True) |
| 73 | + |
| 74 | +tuned_model = ft.fit( |
| 75 | + model=resnet, |
| 76 | + train_data=data, |
| 77 | + loss='TripletLoss', |
| 78 | + epochs=20, |
| 79 | + batch_size=128, |
| 80 | + to_embedding_model=True, |
| 81 | + input_size=(3, 224, 224), |
| 82 | + layer_name='adaptiveavgpool2d_67', |
| 83 | + freeze=False, |
| 84 | +) |
| 85 | + |
| 86 | +``` |
| 87 | + |
| 88 | +We can now export the model to the ONNX format, using the `to_onnx` method provided |
| 89 | +by Finetuner. The `onnx` package is required to use this functionality. If |
| 90 | +you are using TensorFlow or PaddlePaddle, you will also need to install an additional package |
| 91 | +for the ONNX conversion: |
| 92 | +[tf2onnx](https://github.com/onnx/tensorflow-onnx) for TensorFlow and |
| 93 | +[paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX) for PaddlePaddle. |
| 94 | + |
| 95 | +```bash |
| 96 | +pip install onnx |
| 97 | +pip install tf2onnx  # only needed for TensorFlow models |
| 98 | +pip install paddle2onnx  # only needed for PaddlePaddle models |
| 99 | +``` |
| 100 | + |
| 101 | +Back to the ONNX conversion: |
| 102 | + |
| 103 | +```python |
| 104 | +from finetuner.tuner.onnx import to_onnx |
| 105 | + |
| 106 | +to_onnx(tuned_model, 'tuned-model.onnx', input_shape=[3, 224, 224], opset_version=13) |
| 107 | +``` |
| 108 | + |
| 109 | +The exported ONNX model should now be stored at `tuned-model.onnx`. |
| 110 | + |
| 111 | +A helper function for validating the ONNX export is also provided: |
| 112 | +```python |
| 113 | +from finetuner.tuner.onnx import validate_onnx_export |
| 114 | + |
| 115 | +validate_onnx_export(tuned_model, 'tuned-model.onnx', input_shape=[3, 224, 224]) |
| 116 | +``` |
| 117 | + |
| 118 | +This function compares the outputs of the native model and its ONNX counterpart against |
| 119 | +the same random input. If the outputs differ, an assertion error is raised. |
| 120 | + |
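The kind of check described above can be sketched in plain Python. This is a minimal illustration, not Finetuner's actual implementation: `assert_outputs_close`, `native_model`, `exported_model` and the tolerance values are all hypothetical names and assumptions. In practice the two outputs would come from the native model's forward pass and from an ONNX inference session run on the same input.

```python
import math
import random


def assert_outputs_close(native_out, exported_out, rel_tol=1e-4, abs_tol=1e-6):
    """Raise AssertionError if two output vectors differ beyond the tolerances."""
    assert len(native_out) == len(exported_out), 'output shapes differ'
    for i, (a, b) in enumerate(zip(native_out, exported_out)):
        assert math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol), (
            f'outputs differ at index {i}: {a} vs {b}'
        )


# Hypothetical stand-ins for the native model and its ONNX export.
def native_model(x):
    return [2.0 * v + 1.0 for v in x]


def exported_model(x):
    # Same computation with a tiny numerical drift, as an export might show.
    return [2.0 * v + 1.0 + 1e-9 for v in x]


random.seed(0)
x = [random.random() for _ in range(8)]  # the same random input for both models

# Passes silently: the drift is far below the tolerances.
assert_outputs_close(native_model(x), exported_model(x))
```

Comparing with a tolerance rather than exact equality matters here, because an exported model can legitimately differ from the original in the last few floating-point digits.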
| 121 | +Now that we have our model exported and have verified that it has the same behavior as the |
| 122 | +original, it's time to deploy it 🚀. |
| 123 | + |
| 124 | + |
| 125 | +## Deploying using the `ONNXEncoder` |
| 126 | + |
| 127 | +You have finetuned your own model and converted it to the ONNX format. Now let's deploy it in a Jina `Flow`. If you are not familiar with Jina Hub or Jina `Flow`, see |
| 128 | +[Use Hub Executor](https://docs.jina.ai/advanced/hub/use-hub-executor/) and |
| 129 | +[Use Jina Flow](https://docs.jina.ai/fundamentals/flow/). |
| 130 | + |
| 131 | +Here are the steps you can follow: |
| 132 | + |
| 133 | +### Add `ONNXEncoder` to Jina `Flow` |
| 134 | + |
| 135 | +The `ONNXEncoder` is already available on Jina Hub, so let's add it to a Jina `Flow`. |
| 136 | + |
| 137 | +Using the Docker image: |
| 138 | + |
| 139 | +```python |
| 140 | +from jina import Flow |
| 141 | + |
| 142 | +f = Flow().add(uses='jinahub+docker://ONNXEncoder', |
| 143 | + uses_with={'model_path': 'tuned-model.onnx'}) |
| 144 | +``` |
| 145 | + |
| 146 | +Or using the source code: |
| 147 | + |
| 148 | +```python |
| 149 | +from jina import Flow |
| 150 | + |
| 151 | +f = Flow().add(uses='jinahub://ONNXEncoder', |
| 152 | + uses_with={'model_path': 'tuned-model.onnx'}) |
| 153 | +``` |
| 154 | + |
| 155 | +`model_path` is the path of the ONNX model you exported. |
| 156 | + |
| 157 | +### Complete the `Flow` by adding an indexer and starting the service |
| 158 | + |
| 159 | +```python |
| 160 | +import numpy as np |
| 161 | +from docarray import DocumentArray |
| 162 | + |
| 163 | +from jina import Executor, Flow, requests |
| 164 | + |
| 165 | + |
| 166 | +class SimpleIndexer(Executor): |
| 167 | + |
| 168 | + def __init__(self, **kwargs): |
| 169 | + super().__init__(**kwargs) |
| 170 | + self._da = DocumentArray() |
| 171 | + |
| 172 | + @requests(on='/index') |
| 173 | + def index(self, docs: DocumentArray, **kwargs): |
| 174 | + self._da.extend(docs) |
| 175 | + |
| 176 | + @requests(on='/search') |
| 177 | + def search(self, docs: DocumentArray, **kwargs): |
| 178 | + docs.match(self._da) |
| 179 | + |
| 180 | + |
| 181 | +f = Flow(port_expose=12345, |
| 182 | + protocol='http').add(uses='jinahub://ONNXEncoder', |
| 183 | + uses_with={'model_path': 'tuned-model.onnx'}, |
| 184 | + name='Encoder').add(uses=SimpleIndexer, |
| 185 | + name='Indexer') |
| 186 | + |
| 187 | +with f: |
| 188 | + f.post('/index', data, show_progress=True) |
| 189 | + f.block() |
| 190 | +``` |
| 191 | + |
| 192 | +You will see something like this: |
| 193 | + |
| 194 | +```bash |
| 195 | + Flow@6260[I]:🎉 Flow is ready to use! |
| 196 | + 🔗 Protocol: HTTP |
| 197 | + 🏠 Local access: 0.0.0.0:12345 |
| 198 | + 🔒 Private network: 192.168.31.213:12345 |
| 199 | + 💬 Swagger UI: http://localhost:12345/docs |
| 200 | + 📚 Redoc: http://localhost:12345/redoc |
| 201 | +⠼ DONE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸ 0:00:14 100% ETA: 0 seconds 80 steps done in 14 seconds |
| 202 | +``` |
| 203 | + |
| 204 | +which means that the service has been successfully started. |
| 205 | + |
| 206 | +### Start to query! |
| 207 | + |
| 208 | +Finally, it's time to see what our service returns. Start a client and query the service we have just built. |
| 209 | + |
| 210 | +```python |
| 211 | +from docarray import DocumentArray |
| 212 | + |
| 213 | +from jina import Client |
| 214 | +from jina.types.request.data import Response |
| 215 | + |
| 216 | +# the callback function invoked when task is done |
| 217 | +def print_matches(resp: Response): |
| 218 | + |
| 219 | + # print top-3 matches for first doc |
| 220 | + for idx, d in enumerate(resp.docs[0].matches[:3]): |
| 221 | + print(f'[{idx}]{d.scores["cosine"].value:2f}') |
| 222 | + |
| 223 | + |
| 224 | +data = DocumentArray.from_files('img_align_celeba/*.jpg') |
| 225 | + |
| 226 | +def preprocess(doc): |
| 227 | + return ( |
| 228 | + doc.load_uri_to_image_tensor(224, 224) |
| 229 | + .set_image_tensor_normalization() |
| 230 | + .set_image_tensor_channel_axis(-1, 0) |
| 231 | + ) |
| 232 | + |
| 233 | + |
| 234 | +data.apply(preprocess) |
| 235 | + |
| 236 | +# connect to localhost:12345 |
| 237 | +c = Client(protocol='http', port=12345) |
| 238 | +c.post('/search', inputs=data, on_done=print_matches) |
| 239 | +``` |
| 240 | + |
| 241 | +The output will look like this: |
| 242 | + |
| 243 | +```bash |
| 244 | +[0]0.000001 |
| 245 | +[1]0.080417 |
| 246 | +[2]0.115125 |
| 247 | +``` |
| 248 | + |
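| 249 | +The number in brackets is the rank of each match, and the value after it is the cosine distance to the query image. The top match has a distance close to zero because the query image is itself in the index. |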
| 250 | + |
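To make the cosine distance in the output concrete, here is a minimal sketch assuming the usual definition, one minus the cosine similarity of two embedding vectors. The `cosine_distance` helper and the three-dimensional vectors are hypothetical and chosen only for illustration; real embeddings from the finetuned model have many more dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings: `same` duplicates the query, `other` does not.
query = [0.2, 0.5, 0.1]
same = [0.2, 0.5, 0.1]
other = [0.5, 0.1, 0.4]

print(f'{cosine_distance(query, same):.6f}')   # very close to zero
print(f'{cosine_distance(query, other):.6f}')  # clearly larger
```

A distance near zero means the two embeddings point in almost the same direction, which is why an image that was also indexed comes back as its own nearest match.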
| 251 | +Congratulations! You have built a pipeline that finetunes a model, converts it to ONNX, and deploys it in a Jina `Flow`. |