@@ -9,7 +9,7 @@ classification model in real time (30 fps+) on the CPU.
This was all tested with Raspberry Pi 4 Model B 4GB but should work with the 2GB
variant as well as on the 3B with reduced performance.
- .. image:: https://user-images.githubusercontent.com/909104/152895495-7e9910c1-2b9f-4299-a788-d7ec43a93424.jpg
+ .. image:: https://user-images.githubusercontent.com/909104/153093710-bc736b6f-69d9-4a50-a3e8-9f2b2c9e04fd.gif

Prerequisites
~~~~~~~~~~~~~~~~
@@ -78,8 +78,7 @@ We can now check that everything installed correctly:
.. code:: shell

- $ python3 -c "import torch; print(torch.__version__)"
- 1.10.0+cpu
+ $ python -c "import torch; print(torch.__version__)"

.. image:: https://user-images.githubusercontent.com/909104/152874271-d7057c2d-80fd-4761-aed4-df6c8b7aa99f.png
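
You can also confirm that the ``qnnpack`` quantized engine used later in this
tutorial is available in your build (on an ARM build it should appear in the
list):

.. code:: shell

    $ python -c "import torch; print(torch.backends.quantized.supported_engines)"
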
@@ -116,7 +115,7 @@ shuffling to get it into the expected RGB format.
# convert opencv output from BGR to RGB
image = image[:, :, [2, 1, 0]]

- NOTE: You can get even more performance by training the model directly with OpenCV's BGR data format to remove the conversion step.
+ This data reading and processing takes about ``3.5 ms``.
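
If you'd rather let OpenCV do the conversion, ``cv2.cvtColor`` performs the
same channel reorder; a minimal equivalent, assuming ``image`` is the BGR frame
from the OpenCV capture:

.. code:: python

    import cv2

    # same BGR -> RGB channel reorder as the indexing above
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
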
Image Preprocessing
~~~~~~~~~~~~~~~~~~~~
@@ -128,11 +127,55 @@ We need to take the frames and transform them into the format the model expects.
from torchvision import transforms
preprocess = transforms.Compose([
+ # convert the frame to a CHW torch tensor for training
transforms.ToTensor(),
+ # normalize the colors to the range that mobilenet_v2/3 expect
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(image)
- input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
+ # The model can handle multiple images simultaneously so we need to add an
+ # empty dimension for the batch.
+ # [3, 224, 224] -> [1, 3, 224, 224]
+ input_batch = input_tensor.unsqueeze(0)
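+
+ A quick way to confirm the resulting shapes, assuming a 224x224 frame:
+
+ .. code:: python
+
+     print(input_tensor.shape)  # torch.Size([3, 224, 224])
+     print(input_batch.shape)   # torch.Size([1, 3, 224, 224])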
+
+ Model Choices
+ ~~~~~~~~~~~~~~~
+
+ There are a number of models you can choose from, with different performance
+ characteristics. Not all models provide a ``qnnpack`` pretrained variant, so for
+ testing purposes you should choose one that does, but if you train and quantize
+ your own model you can use any of them.
+
+ We're using ``mobilenet_v2`` for this tutorial since it has good performance and
+ accuracy.
+
+ Raspberry Pi 4 Benchmark Results:
+
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | Model              | FPS  | Total Time (ms/frame) | Model Time (ms/frame) | qnnpack Pretrained |
+ +====================+======+=======================+=======================+====================+
+ | mobilenet_v2       | 33.7 | 29.7                  | 26.4                  | True               |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | mobilenet_v3_large | 29.3 | 34.1                  | 30.7                  | True               |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | resnet18           | 9.2  | 109.0                 | 100.3                 | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | resnet50           | 4.3  | 233.9                 | 225.2                 | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | resnext101_32x8d   | 1.1  | 892.5                 | 885.3                 | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | inception_v3       | 4.9  | 204.1                 | 195.5                 | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | googlenet          | 7.4  | 135.3                 | 132.0                 | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | shufflenet_v2_x0_5 | 46.7 | 21.4                  | 18.2                  | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | shufflenet_v2_x1_0 | 24.4 | 41.0                  | 37.7                  | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | shufflenet_v2_x1_5 | 16.8 | 59.6                  | 56.3                  | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
+ | shufflenet_v2_x2_0 | 11.6 | 86.3                  | 82.7                  | False              |
+ +--------------------+------+-----------------------+-----------------------+--------------------+
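+
+ Swapping in the other ``qnnpack``-pretrained model from the table is a
+ one-line change; a minimal sketch using ``mobilenet_v3_large``:
+
+ .. code:: python
+
+     from torchvision import models
+
+     # mobilenet_v3_large also ships as a pretrained, quantized model
+     net = models.quantization.mobilenet_v3_large(pretrained=True, quantize=True)
+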
MobileNetV2: Quantization and JIT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -163,7 +206,6 @@ We then want to jit the model to reduce Python overhead and fuse any ops. Jit gi
.. code:: python

net = torch.jit.script(net)
- net.eval()
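
Note that the first few calls through a scripted model trigger the JIT's
profiling and optimization passes, so it's worth warming the model up before
trusting any timing numbers. A minimal sketch, reusing ``net`` from above:

.. code:: python

    with torch.no_grad():
        # warm-up: let the JIT profile and optimize the graph
        for _ in range(5):
            net(torch.rand(1, 3, 224, 224))
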
Putting It Together
~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -196,7 +238,6 @@ We can now put all the pieces together and run it:
net = models.quantization.mobilenet_v2(pretrained=True, quantize=True)
# jit model to take it from ~20fps to ~30fps
net = torch.jit.script(net)
- net.eval()

started = time.time()
last_logged = time.time()
@@ -243,6 +284,50 @@ If we check ``htop`` we see that we have almost 100% utilization.
.. image:: https://user-images.githubusercontent.com/909104/152892630-f094b84b-19ba-48f6-8632-1b954abc59c7.png

+ To verify that it's working end to end we can compute the probabilities of the
+ classes and
+ `use the ImageNet class labels <https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a>`_
+ to print the detections.
+
+ .. code:: python
+
+ top = list(enumerate(output[0].softmax(dim=0)))
+ top.sort(key=lambda x: x[1], reverse=True)
+ for idx, val in top[:10]:
+     print(f"{val.item()*100:.2f}% {classes[idx]}")
+
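+ Here ``classes`` is the label list from that gist. One minimal way to load it,
+ assuming you've saved the labels as ``imagenet_classes.txt``, one name per line
+ in index order (the filename is just an example):
+
+ .. code:: python
+
+     with open("imagenet_classes.txt") as f:
+         classes = [line.strip() for line in f]
+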
+ ``mobilenet_v3_large`` running in real time:
+
+ .. image:: https://user-images.githubusercontent.com/909104/153093710-bc736b6f-69d9-4a50-a3e8-9f2b2c9e04fd.gif
+
+
+ Detecting an orange:
+
+ .. image:: https://user-images.githubusercontent.com/909104/153092153-d9c08dfe-105b-408a-8e1e-295da8a78c19.jpg
+
+
+ Detecting a mug:
+
+ .. image:: https://user-images.githubusercontent.com/909104/153092155-4b90002f-a0f3-4267-8d70-e713e7b4d5a0.jpg
+
+
+ Troubleshooting: Performance
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ By default, PyTorch will use all of the cores available. If anything is
+ running in the background on the Raspberry Pi, it may contend with the
+ model inference and cause latency spikes. To alleviate this you can reduce
+ the number of threads, which reduces the peak latency at a small performance
+ penalty.
+
+ .. code:: python
+
+ torch.set_num_threads(2)
+
+ For ``shufflenet_v2_x1_5``, using ``2 threads`` instead of ``4 threads``
+ increases best-case latency to ``72 ms`` from ``60 ms`` but eliminates the
+ latency spikes of ``128 ms``.
+
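+ To check whether spikes remain on your setup, a rough per-frame timing sketch,
+ reusing the ``net`` and ``input_batch`` from the full example above:
+
+ .. code:: python
+
+     import time
+
+     with torch.no_grad():
+         for _ in range(30):
+             start = time.time()
+             net(input_batch)
+             # watch for outliers well above the typical per-frame time
+             print(f"{(time.time() - start) * 1000:.1f} ms")
+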
Next Steps
~~~~~~~~~~~~~
@@ -256,4 +341,5 @@ directly deploy with good performance on a Raspberry Pi.
See more:

* `Quantization <https://pytorch.org/docs/stable/quantization.html>`_ for more information on how to quantize and fuse your model.
- * :ref:`beginner/transfer_learning_tutorial` for how to use transfer learning to fine tune a pre-existing model to your dataset.
+ * `Transfer Learning Tutorial <https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html>`_
+ for how to use transfer learning to fine tune a pre-existing model to your dataset.