## Roadmap
This tutorial should take you from installation, to running a pre-trained detection model, then training your model with a custom dataset, and finally exporting your model for inference.
1. [Installation](#installation)
2. [Inference with pre-trained models](#inference-with-pre-trained-models)
3. [Preparing your custom dataset for training](#preparing-your-custom-dataset-for-training)
4. [Training object detection model with your custom dataset](#training-object-detection-model-with-your-custom-dataset)
5. Exporting your trained model for inference
The raccoon dataset contains a total of 200 images with 217 raccoons, which is a very small dataset.
The original [dataset repo](https://github.com/datitran/raccoon_dataset) provides many scripts to deal with the dataset and randomly select train and test splits with 160 and 40 images respectively.
However, just for convenience, and to decrease the effort needed, I have included the dataset images and annotations in this repo (in [data/raccoon_data/](data/raccoon_data/)), and split them manually, taking the first 160 images for training and the last 40 images for testing.
I recommend checking the original [dataset repo](https://github.com/datitran/raccoon_dataset), along with this [article](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9) written by the author of the dataset.

Here are some images from the raccoon dataset ([source](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9)).
The first step to start training your model is to generate [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) files from the dataset annotations.
For convenience, I have added all these steps in a shell script that you can run to generate the csv files and use them to generate the tfrecords.
So simply run this shell script as follows (the script name below is a placeholder; use the shell file provided in this repo):

```bash
# script name assumed for illustration -- use the tfrecord-generation script included in this repo
bash generate_tfrecords.sh
```
Et voila, we have the tfrecord files generated, and we can use them in the next steps.
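If you want to sanity-check the generated files, you can count the records with a quick one-liner (the tfrecord path below is an assumption; use the path your script actually wrote to):

```bash
# count the examples stored in a generated tfrecord file (path is a placeholder)
python -c "import tensorflow as tf; print(sum(1 for _ in tf.data.TFRecordDataset('../data/raccoon_data/train.record')))"
```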
## Training object detection model with your custom dataset
To start training our model, we need to prepare a configuration file specifying the backbone model and all the required parameters for training and evaluation.
In this [tutorial](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md) from the object detection api you can find an explanation of all the required parameters.
But fortunately, they also provide us with many [example config files](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2) that we can use and just modify some parameters to match our requirements.
Here I will be using the config file of the SSD model with a MobileNetV2 backbone, as it is a small model that can fit in a small GPU memory.
So let's first download the pretrained model (trained on the COCO dataset) that is provided in the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md), and use it as an initialization for our model.
This is called fine-tuning, which simply means loading the weights of a pretrained model and using them as a starting point for our training. This helps a lot, as we have a very small number of images.
You can read more about transfer learning methods [here](https://cs231n.github.io/transfer-learning/).

```bash
cd models/
# download the pre-trained SSD MobileNetV2 checkpoint from the TF2 detection model zoo
# (the download URL is assumed here; take the exact link from the model zoo page above)
wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
tar -xzvf ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
```
Then you can download the original config file from [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).
I downloaded [ssd_mobilenet_v2_320x320_coco17_tpu-8.config](https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config) and made the following changes:
* Changed `num_classes: 1` as we have only one class (raccoon), instead of the 90 classes in the coco dataset.
* Changed `fine_tune_checkpoint_type: "classification"` to `fine_tune_checkpoint_type: "detection"` as we are using the pre-trained detection model as initialization.
* Added the path of the pretrained model in the field `fine_tune_checkpoint:`. For example, using the mobilenet v2 model, I added `fine_tune_checkpoint: "../models/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"`
* Changed `batch_size: 512` to a reasonable number for my GPU memory. I have 4GB of GPU memory, so I am using `batch_size: 16`
* Added the maximum number of training iterations in `num_steps:`, and also used the same number in `total_steps:`
* Adapted the learning rate to our model and batch size (originally they used higher learning rates because they had bigger batch sizes). These values need some testing and tuning, but finally I used this configuration:
```
cosine_decay_learning_rate {
  learning_rate_base: 0.025
  total_steps: 3000
  warmup_learning_rate: 0.005
  warmup_steps: 100
}
```
* The `label_map_path:` should point to our labelmap file (here the raccoon labelmap) `label_map_path: "../models/raccoon_labelmap.pbtxt"`
* You need to set the `tf_record_input_reader` under both `train_input_reader` and `eval_input_reader`. This should point to the tfrecords we generated (one for training and one for validation).
For example, the `train_input_reader` section looks like this (the tfrecord path here is a placeholder; point it to the file generated earlier, and do the same for `eval_input_reader` with the validation tfrecord):

```
train_input_reader: {
  label_map_path: "../models/raccoon_labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "../data/raccoon_data/train.record"
  }
}
```
You should also prepare the labelmap according to your data. For our raccoon dataset, the [labelmap file](models/raccoon_labelmap.pbtxt) contains:
```
item {
  id: 1
  name: 'raccoon'
}
```
The labelmap file and the modified configuration files are added to this repo.
You can find them in [models/raccoon_labelmap.pbtxt](models/raccoon_labelmap.pbtxt) and [models/ssd_mobilenet_v2_raccoon.config](models/ssd_mobilenet_v2_raccoon.config).
Once you prepare the configuration file, you can start training by typing the following commands:

```bash
# you should run the training scripts from the train_tf2/ directory
cd train_tf2/
bash start_train.sh
```
The [start_train.sh](train_tf2/start_train.sh) file is a simple shell script that runs the training with all the required parameters.
You can notice that it actually runs [model_main_tf2.py](train_tf2/model_main_tf2.py),
which I copied from the object detection api package (it is directly in the object_detection folder), and you can also download it from [here]().
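For reference, the training invocation inside [start_train.sh](train_tf2/start_train.sh) looks roughly like the sketch below; the config path, output directory, and checkpoint interval are assumptions here, so check the script itself for the exact values:

```bash
# rough sketch of the training command run by start_train.sh (paths and values are placeholders)
out_dir=../models/ssd_mobilenet_v2_raccoon/
mkdir -p $out_dir
python model_main_tf2.py \
    --pipeline_config_path=../models/ssd_mobilenet_v2_raccoon.config \
    --model_dir=$out_dir \
    --checkpoint_every_n=500 \
    --alsologtostderr 2>&1 | tee $out_dir/train.log
```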
It is also recommended to run the validation script along with the training scripts.
The training script saves a checkpoint every _n_ steps while training, and this value can be specified in the parameter `--checkpoint_every_n`.
While training is running, the validation script reads these checkpoints when they are available, and uses them to evaluate the model with the validation set (from the validation tfrecord file).
This will help us to monitor the training progress by printing the validation mAP on the terminal, or by using a GUI monitoring package like [tensorboard](https://www.tensorflow.org/tensorboard/get_started) as we will see.
To run the validation script along with the training script, open another terminal and run the evaluation command (the first lines of the sketch below are assumptions; see [start_eval.sh](train_tf2/start_eval.sh) for the exact values):

```bash
# output directory and config path assumed here; see start_eval.sh for the exact command
out_dir=../models/ssd_mobilenet_v2_raccoon/
python model_main_tf2.py \
    --pipeline_config_path=../models/ssd_mobilenet_v2_raccoon.config \
    --model_dir=$out_dir \
    --checkpoint_dir=$out_dir 2>&1 | tee $out_dir/eval.log
```
Note that running the evaluation script along with the training requires another GPU dedicated for the evaluation.
So, if you don't have enough resources, you can ignore running the validation script, and run it only once when the training is done.
However, I used a simple trick that allowed me to run the evaluation on the CPU, while the training is running on the GPU.
Simply by adding the flag `export CUDA_VISIBLE_DEVICES="-1"` before running the evaluation script, all the GPUs become invisible to tensorflow,
so it will use the CPU instead. This flag is set in the [start_eval.sh](train_tf2/start_eval.sh) script, and you just need to uncomment this line before running the script.
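In practice, the trick is just these two lines (a minimal sketch, assuming you run the evaluation through the start_eval.sh script):

```bash
# hide all GPUs from tensorflow so the evaluation runs on the CPU,
# while the training keeps the GPU for itself
export CUDA_VISIBLE_DEVICES="-1"
bash start_eval.sh
```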
Finally, it is time to see how our training is progressing, which is a very easy task using [tensorboard](https://www.tensorflow.org/tensorboard/get_started).
Tensorboard reads the training and evaluation log files written by tensorflow, and draws different curves showing the progress of the training loss values (lower is better), and validation accuracy (higher is better).
To run the tensorboard, just open a new terminal window and run the command:
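A typical invocation is shown below (the directory name is an assumption; use whatever you passed as `--model_dir` during training):

```bash
# point tensorboard at the training output directory (path is a placeholder)
tensorboard --logdir=../models/ssd_mobilenet_v2_raccoon/
```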
The `--logdir` argument should point to the same directory as passed to the `--model_dir` argument used in the training and validation scripts.
The training and validation scripts write their logs in separate folders inside this directory, then tensorboard reads these logs and draws the curves.
When you run the tensorboard command, it will not show any GUI, but will give you a link (something like `http://localhost:6006/`) that you can copy and paste in your favourite internet browser.
Then you can see all the curves for training and validation.
The total number of steps is 3000, and here is how the training loss evolved with the steps:
For the validation mAP (mean average precision) with the saved checkpoints (a checkpoint is saved every 500 steps), you can see the next curve, which represents mAP@0.5IoU.
Note that here we have only one class, so we actually have the average precision (AP) for this class.
mAP@0.5IoU means that detection boxes are considered good detections (true positives) if their Intersection over Union (IoU, the area of overlap divided by the area of union of the two boxes) with the ground truth box is 0.5 or higher.
I recommend reading this [article](https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/), which explains the idea of Intersection over Union in object detection.