Commit 4cde8d3

training evaluation completed + export model scripts added
1 parent 95017dc commit 4cde8d3

7 files changed (+194 −20 lines)

README.md

Lines changed: 58 additions & 19 deletions
@@ -16,12 +16,12 @@ However, I will add all the details and working examples for the new comers who

## Roadmap

-This tutorial should take you from installation, to running pre-trained detection model, and training/evaluation your models with a custom dataset.
+This tutorial should take you from installation, to running a pre-trained detection model, then training your model on a custom dataset, and exporting it for inference.

1. [Installation](#installation)
2. [Inference with pre-trained models](#inference-with-pre-trained-models)
3. [Preparing your custom dataset for training](#preparing-your-custom-dataset-for-training)
-4. Training object detction model with your custom dataset
+4. [Training object detection model with your custom dataset](#training-object-detection-model-with-your-custom-dataset)
5. Exporting your trained model for inference

@@ -163,6 +163,9 @@ The raccoon dataset contains a total of 200 images with 217 raccoons, which is s
The original [dataset repo](https://github.com/datitran/raccoon_dataset) provides many scripts to deal with the dataset and randomly select train and test splits with 160 and 40 images respectively.
However, just for convenience, and to decrease the effort needed, I have included the dataset images and annotations in this repo (in [data/raccoon_data/](data/raccoon_data/)), and split them manually, taking the first 160 images for training and the last 40 images for testing.
I recommend checking the original [dataset repo](https://github.com/datitran/raccoon_dataset), along with this [article](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9) written by the author of the dataset.
+Here are some images from the raccoon dataset ([source](https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9)):
+
+![raccoon_dataset](data/samples/docs/raccoon_dataset.jpeg)


The first step to start training your model is to generate a [TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) file from the dataset annotations.
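For intuition, each record in these files is one serialized `tf.train.Example` holding the encoded image together with its normalized box coordinates and class labels. Below is a rough sketch of such a record; the feature keys follow the object detection api's conventions, but the helper and the sample values are illustrative, not the repo's actual generate_tfrecord.py code:

```python
# Illustrative sketch of one detection TFRecord entry (not the repo's exact code).
# Feature keys follow the object detection api conventions; box coordinates are
# floats normalized to [0, 1], with one list entry per object in the image.
import tensorflow as tf

def build_example(encoded_jpeg, width, height, xmins, xmaxs, ymins, ymaxs, labels, names):
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded_jpeg])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'jpeg'])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmins)),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmaxs)),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymins)),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymaxs)),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=labels)),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=names)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# One serialized Example per image; 'raccoon-1.jpg' is a hypothetical file name.
with tf.io.TFRecordWriter('train.record') as writer:
    example = build_example(open('raccoon-1.jpg', 'rb').read(), 650, 417,
                            [0.12], [0.54], [0.20], [0.85], [1], [b'raccoon'])
    writer.write(example.SerializeToString())
```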
@@ -189,7 +192,7 @@ python generate_tfrecord.py --path_to_images ../data/raccoon_data/train/images \
    --path_to_save_tfrecords ../data/raccoon_data/train.record
```

-For convenience, I have added all these steps in a shell file that you can run to generate the csv files and use them to generate the tfrecords.
+For convenience, I have added all these steps in a shell script that you can run to generate the csv files and use them to generate the tfrecords.
Simply run this shell script as follows:

```bash
@@ -205,13 +208,13 @@ Et voila, we have the tfrecord files generated, and we can use it in next steps
## Training object detection model with your custom dataset

To start training our model, we need to prepare a configuration file specifying the backbone model and all the required parameters for training and evaluation.
-In this [tutorial](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md) from the object detection api you can find explanation of all the required parameters.
+In this [tutorial](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md) from the object detection api you can find an explanation of all the required parameters.
But fortunately, they also provide us with many [example config files](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2) that we can use and just modify some parameters to match our requirements.

-Here I will be using the the config file of the SSD model with MobileNetV2 backbone as it is small model that can fit in small GPU memory.
+Here I will be using the config file of the SSD model with a MobileNetV2 backbone, as it is a small model that can fit in a small GPU memory.
So let's first download the pretrained model (trained on the coco dataset) that is provided in the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md), and use it as the initialization for our model.
This is called fine-tuning: simply loading the weights of a pretrained model and using them as a starting point for our training. This helps a lot, as we have a very small number of images.
-You can read more about transfer learning methods from [here](https://cs231n.github.io/transfer-learning/).
+You can read more about transfer learning methods [here](https://cs231n.github.io/transfer-learning/).

```bash
cd models/
@@ -224,21 +227,21 @@ tar -xzvf ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
Then you can download the original config file from [here](https://github.com/tensorflow/models/tree/master/research/object_detection/configs/tf2).
I downloaded [ssd_mobilenet_v2_320x320_coco17_tpu-8.config](https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_mobilenet_v2_320x320_coco17_tpu-8.config) and made the following changes:

-* Changed `num_classes: 1` as we have only class (raccoon), instead of 90 classes in coco dataset.
-* Changed `fine_tune_checkpoint_type: "classification"` to `fine_tune_checkpoint_type: "detection"` as we will be using the pre-trained detection model as initialization.
+* Changed `num_classes: 1` as we have only one class (raccoon), instead of the 90 classes in the coco dataset.
+* Changed `fine_tune_checkpoint_type: "classification"` to `fine_tune_checkpoint_type: "detection"` as we are using the pre-trained detection model as initialization.
* Added the path of the pretrained model in the field `fine_tune_checkpoint:`; for example, using the mobilenet v2 model I added `fine_tune_checkpoint: "../models/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"`
* Changed `batch_size: 512` to a number reasonable for my GPU memory. I have 4GB of GPU memory, so I am using `batch_size: 16`
* Added the maximum number of training iterations in `num_steps:`, and also used the same number in `total_steps:`
* Adapted the learning rate to our model and batch size (originally they used higher learning rates because they had bigger batch sizes). These values need some testing and tuning, but finally I used this configuration (a short sketch of the resulting schedule follows these snippets):
```
  cosine_decay_learning_rate {
-   learning_rate_base: 0.03
+   learning_rate_base: 0.025
    total_steps: 3000
    warmup_learning_rate: 0.005
    warmup_steps: 100
  }
```
-* The `label_map_path:` should point to your labelmap (here the raccoon labelmap) `label_map_path: "../models/raccoon_labelmap.pbtxt"`
-* You need to set the `tf_record_input_reader` under both `train_input_reader` and `eval_input_reader`. This should point to the tfrecords we generated.
+* The `label_map_path:` should point to our labelmap file (here the raccoon labelmap) `label_map_path: "../models/raccoon_labelmap.pbtxt"`
+* You need to set the `tf_record_input_reader` under both `train_input_reader` and `eval_input_reader`. This should point to the tfrecords we generated (one for training and one for validation).
```
train_input_reader: {
  label_map_path: "../models/raccoon_labelmap.pbtxt"
@@ -248,7 +251,7 @@ I downloaded [ssd_mobilenet_v2_320x320_coco17_tpu-8.config](https://github.com/t
}
```
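As referenced in the learning-rate bullet above, here is a small sketch of what that cosine-decay-with-warmup schedule does over the 3000 steps, so you can eyeball the learning rate at any step. It is a simplified approximation for intuition, not the api's exact `cosine_decay_learning_rate` implementation:

```python
# Simplified sketch of cosine decay with linear warmup, using the values
# from the config above (an approximation, not the api's exact code).
import math

def cosine_decay_lr(step, lr_base=0.025, total_steps=3000,
                    warmup_lr=0.005, warmup_steps=100):
    if step < warmup_steps:
        # linear warmup from warmup_lr up to lr_base
        return warmup_lr + (lr_base - warmup_lr) * step / warmup_steps
    # cosine decay from lr_base down to ~0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * lr_base * (1.0 + math.cos(math.pi * progress))

for s in (0, 100, 1500, 3000):
    print(f"step {s}: lr = {cosine_decay_lr(s):.5f}")
```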

-Yous should also prepare the labelmap according to your data. For our raccoon dataset, the labelmap file contains:
+You should also prepare the labelmap according to your data. For our raccoon dataset, the [labelmap file](models/raccoon_labelmap.pbtxt) contains:

```
item {
@@ -257,17 +260,18 @@ item {
}
```

-The labelmap file and the modified configuration files are added to this repo for convenience.
-You can find them is [models/raccoon_labelmap.pbtxt](models/raccoon_labelmap.pbtxt) and [models/ssd_mobilenet_v2_raccoon.config](models/ssd_mobilenet_v2_raccoon.config).
+The labelmap file and the modified configuration files are added to this repo.
+You can find them in [models/raccoon_labelmap.pbtxt](models/raccoon_labelmap.pbtxt) and [models/ssd_mobilenet_v2_raccoon.config](models/ssd_mobilenet_v2_raccoon.config).

Once you prepare the configuration file, you can start training by typing the following commands:

```bash
-cd train_tf2
+# you should run the training scripts from the train_tf2/ directory
+cd train_tf2/
bash start_train.sh
```

-The [start_train.sh](train_tf2/start_train.sh) file is a simple shell script that contains all the parameters needed for training, and runs the training script.
+The [start_train.sh](train_tf2/start_train.sh) file is a simple shell script that runs the training with all the required parameters.
The shell script contains the following command:

```bash
@@ -278,10 +282,13 @@ python model_main_tf2.py --alsologtostderr --model_dir=$out_dir --checkpoint_eve
    --eval_on_train_data 2>&1 | tee $out_dir/train.log
```

+You can notice that it actually runs [model_main_tf2.py](train_tf2/model_main_tf2.py),
+which I copied from the object detection api package (directly from the object_detection folder), and you can also download it from [here]().
+
It is also recommended to run the validation script along with the training script.
The training script saves a checkpoint every _n_ steps while training, and this value can be specified in the parameter `--checkpoint_every_n`.
-While training is running, the validation script reads these checkpoints once they are available, and use them to evaluate the model with the validation set.
-This will help us to monitor the training progress by printing the values on the terminal, or by using a GUI monitoring package like [tensorboard](https://www.tensorflow.org/tensorboard/get_started) as we will see.
+While training is running, the validation script reads these checkpoints when they are available, and uses them to evaluate the model on the validation set (from the validation tfrecord file).
+This will help us monitor the training progress by printing the validation mAP on the terminal, or by using a GUI monitoring package like [tensorboard](https://www.tensorflow.org/tensorboard/get_started) as we will see.

To run the validation script along with the training script, open another terminal and run:

@@ -299,7 +306,39 @@ python model_main_tf2.py --alsologtostderr --model_dir=$out_dir \
    --checkpoint_dir=$out_dir 2>&1 | tee $out_dir/eval.log
```

-If you don't have enough resources, you can ignore running the validation script, and run it only once when the training is done.
+Note that running the evaluation script along with the training requires another GPU dedicated to the evaluation.
+So, if you don't have enough resources, you can skip running the validation script during training, and run it only once when the training is done.
+However, I used a simple trick that allowed me to run the evaluation on the CPU while the training is running on the GPU:
+simply add this flag before running the evaluation script: `export CUDA_VISIBLE_DEVICES="-1"`. It makes all the GPUs invisible to tensorflow,
+so tensorflow will use the CPU instead. This flag is set in the [start_eval.sh](train_tf2/start_eval.sh) script, and you just need to uncomment that line before running the script.
+
+Finally, it is time to see how our training is progressing, which is a very easy task using [tensorboard](https://www.tensorflow.org/tensorboard/get_started).
+Tensorboard reads the training and evaluation log files written by tensorflow, and draws curves showing the progress of the training loss (lower is better) and the validation accuracy (higher is better).
+To run tensorboard, just open a new terminal window and run the command:
+
+```bash
+tensorboard --logdir=models/ssd_mobilenet_v2_raccoon
+```
+
+The `--logdir` argument should point to the same directory passed to the `--model_dir` argument of the training and validation scripts.
+The training and validation scripts write their logs to separate folders inside this directory; tensorboard then reads these logs and draws the curves.
+
+When you run the tensorboard command, it will not show any GUI, but will give you a link (something like `http://localhost:6006/`) that you can copy and paste into your favourite internet browser.
+Then you can see all the curves for training and validation.
+
+The total number of steps is 3000, and here is how the training loss evolved over the steps:
+
+![train_loss](data/samples/docs/train_loss.png)
+
+For the validation mAP (mean average precision) on the saved checkpoints (a checkpoint is saved every 500 steps), you can see the next curve, which represents mAP@0.5IOU.
+Note that here we have only one class, so we actually have the average precision (AP) for this class.
+mAP@0.5IOU means that detection boxes are considered good detections (true positives) if their Intersection over Union (IoU) with the ground-truth box is 0.5 or higher.
+I recommend reading this [article](https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/), which explains the idea of Intersection over Union in object detection.
+
+![val_precision](data/samples/docs/val_precision.png)
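To make the IoU criterion concrete, here is a minimal sketch (illustrative, not one of this repo's scripts) that scores a detection box against a ground-truth box, using the `[ymin, xmin, ymax, xmax]` normalized-box convention of the object detection api:

```python
# Minimal IoU sketch; boxes are [ymin, xmin, ymax, xmax] in normalized coordinates.
def iou(box_a, box_b):
    # intersection rectangle (empty if the boxes do not overlap)
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# At mAP@0.5IOU, a detection counts as a true positive only if iou >= 0.5
print(iou([0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6]))  # ~0.39, below the threshold
```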
data/samples/docs/raccoon_dataset.jpeg

202 KB (binary file)

data/samples/docs/train_loss.png

72.3 KB (binary file)

data/samples/docs/val_precision.png

62.2 KB (binary file)

models/ssd_mobilenet_v2_raccoon.config

Lines changed: 1 addition & 1 deletion
@@ -155,7 +155,7 @@ train_config: {
  momentum_optimizer: {
    learning_rate: {
      cosine_decay_learning_rate {
-       learning_rate_base: 0.035
+       learning_rate_base: 0.025
        total_steps: 3000
        warmup_learning_rate: 0.005
        warmup_steps: 100

train_tf2/export_model.sh

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
model_dir=../models/ssd_mobilenet_v2_raccoon
out_dir=$model_dir/exported_model
mkdir -p $out_dir

python exporter_main_v2.py \
    --input_type="image_tensor" \
    --pipeline_config_path=$model_dir/pipeline.config \
    --trained_checkpoint_dir=$model_dir/ \
    --output_directory=$out_dir
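Once the export finishes, the SavedModel written under `exported_model/saved_model` can be loaded directly for inference. A minimal sketch, assuming the paths from the script above (the test image file name is illustrative):

```python
# Minimal inference sketch with the exported SavedModel (paths follow
# export_model.sh above; the image file name is a hypothetical example).
import tensorflow as tf

saved_model_dir = "../models/ssd_mobilenet_v2_raccoon/exported_model/saved_model"
detect_fn = tf.saved_model.load(saved_model_dir)

# input_type="image_tensor" expects a uint8 batch of shape [1, H, W, 3]
image = tf.io.decode_jpeg(tf.io.read_file("../data/raccoon_data/test/images/raccoon-161.jpg"))
detections = detect_fn(tf.expand_dims(image, axis=0))

num = int(detections["num_detections"][0])
boxes = detections["detection_boxes"][0][:num].numpy()   # [ymin, xmin, ymax, xmax], normalized
scores = detections["detection_scores"][0][:num].numpy()
print(boxes[scores > 0.5])
print(scores[scores > 0.5])
```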

train_tf2/exporter_main_v2.py

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
# Lint as: python2, python3
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

r"""Tool to export an object detection model for inference.

Prepares an object detection tensorflow graph for inference using model
configuration and a trained checkpoint. Outputs associated checkpoint files,
a SavedModel, and a copy of the model config.

The inference graph contains one of four input nodes depending on the user
specified option.
  * `image_tensor`: Accepts a uint8 4-D tensor of shape [1, None, None, 3]
  * `float_image_tensor`: Accepts a float32 4-D tensor of shape
    [1, None, None, 3]
  * `encoded_image_string_tensor`: Accepts a 1-D string tensor of shape [None]
    containing encoded PNG or JPEG images. Image resolutions are expected to be
    the same if more than 1 image is provided.
  * `tf_example`: Accepts a 1-D string tensor of shape [None] containing
    serialized TFExample protos. Image resolutions are expected to be the same
    if more than 1 image is provided.

and the following output nodes returned by the model.postprocess(..):
  * `num_detections`: Outputs float32 tensors of the form [batch]
    that specifies the number of valid boxes per image in the batch.
  * `detection_boxes`: Outputs float32 tensors of the form
    [batch, num_boxes, 4] containing detected boxes.
  * `detection_scores`: Outputs float32 tensors of the form
    [batch, num_boxes] containing class scores for the detections.
  * `detection_classes`: Outputs float32 tensors of the form
    [batch, num_boxes] containing classes for the detections.


Example Usage:
--------------
python exporter_main_v2.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/ssd_inception_v2.config \
    --trained_checkpoint_dir path/to/checkpoint \
    --output_directory path/to/exported_model_directory

The expected output would be in the directory
path/to/exported_model_directory (which is created if it does not exist)
holding two subdirectories (corresponding to checkpoint and SavedModel,
respectively) and a copy of the pipeline config.

Config overrides (see the `config_override` flag) are text protobufs
(also of type pipeline_pb2.TrainEvalPipelineConfig) which are used to override
certain fields in the provided pipeline_config_path. These are useful for
making small changes to the inference graph that differ from the training or
eval config.

Example Usage (in which we change the second stage post-processing score
threshold to be 0.5):

python exporter_main_v2.py \
    --input_type image_tensor \
    --pipeline_config_path path/to/ssd_inception_v2.config \
    --trained_checkpoint_dir path/to/checkpoint \
    --output_directory path/to/exported_model_directory \
    --config_override " \
        model{ \
          faster_rcnn { \
            second_stage_post_processing { \
              batch_non_max_suppression { \
                score_threshold: 0.5 \
              } \
            } \
          } \
        }"
"""
from absl import app
from absl import flags

import tensorflow.compat.v2 as tf
from google.protobuf import text_format
from object_detection import exporter_lib_v2
from object_detection.protos import pipeline_pb2

tf.enable_v2_behavior()


FLAGS = flags.FLAGS

flags.DEFINE_string('input_type', 'image_tensor', 'Type of input node. Can be '
                    'one of [`image_tensor`, `encoded_image_string_tensor`, '
                    '`tf_example`, `float_image_tensor`]')
flags.DEFINE_string('pipeline_config_path', None,
                    'Path to a pipeline_pb2.TrainEvalPipelineConfig config '
                    'file.')
flags.DEFINE_string('trained_checkpoint_dir', None,
                    'Path to trained checkpoint directory')
flags.DEFINE_string('output_directory', None, 'Path to write outputs.')
flags.DEFINE_string('config_override', '',
                    'pipeline_pb2.TrainEvalPipelineConfig '
                    'text proto to override pipeline_config_path.')

flags.mark_flag_as_required('pipeline_config_path')
flags.mark_flag_as_required('trained_checkpoint_dir')
flags.mark_flag_as_required('output_directory')


def main(_):
  pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
  with tf.io.gfile.GFile(FLAGS.pipeline_config_path, 'r') as f:
    text_format.Merge(f.read(), pipeline_config)
  text_format.Merge(FLAGS.config_override, pipeline_config)
  exporter_lib_v2.export_inference_graph(
      FLAGS.input_type, pipeline_config, FLAGS.trained_checkpoint_dir,
      FLAGS.output_directory)


if __name__ == '__main__':
  app.run(main)
