google_object_detection

The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models (GitHub link).

Note: TensorFlow's GitHub organization is tensorflow, which hosts two particularly important repos: the TensorFlow source repo tensorflow/tensorflow, and tensorflow/models. The latter contains Google's official models built with TensorFlow, essentially example code: Slim for image classification, deep-text OCR, the syntactic parsing model SyntaxNet for NLP tasks, Seq2Seq with Attention, and so on. The newly released Object Detection API also lives in tensorflow/models. The release ships several networks trained on COCO: SSD+MobileNet, SSD+Inception, R-FCN+ResNet101, Faster RCNN+ResNet101, Faster RCNN+Inception_ResNet, and faster_rcnn_nas. More models should be added over time.

Installation in Ubuntu

Dependencies

  • Tensorflow
  • Protobuf 2.6
  • Pillow 1.0
  • lxml
  • tf Slim (which is included in the "tensorflow/models/research/" checkout)
  • Jupyter notebook
  • Matplotlib
    conda install tensorflow   # or pip install tensorflow
    sudo apt-get install protobuf-compiler python-pil python-lxml
    sudo pip install jupyter
    sudo pip install matplotlib

Protobuf Compilation

The Tensorflow Object Detection API uses Protobufs to configure model and training parameters. Before the framework can be used, the Protobuf libraries must be compiled. This should be done by running the following command from the tensorflow/models/research/ directory:

# just need to git clone tensorflow/models
# From models/research/
protoc object_detection/protos/*.proto --python_out=.

Add Libraries to PYTHONPATH

When running locally, the tensorflow/models/research/ and slim directories should be appended to PYTHONPATH. This can be done by running the following from tensorflow/models/research/:

# From models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Note: this command needs to be run in every new terminal you start. If you wish to avoid running it manually, you can add it as a new line to the end of your ~/.bashrc file.
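Note that the backtick `pwd` version only works from models/research/, so use absolute paths in ~/.bashrc; a sketch, assuming the repo was cloned to $HOME/tensorflow/models (a placeholder path):

# In ~/.bashrc
export PYTHONPATH=$PYTHONPATH:$HOME/tensorflow/models/research:$HOME/tensorflow/models/research/slim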

Testing the Installation (optional)

# From models/research/
python object_detection/builders/model_builder_test.py

Run a Demo

# From models
jupyter-notebook

Open the object_detection folder and run object_detection_tutorial.ipynb.
Run the cells in order with Shift+Enter. The demo automatically downloads and runs the fastest model, SSD+MobileNet (pre-trained on the COCO dataset), and finally displays two images:

Figure 1
Figure 2

Note: wait a while, then you will see the images.

Test Your Images with Pre-trained Model

Model Selection

MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'

MODEL_NAME = 'ssd_inception_v2_coco_11_06_2017'

MODEL_NAME = 'rfcn_resnet101_coco_11_06_2017'

MODEL_NAME = 'faster_rcnn_resnet101_coco_11_06_2017'

MODEL_NAME = 'faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'

MODEL_NAME = 'faster_rcnn_nas_coco_24_10_2017'

Model comparison:

Note: to save time, download the models from the link to the local directory models/research/object_detection/ and unzip them there, so the notebook does not re-download them.
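For example, for the R-FCN model used below (the download URL base comes from the tutorial notebook's DOWNLOAD_BASE):

# From models/research/object_detection/
wget http://download.tensorflow.org/models/object_detection/rfcn_resnet101_coco_11_06_2017.tar.gz
tar -xzvf rfcn_resnet101_coco_11_06_2017.tar.gz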

Modify Demo Code

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import time
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

from utils import label_map_util
from utils import visualization_utils as vis_util

# What model to download.
MODEL_NAME = 'rfcn_resnet101_coco_11_06_2017'  # replace with your chosen model

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

NUM_CLASSES = 90  # 90 object classes; details in models/research/object_detection/data/mscoco_label_map.pbtxt

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)


# Test images: this expects files 0.png .. 9.png in the test_images directory.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR, '{}.png'.format(i)) for i in range(0, 10)]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        # Define input and output tensors for detection_graph.
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the confidence level for each object.
        # The score is shown on the result image, together with the class label.
        detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
        detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        for image_path in TEST_IMAGE_PATHS:
            current_time = time.time()
            image = Image.open(image_path)
            # The array-based representation of the image will be used later
            # to prepare the result image with boxes and labels on it.
            image_np = load_image_into_numpy_array(image)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Actual detection.
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            print("time: {}".format(time.time() - current_time))
            print("box: {}".format(boxes), boxes.shape)
            print("score: {}".format(scores))
            print("class: {}".format(classes))
            print("num: {}".format(num))
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=1)
            plt.figure(figsize=IMAGE_SIZE)
            plt.imshow(image_np)
            plt.show()  # close the image window to keep the code running

Main Parameters

numpy.squeeze(): removes axes of length 1.

  • boxes: a numpy array of shape [N, 4], holding normalized (relative)
    coordinates.
  • scores: a numpy array of shape [N] or None, holding probabilities sorted
    in descending order. If scores=None, then this function assumes that the
    boxes to be plotted are groundtruth boxes and plots all boxes as black
    with no classes or scores.
  • classes: a numpy array of shape [N]. Note that class indices are 1-based
    and match the keys in the label map (90 classes in total for the COCO
    dataset).
  • num: N=100, the maximum number of objects that can be detected.

_More details in vis_util.visualize_boxes_and_labels_on_image_array()_

Get Object Coordinates

boxes holds normalized object coordinates: the origin is the top-left corner of the image, x runs horizontally and y vertically. Each entry of boxes is one object's coordinates in the form [ymin, xmin, ymax, xmax]; to convert to absolute coordinates, multiply each value by the width or height of the original input image.

Note: by default, detections with probability below 0.5 are not drawn on the image; this can be adjusted with the min_score_thresh parameter of visualize_boxes_and_labels_on_image_array().
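A minimal sketch of this conversion, assuming boxes, scores, and image_np come from the demo code above:

import numpy as np

im_height, im_width = image_np.shape[:2]
for box, score in zip(np.squeeze(boxes), np.squeeze(scores)):
    if score < 0.5:  # same default threshold the visualizer uses
        continue
    ymin, xmin, ymax, xmax = box  # normalized [ymin, xmin, ymax, xmax]
    # Scale normalized coordinates to absolute pixel values.
    (left, right) = (int(xmin * im_width), int(xmax * im_width))
    (top, bottom) = (int(ymin * im_height), int(ymax * im_height))
    print("object at rows {}:{}, cols {}:{}, score {:.2f}".format(
        top, bottom, left, right, score))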

Extra

Once you have the coordinates, you can crop the object out of the image with OpenCV:

import cv2

img = cv2.imread('0.png')
# numpy slicing is img[ymin:ymax, xmin:xmax] (rows first, then columns)
patch_tree = img[565:620, 180:260]
cv2.imwrite('cropped_image.png', patch_tree)

_Installing OpenCV on Windows: do not use conda install opencv (that build cannot load the third-party codecs). Instead, download and run opencv-2.4.13.4-vc14.exe; it extracts a folder named opencv. Copy opencv\build\python\2.7\x64\cv2.pyd into your Python path, and also copy the third-party DLLs into your Python path if you want to use them. Details in link._

Train Your Dataset

Configure

The Tensorflow Object Detection API uses protobuf files to configure the training and evaluation process. The schema for the training pipeline can be found in object_detection/protos/pipeline.proto. At a high level, the config file is split into 5 parts:

  • The model configuration. This defines what type of model will be trained (i.e., meta-architecture, feature extractor).
  • The train_config, which decides what parameters should be used to train model parameters (i.e., SGD parameters, input preprocessing and feature extractor initialization values).
  • The eval_config, which determines what set of metrics will be reported for evaluation (currently we only support the PASCAL VOC metrics).
  • The train_input_config, which defines what dataset the model should be trained on.
  • The eval_input_config, which defines what dataset the model will be evaluated on. Typically this should be different than the training input dataset.

A skeleton configuration file is shown below:

model {
(... Add model config here...)
}

train_config : {
(... Add train_config here...)
}

train_input_reader: {
(... Add train_input configuration here...)
}

eval_config: {
}

eval_input_reader: {
(... Add eval_input configuration here...)
}

Faster_RCNN_resnet101_coco config:

# Faster R-CNN with Resnet-101 (v1) configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 3
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 10
        max_total_detections: 30
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0003
          schedule {
            step: 0
            learning_rate: .0003
          }
          schedule {
            step: 900000
            learning_rate: .00003
          }
          schedule {
            step: 1200000
            learning_rate: .000003
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "helic_fasterrcnn_20171217/model.ckpt-30000"
  from_detection_checkpoint: true
  # Note: The line below limits the training process to 20K steps. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the line below to train indefinitely.
  num_steps: 20000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "badminton.record"
  }
  label_map_path: "object_detection/data/badminton_label_map.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The line below limits the evaluation process to 10 evaluations.
  # Remove it to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
  num_epochs: 1
}
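The label map referenced by label_map_path is itself a small pbtxt file mapping 1-based class ids to names. A sketch of what badminton_label_map.pbtxt might look like for num_classes: 3 (the class names here are made up for illustration):

# Hypothetical label map; use your own class names.
item {
  id: 1
  name: 'badminton_racket'
}
item {
  id: 2
  name: 'shuttlecock'
}
item {
  id: 3
  name: 'player'
}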

More model configuration template files can be found at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

_More details in https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md_

Create Your Own Dataset

Tensorflow Object Detection API reads data using the TFRecord file format. Two sample scripts (create_pascal_tf_record.py and create_pet_tf_record.py) are provided to convert from the PASCAL VOC dataset and Oxford-IIIT Pet dataset to TFRecords.

To use your own dataset with the Tensorflow Object Detection API, you must convert it into the TFRecord file format. The official documentation outlines how to write a script to generate the TFRecord file.
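The pattern for detection data, following the official using_your_own_dataset.md document, is one tf.train.Example per image with normalized box coordinates. A minimal sketch; the `example` dict and its keys, the `examples` iterable, and the output filename are all placeholders for your own annotation format:

import tensorflow as tf
from object_detection.utils import dataset_util

def create_tf_example(example):
    # `example` is a placeholder dict standing in for your own annotations.
    height = example['height']               # image height in pixels
    width = example['width']                 # image width in pixels
    filename = example['filename']           # image file name (bytes)
    encoded_image_data = example['encoded']  # encoded JPEG bytes
    image_format = b'jpeg'

    # One entry per object; box coordinates are normalized to [0, 1].
    xmins, xmaxs = example['xmins'], example['xmaxs']
    ymins, ymaxs = example['ymins'], example['ymaxs']
    classes_text = example['classes_text']   # e.g. [b'shuttlecock']
    classes = example['classes']             # 1-based ids matching the label map

    return tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))

writer = tf.python_io.TFRecordWriter('badminton.record')
for example in examples:  # `examples`: your own iterable of annotated images
    writer.write(create_tf_example(example).SerializeToString())
writer.close()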

Run Locally

This page walks through the steps required to train an object detection model on a local machine. It assumes the reader has completed the following prerequisites:

  • The Tensorflow Object Detection API has been installed as documented in the installation instructions. This includes installing library dependencies, compiling the configuration protobufs and setting up the Python environment.
  • A valid data set has been created. See this page for instructions on how to generate a dataset for the PASCAL VOC challenge or the Oxford-IIIT Pet dataset.
  • An Object Detection pipeline configuration has been written. See this page for details on how to write a pipeline configuration.
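Once the prerequisites above are met, a minimal sketch of launching training and evaluation locally, as documented in running_locally.md at the time (${PIPELINE_CONFIG_PATH}, ${TRAIN_DIR} and ${EVAL_DIR} are placeholders):

# From models/research/
python object_detection/train.py \
    --logtostderr \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --train_dir=${TRAIN_DIR}

# Evaluation runs alongside or after training:
python object_detection/eval.py \
    --logtostderr \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --checkpoint_dir=${TRAIN_DIR} \
    --eval_dir=${EVAL_DIR}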

_More details in https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md_

Extras

  • Tensorflow detection model zoo
  • Exporting a trained model for inference (see the sketch after this list)
  • Defining your own model architecture
  • Bringing in your own dataset
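For example, the demo above loads frozen_inference_graph.pb, which is what the export script produces from a training checkpoint. A sketch, where the pipeline config path, checkpoint number, and output directory are placeholders:

# From models/research/
python object_detection/export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path ${PIPELINE_CONFIG_PATH} \
    --trained_checkpoint_prefix ${TRAIN_DIR}/model.ckpt-20000 \
    --output_directory exported_graph/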

_Details in https://github.com/tensorflow/models/tree/master/research/object_detection_