Repository: visipedia/tf_classification
Branch: master
Commit: 7dac8bb3a419
Files: 47
Total size: 397.8 KB
Directory structure:
gitextract_pxfoh05k/
├── .gitignore
├── LICENSE
├── README.md
├── classify.py
├── config/
│ ├── README.md
│ ├── __init__.py
│ ├── config_classify.yaml
│ ├── config_export.yaml
│ ├── config_test.yaml
│ ├── config_train.yaml
│ └── parse_config.py
├── export.py
├── extract.py
├── nets/
│ ├── README.md
│ ├── __init__.py
│ ├── inception.py
│ ├── inception_resnet_v2.py
│ ├── inception_resnet_v2_test.py
│ ├── inception_utils.py
│ ├── inception_v1.py
│ ├── inception_v1_test.py
│ ├── inception_v2.py
│ ├── inception_v2_test.py
│ ├── inception_v3.py
│ ├── inception_v3_test.py
│ ├── inception_v4.py
│ ├── inception_v4_test.py
│ ├── mobilenet_v1.py
│ ├── mobilenet_v1_test.py
│ ├── net_profile.py
│ ├── nets_factory.py
│ ├── nets_factory_test.py
│ ├── resnet_utils.py
│ ├── resnet_v2.py
│ └── resnet_v2_test.py
├── preprocessing/
│ ├── __init__.py
│ ├── decode_example.py
│ └── inputs.py
├── requirements.txt
├── test.py
├── tfserving/
│ ├── README.md
│ ├── __init__.py
│ ├── client.py
│ ├── inputs.py
│ └── tfserver.py
├── train.py
└── visualize_train_inputs.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# IPython Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# dotenv
.env
# virtualenv
venv/
ENV/
# Spyder project settings
.spyderproject
# Rope project settings
.ropeproject
# Mac stuff
.DS_Store
# Visual Studio Code stuff
.vscode/
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2017 Visipedia
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# TensorFlow Classification
This repo contains training, testing and classifcation code for image classification using [TensorFlow](https://www.tensorflow.org/). Whole image classification as well as multi instance bounding box classification is supported.
Checkout the [Wiki](https://github.com/visipedia/tf_classification/wiki) for more detailed tutorials.
---
## Requirements
TensorFlow 1.0+ is required. The code is tested with TensorFlow 1.3 and Python 2.7 on Ubuntu 16.04 and Mac OSX 10.11. Check out the [requirements.txt](requirements.txt) file for a list of python dependencies.
---
## Prepare the Data
The models require the image data to be in a specific format. You can use the Visipedia [tfrecords repo](https://github.com/visipedia/tfrecords) to produce the files.
For the commands below, I'll assume that you have created a `DATASET_DIR` environment variable that points to the directory that contains your tfrecords:
```
$ export DATASET_DIR=/home/ubuntu/tf_datasets/cub
```
---
## Directory Structure
I have found that its useful to have the following directory and file setup:
* experiment/
* logdir/
* train_summaries/
* val_summaries/
* test_summaries/
* results/
* finetune/
* train_summaries/
* val_summaries/
* cmds.txt
* config_train.yaml
* config_test.yaml
* config_export.yaml
The purpose of each directory and file will be explained below.
The `cmds.txt` is useful to save the different training and testing commands. There are quite a few command-line arguments to some of the scripts, so its convienent to compose the commands in an editor.
For the commands below, I'll assume that you have created a `EXPERIMENT_DIR` environment variable that points to your experiment directory:
```
$ export EXPERIMENT_DIR=/home/ubuntu/tf_experiments/cub
```
---
## Configuration
There are example configuration files in the [config directory](config/). At the very least you'll need a `config_train.yaml` file, and you'll probably want a `config_test.yaml` file. It is convienent to copy the example configuration files into your `experiment` directory. See the configuration [README](config/README.md) for more details.
### Choose a Network Architecture
This repo currently supports the Google Inception, ResNet and MobileNet flavor of networks. See the nets [README](nets/README.md) for more information on the different Inception versions. At the moment, `inception_v3` probably offers the best tradeoff in terms of size and performance, although its always worth experimenting with a few different architectures. The [README](nets/README.md) also contains links where you can download checkpoint files for the models. In most cases you should start your training from these checkpoint files rather than training from scratch.
You can specify the name of the choosen network in the configuration yaml file. Alternatively you can pass it in as a command-line argument to most of the scripts.
For the commands below, I'll assume that you have created an environment variable that points to the pretrained checkpoint file that you downloaded:
```
$ export PRETRAINED_MODEL=/home/ubuntu/tf_models/inception_v3.ckpt
```
---
## Data Visualization
Now that you have a configuration script for training, it is a good idea to visualize the inputs to the network and ensure that they look good. This allows you to debug any problems with your tfrecords and lets you play with different augmentation techniques. Visualize your data by doing:
```
$ CUDA_VISIBLE_DEVICES=1 python visualize_train_inputs.py \
--tfrecords $DATASET_DIR/train* \
--config $EXPERIMENT_DIR/config_train.yaml
```
If you are in a virtualenv and Matplotlib is complaining, then you may need to modify your environment. See this [FAQ](http://matplotlib.org/faq/virtualenv_faq.html) and [this document](http://matplotlib.org/faq/osx_framework.html#osxframework-faq) for fixing this issue. I use a virtualenv on my Mac OSX 10.11 machine and I needed to do the `PYTHONHOME` [work around](http://matplotlib.org/faq/osx_framework.html#pythonhome-function) for Matplotlib to work properly. In this case the command looks like:
```
$ CUDA_VISIBLE_DEVICES=1 frameworkpython visualize_train_inputs.py \
--tfrecords $DATASET_DIR/train* \
--config $EXPERIMENT_DIR/config_train.yaml
```
---
## Training and Validating
It's recommended to start from a pretrained network when training a network on your own data. However, this isn't necessary and you can train from scratch if you have enough data. The following warmup section assumes you are starting from a pretrained network. See the nets [README](nets/README.md) to find links to pretrained checkpoint files.
### Finetune A Pretrained Network
Finetuning a pretrained network essentially uses the pretrained network as a generic feature extractor and learns a new final layer that will output predictions for your target classes (rather than the original classes that the pretrained network was trained on). To do this, we will specify the pretrained model as the starting point, and only allow the logits layers to be modified. We can put the trained models in the `experiment/logdir/finetune` directory.
```
$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir/finetune \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $PRETRAINED_MODEL \
--trainable_scopes InceptionV3/Logits InceptionV3/AuxLogits \
--checkpoint_exclude_scopes InceptionV3/Logits InceptionV3/AuxLogits \
--learning_rate_decay_type fixed \
--lr 0.01
```
#### Monitoring Progress
We'll want to monitor performance of the model on a validation set. Once the model performance starts to plateau we can assume that the final layer is warmed up and we can switch to full training. We can monitor the validation performance by running:
```
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/val* \
--save_dir $EXPERIMENT_DIR/logdir/finetune/val_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir/finetune \
--config $EXPERIMENT_DIR/config_test.yaml \
--batches 100 \
--eval_interval_secs 300
```
You may want to also monitor the accuracy on the train set. Simply pass in the train tfrecords to the `test.py` script and change the output directory:
```
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/train* \
--save_dir $EXPERIMENT_DIR/logdir/finetune/train_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir/finetune \
--config $EXPERIMENT_DIR/config_test.yaml \
--batches 100 \
--eval_interval_secs 300
```
Keeping the train summaries and val summaries in separate directories will keep the tensorboard ui clean. To monitor the training process you can fireup tensorboard:
```
$ tensorboard --logdir=$EXPERIMENT_DIR/logdir --port=6006
```
### Training the Entire Network
The benefit of finetuning a network is that the training is very fast, as only the last layer is modified. However, to get the best performance you'll typically want to modify more (or all) of the layers of the network. Starting from a pretrained network (which can happen to be a finetuned network), this full training step essentially adapts the network to operating on the domain of your specific dataset. We'll store the generated files in the `experiment/logdir` directory. You can do the finetuning process as a warmup and then start the full train:
```
$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $EXPERIMENT_DIR/logdir/finetune
```
Or you can just start the full train from a pretrained model:
```
$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $PRETRAINED_MODEL \
--checkpoint_exclude_scopes InceptionV3/Logits InceptionV3/AuxLogits
```
Or if you have enough data, you may not want to even use the pretrained model. Rather you can train from scratch:
```
$ CUDA_VISIBLE_DEVICES=0 python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir/ \
--config $EXPERIMENT_DIR/config_train.yaml
```
#### Monitoring Progress
For watching the validation performance we can do:
```
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/val* \
--save_dir $EXPERIMENT_DIR/logdir/val_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_test.yaml \
--batches 100 \
--eval_interval_secs 300
```
Similar for the train data:
```
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/train* \
--save_dir $EXPERIMENT_DIR/train_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_test.yaml \
--batches 100 \
--eval_interval_secs 300
```
The command for tensorboard doesn't need to change:
```
$ tensorboard --logdir=$EXPERIMENT_DIR/logdir --port=6006
```
You will be able to see the fine-tune and the full train data plotted on the same plots.
---
## Test
Once performance on the validation data has plateaued (or some other criterion has been met), you can test the model on a held out set of images to see how well it generalizes to new data:
```
$ CUDA_VISIBLE_DEVICES=1 python test.py \
--tfrecords $DATASET_DIR/test* \
--save_dir $EXPERIMENT_DIR/logdir/test_summaries \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 32 \
--batches 100
```
If you are happy with the performance of the model, then you are ready to classify new images and export the model for production use. Otherwise its back to the drawing board to figure out how to increase performance.
---
## Classifying
If you want to classify data offline using the trained model then you can do:
```
CUDA_VISIBLE_DEVICES=1 python classify.py \
--tfrecords $DATASET_DIR/new/* \
--checkpoint_path $EXPERIMENT_DIR/logdir \
--save_path $EXPERIMENT_DIR/logdir/results/classification_results.npz \
--config $EXPERIMENT_DIR/config_test.yaml \
--batch_size 32 \
--batches 1000 \
--save_logits
```
The output of the script is a numpy uncompressed .npz file saved at `--save_path`. The file will contain at least 2 arrays: one that contains ids and one that contains the predicted class label. If `--save_logits` is specified, then the raw logits (before going through the softmax) will also be saved.
---
## Export & Compress
To export a model for easy use on a mobile device you can use:
```
python export.py \
--checkpoint_path model.ckpt-399739 \
--export_dir ./export \
--export_version 1 \
--config config_export.yaml \
--class_names class-codes.txt
```
The input node is called `images` and the output node is called `Predictions`. Checkout [this](https://github.com/visipedia/tf_classification/wiki/Exporting-an-Optimized-Model) wiki article for more tips.
If you are going to use the model with [TensorFlow Serving](https://www.tensorflow.org/deploy/tfserve) then you can use the following:
```
python export.py \
--checkpoint_path model.ckpt-399739 \
--export_dir ./export \
--export_version 1 \
--config config_export.yaml \
--serving \
--add_preprocess \
--class_names class-codes.txt
```
Check out the resources in the [tfserving](tfserving/) directory for more help with deploying on TensorFlow Serving.
================================================
FILE: classify.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import time
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
from config.parse_config import parse_config_file
from nets import nets_factory
from preprocessing import inputs
def classify(tfrecords, checkpoint_path, save_path, max_iterations, save_logits, cfg, read_images=False):
"""
Args:
tfrecords (list)
checkpoint_path (str)
save_dir (str)
max_iterations (int)
save_logits (bool)
cfg (EasyDict)
"""
tf.logging.set_verbosity(tf.logging.DEBUG)
graph = tf.Graph()
with graph.as_default():
global_step = slim.get_or_create_global_step()
with tf.device('/cpu:0'):
batch_dict = inputs.input_nodes(
tfrecords=tfrecords,
cfg=cfg.IMAGE_PROCESSING,
num_epochs=1,
batch_size=cfg.BATCH_SIZE,
num_threads=cfg.NUM_INPUT_THREADS,
shuffle_batch =cfg.SHUFFLE_QUEUE,
random_seed=cfg.RANDOM_SEED,
capacity=cfg.QUEUE_CAPACITY,
min_after_dequeue=cfg.QUEUE_MIN,
add_summaries=False,
input_type='classification',
read_filenames=read_images
)
arg_scope = nets_factory.arg_scopes_map[cfg.MODEL_NAME]()
with slim.arg_scope(arg_scope):
logits, end_points = nets_factory.networks_map[cfg.MODEL_NAME](
inputs=batch_dict['inputs'],
num_classes=cfg.NUM_CLASSES,
is_training=False
)
predicted_labels = tf.argmax(end_points['Predictions'], 1)
if 'MOVING_AVERAGE_DECAY' in cfg and cfg.MOVING_AVERAGE_DECAY > 0:
variable_averages = tf.train.ExponentialMovingAverage(
cfg.MOVING_AVERAGE_DECAY, global_step)
variables_to_restore = variable_averages.variables_to_restore(
slim.get_model_variables())
variables_to_restore[global_step.op.name] = global_step
else:
variables_to_restore = slim.get_variables_to_restore()
variables_to_restore.append(global_step)
saver = tf.train.Saver(variables_to_restore, reshape=True)
num_batches = max_iterations
num_images = num_batches * cfg.BATCH_SIZE
label_array = np.empty(num_images, dtype=np.int32)
id_array = np.empty(num_images, dtype=np.object)
fetches = [predicted_labels, batch_dict['ids']]
if save_logits:
fetches.append(logits)
logits_array = np.empty((num_images, cfg.NUM_CLASSES), dtype=np.float32)
if os.path.isdir(checkpoint_path):
checkpoint_dir = checkpoint_path
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
if checkpoint_path is None:
raise ValueError("Unable to find a model checkpoint in the " \
"directory %s" % (checkpoint_dir,))
tf.logging.info('Classifying records using %s' % checkpoint_path)
coord = tf.train.Coordinator()
sess_config = tf.ConfigProto(
log_device_placement=cfg.SESSION_CONFIG.LOG_DEVICE_PLACEMENT,
allow_soft_placement = True,
gpu_options = tf.GPUOptions(
per_process_gpu_memory_fraction=cfg.SESSION_CONFIG.PER_PROCESS_GPU_MEMORY_FRACTION
),
intra_op_parallelism_threads=cfg.SESSION_CONFIG.INTRA_OP_PARALLELISM_THREADS if 'INTRA_OP_PARALLELISM_THREADS' in cfg.SESSION_CONFIG else None,
inter_op_parallelism_threads=cfg.SESSION_CONFIG.INTER_OP_PARALLELISM_THREADS if 'INTER_OP_PARALLELISM_THREADS' in cfg.SESSION_CONFIG else None
)
sess = tf.Session(graph=graph, config=sess_config)
with sess.as_default():
tf.global_variables_initializer().run()
tf.local_variables_initializer().run()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
# Restore from checkpoint
saver.restore(sess, checkpoint_path)
print_str = ', '.join([
'Step: %d',
'Time/image (ms): %.1f'
])
step = 0
while not coord.should_stop():
t = time.time()
outputs = sess.run(fetches)
dt = time.time()-t
idx1 = cfg.BATCH_SIZE * step
idx2 = idx1 + cfg.BATCH_SIZE
label_array[idx1:idx2] = outputs[0]
id_array[idx1:idx2] = outputs[1]
if save_logits:
logits_array[idx1:idx2] = outputs[2]
step += 1
print(print_str % (step, (dt / cfg.BATCH_SIZE) * 1000))
if max_iterations > 0 and step == max_iterations:
break
except tf.errors.OutOfRangeError as e:
pass
coord.request_stop()
coord.join(threads)
# save the results
if save_logits:
np.savez(save_path, labels=label_array, ids=id_array, logits=logits_array)
else:
np.savez(save_path, labels=label_array, ids=id_array)
def parse_args():
parser = argparse.ArgumentParser(description='Classify images, optionally saving the logits.')
parser.add_argument('--tfrecords', dest='tfrecords',
help='Paths to tfrecords.', type=str,
nargs='+', required=True)
parser.add_argument('--checkpoint_path', dest='checkpoint_path',
help='Path to a specific model to test against. If a directory, then the newest checkpoint file will be used.', type=str,
required=True, default=None)
parser.add_argument('--save_path', dest='save_path',
help='File name path to a save the classification results.', type=str,
required=True, default=None)
parser.add_argument('--config', dest='config_file',
help='Path to the configuration file',
required=True, type=str)
parser.add_argument('--batch_size', dest='batch_size',
help='The number of images in a batch.',
required=True, type=int, default=None)
parser.add_argument('--batches', dest='batches',
help='Maximum number of iterations to run. Default is all records (modulo the batch size).',
required=True, type=int, default=0)
parser.add_argument('--save_logits', dest='save_logits',
help='Should the logits be saved?',
action='store_true', default=False)
parser.add_argument('--model_name', dest='model_name',
help='The name of the architecture to use.',
required=False, type=str, default=None)
parser.add_argument('--read_images', dest='read_images',
help='Read the images from the file system using the `filename` field rather than using the `encoded` field of the tfrecord.',
action='store_true', default=False)
args = parser.parse_args()
return args
def main():
args = parse_args()
cfg = parse_config_file(args.config_file)
if args.batch_size != None:
cfg.BATCH_SIZE = args.batch_size
if args.model_name != None:
cfg.MODEL_NAME = args.model_name
classify(
tfrecords=args.tfrecords,
checkpoint_path=args.checkpoint_path,
save_path = args.save_path,
max_iterations=args.batches,
save_logits=args.save_logits,
cfg=cfg,
read_images=args.read_images
)
if __name__ == '__main__':
main()
================================================
FILE: config/README.md
================================================
This directory contains example configuration scripts for training, testing, classifying and exporting models. I find it easy to copy these configuration files to my experiment directory and make the necessary changes.
## Training Configuration
See the [example training config file](config_train.yaml).
The training configuration script contains the most configurations. The other scripts mainly contain subsets of the training configuration. The `Learning Rate Parameters`, `Regularization`, and `Optimization` configurations provided experimenters fine-grained control over the learning process. Non-researchers will probably find most of the default settings adequate. I will not go into detail for these configuration parameters, but there are comments for these parameters in the [example training config file](config_train.yaml).
The configuration sections that you will want to pay attention to are the `Dataset Info` section and the `Image Processing and Augmentation` section. You'll most likely be modifying these for each experiment. Once you determine good settings for the `Queues` and `Saving Models and Summaries` you'll probably reuse these values across experiments.
### Dataset Info
| Config Name | Type | Description |
:----:|:----:|------------|
NUM_CLASSES | int | This is how you specify how many classes are in your dataset. |
NUM_TRAIN_EXAMPLES | int | This is the number of images (or bounding boxes) in your training tfrecords. This value, along with the `BATCH_SIZE` is used to compute the number of iterations in an epoch (i.e. the number of batches it takes to go through the whole training set) |
NUM_TRAIN_ITERATIONS | int | The maximum number of iterations to execute before stopping. If you are manually monitoring the training, then you can set this to a large number (e.g. 1000000) |
BATCH_SIZE | int | The number of images to process in one iteration. This number is constrained by the amount of GPU memory you have. The larger the batch size, the more GPU memory you need. You typically want the largest batch size that will fit on your GPU. |
MODEL_NAME | str | The architecture to use. Its important to keep this configuration parameter constant in all of your configuration files. |
### Image Processing and Augmentation
Deep neural networks are notoriously data hungry. One technique for increasing the amount of data that you can pass through the network is to augment your training data. Augmentations can be as simple as randomly flipping the images horizontally, or as complex as extracting crops and perturbing the pixel values. You will typically only want to augment data for the training phase.
`IMAGE_PROCESSING` contains the parameters for controlling how to extract data from the images:
| Config Name | Type | Description |
:----:|:----:|------------|
INPUT_SIZE | int | All images will be resized to [`INPUT_SIZE`, `INPUT_SIZE`, 3] prior to passing through the network. You'll want to set this to the same value that the pretrained model used. See the nets [README](../nets/README.md) for the input size of each model architecture. |
REGION_TYPE | str | Which region should be used when creating an example? Possible values are `image` and `bbox`. |
MAINTAIN_ASPECT_RATIO | bool | When we resize an extracted region, should we maintain the aspect ratio? Or just squish it?
RESIZE_FAST | bool | If true, then slower resize operations will be avoided and only [bilinear resizing](https://en.wikipedia.org/wiki/Bilinear_interpolation) will be used. Otherwise, a random choice between [bilinear](), [nearest neighbor](https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation), [bicubic](https://en.wikipedia.org/wiki/Bicubic_interpolation) and area interpolation will be used. |
DO_RANDOM_FLIP_LEFT_RIGHT | bool | If true, then each region has a 50% chance of being flipped. |
DO_COLOR_DISTORTION | float | Value between 0 and 1. 0 means never distort the color, and 1 means always distort the color. |
COLOR_DISTORT_FAST | bool | Its possible to distort the brightness, saturation, hue and contrast of an image. If true, then slower modifications (hue and contrast) are avoided. |
#### Region Extraction
Currently there are two different region extraction protocols:
* `image`: The entire image is extracted and passed to the next phase of augmentation
* `bbox`: Each bounding box in the tfrecord is used to crop out an image region. These regions are passed on to the next phase of augmentation. If there are `n` bounding boxes in a tfrecord, then `n` regions will be extracted from the image.
For bounding boxes, we can specify wether we want to enlarge the box. This can be used as another form of augmentation (loose bounding boxes vs tight bounding boxes).
| Config Name | Type | Description |
:----:|:----:|------------|
DO_EXPANSION | float | Value between 0 and 1. 0 means never expand the box. 1 means always expand the box. |
EXPANSION_CFG | | Contains the parameters controlling the expansion of the bounding box. |
EXPANSION_CFG.
WIDTH_EXPANSION_FACTOR | float | Scaling factor for the width of the box. |
EXPANSION_CFG.
HEIGHT_EXPANSION_FACTOR | float | Scaling factor for the height of the box. |
#### Random Cropping
Each region that is extracted from an image can then be randomly cropped. Again, this is a form of data augmentation. We are trying to make the network robust to changes in the data that do not effect the class label.
`RANDOM_CROP_CFG` contains parameters for cropping out a rectangular patch from each region.
| Config Name | Type | Description |
:----:|:----:|------------|
DO_RANDOM_CROP | float | Value between 0 and 1. 0 means never crop a region. 1 means always take a crop. |
RANDOM_CROP_CFG | | This contains parameters that controls the types of crops that are possible. |
RANDOM_CROP_CFG.
MIN_AREA | float | Value between 0 and 1. This controls how much of the region is required to be in the crop, essentially controlling how small a crop can be. |
RANDOM_CROP_CFG.
MAX_AREA | float | Value between 0 and 1. This controls the maximum size of the crop. |
RANDOM_CROP_CFG.
MIN_ASPECT_RATIO | float | The minimum [aspect ratio](https://en.wikipedia.org/wiki/Aspect_ratio_(image)) of the crop. Don't forget that this crop will be resized to [`INPUT_SIZE`, `INPUT_SIZE`, 3] prior to passing through the network. |
RANDOM_CROP_CFG.
MAX_ASPECT_RATIO | float | The maximum [aspect ratio](https://en.wikipedia.org/wiki/Aspect_ratio_(image)) of the crop. Don't forget that this crop will be resized to [`INPUT_SIZE`, `INPUT_SIZE`, 3] prior to passing through the network. |
RANDOM_CROP_CFG.
MAX_ATTEMPTS | int | The number of crop attempts to try before returning the whole region. |
### Queues
This section of the config file contains parameters for controlling the queueing of data to feed the network. These setting depend on the number of cores in your machine and the amount of memory available. Please see the comments in the example config file for more information.
### Saving Models and Summaries
This section of the config file contains parameters for controlling how often a model checkpoint should be created and how often tensorboard summary files should be generated. Please see the comments in the example config file for more information.
## Testing Configuration
See the [example testing config file](config_test.yaml).
The `Learning Rate Parameters`, `Optimization`, and `Saving Models and Summaries` parameters are not necessary for testing. The remaining parameters from the training config carry over to testing. In addition there are a few new configurations:
| Config Name | Type | Description |
:----:|:----:|------------|
PRECISION_AT_K_METRIC | array of ints | You can track top-k metrics using this array. Top-1 (i.e. accuracy) will always be plotted |
NUM_TEST_EXAMPLES | int | The number of images (or bounding boxes) in the tfrecords. This can be ignored if you use the `--batches` command line flag. |
Typically in a testing situation you'll want to turn off the augmentations to the extracted image regions. This way you are passing "real" data to the network. See the `Image Processing and Augmentation` section of the [example testing config file](config_test.yaml) to see how to extract regions without augmentations.
## Classification Configuration
See the [example classification config file](config_classify.yaml).
The classification configuration contains even fewer necessary fields than the testing configuration. The `Metrics` section is removed and you'll need to pass batch size and total batch information through command-line arguments.
## Export Configuration
See the [example export config file](config_export.yaml).
The export configuration is the smallest configuration file. See the [example](config_export.yaml) for which fields are required.
================================================
FILE: config/__init__.py
================================================
================================================
FILE: config/config_classify.yaml
================================================
# Classification specific configuration
RANDOM_SEED : 1.0
SESSION_CONFIG : {
# If true, then the device location of each variable will be printed
LOG_DEVICE_PLACEMENT : false,
# How much GPU memory we are allowed to pre-allocate
PER_PROCESS_GPU_MEMORY_FRACTION : 0.9,
# Set the number of accessible cpu threads. Leave as null to use everything.
# Set to 1 to help with debugging (makes the print statements legible)
INTRA_OP_PARALLELISM_THREADS : null,
INTER_OP_PARALLELISM_THREADS : null
}
#################################################
# Dataset Info
# The number of classes we are classifying
NUM_CLASSES : 200
# The model architecture to use.
MODEL_NAME : 'inception_v3'
# END: Dataset Info
#################################################
# Image Processing and Augmentation
# There are 5 steps to image processing:
# 1) Extract regions from the image
# 2) Extract a crops from each region
# 3) Resize the crops for the network architecture
# 4) Flip the crops
# 5) Modify the colors of the crops
IMAGE_PROCESSING : {
# All images will be resized to the [INPUT_SIZE, INPUT_SIZE, 3]
INPUT_SIZE : 299,
# 1) First we extract regions from the image
# What type of region should be extracted, either 'image' or 'bbox'
REGION_TYPE : 'image',
# Specific whole image region extraction configuration
WHOLE_IMAGE_CFG : {},
# Specific bounding box region extraction configuration
BBOX_CFG : {
# We can centrally expand a bbox (i.e. turn a tight crop into a loose crop)
# The fraction of time to expand the bounding box, 0 is never, 1 is always
DO_EXPANSION : 1,
EXPANSION_CFG : {
WIDTH_EXPANSION_FACTOR : 2.0, # Expand the width by a factor of 2 (centrally)
HEIGHT_EXPANSION_FACTOR : 2.0, # Expand the height by a factor of 2 (centrally)
}
},
# 2) Then we take a random crop from the region
# The fraction of time to take a random crop, 0 is never, 1 is always
DO_RANDOM_CROP : 0,
RANDOM_CROP_CFG : {
MIN_AREA : 0.5, # between 0 and 1, how much of the region must be included
MAX_AREA : 1.0, # between 0 and 1, how much of the region can be included
MIN_ASPECT_RATIO : 0.7, # minimum aspect ratio of the crop
MAX_ASPECT_RATIO : 1.33, # maximum aspect ratio of the crop
MAX_ATTEMPTS : 100, # maximum number of attempts before returning the whole region
},
# Alternatively we can take a central crop from the image
DO_CENTRAL_CROP : 0, # Fraction of the time to take a central crop, 0 is never, 1 is always
CENTRAL_CROP_FRACTION : 0.875, # Between 0 and 1, fraction of size to crop
# 3) We need to resize the extracted regions to feed into the network.
MAINTAIN_ASPECT_RATIO : false,
# Avoid slower resize operations (bi-cubic, etc.)
RESIZE_FAST : true,
# 4) We can flip the regions
# Randomly flip the image left right, 50% chance of flipping
DO_RANDOM_FLIP_LEFT_RIGHT : false,
# 5) We can distort the colors of the regions
# The fraction of time to distort the color, 0 is never, 1 is always
DO_COLOR_DISTORTION : 0,
# Avoids slower ops (random_hue and random_contrast)
COLOR_DISTORT_FAST : false
}
# END: Image Processing and Augmentation
#################################################
# Queues
#
# Number of threads to populate the batch queue
NUM_INPUT_THREADS : 2
# Should the data be shuffled?
SHUFFLE_QUEUE : false
# Capacity of the queue producing batched examples
QUEUE_CAPACITY : 1000
# Minimum size of the queue to ensure good shuffling
QUEUE_MIN : 200
# END: Queues
#################################################
# Regularization
#
# The decay to use for the moving average. If 0, then moving average is not computed
# When restoring models, this value is needed to determine whether to restore moving
# average variables or not.
MOVING_AVERAGE_DECAY : 0.9999
# End: Regularization
#################################################
================================================
FILE: config/config_export.yaml
================================================
# Export specific configuration
RANDOM_SEED : 1.0
SESSION_CONFIG : {
# If true, then the device location of each variable will be printed
LOG_DEVICE_PLACEMENT : false,
# How much GPU memory we are allowed to pre-allocate
PER_PROCESS_GPU_MEMORY_FRACTION : 0.9,
# Set the number of accessible cpu threads. Leave as null to use everything.
# Set to 1 to help with debugging (makes the print statements legible)
INTRA_OP_PARALLELISM_THREADS : null,
INTER_OP_PARALLELISM_THREADS : null
}
#################################################
# Dataset Info
# The number of classes we are classifying
NUM_CLASSES : 200
# The model architecture to use.
MODEL_NAME : 'inception_v3'
# END: Dataset Info
#################################################
# Image Processing and Augmentation
IMAGE_PROCESSING : {
# Images are assumed to be raveled, and have length INPUT_SIZE * INPUT_SIZE * 3
INPUT_SIZE : 299
}
# END: Image Processing and Augmentation
#################################################
# Regularization
#
# The decay to use for the moving average. If 0, then moving average is not computed
# When restoring models, this value is needed to determine whether to restore moving
# average variables or not.
MOVING_AVERAGE_DECAY : 0.9999
# End: Regularization
#################################################
================================================
FILE: config/config_test.yaml
================================================
# Testing specific configuration
RANDOM_SEED : 1.0
SESSION_CONFIG : {
# If true, then the device location of each variable will be printed
LOG_DEVICE_PLACEMENT : false,
# How much GPU memory we are allowed to pre-allocate
PER_PROCESS_GPU_MEMORY_FRACTION : 0.9,
# Set the number of accessible cpu threads. Leave as null to use everything.
# Set to 1 to help with debugging (makes the print statements legible)
INTRA_OP_PARALLELISM_THREADS : null,
INTER_OP_PARALLELISM_THREADS : null
}
#################################################
# Metrics
#
# Top-k precision information. Each entry is a different k value.
ACCURACY_AT_K_METRIC : [3, 5]
# END: Metrics
#################################################
# Dataset Info
# The number of classes we are classifying
NUM_CLASSES : 200
# Number of test examples in the tfrecords. This is needed to compute the total number of
# batches to pass through the network.
NUM_TEST_EXAMPLES : 5794
# The number of images to pass through the network on each iteration
BATCH_SIZE : 32
# The model architecture to use.
MODEL_NAME : 'inception_v3'
# END: Dataset Info
#################################################
# Image Processing and Augmentation
# There are 5 steps to image processing:
# 1) Extract regions from the image
# 2) Extract a crops from each region
# 3) Resize the crops for the network architecture
# 4) Flip the crops
# 5) Modify the colors of the crops
IMAGE_PROCESSING : {
# All images will be resized to the [INPUT_SIZE, INPUT_SIZE, 3]
INPUT_SIZE : 299,
# 1) First we extract regions from the image
# What type of region should be extracted, either 'image' or 'bbox'
REGION_TYPE : 'image',
# Specific whole image region extraction configuration
WHOLE_IMAGE_CFG : {},
# Specific bounding box region extraction configuration
BBOX_CFG : {
# We can centrally expand a bbox (i.e. turn a tight crop into a loose crop)
# The fraction of time to expand the bounding box, 0 is never, 1 is always
DO_EXPANSION : 1,
EXPANSION_CFG : {
WIDTH_EXPANSION_FACTOR : 2.0, # Expand the width by a factor of 2 (centrally)
HEIGHT_EXPANSION_FACTOR : 2.0, # Expand the height by a factor of 2 (centrally)
}
},
# 2) Then we take a random crop from the region
# The fraction of time to take a random crop, 0 is never, 1 is always
DO_RANDOM_CROP : 0,
RANDOM_CROP_CFG : {
MIN_AREA : 0.5, # between 0 and 1, how much of the region must be included
MAX_AREA : 1.0, # between 0 and 1, how much of the region can be included
MIN_ASPECT_RATIO : 0.7, # minimum aspect ratio of the crop
MAX_ASPECT_RATIO : 1.33, # maximum aspect ratio of the crop
MAX_ATTEMPTS : 100, # maximum number of attempts before returning the whole region
},
# Alternatively we can take a central crop from the image
DO_CENTRAL_CROP : 0, # Fraction of the time to take a central crop, 0 is never, 1 is always
CENTRAL_CROP_FRACTION : 0.875, # Between 0 and 1, fraction of size to crop
# 3) We need to resize the extracted regions to feed into the network.
MAINTAIN_ASPECT_RATIO : false,
# Avoid slower resize operations (bi-cubic, etc.)
RESIZE_FAST : true,
# 4) We can flip the regions
# Randomly flip the image left right, 50% chance of flipping
DO_RANDOM_FLIP_LEFT_RIGHT : false,
# 5) We can distort the colors of the regions
# The fraction of time to distort the color, 0 is never, 1 is always
DO_COLOR_DISTORTION : 0,
# Avoids slower ops (random_hue and random_contrast)
COLOR_DISTORT_FAST : false
}
# END: Image Processing and Augmentation
#################################################
# Queues
#
# Number of threads to populate the batch queue
NUM_INPUT_THREADS : 2
# Should the data be shuffled?
SHUFFLE_QUEUE : false
# Capacity of the queue producing batched examples
QUEUE_CAPACITY : 1000
# Minimum size of the queue to ensure good shuffling
QUEUE_MIN : 200
# END: Queues
#################################################
# Regularization
#
# The decay to use for the moving average. If 0, then moving average is not computed
# When restoring models, this value is needed to determine whether to restore moving
# average variables or not.
MOVING_AVERAGE_DECAY : 0.9999
# End: Regularization
#################################################
================================================
FILE: config/config_train.yaml
================================================
# Training specific configuration
RANDOM_SEED : 1.0
SESSION_CONFIG : {
# If true, then the device location of each variable will be printed
LOG_DEVICE_PLACEMENT : false,
# How much GPU memory we are allowed to pre-allocate
PER_PROCESS_GPU_MEMORY_FRACTION : 0.9,
# Set the number of accessible cpu threads. Leave as null to use everything.
# Set to 1 to help with debugging (makes the print statements legible)
INTRA_OP_PARALLELISM_THREADS : null,
INTER_OP_PARALLELISM_THREADS : null
}
#################################################
# Dataset Info
#
# The number of classes we are classifying
NUM_CLASSES : 200
# Number of training examples in the tfrecords. This is needed to compute the number of
# batches in an epoch
NUM_TRAIN_EXAMPLES : 5994
# Maximum number of iterations to run before stopping
NUM_TRAIN_ITERATIONS : 20000
# The number of images to pass through the network in a single iteration
BATCH_SIZE : 32
# Which model architecture to use.
MODEL_NAME : 'inception_v3'
# END: Dataset Info
#################################################
# Image Processing and Augmentation
# There are 5 steps to image processing:
# 1) Extract regions from the image
# 2) Extract a crops from each region
# 3) Resize the crops for the network architecture
# 4) Flip the crops
# 5) Modify the colors of the crops
IMAGE_PROCESSING : {
# All images will be resized to the [INPUT_SIZE, INPUT_SIZE, 3]
INPUT_SIZE : 299,
# 1) First we extract regions from the image
# What type of region should be extracted, either 'image' or 'bbox'
REGION_TYPE : 'image',
# Specific whole image region extraction configuration
WHOLE_IMAGE_CFG : {},
# Specific bounding box region extraction configuration
BBOX_CFG : {
# We can centrally expand a bbox (i.e. turn a tight crop into a loose crop)
# The fraction of time to expand the bounding box, 0 is never, 1 is always
DO_EXPANSION : 1,
EXPANSION_CFG : {
WIDTH_EXPANSION_FACTOR : 2.0, # Expand the width by a factor of 2 (centrally)
HEIGHT_EXPANSION_FACTOR : 2.0, # Expand the height by a factor of 2 (centrally)
}
},
# 2) Then we take a random crop from the region
# The fraction of time to take a random crop, 0 is never, 1 is always
DO_RANDOM_CROP : 1,
RANDOM_CROP_CFG : {
MIN_AREA : 0.5, # between 0 and 1, how much of the region must be included
MAX_AREA : 1.0, # between 0 and 1, how much of the region can be included
MIN_ASPECT_RATIO : 0.7, # minimum aspect ratio of the crop
MAX_ASPECT_RATIO : 1.33, # maximum aspect ratio of the crop
MAX_ATTEMPTS : 100, # maximum number of attempts before returning the whole region
},
# Alternatively we can take a central crop from the image
DO_CENTRAL_CROP : 0, # Fraction of the time to take a central crop, 0 is never, 1 is always
CENTRAL_CROP_FRACTION : 0.875, # Between 0 and 1, fraction of size to crop
# 3) We need to resize the extracted regions to feed into the network.
MAINTAIN_ASPECT_RATIO : false,
# Avoid slower resize operations (bi-cubic, etc.)
RESIZE_FAST : false,
# 4) We can flip the regions
# Randomly flip the image left right, 50% chance of flipping
DO_RANDOM_FLIP_LEFT_RIGHT : true,
# 5) We can distort the colors of the regions
# The fraction of time to distort the color, 0 is never, 1 is always
DO_COLOR_DISTORTION : 0.3,
# Avoids slower ops (random_hue and random_contrast)
COLOR_DISTORT_FAST : false
}
# END: Image Processing and Augmentation
#################################################
# Queues
#
# Number of threads to populate the batch queue
NUM_INPUT_THREADS : 4
# Should the data be shuffled?
SHUFFLE_QUEUE : true
# Capacity of the queue producing batched examples
QUEUE_CAPACITY : 1000
# Minimum size of the queue to ensure good shuffling
QUEUE_MIN : 200
# END: Queues
#################################################
# Saving Models and Summaries
#
# How often, in seconds, to save summaries.
SAVE_SUMMARY_SECS : 30
# How often, in seconds, to save the model
SAVE_INTERVAL_SECS : 1800
# The maximum number of recent checkpoint files to keep.
MAX_TO_KEEP : 3
# In addition to keeping the most recent `max_to_keep` checkpoint files,
# you might want to keep one checkpoint file for every N hours of training
# The default value of 10,000 hours effectively disables the feature.
KEEP_CHECKPOINT_EVERY_N_HOURS : 10000
# The frequency, in terms of global steps, that the loss and global step and logged.
LOG_EVERY_N_STEPS : 10
# END: Saving Models and Summaries
#################################################
# Learning Rate Parameters
LEARNING_RATE_DECAY_TYPE : 'exponential' # One of "fixed", "exponential", or "polynomial"
INITIAL_LEARNING_RATE : 0.01
# The minimal end learning rate used by a polynomial decay learning rate.
END_LEARNING_RATE : 0.0001
# The amount of label smoothing.
LABEL_SMOOTHING : 0.1
# How much to decay the learning rate
LEARNING_RATE_DECAY_FACTOR : 0.94
# Number of epochs between decaying the learning rate
NUM_EPOCHS_PER_DELAY : 4
LEARNING_RATE_STAIRCASE : true
# END: Learning Rate Parameters
#################################################
# Regularization
#
# The decay to use for the moving average. If 0, then moving average is not computed
MOVING_AVERAGE_DECAY : 0.9999
# The weight decay on the model weights
WEIGHT_DECAY : 0.00004
BATCHNORM_MOVING_AVERAGE_DECAY : 0.9997
BATCHNORM_EPSILON : 0.001
DROPOUT_KEEP_PROB : 0.5
CLIP_GRADIENT_NORM : 0 # If 0, no clipping is performed. Otherwise acts as a threshold to clip the gradients.
# End: Regularization
#################################################
# Optimization
#
# The name of the optimizer, one of "adadelta", "adagrad", "adam", "ftrl", "momentum", "sgd" or "rmsprop"
OPTIMIZER : 'rmsprop'
OPTIMIZER_EPSILON : 1.0
# The decay rate for adadelta.
ADADELTA_RHO: 0.95
# Starting value for the AdaGrad accumulators.
ADAGRAD_INITIAL_ACCUMULATOR_VALUE: 0.1
# The exponential decay rate for the 1st moment estimates.
ADAM_BETA1 : 0.9
# The exponential decay rate for the 2nd moment estimates.
ADAM_BETA2 : 0.99
# The learning rate power.
FTRL_LEARNING_RATE_POWER : -0.5
# Starting value for the FTRL accumulators.
FTRL_INITIAL_ACCUMULATOR_VALUE : 0.1
# The FTRL l1 regularization strength.
FTRL_L1 : 0.0
# The FTRL l2 regularization strength.
FTRL_L2 : 0.0
# The momentum for the MomentumOptimizer and RMSPropOptimizer
MOMENTUM : 0.9
# Decay term for RMSProp.
RMSPROP_DECAY : 0.9
# END: Optimization
#################################################
================================================
FILE: config/parse_config.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import yaml
from easydict import EasyDict as easydict
def parse_config_file(path_to_config):
with open(path_to_config) as f:
cfg = yaml.load(f)
return easydict(cfg)
================================================
FILE: export.py
================================================
"""
Export a trained model for application use.
Example for use with TensorFlow Serving:
python export.py \
--checkpoint_path model.ckpt-399739 \
--export_dir export \
--export_version 1 \
--config config_export.yaml \
--serving \
--add_preprocess \
--class_names class-codes.txt
Example for use with TensorFlow Mobile:
python export.py \
--checkpoint_path model.ckpt-399739 \
--export_dir export \
--export_version 1 \
--config config_export.yaml \
--class_names class-codes.txt
Author: Grant Van Horn
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import tensorflow as tf
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import graph_util
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import signature_def_utils
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.saved_model import utils
from tensorflow.python.tools import optimize_for_inference_lib
slim = tf.contrib.slim
from config.parse_config import parse_config_file
from nets import nets_factory
def export(checkpoint_path,
export_dir, export_version, export_for_serving, export_tflite, export_coreml,
add_preprocess_step,
output_classes, class_names,
batch_size, raveled_input,
cfg):
"""Export a model for use with TensorFlow Serving or for more conveinent use on mobile devices, etc.
Arguments:
checkpoint_path (str): Path to the specific model checkpoint file to export.
export_dir (str): Path to a directory to store the export files.
export_version (int): The version number of this export. If `export_for_serving` is True, then this version
number must not exist in the `export_dir`.
export_for_serving (bool): Export a model for use with TensorFlow Serving.
export_tflite (bool): Export a model for tensorflow lite.
export_coreml (bool): Export a model for coreml.
add_preprocess_step (bool): If True, then an input path for handling image byte strings will be added to the graph.
output_classes (bool): If True, then the class indices (or `class_names` if provided) will be output along with the scores.
class_names (list): A list of semantic class identifiers to embed within the model that correspond to the prediction
indices. Set to None to not embed.
batch_size (int or None): Specify a fixed batch size, or use None to keep it flexible. For tflite export you'll need a fixed batch size.
raveled_input (bool): If True, then the input is considered to be a raveled vector that will be reshaped to a fixed height and width. Otherwise it will be treated as the proper shape.
cfg (dict): Configuration dictionary.
"""
if not os.path.exists(export_dir):
print("Making export directory: %s" % (export_dir,))
os.makedirs(export_dir)
graph = tf.Graph()
array_input_node_name = "images"
bytes_input_node_name = "image_bytes"
output_node_name = "Predictions"
class_names_node_name = "names"
input_height = cfg.IMAGE_PROCESSING.INPUT_SIZE
input_width = cfg.IMAGE_PROCESSING.INPUT_SIZE
input_depth = 3
with graph.as_default():
global_step = slim.get_or_create_global_step()
# We want to store the preprocessing operation in the graph
if add_preprocess_step:
# The TensorFlow map_fn() function passes one argument only,
# so I have put this method here to take advantage of scope
# (to access input_height, etc.)
def preprocess_image(image_buffer):
"""Preprocess image bytes to 3D float Tensor."""
# Decode image bytes
image = tf.image.decode_image(image_buffer)
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
# make sure the image is of rank 3
image = tf.cond(
tf.equal(tf.rank(image), 2),
lambda: tf.expand_dims(image, 2),
lambda: image
)
num_channels = tf.shape(image)[2]
# if we decoded 1 channel (grayscale), then convert to a RGB image
image = tf.cond(
tf.equal(num_channels, 1),
lambda: tf.image.grayscale_to_rgb(image),
lambda: image
)
# if we decoded 2 channels (grayscale + alpha), then strip off the last dim and convert to rgb
image = tf.cond(
tf.equal(num_channels, 2),
lambda: tf.image.grayscale_to_rgb(
tf.expand_dims(image[:, :, 0], 2)),
lambda: image
)
# if we decoded 4 or more channels (rgb + alpha), then take the first three channels
image = tf.cond(
tf.greater(num_channels, 3),
lambda: image[:, :, :3],
lambda: image
)
# Resize the image to the input height and width for the network.
image = tf.expand_dims(image, 0)
image = tf.image.resize_bilinear(image,
[input_height, input_width],
align_corners=False)
image = tf.squeeze(image, [0])
# Finally, rescale to [-1,1] instead of [0, 1)
image = tf.subtract(image, 0.5)
image = tf.multiply(image, 2.0)
return image
image_bytes_placeholder = tf.placeholder(
tf.string, name=bytes_input_node_name)
preped_images = tf.map_fn(
preprocess_image, image_bytes_placeholder, dtype=tf.float32)
# Explicit name (we can't name the map_fn)
input_placeholder = tf.identity(
preped_images, name=array_input_node_name)
# We assume the client has preprocessed the data for us
else:
# Is the input coming in as a raveled vector? Or is it a tensor?
if raveled_input:
input_placeholder = tf.placeholder(tf.float32, shape=[batch_size, input_height * input_width * input_depth], name=array_input_node_name)
else:
input_placeholder = tf.placeholder(tf.float32, shape=[batch_size, input_height, input_width, input_depth], name=array_input_node_name)
# Reshape the images to proper tensors if they are coming in as vectors.
if raveled_input:
images = tf.reshape(input_placeholder,
[-1, input_height, input_width, input_depth])
else:
images = input_placeholder
arg_scope = nets_factory.arg_scopes_map[cfg.MODEL_NAME]()
with slim.arg_scope(arg_scope):
logits, end_points = nets_factory.networks_map[cfg.MODEL_NAME](
inputs=images,
num_classes=cfg.NUM_CLASSES,
is_training=False
)
class_scores = end_points['Predictions']
if output_classes:
if class_names == None:
class_names = tf.range(class_scores.get_shape().as_list()[1])
predicted_classes = tf.tile(tf.expand_dims(class_names, 0), [
tf.shape(class_scores)[0], 1], name=class_names_node_name)
# GVH: I would like to use tf.identity here, but the function tensorflow.python.framework.graph_util.remove_training_nodes
# called in (optimize_for_inference_lib.optimize_for_inference) removes the identity function.
# Sticking with an add 0 operation for now.
# We are doing this so that we can rename the output to `output_node_name` (i.e. something consistent)
output_node = tf.add(
end_points['Predictions'], 0., name=output_node_name)
output_node_name = output_node.op.name
if 'MOVING_AVERAGE_DECAY' in cfg and cfg.MOVING_AVERAGE_DECAY > 0:
variable_averages = tf.train.ExponentialMovingAverage(
cfg.MOVING_AVERAGE_DECAY, global_step)
variables_to_restore = variable_averages.variables_to_restore(
slim.get_model_variables())
else:
variables_to_restore = slim.get_variables_to_restore()
saver = tf.train.Saver(variables_to_restore, reshape=True)
if os.path.isdir(checkpoint_path):
checkpoint_dir = checkpoint_path
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
if checkpoint_path is None:
raise ValueError("Unable to find a model checkpoint in the "
"directory %s" % (checkpoint_dir,))
tf.logging.info('Exporting model: %s' % checkpoint_path)
sess_config = tf.ConfigProto(
log_device_placement=cfg.SESSION_CONFIG.LOG_DEVICE_PLACEMENT,
allow_soft_placement=True,
gpu_options=tf.GPUOptions(
per_process_gpu_memory_fraction=cfg.SESSION_CONFIG.PER_PROCESS_GPU_MEMORY_FRACTION
)
)
sess = tf.Session(graph=graph, config=sess_config)
if export_for_serving:
with tf.Session(graph=graph) as sess:
tf.global_variables_initializer().run()
saver.restore(sess, checkpoint_path)
save_path = os.path.join(export_dir, "%d" % (export_version,))
builder = saved_model_builder.SavedModelBuilder(save_path)
# Build the signature_def_map.
signature_def_map = {}
signature_def_outputs = {
'scores': utils.build_tensor_info(class_scores)}
if output_classes:
signature_def_outputs['classes'] = utils.build_tensor_info(
predicted_classes)
# image bytes input
if add_preprocess_step:
image_bytes_tensor_info = utils.build_tensor_info(
image_bytes_placeholder)
image_bytes_prediction_signature = signature_def_utils.build_signature_def(
inputs={'images': image_bytes_tensor_info},
outputs=signature_def_outputs,
method_name=signature_constants.PREDICT_METHOD_NAME
)
signature_def_map['predict_image_bytes'] = image_bytes_prediction_signature
# image array input
image_array_tensor_info = utils.build_tensor_info(
input_placeholder)
image_array_prediction_signature = signature_def_utils.build_signature_def(
inputs={'images': image_array_tensor_info},
outputs=signature_def_outputs,
method_name=signature_constants.PREDICT_METHOD_NAME
)
signature_def_map['predict_image_array'] = image_array_prediction_signature
signature_def_map[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = image_array_prediction_signature
legacy_init_op = tf.group(
tf.tables_initializer(), name='legacy_init_op')
builder.add_meta_graph_and_variables(
sess, [tag_constants.SERVING],
signature_def_map=signature_def_map,
legacy_init_op=legacy_init_op
)
builder.save()
print("Saved optimized model for TensorFlow Serving.")
else:
with sess.as_default():
tf.global_variables_initializer().run()
saver.restore(sess, checkpoint_path)
input_graph_def = graph.as_graph_def()
input_node_names = [array_input_node_name]
if add_preprocess_step:
input_node_names.append(bytes_input_node_name)
output_node_names = [output_node_name]
if output_classes:
output_node_names.append(class_names_node_name)
constant_graph_def = graph_util.convert_variables_to_constants(
sess=sess,
input_graph_def=input_graph_def,
output_node_names=output_node_names,
variable_names_whitelist=None,
variable_names_blacklist=None
)
if add_preprocess_step:
optimized_graph_def = constant_graph_def
else:
optimized_graph_def = optimize_for_inference_lib.optimize_for_inference(
input_graph_def=constant_graph_def,
input_node_names=input_node_names,
output_node_names=output_node_names,
placeholder_type_enum=dtypes.float32.as_datatype_enum
)
save_dir = os.path.join(export_dir, str(export_version))
if not os.path.exists(save_dir):
print("Making version directory in export directory: %s" %
(save_dir,))
os.makedirs(save_dir)
save_path = os.path.join(save_dir, 'optimized_model.pb')
with open(save_path, 'w') as f:
f.write(optimized_graph_def.SerializeToString())
print("Saved optimized model for mobile devices at: %s." %
(save_path,))
print("Input node names: %s" % (input_node_names,))
print("Output node name: %s" % (output_node_names,))
if export_tflite:
# Patch the tensorflow lite conversion module
# See here: https://github.com/tensorflow/tensorflow/issues/15410
import tempfile
import subprocess
tf.contrib.lite.tempfile = tempfile
tf.contrib.lite.subprocess = subprocess
assert batch_size != None, "We need a fixed batch size for the tensorflow lite export. (e.g. set --batch_size=1)"
tflite_model = tf.contrib.lite.toco_convert(
optimized_graph_def, [input_placeholder], [output_node])
tflite_save_path = os.path.join(
save_dir, 'optimized_model.tflite')
with open(tflite_save_path, 'wb') as f:
f.write(tflite_model)
print()
print("Saved optimized model for tensorflow lite: %s." %
(tflite_save_path,))
print("Input node names: %s" % (input_node_names,))
print("Output node name: %s" % (output_node_name,))
# We have to get out of the graph scope.
if export_coreml:
try:
import tfcoreml as tf_converter
except:
raise ValueError("Can't import tfcoreml, so we can't create a coreml model.")
assert batch_size != None, "We need a fixed batch size for the coreml export. (e.g. set --batch_size=1)"
assert raveled_input == False, "The input cannot be raveled. CoreML does not support `reshape()`."
coreml_save_path = os.path.join(save_dir, 'optimized_model.mlmodel')
tf_converter.convert(tf_model_path=save_path,
mlmodel_path=coreml_save_path,
output_feature_names=[output_node_name + ":0"],
input_name_shape_dict={'images:0': [
batch_size, input_height, input_width, input_depth]}
)
print()
print("Saved optimized model for coreml: %s." % (coreml_save_path,))
print("Input node names: %s" % (input_node_names,))
print("Output node name: %s" % (output_node_name,))
def parse_args():
parser = argparse.ArgumentParser(
description='Test an Inception V3 network')
parser.add_argument('--checkpoint_path', dest='checkpoint_path',
help='Path to the specific model you want to export.',
required=True, type=str)
parser.add_argument('--export_dir', dest='export_dir',
help='Path to a directory where the exported model will be saved.',
required=True, type=str)
parser.add_argument('--export_version', dest='export_version',
help='Version number of the model.',
required=True, type=int)
parser.add_argument('--config', dest='config_file',
help='Path to the configuration file',
required=True, type=str)
parser.add_argument('--serving', dest='serving',
help='Export for TensorFlow Serving usage. Otherwise, a constant graph will be generated.',
action='store_true', default=False)
parser.add_argument('--export_tflite', dest='export_tflite',
help='If True, then a tensorflow lite file will be produced along with the normal tensorflow model export (This is ignored if --serving is present).',
action='store_true', default=False)
parser.add_argument('--export_coreml', dest='export_coreml',
help='If True, then a coreml file will be produced along with the normal tensorflow model export (This is ignored if --serving is present).',
action='store_true', default=False)
parser.add_argument('--add_preprocess', dest='add_preprocess',
help='Add the image decoding and preprocessing nodes to the graph so that image bytes can be passed in.',
action='store_true', default=False)
parser.add_argument('--output_classes', dest='output_classes',
help='If True, then class indices (or names if `class_names` is provided) are output along with the scores.',
action='store_true', default=False)
parser.add_argument('--class_names', dest='class_names_path',
help='Path to the class names corresponding to each entry in the predictions output. This file should have one line for each index.',
required=False, type=str, default=None)
parser.add_argument('--batch_size', dest='batch_size',
help='Use this to specify a fixed batch size. Leave as None to have a flexible batch size. This must be specified to create tflite and coreml exports.',
required=False, type=int, default=None)
parser.add_argument('--raveled_input', dest='raveled_input',
help='If True, then the input is considered to be a vector that will be reshaped to the proper tensor form. This cannot be used with coreml',
action='store_true', default=False)
args = parser.parse_args()
return args
if __name__ == '__main__':
args = parse_args()
cfg = parse_config_file(args.config_file)
if args.class_names_path != None:
class_names = []
with open(args.class_names_path) as f:
for line in f:
class_names.append(line.strip())
else:
class_names = None
export(checkpoint_path=args.checkpoint_path,
export_dir=args.export_dir,
export_version=args.export_version,
export_for_serving=args.serving,
export_tflite=args.export_tflite,
export_coreml=args.export_coreml,
add_preprocess_step=args.add_preprocess,
output_classes=args.output_classes,
class_names=class_names,
batch_size=args.batch_size,
raveled_input=args.raveled_input,
cfg=cfg
)
================================================
FILE: extract.py
================================================
"""
Extract features.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import time
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
from config.parse_config import parse_config_file
from nets import nets_factory
from preprocessing import inputs
def extract_features(tfrecords, checkpoint_path, num_iterations, feature_keys, cfg, read_images=False):
"""
Extract and return the features
"""
tf.logging.set_verbosity(tf.logging.INFO)
graph = tf.Graph()
with graph.as_default():
global_step = slim.get_or_create_global_step()
with tf.device('/cpu:0'):
batch_dict = inputs.input_nodes(
tfrecords=tfrecords,
cfg=cfg.IMAGE_PROCESSING,
num_epochs=1,
batch_size=cfg.BATCH_SIZE,
num_threads=cfg.NUM_INPUT_THREADS,
shuffle_batch =cfg.SHUFFLE_QUEUE,
random_seed=cfg.RANDOM_SEED,
capacity=cfg.QUEUE_CAPACITY,
min_after_dequeue=cfg.QUEUE_MIN,
add_summaries=False,
input_type='classification',
read_filenames=read_images
)
arg_scope = nets_factory.arg_scopes_map[cfg.MODEL_NAME]()
with slim.arg_scope(arg_scope):
logits, end_points = nets_factory.networks_map[cfg.MODEL_NAME](
inputs=batch_dict['inputs'],
num_classes=cfg.NUM_CLASSES,
is_training=False
)
predicted_labels = tf.argmax(end_points['Predictions'], 1)
if 'MOVING_AVERAGE_DECAY' in cfg and cfg.MOVING_AVERAGE_DECAY > 0:
variable_averages = tf.train.ExponentialMovingAverage(
cfg.MOVING_AVERAGE_DECAY, global_step)
variables_to_restore = variable_averages.variables_to_restore(
slim.get_model_variables())
variables_to_restore[global_step.op.name] = global_step
else:
variables_to_restore = slim.get_variables_to_restore()
variables_to_restore.append(global_step)
saver = tf.train.Saver(variables_to_restore, reshape=True)
num_batches = num_iterations
num_items = num_batches * cfg.BATCH_SIZE
fetches = []
feature_stores = []
for feature_key in feature_keys:
feature = tf.reshape(end_points[feature_key], [cfg.BATCH_SIZE, -1])
num_elements = feature.get_shape().as_list()[1]
feature_stores.append(np.empty([num_items, num_elements], dtype=np.float32))
fetches.append(feature)
fetches.append(batch_dict['ids'])
feature_stores.append(np.empty(num_items, dtype=np.object))
if os.path.isdir(checkpoint_path):
checkpoint_dir = checkpoint_path
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
if checkpoint_path is None:
raise ValueError("Unable to find a model checkpoint in the " \
"directory %s" % (checkpoint_dir,))
tf.logging.info('Classifying records using %s' % checkpoint_path)
coord = tf.train.Coordinator()
sess_config = tf.ConfigProto(
log_device_placement=cfg.SESSION_CONFIG.LOG_DEVICE_PLACEMENT,
allow_soft_placement = True,
gpu_options = tf.GPUOptions(
per_process_gpu_memory_fraction=cfg.SESSION_CONFIG.PER_PROCESS_GPU_MEMORY_FRACTION
)
)
sess = tf.Session(graph=graph, config=sess_config)
with sess.as_default():
tf.global_variables_initializer().run()
tf.local_variables_initializer().run()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
try:
# Restore from checkpoint
saver.restore(sess, checkpoint_path)
print_str = ', '.join([
'Step: %d',
'Time/image (ms): %.1f'
])
step = 0
while not coord.should_stop():
t = time.time()
outputs = sess.run(fetches)
dt = time.time()-t
idx1 = cfg.BATCH_SIZE * step
idx2 = idx1 + cfg.BATCH_SIZE
for i in range(len(outputs)):
feature_stores[i][idx1:idx2] = outputs[i]
step += 1
print(print_str % (step, (dt / cfg.BATCH_SIZE) * 1000))
if num_iterations > 0 and step == num_iterations:
break
except tf.errors.OutOfRangeError as e:
pass
coord.request_stop()
coord.join(threads)
feature_dict = {feature_key : feature for feature_key, feature in zip(feature_keys, feature_stores[:-1])}
feature_dict['ids'] = feature_stores[-1]
return feature_dict
def extract_and_save(tfrecords, checkpoint_path, save_path, num_iterations, feature_keys, cfg, read_images=False):
"""Extract and save the features
Args:
tfrecords (list)
checkpoint_path (str)
save_dir (str)
max_iterations (int)
save_logits (bool)
cfg (EasyDict)
"""
feature_dict = extract_features(tfrecords, checkpoint_path, num_iterations, feature_keys, cfg, read_images=read_images)
# save the results
np.savez(save_path, **feature_dict)
def parse_args():
parser = argparse.ArgumentParser(description='Classify images, optionally saving the logits.')
parser.add_argument('--tfrecords', dest='tfrecords',
help='Paths to tfrecords.', type=str,
nargs='+', required=True)
parser.add_argument('--checkpoint_path', dest='checkpoint_path',
help='Path to a specific model to test against. If a directory, then the newest checkpoint file will be used.', type=str,
required=True)
parser.add_argument('--save_path', dest='save_path',
help='File name path to a save the classification results.', type=str,
required=True)
parser.add_argument('--config', dest='config_file',
help='Path to the configuration file',
required=True, type=str)
parser.add_argument('--batch_size', dest='batch_size',
help='The number of images in a batch.',
required=True, type=int)
parser.add_argument('--batches', dest='batches',
help='Maximum number of iterations to run. Default is all records (modulo the batch size).',
required=True, type=int)
parser.add_argument('--features', dest='features',
help='The features to extract. These are keys into the end_points dictionary returned by the model architecture.',
type=str, nargs='+', required=True)
parser.add_argument('--model_name', dest='model_name',
help='The name of the architecture to use.',
required=False, type=str, default=None)
parser.add_argument('--read_images', dest='read_images',
help='Read the images from the file system using the `filename` field rather than using the `encoded` field of the tfrecord.',
action='store_true', default=False)
args = parser.parse_args()
return args
def main():
args = parse_args()
cfg = parse_config_file(args.config_file)
if args.batch_size != None:
cfg.BATCH_SIZE = args.batch_size
if args.model_name != None:
cfg.MODEL_NAME = args.model_name
extract_and_save(
tfrecords=args.tfrecords,
checkpoint_path=args.checkpoint_path,
save_path = args.save_path,
num_iterations=args.batches,
feature_keys=args.features,
cfg=cfg,
read_images=args.read_images
)
if __name__ == '__main__':
main()
================================================
FILE: nets/README.md
================================================
# Models
This directory contains the available classification models. All of these models were copied from the [TensorFlow Models repo](https://github.com/tensorflow/models/tree/master/slim/nets) and updated to TensorFlow r1.0.
The table below lists relevant information for each model. To use one of these models (e.g. when using the training scripts), simply set the `--model_name` flag to the appropriate name. The number of parameters and the number of flops were computed using the `profile` function in [net_profile.py](net_profile.py). I assumed a batch size of 1, and 1000 classes for all models. All available checkpoint files are from models trained on the [ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/) dataset. Top-1 and Top-5 numbers correspond to performance on that datasets. When fine-tuning from one of these checkpoints, it is recommended to use the same image size as the default image size for that model.
| Model | Name | TF-Slim File | Checkpoint | Top-1 Accuracy | Top-5 Accuracy | Default Image Size | Num Params | Num Flops |
:----:|:----:|:------------:|:----------:|:-------:|:--------:|:--------:|:--------:|:--------:|
[Inception V1](http://arxiv.org/abs/1409.4842v1) | inception_v1 | [Code](inception_v1.py) | [Checkpoint](http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz) | 69.8 | 89.6 | 224px | 6,617,624 | 3.00b |
[Inception V2](http://arxiv.org/abs/1502.03167) | inception_v2 | [Code](inception_v2.py) | [Checkpoint](http://download.tensorflow.org/models/inception_v2_2016_08_28.tar.gz) | 73.9 | 91.8 | 224px | 11,178,336 | 3.87b |
[Inception V3](http://arxiv.org/abs/1512.00567) | inception_v3 | [Code](inception_v3.py) | [Checkpoint](http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz) | 78.0 | 93.9 | 299px | 27,143,152 | 11.44b |
[Inception V4](http://arxiv.org/abs/1602.07261) | inception_v4 | [Code](inception_v4.py) | [Checkpoint](http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz) | 80.2 | 95.2 | 299px | 46,006,800 | 24.52b |
[Inception-ResNet-v2](http://arxiv.org/abs/1602.07261) | inception_resnet_v2 | [Code](inception_resnet_v2.py) | [Checkpoint](http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz) | 80.4 | 95.3 | 299px | 59,179,952 | 26.34b |
[ResNet V2 50](https://arxiv.org/abs/1603.05027) | resnet_v2_50 | [Code](resnet_v2.py) | [Checkpoint](http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz) | 75.6 | 92.8 | 299px | 25,568,360 | 13.08b |
[ResNet V2 101](https://arxiv.org/abs/1603.05027) | resnet_v2_101 | [Code](resnet_v2.py) | [Checkpoint](http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz) | 77.0 | 93.7 | 299px | 44,577,896 | 26.77b |
[ResNet V2 152](https://arxiv.org/abs/1603.05027) | resnet_v2_152 | [Code](resnet_v2.py) | [Checkpoint](http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz) | 77.8 | 94.1 | 299px | 60,236,904 | 40.45b |
[MobileNet-v1](https://arxiv.org/abs/1704.04861) | mobilenet_v1 | [Code](mobilenet_v1.py) | [Checkpoint](http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz) | 70.7 | 89.5 | 224px | 4,231,976 | 1.14b |
# Finetuning
When you finetune one of the above models, you'll start the training procedure using something like:
```
python train.py \
--tfrecords $DATASET_DIR/train* \
--logdir $EXPERIMENT_DIR/logdir \
--config $EXPERIMENT_DIR/config_train.yaml \
--pretrained_model $PRETRAINED_MODEL \
--checkpoint_exclude_scopes
```
The `--checkpoint_exclude_scopes` argument allows you to prevent restoring variables that have different sizes, which are typically your logit variables (which have a different size due to the number of classes in your application being different than the number of classes in ImageNet). The below table provides the proper value for `--checkpoint_exclude_scopes` for each model.
| Model | Name | TF-Slim File | Default Image Size | Exclude Scopes |
:----:|:----:|:------------:|:----------:|:-------:|
[Inception V1](http://arxiv.org/abs/1409.4842v1) | inception_v1 | [Code](inception_v1.py) | 224px | InceptionV1/Logits |
[Inception V2](http://arxiv.org/abs/1502.03167) | inception_v2 | [Code](inception_v2.py) | 224px | InceptionV2/Logits |
[Inception V3](http://arxiv.org/abs/1512.00567) | inception_v3 | [Code](inception_v3.py) | 299px | InceptionV3/Logits InceptionV3/AuxLogits |
[Inception V4](http://arxiv.org/abs/1602.07261) | inception_v4 | [Code](inception_v4.py) | 299px | InceptionV4/Logits InceptionV4/AuxLogits |
[Inception-ResNet-v2](http://arxiv.org/abs/1602.07261) | inception_resnet_v2 | [Code](inception_resnet_v2.py) | 299px | InceptionResnetV2/Logits InceptionResnetV2/AuxLogits |
[ResNet V2 50](https://arxiv.org/abs/1603.05027) | resnet_v2_50 | [Code](resnet_v2.py) | 224px | resnet_v2_50/logits |
[ResNet V2 101](https://arxiv.org/abs/1603.05027) | resnet_v2_101 | [Code](resnet_v2.py) | 224px | resnet_v2_101/logits |
[ResNet V2 152](https://arxiv.org/abs/1603.05027) | resnet_v2_152 | [Code](resnet_v2.py) | 224px | resnet_v2_152/logits |
[MobileNet-v1](https://arxiv.org/abs/1704.04861) | mobilenet_v1 | [Code](mobilenet_v1.py) | 224px | MobilenetV1/Logits |
================================================
FILE: nets/__init__.py
================================================
================================================
FILE: nets/inception.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Brings all inception models under one namespace."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# pylint: disable=unused-import
from nets.inception_resnet_v2 import inception_resnet_v2
from nets.inception_resnet_v2 import inception_resnet_v2_arg_scope
from nets.inception_v1 import inception_v1
from nets.inception_v1 import inception_v1_arg_scope
from nets.inception_v1 import inception_v1_base
from nets.inception_v2 import inception_v2
from nets.inception_v2 import inception_v2_arg_scope
from nets.inception_v2 import inception_v2_base
from nets.inception_v3 import inception_v3
from nets.inception_v3 import inception_v3_arg_scope
from nets.inception_v3 import inception_v3_base
from nets.inception_v4 import inception_v4
from nets.inception_v4 import inception_v4_arg_scope
from nets.inception_v4 import inception_v4_base
# pylint: enable=unused-import
================================================
FILE: nets/inception_resnet_v2.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the definition of the Inception Resnet V2 architecture.
As described in http://arxiv.org/abs/1602.07261.
Inception-v4, Inception-ResNet and the Impact of Residual Connections
on Learning
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
slim = tf.contrib.slim
def block35(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
"""Builds the 35x35 resnet block."""
with tf.variable_scope(scope, 'Block35', [net], reuse=reuse):
with tf.variable_scope('Branch_0'):
tower_conv = slim.conv2d(net, 32, 1, scope='Conv2d_1x1')
with tf.variable_scope('Branch_1'):
tower_conv1_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
tower_conv1_1 = slim.conv2d(tower_conv1_0, 32, 3, scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
tower_conv2_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
tower_conv2_1 = slim.conv2d(tower_conv2_0, 48, 3, scope='Conv2d_0b_3x3')
tower_conv2_2 = slim.conv2d(tower_conv2_1, 64, 3, scope='Conv2d_0c_3x3')
mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, tower_conv2_2])
up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
activation_fn=None, scope='Conv2d_1x1')
net += scale * up
if activation_fn:
net = activation_fn(net)
return net
def block17(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
"""Builds the 17x17 resnet block."""
with tf.variable_scope(scope, 'Block17', [net], reuse=reuse):
with tf.variable_scope('Branch_0'):
tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
with tf.variable_scope('Branch_1'):
tower_conv1_0 = slim.conv2d(net, 128, 1, scope='Conv2d_0a_1x1')
tower_conv1_1 = slim.conv2d(tower_conv1_0, 160, [1, 7],
scope='Conv2d_0b_1x7')
tower_conv1_2 = slim.conv2d(tower_conv1_1, 192, [7, 1],
scope='Conv2d_0c_7x1')
mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
activation_fn=None, scope='Conv2d_1x1')
net += scale * up
if activation_fn:
net = activation_fn(net)
return net
def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
"""Builds the 8x8 resnet block."""
with tf.variable_scope(scope, 'Block8', [net], reuse=reuse):
with tf.variable_scope('Branch_0'):
tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
with tf.variable_scope('Branch_1'):
tower_conv1_0 = slim.conv2d(net, 192, 1, scope='Conv2d_0a_1x1')
tower_conv1_1 = slim.conv2d(tower_conv1_0, 224, [1, 3],
scope='Conv2d_0b_1x3')
tower_conv1_2 = slim.conv2d(tower_conv1_1, 256, [3, 1],
scope='Conv2d_0c_3x1')
mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
activation_fn=None, scope='Conv2d_1x1')
net += scale * up
if activation_fn:
net = activation_fn(net)
return net
def inception_resnet_v2(inputs, num_classes=1001, is_training=True,
dropout_keep_prob=0.8,
reuse=None,
scope='InceptionResnetV2'):
"""Creates the Inception Resnet V2 model.
Args:
inputs: a 4-D tensor of size [batch_size, height, width, 3].
num_classes: number of predicted classes.
is_training: whether is training or not.
dropout_keep_prob: float, the fraction to keep before final layer.
reuse: whether or not the network and its variables should be reused. To be
able to reuse 'scope' must be given.
scope: Optional variable_scope.
Returns:
logits: the logits outputs of the model.
end_points: the set of end_points from the inception model.
"""
end_points = {}
with tf.variable_scope(scope, 'InceptionResnetV2', [inputs], reuse=reuse):
with slim.arg_scope([slim.batch_norm, slim.dropout],
is_training=is_training):
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
stride=1, padding='SAME'):
# 149 x 149 x 32
net = slim.conv2d(inputs, 32, 3, stride=2, padding='VALID',
scope='Conv2d_1a_3x3')
end_points['Conv2d_1a_3x3'] = net
# 147 x 147 x 32
net = slim.conv2d(net, 32, 3, padding='VALID',
scope='Conv2d_2a_3x3')
end_points['Conv2d_2a_3x3'] = net
# 147 x 147 x 64
net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3')
end_points['Conv2d_2b_3x3'] = net
# 73 x 73 x 64
net = slim.max_pool2d(net, 3, stride=2, padding='VALID',
scope='MaxPool_3a_3x3')
end_points['MaxPool_3a_3x3'] = net
# 73 x 73 x 80
net = slim.conv2d(net, 80, 1, padding='VALID',
scope='Conv2d_3b_1x1')
end_points['Conv2d_3b_1x1'] = net
# 71 x 71 x 192
net = slim.conv2d(net, 192, 3, padding='VALID',
scope='Conv2d_4a_3x3')
end_points['Conv2d_4a_3x3'] = net
# 35 x 35 x 192
net = slim.max_pool2d(net, 3, stride=2, padding='VALID',
scope='MaxPool_5a_3x3')
end_points['MaxPool_5a_3x3'] = net
# 35 x 35 x 320
with tf.variable_scope('Mixed_5b'):
with tf.variable_scope('Branch_0'):
tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1')
with tf.variable_scope('Branch_1'):
tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1')
tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5,
scope='Conv2d_0b_5x5')
with tf.variable_scope('Branch_2'):
tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1')
tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3,
scope='Conv2d_0b_3x3')
tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3,
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME',
scope='AvgPool_0a_3x3')
tower_pool_1 = slim.conv2d(tower_pool, 64, 1,
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[tower_conv, tower_conv1_1,
tower_conv2_2, tower_pool_1])
end_points['Mixed_5b'] = net
net = slim.repeat(net, 10, block35, scale=0.17)
# 17 x 17 x 1024
with tf.variable_scope('Mixed_6a'):
with tf.variable_scope('Branch_0'):
tower_conv = slim.conv2d(net, 384, 3, stride=2, padding='VALID',
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3,
scope='Conv2d_0b_3x3')
tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3,
stride=2, padding='VALID',
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
tower_pool = slim.max_pool2d(net, 3, stride=2, padding='VALID',
scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[tower_conv, tower_conv1_2, tower_pool])
end_points['Mixed_6a'] = net
net = slim.repeat(net, 20, block17, scale=0.10)
# Auxillary tower
with tf.variable_scope('AuxLogits'):
# Originally, kernel_size = 5
# However, if we change the input size then we need to change the kernel size
# We want to pool the feature map to be 5x5xC
# With padding = 0, and stride 3, this means our kernel is H - 12
kernel_size = [net.get_shape().as_list()[1] - 12] * 2
aux = slim.avg_pool2d(net, kernel_size, stride=3, padding='VALID',
scope='Conv2d_1a_3x3')
aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1')
aux = slim.conv2d(aux, 768, aux.get_shape()[1:3],
padding='VALID', scope='Conv2d_2a_5x5')
aux = slim.flatten(aux)
aux = slim.fully_connected(aux, num_classes, activation_fn=None,
scope='Logits')
end_points['AuxLogits'] = aux
with tf.variable_scope('Mixed_7a'):
with tf.variable_scope('Branch_0'):
tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3,
scope='Conv2d_0b_3x3')
tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_3'):
tower_pool = slim.max_pool2d(net, 3, stride=2, padding='VALID',
scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[tower_conv_1, tower_conv1_1,
tower_conv2_2, tower_pool])
end_points['Mixed_7a'] = net
net = slim.repeat(net, 9, block8, scale=0.20)
net = block8(net, activation_fn=None)
net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1')
end_points['Conv2d_7b_1x1'] = net
with tf.variable_scope('Logits'):
end_points['PrePool'] = net
net = slim.avg_pool2d(net, net.get_shape()[1:3], padding='VALID',
scope='AvgPool_1a_8x8')
net = slim.flatten(net)
net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
scope='Dropout')
end_points['PreLogitsFlatten'] = net
logits = slim.fully_connected(net, num_classes, activation_fn=None,
scope='Logits')
end_points['Logits'] = logits
end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
return logits, end_points
inception_resnet_v2.default_image_size = 299
def inception_resnet_v2_arg_scope(weight_decay=0.00004,
batch_norm_decay=0.9997,
batch_norm_epsilon=0.001):
"""Yields the scope with the default parameters for inception_resnet_v2.
Args:
weight_decay: the weight decay for weights variables.
batch_norm_decay: decay for the moving average of batch_norm momentums.
batch_norm_epsilon: small float added to variance to avoid dividing by zero.
Returns:
a arg_scope with the parameters needed for inception_resnet_v2.
"""
# Set weight_decay for weights in conv2d and fully_connected layers.
with slim.arg_scope([slim.conv2d, slim.fully_connected],
weights_regularizer=slim.l2_regularizer(weight_decay),
biases_regularizer=slim.l2_regularizer(weight_decay)):
batch_norm_params = {
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon,
}
# Set activation_fn and parameters for batch_norm.
with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu,
normalizer_fn=slim.batch_norm,
normalizer_params=batch_norm_params) as scope:
return scope
================================================
FILE: nets/inception_resnet_v2_test.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for slim.inception_resnet_v2."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception
class InceptionTest(tf.test.TestCase):
def testBuildLogits(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
with self.test_session():
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, _ = inception.inception_resnet_v2(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
def testBuildEndPoints(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
with self.test_session():
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_resnet_v2(inputs, num_classes)
self.assertTrue('Logits' in end_points)
logits = end_points['Logits']
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue('AuxLogits' in end_points)
aux_logits = end_points['AuxLogits']
self.assertListEqual(aux_logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['PrePool']
self.assertListEqual(pre_pool.get_shape().as_list(),
[batch_size, 8, 8, 1536])
def testVariablesSetDevice(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
with self.test_session():
inputs = tf.random_uniform((batch_size, height, width, 3))
# Force all Variables to reside on the device.
with tf.variable_scope('on_cpu'), tf.device('/cpu:0'):
inception.inception_resnet_v2(inputs, num_classes)
with tf.variable_scope('on_gpu'), tf.device('/gpu:0'):
inception.inception_resnet_v2(inputs, num_classes)
for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_cpu'):
self.assertDeviceEqual(v.device, '/cpu:0')
for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_gpu'):
self.assertDeviceEqual(v.device, '/gpu:0')
def testHalfSizeImages(self):
batch_size = 5
height, width = 150, 150
num_classes = 1000
with self.test_session():
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_resnet_v2(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['PrePool']
self.assertListEqual(pre_pool.get_shape().as_list(),
[batch_size, 3, 3, 1536])
def testUnknownBatchSize(self):
batch_size = 1
height, width = 299, 299
num_classes = 1000
with self.test_session() as sess:
inputs = tf.placeholder(tf.float32, (None, height, width, 3))
logits, _ = inception.inception_resnet_v2(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionResnetV2/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[None, num_classes])
images = tf.random_uniform((batch_size, height, width, 3))
sess.run(tf.global_variables_initializer())
output = sess.run(logits, {inputs: images.eval()})
self.assertEquals(output.shape, (batch_size, num_classes))
def testEvaluation(self):
batch_size = 2
height, width = 299, 299
num_classes = 1000
with self.test_session() as sess:
eval_inputs = tf.random_uniform((batch_size, height, width, 3))
logits, _ = inception.inception_resnet_v2(eval_inputs,
num_classes,
is_training=False)
predictions = tf.argmax(logits, 1)
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testTrainEvalWithReuse(self):
train_batch_size = 5
eval_batch_size = 2
height, width = 150, 150
num_classes = 1000
with self.test_session() as sess:
train_inputs = tf.random_uniform((train_batch_size, height, width, 3))
inception.inception_resnet_v2(train_inputs, num_classes)
eval_inputs = tf.random_uniform((eval_batch_size, height, width, 3))
logits, _ = inception.inception_resnet_v2(eval_inputs,
num_classes,
is_training=False,
reuse=True)
predictions = tf.argmax(logits, 1)
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (eval_batch_size,))
if __name__ == '__main__':
tf.test.main()
================================================
FILE: nets/inception_utils.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains common code shared by all inception models.
Usage of arg scope:
with slim.arg_scope(inception_arg_scope()):
logits, end_points = inception.inception_v3(images, num_classes,
is_training=is_training)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
slim = tf.contrib.slim
def inception_arg_scope(weight_decay=0.00004,
use_batch_norm=True,
batch_norm_decay=0.9997,
batch_norm_epsilon=0.001):
"""Defines the default arg scope for inception models.
Args:
weight_decay: The weight decay to use for regularizing the model.
use_batch_norm: "If `True`, batch_norm is applied after each convolution.
batch_norm_decay: Decay for batch norm moving average.
batch_norm_epsilon: Small float added to variance to avoid dividing by zero
in batch norm.
Returns:
An `arg_scope` to use for the inception models.
"""
batch_norm_params = {
# Decay for the moving averages.
'decay': batch_norm_decay,
# epsilon to prevent 0s in variance.
'epsilon': batch_norm_epsilon,
# collection containing update_ops.
'updates_collections': tf.GraphKeys.UPDATE_OPS,
}
if use_batch_norm:
normalizer_fn = slim.batch_norm
normalizer_params = batch_norm_params
else:
normalizer_fn = None
normalizer_params = {}
# Set weight_decay for weights in Conv and FC layers.
with slim.arg_scope([slim.conv2d, slim.fully_connected],
weights_regularizer=slim.l2_regularizer(weight_decay)):
with slim.arg_scope(
[slim.conv2d],
weights_initializer=slim.variance_scaling_initializer(),
activation_fn=tf.nn.relu,
normalizer_fn=normalizer_fn,
normalizer_params=normalizer_params) as sc:
return sc
================================================
FILE: nets/inception_v1.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the definition for inception v1 classification network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception_utils
slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
def inception_v1_base(inputs,
final_endpoint='Mixed_5c',
scope='InceptionV1'):
"""Defines the Inception V1 base architecture.
This architecture is defined in:
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich.
http://arxiv.org/pdf/1409.4842v1.pdf.
Args:
inputs: a tensor of size [batch_size, height, width, channels].
final_endpoint: specifies the endpoint to construct the network up to. It
can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e',
'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b', 'Mixed_5c']
scope: Optional variable_scope.
Returns:
A dictionary from components of the network to the corresponding activation.
Raises:
ValueError: if final_endpoint is not set to one of the predefined values.
"""
end_points = {}
with tf.variable_scope(scope, 'InceptionV1', [inputs]):
with slim.arg_scope(
[slim.conv2d, slim.fully_connected],
weights_initializer=trunc_normal(0.01)):
with slim.arg_scope([slim.conv2d, slim.max_pool2d],
stride=1, padding='SAME'):
end_point = 'Conv2d_1a_7x7'
net = slim.conv2d(inputs, 64, [7, 7], stride=2, scope=end_point)
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'MaxPool_2a_3x3'
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Conv2d_2b_1x1'
net = slim.conv2d(net, 64, [1, 1], scope=end_point)
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Conv2d_2c_3x3'
net = slim.conv2d(net, 192, [3, 3], scope=end_point)
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'MaxPool_3a_3x3'
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_3b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 96, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 128, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 16, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 32, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 32, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_3c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 192, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'MaxPool_4a_3x3'
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_4b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 96, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 208, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 16, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 48, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_4c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 112, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 224, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 24, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 64, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_4d'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 128, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 256, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 24, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 64, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_4e'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 112, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 144, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 288, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 64, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 64, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_4f'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 256, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 320, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 128, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'MaxPool_5a_2x2'
net = slim.max_pool2d(net, [2, 2], stride=2, scope=end_point)
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_5b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 256, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 160, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 320, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 32, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 128, [3, 3], scope='Conv2d_0a_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
end_point = 'Mixed_5c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 384, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 192, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 384, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, 48, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 128, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if final_endpoint == end_point: return net, end_points
raise ValueError('Unknown final endpoint %s' % final_endpoint)
def inception_v1(inputs,
num_classes=1000,
is_training=True,
dropout_keep_prob=0.8,
prediction_fn=slim.softmax,
spatial_squeeze=True,
reuse=None,
scope='InceptionV1'):
"""Defines the Inception V1 architecture.
This architecture is defined in:
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich.
http://arxiv.org/pdf/1409.4842v1.pdf.
The default image size used to train this network is 224x224.
Args:
inputs: a tensor of size [batch_size, height, width, channels].
num_classes: number of predicted classes.
is_training: whether is training or not.
dropout_keep_prob: the percentage of activation values that are retained.
prediction_fn: a function to get predictions out of logits.
spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
reuse: whether or not the network and its variables should be reused. To be
able to reuse 'scope' must be given.
scope: Optional variable_scope.
Returns:
logits: the pre-softmax activations, a tensor of size
[batch_size, num_classes]
end_points: a dictionary from components of the network to the corresponding
activation.
"""
# Final pooling and prediction
with tf.variable_scope(scope, 'InceptionV1', [inputs, num_classes],
reuse=reuse) as scope:
with slim.arg_scope([slim.batch_norm, slim.dropout],
is_training=is_training):
net, end_points = inception_v1_base(inputs, scope=scope)
with tf.variable_scope('Logits'):
net = slim.avg_pool2d(net, [7, 7], stride=1, scope='MaxPool_0a_7x7')
net = slim.dropout(net,
dropout_keep_prob, scope='Dropout_0b')
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
normalizer_fn=None, scope='Conv2d_0c_1x1')
if spatial_squeeze:
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
end_points['Logits'] = logits
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
return logits, end_points
inception_v1.default_image_size = 224
inception_v1_arg_scope = inception_utils.inception_arg_scope
================================================
FILE: nets/inception_v1_test.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for nets.inception_v1."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from nets import inception
slim = tf.contrib.slim
class InceptionV1Test(tf.test.TestCase):
def testBuildClassificationNetwork(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_v1(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV1/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue('Predictions' in end_points)
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
[batch_size, num_classes])
def testBuildBaseNetwork(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
mixed_6c, end_points = inception.inception_v1_base(inputs)
self.assertTrue(mixed_6c.op.name.startswith('InceptionV1/Mixed_5c'))
self.assertListEqual(mixed_6c.get_shape().as_list(),
[batch_size, 7, 7, 1024])
expected_endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b',
'Mixed_3c', 'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c',
'Mixed_4d', 'Mixed_4e', 'Mixed_4f', 'MaxPool_5a_2x2',
'Mixed_5b', 'Mixed_5c']
self.assertItemsEqual(end_points.keys(), expected_endpoints)
def testBuildOnlyUptoFinalEndpoint(self):
batch_size = 5
height, width = 224, 224
endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
'MaxPool_4a_3x3', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d',
'Mixed_4e', 'Mixed_4f', 'MaxPool_5a_2x2', 'Mixed_5b',
'Mixed_5c']
for index, endpoint in enumerate(endpoints):
with tf.Graph().as_default():
inputs = tf.random_uniform((batch_size, height, width, 3))
out_tensor, end_points = inception.inception_v1_base(
inputs, final_endpoint=endpoint)
self.assertTrue(out_tensor.op.name.startswith(
'InceptionV1/' + endpoint))
self.assertItemsEqual(endpoints[:index+1], end_points)
def testBuildAndCheckAllEndPointsUptoMixed5c(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v1_base(inputs,
final_endpoint='Mixed_5c')
endpoints_shapes = {'Conv2d_1a_7x7': [5, 112, 112, 64],
'MaxPool_2a_3x3': [5, 56, 56, 64],
'Conv2d_2b_1x1': [5, 56, 56, 64],
'Conv2d_2c_3x3': [5, 56, 56, 192],
'MaxPool_3a_3x3': [5, 28, 28, 192],
'Mixed_3b': [5, 28, 28, 256],
'Mixed_3c': [5, 28, 28, 480],
'MaxPool_4a_3x3': [5, 14, 14, 480],
'Mixed_4b': [5, 14, 14, 512],
'Mixed_4c': [5, 14, 14, 512],
'Mixed_4d': [5, 14, 14, 512],
'Mixed_4e': [5, 14, 14, 528],
'Mixed_4f': [5, 14, 14, 832],
'MaxPool_5a_2x2': [5, 7, 7, 832],
'Mixed_5b': [5, 7, 7, 832],
'Mixed_5c': [5, 7, 7, 1024]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name in endpoints_shapes:
expected_shape = endpoints_shapes[endpoint_name]
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testModelHasExpectedNumberOfParameters(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(inception.inception_v1_arg_scope()):
inception.inception_v1_base(inputs)
total_params, _ = slim.model_analyzer.analyze_vars(
slim.get_model_variables())
self.assertAlmostEqual(5607184, total_params)
def testHalfSizeImages(self):
batch_size = 5
height, width = 112, 112
inputs = tf.random_uniform((batch_size, height, width, 3))
mixed_5c, _ = inception.inception_v1_base(inputs)
self.assertTrue(mixed_5c.op.name.startswith('InceptionV1/Mixed_5c'))
self.assertListEqual(mixed_5c.get_shape().as_list(),
[batch_size, 4, 4, 1024])
def testUnknownImageShape(self):
tf.reset_default_graph()
batch_size = 2
height, width = 224, 224
num_classes = 1000
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
with self.test_session() as sess:
inputs = tf.placeholder(tf.float32, shape=(batch_size, None, None, 3))
logits, end_points = inception.inception_v1(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV1/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Mixed_5c']
feed_dict = {inputs: input_np}
tf.global_variables_initializer().run()
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 7, 7, 1024])
def testUnknowBatchSize(self):
batch_size = 1
height, width = 224, 224
num_classes = 1000
inputs = tf.placeholder(tf.float32, (None, height, width, 3))
logits, _ = inception.inception_v1(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV1/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[None, num_classes])
images = tf.random_uniform((batch_size, height, width, 3))
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(logits, {inputs: images.eval()})
self.assertEquals(output.shape, (batch_size, num_classes))
def testEvaluation(self):
batch_size = 2
height, width = 224, 224
num_classes = 1000
eval_inputs = tf.random_uniform((batch_size, height, width, 3))
logits, _ = inception.inception_v1(eval_inputs, num_classes,
is_training=False)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testTrainEvalWithReuse(self):
train_batch_size = 5
eval_batch_size = 2
height, width = 224, 224
num_classes = 1000
train_inputs = tf.random_uniform((train_batch_size, height, width, 3))
inception.inception_v1(train_inputs, num_classes)
eval_inputs = tf.random_uniform((eval_batch_size, height, width, 3))
logits, _ = inception.inception_v1(eval_inputs, num_classes, reuse=True)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (eval_batch_size,))
def testLogitsNotSqueezed(self):
num_classes = 25
images = tf.random_uniform([1, 224, 224, 3])
logits, _ = inception.inception_v1(images,
num_classes=num_classes,
spatial_squeeze=False)
with self.test_session() as sess:
tf.global_variables_initializer().run()
logits_out = sess.run(logits)
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
if __name__ == '__main__':
tf.test.main()
================================================
FILE: nets/inception_v2.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the definition for inception v2 classification network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception_utils
slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
def inception_v2_base(inputs,
final_endpoint='Mixed_5c',
min_depth=16,
depth_multiplier=1.0,
scope=None):
"""Inception v2 (6a2).
Constructs an Inception v2 network from inputs to the given final endpoint.
This method can construct the network up to the layer inception(5b) as
described in http://arxiv.org/abs/1502.03167.
Args:
inputs: a tensor of shape [batch_size, height, width, channels].
final_endpoint: specifies the endpoint to construct the network up to. It
can be one of ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c', 'Mixed_4a',
'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_5a', 'Mixed_5b',
'Mixed_5c'].
min_depth: Minimum depth value (number of channels) for all convolution ops.
Enforced when depth_multiplier < 1, and not an active constraint when
depth_multiplier >= 1.
depth_multiplier: Float multiplier for the depth (number of channels)
for all convolution ops. The value must be greater than zero. Typical
usage will be to set this value in (0, 1) to reduce the number of
parameters or computation cost of the model.
scope: Optional variable_scope.
Returns:
tensor_out: output tensor corresponding to the final_endpoint.
end_points: a set of activations for external use, for example summaries or
losses.
Raises:
ValueError: if final_endpoint is not set to one of the predefined values,
or depth_multiplier <= 0
"""
# end_points will collect relevant activations for external use, for example
# summaries or losses.
end_points = {}
# Used to find thinned depths for each layer.
if depth_multiplier <= 0:
raise ValueError('depth_multiplier is not greater than zero.')
depth = lambda d: max(int(d * depth_multiplier), min_depth)
with tf.variable_scope(scope, 'InceptionV2', [inputs]):
with slim.arg_scope(
[slim.conv2d, slim.max_pool2d, slim.avg_pool2d, slim.separable_conv2d],
stride=1, padding='SAME'):
# Note that sizes in the comments below assume an input spatial size of
# 224x224, however, the inputs can be of any size greater 32x32.
# 224 x 224 x 3
end_point = 'Conv2d_1a_7x7'
# depthwise_multiplier here is different from depth_multiplier.
# depthwise_multiplier determines the output channels of the initial
# depthwise conv (see docs for tf.nn.separable_conv2d), while
# depth_multiplier controls the # channels of the subsequent 1x1
# convolution. Must have
# in_channels * depthwise_multipler <= out_channels
# so that the separable convolution is not overparameterized.
depthwise_multiplier = min(int(depth(64) / 3), 8)
net = slim.separable_conv2d(
inputs, depth(64), [7, 7], depth_multiplier=depthwise_multiplier,
stride=2, weights_initializer=trunc_normal(1.0),
scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 112 x 112 x 64
end_point = 'MaxPool_2a_3x3'
net = slim.max_pool2d(net, [3, 3], scope=end_point, stride=2)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 56 x 56 x 64
end_point = 'Conv2d_2b_1x1'
net = slim.conv2d(net, depth(64), [1, 1], scope=end_point,
weights_initializer=trunc_normal(0.1))
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 56 x 56 x 64
end_point = 'Conv2d_2c_3x3'
net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 56 x 56 x 192
end_point = 'MaxPool_3a_3x3'
net = slim.max_pool2d(net, [3, 3], scope=end_point, stride=2)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 28 x 28 x 192
# Inception module.
end_point = 'Mixed_3b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [3, 3],
scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(32), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 28 x 28 x 256
end_point = 'Mixed_3c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3],
scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(64), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 28 x 28 x 320
end_point = 'Mixed_4a'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(
net, depth(128), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_0 = slim.conv2d(branch_0, depth(160), [3, 3], stride=2,
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(
branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
branch_1 = slim.conv2d(
branch_1, depth(96), [3, 3], stride=2, scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(
net, [3, 3], stride=2, scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 14 x 14 x 576
end_point = 'Mixed_4b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(224), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(64), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(
branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(96), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(128), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 14 x 14 x 576
end_point = 'Mixed_4c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(96), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(128), [3, 3],
scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(96), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(128), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(128), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 14 x 14 x 576
end_point = 'Mixed_4d'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(128), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(160), [3, 3],
scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(128), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(160), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(160), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(96), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 14 x 14 x 576
end_point = 'Mixed_4e'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(96), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(128), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(192), [3, 3],
scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(160), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(192), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(192), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(96), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 14 x 14 x 576
end_point = 'Mixed_5a'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(
net, depth(128), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_0 = slim.conv2d(branch_0, depth(192), [3, 3], stride=2,
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(192), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(256), [3, 3],
scope='Conv2d_0b_3x3')
branch_1 = slim.conv2d(branch_1, depth(256), [3, 3], stride=2,
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(net, [3, 3], stride=2,
scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 7 x 7 x 1024
end_point = 'Mixed_5b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(352), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(192), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(320), [3, 3],
scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(160), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(128), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 7 x 7 x 1024
end_point = 'Mixed_5c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(352), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(
net, depth(192), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(320), [3, 3],
scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(
net, depth(192), [1, 1],
weights_initializer=trunc_normal(0.09),
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(224), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.max_pool2d(net, [3, 3], scope='MaxPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(128), [1, 1],
weights_initializer=trunc_normal(0.1),
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
raise ValueError('Unknown final endpoint %s' % final_endpoint)
def inception_v2(inputs,
num_classes=1000,
is_training=True,
dropout_keep_prob=0.8,
min_depth=16,
depth_multiplier=1.0,
prediction_fn=slim.softmax,
spatial_squeeze=True,
reuse=None,
scope='InceptionV2'):
"""Inception v2 model for classification.
Constructs an Inception v2 network for classification as described in
http://arxiv.org/abs/1502.03167.
The default image size used to train this network is 224x224.
Args:
inputs: a tensor of shape [batch_size, height, width, channels].
num_classes: number of predicted classes.
is_training: whether is training or not.
dropout_keep_prob: the percentage of activation values that are retained.
min_depth: Minimum depth value (number of channels) for all convolution ops.
Enforced when depth_multiplier < 1, and not an active constraint when
depth_multiplier >= 1.
depth_multiplier: Float multiplier for the depth (number of channels)
for all convolution ops. The value must be greater than zero. Typical
usage will be to set this value in (0, 1) to reduce the number of
parameters or computation cost of the model.
prediction_fn: a function to get predictions out of logits.
spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
reuse: whether or not the network and its variables should be reused. To be
able to reuse 'scope' must be given.
scope: Optional variable_scope.
Returns:
logits: the pre-softmax activations, a tensor of size
[batch_size, num_classes]
end_points: a dictionary from components of the network to the corresponding
activation.
Raises:
ValueError: if final_endpoint is not set to one of the predefined values,
or depth_multiplier <= 0
"""
if depth_multiplier <= 0:
raise ValueError('depth_multiplier is not greater than zero.')
# Final pooling and prediction
with tf.variable_scope(scope, 'InceptionV2', [inputs, num_classes],
reuse=reuse) as scope:
with slim.arg_scope([slim.batch_norm, slim.dropout],
is_training=is_training):
net, end_points = inception_v2_base(
inputs, scope=scope, min_depth=min_depth,
depth_multiplier=depth_multiplier)
with tf.variable_scope('Logits'):
kernel_size = _reduced_kernel_size_for_small_input(net, [7, 7])
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
scope='AvgPool_1a_{}x{}'.format(*kernel_size))
# 1 x 1 x 1024
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
normalizer_fn=None, scope='Conv2d_1c_1x1')
if spatial_squeeze:
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
end_points['Logits'] = logits
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
return logits, end_points
inception_v2.default_image_size = 224
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
"""Define kernel size which is automatically reduced for small input.
If the shape of the input images is unknown at graph construction time this
function assumes that the input images are is large enough.
Args:
input_tensor: input tensor of size [batch_size, height, width, channels].
kernel_size: desired kernel size of length 2: [kernel_height, kernel_width]
Returns:
a tensor with the kernel size.
TODO(jrru): Make this function work with unknown shapes. Theoretically, this
can be done with the code below. Problems are two-fold: (1) If the shape was
known, it will be lost. (2) inception.slim.ops._two_element_tuple cannot
handle tensors that define the kernel size.
shape = tf.shape(input_tensor)
return = tf.pack([tf.minimum(shape[1], kernel_size[0]),
tf.minimum(shape[2], kernel_size[1])])
"""
shape = input_tensor.get_shape().as_list()
if shape[1] is None or shape[2] is None:
kernel_size_out = kernel_size
else:
kernel_size_out = [min(shape[1], kernel_size[0]),
min(shape[2], kernel_size[1])]
return kernel_size_out
inception_v2_arg_scope = inception_utils.inception_arg_scope
================================================
FILE: nets/inception_v2_test.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for nets.inception_v2."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from nets import inception
slim = tf.contrib.slim
class InceptionV2Test(tf.test.TestCase):
def testBuildClassificationNetwork(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_v2(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue('Predictions' in end_points)
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
[batch_size, num_classes])
def testBuildBaseNetwork(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
mixed_5c, end_points = inception.inception_v2_base(inputs)
self.assertTrue(mixed_5c.op.name.startswith('InceptionV2/Mixed_5c'))
self.assertListEqual(mixed_5c.get_shape().as_list(),
[batch_size, 7, 7, 1024])
expected_endpoints = ['Mixed_3b', 'Mixed_3c', 'Mixed_4a', 'Mixed_4b',
'Mixed_4c', 'Mixed_4d', 'Mixed_4e', 'Mixed_5a',
'Mixed_5b', 'Mixed_5c', 'Conv2d_1a_7x7',
'MaxPool_2a_3x3', 'Conv2d_2b_1x1', 'Conv2d_2c_3x3',
'MaxPool_3a_3x3']
self.assertItemsEqual(end_points.keys(), expected_endpoints)
def testBuildOnlyUptoFinalEndpoint(self):
batch_size = 5
height, width = 224, 224
endpoints = ['Conv2d_1a_7x7', 'MaxPool_2a_3x3', 'Conv2d_2b_1x1',
'Conv2d_2c_3x3', 'MaxPool_3a_3x3', 'Mixed_3b', 'Mixed_3c',
'Mixed_4a', 'Mixed_4b', 'Mixed_4c', 'Mixed_4d', 'Mixed_4e',
'Mixed_5a', 'Mixed_5b', 'Mixed_5c']
for index, endpoint in enumerate(endpoints):
with tf.Graph().as_default():
inputs = tf.random_uniform((batch_size, height, width, 3))
out_tensor, end_points = inception.inception_v2_base(
inputs, final_endpoint=endpoint)
self.assertTrue(out_tensor.op.name.startswith(
'InceptionV2/' + endpoint))
self.assertItemsEqual(endpoints[:index+1], end_points)
def testBuildAndCheckAllEndPointsUptoMixed5c(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v2_base(inputs,
final_endpoint='Mixed_5c')
endpoints_shapes = {'Mixed_3b': [batch_size, 28, 28, 256],
'Mixed_3c': [batch_size, 28, 28, 320],
'Mixed_4a': [batch_size, 14, 14, 576],
'Mixed_4b': [batch_size, 14, 14, 576],
'Mixed_4c': [batch_size, 14, 14, 576],
'Mixed_4d': [batch_size, 14, 14, 576],
'Mixed_4e': [batch_size, 14, 14, 576],
'Mixed_5a': [batch_size, 7, 7, 1024],
'Mixed_5b': [batch_size, 7, 7, 1024],
'Mixed_5c': [batch_size, 7, 7, 1024],
'Conv2d_1a_7x7': [batch_size, 112, 112, 64],
'MaxPool_2a_3x3': [batch_size, 56, 56, 64],
'Conv2d_2b_1x1': [batch_size, 56, 56, 64],
'Conv2d_2c_3x3': [batch_size, 56, 56, 192],
'MaxPool_3a_3x3': [batch_size, 28, 28, 192]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name in endpoints_shapes:
expected_shape = endpoints_shapes[endpoint_name]
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testModelHasExpectedNumberOfParameters(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(inception.inception_v2_arg_scope()):
inception.inception_v2_base(inputs)
total_params, _ = slim.model_analyzer.analyze_vars(
slim.get_model_variables())
self.assertAlmostEqual(10173112, total_params)
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v2(inputs, num_classes)
endpoint_keys = [key for key in end_points.keys()
if key.startswith('Mixed') or key.startswith('Conv')]
_, end_points_with_multiplier = inception.inception_v2(
inputs, num_classes, scope='depth_multiplied_net',
depth_multiplier=0.5)
for key in endpoint_keys:
original_depth = end_points[key].get_shape().as_list()[3]
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
self.assertEqual(0.5 * original_depth, new_depth)
def testBuildEndPointsWithDepthMultiplierGreaterThanOne(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v2(inputs, num_classes)
endpoint_keys = [key for key in end_points.keys()
if key.startswith('Mixed') or key.startswith('Conv')]
_, end_points_with_multiplier = inception.inception_v2(
inputs, num_classes, scope='depth_multiplied_net',
depth_multiplier=2.0)
for key in endpoint_keys:
original_depth = end_points[key].get_shape().as_list()[3]
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
self.assertEqual(2.0 * original_depth, new_depth)
def testRaiseValueErrorWithInvalidDepthMultiplier(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
with self.assertRaises(ValueError):
_ = inception.inception_v2(inputs, num_classes, depth_multiplier=-0.1)
with self.assertRaises(ValueError):
_ = inception.inception_v2(inputs, num_classes, depth_multiplier=0.0)
def testHalfSizeImages(self):
batch_size = 5
height, width = 112, 112
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_v2(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Mixed_5c']
self.assertListEqual(pre_pool.get_shape().as_list(),
[batch_size, 4, 4, 1024])
def testUnknownImageShape(self):
tf.reset_default_graph()
batch_size = 2
height, width = 224, 224
num_classes = 1000
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
with self.test_session() as sess:
inputs = tf.placeholder(tf.float32, shape=(batch_size, None, None, 3))
logits, end_points = inception.inception_v2(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Mixed_5c']
feed_dict = {inputs: input_np}
tf.global_variables_initializer().run()
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 7, 7, 1024])
def testUnknowBatchSize(self):
batch_size = 1
height, width = 224, 224
num_classes = 1000
inputs = tf.placeholder(tf.float32, (None, height, width, 3))
logits, _ = inception.inception_v2(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV2/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[None, num_classes])
images = tf.random_uniform((batch_size, height, width, 3))
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(logits, {inputs: images.eval()})
self.assertEquals(output.shape, (batch_size, num_classes))
def testEvaluation(self):
batch_size = 2
height, width = 224, 224
num_classes = 1000
eval_inputs = tf.random_uniform((batch_size, height, width, 3))
logits, _ = inception.inception_v2(eval_inputs, num_classes,
is_training=False)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testTrainEvalWithReuse(self):
train_batch_size = 5
eval_batch_size = 2
height, width = 150, 150
num_classes = 1000
train_inputs = tf.random_uniform((train_batch_size, height, width, 3))
inception.inception_v2(train_inputs, num_classes)
eval_inputs = tf.random_uniform((eval_batch_size, height, width, 3))
logits, _ = inception.inception_v2(eval_inputs, num_classes, reuse=True)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (eval_batch_size,))
def testLogitsNotSqueezed(self):
num_classes = 25
images = tf.random_uniform([1, 224, 224, 3])
logits, _ = inception.inception_v2(images,
num_classes=num_classes,
spatial_squeeze=False)
with self.test_session() as sess:
tf.global_variables_initializer().run()
logits_out = sess.run(logits)
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
if __name__ == '__main__':
tf.test.main()
================================================
FILE: nets/inception_v3.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the definition for inception v3 classification network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception_utils
slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)
def inception_v3_base(inputs,
final_endpoint='Mixed_7c',
min_depth=16,
depth_multiplier=1.0,
scope=None):
"""Inception model from http://arxiv.org/abs/1512.00567.
Constructs an Inception v3 network from inputs to the given final endpoint.
This method can construct the network up to the final inception block
Mixed_7c.
Note that the names of the layers in the paper do not correspond to the names
of the endpoints registered by this function although they build the same
network.
Here is a mapping from the old_names to the new names:
Old name | New name
=======================================
conv0 | Conv2d_1a_3x3
conv1 | Conv2d_2a_3x3
conv2 | Conv2d_2b_3x3
pool1 | MaxPool_3a_3x3
conv3 | Conv2d_3b_1x1
conv4 | Conv2d_4a_3x3
pool2 | MaxPool_5a_3x3
mixed_35x35x256a | Mixed_5b
mixed_35x35x288a | Mixed_5c
mixed_35x35x288b | Mixed_5d
mixed_17x17x768a | Mixed_6a
mixed_17x17x768b | Mixed_6b
mixed_17x17x768c | Mixed_6c
mixed_17x17x768d | Mixed_6d
mixed_17x17x768e | Mixed_6e
mixed_8x8x1280a | Mixed_7a
mixed_8x8x2048a | Mixed_7b
mixed_8x8x2048b | Mixed_7c
Args:
inputs: a tensor of size [batch_size, height, width, channels].
final_endpoint: specifies the endpoint to construct the network up to. It
can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3',
'Mixed_5b', 'Mixed_5c', 'Mixed_5d', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c',
'Mixed_6d', 'Mixed_6e', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c'].
min_depth: Minimum depth value (number of channels) for all convolution ops.
Enforced when depth_multiplier < 1, and not an active constraint when
depth_multiplier >= 1.
depth_multiplier: Float multiplier for the depth (number of channels)
for all convolution ops. The value must be greater than zero. Typical
usage will be to set this value in (0, 1) to reduce the number of
parameters or computation cost of the model.
scope: Optional variable_scope.
Returns:
tensor_out: output tensor corresponding to the final_endpoint.
end_points: a set of activations for external use, for example summaries or
losses.
Raises:
ValueError: if final_endpoint is not set to one of the predefined values,
or depth_multiplier <= 0
"""
# end_points will collect relevant activations for external use, for example
# summaries or losses.
end_points = {}
if depth_multiplier <= 0:
raise ValueError('depth_multiplier is not greater than zero.')
depth = lambda d: max(int(d * depth_multiplier), min_depth)
with tf.variable_scope(scope, 'InceptionV3', [inputs]):
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
stride=1, padding='VALID'):
# 299 x 299 x 3
end_point = 'Conv2d_1a_3x3'
net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 149 x 149 x 32
end_point = 'Conv2d_2a_3x3'
net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 147 x 147 x 32
end_point = 'Conv2d_2b_3x3'
net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 147 x 147 x 64
end_point = 'MaxPool_3a_3x3'
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 73 x 73 x 64
end_point = 'Conv2d_3b_1x1'
net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 73 x 73 x 80.
end_point = 'Conv2d_4a_3x3'
net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 71 x 71 x 192.
end_point = 'MaxPool_5a_3x3'
net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# 35 x 35 x 192.
# Inception blocks
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
stride=1, padding='SAME'):
# mixed: 35 x 35 x 256.
end_point = 'Mixed_5b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5],
scope='Conv2d_0b_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(32), [1, 1],
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_1: 35 x 35 x 288.
end_point = 'Mixed_5c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5],
scope='Conv_1_0c_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(64), [1, 1],
scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(64), [1, 1],
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_2: 35 x 35 x 288.
end_point = 'Mixed_5d'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(64), [5, 5],
scope='Conv2d_0b_5x5')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, depth(96), [3, 3],
scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(64), [1, 1],
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_3: 17 x 17 x 768.
end_point = 'Mixed_6a'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3],
scope='Conv2d_0b_3x3')
branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_1x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed4: 17 x 17 x 768.
end_point = 'Mixed_6b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(128), [1, 7],
scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, depth(128), [1, 7],
scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, depth(128), [7, 1],
scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_5: 17 x 17 x 768.
end_point = 'Mixed_6c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(160), [1, 7],
scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, depth(160), [1, 7],
scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_6: 17 x 17 x 768.
end_point = 'Mixed_6d'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(160), [1, 7],
scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, depth(160), [1, 7],
scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, depth(160), [7, 1],
scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_7: 17 x 17 x 768.
end_point = 'Mixed_6e'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(192), [1, 7],
scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, depth(192), [7, 1],
scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, depth(192), [7, 1],
scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, depth(192), [1, 7],
scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, depth(192), [1, 1],
scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_8: 8 x 8 x 1280.
end_point = 'Mixed_7a'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
branch_0 = slim.conv2d(branch_0, depth(320), [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, depth(192), [1, 7],
scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, depth(192), [7, 1],
scope='Conv2d_0c_7x1')
branch_1 = slim.conv2d(branch_1, depth(192), [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_9: 8 x 8 x 2048.
end_point = 'Mixed_7b'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = tf.concat(axis=3, values=[
slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0b_3x1')])
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(
branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
branch_2 = tf.concat(axis=3, values=[
slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')])
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
# mixed_10: 8 x 8 x 2048.
end_point = 'Mixed_7c'
with tf.variable_scope(end_point):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
branch_1 = tf.concat(axis=3, values=[
slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0c_3x1')])
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(
branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
branch_2 = tf.concat(axis=3, values=[
slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')])
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(
branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
net = tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
end_points[end_point] = net
if end_point == final_endpoint: return net, end_points
raise ValueError('Unknown final endpoint %s' % final_endpoint)
def inception_v3(inputs,
num_classes=1000,
is_training=True,
dropout_keep_prob=0.8,
min_depth=16,
depth_multiplier=1.0,
prediction_fn=slim.softmax,
spatial_squeeze=True,
reuse=None,
scope='InceptionV3'):
"""Inception model from http://arxiv.org/abs/1512.00567.
"Rethinking the Inception Architecture for Computer Vision"
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens,
Zbigniew Wojna.
With the default arguments this method constructs the exact model defined in
the paper. However, one can experiment with variations of the inception_v3
network by changing arguments dropout_keep_prob, min_depth and
depth_multiplier.
The default image size used to train this network is 299x299.
Args:
inputs: a tensor of size [batch_size, height, width, channels].
num_classes: number of predicted classes.
is_training: whether is training or not.
dropout_keep_prob: the percentage of activation values that are retained.
min_depth: Minimum depth value (number of channels) for all convolution ops.
Enforced when depth_multiplier < 1, and not an active constraint when
depth_multiplier >= 1.
depth_multiplier: Float multiplier for the depth (number of channels)
for all convolution ops. The value must be greater than zero. Typical
usage will be to set this value in (0, 1) to reduce the number of
parameters or computation cost of the model.
prediction_fn: a function to get predictions out of logits.
spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
reuse: whether or not the network and its variables should be reused. To be
able to reuse 'scope' must be given.
scope: Optional variable_scope.
Returns:
logits: the pre-softmax activations, a tensor of size
[batch_size, num_classes]
end_points: a dictionary from components of the network to the corresponding
activation.
Raises:
ValueError: if 'depth_multiplier' is less than or equal to zero.
"""
if depth_multiplier <= 0:
raise ValueError('depth_multiplier is not greater than zero.')
depth = lambda d: max(int(d * depth_multiplier), min_depth)
with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes],
reuse=reuse) as scope:
with slim.arg_scope([slim.batch_norm, slim.dropout],
is_training=is_training):
net, end_points = inception_v3_base(
inputs, scope=scope, min_depth=min_depth,
depth_multiplier=depth_multiplier)
# Auxiliary Head logits
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
stride=1, padding='SAME'):
aux_logits = end_points['Mixed_6e']
with tf.variable_scope('AuxLogits'):
# We want to pool the feature map to be 5x5xC
# With padding = 0, and stride 3, this means our kernel is H - 12
kernel_size = [aux_logits.get_shape().as_list()[1] - 12] * 2
aux_logits = slim.avg_pool2d(
aux_logits, kernel_size, stride=3, padding='VALID',
scope='AvgPool_1a_5x5')
aux_logits = slim.conv2d(aux_logits, depth(128), [1, 1],
scope='Conv2d_1b_1x1')
# Shape of feature map before the final layer.
kernel_size = _reduced_kernel_size_for_small_input(aux_logits, [5, 5])
aux_logits = slim.conv2d(
aux_logits, depth(768), kernel_size,
weights_initializer=trunc_normal(0.01),
padding='VALID', scope='Conv2d_2a_{}x{}'.format(*kernel_size))
aux_logits = slim.conv2d(
aux_logits, num_classes, [1, 1], activation_fn=None,
normalizer_fn=None, weights_initializer=trunc_normal(0.001),
scope='Conv2d_2b_1x1')
if spatial_squeeze:
aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
end_points['AuxLogits'] = aux_logits
# Final pooling and prediction
with tf.variable_scope('Logits'):
#kernel_size = _reduced_kernel_size_for_small_input(net, [8, 8])
kernel_size = _kernel_to_1x1_for_specific_input(net)
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
scope='AvgPool_1a_{}x{}'.format(*kernel_size))
# 1 x 1 x 2048
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
end_points['PreLogits'] = net
# 2048
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
normalizer_fn=None, scope='Conv2d_1c_1x1')
if spatial_squeeze:
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
# 1000
end_points['Logits'] = logits
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
return logits, end_points
inception_v3.default_image_size = 299
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
"""Define kernel size which is automatically reduced for small input.
If the shape of the input images is unknown at graph construction time this
function assumes that the input images are is large enough.
Args:
input_tensor: input tensor of size [batch_size, height, width, channels].
kernel_size: desired kernel size of length 2: [kernel_height, kernel_width]
Returns:
a tensor with the kernel size.
TODO(jrru): Make this function work with unknown shapes. Theoretically, this
can be done with the code below. Problems are two-fold: (1) If the shape was
known, it will be lost. (2) inception.slim.ops._two_element_tuple cannot
handle tensors that define the kernel size.
shape = tf.shape(input_tensor)
return = tf.pack([tf.minimum(shape[1], kernel_size[0]),
tf.minimum(shape[2], kernel_size[1])])
"""
shape = input_tensor.get_shape().as_list()
if shape[1] is None or shape[2] is None:
kernel_size_out = kernel_size
else:
kernel_size_out = [min(shape[1], kernel_size[0]),
min(shape[2], kernel_size[1])]
return kernel_size_out
def _kernel_to_1x1_for_specific_input(input_tensor):
"""Return a kernel that will transform the input_tensor into a vector.
We want any input tensor of shape [B, H, W, C] to be transormed into [B, 1, 1, C].
We assume a known input shape.
"""
shape = input_tensor.get_shape().as_list()
return [shape[1], shape[2]]
inception_v3_arg_scope = inception_utils.inception_arg_scope
================================================
FILE: nets/inception_v3_test.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for nets.inception_v1."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from nets import inception
slim = tf.contrib.slim
class InceptionV3Test(tf.test.TestCase):
def testBuildClassificationNetwork(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_v3(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV3/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue('Predictions' in end_points)
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
[batch_size, num_classes])
def testBuildBaseNetwork(self):
batch_size = 5
height, width = 299, 299
inputs = tf.random_uniform((batch_size, height, width, 3))
final_endpoint, end_points = inception.inception_v3_base(inputs)
self.assertTrue(final_endpoint.op.name.startswith(
'InceptionV3/Mixed_7c'))
self.assertListEqual(final_endpoint.get_shape().as_list(),
[batch_size, 8, 8, 2048])
expected_endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3',
'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
'Mixed_6e', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c']
self.assertItemsEqual(end_points.keys(), expected_endpoints)
def testBuildOnlyUptoFinalEndpoint(self):
batch_size = 5
height, width = 299, 299
endpoints = ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3',
'MaxPool_5a_3x3', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
'Mixed_6e', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c']
for index, endpoint in enumerate(endpoints):
with tf.Graph().as_default():
inputs = tf.random_uniform((batch_size, height, width, 3))
out_tensor, end_points = inception.inception_v3_base(
inputs, final_endpoint=endpoint)
self.assertTrue(out_tensor.op.name.startswith(
'InceptionV3/' + endpoint))
self.assertItemsEqual(endpoints[:index+1], end_points)
def testBuildAndCheckAllEndPointsUptoMixed7c(self):
batch_size = 5
height, width = 299, 299
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v3_base(
inputs, final_endpoint='Mixed_7c')
endpoints_shapes = {'Conv2d_1a_3x3': [batch_size, 149, 149, 32],
'Conv2d_2a_3x3': [batch_size, 147, 147, 32],
'Conv2d_2b_3x3': [batch_size, 147, 147, 64],
'MaxPool_3a_3x3': [batch_size, 73, 73, 64],
'Conv2d_3b_1x1': [batch_size, 73, 73, 80],
'Conv2d_4a_3x3': [batch_size, 71, 71, 192],
'MaxPool_5a_3x3': [batch_size, 35, 35, 192],
'Mixed_5b': [batch_size, 35, 35, 256],
'Mixed_5c': [batch_size, 35, 35, 288],
'Mixed_5d': [batch_size, 35, 35, 288],
'Mixed_6a': [batch_size, 17, 17, 768],
'Mixed_6b': [batch_size, 17, 17, 768],
'Mixed_6c': [batch_size, 17, 17, 768],
'Mixed_6d': [batch_size, 17, 17, 768],
'Mixed_6e': [batch_size, 17, 17, 768],
'Mixed_7a': [batch_size, 8, 8, 1280],
'Mixed_7b': [batch_size, 8, 8, 2048],
'Mixed_7c': [batch_size, 8, 8, 2048]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name in endpoints_shapes:
expected_shape = endpoints_shapes[endpoint_name]
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testModelHasExpectedNumberOfParameters(self):
batch_size = 5
height, width = 299, 299
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(inception.inception_v3_arg_scope()):
inception.inception_v3_base(inputs)
total_params, _ = slim.model_analyzer.analyze_vars(
slim.get_model_variables())
self.assertAlmostEqual(21802784, total_params)
def testBuildEndPoints(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v3(inputs, num_classes)
self.assertTrue('Logits' in end_points)
logits = end_points['Logits']
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue('AuxLogits' in end_points)
aux_logits = end_points['AuxLogits']
self.assertListEqual(aux_logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue('Mixed_7c' in end_points)
pre_pool = end_points['Mixed_7c']
self.assertListEqual(pre_pool.get_shape().as_list(),
[batch_size, 8, 8, 2048])
self.assertTrue('PreLogits' in end_points)
pre_logits = end_points['PreLogits']
self.assertListEqual(pre_logits.get_shape().as_list(),
[batch_size, 1, 1, 2048])
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v3(inputs, num_classes)
endpoint_keys = [key for key in end_points.keys()
if key.startswith('Mixed') or key.startswith('Conv')]
_, end_points_with_multiplier = inception.inception_v3(
inputs, num_classes, scope='depth_multiplied_net',
depth_multiplier=0.5)
for key in endpoint_keys:
original_depth = end_points[key].get_shape().as_list()[3]
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
self.assertEqual(0.5 * original_depth, new_depth)
def testBuildEndPointsWithDepthMultiplierGreaterThanOne(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v3(inputs, num_classes)
endpoint_keys = [key for key in end_points.keys()
if key.startswith('Mixed') or key.startswith('Conv')]
_, end_points_with_multiplier = inception.inception_v3(
inputs, num_classes, scope='depth_multiplied_net',
depth_multiplier=2.0)
for key in endpoint_keys:
original_depth = end_points[key].get_shape().as_list()[3]
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
self.assertEqual(2.0 * original_depth, new_depth)
def testRaiseValueErrorWithInvalidDepthMultiplier(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
with self.assertRaises(ValueError):
_ = inception.inception_v3(inputs, num_classes, depth_multiplier=-0.1)
with self.assertRaises(ValueError):
_ = inception.inception_v3(inputs, num_classes, depth_multiplier=0.0)
def testHalfSizeImages(self):
batch_size = 5
height, width = 150, 150
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_v3(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV3/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Mixed_7c']
self.assertListEqual(pre_pool.get_shape().as_list(),
[batch_size, 3, 3, 2048])
def testUnknownImageShape(self):
tf.reset_default_graph()
batch_size = 2
height, width = 299, 299
num_classes = 1000
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
with self.test_session() as sess:
inputs = tf.placeholder(tf.float32, shape=(batch_size, None, None, 3))
logits, end_points = inception.inception_v3(inputs, num_classes)
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Mixed_7c']
feed_dict = {inputs: input_np}
tf.global_variables_initializer().run()
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 8, 8, 2048])
def testUnknowBatchSize(self):
batch_size = 1
height, width = 299, 299
num_classes = 1000
inputs = tf.placeholder(tf.float32, (None, height, width, 3))
logits, _ = inception.inception_v3(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV3/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[None, num_classes])
images = tf.random_uniform((batch_size, height, width, 3))
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(logits, {inputs: images.eval()})
self.assertEquals(output.shape, (batch_size, num_classes))
def testEvaluation(self):
batch_size = 2
height, width = 299, 299
num_classes = 1000
eval_inputs = tf.random_uniform((batch_size, height, width, 3))
logits, _ = inception.inception_v3(eval_inputs, num_classes,
is_training=False)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testTrainEvalWithReuse(self):
train_batch_size = 5
eval_batch_size = 2
height, width = 150, 150
num_classes = 1000
train_inputs = tf.random_uniform((train_batch_size, height, width, 3))
inception.inception_v3(train_inputs, num_classes)
eval_inputs = tf.random_uniform((eval_batch_size, height, width, 3))
logits, _ = inception.inception_v3(eval_inputs, num_classes,
is_training=False, reuse=True)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (eval_batch_size,))
def testLogitsNotSqueezed(self):
num_classes = 25
images = tf.random_uniform([1, 299, 299, 3])
logits, _ = inception.inception_v3(images,
num_classes=num_classes,
spatial_squeeze=False)
with self.test_session() as sess:
tf.global_variables_initializer().run()
logits_out = sess.run(logits)
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
if __name__ == '__main__':
tf.test.main()
================================================
FILE: nets/inception_v4.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the definition of the Inception V4 architecture.
As described in http://arxiv.org/abs/1602.07261.
Inception-v4, Inception-ResNet and the Impact of Residual Connections
on Learning
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception_utils
slim = tf.contrib.slim
def block_inception_a(inputs, scope=None, reuse=None):
"""Builds Inception-A block for Inception v4 network."""
# By default use stride=1 and SAME padding
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
stride=1, padding='SAME'):
with tf.variable_scope(scope, 'BlockInceptionA', [inputs], reuse=reuse):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(inputs, 96, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 96, [3, 3], scope='Conv2d_0b_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(inputs, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0b_3x3')
branch_2 = slim.conv2d(branch_2, 96, [3, 3], scope='Conv2d_0c_3x3')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 96, [1, 1], scope='Conv2d_0b_1x1')
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
def block_reduction_a(inputs, scope=None, reuse=None):
"""Builds Reduction-A block for Inception v4 network."""
# By default use stride=1 and SAME padding
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
stride=1, padding='SAME'):
with tf.variable_scope(scope, 'BlockReductionA', [inputs], reuse=reuse):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(inputs, 384, [3, 3], stride=2, padding='VALID',
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 224, [3, 3], scope='Conv2d_0b_3x3')
branch_1 = slim.conv2d(branch_1, 256, [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',
scope='MaxPool_1a_3x3')
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
def block_inception_b(inputs, scope=None, reuse=None):
"""Builds Inception-B block for Inception v4 network."""
# By default use stride=1 and SAME padding
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
stride=1, padding='SAME'):
with tf.variable_scope(scope, 'BlockInceptionB', [inputs], reuse=reuse):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 224, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 256, [7, 1], scope='Conv2d_0c_7x1')
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 192, [7, 1], scope='Conv2d_0b_7x1')
branch_2 = slim.conv2d(branch_2, 224, [1, 7], scope='Conv2d_0c_1x7')
branch_2 = slim.conv2d(branch_2, 224, [7, 1], scope='Conv2d_0d_7x1')
branch_2 = slim.conv2d(branch_2, 256, [1, 7], scope='Conv2d_0e_1x7')
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 128, [1, 1], scope='Conv2d_0b_1x1')
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
def block_reduction_b(inputs, scope=None, reuse=None):
"""Builds Reduction-B block for Inception v4 network."""
# By default use stride=1 and SAME padding
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
stride=1, padding='SAME'):
with tf.variable_scope(scope, 'BlockReductionB', [inputs], reuse=reuse):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(inputs, 192, [1, 1], scope='Conv2d_0a_1x1')
branch_0 = slim.conv2d(branch_0, 192, [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 256, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 320, [7, 1], scope='Conv2d_0c_7x1')
branch_1 = slim.conv2d(branch_1, 320, [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_2'):
branch_2 = slim.max_pool2d(inputs, [3, 3], stride=2, padding='VALID',
scope='MaxPool_1a_3x3')
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2])
def block_inception_c(inputs, scope=None, reuse=None):
"""Builds Inception-C block for Inception v4 network."""
# By default use stride=1 and SAME padding
with slim.arg_scope([slim.conv2d, slim.avg_pool2d, slim.max_pool2d],
stride=1, padding='SAME'):
with tf.variable_scope(scope, 'BlockInceptionC', [inputs], reuse=reuse):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(inputs, 256, [1, 1], scope='Conv2d_0a_1x1')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = tf.concat(axis=3, values=[
slim.conv2d(branch_1, 256, [1, 3], scope='Conv2d_0b_1x3'),
slim.conv2d(branch_1, 256, [3, 1], scope='Conv2d_0c_3x1')])
with tf.variable_scope('Branch_2'):
branch_2 = slim.conv2d(inputs, 384, [1, 1], scope='Conv2d_0a_1x1')
branch_2 = slim.conv2d(branch_2, 448, [3, 1], scope='Conv2d_0b_3x1')
branch_2 = slim.conv2d(branch_2, 512, [1, 3], scope='Conv2d_0c_1x3')
branch_2 = tf.concat(axis=3, values=[
slim.conv2d(branch_2, 256, [1, 3], scope='Conv2d_0d_1x3'),
slim.conv2d(branch_2, 256, [3, 1], scope='Conv2d_0e_3x1')])
with tf.variable_scope('Branch_3'):
branch_3 = slim.avg_pool2d(inputs, [3, 3], scope='AvgPool_0a_3x3')
branch_3 = slim.conv2d(branch_3, 256, [1, 1], scope='Conv2d_0b_1x1')
return tf.concat(axis=3, values=[branch_0, branch_1, branch_2, branch_3])
def inception_v4_base(inputs, final_endpoint='Mixed_7d', scope=None):
"""Creates the Inception V4 network up to the given final endpoint.
Args:
inputs: a 4-D tensor of size [batch_size, height, width, 3].
final_endpoint: specifies the endpoint to construct the network up to.
It can be one of [ 'Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
'Mixed_3a', 'Mixed_4a', 'Mixed_5a', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
'Mixed_5e', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d', 'Mixed_6e',
'Mixed_6f', 'Mixed_6g', 'Mixed_6h', 'Mixed_7a', 'Mixed_7b', 'Mixed_7c',
'Mixed_7d']
scope: Optional variable_scope.
Returns:
logits: the logits outputs of the model.
end_points: the set of end_points from the inception model.
Raises:
ValueError: if final_endpoint is not set to one of the predefined values,
"""
end_points = {}
def add_and_check_final(name, net):
end_points[name] = net
return name == final_endpoint
with tf.variable_scope(scope, 'InceptionV4', [inputs]):
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
stride=1, padding='SAME'):
# 299 x 299 x 3
net = slim.conv2d(inputs, 32, [3, 3], stride=2,
padding='VALID', scope='Conv2d_1a_3x3')
if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points
# 149 x 149 x 32
net = slim.conv2d(net, 32, [3, 3], padding='VALID',
scope='Conv2d_2a_3x3')
if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points
# 147 x 147 x 32
net = slim.conv2d(net, 64, [3, 3], scope='Conv2d_2b_3x3')
if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points
# 147 x 147 x 64
with tf.variable_scope('Mixed_3a'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
scope='MaxPool_0a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 96, [3, 3], stride=2, padding='VALID',
scope='Conv2d_0a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1])
if add_and_check_final('Mixed_3a', net): return net, end_points
# 73 x 73 x 160
with tf.variable_scope('Mixed_4a'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_0 = slim.conv2d(branch_0, 96, [3, 3], padding='VALID',
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.conv2d(net, 64, [1, 1], scope='Conv2d_0a_1x1')
branch_1 = slim.conv2d(branch_1, 64, [1, 7], scope='Conv2d_0b_1x7')
branch_1 = slim.conv2d(branch_1, 64, [7, 1], scope='Conv2d_0c_7x1')
branch_1 = slim.conv2d(branch_1, 96, [3, 3], padding='VALID',
scope='Conv2d_1a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1])
if add_and_check_final('Mixed_4a', net): return net, end_points
# 71 x 71 x 192
with tf.variable_scope('Mixed_5a'):
with tf.variable_scope('Branch_0'):
branch_0 = slim.conv2d(net, 192, [3, 3], stride=2, padding='VALID',
scope='Conv2d_1a_3x3')
with tf.variable_scope('Branch_1'):
branch_1 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID',
scope='MaxPool_1a_3x3')
net = tf.concat(axis=3, values=[branch_0, branch_1])
if add_and_check_final('Mixed_5a', net): return net, end_points
# 35 x 35 x 384
# 4 x Inception-A blocks
for idx in xrange(4):
block_scope = 'Mixed_5' + chr(ord('b') + idx)
net = block_inception_a(net, block_scope)
if add_and_check_final(block_scope, net): return net, end_points
# 35 x 35 x 384
# Reduction-A block
net = block_reduction_a(net, 'Mixed_6a')
if add_and_check_final('Mixed_6a', net): return net, end_points
# 17 x 17 x 1024
# 7 x Inception-B blocks
for idx in xrange(7):
block_scope = 'Mixed_6' + chr(ord('b') + idx)
net = block_inception_b(net, block_scope)
if add_and_check_final(block_scope, net): return net, end_points
# 17 x 17 x 1024
# Reduction-B block
net = block_reduction_b(net, 'Mixed_7a')
if add_and_check_final('Mixed_7a', net): return net, end_points
# 8 x 8 x 1536
# 3 x Inception-C blocks
for idx in xrange(3):
block_scope = 'Mixed_7' + chr(ord('b') + idx)
net = block_inception_c(net, block_scope)
if add_and_check_final(block_scope, net): return net, end_points
raise ValueError('Unknown final endpoint %s' % final_endpoint)
def inception_v4(inputs, num_classes=1001, is_training=True,
dropout_keep_prob=0.8,
reuse=None,
scope='InceptionV4',
create_aux_logits=True):
"""Creates the Inception V4 model.
Args:
inputs: a 4-D tensor of size [batch_size, height, width, 3].
num_classes: number of predicted classes.
is_training: whether is training or not.
dropout_keep_prob: float, the fraction to keep before final layer.
reuse: whether or not the network and its variables should be reused. To be
able to reuse 'scope' must be given.
scope: Optional variable_scope.
create_aux_logits: Whether to include the auxilliary logits.
Returns:
logits: the logits outputs of the model.
end_points: the set of end_points from the inception model.
"""
end_points = {}
with tf.variable_scope(scope, 'InceptionV4', [inputs], reuse=reuse) as scope:
with slim.arg_scope([slim.batch_norm, slim.dropout],
is_training=is_training):
net, end_points = inception_v4_base(inputs, scope=scope)
with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
stride=1, padding='SAME'):
# Auxiliary Head logits
if create_aux_logits:
with tf.variable_scope('AuxLogits'):
# 17 x 17 x 1024
aux_logits = end_points['Mixed_6h']
# Originally, kernel_size = 5
# However, if we change the input size then we need to change the kernel size
# We want to pool the feature map to be 5x5xC
# With padding = 0, and stride 3, this means our kernel is H - 12
kernel_size = [aux_logits.get_shape().as_list()[1] - 12] * 2
aux_logits = slim.avg_pool2d(aux_logits, kernel_size, stride=3,
padding='VALID',
scope='AvgPool_1a_5x5')
aux_logits = slim.conv2d(aux_logits, 128, [1, 1],
scope='Conv2d_1b_1x1')
aux_logits = slim.conv2d(aux_logits, 768,
aux_logits.get_shape()[1:3],
padding='VALID', scope='Conv2d_2a')
aux_logits = slim.flatten(aux_logits)
aux_logits = slim.fully_connected(aux_logits, num_classes,
activation_fn=None,
scope='Aux_logits')
end_points['AuxLogits'] = aux_logits
# Final pooling and prediction
with tf.variable_scope('Logits'):
# 8 x 8 x 1536
net = slim.avg_pool2d(net, net.get_shape()[1:3], padding='VALID',
scope='AvgPool_1a')
# 1 x 1 x 1536
net = slim.dropout(net, dropout_keep_prob, scope='Dropout_1b')
net = slim.flatten(net, scope='PreLogitsFlatten')
end_points['PreLogitsFlatten'] = net
# 1536
logits = slim.fully_connected(net, num_classes, activation_fn=None,
scope='Logits')
end_points['Logits'] = logits
end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')
return logits, end_points
inception_v4.default_image_size = 299
inception_v4_arg_scope = inception_utils.inception_arg_scope
================================================
FILE: nets/inception_v4_test.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for slim.inception_v4."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import inception
class InceptionTest(tf.test.TestCase):
def testBuildLogits(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_v4(inputs, num_classes)
auxlogits = end_points['AuxLogits']
predictions = end_points['Predictions']
self.assertTrue(auxlogits.op.name.startswith('InceptionV4/AuxLogits'))
self.assertListEqual(auxlogits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue(predictions.op.name.startswith(
'InceptionV4/Logits/Predictions'))
self.assertListEqual(predictions.get_shape().as_list(),
[batch_size, num_classes])
def testBuildWithoutAuxLogits(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, endpoints = inception.inception_v4(inputs, num_classes,
create_aux_logits=False)
self.assertFalse('AuxLogits' in endpoints)
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
def testAllEndPointsShapes(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = inception.inception_v4(inputs, num_classes)
endpoints_shapes = {'Conv2d_1a_3x3': [batch_size, 149, 149, 32],
'Conv2d_2a_3x3': [batch_size, 147, 147, 32],
'Conv2d_2b_3x3': [batch_size, 147, 147, 64],
'Mixed_3a': [batch_size, 73, 73, 160],
'Mixed_4a': [batch_size, 71, 71, 192],
'Mixed_5a': [batch_size, 35, 35, 384],
# 4 x Inception-A blocks
'Mixed_5b': [batch_size, 35, 35, 384],
'Mixed_5c': [batch_size, 35, 35, 384],
'Mixed_5d': [batch_size, 35, 35, 384],
'Mixed_5e': [batch_size, 35, 35, 384],
# Reduction-A block
'Mixed_6a': [batch_size, 17, 17, 1024],
# 7 x Inception-B blocks
'Mixed_6b': [batch_size, 17, 17, 1024],
'Mixed_6c': [batch_size, 17, 17, 1024],
'Mixed_6d': [batch_size, 17, 17, 1024],
'Mixed_6e': [batch_size, 17, 17, 1024],
'Mixed_6f': [batch_size, 17, 17, 1024],
'Mixed_6g': [batch_size, 17, 17, 1024],
'Mixed_6h': [batch_size, 17, 17, 1024],
# Reduction-A block
'Mixed_7a': [batch_size, 8, 8, 1536],
# 3 x Inception-C blocks
'Mixed_7b': [batch_size, 8, 8, 1536],
'Mixed_7c': [batch_size, 8, 8, 1536],
'Mixed_7d': [batch_size, 8, 8, 1536],
# Logits and predictions
'AuxLogits': [batch_size, num_classes],
'PreLogitsFlatten': [batch_size, 1536],
'Logits': [batch_size, num_classes],
'Predictions': [batch_size, num_classes]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name in endpoints_shapes:
expected_shape = endpoints_shapes[endpoint_name]
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testBuildBaseNetwork(self):
batch_size = 5
height, width = 299, 299
inputs = tf.random_uniform((batch_size, height, width, 3))
net, end_points = inception.inception_v4_base(inputs)
self.assertTrue(net.op.name.startswith(
'InceptionV4/Mixed_7d'))
self.assertListEqual(net.get_shape().as_list(), [batch_size, 8, 8, 1536])
expected_endpoints = [
'Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'Mixed_3a',
'Mixed_4a', 'Mixed_5a', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
'Mixed_5e', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
'Mixed_6e', 'Mixed_6f', 'Mixed_6g', 'Mixed_6h', 'Mixed_7a',
'Mixed_7b', 'Mixed_7c', 'Mixed_7d']
self.assertItemsEqual(end_points.keys(), expected_endpoints)
for name, op in end_points.iteritems():
self.assertTrue(op.name.startswith('InceptionV4/' + name))
def testBuildOnlyUpToFinalEndpoint(self):
batch_size = 5
height, width = 299, 299
all_endpoints = [
'Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3', 'Mixed_3a',
'Mixed_4a', 'Mixed_5a', 'Mixed_5b', 'Mixed_5c', 'Mixed_5d',
'Mixed_5e', 'Mixed_6a', 'Mixed_6b', 'Mixed_6c', 'Mixed_6d',
'Mixed_6e', 'Mixed_6f', 'Mixed_6g', 'Mixed_6h', 'Mixed_7a',
'Mixed_7b', 'Mixed_7c', 'Mixed_7d']
for index, endpoint in enumerate(all_endpoints):
with tf.Graph().as_default():
inputs = tf.random_uniform((batch_size, height, width, 3))
out_tensor, end_points = inception.inception_v4_base(
inputs, final_endpoint=endpoint)
self.assertTrue(out_tensor.op.name.startswith(
'InceptionV4/' + endpoint))
self.assertItemsEqual(all_endpoints[:index+1], end_points)
def testVariablesSetDevice(self):
batch_size = 5
height, width = 299, 299
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
# Force all Variables to reside on the device.
with tf.variable_scope('on_cpu'), tf.device('/cpu:0'):
inception.inception_v4(inputs, num_classes)
with tf.variable_scope('on_gpu'), tf.device('/gpu:0'):
inception.inception_v4(inputs, num_classes)
for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_cpu'):
self.assertDeviceEqual(v.device, '/cpu:0')
for v in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='on_gpu'):
self.assertDeviceEqual(v.device, '/gpu:0')
def testHalfSizeImages(self):
batch_size = 5
height, width = 150, 150
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = inception.inception_v4(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Mixed_7d']
self.assertListEqual(pre_pool.get_shape().as_list(),
[batch_size, 3, 3, 1536])
def testUnknownBatchSize(self):
batch_size = 1
height, width = 299, 299
num_classes = 1000
with self.test_session() as sess:
inputs = tf.placeholder(tf.float32, (None, height, width, 3))
logits, _ = inception.inception_v4(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('InceptionV4/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[None, num_classes])
images = tf.random_uniform((batch_size, height, width, 3))
sess.run(tf.global_variables_initializer())
output = sess.run(logits, {inputs: images.eval()})
self.assertEquals(output.shape, (batch_size, num_classes))
def testEvaluation(self):
batch_size = 2
height, width = 299, 299
num_classes = 1000
with self.test_session() as sess:
eval_inputs = tf.random_uniform((batch_size, height, width, 3))
logits, _ = inception.inception_v4(eval_inputs,
num_classes,
is_training=False)
predictions = tf.argmax(logits, 1)
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testTrainEvalWithReuse(self):
train_batch_size = 5
eval_batch_size = 2
height, width = 150, 150
num_classes = 1000
with self.test_session() as sess:
train_inputs = tf.random_uniform((train_batch_size, height, width, 3))
inception.inception_v4(train_inputs, num_classes)
eval_inputs = tf.random_uniform((eval_batch_size, height, width, 3))
logits, _ = inception.inception_v4(eval_inputs,
num_classes,
is_training=False,
reuse=True)
predictions = tf.argmax(logits, 1)
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (eval_batch_size,))
if __name__ == '__main__':
tf.test.main()
================================================
FILE: nets/mobilenet_v1.py
================================================
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
"""MobileNet v1.
MobileNet is a general architecture and can be used for multiple use cases.
Depending on the use case, it can use different input layer size and different
head (for example: embeddings, localization and classification).
As described in https://arxiv.org/abs/1704.04861.
MobileNets: Efficient Convolutional Neural Networks for
Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang,
Tobias Weyand, Marco Andreetto, Hartwig Adam
100% Mobilenet V1 (base) with input size 224x224:
See mobilenet_v1()
Layer params macs
--------------------------------------------------------------------------------
MobilenetV1/Conv2d_0/Conv2D: 864 10,838,016
MobilenetV1/Conv2d_1_depthwise/depthwise: 288 3,612,672
MobilenetV1/Conv2d_1_pointwise/Conv2D: 2,048 25,690,112
MobilenetV1/Conv2d_2_depthwise/depthwise: 576 1,806,336
MobilenetV1/Conv2d_2_pointwise/Conv2D: 8,192 25,690,112
MobilenetV1/Conv2d_3_depthwise/depthwise: 1,152 3,612,672
MobilenetV1/Conv2d_3_pointwise/Conv2D: 16,384 51,380,224
MobilenetV1/Conv2d_4_depthwise/depthwise: 1,152 903,168
MobilenetV1/Conv2d_4_pointwise/Conv2D: 32,768 25,690,112
MobilenetV1/Conv2d_5_depthwise/depthwise: 2,304 1,806,336
MobilenetV1/Conv2d_5_pointwise/Conv2D: 65,536 51,380,224
MobilenetV1/Conv2d_6_depthwise/depthwise: 2,304 451,584
MobilenetV1/Conv2d_6_pointwise/Conv2D: 131,072 25,690,112
MobilenetV1/Conv2d_7_depthwise/depthwise: 4,608 903,168
MobilenetV1/Conv2d_7_pointwise/Conv2D: 262,144 51,380,224
MobilenetV1/Conv2d_8_depthwise/depthwise: 4,608 903,168
MobilenetV1/Conv2d_8_pointwise/Conv2D: 262,144 51,380,224
MobilenetV1/Conv2d_9_depthwise/depthwise: 4,608 903,168
MobilenetV1/Conv2d_9_pointwise/Conv2D: 262,144 51,380,224
MobilenetV1/Conv2d_10_depthwise/depthwise: 4,608 903,168
MobilenetV1/Conv2d_10_pointwise/Conv2D: 262,144 51,380,224
MobilenetV1/Conv2d_11_depthwise/depthwise: 4,608 903,168
MobilenetV1/Conv2d_11_pointwise/Conv2D: 262,144 51,380,224
MobilenetV1/Conv2d_12_depthwise/depthwise: 4,608 225,792
MobilenetV1/Conv2d_12_pointwise/Conv2D: 524,288 25,690,112
MobilenetV1/Conv2d_13_depthwise/depthwise: 9,216 451,584
MobilenetV1/Conv2d_13_pointwise/Conv2D: 1,048,576 51,380,224
--------------------------------------------------------------------------------
Total: 3,185,088 567,716,352
75% Mobilenet V1 (base) with input size 128x128:
See mobilenet_v1_075()
Layer params macs
--------------------------------------------------------------------------------
MobilenetV1/Conv2d_0/Conv2D: 648 2,654,208
MobilenetV1/Conv2d_1_depthwise/depthwise: 216 884,736
MobilenetV1/Conv2d_1_pointwise/Conv2D: 1,152 4,718,592
MobilenetV1/Conv2d_2_depthwise/depthwise: 432 442,368
MobilenetV1/Conv2d_2_pointwise/Conv2D: 4,608 4,718,592
MobilenetV1/Conv2d_3_depthwise/depthwise: 864 884,736
MobilenetV1/Conv2d_3_pointwise/Conv2D: 9,216 9,437,184
MobilenetV1/Conv2d_4_depthwise/depthwise: 864 221,184
MobilenetV1/Conv2d_4_pointwise/Conv2D: 18,432 4,718,592
MobilenetV1/Conv2d_5_depthwise/depthwise: 1,728 442,368
MobilenetV1/Conv2d_5_pointwise/Conv2D: 36,864 9,437,184
MobilenetV1/Conv2d_6_depthwise/depthwise: 1,728 110,592
MobilenetV1/Conv2d_6_pointwise/Conv2D: 73,728 4,718,592
MobilenetV1/Conv2d_7_depthwise/depthwise: 3,456 221,184
MobilenetV1/Conv2d_7_pointwise/Conv2D: 147,456 9,437,184
MobilenetV1/Conv2d_8_depthwise/depthwise: 3,456 221,184
MobilenetV1/Conv2d_8_pointwise/Conv2D: 147,456 9,437,184
MobilenetV1/Conv2d_9_depthwise/depthwise: 3,456 221,184
MobilenetV1/Conv2d_9_pointwise/Conv2D: 147,456 9,437,184
MobilenetV1/Conv2d_10_depthwise/depthwise: 3,456 221,184
MobilenetV1/Conv2d_10_pointwise/Conv2D: 147,456 9,437,184
MobilenetV1/Conv2d_11_depthwise/depthwise: 3,456 221,184
MobilenetV1/Conv2d_11_pointwise/Conv2D: 147,456 9,437,184
MobilenetV1/Conv2d_12_depthwise/depthwise: 3,456 55,296
MobilenetV1/Conv2d_12_pointwise/Conv2D: 294,912 4,718,592
MobilenetV1/Conv2d_13_depthwise/depthwise: 6,912 110,592
MobilenetV1/Conv2d_13_pointwise/Conv2D: 589,824 9,437,184
--------------------------------------------------------------------------------
Total: 1,800,144 106,002,432
"""
# Tensorflow mandates these.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from collections import namedtuple
import functools
import tensorflow as tf
slim = tf.contrib.slim
# Conv and DepthSepConv namedtuple define layers of the MobileNet architecture
# Conv defines 3x3 convolution layers
# DepthSepConv defines 3x3 depthwise convolution followed by 1x1 convolution.
# stride is the stride of the convolution
# depth is the number of channels or filters in a layer
Conv = namedtuple('Conv', ['kernel', 'stride', 'depth'])
DepthSepConv = namedtuple('DepthSepConv', ['kernel', 'stride', 'depth'])
# _CONV_DEFS specifies the MobileNet body
_CONV_DEFS = [
Conv(kernel=[3, 3], stride=2, depth=32),
DepthSepConv(kernel=[3, 3], stride=1, depth=64),
DepthSepConv(kernel=[3, 3], stride=2, depth=128),
DepthSepConv(kernel=[3, 3], stride=1, depth=128),
DepthSepConv(kernel=[3, 3], stride=2, depth=256),
DepthSepConv(kernel=[3, 3], stride=1, depth=256),
DepthSepConv(kernel=[3, 3], stride=2, depth=512),
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
DepthSepConv(kernel=[3, 3], stride=1, depth=512),
DepthSepConv(kernel=[3, 3], stride=2, depth=1024),
DepthSepConv(kernel=[3, 3], stride=1, depth=1024)
]
def mobilenet_v1_base(inputs,
final_endpoint='Conv2d_13_pointwise',
min_depth=8,
depth_multiplier=1.0,
conv_defs=None,
output_stride=None,
scope=None):
"""Mobilenet v1.
Constructs a Mobilenet v1 network from inputs to the given final endpoint.
Args:
inputs: a tensor of shape [batch_size, height, width, channels].
final_endpoint: specifies the endpoint to construct the network up to. It
can be one of ['Conv2d_0', 'Conv2d_1_pointwise', 'Conv2d_2_pointwise',
'Conv2d_3_pointwise', 'Conv2d_4_pointwise', 'Conv2d_5'_pointwise,
'Conv2d_6_pointwise', 'Conv2d_7_pointwise', 'Conv2d_8_pointwise',
'Conv2d_9_pointwise', 'Conv2d_10_pointwise', 'Conv2d_11_pointwise',
'Conv2d_12_pointwise', 'Conv2d_13_pointwise'].
min_depth: Minimum depth value (number of channels) for all convolution ops.
Enforced when depth_multiplier < 1, and not an active constraint when
depth_multiplier >= 1.
depth_multiplier: Float multiplier for the depth (number of channels)
for all convolution ops. The value must be greater than zero. Typical
usage will be to set this value in (0, 1) to reduce the number of
parameters or computation cost of the model.
conv_defs: A list of ConvDef namedtuples specifying the net architecture.
output_stride: An integer that specifies the requested ratio of input to
output spatial resolution. If not None, then we invoke atrous convolution
if necessary to prevent the network from reducing the spatial resolution
of the activation maps. Allowed values are 8 (accurate fully convolutional
mode), 16 (fast fully convolutional mode), 32 (classification mode).
scope: Optional variable_scope.
Returns:
tensor_out: output tensor corresponding to the final_endpoint.
end_points: a set of activations for external use, for example summaries or
losses.
Raises:
ValueError: if final_endpoint is not set to one of the predefined values,
or depth_multiplier <= 0, or the target output_stride is not
allowed.
"""
depth = lambda d: max(int(d * depth_multiplier), min_depth)
end_points = {}
# Used to find thinned depths for each layer.
if depth_multiplier <= 0:
raise ValueError('depth_multiplier is not greater than zero.')
if conv_defs is None:
conv_defs = _CONV_DEFS
if output_stride is not None and output_stride not in [8, 16, 32]:
raise ValueError('Only allowed output_stride values are 8, 16, 32.')
with tf.variable_scope(scope, 'MobilenetV1', [inputs]):
with slim.arg_scope([slim.conv2d, slim.separable_conv2d], padding='SAME'):
# The current_stride variable keeps track of the output stride of the
# activations, i.e., the running product of convolution strides up to the
# current network layer. This allows us to invoke atrous convolution
# whenever applying the next convolution would result in the activations
# having output stride larger than the target output_stride.
current_stride = 1
# The atrous convolution rate parameter.
rate = 1
net = inputs
for i, conv_def in enumerate(conv_defs):
end_point_base = 'Conv2d_%d' % i
if output_stride is not None and current_stride == output_stride:
# If we have reached the target output_stride, then we need to employ
# atrous convolution with stride=1 and multiply the atrous rate by the
# current unit's stride for use in subsequent layers.
layer_stride = 1
layer_rate = rate
rate *= conv_def.stride
else:
layer_stride = conv_def.stride
layer_rate = 1
current_stride *= conv_def.stride
if isinstance(conv_def, Conv):
end_point = end_point_base
net = slim.conv2d(net, depth(conv_def.depth), conv_def.kernel,
stride=conv_def.stride,
normalizer_fn=slim.batch_norm,
scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint:
return net, end_points
elif isinstance(conv_def, DepthSepConv):
end_point = end_point_base + '_depthwise'
# By passing filters=None
# separable_conv2d produces only a depthwise convolution layer
net = slim.separable_conv2d(net, None, conv_def.kernel,
depth_multiplier=1,
stride=layer_stride,
rate=layer_rate,
normalizer_fn=slim.batch_norm,
scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint:
return net, end_points
end_point = end_point_base + '_pointwise'
net = slim.conv2d(net, depth(conv_def.depth), [1, 1],
stride=1,
normalizer_fn=slim.batch_norm,
scope=end_point)
end_points[end_point] = net
if end_point == final_endpoint:
return net, end_points
else:
raise ValueError('Unknown convolution type %s for layer %d'
% (conv_def.ltype, i))
raise ValueError('Unknown final endpoint %s' % final_endpoint)
def mobilenet_v1(inputs,
num_classes=1000,
dropout_keep_prob=0.999,
is_training=True,
min_depth=8,
depth_multiplier=1.0,
conv_defs=None,
prediction_fn=tf.contrib.layers.softmax,
spatial_squeeze=True,
reuse=None,
scope='MobilenetV1'):
"""Mobilenet v1 model for classification.
Args:
inputs: a tensor of shape [batch_size, height, width, channels].
num_classes: number of predicted classes.
dropout_keep_prob: the percentage of activation values that are retained.
is_training: whether is training or not.
min_depth: Minimum depth value (number of channels) for all convolution ops.
Enforced when depth_multiplier < 1, and not an active constraint when
depth_multiplier >= 1.
depth_multiplier: Float multiplier for the depth (number of channels)
for all convolution ops. The value must be greater than zero. Typical
usage will be to set this value in (0, 1) to reduce the number of
parameters or computation cost of the model.
conv_defs: A list of ConvDef namedtuples specifying the net architecture.
prediction_fn: a function to get predictions out of logits.
spatial_squeeze: if True, logits is of shape is [B, C], if false logits is
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
reuse: whether or not the network and its variables should be reused. To be
able to reuse 'scope' must be given.
scope: Optional variable_scope.
Returns:
logits: the pre-softmax activations, a tensor of size
[batch_size, num_classes]
end_points: a dictionary from components of the network to the corresponding
activation.
Raises:
ValueError: Input rank is invalid.
"""
input_shape = inputs.get_shape().as_list()
if len(input_shape) != 4:
raise ValueError('Invalid input tensor rank, expected 4, was: %d' %
len(input_shape))
with tf.variable_scope(scope, 'MobilenetV1', [inputs, num_classes],
reuse=reuse) as scope:
with slim.arg_scope([slim.batch_norm, slim.dropout],
is_training=is_training):
net, end_points = mobilenet_v1_base(inputs, scope=scope,
min_depth=min_depth,
depth_multiplier=depth_multiplier,
conv_defs=conv_defs)
with tf.variable_scope('Logits'):
#kernel_size = _reduced_kernel_size_for_small_input(net, [7, 7])
kernel_size = net.get_shape()[1:3]
net = slim.avg_pool2d(net, kernel_size, padding='VALID',
scope='AvgPool_1a')
end_points['AvgPool_1a'] = net
# 1 x 1 x 1024
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
normalizer_fn=None, scope='Conv2d_1c_1x1')
if spatial_squeeze:
logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
end_points['Logits'] = logits
if prediction_fn:
end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
return logits, end_points
mobilenet_v1.default_image_size = 224
def wrapped_partial(func, *args, **kwargs):
partial_func = functools.partial(func, *args, **kwargs)
functools.update_wrapper(partial_func, func)
return partial_func
mobilenet_v1_075 = wrapped_partial(mobilenet_v1, depth_multiplier=0.75)
mobilenet_v1_050 = wrapped_partial(mobilenet_v1, depth_multiplier=0.50)
mobilenet_v1_025 = wrapped_partial(mobilenet_v1, depth_multiplier=0.25)
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
"""Define kernel size which is automatically reduced for small input.
If the shape of the input images is unknown at graph construction time this
function assumes that the input images are large enough.
Args:
input_tensor: input tensor of size [batch_size, height, width, channels].
kernel_size: desired kernel size of length 2: [kernel_height, kernel_width]
Returns:
a tensor with the kernel size.
"""
shape = input_tensor.get_shape().as_list()
if shape[1] is None or shape[2] is None:
kernel_size_out = kernel_size
else:
kernel_size_out = [min(shape[1], kernel_size[0]),
min(shape[2], kernel_size[1])]
return kernel_size_out
def mobilenet_v1_arg_scope(is_training=True,
weight_decay=0.00004,
stddev=0.09,
batch_norm_decay=0.9997,
batch_norm_epsilon=0.001,
regularize_depthwise=False):
"""Defines the default MobilenetV1 arg scope.
Args:
is_training: Whether or not we're training the model.
weight_decay: The weight decay to use for regularizing the model.
stddev: The standard deviation of the trunctated normal weight initializer.
regularize_depthwise: Whether or not apply regularization on depthwise.
Returns:
An `arg_scope` to use for the mobilenet v1 model.
"""
batch_norm_params = {
'is_training': is_training,
'center': True,
'scale': True,
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon,
}
# Set weight_decay for weights in Conv and DepthSepConv layers.
weights_init = tf.truncated_normal_initializer(stddev=stddev)
regularizer = tf.contrib.layers.l2_regularizer(weight_decay)
if regularize_depthwise:
depthwise_regularizer = regularizer
else:
depthwise_regularizer = None
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
weights_initializer=weights_init,
activation_fn=tf.nn.relu6, normalizer_fn=slim.batch_norm):
with slim.arg_scope([slim.batch_norm], **batch_norm_params):
with slim.arg_scope([slim.conv2d], weights_regularizer=regularizer):
with slim.arg_scope([slim.separable_conv2d],
weights_regularizer=depthwise_regularizer) as sc:
return sc
================================================
FILE: nets/mobilenet_v1_test.py
================================================
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================
"""Tests for MobileNet v1."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from nets import mobilenet_v1
slim = tf.contrib.slim
class MobilenetV1Test(tf.test.TestCase):
def testBuildClassificationNetwork(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
self.assertTrue('Predictions' in end_points)
self.assertListEqual(end_points['Predictions'].get_shape().as_list(),
[batch_size, num_classes])
def testBuildBaseNetwork(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
net, end_points = mobilenet_v1.mobilenet_v1_base(inputs)
self.assertTrue(net.op.name.startswith('MobilenetV1/Conv2d_13'))
self.assertListEqual(net.get_shape().as_list(),
[batch_size, 7, 7, 1024])
expected_endpoints = ['Conv2d_0',
'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
'Conv2d_3_depthwise', 'Conv2d_3_pointwise',
'Conv2d_4_depthwise', 'Conv2d_4_pointwise',
'Conv2d_5_depthwise', 'Conv2d_5_pointwise',
'Conv2d_6_depthwise', 'Conv2d_6_pointwise',
'Conv2d_7_depthwise', 'Conv2d_7_pointwise',
'Conv2d_8_depthwise', 'Conv2d_8_pointwise',
'Conv2d_9_depthwise', 'Conv2d_9_pointwise',
'Conv2d_10_depthwise', 'Conv2d_10_pointwise',
'Conv2d_11_depthwise', 'Conv2d_11_pointwise',
'Conv2d_12_depthwise', 'Conv2d_12_pointwise',
'Conv2d_13_depthwise', 'Conv2d_13_pointwise']
self.assertItemsEqual(end_points.keys(), expected_endpoints)
def testBuildOnlyUptoFinalEndpoint(self):
batch_size = 5
height, width = 224, 224
endpoints = ['Conv2d_0',
'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
'Conv2d_3_depthwise', 'Conv2d_3_pointwise',
'Conv2d_4_depthwise', 'Conv2d_4_pointwise',
'Conv2d_5_depthwise', 'Conv2d_5_pointwise',
'Conv2d_6_depthwise', 'Conv2d_6_pointwise',
'Conv2d_7_depthwise', 'Conv2d_7_pointwise',
'Conv2d_8_depthwise', 'Conv2d_8_pointwise',
'Conv2d_9_depthwise', 'Conv2d_9_pointwise',
'Conv2d_10_depthwise', 'Conv2d_10_pointwise',
'Conv2d_11_depthwise', 'Conv2d_11_pointwise',
'Conv2d_12_depthwise', 'Conv2d_12_pointwise',
'Conv2d_13_depthwise', 'Conv2d_13_pointwise']
for index, endpoint in enumerate(endpoints):
with tf.Graph().as_default():
inputs = tf.random_uniform((batch_size, height, width, 3))
out_tensor, end_points = mobilenet_v1.mobilenet_v1_base(
inputs, final_endpoint=endpoint)
self.assertTrue(out_tensor.op.name.startswith(
'MobilenetV1/' + endpoint))
self.assertItemsEqual(endpoints[:index+1], end_points)
def testBuildCustomNetworkUsingConvDefs(self):
batch_size = 5
height, width = 224, 224
conv_defs = [
mobilenet_v1.Conv(kernel=[3, 3], stride=2, depth=32),
mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=1, depth=64),
mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=2, depth=128),
mobilenet_v1.DepthSepConv(kernel=[3, 3], stride=1, depth=512)
]
inputs = tf.random_uniform((batch_size, height, width, 3))
net, end_points = mobilenet_v1.mobilenet_v1_base(
inputs, final_endpoint='Conv2d_3_pointwise', conv_defs=conv_defs)
self.assertTrue(net.op.name.startswith('MobilenetV1/Conv2d_3'))
self.assertListEqual(net.get_shape().as_list(),
[batch_size, 56, 56, 512])
expected_endpoints = ['Conv2d_0',
'Conv2d_1_depthwise', 'Conv2d_1_pointwise',
'Conv2d_2_depthwise', 'Conv2d_2_pointwise',
'Conv2d_3_depthwise', 'Conv2d_3_pointwise']
self.assertItemsEqual(end_points.keys(), expected_endpoints)
def testBuildAndCheckAllEndPointsUptoConv2d_13(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
normalizer_fn=slim.batch_norm):
_, end_points = mobilenet_v1.mobilenet_v1_base(
inputs, final_endpoint='Conv2d_13_pointwise')
endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32],
'Conv2d_1_depthwise': [batch_size, 112, 112, 32],
'Conv2d_1_pointwise': [batch_size, 112, 112, 64],
'Conv2d_2_depthwise': [batch_size, 56, 56, 64],
'Conv2d_2_pointwise': [batch_size, 56, 56, 128],
'Conv2d_3_depthwise': [batch_size, 56, 56, 128],
'Conv2d_3_pointwise': [batch_size, 56, 56, 128],
'Conv2d_4_depthwise': [batch_size, 28, 28, 128],
'Conv2d_4_pointwise': [batch_size, 28, 28, 256],
'Conv2d_5_depthwise': [batch_size, 28, 28, 256],
'Conv2d_5_pointwise': [batch_size, 28, 28, 256],
'Conv2d_6_depthwise': [batch_size, 14, 14, 256],
'Conv2d_6_pointwise': [batch_size, 14, 14, 512],
'Conv2d_7_depthwise': [batch_size, 14, 14, 512],
'Conv2d_7_pointwise': [batch_size, 14, 14, 512],
'Conv2d_8_depthwise': [batch_size, 14, 14, 512],
'Conv2d_8_pointwise': [batch_size, 14, 14, 512],
'Conv2d_9_depthwise': [batch_size, 14, 14, 512],
'Conv2d_9_pointwise': [batch_size, 14, 14, 512],
'Conv2d_10_depthwise': [batch_size, 14, 14, 512],
'Conv2d_10_pointwise': [batch_size, 14, 14, 512],
'Conv2d_11_depthwise': [batch_size, 14, 14, 512],
'Conv2d_11_pointwise': [batch_size, 14, 14, 512],
'Conv2d_12_depthwise': [batch_size, 7, 7, 512],
'Conv2d_12_pointwise': [batch_size, 7, 7, 1024],
'Conv2d_13_depthwise': [batch_size, 7, 7, 1024],
'Conv2d_13_pointwise': [batch_size, 7, 7, 1024]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name, expected_shape in endpoints_shapes.iteritems():
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testOutputStride16BuildAndCheckAllEndPointsUptoConv2d_13(self):
batch_size = 5
height, width = 224, 224
output_stride = 16
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
normalizer_fn=slim.batch_norm):
_, end_points = mobilenet_v1.mobilenet_v1_base(
inputs, output_stride=output_stride,
final_endpoint='Conv2d_13_pointwise')
endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32],
'Conv2d_1_depthwise': [batch_size, 112, 112, 32],
'Conv2d_1_pointwise': [batch_size, 112, 112, 64],
'Conv2d_2_depthwise': [batch_size, 56, 56, 64],
'Conv2d_2_pointwise': [batch_size, 56, 56, 128],
'Conv2d_3_depthwise': [batch_size, 56, 56, 128],
'Conv2d_3_pointwise': [batch_size, 56, 56, 128],
'Conv2d_4_depthwise': [batch_size, 28, 28, 128],
'Conv2d_4_pointwise': [batch_size, 28, 28, 256],
'Conv2d_5_depthwise': [batch_size, 28, 28, 256],
'Conv2d_5_pointwise': [batch_size, 28, 28, 256],
'Conv2d_6_depthwise': [batch_size, 14, 14, 256],
'Conv2d_6_pointwise': [batch_size, 14, 14, 512],
'Conv2d_7_depthwise': [batch_size, 14, 14, 512],
'Conv2d_7_pointwise': [batch_size, 14, 14, 512],
'Conv2d_8_depthwise': [batch_size, 14, 14, 512],
'Conv2d_8_pointwise': [batch_size, 14, 14, 512],
'Conv2d_9_depthwise': [batch_size, 14, 14, 512],
'Conv2d_9_pointwise': [batch_size, 14, 14, 512],
'Conv2d_10_depthwise': [batch_size, 14, 14, 512],
'Conv2d_10_pointwise': [batch_size, 14, 14, 512],
'Conv2d_11_depthwise': [batch_size, 14, 14, 512],
'Conv2d_11_pointwise': [batch_size, 14, 14, 512],
'Conv2d_12_depthwise': [batch_size, 14, 14, 512],
'Conv2d_12_pointwise': [batch_size, 14, 14, 1024],
'Conv2d_13_depthwise': [batch_size, 14, 14, 1024],
'Conv2d_13_pointwise': [batch_size, 14, 14, 1024]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name, expected_shape in endpoints_shapes.iteritems():
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testOutputStride8BuildAndCheckAllEndPointsUptoConv2d_13(self):
batch_size = 5
height, width = 224, 224
output_stride = 8
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
normalizer_fn=slim.batch_norm):
_, end_points = mobilenet_v1.mobilenet_v1_base(
inputs, output_stride=output_stride,
final_endpoint='Conv2d_13_pointwise')
endpoints_shapes = {'Conv2d_0': [batch_size, 112, 112, 32],
'Conv2d_1_depthwise': [batch_size, 112, 112, 32],
'Conv2d_1_pointwise': [batch_size, 112, 112, 64],
'Conv2d_2_depthwise': [batch_size, 56, 56, 64],
'Conv2d_2_pointwise': [batch_size, 56, 56, 128],
'Conv2d_3_depthwise': [batch_size, 56, 56, 128],
'Conv2d_3_pointwise': [batch_size, 56, 56, 128],
'Conv2d_4_depthwise': [batch_size, 28, 28, 128],
'Conv2d_4_pointwise': [batch_size, 28, 28, 256],
'Conv2d_5_depthwise': [batch_size, 28, 28, 256],
'Conv2d_5_pointwise': [batch_size, 28, 28, 256],
'Conv2d_6_depthwise': [batch_size, 28, 28, 256],
'Conv2d_6_pointwise': [batch_size, 28, 28, 512],
'Conv2d_7_depthwise': [batch_size, 28, 28, 512],
'Conv2d_7_pointwise': [batch_size, 28, 28, 512],
'Conv2d_8_depthwise': [batch_size, 28, 28, 512],
'Conv2d_8_pointwise': [batch_size, 28, 28, 512],
'Conv2d_9_depthwise': [batch_size, 28, 28, 512],
'Conv2d_9_pointwise': [batch_size, 28, 28, 512],
'Conv2d_10_depthwise': [batch_size, 28, 28, 512],
'Conv2d_10_pointwise': [batch_size, 28, 28, 512],
'Conv2d_11_depthwise': [batch_size, 28, 28, 512],
'Conv2d_11_pointwise': [batch_size, 28, 28, 512],
'Conv2d_12_depthwise': [batch_size, 28, 28, 512],
'Conv2d_12_pointwise': [batch_size, 28, 28, 1024],
'Conv2d_13_depthwise': [batch_size, 28, 28, 1024],
'Conv2d_13_pointwise': [batch_size, 28, 28, 1024]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name, expected_shape in endpoints_shapes.iteritems():
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testBuildAndCheckAllEndPointsApproximateFaceNet(self):
batch_size = 5
height, width = 128, 128
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
normalizer_fn=slim.batch_norm):
_, end_points = mobilenet_v1.mobilenet_v1_base(
inputs, final_endpoint='Conv2d_13_pointwise', depth_multiplier=0.75)
# For the Conv2d_0 layer FaceNet has depth=16
endpoints_shapes = {'Conv2d_0': [batch_size, 64, 64, 24],
'Conv2d_1_depthwise': [batch_size, 64, 64, 24],
'Conv2d_1_pointwise': [batch_size, 64, 64, 48],
'Conv2d_2_depthwise': [batch_size, 32, 32, 48],
'Conv2d_2_pointwise': [batch_size, 32, 32, 96],
'Conv2d_3_depthwise': [batch_size, 32, 32, 96],
'Conv2d_3_pointwise': [batch_size, 32, 32, 96],
'Conv2d_4_depthwise': [batch_size, 16, 16, 96],
'Conv2d_4_pointwise': [batch_size, 16, 16, 192],
'Conv2d_5_depthwise': [batch_size, 16, 16, 192],
'Conv2d_5_pointwise': [batch_size, 16, 16, 192],
'Conv2d_6_depthwise': [batch_size, 8, 8, 192],
'Conv2d_6_pointwise': [batch_size, 8, 8, 384],
'Conv2d_7_depthwise': [batch_size, 8, 8, 384],
'Conv2d_7_pointwise': [batch_size, 8, 8, 384],
'Conv2d_8_depthwise': [batch_size, 8, 8, 384],
'Conv2d_8_pointwise': [batch_size, 8, 8, 384],
'Conv2d_9_depthwise': [batch_size, 8, 8, 384],
'Conv2d_9_pointwise': [batch_size, 8, 8, 384],
'Conv2d_10_depthwise': [batch_size, 8, 8, 384],
'Conv2d_10_pointwise': [batch_size, 8, 8, 384],
'Conv2d_11_depthwise': [batch_size, 8, 8, 384],
'Conv2d_11_pointwise': [batch_size, 8, 8, 384],
'Conv2d_12_depthwise': [batch_size, 4, 4, 384],
'Conv2d_12_pointwise': [batch_size, 4, 4, 768],
'Conv2d_13_depthwise': [batch_size, 4, 4, 768],
'Conv2d_13_pointwise': [batch_size, 4, 4, 768]}
self.assertItemsEqual(endpoints_shapes.keys(), end_points.keys())
for endpoint_name, expected_shape in endpoints_shapes.iteritems():
self.assertTrue(endpoint_name in end_points)
self.assertListEqual(end_points[endpoint_name].get_shape().as_list(),
expected_shape)
def testModelHasExpectedNumberOfParameters(self):
batch_size = 5
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope([slim.conv2d, slim.separable_conv2d],
normalizer_fn=slim.batch_norm):
mobilenet_v1.mobilenet_v1_base(inputs)
total_params, _ = slim.model_analyzer.analyze_vars(
slim.get_model_variables())
self.assertAlmostEqual(3217920L, total_params)
def testBuildEndPointsWithDepthMultiplierLessThanOne(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
endpoint_keys = [key for key in end_points.keys() if key.startswith('Conv')]
_, end_points_with_multiplier = mobilenet_v1.mobilenet_v1(
inputs, num_classes, scope='depth_multiplied_net',
depth_multiplier=0.5)
for key in endpoint_keys:
original_depth = end_points[key].get_shape().as_list()[3]
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
self.assertEqual(0.5 * original_depth, new_depth)
def testBuildEndPointsWithDepthMultiplierGreaterThanOne(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
_, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
endpoint_keys = [key for key in end_points.keys()
if key.startswith('Mixed') or key.startswith('Conv')]
_, end_points_with_multiplier = mobilenet_v1.mobilenet_v1(
inputs, num_classes, scope='depth_multiplied_net',
depth_multiplier=2.0)
for key in endpoint_keys:
original_depth = end_points[key].get_shape().as_list()[3]
new_depth = end_points_with_multiplier[key].get_shape().as_list()[3]
self.assertEqual(2.0 * original_depth, new_depth)
def testRaiseValueErrorWithInvalidDepthMultiplier(self):
batch_size = 5
height, width = 224, 224
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
with self.assertRaises(ValueError):
_ = mobilenet_v1.mobilenet_v1(
inputs, num_classes, depth_multiplier=-0.1)
with self.assertRaises(ValueError):
_ = mobilenet_v1.mobilenet_v1(
inputs, num_classes, depth_multiplier=0.0)
def testHalfSizeImages(self):
batch_size = 5
height, width = 112, 112
num_classes = 1000
inputs = tf.random_uniform((batch_size, height, width, 3))
logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Conv2d_13_pointwise']
self.assertListEqual(pre_pool.get_shape().as_list(),
[batch_size, 4, 4, 1024])
def testUnknownImageShape(self):
tf.reset_default_graph()
batch_size = 2
height, width = 224, 224
num_classes = 1000
input_np = np.random.uniform(0, 1, (batch_size, height, width, 3))
with self.test_session() as sess:
inputs = tf.placeholder(tf.float32, shape=(batch_size, None, None, 3))
logits, end_points = mobilenet_v1.mobilenet_v1(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[batch_size, num_classes])
pre_pool = end_points['Conv2d_13_pointwise']
feed_dict = {inputs: input_np}
tf.global_variables_initializer().run()
pre_pool_out = sess.run(pre_pool, feed_dict=feed_dict)
self.assertListEqual(list(pre_pool_out.shape), [batch_size, 7, 7, 1024])
def testUnknowBatchSize(self):
batch_size = 1
height, width = 224, 224
num_classes = 1000
inputs = tf.placeholder(tf.float32, (None, height, width, 3))
logits, _ = mobilenet_v1.mobilenet_v1(inputs, num_classes)
self.assertTrue(logits.op.name.startswith('MobilenetV1/Logits'))
self.assertListEqual(logits.get_shape().as_list(),
[None, num_classes])
images = tf.random_uniform((batch_size, height, width, 3))
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(logits, {inputs: images.eval()})
self.assertEquals(output.shape, (batch_size, num_classes))
def testEvaluation(self):
batch_size = 2
height, width = 224, 224
num_classes = 1000
eval_inputs = tf.random_uniform((batch_size, height, width, 3))
logits, _ = mobilenet_v1.mobilenet_v1(eval_inputs, num_classes,
is_training=False)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (batch_size,))
def testTrainEvalWithReuse(self):
train_batch_size = 5
eval_batch_size = 2
height, width = 150, 150
num_classes = 1000
train_inputs = tf.random_uniform((train_batch_size, height, width, 3))
mobilenet_v1.mobilenet_v1(train_inputs, num_classes)
eval_inputs = tf.random_uniform((eval_batch_size, height, width, 3))
logits, _ = mobilenet_v1.mobilenet_v1(eval_inputs, num_classes,
reuse=True)
predictions = tf.argmax(logits, 1)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(predictions)
self.assertEquals(output.shape, (eval_batch_size,))
def testLogitsNotSqueezed(self):
num_classes = 25
images = tf.random_uniform([1, 224, 224, 3])
logits, _ = mobilenet_v1.mobilenet_v1(images,
num_classes=num_classes,
spatial_squeeze=False)
with self.test_session() as sess:
tf.global_variables_initializer().run()
logits_out = sess.run(logits)
self.assertListEqual(list(logits_out.shape), [1, 1, 1, num_classes])
if __name__ == '__main__':
tf.test.main()
================================================
FILE: nets/net_profile.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import tensorflow as tf
from nets import nets_factory
def profile(model_name, num_classes, image_size, batch_size):
graph = tf.Graph()
sess = tf.Session(graph=graph)
with graph.as_default(), sess.as_default():
network_fn = nets_factory.get_network_fn(model_name, num_classes=num_classes)
inputs = tf.random_uniform((batch_size, image_size, image_size, 3))
logits, _ = network_fn(inputs)
print("Profiling model %s" % model_name)
# Print trainable variable parameter statistics to stdout.
param_stats = tf.contrib.tfprof.model_analyzer.print_model_analysis(
tf.get_default_graph(),
tfprof_options=tf.contrib.tfprof.model_analyzer.
TRAINABLE_VARS_PARAMS_STAT_OPTIONS)
# param_stats is tensorflow.tfprof.TFProfNode proto. It organize the statistics
# of each graph node in tree scructure. Let's print the root below.
print('total_params: %d\n' % param_stats.total_parameters)
print()
# Print to stdout an analysis of the number of floating point operations in the
# model broken down by individual operations.
tf.contrib.tfprof.model_analyzer.print_model_analysis(
tf.get_default_graph(),
tfprof_options=tf.contrib.tfprof.model_analyzer.FLOAT_OPS_OPTIONS)
def parse_args():
parser = argparse.ArgumentParser(description='')
parser.add_argument('--model_name', dest='model_name',
help='The name of the architecture to profile.', type=str,
required=False, default='inception_v3')
parser.add_argument('--num_classes', dest='num_classes',
help='The number of classes.', type=int,
required=False, default=1000)
parser.add_argument('--image_size', dest='image_size',
help='The size of the input image.', type=int,
required=False, default=299)
parser.add_argument('--batch_size', dest='batch_size',
help='The number of images in a batch.', type=int,
required=False, default=1)
args = parser.parse_args()
return args
def main():
args = parse_args()
profile(args.model_name, args.num_classes, args.image_size, args.batch_size)
if __name__ == '__main__':
main()
================================================
FILE: nets/nets_factory.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains a factory for building various models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import tensorflow as tf
from nets import inception
from nets import mobilenet_v1
from nets import resnet_v2
slim = tf.contrib.slim
networks_map = {
'inception_v1': inception.inception_v1,
'inception_v2': inception.inception_v2,
'inception_v3': inception.inception_v3,
'inception_v4': inception.inception_v4,
'inception_resnet_v2': inception.inception_resnet_v2,
'resnet_v2_50': resnet_v2.resnet_v2_50,
'resnet_v2_101': resnet_v2.resnet_v2_101,
'resnet_v2_152': resnet_v2.resnet_v2_152,
'resnet_v2_200': resnet_v2.resnet_v2_200,
'mobilenet_v1': mobilenet_v1.mobilenet_v1,
'mobilenet_v1_075': mobilenet_v1.mobilenet_v1_075,
'mobilenet_v1_050': mobilenet_v1.mobilenet_v1_050,
'mobilenet_v1_025': mobilenet_v1.mobilenet_v1_025,
}
arg_scopes_map = {'inception_v1': inception.inception_v3_arg_scope,
'inception_v2': inception.inception_v3_arg_scope,
'inception_v3': inception.inception_v3_arg_scope,
'inception_v4': inception.inception_v4_arg_scope,
'inception_resnet_v2': inception.inception_resnet_v2_arg_scope,
'resnet_v2_50': resnet_v2.resnet_arg_scope,
'resnet_v2_101': resnet_v2.resnet_arg_scope,
'resnet_v2_152': resnet_v2.resnet_arg_scope,
'resnet_v2_200': resnet_v2.resnet_arg_scope,
'mobilenet_v1': mobilenet_v1.mobilenet_v1_arg_scope,
'mobilenet_v1_075': mobilenet_v1.mobilenet_v1_arg_scope,
'mobilenet_v1_050': mobilenet_v1.mobilenet_v1_arg_scope,
'mobilenet_v1_025': mobilenet_v1.mobilenet_v1_arg_scope,
}
def get_network_fn(name, num_classes, weight_decay=0.0, is_training=False):
"""Returns a network_fn such as `logits, end_points = network_fn(images)`.
Args:
name: The name of the network.
num_classes: The number of classes to use for classification.
weight_decay: The l2 coefficient for the model weights.
is_training: `True` if the model is being used for training and `False`
otherwise.
Returns:
network_fn: A function that applies the model to a batch of images. It has
the following signature:
logits, end_points = network_fn(images)
Raises:
ValueError: If network `name` is not recognized.
"""
if name not in networks_map:
raise ValueError('Name of network unknown %s' % name)
arg_scope = arg_scopes_map[name](weight_decay=weight_decay)
func = networks_map[name]
@functools.wraps(func)
def network_fn(images):
with slim.arg_scope(arg_scope):
return func(images, num_classes, is_training=is_training)
if hasattr(func, 'default_image_size'):
network_fn.default_image_size = func.default_image_size
return network_fn
================================================
FILE: nets/nets_factory_test.py
================================================
# Copyright 2016 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for slim.inception."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import nets_factory
class NetworksTest(tf.test.TestCase):
def testGetNetworkFn(self):
batch_size = 5
num_classes = 1000
for net in nets_factory.networks_map:
with self.test_session():
net_fn = nets_factory.get_network_fn(net, num_classes)
# Most networks use 224 as their default_image_size
image_size = getattr(net_fn, 'default_image_size', 224)
inputs = tf.random_uniform((batch_size, image_size, image_size, 3))
logits, end_points = net_fn(inputs)
self.assertTrue(isinstance(logits, tf.Tensor))
self.assertTrue(isinstance(end_points, dict))
self.assertEqual(logits.get_shape().as_list()[0], batch_size)
self.assertEqual(logits.get_shape().as_list()[-1], num_classes)
if __name__ == '__main__':
tf.test.main()
================================================
FILE: nets/resnet_utils.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains building blocks for various versions of Residual Networks.
Residual networks (ResNets) were proposed in:
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015
More variants were introduced in:
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Identity Mappings in Deep Residual Networks. arXiv: 1603.05027, 2016
We can obtain different ResNet variants by changing the network depth, width,
and form of residual unit. This module implements the infrastructure for
building them. Concrete ResNet units and full ResNet networks are implemented in
the accompanying resnet_v1.py and resnet_v2.py modules.
Compared to https://github.com/KaimingHe/deep-residual-networks, in the current
implementation we subsample the output activations in the last residual unit of
each block, instead of subsampling the input activations in the first residual
unit of each block. The two implementations give identical results but our
implementation is more memory efficient.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import tensorflow as tf
slim = tf.contrib.slim
class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
"""A named tuple describing a ResNet block.
Its parts are:
scope: The scope of the `Block`.
unit_fn: The ResNet unit function which takes as input a `Tensor` and
returns another `Tensor` with the output of the ResNet unit.
args: A list of length equal to the number of units in the `Block`. The list
contains one (depth, depth_bottleneck, stride) tuple for each unit in the
block to serve as argument to unit_fn.
"""
def subsample(inputs, factor, scope=None):
"""Subsamples the input along the spatial dimensions.
Args:
inputs: A `Tensor` of size [batch, height_in, width_in, channels].
factor: The subsampling factor.
scope: Optional variable_scope.
Returns:
output: A `Tensor` of size [batch, height_out, width_out, channels] with the
input, either intact (if factor == 1) or subsampled (if factor > 1).
"""
if factor == 1:
return inputs
else:
return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)
def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=None):
"""Strided 2-D convolution with 'SAME' padding.
When stride > 1, then we do explicit zero-padding, followed by conv2d with
'VALID' padding.
Note that
net = conv2d_same(inputs, num_outputs, 3, stride=stride)
is equivalent to
net = slim.conv2d(inputs, num_outputs, 3, stride=1, padding='SAME')
net = subsample(net, factor=stride)
whereas
net = slim.conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME')
is different when the input's height or width is even, which is why we add the
current function. For more details, see ResnetUtilsTest.testConv2DSameEven().
Args:
inputs: A 4-D tensor of size [batch, height_in, width_in, channels].
num_outputs: An integer, the number of output filters.
kernel_size: An int with the kernel_size of the filters.
stride: An integer, the output stride.
rate: An integer, rate for atrous convolution.
scope: Scope.
Returns:
output: A 4-D tensor of size [batch, height_out, width_out, channels] with
the convolution output.
"""
if stride == 1:
return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, rate=rate,
padding='SAME', scope=scope)
else:
kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1)
pad_total = kernel_size_effective - 1
pad_beg = pad_total // 2
pad_end = pad_total - pad_beg
inputs = tf.pad(inputs,
[[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])
return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride,
rate=rate, padding='VALID', scope=scope)
@slim.add_arg_scope
def stack_blocks_dense(net, blocks, output_stride=None,
outputs_collections=None):
"""Stacks ResNet `Blocks` and controls output feature density.
First, this function creates scopes for the ResNet in the form of
'block_name/unit_1', 'block_name/unit_2', etc.
Second, this function allows the user to explicitly control the ResNet
output_stride, which is the ratio of the input to output spatial resolution.
This is useful for dense prediction tasks such as semantic segmentation or
object detection.
Most ResNets consist of 4 ResNet blocks and subsample the activations by a
factor of 2 when transitioning between consecutive ResNet blocks. This results
to a nominal ResNet output_stride equal to 8. If we set the output_stride to
half the nominal network stride (e.g., output_stride=4), then we compute
responses twice.
Control of the output feature density is implemented by atrous convolution.
Args:
net: A `Tensor` of size [batch, height, width, channels].
blocks: A list of length equal to the number of ResNet `Blocks`. Each
element is a ResNet `Block` object describing the units in the `Block`.
output_stride: If `None`, then the output will be computed at the nominal
network stride. If output_stride is not `None`, it specifies the requested
ratio of input to output spatial resolution, which needs to be equal to
the product of unit strides from the start up to some level of the ResNet.
For example, if the ResNet employs units with strides 1, 2, 1, 3, 4, 1,
then valid values for the output_stride are 1, 2, 6, 24 or None (which
is equivalent to output_stride=24).
outputs_collections: Collection to add the ResNet block outputs.
Returns:
net: Output tensor with stride equal to the specified output_stride.
Raises:
ValueError: If the target output_stride is not valid.
"""
# The current_stride variable keeps track of the effective stride of the
# activations. This allows us to invoke atrous convolution whenever applying
# the next residual unit would result in the activations having stride larger
# than the target output_stride.
current_stride = 1
# The atrous convolution rate parameter.
rate = 1
for block in blocks:
with tf.variable_scope(block.scope, 'block', [net]) as sc:
for i, unit in enumerate(block.args):
if output_stride is not None and current_stride > output_stride:
raise ValueError('The target output_stride cannot be reached.')
with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
# If we have reached the target output_stride, then we need to employ
# atrous convolution with stride=1 and multiply the atrous rate by the
# current unit's stride for use in subsequent layers.
if output_stride is not None and current_stride == output_stride:
net = block.unit_fn(net, rate=rate, **dict(unit, stride=1))
rate *= unit.get('stride', 1)
else:
net = block.unit_fn(net, rate=1, **unit)
current_stride *= unit.get('stride', 1)
net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
if output_stride is not None and current_stride != output_stride:
raise ValueError('The target output_stride cannot be reached.')
return net
def resnet_arg_scope(weight_decay=0.0001,
batch_norm_decay=0.997,
batch_norm_epsilon=1e-5,
batch_norm_scale=True,
activation_fn=tf.nn.relu,
use_batch_norm=True):
"""Defines the default ResNet arg scope.
TODO(gpapan): The batch-normalization related default values above are
appropriate for use in conjunction with the reference ResNet models
released at https://github.com/KaimingHe/deep-residual-networks. When
training ResNets from scratch, they might need to be tuned.
Args:
weight_decay: The weight decay to use for regularizing the model.
batch_norm_decay: The moving average decay when estimating layer activation
statistics in batch normalization.
batch_norm_epsilon: Small constant to prevent division by zero when
normalizing activations by their variance in batch normalization.
batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the
activations in the batch normalization layer.
activation_fn: The activation function which is used in ResNet.
use_batch_norm: Whether or not to use batch normalization.
Returns:
An `arg_scope` to use for the resnet models.
"""
batch_norm_params = {
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon,
'scale': batch_norm_scale,
'updates_collections': tf.GraphKeys.UPDATE_OPS,
}
with slim.arg_scope(
[slim.conv2d],
weights_regularizer=slim.l2_regularizer(weight_decay),
weights_initializer=slim.variance_scaling_initializer(),
activation_fn=activation_fn,
normalizer_fn=slim.batch_norm if use_batch_norm else None,
normalizer_params=batch_norm_params):
with slim.arg_scope([slim.batch_norm], **batch_norm_params):
# The following implies padding='SAME' for pool1, which makes feature
# alignment easier for dense prediction tasks. This is also used in
# https://github.com/facebook/fb.resnet.torch. However the accompanying
# code of 'Deep Residual Learning for Image Recognition' uses
# padding='VALID' for pool1. You can switch to that choice by setting
# slim.arg_scope([slim.max_pool2d], padding='VALID').
with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
return arg_sc
================================================
FILE: nets/resnet_v2.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains definitions for the preactivation form of Residual Networks.
Residual networks (ResNets) were originally proposed in:
[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Deep Residual Learning for Image Recognition. arXiv:1512.03385
The full preactivation 'v2' ResNet variant implemented in this module was
introduced by:
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Identity Mappings in Deep Residual Networks. arXiv: 1603.05027
The key difference of the full preactivation 'v2' variant compared to the
'v1' variant in [1] is the use of batch normalization before every weight layer.
Typical use:
from tensorflow.contrib.slim.nets import resnet_v2
ResNet-101 for image classification into 1000 classes:
# inputs has shape [batch, 224, 224, 3]
with slim.arg_scope(resnet_v2.resnet_arg_scope()):
net, end_points = resnet_v2.resnet_v2_101(inputs, 1000, is_training=False)
ResNet-101 for semantic segmentation into 21 classes:
# inputs has shape [batch, 513, 513, 3]
with slim.arg_scope(resnet_v2.resnet_arg_scope(is_training)):
net, end_points = resnet_v2.resnet_v2_101(inputs,
21,
is_training=False,
global_pool=False,
output_stride=16)
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
from nets import resnet_utils
slim = tf.contrib.slim
resnet_arg_scope = resnet_utils.resnet_arg_scope
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride, rate=1,
outputs_collections=None, scope=None):
"""Bottleneck residual unit variant with BN before convolutions.
This is the full preactivation residual unit variant proposed in [2]. See
Fig. 1(b) of [2] for its definition. Note that we use here the bottleneck
variant which has an extra bottleneck layer.
When putting together two consecutive ResNet blocks that use this unit, one
should use stride = 2 in the last unit of the first block.
Args:
inputs: A tensor of size [batch, height, width, channels].
depth: The depth of the ResNet unit output.
depth_bottleneck: The depth of the bottleneck layers.
stride: The ResNet unit's stride. Determines the amount of downsampling of
the units output compared to its input.
rate: An integer, rate for atrous convolution.
outputs_collections: Collection to add the ResNet unit output.
scope: Optional variable_scope.
Returns:
The ResNet unit's output.
"""
with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')
if depth == depth_in:
shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
else:
shortcut = slim.conv2d(preact, depth, [1, 1], stride=stride,
normalizer_fn=None, activation_fn=None,
scope='shortcut')
residual = slim.conv2d(preact, depth_bottleneck, [1, 1], stride=1,
scope='conv1')
residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride,
rate=rate, scope='conv2')
residual = slim.conv2d(residual, depth, [1, 1], stride=1,
normalizer_fn=None, activation_fn=None,
scope='conv3')
output = shortcut + residual
return slim.utils.collect_named_outputs(outputs_collections,
sc.original_name_scope,
output)
def resnet_v2(inputs,
blocks,
num_classes=None,
is_training=True,
global_pool=True,
output_stride=None,
include_root_block=True,
spatial_squeeze=True,
dropout_keep_prob=1.,
reuse=None,
scope=None):
"""Generator for v2 (preactivation) ResNet models.
This function generates a family of ResNet v2 models. See the resnet_v2_*()
methods for specific model instantiations, obtained by selecting different
block instantiations that produce ResNets of various depths.
Training for image classification on Imagenet is usually done with [224, 224]
inputs, resulting in [7, 7] feature maps at the output of the last ResNet
block for the ResNets defined in [1] that have nominal stride equal to 32.
However, for dense prediction tasks we advise that one uses inputs with
spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In
this case the feature maps at the ResNet output will have spatial shape
[(height - 1) / output_stride + 1, (width - 1) / output_stride + 1]
and corners exactly aligned with the input image corners, which greatly
facilitates alignment of the features to the image. Using as input [225, 225]
images results in [8, 8] feature maps at the output of the last ResNet block.
For dense prediction tasks, the ResNet needs to run in fully-convolutional
(FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all
have nominal stride equal to 32 and a good choice in FCN mode is to use
output_stride=16 in order to increase the density of the computed features at
small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915.
Args:
inputs: A tensor of size [batch, height_in, width_in, channels].
blocks: A list of length equal to the number of ResNet blocks. Each element
is a resnet_utils.Block object describing the units in the block.
num_classes: Number of predicted classes for classification tasks. If None
we return the features before the logit layer.
is_training: whether is training or not.
global_pool: If True, we perform global average pooling before computing the
logits. Set to True for image classification, False for dense prediction.
output_stride: If None, then the output will be computed at the nominal
network stride. If output_stride is not None, it specifies the requested
ratio of input to output spatial resolution.
include_root_block: If True, include the initial convolution followed by
max-pooling, if False excludes it. If excluded, `inputs` should be the
results of an activation-less convolution.
spatial_squeeze: if True, logits is of shape [B, C], if false logits is
of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
To use this parameter, the input images must be smaller than 300x300
pixels, in which case the output logit layer does not contain spatial
information and can be removed.
reuse: whether or not the network and its variables should be reused. To be
able to reuse 'scope' must be given.
scope: Optional variable_scope.
Returns:
net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
If global_pool is False, then height_out and width_out are reduced by a
factor of output_stride compared to the respective height_in and width_in,
else both height_out and width_out equal one. If num_classes is None, then
net is the output of the last ResNet block, potentially after global
average pooling. If num_classes is not None, net contains the pre-softmax
activations.
end_points: A dictionary from components of the network to the corresponding
activation.
Raises:
ValueError: If the target output_stride is not valid.
"""
with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
end_points_collection = sc.name + '_end_points'
with slim.arg_scope([slim.conv2d, bottleneck,
resnet_utils.stack_blocks_dense],
outputs_collections=end_points_collection):
with slim.arg_scope([slim.batch_norm], is_training=is_training):
net = inputs
if include_root_block:
if output_stride is not None:
if output_stride % 4 != 0:
raise ValueError('The output_stride needs to be a multiple of 4.')
output_stride /= 4
# We do not include batch normalization or activation functions in
# conv1 because the first ResNet unit will perform these. Cf.
# Appendix of [2].
with slim.arg_scope([slim.conv2d],
activation_fn=None, normalizer_fn=None):
net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
net = resnet_utils.stack_blocks_dense(net, blocks, output_stride)
# This is needed because the pre-activation variant does not have batch
# normalization or activation functions in the residual unit output. See
# Appendix of [2].
net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')
if global_pool:
# Global average pooling.
net = tf.reduce_mean(net, [1, 2], name='pool5', keep_dims=True)
if num_classes is not None:
net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
normalizer_fn=None, scope='logits')
if spatial_squeeze:
net = tf.squeeze(net, [1, 2], name='SpatialSqueeze')
# Convert end_points_collection into a dictionary of end_points.
end_points = slim.utils.convert_collection_to_dict(
end_points_collection)
if num_classes is not None:
end_points['predictions'] = slim.softmax(net, scope='predictions')
return net, end_points
resnet_v2.default_image_size = 224
def resnet_v2_block(scope, base_depth, num_units, stride):
"""Helper function for creating a resnet_v2 bottleneck block.
Args:
scope: The scope of the block.
base_depth: The depth of the bottleneck layer for each unit.
num_units: The number of units in the block.
stride: The stride of the block, implemented as a stride in the last unit.
All other units have stride=1.
Returns:
A resnet_v2 bottleneck block.
"""
return resnet_utils.Block(scope, bottleneck, [{
'depth': base_depth * 4,
'depth_bottleneck': base_depth,
'stride': 1
}] * (num_units - 1) + [{
'depth': base_depth * 4,
'depth_bottleneck': base_depth,
'stride': stride
}])
resnet_v2.default_image_size = 224
def resnet_v2_50(inputs,
num_classes=None,
is_training=True,
global_pool=True,
output_stride=None,
spatial_squeeze=True,
dropout_keep_prob=1.,
reuse=None,
scope='resnet_v2_50'):
"""ResNet-50 model of [1]. See resnet_v2() for arg and return description."""
blocks = [
resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
resnet_v2_block('block2', base_depth=128, num_units=4, stride=2),
resnet_v2_block('block3', base_depth=256, num_units=6, stride=2),
resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
]
return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
global_pool=global_pool, output_stride=output_stride,
include_root_block=True, spatial_squeeze=spatial_squeeze,
dropout_keep_prob=dropout_keep_prob, reuse=reuse, scope=scope)
resnet_v2_50.default_image_size = resnet_v2.default_image_size
def resnet_v2_101(inputs,
num_classes=None,
is_training=True,
global_pool=True,
output_stride=None,
spatial_squeeze=True,
reuse=None,
scope='resnet_v2_101'):
"""ResNet-101 model of [1]. See resnet_v2() for arg and return description."""
blocks = [
resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
resnet_v2_block('block2', base_depth=128, num_units=4, stride=2),
resnet_v2_block('block3', base_depth=256, num_units=23, stride=2),
resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
]
return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
global_pool=global_pool, output_stride=output_stride,
include_root_block=True, spatial_squeeze=spatial_squeeze,
reuse=reuse, scope=scope)
resnet_v2_101.default_image_size = resnet_v2.default_image_size
def resnet_v2_152(inputs,
num_classes=None,
is_training=True,
global_pool=True,
output_stride=None,
spatial_squeeze=True,
dropout_keep_prob=1.,
reuse=None,
scope='resnet_v2_152'):
"""ResNet-152 model of [1]. See resnet_v2() for arg and return description."""
blocks = [
resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
resnet_v2_block('block2', base_depth=128, num_units=8, stride=2),
resnet_v2_block('block3', base_depth=256, num_units=36, stride=2),
resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
]
return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
global_pool=global_pool, output_stride=output_stride,
include_root_block=True, spatial_squeeze=spatial_squeeze,
dropout_keep_prob=dropout_keep_prob, reuse=reuse, scope=scope)
resnet_v2_152.default_image_size = resnet_v2.default_image_size
def resnet_v2_200(inputs,
num_classes=None,
is_training=True,
global_pool=True,
output_stride=None,
spatial_squeeze=True,
dropout_keep_prob=1.,
reuse=None,
scope='resnet_v2_200'):
"""ResNet-200 model of [2]. See resnet_v2() for arg and return description."""
blocks = [
resnet_v2_block('block1', base_depth=64, num_units=3, stride=2),
resnet_v2_block('block2', base_depth=128, num_units=24, stride=2),
resnet_v2_block('block3', base_depth=256, num_units=36, stride=2),
resnet_v2_block('block4', base_depth=512, num_units=3, stride=1),
]
return resnet_v2(inputs, blocks, num_classes, is_training=is_training,
global_pool=global_pool, output_stride=output_stride,
include_root_block=True, spatial_squeeze=spatial_squeeze,
dropout_keep_prob=dropout_keep_prob, reuse=reuse, scope=scope)
resnet_v2_200.default_image_size = resnet_v2.default_image_size
================================================
FILE: nets/resnet_v2_test.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for slim.nets.resnet_v2."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import tensorflow as tf
from nets import resnet_utils
from nets import resnet_v2
slim = tf.contrib.slim
def create_test_input(batch_size, height, width, channels):
"""Create test input tensor.
Args:
batch_size: The number of images per batch or `None` if unknown.
height: The height of each image or `None` if unknown.
width: The width of each image or `None` if unknown.
channels: The number of channels per image or `None` if unknown.
Returns:
Either a placeholder `Tensor` of dimension
[batch_size, height, width, channels] if any of the inputs are `None` or a
constant `Tensor` with the mesh grid values along the spatial dimensions.
"""
if None in [batch_size, height, width, channels]:
return tf.placeholder(tf.float32, (batch_size, height, width, channels))
else:
return tf.to_float(
np.tile(
np.reshape(
np.reshape(np.arange(height), [height, 1]) +
np.reshape(np.arange(width), [1, width]),
[1, height, width, 1]),
[batch_size, 1, 1, channels]))
class ResnetUtilsTest(tf.test.TestCase):
def testSubsampleThreeByThree(self):
x = tf.reshape(tf.to_float(tf.range(9)), [1, 3, 3, 1])
x = resnet_utils.subsample(x, 2)
expected = tf.reshape(tf.constant([0, 2, 6, 8]), [1, 2, 2, 1])
with self.test_session():
self.assertAllClose(x.eval(), expected.eval())
def testSubsampleFourByFour(self):
x = tf.reshape(tf.to_float(tf.range(16)), [1, 4, 4, 1])
x = resnet_utils.subsample(x, 2)
expected = tf.reshape(tf.constant([0, 2, 8, 10]), [1, 2, 2, 1])
with self.test_session():
self.assertAllClose(x.eval(), expected.eval())
def testConv2DSameEven(self):
n, n2 = 4, 2
# Input image.
x = create_test_input(1, n, n, 1)
# Convolution kernel.
w = create_test_input(1, 3, 3, 1)
w = tf.reshape(w, [3, 3, 1, 1])
tf.get_variable('Conv/weights', initializer=w)
tf.get_variable('Conv/biases', initializer=tf.zeros([1]))
tf.get_variable_scope().reuse_variables()
y1 = slim.conv2d(x, 1, [3, 3], stride=1, scope='Conv')
y1_expected = tf.to_float([[14, 28, 43, 26],
[28, 48, 66, 37],
[43, 66, 84, 46],
[26, 37, 46, 22]])
y1_expected = tf.reshape(y1_expected, [1, n, n, 1])
y2 = resnet_utils.subsample(y1, 2)
y2_expected = tf.to_float([[14, 43],
[43, 84]])
y2_expected = tf.reshape(y2_expected, [1, n2, n2, 1])
y3 = resnet_utils.conv2d_same(x, 1, 3, stride=2, scope='Conv')
y3_expected = y2_expected
y4 = slim.conv2d(x, 1, [3, 3], stride=2, scope='Conv')
y4_expected = tf.to_float([[48, 37],
[37, 22]])
y4_expected = tf.reshape(y4_expected, [1, n2, n2, 1])
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
self.assertAllClose(y1.eval(), y1_expected.eval())
self.assertAllClose(y2.eval(), y2_expected.eval())
self.assertAllClose(y3.eval(), y3_expected.eval())
self.assertAllClose(y4.eval(), y4_expected.eval())
def testConv2DSameOdd(self):
n, n2 = 5, 3
# Input image.
x = create_test_input(1, n, n, 1)
# Convolution kernel.
w = create_test_input(1, 3, 3, 1)
w = tf.reshape(w, [3, 3, 1, 1])
tf.get_variable('Conv/weights', initializer=w)
tf.get_variable('Conv/biases', initializer=tf.zeros([1]))
tf.get_variable_scope().reuse_variables()
y1 = slim.conv2d(x, 1, [3, 3], stride=1, scope='Conv')
y1_expected = tf.to_float([[14, 28, 43, 58, 34],
[28, 48, 66, 84, 46],
[43, 66, 84, 102, 55],
[58, 84, 102, 120, 64],
[34, 46, 55, 64, 30]])
y1_expected = tf.reshape(y1_expected, [1, n, n, 1])
y2 = resnet_utils.subsample(y1, 2)
y2_expected = tf.to_float([[14, 43, 34],
[43, 84, 55],
[34, 55, 30]])
y2_expected = tf.reshape(y2_expected, [1, n2, n2, 1])
y3 = resnet_utils.conv2d_same(x, 1, 3, stride=2, scope='Conv')
y3_expected = y2_expected
y4 = slim.conv2d(x, 1, [3, 3], stride=2, scope='Conv')
y4_expected = y2_expected
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
self.assertAllClose(y1.eval(), y1_expected.eval())
self.assertAllClose(y2.eval(), y2_expected.eval())
self.assertAllClose(y3.eval(), y3_expected.eval())
self.assertAllClose(y4.eval(), y4_expected.eval())
def _resnet_plain(self, inputs, blocks, output_stride=None, scope=None):
"""A plain ResNet without extra layers before or after the ResNet blocks."""
with tf.variable_scope(scope, values=[inputs]):
with slim.arg_scope([slim.conv2d], outputs_collections='end_points'):
net = resnet_utils.stack_blocks_dense(inputs, blocks, output_stride)
end_points = slim.utils.convert_collection_to_dict('end_points')
return net, end_points
def testEndPointsV2(self):
"""Test the end points of a tiny v2 bottleneck network."""
blocks = [
resnet_v2.resnet_v2_block(
'block1', base_depth=1, num_units=2, stride=2),
resnet_v2.resnet_v2_block(
'block2', base_depth=2, num_units=2, stride=1),
]
inputs = create_test_input(2, 32, 16, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
_, end_points = self._resnet_plain(inputs, blocks, scope='tiny')
expected = [
'tiny/block1/unit_1/bottleneck_v2/shortcut',
'tiny/block1/unit_1/bottleneck_v2/conv1',
'tiny/block1/unit_1/bottleneck_v2/conv2',
'tiny/block1/unit_1/bottleneck_v2/conv3',
'tiny/block1/unit_2/bottleneck_v2/conv1',
'tiny/block1/unit_2/bottleneck_v2/conv2',
'tiny/block1/unit_2/bottleneck_v2/conv3',
'tiny/block2/unit_1/bottleneck_v2/shortcut',
'tiny/block2/unit_1/bottleneck_v2/conv1',
'tiny/block2/unit_1/bottleneck_v2/conv2',
'tiny/block2/unit_1/bottleneck_v2/conv3',
'tiny/block2/unit_2/bottleneck_v2/conv1',
'tiny/block2/unit_2/bottleneck_v2/conv2',
'tiny/block2/unit_2/bottleneck_v2/conv3']
self.assertItemsEqual(expected, end_points)
def _stack_blocks_nondense(self, net, blocks):
"""A simplified ResNet Block stacker without output stride control."""
for block in blocks:
with tf.variable_scope(block.scope, 'block', [net]):
for i, unit in enumerate(block.args):
with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
net = block.unit_fn(net, rate=1, **unit)
return net
def testAtrousValuesBottleneck(self):
"""Verify the values of dense feature extraction by atrous convolution.
Make sure that dense feature extraction by stack_blocks_dense() followed by
subsampling gives identical results to feature extraction at the nominal
network output stride using the simple self._stack_blocks_nondense() above.
"""
block = resnet_v2.resnet_v2_block
blocks = [
block('block1', base_depth=1, num_units=2, stride=2),
block('block2', base_depth=2, num_units=2, stride=2),
block('block3', base_depth=4, num_units=2, stride=2),
block('block4', base_depth=8, num_units=2, stride=1),
]
nominal_stride = 8
# Test both odd and even input dimensions.
height = 30
width = 31
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
with slim.arg_scope([slim.batch_norm], is_training=False):
for output_stride in [1, 2, 4, 8, None]:
with tf.Graph().as_default():
with self.test_session() as sess:
tf.set_random_seed(0)
inputs = create_test_input(1, height, width, 3)
# Dense feature extraction followed by subsampling.
output = resnet_utils.stack_blocks_dense(inputs,
blocks,
output_stride)
if output_stride is None:
factor = 1
else:
factor = nominal_stride // output_stride
output = resnet_utils.subsample(output, factor)
# Make the two networks use the same weights.
tf.get_variable_scope().reuse_variables()
# Feature extraction at the nominal network rate.
expected = self._stack_blocks_nondense(inputs, blocks)
sess.run(tf.global_variables_initializer())
output, expected = sess.run([output, expected])
self.assertAllClose(output, expected, atol=1e-4, rtol=1e-4)
class ResnetCompleteNetworkTest(tf.test.TestCase):
"""Tests with complete small ResNet v2 networks."""
def _resnet_small(self,
inputs,
num_classes=None,
is_training=True,
global_pool=True,
output_stride=None,
include_root_block=True,
spatial_squeeze=True,
reuse=None,
scope='resnet_v2_small'):
"""A shallow and thin ResNet v2 for faster tests."""
block = resnet_v2.resnet_v2_block
blocks = [
block('block1', base_depth=1, num_units=3, stride=2),
block('block2', base_depth=2, num_units=3, stride=2),
block('block3', base_depth=4, num_units=3, stride=2),
block('block4', base_depth=8, num_units=2, stride=1),
]
return resnet_v2.resnet_v2(inputs, blocks, num_classes,
is_training=is_training,
global_pool=global_pool,
output_stride=output_stride,
include_root_block=include_root_block,
spatial_squeeze=spatial_squeeze,
reuse=reuse,
scope=scope)
def testClassificationEndPoints(self):
global_pool = True
num_classes = 10
inputs = create_test_input(2, 224, 224, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
logits, end_points = self._resnet_small(inputs, num_classes,
global_pool=global_pool,
spatial_squeeze=False,
scope='resnet')
self.assertTrue(logits.op.name.startswith('resnet/logits'))
self.assertListEqual(logits.get_shape().as_list(), [2, 1, 1, num_classes])
self.assertTrue('predictions' in end_points)
self.assertListEqual(end_points['predictions'].get_shape().as_list(),
[2, 1, 1, num_classes])
def testClassificationShapes(self):
global_pool = True
num_classes = 10
inputs = create_test_input(2, 224, 224, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
_, end_points = self._resnet_small(inputs, num_classes,
global_pool=global_pool,
scope='resnet')
endpoint_to_shape = {
'resnet/block1': [2, 28, 28, 4],
'resnet/block2': [2, 14, 14, 8],
'resnet/block3': [2, 7, 7, 16],
'resnet/block4': [2, 7, 7, 32]}
for endpoint in endpoint_to_shape:
shape = endpoint_to_shape[endpoint]
self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
def testFullyConvolutionalEndpointShapes(self):
global_pool = False
num_classes = 10
inputs = create_test_input(2, 321, 321, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
_, end_points = self._resnet_small(inputs, num_classes,
global_pool=global_pool,
spatial_squeeze=False,
scope='resnet')
endpoint_to_shape = {
'resnet/block1': [2, 41, 41, 4],
'resnet/block2': [2, 21, 21, 8],
'resnet/block3': [2, 11, 11, 16],
'resnet/block4': [2, 11, 11, 32]}
for endpoint in endpoint_to_shape:
shape = endpoint_to_shape[endpoint]
self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
def testRootlessFullyConvolutionalEndpointShapes(self):
global_pool = False
num_classes = 10
inputs = create_test_input(2, 128, 128, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
_, end_points = self._resnet_small(inputs, num_classes,
global_pool=global_pool,
include_root_block=False,
spatial_squeeze=False,
scope='resnet')
endpoint_to_shape = {
'resnet/block1': [2, 64, 64, 4],
'resnet/block2': [2, 32, 32, 8],
'resnet/block3': [2, 16, 16, 16],
'resnet/block4': [2, 16, 16, 32]}
for endpoint in endpoint_to_shape:
shape = endpoint_to_shape[endpoint]
self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
def testAtrousFullyConvolutionalEndpointShapes(self):
global_pool = False
num_classes = 10
output_stride = 8
inputs = create_test_input(2, 321, 321, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
_, end_points = self._resnet_small(inputs,
num_classes,
global_pool=global_pool,
output_stride=output_stride,
spatial_squeeze=False,
scope='resnet')
endpoint_to_shape = {
'resnet/block1': [2, 41, 41, 4],
'resnet/block2': [2, 41, 41, 8],
'resnet/block3': [2, 41, 41, 16],
'resnet/block4': [2, 41, 41, 32]}
for endpoint in endpoint_to_shape:
shape = endpoint_to_shape[endpoint]
self.assertListEqual(end_points[endpoint].get_shape().as_list(), shape)
def testAtrousFullyConvolutionalValues(self):
"""Verify dense feature extraction with atrous convolution."""
nominal_stride = 32
for output_stride in [4, 8, 16, 32, None]:
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
with tf.Graph().as_default():
with self.test_session() as sess:
tf.set_random_seed(0)
inputs = create_test_input(2, 81, 81, 3)
# Dense feature extraction followed by subsampling.
output, _ = self._resnet_small(inputs, None,
is_training=False,
global_pool=False,
output_stride=output_stride)
if output_stride is None:
factor = 1
else:
factor = nominal_stride // output_stride
output = resnet_utils.subsample(output, factor)
# Make the two networks use the same weights.
tf.get_variable_scope().reuse_variables()
# Feature extraction at the nominal network rate.
expected, _ = self._resnet_small(inputs, None,
is_training=False,
global_pool=False)
sess.run(tf.global_variables_initializer())
self.assertAllClose(output.eval(), expected.eval(),
atol=1e-4, rtol=1e-4)
def testUnknownBatchSize(self):
batch = 2
height, width = 65, 65
global_pool = True
num_classes = 10
inputs = create_test_input(None, height, width, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
logits, _ = self._resnet_small(inputs, num_classes,
global_pool=global_pool,
spatial_squeeze=False,
scope='resnet')
self.assertTrue(logits.op.name.startswith('resnet/logits'))
self.assertListEqual(logits.get_shape().as_list(),
[None, 1, 1, num_classes])
images = create_test_input(batch, height, width, 3)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(logits, {inputs: images.eval()})
self.assertEqual(output.shape, (batch, 1, 1, num_classes))
def testFullyConvolutionalUnknownHeightWidth(self):
batch = 2
height, width = 65, 65
global_pool = False
inputs = create_test_input(batch, None, None, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
output, _ = self._resnet_small(inputs, None,
global_pool=global_pool)
self.assertListEqual(output.get_shape().as_list(),
[batch, None, None, 32])
images = create_test_input(batch, height, width, 3)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(output, {inputs: images.eval()})
self.assertEqual(output.shape, (batch, 3, 3, 32))
def testAtrousFullyConvolutionalUnknownHeightWidth(self):
batch = 2
height, width = 65, 65
global_pool = False
output_stride = 8
inputs = create_test_input(batch, None, None, 3)
with slim.arg_scope(resnet_utils.resnet_arg_scope()):
output, _ = self._resnet_small(inputs,
None,
global_pool=global_pool,
output_stride=output_stride)
self.assertListEqual(output.get_shape().as_list(),
[batch, None, None, 32])
images = create_test_input(batch, height, width, 3)
with self.test_session() as sess:
sess.run(tf.global_variables_initializer())
output = sess.run(output, {inputs: images.eval()})
self.assertEqual(output.shape, (batch, 9, 9, 32))
if __name__ == '__main__':
tf.test.main()
================================================
FILE: preprocessing/__init__.py
================================================
================================================
FILE: preprocessing/decode_example.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
def decode_serialized_example(serialized_example, features_to_fetch, decode_image=True):
"""
Args:
serialized_example : A tfrecord example
features_to_fetch : a list of tuples (feature key, name for feature)
Returns:
dictionary : maps name to parsed example
"""
feature_map = {}
for feature_key, feature_name in features_to_fetch:
feature_map[feature_key] = {
'image/height': tf.FixedLenFeature([], tf.int64),
'image/width': tf.FixedLenFeature([], tf.int64),
'image/colorspace': tf.FixedLenFeature([], tf.string),
'image/channels': tf.FixedLenFeature([], tf.int64),
'image/format': tf.FixedLenFeature([], tf.string),
'image/filename': tf.FixedLenFeature([], tf.string),
'image/id': tf.FixedLenFeature([], tf.string),
'image/encoded': tf.FixedLenFeature([], tf.string),
'image/extra': tf.FixedLenFeature([], tf.string),
'image/class/label': tf.FixedLenFeature([], tf.int64),
'image/class/text': tf.FixedLenFeature([], tf.string),
'image/class/conf': tf.FixedLenFeature([], tf.float32),
'image/object/bbox/xmin': tf.VarLenFeature(dtype=tf.float32),
'image/object/bbox/xmax': tf.VarLenFeature(dtype=tf.float32),
'image/object/bbox/ymin': tf.VarLenFeature(dtype=tf.float32),
'image/object/bbox/ymax': tf.VarLenFeature(dtype=tf.float32),
'image/object/bbox/label': tf.VarLenFeature(dtype=tf.int64),
'image/object/bbox/text': tf.VarLenFeature(dtype=tf.string),
'image/object/bbox/conf': tf.VarLenFeature(dtype=tf.float32),
'image/object/bbox/score' : tf.VarLenFeature(dtype=tf.float32),
'image/object/parts/x' : tf.VarLenFeature(dtype=tf.float32),
'image/object/parts/y' : tf.VarLenFeature(dtype=tf.float32),
'image/object/parts/v' : tf.VarLenFeature(dtype=tf.int64),
'image/object/parts/score' : tf.VarLenFeature(dtype=tf.float32),
'image/object/count' : tf.FixedLenFeature([], tf.int64),
'image/object/area' : tf.VarLenFeature(dtype=tf.float32),
'image/object/id' : tf.VarLenFeature(dtype=tf.string)
}[feature_key]
features = tf.parse_single_example(
serialized_example,
features = feature_map
)
# return a dictionary of the features
parsed_features = {}
for feature_key, feature_name in features_to_fetch:
if feature_key == 'image/height':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/width':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/colorspace':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/channels':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/format':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/filename':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/id':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/encoded':
if decode_image:
parsed_features[feature_name] = tf.image.decode_jpeg(features[feature_key], channels=3)
else:
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/extra':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/class/label':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/class/text':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/class/conf':
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/object/bbox/xmin':
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/bbox/xmax':
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/bbox/ymin':
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/bbox/ymax':
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/bbox/label':
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/bbox/text':
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/bbox/conf':
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/bbox/score' :
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/parts/x' :
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/parts/y' :
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/parts/v' :
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/parts/score' :
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/count' :
parsed_features[feature_name] = features[feature_key]
elif feature_key == 'image/object/area' :
parsed_features[feature_name] = features[feature_key].values
elif feature_key == 'image/object/id' :
parsed_features[feature_name] = features[feature_key].values
return parsed_features
================================================
FILE: preprocessing/inputs.py
================================================
# Some of this code came from the https://github.com/tensorflow/models/tree/master/slim
# directory, so lets keep the Google license around for now.
#
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Provides utilities to preprocess images for the Inception networks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from easydict import EasyDict
import tensorflow as tf
from tensorflow.python.ops import control_flow_ops
from preprocessing.decode_example import decode_serialized_example
def apply_with_random_selector(x, func, num_cases):
"""Computes func(x, sel), with sel sampled from [0...num_cases-1].
Args:
x: input Tensor.
func: Python function to apply.
num_cases: Python int32, number of cases to sample sel from.
Returns:
The result of func(x, sel), where func receives the value of the
selector as a python integer, but sel is sampled dynamically.
"""
sel = tf.random_uniform([], maxval=num_cases, dtype=tf.int32)
# Pass the real x only to one of the func calls.
return control_flow_ops.merge([
func(control_flow_ops.switch(x, tf.equal(sel, case))[1], case)
for case in range(num_cases)])[0]
def distort_color(image, color_ordering=0, fast_mode=True, scope=None):
"""Distort the color of a Tensor image.
Each color distortion is non-commutative and thus ordering of the color ops
matters. Ideally we would randomly permute the ordering of the color ops.
Rather then adding that level of complication, we select a distinct ordering
of color ops for each preprocessing thread.
Args:
image: 3-D Tensor containing single image in [0, 1].
color_ordering: Python int, a type of distortion (valid values: 0-3).
fast_mode: Avoids slower ops (random_hue and random_contrast)
scope: Optional scope for name_scope.
Returns:
3-D Tensor color-distorted image on range [0, 1]
Raises:
ValueError: if color_ordering not in [0, 3]
"""
with tf.name_scope(scope, 'distort_color', [image]):
if fast_mode:
if color_ordering == 0:
image = tf.image.random_brightness(image, max_delta=32. / 255.)
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
else:
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
image = tf.image.random_brightness(image, max_delta=32. / 255.)
else:
if color_ordering == 0:
image = tf.image.random_brightness(image, max_delta=32. / 255.)
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
image = tf.image.random_hue(image, max_delta=0.2)
image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
elif color_ordering == 1:
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
image = tf.image.random_brightness(image, max_delta=32. / 255.)
image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
image = tf.image.random_hue(image, max_delta=0.2)
elif color_ordering == 2:
image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
image = tf.image.random_hue(image, max_delta=0.2)
image = tf.image.random_brightness(image, max_delta=32. / 255.)
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
elif color_ordering == 3:
image = tf.image.random_hue(image, max_delta=0.2)
image = tf.image.random_saturation(image, lower=0.5, upper=1.5)
image = tf.image.random_contrast(image, lower=0.5, upper=1.5)
image = tf.image.random_brightness(image, max_delta=32. / 255.)
else:
raise ValueError('color_ordering must be in [0, 3]')
# The random_* ops do not necessarily clamp.
return tf.clip_by_value(image, 0.0, 1.0)
def distorted_bounding_box_crop(image,
bbox,
min_object_covered=0.1,
aspect_ratio_range=(0.75, 1.33),
area_range=(0.05, 1.0),
max_attempts=100,
scope=None):
"""Generates cropped_image using a one of the bboxes randomly distorted.
See `tf.image.sample_distorted_bounding_box` for more documentation.
Args:
image: 3-D Tensor of image (it will be converted to floats in [0, 1]).
bbox: 3-D float Tensor of bounding boxes arranged [1, num_boxes, coords]
where each coordinate is [0, 1) and the coordinates are arranged
as [ymin, xmin, ymax, xmax]. If num_boxes is 0 then it would use the whole
image.
min_object_covered: An optional `float`. Defaults to `0.1`. The cropped
area of the image must contain at least this fraction of any bounding box
supplied.
aspect_ratio_range: An optional list of `floats`. The cropped area of the
image must have an aspect ratio = width / height within this range.
area_range: An optional list of `floats`. The cropped area of the image
must contain a fraction of the supplied image within in this range.
max_attempts: An optional `int`. Number of attempts at generating a cropped
region of the image of the specified constraints. After `max_attempts`
failures, return the entire image.
scope: Optional scope for name_scope.
Returns:
A tuple, a 3-D Tensor cropped_image and the distorted bbox
"""
with tf.name_scope(scope, 'distorted_bounding_box_crop', [image, bbox]):
# Each bounding box has shape [1, num_boxes, box coords] and
# the coordinates are ordered [ymin, xmin, ymax, xmax].
# A large fraction of image datasets contain a human-annotated bounding
# box delineating the region of the image containing the object of interest.
# We choose to create a new bounding box for the object which is a randomly
# distorted version of the human-annotated bounding box that obeys an
# allowed range of aspect ratios, sizes and overlap with the human-annotated
# bounding box. If no box is supplied, then we assume the bounding box is
# the entire image.
sample_distorted_bounding_box = tf.image.sample_distorted_bounding_box(
tf.shape(image),
bounding_boxes=bbox,
min_object_covered=min_object_covered,
aspect_ratio_range=aspect_ratio_range,
area_range=area_range,
max_attempts=max_attempts,
use_image_if_no_bounding_boxes=True)
bbox_begin, bbox_size, distort_bbox = sample_distorted_bounding_box
# Crop the image to the specified bounding box.
cropped_image = tf.slice(image, bbox_begin, bbox_size)
return tf.tuple([cropped_image, distort_bbox])
def _largest_size_at_most(height, width, largest_side):
"""Computes new shape with the largest side equal to `largest_side`.
Computes new shape with the largest side equal to `largest_side` while
preserving the original aspect ratio.
Args:
height: an int32 scalar tensor indicating the current height.
width: an int32 scalar tensor indicating the current width.
largest_side: A python integer or scalar `Tensor` indicating the size of
the largest side after resize.
Returns:
new_height: an int32 scalar tensor indicating the new height.
new_width: and int32 scalar tensor indicating the new width.
"""
largest_side = tf.convert_to_tensor(largest_side, dtype=tf.int32)
height = tf.to_float(height)
width = tf.to_float(width)
largest_side = tf.to_float(largest_side)
scale = tf.cond(tf.greater(height, width),
lambda: largest_side / height,
lambda: largest_side / width)
new_height = tf.to_int32(height * scale)
new_width = tf.to_int32(width * scale)
return new_height, new_width
class DistortedInputs():
def __init__(self, cfg, add_summaries):
self.cfg = cfg
self.add_summaries = add_summaries
def apply(self, original_image, bboxes, distorted_inputs, image_summaries, current_index):
cfg = self.cfg
add_summaries = self.add_summaries
image_shape = tf.shape(original_image)
image_height = tf.cast(image_shape[0], dtype=tf.float32) # cast so that we can multiply them by the bbox coords
image_width = tf.cast(image_shape[1], dtype=tf.float32)
# First thing we need to do is crop out the bbox region from the image
bbox = bboxes[current_index]
xmin = tf.cast(bbox[0] * image_width, tf.int32)
ymin = tf.cast(bbox[1] * image_height, tf.int32)
xmax = tf.cast(bbox[2] * image_width, tf.int32)
ymax = tf.cast(bbox[3] * image_height, tf.int32)
bbox_width = xmax - xmin
bbox_height = ymax - ymin
image = tf.image.crop_to_bounding_box(
image=original_image,
offset_height=ymin,
offset_width=xmin,
target_height=bbox_height,
target_width=bbox_width
)
image_height = bbox_height
image_width = bbox_width
# Convert the pixel values to be in the range [0,1]
if image.dtype != tf.float32:
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
# Add a summary of the original data
if add_summaries:
new_height, new_width = _largest_size_at_most(image_height, image_width, cfg.INPUT_SIZE)
resized_original_image = tf.image.resize_bilinear(tf.expand_dims(image, 0), [new_height, new_width])
resized_original_image = tf.squeeze(resized_original_image)
resized_original_image = tf.image.pad_to_bounding_box(resized_original_image, 0, 0, cfg.INPUT_SIZE, cfg.INPUT_SIZE)
# If there are multiple boxes for an image, we only want to write to the TensorArray once.
#image_summaries = image_summaries.write(0, tf.expand_dims(resized_original_image, 0))
image_summaries = tf.cond(tf.equal(current_index, 0),
lambda: image_summaries.write(0, tf.expand_dims(resized_original_image, 0)),
lambda: image_summaries.identity()
)
# Extract a distorted bbox
if cfg.DO_RANDOM_CROP > 0:
r = tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32)
do_crop = tf.less(r, cfg.DO_RANDOM_CROP)
rc_cfg = cfg.RANDOM_CROP_CFG
bbox = tf.constant([0.0, 0.0, 1.0, 1.0], dtype=tf.float32, shape=[1, 1, 4])
distorted_image, distorted_bbox = tf.cond(do_crop,
lambda: distorted_bounding_box_crop(image, bbox,
aspect_ratio_range=(rc_cfg.MIN_ASPECT_RATIO, rc_cfg.MAX_ASPECT_RATIO),
area_range=(rc_cfg.MIN_AREA, rc_cfg.MAX_AREA),
max_attempts=rc_cfg.MAX_ATTEMPTS),
lambda: tf.tuple([image, bbox])
)
else:
distorted_image = tf.identity(image)
distorted_bbox = tf.constant([[[0.0, 0.0, 1.0, 1.0]]]) # ymin, xmin, ymax, xmax
if cfg.DO_CENTRAL_CROP > 0:
r = tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32)
do_crop = tf.less(r, cfg.DO_CENTRAL_CROP)
distorted_image = tf.cond(do_crop,
lambda: tf.image.central_crop(distorted_image, cfg.CENTRAL_CROP_FRACTION),
lambda: tf.identity(distorted_image)
)
distorted_image.set_shape([None, None, 3])
# Add a summary
if add_summaries:
image_with_bbox = tf.image.draw_bounding_boxes(tf.expand_dims(image, 0), distorted_bbox)
new_height, new_width = _largest_size_at_most(image_height, image_width, cfg.INPUT_SIZE)
resized_image_with_bbox = tf.image.resize_bilinear(image_with_bbox, [new_height, new_width])
resized_image_with_bbox = tf.squeeze(resized_image_with_bbox)
resized_image_with_bbox = tf.image.pad_to_bounding_box(resized_image_with_bbox, 0, 0, cfg.INPUT_SIZE, cfg.INPUT_SIZE)
#image_summaries = image_summaries.write(1, tf.expand_dims(resized_image_with_bbox, 0))
image_summaries = tf.cond(tf.equal(current_index, 0),
lambda: image_summaries.write(1, tf.expand_dims(resized_image_with_bbox, 0)),
lambda: image_summaries.identity()
)
# Resize the distorted image to the correct dimensions for the network
if cfg.MAINTAIN_ASPECT_RATIO:
shape = tf.shape(distorted_image)
height = shape[0]
width = shape[1]
new_height, new_width = _largest_size_at_most(height, width, cfg.INPUT_SIZE)
else:
new_height = cfg.INPUT_SIZE
new_width = cfg.INPUT_SIZE
num_resize_cases = 1 if cfg.RESIZE_FAST else 4
distorted_image = apply_with_random_selector(
distorted_image,
lambda x, method: tf.image.resize_images(x, [new_height, new_width], method=method),
num_cases=num_resize_cases)
distorted_image = tf.image.pad_to_bounding_box(distorted_image, 0, 0, cfg.INPUT_SIZE, cfg.INPUT_SIZE)
if add_summaries:
#image_summaries = image_summaries.write(2, tf.expand_dims(distorted_image, 0))
image_summaries = tf.cond(tf.equal(current_index, 0),
lambda: image_summaries.write(2, tf.expand_dims(distorted_image, 0)),
lambda: image_summaries.identity()
)
# Randomly flip the image:
if cfg.DO_RANDOM_FLIP_LEFT_RIGHT > 0:
r = tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32)
do_flip = tf.less(r, 0.5)
distorted_image = tf.cond(do_flip, lambda: tf.image.flip_left_right(distorted_image), lambda: tf.identity(distorted_image))
# TODO: Can this be changed so that we don't always distort the colors?
# Distort the colors
if cfg.DO_COLOR_DISTORTION > 0:
r = tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32)
do_color_distortion = tf.less(r, cfg.DO_COLOR_DISTORTION)
num_color_cases = 1 if cfg.COLOR_DISTORT_FAST else 4
distorted_color_image = apply_with_random_selector(
distorted_image,
lambda x, ordering: distort_color(x, ordering, fast_mode=cfg.COLOR_DISTORT_FAST),
num_cases=num_color_cases)
distorted_image = tf.cond(do_color_distortion, lambda: tf.identity(distorted_color_image), lambda: tf.identity(distorted_image))
distorted_image.set_shape([cfg.INPUT_SIZE, cfg.INPUT_SIZE, 3])
# Add a summary
if add_summaries:
#image_summaries = image_summaries.write(3, tf.expand_dims(distorted_image, 0))
image_summaries = tf.cond(tf.equal(current_index, 0),
lambda: image_summaries.write(3, tf.expand_dims(distorted_image, 0)),
lambda: image_summaries.identity()
)
# Add the distorted image to the TensorArray
distorted_inputs = distorted_inputs.write(current_index, tf.expand_dims(distorted_image, 0))
return [original_image, bboxes, distorted_inputs, image_summaries, current_index + 1]
def check_normalized_box_values(xmin, ymin, xmax, ymax, maximum_normalized_coordinate=1.01, prefix=""):
""" Make sure the normalized coordinates are less than 1
"""
xmin_maximum = tf.reduce_max(xmin)
xmin_assert = tf.Assert(
tf.greater_equal(1.01, xmin_maximum),
['%s, maximum xmin coordinate value is larger '
'than %f: ' % (prefix, maximum_normalized_coordinate), xmin_maximum])
with tf.control_dependencies([xmin_assert]):
xmin = tf.identity(xmin)
ymin_maximum = tf.reduce_max(ymin)
ymin_assert = tf.Assert(
tf.greater_equal(1.01, ymin_maximum),
['%s, maximum ymin coordinate value is larger '
'than %f: ' % (prefix, maximum_normalized_coordinate), ymin_maximum])
with tf.control_dependencies([ymin_assert]):
ymin = tf.identity(ymin)
xmax_maximum = tf.reduce_max(xmax)
xmax_assert = tf.Assert(
tf.greater_equal(1.01, xmax_maximum),
['%s, maximum xmax coordinate value is larger '
'than %f: ' % (prefix, maximum_normalized_coordinate), xmax_maximum])
with tf.control_dependencies([xmax_assert]):
xmax = tf.identity(xmax)
ymax_maximum = tf.reduce_max(ymax)
ymax_assert = tf.Assert(
tf.greater_equal(1.01, ymax_maximum),
['%s, maximum ymax coordinate value is larger '
'than %f: ' % (prefix, maximum_normalized_coordinate), ymax_maximum])
with tf.control_dependencies([ymax_assert]):
ymax = tf.identity(ymax)
return xmin, ymin, xmax, ymax
def expand_bboxes(xmin, xmax, ymin, ymax, cfg):
"""
Expand the bboxes.
"""
w = xmax - xmin
h = ymax - ymin
w = w * cfg.WIDTH_EXPANSION_FACTOR
h = h * cfg.HEIGHT_EXPANSION_FACTOR
half_w = w / 2.
half_h = h / 2.
xmin = tf.clip_by_value(xmin - half_w, 0, 1)
xmax = tf.clip_by_value(xmax + half_w, 0, 1)
ymin = tf.clip_by_value(ymin - half_h, 0, 1)
ymax = tf.clip_by_value(ymax + half_h, 0, 1)
return tf.tuple([xmin, xmax, ymin, ymax])
def get_region_data(serialized_example, cfg, fetch_ids=True, fetch_labels=True, fetch_text_labels=True, read_filename=False):
"""
Return the image, an array of bounding boxes, and an array of ids.
"""
feature_dict = {}
if cfg.REGION_TYPE == 'bbox':
bbox_cfg = cfg.BBOX_CFG
features_to_extract = [('image/object/bbox/xmin', 'xmin'),
('image/object/bbox/xmax', 'xmax'),
('image/object/bbox/ymin', 'ymin'),
('image/object/bbox/ymax', 'ymax'),
('image/object/bbox/ymax', 'ymax')]
if read_filename:
features_to_extract.append(('image/filename', 'filename'))
else:
features_to_extract.append(('image/encoded', 'image'))
if fetch_ids:
features_to_extract.append(('image/object/id', 'id'))
if fetch_labels:
features_to_extract.append(('image/object/bbox/label', 'label'))
if fetch_text_labels:
features_to_extract.append(('image/object/bbox/text', 'text'))
features = decode_serialized_example(serialized_example, features_to_extract)
if read_filename:
image_buffer = tf.read_file(features['filename'])
image = tf.image.decode_jpeg(image_buffer, channels=3)
else:
image = features['image']
feature_dict['image'] = image
xmin = tf.expand_dims(features['xmin'], 0)
ymin = tf.expand_dims(features['ymin'], 0)
xmax = tf.expand_dims(features['xmax'], 0)
ymax = tf.expand_dims(features['ymax'], 0)
xmin, ymin, xmax, ymax = check_normalized_box_values(xmin, ymin, xmax, ymax, prefix="From tfrecords ")
if 'DO_EXPANSION' in bbox_cfg and bbox_cfg.DO_EXPANSION > 0:
r = tf.random_uniform([], minval=0, maxval=1, dtype=tf.float32)
do_expansion = tf.less(r, bbox_cfg.DO_EXPANSION)
xmin, xmax, ymin, ymax = tf.cond(do_expansion,
lambda: expand_bboxes(xmin, xmax, ymin, ymax, bbox_cfg.EXPANSION_CFG),
lambda: tf.tuple([xmin, xmax, ymin, ymax])
)
xmin, ymin, xmax, ymax = check_normalized_box_values(xmin, ymin, xmax, ymax, prefix="After expansion ")
# combine the bounding boxes
bboxes = tf.concat(values=[xmin, ymin, xmax, ymax], axis=0)
# order the bboxes so that they have the shape: [num_bboxes, bbox_coords]
bboxes = tf.transpose(bboxes, [1, 0])
feature_dict['bboxes'] = bboxes
if fetch_ids:
ids = features['id']
feature_dict['ids'] = ids
if fetch_labels:
labels = features['label']
feature_dict['labels'] = labels
if fetch_text_labels:
text = features['text']
feature_dict['text'] = text
elif cfg.REGION_TYPE == 'image':
features_to_extract = []
if read_filename:
features_to_extract.append(('image/filename', 'filename'))
else:
features_to_extract.append(('image/encoded', 'image'))
if fetch_ids:
features_to_extract.append(('image/id', 'id'))
if fetch_labels:
features_to_extract.append(('image/class/label', 'label'))
if fetch_text_labels:
features_to_extract.append(('image/class/text', 'text'))
features = decode_serialized_example(serialized_example, features_to_extract)
if read_filename:
image_buffer = tf.read_file(features['filename'])
image = tf.image.decode_jpeg(image_buffer, channels=3)
else:
image = features['image']
feature_dict['image'] = image
bboxes = tf.constant([[0.0, 0.0, 1.0, 1.0]])
feature_dict['bboxes'] = bboxes
if fetch_ids:
ids = [features['id']]
feature_dict['ids'] = ids
if fetch_labels:
labels = [features['label']]
feature_dict['labels'] = labels
if fetch_text_labels:
text = [features['text']]
feature_dict['text'] = text
else:
raise ValueError("Unknown REGION_TYPE: %s" % (cfg.REGION_TYPE,))
return feature_dict
def bbox_crop_loop_cond(original_image, bboxes, distorted_inputs, image_summaries, current_index):
num_bboxes = tf.shape(bboxes)[0]
return current_index < num_bboxes
def get_distorted_inputs(original_image, bboxes, cfg, add_summaries):
distorter = DistortedInputs(cfg, add_summaries)
num_bboxes = tf.shape(bboxes)[0]
distorted_inputs = tf.TensorArray(
dtype=tf.float32,
size=num_bboxes,
element_shape=tf.TensorShape([1, cfg.INPUT_SIZE, cfg.INPUT_SIZE, 3])
)
if add_summaries:
image_summaries = tf.TensorArray(
dtype=tf.float32,
size=4,
element_shape=tf.TensorShape([1, cfg.INPUT_SIZE, cfg.INPUT_SIZE, 3])
)
else:
image_summaries = tf.constant([])
current_index = tf.constant(0, dtype=tf.int32)
loop_vars = [original_image, bboxes, distorted_inputs, image_summaries, current_index]
original_image, bboxes, distorted_inputs, image_summaries, current_index = tf.while_loop(
cond=bbox_crop_loop_cond,
body=distorter.apply,
loop_vars=loop_vars,
parallel_iterations=10, back_prop=False, swap_memory=False
)
distorted_inputs = distorted_inputs.concat()
if add_summaries:
tf.summary.image('0.original_image', image_summaries.read(0))
tf.summary.image('1.image_with_random_crop', image_summaries.read(1))
tf.summary.image('2.cropped_resized_image', image_summaries.read(2))
tf.summary.image('3.final_distorted_image', image_summaries.read(3))
return distorted_inputs
def create_training_batch(serialized_example, cfg, add_summaries, read_filenames=False):
features = get_region_data(serialized_example, cfg, fetch_ids=False,
fetch_labels=True, fetch_text_labels=False, read_filename=read_filenames)
original_image = features['image']
bboxes = features['bboxes']
labels = features['labels']
distorted_inputs = get_distorted_inputs(original_image, bboxes, cfg, add_summaries)
distorted_inputs = tf.subtract(distorted_inputs, 0.5)
distorted_inputs = tf.multiply(distorted_inputs, 2.0)
names = ('inputs', 'labels')
tensors = [distorted_inputs, labels]
return [names, tensors]
def create_visualization_batch(serialized_example, cfg, add_summaries, fetch_text_labels=False, read_filenames=False):
features = get_region_data(serialized_example, cfg, fetch_ids=True,
fetch_labels=True, fetch_text_labels=fetch_text_labels, read_filename=read_filenames)
original_image = features['image']
ids = features['ids']
bboxes = features['bboxes']
labels = features['labels']
if fetch_text_labels:
text_labels = features['text']
cpy_original_image = tf.identity(original_image)
distorted_inputs = get_distorted_inputs(original_image, bboxes, cfg, add_summaries)
original_image = cpy_original_image
# Resize the original image
if original_image.dtype != tf.float32:
original_image = tf.image.convert_image_dtype(original_image, dtype=tf.float32)
shape = tf.shape(original_image)
height = shape[0]
width = shape[1]
new_height, new_width = _largest_size_at_most(height, width, cfg.INPUT_SIZE)
original_image = tf.image.resize_images(original_image, [new_height, new_width], method=0)
original_image = tf.image.pad_to_bounding_box(original_image, 0, 0, cfg.INPUT_SIZE, cfg.INPUT_SIZE)
original_image = tf.image.convert_image_dtype(original_image, dtype=tf.uint8)
# make a copy of the original image for each bounding box
num_bboxes = tf.shape(bboxes)[0]
expanded_original_image = tf.expand_dims(original_image, 0)
concatenated_original_images = tf.tile(expanded_original_image, [num_bboxes, 1, 1, 1])
names = ['original_inputs', 'inputs', 'ids', 'labels']
tensors = [concatenated_original_images, distorted_inputs, ids, labels]
if fetch_text_labels:
names.append('text_labels')
tensors.append(text_labels)
return [names, tensors]
def create_classification_batch(serialized_example, cfg, add_summaries, read_filenames=False):
features = get_region_data(serialized_example, cfg, fetch_ids=True,
fetch_labels=False, fetch_text_labels=False, read_filename=read_filenames)
original_image = features['image']
bboxes = features['bboxes']
ids = features['ids']
distorted_inputs = get_distorted_inputs(original_image, bboxes, cfg, add_summaries)
distorted_inputs = tf.subtract(distorted_inputs, 0.5)
distorted_inputs = tf.multiply(distorted_inputs, 2.0)
names = ('inputs', 'ids')
tensors = [distorted_inputs, ids]
return [names, tensors]
def input_nodes(tfrecords, cfg, num_epochs=None, batch_size=32, num_threads=2,
shuffle_batch = True, random_seed=1, capacity = 1000, min_after_dequeue = 96,
add_summaries=True, input_type='train', fetch_text_labels=False,
read_filenames=False):
"""
Args:
tfrecords:
cfg:
num_epochs: number of times to read the tfrecords
batch_size:
num_threads:
shuffle_batch:
capacity:
min_after_dequeue:
add_summaries: Add tensorboard summaries of the images
input_type: 'train', 'visualize', 'test', 'classification'
"""
with tf.name_scope('inputs'):
# A producer to generate tfrecord file paths
filename_queue = tf.train.string_input_producer(
tfrecords,
num_epochs=num_epochs
)
# Construct a Reader to read examples from the tfrecords file
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
if input_type=='train' or input_type=='test':
batch_keys, data_to_batch = create_training_batch(serialized_example, cfg, add_summaries, read_filenames)
elif input_type=='visualize':
batch_keys, data_to_batch = create_visualization_batch(serialized_example, cfg, add_summaries, fetch_text_labels, read_filenames)
elif input_type=='classification':
batch_keys, data_to_batch = create_classification_batch(serialized_example, cfg, add_summaries, read_filenames)
else:
raise ValueError("Unknown input type: %s. Options are `train`, `test`, " \
"`visualize`, and `classification`." % (input_type,))
if shuffle_batch:
batch = tf.train.shuffle_batch(
data_to_batch,
batch_size=batch_size,
num_threads=num_threads,
capacity= capacity,
min_after_dequeue= min_after_dequeue,
seed = random_seed,
enqueue_many=True
)
else:
batch = tf.train.batch(
data_to_batch,
batch_size=batch_size,
num_threads=num_threads,
capacity= capacity,
enqueue_many=True
)
batch_dict = {k : v for k, v in zip(batch_keys, batch)}
return batch_dict
================================================
FILE: requirements.txt
================================================
easydict>=1.6
matplotlib>=2.0.0
numpy>=1.12.0
PyYAML>=3.11
tensorflow>=1.0.0
================================================
FILE: test.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
from config.parse_config import parse_config_file
from nets import nets_factory
from preprocessing import inputs
def test(tfrecords, checkpoint_path, save_dir, max_iterations, eval_interval_secs, cfg, read_images=False):
"""
Args:
tfrecords (list)
checkpoint_path (str)
savedir (str)
max_iterations (int)
cfg (EasyDict)
"""
tf.logging.set_verbosity(tf.logging.DEBUG)
graph = tf.Graph()
with graph.as_default():
global_step = slim.get_or_create_global_step()
with tf.device('/cpu:0'):
batch_dict = inputs.input_nodes(
tfrecords=tfrecords,
cfg=cfg.IMAGE_PROCESSING,
num_epochs=1,
batch_size=cfg.BATCH_SIZE,
num_threads=cfg.NUM_INPUT_THREADS,
shuffle_batch =cfg.SHUFFLE_QUEUE,
random_seed=cfg.RANDOM_SEED,
capacity=cfg.QUEUE_CAPACITY,
min_after_dequeue=cfg.QUEUE_MIN,
add_summaries=False,
input_type='test',
read_filenames=read_images
)
batched_one_hot_labels = slim.one_hot_encoding(batch_dict['labels'],
num_classes=cfg.NUM_CLASSES)
arg_scope = nets_factory.arg_scopes_map[cfg.MODEL_NAME]()
with slim.arg_scope(arg_scope):
logits, end_points = nets_factory.networks_map[cfg.MODEL_NAME](
inputs=batch_dict['inputs'],
num_classes=cfg.NUM_CLASSES,
is_training=False
)
predictions = end_points['Predictions']
#labels = tf.squeeze(batch_dict['labels'])
labels = batch_dict['labels']
# Add the loss summary
loss = tf.losses.softmax_cross_entropy(
logits=logits, onehot_labels=batched_one_hot_labels, label_smoothing=0., weights=1.0)
if 'MOVING_AVERAGE_DECAY' in cfg and cfg.MOVING_AVERAGE_DECAY > 0:
variable_averages = tf.train.ExponentialMovingAverage(
cfg.MOVING_AVERAGE_DECAY, global_step)
variables_to_restore = variable_averages.variables_to_restore(
slim.get_model_variables())
variables_to_restore[global_step.op.name] = global_step
else:
variables_to_restore = slim.get_variables_to_restore()
variables_to_restore.append(global_step)
# Define the metrics:
metric_map = {
'Accuracy': tf.metrics.accuracy(labels=labels, predictions=tf.argmax(predictions, 1)),#slim.metrics.streaming_accuracy(labels=labels, predictions=tf.argmax(predictions, 1)),
loss.op.name : slim.metrics.streaming_mean(loss)
}
if len(cfg.ACCURACY_AT_K_METRIC) > 0:
bool_labels = tf.ones([cfg.BATCH_SIZE], dtype=tf.bool)
for k in cfg.ACCURACY_AT_K_METRIC:
if k <= 1 or k > cfg.NUM_CLASSES:
continue
in_top_k = tf.nn.in_top_k(predictions=predictions, targets=labels, k=k)
metric_map['Accuracy_at_%s' % k] = tf.metrics.accuracy(labels=bool_labels, predictions=in_top_k)#slim.metrics.streaming_accuracy(labels=bool_labels, predictions=in_top_k)
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map(metric_map)
# Print the summaries to screen.
print_global_step = True
for name, value in names_to_values.iteritems():
summary_name = 'eval/%s' % name
op = tf.summary.scalar(summary_name, value, collections=[])
if print_global_step:
op=tf.Print(op, [global_step], "Model Step ")
print_global_step = False
op = tf.Print(op, [value], summary_name)
tf.add_to_collection(tf.GraphKeys.SUMMARIES, op)
if max_iterations > 0:
num_batches = max_iterations
else:
# This ensures that we make a single pass over all of the data.
# We could use ceil if the batch queue is allowed to pad the last batch
num_batches = np.floor(cfg.NUM_TEST_EXAMPLES / float(cfg.BATCH_SIZE))
sess_config = tf.ConfigProto(
log_device_placement=cfg.SESSION_CONFIG.LOG_DEVICE_PLACEMENT,
allow_soft_placement = True,
gpu_options = tf.GPUOptions(
per_process_gpu_memory_fraction=cfg.SESSION_CONFIG.PER_PROCESS_GPU_MEMORY_FRACTION
),
intra_op_parallelism_threads=cfg.SESSION_CONFIG.INTRA_OP_PARALLELISM_THREADS if 'INTRA_OP_PARALLELISM_THREADS' in cfg.SESSION_CONFIG else None,
inter_op_parallelism_threads=cfg.SESSION_CONFIG.INTER_OP_PARALLELISM_THREADS if 'INTER_OP_PARALLELISM_THREADS' in cfg.SESSION_CONFIG else None
)
if eval_interval_secs > 0:
if not os.path.isdir(checkpoint_path):
raise ValueError("checkpoint_path should be a path to a directory when " \
"evaluating in a loop.")
slim.evaluation.evaluation_loop(
master='',
checkpoint_dir=checkpoint_path,
logdir=save_dir,
num_evals=num_batches,
initial_op=None,
initial_op_feed_dict=None,
eval_op=names_to_updates.values(),
eval_op_feed_dict=None,
final_op=None,
final_op_feed_dict=None,
summary_op=tf.summary.merge_all(),
summary_op_feed_dict=None,
variables_to_restore=variables_to_restore,
eval_interval_secs=eval_interval_secs,
max_number_of_evaluations=None,
session_config=sess_config,
timeout=None
)
else:
if os.path.isdir(checkpoint_path):
checkpoint_dir = checkpoint_path
checkpoint_path = tf.train.latest_checkpoint(checkpoint_dir)
if checkpoint_path is None:
raise ValueError("Unable to find a model checkpoint in the " \
"directory %s" % (checkpoint_dir,))
tf.logging.info('Evaluating %s' % checkpoint_path)
slim.evaluation.evaluate_once(
master='',
checkpoint_path=checkpoint_path,
logdir=save_dir,
num_evals=num_batches,
eval_op=names_to_updates.values(),
variables_to_restore=variables_to_restore,
session_config=sess_config
)
def parse_args():
parser = argparse.ArgumentParser(description='Test the person classifier')
parser.add_argument('--tfrecords', dest='tfrecords',
help='Paths to tfrecords.', type=str,
nargs='+', required=True)
parser.add_argument('--checkpoint_path', dest='checkpoint_path',
help='Path to a specific model to test against. If a directory, then the newest checkpoint file will be used.', type=str,
required=True, default=None)
parser.add_argument('--save_dir', dest='savedir',
help='Path to directory to store summary files.', type=str,
required=True)
parser.add_argument('--config', dest='config_file',
help='Path to the configuration file.',
required=True, type=str)
parser.add_argument('--eval_interval_secs', dest='eval_interval_secs',
help='Go into an evaluation loop, waiting this many seconds between evaluations. Default is to evaluate once.',
required=False, type=int, default=0)
parser.add_argument('--batch_size', dest='batch_size',
help='The number of images in a batch.',
required=False, type=int, default=None)
parser.add_argument('--batches', dest='batches',
help='Maximum number of iterations to run. Default is all records (modulo the batch size).',
required=False, type=int, default=0)
parser.add_argument('--model_name', dest='model_name',
help='The name of the architecture to use.',
required=False, type=str, default=None)
parser.add_argument('--read_images', dest='read_images',
help='Read the images from the file system using the `filename` field rather than using the `encoded` field of the tfrecord.',
action='store_true', default=False)
args = parser.parse_args()
return args
def main():
args = parse_args()
cfg = parse_config_file(args.config_file)
if args.batch_size != None:
cfg.BATCH_SIZE = args.batch_size
if args.model_name != None:
cfg.MODEL_NAME = args.model_name
test(
tfrecords=args.tfrecords,
checkpoint_path=args.checkpoint_path,
save_dir=args.savedir,
max_iterations=args.batches,
eval_interval_secs=args.eval_interval_secs,
cfg=cfg,
read_images=args.read_images
)
if __name__ == '__main__':
main()
================================================
FILE: tfserving/README.md
================================================
# TensorFlow Serving Utilities
This directory contains utility code for interacting with a [TensorFlow Serving](https://www.tensorflow.org/serving/) instance. I'll walk through the basic steps of using TensorFlow Serving below.
## Export a Trained Model
When your training process has finished you will be left with a training checkpoint file created by the [tf.train.Saver](https://www.tensorflow.org/api_docs/python/tf/train/Saver) class. We need to convert this checkpoint file for use with TensorFlow Serving. You'll need to create a yaml configuration file for the export (essentially specifying the number of classes, input size, and a few other things). An example:
```yaml
# Export specific configuration
RANDOM_SEED : 1.0
SESSION_CONFIG : {
# If true, then the device location of each variable will be printed
LOG_DEVICE_PLACEMENT : false,
# How much GPU memory we are allowed to pre-allocate
PER_PROCESS_GPU_MEMORY_FRACTION : 0.9
}
#################################################
# Dataset Info
# The number of classes we are classifying
NUM_CLASSES : 200
# The model architecture to use.
MODEL_NAME : 'inception_v3'
# END: Dataset Info
#################################################
# Image Processing and Augmentation
IMAGE_PROCESSING : {
# Images are assumed to be raveled, and have length INPUT_SIZE * INPUT_SIZE * 3
INPUT_SIZE : 299
}
# END: Image Processing and Augmentation
#################################################
# Regularization
#
# The decay to use for the moving average. If 0, then moving average is not computed
# When restoring models, this value is needed to determine whether to restore moving
# average variables or not.
MOVING_AVERAGE_DECAY : 0.9999
# End: Regularization
#################################################
```
To export the model, we'll use the [export.py](export.py) script:
```
python export.py \
--checkpoint_path model.ckpt-399739 \
--export_dir export \
--export_version 1 \
--config config_export.yaml \
--serving \
--add_preprocess \
--class_names class-codes.txt
```
This will create a directory called `1` in the `export_dir` directory and will contain the files that TensorFlow Serving requires. We've passed in semantic identifiers for the classes using the `--class_names` argument. This will allow clients to receive semantically meaningful identifiers along with the prediction results. This removes the requirement of clients having to map from score indices to identifiers themselves. The class-codes.txt file contains one identifier per line, with each line corresponding to one index in the scores array. For example:
```txt
car
pedestrian
light post
trash can
bench
```
## Server Machine
Spin up an Ubuntu 16.04 instance on your favorite cloud provider, or use your personal machine. You'll need to add the TensorFlow Serving distribution URI as a package source prior to installing (notes [here](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/setup.md#installing-using-apt-get)):
```
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install tensorflow-model-server
```
You can also install from [source](https://github.com/tensorflow/serving/blob/master/tensorflow_serving/g3doc/setup.md#installation).
Create a models directory, such as `/home/ubuntu/serving/models`, and copy your `1` directory (that was created with the export.py script) to this directory. Alternatively, you can just specify `/home/ubuntu/serving/models` as your `--export_dir` when calling the export.py script.
Now you can start the server:
```
tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/home/ubuntu/serving/models
```
Note the `--model_name` field, the client will need to know this when querying the server.
## Client Machine
To query the server from a client machine you'll need to install the `tensorflow-serving-api` PIP package along with the `tensorflow` package. I use `numpy` for some operations so I'll install that too:
```
pip install numpy tensorflow tensorflow-serving-api
```
We can now query the server using the [client.py](client.py) file:
```
python client.py \
--images IMG_0932_sm.jpg \
--num_results 10 \
--model_name inception \
--host localhost \
--port 9000 \
--timeout 10
```
This command will send the `IMG_0932_sm.jpg` file to the TensorFlow Serving instance at `localhost:9000` and print the top 10 class predictions.
Rather than sending the raw image bytes to the TensorFlow Serving instance, we can send the prepared image array. This image array will be fed directly into the network, so it must be the proper size and have had any transformations already applied. The [inputs.py](inputs.py) file has a convenience function to prepare an image for inception style networks. For example:
```python
from scipy.misc import imread
import inputs
import tfserver
image = imread('IMG_0898.jpg')
preped_image = inputs.prepare_image(image)
image_data = [preped_image]
predictions = tfserver.predict(image_data)
results = tfserver.process_classification_prediction(predictions, max_classes=10)
print(results)
```
================================================
FILE: tfserving/__init__.py
================================================
================================================
FILE: tfserving/client.py
================================================
"""
A simple client to query a TensorFlow Serving instance.
Example:
$ python client.py \
--images IMG_0932_sm.jpg \
--num_results 10 \
--model_name inception \
--host localhost \
--port 9000 \
--timeout 10
Author: Grant Van Horn
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import time
import tfserver
def parse_args():
parser = argparse.ArgumentParser(description='Command line classification client. Sorts and prints the classification results.')
parser.add_argument('--images', dest='image_paths',
help='Path to one or more images to classify (jpeg or png).',
type=str, nargs='+', required=True)
parser.add_argument('--num_results', dest='num_results',
help='The number of results to print. Set to 0 to print all classes.',
required=False, type=int, default=0)
parser.add_argument('--model_name', dest='model_name',
help='The name of the model to query.',
required=False, type=str, default='inception')
parser.add_argument('--host', dest='host',
help='Machine host where the TensorFlow Serving model is.',
required=False, type=str, default='localhost')
parser.add_argument('--port', dest='port',
help='Port that the TensorFlow Server is listening on.',
required=False, type=int, default=9000)
parser.add_argument('--timeout', dest='timeout',
help='Amount of time to wait before failing.',
required=False, type=int, default=10)
args = parser.parse_args()
return args
def main():
args = parse_args()
# Read in the image bytes
image_data = []
for fp in args.image_paths:
with open(fp) as f:
data = f.read()
image_data.append(data)
# Get the predictions
t = time.time()
predictions = tfserver.predict(image_data, model_name=args.model_name,
host=args.host, port=args.port, timeout=args.timeout
)
dt = time.time() - t
print("Prediction call took %0.4f seconds" % (dt,))
# Process the results
results = tfserver.process_classification_prediction(predictions, max_classes=args.num_results)
# Print the results
for i, fp in enumerate(args.image_paths):
print("Results for image: %s" % (fp,))
for name, score in results[i]:
print("%s: %0.3f" % (name, score))
print()
if __name__ == '__main__':
main()
================================================
FILE: tfserving/inputs.py
================================================
"""
Numpy and scipy image preparation.
Author: Grant Van Horn
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from scipy.misc import imresize
def prepare_image(image, input_height=299, input_width=299):
""" Prepare an image to be passed through a network.
Arguments:
image (numpy.ndarray): An uint8 RGB image
Returns:
list: the image resized, centered and raveled
"""
# We assume an uint8 RGB image
assert image.dtype == np.uint8
assert image.ndim == 3
assert image.shape[2] == 3
resized_image = imresize(image, (input_height, input_width, 3))
float_image = resized_image.astype(np.float32)
centered_image = ((float_image / 255.) - 0.5) * 2.0
return centered_image.ravel().tolist()
================================================
FILE: tfserving/tfserver.py
================================================
"""
TensorFlow Serving caller code.
Requirements:
pip install numpy tensorflow tensorflow-serving-api
Author: Grant Van Horn
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from grpc.beta import implementations
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
def predict(image_data,
model_name='inception',
host='localhost',
port=9000,
timeout=10):
"""
Arguments:
image_data (list): A list of image data. The image data should either be the image bytes or
float arrays.
model_name (str): The name of the model to query (specified when you started the Server)
model_signature_name (str): The name of the signature to query (specified when you created the exported model)
host (str): The machine host identifier that the classifier is running on.
port (int): The port that the classifier is listening on.
timeout (int): Time in seconds before timing out.
Returns:
PredictResponse protocol buffer. See here: https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/predict.proto
"""
if len(image_data) <= 0:
return None
channel = implementations.insecure_channel(host, int(port))
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
request = predict_pb2.PredictRequest()
request.model_spec.name = model_name
if type(image_data[0]) == str:
request.model_spec.signature_name = 'predict_image_bytes'
request.inputs['images'].CopyFrom(
tf.contrib.util.make_tensor_proto(image_data, shape=[len(image_data)]))
else:
request.model_spec.signature_name = 'predict_image_array'
request.inputs['images'].CopyFrom(
tf.contrib.util.make_tensor_proto(image_data, shape=[len(image_data), len(image_data[1])]))
result = stub.Predict(request, timeout)
return result
def process_classification_prediction(predictions, max_classes=10):
"""
Arguments:
prediction (PredictResponse protocol buffer): TensorFlow Serving prediction response.
num_classes (int): Maximum number of results to return. Set to 0 for all results.
Returns:
list of lists: A list of (name, score) tuples, one for each prediction.
"""
# Determine how many outputs there are
dims = predictions.outputs['classes'].tensor_shape.dim
num_inputs = dims[0].size
num_classes = dims[1].size
all_class_names = np.array(predictions.outputs['classes'].string_val).reshape(num_inputs, num_classes)
all_scores = np.array(predictions.outputs['scores'].float_val).reshape(num_inputs, num_classes)
results = []
for i in range(num_inputs):
scores = all_scores[i]
class_names = all_class_names[i]
idxs = np.argsort(scores)[::-1]
scores = scores[idxs]
class_names = class_names[idxs]
num_to_return = min(num_classes, max_classes)
if num_to_return <= 0:
num_to_return = scores.shape[-1]
names_scores = [(class_names[i], scores[i]) for i in range(num_to_return)]
results.append(names_scores)
return results
================================================
FILE: train.py
================================================
# Some of this code came from the https://github.com/tensorflow/models/tree/master/slim
# directory, so lets keep the Google license around for now.
#
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import copy
import os
import numpy as np
import tensorflow as tf
import tensorflow.contrib.slim as slim
from config.parse_config import parse_config_file
from nets import nets_factory
from preprocessing.inputs import input_nodes
def _configure_learning_rate(global_step, cfg):
"""Configures the learning rate.
Args:
num_samples_per_epoch: The number of samples in each epoch of training.
global_step: The global_step tensor.
Returns:
A `Tensor` representing the learning rate.
Raises:
ValueError: if cfg.LEARNING_RATE_DECAY_TYPE is not recognized.
"""
decay_steps = int(cfg.NUM_TRAIN_EXAMPLES / cfg.BATCH_SIZE * cfg.NUM_EPOCHS_PER_DELAY)
if cfg.LEARNING_RATE_DECAY_TYPE == 'exponential':
return tf.train.exponential_decay(cfg.INITIAL_LEARNING_RATE,
global_step,
decay_steps,
cfg.LEARNING_RATE_DECAY_FACTOR,
staircase=cfg.LEARNING_RATE_STAIRCASE,
name='exponential_decay_learning_rate')
elif cfg.LEARNING_RATE_DECAY_TYPE == 'fixed':
return tf.constant(cfg.INITIAL_LEARNING_RATE, name='fixed_learning_rate')
elif cfg.LEARNING_RATE_DECAY_TYPE == 'polynomial':
return tf.train.polynomial_decay(cfg.INITIAL_LEARNING_RATE,
global_step,
decay_steps,
cfg.END_LEARNING_RATE,
power=1.0,
cycle=False,
name='polynomial_decay_learning_rate')
else:
raise ValueError('learning_rate_decay_type [%s] was not recognized',
cfg.LEARNING_RATE_DECAY_TYPE)
def _configure_optimizer(learning_rate, cfg):
"""Configures the optimizer used for training.
Args:
learning_rate: A scalar or `Tensor` learning rate.
Returns:
An instance of an optimizer.
Raises:
ValueError: if FLAGS.optimizer is not recognized.
"""
if cfg.OPTIMIZER == 'adadelta':
optimizer = tf.train.AdadeltaOptimizer(
learning_rate,
rho=cfg.ADADELTA_RHO,
epsilon=cfg.OPTIMIZER_EPSILON)
elif cfg.OPTIMIZER == 'adagrad':
optimizer = tf.train.AdagradOptimizer(
learning_rate,
initial_accumulator_value=cfg.ADAGRAD_INITIAL_ACCUMULATOR_VALUE)
elif cfg.OPTIMIZER == 'adam':
optimizer = tf.train.AdamOptimizer(
learning_rate,
beta1=cfg.ADAM_BETA1,
beta2=cfg.ADAM_BETA2,
epsilon=cfg.OPTIMIZER_EPSILON)
elif cfg.OPTIMIZER == 'ftrl':
optimizer = tf.train.FtrlOptimizer(
learning_rate,
learning_rate_power=cfg.FTRL_LEARNING_RATE_POWER,
initial_accumulator_value=cfg.FTRL_INITIAL_ACCUMULATOR_VALUE,
l1_regularization_strength=cfg.FTRL_L1,
l2_regularization_strength=cfg.FTRL_L2)
elif cfg.OPTIMIZER == 'momentum':
optimizer = tf.train.MomentumOptimizer(
learning_rate,
momentum=cfg.MOMENTUM,
name='Momentum')
elif cfg.OPTIMIZER == 'rmsprop':
optimizer = tf.train.RMSPropOptimizer(
learning_rate,
decay=cfg.RMSPROP_DECAY,
momentum=cfg.MOMENTUM,
epsilon=cfg.OPTIMIZER_EPSILON)
elif cfg.OPTIMIZER == 'sgd':
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
else:
raise ValueError('Optimizer [%s] was not recognized', cfg.OPTIMIZER)
return optimizer
def get_trainable_variables(trainable_scopes):
"""Returns a list of variables to train.
Returns:
A list of variables to train by the optimizer.
"""
if trainable_scopes is None:
return tf.trainable_variables()
trainable_scopes = [scope.strip() for scope in trainable_scopes]
variables_to_train = []
for scope in trainable_scopes:
variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
variables_to_train.extend(variables)
return variables_to_train
def get_init_function(logdir, pretrained_model_path, checkpoint_exclude_scopes, restore_variables_with_moving_averages=False, restore_moving_averages=False, ema=None):
"""
Args:
logdir : location of where we will be storing checkpoint files.
pretrained_model_path : a path to a specific model, or a directory with a checkpoint file. The latest model will be used.
fine_tune : If True, then the detection heads will not be restored.
original_inception_vars : A list of variables that do not include the detection heads.
use_moving_averages : If True, then the moving average values of the variables will be restored.
restore_moving_averages : If True, then the moving average values will also be restored.
ema : The exponential moving average object
"""
if pretrained_model_path is None:
return None
# Warn the user if a checkpoint exists in the train_dir. Then we'll be
# ignoring the checkpoint anyway.
if tf.train.latest_checkpoint(logdir):
tf.logging.info(
'Ignoring --pretrained_model_path because a checkpoint already exists in %s'
% logdir)
return None
exclusions = []
if checkpoint_exclude_scopes:
exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]
variables_to_restore = []
for var in slim.get_model_variables():
excluded = False
for exclusion in exclusions:
if var.op.name.startswith(exclusion):
excluded = True
break
if not excluded:
variables_to_restore.append(var)
#for variable in variables_to_restore:
# print(variable.name)
if os.path.isdir(pretrained_model_path):
checkpoint_path = tf.train.latest_checkpoint(pretrained_model_path)
if checkpoint_path is None:
raise ValueError(
"No model checkpoint file found in directory %s" % (pretrained_model_path))
else:
checkpoint_path = pretrained_model_path
tf.logging.info('Restoring variables from %s' % checkpoint_path)
if ema != None:
# # Restore each variable with its moving average value
# if restore_variables_with_moving_averages:
# # Also restore the moving average variables
# if restore_moving_averages:
# variables_to_restore_with_ma = variables_to_restore + [ema.average(var) for var in variables_to_restore]
# normal_saver = tf.train.Saver(variables_to_restore_with_ma, reshape=False)
# else:
# normal_saver = tf.train.Saver(variables_to_restore, reshape=False)
# ema_saver = tf.train.Saver({
# ema.average_name(var) : ema.average(var)
# for var in variables_to_restore
# }, reshape=False)
# def callback(session):
# normal_saver.restore(session, checkpoint_path)
# ema_saver.restore(session, checkpoint_path)
# return callback
# elif restore_moving_averages:
# variables_to_restore += [ema.average(var) for var in variables_to_restore]
# Load in the moving average value for a variable, rather than the variable itself
if restore_variables_with_moving_averages:
variables_to_restore = {
ema.average_name(var) : var
for var in variables_to_restore
}
# Do we want to restore the moving average variables? Otherwise they will be reinitialized
if restore_moving_averages:
# If we are already using the moving averages to restore the variables, then we will need
# two Saver() objects (since the names in the dictionaries will clash)
if restore_variables_with_moving_averages:
normal_saver = tf.train.Saver(variables_to_restore, reshape=False)
ema_saver = tf.train.Saver({
ema.average_name(var) : ema.average(var)
for var in variables_to_restore.values()
}, reshape=False)
def callback(session):
normal_saver.restore(session, checkpoint_path)
ema_saver.restore(session, checkpoint_path)
return callback
else:
# GVH: Need to check for dict
variables_to_restore += [ema.average(var) for var in variables_to_restore]
return slim.assign_from_checkpoint_fn(
checkpoint_path,
variables_to_restore,
ignore_missing_vars=False)
def train(tfrecords, logdir, cfg, pretrained_model_path=None, trainable_scopes=None, checkpoint_exclude_scopes=None, restore_variables_with_moving_averages=False, restore_moving_averages=False, read_images=False):
"""
Args:
tfrecords (list)
bbox_priors (np.array)
logdir (str)
cfg (EasyDict)
pretrained_model_path (str) : path to a pretrained Inception Network
"""
tf.logging.set_verbosity(tf.logging.INFO)
graph = tf.Graph()
# Force all Variables to reside on the CPU.
with graph.as_default():
# Create a variable to count the number of train() calls.
global_step = slim.get_or_create_global_step()
with tf.device('/cpu:0'):
batch_dict = input_nodes(
tfrecords=tfrecords,
cfg=cfg.IMAGE_PROCESSING,
num_epochs=None,
batch_size=cfg.BATCH_SIZE,
num_threads=cfg.NUM_INPUT_THREADS,
shuffle_batch =cfg.SHUFFLE_QUEUE,
random_seed=cfg.RANDOM_SEED,
capacity=cfg.QUEUE_CAPACITY,
min_after_dequeue=cfg.QUEUE_MIN,
add_summaries=True,
input_type='train',
read_filenames=read_images
)
batched_one_hot_labels = slim.one_hot_encoding(batch_dict['labels'],
num_classes=cfg.NUM_CLASSES)
# GVH: Doesn't seem to help to the poor queueing performance...
# batch_queue = slim.prefetch_queue.prefetch_queue(
# [batch_dict['inputs'], batched_one_hot_labels], capacity=2)
# inputs, labels = batch_queue.dequeue()
arg_scope = nets_factory.arg_scopes_map[cfg.MODEL_NAME](
weight_decay=cfg.WEIGHT_DECAY,
batch_norm_decay=cfg.BATCHNORM_MOVING_AVERAGE_DECAY,
batch_norm_epsilon=cfg.BATCHNORM_EPSILON
)
with slim.arg_scope(arg_scope):
logits, end_points = nets_factory.networks_map[cfg.MODEL_NAME](
inputs=batch_dict['inputs'],
num_classes=cfg.NUM_CLASSES,
dropout_keep_prob=cfg.DROPOUT_KEEP_PROB,
is_training=True
)
# Add the losses
if 'AuxLogits' in end_points:
tf.losses.softmax_cross_entropy(
logits=end_points['AuxLogits'], onehot_labels=batched_one_hot_labels,
label_smoothing=cfg.LABEL_SMOOTHING, weights=0.4, scope='aux_loss')
tf.losses.softmax_cross_entropy(
logits=logits, onehot_labels=batched_one_hot_labels, label_smoothing=cfg.LABEL_SMOOTHING, weights=1.0)
summaries = set(tf.get_collection(tf.GraphKeys.SUMMARIES))
# Summarize the losses
for loss in tf.get_collection(tf.GraphKeys.LOSSES):
summaries.add(tf.summary.scalar(name='losses/%s' % loss.op.name, tensor=loss))
regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
if regularization_losses:
regularization_loss = tf.add_n(regularization_losses, name='regularization_loss')
summaries.add(tf.summary.scalar(name='losses/regularization_loss', tensor=regularization_loss))
total_loss = tf.losses.get_total_loss()
summaries.add(tf.summary.scalar(name='losses/total_loss', tensor=total_loss))
if 'MOVING_AVERAGE_DECAY' in cfg and cfg.MOVING_AVERAGE_DECAY > 0:
moving_average_variables = slim.get_model_variables()
ema = tf.train.ExponentialMovingAverage(
decay=cfg.MOVING_AVERAGE_DECAY,
num_updates=global_step
)
elif restore_variables_with_moving_averages or restore_moving_averages:
# Perhaps we are finetuning the last layer of a pretrained model?
# So we just need something to load in the moving averages, for use in get_init_function()
moving_average_variables = None
ema = tf.train.ExponentialMovingAverage(
decay=1,
num_updates=global_step
)
else:
moving_average_variables = None
ema = None
# Calculate the learning rate schedule.
lr = _configure_learning_rate(global_step, cfg)
# Create an optimizer that performs gradient descent.
optimizer = _configure_optimizer(lr, cfg)
summaries.add(tf.summary.scalar(tensor=lr,
name='learning_rate'))
# Add the moving average update ops to the graph
if ema != None and moving_average_variables != None:
tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, ema.apply(moving_average_variables))
trainable_vars = get_trainable_variables(trainable_scopes)
train_op = slim.learning.create_train_op(total_loss=total_loss,
optimizer=optimizer,
global_step=global_step,
variables_to_train=trainable_vars,
clip_gradient_norm=cfg.CLIP_GRADIENT_NORM)
# Merge all of the summaries
summaries |= set(tf.get_collection(tf.GraphKeys.SUMMARIES))
summary_op = tf.summary.merge(inputs=list(summaries), name='summary_op')
sess_config = tf.ConfigProto(
log_device_placement=cfg.SESSION_CONFIG.LOG_DEVICE_PLACEMENT,
allow_soft_placement = True,
gpu_options = tf.GPUOptions(
per_process_gpu_memory_fraction=cfg.SESSION_CONFIG.PER_PROCESS_GPU_MEMORY_FRACTION
),
intra_op_parallelism_threads=cfg.SESSION_CONFIG.INTRA_OP_PARALLELISM_THREADS if 'INTRA_OP_PARALLELISM_THREADS' in cfg.SESSION_CONFIG else None,
inter_op_parallelism_threads=cfg.SESSION_CONFIG.INTER_OP_PARALLELISM_THREADS if 'INTER_OP_PARALLELISM_THREADS' in cfg.SESSION_CONFIG else None
)
saver = tf.train.Saver(
# Save all variables
max_to_keep = cfg.MAX_TO_KEEP,
keep_checkpoint_every_n_hours = cfg.KEEP_CHECKPOINT_EVERY_N_HOURS
)
# Run training.
slim.learning.train(
train_op=train_op,
logdir=logdir,
init_fn=get_init_function(logdir, pretrained_model_path, checkpoint_exclude_scopes, restore_variables_with_moving_averages=restore_variables_with_moving_averages, restore_moving_averages=restore_moving_averages, ema=ema),
number_of_steps=cfg.NUM_TRAIN_ITERATIONS,
save_summaries_secs=cfg.SAVE_SUMMARY_SECS,
save_interval_secs=cfg.SAVE_INTERVAL_SECS,
saver=saver,
session_config=sess_config,
summary_op = summary_op,
log_every_n_steps = cfg.LOG_EVERY_N_STEPS
)
def parse_args():
parser = argparse.ArgumentParser(description='Train the classification system')
parser.add_argument('--tfrecords', dest='tfrecords',
help='Paths to tfrecord files.', type=str,
nargs='+', required=True)
parser.add_argument('--logdir', dest='logdir',
help='path to directory to store summary files and checkpoint files', type=str,
required=True)
parser.add_argument('--config', dest='config_file',
help='Path to the configuration file',
required=True, type=str)
parser.add_argument('--pretrained_model', dest='pretrained_model',
help='Path to a model to restore. This is ignored if there is model in the logdir.',
required=False, type=str, default=None)
parser.add_argument('--trainable_scopes', dest='trainable_scopes',
help='Only variables within these scopes will be trained.',
type=str, nargs='+', default=None, required=False)
parser.add_argument('--checkpoint_exclude_scopes', dest='checkpoint_exclude_scopes',
help='Variables within these scopes will not be restored from the checkpoint files.',
type=str, nargs='+', default=None, required=False)
parser.add_argument('--max_number_of_steps', dest='max_number_of_steps',
help='The maximum number of iterations to run.',
required=False, type=int, default=None)
parser.add_argument('--learning_rate_decay_type', dest='learning_rate_decay_type',
help='Type of the decay', type=str,
required=False, default=None)
parser.add_argument('--lr', dest='learning_rate',
help='Initial learning rate', type=float,
required=False, default=None)
parser.add_argument('--batch_size', dest='batch_size',
help='The number of images in a batch.',
required=False, type=int, default=None)
parser.add_argument('--model_name', dest='model_name',
help='The name of the architecture to use.',
required=False, type=str, default=None)
parser.add_argument('--restore_variables_with_moving_averages', dest='restore_variables_with_moving_averages',
help='If True, then we restore variables with their moving average values.',
required=False, action='store_true', default=False)
parser.add_argument('--restore_moving_averages', dest='restore_moving_averages',
help='If True, then we restore the variable that tracks the moving average of each trainable varibale.',
required=False, action='store_true', default=False)
parser.add_argument('--read_images', dest='read_images',
help='Read the images from the file system using the `filename` field rather than using the `encoded` field of the tfrecord.',
action='store_true', default=False)
args = parser.parse_args()
return args
def main():
args = parse_args()
cfg = parse_config_file(args.config_file)
# Replace cfg parameters with the command line values
if args.max_number_of_steps != None:
cfg.NUM_TRAIN_ITERATIONS = args.max_number_of_steps
if args.learning_rate_decay_type != None:
cfg.LEARNING_RATE_DECAY_TYPE = args.learning_rate_decay_type
if args.learning_rate != None:
cfg.INITIAL_LEARNING_RATE = args.learning_rate
if args.batch_size != None:
cfg.BATCH_SIZE = args.batch_size
if args.model_name != None:
cfg.MODEL_NAME = args.model_name
train(
tfrecords=args.tfrecords,
logdir=args.logdir,
cfg=cfg,
pretrained_model_path=args.pretrained_model,
trainable_scopes = args.trainable_scopes,
checkpoint_exclude_scopes = args.checkpoint_exclude_scopes,
restore_variables_with_moving_averages=args.restore_variables_with_moving_averages,
restore_moving_averages=args.restore_moving_averages,
read_images=args.read_images
)
if __name__ == '__main__':
main()
================================================
FILE: visualize_train_inputs.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from matplotlib import pyplot as plt
import numpy as np
import tensorflow as tf
from config.parse_config import parse_config_file
from preprocessing.inputs import input_nodes
def visualize_train_inputs(tfrecords, cfg, show_text_labels=False, read_images=False):
graph = tf.Graph()
sess = tf.Session(graph = graph)
# run a session to look at the images...
with sess.as_default(), graph.as_default():
# Input Nodes
with tf.device('/cpu:0'):
batch_dict = input_nodes(
tfrecords=tfrecords,
cfg=cfg.IMAGE_PROCESSING,
num_epochs=1,
batch_size=cfg.BATCH_SIZE,
num_threads=cfg.NUM_INPUT_THREADS,
shuffle_batch =cfg.SHUFFLE_QUEUE,
random_seed=cfg.RANDOM_SEED,
capacity=cfg.QUEUE_CAPACITY,
min_after_dequeue=cfg.QUEUE_MIN,
add_summaries=False,
input_type='visualize',
fetch_text_labels=show_text_labels,
read_filenames=read_images
)
# Convert float images to uint8 images
image_to_convert = tf.placeholder(dtype=tf.float32,
shape=[cfg.IMAGE_PROCESSING.INPUT_SIZE,
cfg.IMAGE_PROCESSING.INPUT_SIZE, 3])
uint8_image = tf.image.convert_image_dtype(image_to_convert, dtype=tf.uint8)
coord = tf.train.Coordinator()
tf.global_variables_initializer().run()
tf.local_variables_initializer().run()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
plt.ion()
done = False
while not done:
output = sess.run(batch_dict)
original_images = output['original_inputs']
distorted_images = output['inputs']
image_ids = output['ids']
labels = output['labels']
if show_text_labels:
text_labels = output['text_labels']
for b in range(cfg.BATCH_SIZE):
original_image = original_images[b]
distorted_image = distorted_images[b]
if original_image.dtype != np.uint8:
original_image = sess.run(uint8_image, {image_to_convert : original_image})
if distorted_image.dtype != np.uint8:
distorted_image = sess.run(uint8_image, {image_to_convert : distorted_image})
image_id = image_ids[b]
label = labels[b]
fig = plt.figure('Train Inputs')
if show_text_labels:
text_label = text_labels[b]
st = fig.suptitle("Image: %s\nLabel: %d\nText: %s" %
(image_id, label, text_label), fontsize=12)
else:
st = fig.suptitle("Image: %s\nLabel: %d" % (image_id, label), fontsize=12)
plt.subplot(2, 1, 1)
plt.imshow(original_image)
plt.title("Original")
plt.axis('off')
plt.subplot(2, 1, 2)
plt.imshow(distorted_image)
plt.title("Modified")
plt.axis('off')
# Shift the subplots down a bit to make room for the super title
st.set_y(0.95)
fig.subplots_adjust(top=0.75)
plt.show(block=False)
t = raw_input("Press Enter to view next image. Press any key followed " \
"by Enter to quite: ")
if t != '':
done = True
break
plt.clf()
def parse_args():
parser = argparse.ArgumentParser(description='Visualize the inputs to train the classification system.')
parser.add_argument('--tfrecords', dest='tfrecords',
help='Paths to tfrecord files.', type=str,
nargs='+', required=True)
parser.add_argument('--config', dest='config_file',
help='Path to the configuration file',
required=True, type=str)
parser.add_argument('--text_labels', dest='show_text_labels',
help='If text labels have been stored in the tfrecords, then you can use this flag to show them.',
action='store_true', default=False)
parser.add_argument('--read_images', dest='read_images',
help='Read the images from the file system using the `filename` field rather than using the `encoded` field of the tfrecord.',
action='store_true', default=False)
args = parser.parse_args()
return args
def main():
args = parse_args()
cfg = parse_config_file(args.config_file)
visualize_train_inputs(
tfrecords=args.tfrecords,
cfg=cfg,
show_text_labels=args.show_text_labels,
read_images=args.read_images
)
if __name__ == '__main__':
main()