Repository: qhgz2013/anime-face-detector
Branch: master
Commit: 94d75475a17f
Files: 29
Total size: 262.8 KB
Directory structure:
gitextract_tx2oprbr/
├── .gitignore
├── LICENSE
├── Makefile
├── README.md
├── _tf_compat_import.py
├── faster_rcnn_wrapper.py
├── main.py
├── make.bat
├── model/
│ └── .gitignore
├── nms/
│ ├── .gitignore
│ ├── __init__.py
│ ├── cpu_nms.pyx
│ ├── gpu_nms.hpp
│ ├── gpu_nms.pyx
│ ├── nms_kernel.cu
│ └── py_cpu_nms.py
├── nms_wrapper.py
├── setup.py
└── tf_contrib/
├── README.md
├── arg_scope.py
├── initializers.py
├── layers.py
├── loader.py
├── regularizers.py
├── resnet_utils.py
├── resnet_v1.py
├── slim.py
├── utils.py
└── variables.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Ignore VSCode configurations
.vscode
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
# idea pycharm data
.idea/
# cython build result
build/
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2018 Zhou Xuebin
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: Makefile
================================================
all:
	python setup.py build_ext --inplace
	rm -rf build

clean:
	rm -rf */*.pyc
	rm -rf */*.so
================================================
FILE: README.md
================================================
# Anime-Face-Detector
A Faster-RCNN based anime face detector.
This detector is trained on 6000 training samples and 641 testing samples, randomly selected from a dataset crawled from the top-100 [pixiv daily ranking](https://www.pixiv.net/ranking.php?mode=daily).
Thanks to nagadomi's [OpenCV-based anime face detector](https://github.com/nagadomi/lbpcascade_animeface), which helped label the data.
The original TensorFlow implementation of Faster-RCNN can be found [here](https://github.com/endernewton/tf-faster-rcnn).
## Dependencies
- Python >= 3.6
- `tensorflow` (latest 1.x or 2.x)
- `opencv-python` (other backends such as `pillow` and `scikit-image` may be supported in a future version)
- `cython` (optional; not needed when running with the `-nms-type PY_NMS` argument)
- Pre-trained ResNet101 model
## Usage
1. Clone this repository
```bash
git clone https://github.com/qhgz2013/anime-face-detector.git
```
2. Download the pre-trained model
Google Drive: [here](https://drive.google.com/open?id=1WjBgfOUqp4sdRd9BHs4TkdH2EcBtV5ri)
Baidu Netdisk: [here](https://pan.baidu.com/s/1bvpCp1sbD7t9qnta8IhpmA)
3. Unzip the model file into `model` directory
4. Build the CPU NMS module (skip this step when using `-nms-type PY_NMS`)
```bash
make clean
make
```
If using Windows PowerShell, type `cmd /C make.bat` to run the build script.
5. Run the demo
- Visualize the result (without output path):
```bash
python main.py -i /path/to/image.jpg
```
- Save results to a json file
```bash
python main.py -i /path/to/image.jpg -o /path/to/output.json
```
Format: `{"image_path": [{"score": predicted_probability, "bbox": [min_x, min_y, max_x, max_y]}, ...], ...}`
Sample output file:
```json
{"/path/to/image.jpg": [{"score": 0.9999708, "bbox": [551.3375, 314.50253, 729.2599, 485.25674]}]}
```
- Detecting a whole directory with recursion
```bash
python main.py -i /path/to/dir -o /path/to/output.json
```
- Customize threshold
```bash
python main.py -i /path/to/image.jpg -nms 0.3 -conf 0.8
```
- Customize model path
```bash
python main.py -i /path/to/image.jpg -model /path/to/model.ckpt
```
- Customize NMS type (supports CPU_NMS and PY_NMS; GPU_NMS is not supported because of the complicated build process on Windows)
```bash
python main.py -i /path/to/image.jpg -nms-type PY_NMS
```
- Crop detected faces and store them in a folder (`-start-output` is an integer used to start numbering the cropped image files, default 0)
```bash
python main.py -i /path/to/image/or/folder -crop-location /path/to/store/cropped/images -start-output 1
```
- Crop detected faces and resize them
```bash
python main.py -i /path/to/image/or/folder -crop-location /path/to/store/cropped/images -crop-height 224 -crop-width 224
```
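The JSON output described above can be post-processed without any of the detector's dependencies. A minimal sketch, reusing the hypothetical sample entry shown earlier (`faces_above` is an illustrative helper, not part of this repository):
```python
import json

# Hypothetical detection result, matching the documented output format.
sample = ('{"/path/to/image.jpg": [{"score": 0.9999708, '
          '"bbox": [551.3375, 314.50253, 729.2599, 485.25674]}]}')

def faces_above(result, conf=0.8):
    """Return (image_path, bbox) pairs whose score reaches the confidence threshold."""
    return [(path, det['bbox'])
            for path, dets in result.items()
            for det in dets
            if det['score'] >= conf]

hits = faces_above(json.loads(sample))
print(len(hits))  # 1: the sample face scores well above the default 0.8 threshold
```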
## Results
**Mean AP for this model: 0.9086**

Copyright info: [東方まとめ](https://www.pixiv.net/member_illust.php?mode=medium&illust_id=54275439) by [羽々斬](https://www.pixiv.net/member.php?id=2179695)

Copyright info: [【C94】桜と刀](https://www.pixiv.net/member_illust.php?mode=medium&illust_id=69797346) by [幻像黒兎](https://www.pixiv.net/member.php?id=4462245)

Copyright info: [アイドルマスター シンデレラガールズ](https://www.pixiv.net/member_illust.php?mode=medium&illust_id=69753772) by [我美蘭@1日目 東A-40a](https://www.pixiv.net/member.php?id=2003931)
## About training
This model was trained directly with [Faster-RCNN](https://github.com/endernewton/tf-faster-rcnn), using the following arguments:
```bash
python tools/trainval_net.py --weight data/imagenet_weights/res101.ckpt --imdb voc_2007_trainval --imdbval voc_2007_test --iters 60000 --cfg experiments/cfgs/res101.yml --net res101 --set ANCHOR_SCALES "[4,8,16,32]" ANCHOR_RATIOS "[1]" TRAIN.STEPSIZE "[50000]"
```
## Dataset
The dataset has been uploaded to Google Drive [here](https://drive.google.com/open?id=1nDPimhiwbAWc2diok-6davhubNVe82pr); its structure is similar to VOC2007 (as used in the original Faster-RCNN implementation).
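Since the layout follows VOC2007, annotations can be read with the standard library alone. A sketch under that assumption — the XML below is an illustrative stand-in for a file under the dataset's `Annotations/` folder, with tag names following the VOC convention:
```python
import xml.etree.ElementTree as ET

# Minimal VOC2007-style annotation (illustrative sample, not taken from the dataset).
xml_text = """
<annotation>
  <filename>000001.jpg</filename>
  <object>
    <name>face</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>
"""

def parse_boxes(text):
    """Extract (xmin, ymin, xmax, ymax) tuples from a VOC-style annotation."""
    root = ET.fromstring(text)
    boxes = []
    for obj in root.iter('object'):
        bb = obj.find('bndbox')
        boxes.append(tuple(int(bb.find(tag).text)
                           for tag in ('xmin', 'ymin', 'xmax', 'ymax')))
    return boxes

print(parse_boxes(xml_text))  # [(48, 240, 195, 371)]
```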
## Citation and declaration
Feel free to cite this repo and dataset.
This work is not related to my research team or lab; it is a personal-interest project.
================================================
FILE: _tf_compat_import.py
================================================
__all__ = ['compat_tensorflow']


def _compat_tf_import(enable_gpu: bool = True):
    if not enable_gpu:
        import os
        os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
    import tensorflow as tf
    try:
        tf_v1 = tf.compat.v1
        tf_v1.disable_v2_behavior()
        return tf_v1
    except (ImportError, AttributeError):
        # old TF 1.x releases without tf.compat.v1 raise AttributeError here
        return tf


compat_tensorflow = _compat_tf_import()
================================================
FILE: faster_rcnn_wrapper.py
================================================
from _tf_compat_import import compat_tensorflow as tf
from tf_contrib.resnet_v1 import resnet_v1_block, resnet_v1
import tf_contrib.slim as slim
from tf_contrib.resnet_utils import arg_scope, conv2d_same
import numpy as np
class FasterRCNNSlim:
    def __init__(self):
        self._blocks = [resnet_v1_block('block1', base_depth=64, num_units=3, stride=2),
                        resnet_v1_block('block2', base_depth=128, num_units=4, stride=2),
                        resnet_v1_block('block3', base_depth=256, num_units=23, stride=1),
                        resnet_v1_block('block4', base_depth=512, num_units=3, stride=1)]
        self._image = tf.placeholder(tf.float32, shape=[1, None, None, 3])
        self._im_info = tf.placeholder(tf.float32, shape=[3])
        self._anchor_scales = [4, 8, 16, 32]
        self._num_scales = len(self._anchor_scales)
        self._anchor_ratios = [1]
        self._num_ratios = len(self._anchor_ratios)
        self._num_anchors = self._num_scales * self._num_ratios
        self._scope = 'resnet_v1_101'
        with arg_scope([slim.conv2d, slim.conv2d_in_plane, slim.conv2d_transpose, slim.separable_conv2d,
                        slim.fully_connected],
                       weights_regularizer=slim.l2_regularizer(0.0001),
                       biases_regularizer=tf.no_regularizer,
                       biases_initializer=tf.constant_initializer(0.0)):
            # in _build_network
            initializer = tf.random_normal_initializer(stddev=0.01)
            initializer_bbox = tf.random_normal_initializer(stddev=0.001)
            # in _image_to_head
            with slim.arg_scope(self._resnet_arg_scope()):
                # in _build_base
                with tf.variable_scope(self._scope, self._scope):
                    net_conv = conv2d_same(self._image, 64, 7, stride=2, scope='conv1')
                    net_conv = tf.pad(net_conv, [[0, 0], [1, 1], [1, 1], [0, 0]])
                    net_conv = slim.max_pool2d(net_conv, [3, 3], stride=2, padding='VALID', scope='pool1')
                net_conv, _ = resnet_v1(net_conv, self._blocks[:-1], global_pool=False, include_root_block=False,
                                        scope=self._scope)
            with tf.variable_scope(self._scope, self._scope):
                # in _anchor_component
                with tf.variable_scope('ANCHOR-default'):
                    height = tf.cast(tf.ceil(self._im_info[0] / 16.0), dtype=tf.int32)
                    width = tf.cast(tf.ceil(self._im_info[1] / 16.0), dtype=tf.int32)
                    shift_x = tf.range(width) * 16
                    shift_y = tf.range(height) * 16
                    shift_x, shift_y = tf.meshgrid(shift_x, shift_y)
                    sx = tf.reshape(shift_x, [-1])
                    sy = tf.reshape(shift_y, [-1])
                    shifts = tf.transpose(tf.stack([sx, sy, sx, sy]))
                    k = width * height
                    shifts = tf.transpose(tf.reshape(shifts, [1, k, 4]), perm=[1, 0, 2])
                    anchors = np.array([[-24, -24, 39, 39], [-56, -56, 71, 71],
                                        [-120, -120, 135, 135], [-248, -248, 263, 263]], dtype=np.int32)
                    a = anchors.shape[0]
                    anchor_constant = tf.constant(anchors.reshape([1, a, 4]), dtype=tf.int32)
                    length = k * a
                    anchors_tf = tf.reshape(anchor_constant + shifts, shape=[length, 4])
                    anchors = tf.cast(anchors_tf, dtype=tf.float32)
                    self._anchors = anchors
                    self._anchor_length = length
                # in _region_proposal
                rpn = slim.conv2d(net_conv, 512, [3, 3], trainable=False, weights_initializer=initializer,
                                  scope='rpn_conv/3x3')
                rpn_cls_score = slim.conv2d(rpn, self._num_anchors * 2, [1, 1], trainable=False,
                                            weights_initializer=initializer, padding='VALID', activation_fn=None,
                                            scope='rpn_cls_score')
                rpn_cls_score_reshape = self._reshape(rpn_cls_score, 2, 'rpn_cls_score_reshape')
                rpn_cls_prob_reshape = self._softmax(rpn_cls_score_reshape, 'rpn_cls_prob_reshape')
                # rpn_cls_pred = tf.argmax(tf.reshape(rpn_cls_score_reshape, [-1, 2]), axis=1, name='rpn_cls_pred')
                rpn_cls_prob = self._reshape(rpn_cls_prob_reshape, self._num_anchors * 2, 'rpn_cls_prob')
                rpn_bbox_pred = slim.conv2d(rpn, self._num_anchors * 4, [1, 1], trainable=False,
                                            weights_initializer=initializer, padding='VALID', activation_fn=None,
                                            scope='rpn_bbox_pred')
                # in _proposal_layer
                with tf.variable_scope('rois'):
                    post_nms_topn = 300
                    nms_thresh = 0.7
                    scores = rpn_cls_prob[:, :, :, self._num_anchors:]
                    scores = tf.reshape(scores, [-1])
                    rpn_bbox_pred = tf.reshape(rpn_bbox_pred, [-1, 4])
                    boxes = tf.cast(self._anchors, rpn_bbox_pred.dtype)
                    widths = boxes[:, 2] - boxes[:, 0] + 1.0
                    heights = boxes[:, 3] - boxes[:, 1] + 1.0
                    ctr_x = boxes[:, 0] + widths * 0.5
                    ctr_y = boxes[:, 1] + heights * 0.5
                    dx = rpn_bbox_pred[:, 0]
                    dy = rpn_bbox_pred[:, 1]
                    dw = rpn_bbox_pred[:, 2]
                    dh = rpn_bbox_pred[:, 3]
                    pred_ctr_x = dx * widths + ctr_x
                    pred_ctr_y = dy * heights + ctr_y
                    pred_w = tf.exp(dw) * widths
                    pred_h = tf.exp(dh) * heights
                    pred_boxes0 = pred_ctr_x - pred_w * 0.5
                    pred_boxes1 = pred_ctr_y - pred_h * 0.5
                    pred_boxes2 = pred_ctr_x + pred_w * 0.5
                    pred_boxes3 = pred_ctr_y + pred_h * 0.5
                    b0 = tf.clip_by_value(pred_boxes0, 0, self._im_info[1] - 1)
                    b1 = tf.clip_by_value(pred_boxes1, 0, self._im_info[0] - 1)
                    b2 = tf.clip_by_value(pred_boxes2, 0, self._im_info[1] - 1)
                    b3 = tf.clip_by_value(pred_boxes3, 0, self._im_info[0] - 1)
                    proposals = tf.stack([b0, b1, b2, b3], axis=1)
                    indices = tf.image.non_max_suppression(proposals, scores, max_output_size=post_nms_topn,
                                                           iou_threshold=nms_thresh)
                    boxes = tf.cast(tf.gather(proposals, indices), dtype=tf.float32)
                    # rpn_scores = tf.reshape(tf.gather(scores, indices), [-1, 1])
                    batch_inds = tf.zeros([tf.shape(indices)[0], 1], dtype=tf.float32)
                    rois = tf.concat([batch_inds, boxes], 1)
                # in _crop_pool_layer
                with tf.variable_scope('pool5'):
                    batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name='bath_id'), [1])
                    bottom_shape = tf.shape(net_conv)
                    height = (tf.cast(bottom_shape[1], dtype=tf.float32) - 1) * 16.0
                    width = (tf.cast(bottom_shape[2], dtype=tf.float32) - 1) * 16.0
                    x1 = tf.slice(rois, [0, 1], [-1, 1], name='x1') / width
                    y1 = tf.slice(rois, [0, 2], [-1, 1], name='y1') / height
                    x2 = tf.slice(rois, [0, 3], [-1, 1], name='x2') / width
                    y2 = tf.slice(rois, [0, 4], [-1, 1], name='y2') / height
                    bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], 1))
                    pool5 = tf.image.crop_and_resize(net_conv, bboxes, tf.cast(batch_ids, dtype=tf.int32), [7, 7],
                                                     name='crops')
            # in _head_to_tail
            with slim.arg_scope(self._resnet_arg_scope()):
                fc7, _ = resnet_v1(pool5, self._blocks[-1:], global_pool=False, include_root_block=False,
                                   scope=self._scope)
                fc7 = tf.reduce_mean(fc7, axis=[1, 2])
            with tf.variable_scope(self._scope, self._scope):
                # in _region_classification
                cls_score = slim.fully_connected(fc7, 2, weights_initializer=initializer, trainable=False,
                                                 activation_fn=None, scope='cls_score')
                cls_prob = self._softmax(cls_score, 'cls_prob')
                # cls_pred = tf.argmax(cls_score, 'cls_pred')
                bbox_pred = slim.fully_connected(fc7, 2 * 4, weights_initializer=initializer_bbox, trainable=False,
                                                 activation_fn=None, scope='bbox_pred')
        self._cls_score = cls_score
        self._cls_prob = cls_prob
        self._bbox_pred = bbox_pred
        self._rois = rois
        # undo the bbox-target normalization used during training
        stds = np.tile(np.array([0.1, 0.1, 0.2, 0.2]), 2)
        means = np.tile(np.array([0.0, 0.0, 0.0, 0.0]), 2)
        self._bbox_pred *= stds
        self._bbox_pred += means

    @staticmethod
    def _resnet_arg_scope():
        batch_norm_params = {
            'is_training': False,
            'decay': 0.997,
            'epsilon': 1e-5,
            'scale': True,
            'trainable': False,
            'updates_collections': tf.GraphKeys.UPDATE_OPS
        }
        with arg_scope([slim.conv2d],
                       weights_regularizer=slim.l2_regularizer(0.0001),
                       weights_initializer=slim.variance_scaling_initializer(),
                       trainable=False,
                       activation_fn=tf.nn.relu,
                       normalizer_fn=slim.batch_norm,
                       normalizer_params=batch_norm_params):
            with arg_scope([slim.batch_norm], **batch_norm_params) as arg_sc:
                return arg_sc

    @staticmethod
    def _reshape(bottom, num_dim, name):
        input_shape = tf.shape(bottom)
        with tf.variable_scope(name):
            to_caffe = tf.transpose(bottom, [0, 3, 1, 2])
            reshaped = tf.reshape(to_caffe, [1, num_dim, -1, input_shape[2]])
            to_tf = tf.transpose(reshaped, [0, 2, 3, 1])
            return to_tf

    @staticmethod
    def _softmax(bottom, name):
        if name.startswith('rpn_cls_prob_reshape'):
            input_shape = tf.shape(bottom)
            bottom_reshaped = tf.reshape(bottom, [-1, input_shape[-1]])
            reshaped_score = tf.nn.softmax(bottom_reshaped, name=name)
            return tf.reshape(reshaped_score, input_shape)
        return tf.nn.softmax(bottom, name=name)

    def test_image(self, sess, image, im_info):
        return sess.run([self._cls_score, self._cls_prob, self._bbox_pred, self._rois], feed_dict={
            self._image: image,
            self._im_info: im_info
        })
================================================
FILE: main.py
================================================
import numpy as np
import cv2
from faster_rcnn_wrapper import FasterRCNNSlim
from _tf_compat_import import compat_tensorflow as tf
import argparse
import os
import json
import time
from nms_wrapper import NMSType, NMSWrapper
def detect(sess, rcnn_cls, image):
    # pre-processing image for Faster-RCNN
    img_origin = image.astype(np.float32, copy=True)
    img_origin -= np.array([[[102.9801, 115.9465, 112.7717]]])
    img_shape = img_origin.shape
    img_size_min = np.min(img_shape[:2])
    img_size_max = np.max(img_shape[:2])
    img_scale = 600 / img_size_min
    if np.round(img_scale * img_size_max) > 1000:
        img_scale = 1000 / img_size_max
    img = cv2.resize(img_origin, None, None, img_scale, img_scale, cv2.INTER_LINEAR)
    img_info = np.array([img.shape[0], img.shape[1], img_scale], dtype=np.float32)
    img = np.expand_dims(img, 0)
    # test image
    _, scores, bbox_pred, rois = rcnn_cls.test_image(sess, img, img_info)
    # bbox transform
    boxes = rois[:, 1:] / img_scale
    boxes = boxes.astype(bbox_pred.dtype, copy=False)
    widths = boxes[:, 2] - boxes[:, 0] + 1
    heights = boxes[:, 3] - boxes[:, 1] + 1
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights
    dx = bbox_pred[:, 0::4]
    dy = bbox_pred[:, 1::4]
    dw = bbox_pred[:, 2::4]
    dh = bbox_pred[:, 3::4]
    pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
    pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
    pred_w = np.exp(dw) * widths[:, np.newaxis]
    pred_h = np.exp(dh) * heights[:, np.newaxis]
    pred_boxes = np.zeros_like(bbox_pred, dtype=bbox_pred.dtype)
    pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
    pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
    pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
    pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
    # clipping edge
    pred_boxes[:, 0::4] = np.maximum(pred_boxes[:, 0::4], 0)
    pred_boxes[:, 1::4] = np.maximum(pred_boxes[:, 1::4], 0)
    pred_boxes[:, 2::4] = np.minimum(pred_boxes[:, 2::4], img_shape[1] - 1)
    pred_boxes[:, 3::4] = np.minimum(pred_boxes[:, 3::4], img_shape[0] - 1)
    return scores, pred_boxes


def load_file_from_dir(dir_path):
    ret = []
    for file in os.listdir(dir_path):
        path_comb = os.path.join(dir_path, file)
        if os.path.isdir(path_comb):
            ret += load_file_from_dir(path_comb)
        else:
            ret.append(path_comb)
    return ret


def fmt_time(dtime):
    if dtime <= 0:
        return '0:00.000'
    elif dtime < 60:
        return '0:%02d.%03d' % (int(dtime), int(dtime * 1000) % 1000)
    elif dtime < 3600:
        return '%d:%02d.%03d' % (int(dtime / 60), int(dtime) % 60, int(dtime * 1000) % 1000)
    else:
        return '%d:%02d:%02d.%03d' % (int(dtime / 3600), int((dtime % 3600) / 60), int(dtime) % 60,
                                      int(dtime * 1000) % 1000)


def main():
    parser = argparse.ArgumentParser(description='Anime face detector demo')
    parser.add_argument('-i', help='The input path of an image or directory', required=True, dest='input', type=str)
    parser.add_argument('-o', help='The output json path of the detection result', dest='output')
    parser.add_argument('-nms', help='Change the threshold for non maximum suppression',
                        dest='nms_thresh', default=0.3, type=float)
    parser.add_argument('-conf', help='Change the threshold for class regression', dest='conf_thresh',
                        default=0.8, type=float)
    parser.add_argument('-model', help='Specify a new path for model', dest='model', type=str,
                        default='model/res101_faster_rcnn_iter_60000.ckpt')
    parser.add_argument('-nms-type', help='Type of nms', choices=['PY_NMS', 'CPU_NMS', 'GPU_NMS'], dest='nms_type',
                        default='CPU_NMS')
    parser.add_argument('-crop-location', help='The output folder to place the cropped images',
                        dest='crop_output_image_location')
    parser.add_argument('-start-output', help='The starting number for cropped image filenames',
                        dest='start_output_number', default=0, type=int)
    parser.add_argument('-crop-width', help='The width of images to crop', dest='crop_width', type=int)
    parser.add_argument('-crop-height', help='The height of images to crop', dest='crop_height', type=int)
    args = parser.parse_args()
    assert os.path.exists(args.input), 'The input path does not exist'
    if os.path.isdir(args.input):
        files = load_file_from_dir(args.input)
    else:
        files = [args.input]
    file_len = len(files)
    if args.nms_type == 'PY_NMS':
        nms_type = NMSType.PY_NMS
    elif args.nms_type == 'CPU_NMS':
        nms_type = NMSType.CPU_NMS
    elif args.nms_type == 'GPU_NMS':
        nms_type = NMSType.GPU_NMS
    else:
        raise ValueError('Incorrect NMS type, not supported yet')
    nms = NMSWrapper(nms_type)
    cfg = tf.ConfigProto()
    cfg.gpu_options.allow_growth = True
    sess = tf.Session(config=cfg)
    net = FasterRCNNSlim()
    saver = tf.train.Saver()
    saver.restore(sess, args.model)
    result = {}
    time_start = time.time()
    for idx, file in enumerate(files):
        elapsed = time.time() - time_start
        eta = (file_len - idx) * elapsed / idx if idx > 0 else 0
        print('[%d/%d] Elapsed: %s, ETA: %s >> %s' % (idx + 1, file_len, fmt_time(elapsed), fmt_time(eta), file))
        img = cv2.imread(file)
        if img is None:
            continue
        scores, boxes = detect(sess, net, img)
        boxes = boxes[:, 4:8]
        scores = scores[:, 1]
        keep = nms(np.hstack([boxes, scores[:, np.newaxis]]).astype(np.float32), args.nms_thresh)
        boxes = boxes[keep, :]
        scores = scores[keep]
        inds = np.where(scores >= args.conf_thresh)[0]
        scores = scores[inds]
        boxes = boxes[inds, :]
        result[file] = []
        for i in range(scores.shape[0]):
            x1, y1, x2, y2 = boxes[i, :].tolist()
            new_result = {'score': float(scores[i]),
                          'bbox': [x1, y1, x2, y2]}
            result[file].append(new_result)
            if args.output is None and args.crop_output_image_location is None:
                cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
            if args.crop_output_image_location:
                cropped_image = img[int(y1):int(y2), int(x1):int(x2)]
                if args.crop_width and args.crop_height:
                    cropped_image = cv2.resize(cropped_image, (args.crop_width, args.crop_height),
                                               interpolation=cv2.INTER_AREA)
                # os.path.join avoids requiring a trailing slash on the output folder
                cv2.imwrite(os.path.join(args.crop_output_image_location,
                                         str(args.start_output_number) + '.jpg'), cropped_image)
                args.start_output_number += 1
        if args.output:
            if ((idx + 1) % 1000) == 0:
                # saving the temporary result
                with open(args.output, 'w') as f:
                    json.dump(result, f)
        elif args.crop_output_image_location is None:
            cv2.imshow(file, img)
    if args.output:
        with open(args.output, 'w') as f:
            json.dump(result, f)
    else:
        cv2.waitKey()


if __name__ == '__main__':
    main()
================================================
FILE: make.bat
================================================
@echo off
if /i "%1" == "clean" goto clean
goto all
:all
python setup.py build_ext --inplace
rd /s /q build
goto exit
:clean
del /f /s /q *.cpp
del /f /s /q *.c
del /f /s /q *.pyd
goto exit
:exit
================================================
FILE: model/.gitignore
================================================
# all pre-trained models
*.index
*.data-00000-of-00001
*.meta
*.pkl
================================================
FILE: nms/.gitignore
================================================
*.c
*.cpp
================================================
FILE: nms/__init__.py
================================================
================================================
FILE: nms/cpu_nms.pyx
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import numpy as np
cimport numpy as np


cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
    return a if a >= b else b

cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
    return a if a <= b else b


def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh):
    cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
    cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
    cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
    cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
    cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
    cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    cdef np.ndarray[np.int64_t, ndim=1] order = scores.argsort()[::-1]
    cdef int ndets = dets.shape[0]
    cdef np.ndarray[np.int_t, ndim=1] suppressed = \
        np.zeros((ndets), dtype=np.int)
    # nominal indices
    cdef int _i, _j
    # sorted indices
    cdef int i, j
    # temp variables for box i's (the box currently under consideration)
    cdef np.float32_t ix1, iy1, ix2, iy2, iarea
    # variables for computing overlap with box j (lower scoring box)
    cdef np.float32_t xx1, yy1, xx2, yy2
    cdef np.float32_t w, h
    cdef np.float32_t inter, ovr
    keep = []
    for _i in range(ndets):
        i = order[_i]
        if suppressed[i] == 1:
            continue
        keep.append(i)
        ix1 = x1[i]
        iy1 = y1[i]
        ix2 = x2[i]
        iy2 = y2[i]
        iarea = areas[i]
        for _j in range(_i + 1, ndets):
            j = order[_j]
            if suppressed[j] == 1:
                continue
            xx1 = max(ix1, x1[j])
            yy1 = max(iy1, y1[j])
            xx2 = min(ix2, x2[j])
            yy2 = min(iy2, y2[j])
            w = max(0.0, xx2 - xx1 + 1)
            h = max(0.0, yy2 - yy1 + 1)
            inter = w * h
            ovr = inter / (iarea + areas[j] - inter)
            if ovr >= thresh:
                suppressed[j] = 1
    return keep
================================================
FILE: nms/gpu_nms.hpp
================================================
void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
int boxes_dim, float nms_overlap_thresh, int device_id);
================================================
FILE: nms/gpu_nms.pyx
================================================
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import numpy as np
cimport numpy as np

assert sizeof(int) == sizeof(np.int32_t)

cdef extern from "gpu_nms.hpp":
    void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int)


def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh,
            np.int32_t device_id=0):
    cdef int boxes_num = dets.shape[0]
    cdef int boxes_dim = dets.shape[1]
    cdef int num_out
    cdef np.ndarray[np.int32_t, ndim=1] \
        keep = np.zeros(boxes_num, dtype=np.int32)
    cdef np.ndarray[np.float32_t, ndim=1] \
        scores = dets[:, 4]
    cdef np.ndarray[np.int64_t, ndim=1] \
        order = scores.argsort()[::-1]
    cdef np.ndarray[np.float32_t, ndim=2] \
        sorted_dets = dets[order, :]
    _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id)
    keep = keep[:num_out]
    return list(order[keep])
================================================
FILE: nms/nms_kernel.cu
================================================
// ------------------------------------------------------------------
// Faster R-CNN
// Copyright (c) 2015 Microsoft
// Licensed under The MIT License [see fast-rcnn/LICENSE for details]
// Written by Shaoqing Ren
// ------------------------------------------------------------------
#include "gpu_nms.hpp"
#include <vector>
#include <iostream>
#define CUDA_CHECK(condition) \
/* Code block avoids redefinition of cudaError_t error */ \
do { \
cudaError_t error = condition; \
if (error != cudaSuccess) { \
std::cout << cudaGetErrorString(error) << std::endl; \
} \
} while (0)
#define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))
int const threadsPerBlock = sizeof(unsigned long long) * 8;
__device__ inline float devIoU(float const * const a, float const * const b) {
float left = max(a[0], b[0]), right = min(a[2], b[2]);
float top = max(a[1], b[1]), bottom = min(a[3], b[3]);
float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);
float interS = width * height;
float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);
return interS / (Sa + Sb - interS);
}
__global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh,
const float *dev_boxes, unsigned long long *dev_mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
// if (row_start > col_start) return;
const int row_size =
min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
const int col_size =
min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);
__shared__ float block_boxes[threadsPerBlock * 5];
if (threadIdx.x < col_size) {
block_boxes[threadIdx.x * 5 + 0] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
block_boxes[threadIdx.x * 5 + 1] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
block_boxes[threadIdx.x * 5 + 2] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
block_boxes[threadIdx.x * 5 + 3] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
block_boxes[threadIdx.x * 5 + 4] =
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
}
__syncthreads();
if (threadIdx.x < row_size) {
const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
const float *cur_box = dev_boxes + cur_box_idx * 5;
int i = 0;
unsigned long long t = 0;
int start = 0;
if (row_start == col_start) {
start = threadIdx.x + 1;
}
for (i = start; i < col_size; i++) {
if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
t |= 1ULL << i;
}
}
const int col_blocks = DIVUP(n_boxes, threadsPerBlock);
dev_mask[cur_box_idx * col_blocks + col_start] = t;
}
}
void _set_device(int device_id) {
int current_device;
CUDA_CHECK(cudaGetDevice(&current_device));
if (current_device == device_id) {
return;
}
// The call to cudaSetDevice must come before any calls to Get, which
// may perform initialization using the GPU.
CUDA_CHECK(cudaSetDevice(device_id));
}
void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
int boxes_dim, float nms_overlap_thresh, int device_id) {
_set_device(device_id);
float* boxes_dev = NULL;
unsigned long long* mask_dev = NULL;
const int col_blocks = DIVUP(boxes_num, threadsPerBlock);
CUDA_CHECK(cudaMalloc(&boxes_dev,
boxes_num * boxes_dim * sizeof(float)));
CUDA_CHECK(cudaMemcpy(boxes_dev,
boxes_host,
boxes_num * boxes_dim * sizeof(float),
cudaMemcpyHostToDevice));
CUDA_CHECK(cudaMalloc(&mask_dev,
boxes_num * col_blocks * sizeof(unsigned long long)));
dim3 blocks(DIVUP(boxes_num, threadsPerBlock),
DIVUP(boxes_num, threadsPerBlock));
dim3 threads(threadsPerBlock);
nms_kernel<<<blocks, threads>>>(boxes_num,
nms_overlap_thresh,
boxes_dev,
mask_dev);
std::vector<unsigned long long> mask_host(boxes_num * col_blocks);
CUDA_CHECK(cudaMemcpy(&mask_host[0],
mask_dev,
sizeof(unsigned long long) * boxes_num * col_blocks,
cudaMemcpyDeviceToHost));
std::vector<unsigned long long> remv(col_blocks);
memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);
int num_to_keep = 0;
for (int i = 0; i < boxes_num; i++) {
int nblock = i / threadsPerBlock;
int inblock = i % threadsPerBlock;
if (!(remv[nblock] & (1ULL << inblock))) {
keep_out[num_to_keep++] = i;
unsigned long long *p = &mask_host[0] + i * col_blocks;
for (int j = nblock; j < col_blocks; j++) {
remv[j] |= p[j];
}
}
}
*num_out = num_to_keep;
CUDA_CHECK(cudaFree(boxes_dev));
CUDA_CHECK(cudaFree(mask_dev));
}
================================================
FILE: nms/py_cpu_nms.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import numpy as np
def py_cpu_nms(dets, thresh):
"""Pure Python NMS baseline."""
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(i)
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
w = np.maximum(0.0, xx2 - xx1 + 1)
h = np.maximum(0.0, yy2 - yy1 + 1)
inter = w * h
ovr = inter / (areas[i] + areas[order[1:]] - inter)
inds = np.where(ovr <= thresh)[0]
order = order[inds + 1]
return keep
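A quick usage sketch for the baseline above, with the function body repeated so the snippet runs standalone. The boxes and scores are made up; the two overlapping boxes have IoU of roughly 0.70, so the lower-scored one is suppressed at a 0.5 threshold but survives a looser one:

```python
import numpy as np

def py_cpu_nms(dets, thresh):  # copied from above so the snippet is self-contained
    x1, y1, x2, y2 = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3]
    scores = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # process boxes from highest score down
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[np.where(ovr <= thresh)[0] + 1]  # drop overlapping boxes
    return keep

# dets columns: [x1, y1, x2, y2, score]; the values below are arbitrary.
dets = np.array([
    [ 0.0,  0.0, 10.0, 10.0, 0.9],   # top score, always kept
    [ 1.0,  1.0, 11.0, 11.0, 0.8],   # IoU ~0.70 with box 0 -> suppressed at 0.5
    [50.0, 50.0, 60.0, 60.0, 0.7],   # disjoint -> kept
])
assert py_cpu_nms(dets, thresh=0.5) == [0, 2]
assert py_cpu_nms(dets, thresh=0.75) == [0, 1, 2]
```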
================================================
FILE: nms_wrapper.py
================================================
from enum import Enum
class NMSType(Enum):
PY_NMS = 1
CPU_NMS = 2
GPU_NMS = 3
default_nms_type = NMSType.PY_NMS
class NMSWrapper:
def __init__(self, nms_type=default_nms_type):
assert type(nms_type) == NMSType
if nms_type == NMSType.PY_NMS:
from nms.py_cpu_nms import py_cpu_nms
self._nms = py_cpu_nms
elif nms_type == NMSType.CPU_NMS:
from nms.cpu_nms import cpu_nms
self._nms = cpu_nms
elif nms_type == NMSType.GPU_NMS:
from nms.gpu_nms import gpu_nms
self._nms = gpu_nms
else:
raise ValueError('current nms type is not implemented yet')
def __call__(self, *args, **kwargs):
return self._nms(*args, **kwargs)

================================================
FILE: setup.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import os
from os.path import join as pjoin
import numpy as np
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import sys
# Obtain the numpy include directory. This logic works across numpy versions.
try:
numpy_include = np.get_include()
except AttributeError:
numpy_include = np.get_numpy_include()
# custom build_ext hook; currently just delegates to the stock build_extensions
class custom_build_ext(build_ext):
def build_extensions(self):
build_ext.build_extensions(self)
ext_modules = [
Extension(
"nms.cpu_nms",
["nms/cpu_nms.pyx"],
extra_compile_args=["-Wno-cpp", "-Wno-unused-function"] if sys.platform == 'linux' else [],
include_dirs = [numpy_include]
)
]
setup(
name='tf_faster_rcnn',
ext_modules=ext_modules,
# inject our custom trigger
cmdclass={'build_ext': custom_build_ext},
)
================================================
FILE: tf_contrib/README.md
================================================
Since TensorFlow 2.0 no longer ships the contrib module, I decided to freeze the related contrib code into this directory.
The files in this directory are extracted from the tensorflow.contrib module of version 1.15.3, with imports fixed and some unused code that caused errors removed.
================================================
FILE: tf_contrib/arg_scope.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the arg_scope used for scoping layers arguments.
Allows one to define models much more compactly by eliminating boilerplate
code. This is accomplished through the use of argument scoping (arg_scope).
Example of how to use tf.contrib.framework.arg_scope:
```
from third_party.tensorflow.contrib.layers.python import layers
arg_scope = tf.contrib.framework.arg_scope
with arg_scope([layers.conv2d], padding='SAME',
initializer=layers.variance_scaling_initializer(),
regularizer=layers.l2_regularizer(0.05)):
net = layers.conv2d(inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
net = layers.conv2d(net, 256, [5, 5], scope='conv2')
```
The first call to conv2d will behave as follows:
layers.conv2d(inputs, 64, [11, 11], 4, padding='VALID',
initializer=layers.variance_scaling_initializer(),
regularizer=layers.l2_regularizer(0.05), scope='conv1')
The second call to conv2d will also use the arg_scope's default for padding:
layers.conv2d(inputs, 256, [5, 5], padding='SAME',
initializer=layers.variance_scaling_initializer(),
regularizer=layers.l2_regularizer(0.05), scope='conv2')
Example of how to reuse an arg_scope:
```
with arg_scope([layers.conv2d], padding='SAME',
initializer=layers.variance_scaling_initializer(),
regularizer=layers.l2_regularizer(0.05)) as sc:
net = layers.conv2d(net, 256, [5, 5], scope='conv1')
....
with arg_scope(sc):
net = layers.conv2d(net, 256, [5, 5], scope='conv2')
```
Example of how to use tf.contrib.framework.add_arg_scope to enable your
function to be called within an arg_scope later:
@tf.contrib.framework.add_arg_scope
def conv2d(*args, **kwargs):
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from tensorflow.python.util import tf_contextlib
from tensorflow.python.util import tf_decorator
__all__ = [
'arg_scope', 'add_arg_scope', 'current_arg_scope', 'has_arg_scope',
'arg_scoped_arguments', 'arg_scope_func_key'
]
_ARGSTACK = [{}]
_DECORATED_OPS = {}
def _get_arg_stack():
if _ARGSTACK:
return _ARGSTACK
else:
_ARGSTACK.append({})
return _ARGSTACK
def current_arg_scope():
stack = _get_arg_stack()
return stack[-1]
def arg_scope_func_key(op):
return getattr(op, '_key_op', str(op))
def _name_op(op):
return (op.__module__, op.__name__)
def _kwarg_names(func):
kwargs_length = len(func.__defaults__) if func.__defaults__ else 0
return func.__code__.co_varnames[-kwargs_length:func.__code__.co_argcount]
def _add_op(op):
key_op = arg_scope_func_key(op)
_DECORATED_OPS[key_op] = _kwarg_names(op)
@tf_contextlib.contextmanager
def arg_scope(list_ops_or_scope, **kwargs):
"""Stores the default arguments for the given set of list_ops.
For usage, please see examples at top of the file.
Args:
list_ops_or_scope: List or tuple of operations to set argument scope for or
a dictionary containing the current scope. When list_ops_or_scope is a
dict, kwargs must be empty. When list_ops_or_scope is a list or tuple,
then every op in it needs to be decorated with @add_arg_scope to work.
**kwargs: keyword=value that will define the defaults for each op in
list_ops. All the ops need to accept the given set of arguments.
Yields:
the current_scope, which is a dictionary of {op: {arg: value}}
Raises:
TypeError: if list_ops is not a list or a tuple.
ValueError: if any op in list_ops has not been decorated with @add_arg_scope.
"""
if isinstance(list_ops_or_scope, dict):
# Assumes that list_ops_or_scope is a scope that is being reused.
if kwargs:
raise ValueError('When attempting to re-use a scope by supplying a '
'dictionary, kwargs must be empty.')
current_scope = list_ops_or_scope.copy()
try:
_get_arg_stack().append(current_scope)
yield current_scope
finally:
_get_arg_stack().pop()
else:
# Assumes that list_ops_or_scope is a list/tuple of ops with kwargs.
if not isinstance(list_ops_or_scope, (list, tuple)):
raise TypeError('list_ops_or_scope must either be a list/tuple or reused '
'scope (i.e. dict)')
try:
current_scope = current_arg_scope().copy()
for op in list_ops_or_scope:
key = arg_scope_func_key(op)
if not has_arg_scope(op):
raise ValueError('%s is not decorated with @add_arg_scope' %
(_name_op(op),))
if key in current_scope:
current_kwargs = current_scope[key].copy()
current_kwargs.update(kwargs)
current_scope[key] = current_kwargs
else:
current_scope[key] = kwargs.copy()
_get_arg_stack().append(current_scope)
yield current_scope
finally:
_get_arg_stack().pop()
def add_arg_scope(func):
"""Decorates a function with args so it can be used within an arg_scope.
Args:
func: function to decorate.
Returns:
The decorated function func_with_args().
"""
def func_with_args(*args, **kwargs):
current_scope = current_arg_scope()
current_args = kwargs
key_func = arg_scope_func_key(func)
if key_func in current_scope:
current_args = current_scope[key_func].copy()
current_args.update(kwargs)
return func(*args, **current_args)
_add_op(func)
setattr(func_with_args, '_key_op', arg_scope_func_key(func))
return tf_decorator.make_decorator(func, func_with_args)
def has_arg_scope(func):
"""Checks whether a func has been decorated with @add_arg_scope or not.
Args:
func: function to check.
Returns:
a boolean.
"""
return arg_scope_func_key(func) in _DECORATED_OPS
def arg_scoped_arguments(func):
"""Returns the list kwargs that arg_scope can set for a func.
Args:
func: function which has been decorated with @add_arg_scope.
Returns:
a list of kwargs names.
"""
assert has_arg_scope(func)
return _DECORATED_OPS[arg_scope_func_key(func)]
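The mechanism above is a stack of `{op: {arg: value}}` dicts plus a decorator that merges scoped defaults into each call's kwargs. A TensorFlow-free sketch of the same idea (all names here, `scope_stack`, `scoped`, `defaults`, are illustrative stand-ins for `_ARGSTACK`, `add_arg_scope`, and `arg_scope`, not part of any real API):

```python
import contextlib
import functools

scope_stack = [{}]  # mirrors _ARGSTACK: the innermost scope is scope_stack[-1]

def scoped(func):
    """Counterpart of @add_arg_scope: inject scoped defaults at call time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        merged = dict(scope_stack[-1].get(func.__name__, {}))
        merged.update(kwargs)  # explicit call-site kwargs win, as in arg_scope
        return func(*args, **merged)
    return wrapper

@contextlib.contextmanager
def defaults(funcs, **kwargs):
    """Counterpart of arg_scope([ops], **kwargs): push a merged scope."""
    scope = {k: dict(v) for k, v in scope_stack[-1].items()}
    for f in funcs:
        scope.setdefault(f.__name__, {}).update(kwargs)
    scope_stack.append(scope)
    try:
        yield scope
    finally:
        scope_stack.pop()

@scoped
def conv(inputs, filters, padding='VALID'):
    return (inputs, filters, padding)

with defaults([conv], padding='SAME'):
    assert conv('x', 64) == ('x', 64, 'SAME')                     # scoped default
    assert conv('x', 64, padding='VALID') == ('x', 64, 'VALID')   # explicit override
assert conv('x', 64) == ('x', 64, 'VALID')                        # scope popped
```

The real implementation additionally keys ops by `_key_op` (so decorated wrappers and originals share one entry) and validates decoration via `_DECORATED_OPS`; the sketch keys by function name for brevity.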
================================================
FILE: tf_contrib/initializers.py
================================================
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Weight initializers for use with layers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from tensorflow.python.framework import dtypes
from tensorflow.python.ops import random_ops
__all__ = ['xavier_initializer', 'xavier_initializer_conv2d',
'variance_scaling_initializer']
def xavier_initializer(uniform=True, seed=None, dtype=dtypes.float32):
"""Returns an initializer performing "Xavier" initialization for weights.
This function implements the weight initialization from:
Xavier Glorot and Yoshua Bengio (2010):
[Understanding the difficulty of training deep feedforward neural
networks. International conference on artificial intelligence and
statistics.](
http://www.jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)
This initializer is designed to keep the scale of the gradients roughly the
same in all layers. In uniform distribution this ends up being the range:
`x = sqrt(6. / (in + out)); [-x, x]` and for normal distribution a standard
deviation of `sqrt(2. / (in + out))` is used.
Args:
uniform: Whether to use uniform or normal distributed random initialization.
seed: A Python integer. Used to create random seeds. See
`tf.compat.v1.set_random_seed` for behavior.
dtype: The data type. Only floating point types are supported.
Returns:
An initializer for a weight matrix.
"""
return variance_scaling_initializer(factor=1.0, mode='FAN_AVG',
uniform=uniform, seed=seed, dtype=dtype)
xavier_initializer_conv2d = xavier_initializer
def variance_scaling_initializer(factor=2.0, mode='FAN_IN', uniform=False,
seed=None, dtype=dtypes.float32):
"""Returns an initializer that generates tensors without scaling variance.
When initializing a deep network, it is in principle advantageous to keep
the scale of the input variance constant, so it does not explode or diminish
by reaching the final layer. This initializer uses the following formula:
```python
if mode == 'FAN_IN': # Count only number of input connections.
n = fan_in
elif mode == 'FAN_OUT': # Count only number of output connections.
n = fan_out
elif mode == 'FAN_AVG': # Average number of inputs and output connections.
n = (fan_in + fan_out)/2.0
truncated_normal(shape, 0.0, stddev=sqrt(factor / n))
```
* To get [Delving Deep into Rectifiers](
http://arxiv.org/pdf/1502.01852v1.pdf) (also known as the "MSRA
initialization"), use (Default):<br/>
`factor=2.0 mode='FAN_IN' uniform=False`
* To get [Convolutional Architecture for Fast Feature Embedding](
http://arxiv.org/abs/1408.5093), use:<br/>
`factor=1.0 mode='FAN_IN' uniform=True`
* To get [Understanding the difficulty of training deep feedforward neural
networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf),
use:<br/>
`factor=1.0 mode='FAN_AVG' uniform=True.`
* To get `xavier_initializer` use either:<br/>
`factor=1.0 mode='FAN_AVG' uniform=True`, or<br/>
`factor=1.0 mode='FAN_AVG' uniform=False`.
Args:
factor: Float. A multiplicative factor.
mode: String. 'FAN_IN', 'FAN_OUT', 'FAN_AVG'.
uniform: Whether to use uniform or normal distributed random initialization.
seed: A Python integer. Used to create random seeds. See
`tf.compat.v1.set_random_seed` for behavior.
dtype: The data type. Only floating point types are supported.
Returns:
An initializer that generates tensors with unit variance.
Raises:
ValueError: if `dtype` is not a floating point type.
TypeError: if `mode` is not in ['FAN_IN', 'FAN_OUT', 'FAN_AVG'].
"""
if not dtype.is_floating:
raise TypeError('Cannot create initializer for non-floating point type.')
if mode not in ['FAN_IN', 'FAN_OUT', 'FAN_AVG']:
raise TypeError('Unknown mode %s [FAN_IN, FAN_OUT, FAN_AVG]' % mode)
# pylint: disable=unused-argument
def _initializer(shape, dtype=dtype, partition_info=None):
"""Initializer function."""
if not dtype.is_floating:
raise TypeError('Cannot create initializer for non-floating point type.')
# Estimating fan_in and fan_out is not possible to do perfectly, but we try.
# This is the right thing for matrix multiply and convolutions.
if shape:
fan_in = float(shape[-2]) if len(shape) > 1 else float(shape[-1])
fan_out = float(shape[-1])
else:
fan_in = 1.0
fan_out = 1.0
for dim in shape[:-2]:
fan_in *= float(dim)
fan_out *= float(dim)
if mode == 'FAN_IN':
# Count only number of input connections.
n = fan_in
elif mode == 'FAN_OUT':
# Count only number of output connections.
n = fan_out
elif mode == 'FAN_AVG':
# Average number of inputs and output connections.
n = (fan_in + fan_out) / 2.0
if uniform:
# To get stddev = math.sqrt(factor / n) need to adjust for uniform.
limit = math.sqrt(3.0 * factor / n)
return random_ops.random_uniform(shape, -limit, limit,
dtype, seed=seed)
else:
# To get stddev = math.sqrt(factor / n) need to adjust for truncated.
trunc_stddev = math.sqrt(1.3 * factor / n)
return random_ops.truncated_normal(shape, 0.0, trunc_stddev, dtype,
seed=seed)
# pylint: enable=unused-argument
return _initializer
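A worked example of the fan computation in `_initializer` above, for a Conv2D kernel shape `[k_h, k_w, in_ch, out_ch]` (the shape values are arbitrary examples): `fan_in` is `in_ch` times the spatial kernel size, `fan_out` is `out_ch` times the spatial kernel size, and the Xavier settings (`factor=1.0`, `mode='FAN_AVG'`) average the two.

```python
import math

# Conv2D kernel shape [k_h, k_w, in_ch, out_ch]; example numbers only.
shape = (3, 3, 64, 128)
fan_in = shape[-2] * shape[0] * shape[1]    # 64 * 3 * 3 = 576
fan_out = shape[-1] * shape[0] * shape[1]   # 128 * 3 * 3 = 1152

# xavier_initializer == variance_scaling(factor=1.0, mode='FAN_AVG'):
n = (fan_in + fan_out) / 2.0                # 864.0
uniform_limit = math.sqrt(3.0 * 1.0 / n)    # samples drawn from [-limit, limit]
trunc_stddev = math.sqrt(1.3 * 1.0 / n)     # pre-truncation stddev, as in the code
```

The `3.0` and `1.3` factors are the corrections the code applies so that both the uniform and the truncated-normal draws end up with variance `factor / n`.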
================================================
FILE: tf_contrib/layers.py
================================================
# -*- coding: utf-8 -*-
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# pylint: disable=g-short-docstring-punctuation
"""Higher level ops for building layers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import six
from .arg_scope import add_arg_scope
from . import variables
from . import initializers
from . import utils
from tensorflow.python.eager import context
from tensorflow.python.framework import constant_op
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import function
from tensorflow.python.framework import ops
from tensorflow.python.framework import sparse_tensor
from tensorflow.python.framework import tensor_shape
from tensorflow.python.keras.engine import input_spec
from tensorflow.python.layers import base
from tensorflow.python.layers import convolutional as convolutional_layers
from tensorflow.python.layers import core as core_layers
from tensorflow.python.layers import normalization as normalization_layers
from tensorflow.python.layers import pooling as pooling_layers
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import check_ops
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import linalg_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn
from tensorflow.python.ops import sparse_ops
from tensorflow.python.ops import standard_ops
from tensorflow.python.ops import variable_scope
from tensorflow.python.ops import variables as tf_variables
from tensorflow.python.training import moving_averages
# TODO(b/28426988): Replace legacy_* fns migrated from slim.
# TODO(b/28426988): Remove legacy_* when all uses have migrated to new API.
__all__ = [
'avg_pool2d', 'avg_pool3d', 'batch_norm', 'bias_add', 'conv1d', 'conv2d',
'conv3d', 'conv2d_in_plane', 'conv2d_transpose', 'conv3d_transpose',
'convolution', 'convolution1d', 'convolution2d', 'convolution2d_in_plane',
'convolution2d_transpose', 'convolution3d', 'convolution3d_transpose',
'dense_to_sparse', 'dropout', 'elu', 'flatten', 'fully_connected',
'images_to_sequence', 'layer_norm', 'linear', 'pool', 'max_pool2d',
'max_pool3d', 'one_hot_encoding', 'relu', 'relu6', 'repeat',
'scale_gradient', 'separable_conv2d', 'separable_convolution2d',
'sequence_to_images', 'softmax', 'spatial_softmax', 'stack', 'unit_norm',
'legacy_fully_connected', 'legacy_linear', 'legacy_relu', 'maxout'
]
DATA_FORMAT_NCHW = 'NCHW'
DATA_FORMAT_NHWC = 'NHWC'
DATA_FORMAT_NCDHW = 'NCDHW'
DATA_FORMAT_NDHWC = 'NDHWC'
@add_arg_scope
def avg_pool2d(inputs,
kernel_size,
stride=2,
padding='VALID',
data_format=DATA_FORMAT_NHWC,
outputs_collections=None,
scope=None):
"""Adds a 2D average pooling op.
It is assumed that the pooling is done per image but not in batch or channels.
Args:
inputs: A 4-D tensor of shape `[batch_size, height, width, channels]` if
`data_format` is `NHWC`, and `[batch_size, channels, height, width]` if
`data_format` is `NCHW`.
kernel_size: A list of length 2: [kernel_height, kernel_width] of the
pooling kernel over which the op is computed. Can be an int if both values
are the same.
stride: A list of length 2: [stride_height, stride_width]. Can be an int if
both strides are the same. Note that presently both strides must have the
same value.
padding: The padding method, either 'VALID' or 'SAME'.
data_format: A string. `NHWC` (default) and `NCHW` are supported.
outputs_collections: The collections to which the outputs are added.
scope: Optional scope for name_scope.
Returns:
A `Tensor` representing the results of the pooling operation.
Raises:
ValueError: If `data_format` is neither `NHWC` nor `NCHW`.
"""
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
with ops.name_scope(scope, 'AvgPool2D', [inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
layer = pooling_layers.AveragePooling2D(
pool_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
_scope=sc)
outputs = layer.apply(inputs)
return utils.collect_named_outputs(outputs_collections, sc, outputs)
@add_arg_scope
def avg_pool3d(inputs,
kernel_size,
stride=2,
padding='VALID',
data_format=DATA_FORMAT_NDHWC,
outputs_collections=None,
scope=None):
"""Adds a 3D average pooling op.
It is assumed that the pooling is done per image but not in batch or channels.
Args:
inputs: A 5-D tensor of shape `[batch_size, depth, height, width, channels]`
if `data_format` is `NDHWC`, and `[batch_size, channels, depth, height,
width]` if `data_format` is `NCDHW`.
kernel_size: A list of length 3: [kernel_depth, kernel_height, kernel_width]
of the pooling kernel over which the op is computed. Can be an int if both
values are the same.
stride: A list of length 3: [stride_depth, stride_height, stride_width]. Can
be an int if both strides are the same. Note that presently both strides
must have the same value.
padding: The padding method, either 'VALID' or 'SAME'.
data_format: A string. `NDHWC` (default) and `NCDHW` are supported.
outputs_collections: The collections to which the outputs are added.
scope: Optional scope for name_scope.
Returns:
A `Tensor` representing the results of the pooling operation.
Raises:
ValueError: If `data_format` is neither `NDHWC` nor `NCDHW`.
"""
if data_format not in (DATA_FORMAT_NCDHW, DATA_FORMAT_NDHWC):
raise ValueError('data_format has to be either NCDHW or NDHWC.')
with ops.name_scope(scope, 'AvgPool3D', [inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
layer = pooling_layers.AveragePooling3D(
pool_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
_scope=sc)
outputs = layer.apply(inputs)
return utils.collect_named_outputs(outputs_collections, sc, outputs)
def _fused_batch_norm(inputs,
decay=0.999,
center=True,
scale=False,
epsilon=0.001,
activation_fn=None,
param_initializers=None,
param_regularizers=None,
updates_collections=ops.GraphKeys.UPDATE_OPS,
is_training=True,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
data_format=DATA_FORMAT_NHWC,
zero_debias_moving_mean=False,
scope=None):
"""Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.
"Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift"
Sergey Ioffe, Christian Szegedy
Can be used as a normalizer function for conv2d and fully_connected.
Note: when training, the moving_mean and moving_variance need to be updated.
By default the update ops are placed in `tf.GraphKeys.UPDATE_OPS`, so they
need to be added as a dependency to the `train_op`. For example:
```python
update_ops = tf.compat.v1.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = optimizer.minimize(loss)
```
One can set updates_collections=None to force the updates in place, but that
can have a speed penalty, especially in distributed settings.
Args:
inputs: A tensor with 2 or more dimensions, where the first dimension has
`batch_size`. The normalization is over all but the last dimension if
`data_format` is `NHWC` and the second dimension if `data_format` is
`NCHW`.
decay: Decay for the moving average. Reasonable values for `decay` are close
to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc.
Lower `decay` value (recommend trying `decay`=0.9) if model experiences
reasonably good training performance but poor validation and/or test
performance.
center: If True, add offset of `beta` to normalized tensor. If False,
`beta` is ignored.
scale: If True, multiply by `gamma`. If False, `gamma` is not used. When the
next layer is linear (also e.g. `nn.relu`), this can be disabled since the
scaling can be done by the next layer.
epsilon: Small float added to variance to avoid dividing by zero.
activation_fn: Activation function, default set to None to skip it and
maintain a linear activation.
param_initializers: Optional initializers for beta, gamma, moving mean and
moving variance.
param_regularizers: Optional regularizer for beta and gamma.
updates_collections: Collections to collect the update ops for computation.
The updates_ops need to be executed with the train_op. If None, a control
dependency would be added to make sure the updates are computed in place.
is_training: Whether or not the layer is in training mode. In training mode
it would accumulate the statistics of the moments into `moving_mean` and
`moving_variance` using an exponential moving average with the given
`decay`. When it is not in training mode then it would use the values of
the `moving_mean` and the `moving_variance`.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
data_format: A string. `NHWC` (default) and `NCHW` are supported.
zero_debias_moving_mean: Use zero_debias for moving_mean.
scope: Optional scope for `variable_scope`.
Returns:
A `Tensor` representing the output of the operation.
Raises:
ValueError: If `data_format` is neither `NHWC` nor `NCHW`.
ValueError: If the rank of `inputs` is undefined.
ValueError: If the rank of `inputs` is neither 2 or 4.
ValueError: If rank or `C` dimension of `inputs` is undefined.
"""
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
with variable_scope.variable_scope(
scope, 'BatchNorm', [inputs], reuse=reuse) as sc:
inputs = ops.convert_to_tensor(inputs)
original_shape = inputs.get_shape()
original_inputs = inputs
original_rank = original_shape.ndims
if original_rank is None:
raise ValueError('Inputs %s has undefined rank' % inputs.name)
elif original_rank not in [2, 4]:
raise ValueError('Inputs %s has unsupported rank.'
' Expected 2 or 4 but got %d' %
(inputs.name, original_rank))
if original_rank == 2:
channels = inputs.get_shape().dims[-1].value
if channels is None:
raise ValueError('`C` dimension must be known but is None')
new_shape = [-1, 1, 1, channels]
if data_format == DATA_FORMAT_NCHW:
new_shape = [-1, channels, 1, 1]
inputs = array_ops.reshape(inputs, new_shape)
inputs_shape = inputs.get_shape()
if data_format == DATA_FORMAT_NHWC:
params_shape = inputs_shape[-1:]
else:
params_shape = inputs_shape[1:2]
if not params_shape.is_fully_defined():
raise ValueError('Inputs %s has undefined `C` dimension %s.' %
(inputs.name, params_shape))
# Allocate parameters for the beta and gamma of the normalization.
beta_collections = utils.get_variable_collections(variables_collections,
'beta')
# Float32 required to avoid precision-loss when using fp16 input/output
variable_dtype = dtypes.float32
if not param_initializers:
param_initializers = {}
if not param_regularizers:
param_regularizers = {}
beta_regularizer = param_regularizers.get('beta')
gamma_regularizer = param_regularizers.get('gamma')
if center:
beta_initializer = param_initializers.get('beta',
init_ops.zeros_initializer())
beta = variables.model_variable(
'beta',
shape=params_shape,
dtype=variable_dtype,
initializer=beta_initializer,
regularizer=beta_regularizer,
collections=beta_collections,
trainable=trainable)
else:
beta = array_ops.constant(0.0, dtype=variable_dtype, shape=params_shape)
if scale:
gamma_collections = utils.get_variable_collections(
variables_collections, 'gamma')
gamma_initializer = param_initializers.get('gamma',
init_ops.ones_initializer())
gamma = variables.model_variable(
'gamma',
shape=params_shape,
dtype=variable_dtype,
initializer=gamma_initializer,
regularizer=gamma_regularizer,
collections=gamma_collections,
trainable=trainable)
else:
gamma = array_ops.constant(1.0, dtype=variable_dtype, shape=params_shape)
# Create moving_mean and moving_variance variables and add them to the
# appropriate collections. We disable variable partitioning while creating
# them, because assign_moving_average is not yet supported for partitioned
# variables (this needs to be handled carefully, as it may break
# the checkpoint backward compatibility).
with variable_scope.variable_scope(
variable_scope.get_variable_scope()) as local_scope:
local_scope.set_partitioner(None)
moving_mean_collections = utils.get_variable_collections(
variables_collections, 'moving_mean')
moving_mean_initializer = param_initializers.get(
'moving_mean', init_ops.zeros_initializer())
moving_mean = variables.model_variable(
'moving_mean',
shape=params_shape,
dtype=variable_dtype,
initializer=moving_mean_initializer,
trainable=False,
collections=moving_mean_collections)
moving_variance_collections = utils.get_variable_collections(
variables_collections, 'moving_variance')
moving_variance_initializer = param_initializers.get(
'moving_variance', init_ops.ones_initializer())
moving_variance = variables.model_variable(
'moving_variance',
shape=params_shape,
dtype=variable_dtype,
initializer=moving_variance_initializer,
trainable=False,
collections=moving_variance_collections)
def _fused_batch_norm_training():
return nn.fused_batch_norm(
inputs, gamma, beta, epsilon=epsilon, data_format=data_format)
def _fused_batch_norm_inference():
return nn.fused_batch_norm(
inputs,
gamma,
beta,
mean=moving_mean,
variance=moving_variance,
epsilon=epsilon,
is_training=False,
data_format=data_format)
outputs, mean, variance = utils.smart_cond(is_training,
_fused_batch_norm_training,
_fused_batch_norm_inference)
# If `is_training` doesn't have a constant value, because it is a `Tensor`,
# a `Variable` or `Placeholder` then is_training_value will be None and
# `need_updates` will be true.
is_training_value = utils.constant_value(is_training)
need_updates = is_training_value is None or is_training_value
if need_updates:
if updates_collections is None:
no_updates = lambda: outputs
def _force_updates():
"""Internal function forces updates moving_vars if is_training."""
update_moving_mean = moving_averages.assign_moving_average(
moving_mean, mean, decay, zero_debias=zero_debias_moving_mean)
update_moving_variance = moving_averages.assign_moving_average(
moving_variance, variance, decay, zero_debias=False)
with ops.control_dependencies(
[update_moving_mean, update_moving_variance]):
return array_ops.identity(outputs)
outputs = utils.smart_cond(is_training, _force_updates, no_updates)
else:
moving_vars_fn = lambda: (moving_mean, moving_variance)
def _delay_updates():
"""Internal function that delay updates moving_vars if is_training."""
update_moving_mean = moving_averages.assign_moving_average(
moving_mean, mean, decay, zero_debias=zero_debias_moving_mean)
update_moving_variance = moving_averages.assign_moving_average(
moving_variance, variance, decay, zero_debias=False)
return update_moving_mean, update_moving_variance
update_mean, update_variance = utils.smart_cond(is_training,
_delay_updates,
moving_vars_fn)
ops.add_to_collections(updates_collections, update_mean)
ops.add_to_collections(updates_collections, update_variance)
outputs.set_shape(inputs_shape)
if original_shape.ndims == 2:
outputs = array_ops.reshape(outputs, array_ops.shape(original_inputs))
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
@add_arg_scope
def batch_norm(inputs,
decay=0.999,
center=True,
scale=False,
epsilon=0.001,
activation_fn=None,
param_initializers=None,
param_regularizers=None,
updates_collections=ops.GraphKeys.UPDATE_OPS,
is_training=True,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
batch_weights=None,
fused=None,
data_format=DATA_FORMAT_NHWC,
zero_debias_moving_mean=False,
scope=None,
renorm=False,
renorm_clipping=None,
renorm_decay=0.99,
adjustment=None):
"""Adds a Batch Normalization layer from http://arxiv.org/abs/1502.03167.
"Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift"
Sergey Ioffe, Christian Szegedy
Can be used as a normalizer function for conv2d and fully_connected. The
normalization is over all but the last dimension if `data_format` is `NHWC`
and all but the second dimension if `data_format` is `NCHW`. In case of a 2D
tensor this corresponds to the batch dimension, while in case of a 4D tensor
this corresponds to the batch and space dimensions.
Note: when training, the moving_mean and moving_variance need to be updated.
By default the update ops are placed in `tf.GraphKeys.UPDATE_OPS`, so they
need to be added as a dependency to the `train_op`. For example:
```python
update_ops = tf.compat.v1.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = optimizer.minimize(loss)
```
One can set updates_collections=None to force the updates in place, but that
can have a speed penalty, especially in distributed settings.
Args:
inputs: A tensor with 2 or more dimensions, where the first dimension has
`batch_size`. The normalization is over all but the last dimension if
`data_format` is `NHWC` and the second dimension if `data_format` is
`NCHW`.
decay: Decay for the moving average. Reasonable values for `decay` are close
to 1.0, typically in the multiple-nines range: 0.999, 0.99, 0.9, etc.
Lower `decay` value (recommend trying `decay`=0.9) if model experiences
reasonably good training performance but poor validation and/or test
performance. Try zero_debias_moving_mean=True for improved stability.
center: If True, add offset of `beta` to normalized tensor. If False, `beta`
is ignored.
scale: If True, multiply by `gamma`. If False, `gamma` is not used. When the
next layer is linear (this also includes e.g. `nn.relu`), this can be
disabled since the scaling can be done by the next layer.
epsilon: Small float added to variance to avoid dividing by zero.
activation_fn: Activation function, default set to None to skip it and
maintain a linear activation.
param_initializers: Optional initializers for beta, gamma, moving mean and
moving variance.
param_regularizers: Optional regularizer for beta and gamma.
updates_collections: Collections to collect the update ops for computation.
The updates_ops need to be executed with the train_op. If None, a control
dependency would be added to make sure the updates are computed in place.
is_training: Whether or not the layer is in training mode. In training mode
it would accumulate the statistics of the moments into `moving_mean` and
`moving_variance` using an exponential moving average with the given
`decay`. When it is not in training mode then it would use the values of
the `moving_mean` and the `moving_variance`.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
batch_weights: An optional tensor of shape `[batch_size]`, containing a
frequency weight for each batch item. If present, then the batch
normalization uses weighted mean and variance. (This can be used to
correct for bias in training example selection.)
fused: if `None` or `True`, use a faster, fused implementation if possible.
If `False`, use the system recommended implementation.
data_format: A string. `NHWC` (default) and `NCHW` are supported.
zero_debias_moving_mean: Use zero_debias for moving_mean. It creates a new
pair of variables 'moving_mean/biased' and 'moving_mean/local_step'.
scope: Optional scope for `variable_scope`.
renorm: Whether to use Batch Renormalization
(https://arxiv.org/abs/1702.03275). This adds extra variables during
training. The inference is the same for either value of this parameter.
renorm_clipping: A dictionary that may map keys 'rmax', 'rmin', 'dmax' to
scalar `Tensors` used to clip the renorm correction. The correction `(r,
d)` is used as `corrected_value = normalized_value * r + d`, with `r`
clipped to [rmin, rmax], and `d` to [-dmax, dmax]. Missing rmax, rmin,
dmax are set to inf, 0, inf, respectively.
renorm_decay: Momentum used to update the moving means and standard
deviations with renorm. Unlike `momentum`, this affects training and
should be neither too small (which would add noise) nor too large (which
would give stale estimates). Note that `decay` is still applied to get the
means and variances for inference.
adjustment: A function taking the `Tensor` containing the (dynamic) shape of
the input tensor and returning a pair (scale, bias) to apply to the
normalized values (before gamma and beta), only during training. For
example,
`adjustment = lambda shape: (
tf.random.uniform(shape[-1:], 0.93, 1.07),
tf.random.uniform(shape[-1:], -0.1, 0.1))` will scale the normalized
value by up to 7% up or down, then shift the result by up to 0.1
(with independent scaling and bias for each feature but shared
across all examples), and finally apply gamma and/or beta. If
`None`, no adjustment is applied.
Returns:
A `Tensor` representing the output of the operation.
Raises:
ValueError: If `data_format` is neither `NHWC` nor `NCHW`.
ValueError: If the rank of `inputs` is undefined.
ValueError: If rank or channels dimension of `inputs` is undefined.
"""
if fused is None:
fused = True
# Only use _fused_batch_norm if all of the following three
# conditions are true:
# (1) fused is set True;
# (2) it is possible to use (currently it doesn't support batch weights,
# renorm, and the case when rank is neither 2 nor 4);
# (3) it is used with zero_debias_moving_mean, or an input shape of rank 2,
# or non-default updates_collections (not implemented in
# normalization_layers.BatchNormalization yet); otherwise use the fused
# implementation in normalization_layers.BatchNormalization.
inputs = ops.convert_to_tensor(inputs)
rank = inputs.get_shape().ndims
possible_to_fuse = (
batch_weights is None and not renorm and rank in [2, 4] and
adjustment is None)
if fused and possible_to_fuse and (
zero_debias_moving_mean or rank == 2 or
updates_collections is not ops.GraphKeys.UPDATE_OPS):
return _fused_batch_norm(
inputs,
decay=decay,
center=center,
scale=scale,
epsilon=epsilon,
activation_fn=activation_fn,
param_initializers=param_initializers,
param_regularizers=param_regularizers,
updates_collections=updates_collections,
is_training=is_training,
reuse=reuse,
variables_collections=variables_collections,
outputs_collections=outputs_collections,
trainable=trainable,
data_format=data_format,
zero_debias_moving_mean=zero_debias_moving_mean,
scope=scope)
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
layer_variable_getter = _build_variable_getter()
with variable_scope.variable_scope(
scope,
'BatchNorm', [inputs],
reuse=reuse,
custom_getter=layer_variable_getter) as sc:
inputs = ops.convert_to_tensor(inputs)
# Determine whether we can use the core layer class.
if (batch_weights is None and
updates_collections is ops.GraphKeys.UPDATE_OPS and
not zero_debias_moving_mean):
# Use the core layer class.
axis = 1 if data_format == DATA_FORMAT_NCHW else -1
if not param_initializers:
param_initializers = {}
beta_initializer = param_initializers.get('beta',
init_ops.zeros_initializer())
gamma_initializer = param_initializers.get('gamma',
init_ops.ones_initializer())
moving_mean_initializer = param_initializers.get(
'moving_mean', init_ops.zeros_initializer())
moving_variance_initializer = param_initializers.get(
'moving_variance', init_ops.ones_initializer())
if not param_regularizers:
param_regularizers = {}
beta_regularizer = param_regularizers.get('beta')
gamma_regularizer = param_regularizers.get('gamma')
layer = normalization_layers.BatchNormalization(
axis=axis,
momentum=decay,
epsilon=epsilon,
center=center,
scale=scale,
beta_initializer=beta_initializer,
gamma_initializer=gamma_initializer,
moving_mean_initializer=moving_mean_initializer,
moving_variance_initializer=moving_variance_initializer,
beta_regularizer=beta_regularizer,
gamma_regularizer=gamma_regularizer,
trainable=trainable,
renorm=renorm,
renorm_clipping=renorm_clipping,
renorm_momentum=renorm_decay,
adjustment=adjustment,
name=sc.name,
_scope=sc,
_reuse=reuse,
fused=fused)
outputs = layer.apply(inputs, training=is_training)
# Add variables to collections.
_add_variable_to_collections(layer.moving_mean, variables_collections,
'moving_mean')
_add_variable_to_collections(layer.moving_variance, variables_collections,
'moving_variance')
if layer.beta is not None:
_add_variable_to_collections(layer.beta, variables_collections, 'beta')
if layer.gamma is not None:
_add_variable_to_collections(layer.gamma, variables_collections,
'gamma')
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
# Not supported by layer class: batch_weights argument,
# and custom updates_collections. In that case, use the legacy BN
# implementation.
# Custom updates collections are not supported because the update logic
# is different in this case, in particular w.r.t. "forced updates" and
# update op reuse.
if renorm:
raise ValueError('renorm is not supported with batch_weights, '
'updates_collections or zero_debias_moving_mean')
inputs_shape = inputs.get_shape()
inputs_rank = inputs_shape.ndims
if inputs_rank is None:
raise ValueError('Inputs %s has undefined rank.' % inputs.name)
dtype = inputs.dtype.base_dtype
if batch_weights is not None:
batch_weights = ops.convert_to_tensor(batch_weights)
inputs_shape[0:1].assert_is_compatible_with(batch_weights.get_shape())
# Reshape batch weight values so they broadcast across inputs.
nshape = [-1] + [1 for _ in range(inputs_rank - 1)]
batch_weights = array_ops.reshape(batch_weights, nshape)
if data_format == DATA_FORMAT_NCHW:
moments_axes = [0] + list(range(2, inputs_rank))
params_shape = inputs_shape[1:2]
# For NCHW format, rather than relying on implicit broadcasting, we
# explicitly reshape the params to params_shape_broadcast when computing
# the moments and the batch normalization.
params_shape_broadcast = list([1, inputs_shape.dims[1].value] +
[1 for _ in range(2, inputs_rank)])
else:
moments_axes = list(range(inputs_rank - 1))
params_shape = inputs_shape[-1:]
params_shape_broadcast = None
if not params_shape.is_fully_defined():
raise ValueError('Inputs %s has undefined channels dimension %s.' %
(inputs.name, params_shape))
# Allocate parameters for the beta and gamma of the normalization.
beta, gamma = None, None
if not param_initializers:
param_initializers = {}
if center:
beta_collections = utils.get_variable_collections(variables_collections,
'beta')
beta_initializer = param_initializers.get('beta',
init_ops.zeros_initializer())
beta = variables.model_variable(
'beta',
shape=params_shape,
dtype=dtype,
initializer=beta_initializer,
collections=beta_collections,
trainable=trainable)
if scale:
gamma_collections = utils.get_variable_collections(
variables_collections, 'gamma')
gamma_initializer = param_initializers.get('gamma',
init_ops.ones_initializer())
gamma = variables.model_variable(
'gamma',
shape=params_shape,
dtype=dtype,
initializer=gamma_initializer,
collections=gamma_collections,
trainable=trainable)
# Create moving_mean and moving_variance variables and add them to the
# appropriate collections. We disable variable partitioning while creating
# them, because assign_moving_average is not yet supported for partitioned
# variables (this needs to be handled carefully, as it may break
# the checkpoint backward compatibility).
with variable_scope.variable_scope(
variable_scope.get_variable_scope()) as local_scope:
local_scope.set_partitioner(None)
moving_mean_collections = utils.get_variable_collections(
variables_collections, 'moving_mean')
moving_mean_initializer = param_initializers.get(
'moving_mean', init_ops.zeros_initializer())
moving_mean = variables.model_variable(
'moving_mean',
shape=params_shape,
dtype=dtype,
initializer=moving_mean_initializer,
trainable=False,
collections=moving_mean_collections)
moving_variance_collections = utils.get_variable_collections(
variables_collections, 'moving_variance')
moving_variance_initializer = param_initializers.get(
'moving_variance', init_ops.ones_initializer())
moving_variance = variables.model_variable(
'moving_variance',
shape=params_shape,
dtype=dtype,
initializer=moving_variance_initializer,
trainable=False,
collections=moving_variance_collections)
# If `is_training` doesn't have a constant value, because it is a `Tensor`,
# a `Variable` or `Placeholder` then is_training_value will be None and
# `needs_moments` will be true.
is_training_value = utils.constant_value(is_training)
need_moments = is_training_value is None or is_training_value
if need_moments:
# Calculate the moments based on the individual batch.
if batch_weights is None:
if data_format == DATA_FORMAT_NCHW:
mean, variance = nn.moments(inputs, moments_axes, keep_dims=True)
mean = array_ops.reshape(mean, [-1])
variance = array_ops.reshape(variance, [-1])
else:
mean, variance = nn.moments(inputs, moments_axes)
else:
if data_format == DATA_FORMAT_NCHW:
mean, variance = nn.weighted_moments(
inputs, moments_axes, batch_weights, keepdims=True)
mean = array_ops.reshape(mean, [-1])
variance = array_ops.reshape(variance, [-1])
else:
mean, variance = nn.weighted_moments(inputs, moments_axes,
batch_weights)
moving_vars_fn = lambda: (moving_mean, moving_variance)
if updates_collections is None:
def _force_updates():
"""Internal function forces updates moving_vars if is_training."""
update_moving_mean = moving_averages.assign_moving_average(
moving_mean, mean, decay, zero_debias=zero_debias_moving_mean)
update_moving_variance = moving_averages.assign_moving_average(
moving_variance, variance, decay, zero_debias=False)
with ops.control_dependencies(
[update_moving_mean, update_moving_variance]):
return array_ops.identity(mean), array_ops.identity(variance)
mean, variance = utils.smart_cond(is_training, _force_updates,
moving_vars_fn)
else:
def _delay_updates():
"""Internal function that delay updates moving_vars if is_training."""
update_moving_mean = moving_averages.assign_moving_average(
moving_mean, mean, decay, zero_debias=zero_debias_moving_mean)
update_moving_variance = moving_averages.assign_moving_average(
moving_variance, variance, decay, zero_debias=False)
return update_moving_mean, update_moving_variance
update_mean, update_variance = utils.smart_cond(is_training,
_delay_updates,
moving_vars_fn)
ops.add_to_collections(updates_collections, update_mean)
ops.add_to_collections(updates_collections, update_variance)
# Use computed moments during training and moving_vars otherwise.
vars_fn = lambda: (mean, variance)
mean, variance = utils.smart_cond(is_training, vars_fn, moving_vars_fn)
else:
mean, variance = moving_mean, moving_variance
if data_format == DATA_FORMAT_NCHW:
mean = array_ops.reshape(mean, params_shape_broadcast)
variance = array_ops.reshape(variance, params_shape_broadcast)
if beta is not None:
beta = array_ops.reshape(beta, params_shape_broadcast)
if gamma is not None:
gamma = array_ops.reshape(gamma, params_shape_broadcast)
# Compute batch_normalization.
outputs = nn.batch_normalization(inputs, mean, variance, beta, gamma,
epsilon)
outputs.set_shape(inputs_shape)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
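The training/inference branching and the `assign_moving_average` update above can be summarized in a small NumPy sketch. This is illustrative only: `batch_norm_sketch` is not part of this module, and it omits beta/gamma, batch weights, and zero-debiasing.

```python
import numpy as np

def batch_norm_sketch(x, moving_mean, moving_var, decay=0.999,
                      epsilon=0.001, is_training=True):
    """Normalize x over the batch axis and update moving statistics.

    Mirrors the non-fused path: in training mode the batch moments are
    used for normalization and the moving averages are updated in place
    with `moving -= (moving - batch_stat) * (1 - decay)`; in inference
    mode the stored moving statistics are used instead.
    """
    if is_training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # assign_moving_average semantics (without zero-debiasing).
        moving_mean -= (moving_mean - mean) * (1 - decay)
        moving_var -= (moving_var - var) * (1 - decay)
    else:
        mean, var = moving_mean, moving_var
    return (x - mean) / np.sqrt(var + epsilon)
```

This is why the update ops must run during training: with `updates_collections` left at its default, the two in-place updates are deferred to `tf.GraphKeys.UPDATE_OPS` rather than executed as a side effect of the forward pass.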
@add_arg_scope
def bias_add(inputs,
activation_fn=None,
initializer=init_ops.zeros_initializer(),
regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
data_format=DATA_FORMAT_NHWC,
scope=None):
"""Adds a bias to the inputs.
Can be used as a normalizer function for conv2d and fully_connected.
Args:
inputs: A tensor with at least rank 2 and a known value for the last dimension,
e.g. `[batch_size, depth]`, `[None, None, None, depth]`.
activation_fn: Activation function, default set to None to skip it and
maintain a linear activation.
initializer: An initializer for the bias, defaults to 0.
regularizer: A regularizer like the result of `l1_regularizer` or
`l2_regularizer`.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
data_format: A string. 'NHWC' and 'NCHW' are supported.
scope: Optional scope for variable_scope.
Returns:
A tensor representing the result of adding biases to the inputs.
Raises:
ValueError: If `data_format` is neither `NHWC` nor `NCHW`.
ValueError: If `data_format` is `NCHW` and rank of `inputs` is not 4.
ValueError: If the rank of `inputs` is undefined.
ValueError: If rank or `C` dimension of `inputs` is undefined.
"""
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
with variable_scope.variable_scope(
scope, 'BiasAdd', [inputs], reuse=reuse) as sc:
inputs = ops.convert_to_tensor(inputs)
dtype = inputs.dtype.base_dtype
inputs_shape = inputs.get_shape()
inputs_rank = inputs_shape.ndims
if inputs_rank is None:
raise ValueError('Dims of shape must be known but is None')
elif inputs_rank != 4 and data_format == DATA_FORMAT_NCHW:
raise ValueError('Data format NCHW only supports 4D Tensor')
axis = 1 if data_format == DATA_FORMAT_NCHW else -1
num_features = inputs_shape.dims[axis].value
if num_features is None:
raise ValueError('`C` dimension must be known but is None')
biases_collections = utils.get_variable_collections(variables_collections,
'biases')
biases = variables.model_variable(
'biases',
shape=[
num_features,
],
dtype=dtype,
initializer=initializer,
regularizer=regularizer,
collections=biases_collections,
trainable=trainable)
outputs = nn.bias_add(inputs, biases, data_format=data_format)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
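The axis selection above (`axis = 1` for NCHW, `-1` for NHWC) amounts to a broadcasting rule that a short NumPy sketch can make concrete. `bias_add_sketch` is a hypothetical helper, not part of this module:

```python
import numpy as np

def bias_add_sketch(x, bias, data_format='NHWC'):
    """Add a per-channel bias of shape [C], matching the axis rule above.

    For NHWC the channel axis is the last one, so `bias` broadcasts
    directly; for NCHW (4-D tensors only) it must be reshaped to
    [1, C, 1, 1] before broadcasting.
    """
    if data_format == 'NCHW':
        return x + bias.reshape(1, -1, 1, 1)
    return x + bias
```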
# TODO(jbms): change `rate` parameter to `dilation_rate` for consistency with
# underlying op.
@add_arg_scope
def convolution(inputs,
num_outputs,
kernel_size,
stride=1,
padding='SAME',
data_format=None,
rate=1,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None,
conv_dims=None):
"""Adds an N-D convolution followed by an optional batch_norm layer.
It is required that 1 <= N <= 3.
`convolution` creates a variable called `weights`, representing the
convolutional kernel, that is convolved (actually cross-correlated) with the
`inputs` to produce a `Tensor` of activations. If a `normalizer_fn` is
provided (such as `batch_norm`), it is then applied. Otherwise, if
`normalizer_fn` is None and a `biases_initializer` is provided then a `biases`
variable would be created and added to the activations. Finally, if
`activation_fn` is not `None`, it is applied to the activations as well.
Performs atrous convolution with input stride/dilation rate equal to `rate`
if a value > 1 for any dimension of `rate` is specified. In this case
`stride` values != 1 are not supported.
Args:
inputs: A Tensor of rank N+2 of shape `[batch_size] + input_spatial_shape +
[in_channels]` if data_format does not start with "NC" (default), or
`[batch_size, in_channels] + input_spatial_shape` if data_format starts
with "NC".
num_outputs: Integer, the number of output filters.
kernel_size: A sequence of N positive integers specifying the spatial
dimensions of the filters. Can be a single integer to specify the same
value for all spatial dimensions.
stride: A sequence of N positive integers specifying the stride at which to
compute output. Can be a single integer to specify the same value for all
spatial dimensions. Specifying any `stride` value != 1 is incompatible
with specifying any `rate` value != 1.
padding: One of `"VALID"` or `"SAME"`.
data_format: A string or None. Specifies whether the channel dimension of
the `input` and output is the last dimension (default, or if `data_format`
does not start with "NC"), or the second dimension (if `data_format`
starts with "NC"). For N=1, the valid values are "NWC" (default) and
"NCW". For N=2, the valid values are "NHWC" (default) and "NCHW". For
N=3, the valid values are "NDHWC" (default) and "NCDHW".
rate: A sequence of N positive integers specifying the dilation rate to use
for atrous convolution. Can be a single integer to specify the same value
for all spatial dimensions. Specifying any `rate` value != 1 is
incompatible with specifying any `stride` value != 1.
activation_fn: Activation function. The default value is a ReLU function.
Explicitly set it to None to skip it and maintain a linear activation.
normalizer_fn: Normalization function to use instead of `biases`. If
`normalizer_fn` is provided then `biases_initializer` and
`biases_regularizer` are ignored and `biases` are not created nor added.
Default is None for no normalizer function.
normalizer_params: Normalization function parameters.
weights_initializer: An initializer for the weights.
weights_regularizer: Optional regularizer for the weights.
biases_initializer: An initializer for the biases. If None skip biases.
biases_regularizer: Optional regularizer for the biases.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for all the variables or
a dictionary containing a different list of collection per variable.
outputs_collections: Collection to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
scope: Optional scope for `variable_scope`.
conv_dims: Optional convolution dimensionality. When set, the corresponding
convolution is used (e.g. 2 for Conv2D, 3 for Conv3D, ...). When left as
None, the convolution dimensionality is selected based on the input rank
(i.e. Conv ND, with N = input_rank - 2).
Returns:
A tensor representing the output of the operation.
Raises:
ValueError: If `data_format` is invalid.
ValueError: If both `rate` and `stride` are not uniformly 1.
"""
if data_format not in [None, 'NWC', 'NCW', 'NHWC', 'NCHW', 'NDHWC', 'NCDHW']:
raise ValueError('Invalid data_format: %r' % (data_format,))
layer_variable_getter = _build_variable_getter({
'bias': 'biases',
'kernel': 'weights'
})
with variable_scope.variable_scope(
scope, 'Conv', [inputs], reuse=reuse,
custom_getter=layer_variable_getter) as sc:
inputs = ops.convert_to_tensor(inputs)
input_rank = inputs.get_shape().ndims
if conv_dims is not None and conv_dims + 2 != input_rank:
raise ValueError('Convolution expects input with rank %d, got %d' %
(conv_dims + 2, input_rank))
if input_rank == 3:
layer_class = convolutional_layers.Convolution1D
elif input_rank == 4:
layer_class = convolutional_layers.Convolution2D
elif input_rank == 5:
layer_class = convolutional_layers.Convolution3D
else:
raise ValueError('Convolution not supported for input with rank %d' %
input_rank)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
layer = layer_class(
filters=num_outputs,
kernel_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
dilation_rate=rate,
activation=None,
use_bias=not normalizer_fn and biases_initializer,
kernel_initializer=weights_initializer,
bias_initializer=biases_initializer,
kernel_regularizer=weights_regularizer,
bias_regularizer=biases_regularizer,
activity_regularizer=None,
trainable=trainable,
name=sc.name,
dtype=inputs.dtype.base_dtype,
_scope=sc,
_reuse=reuse)
outputs = layer.apply(inputs)
# Add variables to collections.
_add_variable_to_collections(layer.kernel, variables_collections, 'weights')
if layer.use_bias:
_add_variable_to_collections(layer.bias, variables_collections, 'biases')
if normalizer_fn is not None:
normalizer_params = normalizer_params or {}
outputs = normalizer_fn(outputs, **normalizer_params)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
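The interaction of `padding`, `stride`, and `rate` above determines each spatial output dimension. A minimal sketch of the standard output-length formula (`conv_output_length` is an illustrative helper, not part of this module): with dilation, the effective kernel size is `rate * (kernel_size - 1) + 1`; 'SAME' pads so the output shrinks only by the stride, while 'VALID' uses no padding.

```python
import math

def conv_output_length(input_length, kernel_size, padding, stride=1, rate=1):
    """Spatial output length of a convolution along one dimension."""
    effective_k = rate * (kernel_size - 1) + 1
    if padding == 'SAME':
        # Padding is chosen so the output depends only on the stride.
        return math.ceil(input_length / stride)
    # 'VALID': only positions where the (dilated) kernel fully fits.
    return math.ceil((input_length - effective_k + 1) / stride)
```

This also shows why `stride != 1` and `rate != 1` are not combined here: the underlying atrous-convolution op only supports dilation at unit stride.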
@add_arg_scope
def convolution1d(inputs,
num_outputs,
kernel_size,
stride=1,
padding='SAME',
data_format=None,
rate=1,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
return convolution(
inputs,
num_outputs,
kernel_size,
stride,
padding,
data_format,
rate,
activation_fn,
normalizer_fn,
normalizer_params,
weights_initializer,
weights_regularizer,
biases_initializer,
biases_regularizer,
reuse,
variables_collections,
outputs_collections,
trainable,
scope,
conv_dims=1)
convolution1d.__doc__ = convolution.__doc__
@add_arg_scope
def convolution2d(inputs,
num_outputs,
kernel_size,
stride=1,
padding='SAME',
data_format=None,
rate=1,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
return convolution(
inputs,
num_outputs,
kernel_size,
stride,
padding,
data_format,
rate,
activation_fn,
normalizer_fn,
normalizer_params,
weights_initializer,
weights_regularizer,
biases_initializer,
biases_regularizer,
reuse,
variables_collections,
outputs_collections,
trainable,
scope,
conv_dims=2)
convolution2d.__doc__ = convolution.__doc__
@add_arg_scope
def convolution3d(inputs,
num_outputs,
kernel_size,
stride=1,
padding='SAME',
data_format=None,
rate=1,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
return convolution(
inputs,
num_outputs,
kernel_size,
stride,
padding,
data_format,
rate,
activation_fn,
normalizer_fn,
normalizer_params,
weights_initializer,
weights_regularizer,
biases_initializer,
biases_regularizer,
reuse,
variables_collections,
outputs_collections,
trainable,
scope,
conv_dims=3)
convolution3d.__doc__ = convolution.__doc__
@add_arg_scope
def convolution2d_in_plane(
inputs,
kernel_size,
stride=1,
padding='SAME',
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
"""Performs the same in-plane convolution to each channel independently.
This is useful for performing various simple channel-independent convolution
operations such as image gradients:
image = tf.constant(..., shape=(16, 240, 320, 3))
vert_gradients = layers.conv2d_in_plane(
image,
kernel_size=[2, 1],
weights_initializer=tf.constant_initializer([1, -1]))
horz_gradients = layers.conv2d_in_plane(
image,
kernel_size=[1, 2],
weights_initializer=tf.constant_initializer([1, -1]))
Args:
inputs: A 4-D tensor with dimensions [batch_size, height, width, channels].
kernel_size: A list of length 2 holding the [kernel_height, kernel_width] of
the filters. Can be an int if both values are the same.
stride: A list of length 2 `[stride_height, stride_width]`. Can be an int if
both strides are the same. Note that presently both strides must have the
same value.
padding: The padding type to use, either 'SAME' or 'VALID'.
activation_fn: Activation function. The default value is a ReLU function.
Explicitly set it to None to skip it and maintain a linear activation.
normalizer_fn: Normalization function to use instead of `biases`. If
`normalizer_fn` is provided then `biases_initializer` and
`biases_regularizer` are ignored and `biases` are not created nor added.
Default is None for no normalizer function.
normalizer_params: Normalization function parameters.
weights_initializer: An initializer for the weights.
weights_regularizer: Optional regularizer for the weights.
biases_initializer: An initializer for the biases. If None skip biases.
biases_regularizer: Optional regularizer for the biases.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for all the variables or
a dictionary containing a different list of collection per variable.
outputs_collections: Collection to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
scope: Optional scope for `variable_scope`.
Returns:
A `Tensor` representing the output of the operation.
"""
with variable_scope.variable_scope(
scope, 'ConvInPlane', [inputs], reuse=reuse) as sc:
dtype = inputs.dtype.base_dtype
kernel_h, kernel_w = utils.two_element_tuple(kernel_size)
stride_h, stride_w = utils.two_element_tuple(stride)
num_filters_in = utils.last_dimension(inputs.get_shape(), min_rank=4)
weights_shape = [kernel_h, kernel_w, 1, 1]
weights_collections = utils.get_variable_collections(
variables_collections, 'weights')
weights = variables.model_variable(
'weights',
shape=weights_shape,
dtype=dtype,
initializer=weights_initializer,
regularizer=weights_regularizer,
collections=weights_collections,
trainable=trainable)
depthwise_weights = array_ops.tile(weights, [1, 1, num_filters_in, 1])
outputs = nn.depthwise_conv2d(inputs, depthwise_weights,
[1, stride_h, stride_w, 1], padding)
if normalizer_fn is not None:
normalizer_params = normalizer_params or {}
outputs = normalizer_fn(outputs, **normalizer_params)
else:
if biases_initializer is not None:
biases_collections = utils.get_variable_collections(
variables_collections, 'biases')
biases = variables.model_variable(
'biases',
shape=[
num_filters_in,
],
dtype=dtype,
initializer=biases_initializer,
regularizer=biases_regularizer,
collections=biases_collections,
trainable=trainable)
outputs = nn.bias_add(outputs, biases)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
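The in-plane trick above — tiling a single `[kh, kw, 1, 1]` kernel across channels and running a depthwise convolution — is equivalent to applying the same 2-D kernel to every channel independently. A NumPy sketch for the 'VALID' case, using cross-correlation as TF does (`conv2d_in_plane_sketch` is illustrative, not part of this module):

```python
import numpy as np

def conv2d_in_plane_sketch(image, kernel):
    """Apply one 2-D kernel to each channel of an [H, W, C] image ('VALID')."""
    kh, kw = kernel.shape
    h, w, c = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1, c))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Cross-correlate the kernel with the patch, per channel.
            patch = image[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(kernel, patch, axes=([0, 1], [0, 1]))
    return out
```

With the gradient kernel `[[1], [-1]]` from the docstring example, each output channel holds the vertical difference of the corresponding input channel.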
@add_arg_scope
def convolution2d_transpose(
inputs,
num_outputs,
kernel_size,
stride=1,
padding='SAME',
data_format=DATA_FORMAT_NHWC,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
"""Adds a convolution2d_transpose with an optional batch normalization layer.
The function creates a variable called `weights`, representing the
kernel, that is convolved with the input. If `normalizer_fn` is `None`, a
second variable called 'biases' is added to the result of the operation.
Args:
inputs: A 4-D `Tensor` of type `float` and shape `[batch, height, width,
in_channels]` for `NHWC` data format or `[batch, in_channels, height,
width]` for `NCHW` data format.
num_outputs: Integer, the number of output filters.
kernel_size: A list of length 2 holding the [kernel_height, kernel_width] of
the filters. Can be an int if both values are the same.
stride: A list of length 2: [stride_height, stride_width]. Can be an int if
both strides are the same. Note that presently both strides must have the
same value.
padding: One of 'VALID' or 'SAME'.
data_format: A string. `NHWC` (default) and `NCHW` are supported.
activation_fn: Activation function. The default value is a ReLU function.
Explicitly set it to None to skip it and maintain a linear activation.
normalizer_fn: Normalization function to use instead of `biases`. If
`normalizer_fn` is provided then `biases_initializer` and
`biases_regularizer` are ignored and `biases` are not created nor added.
Default is None, meaning no normalizer function is applied.
normalizer_params: Normalization function parameters.
weights_initializer: An initializer for the weights.
weights_regularizer: Optional regularizer for the weights.
biases_initializer: An initializer for the biases. If None skip biases.
biases_regularizer: Optional regularizer for the biases.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer, the scope must be given.
variables_collections: Optional list of collections for all the variables or
a dictionary containing a different list of collections per variable.
outputs_collections: Collection to add the outputs.
trainable: Whether or not the variables should be trainable.
scope: Optional scope for variable_scope.
Returns:
A tensor representing the output of the operation.
Raises:
ValueError: If 'kernel_size' is not a list of length 2.
ValueError: If `data_format` is neither `NHWC` nor `NCHW`.
ValueError: If `C` dimension of `inputs` is None.
"""
layer_variable_getter = _build_variable_getter({
'bias': 'biases',
'kernel': 'weights'
})
with variable_scope.variable_scope(
scope,
'Conv2d_transpose', [inputs],
reuse=reuse,
custom_getter=layer_variable_getter) as sc:
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
inputs = ops.convert_to_tensor(inputs)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
layer = convolutional_layers.Convolution2DTranspose(
filters=num_outputs,
kernel_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
activation=None,
use_bias=not normalizer_fn and biases_initializer,
kernel_initializer=weights_initializer,
bias_initializer=biases_initializer,
kernel_regularizer=weights_regularizer,
bias_regularizer=biases_regularizer,
activity_regularizer=None,
trainable=trainable,
name=sc.name,
dtype=inputs.dtype.base_dtype,
_scope=sc,
_reuse=reuse)
outputs = layer.apply(inputs)
# Add variables to collections.
_add_variable_to_collections(layer.kernel, variables_collections, 'weights')
if layer.bias is not None:
_add_variable_to_collections(layer.bias, variables_collections, 'biases')
if normalizer_fn is not None:
normalizer_params = normalizer_params or {}
outputs = normalizer_fn(outputs, **normalizer_params)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
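# Illustrative usage (a hedged sketch, not part of this module): with 'SAME'
# padding and stride 2, a transposed convolution roughly doubles the spatial
# size, which is why it is commonly used for learned upsampling in decoders.
# The placeholder shapes below are assumptions for NHWC input.
#
#   x = tf.placeholder(tf.float32, [None, 8, 8, 16])   # NHWC feature map
#   y = convolution2d_transpose(x, num_outputs=32,
#                               kernel_size=3, stride=2)
#   # With 'SAME' padding, y should have static shape [None, 16, 16, 32].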
@add_arg_scope
def convolution3d_transpose(
inputs,
num_outputs,
kernel_size,
stride=1,
padding='SAME',
data_format=DATA_FORMAT_NDHWC,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
"""Adds a convolution3d_transpose with an optional batch normalization layer.
The function creates a variable called `weights`, representing the
kernel, that is convolved with the input. If `normalizer_fn` is `None`, a
second variable called 'biases' is added to the result of the operation.
Args:
inputs: A 5-D `Tensor` of type `float` and shape `[batch, depth, height,
width, in_channels]` for `NDHWC` data format or `[batch, in_channels,
depth, height, width]` for `NCDHW` data format.
num_outputs: Integer, the number of output filters.
kernel_size: A list of length 3 holding the [kernel_depth, kernel_height,
kernel_width] of the filters. Can be an int if all three values are the
same.
stride: A list of length 3: [stride_depth, stride_height, stride_width]. Can
be an int if all strides are the same. Note that presently all strides
must have the same value.
padding: One of 'VALID' or 'SAME'.
data_format: A string. `NDHWC` (default) and `NCDHW` are supported.
activation_fn: Activation function. The default value is a ReLU function.
Explicitly set it to None to skip it and maintain a linear activation.
normalizer_fn: Normalization function to use instead of `biases`. If
`normalizer_fn` is provided then `biases_initializer` and
`biases_regularizer` are ignored and `biases` are not created nor added.
Default is None, meaning no normalizer function is applied.
normalizer_params: Normalization function parameters.
weights_initializer: An initializer for the weights.
weights_regularizer: Optional regularizer for the weights.
biases_initializer: An initializer for the biases. If None skip biases.
biases_regularizer: Optional regularizer for the biases.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer, the scope must be given.
variables_collections: Optional list of collections for all the variables or
a dictionary containing a different list of collections per variable.
outputs_collections: Collection to add the outputs.
trainable: Whether or not the variables should be trainable.
scope: Optional scope for variable_scope.
Returns:
A tensor representing the output of the operation.
Raises:
ValueError: If 'kernel_size' is not a list of length 3.
ValueError: If `data_format` is neither `NDHWC` nor `NCDHW`.
ValueError: If `C` dimension of `inputs` is None.
"""
layer_variable_getter = _build_variable_getter({
'bias': 'biases',
'kernel': 'weights'
})
with variable_scope.variable_scope(
scope,
'Conv3d_transpose', [inputs],
reuse=reuse,
custom_getter=layer_variable_getter) as sc:
if data_format not in (DATA_FORMAT_NCDHW, DATA_FORMAT_NDHWC):
raise ValueError('data_format has to be either NCDHW or NDHWC.')
inputs = ops.convert_to_tensor(inputs)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
layer = convolutional_layers.Convolution3DTranspose(
filters=num_outputs,
kernel_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
activation=None,
use_bias=not normalizer_fn and biases_initializer,
kernel_initializer=weights_initializer,
bias_initializer=biases_initializer,
kernel_regularizer=weights_regularizer,
bias_regularizer=biases_regularizer,
activity_regularizer=None,
trainable=trainable,
name=sc.name,
dtype=inputs.dtype.base_dtype,
_scope=sc,
_reuse=reuse)
outputs = layer.apply(inputs)
# Add variables to collections.
_add_variable_to_collections(layer.kernel, variables_collections, 'weights')
if layer.bias is not None:
_add_variable_to_collections(layer.bias, variables_collections, 'biases')
if normalizer_fn is not None:
normalizer_params = normalizer_params or {}
outputs = normalizer_fn(outputs, **normalizer_params)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
@add_arg_scope
def dense_to_sparse(tensor, eos_token=0, outputs_collections=None, scope=None):
"""Converts a dense tensor into a sparse tensor.
An example use would be to convert dense labels to sparse ones
so that they can be fed to the ctc_loss.
Args:
tensor: An `int` `Tensor` to be converted to a `Sparse`.
eos_token: An integer. It is part of the target label that signifies the
end of a sentence.
outputs_collections: Collection to add the outputs.
scope: Optional scope for name_scope.
Returns:
A `SparseTensor` with the same values as `tensor`, with entries equal to
`eos_token` omitted.
"""
with variable_scope.variable_scope(scope, 'dense_to_sparse', [tensor]) as sc:
tensor = ops.convert_to_tensor(tensor)
indices = array_ops.where(
math_ops.not_equal(tensor, constant_op.constant(eos_token,
tensor.dtype)))
values = array_ops.gather_nd(tensor, indices)
shape = array_ops.shape(tensor, out_type=dtypes.int64)
outputs = sparse_tensor.SparseTensor(indices, values, shape)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
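# Illustrative usage (sketch, assuming eager or session execution): a padded
# label batch such as [[3, 1, 0, 0], [2, 0, 0, 0]] with eos_token=0 becomes a
# SparseTensor with indices [[0, 0], [0, 1], [1, 0]], values [3, 1, 2], and
# dense_shape [2, 4] -- the sparse label form expected by tf.nn.ctc_loss.
#
#   labels = tf.constant([[3, 1, 0, 0], [2, 0, 0, 0]])
#   sparse_labels = dense_to_sparse(labels, eos_token=0)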
@add_arg_scope
def dropout(inputs,
keep_prob=0.5,
noise_shape=None,
is_training=True,
outputs_collections=None,
scope=None,
seed=None):
"""Returns a dropout op applied to the input.
With probability `keep_prob`, outputs the input element scaled up by
`1 / keep_prob`, otherwise outputs `0`. The scaling is so that the expected
sum is unchanged.
Args:
inputs: The tensor to pass to the nn.dropout op.
keep_prob: A scalar `Tensor` with the same type as x. The probability that
each element is kept.
noise_shape: A 1-D `Tensor` of type `int32`, representing the shape for
randomly generated keep/drop flags.
is_training: A bool `Tensor` indicating whether or not the model is in
training mode. If so, dropout is applied and values scaled. Otherwise,
`inputs` is returned unchanged.
outputs_collections: Collection to add the outputs.
scope: Optional scope for name_scope.
seed: A Python integer. Used to create random seeds. See
`tf.compat.v1.set_random_seed` for behavior.
Returns:
A tensor representing the output of the operation.
"""
with variable_scope.variable_scope(
scope, 'Dropout', [inputs], custom_getter=_model_variable_getter) as sc:
inputs = ops.convert_to_tensor(inputs)
layer = core_layers.Dropout(
rate=1 - keep_prob,
noise_shape=noise_shape,
seed=seed,
name=sc.name,
_scope=sc)
outputs = layer.apply(inputs, training=is_training)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
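# Illustrative usage (sketch): note that this wrapper takes keep_prob rather
# than the drop rate used by the core layers, i.e. keep_prob=0.8 corresponds
# to core_layers.Dropout(rate=0.2). The is_training placeholder below is an
# assumption for a typical train/eval setup.
#
#   net = fully_connected(inputs, 256)
#   net = dropout(net, keep_prob=0.8, is_training=is_training)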
@add_arg_scope
def flatten(inputs, outputs_collections=None, scope=None):
"""Flattens the input while maintaining the batch_size.
Assumes that the first dimension represents the batch.
Args:
inputs: A tensor of size [batch_size, ...].
outputs_collections: Collection to add the outputs.
scope: Optional scope for name_scope.
Returns:
A flattened tensor with shape [batch_size, k].
Raises:
ValueError: If inputs rank is unknown or less than 2.
"""
with ops.name_scope(scope, 'Flatten', [inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
outputs = core_layers.flatten(inputs)
return utils.collect_named_outputs(outputs_collections, sc, outputs)
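# Illustrative usage (sketch): flatten keeps the batch dimension and merges
# all remaining dimensions, e.g. a [None, 7, 7, 64] feature map becomes
# [None, 3136] (7 * 7 * 64 = 3136), the usual bridge from convolutional
# features to fully connected layers.
#
#   features = flatten(conv_output)          # conv_output: [None, 7, 7, 64]
#   logits = fully_connected(features, 10, activation_fn=None)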
def _sparse_inner_flatten(inputs, new_rank):
"""Helper function for `inner_flatten`."""
inputs_rank = inputs.dense_shape.get_shape().as_list()[0]
if inputs_rank < new_rank:
raise ValueError(
'Inputs has rank less than new_rank. {} must have rank at least'
' {}. Received rank {}, shape {}'.format(inputs, new_rank, inputs_rank,
inputs.get_shape()))
outer_dimensions = inputs.dense_shape[:new_rank - 1]
inner_dimensions = inputs.dense_shape[new_rank - 1:]
new_shape = array_ops.concat(
(outer_dimensions, [math_ops.reduce_prod(inner_dimensions)]), 0)
flattened = sparse_ops.sparse_reshape(inputs, new_shape)
return flattened
def _dense_inner_flatten(inputs, new_rank):
"""Helper function for `inner_flatten`."""
rank_assertion = check_ops.assert_rank_at_least(
inputs, new_rank, message='inputs has rank less than new_rank')
with ops.control_dependencies([rank_assertion]):
outer_dimensions = array_ops.strided_slice(
array_ops.shape(inputs), [0], [new_rank - 1])
new_shape = array_ops.concat((outer_dimensions, [-1]), 0)
reshaped = array_ops.reshape(inputs, new_shape)
# if `new_rank` is an integer, try to calculate new shape.
if isinstance(new_rank, six.integer_types):
static_shape = inputs.get_shape()
if static_shape is not None and static_shape.dims is not None:
static_shape = static_shape.as_list()
static_outer_dims = static_shape[:new_rank - 1]
static_inner_dims = static_shape[new_rank - 1:]
flattened_dimension = 1
for inner_dim in static_inner_dims:
if inner_dim is None:
flattened_dimension = None
break
flattened_dimension *= inner_dim
reshaped.set_shape(static_outer_dims + [flattened_dimension])
return reshaped
@add_arg_scope
def _inner_flatten(inputs, new_rank, output_collections=None, scope=None):
"""Flattens inner dimensions of `inputs`, returns a Tensor with `new_rank`.
For example:
'''
x = tf.random.uniform(shape=[1, 2, 3, 4, 5, 6])
y = _inner_flatten(x, 4)
assert y.get_shape().as_list() == [1, 2, 3, (4 * 5 * 6)]
'''
This layer will fail at run time if `new_rank` is greater than the current
rank of `inputs`.
Args:
inputs: A `Tensor` or `SparseTensor`.
new_rank: The desired rank of the returned `Tensor` or `SparseTensor`.
output_collections: Collection to which the outputs will be added.
scope: Optional scope for `name_scope`.
Returns:
A `Tensor` or `SparseTensor` containing the same values as `inputs`, but
with innermost dimensions flattened to obtain rank `new_rank`.
Raises:
TypeError: `inputs` is not a `Tensor` or `SparseTensor`.
"""
with ops.name_scope(scope, 'InnerFlatten', [inputs, new_rank]) as sc:
if isinstance(inputs, sparse_tensor.SparseTensor):
flattened = _sparse_inner_flatten(inputs, new_rank)
else:
inputs = ops.convert_to_tensor(inputs)
flattened = _dense_inner_flatten(inputs, new_rank)
return utils.collect_named_outputs(output_collections, sc, flattened)
def _model_variable_getter(
getter,
name,
shape=None,
dtype=None,
initializer=None,
regularizer=None,
trainable=True,
collections=None,
caching_device=None,
partitioner=None,
rename=None,
use_resource=None,
synchronization=tf_variables.VariableSynchronization.AUTO,
aggregation=tf_variables.VariableAggregation.NONE,
**_):
"""Getter that uses model_variable for compatibility with core layers."""
short_name = name.split('/')[-1]
if rename and short_name in rename:
name_components = name.split('/')
name_components[-1] = rename[short_name]
name = '/'.join(name_components)
return variables.model_variable(
name,
shape=shape,
dtype=dtype,
initializer=initializer,
regularizer=regularizer,
collections=collections,
trainable=trainable,
caching_device=caching_device,
partitioner=partitioner,
custom_getter=getter,
use_resource=use_resource,
synchronization=synchronization,
aggregation=aggregation)
def _build_variable_getter(rename=None):
"""Build a model variable getter that respects scope getter and renames."""
# VariableScope will nest the getters
def layer_variable_getter(getter, *args, **kwargs):
kwargs['rename'] = rename
return _model_variable_getter(getter, *args, **kwargs)
return layer_variable_getter
def _add_variable_to_collections(variable, collections_set, collections_name):
"""Adds variable (or all its parts) to all collections with that name."""
collections = utils.get_variable_collections(collections_set,
collections_name) or []
variables_list = [variable]
if isinstance(variable, tf_variables.PartitionedVariable):
variables_list = [v for v in variable]
for collection in collections:
for var in variables_list:
if var not in ops.get_collection(collection):
ops.add_to_collection(collection, var)
@add_arg_scope
def fully_connected(inputs,
num_outputs,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
"""Adds a fully connected layer.
`fully_connected` creates a variable called `weights`, representing a fully
connected weight matrix, which is multiplied by the `inputs` to produce a
`Tensor` of hidden units. If a `normalizer_fn` is provided (such as
`batch_norm`), it is then applied. Otherwise, if `normalizer_fn` is
None and a `biases_initializer` is provided then a `biases` variable would be
created and added to the hidden units. Finally, if `activation_fn` is not `None`,
it is applied to the hidden units as well.
Note: if `inputs` has a rank greater than 2, then `inputs` is flattened
prior to the initial matrix multiply by `weights`.
Args:
inputs: A tensor of at least rank 2 and static value for the last dimension;
i.e. `[batch_size, depth]`, `[None, None, None, channels]`.
num_outputs: Integer or long, the number of output units in the layer.
activation_fn: Activation function. The default value is a ReLU function.
Explicitly set it to None to skip it and maintain a linear activation.
normalizer_fn: Normalization function to use instead of `biases`. If
`normalizer_fn` is provided then `biases_initializer` and
`biases_regularizer` are ignored and `biases` are not created nor added.
Default is None, meaning no normalizer function is applied.
normalizer_params: Normalization function parameters.
weights_initializer: An initializer for the weights.
weights_regularizer: Optional regularizer for the weights.
biases_initializer: An initializer for the biases. If None skip biases.
biases_regularizer: Optional regularizer for the biases.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer, the scope must be given.
variables_collections: Optional list of collections for all the variables or
a dictionary containing a different list of collections per variable.
outputs_collections: Collection to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
scope: Optional scope for variable_scope.
Returns:
The tensor variable representing the result of the series of operations.
Raises:
ValueError: If x has rank less than 2 or if its last dimension is not set.
"""
if not isinstance(num_outputs, six.integer_types):
raise ValueError('num_outputs type should be one of %s, got %s.' %
(list(six.integer_types), type(num_outputs)))
layer_variable_getter = _build_variable_getter({
'bias': 'biases',
'kernel': 'weights'
})
with variable_scope.variable_scope(
scope,
'fully_connected', [inputs],
reuse=reuse,
custom_getter=layer_variable_getter) as sc:
inputs = ops.convert_to_tensor(inputs)
layer = core_layers.Dense(
units=num_outputs,
activation=None,
use_bias=not normalizer_fn and biases_initializer,
kernel_initializer=weights_initializer,
bias_initializer=biases_initializer,
kernel_regularizer=weights_regularizer,
bias_regularizer=biases_regularizer,
activity_regularizer=None,
trainable=trainable,
name=sc.name,
dtype=inputs.dtype.base_dtype,
_scope=sc,
_reuse=reuse)
outputs = layer.apply(inputs)
# Add variables to collections.
_add_variable_to_collections(layer.kernel, variables_collections, 'weights')
if layer.bias is not None:
_add_variable_to_collections(layer.bias, variables_collections, 'biases')
# Apply normalizer function / layer.
if normalizer_fn is not None:
if not normalizer_params:
normalizer_params = {}
outputs = normalizer_fn(outputs, **normalizer_params)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
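# Illustrative usage (sketch): when a normalizer_fn is supplied, the dense
# layer skips its own bias (use_bias evaluates to False) because the
# normalizer provides the shift. The lines below assume batch_norm is the
# batch normalization function defined elsewhere in this package.
#
#   net = fully_connected(inputs, 512,
#                         normalizer_fn=batch_norm,
#                         normalizer_params={'is_training': is_training})
#   logits = fully_connected(net, num_classes, activation_fn=None)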
# class GDN(base.Layer):
# """Generalized divisive normalization layer.
# Based on the papers:
# "Density Modeling of Images using a Generalized Normalization
# Transformation"
# Johannes Ballé, Valero Laparra, Eero P. Simoncelli
# https://arxiv.org/abs/1511.06281
# "End-to-end Optimized Image Compression"
# Johannes Ballé, Valero Laparra, Eero P. Simoncelli
# https://arxiv.org/abs/1611.01704
# Implements an activation function that is essentially a multivariate
# generalization of a particular sigmoid-type function:
# ```
# y[i] = x[i] / sqrt(beta[i] + sum_j(gamma[j, i] * x[j]))
# ```
# where `i` and `j` run over channels. This implementation never sums across
# spatial dimensions. It is similar to local response normalization, but much
# more flexible, as `beta` and `gamma` are trainable parameters.
# Arguments:
# inverse: If `False` (default), compute GDN response. If `True`, compute IGDN
# response (one step of fixed point iteration to invert GDN; the division is
# replaced by multiplication).
# beta_min: Lower bound for beta, to prevent numerical error from causing
# square root of zero or negative values.
# gamma_init: The gamma matrix will be initialized as the identity matrix
# multiplied with this value. If set to zero, the layer is effectively
# initialized to the identity operation, since beta is initialized as one. A
# good default setting is somewhere between 0 and 0.5.
# reparam_offset: Offset added to the reparameterization of beta and gamma.
# The reparameterization of beta and gamma as their square roots lets the
# training slow down when their values are close to zero, which is desirable
# as small values in the denominator can lead to a situation where gradient
# noise on beta/gamma leads to extreme amounts of noise in the GDN
# activations. However, without the offset, we would get zero gradients if
# any elements of beta or gamma were exactly zero, and thus the training
# could get stuck. To prevent this, we add this small constant. The default
# value was empirically determined as a good starting point. Making it
# bigger potentially leads to more gradient noise on the activations, making
# it too small may lead to numerical precision issues.
# data_format: Format of input tensor. Currently supports `'channels_first'`
# and `'channels_last'`.
# activity_regularizer: Regularizer function for the output.
# trainable: Boolean, if `True`, also add variables to the graph collection
# `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
# name: String, the name of the layer. Layers with the same name will share
# weights, but to avoid mistakes we require `reuse=True` in such cases.
# Properties:
# inverse: Boolean, whether IGDN is computed (`True`) or GDN (`False`).
# data_format: Format of input tensor. Currently supports `'channels_first'`
# and `'channels_last'`.
# beta: The beta parameter as defined above (1D `Tensor`).
# gamma: The gamma parameter as defined above (2D `Tensor`).
# """
# def __init__(self,
# inverse=False,
# beta_min=1e-6,
# gamma_init=.1,
# reparam_offset=2**-18,
# data_format='channels_last',
# activity_regularizer=None,
# trainable=True,
# name=None,
# **kwargs):
# super(GDN, self).__init__(
# trainable=trainable,
# name=name,
# activity_regularizer=activity_regularizer,
# **kwargs)
# self.inverse = inverse
# self._beta_min = beta_min
# self._gamma_init = gamma_init
# self._reparam_offset = reparam_offset
# self.data_format = data_format
# self._channel_axis() # trigger ValueError early
# self.input_spec = input_spec.InputSpec(min_ndim=3, max_ndim=5)
# def _channel_axis(self):
# try:
# return {'channels_first': 1, 'channels_last': -1}[self.data_format]
# except KeyError:
# raise ValueError('Unsupported `data_format` for GDN layer: {}.'.format(
# self.data_format))
# @staticmethod
# def _lower_bound(inputs, bound, name=None):
# """Same as tf.maximum, but with helpful gradient for inputs < bound.
# The gradient is overwritten so that it is passed through if the input is not
# hitting the bound. If it is, only gradients that push `inputs` higher than
# the bound are passed through. No gradients are passed through to the bound.
# Args:
# inputs: input tensor
# bound: lower bound for the input tensor
# name: name for this op
# Returns:
# tf.maximum(inputs, bound)
# """
# with ops.name_scope(name, 'GDNLowerBound', [inputs, bound]) as scope:
# inputs = ops.convert_to_tensor(inputs, name='inputs')
# bound = ops.convert_to_tensor(bound, name='bound')
# with ops.get_default_graph().gradient_override_map(
# {'Maximum': 'GDNLowerBound'}):
# return math_ops.maximum(inputs, bound, name=scope)
# @staticmethod
# def _lower_bound_grad(op, grad):
# """Gradient for `_lower_bound`.
# Args:
# op: the tensorflow op for which to calculate a gradient
# grad: gradient with respect to the output of the op
# Returns:
# gradients with respect to the inputs of the op
# """
# inputs = op.inputs[0]
# bound = op.inputs[1]
# pass_through_if = math_ops.logical_or(inputs >= bound, grad < 0)
# return [math_ops.cast(pass_through_if, grad.dtype) * grad, None]
# def build(self, input_shape):
# channel_axis = self._channel_axis()
# input_shape = tensor_shape.TensorShape(input_shape)
# num_channels = input_shape.dims[channel_axis].value
# if num_channels is None:
# raise ValueError('The channel dimension of the inputs to `GDN` '
# 'must be defined.')
# self._input_rank = input_shape.ndims
# self.input_spec = input_spec.InputSpec(
# ndim=input_shape.ndims, axes={channel_axis: num_channels})
# pedestal = array_ops.constant(self._reparam_offset**2, dtype=self.dtype)
# beta_bound = array_ops.constant(
# (self._beta_min + self._reparam_offset**2)**.5, dtype=self.dtype)
# gamma_bound = array_ops.constant(self._reparam_offset, dtype=self.dtype)
# def beta_initializer(shape, dtype=None, partition_info=None):
# del partition_info # unused
# pedestal = array_ops.constant(self._reparam_offset**2, dtype=self.dtype)
# return math_ops.sqrt(array_ops.ones(shape, dtype=dtype) + pedestal)
# def gamma_initializer(shape, dtype=None, partition_info=None):
# del partition_info # unused
# assert len(shape) == 2
# assert shape[0] == shape[1]
# eye = linalg_ops.eye(shape[0], dtype=dtype)
# pedestal = array_ops.constant(self._reparam_offset**2, dtype=self.dtype)
# return math_ops.sqrt(self._gamma_init * eye + pedestal)
# beta = self.add_variable(
# 'reparam_beta',
# shape=[num_channels],
# initializer=beta_initializer,
# dtype=self.dtype,
# trainable=True)
# beta = self._lower_bound(beta, beta_bound)
# self.beta = math_ops.square(beta) - pedestal
# gamma = self.add_variable(
# 'reparam_gamma',
# shape=[num_channels, num_channels],
# initializer=gamma_initializer,
# dtype=self.dtype,
# trainable=True)
# gamma = self._lower_bound(gamma, gamma_bound)
# self.gamma = math_ops.square(gamma) - pedestal
# self.built = True
# def call(self, inputs):
# inputs = ops.convert_to_tensor(inputs, dtype=self.dtype)
# ndim = self._input_rank
# shape = self.gamma.get_shape().as_list()
# gamma = array_ops.reshape(self.gamma, (ndim - 2) * [1] + shape)
# # Compute normalization pool.
# if self.data_format == 'channels_first':
# norm_pool = nn.convolution(
# math_ops.square(inputs),
# gamma,
# 'VALID',
# data_format='NC' + 'DHW' [-(ndim - 2):])
# if ndim == 3:
# norm_pool = array_ops.expand_dims(norm_pool, 2)
# norm_pool = nn.bias_add(norm_pool, self.beta, data_format='NCHW')
# norm_pool = array_ops.squeeze(norm_pool, [2])
# elif ndim == 5:
# shape = array_ops.shape(norm_pool)
# norm_pool = array_ops.reshape(norm_pool, shape[:3] + [-1])
# norm_pool = nn.bias_add(norm_pool, self.beta, data_format='NCHW')
# norm_pool = array_ops.reshape(norm_pool, shape)
# else: # ndim == 4
# norm_pool = nn.bias_add(norm_pool, self.beta, data_format='NCHW')
# else: # channels_last
# norm_pool = nn.convolution(math_ops.square(inputs), gamma, 'VALID')
# norm_pool = nn.bias_add(norm_pool, self.beta, data_format='NHWC')
# norm_pool = math_ops.sqrt(norm_pool)
# if self.inverse:
# outputs = inputs * norm_pool
# else:
# outputs = inputs / norm_pool
# outputs.set_shape(inputs.get_shape())
# return outputs
# def compute_output_shape(self, input_shape):
# channel_axis = self._channel_axis()
# input_shape = tensor_shape.TensorShape(input_shape)
# if not 3 <= input_shape.ndims <= 5:
# raise ValueError('`input_shape` must be of rank 3 to 5, inclusive.')
# if input_shape.dims[channel_axis].value is None:
# raise ValueError(
# 'The channel dimension of `input_shape` must be defined.')
# return input_shape
# ops.RegisterGradient('GDNLowerBound')(GDN._lower_bound_grad) # pylint:disable=protected-access
# def gdn(inputs,
# inverse=False,
# beta_min=1e-6,
# gamma_init=.1,
# reparam_offset=2**-18,
# data_format='channels_last',
# activity_regularizer=None,
# trainable=True,
# name=None,
# reuse=None):
# """Functional interface for GDN layer.
# Based on the papers:
# "Density Modeling of Images using a Generalized Normalization
# Transformation"
# Johannes Ballé, Valero Laparra, Eero P. Simoncelli
# https://arxiv.org/abs/1511.06281
# "End-to-end Optimized Image Compression"
# Johannes Ballé, Valero Laparra, Eero P. Simoncelli
# https://arxiv.org/abs/1611.01704
# Implements an activation function that is essentially a multivariate
# generalization of a particular sigmoid-type function:
# ```
# y[i] = x[i] / sqrt(beta[i] + sum_j(gamma[j, i] * x[j]))
# ```
# where `i` and `j` run over channels. This implementation never sums across
# spatial dimensions. It is similar to local response normalization, but much
# more flexible, as `beta` and `gamma` are trainable parameters.
# Args:
# inputs: Tensor input.
# inverse: If `False` (default), compute GDN response. If `True`, compute IGDN
# response (one step of fixed point iteration to invert GDN; the division is
# replaced by multiplication).
# beta_min: Lower bound for beta, to prevent numerical error from causing
# square root of zero or negative values.
# gamma_init: The gamma matrix will be initialized as the identity matrix
# multiplied with this value. If set to zero, the layer is effectively
# initialized to the identity operation, since beta is initialized as one. A
# good default setting is somewhere between 0 and 0.5.
# reparam_offset: Offset added to the reparameterization of beta and gamma.
# The reparameterization of beta and gamma as their square roots lets the
# training slow down when their values are close to zero, which is desirable
# as small values in the denominator can lead to a situation where gradient
# noise on beta/gamma leads to extreme amounts of noise in the GDN
# activations. However, without the offset, we would get zero gradients if
# any elements of beta or gamma were exactly zero, and thus the training
# could get stuck. To prevent this, we add this small constant. The default
# value was empirically determined as a good starting point. Making it
# bigger potentially leads to more gradient noise on the activations, making
# it too small may lead to numerical precision issues.
# data_format: Format of input tensor. Currently supports `'channels_first'`
# and `'channels_last'`.
# activity_regularizer: Regularizer function for the output.
# trainable: Boolean, if `True`, also add variables to the graph collection
# `GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
# name: String, the name of the layer. Layers with the same name will share
# weights, but to avoid mistakes we require `reuse=True` in such cases.
# reuse: Boolean, whether to reuse the weights of a previous layer by the same
# name.
# Returns:
# Output tensor.
# """
# layer = GDN(
# inverse=inverse,
# beta_min=beta_min,
# gamma_init=gamma_init,
# reparam_offset=reparam_offset,
# data_format=data_format,
# activity_regularizer=activity_regularizer,
# trainable=trainable,
# name=name,
# dtype=inputs.dtype.base_dtype,
# _scope=name,
# _reuse=reuse)
# return layer.apply(inputs)
@add_arg_scope
def layer_norm(inputs,
center=True,
scale=True,
activation_fn=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
begin_norm_axis=1,
begin_params_axis=-1,
scope=None):
"""Adds a Layer Normalization layer.
Based on the paper:
"Layer Normalization"
Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
https://arxiv.org/abs/1607.06450.
Can be used as a normalizer function for conv2d and fully_connected.
Given a tensor `inputs` of rank `R`, moments are calculated and normalization
is performed over axes `begin_norm_axis ... R - 1`. Scaling and centering,
if requested, is performed over axes `begin_params_axis .. R - 1`.
By default, `begin_norm_axis = 1` and `begin_params_axis = -1`,
meaning that normalization is performed over all but the first axis
(the `HWC` if `inputs` is `NHWC`), while the `beta` and `gamma` trainable
parameters are calculated for the rightmost axis (the `C` if `inputs` is
`NHWC`). Scaling and recentering is performed via broadcast of the
`beta` and `gamma` parameters with the normalized tensor.
The shapes of `beta` and `gamma` are `inputs.shape[begin_params_axis:]`,
and this part of the inputs' shape must be fully defined.
Args:
inputs: A tensor having rank `R`. The normalization is performed over axes
`begin_norm_axis ... R - 1` and centering and scaling parameters are
calculated over `begin_params_axis ... R - 1`.
center: If True, add offset of `beta` to normalized tensor. If False, `beta`
is ignored.
scale: If True, multiply by `gamma`. If False, `gamma` is not used. When the
next layer is linear (also e.g. `nn.relu`), this can be disabled since the
scaling can be done by the next layer.
activation_fn: Activation function, default set to None to skip it and
maintain a linear activation.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer, the scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
begin_norm_axis: The first normalization dimension: normalization will be
performed along dimensions `begin_norm_axis : rank(inputs)`
begin_params_axis: The first parameter (beta, gamma) dimension: scale and
centering parameters will have dimensions
`begin_params_axis : rank(inputs)` and will be broadcast with the
normalized inputs accordingly.
scope: Optional scope for `variable_scope`.
Returns:
A `Tensor` representing the output of the operation, having the same
shape and dtype as `inputs`.
Raises:
ValueError: If the rank of `inputs` is not known at graph build time,
or if `inputs.shape[begin_params_axis:]` is not fully defined at
graph build time.
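Example (a plain-Python sketch of the per-sample computation;
`layer_norm_ref` is a hypothetical reference helper, not part of this
module):
```python
def layer_norm_ref(x, eps=1e-12):
  # Normalize one sample over its own features (the axes from
  # begin_norm_axis onwards), before the optional beta/gamma transform.
  mean = sum(x) / len(x)
  var = sum((v - mean) ** 2 for v in x) / len(x)
  return [(v - mean) / (var + eps) ** 0.5 for v in x]

out = layer_norm_ref([1.0, 2.0, 3.0, 4.0])
# The result has (approximately) zero mean and unit variance.
assert abs(sum(out)) < 1e-6
assert abs(sum(v * v for v in out) / len(out) - 1.0) < 1e-6
```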
"""
with variable_scope.variable_scope(
scope, 'LayerNorm', [inputs], reuse=reuse) as sc:
inputs = ops.convert_to_tensor(inputs)
inputs_shape = inputs.shape
inputs_rank = inputs_shape.ndims
if inputs_rank is None:
raise ValueError('Inputs %s has undefined rank.' % inputs.name)
dtype = inputs.dtype.base_dtype
if begin_norm_axis < 0:
begin_norm_axis = inputs_rank + begin_norm_axis
if begin_params_axis >= inputs_rank or begin_norm_axis >= inputs_rank:
raise ValueError('begin_params_axis (%d) and begin_norm_axis (%d) '
'must be < rank(inputs) (%d)' %
(begin_params_axis, begin_norm_axis, inputs_rank))
params_shape = inputs_shape[begin_params_axis:]
if not params_shape.is_fully_defined():
raise ValueError(
'Inputs %s: shape(inputs)[%s:] is not fully defined: %s' %
(inputs.name, begin_params_axis, inputs_shape))
# Allocate parameters for the beta and gamma of the normalization.
beta, gamma = None, None
if center:
beta_collections = utils.get_variable_collections(variables_collections,
'beta')
beta = variables.model_variable(
'beta',
shape=params_shape,
dtype=dtype,
initializer=init_ops.zeros_initializer(),
collections=beta_collections,
trainable=trainable)
if scale:
gamma_collections = utils.get_variable_collections(
variables_collections, 'gamma')
gamma = variables.model_variable(
'gamma',
shape=params_shape,
dtype=dtype,
initializer=init_ops.ones_initializer(),
collections=gamma_collections,
trainable=trainable)
# Compute the moments over the normalization axes (by default, all axes
# except the first).
norm_axes = list(range(begin_norm_axis, inputs_rank))
mean, variance = nn.moments(inputs, norm_axes, keep_dims=True)
# Compute layer normalization using the batch_normalization function.
# Note that epsilon must be increased for float16 due to the limited
# representable range.
variance_epsilon = 1e-12 if dtype != dtypes.float16 else 1e-3
outputs = nn.batch_normalization(
inputs,
mean,
variance,
offset=beta,
scale=gamma,
variance_epsilon=variance_epsilon)
outputs.set_shape(inputs_shape)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
@add_arg_scope
def images_to_sequence(inputs,
data_format=DATA_FORMAT_NHWC,
outputs_collections=None,
scope=None):
"""Convert a batch of images into a batch of sequences.
Args:
inputs: a (num_images, height, width, depth) tensor
data_format: A string. `NHWC` (default) and `NCHW` are supported.
outputs_collections: The collections to which the outputs are added.
scope: Optional scope for name_scope.
Raises:
ValueError: If `data_format` is not either NCHW or NHWC.
Returns:
(width, num_images*height, depth) sequence tensor
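Example (plain-Python index sketch, illustration only): the
transpose-and-reshape maps pixel (n, h, w) to sequence step w, row
n * height + h.
```python
N, H, W = 2, 3, 4
imgs = [[[(n, h, w) for w in range(W)] for h in range(H)] for n in range(N)]
# [N, H, W] -> [W, N * H]: each image column becomes one time step.
seq = [[imgs[n][h][w] for n in range(N) for h in range(H)] for w in range(W)]
assert len(seq) == W and len(seq[0]) == N * H
assert seq[1][1 * H + 1] == (1, 1, 1)
```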
"""
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
with ops.name_scope(scope, 'ImagesToSequence', [inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
if df == 'channels_first':
inputs = array_ops.transpose(inputs, [0, 2, 3, 1])
_, _, width, depth = inputs.get_shape().as_list()
s = array_ops.shape(inputs)
batch_size, height = s[0], s[1]
transposed = array_ops.transpose(inputs, [2, 0, 1, 3])
outputs = array_ops.reshape(transposed, [width, batch_size * height, depth])
return utils.collect_named_outputs(outputs_collections, sc, outputs)
@add_arg_scope
def max_pool2d(inputs,
kernel_size,
stride=2,
padding='VALID',
data_format=DATA_FORMAT_NHWC,
outputs_collections=None,
scope=None):
"""Adds a 2D Max Pooling op.
Pooling is done per image over the spatial dimensions, but not over the
batch or channel dimensions.
Args:
inputs: A 4-D tensor of shape `[batch_size, height, width, channels]` if
`data_format` is `NHWC`, and `[batch_size, channels, height, width]` if
`data_format` is `NCHW`.
kernel_size: A list of length 2: [kernel_height, kernel_width] of the
pooling kernel over which the op is computed. Can be an int if both values
are the same.
stride: A list of length 2: [stride_height, stride_width]. Can be an int if
both strides are the same. Note that presently both strides must have the
same value.
padding: The padding method, either 'VALID' or 'SAME'.
data_format: A string. `NHWC` (default) and `NCHW` are supported.
outputs_collections: The collections to which the outputs are added.
scope: Optional scope for name_scope.
Returns:
A `Tensor` representing the results of the pooling operation.
Raises:
ValueError: If `data_format` is neither `NHWC` nor `NCHW`.
ValueError: If `kernel_size` is not a list of length 2.
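Example (plain-Python sketch of the output-size arithmetic; `pooled_size`
is a hypothetical helper, not part of this module):
```python
def pooled_size(size, kernel, stride, padding):
  # Spatial output size of a pooling op along one dimension.
  if padding == 'VALID':
    return (size - kernel) // stride + 1
  return -(-size // stride)  # 'SAME': ceil(size / stride)

assert pooled_size(224, 2, 2, 'VALID') == 112
assert pooled_size(7, 3, 2, 'VALID') == 3
assert pooled_size(7, 3, 2, 'SAME') == 4
```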
"""
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
with ops.name_scope(scope, 'MaxPool2D', [inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
layer = pooling_layers.MaxPooling2D(
pool_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
_scope=sc)
outputs = layer.apply(inputs)
return utils.collect_named_outputs(outputs_collections, sc, outputs)
@add_arg_scope
def max_pool3d(inputs,
kernel_size,
stride=2,
padding='VALID',
data_format=DATA_FORMAT_NDHWC,
outputs_collections=None,
scope=None):
"""Adds a 3D Max Pooling op.
Pooling is done per volume over the spatial dimensions, but not over the
batch or channel dimensions.
Args:
inputs: A 5-D tensor of shape `[batch_size, depth, height, width, channels]`
if `data_format` is `NDHWC`, and `[batch_size, channels, depth, height,
width]` if `data_format` is `NCDHW`.
kernel_size: A list of length 3: [kernel_depth, kernel_height, kernel_width]
of the pooling kernel over which the op is computed. Can be an int if all
three values are the same.
stride: A list of length 3: [stride_depth, stride_height, stride_width]. Can
be an int if all strides are the same. Note that presently all strides
must have the same value.
padding: The padding method, either 'VALID' or 'SAME'.
data_format: A string. `NDHWC` (default) and `NCDHW` are supported.
outputs_collections: The collections to which the outputs are added.
scope: Optional scope for name_scope.
Returns:
A `Tensor` representing the results of the pooling operation.
Raises:
ValueError: If `data_format` is neither `NDHWC` nor `NCDHW`.
ValueError: If `kernel_size` is not a list of length 3.
"""
if data_format not in (DATA_FORMAT_NCDHW, DATA_FORMAT_NDHWC):
raise ValueError('data_format has to be either NCDHW or NDHWC.')
with ops.name_scope(scope, 'MaxPool3D', [inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
layer = pooling_layers.MaxPooling3D(
pool_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
_scope=sc)
outputs = layer.apply(inputs)
return utils.collect_named_outputs(outputs_collections, sc, outputs)
@add_arg_scope
def pool(inputs,
kernel_size,
pooling_type,
padding='VALID',
data_format=None,
dilation_rate=1,
stride=1,
outputs_collections=None,
scope=None):
# pylint: disable=line-too-long
"""Adds a pooling op.
Args:
inputs: Tensor of rank N+2, of shape `[batch_size] + input_spatial_shape +
[num_channels]` if data_format does not start with "NC" (default), or
`[batch_size, num_channels] + input_spatial_shape` if data_format starts
with "NC". Pooling happens over the spatial dimensions only.
kernel_size: Sequence of N ints >= 1. Can also be a single integer to
specify the same value for all spatial dimensions.
pooling_type: Specifies pooling operation, must be "AVG" or "MAX".
padding: The padding algorithm, must be "SAME" or "VALID".
data_format: A string or None. Specifies whether the channel dimension of
the `input` and output is the last dimension (default, or if `data_format`
does not start with "NC"), or the second dimension (if `data_format`
starts with "NC"). For N=1, the valid values are "NWC" (default) and
"NCW". For N=2, the valid values are "NHWC" (default) and "NCHW". For
N=3, the valid values are "NDHWC" (default) and "NCDHW".
dilation_rate: Optional. Dilation rate. Sequence of N ints >= 1. Defaults
to [1]*N. Can also be a single integer to specify the same value for all
spatial dimensions. If any value of dilation_rate is > 1, then all values
of stride must be 1.
stride: Optional. Sequence of N ints >= 1. Defaults to [1]*N. Can also be
a single integer to specify the same value for all spatial dimensions. If
any value of stride is > 1, then all values of dilation_rate must be 1.
outputs_collections: The collections to which the outputs are added.
scope: Optional scope for name_scope.
Returns:
A `Tensor` representing the results of the pooling operation.
Raises:
ValueError: If arguments are invalid.
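Example (plain-Python sketch, illustration only; `effective_window` is a
hypothetical helper, not part of this module): a dilation rate d spreads a
kernel of size k over an effective window of (k - 1) * d + 1 input
positions, which is why dilation > 1 requires stride 1.
```python
def effective_window(kernel, dilation):
  # dilation inserts (dilation - 1) skipped positions between kernel taps
  return (kernel - 1) * dilation + 1

assert effective_window(3, 1) == 3
assert effective_window(3, 2) == 5
```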
"""
# pylint: enable=line-too-long
with ops.name_scope(scope, '%s_pool' % (pooling_type.lower()),
[inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
input_rank = inputs.get_shape().ndims
if input_rank is None:
raise ValueError('Rank of inputs must be known')
if input_rank < 3:
raise ValueError('Rank of inputs must be >= 3')
num_spatial_dims = input_rank - 2
output = nn.pool(
input=inputs,
window_shape=utils.n_positive_integers(num_spatial_dims, kernel_size),
pooling_type=pooling_type,
padding=padding,
data_format=data_format,
dilation_rate=utils.n_positive_integers(num_spatial_dims,
dilation_rate),
strides=utils.n_positive_integers(num_spatial_dims, stride),
name=sc)
return utils.collect_named_outputs(outputs_collections, sc, output)
@add_arg_scope
def one_hot_encoding(labels,
num_classes,
on_value=1.0,
off_value=0.0,
outputs_collections=None,
scope=None):
"""Transform numeric labels into onehot_labels using `tf.one_hot`.
Args:
labels: [batch_size] target labels.
num_classes: Total number of classes.
on_value: A scalar defining the on-value.
off_value: A scalar defining the off-value.
outputs_collections: Collection to add the outputs.
scope: Optional scope for name_scope.
Returns:
One-hot encoding of the labels.
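Example (plain-Python sketch of the encoding; `one_hot_ref` is a
hypothetical reference helper, not part of this module):
```python
def one_hot_ref(labels, num_classes, on_value=1.0, off_value=0.0):
  # Each label y becomes a row with on_value at index y, off_value elsewhere.
  return [[on_value if c == y else off_value for c in range(num_classes)]
          for y in labels]

assert one_hot_ref([0, 2], 3) == [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```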
"""
with ops.name_scope(scope, 'OneHotEncoding', [labels, num_classes]) as sc:
labels = ops.convert_to_tensor(labels)
if labels.dtype == dtypes.int32:
labels = standard_ops.to_int64(labels)
outputs = standard_ops.one_hot(
labels, num_classes, on_value=on_value, off_value=off_value)
return utils.collect_named_outputs(outputs_collections, sc, outputs)
def _apply_activation(y, activation_fn, output_collections):
if activation_fn is not None:
y = activation_fn(y)
ops.add_to_collections(
list(output_collections or []) + [ops.GraphKeys.ACTIVATIONS], y)
return y
def repeat(inputs, repetitions, layer, *args, **kwargs):
"""Applies the same layer with the same arguments repeatedly.
```python
y = repeat(x, 3, conv2d, 64, [3, 3], scope='conv1')
# It is equivalent to:
x = conv2d(x, 64, [3, 3], scope='conv1/conv1_1')
x = conv2d(x, 64, [3, 3], scope='conv1/conv1_2')
y = conv2d(x, 64, [3, 3], scope='conv1/conv1_3')
```
If the `scope` argument is not given in `kwargs`, it is set to
`layer.__name__`, or `layer.func.__name__` (for `functools.partial`
objects). If neither `__name__` nor `func.__name__` is available, the
layers are called with `scope='repeat'`.
Args:
inputs: A `Tensor` suitable for layer.
repetitions: Int, number of repetitions.
layer: A layer with arguments `(inputs, *args, **kwargs)`
*args: Extra args for the layer.
**kwargs: Extra kwargs for the layer.
Returns:
A tensor result of applying the layer, repetitions times.
Raises:
ValueError: If the op is unknown or wrong.
"""
scope = kwargs.pop('scope', None)
with variable_scope.variable_scope(scope, 'Repeat', [inputs]):
inputs = ops.convert_to_tensor(inputs)
if scope is None:
if hasattr(layer, '__name__'):
scope = layer.__name__
elif hasattr(layer, 'func') and hasattr(layer.func, '__name__'):
scope = layer.func.__name__ # In case layer is a functools.partial.
else:
scope = 'repeat'
outputs = inputs
for i in range(repetitions):
kwargs['scope'] = scope + '_' + str(i + 1)
outputs = layer(outputs, *args, **kwargs)
return outputs
def _scale_gradient_shape(op):
"""Shape helper function for scale_gradient function below."""
return [op.inputs[0].shape]
def _scale_gradient_grad(op, grad):
"""Python gradient helper function for scale_gradient function below."""
return [grad * op.inputs[1], None]
@function.Defun(
python_grad_func=_scale_gradient_grad, shape_func=_scale_gradient_shape)
def scale_gradient(inputs, gradient_multiplier):
"""Identity operation, but with the gradient multiplied by a tensor.
The TensorFlow gradient system will compute the gradient with respect to
`inputs` as the product of the gradient with respect to the `output`
multiplied by a specified `gradient_multiplier` tensor. If
`gradient_multiplier` is equal to 1, then this results in the true gradient.
Otherwise, it results in a scaled gradient.
This can be useful for adjusting the relative learning rate of different
parameter tensors when performing gradient descent. Because this rescaling
can be inserted at arbitrary locations within a graph, it is often more
convenient to apply than simply rescaling the final computed gradients.
Args:
inputs: Tensor to be output.
gradient_multiplier: Tensor by which to multiply the gradient with respect
to `output` to compute the gradient with respect to `inputs`. Its shape
must be broadcastable to the shape of `inputs`.
Returns:
output Tensor, equal to `inputs`.
"""
# gradient_multiplier is implicitly saved by decorator, and only used for
# gradient computation.
del gradient_multiplier
return inputs
@add_arg_scope
def separable_convolution2d(
inputs,
num_outputs,
kernel_size,
depth_multiplier=1,
stride=1,
padding='SAME',
data_format=DATA_FORMAT_NHWC,
rate=1,
activation_fn=nn.relu,
normalizer_fn=None,
normalizer_params=None,
weights_initializer=initializers.xavier_initializer(),
pointwise_initializer=None,
weights_regularizer=None,
biases_initializer=init_ops.zeros_initializer(),
biases_regularizer=None,
reuse=None,
variables_collections=None,
outputs_collections=None,
trainable=True,
scope=None):
"""Adds a depth-separable 2D convolution with optional batch_norm layer.
This op first performs a depthwise convolution that acts separately on
channels, creating a variable called `depthwise_weights`. If `num_outputs`
is not None, it adds a pointwise convolution that mixes channels, creating a
variable called `pointwise_weights`. Then, if `normalizer_fn` is None,
it adds bias to the result, creating a variable called 'biases', otherwise,
the `normalizer_fn` is applied. It finally applies an activation function
to produce the end result.
Args:
inputs: A tensor of size [batch_size, height, width, channels].
num_outputs: The number of pointwise convolution output filters. If it is
None, then we skip the pointwise convolution stage.
kernel_size: A list of length 2: [kernel_height, kernel_width] of the
filters. Can be an int if both values are the same.
depth_multiplier: The number of depthwise convolution output channels for
each input channel. The total number of depthwise convolution output
channels will be equal to `num_filters_in * depth_multiplier`.
stride: A list of length 2: [stride_height, stride_width], specifying the
depthwise convolution stride. Can be an int if both strides are the same.
padding: One of 'VALID' or 'SAME'.
data_format: A string. `NHWC` (default) and `NCHW` are supported.
rate: A list of length 2: [rate_height, rate_width], specifying the dilation
rates for atrous convolution. Can be an int if both rates are the same. If
any value is larger than one, then both stride values need to be one.
activation_fn: Activation function. The default value is a ReLU function.
Explicitly set it to None to skip it and maintain a linear activation.
normalizer_fn: Normalization function to use instead of `biases`. If
`normalizer_fn` is provided then `biases_initializer` and
`biases_regularizer` are ignored and `biases` are not created nor added.
Defaults to None, which means no normalizer function is applied.
normalizer_params: Normalization function parameters.
weights_initializer: An initializer for the depthwise weights.
pointwise_initializer: An initializer for the pointwise weights. Defaults
to None, which means `weights_initializer` is used.
weights_regularizer: Optional regularizer for the weights.
biases_initializer: An initializer for the biases. If None skip biases.
biases_regularizer: Optional regularizer for the biases.
reuse: Whether or not the layer and its variables should be reused. To be
able to reuse the layer scope must be given.
variables_collections: Optional list of collections for all the variables or
a dictionary containing a different list of collection per variable.
outputs_collections: Collection to add the outputs.
trainable: Whether or not the variables should be trainable or not.
scope: Optional scope for variable_scope.
Returns:
A `Tensor` representing the output of the operation.
Raises:
ValueError: If `data_format` is invalid.
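Example (plain-Python sketch of the weight-count saving;
`separable_conv_params` is a hypothetical helper, not part of this module):
```python
def separable_conv_params(kh, kw, c_in, c_out, depth_multiplier=1):
  depthwise = kh * kw * c_in * depth_multiplier  # depthwise_weights
  pointwise = c_in * depth_multiplier * c_out    # pointwise_weights (1x1)
  return depthwise + pointwise

# A 3x3 separable conv from 32 to 64 channels uses 288 + 2048 = 2336
# weights, versus 3 * 3 * 32 * 64 = 18432 for a standard convolution.
assert separable_conv_params(3, 3, 32, 64) == 2336
```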
"""
if data_format not in (DATA_FORMAT_NCHW, DATA_FORMAT_NHWC):
raise ValueError('data_format has to be either NCHW or NHWC.')
layer_variable_getter = _build_variable_getter({
'bias': 'biases',
'depthwise_kernel': 'depthwise_weights',
'pointwise_kernel': 'pointwise_weights'
})
with variable_scope.variable_scope(
scope,
'SeparableConv2d', [inputs],
reuse=reuse,
custom_getter=layer_variable_getter) as sc:
inputs = ops.convert_to_tensor(inputs)
if pointwise_initializer is None:
pointwise_initializer = weights_initializer
df = ('channels_first'
if data_format and data_format.startswith('NC') else 'channels_last')
if num_outputs is not None:
# Apply separable conv using the SeparableConvolution2D layer.
layer = convolutional_layers.SeparableConvolution2D(
filters=num_outputs,
kernel_size=kernel_size,
strides=stride,
padding=padding,
data_format=df,
dilation_rate=utils.two_element_tuple(rate),
activation=None,
depth_multiplier=depth_multiplier,
use_bias=not normalizer_fn and biases_initializer,
depthwise_initializer=weights_initializer,
pointwise_initializer=pointwise_initializer,
bias_initializer=biases_initializer,
depthwise_regularizer=weights_regularizer,
pointwise_regularizer=weights_regularizer,
bias_regularizer=biases_regularizer,
activity_regularizer=None,
trainable=trainable,
name=sc.name,
dtype=inputs.dtype.base_dtype,
_scope=sc,
_reuse=reuse)
outputs = layer.apply(inputs)
# Add variables to collections.
_add_variable_to_collections(layer.depthwise_kernel,
variables_collections, 'weights')
_add_variable_to_collections(layer.pointwise_kernel,
variables_collections, 'weights')
if layer.bias is not None:
_add_variable_to_collections(layer.bias, variables_collections,
'biases')
if normalizer_fn is not None:
normalizer_params = normalizer_params or {}
outputs = normalizer_fn(outputs, **normalizer_params)
else:
# Actually apply depthwise conv instead of separable conv.
dtype = inputs.dtype.base_dtype
kernel_h, kernel_w = utils.two_element_tuple(kernel_size)
stride_h, stride_w = utils.two_element_tuple(stride)
num_filters_in = utils.channel_dimension(
inputs.get_shape(), df, min_rank=4)
weights_collections = utils.get_variable_collections(
variables_collections, 'weights')
depthwise_shape = [kernel_h, kernel_w, num_filters_in, depth_multiplier]
depthwise_weights = variables.model_variable(
'depthwise_weights',
shape=depthwise_shape,
dtype=dtype,
initializer=weights_initializer,
regularizer=weights_regularizer,
trainable=trainable,
collections=weights_collections)
strides = [
1, 1, stride_h, stride_w
] if data_format.startswith('NC') else [1, stride_h, stride_w, 1]
outputs = nn.depthwise_conv2d(
inputs,
depthwise_weights,
strides,
padding,
rate=utils.two_element_tuple(rate),
data_format=data_format)
num_outputs = depth_multiplier * num_filters_in
if normalizer_fn is not None:
normalizer_params = normalizer_params or {}
outputs = normalizer_fn(outputs, **normalizer_params)
else:
if biases_initializer is not None:
biases_collections = utils.get_variable_collections(
variables_collections, 'biases')
biases = variables.model_variable(
'biases',
shape=[
num_outputs,
],
dtype=dtype,
initializer=biases_initializer,
regularizer=biases_regularizer,
trainable=trainable,
collections=biases_collections)
outputs = nn.bias_add(outputs, biases, data_format=data_format)
if activation_fn is not None:
outputs = activation_fn(outputs)
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
@add_arg_scope
def sequence_to_images(inputs,
height,
output_data_format='channels_last',
outputs_collections=None,
scope=None):
"""Convert a batch of sequences into a batch of images.
Args:
inputs: (num_steps, num_batches, depth) sequence tensor
height: the height of the images
output_data_format: Format of output tensor. Currently supports
`'channels_first'` and `'channels_last'`.
outputs_collections: The collections to which the outputs are added.
scope: Optional scope for name_scope.
Returns:
A tensor representing the output of the operation.
"""
with ops.name_scope(scope, 'SequenceToImages', [inputs]) as sc:
inputs = ops.convert_to_tensor(inputs)
width, num_batches, depth = inputs.get_shape().as_list()
if num_batches is None:
num_batches = -1
else:
num_batches //= height
reshaped = array_ops.reshape(inputs, [width, num_batches, height, depth])
if output_data_format == 'channels_first':
outputs = array_ops.transpose(reshaped, [1, 3, 2, 0])
else:
outputs = array_ops.transpose(reshaped, [1, 2, 0, 3])
return utils.collect_named_outputs(outputs_collections, sc, outputs)
@add_arg_scope
def softmax(logits, scope=None):
"""Performs softmax on Nth dimension of N-dimensional logit tensor.
For two-dimensional logits this reduces to tf.nn.softmax. The N-th dimension
needs to have a specified number of elements (number of classes).
Args:
logits: N-dimensional `Tensor` with logits, where N > 1.
scope: Optional scope for variable_scope.
Returns:
A `Tensor` with same shape and type as logits.
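Example (plain-Python sketch of the softmax computation; `softmax_ref` is
a hypothetical reference helper, not part of this module):
```python
import math

def softmax_ref(logits):
  m = max(logits)  # subtract the max for numerical stability
  exps = [math.exp(v - m) for v in logits]
  total = sum(exps)
  return [e / total for e in exps]

p = softmax_ref([1.0, 2.0, 3.0])
assert abs(sum(p) - 1.0) < 1e-9  # probabilities sum to one
assert p[2] > p[1] > p[0]        # order of the logits is preserved
```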
"""
# TODO(jrru): Add axis argument which defaults to last dimension.
with variable_scope.variable_scope(scope, 'softmax', [logits]):
num_logits = utils.last_dimension(logits.get_shape(), min_rank=2)
logits_2d = array_ops.reshape(logits, [-1, num_logits])
predictions = nn.softmax(logits_2d)
predictions = array_ops.reshape(predictions, array_ops.shape(logits))
if not context.executing_eagerly():
predictions.set_shape(logits.get_shape())
return predictions
@add_arg_scope
def spatial_softmax(features,
temperature=None,
name=None,
variables_collections=None,
trainable=True,
data_format='NHWC'):
"""Computes the spatial softmax of a convolutional feature map.
First computes the softmax over the spatial extent of each channel of a
convolutional feature map. Then computes the expected 2D position of the
points of maximal activation for each channel, resulting in a set of
feature keypoints [i1, j1, ... iN, jN] for all N channels.
Read more here:
"Learning visual feature spaces for robotic manipulation with
deep spatial autoencoders." Finn et al., http://arxiv.org/abs/1509.06113.
Args:
features: A `Tensor` of size [batch_size, H, W, num_channels]; the
convolutional feature map.
temperature: Softmax temperature (optional). If None, a learnable
temperature is created.
name: A name for this operation (optional).
variables_collections: Collections for the temperature variable.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see `tf.Variable`).
data_format: A string. `NHWC` (default) and `NCHW` are supported.
Returns:
feature_keypoints: A `Tensor` with size [batch_size, num_channels * 2];
the expected 2D locations of each channel's feature keypoint (normalized
to the range (-1,1)). The inner dimension is arranged as
[i1, j1, ... iN, jN].
Raises:
ValueError: If unexpected data_format specified.
ValueError: If num_channels dimension is unspecified.
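Example (plain-Python 1-D sketch, illustration only): softmax attention
over positions, then the expected coordinate, which is what this op
computes per channel along each spatial axis.
```python
import math

acts = [0.0, 0.0, 5.0, 0.0]           # strong activation at index 2
pos = [-1.0, -1.0 / 3, 1.0 / 3, 1.0]  # coordinates scaled to [-1, 1]
exps = [math.exp(a) for a in acts]
att = [e / sum(exps) for e in exps]
keypoint = sum(p * a for p, a in zip(pos, att))
assert abs(keypoint - 1.0 / 3) < 0.05  # lands near the activated position
```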
"""
with variable_scope.variable_scope(name, 'spatial_softmax'):
shape = array_ops.shape(features)
static_shape = features.shape
if data_format == DATA_FORMAT_NHWC:
height, width, num_channels = shape[1], shape[2], static_shape[3]
elif data_format == DATA_FORMAT_NCHW:
num_channels, height, width = static_shape[1], shape[2], shape[3]
else:
raise ValueError('data_format has to be either NCHW or NHWC.')
if tensor_shape.dimension_value(num_channels) is None:
raise ValueError('The num_channels dimension of the inputs to '
'`spatial_softmax` should be defined. Found `None`.')
with ops.name_scope('spatial_softmax_op', 'spatial_softmax_op', [features]):
# Create tensors for x and y coordinate values, scaled to range [-1, 1].
pos_x, pos_y = array_ops.meshgrid(
math_ops.lin_space(-1., 1., num=height),
math_ops.lin_space(-1., 1., num=width),
indexing='ij')
pos_x = array_ops.reshape(pos_x, [height * width])
pos_y = array_ops.reshape(pos_y, [height * width])
if temperature is None:
temp_initializer = init_ops.ones_initializer()
else:
temp_initializer = init_ops.constant_initializer(temperature)
if not trainable:
temp_collections = None
else:
temp_collections = utils.get_variable_collections(
variables_collections, 'temperature')
temperature = variables.model_variable(
'temperature',
shape=(),
dtype=dtypes.float32,
initializer=temp_initializer,
collections=temp_collections,
trainable=trainable)
if data_format == 'NCHW':
features = array_ops.reshape(features, [-1, height * width])
else:
features = array_ops.reshape(
array_ops.transpose(features, [0, 3, 1, 2]), [-1, height * width])
softmax_attention = nn.softmax(features / temperature)
expected_x = math_ops.reduce_sum(
pos_x * softmax_attention, [1], keepdims=True)
expected_y = math_ops.reduce_sum(
pos_y * softmax_attention, [1], keepdims=True)
expected_xy = array_ops.concat([expected_x, expected_y], 1)
feature_keypoints = array_ops.reshape(
expected_xy, [-1, tensor_shape.dimension_value(num_channels) * 2])
feature_keypoints.set_shape(
[None, tensor_shape.dimension_value(num_channels) * 2])
return feature_keypoints
def stack(inputs, layer, stack_args, **kwargs):
"""Builds a stack of layers by applying layer repeatedly using stack_args.
`stack` allows you to repeatedly apply the same operation with different
arguments `stack_args[i]`. For each application of the layer, `stack` creates
a new scope appended with an increasing number. For example:
```python
y = stack(x, fully_connected, [32, 64, 128], scope='fc')
# It is equivalent to:
x = fully_connected(x, 32, scope='fc/fc_1')
x = fully_connected(x, 64, scope='fc/fc_2')
y = fully_connected(x, 128, scope='fc/fc_3')
```
If the `scope` argument is not given in `kwargs`, it is set to
`layer.__name__`, or `layer.func.__name__` (for `functools.partial`
objects). If neither `__name__` nor `func.__name__` is available, the
layers are called with `scope='stack'`.
Args:
inputs: A `Tensor` suitable for layer.
layer: A layer with arguments `(inputs, *args, **kwargs)`
stack_args: A list/tuple of parameters for each call of layer.
**kwargs: Extra kwargs for the layer.
Returns:
A `Tensor` result of applying the stacked layers.
Raises:
ValueError: If the op is unknown or wrong.
"""
scope = kwargs.pop('scope', None)
if not isinstance(stack_args, (list, tuple)):
raise ValueError('stack_args need to be a list or tuple')
with variable_scope.variable_scope(scope, 'Stack', [inputs]):
inputs = ops.convert_to_tensor(inputs)
if scope is None:
if hasattr(layer, '__name__'):
scope = layer.__name__
elif hasattr(layer, 'func') and hasattr(layer.func, '__name__'):
scope = layer.func.__name__ # In case layer is a functools.partial.
else:
scope = 'stack'
outputs = inputs
for i in range(len(stack_args)):
kwargs['scope'] = scope + '_' + str(i + 1)
layer_args = stack_args[i]
if not isinstance(layer_args, (list, tuple)):
layer_args = [layer_args]
outputs = layer(outputs, *layer_args, **kwargs)
return outputs
@add_arg_scope
def unit_norm(inputs, dim, epsilon=1e-7, scope=None):
"""Normalizes the given input across the specified dimension to unit length.
Note that the rank of `input` must be known.
Args:
inputs: A `Tensor` of arbitrary size.
dim: The dimension along which the input is normalized.
epsilon: A small value added to the sum of squares to avoid dividing by
zero.
scope: Optional scope for variable_scope.
Returns:
The normalized `Tensor`.
Raises:
ValueError: If `dim` is negative or not smaller than the rank of `inputs`.
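Example (plain-Python sketch of the normalization; `unit_norm_ref` is a
hypothetical reference helper, not part of this module):
```python
def unit_norm_ref(v, epsilon=1e-7):
  # Divide by the (epsilon-stabilized) Euclidean length.
  length = (epsilon + sum(x * x for x in v)) ** 0.5
  return [x / length for x in v]

out = unit_norm_ref([3.0, 4.0])  # length 5 -> rescaled to unit length
assert abs(sum(x * x for x in out) - 1.0) < 1e-6
```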
"""
with variable_scope.variable_scope(scope, 'UnitNorm', [inputs]):
if not inputs.get_shape():
raise ValueError('The input rank must be known.')
input_rank = len(inputs.get_shape().as_list())
if dim < 0 or dim >= input_rank:
raise ValueError('dim must be non-negative and smaller than the input rank.')
lengths = math_ops.sqrt(
epsilon + math_ops.reduce_sum(math_ops.square(inputs), dim, True))
multiples = []
if dim > 0:
multiples.append(array_ops.ones([dim], dtypes.int32))
multiples.append(
array_ops.strided_slice(array_ops.shape(inputs), [dim], [dim + 1]))
if dim < (input_rank - 1):
multiples.append(array_ops.ones([input_rank - 1 - dim], dtypes.int32))
multiples = array_ops.concat(multiples, 0)
return math_ops.div(inputs, array_ops.tile(lengths, multiples))
@add_arg_scope
def maxout(inputs, num_units, axis=-1, scope=None):
"""Adds a maxout op from https://arxiv.org/abs/1302.4389
"Maxout Networks" Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron
Courville,
Yoshua Bengio
Usually the operation is performed in the filter/channel dimension. This can
also be
used after fully-connected layers to reduce number of features.
Arguments:
inputs: Tensor input
num_units: Specifies how many features will remain after maxout in the
`axis` dimension (usually channel). This must be a divisor of the number
of features.
axis: The dimension where max pooling will be performed. Default is the last
dimension.
scope: Optional scope for variable_scope.
Returns:
A `Tensor` representing the results of the pooling operation.
Raises:
ValueError: If the number of features is not a multiple of `num_units`.
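Example (plain-Python sketch over the channel dimension; `maxout_ref` is a
hypothetical reference helper, not part of this module):
```python
def maxout_ref(features, num_units):
  # features: channel values; reshape into num_units groups and take the
  # max over each group, as the op does along `axis`.
  group = len(features) // num_units
  return [max(features[i * group:(i + 1) * group]) for i in range(num_units)]

assert maxout_ref([1.0, 5.0, 2.0, 4.0, 3.0, 0.0], 3) == [5.0, 4.0, 3.0]
```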
"""
with variable_scope.variable_scope(scope, 'MaxOut', [inputs]):
inputs = ops.convert_to_tensor(inputs)
shape = inputs.get_shape().as_list()
num_channels = shape[axis]
if num_channels % num_units:
raise ValueError('number of features({}) is not '
'a multiple of num_units({})'.format(
num_channels, num_units))
shape[axis] = num_units
shape += [num_channels // num_units]
# Dealing with batches with arbitrary sizes
for i in range(len(shape)):
if shape[i] is None:
shape[i] = array_ops.shape(inputs)[i]
outputs = math_ops.reduce_max(
array_ops.reshape(inputs, shape), -1, keepdims=False)
return outputs
def poincare_normalize(x, axis=1, epsilon=1e-5, name=None):
"""Project into the Poincare ball with norm <= 1.0 - epsilon.
https://en.wikipedia.org/wiki/Poincare_ball_model
Used in
Poincare Embeddings for Learning Hierarchical Representations
Maximilian Nickel, Douwe Kiela
https://arxiv.org/pdf/1705.08039.pdf
For a 1-D tensor with `axis = 0`, computes
(x * (1 - epsilon)) / ||x|| if ||x|| > 1 - epsilon
output =
x otherwise
For `x` with more dimensions, independently normalizes each 1-D slice along
dimension `axis`.
Args:
x: A `Tensor`.
axis: Axis along which to normalize. A scalar or a vector of integers.
epsilon: A small deviation from the edge of the unit sphere for numerical
stability.
name: A name for this operation (optional).
Returns:
A `Tensor` with the same shape as `x`.
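Example (plain-Python sketch of the projection; `poincare_normalize_ref`
is a hypothetical reference helper, not part of this function):
```python
def poincare_normalize_ref(x, epsilon=1e-5):
  # Points inside the ball are kept; points outside are projected onto
  # the sphere of radius 1 - epsilon.
  norm = sum(v * v for v in x) ** 0.5
  if norm > 1.0 - epsilon:
    return [v * (1.0 - epsilon) / norm for v in x]
  return list(x)

assert poincare_normalize_ref([0.3, 0.4]) == [0.3, 0.4]  # norm 0.5: kept
projected = poincare_normalize_ref([3.0, 4.0])           # norm 5: projected
assert abs(sum(v * v for v in projected) ** 0.5 - (1.0 - 1e-5)) < 1e-9
```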
"""
with ops.name_scope(name, 'poincare_normalize', [x]) as name:
x = ops.convert_to_tensor(x, name='x')
square_sum = math_ops.reduce_sum(math_ops.square(x), axis, keepdims=True)
x_inv_norm = math_ops.rsqrt(square_sum)
x_inv_norm = math_ops.minimum((1. - epsilon) * x_inv_norm, 1.)
return math_ops.multiply(x, x_inv_norm, name=name)
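The projection above has a direct NumPy analogue. A minimal sketch (hypothetical function name), assuming no slice is the zero vector, where the division would warn just as the TF version's `rsqrt` would produce `inf` before `minimum` caps the scale:

```python
import numpy as np

def poincare_normalize_np(x, axis=1, epsilon=1e-5):
    # Rescale each slice along `axis` so its norm is at most 1 - epsilon;
    # slices already inside the ball are left unchanged (scale capped at 1).
    norm = np.sqrt(np.sum(np.square(x), axis=axis, keepdims=True))
    scale = np.minimum((1. - epsilon) / norm, 1.)
    return x * scale

v = poincare_normalize_np(np.array([[3., 4.]]))   # norm 5 -> rescaled to 1 - epsilon
w = poincare_normalize_np(np.array([[0.3, 0.4]])) # norm 0.5 -> unchanged
```

Only slices whose norm exceeds `1 - epsilon` are rescaled; everything already inside the ball passes through untouched.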
def legacy_fully_connected(x,
num_output_units,
activation_fn=None,
weight_init=initializers.xavier_initializer(),
bias_init=init_ops.zeros_initializer(),
name=None,
weight_collections=(ops.GraphKeys.WEIGHTS,),
bias_collections=(ops.GraphKeys.BIASES,),
output_collections=(ops.GraphKeys.ACTIVATIONS,),
trainable=True,
weight_regularizer=None,
bias_regularizer=None):
# pylint: disable=anomalous-backslash-in-string
r"""Adds the parameters for a fully connected layer and returns the output.
A fully connected layer is generally defined as a matrix multiply:
`y = f(w * x + b)` where `f` is given by `activation_fn`. If
`activation_fn` is `None`, the result of `y = w * x + b` is
returned.
If `x` has shape [\\(\text{dim}_0, \text{dim}_1, ..., \text{dim}_n\\)]
with more than 2 dimensions (\\(n > 1\\)), then we repeat the matrix
multiply along the first dimensions. The result r is a tensor of shape
[\\(\text{dim}_0, ..., \text{dim}_{n-1},\\) `num_output_units`],
where \\( r_{i_0, ..., i_{n-1}, k} =
\sum_{0 \leq j < \text{dim}_n} x_{i_0, ... i_{n-1}, j} \cdot w_{j, k}\\).
This is accomplished by reshaping `x` to 2-D
[\\(\text{dim}_0 \cdot ... \cdot \text{dim}_{n-1}, \text{dim}_n\\)]
before the matrix multiply and afterwards reshaping it to
[\\(\text{dim}_0, ..., \text{dim}_{n-1},\\) `num_output_units`].
This op creates `w` and optionally `b`. Bias (`b`) can be disabled by setting
`bias_init` to `None`.
The variable creation is compatible with `tf.compat.v1.variable_scope` and so
can be
reused with `tf.compat.v1.variable_scope` or `tf.compat.v1.make_template`.
Most of the details of variable creation can be controlled by specifying the
initializers (`weight_init` and `bias_init`) and in which collections to place
the created variables (`weight_collections` and `bias_collections`; note that
the variables are always added to the `VARIABLES` collection). The output of
the layer can be placed in custom collections using `output_collections`.
The collections arguments default to `WEIGHTS`, `BIASES` and `ACTIVATIONS`,
respectively.
A per layer regularization can be specified by setting `weight_regularizer`
and `bias_regularizer`, which are applied to the weights and biases
respectively, and whose output is added to the `REGULARIZATION_LOSSES`
collection.
Args:
x: The input `Tensor`.
num_output_units: The size of the output.
activation_fn: Activation function, default set to None to skip it and
maintain a linear activation.
weight_init: An optional weight initialization, defaults to
`xavier_initializer`.
bias_init: An initializer for the bias, defaults to 0. Set to `None` in
order to disable bias.
name: The name for this operation is used to name operations and to find
variables. If specified it must be unique for this scope, otherwise a
unique name starting with "fully_connected" will be created. See
`tf.compat.v1.variable_scope` for details.
weight_collections: List of graph collections to which weights are added.
bias_collections: List of graph collections to which biases are added.
output_collections: List of graph collections to which outputs are added.
trainable: If `True` also add variables to the graph collection
`GraphKeys.TRAINABLE_VARIABLES` (see tf.Variable).
weight_regularizer: A regularizer like the result of `l1_regularizer` or
`l2_regularizer`. Used for weights.
bias_regularizer: A regularizer like the result of `l1_regularizer` or
`l2_regularizer`. Used for biases.
Returns:
The output of the fully connected layer.
Raises:
ValueError: If x has rank less than 2 or if its last dimension is not set.
"""
with variable_scope.variable_scope(name, 'fully_connected', [x]):
x = ops.convert_to_tensor(x)
dims = x.get_shape().dims
if dims is None:
raise ValueError('dims of x must be known but is None')
if len(dims) < 2:
raise ValueError('rank of x must be at least 2 not: %d' % len(dims))
num_input_units = dims[-1].value
if num_input_units is None:
raise ValueError('last dimension of x must be known but is None')
dtype = x.dtype.base_dtype
weight_collections = set(
list(weight_collections or []) + [ops.GraphKeys.GLOBAL_VARIABLES])
w = variable_scope.get_variable(
'weights',
shape=[num_input_units, num_output_units],
dtype=dtype,
initializer=weight_init,
collections=weight_collections,
regularizer=weight_regularizer,
trainable=trainable)
x_2_dim = x if len(dims) <= 2 else array_ops.reshape(
x, [-1, num_input_units])
y = standard_ops.matmul(x_2_dim, w)
if bias_init is not None:
bias_collections = set(
list(bias_collections or []) + [ops.GraphKeys.GLOBAL_VARIABLES])
b = variable_scope.get_variable(
'bias',
shape=[num_output_units],
dtype=dtype,
initializer=bias_init,
collections=bias_collections,
regularizer=bias_regularizer,
trainable=trainable)
y = nn.bias_add(y, b)
if len(dims) > 2:
out_shape = array_ops.unstack(array_ops.shape(x))
out_shape[-1] = num_output_units
y = array_ops.reshape(y, array_ops.stack(out_shape))
static_shape = x.get_shape().as_list()
static_shape[-1] = num_output_units
y.set_shape(static_shape)
return _apply_activation(y, activation_fn, output_collections)
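The reshape trick the docstring describes (collapse the leading dimensions to 2-D, matrix-multiply, then restore them) can be checked in plain NumPy; the shapes below are arbitrary examples:

```python
import numpy as np

# x has shape [dim_0, dim_1, dim_n]; w has shape [dim_n, num_output_units].
x = np.arange(24, dtype=float).reshape(2, 3, 4)
w = np.ones((4, 5))

# Collapse leading dims, multiply, then restore them -- y: [2, 3, 5].
y = (x.reshape(-1, 4) @ w).reshape(2, 3, 5)

# This contracts the last axis of x against the first axis of w,
# exactly the batched matmul the docstring's formula describes.
assert np.allclose(y, np.einsum('abj,jk->abk', x, w))
```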
# TODO(eiderm): Verify and fix autocomplete in colab (also relu6).
# Simple aliases which remove the activation_fn parameter.
elu = functools.partial(fully_connected, activation_fn=nn.elu)
legacy_relu = functools.partial(legacy_fully_connected, activation_fn=nn.relu)
legacy_linear = functools.partial(legacy_fully_connected, activation_fn=None)
relu = functools.partial(fully_connected, activation_fn=nn.relu)
relu6 = functools.partial(fully_connected, activation_fn=nn.relu6)
linear = functools.partial(fully_connected, activation_fn=None)
# Simple alias.
conv1d = convolution1d
conv2d = convolution2d
conv3d = convolution3d
conv2d_transpose = convolution2d_transpose
conv3d_transpose = convolution3d_transpose
conv2d_in_plane = convolution2d_in_plane
separable_conv2d = separable_convolution2d
================================================
FILE: tf_contrib/loader.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Utilities for loading op libraries.
@@load_op_library
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import re
from tensorflow.python.framework import load_library
from tensorflow.python.platform import resource_loader
def load_op_library(path):
"""Loads a contrib op library from the given path.
NOTE(mrry): On Windows, we currently assume that some contrib op
libraries are statically linked into the main TensorFlow Python
extension DLL - use dynamically linked ops if the .so is present.
Args:
path: An absolute path to a shared object file.
Returns:
A Python module containing the Python wrappers for Ops defined in the
plugin.
"""
if os.name == 'nt':
# To avoid making every user_ops aware of windows, re-write
# the file extension from .so to .dll if .so file doesn't exist.
if not os.path.exists(path):
path = re.sub(r'\.so$', '.dll', path)
# Currently we have only some user_ops as dlls on windows - don't try
# to load them if the dll is not found.
# TODO(mrry): Once we have all of them this check should be removed.
if not os.path.exists(path):
return None
path = resource_loader.get_path_to_datafile(path)
ret = load_library.load_op_library(path)
assert ret, 'Could not load %s' % path
return ret
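The Windows fallback above amounts to a suffix rewrite on the library path; a minimal sketch of just that step (the helper name is illustrative, not part of the loader API):

```python
import os
import re

def _rewrite_so_to_dll(path):
    # On Windows, if the .so file is missing, look for a .dll with the
    # same stem instead, as load_op_library does above.
    if os.name == 'nt' and not os.path.exists(path):
        path = re.sub(r'\.so$', '.dll', path)
    return path

# The regex only touches a trailing '.so', leaving other names alone.
print(re.sub(r'\.so$', '.dll', '_my_ops.so'))    # _my_ops.dll
print(re.sub(r'\.so$', '.dll', 'libsomething'))  # libsomething
```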
================================================
FILE: tf_contrib/regularizers.py
================================================
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Regularizers for use with layers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numbers
from tensorflow.python.framework import constant_op
from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn
from tensorflow.python.ops import standard_ops
from tensorflow.python.platform import tf_logging as logging
__all__ = ['l1_regularizer',
'l2_regularizer',
'l1_l2_regularizer',
'sum_regularizer',
'apply_regularization']
def l1_regularizer(scale, scope=None):
"""Returns a function that can be used to apply L1 regularization to weights.
L1 regularization encourages sparsity.
Args:
scale: A scalar multiplier `Tensor`. 0.0 disables the regularizer.
scope: An optional scope name.
Returns:
A function with signature `l1(weights)` that applies L1 regularization.
Raises:
ValueError: If scale is negative or if scale is not a float.
"""
if isinstance(scale, numbers.Integral):
raise ValueError('scale cannot be an integer: %s' % scale)
if isinstance(scale, numbers.Real):
if scale < 0.:
raise ValueError('Setting a scale less than 0 on a regularizer: %g' %
scale)
if scale == 0.:
logging.info('Scale of 0 disables regularizer.')
return lambda _: None
def l1(weights, name=None):
"""Applies L1 regularization to weights."""
with ops.name_scope(scope, 'l1_regularizer', [weights]) as name:
my_scale = ops.convert_to_tensor(scale,
dtype=weights.dtype.base_dtype,
name='scale')
return standard_ops.multiply(
my_scale,
standard_ops.reduce_sum(standard_ops.abs(weights)),
name=name)
return l1
def l2_regularizer(scale, scope=None):
"""Returns a function that can be used to apply L2 regularization to weights.
Small values of L2 can help prevent overfitting the training data.
Args:
scale: A scalar multiplier `Tensor`. 0.0 disables the regularizer.
scope: An optional scope name.
Returns:
A function with signature `l2(weights)` that applies L2 regularization.
Raises:
ValueError: If scale is negative or if scale is not a float.
"""
if isinstance(scale, numbers.Integral):
raise ValueError('scale cannot be an integer: %s' % (scale,))
if isinstance(scale, numbers.Real):
if scale < 0.:
raise ValueError('Setting a scale less than 0 on a regularizer: %g.' %
scale)
if scale == 0.:
logging.info('Scale of 0 disables regularizer.')
return lambda _: None
def l2(weights):
"""Applies l2 regularization to weights."""
with ops.name_scope(scope, 'l2_regularizer', [weights]) as name:
my_scale = ops.convert_to_tensor(scale,
dtype=weights.dtype.base_dtype,
name='scale')
return standard_ops.multiply(my_scale, nn.l2_loss(weights), name=name)
return l2
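Numerically, these two regularizers reduce to simple closed forms. A NumPy sketch with hypothetical helper names (note that `nn.l2_loss`, used above, includes a factor of 1/2):

```python
import numpy as np

def l1_penalty(weights, scale):
    # scale * sum(|w|)
    return scale * np.sum(np.abs(weights))

def l2_penalty(weights, scale):
    # scale * sum(w**2) / 2 -- matches nn.l2_loss, which halves the sum
    return scale * np.sum(np.square(weights)) / 2.0

w = np.array([[1.0, -2.0], [3.0, -4.0]])
print(l1_penalty(w, 0.1))  # 0.1 * 10 = 1.0
print(l2_penalty(w, 0.1))  # 0.1 * 30 / 2 = 1.5
```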
def l1_l2_regularizer(scale_l1=1.0, scale_l2=1.0, scope=None):
"""Returns a function that can be used to apply L1 and L2 regularization.
Args:
scale_l1: A scalar multiplier `Tensor` for L1 regularization.
scale_l2: A scalar multiplier `Tensor` for L2 regularization.
scope: An optional scope name.
Returns:
A function with signature `l1_l2(weights)` that applies a weighted sum of
L1 and L2 regularization.
Raises:
ValueError: If scale is negative or if scale is not a float.
"""
if isinstance(scale_l1, numbers.Integral):
raise ValueError('scale_l1 cannot be an integer: %s' % (scale_l1,))
if isinstance(scale_l2, numbers.Integral):
raise ValueError('scale_l2 cannot be an integer: %s' % (scale_l2,))
scope = scope or 'l1_l2_regularizer'
if scale_l1 == 0.:
return l2_regularizer(scale_l2, scope)
if scale_l2 == 0.:
return l1_regularizer(scale_l1, scope)
return sum_regularizer([l1_regularizer(scale_l1),
l2_regularizer(scale_l2)],
scope=scope)
def sum_regularizer(regularizer_list, scope=None):
"""Returns a function that applies the sum of multiple regularizers.
Args:
regularizer_list: A list of regularizers to apply.
scope: An optional scope name
Returns:
A function with signature `sum_reg(weights)` that applies the sum of all the
input regularizers.
SYMBOL INDEX (132 symbols across 15 files)
FILE: _tf_compat_import.py
function _compat_tf_import (line 3) | def _compat_tf_import(enable_gpu: bool = True):
FILE: faster_rcnn_wrapper.py
class FasterRCNNSlim (line 8) | class FasterRCNNSlim:
method __init__ (line 10) | def __init__(self):
method _resnet_arg_scope (line 164) | def _resnet_arg_scope():
method _reshape (line 184) | def _reshape(bottom, num_dim, name):
method _softmax (line 193) | def _softmax(bottom, name):
method test_image (line 201) | def test_image(self, sess, image, im_info):
FILE: main.py
function detect (line 12) | def detect(sess, rcnn_cls, image):
function load_file_from_dir (line 60) | def load_file_from_dir(dir_path):
function fmt_time (line 71) | def fmt_time(dtime):
function main (line 83) | def main():
FILE: nms/py_cpu_nms.py
function py_cpu_nms (line 10) | def py_cpu_nms(dets, thresh):
FILE: nms_wrapper.py
class NMSType (line 4) | class NMSType(Enum):
class NMSWrapper (line 13) | class NMSWrapper:
method __init__ (line 14) | def __init__(self, nms_type=default_nms_type):
method __call__ (line 28) | def __call__(self, *args, **kwargs):
FILE: setup.py
class custom_build_ext (line 24) | class custom_build_ext(build_ext):
method build_extensions (line 25) | def build_extensions(self):
FILE: tf_contrib/arg_scope.py
function _get_arg_stack (line 79) | def _get_arg_stack():
function current_arg_scope (line 87) | def current_arg_scope():
function arg_scope_func_key (line 92) | def arg_scope_func_key(op):
function _name_op (line 96) | def _name_op(op):
function _kwarg_names (line 100) | def _kwarg_names(func):
function _add_op (line 105) | def _add_op(op):
function arg_scope (line 111) | def arg_scope(list_ops_or_scope, **kwargs):
function add_arg_scope (line 165) | def add_arg_scope(func):
function has_arg_scope (line 189) | def has_arg_scope(func):
function arg_scoped_arguments (line 201) | def arg_scoped_arguments(func):
FILE: tf_contrib/initializers.py
function xavier_initializer (line 31) | def xavier_initializer(uniform=True, seed=None, dtype=dtypes.float32):
function variance_scaling_initializer (line 62) | def variance_scaling_initializer(factor=2.0, mode='FAN_IN', uniform=False,
FILE: tf_contrib/layers.py
function avg_pool2d (line 78) | def avg_pool2d(inputs,
function avg_pool3d (line 127) | def avg_pool3d(inputs,
function _fused_batch_norm (line 175) | def _fused_batch_norm(inputs,
function batch_norm (line 431) | def batch_norm(inputs,
function bias_add (line 841) | def bias_add(inputs,
function convolution (line 918) | def convolution(inputs,
function convolution1d (line 1074) | def convolution1d(inputs,
function convolution2d (line 1120) | def convolution2d(inputs,
function convolution3d (line 1166) | def convolution3d(inputs,
function convolution2d_in_plane (line 1212) | def convolution2d_in_plane(
function convolution2d_transpose (line 1318) | def convolution2d_transpose(
function convolution3d_transpose (line 1434) | def convolution3d_transpose(
function dense_to_sparse (line 1548) | def dense_to_sparse(tensor, eos_token=0, outputs_collections=None, scope...
function dropout (line 1573) | def dropout(inputs,
function flatten (line 1617) | def flatten(inputs, outputs_collections=None, scope=None):
function _sparse_inner_flatten (line 1638) | def _sparse_inner_flatten(inputs, new_rank):
function _dense_inner_flatten (line 1655) | def _dense_inner_flatten(inputs, new_rank):
function _inner_flatten (line 1683) | def _inner_flatten(inputs, new_rank, output_collections=None, scope=None):
function _model_variable_getter (line 1717) | def _model_variable_getter(
function _build_variable_getter (line 1755) | def _build_variable_getter(rename=None):
function _add_variable_to_collections (line 1766) | def _add_variable_to_collections(variable, collections_set, collections_...
function fully_connected (line 1780) | def fully_connected(inputs,
function layer_norm (line 2204) | def layer_norm(inputs,
function images_to_sequence (line 2337) | def images_to_sequence(inputs,
function max_pool2d (line 2372) | def max_pool2d(inputs,
function max_pool3d (line 2422) | def max_pool3d(inputs,
function pool (line 2472) | def pool(inputs,
function one_hot_encoding (line 2541) | def one_hot_encoding(labels,
function _apply_activation (line 2569) | def _apply_activation(y, activation_fn, output_collections):
function repeat (line 2577) | def repeat(inputs, repetitions, layer, *args, **kwargs):
function _scale_gradient_shape (line 2623) | def _scale_gradient_shape(op):
function _scale_gradient_grad (line 2628) | def _scale_gradient_grad(op, grad):
function scale_gradient (line 2635) | def scale_gradient(inputs, gradient_multiplier):
function separable_convolution2d (line 2666) | def separable_convolution2d(
function sequence_to_images (line 2855) | def sequence_to_images(inputs,
function softmax (line 2889) | def softmax(logits, scope=None):
function spatial_softmax (line 2914) | def spatial_softmax(features,
function stack (line 3010) | def stack(inputs, layer, stack_args, **kwargs):
function unit_norm (line 3066) | def unit_norm(inputs, dim, epsilon=1e-7, scope=None):
function maxout (line 3104) | def maxout(inputs, num_units, axis=-1, scope=None):
function poincare_normalize (line 3150) | def poincare_normalize(x, axis=1, epsilon=1e-5, name=None):
function legacy_fully_connected (line 3187) | def legacy_fully_connected(x,
FILE: tf_contrib/loader.py
function load_op_library (line 30) | def load_op_library(path):
FILE: tf_contrib/regularizers.py
function l1_regularizer (line 37) | def l1_regularizer(scale, scope=None):
function l2_regularizer (line 76) | def l2_regularizer(scale, scope=None):
function l1_l2_regularizer (line 112) | def l1_l2_regularizer(scale_l1=1.0, scale_l2=1.0, scope=None):
function sum_regularizer (line 141) | def sum_regularizer(regularizer_list, scope=None):
function apply_regularization (line 170) | def apply_regularization(regularizer, weights_list=None):
FILE: tf_contrib/resnet_utils.py
class Block (line 55) | class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
function subsample (line 68) | def subsample(inputs, factor, scope=None):
function conv2d_same (line 86) | def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=...
function stack_blocks_dense (line 149) | def stack_blocks_dense(net,
function resnet_arg_scope (line 224) | def resnet_arg_scope(weight_decay=0.0001,
FILE: tf_contrib/resnet_v1.py
function bottleneck (line 74) | def bottleneck(inputs,
function resnet_v1 (line 127) | def resnet_v1(inputs,
function resnet_v1_block (line 226) | def resnet_v1_block(scope, base_depth, num_units, stride):
function resnet_v1_50 (line 250) | def resnet_v1_50(inputs,
function resnet_v1_101 (line 276) | def resnet_v1_101(inputs,
function resnet_v1_152 (line 302) | def resnet_v1_152(inputs,
function resnet_v1_200 (line 328) | def resnet_v1_200(inputs,
FILE: tf_contrib/utils.py
function collect_named_outputs (line 42) | def collect_named_outputs(collections, alias, outputs):
function append_tensor_alias (line 65) | def append_tensor_alias(tensor, alias):
function gather_tensors_aliases (line 85) | def gather_tensors_aliases(tensors):
function get_tensor_aliases (line 100) | def get_tensor_aliases(tensor):
function convert_collection_to_dict (line 123) | def convert_collection_to_dict(collection, clear_collection=False):
function constant_value (line 142) | def constant_value(value_or_tensor_or_var, dtype=None):
function static_cond (line 171) | def static_cond(pred, fn1, fn2):
function smart_cond (line 197) | def smart_cond(pred, fn1, fn2, name=None):
function get_variable_collections (line 220) | def get_variable_collections(variables_collections, name):
function _get_dimension (line 228) | def _get_dimension(shape, dim, min_rank=1):
function channel_dimension (line 256) | def channel_dimension(shape, data_format, min_rank=1):
function last_dimension (line 275) | def last_dimension(shape, min_rank=1):
function two_element_tuple (line 292) | def two_element_tuple(int_or_tuple):
function n_positive_integers (line 323) | def n_positive_integers(n, value):
FILE: tf_contrib/variables.py
function zero_initializer (line 71) | def zero_initializer(ref, use_locking=True, name="zero_initializer"):
function assert_global_step (line 96) | def assert_global_step(global_step_tensor):
function assert_or_get_global_step (line 100) | def assert_or_get_global_step(graph=None, global_step_tensor=None):
function get_global_step (line 125) | def get_global_step(graph=None):
function create_global_step (line 130) | def create_global_step(graph=None):
function get_or_create_global_step (line 149) | def get_or_create_global_step(graph=None):
function local_variable (line 162) | def local_variable(initial_value,
function global_variable (line 186) | def global_variable(initial_value,
function variable (line 211) | def variable(name,
function model_variable (line 286) | def model_variable(name,
function add_model_variable (line 356) | def add_model_variable(var):
function get_variables (line 366) | def get_variables(scope=None,
function get_model_variables (line 390) | def get_model_variables(scope=None, suffix=None):
function get_local_variables (line 403) | def get_local_variables(scope=None, suffix=None):
function get_trainable_variables (line 416) | def get_trainable_variables(scope=None, suffix=None):
function get_variables_to_restore (line 429) | def get_variables_to_restore(include=None, exclude=None):
function get_variables_by_suffix (line 465) | def get_variables_by_suffix(suffix, scope=None):
function get_variables_by_name (line 478) | def get_variables_by_name(given_name, scope=None):
function get_unique_variable (line 492) | def get_unique_variable(var_op_name):
function assign_from_values (line 515) | def assign_from_values(var_names_to_values):
function assign_from_values_fn (line 570) | def assign_from_values_fn(var_names_to_values):
function get_variable_full_name (line 598) | def get_variable_full_name(var):
function assign_from_checkpoint (line 623) | def assign_from_checkpoint(model_path, var_list, ignore_missing_vars=Fal...
function assign_from_checkpoint_fn (line 710) | def assign_from_checkpoint_fn(model_path,
class VariableDeviceChooser (line 770) | class VariableDeviceChooser(object):
method __init__ (line 777) | def __init__(self,
method __call__ (line 807) | def __call__(self, op):
function filter_variables (line 820) | def filter_variables(var_list,