Repository: lengstrom/fast-style-transfer Branch: master Commit: 0d3d981f7ab9 Files: 13 Total size: 39.9 KB Directory structure: gitextract_gzfrkqti/ ├── .github/ │ └── FUNDING.yml ├── .gitignore ├── CITATION.cff ├── README.md ├── docs.md ├── evaluate.py ├── setup.sh ├── src/ │ ├── optimize.py │ ├── transform.py │ ├── utils.py │ └── vgg.py ├── style.py └── transform_video.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/FUNDING.yml ================================================ # These are supported funding model platforms github: [lengstrom] # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] ================================================ FILE: .gitignore ================================================ t Byte-compiled / optimized / DLL files deps.txt archive saver *~ styles pngs preds *.sw* data __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python env/ build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ *.egg-info/ .installed.cfg *.egg # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. 
*.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *,cover .hypothesis/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder target/ # IPython Notebook .ipynb_checkpoints # pyenv .python-version # celery beat schedule file celerybeat-schedule # dotenv .env # virtualenv venv/ ENV/ # Spyder project settings .spyderproject # Rope project settings .ropeproject # PyCharm .idea # checkpoint checkpoint ================================================ FILE: CITATION.cff ================================================ # YAML 1.2 --- authors: - family-names: Engstrom given-names: Logan cff-version: "1.1.0" date-released: 2016-10-31 message: "If you use this software, please cite it using these metadata." repository-code: "https://github.com/lengstrom/fast-style-transfer" title: "Fast Style Transfer" version: "1.0" ... ================================================ FILE: README.md ================================================ ## Fast Style Transfer in [TensorFlow](https://github.com/tensorflow/tensorflow) Add styles from famous paintings to any photo in a fraction of a second! [You can even style videos!](#video-stylization)

It takes 100ms on a 2015 Titan X to style the MIT Stata Center (1024×680) like Udnie, by Francis Picabia.

Our implementation is based off of a combination of Gatys' [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576), Johnson's [Perceptual Losses for Real-Time Style Transfer and Super-Resolution](http://cs.stanford.edu/people/jcjohns/eccv16/), and Ulyanov's [Instance Normalization](https://arxiv.org/abs/1607.08022). ### Sponsorship Please consider sponsoring my work on this project! ### License Copyright (c) 2016 Logan Engstrom. Contact me for commercial use (or rather any use that is not academic research) (email: engstrom at my university's domain dot edu). Free for research use, as long as proper attribution is given and this copyright notice is retained. ## Video Stylization Here we transformed every frame in a video, then combined the results. [Click to go to the full demo on YouTube!](https://www.youtube.com/watch?v=xVJwwWQlQ1o) The style here is Udnie, as above.
(Image: stylized fox video.)
See how to generate these videos [here](#stylizing-video)! ## Image Stylization We added styles from various paintings to a photo of Chicago. Click on thumbnails to see full applied style images.


## Implementation Details Our implementation uses TensorFlow to train a fast style transfer network. We use roughly the same transformation network as described in Johnson, except that batch normalization is replaced with Ulyanov's instance normalization, and the scaling/offset of the output `tanh` layer is slightly different. We use a loss function close to the one described in Gatys, using VGG19 instead of VGG16 and typically using "shallower" layers than in Johnson's implementation (e.g. we use `relu1_1` rather than `relu1_2`). Empirically, this results in larger scale style features in transformations. ## Virtual Environment Setup (Anaconda) - Windows/Linux Tested on | Spec | | |-----------------------------|-------------------------------------------------------------| | Operating System | Windows 10 Home | | GPU | Nvidia GTX 2080 TI | | CUDA Version | 11.0 | | Driver Version | 445.75 | ### Step 1:Install Anaconda https://docs.anaconda.com/anaconda/install/ ### Step 2:Build a virtual environment Run the following commands in sequence in Anaconda Prompt: ``` conda create -n tf-gpu tensorflow-gpu=2.1.0 conda activate tf-gpu conda install jupyterlab jupyter lab ``` Run the following command in the notebook or just conda install the package: ``` !pip install moviepy==1.0.2 ``` Follow the commands below to use fast-style-transfer ## Documentation ### Training Style Transfer Networks Use `style.py` to train a new style transfer network. Run `python style.py` to view all the possible parameters. Training takes 4-6 hours on a Maxwell Titan X. [More detailed documentation here](docs.md#stylepy). **Before you run this, you should run `setup.sh`**. Example usage: python style.py --style path/to/style/img.jpg \ --checkpoint-dir checkpoint/path \ --test path/to/test/img.jpg \ --test-dir path/to/test/dir \ --content-weight 1.5e1 \ --checkpoint-iterations 1000 \ --batch-size 20 ### Evaluating Style Transfer Networks Use `evaluate.py` to evaluate a style transfer network. 
Run `python evaluate.py` to view all the possible parameters. Evaluation takes 100 ms per frame (when batch size is 1) on a Maxwell Titan X. [More detailed documentation here](docs.md#evaluatepy). Takes several seconds per frame on a CPU. **Models for evaluation are [located here](https://drive.google.com/drive/folders/0B9jhaT37ydSyRk9UX0wwX3BpMzQ?resourcekey=0-Z9LcNHC-BTB4feKwm4loXw&usp=sharing)**. Example usage: python evaluate.py --checkpoint path/to/style/model.ckpt \ --in-path dir/of/test/imgs/ \ --out-path dir/for/results/ ### Stylizing Video Use `transform_video.py` to transfer style into a video. Run `python transform_video.py` to view all the possible parameters. Requires `ffmpeg`. [More detailed documentation here](docs.md#transform_videopy). Example usage: python transform_video.py --in-path path/to/input/vid.mp4 \ --checkpoint path/to/style/model.ckpt \ --out-path out/video.mp4 \ --device /gpu:0 \ --batch-size 4 ### Requirements You will need the following to run the above: - TensorFlow 0.11.0 - Python 2.7.9, Pillow 3.4.2, scipy 0.18.1, numpy 1.11.2 - If you want to train (and don't want to wait for 4 months): - A decent GPU - All the required NVIDIA software to run TF on a GPU (cuda, etc) - ffmpeg 3.1.3 if you want to stylize video ### Citation ``` @misc{engstrom2016faststyletransfer, author = {Logan Engstrom}, title = {Fast Style Transfer}, year = {2016}, howpublished = {\url{https://github.com/lengstrom/fast-style-transfer/}}, note = {commit xxxxxxx} } ``` ### Attributions/Thanks - This project could not have happened without the advice (and GPU access) given by [Anish Athalye](http://www.anishathalye.com/). 
- The project also borrowed some code from Anish's [Neural Style](https://github.com/anishathalye/neural-style/)
- Some readme/docs formatting was borrowed from Justin Johnson's [Fast Neural Style](https://github.com/jcjohnson/fast-neural-style)
- The image of the Stata Center at the very beginning of the README was taken by [Juan Paulo](https://juanpaulo.me/)

### Related Work
- Michael Ramos ported this network [to use CoreML on iOS](https://medium.com/@rambossa/diy-prisma-fast-style-transfer-app-with-coreml-and-tensorflow-817c3b90dacd)

================================================
FILE: docs.md
================================================
## style.py

`style.py` trains networks that can transfer styles from artwork into images.

**Flags**
- `--checkpoint-dir`: Directory to save checkpoint in. Required.
- `--style`: Path to style image. Required.
- `--train-path`: Path to training images folder. Default: `data/train2014`.
- `--test`: Path to content image to test network on at every checkpoint iteration. Default: no image.
- `--test-dir`: Path to directory to save test images in. Required if `--test` is passed a value.
- `--epochs`: Epochs to train for. Default: `2`.
- `--batch-size`: Batch size for training. Default: `4`.
- `--checkpoint-iterations`: Number of iterations to go for between checkpoints. Default: `2000`.
- `--vgg-path`: Path to VGG19 network (default). Can pass VGG16 if you want to try out other loss functions. Default: `data/imagenet-vgg-verydeep-19.mat`.
- `--content-weight`: Weight of content in loss function. Default: `7.5e0`.
- `--style-weight`: Weight of style in loss function. Default: `1e2`.
- `--tv-weight`: Weight of total variation term in loss function. Default: `2e2`.
- `--learning-rate`: Learning rate for optimizer. Default: `1e-3`.
- `--slow`: For debugging loss function. Direct optimization on pixels using Gatys' approach. Uses the `--test` image as the content target and `--test-dir` for saving the fully optimized images.
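
The `--content-weight`, `--style-weight` and `--tv-weight` flags scale the three terms of the training loss. The following is a minimal NumPy sketch of how `src/optimize.py` appears to normalize and combine those terms for a single style layer. The function names (`gram_matrix`, `content_loss`, `style_loss`, `tv_loss`) are illustrative and not part of the repository, and random arrays stand in for VGG activations.

```python
import numpy as np

def gram_matrix(feats):
    """Per-image Gram matrices for activations shaped (batch, H, W, C)."""
    b, h, w, c = feats.shape
    flat = feats.reshape(b, h * w, c)
    # (b, C, HW) @ (b, HW, C) -> (b, C, C), normalized by H*W*C
    return np.matmul(flat.transpose(0, 2, 1), flat) / (h * w * c)

def content_loss(pred_feats, target_feats, content_weight=7.5):
    # Sum of squared differences divided by the total number of elements
    return content_weight * np.sum((pred_feats - target_feats) ** 2) / pred_feats.size

def style_loss(pred_feats, style_gram, style_weight=1e2):
    # style_gram is the (C, C) Gram matrix of the style image for this layer
    b = pred_feats.shape[0]
    diff = gram_matrix(pred_feats) - style_gram  # broadcasts over the batch
    return style_weight * np.sum(diff ** 2) / style_gram.size / b

def tv_loss(img, tv_weight=2e2):
    # Squared differences between vertically and horizontally adjacent pixels
    b = img.shape[0]
    dy = img[:, 1:, :, :] - img[:, :-1, :, :]
    dx = img[:, :, 1:, :] - img[:, :, :-1, :]
    # Each term is normalized by the size of one image's difference map
    return tv_weight * (np.sum(dx ** 2) / dx[0].size + np.sum(dy ** 2) / dy[0].size) / b

# Stand-in data (assumption): random "activations" and a random "image" batch
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8, 16))
style_gram = gram_matrix(rng.standard_normal((1, 8, 8, 16)))[0]
img = rng.uniform(0, 255, size=(4, 32, 32, 3))

total = content_loss(feats, feats) + style_loss(feats, style_gram) + tv_loss(img)
print(total)
```

In the actual training code the style term is summed over five VGG layers (`relu1_1` through `relu5_1`) and the content term uses `relu4_2`; the sketch uses one layer for brevity.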
## evaluate.py

`evaluate.py` evaluates trained networks given a checkpoint directory or `ckpt` file. If evaluating images from a directory, every image in the directory must have the same dimensions unless `--allow-different-dimensions` is passed.

**Flags**
- `--checkpoint`: Directory or `ckpt` file to load checkpoint from. Required.
- `--in-path`: Path of image or directory of images to transform. Required.
- `--out-path`: Output path for the transformed image, or output directory for the transformed images if `--in-path` is a directory. Required.
- `--device`: Device used to transform image. Default: `/gpu:0`.
- `--batch-size`: Batch size used to evaluate images. In particular meant for directory transformations. Default: `4`.
- `--allow-different-dimensions`: Allow different image dimensions. Images are grouped by shape and each group is processed separately. Default: not enabled.

## transform_video.py

`transform_video.py` transforms videos into stylized videos given a style transfer net.

**Flags**
- `--checkpoint`: Directory or `ckpt` file to load checkpoint from. Required.
- `--in-path`: Path to video to transfer style to. Required.
- `--out-path`: Path to out video. Required.
- `--tmp-dir`: Directory to put temporary processing files in. Default: a randomly named hidden directory (`.fns_frames_<random>/`), which is deleted after execution completes.
- `--device`: Device to evaluate frames with. Default: `/gpu:0`.
- `--batch-size`: Batch size for evaluating images. Default: `4`.
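
`--batch-size` matters for video because the number of frames is rarely a multiple of it. The sketch below illustrates how `ffwd_video` in `evaluate.py` appears to handle the final partial batch: unused slots are filled with the last real frame so the batch shape stays fixed, and only the outputs for real frames are kept. `stylize_batch` is a hypothetical stand-in for the TensorFlow session call, and `stylize_frames` is an illustrative name, not a function in the repository.

```python
import numpy as np

def stylize_batch(batch):
    # Hypothetical stand-in for sess.run(preds, ...): inverts pixel values.
    return 255.0 - batch

def stylize_frames(frames, frame_shape, batch_size=4):
    """Yield one uint8 output frame per input frame, using fixed-size batches."""
    X = np.zeros((batch_size,) + frame_shape, dtype=np.float32)

    def run(count):
        # Pad the unused slots with the last real frame.
        for i in range(count, batch_size):
            X[i] = X[count - 1]
        out = stylize_batch(X)
        # Only the first `count` results correspond to real frames.
        return [np.clip(out[i], 0, 255).astype(np.uint8) for i in range(count)]

    count = 0
    for frame in frames:
        X[count] = frame
        count += 1
        if count == batch_size:
            yield from run(count)
            count = 0
    if count:
        # Flush the final, partially filled batch.
        yield from run(count)

# Six tiny frames with batch size 4: one full batch plus a partial batch of two
frames = [np.full((2, 3, 3), v, dtype=np.float32) for v in range(6)]
styled = list(stylize_frames(frames, (2, 3, 3), batch_size=4))
print(len(styled))
```

Padding costs some wasted compute on the last batch, but it avoids rebuilding the graph with a different placeholder shape.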
================================================ FILE: evaluate.py ================================================ from __future__ import print_function import sys sys.path.insert(0, 'src') import transform, numpy as np, vgg, pdb, os import scipy.misc import tensorflow as tf from utils import save_img, get_img, exists, list_files from argparse import ArgumentParser from collections import defaultdict import time import json import subprocess import numpy from moviepy.video.io.VideoFileClip import VideoFileClip import moviepy.video.io.ffmpeg_writer as ffmpeg_writer BATCH_SIZE = 4 DEVICE = '/gpu:0' def ffwd_video(path_in, path_out, checkpoint_dir, device_t='/gpu:0', batch_size=4): video_clip = VideoFileClip(path_in, audio=False) video_writer = ffmpeg_writer.FFMPEG_VideoWriter(path_out, video_clip.size, video_clip.fps, codec="libx264", preset="medium", bitrate="2000k", audiofile=path_in, threads=None, ffmpeg_params=None) g = tf.Graph() soft_config = tf.compat.v1.ConfigProto(allow_soft_placement=True) soft_config.gpu_options.allow_growth = True with g.as_default(), g.device(device_t), \ tf.compat.v1.Session(config=soft_config) as sess: batch_shape = (batch_size, video_clip.size[1], video_clip.size[0], 3) img_placeholder = tf.compat.v1.placeholder(tf.float32, shape=batch_shape, name='img_placeholder') preds = transform.net(img_placeholder) saver = tf.compat.v1.train.Saver() if os.path.isdir(checkpoint_dir): ckpt = tf.train.get_checkpoint_state(checkpoint_dir) if ckpt and ckpt.model_checkpoint_path: saver.restore(sess, ckpt.model_checkpoint_path) else: raise Exception("No checkpoint found...") else: saver.restore(sess, checkpoint_dir) X = np.zeros(batch_shape, dtype=np.float32) def style_and_write(count): for i in range(count, batch_size): X[i] = X[count - 1] # Use last frame to fill X _preds = sess.run(preds, feed_dict={img_placeholder: X}) for i in range(0, count): video_writer.write_frame(np.clip(_preds[i], 0, 255).astype(np.uint8)) frame_count = 0 # The frame count 
        # that has been written to X
        for frame in video_clip.iter_frames():
            X[frame_count] = frame
            frame_count += 1
            if frame_count == batch_size:
                style_and_write(frame_count)
                frame_count = 0

        if frame_count != 0:
            style_and_write(frame_count)

        video_writer.close()


# get img_shape
def ffwd(data_in, paths_out, checkpoint_dir, device_t='/gpu:0', batch_size=4):
    assert len(paths_out) > 0
    is_paths = type(data_in[0]) == str
    if is_paths:
        assert len(data_in) == len(paths_out)
        img_shape = get_img(data_in[0]).shape
    else:
        # data_in is an array of images with the batch on axis 0
        assert data_in.shape[0] == len(paths_out)
        img_shape = data_in[0].shape

    g = tf.Graph()
    batch_size = min(len(paths_out), batch_size)
    curr_num = 0
    soft_config = tf.compat.v1.ConfigProto(allow_soft_placement=True)
    soft_config.gpu_options.allow_growth = True
    with g.as_default(), g.device(device_t), \
            tf.compat.v1.Session(config=soft_config) as sess:
        batch_shape = (batch_size,) + img_shape
        img_placeholder = tf.compat.v1.placeholder(tf.float32, shape=batch_shape,
                                                   name='img_placeholder')

        preds = transform.net(img_placeholder)
        saver = tf.compat.v1.train.Saver()
        if os.path.isdir(checkpoint_dir):
            ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
            else:
                raise Exception("No checkpoint found...")
        else:
            saver.restore(sess, checkpoint_dir)

        num_iters = int(len(paths_out)/batch_size)
        for i in range(num_iters):
            pos = i * batch_size
            curr_batch_out = paths_out[pos:pos+batch_size]
            if is_paths:
                curr_batch_in = data_in[pos:pos+batch_size]
                X = np.zeros(batch_shape, dtype=np.float32)
                for j, path_in in enumerate(curr_batch_in):
                    img = get_img(path_in)
                    assert img.shape == img_shape, \
                        'Images have different dimensions. ' + \
                        'Resize images or use --allow-different-dimensions.'
X[j] = img else: X = data_in[pos:pos+batch_size] _preds = sess.run(preds, feed_dict={img_placeholder:X}) for j, path_out in enumerate(curr_batch_out): save_img(path_out, _preds[j]) remaining_in = data_in[num_iters*batch_size:] remaining_out = paths_out[num_iters*batch_size:] if len(remaining_in) > 0: ffwd(remaining_in, remaining_out, checkpoint_dir, device_t=device_t, batch_size=1) def ffwd_to_img(in_path, out_path, checkpoint_dir, device='/cpu:0'): paths_in, paths_out = [in_path], [out_path] ffwd(paths_in, paths_out, checkpoint_dir, batch_size=1, device_t=device) def ffwd_different_dimensions(in_path, out_path, checkpoint_dir, device_t=DEVICE, batch_size=4): in_path_of_shape = defaultdict(list) out_path_of_shape = defaultdict(list) for i in range(len(in_path)): in_image = in_path[i] out_image = out_path[i] shape = "%dx%dx%d" % get_img(in_image).shape in_path_of_shape[shape].append(in_image) out_path_of_shape[shape].append(out_image) for shape in in_path_of_shape: print('Processing images of shape %s' % shape) ffwd(in_path_of_shape[shape], out_path_of_shape[shape], checkpoint_dir, device_t, batch_size) def build_parser(): parser = ArgumentParser() parser.add_argument('--checkpoint', type=str, dest='checkpoint_dir', help='dir or .ckpt file to load checkpoint from', metavar='CHECKPOINT', required=True) parser.add_argument('--in-path', type=str, dest='in_path',help='dir or file to transform', metavar='IN_PATH', required=True) help_out = 'destination (dir or file) of transformed file or files' parser.add_argument('--out-path', type=str, dest='out_path', help=help_out, metavar='OUT_PATH', required=True) parser.add_argument('--device', type=str, dest='device',help='device to perform compute on', metavar='DEVICE', default=DEVICE) parser.add_argument('--batch-size', type=int, dest='batch_size',help='batch size for feedforwarding', metavar='BATCH_SIZE', default=BATCH_SIZE) parser.add_argument('--allow-different-dimensions', action='store_true', 
dest='allow_different_dimensions', help='allow different image dimensions') return parser def check_opts(opts): exists(opts.checkpoint_dir, 'Checkpoint not found!') exists(opts.in_path, 'In path not found!') if os.path.isdir(opts.out_path): exists(opts.out_path, 'out dir not found!') assert opts.batch_size > 0 def main(): parser = build_parser() opts = parser.parse_args() check_opts(opts) if not os.path.isdir(opts.in_path): if os.path.exists(opts.out_path) and os.path.isdir(opts.out_path): out_path = \ os.path.join(opts.out_path,os.path.basename(opts.in_path)) else: out_path = opts.out_path ffwd_to_img(opts.in_path, out_path, opts.checkpoint_dir, device=opts.device) else: files = list_files(opts.in_path) full_in = [os.path.join(opts.in_path,x) for x in files] full_out = [os.path.join(opts.out_path,x) for x in files] if opts.allow_different_dimensions: ffwd_different_dimensions(full_in, full_out, opts.checkpoint_dir, device_t=opts.device, batch_size=opts.batch_size) else : ffwd(full_in, full_out, opts.checkpoint_dir, device_t=opts.device, batch_size=opts.batch_size) if __name__ == '__main__': main() ================================================ FILE: setup.sh ================================================ #! 
/bin/bash mkdir data cd data wget http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat mkdir bin wget http://msvocds.blob.core.windows.net/coco2014/train2014.zip unzip -q train2014.zip ================================================ FILE: src/optimize.py ================================================ from __future__ import print_function import functools import vgg, pdb, time import tensorflow as tf, numpy as np, os import transform from utils import get_img STYLE_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1') CONTENT_LAYER = 'relu4_2' DEVICES = 'CUDA_VISIBLE_DEVICES' # np arr, np arr def optimize(content_targets, style_target, content_weight, style_weight, tv_weight, vgg_path, epochs=2, print_iterations=1000, batch_size=4, save_path='saver/fns.ckpt', slow=False, learning_rate=1e-3, debug=False): if slow: batch_size = 1 mod = len(content_targets) % batch_size if mod > 0: print("Train set has been trimmed slightly..") content_targets = content_targets[:-mod] style_features = {} batch_shape = (batch_size,256,256,3) style_shape = (1,) + style_target.shape print(style_shape) # precompute style features with tf.Graph().as_default(), tf.device('/cpu:0'), tf.compat.v1.Session() as sess: style_image = tf.compat.v1.placeholder(tf.float32, shape=style_shape, name='style_image') style_image_pre = vgg.preprocess(style_image) net = vgg.net(vgg_path, style_image_pre) style_pre = np.array([style_target]) for layer in STYLE_LAYERS: features = net[layer].eval(feed_dict={style_image:style_pre}) features = np.reshape(features, (-1, features.shape[3])) gram = np.matmul(features.T, features) / features.size style_features[layer] = gram with tf.Graph().as_default(), tf.compat.v1.Session() as sess: X_content = tf.compat.v1.placeholder(tf.float32, shape=batch_shape, name="X_content") X_pre = vgg.preprocess(X_content) # precompute content features content_features = {} content_net = vgg.net(vgg_path, X_pre) content_features[CONTENT_LAYER] = 
content_net[CONTENT_LAYER] if slow: preds = tf.Variable( tf.random.normal(X_content.get_shape()) * 0.256 ) preds_pre = preds else: preds = transform.net(X_content/255.0) preds_pre = vgg.preprocess(preds) net = vgg.net(vgg_path, preds_pre) content_size = _tensor_size(content_features[CONTENT_LAYER])*batch_size assert _tensor_size(content_features[CONTENT_LAYER]) == _tensor_size(net[CONTENT_LAYER]) content_loss = content_weight * (2 * tf.nn.l2_loss( net[CONTENT_LAYER] - content_features[CONTENT_LAYER]) / content_size ) style_losses = [] for style_layer in STYLE_LAYERS: layer = net[style_layer] bs, height, width, filters = map(lambda i:i,layer.get_shape()) size = height * width * filters feats = tf.reshape(layer, (bs, height * width, filters)) feats_T = tf.transpose(a=feats, perm=[0,2,1]) grams = tf.matmul(feats_T, feats) / size style_gram = style_features[style_layer] style_losses.append(2 * tf.nn.l2_loss(grams - style_gram)/style_gram.size) style_loss = style_weight * functools.reduce(tf.add, style_losses) / batch_size # total variation denoising tv_y_size = _tensor_size(preds[:,1:,:,:]) tv_x_size = _tensor_size(preds[:,:,1:,:]) y_tv = tf.nn.l2_loss(preds[:,1:,:,:] - preds[:,:batch_shape[1]-1,:,:]) x_tv = tf.nn.l2_loss(preds[:,:,1:,:] - preds[:,:,:batch_shape[2]-1,:]) tv_loss = tv_weight*2*(x_tv/tv_x_size + y_tv/tv_y_size)/batch_size loss = content_loss + style_loss + tv_loss # overall loss train_step = tf.compat.v1.train.AdamOptimizer(learning_rate).minimize(loss) sess.run(tf.compat.v1.global_variables_initializer()) import random uid = random.randint(1, 100) print("UID: %s" % uid) for epoch in range(epochs): num_examples = len(content_targets) iterations = 0 while iterations * batch_size < num_examples: start_time = time.time() curr = iterations * batch_size step = curr + batch_size X_batch = np.zeros(batch_shape, dtype=np.float32) for j, img_p in enumerate(content_targets[curr:step]): X_batch[j] = get_img(img_p, (256,256,3)).astype(np.float32) iterations += 1 
assert X_batch.shape[0] == batch_size feed_dict = { X_content:X_batch } train_step.run(feed_dict=feed_dict) end_time = time.time() delta_time = end_time - start_time if debug: print("UID: %s, batch time: %s" % (uid, delta_time)) is_print_iter = int(iterations) % print_iterations == 0 if slow: is_print_iter = epoch % print_iterations == 0 is_last = epoch == epochs - 1 and iterations * batch_size >= num_examples should_print = is_print_iter or is_last if should_print: to_get = [style_loss, content_loss, tv_loss, loss, preds] test_feed_dict = { X_content:X_batch } tup = sess.run(to_get, feed_dict = test_feed_dict) _style_loss,_content_loss,_tv_loss,_loss,_preds = tup losses = (_style_loss, _content_loss, _tv_loss, _loss) if slow: _preds = vgg.unprocess(_preds) else: saver = tf.compat.v1.train.Saver() res = saver.save(sess, save_path) yield(_preds, losses, iterations, epoch) def _tensor_size(tensor): from operator import mul return functools.reduce(mul, (d for d in tensor.get_shape()[1:]), 1) ================================================ FILE: src/transform.py ================================================ import tensorflow as tf, pdb WEIGHTS_INIT_STDEV = .1 def net(image): conv1 = _conv_layer(image, 32, 9, 1) conv2 = _conv_layer(conv1, 64, 3, 2) conv3 = _conv_layer(conv2, 128, 3, 2) resid1 = _residual_block(conv3, 3) resid2 = _residual_block(resid1, 3) resid3 = _residual_block(resid2, 3) resid4 = _residual_block(resid3, 3) resid5 = _residual_block(resid4, 3) conv_t1 = _conv_tranpose_layer(resid5, 64, 3, 2) conv_t2 = _conv_tranpose_layer(conv_t1, 32, 3, 2) conv_t3 = _conv_layer(conv_t2, 3, 9, 1, relu=False) preds = tf.nn.tanh(conv_t3) * 150 + 255./2 return preds def _conv_layer(net, num_filters, filter_size, strides, relu=True): weights_init = _conv_init_vars(net, num_filters, filter_size) strides_shape = [1, strides, strides, 1] net = tf.nn.conv2d(input=net, filters=weights_init, strides=strides_shape, padding='SAME') net = _instance_norm(net) if relu: net = 
tf.nn.relu(net) return net def _conv_tranpose_layer(net, num_filters, filter_size, strides): weights_init = _conv_init_vars(net, num_filters, filter_size, transpose=True) batch_size, rows, cols, in_channels = [i for i in net.get_shape()] new_rows, new_cols = int(rows * strides), int(cols * strides) # new_shape = #tf.pack([tf.shape(net)[0], new_rows, new_cols, num_filters]) new_shape = [batch_size, new_rows, new_cols, num_filters] tf_shape = tf.stack(new_shape) strides_shape = [1,strides,strides,1] net = tf.nn.conv2d_transpose(net, weights_init, tf_shape, strides_shape, padding='SAME') net = _instance_norm(net) return tf.nn.relu(net) def _residual_block(net, filter_size=3): tmp = _conv_layer(net, 128, filter_size, 1) return net + _conv_layer(tmp, 128, filter_size, 1, relu=False) def _instance_norm(net, train=True): batch, rows, cols, channels = [i for i in net.get_shape()] var_shape = [channels] mu, sigma_sq = tf.nn.moments(x=net, axes=[1,2], keepdims=True) shift = tf.Variable(tf.zeros(var_shape)) scale = tf.Variable(tf.ones(var_shape)) epsilon = 1e-3 normalized = (net-mu)/(sigma_sq + epsilon)**(.5) return scale * normalized + shift def _conv_init_vars(net, out_channels, filter_size, transpose=False): _, rows, cols, in_channels = [i for i in net.get_shape()] if not transpose: weights_shape = [filter_size, filter_size, in_channels, out_channels] else: weights_shape = [filter_size, filter_size, out_channels, in_channels] weights_init = tf.Variable(tf.random.truncated_normal(weights_shape, stddev=WEIGHTS_INIT_STDEV, seed=1), dtype=tf.float32) return weights_init ================================================ FILE: src/utils.py ================================================ import scipy.misc, numpy as np, os, sys import imageio from PIL import Image def save_img(out_path, img): img = np.clip(img, 0, 255).astype(np.uint8) imageio.imwrite(out_path, img) def scale_img(style_path, style_scale): scale = float(style_scale) o0, o1, o2 = imageio.imread(style_path, 
                             pilmode='RGB').shape
    new_shape = (int(o0 * scale), int(o1 * scale), o2)
    style_target = get_img(style_path, img_size=new_shape)
    return style_target

def get_img(src, img_size=False):
    img = imageio.imread(src, pilmode='RGB') # misc.imresize(, (256, 256, 3))
    if not (len(img.shape) == 3 and img.shape[2] == 3):
        img = np.dstack((img,img,img))
    if img_size != False:
        # img_size is (height, width, channels); PIL's resize expects (width, height)
        img = np.array(Image.fromarray(img).resize((img_size[1], img_size[0])))
    return img

def exists(p, msg):
    assert os.path.exists(p), msg

def list_files(in_path):
    files = []
    # Only the top level of in_path is listed; subdirectories are not walked
    for (dirpath, dirnames, filenames) in os.walk(in_path):
        files.extend(filenames)
        break

    return files

================================================
FILE: src/vgg.py
================================================
# Copyright (c) 2015-2016 Anish Athalye. Released under GPLv3.

import tensorflow as tf
import numpy as np
import scipy.io
import pdb

MEAN_PIXEL = np.array([ 123.68 , 116.779, 103.939])

def net(data_path, input_image):
    layers = (
        'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',

        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',

        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
        'relu3_3', 'conv3_4', 'relu3_4', 'pool3',

        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
        'relu4_3', 'conv4_4', 'relu4_4', 'pool4',

        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
        'relu5_3', 'conv5_4', 'relu5_4'
    )

    data = scipy.io.loadmat(data_path)
    mean = data['normalization'][0][0][0]
    mean_pixel = np.mean(mean, axis=(0, 1))
    weights = data['layers'][0]

    net = {}
    current = input_image
    for i, name in enumerate(layers):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            kernels = np.transpose(kernels, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            current = _conv_layer(current, kernels, bias)
        elif kind == 'relu':
            current = tf.nn.relu(current)
        elif kind == 'pool':
            current =
_pool_layer(current) net[name] = current assert len(net) == len(layers) return net def _conv_layer(input, weights, bias): conv = tf.nn.conv2d(input=input, filters=tf.constant(weights), strides=(1, 1, 1, 1), padding='SAME') return tf.nn.bias_add(conv, bias) def _pool_layer(input): return tf.nn.max_pool2d(input=input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1), padding='SAME') def preprocess(image): return image - MEAN_PIXEL def unprocess(image): return image + MEAN_PIXEL ================================================ FILE: style.py ================================================ from __future__ import print_function import sys, os, pdb sys.path.insert(0, 'src') import numpy as np, scipy.misc from optimize import optimize from argparse import ArgumentParser from utils import save_img, get_img, exists, list_files import evaluate CONTENT_WEIGHT = 7.5e0 STYLE_WEIGHT = 1e2 TV_WEIGHT = 2e2 LEARNING_RATE = 1e-3 NUM_EPOCHS = 2 CHECKPOINT_DIR = 'checkpoints' CHECKPOINT_ITERATIONS = 2000 VGG_PATH = 'data/imagenet-vgg-verydeep-19.mat' TRAIN_PATH = 'data/train2014' BATCH_SIZE = 4 DEVICE = '/gpu:0' FRAC_GPU = 1 def build_parser(): parser = ArgumentParser() parser.add_argument('--checkpoint-dir', type=str, dest='checkpoint_dir', help='dir to save checkpoint in', metavar='CHECKPOINT_DIR', required=True) parser.add_argument('--style', type=str, dest='style', help='style image path', metavar='STYLE', required=True) parser.add_argument('--train-path', type=str, dest='train_path', help='path to training images folder', metavar='TRAIN_PATH', default=TRAIN_PATH) parser.add_argument('--test', type=str, dest='test', help='test image path', metavar='TEST', default=False) parser.add_argument('--test-dir', type=str, dest='test_dir', help='test image save dir', metavar='TEST_DIR', default=False) parser.add_argument('--slow', dest='slow', action='store_true', help='gatys\' approach (for debugging, not supported)', default=False) parser.add_argument('--epochs', type=int, dest='epochs', 
help='num epochs', metavar='EPOCHS', default=NUM_EPOCHS) parser.add_argument('--batch-size', type=int, dest='batch_size', help='batch size', metavar='BATCH_SIZE', default=BATCH_SIZE) parser.add_argument('--checkpoint-iterations', type=int, dest='checkpoint_iterations', help='checkpoint frequency', metavar='CHECKPOINT_ITERATIONS', default=CHECKPOINT_ITERATIONS) parser.add_argument('--vgg-path', type=str, dest='vgg_path', help='path to VGG19 network (default %(default)s)', metavar='VGG_PATH', default=VGG_PATH) parser.add_argument('--content-weight', type=float, dest='content_weight', help='content weight (default %(default)s)', metavar='CONTENT_WEIGHT', default=CONTENT_WEIGHT) parser.add_argument('--style-weight', type=float, dest='style_weight', help='style weight (default %(default)s)', metavar='STYLE_WEIGHT', default=STYLE_WEIGHT) parser.add_argument('--tv-weight', type=float, dest='tv_weight', help='total variation regularization weight (default %(default)s)', metavar='TV_WEIGHT', default=TV_WEIGHT) parser.add_argument('--learning-rate', type=float, dest='learning_rate', help='learning rate (default %(default)s)', metavar='LEARNING_RATE', default=LEARNING_RATE) return parser def check_opts(opts): exists(opts.checkpoint_dir, "checkpoint dir not found!") exists(opts.style, "style path not found!") exists(opts.train_path, "train path not found!") if opts.test or opts.test_dir: exists(opts.test, "test img not found!") exists(opts.test_dir, "test directory not found!") exists(opts.vgg_path, "vgg network data not found!") assert opts.epochs > 0 assert opts.batch_size > 0 assert opts.checkpoint_iterations > 0 assert os.path.exists(opts.vgg_path) assert opts.content_weight >= 0 assert opts.style_weight >= 0 assert opts.tv_weight >= 0 assert opts.learning_rate >= 0 def _get_files(img_dir): files = list_files(img_dir) return [os.path.join(img_dir,x) for x in files] def main(): parser = build_parser() options = parser.parse_args() check_opts(options) style_target = 
get_img(options.style)
    if not options.slow:
        content_targets = _get_files(options.train_path)
    elif options.test:
        content_targets = [options.test]

    kwargs = {
        "slow":options.slow,
        "epochs":options.epochs,
        "print_iterations":options.checkpoint_iterations,
        "batch_size":options.batch_size,
        "save_path":os.path.join(options.checkpoint_dir,'fns.ckpt'),
        "learning_rate":options.learning_rate
    }

    # Slow (direct pixel optimization) mode needs many more steps and a larger step size
    if options.slow:
        if options.epochs < 10:
            kwargs['epochs'] = 1000
        if options.learning_rate < 1:
            kwargs['learning_rate'] = 1e1

    args = [
        content_targets,
        style_target,
        options.content_weight,
        options.style_weight,
        options.tv_weight,
        options.vgg_path
    ]

    for preds, losses, i, epoch in optimize(*args, **kwargs):
        style_loss, content_loss, tv_loss, loss = losses

        print('Epoch %d, Iteration: %d, Loss: %s' % (epoch, i, loss))
        to_print = (style_loss, content_loss, tv_loss)
        print('style: %s, content:%s, tv: %s' % to_print)
        if options.test:
            assert options.test_dir != False
            preds_path = '%s/%s_%s.png' % (options.test_dir,epoch,i)
            if not options.slow:
                ckpt_dir = os.path.dirname(options.checkpoint_dir)
                evaluate.ffwd_to_img(options.test,preds_path,
                                     options.checkpoint_dir)
            else:
                # In slow mode preds is a batch holding the single optimized image
                save_img(preds_path, preds[0])
    ckpt_dir = options.checkpoint_dir
    cmd_text = 'python evaluate.py --checkpoint %s ...' % ckpt_dir
    print("Training complete. For evaluation:\n `%s`" % cmd_text)

if __name__ == '__main__':
    main()

================================================
FILE: transform_video.py
================================================
from __future__ import print_function
from argparse import ArgumentParser
import sys
sys.path.insert(0, 'src')
import os, random, subprocess, evaluate, shutil
from utils import exists, list_files
import pdb

TMP_DIR = '.fns_frames_%s/' % random.randint(0,99999)
DEVICE = '/gpu:0'
BATCH_SIZE = 4

def build_parser():
    parser = ArgumentParser()
    parser.add_argument('--checkpoint', type=str,
                        dest='checkpoint', help='checkpoint directory or .ckpt file',
                        metavar='CHECKPOINT', required=True)

    parser.add_argument('--in-path', type=str,
                        dest='in_path', help='in video path',
                        metavar='IN_PATH', required=True)

    parser.add_argument('--out-path', type=str,
                        dest='out', help='path to save processed video to',
                        metavar='OUT', required=True)

    parser.add_argument('--tmp-dir', type=str, dest='tmp_dir',
                        help='tmp dir for processing', metavar='TMP_DIR',
                        default=TMP_DIR)

    parser.add_argument('--device', type=str, dest='device',
                        help='device for eval. CPU discouraged. ex: \'/gpu:0\'',
                        metavar='DEVICE', default=DEVICE)

    parser.add_argument('--batch-size', type=int,
                        dest='batch_size',help='batch size for eval. default 4.',
                        metavar='BATCH_SIZE', default=BATCH_SIZE)

    parser.add_argument('--no-disk', type=bool, dest='no_disk',
                        help='Don\'t save intermediate files to disk. Default False',
                        metavar='NO_DISK', default=False)
    return parser

def check_opts(opts):
    # exists() requires a message; the output video is not expected to exist yet
    exists(opts.checkpoint, 'Checkpoint not found!')
    exists(opts.in_path, 'In path not found!')

def main():
    parser = build_parser()
    opts = parser.parse_args()
    check_opts(opts)

    evaluate.ffwd_video(opts.in_path, opts.out, opts.checkpoint, opts.device, opts.batch_size)


if __name__ == '__main__':
    main()