Full Code of google/neural_rerendering_in_the_wild for AI

master 5f5226f00835 cached

16 files

144.2 KB

36.6k tokens

130 symbols

1 requests

Download .txt

Repository: google/neural_rerendering_in_the_wild
Branch: master
Commit: 5f5226f00835
Files: 16
Total size: 144.2 KB

Directory structure:
gitextract_k0balaqs/

├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── data.py
├── dataset_utils.py
├── evaluate_quantitative_metrics.py
├── layers.py
├── losses.py
├── networks.py
├── neural_rerendering.py
├── options.py
├── pretrain_appearance.py
├── segment_dataset.py
├── staged_model.py
├── style_loss.py
└── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: CONTRIBUTING.md
================================================
# How to Contribute

We'd love to accept your patches and contributions to this project. There are
just a few small guidelines you need to follow.

## Contributor License Agreement

Contributions to this project must be accompanied by a Contributor License
Agreement. You (or your employer) retain the copyright to your contribution;
this simply gives us permission to use and redistribute your contributions as
part of the project. Head over to <https://cla.developers.google.com/> to see
your current agreements on file or to sign a new one.

You generally only need to submit a CLA once, so if you've already submitted one
(even if it was for a different project), you probably don't need to do it
again.

## Code reviews

All submissions, including submissions by project members, require review. We
use GitHub pull requests for this purpose. Consult
[GitHub Help](https://help.github.com/articles/about-pull-requests/) for more
information on using pull requests.

## Community Guidelines

This project follows
[Google's Open Source Community Guidelines](https://opensource.google.com/conduct/).


================================================
FILE: LICENSE
================================================

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

================================================
FILE: README.md
================================================
# Neural Rerendering in the Wild
Moustafa Meshry<sup>1</sup>,
[Dan B Goldman](http://www.danbgoldman.com/)<sup>2</sup>,
[Sameh Khamis](http://www.samehkhamis.com/)<sup>2</sup>,
[Hugues Hoppe](http://hhoppe.com/)<sup>2</sup>,
Rohit Pandey<sup>2</sup>,
[Noah Snavely](http://www.cs.cornell.edu/~snavely/)<sup>2</sup>,
[Ricardo Martin-Brualla](http://www.ricardomartinbrualla.com/)<sup>2</sup>.

<sup>1</sup>University of Maryland, College Park &nbsp;&nbsp;&nbsp;&nbsp; <sup>2</sup>Google Inc.

To appear at CVPR 2019 (Oral). <br><br>


<figure class="image">
  <img align="center" src="imgs/teaser_with_caption.jpg" width="500px">
</figure>

<!--- ![Teaser figure](https://github.com/MoustafaMeshry/neural_rerendering_in_the_wild/blob/master/imgs/teaser_with_caption.jpg?raw=true | width=450) --->

We will provide Tensorflow implementation and pretrained models for our paper soon.

[**Paper**](https://arxiv.org/abs/1904.04290) | [**Video**](https://www.youtube.com/watch?v=E1crWQn_kmY) | [**Code**](https://github.com/MoustafaMeshry/neural_rerendering_in_the_wild) | [**Project page**](https://moustafameshry.github.io/neural_rerendering_in_the_wild/)

### Abstract

We explore total scene capture — recording, modeling, and rerendering a scene under varying appearance such as season and time of day.
Starting from internet photos of a tourist landmark, we apply traditional 3D reconstruction to register the photos and approximate the scene as a point cloud.
For each photo, we render the scene points into a deep framebuffer,
and train a neural network to learn the mapping of these initial renderings to the actual photos.
This rerendering network also takes as input a latent appearance vector and a semantic mask indicating the location of transient objects like pedestrians.
The model is evaluated on several datasets of publicly available images spanning a broad range of illumination conditions.
We create short videos demonstrating realistic manipulation of the image viewpoint, appearance, and semantic labeling.
We also compare results with prior work on scene reconstruction from internet photos.

### Video
[![Supplementary material video](https://img.youtube.com/vi/E1crWQn_kmY/0.jpg)](https://www.youtube.com/watch?v=E1crWQn_kmY)


### Appearance variation

We capture the appearance of the original images in the left column, and rerender several viewpoints under them. The last column is a detail of the previous one. The top row shows the renderings part of the input to the rerenderer, that exhibit artifacts like incomplete features in the statue, and an inconsistent mix of day and night appearances. Note the hallucinated twilight scene in the sky using the last appearance. Image credits: Flickr users William Warby, Neil Rickards, Rafael Jimenez, acme401 (Creative Commons).

<figure class="image">
  <img src="imgs/app_variation.jpg" width="900px">
</figure>

### Appearance interpolation
Frames from a synthesized camera path that smoothly transitions from the photo on the left to the photo on the right by smoothly interpolating both viewpoint and the latent appearance vectors. Please see the supplementary video. Photo Credits: Allie Caulfield, Tahbepet, Till Westermayer, Elliott Brown (Creative Commons).
<figure class="image">
  <img src="imgs/app_interpolation.jpg" width="900px">
</figure>

### Acknowledgements
We thank Gregory Blascovich for his help in conducting the user study, and Johannes Schönberger and True Price for their help generating datasets.

### Run and train instructions

Staged-training consists of three stages:

-   Pretraining the appearance network.
-   Training the rendering network while fixing the weights for the appearance
    network.
-   Finetuning both the appearance and the rendering networks.

### Aligned dataset preprocessing

#### Manual preparation

*   Set a path to a base_dir that contains the source code:

```
base_dir=//to/neural_rendering
mkdir $base_dir
cd $base_dir
```

*   We assume the following format for an aligned dataset:
    * Each training image contains 3 file with the following nameing format:
        * real image: %04d_reference.png
        * render color: %04d_color.png
        * render depth: %04d_depth.png
*   Set dataset name: e.g.
```
dataset_name='trevi3k'  # set to any name
```
*   Split the dataset into train and validation sets in two subdirectories:
    *   $base_dir/datasets/$dataset_name/train
    *   $base_dir/datasets/$dataset_name/val
*   Download the DeepLab semantic segmentation model trained on the ADE20K
    dataset from this link:
    http://download.tensorflow.org/models/deeplabv3_xception_ade20k_train_2018_05_29.tar.gz
*   Unzip the downloaded file to: $base_dir/deeplabv3_xception_ade20k_train
*   Download this [file](https://github.com/MoustafaMeshry/vgg_loss/blob/master/vgg16.py) for an implementation of a vgg-based perceptual loss.
*   Download trained weights for the vgg network as instructed in this link: https://github.com/machrisaa/tensorflow-vgg
*   Save the vgg weights to $base_dir/vgg16_weights/vgg16.npy


#### Data preprocessing

*   Run the preprocessing pipeline which consists of:
    *   Filtering out sparse renders.
    *   Semantic segmentation of ground truth images.
    *   Exporting the dataset to tfrecord format.

```
# Run locally
python tools/dataset_utils.py \
--dataset_name=$dataset_name \
--dataset_parent_dir=$base_dir/datasets/$dataset_name \
--output_dir=$base_dir/datasets/$dataset_name \
--xception_frozen_graph_path=$base_dir/deeplabv3_xception_ade20k_train/frozen_inference_graph.pb \
--alsologtostderr
```

### Pretraining the appearance encoder network

```
# Run locally
python pretrain_appearance.py \
  --dataset_name=$dataset_name \
  --train_dir=$base_dir/train_models/$dataset_name-app_pretrain \
  --imageset_dir=$base_dir/datasets/$dataset_name/train \
  --train_resolution=512 \
  --metadata_output_dir=$base_dir/datasets/$dataset_name
```

### Training the rerendering network with a fixed appearance encoder

Set the dataset_parent_dir variable below to point to the directory containing
the generated TFRecords.

```
# Run locally:
dataset_parent_dir=$base_dir/datasets/$dataset_name
train_dir=$base_dir/train_models/$dataset_name-staged-fixed_appearance
load_pretrained_app_encoder=true
appearance_pretrain_dir=$base_dir/train_models/$dataset_name-app_pretrain
load_from_another_ckpt=false
fixed_appearance_train_dir=''
train_app_encoder=false

python neural_rerendering.py \
--dataset_name=$dataset_name \
--dataset_parent_dir=$dataset_parent_dir \
--train_dir=$train_dir \
--load_pretrained_app_encoder=$load_pretrained_app_encoder \
--appearance_pretrain_dir=$appearance_pretrain_dir \
--train_app_encoder=$train_app_encoder \
--load_from_another_ckpt=$load_from_another_ckpt \
--fixed_appearance_train_dir=$fixed_appearance_train_dir \
--total_kimg=4000
```

### Finetuning the rerendering network and the appearance encoder

Set the fixed_appearance_train_dir to the train directory from the previous
step.

```
# Run locally:
dataset_parent_dir=$base_dir/datasets/$dataset_name
train_dir=$base_dir/train_models/$dataset_name-staged-finetune_appearance
load_pretrained_app_encoder=false
appearance_pretrain_dir=''
load_from_another_ckpt=true
fixed_appearance_train_dir=$base_dir/train_models/$dataset_name-staged-fixed_appearance
train_app_encoder=true

python neural_rerendering.py \
--dataset_name=$dataset_name \
--dataset_parent_dir=$dataset_parent_dir \
--train_dir=$train_dir \
--load_pretrained_app_encoder=$load_pretrained_app_encoder \
--appearance_pretrain_dir=$appearance_pretrain_dir \
--train_app_encoder=$train_app_encoder \
--load_from_another_ckpt=$load_from_another_ckpt \
--fixed_appearance_train_dir=$fixed_appearance_train_dir \
--total_kimg=4000
```


### Evaluate model on validation set

```
experiment_title=$dataset_name-staged-finetune_appearance
local_train_dir=$base_dir/train_models/$experiment_title
dataset_parent_dir=$base_dir/datasets/$dataset_name
val_set_out_dir=$local_train_dir/val_set_output

# Run the model on validation set
echo "Evaluating the validation set"
python neural_rerendering.py \
      --train_dir=$local_train_dir \
      --dataset_name=$dataset_name \
      --dataset_parent_dir=$dataset_parent_dir \
      --run_mode='eval_subset' \
      --virtual_seq_name='val' \
      --output_validation_dir=$val_set_out_dir \
      --logtostderr
# Evaluate quantitative metrics
python evaluate_quantitative_metrics.py \
      --val_set_out_dir=$val_set_out_dir \
      --experiment_title=$experiment_title \
      --logtostderr
```


================================================
FILE: data.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from options import FLAGS as opts
import functools
import glob
import numpy as np
import os.path as osp
import random
import tensorflow as tf


def provide_data(dataset_name='', parent_dir='', batch_size=8, subset=None,
                 max_examples=None, crop_flag=False, crop_size=256, seeds=None,
                 use_appearance=True, shuffle=128):
  # Parsing function for each tfrecord example.
  record_parse_fn = functools.partial(
      _parser_rendered_dataset, crop_flag=crop_flag, crop_size=crop_size,
      use_alpha=opts.use_alpha, use_depth=opts.use_depth,
      use_semantics=opts.use_semantic, seeds=seeds,
      use_appearance=use_appearance)

  input_dict_var = multi_input_fn_record(
      record_parse_fn, parent_dir, dataset_name, batch_size,
      subset=subset, max_examples=max_examples, shuffle=shuffle)
  return input_dict_var


def _parser_rendered_dataset(
    serialized_example, crop_flag, crop_size, seeds, use_alpha, use_depth,
    use_semantics, use_appearance):
  """
  Parses a single tf.Example into a features dictionary with input tensors.
  """
  # Structure of features_dict need to match the dictionary structure that was
  # serialized to a tf.Example
  features_dict = {'height': tf.FixedLenFeature([], tf.int64),
                   'width': tf.FixedLenFeature([], tf.int64),
                   'rendered': tf.FixedLenFeature([], tf.string),
                   'depth': tf.FixedLenFeature([], tf.string),
                   'real': tf.FixedLenFeature([], tf.string),
                   'seg': tf.FixedLenFeature([], tf.string)}
  features = tf.parse_single_example(serialized_example, features=features_dict)
  height = tf.cast(features['height'], tf.int32)
  width = tf.cast(features['width'], tf.int32)

  # Parse the rendered image.
  rendered = tf.decode_raw(features['rendered'], tf.uint8)
  rendered = tf.cast(rendered, tf.float32) * (2.0 / 255) - 1.0
  rendered = tf.reshape(rendered, [height, width, 4])
  if not use_alpha:
    rendered = tf.slice(rendered, [0, 0, 0], [height, width, 3])
  conditional_input = rendered

  # Parse the depth image.
  if use_depth:
    depth = tf.decode_raw(features['depth'], tf.uint16)
    depth = tf.reshape(depth, [height, width, 1])
    depth = tf.cast(depth, tf.float32) * (2.0 / 255) - 1.0
    conditional_input = tf.concat([conditional_input, depth], axis=-1)

  # Parse the semantic map.
  if use_semantics:
    seg_img = tf.decode_raw(features['seg'], tf.uint8)
    seg_img = tf.reshape(seg_img, [height, width, 3])
    seg_img = tf.cast(seg_img, tf.float32) * (2.0 / 255) - 1
    conditional_input = tf.concat([conditional_input, seg_img], axis=-1)

  # Verify that the parsed input has the correct number of channels.
  assert conditional_input.shape[-1] == opts.deep_buffer_nc, ('num channels '
      'in the parsed input doesn\'t match num input channels specified in '
      'opts.deep_buffer_nc!')

  # Parse the ground truth image.
  real = tf.decode_raw(features['real'], tf.uint8)
  real = tf.cast(real, tf.float32) * (2.0 / 255) - 1.0
  real = tf.reshape(real, [height, width, 3])

  # Parse the appearance image (if any).
  appearance_input = []
  if use_appearance:
    # Concatenate the deep buffer to the real image.
    appearance_input = tf.concat([real, conditional_input], axis=-1)
    # Verify that the parsed input has the correct number of channels.
    assert appearance_input.shape[-1] == opts.appearance_nc, ('num channels '
        'in the parsed appearance input doesn\'t match num input channels '
        'specified in opts.appearance_nc!')

  # Crop conditional_input and real images, but keep the appearance input
  # uncropped (learn a one-to-many mapping from appearance to output)
  if crop_flag:
    assert crop_size is not None, 'crop_size is not provided!'
    if isinstance(crop_size, int):
      crop_size = [crop_size, crop_size]
    assert len(crop_size) == 2, 'crop_size is either an int or a 2-tuple!'

    # Central crop
    if seeds is not None and len(seeds) <= 1:
      conditional_input = tf.image.resize_image_with_crop_or_pad(
          conditional_input, crop_size[0], crop_size[1])
      real = tf.image.resize_image_with_crop_or_pad(real, crop_size[0],
                                                    crop_size[1])
    else:
      if not seeds:  # random crops
        seed = random.randint(0, (1 << 31) - 1)
      else:  # fixed crops
        seed_idx = random.randint(0, len(seeds) - 1)
        seed = seeds[seed_idx]
      conditional_input = tf.random_crop(
          conditional_input, crop_size + [opts.deep_buffer_nc], seed=seed)
      real = tf.random_crop(real, crop_size + [3], seed=seed)

  features = {'conditional_input': conditional_input,
              'expected_output': real,
              'peek_input': appearance_input}
  return features


def multi_input_fn_record(
    record_parse_fn, parent_dir, tfrecord_basename, batch_size, subset=None,
    max_examples=None, shuffle=128):
  """Creates a Dataset pipeline for tfrecord files.

  Returns:
    Dataset iterator.
  """
  subset_suffix = '*_%s.tfrecord' % subset if subset else '*.tfrecord'
  input_pattern = osp.join(parent_dir, tfrecord_basename + subset_suffix)
  filenames = sorted(glob.glob(input_pattern))
  assert len(filenames) > 0, ('Error! input pattern "%s" didn\'t match any '
                              'files' % input_pattern)
  dataset = tf.data.TFRecordDataset(filenames)
  if shuffle == 0:  # keep input deterministic
    # use one thread to get deterministic results
    dataset = dataset.map(record_parse_fn, num_parallel_calls=None)
  else:
    dataset = dataset.repeat()  # Repeat indefinitely.
    dataset = dataset.map(record_parse_fn,
                          num_parallel_calls=max(4, batch_size // 4))
    if opts.training_pipeline == 'drit':
      dataset1 = dataset.shuffle(shuffle)
      dataset2 = dataset.shuffle(shuffle)
      paired_dataset = tf.data.Dataset.zip((dataset1, dataset2))

      def _join_paired_dataset(features_a, features_b):
        features_a['conditional_input_2'] = features_b['conditional_input']
        features_a['expected_output_2'] = features_b['expected_output']
        return features_a

      joined_dataset = paired_dataset.map(_join_paired_dataset)
      dataset = joined_dataset
    else:
      dataset = dataset.shuffle(shuffle)
  if max_examples is not None:
    dataset = dataset.take(max_examples)
  dataset = dataset.batch(batch_size)
  if shuffle > 0:  # input is not deterministic
    dataset = dataset.prefetch(4)  # Prefetch a few batches.
  return dataset.make_one_shot_iterator().get_next()


================================================
FILE: dataset_utils.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from PIL import Image
from absl import app
from absl import flags
from options import FLAGS as opts
import cv2
import data
import functools
import glob
import numpy as np
import os
import os.path as osp
import shutil
import six
import tensorflow as tf
import segment_dataset as segment_utils
import utils

FLAGS = flags.FLAGS
flags.DEFINE_string('output_dir', None, 'Directory to save exported tfrecords.')
flags.DEFINE_string('xception_frozen_graph_path', None,
                    'Path to the deeplab xception model frozen graph')


class AlignedRenderedDataset(object):
  def __init__(self, rendered_filepattern, use_semantic_map=True):
    """
    Args:
      rendered_filepattern: string, path filepattern to 3D rendered images (
        assumes filenames are '/path/to/dataset/%d_color.png')
      use_semantic_map: bool, include semantic maps. in the TFRecord
    """
    self.filenames = sorted(glob.glob(rendered_filepattern))
    assert len(self.filenames) > 0, ('input %s didn\'t match any files!' %
                                     rendered_filepattern)
    self.iter_idx = 0
    self.use_semantic_map = use_semantic_map

  def __iter__(self):
    return self

  def __next__(self):
    return self.next()

  def next(self):
    if self.iter_idx < len(self.filenames):
      rendered_img_name = self.filenames[self.iter_idx]
      basename = rendered_img_name[:-9]  # remove the 'color.png' suffix
      ref_img_name = basename + 'reference.png'
      depth_img_name = basename + 'depth.png'
      # Read the 3D rendered image
      img_rendered = cv2.imread(rendered_img_name, cv2.IMREAD_UNCHANGED)
      # Change BGR (default cv2 format) to RGB
      img_rendered = img_rendered[:, :, [2,1,0,3]]  # it has a 4th alpha channel
      # Read the depth image
      img_depth = cv2.imread(depth_img_name, cv2.IMREAD_UNCHANGED)
      # Workaround as some depth images are read with a different data type!
      img_depth = img_depth.astype(np.uint16)
      # Read reference image if exists, otherwise replace with a zero image.
      if osp.exists(ref_img_name):
        img_ref = cv2.imread(ref_img_name)
        img_ref = img_ref[:, :, ::-1]  # Change BGR to RGB format.
      else:  # use a dummy 3-channel zero image as a placeholder
        print('Warning: no reference image found! Using a dummy placeholder!')
        img_height, img_width = img_depth.shape
        img_ref = np.zeros((img_height, img_width, 3), dtype=np.uint8)

      if self.use_semantic_map:
        semantic_seg_img_name = basename + 'seg_rgb.png'
        img_seg = cv2.imread(semantic_seg_img_name)
        img_seg = img_seg[:, :, ::-1]  # Change from BGR to RGB
        if img_seg.shape[0] == 512 and img_seg.shape[1] == 512:
          img_ref = utils.get_central_crop(img_ref)
          img_rendered = utils.get_central_crop(img_rendered)
          img_depth = utils.get_central_crop(img_depth)

      img_shape = img_depth.shape
      assert img_seg.shape == (img_shape + (3,)), 'error in seg image %s %s' % (
        basename, str(img_seg.shape))
      assert img_ref.shape == (img_shape + (3,)), 'error in ref image %s %s' % (
        basename, str(img_ref.shape))
      assert img_rendered.shape == (img_shape + (4,)), ('error in rendered '
        'image %s %s' % (basename, str(img_rendered.shape)))
      assert len(img_depth.shape) == 2, 'error in depth image %s %s' % (
        basename, str(img_depth.shape))

      raw_example = dict()
      raw_example['height'] = img_ref.shape[0]
      raw_example['width'] = img_ref.shape[1]
      raw_example['rendered'] = img_rendered.tostring()
      raw_example['depth'] = img_depth.tostring()
      raw_example['real'] = img_ref.tostring()
      if self.use_semantic_map:
        raw_example['seg'] = img_seg.tostring()
      self.iter_idx += 1
      return raw_example
    else:
      raise StopIteration()


def filter_out_sparse_renders(dataset_dir, splits, ratio_threshold=0.15):
  print('Filtering %s' % dataset_dir)
  if splits is None:
    imgs_dirs = [dataset_dir]
  else:
    imgs_dirs = [osp.join(dataset_dir, split) for split in splits]
  
  filtered_images = []
  total_images = 0
  sum_density = 0
  for cur_dir in imgs_dirs:
    filtered_dir = osp.join(cur_dir, 'sparse_renders')
    if not osp.exists(filtered_dir):
      os.makedirs(filtered_dir)
    imgs_file_pattern = osp.join(cur_dir, '*_color.png')
    images_path = sorted(glob.glob(imgs_file_pattern))
    print('Processing %d files' % len(images_path))
    total_images += len(images_path)
    for ii, img_path in enumerate(images_path):
      img = np.array(Image.open(img_path))
      aggregate = np.squeeze(np.sum(img, axis=2))
      height, width = aggregate.shape
      mask = aggregate > 0
      density = np.sum(mask) * 1. / (height * width)
      sum_density += density
      if density <= ratio_threshold:
        parent, basename = osp.split(img_path)
        basename = basename[:-10]  # remove the '_color.png' suffix
        srcs = sorted(glob.glob(osp.join(parent, basename + '_*')))
        dest = unicode(filtered_dir + '/.')
        for src in srcs:
          shutil.move(src, dest)
        filtered_images.append(basename)
        print('filtered fie %d: %s with a desnity of %.3f' % (ii, basename,
                                                              density))
    print('Filtered %d/%d images' % (len(filtered_images), total_images))
    print('Mean desnity = %.4f' % (sum_density / total_images))


def _to_example(dictionary):
  """Helper: build tf.Example from (string -> int/float/str list) dictionary."""
  features = {}
  for (k, v) in six.iteritems(dictionary):
    if isinstance(v, six.integer_types):
      features[k] = tf.train.Feature(int64_list=tf.train.Int64List(value=[v]))
    elif isinstance(v, float):
      features[k] = tf.train.Feature(float_list=tf.train.FloatList(value=[v]))
    elif isinstance(v, six.string_types):
      features[k] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[v]))
    elif isinstance(v, bytes):
      features[k] = tf.train.Feature(bytes_list=tf.train.BytesList(value=[v]))
    else:
      raise ValueError("Value for %s is not a recognized type; v: %s type: %s" %
                       (k, str(v[0]), str(type(v[0]))))

  return tf.train.Example(features=tf.train.Features(feature=features))


def _generate_tfrecord_dataset(generator,
                              output_name,
                              output_dir):
  """Convert a dataset into TFRecord format."""
  output_filename = os.path.join(output_dir, output_name)
  output_file = os.path.join(output_dir, output_filename)
  tf.logging.info("Writing TFRecords to file %s", output_file)
  writer = tf.python_io.TFRecordWriter(output_file)

  counter = 0
  for case in generator:
    if counter % 100 == 0:
      print('Generating case %d for %s.' % (counter, output_name))
    counter += 1
    example = _to_example(case)
    writer.write(example.SerializeToString())

  writer.close()
  return output_file


def export_aligned_dataset_to_tfrecord(
    dataset_dir, output_dir, output_basename, splits,
    xception_frozen_graph_path):

  # Step 1: filter out sparse renders
  filter_out_sparse_renders(dataset_dir, splits, 0.15)

  # Step 2: generate semantic segmentation masks
  segment_utils.segment_and_color_dataset(
      dataset_dir, xception_frozen_graph_path, splits)

  # Step 3: export dataset to TFRecord
  if splits is None:
    input_filepattern = osp.join(dataset_dir, '*_color.png')
    dataset_iter = AlignedRenderedDataset(input_filepattern)
    output_name = output_basename + '.tfrecord'
    _generate_tfrecord_dataset(dataset_iter, output_name, output_dir)
  else:
    for split in splits:
      input_filepattern = osp.join(dataset_dir, split, '*_color.png')
      dataset_iter = AlignedRenderedDataset(input_filepattern)
      output_name = '%s_%s.tfrecord' % (output_basename, split)
      _generate_tfrecord_dataset(dataset_iter, output_name, output_dir)


def main(argv):
  # Read input flags
  dataset_name = opts.dataset_name
  dataset_parent_dir = opts.dataset_parent_dir
  output_dir = FLAGS.output_dir
  xception_frozen_graph_path = FLAGS.xception_frozen_graph_path
  splits = ['train', 'val']
  # Run the preprocessing pipeline
  export_aligned_dataset_to_tfrecord(
    dataset_parent_dir, output_dir, dataset_name, splits,
    xception_frozen_graph_path)


if __name__ == '__main__':
  app.run(main)


================================================
FILE: evaluate_quantitative_metrics.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from PIL import Image
from absl import app
from absl import flags
import functools
import glob
import numpy as np
import os
import os.path as osp
import skimage.measure
import tensorflow as tf
import utils

FLAGS = flags.FLAGS
flags.DEFINE_string('val_set_out_dir', None,
                    'Output directory with concatenated fake and real images.')
flags.DEFINE_string('experiment_title', 'experiment',
                    'Name for the experiment to evaluate')


def _extract_real_and_fake_from_concatenated_output(val_set_out_dir):
      out_dir = osp.join(val_set_out_dir, 'fake')
      gt_dir = osp.join(val_set_out_dir, 'real')
      if not osp.exists(out_dir):
        os.makedirs(out_dir)
      if not osp.exists(gt_dir):
        os.makedirs(gt_dir)
      imgs_pattern = osp.join(val_set_out_dir, '*.png')
      imgs_paths = sorted(glob.glob(imgs_pattern))
      print('Separating %d images in %s.' % (len(imgs_paths), val_set_out_dir))
      for img_path in imgs_paths:
        basename = osp.basename(img_path)[:-4]  # remove the '.png' suffix
        img = np.array(Image.open(img_path))
        img_res = 512
        fake = img[:, -2*img_res:-img_res, :]
        real = img[:, -img_res:, :]
        fake_path = osp.join(out_dir, '%s_fake.png' % basename)
        real_path = osp.join(gt_dir, '%s_real.png' % basename)
        Image.fromarray(fake).save(fake_path)
        Image.fromarray(real).save(real_path)


def compute_l1_loss_metric(image_set1_paths, image_set2_paths):
  assert len(image_set1_paths) == len(image_set2_paths)
  assert len(image_set1_paths) > 0
  print('Evaluating L1 loss for %d pairs' % len(image_set1_paths))

  total_loss = 0.
  for ii, (img1_path, img2_path) in enumerate(zip(image_set1_paths,
                                                  image_set2_paths)):
    img1_in_ar = np.array(Image.open(img1_path), dtype=np.float32)
    img1_in_ar = utils.crop_to_multiple(img1_in_ar)

    img2_in_ar = np.array(Image.open(img2_path), dtype=np.float32)
    img2_in_ar = utils.crop_to_multiple(img2_in_ar)

    loss_l1 = np.mean(np.abs(img1_in_ar - img2_in_ar))
    total_loss += loss_l1

  return total_loss / len(image_set1_paths)


def compute_psnr_loss_metric(image_set1_paths, image_set2_paths):
  assert len(image_set1_paths) == len(image_set2_paths)
  assert len(image_set1_paths) > 0
  print('Evaluating PSNR loss for %d pairs' % len(image_set1_paths))

  total_loss = 0.
  for ii, (img1_path, img2_path) in enumerate(zip(image_set1_paths,
                                                  image_set2_paths)):
    img1_in_ar = np.array(Image.open(img1_path))
    img1_in_ar = utils.crop_to_multiple(img1_in_ar)

    img2_in_ar = np.array(Image.open(img2_path))
    img2_in_ar = utils.crop_to_multiple(img2_in_ar)

    loss_psnr = skimage.measure.compare_psnr(img1_in_ar, img2_in_ar)
    total_loss += loss_psnr

  return total_loss / len(image_set1_paths)


def evaluate_experiment(val_set_out_dir, title='experiment',
                        metrics=['psnr', 'l1']):

  out_dir = osp.join(val_set_out_dir, 'fake')
  gt_dir = osp.join(val_set_out_dir, 'real')
  _extract_real_and_fake_from_concatenated_output(val_set_out_dir)
  input_pattern1 = osp.join(gt_dir, '*.png')
  input_pattern2 = osp.join(out_dir, '*.png')
  set1 = sorted(glob.glob(input_pattern1))
  set2 = sorted(glob.glob(input_pattern2))
  for metric in metrics:
    if metric == 'l1':
      mean_loss = compute_l1_loss_metric(set1, set2)
    elif metric == 'psnr':
      mean_loss = compute_psnr_loss_metric(set1, set2)
    print('*** mean %s loss for %s = %f' % (metric, title, mean_loss))


def main(argv):
  evaluate_experiment(FLAGS.val_set_out_dir, title=FLAGS.experiment_title,
                      metrics=['psnr', 'l1'])


if __name__ == '__main__':
  app.run(main)


================================================
FILE: layers.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import functools
import numpy as np
import tensorflow as tf


class LayerInstanceNorm(object):

  def __init__(self, scope_suffix='instance_norm'):
    curr_scope = tf.get_variable_scope().name
    self._scope = curr_scope + '/' + scope_suffix

  def __call__(self, x):
    with tf.variable_scope(self._scope, reuse=tf.AUTO_REUSE):
      return tf.contrib.layers.instance_norm(
        x, epsilon=1e-05, center=True, scale=True)


def layer_norm(x, scope='layer_norm'):
  return tf.contrib.layers.layer_norm(x, center=True, scale=True)


def pixel_norm(x):
  """Pixel normalization.

  Args:
    x: 4D image tensor in B01C format.

  Returns:
    4D tensor with pixel normalized channels.
  """
  return x * tf.rsqrt(tf.reduce_mean(tf.square(x), [-1], keepdims=True) + 1e-8)


def global_avg_pooling(x):
  return tf.reduce_mean(x, axis=[1, 2], keepdims=True)


class FullyConnected(object):

  def __init__(self, n_out_units, scope_suffix='FC'):
    weight_init = tf.random_normal_initializer(mean=0., stddev=0.02)
    weight_regularizer = tf.contrib.layers.l2_regularizer(scale=0.0001)

    curr_scope = tf.get_variable_scope().name
    self._scope = curr_scope + '/' + scope_suffix
    self.fc_layer = functools.partial(
      tf.layers.dense, units=n_out_units, kernel_initializer=weight_init,
      kernel_regularizer=weight_regularizer, use_bias=True)

  def __call__(self, x):
    with tf.variable_scope(self._scope, reuse=tf.AUTO_REUSE):
      return self.fc_layer(x)


def init_he_scale(shape, slope=1.0):
  """He neural network random normal scaling for initialization.

  Args:
    shape: list of the dimensions of the tensor.
    slope: float, slope of the ReLu following the layer.

  Returns:
    a float, He's standard deviation.
  """
  fan_in = np.prod(shape[:-1])
  return np.sqrt(2. / ((1. + slope**2) * fan_in))


class LayerConv(object):
  """Convolution layer with support for equalized learning."""

  def __init__(self,
               name,
               w,
               n,
               stride,
               padding='SAME',
               use_scaling=False,
               relu_slope=1.):
    """Layer constructor.

    Args:
      name: string, layer name.
      w: int or 2-tuple, width of the convolution kernel.
      n: 2-tuple of ints, input and output channel depths.
      stride: int or 2-tuple, stride for the convolution kernel.
      padding: string, the padding method. {SAME, VALID, REFLECT}.
      use_scaling: bool, whether to use weight norm and scaling.
      relu_slope: float, the slope of the ReLu following the layer.
    """
    assert padding in ['SAME', 'VALID', 'REFLECT'], 'Error: unsupported padding'
    self._padding = padding
    with tf.variable_scope(name):
      if isinstance(stride, int):
        stride = [1, stride, stride, 1]
      else:
        assert len(stride) == 0, "stride is either an int or a 2-tuple"
        stride = [1, stride[0], stride[1], 1]
      if isinstance(w, int):
        w = [w, w]
      self.w = w
      shape = [w[0], w[1], n[0], n[1]]
      init_scale, pre_scale = init_he_scale(shape, relu_slope), 1.
      if use_scaling:
        init_scale, pre_scale = pre_scale, init_scale
      self._stride = stride
      self._pre_scale = pre_scale
      self._weight = tf.get_variable(
          'weight',
          shape=shape,
          initializer=tf.random_normal_initializer(stddev=init_scale))
      self._bias = tf.get_variable(
          'bias', shape=[n[1]], initializer=tf.zeros_initializer)

  def __call__(self, x):
    """Apply layer to tensor x."""
    if self._padding != 'REFLECT':
      padding = self._padding
    else:
      padding = 'VALID'
      pad_top = self.w[0] // 2
      pad_left = self.w[1] // 2
      if (self.w[0] - self._stride[1]) % 2 == 0:
        pad_bottom = pad_top
      else:
        pad_bottom = self.w[0] - self._stride[1] - pad_top
      if (self.w[1] - self._stride[2]) % 2 == 0:
        pad_right = pad_left
      else:
        pad_right = self.w[1] - self._stride[2] - pad_left
      x = tf.pad(x, [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right],
                     [0, 0]], mode='REFLECT')
    y = tf.nn.conv2d(x, self._weight, strides=self._stride, padding=padding)
    return self._pre_scale * y + self._bias


class LayerTransposedConv(object):
  """Convolution layer with support for equalized learning."""

  def __init__(self,
               name,
               w,
               n,
               stride,
               padding='SAME',
               use_scaling=False,
               relu_slope=1.):
    """Layer constructor.

    Args:
      name: string, layer name.
      w: int or 2-tuple, width of the convolution kernel.
      n: 2-tuple int, [n_in_channels, n_out_channels]
      stride: int or 2-tuple, stride for the convolution kernel.
      padding: string, the padding method {SAME, VALID, REFLECT}.
      use_scaling: bool, whether to use weight norm and scaling.
      relu_slope: float, the slope of the ReLu following the layer.
    """
    assert padding in ['SAME'], 'Error: unsupported padding for transposed conv'
    if isinstance(stride, int):
      stride = [1, stride, stride, 1]
    else:
      assert len(stride) == 2, "stride is either an int or a 2-tuple"
      stride = [1, stride[0], stride[1], 1]
    if isinstance(w, int):
      w = [w, w]
    self.padding = padding
    self.nc_in, self.nc_out = n
    self.stride = stride
    with tf.variable_scope(name):
      kernel_shape = [w[0], w[1], self.nc_out, self.nc_in]
      init_scale, pre_scale = init_he_scale(kernel_shape, relu_slope), 1.
      if use_scaling:
        init_scale, pre_scale = pre_scale, init_scale
      self._pre_scale = pre_scale
      self._weight = tf.get_variable(
          'weight',
          shape=kernel_shape,
          initializer=tf.random_normal_initializer(stddev=init_scale))
      self._bias = tf.get_variable(
          'bias', shape=[self.nc_out], initializer=tf.zeros_initializer)

  def __call__(self, x):
    """Apply layer to tensor x."""
    x_shape = x.get_shape().as_list()
    batch_size = tf.shape(x)[0]
    stride_x, stride_y = self.stride[1], self.stride[2]
    output_shape = tf.stack([
      batch_size, x_shape[1] * stride_x, x_shape[2] * stride_y, self.nc_out])
    y = tf.nn.conv2d_transpose(
      x, filter=self._weight, output_shape=output_shape, strides=self.stride,
      padding=self.padding)
    return self._pre_scale * y + self._bias


class ResBlock(object):
  def __init__(self,
               name,
               nc,
               norm_layer_constructor,
               activation,
               padding='SAME',
               use_scaling=False,
               relu_slope=1.):
    """Layer constructor."""
    self.name = name
    conv2d = functools.partial(
        LayerConv, w=3, n=[nc, nc], stride=1, padding=padding,
        use_scaling=use_scaling, relu_slope=relu_slope)
    self.blocks = []
    with tf.variable_scope(self.name):
      with tf.variable_scope('res0'):
        self.blocks.append(
          LayerPipe([
            conv2d('res0_conv'),
            norm_layer_constructor('res0_norm'),
            activation
          ])
        )
      with tf.variable_scope('res1'):
        self.blocks.append(
          LayerPipe([
            conv2d('res1_conv'),
            norm_layer_constructor('res1_norm')
          ])
        )

  def __call__(self, x_init):
    """Apply layer to tensor x."""
    x = x_init
    for f in self.blocks:
      x = f(x)
    return x + x_init


class BasicBlock(object):
  def __init__(self,
               name,
               n,
               activation=functools.partial(tf.nn.leaky_relu, alpha=0.2),
               padding='SAME',
               use_scaling=True,
               relu_slope=1.):
    """Layer constructor."""
    self.name = name
    conv2d = functools.partial(
        LayerConv, stride=1, padding=padding,
        use_scaling=use_scaling, relu_slope=relu_slope)
    avg_pool = functools.partial(downscale, n=2)
    nc_in, nc_out = n  # n is a 2-tuple
    with tf.variable_scope(self.name):
      self.path1_blocks = []
      with tf.variable_scope('bb_path1'):
        self.path1_blocks.append(
          LayerPipe([
            activation,
            conv2d('bb_conv0', w=3, n=[nc_in, nc_out]),
            activation,
            conv2d('bb_conv1', w=3, n=[nc_out, nc_out]),
            downscale
          ])
        )

      self.path2_blocks = []
      with tf.variable_scope('bb_path2'):
        self.path2_blocks.append(
          LayerPipe([
            downscale,
            conv2d('path2_conv', w=1, n=[nc_in, nc_out])
          ])
        )

  def __call__(self, x_init):
    """Apply layer to tensor x."""
    x1 = x_init
    x2 = x_init
    for f in self.path1_blocks:
      x1 = f(x1)
    for f in self.path2_blocks:
      x2 = f(x2)
    return x1 + x2


class LayerDense(object):
  """Dense layer with a non-linearity."""

  def __init__(self, name, n, use_scaling=False, relu_slope=1.):
    """Layer constructor.

    Args:
      name: string, layer name.
      n: 2-tuple of ints, input and output widths.
      use_scaling: bool, whether to use weight norm and scaling.
      relu_slope: float, the slope of the ReLu following the layer.
    """
    with tf.variable_scope(name):
      init_scale, pre_scale = init_he_scale(n, relu_slope), 1.
      if use_scaling:
        init_scale, pre_scale = pre_scale, init_scale
      self._pre_scale = pre_scale
      self._weight = tf.get_variable(
          'weight',
          shape=n,
          initializer=tf.random_normal_initializer(stddev=init_scale))
      self._bias = tf.get_variable(
          'bias', shape=[n[1]], initializer=tf.zeros_initializer)

  def __call__(self, x):
    """Apply layer to tensor x."""
    return self._pre_scale * tf.matmul(x, self._weight) + self._bias


class LayerPipe(object):
  """Pipe a sequence of functions."""

  def __init__(self, functions):
    """Layer constructor.

    Args:
      functions: list, functions to pipe.
    """
    self._functions = tuple(functions)

  def __call__(self, x, **kwargs):
    """Apply pipe to tensor x and return result."""
    del kwargs
    for f in self._functions:
      x = f(x)
    return x


def downscale(x, n=2):
  """Box downscaling.

  Args:
    x: 4D image tensor.
    n: integer scale (must be a power of 2).

  Returns:
    4D tensor of images down scaled by a factor n.
  """
  if n == 1:
    return x
  return tf.nn.avg_pool(x, [1, n, n, 1], [1, n, n, 1], 'VALID')


def upscale(x, n):
  """Box upscaling (also called nearest neighbors).

  Args:
    x: 4D image tensor in B01C format.
    n: integer scale (must be a power of 2).

  Returns:
    4D tensor of images up scaled by a factor n.
  """
  if n == 1:
    return x
  x_shape = tf.shape(x)
  height, width = x_shape[1], x_shape[2]
  return tf.image.resize_nearest_neighbor(x, [n * height, n * width])


def tile_and_concatenate(x, z, n_z):
  z = tf.reshape(z, shape=[-1, 1, 1, n_z])
  z = tf.tile(z, [1, tf.shape(x)[1], tf.shape(x)[2], 1])
  x = tf.concat([x, z], axis=-1)
  return x


def minibatch_mean_variance(x):
  """Computes the variance average.

  This is used by the discriminator as a form of batch discrimination.

  Args:
    x: nD tensor for which to compute variance average.

  Returns:
    a scalar, the mean variance of variable x.
  """
  mean = tf.reduce_mean(x, 0, keepdims=True)
  vals = tf.sqrt(tf.reduce_mean(tf.squared_difference(x, mean), 0) + 1e-8)
  vals = tf.reduce_mean(vals)
  return vals


def scalar_concat(x, scalar):
  """Concatenate a scalar to a 4D tensor as an extra channel.

  Args:
    x: 4D image tensor in B01C format.
    scalar: a scalar to concatenate to the tensor.

  Returns:
    a 4D tensor with one extra channel containing the value scalar at
     every position.
  """
  s = tf.shape(x)
  return tf.concat([x, tf.ones([s[0], s[1], s[2], 1]) * scalar], axis=3)


================================================
FILE: losses.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from options import FLAGS as opts
import layers
import os.path as osp
import tensorflow as tf
import vgg16


def gradient_penalty_loss(y_xy, xy, iwass_target=1, iwass_lambda=10):
  grad = tf.gradients(tf.reduce_sum(y_xy), [xy])[0]
  grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grad), axis=[1, 2, 3]) + 1e-8)
  loss_gp = tf.reduce_mean(
      tf.square(grad_norm - iwass_target)) * iwass_lambda / iwass_target**2
  return loss_gp


def KL_loss(mean, logvar):
  loss = 0.5 * tf.reduce_sum(tf.square(mean) + tf.exp(logvar) - 1. - logvar,
                             axis=-1)
  return tf.reduce_sum(loss)  # just to match DRIT implementation


def l2_regularize(x):
  return tf.reduce_mean(tf.square(x))


def L1_loss(x, y):
  return tf.reduce_mean(tf.abs(x - y))


class PerceptualLoss:
  def __init__(self, x, y, image_shape, layers, w_layers, w_act=0.1):
    """
    Builds vgg16 network and computes the perceptual loss.
    """
    assert len(image_shape) == 3 and image_shape[-1] == 3
    assert osp.exists(opts.vgg16_path), 'Cannot find %s' % opts.vgg16_path

    self.w_act = w_act
    self.vgg_layers = layers
    self.w_layers = w_layers
    batch_shape = [None] + image_shape  # [None, H, W, 3]

    vgg_net = vgg16.Vgg16(opts.vgg16_path)
    self.x_acts = vgg_net.get_vgg_activations(x, layers)
    self.y_acts = vgg_net.get_vgg_activations(y, layers)
    loss = 0
    for w, act1, act2 in zip(self.w_layers, self.x_acts, self.y_acts):
      loss += w * tf.reduce_mean(tf.square(self.w_act * (act1 - act2)))
    self.loss = loss

  def __call__(self):
    return self.loss


def lsgan_appearance_E_loss(disc_response):
  disc_response = tf.squeeze(disc_response)
  gt_label = 0.5
  loss = tf.reduce_mean(tf.square(disc_response - gt_label))
  return loss


def lsgan_loss(disc_response, is_real):
  gt_label = 1 if is_real else 0
  disc_response = tf.squeeze(disc_response)
  # The following works for both regular and patchGAN discriminators
  loss = tf.reduce_mean(tf.square(disc_response - gt_label))
  return loss


def multiscale_discriminator_loss(Ds_responses, is_real):
  num_D = len(Ds_responses)
  loss = 0
  for i in range(num_D):
    curr_response = Ds_responses[i][-1][-1]
    loss += lsgan_loss(curr_response, is_real)
  return loss


================================================
FILE: networks.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from options import FLAGS as opts
import functools
import layers
import tensorflow as tf


class RenderingModel(object):

  def __init__(self, model_name, use_appearance=True):

    if model_name == 'pggan':
      self._model = ModelPGGAN(use_appearance)
    else:
      raise ValueError('Model %s not implemented!' % model_name)

  def __call__(self, x_in, z_app=None):
    return self._model(x_in, z_app)

  def get_appearance_encoder(self):
    return self._model._appearance_encoder

  def get_generator(self):
    return self._model._generator

  def get_content_encoder(self):
    return self._model._content_encoder


# "Progressive Growing of GANs (PGGAN)"-inspired architecture. Implementation is
# based on the implementation details in their paper, but code is not taken from
# the authors' released code.
# Main changes are:
#  - conditional GAN setup by introducting an encoder + skip connections.
#  - no progressive growing during training.
class ModelPGGAN(RenderingModel):

  def __init__(self, use_appearance=True):
    self._use_appearance = use_appearance
    self._content_encoder = None
    self._generator = GeneratorPGGAN(appearance_vec_size=opts.app_vector_size)
    if use_appearance:
      self._appearance_encoder = DRITAppearanceEncoderConcat(
          'appearance_net', opts.appearance_nc, opts.normalize_drit_Ez)
    else:
      self._appearance_encoder = None

  def __call__(self, x_in, z_app=None):
    y = self._generator(x_in, z_app)
    return y

  def get_appearance_encoder(self):
    return self._appearance_encoder

  def get_generator(self):
    return self._generator

  def get_content_encoder(self):
    return self._content_encoder


class PatchGANDiscriminator(object):

  def __init__(self, name_scope, input_nc, nf=64, n_layers=3, get_fmaps=False):
    """Constructor for a patchGAN discriminators.

    Args:
      name_scope: str - tf name scope.
      input_nc: int - number of input channels.
      nf: int - starting number of discriminator filters.
      n_layers: int - number of layers in the discriminator.
      get_fmaps: bool - return intermediate feature maps for FeatLoss.
    """
    self.get_fmaps = get_fmaps
    self.n_layers = n_layers
    kw = 4  # kernel width for convolution

    activation = functools.partial(tf.nn.leaky_relu, alpha=0.2)
    norm_layer = functools.partial(layers.LayerInstanceNorm)
    conv2d = functools.partial(layers.LayerConv, use_scaling=opts.use_scaling,
                               relu_slope=0.2)

    def minibatch_stats(x):
      return layers.scalar_concat(x, layers.minibatch_mean_variance(x))

    # Create layers.
    self.blocks = []
    with tf.variable_scope(name_scope, tf.AUTO_REUSE):
      with tf.variable_scope('block_0'):
        self.blocks.append([
            conv2d('conv0', w=kw, n=[input_nc, nf], stride=2),
            activation
        ])
      for ii_block in range(1, n_layers):
        nf_prev = nf
        nf = min(nf * 2, 512)
        with tf.variable_scope('block_%d' % ii_block):
          self.blocks.append([
              conv2d('conv%d' % ii_block, w=kw, n=[nf_prev, nf], stride=2),
              norm_layer(),
              activation
          ])
      # Add minibatch_stats (from PGGAN) and do a stride1 convolution.
      nf_prev = nf
      nf = min(nf * 2, 512)
      with tf.variable_scope('block_%d' % (n_layers + 1)):
        self.blocks.append([
            minibatch_stats,  # this is improvised by @meshry
            conv2d('conv%d' % (n_layers + 1), w=kw, n=[nf_prev + 1, nf],
                   stride=1),
            norm_layer(),
            activation
        ])
      # Get 1-channel patchGAN logits
      with tf.variable_scope('patchGAN_logits'):
        self.blocks.append([
            conv2d('conv%d' % (n_layers + 2), w=kw, n=[nf, 1], stride=1)
        ])

  def __call__(self, x, x_cond=None):
    # Concatenate extra conditioning input, if any.
    if x_cond is not None:
      x = tf.concat([x, x_cond], axis=3)

    if self.get_fmaps:
      # Dummy addition of x to D_fmaps, which will be removed before returing
      D_fmaps = [[x]]
      for i_block in range(len(self.blocks)):
        # Apply layer #0 in the current block
        block_fmaps = [self.blocks[i_block][0](D_fmaps[-1][-1])]
        # Apply the remaining layers of this block
        for i_layer in range(1, len(self.blocks[i_block])):
          block_fmaps.append(self.blocks[i_block][i_layer](block_fmaps[-1]))
        # Append the feature maps of this block to D_fmaps
        D_fmaps.append(block_fmaps)
      return D_fmaps[1:]  # exclude the input x which we added initially
    else:
      y = x
      for i_block in range(len(self.blocks)):
        for i_layer in range(len(self.blocks[i_block])):
          y = self.blocks[i_block][i_layer](y)
      return [[y]]


class MultiScaleDiscriminator(object):

  def __init__(self, name_scope, input_nc, num_scales=3, nf=64, n_layers=3,
               get_fmaps=False):
    self.get_fmaps = get_fmaps
    discs = []
    with tf.variable_scope(name_scope):
      for i in range(num_scales):
        discs.append(PatchGANDiscriminator(
            'D_scale%d' % i, input_nc, nf=nf, n_layers=n_layers,
            get_fmaps=get_fmaps))
    self.discriminators = discs

  def __call__(self, x, x_cond=None, params=None):
    del params
    if x_cond is not None:
      x = tf.concat([x, x_cond], axis=3)

    responses = []
    for ii, D in enumerate(self.discriminators):
      responses.append(D(x, x_cond=None))  # x_cond is already concatenated
      if ii != len(self.discriminators) - 1:
        x = layers.downscale(x, n=2)
    return responses


class GeneratorPGGAN(object):
  def __init__(self, appearance_vec_size=8, use_scaling=True,
               num_blocks=5, input_nc=7,
               fmap_base=8192, fmap_decay=1.0, fmap_max=512):
    """Generator model.
  
    Args:
      appearance_vec_size: int, size of the latent appearance vector.
      use_scaling: bool, whether to use weight scaling.
      resolution: int, width of the images (assumed to be square).
      input_nc: int, number of input channles.
      fmap_base: int, base number of channels.
      fmap_decay: float, decay rate of channels with respect to depth.
      fmap_max: int, max number of channels (supersedes fmap_base).
  
    Returns:
      function of the model.
    """
    def _num_filters(fmap_base, fmap_decay, fmap_max, stage):
      if opts.g_nf == 32:
        return min(int(2**(10 - stage)), fmap_max)  # nf32
      elif opts.g_nf == 64:
        return min(int(2**(11 - stage)), fmap_max)  # nf64
      else:
        raise ValueError('Currently unsupported num filters')

    nf = functools.partial(_num_filters, fmap_base, fmap_decay, fmap_max)
    self.num_blocks = num_blocks
    activation = functools.partial(tf.nn.leaky_relu, alpha=0.2)
    conv2d_stride1 = functools.partial(
        layers.LayerConv, stride=1, use_scaling=use_scaling, relu_slope=0.2)
    conv2d_rgb = functools.partial(layers.LayerConv, w=1, stride=1,
                                   use_scaling=use_scaling)
  
    # Create encoder layers.
    with tf.variable_scope('g_model_enc', tf.AUTO_REUSE):
      self.enc_stage = []
      self.from_rgb = []

      if opts.use_appearance and opts.inject_z == 'to_encoder':
        input_nc += appearance_vec_size
  
      for i in range(num_blocks, -1, -1):
        with tf.variable_scope('res_%d' % i):
          self.from_rgb.append(
              layers.LayerPipe([
                  conv2d_rgb('from_rgb', n=[input_nc, nf(i + 1)]),
                  activation,
              ])
          )
          self.enc_stage.append(
              layers.LayerPipe([
                  functools.partial(layers.downscale, n=2),
                  conv2d_stride1('conv0', w=3, n=[nf(i + 1), nf(i)]),
                  activation,
                  layers.pixel_norm,
                  conv2d_stride1('conv1', w=3, n=[nf(i), nf(i)]),
                  activation,
                  layers.pixel_norm
              ])
          )
  
    # Create decoder layers.
    with tf.variable_scope('g_model_dec', tf.AUTO_REUSE):
      self.dec_stage = []
      self.to_rgb = []
  
      nf_bottleneck = nf(0)  # num input filters at the bottleneck
      if opts.use_appearance and opts.inject_z == 'to_bottleneck':
        nf_bottleneck += appearance_vec_size

      with tf.variable_scope('res_0'):
        self.dec_stage.append(
          layers.LayerPipe([
            functools.partial(layers.upscale, n=2),
            conv2d_stride1('conv0', w=3, n=[nf_bottleneck, nf(1)]),
            activation,
            layers.pixel_norm,
            conv2d_stride1('conv1', w=3, n=[nf(1), nf(1)]),
            activation,
            layers.pixel_norm
          ])
        )
        self.to_rgb.append(conv2d_rgb('to_rgb', n=[nf(1), opts.output_nc]))
  
      multiply_factor = 2 if opts.concatenate_skip_layers else 1
      for i in range(1, num_blocks + 1):
        with tf.variable_scope('res_%d' % i):
          self.dec_stage.append(
              layers.LayerPipe([
                  functools.partial(layers.upscale, n=2),
                  conv2d_stride1('conv0', w=3,
                                 n=[multiply_factor * nf(i), nf(i + 1)]),
                  activation,
                  layers.pixel_norm,
                  conv2d_stride1('conv1', w=3, n=[nf(i + 1), nf(i + 1)]),
                  activation,
                  layers.pixel_norm
              ])
          )
          self.to_rgb.append(conv2d_rgb('to_rgb',
                                        n=[nf(i + 1), opts.output_nc]))

  def __call__(self, x, appearance_embedding=None, encoder_fmaps=None):
    """Generator function.

    Args:
      x: 2D tensor (batch, latents), the conditioning input batch of images.
      appearance_embedding: float tensor: latent appearance vector.
    Returns:
      4D tensor of images (NHWC), the generated images.
    """
    del encoder_fmaps
    enc_st_idx = 0
    if opts.use_appearance and opts.inject_z == 'to_encoder':
      x = layers.tile_and_concatenate(x, appearance_embedding,
                                      opts.app_vector_size)
    y = self.from_rgb[enc_st_idx](x)

    enc_responses = []
    for i in range(enc_st_idx, len(self.enc_stage)):
      y = self.enc_stage[i](y)
      enc_responses.insert(0, y)

    # Concatenate appearance vector to y
    if opts.use_appearance and opts.inject_z == 'to_bottleneck':
      appearance_tensor = tf.tile(appearance_embedding,
                                  [1, tf.shape(y)[1], tf.shape(y)[2], 1])
      y = tf.concat([y, appearance_tensor], axis=3)

    y_list = []
    for i in range(self.num_blocks + 1):
      if i > 0:
        y_skip = enc_responses[i]  # skip layer
        if opts.concatenate_skip_layers:
          y = tf.concat([y, y_skip], axis=3)
        else:
          y = y + y_skip
      y = self.dec_stage[i](y)
      y_list.append(y)

    return self.to_rgb[self.num_blocks](y_list[-1])


class DRITAppearanceEncoderConcat(object):

  def __init__(self, name_scope, input_nc, normalize_encoder):
    self.blocks = []
    activation = functools.partial(tf.nn.leaky_relu, alpha=0.2)
    conv2d = functools.partial(layers.LayerConv, use_scaling=opts.use_scaling,
                               relu_slope=0.2, padding='SAME')
    with tf.variable_scope(name_scope, tf.AUTO_REUSE):
      if normalize_encoder:
        self.blocks.append(layers.LayerPipe([
            conv2d('conv0', w=4, n=[input_nc, 64], stride=2),
            layers.BasicBlock('BB0', n=[64, 128], use_scaling=opts.use_scaling),
            layers.pixel_norm,
            layers.BasicBlock('BB1', n=[128, 192], use_scaling=opts.use_scaling),
            layers.pixel_norm,
            layers.BasicBlock('BB2', n=[192, 256], use_scaling=opts.use_scaling),
            layers.pixel_norm,
            activation,
            layers.global_avg_pooling
        ]))
      else:
        self.blocks.append(layers.LayerPipe([
            conv2d('conv0', w=4, n=[input_nc, 64], stride=2),
            layers.BasicBlock('BB0', n=[64, 128], use_scaling=opts.use_scaling),
            layers.BasicBlock('BB1', n=[128, 192], use_scaling=opts.use_scaling),
            layers.BasicBlock('BB2', n=[192, 256], use_scaling=opts.use_scaling),
            activation,
            layers.global_avg_pooling
        ]))
      # FC layers to get the mean and logvar
      self.fc_mean = layers.FullyConnected(opts.app_vector_size, 'FC_mean')
      self.fc_logvar = layers.FullyConnected(opts.app_vector_size, 'FC_logvar')

  def __call__(self, x):
    for f in self.blocks:
      x = f(x)

    mean = self.fc_mean(x)
    logvar = self.fc_logvar(x)
    # The following is an arbitrarily chosen *deterministic* latent vector
    # computation. Another option is to let z = mean, but gradients from logvar
    # will be None and will need to be removed.
    z = mean + tf.exp(0.5 * logvar)
    return z, mean, logvar


================================================
FILE: neural_rerendering.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from PIL import Image
from absl import app
from options import FLAGS as opts
import data
import datetime
import functools
import glob
import losses
import networks
import numpy as np
import options
import os.path as osp
import random
import skimage.measure
import staged_model
import tensorflow as tf
import time
import utils


def build_model_fn(use_exponential_moving_average=True):
  """Builds and returns the model function for an estimator.

  Args:
    use_exponential_moving_average: bool. If true, the exponential moving
    average will be used.

  Returns:
    function, the model_fn function typically required by an estimator.
  """
  arch_type = opts.arch_type
  use_appearance = opts.use_appearance
  def model_fn(features, labels, mode, params):
    """An estimator build_fn."""
    del labels, params
    if mode == tf.estimator.ModeKeys.TRAIN:
      step = tf.train.get_global_step()

      x_in = features['conditional_input']
      x_gt = features['expected_output']  # ground truth output
      x_app = features['peek_input']

      if opts.training_pipeline == 'staged':
        ops = staged_model.create_computation_graph(x_in, x_gt, x_app=x_app,
                                                    arch_type=opts.arch_type)
        op_increment_step = tf.assign_add(step, 1)
        train_disc_op = ops['train_disc_op']
        train_renderer_op = ops['train_renderer_op']
        train_op = tf.group(train_disc_op, train_renderer_op, op_increment_step)

        utils.HookReport.log_tensor(ops['total_loss_d'], 'total_loss_d')
        utils.HookReport.log_tensor(ops['loss_d_real'], 'loss_d_real')
        utils.HookReport.log_tensor(ops['loss_d_fake'], 'loss_d_fake')
        utils.HookReport.log_tensor(ops['total_loss_g'], 'total_loss_g')
        utils.HookReport.log_tensor(ops['loss_g_gan'], 'loss_g_gan')
        utils.HookReport.log_tensor(ops['loss_g_recon'], 'loss_g_recon')
        utils.HookReport.log_tensor(step, 'global_step')

        return tf.estimator.EstimatorSpec(
            mode=mode, loss=ops['total_loss_d'] + ops['total_loss_g'],
            train_op=train_op)
      else:
        raise NotImplementedError('%s training is not implemented.' %
                                  opts.training_pipeline)
    elif mode == tf.estimator.ModeKeys.EVAL:
      raise NotImplementedError('Eval is not implemented.')
    else:  # all below modes are for difference inference tasks.
      # Build network and initialize inference variables.
      g_func = networks.RenderingModel(arch_type, use_appearance)
      if use_appearance:
        app_func = g_func.get_appearance_encoder()
      if use_exponential_moving_average:
        ema = tf.train.ExponentialMovingAverage(decay=0.999)
        var_dict = ema.variables_to_restore()
        tf.train.init_from_checkpoint(osp.join(opts.train_dir), var_dict)

      if mode == tf.estimator.ModeKeys.PREDICT:
        x_in = features['conditional_input']
        if use_appearance:
          x_app = features['peek_input']
          x_app_embedding, _, _ = app_func(x_app)
        else:
          x_app_embedding = None
        y = g_func(x_in, x_app_embedding)
        tf.logging.info('DBG: shape of y during prediction %s.' % str(y.shape))
        return tf.estimator.EstimatorSpec(mode=mode, predictions=y)

      # 'eval_subset' mode is same as PREDICT but it concatenates the output to
      # the input render, semantic map and ground truth for easy comparison.
      elif mode == 'eval_subset':
        x_in = features['conditional_input']
        x_gt = features['expected_output']
        if use_appearance:
          x_app = features['peek_input']
          x_app_embedding, _, _ = app_func(x_app)
        else:
          x_app_embedding = None
        y = g_func(x_in, x_app_embedding)
        tf.logging.info('DBG: shape of y during prediction %s.' % str(y.shape))
        x_in_rgb = tf.slice(x_in, [0, 0, 0, 0], [-1, -1, -1, 3])
        if opts.use_semantic:
          x_in_semantic = tf.slice(x_in, [0, 0, 0, 4], [-1, -1, -1, 3])
          output_tuple = tf.concat([x_in_rgb, x_in_semantic, y, x_gt], axis=2)
        else:
          output_tuple = tf.concat([x_in_rgb, y, x_gt], axis=2)
        return tf.estimator.EstimatorSpec(mode=mode, predictions=output_tuple)

      # 'compute_appearance' mode computes and returns the latent z vector.
      elif mode == 'compute_appearance':
        assert use_appearance, 'use_appearance is set to False!'
        x_app_in = features['peek_input']
        # NOTE the following line is a temporary hack (which is
        # specially bad for inputs smaller than 512x512).
        x_app_in = tf.image.resize_image_with_crop_or_pad(x_app_in, 512, 512)
        app_embedding, _, _ = app_func(x_app_in)
        return tf.estimator.EstimatorSpec(mode=mode, predictions=app_embedding)

      # 'interpolate_appearance' mode expects an already computed latent z
      # vector as input passed a value to the dict key 'appearance_embedding'.
      elif mode == 'interpolate_appearance':
        assert use_appearance, 'use_appearance is set to False!'
        x_in = features['conditional_input']
        x_app_embedding = features['appearance_embedding']
        y = g_func(x_in, x_app_embedding)
        tf.logging.info('DBG: shape of y during prediction %s.' % str(y.shape))
        return tf.estimator.EstimatorSpec(mode=mode, predictions=y)
      else:
        raise ValueError('Unsupported mode: ' + mode)

  return model_fn


def make_sample_grid_and_save(est, dataset_name, dataset_parent_dir, grid_dims,
                              output_dir, cur_nimg):
  """Evaluate a fixed set of validation images and save output.

  Args:
    est: tf,estimator.Estimator, TF estimator to run the predictions.
    dataset_name: basename for the validation tfrecord from which to load
      validation images.
    dataset_parent_dir: path to a directory containing the validation tfrecord.
    grid_dims: 2-tuple int for the grid size (1 unit = 1 image).
    output_dir: string, where to save image samples.
    cur_nimg: int, current number of images seen by training.

  Returns:
    None.
  """
  num_examples = grid_dims[0] * grid_dims[1]
  def input_val_fn():
    dict_inp = data.provide_data(
        dataset_name=dataset_name, parent_dir=dataset_parent_dir, subset='val',
        batch_size=1, crop_flag=True, crop_size=opts.train_resolution,
        seeds=[0], max_examples=num_examples,
        use_appearance=opts.use_appearance, shuffle=0)
    x_in = dict_inp['conditional_input']
    x_gt = dict_inp['expected_output']  # ground truth output
    x_app = dict_inp['peek_input']
    return x_in, x_gt, x_app

  def est_input_val_fn():
    x_in, _, x_app = input_val_fn()
    features = {'conditional_input': x_in, 'peek_input': x_app}
    return features

  images = [x for x in est.predict(est_input_val_fn)]
  images = np.array(images, 'f')
  images = images.reshape(grid_dims + images.shape[1:])
  utils.save_images(utils.to_png(utils.images_to_grid(images)), output_dir,
                    cur_nimg)


def visualize_image_sequence(est, dataset_name, dataset_parent_dir,
                             input_sequence_name, app_base_path, output_dir):
  """Generates an image sequence as a video and stores it to disk."""
  batch_sz = opts.batch_size
  def input_seq_fn():
    dict_inp = data.provide_data(
        dataset_name=dataset_name, parent_dir=dataset_parent_dir,
        subset=input_sequence_name, batch_size=batch_sz, crop_flag=False,
        seeds=None, use_appearance=False, shuffle=0)
    x_in = dict_inp['conditional_input']
    return x_in

  # Compute appearance embedding only once and use it for all input frames.
  app_rgb_path = app_base_path + '_reference.png'
  app_rendered_path = app_base_path + '_color.png'
  app_depth_path = app_base_path + '_depth.png'
  app_sem_path = app_base_path + '_seg_rgb.png'
  x_app = _load_and_concatenate_image_channels(
      app_rgb_path, app_rendered_path, app_depth_path, app_sem_path)
  def seq_with_single_appearance_inp_fn():
    """input frames with a fixed latent appearance vector."""
    x_in_op = input_seq_fn()
    x_app_op = tf.convert_to_tensor(x_app)
    x_app_tiled_op = tf.tile(x_app_op, [tf.shape(x_in_op)[0], 1, 1, 1])
    return {'conditional_input': x_in_op,
            'peek_input': x_app_tiled_op}

  images = [x for x in est.predict(seq_with_single_appearance_inp_fn)]
  for i, gen_img in enumerate(images):
    output_file_path = osp.join(output_dir, 'out_%04d.png' % i)
    print('Saving frame #%d to %s' % (i, output_file_path))
    with tf.gfile.Open(output_file_path, 'wb') as f:
      f.write(utils.to_png(gen_img))


def train(dataset_name, dataset_parent_dir, load_pretrained_app_encoder,
          load_trained_fixed_app, save_samples_kimg=50):
  """Main training procedure.

  The trained model is saved in opts.train_dir, the function itself does not
   return anything.

  Args:
    save_samples_kimg: int, period (in KiB) to save sample images.

  Returns:
    None.
  """
  image_dir = osp.join(opts.train_dir, 'images')  # to save validation images.
  tf.gfile.MakeDirs(image_dir)
  config = tf.estimator.RunConfig(
      save_summary_steps=(1 << 10) // opts.batch_size,
      save_checkpoints_steps=(save_samples_kimg << 10) // opts.batch_size,
      keep_checkpoint_max=5,
      log_step_count_steps=1 << 30)
  model_dir = opts.train_dir
  if (opts.use_appearance and load_trained_fixed_app and
      not tf.train.latest_checkpoint(model_dir)):
    tf.logging.warning('***** Loading resume_step from %s!' %
                       opts.fixed_appearance_train_dir)
    resume_step = utils.load_global_step_from_checkpoint_dir(
        opts.fixed_appearance_train_dir)
  else:
    tf.logging.warning('***** Loading resume_step (if any) from %s!' %
                       model_dir)
    resume_step = utils.load_global_step_from_checkpoint_dir(model_dir)
  if resume_step != 0:
    tf.logging.warning('****** Resuming training at %d!' % resume_step)

  model_fn = build_model_fn()  # model function for TFEstimator.

  hooks = [utils.HookReport(1 << 12, opts.batch_size)]

  if opts.use_appearance and load_pretrained_app_encoder:
    tf.logging.warning('***** will warm-start from %s!' %
                       opts.appearance_pretrain_dir)
    ws = tf.estimator.WarmStartSettings(
        ckpt_to_initialize_from=opts.appearance_pretrain_dir,
        vars_to_warm_start='appearance_net/.*')
  elif opts.use_appearance and load_trained_fixed_app:
    tf.logging.warning('****** finetuning will warm-start from %s!' %
                       opts.fixed_appearance_train_dir)
    ws = tf.estimator.WarmStartSettings(
        ckpt_to_initialize_from=opts.fixed_appearance_train_dir,
        vars_to_warm_start='.*')
  else:
    ws = None
    tf.logging.warning('****** No warm-starting; using random initialization!')

  est = tf.estimator.Estimator(model_fn, model_dir, config, params={},
                               warm_start_from=ws)

  for next_kimg in range(opts.save_samples_kimg, opts.total_kimg + 1,
                         opts.save_samples_kimg):
    next_step = (next_kimg << 10) // opts.batch_size
    if opts.num_crops == -1:  # use random crops
      crop_seeds = None
    else:
      crop_seeds = list(100 * np.arange(opts.num_crops))
    input_train_fn = functools.partial(
        data.provide_data, dataset_name=dataset_name,
        parent_dir=dataset_parent_dir, subset='train',
        batch_size=opts.batch_size, crop_flag=True,
        crop_size=opts.train_resolution, seeds=crop_seeds,
        use_appearance=opts.use_appearance)
    est.train(input_train_fn, max_steps=next_step, hooks=hooks)
    tf.logging.info('DBG: kimg=%d, cur_step=%d' % (next_kimg, next_step))
    tf.logging.info('DBG: Saving a validation grid image %06d to %s' % (
        next_kimg, image_dir))
    make_sample_grid_and_save(est, dataset_name, dataset_parent_dir, (3, 3),
                              image_dir, next_kimg << 10)


def _build_inference_estimator(model_dir):
  model_fn = build_model_fn()
  est = tf.estimator.Estimator(model_fn, model_dir)
  return est


def evaluate_sequence(dataset_name, dataset_parent_dir, virtual_seq_name,
                      app_base_path):
  output_dir = osp.join(opts.train_dir, 'seq_output_%s' % virtual_seq_name)
  tf.gfile.MakeDirs(output_dir)
  est = _build_inference_estimator(opts.train_dir)
  visualize_image_sequence(est, dataset_name, dataset_parent_dir,
                           virtual_seq_name, app_base_path, output_dir)


def evaluate_image_set(dataset_name, dataset_parent_dir, subset_suffix,
                       output_dir=None, batch_size=6):
  if output_dir is None:
    output_dir = osp.join(opts.train_dir, 'validation_output_%s' % subset_suffix)
  tf.gfile.MakeDirs(output_dir)
  model_fn_old = build_model_fn()
  def model_fn_wrapper(features, labels, mode, params):
    del mode
    return model_fn_old(features, labels, 'eval_subset', params)
  model_dir = opts.train_dir
  est = tf.estimator.Estimator(model_fn_wrapper, model_dir)
  est_inp_fn = functools.partial(
      data.provide_data, dataset_name=dataset_name,
      parent_dir=dataset_parent_dir, subset=subset_suffix,
      batch_size=batch_size, use_appearance=opts.use_appearance, shuffle=0)

  print('Evaluating images for subset %s' % subset_suffix)
  images = [x for x in est.predict(est_inp_fn)]
  print('Evaluated %d images' % len(images))
  for i, img in enumerate(images):
    output_file_path = osp.join(output_dir, 'out_%04d.png' % i)
    print('Saving file #%d: %s' % (i, output_file_path))
    with tf.gfile.Open(output_file_path, 'wb') as f:
      f.write(utils.to_png(img))


def _load_and_concatenate_image_channels(rgb_path=None, rendered_path=None,
                                         depth_path=None, seg_path=None,
                                         size_multiple=64):
  """Prepares a single input for the network."""
  if (rgb_path is None and rendered_path is None and depth_path is None and
      seg_path is None):
    raise ValueError('At least one of the inputs has to be not None')

  channels = ()
  if rgb_path is not None:
    rgb_img = np.array(Image.open(rgb_path)).astype(np.float32)
    rgb_img = utils.crop_to_multiple(rgb_img, size_multiple)
    channels = channels + (rgb_img,)
  if rendered_path is not None:
    rendered_img = np.array(Image.open(rendered_path)).astype(np.float32)
    if not opts.use_alpha:
      rendered_img = rendered_img[:, :, :3]  # drop the alpha channel
    rendered_img = utils.crop_to_multiple(rendered_img, size_multiple)
    channels = channels + (rendered_img,)
  if depth_path is not None:
    depth_img = np.array(Image.open(depth_path))
    depth_img = depth_img.astype(np.float32)
    depth_img = utils.crop_to_multiple(depth_img[:, :, np.newaxis],
                                       size_multiple)
    channels = channels + (depth_img,)
    # depth_img = depth_img * (2.0 / 255) - 1.0
  if seg_path is not None:
    seg_img = np.array(Image.open(seg_path)).astype(np.float32)
    seg_img = utils.crop_to_multiple(seg_img, size_multiple)
    channels = channels + (seg_img,)
  # Concatenate and normalize channels
  img = np.dstack(channels)
  img = np.expand_dims(img, axis=0)
  img = img * (2.0 / 255) - 1.0
  return img


def infer_dir(model_dir, input_dir, output_dir):
  tf.gfile.MakeDirs(output_dir)
  est = _build_inference_estimator(opts.train_dir)

  def read_image(base_path, is_appearance=False):
    if is_appearance:
      ref_img_path = base_path + '_reference.png'
    else:
      ref_img_path = None
    rendered_img_path = base_path + '_color.png'
    depth_img_path = base_path + '_depth.png'
    seg_img_path = base_path + '_seg_rgb.png'
    img = _load_and_concatenate_image_channels(
        rgb_path=ref_img_path, rendered_path=rendered_img_path,
        depth_path=depth_img_path, seg_path=seg_img_path)
    return img

  def get_inference_input_fn(base_path, app_base_path):
    x_in = read_image(base_path, False)
    x_app_in = read_image(app_base_path, True)
    def infer_input_fn():
      return {'conditional_input': x_in, 'peek_input': x_app_in}
    return infer_input_fn

  file_paths = sorted(glob.glob(osp.join(input_dir, '*_depth.png')))
  base_paths = [x[:-10] for x in file_paths]  # remove the '_depth.png' suffix
  for inp_base_path in base_paths:
    est_inp_fn = get_inference_input_fn(inp_base_path, inp_base_path)
    img = next(est.predict(est_inp_fn))
    basename = osp.basename(inp_base_path)
    output_img_path = osp.join(output_dir, basename + '_out.png')
    print('Saving generated image to %s' % output_img_path)
    with tf.gfile.Open(output_img_path, 'wb') as f:
      f.write(utils.to_png(img))


def joint_interpolation(model_dir, app_input_dir, st_app_basename,
                        end_app_basename, camera_path_dir):
  """
  Interpolates both viewpoint and appearance between two input images.
  """
  # Create output direcotry
  output_dir = osp.join(model_dir, 'joint_interpolation_out')
  tf.gfile.MakeDirs(output_dir)

  # Build estimator
  model_fn_old = build_model_fn()
  def model_fn_wrapper(features, labels, mode, params):
    del mode
    return model_fn_old(features, labels, 'interpolate_appearance', params)
  def appearance_model_fn(features, labels, mode, params):
    del mode
    return model_fn_old(features, labels, 'compute_appearance', params)
  config = tf.estimator.RunConfig(
      save_summary_steps=1000, save_checkpoints_steps=50000,
      keep_checkpoint_max=50, log_step_count_steps=1 << 30)
  model_dir = model_dir
  est = tf.estimator.Estimator(model_fn_wrapper, model_dir, config, params={})
  est_app = tf.estimator.Estimator(appearance_model_fn, model_dir, config,
                                   params={})

  # Compute appearance embeddings for the two input appearance images.
  app_inputs = []
  for app_basename in [st_app_basename, end_app_basename]:
    app_rgb_path = osp.join(app_input_dir, app_basename + '_reference.png')
    app_rendered_path = osp.join(app_input_dir, app_basename + '_color.png')
    app_depth_path = osp.join(app_input_dir, app_basename + '_depth.png')
    app_seg_path = osp.join(app_input_dir, app_basename + '_seg_rgb.png')
    app_in = _load_and_concatenate_image_channels(
        rgb_path=app_rgb_path, rendered_path=app_rendered_path,
        depth_path=app_depth_path, seg_path=app_seg_path)
    # app_inputs.append(tf.convert_to_tensor(app_in))
    app_inputs.append(app_in)

  embedding1 = next(est_app.predict(
      lambda: {'peek_input': app_inputs[0]}))
  embedding1 = np.expand_dims(embedding1, axis=0)
  embedding2 = next(est_app.predict(
      lambda: {'peek_input': app_inputs[1]}))
  embedding2 = np.expand_dims(embedding2, axis=0)

  file_paths = sorted(glob.glob(osp.join(camera_path_dir, '*_depth.png')))
  base_paths = [x[:-10] for x in file_paths]  # remove the '_depth.png' suffix

  # Compute interpolated appearance embeddings
  num_interpolations = len(base_paths)
  interpolated_embeddings = []
  delta_vec = (embedding2 - embedding1) / (num_interpolations - 1)
  for delta_iter in range(num_interpolations):
    x_app_embedding = embedding1 + delta_iter * delta_vec
    interpolated_embeddings.append(x_app_embedding)

  # Generate and save interpolated images
  for frame_idx, embedding in enumerate(interpolated_embeddings):
    # Read in input frame
    frame_render_path = osp.join(base_paths[frame_idx] + '_color.png')
    frame_depth_path = osp.join(base_paths[frame_idx] + '_depth.png')
    frame_seg_path = osp.join(base_paths[frame_idx] + '_seg_rgb.png')
    x_in = _load_and_concatenate_image_channels(
        rgb_path=None, rendered_path=frame_render_path,
        depth_path=frame_depth_path, seg_path=frame_seg_path)

    img = next(est.predict(
        lambda: {'conditional_input': tf.convert_to_tensor(x_in),
                 'appearance_embedding': tf.convert_to_tensor(embedding)}))
    output_img_name = '%s_%s_%03d.png' % (st_app_basename, end_app_basename,
                                          frame_idx)
    output_img_path = osp.join(output_dir, output_img_name)
    print('Saving interpolated image to %s' % output_img_path)
    with tf.gfile.Open(output_img_path, 'wb') as f:
      f.write(utils.to_png(img))


def interpolate_appearance(model_dir, input_dir, target_img_basename,
                           appearance_img1_basename, appearance_img2_basename):
  # Create output direcotry
  output_dir = osp.join(model_dir, 'interpolate_appearance_out')
  tf.gfile.MakeDirs(output_dir)

  # Build estimator
  model_fn_old = build_model_fn()
  def model_fn_wrapper(features, labels, mode, params):
    del mode
    return model_fn_old(features, labels, 'interpolate_appearance', params)
  def appearance_model_fn(features, labels, mode, params):
    del mode
    return model_fn_old(features, labels, 'compute_appearance', params)
  config = tf.estimator.RunConfig(
      save_summary_steps=1000, save_checkpoints_steps=50000,
      keep_checkpoint_max=50, log_step_count_steps=1 << 30)
  model_dir = model_dir
  est = tf.estimator.Estimator(model_fn_wrapper, model_dir, config, params={})
  est_app = tf.estimator.Estimator(appearance_model_fn, model_dir, config,
                                   params={})

  # Compute appearance embeddings for the two input appearance images.
  app_inputs = []
  for app_basename in [appearance_img1_basename, appearance_img2_basename]:
    app_rgb_path = osp.join(input_dir, app_basename + '_reference.png')
    app_rendered_path = osp.join(input_dir, app_basename + '_color.png')
    app_depth_path = osp.join(input_dir, app_basename + '_depth.png')
    app_seg_path = osp.join(input_dir, app_basename + '_seg_rgb.png')
    app_in = _load_and_concatenate_image_channels(
        rgb_path=app_rgb_path, rendered_path=app_rendered_path,
        depth_path=app_depth_path, seg_path=app_seg_path)
    # app_inputs.append(tf.convert_to_tensor(app_in))
    app_inputs.append(app_in)

  embedding1 = next(est_app.predict(
      lambda: {'peek_input': app_inputs[0]}))
  embedding2 = next(est_app.predict(
      lambda: {'peek_input': app_inputs[1]}))
  embedding1 = np.expand_dims(embedding1, axis=0)
  embedding2 = np.expand_dims(embedding2, axis=0)

  # Compute interpolated appearance embeddings
  num_interpolations = 10
  interpolated_embeddings = []
  delta_vec = (embedding2 - embedding1) / num_interpolations
  for delta_iter in range(num_interpolations + 1):
    x_app_embedding = embedding1 + delta_iter * delta_vec
    interpolated_embeddings.append(x_app_embedding)

  # Read in the generator input for the target image to render
  rendered_img_path = osp.join(input_dir, target_img_basename + '_color.png')
  depth_img_path = osp.join(input_dir, target_img_basename + '_depth.png')
  seg_img_path = osp.join(input_dir, target_img_basename + '_seg_rgb.png')
  x_in = _load_and_concatenate_image_channels(
      rgb_path=None, rendered_path=rendered_img_path,
      depth_path=depth_img_path, seg_path=seg_img_path)

  # Generate and save interpolated images
  for interpolate_iter, embedding in enumerate(interpolated_embeddings):
    img = next(est.predict(
        lambda: {'conditional_input': tf.convert_to_tensor(x_in),
                 'appearance_embedding': tf.convert_to_tensor(embedding)}))
    output_img_name = 'interpolate_%s_%s_%s_%03d.png' % (
        target_img_basename, appearance_img1_basename, appearance_img2_basename,
        interpolate_iter)
    output_img_path = osp.join(output_dir, output_img_name)
    print('Saving interpolated image to %s' % output_img_path)
    with tf.gfile.Open(output_img_path, 'wb') as f:
      f.write(utils.to_png(img))


def main(argv):
  del argv
  configs_str = options.list_options()
  tf.gfile.MakeDirs(opts.train_dir)
  with tf.gfile.Open(osp.join(opts.train_dir, 'configs.txt'), 'wb') as f:
    f.write(configs_str)
  tf.logging.info('Local configs\n%s' % configs_str)

  if opts.run_mode == 'train':
    dataset_name = opts.dataset_name
    dataset_parent_dir = opts.dataset_parent_dir
    load_pretrained_app_encoder = opts.load_pretrained_app_encoder
    load_trained_fixed_app = opts.load_from_another_ckpt
    batch_size = opts.batch_size
    train(dataset_name, dataset_parent_dir, load_pretrained_app_encoder,
          load_trained_fixed_app)
  elif opts.run_mode == 'eval':  # generate a camera path output sequence from TFRecord inputs.
    dataset_name = opts.dataset_name
    dataset_parent_dir = opts.dataset_parent_dir
    virtual_seq_name = opts.virtual_seq_name
    inp_app_img_base_path = opts.inp_app_img_base_path
    evaluate_sequence(dataset_name, dataset_parent_dir, virtual_seq_name,
                      inp_app_img_base_path)
  elif opts.run_mode == 'eval_subset':  # generate output for validation set (encoded as TFRecords)
    dataset_name = opts.dataset_name
    dataset_parent_dir = opts.dataset_parent_dir
    virtual_seq_name = opts.virtual_seq_name
    evaluate_image_set(dataset_name, dataset_parent_dir, virtual_seq_name,
                       opts.output_validation_dir, opts.batch_size)
  elif opts.run_mode == 'eval_dir':  # evaluate output for a directory with input images
    input_dir = opts.inference_input_path
    output_dir = opts.inference_output_dir
    model_dir = opts.train_dir
    infer_dir(model_dir, input_dir, output_dir)
  elif opts.run_mode == 'interpolate_appearance':  # interpolate appearance only between two images.
    model_dir = opts.train_dir
    input_dir = opts.inference_input_path
    target_img_basename = opts.target_img_basename
    app_img1_basename = opts.appearance_img1_basename
    app_img2_basename = opts.appearance_img2_basename
    interpolate_appearance(model_dir, input_dir, target_img_basename,
                           app_img1_basename, app_img2_basename)
  elif opts.run_mode == 'joint_interpolation':  # interpolate viewpoint and appearance between two images
    model_dir = opts.train_dir
    app_input_dir = opts.inference_input_path
    st_app_basename = opts.appearance_img1_basename
    end_app_basename = opts.appearance_img2_basename
    frames_dir = opts.frames_dir
    joint_interpolation(model_dir, app_input_dir, st_app_basename,
                        end_app_basename, frames_dir)
  else:
    raise ValueError('Unsupported --run_mode %s' % opts.run_mode)


if __name__ == '__main__':
  app.run(main)


================================================
FILE: options.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


from absl import flags
import numpy as np

FLAGS = flags.FLAGS

# ------------------------------------------------------------------------------
# Train flags
# ------------------------------------------------------------------------------

# Dataset, model directory and run mode
flags.DEFINE_string('train_dir', '/tmp/nerual_rendering',
                    'Directory for model training.')
flags.DEFINE_string('dataset_name', 'sanmarco9k', 'name ID for a dataset.')
flags.DEFINE_string(
    'dataset_parent_dir', '',
    'Directory containing generated tfrecord dataset.')
flags.DEFINE_string('run_mode', 'train', "{'train', 'eval', 'infer'}")
flags.DEFINE_string('imageset_dir', None, 'Directory containing trainset '
                    'images for appearance pretraining.')
flags.DEFINE_string('metadata_output_dir', None, 'Directory to save pickled '
                    'pairwise distance matrix for appearance pretraining.')
flags.DEFINE_integer('save_samples_kimg', 50, 'kimg cycle to save sample'
                     'validation ouptut during training.')

# Network inputs/outputs
flags.DEFINE_boolean('use_depth', True, 'Add depth image to the deep buffer.')
flags.DEFINE_boolean('use_alpha', False,
                     'Add alpha channel to the deep buffer.')
flags.DEFINE_boolean('use_semantic', True,
                     'Add semantic map to the deep buffer.')
flags.DEFINE_boolean('use_appearance', True,
                     'Capture appearance from an input real image.')
flags.DEFINE_integer('deep_buffer_nc', 7,
                     'Number of input channels in the deep buffer.')
flags.DEFINE_integer('appearance_nc', 10,
                     'Number of input channels to the appearance encoder.')
flags.DEFINE_integer('output_nc', 3,
                     'Number of channels for the generated image.')

# Staged training flags
flags.DEFINE_string(
    'vgg16_path', './vgg16_weights/vgg16.npy',
    'path to a *.npy file with vgg16 pretrained weights')
flags.DEFINE_boolean('load_pretrained_app_encoder', False,
                     'Warmstart appearance encoder with pretrained weights.')
flags.DEFINE_string('appearance_pretrain_dir', '',
                    'Model dir for the pretrained appearance encoder.')
flags.DEFINE_boolean('train_app_encoder', False, 'Whether to make the weights '
                     'for the appearance encoder trainable or not.')
flags.DEFINE_boolean(
    'load_from_another_ckpt', False, 'Load weights from another trained model, '
                     'e.g load model trained with a fixed appearance encoder.')
flags.DEFINE_string('fixed_appearance_train_dir', '',
                    'Model dir for training G with a fixed appearance net.')

# -----------------------------------------------------------------------------

# More hparams
flags.DEFINE_integer('train_resolution', 256,
                     'Crop train images to this resolution.')
flags.DEFINE_float('d_lr', 0.001, 'Learning rate for the discriminator.')
flags.DEFINE_float('g_lr', 0.001, 'Learning rate for the generator.')
flags.DEFINE_float('ez_lr', 0.0001, 'Learning rate for appearance encoder.')
flags.DEFINE_integer('batch_size', 8, 'Batch size for training.')
flags.DEFINE_boolean('use_scaling', True, "use He's scaling.")
flags.DEFINE_integer('num_crops', 30, 'num crops from train images'
                     '(use -1 for random crops).')
flags.DEFINE_integer('app_vector_size', 8, 'Size of latent appearance vector.')
flags.DEFINE_integer('total_kimg', 20000,
                     'Max number (in kilo) of training images for training.')
flags.DEFINE_float('adam_beta1', 0.0, 'beta1 for adam optimizer.')
flags.DEFINE_float('adam_beta2', 0.99, 'beta2 for adam optimizer.')

# Loss weights
flags.DEFINE_float('w_loss_vgg', 0.3, 'VGG loss weight.')
flags.DEFINE_float('w_loss_feat', 10., 'Feature loss weight (from pix2pixHD).')
flags.DEFINE_float('w_loss_l1', 50., 'L1 loss weight.')
flags.DEFINE_float('w_loss_z_recon', 10., 'Z reconstruction loss weight.')
flags.DEFINE_float('w_loss_gan', 1., 'Adversarial loss weight.')
flags.DEFINE_float('w_loss_z_gan', 1., 'Z adversarial loss weight.')
flags.DEFINE_float('w_loss_kl', 0.01, 'KL divergence weight.')
flags.DEFINE_float('w_loss_l2_reg', 0.01, 'Weight for L2 regression on Z.')

# -----------------------------------------------------------------------------

# Architecture and training setup
flags.DEFINE_string('arch_type', 'pggan',
                    'Architecture type: {pggan, pix2pixhd}.')
flags.DEFINE_string('training_pipeline', 'staged',
                    'Training type type: {staged, bicycle_gan, drit}.')
flags.DEFINE_integer('g_nf', 64,
                     'num filters in the first/last layers of U-net.')
flags.DEFINE_boolean('concatenate_skip_layers', True,
                     'Use concatenation for skip connections.')

## if arch_type == 'pggan':
flags.DEFINE_integer('pggan_n_blocks', 5,
                     'Num blocks for the pggan architecture.')
## if arch_type == 'pix2pixhd':
flags.DEFINE_integer('p2p_n_downsamples', 3,
                     'Num downsamples for the pix2pixHD architecture.')
flags.DEFINE_integer('p2p_n_resblocks', 4, 'Num residual blocks at the '
                     'end/start of the pix2pixHD encoder/decoder.')
## if use_drit_pipeline:
flags.DEFINE_boolean('use_concat', True, '"concat" mode from DRIT.')
flags.DEFINE_boolean('normalize_drit_Ez', True, 'Add pixelnorm layers to the '
                     'appearance encoder.')
flags.DEFINE_boolean('concat_z_in_all_layers', True, 'Inject z at each '
                     'upsampling layer in the decoder (only for DRIT baseline)')
flags.DEFINE_string('inject_z', 'to_bottleneck', 'Method for injecting z; '
                     'one of {to_encoder, to_bottleneck}.')
flags.DEFINE_boolean('use_vgg_loss', True, 'vgg v L1 reconstruction loss.')

# ------------------------------------------------------------------------------
# Inference flags
# ------------------------------------------------------------------------------

flags.DEFINE_string('inference_input_path', '',
                    'Parent directory for input images at inference time.')
flags.DEFINE_string('inference_output_dir', '', 'Output path for inference')
flags.DEFINE_string('target_img_basename', '',
                    'basename of target image to render for interpolation')
flags.DEFINE_string('virtual_seq_name', 'full_camera_path',
                    'name for the virtual camera path suffix for the TFRecord.')
flags.DEFINE_string('inp_app_img_base_path', '',
                    'base path for the input appearance image for camera paths')

flags.DEFINE_string('appearance_img1_basename', '',
                    'basename of the first appearance image for interpolation')
flags.DEFINE_string('appearance_img2_basename', '',
                    'basename of the first appearance image for interpolation')
flags.DEFINE_list('input_basenames', [], 'input basenames for inference')
flags.DEFINE_list('input_app_basenames', [], 'input appearance basenames for '
                  'inference')
flags.DEFINE_string('frames_dir', '',
                    'Folder with input frames to a camera path')
flags.DEFINE_string('output_validation_dir', '',
                    'dataset_name for storing results in a structured folder')
flags.DEFINE_string('input_rendered', '',
                    'input rendered image name for inference')
flags.DEFINE_string('input_depth', '', 'input depth image name for inference')
flags.DEFINE_string('input_seg', '',
                    'input segmentation mask image name for inference')
flags.DEFINE_string('input_app_rgb', '',
                    'input appearance rgb image name for inference')
flags.DEFINE_string('input_app_rendered', '',
                    'input appearance rendered image name for inference')
flags.DEFINE_string('input_app_depth', '',
                    'input appearance depth image name for inference')
flags.DEFINE_string('input_app_seg', '',
                    'input appearance segmentation mask image name for'
                    'inference')
flags.DEFINE_string('output_img_name', '',
                    '[OPTIONAL] output image name for inference')

# -----------------------------------------------------------------------------
# Some validation and assertions
# -----------------------------------------------------------------------------

def validate_options():
  if FLAGS.use_drit_training:
    assert FLAGS.use_appearance, 'DRIT pipeline requires --use_appearance'
  assert not (
    FLAGS.load_pretrained_appearance_encoder and FLAGS.load_from_another_ckpt), (
      'You cannot load weights for the appearance encoder from two different '
      'checkpoints!')
  if not FLAGS.use_appearance:
    print('**Warning: setting --app_vector_size to 0 since '
          '--use_appearance=False!')
    FLAGS.set_default('app_vector_size', 0)
  
# -----------------------------------------------------------------------------
# Print all options
# -----------------------------------------------------------------------------

def list_options():
  configs = ('# Run flags/options from options.py:\n'
             '# ----------------------------------\n')
  configs += ('## Train flags:\n'
              '## ------------\n')
  configs += 'train_dir = %s\n' % FLAGS.train_dir
  configs += 'dataset_name = %s\n' % FLAGS.dataset_name
  configs += 'dataset_parent_dir = %s\n' % FLAGS.dataset_parent_dir
  configs += 'run_mode = %s\n' % FLAGS.run_mode
  configs += 'save_samples_kimg = %d\n' % FLAGS.save_samples_kimg
  configs += '\n# --------------------------------------------------------\n\n'

  configs += ('## Network inputs and outputs:\n'
              '## ---------------------------\n')
  configs += 'use_depth = %s\n' % str(FLAGS.use_depth)
  configs += 'use_alpha = %s\n' % str(FLAGS.use_alpha)
  configs += 'use_semantic = %s\n' % str(FLAGS.use_semantic)
  configs += 'use_appearance = %s\n' % str(FLAGS.use_appearance)
  configs += 'deep_buffer_nc = %d\n' % FLAGS.deep_buffer_nc
  configs += 'appearance_nc = %d\n' % FLAGS.appearance_nc
  configs += 'output_nc = %d\n' % FLAGS.output_nc
  configs += 'train_resolution = %d\n' % FLAGS.train_resolution
  configs += '\n# --------------------------------------------------------\n\n'

  configs += ('## Staged training flags:\n'
              '## ----------------------\n')
  configs += 'load_pretrained_app_encoder = %s\n' % str(
                                            FLAGS.load_pretrained_app_encoder)
  configs += 'appearance_pretrain_dir = %s\n' % FLAGS.appearance_pretrain_dir
  configs += 'train_app_encoder = %s\n' % str(FLAGS.train_app_encoder)
  configs += 'load_from_another_ckpt = %s\n' % str(FLAGS.load_from_another_ckpt)
  configs += 'fixed_appearance_train_dir = %s\n' % str(
                                            FLAGS.fixed_appearance_train_dir)
  configs += '\n# --------------------------------------------------------\n\n'

  configs += ('## More hyper-parameters:\n'
              '## ----------------------\n')
  configs += 'd_lr = %f\n' % FLAGS.d_lr
  configs += 'g_lr = %f\n' % FLAGS.g_lr
  configs += 'ez_lr = %f\n' % FLAGS.ez_lr
  configs += 'batch_size = %d\n' % FLAGS.batch_size
  configs += 'use_scaling = %s\n' % str(FLAGS.use_scaling)
  configs += 'num_crops = %d\n' % FLAGS.num_crops
  configs += 'app_vector_size = %d\n' % FLAGS.app_vector_size
  configs += 'total_kimg = %d\n' % FLAGS.total_kimg
  configs += 'adam_beta1 = %f\n' % FLAGS.adam_beta1
  configs += 'adam_beta2 = %f\n' % FLAGS.adam_beta2
  configs += '\n# --------------------------------------------------------\n\n'

  configs += ('## Loss weights:\n'
              '## -------------\n')
  configs += 'w_loss_vgg = %f\n' % FLAGS.w_loss_vgg
  configs += 'w_loss_feat = %f\n' % FLAGS.w_loss_feat
  configs += 'w_loss_l1 = %f\n' % FLAGS.w_loss_l1
  configs += 'w_loss_z_recon = %f\n' % FLAGS.w_loss_z_recon
  configs += 'w_loss_gan = %f\n' % FLAGS.w_loss_gan
  configs += 'w_loss_z_gan = %f\n' % FLAGS.w_loss_z_gan
  configs += 'w_loss_kl = %f\n' % FLAGS.w_loss_kl
  configs += 'w_loss_l2_reg = %f\n' % FLAGS.w_loss_l2_reg
  configs += '\n# --------------------------------------------------------\n\n'

  configs += ('## Architecture and training setup:\n'
              '## --------------------------------\n')
  configs += 'arch_type = %s\n' % FLAGS.arch_type
  configs += 'training_pipeline = %s\n' % FLAGS.training_pipeline
  configs += 'g_nf = %d\n' % FLAGS.g_nf
  configs += 'concatenate_skip_layers = %s\n' % str(
                                                FLAGS.concatenate_skip_layers)
  configs += 'p2p_n_downsamples = %d\n' % FLAGS.p2p_n_downsamples
  configs += 'p2p_n_resblocks = %d\n' % FLAGS.p2p_n_resblocks
  configs += 'use_concat = %s\n' % str(FLAGS.use_concat)
  configs += 'normalize_drit_Ez = %s\n' % str(FLAGS.normalize_drit_Ez)
  configs += 'inject_z = %s\n' % FLAGS.inject_z
  configs += 'concat_z_in_all_layers = %s\n' % str(FLAGS.concat_z_in_all_layers)
  configs += 'use_vgg_loss = %s\n' % str(FLAGS.use_vgg_loss)
  configs += '\n# --------------------------------------------------------\n\n'

  return configs


================================================
FILE: pretrain_appearance.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from PIL import Image
from absl import app
from absl import flags
from options import FLAGS as opts
import glob
import networks
import numpy as np
import os
import os.path as osp
import pickle
import style_loss
import tensorflow as tf
import utils


def _load_and_concatenate_image_channels(
    rgb_path=None, rendered_path=None, depth_path=None, seg_path=None,
    crop_size=512):
  if (rgb_path is None and rendered_path is None and depth_path is None and
      seg_path is None):
    raise ValueError('At least one of the inputs has to be not None')

  channels = ()
  if rgb_path is not None:
    rgb_img = np.array(Image.open(rgb_path)).astype(np.float32)
    rgb_img = utils.get_central_crop(rgb_img, crop_size, crop_size)
    channels = channels + (rgb_img,)
  if rendered_path is not None:
    rendered_img = np.array(Image.open(rendered_path)).astype(np.float32)
    rendered_img = utils.get_central_crop(rendered_img, crop_size, crop_size)
    if not opts.use_alpha:
      rendered_img = rendered_img[:,:, :3]  # drop the alpha channel
    channels = channels + (rendered_img,)
  if depth_path is not None:
    depth_img = np.array(Image.open(depth_path))
    depth_img = depth_img.astype(np.float32)
    depth_img = utils.get_central_crop(depth_img, crop_size, crop_size)
    channels = channels + (depth_img,)
  if seg_path is not None:
    seg_img = np.array(Image.open(seg_path)).astype(np.float32)
    channels = channels + (seg_img,)
  # Concatenate and normalize channels
  img = np.dstack(channels)
  img = img * (2.0 / 255) - 1.0
  return img


def read_single_appearance_input(rgb_img_path):
  base_path = rgb_img_path[:-14]  # remove the '_reference.png' suffix
  rendered_img_path = base_path + '_color.png'
  depth_img_path = base_path + '_depth.png'
  semantic_img_path = base_path + '_seg_rgb.png'
  network_input_img = _load_and_concatenate_image_channels(
      rgb_img_path, rendered_img_path, depth_img_path, semantic_img_path,
      crop_size=opts.train_resolution)
  return network_input_img


def get_triplet_input_fn(dataset_path, dist_file_path=None, k_max_nearest=5,
                         k_max_farthest=13):
  input_images_pattern = osp.join(dataset_path, '*_reference.png')
  filenames = sorted(glob.glob(input_images_pattern))
  print('DBG: obtained %d input filenames for triplet inputs' % len(filenames))
  print('DBG: Computing pairwise style distances:')
  if dist_file_path is not None and osp.exists(dist_file_path):
    print('*** Loading distance matrix from %s' % dist_file_path)
    with open(dist_file_path, 'rb') as f:
      dist_matrix = pickle.load(f)['dist_matrix']
      print('loaded a dist_matrix of shape: %s' % str(dist_matrix.shape))
  else:
    dist_matrix = style_loss.compute_pairwise_style_loss_v2(filenames)
    dist_dict = {'dist_matrix': dist_matrix}
    print('Saving distance matrix to %s' % dist_file_path)
    with open(dist_file_path, 'wb') as f:
      pickle.dump(dist_dict, f)

  # Sort neighbors for each anchor image
  num_imgs = len(dist_matrix)
  sorted_neighbors = [np.argsort(dist_matrix[ii, :]) for ii in range(num_imgs)]

  def triplet_input_fn(anchor_idx):
    # start from 1 to avoid getting the same image as its own neighbor
    positive_neighbor_idx = np.random.randint(1, k_max_nearest + 1)
    negative_neighbor_idx = num_imgs - 1 - np.random.randint(0, k_max_farthest)
    positive_img_idx = sorted_neighbors[anchor_idx][positive_neighbor_idx]
    negative_img_idx = sorted_neighbors[anchor_idx][negative_neighbor_idx]
    # Read anchor image
    anchor_rgb_path = osp.join(dataset_path, filenames[anchor_idx])
    anchor_input = read_single_appearance_input(anchor_rgb_path)
    # Read positive image
    positive_rgb_path = osp.join(dataset_path, filenames[positive_img_idx])
    positive_input = read_single_appearance_input(positive_rgb_path)
    # Read negative image
    negative_rgb_path = osp.join(dataset_path, filenames[negative_img_idx])
    negative_input = read_single_appearance_input(negative_rgb_path)
    # Return triplet
    return anchor_input, positive_input, negative_input

  return triplet_input_fn


def get_tf_triplet_dataset_iter(
    dataset_path, trainset_size, dist_file_path, batch_size=4,
    deterministic_flag=False, shuffle_buf_size=128, repeat_flag=True):
  # Create a dataset of anchor image indices.
  idx_dataset = tf.data.Dataset.range(trainset_size)
  # Create a mapper function from anchor idx to triplet images.
  triplet_mapper = lambda idx: tuple(tf.py_func(
      get_triplet_input_fn(dataset_path, dist_file_path), [idx],
      [tf.float32, tf.float32, tf.float32]))
  # Convert triplet to a dictionary for the estimator input format.
  triplet_to_dict_mapper = lambda anchor, pos, neg: {
      'anchor_img': anchor, 'positive_img': pos, 'negative_img': neg}
  if repeat_flag:
    idx_dataset = idx_dataset.repeat()  # Repeat indefinitely.
  if not deterministic_flag:
    idx_dataset = idx_dataset.shuffle(shuffle_buf_size)
    triplet_dataset = idx_dataset.map(
        triplet_mapper, num_parallel_calls=max(4, batch_size // 4))
    triplet_dataset = triplet_dataset.map(
        triplet_to_dict_mapper, num_parallel_calls=max(4, batch_size // 4))
  else:
    triplet_dataset = idx_dataset.map(triplet_mapper, num_parallel_calls=None)
    triplet_dataset = triplet_dataset.map(triplet_to_dict_mapper,
                                          num_parallel_calls=None)
  triplet_dataset = triplet_dataset.batch(batch_size)
  if not deterministic_flag:
    triplet_dataset = triplet_dataset.prefetch(4)  # Prefetch a few batches.
  return triplet_dataset.make_one_shot_iterator()


def build_model_fn(batch_size, lr_app_pretrain=0.0001, adam_beta1=0.0,
                   adam_beta2=0.99):
  def model_fn(features, labels, mode, params):
    del labels, params

    step = tf.train.get_global_step()
    app_func = networks.DRITAppearanceEncoderConcat(
      'appearance_net', opts.appearance_nc, opts.normalize_drit_Ez)

    if mode == tf.estimator.ModeKeys.TRAIN:
      op_increment_step = tf.assign_add(step, 1)
      with tf.name_scope('Appearance_Loss'):
        anchor_img = features['anchor_img']
        positive_img = features['positive_img']
        negative_img = features['negative_img']
        # Compute embeddings (each of shape [batch_sz, 1, 1, app_vector_sz])
        z_anchor, _, _ = app_func(anchor_img)
        z_pos, _, _ = app_func(positive_img)
        z_neg, _, _ = app_func(negative_img)
        # Squeeze into shape of [batch_sz x vec_sz]
        anchor_embedding = tf.squeeze(z_anchor, axis=[1, 2], name='z_anchor')
        positive_embedding = tf.squeeze(z_pos, axis=[1, 2])
        negative_embedding = tf.squeeze(z_neg, axis=[1, 2])
        # Compute triplet loss
        margin = 0.1
        anchor_positive_dist = tf.reduce_sum(
            tf.square(anchor_embedding - positive_embedding), axis=1)
        anchor_negative_dist = tf.reduce_sum(
            tf.square(anchor_embedding - negative_embedding), axis=1)
        triplet_loss = anchor_positive_dist - anchor_negative_dist + margin
        triplet_loss = tf.maximum(triplet_loss, 0.)
        triplet_loss = tf.reduce_sum(triplet_loss) / batch_size
        tf.summary.scalar('appearance_triplet_loss', triplet_loss)

        # Image summaries
        anchor_rgb = tf.slice(anchor_img, [0, 0, 0, 0], [-1, -1, -1, 3])
        positive_rgb = tf.slice(positive_img, [0, 0, 0, 0], [-1, -1, -1, 3])
        negative_rgb = tf.slice(negative_img, [0, 0, 0, 0], [-1, -1, -1, 3])
        tb_vis = tf.concat([anchor_rgb, positive_rgb, negative_rgb], axis=2)
        with tf.name_scope('triplet_vis'):
          tf.summary.image('anchor-pos-neg', tb_vis)

      optimizer = tf.train.AdamOptimizer(lr_app_pretrain, adam_beta1,
                                         adam_beta2)
      optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
      app_vars = utils.model_vars('appearance_net')[0]
      print('\n\n***************************************************')
      print('DBG: len(app_vars) = %d' % len(app_vars))
      for ii, v in enumerate(app_vars):
        print('%03d) %s' % (ii, str(v)))
      print('***************************************************\n\n')
      app_train_op = optimizer.minimize(triplet_loss, var_list=app_vars)
      return tf.estimator.EstimatorSpec(
          mode=mode, loss=triplet_loss,
          train_op=tf.group(app_train_op, op_increment_step))
    elif mode == tf.estimator.ModeKeys.PREDICT:
      imgs = features['anchor_img']
      embeddings = tf.squeeze(app_func(imgs), axis=[1, 2])
      app_vars = utils.model_vars('appearance_net')[0]
      tf.train.init_from_checkpoint(osp.join(opts.train_dir),
                                    {'appearance_net/': 'appearance_net/'})
      return tf.estimator.EstimatorSpec(mode=mode, predictions=embeddings)
    else:
      raise ValueError('Unsupported mode for the appearance model: ' + mode)

  return model_fn


def compute_dist_matrix(imageset_dir, dist_file_path, recompute_dist=False):
  if not recompute_dist and osp.exists(dist_file_path):
   print('*** Loading distance matrix from %s' % dist_file_path)
   with open(dist_file_path, 'rb') as f:
     dist_matrix = pickle.load(f)['dist_matrix']
     print('loaded a dist_matrix of shape: %s' % str(dist_matrix.shape))
     return dist_matrix
  else:
    images_paths = sorted(glob.glob(osp.join(imageset_dir, '*_reference.png')))
    dist_matrix = style_loss.compute_pairwise_style_loss_v2(images_paths)
    dist_dict = {'dist_matrix': dist_matrix}
    print('Saving distance matrix to %s' % dist_file_path)
    with open(dist_file_path, 'wb') as f:
      pickle.dump(dist_dict, f)
    return dist_matrix


def train_appearance(train_dir, imageset_dir, dist_file_path):
  batch_size = 8
  lr_app_pretrain = 0.001

  trainset_size = len(glob.glob(osp.join(imageset_dir, '*_reference.png')))
  resume_step = utils.load_global_step_from_checkpoint_dir(train_dir)
  if resume_step != 0:
    tf.logging.warning('DBG: resuming apperance pretraining at %d!' %
                       resume_step)
  model_fn = build_model_fn(batch_size, lr_app_pretrain)
  config = tf.estimator.RunConfig(
      save_summary_steps=50,
      save_checkpoints_steps=500,
      keep_checkpoint_max=5,
      log_step_count_steps=100)
  est = tf.estimator.Estimator(
      tf.contrib.estimator.replicate_model_fn(model_fn), train_dir,
      config, params={})
  # Get input function
  input_train_fn = lambda: get_tf_triplet_dataset_iter(
      imageset_dir, trainset_size, dist_file_path,
      batch_size=batch_size).get_next()
  print('Starting pretraining steps...')
  est.train(input_train_fn, steps=None, hooks=None)  # train indefinitely


def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')

  train_dir = opts.train_dir
  dataset_name = opts.dataset_name
  imageset_dir = opts.imageset_dir
  output_dir = opts.metadata_output_dir
  if not osp.exists(output_dir):
    os.makedirs(output_dir)
  dist_file_path = osp.join(output_dir, 'dist_%s.pckl' % dataset_name)
  compute_dist_matrix(imageset_dir, dist_file_path)
  train_appearance(train_dir, imageset_dir, dist_file_path)

if __name__ == '__main__':
  app.run(main)


================================================
FILE: segment_dataset.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Generate semantic segmentations
This module uses Xception model trained on ADE20K dataset to generate semantic
segmentation mask to any set of images.
"""

from absl import app
from absl import flags
from PIL import Image
import glob
import matplotlib.pyplot as plt
import numpy as np
import os
import os.path as osp
import shutil
import tensorflow as tf
import utils


def get_semantic_color_coding():
  """
  assigns the 30 (actually 29) semantic colors from cityscapes semantic mapping
  to selected classes from the ADE20K150 semantic classes.
  """
  # Below are the 30 cityscape colors (one is duplicate. so total is 29 not 30)
  colors = [
    (111, 74,  0),
    ( 81,  0, 81),
    (128, 64,128),
    (244, 35,232),
    (250,170,160),
    (230,150,140),
    ( 70, 70, 70),
    (102,102,156),
    (190,153,153),
    (180,165,180),
    (150,100,100),
    (150,120, 90),
    (153,153,153),
    # (153,153,153),
    (250,170, 30),
    (220,220,  0),
    (107,142, 35),
    (152,251,152),
    ( 70,130,180),
    (220, 20, 60),
    (255,  0,  0),
    (  0,  0,142),
    (  0,  0, 70),
    (  0, 60,100),
    (  0,  0, 90),
    (  0,  0,110),
    (  0, 80,100),
    (  0,  0,230),
    (119, 11, 32),
    (  0,  0,142)]
  k_num_ade20k_classes = 150
  # initially all 150 classes are mapped to a single color (last color idx: -1)
  # Some classes are to be assigned independent colors
  # semantic classes are 1-based (1 thru 150)
  semantic_to_color_idx = -1 * np.ones(k_num_ade20k_classes + 1, dtype=int)
  semantic_to_color_idx [1] = 0    # wall
  semantic_to_color_idx [2] = 1    # building;edifice
  semantic_to_color_idx [3] = 2    # sky
  semantic_to_color_idx [105] = 3  # fountain
  semantic_to_color_idx [27] = 4   # sea
  semantic_to_color_idx [60] = 5   # stairway;staircase 
  semantic_to_color_idx [5] = 6    # tree
  semantic_to_color_idx [12] = 7   # sidewalk;pavement 
  semantic_to_color_idx [4]  = 7   # floor;flooring
  semantic_to_color_idx [7]  = 7   # road;route
  semantic_to_color_idx [13] = 8   # people
  semantic_to_color_idx [18] = 9   # plant;flora;plant;life
  semantic_to_color_idx [17] = 10  # mountain;mount
  semantic_to_color_idx [20] = 11  # chair
  semantic_to_color_idx [6] = 12   # ceiling
  semantic_to_color_idx [22] = 13  # water
  semantic_to_color_idx [35] = 14  # rock;stone
  semantic_to_color_idx [14] = 15  # earth;ground
  semantic_to_color_idx [10] = 16  # grass
  semantic_to_color_idx [70] = 17  # bench
  semantic_to_color_idx [54] = 18  # stairs;steps
  semantic_to_color_idx [101] = 19 # poster
  semantic_to_color_idx [77] = 20  # boat
  semantic_to_color_idx [85] = 21  # tower
  semantic_to_color_idx [23] = 22  # painting;picture
  semantic_to_color_idx [88] = 23  # streetlight;stree;lamp
  semantic_to_color_idx [43] = 24  # column;pillar
  semantic_to_color_idx [9] = 25   # window;windowpane
  semantic_to_color_idx [15] = 26  # door;
  semantic_to_color_idx [133] = 27 # sculpture

  semantic_to_rgb = np.array(
    [colors[col_idx][:] for col_idx in semantic_to_color_idx])
  return semantic_to_rgb


def _apply_colors(seg_images_path, save_dir, idx_to_color):
  for i, img_path in enumerate(seg_images_path):
    print('processing img #%05d / %05d: %s' % (i, len(seg_images_path),
                                               osp.split(img_path)[1]))
    seg = np.array(Image.open(img_path))
    seg_rgb = np.zeros(seg.shape + (3,), dtype=np.uint8)
    for col_idx in range(len(idx_to_color)):
      if idx_to_color[col_idx][0] != -1:
        mask = seg == col_idx
        seg_rgb[mask, :] = idx_to_color[col_idx][:]

    parent_dir, filename = osp.split(img_path)
    basename, ext = osp.splitext(filename)
    out_filename = basename + "_rgb.png"
    out_filepath = osp.join(save_dir, out_filename)
    # Save rescaled segmentation image
    Image.fromarray(seg_rgb).save(out_filepath)


# The frozen xception model only segments 512x512 images. But it would be better
# to segment the full image instead!
def segment_images(images_path, xception_frozen_graph_path, save_dir,
                   crop_height=512, crop_width=512):
  if not osp.exists(xception_frozen_graph_path):
    raise OSError('Xception frozen graph not found at %s' %
                            xception_frozen_graph_path)
  with tf.gfile.GFile(xception_frozen_graph_path, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

  with tf.Graph().as_default() as graph:
    new_input = tf.placeholder(tf.uint8, [1, crop_height, crop_width, 3],
                               name="new_input")
    tf.import_graph_def(
      graph_def,
      input_map={"ImageTensor:0": new_input},
      return_elements=None,
      name="sem_seg",
      op_dict=None,
      producer_op_list=None
    )

  corrupted_dir = osp.join(save_dir, 'corrupted')
  if not osp.exists(corrupted_dir):
    os.makedirs(corrupted_dir)
  with tf.Session(graph=graph) as sess:
    for i, img_path in enumerate(images_path):
      print('Segmenting image %05d / %05d: %s' % (i + 1, len(images_path),
                                                  img_path))
      img = np.array(Image.open(img_path))
      if len(img.shape) == 2 or img.shape[2] != 3:
        print('Warning! corrupted image %s' % img_path)
        img_base_path = img_path[:-14]  # remove the '_reference.png' suffix
        srcs = sorted(glob.glob(img_base_path + '_*'))
        dest = unicode(corrupted_dir + '/.')
        for src in srcs:
          shutil.move(src, dest)
        continue
      img = utils.get_central_crop(img, crop_height=crop_height,
                             crop_width=crop_width)
      img = np.expand_dims(img, 0)  # convert to NHWC format
      seg = sess.run("sem_seg/SemanticPredictions:0", feed_dict={
          new_input: img})
      assert np.max(seg[:]) <= 255, 'segmentation image is not of type uint8!'
      seg = np.squeeze(np.uint8(seg))  # convert to uint8 and squeeze to WxH.
      parent_dir, filename = osp.split(img_path)
      basename, ext = osp.splitext(filename)
      basename = basename[:-10]  # remove the '_reference' suffix
      seg_filename = basename + "_seg.png"
      seg_filepath = osp.join(save_dir, seg_filename)
      # Save segmentation image
      Image.fromarray(seg).save(seg_filepath)

def segment_and_color_dataset(dataset_dir, xception_frozen_graph_path,
                              splits=None, resegment_images=True):
  if splits is None:
    imgs_dirs = [dataset_dir]
  else:
    imgs_dirs = [osp.join(dataset_dir, split) for split in splits]
  
  for cur_dir in imgs_dirs:
    imgs_file_pattern = osp.join(cur_dir, '*_reference.png')
    images_path = sorted(glob.glob(imgs_file_pattern))
    if resegment_images:
      segment_images(images_path, xception_frozen_graph_path, cur_dir,
                     crop_height=512, crop_width=512)

  idx_to_col = get_semantic_color_coding()

  for cur_dir in imgs_dirs:
    save_dir = cur_dir
    seg_file_pattern = osp.join(cur_dir, '*_seg.png')
    seg_imgs_paths = sorted(glob.glob(seg_file_pattern))
    _apply_colors(seg_imgs_paths, save_dir, idx_to_col)


================================================
FILE: staged_model.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Neural re-rerendering in the wild.

Implementation of the staged training pipeline.
"""

from options import FLAGS as opts
import losses
import networks
import tensorflow as tf
import utils


def create_computation_graph(x_in, x_gt, x_app=None, arch_type='pggan',
                             use_appearance=True):
  """Create the models and the losses.

  Args:
    x_in: 4D tensor, batch of conditional input images in NHWC format.
    x_gt: 2D tensor, batch ground-truth images in NHWC format.
    x_app: 4D tensor, batch of input appearance images.

  Returns:
    Dictionary of placeholders and TF graph functions.
  """
  # ---------------------------------------------------------------------------
  # Build models/networks
  # ---------------------------------------------------------------------------

  rerenderer = networks.RenderingModel(arch_type, use_appearance)
  app_enc = rerenderer.get_appearance_encoder()
  discriminator = networks.MultiScaleDiscriminator(
      'd_model', opts.appearance_nc, num_scales=3, nf=64, n_layers=3,
      get_fmaps=False)

  # ---------------------------------------------------------------------------
  # Forward pass
  # ---------------------------------------------------------------------------

  if opts.use_appearance:
    z_app, _, _ = app_enc(x_app)
  else:
    z_app = None

  y = rerenderer(x_in, z_app)

  # ---------------------------------------------------------------------------
  # Losses
  # ---------------------------------------------------------------------------

  w_loss_gan = opts.w_loss_gan
  w_loss_recon = opts.w_loss_vgg if opts.use_vgg_loss else opts.w_loss_l1

  # compute discriminator logits
  disc_real_featmaps = discriminator(x_gt, x_in)
  disc_fake_featmaps = discriminator(y, x_in)

  # discriminator loss
  loss_d_real = losses.multiscale_discriminator_loss(disc_real_featmaps, True)
  loss_d_fake = losses.multiscale_discriminator_loss(disc_fake_featmaps, False)
  loss_d = loss_d_real + loss_d_fake

  # generator loss
  loss_g_gan = losses.multiscale_discriminator_loss(disc_fake_featmaps, True)
  if opts.use_vgg_loss:
    vgg_layers = ['conv%d_2' % i for i in range(1, 6)]  # conv1 through conv5
    vgg_layer_weights = [1./32, 1./16, 1./8, 1./4, 1.]
    vgg_loss = losses.PerceptualLoss(y, x_gt, [256, 256, 3], vgg_layers,
                                     vgg_layer_weights)  # NOTE: shouldn't hardcode image size!
    loss_g_recon = vgg_loss()
  else:
    loss_g_recon = losses.L1_loss(y, x_gt)
  loss_g = w_loss_gan * loss_g_gan + w_loss_recon * loss_g_recon

  # ---------------------------------------------------------------------------
  # Tensorboard visualizations
  # ---------------------------------------------------------------------------

  x_in_render = tf.slice(x_in, [0, 0, 0, 0], [-1, -1, -1, 3])
  if opts.use_semantic:
    x_in_semantic = tf.slice(x_in, [0, 0, 0, 4], [-1, -1, -1, 3])
    tb_visualization = tf.concat([x_in_render, x_in_semantic, y, x_gt], axis=2)
  else:
    tb_visualization = tf.concat([x_in_render, y, x_gt], axis=2)
  tf.summary.image('rendered-semantic-generated-gt tuple', tb_visualization)

  # Show input appearance images
  if opts.use_appearance:
    x_app_rgb = tf.slice(x_app, [0, 0, 0, 0], [-1, -1, -1, 3])
    x_app_sem = tf.slice(x_app, [0, 0, 0, 7], [-1, -1, -1, -1])
    tb_app_visualization = tf.concat([x_app_rgb, x_app_sem], axis=2)
    tf.summary.image('input appearance image', tb_app_visualization)

  # Loss summaries
  with tf.name_scope('Discriminator_Loss'):
    tf.summary.scalar('D_real_loss', loss_d_real)
    tf.summary.scalar('D_fake_loss', loss_d_fake)
    tf.summary.scalar('D_total_loss', loss_d)
  with tf.name_scope('Generator_Loss'):
    tf.summary.scalar('G_GAN_loss', w_loss_gan * loss_g_gan)
    tf.summary.scalar('G_reconstruction_loss', w_loss_recon * loss_g_recon)
    tf.summary.scalar('G_total_loss', loss_g)

  # ---------------------------------------------------------------------------
  # Optimizers
  # ---------------------------------------------------------------------------

  def get_optimizer(lr, loss, var_list):
    optimizer = tf.train.AdamOptimizer(lr, opts.adam_beta1, opts.adam_beta2)
    # optimizer = tf.contrib.estimator.TowerOptimizer(optimizer)
    return optimizer.minimize(loss, var_list=var_list)

  # Training ops.
  update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
  with tf.control_dependencies(update_ops):
    with tf.variable_scope('optimizers'):
      d_vars = utils.model_vars('d_model')[0]
      g_vars_all = utils.model_vars('g_model')[0]
      train_d = [get_optimizer(opts.d_lr, loss_d, d_vars)]
      train_g = [get_optimizer(opts.g_lr, loss_g, g_vars_all)]

      train_app_encoder = []
      if opts.train_app_encoder:
        lr_app = opts.ez_lr
        app_enc_vars = utils.model_vars('appearance_net')[0]
        train_app_encoder.append(get_optimizer(lr_app, loss_g, app_enc_vars))

  ema = tf.train.ExponentialMovingAverage(decay=0.999)
  with tf.control_dependencies(train_g + train_app_encoder):
    inference_vars_all = g_vars_all
    if opts.use_appearance:
      app_enc_vars = utils.model_vars('appearance_net')[0]
      inference_vars_all += app_enc_vars
    ema_op = ema.apply(inference_vars_all)

  print('***************************************************')
  print('len(g_vars_all) = %d' % len(g_vars_all))
  for ii, v in enumerate(g_vars_all):
    print('%03d) %s' % (ii, str(v)))
  print('-------------------------------------------------------')
  print('len(d_vars) = %d' % len(d_vars))
  for ii, v in enumerate(d_vars):
    print('%03d) %s' % (ii, str(v)))
  if opts.train_app_encoder:
    print('-------------------------------------------------------')
    print('len(app_enc_vars) = %d' % len(app_enc_vars))
    for ii, v in enumerate(app_enc_vars):
      print('%03d) %s' % (ii, str(v)))
  print('***************************************************\n\n')

  return {
      'train_disc_op': tf.group(train_d),
      'train_renderer_op': ema_op,
      'total_loss_d': loss_d,
      'loss_d_real': loss_d_real,
      'loss_d_fake': loss_d_fake,
      'loss_g_gan': w_loss_gan * loss_g_gan,
      'loss_g_recon': w_loss_recon * loss_g_recon,
      'total_loss_g': loss_g}


================================================
FILE: style_loss.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from PIL import Image
from options import FLAGS as opts
import data
import layers
import numpy as np
import tensorflow as tf
import utils
import vgg16


def gram_matrix(layer):
  """Computes the gram_matrix for a batch of single vgg layer
  Input:
    layer: a batch of vgg activations for a single conv layer
  Returns:
    gram: [batch_sz x num_channels x num_channels]: a batch of gram matrices
  """
  batch_size, height, width, num_channels = layer.get_shape().as_list()
  features = tf.reshape(layer, [batch_size, height * width, num_channels])
  num_elements = tf.constant(num_channels * height * width, tf.float32)
  gram = tf.matmul(features, features, adjoint_a=True) / num_elements
  return gram


def compute_gram_matrices(
    images, vgg_layers=['conv1_2', 'conv2_2', 'conv3_2', 'conv4_2', 'conv5_2']):
  """Computes the gram matrix representation of a batch of images"""
  vgg_net = vgg16.Vgg16(opts.vgg16_path)
  vgg_acts = vgg_net.get_vgg_activations(images, vgg_layers)
  grams = [gram_matrix(layer) for layer in vgg_acts]
  return grams


def compute_pairwise_style_loss_v2(image_paths_list):
  grams_all = [None] * len(image_paths_list)
  crop_height, crop_width = opts.train_resolution, opts.train_resolution
  img_var = tf.placeholder(tf.float32, shape=[1, crop_height, crop_width, 3])
  vgg_layers = ['conv%d_2' % i for i in range(1, 6)]  # conv1 through conv5
  grams_ops = compute_gram_matrices(img_var, vgg_layers)
  with tf.Session() as sess:
    for ii, img_path in enumerate(image_paths_list):
      print('Computing gram matrices for image #%d' % (ii + 1))
      img = np.array(Image.open(img_path), dtype=np.float32)
      img = img * 2. / 255. - 1  # normalize image
      img = utils.get_central_crop(img, crop_height, crop_width)
      img = np.expand_dims(img, axis=0)
      grams_all[ii] = sess.run(grams_ops, feed_dict={img_var: img})
  print('Number of images = %d' % len(grams_all))
  print('Gram matrices per image:')
  for i in range(len(grams_all[0])):
    print('gram_matrix[%d].shape = %s' % (i, grams_all[0][i].shape))
  n_imgs = len(grams_all)
  dist_matrix = np.zeros((n_imgs, n_imgs))
  for i in range(n_imgs):
    print('Computing distances for image #%d' % i)
    for j in range(i + 1, n_imgs):
      loss_style = 0
      # Compute loss using all gram matrices from all layers
      for gram_i, gram_j in zip(grams_all[i], grams_all[j]):
        loss_style += np.mean((gram_i - gram_j) ** 2, axis=(1, 2))
      dist_matrix[i][j] = dist_matrix[j][i] = loss_style

  return dist_matrix


================================================
FILE: utils.py
================================================
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Utilities for GANs.

Basic functions such as generating sample grid, exporting to PNG, etc...
"""

import functools
import numpy as np
import os.path
import tensorflow as tf
import time


def crop_to_multiple(img, size_multiple=64):
  """ Crops the image so that its dimensions are multiples of size_multiple."""
  new_width = (img.shape[1] // size_multiple) * size_multiple
  new_height = (img.shape[0] // size_multiple) * size_multiple
  offset_x = (img.shape[1] - new_width) // 2
  offset_y = (img.shape[0] - new_height) // 2
  return img[offset_y:offset_y + new_height, offset_x:offset_x + new_width, :]


def get_central_crop(img, crop_height=512, crop_width=512):
  if len(img.shape) == 2:
    img = np.expand_dims(img, axis=2)
  assert len(img.shape) == 3, ('input image should be either a 2D or 3D matrix,'
                               ' but input was of shape %s' % str(img.shape))
  height, width, _ = img.shape
  assert height >= crop_height and width >= crop_width, ('input image cannot '
      'be smaller than the requested crop size')
  st_y = (height - crop_height) // 2
  st_x = (width - crop_width) // 2
  return np.squeeze(img[st_y : st_y + crop_height, st_x : st_x + crop_width, :])


def load_global_step_from_checkpoint_dir(checkpoint_dir):
  """Loads  the global step from the checkpoint directory.

  Args:
    checkpoint_dir: string, path to the checkpoint directory.

  Returns:
    int, the global step of the latest checkpoint or 0 if none was found.
  """
  try:
    checkpoint_reader = tf.train.NewCheckpointReader(
        tf.train.latest_checkpoint(checkpoint_dir))
    return checkpoint_reader.get_tensor(tf.GraphKeys.GLOBAL_STEP)
  except:
    return 0


def model_vars(prefix):
  """Return trainable variables matching a prefix.

  Args:
    prefix: string, the prefix variable names must match.

  Returns:
    a tuple (match, others) of TF variables, 'match' contains the matched
     variables and 'others' contains the remaining variables.
  """
  match, no_match = [], []
  for x in tf.trainable_variables():
    if x.name.startswith(prefix):
      match.append(x)
    else:
      no_match.append(x)
  return match, no_match


def to_png(x):
  """Convert a 3D tensor to png.

  Args:
    x: Tensor, 01C formatted input image.

  Returns:
    Tensor, 1D string representing the image in png format.
  """
  with tf.Graph().as_default():
    with tf.Session() as sess_temp:
      x = tf.constant(x)
      y = tf.image.encode_png(
          tf.cast(
              tf.clip_by_value(tf.round(127.5 + 127.5 * x), 0, 255), tf.uint8),
          compression=9)
      return sess_temp.run(y)


def images_to_grid(images):
  """Converts a grid of images (5D tensor) to a single image.

  Args:
    images: 5D tensor (count_y, count_x, height, width, colors), grid of images.

  Returns:
    a 3D tensor image of shape (count_y * height, count_x * width, colors).
  """
  ny, nx, h, w, c = images.shape
  images = images.transpose(0, 2, 1, 3, 4)
  images = images.reshape([ny * h, nx * w, c])
  return images


def save_images(image, output_dir, cur_nimg):
  """Saves images to disk.

  Saves a file called 'name.png' containing the latest samples from the
   generator and a file called 'name_123.png' where 123 is the KiB of trained
   images.

  Args:
    image: 3D numpy array (height, width, colors), the image to save.
    output_dir: string, the directory where to save the image.
    cur_nimg: int, current number of images seen by training.

  Returns:
    None
  """
  for name in ('name.png', 'name_%06d.png' % (cur_nimg >> 10)):
    with tf.gfile.Open(os.path.join(output_dir, name), 'wb') as f:
      f.write(image)


class HookReport(tf.train.SessionRunHook):
  """Custom reporting hook.

  Register your tensor scalars with HookReport.log_tensor(my_tensor, 'my_name').
  This hook will report their average values over report period argument
  provided to the constructed. The values are printed in the order the tensors
  were registered.

  Attributes:
    step: int, the current global step.
    active: bool, whether logging is active or disabled.
  """
  _REPORT_KEY = 'report'
  _TENSOR_NAMES = {}

  def __init__(self, period, batch_size):
    self.step = 0
    self.active = True
    self._period = period // batch_size
    self._batch_size = batch_size
    self._sums = np.array([])
    self._count = 0
    self._nimgs_per_cycle = 0
    self._step_ratio = 0
    self._start = time.time()
    self._nimgs = 0
    self._batch_size = batch_size

  def disable(self):
    parent = self

    class Disabler(object):

      def __enter__(self):
        parent.active = False
        return parent

      def __exit__(self, exc_type, exc_val, exc_tb):
        parent.active = True

    return Disabler()

  def begin(self):
    self.active = True
    self._count = 0
    self._nimgs_per_cycle = 0
    self._start = time.time()

  def before_run(self, run_context):
    if not self.active:
      return
    del run_context
    fetches = tf.get_collection(self._REPORT_KEY)
    return tf.train.SessionRunArgs(fetches)

  def after_run(self, run_context, run_values):
    if not self.active:
      return
    del run_context
    results = run_values.results
    # Note: sometimes the returned step is incorrect (off by one) for some
    # unknown reason.
    self.step = results[-1] + 1
    self._count += 1
    self._nimgs_per_cycle += self._batch_size
    self._nimgs += self._batch_size

    if not self._sums.size:
      self._sums = np.array(results[:-1], 'd')
    else:
      self._sums += np.array(results[:-1], 'd')

    if self.step // self._period != self._step_ratio:
      fetches = tf.get_collection(self._REPORT_KEY)[:-1]
      stats = '  '.join('%s=% .2f' % (self._TENSOR_NAMES[tensor],
                                      value / self._count)
                        for tensor, value in zip(fetches, self._sums))
      stop = time.time()
      tf.logging.info('step=%d, kimg=%d  %s  [%.2f img/s]' %
                      (self.step, ((self.step * self._batch_size) >> 10),
                       stats, self._nimgs_per_cycle / (stop - self._start)))
      self._step_ratio = self.step // self._period
      self._start = stop
      self._sums *= 0
      self._count = 0
      self._nimgs_per_cycle = 0

  def end(self, session=None):
    del session

  @classmethod
  def log_tensor(cls, tensor, name):
    """Adds a tensor to be reported by the hook.

    Args:
      tensor: `tensor scalar`, a value to report.
      name: string, the name to give the value in the report.

    Returns:
      None.
    """
    cls._TENSOR_NAMES[tensor] = name
    tf.add_to_collection(cls._REPORT_KEY, tensor)

Download .txt

gitextract_k0balaqs/

├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── data.py
├── dataset_utils.py
├── evaluate_quantitative_metrics.py
├── layers.py
├── losses.py
├── networks.py
├── neural_rerendering.py
├── options.py
├── pretrain_appearance.py
├── segment_dataset.py
├── staged_model.py
├── style_loss.py
└── utils.py

Download .txt

SYMBOL INDEX (130 symbols across 13 files)

FILE: data.py
  function provide_data (line 24) | def provide_data(dataset_name='', parent_dir='', batch_size=8, subset=None,
  function _parser_rendered_dataset (line 40) | def _parser_rendered_dataset(
  function multi_input_fn_record (line 130) | def multi_input_fn_record(

FILE: dataset_utils.py
  class AlignedRenderedDataset (line 38) | class AlignedRenderedDataset(object):
    method __init__ (line 39) | def __init__(self, rendered_filepattern, use_semantic_map=True):
    method __iter__ (line 52) | def __iter__(self):
    method __next__ (line 55) | def __next__(self):
    method next (line 58) | def next(self):
  function filter_out_sparse_renders (line 114) | def filter_out_sparse_renders(dataset_dir, splits, ratio_threshold=0.15):
  function _to_example (line 153) | def _to_example(dictionary):
  function _generate_tfrecord_dataset (line 172) | def _generate_tfrecord_dataset(generator,
  function export_aligned_dataset_to_tfrecord (line 193) | def export_aligned_dataset_to_tfrecord(
  function main (line 218) | def main(argv):

FILE: evaluate_quantitative_metrics.py
  function _extract_real_and_fake_from_concatenated_output (line 34) | def _extract_real_and_fake_from_concatenated_output(val_set_out_dir):
  function compute_l1_loss_metric (line 56) | def compute_l1_loss_metric(image_set1_paths, image_set2_paths):
  function compute_psnr_loss_metric (line 76) | def compute_psnr_loss_metric(image_set1_paths, image_set2_paths):
  function evaluate_experiment (line 96) | def evaluate_experiment(val_set_out_dir, title='experiment',
  function main (line 114) | def main(argv):

FILE: layers.py
  class LayerInstanceNorm (line 20) | class LayerInstanceNorm(object):
    method __init__ (line 22) | def __init__(self, scope_suffix='instance_norm'):
    method __call__ (line 26) | def __call__(self, x):
  function layer_norm (line 32) | def layer_norm(x, scope='layer_norm'):
  function pixel_norm (line 36) | def pixel_norm(x):
  function global_avg_pooling (line 48) | def global_avg_pooling(x):
  class FullyConnected (line 52) | class FullyConnected(object):
    method __init__ (line 54) | def __init__(self, n_out_units, scope_suffix='FC'):
    method __call__ (line 64) | def __call__(self, x):
  function init_he_scale (line 69) | def init_he_scale(shape, slope=1.0):
  class LayerConv (line 83) | class LayerConv(object):
    method __init__ (line 86) | def __init__(self,
    method __call__ (line 129) | def __call__(self, x):
  class LayerTransposedConv (line 151) | class LayerTransposedConv(object):
    method __init__ (line 154) | def __init__(self,
    method __call__ (line 197) | def __call__(self, x):
  class ResBlock (line 210) | class ResBlock(object):
    method __init__ (line 211) | def __init__(self,
    method __call__ (line 242) | def __call__(self, x_init):
  class BasicBlock (line 250) | class BasicBlock(object):
    method __init__ (line 251) | def __init__(self,
    method __call__ (line 287) | def __call__(self, x_init):
  class LayerDense (line 298) | class LayerDense(object):
    method __init__ (line 301) | def __init__(self, name, n, use_scaling=False, relu_slope=1.):
    method __call__ (line 322) | def __call__(self, x):
  class LayerPipe (line 327) | class LayerPipe(object):
    method __init__ (line 330) | def __init__(self, functions):
    method __call__ (line 338) | def __call__(self, x, **kwargs):
  function downscale (line 346) | def downscale(x, n=2):
  function upscale (line 361) | def upscale(x, n):
  function tile_and_concatenate (line 378) | def tile_and_concatenate(x, z, n_z):
  function minibatch_mean_variance (line 385) | def minibatch_mean_variance(x):
  function scalar_concat (line 402) | def scalar_concat(x, scalar):

FILE: losses.py
  function gradient_penalty_loss (line 22) | def gradient_penalty_loss(y_xy, xy, iwass_target=1, iwass_lambda=10):
  function KL_loss (line 30) | def KL_loss(mean, logvar):
  function l2_regularize (line 36) | def l2_regularize(x):
  function L1_loss (line 40) | def L1_loss(x, y):
  class PerceptualLoss (line 44) | class PerceptualLoss:
    method __init__ (line 45) | def __init__(self, x, y, image_shape, layers, w_layers, w_act=0.1):
    method __call__ (line 65) | def __call__(self):
  function lsgan_appearance_E_loss (line 69) | def lsgan_appearance_E_loss(disc_response):
  function lsgan_loss (line 76) | def lsgan_loss(disc_response, is_real):
  function multiscale_discriminator_loss (line 84) | def multiscale_discriminator_loss(Ds_responses, is_real):

FILE: networks.py
  class RenderingModel (line 21) | class RenderingModel(object):
    method __init__ (line 23) | def __init__(self, model_name, use_appearance=True):
    method __call__ (line 30) | def __call__(self, x_in, z_app=None):
    method get_appearance_encoder (line 33) | def get_appearance_encoder(self):
    method get_generator (line 36) | def get_generator(self):
    method get_content_encoder (line 39) | def get_content_encoder(self):
  class ModelPGGAN (line 49) | class ModelPGGAN(RenderingModel):
    method __init__ (line 51) | def __init__(self, use_appearance=True):
    method __call__ (line 61) | def __call__(self, x_in, z_app=None):
    method get_appearance_encoder (line 65) | def get_appearance_encoder(self):
    method get_generator (line 68) | def get_generator(self):
    method get_content_encoder (line 71) | def get_content_encoder(self):
  class PatchGANDiscriminator (line 75) | class PatchGANDiscriminator(object):
    method __init__ (line 77) | def __init__(self, name_scope, input_nc, nf=64, n_layers=3, get_fmaps=...
    method __call__ (line 133) | def __call__(self, x, x_cond=None):
  class MultiScaleDiscriminator (line 158) | class MultiScaleDiscriminator(object):
    method __init__ (line 160) | def __init__(self, name_scope, input_nc, num_scales=3, nf=64, n_layers=3,
    method __call__ (line 171) | def __call__(self, x, x_cond=None, params=None):
  class GeneratorPGGAN (line 184) | class GeneratorPGGAN(object):
    method __init__ (line 185) | def __init__(self, appearance_vec_size=8, use_scaling=True,
    method __call__ (line 287) | def __call__(self, x, appearance_embedding=None, encoder_fmaps=None):
  class DRITAppearanceEncoderConcat (line 328) | class DRITAppearanceEncoderConcat(object):
    method __init__ (line 330) | def __init__(self, name_scope, input_nc, normalize_encoder):
    method __call__ (line 361) | def __call__(self, x):

FILE: neural_rerendering.py
  function build_model_fn (line 35) | def build_model_fn(use_exponential_moving_average=True):
  function make_sample_grid_and_save (line 147) | def make_sample_grid_and_save(est, dataset_name, dataset_parent_dir, gri...
  function visualize_image_sequence (line 187) | def visualize_image_sequence(est, dataset_name, dataset_parent_dir,
  function train (line 222) | def train(dataset_name, dataset_parent_dir, load_pretrained_app_encoder,
  function _build_inference_estimator (line 300) | def _build_inference_estimator(model_dir):
  function evaluate_sequence (line 306) | def evaluate_sequence(dataset_name, dataset_parent_dir, virtual_seq_name,
  function evaluate_image_set (line 315) | def evaluate_image_set(dataset_name, dataset_parent_dir, subset_suffix,
  function _load_and_concatenate_image_channels (line 341) | def _load_and_concatenate_image_channels(rgb_path=None, rendered_path=None,
  function infer_dir (line 378) | def infer_dir(model_dir, input_dir, output_dir):
  function joint_interpolation (line 414) | def joint_interpolation(model_dir, app_input_dir, st_app_basename,
  function interpolate_appearance (line 491) | def interpolate_appearance(model_dir, input_dir, target_img_basename,
  function main (line 563) | def main(argv):

FILE: options.py
  function validate_options (line 175) | def validate_options():
  function list_options (line 191) | def list_options():

FILE: pretrain_appearance.py
  function _load_and_concatenate_image_channels (line 30) | def _load_and_concatenate_image_channels(
  function read_single_appearance_input (line 62) | def read_single_appearance_input(rgb_img_path):
  function get_triplet_input_fn (line 73) | def get_triplet_input_fn(dataset_path, dist_file_path=None, k_max_neares...
  function get_tf_triplet_dataset_iter (line 116) | def get_tf_triplet_dataset_iter(
  function build_model_fn (line 146) | def build_model_fn(batch_size, lr_app_pretrain=0.0001, adam_beta1=0.0,
  function compute_dist_matrix (line 214) | def compute_dist_matrix(imageset_dir, dist_file_path, recompute_dist=Fal...
  function train_appearance (line 231) | def train_appearance(train_dir, imageset_dir, dist_file_path):
  function main (line 257) | def main(argv):

FILE: segment_dataset.py
  function get_semantic_color_coding (line 33) | def get_semantic_color_coding():
  function _apply_colors (line 111) | def _apply_colors(seg_images_path, save_dir, idx_to_color):
  function segment_images (line 132) | def segment_images(images_path, xception_frozen_graph_path, save_dir,
  function segment_and_color_dataset (line 184) | def segment_and_color_dataset(dataset_dir, xception_frozen_graph_path,

FILE: staged_model.py
  function create_computation_graph (line 27) | def create_computation_graph(x_in, x_gt, x_app=None, arch_type='pggan',

FILE: style_loss.py
  function gram_matrix (line 25) | def gram_matrix(layer):
  function compute_gram_matrices (line 39) | def compute_gram_matrices(
  function compute_pairwise_style_loss_v2 (line 48) | def compute_pairwise_style_loss_v2(image_paths_list):

FILE: utils.py
  function crop_to_multiple (line 27) | def crop_to_multiple(img, size_multiple=64):
  function get_central_crop (line 36) | def get_central_crop(img, crop_height=512, crop_width=512):
  function load_global_step_from_checkpoint_dir (line 49) | def load_global_step_from_checkpoint_dir(checkpoint_dir):
  function model_vars (line 66) | def model_vars(prefix):
  function to_png (line 85) | def to_png(x):
  function images_to_grid (line 104) | def images_to_grid(images):
  function save_images (line 119) | def save_images(image, output_dir, cur_nimg):
  class HookReport (line 139) | class HookReport(tf.train.SessionRunHook):
    method __init__ (line 154) | def __init__(self, period, batch_size):
    method disable (line 167) | def disable(self):
    method begin (line 181) | def begin(self):
    method before_run (line 187) | def before_run(self, run_context):
    method after_run (line 194) | def after_run(self, run_context, run_values):
    method end (line 226) | def end(self, session=None):
    method log_tensor (line 230) | def log_tensor(cls, tensor, name):

Download .json

Condensed preview — 16 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (153K chars).

[
  {
    "path": "CONTRIBUTING.md",
    "chars": 1101,
    "preview": "# How to Contribute\n\nWe'd love to accept your patches and contributions to this project. There are\njust a few small guid"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "\n                                 Apache License\n                           Version 2.0, January 2004\n                  "
  },
  {
    "path": "README.md",
    "chars": 8576,
    "preview": "# Neural Rerendering in the Wild\nMoustafa Meshry<sup>1</sup>,\n[Dan B Goldman](http://www.danbgoldman.com/)<sup>2</sup>,\n"
  },
  {
    "path": "data.py",
    "chars": 7169,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "dataset_utils.py",
    "chars": 8980,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "evaluate_quantitative_metrics.py",
    "chars": 4364,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "layers.py",
    "chars": 12473,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "losses.py",
    "chars": 2832,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "networks.py",
    "chars": 13518,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "neural_rerendering.py",
    "chars": 26943,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "options.py",
    "chars": 13725,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "pretrain_appearance.py",
    "chars": 11791,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "segment_dataset.py",
    "chars": 7662,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "staged_model.py",
    "chars": 6809,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "style_loss.py",
    "chars": 3110,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  },
  {
    "path": "utils.py",
    "chars": 7238,
    "preview": "# Copyright 2019 Google LLC\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this f"
  }
]

About this extraction

This page contains the full source code of the google/neural_rerendering_in_the_wild GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 16 files (144.2 KB), approximately 36.6k tokens, and a symbol index with 130 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo