[
  {
    "path": ".gitignore",
    "content": "## Makefile\nMakefile\n\n## Core latex/pdflatex auxiliary files:\n*.aux\n*.lof\n*.log\n*.lot\n*.fls\n*.out\n*.toc\n*.fmt\n\n## Intermediate documents:\n*.dvi\n*-converted-to.*\n# these rules might exclude image files for figures etc.\n# *.ps\n# *.eps\n*.pdf\n\n## Bibliography auxiliary files (bibtex/biblatex/biber):\n*.bbl\n*.bcf\n*.blg\n*-blx.aux\n*-blx.bib\n*.brf\n*.run.xml\n\n## Build tool auxiliary files:\n*.fdb_latexmk\n*.synctex\n*.synctex.gz\n*.synctex.gz(busy)\n*.pdfsync\n\n## Auxiliary and intermediate files from other packages:\n# algorithms\n*.alg\n*.loa\n\n# achemso\nacs-*.bib\n\n# amsthm\n*.thm\n\n# beamer\n*.nav\n*.snm\n*.vrb\n\n# cprotect\n*.cpt\n\n#(e)ledmac/(e)ledpar\n*.end\n*.[1-9]\n*.[1-9][0-9]\n*.[1-9][0-9][0-9]\n*.[1-9]R\n*.[1-9][0-9]R\n*.[1-9][0-9][0-9]R\n*.eledsec[1-9]\n*.eledsec[1-9]R\n*.eledsec[1-9][0-9]\n*.eledsec[1-9][0-9]R\n*.eledsec[1-9][0-9][0-9]\n*.eledsec[1-9][0-9][0-9]R\n\n# glossaries\n*.acn\n*.acr\n*.glg\n*.glo\n*.gls\n\n# gnuplottex\n*-gnuplottex-*\n\n# hyperref\n*.brf\n\n# knitr\n*-concordance.tex\n*.tikz\n*-tikzDictionary\n\n# listings\n*.lol\n\n# makeidx\n*.idx\n*.ilg\n*.ind\n*.ist\n\n# minitoc\n*.maf\n*.mtc\n*.mtc[0-9]\n*.mtc[1-9][0-9]\n\n# minted\n_minted*\n*.pyg\n\n# morewrites\n*.mw\n\n# mylatexformat\n*.fmt\n\n# nomencl\n*.nlo\n\n# sagetex\n*.sagetex.sage\n*.sagetex.py\n*.sagetex.scmd\n\n# sympy\n*.sout\n*.sympy\nsympy-plots-for-*.tex/\n\n# pdfcomment\n*.upa\n*.upb\n\n#pythontex\n*.pytxcode\npythontex-files-*/\n\n# Texpad\n.texpadtmp\n\n# TikZ & PGF\n*.dpth\n*.md5\n*.auxlock\n\n# todonotes\n*.tdo\n\n# xindy\n*.xdy\n\n# xypic precompiled matrices\n*.xyc\n\n# WinEdt\n*.bak\n*.sav\n\n# endfloat\n*.ttt\n*.fff\n\n# Latexian\nTSWLatexianTemp*\n"
  },
  {
    "path": "LICENSE",
    "content": "The MIT License (MIT)\n\nCopyright (c) 2016 \n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Convolution arithmetic\n\nA technical report on convolution arithmetic in the context of deep learning.\n\nThe code and the images of this tutorial are free to use as regulated by the \nlicence and subject to proper attribution:\n\n* \\[1\\] Vincent Dumoulin, Francesco Visin - [A guide to convolution arithmetic\n  for deep learning](https://arxiv.org/abs/1603.07285)\n  ([BibTeX](https://gist.github.com/fvisin/165ca9935392fa9600a6c94664a01214))\n\n## Convolution animations\n\n_N.B.: Blue maps are inputs, and cyan maps are outputs._\n\n<table style=\"width:100%; table-layout:fixed;\">\n  <tr>\n    <td><img width=\"150px\" src=\"gif/no_padding_no_strides.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/arbitrary_padding_no_strides.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/same_padding_no_strides.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/full_padding_no_strides.gif\"></td>\n  </tr>\n  <tr>\n    <td>No padding, no strides</td>\n    <td>Arbitrary padding, no strides</td>\n    <td>Half padding, no strides</td>\n    <td>Full padding, no strides</td>\n  </tr>\n  <tr>\n    <td><img width=\"150px\" src=\"gif/no_padding_strides.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/padding_strides.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/padding_strides_odd.gif\"></td>\n    <td></td>\n  </tr>\n  <tr>\n    <td>No padding, strides</td>\n    <td>Padding, strides</td>\n    <td>Padding, strides (odd)</td>\n    <td></td>\n  </tr>\n</table>\n\n## Transposed convolution animations\n\n_N.B.: Blue maps are inputs, and cyan maps are outputs._\n\n<table style=\"width:100%; table-layout:fixed;\">\n  <tr>\n    <td><img width=\"150px\" src=\"gif/no_padding_no_strides_transposed.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/arbitrary_padding_no_strides_transposed.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/same_padding_no_strides_transposed.gif\"></td>\n    <td><img width=\"150px\" 
src=\"gif/full_padding_no_strides_transposed.gif\"></td>\n  </tr>\n  <tr>\n    <td>No padding, no strides, transposed</td>\n    <td>Arbitrary padding, no strides, transposed</td>\n    <td>Half padding, no strides, transposed</td>\n    <td>Full padding, no strides, transposed</td>\n  </tr>\n  <tr>\n    <td><img width=\"150px\" src=\"gif/no_padding_strides_transposed.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/padding_strides_transposed.gif\"></td>\n    <td><img width=\"150px\" src=\"gif/padding_strides_odd_transposed.gif\"></td>\n    <td></td>\n  </tr>\n  <tr>\n    <td>No padding, strides, transposed</td>\n    <td>Padding, strides, transposed</td>\n    <td>Padding, strides, transposed (odd)</td>\n    <td></td>\n  </tr>\n</table>\n\n## Dilated convolution animations\n\n_N.B.: Blue maps are inputs, and cyan maps are outputs._\n\n<table style=\"width:25%; table-layout:fixed;\">\n  <tr>\n    <td><img width=\"150px\" src=\"gif/dilation.gif\"></td>\n  </tr>\n  <tr>\n    <td>No padding, no stride, dilation</td>\n  </tr>\n</table>\n\n## Generating the Makefile\n\nFrom the repository's root directory:\n\n``` bash\n$ ./bin/generate_makefile\n```\n\n## Generating the animations\n\nFrom the repository's root directory:\n\n``` bash\n$ make all_animations\n```\n\nThe animations will be output to the `gif` directory. Individual animation steps\nwill be output in PDF format to the `pdf` directory and in PNG format to the\n`png` directory.\n\n## Compiling the document\n\nFrom the repository's root directory:\n\n``` bash\n$ make\n```\n"
  },
  {
    "path": "bibliography.bib",
    "content": "@inproceedings{le1997reading,\n  title={Reading checks with multilayer graph transformer networks},\n  author={Le Cun, Yann and Bottou, L{\\'e}on and Bengio, Yoshua},\n  booktitle={Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on},\n  volume={1},\n  pages={151--154},\n  year={1997},\n  organization={IEEE}\n}\n\n@inproceedings{bergstra2010theano,\n  title={Theano: A CPU and GPU math compiler in Python},\n  author={Bergstra, James and Breuleux, Olivier and Bastien, Fr{\\'e}d{\\'e}ric and Lamblin, Pascal and Pascanu, Razvan and Desjardins, Guillaume and Turian, Joseph and Warde-Farley, David and Bengio, Yoshua},\n  booktitle={Proc. 9th Python in Science Conf},\n  pages={1--7},\n  year={2010}\n}\n\n@inproceedings{collobert2011torch7,\n  title={Torch7: A matlab-like environment for machine learning},\n  author={Collobert, Ronan and Kavukcuoglu, Koray and Farabet, Cl{\\'e}ment},\n  booktitle={BigLearn, NIPS Workshop},\n  number={EPFL-CONF-192376},\n  year={2011}\n}\n\n@inproceedings{zeiler2011adaptive,\n  title={Adaptive deconvolutional networks for mid and high level feature learning},\n  author={Zeiler, Matthew D and Taylor, Graham W and Fergus, Rob},\n  booktitle={Computer Vision (ICCV), 2011 IEEE International Conference on},\n  pages={2018--2025},\n  year={2011},\n  organization={IEEE}\n}\n\n@inproceedings{krizhevsky2012imagenet,\n  title={Imagenet classification with deep convolutional neural networks},\n  author={Krizhevsky, Alex and Sutskever, Ilya and Hinton, Geoffrey E},\n  booktitle={Advances in neural information processing systems},\n  pages={1097--1105},\n  year={2012}\n}\n\n@article{bastien2012theano,\n  title={Theano: new features and speed improvements},\n  author={Bastien, Fr{\\'e}d{\\'e}ric and Lamblin, Pascal and Pascanu, Razvan and Bergstra, James and Goodfellow, Ian and Bergeron, Arnaud and Bouchard, Nicolas and Warde-Farley, David and Bengio, Yoshua},\n  journal={arXiv preprint 
arXiv:1211.5590},\n  year={2012}\n}\n\n@inproceedings{jia2014caffe,\n  title={Caffe: Convolutional architecture for fast feature embedding},\n  author={Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},\n  booktitle={Proceedings of the ACM International Conference on Multimedia},\n  pages={675--678},\n  year={2014},\n  organization={ACM}\n}\n\n@incollection{zeiler2014visualizing,\n  title={Visualizing and understanding convolutional networks},\n  author={Zeiler, Matthew D and Fergus, Rob},\n  booktitle={Computer vision--ECCV 2014},\n  pages={818--833},\n  year={2014},\n  publisher={Springer}\n}\n\n@article{chen2014semantic,\n  title={Semantic image segmentation with deep convolutional nets and fully connected crfs},\n  author={Chen, Liang-Chieh and Papandreou, George and Kokkinos, Iasonas and Murphy, Kevin and Yuille, Alan L},\n  journal={arXiv preprint arXiv:1412.7062},\n  year={2014}\n}\n\n@article{abaditensorflow,\n  title={TensorFlow: Large-scale machine learning on heterogeneous systems},\n  author={Abadi, Mart{\\i}n and Agarwal, Ashish and Barham, Paul and Brevdo, Eugene and Chen, Zhifeng and Citro, Craig and Corrado, Greg S and Davis, Andy and Dean, Jeffrey and Devin, Matthieu and others},\n  journal={Software available from tensorflow.org},\n  year={2015}\n}\n\n@inproceedings{long2015fully,\n  title={Fully convolutional networks for semantic segmentation},\n  author={Long, Jonathan and Shelhamer, Evan and Darrell, Trevor},\n  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},\n  pages={3431--3440},\n  year={2015}\n}\n\n@article{radford2015unsupervised,\n  title={Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks},\n  author={Radford, Alec and Metz, Luke and Chintala, Soumith},\n  journal={arXiv preprint arXiv:1511.06434},\n  
year={2015}\n}\n\n@unpublished{Goodfellow-et-al-2016-Book,\n    title={Deep Learning},\n    author={Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron},\n    note={Book in preparation for MIT Press},\n    url={http://goodfeli.github.io/dlbook/},\n    year={2016}\n}\n\n@article{im2016generating,\n  title={Generating images with recurrent adversarial networks},\n  author={Im, Daniel Jiwoong and Kim, Chris Dongjoo and Jiang, Hui and Memisevic, Roland},\n  journal={arXiv preprint arXiv:1602.05110},\n  year={2016}\n}\n\n@article{visin15,\n  author    = {Francesco Visin and\n               Kyle Kastner and\n               Aaron C. Courville and\n               Yoshua Bengio and\n               Matteo Matteucci and\n               KyungHyun Cho},\n  title     = {ReSeg: {A} Recurrent Neural Network for Object Segmentation},\n  year      = {2015},\n  url       = {http://arxiv.org/abs/1511.07053},\n}\n\n@article{yu2015multi,\n  title={Multi-scale context aggregation by dilated convolutions},\n  author={Yu, Fisher and Koltun, Vladlen},\n  journal={arXiv preprint arXiv:1511.07122},\n  year={2015}\n}\n\n@inproceedings {boureau-cvpr-10,\n    title = \"Learning Mid-Level Features for Recognition\",\n    author = \"Boureau, {Y-Lan} and Bach, Francis and LeCun, Yann and Ponce, Jean\",\n    booktitle = \"Proc. International Conference on Computer Vision and Pattern Recognition (CVPR'10)\",\n    publisher = \"IEEE\",   \n    year = \"2010\"\n}\n\n@inproceedings {boureau-icml-10,\n    title = \"A theoretical analysis of feature pooling in vision algorithms\",\n    author = \"Boureau, {Y-Lan} and Ponce, Jean and LeCun, Yann\",\n    booktitle = \"Proc. International Conference on Machine learning (ICML'10)\",\n    year = \"2010\"\n}\n\n@inproceedings {boureau-iccv-11,\n    title = \"Ask the locals: multi-way local pooling for image recognition\",\n    author = \"Boureau, {Y-Lan} and {Le Roux}, Nicolas and Bach, Francis and Ponce, Jean and LeCun, Yann\",\n    booktitle = \"Proc. 
International Conference on Computer Vision (ICCV'11)\",\n    publisher = \"IEEE\",   \n    year = \"2011\"\n}\n \n@InProceedings{ICML2011Saxe_551,\n  author =    {Andrew Saxe and Pang Wei Koh and Zhenghao Chen and Maneesh Bhand and Bipin Suresh and Andrew Ng},\n  title =     {On Random Weights and Unsupervised Feature Learning },\n  booktitle = {Proceedings of the 28th International Conference on Machine Learning (ICML-11)},\n  series =    {ICML '11},\n  year =      {2011},\n  editor =    {Lise Getoor and Tobias Scheffer},\n  location =  {Bellevue, Washington, USA},\n  isbn =      {978-1-4503-0619-5},\n  month =     {June},\n  publisher = {ACM},\n  address =   {New York, NY, USA},\n  pages=      {1089--1096},\n}\n\n@article{oord2016wavenet,\n  title={Wavenet: A generative model for raw audio},\n  author={Oord, Aaron van den and Dieleman, Sander and Zen, Heiga and Simonyan,\n          Karen and Vinyals, Oriol and Graves, Alex and Kalchbrenner, Nal and\n          Senior, Andrew and Kavukcuoglu, Koray},\n  journal={arXiv preprint arXiv:1609.03499},\n  year={2016}\n}\n"
  },
  {
    "path": "bin/generate_makefile",
    "content": "#!/usr/bin/env python\nfrom six import iteritems\nfrom six.moves import range\n\narithmetic_files = ('bin/produce_figure templates/arithmetic_figure.txt '\n                    'templates/unit.txt')\nnumerical_files = 'bin/produce_figure templates/numerical_figure.txt'\n\nanimations = (\n    ('no_padding_no_strides',\n     ('arithmetic', 4, 2, 0, 3, 1, 1, 'convolution', False)),\n    ('no_padding_no_strides_transposed',\n     ('arithmetic', 4, 2, 0, 3, 1, 1, 'convolution', True)),\n    ('arbitrary_padding_no_strides',\n     ('arithmetic', 5, 6, 2, 4, 1, 1, 'convolution', False)),\n    ('arbitrary_padding_no_strides_transposed',\n     ('arithmetic', 5, 6, 2, 4, 1, 1, 'convolution', True)),\n    ('same_padding_no_strides',\n     ('arithmetic', 5, 5, 1, 3, 1, 1, 'convolution', False)),\n    ('same_padding_no_strides_transposed',\n     ('arithmetic', 5, 5, 1, 3, 1, 1, 'convolution', True)),\n    ('full_padding_no_strides',\n     ('arithmetic', 5, 7, 2, 3, 1, 1, 'convolution', False)),\n    ('full_padding_no_strides_transposed',\n     ('arithmetic', 5, 7, 2, 3, 1, 1, 'convolution', True)),\n    ('no_padding_strides',\n     ('arithmetic', 5, 2, 0, 3, 2, 1, 'convolution', False)),\n    ('no_padding_strides_transposed',\n     ('arithmetic', 5, 2, 0, 3, 2, 1, 'convolution', True)),\n    ('padding_strides',\n     ('arithmetic', 5, 3, 1, 3, 2, 1, 'convolution', False)),\n    ('padding_strides_transposed',\n     ('arithmetic', 5, 3, 1, 3, 2, 1, 'convolution', True)),\n    ('padding_strides_odd',\n     ('arithmetic', 6, 3, 1, 3, 2, 1, 'convolution', False)),\n    ('padding_strides_odd_transposed',\n     ('arithmetic', 6, 3, 1, 3, 2, 1, 'convolution', True)),\n    ('dilation',\n     ('arithmetic', 7, 3, 0, 3, 1, 2, 'convolution', False)),\n    ('numerical_no_padding_no_strides',\n     ('numerical', 5, 3, 0, 3, 1, 1, 'convolution', False)),\n    ('numerical_padding_strides',\n     ('numerical', 5, 3, 1, 3, 2, 1, 'convolution', False)),\n    
('numerical_average_pooling',\n     ('numerical', 5, 3, 0, 3, 1, 1, 'average', False)),\n    ('numerical_max_pooling',\n     ('numerical', 5, 3, 0, 3, 1, 1, 'max', False)),\n)\n\nfields = ('type', 'input-size', 'output-size', 'padding', 'kernel-size',\n          'stride', 'dilation', 'mode', 'transposed')\nanimations = dict([(name, dict(zip(fields, config)))\n                   for name, config in animations])\n\n\ndef make_header():\n    return ('.PHONY : all_animations\\nall_animations : {}\\n\\n'.format(\n                ' '.join(['gif/{}.gif'.format(name)\n                          for name in animations.keys()])) +\n            '.SECONDARY : \\n')\n\n\ndef make_report_section():\n\n    return ('conv_arithmetic.pdf : export BSTINPUTS=$BSTINPUTS:./natbib\\n'\n\t\t\t'conv_arithmetic.pdf : conv_arithmetic.tex\\n'\n            '\\tpdflatex conv_arithmetic\\n'\n            '\\tpdflatex conv_arithmetic\\n'\n            '\\tbibtex conv_arithmetic\\n'\n            '\\tpdflatex conv_arithmetic\\n'\n            '\\tpdflatex conv_arithmetic\\n\\n'\n            '.PHONY : clean\\n'\n            'clean : \\n'\n            '\\trm -f conv_arithmetic.{aux,bbl,blg,log}\\n')\n\n\ndef make_gif_section():\n    rules = []\n    for name, config in iteritems(animations):\n        if config['transposed']:\n            steps = config['input-size'] ** 2\n        else:\n            steps = config['output-size'] ** 2\n        rules.append(\n            'gif/{}.gif : '.format(name) +\n            ' '.join(['png/{}_{:02d}.png'.format(name, i)\n                      for i in range(steps)]) + '\\n' +\n            '\\tconvert -delay 100 -loop 0 -layers Optimize +map -dispose previous $^ $@\\n' +\n            '\\tgifsicle --batch -O3 $@\\n')\n    return '\\n'.join(rules)\n\n\ndef make_png_section():\n    return ('png/%.png : pdf/%.pdf\\n'\n            '\\tconvert -density 600 $< -flatten -resize 25% $@\\n')\n\n\ndef make_pdf_section():\n    rules = []\n    for name, config in 
iteritems(animations):\n        if config['transposed']:\n            steps = config['input-size'] ** 2\n        else:\n            steps = config['output-size'] ** 2\n        if config['type'] == 'arithmetic':\n            dependencies = arithmetic_files\n        else:\n            dependencies = numerical_files\n        subrules = []\n        for i in range(steps):\n            subrules.append(\n                'pdf/{}_{:02d}.pdf : {}\\n'.format(name, i, dependencies) +\n                '\\t./bin/produce_figure ' +\n                '{} {} {} '.format(config['type'], name, i) +\n                '--input-size={} '.format(config['input-size']) +\n                '--output-size={} '.format(config['output-size']) +\n                '--padding={} '.format(config['padding']) +\n                '--kernel-size={} '.format(config['kernel-size']) +\n                '--stride={} '.format(config['stride']) +\n                ('--dilation={} '.format(config['dilation'])\n                 if config['dilation'] != 1 else '' ) +\n                ('--mode={} '.format(config['mode'])\n                 if config['type'] == 'numerical' else '') +\n                ('--transposed\\n' if config['transposed'] else '\\n'))\n        rules.append('\\n'.join(subrules))\n    return '\\n'.join(rules)\n\n\ndef main():\n    with open('Makefile', 'w') as makefile:\n        makefile.write('\\n'.join([make_report_section(), make_header(),\n                                  make_gif_section(), make_png_section(),\n                                  make_pdf_section()]))\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "bin/produce_figure",
    "content": "#!/usr/bin/env python\nimport argparse\nimport itertools\nimport os\nimport subprocess\nfrom glob import glob\n\nimport numpy\nimport six\n\nnumpy.random.seed(1234)\n\n\ndef make_numerical_tex_string(step, input_size, output_size, padding,\n                              kernel_size, stride, dilation, mode):\n    \"\"\"Creates a LaTeX string for a numerical convolution animation.\n\n    Parameters\n    ----------\n    step : int\n        Which step of the animation to generate the LaTeX string for.\n    input_size : int\n        Convolution input size.\n    output_size : int\n        Convolution output size.\n    padding : int\n        Zero padding.\n    kernel_size : int\n        Convolution kernel size.\n    stride : int\n        Convolution stride.\n    mode : str\n        Kernel mode, one of {'convolution', 'average', 'max'}.\n\n    Returns\n    -------\n    tex_string : str\n        A string to be compiled by LaTeX to produce one step of the\n        animation.\n\n    \"\"\"\n    if mode not in ('convolution', 'average', 'max'):\n        raise ValueError(\"wrong convolution mode, choices are 'convolution', \"\n                         \"'average' or 'max'\")\n    if dilation != 1:\n        raise ValueError(\"Only a dilation of 1 is currently supported for numerical output\")\n    max_steps = output_size ** 2\n    if step >= max_steps:\n        raise ValueError('step {} out of bounds (there are '.format(step) +\n                         '{} steps for this animation'.format(max_steps))\n\n    with open(os.path.join('templates', 'numerical_figure.txt'), 'r') as f:\n        tex_template = f.read()\n\n    total_input_size = input_size + 2 * padding\n\n    input_ = numpy.zeros((total_input_size, total_input_size), dtype='int32')\n    input_[padding: padding + input_size,\n           padding: padding + input_size] = numpy.random.randint(\n        low=0, high=4, size=(input_size, input_size))\n    kernel = numpy.random.randint(\n        low=0, high=3, 
size=(kernel_size, kernel_size))\n    output = numpy.empty((output_size, output_size), dtype='float32')\n    for offset_x, offset_y in itertools.product(range(output_size),\n                                                range(output_size)):\n        if mode == 'convolution':\n            output[offset_x, offset_y] = (\n                input_[stride * offset_x: stride * offset_x + kernel_size,\n                       stride * offset_y: stride * offset_y + kernel_size] * kernel).sum()\n        elif mode == 'average':\n            output[offset_x, offset_y] = (\n                input_[stride * offset_x: stride * offset_x + kernel_size,\n                       stride * offset_y: stride * offset_y + kernel_size]).mean()\n        else:\n            output[offset_x, offset_y] = (\n                input_[stride * offset_x: stride * offset_x + kernel_size,\n                       stride * offset_y: stride * offset_y + kernel_size]).max()\n\n    offsets = list(itertools.product(range(output_size - 1, -1, -1),\n                                     range(output_size)))\n    offset_y, offset_x = offsets[step]\n\n    if mode == 'convolution':\n        kernel_values_string = ''.join(\n            \"\\\\node (node) at ({0},{1}) {{\\\\tiny {2}}};\\n\".format(\n                i + 0.8 + stride * offset_x, j + 0.2 + stride * offset_y,\n                kernel[kernel_size - 1 - j, i])\n            for i, j in itertools.product(range(kernel_size),\n                                          range(kernel_size)))\n    else:\n        kernel_values_string = '\\n'\n\n    return six.b(tex_template.format(**{\n        'PADDING_TO': '{0},{0}'.format(total_input_size),\n        'INPUT_FROM': '{0},{0}'.format(padding),\n        'INPUT_TO': '{0},{0}'.format(padding + input_size),\n        'INPUT_VALUES': ''.join(\n            \"\\\\node (node) at ({0},{1}) {{\\\\footnotesize {2}}};\\n\".format(\n                i + 0.5, j + 0.5, input_[total_input_size - 1 - j, i])\n            for i, j in 
itertools.product(range(total_input_size),\n                                          range(total_input_size))),\n        'INPUT_GRID_FROM': '{},{}'.format(stride * offset_x,\n                                          stride * offset_y),\n        'INPUT_GRID_TO': '{},{}'.format(stride * offset_x + kernel_size,\n                                        stride * offset_y + kernel_size),\n        'KERNEL_VALUES': kernel_values_string,\n        'OUTPUT_TO': '{0},{0}'.format(output_size),\n        'OUTPUT_GRID_FROM': '{},{}'.format(offset_x, offset_y),\n        'OUTPUT_GRID_TO': '{},{}'.format(offset_x + 1, offset_y + 1),\n        'OUTPUT_VALUES': ''.join(\n            \"\\\\node (node) at ({0},{1}) {{\\\\tiny {2:.1f}}};\\n\".format(\n                i + 0.5, j + 0.5, output[output_size - 1 - j, i])\n            for i, j in itertools.product(range(output_size),\n                                          range(output_size))),\n        'XSHIFT': '{}cm'.format(total_input_size + 1),\n        'YSHIFT': '{}cm'.format((total_input_size - output_size) // 2),\n    }))\n\n\ndef make_arithmetic_tex_string(step, input_size, output_size, padding,\n                               kernel_size, stride, dilation, transposed):\n    \"\"\"Creates a LaTeX string for a convolution arithmetic animation.\n\n    Parameters\n    ----------\n    step : int\n        Which step of the animation to generate the LaTeX string for.\n    input_size : int\n        Convolution input size.\n    output_size : int\n        Convolution output size.\n    padding : int\n        Zero padding.\n    kernel_size : int\n        Convolution kernel size.\n    stride : int\n        Convolution stride.\n    dilation: int\n        Input Dilation\n    transposed : bool\n        If ``True``, generate strings for the transposed convolution\n        animation.\n\n    Returns\n    -------\n    tex_string : str\n        A string to be compiled by LaTeX to produce one step of the\n        animation.\n\n    \"\"\"\n    
kernel_size = (kernel_size - 1)*dilation + 1\n    if transposed:\n        # Used to add bottom-padding to account for odd shapes\n        bottom_pad = (input_size + 2 * padding - kernel_size) % stride\n\n        input_size, output_size, padding, spacing, stride = (\n            output_size, input_size, kernel_size - 1 - padding, stride, 1)\n        total_input_size = output_size + kernel_size - 1\n        y_adjustment = 0\n    else:\n        # Not used in convolutions\n        bottom_pad = 0\n\n        spacing = 1\n        total_input_size = input_size + 2 * padding\n        y_adjustment = (total_input_size - (kernel_size - stride)) % stride\n\n    max_steps = output_size ** 2\n    if step >= max_steps:\n        raise ValueError('step {} out of bounds (there are '.format(step) +\n                         '{} steps for this animation'.format(max_steps))\n\n    with open(os.path.join('templates', 'arithmetic_figure.txt'), 'r') as f:\n        tex_template = f.read()\n    with open(os.path.join('templates', 'unit.txt'), 'r') as f:\n        unit_template = f.read()\n\n    offsets = list(itertools.product(range(output_size - 1, -1, -1),\n                                     range(output_size)))\n    offset_y, offset_x = offsets[step]\n\n    return six.b(tex_template.format(**{\n        'PADDING_TO': '{0},{0}'.format(total_input_size),\n        'INPUT_UNITS': ''.join(\n            unit_template.format(padding + spacing * i,\n                                 bottom_pad + padding + spacing * j,\n                                 padding + spacing * i + 1,\n                                 bottom_pad + padding + spacing * j + 1)\n            for i, j in itertools.product(range(input_size),\n                                          range(input_size))),\n        'INPUT_GRID_FROM_X': '{}'.format(\n            stride * offset_x),\n        'INPUT_GRID_FROM_Y': '{}'.format(\n            y_adjustment + stride * offset_y),\n        'INPUT_GRID_TO_X': '{}'.format(\n            stride 
* offset_x + kernel_size),\n        'INPUT_GRID_TO_Y': '{}'.format(\n            y_adjustment + stride * offset_y + kernel_size),\n        'DILATION': '{}'.format(dilation),\n        'OUTPUT_BOTTOM_LEFT': '{},{}'.format(offset_x, offset_y),\n        'OUTPUT_BOTTOM_RIGHT': '{},{}'.format(offset_x + 1, offset_y),\n        'OUTPUT_TOP_LEFT': '{},{}'.format(offset_x, offset_y + 1),\n        'OUTPUT_TOP_RIGHT': '{},{}'.format(offset_x + 1, offset_y + 1),\n        'OUTPUT_TO': '{0},{0}'.format(output_size),\n        'OUTPUT_GRID_FROM': '{},{}'.format(offset_x, offset_y),\n        'OUTPUT_GRID_TO': '{},{}'.format(offset_x + 1, offset_y + 1),\n        'OUTPUT_ELEVATION': '{}cm'.format(total_input_size + 1),\n    }))\n\n\ndef compile_figure(which_, name, step, **kwargs):\n    if which_ == 'arithmetic':\n        tex_string = make_arithmetic_tex_string(step, **kwargs)\n    else:\n        tex_string = make_numerical_tex_string(step, **kwargs)\n    jobname = '{}_{:02d}'.format(name, step)\n    p = subprocess.Popen(['pdflatex', '-jobname={}'.format(jobname),\n                          '-output-directory', 'pdf'],\n                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,\n                         stderr=subprocess.PIPE)\n    stdoutdata, stderrdata = p.communicate(input=tex_string)\n    # Remove logs and aux if compilation was successful. communicate()\n    # returns bytes, so the error markers are bytes literals.\n    if b'! LaTeX Error' in stdoutdata or b'! Emergency stop' in stdoutdata:\n        print('! 
LaTeX Error: check the log file in pdf/{}.log'.format(jobname))\n    else:\n        subprocess.call(['rm'] + glob('pdf/{}.aux'.format(jobname)) +\n                        glob('pdf/{}.log'.format(jobname)))\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser(\n        description=\"Compile a LaTeX figure as part of a convolution \"\n                    \"animation.\")\n\n    subparsers = parser.add_subparsers()\n\n    parent_parser = argparse.ArgumentParser(add_help=False)\n    parent_parser.add_argument(\"name\", type=str, help=\"name for the animation\")\n    parent_parser.add_argument(\"step\", type=int, help=\"animation step\")\n    parent_parser.add_argument(\"-i\", \"--input-size\", type=int, default=5,\n                               help=\"input size\")\n    parent_parser.add_argument(\"-o\", \"--output-size\", type=int, default=3,\n                               help=\"output size\")\n    parent_parser.add_argument(\"-p\", \"--padding\", type=int, default=0,\n                               help=\"zero padding\")\n    parent_parser.add_argument(\"-k\", \"--kernel-size\", type=int, default=3,\n                               help=\"kernel size\")\n    parent_parser.add_argument(\"-s\", \"--stride\", type=int, default=1,\n                               help=\"stride\")\n    parent_parser.add_argument(\"-d\", \"--dilation\", type=int, default=1,\n                               help=\"dilation\")\n\n    subparser = subparsers.add_parser('arithmetic', parents=[parent_parser],\n                                      help='convolution arithmetic animation')\n    subparser.add_argument(\"--transposed\", action=\"store_true\",\n                           help=\"animate a transposed convolution\")\n    subparser.set_defaults(which_='arithmetic')\n\n    subparser = subparsers.add_parser('numerical', parents=[parent_parser],\n                                      help='numerical convolution animation')\n    subparser.add_argument(\"-m\", \"--mode\", 
type=str, default='convolution',\n                           choices=('convolution', 'average', 'max'),\n                           help=\"kernel mode\")\n    subparser.set_defaults(which_='numerical')\n\n    args = parser.parse_args()\n    args_dict = vars(args)\n    which_ = args_dict.pop('which_')\n    name = args_dict.pop('name')\n    step = args_dict.pop('step')\n\n    compile_figure(which_, name, step, **args_dict)\n"
  },
  {
    "path": "conv_arithmetic.tex",
    "content": "\\documentclass[notitlepage]{report}\n\\usepackage{amsmath,amsfonts,amsthm,amssymb}\n\\usepackage{authblk}\n\\usepackage[T1]{fontenc}\n\\usepackage{graphicx}\n\\usepackage[utf8]{inputenc}\n\\usepackage[framemethod=tikz]{mdframed}\n\\usepackage{natbib}\n\\usepackage{subcaption}\n\\usepackage{tikz}\n\\usepackage{xcolor}\n\\usepackage{epigraph}\n\\usepackage{float}\n\n\\usepackage{hyperref}\n\n\n\\definecolor{blue}{RGB}{38,139,210}\n\\definecolor{cyan}{RGB}{42,161,152}\n\\definecolor{violet}{RGB}{108,113,196}\n\\definecolor{red}{RGB}{220,50,47}\n\\definecolor{base01}{RGB}{88,110,117}\n\\definecolor{base02}{RGB}{7,54,66}\n\\definecolor{base03}{RGB}{0,43,54}\n\n\\usetikzlibrary{calc,shapes,positioning}\n\n\\newcommand{\\todo}[1]{\\textcolor{red}{TODO: #1}}\n\n\\newtheorem{relationship}{Relationship}\n\\providecommand*{\\relationshipautorefname}{Relationship}\n\\surroundwithmdframed[\n    topline=false,\n    bottomline=false,\n    middlelinewidth=0.5pt,\n    linecolor=base01,\n    roundcorner=5pt,\n    innertopmargin=0pt,\n    leftmargin=15pt,\n    rightmargin=15pt,\n    nobreak=true,\n]{relationship}\n\n\\setcounter{MaxMatrixCols}{16}\n\n\\let\\originalepigraph\\epigraph\n\\renewcommand\\epigraph[2]{\\originalepigraph{\\textit{#1}}{\\textsc{#2}}}\n\n% Use arabic numbers for thanks\n\\makeatletter\n\\let\\@fnsymbol\\@arabic\n\\makeatother\n\n\\title{A guide to convolution arithmetic for deep learning}\n\\author[$\\bigstar$]{Vincent Dumoulin\\thanks{dumouliv@iro.umontreal.ca}}\n\\author[$\\bigstar\\dagger$]{Francesco Visin\\thanks{francesco.visin@polimi.it}}\n\\affil[$\\bigstar$]{MILA, Universit\\'{e} de Montr\\'{e}al}\n\\affil[$\\dagger$]{AIRLab, Politecnico di Milano}\n\\date{\\today}\n\n\\begin{document}\n\n\\maketitle\n\\thispagestyle{empty}\n\\clearpage\n\n\\setlength{\\epigraphwidth}{0.4\\textwidth}\n\\epigraph{All models are wrong, but some are useful.}{George E. P. 
Box}\n\\clearpage\n\n\\renewcommand{\\abstractname}{Acknowledgements}\n\\begin{abstract}\n    The authors of this guide would like to thank David Warde-Farley, Guillaume\n    Alain and Caglar Gulcehre for their valuable feedback. We are likewise\n    grateful to all those who helped improve this tutorial with helpful\n    comments, constructive criticisms and code contributions. Keep them coming!\n\n    Special thanks to Ethan Schoonover, creator of the Solarized color\n    scheme,\\footnote{\\url{http://ethanschoonover.com/solarized}} whose colors\n    were used for the figures.\n\\end{abstract}\n\n\\renewcommand{\\abstractname}{Feedback}\n\\begin{abstract}\n    Your feedback is welcome! We did our best to be as precise, informative and\n    to the point as possible, but should there be anything you feel might be\n    an error or could be rephrased to be more precise or comprehensible, please\n    don't hesitate to contact us. Likewise, drop us a line if you think\n    there is something that might fit this technical report and you would like\n    us to discuss -- we will make our best effort to update this document.\n\\end{abstract}\n\n\\renewcommand{\\abstractname}{Source code and animations}\n\\begin{abstract}\n    The code used to generate this guide along with its figures is available on\n    GitHub.\\footnote{\\url{https://github.com/vdumoulin/conv_arithmetic}} There\n    the reader can also find an animated version of the figures.\n\\end{abstract}\n\n\\tableofcontents\n\n\\chapter{Introduction}\n\nDeep convolutional neural networks (CNNs) have been at the heart of spectacular\nadvances in deep learning. 
Although CNNs were used as early as the nineties\nto solve character recognition tasks \\citep{le1997reading}, their current\nwidespread application is due to much more recent work, when a deep CNN was used\nto beat the state of the art in the ImageNet image classification challenge\n\\citep{krizhevsky2012imagenet}.\n\nConvolutional neural networks therefore constitute a very useful tool for\nmachine learning practitioners. However, learning to use CNNs for the first time\nis generally an intimidating experience. A convolutional layer's output shape is\naffected by the shape of its input as well as the choice of kernel shape, zero\npadding and strides, and the relationship between these properties is not\ntrivial to infer. This contrasts with fully-connected layers, whose output size\nis independent of the input size. Additionally, CNNs usually feature a {\\em\npooling\\/} stage, adding yet another level of complexity with respect to\nfully-connected networks.  Finally, so-called transposed convolutional layers\n(also known as fractionally strided convolutional layers) have been employed in\nmore and more work as of late \\citep{zeiler2011adaptive,zeiler2014visualizing,\nlong2015fully,radford2015unsupervised,visin15,im2016generating}, and their\nrelationship with convolutional layers has been explained with various degrees\nof clarity.\n\nThis guide's objective is twofold:\n\n\\begin{enumerate}\n    \\item Explain the relationship between convolutional layers and transposed\n        convolutional layers.\n    \\item Provide an intuitive understanding of the relationship between input\n        shape, kernel shape, zero padding, strides and output shape in\n        convolutional, pooling and transposed convolutional layers.\n\\end{enumerate}\n\nIn order to remain broadly applicable, the results shown in this guide are\nindependent of implementation details and apply to all commonly used machine\nlearning frameworks, such as 
Theano\n\\citep{bergstra2010theano,bastien2012theano}, Torch \\citep{collobert2011torch7},\nTensorFlow \\citep{abaditensorflow} and Caffe \\citep{jia2014caffe}.\n\nThis chapter briefly reviews the main building blocks of CNNs, namely discrete\nconvolutions and pooling. For an in-depth treatment of the subject, see Chapter\n9 of the Deep Learning textbook \\citep{Goodfellow-et-al-2016-Book}.\n\n\\section{Discrete convolutions}\n\nThe bread and butter of neural networks is \\emph{affine transformations}: a\nvector is received as input and is multiplied by a matrix to produce an\noutput (to which a bias vector is usually added before passing the result\nthrough a nonlinearity). This is applicable to any type of input, be it an\nimage, a sound clip or an unordered collection of features: whatever their\ndimensionality, their representation can always be flattened into a vector\nbefore the transformation.\n\nImages, sound clips and many other similar kinds of data have an intrinsic\nstructure. More formally, they share these important properties:\n\n\\begin{itemize}\n    \\item They are stored as multi-dimensional arrays.\n    \\item They feature one or more axes for which ordering matters (e.g., width\n        and height axes for an image, time axis for a sound clip).\n    \\item One axis, called the channel axis, is used to access different views\n        of the data (e.g., the red, green and blue channels of a color image, or\n        the left and right channels of a stereo audio track).\n\\end{itemize}\n\nThese properties are not exploited when an affine transformation is applied; in\nfact, all the axes are treated in the same way and the topological information\nis not taken into account. Still, taking advantage of the implicit structure of\nthe data may prove very handy in solving some tasks, like computer vision and\nspeech recognition, and in these cases it would be best to preserve it. 
This is\nwhere discrete convolutions come into play.\n\nA discrete convolution is a linear transformation that preserves this notion of\nordering. It is sparse (only a few input units contribute to a given output\nunit) and reuses parameters (the same weights are applied to multiple locations\nin the input).\n\n\\autoref{fig:numerical_no_padding_no_strides} provides an example of a discrete\nconvolution. The light blue grid is called the {\\em input feature map}. To keep\nthe drawing simple, a single input feature map is represented, but it is not\nuncommon to have multiple feature maps stacked one onto another.\\footnote{%\n    An example of this is what was referred to earlier as {\\em channels\\/} for\n    images and sound clips.}\nA {\\em kernel\\/} (shaded area) of value\n\n\\begin{figure}[H]\n    \\centering\n    \\begin{tikzpicture}[scale=.4,every node/.style={minimum size=1cm}, on grid]\n            \\draw[fill=base02,opacity=0.4] (0,0) rectangle (3,3);\n            \\draw[draw=base03,thick] (0,0) grid (3,3);\n            \\node (00) at (0.5,2.5) {\\tiny 0};\n            \\node (01) at (1.5,2.5) {\\tiny 1};\n            \\node (02) at (2.5,2.5) {\\tiny 2};\n            \\node (10) at (0.5,1.5) {\\tiny 2};\n            \\node (11) at (1.5,1.5) {\\tiny 2};\n            \\node (12) at (2.5,1.5) {\\tiny 0};\n            \\node (20) at (0.5,0.5) {\\tiny 0};\n            \\node (21) at (1.5,0.5) {\\tiny 1};\n            \\node (22) at (2.5,0.5) {\\tiny 2};\n    \\end{tikzpicture}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_00.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_01.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_02.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_03.pdf}\n    
\\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_04.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_05.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_06.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_07.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_no_padding_no_strides_08.pdf}\n    \\caption{\\label{fig:numerical_no_padding_no_strides} Computing the output\n        values of a discrete convolution.}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_00.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_01.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_02.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_03.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_04.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_05.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_06.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_07.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_padding_strides_08.pdf}\n    \\caption{\\label{fig:numerical_padding_strides} Computing the output values\n        of a discrete convolution for $N = 2$, $i_1 = i_2 = 5$, $k_1 = k_2 = 3$,\n        $s_1 = s_2 = 2$, and $p_1 = p_2 = 1$.}\n\\end{figure}\n\n\\noindent slides across the input feature map. At each location, the product\nbetween each element of the kernel and the input element it overlaps is computed\nand the results are summed up to obtain the output in the current location. The\nprocedure can be repeated using different kernels to form as many output feature\nmaps as desired (\\autoref{fig:full_picture}). 
The final outputs of this procedure\nare called {\\em output feature maps}.\\footnote{%\n    While there is a distinction between convolution and cross-correlation from\n    a signal processing perspective, the two become interchangeable when the\n    kernel is learned. For the sake of simplicity and to stay consistent with\n    most of the machine learning literature, the term {\\em convolution\\/}\n    will be used in this guide.}\nIf there are multiple input feature maps, the kernel will have to be\n3-dimensional -- or, equivalently, each one of the feature maps will be\nconvolved with a distinct kernel -- and the resulting feature maps will\nbe summed up elementwise to produce the output feature map.\n\nThe convolution depicted in \\autoref{fig:numerical_no_padding_no_strides} is an\ninstance of a 2-D convolution, but it can be generalized to N-D convolutions.\nFor instance, in a 3-D convolution, the kernel would be a {\\em cuboid\\/} and\nwould slide across the height, width and depth of the input feature map.\n\nThe collection of kernels defining a discrete convolution has a shape\ncorresponding to some permutation of $(n, m, k_1, \\ldots, k_N)$, where\n\n\\begin{equation*}\n\\begin{split}\n    n &\\equiv \\text{number of output feature maps},\\\\\n    m &\\equiv \\text{number of input feature maps},\\\\\n    k_j &\\equiv \\text{kernel size along axis $j$}.\n\\end{split}\n\\end{equation*}\n\nThe following properties affect the output size $o_j$ of a convolutional layer\nalong axis $j$:\n\n\\begin{itemize}\n    \\item $i_j$: input size along axis $j$,\n    \\item $k_j$: kernel size along axis $j$,\n    \\item $s_j$: stride (distance between two consecutive positions of the\n        kernel) along axis $j$,\n    \\item $p_j$: zero padding (number of zeros concatenated at the beginning and\n        at the end of an axis) along axis $j$.\n\\end{itemize}\n\n\\noindent For instance, \\autoref{fig:numerical_padding_strides} shows a $3\n\\times 3$ kernel applied to a 
$5 \\times 5$ input padded with a $1 \\times 1$\nborder of zeros using $2 \\times 2$ strides.\n\nNote that strides constitute a form of \\emph{subsampling}. As an alternative to\nbeing interpreted as a measure of how much the kernel is translated, strides\ncan also be viewed as how much of the output is retained. For instance, moving\nthe kernel by hops of two is equivalent to moving the kernel by hops of one but\nretaining only odd output elements (\\autoref{fig:strides_subsampling}).\n\n\\begin{figure}[p]\n    \\centering\n    \\begin{tikzpicture}[scale=.35,every node/.style={minimum size=1cm}, on grid]\n        \\begin{scope}[xshift=0cm,yshift=0cm]\n            \\begin{scope}[xshift=0cm,yshift=0cm]\n                \\draw[draw=base03,fill=violet,thick]\n                    (0,0) grid (5,5) rectangle (0,0);\n            \\end{scope}\n            \\begin{scope}[xshift=0.5cm,yshift=0.5cm]\n                \\draw[draw=base03,fill=blue,thick]\n                    (0,0) grid (5,5) rectangle (0,0);\n            \\end{scope}\n        \\end{scope}\n        \\foreach \\x in {-10,1,11} {%\n            \\begin{scope}[xshift=\\x cm,yshift=10cm]\n                \\begin{scope}[xshift=0cm,yshift=0cm]\n                    \\draw[draw=base03,fill=violet,thick]\n                        (0,0) grid (3,3) rectangle (0,0);\n                \\end{scope}\n                \\begin{scope}[xshift=0.5cm,yshift=0.5cm]\n                    \\draw[draw=base03,fill=blue,thick]\n                        (0,0) grid (3,3) rectangle (0,0);\n                \\end{scope}\n            \\end{scope}\n            \\begin{scope}[xshift=\\x cm,yshift=20cm]\\begin{scope}[xshift=0.5cm]\n                \\draw[draw=base03,fill=cyan,thick]\n                    (0,0) grid (3,3) rectangle (0,0);\n            \\end{scope}\\end{scope}\n        }\n        \\begin{scope}[xshift=1cm,yshift=30cm]\n            \\foreach \\s in {0.0,0.5,1.0} {%\n                \\begin{scope}[xshift=\\s cm,yshift=\\s cm]\n                
    \\draw[draw=base03,fill=cyan,thick]\n                        (0,0) grid (3,3) rectangle (0,0);\n                \\end{scope}\n            }\n        \\end{scope}\n        \\draw[->, thick] (-0.5,2.5) to (-8.5,9.5);\n        \\draw[->, thick] (3,6) to (3,9.5);\n        \\draw[->, thick] (6,3.5) to (12.5,9.5);\n        \\draw[thick]  (-8,14.5) to (-8,16);\n        \\draw[->, thick]  (-8,18) to (-8,19.5);\n        \\node[thick] (p1) at (-8,17) {$+$};\n        \\draw[thick]  (3,14.5) to (3,16);\n        \\draw[->, thick]  (3,18) to (3,19.5);\n        \\node[thick] (p2) at (3,17) {$+$};\n        \\draw[thick]  (13,14.5) to (13,16);\n        \\draw[->, thick]  (13,18) to (13,19.5);\n        \\node[thick] (p3) at (13,17) {$+$};\n        \\draw[->, thick]  (-8,23.5) to (2,29.5);\n        \\draw[->, thick]  (3,23.5) to (2.5,29.5);\n        \\draw[->, thick]  (13,23.5) to (3,29.5);\n    \\end{tikzpicture}\n    \\caption{\\label{fig:full_picture} A convolution mapping from two input\n        feature maps to three output feature maps using a $3 \\times 2 \\times 3\n        \\times 3$ collection of kernels $\\mathbf{w}$. In the left pathway, input\n        feature map 1 is convolved with kernel $\\mathbf{w}_{1,1}$ and input\n        feature map 2 is convolved with kernel $\\mathbf{w}_{1,2}$, and the\n        results are summed together elementwise to form the first output feature\n        map. 
The same is repeated for the middle and right pathways to form the\n        second and third feature maps, and all three output feature maps are\n        grouped together to form the output.}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\begin{tikzpicture}[scale=.35,every node/.style={minimum size=1cm}, on grid]\n        \\begin{scope}[xshift=0,yshift=0cm]\n            \\begin{scope}[xshift=0cm,yshift=0cm]\n                \\draw[draw=base03,fill=blue,thick] (0,0) grid (5,5) rectangle (0,0);\n                \\draw[fill=base02, opacity=0.4] (0,2) rectangle (3,5);\n            \\end{scope}\n            \\begin{scope}[xshift=7cm,yshift=1.5cm]\n                \\draw[draw=base03,fill=cyan,thick] (0,0) grid (2,2) rectangle (0,0);\n            \\end{scope}\n        \\end{scope}\n        \\draw[draw=base03, ->, thick] (2.6,3.5) to  (4.5,3.5);\n        \\draw[draw=base03, ->, thick] (1.5,2.4) to (1.5,0.5);\n        \\draw[draw=base03, ->, thick] (5.25, 2.5) to (6.75, 2.5);\n        \\begin{scope}[xshift=12cm,yshift=0cm]\n            \\begin{scope}[xshift=0cm,yshift=0cm]\n                \\draw[draw=base03,fill=blue,thick] (0,0) grid (5,5) rectangle (0,0);\n                \\draw[fill=base02, opacity=0.4] (0,2) rectangle (3,5);\n            \\end{scope}\n            \\begin{scope}[xshift=7cm,yshift=1cm]\n                \\draw[draw=base03,fill=cyan,thick] (0,0) grid (3,3) rectangle (0,0);\n                \\draw[draw=base03] (1,0) -- (2,1) -- (2,0) -- (1,1);\n                \\draw[draw=base03] (0,1) -- (1,2) -- (1,1) -- (0,2);\n                \\draw[draw=base03] (1,1) -- (2,2) -- (2,1) -- (1,2);\n                \\draw[draw=base03] (2,1) -- (3,2) -- (3,1) -- (2,2);\n                \\draw[draw=base03] (1,2) -- (2,3) -- (2,2) -- (1,3);\n            \\end{scope}\n            \\begin{scope}[xshift=12cm,yshift=1.5cm]\n                \\draw[draw=base03,fill=cyan,thick] (0,0) grid (2,2) rectangle (0,0);\n            \\end{scope}\n        \\end{scope}\n        
\\draw[draw=base03, ->, thick] (14.6,3.5) to  (15.5,3.5);\n        \\draw[draw=base03, ->, thick] (15.6,3.5) to  (16.5,3.5);\n        \\draw[draw=base03, ->, thick] (13.5,2.4) to (13.5,1.5);\n        \\draw[draw=base03, ->, thick] (13.5,1.4) to (13.5,0.5);\n        \\draw[draw=base03, ->, thick] (17.25, 2.5) to (18.75, 2.5);\n        \\draw[draw=base03, ->, thick] (22.25, 2.5) to (23.75, 2.5);\n    \\end{tikzpicture}\n    \\caption{\\label{fig:strides_subsampling} An alternative way of viewing\n        strides. Instead of translating the $3 \\times 3$ kernel by increments of\n        $s = 2$ (left), the kernel is translated by increments of $1$ and only\n        one in $s = 2$ output elements is retained (right).}\n\\end{figure}\n\n\\section{Pooling}\n\nIn addition to discrete convolutions themselves, {\\em pooling\\/} operations\nmake up another important building block in CNNs. Pooling operations reduce\nthe size of feature maps by using some function to summarize subregions, such\nas taking the average or the maximum value.\n\nPooling works by sliding a window across the input and feeding the content of\nthe window to a {\\em pooling function}. In some sense, pooling works very much\nlike a discrete convolution, but replaces the linear combination described by\nthe kernel with some other function. 
\\autoref{fig:numerical_average_pooling}\nprovides an example for average pooling, and \\autoref{fig:numerical_max_pooling}\ndoes the same for max pooling.\n\nThe following properties affect the output size $o_j$ of a pooling layer\nalong axis $j$:\n\n\\begin{itemize}\n    \\item $i_j$: input size along axis $j$,\n    \\item $k_j$: pooling window size along axis $j$,\n    \\item $s_j$: stride (distance between two consecutive positions of the\n        pooling window) along axis $j$.\n\\end{itemize}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_00.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_01.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_02.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_03.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_04.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_05.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_06.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_07.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_average_pooling_08.pdf}\n    \\caption{\\label{fig:numerical_average_pooling} Computing the output values\n        of a $3 \\times 3$ average pooling operation on a $5 \\times 5$ input\n        using $1 \\times 1$ strides.}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_00.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_01.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_02.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_03.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_04.pdf}\n    
\\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_05.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_06.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_07.pdf}\n    \\includegraphics[width=0.32\\textwidth]{pdf/numerical_max_pooling_08.pdf}\n    \\caption{\\label{fig:numerical_max_pooling} Computing the output values of a\n        $3 \\times 3$ max pooling operation on a $5 \\times 5$ input using $1\n        \\times 1$ strides.}\n\\end{figure}\n\n\\chapter{Convolution arithmetic}\n\nThe analysis of the relationship between convolutional layer properties is eased\nby the fact that they don't interact across axes, i.e., the choice of kernel\nsize, stride and zero padding along axis $j$ only affects the output size of\naxis $j$. Because of that, this chapter will focus on the following simplified\nsetting:\n\n\\begin{itemize}\n    \\item 2-D discrete convolutions ($N = 2$),\n    \\item square inputs ($i_1 = i_2 = i$),\n    \\item square kernel size ($k_1 = k_2 = k$),\n    \\item same strides along both axes ($s_1 = s_2 = s$),\n    \\item same zero padding along both axes ($p_1 = p_2 = p$).\n\\end{itemize}\n\nThis facilitates the analysis and the visualization, but keep in mind that the\nresults outlined here also generalize to the N-D and non-square cases.\n\n\\section{No zero padding, unit strides}\n\nThe simplest case to analyze is when the kernel just slides across every\nposition of the input (i.e., $s = 1$ and $p = 0$).\n\\autoref{fig:no_padding_no_strides} provides an example for $i = 4$ and $k =\n3$.\n\nOne way of defining the output size in this case is by the number of possible\nplacements of the kernel on the input. Let's consider the width axis: the kernel\nstarts on the leftmost part of the input feature map and slides by steps of one\nuntil it touches the right side of the input. 
The size of the output will be\nequal to the number of steps made, plus one, accounting for the initial position\nof the kernel (\\autoref{fig:no_padding_no_strides_explained}). The same logic\napplies for the height axis.\n\nMore formally, the following relationship can be inferred:\n\n\\begin{relationship}\\label{rel:no_padding_no_strides}\nFor any $i$ and $k$, and for $s = 1$ and $p = 0$,\n\\begin{equation*}\n    o = (i - k) + 1.\n\\end{equation*}\n\\end{relationship}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_03.pdf}\n    \\caption{\\label{fig:no_padding_no_strides} (No padding, unit strides)\n        Convolving a $3 \\times 3$ kernel over a $4 \\times 4$ input using unit\n        strides (i.e., $i = 4$, $k = 3$, $s = 1$ and $p = 0$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_03.pdf}\n    \\caption{\\label{fig:arbitrary_padding_no_strides} (Arbitrary padding, unit\n        strides) Convolving a $4 \\times 4$ kernel over a $5 \\times 5$ input\n        padded with a $2 \\times 2$ border of zeros using unit strides (i.e.,\n        $i = 5$, $k = 4$, $s = 1$ and $p = 2$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_01.pdf}\n    
\\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_03.pdf}\n    \\caption{\\label{fig:same_padding_no_strides} (Half padding, unit strides)\n        Convolving a $3 \\times 3$ kernel over a $5 \\times 5$ input using half\n        padding and unit strides (i.e., $i = 5$, $k = 3$, $s = 1$ and $p = 1$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_03.pdf}\n    \\caption{\\label{fig:full_padding_no_strides} (Full padding, unit strides)\n        Convolving a $3 \\times 3$ kernel over a $5 \\times 5$ input using full\n        padding and unit strides (i.e., $i = 5$, $k = 3$, $s = 1$ and $p = 2$).}\n\\end{figure}\n\n\\section{Zero padding, unit strides}\n\nTo factor in zero padding (i.e., only restricting to $s = 1$), let's consider\nits effect on the effective input size: padding with $p$ zeros changes the\neffective input size from $i$ to $i + 2p$. In the general case,\n\\autoref{rel:no_padding_no_strides} can then be used to infer the following\nrelationship:\n\n\\begin{relationship}\\label{rel:arbitrary_padding_no_strides}\nFor any $i$, $k$ and $p$, and for $s = 1$,\n\\begin{equation*}\n    o = (i - k) + 2p + 1.\n\\end{equation*}\n\\end{relationship}\n\n\\noindent \\autoref{fig:arbitrary_padding_no_strides} provides an example for $i\n= 5$, $k = 4$ and $p = 2$.\n\nIn practice, two specific instances of zero padding are used quite extensively\nbecause of their respective properties. 
Let's discuss them in more detail.\n\n\\subsection{Half (same) padding}\n\nHaving the output size be the same as the input size (i.e., $o = i$) can be a\ndesirable property:\n\n\\begin{relationship}\\label{rel:same_padding_no_strides}\nFor any $i$ and for $k$ odd ($k = 2n + 1, \\quad n \\in \\mathbb{N}$), $s = 1$ and\n$p = \\lfloor k / 2 \\rfloor = n$,\n\\begin{equation*}\n\\begin{split}\n    o &= i + 2 \\lfloor k / 2 \\rfloor - (k - 1) \\\\\n      &= i + 2n - 2n \\\\\n      &= i.\n\\end{split}\n\\end{equation*}\n\\end{relationship}\n\n\\noindent This is sometimes referred to as {\\em half\\/} (or {\\em same\\/})\npadding. \\autoref{fig:same_padding_no_strides} provides an example for\n$i = 5$, $k = 3$ and (therefore) $p = 1$.\n\n\\subsection{Full padding}\n\nWhile convolving a kernel generally {\\em decreases\\/} the output size with\nrespect to the input size, sometimes the opposite is required. This can be\nachieved with proper zero padding:\n\n\\begin{relationship}\\label{rel:full_padding_no_strides}\nFor any $i$ and $k$, and for $p = k - 1$ and $s = 1$,\n\\begin{equation*}\n\\begin{split}\n    o &= i + 2(k - 1) - (k - 1) \\\\\n      &= i + (k - 1).\n\\end{split}\n\\end{equation*}\n\\end{relationship}\n\n\\noindent This is sometimes referred to as {\\em full\\/} padding, because in this\nsetting every possible partial or complete superimposition of the kernel on the\ninput feature map is taken into account. \\autoref{fig:full_padding_no_strides}\nprovides an example for $i = 5$, $k = 3$ and (therefore) $p = 2$.\n\n\\section{No zero padding, non-unit strides}\n\nAll relationships derived so far apply only to unit-strided convolutions.\nIncorporating non-unit strides requires another inference leap. To\nfacilitate the analysis, let's momentarily ignore zero padding (i.e., $s > 1$\nand $p = 0$). 
\\autoref{fig:no_padding_strides} provides an example for $i =\n5$, $k = 3$ and $s = 2$.\n\nOnce again, the output size can be defined in terms of the number of possible\nplacements of the kernel on the input. Let's consider the width axis: the\nkernel starts as usual on the leftmost part of the input, but this time it\nslides by steps of size $s$ until it touches the right side of the input. The\nsize of the output is again equal to the number of steps made, plus one,\naccounting for the initial position of the kernel\n(\\autoref{fig:no_padding_strides_explained}). The same logic applies for the\nheight axis.\n\nFrom this, the following relationship can be inferred:\n\n\\begin{relationship}\\label{rel:no_padding_strides}\nFor any $i$, $k$ and $s$, and for $p = 0$,\n\\begin{equation*}\n    o = \\left\\lfloor \\frac{i - k}{s} \\right\\rfloor + 1.\n\\end{equation*}\n\\end{relationship}\n\n\\noindent The floor function accounts for the fact that sometimes the last\npossible step does {\\em not\\/} coincide with the kernel reaching the end of the\ninput, i.e., some input units are left out (see\n\\autoref{fig:padding_strides_odd} for an example of such a case).\n\n\\section{Zero padding, non-unit strides}\n\nThe most general case (convolving over a zero padded input using non-unit\nstrides) can be derived by applying \\autoref{rel:no_padding_strides} on an\neffective input of size $i + 2p$, in analogy to what was done for\n\\autoref{rel:arbitrary_padding_no_strides}:\n\n\\begin{relationship}\\label{rel:padding_strides}\nFor any $i$, $k$, $p$ and $s$,\n\\begin{equation*}\n    o = \\left\\lfloor \\frac{i + 2p - k}{s} \\right\\rfloor + 1.\n\\end{equation*}\n\\end{relationship}\n\n\\noindent As before, the floor function means that in some cases a convolution\nwill produce the same output size for multiple input sizes. 
More specifically,\nif $i + 2p - k$ is a multiple of $s$, then any input size $j = i + a, \\quad a\n\\in \\{0,\\ldots,s - 1\\}$ will produce the same output size. Note that this\nambiguity applies only for $s > 1$.\n\n\\autoref{fig:padding_strides} shows an example with $i = 5$, $k = 3$, $s = 2$\nand $p = 1$, while \\autoref{fig:padding_strides_odd} provides an example for\n$i = 6$, $k = 3$, $s = 2$ and $p = 1$. Interestingly, despite having different\ninput sizes these convolutions share the same output size. While this doesn't\naffect the analysis for {\\em convolutions}, this will complicate the analysis\nin the case of {\\em transposed convolutions}.\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_03.pdf}\n    \\caption{\\label{fig:no_padding_strides} (No zero padding, arbitrary\n        strides) Convolving a $3 \\times 3$ kernel over a $5 \\times 5$ input\n        using $2 \\times 2$ strides (i.e., $i = 5$, $k = 3$, $s = 2$ and\n        $p = 0$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_03.pdf}\n    \\caption{\\label{fig:padding_strides} (Arbitrary padding and strides)\n        Convolving a $3 \\times 3$ kernel over a $5 \\times 5$ input padded with\n        a $1 \\times 1$ border of zeros using $2 \\times 2$ strides (i.e.,\n        $i = 5$, $k = 3$, $s = 2$ and $p = 1$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    
\\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_03.pdf}\n    \\caption{\\label{fig:padding_strides_odd} (Arbitrary padding and strides)\n        Convolving a $3 \\times 3$ kernel over a $6 \\times 6$ input padded with\n        a $1 \\times 1$ border of zeros using $2 \\times 2$ strides (i.e.,\n        $i = 6$, $k = 3$, $s = 2$ and $p = 1$). In this case, the bottom row\n        and right column of the zero padded input are not covered by the\n        kernel.}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\begin{subfigure}[t]{0.48\\textwidth}\n        \\centering\n        \\begin{tikzpicture}[scale=.35,every node/.style={minimum size=1cm},\n                            on grid]\n            \\draw[fill=blue] (0,0) rectangle (5,5);\n            \\draw[draw=base03, thick] (0,0) grid (5,5);\n            \\draw[fill=base02, opacity=0.4] (0,2) rectangle (3,5);\n            \\draw[step=10mm, base03, thick] (0,2) grid (3,5);\n            \\draw[draw=base03, ->, thick] (2.6,3.5) to  (3.5,3.5);\n            \\draw[draw=base03, ->, thick] (3.6,3.5) to  (4.5,3.5);\n            \\draw[draw=base03, ->, thick] (1.5,2.4) to  (1.5,1.5);\n            \\draw[draw=base03, ->, thick] (1.5,1.4) to  (1.5,0.5);\n        \\end{tikzpicture}\n        \\caption{\\label{fig:no_padding_no_strides_explained} The kernel has to\n            slide two steps to the right to touch the right side of the input\n            (and equivalently downwards).  
Adding one to account for the\n            initial kernel position, the output size is $3 \\times 3$.}\n    \\end{subfigure}\n    ~\n    \\begin{subfigure}[t]{0.48\\textwidth}\n        \\centering\n        \\begin{tikzpicture}[scale=.35,every node/.style={minimum size=1cm},\n                            on grid]\n            \\draw[fill=blue] (0,0) rectangle (5,5);\n            \\draw[draw=base03, thick] (0,0) grid (5,5);\n            \\draw[fill=base02, opacity=0.4] (0,2) rectangle (3,5);\n            \\draw[step=10mm, base03, thick] (0,2) grid (3,5);\n            \\draw[draw=base03, ->, thick] (2.5,3.5) to  (4.5,3.5);\n            \\draw[draw=base03, ->, thick] (1.5,2.5) to  (1.5,0.5);\n        \\end{tikzpicture}\n        \\caption{\\label{fig:no_padding_strides_explained} The kernel has to\n            slide one step of size two to the right to touch the right side of\n            the input (and equivalently downwards).  Adding one to account for\n            the initial kernel position, the output size is $2 \\times 2$.}\n    \\end{subfigure}\n    \\caption{Counting kernel positions.}\n\\end{figure}\n\n\\chapter{Pooling arithmetic}\n\nIn a neural network, pooling layers provide invariance to small translations of\nthe input. The most common kind of pooling is \\emph{max pooling}, which\nconsists of splitting the input into (usually non-overlapping) patches and\noutputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\nmean or average pooling, which all share the same idea of aggregating the input\nlocally by applying a non-linearity to the content of some patches \\citep{%\nboureau-cvpr-10,boureau-icml-10,boureau-iccv-11,ICML2011Saxe_551}.\n\nSome readers may have noticed that the treatment of convolution arithmetic only\nrelies on the assumption that some function is repeatedly applied onto subsets\nof the input. This means that the relationships derived in the previous chapter\ncan be reused in the case of pooling arithmetic. 
Since pooling does not involve\nzero padding, the relationship describing the general case is as follows:\n\n\\begin{relationship}\\label{rel:pooling}\nFor any $i$, $k$ and $s$,\n\\begin{equation*}\n    o = \\left\\lfloor \\frac{i - k}{s} \\right\\rfloor + 1.\n\\end{equation*}\n\\end{relationship}\n\n\\noindent This relationship holds for any type of pooling.\n\n\\chapter{Transposed convolution arithmetic}\n\nThe need for transposed convolutions generally arises from the desire to use a\ntransformation going in the opposite direction of a normal convolution, i.e.,\nfrom something that has the shape of the output of some convolution to\nsomething that has the shape of its input while maintaining a connectivity\npattern that is compatible with said convolution. For instance, one might use\nsuch a transformation as the decoding layer of a convolutional autoencoder or to\nproject feature maps to a higher-dimensional space.\n\nOnce again, the convolutional case is considerably more complex than the\nfully-connected case, which only requires using a weight matrix whose shape\nhas been transposed. 
However, since every convolution boils down to an\nefficient implementation of a matrix operation, the insights gained from the\nfully-connected case are useful in solving the convolutional case.\n\nAs with convolution arithmetic, the discussion of transposed convolution\narithmetic is simplified by the fact that transposed convolution properties\ndon't interact across axes.\n\nThe chapter will focus on the following setting:\n\n\\begin{itemize}\n    \\item 2-D transposed convolutions ($N = 2$),\n    \\item square inputs ($i_1 = i_2 = i$),\n    \\item square kernel size ($k_1 = k_2 = k$),\n    \\item same strides along both axes ($s_1 = s_2 = s$),\n    \\item same zero padding along both axes ($p_1 = p_2 = p$).\n\\end{itemize}\n\n\\noindent Once again, the results outlined generalize to the N-D and non-square\ncases.\n\n\\section{Convolution as a matrix operation}\n\nTake for example the convolution represented in\n\\autoref{fig:no_padding_no_strides}. If the input and output were to be unrolled\ninto vectors from left to right, top to bottom, the convolution could be\nrepresented as a sparse matrix $\\mathbf{C}$ where the non-zero elements are the\nelements $w_{i,j}$ of the kernel (with $i$ and $j$ being the row and column of\nthe kernel respectively):\n\\begin{equation*}\n\\resizebox{.98\\hsize}{!}{$\n    \\begin{pmatrix}\n    w_{0,0} & w_{0,1} & w_{0,2} & 0       & w_{1,0} & w_{1,1} & w_{1,2} & 0       &\n    w_{2,0} & w_{2,1} & w_{2,2} & 0       & 0       & 0       & 0       & 0       \\\\\n    0       & w_{0,0} & w_{0,1} & w_{0,2} & 0       & w_{1,0} & w_{1,1} & w_{1,2} &\n    0       & w_{2,0} & w_{2,1} & w_{2,2} & 0       & 0       & 0       & 0       \\\\\n    0       & 0       & 0       & 0       & w_{0,0} & w_{0,1} & w_{0,2} & 0       &\n    w_{1,0} & w_{1,1} & w_{1,2} & 0       & w_{2,0} & w_{2,1} & w_{2,2} & 0       \\\\\n    0       & 0       & 0       & 0       & 0       & w_{0,0} & w_{0,1} & w_{0,2} &\n    0       & w_{1,0} & w_{1,1} & w_{1,2} 
& 0       & w_{2,0} & w_{2,1} & w_{2,2} \\\\\n    \\end{pmatrix}$}\n\\end{equation*}\n\nThis linear operation takes the input matrix flattened as a 16-dimensional\nvector and produces a 4-dimensional vector that is later reshaped as the $2\n\\times 2$ output matrix.\n\nUsing this representation, the backward pass is easily obtained by transposing\n$\\mathbf{C}$; in other words, the error is backpropagated by multiplying the\nloss with $\\mathbf{C}^T$. This operation takes a 4-dimensional vector as input\nand produces a 16-dimensional vector as output, and its connectivity pattern is\ncompatible with $\\mathbf{C}$ by construction.\n\nNotably, the kernel $\\mathbf{w}$ defines both the matrices $\\mathbf{C}$ and\n$\\mathbf{C}^T$ used for the forward and backward passes.\n\n\\section{Transposed convolution}\n\nLet's now consider what would be required to go the other way around, i.e., map\nfrom a 4-dimensional space to a 16-dimensional space, while keeping the\nconnectivity pattern of the convolution depicted in\n\\autoref{fig:no_padding_no_strides}. This operation is known as a {\\em\ntransposed convolution}.\n\nTransposed convolutions -- also called {\\em fractionally strided convolutions\\/}\nor {\\em deconvolutions\\/}\\footnote{The term ``deconvolution'' is sometimes used\nin the literature, but we advocate against it on the grounds that a\ndeconvolution is mathematically defined as the inverse of a convolution, which\nis different from a transposed convolution.} -- work by swapping the forward and\nbackward passes of a convolution. 
One way to put it is to note that the kernel\ndefines a convolution, but whether it's a direct convolution or a transposed\nconvolution is determined by how the forward and backward passes are computed.\n\nFor instance, although the kernel $\\mathbf{w}$ defines a convolution whose\nforward and backward passes are computed by multiplying with $\\mathbf{C}$ and\n$\\mathbf{C}^T$ respectively, it {\\em also\\/} defines a transposed convolution\nwhose forward and backward passes are computed by multiplying with\n$\\mathbf{C}^T$ and $(\\mathbf{C}^T)^T = \\mathbf{C}$ respectively.\\footnote{The\n    transposed convolution operation can be thought of as the gradient of {\\em\n    some\\/} convolution with respect to its input, which is usually how\n    transposed convolutions are implemented in practice.}\n\nFinally note that it is always possible to emulate a transposed convolution with\na direct convolution. The disadvantage is that it usually involves adding many\ncolumns and rows of zeros to the input, resulting in a much less efficient\nimplementation.\n\nBuilding on what has been introduced so far, this chapter will proceed somewhat\nbackwards with respect to the convolution arithmetic chapter, deriving the\nproperties of each transposed convolution by referring to the direct\nconvolution with which it shares the kernel, and defining the equivalent direct\nconvolution.\n\n\\section{No zero padding, unit strides, transposed}\n\nThe simplest way to think about a transposed convolution on a given input is to\nimagine such an input as being the result of a direct convolution applied on\nsome initial feature map. 
The transposed convolution can then be considered as\nthe operation that recovers the \\emph{shape}~\\footnote{Note that the\n  transposed convolution is not guaranteed to recover the input itself, as it\n  is not defined as the inverse of the convolution, but rather just returns a\n  feature map that has the same width and height.} of this initial feature map.\n\nLet's consider the convolution of a $3 \\times 3$ kernel on a $4 \\times 4$\ninput with unit stride and no padding (i.e., $i = 4$, $k = 3$, $s = 1$ and\n$p = 0$). As depicted in \\autoref{fig:no_padding_no_strides}, this produces a\n$2 \\times 2$ output. The transpose of this convolution will then have an output\nof shape $4 \\times 4$ when applied on a $2 \\times 2$ input.\n\nAnother way to obtain the result of a transposed convolution is to apply an\nequivalent -- but much less efficient -- direct convolution. The example\ndescribed so far could be tackled by convolving a $3 \\times 3$ kernel over a\n$2 \\times 2$ input padded with a $2 \\times 2$ border of zeros using unit\nstrides (i.e., $i' = 2$, $k' = k$, $s' = 1$ and $p' = 2$), as shown in\n\\autoref{fig:no_padding_no_strides_transposed}. Notably, the kernel size and\nthe stride remain the same, but the input of the transposed convolution is\nnow zero padded.\\footnote{Note that although\n    equivalent to applying the transposed matrix, this visualization adds a lot\n    of zero multiplications in the form of zero padding.  This is done here for\n    illustration purposes, but it is inefficient, and software implementations\n    will normally not perform the useless zero multiplications.}\n\nOne way to understand the logic behind zero padding is to consider the\nconnectivity pattern of the transposed convolution and use it to guide the\ndesign of the equivalent convolution. 
For example, the top left pixel of the\ninput of the direct convolution only contributes to the top left pixel of the\noutput, the top right pixel is only connected to the top right output pixel,\nand so on.\n\nTo maintain the same connectivity pattern in the equivalent convolution it is\nnecessary to zero pad the input in such a way that the first (top-left)\napplication of the kernel only touches the top-left pixel, i.e., the padding\nhas to be equal to the size of the kernel minus one.\n\nProceeding in the same fashion it is possible to make similar observations\nfor the other elements of the image, giving rise to the following relationship:\n\n\\begin{relationship}\\label{rel:no_padding_no_strides_transposed}\nA convolution described by $s = 1$, $p = 0$ and $k$ has an associated\ntransposed convolution described by $k' = k$, $s' = s$ and $p' = k - 1$ and its\noutput size is\n\\begin{equation*}\n    o' = i' + (k - 1).\n\\end{equation*}\n\\end{relationship}\n\nInterestingly, this corresponds to a fully padded convolution with unit\nstrides.\n\n\\section{Zero padding, unit strides, transposed}\n\nKnowing that the transpose of a non-padded convolution is equivalent to\nconvolving a zero padded input, it would be reasonable to suppose that the\ntranspose of a zero padded convolution is equivalent to convolving an input\npadded with {\\em fewer\\/} zeros.\n\nIt is indeed the case, as shown in\n\\autoref{fig:arbitrary_padding_no_strides_transposed} for $i = 5$, $k = 4$ and\n$p = 2$.\n\nFormally, the following relationship applies for zero padded convolutions:\n\n\\begin{relationship}\\label{rel:arbitrary_padding_no_strides_transposed}\nA convolution described by $s = 1$, $k$ and $p$ has an\nassociated transposed convolution described by $k' = k$, $s' = s$ and $p' = k -\np - 1$ and its output size is\n\\begin{equation*}\n    o' = i' + (k - 1) - 2p.\n\\end{equation*}\n\\end{relationship}\n\n\\begin{figure}[p]\n    \\centering\n    
\\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_transposed_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_transposed_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_transposed_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_no_strides_transposed_03.pdf}\n    \\caption{\\label{fig:no_padding_no_strides_transposed} The transpose of\n        convolving a $3 \\times 3$ kernel over a $4 \\times 4$ input using unit\n        strides (i.e., $i = 4$, $k = 3$, $s = 1$ and $p = 0$). It is equivalent\n        to convolving a $3 \\times 3$ kernel over a $2 \\times 2$ input padded\n        with a $2 \\times 2$ border of zeros using unit strides (i.e., $i' = 2$,\n        $k' = k$, $s' = 1$ and $p' = 2$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_transposed_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_transposed_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_transposed_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/arbitrary_padding_no_strides_transposed_03.pdf}\n    \\caption{\\label{fig:arbitrary_padding_no_strides_transposed} The transpose\n        of convolving a $4 \\times 4$ kernel over a $5 \\times 5$ input padded\n        with a $2 \\times 2$ border of zeros using unit strides (i.e., $i = 5$,\n        $k = 4$, $s = 1$ and $p = 2$). 
It is equivalent to convolving a $4\n        \\times 4$ kernel over a $6 \\times 6$ input padded with a $1 \\times 1$\n        border of zeros using unit strides (i.e., $i' = 6$, $k' = k$, $s' = 1$\n        and $p' = 1$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_transposed_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_transposed_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_transposed_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/same_padding_no_strides_transposed_03.pdf}\n    \\caption{\\label{fig:same_padding_no_strides_transposed} The transpose of\n        convolving a $3 \\times 3$ kernel over a $5 \\times 5$ input using half\n        padding and unit strides (i.e., $i = 5$, $k = 3$, $s = 1$ and $p = 1$).\n        It is equivalent to convolving a $3 \\times 3$ kernel over a $5 \\times 5$\n        input using half padding and unit strides (i.e., $i' = 5$, $k' = k$, $s'\n        = 1$ and $p' = 1$).}\n\\end{figure}\n\n\\subsection{Half (same) padding, transposed}\n\nBy applying the same inductive reasoning as before, it is reasonable to expect\nthat the equivalent convolution of the transpose of a half padded convolution\nis itself a half padded convolution, given that the output size of a half\npadded convolution is the same as its input size. 
Thus the following relation\napplies:\n\n\\begin{relationship}\\label{rel:half_padding_no_strides_transposed}\nA convolution described by $k = 2n + 1, \\quad n \\in \\mathbb{N}$, $s = 1$ and $p\n= \\lfloor k / 2 \\rfloor = n$ has an associated transposed convolution described\nby $k' = k$, $s' = s$ and $p' = p$ and its output size is\n\\begin{equation*}\n\\begin{split}\n    o' &= i' + (k - 1) - 2p \\\\\n       &= i' + 2n - 2n \\\\\n       &= i'.\n\\end{split}\n\\end{equation*}\n\\end{relationship}\n\n\\autoref{fig:same_padding_no_strides_transposed} provides an example for $i =\n5$, $k = 3$ and (therefore) $p = 1$.\n\n\\subsection{Full padding, transposed}\n\nKnowing that the equivalent convolution of the transpose of a non-padded\nconvolution involves full padding, it is unsurprising that the equivalent of\nthe transpose of a fully padded convolution is a non-padded convolution:\n\n\\begin{relationship}\\label{rel:full_padding_no_strides_transposed}\nA convolution described by $s = 1$, $k$ and $p = k - 1$ has an\nassociated transposed convolution described by $k' = k$, $s' = s$ and $p' = 0$\nand its output size is\n\\begin{equation*}\n\\begin{split}\n    o' &= i' + (k - 1) - 2p \\\\\n       &= i' - (k - 1)\n\\end{split}\n\\end{equation*}\n\\end{relationship}\n\n\\autoref{fig:full_padding_no_strides_transposed} provides an example for $i =\n5$, $k = 3$ and (therefore) $p = 2$.\n\n\\section{No zero padding, non-unit strides, transposed}\n\nUsing the same kind of inductive logic as for zero padded convolutions, one\nmight expect that the transpose of a convolution with $s > 1$ involves an\nequivalent convolution with $s < 1$. 
As will be explained, this is a valid\nintuition, which is why transposed convolutions are sometimes called {\\em\nfractionally strided convolutions}.\n\n\\autoref{fig:no_padding_strides_transposed} provides an example for $i = 5$, $k\n= 3$ and $s = 2$ which helps understand what fractional strides involve: zeros\nare inserted {\\em between\\/} input units, which makes the kernel move around at\na slower pace than with unit strides.\\footnote{Doing so is inefficient and\n    real-world implementations avoid useless multiplications by zero, but\n    conceptually it is how the transpose of a strided convolution can be\n    thought of.}\n\nFor the moment, it will be assumed that the convolution is non-padded ($p = 0$)\nand that its input size $i$ is such that $i - k$ is a multiple of $s$. In that\ncase, the following relationship holds:\n\n\\begin{relationship}\\label{rel:no_padding_strides_transposed}\nA convolution described by $p = 0$, $k$ and $s$ and whose input\nsize is such that $i - k$ is a multiple of $s$, has an associated transposed\nconvolution described by $\\tilde{i}'$, $k' = k$, $s' = 1$ and $p' = k - 1$,\nwhere $\\tilde{i}'$ is the size of the stretched input obtained by adding\n$s - 1$ zeros between each input unit, and its output size is\n\\begin{equation*}\n\\begin{split}\n    o' = s (i' - 1) + k.\n\\end{split}\n\\end{equation*}\n\\end{relationship}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_transposed_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_transposed_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_transposed_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/full_padding_no_strides_transposed_03.pdf}\n    \\caption{\\label{fig:full_padding_no_strides_transposed} The transpose of\n        convolving a $3 \\times 3$ kernel over a $5 \\times 5$ input using full\n        padding and unit strides (i.e., 
$i = 5$, $k = 3$, $s = 1$ and $p = 2$).\n        It is equivalent to convolving a $3 \\times 3$ kernel over a $7 \\times 7$\n        input using unit strides (i.e., $i' = 7$, $k' = k$, $s' = 1$ and $p' =\n        0$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_transposed_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_transposed_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_transposed_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/no_padding_strides_transposed_03.pdf}\n    \\caption{\\label{fig:no_padding_strides_transposed} The transpose of\n        convolving a $3 \\times 3$ kernel over a $5 \\times 5$ input using $2\n        \\times 2$ strides (i.e., $i = 5$, $k = 3$, $s = 2$ and $p = 0$). It is\n        equivalent to convolving a $3 \\times 3$ kernel over a $2 \\times 2$ input\n        (with $1$ zero inserted between inputs) padded with a $2 \\times 2$\n        border of zeros using unit strides (i.e., $i' = 2$, $\\tilde{i}' = 3$, $k'\n        = k$, $s' = 1$ and $p' = 2$).}\n\\end{figure}\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_transposed_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_transposed_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_transposed_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_transposed_03.pdf}\n    \\caption{\\label{fig:padding_strides_transposed} The transpose of convolving\n        a $3 \\times 3$ kernel over a $5 \\times 5$ input padded with a $1 \\times\n        1$ border of zeros using $2 \\times 2$ strides (i.e., $i = 5$, $k = 3$, $s\n        = 2$ and $p = 1$). 
It is equivalent to convolving a $3 \\times 3$ kernel\n        over a $3 \\times 3$ input (with $1$ zero inserted between inputs) padded\n        with a $1 \\times 1$ border of zeros using unit strides (i.e., $i' = 3$,\n        $\\tilde{i}' = 5$, $k' = k$, $s' = 1$ and $p' = 1$).}\n\\end{figure}\n\n\\section{Zero padding, non-unit strides, transposed}\n\nWhen the convolution's input size $i$ is such that $i + 2p - k$ is a multiple\nof $s$, the analysis can be extended to the zero padded case by combining\n\\autoref{rel:arbitrary_padding_no_strides_transposed} and\n\\autoref{rel:no_padding_strides_transposed}:\n\n\\begin{relationship}\\label{rel:padding_strides_transposed}\nA convolution described by $k$, $s$ and $p$ and whose\ninput size $i$ is such that $i + 2p - k$ is a multiple of $s$ has an associated\ntransposed convolution described by $\\tilde{i}'$, $k' = k$, $s' = 1$ and\n$p' = k - p - 1$, where $\\tilde{i}'$ is the size of the stretched input\nobtained by adding $s - 1$ zeros between each input unit, and its output size\nis\n\\begin{equation*}\n\\begin{split}\n    o' = s (i' - 1) + k - 2p.\n\\end{split}\n\\end{equation*}\n\\end{relationship}\n\n\\autoref{fig:padding_strides_transposed} provides an example for $i = 5$, $k =\n3$, $s = 2$ and $p = 1$.\n\nThe constraint on the size of the input $i$ can be relaxed by introducing\nanother parameter $a \\in \\{0, \\ldots, s - 1\\}$ that makes it possible to\ndistinguish between the $s$ different cases that all lead to the same $i'$:\n\n\\begin{relationship}\\label{rel:padding_strides_transposed_odd}\nA convolution described by $k$, $s$ and $p$ has an\nassociated transposed convolution described by $a$, $\\tilde{i}'$, $k' = k$, $s'\n= 1$ and $p' = k - p - 1$, where $\\tilde{i}'$ is the size of the stretched\ninput obtained by adding $s - 1$ zeros between each input unit, and $a = (i +\n2p - k) \\mod s$ represents the number of zeros added to the bottom and right edges\nof the input, and its output size 
is\n\\begin{equation*}\n\\begin{split}\n    o' = s (i' - 1) + a + k - 2p.\n\\end{split}\n\\end{equation*}\n\\end{relationship}\n\n\\autoref{fig:padding_strides_odd_transposed} provides an example for $i = 6$, $k\n= 3$, $s = 2$ and $p = 1$.\n\n\\begin{figure}[p]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_transposed_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_transposed_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_transposed_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/padding_strides_odd_transposed_03.pdf}\n    \\caption{\\label{fig:padding_strides_odd_transposed} The transpose of\n        convolving a $3 \\times 3$ kernel over a $6 \\times 6$ input padded with a\n        $1 \\times 1$ border of zeros using $2 \\times 2$ strides (i.e., $i = 6$,\n        $k = 3$, $s = 2$ and $p = 1$). It is equivalent to convolving a $3\n        \\times 3$ kernel over a $3 \\times 3$ input (with $1$ zero inserted\n        between inputs) padded with a $1 \\times 1$ border of zeros (with an\n        additional border of size $1$ added to the bottom and right edges) using\n        unit strides (i.e., $i' = 3$, $\\tilde{i}' = 5$, $a = 1$, $k' = k$, $s' =\n        1$ and $p' = 1$).}\n\\end{figure}\n\n\\chapter{Miscellaneous convolutions}\n\n\\section{Dilated convolutions}\n\nReaders familiar with the deep learning literature may have noticed the term\n``dilated convolutions'' (or ``atrous convolutions'', from the French expression\n{\\em convolutions \\`{a} trous}) appear in recent papers. Here we attempt to\nprovide an intuitive understanding of dilated convolutions. For a more in-depth\ndescription and to understand in what contexts they are applied, see\n\\citet{chen2014semantic,yu2015multi}.\n\nDilated convolutions ``inflate'' the kernel by inserting spaces between the\nkernel elements. The dilation ``rate'' is controlled by an additional\nhyperparameter $d$. 
Implementations may vary, but there are usually $d - 1$\nspaces inserted between kernel elements such that $d = 1$ corresponds to a\nregular convolution.\n\nDilated convolutions are used to cheaply increase the receptive field of output\nunits without increasing the kernel size, which is especially effective\nwhen multiple dilated convolutions are stacked one after another. For a\nconcrete example, see \\citet{oord2016wavenet}, in which the proposed WaveNet\nmodel implements an autoregressive generative model for raw audio which uses\ndilated convolutions to condition new audio frames on a large context of past\naudio frames.\n\nTo understand the relationship tying the dilation rate $d$ and the output size\n$o$, it is useful to think of the impact of $d$ on the {\\em effective kernel\nsize}. A kernel of size $k$ dilated by a factor $d$ has an effective size\n\\begin{equation*}\n    \\hat{k} = k + (k - 1)(d - 1).\n\\end{equation*}\nThis can be combined with \\autoref{rel:padding_strides} to form the following\nrelationship for dilated convolutions:\n\n\\begin{relationship}\\label{rel:dilation}\nFor any $i$, $k$, $p$ and $s$, and for a dilation rate $d$,\n\\begin{equation*}\n    o = \\left\\lfloor \\frac{i + 2p - k - (k - 1)(d - 1)}{s} \\right\\rfloor + 1.\n\\end{equation*}\n\\end{relationship}\n\n\\begin{figure}[h]\n    \\centering\n    \\includegraphics[width=0.24\\textwidth]{pdf/dilation_00.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/dilation_01.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/dilation_02.pdf}\n    \\includegraphics[width=0.24\\textwidth]{pdf/dilation_03.pdf}\n    \\caption{\\label{fig:dilation} (Dilated convolution)\n        Convolving a $3 \\times 3$ kernel over a $7 \\times 7$ input with a\n        dilation factor of 2 (i.e., $i = 7$, $k = 3$, $d = 2$, $s = 1$ and\n        $p = 0$).}\n\\end{figure}\n\n\\noindent \\autoref{fig:dilation} provides an example for $i = 7$, $k = 3$ and\n$d = 
2$.\n\n\\bibliography{bibliography}\n\\bibliographystyle{natbib}\n\\end{document}\n"
  },
  {
    "path": "gif/.gitignore",
    "content": "# Ignore everything in this directory\n*\n# Except this file\n!.gitignore\n"
  },
  {
    "path": "natbib/natbib.bst",
    "content": "%% \n%% This is file `natbib.bst', generated \n%% on <1994/9/16> with the docstrip utility (2.2h).\n%% \n%% The original source files were:\n%% \n%% genbst.mbs  (with options: `ay,nat,seq-lab,nm-rev,dt-beg,yr-par,vol-bf,\n%%                             volp-com,etal-it')\n%% ---------------------------------------- \n%% *** Personal bib style, PWD *** \n%% \n%% (Here are the specifications of the source file)\n%% \\ProvidesFile{genbst.mbs}[1994/09/16 1.5 (PWD)]\n%%   For use with BibTeX version 0.99a or later\n%%     and with LaTeX 2.09 or 2e\n%%-------------------------------------------------------------------\n%% NOTICE:\n%% This file may be used for non-profit purposes.\n%% It may not be distributed in exchange for money,\n%%   other than distribution costs.\n%%\n%% The author provides it `as is' and does not guarantee it in any way.\n%%\n%% Copyright (C) 1994 Patrick W. Daly\n%% Max-Planck-Institut f\\\"ur Aeronomie\n%% Postfach 20\n%% D-37189 Katlenburg-Lindau\n%% Germany\n%%\n%% E-mail:\n%% SPAN--     nsp::linmpi::daly    (note nsp also known as ecd1)\n%% Internet-- daly@linmpi.dnet.gwdg.de\n%%-----------------------------------------------------------\n%% \\CharacterTable\n%%  {Upper-case    \\A\\B\\C\\D\\E\\F\\G\\H\\I\\J\\K\\L\\M\\N\\O\\P\\Q\\R\\S\\T\\U\\V\\W\\X\\Y\\Z\n%%   Lower-case    \\a\\b\\c\\d\\e\\f\\g\\h\\i\\j\\k\\l\\m\\n\\o\\p\\q\\r\\s\\t\\u\\v\\w\\x\\y\\z\n%%   Digits        \\0\\1\\2\\3\\4\\5\\6\\7\\8\\9\n%%   Exclamation   \\!     Double quote  \\\"     Hash (number) \\#\n%%   Dollar        \\$     Percent       \\%     Ampersand     \\&\n%%   Acute accent  \\'     Left paren    \\(     Right paren   \\)\n%%   Asterisk      \\*     Plus          \\+     Comma         \\,\n%%   Minus         \\-     Point         \\.     
Solidus       \\/\n%%   Colon         \\:     Semicolon     \\;     Less than     \\<\n%%   Equals        \\=     Greater than  \\>     Question mark \\?\n%%   Commercial at \\@     Left bracket  \\[     Backslash     \\\\\n%%   Right bracket \\]     Circumflex    \\^     Underscore    \\_\n%%   Grave accent  \\`     Left brace    \\{     Vertical bar  \\|\n%%   Right brace   \\}     Tilde         \\~}\n%%---------------------------------------------------------------------\n % This is an author-year citation style bibliography. As such, it is\n % non-standard LaTeX, and requires a special package file to function properly.\n % Such a package is    natbib.sty   by Patrick W. Daly\n % The form of the \\bibitem entries is\n %   \\bibitem[Jones et al.(1990)]{key}...\n %   \\bibitem[Jones et al.(1990)Jones, Baker, and Smith]{key}...\n % The essential feature is that the label (the part in brackets) consists\n % of the author names, as they should appear in the citation, with the year\n % in parentheses following. There must be no space before the opening\n % parenthesis!\n % With natbib v5.3, a full list of authors may also follow the year.\n % In natbib.sty, it is possible to define the type of enclosures that is\n % really wanted (brackets or parentheses), but in either case, there must\n % be parentheses in the label.\n % The \\cite command functions as follows:\n %   \\cite{key} ==>>                Jones et al. (1990)\n %   \\cite[]{key} ==>>              (Jones et al., 1990)\n %   \\cite[chap. 2]{key} ==>>       (Jones et al., 1990, chap. 2)\n %   \\cite[e.g.][]{key} ==>>        (e.g. Jones et al., 1990)\n %   \\cite[e.g.][p. 32]{key} ==>>   (e.g. Jones et al., p. 
32)\n %   \\citeauthor{key}               Jones et al.\n %   \\citefullauthor{key}           Jones, Baker, and Smith\n %   \\citeyear{key}                 1990\n%%---------------------------------------------------------------------\n\nENTRY\n  { address\n    author\n    booktitle\n    chapter\n    edition\n    editor\n    howpublished\n    institution\n    journal\n    key\n    month\n    note\n    number\n    organization\n    pages\n    publisher\n    school\n    series\n    title\n    type\n    volume\n    year\n  }\n  {}\n  { label extra.label sort.label }\n\nINTEGERS { output.state before.all mid.sentence after.sentence after.block }\n\nFUNCTION {init.state.consts}\n{ #0 'before.all :=\n  #1 'mid.sentence :=\n  #2 'after.sentence :=\n  #3 'after.block :=\n}\n\nSTRINGS { s t }\n\nFUNCTION {output.nonnull}\n{ 's :=\n  output.state mid.sentence =\n    { \", \" * write$ }\n    { output.state after.block =\n        { add.period$ write$\n          newline$\n          \"\\newblock \" write$\n        }\n        { output.state before.all =\n            'write$\n            { add.period$ \" \" * write$ }\n          if$\n        }\n      if$\n      mid.sentence 'output.state :=\n    }\n  if$\n  s\n}\n\nFUNCTION {output}\n{ duplicate$ empty$\n    'pop$\n    'output.nonnull\n  if$\n}\n\nFUNCTION {output.check}\n{ 't :=\n  duplicate$ empty$\n    { pop$ \"empty \" t * \" in \" * cite$ * warning$ }\n    'output.nonnull\n  if$\n}\n\nFUNCTION {fin.entry}\n{ add.period$\n  write$\n  newline$\n}\n\nFUNCTION {new.block}\n{ output.state before.all =\n    'skip$\n    { after.block 'output.state := }\n  if$\n}\n\nFUNCTION {new.sentence}\n{ output.state after.block =\n    'skip$\n    { output.state before.all =\n        'skip$\n        { after.sentence 'output.state := }\n      if$\n    }\n  if$\n}\n\nFUNCTION {not}\n{   { #0 }\n    { #1 }\n  if$\n}\n\nFUNCTION {and}\n{   'skip$\n    { pop$ #0 }\n  if$\n}\n\nFUNCTION {or}\n{   { pop$ #1 }\n    'skip$\n  if$\n}\n\nFUNCTION 
{non.stop}\n{ duplicate$\n   \"}\" * add.period$\n   #-1 #1 substring$ \".\" =\n}\n\nFUNCTION {new.block.checkb}\n{ empty$\n  swap$ empty$\n  and\n    'skip$\n    'new.block\n  if$\n}\n\nFUNCTION {field.or.null}\n{ duplicate$ empty$\n    { pop$ \"\" }\n    'skip$\n  if$\n}\n\nFUNCTION {emphasize}\n{ duplicate$ empty$\n    { pop$ \"\" }\n    { \"{\\em \" swap$ * non.stop\n        { \"\\/}\" * }\n        { \"}\" * }\n      if$\n    }\n  if$\n}\n\nFUNCTION {bolden}\n{ duplicate$ empty$\n    { pop$ \"\" }\n    { \"{\\bf \" swap$ * \"}\" * }\n  if$\n}\n\nINTEGERS { nameptr namesleft numnames }\n\nFUNCTION {format.names}\n{ 's :=\n  #1 'nameptr :=\n  s num.names$ 'numnames :=\n  numnames 'namesleft :=\n    { namesleft #0 > }\n    { s nameptr\n      \"{vv~}{ll}{, jj}{, f.}\" format.name$ 't :=\n      nameptr #1 >\n        {\n          namesleft #1 >\n            { \", \" * t * }\n            {\n              numnames #2 >\n                { \",\" * }\n                'skip$\n              if$\n              t \"others\" =\n                { \" \" * \"et~al.\" emphasize * }\n                { \" and \" * t * }\n              if$\n            }\n          if$\n        }\n        't\n      if$\n      nameptr #1 + 'nameptr :=\n      namesleft #1 - 'namesleft :=\n    }\n  while$\n}\n\nFUNCTION {format.names.ed}\n{ 's :=\n  #1 'nameptr :=\n  s num.names$ 'numnames :=\n  numnames 'namesleft :=\n    { namesleft #0 > }\n    { s nameptr\n      \"{f.~}{vv~}{ll}{, jj}\"\n      format.name$ 't :=\n      nameptr #1 >\n        {\n          namesleft #1 >\n            { \", \" * t * }\n            {\n              numnames #2 >\n                { \",\" * }\n                'skip$\n              if$\n              t \"others\" =\n                { \" \" * \"et~al.\" emphasize * }\n                { \" and \" * t * }\n              if$\n            }\n          if$\n        }\n        't\n      if$\n      nameptr #1 + 'nameptr :=\n      namesleft #1 - 'namesleft :=\n    }\n  
while$\n}\n\nFUNCTION {format.key}\n{ empty$\n    { key field.or.null }\n    { \"\" }\n  if$\n}\n\nFUNCTION {format.authors}\n{ author empty$\n    { \"\" }\n    { author format.names }\n  if$\n}\n\nFUNCTION {format.editors}\n{ editor empty$\n    { \"\" }\n    { editor format.names\n      editor num.names$ #1 >\n        { \", editors\" * }\n        { \", editor\" * }\n      if$\n    }\n  if$\n}\n\nFUNCTION {format.in.editors}\n{ editor empty$\n    { \"\" }\n    { editor format.names.ed\n      editor num.names$ #1 >\n        { \", editors\" * }\n        { \", editor\" * }\n      if$\n    }\n  if$\n}\n\nFUNCTION {format.title}\n{ title empty$\n    { \"\" }\n    { title \"t\" change.case$\n    }\n  if$\n}\n\nFUNCTION {format.full.names}\n{'s :=\n  #1 'nameptr :=\n  s num.names$ 'numnames :=\n  numnames 'namesleft :=\n    { namesleft #0 > }\n    { s nameptr\n      \"{vv~}{ll}\" format.name$ 't :=\n      nameptr #1 >\n        {\n          namesleft #1 >\n            { \", \" * t * }\n            {\n              numnames #2 >\n                { \",\" * }\n                'skip$\n              if$\n              t \"others\" =\n                { \" \" * \"et~al.\" emphasize * }\n                { \" and \" * t * }\n              if$\n            }\n          if$\n        }\n        't\n      if$\n      nameptr #1 + 'nameptr :=\n      namesleft #1 - 'namesleft :=\n    }\n  while$\n}\n\nFUNCTION {author.editor.key.full}\n{ author empty$\n    { editor empty$\n        { key empty$\n            { cite$ #1 #3 substring$ }\n            'key\n          if$\n        }\n        { editor format.full.names }\n      if$\n    }\n    { author format.full.names }\n  if$\n}\n\nFUNCTION {author.key.full}\n{ author empty$\n    { key empty$\n         { cite$ #1 #3 substring$ }\n          'key\n      if$\n    }\n    { author format.full.names }\n  if$\n}\n\nFUNCTION {editor.key.full}\n{ editor empty$\n    { key empty$\n         { cite$ #1 #3 substring$ }\n          'key\n      if$\n    }\n    
{ editor format.full.names }\n  if$\n}\n\nFUNCTION {make.full.names}\n{ type$ \"book\" =\n  type$ \"inbook\" =\n  or\n    'author.editor.key.full\n    { type$ \"proceedings\" =\n        'editor.key.full\n        'author.key.full\n      if$\n    }\n  if$\n}\n\nFUNCTION {output.bibitem}\n{ newline$\n  \"\\bibitem[\" write$\n  label write$\n  \")\" make.full.names * \"]{\" * write$\n  cite$ write$\n  \"}\" write$\n  newline$\n  \"\"\n  before.all 'output.state :=\n}\n\nFUNCTION {n.dashify}\n{ 't :=\n  \"\"\n    { t empty$ not }\n    { t #1 #1 substring$ \"-\" =\n        { t #1 #2 substring$ \"--\" = not\n            { \"--\" *\n              t #2 global.max$ substring$ 't :=\n            }\n            {   { t #1 #1 substring$ \"-\" = }\n                { \"-\" *\n                  t #2 global.max$ substring$ 't :=\n                }\n              while$\n            }\n          if$\n        }\n        { t #1 #1 substring$ *\n          t #2 global.max$ substring$ 't :=\n        }\n      if$\n    }\n  while$\n}\n\nFUNCTION {word.in}\n{ \"In \" }\n\nFUNCTION {format.date}\n{ year duplicate$ empty$\n    { \"empty year in \" cite$ * \"; set to ????\" * warning$\n       pop$ \"????\" }\n    'skip$\n  if$\n  before.all 'output.state :=\n  \" (\" swap$ * extra.label * \")\" *\n}\n\nFUNCTION {format.btitle}\n{ title emphasize\n}\n\nFUNCTION {tie.or.space.connect}\n{ duplicate$ text.length$ #3 <\n    { \"~\" }\n    { \" \" }\n  if$\n  swap$ * *\n}\n\nFUNCTION {either.or.check}\n{ empty$\n    'pop$\n    { \"can't use both \" swap$ * \" fields in \" * cite$ * warning$ }\n  if$\n}\n\nFUNCTION {format.bvolume}\n{ volume empty$\n    { \"\" }\n    { \"volume\" volume tie.or.space.connect\n      series empty$\n        'skip$\n        { \" of \" * series emphasize * }\n      if$\n      \"volume and number\" number either.or.check\n    }\n  if$\n}\n\nFUNCTION {format.number.series}\n{ volume empty$\n    { number empty$\n        { series field.or.null }\n        { output.state 
mid.sentence =\n            { \"number\" }\n            { \"Number\" }\n          if$\n          number tie.or.space.connect\n          series empty$\n            { \"there's a number but no series in \" cite$ * warning$ }\n            { \" in \" * series * }\n          if$\n        }\n      if$\n    }\n    { \"\" }\n  if$\n}\n\nFUNCTION {format.edition}\n{ edition empty$\n    { \"\" }\n    { output.state mid.sentence =\n        { edition \"l\" change.case$ \" edition\" * }\n        { edition \"t\" change.case$ \" edition\" * }\n      if$\n    }\n  if$\n}\n\nINTEGERS { multiresult }\n\nFUNCTION {multi.page.check}\n{ 't :=\n  #0 'multiresult :=\n    { multiresult not\n      t empty$ not\n      and\n    }\n    { t #1 #1 substring$\n      duplicate$ \"-\" =\n      swap$ duplicate$ \",\" =\n      swap$ \"+\" =\n      or or\n        { #1 'multiresult := }\n        { t #2 global.max$ substring$ 't := }\n      if$\n    }\n  while$\n  multiresult\n}\n\nFUNCTION {format.pages}\n{ pages empty$\n    { \"\" }\n    { pages multi.page.check\n        { \"pages\" pages n.dashify tie.or.space.connect }\n        { \"page\" pages tie.or.space.connect }\n      if$\n    }\n  if$\n}\n\nFUNCTION {format.vol.num.pages}\n{ volume field.or.null\n  bolden\n  number empty$\n    'skip$\n    { \"(\" number * \")\" * *\n      volume empty$\n        { \"there's a number but no volume in \" cite$ * warning$ }\n        'skip$\n      if$\n    }\n  if$\n  pages empty$\n    'skip$\n    { duplicate$ empty$\n        { pop$ format.pages }\n        { \", \" * pages n.dashify * }\n      if$\n    }\n  if$\n}\n\nFUNCTION {format.chapter.pages}\n{ chapter empty$\n    'format.pages\n    { type empty$\n        { \"chapter\" }\n        { type \"l\" change.case$ }\n      if$\n      chapter tie.or.space.connect\n      pages empty$\n        'skip$\n        { \", \" * format.pages * }\n      if$\n    }\n  if$\n}\n\nFUNCTION {format.in.ed.booktitle}\n{ booktitle empty$\n    { \"\" }\n    { editor empty$\n        { 
word.in booktitle emphasize * }\n        { word.in format.in.editors * \", \" * booktitle emphasize * }\n      if$\n    }\n  if$\n}\n\nFUNCTION {format.thesis.type}\n{ type empty$\n    'skip$\n    { pop$\n      type \"t\" change.case$\n    }\n  if$\n}\n\nFUNCTION {format.tr.number}\n{ type empty$\n    { \"Technical Report\" }\n    'type\n  if$\n  number empty$\n    { \"t\" change.case$ }\n    { number tie.or.space.connect }\n  if$\n}\n\nFUNCTION {format.article.crossref}\n{\n  word.in\n  \"\\cite{\" * crossref * \"}\" *\n}\n\nFUNCTION {format.book.crossref}\n{ volume empty$\n    { \"empty volume in \" cite$ * \"'s crossref of \" * crossref * warning$\n      word.in\n    }\n    { \"Volume\" volume tie.or.space.connect\n      \" of \" *\n    }\n  if$\n  \"\\cite{\" * crossref * \"}\" *\n}\n\nFUNCTION {format.incoll.inproc.crossref}\n{\n  word.in\n  \"\\cite{\" * crossref * \"}\" *\n}\n\nFUNCTION {article}\n{ output.bibitem\n  format.authors \"author\" output.check\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.title \"title\" output.check\n  new.block\n  crossref missing$\n    { journal emphasize \"journal\" output.check\n      format.vol.num.pages output\n    }\n    { format.article.crossref output.nonnull\n      format.pages output\n    }\n  if$\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {book}\n{ output.bibitem\n  author empty$\n    { format.editors \"author and editor\" output.check\n      editor format.key output\n    }\n    { format.authors output.nonnull\n      crossref missing$\n        { \"author and editor\" editor either.or.check }\n        'skip$\n      if$\n    }\n  if$\n  format.date \"year\" output.check\n  new.block\n  format.btitle \"title\" output.check\n  crossref missing$\n    { format.bvolume output\n      new.block\n      format.number.series output\n      new.sentence\n      publisher \"publisher\" output.check\n      address output\n    }\n    {\n      new.block\n      format.book.crossref 
output.nonnull\n    }\n  if$\n  format.edition output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {booklet}\n{ output.bibitem\n  format.authors output\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.title \"title\" output.check\n  new.block\n  howpublished output\n  address output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {inbook}\n{ output.bibitem\n  author empty$\n    { format.editors \"author and editor\" output.check\n      editor format.key output\n    }\n    { format.authors output.nonnull\n      crossref missing$\n        { \"author and editor\" editor either.or.check }\n        'skip$\n      if$\n    }\n  if$\n  format.date \"year\" output.check\n  new.block\n  format.btitle \"title\" output.check\n  crossref missing$\n    { format.bvolume output\n      format.chapter.pages \"chapter and pages\" output.check\n      new.block\n      format.number.series output\n      new.sentence\n      publisher \"publisher\" output.check\n      address output\n    }\n    { format.chapter.pages \"chapter and pages\" output.check\n      new.block\n      format.book.crossref output.nonnull\n    }\n  if$\n  format.edition output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {incollection}\n{ output.bibitem\n  format.authors \"author\" output.check\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.title \"title\" output.check\n  new.block\n  crossref missing$\n    { format.in.ed.booktitle \"booktitle\" output.check\n      format.bvolume output\n      format.number.series output\n      format.chapter.pages output\n      new.sentence\n      publisher \"publisher\" output.check\n      address output\n      format.edition output\n    }\n    { format.incoll.inproc.crossref output.nonnull\n      format.chapter.pages output\n    }\n  if$\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {inproceedings}\n{ output.bibitem\n  format.authors \"author\" output.check\n  
author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.title \"title\" output.check\n  new.block\n  crossref missing$\n    { format.in.ed.booktitle \"booktitle\" output.check\n      format.bvolume output\n      format.number.series output\n      format.pages output\n      address output\n      new.sentence\n      organization output\n      publisher output\n    }\n    { format.incoll.inproc.crossref output.nonnull\n      format.pages output\n    }\n  if$\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {conference} { inproceedings }\n\nFUNCTION {manual}\n{ output.bibitem\n  format.authors output\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.btitle \"title\" output.check\n  organization address new.block.checkb\n  organization output\n  address output\n  format.edition output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {mastersthesis}\n{ output.bibitem\n  format.authors \"author\" output.check\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.btitle \"title\" output.check\n  new.block\n  \"Master's thesis\" format.thesis.type output.nonnull\n  school \"school\" output.check\n  address output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {misc}\n{ output.bibitem\n  format.authors output\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.title output\n  new.block\n  howpublished output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {phdthesis}\n{ output.bibitem\n  format.authors \"author\" output.check\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.btitle \"title\" output.check\n  new.block\n  \"Ph.D. 
thesis\" format.thesis.type output.nonnull\n  school \"school\" output.check\n  address output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {proceedings}\n{ output.bibitem\n  format.editors output\n  editor format.key output\n  format.date \"year\" output.check\n  new.block\n  format.btitle \"title\" output.check\n  format.bvolume output\n  format.number.series output\n  address output\n  new.sentence\n  organization output\n  publisher output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {techreport}\n{ output.bibitem\n  format.authors \"author\" output.check\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.title \"title\" output.check\n  new.block\n  format.tr.number output.nonnull\n  institution \"institution\" output.check\n  address output\n  new.block\n  note output\n  fin.entry\n}\n\nFUNCTION {unpublished}\n{ output.bibitem\n  format.authors \"author\" output.check\n  author format.key output\n  format.date \"year\" output.check\n  new.block\n  format.title \"title\" output.check\n  new.block\n  note \"note\" output.check\n  fin.entry\n}\n\nFUNCTION {default.type} { misc }\n\nMACRO {jan} {\"January\"}\n\nMACRO {feb} {\"February\"}\n\nMACRO {mar} {\"March\"}\n\nMACRO {apr} {\"April\"}\n\nMACRO {may} {\"May\"}\n\nMACRO {jun} {\"June\"}\n\nMACRO {jul} {\"July\"}\n\nMACRO {aug} {\"August\"}\n\nMACRO {sep} {\"September\"}\n\nMACRO {oct} {\"October\"}\n\nMACRO {nov} {\"November\"}\n\nMACRO {dec} {\"December\"}\n\nMACRO {acmcs} {\"ACM Computing Surveys\"}\n\nMACRO {acta} {\"Acta Informatica\"}\n\nMACRO {cacm} {\"Communications of the ACM\"}\n\nMACRO {ibmjrd} {\"IBM Journal of Research and Development\"}\n\nMACRO {ibmsj} {\"IBM Systems Journal\"}\n\nMACRO {ieeese} {\"IEEE Transactions on Software Engineering\"}\n\nMACRO {ieeetc} {\"IEEE Transactions on Computers\"}\n\nMACRO {ieeetcad}\n {\"IEEE Transactions on Computer-Aided Design of Integrated Circuits\"}\n\nMACRO {ipl} {\"Information Processing 
Letters\"}\n\nMACRO {jacm} {\"Journal of the ACM\"}\n\nMACRO {jcss} {\"Journal of Computer and System Sciences\"}\n\nMACRO {scp} {\"Science of Computer Programming\"}\n\nMACRO {sicomp} {\"SIAM Journal on Computing\"}\n\nMACRO {tocs} {\"ACM Transactions on Computer Systems\"}\n\nMACRO {tods} {\"ACM Transactions on Database Systems\"}\n\nMACRO {tog} {\"ACM Transactions on Graphics\"}\n\nMACRO {toms} {\"ACM Transactions on Mathematical Software\"}\n\nMACRO {toois} {\"ACM Transactions on Office Information Systems\"}\n\nMACRO {toplas} {\"ACM Transactions on Programming Languages and Systems\"}\n\nMACRO {tcs} {\"Theoretical Computer Science\"}\n\nREAD\n\nFUNCTION {sortify}\n{ purify$\n  \"l\" change.case$\n}\n\nINTEGERS { len }\n\nFUNCTION {chop.word}\n{ 's :=\n  'len :=\n  s #1 len substring$ =\n    { s len #1 + global.max$ substring$ }\n    's\n  if$\n}\n\nFUNCTION {format.lab.names}\n{ 's :=\n  s #1 \"{vv~}{ll}\" format.name$\n  s num.names$ duplicate$\n  #2 >\n    { pop$ \" \" * \"et~al.\" emphasize * }\n    { #2 <\n        'skip$\n        { s #2 \"{ff }{vv }{ll}{ jj}\" format.name$ \"others\" =\n            { \" \" * \"et~al.\" emphasize * }\n            { \" and \" * s #2 \"{vv~}{ll}\" format.name$ * }\n          if$\n        }\n      if$\n    }\n  if$\n}\n\nFUNCTION {author.key.label}\n{ author empty$\n    { key empty$\n        { cite$ #1 #3 substring$ }\n        'key\n      if$\n    }\n    { author format.lab.names }\n  if$\n}\n\nFUNCTION {author.editor.key.label}\n{ author empty$\n    { editor empty$\n        { key empty$\n            { cite$ #1 #3 substring$ }\n            'key\n          if$\n        }\n        { editor format.lab.names }\n      if$\n    }\n    { author format.lab.names }\n  if$\n}\n\nFUNCTION {editor.key.label}\n{ editor empty$\n    { key empty$\n        { cite$ #1 #3 substring$ }\n        'key\n      if$\n    }\n    { editor format.lab.names }\n  if$\n}\n\nFUNCTION {calc.label}\n{ type$ \"book\" =\n  type$ \"inbook\" =\n  or\n    
'author.editor.key.label\n    { type$ \"proceedings\" =\n        'editor.key.label\n        'author.key.label\n      if$\n    }\n  if$\n  \"(\"\n  *\n  year duplicate$ empty$\n     { pop$ \"????\" }\n     { purify$ #-1 #4 substring$ }\n  if$\n  *\n  'label :=\n}\n\nFUNCTION {sort.format.names}\n{ 's :=\n  #1 'nameptr :=\n  \"\"\n  s num.names$ 'numnames :=\n  numnames 'namesleft :=\n    { namesleft #0 > }\n    { nameptr #1 >\n        { \"   \" * }\n        'skip$\n      if$\n      s nameptr\n      \"{vv{ } }{ll{ }}{  f{ }}{  jj{ }}\"\n      format.name$ 't :=\n      nameptr numnames = t \"others\" = and\n        { \"et al\" * }\n        { numnames #2 > nameptr #2 = and\n          { \"zzzzzz\" * #1 'namesleft := }\n          { t sortify * }\n        if$\n        }\n      if$\n      nameptr #1 + 'nameptr :=\n      namesleft #1 - 'namesleft :=\n    }\n  while$\n}\n\nFUNCTION {sort.format.title}\n{ 't :=\n  \"A \" #2\n    \"An \" #3\n      \"The \" #4 t chop.word\n    chop.word\n  chop.word\n  sortify\n  #1 global.max$ substring$\n}\n\nFUNCTION {author.sort}\n{ author empty$\n    { key empty$\n        { \"to sort, need author or key in \" cite$ * warning$\n          \"\"\n        }\n        { key sortify }\n      if$\n    }\n    { author sort.format.names }\n  if$\n}\n\nFUNCTION {author.editor.sort}\n{ author empty$\n    { editor empty$\n        { key empty$\n            { \"to sort, need author, editor, or key in \" cite$ * warning$\n              \"\"\n            }\n            { key sortify }\n          if$\n        }\n        { editor sort.format.names }\n      if$\n    }\n    { author sort.format.names }\n  if$\n}\n\nFUNCTION {editor.sort}\n{ editor empty$\n    { key empty$\n        { \"to sort, need editor or key in \" cite$ * warning$\n          \"\"\n        }\n        { key sortify }\n      if$\n    }\n    { editor sort.format.names }\n  if$\n}\n\nFUNCTION {presort}\n{ calc.label\n  label sortify\n  \"    \"\n  *\n  type$ \"book\" =\n  type$ \"inbook\" =\n  
or\n    'author.editor.sort\n    { type$ \"proceedings\" =\n        'editor.sort\n        'author.sort\n      if$\n    }\n  if$\n  #1 entry.max$ substring$\n  'sort.label :=\n  sort.label\n  *\n  \"    \"\n  *\n  title field.or.null\n  sort.format.title\n  *\n  #1 entry.max$ substring$\n  'sort.key$ :=\n}\n\nITERATE {presort}\n\nSORT\n\nSTRINGS { last.label next.extra }\n\nINTEGERS { last.extra.num }\n\nFUNCTION {initialize.extra.label.stuff}\n{ #0 int.to.chr$ 'last.label :=\n  \"\" 'next.extra :=\n  #0 'last.extra.num :=\n}\n\nFUNCTION {forward.pass}\n{ last.label label =\n    { last.extra.num #1 + 'last.extra.num :=\n      last.extra.num int.to.chr$ 'extra.label :=\n    }\n    { \"a\" chr.to.int$ 'last.extra.num :=\n      \"\" 'extra.label :=\n      label 'last.label :=\n    }\n  if$\n}\n\nFUNCTION {reverse.pass}\n{ next.extra \"b\" =\n    { \"a\" 'extra.label := }\n    'skip$\n  if$\n  extra.label 'next.extra :=\n  label extra.label * 'label :=\n}\n\nEXECUTE {initialize.extra.label.stuff}\n\nITERATE {forward.pass}\n\nREVERSE {reverse.pass}\n\nFUNCTION {bib.sort.order}\n{ sort.label\n  \"    \"\n  *\n  year field.or.null sortify\n  *\n  \"    \"\n  *\n  title field.or.null\n  sort.format.title\n  *\n  #1 entry.max$ substring$\n  'sort.key$ :=\n}\n\nITERATE {bib.sort.order}\n\nSORT\n\nFUNCTION {begin.bib}\n{ preamble$ empty$\n    'skip$\n    { preamble$ write$ newline$ }\n  if$\n  \"\\begin{thebibliography}{}\" write$ newline$\n}\n\nEXECUTE {begin.bib}\n\nEXECUTE {init.state.consts}\n\nITERATE {call.type$}\n\nFUNCTION {end.bib}\n{ newline$\n  \"\\end{thebibliography}\" write$ newline$\n}\n\nEXECUTE {end.bib}\n%% End of customized bst file \n"
  },
  {
    "path": "pdf/.gitignore",
    "content": "# Ignore everything in this directory\n*\n# Except this file\n!.gitignore\n"
  },
  {
    "path": "png/.gitignore",
    "content": "# Ignore everything in this directory\n*\n# Except this file\n!.gitignore\n"
  },
  {
    "path": "templates/arithmetic_figure.txt",
    "content": "\\documentclass[class=minimal,border=10pt]{{standalone}}\n\\usepackage{{tikz}}\n\\usepackage{{xcolor}}\n\\definecolor{{blue}}{{RGB}}{{38,139,210}}\n\\definecolor{{cyan}}{{RGB}}{{42,161,152}}\n\\definecolor{{base01}}{{RGB}}{{88,110,117}}\n\\definecolor{{base02}}{{RGB}}{{7,54,66}}\n\\definecolor{{base03}}{{RGB}}{{0,43,54}}\n\\usetikzlibrary{{calc,shapes,positioning}}\n\\begin{{document}}\n\\begin{{tikzpicture}}[scale=.5,every node/.style={{minimum size=1cm}},on grid]\n    \\begin{{scope}}[node/.append style={{yslant=0.5,xslant=-0.7}},\n                    yslant=0.5,xslant=-0.7]\n        \\draw[step=10mm, base03, dashed, thick] (0,0) grid ({PADDING_TO});\n        {INPUT_UNITS}\n\n        \\foreach \\x in {{ {INPUT_GRID_FROM_X},\\number\\numexpr {INPUT_GRID_FROM_X}+{DILATION},...,\\number\\numexpr {INPUT_GRID_TO_X}-1 }} {{\n            \\foreach \\y in {{ {INPUT_GRID_FROM_Y},\\number\\numexpr {INPUT_GRID_FROM_Y}+{DILATION},...,\\number\\numexpr {INPUT_GRID_TO_Y}-1 }} {{\n                \\draw[fill=base02, opacity=0.4] (\\x,\\y) rectangle\n                                        (\\x+1,\\y+1);\n            }}\n        }}\n        \\draw[step=10mm, base03, thick] ({INPUT_GRID_FROM_X}, {INPUT_GRID_FROM_Y}) grid\n                                        ({INPUT_GRID_TO_X}, {INPUT_GRID_TO_Y});\n        \\coordinate (BL) at ({INPUT_GRID_FROM_X},{INPUT_GRID_FROM_Y});\n        \\coordinate (BR) at ({INPUT_GRID_TO_X},{INPUT_GRID_FROM_Y});\n        \\coordinate (TL) at ({INPUT_GRID_FROM_X},{INPUT_GRID_TO_Y});\n        \\coordinate (TR) at ({INPUT_GRID_TO_X},{INPUT_GRID_TO_Y});\n    \\end{{scope}}\n    \\begin{{scope}}[xshift=-5, yshift={OUTPUT_ELEVATION},\n                    every node/.append style={{yslant=0.5,xslant=-0.7}},\n                    yslant=0.5,xslant=-0.7]\n        \\draw (BL) -- ({OUTPUT_BOTTOM_LEFT}) (BR) -- ({OUTPUT_BOTTOM_RIGHT})\n              (TL) -- ({OUTPUT_TOP_LEFT})    (TR) -- ({OUTPUT_TOP_RIGHT});\n        \\draw[fill=cyan] (0,0) 
rectangle ({OUTPUT_TO});\n        \\draw[step=10mm, base03, thick] (0,0) grid ({OUTPUT_TO});\n        \\draw[fill=base02, opacity=0.4] ({OUTPUT_GRID_FROM}) rectangle\n                                        ({OUTPUT_GRID_TO});\n        \\draw[base03, thick] ({OUTPUT_GRID_FROM}) rectangle ({OUTPUT_GRID_TO});\n    \\end{{scope}}\n\\end{{tikzpicture}}\n\\end{{document}}\n"
  },
  {
    "path": "templates/numerical_figure.txt",
    "content": "\\documentclass[class=article,border=10pt]{{standalone}}\n\\usepackage{{tikz}}\n\\usepackage{{xcolor}}\n\\definecolor{{blue}}{{RGB}}{{38,139,210}}\n\\definecolor{{cyan}}{{RGB}}{{42,161,152}}\n\\definecolor{{base01}}{{RGB}}{{88,110,117}}\n\\definecolor{{base02}}{{RGB}}{{7,54,66}}\n\\definecolor{{base03}}{{RGB}}{{0,43,54}}\n\\usetikzlibrary{{calc,shapes,positioning}}\n\\begin{{document}}\n\\begin{{tikzpicture}}[scale=.5,every node/.style={{minimum size=1cm}},on grid]\n    \\begin{{scope}}\n        \\draw[step=10mm, base03, dashed, thick] (0,0) grid ({PADDING_TO});\n        \\draw[fill=blue] ({INPUT_FROM}) rectangle ({INPUT_TO});\n        \\draw[draw=base03, thick] ({INPUT_FROM}) grid ({INPUT_TO});\n        {INPUT_VALUES}\n        \\draw[fill=base02, opacity=0.4] ({INPUT_GRID_FROM}) rectangle\n                                        ({INPUT_GRID_TO});\n        \\draw[step=10mm, base03, thick] ({INPUT_GRID_FROM}) grid\n                                        ({INPUT_GRID_TO});\n        {KERNEL_VALUES}\n    \\end{{scope}}\n    \\begin{{scope}}[xshift={XSHIFT}, yshift={YSHIFT}]\n        \\draw[fill=cyan] (0,0) rectangle ({OUTPUT_TO});\n        \\draw[step=10mm, base03, thick] (0,0) grid ({OUTPUT_TO});\n        \\draw[fill=base02, opacity=0.4] ({OUTPUT_GRID_FROM}) rectangle\n                                        ({OUTPUT_GRID_TO});\n        \\draw[base03, thick] ({OUTPUT_GRID_FROM}) rectangle ({OUTPUT_GRID_TO});\n        {OUTPUT_VALUES}\n    \\end{{scope}}\n\\end{{tikzpicture}}\n\\end{{document}}\n"
  },
  {
    "path": "templates/unit.txt",
    "content": "\\draw[draw=base03, fill=blue, thick] ({0},{1}) rectangle ({2},{3});\n"
  }
]
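The three files under `templates/` double every literal LaTeX brace (`{{`, `}}`) and leave single-brace fields such as `{PADDING_TO}`, `{INPUT_UNITS}`, and the positional `{0}`..`{3}` in `templates/unit.txt`. That is the escaping convention of Python's `str.format`, so the sketch below assumes the templates are filled that way; the script that actually consumes them is not shown here. The template strings are shortened stand-ins rather than the real files, and `render_units` and `render_figure` are hypothetical helper names.

```python
# Minimal sketch, assuming the templates are rendered with str.format.
# Both template strings are shortened stand-ins for the files under
# templates/; render_units and render_figure are hypothetical helpers.

# Same shape as templates/unit.txt: four positional fields.
UNIT_TEMPLATE = (
    "\\draw[draw=base03, fill=blue, thick] ({0},{1}) rectangle ({2},{3});\n"
)

# Doubled braces come out as literal LaTeX braces; single-brace names
# are substituted by keyword.
FIGURE_TEMPLATE = (
    "\\begin{{tikzpicture}}[scale=.5]\n"
    "    \\draw[step=10mm, base03, dashed, thick] (0,0) grid ({PADDING_TO});\n"
    "    {INPUT_UNITS}"
    "\\end{{tikzpicture}}\n"
)


def render_units(height, width):
    """Return one draw command per cell of a height x width grid."""
    return "".join(
        UNIT_TEMPLATE.format(x, y, x + 1, y + 1)
        for y in range(height)
        for x in range(width)
    )


def render_figure(height, width):
    """Fill the figure template for a height x width input grid."""
    return FIGURE_TEMPLATE.format(
        PADDING_TO="{},{}".format(width, height),
        INPUT_UNITS=render_units(height, width),
    )


if __name__ == "__main__":
    print(render_figure(2, 2))
```

Values substituted into a field are not parsed again, so the braces and backslashes inside the rendered unit commands pass through unchanged when they are inserted as `INPUT_UNITS`.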