Repository: cs231n/cs231n.github.io
Branch: master
Commit: 54528735aa21
Files: 83
Total size: 1.2 MB

Directory structure:
gitextract_o4agppfp/

├── .gitignore
├── Gemfile
├── LICENSE
├── Readme.md
├── _config.yml
├── _includes/
│   ├── footer.html
│   ├── head.html
│   └── header.html
├── _layouts/
│   ├── default.html
│   ├── page.html
│   └── post.html
├── adversary-attacks.md
├── assets/
│   └── conv-demo/
│       ├── index.html
│       └── utils.js
├── assignments/
│   ├── 2015/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2016/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2017/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2018/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2019/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2020/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2021/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2022/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2023/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   ├── 2024/
│   │   ├── assignment1.md
│   │   ├── assignment2.md
│   │   └── assignment3.md
│   └── 2025/
│       ├── assignment1.md
│       ├── assignment2.md
│       └── assignment3.md
├── attention.md
├── aws-tutorial.md
├── choose-project.md
├── classification.md
├── convnet-tips.md
├── convolutional-networks.md
├── css/
│   └── main.css
├── generative-modeling.md
├── generative-models.md
├── index.html
├── jupyter-colab-tutorial.md
├── jupyter-notebook-tutorial.ipynb
├── linear-classify.md
├── nerf.md
├── neural-networks-1.md
├── neural-networks-2.md
├── neural-networks-3.md
├── neural-networks-case-study.md
├── optimization-1.md
├── optimization-2.md
├── overview.md
├── pixelrnn.md
├── poster-2018.md
├── poster.md
├── python-colab.ipynb
├── python-numpy-tutorial.md
├── rnn.md
├── setup.md
├── student-contributions/
│   ├── BackPropagationBasicMatrixOperations.bib
│   ├── BackPropagationBasicMatrixOperations.tex
│   ├── Makefile
│   └── latexmkrc
├── terminal-tutorial.md
├── transfer-learning.md
├── transformers.md
└── understanding-cnn.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
# env/
venv/
# ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# Mac
.DS_Store

# MuJoCo License key
mjkey.txt

.mujocomanip_temp_model.xml

# Python IDE
.idea

# Locally generated files
dump.rdb
*.local.ipynb
runs/
temp*
debug_*
*.swp


================================================
FILE: Gemfile
================================================
source 'https://rubygems.org'

gem 'jekyll'


================================================
FILE: LICENSE
================================================
The MIT License (MIT)

Copyright (c) 2015 Andrej Karpathy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: Readme.md
================================================

Notes and assignments for Stanford CS class [CS231n: Convolutional Neural Networks for Visual Recognition](http://vision.stanford.edu/teaching/cs231n/)


================================================
FILE: _config.yml
================================================
# Site settings
title: CS231n Deep Learning for Computer Vision
email: cgokmen@stanford.edu
description: "Course materials and notes for Stanford class CS231n: Deep Learning for Computer Vision."
baseurl: ""
url: "https://cs231n.github.io"
courseurl: "http://cs231n.stanford.edu/"
twitter_username: cs231n
github_username: cs231n

# Build settings
markdown: kramdown
permalink: pretty

highlighter: rouge
kramdown:
  input: GFM
  auto_ids: true
  syntax_highlighter: rouge

# links to homeworks
hw_1_colab: https://cs231n.github.io/assignments/2025/assignment1_colab.zip
hw_2_colab: https://cs231n.github.io/assignments/2025/assignment2_colab.zip
hw_3_colab: https://cs231n.github.io/assignments/2025/assignment3_colab.zip


================================================
FILE: _includes/footer.html
================================================
<footer class="site-footer">

  <div class="wrap">

    <div class="footer-col-1 column">
      <ul>

        {% if site.github_username %}<li>
          <a href="https://github.com/{{ site.github_username }}">
            <span class="icon github">
              <svg version="1.1" class="github-icon-svg" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
                 viewBox="0 0 16 16" enable-background="new 0 0 16 16" xml:space="preserve">
                <path fill-rule="evenodd" clip-rule="evenodd" fill="#C2C2C2" d="M7.999,0.431c-4.285,0-7.76,3.474-7.76,7.761
                c0,3.428,2.223,6.337,5.307,7.363c0.388,0.071,0.53-0.168,0.53-0.374c0-0.184-0.007-0.672-0.01-1.32
                c-2.159,0.469-2.614-1.04-2.614-1.04c-0.353-0.896-0.862-1.135-0.862-1.135c-0.705-0.481,0.053-0.472,0.053-0.472
                c0.779,0.055,1.189,0.8,1.189,0.8c0.692,1.186,1.816,0.843,2.258,0.645c0.071-0.502,0.271-0.843,0.493-1.037
                C4.86,11.425,3.049,10.76,3.049,7.786c0-0.847,0.302-1.54,0.799-2.082C3.768,5.507,3.501,4.718,3.924,3.65
                c0,0,0.652-0.209,2.134,0.796C6.677,4.273,7.34,4.187,8,4.184c0.659,0.003,1.323,0.089,1.943,0.261
                c1.482-1.004,2.132-0.796,2.132-0.796c0.423,1.068,0.157,1.857,0.077,2.054c0.497,0.542,0.798,1.235,0.798,2.082
                c0,2.981-1.814,3.637-3.543,3.829c0.279,0.24,0.527,0.713,0.527,1.437c0,1.037-0.01,1.874-0.01,2.129
                c0,0.208,0.14,0.449,0.534,0.373c3.081-1.028,5.302-3.935,5.302-7.362C15.76,3.906,12.285,0.431,7.999,0.431z"/>
              </svg>
            </span>
            <span class="username">{{ site.github_username }}</span>
          </a>
        </li>{% endif %}
        {% if site.twitter_username %}<li>
          <a href="https://twitter.com/{{ site.twitter_username }}">
            <span class="icon twitter">
              <svg version="1.1" class="twitter-icon-svg" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px"
                 viewBox="0 0 16 16" enable-background="new 0 0 16 16" xml:space="preserve">
                <path fill="#C2C2C2" d="M15.969,3.058c-0.586,0.26-1.217,0.436-1.878,0.515c0.675-0.405,1.194-1.045,1.438-1.809
                c-0.632,0.375-1.332,0.647-2.076,0.793c-0.596-0.636-1.446-1.033-2.387-1.033c-1.806,0-3.27,1.464-3.27,3.27
                c0,0.256,0.029,0.506,0.085,0.745C5.163,5.404,2.753,4.102,1.14,2.124C0.859,2.607,0.698,3.168,0.698,3.767
                c0,1.134,0.577,2.135,1.455,2.722C1.616,6.472,1.112,6.325,0.671,6.08c0,0.014,0,0.027,0,0.041c0,1.584,1.127,2.906,2.623,3.206
                C3.02,9.402,2.731,9.442,2.433,9.442c-0.211,0-0.416-0.021-0.615-0.059c0.416,1.299,1.624,2.245,3.055,2.271
                c-1.119,0.877-2.529,1.4-4.061,1.4c-0.264,0-0.524-0.015-0.78-0.046c1.447,0.928,3.166,1.469,5.013,1.469
                c6.015,0,9.304-4.983,9.304-9.304c0-0.142-0.003-0.283-0.009-0.423C14.976,4.29,15.531,3.714,15.969,3.058z"/>
              </svg>
            </span>
            <span class="username">{{ site.twitter_username }}</span>
          </a>
        </li>{% endif %}
        <!-- <li>
          <a href="mailto:{{ site.email }}">{{ site.email }}</a>
        </li> -->
      </ul>
    </div>

    <div class="footer-col-2 column">

    </div>

    <div class="footer-col-3 column">

    </div>

  </div>

</footer>


================================================
FILE: _includes/head.html
================================================
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <title>{% if page.title %}{{ page.title }}{% else %}{{ site.title }}{% endif %}</title>
  <meta name="viewport" content="width=device-width">
  <meta name="description" content="{{ site.description }}">
  <link rel="canonical" href="{{ page.url | replace:'index.html','' | prepend: site.baseurl | prepend: site.url }}">

  <!-- Custom CSS -->
  <link rel="stylesheet" href="{{ "/css/main.css" | prepend: site.baseurl }}">

  <!-- Google fonts -->
  <link href='https://fonts.googleapis.com/css?family=Roboto:400,300' rel='stylesheet' type='text/css'>

  <!-- Google tracking -->
  <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','//www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-46895817-2', 'auto');
    ga('send', 'pageview');
  </script>
</head>


================================================
FILE: _includes/header.html
================================================
<header class="site-header">

  <a class="site-title" href="{{ site.url }}">{{ site.title }}</a>
  <a class="site-link" href="{{ site.courseurl }}">Course Website</a>

</header>


================================================
FILE: _layouts/default.html
================================================
<!DOCTYPE html>
<html>

  {% include head.html %}

    <body>

      <script src="https://unpkg.com/vanilla-back-to-top@7.2.1/dist/vanilla-back-to-top.min.js"></script>
      <script>addBackToTop({
        backgroundColor: '#fff',
        innerHTML: 'Back to Top',
        textColor: '#333'
      })</script>
      <style>
        #back-to-top {
          border: 1px solid #ccc;
          border-radius: 0;
          font-family: sans-serif;
          font-size: 14px;
          width: 100px;
          text-align: center;
          line-height: 30px;
          height: 30px;
        }
      </style>

    {% include header.html %}

    <div class="page-content">
      <div class="wrap">
      {{ content }}
      </div>
    </div>

    {% include footer.html %}

    <!-- mathjax -->
    <script type="text/javascript" src="//cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <script type="text/x-mathjax-config">
      // Make responsive
      MathJax.Hub.Config({
       "HTML-CSS": { linebreaks: { automatic: true } },
       "SVG": { linebreaks: { automatic: true } },
      });
    </script>

    </body>
</html>


================================================
FILE: _layouts/page.html
================================================
---
layout: default
---
<div class="post">

  <header class="post-header">
    <h1>{{ page.title }}</h1>
  </header>

  <article class="post-content">
  {{ content }}
  </article>

</div>


================================================
FILE: _layouts/post.html
================================================
---
layout: default
---
<div class="post">

  <header class="post-header">
    <h1>{{ page.title }}</h1>
    <p class="meta">{{ page.date | date: "%b %-d, %Y" }}{% if page.author %} • {{ page.author }}{% endif %}{% if page.meta %} • {{ page.meta }}{% endif %}</p>
  </header>

  <article class="post-content">
  {{ content }}
  </article>

</div>


================================================
FILE: adversary-attacks.md
================================================
Table of contents:

##### Adversarial Attacks
- Poisoning Attack vs. Evasion Attack
- White-box Attack vs. Black-box Attack
- Adversarial Goals

##### Adversarial Examples
- Fast Gradient Sign Method (FGSM)
- One-Pixel Attack

##### Defense Strategies against Adversarial Examples
- Adversarial Training
- Defensive Distillation
- Denoising

## Adversarial Attacks
In class, we have seen examples where image classification models can be fooled with adversarial examples. By adding small and often imperceptible perturbations to images, these adversarial attacks can deceive a deep learning model to label the modified image as a completely different class. In this note, we will give an overview of the various types of adversarial attacks on machine learning models, discuss several representative methods for generating adversarial examples, as well as some possible defense methods and mitigation strategies against such adversarial attacks.

**Adversary**. In computer security, the term “adversary” refers to people or machines that attempt to penetrate or corrupt a computer network or system. In the context of machine learning and deep learning, adversaries can use a variety of attack methods to disrupt a machine learning model, and cause it to behave erratically (e.g. to misclassify a dog image as a cat image). In general, attacks can happen either during model training (known as a “poisoning” attack) or after the model has finished training (an “evasion” attack).

### Poisoning Attack vs. Evasion Attack
**Poisoning attack**. A poisoning attack involves polluting a machine learning model's training data. Such attacks take place during the training time of the machine learning model, when an adversary presents data that is intentionally mislabeled to the model, therefore instilling misleading knowledge, and will eventually cause the model to make inaccurate predictions at test time. Poisoning attacks require that an adversary has access to the model’s training data, and is able to inject misleading data into the training set. To make the attacks less noticeable, the adversary may decide to slowly introduce ill-intentioned samples over an extended period of time.

An example of a poisoning attack took place in 2016, when Microsoft launched a Twitter chatbot, Tay. Tay was designed to mimic the language patterns of a 18- to 24- year-old in the U.S. for entertainment purposes, to engage people through “casual and playful conversation”, and to learn from its conversations with human users on Twitter. However, soon after its launch, a vulnerability in Tay was exploited by some adversaries, who interacted with Tay using profane and offensive language. The attack caused the chatbot to learn and internalize inappropriate language. The more Tay engaged with adversarial users, the more offensive Tay’s tweets became. As a result, Tay was quickly shut down by Microsoft, only 16 hours after its launch.

**Evasion attack**. In evasion attacks, the adversary tries to evade or fool the system by adjusting malicious samples during the testing phase. Compared with poisoning attacks, evasion attacks are more common, and easier to conduct. One main reason is that with evasion attacks, adversaries don’t necessarily need to access the training data, nor inject bad data into the training process of the model.

### White-box vs. Black-box Attacks
The evasion attacks discussed above occur during the testing phase of the model. The effectiveness of such attacks depends on the amount of information available to the adversary about the model. Before we dive into the various methods for generating adversarial examples, let’s first briefly discuss the differences between white-box attacks and black-box attacks, and understand the various adversarial goals.

**White-box attack**. In a white-box attack, the adversary is assumed to have total knowledge about the model, such as the model architecture, number of layers, the weights of the final trained model, etc. The adversary also has knowledge on the model’s training process, such as the optimization algorithm (e.g. Adam, RMSProp etc.) that is used, the data that the model is trained on, the distribution of the training data, and the model’s performance on the training data. It can be very dangerous if the adversary is able to identify the feature space where the model has a high error rate, and use that information to construct adversarial examples and exploit the model. The more the adversary knows, the more severe the attacks can be, and so are the consequences.

**Black-box attack**. On the contrary, in black-box attacks, the adversary assumes no knowledge about the model. Instead of constructing adversarial examples based on prior knowledge, the adversary exploits a model by providing a series of carefully crafted inputs and observing outputs. Through trial and error, the attacks may eventually be successful in misleading the model to make the wrong predictions.

### Adversarial Goals
The goals of the adversarial attacks can be broadly categorized as follows:
- **Confidence Reduction**. The adversary aims to reduce the model’s confidence in its predictions, which does not necessarily lead to the wrong class output. For example, due to the adversarial attack, a model which originally classifies an image of a cat with high probability ends up outputting a lower probability for the same image and class pair.
- **Untargeted Misclassification**. The adversary tries to misguide the model to predict any of the incorrect classes. For example, when presented with an image of a cat, the model outputs any class that is non-cat (e.g. dog, airplane, computer, etc.).
- **Targeted Misclassification**. The adversary tries to misguide the model to output a particular class other than the true class. For example, when presented with an image of a cat, the model is forced to classify it as a dog image, where the output class of dog is specified by the adversary.

Generally speaking, targeted attacks are more sophisticated than untargeted attacks, which are in turn more difficult than confidence reduction.

## Adversarial Examples
In CS231n, we mainly focus on examples of adversarial images in the context of image classification. In an adversarial attack, the adversary attempts to modify the original input image by adding some carefully crafted perturbations, which can cause the image classification model to yield mispredictions. Oftentimes, the generated perturbations are either too small to be visually identified by human eyes, or small enough that humans consider them to be harmless, random noise. And yet, these perturbations can be “meaningful” and misleading to the image classification model. Below, we discuss two methods to generate adversarial examples.

<p align="center">
  <img src="/assets/adversary-1.png" width="450">
  <div class="figcaption">An example of adversarial attack, in which a tiny amount of carefully crafted perturbations leads to misclassification. Here the perturbations are so small that they only become visible to humans after being magnified for about 30 times.</div>
</p>


### Fast Gradient Sign Method (FGSM)
The simplest yet highly efficient algorithm for generating adversarial examples is known as the Fast Gradient Sign Method (FGSM), which is a single step attack on images. Proposed by Goodfellow et al. in 2014, FGSM combines a white box approach with a misclassification goal. Using FGSM, a small perturbation is first generated in the direction of the sign of the gradients with respect to the input image. Next, the generated perturbations are added to the original image, resulting in an adversarial image. The equation for untargeted attack using FGSM is given by:

$$ adv\_x = x + \epsilon*\text{sign}(\nabla_xJ(\theta, x, y)) $$

Here, $$ J $$ is the cost function (e.g. cross-entropy cost) of the trained model, $$ \nabla_x $$ denotes the gradient of the model’s loss function with respect to the original image $$ x $$. The thing to note here is we are calculating the gradient with respect to the pixels of the image. From the gradient of the model’s loss function, we take the sign of each term in the gradient, reducing it to a matrix of 1s, 0s and -1s. The intuition here is that we nudge the pixels of the image in the direction that maximizes the loss. In other words, we perform **gradient ascent** instead of gradient descent, since the goal is to increase the error and let the model output the incorrect results.

Having obtained the sign of the gradient, we then multiply the result with a tiny value, ϵ, which controls the amount of perturbations (i.e. the perturbation’s amplitude) to be added. The larger the value of epsilon, the more noticeable the perturbations are to humans. Recall that from the adversary’s perspective, the goal is to ensure the corruption to the original image is imperceptible, while being able to fool the classification model. ϵ is a hyper-parameter to be chosen.

FGSM can also be used for targeted misclassification attacks. In this case, the adversary aims to maximize the probability of some specific target class, which is unlikely to be the true class of the original image $$ x $$:

$$ adv\_x = x - \epsilon*\text{sign}(\nabla_xJ(\theta, x, y_{target})) $$

The difference is in case of targeted attacks, we minimize the loss between the model’s predicted class and the target class, whereas in case of untargeted attack we maximize the loss instead of minimize it.

In addition to only applying the FGSM equation once, a straightforward extension is to develop an iterative procedure, and run FGSM multiple times. Here is what the iterative procedure might look like for untargeted attacks using FGSM, when implemented with TensorFlow:

```
import tensorflow as tf
import keras.backend as K

# Get the true label of the iamge
correct_label = get_correct_label()
total_class_count = N

# Initialize adversarial example with original input image
x_adv = original_img
x_adv = tf.convert_to_tensor(x_adv, dtype=tf.float32)

# Initialize the perturbations
noise = np.zeros_like(original_img)

# Epsilon is a hyper-parameter
epsilon = 0.01
epochs = 100

for i in range(epochs):
    target = K.one_hot(correct_label, total_class_count)

    with tf.GradientTape() as tape:
        tape.watch(x_adv)
        prediction = model(x_adv)
        loss = K.categorical_crossentropy(target, prediction[0])

    # Calculate the gradient
    grads = tape.gradient(loss, x_adv)

    # Get the sign of the gradient
    delta = K.sign(grads[0])
    noise = noise + delta

    # Generate an adversarial example with FGSM
    x_adv = x_adv + epsilon*delta

    # Get the latest model output
    preds = model.predict(x_adv, steps=1).squeeze()
    pred = np.argmax(preds, axis=-1)

    # Exit the procedure if model is fooled
    if pred != correct_label:
      break
```

In the example implementation above, we also employ early-stopping and exit the iterative procedure once the model is fooled. This helps minimize the amount of perturbations added, and may also improve the efficiency by reducing the time needed to fool the model. With the iterative approach, we can also obtain additional adversarial examples when the procedure is run for more iterations.

<p align="center">
  <img src="/assets/adversary-2.png" width="550">
  <div class="figcaption">An example of a “successful” adversarial attack in which the image classifier recognized a watermelon as a tomato. In this case, although the goal of misclassification is achieved, the unmagnified perturbations are large enough to be perceived by human eyes.</div>
</p>

In practice, FGSM attacks work particularly well for network architectures that favor linearity, such as logistic regression, maxout networks, LSTMs, networks that use the ReLU activation function, etc. While ReLU is non-linear, when ϵ is sufficiently small, the ReLU activation does not change the sign of the gradient with respect to the original image, and thus will not prevent the pixels of the image to be nudged in the direction that maximizes the loss. The authors of FGSM stated that changing to nonlinear model families such as RBF networks confer a significant reduction in a model’s vulnerability to adversarial examples.

### One-Pixel Attack
In order to fool a machine learning model, the Fast Gradient Sign Method discussed above requires many pixels of the original image to be changed, if only by a little. As shown in the example image above, sometimes the modifications might be excessive (i.e. the amount of modified pixels are fairly large) such that they become visually identifiable to human eyes. One may then wonder if it’s possible to modify fewer pixels, while still keeping the model fooled? The answer is yes. In 2019, a method for generating one-pixel adversarial perturbations was proposed, in which an adversarial example can be generated by modifying just one pixel.

The One-pixel attack uses differential evolution to find out which pixel is to be changed, and how. Differential evolution (DE) is a type of evolutionary algorithm (EA). It is a population based optimization algorithm for solving complex optimization problems. In specific, during each iteration, a set of candidate solutions (children) is generated according to the current population (parents). The candidate solutions are then compared with their corresponding parents, surviving if they are better candidate solutions according to some criterion. The process repeats until some stopping criterion are met.

In one-pixel attack, each candidate solution encodes a pixel modification and is represented by a vector of five elements: the x and y coordinates, and the red, green and blue (RGB) values of the pixel. The search starts with 400 initial candidate solutions. In each iteration, another 400 candidate solutions (children) are generated using the following formula:

$$ x_{i}(g+1) = x_{r1}(g) + F(x_{r2}(g) - x_{r3}(g)), $$

$$ r1 \neq r2 \neq r3 $$

where $$ x_{i} $$ is an element of the candidate solution, $$ g $$ is the current generation, $$ F $$ is the scale parameter set to be 0.5, and $$ r1 $$, $$ r2 $$, $$ r3 $$ are different random numbers. The search stops when one of the candidate solutions is an adversarial example that fools the model successfully, or if the maximum number of iterations specified has been reached.

By using differential evolution, one-pixel attack has several advantages. Since DE doesn’t use gradient information for optimization, it’s not required for the objective function to be differentiable, as is the case with classical optimization methods such as gradient descent. Calculating the gradient requires much more information about the model to be exploited, therefore not needing gradient information makes conducting the attack more feasible. Finally, it’s worth noting that one-pixel attack is a type of black-box attack, which assumes no information about the classification model; it is sufficient to just observe the model’s output probabilities.

## Defense Strategies against Adversarial Examples
Having discussed some techniques for generating adversarial examples, we now turn our attention to possible defense strategies against such adversarial attacks. While we go through each of the countermeasures, it’s worth keeping in mind that none of them can act as a panacea for all challenges. Moreover, implementing such defense strategies may incur extra performance costs.

### Adversarial Training
One of the most intuitive and effective defenses against adversarial attacks is adversarial training. The idea of adversarial training is to incorporate adversarial samples into the model training stage, and thus increase model robustness. In other words, since we know that the original training process leads to models that are vulnerable to adversarial examples, we just also train on adversarial examples so that the models have some “immunity” over the adversarial examples.

To perform adversarial training, the defender simply generates a lot of adversarial examples and include them in the training data. At training time, the model is trained to assign the same label to the adversarial example as to the original example. For example, upon seeing an adversarially perturbed training image, whose original label is cat, the model should learn the correct label for the perturbed image is still cat.

The problem with adversarial training is that it is only effective in defending the models against the same attacks used to craft the examples originally included in the training pool. In black-box attacks, adversaries only need to find one crack in a system’s defenses for an attack to go through. It can be likely that the attack method employed by the adversary is not anticipated by the defender at the time of model training, therefore leaving the adversarially trained model vulnerable to the unseen attacks.

### Defensive Distillation
Introduced in 2015 by Papernot et al., defensive distillation uses the idea of distillation and knowledge transfer to reduce the effectiveness of adversarial samples on deep neural networks. The term distillation was originally proposed as a way to transfer knowledge from a large neural network to a smaller one. Doing so can help reduce the computational complexity of deep neural networks, and facilitate the deployment of deep learning models in resource constrained devices. In defensive distillation, instead of transferring knowledge between models of different architectures, the knowledge is extracted from a model to then improve its own resilience to adversarial examples.

Let's assume we are training a neural network for image classification tasks, and the network is designed with a softmax layer as the output layer. The key point in distillation is the addition of a **temperature parameter T** to the softmax operation:

$$ F(X) = \left[ \frac{e^{z_i(x)/T}}{\sum_{l=0}^{N-1} e^{z_l(x)/T}} \right]_{i \in 0 ... N-1} $$

The authors showed that experimentally, a high empirical value of T gives a better distillation performance. During test time, T is set to 1, making the above equation equivalent to standard softmax operation.

Defensive distillation is a two-step process. First, we train an initial network $$F$$ on data $$X$$. In this step, instead of letting the network output hard class labels, we take the probability vectors produced by the softmax layer. The benefit of using class probabilities (i.e. soft labels) instead of hard labels is that in addition to merely providing a sample’s correct class, probabilities also encode the relative differences between classes. Next, we then use the probability vectors from the initial network as the labels to train another distilled network $$F’$$ with the same architecture on the same training data $$X$$. During training, it is important to set the temperature parameter $$T$$ for both networks to a value larger than 1. After training is completed, we will use the distilled network $$F’$$ with $$T$$ set to 1 to make predictions at test time.

<p align="center">
  <img src="/assets/defensive-distillation.png" width="600">
  <div class="figcaption">Defense mechanism based on a transfer of knowledge contained in probability vectors through distillation.</div>
</p>

Why is defensive distillation a good idea? First, a large value of $$T$$ has the effect of pushing the resulting probability distribution closer to uniform. This helps improve the model’s ability to generalize outside of its training dataset, by avoiding situations where the model is forced to make an overly confident prediction in one class when a sample includes characteristics of two or more classes. The authors also argue that distillation at high temperatures reduces a model’s sensitivity to small input variations, which are often found in adversarial examples. The model’s sensitivity to input variation is quantified by its Jacobian:

$$ \frac{\partial F_i(X)}{\partial X_j} = \frac{\partial}{\partial X_j}  \left( \frac{e^{z_i/T}}{\sum_{l=0}^{N-1} e^{z_l/T}} \right) \\
= \frac{1}{T} \frac{e^{z_i/T}}{g^2(X)} \left( \sum_{l=0}^{N-1} \left(\frac{\partial z_i}{\partial X_j} - \frac{\partial z_l}{\partial X_j} \right) e^{z_l/T} \right) $$

where

$$ g(X) = \sum_{l=0}^{N-1} e^{z_l(X)/T} $$

From the above expression, it can be observed that the amplitude of the Jacobian is inversely proportional to the temperature value. During test time, although $$T$$ is set to a relatively small value of 1, the model’s sensitivity to small variations and perturbations will not be affected, since the weights learned at training time remain unchanged, and decreasing temperature only makes the class probability vector more discrete without changing the relative ordering of the classes.

### Denoising
Since adversarial examples are images with added perturbations (i.e. noise), one straightforward defense strategy is to have some mechanisms to denoise the adversarial samples. There can be two approaches to denoising: input denoising and feature denoising. Input denoising attempts to partially or fully remove the adversarial perturbations from the input images, whereas feature denoising aims to alleviate the effects of adversarial perturbations on high-level features.

In their study, Chow et al. proposed a method for input denoising with ensembles of denoisers. The intuition of using ensembles for denoising is that there are various ways for adversaries to generate and add perturbations to images, and no single denoiser is guaranteed to be effective across all data corruption methods - a denoiser that excels at removing some types of noise may perform poorly on others. Therefore, it is often helpful to employ an ensemble of diverse denoisers, instead of relying only on a single denoiser.

Autoencoders are used for training the denoisers. First, the denoising autoencoder takes a clean input image and transforms it into an adversarial example by adding some perturbations. Next, the noisy image is fed into the auto encoder, with a goal of reconstructing the original clean, uncorrupted image. Given N training examples, the denoising autoencoder is trained by backpropagation to minimize the reconstruction loss:

$$ Loss = \frac{1}{N}\sum_{i=1}^{N}d(x_i, g_{\theta'}(f_{\theta}(x'_i))) + \frac{\lambda}{2}(\|\theta \|_\text{F}^2 + \|\theta' \|_\text{F}^2) $$

where d is a distance function and λ is a regularization hyperparameter penalizing the Frobenius norm of θ and θ’. $$ g^i $$ is the operation at the i-th decoding layer with weights $$ \theta_i^' $$.

For feature denoising as a defense strategy, one study was conducted by Xie et al, in which the authors incorporated denoising blocks at intermediate layers of a convolutional neural network. The authors argued that the adversarial perturbation of the features gradually increases as an image is propagated through the network, causing the model to eventually make the wrong predictions. Therefore, it can be helpful to add denoising blocks at intermediate layers of the network to combat feature noise.

<p align="center">
  <img src="/assets/denoising-block.png" width="300">
  <div class="figcaption">Defense mechanism based on a transfer of knowledge contained in probability vectors through distillation.</div>
</p>

The input to a denoising block can be any feature layer in the convolutional neural network. In the study, each denoising blocks performs one type of the following denoising operations: nonlocal means, bilateral filter, mean filter, and median filter. These are the techniques commonly used in computer vision tasks such as image processing and denoising. The denoising blocks are trained jointly with all layers of the network in an end-to-end manner using adversarial training. In their experiments, denoising blocks were added to the variants of ResNet models. The results showed that the proposed denoising method achieved 55.7 percent accuracy under white-box attacks on ImageNet, whereas previous state of the art was only 27.9 percent accuracy.

## References
[Explaining and harnessing adversarial examples](https://arxiv.org/abs/1412.6572)  
[One pixel attack for fooling deep neural networks](https://arxiv.org/abs/1710.08864)  
[Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks](https://arxiv.org/abs/1511.04508)  
[Denoising and Verification Cross-Layer Ensemble Against Black-box Adversarial Attacks](https://arxiv.org/abs/1908.07667)  
[Feature Denoising for Improving Adversarial Robustness](https://arxiv.org/abs/1812.03411)  
[Adversarial Attacks and Defences: A Survey](https://arxiv.org/abs/1810.00069)

## Additional Resources
[Adversarial Robustness - Theory and Practice (NeurIPS 2018 tutorial)](https://adversarial-ml-tutorial.org/)  
[CleverHans - An Python Library on Adversarial Example](https://github.com/cleverhans-lab/cleverhans)


================================================
FILE: assets/conv-demo/index.html
================================================

<html>
<head>
<title>Convolution demo</title>

<script type="text/javascript" src="external/d3.min.js"></script>
<script type="text/javascript" src="utils.js"></script>

<style type="text/css">
body {
  margin: 0;
  padding: 0;
}
</style>

<script type="text/javascript">

var W1 = 7;
var H1 = 7;
var D1 = 3;

var K = 2;
var F = 3;
var S = 2; // stride

var cs = 25; // cell size

var X = new U.Vol(W1, H1, D1); // input volume
for(var q=0;q<X.w.length;q++) {
  X.w[q] = Math.floor(Math.random()*3);
  // 0 pad with P = 1
  for(var d=0;d<X.depth;d++) {
    for(var x=0;x<X.sx;x++) {
      for(var y=0;y<X.sy;y++) {
        if(x === 0 || x === (X.sx - 1) || y === 0 || y === (X.sy - 1)) {
          X.set(x,y,d,0);
        }
      }  
    }
  }
}

var Ws = [];
var bs = [];
for(var k=0;k<K;k++) {
  var W = new U.Vol(F, F, D1);
  for(var q=0;q<W.w.length;q++) {
    W.w[q] = Math.floor(Math.random()*3) - 1;
  }
  Ws.push(W);
  var b = new U.Vol(1,1,1);
  b.w[0] = 1 - k;
  bs.push(b);
}

var conv_forward = function(V, Ws, bs, stride) {
  // optimized code by @mdda that achieves 2x speedup over previous version
  var out_sy = ((V.sy-W.sy)/stride +1);
  var out_sx = ((V.sx-W.sx)/stride +1);
  var A = new U.Vol(out_sx |0, out_sy |0, Ws.length |0, 0.0);
  
  var V_sx = V.sx |0;
  var V_sy = V.sy |0;
  var xy_stride = stride |0;

  for(var d=0;d<Ws.length;d++) {
    var f = Ws[d];
    var x = 0;
    var y = 0;
    for(var ay=0; ay<out_sy; y+=xy_stride,ay++) {  // xy_stride
      x = 0;
      for(var ax=0; ax<out_sx; x+=xy_stride,ax++) {  // xy_stride

        // convolve centered at this particular location
        var a = 0.0;
        for(var fy=0;fy<f.sy;fy++) {
          var oy = y+fy; // coordinates in the original input array coordinates
          for(var fx=0;fx<f.sx;fx++) {
            var ox = x+fx;
            if(oy>=0 && oy<V_sy && ox>=0 && ox<V_sx) {
              for(var fd=0;fd<f.depth;fd++) {
                // avoid function call overhead (x2) for efficiency, compromise modularity :(
                a += f.w[((f.sx * fy)+fx)*f.depth+fd] * V.w[((V_sx * oy)+ox)*V.depth+fd];
              }
            }
          }
        }
        a += bs[d].w[0];
        A.set(ax, ay, d, a);
      }
    }
  }
  return A;
}

function renderVol(svg, V, xoff, yoff, col, title, vid) {

  var pad = 3;
  var dpad = 20;

  var gyoff = 20;

  var txt = title + ' (' + V.sx + 'x' + V.sy + 'x' + V.depth + ')';
  // 1 padding exception
  //if(vid === 'x') { txt = title + ' (' + (V.sx-2) + 'x' + (V.sy-2) + 'x' + V.depth + ')'; }

  svg.append('text')
    .attr('x', xoff)
    .attr('y', yoff - 5)
    .attr('font-size', 16)
    .attr('fill', 'black')
    .text(txt);

  for(var d = 0; d < V.depth; d++) {

    svg.append('text')
      .attr('x', xoff)
      .attr('y', yoff + d * (V.sy * (cs + pad) + dpad) + gyoff - 5)
      .attr('font-size', 16)
      .attr('fill', 'black')
      .attr('style', 'font-family: courier;')
      .text(vid + '[:,:,'+d+']');

    for(var x = 0; x < V.sx; x++) {
      for(var y = 0; y < V.sy; y++) {

        var xcoord = xoff + x * (cs + pad);
        var ycoord = yoff + y * (cs + pad) + d * (V.sy * (cs + pad) + dpad) + gyoff;

        var thecol = col;
        if(vid === 'x' && (x === 0 || y === 0 || x === V.sx - 1 || y === V.sy - 1)) {thecol = '#DDD';}

        svg.append('rect')
          .attr('x', xcoord)
          .attr('y', ycoord)
          .attr('height', cs)
          .attr('width', cs)
          .attr('fill', thecol)
          .attr('stroke', 'none')
          .attr('stroke-width', '2')
          .attr('id', vid+'_'+x+'_'+y+'_'+d)
          .attr('class', vid);

        svg.append('text')
          .attr('x', xcoord + 5)
          .attr('y', ycoord + 15)
          .attr('font-size', 16)
          .attr('fill', 'black')
          .text(V.get(x,y,d).toFixed(0));

      }
    }
  }
}

function draw() {
  var d3elt = d3.select('#draw');
  svg = d3elt.append('svg').attr('width', '100%').attr('height', '100%')
        .append('g').attr('transform', 'scale(1)');

  var yoff = 20;
  // render input volume
  renderVol(svg, X, 10, yoff, '#DDF', 'Input Volume (+pad 1)', 'x');

  for(var i=0;i<Ws.length;i++) {
    // render weights
    renderVol(svg, Ws[i], 270 + i*170, yoff, '#FDD', 'Filter W'+i, 'w'+i);
    // render biases
    renderVol(svg, bs[i], 270 + i*170, 350 + yoff, '#FDD', 'Bias b'+i, 'b'+i);
  }

  // render output
  renderVol(svg, O, 600, yoff, '#DFD', 'Output Volume', 'o');

  // render controls
  
  svg.append('text')
    .attr('x', 520)
    .attr('y', 470)
    .attr('font-size', 16)
    .attr('fill', 'black')
    .text('toggle movement');
  svg.append('rect')
    .attr('x', 500)
    .attr('y', 450)
    .attr('height', 30)
    .attr('width', 150)
    .attr('fill', "rgba(200, 200, 200, 0.1)")
    .attr('stroke', 'black')
    .attr('stroke-width', '2')
    .attr('style', 'cursor:pointer;')
    .on('click', function() {
      // toggle 
      if(iid === -1) {
        iid = setInterval(focusCell, 1000);
      } else {
        clearInterval(iid);
        iid = -1;
      }
    });
}

var fxg = 0;
var fyg = 0;
var fdg = 0;
function focusCell() {
  
  // first unfocus all
  for(var i=0;i<Ws.length;i++) {
    d3.selectAll('.w'+i).attr('stroke', 'none');
    d3.selectAll('.b'+i).attr('stroke', 'none');
  }
  d3.selectAll('.x').attr('stroke', 'none');
  d3.selectAll('.o').attr('stroke', 'none');

  var fx = fxg;
  var fy = fyg;
  var fd = fdg;

  // highlight the output cell
  var csel = d3.select('#o'+'_'+fx+'_'+fy+'_'+fd);
  csel.attr('stroke', '#0A0');

  // highlight the weights
  d3.selectAll('.w'+fd).attr('stroke', '#A00');
  // highlight the bias
  d3.selectAll('.b'+fd).attr('stroke', '#A00');

  d3.selectAll('.ll').remove();

  // highlight the input cell
  for(var d=0;d<D1;d++) {
    for(var x=0;x<F;x++) {
      for(var y=0;y<F;y++) {
        var ix = fx * S + x;
        var iy = fy * S + y;
        var id = d;
        var csel = d3.select('#x'+'_'+ix+'_'+iy+'_'+id);
        csel.attr('stroke', '#00A');

        // connect with line
        if(x === 0 && y === 0) {
          var wsel = d3.select('#w'+fd+'_'+x+'_'+y+'_'+d);
          svg.append('line')
            .attr('x1', csel.attr('x'))
            .attr('y1', csel.attr('y'))
            .attr('x2', wsel.attr('x'))
            .attr('y2', wsel.attr('y'))
            .attr('stroke', 'black')
            .attr('stroke-width', '1')
            .attr('class', 'll');
        }
        if(x === 0 && y === (F-1)) {
          var wsel = d3.select('#w'+fd+'_'+x+'_'+y+'_'+d);
          svg.append('line')
            .attr('x1', csel.attr('x'))
            .attr('y1', parseFloat(csel.attr('y')) + cs)
            .attr('x2', wsel.attr('x'))
            .attr('y2', parseFloat(wsel.attr('y')) + cs)
            .attr('stroke', 'black')
            .attr('stroke-width', '1')
            .attr('class', 'll');
        }
        if(x === (F-1) && y === 0) {
          var wsel = d3.select('#w'+fd+'_'+x+'_'+y+'_'+d);
          svg.append('line')
            .attr('x1', parseFloat(csel.attr('x')) + cs)
            .attr('y1', csel.attr('y'))
            .attr('x2', parseFloat(wsel.attr('x')) + cs)
            .attr('y2', wsel.attr('y'))
            .attr('stroke', 'black')
            .attr('stroke-width', '1')
            .attr('class', 'll');
        }
        if(x === (F-1) && y === (F-1)) {
          var wsel = d3.select('#w'+fd+'_'+x+'_'+y+'_'+d);
          svg.append('line')
            .attr('x1', parseFloat(csel.attr('x')) + cs)
            .attr('y1', parseFloat(csel.attr('y')) + cs)
            .attr('x2', parseFloat(wsel.attr('x')) + cs)
            .attr('y2', parseFloat(wsel.attr('y')) + cs)
            .attr('stroke', 'black')
            .attr('stroke-width', '1')
            .attr('class', 'll');
        }
        
      }
    }
  }

  // output focus cycle
  fxg++;
  if(fxg >= O.sx) {
    fxg = 0;
    fyg++;
    if(fyg >=O.sy) {
      fyg = 0;
      fdg++;
      if(fdg >= O.depth) {
        fdg = 0;
      }
    }
  }

}

iid = -1;
function start() {
  O = conv_forward(X, Ws, bs, S);
  draw();
  iid = setInterval(focusCell, 1000);
}

</script>


</head>

<body onload="start()">

<div id="draw">
</div>

</body>
</html>

================================================
FILE: assets/conv-demo/utils.js
================================================
var U = {};

(function(global) {
  "use strict";

  // Random number utilities
  var return_v = false;
  var v_val = 0.0;
  var gaussRandom = function() {
    if(return_v) { 
      return_v = false;
      return v_val; 
    }
    var u = 2*Math.random()-1;
    var v = 2*Math.random()-1;
    var r = u*u + v*v;
    if(r == 0 || r > 1) return gaussRandom();
    var c = Math.sqrt(-2*Math.log(r)/r);
    v_val = v*c; // cache this
    return_v = true;
    return u*c;
  }
  var randf = function(a, b) { return Math.random()*(b-a)+a; }
  var randi = function(a, b) { return Math.floor(Math.random()*(b-a)+a); }
  var randn = function(mu, std){ return mu+gaussRandom()*std; }

  // Array utilities
  var zeros = function(n) {
    if(typeof(n)==='undefined' || isNaN(n)) { return []; }
    if(typeof ArrayBuffer === 'undefined') {
      // lacking browser support
      var arr = new Array(n);
      for(var i=0;i<n;i++) { arr[i]= 0; }
      return arr;
    } else {
      return new Float64Array(n);
    }
  }

  var arrContains = function(arr, elt) {
    for(var i=0,n=arr.length;i<n;i++) {
      if(arr[i]===elt) return true;
    }
    return false;
  }

  var arrUnique = function(arr) {
    var b = [];
    for(var i=0,n=arr.length;i<n;i++) {
      if(!arrContains(b, arr[i])) {
        b.push(arr[i]);
      }
    }
    return b;
  }

  // return max and min of a given non-empty array.
  var maxmin = function(w) {
    if(w.length === 0) { return {}; } // ... ;s
    var maxv = w[0];
    var minv = w[0];
    var maxi = 0;
    var mini = 0;
    var n = w.length;
    for(var i=1;i<n;i++) {
      if(w[i] > maxv) { maxv = w[i]; maxi = i; } 
      if(w[i] < minv) { minv = w[i]; mini = i; } 
    }
    return {maxi: maxi, maxv: maxv, mini: mini, minv: minv, dv:maxv-minv};
  }

  // create random permutation of numbers, in range [0...n-1]
  var randperm = function(n) {
    var i = n,
        j = 0,
        temp;
    var array = [];
    for(var q=0;q<n;q++)array[q]=q;
    while (i--) {
        j = Math.floor(Math.random() * (i+1));
        temp = array[i];
        array[i] = array[j];
        array[j] = temp;
    }
    return array;
  }

  // sample from list lst according to probabilities in list probs
  // the two lists are of same size, and probs adds up to 1
  var weightedSample = function(lst, probs) {
    var p = randf(0, 1.0);
    var cumprob = 0.0;
    for(var k=0,n=lst.length;k<n;k++) {
      cumprob += probs[k];
      if(p < cumprob) { return lst[k]; }
    }
  }

  // syntactic sugar function for getting default parameter values
  var getopt = function(opt, field_name, default_value) {
    if(typeof field_name === 'string') {
      // case of single string
      return (typeof opt[field_name] !== 'undefined') ? opt[field_name] : default_value;
    } else {
      // assume we are given a list of string instead
      var ret = default_value;
      for(var i=0;i<field_name.length;i++) {
        var f = field_name[i];
        if (typeof opt[f] !== 'undefined') {
          ret = opt[f]; // overwrite return value
        }
      }
      return ret;
    }
  }

  function assert(condition, message) {
    if (!condition) {
      message = message || "Assertion failed";
      if (typeof Error !== "undefined") {
        throw new Error(message);
      }
      throw message; // Fallback
    }
  }

  global.randf = randf;
  global.randi = randi;
  global.randn = randn;
  global.zeros = zeros;
  global.maxmin = maxmin;
  global.randperm = randperm;
  global.weightedSample = weightedSample;
  global.arrUnique = arrUnique;
  global.arrContains = arrContains;
  global.getopt = getopt;
  global.assert = assert;
  
})(U);

(function(global) {
  "use strict";

  // Vol is the basic building block of all data in a net.
  // it is essentially just a 3D volume of numbers, with a
  // width (sx), height (sy), and depth (depth).
  // it is used to hold data for all filters, all volumes,
  // all weights, and also stores all gradients w.r.t. 
  // the data. c is optionally a value to initialize the volume
  // with. If c is missing, fills the Vol with random numbers.
  var Vol = function(sx, sy, depth, c) {
    // this is how you check if a variable is an array. Oh, Javascript :)
    if(Object.prototype.toString.call(sx) === '[object Array]') {
      // we were given a list in sx, assume 1D volume and fill it up
      this.sx = 1;
      this.sy = 1;
      this.depth = sx.length;
      // we have to do the following copy because we want to use
      // fast typed arrays, not an ordinary javascript array
      this.w = global.zeros(this.depth);
      this.dw = global.zeros(this.depth);
      for(var i=0;i<this.depth;i++) {
        this.w[i] = sx[i];
      }
    } else {
      // we were given dimensions of the vol
      this.sx = sx;
      this.sy = sy;
      this.depth = depth;
      var n = sx*sy*depth;
      this.w = global.zeros(n);
      this.dw = global.zeros(n);
      if(typeof c === 'undefined') {
        // weight normalization is done to equalize the output
        // variance of every neuron, otherwise neurons with a lot
        // of incoming connections have outputs of larger variance
        var scale = Math.sqrt(1.0/(sx*sy*depth));
        for(var i=0;i<n;i++) { 
          this.w[i] = global.randn(0.0, scale);
        }
      } else {
        for(var i=0;i<n;i++) { 
          this.w[i] = c;
        }
      }
    }
  }

  Vol.prototype = {
    get: function(x, y, d) { 
      var ix=((this.sx * y)+x)*this.depth+d;
      return this.w[ix];
    },
    set: function(x, y, d, v) { 
      var ix=((this.sx * y)+x)*this.depth+d;
      this.w[ix] = v; 
    },
    add: function(x, y, d, v) { 
      var ix=((this.sx * y)+x)*this.depth+d;
      this.w[ix] += v; 
    },
    get_grad: function(x, y, d) { 
      var ix = ((this.sx * y)+x)*this.depth+d;
      return this.dw[ix]; 
    },
    set_grad: function(x, y, d, v) { 
      var ix = ((this.sx * y)+x)*this.depth+d;
      this.dw[ix] = v; 
    },
    add_grad: function(x, y, d, v) { 
      var ix = ((this.sx * y)+x)*this.depth+d;
      this.dw[ix] += v; 
    },
    cloneAndZero: function() { return new Vol(this.sx, this.sy, this.depth, 0.0)},
    clone: function() {
      var V = new Vol(this.sx, this.sy, this.depth, 0.0);
      var n = this.w.length;
      for(var i=0;i<n;i++) { V.w[i] = this.w[i]; }
      return V;
    },
    addFrom: function(V) { for(var k=0;k<this.w.length;k++) { this.w[k] += V.w[k]; }},
    addFromScaled: function(V, a) { for(var k=0;k<this.w.length;k++) { this.w[k] += a*V.w[k]; }},
    setConst: function(a) { for(var k=0;k<this.w.length;k++) { this.w[k] = a; }},
  }

  global.Vol = Vol;
})(U);


================================================
FILE: assignments/2015/assignment1.md
================================================
---
layout: page
mathjax: true
permalink: assignments2015/assignment1/
---

In this assignment you will practice putting together a simple image classification pipeline, based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages)
- understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- develop proficiency in writing efficient **vectorized** code with numpy
- implement and apply a k-Nearest Neighbor (**kNN**) classifier
- implement and apply a Multiclass Support Vector Machine (**SVM**) classifier
- implement and apply a **Softmax** classifier
- understand the differences and tradeoffs between these classifiers
- get a basic understanding of performance improvements from using **higher-level representations** than raw pixels (e.g. color histograms, Histogram of Gradient (HOG) features)

## Setup
You can work on the assignment in one of two ways: locally on your own machine, or on a virtual machine
through [Terminal](https://www.terminal.com/).

### Working locally
**Get the code:**
[Download the starter code here](http://cs231n.stanford.edu/assignments/2015/assignment1.zip).

**[Optional] virtual environment:**
Once you have unzipped the starter code, you might want to create a
[virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/)
for the project. If you choose not to use a virtual environment, it is up to you
to make sure that all dependencies for the code are installed on your machine.
To set up a virtual environment, run the following:

```bash
cd assignment1
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Work on the assignment for a while ...
deactivate                       # Exit the virtual environment
```

**Download data:**
Once you have the starter code, you will need to download the CIFAR-10 dataset.
Run the following from the `assignment1` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

**Start IPython:**
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment1` directory. If you are unfamiliar with IPython, you should read our
[IPython tutorial](/ipython-tutorial).

### Working on Terminal
We have created a Terminal snapshot that is preconfigured for this assignment;
you can [find it here](https://www.terminal.com/tiny/hUxP8UTMKa). Terminal allows you to work on the assignment from your browser. You can find a tutorial on how to use it [here](/terminal-tutorial).

### Submitting your work:
Whether you work on the assignment locally or using Terminal, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment1.zip`. Upload this file to your dropbox on
[the coursework](https://coursework.stanford.edu/portal/site/W15-CS-231N-01/)
page for the course.

### Q1: k-Nearest Neighbor classifier (30 points)

The IPython Notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine (30 points)

The IPython Notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier (30 points)

The IPython Notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Higher Level Representations: Image Features (10 points)

The IPython Notebook **features.ipynb** will walk you through this exercise, in which you will examine the improvements gained by using higher-level representations as opposed to using raw pixel values.

### Q5: Bonus: Design your own features! (+10 points)

In this assignment we provide you with Color Histograms and HOG features. To claim these bonus points, implement your own additional features from scratch, and using only numpy or scipy (no external dependencies). You will have to research different feature types to get ideas for what you might want to implement. Your new feature should help you improve the performance beyond what you got in Q4 if you wish to get these bonus points. If you come up with nice features we'll feature them in the lecture.

### Q6: Cool Bonus: Do something extra! (+10 points)

Implement, investigate or analyze something extra surrounding the topics in this assignment, and using the code you developed. For example, is there some other interesting question we could have asked? Is there any insightful visualization you can plot? Or maybe you can experiment with a spin on the loss function? If you try out something cool we'll give you points and might feature your results in the lecture.


================================================
FILE: assignments/2015/assignment2.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2015/assignment2/
---

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

- understand **Neural Networks** and how they are arranged in layered architectures
- understand and be able to implement (vectorized) **backpropagation**
- implement the core parameter update loop of **mini-batch gradient descent**
- effectively **cross-validate** and find the best hyperparameters for Neural Network architecture
- understand the architecture of **Convolutional Neural Networks** and train gain experience with training these models on data

## Setup
You can work on the assignment in one of two ways: locally on your own machine, or on a virtual machine through [Terminal](https://www.terminal.com/).

### Working locally
**Get the code:**
[Download the starter code here](http://cs231n.stanford.edu/assignments/2015/assignment2.zip).

**[Optional] virtual environment:**
Once you have unzipped the starter code, you might want to create a
[virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/)
for the project. If you choose not to use a virtual environment, it is up to you
to make sure that all dependencies for the code are installed on your machine.
To set up a virtual environment, run the following:

```bash
cd assignment2
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Work on the assignment for a while ...
deactivate                       # Exit the virtual environment
```

You can reuse the virtual environment that you created for the first assignment,
but you will need to run `pip install -r requirements.txt` after activating it
to install additional dependencies required by this assignment.

**Download data:**
Once you have the starter code, you will need to download the CIFAR-10 dataset.
Run the following from the `assignment2` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

**Compile the Cython extension:** Convolutional Neural Networks require a very efficient implementation. We have implemented of the functionality using [Cython](http://cython.org/); you will need to compile the Cython extension before you can run the code. From the `cs231n` directory, run the following command:

```bash
python setup.py build_ext --inplace
```

**Start IPython:**
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment2` directory. If you are unfamiliar with IPython, you should read our
[IPython tutorial](/ipython-tutorial).


### Working on Terminal
We have created a Terminal snapshot that is preconfigured for this assignment;
you can [find it here](https://www.terminal.com/tiny/WcL5zSlT0e). Terminal allows you to work on the assignment from your browser. You can find a tutorial on how to use it [here](/terminal-tutorial).

### Submitting your work:
Whether you work on the assignment locally or using Terminal, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment2.zip`. Upload this file to your dropbox on
[the coursework](https://coursework.stanford.edu/portal/site/W15-CS-231N-01/)
page for the course.

### Q1: Two-layer Neural Network (30 points)

The IPython Notebook `two_layer_net.ipynb` will walk you through implementing a two-layer neural network on CIFAR-10. You will write a hard-coded 2-layer Neural Network, implement its backprop pass, and tune its hyperparameters.

### Q2: Modular Neural Network (30 points)

The IPython Notebook `layers.ipynb` will walk you through a modular Neural Network implementation. You will implement the forward and backward passes of many different layer types, including convolution and pooling layers.

### Q3: ConvNet on CIFAR-10 (40 points)

The IPython Notebook `convnet.ipynb` will walk you through the process of training a (shallow) convolutional neural network on CIFAR-10. It wil then be up to you to train the best network that you can.

### Q4: Do something extra! (up to +20 points)

In the process of training your network, you should feel free to implement anything that you want to get better performance. You can modify the solver, implement additional layers, use different types of regularization, use an ensemble of models, or anything else that comes to mind. If you implement these or other ideas not covered in the assignment then you will be awarded some bonus points.


================================================
FILE: assignments/2015/assignment3.md
================================================
---
layout: page
mathjax: true
permalink: assignments2015/assignment3/
---

In the previous assignment, you implemented and trained your own ConvNets.
In this assignment, we will explore many of the ideas we have discussed in
lectures. Specifically, you will:

- Reduce overfitting using **dropout** and **data augmentation**
- Combine models into **ensembles** to boost performance
- Use **transfer learning** to adapt a pretrained model to a new dataset
- Use data gradients to visualize **saliency maps** and create **fooling images**

## Setup
You can work on the assignment in one of two ways: locally on your own machine, or on a virtual machine through [Terminal](https://www.terminal.com/).

### Working locally
**Get the code:**
Download the starter code [here](http://cs231n.stanford.edu/assignments/2015/assignment3.zip).


**[Optional] virtual environment:**
Once you have unzipped the starter code, you might want to create a
[virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/)
for the project. If you choose not to use a virtual environment, it is up to you
to make sure that all dependencies for the code are installed on your machine.
To set up a virtual environment, run the following:

```bash
cd assignment3
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Work on the assignment for a while ...
deactivate                       # Exit the virtual environment
```

You can reuse the virtual environment that you created for the first or second 
assignment, but you will need to run `pip install -r requirements.txt` after
activating it to install additional dependencies required by this assignment.

**Download data:**
Once you have the starter code, you will need to download the CIFAR-10 dataset,
the TinyImageNet-100-A and TinyImageNet-100-B datasets, and pretrained models
for the TinyImageNet-100-A dataset.

Run the following from the `assignment3` directory:

NOTE: After downloading and unpacking, the data and pretrained models will
take about 900MB of disk space.

```bash
cd cs231n/datasets
./get_datasets.sh
./get_tiny_imagenet_splits.sh
./get_pretrained_models.sh
```

**Compile the Cython extension:** Convolutional Neural Networks require a very efficient implementation. We have implemented of the functionality using [Cython](http://cython.org/); you will need to compile the Cython extension before you can run the code. From the `cs231n` directory, run the following command:

```bash
python setup.py build_ext --inplace
```

**Start IPython:**
After you have downloaded the data and compiled the Cython extensions,
you should start the IPython notebook server from the
`assignmen3` directory. If you are unfamiliar with IPython, you should read our
[IPython tutorial](/ipython-tutorial).

### Working on Terminal
We have created a Terminal snapshot that is preconfigured for this assignment;
you can [find it here](https://www.terminal.com/snapshot/f8e4a56cfde0d0baa519c501dfe24b3394a3124b8b84356d2cd033a229a59bff). Terminal allows you to work on the assignment from your browser. You can find a tutorial on how to use it [here](/terminal-tutorial).

### Submitting your work:
Whether you work on the assignment locally or using Terminal, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment3.zip`. Upload this file to your dropbox on
[the coursework](https://coursework.stanford.edu/portal/site/W15-CS-231N-01/)
page for the course.

### Q1: Dropout and Data Augmentation

In the IPython notebook `q1.ipynb` you will implement dropout and several forms of data augmentation. This will allow you to reduce overfitting when training on a small subset of the CIFAR-10 dataset.


### Q2: TinyImageNet and Model Ensembles

In the IPython notebook `q2.ipynb` we will introduce the TinyImageNet-100-A
dataset. You will try to classify examples from this dataset by hand, and you
will combine pretrained models into an ensemble to boost your performance on
this dataset.

### Q3: Transfer Learning

In the IPython notebook `q3.ipynb` you will implement several forms of transfer
learning. You will use adapt a pretrained model for TinyImageNet-100-A to achieve
reasonable performanc with minimal training time on the TinyImageNet-100-B
dataset.

### Q4: Visualizing and Breaking ConvNets

In the IPython notebook `q4.ipynb` you will visualize data gradients and you
will generate images to fool a pretrained ConvNet.


================================================
FILE: assignments/2016/assignment1.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2016/assignment1/
---
**Note: this is the 2016 version of this assignment.**

In this assignment you will practice putting together a simple image classification pipeline, based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages)
- understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- develop proficiency in writing efficient **vectorized** code with numpy
- implement and apply a k-Nearest Neighbor (**kNN**) classifier
- implement and apply a Multiclass Support Vector Machine (**SVM**) classifier
- implement and apply a **Softmax** classifier
- implement and apply a **Two layer neural network** classifier
- understand the differences and tradeoffs between these classifiers
- get a basic understanding of performance improvements from using **higher-level representations** than raw pixels (e.g. color histograms, Histogram of Gradient (HOG) features)

## Setup
You can work on the assignment in one of two ways: locally on your own machine, or on a virtual machine through Terminal.com. 

### Working in the cloud on Terminal

Terminal has created a separate subdomain to serve our class, [www.stanfordterminalcloud.com](https://www.stanfordterminalcloud.com). Register your account there. The Assignment 1 snapshot can then be found [here](https://www.stanfordterminalcloud.com/snapshot/49f5a1ea15dc424aec19155b3398784d57c55045435315ce4f8b96b62819ef65). If you're registered in the class you can contact the TA (see Piazza for more information) to request Terminal credits for use on the assignment. Once you boot up the snapshot everything will be installed for you, and you'll be ready to start on your assignment right away. We've written a small tutorial on Terminal [here](/terminal-tutorial).

### Working locally
Get the code as a zip file [here](http://cs231n.stanford.edu/assignments/2016/winter1516_assignment1.zip). As for the dependencies:

**[Option 1] Use Anaconda:**
The preferred approach for installing all the assignment dependencies is to use [Anaconda](https://www.continuum.io/downloads), which is a Python distribution that includes many of the most popular Python packages for science, math, engineering and data analysis. Once you install it you can skip all mentions of requirements and you're ready to go directly to working on the assignment.

**[Option 2] Manual install, virtual environment:**
If you'd like to (instead of Anaconda) go with a more manual and risky installation route you will likely want to create a [virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/) for the project. If you choose not to use a virtual environment, it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set up a virtual environment, run the following:

```bash
cd assignment1
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Work on the assignment for a while ...
deactivate                       # Exit the virtual environment
```

**Download data:**
Once you have the starter code, you will need to download the CIFAR-10 dataset.
Run the following from the `assignment1` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

**Start IPython:**
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment1` directory. If you are unfamiliar with IPython, you should read our
[IPython tutorial](/ipython-tutorial).

**NOTE:** If you are working in a virtual environment on OSX, you may encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). You can work around this issue by starting the IPython server using the `start_ipython_osx.sh` script from the `assignment1` directory; the script assumes that your virtual environment is named `.env`.

### Submitting your work:
Whether you work on the assignment locally or using Terminal, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment1.zip`. Upload this file to your dropbox on
[the coursework](https://coursework.stanford.edu/portal/site/W16-CS-231N-01/)
page for the course.

### Q1: k-Nearest Neighbor classifier (20 points)

The IPython Notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine (25 points)

The IPython Notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier (20 points)

The IPython Notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network (25 points)
The IPython Notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features (10 points)

The IPython Notebook **features.ipynb** will walk you through this exercise, in which you will examine the improvements gained by using higher-level representations as opposed to using raw pixel values.

### Q6: Cool Bonus: Do something extra! (+10 points)

Implement, investigate or analyze something extra surrounding the topics in this assignment, and using the code you developed. For example, is there some other interesting question we could have asked? Is there any insightful visualization you can plot? Or anything fun to look at? Or maybe you can experiment with a spin on the loss function? If you try out something cool we'll give you up to 10 extra points and may feature your results in the lecture.


================================================
FILE: assignments/2016/assignment2.md
================================================
---
layout: page
mathjax: true
permalink: assignments2016/assignment2/
---
**Note: this is the 2016 version of this assignment.**

In this assignment you will practice writing backpropagation code, and training
Neural Networks and Convolutional Neural Networks. The goals of this assignment
are as follows:

- understand **Neural Networks** and how they are arranged in layered
  architectures
- understand and be able to implement (vectorized) **backpropagation**
- implement various **update rules** used to optimize Neural Networks
- implement **batch normalization** for training deep networks
- implement **dropout** to regularize networks
- effectively **cross-validate** and find the best hyperparameters for Neural
  Network architecture
- understand the architecture of **Convolutional Neural Networks** and train
  gain experience with training these models on data

## Setup
You can work on the assignment in one of two ways: locally on your own machine,
or on a virtual machine through Terminal.com. 

### Working in the cloud on Terminal

Terminal has created a separate subdomain to serve our class,
[www.stanfordterminalcloud.com](https://www.stanfordterminalcloud.com). Register
your account there. The Assignment 2 snapshot can then be found [HERE](https://www.stanfordterminalcloud.com/snapshot/6c95ca2c9866a962964ede3ea5813d4c2410ba48d92cf8d11a93fbb13e08b76a). If you are
registered in the class you can contact the TA (see Piazza for more information)
to request Terminal credits for use on the assignment. Once you boot up the
snapshot everything will be installed for you, and you will be ready to start on
your assignment right away. We have written a small tutorial on Terminal
[here](/terminal-tutorial).

### Working locally
Get the code as a zip file
[here](http://cs231n.stanford.edu/assignments/2016/winter1516_assignment2.zip).
As for the dependencies:

**[Option 1] Use Anaconda:**
The preferred approach for installing all the assignment dependencies is to use
[Anaconda](https://www.continuum.io/downloads), which is a Python distribution
that includes many of the most popular Python packages for science, math,
engineering and data analysis. Once you install it you can skip all mentions of
requirements and you are ready to go directly to working on the assignment.

**[Option 2] Manual install, virtual environment:**
If you do not want to use Anaconda and want to go with a more manual and risky
installation route you will likely want to create a
[virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/)
for the project. If you choose not to use a virtual environment, it is up to you
to make sure that all dependencies for the code are installed globally on your
machine. To set up a virtual environment, run the following:

```bash
cd assignment2
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Work on the assignment for a while ...
deactivate                       # Exit the virtual environment
```

**Download data:**
Once you have the starter code, you will need to download the CIFAR-10 dataset.
Run the following from the `assignment2` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

**Compile the Cython extension:** Convolutional Neural Networks require a very
efficient implementation. We have implemented of the functionality using
[Cython](http://cython.org/); you will need to compile the Cython extension
before you can run the code. From the `cs231n` directory, run the following
command:

```bash
python setup.py build_ext --inplace
```

**Start IPython:**
After you have the CIFAR-10 data, you should start the IPython notebook server
from the `assignment2` directory. If you are unfamiliar with IPython, you should 
read our [IPython tutorial](/ipython-tutorial).

**NOTE:** If you are working in a virtual environment on OSX, you may encounter
errors with matplotlib due to the
[issues described here](http://matplotlib.org/faq/virtualenv_faq.html).
You can work around this issue by starting the IPython server using the
`start_ipython_osx.sh` script from the `assignment2` directory; the script
assumes that your virtual environment is named `.env`.


### Submitting your work:
Whether you work on the assignment locally or using Terminal, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment2.zip`. Upload this file under the Assignments tab on
[the coursework](https://coursework.stanford.edu/portal/site/W15-CS-231N-01/)
page for the course.


### Q1: Fully-connected Neural Network (30 points)
The IPython notebook `FullyConnectedNets.ipynb` will introduce you to our
modular layer design, and then use those layers to implement fully-connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization (30 points)
In the IPython notebook `BatchNormalization.ipynb` you will implement batch
normalization, and use it to train deep fully-connected networks.

### Q3: Dropout (10 points)
The IPython notebook `Dropout.ipynb` will help you implement Dropout and explore
its effects on model generalization.

### Q4: ConvNet on CIFAR-10 (30 points)
In the IPython Notebook `ConvolutionalNetworks.ipynb` you will implement several
new layers that are commonly used in convolutional networks. You will train a
(shallow) convolutional network on CIFAR-10, and it will then be up to you to
train the best network that you can.

### Q5: Do something extra! (up to +10 points)
In the process of training your network, you should feel free to implement
anything that you want to get better performance. You can modify the solver,
implement additional layers, use different types of regularization, use an
ensemble of models, or anything else that comes to mind. If you implement these
or other ideas not covered in the assignment then you will be awarded some bonus
points.


================================================
FILE: assignments/2016/assignment3.md
================================================
---
layout: page
mathjax: true
permalink: assignments2016/assignment3/
---
**Note: this is the 2016 version of this assignment.**

In this assignment you will implement recurrent networks, and apply them to image captioning on Microsoft COCO. We will also introduce the TinyImageNet dataset, and use a pretrained model on this dataset to explore different applications of image gradients.

The goals of this assignment are as follows:

- Understand the architecture of *recurrent neural networks (RNNs)* and how they operate on sequences by sharing weights over time
- Understand the difference between vanilla RNNs and Long-Short Term Memory (LSTM) RNNs
- Understand how to sample from an RNN at test-time
- Understand how to combine convolutional neural nets and recurrent nets to implement an image captioning system
- Understand how a trained convolutional network can be used to compute gradients with respect to the input image
- Implement and different applications of image gradients, including saliency maps, fooling images, class visualizations, feature inversion, and DeepDream.

## Setup
You can work on the assignment in one of two ways: locally on your own machine,
or on a virtual machine through Terminal.com. 

### Working in the cloud on Terminal

Terminal has created a separate subdomain to serve our class,
[www.stanfordterminalcloud.com](https://www.stanfordterminalcloud.com). Register
your account there. The Assignment 3 snapshot can then be found [HERE](https://www.stanfordterminalcloud.com/snapshot/29054ca27bc2e8bda888709ba3d9dd07a172cbbf0824152aac49b14a018ffbe5).
If you are registered in the class you can contact the TA (see Piazza for more
information) to request Terminal credits for use on the assignment. Once you
boot up the snapshot everything will be installed for you, and you will be ready to start on your assignment right away. We have written a small tutorial on Terminal [here](/terminal-tutorial).

### Working locally
Get the code as a zip file
[here](http://cs231n.stanford.edu/assignments/2016/winter1516_assignment3.zip).
As for the dependencies:

**[Option 1] Use Anaconda:**
The preferred approach for installing all the assignment dependencies is to use
[Anaconda](https://www.continuum.io/downloads), which is a Python distribution
that includes many of the most popular Python packages for science, math,
engineering and data analysis. Once you install it you can skip all mentions of
requirements and you are ready to go directly to working on the assignment.

**[Option 2] Manual install, virtual environment:**
If you do not want to use Anaconda and want to go with a more manual and risky
installation route you will likely want to create a
[virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/)
for the project. If you choose not to use a virtual environment, it is up to you
to make sure that all dependencies for the code are installed globally on your
machine. To set up a virtual environment, run the following:

```bash
cd assignment3
sudo pip install virtualenv      # This may already be installed
virtualenv .env                  # Create a virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Work on the assignment for a while ...
deactivate                       # Exit the virtual environment
```

**Download data:**
Once you have the starter code, you will need to download the processed MS-COCO dataset, the TinyImageNet dataset, and the pretrained TinyImageNet model. Run the following from the `assignment3` directory:

```bash
cd cs231n/datasets
./get_coco_captioning.sh
./get_tiny_imagenet_a.sh
./get_pretrained_model.sh
```

**Compile the Cython extension:** Convolutional Neural Networks require a very
efficient implementation. We have implemented of the functionality using
[Cython](http://cython.org/); you will need to compile the Cython extension
before you can run the code. From the `cs231n` directory, run the following
command:

```bash
python setup.py build_ext --inplace
```

**Start IPython:**
After you have the data, you should start the IPython notebook server
from the `assignment3` directory. If you are unfamiliar with IPython, you should 
read our [IPython tutorial](/ipython-tutorial).

**NOTE:** If you are working in a virtual environment on OSX, you may encounter
errors with matplotlib due to the
[issues described here](http://matplotlib.org/faq/virtualenv_faq.html).
You can work around this issue by starting the IPython server using the
`start_ipython_osx.sh` script from the `assignment3` directory; the script
assumes that your virtual environment is named `.env`.


### Submitting your work:
Whether you work on the assignment locally or using Terminal, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment3.zip`. Upload this file under the Assignments tab on
[the coursework](https://coursework.stanford.edu/portal/site/W15-CS-231N-01/)
page for the course.


### Q1: Image Captioning with Vanilla RNNs (40 points)
The IPython notebook `RNN_Captioning.ipynb` will walk you through the
implementation of an image captioning system on MS-COCO using vanilla recurrent
networks.

### Q2: Image Captioning with LSTMs (35 points)
The IPython notebook `LSTM_Captioning.ipynb` will walk you through the
implementation of Long-Short Term Memory (LSTM) RNNs, and apply them to image
captioning on MS-COCO.

### Q3: Image Gradients: Saliency maps and Fooling Images (10 points)
The IPython notebook `ImageGradients.ipynb` will introduce the TinyImageNet
dataset. You will use a pretrained model on this dataset to compute gradients
with respect to the image, and use them to produce saliency maps and fooling
images.

### Q4: Image Generation: Classes, Inversion, DeepDream (15 points)
In the IPython notebook `ImageGeneration.ipynb` you will use the pretrained
TinyImageNet model to generate images. In particular you will generate
class visualizations and implement feature inversion and DeepDream.

### Q5: Do something extra! (up to +10 points)
Given the components of the assignment, try to do something cool. Maybe there is
some way to generate images that we did not implement in the assignment?


================================================
FILE: assignments/2017/assignment1.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2017/assignment1/
---
**Note: this is the 2017 version of this assignment.**

In this assignment you will practice putting together a simple image classification pipeline, based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages)
- understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- develop proficiency in writing efficient **vectorized** code with numpy
- implement and apply a k-Nearest Neighbor (**kNN**) classifier
- implement and apply a Multiclass Support Vector Machine (**SVM**) classifier
- implement and apply a **Softmax** classifier
- implement and apply a **Two layer neural network** classifier
- understand the differences and tradeoffs between these classifiers
- get a basic understanding of performance improvements from using **higher-level representations** than raw pixels (e.g. color histograms, Histogram of Gradient (HOG) features)

## Setup
You can work on the assignment in one of two ways: locally on your own machine, or on a virtual machine on Google Cloud. 

### Working remotely on Google Cloud (Recommended)

**Note:** after following these instructions, make sure you go to **Download data** below (you can skip the **Working locally** section).

As part of this course, you can use Google Cloud for your assignments. We recommend this route for anyone who is having trouble with installation set-up, or if you would like to use better CPU/GPU resources than you may have locally. Please see the set-up tutorial [here](http://cs231n.github.io/gce-tutorial/) for more details. :)

### Working locally
Get the code as a zip file [here](http://cs231n.stanford.edu/assignments/2017/spring1617_assignment1.zip). As for the dependencies:

**Installing Python 3.5+:**
To use python3, make sure to install version 3.5 or 3.6 on your local machine. If you are on Mac OS X, you can do this using [Homebrew](https://brew.sh) with `brew install python3`. You can find instructions for Ubuntu [here](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).

**Virtual environment:**
If you decide to work locally, we recommend using [virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/) for the project. If you choose not to use a virtual environment, it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set up a virtual environment, run the following:

```bash
cd assignment1
sudo pip install virtualenv      # This may already be installed
virtualenv -p python3 .env       # Create a virtual environment (python3)
# Note: you can also use "virtualenv .env" to use your default python (usually python 2.7)
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Work on the assignment for a while ...
deactivate                       # Exit the virtual environment
```

Note that every time you want to work on the assignment, you should run `source .env/bin/activate` (from within your `assignment1` folder) to re-activate the virtual environment, and `deactivate` again whenever you are done.

### Download data:
Once you have the starter code (regardless of which method you choose above), you will need to download the CIFAR-10 dataset.
Run the following from the `assignment1` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

### Start IPython:
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment1` directory, with the `jupyter notebook` command. (See the [Google Cloud Tutorial](http://cs231n.github.io/gce-tutorial/) for any additional steps you may need to do for setting this up, if you are working remotely)

If you are unfamiliar with IPython, you can also refer to our
[IPython tutorial](/ipython-tutorial.md).

### Some Notes
**NOTE 1:** This year, the `assignment1` code has been tested to be compatible with python versions `2.7`, `3.5`, `3.6` (it may work with other versions of `3.x`, but we won't be officially supporting them). You will need to make sure that during your `virtualenv` setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `which python`.

**NOTE 2:** If you are working in a virtual environment on OSX, you may *potentially* encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). In our testing, it seems that this issue is no longer present with the most recent version of matplotlib, but if you do end up running into this issue you may have to use the `start_ipython_osx.sh` script from the `assignment1` directory (instead of `jupyter notebook` above) to launch your IPython notebook server. Note that you may have to modify some variables within the script to match your version of python/installation directory. The script assumes that your virtual environment is named `.env`.

### Submitting your work:
Whether you work on the assignment locally or using Google Cloud, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment1.zip`. Please submit this file on [Canvas](https://canvas.stanford.edu/courses/66461/).

### Q1: k-Nearest Neighbor classifier (20 points)

The IPython Notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine (25 points)

The IPython Notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier (20 points)

The IPython Notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network (25 points)
The IPython Notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features (10 points)

The IPython Notebook **features.ipynb** will walk you through this exercise, in which you will examine the improvements gained by using higher-level representations as opposed to using raw pixel values.

### Q6: Cool Bonus: Do something extra! (+10 points)

Implement, investigate or analyze something extra surrounding the topics in this assignment, and using the code you developed. For example, is there some other interesting question we could have asked? Is there any insightful visualization you can plot? Or anything fun to look at? Or maybe you can experiment with a spin on the loss function? If you try out something cool we'll give you up to 10 extra points and may feature your results in the lecture.


================================================
FILE: assignments/2017/assignment2.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2017/assignment2/
---
**Note: this is the 2017 version of this assignment.**

In this assignment you will practice writing backpropagation code, and training
Neural Networks and Convolutional Neural Networks. The goals of this assignment
are as follows:

- understand **Neural Networks** and how they are arranged in layered
  architectures
- understand and be able to implement (vectorized) **backpropagation**
- implement various **update rules** used to optimize Neural Networks
- implement **batch normalization** for training deep networks
- implement **dropout** to regularize networks
- effectively **cross-validate** and find the best hyperparameters for Neural
  Network architecture
- understand the architecture of **Convolutional Neural Networks** and train
  gain experience with training these models on data

## Setup
You can work on the assignment in one of two ways: locally on your own machine, or on a virtual machine on Google Cloud.

### Working remotely on Google Cloud (Recommended)

**Note:** after following these instructions, make sure you go to **Working on the assignment** below (you can skip the **Working locally** section).

As part of this course, you can use Google Cloud for your assignments. We recommend this route for anyone who is having trouble with installation set-up, or if you would like to use better CPU/GPU resources than you may have locally. 

#### GPU Resources
**A GPU will only help on Question 5**. Please see the Google Cloud GPU set-up tutorial [here](http://cs231n.github.io/gce-tutorial-gpus/) for instructions. The GPU instances are much more expensive, so use them only when needed.

Once you've got the cloud instance running, make sure to run the following line to enter the virtual environment that we prepared for you (you do **not** need to make your own virtual environment):

```
source /home/cs231n/myVE35/bin/activate
```

We strongly recommend using Google Cloud with GPU support for the **Question 5** of this assignment (the TensorFlow or PyTorch notebooks), since your training will go much, much faster. However, it will not help on any of the other questions. 

### Working locally
Here's how you install the necessary dependencies:

**(OPTIONAL) Installing GPU drivers:**
If you choose to work locally, you are at no disadvantage for the first parts of the assignment. For the last question, which is in TensorFlow or PyTorch, however, having a GPU will be a significant advantage. We recommend using a Google Cloud Instance with a GPU, at least for this part. If you have your own NVIDIA GPU, however, and wish to use that, that's fine -- you'll need to install the drivers for your GPU, install CUDA, install cuDNN, and then install either [TensorFlow](https://www.tensorflow.org/install/) or [PyTorch](http://pytorch.org/). You could theoretically do the entire assignment with no GPUs, though this will make training much slower in the last part. However, our reference code runs in 10-15 minutes on a dual-core laptop without a GPU, so it is certainly possible. 

**Installing Python 3.5+:**
To use python3, make sure to install version 3.5 or 3.6 on your local machine. If you are on Mac OS X, you can do this using [Homebrew](https://brew.sh) with `brew install python3`. You can find instructions for Ubuntu [here](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).

**Virtual environment:**
If you decide to work locally, we recommend using [virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/) for the project. If you choose not to use a virtual environment, it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set up a virtual environment, run the following:

```bash
cd assignment2
sudo pip install virtualenv      # This may already be installed
python3 -m venv .env       		 # Create a virtual environment (python3)
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Note that this does NOT install TensorFlow or PyTorch, 
# which you need to do yourself.

# Work on the assignment for a while ...
# ... and when you're done:
deactivate                       # Exit the virtual environment
```

Note that every time you want to work on the assignment, you should run `source .env/bin/activate` (from within your `assignment2` folder) to re-activate the virtual environment, and `deactivate` again whenever you are done.

## Working on the assignment:
Get the code as a zip file [here](http://cs231n.stanford.edu/assignments/2017/spring1617_assignment2.zip).

### Download data:
Once you have the starter code (regardless of which method you choose above), you will need to download the CIFAR-10 dataset.
Run the following from the `assignment2` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

### Start IPython:
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment2` directory, with the `jupyter notebook` command. (See the [Google Cloud Tutorial](http://cs231n.github.io/gce-tutorial/) for any additional steps you may need to do for setting this up, if you are working remotely)

If you are unfamiliar with IPython, you can also refer to our
[IPython tutorial](/ipython-tutorial).

### Some Notes
**NOTE 1:** This year, the `assignment2` code has been tested to be compatible with python versions `3.5` and `3.6` (it may work with other versions of `3.x`, but we won't be officially supporting them). For this assignment, we are NOT officially supporting python2. Use it at your own risk. You will need to make sure that during your `virtualenv` setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `python --version`.

**NOTE 2:** If you are working in a virtual environment on OSX, you may *potentially* encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). In our testing, it seems that this issue is no longer present with the most recent version of matplotlib, but if you do end up running into this issue you may have to use the `start_ipython_osx.sh` script from the `assignment1` directory (instead of `jupyter notebook` above) to launch your IPython notebook server. Note that you may have to modify some variables within the script to match your version of python/installation directory. The script assumes that your virtual environment is named `.env`.

### Submitting your work:
Whether you work on the assignment locally or using Google Cloud, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment2.zip`. Please submit this file on [Canvas](https://canvas.stanford.edu/courses/66461/).

### Q1: Fully-connected Neural Network (25 points)
The IPython notebook `FullyConnectedNets.ipynb` will introduce you to our
modular layer design, and then use those layers to implement fully-connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization (25 points)
In the IPython notebook `BatchNormalization.ipynb` you will implement batch
normalization, and use it to train deep fully-connected networks.

### Q3: Dropout (10 points)
The IPython notebook `Dropout.ipynb` will help you implement Dropout and explore
its effects on model generalization.

### Q4: Convolutional Networks (30 points)
In the IPython Notebook ConvolutionalNetworks.ipynb you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch / TensorFlow on CIFAR-10 (10 points)
For this last part, you will be working in either TensorFlow or PyTorch, two popular and powerful deep learning frameworks. **You only need to complete ONE of these two notebooks.** You do NOT need to do both, but a very small amount of extra credit will be awarded to those who do. 

Open up either `PyTorch.ipynb` or `TensorFlow.ipynb`. There, you will learn how the framework works, culminating in training a  convolutional network of your own design on CIFAR-10 to get the best performance you can.

### Q5: Do something extra! (up to +10 points)
In the process of training your network, you should feel free to implement
anything that you want to get better performance. You can modify the solver,
implement additional layers, use different types of regularization, use an
ensemble of models, or anything else that comes to mind. If you implement these
or other ideas not covered in the assignment then you will be awarded some bonus
points.


================================================
FILE: assignments/2017/assignment3.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2017/assignment3/
---
**Note: this is the 2017 version of this assignment.**

In this assignment you will implement recurrent networks, and apply them to image captioning on Microsoft COCO. You will also explore methods for visualizing the features of a pretrained model on ImageNet, and also this model to implement Style Transfer. Finally, you will train a generative adversarial network to generate images that look like a training dataset!

The goals of this assignment are as follows:

- Understand the architecture of *recurrent neural networks (RNNs)* and how they operate on sequences by sharing weights over time
- Understand and implement both Vanilla RNNs and Long-Short Term Memory (LSTM) RNNs
- Understand how to sample from an RNN language model at test-time
- Understand how to combine convolutional neural nets and recurrent nets to implement an image captioning system
- Understand how a trained convolutional network can be used to compute gradients with respect to the input image
- Implement and different applications of image gradients, including saliency maps, fooling images, class visualizations.
- Understand and implement style transfer.
- Understand how to train and implement a generative adversarial network (GAN) to produce images that look like a dataset. 

## Setup
You can work on the assignment in one of two ways: locally on your own machine, or on a virtual machine on Google Cloud.

### Working remotely on Google Cloud (Recommended)

**Note:** after following these instructions, make sure you go to **Working on the assignment** below (you can skip the **Working locally** section).

As part of this course, you can use Google Cloud for your assignments. We recommend this route for anyone who is having trouble with installation set-up, or if you would like to use better CPU/GPU resources than you may have locally. 

#### GPU Resources
GPUs are **not required** for this assignment, but will help to speed up training and processing time for questions 3-5. Please see the Google Cloud GPU set-up tutorial [here](http://cs231n.github.io/gce-tutorial-gpus/) for instructions. The GPU instances are much more expensive, so use them only when needed. **We only recommend a GPU for Q5 (GANs)**, Q3 and Q4 run in under a minute even on CPU, while Q5 can take 30+ minutes for your model to converge on a CPU (versus under a minute on the Google Cloud GPUs). 

Once you've got the cloud instance running, make sure to run the following line to enter the virtual environment that we prepared for you (you do **not** need to make your own virtual environment):

```bash
source /home/cs231n/myVE35/bin/activate
```

We recommend using Google Cloud with GPU support for the question 5 of this assignment (the GAN notebook), since your training will go much, much faster. However, it will not help at all for questions 1 and 2 (RNN and LSTM), and questions 3 and 4 are still fast on CPU (these notebook should run in a few minutes).

#### What do I do if my Google Cloud GPUs disappeared?
You might note that sometimes, your GPUs are no longer accessible on your Google Cloud instance after you restart it. If this happens, please run the following commands in your assignment3 directory:

```bash
sudo apt-get remove unattended-upgrades
chmod u+x where_are_my_drivers.sh
./where_are_my_drivers.sh
```

If this isn't working, you can find more detailed instructions and manual ways of fixing this [here](https://cloud.google.com/compute/docs/gpus/add-gpus#install-driver-script). You should follow the "Ubuntu 16.04" instructions. 

### Working locally
Here's how you install the necessary dependencies:

**(OPTIONAL) Installing GPU drivers:**
If you choose to work locally, you are at no disadvantage for the first parts of the assignment. For the last question, which is in TensorFlow or PyTorch, however, having a GPU will be a significant advantage. We recommend using a Google Cloud Instance with a GPU, at least for this part. If you have your own NVIDIA GPU, however, and wish to use that, that's fine -- you'll need to install the drivers for your GPU, install CUDA, install cuDNN, and then install either [TensorFlow](https://www.tensorflow.org/install/) or [PyTorch](http://pytorch.org/). You could theoretically do the entire assignment with no GPUs, though this will make training much slower in the last part. However, our reference code runs in 10-15 minutes on a dual-core laptop without a GPU, so it is certainly possible. 

**Installing Python 3.5+:**
To use python3, make sure to install version 3.5 or 3.6 on your local machine. If you are on Mac OS X, you can do this using [Homebrew](https://brew.sh) with `brew install python3`. You can find instructions for Ubuntu [here](https://www.digitalocean.com/community/tutorials/how-to-install-python-3-and-set-up-a-local-programming-environment-on-ubuntu-16-04).

**Virtual environment:**
If you decide to work locally, we recommend using [virtual environment](http://docs.python-guide.org/en/latest/dev/virtualenvs/) for the project. If you choose not to use a virtual environment, it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set up a virtual environment, run the following:

```bash
cd assignment2
sudo pip install virtualenv      # This may already be installed
virtualenv -p python3 .env   		 # Create a virtual environment (python3)
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install dependencies
# Note that this does NOT install TensorFlow or PyTorch, 
# which you need to do yourself.

# Work on the assignment for a while ...
# ... and when you're done:
deactivate                       # Exit the virtual environment
```

Note that every time you want to work on the assignment, you should run `source .env/bin/activate` (from within your `assignment3` folder) to re-activate the virtual environment, and `deactivate` again whenever you are done.

## Working on the assignment:

### Get the code as a zip file [here](http://cs231n.stanford.edu/assignments/2017/spring1617_assignment3_v3.zip).

### Download data:
Once you have the starter code (regardless of which method you choose above), you will need to download the COCO captioning data,  pretrained SqueezeNet model (TensorFlow-only), and a few ImageNet validation images.
Run the following from the `assignment3` directory:

```bash
cd cs231n/datasets
./get_assignment3_data.sh
```

### Start IPython:
After you have downloaded the data, you should start the IPython notebook server from the
`assignment3` directory, with the `jupyter notebook` command. (See the [Google Cloud Tutorial](http://cs231n.github.io/gce-tutorial/) for any additional steps you may need to do for setting this up, if you are working remotely)

If you are unfamiliar with IPython, you can also refer to our
[IPython tutorial](/ipython-tutorial).

### Some Notes
**NOTE 1:** This year, the `assignment3` code has been tested to be compatible with python versions `3.5` and `3.6` (it may work with other versions of `3.x`, but we won't be officially supporting them). For this assignment, we are NOT officially supporting python2. Use it at your own risk. You will need to make sure that during your `virtualenv` setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `python --version`.

**NOTE 2:** If you are working in a virtual environment on OSX, you may *potentially* encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). In our testing, it seems that this issue is no longer present with the most recent version of matplotlib, but if you do end up running into this issue you may have to use the `start_ipython_osx.sh` script from the `assignment3` directory (instead of `jupyter notebook` above) to launch your IPython notebook server. Note that you may have to modify some variables within the script to match your version of python/installation directory. The script assumes that your virtual environment is named `.env`.

### Submitting your work:
Whether you work on the assignment locally or using Google Cloud, once you are done
working run the `collectSubmission.sh` script; this will produce a file called
`assignment3.zip`. Please submit this file on [Canvas](https://canvas.stanford.edu/courses/66461/).

#### You can do Questions 3, 4, and 5 in TensorFlow or PyTorch. There are two versions of each notebook, with suffixes -TensorFlow or -PyTorch. No extra credit will be awarded if you do a question in both TensorFlow and PyTorch.

### Q1: Image Captioning with Vanilla RNNs (25 points)
The Jupyter notebook `RNN_Captioning.ipynb` will walk you through the
implementation of an image captioning system on MS-COCO using vanilla recurrent
networks.

### Q2: Image Captioning with LSTMs (30 points)
The Jupyter notebook `LSTM_Captioning.ipynb` will walk you through the
implementation of Long-Short Term Memory (LSTM) RNNs, and apply them to image
captioning on MS-COCO.

### Q3: Network Visualization: Saliency maps, Class Visualization, and Fooling Images (15 points)
The Jupyter notebooks `NetworkVisualization-TensorFlow.ipynb` /`NetworkVisualization-PyTorch.ipynb` will introduce the pretrained SqueezeNet model, compute gradients
with respect to images, and use them to produce saliency maps and fooling
images. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q4: Style Transfer (15 points)
In the Jupyter notebooks `StyleTransfer-TensorFlow.ipynb`/`StyleTransfer-PyTorch.ipynb` you will learn how to create images with the content of one image but the style of another. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q5: Generative Adversarial Networks (15 points)
In the Jupyter notebooks `GANs-TensorFlow.ipynb`/`GANs-PyTorch.ipynb` you will learn how to generate images that match a training dataset, and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awarded if you complete both notebooks.


================================================
FILE: assignments/2018/assignment1.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2018/assignment1/
---
**Note: this is the 2018 version of this assignment.**

In this assignment you will practice putting together a simple image classification pipeline, based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages)
- understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- develop proficiency in writing efficient **vectorized** code with numpy
- implement and apply a k-Nearest Neighbor (**kNN**) classifier
- implement and apply a Multiclass Support Vector Machine (**SVM**) classifier
- implement and apply a **Softmax** classifier
- implement and apply a **Two layer neural network** classifier
- understand the differences and tradeoffs between these classifiers
- get a basic understanding of performance improvements from using **higher-level representations** than raw pixels (e.g. color histograms, Histogram of Gradient (HOG) features)

## Setup

Get the code as a zip file [here](http://cs231n.github.io/assignments/2018/spring1718_assignment1.zip).

You can follow the setup instructions [here](http://cs231n.github.io/setup-instructions/).

### Download data:
Once you have the starter code (regardless of which method you choose above), you will need to download the CIFAR-10 dataset.
Run the following from the `assignment1` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

### Start IPython:
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment1` directory, with the `jupyter notebook` command. (See the [Google Cloud Tutorial](http://cs231n.github.io/gce-tutorial/) for any additional steps you may need to do for setting this up, if you are working remotely)

If you are unfamiliar with IPython, you can also refer to our
[IPython tutorial](/ipython-tutorial).

### Some Notes
**NOTE 1:** This year, the `assignment1` code has been tested to be compatible with python version `3.6` (it may work with other versions of `3.x`, but we won't be officially supporting them). You will need to make sure that during your virtual environment setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `which python`.

**NOTE 2:** If you are working in a virtual environment on OSX, you may *potentially* encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). In our testing, it seems that this issue is no longer present with the most recent version of matplotlib, but if you do end up running into this issue you may have to use the `start_ipython_osx.sh` script from the `assignment1` directory (instead of `jupyter notebook` above) to launch your IPython notebook server. Note that you may have to modify some variables within the script to match your version of python/installation directory. The script assumes that your virtual environment is named `.env`.

### Q1: k-Nearest Neighbor classifier (20 points)

The IPython Notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine (25 points)

The IPython Notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier (20 points)

The IPython Notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network (25 points)
The IPython Notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features (10 points)

The IPython Notebook **features.ipynb** will walk you through this exercise, in which you will examine the improvements gained by using higher-level representations as opposed to using raw pixel values.

### Submitting your work
There are **_two_** steps to submitting your assignment:

**1.** Submit a pdf of the completed iPython notebooks to [Gradescope](https://gradescope.com/courses/17367). If you are enrolled in the course, then you should have already been automatically added to the course on Gradescope. 

To produce a pdf of your work, you can first convert each of the .ipynb files to HTML. To do this, simply run 

```bash
ipython nbconvert --to html FILE.ipynb
```
for each of the notebooks, where `FILE.ipynb` is the notebook you want to convert. Then you can convert the HTML files to PDFs with your favorite web browser, and then concatenate them all together in your favorite PDF viewer/editor. Submit this final PDF on Gradescope, and be sure to tag the questions correctly!

**Important:** _Please make sure that the submitted notebooks have been run and the cell outputs are visible._


**2.** Submit a zip file of your assignment on AFS. To do this, run the provided `collectSubmission.sh` script, which will produce a file called `assignment1.zip`. You will then need to SCP this file over to Stanford AFS using the following command (entering your Stanford password if requested):

```bash
# Run from the assignment directory where the zip file is located
scp assignment1.zip YOUR_SUNET@myth.stanford.edu:~/DEST_PATH
```

`YOUR_SUNET` should be replaced with your SUNetID (e.g. `jdoe`), and `DEST_PATH` should be a path to an existing directory on AFS where you want the zip file to be copied to (you may want to create a CS231N directory for convenience). Once this is done, run the following:

 ```bash
# SSH into the Stanford Myth machines 
ssh YOUR_SUNET@myth.stanford.edu

# Descend into the directory where the zip file is now located
cd DEST_PATH

# Run the script to actually submit the assignment
/afs/ir/class/cs231n/submit
```
Once you run the submit script, simply follow the on-screen prompts to finish submitting the assignment on AFS. If successful, you should see a "SUBMIT SUCCESS" message output by the script.


================================================
FILE: assignments/2018/assignment2.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2018/assignment2/
---
**Note: this is the 2018 version of this assignment.**

In this assignment you will practice writing backpropagation code, and training
Neural Networks and Convolutional Neural Networks. The goals of this assignment
are as follows:

- understand **Neural Networks** and how they are arranged in layered
  architectures
- understand and be able to implement (vectorized) **backpropagation**
- implement various **update rules** used to optimize Neural Networks
- implement **Batch Normalization** and **Layer Normalization** for training deep networks
- implement **Dropout** to regularize networks
- understand the architecture of **Convolutional Neural Networks** and
  get practice with training these models on data
- gain experience with a major deep learning framework, such as **TensorFlow** or **PyTorch**.

## Setup
Get the code as a zip file [here](http://cs231n.github.io/assignments/2018/spring1718_assignment2_v2.zip).

You can follow the setup instructions [here](http://cs231n.github.io/setup-instructions/).

**NOTE: Our initial release of the assignment did not include the PyTorch and TensorFlow notebooks for Q5. These have now been finalized, and the zip file has been updated with these notebooks.**

### Download data:
Once you have the starter code, you will need to download the CIFAR-10 dataset.
Run the following from the `assignment2` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

### Start IPython:
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment2` directory, with the `jupyter notebook` command. (See the [Google Cloud Tutorial](http://cs231n.github.io/gce-tutorial/) for any additional steps you may need to do for setting this up, if you are working remotely).

If you are unfamiliar with IPython, you can also refer to our
[IPython tutorial](/ipython-tutorial).

### Some Notes
**NOTE 1:** This year, the `assignment2` code has been tested to be compatible with python version `3.6` (it may work with other versions of `3.x`, but we won't be officially supporting them). You will need to make sure that during your virtual environment setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `which python`.

**NOTE 2:** If you are working in a virtual environment on OSX, you may *potentially* encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). In our testing, it seems that this issue is no longer present with the most recent version of matplotlib, but if you do end up running into this issue you may have to use the `start_ipython_osx.sh` script from the `assignment2` directory (instead of `jupyter notebook` above) to launch your IPython notebook server. Note that you may have to modify some variables within the script to match your version of python/installation directory. The script assumes that your virtual environment is named `.env`.

### Q1: Fully-connected Neural Network (20 points)
The IPython notebook `FullyConnectedNets.ipynb` will introduce you to our
modular layer design, and then use those layers to implement fully-connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization (30 points)
In the IPython notebook `BatchNormalization.ipynb` you will implement batch
normalization, and use it to train deep fully-connected networks.

### Q3: Dropout (10 points)
The IPython notebook `Dropout.ipynb` will help you implement Dropout and explore
its effects on model generalization.

### Q4: Convolutional Networks (30 points)
In the IPython Notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch / TensorFlow on CIFAR-10 (10 points)
For this last part, you will be working in either TensorFlow or PyTorch, two popular and powerful deep learning frameworks. **You only need to complete ONE of these two notebooks.** You do NOT need to do both, and we will _not_ be awarding extra credit to those who do. 

Open up either `PyTorch.ipynb` or `TensorFlow.ipynb`. There, you will learn how the framework works, culminating in training a  convolutional network of your own design on CIFAR-10 to get the best performance you can.

**NOTE**: The PyTorch notebook requires PyTorch version 0.4, which was released on 4/24/2018. You can install this version of PyTorch using conda or pip by following the instructions here: http://pytorch.org/


### Submitting your work
There are **_two_** steps to submitting your assignment:

**1.** Submit a pdf of the completed iPython notebooks to [Gradescope](https://gradescope.com/courses/17367). If you are enrolled in the course, then you should have already been automatically added to the course on Gradescope. 

To produce a pdf of your work, you can first convert each of the .ipynb files to HTML. To do this, simply run from your assignment directory

```bash
jupyter nbconvert --to html FILE.ipynb
```
for each of the notebooks, where `FILE.ipynb` is the notebook you want to convert. Then you can convert the HTML files to PDFs with your favorite web browser, and then concatenate them all together in your favorite PDF viewer/editor. Submit this final PDF on Gradescope, and be sure to tag the questions correctly!

**Important:** _Please make sure that the submitted notebooks have been run and the cell outputs are visible._


**2.** Submit a zip file of your assignment on AFS. To do this, run the provided `collectSubmission.sh` script, which will produce a file called `assignment2.zip`. You will then need to SCP this file over to Stanford AFS using the following command (entering your Stanford password if requested):

```bash
# Run from the assignment directory where the zip file is located
scp assignment2.zip YOUR_SUNET@myth.stanford.edu:~/DEST_PATH
```

`YOUR_SUNET` should be replaced with your SUNetID (e.g. `jdoe`), and `DEST_PATH` should be a path to an existing directory on AFS where you want the zip file to be copied to (you may want to create a CS231N directory for convenience). Once this is done, run the following:

 ```bash
# SSH into the Stanford Myth machines 
ssh YOUR_SUNET@myth.stanford.edu

# Descend into the directory where the zip file is now located
cd DEST_PATH

# Run the script to actually submit the assignment
/afs/ir/class/cs231n/submit
```
Once you run the submit script, simply follow the on-screen prompts to finish submitting the assignment on AFS. If successful, you should see a "SUBMIT SUCCESS" message output by the script.


================================================
FILE: assignments/2018/assignment3.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2018/assignment3/
---
**Note: this is the 2018 version of this assignment.**

In this assignment you will implement recurrent networks, and apply them to image captioning on Microsoft COCO. You will also explore methods for visualizing the features of a pretrained model on ImageNet, and also this model to implement Style Transfer. Finally, you will train a Generative Adversarial Network to generate images that look like a training dataset!

The goals of this assignment are as follows:

- Understand the architecture of *recurrent neural networks (RNNs)* and how they operate on sequences by sharing weights over time
- Understand and implement both Vanilla RNNs and Long-Short Term Memory (LSTM) networks.
- Understand how to combine convolutional neural nets and recurrent nets to implement an image captioning system
- Explore various applications of image gradients, including saliency maps, fooling images, class visualizations.
- Understand and implement techniques for image style transfer.
- Understand how to train and implement a Generative Adversarial Network (GAN) to produce images that resemble samples from a dataset. 

## Setup

Get the code as a zip file [here](http://cs231n.github.io/assignments/2018/spring1718_assignment3.zip).

**Update (5/11/18, 6:05 a.m.): The assignment zip file has been updated to include the proper optim.py and layers.py files.**

You can follow the setup instructions [here](http://cs231n.github.io/setup-instructions/).

If you haven't already, you'll need to install either TensorFlow 1.7 (installation instructions [here](https://www.tensorflow.org/versions/r1.7/install/)) or PyTorch 0.4 (instructions [here](http://pytorch.org/)) depending on which notebooks you decide to complete.

### Download data:
Once you have the starter code, you will need to download the COCO captioning data,  pretrained SqueezeNet model (TensorFlow-only), and a few ImageNet validation images.
Run the following from the `assignment3` directory:

```bash
cd cs231n/datasets
./get_assignment3_data.sh
```

### Start IPython:
After you have downloaded the data, you should start the IPython notebook server from the
`assignment3` directory, with the `jupyter notebook` command. (See the [Google Cloud Tutorial](http://cs231n.github.io/gce-tutorial/) for any additional steps you may need to do for setting this up, if you are working remotely)

If you are unfamiliar with IPython, you can also refer to our
[IPython tutorial](/ipython-tutorial).

### Some Notes
**NOTE 1:** This year, the `assignment3` code has been tested to be compatible with python version `3.6` (it may work with other versions of `3.x`, but we won't be officially supporting them). You will need to make sure that during your virtual environment setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `which python`.

**NOTE 2:** If you are working in a virtual environment on OSX, you may *potentially* encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). In our testing, it seems that this issue is no longer present with the most recent version of matplotlib, but if you do end up running into this issue you may have to use the `start_ipython_osx.sh` script from the `assignment3` directory (instead of `jupyter notebook` above) to launch your IPython notebook server. Note that you may have to modify some variables within the script to match your version of python/installation directory. The script assumes that your virtual environment is named `.env`.

#### You can do Questions 3, 4, and 5 in TensorFlow or PyTorch. There are two versions of each of these notebooks, one for TensorFlow and one for PyTorch. No extra credit will be awarded if you do a question in both TensorFlow and PyTorch.

### Q1: Image Captioning with Vanilla RNNs (25 points)
The Jupyter notebook `RNN_Captioning.ipynb` will walk you through the
implementation of an image captioning system on MS-COCO using vanilla recurrent
networks.

### Q2: Image Captioning with LSTMs (30 points)
The Jupyter notebook `LSTM_Captioning.ipynb` will walk you through the
implementation of Long-Short Term Memory (LSTM) RNNs, and apply them to image
captioning on MS-COCO.

### Q3: Network Visualization: Saliency maps, Class Visualization, and Fooling Images (15 points)
The Jupyter notebooks `NetworkVisualization-TensorFlow.ipynb` /`NetworkVisualization-PyTorch.ipynb` will introduce the pretrained SqueezeNet model, compute gradients
with respect to images, and use them to produce saliency maps and fooling
images. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q4: Style Transfer (15 points)
In the Jupyter notebooks `StyleTransfer-TensorFlow.ipynb`/`StyleTransfer-PyTorch.ipynb` you will learn how to create images with the content of one image but the style of another. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q5: Generative Adversarial Networks (15 points)
In the Jupyter notebooks `GANS-TensorFlow.ipynb`/`GANS-PyTorch.ipynb` you will learn how to generate images that match a training dataset, and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awarded if you complete both notebooks.

### Submitting your work
There are **_two_** steps that you must complete to submit your assignment:

**1.** Submit a pdf of the completed iPython notebooks to [Gradescope](https://gradescope.com/courses/17367). If you are enrolled in the course, then you should have already been automatically added to the course on Gradescope. 

To produce a pdf of your work, you can first convert each of the .ipynb files to HTML. To do this, simply run from your assignment directory

```bash
jupyter nbconvert --to html FILE.ipynb
```
for each of the notebooks, where `FILE.ipynb` is the notebook you want to convert. Then you can convert the HTML files to PDFs with your favorite web browser, and then concatenate them all together in your favorite PDF viewer/editor. Submit this final PDF on Gradescope, and be sure to tag the questions correctly!

**Important:** _Please make sure that the submitted notebooks have been run and the cell outputs are visible._


**2.** Submit a zip file of your assignment on AFS. To do this, run the provided `collectSubmission.sh` script, which will produce a file called `assignment3.zip`. You will then need to SCP this file over to Stanford AFS using the following command (entering your Stanford password if requested):

```bash
# Run from the assignment directory where the zip file is located
scp assignment3.zip YOUR_SUNET@myth.stanford.edu:~/DEST_PATH
```

`YOUR_SUNET` should be replaced with your SUNetID (e.g. `jdoe`), and `DEST_PATH` should be a path to an existing directory on AFS where you want the zip file to be copied to (you may want to create a CS231N directory for convenience). Once this is done, run the following:

 ```bash
# SSH into the Stanford Myth machines 
ssh YOUR_SUNET@myth.stanford.edu

# Descend into the directory where the zip file is now located
cd DEST_PATH

# Run the script to actually submit the assignment
/afs/ir/class/cs231n/submit
```
Once you run the submit script, simply follow the on-screen prompts to finish submitting the assignment on AFS. If successful, you should see a "SUBMIT SUCCESS" message output by the script.


================================================
FILE: assignments/2019/assignment1.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2019/assignment1/
---

In this assignment you will practice putting together a simple image classification pipeline, based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages)
- understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- develop proficiency in writing efficient **vectorized** code with numpy
- implement and apply a k-Nearest Neighbor (**kNN**) classifier
- implement and apply a Multiclass Support Vector Machine (**SVM**) classifier
- implement and apply a **Softmax** classifier
- implement and apply a **Two layer neural network** classifier
- understand the differences and tradeoffs between these classifiers
- get a basic understanding of performance improvements from using **higher-level representations** than raw pixels (e.g. color histograms, Histogram of Gradient (HOG) features)

## Setup

Get the code as a zip file [here](http://cs231n.github.io/assignments/2019/spring1819_assignment1.zip).

You can follow the setup instructions [here](/setup-instructions).

### Download data:
Once you have the starter code (regardless of which method you choose above), you will need to download the CIFAR-10 dataset.
Run the following from the `assignment1` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```

### Start IPython:
After you have the CIFAR-10 data, you should start the IPython notebook server from the
`assignment1` directory, with the `jupyter notebook` command. (See the [Google Cloud Tutorial](https://github.com/cs231n/gcloud/) for any additional steps you may need to do for setting this up, if you are working remotely)

If you are unfamiliar with IPython, you can also refer to our
[IPython tutorial](/ipython-tutorial).

### Some Notes
**NOTE 1:** There are `# *****START OF YOUR CODE`/`# *****END OF YOUR CODE` tags denoting the start and end of code sections you should fill out. Take care to not delete or modify these tags, or your assignment may not be properly graded.

**NOTE 2:** The submission process this year has **2 steps**, requiring you to 1. run a submission script and 2. download/upload an auto-generated pdf (details below.) We suggest **_making a test submission early on_** to make sure you are able to successfully submit your assignment on time (a maximum of 10 submissions can be made.)

**NOTE 3:** This year, the `assignment1` code has been tested to be compatible with python version `3.7` (it may work with other versions of `3.x`, but we won't be officially supporting them). You will need to make sure that during your virtual environment setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `which python`.

**NOTE 4:** If you are working in a virtual environment on OSX, you may *potentially* encounter
errors with matplotlib due to the [issues described here](http://matplotlib.org/faq/virtualenv_faq.html). In our testing, it seems that this issue is no longer present with the most recent version of matplotlib, but if you do end up running into this issue you may have to use the `start_ipython_osx.sh` script from the `assignment1` directory (instead of `jupyter notebook` above) to launch your IPython notebook server. Note that you may have to modify some variables within the script to match your version of python/installation directory. The script assumes that your virtual environment is named `.env`.

### Q1: k-Nearest Neighbor classifier (20 points)

The IPython Notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine (25 points)

The IPython Notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier (20 points)

The IPython Notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network (25 points)
The IPython Notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features (10 points)

The IPython Notebook **features.ipynb** will walk you through this exercise, in which you will examine the improvements gained by using higher-level representations as opposed to using raw pixel values.

### Submitting your work

**Important:** _Please make sure that the submitted notebooks have been run and the cell outputs are visible._

There are **_two_** steps to submitting your assignment:

**1.** Run the provided `collectSubmission.sh` script in the `assignment1` directory.

You will be prompted for your SunetID (e.g. `jdoe`) and will need to provide your Stanford password. This script will generate a zip file of your code, submit your source code to Stanford AFS, and generate a pdf `a1.pdf` in a `cs231n-2019-assignment1/` folder in your AFS home directory. 

If your submission for this step was successful, you should see a display message 

`### Code submitted at [TIME], [N] submission attempts remaining. ###`

**2.** Download the generated `a1.pdf` from AFS, then submit the pdf to [Gradescope](https://gradescope.com/courses/17367). If you are enrolled in the course, you should have already been automatically added to the course on Gradescope. 

================================================
FILE: assignments/2019/assignment2.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2019/assignment2/
---

In this assignment you will practice writing backpropagation code, and training
Neural Networks and Convolutional Neural Networks. The goals of this assignment
are as follows:

- understand **Neural Networks** and how they are arranged in layered
  architectures
- understand and be able to implement (vectorized) **backpropagation**
- implement various **update rules** used to optimize Neural Networks
- implement **Batch Normalization** and **Layer Normalization** for training deep networks
- implement **Dropout** to regularize networks
- understand the architecture of **Convolutional Neural Networks** and
  get practice with training these models on data
- gain experience with a major deep learning framework, such as **TensorFlow** or **PyTorch**.

## Setup
Get the code as a zip file [here](http://cs231n.github.io/assignments/2019/spring1819_assignment2.zip).

You can follow the setup instructions [here](/setup-instructions).

If you performed the google cloud setup already for assignment1, you can skip this step and use the virtual machine you created previously. 
(However, if you're using your virtual machine from assignment1, you might need to perform additional installation steps for the 5th notebook depending on whether you're using Pytorch or Tensorflow. See below for details.)

### Some Notes
**NOTE 1:** This year, the `assignment2` code has been tested to be compatible with python version `3.7` (it may work with other versions of `3.x`, but we won't be officially supporting them). You will need to make sure that during your virtual environment setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `which python`.

**NOTE 2:** As noted in the setup instructions, we recommend you to develop on Google Cloud, and we have limited support for local machine configurations. In particular, for students who wish to develop with Windows machines, we recommend installing a Linux subsystem (preferably Ubuntu) via the [Windows App Store](https://docs.microsoft.com/en-us/windows/wsl/install-win10) to streamline the AFS submission process.

**NOTE 3:** The submission process this year has **2 steps**, requiring you to 1. run a submission script and 2. download/upload an auto-generated pdf (details below.) We suggest **_making a test submission early on_** to make sure you are able to successfully submit your assignment on time (a maximum of 10 successful submissions can be made.)

### Q1: Fully-connected Neural Network (20 points)
The IPython notebook `FullyConnectedNets.ipynb` will introduce you to our
modular layer design, and then use those layers to implement fully-connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization (30 points)
In the IPython notebook `BatchNormalization.ipynb` you will implement batch
normalization, and use it to train deep fully-connected networks.

### Q3: Dropout (10 points)
The IPython notebook `Dropout.ipynb` will help you implement Dropout and explore
its effects on model generalization.

### Q4: Convolutional Networks (30 points)
In the IPython Notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch / TensorFlow on CIFAR-10 (10 points)
For this last part, you will be working in either TensorFlow or PyTorch, two popular and powerful deep learning frameworks. **You only need to complete ONE of these two notebooks.** You do NOT need to do both, and we will _not_ be awarding extra credit to those who do. 

Open up either `PyTorch.ipynb` or `TensorFlow.ipynb`. There, you will learn how the framework works, culminating in training a  convolutional network of your own design on CIFAR-10 to get the best performance you can.

**NOTE 1**: The PyTorch notebook requires PyTorch version 1.0, which comes pre-installed on the Google cloud instances.

**NOTE 2**: The TensorFlow notebook requires Tensorflow version 2.0. If you want to work on the Tensorflow notebook with your VM from assignment1, please follow the instructions on [Piazza](https://piazza.com/class/js3o5prh5w378a?cid=384) to install TensorFlow. 
 New virtual machines that are set up following the [instructions](/setup-instructions) will come with the correct version of Tensorflow.


### Submitting your work
There are **_two_** steps to submitting your assignment:

**1.** Run the provided `collectSubmission.sh` script in the `assignment2` directory.

You will be prompted for your SunetID (e.g. `jdoe`) and will need to provide your Stanford password. This script will generate a zip file of your code, submit your source code to Stanford AFS, and generate a pdf `a2.pdf` in a `cs231n-2019-assignment2/` folder in your AFS home directory. 

If your submission for this step was successful, you should see a display message 

`### Code submitted at [TIME], [N] submission attempts remaining. ###`

**2.** Download the generated `a2.pdf` from AFS, then submit the pdf to [Gradescope](https://gradescope.com/courses/17367).


================================================
FILE: assignments/2019/assignment3.md
================================================
---
layout: page
mathjax: true
permalink: /assignments2019/assignment3/
---

In this assignment you will implement recurrent networks, and apply them to image captioning on Microsoft COCO. You will also explore methods for visualizing the features of a pretrained model on ImageNet, and also this model to implement Style Transfer. Finally, you will train a Generative Adversarial Network to generate images that look like a training dataset!

The goals of this assignment are as follows:

- Understand the architecture of *recurrent neural networks (RNNs)* and how they operate on sequences by sharing weights over time
- Understand and implement both Vanilla RNNs and Long-Short Term Memory (LSTM) networks.
- Understand how to combine convolutional neural nets and recurrent nets to implement an image captioning system
- Explore various applications of image gradients, including saliency maps, fooling images, class visualizations.
- Understand and implement techniques for image style transfer.
- Understand how to train and implement a Generative Adversarial Network (GAN) to produce images that resemble samples from a dataset. 

## Setup

Get the code as a zip file [here](http://cs231n.github.io/assignments/2019/spring1819_assignment3.zip).
You should be able to use your setup from assignment 2.

### Download data:
Once you have the starter code, you will need to download the COCO captioning data,  pretrained SqueezeNet model (TensorFlow-only), and a few ImageNet validation images.
Run the following from the `assignment3` directory:

```bash
cd cs231n/datasets
./get_assignment3_data.sh
```

### Some Notes
**NOTE 1:** This year, the `assignment3` code has been tested to be compatible with python version `3.7` (it may work with other versions of `3.x`, but we won't be officially supporting them). You will need to make sure that during your virtual environment setup that the correct version of `python` is used. You can confirm your python version by (1) activating your virtualenv and (2) running `which python`.

**NOTE 2: Please make sure that the submitted notebooks have been run and saved, and the cell outputs are visible on your pdfs.** In addition, please **do not use the Web AFS interface** to retrieve your pdfs, and rely on **scp commands** directly, as there is a known Web AFS caching bug University IT is looking at that causes AFS files to not be properly updated with their most current version.

#### You can do Questions 3, 4, and 5 in TensorFlow or PyTorch. There are two versions of each of these notebooks, one for TensorFlow and one for PyTorch. No extra credit will be awarded if you do a question in both TensorFlow and PyTorch.

### Q1: Image Captioning with Vanilla RNNs (25 points)
The Jupyter notebook `RNN_Captioning.ipynb` will walk you through the
implementation of an image captioning system on MS-COCO using vanilla recurrent
networks.

### Q2: Image Captioning with LSTMs (30 points)
The Jupyter notebook `LSTM_Captioning.ipynb` will walk you through the
implementation of Long-Short Term Memory (LSTM) RNNs, and apply them to image
captioning on MS-COCO.

### Q3: Network Visualization: Saliency maps, Class Visualization, and Fooling Images (15 points)
The Jupyter notebooks `NetworkVisualization-TensorFlow.ipynb` /`NetworkVisualization-PyTorch.ipynb` will introduce the pretrained SqueezeNet model, compute gradients
with respect to images, and use them to produce saliency maps and fooling
images. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q4: Style Transfer (15 points)
In the Jupyter notebooks `StyleTransfer-TensorFlow.ipynb`/`StyleTransfer-PyTorch.ipynb` you will learn how to create images with the content of one image but the style of another. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q5: Generative Adversarial Networks (15 points)
In the Jupyter notebooks `GANS-TensorFlow.ipynb`/`GANS-PyTorch.ipynb` you will learn how to generate images that match a training dataset, and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awarded if you complete both notebooks.

### Submitting your work

**Important:** _Please make sure that the submitted notebooks have been run and saved, and the cell outputs are visible on your pdfs._ In addition, please _do not use the Web AFS interface_ to retrieve your pdfs, and rely on scp directly.

There are **_two_** steps to submitting your assignment:

**1.** Run the provided `collectSubmission_*.sh` script in the `assignment3` directory, depending on which version (TensorFlow/PyTorch) you intend to submit.

You will be prompted for your SunetID (e.g. `jdoe`) and will need to provide your Stanford password. This script will generate a zip file of your code, submit your source code to Stanford AFS, and generate a pdf `a3.pdf` in a `cs231n-2019-assignment3/` folder in your AFS home directory. 

If your submission for this step was successful, you should see a display message 

`### Code submitted at [TIME], [N] submission attempts remaining. ###`

**2.** Download the generated `a3.pdf` from AFS, then submit the pdf to Gradescope.
Again, do NOT use Web AFS to retrieve this file, and instead use the following scp command.

```bash
# replace DEST_PATH with where you want the pdf to be downloaded to.
scp YOUR_SUNET@myth.stanford.edu:cs231n-2019-assignment3/a3.pdf DEST_PATH/a3.pdf
```


================================================
FILE: assignments/2020/assignment1.md
================================================
---
layout: page
title: Assignment 1
mathjax: true
permalink: /assignments2020/assignment1/
---

This assignment is due on **Wednesday, April 22 2020** at 11:59pm PST.

<details>
<summary>Handy Download Links</summary>

 <ul>
  <li><a href="{{ site.hw_1_colab }}">Option A: Colab starter code</a></li>
  <li><a href="{{ site.hw_1_jupyter }}">Option B: Jupyter starter code</a></li>
</ul>
</details>

- [Goals](#goals)
- [Setup](#setup)
  - [Option A: Google Colaboratory (Recommended)](#option-a-google-colaboratory-recommended)
  - [Option B: Local Development](#option-b-local-development)
- [Q1: k-Nearest Neighbor classifier (20 points)](#q1-k-nearest-neighbor-classifier-20-points)
- [Q2: Training a Support Vector Machine (25 points)](#q2-training-a-support-vector-machine-25-points)
- [Q3: Implement a Softmax classifier (20 points)](#q3-implement-a-softmax-classifier-20-points)
- [Q4: Two-Layer Neural Network (25 points)](#q4-two-layer-neural-network-25-points)
- [Q5: Higher Level Representations: Image Features (10 points)](#q5-higher-level-representations-image-features-10-points)
- [Submitting your work](#submitting-your-work)

### Goals

In this assignment you will practice putting together a simple image classification pipeline based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- Understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages)
- Understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- Develop proficiency in writing efficient **vectorized** code with numpy
- Implement and apply a k-Nearest Neighbor (**kNN**) classifier
- Implement and apply a Multiclass Support Vector Machine (**SVM**) classifier
- Implement and apply a **Softmax** classifier
- Implement and apply a **Two layer neural network** classifier
- Understand the differences and tradeoffs between these classifiers
- Get a basic understanding of performance improvements from using **higher-level representations** as opposed to raw pixels, e.g. color histograms, Histogram of Gradient (HOG) features, etc.

### Setup

You can work on the assignment in one of two ways: **remotely** on Google Colaboratory or **locally** on your own machine.

**Regardless of the method chosen, ensure you have followed the [setup instructions](/setup-instructions) before proceeding.**

#### Option A: Google Colaboratory (Recommended)

**Download.** Starter code containing Colab notebooks can be downloaded [here]({{site.hw_1_colab}}).

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/qvwYtun1uhQ" frameborder="0" allowfullscreen></iframe>

If you choose to work with Google Colab, please watch the workflow tutorial above or read the instructions below.

1. Unzip the starter code zip file. You should see an `assignment1` folder.
2. Create a folder in your personal Google Drive and upload `assignment1/` folder to the Drive folder. We recommend that you call the Google Drive folder `cs231n/assignments/` so that the final uploaded folder has the path `cs231n/assignments/assignment1/`.
3. Each Colab notebook (i.e. files ending in `.ipynb`) corresponds to an assignment question. In Google Drive, double click on the notebook and select the option to open with `Colab`.
4. You will be connected to a Colab VM. You can mount your Google Drive and access your uploaded
files by executing the first cell in the notebook. It will prompt you for an authorization code which you can obtain
from a popup window. The code cell will also automatically download the CIFAR-10 dataset for you.
5. Once you have completed the assignment question (i.e. reached the end of the notebook), you can save your edited files back to your Drive and move on to the next question. For your convenience, we also provide you with a code cell (the very last one) that automatically saves the modified files for that question back to your Drive.
6. Repeat steps 3-5 for each remaining notebook.

**Note 1**. Please make sure that you work on the Colab notebooks in the order of the questions (see below). Specifically, you should work on kNN first, then SVM, the Softmax, then Two-layer Net and finally on Image Features. The reason is that the code cells that get executed *at the end* of the notebooks save the modified files back to your drive and some notebooks may require code from previous notebook.

**Note 2**. Related to above, ensure you are periodically saving your notebook (`File -> Save`), and any edited `.py` files relevant to that notebook (i.e. **by executing the last code cell**) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

#### Option B: Local Development

**Download.** Starter code containing jupyter notebooks can be downloaded [here]({{site.hw_1_jupyter}}).

**Install Packages**. Once you have the starter code, activate your environment (the one you installed in the [Software Setup]({{site.baseurl}}/setup-instructions/) page) and run `pip install -r requirements.txt`.

**Download CIFAR-10**. Next, you will need to download the CIFAR-10 dataset. Run the following from the `assignment1` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```
**Start Jupyter Server**. After you have the CIFAR-10 data, you should start the Jupyter server from the
`assignment1` directory by executing `jupyter notebook` in your terminal.

Complete each notebook, then once you are done, go to the [submission instructions](#submitting-your-work).

### Q1: k-Nearest Neighbor classifier (20 points)

The notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine (25 points)

The notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier (20 points)

The notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network (25 points)

The notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features (10 points)

The notebook **features.ipynb** will examine the improvements gained by using higher-level representations
as opposed to using raw pixel values.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, there are **_two_** steps you must follow to submit your assignment:

**1.** If you selected Option A and worked on the assignment in Colab, open `collect_submission.ipynb` in Colab and execute the notebook cells. If you selected Option B and worked on the assignment locally, run the bash script in `assignment1` by executing `bash collectSubmission.sh`.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a1.zip`.
* Convert all notebooks into a single PDF file.

**Note for Option B users**. You must have (a) `nbconvert` installed with Pandoc and Tex support and (b) `PyPDF2` installed to successfully convert your notebooks to a PDF file. Please follow these [installation instructions](https://nbconvert.readthedocs.io/en/latest/install.html#installing-nbconvert) to install (a) and run `pip install PyPDF2` to install (b). If you are, for some inexplicable reason, unable to successfully install the above dependencies, you can manually convert each jupyter notebook to HTML (`File -> Download as -> HTML (.html)`), save the HTML page as a PDF, then concatenate all the PDFs into a single PDF submission using your favorite PDF viewer.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a1.zip and the pdfs to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/103764).

**Note for Option A users**. Remember to download `a1.zip` and `assignment.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2020/assignment2.md
================================================
---
layout: page
title: Assignment 2
mathjax: true
permalink: /assignments2020/assignment2/
---

This assignment is due on **Wednesday, May 6 2020** at 11:59pm PDT.

<details>
<summary>Handy Download Links</summary>

 <ul>
  <li><a href="{{ site.hw_2_colab }}">Option A: Colab starter code</a></li>
  <li><a href="{{ site.hw_2_jupyter }}">Option B: Jupyter starter code</a></li>
</ul>
</details>

- [Goals](#goals)
- [Setup](#setup)
  - [Option A: Google Colaboratory (Recommended)](#option-a-google-colaboratory-recommended)
  - [Option B: Local Development](#option-b-local-development)
- [Q1: Fully-connected Neural Network (20 points)](#q1-fully-connected-neural-network-20-points)
- [Q2: Batch Normalization (30 points)](#q2-batch-normalization-30-points)
- [Q3: Dropout (10 points)](#q3-dropout-10-points)
- [Q4: Convolutional Networks (30 points)](#q4-convolutional-networks-30-points)
- [Q5: PyTorch / TensorFlow on CIFAR-10 (10 points)](#q5-pytorch--tensorflow-on-cifar-10-10-points)
- [Submitting your work](#submitting-your-work)

### Goals

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

- Understand **Neural Networks** and how they are arranged in layered architectures.
- Understand and be able to implement (vectorized) **backpropagation**.
- Implement various **update rules** used to optimize Neural Networks.
- Implement **Batch Normalization** and **Layer Normalization** for training deep networks.
- Implement **Dropout** to regularize networks.
- Understand the architecture of **Convolutional Neural Networks** and get practice with training them.
- Gain experience with a major deep learning framework, such as **TensorFlow** or **PyTorch**.

### Setup

You can work on the assignment in one of two ways: **remotely** on Google Colaboratory or **locally** on your own machine.

**Regardless of the method chosen, ensure you have followed the [setup instructions](/setup-instructions) before proceeding.**

#### Option A: Google Colaboratory (Recommended)

**Download.** Starter code containing Colab notebooks can be downloaded [here]({{site.hw_2_colab}}).

If you choose to work with Google Colab, please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory).

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/IZUz4pRYlus" frameborder="0" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

#### Option B: Local Development

**Download.** Starter code containing jupyter notebooks can be downloaded [here]({{site.hw_2_jupyter}}).

**Install Packages**. Once you have the starter code, activate your environment (the one you installed in the [Software Setup]({{site.baseurl}}/setup-instructions/) page) and run `pip install -r requirements.txt`.

**Download CIFAR-10**. Next, you will need to download the CIFAR-10 dataset. Run the following from the `assignment2` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```
**Start Jupyter Server**. After you have the CIFAR-10 data, you should start the Jupyter server from the
`assignment2` directory by executing `jupyter notebook` in your terminal.

Complete each notebook, then once you are done, go to the [submission instructions](#submitting-your-work).

### Q1: Fully-connected Neural Network (20 points)

The notebook `FullyConnectedNets.ipynb` will introduce you to our
modular layer design, and then use those layers to implement fully-connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization (30 points)

In notebook `BatchNormalization.ipynb` you will implement batch normalization, and use it to train deep fully-connected networks.

### Q3: Dropout (10 points)

The notebook `Dropout.ipynb` will help you implement Dropout and explore its effects on model generalization.

### Q4: Convolutional Networks (30 points)
In the IPython Notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch / TensorFlow on CIFAR-10 (10 points)
For this last part, you will be working in either TensorFlow or PyTorch, two popular and powerful deep learning frameworks. **You only need to complete ONE of these two notebooks.** You do NOT need to do both, and we will _not_ be awarding extra credit to those who do.

Open up either `PyTorch.ipynb` or `TensorFlow.ipynb`. There, you will learn how the framework works, culminating in training a  convolutional network of your own design on CIFAR-10 to get the best performance you can.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, there are **_two_** steps you must follow to submit your assignment:

**1.** If you selected Option A and worked on the assignment in Colab, open `collect_submission.ipynb` in Colab and execute the notebook cells. If you selected Option B and worked on the assignment locally, run the bash script in `assignment2` by executing `bash collectSubmission.sh`.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a2.zip`.
* Convert all notebooks into a single PDF file.

**Note for Option B users**. You must have (a) `nbconvert` installed with Pandoc and Tex support and (b) `PyPDF2` installed to successfully convert your notebooks to a PDF file. Please follow these [installation instructions](https://nbconvert.readthedocs.io/en/latest/install.html#installing-nbconvert) to install (a) and run `pip install PyPDF2` to install (b). If you are, for some inexplicable reason, unable to successfully install the above dependencies, you can manually convert each jupyter notebook to HTML (`File -> Download as -> HTML (.html)`), save the HTML page as a PDF, then concatenate all the PDFs into a single PDF submission using your favorite PDF viewer.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a2.zip and the pdfs to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/103764).

**Note for Option A users**. Remember to download `a2.zip` and `assignment.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2020/assignment3.md
================================================
---
layout: page
title: Assignment 3
mathjax: true
permalink: /assignments2020/assignment3/
---

This assignment is due on **Wednesday, May 27 2020** at 11:59pm PDT.

<details>
<summary>Handy Download Links</summary>

 <ul>
  <li><a href="{{ site.hw_3_colab }}">Option A: Colab starter code</a></li>
  <li><a href="{{ site.hw_3_jupyter }}">Option B: Jupyter starter code</a></li>
</ul>
</details>

- [Goals](#goals)
- [Setup](#setup)
  - [Option A: Google Colaboratory (Recommended)](#option-a-google-colaboratory-recommended)
  - [Option B: Local Development](#option-b-local-development)
- [Q1: Image Captioning with Vanilla RNNs (29 points)](#q1-image-captioning-with-vanilla-rnns-29-points)
- [Q2: Image Captioning with LSTMs (23 points)](#q2-image-captioning-with-lstms-23-points)
- [Q3: Network Visualization: Saliency maps, Class Visualization, and Fooling Images (15 points)](#q3-network-visualization-saliency-maps-class-visualization-and-fooling-images-15-points)
- [Q4: Style Transfer (15 points)](#q4-style-transfer-15-points)
- [Q5: Generative Adversarial Networks (15 points)](#q5-generative-adversarial-networks-15-points)
- [Submitting your work](#submitting-your-work)

### Goals

In this assignment, you will implement recurrent neural networks and apply them to image captioning on the Microsoft COCO data. You will also explore methods for visualizing the features of a pretrained model on ImageNet, and use this model to implement Style Transfer. Finally, you will train a Generative Adversarial Network to generate images that look like a training dataset!

The goals of this assignment are as follows:

- Understand the architecture of recurrent neural networks (RNNs) and how they operate on sequences by sharing weights over time.
- Understand and implement both Vanilla RNNs and Long-Short Term Memory (LSTM) networks.
- Understand how to combine convolutional neural nets and recurrent nets to implement an image captioning system.
- Explore various applications of image gradients, including saliency maps, fooling images, class visualizations.
- Understand and implement techniques for image style transfer.
- Understand how to train and implement a Generative Adversarial Network (GAN) to produce images that resemble samples from a dataset.

### Setup

You should be able to use your setup from assignments 1 and 2.

You can work on the assignment in one of two ways: **remotely** on Google Colaboratory or **locally** on your own machine.

**Regardless of the method chosen, ensure you have followed the [setup instructions](/setup-instructions) before proceeding.**

#### Option A: Google Colaboratory (Recommended)

**Download.** Starter code containing Colab notebooks can be downloaded [here]({{site.hw_3_colab}}).

If you choose to work with Google Colab, please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory).

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/IZUz4pRYlus" frameborder="0" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

#### Option B: Local Development

**Download.** Starter code containing jupyter notebooks can be downloaded [here]({{site.hw_3_jupyter}}).

**Install Packages**. Once you have the starter code, activate your environment (the one you installed in the [Software Setup]({{site.baseurl}}/setup-instructions/) page) and run `pip install -r requirements.txt`.

**Download data**. Next, you will need to download the COCO captioning data, a pretrained SqueezeNet model (for TensorFlow), and a few ImageNet validation images. Run the following from the `assignment3` directory:

```bash
cd cs231n/datasets
./get_datasets.sh
```
**Start Jupyter Server**. After you've downloaded the data, you can start the Jupyter server from the `assignment3` directory by executing `jupyter notebook` in your terminal.

Complete each notebook, then once you are done, go to the [submission instructions](#submitting-your-work).

**You can do Questions 3, 4, and 5 in TensorFlow or PyTorch. There are two versions of each of these notebooks, one for TensorFlow and one for PyTorch. No extra credit will be awarded if you do a question in both TensorFlow and PyTorch**

### Q1: Image Captioning with Vanilla RNNs (29 points)

The notebook `RNN_Captioning.ipynb` will walk you through the implementation of an image captioning system on MS-COCO using vanilla recurrent networks.

### Q2: Image Captioning with LSTMs (23 points)

The notebook `LSTM_Captioning.ipynb` will walk you through the implementation of Long-Short Term Memory (LSTM) RNNs, and apply them to image captioning on MS-COCO.

### Q3: Network Visualization: Saliency maps, Class Visualization, and Fooling Images (15 points)

The notebooks `NetworkVisualization-TensorFlow.ipynb`, and `NetworkVisualization-PyTorch.ipynb` will introduce the pretrained SqueezeNet model, compute gradients with respect to images, and use them to produce saliency maps and fooling images. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q4: Style Transfer (15 points)

In thenotebooks `StyleTransfer-TensorFlow.ipynb` or `StyleTransfer-PyTorch.ipynb` you will learn how to create images with the content of one image but the style of another. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awardeded if you complete both notebooks.

### Q5: Generative Adversarial Networks (15 points)

In the notebooks `GANS-TensorFlow.ipynb` or `GANS-PyTorch.ipynb` you will learn how to generate images that match a training dataset, and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. Please complete only one of the notebooks (TensorFlow or PyTorch). No extra credit will be awarded if you complete both notebooks.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, there are **_two_** steps you must follow to submit your assignment:

**1.** If you selected Option A and worked on the assignment in Colab, open `collect_submission.ipynb` in Colab and execute the notebook cells. If you selected Option B and worked on the assignment locally, run the bash script in `assignment3` by executing `bash collectSubmission.sh`.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a3.zip`.
* Convert all notebooks into a single PDF file.

**Note for Option B users**. You must have (a) `nbconvert` installed with Pandoc and Tex support and (b) `PyPDF2` installed to successfully convert your notebooks to a PDF file. Please follow these [installation instructions](https://nbconvert.readthedocs.io/en/latest/install.html#installing-nbconvert) to install (a) and run `pip install PyPDF2` to install (b). If you are, for some inexplicable reason, unable to successfully install the above dependencies, you can manually convert each jupyter notebook to HTML (`File -> Download as -> HTML (.html)`), save the HTML page as a PDF, then concatenate all the PDFs into a single PDF submission using your favorite PDF viewer.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a3.zip and the pdfs to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/103764).

**Note for Option A users**. Remember to download `a3.zip` and `assignment.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2021/assignment1.md
================================================
---
layout: page
title: Assignment 1
mathjax: true
permalink: /assignments2021/assignment1/
---

<span style="color:red">This assignment is due on **Friday, April 16 2021** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_1_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: k-Nearest Neighbor classifier](#q1-k-nearest-neighbor-classifier)
- [Q2: Training a Support Vector Machine](#q2-training-a-support-vector-machine)
- [Q3: Implement a Softmax classifier](#q3-implement-a-softmax-classifier)
- [Q4: Two-Layer Neural Network](#q4-two-layer-neural-network)
- [Q5: Higher Level Representations: Image Features](#q5-higher-level-representations-image-features)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/IZUz4pRYlus" frameborder="0" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice putting together a simple image classification pipeline based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- Understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages).
- Understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- Develop proficiency in writing efficient **vectorized** code with numpy.
- Implement and apply a k-Nearest Neighbor (**kNN**) classifier.
- Implement and apply a Multiclass Support Vector Machine (**SVM**) classifier.
- Implement and apply a **Softmax** classifier.
- Implement and apply a **Two layer neural network** classifier.
- Understand the differences and tradeoffs between these classifiers.
- Get a basic understanding of performance improvements from using **higher-level representations** as opposed to raw pixels, e.g. color histograms, Histogram of Gradient (HOG) features, etc.

### Q1: k-Nearest Neighbor classifier

The notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine

The notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier

The notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network

The notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features

The notebook **features.ipynb** will examine the improvements gained by using higher-level representations
as opposed to using raw pixel values.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a1.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a1.zip and the pdfs to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/257661).

Remember to download `a1.zip` and `assignment.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2021/assignment2.md
================================================
---
layout: page
title: Assignment 2
mathjax: true
permalink: /assignments2021/assignment2/
---

<span style="color:red">This assignment is due on **Friday, April 30 2021** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_2_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Multi-Layer Fully Connected Neural Networks (16%)](#q1-multi-layer-fully-connected-neural-networks-16)
- [Q2: Batch Normalization (34%)](#q2-batch-normalization-34)
- [Q3: Dropout (10%)](#q3-dropout-10)
- [Q4: Convolutional Neural Networks (30%)](#q4-convolutional-neural-networks-30)
- [Q5: PyTorch/TensorFlow on CIFAR-10 (10%)](#q5-pytorchtensorflow-on-cifar-10-10)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/IZUz4pRYlus" frameborder="0" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

- Understand **Neural Networks** and how they are arranged in layered architectures.
- Understand and be able to implement (vectorized) **backpropagation**.
- Implement various **update rules** used to optimize Neural Networks.
- Implement **Batch Normalization** and **Layer Normalization** for training deep networks.
- Implement **Dropout** to regularize networks.
- Understand the architecture of **Convolutional Neural Networks** and get practice with training them.
- Gain experience with a major deep learning framework, such as **TensorFlow** or **PyTorch**.

### Q1: Multi-Layer Fully Connected Neural Networks (16%)

The notebook `FullyConnectedNets.ipynb` will have you implement fully connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization (34%)

In notebook `BatchNormalization.ipynb` you will implement batch normalization, and use it to train deep fully connected networks.

### Q3: Dropout (10%)

The notebook `Dropout.ipynb` will help you implement dropout and explore its effects on model generalization.

### Q4: Convolutional Neural Networks (30%)

In the notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch/TensorFlow on CIFAR-10 (10%)

For this last part, you will be working in either TensorFlow or PyTorch, two popular and powerful deep learning frameworks. **You only need to complete ONE of these two notebooks.** While you are welcome to explore both for your own learning, there will be no extra credit.

Open up either `PyTorch.ipynb` or `TensorFlow.ipynb`. There, you will learn how the framework works, culminating in training a convolutional network of your own design on CIFAR-10 to get the best performance you can.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a2.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a2.zip and the pdfs to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/257661).

Remember to download `a2.zip` and `assignment.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2021/assignment3.md
================================================
---
layout: page
title: Assignment 3
mathjax: true
permalink: /assignments2021/assignment3/
---

<span style="color:red">This assignment is due on **Tuesday, May 25 2021** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_3_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Image Captioning with Vanilla RNNs (30 points)](#q1-image-captioning-with-vanilla-rnns-30-points)
- [Q2: Image Captioning with Transformers (20 points)](#q2-image-captioning-with-transformers-20-points)
- [Q3: Network Visualization: Saliency Maps, Class Visualization, and Fooling Images (15 points)](#q3-network-visualization-saliency-maps-class-visualization-and-fooling-images-15-points)
- [Q4: Generative Adversarial Networks (15 points)](#q4-generative-adversarial-networks-15-points)
- [Q5: Self-Supervised Learning for Image Classification (20 points)](#q5-self-supervised-learning-for-image-classification-20-points)
- [Extra Credit: Image Captioning with LSTMs (5 points)](#extra-credit-image-captioning-with-lstms-5-points)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/IZUz4pRYlus" frameborder="0" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment, you will implement language networks and apply them to image captioning on the COCO dataset. Then you will explore methods for visualizing the features of a pretrained model on ImageNet and train a Generative Adversarial Network to generate images that look like a training dataset. Finally, you will be introduced to self-supervised learning to automatically learn the visual representations of an unlabeled dataset.

The goals of this assignment are as follows:

- Understand and implement RNN and Transformer networks. Combine them with CNN networks for image captioning.
- Explore various applications of image gradients, including saliency maps, fooling images, class visualizations.
- Understand how to train and implement a Generative Adversarial Network (GAN) to produce images that resemble samples from a dataset.
- Understand how to leverage self-supervised learning techniques to help with image classification tasks.

**You will use PyTorch for the majority of this homework.**

### Q1: Image Captioning with Vanilla RNNs (30 points)

The notebook `RNN_Captioning.ipynb` will walk you through the implementation of vanilla recurrent neural networks and apply them to image captioning on COCO.

### Q2: Image Captioning with Transformers (20 points)

The notebook `Transformer_Captioning.ipynb` will walk you through the implementation of a Transformer model and apply it to image captioning on COCO.

### Q3: Network Visualization: Saliency Maps, Class Visualization, and Fooling Images (15 points)

The notebook `Network_Visualization.ipynb` will introduce the pretrained SqueezeNet model, compute gradients with respect to images, and use them to produce saliency maps and fooling images.

### Q4: Generative Adversarial Networks (15 points)

In the notebook `Generative_Adversarial_Networks.ipynb` you will learn how to generate images that match a training dataset and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Q5: Self-Supervised Learning for Image Classification (20 points)

In the notebook `Self_Supervised_Learning.ipynb`, you will learn how to leverage self-supervised pretraining to obtain better performance on image classification tasks. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Extra Credit: Image Captioning with LSTMs (5 points)

The notebook `LSTM_Captioning.ipynb` will walk you through the implementation of Long-Short Term Memory (LSTM) RNNs and apply them to image captioning on COCO.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a3_code_submission.zip`.
* Convert all notebooks into a single PDF file called `a3_inline_submission.pdf`.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a3_code_submission.zip and a3_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/257661).

Remember to download `a3_code_submission.zip` and `a3_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2022/assignment1.md
================================================
---
layout: page
title: Assignment 1
mathjax: true
permalink: /assignments2022/assignment1/
---

<span style="color:red">This assignment is due on **Friday, April 15 2022** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_1_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: k-Nearest Neighbor classifier](#q1-k-nearest-neighbor-classifier)
- [Q2: Training a Support Vector Machine](#q2-training-a-support-vector-machine)
- [Q3: Implement a Softmax classifier](#q3-implement-a-softmax-classifier)
- [Q4: Two-Layer Neural Network](#q4-two-layer-neural-network)
- [Q5: Higher Level Representations: Image Features](#q5-higher-level-representations-image-features)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice putting together a simple image classification pipeline based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- Understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages).
- Understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- Develop proficiency in writing efficient **vectorized** code with numpy.
- Implement and apply a k-Nearest Neighbor (**kNN**) classifier.
- Implement and apply a Multiclass Support Vector Machine (**SVM**) classifier.
- Implement and apply a **Softmax** classifier.
- Implement and apply a **Two layer neural network** classifier.
- Understand the differences and tradeoffs between these classifiers.
- Get a basic understanding of performance improvements from using **higher-level representations** as opposed to raw pixels, e.g. color histograms, Histogram of Oriented Gradient (HOG) features, etc.

### Q1: k-Nearest Neighbor classifier

The notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine

The notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier

The notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network

The notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features

The notebook **features.ipynb** will examine the improvements gained by using higher-level representations
as opposed to using raw pixel values.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a1_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a1_code_submission.zip and a1_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/379571).

Remember to download `a1_code_submission.zip` and `a1_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2022/assignment2.md
================================================
---
layout: page
title: Assignment 2
mathjax: true
permalink: /assignments2022/assignment2/
---

<span style="color:red">This assignment is due on **Monday, May 02 2022** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_2_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Multi-Layer Fully Connected Neural Networks](#q1-multi-layer-fully-connected-neural-networks)
- [Q2: Batch Normalization](#q2-batch-normalization)
- [Q3: Dropout](#q3-dropout)
- [Q4: Convolutional Neural Networks](#q4-convolutional-neural-networks)
- [Q5: PyTorch on CIFAR-10](#q5-pytorch-on-cifar-10)
- [Q6: Network Visualization: Saliency Maps, Class Visualization, and Fooling Images](#q6-network-visualization-saliency-maps-class-visualization-and-fooling-images)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

- Understand **Neural Networks** and how they are arranged in layered architectures.
- Understand and be able to implement (vectorized) **backpropagation**.
- Implement various **update rules** used to optimize Neural Networks.
- Implement **Batch Normalization** and **Layer Normalization** for training deep networks.
- Implement **Dropout** to regularize networks.
- Understand the architecture of **Convolutional Neural Networks** and get practice with training them.
- Gain experience with a major deep learning framework, such as **TensorFlow** or **PyTorch**.
- Explore various applications of image gradients, including saliency maps, fooling images, class visualizations.

### Q1: Multi-Layer Fully Connected Neural Networks

The notebook `FullyConnectedNets.ipynb` will have you implement fully connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization

In notebook `BatchNormalization.ipynb` you will implement batch normalization, and use it to train deep fully connected networks.

### Q3: Dropout

The notebook `Dropout.ipynb` will help you implement dropout and explore its effects on model generalization.

### Q4: Convolutional Neural Networks

In the notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch on CIFAR-10

For this part, you will be working with PyTorch, a popular and powerful deep learning framework.

Open up `PyTorch.ipynb`. There, you will learn how the framework works, culminating in training a convolutional network of your own design on CIFAR-10 to get the best performance you can.

### Q6: Network Visualization: Saliency Maps, Class Visualization, and Fooling Images

The notebook `Network_Visualization.ipynb` will introduce the pretrained SqueezeNet model, compute gradients with respect to images, and use them to produce saliency maps and fooling images.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a2_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a2_code_submission.zip and a2_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/379571).

Remember to download `a2_code_submission.zip` and `a2_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2022/assignment3.md
================================================
---
layout: page
title: Assignment 3
mathjax: true
permalink: /assignments2022/assignment3/
---

<span style="color:red">This assignment is due on **Tuesday, May 24 2022** at 11:59pm PST.</span>

**Update (May 15, 07:00pm PST)**: For `MultiHeadAttention` class in `Transformer_Captioning.ipynb` notebook, you are expected to apply dropout to the attention weights. Earlier instructions were unclear, the instructions have been updated to clarify this.

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_3_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Image Captioning with Vanilla RNNs (30 points)](#q1-image-captioning-with-vanilla-rnns-30-points)
- [Q2: Image Captioning with Transformers (25 points)](#q2-image-captioning-with-transformers-25-points)
- [Q3: Generative Adversarial Networks (15 points)](#q3-generative-adversarial-networks-15-points)
- [Q4: Self-Supervised Learning for Image Classification (20 points)](#q4-self-supervised-learning-for-image-classification-20-points)
- [Extra Credit: Image Captioning with LSTMs (5 points)](#extra-credit-image-captioning-with-lstms-5-points)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment, you will implement language networks and apply them to image captioning on the COCO dataset. Then you will train a Generative Adversarial Network to generate images that look like a training dataset. Finally, you will be introduced to self-supervised learning to automatically learn the visual representations of an unlabeled dataset.

The goals of this assignment are as follows:

- Understand and implement RNN and Transformer networks. Combine them with CNN networks for image captioning.
- Understand how to train and implement a Generative Adversarial Network (GAN) to produce images that resemble samples from a dataset.
- Understand how to leverage self-supervised learning techniques to help with image classification tasks.

**You will use PyTorch for the majority of this homework.**

### Q1: Image Captioning with Vanilla RNNs (30 points)

The notebook `RNN_Captioning.ipynb` will walk you through the implementation of vanilla recurrent neural networks and apply them to image captioning on COCO.

### Q2: Image Captioning with Transformers (25 points)

The notebook `Transformer_Captioning.ipynb` will walk you through the implementation of a Transformer model and apply it to image captioning on COCO.

### Q3: Generative Adversarial Networks (15 points)

In the notebook `Generative_Adversarial_Networks.ipynb` you will learn how to generate images that match a training dataset and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Q4: Self-Supervised Learning for Image Classification (20 points)

In the notebook `Self_Supervised_Learning.ipynb`, you will learn how to leverage self-supervised pretraining to obtain better performance on image classification tasks. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Extra Credit: Image Captioning with LSTMs (5 points)

The notebook `LSTM_Captioning.ipynb` will walk you through the implementation of Long-Short Term Memory (LSTM) RNNs and apply them to image captioning on COCO.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a3_code_submission.zip`.
* Convert all notebooks into a single PDF file called `a3_inline_submission.pdf`.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a3_code_submission.zip and a3_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/379571).

Remember to download `a3_code_submission.zip` and `a3_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2023/assignment1.md
================================================
---
layout: page
title: Assignment 1
mathjax: true
permalink: /assignments2023/assignment1/
---

<span style="color:red">This assignment is due on **Friday, April 21 2023** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_1_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: k-Nearest Neighbor classifier](#q1-k-nearest-neighbor-classifier)
- [Q2: Training a Support Vector Machine](#q2-training-a-support-vector-machine)
- [Q3: Implement a Softmax classifier](#q3-implement-a-softmax-classifier)
- [Q4: Two-Layer Neural Network](#q4-two-layer-neural-network)
- [Q5: Higher Level Representations: Image Features](#q5-higher-level-representations-image-features)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice putting together a simple image classification pipeline based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- Understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages).
- Understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- Develop proficiency in writing efficient **vectorized** code with numpy.
- Implement and apply a k-Nearest Neighbor (**kNN**) classifier.
- Implement and apply a Multiclass Support Vector Machine (**SVM**) classifier.
- Implement and apply a **Softmax** classifier.
- Implement and apply a **Two layer neural network** classifier.
- Understand the differences and tradeoffs between these classifiers.
- Get a basic understanding of performance improvements from using **higher-level representations** as opposed to raw pixels, e.g. color histograms, Histogram of Oriented Gradient (HOG) features, etc.

### Q1: k-Nearest Neighbor classifier

The notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine

The notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier

The notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network

The notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features

The notebook **features.ipynb** will examine the improvements gained by using higher-level representations
as opposed to using raw pixel values.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a1_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a1_code_submission.zip and a1_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/527613).

Remember to download `a1_code_submission.zip` and `a1_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2023/assignment2.md
================================================
---
layout: page
title: Assignment 2
mathjax: true
permalink: /assignments2023/assignment2/
---

<span style="color:red">This assignment is due on **Monday, May 08 2023** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_2_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Multi-Layer Fully Connected Neural Networks](#q1-multi-layer-fully-connected-neural-networks)
- [Q2: Batch Normalization](#q2-batch-normalization)
- [Q3: Dropout](#q3-dropout)
- [Q4: Convolutional Neural Networks](#q4-convolutional-neural-networks)
- [Q5: PyTorch on CIFAR-10](#q5-pytorch-on-cifar-10)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

- Understand **Neural Networks** and how they are arranged in layered architectures.
- Understand and be able to implement (vectorized) **backpropagation**.
- Implement various **update rules** used to optimize Neural Networks.
- Implement **Batch Normalization** and **Layer Normalization** for training deep networks.
- Implement **Dropout** to regularize networks.
- Understand the architecture of **Convolutional Neural Networks** and get practice with training them.
- Gain experience with a major deep learning framework, **PyTorch**.

### Q1: Multi-Layer Fully Connected Neural Networks

The notebook `FullyConnectedNets.ipynb` will have you implement fully connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization

In notebook `BatchNormalization.ipynb` you will implement batch normalization, and use it to train deep fully connected networks.

### Q3: Dropout

The notebook `Dropout.ipynb` will help you implement dropout and explore its effects on model generalization.

### Q4: Convolutional Neural Networks

In the notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch on CIFAR-10

For this part, you will be working with PyTorch, a popular and powerful deep learning framework.

Open up `PyTorch.ipynb`. There, you will learn how the framework works, culminating in training a convolutional network of your own design on CIFAR-10 to get the best performance you can.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a2_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a2_code_submission.zip and a2_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/527613).

Remember to download `a2_code_submission.zip` and `a2_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2023/assignment3.md
================================================
---
layout: page
title: Assignment 3
mathjax: true
permalink: /assignments2023/assignment3/
---

<span style="color:red">This assignment is due on **Tuesday, May 30 2023** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_3_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Network Visualization: Saliency Maps, Class Visualization, and Fooling Images](#q1-network-visualization-saliency-maps-class-visualization-and-fooling-images)
- [Q2: Image Captioning with Vanilla RNNs](#q2-image-captioning-with-vanilla-rnns)
- [Q3: Image Captioning with Transformers](#q3-image-captioning-with-transformers)
- [Q4: Generative Adversarial Networks](#q4-generative-adversarial-networks)
- [Q5: Self-Supervised Learning for Image Classification](#q5-self-supervised-learning-for-image-classification)
- [Extra Credit: Image Captioning with LSTMs](#extra-credit-image-captioning-with-lstms-5-points)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment, you will implement language networks and apply them to image captioning on the COCO dataset. Then you will train a Generative Adversarial Network to generate images that look like a training dataset. Finally, you will be introduced to self-supervised learning to automatically learn the visual representations of an unlabeled dataset.

The goals of this assignment are as follows:

- Understand and implement RNN and Transformer networks. Combine them with CNN networks for image captioning.
- Understand how to train and implement a Generative Adversarial Network (GAN) to produce images that resemble samples from a dataset.
- Understand how to leverage self-supervised learning techniques to help with image classification tasks.

**You will use PyTorch for the majority of this homework.**

### Q1: Network Visualization: Saliency Maps, Class Visualization, and Fooling Images

The notebook `Network_Visualization.ipynb` will introduce the pretrained SqueezeNet model, compute gradients with respect to images, and use them to produce saliency maps and fooling images.

### Q2: Image Captioning with Vanilla RNNs

The notebook `RNN_Captioning.ipynb` will walk you through the implementation of vanilla recurrent neural networks and apply them to image captioning on COCO.

### Q3: Image Captioning with Transformers

The notebook `Transformer_Captioning.ipynb` will walk you through the implementation of a Transformer model and apply it to image captioning on COCO.

### Q4: Generative Adversarial Networks 

In the notebook `Generative_Adversarial_Networks.ipynb` you will learn how to generate images that match a training dataset and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Q5: Self-Supervised Learning for Image Classification 

In the notebook `Self_Supervised_Learning.ipynb`, you will learn how to leverage self-supervised pretraining to obtain better performance on image classification tasks. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Extra Credit: Image Captioning with LSTMs

The notebook `LSTM_Captioning.ipynb` will walk you through the implementation of Long-Short Term Memory (LSTM) RNNs and apply them to image captioning on COCO.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a3_code_submission.zip`.
* Convert all notebooks into a single PDF file called `a3_inline_submission.pdf`.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a3_code_submission.zip and a3_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/379571).

Remember to download `a3_code_submission.zip` and `a3_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2024/assignment1.md
================================================
---
layout: page
title: Assignment 1
mathjax: true
permalink: /assignments2024/assignment1/
---

<span style="color:red">This assignment is due on **Friday, April 19 2024** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_1_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: k-Nearest Neighbor classifier](#q1-k-nearest-neighbor-classifier)
- [Q2: Training a Support Vector Machine](#q2-training-a-support-vector-machine)
- [Q3: Implement a Softmax classifier](#q3-implement-a-softmax-classifier)
- [Q4: Two-Layer Neural Network](#q4-two-layer-neural-network)
- [Q5: Higher Level Representations: Image Features](#q5-higher-level-representations-image-features)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the recommended workflow by watching the Colab walkthrough tutorial below:

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice putting together a simple image classification pipeline based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- Understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages).
- Understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- Develop proficiency in writing efficient **vectorized** code with numpy.
- Implement and apply a k-Nearest Neighbor (**kNN**) classifier.
- Implement and apply a Multiclass Support Vector Machine (**SVM**) classifier.
- Implement and apply a **Softmax** classifier.
- Implement and apply a **Two layer neural network** classifier.
- Understand the differences and tradeoffs between these classifiers.
- Get a basic understanding of performance improvements from using **higher-level representations** as opposed to raw pixels, e.g. color histograms, Histogram of Oriented Gradient (HOG) features, etc.

### Q1: k-Nearest Neighbor classifier

The notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Training a Support Vector Machine

The notebook **svm.ipynb** will walk you through implementing the SVM classifier.

### Q3: Implement a Softmax classifier

The notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q4: Two-Layer Neural Network

The notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q5: Higher Level Representations: Image Features

The notebook **features.ipynb** will examine the improvements gained by using higher-level representations
as opposed to using raw pixel values.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a1_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a1_code_submission.zip and a1_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/527613).

Remember to download `a1_code_submission.zip` and `a1_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2024/assignment2.md
================================================
---
layout: page
title: Assignment 2
mathjax: true
permalink: /assignments2024/assignment2/
---

<span style="color:red">This assignment is due on **Monday, May 06 2024** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_2_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Multi-Layer Fully Connected Neural Networks](#q1-multi-layer-fully-connected-neural-networks)
- [Q2: Batch Normalization](#q2-batch-normalization)
- [Q3: Dropout](#q3-dropout)
- [Q4: Convolutional Neural Networks](#q4-convolutional-neural-networks)
- [Q5: PyTorch on CIFAR-10](#q5-pytorch-on-cifar-10)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

- Understand **Neural Networks** and how they are arranged in layered architectures.
- Understand and be able to implement (vectorized) **backpropagation**.
- Implement various **update rules** used to optimize Neural Networks.
- Implement **Batch Normalization** and **Layer Normalization** for training deep networks.
- Implement **Dropout** to regularize networks.
- Understand the architecture of **Convolutional Neural Networks** and get practice with training them.
- Gain experience with a major deep learning framework, **PyTorch**.

### Q1: Multi-Layer Fully Connected Neural Networks

The notebook `FullyConnectedNets.ipynb` will have you implement fully connected
networks of arbitrary depth. To optimize these models you will implement several
popular update rules.

### Q2: Batch Normalization

In notebook `BatchNormalization.ipynb` you will implement batch normalization, and use it to train deep fully connected networks.

### Q3: Dropout

The notebook `Dropout.ipynb` will help you implement dropout and explore its effects on model generalization.

### Q4: Convolutional Neural Networks

In the notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q5: PyTorch on CIFAR-10

For this part, you will be working with PyTorch, a popular and powerful deep learning framework.

Open up `PyTorch.ipynb`. There, you will learn how the framework works, culminating in training a convolutional network of your own design on CIFAR-10 to get the best performance you can.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a2_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a2_code_submission.zip and a2_inline_submission.pdf to Gradescope. ###`

**_Note: When you have completed all notebookes, please ensure your most recent kernel execution order is chronological as this can otherwise cause issues for the Gradescope autograder. If this isn't the case, you should restart your kernel for that notebook and rerun all cells in the notebook using the Runtime Menu option "Restart and Run All"._**

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/527613).

Remember to download `a2_code_submission.zip` and `a2_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2024/assignment3.md
================================================
---
layout: page
title: Assignment 3
mathjax: true
permalink: /assignments2024/assignment3/
---

<span style="color:red">This assignment is due on **Tuesday, May 28 2024** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_3_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Image Captioning with Vanilla RNNs](#q1-image-captioning-with-vanilla-rnns)
- [Q2: Image Captioning with Transformers](#q2-image-captioning-with-transformers)
- [Q3: Generative Adversarial Networks](#q3-generative-adversarial-networks)
- [Q4: Self-Supervised Learning for Image Classification](#q4-self-supervised-learning-for-image-classification)
- [Extra Credit: Image Captioning with LSTMs](#extra-credit-image-captioning-with-lstms-5-points)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment, you will implement language networks and apply them to image captioning on the COCO dataset. Then you will train a Generative Adversarial Network to generate images that look like a training dataset. Finally, you will be introduced to self-supervised learning to automatically learn the visual representations of an unlabeled dataset.

The goals of this assignment are as follows:

- Understand and implement RNN and Transformer networks. Combine them with CNN networks for image captioning.
- Understand how to train and implement a Generative Adversarial Network (GAN) to produce images that resemble samples from a dataset.
- Understand how to leverage self-supervised learning techniques to help with image classification tasks.

**You will use PyTorch for the majority of this homework.**

### Q1: Image Captioning with Vanilla RNNs

The notebook `RNN_Captioning.ipynb` will walk you through the implementation of vanilla recurrent neural networks and apply them to image captioning on COCO.

### Q2: Image Captioning with Transformers

The notebook `Transformer_Captioning.ipynb` will walk you through the implementation of a Transformer model and apply it to image captioning on COCO.

### Q3: Generative Adversarial Networks 

In the notebook `Generative_Adversarial_Networks.ipynb` you will learn how to generate images that match a training dataset and use these models to improve classifier performance when training on a large amount of unlabeled data and a small amount of labeled data. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Q4: Self-Supervised Learning for Image Classification 

In the notebook `Self_Supervised_Learning.ipynb`, you will learn how to leverage self-supervised pretraining to obtain better performance on image classification tasks. **When first opening the notebook, go to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Extra Credit: Image Captioning with LSTMs

The notebook `LSTM_Captioning.ipynb` will walk you through the implementation of Long-Short Term Memory (LSTM) RNNs and apply them to image captioning on COCO.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a3_code_submission.zip`.
* Convert all notebooks into a single PDF file called `a3_inline_submission.pdf`.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a3_code_submission.zip and a3_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to Gradescope.

Remember to download `a3_code_submission.zip` and `a3_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2025/assignment1.md
================================================
---
layout: page
title: Assignment 1
mathjax: true
permalink: /assignments2025/assignment1/
---

<span style="color:red">This assignment is due on **Wednesday, April 23 2025** at 11:59pm Pacific Time.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_1_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: k-Nearest Neighbor classifier](#q1-k-nearest-neighbor-classifier)
- [Q2: Implement a Softmax classifier](#q2-implement-a-softmax-classifier)
- [Q3: Two-Layer Neural Network](#q3-two-layer-neural-network)
- [Q4: Higher Level Representations: Image Features](#q4-higher-level-representations-image-features)
- [Q5: Training a fully connected network](#q5-training-a-fully-connected-network)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the recommended workflow by watching the Colab walkthrough tutorial below:

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice putting together a simple image classification pipeline based on the k-Nearest Neighbor or the SVM/Softmax classifier. The goals of this assignment are as follows:

- Understand the basic **Image Classification pipeline** and the data-driven approach (train/predict stages).
- Understand the train/val/test **splits** and the use of validation data for **hyperparameter tuning**.
- Develop proficiency in writing efficient **vectorized** code with numpy.
- Implement and apply a k-Nearest Neighbor (**kNN**) classifier.
- Implement and apply a **Softmax** classifier.
- Implement and apply a **Two layer neural network** classifier.
- Implement and apply a **fully connected network** classifier.
- Understand the differences and tradeoffs between these classifiers.
- Get a basic understanding of performance improvements from using **higher-level representations** as opposed to raw pixels, e.g. color histograms, Histogram of Oriented Gradient (HOG) features, etc.

### Q1: k-Nearest Neighbor classifier

The notebook **knn.ipynb** will walk you through implementing the kNN classifier.

### Q2: Implement a Softmax classifier

The notebook **softmax.ipynb** will walk you through implementing the Softmax classifier.

### Q3: Two-Layer Neural Network

The notebook **two\_layer\_net.ipynb** will walk you through the implementation of a two-layer neural network classifier.

### Q4: Higher Level Representations: Image Features

The notebook **features.ipynb** will examine the improvements gained by using higher-level representations
as opposed to using raw pixel values.

### Q5: Training a fully connected network

The notebook **FullyConnectedNets.ipynb** will walk you through implementing the fully connected network.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a1_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a1_code_submission.zip and a1_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/1012166).

Remember to download `a1_code_submission.zip` and `a1_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2025/assignment2.md
================================================
---
layout: page
title: Assignment 2
mathjax: true
permalink: /assignments2025/assignment2/
---

<span style="color:red">This assignment is due on **Wednesday, May 07 2025** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_2_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Batch Normalization](#q1-batch-normalization)
- [Q2: Dropout](#q2-dropout)
- [Q3: Convolutional Neural Networks](#q3-convolutional-neural-networks)
- [Q4: PyTorch on CIFAR-10](#q4-pytorch-on-cifar-10)
- [Q5: Image Captioning with Vanilla RNNs](#q5-image-captioning-with-vanilla-rnns)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to the [submission instructions](#submitting-your-work).

### Goals

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

- Implement **Batch Normalization** and **Layer Normalization** for training deep networks.
- Implement **Dropout** to regularize networks.
- Understand the architecture of **Convolutional Neural Networks** and get practice with training them.
- Gain experience with a major deep learning framework, **PyTorch**.
- Understand and implement RNN networks. Combine them with CNN networks for image captioning.


### Q1: Batch Normalization

In notebook `BatchNormalization.ipynb` you will implement batch normalization, and use it to train deep fully connected networks.

### Q2: Dropout

The notebook `Dropout.ipynb` will help you implement dropout and explore its effects on model generalization.

### Q3: Convolutional Neural Networks

In the notebook `ConvolutionalNetworks.ipynb` you will implement several new layers that are commonly used in convolutional networks.

### Q4: PyTorch on CIFAR-10

For this part, you will be working with PyTorch, a popular and powerful deep learning framework.

Open up `PyTorch.ipynb`. There, you will learn how the framework works, culminating in training a convolutional network of your own design on CIFAR-10 to get the best performance you can.

### Q5: Image Captioning with Vanilla RNNs
The notebook `RNN_Captioning_pytorch.ipynb` will walk you through the implementation of vanilla recurrent neural networks and apply them to image captioning on COCO.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a2_code_submission.zip`.
* Convert all notebooks into a single PDF file.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a2_code_submission.zip and a2_inline_submission.pdf to Gradescope. ###`

**_Note: When you have completed all notebookes, please ensure your most recent kernel execution order is chronological as this can otherwise cause issues for the Gradescope autograder. If this isn't the case, you should restart your kernel for that notebook and rerun all cells in the notebook using the Runtime Menu option "Restart and Run All"._**

**2.** Submit the PDF and the zip file to [Gradescope](https://www.gradescope.com/courses/1012166).

Remember to download `a2_code_submission.zip` and `a2_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: assignments/2025/assignment3.md
================================================
---
layout: page
title: Assignment 3
mathjax: true
permalink: /assignments2025/assignment3/
---

<span style="color:red">This assignment is due on **Friday, May 30 2025** at 11:59pm PST.</span>

Starter code containing Colab notebooks can be [downloaded here]({{site.hw_3_colab}}).

- [Setup](#setup)
- [Goals](#goals)
- [Q1: Image Captioning with Transformers](#q1-image-captioning-with-transformers)
- [Q2: Self-Supervised Learning for Image Classification](#q2-self-supervised-learning-for-image-classification)
- [Q3: Denoising Diffusion Probabilistic Models](#q3-denoising-diffusion-probabilistic-models)
- [Q4: CLIP and Dino](#q4-clip-and-dino)
- [Submitting your work](#submitting-your-work)

### Setup

Please familiarize yourself with
the [recommended workflow]({{site.baseurl}}/setup-instructions/#working-remotely-on-google-colaboratory) before starting
the assignment. You should also watch the Colab walkthrough tutorial below.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Note**. Ensure you are periodically saving your notebook (`File -> Save`) so that you don't lose your progress if you
step away from the assignment and the Colab VM disconnects.

While we don't officially support local development, we've added a <b>requirements.txt</b> file that you can use to
setup a virtual env.

Once you have completed all Colab notebooks **except `collect_submission.ipynb`**, proceed to
the [submission instructions](#submitting-your-work).

### Goals

In this assignment, you will implement language networks and apply them to image captioning on the COCO dataset. Then
you will be introduced to self-supervised learning to automatically learn the visual representations of an unlabeled
dataset. Next, you will implement diffusion models (DDPMs) and apply them to image generation. Finally, you will explore
CLIP and DINO, two self-supervised learning methods that leverage large amounts of unlabeled data to learn visual
representations.

The goals of this assignment are as follows:

- Understand and implement Transformer networks. Combine them with CNN networks for image captioning.
- Understand how to leverage self-supervised learning techniques to help with image classification tasks.
- Implement and understand diffusion models (DDPMs) and apply them to image generation.
- Implement and understand CLIP and DINO, two self-supervised learning methods that leverage large amounts of unlabeled
  data to learn visual representations.

**You will use PyTorch for the majority of this homework.**

### Q1: Image Captioning with Transformers

The notebook `Transformer_Captioning.ipynb` will walk you through the implementation of a Transformer model and apply it
to image captioning on COCO.

### Q2: Self-Supervised Learning for Image Classification

In the notebook `Self_Supervised_Learning.ipynb`, you will learn how to leverage self-supervised pretraining to obtain
better performance on image classification tasks. **When first opening the notebook, go
to `Runtime > Change runtime type` and set `Hardware accelerator` to `GPU`.**

### Q3: Denoising Diffusion Probabilistic Models

In the notebook `DDPM.ipynb`, you will implement a Denoising Diffusion Probabilistic Model
(DDPM) and apply it to image generation.

### Q4: CLIP and Dino

In the notebook `CLIP_DINO.ipynb`, you will implement CLIP and DINO, two self-supervised learning methods that leverage
large amounts of unlabeled data to learn visual representations.

### Submitting your work

**Important**. Please make sure that the submitted notebooks have been run and the cell outputs are visible.

Once you have completed all notebooks and filled out the necessary code, you need to follow the below instructions to
submit your work:

**1.** Open `collect_submission.ipynb` in Colab and execute the notebook cells.

This notebook/script will:

* Generate a zip file of your code (`.py` and `.ipynb`) called `a3_code_submission.zip`.
* Convert all notebooks into a single PDF file called `a3_inline_submission.pdf`.

If your submission for this step was successful, you should see the following display message:

`### Done! Please submit a3_code_submission.zip and a3_inline_submission.pdf to Gradescope. ###`

**2.** Submit the PDF and the zip file to Gradescope.

Remember to download `a3_code_submission.zip` and `a3_inline_submission.pdf` locally before submitting to Gradescope.


================================================
FILE: attention.md
================================================
---
layout: page
permalink: /attention/
---

Table of Contents:

- [Motivation](#motivation)
- [General Attention Layers](#attention)
    - [Operations](#operations)
- [Self-Attention](#self)
    - [Masked Self-Attention Layers](#masked)
    - [Multi-Head Self-Attention Layers](#multihead)
- [Summary](#summary)
- [Additional References](#resources)

## Attention

We discussed fundamental workhorses of modern deep learning such as Convolutional Neural Networks and Recurrent Neural
Networks in previous sections. This section is devoted to yet another layer -- the attention layer -- that forms a new
primitive for modern Computer Vision and NLP applications.

<a name='motivation'></a>

### Motivation

To motivate the attention layer, let us look at a sample application -- image captioning, and see what's the problem
with using plain CNNs and RNNs there.

The figure below shows a pipeline of applying such networks on a given image to generate a caption. It first uses a
pre-trained CNN feature extractor to summarize the image, resulting in an image feature vector \\(c = h_0\\). It then
applies a recurrent network to repeatedly generate tokens at each step. After five time steps, the image captioning
model obtains the sentence: "surfer riding on wave".

<div class="fig figcenter fighighlight">
  <img src="/assets/att/captioning.png" width="80%">
</div>

What is the problem here? Notice that the model relies entirely on the context vector \\(c\\) to write the caption --
everything it wants to say about the image needs to be compressed within this vector. What if we want to be very
specific, and describe every nitty-gritty detail of the image, e.g. color of the surfer's shirt, facing direction of the
waves? Obviously, a finite-length vector cannot be used to encode all such possibilities, especially if the desired
number of tokens goes to the magnitude of hundreds or thousands.

The central idea of the attention layer is borrowed from human's visual attention system: when humans like us are given
a visual scene and try to understand a specific region of that scene, we focus our eyesight on that region. The
attention layer simulates this process, and *attends* to different parts of the image while generating words to describe
it.

With attention in play, a similar diagram showing the pipeline for image captioning is as follows. What's the main
difference? We incorporate two additional matrices: one for *alignment scores*, and the other for *attention*; and have
*different context vectors* \\(c_i\\) at different steps. At each step, the model uses a multi-layer perceptron to
digest the current hidden vector \\(h_i\\) and the input image features, to generate an alignment score matrix of shape
\\(H \times W\\). This score matrix is then fed into a softmax layer that converts it to an attention matrix with
weights summing to one. The weights in the attention matrix are next multiplied element-wise with image features,
allowing the model to focus on regions of the image differently. This entire process is differentiable and enables the
model to choose its own attention weights.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/captioning-attention.png" width="60%">
</div>

<a name='attention'></a>

### General Attention Layers

While the previous section details the application of an attention layer in image captioning, we next present a more
general and principled formulation of the attention layer, de-contextualizing it from the image captioning and recurrent
network settings. In a general setting, the attention layer is a layer with input and output vectors, and five major
operations. These are illustrated in the following diagrams.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/attention.png" width="70%">
  <div class="figcaption">Left: A General Attention Layer. Right: A Self-Attention Layer.</div>
</div>

As illustrated, inputs to an attention layer contain input vectors \\(X\\) and query vectors \\(Q\\). The input vectors,
\\(X\\), are of shape \\(N \times D_x\\) while the query vectors \\(Q\\) are of shape \\(M \times D_k\\). In the image
captioning example, input vectors are the image features while query vectors are the hidden states of the recurrent
network. Outputs of an attention layer are the vectors \\(Y\\) of shape \\(M \times D_k\\), at the top.

The bulk of the attention operations are illustrated as the colorful grids in the middle, and contains two major types
of operations: linear key-value maps; and align & attend operations that we saw earlier in the image captioning example.

<a name='operations'></a>

#### Operations

**Linear Key and Value Transformations.** These operations are linear transformations that convert the input vectors \\(
X\\) to two alternative set of vectors:

- Key vectors \\(K\\): These vectors are obtained by using the linear equation \\(K = X W_k\\) where \\(W_k\\) is a
  learnable weight matrix of shape \\(D_x \times D_k\\), converting from input vector dimension \\(D_x\\) to key
  dimension \\(D_k\\). The resulting keys have the same dimension as the query vectors, to enable alignment.
- Value vectors \\(V\\): Similarly, the equation to derive these vectors is the linear rule \\(V = X W_v\\)
  where \\(W_v\\) is of shape \\(D_x \times D_v\\). The value vectors have the same dimension as the output vectors.

By applying these fully-connected layers on top of the inputs, the attention model achieves additional expressivity.

**Alignment.** Core to the attention layer are two fundamental operations: alignment, and attention. In the alignment
step, while more complex functions are possible, practitioners often opt for a simple function between vectors: pairwise
dot products between key and query vectors.

Moreover, for vectors with a larger dimensionality, more terms are multiplied and summed in the dot product and this
usually implies a larger variance. Vectors with larger magnitude contribute more weights to the resulting softmax
calculation, and many terms usually receive low attention. To deal with this issue, a scaling factor, the reciprocal of
\\(\sqrt{D_x}\\), is often incorporated to reduce the alignment scores. This scaling procedure reduces the effect of
large magnitude terms, so that the resulting attention weights are more spread-out. The alignment computation can be
summarized as the following equation:

$$ e_{i,j} = \frac{q_j \cdot x_i}{\sqrt{D_x}} $$

**Attention.** The attention matrix is obtained by applying the softmax function column-wise to the alignment matrix.

$$ \mathbf{a} = \text{softmax}(\mathbf{e}) $$

The output vectors are finally calculated as multiplications of the attention matrix and the input vectors:

$$ y_j = \sum_{i} a_{i,j} x_i $$

<a name='self'></a>

### Self-Attention

While we explain the general attention layer above, the self-attention layer refers to the special case where similar to
the key and value vectors, the query vectors \\(Q\\) are also expressed as a linear transformation of the input vectors:
\\(Q = X W_q\\) where \\(W_q\\) is of shape \\(D_x \times D_k\\). With this expression of query vectors as a linear
function of the inputs, the attention layer is self-contained. This is illustrated on the right of the figure above.

**Permutation Invariance.** It is worth noting that the self-attention layer is invariant to the order of the input
vectors: if we apply a permutation to the input vectors, the outputs will be permuted in exactly the same way. This is
illustrated in the following diagram.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/permutation.png" width="55%">
  <div class="figcaption">Permutation Invariance of Self-Attention Layers.</div>
</div>

**Positional Encoding.** While the self-attention layer is agnostic to the ordering of inputs, practical applications
often require some notion of ordering. For example, in natural language sequences, the relative ordering of words often
plays a pivotal role in differentiating the meaning of the entire sentence. This necessitates the inclusion of a
positional encoding component into the self-attention module, to endow the model with the ability to determine the
positions of its inputs. A number of desiderata are needed for this component:

- The positional encodings should be *unique* for each time step.
- The *distance* between any two consecutive encodings should be the same.
- The positional encoding function should generalize to arbitrarily *long* sequences.
- The function should be *deterministic*.

While there exists a number of functions that satisfy the above criteria, a commonly used method makes use of mixed sine
and cosine values. Concretely, the encoding function looks like the following:

$$ p(t)
= [\sin(w_1 \cdot t), \cos(w_1 \cdot t), \sin(w_2 \cdot t), \cos(w_2 \cdot t), \cdots, \sin(w_{d/2} \cdot t), \cos(w_{d/2} \cdot t)]
$$

where the frequency \\(w_k = \frac{1}{10000^{2k/d}}\\). What does this function encode? The following diagram is an
intuitive explanation of the same phenomenon, but in the binary domain:

<div class="fig figcenter fighighlight">
  <img src="/assets/att/position-binary.png" width="50%">
</div>

The frequencies \\(w_k\\) are varied, to represent the relative positions of the inputs, in a similar vein as the 0s and
1s in the binary case. In practice, the positional encoding component concatenates additional information to the input
vectors, before they are passed to the self-attention module:

<div class="fig figcenter fighighlight">
  <img src="/assets/att/position.png" width="15%">
</div>

**A Comparison Between General Attention vs. Self-Attention**. The general attention layer has access to three sets of
vectors: key, value, and query vectors. In comparison, the self-attention layer is entirely self-enclosed, and instead
parameterizes the three sets of vectors as linear functions of the inputs.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/comparison.png" width="60%">
</div>

<a name='masked'></a>

#### Masked Self-Attention Layers

While the positional encoding layer integrates some positional information, in more critical applications, it may be
necessary to distill into the model a clearer idea of relative input orderings and prevent it from *looking-ahead* at
future vectors. To this end, the *masked* self-attention layer is created: it explicitly sets the lower-triangular part
of the alignment matrix to negative infinity values, to ignore the corresponding, future vectors while the model
processes earlier vectors.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/masked.png" width="30%">
</div>

<a name='multihead'></a>

#### Multi-Head Self-Attention Layers

Yet another possibility to increase the expressivity of the model is to exploit the notion of a *multi-head* attention.
Instead of using one single self-attention layer, multi-head attention utilizes multiple, parallel attention layers. In
some cases, to maintain the total computation, the key and value dimensions \\(D_k, D_v\\) may be reduced accordingly.
The benefit of using multiple attention heads is to allow the model to focus on different aspects of the input vectors.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/multihead.png" width="60%">
</div>

<a name='summary'></a>

### Summary

To summarize this section,

- We motivated and introduced a novel layer popular in deep learning, the **attention** layer.
- We introduced it in its general formulation and in particular, studied details of the **align and attend** operations.
- We then specialized to the case of a **self-attention** layer.
- We learned that self-attention layers are **permutation-invariant** to the input vectors.
- To retain some positional information, self-attention layers use a **positional-encoding** function.
- Moreover, we also studied two extensions of the vanilla self-attention layer: the **masked** attention layer, and
  the **multi-head** attention. While the former layer prevents the model from looking ahead, the latter serves to
  increase its expressivity.

<a name='resources'></a>

### Additional Resources

- [Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](http://proceedings.mlr.press/v37/xuc15.pdf)
  presents an application of the attention layer to image captioning.
- [Women also Snowboard: Overcoming Bias in Captioning Models](https://arxiv.org/pdf/1803.09797.pdf) exploits the
  attention layer to detect gender bias in image captioning models.
- [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473.pdf) applies
  attention to natural language translation.
- [Attention is All You Need](https://arxiv.org/pdf/1706.03762.pdf) is the seminal paper on attention-based
  Transformers, that took the Vision and NLP communities by storm.

================================================
FILE: aws-tutorial.md
================================================
---
layout: page
title: AWS Tutorial
permalink: /aws-tutorial/
---
For GPU instances, we also have an Amazon Machine Image (AMI) that you can use
to launch GPU instances on Amazon EC2. This tutorial goes through how to set up
your own EC2 instance with the provided AMI. **We do not currently 
distribute AWS credits to CS231N students but you are welcome to use this 
snapshot on your own budget.**

**TL;DR** for the AWS-savvy: Our image is
`cs231n_caffe_torch7_keras_lasagne_v2`, AMI ID: `ami-125b2c72` in the us-west-1
region. Use a `g2.2xlarge` instance.  Caffe, Torch7, Theano, Keras and Lasagne
are pre-installed. Python bindings of caffe are available. It has CUDA 7.5 and
CuDNN v3.

First, if you don't have an AWS account already, create one by going to the [AWS
homepage](http://aws.amazon.com/), and clicking on the yellow "Sign In to the
Console" button. It will direct you to a signup page which looks like the
following.

<div class='fig figcenter fighighlight'>
  <img src='/assets/aws-signup.png'>
</div>

Select the "I am a new user" checkbox, click the "Sign in using our secure
server" button, and follow the subsequent pages to provide the required details.
They will ask for a credit card information, and also a phone verification, so
have your phone and credit card ready.

Once you have signed up, go back to the [AWS homepage](http://aws.amazon.com),
click on "Sign In to the Console", and this time sign in using your username and
password.

<div class='fig figcenter fighighlight'>
  <img src='/assets/aws-signin.png'>
</div>

Once you have signed in, you will be greeted by a page like this:

<div class='fig figcenter fighighlight'>
  <img src='/assets/aws-homepage.png'>
</div>

Make sure that the region information on the top right is set to N. California.
If it is not, change it to N. California by selecting from the dropdown menu
there.

(Note that the subsequent steps requires your account to be "Verified" by
 Amazon. This may take up to 2 hrs, and you may not be able to launch instances
 until your account verification is complete.)

Next, click on the EC2 link (first link under the Compute category). You will go
to a dashboard page like this:

<div class='fig figcenter fighighlight'>
  <img src='/assets/ec2-dashboard.png'>
</div>

Click the blue "Launch Instance" button, and you will be redirected to a page
like the following:

<div class='fig figcenter fighighlight'>
  <img src='/assets/ami-selection.png'>
</div>

Click on the "Community AMIs" link on the left sidebar, and search for "cs231n"
in the search box. You should be able to see the AMI
`cs231n_caffe_torch7_keras_lasagne_v2` (AMI ID: `ami-125b2c72`). Select that
AMI, and continue to the next step to choose your instance type.

<div class='fig figcenter fighighlight'>
  <img src='/assets/community-AMIs.png'>
</div>

Choose the instance type `g2.2xlarge`, and click on "Review and Launch".

<div class='fig figcenter fighighlight'>
  <img src='/assets/instance-selection.png'>
</div>

In the next page, click on Launch.

<div class='fig figcenter fighighlight'>
  <img src='/assets/launch-screen.png'>
</div>

You will be then prompted to create or use an existing key-pair. If you already
use AWS and have a key-pair, you can use that, or alternately you can create a
new one by choosing "Create a new key pair" from the drop-down menu and giving
it some name of your choice. You should then download the key pair, and keep it
somewhere that you won't accidentally delete. Remember that there is **NO WAY**
to get to your instance if you lose your key. 

<div class='fig figcenter fighighlight'>
  <img src='/assets/key-pair.png'>
</div>

<div class='fig figcenter fighighlight'>
  <img src='/assets/key-pair-create.png'>
</div>

Once you download your key, you should change the permissions of the key to
user-only RW, In Linux/OSX you can do it by:

```
$ chmod 600 PEM_FILENAME
```
Here `PEM_FILENAME` is the full file name of the .pem file you just downloaded.

After this is done, click on "Launch Instances", and you should see a screen
showing that your instances are launching:

<div class='fig figcenter fighighlight'>
  <img src='/assets/launching-screen.png'>
</div>

Click on "View Instances" to see your instance state. It should change to
"Running" and "2/2 status checks passed" as shown below within some time. You
are now ready to ssh into the instance.

<div class='fig figcenter fighighlight'>
  <img src='/assets/instances-page.png'>
</div>

First, note down the Public IP of the instance from the instance listing. Then,
do:

```
ssh -i PEM_FILENAME ubuntu@PUBLIC_IP
```

Now you should be logged in to the instance. You can check that Caffe is working
by doing:

```
$ cd caffe
$ ./build/tools/caffe time --gpu 0 --model examples/mnist/lenet.prototxt
```

We have Caffe, Theano, Torch7, Keras and Lasagne pre-installed. Caffe python
bindings are also available by default. We have CUDA 7.5 and CuDNN v3 installed.

If you encounter any error such as 

```
Check failed: error == cudaSuccess (77 vs.  0)  an illegal memory access was encountered
```

you might want to terminate your instance and start over again. I have observed
this rarely, and I am not sure what causes this.

About how to use these instances:

- The root directory is only 12GB, and only ~ 3GB of that is free.
- There should be a 60GB `/mnt` directory that you can use to put your data,
model checkpoints, models etc.
- Remember that the `/mnt` directory won't be persistent across
reboots/terminations.
- Stop your instances when are done for the day to avoid incurring charges. GPU
instances are costly. Use your funds wisely. Terminate them when you are sure
you are done with your instance (disk storage also costs something, and can be
significant if you have a large disk footprint).
- Look into creating custom alarms to automatically stop your instances when
they are not doing anything.
- If you need access to a large dataset and don't want to download it every time
you spin up an instance, the best way to go would be to create an AMI for that
and attach that AMI to your machine when configuring your instance (before
launching but after you have selected the AMI).


================================================
FILE: choose-project.md
================================================
---
layout: page
title: Taking a Course Project to Publication
permalink: /choose-project/
---

*This tutorial was originally contributed by [Leila Abdelrahman](http://leilaabdel.com/), [Amil Khanzada](https://www.linkedin.com/in/amilkhanzada), [Cong Kevin Chen](https://www.linkedin.com/in/cong-kevin-chen-11544186/), and [Tom Jin](https://www.linkedin.com/in/tomjinvancouver/) with oversight and guidance from [Professor Fei-Fei Li](https://profiles.stanford.edu/fei-fei-li) and [Professor Ranjay Krishna](http://www.ranjaykrishna.com/).*

Taking a course project to publication is a challenging but rewarding endeavor. Starting in CS231n in Spring 2020, our team spent hundreds of hours over seven months to publish our project at the [ACM 2020 International Conference on Multimodal Interaction](http://icmi.acm.org/2020/)! We aim to share tips on creating a great project and how you can take it to the next level through publication.

<div class='fig figcenter fighighlight'>
<a href="https://github.com/fusical/emotiw">
<img alt="W3Schools" src="/assets/student-post-files/fusical.gif"> </a>
</div>

Here is a link to [our paper 📄](https://dl.acm.org/doi/abs/10.1145/3382507.3417966) and [corresponding pdf](https://github.com/fusical/emotiw/blob/master/acm_fusical_paper.pdf) for the 2020 ACM ICMI!
<br>

## Table of Contents 📖

- [CS231n: Taking a Course Project to Publication](#cs231n-taking-a-course-project-to-publication)
  - [Picking your Team 🤝](#picking-your-team-)
  - [Choosing your Project 🎛](#choosing-your-project-)
  - [Building the Project 👷🏽](#building-the-project-)
  - [Showcasing your Work 📽](#showcasing-your-work-)
  - [Refining for Publication 📄](#refining-for-publication-)
  - [Maximizing your Impact 🙌](#maximizing-your-impact-)
  - [Conclusion](#conclusion)

## Picking your Team 🤝

Although working alone may seem faster and more comfortable, it is generally not recommended. One of the primary advantages of taking a course is building a "dense network" with classmates through spending hours together working on the project. Additionally, your talented classmates may actually be teachers in disguise! If done correctly, working with a team is a lot more fun and lends for greater creativity and impact through differing opinions.

Investing good time to choose your teammates is CRITICAL, as they will shape your thinking, dictate your sanity, join your lifelong network, and ultimately determine the success of your project. Start your search as early as possible (e.g., before the course starts) so that you have enough time to consider different ideas and hit the ground running.

Have a sense of what you want to do and who you want to work with. Post your interest on Piazza and evaluate others. Join a Slack channel for your area of interest (e.g., #bio_healthcare). Interview folks through project brainstorming video calls. Consider their experience in machine learning, motivation, and personality. Specifically, when evaluating your potential team members' deep learning background, consider factors such as prior work experience, projects/publications, and AI courses in topics related to your intended project. Ask questions such as what kind of networks they would use to tackle your intended project, how they would go about finding data sources, and what tooling they are comfortable with (e.g., Google Colab or AWS). A good list of deep learning interview questions can be found on [Workera.ai](https://workera.ai/resources/deep-learning-algorithms-interview/).

Finally, be flexible and pick your group. Always keep in mind that plans can change, and some students do end up dropping the course. Be open in your communication and watch out for unexpected changes in your group before the course drop deadline.

### Picking your Mentors
Do note that mentorship and support are invaluable in ensuring your success. Find the TA, professor, and/or industry mentors best suited to guide you and meet with them often. Course staff and mentors are generally interested in helping YOU succeed and may even write you a recommendation letter in the future!

<div class='fig figcenter fighighlight'>
  <img src='/assets/student-post-files/choose-project-team.png' width=500px>
</div>

### What We Did
We looked at the profiles of all the TAs and went to office hours. We were lucky to find Christina Yuan, who had done similar research work and soon became a kind mentor for us. Professor Ranjay Krishna was also very kind to guide us in taking our project to publication.

## Choosing your Project 🎛

What you do is very important, as it determines your enjoyment and builds your foundation for future research.

### Sample Categories of CS231n Projects

Past projects can be found here on the [the course website](http://cs231n.stanford.edu/2020/project.html) and include categories such as:

1. Image Classification
2. Image Segmentation
3. Video Processing
4. 3D Deep Learning
5. Sentiment Analysis
6. Object Detection
7. Data Augmentation
8. Audio Analysis with CNNs and Spectrogram Images

### Consider [SMART Goals](https://en.wikipedia.org/wiki/SMART_criteria)

#### Specific

What question are you exactly trying to answer, or what concept do you want to address in-depth? Specific projects are suitable for focus, but make sure it has some generalizability.

*Good Example:*
Generate Deep Fakes

*Bad Example:*
Generating Deep Fakes on lung CT scans to improve medical image segmentation.


#### Measurable
  Make sure there are performance metrics, qualitative assessments, and other benchmarks you can use to measure your work's success. Examples include accuracy on a test dataset and performance relative to a human.

*Bad Example:*
Create high-fidelity image reconstructions.

*Good Example:*
Create high-fidelity image reconstructions with low mean-squared error difference compared to original images.

#### Attainable
Do you have the resources, tools, and background experience to accomplish the project's tasks? Choosing grandiose projects beyond reach can only bring disappointment, be realistic. Ask yourself if you have the computing power (GPUs, disk space), the data (is it open-access, private, do you have to curate it yourself), and the bandwidth (are you juggling a busy semester/quarter).

If you are currently doing CV related research, you can incorporate that into your project, and your supervisor may be willing to lend you data or compute resources. If not, there is usually a short list of faculty-sponsored projects looking for students.

*Bad Example:*
Detection of cancer from facial videos.<br> This is not attainable because there is unlikely to be a large enough dataset that is open-access.

*Good Example:*
Projects that can leverage open-source repos and APIs or work with pre-trained or fine-tuned Deep Learning models.

#### Relevant
CS231n requires you to process some form of 2D pixelated data with a CNN. <br> <br> Discussing your idea with the TAs is good to know how relevant the project is to the class. Make sure the project also aligns with your values and interests. Think long term, and ask yourself five years after the project ends if it enriched your interests and career direction.

*Bad Example:*
Chirp classification with principal component analysis.

*Good Example:*
Object recognition using CNNs. We incorporated feature detection tasks like scene analysis and image captioning for emotion recognition in videos in our project.

#### Timely
CS231n only lasts for 8-10 weeks. Make sure you can deliver your objectives along the class's timeline. Roadmap your project to see what needs to get done--and when! This is perhaps the most critical. <br> <br> Allow yourself 2-3 weeks to find a team and formulate a proposal, 3-4 to collect/preprocess data and train your first model against an established baseline (milestone), and the rest to run additional models, tune hyperparameters, and format your report. <br> <br> It's helpful to log everything in Overleaf and repurpose your milestone for your final submission so that everything doesn't catch up to you in the last few days.

*Bad Example:*
A project that requires two years to collect, clean, and organize raw data.

*Good Example:*
A project where you can work with existing datasets or spend a few weeks collecting raw data. For our EmotiW project, an existing dataset was provided to us, so we were able to spend more time on algorithms.

### Advice from Professor Fei-Fei Li

1. **Maximize learning:** Find a project that can enable your maximal learning.
2. **Feasibility:** Make sure it is executable within the short amount of time in this quarter, especially making sure that data, evaluation metric, and compute resources are adequate for what you want to do.
3. **Think long term:** Executing a good project regardless of the topic would benefit your job application. Companies look for people with a solid technical background and the ability to execute and work as a team. I don't think they will be very narrow-minded on the specific project topics.

## Building the Project 👷🏽

When working on the project, consider these tips to maximize the potential for impact.

### Work in the Cloud

Store documentation, code, datasets, and model files in services such as Google Drive, Dropbox, or GitHub.

The course offers AWS and GCP credits for your projects. Consider taking advantage of these resources, especially when it comes to heavy computing.

If you plan to use Colab, consider saving your model checkpoints directly to your Stanford Google Drive folder to avoid losing files. Colab is also great to parallelize your work as you can run multiple experiments simultaneously. Be careful not to let your work session time out, or you could lose your progress!


### Work Collaboratively
Brainstorm your projects on live-editors like Google Sheets and Google Docs so others can work in the same workspace. Overleaf is excellent for polishing the final report as a team.

Set up regular weekly or semi-weekly meetings with your teammates. This establishes a cadence and helps keep everyone on track, even if no progress has been made.

### Document Everything
Write your code as if you will be showing it to someone who has no idea about your project. This is great for making your methods transparent and reproducible.

Record all experiments you run (e.g., hyperparameter tuning) even if they don't work out, as you can write about them in your report.


### Be Organized
Create clean file structures. Give folders and scripts precise names. Organization saves your team hours, if not days, in the long run.

Write your notebooks, so they are executable top-to-bottom. That way, if you suffer a catastrophic data loss or discover a mistake in your code, you can smoothly go back and replicate your entire setup.

## Showcasing your Work 📽

Communication makes the project memorable. Come up with a fun title, and make sure to present your work in multiple ways. Beyond the written report, create a slide deck, record a video, and practice talking about your work with TAs, professors, and industry experts. Articulation skills in oral and visual presentations ensure that the audience remembers your work!

Before completing the project, take the time to clean up your GitHub repository, or make one if you have not done so already. Add a strong README.md file with a clear description of your project and a "how-to" guide complete with examples. Doing so makes your project accessible and encourages others to build upon your work.

If you are submitting to a journal or conference, consider submitting a copy of your work to a preprint service like [arXiv](https://arxiv.org/). arXiv does not require peer-review and is commonly used by researchers within the CS field to disseminate their work. Publishing a preprint is a quick way to add to your list of publications on Google Scholar. You may also want to take 5 minutes to register and [ORCID](https://orcid.org/).

### Our Challenges
With four people working simultaneously, our Github repository had naturally become a little hard to navigate since everyone used a slightly different naming scheme. Multiple cloned notebooks were also leftover from other experiments. In our case, our goal had always been to submit a grand challenge paper based on our results, so organizing public-facing repository by project completion was critical.

In 2020, many conferences were entirely virtual, which made it even more important to have a polished video presentation as this would end up as part of conference proceedings.

### What We Did
We made sure our Github repository reflected our results, and we took the opportunity to embed multiple figures into the README to make it visually appealing and easily understandable.

We chose the playful title "Fusical" and created a one-figure visual that summarized our work at a high-level. This made it easy for people to reference our work during the conference, allowing us to talk about our project without confusing attendees with different slides.

## Refining for Publication 📄

When refining for publication, ask yourself: *What am I bringing to the conversations that is new?* This can come from novelty or exceeding state-of-the-art methods. Reinforcing this question throughout refinement is critical for successfully publishing.

The next step is to find the right journal or conference to publish to. Unlike most science fields, CS and AI tend to value conference submissions over journal submissions due to the publication turnaround time and higher visibility within the field. Conferences are fun and allow you to network with other like-minded individuals.

In Computer Vision, you'll find conferences of all sizes happening at different times throughout the year. Top conferences include [CVPR](http://cvpr2021.thecvf.com/), [ICCV](http://iccv2021.thecvf.com/home), and [ECCV](https://eccv2020.eu/), but due to their popularity, it might be challenging to get a successful submission. Because many students choose to do an application-based project, consider niche conferences depending on your project's topic. For instance, teams who developed a model to determine student engagement in the classroom might consider submitting to [AIED](https://aied2020.nees.com.br/). Talking to your mentors might also help narrow down the specific method or venue of publication. Keep in mind that Stanford does offer some [travel grants](https://undergrad.stanford.edu/opportunities/research/get-funded).


Iterating over the work is essential for creating publication-quality work. Improving the writing, optimizing the code, and making quality figures are vital points to consider. Reference the work of experts and reach out for advice. [Dr. Juan Carlos Niebles' publications page](http://www.niebles.net/publications/ ) is a good example.

Publications are a lengthy process, partly due to peer review. Review your submission for possible ambiguous areas or claims that are unsubstantiated. Ensure you are clear with your experiment parameters and ask someone outside your team to review the paper.

Writing for a peer-reviewed journal or conference proceedings is very different than preparing a project report, yet the project helps guide what goes into the final paper.

### Our Challenges
While we made advances during the CS231n course, we didn't believe our models were novel enough for publication just yet. Based on guidance from our mentors, we looked for opportunities to differentiate our work from others. Because our project was part of the [EmotiW grand challenge](https://sites.google.com/view/emotiw2020), one of the motivating factors was to continue exploring additional methods to improve our validation performance before the submission deadline.

### What We Did
Based on a literature search, we found common patterns in the modalities used by other groups for sentiment classification (e.g. audio, pose). As a result, we focused on exploring two novel modalities that we felt had value: image captioning (to look for certain objects or settings, such as protests, that are correlated with a specific degree of sentiment) and laughter detection (since the presence of laughter in an audio clip quite often indicates a positive sentiment). Using modified saliency maps and ablation studies, we tried to demonstrate the usefulness of all of these modalities for a more convincing argument.

Overall, we attained a test accuracy of approximately 64%. Because there was no public leaderboard, there was initial uncertainty about how we fared compared to other groups, but we were confident in our approach as it beat the baseline by a large margin. Although we ultimately ranked below the top three winning scores, we were still able to publish our work at the conference due to our novel findings, demonstrating that the test data performance is only of many criteria stressed for publication.

### Differences Between our Course Project and Publication Paper

#### Course Project
- More focus on error analysis and experiments that went wrong.
- Tendency towards experimental approaches, as the course timeframe was too short for pretraining and auxiliary data.
- Significant time spent on "invisible" setup work for the project's infrastructure (preprocessing, APIs, and overall backbone).
- Our original CS231n original project submission can be found [here](https://github.com/fusical/emotiw/blob/master/cs231n_project_report.pdf).

#### Publication Paper
- More focus on the novelty of our approach and emphasis on beating the competition baseline.
- The video presentation was longer and required more detail.
- Ran more experiments and hyperparameter tuning.
- Our official published paper can be found [here](https://dl.acm.org/doi/abs/10.1145/3382507.3417966) and [as a pdf](https://github.com/fusical/emotiw/blob/master/acm_fusical_paper.pdf).

*If you don't get the results you were hoping for, don't be scared to submit it anyway - the worst reviewers can do is say no, and feedback from experts in the field is always valuable!*

## Maximizing your Impact 🙌
Your project has worth and impact. Presenting at global conferences (we presented at the 2020 ICMI) is a great way to share your results and findings with other academic and industry experts, but there are other options. Do you think your idea can be deployed in practice? Stanford has resources and opportunities to help students continue to pursue their ideas, including the [Startup Garage](https://www.gsb.stanford.edu/experience/learning/entrepreneurship/courses/startup-garage), a hands-on project-based course to develop and test out new business concepts, and the [Vision Lab](http://vision.stanford.edu/people.html).

If your idea is more research-focused, seek out professors who might work in the domain area and pitch them your thoughts. You might have opportunities to continue your project as an independent study project or apply your idea to more extensive and significant datasets or pivot to similar research areas.

## Conclusion

It was hard work and fun because we had a good team! We hope you all enjoy these tips on the process of taking a project from brainstorming to impactful presentations and publications. These tips are scalable, and we hope you use them in other endeavors as well!


================================================
FILE: classification.md
================================================
---
layout: page
mathjax: true
permalink: /classification/
---

This is an introductory lecture designed to introduce people from outside of Computer Vision to the Image Classification problem, and the data-driven approach. The Table of Contents:

- [Image Classification](#image-classification)
  - [Nearest Neighbor Classifier](#nearest-neighbor-classifier)
  - [k - Nearest Neighbor Classifier](#k---nearest-neighbor-classifier)
  - [Validation sets for Hyperparameter tuning](#validation-sets-for-hyperparameter-tuning)
  - [Summary](#summary)
  - [Summary: Applying kNN in practice](#summary-applying-knn-in-practice)
    - [Further Reading](#further-reading)

<a name='intro'></a>

## Image Classification

**Motivation**. In this section we will introduce the Image Classification problem, which is the task of assigning an input image one label from a fixed set of categories. This is one of the core problems in Computer Vision that, despite its simplicity, has a large variety of practical applications. Moreover, as we will see later in the course, many other seemingly distinct Computer Vision tasks (such as object detection, segmentation) can be reduced to image classification.

**Example**. For example, in the image below an image classification model takes a single image and assigns probabilities to 4 labels, *{cat, dog, hat, mug}*. As shown in the image, keep in mind that to a computer an image is represented as one large 3-dimensional array of numbers. In this example, the cat image is 248 pixels wide, 400 pixels tall, and has three color channels Red,Green,Blue (or RGB for short). Therefore, the image consists of 248 x 400 x 3 numbers, or a total of 297,600 numbers. Each number is an integer that ranges from 0 (black) to 255 (white). Our task is to turn this quarter of a million numbers into a single label, such as *"cat"*.

<div class="fig figcenter fighighlight">
  <img src="/assets/classify.png">
  <div class="figcaption">The task in Image Classification is to predict a single label (or a distribution over labels as shown here to indicate our confidence) for a given image. Images are 3-dimensional arrays of integers from 0 to 255, of size Width x Height x 3. The 3 represents the three color channels Red, Green, Blue.</div>
</div>

**Challenges**. Since this task of recognizing a visual concept (e.g. cat) is relatively trivial for a human to perform, it is worth considering the challenges involved from the perspective of a Computer Vision algorithm. As we present (an inexhaustive) list of challenges below, keep in mind the raw representation of images as a 3-D array of brightness values:

- **Viewpoint variation**. A single instance of an object can be oriented in many ways with respect to the camera.
- **Scale variation**. Visual classes often exhibit variation in their size (size in the real world, not only in terms of their extent in the image).
- **Deformation**. Many objects of interest are not rigid bodies and can be deformed in extreme ways.
- **Occlusion**. The objects of interest can be occluded. Sometimes only a small portion of an object (as little as few pixels) could be visible.
- **Illumination conditions**. The effects of illumination are drastic on the pixel level.
- **Background clutter**. The objects of interest may *blend* into their environment, making them hard to identify.
- **Intra-class variation**. The classes of interest can often be relatively broad, such as *chair*. There are many different types of these objects, each with their own appearance.

A good image classification model must be invariant to the cross product of all these variations, while simultaneously retaining sensitivity to the inter-class variations.

<div class="fig figcenter fighighlight">
  <img src="/assets/challenges.jpeg">
  <div class="figcaption"></div>
</div>

**Data-driven approach**. How might we go about writing an algorithm that can classify images into distinct categories? Unlike writing an algorithm for, for example, sorting a list of numbers, it is not obvious how one might write an algorithm for identifying cats in images. Therefore, instead of trying to specify what every one of the categories of interest look like directly in code, the approach that we will take is not unlike one you would take with a child: we're going to provide the computer with many examples of each class and then develop learning algorithms that look at these examples and learn about the visual appearance of each class. This approach is referred to as a *data-driven approach*, since it relies on first accumulating a *training dataset* of labeled images. Here is an example of what such a dataset might look like:

<div class="fig figcenter fighighlight">
  <img src="/assets/trainset.jpg">
  <div class="figcaption">An example training set for four visual categories. In practice we may have thousands of categories and hundreds of thousands of images for each category.</div>
</div>

**The image classification pipeline**. We've seen that the task in Image Classification is to take an array of pixels that represents a single image and assign a label to it. Our complete pipeline can be formalized as follows:

- **Input:** Our input consists of a set of *N* images, each labeled with one of *K* different classes. We refer to this data as the *training set*.
- **Learning:** Our task is to use the training set to learn what every one of the classes looks like. We refer to this step as *training a classifier*, or *learning a model*.
- **Evaluation:** In the end, we evaluate the quality of the classifier by asking it to predict labels for a new set of images that it has never seen before. We will then compare the true labels of these images to the ones predicted by the classifier. Intuitively, we're hoping that a lot of the predictions match up with the true answers  (which we call the *ground truth*).

<a name='nn'></a>

### Nearest Neighbor Classifier
As our first approach, we will develop what we call a **Nearest Neighbor Classifier**. This classifier has nothing to do with Convolutional Neural Networks and it is very rarely used in practice, but it will allow us to get an idea about the basic approach to an image classification problem.

**Example image classification dataset: CIFAR-10.** One popular toy image classification dataset is the <a href="https://www.cs.toronto.edu/~kriz/cifar.html">CIFAR-10 dataset</a>. This dataset consists of 60,000 tiny images that are 32 pixels high and wide. Each image is labeled with one of 10 classes (for example *"airplane, automobile, bird, etc"*). These 60,000 images are partitioned into a training set of 50,000 images and a test set of 10,000 images. In the image below you can see 10 random example images from each one of the 10 classes:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn.jpg">
  <div class="figcaption">Left: Example images from the <a href="https://www.cs.toronto.edu/~kriz/cifar.html">CIFAR-10 dataset</a>. Right: first column shows a few test images and next to each we show the top 10 nearest neighbors in the training set according to pixel-wise difference.</div>
</div>

Suppose now that we are given the CIFAR-10 training set of 50,000 images (5,000 images for every one of the labels), and we wish to label the remaining 10,000. The nearest neighbor classifier will take a test image, compare it to every single one of the training images, and predict the label of the closest training image. In the image above and on the right you can see an example result of such a procedure for 10 example test images. Notice that in only about 3 out of 10 examples an image of the same class is retrieved, while in the other 7 examples this is not the case. For example, in the 8th row the nearest training image to the horse head is a red car, presumably due to the strong black background. As a result, this image of a horse would in this case be mislabeled as a car.

You may have noticed that we left unspecified the details of exactly how we compare two images, which in this case are just two blocks of 32 x 32 x 3. One of the simplest possibilities is to compare the images pixel by pixel and add up all the differences. In other words, given two images and representing them as vectors \\( I_1, I_2 \\) , a reasonable choice for comparing them might be the **L1 distance**:

$$
d_1 (I_1, I_2) = \sum_{p} \left| I^p_1 - I^p_2 \right|
$$

Where the sum is taken over all pixels. Here is the procedure visualized:

<div class="fig figcenter fighighlight">
  <img src="/assets/nneg.jpeg">
  <div class="figcaption">An example of using pixel-wise differences to compare two images with L1 distance (for one color channel in this example). Two images are subtracted elementwise and then all differences are added up to a single number. If two images are identical the result will be zero. But if the images are very different the result will be large.</div>
</div>

Let's also look at how we might implement the classifier in code. First, let's load the CIFAR-10 data into memory as 4 arrays: the training data/labels and the test data/labels. In the code below, `Xtr` (of size 50,000 x 32 x 32 x 3) holds all the images in the training set, and a corresponding 1-dimensional array `Ytr` (of length 50,000) holds the training labels (from 0 to 9):

```python
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/') # a magic function we provide
# flatten out all images to be one-dimensional
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3) # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3) # Xte_rows becomes 10000 x 3072
```

Now that we have all images stretched out as rows, here is how we could train and evaluate a classifier:

```python
nn = NearestNeighbor() # create a Nearest Neighbor classifier class
nn.train(Xtr_rows, Ytr) # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows) # predict labels on the test images
# and now print the classification accuracy, which is the average number
# of examples that are correctly predicted (i.e. label matches)
print 'accuracy: %f' % ( np.mean(Yte_predict == Yte) )
```

Notice that as an evaluation criterion, it is common to use the **accuracy**, which measures the fraction of predictions that were correct. Notice that all classifiers we will build satisfy this one common API: they have a `train(X,y)` function that takes the data and the labels to learn from. Internally, the class should build some kind of model of the labels and how they can be predicted from the data. And then there is a `predict(X)` function, which takes new data and predicts the labels. Of course, we've left out the meat of things - the actual classifier itself. Here is an implementation of a simple Nearest Neighbor classifier with the L1 distance that satisfies this template:

```python
import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. Y is 1-dimension of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict label for """
    num_test = X.shape[0]
    # lets make sure that the output type matches the input type
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
      min_index = np.argmin(distances) # get the index with smallest distance
      Ypred[i] = self.ytr[min_index] # predict the label of the nearest example

    return Ypred
```

If you ran this code, you would see that this classifier only achieves **38.6%** on CIFAR-10. That's more impressive than guessing at random (which would give 10% accuracy since there are 10 classes), but nowhere near human performance (which is [estimated at about 94%](https://karpathy.github.io/2011/04/27/manually-classifying-cifar10/)) or near state-of-the-art Convolutional Neural Networks that achieve about 95%, matching human accuracy (see the [leaderboard](https://www.kaggle.com/c/cifar-10/leaderboard) of a recent Kaggle competition on CIFAR-10).

**The choice of distance.**
There are many other ways of computing distances between vectors. Another common choice could be to instead use the **L2 distance**, which has the geometric interpretation of computing the euclidean distance between two vectors. The distance takes the form:

$$
d_2 (I_1, I_2) = \sqrt{\sum_{p} \left( I^p_1 - I^p_2 \right)^2}
$$

In other words we would be computing the pixelwise difference as before, but this time we square all of them, add them up and finally take the square root. In numpy, using the code from above we would need to only replace a single line of code. The line that computes the distances:

```python
distances = np.sqrt(np.sum(np.square(self.Xtr - X[i,:]), axis = 1))
```

Note that I included the `np.sqrt` call above, but in a practical nearest neighbor application we could leave out the square root operation because square root is a *monotonic function*. That is, it scales the absolute sizes of the distances but it preserves the ordering, so the nearest neighbors with or without it are identical. If you ran the Nearest Neighbor classifier on CIFAR-10 with this distance, you would obtain **35.4%** accuracy (slightly lower than our L1 distance result).

**L1 vs. L2.** It is interesting to consider differences between the two metrics. In particular, the L2 distance is much more unforgiving than the L1 distance when it comes to differences between two vectors. That is, the L2 distance prefers many medium disagreements to one big one. L1 and L2 distances (or equivalently the L1/L2 norms of the differences between a pair of images) are the most commonly used special cases of a [p-norm](https://planetmath.org/vectorpnorm).

<a name='knn'></a>

### k - Nearest Neighbor Classifier

You may have noticed that it is strange to only use the label of the nearest image when we wish to make a prediction. Indeed, it is almost always the case that one can do better by using what's called a **k-Nearest Neighbor Classifier**. The idea is very simple: instead of finding the single closest image in the training set, we will find the top **k** closest images, and have them vote on the label of the test image. In particular, when *k = 1*, we recover the Nearest Neighbor classifier. Intuitively, higher values of **k** have a smoothing effect that makes the classifier more resistant to outliers:

<div class="fig figcenter fighighlight">
  <img src="/assets/knn.jpeg">
  <div class="figcaption">An example of the difference between Nearest Neighbor and a 5-Nearest Neighbor classifier, using 2-dimensional points and 3 classes (red, blue, green). The colored regions show the <b>decision boundaries</b> induced by the classifier with an L2 distance. The white regions show points that are ambiguously classified (i.e. class votes are tied for at least two classes). Notice that in the case of a NN classifier, outlier datapoints (e.g. green point in the middle of a cloud of blue points) create small islands of likely incorrect predictions, while the 5-NN classifier smooths over these irregularities, likely leading to better <b>generalization</b> on the test data (not shown). Also note that the gray regions in the 5-NN image are caused by ties in the votes among the nearest neighbors (e.g. 2 neighbors are red, next two neighbors are blue, last neighbor is green).</div>
</div>

In practice, you will almost always want to use k-Nearest Neighbor. But what value of *k* should you use? We turn to this problem next.

<a name='val'></a>

### Validation sets for Hyperparameter tuning

The k-nearest neighbor classifier requires a setting for *k*. But what number works best? Additionally, we saw that there are many different distance functions we could have used: L1 norm, L2 norm, there are many other choices we didn't even consider (e.g. dot products). These choices are called **hyperparameters** and they come up very often in the design of many Machine Learning algorithms that learn from data. It's often not obvious what values/settings one should choose.

You might be tempted to suggest that we should try out many different values and see what works best. That is a fine idea and that's indeed what we will do, but this must be done very carefully. In particular, **we cannot use the test set for the purpose of tweaking hyperparameters**. Whenever you're designing Machine Learning algorithms, you should think of the test set as a very precious resource that should ideally never be touched until one time at the very end. Otherwise, the very real danger is that you may tune your hyperparameters to work well on the test set, but if you were to deploy your model you could see a significantly reduced performance. In practice, we would say that you **overfit** to the test set. Another way of looking at it is that if you tune your hyperparameters on the test set, you are effectively using the test set as the training set, and therefore the performance you achieve on it will be too optimistic with respect to what you might actually observe when you deploy your model. But if you only use the test set once at end, it remains a good proxy for measuring the **generalization** of your classifier (we will see much more discussion surrounding generalization later in the class).

> Evaluate on the test set only a single time, at the very end.

Luckily, there is a correct way of tuning the hyperparameters and it does not touch the test set at all. The idea is to split our training set in two: a slightly smaller training set, and what we call a **validation set**. Using CIFAR-10 as an example, we could for example use 49,000 of the training images for training, and leave 1,000 aside for validation. This validation set is essentially used as a fake test set to tune the hyper-parameters.

Here is what this might look like in the case of CIFAR-10:

```python
# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :] # take first 1000 for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :] # keep last 49,000 for train
Ytr = Ytr[1000:]

# find hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:

  # use a particular value of k and evaluation on validation data
  nn = NearestNeighbor()
  nn.train(Xtr_rows, Ytr)
  # here we assume a modified NearestNeighbor class that can take a k as input
  Yval_predict = nn.predict(Xval_rows, k = k)
  acc = np.mean(Yval_predict == Yval)
  print 'accuracy: %f' % (acc,)

  # keep track of what works on the validation set
  validation_accuracies.append((k, acc))
```

By the end of this procedure, we could plot a graph that shows which values of *k* work best. We would then stick with this value and evaluate once on the actual test set.

> Split your training set into training set and a validation set. Use validation set to tune all hyperparameters. At the end run a single time on the test set and report performance.

**Cross-validation**.
In cases where the size of your training data (and therefore also the validation data) might be small, people sometimes use a more sophisticated technique for hyperparameter tuning called **cross-validation**. Working with our previous example, the idea is that instead of arbitrarily picking the first 1000 datapoints to be the validation set and rest training set, you can get a better and less noisy estimate of how well a certain value of *k* works by iterating over different validation sets and averaging the performance across these. For example, in 5-fold cross-validation, we would split the training data into 5 equal folds, use 4 of them for training, and 1 for validation. We would then iterate over which fold is the validation fold, evaluate the performance, and finally average the performance across the different folds.

<div class="fig figleft fighighlight">
  <img src="/assets/cvplot.png">
  <div class="figcaption">Example of a 5-fold cross-validation run for the parameter <b>k</b>. For each value of <b>k</b> we train on 4 folds and evaluate on the 5th. Hence, for each <b>k</b> we receive 5 accuracies on the validation fold (accuracy is the y-axis, each result is a point). The trend line is drawn through the average of the results for each <b>k</b> and the error bars indicate the standard deviation. Note that in this particular case, the cross-validation suggests that a value of about <b>k</b> = 7 works best on this particular dataset (corresponding to the peak in the plot). If we used more than 5 folds, we might expect to see a smoother (i.e. less noisy) curve.</div>
  <div style="clear:both"></div>
</div>


**In practice**. In practice, people prefer to avoid cross-validation in favor of having a single validation split, since cross-validation can be computationally expensive. The splits people tend to use is between 50%-90% of the training data for training and rest for validation. However, this depends on multiple factors: For example if the number of hyperparameters is large you may prefer to use bigger validation splits. If the number of examples in the validation set is small (perhaps only a few hundred or so), it is safer to use cross-validation. Typical number of folds you can see in practice would be 3-fold, 5-fold or 10-fold cross-validation.

<div class="fig figcenter fighighlight">
  <img src="/assets/crossval.jpeg">
  <div class="figcaption">Common data splits. A training and test set is given. The training set is split into folds (for example 5 folds here). The folds 1-4 become the training set. One fold (e.g. fold 5 here in yellow) is denoted as the Validation fold and is used to tune the hyperparameters. Cross-validation goes a step further and iterates over the choice of which fold is the validation fold, separately from 1-5. This would be referred to as 5-fold cross-validation. In the very end once the model is trained and all the best hyperparameters were determined, the model is evaluated a single time on the test data (red).</div>
</div>

<a name='procon'></a>

**Pros and Cons of Nearest Neighbor classifier.**

It is worth considering some advantages and drawbacks of the Nearest Neighbor classifier. Clearly, one advantage is that it is very simple to implement and understand. Additionally, the classifier takes no time to train, since all that is required is to store and possibly index the training data. However, we pay that computational cost at test time, since classifying a test example requires a comparison to every single training example. This is backwards, since in practice we often care about the test time efficiency much more than the efficiency at training time. In fact, the deep neural networks we will develop later in this class shift this tradeoff to the other extreme: They are very expensive to train, but once the training is finished it is very cheap to classify a new test example. This mode of operation is much more desirable in practice.

As an aside, the computational complexity of the Nearest Neighbor classifier is an active area of research, and several **Approximate Nearest Neighbor** (ANN) algorithms and libraries exist that can accelerate the nearest neighbor lookup in a dataset (e.g. [FLANN](https://github.com/mariusmuja/flann)). These algorithms allow one to trade off the correctness of the nearest neighbor retrieval with its space/time complexity during retrieval, and usually rely on a pre-processing/indexing stage that involves building a kdtree, or running the k-means algorithm.

The Nearest Neighbor Classifier may sometimes be a good choice in some settings (especially if the data is low-dimensional), but it is rarely appropriate for use in practical image classification settings. One problem is that images are high-dimensional objects (i.e. they often contain many pixels), and distances over high-dimensional spaces can be very counter-intuitive. The image below illustrates the point that the pixel-based L2 similarities we developed above are very different from perceptual similarities:

<div class="fig figcenter fighighlight">
  <img src="/assets/samenorm.png">
  <div class="figcaption">Pixel-based distances on high-dimensional data (and images especially) can be very unintuitive. An original image (left) and three other images next to it that are all equally far away from it based on L2 pixel distance. Clearly, the pixel-wise distance does not correspond at all to perceptual or semantic similarity.</div>
</div>

Here is one more visualization to convince you that using pixel differences to compare images is inadequate. We can use a visualization technique called <a href="https://lvdmaaten.github.io/tsne/">t-SNE</a> to take the CIFAR-10 images and embed them in two dimensions so that their (local) pairwise distances are best preserved. In this visualization, images that are shown nearby are considered to be very near according to the L2 pixelwise distance we developed above:

<div class="fig figcenter fighighlight">
  <img src="/assets/pixels_embed_cifar10.jpg">
  <div class="figcaption">CIFAR-10 images embedded in two dimensions with t-SNE. Images that are nearby on this image are considered to be close based on the L2 pixel distance. Notice the strong effect of background rather than semantic class differences. Click <a href="/assets/pixels_embed_cifar10_big.jpg">here</a> for a bigger version of this visualization.</div>
</div>

In particular, note that images that are nearby each other are much more a function of the general color distribution of the images, or the type of background rather than their semantic identity. For example, a dog can be seen very near a frog since both happen to be on white background. Ideally we would like images of all of the 10 classes to form their own clusters, so that images of the same class are nearby to each other regardless of irrelevant characteristics and variations (such as the background). However, to get this property we will have to go beyond raw pixels.

<a name='summary'></a>

### Summary

In summary:

- We introduced the problem of **Image Classification**, in which we are given a set of images that are all labeled with a single category. We are then asked to predict these categories for a novel set of test images and measure the accuracy of the predictions.
- We introduced a simple classifier called the **Nearest Neighbor classifier**. We saw that there are multiple hyper-parameters (such as value of k, or the type of distance used to compare examples) that are associated with this classifier and that there was no obvious way of choosing them.
-  We saw that the correct way to set these hyperparameters is to split your training data into two: a training set and a fake test set, which we call **validation set**. We try different hyperparameter values and keep the values that lead to the best performance on the validation set.
- If the lack of training data is a concern, we discussed a procedure called **cross-validation**, which can help reduce noise in estimating which hyperparameters work best.
- Once the best hyperparameters are found, we fix them and perform a single **evaluation** on the actual test set.
- We saw that Nearest Neighbor can get us about 40% accuracy on CIFAR-10. It is simple to implement but requires us to store the entire training set and it is expensive to evaluate on a test image.
- Finally, we saw that the use of L1 or L2 distances on raw pixel values is not adequate since the distances correlate more strongly with backgrounds and color distributions of images than with their semantic content.

In next lectures we will embark on addressing these challenges and eventually arrive at solutions that give 90% accuracies, allow us to completely discard the training set once learning is complete, and they will allow us to evaluate a test image in less than a millisecond.

<a name='summaryapply'></a>

### Summary: Applying kNN in practice

If you wish to apply kNN in practice (hopefully not on images, or perhaps as only a baseline) proceed as follows:

1. Preprocess your data: Normalize the features in your data (e.g. one pixel in images) to have zero mean and unit variance. We will cover this in more detail in later sections, and chose not to cover data normalization in this section because pixels in images are usually homogeneous and do not exhibit widely different distributions, alleviating the need for data normalization.
2. If your data is very high-dimensional, consider using a dimensionality reduction technique such as PCA ([wiki ref](https://en.wikipedia.org/wiki/Principal_component_analysis), [CS229ref](http://cs229.stanford.edu/notes/cs229-notes10.pdf), [blog ref](https://web.archive.org/web/20150503165118/http://www.bigdataexaminer.com:80/understanding-dimensionality-reduction-principal-component-analysis-and-singular-value-decomposition/)), NCA ([wiki ref](https://en.wikipedia.org/wiki/Neighbourhood_components_analysis), [blog ref](https://kevinzakka.github.io/2020/02/10/nca/)), or even [Random Projections](https://scikit-learn.org/stable/modules/random_projection.html).
3. Split your training data randomly into train/val splits. As a rule of thumb, between 70-90% of your data usually goes to the train split. This setting depends on how many hyperparameters you have and how much of an influence you expect them to have. If there are many hyperparameters to estimate, you should err on the side of having larger validation set to estimate them effectively. If you are concerned about the size of your validation data, it is best to split the training data into folds and perform cross-validation. If you can afford the computational budget it is always safer to go with cross-validation (the more folds the better, but more expensive).
4. Train and evaluate the kNN classifier on the validation data (for all folds, if doing cross-validation) for many choices of **k** (e.g. the more the better) and across different distance types (L1 and L2 are good candidates)
5. If your kNN classifier is running too long, consider using an Approximate Nearest Neighbor library (e.g. [FLANN](https://github.com/mariusmuja/flann)) to accelerate the retrieval (at cost of some accuracy).
6. Take note of the hyperparameters that gave the best results. There is a question of whether you should use the full training set with the best hyperparameters, since the optimal hyperparameters might change if you were to fold the validation data into your training set (since the size of the data would be larger). In practice it is cleaner to not use the validation data in the final classifier and consider it to be *burned* on estimating the hyperparameters. Evaluate the best model on the test set. Report the test set accuracy and declare the result to be the performance of the kNN classifier on your data.

<a name='reading'></a>

#### Further Reading

Here are some (optional) links you may find interesting for further reading:

- [A Few Useful Things to Know about Machine Learning](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf), where especially section 6 is related but the whole paper is a warmly recommended reading.

- [Recognizing and Learning Object Categories](https://people.csail.mit.edu/torralba/shortCourseRLOC/index.html), a short course of object categorization at ICCV 2005.


================================================
FILE: convnet-tips.md
================================================
---
layout: page
permalink: /convnet-tips/
---

<a name='overfitting'></a>
### Addressing Overfitting

#### Data Augmentation

- Flip the training images over x-axis
- Sample random crops / scales in the original image
- Jitter the colors

#### Dropout 

- Dropout is just as effective for Conv layers. Usually people apply less dropout right before early conv layers since there are not that many parameters there compared to later stages of the network (e.g. the fully connected layers).


================================================
FILE: convolutional-networks.md
================================================
---
layout: page
permalink: /convolutional-networks/
---

Table of Contents:

- [Architecture Overview](#overview)
- [ConvNet Layers](#layers)
  - [Convolutional Layer](#conv)
  - [Pooling Layer](#pool)
  - [Normalization Layer](#norm)
  - [Fully-Connected Layer](#fc)
  - [Converting Fully-Connected Layers to Convolutional Layers](#convert)
- [ConvNet Architectures](#architectures)
  - [Layer Patterns](#layerpat)
  - [Layer Sizing Patterns](#layersizepat)
  - [Case Studies](#case) (LeNet / AlexNet / ZFNet / GoogLeNet / VGGNet)
  - [Computational Considerations](#comp)
- [Additional References](#add)

## Convolutional Neural Networks (CNNs / ConvNets)

Convolutional Neural Networks are very similar to ordinary Neural Networks from the previous chapter: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function: from the raw image pixels on one end to class scores at the other. And they still have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer and all the tips/tricks we developed for learning regular Neural Networks still apply.

So what changes? ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the amount of parameters in the network.

<a name='overview'></a>

### Architecture Overview

*Recall: Regular Neural Nets.* As we saw in the previous chapter, Neural Networks receive an input (a single vector), and transform it through a series of *hidden layers*. Each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer, and where neurons in a single layer function completely independently and do not share any connections. The last fully-connected layer is called the "output layer" and in classification settings it represents the class scores.

*Regular Neural Nets don't scale well to full images*. In CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels), so a single fully-connected neuron in a first hidden layer of a regular Neural Network would have 32\*32\*3 = 3072 weights. This amount still seems manageable, but clearly this fully-connected structure does not scale to larger images. For example, an image of more respectable size, e.g. 200x200x3, would lead to neurons that have 200\*200\*3 = 120,000 weights. Moreover, we would almost certainly want to have several such neurons, so the parameters would add up quickly! Clearly, this full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.

*3D volumes of neurons*. Convolutional Neural Networks take advantage of the fact that the input consists of images and they constrain the architecture in a more sensible way. In particular, unlike a regular Neural Network, the layers of a ConvNet have neurons arranged in 3 dimensions: **width, height, depth**. (Note that the word *depth* here refers to the third dimension of an activation volume, not to the depth of a full Neural Network, which can refer to the total number of layers in a network.) For example, the input images in CIFAR-10 are an input volume of activations, and the volume has dimensions 32x32x3 (width, height, depth respectively). As we will soon see, the neurons in a layer will only be connected to a small region of the layer before it, instead of all of the neurons in a fully-connected manner. Moreover, the final output layer would for CIFAR-10 have dimensions 1x1x10, because by the end of the ConvNet architecture we will reduce the full image into a single vector of class scores, arranged along the depth dimension. Here is a visualization:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn1/neural_net2.jpeg" width="40%">
  <img src="/assets/cnn/cnn.jpeg" width="48%" style="border-left: 1px solid black;">
  <div class="figcaption">Left: A regular 3-layer Neural Network. Right: A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).</div>
</div>

> A ConvNet is made up of Layers. Every Layer has a simple API: It transforms an input 3D volume to an output 3D volume with some differentiable function that may or may not have parameters.

<a name='layers'></a>

### Layers used to build ConvNets 

As we described above, a simple ConvNet is a sequence of layers, and every layer of a ConvNet transforms one volume of activations to another through a differentiable function. We use three main types of layers to build ConvNet architectures: **Convolutional Layer**, **Pooling Layer**, and **Fully-Connected Layer** (exactly as seen in regular Neural Networks). We will stack these layers to form a full ConvNet **architecture**. 

*Example Architecture: Overview*. We will go into more details below, but a simple ConvNet for CIFAR-10 classification could have the architecture [INPUT - CONV - RELU - POOL - FC]. In more detail:

- INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.
- CONV layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in volume such as [32x32x12] if we decided to use 12 filters.
- RELU layer will apply an elementwise activation function, such as the \\(max(0,x)\\) thresholding at zero. This leaves the size of the volume unchanged ([32x32x12]).
- POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as [16x16x12].
- FC (i.e. fully-connected) layer will compute the class scores, resulting in volume of size [1x1x10], where each of the 10 numbers correspond to a class score, such as among the 10 categories of CIFAR-10. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.

In this way, ConvNets transform the original image layer by layer from the original pixel values to the final class scores. Note that some layers contain parameters and other don't. In particular, the CONV/FC layers perform transformations that are a function of not only the activations in the input volume, but also of the parameters (the weights and biases of the neurons). On the other hand, the RELU/POOL layers will implement a fixed function. The parameters in the CONV/FC layers will be trained with gradient descent so that the class scores that the ConvNet computes are consistent with the labels in the training set for each image. 

In summary:

- A ConvNet architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)
- There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)
- Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function
- Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don't)
- Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn't)

<div class="fig figcenter fighighlight">
  <img src="/assets/cnn/convnet.jpeg" width="100%">
  <div class="figcaption">
    The activations of an example ConvNet architecture. The initial volume stores the raw image pixels (left) and the last volume stores the class scores (right). Each volume of activations along the processing path is shown as a column. Since it's difficult to visualize 3D volumes, we lay out each volume's slices in rows. The last layer volume holds the scores for each class, but here we only visualize the sorted top 5 scores, and print the labels of each one. The full <a href="http://cs231n.stanford.edu/">web-based demo</a> is shown in the header of our website. The architecture shown here is a tiny VGG Net, which we will discuss later.
  </div>
</div>

We now describe the individual layers and the details of their hyperparameters and their connectivities.

<a name='conv'></a>

#### Convolutional Layer

The Conv layer is the core building block of a Convolutional Network that does most of the computational heavy lifting. 

**Overview and intuition without brain stuff.** Let's first discuss what the CONV layer computes without brain/neuron analogies. The CONV layer's parameters consist of a set of learnable filters. Every filter is small spatially (along width and height), but extends through the full depth of the input volume. For example, a typical filter on a first layer of a ConvNet might have size 5x5x3 (i.e. 5 pixels width and height, and 3 because images have depth 3, the color channels). During the forward pass, we slide (more precisely, convolve) each filter across the width and height of the input volume and compute dot products between the entries of the filter and the input at any position. As we slide the filter over the width and height of the input volume we will produce a 2-dimensional activation map that gives the responses of that filter at every spatial position. Intuitively, the network will learn filters that activate when they see some type of visual feature such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network. Now, we will have an entire set of filters in each CONV layer (e.g. 12 filters), and each of them will produce a separate 2-dimensional activation map. We will stack these activation maps along the depth dimension and produce the output volume.

**The brain view**. If you're a fan of the brain/neuron analogies, every entry in the 3D output volume can also be interpreted as an output of a neuron that looks at only a small region in the input and shares parameters with all neurons to the left and right spatially (since these numbers all result from applying the same filter). 

We now discuss the details of the neuron connectivities, their arrangement in space, and their parameter sharing scheme.

**Local Connectivity.** When dealing with high-dimensional inputs such as images, as we saw above it is impractical to connect neurons to all neurons in the previous volume. Instead, we will connect each neuron to only a local region of the input volume. The spatial extent of this connectivity is a hyperparameter called the **receptive field** of the neuron (equivalently this is the filter size). The extent of the connectivity along the depth axis is always equal to the depth of the input volume. It is important to emphasize again this asymmetry in how we treat the spatial dimensions (width and height) and the depth dimension: The connections are local in 2D space (along width and height), but always full along the entire depth of the input volume.

*Example 1*. For example, suppose that the input volume has size [32x32x3], (e.g. an RGB CIFAR-10 image). If the receptive field (or the filter size) is 5x5, then each neuron in the Conv Layer will have weights to a [5x5x3] region in the input volume, for a total of 5\*5\*3 = 75 weights (and +1 bias parameter). Notice that the extent of the connectivity along the depth axis must be 3, since this is the depth of the input volume.

*Example 2*. Suppose an input volume had size [16x16x20]. Then using an example receptive field size of 3x3, every neuron in the Conv Layer would now have a total of 3\*3\*20 = 180 connections to the input volume. Notice that, again, the connectivity is local in 2D space (e.g. 3x3), but full along the input depth (20).

<div class="fig figcenter fighighlight">
  <img src="/assets/cnn/depthcol.jpeg" width="40%">
  <img src="/assets/nn1/neuron_model.jpeg" width="40%" style="border-left: 1px solid black;">
  <div class="figcaption">
    <b>Left:</b> An example input volume in red (e.g. a 32x32x3 CIFAR-10 image), and an example volume of neurons in the first Convolutional layer. Each neuron in the convolutional layer is connected only to a local region in the input volume spatially, but to the full depth (i.e. all color channels). Note, there are multiple neurons (5 in this example) along the depth, all looking at the same region in the input: the lines that connect this column of 5 neurons do not represent the weights (i.e. these 5 neurons do not share the same weights, but they are associated with 5 different filters), they just indicate that these neurons are connected to or looking at the same receptive field or region of the input volume, i.e. they share the same receptive field but not the same weights. <b>Right:</b> The neurons from the Neural Network chapter remain unchanged: They still compute a dot product of their weights with the input followed by a non-linearity, but their connectivity is now restricted to be local spatially.
  </div>
</div>

**Spatial arrangement**. We have explained the connectivity of each neuron in the Conv Layer to the input volume, but we haven't yet discussed how many neurons there are in the output volume or how they are arranged. Three hyperparameters control the size of the output volume: the **depth, stride** and **zero-padding**. We discuss these next:

1. First, the **depth** of the output volume is a hyperparameter: it corresponds to the number of filters we would like to use, each learning to look for something different in the input. For example, if the first Convolutional Layer takes as input the raw image, then different neurons along the depth dimension may activate in presence of various oriented edges, or blobs of color. We will refer to a set of neurons that are all looking at the same region of the input as a **depth column** (some people also prefer the term *fibre*).
2. Second, we must specify the **stride** with which we slide the filter. When the stride is 1 then we move the filters one pixel at a time. When the stride is 2 (or uncommonly 3 or more, though this is rare in practice) then the filters jump 2 pixels at a time as we slide them around. This will produce smaller output volumes spatially.
3. As we will soon see, sometimes it will be convenient to pad the input volume with zeros around the border. The size of this **zero-padding** is a hyperparameter. The nice feature of zero padding is that it will allow us to control the spatial size of the output volumes (most commonly as we'll see soon we will use it to exactly preserve the spatial size of the input volume so the input and output width and height are the same).

We can compute the spatial size of the output volume as a function of the input volume size (\\(W\\)), the receptive field size of the Conv Layer neurons (\\(F\\)), the stride with which they are applied (\\(S\\)), and the amount of zero padding used (\\(P\\)) on the border. You can convince yourself that the correct formula for calculating how many neurons "fit" is given by \\((W - F + 2P)/S + 1\\). For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we would get a 5x5 output. With stride 2 we would get a 3x3 output. Lets also see one more graphical example:

<div class="fig figcenter fighighlight">
  <img src="/assets/cnn/stride.jpeg">
  <div class="figcaption">
    Illustration of spatial arrangement. In this example there is only one spatial dimension (x-axis), one neuron with a receptive field size of F = 3, the input size is W = 5, and there is zero padding of P = 1. <b>Left:</b> The neuron strided across the input in stride of S = 1, giving output of size (5 - 3 + 2)/1+1 = 5. <b>Right:</b> The neuron uses stride of S = 2, giving output of size (5 - 3 + 2)/2+1 = 3. Notice that stride S = 3 could not be used since it wouldn't fit neatly across the volume. In terms of the equation, this can be determined since (5 - 3 + 2) = 4 is not divisible by 3. 
    <br>The neuron weights are in this example [1,0,-1] (shown on very right), and its bias is zero. These weights are shared across all yellow neurons (see parameter sharing below).
  </div>
</div>

*Use of zero-padding*. In the example above on left, note that the input dimension was 5 and the output dimension was equal: also 5. This worked out so because our receptive fields were 3 and we used zero padding of 1. If there was no zero-padding used, then the output volume would have had spatial dimension of only 3, because that is how many neurons would have "fit" across the original input. In general, setting zero padding to be \\(P = (F - 1)/2\\) when the stride is \\(S = 1\\) ensures that the input volume and output volume will have the same size spatially. It is very common to use zero-padding in this way and we will discuss the full reasons when we talk more about ConvNet architectures.

*Constraints on strides*. Note again that the spatial arrangement hyperparameters have mutual constraints. For example, when the input has size \\(W = 10\\), no zero-padding is used \\(P = 0\\), and the filter size is \\(F = 3\\), then it would be impossible to use stride \\(S = 2\\), since \\((W - F + 2P)/S + 1 = (10 - 3 + 0) / 2 + 1 = 4.5\\), i.e. not an integer, indicating that the neurons don't "fit" neatly and symmetrically across the input. Therefore, this setting of the hyperparameters is considered to be invalid, and a ConvNet library could throw an exception or zero pad the rest to make it fit, or crop the input to make it fit, or something. As we will see in the ConvNet architectures section, sizing the ConvNets appropriately so that all the dimensions "work out" can be a real headache, which the use of zero-padding and some design guidelines will significantly alleviate.

*Real-world example*. The [Krizhevsky et al.](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) architecture that won the ImageNet challenge in 2012 accepted images of size [227x227x3]. On the first Convolutional Layer, it used neurons with receptive field size \\(F = 11\\), stride \\(S = 4\\) and no zero padding \\(P = 0\\). Since (227 - 11)/4 + 1 = 55, and since the Conv layer had a depth of \\(K = 96\\), the Conv layer output volume had size [55x55x96]. Each of the 55\*55\*96 neurons in this volume was connected to a region of size [11x11x3] in the input volume. Moreover, all 96 neurons in each depth column are connected to the same [11x11x3] region of the input, but of course with different weights. As a fun aside, if you read the actual paper it claims that the input images were 224x224, which is surely incorrect because (224 - 11)/4 + 1 is quite clearly not an integer. This has confused many people in the history of ConvNets and little is known about what happened. My own best guess is that Alex used zero-padding of 3 extra pixels that he does not mention in the paper.

**Parameter Sharing.** Parameter sharing scheme is used in Convolutional Layers to control the number of parameters. Using the real-world example above, we see that there are 55\*55\*96 = 290,400 neurons in the first Conv Layer, and each has 11\*11\*3 = 363 weights and 1 bias. Together, this adds up to 290400 * 364 = 105,705,600 parameters on the first layer of the ConvNet alone. Clearly, this number is very high.

It turns out that we can dramatically reduce the number of parameters by making one reasonable assumption: That if one feature is useful to compute at some spatial position (x,y), then it should also be useful to compute at a different position (x2,y2). In other words, denoting a single 2-dimensional slice of depth as a **depth slice** (e.g. a volume of size [55x55x96] has 96 depth slices, each of size [55x55]), we are going to constrain the neurons in each depth slice to use the same weights and bias. With this parameter sharing scheme, the first Conv Layer in our example would now have only 96 unique set of weights (one for each depth slice), for a total of 96\*11\*11\*3 = 34,848 unique weights, or 34,944 parameters (+96 biases). Alternatively, all 55\*55 neurons in each depth slice will now be using the same parameters. In practice during backpropagation, every neuron in the volume will compute the gradient for its weights, but these gradients will be added up across each depth slice and only update a single set of weights per slice.

Notice that if all neurons in a single depth slice are using the same weight vector, then the forward pass of the CONV layer can in each depth slice be computed as a **convolution** of the neuron's weights with the input volume (Hence the name: Convolutional Layer). This is why it is common to refer to the sets of weights as a **filter** (or a **kernel**), that is convolved with the input.

<div class="fig figcenter fighighlight">
  <img src="/assets/cnn/weights.jpeg">
  <div class="figcaption">
    Example filters learned by Krizhevsky et al. Each of the 96 filters shown here is of size [11x11x3], and each one is shared by the 55*55 neurons in one depth slice. Notice that the parameter sharing assumption is relatively reasonable: If detecting a horizontal edge is important at some location in the image, it should intuitively be useful at some other location as well due to the translationally-invariant structure of images. There is therefore no need to relearn to detect a horizontal edge at every one of the 55*55 distinct locations in the Conv layer output volume.
  </div>
</div>

Note that sometimes the parameter sharing assumption may not make sense. This is especially the case when the input images to a ConvNet have some specific centered structure, where we should expect, for example, that completely different features should be learned on one side of the image than another. One practical example is when the input are faces that have been centered in the image. You might expect that different eye-specific or hair-specific features could (and should) be learned in different spatial locations. In that case it is common to relax the parameter sharing scheme, and instead simply call the layer a **Locally-Connected Layer**.

**Numpy examples.** To make the discussion above more concrete, lets express the same ideas but in code and with a specific example. Suppose that the input volume is a numpy array `X`. Then:

- A *depth column* (or a *fibre*) at position `(x,y)` would be the activations `X[x,y,:]`.
- A *depth slice*, or equivalently an *activation map* at depth `d` would be the activations `X[:,:,d]`. 

*Conv Layer Example*. Suppose that the input volume `X` has shape `X.shape: (11,11,4)`. Suppose further that we use no zero padding (\\(P = 0\\)), that the filter size is \\(F = 5\\), and that the stride is \\(S = 2\\). The output volume would therefore have spatial size (11-5)/2+1 = 4, giving a volume with width and height of 4. The activation map in the output volume (call it `V`), would then look as follows (only some of the elements are computed in this example):

- `V[0,0,0] = np.sum(X[:5,:5,:] * W0) + b0`
- `V[1,0,0] = np.sum(X[2:7,:5,:] * W0) + b0`
- `V[2,0,0] = np.sum(X[4:9,:5,:] * W0) + b0`
- `V[3,0,0] = np.sum(X[6:11,:5,:] * W0) + b0`

Remember that in numpy, the operation `*` above denotes elementwise multiplication between the arrays. Notice also that the weight vector `W0` is the weight vector of that neuron and `b0` is the bias. Here, `W0` is assumed to be of shape `W0.shape: (5,5,4)`, since the filter size is 5 and the depth of the input volume is 4. Notice that at each point, we are computing the dot product as seen before in ordinary neural networks. Also, we see that we are using the same weight and bias (due to parameter sharing), and where the dimensions along the width are increasing in steps of 2 (i.e. the stride). To construct a second activation map in the output volume, we would have:

- `V[0,0,1] = np.sum(X[:5,:5,:] * W1) + b1`
- `V[1,0,1] = np.sum(X[2:7,:5,:] * W1) + b1`
- `V[2,0,1] = np.sum(X[4:9,:5,:] * W1) + b1`
- `V[3,0,1] = np.sum(X[6:11,:5,:] * W1) + b1`
- `V[0,1,1] = np.sum(X[:5,2:7,:] * W1) + b1` (example of going along y)
- `V[2,3,1] = np.sum(X[4:9,6:11,:] * W1) + b1` (or along both)

where we see that we are indexing into the second depth dimension in `V` (at index 1) because we are computing the second activation map, and that a different set of parameters (`W1`) is now used. In the example above, we are for brevity leaving out some of the other operations the Conv Layer would perform to fill the other parts of the output array `V`. Additionally, recall that these activation maps are often followed elementwise through an activation function such as ReLU, but this is not shown here.

**Summary**. To summarize, the Conv Layer:

- Accepts a volume of size \\(W_1 \times H_1 \times D_1\\)
- Requires four hyperparameters: 
  - Number of filters \\(K\\), 
  - their spatial extent \\(F\\), 
  - the stride \\(S\\), 
  - the amount of zero padding \\(P\\).
- Produces a volume of size \\(W_2 \times H_2 \times D_2\\) where:
  - \\(W_2 = (W_1 - F + 2P)/S + 1\\)
  - \\(H_2 = (H_1 - F + 2P)/S + 1\\) (i.e. width and height are computed equally by symmetry)
  - \\(D_2 = K\\)
- With parameter sharing, it introduces \\(F \cdot F \cdot D_1\\) weights per filter, for a total of \\((F \cdot F \cdot D_1) \cdot K\\) weights and \\(K\\) biases.
- In the output volume, the \\(d\\)-th depth slice (of size \\(W_2 \times H_2\\)) is the result of performing a valid convolution of the \\(d\\)-th filter over the input volume with a stride of \\(S\\), and then offset by \\(d\\)-th bias.

A common setting of the hyperparameters is \\(F = 3, S = 1, P = 1\\). However, there are common conventions and rules of thumb that motivate these hyperparameters. See the [ConvNet architectures](#architectures) section below.

**Convolution Demo**. Below is a running demo of a CONV layer. Since 3D volumes are hard to visualize, all the volumes (the input volume (in blue), the weight volumes (in red), the output volume (in green)) are visualized with each depth slice stacked in rows. The input volume is of size \\(W_1 = 5, H_1 = 5, D_1 = 3\\), and the CONV layer parameters are \\(K = 2, F = 3, S = 2, P = 1\\). That is, we have two filters of size \\(3 \times 3\\), and they are applied with a stride of 2. Therefore, the output volume size has spatial size (5 - 3 + 2)/2 + 1 = 3. Moreover, notice that a padding of \\(P = 1\\) is applied to the input volume, making the outer border of the input volume zero. The visualization below iterates over the output activations (green), and shows that each element is computed by elementwise multiplying the highlighted input (blue) with the filter (red), summing it up, and then offsetting the result by the bias.

<div class="fig figcenter fighighlight">
  <iframe src="/assets/conv-demo/index.html" width="100%" height="700px;" style="border:none;"></iframe>
  <div class="figcaption"></div>
</div>

**Implementation as Matrix Multiplication**. Note that the convolution operation essentially performs dot products between the filters and local regions of the input. A common implementation pattern of the CONV layer is to take advantage of this fact and formulate the forward pass of a convolutional layer as one big matrix multiply as follows:

1. The local regions in the input image are stretched out into columns in an operation commonly called **im2col**. For example, if the input is [227x227x3] and it is to be convolved with 11x11x3 filters at stride 4, then we would take [11x11x3] blocks of pixels in the input and stretch each block into a column vector of size 11\*11\*3 = 363. Iterating this process in the input at stride of 4 gives (227-11)/4+1 = 55 locations along both width and height, leading to an output matrix `X_col` of *im2col* of size [363 x 3025], where every column is a stretched out receptive field and there are 55*55 = 3025 of them in total. Note that since the receptive fields overlap, every number in the input volume may be duplicated in multiple distinct columns.
2. The weights of the CONV layer are similarly stretched out into rows. For example, if there are 96 filters of size [11x11x3] this would give a matrix `W_row` of size [96 x 363].
3. The result of a convolution is now equivalent to performing one large matrix multiply `np.dot(W_row, X_col)`, which evaluates the dot product between every filter and every receptive field location. In our example, the output of this operation would be [96 x 3025], giving the output of the dot product of each filter at each location. 
4. The result must finally be reshaped back to its proper output dimension [55x55x96].

This approach has the downside that it can use a lot of memory, since some values in the input volume are replicated multiple times in `X_col`. However, the benefit is that there are many very efficient implementations of Matrix Multiplication that we can take advantage of (for example, in the commonly used [BLAS](http://www.netlib.org/blas/) API). Moreover, the same *im2col* idea can be reused to perform the pooling operation, which we discuss next.

**Backpropagation.** The backward pass for a convolution operation (for both the data and the weights) is also a convolution (but with spatially-flipped filters). This is easy to derive in the 1-dimensional case with a toy example (not expanded on for now).

**1x1 convolution**. As an aside, several papers use 1x1 convolutions, as first investigated by [Network in Network](http://arxiv.org/abs/1312.4400). Some people are at first confused to see 1x1 convolutions especially when they come from signal processing background. Normally signals are 2-dimensional so 1x1 convolutions do not make sense (it's just pointwise scaling). However, in ConvNets this is not the case because one must remember that we operate over 3-dimensional volumes, and that the filters always extend through the full depth of the input volume. For example, if the input is [32x32x3] then doing 1x1 convolutions would effectively be doing 3-dimensional dot products (since the input depth is 3 channels).

**Dilated convolutions.** A recent development (e.g. see [paper by Fisher Yu and Vladlen Koltun](https://arxiv.org/abs/1511.07122)) is to introduce one more hyperparameter to the CONV layer called the *dilation*. So far we've only discussed CONV filters that are contiguous. However, it's possible to have filters that have spaces between each cell, called dilation. As an example, in one dimension a filter `w` of size 3 would compute over input `x` the following: `w[0]*x[0] + w[1]*x[1] + w[2]*x[2]`. This is dilation of 0. For dilation 1 the filter would instead compute `w[0]*x[0] + w[1]*x[2] + w[2]*x[4]`; In other words there is a gap of 1 between the applications. This can be very useful in some settings to use in conjunction with 0-dilated filters because it allows you to merge spatial information across the inputs much more agressively with fewer layers. For example, if you stack two 3x3 CONV layers on top of each other then you can convince yourself that the neurons on the 2nd layer are a function of a 5x5 patch of the input (we would say that the *effective receptive field* of these neurons is 5x5). If we use dilated convolutions then this effective receptive field would grow much quicker.

<a name='pool'></a>

#### Pooling Layer

It is common to periodically insert a Pooling layer in-between successive Conv layers in a ConvNet architecture. Its function is to progressively reduce the spatial size of the representation to reduce the amount of parameters and computation in the network, and hence to also control overfitting. The Pooling Layer operates independently on every depth slice of the input and resizes it spatially, using the MAX operation. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2 downsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations. Every MAX operation would in this case be taking a max over 4 numbers (little 2x2 region in some depth slice). The depth dimension remains unchanged. More generally, the pooling layer:

- Accepts a volume of size \\(W_1 \times H_1 \times D_1\\)
- Requires two hyperparameters: 
  - their spatial extent \\(F\\), 
  - the stride \\(S\\), 
- Produces a volume of size \\(W_2 \times H_2 \times D_2\\) where:
  - \\(W_2 = (W_1 - F)/S + 1\\)
  - \\(H_2 = (H_1 - F)/S + 1\\)
  - \\(D_2 = D_1\\)
- Introduces zero parameters since it computes a fixed function of the input
- For Pooling layers, it is not common to pad the input using zero-padding.

It is worth noting that there are only two commonly seen variations of the max pooling layer found in practice: A pooling layer with \\(F = 3, S = 2\\) (also called overlapping pooling), and more commonly \\(F = 2, S = 2\\). Pooling sizes with larger receptive fields are too destructive.

**General pooling**. In addition to max pooling, the pooling units can also perform other functions, such as *average pooling* or even *L2-norm pooling*. Average pooling was often used historically but has recently fallen out of favor compared to the max pooling operation, which has been shown to work better in practice.

<div class="fig figcenter fighighlight">
  <img src="/assets/cnn/pool.jpeg" width="36%">
  <img src="/assets/cnn/maxpool.jpeg" width="59%" style="border-left: 1px solid black;">
  <div class="figcaption">
    Pooling layer downsamples the volume spatially, independently in each depth slice of the input volume. <b>Left:</b> In this example, the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into output volume of size [112x112x64]. Notice that the volume depth is preserved. <b>Right:</b> The most common downsampling operation is max, giving rise to <b>max pooling</b>, here shown with a stride of 2. That is, each max is taken over 4 numbers (little 2x2 square).
  </div>
</div>

**Backpropagation**. Recall from the backpropagation chapter that the backward pass for a max(x, y) operation has a simple interpretation as only routing the gradient to the input that had the highest value in the forward pass. Hence, during the forward pass of a pooling layer it is common to keep track of the index of the max activation (sometimes also called *the switches*) so that gradient routing is efficient during backpropagation.

**Getting rid of pooling**. Many people dislike the pooling operation and think that we can get away without it. For example, [Striving for Simplicity: The All Convolutional Net](http://arxiv.org/abs/1412.6806) proposes to discard the pooling layer in favor of architecture that only consists of repeated CONV layers. To reduce the size of the representation they suggest using larger stride in CONV layer once in a while. Discarding pooling layers has also been found to be important in training good generative models, such as variational autoencoders (VAEs) or generative adversarial networks (GANs). It seems likely that future architectures will feature very few to no pooling layers.

<a name='norm'></a>

#### Normalization Layer

Many types of normalization layers have been proposed for use in ConvNet architectures, sometimes with the intentions of implementing inhibition schemes observed in the biological brain. However, these layers have since fallen out of favor because in practice their contribution has been shown to be minimal, if any. For various types of normalizations, see the discussion in Alex Krizhevsky's [cuda-convnet library API](http://code.google.com/p/cuda-convnet/wiki/LayerParams#Local_response_normalization_layer_(same_map)).

<a name='fc'></a>

#### Fully-connected layer

Neurons in a fully connected layer have full connections to all activations in the previous layer, as seen in regular Neural Networks. Their activations can hence be computed with a matrix multiplication followed by a bias offset. See the *Neural Network* section of the notes for more information.

<a name='convert'></a>

#### Converting FC layers to CONV layers 

It is worth noting that the only difference between FC and CONV layers is that the neurons in the CONV layer are connected only to a local region in the input, and that many of the neurons in a CONV volume share parameters. However, the neurons in both layers still compute dot products, so their functional form is identical. Therefore, it turns out that it's possible to convert between FC and CONV layers:

- For any CONV layer there is an FC layer that implements the same forward function. The weight matrix would be a large matrix that is mostly zero except for at certain blocks (due to local connectivity) where the weights in many of the blocks are equal (due to parameter sharing).
- Conversely, any FC layer can be converted to a CONV layer. For example, an FC layer with \\(K = 4096\\) that is looking at some input volume of size \\(7 \times 7 \times 512\\) can be equivalently expressed as a CONV layer with \\(F = 7, P = 0, S = 1, K = 4096\\). In other words, we are setting the filter size to be exactly the size of the input volume, and hence the output will simply be \\(1 \times 1 \times 4096\\) since only a single depth column "fits" across the input volume, giving identical result as the initial FC layer.

**FC->CONV conversion**. Of these two conversions, the ability to convert an FC layer to a CONV layer is particularly useful in practice. Consider a ConvNet architecture that takes a 224x224x3 image, and then uses a series of CONV layers and POOL layers to reduce the image to an activations volume of size 7x7x512 (in an *AlexNet* architecture that we'll see later, this is done by use of 5 pooling layers that downsample the input spatially by a factor of two each time, making the final spatial size 224/2/2/2/2/2 = 7). From there, an AlexNet uses two FC layers of size 4096 and finally the last FC layers with 1000 neurons that compute the class scores. We can convert each of these three FC layers to CONV layers as described above:

- Replace the first FC layer that looks at [7x7x512] volume with a CONV layer that uses filter size \\(F = 7\\), giving output volume [1x1x4096].
- Replace the second FC layer with a CONV layer that uses filter size \\(F = 1\\), giving output volume [1x1x4096]
- Replace the last FC layer similarly, with \\(F=1\\), giving final output [1x1x1000]

Each of these conversions could in practice involve manipulating (e.g. reshaping) the weight matrix \\(W\\) in each FC layer into CONV layer filters. It turns out that this conversion allows us to "slide" the original ConvNet very efficiently across many spatial positions in a larger image, in a single forward pass. 

For example, if 224x224 image gives a volume of size [7x7x512] - i.e. a reduction by 32, then forwarding an image of size 384x384 through the converted architecture would give the equivalent volume in size [12x12x512], since 384/32 = 12. Following through with the next 3 CONV layers that we just converted from FC layers would now give the final volume of size [6x6x1000], since (12 - 7)/1 + 1 = 6. Note that instead of a single vector of class scores of size [1x1x1000], we're now getting an entire 6x6 array of class scores across the 384x384 image.

> Evaluating the original ConvNet (with FC layers) independently across 224x224 crops of the 384x384 image in strides of 32 pixels gives an identical result to forwarding the converted ConvNet one time.

Naturally, forwarding the converted ConvNet a single time is much more efficient than iterating the original ConvNet over all those 36 locations, since the 36 evaluations share computation. This trick is often used in practice to get better performance, where for example, it is common to resize an image to make it bigger, use a converted ConvNet to evaluate the class scores at many spatial positions and then average the class scores.

Lastly, what if we wanted to efficiently apply the original ConvNet over the image but at a stride smaller than 32 pixels? We could achieve this with multiple forward passes. For example, note that if we wanted to use a stride of 16 pixels we could do so by combining the volumes received by forwarding the converted ConvNet twice: First over the original image and second over the image but with the image shifted spatially by 16 pixels along both width and height.

- An IPython Notebook on [Net Surgery](https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb) shows how to perform the conversion in practice, in code (using Caffe)

<a name='architectures'></a>

### ConvNet Architectures

We have seen that Convolutional Networks are commonly made up of only three layer types: CONV, POOL (we assume Max pool unless stated otherwise) and FC (short for fully-connected). We will also explicitly write the RELU activation function as a layer, which applies elementwise non-linearity. In this section we discuss how these are commonly stacked together to form entire ConvNets. 

<a name='layerpat'></a>

#### Layer Patterns
The most common form of a ConvNet architecture stacks a few CONV-RELU layers, follows them with POOL layers, and repeats this pattern until the image has been merged spatially to a small size. At some point, it is common to transition to fully-connected layers. The last fully-connected layer holds the output, such as the class scores. In other words, the most common ConvNet architecture follows the pattern:

`INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC`

where the `*` indicates repetition, and the `POOL?` indicates an optional pooling layer. Moreover, `N >= 0` (and usually `N <= 3`), `M >= 0`, `K >= 0` (and usually `K < 3`). For example, here are some common ConvNet architectures you may see that follow this pattern:

- `INPUT -> FC`, implements a linear classifier. Here `N = M = K = 0`.
- `INPUT -> CONV -> RELU -> FC`
- `INPUT -> [CONV -> RELU -> POOL]*2 -> FC -> RELU -> FC`. Here we see that there is a single CONV layer between every POOL layer.
- `INPUT -> [CONV -> RELU -> CONV -> RELU -> POOL]*3 -> [FC -> RELU]*2 -> FC` Here we see two CONV layers stacked before every POOL layer. This is generally a good idea for larger and deeper networks, because multiple stacked CONV layers can develop more complex features of the input volume before the destructive pooling operation.

*Prefer a stack of small filter CONV to one large receptive field CONV layer*. Suppose that you stack three 3x3 CONV layers on top of each other (with non-linearities in between, of course). In this arrangement, each neuron on the first CONV layer has a 3x3 view of the input volume. A neuron on the second CONV layer has a 3x3 view of the first CONV layer, and hence by extension a 5x5 view of the input volume. Similarly, a neuron on the third CONV layer has a 3x3 view of the 2nd CONV layer, and hence a 7x7 view of the input volume. Suppose that instead of these three layers of 3x3 CONV, we only wanted to use a single CONV layer with 7x7 receptive fields. These neurons would have a receptive field size of the input volume that is identical in spatial extent (7x7), but with several disadvantages. First, the neurons would be computing a linear function over the input, while the three stacks of CONV layers contain non-linearities that make their features more expressive. Second, if we suppose that all the volumes have \\(C\\) channels, then it can be seen that the single 7x7 CONV layer would contain \\(C \times (7 \times 7 \times C) = 49 C^2\\) parameters, while the three 3x3 CONV layers would only contain \\(3 \times (C \times (3 \times 3 \times C)) = 27 C^2\\) parameters. Intuitively, stacking CONV layers with tiny filters as opposed to having one CONV layer with big filters allows us to express more powerful features of the input, and with fewer parameters. As a practical disadvantage, we might need more memory to hold all the intermediate CONV layer results if we plan to do backpropagation.

**Recent departures.** It should be noted that the conventional paradigm of a linear list of layers has recently been challenged, in Google's Inception architectures and also in current (state of the art) Residual Networks from Microsoft Research Asia. Both of these (see details below in case studies section) feature more intricate and different connectivity structures.

**In practice: use whatever works best on ImageNet**. If you're feeling a bit of a fatigue in thinking about the architectural decisions, you'll be pleased to know that in 90% or more of applications you should not have to worry about these. I like to summarize this point as "*don't be a hero*": Instead of rolling your own architecture for a problem, you should look at whatever architecture currently works best on ImageNet, download a pretrained model and finetune it on your data. You should rarely ever have to train a ConvNet from scratch or design one from scratch. I also made this point at the [Deep Learning school](https://www.youtube.com/watch?v=u6aEYuemt0M).

<a name='layersizepat'></a>

#### Layer Sizing Patterns

Until now we've omitted mentions of common hyperparameters used in each of the layers in a ConvNet. We will first state the common rules of thumb for sizing the architectures and then follow the rules with a discussion of the notation:

The **input layer** (that contains the image) should be divisible by 2 many times. Common numbers include 32 (e.g. CIFAR-10), 64, 96 (e.g. STL-10), or 224 (e.g. common ImageNet ConvNets), 384, and 512.

The **conv layers** should be using small filters (e.g. 3x3 or at most 5x5), using a stride of \\(S = 1\\), and crucially, padding the input volume with zeros in such way that the conv layer does not alter the spatial dimensions of the input. That is, when \\(F = 3\\), then using \\(P = 1\\) will retain the original size of the input. When \\(F = 5\\), \\(P = 2\\). For a general \\(F\\), it can be seen that \\(P = (F - 1) / 2\\) preserves the input size. If you must use bigger filter sizes (such as 7x7 or so), it is only common to see this on the very first conv layer that is looking at the input image.

The **pool layers** are in charge of downsampling the spatial dimensions of the input. The most common setting is to use max-pooling with 2x2 receptive fields (i.e. \\(F = 2\\)), and with a stride of 2 (i.e. \\(S = 2\\)). Note that this discards exactly 75% of the activations in an input volume (due to downsampling by 2 in both width and height). Another slightly less common setting is to use 3x3 receptive fields with a stride of 2, but this makes "fitting" more complicated (e.g., a 32x32x3 layer would require zero padding to be used with a max-pooling layer with 3x3 receptive field and stride 2). It is very uncommon to see receptive field sizes for max pooling that are larger than 3 because the pooling is then too lossy and aggressive. This usually leads to worse performance.

*Reducing sizing headaches.* The scheme presented above is pleasing because all the CONV layers preserve the spatial size of their input, while the POOL layers alone are in charge of down-sampling the volumes spatially. In an alternative scheme where we use strides greater than 1 or don't zero-pad the input in CONV layers, we would have to very carefully keep track of the input volumes throughout the CNN architecture and make sure that all strides and filters "work out", and that the ConvNet architecture is nicely and symmetrically wired.

*Why use stride of 1 in CONV?* Smaller strides work better in practice. Additionally, as already mentioned stride 1 allows us to leave all spatial down-sampling to the POOL layers, with the CONV layers only transforming the input volume depth-wise.

*Why use padding?* In addition to the aforementioned benefit of keeping the spatial sizes constant after CONV, doing this actually improves performance. If the CONV layers were to not zero-pad the inputs and only perform valid convolutions, then the size of the volumes would reduce by a small amount after each CONV, and the information at the borders would be "washed away" too quickly.

*Compromising based on memory constraints.* In some cases (especially early in the ConvNet architectures), the amount of memory can build up very quickly with the rules of thumb presented above. For example, filtering a 224x224x3 image with three 3x3 CONV layers with 64 filters each and padding 1 would create three activation volumes of size [224x224x64]. This amounts to a total of about 10 million activations, or 72MB of memory (per image, for both activations and gradients). Since GPUs are often bottlenecked by memory, it may be necessary to compromise. In practice, people prefer to make the compromise at only the first CONV layer of the network. For example, one compromise might be to use a first CONV layer with filter sizes of 7x7 and stride of 2 (as seen in a ZF net). As another example, an AlexNet uses filter sizes of 11x11 and stride of 4.

<a name='case'></a>

#### Case studies

There are several architectures in the field of Convolutional Networks that have a name. The most common are:

- **LeNet**. The first successful applications of Convolutional Networks were developed by Yann LeCun in 1990's. Of these, the best known is the [LeNet](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf) architecture that was used to read zip codes, digits, etc.
- **AlexNet**. The first work that popularized Convolutional Networks in Computer Vision was the [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks), developed by Alex Krizhevsky, Ilya Sutskever and Geoff Hinton. The AlexNet was submitted to the [ImageNet ILSVRC challenge](http://www.image-net.org/challenges/LSVRC/2014/) in 2012 and significantly outperformed the second runner-up (top 5 error of 16% compared to runner-up with 26% error). The Network had a very similar architecture to LeNet, but was deeper, bigger, and featured Convolutional Layers stacked on top of each other (previously it was common to only have a single CONV layer always immediately followed by a POOL layer).
- **ZF Net**. The ILSVRC 2013 winner was a Convolutional Network from Matthew Zeiler and Rob Fergus. It became known as the [ZFNet](http://arxiv.org/abs/1311.2901) (short for Zeiler & Fergus Net). It was an improvement on AlexNet by tweaking the architecture hyperparameters, in particular by expanding the size of the middle convolutional layers and making the stride and filter size on the first layer smaller.
- **GoogLeNet**. The ILSVRC 2014 winner was a Convolutional Network from [Szegedy et al.](http://arxiv.org/abs/1409.4842) from Google. Its main contribution was the development of an *Inception Module* that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, this paper uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large amount of parameters that do not seem to matter much. There are also several followup versions to the GoogLeNet, most recently [Inception-v4](http://arxiv.org/abs/1602.07261).
- **VGGNet**. The runner-up in ILSVRC 2014 was the network from Karen Simonyan and Andrew Zisserman that became known as the [VGGNet](http://www.robots.ox.ac.uk/~vgg/research/very_deep/). Its main contribution was in showing that the depth of the network is a critical component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from the beginning to the end. Their [pretrained model](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) is available for plug and play use in Caffe. A downside of the VGGNet is that it is more expensive to evaluate and uses a lot more memory and parameters (140M). Most of these parameters are in the first fully connected layer, and it was since found that these FC layers can be removed with no performance downgrade, significantly reducing the number of necessary parameters.
- **ResNet**. [Residual Network](http://arxiv.org/abs/1512.03385) developed by Kaiming He et al. was the winner of ILSVRC 2015. It features special *skip connections* and a heavy use of [batch normalization](http://arxiv.org/abs/1502.03167). The architecture is also missing fully connected layers at the end of the network. The reader is also referred to Kaiming's presentation ([video](https://www.youtube.com/watch?v=1PGLj-uKT1w), [slides](http://research.microsoft.com/en-us/um/people/kahe/ilsvrc15/ilsvrc2015_deep_residual_learning_kaiminghe.pdf)), and some [recent experiments](https://github.com/gcr/torch-residual-networks) that reproduce these networks in Torch. ResNets are currently by far state of the art Convolutional Neural Network models and are the default choice for using ConvNets in practice (as of May 10, 2016). In particular, also see more recent developments that tweak the original architecture from [Kaiming He et al. Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027) (published March 2016).

**VGGNet in detail**.
Lets break down the [VGGNet](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) in more detail as a case study. The whole VGGNet is composed of CONV layers that perform 3x3 convolutions with stride 1 and pad 1, and of POOL layers that perform 2x2 max pooling with stride 2 (and no padding). We can write out the size of the representation at each step of the processing and keep track of both the representation size and the total number of weights:

```
INPUT: [224x224x3]        memory:  224*224*3=150K   weights: 0
CONV3-64: [224x224x64]  memory:  224*224*64=3.2M   weights: (3*3*3)*64 = 1,728
CONV3-64: [224x224x64]  memory:  224*224*64=3.2M   weights: (3*3*64)*64 = 36,864
POOL2: [112x112x64]  memory:  112*112*64=800K   weights: 0
CONV3-128: [112x112x128]  memory:  112*112*128=1.6M   weights: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128]  memory:  112*112*128=1.6M   weights: (3*3*128)*128 = 147,456
POOL2: [56x56x128]  memory:  56*56*128=400K   weights: 0
CONV3-256: [56x56x256]  memory:  56*56*256=800K   weights: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256]  memory:  56*56*256=800K   weights: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256]  memory:  56*56*256=800K   weights: (3*3*256)*256 = 589,824
POOL2: [28x28x256]  memory:  28*28*256=200K   weights: 0
CONV3-512: [28x28x512]  memory:  28*28*512=400K   weights: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512]  memory:  28*28*512=400K   weights: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512]  memory:  28*28*512=400K   weights: (3*3*512)*512 = 2,359,296
POOL2: [14x14x512]  memory:  14*14*512=100K   weights: 0
CONV3-512: [14x14x512]  memory:  14*14*512=100K   weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]  memory:  14*14*512=100K   weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]  memory:  14*14*512=100K   weights: (3*3*512)*512 = 2,359,296
POOL2: [7x7x512]  memory:  7*7*512=25K  weights: 0
FC: [1x1x4096]  memory:  4096  weights: 7*7*512*4096 = 102,760,448
FC: [1x1x4096]  memory:  4096  weights: 4096*4096 = 16,777,216
FC: [1x1x1000]  memory:  1000 weights: 4096*1000 = 4,096,000

TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters
```

As is common with Convolutional Networks, notice that most of the memory (and also compute time) is used in the early CONV layers, and that most of the parameters are in the last FC layers. In this particular case, the first FC layer contains 100M weights, out of a total of 140M.


<a name='comp'></a>

#### Computational Considerations

The largest bottleneck to be aware of when constructing ConvNet architectures is the memory bottleneck. Many modern GPUs have a limit of 3/4/6GB memory, with the best GPUs having about 12GB of memory. There are three major sources of memory to keep track of:

- From the intermediate volume sizes: These are the raw number of **activations** at every layer of the ConvNet, and also their gradients (of equal size). Usually, most of the activations are on the earlier layers of a ConvNet (i.e. first Conv Layers). These are kept around because they are needed for backpropagation, but a clever implementation that runs a ConvNet only at test time could in principle reduce this by a huge amount, by only storing the current activations at any layer and discarding the previous activations on layers below.
- From the parameter sizes: These are the numbers that hold the network **parameters**, their gradients during backpropagation, and commonly also a step cache if the optimization is using momentum, Adagrad, or RMSProp. Therefore, the memory to store the parameter vector alone must usually be multiplied by a factor of at least 3 or so.
- Every ConvNet implementation has to maintain **miscellaneous** memory, such as the image data batches, perhaps their augmented versions, etc.

Once you have a rough estimate of the total number of values (for activations, gradients, and misc), the number should be converted to size in GB. Take the number of values, multiply by 4 to get the raw number of bytes (since every floating point is 4 bytes, or maybe by 8 for double precision), and then divide by 1024 multiple times to get the amount of memory in KB, MB, and finally GB. If your network doesn't fit, a common heuristic to "make it fit" is to decrease the batch size, since most of the memory is usually consumed by the activations.


<a name='add'></a>

### Additional Resources

Additional resources related to implementation:

- [Soumith benchmarks for CONV performance](https://github.com/soumith/convnet-benchmarks)
- [ConvNetJS CIFAR-10 demo](http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html) allows you to play with ConvNet architectures and see the results and computations in real time, in the browser.
- [Caffe](http://caffe.berkeleyvision.org/), one of the popular ConvNet libraries.
- [State of the art ResNets in Torch7](http://torch.ch/blog/2016/02/04/resnets.html)


================================================
FILE: css/main.css
================================================
/* Base */
/* ----------------------------------------------------------*/

* {
  margin: 0;
  padding: 0;
}

html, body { height: 100%; }

body {
  /* font-family: Helvetica, Arial, sans-serif; */
  font-family: 'Roboto', sans-serif;
  font-size: 16px;
  line-height: 1.5;
  font-weight: 300;
  background-color: #fdfdfd;
}

h1, h2, h3, h4, h5, h6 { font-size: 100%; font-weight: 400; }

a         { color: #2a7ae2; text-decoration: none; }
a:hover   { color: #000; text-decoration: underline; }
a:visited { color: #205caa; }

/* Utility */

.wrap:before,
.wrap:after { content:""; display:table; }
.wrap:after { clear: both; }
.wrap {
  max-width: 800px;
  padding: 0 30px;
  margin: 0 auto;
  zoom: 1;
}


/* Layout Styles */
/* ----------------------------------------------------------*/

/* Custom CSS rules for index page */

.module-header {
  font-size: 24px;
  color: #8C1515;
  margin-top: 20px;
  margin-bottom: 5px;
}

.materials-wrap {
  font-size: 18px;
}
.materials-item a{
  color: #333;
  display: block;
  padding: 3px;
}
.materials-item {
  border-bottom: 1px solid #CCC;
}
.materials-item:nth-child(even) {
  background-color: #f7f6f1;
}

.colab-badge {
  border: 0 !important;
}

/* Custom CSS rules for content */

.embedded-video {
  text-align: center;
  background-color: #f7f6f1;
  padding: 20px 0px 20px 0px;
  border-bottom: 1px solid #999;
  border-top: 1px solid #999;
}

/* Custom CSS for pages */
.figcenter {
  text-align: center;
}
.fig img {
  max-width: 98%;
}
.figleft img {
  max-width: 50%;
  float: left;
  margin-right: 20px;
}
.figleft svg {
  float: left;
  margin-right: 20px;
}
.fighighlight {
  /* background-color: #f7f6f1; */
  padding: 20px 4px 20px 4px;
  border-bottom: 1px solid #999;
  border-top: 1px solid #999;
  /* box-shadow: 0px 2px 5px #CCC, 0px -2px 5px #CCC; */
}
.figcaption {
  font-weight: 400;
  font-size: 14px;
  color: #575651;
  text-align: justify;
}

.post-content p {
  text-align: justify;
}

.kw {
  font-size: 16px;
  color: #08A;
  margin-left: 10px;
}

.notyet {
  background-color: #FEE !important;
}

/* Site header */

.site-header {
  position: relative;
  border-bottom: 1px solid #e8e8e8;
  background-color: #8C1515;
  padding: 15px;
  text-align: center;
}

.site-title,
.site-title:hover,
.site-title:visited {
  display: block;
  padding: 10px;
  font-size: 26px;
  line-height: 1.2em;
  letter-spacing: -1px;
  color: #FFF;
  font-weight: 100;
}

.site-link:link,
.site-link:hover,
.site-link:visited {
  margin-bottom: 10px;
  display: inline-block;
  text-align: center;
  font-size: 18px;
  line-height: 2em;
  height: 2em;
  padding: 0 10px;
  color: #fff;
  border: 2px solid #fff;
  font-weight: 50;
}

/* Site footer */

.site-footer {
  border-top: 1px solid #e8e8e8;
  padding: 30px 0;
}

.site-footer .column { float: left; margin-bottom: 15px; }

.footer-col-1 {
  width: 270px; /*fallback*/
  width: -webkit-calc(35% - 10px);
  width: -moz-calc(35% - 10px);
  width: -o-calc(35% - 10px);
  width: calc(35% - 10px);
  margin-right: 10px
}
.footer-col-2 {
  width: 175px; /*fallback*/
  width: -webkit-calc(23.125% - 10px);
  width: -moz-calc(23.125% - 10px);
  width: -o-calc(23.125% - 10px);
  width: calc(23.125% - 10px);
  margin-right: 10px
}
.footer-col-3 {
  width: 335px; /*fallback*/
  width: -webkit-calc(41.875%);
  width: -moz-calc(41.875%);
  width: -o-calc(41.875%);
  width: calc(41.875%);
}

.site-footer ul { list-style: none; }

.site-footer li,
.site-footer p {
  font-size: 15px;
  letter-spacing: -.3px;
  color: #828282;
}

.github-icon-svg,
.twitter-icon-svg {
  display: inline-block;
  width: 16px;
  height: 16px;
  position: relative;
  top: 3px;
}


/* Page Content styles */
/* ----------------------------------------------------------*/

.page-content {
  padding: 30px 0;
  background-color: #fff;
}


/* Home styles */
/* ----------------------------------------------------------*/

.home h1 { margin-bottom: 25px; }

.posts { list-style-type: none; }

.posts li { margin-bottom: 30px; }

.posts .post-link {
  font-size: 24px;
  letter-spacing: -1px;
  line-height: 1;
}

.posts .post-date {
  display: block;
  font-size: 15px;
  color: #818181;
}


/* Post styles */
/* ----------------------------------------------------------*/

.post-header { margin: 10px 0 30px; }

.post-header h1 {
  font-size: 30px;
  letter-spacing: -1.75px;
  line-height: 1;
  font-weight: 300;
}

.post-header .meta {
  font-size: 15px;
  color: #818181;
  margin-top: 5px;
}

.post-content { margin: 0 0 30px; }

.post-content > * { margin: 20px 0; }


.post-content h1,
.post-content h2,
.post-content h3,
.post-content h4,
.post-content h5,
.post-content h6 {
  line-height: 1;
  font-weight: 300;
  margin: 40px 0 20px;
}

.post-content h2 {
  font-size: 32px;
  letter-spacing: -1.25px;
}

.post-content h3 {
  font-size: 26px;
  letter-spacing: -1px;
}

.post-content h4 {
  font-size: 20px;
  letter-spacing: -1px;
}

.post-content blockquote {
  border-left: 4px solid #e8e8e8;
  padding-left: 20px;
  font-size: 18px;
  opacity: .6;
  letter-spacing: -1px;
  font-style: italic;
  margin: 30px 0;
}

.post-content ul,
.post-content ol { padding-left: 20px; }

.post pre,
.post code {
  border: 1px solid #d5d5e9;
  background-color: #eef;
  padding: 8px 12px;
  -webkit-border-radius: 3px;
  -moz-border-radius: 3px;
  border-radius: 3px;
  font-size: 15px;
  overflow:scroll;
}

.post code { padding: 1px 5px; }

.post ul,
.post ol { margin-left: 1.35em; }

.post pre code {
  border: 0;
  padding-right: 0;
  padding-left: 0;
}

/* terminal */
.post pre.terminal {
  border: 1px solid #000;
  background-color: #333;
  color: #FFF;
  -webkit-border-radius: 3px;
  -moz-border-radius: 3px;
  border-radius: 3px;
}

.post pre.terminal code { background-color: #333; }

/* Syntax highlighting styles */
/* ----------------------------------------------------------*/

.highlight  { background: #ffffff; }
.highlight .c { color: #999988; font-style: italic } /* Comment */
.highlight .err { color: #a61717; background-color: #e3d2d2 } /* Error */
.highlight .k { font-weight: bold } /* Keyword */
.highlight .o { font-weight: bold } /* Operator */
.highlight .cm { color: #999988; font-style: italic } /* Comment.Multiline */
.highlight .cp { color: #999999; font-weight: bold } /* Comment.Preproc */
.highlight .c1 { color: #999988; font-style: italic } /* Comment.Single */
.highlight .cs { color: #999999; font-weight: bold; font-style: italic } /* Comment.Special */
.highlight .gd { color: #000000; background-color: #ffdddd } /* Generic.Deleted */
.highlight .gd .x { color: #000000; background-color: #ffaaaa } /* Generic.Deleted.Specific */
.highlight .ge { font-style: italic } /* Generic.Emph */
.highlight .gr { color: #aa0000 } /* Generic.Error */
.highlight .gh { color: #999999 } /* Generic.Heading */
.highlight .gi { color: #000000; background-color: #ddffdd } /* Generic.Inserted */
.highlight .gi .x { color: #000000; background-color: #aaffaa } /* Generic.Inserted.Specific */
.highlight .go { color: #888888 } /* Generic.Output */
.highlight .gp { color: #555555 } /* Generic.Prompt */
.highlight .gs { font-weight: bold } /* Generic.Strong */
.highlight .gu { color: #aaaaaa } /* Generic.Subheading */
.highlight .gt { color: #aa0000 } /* Generic.Traceback */
.highlight .kc { font-weight: bold } /* Keyword.Constant */
.highlight .kd { font-weight: bold } /* Keyword.Declaration */
.highlight .kp { font-weight: bold } /* Keyword.Pseudo */
.highlight .kr { font-weight: bold } /* Keyword.Reserved */
.highlight .kt { color: #445588; font-weight: bold } /* Keyword.Type */
.highlight .m { color: #009999 } /* Literal.Number */
.highlight .s { color: #d14 } /* Literal.String */
.highlight .na { color: #008080 } /* Name.Attribute */
.highlight .nb { color: #0086B3 } /* Name.Builtin */
.highlight .nc { color: #445588; font-weight: bold } /* Name.Class */
.highlight .no { color: #008080 } /* Name.Constant */
.highlight .ni { color: #800080 } /* Name.Entity */
.highlight .ne { color: #990000; font-weight: bold } /* Name.Exception */
.highlight .nf { color: #990000; font-weight: bold } /* Name.Function */
.highlight .nn { color: #555555 } /* Name.Namespace */
.highlight .nt { color: #000080 } /* Name.Tag */
.highlight .nv { color: #008080 } /* Name.Variable */
.highlight .ow { font-weight: bold } /* Operator.Word */
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
.highlight .mf { color: #009999 } /* Literal.Number.Float */
.highlight .mh { color: #009999 } /* Literal.Number.Hex */
.highlight .mi { color: #009999 } /* Literal.Number.Integer */
.highlight .mo { color: #009999 } /* Literal.Number.Oct */
.highlight .sb { color: #d14 } /* Literal.String.Backtick */
.highlight .sc { color: #d14 } /* Literal.String.Char */
.highlight .sd { color: #d14 } /* Literal.String.Doc */
.highlight .s2 { color: #d14 } /* Literal.String.Double */
.highlight .se { color: #d14 } /* Literal.String.Escape */
.highlight .sh { color: #d14 } /* Literal.String.Heredoc */
.highlight .si { color: #d14 } /* Literal.String.Interpol */
.highlight .sx { color: #d14 } /* Literal.String.Other */
.highlight .sr { color: #009926 } /* Literal.String.Regex */
.highlight .s1 { color: #d14 } /* Literal.String.Single */
.highlight .ss { color: #990073 } /* Literal.String.Symbol */
.highlight .bp { color: #999999 } /* Name.Builtin.Pseudo */
.highlight .vc { color: #008080 } /* Name.Variable.Class */
.highlight .vg { color: #008080 } /* Name.Variable.Global */
.highlight .vi { color: #008080 } /* Name.Variable.Instance */
.highlight .il { color: #009999 } /* Literal.Number.Integer.Long */

/* media queries */
/* ----------------------------------------------------------*/

@media (min-width: 1080px) {
  .site-link:link,
  .site-link:visited {
    position: absolute;
    right: 20px;
    top: 50%;
    transform: translateY(-50%);
  }
}

@media screen and (max-width: 750px) {

  .footer-col-1 { width: 50%; }

  .footer-col-2 {
    width: 45%; /*fallback*/
    width: -webkit-calc(50% - 10px);
    width: -moz-calc(50% - 10px);
    width: -o-calc(50% - 10px);
    width: calc(50% - 10px);
    margin-right: 0;
  }

  .site-footer .column.footer-col-3 {
    width: auto;
    float: none;
    clear: both;
  }

}

@media screen and (max-width: 600px) {

  .wrap { padding: 0 12px; }

  .post-header h1 { font-size: 36px; }
  .post-content h2 { font-size: 28px; }
  .post-content h3 { font-size: 22px; }
  .post-content h4 { font-size: 18px; }
  .post-content blockquote { padding-left: 10px; }
  .post-content ul,
  .post-content ol { padding-left: 10px; }

  .site-footer .column {
    float: none;
    clear: both;
    width: auto;
    margin: 0 0 15px; }

}


================================================
FILE: generative-modeling.md
================================================
---
title: 'Generative Modeling'
layout: page
permalink: /generative-modeling/
---


Table of Contents
- [Motivation and Overview](#Motivation-and-Overview)
- [Pixel RNN/CNN](#Pixel-RNN/CNN)
    - [Explicit density model](#Explicit-density-model)
    - [Pixel RNN](#Pixel-RNN)
    - [Pixel CNN](#Pixel-CNN)
- [Variational Autoencoder](#Variational-Autoencoder)
    - [Overview of Variational Autoencoder](#Overview-of-Variational-Autoencoder)
    - [Autoencoders v.s. Variational Autoencoders](#Autoencoders-v.s.-Variational-Autoencoders)
    - [VAE Mathematical Explanation](#VAE-Mathematical-Explanation)
    - [VAE training process](#VAE-training-process)
- [Generative Adversarial Networks](#Generative-Adversarial-Networks)
    - [Overview of Generative Adversarial Networks](#Overview-of-Generative-Adversarial-Networks) 
    - [Generative Adversarial Nets - 2014 original version](#Generative-Adversarial-Nets---2014-original-version)
        - [Discriminator network](#Discriminator-network)
        - [Generator network](#Generator-network)
    - [GAN Mathematical explanation](#GAN-Mathematical-explanation)
    - [Evaluation](#Evaluation)
        - [Inception Scores](#Inception-Scores)
        - [Nearest Neighbours](#Nearest-Neighbours)
        - [HYPE - Human eye perceptual Evaluation](#HYPE---Human-eye-perceptual-Evaluation)
    - [Challenges](#Challenges)
        - [Optimization](#Optimization)
        - [Mode Collapse](#Mode-collapse)
    - [Case Studies](#Case-Studies)
        - [DCGAN](#DCGAN)
        - [CycleGAN](#CycleGAN)
        - [StyleGAN](#StyleGAN)
    - [Summary](#Summary)

<a name='intro'></a>

## Motivation and Overview
In the first half of the quarter, we studied several supervised learning methods, which learn functions to map input images to labels. However, labeling the training data may be expensive because it requires much time and effort. Thus, we are introducing unsupervised learning methods. In unsupervised learning methods, training data is relatively cheaper because the methods don't need labeling from the huge dataset. The goal is to learn the underlying hidden structures or feature representations from raw data directly.

This table compares supervised and unsupervised learning: 
|          | Supervised Learning | Unsupervised Learning |
| -------- | -------- | -------- |
| Data         | has label y         |    no labels      |
| Goal         |  input data -> output label        |    Learn some underlying hidden strcture of the data      |
| Examples     | Classification, regression, object detection, semantic segmentation, image captioning, etc     |   Clustering, dimensionality reduction, feature learning, density estimation, etc.   |


**Generative modeling** is in the class of unsupervised learning. The goal of generative modeling is to generate new samples from the same distribution. In the application of image generation, we want to make sure that the quality of generated images aligns with the raw data image distributions. Thus, during the training process, there are two objectives:
1. Learn $p_{model} (x)$ that **approximates** $p_{data}(x)$ 
2. Sampling new data $x$ from $p_{model}(x)$ 


Within the first objective (how $p_{model} (x)$ approximates $p_{data}(x)$), we can categorize generative model into two types: 
1. Explicit density estimation
2. Implicit density estimation

![](https://i.imgur.com/L6l6qLn.png)


In this document, we will talk about 3 most popular types: 
1. Pixel RNN/CNN - Explicit density estimation
2. Variational Autoencoder - Approximate density
3. Generative Adversarial Networks - Implicit density

## Pixel RNN/CNN


### Explicit density model
**Pixel RNN/CNN** is an explicit density estimation method, which means that we explicitly define and solve for $p_{model}(x)$. For example, given an input image $x$, we can calculate the joint likelihood of each pixel in the image, in order to predict the of the pixel values of the image $x$. Being able to explicitly calculate the joint likelihood is why we called it explicitly defined method. 

However, estimating the joint likelihood of pixels directly can be difficult. A trick borrowed from probability is to rewrite the joint likelihood as a product of the conditional likelihoods on the previous pixels. This uses the chain rule to decompose the likelihood of an image $x$ into products of 1-dimensional densities. Our objective is then to maximize the likelihood of training data.


$$
\begin{aligned}
p(x) = p(x_1, x_2, ..., x_n)
     &= \prod_{i=1}^n p(x_i | x_1, ..., x_{i - 1}).
\end{aligned}
$$


### Pixel RNN
You may notice that the distribution of $p(x)$ is very complex because each pixel is conditioned on hundreds of thousands of pixels, making the computation very expensive. How can we resolve this problem? 

Recall RNN that we learn in the previous lecture. RNN has an “internal state” that is updated as a sequence is processed and allows previous outputs to be used as inputs. We can treat the conditional distribution of each pixel as a sequence of data, and apply RNN to model the joint likelihood function. More specifically, we can *model the dependency of one pixel on all the previous pixels by keeping the hidden state of all the previous inputs*. We use the hidden state to express the dependency of generating a new pixel from the previous pixel. In the beginning, we have the default hidden state,  first pixel $x_1$, and the image we want to generate. We will use the first pixel to generate the second pixel and repeat. Then it becomes a sequential generating process: feeding the previously predicted pixel back to the network to generate the next pixel.


The process described in the Pixel RNN paper is as follows: Starting from the corner, each pixel is conditional on the pixel from the left and the pixel above. Repeat the sequential generating process until the whole image is generated.

![](https://i.imgur.com/6ajn4Pe.gif)
<figcaption> Pixel RNN Sequential Generating Process </figcaption>


### Pixel CNN
One drawback of Pixel RNN is that the sequential generation is slow and we need to process the pixels one at a time. Is there a way to process more pixels at a time? In the same paper, the authors proposed another method, **Pixel CNN**, which allows parallelizaton among pixels. Specifically, Pixel CNN uses a *masked convolution over context region*. Different from regular square receptive field in the convolutional layer, the receptive field of masked convolution need not be a square. 

You may wonder: are we able to generate the whole image with masked convolution? In fact, if we stack enough layers of this kind of masked convolution, we can achieve the same effective receptive field as the pixel generation that conditional on all of the previous pixels (pixel RNN).

![](https://i.imgur.com/2eSJZ2Y.png)
<figcaption> Pixel CNN Generating Example </figcaption>


## Variational Autoencoder

### Overview of Variational Autoencoder

The modeling procedure of Pixel RNN is still slow because it's a sequential generation process. What if we make a little bit of trade-off but we can generate all pixels at the same time and model a simpler data distribution? Instead of optimizing the expensive tractable density function directly, we can *derive and optimize the lower bound on likelihood* instead. This is called Approximate density estimation.

We can re-write the probability density function as $p_{\theta}(x) = \int p_{\theta}(z)p_{\theta}(x|z)dz$. We introduce a new latent variable $z$ to *decompose the data likelihood as the marginal distribution of the conditional likelihood w.r.t this latent variable*. The latent variable $z$ represents *the underlying structure of the data distribution*. 

This method is called **Variational Autoencoders** (VAE). There is no dependency among the pixels. All pixels are conditional on this variable z. We can generate all pixels at the same time. The drawback is we need to integrate all possible values of z. In reality, $p_\theta(z)$ is low dimensional and $p_\theta(x|z)$ is often times complex, making it impossible to integrate $p_\theta(x)$ with an analytical solution. Thus, we cannot directly optimize this function as a result. We will discuss how to resolve this issue by approximating the unknown posterior distribution from only the observed data $x$ in the following section. 

### Autoencoders v.s. Variational Autoencoders

**Autoencoder** Before diving into Variational Autoencoders, let's take a look at **Autoencoder**, a model that encodes input by reconstructing the input itself. An Autoencoder contains an encoder and a decoder, with the goal of learning a low-dimensional feature representation from the input (unlabeled) training data. The encoder compresses the input data to low-dimensional feature vector z, while the decoder decomposes $z$ to the same shape as the input data.

The idea of Autoencoder is to *compress input images such that each vector in z contains meaningful factors of variation in data*. For example, if the inputs are different faces, the dimensions in z could be facial expressions, poses, different degrees of smile, etc.


However, we cannot generate new images from an autoencoder because we don't know the distributional space of z. VAE makes Autoencoders generative and allows us to sample from the model to generate data. VAE estimates the latent z representation so that we can generate more realistic images from the sampling. The intuition is z space shall reflect the factors of the variations. Assume that each image x is generated by sampling a new z with a slightly different factor of variations. Overall, z is used to conditionally generate the x.


### VAE Mathematical Explanation

We need two things to represent the model:
1. choose a proper $p(z)$:
Gaussian distribution is a reasonable choice for latent attributes. We can interpret every expression as a variation of the average neutral expression.
2. conditional distribution $p(x|z)$ is represented with the neural network:
We want to be able to generate a high-dimensional image from the simple low-dimension Gaussian distribution.

**Intractability** To train the model, we can learn model parameters to maximize likelihood of training data: $p_{\theta}(x) = \int p_{\theta}(z)p_{\theta}(x|z)dz$. However, this likelihood expression is intractable to evaluate or to optimize because we cannot compute $p(x|z)$ for every z in the intergral. We may also try to estimate posterior density $p_{\theta}(z|x) = p_{\theta}(x|z)p_{\theta}(z)/p_{\theta}(x)$ but it's also intractable due to the $p_{\theta}(x)$ term. Alternatively, in the paper, the authors proposed that we can approximate the true posterior $p_{\theta}(z|x)$ with $q_{\phi}(z|x)$, which is a lower bound on the data likelihood and can be optimized. 

The goal is to maximize the log-likelihood of $p_{\theta} (x^{(i)})$. Since $p_{\theta} (x^{(i)})$ does not depend on z, we can re-write as the expectation of z w.r.t $q_{\phi}(z|x^{(i)})$ and further derive (from lecture 12 slide):
$$
\begin{aligned}
\log p_{\theta} (x^{(i)}) &= \mathbb{E}_{z \sim q_{\phi}(z|x^{(i)})} [\log p_{\theta}(x^{(i)})] \\
&= \mathbb{E}_z [\log \frac{p_{\theta}(x^{(i)} | z)p_{\theta}(z)}{p_{\theta}(z | x^{(i)})}] \\
&= \mathbb{E}_{z} [\log \frac{p_{\theta}(x^{(i)} | z)p_{\theta}(z) q_{\phi}(z | x^{(i)}) }{p_{\theta}(z | x^{(i)}) q_{\phi}(z | x^{(i)}) }] \\
&= \mathbb{E}_z [\log p_{\theta} (x^{(i)} | z)] - \mathbb{E}_z [\log \frac{q_{\phi}(z | x^{(i)})}{p_{\theta}(z)}] + \mathbb{E}_z [\log \frac{q_{\phi}(z | x^{(i)})}{p_{\theta}(z | x_{(i)})}] \\
&=  \mathbb{E}_z [\log p_{\theta} (x^{(i)} | z)] - D_{KL}(q_{\phi}(z | x^{(i)}) || p_{\theta}(z)) + D_{KL}(q_{\phi}(z | x^{(i)})|| p_{\theta}(z | x^{(i)}))
\end{aligned}
$$

The estimate of $\mathbb{E}_z [\log p_{\theta} (x^{(i)} | z)]$ can be computed through sampling. $D_{KL}(q_{\phi}(z | x^{(i)}) || p_{\theta}(z))$ has closed-form solution. $D_{KL}(q_{\phi}(z | x^{(i)})|| p_{\theta}(z | x^{(i)}))$ would be always larger than or equal to zero. Thus, we have tractable lower bound. 

### VAE training process
![](https://i.imgur.com/IK8laIh.png)


$q_{\phi}(z | x^{(i)})$ is the encoder network in this process. The goal of $D_{KL}(q_{\phi}(z | x^{(i)}) || p_{\theta}(z))$ is to estimate a posterior distribution close to prior distribution. On the other hand, $p_{\theta} (x^{(i)} | z)$ is the decoder network and reconstruct the input data. We compute them in the forward pass for every minibatch of input data and then perform back-propagation. 


## Generative Adversarial Networks

### Overview of Generative Adversarial Networks
While implicit modeling is proven useful in generating data, it has the drawback of needing to estimate a probability distribution. What if we give up on explicitly modeling density, and just want the ability to sample? For **Generative Adversarial Networks**, we don't model the likelihood function $p(x)$ at all but only care about generating high-quality pictures.


From VAE, we learn that we can map a simple Gaussian distribution to a complex image distribution. We could leverage the same idea by mapping low dimensional noise to high dimensional image distribution. We can think of the decoder network as a generative network. The goal of Generative Adversarial Networks is to directly generate samples from a high-dimensional training distribution.


### Generative Adversarial Nets - 2014 original version
You may be curious: if we don't model z's distribution and don't know which sample z maps to which training image, how can we learn by reconstructing training images?

The general objective is to *generate images that should look "real"*. To achieve that, Generative Adversarial Net trains another network that learns to tell the difference between real and fake images and whether the generated image from a generator network looks like the one coming from the real distribution.


#### Discriminator network
The network that tells whether the image is real or fake is called the **Discriminator network**. We refer to images from the training distribution as real and the generated images from the generator network as fake. The discriminator network is essentially performing a supervised binary-class classification task. The discriminator uses the real/fake information to compute the gradient and backpropagate to the generation network, to make the generative examples more 'real'.

In the beginning, the discriminator can tell whether an image is a real input image or a generated image easily. Over time, as the generator network improves, images become more and more realistic. The discriminator has to change the decision boundary gradually to fit the new distribution better and better.

Python code example: 
```python
logits_real = D(real_data)

random_noise = sample_noise(batch_size, noise_size)
fake_images = G(random_noise)
logits_fake = D(fake_images.view(batch_size, 1, size, size))

d_total_error = discriminator_loss(logits_real, logits_fake)
```


#### Generator network
On the other side, the network that maps low dimensional noise to high dimensional image distribution is called the **Generator**. The goal of the generator is to fool the discriminator by generating real-looking images. In the beginning, the generator will generate random tensors that don't look like real images at all. However, the signal from the discriminator would inform the generator how it should improve the generated image to look more real. Over time, the generator would learn to generate more and more realistic samples.

python code example: 
```python
random_noise = sample_noise(batch_size, noise_size)
fake_images = G(random_noise)

gen_logits_fake = D(fake_images.view(batch_size, 1, size, size))
g_error = generator_loss(gen_logits_fake)
```

#### GAN Mathematical explanation 
Due to the coexistence of two networks, GAN is a two-player/min-max game that balances the optimization between Generator and Discriminator network. 

**Objective function**: 
$$
\begin{aligned}
\min_{\theta_g} \max_{\theta_d} [\mathbb{E}_{x \sim p_{data}} \log D_{\theta_d} (x) + \mathbb{E}_{z \sim p(z)} \log(1 - D_{\theta_d}(G_{\theta_g}(z)))]
\end{aligned}
$$

**Generator Objective**: $\min_{\theta_g}$ find weights that minimize this objective. $\mathbb{E}_{x \sim p_{data}} \log D_{\theta_d} (x)$ is the expectation of score predicted by discriminator, given on the training set. The generator would try to maximize this term because the generator tries to fool the discriminator by generating more realistic images.

**Discriminator Objective**: $\max_{\theta_d}$ find weights that maximize this objective.

During the training, generator transforms noise z to tensor and then the generated image is fed to the discriminator. Thus, $D_{\theta_d}(G_{\theta_g}(z))$ is the generated image score predicted by discriminator. Discriminator tries to tell the difference between real and fake images by moving this term to 0 and maximizing $1 - D_{\theta_d}(G_{\theta_g}(z))$. 


The training process is to alternate between 
1. Gradient ascent on discriminator 
2. Gradient descent on generator

Problem of 2 is the gradient dominated by the region sample is already good. Training is very slow and unstable at the beginning. One solution is to change to use gradient ascent on generator and modify the different objective.

![](https://i.imgur.com/zvAex6j.png) 
<figcaption> Generative Adversaial Nets training flow </figcaption>


### Evaluation

Recall that there are two objectives in Generative Modeling: 
1. Learn $p_{model} (x)$ that approximates $p_{data}(x)$
2. Sampling new data $x$ from $p_{model}(x)$ 

When evaluating the GAN output, we want to make sure the two objectives are taken care of.


#### Inception Scores
Inception score was a popular evaluation metric, which evaluates the quality of generated images. Inception Score uses the Inception V3 pre-trained model on ImageNet to observe the distribution of generated images. If the generated image is easily recognized by the discriminator, the classification score (i.e. $p(y|x)$) would be large, which leads to $p(y|x)$ having low entropy. Meanwhile, we also want the marginal distribution $p(y)$ to have high entropy. The inception score can be calculated as follows:
$$
\begin{aligned}
IS(x) &= \exp(\mathbb{E}_{x \sim p_g}[D_{KL}[p(y|x) || p(y)]]) \\
&= \exp(\mathbb{E}_{x \sim p_g, y \sim p(y | x)} [\log p(y | x) - \log p(y)]) \\
&= \exp(H(y) - H(y | x))
\end{aligned}
$$
where a high inception score indicates better-quality generated images. However, the inception score has some drawbacks and is fooled over the years. Thus, people started using measurements such as FID in recent years. 

#### Nearest Neighbours
A simpler evaluation method is to visualize a sample of generated images to tell how realistic the generated images are. We can also leverage Nearest Neighbours to compare real images and generated images. The idea is to sample some real images from the training set and calculate the distance between the sampled generated images. If the generated images are real-looking, the distances should be small.

#### HYPE - Human eye perceptual Evaluation 
HYPE is a new evaluation method introduced in 2019. It evaluates GAN by a social computing method: the website invites users to evaluate GAN and try to build metrics on top of it. The goal is to ensure the evaluation is consistent while evaluating different types of GANs.


### Challenges

#### Optimization 
It's not easy to train GAN because the process has many challenges. Often times, the generator and discriminator loss keeps oscillating during GAN training. There is also no stopping criterion in practice. Also, when the discriminator is very confidently classifying fake samples, the generator training may fail due to vanishing gradients.  


#### Mode collapse

Mode collapse happens when the generator learns to fool the discriminator by producing a single class from the whole training dataset. Often time the training dataset is multi-modal, which means the probability density distribution over features has multiple peaks. If data is imbalanced or some other problems happen during the training process, the generating image may collapse into one mode or few modes while other modes are disappearing. For example, the discriminator classifies a lot of generated images incorrectly. The generator takes the feedback and only generates images that are the same or similar to the ones that fool the discriminator. Eventually, the generated images collapse into single-mode or fewer modes.

### Case Studies

#### DCGAN
The idea of DCGAN is to use a convolutional neural network in GAN. Here are some architecture guidelines DCGAN gave in their paper:
> 1. Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
> 2. Use batch norm in both the generator and the discriminator.
> 3. Remove fully connected hidden layers for deeper architectures
> 4. Use ReLU activation in generator for all layers except for the output, which uses Tanh.
> 5. Use LeakyReLU activation in the discriminator for all layers.


#### CycleGAN
Image-to-image translation is a class of problems where the goal is to map an image to another image within the same pair. For example, one may wish to map an image of a location during the Spring season to an image of the same location but during the Fall season. However, paired images are not always availabe. The goal of CycleGAN is then to learn a mapping $G$ such that the distribution $G(X)$ is as similar to the distribution of $Y$ as possible, where $X$ and $Y$ are the input and output images respectively. 

#### StyleGAN
StyleGAN is an extension of GAN that aims to improve the generator's ability to generate a wider variety of images. The main modifications to the architecture of GAN's generator is by having two sources of randomness (instead of one): a mapping network that controls the style of the output image, and an additional noise that adds variability to the image. Applications of StyleGAN include human-face generation, anime character generations, new fonts, etc. 


### Summary

Comparision between the methods


|  |  Pixel RNN/CNN   | Variational AutoEncoders | Generative Adversial Modeling |
| -------- | --- | -------- | -------- |
|    Pros      | <ul><li>Can explicitly compute likelihood p(x) </li> <li> Easy to optimize </li> <li>Good samples </li></ul>| <ul><li>principled approach to generative models </li> <li>interpretable latent space </li> <li> allows inference of q(z\|x) </li> <li>can be useful feature representation for other tasks  </li></ul>    | Beautiful, state-of-the-art samples!         |
|    Cons     | slow sequential generation    | <ul><li>Maximizes lower bound of likelihood which is not as good evaluation as PixelRNN/PixelCNN </li> <li> Samples blurrier and lower quality compared to state-of-the-art (GANs) </li></ul>    | <ul><li>Trickier/more unstable to train </li> <li> cannot solve inference queries such as p(x) p(z\|x)  </li></ul>   |


### Further Reading
These readings are optional and contain pointers of interest. 
> PixelRNN/CNN: https://arxiv.org/pdf/1601.06759.pdf
> Variational Auto-Encoders: https://arxiv.org/pdf/1312.6114.pdf
> Generative Adversial Net: https://arxiv.org/pdf/1406.2661.pdf
> DCGAN: https://arxiv.org/pdf/1511.06434.pdf
> CycleGAN: https://arxiv.org/pdf/1703.10593.pdf
> StyleGAN: https://arxiv.org/pdf/1812.04948.pdf
> Mode collapse: https://www.coursera.org/lecture/build-basic-generative-adversarial-networks-gans/mode-collapse-Terkm
> HYPE: https://arxiv.org/pdf/1904.01121.pdf


================================================
FILE: generative-models.md
================================================
# Generative Modeling

With generative modeling, we aim to learn how to generate new samples from the same distribution of the given training data. Specifically, there are two major objectives: 

- Learn $p_{\text{model}}(x)$ that approximates true data distribution $p_{\text{data}}(x)$
- Sampling new $x$ from $p_{\text{model}}(x)$

The former can be structured as learning how likely a given sample is drawn from a true data distribution; the latter means the model should be able to produce new samples that are similar but not exactly the same as the training samples. One way to judge if the model has learned the correct underlying representation of the training data distribution is the quality of the new samples produced by the trained model. 

These objectives can be formulated as density estimation problems. There are two different approaches: 

- **Explicit density estimation**: explicitly define and solve for $p_{\text{model}}(x)$
- **Implicit density estimation**: learn model that can sample from $p_{\text{model}}(x)$ without explicitly define it 

The explicit approach can be challenging because it is generally difficult to find an expression for image likelihood function from a high dimensional space. The implicit approach may be preferable in situations that the only interest is to generate new samples. In this case, instead of finding the specific expression of the density function, we can simply training the model to directly sample from the data distribution without going through the process of explicit modeling. 

<img src="assets/generative-models/taxonomy.svg" width="946"></img>

Generative models are widely used in various computer vision tasks. For instance, they are used in super-resolution applications in which the model fills in the details of the low resolution inputs and generates higher resolution images. They are also used for colorization in which greyscale images get converted to color images. 

# PixelRNN and PixelCNN

PixelRNN and PixelCNN [[van den Oord et al., 2016]](https://arxiv.org/abs/1601.06759) are examples of a **fully visible belief network (FVBN)** in which data likelihood function $p(x)$ is explicitly modeled given image input $x$: 

\( p(x) = p(x_1, x_2, \cdots, x_n) \)

where $x_1, x_2, \cdots$ are each pixel in the image. In other words, the likelihood of an image (LHS) is the joint likelihood of each pixel in the image (RHS). We then use chain rule to decompose the joint likelihood into product of 1-d distributions: 

\( \displaystyle p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \cdots, x_{i-1}) \)

where each distribution in the product gives the probability of $i$<sup>th</sup> pixel value given all previous pixels. Choice of pixel ordering (hence "previous") might have implications on computational efficiencies in training and inference. Following is an example of ordering in which each pixel $x_i$ (in red) is conditioned on all the previously generated pixels left and above of $x_i$ (in blue): 

<img src="assets/generative-models/ordering.svg" width="200"></img>

To train the model, we could try to maximize the defined likelihood of the training data. 

## PixelRNN

The problem with above naive approach is that the conditional distributions can be extremely complex. To mitigate the difficulty, the PixelRNN instead expresses conditional distribution of each pixel as sequence modeling problem. It uses RNNs (more specifically LSTMs) to model the joint likelihood function $p(x_i \mid x_1, \cdots, x_{i-1})$. The main idea is that we model the dependencies of one pixel on all previous pixels by the hidden state of an RNN. Then the process of feeding previous predicted pixel back to the network to get the next pixel allows us to sequentially generate all pixels of an image. 

<img src="assets/generative-models/pixelrnn.svg" width="622.02"></img>

Another thing that the PixelRNN model does slightly differently is that it defines pixel order diagonally. This allows some level of parallelization, which makes training and generation a bit faster. With this ordering, generation process starts from the top-left corner, and then makes its way down and right until the entire image is produced. 

<img src="assets/generative-models/diagonal.gif" width="200"></img>

## PixelCNN

Because the generation process even with this diagonal ordering is still largely sequential, it is expensive to train such model. To achieve further parallelization, instead of taking all previous pixels into consideration, we could instead only model dependencies on pixels in a context region. This gives rise to the PixelCNN model in which a context region is defined by a masked convolution. The receptive field of a masked convolution is an incomplete square around a central pixel (darker blue squares). This ensures that each pixel only depends on the already generated pixels in the region. The paper shows that with enough masked convolutional layers, the effective receptive field is the same as the pixel generation process that directly models dependencies on all previous pixels (all blue squares), like the PixelRNN model. 

<img src="assets/generative-models/pixelcnn.svg" width="508"></img>

Because context region values are known from training images, PixelCNN is faster in training thanks to convolution parallelizations. However, generation is still slow as the process is inherently sequential. For instance, for a $32 \times 32$ image, the model needs to perform forward pass $1024$ times to generate a single image. 

<img src="assets/generative-models/pixelcnn_samples.png" width="700"></img>

From the generation samples on [``CIFAR-10``](https://www.cs.toronto.edu/~kriz/cifar.html) (left) and [``ImageNet``](https://www.image-net.org/) (right), we see these models are able to capture the distribution of training data to some extent, yet the generated samples do not look like natural images. Later models like flow based deep generative models are able to strike a better balance between training and generation efficiencies, and generate better quality images. 

In summary, PixelRNN and PixelCNN models explicitly compute likelihood, and thus are relatively easy to optimize. The major drawback of these models is the sequential generation process which is time consuming. There have been follow-up efforts on improving PixelCNN performance, ranging from architecture changes to training tricks. 

# Variational Autoencoder

We introduce a new latent variable $z$ that allows us to decompose the data likelihood as the marginal distribution of the conditional data likelihood with respect to this latent variable $z$: 

\( \displaystyle p_{\theta}(x) = \int p_{\theta}(z) \cdot p_{\theta}(x \mid z) ~dz \)

In other words, all pixels of an image are independent with each other given latent variable $z$. This makes simultaneous generation of all pixels possible. However, we cannot directly optimize this likelihood expression. Instead, we optimize a lower bound of this expression to approximate the optimization we'd like to perform. 

## Autoencoder

On a high-level, the goal of an autoencoder is to learn a lower-dimensional feature representation from un-labeled training data. The "encoder" component of an autoencoder aims at compressing input data into a lower-dimensional feature vector $z$. Then the "decoder" component decodes this feature vector and converts it back to the data in the original dimensional space. 

<img src="assets/generative-models/autoencoder.svg" width="302"></img>

The idea of the dimensionality reduction step is that we want every dimension of the feature vector $z$ captures meaningful factors of variation in data. We feed the feature vector into the decoder network and have it learn how to reconstruct the original input data with some pre-defined pixel-wise reconstruction loss (L2 is one of the most common choices). By training an autoencoder model, we hope feature vector $z$ eventually encodes the most essential information about possible variables of the data. 

Now the autoencoder gives a way to effectively represent the underlying structure of the data distribution, which is one of the objectives of generative modeling. However, since do not know the entire latent space that $z$ is in (not every latent feature in the latent space can be decoded into a meaningful image), we are unable to arbitrarily generate new images from an autoencoder. 

## Variational Autoencoder

To be able to sample from the latent space, we take a probabilistic approach to autoencoder models. Assume training data $\big\{x^{(i)}\big\}_{i=1}^{N}$ is generated from the distribution of unobserved latent representation $z$. So $x$ follows the conditional distribution given $z$; that is, $p_{\theta^{\ast}}(x \mid z^{(i)})$. And $z^{(i)}$ follows the prior distribution $p_{\theta^{\ast}}(z)$. In other words, we assume each image $x$ is generated by first sampling a new $z$ that has a slight different factors of variation and then sampling the image conditionally on that chosen variable $z$. 

<img src="assets/generative-models/vae_assumptions.svg" width="536"></img>

With variational autoencoder [[Kingma and Welling, 2014]](https://arxiv.org/abs/1312.6114), we would like to estimate true parameters $\theta^{\ast}$ of both the prior and conditional distributions of the training data. We choose prior $p_{\theta}(z)$ to be a simple distribution (e.g. a diagonal/isotropic Gaussian distribution), and use a neural network, denoted as **decoder** network, to decode a latent sample from prior $pp_{\theta}(z)$ to a conditional distribution of the image $p_{\theta}(x \mid z)$. We have the data likelihood: 

\(\displaystyle p_{\theta}(x) = \int p_{\theta}(z) \cdot p_{\theta}(x \mid z) ~dz \)

We note that to train the model, we need to compute the integral which involves computing $p(x \mid z)$ for every possible $z$. Hence it is intractably to directly optimize the likelihood expression. We could use Monte Carlo estimation technique but there will incur high variance because of the high dimensionality nature of the density function. If we look at the posterior distribution using the Baye's rule: 

\( p_{\theta}(z \mid x) = \dfrac{p_{\theta}(x \mid z) \cdot p_{\theta}(z)}{p_{\theta}(x)} \)

we see it is still intractable to compute because $p_{\theta}(x)$ shows up in the denominator. 

To make it tractable, we instead learn another distribution $q_{\phi}(z \mid x)$ that approximates the true posterior distribution $p_{\theta}(z \mid x)$. We denote this approximate distribution as probabilistic **encoder** because given an input image, it produces a distribution over the possible values of latent feature vector $z$ from which the image could have been sampled from. This approximate posterior distribution $q_{\phi}(z \mid x)$ allows us to derive a lower bound on the data likelihood. We then can optimize the tractable lower bound instead. The goal of the variational inference is to approximate the unknown posterior distribution $p_{\theta}(z \mid x)$ from only the observed data. 

### Tractable Lower Bound

To derive the tractable lower bound, we start from the log likelihood of an observed example: 

\( \begin{aligned}
  \log p_{\theta}(x^{(i)}) &= \mathbb{E}_{z \sim q_{\phi}(z \mid x^{(i)})} \Big[\log p_{\theta}(x^{(i)})\Big] \quad \cdots \small\mathsf{(1)} \\
  &= \mathbb{E}_{z} \bigg[\log \frac{p_{\theta}(x^{(i)} \mid z) \cdot p_{\theta}(z)}{p_{\theta}(z \mid x^{(i)})}\bigg] \quad \cdots \small\mathsf{(2)} \\
  &= \mathbb{E}_{z} \bigg[\log \bigg(\frac{p_{\theta}(x^{(i)} \mid z) \cdot p_{\theta}(z)}{p_{\theta}(z \mid x^{(i)})} \cdot \frac{q_{\phi}(z \mid x^{(i)})}{q_{\phi}(z \mid x^{(i)})}\bigg)\bigg] \quad \cdots \small\mathsf{(3)} \\
  &= \mathbb{E}_{z} \Big[\log p_{\theta}(x^{(i)} \mid z) \Big] - \mathbb{E}_{z} \bigg[\log \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z)}\bigg] + \mathbb{E}_{z} \bigg[\log \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z \mid x^{(i)})}\bigg] \quad \cdots \small\mathsf{(4)} \\
  &= \mathbb{E}_{z} \Big[\log p_{\theta}(x^{(i)} \mid z) \Big] - D_{\mathrm{KL}} \Big(q_{\phi}(z \mid x^{(i)}) \parallel p_{\theta}(z)\Big) + D_{\mathrm{KL}} \Big(q_{\phi}(z \mid x^{(i)}) \parallel p_{\theta}(z \mid x^{(i)})\Big) \quad \cdots \small\mathsf{(5)}
\end{aligned} \)

- Step $\mathrm{(1)}$: the true data distribution is independent of the estimated posterior $q_{\phi}(z \mid x^{(i)})$; moreover, since $q_{\phi}(z \mid x^{(i)})$ is represented by a neural network, we are able to sample from distribution $q_{\phi}$. 

- Step $\mathrm{(2)}$: by the Baye's rule: 

\( \begin{aligned}
  & p_{\theta}(z \mid x) = \dfrac{p_{\theta}(x \mid z) \cdot p_{\theta}(z)}{p_{\theta}(x)} \\
  \Longrightarrow \quad & p_{\theta}(x) = \dfrac{p_{\theta}(x \mid z) \cdot p_{\theta}(z)}{p_{\theta}(z \mid x)}
\end{aligned} \)

- Step $\mathrm{(3)}$: multiplying the expression by $1 = \dfrac{q_{\phi}(z \mid x^{(i)})}{q_{\phi}(z \mid x^{(i)})}$

- Step $\mathrm{(4)}$: by logarithm properties as well as linearity of expectation: 

\( \begin{aligned}
  &~ \mathbb{E}_{z} \bigg[\log \bigg(\frac{p_{\theta}(x^{(i)} \mid z) \cdot p_{\theta}(z)}{p_{\theta}(z \mid x^{(i)})} \cdot \frac{q_{\phi}(z \mid x^{(i)})}{q_{\phi}(z \mid x^{(i)})}\bigg)\bigg] \\
  =&~ \mathbb{E}_{z} \bigg[\log \bigg(p_{\theta}(x^{(i)} \mid z) \cdot \frac{p_{\theta}(z)}{q_{\phi}(z \mid x^{(i)})} \cdot \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z \mid x^{(i)})}\bigg)\bigg] \\
  =&~ \mathbb{E}_{z} \bigg[\log p_{\theta}(x^{(i)} \mid z) + \log \frac{p_{\theta}(z)}{q_{\phi}(z \mid x^{(i)})} + \log \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z \mid x^{(i)})}\bigg] \\
  =&~ \mathbb{E}_{z} \bigg[\log p_{\theta}(x^{(i)} \mid z) - \log \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z)} + \log \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z \mid x^{(i)})}\bigg] \\
  =&~ \mathbb{E}_{z} \bigg[\log p_{\theta}(x^{(i)} \mid z)\bigg] - \mathbb{E}_{z}\bigg[\log \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z)}\bigg] + \mathbb{E}_{z}\bigg[\log \frac{q_{\phi}(z \mid x^{(i)})}{p_{\theta}(z \mid x^{(i)})}\bigg] \\
\end{aligned} \)

- Step $\mathrm{(5)}$: by definition of the [Kullback–Leibler divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence). The KL divergence gives a measure of the “distance” between two distributions. 

We see the first term $\mathbb{E}_{z} \Big[\log p_{\theta}(x^{(i)} \mid z) \Big]$ involves $p_{\theta}(x^{(i)} \mid z)$ that is given by the decoder network. With some tricks, this term can be estimated through sampling. 

The second term is the KL divergence between the approximate posterior and the prior (a Gaussian distribution). Assuming the approximate posterior posterior takes on a Gaussian form with diagonal covariance matrix, the KL divergence then has an analytical closed form solution. 

The third term is the KL divergence between between the approximate posterior and the true posterior. Even though we it is intractable to computer, by non-negativity of KL divergence, we know this term is non-negative. 

Therefore we obtain the tractable lower bound of log likelihood of the data: 

\( \log p_{\theta}(x^{(i)}) = \underbrace{\mathbb{E}_{z} \Big[\log p_{\theta}(x^{(i)} \mid z) \Big] - D_{\mathrm{KL}} \Big(q_{\phi}(z \mid x^{(i)}) \parallel p_{\theta}(z)\Big)}_{\mathcal{L}(x^{(i)}; \theta, \phi)} + \underbrace{D_{\mathrm{KL}} \Big(q_{\phi}(z \mid x^{(i)}) \parallel p_{\theta}(z \mid x^{(i)})\Big)}_{\geqslant 0} \)

We note that $\mathcal{L}(x^{(i)}; \theta, \phi)$, as known as the **evidence lower bound (ELBO)**, is differentiable, hence we can apply gradient descent methods to optimize the lower bound. 

The lower bound can also be interpreted as encoder component $D_{\mathrm{KL}} \Big(q_{\phi}(z \mid x^{(i)}) \parallel p_{\theta}(z)\Big)$ that seeks to approximate posterior distribution close to prior; and decoder component $\mathbb{E}_{z} \Big[\log p_{\theta}(x^{(i)} \mid z) \Big]$ that concerns with reconstructing the original input data. 

### Training

<img src="assets/generative-models/vae_training.svg" width="702"></img>

For a given input, we first use the encoder network to to generate the mean $\mu_{z \mid x}$ and variance $\Sigma_{z \mid x}$ of the approximate posterior Gaussian distribution. Notice that here $\Sigma_{z \mid x}$ is represented by a vector instead of a matrix because the approximate posterior is assumed to have a diagonal covariance matrix. Then we can compute the gradient of the KL divergence term $D_{\mathrm{KL}} \Big(q_{\phi}(z \mid x^{(i)}) \parallel p_{\theta}(z)\Big)$ as it has an analytical solution. 

Next we compute the gradient of the expectation term $\mathbb{E}_{z} \Big[\log p_{\theta}(x^{(i)} \mid z) \Big]$. Since $p_{\theta}(x^{(i)} \mid z)$ is represented by the decoder network, we need to sample $z$ from the approximate posterior $\mathcal{N}(\mu_{z \mid x}, \Sigma_{z \mid x})$. However, since $z$ is not part of the computation graph, we won't be able to find the gradient of the expression that entails the sampling process. To solve this problem, we perform **reparameterization**. Specifically, we take advantage of the Gaussian distribution assumption, and sample $\varepsilon \sim \mathcal{N}(0, I)$. Then we represent $z$ as $z = \mu_{z \mid x} + \varepsilon \Sigma_{z \mid x}$. This way $z$ has the same Gaussian distribution as before; moreover, since $\varepsilon$ is seen as an input to the computation graph, and both $\mu_{z \mid x}$ and $\Sigma_{z \mid x}$ are part of the computation graph, the sampling process now becomes differentiable. 

Lastly, we use the decoder network to produce the pixel-wise conditional distribution $p_{\theta}(x \mid z)$. We are now able to perform the maximum likelihood of the original input. In practice, L2 distance between the predicted image and the actual input image is commonly used. 

For every minibatch of input data, we compute the forward pass and then perform the back-propagation. 

### Inference

<img src="assets/generative-models/vae_inference.svg" width="647"></img>

We take a sample $z$ from the prior distribution $p_{\theta}(z)$ (e.g. a Gaussian distribution), then feed the sample into the trained decoder network to obtain the conditional distributions $p_{\theta}(x \mid z)$. Lastly we sample a new image from the conditional distribution. 

### Generated Samples

Since we assumed diagonal prior for $z$, components of latent variable are independent of each other. This means different dimensions of $z$ encode interpretable factors of variation. 

<img src="assets/generative-models/vae_interpretations_mnist.png" height="320"></img>

After training the model using a $2$-dimensional latent variable $z$ on [``MNIST``](https://yann.lecun.com/exdb/mnist/), we discover that varying the samples of $z$ would induce interpretable variations in the image space. For instance, one possible interpretation would be that $z_1$ morphs digit ``6`` to ``9`` through ``7``, and $z_2$ is related to the orientation of digits. 

<img src="assets/generative-models/vae_interpretations_face.png" height="320"></img>

Similarly, we also find that dimensions of latent variable $z$ can be interpretable after training the model on head pose dataset. For instance, it appears $z_1$ encodes degree of smile and $z_2$ encodes head pose orientation. 

<img src="assets/generative-models/vae_samples.png" width="750"></img>

From above generation samples on [``CIFAR-10``](https://www.cs.toronto.edu/~kriz/cifar.html) (left) and labeled face images (right), we see newly generated images are similar to the original ones. However, these generated images are still blurry and generating high quality images is an active area for research. 

# Generative Adversarial Networks (GANs)

We would like to train a model to directly generate high quality samples without modeling any explicit density function $p(x)$. With GAN [[Goodfellow et al., 2014]](https://arxiv.org/abs/1406.2661), our goal is to train a **generator network** to learn transformation from a simple distribution (e.g. random noise) that we can easily sample from to the high-dimensional training distribution followed by the data. The challenge is that because we do not model any data distribution, we don't have the mapping between random sample $z$ to a training image $x$. This means we cannot directly train the model with supervised reconstruction loss. 

To overcome this challenge, we recognize the general objective that all the images generated from the latent space of $z$ should exhibit "realness". In other words, all the generated images should look like they belong to the original training data. To formulate this general objective into a learning objective, we introduce another **discriminator network** that learns to identify whether an image is from the training data distribution or not. Specifically, the discriminator network performs a two-class classification task in which an input image feeds into the network and a label indicating if the input image is from the training data distribution or is produced by the generated network. 

We then can use the output from the discriminator network to compute gradient and perform back-propagation to the generator network to gradually improve the image generation process. Overtime, learning signal from the discriminator will inform the generator on how to produce more "realistic" samples. Similarly, as generated images from the generator become more and more close to the real training data, the discriminator adapt its decision boundary to fit the training data distribution better. The discriminator effectively learns to model the data distribution without explicitly defining it. 

<img src="assets/generative-models/gan.svg" width="553"></img>

In summary: 

- **discriminator network**: try to distinguish between real and fake images
- **generator network**: try to fool the discriminator by generating real-looking images

## Training GANs

Training GAN can be formulated as the minimax optimization of a two-player adversarial game. Assume that the discriminator outputs likelihood in $(0,1)$ of real image, the objective function is the following: 

\( \displaystyle \min_{\theta_g} \max_{\theta_d} \Big\{\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log \underbrace{D_{\theta_d}(x)}_{\mathsf{(1)}}\big] + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - \underbrace{D_{\theta_d}(G_{\theta_g}(z))\big)}_{\mathsf{(2)}}\big]\Big\} \)


- $\mathsf{(1)}$: $D_{\theta_d}(x)$ is the discriminator output (score) for real data $x$
- $\mathsf{(2)}$: $D_{\theta_d}(G_{\theta_g}(z))\big)$ is the discriminator output (score) for generated fake data $G(z)$

The inner maximization is the discriminator objective. The discriminator aims to find maximizer $\theta_g$ such that real data $D(x)$ is close to $1$ (real) while generated fake data $D(G(z))$ is close to $0$ (fake). 

The outer minimization is the generator objective. The generator aims to find minimizer $\theta_g$ such that generated fake data $D(G(z))$ is close to $1$ (real). This means the generator seeks to fool discriminator into thinking that generated fake data $D(G(z))$ is real. 

Naively, we could alternate between maximization and minimization by performing **gradient ascent on discriminator**: 

\( \displaystyle \max_{\theta_d} \Big\{\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D_{\theta_d}(x)\big] + \mathbb{E}_{z \sim p(z)}\big[\log \big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big]\Big\} \)

and **gradient descent on generator**: 

\( \displaystyle \min_{\theta_g} \Big\{\mathbb{E}_{z \sim p(z)}\big[\log \big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)\big]\Big\} \)

However, we note that when a sample is likely fake—hence $D(G(z))$ is small, expression $\log \big(1 - D_{\theta_d}(G_{\theta_g}(z))\big)$ in the generator objective function has small derivative with respect to $D(G(z))$. This means in the beginning of the training, the gradient of the generator objective function is small; that is updates to parameters $\theta_g$ are small. Conversely, the updates are large (strong gradient signal) when samples are already realistic ($D(G(z))$ is large). This creates an unfavorable situation as ideally we would hope the generator is able to learn fast when the discriminator outsmarts the generator. 

<img src="assets/generative-models/gan_training.svg" width="498"></img>

To remedy this problem, we now maximize likelihood of the discriminator being wrong, as opposed to minimizing the likelihood of it being correct: 

\( \displaystyle \max{\theta_g} \Big\{\mathbb{E}_{z \sim p(z)}\big[\log D_{\theta_d}\big(G_{\theta_g}(z)\big)\big]\Big\} \)

The objective remains unchanged, yet there will be higher gradient signal to the generator for unrealistic samples (in the eyes of the discriminator), which improves training performance. 

> **for** number of training iterations **do**:  
> &nbsp;&nbsp; **for** $k$ steps **do**:  
> &nbsp;&nbsp;&nbsp;&nbsp; - Sample minibatch of $m$ noise samples $\{z^{(1)}, \cdots, z^{(m)}\}$ from noise prior $p(z)$  
> &nbsp;&nbsp;&nbsp;&nbsp; - Sample minibatch of $m$ samples $\{x^{(1)}, \cdots, x^{(m)}\}$ from data generating distribution $p_{\mathrm{data}}(x)$  
> &nbsp;&nbsp;&nbsp;&nbsp; - Update the discriminator by ascending its stochastic gradient: 

> \( \displaystyle \nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \Big[\log D_{\theta_d}(x^{(i)}) + \log \big(1 - D_{\theta_d}(G_{\theta_g}(z^{(i)}))\big)\Big] \)  
> &nbsp;&nbsp; **end for**  
> &nbsp;&nbsp; - Sample minibatch of $m$ samples $\{x^{(1)}, \cdots, x^{(m)}\}$ from data generating distribution $p_{\mathrm{data}}(x)$  
> &nbsp;&nbsp; - Update the generator by ascending its stochastic gradient of the improved objective: 

> \( \displaystyle \nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log D_{\theta_d}(G_{\theta_g}(z^{(i)})) \)   
> **end for**  

Here $k \geqslant 1$ is the hyper-parameter and there is not best rule as to the value of $k$. In general, GANs are difficult to train, and followup work like Wasserstein GAN [[Arjovsky et al., 2017]](https://arxiv.org/abs/1701.07875) and BEGAN [[Berthelot et al., 2017]](https://arxiv.org/abs/1703.10717) sets out to achieve better training stability. 

## Inference

After training, we use the generator network to generate images. Specifically, we first draw a sample $z$ from noise prior $p(z)$; then we feed the sampled $z$ into the generator network. The output from the network gives us an image that is similar to the training images. 

## Generated Samples

<img src="assets/generative-models/gan_samples.png" width="900"></img>

From the generated samples, we see GAN can generate high quality samples, indicating the model does not simply memorize exact images from the training data. Training sets from left to right: [``MNIST``](https://yann.lecun.com/exdb/mnist/), ``Toronto Face Dataset (TFD)``, [``CIFAR-10``](https://www.cs.toronto.edu/~kriz/cifar.html). The highlighted columns show the nearest training example of the neighboring generated sample. 

There have been numerous followup studies on improving sample quality, training stability, and other aspects of GANs. The ICLR 2016 paper [[Radford et al., 2015]](https://arxiv.org/abs/1511.06434) proposed deep convolutional networks and other architecture features (deep convolutional generative adversarial networks, or DCGANs) to achieve better image quality and training stability: 

- Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
- Use batchnorm in both the generator and the discriminator.
- Remove fully connected hidden layers for deeper architectures.
- Use ReLU activation in generator for all layers except for the output, which uses Tanh.
- Use LeakyReLU activation in the discriminator for all layers.

<img src="assets/generative-models/dcgan_examples.png" width="621"></img>

Generated samples from DCGANs trained on [``LSUN``](https://www.yf.io/p/lsun) bedrooms dataset show promising improvements as the model can produce high resolution and high quality images without memorizing (overfitting) training examples. 

Similar to VAE, we are also able to find structures in the latent space and meaningfully interpolate random points in the latent space. This means we observe smooth semantic changes to the image generations along any direction of the manifold, which suggests that model has learned relevant  representations (as opposed to memorization). 

<img src="assets/generative-models/dcgan_transitions.png" width="621"></img>

The above figure shows smooth transitions between a series of $9$ random points in the latent space. Every image in the interpolation reasonably looks like a bedroom. For instance, generated samples in the $6$<sup>th</sup> row exhibit transition from a room without a window to a room with a large window; in the $10$<sup>th</sup>, an TV-alike object morphs into a window. 

Additionally, we can also perform arithmetic on $z$ vectors in the latent space. By averaging the $z$ vector for three exemplary generated samples of different visual concepts, we see consistent and stable generations that semantically obeyed the arithmetic. 

<img src="assets/generative-models/dcgan_interpretations_latent_1.png" width="649"></img>

<img src="assets/generative-models/dcgan_interpretations_latent_2.png" width="649"></img>

Arithmetic is performed on the mean vectors and the resulting vector feeds into the generator to produce the center sample on the right hand side. The remaining samples around the center are produced by adding uniform noise in $[-0.25, 0.25]$ to the vector. 

<img src="assets/generative-models/dcgan_interpretations_image_1.png" width="399"></img>

<img src="assets/generative-models/dcgan_interpretations_image_2.png" width="399"></img>

We note that same arithmetic performed pixel-wise in the image space does not behave similarly, as it
only yields in noise overlap due to misalignment. Therefore latent representations learned by the model and associated vector arithmetic have the potential to compactly model conditional generative process of complex image distributions.

## Other Variants

- new loss function (LSGAN): [Mao et al., Least Squares Generative Adversarial Networks, 2016](https://arxiv.org/abs/1611.04076)
- new training methods: 
    - Wasserstein GAN: [Arjovsky et al., Wasserstein GAN, 2017](https://arxiv.org/abs/1701.07875)
    - Improved Wasserstein GAN: [Gulrajani et al., Improved Training of Wasserstein GANs, 2017](https://arxiv.org/abs/1704.00028)
    - Progressive GAN: [Karras et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017](https://arxiv.org/abs/1710.10196)
- source-to-target domain transfer (CycleGAN): [Zhu et al., Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks, 2017](https://arxiv.org/abs/1703.10593)
- text-to-image synthesis: [Reed et al., Generative Adversarial Text to Image Synthesis, 2016](https://arxiv.org/abs/1605.05396)
- image-to-image translation (Pix2pix): [Isola et al., Image-to-Image Translation with Conditional Adversarial Networks, 2016](https://arxiv.org/abs/1611.07004)
- high-resolution and high-quality generations (BigGAN): [Brock et al., Large Scale GAN Training for High Fidelity Natural Image Synthesis, 2018](https://arxiv.org/abs/1809.11096)
- scene graphs to GANs: [Johnson et al., Image Generation from Scene Graphs, 2018](https://arxiv.org/abs/1804.01622)
- benchmark for generative models: [Zhou, Gordon, Krishna et al., HYPE: Human eYe Perceptual Evaluations, 2019](https://arxiv.org/abs/1904.01121)
- many more: ["the GAN zoo"](https://github.com/hindupuravinash/the-gan-zoo)

================================================
FILE: index.html
================================================
---
layout: default
---

<div>

  These notes accompany the Stanford CS class <a href="http://cs231n.stanford.edu/">CS231n: Deep Learning for Computer Vision</a>. For questions/concerns/bug reports, please submit a pull request directly to
  our <a href="https://github.com/cs231n/cs231n.github.io">git repo</a>.
  <br>
  <!-- For questions/concerns/bug reports contact <a href="http://cs.stanford.edu/people/jcjohns/">Justin Johnson</a> regarding the assignments, or contact <a href="http://cs.stanford.edu/people/karpathy/">Andrej Karpathy</a> regarding the course notes. You can also submit a pull request directly to our <a href="https://github.com/cs231n/cs231n.github.io">git repo</a>. -->
  <!-- <br> -->
  <!-- We encourage the use of the <a href="https://hypothes.is/">hypothes.is</a> extension to annote comments and discuss these notes inline. -->
</div>

<div class="home">
  <div class="materials-wrap">
    <div class="module-header">Spring 2024 Assignments</div>
    <div class="materials-item">
      <a href="assignments2024/assignment1/">Assignment #1: Image Classification, kNN, SVM, Softmax, Fully Connected Neural Network</a>
    </div>
    <div class="materials-item">
      <a href="assignments2024/assignment2/">Assignment #2: Fully Connected and Convolutional Nets, Batch Normalization, Dropout, Pytorch & Network Visualization</a>
    </div>
    <div class="materials-item">
      <a href="assignments2024/assignment3/">Assignment #3: Network Visualization, Image Captioning with RNNs and Transformers, Generative Adversarial Networks, Self-Supervised Contrastive Learning</a>
    </div>
  </div>

  <!-- <div class="materials-wrap">
    <div class="module-header">Spring 2021 Assignments</div>
      <div class="materials-item">
        <a href="assignments2021/assignment1/">Assignment #1: Image Classification, kNN, SVM, Softmax, Fully Connected Neural Network</a>
      </div>
      <div class="materials-item">
        <a href="assignments2021/assignment2/">Assignment #2: Fully Connected and Convolutional Nets, Batch Normalization, Dropout, Frameworks</a>
      </div>
      <div class="materials-item">
        <a href="assignments2021/assignment3/">Assignment #3: Image Captioning with RNNs and Transformers, Network Visualization,
          Generative Adversarial Networks, Self-Supervised Contrastive Learning</a>
      </div>
  </div> -->
  <!--
    <div class="materials-item">
      <a href="assignments2019/assignment2/">
        Assignment #2: Fully Connected Nets, Batch Normalization, Dropout,
        Convolutional Nets
      </a>
    </div>

    <div class="materials-item">
      <a href="assignments2019/assignment3/">
        Assignment #3: Image Captioning with Vanilla RNNs, Image Captioning
  with LSTMs, Network Visualization, Style Transfer, Generative Adversarial Networks
      </a>
    </div> -->
  <!--
    <div class="module-header">Spring 2018 Assignments</div>

    <div class="materials-item">
      <a href="assignments2018/assignment1/">
        Assignment #1: Image Classification, kNN, SVM, Softmax, Neural Network
      </a>
    </div>

    <div class="materials-item">
      <a href="assignments2018/assignment2/">
        Assignment #2: Fully-Connected Nets, Batch Normalization, Dropout,
        Convolutional Nets
      </a>
    </div>

    <div class="materials-item">
      <a href="assignments2018/assignment3/">
        Assignment #3: Image Captioning with Vanilla RNNs, Image Captioning
        with LSTMs, Network Visualization, Style Transfer, Generative Adversarial Networks
      </a>
    </div>
    -->

  <!--
    <div class="module-header">Winter 2016 Assignments</div>

    <div class="materials-item">
      <a href="assignments2016/assignment1/">
        Assignment #1: Image Classification, kNN, SVM, Softmax, Neural Network
      </a>
    </div>

    <div class="materials-item">
      <a href="assignments2016/assignment2/">
        Assignment #2: Fully-Connected Nets, Batch Normalization, Dropout,
        Convolutional Nets
      </a>
    </div>

    <div class="materials-item">
      <a href="assignments2016/assignment3/">
        Assignment #3: Recurrent Neural Networks, Image Captioning,
        Image Gradients, DeepDream
      </a>
    </div>
    -->

  <!--
    <div class="module-header">Winter 2015 Assignments</div>

    <div class="materials-item">
      <a href="assignment1/">
        Assignment #1: Image Classification, kNN, SVM, Softmax
      </a>
    </div>

    <div class="materials-item">
      <a href="assignment2/">
        Assignment #2: Neural Networks, ConvNets I
      </a>
    </div>

    <div class="materials-item">
      <a href="assignment3/">
        Assignment #3: ConvNets II, Transfer Learning, Visualization
      </a>
    </div>
  -->

  <div class="module-header">Module 0: Preparation</div>

  <div class="materials-item">
    <a href="setup-instructions/">
      Software Setup
    </a>
  </div>

  <div class="materials-item">
    <a href="python-numpy-tutorial/">
      Python / Numpy Tutorial (with Jupyter and Colab)
    </a>
  </div>
  <!--
    <div class="materials-item">
      <a href="terminal-tutorial/">
        Terminal.com Tutorial
      </a>
    </div>
-->
  <!-- <div class="materials-item">
      <a href="https://github.com/cs231n/gcloud">
        Google Cloud Tutorial
      </a>
    </div> -->
  <!-- <div class="materials-item">
      <a href="aws-tutorial/">
        AWS Tutorial
      </a>
    </div> -->

  <!-- hardcoding items here to force a specific order -->
  <div class="module-header">Module 1: Neural Networks</div>

  <div class="materials-item">
    <a href="classification/">
      Image Classification: Data-driven Approach, k-Nearest Neighbor, train/val/test splits
    </a>
    <div class="kw">
      L1/L2 distances, hyperparameter search, cross-validation
    </div>
  </div>

  <div class="materials-item">
    <a href="linear-classify/">
      Linear classification: Support Vector Machine, Softmax
    </a>
    <div class="kw">
      parameteric approach, bias trick, hinge loss, cross-entropy loss, L2 regularization, web demo
    </div>
  </div>

  <div class="materials-item">
    <a href="optimization-1/">
      Optimization: Stochastic Gradient Descent
    </a>
    <div class="kw">
      optimization landscapes, local search, learning rate, analytic/numerical gradient
    </div>
  </div>

  <div class="materials-item">
    <a href="optimization-2/">
      Backpropagation, Intuitions
    </a>
    <div class="kw">
      chain rule interpretation, real-valued circuits, patterns in gradient flow
    </div>
  </div>

  <div class="materials-item">
    <a href="neural-networks-1/">
      Neural Networks Part 1: Setting up the Architecture
    </a>
    <div class="kw">
      model of a biological neuron, activation functions, neural net architecture, representational power
    </div>
  </div>

  <div class="materials-item">
    <a href="neural-networks-2/">
      Neural Networks Part 2: Setting up the Data and the Loss
    </a>
    <div class="kw">
      preprocessing, weight initialization, batch normalization, regularization (L2/dropout), loss functions
    </div>
  </div>

  <div class="materials-item">
    <a href="neural-networks-3/">
      Neural Networks Part 3: Learning and Evaluation
    </a>
    <div class="kw">
      gradient checks, sanity checks, babysitting the learning process, momentum (+nesterov), second-order methods,
      Adagrad/RMSprop, hyperparameter optimization, model ensembles
    </div>
  </div>

  <div class="materials-item">
    <a href="neural-networks-case-study/">
      Putting it together: Minimal Neural Network Case Study
    </a>
    <div class="kw">
      minimal 2D toy data example
    </div>
  </div>

  <div class="module-header">Module 2: Convolutional Neural Networks</div>

  <div class="materials-item">
    <a href="convolutional-networks/">
      Convolutional Neural Networks: Architectures, Convolution / Pooling Layers
    </a>
    <div class="kw">
      layers, spatial arrangement, layer patterns, layer sizing patterns, AlexNet/ZFNet/VGGNet case studies,
      computational considerations
    </div>
  </div>

  <div class="materials-item">
    <a href="understanding-cnn/">
      Understanding and Visualizing Convolutional Neural Networks
    </a>
    <div class="kw">
      tSNE embeddings, deconvnets, data gradients, fooling ConvNets, human comparisons
    </div>
  </div>

  <div class="materials-item">
    <a href="transfer-learning/">
      Transfer Learning and Fine-tuning Convolutional Neural Networks
    </a>
  </div>

  <div class="module-header">Student-Contributed Posts</div>

  <div class="materials-item">
    <a href="choose-project/">
      Taking a Course Project to Publication
    </a>
    <a href="rnn/">
      Recurrent Neural Networks
    </a>
  </div>

</div>
</div>


================================================
FILE: jupyter-colab-tutorial.md
================================================
---
layout: page
title: Jupyter Notebook / Google Colab Tutorial
permalink: /jupyter-colab-tutorial/
---

A Jupyter notebook lets you write and execute
Python code *locally* in your web browser. Jupyter notebooks
make it very easy to tinker with code and execute it in bits
and pieces; for this reason they are widely used in scientific
computing.
Colab on the other hand is Google's flavor of
Jupyter notebooks that is particularly suited for machine
learning and data analysis and that runs entirely in the *cloud*.
Colab is basically Jupyter notebook on steroids: it's free, requires no setup,
comes preinstalled with many packages, is easy to share with the world,
and benefits from free access to hardware accelerators like GPUs and TPUs (with some caveats).

To get yourself familiar with Python and notebooks, we'll be running
a short tutorial as a standalone Jupyter or Colab notebook. If you wish
to use Colab, click the `Open in Colab` badge below.

<div>
  <a href="https://colab.research.google.com/github/cs231n/cs231n.github.io/blob/master/python-colab.ipynb" target="_blank">
    <img class="colab-badge" src="/assets/badges/colab-open.svg" alt="Colab Notebook"/>
  </a>
</div>

If you wish to run the notebook locally make sure your virtual environment was installed correctly (as per the [setup instructions]({{site.baseurl}}/setup-instructions/)), activate it, then run `pip install notebook` to install Jupyter notebook. Next, [open the notebook](https://raw.githubusercontent.com/cs231n/cs231n.github.io/master/jupyter-notebook-tutorial.ipynb) and download it to a directory of your choice by right-clicking on the page and selecting `Save Page As`. Then `cd` to that directory and run the following in your terminal:

```
jupyter notebook
```

Once your notebook server is up and running, point your web browser to `http://localhost:8888` to
start using your notebooks. If everything worked correctly, you should
see a screen like this, showing all available notebooks in the current
directory:

<div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/file-browser.png'>
</div>

Click `jupyter-notebook-tutorial.ipynb` and follow the instructions in the notebook. Enjoy!

<!-- If you click through to a notebook file, you will see a screen like this: -->

<!-- <div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/notebook-1.png'>
</div>

A Jupyter notebook is made up of a number of **cells**. Each cell can contain
Python code. You can execute a cell by clicking on it (the highlight color will
switch from blue to green) and pressing `Shift-Enter`.
When you do so, the code in the cell will run, and the output of the cell
will be displayed beneath the cell. For example, after running the first cell,
the notebook shoud look like this:

<div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/notebook-2.png'>
</div>

Global variables are shared between cells. Executing the second cell thus gives
the following result:

<div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/notebook-3.png'>
</div>

There are a few keyboard shortcuts you should be aware of to make your notebook
experience more pleasant. To escape cell editing, press `esc`. The highlight color
should switch back to blue. To place a cell below the current one, press `b`.
To place a cell above the current one, press `a`. Finally, to delete a cell, press `dd`.

You can restart a notebook and clear all cells by clicking `Kernel -> Restart & Clear Output`.

<div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/notebook-restart.png'>
</div>

By convention, Jupyter notebooks are expected to be run from top to bottom.
Failing to execute some cells or executing cells out of order can result in
errors. After restarting the notebook, try running the second cell directly:

<div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/notebook-error.png'>
</div>

After you have modified a Jupyter notebook for one of the assignments by
modifying or executing some of its cells, remember to **save your changes!**

<div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/save-notebook.png'>
</div>

This has only been a brief introduction to Jupyter notebooks, but it should
be enough to get you up and running on the assignments for this course. -->


================================================
FILE: jupyter-notebook-tutorial.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "dzNng6vCL9eP"
   },
   "source": [
    "## CS231n Python Tutorial With Jupyter Notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0vJLt3JRL9eR"
   },
   "source": [
    "This tutorial was originally written by [Justin Johnson](https://web.eecs.umich.edu/~justincj/) for cs231n and adapted as a Jupyter notebook for cs228 by [Volodymyr Kuleshov](http://web.stanford.edu/~kuleshov/) and [Isaac Caswell](https://symsys.stanford.edu/viewing/symsysaffiliate/21335).\n",
    "\n",
    "This current version has been adapted as a Jupyter notebook with Python3 support by Kevin Zakka for the Spring 2020 edition of [cs231n](http://cs231n.stanford.edu/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is a Jupyter Notebook?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A Jupyter notebook is made up of a number of cells. Each cell can contain Python code. There are two main types of cells: `Code` cells and `Markdown` cells. This particular cell is a `Markdown` cell. You can execute a particular cell by double clicking on it (the highlight color will switch from blue to green) and pressing `Shift-Enter`. When you do so, if the cell is a `Code` cell, the code in the cell will run, and the output of the cell will be displayed beneath the cell, and if the cell is a `Markdown` cell, the markdown text will get rendered beneath the cell.\n",
    "\n",
    "Go ahead and try executing this cell."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell below is a `Code` cell. Go ahead and click it, then execute it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n"
     ]
    }
   ],
   "source": [
    "x = 1\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Global variables are shared between cells. Try executing the cell below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    }
   ],
   "source": [
    "y = 2 * x\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Keyboard Shortcuts\n",
    "\n",
    "There are a few keyboard shortcuts you should be aware of to make your notebook experience more pleasant. To escape editing of a cell, press `esc`. Escaping a `Markdown` cell won't render it, so make sure to execute it if you wish to render the markdown. Notice how the highlight color switches back to blue when you have escaped a cell.\n",
    "\n",
    "You can navigate between cells by pressing your arrow keys. Executing a cell automatically shifts the cell cursor down 1 cell if one exists, or creates a new cell below the current one if none exist.\n",
    "\n",
    "* To place a cell below the current one, press `b`.\n",
    "* To place a cell above the current one, press `a`.\n",
    "* To delete a cell, press `dd`.\n",
    "* To convert a cell to `Markdown` press `m`. Note you have to be in `esc` mode.\n",
    "* To convert it back to `Code` press `y`. Note you have to be in `esc` mode.\n",
    "\n",
    "Get familiar with these keyboard shortcuts, they really help!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can restart a notebook and clear all cells by clicking `Kernel -> Restart & Clear Output`. If you don't want to clear cell outputs, just hit `Kernel -> Restart`.\n",
    "\n",
    "By convention, Jupyter notebooks are expected to be run from top to bottom. Failing to execute some cells or executing cells out of order can result in errors. After restarting the notebook, try running the `y = 2 * x` cell 2 cells above and observe what happens."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After you have modified a Jupyter notebook for one of the assignments by modifying or executing some of its cells, remember to save your changes! You can save with the `Command/Control + s` shortcut or by clicking `File -> Save and Checkpoint`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This has only been a brief introduction to Jupyter notebooks, but it should be enough to get you up and running on the assignments for this course."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "qVrTo-LhL9eS"
   },
   "source": [
    "## Python Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9t1gKp9PL9eV"
   },
   "source": [
    "Python is a great general-purpose programming language on its own, but with the help of a few popular libraries (numpy, scipy, matplotlib) it becomes a powerful environment for scientific computing.\n",
    "\n",
    "We expect that many of you will have some experience with Python and numpy; for the rest of you, this section will serve as a quick crash course both on the Python programming language and on the use of Python for scientific computing.\n",
    "\n",
    "Some of you may have previous knowledge in Matlab, in which case we also recommend the numpy for Matlab users page (https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U1PvreR9L9eW"
   },
   "source": [
    "In this tutorial, we will cover:\n",
    "\n",
    "* Basic Python: Basic data types (Containers, Lists, Dictionaries, Sets, Tuples), Functions, Classes\n",
    "* Numpy: Arrays, Array indexing, Datatypes, Array math, Broadcasting\n",
    "* Matplotlib: Plotting, Subplots, Images\n",
    "* IPython: Creating notebooks, Typical workflows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "nxvEkGXPM3Xh"
   },
   "source": [
    "## A Brief Note on Python Versions\n",
    "\n",
    "As of Janurary 1, 2020, Python has [officially dropped support](https://www.python.org/doc/sunset-python-2/) for `python2`. **We'll be using Python 3.7 for this iteration of the course.**\n",
    "\n",
    "You should have activated your `cs231n` virtual environment created in the [Setup Instructions](https://cs231n.github.io/setup-instructions/) before calling `jupyter notebook`. If that is\n",
    "the case, the cell below should print out a major version of 3.7."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "1L4Am0QATgOc",
    "outputId": "bb5ee3ac-8683-44ab-e599-a2077510f327"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python 3.7.6\r\n"
     ]
    }
   ],
   "source": [
    "!python --version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "JAFKYgrpL9eY"
   },
   "source": [
    "## Basics of Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RbFS6tdgL9ea"
   },
   "source": [
    "Python is a high-level, dynamically typed multiparadigm programming language. Python code is often said to be almost like pseudocode, since it allows you to express very powerful ideas in very few lines of code while being very readable. As an example, here is an implementation of the classic quicksort algorithm in Python:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "cYb0pjh1L9eb",
    "outputId": "9a8e37de-1dc1-4092-faee-06ad4ff2d73a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 1, 2, 3, 6, 8, 10]\n"
     ]
    }
   ],
   "source": [
    "def quicksort(arr):\n",
    "    if len(arr) <= 1:\n",
    "        return arr\n",
    "    pivot = arr[len(arr) // 2]\n",
    "    left = [x for x in arr if x < pivot]\n",
    "    middle = [x for x in arr if x == pivot]\n",
    "    right = [x for x in arr if x > pivot]\n",
    "    return quicksort(left) + middle + quicksort(right)\n",
    "\n",
    "print(quicksort([3,6,8,10,1,2,1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "NwS_hu4xL9eo"
   },
   "source": [
    "### Basic data types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "DL5sMSZ9L9eq"
   },
   "source": [
    "#### Numbers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "MGS0XEWoL9er"
   },
   "source": [
    "Integers and floats work as you would expect from other languages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "KheDr_zDL9es",
    "outputId": "1db9f4d3-2e0d-4008-f78a-161ed52c4359"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 <class 'int'>\n"
     ]
    }
   ],
   "source": [
    "x = 3\n",
    "print(x, type(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "sk_8DFcuL9ey",
    "outputId": "dd60a271-3457-465d-e16a-41acf12a56ab"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n",
      "2\n",
      "6\n",
      "9\n"
     ]
    }
   ],
   "source": [
    "print(x + 1)   # Addition\n",
    "print(x - 1)   # Subtraction\n",
    "print(x * 2)   # Multiplication\n",
    "print(x ** 2)  # Exponentiation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "U4Jl8K0tL9e4",
    "outputId": "07e3db14-3781-42b7-8ba6-042b3f9f72ba"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n",
      "8\n"
     ]
    }
   ],
   "source": [
    "x += 1\n",
    "print(x)\n",
    "x *= 2\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "w-nZ0Sg_L9e9",
    "outputId": "3aa579f8-9540-46ef-935e-be887781ecb4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'float'>\n",
      "2.5 3.5 5.0 6.25\n"
     ]
    }
   ],
   "source": [
    "y = 2.5\n",
    "print(type(y))\n",
    "print(y, y + 1, y * 2, y ** 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "r2A9ApyaL9fB"
   },
   "source": [
    "Note that unlike many languages, Python does not have unary increment (x++) or decrement (x--) operators.\n",
    "\n",
    "Python also has built-in types for long integers and complex numbers; you can find all of the details in the [documentation](https://docs.python.org/3.7/library/stdtypes.html#numeric-types-int-float-long-complex)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "EqRS7qhBL9fC"
   },
   "source": [
    "#### Booleans"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Nv_LIVOJL9fD"
   },
   "source": [
    "Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (`&&`, `||`, etc.):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "RvoImwgGL9fE",
    "outputId": "1517077b-edca-463f-857b-6a8c386cd387"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'bool'>\n"
     ]
    }
   ],
   "source": [
    "t, f = True, False\n",
    "print(type(t))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YQgmQfOgL9fI"
   },
   "source": [
    "Now we let's look at the operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "6zYm7WzCL9fK",
    "outputId": "f3cebe76-5af4-473a-8127-88a1fd60560f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "False\n",
      "True\n",
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "print(t and f) # Logical AND;\n",
    "print(t or f)  # Logical OR;\n",
    "print(not t)   # Logical NOT;\n",
    "print(t != f)  # Logical XOR;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UQnQWFEyL9fP"
   },
   "source": [
    "#### Strings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "AijEDtPFL9fP",
    "outputId": "2a6b0cd7-58f1-43cf-e6b7-bf940d532549"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello 5\n"
     ]
    }
   ],
   "source": [
    "hello = 'hello'   # String literals can use single quotes\n",
    "world = \"world\"   # or double quotes; it does not matter\n",
    "print(hello, len(hello))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "saDeaA7hL9fT",
    "outputId": "2837d0ab-9ae5-4053-d087-bfa0af81c344"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello world\n"
     ]
    }
   ],
   "source": [
    "hw = hello + ' ' + world  # String concatenation\n",
    "print(hw)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "Nji1_UjYL9fY",
    "outputId": "0149b0ca-425a-4a34-8e24-8dff7080922e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello world 12\n"
     ]
    }
   ],
   "source": [
    "hw12 = '{} {} {}'.format(hello, world, 12)  # string formatting\n",
    "print(hw12)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "bUpl35bIL9fc"
   },
   "source": [
    "String objects have a bunch of useful methods; for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 121
    },
    "colab_type": "code",
    "id": "VOxGatlsL9fd",
    "outputId": "ab009df3-8643-4d3e-f85f-a813b70db9cb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello\n",
      "HELLO\n",
      "  hello\n",
      " hello \n",
      "he(ell)(ell)o\n",
      "world\n"
     ]
    }
   ],
   "source": [
    "s = \"hello\"\n",
    "print(s.capitalize())  # Capitalize a string\n",
    "print(s.upper())       # Convert a string to uppercase; prints \"HELLO\"\n",
    "print(s.rjust(7))      # Right-justify a string, padding with spaces\n",
    "print(s.center(7))     # Center a string, padding with spaces\n",
    "print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another\n",
    "print('  world '.strip())  # Strip leading and trailing whitespace"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "06cayXLtL9fi"
   },
   "source": [
    "You can find a list of all string methods in the [documentation](https://docs.python.org/3.7/library/stdtypes.html#string-methods)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "p-6hClFjL9fk"
   },
   "source": [
    "### Containers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FD9H18eQL9fk"
   },
   "source": [
    "Python includes several built-in container types: lists, dictionaries, sets, and tuples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UsIWOe0LL9fn"
   },
   "source": [
    "#### Lists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wzxX7rgWL9fn"
   },
   "source": [
    "A list is the Python equivalent of an array, but is resizeable and can contain elements of different types:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "hk3A8pPcL9fp",
    "outputId": "b545939a-580c-4356-db95-7ad3670b46e4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 1, 2] 2\n",
      "2\n"
     ]
    }
   ],
   "source": [
    "xs = [3, 1, 2]   # Create a list\n",
    "print(xs, xs[2])\n",
    "print(xs[-1])     # Negative indices count from the end of the list; prints \"2\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "YCjCy_0_L9ft",
    "outputId": "417c54ff-170b-4372-9099-0f756f8e48af"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 1, 'foo']\n"
     ]
    }
   ],
   "source": [
    "xs[2] = 'foo'    # Lists can contain elements of different types\n",
    "print(xs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "vJ0x5cF-L9fx",
    "outputId": "a97731a3-70e1-4553-d9e0-2aea227cac80"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 1, 'foo', 'bar']\n"
     ]
    }
   ],
   "source": [
    "xs.append('bar') # Add a new element to the end of the list\n",
    "print(xs)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "cxVCNRTNL9f1",
    "outputId": "508fbe59-20aa-48b5-a1b2-f90363e7a104"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bar [3, 1, 'foo']\n"
     ]
    }
   ],
   "source": [
    "x = xs.pop()     # Remove and return the last element of the list\n",
    "print(x, xs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ilyoyO34L9f4"
   },
   "source": [
    "As usual, you can find all the gory details about lists in the [documentation](https://docs.python.org/3.7/tutorial/datastructures.html#more-on-lists)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ovahhxd_L9f5"
   },
   "source": [
    "#### Slicing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YeSYKhv9L9f6"
   },
   "source": [
    "In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 139
    },
    "colab_type": "code",
    "id": "ninq666bL9f6",
    "outputId": "c3c2ed92-7358-4fdb-bbc0-e90f82e7e941"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 2, 3, 4]\n",
      "[2, 3]\n",
      "[2, 3, 4]\n",
      "[0, 1]\n",
      "[0, 1, 2, 3, 4]\n",
      "[0, 1, 2, 3]\n",
      "[0, 1, 8, 9, 4]\n"
     ]
    }
   ],
   "source": [
    "nums = list(range(5))    # range is a built-in function that creates a list of integers\n",
    "print(nums)         # Prints \"[0, 1, 2, 3, 4]\"\n",
    "print(nums[2:4])    # Get a slice from index 2 to 4 (exclusive); prints \"[2, 3]\"\n",
    "print(nums[2:])     # Get a slice from index 2 to the end; prints \"[2, 3, 4]\"\n",
    "print(nums[:2])     # Get a slice from the start to index 2 (exclusive); prints \"[0, 1]\"\n",
    "print(nums[:])      # Get a slice of the whole list; prints [\"0, 1, 2, 3, 4]\"\n",
    "print(nums[:-1])    # Slice indices can be negative; prints [\"0, 1, 2, 3]\"\n",
    "nums[2:4] = [8, 9] # Assign a new sublist to a slice\n",
    "print(nums)         # Prints \"[0, 1, 8, 9, 4]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UONpMhF4L9f_"
   },
   "source": [
    "#### Loops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_DYz1j6QL9f_"
   },
   "source": [
    "You can loop over the elements of a list like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "4cCOysfWL9gA",
    "outputId": "560e46c7-279c-409a-838c-64bea8d321c4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cat\n",
      "dog\n",
      "monkey\n"
     ]
    }
   ],
   "source": [
    "animals = ['cat', 'dog', 'monkey']\n",
    "for animal in animals:\n",
    "    print(animal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KxIaQs7pL9gE"
   },
   "source": [
    "If you want access to the index of each element within the body of a loop, use the built-in `enumerate` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "JjGnDluWL9gF",
    "outputId": "81421905-17ea-4c5a-bcc0-176de19fd9bd"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#1: cat\n",
      "#2: dog\n",
      "#3: monkey\n"
     ]
    }
   ],
   "source": [
    "animals = ['cat', 'dog', 'monkey']\n",
    "for idx, animal in enumerate(animals):\n",
    "    print('#{}: {}'.format(idx + 1, animal))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "arrLCcMyL9gK"
   },
   "source": [
    "#### List comprehensions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "5Qn2jU_pL9gL"
   },
   "source": [
    "When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "IVNEwoMXL9gL",
    "outputId": "d571445b-055d-45f0-f800-24fd76ceec5a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 4, 9, 16]\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "squares = []\n",
    "for x in nums:\n",
    "    squares.append(x ** 2)\n",
    "print(squares)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "7DmKVUFaL9gQ"
   },
   "source": [
    "You can make this code simpler using a list comprehension:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "kZxsUfV6L9gR",
    "outputId": "4254a7d4-58ba-4f70-a963-20c46b485b72"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 4, 9, 16]\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "squares = [x ** 2 for x in nums]\n",
    "print(squares)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-D8ARK7tL9gV"
   },
   "source": [
    "List comprehensions can also contain conditions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "yUtgOyyYL9gV",
    "outputId": "1ae7ab58-8119-44dc-8e57-fda09197d026"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 4, 16]\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "even_squares = [x ** 2 for x in nums if x % 2 == 0]\n",
    "print(even_squares)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "H8xsUEFpL9gZ"
   },
   "source": [
    "#### Dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kkjAGMAJL9ga"
   },
   "source": [
    "A dictionary stores (key, value) pairs, similar to a `Map` in Java or an object in Javascript. You can use it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "XBYI1MrYL9gb",
    "outputId": "8e24c1da-0fc0-4b4c-a3e6-6f758a53b7da"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cute\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data\n",
    "print(d['cat'])       # Get an entry from a dictionary; prints \"cute\"\n",
    "print('cat' in d)     # Check if a dictionary has a given key; prints \"True\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "pS7e-G-HL9gf",
    "outputId": "feb4bf18-c0a3-42a2-eaf5-3fc390f36dcf"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "wet\n"
     ]
    }
   ],
   "source": [
    "d['fish'] = 'wet'    # Set an entry in a dictionary\n",
    "print(d['fish'])      # Prints \"wet\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 165
    },
    "colab_type": "code",
    "id": "tFY065ItL9gi",
    "outputId": "7e42a5f0-1856-4608-a927-0930ab37a66c"
   },
   "outputs": [
    {
     "ename": "KeyError",
     "evalue": "'monkey'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-26-78fc9745d9cf>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'monkey'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m  \u001b[0;31m# KeyError: 'monkey' not a key of d\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m: 'monkey'"
     ]
    }
   ],
   "source": [
    "print(d['monkey'])  # KeyError: 'monkey' not a key of d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "8TjbEWqML9gl",
    "outputId": "ef14d05e-401d-4d23-ed1a-0fe6b4c77d6f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n",
      "wet\n"
     ]
    }
   ],
   "source": [
    "print(d.get('monkey', 'N/A'))  # Get an element with a default; prints \"N/A\"\n",
    "print(d.get('fish', 'N/A'))    # Get an element with a default; prints \"wet\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "0EItdNBJL9go",
    "outputId": "652a950f-b0c2-4623-98bd-0191b300cd57"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "del d['fish']        # Remove an element from a dictionary\n",
    "print(d.get('fish', 'N/A')) # \"fish\" is no longer a key; prints \"N/A\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wqm4dRZNL9gr"
   },
   "source": [
    "You can find all you need to know about dictionaries in the [documentation](https://docs.python.org/2/library/stdtypes.html#dict)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "IxwEqHlGL9gr"
   },
   "source": [
    "It is easy to iterate over the keys in a dictionary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "rYfz7ZKNL9gs",
    "outputId": "155bdb17-3179-4292-c832-8166e955e942"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A person has 2 legs\n",
      "A cat has 4 legs\n",
      "A spider has 8 legs\n"
     ]
    }
   ],
   "source": [
    "d = {'person': 2, 'cat': 4, 'spider': 8}\n",
    "for animal, legs in d.items():\n",
    "    print('A {} has {} legs'.format(animal, legs))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "17sxiOpzL9gz"
   },
   "source": [
    "Dictionary comprehensions: These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "8PB07imLL9gz",
    "outputId": "e9ddf886-39ed-4f35-dd80-64a19d2eec9b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{0: 0, 2: 4, 4: 16}\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}\n",
    "print(even_num_to_square)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "V9MHfUdvL9g2"
   },
   "source": [
    "#### Sets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Rpm4UtNpL9g2"
   },
   "source": [
    "A set is an unordered collection of distinct elements. As a simple example, consider the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "MmyaniLsL9g2",
    "outputId": "8f152d48-0a07-432a-cf98-8de4fd57ddbb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "animals = {'cat', 'dog'}\n",
    "print('cat' in animals)   # Check if an element is in a set; prints \"True\"\n",
    "print('fish' in animals)  # prints \"False\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "ElJEyK86L9g6",
    "outputId": "b9d7dab9-5a98-41cd-efbc-786d0c4377f7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "3\n"
     ]
    }
   ],
   "source": [
    "animals.add('fish')      # Add an element to a set\n",
    "print('fish' in animals)\n",
    "print(len(animals))       # Number of elements in a set;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "5uGmrxdPL9g9",
    "outputId": "e644d24c-26c6-4b43-ab15-8aa81fe884d4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n",
      "2\n"
     ]
    }
   ],
   "source": [
    "animals.add('cat')       # Adding an element that is already in the set does nothing\n",
    "print(len(animals))       \n",
    "animals.remove('cat')    # Remove an element from a set\n",
    "print(len(animals))       "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "zk2DbvLKL9g_"
   },
   "source": [
    "*Loops*: Iterating over a set has the same syntax as iterating over a list; however since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "K47KYNGyL9hA",
    "outputId": "4477f897-4355-4816-b39b-b93ffbac4bf0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#1: dog\n",
      "#2: fish\n",
      "#3: cat\n"
     ]
    }
   ],
   "source": [
    "animals = {'cat', 'dog', 'fish'}\n",
    "for idx, animal in enumerate(animals):\n",
    "    print('#{}: {}'.format(idx + 1, animal))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "puq4S8buL9hC"
   },
   "source": [
    "Set comprehensions: Like lists and dictionaries, we can easily construct sets using set comprehensions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "iw7k90k3L9hC",
    "outputId": "72d6b824-6d31-47b2-f929-4cf434590ee5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{0, 1, 2, 3, 4, 5}\n"
     ]
    }
   ],
   "source": [
    "from math import sqrt\n",
    "print({int(sqrt(x)) for x in range(30)})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "qPsHSKB1L9hF"
   },
   "source": [
    "#### Tuples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kucc0LKVL9hG"
   },
   "source": [
    "A tuple is an (immutable) ordered list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. Here is a trivial example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "9wHUyTKxL9hH",
    "outputId": "cdc5f620-04fe-4b0b-df7a-55b061d23d88"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'tuple'>\n",
      "5\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys\n",
    "t = (5, 6)       # Create a tuple\n",
    "print(type(t))\n",
    "print(d[t])       \n",
    "print(d[(1, 2)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 165
    },
    "colab_type": "code",
    "id": "HoO8zYKzL9hJ",
    "outputId": "28862bfc-0298-40d7-f8c4-168e109d2d93"
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'tuple' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-38-c8aeb8cd20ae>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "t[0] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "AXA4jrEOL9hM"
   },
   "source": [
    "### Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "WaRms-QfL9hN"
   },
   "source": [
    "Python functions are defined using the `def` keyword. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "kiMDUr58L9hN",
    "outputId": "9f53bf9a-7b2a-4c51-9def-398e4677cd6c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "negative\n",
      "zero\n",
      "positive\n"
     ]
    }
   ],
   "source": [
    "def sign(x):\n",
    "    if x > 0:\n",
    "        return 'positive'\n",
    "    elif x < 0:\n",
    "        return 'negative'\n",
    "    else:\n",
    "        return 'zero'\n",
    "\n",
    "for x in [-1, 0, 1]:\n",
    "    print(sign(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U-QJFt8TL9hR"
   },
   "source": [
    "We will often define functions to take optional keyword arguments, like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "PfsZ3DazL9hR",
    "outputId": "6e6af832-67d8-4d8c-949b-335927684ae3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, Bob!\n",
      "HELLO, FRED\n"
     ]
    }
   ],
   "source": [
    "def hello(name, loud=False):\n",
    "    if loud:\n",
    "        print('HELLO, {}'.format(name.upper()))\n",
    "    else:\n",
    "        print('Hello, {}!'.format(name))\n",
    "\n",
    "hello('Bob')\n",
    "hello('Fred', loud=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ObA9PRtQL9hT"
   },
   "source": [
    "### Classes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hAzL_lTkL9hU"
   },
   "source": [
    "The syntax for defining classes in Python is straightforward:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "RWdbaGigL9hU",
    "outputId": "4f6615c5-75a7-4ce4-8ea1-1e7f5e4e9fc3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, Fred!\n",
      "HELLO, FRED\n"
     ]
    }
   ],
   "source": [
    "class Greeter:\n",
    "\n",
    "    # Constructor\n",
    "    def __init__(self, name):\n",
    "        self.name = name  # Create an instance variable\n",
    "\n",
    "    # Instance method\n",
    "    def greet(self, loud=False):\n",
    "        if loud:\n",
    "          print('HELLO, {}'.format(self.name.upper()))\n",
    "        else:\n",
    "          print('Hello, {}!'.format(self.name))\n",
    "\n",
    "g = Greeter('Fred')  # Construct an instance of the Greeter class\n",
    "g.greet()            # Call an instance method; prints \"Hello, Fred\"\n",
    "g.greet(loud=True)   # Call an instance method; prints \"HELLO, FRED!\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "3cfrOV4dL9hW"
   },
   "source": [
    "## Numpy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "fY12nHhyL9hX"
   },
   "source": [
    "Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find this [tutorial](http://wiki.scipy.org/NumPy_for_Matlab_Users) useful to get started with Numpy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "lZMyAdqhL9hY"
   },
   "source": [
    "To use Numpy, we first need to import the `numpy` package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "58QdX8BLL9hZ"
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "DDx6v1EdL9hb"
   },
   "source": [
    "### Arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "f-Zv3f7LL9hc"
   },
   "source": [
    "A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_eMTRnZRL9hc"
   },
   "source": [
    "We can initialize numpy arrays from nested Python lists, and access elements using square brackets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "-l3JrGxCL9hc",
    "outputId": "8d9dad18-c734-4a8a-ca8c-44060a40fb79"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'> (3,) 1 2 3\n",
      "[5 2 3]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([1, 2, 3])  # Create a rank 1 array\n",
    "print(type(a), a.shape, a[0], a[1], a[2])\n",
    "a[0] = 5                 # Change an element of the array\n",
    "print(a)                  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "ma6mk-kdL9hh",
    "outputId": "0b54ff2f-e7f1-4b30-c653-9bf81cb8fbb0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2 3]\n",
      " [4 5 6]]\n"
     ]
    }
   ],
   "source": [
    "b = np.array([[1,2,3],[4,5,6]])   # Create a rank 2 array\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "ymfSHAwtL9hj",
    "outputId": "5bd292d8-c751-43b9-d480-f357dde52342"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2, 3)\n",
      "1 2 4\n"
     ]
    }
   ],
   "source": [
    "print(b.shape)\n",
    "print(b[0, 0], b[0, 1], b[1, 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "F2qwdyvuL9hn"
   },
   "source": [
    "Numpy also provides many functions to create arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "mVTN_EBqL9hn",
    "outputId": "d267c65f-ba90-4043-cedb-f468ab1bcc5d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0. 0.]\n",
      " [0. 0.]]\n"
     ]
    }
   ],
   "source": [
    "a = np.zeros((2,2))  # Create an array of all zeros\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "skiKlNmlL9h5",
    "outputId": "7d1ec1b5-a1fe-4f44-cbe3-cdeacad425f1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1. 1.]]\n"
     ]
    }
   ],
   "source": [
    "b = np.ones((1,2))   # Create an array of all ones\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "HtFsr03bL9h7",
    "outputId": "2688b157-2fad-4fc6-f20b-8633207f0326"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[7 7]\n",
      " [7 7]]\n"
     ]
    }
   ],
   "source": [
    "c = np.full((2,2), 7) # Create a constant array\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "-QcALHvkL9h9",
    "outputId": "5035d6fe-cb7e-4222-c972-55fe23c9d4c0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1. 0.]\n",
      " [0. 1.]]\n"
     ]
    }
   ],
   "source": [
    "d = np.eye(2)        # Create a 2x2 identity matrix\n",
    "print(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "RCpaYg9qL9iA",
    "outputId": "25f0b387-39cf-42f3-8701-de860cc75e2e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.5293933  0.83232089]\n",
      " [0.54040558 0.42955453]]\n"
     ]
    }
   ],
   "source": [
    "e = np.random.random((2,2)) # Create an array filled with random values\n",
    "print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jI5qcSDfL9iC"
   },
   "source": [
    "### Array indexing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "M-E4MUeVL9iC"
   },
   "source": [
    "Numpy offers several ways to index into arrays."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "QYv4JyIEL9iD"
   },
   "source": [
    "Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "wLWA0udwL9iD",
    "outputId": "99f08618-c513-4982-8982-b146fc72dab3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2 3]\n",
      " [6 7]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Create the following rank 2 array with shape (3, 4)\n",
    "# [[ 1  2  3  4]\n",
    "#  [ 5  6  7  8]\n",
    "#  [ 9 10 11 12]]\n",
    "a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n",
    "\n",
    "# Use slicing to pull out the subarray consisting of the first 2 rows\n",
    "# and columns 1 and 2; b is the following array of shape (2, 2):\n",
    "# [[2 3]\n",
    "#  [6 7]]\n",
    "b = a[:2, 1:3]\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KahhtZKYL9iF"
   },
   "source": [
    "A slice of an array is a view into the same data, so modifying it will modify the original array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "1kmtaFHuL9iG",
    "outputId": "ee3ab60c-4064-4a9e-b04c-453d3955f1d1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n",
      "77\n"
     ]
    }
   ],
   "source": [
    "print(a[0, 1])\n",
    "b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]\n",
    "print(a[0, 1]) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_Zcf3zi-L9iI"
   },
   "source": [
    "You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. Note that this is quite different from the way that MATLAB handles array slicing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "G6lfbPuxL9iJ",
    "outputId": "a225fe9d-2a29-4e14-a243-2b7d583bd4bc"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1  2  3  4]\n",
      " [ 5  6  7  8]\n",
      " [ 9 10 11 12]]\n"
     ]
    }
   ],
   "source": [
    "# Create the following rank 2 array with shape (3, 4)\n",
    "a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "NCye3NXhL9iL"
   },
   "source": [
    "Two ways of accessing the data in the middle row of the array.\n",
    "Mixing integer indexing with slices yields an array of lower rank,\n",
    "while using only slices yields an array of the same rank as the\n",
    "original array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "EOiEMsmNL9iL",
    "outputId": "ab2ebe48-9002-45a8-9462-fd490b467f40"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5 6 7 8] (4,)\n",
      "[[5 6 7 8]] (1, 4)\n",
      "[[5 6 7 8]] (1, 4)\n"
     ]
    }
   ],
   "source": [
    "row_r1 = a[1, :]    # Rank 1 view of the second row of a  \n",
    "row_r2 = a[1:2, :]  # Rank 2 view of the second row of a\n",
    "row_r3 = a[[1], :]  # Rank 2 view of the second row of a\n",
    "print(row_r1, row_r1.shape)\n",
    "print(row_r2, row_r2.shape)\n",
    "print(row_r3, row_r3.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 104
    },
    "colab_type": "code",
    "id": "JXu73pfDL9iN",
    "outputId": "6c589b85-e9b0-4c13-a39d-4cd9fb2f41ac"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 2  6 10] (3,)\n",
      "\n",
      "[[ 2]\n",
      " [ 6]\n",
      " [10]] (3, 1)\n"
     ]
    }
   ],
   "source": [
    "# We can make the same distinction when accessing columns of an array:\n",
    "col_r1 = a[:, 1]\n",
    "col_r2 = a[:, 1:2]\n",
    "print(col_r1, col_r1.shape)\n",
    "print()\n",
    "print(col_r2, col_r2.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "VP3916bOL9iP"
   },
   "source": [
    "Integer array indexing: When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "TBnWonIDL9iP",
    "outputId": "c29fa2cd-234e-4765-c70a-6889acc63573"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 4 5]\n",
      "[1 4 5]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([[1,2], [3, 4], [5, 6]])\n",
    "\n",
    "# An example of integer array indexing.\n",
    "# The returned array will have shape (3,) and \n",
    "print(a[[0, 1, 2], [0, 1, 0]])\n",
    "\n",
    "# The above example of integer array indexing is equivalent to this:\n",
    "print(np.array([a[0, 0], a[1, 1], a[2, 0]]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "n7vuati-L9iR",
    "outputId": "c3e9ba14-f66e-4202-999e-2e1aed5bd631"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2 2]\n",
      "[2 2]\n"
     ]
    }
   ],
   "source": [
    "# When using integer array indexing, you can reuse the same\n",
    "# element from the source array:\n",
    "print(a[[0, 0], [1, 1]])\n",
    "\n",
    "# Equivalent to the previous integer array indexing example\n",
    "print(np.array([a[0, 1], a[0, 1]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kaipSLafL9iU"
   },
   "source": [
    "One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "ehqsV7TXL9iU",
    "outputId": "de509c40-4ee4-4b7c-e75d-1a936a3350e7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1  2  3]\n",
      " [ 4  5  6]\n",
      " [ 7  8  9]\n",
      " [10 11 12]]\n"
     ]
    }
   ],
   "source": [
    "# Create a new array from which we will select elements\n",
    "a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "pAPOoqy5L9iV",
    "outputId": "f812e29b-9218-4767-d3a8-e9854e754e68"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 1  6  7 11]\n"
     ]
    }
   ],
   "source": [
    "# Create an array of indices\n",
    "b = np.array([0, 2, 0, 1])\n",
    "\n",
    "# Select one element from each row of a using the indices in b\n",
    "print(a[np.arange(4), b])  # Prints \"[ 1  6  7 11]\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "6v1PdI1DL9ib",
    "outputId": "89f50f82-de1b-4417-e55c-edbc0ee07584"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[11  2  3]\n",
      " [ 4  5 16]\n",
      " [17  8  9]\n",
      " [10 21 12]]\n"
     ]
    }
   ],
   "source": [
    "# Mutate one element from each row of a using the indices in b\n",
    "a[np.arange(4), b] += 10\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kaE8dBGgL9id"
   },
   "source": [
    "Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "32PusjtKL9id",
    "outputId": "8782e8ec-b78d-44d7-8141-23e39750b854"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[False False]\n",
      " [ True  True]\n",
      " [ True  True]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "a = np.array([[1,2], [3, 4], [5, 6]])\n",
    "\n",
    "bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;\n",
    "                    # this returns a numpy array of Booleans of the same\n",
    "                    # shape as a, where each slot of bool_idx tells\n",
    "                    # whether that element of a is > 2.\n",
    "\n",
    "print(bool_idx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "cb2IRMXaL9if",
    "outputId": "5983f208-3738-472d-d6ab-11fe85b36c95"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3 4 5 6]\n",
      "[3 4 5 6]\n"
     ]
    }
   ],
   "source": [
    "# We use boolean array indexing to construct a rank 1 array\n",
    "# consisting of the elements of a corresponding to the True values\n",
    "# of bool_idx\n",
    "print(a[bool_idx])\n",
    "\n",
    "# We can do all of the above in a single concise statement:\n",
    "print(a[a > 2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CdofMonAL9ih"
   },
   "source": [
    "For brevity we have left out a lot of details about numpy array indexing; if you want to know more you should read the documentation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jTctwqdQL9ih"
   },
   "source": [
    "### Datatypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kSZQ1WkIL9ih"
   },
   "source": [
    "Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "4za4O0m5L9ih",
    "outputId": "2ea4fb80-a4df-43f9-c162-5665895c13ae"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "int64 float64 int64\n"
     ]
    }
   ],
   "source": [
    "x = np.array([1, 2])  # Let numpy choose the datatype\n",
    "y = np.array([1.0, 2.0])  # Let numpy choose the datatype\n",
    "z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype\n",
    "\n",
    "print(x.dtype, y.dtype, z.dtype)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RLVIsZQpL9ik"
   },
   "source": [
    "You can read all about numpy datatypes in the [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TuB-fdhIL9ik"
   },
   "source": [
    "### Array math"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "18e8V8elL9ik"
   },
   "source": [
    "Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "gHKvBrSKL9il",
    "outputId": "a8a924b1-9d60-4b68-8fd3-e4657ae3f08b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 6.  8.]\n",
      " [10. 12.]]\n",
      "[[ 6.  8.]\n",
      " [10. 12.]]\n"
     ]
    }
   ],
   "source": [
    "x = np.array([[1,2],[3,4]], dtype=np.float64)\n",
    "y = np.array([[5,6],[7,8]], dtype=np.float64)\n",
    "\n",
    "# Elementwise sum; both produce the array\n",
    "print(x + y)\n",
    "print(np.add(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "1fZtIAMxL9in",
    "outputId": "122f1380-6144-4d6c-9d31-f62d839889a2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-4. -4.]\n",
      " [-4. -4.]]\n",
      "[[-4. -4.]\n",
      " [-4. -4.]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise difference; both produce the array\n",
    "print(x - y)\n",
    "print(np.subtract(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "nil4AScML9io",
    "outputId": "038c8bb2-122b-4e59-c0a8-a091014fe68e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5. 12.]\n",
      " [21. 32.]]\n",
      "[[ 5. 12.]\n",
      " [21. 32.]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise product; both produce the array\n",
    "print(x * y)\n",
    "print(np.multiply(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "0JoA4lH6L9ip",
    "outputId": "12351a74-7871-4bc2-97ce-a508bf4810da"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.2        0.33333333]\n",
      " [0.42857143 0.5       ]]\n",
      "[[0.2        0.33333333]\n",
      " [0.42857143 0.5       ]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise division; both produce the array\n",
    "# [[ 0.2         0.33333333]\n",
    "#  [ 0.42857143  0.5       ]]\n",
    "print(x / y)\n",
    "print(np.divide(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "g0iZuA6bL9ir",
    "outputId": "29927dda-4167-4aa8-fbda-9008b09e4356"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1.         1.41421356]\n",
      " [1.73205081 2.        ]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise square root; produces the array\n",
    "# [[ 1.          1.41421356]\n",
    "#  [ 1.73205081  2.        ]]\n",
    "print(np.sqrt(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "a5d_uujuL9it"
   },
   "source": [
    "Note that unlike MATLAB, `*` is elementwise multiplication, not matrix multiplication. We instead use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "I3FnmoSeL9iu",
    "outputId": "46f4575a-2e5e-4347-a34e-0cc5bd280110"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "219\n",
      "219\n"
     ]
    }
   ],
   "source": [
    "x = np.array([[1,2],[3,4]])\n",
    "y = np.array([[5,6],[7,8]])\n",
    "\n",
    "v = np.array([9,10])\n",
    "w = np.array([11, 12])\n",
    "\n",
    "# Inner product of vectors; both produce 219\n",
    "print(v.dot(w))\n",
    "print(np.dot(v, w))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "vmxPbrHASVeA"
   },
   "source": [
    "You can also use the `@` operator which is equivalent to numpy's `dot` operator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "vyrWA-mXSdtt",
    "outputId": "a9aae545-2c93-4649-b220-b097655955f6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "219\n"
     ]
    }
   ],
   "source": [
    "print(v @ w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "zvUODeTxL9iw",
    "outputId": "4093fc76-094f-4453-a421-a212b5226968"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[29 67]\n",
      "[29 67]\n",
      "[29 67]\n"
     ]
    }
   ],
   "source": [
    "# Matrix / vector product; both produce the rank 1 array [29 67]\n",
    "print(x.dot(v))\n",
    "print(np.dot(x, v))\n",
    "print(x @ v)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 121
    },
    "colab_type": "code",
    "id": "3V_3NzNEL9iy",
    "outputId": "af2a89f9-af5d-47a6-9ad2-06a84b521b94"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[19 22]\n",
      " [43 50]]\n",
      "[[19 22]\n",
      " [43 50]]\n",
      "[[19 22]\n",
      " [43 50]]\n"
     ]
    }
   ],
   "source": [
    "# Matrix / matrix product; both produce the rank 2 array\n",
    "# [[19 22]\n",
    "#  [43 50]]\n",
    "print(x.dot(y))\n",
    "print(np.dot(x, y))\n",
    "print(x @ y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FbE-1If_L9i0"
   },
   "source": [
    "Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "DZUdZvPrL9i0",
    "outputId": "99cad470-d692-4b25-91c9-a57aa25f4c6e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "[4 6]\n",
      "[3 7]\n"
     ]
    }
   ],
   "source": [
    "x = np.array([[1,2],[3,4]])\n",
    "\n",
    "print(np.sum(x))  # Compute sum of all elements; prints \"10\"\n",
    "print(np.sum(x, axis=0))  # Compute sum of each column; prints \"[4 6]\"\n",
    "print(np.sum(x, axis=1))  # Compute sum of each row; prints \"[3 7]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ahdVW4iUL9i3"
   },
   "source": [
    "You can find the full list of mathematical functions provided by numpy in the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).\n",
    "\n",
    "Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 104
    },
    "colab_type": "code",
    "id": "63Yl1f3oL9i3",
    "outputId": "c75ac7ba-4351-42f8-a09c-a4e0d966ab50"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2]\n",
      " [3 4]]\n",
      "transpose\n",
      " [[1 3]\n",
      " [2 4]]\n"
     ]
    }
   ],
   "source": [
    "print(x)\n",
    "print(\"transpose\\n\", x.T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 104
    },
    "colab_type": "code",
    "id": "mkk03eNIL9i4",
    "outputId": "499eec5a-55b7-473a-d4aa-9d023d63885a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2 3]]\n",
      "transpose\n",
      " [[1]\n",
      " [2]\n",
      " [3]]\n"
     ]
    }
   ],
   "source": [
    "v = np.array([[1,2,3]])\n",
    "print(v )\n",
    "print(\"transpose\\n\", v.T)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "REfLrUTcL9i7"
   },
   "source": [
    "### Broadcasting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "EygGAMWqL9i7"
   },
   "source": [
    "Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.\n",
    "\n",
    "For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "WEEvkV1ZL9i7",
    "outputId": "3896d03c-3ece-4aa8-f675-aef3a220574d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  2  4]\n",
      " [ 5  5  7]\n",
      " [ 8  8 10]\n",
      " [11 11 13]]\n"
     ]
    }
   ],
   "source": [
    "# We will add the vector v to each row of the matrix x,\n",
    "# storing the result in the matrix y\n",
    "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n",
    "v = np.array([1, 0, 1])\n",
    "y = np.empty_like(x)   # Create an empty matrix with the same shape as x\n",
    "\n",
    "# Add the vector v to each row of the matrix x with an explicit loop\n",
    "for i in range(4):\n",
    "    y[i, :] = x[i, :] + v\n",
    "\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "2OlXXupEL9i-"
   },
   "source": [
    "This works; however when the matrix `x` is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`. We could implement this approach like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "vS7UwAQQL9i-",
    "outputId": "8621e502-c25d-4a18-c973-886dbfd1df36"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 0 1]\n",
      " [1 0 1]\n",
      " [1 0 1]\n",
      " [1 0 1]]\n"
     ]
    }
   ],
   "source": [
    "vv = np.tile(v, (4, 1))  # Stack 4 copies of v on top of each other\n",
    "print(vv)                # Prints \"[[1 0 1]\n",
    "                         #          [1 0 1]\n",
    "                         #          [1 0 1]\n",
    "                         #          [1 0 1]]\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "N0hJphSIL9jA",
    "outputId": "def6a757-170c-43bf-8728-732dfb133273"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  2  4]\n",
      " [ 5  5  7]\n",
      " [ 8  8 10]\n",
      " [11 11 13]]\n"
     ]
    }
   ],
   "source": [
    "y = x + vv  # Add x and vv elementwise\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "zHos6RJnL9jB"
   },
   "source": [
    "Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "vnYFb-gYL9jC",
    "outputId": "df3bea8a-ad72-4a83-90bb-306b55c6fb93"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  2  4]\n",
      " [ 5  5  7]\n",
      " [ 8  8 10]\n",
      " [11 11 13]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# We will add the vector v to each row of the matrix x,\n",
    "# storing the result in the matrix y\n",
    "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n",
    "v = np.array([1, 0, 1])\n",
    "y = x + v  # Add v to each row of x using broadcasting\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "08YyIURKL9jH"
   },
   "source": [
    "The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works as if v actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.\n",
    "\n",
    "Broadcasting two arrays together follows these rules:\n",
    "\n",
    "1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.\n",
    "2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.\n",
    "3. The arrays can be broadcast together if they are compatible in all dimensions.\n",
    "4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.\n",
    "5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension\n",
    "\n",
    "If this explanation does not make sense, try reading the explanation from the [documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) or this [explanation](http://wiki.scipy.org/EricsBroadcastingDoc).\n",
    "\n",
    "Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).\n",
    "\n",
    "Here are some applications of broadcasting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "EmQnwoM9L9jH",
    "outputId": "f59e181e-e2d4-416c-d094-c4d003ce8509"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 4  5]\n",
      " [ 8 10]\n",
      " [12 15]]\n"
     ]
    }
   ],
   "source": [
    "# Compute outer product of vectors\n",
    "v = np.array([1,2,3])  # v has shape (3,)\n",
    "w = np.array([4,5])    # w has shape (2,)\n",
    "# To compute an outer product, we first reshape v to be a column\n",
    "# vector of shape (3, 1); we can then broadcast it against w to yield\n",
    "# an output of shape (3, 2), which is the outer product of v and w:\n",
    "\n",
    "print(np.reshape(v, (3, 1)) * w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "PgotmpcnL9jK",
    "outputId": "567763d3-073a-4e3c-9ebe-6c7d2b6d3446"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2 4 6]\n",
      " [5 7 9]]\n"
     ]
    }
   ],
   "source": [
    "# Add a vector to each row of a matrix\n",
    "x = np.array([[1,2,3], [4,5,6]])\n",
    "# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),\n",
    "# giving the following matrix:\n",
    "\n",
    "print(x + v)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "T5hKS1QaL9jK",
    "outputId": "5f14ac5c-7a21-4216-e91d-cfce5720a804"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5  6  7]\n",
      " [ 9 10 11]]\n"
     ]
    }
   ],
   "source": [
    "# Add a vector to each column of a matrix\n",
    "# x has shape (2, 3) and w has shape (2,).\n",
    "# If we transpose x then it has shape (3, 2) and can be broadcast\n",
    "# against w to yield a result of shape (3, 2); transposing this result\n",
    "# yields the final result of shape (2, 3) which is the matrix x with\n",
    "# the vector w added to each column. Gives the following matrix:\n",
    "\n",
    "print((x.T + w).T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "JDUrZUl6L9jN",
    "outputId": "53e99a89-c599-406d-9fe3-7aa35ae5fb90"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5  6  7]\n",
      " [ 9 10 11]]\n"
     ]
    }
   ],
   "source": [
    "# Another solution is to reshape w to be a row vector of shape (2, 1);\n",
    "# we can then broadcast it directly against x to produce the same\n",
    "# output.\n",
    "print(x + np.reshape(w, (2, 1)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "VzrEo4KGL9jP",
    "outputId": "53c9d4cc-32d5-46b0-d090-53c7db57fb32"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  4  6]\n",
      " [ 8 10 12]]\n"
     ]
    }
   ],
   "source": [
    "# Multiply a matrix by a constant:\n",
    "# x has shape (2, 3). Numpy treats scalars as arrays of shape ();\n",
    "# these can be broadcast together to shape (2, 3), producing the\n",
    "# following array:\n",
    "print(x * 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "89e2FXxFL9jQ"
   },
   "source": [
    "Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "iF3ZtwVNL9jQ"
   },
   "source": [
    "This brief overview has touched on many of the important things that you need to know about numpy, but is far from complete. Check out the [numpy reference](http://docs.scipy.org/doc/numpy/reference/) to find out much more about numpy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tEINf4bEL9jR"
   },
   "source": [
    "## Matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0hgVWLaXL9jR"
   },
   "source": [
    "Matplotlib is a plotting library. In this section give a brief introduction to the `matplotlib.pyplot` module, which provides a plotting system similar to that of MATLAB."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "cmh_7c6KL9jR"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jOsaA5hGL9jS"
   },
   "source": [
    "By running this special iPython command, we will be displaying plots inline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ijpsmwGnL9jT"
   },
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U5Z_oMoLL9jV"
   },
   "source": [
    "### Plotting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "6QyFJ7dhL9jV"
   },
   "source": [
    "The most important function in `matplotlib` is plot, which allows you to plot 2D data. Here is a simple example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 282
    },
    "colab_type": "code",
    "id": "pua52BGeL9jW",
    "outputId": "9ac3ee0f-7ff7-463b-b901-c33d21a2b10c"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x121741990>]"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAD4CAYAAADhNOGaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dd3hU95Xw8e8ZVVRBSEINAaIXgQCZYjvFNsYUG9yDnTgkcdbObuxN22zKm02yfpNNNptN2WyaY8d2bMfYwYVujFtcAIMokugIUVRRAYSEunTePzTkVbCompk75XyeZ56ZuXPv3KNhmHPvub8iqooxxpjQ5XI6AGOMMc6yRGCMMSHOEoExxoQ4SwTGGBPiLBEYY0yIC3c6gCuRnJysw4cPdzoMY4wJKNu2batT1ZRzlwdkIhg+fDgFBQVOh2GMMQFFRI72tdxKQ8YYE+IsERhjTIizRGCMMSHOEoExxoQ4SwTGGBPiPJIIROSPIlIjIrvO87qIyP+ISImIFInItF6vLRWRg+7bUk/EY4wx5tJ56ozgSWDeBV6fD4x23x4AfgsgIknA94CZwAzgeyIyyEMxGWOMuQQe6Uegqu+IyPALrLIY+JP2jHm9WUQGikg68HFgg6qeABCRDfQklOc8EVewae/sZsvhE9Q0tnK6pYPG1k7SBw5g5ogksgYNQEScDtEYv3H8dCvvHqzjTFsnXd1KtyojU+OYnTOY6Igwp8PzK77qUJYJlPV6Xu5edr7lHyIiD9BzNkF2drZ3ovRDqkpReQMvbS9nZWElJ5s7+lwvLSGaORNSefj60QxJiPZxlMb4h4bmDp7beoxXd1Wzs+xUn+tER7i4ZmQyi/IyuGVyBi6XHUD5KhH09UnrBZZ/eKHqo8CjAPn5+SExm05NYyvfeXkXr+05TlS4ixsnDOHWvExGpsaREB1OXHQ4R+qa2XLkBJtL63l+axnLt5Xz+WtzePBjOcRHRzj9JxjjE6rKyzsq+OGavdSfaSc3M5F/mTuGOROGkBwXRZj7bLmw/BRv7avhjX01fGnZTv606SiPLJ7IxIxEh/8CZ4mnZihzl4ZWq+qkPl77PfC2qj7nfr6fnrLQx4GPq+qDfa13Pvn5+RrMQ0yoKisLK/neyt00t3fx5Tmj+dSsYSRc5If9WH0zP31tPysLK0mOi+S3n5rOVcOTfBS1Mc44UneGb71UzKbSeqZmD+QHt0666A97d7fy4vZyfrxuHyeb27lv1jC+vXA8UeHBXTISkW2qmv+h5T5KBAuBh4AF9FwY/h9VneG+WLwNONuKaDsw/ew1g/MJ5kTQ1a18+6Vini8oI2/oQH561xRGpcZd1nsUlzfwpWU7KD/Zwo9uz+WO6VleitYYZ+0sO8Vnn9hCV7fyjfnjuOeq7Msq9TS0dPDzDQd4cuMRZuUk8fv78kkcELxn0l5NBCLyHD1H98nAcXpaAkUAqOrvpOcq5v/ScyG4Gfisqha4t/0c8G33W/1QVZ+42P6CNRF0dHXzled3srqoii9eN5KvzBlDeNiVNexqaO7gH5/dxsZD9fzTx0fyL3PHWi3UBJV3DtTyhWe2kRwXxdP3z2DY4Ngrfq9XdlTw9eWF5CTH8eTnriI9cYAHI/UfXj8j8KVgTARtnV089OcdbNhznG/NH8eDHxvZ7/fs6Ormuyt289yWY3zumhF895YJHojUGOetKqzkqy/sZFRqPE999ipSPdBA4r2DdXzhmW3ER4fz3D/MYnjylScWf3W+RGA9i/1AV7fyT89sZ8Oe4/z7ookeSQIAEWEu/uO2SXz2muH88f3DPPZuqUfe1xgnbTxUx1ee38nUoYNY9sAsjyQBgGtHJ/P8g7No7ejic09upeE8LfSCkSUCP/CT9ft4Y18NjyyeyNKrh3v0vUWE7yycwPxJafxgzV5WF1V69P2N8aXDdWf4x2e2Mzw5lsc+4/l6/sSMRH5/Xz5lJ5v5wjPbaO/s9uj7+ytLBA5bVVjJ7/9ayidnZvPp2cO9so8wl/DzT+Rx1fBBfPX5QrYeueC1eGP8UkNLB/c/tRWXwONL8y/aiu5KzRiRxH/eMZlNpfX82yu7CMTy+eWyROCgPZWn+dflReQPG8T3bpno1X1FR4Txh0/nkzloAA//eQenmtu9uj9jPKmrW3n4uR0cq2/mt5+a3q8Lw5fi9mlZPHz9KJ4vKOPJjUe8ui9/YInAIadbO3jwmQISBoTzm09NIzLc+/8UA2Mi+Z8lU6lrauNbLxWHxJGOCQ6Pv1fKOwdqeWTxJGblDPbJPr8yZwxzxqfyo3X7OHi80Sf7dIolAof8aO0+Kk628JtPTic13ndDQuRmJfK1uWNZt6uav2wr99l+jblSB4438tP1B5g7YQj3zBjqs/26XMKP75hMfFQ4X3lhJx1dwXu9wBKBA94vqeO5Lcf4h4/kMH2Y7wdbfeCjOczKSeL7K3dzpO6Mz/dvzKXq6Ormay8UEhcdzg9vy/X5wIrJcVH88LZcdlWc5ldvHPTpvn3JEoGPnWnr5BsvFpGTHMtXbhzjSAxhLuFnd+cR7hK+9pdCurutRGT802/fPkRxRQM/uHUSKfFRjsQwb1Iad0zL4tdvH2LHsZOOxOBtlgh87Cev7qPiVAs/uXOyo0PhZgwcwL/dPIFtR0+yfLuViIz/2VN5mv954yCLpmSwIDfd0Vi+t2gCaQnRfH15UVCWiCwR+NC2oyd4atNRls4eTr4fDAZ3x7QspmUP5D/X7QupzjPG/6kq31+1m4QBEfz7Iu+2qLsUCdERfH/RREpqmnh601Gnw/E4SwQ+0t2tPLJqD2kJ0fzrvLFOhwP0XAx7ZPEkTja3898b9jsdjjF/s7a4mi2HT/C1uWMYFBvpdDgAzBmfykdGJ/Pz1w9Q39TmdDgeZYnAR1YUVlBY3sDXbxpLTKSvpoG4uEmZiXxq1jCe2XyU3ZUNTodjDK0dXfzH2r2MS4tnyVX+MwmViPC9WybQ0t7FT18LrgMnSwQ+0NLexU9e3U9uZiK3Te1zAjZHfe3GsQyKieS7K3Zb3wLjuD+8U0rFqRa+d8tEwvxsxNxRqfEsvXo4y7aWUVwePAdOlgh84LF3S6lqaOU7C8f75VDQiTERfP2msWw7epL1u487HY4JYVUNLfzm7UPMn5TG7JG+6Th2uf75htEkxUTy/VXBc+BkicDLjp9u5bd/PcS8iWnM9FGPyCtx5/QscpJj+dmG/XRZc1LjkP9+7QBdqnx7wXinQzmvxAERfG1uz4HTG3trnA7HIywReNkv3zhIR1c335w/zulQLig8zMVX547hwPEmVhZWOB2OCUGH687w0vZy7ps1jKFJMU6Hc0F35WeRnRTDzzYcCIp+OB5JBCIyT0T2i0iJiHyzj9d/LiI73bcDInKq12tdvV5b6Yl4/EXFqRb+UlDGJ64aGhCTXCyYlM749AR+vuFgULaVNv7tV28cJDLcxRc8NB+HN0WEufjSDaPZU3Wa9burnQ6n3/qdCEQkDPg1MB+YANwjIn83FZaqfkVV81Q1D/gV8FKvl1vOvqaqi/objz/5zVslAPzjx0c5HMmlcbmEr980hmMnmnmhoMzpcEwIOVTbxCs7K/j07OGO9SC+XLdOzSQnJZafv34g4MupnjgjmAGUqGqpqrYDy4DFF1j/HuA5D+zXr1WeauGFgjLuyh9K5sDAmf/0urGpTB82iF+9UUJrR5fT4ZgQ8T9vHCQ6IowHP5rjdCiXLMwlfHlOTzl1TXGV0+H0iycSQSbQ+/Cx3L3sQ0RkGDACeLPX4mgRKRCRzSJy6/l2IiIPuNcrqK2t9UDY3vXbtw8B8E8f9//T3N5EhH+ZO5bq0612VmB84uDxRlYWVvLp2cMZHBcYZwNn3Zybztgh8fzi9QN0BnA51ROJoK/2kOc7T1oCLFfV3oea2e7JlO8FfiEiff5yquqjqpqvqvkpKSn9i9jLqhpaeH5rGXdOzyJrkH9f9OrLrJwkpmUP5NF3SgP6y20Cwy/fOEhMRBgPBNDZwFkul/CVG0dTWnsmoM8KPJEIyoHeg4RnAeebGHcJ55SFVLXSfV8KvA1M9UBMjvr9X0vpVuWfAuTawLlEhC98bCTlJ1sC+stt/N/R+jOsLa7ivtnDSfKToSQu19wJaYxMieXRd0oDtl+BJxLBVmC0iIwQkUh6fuw/1PpHRMYCg4BNvZYNEpEo9+Nk4BpgjwdicsyJM+0s23qM26Zm+n0TuAuZM34Io1Lj+N1fA/fLbfzf4+8dJswlfO6a4U6HcsVcLuGBj+awu/I075fUOx3OFel3IlDVTuAhYD2wF3hBVXeLyCMi0rsV0D3AMv37X5XxQIGIFAJvAT9W1YBOBM9uPkprRzf/EICnub25XMKDH81hb9Vp/nrA/6/JmMBz8kw7LxSUcWteJqkJvpulzxtunZpJSnwUv3/nkNOhXBGPjH6mqmuBtecs++45z7/fx3YbgVxPxOAPWju6eGrTUT42JoUxQ+KdDqffFudl8rMNB/jdXw/x8bGpTodjgszTQXLQBBAVHsZnrh7Of63fz+7KBiZmJDod0mWxnsUetLKwkrqmNv7hI4H/xQaIDHdx/7Uj2Fx6ImhnZjLOaO3o4qmNR7hubHAcNAF8auYwYiPD+MM7pU6HctksEXiIqvL4u4cZlxbPNaP8d0yhy7VkRjYJ0eE8/t5hp0MxQeSl7RXUn2nngY8GVvPqC0mMiWDJjGxWFVVRfrLZ6XAuiyUCD3nnYB37jzfy+Y/k+HyCbW+Kiwrn7vyhvLqrmuOnW50OxwSB7m7lsXdLyc1MZFaO8zP1edLnrh2BAE9tPOJ0KJfFEoGHPPZuKanxUSyakuF0KB736dnD6VLl2c3BN0Wf8b13DtZSWneGz39kRFAdNAFkDhzATRPTeKGgnJb2wOmZb4nAA0pqGnn3YB2fnj2MyPDg+0izB8dw/dhU/rzlGG2dgfPlNv7p6U1HSY6LYv4kZyek95ZPzx5GQ0tHQI3iG3y/Wg54ZvMxIsKEJTP8Z1o9T1t69XDqmtpZax3MTD+UnWjmzf013DNjaFAeNAHMGJHE2CHxPLXxaMD0wQnOfwkfam7v5MVt5SzITSc5wMZJuRzXjkomJyWWJzdaechcuWc/OIZLhHtnBu9Bk4hw3+xh7Kk6zfYAaW1niaCfVu6spLGtk0/NGuZ0KF7lcglLZw+nsOwUO8tOXXwDY87R2tHF81uPMWd8KumJgTMi75W4bWom8VHh/GlTYBw4WSLoB1Xl6c1HGZcWT/6wQU6H43V3TM8iLiqcPwVYiwjjH9YUVXGyuYNPzx7udCheFxsVzh3Ts1hbXEVtY5vT4VyUJYJ+2Fl2it2Vp/nkrGFB1/qhL3FR4dw2NZPVxVWcam53OhwTYP60+Sg5KbFc7aeT0nvafbOH0dGlLNtyzOlQLsoSQT88s/kYsZFh3Da1z+kXgtKSGUNp7+zm5R2B0yLCOK+4vIHCslPcFyIHTQAjU+K4dlQyy7aW+f0MZpYIrtDJM+2sKqrktmmZxEV5ZMimgDAxI5HJWYks21IWMC0ijPOWbT1GVLiL26dlOR2KTy2ZMZSKUy28V1LndCgXZIngCr28o4L2zm4+OTO4LxL3ZclV2ew/3sgOu2hsLkFLexcrd1ayMDedxAERTofjUzdOGMKgmAie3+rf5SFLBFdAVXmhoIwpWYmMT09wOhyfW5SXQUxkWEDUPo3z1hZX0djWyd1XDb34ykEmKjyM26dlsWHPceqb/PeisSWCK1Bc0cC+6kbuyg+9Lzb0XDS+ZXIGqwqraGztcDoc4+ee31rG8MExzBwRXOMKXapPXDWUji716+tqHkkEIjJPRPaLSImIfLOP1z8jIrUistN9+3yv15aKyEH3bakn4vG2FwrKiAp3sSgv+MYVulRLZgylpaOLlYXnm5XUGCitbWLLkRPcfdXQkLlIfK4xQ+KZlj2QZVv997pavxOBiIQBvwbmAxOAe0RkQh+rPq+qee7bY+5tk4DvATOBGcD3RMSvG+S3dnSxYmclC3LTSYgOrXpnb3lDBzIuLZ5lW8qcDsX4sRcKyglzCXeG2EXicy25KpuSmia/7WnsiTOCGUCJqpaqajuwDFh8idveBGxQ1ROqehLYAMzzQExe8+quahpbO7k7RMtCZ4kIS64aSnFFA3urTjsdjvFDHV3dLN9WznVjUwN+Ksr+Wjg5ndjIML89cPJEIsgEev915e5l57pDRIpEZLmInP0VvdRt/cbzW8vITgrdemdvi/IyiQgTXtxW7nQoxg+9ta+GuqY2loTgReJzxUaFsygvg9VFVTS1dTodzod4IhH0Vfg7txC2ChiuqpOB14GnLmPbnhVFHhCRAhEpqK11ZjL1Y/XNbCqt5+78LFyu0Kx39pYUG8l1Y1N5ZWclnV3dTodj/MzybeWkxEfx8bEpTofiF+6cnkVLRxev7qp2OpQP8UQiKAd6p/ws4O+uIKpqvaqebTv1B2D6pW7b6z0eVdV8Vc1PSXHmi7V8ezkiPWPumB53TM+irqmNdw46k5yNfzpxpp239tdwa14G4WHWOBFgWvYghg2O4aXt/ncG7Yl/oa3AaBEZISKRwBJgZe8VRKT3DBSLgL3ux+uBuSIyyH2ReK57md9RVV7eUc61o5KDfuTEy3Hd2FQGxUTw4jb/bRpnfG91USUdXRpyPYkvRES4fWoWm0rrqTjV4nQ4f6ffiUBVO4GH6PkB3wu8oKq7ReQREVnkXu2fRWS3iBQC/wx8xr3tCeD/0pNMtgKPuJf5nYKjJyk70RJS4wpdishwF4vzMtmw5zgNzdanwPR4cXsF49MTQrLD5YXcPi0TVXjFz/oUeOScTVXXquoYVR2pqj90L/uuqq50P/6Wqk5U1Smqep2q7uu17R9VdZT79oQn4vGGl7ZXMCAijJsmpjkdit+5c3oW7V3drCqyPgUGSmqaKCw7xR3T7KDpXEOTYpgxIokXt5f7VZ8CK95dgtaOLlYXVTJvUhqxITTA3KWamJHA2CHxLLfWQwZ4eUc5LiGkO1xeyB3TMimtPeNXEzxZIrgEb+6robG108pC5yEi3DE9k51lpzhU2+R0OMZB3d3Ky9sr+OiYFFLjQ7vvwPksyE0nKtzFS9v9pzxkieASvLS9gtT4KK4Zlex0KH7r1rxMRGCFn9U+jW9tPlxPZUOrXSS+gPjoCG6amMaqokraOrucDgewRHBRJ8608/b+GhbnZRBmfQfOKzUhmqtHDmZFYaVf1T6Nb720vYL4qHDmThjidCh+7fZpmZxq7uDt/f7R7NoSwUWsLqqks9uawV2KxXmZHK1v9qvap/GdVndnqXmT0oiOCHM6HL927ahkBsdGsnKnfzSwsERwES/vqGBcWrw1g7sE8yalERnuYoWffLmNb725r4amtk5utWtpFxUe5uLmyem8vve4XwzlbongAo7VN7Pj2CkW59kX+1IkREdww7jUnrMoG3Ii5KzYWUFKfBSzckJjcvr+WpSXSVtnN6/tPu50KJYILuRsu/hbpqRfZE1z1uK8TOqa2v1+jlbjWQ0tHby1v5ZbJtu1tEs1LXsgWYMGsMIP5vSwRHABK3ZWkD9sEFmDYpwOJWBcNy6FhOhwv6l9Gt9Yv7ua9s5u6ztwGUSExXkZvHewltpGZ6extERwHvuqT3PgeJN9sS9TVHgYC3LTWb+7mpZ2/2gaZ7xv5c5Khg2OYUpWotOhBJTFeZl0K6xxuFe+JYLzWLmzkjCXsCDXykKXa1FeBmfau9iw1/nap/G+mtOtbDxUx+IpGSE7HeWVGjOkpyGK0+UhSwR9UFVWFlZyzahkkuOinA4n4MwaMZghCVGs8oPap/G+1UVVdKsNKXGlFudlsOPYKY7VNzsWgyWCPmw/doryky0snmJf7CvhcgkLczP46/5aTvtB0zjjXSsKK5mYkcCo1HinQwlIt7h/Z1YWOtcr3xJBH1YVVhIV7mLuROsdeaVunpJOe5d/NI0z3nOsvpnCslMssoOmK5Y5cADThw1idVGVYzFYIjhHZ1c3q4uquH5cKvHREU6HE7CmDh1I5sABrLahqYPa2SbWCyfbtbT+uGVyOvuqGympaXRk/5YIzrHl8Anqmtr+drpmroyIcMuUDN47WMfJM+1Oh2O8ZHVRFVOzB1oT635akJuOCKwqdOaswCOJQETmich+ESkRkW/28fpXRWSPiBSJyBsiMqzXa10istN9W3nutr62qqiKmMgwrhub6nQoAe/myel0diuv7va/ybpN/x2qbWJv1WlunmwHTf2VmhDNzBFJrC5yZtDGficCEQkDfg3MByYA94jIhHNW2wHkq+pkYDnwk16vtahqnvu2CAd1dnXz6q4q5owfwoBIGzSrvyZmJJCTHGuth4LUmqIqRGChNbH2iJsnZ3Co9gz7qn1fHvLEGcEMoERVS1W1HVgGLO69gqq+papn20ZtBvxyKM+Nh+o52dxh9U4PERFunpzO5tJ6ahpbnQ7HeNjqokquGpZEWqJNQOMJ8yelEeYSRw6cPJEIMoGyXs/L3cvO535gXa/n0SJSICKbReTW820kIg+41yuorfXOGN5riqqIiwrnY2NSvPL+oeiWKRl0K6wrtvJQMDlwvJEDx5u42cbh8pjBcVFcPXIwq4uqfF4e8kQi6KsrYZ9/hYh8CsgH/qvX4mxVzQfuBX4hIiP72lZVH1XVfFXNT0nx/A91e2c3r+6u5sYJQ2wsdQ8aPSSesUPirfVQkFldWIlLYP4kSwSedPPkdI6daKa4osGn+/VEIigHhvZ6ngV86H+9iMwB/g+wSFX/NsKSqla670uBt4GpHojpsr1fUkdDSwc3W1nI4xZOTqfg6EmqG6w8FAxUldVFVczKGUxKvPW896SbJqYR7hKf9ynwRCLYCowWkREiEgksAf6u9Y+ITAV+T08SqOm1fJCIRLkfJwPXAHs8ENNlW11URXx0ONeOtnmJPW1BbjqqsG6Xcx1mjOfsrWqktO6MtRbygoExkXxkdDJrfFwe6nciUNVO4CFgPbAXeEFVd4vIIyJythXQfwFxwF/OaSY6HigQkULgLeDHqurzRNDW2cVre6q5aWIaUeFWFvK0UalxjEuLZ22xJYJgsKa4Z0DGm6znvVcsyE2n4lQLheW+Kw+Fe+JNVHUtsPacZd/t9XjOebbbCOR6Iob+ePdAHY2tndZayIsW5Kbzsw0HqG5otVYmAUxVWVtczeycwQy2ARm9Yu6ENL4dVsza4iryhg70yT6tZzGwtriKhOhwrhlpZSFvOTuct5WHAtveqkYO152x4dm9KDEmgmtH+bY8FPKJoK2zZ9z8uRN7Jl433mHloeCwtrjKykI+4OvyUMj/8r1f4i4L2RGO1y3MTWfrEWs9FKh6ykJVzMpJsrKQl82dkEZEmPjswCnkE8GaouqestAoKwt524LJVh4KZPuqe1oLWVnI+xJjIrjGh+WhkE4E7Z3dbNhTzY0TrCzkCyNTespDaxwcd91cubXFVbikp6278b6z5aEiH5SHQvrX7/2SOk63drJwsn2xfWVhrnUuC0Sqyprink5kNn2rb9zkLg+t8UF5KKQTwZriKuKjrCzkS/PdZYVXrTwUUPZVN1Jaa2UhX/JleShkE0F7ZzevuccWsk5kvjMqNY6xQ+JZu8sGoQsk69xloXmT7OzZl86Wh7w99lDIJoKNh3rKQnaE43vzc9PYeuSEDU0dQNbuqmbmCCsL+drcCUMIdwlrvTx6b8gmgnXF1cRFhfORMVYW8rWzYw+tt7OCgHDweCMlNU0syLWzAV8bGBPJ7JGDWbfLu+WhkEwEHV3drN9TzZzxqVYWcsDo1DhGpsR6/SjHeMba4mrEWgs5ZkFuOkfrm9lTddpr+wjJRPBB6QlONXf87cKl8S0RYWFuOh8crqeuqe3iGxhHrdtVxVXDkkhNsDGinDB3whDCXOLVyZ1CMhGs3dUzQb3NROac+bnpdCust4nt/dqh2ib2VTcy38pCjhkcF8WsnCTWFnuvPBRyiaCrW1m/q5rrx6XaTGQOGpcWz4jkWJvC0s+96r6OY62FnDV/UjqldWc4cLzJK+8fcolgy+ET1J9pt9ZCDhMR5k9KY1NpPSfOtDsdjjmPtcVVTMseSHriAKdDCWk3TUxDBK+NPRRyiWDdriqiI1x8fKyVhZy2IDedrm5lwx47K/BHR+vPsLvytB00+YGU+ChmDE/y2jhdHkkEIjJPRPaLSImIfLOP16NE5Hn36x+IyPBer33LvXy/iNzkiXjOp7tbWbermuvGphIT6ZE5eUw/TMxIIDspxloP+al1VhbyKwty0zlwvImSmkaPv3e/fw1FJAz4NXAjPRPZbxWRledMOXk/cFJVR4nIEuA/gU+IyAR65jieCGQAr4vIGFXt6m9cfdl27CS1jW3WWshPiAjzc9N4/N3DNDR3kBgT4XRIppd1xVVMzkoka1CM06EYejpixkaFk+aFMp0nzghmACWqWqqq7cAyYPE56ywGnnI/Xg7cICLiXr5MVdtU9TBQ4n4/r1hbXEVkuIvrx6V6axfmMs2flE5nt7Jh73GnQzG9lJ9sprC8wcpCfiQ1Ppo7p2cRF+X5aoYnEkEmUNbrebl7WZ/ruCe7bwAGX+K2AIjIAyJSICIFtbW1VxRoV7cyb2KaVz5Ic2WmZCWSkRhtg9D5mbOtheZbWSgkeOIXUfpYdm5j1/Otcynb9ixUfRR4FCA/P/+KGtM+sniSz+YANZempzyUztObjtLY2kF8tJWH/MHa4iomZiQwbHCs06EYH/DEGUE5MLTX8yyg8nzriEg4kAicuMRtPaqnImX8yYLcNNq7unlzX43ToRigqqGF7cdOWVkohHgiEWwFRovICBGJpOfi78pz1lkJLHU/vhN4U3sOzVcCS9ytikYAo4EtHojJBJCpQwcxJCHKJrb3E1YWCj39Lg2paqeIPASsB8KAP6rqbhF5BChQ1ZXA48DTIlJCz5nAEve2u0XkBWAP0Al80Vsthoz/crmE+ZPSeW7LMc60dRJr13Acta64mnFp8eSkxDkdivERj/QjUNW1qjpGVUeq6g/dy77rTgKoaquq3qWqo1R1hqqW9tr2h+7txqrqOk/EYwLP/ElptHV289Z+Kw85qeZ0K1uPnvD8HPcAABUuSURBVGD+JCsLhZKQ61ls/FP+8CSS46Js7CGHrd9djSo290CIsURg/EKYS5g3aQhv7quhpd2qg05ZW1zNqNQ4Rg+JdzoU40OWCIzfWJCbTktHF29becgRdU1tfHC43i4ShyBLBMZvzBiexODYSJvY3iHrd1fTrViz0RBkicD4jfAwF3MnpvHm3uO0dlh5yNfWFleRkxzLuDQrC4UaSwTGryzMTedMexd/PXBlw4iYK1Pf1Mbm0hPMz02zTpchyBKB8Sszc5IYFBPBOutc5lOv7TlOV7daWShEWSIwfiUizMXcCWm8vrfGykM+tLa4iuGDY5iQnuB0KMYBlgiM31kwOZ2mtk7eO1jndCgh4eSZdjYeqmd+brqVhUKUJQLjd64eOZjEARE29pCPvLanmq5uZaGVhUKWJQLjd3rKQ0PYsOc4bZ1WHvK2NcXVZCfFMDHDykKhyhKB8UsLJ6fT2NbJuwesPORNp5rb2VhSxwIrC4U0SwTGL10zKpnEARGssfKQV722+zid3WpjC4U4SwTGL0WEubhp4hBe32Ody7xpdXEV2Ukx5GYmOh2KcZAlAuO3Fk7O6CkPWeshrzh5pp33S+pYONnKQqHOEoHxW1ePHMzAmAjWFHl19tKQtX63tRYyPfqVCEQkSUQ2iMhB9/2gPtbJE5FNIrJbRIpE5BO9XntSRA6LyE73La8/8ZjgEhHmYt7ENDZYecgr1hRXMSI51loLmX6fEXwTeENVRwNvuJ+fqxn4tKpOBOYBvxCRgb1e/7qq5rlvO/sZjwkyC2zsIa+ob2pj46F6FlprIUP/E8Fi4Cn346eAW89dQVUPqOpB9+NKoAZI6ed+TYiYPXIwg2Ksc5mnvXq2LDTZykKm/4lgiKpWAbjvUy+0sojMACKBQ70W/9BdMvq5iERdYNsHRKRARApqa+3oMFREhLmYN6mnPGQzl3nOmqIqclJsyGnT46KJQEReF5FdfdwWX86ORCQdeBr4rKp2uxd/CxgHXAUkAd843/aq+qiq5qtqfkqKnVCEklsmZ9Dc3mUT23tIbWMbm0vrudnKQsYt/GIrqOqc870mIsdFJF1Vq9w/9H3+TxWRBGAN8B1V3dzrvc+e77eJyBPAv1xW9CYkzMwZTHJcFKsKK22YZA94dVcV3drTPNcY6H9paCWw1P14KbDi3BVEJBJ4GfiTqv7lnNfS3fdCz/WFXf2MxwShMJdw8+R03txXQ2Nrh9PhBLyVhZWMGRLHWCsLGbf+JoIfAzeKyEHgRvdzRCRfRB5zr3M38FHgM300E31WRIqBYiAZ+EE/4zFB6pYp6bR1dvP63uNOhxLQKk61sPXISRZNsbMB8/9dtDR0IapaD9zQx/IC4PPux88Az5xn++v7s38TOqYOHUTmwAGsKqzitqlZTocTsM52zrvZykKmF+tZbAKCy10eeudALaea250OJ2CtLKxkSlYiw5NjnQ7F+BFLBCZg3DIlg85u5dVd1U6HEpBKa5vYVXGaW6wsZM5hicAEjIkZCYxIjmWVjT10RVYWViKCJQLzIZYITMAQEW6ZnM6mQ/XUnG51OpyAoqqsLKxk5ogkhiREOx2O8TOWCExAWZSXSbfCqiIbcuJy7K48TWntGRZNyXQ6FOOHLBGYgDIqNY5JmQms2FnhdCgBZVVhJeEuYf4km4nMfJglAhNwbs3LpKi8gUO1TU6HEhC6u3vKQh8dk8Kg2EinwzF+yBKBCTi3TMnAJbBih50VXIrNh+upamjltqlWFjJ9s0RgAs6QhGiuHpnMKzsrUVWnw/F7r+yoIC4qnDnjhzgdivFTlghMQFqcl8GxE81sP3bK6VD8WmtHF+uKq5k3KY0BkWFOh2P8lCUCE5DmTUojKtxlF40v4vW9x2ls6+R2KwuZC7BEYAJSfHQEcyYMYXVRFR1d3RffIES9sqOCtIRoZuYMdjoU48csEZiAdWteJifOtPOOzWfcpxNn2nl7fy2L8zIIc9kENOb8LBGYgPWxMSkkxUby0nYrD/VldVElnd3KrVYWMhdhicAErMhwF4vzMtiw57iNSNqHl3dUMC4tnvHpCU6HYvxcvxKBiCSJyAYROei+H3Se9bp6TUqzstfyESLygXv7592zmRlzye6cnkV7VzcrC20gut5KaprYcewUt0+zswFzcf09I/gm8IaqjgbecD/vS4uq5rlvi3ot/0/g5+7tTwL39zMeE2ImZiQyPj2B5dvKnQ7Fr/xlWxlhLrFJfMwl6W8iWAw85X78FD3zDl8S9zzF1wPLr2R7Y866c3oWReUN7K9udDoUv9DZ1c1L2yu4bmwqKfFRTodjAkB/E8EQVa0CcN+nnme9aBEpEJHNInL2x34wcEpVO93Py4HznseKyAPu9yiorbVWIub/W5yXQbhLeHG7nRUA/PVALbWNbdydb2cD5tJcNBGIyOsisquP2+LL2E+2quYD9wK/EJGRQF/t2c47XoCqPqqq+aqan5KSchm7NsEuOS6K68al8tL2CjqtTwF/KSgnOS6S68ad77jMmL930USgqnNUdVIftxXAcRFJB3Df15znPSrd96XA28BUoA4YKCLh7tWyALviZ67IndOzqGtq452DoX22WN/Uxut7j3Pb1EwiwqxRoLk0/f2mrASWuh8vBVacu4KIDBKRKPfjZOAaYI/2jBb2FnDnhbY35lJcNzaVpNhInt9a5nQojnplZ0/fgbvyhzodigkg/U0EPwZuFJGDwI3u54hIvog85l5nPFAgIoX0/PD/WFX3uF/7BvBVESmh55rB4/2Mx4SoyHAXd07P4o29NSE7jaWq8peCMqYMHciYIfFOh2MCSL8SgarWq+oNqjrafX/CvbxAVT/vfrxRVXNVdYr7/vFe25eq6gxVHaWqd6lqW//+HBPKllw1lM5u5S8h2pS0qLyBfdWN3DXdLhKby2NFRBM0clLimJWTxLKtx+juDr15Cp794CgxkWEszstwOhQTYCwRmKByz4xsyk608F5JndOh+FRDSwcrCytZnJdJfHSE0+GYAGOJwASVeZPSGBQTwXNbjjkdik+9vL2c1o5uPjkz2+lQTACyRGCCSlR4GHdOz2LDnuPUNIbGRWNV5dkPjjElK5FJmYlOh2MCkCUCE3SWzMims1tDZvyhrUdOcrCmiU/OHOZ0KCZAWSIwQWek+6Lxnz84RlcIXDR+9oOjxEeHc/OUdKdDMQHKEoEJSp+ePZzyky28sfe406F4VX1TG+uKq7ljWhYxkeEX38CYPlgiMEFp7oQhZCRG8+TGI06H4lXPF5TR3tXNvXaR2PSDJQITlMLDXNw3ezgbD9Wzr/q00+F4RUdXN3/aeJRrRg22nsSmXywRmKC15KqhREe4eCpIzwrWFldRfbqV+68d4XQoJsBZIjBBa1BsJLdNzeSl7RWcPBNccxqrKo+/d5iclFg+PsaGmzb9Y4nABLWlVw+nrbObZUE2KmnB0ZMUlTfw2WtG4HL1NbWHMZfOEoEJauPSErh65GCe3nSEjiCatObxdw+TOCCCO2xyeuMBlghM0PvcNSOobGhldVFwzHtUdqKZ1/ZUc+/MbGsyajzCEoEJetePS2XskHh+89ahoBiV9In3j+ASYens4U6HYoKEJQIT9Fwu4Z+uG8nBmiZeD/AOZnVNbfx5y1EW5WWQlhjtdDgmSPQrEYhIkohsEJGD7vtBfaxznYjs7HVrFZFb3a89KSKHe72W1594jDmfhbnpZCfF8Ou3D9EzS2pgeuzdw7R1dvPF60Y5HYoJIv09I/gm8IaqjgbecD//O6r6lqrmqWoecD3QDLzWa5Wvn31dVXf2Mx5j+hQe5uLBj+VQWHaKTYfqnQ7nipw8087Tm45w8+QMRqbEOR2OCSL9TQSLgafcj58Cbr3I+ncC61S1uZ/7Neay3TEti5T4KH79donToVyRJ94/zJn2Lh6yswHjYf1NBENUtQrAfX+xni1LgOfOWfZDESkSkZ+LSNT5NhSRB0SkQEQKamtr+xe1CUnREWH8w0dG8H5JPduPnXQ6nMtyurWDJzYeYd7ENMam2XASxrMumghE5HUR2dXHbfHl7EhE0oFcYH2vxd8CxgFXAUnAN863vao+qqr5qpqfkpJyObs25m8+OXMYg2Mj+a9X9wfUtYI/bTxCY2snD11vZwPG8y6aCFR1jqpO6uO2Ajju/oE/+0Nfc4G3uht4WVU7er13lfZoA54AZvTvzzHmwmKjwnno+lFsKq3n3YOBMa9xQ3MHf3j3MNePS7UZyIxX9Lc0tBJY6n68FFhxgXXv4ZyyUK8kIvRcX9jVz3iMuah7Z2aTOXAAP1m/LyD6Ffz67RJOt3bw9ZvGOh2KCVL9TQQ/Bm4UkYPAje7niEi+iDx2diURGQ4MBf56zvbPikgxUAwkAz/oZzzGXFRUeBhfvXEMuypOs3ZXldPhXFDZiWaefP8Id0zLYnx6gtPhmCDVr/7pqloP3NDH8gLg872eHwE+NCiKql7fn/0bc6VunZrJo++U8tP1+7lpYhoRYf7Zt/Knr+3H5YKvzR3jdCgmiPnnt98YLwtzCV+/aSxH6ptZtuWY0+H0qaj8FCt2VnL/tSNITxzgdDgmiFkiMCHrhvGpzByRxE9fO0BdU5vT4fwdVeVHa/eRFBvJgx8b6XQ4JshZIjAhS0T4wa2TONPWyY/W7nM6nL+zqqiKTaX1fHnOaBKiI5wOxwQ5SwQmpI0eEs8DH83hxe3lbC71j6EnTp5p599X7mZKViKfnDnM6XBMCLBEYELew9ePJmvQAL7zyi7aO52fvOaHa/fS0NLBj26fTJjNPmZ8wBKBCXkDIsP490UTKalp4g/vljoay3sH61i+rZwHP5bDhAxrLmp8wxKBMcAN44cwf1Iav3z9ILsqGhyJoaW9i2+9XEROciwPXz/akRhMaLJEYIzbf9yWS1JsJA8/t4Omtk6f7/97K3dRdqKF/7g9l+iIMJ/v34QuSwTGuA2KjeSXS/I4Wn+G777i29FOnt96jBcKynn4+lHMyhns030bY4nAmF5m5gzmSzeM4aUdFby4rdwn+9xV0cC/rdjNtaOS+fIc60FsfM8SgTHneOj6UcwckcR3Xtnl9XkLTjW384VntjHYfTZirYSMEywRGHOOMJfwv/dOIzUhis8+sZUDxxu9sp/m9k4eeHobx0+38utPTmNw3HnnZTLGqywRGNOHlPgonrl/JlHhLu57/APKTnh2dtWW9i4+9+RWCo6c4L/vzmNa9iCPvr8xl8MSgTHnMTQphj/dP4OW9i7ue/wDyk96Jhm0tHdx/1Nb2XL4BD+7O49FUzI88r7GXClLBMZcwLi0BJ747Azqm9pZ9L/vs7Gkf7OaVTW0sPSPW9hUWs9P75rCrVM/NDq7MT5nicCYi5g+bBArHrqGwbGRfOrxD3j0nUNXNN/x+t3VzP/lu+yqbOAXn8jj9mlZXojWmMvXr0QgIneJyG4R6RaR/AusN09E9otIiYh8s9fyESLygYgcFJHnRSSyP/EY4y05KXG8/MVrmDcpjf9Yu4+7freJdw/WXlJCOFbfzDeWF/Hg09sYOiiG1Q9fy+I8OxMw/kOu5MjmbxuLjAe6gd8D/+KemezcdcKAA/RMZVkObAXuUdU9IvIC8JKqLhOR3wGFqvrbi+03Pz9fCwo+tCtjvE5VeW5LGb968yBVDa1MzR7IvTOymZSZyKjUOCLCXHR3K7VNbeyqaODZD47x1v4aXCJ8/iMj+NqNY4kMtxNx4wwR2aaqHzpo7+9UlXvdb36h1WYAJapa6l53GbBYRPYC1wP3utd7Cvg+cNFEYIxTRIR7Z2Zzx/RMlm8r5zdvHeLry4sAiAx3kRIXRU1jKx1dPQdYyXFRPHzdKO6ZmW2zjBm/1a9EcIkygbJez8uBmcBg4JSqdvZaft7zZRF5AHgAIDs72zuRGnOJosLD+OTMYSy5KpvDdU3srjzNnsrT1Da2kZYYTfrAAWQnxTA7Z7CdARi/d9FEICKvA2l9vPR/VHXFJeyjr9MFvcDyPqnqo8Cj0FMauoT9GuN1YS5hVGo8o1Ljre5vAtZFE4GqzunnPsqBob2eZwGVQB0wUETC3WcFZ5cbY4zxIV+cs24FRrtbCEUCS4CV2nOV+i3gTvd6S4FLOcMwxhjjQf1tPnqbiJQDs4E1IrLevTxDRNYCuI/2HwLWA3uBF1R1t/stvgF8VURK6Llm8Hh/4jHGGHP5+tV81CnWfNQYYy7f+ZqPWnMGY4wJcZYIjDEmxFkiMMaYEGeJwBhjQlxAXiwWkVrg6BVunkxPH4ZQZp+BfQah/vdDaH4Gw1Q15dyFAZkI+kNECvq6ah5K7DOwzyDU/36wz6A3Kw0ZY0yIs0RgjDEhLhQTwaNOB+AH7DOwzyDU/36wz+BvQu4agTHGmL8XimcExhhjerFEYIwxIS6kEoGIzBOR/SJSIiLfdDoeXxKRoSLylojsFZHdIvIlp2NyioiEicgOEVntdCxOEJGBIrJcRPa5vw+znY7J10TkK+7/B7tE5DkRiXY6JieFTCIQkTDg18B8YAJwj4hMcDYqn+oEvqaq44FZwBdD7O/v7Uv0DIkeqn4JvKqq44AphNhnISKZwD8D+ao6CQijZ56UkBUyiQCYAZSoaqmqtgPLgMUOx+Qzqlqlqtvdjxvp+c8fcnMrikgWsBB4zOlYnCAiCcBHcc/9oartqnrK2agcEQ4MEJFwIIYQnx0xlBJBJlDW63k5IfhDCCAiw4GpwAfORuKIXwD/CnQ7HYhDcoBa4Al3eewxEYl1OihfUtUK4KfAMaAKaFDV15yNylmhlAikj2Uh13ZWROKAF4Evq+ppp+PxJRG5GahR1W1Ox+KgcGAa8FtVnQqcAULtetkgeqoBI4AMIFZEPuVsVM4KpURQDgzt9TyLEDsdFJEIepLAs6r6ktPxOOAaYJGIHKGnNHi9iDzjbEg+Vw6Uq+rZs8Hl9CSGUDIHOKyqtaraAbwEXO1wTI4KpUSwFRgtIiNEJJKei0MrHY7JZ0RE6KkL71XVnzkdjxNU9VuqmqWqw+n5939TVUPqSFBVq4EyERnrXnQDsMfBkJxwDJglIjHu/xc3EGIXzM8V7nQAvqKqnSLyELCenlYCf1TV3Q6H5UvXAPcBxSKy073s26q61sGYjDMeBp51HxCVAp91OB6fUtUPRGQ5sJ2e1nQ7CPHhJmyICWOMCXGhVBoyxhjTB0sExhgT4iwRGGNMiLNEYIwxIc4SgTHGhDhLBMYYE+IsERhjTIj7fwNC64VTR4WPAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Compute the x and y coordinates for points on a sine curve\n",
    "x = np.arange(0, 3 * np.pi, 0.1)\n",
    "y = np.sin(x)\n",
    "\n",
    "# Plot the points using matplotlib\n",
    "plt.plot(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9W2VAcLiL9jX"
   },
   "source": [
    "With just a little bit of extra work we can easily plot multiple lines at once, and add a title, legend, and axis labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 312
    },
    "colab_type": "code",
    "id": "TfCQHJ5AL9jY",
    "outputId": "fdb9c033-0f06-4041-a69d-a0f3a54c7206"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.legend.Legend at 0x121939950>"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZAAAAEWCAYAAABIVsEJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nOydd3hU55X/P0ddQgXUCyCaKCpIdONuY0wzCBv3EjvNSTb5ZRNnk3WSXcebrHednmzijddJHDtxN7bp2OBeMAZRhApVNElIQkggVFA/vz/uyFGwJAZNuXNH9/M895mZW78qd859z3uKqCo2NjY2NjYXS4DZAmxsbGxsrIltQGxsbGxsBoVtQGxsbGxsBoVtQGxsbGxsBoVtQGxsbGxsBoVtQGxsbGxsBoVtQGyGBCJyl4hsMlvHhRCRd0XkS16+5g9E5E/evKaNf2AbEBu/QUQuF5EtItIgIvUi8pGIzAJQ1WdV9XqzNbqKiEwUkZdF5JTj59wjIg+ISOBgz6mq/6WqXjVaNv6BbUBs/AIRiQbWAb8DYoE04D+ANjN1uRMRGQ98ApQDOaoaA9wCzASizNRmMzSxDYiNvzARQFWfV9UuVT2nqptUdQ+AiNwnIh/27CwiKiJfFZGDInJaRB4TEem1/Qsistex7Q0RSe/vwo4RQbVjRPC+iGT12vaU49zrRaRRRD5xGIKe7fNFZJ/j2N8D0udFDP4D2KKqD6hqlePn3a+qd6rqGcf5lolIiYiccbjDpvS61r+KSKVDx34RmedY/7CIPON4P8bxu7lXRI47Rjo/7HWOABF5UETKRKRORF4SkdgL/nVs/BLbgNj4CweALhF5WkQWicgIJ465AZgF5AK3AgsARGQ58APgJiAB+AB4foDzbAQygERgJ/DsedvvwPjyHwEcAh5xXCceeAX4NyAeKAMuG+A61wEr+9soIhMdOr/l0L0BWCsiISIyCfgGMEtVoxw/69EBrnU5MAmYBzzUyxB9E1gOXAWkAqeBxwY4j40fYxsQG79AVc9ifOkp8EegVkTWiEjSAIc9qqpnVPU48A6Q51j/FeC/VXWvqnYC/wXk9TcKUdUnVbVRVduAh4FcEYnptcurqrrNca5ne11nMVCqqitVtQP4DVA9gN44oGqA7bcB61V1s+N8vwDCgUuBLiAUyBSRYFU9qqplA5zrPxyjuEKgEMPIgvG7+aGqVvT6eW8WkaABzmXjp9gGxMZvcHzh36eqI4FsjCfk3wxwSO8v6xYg0vE+Hfitww10BqjHcC2lnX8CEQkUkUcdLp2z/P2pPt6J66RizGf06Nfen/ugDkgZYHsqcKzX+bod50tT1UMYI5OHgZMi8oKIpA5wroF+N6/1+t3sxTBOAxlqGz/FNiA2fomq7gOewjAkF0s58BVVHd5rCVfVLX3seyeQj+FeigHGONYPNJfRQxUwqueDYw5mVP+78yawYoDtJzC+4M8/XyWAqj6nqpc79lHgp05oPJ9yYNF5v5swVa0cxLlsLI5tQGz8AhGZLCLfEZGRjs+jMOYetg7idI8D3++ZDBeRGBG5pZ99ozAiveqACAx3l7OsB7JE5CaHC+ibQPIA+/8IuFREfi4iyQ5tE0TkGREZDrwELBGReSISDHzHoW2LiEwSkWtFJBRoBc5hjBwulseBR3rceSKSICL5gziPjR9gGxAbf6ERmAN8IiLNGIajGONL9KJQ1dcwns5fcLilioFF/ez+Vwy3USVQykUYLFU9hRGG+yiGAcoAPhpg/zJgLsYop0REGjAm4QuARlXdD9yNEcp8ClgKLFXVdoz5j0cd66sxJvx/4KzWXvwWWANsEpFGjJ93ziDOY+MHiN1QysbGxsZmMNgjEBsbGxubQWEbEBsbGxubQWEbEBsbGxubQWEbEBsbGxubQTGkskfj4+N1zJgxZsuwsbGxsRQ7duw4paoJ568fUgZkzJgxFBQUmC3DxsbGxlKIyLG+1tsuLBsbGxubQWEbEBsbGxubQWEbEBsbGxubQWEbEBsbGxubQWEbEBsbGxubQWGqARGRJ0XkpIgU97NdROR/ROSQiOwRkem9tt3raEd6UETu9Z5qGxsbGxswfwTyFLBwgO2LMCqUZgD3A38AcPRg/hFGFdDZwI+cbGFqY2NjY+MmTM0DUdX3RWTMALvkA391dGrbKiLDRSQFuBrYrKr1ACKyGcMQDdS3evAUvgDNtRCXAfEZMDwdAq2TQtPVrewuP0NtYytnz3VytrWD1OHhzBoTS0JUqNnybPyBxhqoOwSNVdBYDUGhkDodkrON9xbiTEs7W8rqaGrrpLtb6VYYnzCMGekjCAo0+5nbt/D1b8E0/rHFZ4VjXX/rP4OI3I8xemH06NGDU1HyGhx4/e+fQ2Ng5n0w52sQPVCHUXM5UNPIKzsrWLWrkpqzbX3uMy5+GNdlJvHVq8YTOyzEywptLE13FxzcDAVPwsFNGE0OzyMgGNJmwOXfgokLQZxp1Oh9mto6eXVnBa8XV/PJkXq6uj/7s0SHBXHlxATy89K4bkoi4qM/izfxdQPS119IB1j/2ZWqTwBPAMycOXNwzU/ufBFa6uHUQag7aNw0W34HH/8vTL0VrnsYIhMHdWpPcKalnYfXlLBq9wkCA4SrJybwwyVpjE8YRnRYMFFhQRw51cz2o/VsPVzPnz44zPOfHOerV4/nC5eNJTwk0OwfwcbXOfgmrPs2NByHyCS44jsw5nKISoGoJGhrghO7jKXkVXj+dkidBtf8ECZc5zOGRFV5o6Sah9eUUn22lfEJw/jKleO4LjOJhMhQAgMMnXsqzvD2vpO8s7+WdXuquHxCPA8vy2RCYpTJP4G5mN5QyuHCWqeqn+ldLSL/B7yrqs87Pu/HcF9dDVytql/pa7/+mDlzprqtlEn9Edj6v7DjaQgfATf/2biBTGZzaQ0/eK2I083tfO3q8dx76RjiIwd2IRysaeSnr+/nzb01pMSE8fjdM8gdNdxLim0sRcc52PwQbHsCEjPh6u/DpEUQGNz/MV0dhhv4/Z/BmeMw/XOw+Bemu7Yqz5zj31cV8/a+k0xJieY/l2cxIz12wGM6u7p5bttxfvHGflrau/jiFWP5l+snEeznri0R2aGqMz+z3scNyBLgG8BijAnz/1HV2Y5J9B1AT1TWTmBGz5xIf7jVgPRQXQwv3wv1h42nq8sfgADv/zN1dys/XlfKU1uOMjk5il/emktWasxFnWPbkXq+/eJuTjW18atb81gy1XfdczYmUHsAXrwbTu2HS74O8x6C4DDnj+9sh/cehQ9+CSNnwa1/M80FXHKigXuf3E5LeycPzJ/IfZeOuaj5jVNNbfzs9X28VFDB1ZMSeOzO6QwL9XWHzuDxSQMiIs9jjCbigRqMyKpgAFV9XAwn4+8xJshbgM+raoHj2C/w957Oj6jqXy50PY8YEIC2Rlj7LSheCVNvg+WPe9WIdHUrD76yh5d3VPD5y8bw/UVTCAka3PVPNbXxlb/tYMex03xn/kS+ce0E29drAyf3wdNLAYWbnoDx1w7+XCWrYNU/QWik4R5OneY2mc6w9XAdX366gMiwIP76hdlkJA3eDfX8tuP88LUistNiePK+WRcc7VsVnzQg3sZjBgRAFd7/BbzznzDjPrjhN17x83Z0dfPAS4WsLTzBP8/L4FvXZbj8hd/a0cX3Xy3itV2VfP2a8Xx3wWQ3qbWxJCf3wdM3gATAvesgYaLr56wphedug/Ym+MIb7jmnE2wqqeYbz+9i1Ihw/vbFOaQOD3f5nG/treHrz+0kMSqM5748h5EjItyg1Lfoz4D4t+POm4jAVd81JhN3PAVv/NAwKh6ku1v51gu7WVt4ggcXTebb8ye6ZbQQFhzIr27N5Y7Zo3nsnTKe2dpnJWeboYAnjAdAUiZ8bhUEBMIzN0FDpXvOOwAFR+v5xnO7mJISzcqvXuoW4wEwb0oSz3/5Ek63tPPFpwpobO1wy3mtgG1A3M21/w5zvgpbH4P3fubRS/32rYOsL6riB4sn89Wrxrv13CLCT/KzmDc5kYdWF7O5tMat57exAC318OwthvG4b737Rwlx4+GulXDuDDyzwriehyivb+Erf9tB6vAwnv78LEa4OWR92ugR/OGuGRyqbeL/Pb+Lzq5ut57fV7ENiLsRgYWPQu4d8O5/GSG/HmBTSTW/fesgK6aP5MtXjPPINYICA/jdndPISYvh/z2/k93lZzxyHRsfpLsLXvkSNFXD7c8bCbSeIDUP7ngO6svgpc8Z13UzTW2dfPmvBbR3dfOne2cxPMIz+U6XZ8Tzk/xs3t1fy0/WlXrkGr6GbUA8gQjc8GtIyoZX74eGCree/tDJRr794m5yR8bwyI3ZHp3kjggJ4s+OycGvP7uTs0NoeD6kee+nUPYWLPoZjJzh2WuNvRKW/haOfmBEaLmRHjfvwZNN/O9d05mQGOnW85/PnXNG8+UrxvL0x8d49hP/d/3aBsRTBIfDLU9BVzus/IIRC+8Gmts6uf+vOwgPCeTxe2YQFuz5pL/4yFB+e/s0qs+28qPVJR6/no3J7H/dMCB5dxsBId4g9w7IuRXe/W84vtVtp/3rx0d5c28N/7ZkCldkfKalt0d4cNEUrsiI5yfrSjlyqtkr1zQL24B4kvgM48mq/BN4+yduOeXP39jPkbpmfn/ndFJi3DMJ6Awz0kfwzWszeG1XJat3e37C08Ykmk7Cqq9C8lRY8gvvZYyLwJJfGnXmXvkSnDvt8ikP1zbx6Ov7uHpSAvddOsZ1jU4SGCD84pZcQoMC+faLu/16PsQ2IJ4m52bjKe6j30L5NpdOte1IPU9tOcq9c8dwybg49+i7CL5+zXhmpI/g314rpry+xevXt/ECG78H7c2w4s/GKNqbhEUbFR0aq2DNN12KYuzqVv7l5UJCgwL56YqpXs9lSooO4z+XZ7O7/AyPv1fm1Wt7E9uAeIPrH4HokbD2nwftyjrX3sX3VhYyKjac7y2c5GaBzhEUGMBvbstDge+uLGQo5RANCfZtMAqHXvU9r+VlfIa0GUZFh71rYP+GQZ/mjx8cZufxM/w4P4uk6IvIlncjS3NTWZqbym/ePEhxZYMpGjyNbUC8QWgkLP45nCw1ijAOgl9t3s/RuhZ+umIqESHmlUwYFRvBDxZPYevhetYUnjBNh42baW2A9Q8YgR+XfctcLZf+P0jMgo3/aoyGLpKDNY38atMBFmUnsyw31QMCnecn+VnERYbwLy8X+qUryzYg3mLyYpiy1JicrD9yUYcWlp/hzx8e4a45o7l0fLyHBDrPbbNGMXVkDP+5fu+QSpryazb/CJpqYNnvBi6M6A0Cg435kIbyi86lUlUeXltCeEggP1nu2QhFZxgeEcLDS7PYV93I89uOm6rFE9gGxJss+pnRH2H9A077d1WNIomxw0J5cJFvlBQJDBB+nJ/NqaY2fvvmQbPl2LhKRQHs+Atc8k+QNv3C+3uD9LlGFNjHvzey4Z1kc2kNHx2q49vXZfhMXaqF2cnMHRfHLzcf4HRzu9ly3IptQLxJdKpRwbTsbdi3zqlD1hdVsePYab67YCJRYSY/GfYib9Rwbp81ir9sOcr+6kaz5dgMFlWj7M6wRKM0uy8x/8cQGgXrv+PUA1dbZxePbNhLRmIkd12S7gWBziEi/GhZJmfPdfCrzQfMluNWbAPibWZ+AeInwZv/AV2dA+7a2tHFoxv3MTk5iptnjPKSQOf57oLJRIUF8dDqYntC3arsXQvlW+HaHxpzdb7EsDijWduxD6F01QV3/8tHRzlW18JDSzN9rj/H5ORo7rkknWc/OcbeqrNmy3EbvvVbHgoEBhk3Rd1B2PXXAXf9y0dHqTh9jn9bkvlpZzRfInZYCN+ZP5FPjtTz9r6TZsuxuVg62+HNH0HCFMNd5ItMu8fQ9/YjAz5wnWxs5XdvHeS6KYleSxi8WL49fyIx4cE8vKbEbx64bANiBpMWwahL4N1H+40yOdXUxmPvHGLe5EQuzzB/4rw/bp89mtGxEfxi0wG6++gjbePDFDxpNEKb/2PjwcYXCQiEef9uPHAVPtfvbr/efID2rm5+uCTTi+IujuERIXzb8cD13oFas+W4BduAmIGIcdM21Rh91fvgd28dpLWjix8smeJlcRdHcGAA356fwd6qs2worjJbjo2znDtjRASOvQoy5putZmAmLTY6GL77qNFS9zzK61t4uaCCO2ePZmz8MBMEOs/ts0aTNjycX20+4BejEFMNiIgsFJH9InJIRB7sY/uvRWS3YzkgImd6bevqtW2Nd5W7gdFzYPINRoZ686l/2FTd0Mrz28q5ecZIxif4mF+6D5blpjExKZJfbTrgl7HufsnHv4dz9XD9T7xXrmSwiMC8H8HZStj+p89s/t3bBwkIEP7pmgkmiLs4QoIC+Oa8CeypaODNvdZ3+5pmQEQkEHgMWARkAneIyD+MP1X126qap6p5wO+AV3ttPtezTVWXeU24O5n3I+hohg9//Q+rH3+vjG5Vvm6BGwKMsN7vXD+Jw6eaeXWXXSfL5zl3Bj75PyMvKSXXbDXOMfYKGD8PPviVkfTo4OipZl7ZWcldc0ablnF+sdw0fSTpcRH8arP13b5mjkBmA4dU9bCqtgMvAPkD7H8H8LxXlHmLhImQfTMU/OXTZjo1Z1t5bttxbpqexqhY67TGvD4zidyRMfz2zYO0dbq/p4ONG9n2BLSdhSu/Z7aSi2PeQ8aoaesfPl31P28fJDhQ+NrV7m2o5kmCAwP453mG2/f1kmqz5biEmQYkDSjv9bnCse4ziEg6MBZ4u9fqMBEpEJGtIrK8v4uIyP2O/Qpqa31w4uqKB4xRiOOmePy9Mrq6lW9c46EGPh5CxBiFVJ45xys77FGIz9LWCB8/BhMXQcpUs9VcHKl5hu5PHof2Zg7XNrFqVyX3XJJOYpQ1Rh895OelMT5hGL/efIAuC49CzDQgfTle+/tN3g6sVNXej7ajHU3e7wR+IyJ9PoKo6hOqOlNVZyYk+GB4X+IUYy5k2/9RW1vLc58c56ZpaYyOs87oo4crMuKZOjKGJ94vs/RN4dds+yO0noGrvmu2ksFxxQNGqfcdT/O7tw8RGhTIV9zcztkbBAYI37puIgdPNvF6sXVHIWYakAqgd3bcSKC/6ny3c577SlVPOF4PA+8C09wv0Utc+S/Q2sCeVb+ks1v5xrXWmPs4HxHha1eN52hdi6VvCr+lvdmYPJ9wnVH11oqMmg3pl9P50e/YWHicuy8Z7TMlSy6WxTkpjImL4In3yywbkWWmAdkOZIjIWBEJwTASn4mmEpFJwAjg417rRohIqON9PHAZYN0mxKnT6Bh7LXkVz3JzTizpcb4dijgQ12clMzZ+GI+/Z92bwm/Z8RS01Flv7uN8rvg2QU0nWB7wAV+4fKzZagZNYIDwpSvGUVjRwNbD9WbLGRSmGRBV7QS+AbwB7AVeUtUSEfmxiPSOqroDeEH/8dtoClAgIoXAO8CjqmpdAwKsjbmTODnLt+M+vvDOPkxggHD/leMoqmxgS1md2XJseujqNHKOxlxhhJBbmIaUKynVMTwQsZGUqBCz5bjEzTNGEjcshCfet2bTKVPzQFR1g6pOVNXxqvqIY91Dqrqm1z4Pq+qD5x23RVVzVDXX8fpnb2t3J+2d3fy0dAT7QrJJLnnygjWyfJ2bpqeRGBXKH9615k3hl+xdDWcrYO7XzVbiMs9sO85jHctIbC83anlZmLDgQO69dAzv7K+1ZFFSOxPdB1i35wQ1Z9vonP01aDjuUic2XyA0KJAvXD6WDw+doqjCPzuxWQpVI/IqdjxkLDBbjUu0dXbx1JajNI5bZPw8W/7HbEkuc88l6YQHB/LE+4fNlnLR2AbEZFSVP35whIzESLKuuR2Gj/6HOHerctec0USFBvGnD613U/gd5dugcgdc8jUIsPYtv3rXCWob27j/qokw56vGz1VRYLYslxgxLITbZo1iTWElVQ2fLdXiy1j7v8kP2FJWx96qs3zpirFIYBDM/goc3wIndpstzSWiwoJZMWMkG4qqONnYaracoc3Hv4ew4ZB3p9lKXKK7W/njB4fJTInmsglxkHcHhEQZWfUW54uXj6WrW3lqy1GzpVwUtgExmT9+cJj4yBDy8xw5lNPvgZBII1nK4nxubjodXcpzn/hfK0/LcPqo0bxsxn0QYt3oPoCPyk5x8GST8bAlYjSbmnY3lLwGjdYOGx8VG8H1mcm8tL2c1g7rVHKwDYiJHK5t4t39tdx9STphwYHGyrAYyLsLilZCY425Al1kXEIkV01M4NlPjtPeaRdZNIVPngAJgNn3m63EZf728TFih4WwZGrK31fO/jJ0dxql6S3O5+amc7qlg3V7rFPV2jYgJvLsJ8cJChDunDP6HzfM+Yrf3BT3XTqG2sY2y9f8sSTtzbDrb5C5HGL6rBJkGSrPnOPNvTXcNmsUoUGBf98QNx4mLjDulc428wS6gbnj45iQGMlfPz5qthSnsQ2ISZxr72LljgoWZCd/to7PpzfFn42ucRbmqokJjImL4GmL+Xb9gqKVRtHE2V82W4nLPPfJMcAIzvgMc74CzbWGK8vCiAifm5vOnooGdpefufABPoBtQExi7Z4TNJzr4J5L0vveYdaXjJti3zrvCnMzAQHCPXPHsOPYaYor7ZBer6FqPIAkZsIoaycOtnV28cK2cq6dnMTIEX3UiBt3DcRP8ot5wxunpTEsJNAyoxDbgJjEs1uPkZEYyZyxsX3vMP5aI6R3x1+8K8wD3DJzJBEhgZaLMLE0J3ZCVSHM/ILvN4y6ABuLqqlrbueeuf08bIkYD1wndhmLhemJXlxXWEVdk++75GwDYgJ7Ks5QWNHA3ZekG9EkfREQCNPvhSPvw6lD3hXoZqLDglk+LY11jlGXjRfY/iQED4Opt5mtxGX+tvUYY+IiuGJCfP87Tb0VgsJhx9PeE+Yh7rkknfaubl7YXn7hnU3GNiAm8MzWY4QHB3Lj9AtMbE67BwKC/GIUcses0bR2dLN6t90rxOOcOw3Fr0DOzRAWbbYalyg50cCOY6e5+5J0AgIGGEmFD4fsm6DoZWhr8p5AD5CRFMXccXE8v+24z3cstA2Il2lo6WBN4QmWT0slOix44J2jkmDSYtj9HHRYOxkvZ2QMWanRPL+t3K7S62kKX4TOczDri2YrcZkXt5cTEhTAzTNGXnjnGfdBe5NhPC3O7bNHUXH6nM8XJLUNiJdZtbuS1o5u7prTjz/3fGZ+3mjjafGicQC3zx7N3qqz7LHrY3kOVSOkNW2Gdfqd90NrRxev7apkYVYywyOcqLo7chYkTDHK1lucBVnJxIQH82KBb7uxbAPiZV7cXk52WjTZaTHOHTD2ahgxxi/cWPl5qYQHB/LCdjsz3WMc3wqn9sOMz5utxGVeL66msbWT22eNuvDOYEymz7jPEUCwx6PaPE1YcCA3TkvjjeJqTjf7bii/bUC8SHFlA6VVZ7l1ppM3BBjF72bcB8c+gtoDHtPmDaLDglkyNYU1u0/Q3GbtkvU+y65njFI4WTearcRlXth+nNGxEVwyLs75g6beCkFhsNP6k+m3zRpFe1c3r+3y3XlD24B4kZcLDH9ufu5FZgXn3gkSCLuf8YwwL3LH7FE0t3extrC/7sU2g6atyUimy1oOoZFmq3GJY3XNbD1cz60zRw48eX4+EbFG5v2el4xMfAszJSWa3FHDeXG7784bmmpARGShiOwXkUMi8mAf2+8TkVoR2e1YvtRr270ictCx3Otd5RdPa0cXq3afYGFWMjERF5g8P5+oJMi43pgctXizqemjR5CRGMnzFghRtBylq6Cj2YjeszgvFZQTIHDzjIsYrfcw414jA98f5g1njWJ/TaPPZqabZkBEJBB4DFgEZAJ3iEhmH7u+qKp5juVPjmNjgR8Bc4DZwI9EZISXpA+KTaU1NJzr4DZn/bnnM+0uaKqGsrfdK8zLiAi3zRpFYfkZDtRYrwObT7PrGYibYPnM886ubl4uqODqSYkkx4Rd+IDzGT3XmDfc/azbtXmbpbmpRIQE8qKPPnCZOQKZDRxS1cOq2g68AOQ7eewCYLOq1qvqaWAzsNBDOt3CywXlpA0PZ+7F+HN7k7EAIuL8wo21fFoaQQHCKzsqzJbiP5w6BMc/NsqbWzzz/L0DtZxsbBv8w5aIUdH6yPtwxtoBG5GhQdwwNYW1hSdoafc974OZBiQN6G1WKxzrzmeFiOwRkZUi0vMf5eyxiMj9IlIgIgW1tbXu0H3RVJxu4cNDp7jlYv25vQkKgZxbYf9GaKl3r0AvEx8ZytWTEnhtVyWdXXaZd7ew+1ljniz3DrOVuMzKHRXER4Zw7eTEwZ8k93bjtfBF94gykZtnGPOGb/hgRWszDUhf36TnzxStBcao6lTgTaAntMKZY42Vqk+o6kxVnZmQkDBosa7wyg4jisKpZKiBmHYXdLUbVVYtzorpIznZ2MaHh06ZLcX6dHdB4fMw4TqISjZbjUucaWnnrb0nWZabRnCgC19Pw0fD2CsNw+qjE9DOMjN9BKNiwz/9HvElzDQgFUDvMepI4B9Cc1S1TlV7Kor9EZjh7LG+gqry6q4K5o6L67uS6MWQnAPJU/3CjXXtlERiwoN5Zafv3RSWo+xtaKwy3FcWZ92eKtq7urnpQmV+nCH3Tjh9xMiNsTABAcKN00byUdkpn+uZbqYB2Q5kiMhYEQkBbgfW9N5BRHq1HmMZsNfx/g3gehEZ4Zg8v96xzufYefwMx+pauHGamxr6TLvbqLJaXeye85lEaFAg+XmpbCqp5myrXWDRJQpfgPARMNGnpwGd4tWdFUxKiiIr1Q01vDKXGTkxfjCZvmJ6GqqwapdvPSebZkBUtRP4BsYX/17gJVUtEZEfi8gyx27fFJESESkEvgnc5zi2HvgJhhHaDvzYsc7neHVnBWHBASzKSbnwzs6QcwsEBMOeF9xzPhNZMX0kbZ3drLdQC0+fo60R9q2HrJuMeTILc+RUMzuPn+Gm6Wn9V6m+GEKGGTkhJaugvcX185lIetwwZqaP4JWdFT6VE2JqHoiqblDViao6XlUfcax7SFXXON5/X1WzVDVXVa9R1X29jn1SVSc4Fp+s89r6KGcAACAASURBVNHW2cW6PVUsyEomMjTIPSeNiDVyQopWGr5vCzN1ZAwTEiNZaUdjDZ69a43CiX5Qtv21nRUEiBGl5zby7oT2Rss3ZgO4afpIDp1sosiHGrPZmege5J19tTSc63Cf+6qHqbcaPu+jH7j3vF5GRFgxfSQ7jp3myClrZw2bRuELMGIsjJptthKX6O5WXt1VyWUT4kmKHkTuR3+Mngsxo2GP9aOxlkxNISQogFd9aN7QNiAe5LVdFcRHhnL5QI1wBsPEhRAabZRrsDjLp6Uigt0nZDA0VBq5DlNvs3zux/aj9VScPseK6S5GKp5PQIDRF6XsHWgyJ4zfXcSEBzM/M4nVuytp7/SN8HfbgHiIMy3tvL3vJPl5qQS5Eo7YF8FhxgRh6RrL+3ZTYsKZMzaW1btP+JRv1xIUrwTUGJFanFd3VjIsJJDrs5Lcf/Kpt4F2Qcmr7j+3l1kxPY3TLR28d8A3jKFtQDzEuj1VdHSp+91XPUy9zfDtHtjomfN7keV5aRw51Wz3CblYCl80emDEjTdbiUu0dnSxobiKhdkpRIS4aa6wN4mTjRB4P3BjXZGRwIiIYNb4SDFS24B4iFW7KpmYFOmecMS+SL8cotP8wo21KCeFkMAAVu/2jZvCElQXwckSv5g8f3d/LY2tneTnpXruIjm3QuUOqCvz3DW8QHBgAEumprC5tJomH2iJYBsQD1Be30LBsdPk57kpHLEveny7h96EZmtnc8eEB3PN5ATW7jlBl4/3gPYZ9rwIAUFG+K7FWVNYSXxkCJeOH2SdOGfIuRkQo2e6xcnPS6O1o5vNpeaXNrENiAdYu8d4kl6W68EnKjCePrs7jR4QFmd5Xhq1jW1sKbO2MfQK3d1Q/KpRumSYB790vUBjawdv7j3JDVM9MFfYm+hUGHuFYXgtPtc2Y/QI0oaH+8SI3TYgHmDN7hNMHz2cUbEuli65EElZkJjlF26sayYnEhUa5HOZtj5J+VY4WwnZK8xW4jJvlNTQ3tnNMk+6r3rIuRXqD0PlTs9fy4MEBAjL8lL54OAp6praLnyAJ7WYenU/5EBNI/uqGz0/+ughZwVUbLN82eqw4EAWZifzRkk1rR3WTpD0OEUrISgcJi02W4nLrN5dyajYcKaNGu75i2Uug8BQKLL+A1d+Xipd3cqGInOrONgGxM2s2X2CAIElU71kQHqeQoutH6K4fFoaTW2dvLm3xmwpvktXh9F5cNJCy7etrW1s46NDp8jP9eBcYW/CYmDi9YbL1+JVHCYnRzMpKYpVJruxbAPiRlSVNYUnuGxCPAlRod656IgxkDbTkRNgbS4ZF0dCVCjrCu3aWP1y+D1oqYPsm81W4jIbiqroVjwbfXU+2SugqQaOfeS9a3qIZXmp7Dh2mvJ683LBbAPiRnaXn+F4fQtLveW+6iF7hRHWWXvAu9d1M4EBwpKcFN7ef5JGu0Jv3xS/AqExkDHfbCUus3p3JZOTo8hIivLeRTMWQPAw4/docXrc5GbmhNgGxI2sKTxBSFAAC7O93NQn60ZA/OKmWJqbQntnN5tLbTfWZ+hoNYoCTlkKQV4a4XqI8voWdh4/453J896ERMDkxVC62nAHWphRsRFMGz2cdSZWs7YNiJvo6lbW7animkkJRIcFe/fi0Skw5nLDgFg8RHHaKCNE0cybwmc5uAnazhqBExZnvWPyd6m35gp7k70Czp2Gw+96/9pu5oapqeytOktZbZMp17cNiJvYdqSe2sY277uvesi+CeoOGq4sCxMQICyZmsL7B2o509JuthzfonglDEuAMVearcRl1u05Qe4oL4S698X4a40JdT8YsS/JSUEE0+YNTTUgIrJQRPaLyCERebCP7Q+ISKmI7BGRt0Qkvde2LhHZ7VjWnH+st1m35wThwYFcOznRHAFT8o3MZD+YTF86NZXObuWNEvMzbX2GtiY4sMlokBTogXpRXuToqWaKK8+ydKqbmqxdLEGhhhtw7zrDLWhhkmPCmJUey7o95syDmGZARCQQeAxYBGQCd4hI5nm77QJmqupUYCXws17bzqlqnmNZhol0dnXzenE1105J9EwxOGcYFgfjrjHCeS3uxspOiyY9LoK1djTW3znwutE4Ktv6pUt6vuwWu6tL52DIXmEUIz202TwNbuKG3BQOnmxif3Wj16/drwERkd+JyP/0t7jh2rOBQ6p6WFXbgReA/N47qOo7qtoTo7YVcHOzAPew9XA9dc3t5j1R9ZB9EzSUG0XjLIyIsHRqKlvKTnHK5Exbn6HkNYhMhlGXmK3EZdbtqWJm+ghSh4ebJ2LMlRAR7xdurEXZKQQIpoxCBhqBFAA7BlhcJQ0o7/W5wrGuP74I9K5dHiYiBSKyVUSW93eQiNzv2K+gttYzNfTXF51gWEggV08yyX3Vw6TFRr90P6iNdUNuCt0KG03OtPUJWs/Cwc2QtdwoomlhDp00KjXcYPbDVmAQZObD/teh3drdMBOiQpk7Po51e6q83lOn3/9GVX269wKsPO+zq/SVetrnTy8idwMzgZ/3Wj1aVWcCdwK/EZE+myKo6hOqOlNVZyYkJLiq+TN0dHWzsbia6zKTCAsOdPv5L4rw4TBhHpSsMgruWZhJSVFkJEay1o7GMtxXXW1+UXl3bWEVIia7r3rIutFwCx7cZLYSl7lhaipHTjVTcuKsV697wccZEZkrIqXAXsfnXBH5XzdcuwIY1evzSOAzYzARuQ74IbBMVT/1Z6jqCcfrYeBdYJobNF00W8rqONPSwQ1mhCP2RdaNcLYCKgvMVuISIkY01vaj9Zw8a+2JTpcpftXo/TJyltlKXEJVWbfnBHPGxpLozr7ngyX9UhiW6Bcj9oVZyQQFyKeVwL2FM+Ph3wALgDoAVS0E3BFHuB3IEJGxIhIC3A78QzSViEwD/g/DeJzstX6EiIQ63scDlwGlbtB00awrPEFUaBBXTnRz3/PBMmkRBIb4xU2xJCcFVdhYPISjsc6dgbK3jOgri7uv9tc0Ulbb7L06cRciINAosHhgk+XdWCOGhXDZhHg2FHnXjeXUf6Sqlp+3yuVKZKraCXwDeANjdPOSqpaIyI9FpCeq6udAJPDyeeG6U4ACESkE3gEeVVWvG5D2zm7eKKlmflYSoUEmu696CIuBCfP9wo2VkRTFxKTIT5POhiT7N0BXu6PagLVZv6eKAIFF3q7UMBA9bqwDb5itxGWW5KRQXn+OokrvtYZ2xoCUi8ilgIpIiIj8Cw53lquo6gZVnaiq41X1Ece6h1R1jeP9daqadH64rqpuUdUcVc11vP7ZHXoulo8OneJsa6f5E4Lnk3UjNJ4wyrxbnMU5Q9yNVfIaxIyCkTPNVuISqsr6oiouGRdHfKQPlWEZPRcik/xixH59VhJBAeLVBy5nDMhXga9jREhVAnmOz0Oe9UVVRIUFcfkE90/Ou8SkhUbfAz+4KYa0G+vcaSh7x4gW8ka5cw+yv6aRw7XNvjF53puAQOP3e3CTkaxpYYZHhHCpl91YFzQgqnpKVe9yjAQSVPVuVa3zhjhfpr2zm00l1czPTCIkyMd806FRRrXW0tW2G8vK7NsA3R1+EX21weG+8nqhUWfIXA6drXDQ+m6sG7zsxnImCmuciKwVkVoROSkiq0VknDfE+TIflRnuqyW+9kTVQ9aN0FhltD+1OEtyUoemG6t0FcSMhrTpZitxiR731ZyxPua+6mH0JUaSph+M2L3txnLm0fk54CUgBUgFXgae96QoK7BhTxVRoUFcnuEj0VfnM3GB4cYqXW22EpdZMjV56Lmxzp1xuK+WWd59daCmibLaZhb72lxhD5+6sTZDm/fLgbgTb7uxnDEgoqp/U9VOx/IM/ST8DRU6urrZVFrD/Ewfir46n9AomHAdlK6xvBtrQmIUk5KiWD+Ukgr3bzTcV5n9FlmwDOuLHO6rLB90X/WQ1ePGsn5S4ZKcZMrrz1Fc6fmkwoFqYcWKSCzwjog8KCJjRCRdRL4HrPe4Mh/mo0OnaDjX4XsTgueTtdyIxrJ4UiE4orGODSE3VukqiB7pF9FXG4qqmD021nttngfDqDmOaKxVZitxmeszjaTCdUWeTyocaASyA6Me1m3AVzDyLd4FvgZ83uPKfJiNRdVEhQZxha8kD/bHxAWOpELr3xQ9bqzXh0KJ99YGKHvbL6KvDtQ0cehkk+/OFfYQEAhTlhluLD9IKvSWG2ugWlhjVXWc4/X8ZchOond0dfNGqVH7ymfdVz2ExcD4ecY8iMVLvE9INGpjbRgK0Vj7XzeSBzPzL7yvj7OhyKh9tcAXo6/OJzPfURvL+iXee9xYnq6N5VT8qYhki8itIvK5nsWjqnyYjx21r3wqm3YgMvMdtbGsXeIdYFFOyqedH/2a0lUQlWr52lcAG4urmD0mlsQoH6h9dSHSLzVKvJdaf8Q+PzOZwADx+AOXM2G8PwJ+51iuwWjqZGoDJzPZWFxFZGgQV070seTB/pi0yCjx7gc3xeKcZLoV/+5U2HoWDr1lGH6L1746dLKRAzVNvj9X2ENAoNGp8MAmaG+58P4+TOywEOaOi/O4G8uZ/9CbgXlAtap+HsgFfHg2zHN0dnXzRkkN86Ykml+63VnCh8P4a6DE+m6sSUlRjEsYxsZiP3ZjHXjDKN3uB+6rjUWGoffJ5MH+yFoOHc1GAUuLsygnmaN1LezzYKdCZwzIOVXtBjpFJBo4CQzJOZBtR+qpb25nUbZFnqh6yMyHhuNwYqfZSlxCRFicncLHZXXU+WunwtJVjs6Dc8xW4jIbiquZmT6CJF8o3e4s6ZdDeKxfBJ4syEomQDzblM0ZA1IgIsOBP2JEZu0ErF+lbxBsKK4iPDiQq6zivuph0mIICDJyQizOIocba1NpjdlS3E9bExx600getLj76sipZvZWnWWRVdxXPQQGwZQbjCZeHdYOGY+PDGXO2Dg2eDAB15laWP+kqmdU9XFgPnCvw5U1pOjqVl4vruHayYmEh1jEfdVDRCyMvcovorEyU6JJj4vwz2isg5uMZDZ/cF853IyWcl/1kLkc2pv8wo21OCeZQyebOFjjGTfWQImE089fgFggyPF+SFFwtJ5TTW0syrHgDQHGl9LpI1BdZLYSlxARFueksKWsjtPN7WbLcS+lq2FYglFi3OJsKKoib9Rw0oaHmy3l4hl7JYQN94sR+4KsZERgQ5FnRiEDjUB+OcDyC4+o8WE2FlcTGhTANZMSzZYyOCbfABLoF7WxFmen0NWtbN7rR26s9hZjBDJlqRENZGGO17VQXHmWxVZ92AoMNu6X/Ruh09pzbYnRYcxKj/VY4MlAiYTXDLBc646Li8hCEdkvIodE5ME+toeKyIuO7Z+IyJhe277vWL9fRBa4Q09/dHcrG4uruHpSAsNCgzx5Kc8xLA7GXG5M0lrcjZWdFs3IEeEenRz0OmVvQUeLX7mvLBds0pvMfGhrgMPvma3EZRblJLOvupGyWvf3OzFtpk5EAoHHgEVAJnCHiGSet9sXgdOqOgH4NfBTx7GZGD3Us4CFwP86zucRdpWfpuZsm3Xi2fsjMx/qDsFJtzSUNI0eN9aHjppkfkHpaiP6J/1ys5W4zIbiarLTohkVG2G2lMEz7ioIjfaLEfuSnBT++6Ycj9QiMzPUYzZwSFUPq2o78AJw/uNXPvC04/1KYJ6IiGP9C6rapqpHgEOO83mEDUXVhAQGcO1ki7qvepiyFBC/uCkWZSfT0aW85Q9urI5Wo3zJlBuMKCALU3G6hcLyM9Z/2AoKNZJw962DLms/pCRGh3HH7NFEhwW7/dxmGpA0oLzX5wrHuj73UdVOoAGIc/JYAETkfhEpEJGC2traQQk919HFdZmJRHngD+BVIhMh/TK/MCB5o4aTGhPmsclBr3L4HWhv9Av31euOkFFLu696yMyH1jNw5H2zlfgszpQyuUxEhjne3y0ivxKRdDdcu68yo+c75/vbx5ljjZWqT6jqTFWdmZAwuPyN/7oxh8fu9JPAs8xlULsXavebrcQlRISF2Sm8f7CWprZOs+W4Rulqo/DlmCvNVuIyG4urmZISzdj4YWZLcZ3x10JIJOy1fjSWp3BmBPIHoEVEcoHvAceAv7rh2hXAqF6fRwLnF7D/dB8RCQJigHonj3UrYvGy2p8yZanx6gejkMU5ybR3dvP2vpNmSxk8ne1G7/PJN0BQiNlqXKK6oZUdx06z2Iq5H30RHG60RNi7Dros/pDiIZwxIJ1qVOPKB36rqr8Fotxw7e1AhoiMFZEQjEnx8039GuBex/ubgbcdWtYAtzuitMYCGQzR7PiLJjrVKJPhBzHu00ePIDEq1NrRWIffNaJ9/MJ95Yi+svr8R28y86HlFBzfYrYSn8QZA9IoIt8H7gbWO6KdXJ4McMxpfAN4A9gLvKSqJSLyYxHpqfb7ZyBORA4BDwAPOo4twejTXgq8DnxdVbtc1TRkyFwONUVQV2a2EpcICBAWZifzzv6TtLRb9AmxdLUR7TPuarOVuMzG4momJkUyITHSbCnuY8J8CI7wi9pYnsAZA3Ib0AZ8UVWrMSarf+6Oi6vqBlWdqKrjVfURx7qHVHWN432rqt6iqhNUdbaqHu517COO4yap6kZ36Bky+JEba1F2Cq0d3by7f3ABEqbS1WFE+UxaZET9WJjaxja2Ha33j8nz3oREQMZ82LsWuu1n1PNxphZWtar+SlU/cHw+rqrumAOxMYvhoyBtpl/0CJk9Npb4yBBr1sY68r4R5eMH7qs3SqpRxfrhu32RmQ/NJ+H4VrOV+BwD1cL60PHaKCJney2NIuLZPok2niczH6oKof6I2UpcIjBAWJCVzNv7TtLaYbEnxNLVRpTPeLcUdjCVjcVVjEsYxsQkP3Jf9ZCxAILC/GLE7m4GKmVyueM1SlWjey1RqhrtPYk2HiHTMc3kByGKi3NSaGnvspYbq6vTcF9NXGBE+1iYuqY2th6uZ1F2sv9EK/YmNBImXGfcK93dZqvxKZzJA7muj3X39rWvjYUYMQZS8vziqWrO2Fhih4VYq1PhsY+gpc4IaLA4b5TU0NWt/um+6iFzOTRWQYUd7NkbZybRHxKRP4jIMBFJEpG1wFJPC7PxApn5ULkDzhw3W4lLBAUGsCAribf2WsiNVbraiO6Z8JnnM8uxsbiKMXERZKb4sWNi4gIIDPWLBy534owBuQooA3YDHwLPqerNHlVl4x16Jm/9ICdkUXYKTW2dfHDwlNlSLkx3lxHVkzHfiPKxMPXN7Wwpq2NxTop/uq96CIuGCfMMA2K7sT7FGQMyApiDYUTagHTx6/+UIUTceEjO8YtorLnj4xgeEWyNaKxjW4yoHj9wX20qqfZ/91UPmflwttIYtdsAzhmQrcBGVV0IzAJSgY88qsrGe2TmQ8V2aKgwW4lLBAcGcH1mEm+W1tDW6eNurNJVEOQok2FxNhRXMzo2gqxUP3Zf9TBxIQQE+8UDl7twxoBcp6pPAqjqOVX9Jo6McBs/IPNG49Uf3Fg5KTS2dfKhL7uxuruM33XGfAixdsHBMy3tbDl0yv/dVz2EDzdCrktXW74pm7twJpHwuIiMEJHZInKliFi/ZKjN34mfAEk5UPKa2Upc5rLx8USHBbHel91Yxz823FdZ/uC+qqGzW1kyFNxXPWQth4ZyqNxpthKfwJkw3i8B72PUrPoPx+vDnpVl41Wy8o3wRIu7sUKCArg+K5nNvuzGKlllJKVl+IP7qoqRI8LJThsC7qseJi023Fglr5qtxCdwxoX1zxhzH8dU9RpgGmChjC2bC+JHbqwlU1NobPVRN1Z3l5GMljHfSE6zMA0tHXx06BRLhor7qgfbjfUPOGNAWlW1FUBEQlV1HzDJs7JsvEr8BEjK9ovJQZ92Yx3fCk01/hF9VVpNR9cQib46n6wbHW4sOxrLGQNSISLDgVXAZhFZjYebN9mYQOZyKP8EGirNVuISIUEBLMhKZnOJD7qxSh3uq4kLzVbiMuv2VDEqNpypI2PMluJ9Ji1yuLGsP2/oKs5Mot+oqmdU9WHg3zF6dFj/EcrmH+mZ1PWD2lhLphrRWB8c8CE3Vne34SKccJ3l3Venm9sd7qvUoeW+6iF8+N+TCoe4G8uZEcinqOp7qrpGVds9JcjGJOIzDDeWHzxVXTYhnpjwYN9yYx3/GJqqDfeHxdlUWk1nt3LD1CHovurBdmMBF2lA3IWIxIrIZhE56Hgd0cc+eSLysYiUiMgeEbmt17anROSIiOx2LHne/Qn8lE/dWNaOxgp21MZ6s7TGd2pjlbzqSB70D/dVetwQSR7sj0mLIDDELx64XMEUA4KRiPiWqmYAb9F3YmIL8DlVzQIWAr9xzMX08F1VzXMsuz0veQiQfZPx6gftO5dMTTXcWL4QjdXVabg7Ji6wvPuqrqmNLWV1Qy/66nzCYmD8PONeGcK1sZzJA/lGXyMEF8kHnna8f5o+5lRU9YCqHnS8PwGcBBLcrMOmN3HjIXmqX8S4X+qojbV+jw/Eexz7EJpr/26gLUxP6fYlQ9l91UPWjXC2AioLzFZiGs6MQJKB7SLykogsdFMhxSRVrQJwvCYOtLOIzAZCMAo69vCIw7X1axHpt6G0iNwvIgUiUlBba6evXJDsFYZf9/RRs5W4RHBgAAsyjaRC091Yxa9C8DDIuN5cHW5gfdEJxsUP8+/S7c4yaZFR4r34FbOVmIYzUVj/BmRgRF/dBxwUkf8SkfEDHScib4pIcR/LRTWAFpEU4G/A51W1Z6z4fWAyRoJjLPCvA+h/QlVnqurMhAR7AHNBeiZ5i60/Clmam0pzexfv7DtpnoiuDiOybdIiy3cePNXUxsdldSyZOsTdVz2ERcPE6415kG4fmWvzMk7NgaiqAtWOpROjxPtKEfnZAMdcp6rZfSyrgRqHYegxEH3e4SISDawH/k1Vt/Y6d5UatAF/AWY79dPaXJgR6ZA20y/cWJeMiyU+MoS1ZrqxDr8H5077hftqY3E13crQTB7sj+wVRnLosS1mKzEFZ+ZAvikiO4CfYZRxz1HVrwEzgBWDvO4aoKct7r3AZ9p8iUgI8BrwV1V9+bxtPcZHMOZPigepw6YvsldAdRGcOmi2EpcICgxgSU4Kb+09SVNbpzkiSl6F0Gi/6Dy4dvcJJiRGMjk5ymwpvkPGAsM9OUTdWM6MQOKBm1R1gaq+rKodAA530g2DvO6jwHwROQjMd3xGRGaKyJ8c+9wKXAnc10e47rMiUgQUOfT95yB12PRF1nJA/MaN1dbZzZulNd6/eGcb7F0Hk5dAUL/TdJagquEc247Wsyx3iCYP9kdIhOGeLF1tuCuHGM7MgTykqsf62bZ3MBdV1TpVnaeqGY7Xesf6AlX9kuP9M6oa3CtU99NwXVW9VlVzHC6xu1W1aTA6bPohOhVGz/ULN9b00SNIjQljbaEJbqyyt6GtAbKs775aV2gkZS7LTTVZiQ+SvQLO1RvuyiGGWXkgNr5O9k1Quw9qSsxW4hIBAcINuam8f7CWMy1eLqBQ9DKEx8L4a7x7XQ+wpvAEU0fGMCbe2k2wPMKEeRAa4xcPXBeLbUBs+iZzOUggFK00W4nLLJ2aSkeX8kZJtfcu2tYE+zYY7sDAYO9d1wMcOdVMUWWDPfroj6BQmHID7F1ruC2HELYBsembyAQYd7VhQCxeMC47LZoxcRGsLfRibaz9G6HzHOTc4r1reog1u08gAjdMtQ1Iv2TfBG1n4eBms5V4FduA2PRPzi3QcBzKt5mtxCVEhKW5qWwpO0Vto5eeEItehug0GHWJd67nIVSVNYWVzBoTS3JMmNlyfJexV0NEPBS9ZLYSr2IbEJv+mXKD0b+i6OUL7+vjLMtNpVthnTdyQlrqoewtY3I1wNq32N6qRspqm2331YUIDDJGIftfh9YGs9V4DWv/d9t4ltAoI0Sx5FXLhyhmJEWRlRrNql1eaJhVugq6O/3DfVV4gqAAsZMHnSHnVuhqM+ZChgi2AbEZmJxboaUODr9rthKXWZ6XRmFFA4drPRz1XbQS4idCco5nr+NhuruVNbsruTwjnthhIWbL8X1GzoQRY/1ixO4stgGxGZgJ10HYcL+4KZbmpiICq3Z70I3VUAHHPjJGHxZPuPvkSD0nGlq5cVqa2VKsgYjxdz/yPjR6MeLPRGwDYjMwQSGQmW9kVLc3m63GJZJjwrh0fByrd1einoos6wl7zh5slR/fYdWuSoaFBHJ9ZrLZUqzD1FtBu4dMaRPbgNhcmKm3QkezEZpqcfLz0jhW18Ku8jPuP7kq7HkRRs4yeqtYmNaOLjYUVbEwO4XwkECz5ViH+AxIyYM9QyMayzYgNhdm9KVGSOqeF81W4jILs5MJCQpgtScm06uL4GQpTL3twvv6OG/tPUljW6ftvhoMU2+Fqt2WL0bqDLYBsbkwAQHGl+Kht6DRhKKEbiQ6LJj5U5JYt6eKji43tyItfAECgv3CffXarkqSokOZOz7ObCnWI3sFSIBfPHBdCNuA2DhH7h2gXX4xmZ6fl0pdczsfHHRjh8quTiOJbOICiIh133lNoL65nXf3nyQ/L43AAGsHAphCVLJRxaHwBb/vl24bEBvnSJgIaTOg8HmzlbjM1ZMSGRERzCs73ejGKnvb6Huee4f7zmkS6/ecoLNbWZ5nu68GTe6d0FAOxz40W4lHsQ2IjfPk3gE1xYav38KEBAWQn5fG5pIaGlrclCC55wUIH+EXfc9f21XJ5OQoMlPtvueDZvISo5HYbus/cA2EKQZERGJFZLOIHHS8juhnv65ezaTW9Fo/VkQ+cRz/oqN7oY2nyV5h+Pj94Ka4ecZI2ru6WeOO0iatDbBvvfH7CbL2v2JZbRM7j5+xJ89dJSTCqMRcutqozOynmDUCeRB4S1UzgLccn/viXK9mUst6rf8p8GvH8aeBL3pWrg1g+PYnLjB8/V0mtYh1E1mp0UxOjmJlQbnrJytdDZ2tfuG+eHxcQQAAHgxJREFUermggsAA4cbptgFxmby7jPD3vWsuvK9FMcuA5ANPO94/jdHX3CkcfdCvBXoaVVzU8TYuknen4esve8tsJS4hItw8YySFFQ0cqGl07WS7n4e4CcYckYXp7Orm1Z0VXDMpgcQou/Kuy4yaA7HjYPdzZivxGGYZkCRVrQJwvCb2s1+YiBSIyFYR6TESccAZVe15BK4A7MclbzFhvtFlzw9uiuXT0ggKEF7ZUTH4k5w6BMe3GE+bFi9d8v7BWk42tnHLzFFmS/EPRIxR6dEP4Mxxs9V4BI8ZEBF5U0SK+1jyL+I0o1V1JnAn8BsRGQ/0dZf2W5dCRO53GKGC2lo3hm0OVYJCjJyQ/Ruguc5sNS4RHxnK1ZMSeXVXJZ2DzQnZ9Tejc2Pene4VZwIvF1QQNyyEayf39zxnc9Hk3m68Fr5grg4P4TEDoqrXqWp2H8tqoEZEUgAcryf7OccJx+th4F1gGnAKGC4iQY7dRgL9zoSq6hOqOlNVZyYkJLjt5xvSTL8Hutr9IlHq5hkjqW1s44ODpy7+4K5OI6x54gIj9t/C1De38+beGm6clkZwoB2c6TaGj4axV8LuZ/0yJ8Ss/5Q1wL2O9/cCq8/fQURGiEio4308cBlQqkYVvHeAmwc63saDJGVB2kzY+bTl291eOzmR2GEhvLh9EJPpBzdBUw1Mu8f9wrzMql2VdHSp7b7yBNM+B6ePwpH3zFbidswyII8C80XkIDDf8RkRmSkif3LsMwUoEJFCDIPxqKqWOrb9K/CAiBzCmBP5s1fV28D0z0HtPqjYbrYSlwgJCmDF9DTe3FvDycbWizt4518hMsnyuR+qyksF5eSOjGFScpTZcvyPKUuNHKGdT194X4thigFR1TpVnaeqGY7Xesf6AlX9kuP9FlXNUdVcx+ufex1/WFVnq+oEVb1FVb3U6NrmU7JvguBhfnFT3D57NJ3dysqLmUxvrDZGILl3GO1MLUxRZQP7qhu52R59eIbgMOP/ZO86aB6Eq9SHsZ2dNoMjNMowIsWvQutZs9W4xPiESOaMjeWFbeV0dzvpktv9nFEbzA/cV89uPU54cCD5eXbfc48x/V7o7vCL6MXe2AbEZvDMuA86Woye6RbnzjmjOV7fwkdlTjwhdnfDrmcg/TKIn+B5cR6k4VwHawpPkJ+XSnRYsNly/JfEyTDqEr+YN+yNbUBsBk/aDEjMhB3Wd2MtyEpmREQwz29zIl7/yHtQX2Y8VVqcVbsqOdfRxV1z0s2W4v/MuBfqDhktj/0E24DYDB4R40v0xE44sctsNS4RFhzIiukj2VRSQ23jBabUtv8JIuKMWkcWRlV59pNjTB0ZQ87IGLPl+D+ZyyE0BnY8ZbYSt2EbEBvXyL0dgiNg258uvK+Pc8ccYzL95R0DhPQ2VBhJlNM/B0Gh3hPnAQqOneZATRN3zRlttpShQUiE0a2wdI3fTKbbBsTGNcKHG5npxSuhpd5sNS7RM5n+/LbjdPU3mV7wF8OHPfML3hXnAZ7deoyo0CCW5tqT515j1hehq80vohfBNiA27mD2l41qtLv+ZrYSl/nc3DGU15/j7X19FEfobDdu/IkLjQxjC1Pf3M6Gompump5GRIi1w5AtReIUIzN9+5OWr2gNtgGxcQdJWZB+uTE30N1lthqXWJCVREpMGH/56MhnN+5dY1QinvUl7wtzMy9uL6e9q5s77clz7zPnq3C2AvavN1uJy9gGxMY9zP6yUXH04CazlbhEUGAA98xNZ0tZHfurzyvzvv1PMGIsjL/WHHFuoqOrm6e3HOWyCXF25rkZ9IxgP/k/s5W4jG1AbNzD5CUQlQrbnjBbicvcMWs0oUEBPLXl6N9XVhfB8Y8NH3aAtW+bDUVVVJ9t5QuXjTVbytAkIBBmfdkI560uNluNS1j7TrDxHQKDYebnoextqD1gthqXGDEshBunpfHargrOtLQbK7f8HkIiLZ95rqo8+eERxsUP45pJdtl205h2NwSFwzZrj0JsA2LjPmbc9//bu/PwKuqrgePfk5CQQEB20LAkCAqBQJTIIosRFIRSFZVNpVZARBbRKpW2vi19qxQtVepSfAVUsCiWRbAI1aIoYBCTsENEQFmCQWPQhACBLOf9Yy40QEK2m8y95Hye5z5JZubOPXcgOXd+v5lzILA6fP6y25GU2y+7R5Cdk8/ChEOQcdi5yuzaXzhXnfmxpAM/sjUlg/u7RxAQ4N8NsPxajXrOJb3bFvn11YuWQIz3hDVyGitteRuOfed2NOXSpklturWsz/z4/eR/Pgs035n89HNz13/DZaFB3NmpqduhmC5jIfckJPhvMXFLIMa7rp/oNJvy81NzgFE9IsnMOEpewuvOXcR1/fuKpUNHT/DBziMM79zcLt31BY2joHU/2DgLTp9wO5oysQRivKv+lU7/g4Q5cOpY8dv7sN5tGjGxzgaCcrPI7zbR7XDK7fXP9hMgwn3X+3civKT0eBROpDvFOf2QJRDjfd0nQXYGbPLvGwsDNJcR8j6f57flo8xwt8Mpl/SsU7z9xUFu7XgFl18W6nY45owW3ZwqvfEvQl6O29GUmivnsSJSD3gHiAD2A0NU9cfztrkReL7AojbAMFVdJiJvADcAGZ51v1TVLWWJJScnh5SUFLKzS9mN7hIVEhJC06ZNCQoqR2nvprHOjYUbXnbuDwn00zLhO5ZS4+QR3g0dyZdr9nJT20aI+OfE85z135Cdm8e4G/27/Pwlqcej8PZQp7dOx6FuR1Mqbg2ETgE+UtXpIjLF8/MTBTdQ1TVADJxNOHuBgnepTVbVxeUNJCUlhVq1ahEREeG3fxy8RVVJT08nJSWFyMhy3iPQ/WF4awhsXwwxw70TYGXKy4W1z0KjKKKvGcI7y3exYV8617dq4HZkpfbTidPMj9/Pz6Ivp1WjMLfDMedr3ddpi7D+eYge7Ff3GbkV6W3AmWpi84Di6mLfBaxSVa/PNGVnZ1O/fv0qnzwARIT69et752ys1c3QOBrW/sU/a/7sWOz0boibwl2xzWlYqzp//2Sf21GVyWuf7ef46Twm9LazD58UEADdH4G0ZNjzgdvRlIpbCaSxqqYCeL4Wd0fTMODt85Y9LSLbROR5ESmyrraIjBGRRBFJTEtLK2qbUoR+afPasQgIgBt/6zRe2rbQO/usLHm58Okz0Lg9tPk5IUGBjO4Ryfq9P7D54I/FP9+HZGbn8Ppn39CvXWPaNKntdjimKO3vgDotYM00p+Oln6iwBCIiq0VkRyGP20q5n8uBaKBgav4NzpzIdUA9zhv+KkhVX1XVWFWNbdiwYRneiSmzq/vDFdfCJ884lWz9xfZ/wtGvIW7K2eGEe7q2oH7NYGZ8uNvl4Epnfvx+jmXnMrF3a7dDMRcTGOR84DqyDZKXux1NiVVYAlHVm1S1fSGP5cB3nsRwJkEUUjv7rCHAu6p69hIFVU1VxyngdaBzRb2PyvD000/Trl07OnToQExMDBs3bmT06NHs2rXL7dDKRwR6/w4yDsLm+W5HUzJ5ufDps9AkGtoMPLs4rHo1JvRuxWd701m3p/AzWV+TcSKH2eu+oXebRrQPt46DPi96MDRsCx8/7TfDvm4NYb0HnGkofR9wsZQ7nPOGrwokH8GZP/HbimQbNmxgxYoVbNq0iW3btrF69WqaNWvGnDlziIqKcju88ruyj3OZ4toZkHPS7WiKt20h/PgNxP3GSYAF3N2lOeF1Qnn237vJL6rhlA95+ZO9ZGbn8Hjfq90OxZREQCD0fhLS98DW80fsfZNbV2FNB/4pIqOAg8BgABGJBcaq6mjPzxFAM+DT856/QEQaAgJsAbxSY+KP/9rJrm8zvbGrs6KuqM0fft6uyPWpqak0aNCA6tWdaZwGDZyrfOLi4pgxYwaxsbGEhYUxadIkVqxYQWhoKMuXL6dx48akpaUxduxYDh48CMDMmTPp3r27V+MvNxHnl2LeQEh8DbqNdzuiop0+4YxBXx4DVw+4YHX1aoH86uareGzRVlbuSGVgB9/t5Hfo6Ane+Gw/d1zTlKgrbO7Db7T5mWfYd7pTK8vH2ya7cgaiqumq2kdVW3u+HvUsTzyTPDw/71fVcFXNP+/5vVU12jMkdq+qZlX2e/CWvn37cujQIa666irGjRvHp5+enyvh+PHjdO3ala1bt9KrVy9mz54NwKRJk3j00UdJSEhgyZIljB7to42OIntCyzjniixfLhwX/wJkHoZb/nzB2ccZt18TztWNa/HXD78iJ893Jzv/+uFuRODxfle5HYopDRHo83un4VTia25HUywriFPAxc4UKkpYWBhJSUmsW7eONWvWMHToUKZPn37ONsHBwQwc6IzHd+rUif/85z8ArF69+px5kszMTI4dO0atWj7YJKjfNHilJ6x5Gn72V7ejuVBGCqyf6dS8anF9kZsFBgiT+13N6PmJLEw4xIiuvlcWZHtKBsu2fMu4uCvtrnN/1DIOIm9wzkKih0DN+m5HVCRLID4gMDCQuLg44uLiiI6OZt68eeesDwoKOnt5bWBgILm5zgRbfn4+GzZsIDTUD/5ING7ntIJNmO2UfW8S7XZE51o91am4e/P/Frtpn7aN6NqyHjM+2M2A9k2oH+Y7wwyqyrSVydSrGczYuCvdDseUhQj0fwZmdYePpsKtL7odUZH855bHS9Tu3bvZs2fP2Z+3bNlCixYl+1Tbt29fXnrppXOe69Nu/A2E1oWVvwb1oUnoQ1/A9kVw/YQSVdwVEf50W3uOn8rlz6u+rIQAS27FtlQ2fJ3OpD6tqR3ipyVkDDRqC10fgk3z4VCC29EUyRKIy7KysrjvvvuIioqiQ4cO7Nq1i6lTp5bouS+88AKJiYl06NCBqKgoXnnllYoNtrxC6zrjuwfjYccSt6Nx5OfBqicgrAn0+FWJn9a6cS3G9GrJ4qQUNn6dXoEBltyPx08z9b2ddGh6Gff64NCaKaW4KVDrclj5mPP/1AeJ+tInwQoWGxuriYmJ5yxLTk6mbdu2LkXkmyr0mOTnwewbIet7GPe5+x3+1s+E1X+AO+dC9F2leurJ03nc/PynhAYF8v7DPQmu5u7nsccXbeXdzYf514QeduXVpWLHElg8EgbMcAqTukREklQ19vzldgZiKldAIAycCcfTYOVkd2P5/ktnUr/NQGh/Z6mfHhocyB9vbcee77OYs/7rCgiw5Nbv+YHFSSk82KulJY9LSbs7nAn1j/4EPx1yO5oLWAIxlS/8Wuj1a6dkyI6l7sSQlwvLxkL1Wk5CK2MNsD5tG9O/fRNm/mcPOw5nFP+ECnDydB6/fXc7kQ1q8nAfK1lySRGBn/8NNA+WjvG5oSxLIMYdPR+D8E6w4lHI/LbyX/+zmfDtZueS4rDy1UibNiiaejWDmfj2ZrJOVX4Jiqnv7eTg0RNMGxRNSFBgpb++qWD1Ip3/pwfjYd1zbkdzDksgxh2B1WDQq07/9OXjK/eqrEMJzjX27e6AdoPKvbu6NYOZOSyGA+nH+f3yyq2q807CQd5JPMTE3q3odqXv3i9gyqnDUGh/F3zyZ+eqQR9hCcS4p0Er6PsU7PvY+cWoDBmHYeHdcFm4V29o7NqyPg/3ac3STYdZkpTitf1ezI7DGfzP8p30aNWAR26yO84vaSIw8DmoHQ5LRsNJ32grYAnEuCt2JFxzr9N/Y0sFF5A7fQIWDneKOg5fCDXqeXX3E3u3pktkPZ5ctqPC+4ZknMhh3IJN1K8ZzN+GxRAYYD1tLnkhl8Gdc5wh34X3QI77bbgtgfiAI0eOMGzYMK688kqioqIYMGAAX331Van2MWDAAH766acKirACiTiT2JG94L2JsH99xbyOqjNUlroN7prr3KjlZYEBwot3X0PDWtW5/40EvvrumNdfA+DE6VzGvJlIasZJXr7nWp+6E95UsOZdYNArcOAzWPqA65PqlkBcpqoMGjSIuLg49u3bx65du5g2bRrfffddqfazcuVK6tRx+Z6KsgoMgiFvQr2Wzier75O9u//8fFj1a9i5FG6aClf18+7+C2hUK4R/jOpCcGAAI+Zu5NBR73ZhPnk6j1FvJJKw/yh/HRLDtc3renX/xg9E3wV9n4bk9+DfU1yt6mC1sApaNQWObPfuPptEQ//pRa5es2YNQUFBjB3734r0MTExqCqTJ09m1apViAhPPvkkQ4cOJTU1laFDh5KZmUlubi6zZs2iZ8+eREREkJiYSFZWFv3796dHjx7Ex8cTHh7O8uXLCQ0NZd++fYwfP560tDRq1KjB7NmzadOmjXffb1mF1oF7/glz+8Jrt8CwtyDCC6Xp83Jg2UNOqZJuE6D7pPLvsxjN69dg/qjODHllAyPmbmTBA10Jr1P+emXZOXmMnp/Axm/SeW5IDLd29N1y8qaCXT8BjqXChpcgoJozlxhQ+Vfg2RmIy3bs2EGnTp0uWL506VK2bNnC1q1bWb16NZMnTyY1NZW33nqLfv36nV0XExNzwXP37NnD+PHj2blzJ3Xq1GHJEqdsyJgxY3jxxRdJSkpixowZjBs3rsLfX6nUjYBRH0LNhvDm7bB9cfn2d/qEM2G+fRH0+YPzS+atnu/FaNOkNq/f35n0rNP8/MX1xO/9oVz7O5KRzS/mfkH8vnRmDO7I7deEeylS47du/hN0fhA+/zu8NRSyK/8+JDsDKegiZwqVbf369QwfPpzAwEAaN27MDTfcQEJCAtdddx0jR44kJyeH22+/vdAEEhkZeXZ5p06d2L9/P1lZWcTHxzN48OCz2506darS3k+JnUkiC++GJaOc4axej0NQKT/BH9wI/5oEaV86cyyx91dIuBfTqUVdlk/ozoNvJnHv3I1M6d+GB3q2PFtZuaQ+2HmEJ5Zs43RuPjOHxnBbjCUPAwQEwIBnoVEbp6rDnJth2AJoUHk3k7pyBiIig0Vkp4jke7oQFrXdLSKyW0T2isiUAssjRWSjiOwRkXdEJLhyIve+du3akZSUdMHyomqU9erVi7Vr1xIeHs6IESOYP//CXuNnuhvCf8u/5+fnU6dOHbZs2XL2kZzs5bkGb6lRD0Ysg47DYd0MeKkz7FxWsrHe7AxY8St4rR+cOgb3LHYleZzRsmEY747vzi3tmzBt5ZcMfmUD6/akFfnvW9DB9BM8sXgbD76ZRLO6NVgxsYclD3Oh2JHO78vx7+HvXeH9x+BY6eZQy8qtIawdwB3A2qI2EJFA4GWgPxAFDBeRM03CnwGeV9XWwI/AqIoNt+L07t2bU6dOne0yCJCQkEDdunV55513yMvLIy0tjbVr19K5c2cOHDhAo0aNeOCBBxg1ahSbNm0q0evUrl2byMhIFi1aBDgJauvWrRXynrwiKMS52uS+FRBSGxbdB7N7O73Vv93sTIyfkXMSkv/lXB//XDtIeh26joPxG6H1Te69B4+w6tV4+e5rmTYomsM/nWTE3C+4Y1Y8ixIPkZyaebazYX6+8l1mNh8lf8fINxK4YcYaFm9K4cEbWrLkoetp2TDM5XdifFZkT6c46bW/gKQ34IUYp21C8gqncGkFcWUIS1WTgeJO5TsDe1X1a8+2C4HbRCQZ6A3c7dluHjAVmFVR8VYkEeHdd9/lkUceYfr06YSEhBAREcHMmTPJysqiY8eOiAjPPvssTZo0Yd68efzlL38hKCiIsLCwQs9AirJgwQIeeughnnrqKXJychg2bBgdO3aswHfnBZE94cG1sGme84vx8Z+cR3AYSADknoI8z1BcaD1oPwhiR8EVFw7tuUlEuLtLc+7sFM7ipBT+vmYfkxdvAyC4WgANw6rz/bFscvKcM5MGYdWZeGMrhndpbl0FTcnUagIDn3cuFlkzzfkg9cX/OevqRsLwt71++bqr5dxF5BPgcVVNLGTdXcAtZ3qki8gIoAtOsvhcVVt5ljcDVqlq+yJeYwwwBqB58+adDhw4cM56K+d+IZ8+Jlnfw741cDjJSSDVgqFaKDTvChE9nEuC/UBevvLND1ns/DaTXd9mknbsFE0uC+HyOqE0r1eDbi3ru14e3vi5nGxI3QqHPnfKn9w+yzmbL4OiyrlX2BmIiKwGmhSy6nequrwkuyhkmV5keaFU9VXgVXD6gZTgdY0vC2sEHYc6Dz8WGCC0alSLVo1q2byGqRhBIc6Nh827VNhLVFgCUdXyDj6nAM0K/NwU+Bb4AagjItVUNbfAcmOMMZXIl8+RE4DWniuugoFhwHvqjLmtAc60j7sPKMkZTZGqUlfG4tixMMaUlFuX8Q4SkRSgG/C+iHzgWX6FiKwE8JxdTAA+AJKBf6rqTs8ungB+JSJ7gfrA3LLGEhISQnp6uv3hxEke6enphISEuB2KMcYPVPme6Dk5OaSkpJCd7X5lS18QEhJC06ZNCQryj8loY0zFq/RJdH8RFBREZGSk22EYY4zf8eU5EGOMMT7MEogxxpgysQRijDGmTKrUJLqIpAEHit2wcA1w7kGpyuwY2DGo6u8fquYxaKGqDc9fWKUSSHmISGJhVyFUJXYM7BhU9fcPdgwKsiEsY4wxZWIJxBhjTJlYAim5V90OwAfYMbBjUNXfP9gxOMvmQIwxxpSJnYEYY4wpE0sgxhhjysQSSAmIyC0isltE9orIFLfjqUwi0kxE1ohIsojsFJFJbsfkFhEJFJHNIrLC7VjcICJ1RGSxiHzp+f/Qze2YKpuIPOr5PdghIm+LSJUuXW0JpBgiEgi8DPQHooDhIhLlblSVKhd4TFXbAl2B8VXs/Rc0Cae1QFX1N+DfqtoG6EgVOxYiEg48DMR6WmgH4vQpqrIsgRSvM7BXVb9W1dPAQuA2l2OqNKqaqqqbPN8fw/mjUeV6sIpIU+BnwBy3Y3GDiNQGeuHpvaOqp1X1J3ejckU1IFREqgE1qOLdUC2BFC8cOFTg5xSq4B9QABGJAK4BNrobiStmAr8G8t0OxCUtgTTgdc8w3hwRqel2UJVJVQ8DM4CDQCqQoaofuhuVuyyBFE8KWVblrn0WkTBgCfCIqma6HU9lEpGBwPeqmuR2LC6qBlwLzFLVa4DjQFWbD6yLM/oQCVwB1BSRe92Nyl2WQIqXAjQr8HNTqthpq4gE4SSPBaq61O14XNAduFVE9uMMYfYWkX+4G1KlSwFSVPXM2edinIRSldwEfKOqaaqaAywFrnc5JldZAileAtBaRCJFJBhn0uw9l2OqNCIiOOPeyar6nNvxuEFVf6OqTVU1Auff/2NVrVKfPFX1CHBIRK72LOoD7HIxJDccBLqKSA3P70UfqtiFBOer8i1ti6OquSIyAfgA56qL11R1p8thVabuwAhgu4hs8Sz7raqudDEm446JwALPB6mvgftdjqdSqepGEVkMbMK5OnEzVbysiZUyMcYYUyY2hGWMMaZMLIEYY4wpE0sgxhhjysQSiDHGmDKxBGKMMaZMLIEYU4lEJL4U234iIrHFbLNfRBqUYp+/FJGXSrq9MRdjCcSYSqSqVfrOZXNpsQRiTCFE5DoR2SYiISJS09MDon0h2y0TkSTP+jGeZS1EZI+INBCRABFZJyJ9PeuyPF8vF5G1IrLF01uiZzHxzBKRRM/r/PG81ZNF5AvPo5Vn+4YiskREEjyP7l45MMYUYHeiG1MIVU0QkfeAp4BQ4B+quqOQTUeq6lERCQUSRGSJqh4QkWeAV3AqF+8qpGrr3cAHqvq0p+dMjWJC+p3ndQKBj0Skg6pu86zLVNXOIvILnKrBA3F6dzyvqutFpDlOJYW2pT8SxhTNEogxRftfnFpo2TiNhArzsIgM8nzfDGgNpKvqHBEZDIwFYgp5XgLwmqdQ5TJV3VLINgUN8ZzhVAMux2ludiaBvF3g6/Oe728CopySTQDUFpFaxbyGMaViQ1jGFK0eEAbUAi5oXSoicTh/qLupakec2kghnnU1cCo349nHOVR1LU6DpsPAm56zh0KJSCTwONBHVTsA758XjxbyfYAnrhjPI9zTEMwYr7EEYkzRXgX+B1gAPFPI+suAH1X1hIi0wWn5e8Yznuf9Hph9/hNFpAVOj5HZONWOL1YavTZO/40MEWmM0165oKEFvm7wfP8hMKHA6xV2FmRMudgQljGF8JwR5KrqW555h3gR6a2qHxfY7N/AWBHZBuwGPvc89wbgOqC7quaJyJ0icr+qvl7guXE4k985QBZQ5BmIqm4Vkc3ATpwquJ+dt0l1EdmI84FwuGfZw8DLntiqAWtxhtOM8RqrxmuMMaZMbAjLGGNMmVgCMcYYUyaWQIwxxpSJJRBjjDFlYgnEGGNMmVgCMcYYUyaWQIwxxpTJ/wOtI45gaidN7gAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "y_sin = np.sin(x)\n",
    "y_cos = np.cos(x)\n",
    "\n",
    "# Plot the points using matplotlib\n",
    "plt.plot(x, y_sin)\n",
    "plt.plot(x, y_cos)\n",
    "plt.xlabel('x axis label')\n",
    "plt.ylabel('y axis label')\n",
    "plt.title('Sine and Cosine')\n",
    "plt.legend(['Sine', 'Cosine'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "R5IeAY03L9ja"
   },
   "source": [
    "### Subplots "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CfUzwJg0L9ja"
   },
   "source": [
    "You can plot different things in the same figure using the subplot function. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 281
    },
    "colab_type": "code",
    "id": "dM23yGH9L9ja",
    "outputId": "14dfa5ea-f453-4da5-a2ee-fea0de8f72d9"
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXIAAAEICAYAAABCnX+uAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dd1zVZf/H8dfFYW8FFAUBBwqKynJmNmxoWpppjjQtuy2btve4W3e71OwuV5qZZplZVlqZlZYLBBRFBcGBqIAIyF7X7w/w/llpOQ58z/g8Hw8fDznAOe9zhLfX9zrf6/oqrTVCCCGsl4PRAYQQQlwYKXIhhLByUuRCCGHlpMiFEMLKSZELIYSVkyIXQggrJ0Uu7JZS6ial1PdG5xDiQik5j1zYOqVUP+A1oAtQC6QBU7XWWwwNJoSZOBodQIjGpJTyBlYCU4ClgDNwMVBpZC4hzEmmVoSt6wigtV6sta7VWpdrrb/XWm9TSk1USq0/+YVKKa2UukMpla6UOq6UmqmUUqd8/lalVFrD51YrpUKNeEJC/JkUubB1e4BapdQCpdQgpVSzf/j6IUAPoDtwI3A1gFJqGPAEMBwIANYBixsttRDnQIpc2DStdTHQD9DAbCBPKfWVUqrlGb7lFa11odb6ALAWiG64/XbgP1rrNK11DfAyEC2jcmEJpMiFzWso34la62AgCmgNvHOGLz9yyt/LAM+Gv4cC05RShUqpQqAAUEBQI8UW4qxJkQu7orXeBcynvtDPxUHgdq217yl/3LTWv5s9pBDnSIpc2DSlVIRS6kGlVHDDx22AMcDGc7yr94HHlVJdGu7HRyk10rxphTg/UuTC1p0AegGblFKl1Bd4KvDgudyJ1no58CqwRClV3HAfg8ycVYjzIguChBDCysmIXAghrJwUuRBCWDkpciGEsHJS5EIIYeUM2TTL399fh4WFGfHQQghhtRITE/O11gF/vt2QIg8LCyMhIcGIhxZCCKullNp/utvNMrWilJqnlMpVSqWa4/6EEEKcPXPNkc8HBprpvoQQQpwDs0ytaK1/VUqFmeO+bE15VS0J+wvYfeQE6UdLyMgrobi8mqraOqpq6nBzMhHo40qgjythfh70CGtOTIgvrk4mo6ML0eT2Hytlw95jZOaXkpVfyoFjZVTU1KI1aDReLk6ENHcnxM+dDgGe9O3gR3Azd6NjG67J5siVUpOByQAhISFN9bCGKCqvZlXqYX7Ymcv6jDwqqusA8PNwpkMLT9oHeOLi5ICzyYGyqloOF5Wzce8xlicdQmtwdnQgNsSXodFBDOnWCi9XJ4OfkRCNQ2tN4v7jfJ2Swy978th3rAyo/x0Ibe5OqJ8HHi4mFKCUorCsivTcE/y0O5eqmvrfq3b+HvTvGMCIuGCignwMfDbGMdsS/YYR+Uqt9T/uKhcfH69t8c3OvXklzP9tH8u2ZlNWVUuQrxtXRLbgsogWdA3ywc/T5W+/v6ismi37CtiUdYy1u/PIyC3B1cmBa7q2YlK/tnRpbZ8/pML2lFXV8GVSDgs37iftcDFuTib6tPfjko4B9Av3J8zPA5ODOuP319VpMvJKWJeez/r0PH7fe4zKmjq6B/swtlcIQ6ODbPKoVimVqLWO/8vtUuQXbl9+Ka98t4tVO47gbHLguujWTOgTRlSQN6dcKeycaK1JyS5iacJBvkrOoaSyhmu6BnL/FR0Jb+ll5mcgRNOorq1jyeYDTFuTTn5JFZGtvLm5TyhDo1vj7nz+EwRF5dUs35rNJ5sPsOdoCa18XLn/io4Mjw3C0WQ7y2WkyBtBUXk1M9aks2DDPpxNDkzq15bxfcII8Pr7kff5PM7cdZnM+20fpVU1jIwL5olrIvF1dzbr4wjRmFbvOMIr3+0iK7+Unm2b89BVnegR1uy8Bzuno7Xm973HeG31blIOFtKhhSdPDY7k0k4tzPYYRmrUIldKLQYuBfyBo8CzWuu5Z/p6WyjyH3Ye5fEvtnGstIqRccE8dFUnWni7NupjHi+t4r2fM5j32z6auTvx3HVdGNy1lVl/EYQwt2MllTy9IpVvtx8hvIUnjw2K4PKIFo36c6u1ZlXqEV5fvZvM/FJuiA3mmSGd8XG37vebGn1Efi6suchPVFTzwsqdLE3IpnMrb14b0a3J32DZkVPEY8u2s/1QEVd2bslrN3SjmYeMzoXl+W77YZ76MpUTFTVMvTKcyRe3a9KpjsqaWmasyeC/v+zFz8OZ/wzvyoDIM12u1fJJkZvB9uwipixKJKewnCmXtue+AR1xdjRm/q2mto55v2Xx+urdtPBy5d2xMcSE/NMF4oVoGlU1dTy/cgcfbzxA1yAf3ryxOx0NfG8n9VARD32Wwq4jJ7j9knY8fFUnq5w7lyK/QMuTsnls2Xb8PV2YPiaauNDmRkcCIOVgIXcu2kruiQqeuCaSiX3DZKpFGOpIUQVTFiWSdKDQokqzsqaW57/eyaJNB+jTzo/pY2LM/n5WY5MiP081tXW88t0u5qzPone75swcG/uPpxE2taKyah78LJkf03IZ07MNLwyNsohfHGF/Evcf5/aFCZRX1fL6yO5c07WV0ZH+YlliNk8s304zd2fmToy3qtN6z1Tk8tv+Nyqqa7nj40TmrM9iYt8wFk7qZXElDuDj7sTsm+O5+7IOLN58kEkLEiiprDE6lrAzP+w8ytjZG/F0ceTLuy6yyBIHuCEumC/u7ItSMOqDjfyWkW90pAsmRX4GReXVjJ+7iTW7cnlhaBeeu64LThY8ylVK8dDVnfjP8K6sz8hn5PsbyC2uMDqWsBOLNx/g9oUJRAR6sWxKX4tf69CltQ9f3NmXIF83Jn64mS+TDhkd6YJYbjMZKLe4glEfbCD5YCEzxsQwvk+Y0ZHO2pieIcydEM/+Y6WMmrWRw0XlRkcSNu7dn9J5/Ivt9O8YwOLJvS3yqPV0Wvm4sfSOPsSFNmPqp8ks+H2f0ZHOmxT5nxwtrmDUrI0cKChj3sQeDOnW2uhI5+zSTi1YOKkn+ScqufGDDRwsKDM6krBR7/y4hze+38PwmCBm3xx/QaszjeDj5sSCW3tyVeeWPPvVDuatzzI60nmRIj9F7okKxszeSG5xBQsn9eTi8L9ciMNqxIU25+PbelFUVs3oWRvZf6zU6EjCxrzz4x7e+TGdEXHBvD6yu0VPPf4dF0cTM2+KZWCXQJ5fuZM56zKNjnTOrPOVbwR5JyoZO3sTR4oqmH9rT4s5vfBCdG/jyyf/6k1ZVQ1jZ2+SaRZhNtN+TP9fib96Q7e/3eDKGjiZHJgxNoZrugby4jdpVlfmUuTUn743bs4mDh0vZ97EHvQIs/4SPykqyIeFk3pRXF7/HI+VVBodSVi5eeuzePvHPdwQaxslfpKTyYFpo/+/zD9LOGh0pLNm90VeUV3LpAVbyMovZc6EeHq38zM6ktlFBfkwZ0I82cfLmfDhZoorqo2OJKzUl0mHeH7lTgZ2CeS1EbZT4ic5mRx4e1Q0F4f789gX2/l+xxGjI50Vuy7ymto67v4kicQDx3lrVHcu6uBvdKRG06udH++Pi2PX4RP8a0EClTW1RkcSVmbt7lwe+iyFPu38eGd0tM2V+EkujibeHxdHVJAPdy9OYsPeY0ZH+kd2W+Raa576MpUf047y3LVdrPLslHN1WUQL3ryxO5uyCnjk823U1TX9ql5hnVIOFjLl40QiWnkx6+Y4m7xow6k8XByZP7EHoc3dmfxRAnuOnjA60t+y2yJ/7+e9LNlykLsv68CEvmFGx2kyQ6ODeGRgJ1Yk5/DWD3uMjiOsQPbxMiYtSCDAy4X5t/S0m0sPNvNwZv6tPXF1NnHLh1vIO2G57y/ZZZF/s+0wr6/ezbDo1jx4VUej4zS5KZe0Z0zPNry7NoMlmw8YHUdYsOKKaibNr5+K+3BiD/ytZLGPuQT5ujF3QjzHSiu57aP6PWQskd0VecrBQh5YmkxcaDNeuaGbXe4UqJTi+aFR9O8YwJNfpvK7Dew1Iczv5HtIe/NKeH9cHB1aWPay+8bSLdiXaaNj2JZdyP2fJlvklKRdFXlOYTm3fVR/iPjBeNuf5/s7TiYHZo6NoZ2/B3d+spUDx2T1p/ijF79J49c9ebw4LMqmTwQ4G1d3CeTJayJZteMI76xJNzrOX9hNkVdU13L7wkTKq2qZZ4eHiKfj5Vq/a6LW8K+PZMdE8f+WJhxk/u/7mNSvLaN7hhgdxyJM6teWEXHBTF+TzqrUw0bH+QO7KHKtNU8uT2X7oSLeHhVt6JVKLE2Yvwfvjo0hPfeExR42iqaVdOA4Ty1P5aIOfjw+KMLoOBZDKcWLw6KIbuPLA0tT2HWk2OhI/2MXRb7g930s25rNfQPCubKz9V6vr7FcHB7Ak4M788POo0z/yfIOG0XTyS2u4I6PE2np48K7Y2LlAiV/4upk4oPxcXi6OPKvjxI4XlpldCTADop8U+YxXvgmjSsiW3LfgHCj41isWy8K4/qYIKatSefn3blGxxEGqK6t485FWykur2HW+Hi5oPcZtPR25f3xcRwtqmSqhRzF2nSR5xZXcNcnSYT6ufPWqO442OhKNHNQSvHy9V3p1NKL+5Yky9a3duiV73aRsP84r47oRmQrb6PjWLTYkGY8c21nftmTZxFHsTZb5CdPnSqtrOH9cXF428kihgvh5ly/NLlOa6YsSqSi2jLPmRXm9822w8xtuKThdd1tf5WzOdzUK4ThsZZxFGuzRf7697vZvK+A/wzvKm9unoMwfw/evjGa1EPFPPfVDqPjiCawN6+ERz5PISbElyeuiTQ6jtVQSvHSsPqj2KmfJpN93LijWJss8u93HOGDXzIZ1zuEYTFBRsexOld0bsmdl7ZnyZaDLE/KNjqOaETlVbVM+TgRFycT790Ui7OjTVZCozl5FFtbq7nrkySqauoMyWFz/2oHC8p48LMUugX78PSQzkbHsVoPXNmRnm2b8+TyVDJyS4yOIxrJs1+lkp5bwjujomnl42Z0HKsU5u/BayO6kXKwkFdX7TIkg00VeVVNHXcvTgJg5thYXBztd+XmhXI0OTBjTAxuTibuWrTVYveYEOfvi63ZLE3I5u7LOtC/o/Ve1tASDOraiol9w5i7PsuQPcxtqshfW7WLlIOFvD6iG22auxsdx+q19Hbl7VHR7Mk9wbNfpRodR5hRRu4JnlyeSs+2zeW0XDN5/JoIugb58NBnKU1+1pfNFPmPO48yZ30WE/qEMjCqldFxbEb/jgHcdWkHliZksyL5kNFxhBlUVNdy16Ik3JxNTB8dI4t+zMTF0cTMsbFoDfcsTqK6tunmy23iXzCnsJyHPk+hS2tvHpd33c1u6hXhxIc248nlqezLLzU6jrhAz6/cye6jJ3jrxu4E+rgaHcemhPi588oN3Ug+WMib3zfdfv9WX+S1dZqpS5Kprqnj3bGxdr2jYWNxNDkwbUwMJgfFPYuNe2deXLjvth/mk00HuL1/Oy7t1MLoODZpcLdWjOkZwvu/7OXXPXlN8phWX+Qzfkpn874CXhgWRVt/D6Pj2KwgXzdeG9GN7YeKDHtnXlyY7ONlPLpsG92DfXjwqk5Gx7FpzwzpTMeWnjywNKVJrixk1UW+OauA6WvSGR4TxPDYYKPj2LyruwQyoU8oc9dnsVb2Y7EqNbV1TF2STJ2GGWPkfPHG5uZsYsaYWE5UVPPA0sbfj8Vq/zULy6qYuiSJkObuPD8syug4duPxayKJCPTioaUp5J6oMDqOOEvT16STsP84L10fRYifnNHVFDoFevHMtZ1Zl57PnPWZjfpYVlnkWmseW7advJJKZoyJxdPF0ehIdsPVycSMMTGUVtXw4NIUi9j5Tfy9jZnHeHdtBiPjghkaLSudm9LYniEM7BLI66t3sz27qNEexyqLfPHmg6zacYRHro6ga7CP0XHsTnhLL54e0jQjDXFhCsuquP/TZEL9PHjuui5Gx7E7SileuaEr/p4u3LukfhO/xmCWIldKDVRK7VZKZSilHjPHfZ5J+tETPL9yBxeH+zOpX9vGfCjxN06ONF5btZtt2YVGxxGnobXm0WXbyC+pZProGDzkyNUQvu7OvD0qmn3HSnm2kTaiu+AiV0qZgJnAIKAzMEYp1SibnFRU13LP4iQ8nB1580bZX9xIJ0caAV4u3Ls4Sa73aYE+2XyA1TuO8vDVneTI1WC92/lx92Ud+Dwxm9WNsITfHCPynkCG1jpTa10FLAGGmuF+/+K1VbvZdeQEb4zsTgsvWchgtJMjjf0FZbLlrYVJP3qCF1bu5OJwf27r187oOAK4b0A4jw6M4OJwf7PftzmKPAg4eMrH2Q23/YFSarJSKkEplZCXd34nyV/TNZCHr+7EZRGykMFSnDrS+Colx+g4gv8/cnV3duTNkXLkaikcTQ5MubQ97s7mn+IyR5Gf7qfkL6cyaK1naa3jtdbxAQHnt9NafFhz7rqsw3l9r2g89w4IJybElye/2C6XiLMAr67a1XDk2o0W3nLkag/MUeTZQJtTPg4GZGhmR5xMDkwfHQPAfUuSqGnCzYLEH63dlcuHv+1jYt8wLo9oaXQc0UTMUeRbgHClVFullDMwGvjKDPcrrEib5u68eH0UWw8UMm2N8RejtUe5xRU89FkKEYFePDYowug4ogldcJFrrWuAu4HVQBqwVGst73zZoaHRQYyIC+bdtRls2HvM6Dh2pa5O88DSFEqrapgxJkY2j7MzZjmPXGv9rda6o9a6vdb6JXPcp7BO/76uC2F+Htz/aTLHS6uMjmM3Pvg1k/UZ+Tx7bRfC5WLjdscqV3YKy+Xh4siMMTEcK63kkWXb0FqW8De2pAPHefP73Qzu2orRPdr88zcImyNFLswuKsiHRwdG8MPOo3y0Yb/RcWxacUU19y5JoqW3Ky8P74pScqqhPZIiF41iUr+2XB7Rgpe+SSP1UONtFmTPtNY8/sV2cgormD4mGh83J6MjCYNIkYtGoZTijZHdaebhxD2yhL9RfLL5AN9sO8yDV3UkLrS50XGEgaTIRaNp7uHMtNEx7D9WytNfpsp8uRmlHS7m+a/rl+Df0b+90XGEwaTIRaPq3c6PeweEszzpEJ8lZBsdxyaUVdVw9ydb8XZz4u1R0bIEX0iRi8Z3z+XhXNTBj6dXpJJ2uNjoOFZNa82Ty1PJzC9l2qho/D1djI4kLIAUuWh0JgfFO6Ni8HFz4q5FW2W+/AIs3nyQ5UmHmDqgI307mH8XPWGdpMhFkwjwcmH6mBj2HSvl8S+2y3z5eUg9VMRzX+2gf8cA7rlcNo8T/0+KXDSZ3u38ePCqTnydksOC3/cZHceqFJVXM2VRIn6ezrwj8+LiT6TIRZOackl7rohswYvfpLFlX4HRcaxCXZ3mwaXJHC6s4N2xsTT3cDY6krAwUuSiSTk4KN68MZrgZm7cuWgrucUVRkeyeDN+yuDHtFyeGhxJXGgzo+MICyRFLpqcj5sT74+Po6Sihrs+2Uq17F9+RmvSjvL2j3sYHhvEhL5hRscRFkqKXBgiItCbV27oypZ9x3lh5U6j41ikrPxSpn6aTFSQNy9fL/uoiDMz/8XjhDhLQ6ODSD1UxOx1WXQK9OKmXqFGR7IYxRXVTP4oAUcHxfvj4mR/cfG3ZEQuDPXYoEgu7RTAsyt2yMUoGtTU1nHPJ0lk5Zfy3k1xBDdzNzqSsHBS5MJQJgfF9DExhPq5c+eiRA4ck4s3v/RtGr/syePFYVH0ae9ndBxhBaTIheG8XZ2YM6EHdRpuXbCForJqoyMZZtGm/Xz42z4m9WvL6J4hRscRVkKKXFiEtv4evD8ujgPHypi8MIHKmlqjIzW5tbtzeWbFDi7rFMAT10QaHUdYESlyYTH6tPfj9ZHd2JRVwEOfbaOuzn6W8accLOTOj7cSEejF9DExmGTlpjgHctaKsChDo4PIKazg1VW7aO3jyuN2MDLdl1/KrfO34O/lzIe39MDLVa70I86NFLmwOHdc0o6cwnI++DUTX3dnplxquxdOyD1Rwc3zNqOBBbf0pIWXq9GRhBWSIhcWRynFc9d1obiimldX7cLDxcTNfcKMjmV2x0oquWn2JvJLKll0Wy/aBXgaHUlYKSlyYZFMDvXX/CyrquWZFTtwd3ZkRFyw0bHMprCsinFzN3OgoIz5t/QkJkT2UBHnT97sFBbLyeTAjDEx9OvgzyOfp7A8yTYuFVdcUc3N8zazN7eE2TfHy7ni4oJJkQuL5upkYtbNcfRu58cDS1P4ZNMBoyNdkGMllYydvZG0w8X8d1ws/TsGGB1J2AApcmHx3J0dmTexB5d2DOCJ5duZuz7L6Ejn5XBROTd+sIH0oyV8MD6OAZEtjY4kbIQUubAKrk4mPhgfz6CoQF5YuZM3Vu+2qvPMs/JLGfHfDeQWV7JwUi8uj5ASF+YjRS6shrNj/Zz5qPg2vLs2g3uXJFFRbfkrQDfsPcbw936jvLqWxZN707Ntc6MjCRsjZ60Iq+JocuCVG7rSNsCDV77bxaHCcmbfHI+/p4vR0U5r0ab9PLtiB6F+7syd0IMwfw+jIwkbJCNyYXWUUtxxSXv+e1MsaYeLGTJ9PZsyLWsL3IrqWp7+MpUnl6fSL9yf5XddJCUuGo0UubBag7q24vM7+uLmbGLM7I3MWJNOrQXMm+8+coJhM39j4cb9TO7fjrkTeuAty+5FI5IiF1YtKsiHr+/px7XdW/PmD3u4ac5GMvNKDMlSV6eZ/1sW1767nvySSj6c2IMnromUDbBEo1NaN/0IJj4+XickJDT54wrbpbXms8RsXli5k8rqOqZc2p4pl7ZvskukJR8s5NmvdpBysJDLOgXw2ojuBHhZ5ry9sF5KqUStdfyfb5c3O4VNUEpxY3wbLu0UwIsr05i2Jp0VyYeYekVHhnRrhaOpcQ4+DxeV89b3e/gsMZsALxfeHNmd4bFBcqFk0aRkRC5s0rr0PF5cmcbuoydo6+/BnZe257ro1rg4mmeEvutIMbN+zeSr5ByUglsvass9A8LxdJGxkWg8ZxqRX1CRK6VGAs8BkUBPrfVZtbMUuWgKdXWa73ceZfqadHYeLsbb1ZHB3VozPDaIuJBmOJzj3PXR4gpWpR7h2+2H2ZRVgJuTiVE92jCpX1vaNJcLJIvG11hFHgnUAR8AD0mRC0uktWZdej7Lkw6xKvUI5dW1+Lg5ERPiS1xIMyJbeePv5YKfhzPerk6UVddQWllDYVk1u46cYOfhYlIPFbEtuwiA8BaeDI1uzU29Qmnm4WzwsxP2pFHmyLXWaQ13fiF3I0SjUkrRv2MA/TsG8OKwGn5MO8qGvcdI3H+cn3fn/eP3e7s60rm1Nw9c2ZFBUYGEt/RqgtRCnL0mm9BTSk0GJgOEhMjVwYUxPFwcGRodxNDoIACKyqrJOlZKQWkl+SVVnKiowd3ZhIeLI16ujnQI8CS4mZsMVoRF+8ciV0r9CASe5lNPaq1XnO0Daa1nAbOgfmrlrBMK0Yh83J2Idvc1OoYQF+Qfi1xrfUVTBBFCCHF+ZGWnEEJYuQs9a+V6YAYQABQCyVrrq8/i+/KA/ef5sP5A/nl+r62Q10BeA3t//mCfr0Go1vovl5UyZEHQhVBKJZzu9Bt7Iq+BvAb2/vxBXoNTydSKEEJYOSlyIYSwctZY5LOMDmAB5DWQ18Denz/Ia/A/VjdHLkRTUUrtAO7SWv9sdBYh/o41jsiFOC2l1FilVIJSqkQpdVgp9Z1Sqt/53p/WuouUuLAGUuTCJiilHgDeAV4GWgIhwHvAUCNzCdEUrKrIlVIDlVK7lVIZSqnHjM7TlJRSbZRSa5VSaUqpHUqp+4zOZBSllEkplaSUWtnwsQ/wPPXTIF9orUu11tVa66+11g8rpVyUUu8opXIa/ryjlHJp+F5/pdRKpVShUqpAKbVOKeXQ8Ll9SqkrGv7+nFJqqVLqI6XUiYZ/g/hTMrVWSi1TSuUppbKUUvc24vP3VUp9rpTa1fDz0KexHstSKaXub/g3SFVKLVZKuRqdyUhWU+RKKRMwExgEdAbGKKU6G5uqSdUAD2qtI4HewF129vxPdR+QdsrHfQBXYPkZvv5J6l+zaKA70BN4quFzDwLZ1C9qawk8AZzpjaPrgCWAL/AV8C5AQ/F/DaQAQcAAYKpS6h8Xx52nacAqrXUE9c8n7R++3qYopYKAe4F4rXUUYAJGG5vKWFZT5NT/8mVorTO11lXU/0LZzWGz1vqw1nprw99PUP/LG2RsqqanlAoGBgNzTrnZD8jXWtec4dtuAp7XWudqrfOAfwPjGz5XDbSifsVctdZ6nT7zGQDrtdbfaq1rgYXUlyhADyBAa/281rpKa50JzKYRykUp5Q30B+YCNDxeobkfxwo4Am5KKUfAHcgxOI+hrKnIg4CDp3ycjR0WGYBSKgyIATYZm8QQ7wCPUH9Bk5OOAf4Nv9Sn05o/bgmxv+E2gNeBDOB7pVTmP0zZHTnl72WAa8NjhgKtG6ZnCpVShdSP7Fue7ZM6B+2APODDhumlOUopj0Z4HIultT4EvAEcAA4DRVrr741NZSxrKvLTbQhtd+dOKqU8gWXAVK11sdF5mpJSagiQq7VO/NOnNgAVwLAzfGsO9WV7UkjDbWitT2itH9RatwOuBR5QSg04x2gHgSytte8pf7y01tec4/2cDUcgFviv1joGKAXs7f2iZtQfjbel/j9kD6XUOGNTGcuaijwbaHPKx8HY2eGUUsqJ+hJfpLX+wug8BrgIuE4ptY/6qbXLlVIfa62LgGeAmUqpYUopd6WUk1JqkFLqNWAx8JRSKkAp5d/wtR9D/X8OSqkOqv7KEcVAbcOfc7EZKFZKPaqUcmt4MzZKKdXDLM/6j7KBbK31yaOxz6kvdntyBfX/ceZprauBL4C+BmcylDUV+RYgXCnVVinlTP3841cGZ2oyDUUzF0jTWr9ldB4jaK0f11oHa63DqP/3/0lrPa7hc28BD1D/JmYe9aPku4EvgReBBGAbsB3Y2nAbQDjwI1BC/cj+vXM9d7xhzvxa6t9MzaJ+R745gM95PtW/e6wjwEGlVKeGmwYAO839OBbuANC74T9sRf1rYFdv+P6ZVa3sVEpdQ/0cqQmYp7V+yeBITaZhYcs66ovo5PzwE1rrb41LZXMA9c8AABw3SURBVByl1KXUX/B7iNFZmppSKpr6/yicgUzgFq31cWNTNS2l1L+BUdSfzZUE3Ka1rjQ2lXGsqsiFEEL8lTVNrQghhDgNKXIhhLByUuRCCGHlzrSAolH5+/vrsLAwIx5aCCGsVmJiYv7prtlpliJXSs0DTi7WiPqnrw8LCyMhIcEcDy2EEHZDKXXai9aba2plPjDQTPclhBDiHJhlRK61/rVh/49GlXa4mLwTlfi6O+Hj5kRzD2e8XJ0a+2GFsBrHS6vIK6mktLKG8qr6Bar+Xi4EeLrg4+aEg8PpdroQ1q7J5siVUpOByQAhISHndR8fb9zPok0H/nBbSHN3ugX7EN3Gl8siWtA+wPOCswphDbTWbD9UxI87j7LtUBFph4s5WnzmNTEujg50D/YlLqwZPcKa0be9P65OpiZMLBqL2RYENYzIV57NHHl8fLw+nznynMJycgrLKSyrpqi8miPFFWzPLmJbdiE5RRUAdGntzbXdW3N9TBAtve16r3lhozJyT7Bo0wFWpx4hp6gCk4MivIUnka28iWzlRSsfNzxdHHF3NlGnIb+kkvySSg4UlLF1/3F25BRTU6fxcnHkmq6tuD42iJ5hzWW0bgWUUola6/i/3G5NRf53cgrL+S71CF+n5JB8sBBnkwM3xAVxe//2hPnb1S6fwgZprdmUVcDsXzNZsysXZ0cH+ocHMDAqkAERLWjm4XzW91VeVcuWfQWsSM7hu9TDlFXV0rGlJ/cOCOeaqFZS6BbM5ov8VPvyS5mzPpOlCdnU1NYxLDqIxwZF0EJG6MIKpR0u5vmvd7Ih8xjNPZy5uU8o43uH4ufpcsH3XVZVw3fbj/DfX/aSkVtCeAtPHriyIwOjAqnfj0pYkkYtcqXUYuBSwB84CjyrtZ57pq9v7CI/Kbe4gjnrs5j/+z6cTQ7cf2VHJvQJxdEk66CE5TteWsVbP+xh0ab9eLs5MXVAOKN7hjTKvHZtnebb7YeZviad9NwSLukYwIvDomjT3N3sjyXOX6OPyM9FUxX5SfvyS3n2qx38siePiEAv3hkdTUSgd5M9vhDnau2uXB7+PIXjZdWM6xXC/Vd2xNf97KdPzldtneajDft4Y/VuarXmvgEdmdy/HSaZbrEIdl3kUD/HuHrHUZ5ekUpReTVPDY5kfO9QOXwUFqW8qpaXv01j4cb9RAR68faoaCJbNf2g43BROc+u2MH3O4/Su11zpo2OkZMHLIDdF/lJ+SWVPPRZCj/vzuOKyJa8MbJbk4x0hPgnmXklTF6YSEZuCbf1a8vDAzvh4mjc6YFaaz5PzOaZFTtwdzbx9qho+nf8y+pw0YTOVOR2N1ns7+nCvAk9eHpIZ37Zk8v17/1OZl6J0bGEnftlTx5DZ/5GQWkVH0/qxVNDOhta4gBKKUbGt+Gruy/Cz9OZCR9uZubaDOQaBpbH7oocwMFBMalfWz75V2+Kyqu5/r3f+T0j3+hYwg5prZmzLpNbPtxMkK8bK+66iH7h/kbH+oPwll6suKsf13Zrzeurd/Posm1U19b98zeKJmOXRX5Sj7DmrLjrIlp6u3DzvM0s3XLQ6EjCjtTVaf799U5e/CaNqzoHsmxKX4s9S8TN2cS00dHce3kHliZkM/HDzRSVVxsdSzSw6yIHaNPcnWVT+tKnvR+PLNvG3PVZRkcSdqC6to4HP0th/u/7mNSvLe/dFIuHiyG7Sp81pRQPXNWJN0Z2Z3NWAaNnbeRYid1eJtOi2H2RA3i5OjFnQjyDogJ5YeVOpq9Jl3lA0WgqqmuZ8nEiy5MO8dBVHXlqcKRVraYcERfMnAk9yMwrYdSsjeQWVxgdye5JkTdwcTQxY0wMN8QG89YPe3jlu11S5sLsKqpr+ddHCazZlcsLw6K4+/JwqzwF9pKOAcy/pSc5heXc+MEGDhWWGx3JrkmRn8LR5MDrI7oxvncoH/yayVs/7DE6krAhVTV13LloK+vS83n1hvqfM2vWp70fCyf14lhJFaNnbeBIkYzMjSJF/icODop/X9eFUfFtmPFTBjPXZhgdSdiA6to67v5kKz/tyuXl67tyY3wboyOZRVxoMxbe1ouCkirGzd0kc+YGkSI/DQcHxcvDuzI0uv50q3nyBqi4AHV1mgeXpvD9zqP8+7oujO11fvvxW6roNr7MndiDgwVl3DxvM8UVcjZLU5MiPwOTg+LNkd0Z2CWQ51fuZEXyIaMjCSukteaFb3byVUoOjw6MYELfMKMjNYre7fx4f3wce46eYNL8LVRU1xodya5Ikf8NR5MD08ZE07tdcx76LIXfZNGQOEezfs3kw9/2cetFbbnjknZGx2lUl3VqwTujYkjYf5ypS5KprZOTBZqKFPk/cHE08cH4eNr5e3LHwkTSDhcbHUlYieVJ2fznu10M6daKpwZHWuXZKedqcLdWPD24M6t2HOGlb9KMjmM3pMjPgo+bEx/e0gMPF0cmfriZHDnVSvyDjZnHePizbfRp58ebN3a3qvPEL9St/dpy60VtmfdbliywayJS5Gepta8b82/tQWllLbctSKCsqsboSMJC7csv5Y6PEwn1c+f98XGGb35lhCcHRzKwSyAvfrOT73ccMTqOzZMiPwcRgd5MHxNN2pFiHlyaQp3MAYo/KSqvZtKCLQDMndADHzcngxMZw+SgeGd0NN2CfZn6aTK7jsiUZGOSIj9Hl0e05IlBkXyXeoR3fpQFQ+L/1TScK36goIz3x8XZ/UW/XZ1MzBofh6eLI7ctSJBzzBuRFPl5uO3itoyMC2b6Txl8nZJjdBxhIV75bhfr0vN5cVgUvdv5GR3HIrT0dmXWzfHknqhkyqKtVNXI9reNQYr8PCileOn6rsSHNuORz7fJYaNgRfIh5qzPYkKfUEb1sK0FPxcquo0vr93Qjc1ZBbywcqfRcWySFPl5cnZ04L2bYvFydeT2hYmyN7Md25lTzKPLttEzrDlPDelsdByLNCwmiMn927Fw436WJWYbHcfmSJFfgBbervx3XCw5heXc/2myvPlphwrLqrj94wR83ZyZeVMsTib5lTqTR67uRJ92fjyxfDs7coqMjmNT5KfuAsWFNueZIZ35aVcu09akGx1HNKG6Os3UT5M5WlTJf8fFEuDlYnQki+ZocmDG2BiauTtzx8eJFJZVGR3JZkiRm8G43qEMjw1i+k/p/Lonz+g4oom893MGP+/O4+lrOxMT0szoOFbB39OF98bFcqSoQo5izUiK3AyUUrw0rCsdW3gx9dNkDhfJyk9b93tGPm/9sIeh0a0ZZ2O7GTa22JBmPHNtF9buzuP9X/caHccmSJGbiZuziZk3xVJZXcs9nyTJVcZt2NHiCu5dkkS7AE9evr6rXeyhYm7jeoUwpFsr3li9m02Zx4yOY/WkyM2oQwtPXh7elYT9x3l99W6j44hGUFNbxz2LkyitrOW/VnDBZEullOI/w7sS6ufBPYuTyJfFQhdEitzMhkYHMa53CLN+zeSnXUeNjiPMbPqadDZnFfDS9VGEt/QyOo5V83J1YubYWIrKq2W+/AJJkTeCpwZ3pnMrbx5cmiLz5Tbkt4x8ZqzNYGRcMMNjg42OYxM6t/bm39d1YV16Pv/9RebLz5cUeSNwdTLx7tgYKmvquG9xMjUyX2718k5Uct+SZNoHePLvoV2MjmNTRvVow7XdW/PWD3tI2FdgdByrJEXeSNoFePLisCg27ytgupxfbtXq6jQPLE3mREU1746Nwd1Z5sXNSSnFy9dHEeTrxr2Lk+T88vMgRd6IhscGMyIumBlrM/h9r1wmzlrNWpfJuvR8nr22CxGB3kbHsUlerk68OzaGvJJKHvl8G1rLfPm5kCJvZM8P7UJbfw/u/zSZglIZaVibpAPHeWP1bgZ3bcWYnm2MjmPTugX78ujACL7feZSFG/cbHceqSJE3MndnR6aPjuF4aTWPfJ4iIw0rUlxRzb1Lkmjp7crLw+V88aZw60VtubRTAC9+kya7ip4DKfImEBXkw2ODIvgxLZePNshIwxporXlqeSo5hRVMHxNtt1f6aWoODoo3RnbH29WJexcnUV5Va3QkqyBF3kRuuSiMyyNa8NK3aaQdlpGGpVu29RBfpeRw34Bw4kKbGx3Hrvh7uvDWjd3Zc7SEF7+R/cvPhlmKXCk1UCm1WymVoZR6zBz3aWuUUrw+ohs+bk7cIyMNi5aVX8ozK1Lp1bY5d13Wweg4dql/xwBu79+ORZsOsCpVLt78Ty64yJVSJmAmMAjoDIxRSsnu+qfh1zDSyMiVkYalqqqp474lSTiZHHh7VDQmB5kXN8qDV3WiW7APj32xTRbW/QNzjMh7Ahla60ytdRWwBBhqhvu1SReHy0jDkr35w262ZRfx6g3daO3rZnQcu+bs6MC00TFU1dQxdUkytbKE/4zMUeRBwMFTPs5uuO0PlFKTlVIJSqmEvDz73rP7was60TVIRhqWZn16Ph/8ksnYXiEMjAo0Oo4A2vp78O/rurApq4D3ZQn/GZmjyE937PmX/zq11rO01vFa6/iAgAAzPKz1cnZ0YPqY+pHG/Z/KSMMSHCup5P6lyXRo4cnTg2Vm0JKMiAv+3xL+rQeOGx3HIpmjyLOBU1dKBAM5Zrhfm3ZypLExU0YaRtNa8/Dn2ygqr2bGmBjcnE1GRxKnUErx4rAoAr1duW9JEsUVcqHzPzNHkW8BwpVSbZVSzsBo4Csz3K/Nk5GGZVjw+z5+2pXL44MiiGwlS/AtkY+bE9PHRJNTWMFTy1NlYd2fXHCRa61rgLuB1UAasFRrveNC79ceKKV46fooWvm4cu9iGWkYIe1wMS9/t4vLI1owsW+Y0XHE34gLbc7UAeF8lZLDsq2HjI5jUcxyHrnW+lutdUetdXut9UvmuE974e3qxLTRMRwukpFGUyurquGexUn4uDnx+ohusgTfCtx5WQd6tW3OMytSycwrMTqOxZCVnRYgLrTZ/0YanyVmGx3Hbjz/9U725pXw9o3R+Hm6GB1HnAWTg+LtUdE4mRy4d0kSVTWy1z9IkVuMOy/rQO92zXl2xQ4ycmWk0di+TslhyZaD3HFJe/qF+xsdR5yD1r5uvDaiG6mHinlt1S6j41gEKXILYXJQTBtdf8bE3Z9spaJalvA3loMFZTzxxXZiQnx54MqORscR5+HqLoHc3CeUOeuz5Nq4SJFblJberrwxshu7jpzgP9+mGR3HJlXX1nHP4iRQMH10DE4m+RWwVk9cE0lkw7VxjxRVGB3HUPJTbGEuj2jJpH5tWbBhP6tSDxsdx+a8vno3yQcLeWV4N9o0dzc6jrgAf7g27pIku15YJ0VugR4dGEH3YB8e/nwbBwvKjI5jM37adZRZv2YyrncIg7u1MjqOMIP2AZ48PzSKTVkFTLPja+NKkVsgZ0cH3h0bC8Ddn2yVd+bNIKewnAeWptC5lTdPyRJ8mzIiLpgbYoOZ8VM669Ltcx8nKXIL1aa5O6+P6E5KdhH/+U7myy9EdW0d9y5Oorqmjpk3xeLqJEvwbc0Lw7rQIcCTqUuSOVpsf/PlUuQWbGBUILdcFMaHv+3ju+0yX36+Xlu1i4T9x3l5eFfa+nsYHUc0AndnR967KZayqlruWZxETa19HcVKkVu4xwdF0r2NLw9/vk1Wsp2HVamHmb0ui/G9Qxka/ZfdlYUNCW/pxUvXR7E5q4A3vt9jdJwmJUVu4ZwdHXjvplicTIopH2+lrKrG6EhWIzOvhIc+20b3Nr48NSTS6DiiCQyPDWZMzxDe/2WvXV24RYrcCgT5ujF9TAx7ck/w+BfbZT+Ws1BeVcudi7biZFK8d1MsLo4yL24vnruuM92DfXjosxS7OYqVIrcSF4cH8OCVHVmRnMOC3/cZHceiaa15dNk2dh89wTujYwiSS7bZFRdHE++Ni8PJpLjj40RKK23/KFaK3IrceWkHrohswYvfpLFh7zGj41isWb9m8lVKDg9d1YlLOtr31ajsVZCvGzPGxJKRW8Ijy7bZ/FGsFLkVcXBQvDUqmlA/d+76ZKssFjqNX/bk8eqqXVzTNZA7L21vdBxhoH7h/jwyMIJvth1m5toMo+M0KilyK+Pt6sTsm+Oprq1j8sJEefPzFPvyS7nnk610bOnF6yO6y/7igtv7t2NYdGve+H4P3++w3Tc/pcitULsAT2aMiWH3kWIe/mwbdXa8x8RJReXVTFqwBQcHxazx8Xi4OBodSVgApRSv3NCNbsE+3P9pMruPnDA6UqOQIrdSl3ZqwWODIvhm+2He/GG30XEMVV1bx52LEjlQUMb74+II8ZPNsMT/c3UyMWt8PO4ujkxasIW8E5VGRzI7KXIr9q+L2zGmZwgz1+5l6ZaDRscxhNaap79M5beMY/xneDd6t/MzOpKwQIE+rsy5OZ78kkpu+yiB8irb2u9fityKKaV4fmgXLg7354nl21mfnm90pCb3wa+ZLNlykLsv68CIuGCj4wgL1r2NL9NHx7Atu5B7bWzbWylyK+dkql/52aGFJ1M+TmRnTrHRkZrMF1uzeeW7XQzu1kqu9CPOylVdAnl2SGd+2HmUF1butJnTEqXIbYCXqxPzJvbA09WRm+dtZl9+qdGRGt1Pu47y8Ofb6Nvej7du7I6Dg5yhIs7OxIvaclu/tsz/fR/v/bzX6DhmIUVuI1r7urFwUk9q6+oYN3eTTV/6KnF/AXcu2kpkKy8+GB8ny+/FOXvimkiujwni9dW7bWKltBS5DenQwosFt/bkeGkVN8/bxPHSKqMjmV3qoSJunZ9AKx835t/SEy9XJ6MjCSvk4KB4fUQ3ruzckme/2sGyxGyjI10QKXIb0y3Yl9kT4tl3rIyb5myiwIbKPPVQETfN2YSniyMf3doTf08XoyMJK+ZocmDGmBgu6uDHI8u28c02693zX4rcBvVt78/sm+PZm1fC2NkbOVZi/efN7sgpYtzc+hJfMrm3XDhZmMXJc8xjQ3y5Z/FWlidZ58hcitxGXdIxgLkTepCVX8qY2RutehHE9uz6kbi7k4nF/5ISF+bl4eLI/Ft60qutHw8sTbHKNRlS5DasX7g/H07swYGCMm78YAP7j1nf2Sy/7slj1KwNeDg7smRyH1m1KRqFh4sj8yb2oF8Hfx5Zto35v2UZHemcSJHbuL4d/Fl0Wy+Ol1Ux/L3fSTlYaHSks/Zl0iFunb+FUD8Plt/ZV0pcNCo3ZxOzb47nys4tee7rnby4cqfV7GMkRW4H4kKbs2xKX9ycTYyetZE1aUeNjvS3tNbMXJvB1E+TiQ9rxqe396aFt6vRsYQdcHUy8f64OCb2DWPO+izuXLTVKpbzS5HbifYBnnxxZ186tPDkto8SmPZjukWONkoqa5jy8VZeX72b67q3Zv4tPfGWUwxFEzI5KJ67rgtPD+nM6p1HGDVrg8Xv/S9FbkdaeLny6e29GRYdxNs/7uHWBVsoLLOc0xP35pUwbOZv/JB2lKcGRzJtdDSuTrLYRxhjUr+2fDAujqy8UgZPX2fR+5lLkdsZd2dH3rqxOy8Oi+L3jGMMnr6e3zKM3Wyrrk6z4Pd9DJm+noLSKhZO6sltF7eTC0MIw13VJZCV9/YjxM+dyQsTeWHlTiqqLW+qRRmxaUx8fLxOSEho8scVf5R8sJD7P02uP0WxZxsevyayyacxDhaU8eiybfy+9xj9Owbw6g1daeUjF0sWlqWyppaXvknjow37aevvwcvXd6VP+6bfMlkplai1jv/L7VLk9q2iupa3f9jD7HWZtPBy5dFBnbiuexCmRt6EqrSyhlm/ZjJ7XSYKeGpIZ0b3aCOjcGHR1qfn88Ty7fWn9MYH8/DVEQR4Nd0KYyly8bdSDhbyxPLt7MgpJiLQi4eu6sSAyBZmL9bKmlqWJR7i7R/3kHeiksFdW/HYoAhZ5COsRnlVLdPWpDN7XSZOJsWEPmFM7t8OvybYMqJRilwpNRJ4DogEemqtz6qdpcgtU12d5pvth3nrhz1k5ZcSEejF2F4hDI0OwsftwqZcDheV88mmAyzefID8kiriQ5vxxOBIYkOamSm9EE0rK7+UGWvS+TL5EK5OJkbEBTMyrg1RQd6NdmTZWEUeCdQBHwAPSZHbhuraOpZvPcSCDfvYkVOMq5MDV3YO5OIO/vTt4Edws38ePWutycgt4efdeazdncumrALqtGZARAtu7hPGxeH+Mo0ibEJGbgkz12bwzfbDVNXUERHoxZBurejVzo9uwT5m3Wa5UadWlFI/I0Vuk7ZnF/HJ5gP8sPMo+Q2bbwX5uhHq506bZu608nXFpBR1Gmrr6jhUWEFWfgmZ+aUUllUD0LGlJ1dEtmRMzxCZQhE2q6ismq+35fBZYvb/VlA7OzrQpbU3rX3dCPR2paW3C4OiWp3374HhRa6UmgxMBggJCYnbv3//BT+uaDpaa/YcLeG3jHySDhZysKCM7OPl/yv3kwK9XWnr70HbAA+6tPbm0k4tCPKVs1CEfSkorSJhXwGbswpIzSniaHElR4oqKK+u5eNJvegX7n9e93veRa6U+hEIPM2nntRar2j4mp+REbldqq6tA8CkFEoh0yVCnIHWmpLKGpwdHc57uuVMRe54Fg9+xXk9orALTiZZUybE2VBKNdoVreS3UAghrNwFFblS6nqlVDbQB/hGKbXaPLGEEEKcLUMWBCml8oDzfbfTHzB2cxDjyWsgr4G9P3+wz9cgVGsd8OcbDSnyC6GUSjjdZL89kddAXgN7f/4gr8GpZI5cCCGsnBS5EEJYOWss8llGB7AA8hrIa2Dvzx/kNfgfq5sjF0II8UfWOCIXQghxCilyIYSwclZV5EqpgUqp3UqpDKXUY0bnaUpKqTZKqbVKqTSl1A6l1H1GZzKKUsqklEpSSq00OosRlFK+SqnPlVK7Gn4e+hidqakppe5v+D1IVUotVkq5Gp3JSFZT5EopEzATGAR0BsYopTobm6pJ1QAPaq0jgd7AXXb2/E91H5BmdAgDTQNWaa0jgO7Y2WuhlAoC7gXitdZRgAkYbWwqY1lNkQM9gQytdabWugpYAgw1OFOT0Vof1lpvbfj7Cep/eYOMTdX0lFLBwGBgjtFZjKCU8gb6A3MBtNZVWutCY1MZwhFwU0o5Au5AjsF5DGVNRR4EHDzl42zssMgAlFJhQAywydgkhngHeIT6K1PZo3ZAHvBhw/TSHKWUh9GhmpLW+hDwBnAAOAwUaa2/NzaVsaypyE+30bXdnTuplPIElgFTtdbFRudpSkqpIUCu1jrR6CwGcgRigf9qrWOAUsDe3i9qRv3ReFugNeChlBpnbCpjWVORZwNtTvk4GDs7nFJKOVFf4ou01l8YnccAFwHXKaX2UT+1drlS6mNjIzW5bCBba33yaOxz6ovdnlwBZGmt87TW1cAXQF+DMxnKmop8CxCulGqrlHKm/s2NrwzO1GRU/aV35gJpWuu3jM5jBK3141rrYK11GPX//j9pre1qJKa1PgIcVEp1arhpALDTwEhGOAD0Vkq5N/xeDMDO3vD9s3+8QpCl0FrXKKXuBlZT/y71PK31DoNjNaWLgPHAdqVUcsNtT2itvzUwkzDGPcCihgFNJnCLwXmalNZ6k1Lqc2Ar9WdzJWHny/Vlib4QQlg5a5paEUIIcRpS5EIIYeWkyIUQwspJkQshhJWTIhdCCCsnRS6EEFZOilwIIazc/wHu8fP7vucweAAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Compute the x and y coordinates for points on sine and cosine curves\n",
    "x = np.arange(0, 3 * np.pi, 0.1)\n",
    "y_sin = np.sin(x)\n",
    "y_cos = np.cos(x)\n",
    "\n",
    "# Set up a subplot grid that has height 2 and width 1,\n",
    "# and set the first such subplot as active.\n",
    "plt.subplot(2, 1, 1)\n",
    "\n",
    "# Make the first plot\n",
    "plt.plot(x, y_sin)\n",
    "plt.title('Sine')\n",
    "\n",
    "# Set the second subplot as active, and make the second plot.\n",
    "plt.subplot(2, 1, 2)\n",
    "plt.plot(x, y_cos)\n",
    "plt.title('Cosine')\n",
    "\n",
    "# Show the figure.\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gLtsST5SL9jc"
   },
   "source": [
    "You can read much more about the `subplot` function in the [documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "colab-tutorial.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}


================================================
FILE: linear-classify.md
================================================
---
layout: page
permalink: /linear-classify/
---

Table of Contents:

- [Linear Classification](#linear-classification)
  - [Parameterized mapping from images to label scores](#parameterized-mapping-from-images-to-label-scores)
  - [Interpreting a linear classifier](#interpreting-a-linear-classifier)
  - [Loss function](#loss-function)
    - [Multiclass Support Vector Machine loss](#multiclass-support-vector-machine-loss)
  - [Practical Considerations](#practical-considerations)
  - [Softmax classifier](#softmax-classifier)
  - [SVM vs. Softmax](#svm-vs-softmax)
  - [Interactive web demo](#interactive-web-demo)
  - [Summary](#summary)
  - [Further Reading](#further-reading)

<a name='intro'></a>

## Linear Classification

In the last section we introduced the problem of Image Classification, which is the task of assigning a single label to an image from a fixed set of categories. Moreover, we described the k-Nearest Neighbor (kNN) classifier which labels images by comparing them to (annotated) images from the training set. As we saw, kNN has a number of disadvantages:

- The classifier must *remember* all of the training data and store it for future comparisons with the test data. This is space inefficient because datasets may easily be gigabytes in size.
- Classifying a test image is expensive since it requires a comparison to all training images.

**Overview**. We are now going to develop a more powerful approach to image classification that we will eventually naturally extend to entire Neural Networks and Convolutional Neural Networks. The approach will have two major components: a **score function** that maps the raw data to class scores, and a **loss function** that quantifies the agreement between the predicted scores and the ground truth labels. We will then cast this as an optimization problem in which we will minimize the loss function with respect to the parameters of the score function.

<a name='score'></a>

### Parameterized mapping from images to label scores

The first component of this approach is to define the score function that maps the pixel values of an image to confidence scores for each class. We will develop the approach with a concrete example. As before, let's assume a training dataset of images \\( x_i \in R^D \\), each associated with a label \\( y_i \\). Here \\( i = 1 \dots N \\) and \\( y_i \in \{ 1 \dots K \} \\). That is, we have **N** examples (each with a dimensionality **D**) and **K** distinct categories. For example, in CIFAR-10 we have a training set of **N** = 50,000 images, each with **D** = 32 x 32 x 3 = 3072 pixels, and **K** = 10, since there are 10 distinct classes (dog, cat, car, etc). We will now define the score function \\(f: R^D \mapsto R^K\\) that maps the raw image pixels to class scores.

**Linear classifier.** In this module we will start out with arguably the simplest possible function, a linear mapping:

$$
f(x_i, W, b) =  W x_i + b
$$

In the above equation, we are assuming that the image \\(x_i\\) has all of its pixels flattened out to a single column vector of shape [D x 1]. The matrix **W** (of size [K x D]), and the vector **b** (of size [K x 1]) are the **parameters** of the function. In CIFAR-10, \\(x_i\\) contains all pixels in the i-th image flattened into a single [3072 x 1] column, **W** is [10 x 3072] and **b** is [10 x 1], so 3072 numbers come into the function (the raw pixel values) and 10 numbers come out (the class scores). The parameters in **W** are often called the **weights**, and **b** is called the **bias vector** because it influences the output scores, but without interacting with the actual data \\(x_i\\). However, you will often hear people use the terms *weights* and *parameters* interchangeably.

There are a few things to note:

- First, note that the single matrix multiplication \\(W x_i\\) is effectively evaluating 10 separate classifiers in parallel (one for each class), where each classifier is a row of **W**.
- Notice also that we think of the input data \\( (x_i, y_i) \\) as given and fixed, but we have control over the setting of the parameters **W,b**. Our goal will be to set these in such way that the computed scores match the ground truth labels across the whole training set. We will go into much more detail about how this is done, but intuitively we wish that the correct class has a score that is higher than the scores of incorrect classes.
- An advantage of this approach is that the training data is used to learn the parameters **W,b**, but once the learning is complete we can discard the entire training set and only keep the learned parameters. That is because a new test image can be simply forwarded through the function and classified based on the computed scores.
- Lastly, note that classifying the test image involves a single matrix multiplication and addition, which is significantly faster than comparing a test image to all training images.

> Foreshadowing: Convolutional Neural Networks will map image pixels to scores exactly as shown above, but the mapping ( f ) will be more complex and will contain more parameters.

<a name='interpret'></a>

### Interpreting a linear classifier

Notice that a linear classifier computes the score of a class as a weighted sum of all of its pixel values across all 3 of its color channels. Depending on precisely what values we set for these weights, the function has the capacity to like or dislike (depending on the sign of each weight) certain colors at certain positions in the image. For instance, you can imagine that the "ship" class might be more likely if there is a lot of blue on the sides of an image (which could likely correspond to water). You might expect that the "ship" classifier would then have a lot of positive weights across its blue channel weights (presence of blue increases score of ship), and negative weights in the red/green channels (presence of red/green decreases the score of ship).

<div class="fig figcenter fighighlight">
  <img src="/assets/imagemap.jpg">
  <div class="figcaption">An example of mapping an image to class scores. For the sake of visualization, we assume the image only has 4 pixels (4 monochrome pixels, we are not considering color channels in this example for brevity), and that we have 3 classes (red (cat), green (dog), blue (ship) class). (Clarification: in particular, the colors here simply indicate 3 classes and are not related to the RGB channels.) We stretch the image pixels into a column and perform matrix multiplication to get the scores for each class. Note that this particular set of weights W is not good at all: the weights assign our cat image a very low cat score. In particular, this set of weights seems convinced that it's looking at a dog.</div>
</div>

**Analogy of images as high-dimensional points.** Since the images are stretched into high-dimensional column vectors, we can interpret each image as a single point in this space (e.g. each image in CIFAR-10 is a point in 3072-dimensional space of 32x32x3 pixels). Analogously, the entire dataset is a (labeled) set of points.

Since we defined the score of each class as a weighted sum of all image pixels, each class score is a linear function over this space. We cannot visualize 3072-dimensional spaces, but if we imagine squashing all those dimensions into only two dimensions, then we can try to visualize what the classifier might be doing:

<div class="fig figcenter fighighlight">
  <img src="/assets/pixelspace.jpeg">
  <div class="figcaption">
    Cartoon representation of the image space, where each image is a single point, and three classifiers are visualized. Using the example of the car classifier (in red), the red line shows all points in the space that get a score of zero for the car class. The red arrow shows the direction of increase, so all points to the right of the red line have positive (and linearly increasing) scores, and all points to the left have a negative (and linearly decreasing) scores.
  </div>
</div>

As we saw above, every row of \\(W\\) is a classifier for one of the classes. The geometric interpretation of these numbers is that as we change one of the rows of \\(W\\), the corresponding line in the pixel space will rotate in different directions. The biases \\(b\\), on the other hand, allow our classifiers to translate the lines. In particular, note that without the bias terms, plugging in \\( x_i = 0 \\) would always give score of zero regardless of the weights, so all lines would be forced to cross the origin.

**Interpretation of linear classifiers as template matching.**
Another interpretation for the weights \\(W\\) is that each row of \\(W\\) corresponds to a *template* (or sometimes also called a *prototype*) for one of the classes. The score of each class for an image is then obtained by comparing each template with the image using an *inner product* (or *dot product*) one by one to find the one that "fits" best. With this terminology, the linear classifier is doing template matching, where the templates are learned. Another way to think of it is that we are still effectively doing Nearest Neighbor, but instead of having thousands of training images we are only using a single image per class (although we will learn it, and it does not necessarily have to be one of the images in the training set), and we use the (negative) inner product as the distance instead of the L1 or L2 distance.

<div class="fig figcenter fighighlight">
  <img src="/assets/templates.jpg">
  <div class="figcaption">
    Skipping ahead a bit: Example learned weights at the end of learning for CIFAR-10. Note that, for example, the ship template contains a lot of blue pixels as expected. This template will therefore give a high score once it is matched against images of ships on the ocean with an inner product.
  </div>
</div>

Additionally, note that the horse template seems to contain a two-headed horse, which is due to both left and right facing horses in the dataset. The linear classifier *merges* these two modes of horses in the data into a single template. Similarly, the car classifier seems to have merged several modes into a single template which has to identify cars from all sides, and of all colors. In particular, this template ended up being red, which hints that there are more red cars in the CIFAR-10 dataset than of any other color. The linear classifier is too weak to properly account for different-colored cars, but as we will see later neural networks will allow us to perform this task. Looking ahead a bit, a neural network will be able to develop intermediate neurons in its hidden layers that could detect specific car types (e.g. green car facing left, blue car facing front, etc.), and neurons on the next layer could combine these into a more accurate car score through a weighted sum of the individual car detectors.

**Bias trick.** Before moving on we want to mention a common simplifying trick to representing the two parameters \\(W,b\\) as one. Recall that we defined the score function as:

$$
f(x_i, W, b) =  W x_i + b
$$

As we proceed through the material it is a little cumbersome to keep track of two sets of parameters (the biases \\(b\\) and weights \\(W\\)) separately. A commonly used trick is to combine the two sets of parameters into a single matrix that holds both of them by extending the vector \\(x_i\\) with one additional dimension that always holds the constant \\(1\\) - a default *bias dimension*. With the extra dimension, the new score function will simplify to a single matrix multiply:

$$
f(x_i, W) =  W x_i
$$

With our CIFAR-10 example, \\(x_i\\) is now [3073 x 1] instead of [3072 x 1] - (with the extra dimension holding the constant 1), and \\(W\\) is now [10 x 3073] instead of [10 x 3072]. The extra column that \\(W\\) now corresponds to the bias \\(b\\). An illustration might help clarify:

<div class="fig figcenter fighighlight">
  <img src="/assets/wb.jpeg">
  <div class="figcaption">
    Illustration of the bias trick. Doing a matrix multiplication and then adding a bias vector (left) is equivalent to adding a bias dimension with a constant of 1 to all input vectors and extending the weight matrix by 1 column - a bias column (right). Thus, if we preprocess our data by appending ones to all vectors we only have to learn a single matrix of weights instead of two matrices that hold the weights and the biases.
  </div>
</div>

**Image data preprocessing.** As a quick note, in the examples above we used the raw pixel values (which range from [0...255]). In Machine Learning, it is a very common practice to always perform normalization of your input features (in the case of images, every pixel is thought of as a feature). In particular, it is important to **center your data** by subtracting the mean from every feature. In the case of images, this corresponds to computing a *mean image* across the training images and subtracting it from every image to get images where the pixels range from approximately [-127 ... 127]. Further common preprocessing is to scale each input feature so that its values range from [-1, 1]. Of these, zero mean centering is arguably more important but we will have to wait for its justification until we understand the dynamics of gradient descent.

<a name='loss'></a>

### Loss function
In the previous section we defined a function from the pixel values to class scores, which was parameterized by a set of weights \\(W\\). Moreover, we saw that we don't have control over the data \\( (x_i,y_i) \\) (it is fixed and given), but we do have control over these weights and we want to set them so that the predicted class scores are consistent with the ground truth labels in the training data.

For example, going back to the example image of a cat and its scores for the classes "cat", "dog" and "ship", we saw that the particular set of weights in that example was not very good at all: We fed in the pixels that depict a cat but the cat score came out very low (-96.8) compared to the other classes (dog score 437.9 and ship score 61.95). We are going to measure our unhappiness with outcomes such as this one with a **loss function** (or sometimes also referred to as the **cost function** or the **objective**). Intuitively, the loss will be high if we're doing a poor job of classifying the training data, and it will be low if we're doing well.

<a name='svm'></a>

#### Multiclass Support Vector Machine loss

There are several ways to define the details of the loss function. As a first example we will first develop a commonly used loss called the **Multiclass Support Vector Machine** (SVM) loss. The SVM loss is set up so that the SVM "wants" the correct class for each image to a have a score higher than the incorrect classes by some fixed margin \\(\Delta\\). Notice that it's sometimes helpful to anthropomorphise the loss functions as we did above: The SVM "wants" a certain outcome in the sense that the outcome would yield a lower loss (which is good).

Let's now get more precise. Recall that for the i-th example we are given the pixels of image \\( x_i \\) and the label \\( y_i \\) that specifies the index of the correct class. The score function takes the pixels and computes the vector \\( f(x_i, W) \\) of class scores, which we will abbreviate to \\(s\\) (short for scores).  For example, the score for the j-th class is the j-th element: \\( s_j = f(x_i, W)_j \\). The Multiclass SVM loss for the i-th example is then formalized as follows:

$$
L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)
$$

**Example.** Lets unpack this with an example to see how it works. Suppose that we have three classes that receive the scores \\( s = [13, -7, 11]\\), and that the first class is the true class (i.e. \\(y_i = 0\\)). Also assume that \\(\Delta\\) (a hyperparameter we will go into more detail about soon) is 10. The expression above sums over all incorrect classes (\\(j \neq y_i\\)), so we get two terms:

$$
L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10)
$$

You can see that the first term gives zero since [-7 - 13 + 10] gives a negative number, which is then thresholded to zero with the \\(max(0,-)\\) function. We get zero loss for this pair because the correct class score (13) was greater than the incorrect class score (-7) by at least the margin 10. In fact the difference was 20, which is much greater than 10 but the SVM only cares that the difference is at least 10; Any additional difference above the margin is clamped at zero with the max operation. The second term computes [11 - 13 + 10] which gives 8. That is, even though the correct class had a higher score than the incorrect class (13 > 11), it was not greater by the desired margin of 10. The difference was only 2, which is why the loss comes out to 8 (i.e. how much higher the difference would have to be to meet the margin). In summary, the SVM loss function wants the score of the correct class \\(y_i\\) to be larger than the incorrect class scores by at least by \\(\Delta\\) (delta). If this is not the case, we will accumulate loss.

Note that in this particular module we are working with linear score functions ( \\( f(x_i; W) =  W x_i \\) ), so we can also rewrite the loss function in this equivalent form:

$$
L_i = \sum_{j\neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)
$$

where \\(w_j\\) is the j-th row of \\(W\\) reshaped as a column. However, this will not necessarily be the case once we start to consider more complex forms of the score function \\(f\\).

A last piece of terminology we'll mention before we finish with this section is that the threshold at zero \\(max(0,-)\\) function is often called the **hinge loss**. You'll sometimes hear about people instead using the squared hinge loss SVM (or L2-SVM), which uses the form \\(max(0,-)^2\\) that penalizes violated margins more strongly (quadratically instead of linearly). The unsquared version is more standard, but in some datasets the squared hinge loss can work better. This can be determined during cross-validation.

> The loss function quantifies our unhappiness with predictions on the training set


<div class="fig figcenter fighighlight">
  <img src="/assets/margin.jpg">
  <div class="figcaption">
    The Multiclass Support Vector Machine "wants" the score of the correct class to be higher than all other scores by at least a margin of delta. If any class has a score inside the red region (or higher), then there will be accumulated loss. Otherwise the loss will be zero. Our objective will be to find the weights that will simultaneously satisfy this constraint for all examples in the training data and give a total loss that is as low as possible.<br>
  </div>
</div>


<a name='regularization'></a>

**Regularization**. There is one bug with the loss function we presented above. Suppose that we have a dataset and a set of parameters **W** that correctly classify every example (i.e. all scores are so that all the margins are met, and \\(L_i = 0\\) for all i). The issue is that this set of **W** is not necessarily unique: there might be many similar **W** that correctly classify the examples. One easy way to see this is that if some parameters **W** correctly classify all examples (so loss is zero for each example), then any multiple of these parameters \\( \lambda W \\) where \\( \lambda > 1 \\) will also give zero loss because this transformation uniformly stretches all score magnitudes and hence also their absolute differences. For example, if the difference in scores between a correct class and a nearest incorrect class was 15, then multiplying all elements of **W** by 2 would make the new difference 30.

In other words, we wish to encode some preference for a certain set of weights **W** over others to remove this ambiguity. We can do so by extending the loss function with a **regularization penalty** \\(R(W)\\). The most common regularization penalty is the squared **L2** norm that discourages large weights through an elementwise quadratic penalty over all parameters:

$$
R(W) = \sum_k\sum_l W_{k,l}^2
$$

In the expression above, we are summing up all the squared elements of \\(W\\). Notice that the regularization function is not a function of the data, it is only based on the weights. Including the regularization penalty completes the full Multiclass Support Vector Machine loss, which is made up of two components: the **data loss** (which is the average loss \\(L_i\\) over all examples) and the **regularization loss**. That is, the full Multiclass SVM loss becomes:

$$
L =  \underbrace{ \frac{1}{N} \sum_i L_i }_\text{data loss} + \underbrace{ \lambda R(W) }_\text{regularization loss} \\\\
$$

Or expanding this out in its full form:

$$
L = \frac{1}{N} \sum_i \sum_{j\neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta) \right] + \lambda \sum_k\sum_l W_{k,l}^2
$$

Where \\(N\\) is the number of training examples. As you can see, we append the regularization penalty to the loss objective, weighted by a hyperparameter \\(\lambda\\). There is no simple way of setting this hyperparameter and it is usually determined by cross-validation.

In addition to the motivation we provided above there are many desirable properties to include the regularization penalty, many of which we will come back to in later sections. For example, it turns out that including the L2 penalty leads to the appealing **max margin** property in SVMs (See [CS229](http://cs229.stanford.edu/notes/cs229-notes3.pdf) lecture notes for full details if you are interested).

The most appealing property is that penalizing large weights tends to improve generalization, because it means that no input dimension can have a very large influence on the scores all by itself. For example, suppose that we have some input vector \\(x = [1,1,1,1] \\) and two weight vectors \\(w_1 = [1,0,0,0]\\), \\(w_2 = [0.25,0.25,0.25,0.25] \\). Then \\(w_1^Tx = w_2^Tx = 1\\) so both weight vectors lead to the same dot product, but the L2 penalty of \\(w_1\\) is 1.0 while the L2 penalty of \\(w_2\\) is only 0.25. Therefore, according to the L2 penalty the weight vector \\(w_2\\) would be preferred since it achieves a lower regularization loss. Intuitively, this is because the weights in \\(w_2\\) are smaller and more diffuse. Since the L2 penalty prefers smaller and more diffuse weight vectors, the final classifier is encouraged to take into account all input dimensions to small amounts rather than a few input dimensions and very strongly. As we will see later in the class, this effect can improve the generalization performance of the classifiers on test images and lead to less *overfitting*.

Note that biases do not have the same effect since, unlike the weights, they do not control the strength of influence of an input dimension. Therefore, it is common to only regularize the weights \\(W\\) but not the biases \\(b\\). However, in practice this often turns out to have a negligible effect. Lastly, note that due to the regularization penalty we can never achieve loss of exactly 0.0 on all examples, because this would only be possible in the pathological setting of \\(W = 0\\).

**Code**. Here is the loss function (without regularization) implemented in Python, in both unvectorized and half-vectorized form:

```python
def L_i(x, y, W):
  """
  unvectorized version. Compute the multiclass svm loss for a single example (x,y)
  - x is a column vector representing an image (e.g. 3073 x 1 in CIFAR-10)
    with an appended bias dimension in the 3073-rd position (i.e. bias trick)
  - y is an integer giving index of correct class (e.g. between 0 and 9 in CIFAR-10)
  - W is the weight matrix (e.g. 10 x 3073 in CIFAR-10)
  """
  delta = 1.0 # see notes about delta later in this section
  scores = W.dot(x) # scores becomes of size 10 x 1, the scores for each class
  correct_class_score = scores[y]
  D = W.shape[0] # number of classes, e.g. 10
  loss_i = 0.0
  for j in range(D): # iterate over all wrong classes
    if j == y:
      # skip for the true class to only loop over incorrect classes
      continue
    # accumulate loss for the i-th example
    loss_i += max(0, scores[j] - correct_class_score + delta)
  return loss_i

def L_i_vectorized(x, y, W):
  """
  A faster half-vectorized implementation. half-vectorized
  refers to the fact that for a single example the implementation contains
  no for loops, but there is still one loop over the examples (outside this function)
  """
  delta = 1.0
  scores = W.dot(x)
  # compute the margins for all classes in one vector operation
  margins = np.maximum(0, scores - scores[y] + delta)
  # on y-th position scores[y] - scores[y] canceled and gave delta. We want
  # to ignore the y-th position and only consider margin on max wrong class
  margins[y] = 0
  loss_i = np.sum(margins)
  return loss_i

def L(X, y, W):
  """
  fully-vectorized implementation :
  - X holds all the training examples as columns (e.g. 3073 x 50,000 in CIFAR-10)
  - y is array of integers specifying correct class (e.g. 50,000-D array)
  - W are weights (e.g. 10 x 3073)
  """
  # evaluate loss over all examples in X without using any for loops
  # left as exercise to reader in the assignment
```

The takeaway from this section is that the SVM loss takes one particular approach to measuring how consistent the predictions on training data are with the ground truth labels. Additionally, making good predictions on the training set is equivalent to minimizing the loss.

> All we have to do now is to come up with a way to find the weights that minimize the loss.

### Practical Considerations

**Setting Delta.** Note that we brushed over the hyperparameter \\(\Delta\\) and its setting. What value should it be set to, and do we have to cross-validate it? It turns out that this hyperparameter can safely be set to \\(\Delta = 1.0\\) in all cases. The hyperparameters \\(\Delta\\) and \\(\lambda\\) seem like two different hyperparameters, but in fact they both control the same tradeoff: The tradeoff between the data loss and the regularization loss in the objective. The key to understanding this is that the magnitude of the weights \\(W\\) has direct effect on the scores (and hence also their differences): As we shrink all values inside \\(W\\) the score differences will become lower, and as we scale up the weights the score differences will all become higher. Therefore, the exact value of the margin between the scores (e.g. \\(\Delta = 1\\), or \\(\Delta = 100\\)) is in some sense meaningless because the weights can shrink or stretch the differences arbitrarily. Hence, the only real tradeoff is how large we allow the weights to grow (through the regularization strength \\(\lambda\\)).

**Relation to Binary Support Vector Machine**. You may be coming to this class with previous experience with Binary Support Vector Machines, where the loss for the i-th example can be written as:

$$
L_i = C \max(0, 1 - y_i w^Tx_i) + R(W)
$$

where \\(C\\) is a hyperparameter, and \\(y_i \in \\{ -1,1 \\} \\). You can convince yourself that the formulation we presented in this section contains the binary SVM as a special case when there are only two classes. That is, if we only had two classes then the loss reduces to the binary SVM shown above. Also, \\(C\\) in this formulation and \\(\lambda\\) in our formulation control the same tradeoff and are related through reciprocal relation \\(C \propto \frac{1}{\lambda}\\).

**Aside: Optimization in primal**. If you're coming to this class with previous knowledge of SVMs, you may have also heard of kernels, duals, the SMO algorithm, etc. In this class (as is the case with Neural Networks in general) we will always work with the optimization objectives in their unconstrained primal form. Many of these objectives are technically not differentiable (e.g. the max(x,y) function isn't because it has a *kink* when x=y), but in practice this is not a problem and it is common to use a subgradient.

**Aside: Other Multiclass SVM formulations.** It is worth noting that the Multiclass SVM presented in this section is one of few ways of formulating the SVM over multiple classes. Another commonly used form is the *One-Vs-All* (OVA) SVM which trains an independent binary SVM for each class vs. all other classes. Related, but less common to see in practice is also the *All-vs-All* (AVA) strategy. Our formulation follows the [Weston and Watkins 1999 (pdf)](https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es1999-461.pdf) version, which is a more powerful version than OVA (in the sense that you can construct multiclass datasets where this version can achieve zero data loss, but OVA cannot. See details in the paper if interested). The last formulation you may see is a *Structured SVM*, which maximizes the margin between the score of the correct class and the score of the highest-scoring incorrect runner-up class. Understanding the differences between these formulations is outside of the scope of the class. The version presented in these notes is a safe bet to use in practice, but the arguably simplest OVA strategy is likely to work just as well (as also argued by Rikin et al. 2004 in [In Defense of One-Vs-All Classification (pdf)](http://www.jmlr.org/papers/volume5/rifkin04a/rifkin04a.pdf)).

<a name='softmax'></a>

### Softmax classifier

It turns out that the SVM is one of two commonly seen classifiers. The other popular choice is the **Softmax classifier**, which has a different loss function. If you've heard of the binary Logistic Regression classifier before, the Softmax classifier is its generalization to multiple classes. Unlike the SVM which treats the outputs \\(f(x_i,W)\\) as (uncalibrated and possibly difficult to interpret) scores for each class, the Softmax classifier gives a slightly more intuitive output (normalized class probabilities) and also has a probabilistic interpretation that we will describe shortly. In the Softmax classifier, the function mapping \\(f(x_i; W) =  W x_i\\) stays unchanged, but we now interpret these scores as the unnormalized log probabilities for each class and replace the *hinge loss* with a **cross-entropy loss** that has the form:

$$
L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right) \hspace{0.5in} \text{or equivalently} \hspace{0.5in} L_i = -f_{y_i} + \log\sum_j e^{f_j}
$$

where we are using the notation \\(f_j\\) to mean the j-th element of the vector of class scores \\(f\\). As before, the full loss for the dataset is the mean of \\(L_i\\) over all training examples together with a regularization term \\(R(W)\\). The function \\(f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}} \\) is called the **softmax function**: It takes a vector of arbitrary real-valued scores (in \\(z\\)) and squashes it to a vector of values between zero and one that sum to one. The full cross-entropy loss that involves the softmax function might look scary if you're seeing it for the first time but it is relatively easy to motivate.

**Information theory view**. The *cross-entropy* between a "true" distribution \\(p\\) and an estimated distribution \\(q\\) is defined as:

$$
H(p,q) = - \sum_x p(x) \log q(x)
$$

The Softmax classifier is hence minimizing the cross-entropy between the estimated class probabilities ( \\(q = e^{f_{y_i}}  / \sum_j e^{f_j} \\) as seen above) and the "true" distribution, which in this interpretation is the distribution where all probability mass is on the correct class (i.e. \\(p = [0, \ldots 1, \ldots, 0]\\) contains a single 1 at the \\(y_i\\) -th position.). Moreover, since the cross-entropy can be written in terms of entropy and the Kullback-Leibler divergence as \\(H(p,q) = H(p) + D_{KL}(p\|\|q)\\), and the entropy of the delta function \\(p\\) is zero, this is also equivalent to minimizing the KL divergence between the two distributions (a measure of distance). In other words, the cross-entropy objective *wants* the predicted distribution to have all of its mass on the correct answer.

**Probabilistic interpretation**. Looking at the expression, we see that

$$
P(y_i \mid x_i; W) = \frac{e^{f_{y_i}}}{\sum_j e^{f_j} }
$$

can be interpreted as the (normalized) probability assigned to the correct label \\(y_i\\) given the image \\(x_i\\) and parameterized by \\(W\\). To see this, remember that the Softmax classifier interprets the scores inside the output vector \\(f\\) as the unnormalized log probabilities. Exponentiating these quantities therefore gives the (unnormalized) probabilities, and the division performs the normalization so that the probabilities sum to one. In the probabilistic interpretation, we are therefore minimizing the negative log likelihood of the correct class, which can be interpreted as performing *Maximum Likelihood Estimation* (MLE). A nice feature of this view is that we can now also interpret the regularization term \\(R(W)\\) in the full loss function as coming from a Gaussian prior over the weight matrix \\(W\\), where instead of MLE we are performing the *Maximum a posteriori* (MAP) estimation. We mention these interpretations to help your intuitions, but the full details of this derivation are beyond the scope of this class.

<a name='softmax-stability'></a>

**Practical issues: Numeric stability**. When you're writing code for computing the Softmax function in practice, the intermediate terms \\(e^{f_{y_i}}\\) and \\(\sum_j e^{f_j}\\) may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick. Notice that if we multiply the top and bottom of the fraction by a constant \\(C\\) and push it into the sum, we get the following (mathematically equivalent) expression:

$$
\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}
= \frac{Ce^{f_{y_i}}}{C\sum_j e^{f_j}}
= \frac{e^{f_{y_i} + \log C}}{\sum_j e^{f_j + \log C}}
$$

We are free to choose the value of \\(C\\). This will not change any of the results, but we can use this value to improve the numerical stability of the computation. A common choice for \\(C\\) is to set \\(\log C = -\max_j f_j \\). This simply states that we should shift the values inside the vector \\(f\\) so that the highest value is zero. In code:

```python
f = np.array([123, 456, 789]) # example with 3 classes and each having large scores
p = np.exp(f) / np.sum(np.exp(f)) # Bad: Numeric problem, potential blowup

# instead: first shift the values of f so that the highest number is 0:
f -= np.max(f) # f becomes [-666, -333, 0]
p = np.exp(f) / np.sum(np.exp(f)) # safe to do, gives the correct answer

```

**Possibly confusing naming conventions**. To be precise, the *SVM classifier* uses the *hinge loss*, or also sometimes called the *max-margin loss*. The *Softmax classifier* uses the *cross-entropy loss*. The Softmax classifier gets its name from the *softmax function*, which is used to squash the raw class scores into normalized positive values that sum to one, so that the cross-entropy loss can be applied. In particular, note that technically it doesn't make sense to talk about the "softmax loss", since softmax is just the squashing function, but it is a relatively commonly used shorthand.

<a name='svmvssoftmax'></a>

### SVM vs. Softmax

A picture might help clarify the distinction between the Softmax and SVM classifiers:

<div class="fig figcenter fighighlight">
  <img src="/assets/svmvssoftmax.png">
  <div class="figcaption">Example of the difference between the SVM and Softmax classifiers for one datapoint. In both cases we compute the same score vector <b>f</b> (e.g. by matrix multiplication in this section). The difference is in the interpretation of the scores in <b>f</b>: The SVM interprets these as class scores and its loss function encourages the correct class (class 2, in blue) to have a score higher by a margin than the other class scores. The Softmax classifier instead interprets the scores as (unnormalized) log probabilities for each class and then encourages the (normalized) log probability of the correct class to be high (equivalently the negative of it to be low). The final loss for this example is 1.58 for the SVM and 1.04 (note this is 1.04 using the natural logarithm, not base 2 or base 10) for the Softmax classifier, but note that these numbers are not comparable; They are only meaningful in relation to loss computed within the same classifier and with the same data.</div>
</div>

**Softmax classifier provides "probabilities" for each class.** Unlike the SVM which computes uncalibrated and not easy to interpret scores for all classes, the Softmax classifier allows us to compute "probabilities" for all labels. For example, given an image the SVM classifier might give you scores [12.5, 0.6, -23.0] for the classes "cat", "dog" and "ship". The softmax classifier can instead compute the probabilities of the three labels as [0.9, 0.09, 0.01], which allows you to interpret its confidence in each class. The reason we put the word "probabilities" in quotes, however, is that how peaky or diffuse these probabilities are depends directly on the regularization strength \\(\lambda\\) - which you are in charge of as input to the system. For example, suppose that the unnormalized log-probabilities for some three classes come out to be [1, -2, 0]. The softmax function would then compute:

$$
[1, -2, 0] \rightarrow [e^1, e^{-2}, e^0] = [2.71, 0.14, 1] \rightarrow [0.7, 0.04, 0.26]
$$

Where the steps taken are to exponentiate and normalize to sum to one. Now, if the regularization strength \\(\lambda\\) was higher, the weights \\(W\\) would be penalized more and this would lead to smaller weights. For example, suppose that the weights became one half smaller ([0.5, -1, 0]). The softmax would now compute:

$$
[0.5, -1, 0] \rightarrow [e^{0.5}, e^{-1}, e^0] = [1.65, 0.37, 1] \rightarrow [0.55, 0.12, 0.33]
$$

where the probabilites are now more diffuse. Moreover, in the limit where the weights go towards tiny numbers due to very strong regularization strength \\(\lambda\\), the output probabilities would be near uniform. Hence, the probabilities computed by the Softmax classifier are better thought of as confidences where, similar to the SVM, the ordering of the scores is interpretable, but the absolute numbers (or their differences) technically are not.

**In practice, SVM and Softmax are usually comparable.** The performance difference between the SVM and Softmax are usually very small, and different people will have different opinions on which classifier works better. Compared to the Softmax classifier, the SVM is a more *local* objective, which could be thought of either as a bug or a feature. Consider an example that achieves the scores [10, -2, 3] and where the first class is correct. An SVM (e.g. with desired margin of \\(\Delta = 1\\)) will see that the correct class already has a score higher than the margin compared to the other classes and it will compute loss of zero. The SVM does not care about the details of the individual scores: if they were instead [10, -100, -100] or [10, 9, 9] the SVM would be indifferent since the margin of 1 is satisfied and hence the loss is zero. However, these scenarios are not equivalent to a Softmax classifier, which would accumulate a much higher loss for the scores [10, 9, 9] than for [10, -100, -100]. In other words, the Softmax classifier is never fully happy with the scores it produces: the correct class could always have a higher probability and the incorrect classes always a lower probability and the loss would always get better. However, the SVM is happy once the margins are satisfied and it does not micromanage the exact scores beyond this constraint. This can intuitively be thought of as a feature: For example, a car classifier which is likely spending most of its "effort" on the difficult problem of separating cars from trucks should not be influenced by the frog examples, which it already assigns very low scores to, and which likely cluster around a completely different side of the data cloud.

<a name='webdemo'></a>

### Interactive web demo

<div class="fig figcenter fighighlight">
  <a href="http://vision.stanford.edu/teaching/cs231n/linear-classify-demo" style="text-decoration:none;">
    <img src="/assets/classifydemo.jpeg">
  </a>
  <div class="figcaption">We have written an interactive web demo to help your intuitions with linear classifiers. The demo visualizes the loss functions discussed in this section using a toy 3-way classification on 2D data. The demo also jumps ahead a bit and performs the optimization, which we will discuss in full detail in the next section.
  </div>
</div>


<a name='summary'></a>

### Summary

In summary,

- We defined a **score function** from image pixels to class scores (in this section, a linear function that depends on weights **W** and biases **b**).
- Unlike kNN classifier, the advantage of this **parametric approach** is that once we learn the parameters we can discard the training data. Additionally, the prediction for a new test image is fast since it requires a single matrix multiplication with **W**, not an exhaustive comparison to every single training example.
- We introduced the **bias trick**, which allows us to fold the bias vector into the weight matrix for convenience of only having to keep track of one parameter matrix.
- We defined a **loss function** (we introduced two commonly used losses for linear classifiers: the **SVM** and the **Softmax**) that measures how compatible a given set of parameters is with respect to the ground truth labels in the training dataset. We also saw that the loss function was defined in such way that making good predictions on the training data is equivalent to having a small loss.

We now saw one way to take a dataset of images and map each one to class scores based on a set of parameters, and we saw two examples of loss functions that we can use to measure the quality of the predictions. But how do we efficiently determine the parameters that give the best (lowest) loss? This process is *optimization*, and it is the topic of the next section.

<a name='furtherreading'></a>

### Further Reading

These readings are optional and contain pointers of interest.

- [Deep Learning using Linear Support Vector Machines](https://arxiv.org/abs/1306.0239) from Charlie Tang 2013 presents some results claiming that the L2SVM outperforms Softmax.


================================================
FILE: nerf.md
================================================
Tl;dr What is NeRF and what does it do
======================================

NeRF stands for Neural Radiance Fields. It solves for view
interpolation, which is taking a set of input views (in this case a
sparse set) and synthesizing novel views of the same scene. Current RGB
volume rendering models are great for optimization, but require
extensive storage space (1-10GB). One side benefit of NeRF is the
weights generated from the neural network are $\sim$6000 less in size
than the original images.

Helpful Terminology
===================

**Rasterization**: Computer graphics use this technique to display a 3D
object on a 2D screen. Objects on the screen are created from virtual
triangles/polygons to create 3D models of the objects. Computers convert
these triangles into pixels, which are assigned a color. Overall, this
is a computationally intensive process.\
**Ray Tracing**: In the real world, the 3D objects we see are
illuminated by light. Light may be blocked, reflected, or refracted. Ray
tracing captures those effects. It is also computationally intensive,
but creates more realistic effects. **Ray**: A ray is a line connected
from the camera center, determined by camera position parameters, in a
particular direction determined by the camera angle.\
**NeRF uses ray tracing rather than rasterization for its models.**\
**Neural Rendering** As of 2020/2021, this terminology is used when a
neural network is a black box that models the geometry of the world and
a graphics engine renders it. Other terms commonly used are *scene
representations*, and less frequently, *implicit representations*. In
this case, the neural network is just a flexible function approximator
and the rendering machine does not learn at all.

Approach
========

A continuous scene is represented as a 3D location *x* = (x, y, z) and
2D viewing direction $(\theta,\phi)$ whose output is an emitted color c
= (r, g, b) and volume density $\sigma$. The density at each point acts
like a differential opacity controlling how much radiance is accumulated
in a ray passing through point *x*. In other words, an opaque surface
will have a density of $\infty$ while a transparent surface would have
$\sigma = 0$. In layman terms, the neural network is a black box that
will repeatedly ask what is the color and what is the density at this
point, and it will provide responses such as “red, dense.”\
This neural network is wrapped into volumetric ray tracing where you
start with the back of the ray (furthest from you) and walk closer to
you, querying the color and density. The equation for expected color
$C(r)$ of a camera ray $r(t) = o + td$ with near and far bounds $t_n$
and $t_f$ is calculated using the following:

$C(r) = \[ \int_{t_n}^{t_f} T(t)\sigma(r(t))c(r(t),d) \,dt \]
where
    \[T(t) = exp(-\int_{t_n}^{t}\sigma(r(s))\, ds)\]$

\
To actually calculate this, the authors used a stratified sampling
approach where they partition $[t_n, t_f]$ into N evenly spaced bins and
then drew one sample uniformly from each bin:

$$\hat{C}(r) = \sum_{i = 1}^{N}T_{i}(1-exp(-\sigma_{i}\delta_{i}))c_{i}, where T_{i} = exp(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j})$$

Where $\delta_{i} = t_{i+1} - t_{i}$ is the distance between adjacent
samples. The volume rendering is differentiable. You can then train the
model by minimizing rendering loss.

$$min_{\theta}\sum_{i}\left\| render_{i}(F_{\Theta}-I_{i}\right\|^{2}$$

![In this illustration taken from the paper, the five variables are fed
into the MLP to produce color and volume density. $F_\Theta$ has 9
layers, 256 channels](assets/raydiagram.png "fig:") [fig:Figure 1]

\
In practice, the Cartesian coordinates are expressed as vector d. You
can approximate this representation through MLP with
$F_\Theta = (x, d) \rightarrow (c, \sigma)$.\
**Why does NeRF use MLP rather than CNN?** Multilayer perceptron (MLP)
is a feed forward neural network. The model doesn’t need to conserve
every feature, therefore a CNN is not necessary.\

Common issues and mitigation
============================

The naive implementation of a neural radiance field creates blurry
results. To fix this, the 5D coordinates are transformed into positional
encoding (terminology borrowed from transformer literature). $F_\Theta$
is a composition of two formulas: $F_\Theta = F'_\Theta \cdot \gamma$
which significantly improves performance.

$$\gamma(p) = (sin(2^{0}\pi p), cos(2^{0}\pi p),...,sin(2^{L-1}\pi p), cos(2^{L-1} \pi p)$$

L determines how many levels there are in the positional encoding and it
is used for regularizing NeRF (low L = smooth). This is also known as a
Fourier feature, and it turns your MLP into an interpolation tool.
Another way of looking at this is your Fourier feature based neural
network is just a tiny look up table with extremely high resolution.
Here is an example of applying Fourier feature to your code:\

    B = SCALE * np.random.normal(shape = (input_dims, NUM_FEATURES))
    x = np.concatenate([np.sin(x @ B), np.cos(x @ B)], axis = -1)
    x = nn.Dense(x, features = 256)

![Mapping how Fourier features are related to NeRF’s positional
encoding. Taken from Jon Barron’s CS 231n talk in Spring
2021](assets/fourier.png "fig:") [fig:Figure 2]

NeRF also uses hierarchical volume sampling: coarse sampling and the
fine network. This allows NeRF to more efficiently run their model and
deprioritize areas of the camera ray where there is free space and
occlusion. The coarse network uses $N_{c}$ sample points to evaluate the
expected color of the ray with the stratified sampling. Based on these
results, they bias the samples towards more relevant parts of the
volume.

$$\hat{C}_c(r) = \sum_{i=1}^{N_{c}}w_{i}c_{i}, w_{i}=T_{i}(1-exp(-\sigma_{i}\delta_{i}))$$

A second set of $N_{f}$ locations are sampled from this distribution
using inverse transform sampling. This method allocates more samples to
regions where we expect visual content.

Results
=======

The paper goes in depth on quantitative measures of the results, which
NeRF outperforms existing models. A visual assessment is shared below:

![Example of NeRF results versus existing SOTA
results](assets/NeRFresults.png "fig:") [fig:Figure 3]

Additional references
=====================

[What’s the difference between ray tracing and
rasterization?](https://blogs.nvidia.com/blog/2018/03/19/whats-difference-between-ray-tracing-rasterization/)
Self explanatory title, excellent write-up helping reader differentiate
between two concepts.\
[Matthew Tancik NeRF ECCV 2020 Oral](https://www.matthewtancik.com/nerf)
Videos showcasing NeRF produced images.\
[NeRF: Representing Scenes as Neural Radiance Fields for View
Synthesis](https://towardsdatascience.com/nerf-representing-scenes-as-neural-radiance-fields-for-view-synthesis-ef1e8cebace4)
Simple and alternative explanation for NeRF.\
[NeRF: Representing Scenes as Neural Radiance Fields for View
Synthesis](https://arxiv.org/pdf/2003.08934.pdf) arxiv paper\
[CS 231n Spring 2021 Jon Barron Guest
Lecture](https://stanford-pilot.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=66a23f12-764c-4787-a48a-ad330173e4b5)


================================================
FILE: neural-networks-1.md
================================================
---
layout: page
permalink: /neural-networks-1/
---

Table of Contents:

- [Quick intro without brain analogies](#quick)
- [Modeling one neuron](#intro)
  - [Biological motivation and connections](#bio)
  - [Single neuron as a linear classifier](#classifier)
  - [Commonly used activation functions](#actfun)
- [Neural Network architectures](#nn)
  - [Layer-wise organization](#layers)
  - [Example feed-forward computation](#feedforward)
  - [Representational power](#power)
  - [Setting number of layers and their sizes](#arch)
- [Summary](#summary)
- [Additional references](#add)

<a name='quick'></a>

## Quick intro

It is possible to introduce neural networks without appealing to brain analogies. In the section on linear classification we computed scores for different visual categories given the image using the formula \\( s = W x \\), where \\(W\\) was a matrix and \\(x\\) was an input column vector containing all pixel data of the image. In the case of CIFAR-10, \\(x\\) is a [3072x1] column vector, and \\(W\\) is a [10x3072] matrix, so that the output scores is a vector of 10 class scores. 

An example neural network would instead compute \\( s = W_2 \max(0, W_1 x) \\). Here, \\(W_1\\) could be, for example, a [100x3072] matrix transforming the image into a 100-dimensional intermediate vector. The function \\(max(0,-) \\) is a non-linearity that is applied elementwise. There are several choices we could make for the non-linearity (which we'll study below), but this one is a common choice and simply thresholds all activations that are below zero to zero. Finally, the matrix \\(W_2\\) would then be of size [10x100], so that we again get 10 numbers out that we interpret as the class scores. Notice that the non-linearity is critical computationally - if we left it out, the two matrices could be collapsed to a single matrix, and therefore the predicted class scores would again be a linear function of the input. The non-linearity is where we get the *wiggle*. The parameters \\(W_2, W_1\\) are learned with stochastic gradient descent, and their gradients are derived with chain rule (and computed with backpropagation).

A three-layer neural network could analogously look like \\( s = W_3 \max(0, W_2 \max(0, W_1 x)) \\), where all of \\(W_3, W_2, W_1\\) are parameters to be learned. The sizes of the intermediate hidden vectors are hyperparameters of the network and we'll see how we can set them later. Lets now look into how we can interpret these computations from the neuron/network perspective.

<a name='intro'></a>

## Modeling one neuron

The area of Neural Networks has originally been primarily inspired by the goal of modeling biological neural systems, but has since diverged and become a matter of engineering and achieving good results in Machine Learning tasks. Nonetheless, we begin our discussion with a very brief and high-level description of the biological system that a large portion of this area has been inspired by.

<a name='bio'></a>

### Biological motivation and connections

The basic computational unit of the brain is a **neuron**. Approximately 86 billion neurons can be found in the human nervous system and they are connected with approximately 10^14 - 10^15 **synapses**. The diagram below shows a cartoon drawing of a biological neuron (left) and a common mathematical model (right). Each neuron receives input signals from its **dendrites** and produces output signals along its (single) **axon**. The axon eventually branches out and connects via synapses to dendrites of other neurons. In the computational model of a neuron, the signals that travel along the axons (e.g. \\(x_0\\)) interact multiplicatively (e.g. \\(w_0 x_0\\)) with the dendrites of the other neuron based on the synaptic strength at that synapse (e.g. \\(w_0\\)). The idea is that the synaptic strengths (the weights \\(w\\)) are learnable and control the strength of influence (and its direction: excitory (positive weight) or inhibitory (negative weight)) of one neuron on another. In the basic model, the dendrites carry the signal to the cell body where they all get summed. If the final sum is above a certain threshold, the neuron can *fire*, sending a spike along its axon. In the computational model, we assume that the precise timings of the spikes do not matter, and that only the frequency of the firing communicates information. Based on this *rate code* interpretation, we model the *firing rate* of the neuron with an **activation function** \\(f\\), which represents the frequency of the spikes along the axon. Historically, a common choice of activation function is the **sigmoid function** \\(\sigma\\), since it takes a real-valued input (the signal strength after the sum) and squashes it to range between 0 and 1. We will see details of these activation functions later in this section.

<div class="fig figcenter fighighlight">
  <img src="/assets/nn1/neuron.png" width="49%">
  <img src="/assets/nn1/neuron_model.jpeg" width="49%" style="border-left: 1px solid black;">
  <div class="figcaption">A cartoon drawing of a biological neuron (left) and its mathematical model (right).</div>
</div>

An example code for forward-propagating a single neuron might look as follows:

```python
class Neuron(object):
  # ... 
  def forward(self, inputs):
    """ assume inputs and weights are 1-D numpy arrays and bias is a number """
    cell_body_sum = np.sum(inputs * self.weights) + self.bias
    firing_rate = 1.0 / (1.0 + math.exp(-cell_body_sum)) # sigmoid activation function
    return firing_rate
```

In other words, each neuron performs a dot product with the input and its weights, adds the bias and applies the non-linearity (or activation function), in this case the sigmoid \\(\sigma(x) = 1/(1+e^{-x})\\). We will go into more details about different activation functions at the end of this section.


**Coarse model.** It's important to stress that this model of a biological neuron is very coarse: For example, there are many different types of neurons, each with different properties. The dendrites in biological neurons perform complex nonlinear computations. The synapses are not just a single weight, they're a complex non-linear dynamical system. The exact timing of the output spikes in many systems is known to be important, suggesting that the rate code approximation may not hold. Due to all these and many other simplifications, be prepared to hear groaning sounds from anyone with some neuroscience background if you draw analogies between Neural Networks and real brains. See this [review](https://physics.ucsd.edu/neurophysics/courses/physics_171/annurev.neuro.28.061604.135703.pdf) (pdf), or more recently this [review](http://www.sciencedirect.com/science/article/pii/S0959438814000130) if you are interested.

<a name='classifier'></a>

### Single neuron as a linear classifier

The mathematical form of the model Neuron's forward computation might look familiar to you. As we saw with linear classifiers, a neuron has the capacity to "like" (activation near one) or "dislike" (activation near zero) certain linear regions of its input space. Hence, with an appropriate loss function on the neuron's output, we can turn a single neuron into a linear classifier:

**Binary Softmax classifier**. For example, we can interpret \\(\sigma(\sum_iw_ix_i + b)\\) to be the probability of one of the classes \\(P(y_i = 1 \mid x_i; w) \\). The probability of the other class would be \\(P(y_i = 0 \mid x_i; w) = 1 - P(y_i = 1 \mid x_i; w) \\), since they must sum to one. With this interpretation, we can formulate the cross-entropy loss as we have seen in the Linear Classification section, and optimizing it would lead to a binary Softmax classifier (also known as *logistic regression*). Since the sigmoid function is restricted to be between 0-1, the predictions of this classifier are based on whether the output of the neuron is greater than 0.5. 

**Binary SVM classifier**. Alternatively, we could attach a max-margin hinge loss to the output of the neuron and train it to become a binary Support Vector Machine.

**Regularization interpretation**. The regularization loss in both SVM/Softmax cases could in this biological view be interpreted as *gradual forgetting*, since it would have the effect of driving all synaptic weights \\(w\\) towards zero after every parameter update.

> A single neuron can be used to implement a binary classifier (e.g. binary Softmax or binary SVM classifiers)

<a name='actfun'></a>

### Commonly used activation functions

Every activation function (or *non-linearity*) takes a single number and performs a certain fixed mathematical operation on it. There are several activation functions you may encounter in practice:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn1/sigmoid.jpeg" width="40%">
  <img src="/assets/nn1/tanh.jpeg" width="40%" style="border-left: 1px solid black;">
  <div class="figcaption"><b>Left:</b> Sigmoid non-linearity squashes real numbers to range between [0,1] <b>Right:</b> The tanh non-linearity squashes real numbers to range between [-1,1].</div>
</div>

**Sigmoid.** The sigmoid non-linearity has the mathematical form \\(\sigma(x) = 1 / (1 + e^{-x})\\) and is shown in the image above on the left. As alluded to in the previous section, it takes a real-valued number and "squashes" it into range between 0 and 1. In particular, large negative numbers become 0 and large positive numbers become 1. The sigmoid function has seen frequent use historically since it has a nice interpretation as the firing rate of a neuron: from not firing at all (0) to fully-saturated firing at an assumed maximum frequency (1). In practice, the sigmoid non-linearity has recently fallen out of favor and it is rarely ever used. It has two major drawbacks: 

  - *Sigmoids saturate and kill gradients*. A very undesirable property of the sigmoid neuron is that when the neuron's activation saturates at either tail of 0 or 1, the gradient at these regions is almost zero. Recall that during backpropagation, this (local) gradient will be multiplied to the gradient of this gate's output for the whole objective. Therefore, if the local gradient is very small, it will effectively "kill" the gradient and almost no signal will flow through the neuron to its weights and recursively to its data. Additionally, one must pay extra caution when initializing the weights of sigmoid neurons to prevent saturation. For example, if the initial weights are too large then most neurons would become saturated and the network will barely learn.
  - *Sigmoid outputs are not zero-centered*. This is undesirable since neurons in later layers of processing in a Neural Network (more on this soon) would be receiving data that is not zero-centered. This has implications on the dynamics during gradient descent, because if the data coming into a neuron is always positive (e.g. \\(x > 0\\) elementwise in \\(f = w^Tx + b\\))), then the gradient on the weights \\(w\\) will during backpropagation become either all be positive, or all negative (depending on the gradient of the whole expression \\(f\\)). This could introduce undesirable zig-zagging dynamics in the gradient updates for the weights. However, notice that once these gradients are added up across a batch of data the final update for the weights can have variable signs, somewhat mitigating this issue. Therefore, this is an inconvenience but it has less severe consequences compared to the saturated activation problem above.

**Tanh.** The tanh non-linearity is shown on the image above on the right. It squashes a real-valued number to the range [-1, 1]. Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered. Therefore, in practice the *tanh non-linearity is always preferred to the sigmoid nonlinearity.* Also note that the tanh neuron is simply a scaled sigmoid neuron, in particular the following holds: \\( \tanh(x) = 2 \sigma(2x) -1  \\).

<div class="fig figcenter fighighlight">
  <img src="/assets/nn1/relu.jpeg" width="40%">
  <img src="/assets/nn1/alexplot.jpeg" width="40%" style="border-left: 1px solid black;">
  <div class="figcaption"><b>Left:</b> Rectified Linear Unit (ReLU) activation function, which is zero when x &lt; 0 and then linear with slope 1 when x &gt; 0. <b>Right:</b> A plot from <a href="http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf">Krizhevsky et al.</a> (pdf) paper indicating the 6x improvement in convergence with the ReLU unit compared to the tanh unit.</div>
</div>

**ReLU.** The Rectified Linear Unit has become very popular in the last few years. It computes the function \\(f(x) = \max(0, x)\\). In other words, the activation is simply thresholded at zero (see image above on the left). There are several pros and cons to using the ReLUs: 

- (+) It was found to greatly accelerate (e.g. a factor of 6 in [Krizhevsky et al.](http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf)) the convergence of stochastic gradient descent compared to the sigmoid/tanh functions. It is argued that this is due to its linear, non-saturating form.
- (+) Compared to tanh/sigmoid neurons that involve expensive operations (exponentials, etc.), the ReLU can be implemented by simply thresholding a matrix of activations at zero.
- (-) Unfortunately, ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. That is, the ReLU units can irreversibly die during training since they can get knocked off the data manifold. For example, you may find that as much as 40% of your network can be "dead" (i.e. neurons that never activate across the entire training dataset) if the learning rate is set too high. With a proper setting of the learning rate this is less frequently an issue.

**Leaky ReLU.** Leaky ReLUs are one attempt to fix the "dying ReLU" problem. Instead of the function being zero when x < 0, a leaky ReLU will instead have a small positive slope (of 0.01, or so). That is, the function computes \\(f(x) = \mathbb{1}(x < 0) (\alpha x) + \mathbb{1}(x>=0) (x) \\) where \\(\alpha\\) is a small constant. Some people report success with this form of activation function, but the results are not always consistent. The slope in the negative region can also be made into a parameter of each neuron, as seen in PReLU neurons, introduced in [Delving Deep into Rectifiers](http://arxiv.org/abs/1502.01852), by Kaiming He et al., 2015. However, the consistency of the benefit across tasks is presently unclear.

**Maxout**. Other types of units have been proposed that do not have the functional form \\(f(w^Tx + b)\\) where a non-linearity is applied on the dot product between the weights and the data. One relatively popular choice is the Maxout neuron (introduced recently by [Goodfellow et al.](https://arxiv.org/abs/1302.4389)) that generalizes the ReLU and its leaky version. The Maxout neuron computes the function \\(\max(w_1^Tx+b_1, w_2^Tx + b_2)\\). Notice that both ReLU and Leaky ReLU are a special case of this form (for example, for ReLU we have \\(w_1, b_1 = 0\\)). The Maxout neuron therefore enjoys all the benefits of a ReLU unit (linear regime of operation, no saturation) and does not have its drawbacks (dying ReLU). However, unlike the ReLU neurons it doubles the number of parameters for every single neuron, leading to a high total number of parameters.

This concludes our discussion of the most common types of neurons and their activation functions. As a last comment, it is very rare to mix and match different types of neurons in the same network, even though there is no fundamental problem with doing so.

**TLDR**: "*What neuron type should I use?*" Use the ReLU non-linearity, be careful with your learning rates and possibly monitor the fraction of "dead" units in a network. If this concerns you, give Leaky ReLU or Maxout a try. Never use sigmoid. Try tanh, but expect it to work worse than ReLU/Maxout.

<a name='nn'></a>

## Neural Network architectures

<a name='layers'></a>

### Layer-wise organization

**Neural Networks as neurons in graphs**. Neural Networks are modeled as collections of neurons that are connected in an acyclic graph. In other words, the outputs of some neurons can become inputs to other neurons. Cycles are not allowed since that would imply an infinite loop in the forward pass of a network. Instead of an amorphous blobs of connected neurons, Neural Network models are often organized into distinct layers of neurons. For regular neural networks, the most common layer type is the **fully-connected layer** in which  neurons between two adjacent layers are fully pairwise connected, but neurons within a single layer share no connections. Below are two example Neural Network topologies that use a stack of fully-connected layers:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn1/neural_net.jpeg" width="40%">
  <img src="/assets/nn1/neural_net2.jpeg" width="55%" style="border-left: 1px solid black;">
  <div class="figcaption"><b>Left:</b> A 2-layer Neural Network (one hidden layer of 4 neurons (or units) and one output layer with 2 neurons), and three inputs. <b>Right:</b> A 3-layer neural network with three inputs, two hidden layers of 4 neurons each and one output layer. Notice that in both cases there are connections (synapses) between neurons across layers, but not within a layer.</div>
</div>

**Naming conventions.** Notice that when we say N-layer neural network, we do not count the input layer. Therefore, a single-layer neural network describes a network with no hidden layers (input directly mapped to output). In that sense, you can sometimes hear people say that logistic regression or SVMs are simply a special case of single-layer Neural Networks. You may also hear these networks interchangeably referred to as *"Artificial Neural Networks"* (ANN) or *"Multi-Layer Perceptrons"* (MLP). Many people do not like the analogies between Neural Networks and real brains and prefer to refer to neurons as *units*.

**Output layer.** Unlike all layers in a Neural Network, the output layer neurons most commonly do not have an activation function (or you can think of them as having a linear identity activation function). This is because the last output layer is usually taken to represent the class scores (e.g. in classification), which are arbitrary real-valued numbers, or some kind of real-valued target (e.g. in regression). 

**Sizing neural networks**. The two metrics that people commonly use to measure the size of neural networks are the number of neurons, or more commonly the number of parameters. Working with the two example networks in the above picture:

- The first network (left) has 4 + 2 = 6 neurons (not counting the inputs), [3 x 4] + [4 x 2] = 20 weights and 4 + 2 = 6 biases, for a total of 26 learnable parameters. 
- The second network (right) has 4 + 4 + 1 = 9 neurons, [3 x 4] + [4 x 4] + [4 x 1] = 12 + 16 + 4 = 32 weights and 4 + 4 + 1 = 9 biases, for a total of 41 learnable parameters.

To give you some context, modern Convolutional Networks contain on orders of 100 million parameters and are usually made up of approximately 10-20 layers (hence *deep learning*). However, as we will see the number of *effective* connections is significantly greater due to parameter sharing. More on this in the Convolutional Neural Networks module.

<a name='feedforward'></a>

### Example feed-forward computation

*Repeated matrix multiplications interwoven with activation function*. One of the primary reasons that Neural Networks are organized into layers is that this structure makes it very simple and efficient to evaluate Neural Networks using matrix vector operations. Working with the example three-layer neural network in the diagram above, the input would be a [3x1] vector. All connection strengths for a layer can be stored in a single matrix. For example, the first hidden layer's weights `W1` would be of size [4x3], and the biases for all units would be in the vector `b1`, of size [4x1]. Here, every single neuron has its weights in a row of `W1`, so the matrix vector multiplication `np.dot(W1,x)` evaluates the activations of all neurons in that layer. Similarly, `W2` would be a [4x4] matrix that stores the connections of the second hidden layer, and `W3` a [1x4] matrix for the last (output) layer. The full forward pass of this 3-layer neural network is then simply three matrix multiplications, interwoven with the application of the activation function:

```python
# forward-pass of a 3-layer neural network:
f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1)
```

In the above code, `W1,W2,W3,b1,b2,b3` are the learnable parameters of the network. Notice also that instead of having a single input column vector, the variable `x` could hold an entire batch of training data (where each input example would be a column of `x`) and then all examples would be efficiently evaluated in parallel. Notice that the final Neural Network layer usually doesn't have an activation function (e.g. it represents a (real-valued) class score in a classification setting).

> The forward pass of a fully-connected layer corresponds to one matrix multiplication followed by a bias offset and an activation function.

<a name='power'></a>

### Representational power

One way to look at Neural Networks with fully-connected layers is that they define a family of functions that are parameterized by the weights of the network. A natural question that arises is: What is the representational power of this family of functions? In particular, are there functions that cannot be modeled with a Neural Network? 

It turns out that Neural Networks with at least one hidden layer are *universal approximators*. That is, it can be shown (e.g. see [*Approximation by Superpositions of Sigmoidal Function*](http://www.dartmouth.edu/~gvc/Cybenko_MCSS.pdf) from 1989 (pdf), or this [intuitive explanation](http://neuralnetworksanddeeplearning.com/chap4.html) from Michael Nielsen) that given any continuous function \\(f(x)\\) and some \\(\epsilon > 0\\), there exists a Neural Network \\(g(x)\\) with one hidden layer (with a reasonable choice of non-linearity, e.g. sigmoid) such that \\( \forall x, \mid f(x) - g(x) \mid < \epsilon \\). In other words, the neural network can approximate any continuous function. 

If one hidden layer suffices to approximate any function, why use more layers and go deeper? The answer is that the fact that a two-layer Neural Network is a universal approximator is, while mathematically cute, a relatively weak and useless statement in practice. In one dimension, the "sum of indicator bumps" function \\(g(x) = \sum_i c_i \mathbb{1}(a_i < x < b_i)\\) where \\(a,b,c\\) are parameter vectors is also a universal approximator, but noone would suggest that we use this functional form in Machine Learning. Neural Networks work well in practice because they compactly express nice, smooth functions that fit well with the statistical properties of data we encounter in practice, and are also easy to learn using our optimization algorithms (e.g. gradient descent). Similarly, the fact that deeper networks (with multiple hidden layers) can work better than a single-hidden-layer networks is an empirical observation, despite the fact that their representational power is equal.

As an aside, in practice it is often the case that 3-layer neural networks will outperform 2-layer nets, but going even deeper (4,5,6-layer) rarely helps much more. This is in stark contrast to Convolutional Networks, where depth has been found to be an extremely important component for a good recognition system (e.g. on order of 10 learnable layers). One argument for this observation is that images contain hierarchical structure (e.g. faces are made up of eyes, which are made up of edges, etc.), so several layers of processing make intuitive sense for this data domain.

The full story is, of course, much more involved and a topic of much recent research. If you are interested in these topics we recommend for further reading:

- [Deep Learning](http://www.deeplearningbook.org/) book in press by Bengio, Goodfellow, Courville, in particular [Chapter 6.4](http://www.deeplearningbook.org/contents/mlp.html).
- [Do Deep Nets Really Need to be Deep?](http://arxiv.org/abs/1312.6184)
- [FitNets: Hints for Thin Deep Nets](http://arxiv.org/abs/1412.6550)

<a name='arch'></a>

### Setting number of layers and their sizes

How do we decide on what architecture to use when faced with a practical problem? Should we use no hidden layers? One hidden layer? Two hidden layers? How large should each layer be? First, note that as we increase the size and number of layers in a Neural Network, the **capacity** of the network increases. That is, the space of representable functions grows since the neurons can collaborate to express many different functions. For example, suppose we had a binary classification problem in two dimensions. We could train three separate neural networks, each with one hidden layer of some size and obtain the following classifiers:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn1/layer_sizes.jpeg">
  <div class="figcaption">Larger Neural Networks can represent more complicated functions. The data are shown as circles colored by their class, and the decision regions by a trained neural network are shown underneath. You can play with these examples in this <a href="http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html">ConvNetsJS demo</a>.</div>
</div>

In the diagram above, we can see that Neural Networks with more neurons can express more complicated functions. However, this is both a blessing (since we can learn to classify more complicated data) and a curse (since it is easier to overfit the training data). **Overfitting** occurs when a model with high capacity fits the noise in the data instead of the (assumed) underlying relationship. For example, the model with 20 hidden neurons fits all the training data but at the cost of segmenting the space into many disjoint red and green decision regions. The model with 3 hidden neurons only has the representational power to classify the data in broad strokes. It models the data as two blobs and interprets the few red points inside the green cluster as **outliers** (noise). In practice, this could lead to better **generalization** on the test set.

Based on our discussion above, it seems that smaller neural networks can be preferred if the data is not complex enough to prevent overfitting. However, this is incorrect - there are many other preferred ways to prevent overfitting in Neural Networks that we will discuss later (such as L2 regularization, dropout, input noise). In practice, it is always better to use these methods to control overfitting instead of the number of neurons. 

The subtle reason behind this is that smaller networks are harder to train with local methods such as Gradient Descent: It's clear that their loss functions have relatively few local minima, but it turns out that many of these minima are easier to converge to, and that they are bad (i.e. with high loss). Conversely, bigger neural networks contain significantly more local minima, but these minima turn out to be much better in terms of their actual loss. Since Neural Networks are non-convex, it is hard to study these properties mathematically, but some attempts to understand these objective functions have been made, e.g. in a recent paper [The Loss Surfaces of Multilayer Networks](http://arxiv.org/abs/1412.0233). In practice, what you find is that if you train a small network the final loss can display a good amount of variance - in some cases you get lucky and converge to a good place but in some cases you get trapped in one of the bad minima. On the other hand, if you train a large network you'll start to find many different solutions, but the variance in the final achieved loss will be much smaller. In other words, all solutions are about equally as good, and rely less on the luck of random initialization.

To reiterate, the regularization strength is the preferred way to control the overfitting of a neural network. We can look at the results achieved by three different settings:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn1/reg_strengths.jpeg">
  <div class="figcaption">
    The effects of regularization strength: Each neural network above has 20 hidden neurons, but changing the regularization strength makes its final decision regions smoother with a higher regularization. You can play with these examples in this <a href="http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html">ConvNetsJS demo</a>.
  </div>
</div>

The takeaway is that you should not be using smaller networks because you are afraid of overfitting. Instead, you should use as big of a neural network as your computational budget allows, and use other regularization techniques to control overfitting.

<a name='summary'></a>

## Summary

In summary, 

- We introduced a very coarse model of a biological **neuron**.
- We discussed several types of **activation functions** that are used in practice, with ReLU being the most common choice.
- We introduced **Neural Networks** where neurons are connected with **Fully-Connected layers** where neurons in adjacent layers have full pair-wise connections, but neurons within a layer are not connected.
- We saw that this layered architecture enables very efficient evaluation of Neural Networks based on matrix multiplications interwoven with the application of the activation function.
- We saw that that Neural Networks are **universal function approximators**, but we also discussed the fact that this property has little to do with their ubiquitous use. They are used because they make certain "right" assumptions about the functional forms of functions that come up in practice.
- We discussed the fact that larger networks will always work better than smaller networks, but their higher model capacity must be appropriately addressed with stronger regularization (such as higher weight decay), or they might overfit. We will see more forms of regularization (especially dropout) in later sections.

<a name='add'></a>

## Additional References

- [deeplearning.net tutorial](http://www.deeplearning.net/tutorial/mlp.html) with Theano
- [ConvNetJS](http://cs.stanford.edu/people/karpathy/convnetjs/) demos for intuitions
- [Michael Nielsen's](http://neuralnetworksanddeeplearning.com/chap1.html) tutorials


================================================
FILE: neural-networks-2.md
================================================
---
layout: page
permalink: /neural-networks-2/
---

Table of Contents:

- [Setting up the data and the model](#intro)
  - [Data Preprocessing](#datapre)
  - [Weight Initialization](#init)
  - [Batch Normalization](#batchnorm)
  - [Regularization](#reg) (L2/L1/Maxnorm/Dropout)
- [Loss functions](#losses)
- [Summary](#summary)

<a name='intro'></a>

## Setting up the data and the model

In the previous section we introduced a model of a Neuron, which computes a dot product following a non-linearity, and Neural Networks that arrange neurons into layers. Together, these choices define the new form of the **score function**, which we have extended from the simple linear mapping that we have seen in the Linear Classification section. In particular, a Neural Network performs a sequence of linear mappings with interwoven non-linearities. In this section we will discuss additional design choices regarding data preprocessing, weight initialization, and loss functions.

<a name='datapre'></a>

### Data Preprocessing

There are three common forms of data preprocessing a data matrix `X`, where we will assume that `X` is of size `[N x D]` (`N` is the number of data, `D` is their dimensionality).

**Mean subtraction**  is the most common form of preprocessing. It involves subtracting the mean across every individual *feature* in the data, and has the geometric interpretation of centering the cloud of data around the origin along every dimension. In numpy, this operation would be implemented as: `X -= np.mean(X, axis = 0)`. With images specifically, for convenience it can be common to subtract a single value from all pixels (e.g. `X -= np.mean(X)`), or to do so separately across the three color channels.

**Normalization** refers to normalizing the data dimensions so that they are of approximately the same scale. There are two common ways of achieving this normalization. One is to divide each dimension by its standard deviation, once it has been zero-centered: (`X /= np.std(X, axis = 0)`). Another form of this preprocessing normalizes each dimension so that the min and max along the dimension is -1 and 1 respectively. It only makes sense to apply this preprocessing if you have a reason to believe that different input features have different scales (or units), but they should be of approximately equal importance to the learning algorithm.
In case of images, the relative scales of pixels are already approximately equal (and in range from 0 to 255), so it is not strictly necessary to perform this additional preprocessing step.

<div class="fig figcenter fighighlight">
  <img src="/assets/nn2/prepro1.jpeg">
  <div class="figcaption">Common data preprocessing pipeline. <b>Left</b>: Original toy, 2-dimensional input data. <b>Middle</b>: The data is zero-centered by subtracting the mean in each dimension. The data cloud is now centered around the origin. <b>Right</b>: Each dimension is additionally scaled by its standard deviation. The red lines indicate the extent of the data - they are of unequal length in the middle, but of equal length on the right.</div>
</div>

**PCA and Whitening** is another form of preprocessing. In this process, the data is first centered as described above. Then, we can compute the covariance matrix that tells us about the correlation structure in the data:

```python
# Assume input data matrix X of size [N x D]
X -= np.mean(X, axis = 0) # zero-center the data (important)
cov = np.dot(X.T, X) / X.shape[0] # get the data covariance matrix
```

The (i,j) element of the data covariance matrix contains the *covariance* between i-th and j-th dimension of the data. In particular, the diagonal of this matrix contains the variances. Furthermore, the covariance matrix is symmetric and [positive semi-definite](http://en.wikipedia.org/wiki/Positive-definite_matrix#Negative-definite.2C_semidefinite_and_indefinite_matrices). We can compute the SVD factorization of the data covariance matrix:

```python
U,S,V = np.linalg.svd(cov)
```

where the columns of `U` are the eigenvectors and `S` is a 1-D array of the singular values. To decorrelate the data, we project the original (but zero-centered) data into the eigenbasis:

```python
Xrot = np.dot(X, U) # decorrelate the data
```

Notice that the columns of `U` are a set of orthonormal vectors (norm of 1, and orthogonal to each other), so they can be regarded as basis vectors. The projection therefore corresponds to a rotation of the data in `X` so that the new axes are the eigenvectors. If we were to compute the covariance matrix of `Xrot`, we would see that it is now diagonal. A nice property of `np.linalg.svd` is that in its returned value `U`, the eigenvector columns are sorted by their eigenvalues. We can use this to reduce the dimensionality of the data by only using the top few eigenvectors, and discarding the dimensions along which the data has no variance. This is also sometimes referred to as [Principal Component Analysis (PCA)](http://en.wikipedia.org/wiki/Principal_component_analysis) dimensionality reduction:

```python
Xrot_reduced = np.dot(X, U[:,:100]) # Xrot_reduced becomes [N x 100]
```

After this operation, we would have reduced the original dataset of size [N x D] to one of size [N x 100], keeping the 100 dimensions of the data that contain the most variance. It is very often the case that you can get very good performance by training linear classifiers or neural networks on the PCA-reduced datasets, obtaining savings in both space and time.

The last transformation you may see in practice is **whitening**. The whitening operation takes the data in the eigenbasis and divides every dimension by the eigenvalue to normalize the scale. The geometric interpretation of this transformation is that if the input data is a multivariable gaussian, then the whitened data will be a gaussian with zero mean and identity covariance matrix. This step would take the form:

```python
# whiten the data:
# divide by the eigenvalues (which are square roots of the singular values)
Xwhite = Xrot / np.sqrt(S + 1e-5)
```

*Warning: Exaggerating noise.* Note that we're adding 1e-5 (or a small constant) to prevent division by zero. One weakness of this transformation is that it can greatly exaggerate the noise in the data, since it stretches all dimensions (including the irrelevant dimensions of tiny variance that are mostly noise) to be of equal size in the input. This can in practice be mitigated by stronger smoothing (i.e. increasing 1e-5 to be a larger number).

<div class="fig figcenter fighighlight">
  <img src="/assets/nn2/prepro2.jpeg">
  <div class="figcaption">PCA / Whitening. <b>Left</b>: Original toy, 2-dimensional input data. <b>Middle</b>: After performing PCA. The data is centered at zero and then rotated into the eigenbasis of the data covariance matrix. This decorrelates the data (the covariance matrix becomes diagonal). <b>Right</b>: Each dimension is additionally scaled by the eigenvalues, transforming the data covariance matrix into the identity matrix. Geometrically, this corresponds to stretching and squeezing the data into an isotropic gaussian blob.</div>
</div>

We can also try to visualize these transformations with CIFAR-10 images. The training set of CIFAR-10 is of size 50,000 x 3072, where every image is stretched out into a 3072-dimensional row vector. We can then compute the [3072 x 3072] covariance matrix and compute its SVD decomposition (which can be relatively expensive). What do the computed eigenvectors look like visually? An image might help:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn2/cifar10pca.jpeg">
  <div class="figcaption"><b>Left:</b> An example set of 49 images. <b>2nd from Left:</b> The top 144 out of 3072 eigenvectors. The top eigenvectors account for most of the variance in the data, and we can see that they correspond to lower frequencies in the images.  <b>2nd from Right:</b> The 49 images reduced with PCA, using the 144 eigenvectors shown here. That is, instead of expressing every image as a 3072-dimensional vector where each element is the brightness of a particular pixel at some location and channel, every image above is only represented with a 144-dimensional vector, where each element measures how much of each eigenvector adds up to make up the image. In order to visualize what image information has been retained in the 144 numbers, we must rotate back into the "pixel" basis of 3072 numbers. Since U is a rotation, this can be achieved by multiplying by U.transpose()[:144,:], and then visualizing the resulting 3072 numbers as the image. You can see that the images are slightly blurrier, reflecting the fact that the top eigenvectors capture lower frequencies. However, most of the information is still preserved. <b>Right:</b> Visualization of the "white" representation, where the variance along every one of the 144 dimensions is squashed to equal length. Here, the whitened 144 numbers are rotated back to image pixel basis by multiplying by U.transpose()[:144,:]. The lower frequencies (which accounted for most variance) are now negligible, while the higher frequencies (which account for relatively little variance originally) become exaggerated.</div>
</div>

**In practice.** We mention PCA/Whitening in these notes for completeness, but these transformations are not used with Convolutional Networks. However, it is very important to zero-center the data, and it is common to see normalization of every pixel as well.

**Common pitfall**. An important point to make about the preprocessing is that any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test).

<a name='init'></a>

### Weight Initialization

We have seen how to construct a Neural Network architecture, and how to preprocess the data. Before we can begin to train the network we have to initialize its parameters.

**Pitfall: all zero initialization**. Lets start with what we should not do. Note that we do not know what the final value of every weight should be in the trained network, but with proper data normalization it is reasonable to assume that approximately half of the weights will be positive and half of them will be negative. A reasonable-sounding idea then might be to set all the initial weights to zero, which we expect to be the "best guess" in expectation. This turns out to be a mistake, because if every neuron in the network computes the same output, then they will also all compute the same gradients during backpropagation and undergo the exact same parameter updates. In other words, there is no source of asymmetry between neurons if their weights are initialized to be the same.

**Small random numbers**. Therefore, we still want the weights to be very close to zero, but as we have argued above, not identically zero. As a solution, it is common to initialize the weights of the neurons to small numbers and refer to doing so as *symmetry breaking*. The idea is that the neurons are all random and unique in the beginning, so they will compute distinct updates and integrate themselves as diverse parts of the full network. The implementation for one weight matrix might look like `W = 0.01* np.random.randn(D,H)`, where `randn` samples from a zero mean, unit standard deviation gaussian. With this formulation, every neuron's weight vector is initialized as a random vector sampled from a multi-dimensional gaussian, so the neurons point in random direction in the input space. It is also possible to use small numbers drawn from a uniform distribution, but this seems to have relatively little impact on the final performance in practice.

*Warning*: It's not necessarily the case that smaller numbers will work strictly better. For example, a Neural Network layer that has very small weights will during backpropagation compute very small gradients on its data (since this gradient is proportional to the value of the weights). This could greatly diminish the "gradient signal" flowing backward through a network, and could become a concern for deep networks.

**Calibrating the variances with 1/sqrt(n)**. One problem with the above suggestion is that the distribution of the outputs from a randomly initialized neuron has a variance that grows with the number of inputs. It turns out that we can normalize the variance of each neuron's output to 1 by scaling its weight vector by the square root of its *fan-in* (i.e. its number of inputs). That is, the recommended heuristic is to initialize each neuron's weight vector as: `w = np.random.randn(n) / sqrt(n)`, where `n` is the number of its inputs. This ensures that all neurons in the network initially have approximately the same output distribution and empirically improves the rate of convergence.

The sketch of the derivation is as follows: Consider the inner product \\(s = \sum_i^n w_i x_i\\) between the weights \\(w\\) and input \\(x\\), which gives the raw activation of a neuron before the non-linearity. We can examine the variance of \\(s\\):

$$
\begin{align}
\text{Var}(s) &= \text{Var}(\sum_i^n w_ix_i) \\\\
&= \sum_i^n \text{Var}(w_ix_i) \\\\
&= \sum_i^n [E(w_i)]^2\text{Var}(x_i) + [E(x_i)]^2\text{Var}(w_i) + \text{Var}(x_i)\text{Var}(w_i) \\\\
&= \sum_i^n \text{Var}(x_i)\text{Var}(w_i) \\\\
&= \left( n \text{Var}(w) \right) \text{Var}(x)
\end{align}
$$

where in the first 2 steps we have used [properties of variance](http://en.wikipedia.org/wiki/Variance). In third step we assumed zero mean inputs and weights, so \\(E[x_i] = E[w_i] = 0\\). Note that this is not generally the case: For example ReLU units will have a positive mean. In the last step we assumed that all \\(w_i, x_i\\) are identically distributed. From this derivation we can see that if we want \\(s\\) to have the same variance as all of its inputs \\(x\\), then during initialization we should make sure that the variance of every weight \\(w\\) is \\(1/n\\). And since \\(\text{Var}(aX) = a^2\text{Var}(X)\\) for a random variable \\(X\\) and a scalar \\(a\\), this implies that we should draw from unit gaussian and then scale it by \\(a = \sqrt{1/n}\\), to make its variance \\(1/n\\). This gives the initialization `w = np.random.randn(n) / sqrt(n)`.

A similar analysis is carried out in [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf) by Glorot et al. In this paper, the authors end up recommending an initialization of the form \\( \text{Var}(w) = 2/(n_{in} + n_{out}) \\) where \\(n_{in}, n_{out}\\) are the number of units in the previous layer and the next layer. This is based on a compromise and an equivalent analysis of the backpropagated gradients. A more recent paper on this topic, [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://arxiv-web3.library.cornell.edu/abs/1502.01852) by He et al., derives an initialization specifically for ReLU neurons, reaching the conclusion that the variance of neurons in the network should be \\(2.0/n\\). This gives the initialization `w = np.random.randn(n) * sqrt(2.0/n)`, and is the current recommendation for use in practice in the specific case of neural networks with ReLU neurons.

**Sparse initialization**. Another way to address the uncalibrated variances problem is to set all weight matrices to zero, but to break symmetry every neuron is randomly connected (with weights sampled from a small gaussian as above) to a fixed number of neurons below it. A typical number of neurons to connect to may be as small as 10.

**Initializing the biases**. It is possible and common to initialize the biases to be zero, since the asymmetry breaking is provided by the small random numbers in the weights. For ReLU non-linearities, some people like to use small constant value such as 0.01 for all biases because this ensures that all ReLU units fire in the beginning and therefore obtain and propagate some gradient. However, it is not clear if this provides a consistent improvement (in fact some results seem to indicate that this performs worse) and it is more common to simply use 0 bias initialization.

**In practice**, the current recommendation is to use ReLU units and use the `w = np.random.randn(n) * sqrt(2.0/n)`, as discussed in [He et al.](http://arxiv-web3.library.cornell.edu/abs/1502.01852). 

<a name='batchnorm'></a>

**Batch Normalization**. A recently developed technique by Ioffe and Szegedy called [Batch Normalization](http://arxiv.org/abs/1502.03167) alleviates a lot of headaches with properly initializing neural networks by explicitly forcing the activations throughout a network to take on a unit gaussian distribution at the beginning of the training. The core observation is that this is possible because normalization is a simple differentiable operation. In the implementation, applying this technique usually amounts to insert the BatchNorm layer immediately after fully connected layers (or convolutional layers, as we'll soon see), and before non-linearities. We do not expand on this technique here because it is well described in the linked paper, but note that it has become a very common practice to use Batch Normalization in neural networks. In practice networks that use Batch Normalization are significantly more robust to bad initialization. Additionally, batch normalization can be interpreted as doing preprocessing at every layer of the network, but integrated into the network itself in a differentiable manner. Neat!

<a name='reg'></a>

### Regularization

There are several ways of controlling the capacity of Neural Networks to prevent overfitting:

**L2 regularization** is perhaps the most common form of regularization. It can be implemented by penalizing the squared magnitude of all parameters directly in the objective. That is, for every weight \\(w\\) in the network, we add the term \\(\frac{1}{2} \lambda w^2\\) to the objective, where \\(\lambda\\) is the regularization strength. It is common to see the factor of \\(\frac{1}{2}\\) in front because then the gradient of this term with respect to the parameter \\(w\\) is simply \\(\lambda w\\) instead of \\(2 \lambda w\\). The L2 regularization has the intuitive interpretation of heavily penalizing peaky weight vectors and preferring diffuse weight vectors. As we discussed in the Linear Classification section, due to multiplicative interactions between weights and inputs this has the appealing property of encouraging the network to use all of its inputs a little rather than some of its inputs a lot. Lastly, notice that during gradient descent parameter update, using the L2 regularization ultimately means that every weight is decayed linearly: `W += -lambda * W` towards zero.

**L1 regularization** is another relatively common form of regularization, where for each weight \\(w\\) we add the term \\(\lambda  \mid w \mid\\) to the objective. It is possible to combine the L1 regularization with the L2 regularization: \\(\lambda_1 \mid w \mid + \lambda_2 w^2\\) (this is called [Elastic net regularization](http://web.stanford.edu/~hastie/Papers/B67.2%20%282005%29%20301-320%20Zou%20&%20Hastie.pdf)). The L1 regularization has the intriguing property that it leads the weight vectors to become sparse during optimization (i.e. very close to exactly zero). In other words, neurons with L1 regularization end up using only a sparse subset of their most important inputs and become nearly invariant to the "noisy" inputs. In comparison, final weight vectors from L2 regularization are usually diffuse, small numbers. In practice, if you are not concerned with explicit feature selection, L2 regularization can be expected to give superior performance over L1.

**Max norm constraints**. Another form of regularization is to enforce an absolute upper bound on the magnitude of the weight vector for every neuron and use projected gradient descent to enforce the constraint. In practice, this corresponds to performing the parameter update as normal, and then enforcing the constraint by clamping the weight vector \\(\vec{w}\\) of every neuron to satisfy \\(\Vert \vec{w} \Vert_2 < c\\). Typical values of \\(c\\) are on orders of 3 or 4. Some people report improvements when using this form of regularization. One of its appealing properties is that network cannot "explode" even when the learning rates are set too high because the updates are always bounded.

**Dropout** is an extremely effective, simple and recently introduced regularization technique by Srivastava et al. in [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf) (pdf) that complements the other methods (L1, L2, maxnorm). While training, dropout is implemented by only keeping a neuron active with some probability \\(p\\) (a hyperparameter), or setting it to zero otherwise.

<div class="fig figcenter fighighlight">
  <img src="/assets/nn2/dropout.jpeg" width="70%">
  <div class="figcaption">Figure taken from the <a href="http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf">Dropout paper</a> that illustrates the idea. During training, Dropout can be interpreted as sampling a Neural Network within the full Neural Network, and only updating the parameters of the sampled network based on the input data. (However, the exponential number of possible sampled networks are not independent because they share the parameters.) During testing there is no dropout applied, with the interpretation of evaluating an averaged prediction across the exponentially-sized ensemble of all sub-networks (more about ensembles in the next section).</div>
</div>

Vanilla dropout in an example 3-layer Neural Network would be implemented as follows:

```python
""" Vanilla Dropout: Not recommended implementation (see notes below) """

p = 0.5 # probability of keeping a unit active. higher = less dropout

def train_step(X):
  """ X contains the data """
  
  # forward pass for example 3-layer neural network
  H1 = np.maximum(0, np.dot(W1, X) + b1)
  U1 = np.random.rand(*H1.shape) < p # first dropout mask
  H1 *= U1 # drop!
  H2 = np.maximum(0, np.dot(W2, H1) + b2)
  U2 = np.random.rand(*H2.shape) < p # second dropout mask
  H2 *= U2 # drop!
  out = np.dot(W3, H2) + b3
  
  # backward pass: compute gradients... (not shown)
  # perform parameter update... (not shown)
  
def predict(X):
  # ensembled forward pass
  H1 = np.maximum(0, np.dot(W1, X) + b1) * p # NOTE: scale the activations
  H2 = np.maximum(0, np.dot(W2, H1) + b2) * p # NOTE: scale the activations
  out = np.dot(W3, H2) + b3
```

In the code above, inside the `train_step` function we have performed dropout twice: on the first hidden layer and on the second hidden layer. It is also possible to perform dropout right on the input layer, in which case we would also create a binary mask for the input `X`. The backward pass remains unchanged, but of course has to take into account the generated masks `U1,U2`. 

Crucially, note that in the `predict` function we are not dropping anymore, but we are performing a scaling of both hidden layer outputs by \\(p\\). This is important because at test time all neurons see all their inputs, so we want the outputs of neurons at test time to be identical to their expected outputs at training time. For example, in case of \\(p = 0.5\\), the neurons must halve their outputs at test time to have the same output as they had during training time (in expectation). To see this, consider an output of a neuron \\(x\\) (before dropout). With dropout, the expected output from this neuron will become \\(px + (1-p)0\\), because the neuron's output will be set to zero with probability \\(1-p\\). At test time, when we keep the neuron always active, we must adjust \\(x \rightarrow px\\) to keep the same expected output. It can also be shown that performing this attenuation at test time can be related to the process of iterating over all the possible binary masks (and therefore all the exponentially many sub-networks) and computing their ensemble prediction.

The undesirable property of the scheme presented above is that we must scale the activations by \\(p\\) at test time. Since test-time performance is so critical, it is always preferable to use **inverted dropout**, which performs the scaling at train time, leaving the forward pass at test time untouched. Additionally, this has the appealing property that the prediction code can remain untouched when you decide to tweak where you apply dropout, or if at all. Inverted dropout looks as follows:

```python
""" 
Inverted Dropout: Recommended implementation example.
We drop and scale at train time and don't do anything at test time.
"""

p = 0.5 # probability of keeping a unit active. higher = less dropout

def train_step(X):
  # forward pass for example 3-layer neural network
  H1 = np.maximum(0, np.dot(W1, X) + b1)
  U1 = (np.random.rand(*H1.shape) < p) / p # first dropout mask. Notice /p!
  H1 *= U1 # drop!
  H2 = np.maximum(0, np.dot(W2, H1) + b2)
  U2 = (np.random.rand(*H2.shape) < p) / p # second dropout mask. Notice /p!
  H2 *= U2 # drop!
  out = np.dot(W3, H2) + b3
  
  # backward pass: compute gradients... (not shown)
  # perform parameter update... (not shown)
  
def predict(X):
  # ensembled forward pass
  H1 = np.maximum(0, np.dot(W1, X) + b1) # no scaling necessary
  H2 = np.maximum(0, np.dot(W2, H1) + b2)
  out = np.dot(W3, H2) + b3
```

There has a been a large amount of research after the first introduction of dropout that tries to understand the source of its power in practice, and its relation to the other regularization techniques. Recommended further reading for an interested reader includes:

- [Dropout paper](http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf) by Srivastava et al. 2014.
- [Dropout Training as Adaptive Regularization](http://papers.nips.cc/paper/4882-dropout-training-as-adaptive-regularization.pdf): "we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix".

**Theme of noise in forward pass**. Dropout falls into a more general category of methods that introduce stochastic behavior in the forward pass of the network. During testing, the noise is marginalized over *analytically* (as is the case with dropout when multiplying by \\(p\\)), or *numerically* (e.g. via sampling, by performing several forward passes with different random decisions and then averaging over them). An example of other research in this direction includes [DropConnect](http://cs.nyu.edu/~wanli/dropc/), where a random set of weights is instead set to zero during forward pass. As foreshadowing, Convolutional Neural Networks also take advantage of this theme with methods such as stochastic pooling, fractional pooling, and data augmentation. We will go into details of these methods later.

**Bias regularization**. As we already mentioned in the Linear Classification section, it is not common to regularize the bias parameters because they do not interact with the data through multiplicative interactions, and therefore do not have the interpretation of controlling the influence of a data dimension on the final objective. However, in practical applications (and with proper data preprocessing) regularizing the bias rarely leads to significantly worse performance. This is likely because there are very few bias terms compared to all the weights, so the classifier can "afford to" use the biases if it needs them to obtain a better data loss.

**Per-layer regularization**. It is not very common to regularize different layers to different amounts (except perhaps the output layer). Relatively few results regarding this idea have been published in the literature.

**In practice**: It is most common to use a single, global L2 regularization strength that is cross-validated. It is also common to combine this with dropout applied after all layers. The value of \\(p = 0.5\\) is a reasonable default, but this can be tuned on validation data.

<a name='losses'></a>

### Loss functions

We have discussed the regularization loss part of the objective, which can be seen as penalizing some measure of complexity of the model. The second part of an objective is the *data loss*, which in a supervised learning problem measures the compatibility between a prediction (e.g. the class scores in classification) and the ground truth label. The data loss takes the form of an average over the data losses for every individual example. That is, \\(L = \frac{1}{N} \sum_i L_i\\) where \\(N\\) is the number of training data. Lets abbreviate \\(f = f(x_i; W)\\) to be the activations of the output layer in a Neural Network. There are several types of problems you might want to solve in practice:

**Classification** is the case that we have so far discussed at length. Here, we assume a dataset of examples and a single correct label (out of a fixed set) for each example. One of two most commonly seen cost functions in this setting is the SVM (e.g. the Weston Watkins formulation):

$$
L_i = \sum_{j\neq y_i} \max(0, f_j - f_{y_i} + 1)
$$

As we briefly alluded to, some people report better performance with the squared hinge loss (i.e. instead using \\(\max(0, f_j - f_{y_i} + 1)^2\\)). The second common choice is the Softmax classifier that uses the cross-entropy loss:

$$
L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right)
$$

**Problem: Large number of classes**. When the set of labels is very large (e.g. words in English dictionary, or ImageNet which contains 22,000 categories), computing the full softmax probabilities becomes expensive. For certain applications, approximate versions are popular. For instance, it may be helpful to use *Hierarchical Softmax* in natural language processing tasks (see one explanation [here](http://arxiv.org/pdf/1310.4546.pdf) (pdf)). The hierarchical softmax decomposes words as labels in a tree. Each label is then represented as a path along the tree, and a Softmax classifier is trained at every node of the tree to disambiguate between the left and right branch. The structure of the tree strongly impacts the performance and is generally problem-dependent.

**Attribute classification**. Both losses above assume that there is a single correct answer \\(y_i\\). But what if \\(y_i\\) is a binary vector where every example may or may not have a certain attribute, and where the attributes are not exclusive? For example, images on Instagram can be thought of as labeled with a certain subset of hashtags from a large set of all hashtags, and an image may contain multiple. A sensible approach in this case is to build a binary classifier for every single attribute independently. For example, a binary classifier for each category independently would take the form:

$$
L_i = \sum_j \max(0, 1 - y_{ij} f_j)
$$

where the sum is over all categories \\(j\\), and \\(y_{ij}\\) is either +1 or -1 depending on whether the i-th example is labeled with the j-th attribute, and the score vector \\(f_j\\) will be positive when the class is predicted to be present and negative otherwise. Notice that loss is accumulated if a positive example has score less than +1, or when a negative example has score greater than -1. 

An alternative to this loss would be to train a logistic regression classifier for every attribute independently. A binary logistic regression classifier has only two classes (0,1), and calculates the probability of class 1 as:

$$
P(y = 1 \mid x; w, b) = \frac{1}{1 + e^{-(w^Tx +b)}} = \sigma (w^Tx + b)
$$

Since the probabilities of class 1 and 0 sum to one, the probability for class 0 is \\(P(y = 0 \mid x; w, b) = 1 - P(y = 1 \mid x; w,b)\\). Hence, an example is classified as a positive example (y = 1) if \\(\sigma (w^Tx + b) > 0.5\\), or equivalently if the score \\(w^Tx +b > 0\\). The loss function then maximizes this probability. You can convince yourself that this simplifies to minimizing the negative log-likelihood:

$$
L_i = -\sum_j y_{ij} \log(\sigma(f_j)) + (1 - y_{ij}) \log(1 - \sigma(f_j))
$$

where the labels \\(y_{ij}\\) are assumed to be either 1 (positive) or 0 (negative), and \\(\sigma(\cdot)\\) is the sigmoid function. The expression above can look scary but the gradient on \\(f\\) is in fact extremely simple and intuitive: \\(\partial{L_i} / \partial{f_j} = \sigma(f_j) - y_{ij}\\) (as you can double check yourself by taking the derivatives).

**Regression** is the task of predicting real-valued quantities, such as the price of houses or the length of something in an image. For this task, it is common to compute the loss between the predicted quantity and the true answer and then measure the L2 squared norm, or L1 norm of the difference. The L2 norm squared would compute the loss for a single example of the form:

$$
L_i = \Vert f - y_i \Vert_2^2
$$

The reason the L2 norm is squared in the objective is that the gradient becomes much simpler, without changing the optimal parameters since squaring is a monotonic operation. The L1 norm would be formulated by summing the absolute value along each dimension:

$$
L_i = \Vert f - y_i \Vert_1 = \sum_j \mid f_j - (y_i)_j \mid
$$

where the sum \\(\sum_j\\) is a sum over all dimensions of the desired prediction, if there is more than one quantity being predicted. Looking at only the j-th dimension of the i-th example and denoting the difference between the true and the predicted value by \\(\delta_{ij}\\), the gradient for this dimension (i.e. \\(\partial{L_i} / \partial{f_j}\\)) is easily derived to be either \\(\delta_{ij}\\) with the L2 norm, or \\(sign(\delta_{ij})\\). That is, the gradient on the score will either be directly proportional to the difference in the error, or it will be fixed and only inherit the sign of the difference.

*Word of caution*: It is important to note that the L2 loss is much harder to optimize than a more stable loss such as Softmax. Intuitively, it requires a very fragile and specific property from the network to output exactly one correct value for each input (and its augmentations). Notice that this is not the case with Softmax, where the precise value of each score is less important: It only matters that their magnitudes are appropriate. Additionally, the L2 loss is less robust because outliers can introduce huge gradients. When faced with a regression problem, first consider if it is absolutely inadequate to quantize the output into bins. For example, if you are predicting star rating for a product, it might work much better to use 5 independent classifiers for ratings of 1-5 stars instead of a regression loss. Classification has the additional benefit that it can give you a distribution over the regression outputs, not just a single output with no indication of its confidence. If you're certain that classification is not appropriate, use the L2 but be careful: For example, the L2 is more fragile and applying dropout in the network (especially in the layer right before the L2 loss) is not a great idea.

> When faced with a regression task, first consider if it is absolutely necessary. Instead, have a strong preference to discretizing your outputs to bins and perform classification over them whenever possible.

**Structured prediction**. The structured loss refers to a case where the labels can be arbitrary structures such as graphs, trees, or other complex objects. Usually it is also assumed that the space of structures is very large and not easily enumerable. The basic idea behind the structured SVM loss is to demand a margin between the correct structure \\(y_i\\) and the highest-scoring incorrect structure. It is not common to solve this problem as a simple unconstrained optimization problem with gradient descent. Instead, special solvers are usually devised so that the specific simplifying assumptions of the structure space can be taken advantage of. We mention the problem briefly but consider the specifics to be outside of the scope of the class.

<a name='summary'></a>

## Summary

In summary:

- The recommended preprocessing is to center the data to have mean of zero, and normalize its scale to [-1, 1] along each feature
- Initialize the weights by drawing them from a gaussian distribution with standard deviation of \\(\sqrt{2/n}\\), where \\(n\\) is the number of inputs to the neuron. E.g. in numpy: `w = np.random.randn(n) * sqrt(2.0/n)`.
- Use L2 regularization and dropout (the inverted version)
- Use batch normalization
- We discussed different tasks you might want to perform in practice, and the most common loss functions for each task

We've now preprocessed the data and set up and initialized the model. In the next section we will look at the learning process and its dynamics.


================================================
FILE: neural-networks-3.md
================================================
---
layout: page
permalink: /neural-networks-3/
---

Table of Contents:

- [Gradient checks](#gradcheck)
- [Sanity checks](#sanitycheck)
- [Babysitting the learning process](#baby)
  - [Loss function](#loss)
  - [Train/val accuracy](#accuracy)
  - [Weights:Updates ratio](#ratio)
  - [Activation/Gradient distributions per layer](#distr)
  - [Visualization](#vis)
- [Parameter updates](#update)
  - [First-order (SGD), momentum, Nesterov momentum](#sgd)
  - [Annealing the learning rate](#anneal)
  - [Second-order methods](#second)
  - [Per-parameter adaptive learning rates (Adagrad, RMSProp)](#ada)
- [Hyperparameter Optimization](#hyper)
- [Evaluation](#eval)
  - [Model Ensembles](#ensemble)
- [Summary](#summary)
- [Additional References](#add)

## Learning

In the previous sections we've discussed the static parts of a Neural Networks: how we can set up the network connectivity, the data, and the loss function. This section is devoted to the dynamics, or in other words, the process of learning the parameters and finding good hyperparameters.

<a name='gradcheck'></a>

### Gradient Checks

In theory, performing a gradient check is as simple as comparing the analytic gradient to the numerical gradient. In practice, the process is much more involved and error prone. Here are some tips, tricks, and issues to watch out for:

**Use the centered formula**. The formula you may have seen for the finite difference approximation when evaluating the numerical gradient looks as follows:

$$
\frac{df(x)}{dx} = \frac{f(x + h) - f(x)}{h} \hspace{0.1in} \text{(bad, do not use)}
$$

where \\(h\\) is a very small number, in practice approximately 1e-5 or so. In practice, it turns out that it is much better to use the *centered* difference formula of the form:

$$
\frac{df(x)}{dx} = \frac{f(x + h) - f(x - h)}{2h} \hspace{0.1in} \text{(use instead)}
$$

This requires you to evaluate the loss function twice to check every single dimension of the gradient (so it is about 2 times as expensive), but the gradient approximation turns out to be much more precise. To see this, you can use Taylor expansion of \\(f(x+h)\\) and \\(f(x-h)\\) and verify that the first formula has an error on order of \\(O(h)\\), while the second formula only has error terms on order of \\(O(h^2)\\) (i.e. it is a second order approximation).

**Use relative error for the comparison**. What are the details of comparing the numerical gradient \\(f'_n\\) and analytic gradient \\(f'_a\\)? That is, how do we know if the two are not compatible? You might be temped to keep track of the difference \\(\mid f'_a - f'_n \mid \\) or its square and define the gradient check as failed if that difference is above a threshold. However, this is problematic. For example, consider the case where their difference is 1e-4. This seems like a very appropriate difference if the two gradients are about 1.0, so we'd consider the two gradients to match. But if the gradients were both on order of 1e-5 or lower, then we'd consider 1e-4 to be a huge difference and likely a failure. Hence, it is always more appropriate to consider the *relative error*:

$$
\frac{\mid f'_a - f'_n \mid}{\max(\mid f'_a \mid, \mid f'_n \mid)}
$$

which considers their ratio of the differences to the ratio of the absolute values of both gradients. Notice that normally the relative error formula only includes one of the two terms (either one), but I prefer to max (or add) both to make it symmetric and to prevent dividing by zero in the case where one of the two is zero (which can often happen, especially with ReLUs). However, one must explicitly keep track of the case where both are zero and pass the gradient check in that edge case. In practice:

- relative error > 1e-2 usually means the gradient is probably wrong
- 1e-2 > relative error > 1e-4 should make you feel uncomfortable
- 1e-4 > relative error is usually okay for objectives with kinks. But if there are no kinks (e.g. use of tanh nonlinearities and softmax), then 1e-4 is too high.
- 1e-7 and less you should be happy.

Also keep in mind that the deeper the network, the higher the relative errors will be. So if you are gradient checking the input data for a 10-layer network, a relative error of 1e-2 might be okay because the errors build up on the way. Conversely, an error of 1e-2 for a single differentiable function likely indicates incorrect gradient.

**Use double precision**. A common pitfall is using single precision floating point to compute gradient check. It is often that case that you might get high relative errors (as high as 1e-2) even with a correct gradient implementation. In my experience I've sometimes seen my relative errors plummet from 1e-2 to 1e-8 by switching to double precision.

**Stick around active range of floating point**. It's a good idea to read through ["What Every Computer Scientist Should Know About Floating-Point Arithmetic"](http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html), as it may demystify your errors and enable you to write more careful code. For example, in neural nets it can be common to normalize the loss function over the batch. However, if your gradients per datapoint are very small, then *additionally* dividing them by the number of data points is starting to give very small numbers, which in turn will lead to more numerical issues. This is why I like to always print the raw numerical/analytic gradient, and make sure that the numbers you are comparing are not extremely small (e.g. roughly 1e-10 and smaller in absolute value is worrying). If they are you may want to temporarily scale your loss function up by a constant to bring them to a "nicer" range where floats are more dense - ideally on the order of 1.0, where your float exponent is 0.

**Kinks in the objective**. One source of inaccuracy to be aware of during gradient checking is the problem of *kinks*. Kinks refer to non-differentiable parts of an objective function, introduced by functions such as ReLU (\\(max(0,x)\\)), or the SVM loss, Maxout neurons, etc. Consider gradient checking the ReLU function at \\(x = -1e6\\). Since \\(x < 0\\), the analytic gradient at this point is exactly zero. However, the numerical gradient would suddenly compute a non-zero gradient because \\(f(x+h)\\) might cross over the kink (e.g. if \\(h > 1e-6\\)) and introduce a non-zero contribution. You might think that this is a pathological case, but in fact this case can be very common. For example, an SVM for CIFAR-10 contains up to 450,000 \\(max(0,x)\\) terms because there are 50,000 examples and each example yields 9 terms to the objective. Moreover, a Neural Network with an SVM classifier will contain many more kinks due to ReLUs.

Note that it is possible to know if a kink was crossed in the evaluation of the loss. This can be done by keeping track of the identities of all "winners" in a function of form \\(max(x,y)\\); That is, was x or y higher during the forward pass. If the identity of at least one winner changes when evaluating \\(f(x+h)\\) and then \\(f(x-h)\\), then a kink was crossed and the numerical gradient will not be exact.

**Use only few datapoints**. One fix to the above problem of kinks is to use fewer datapoints, since loss functions that contain kinks (e.g. due to use of ReLUs or margin losses etc.) will have fewer kinks with fewer datapoints, so it is less likely for you to cross one when you perform the finite different approximation. Moreover, if your gradcheck for only ~2 or 3 datapoints then you would almost certainly gradcheck for an entire batch. Using very few datapoints also makes your gradient check faster and more efficient.

**Be careful with the step size h**. It is not necessarily the case that smaller is better, because when \\(h\\) is much smaller, you may start running into numerical precision problems. Sometimes when the gradient doesn't check, it is possible that you change \\(h\\) to be 1e-4 or 1e-6 and suddenly the gradient will be correct. This [wikipedia article](http://en.wikipedia.org/wiki/Numerical_differentiation) contains a chart that plots the value of **h** on the x-axis and the numerical gradient error on the y-axis.

**Gradcheck during a "characteristic" mode of operation**. It is important to realize that a gradient check is performed at a particular (and usually random), single point in the space of parameters. Even if the gradient check succeeds at that point, it is not immediately certain that the gradient is correctly implemented globally. Additionally, a random initialization might not be the most "characteristic" point in the space of parameters and may in fact introduce pathological situations where the gradient seems to be correctly implemented but isn't. For instance, an SVM with very small weight initialization will assign almost exactly zero scores to all datapoints and the gradients will exhibit a particular pattern across all datapoints. An incorrect implementation of the gradient could still produce this pattern and not generalize to a more characteristic mode of operation where some scores are larger than others. Therefore, to be safe it is best to use a short **burn-in** time during which the network is allowed to learn and perform the gradient check after the loss starts to go down. The danger of performing it at the first iteration is that this could introduce pathological edge cases and mask an incorrect implementation of the gradient.

**Don't let the regularization overwhelm the data**. It is often the case that a loss function is a sum of the data loss and the regularization loss (e.g. L2 penalty on weights). One danger to be aware of is that the regularization loss may overwhelm the data loss, in which case the gradients will be primarily coming from the regularization term (which usually has a much simpler gradient expression). This can mask an incorrect implementation of the data loss gradient. Therefore, it is recommended to turn off regularization and check the data loss alone first, and then the regularization term second and independently. One way to perform the latter is to hack the code to remove the data loss contribution. Another way is to increase the regularization strength so as to ensure that its effect is non-negligible in the gradient check, and that an incorrect implementation would be spotted.

**Remember to turn off dropout/augmentations**. When performing gradient check, remember to turn off any non-deterministic effects in the network, such as dropout, random data augmentations, etc. Otherwise these can clearly introduce huge errors when estimating the numerical gradient. The downside of turning off these effects is that you wouldn't be gradient checking them (e.g. it might be that dropout isn't backpropagated correctly). Therefore, a better solution might be to force a particular random seed before evaluating both \\(f(x+h)\\) and \\(f(x-h)\\), and when evaluating the analytic gradient.

**Check only few dimensions**. In practice the gradients can have sizes of million parameters. In these cases it is only practical to check some of the dimensions of the gradient and assume that the others are correct. **Be careful**: One issue to be careful with is to make sure to gradient check a few dimensions for every separate parameter. In some applications, people combine the parameters into a single large parameter vector for convenience. In these cases, for example, the biases could only take up a tiny number of parameters from the whole vector, so it is important to not sample at random but to take this into account and check that all parameters receive the correct gradients.

<a name='sanitycheck'></a>

### Before learning: sanity checks Tips/Tricks

Here are a few sanity checks you might consider running before you plunge into expensive optimization:

- **Look for correct loss at chance performance.** Make sure you're getting the loss you expect when you initialize with small parameters. It's best to first check the data loss alone (so set regularization strength to zero). For example, for CIFAR-10 with a Softmax classifier we would expect the initial loss to be 2.302, because we expect a diffuse probability of 0.1 for each class (since there are 10 classes), and Softmax loss is the negative log probability of the correct class so: -ln(0.1) = 2.302. For The Weston Watkins SVM, we expect all desired margins to be violated (since all scores are approximately zero), and hence expect a loss of 9 (since margin is 1 for each wrong class). If you're not seeing these losses there might be issue with initialization.
- As a second sanity check, increasing the regularization strength should increase the loss
- **Overfit a tiny subset of data**. Lastly and most importantly, before training on the full dataset try to train on a tiny portion (e.g. 20 examples) of your data and make sure you can achieve zero cost. For this experiment it's also best to set regularization to zero, otherwise this can prevent you from getting zero cost. Unless you pass this sanity check with a small dataset it is not worth proceeding to the full dataset. Note that it may happen that you can overfit very small dataset but still have an incorrect implementation. For instance, if your datapoints' features are random due to some bug, then it will be possible to overfit your small training set but you will never notice any generalization when you fold it your full dataset.

<a name='baby'></a>

### Babysitting the learning process

There are multiple useful quantities you should monitor during training of a neural network. These plots are the window into the training process and should be utilized to get intuitions about different hyperparameter settings and how they should be changed for more efficient learning. 

The x-axis of the plots below are always in units of epochs, which measure how many times every example has been seen during training in expectation (e.g. one epoch means that every example has been seen once). It is preferable to track epochs rather than iterations since the number of iterations depends on the arbitrary setting of batch size.

<a name='loss'></a>

#### Loss function

The first quantity that is useful to track during training is the loss, as it is evaluated on the individual batches during the forward pass. Below is a cartoon diagram showing the loss over time, and especially what the shape might tell you about the learning rate:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn3/learningrates.jpeg" width="49%">
  <img src="/assets/nn3/loss.jpeg" width="49%">
  <div class="figcaption">
    <b>Left:</b> A cartoon depicting the effects of different learning rates. With low learning rates the improvements will be linear. With high learning rates they will start to look more exponential. Higher learning rates will decay the loss faster, but they get stuck at worse values of loss (green line). This is because there is too much "energy" in the optimization and the parameters are bouncing around chaotically, unable to settle in a nice spot in the optimization landscape. <b>Right:</b> An example of a typical loss function over time, while training a small network on CIFAR-10 dataset. This loss function looks reasonable (it might indicate a slightly too small learning rate based on its speed of decay, but it's hard to say), and also indicates that the batch size might be a little too low (since the cost is a little too noisy).
  </div>
</div>

The amount of "wiggle" in the loss is related to the batch size. When the batch size is 1, the wiggle will be relatively high. When the batch size is the full dataset, the wiggle will be minimal because every gradient update should be improving the loss function monotonically (unless the learning rate is set too high).

Some people prefer to plot their loss functions in the log domain. Since learning progress generally takes an exponential form shape, the plot appears as a slightly more interpretable straight line, rather than a hockey stick. Additionally, if multiple cross-validated models are plotted on the same loss graph, the differences between them become more apparent.

Sometimes loss functions can look funny [lossfunctions.tumblr.com](http://lossfunctions.tumblr.com/).

<a name='accuracy'></a>

#### Train/Val accuracy

The second important quantity to track while training a classifier is the validation/training accuracy. This plot can give you valuable insights into the amount of overfitting in your model:

<div class="fig figleft fighighlight">
  <img src="/assets/nn3/accuracies.jpeg">
  <div class="figcaption">
    The gap between the training and validation accuracy indicates the amount of overfitting. Two possible cases are shown in the diagram on the left. The blue validation error curve shows very small validation accuracy compared to the training accuracy, indicating strong overfitting (note, it's possible for the validation accuracy to even start to go down after some point). When you see this in practice you probably want to increase regularization (stronger L2 weight penalty, more dropout, etc.) or collect more data. The other possible case is when the validation accuracy tracks the training accuracy fairly well. This case indicates that your model capacity is not high enough: make the model larger by increasing the number of parameters.
  </div>
  <div style="clear:both"></div>
</div>

<a name='ratio'></a>

#### Ratio of weights:updates

The last quantity you might want to track is the ratio of the update magnitudes to the value magnitudes. Note: *updates*, not the raw gradients (e.g. in vanilla sgd this would be the gradient multiplied by the learning rate). You might want to evaluate and track this ratio for every set of parameters independently. A rough heuristic is that this ratio should be somewhere around 1e-3. If it is lower than this then the learning rate might be too low. If it is higher then the learning rate is likely too high. Here is a specific example:

```python
# assume parameter vector W and its gradient vector dW
param_scale = np.linalg.norm(W.ravel())
update = -learning_rate*dW # simple SGD update
update_scale = np.linalg.norm(update.ravel())
W += update # the actual update
print update_scale / param_scale # want ~1e-3
```

Instead of tracking the min or the max, some people prefer to compute and track the norm of the gradients and their updates instead. These metrics are usually correlated and often give approximately the same results.

<a name='distr'></a>

#### Activation / Gradient distributions per layer

An incorrect initialization can slow down or even completely stall the learning process. Luckily, this issue can be diagnosed relatively easily. One way to do so is to plot activation/gradient histograms for all layers of the network. Intuitively, it is not a good sign to see any strange distributions - e.g. with tanh neurons we would like to see a distribution of neuron activations between the full range of [-1,1], instead of seeing all neurons outputting zero, or all neurons being completely saturated at either -1 or 1.


<a name='vis'></a>

#### First-layer Visualizations

Lastly, when one is working with image pixels it can be helpful and satisfying to plot the first-layer features visually:

<div class="fig figcenter fighighlight">
  <img src="/assets/nn3/weights.jpeg" width="43%" style="margin-right:10px;">
  <img src="/assets/nn3/cnnweights.jpg" width="49%">
  <div class="figcaption">
    Examples of visualized weights for the first layer of a neural network. <b>Left</b>: Noisy features indicate could be a symptom: Unconverged network, improperly set learning rate, very low weight regularization penalty. <b>Right:</b> Nice, smooth, clean and diverse features are a good indication that the training is proceeding well.
  </div>
</div>

<a name='update'></a>

### Parameter updates

Once the analytic gradient is computed with backpropagation, the gradients are used to perform a parameter update. There are several approaches for performing the update, which we discuss next.

We note that optimization for deep networks is currently a very active area of research. In this section we highlight some established and common techniques you may see in practice, briefly describe their intuition, but leave a detailed analysis outside of the scope of the class. We provide some further pointers for an interested reader.

<a name='sgd'></a>

#### SGD and bells and whistles

**Vanilla update**. The simplest form of update is to change the parameters along the negative gradient direction (since the gradient indicates the direction of increase, but we usually wish to minimize a loss function). Assuming a vector of parameters `x` and the gradient `dx`, the simplest update has the form:

```python
# Vanilla update
x += - learning_rate * dx
```

where `learning_rate` is a hyperparameter - a fixed constant. When evaluated on the full dataset, and when the learning rate is low enough, this is guaranteed to make non-negative progress on the loss function.

**Momentum update** is another approach that almost always enjoys better converge rates on deep networks. This update can be motivated from a physical perspective of the optimization problem. In particular, the loss can be interpreted as the height of a hilly terrain (and therefore also to the potential energy since \\(U = mgh\\) and therefore \\( U \propto h \\) ). Initializing the parameters with random numbers is equivalent to setting a particle with zero initial velocity at some location. The optimization process can then be seen as equivalent to the process of simulating the parameter vector (i.e. a particle) as rolling on the landscape.

Since the force on the particle is related to the gradient of potential energy (i.e. \\(F = - \nabla U \\) ), the **force** felt by the particle is precisely the (negative) **gradient** of the loss function. Moreover, \\(F = ma \\) so the (negative) gradient is in this view proportional to the acceleration of the particle. Note that this is different from the SGD update shown above, where the gradient directly integrates the position. Instead, the physics view suggests an update in which the gradient only directly influences the velocity, which in turn has an effect on the position:

```python
# Momentum update
v = mu * v - learning_rate * dx # integrate velocity
x += v # integrate position
```

Here we see an introduction of a `v` variable that is initialized at zero, and an additional hyperparameter (`mu`). As an unfortunate misnomer, this variable is in optimization referred to as *momentum* (its typical value is about 0.9), but its physical meaning is more consistent with the coefficient of friction. Effectively, this variable damps the velocity and reduces the kinetic energy of the system, or otherwise the particle would never come to a stop at the bottom of a hill. When cross-validated, this parameter is usually set to values such as [0.5, 0.9, 0.95, 0.99]. Similar to annealing schedules for learning rates (discussed later, below), optimization can sometimes benefit a little from momentum schedules, where the momentum is increased in later stages of learning. A typical setting is to start with momentum of about 0.5 and anneal it to 0.99 or so over multiple epochs.

> With Momentum update, the parameter vector will build up velocity in any direction that has consistent gradient.

**Nesterov Momentum** is a slightly different version of the momentum update that has recently been gaining popularity. It enjoys stronger theoretical converge guarantees for convex functions and in practice it also consistenly works slightly better than standard momentum.

The core idea behind Nesterov momentum is that when the current parameter vector is at some position `x`, then looking at the momentum update above, we know that the momentum term alone (i.e. ignoring the second term with the gradient) is about to nudge the parameter vector by `mu * v`. Therefore, if we are about to compute the gradient, we can treat the future approximate position `x + mu * v` as a "lookahead" - this is a point in the vicinity of where we are soon going to end up. Hence, it makes sense to compute the gradient at `x + mu * v` instead of at the "old/stale" position `x`. 

<div class="fig figcenter fighighlight">
  <img src="/assets/nn3/nesterov.jpeg">
  <div class="figcaption">
    Nesterov momentum. Instead of evaluating gradient at the current position (red circle), we know that our momentum is about to carry us to the tip of the green arrow. With Nesterov momentum we therefore instead evaluate the gradient at this "looked-ahead" position.
  </div>
</div>

That is, in a slightly awkward notation, we would like to do the following:

```python
x_ahead = x + mu * v
# evaluate dx_ahead (the gradient at x_ahead instead of at x)
v = mu * v - learning_rate * dx_ahead
x += v
```

However, in practice people prefer to express the update to look as similar to vanilla SGD or to the previous momentum update as possible. This is possible to achieve by manipulating the update above with a variable transform `x_ahead = x + mu * v`, and then expressing the update in terms of `x_ahead` instead of `x`. That is, the parameter vector we are actually storing is always the ahead version. The equations in terms of `x_ahead` (but renaming it back to `x`) then become:

```python
v_prev = v # back this up
v = mu * v - learning_rate * dx # velocity update stays the same
x += -mu * v_prev + (1 + mu) * v # position update changes form
```

We recommend this further reading to understand the source of these equations and the mathematical formulation of Nesterov's Accelerated Momentum (NAG):

- [Advances in optimizing Recurrent Networks](http://arxiv.org/pdf/1212.0901v2.pdf) by Yoshua Bengio, Section 3.5.
- [Ilya Sutskever's thesis](http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf) (pdf) contains a longer exposition of the topic in section 7.2


<a name='anneal'></a>

#### Annealing the learning rate

In training deep networks, it is usually helpful to anneal the learning rate over time. Good intuition to have in mind is that with a high learning rate, the system contains too much kinetic energy and the parameter vector bounces around chaotically, unable to settle down into deeper, but narrower parts of the loss function. Knowing when to decay the learning rate can be tricky: Decay it slowly and you'll be wasting computation bouncing around chaotically with little improvement for a long time. But decay it too aggressively and the system will cool too quickly, unable to reach the best position it can. There are three common types of implementing the learning rate decay:

- **Step decay**: Reduce the learning rate by some factor every few epochs. Typical values might be reducing the learning rate by a half every 5 epochs, or by 0.1 every 20 epochs. These numbers depend heavily on the type of problem and the model. One heuristic you may see in practice is to watch the validation error while training with a fixed learning rate, and reduce the learning rate by a constant (e.g. 0.5) whenever the validation error stops improving.
- **Exponential decay.** has the mathematical form \\(\alpha = \alpha_0 e^{-k t}\\), where \\(\alpha_0, k\\) are hyperparameters and \\(t\\) is the iteration number (but you can also use units of epochs).
- **1/t decay** has the mathematical form \\(\alpha = \alpha_0 / (1 + k t )\\) where \\(a_0, k\\) are hyperparameters and \\(t\\) is the iteration number.

In practice, we find that the step decay is slightly preferable because the hyperparameters it involves (the fraction of decay and the step timings in units of epochs) are more interpretable than the hyperparameter \\(k\\). Lastly, if you can afford the computational budget, err on the side of slower decay and train for a longer time.

<a name='second'></a>

#### Second order methods

A second, popular group of methods for optimization in context of deep learning is based on [Newton's method](http://en.wikipedia.org/wiki/Newton%27s_method_in_optimization), which iterates the following update:

$$
x \leftarrow x - [H f(x)]^{-1} \nabla f(x)
$$

Here, \\(H f(x)\\) is the [Hessian matrix](http://en.wikipedia.org/wiki/Hessian_matrix), which is a square matrix of second-order partial derivatives of the function. The term \\(\nabla f(x)\\) is the gradient vector, as seen in Gradient Descent. Intuitively, the Hessian describes the local curvature of the loss function, which allows us to perform a more efficient update. In particular, multiplying by the inverse Hessian leads the optimization to take more aggressive steps in directions of shallow curvature and shorter steps in directions of steep curvature. Note, crucially, the absence of any learning rate hyperparameters in the update formula, which the proponents of these methods cite this as a large advantage over first-order methods.

However, the update above is impractical for most deep learning applications because computing (and inverting) the Hessian in its explicit form is a very costly process in both space and time. For instance, a Neural Network with one million parameters would have a Hessian matrix of size [1,000,000 x 1,000,000], occupying approximately 3725 gigabytes of RAM. Hence, a large variety of *quasi-Newton* methods have been developed that seek to approximate the inverse Hessian. Among these, the most popular is [L-BFGS](http://en.wikipedia.org/wiki/Limited-memory_BFGS), which uses the information in the gradients over time to form the approximation implicitly (i.e. the full matrix is never computed).

However, even after we eliminate the memory concerns, a large downside of a naive application of L-BFGS is that it must be computed over the entire training set, which could contain millions of examples. Unlike mini-batch SGD, getting L-BFGS to work on mini-batches is more tricky and an active area of research.

**In practice**, it is currently not common to see L-BFGS or similar second-order methods applied to large-scale Deep Learning and Convolutional Neural Networks. Instead, SGD variants based on (Nesterov's) momentum are more standard because they are simpler and scale more easily.

Additional references:

- [Large Scale Distributed Deep Networks](http://research.google.com/archive/large_deep_networks_nips2012.html) is a paper from the Google Brain team, comparing L-BFGS and SGD variants in large-scale distributed optimization.
- [SFO](http://arxiv.org/abs/1311.2115) algorithm strives to combine the advantages of SGD with advantages of L-BFGS.

<a name='ada'></a>

#### Per-parameter adaptive learning rate methods

All previous approaches we've discussed so far manipulated the learning rate globally and equally for all parameters. Tuning the learning rates is an expensive process, so much work has gone into devising methods that can adaptively tune the learning rates, and even do so per parameter. Many of these methods may still require other hyperparameter settings, but the argument is that they are well-behaved for a broader range of hyperparameter values than the raw learning rate. In this section we highlight some common adaptive methods you may encounter in practice:

**Adagrad** is an adaptive learning rate method originally proposed by [Duchi et al.](http://jmlr.org/papers/v12/duchi11a.html).

```python
# Assume the gradient dx and parameter vector x
cache += dx**2
x += - learning_rate * dx / (np.sqrt(cache) + eps)
```

Notice that the variable `cache` has size equal to the size of the gradient, and keeps track of per-parameter sum of squared gradients. This is then used to normalize the parameter update step, element-wise. Notice that the weights that receive high gradients will have their effective learning rate reduced, while weights that receive small or infrequent updates will have their effective learning rate increased. Amusingly, the square root operation turns out to be very important and without it the algorithm performs much worse. The smoothing term `eps` (usually set somewhere in range from 1e-4 to 1e-8) avoids division by zero. A downside of Adagrad is that in case of Deep Learning, the monotonic learning rate usually proves too aggressive and stops learning too early.

**RMSprop.** RMSprop is a very effective, but currently unpublished adaptive learning rate method. Amusingly, everyone who uses this method in their work currently cites [slide 29 of Lecture 6](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) of Geoff Hinton's Coursera class. The RMSProp update adjusts the Adagrad method in a very simple way in an attempt to reduce its aggressive, monotonically decreasing learning rate. In particular, it uses a moving average of squared gradients instead, giving:

```python
cache = decay_rate * cache + (1 - decay_rate) * dx**2
x += - learning_rate * dx / (np.sqrt(cache) + eps)
```

Here, `decay_rate` is a hyperparameter and typical values are [0.9, 0.99, 0.999]. Notice that the `x+=` update is identical to Adagrad, but the `cache` variable is a "leaky". Hence, RMSProp still modulates the learning rate of each weight based on the magnitudes of its gradients, which has a beneficial equalizing effect, but unlike Adagrad the updates do not get monotonically smaller.

**Adam.** [Adam](http://arxiv.org/abs/1412.6980) is a recently proposed update that looks a bit like RMSProp with momentum. The (simplified) update looks as follows:

```python
m = beta1*m + (1-beta1)*dx
v = beta2*v + (1-beta2)*(dx**2)
x += - learning_rate * m / (np.sqrt(v) + eps)
```

Notice that the update looks exactly as RMSProp update, except the "smooth" version of the gradient `m` is used instead of the raw (and perhaps noisy) gradient vector `dx`. Recommended values in the paper are `eps = 1e-8`, `beta1 = 0.9`, `beta2 = 0.999`. In practice Adam is currently recommended as the default algorithm to use, and often works slightly better than RMSProp. However, it is often also worth trying SGD+Nesterov Momentum as an alternative. The full Adam update also includes a *bias correction* mechanism, which compensates for the fact that in the first few time steps the vectors `m,v` are both initialized and therefore biased at zero, before they fully "warm up". With the *bias correction* mechanism, the update looks as follows:

```python
# t is your iteration counter going from 1 to infinity
m = beta1*m + (1-beta1)*dx
mt = m / (1-beta1**t)
v = beta2*v + (1-beta2)*(dx**2)
vt = v / (1-beta2**t)
x += - learning_rate * mt / (np.sqrt(vt) + eps)
```
Note that the update is now a function of the iteration as well as the other parameters.
We refer the reader to the paper for the details, or the course slides where this is expanded on.

Additional References:

- [Unit Tests for Stochastic Optimization](http://arxiv.org/abs/1312.6055) proposes a series of tests as a standardized benchmark for stochastic optimization.

<div class="fig figcenter fighighlight">
  <img src="/assets/nn3/opt2.gif" width="49%" style="margin-right:10px;">
  <img src="/assets/nn3/opt1.gif" width="49%">
  <div class="figcaption">
    Animations that may help your intuitions about the learning process dynamics. <b>Left:</b> Contours of a loss surface and time evolution of different optimization algorithms. Notice the "overshooting" behavior of momentum-based methods, which make the optimization look like a ball rolling down the hill. <b>Right:</b> A visualization of a saddle point in the optimization landscape, where the curvature along different dimension has different signs (one dimension curves up and another down). Notice that SGD has a very hard time breaking symmetry and gets stuck on the top. Conversely, algorithms such as RMSprop will see very low gradients in the saddle direction. Due to the denominator term in the RMSprop update, this will increase the effective learning rate along this direction, helping RMSProp proceed. Images credit: <a href="https://twitter.com/alecrad">Alec Radford</a>.
  </div>
</div>

<a name='hyper'></a>

### Hyperparameter optimization

As we've seen, training Neural Networks can involve many hyperparameter settings. The most common hyperparameters in context of Neural Networks include:

- the initial learning rate
- learning rate decay schedule (such as the decay constant)
- regularization strength (L2 penalty, dropout strength)

But as we saw, there are many more relatively less sensitive hyperparameters, for example in per-parameter adaptive learning methods, the setting of momentum and its schedule, etc. In this section we describe some additional tips and tricks for performing the hyperparameter search:

**Implementation**. Larger Neural Networks typically require a long time to train, so performing hyperparameter search can take many days/weeks. It is important to keep this in mind since it influences the design of your code base. One particular design is to have a **worker** that continuously samples random hyperparameters and performs the optimization. During the training, the worker will keep track of the validation performance after every epoch, and writes a model checkpoint (together with miscellaneous training statistics such as the loss over time) to a file, preferably on a shared file system. It is useful to include the validation performance directly in the filename, so that it is simple to inspect and sort the progress. Then there is a second program which we will call a **master**, which launches or kills workers across a computing cluster, and may additionally inspect the checkpoints written by workers and plot their training statistics, etc.

**Prefer one validation fold to cross-validation**. In most cases a single validation set of respectable size substantially simplifies the code base, without the need for cross-validation with multiple folds. You'll hear people say they "cross-validated" a parameter, but many times it is assumed that they still only used a single validation set.

**Hyperparameter ranges**. Search for hyperparameters on log scale. For example, a typical sampling of the learning rate would look as follows: `learning_rate = 10 ** uniform(-6, 1)`. That is, we are generating a random number from a uniform distribution, but then raising it to the power of 10. The same strategy should be used for the regularization strength. Intuitively, this is because learning rate and regularization strength have multiplicative effects on the training dynamics. For example, a fixed change of adding 0.01 to a learning rate has huge effects on the dynamics if the learning rate is 0.001, but nearly no effect if the learning rate when it is 10. This is because the learning rate multiplies the computed gradient in the update. Therefore, it is much more natural to consider a range of learning rate multiplied or divided by some value, than a range of learning rate added or subtracted to by some value. Some parameters (e.g. dropout) are instead usually searched in the original scale (e.g. `dropout = uniform(0,1)`).

**Prefer random search to grid search**. As argued by Bergstra and Bengio in [Random Search for Hyper-Parameter Optimization](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf), "randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid". As it turns out, this is also usually easier to implement.

<div class="fig figcenter fighighlight">
  <img src="/assets/nn3/gridsearchbad.jpeg" width="50%">
  <div class="figcaption">
    Core illustration from <a href="http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf">Random Search for Hyper-Parameter Optimization</a> by Bergstra and Bengio. It is very often the case that some of the hyperparameters matter much more than others (e.g. top hyperparam vs. left one in this figure). Performing random search rather than grid search allows you to much more precisely discover good values for the important ones.
  </div>
</div>

**Careful with best values on border**. Sometimes it can happen that you're searching for a hyperparameter (e.g. learning rate) in a bad range. For example, suppose we use `learning_rate = 10 ** uniform(-6, 1)`. Once we receive the results, it is important to double check that the final learning rate is not at the edge of this interval, or otherwise you may be missing more optimal hyperparameter setting beyond the interval.

**Stage your search from coarse to fine**. In practice, it can be helpful to first search in coarse ranges (e.g. 10 ** [-6, 1]), and then depending on where the best results are turning up, narrow the range. Also, it can be helpful to perform the initial coarse search while only training for 1 epoch or even less, because many hyperparameter settings can lead the model to not learn at all, or immediately explode with infinite cost. The second stage could then perform a narrower search with 5 epochs, and the last stage could perform a detailed search in the final range for many more epochs (for example).

**Bayesian Hyperparameter Optimization** is a whole area of research devoted to coming up with algorithms that try to more efficiently navigate the space of hyperparameters. The core idea is to appropriately balance the exploration - exploitation trade-off when querying the performance at different hyperparameters. Multiple libraries have been developed based on these models as well, among some of the better known ones are [Spearmint](https://github.com/JasperSnoek/spearmint), [SMAC](http://www.cs.ubc.ca/labs/beta/Projects/SMAC/), and [Hyperopt](http://jaberg.github.io/hyperopt/). However, in practical settings with ConvNets it is still relatively difficult to beat random search in a carefully-chosen intervals. See some additional from-the-trenches discussion [here](http://nlpers.blogspot.com/2014/10/hyperparameter-search-bayesian.html).

<a name='eval'></a>

## Evaluation

<a name='ensemble'></a>

### Model Ensembles

In practice, one reliable approach to improving the performance of Neural Networks by a few percent is to train multiple independent models, and at test time average their predictions. As the number of models in the ensemble increases, the performance typically monotonically improves (though with diminishing returns). Moreover, the improvements are more dramatic with higher model variety in the ensemble. There are a few approaches to forming an ensemble:

- **Same model, different initializations**. Use cross-validation to determine the best hyperparameters, then train multiple models with the best set of hyperparameters but with different random initialization. The danger with this approach is that the variety is only due to initialization.
- **Top models discovered during cross-validation**. Use cross-validation to determine the best hyperparameters, then pick the top few (e.g. 10) models to form the ensemble. This improves the variety of the ensemble but has the danger of including suboptimal models. In practice, this can be easier to perform since it doesn't require additional retraining of models after cross-validation
- **Different checkpoints of a single model**. If training is very expensive, some people have had limited success in taking different checkpoints of a single network over time (for example after every epoch) and using those to form an ensemble. Clearly, this suffers from some lack of variety, but can still work reasonably well in practice. The advantage of this approach is that is very cheap.
- **Running average of parameters during training**. Related to the last point, a cheap way of almost always getting an extra percent or two of performance is to maintain a second copy of the network's weights in memory that maintains an exponentially decaying sum of previous weights during training. This way you're averaging the state of the network over last several iterations. You will find that this "smoothed" version of the weights over last few steps almost always achieves better validation error. The rough intuition to have in mind is that the objective is bowl-shaped and your network is jumping around the mode, so the average has a higher chance of being somewhere nearer the mode.

One disadvantage of model ensembles is that they take longer to evaluate on test example. An interested reader may find the recent work from Geoff Hinton on ["Dark Knowledge"](https://www.youtube.com/watch?v=EK61htlw8hY) inspiring, where the idea is to "distill" a good ensemble back to a single model by incorporating the ensemble log likelihoods into a modified objective.

<a name='summary'></a>

## Summary

To train a Neural Network:

- Gradient check your implementation with a small batch of data and be aware of the pitfalls.
- As a sanity check, make sure your initial loss is reasonable, and that you can achieve 100% training accuracy on a very small portion of the data
- During training, monitor the loss, the training/validation accuracy, and if you're feeling fancier, the magnitude of updates in relation to parameter values (it should be ~1e-3), and when dealing with ConvNets, the first-layer weights.
- The two recommended updates to use are either SGD+Nesterov Momentum or Adam.
- Decay your learning rate over the period of the training. For example, halve the learning rate after a fixed number of epochs, or whenever the validation accuracy tops off.
- Search for good hyperparameters with random search (not grid search). Stage your search from coarse (wide hyperparameter ranges, training only for 1-5 epochs), to fine (narrower rangers, training for many more epochs)
- Form model ensembles for extra performance

<a name='add'></a>

## Additional References

- [SGD](http://research.microsoft.com/pubs/192769/tricks-2012.pdf) tips and tricks from Leon Bottou
- [Efficient BackProp](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) (pdf) from Yann LeCun
- [Practical Recommendations for Gradient-Based Training of Deep
Architectures](http://arxiv.org/pdf/1206.5533v2.pdf) from Yoshua Bengio


================================================
FILE: neural-networks-case-study.md
================================================
---
layout: page
permalink: /neural-networks-case-study/
---

Table of Contents:

- [Generating some data](#data)
- [Training a Softmax Linear Classifier](#linear)
  - [Initialize the parameters](#init)
  - [Compute the class scores](#scores)
  - [Compute the loss](#loss)
  - [Computing the analytic gradient with backpropagation](#grad)
  - [Performing a parameter update](#update)
  - [Putting it all together: Training a Softmax Classifier](#together)
- [Training a Neural Network](#net)
- [Summary](#summary)

In this section we'll walk through a complete implementation of a toy Neural Network in 2 dimensions. We'll first implement a simple linear classifier and then extend the code to a 2-layer Neural Network. As we'll see, this extension is surprisingly simple and very few changes are necessary.

<a name='data'></a>

## Generating some data

Lets generate a classification dataset that is not easily linearly separable. Our favorite example is the spiral dataset, which can be generated as follows:

```python
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
  ix = range(N*j,N*(j+1))
  r = np.linspace(0.0,1,N) # radius
  t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
  X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
  y[ix] = j
# lets visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()
```

<div class="fig figcenter fighighlight">
  <img src="/assets/eg/spiral_raw.png">
  <div class="figcaption">
    The toy spiral data consists of three classes (blue, red, yellow) that are not linearly separable.
  </div>
</div>

Normally we would want to preprocess the dataset so that each feature has zero mean and unit standard deviation, but in this case the features are already in a nice range from -1 to 1, so we skip this step.

<a name='linear'></a>

## Training a Softmax Linear Classifier

<a name='init'></a>

### Initialize the parameters

Lets first train a Softmax classifier on this classification dataset. As we saw in the previous sections, the Softmax classifier has a linear score function and uses the cross-entropy loss. The parameters of the linear classifier consist of a weight matrix `W` and a bias vector `b` for each class. Lets first initialize these parameters to be random numbers:

```python
# initialize parameters randomly
W = 0.01 * np.random.randn(D,K)
b = np.zeros((1,K))
```

Recall that we `D = 2` is the dimensionality and `K = 3` is the number of classes.

<a name='scores'></a>

### Compute the class scores

Since this is a linear classifier, we can compute all class scores very simply in parallel with a single matrix multiplication:

```python
# compute class scores for a linear classifier
scores = np.dot(X, W) + b
```

In this example we have 300 2-D points, so after this multiplication the array `scores` will have size [300 x 3], where each row gives the class scores corresponding to the 3 classes (blue, red, yellow).

<a name='loss'></a>

### Compute the loss

The second key ingredient we need is a loss function, which is a differentiable objective that quantifies our unhappiness with the computed class scores. Intuitively, we want the correct class to have a higher score than the other classes. When this is the case, the loss should be low and otherwise the loss should be high. There are many ways to quantify this intuition, but in this example lets use the cross-entropy loss that is associated with the Softmax classifier. Recall that if \\(f\\) is the array of class scores for a single example (e.g. array of 3 numbers here), then the Softmax classifier computes the loss for that example as:

$$
L_i = -\log\left(\frac{e^{f_{y_i}}}{ \sum_j e^{f_j} }\right)
$$

We can see that the Softmax classifier interprets every element of \\(f\\) as holding the (unnormalized) log probabilities of the three classes. We exponentiate these to get (unnormalized) probabilities, and then normalize them to get probabilites. Therefore, the expression inside the log is the normalized probability of the correct class. Note how this expression works: this quantity is always between 0 and 1. When the probability of the correct class is very small (near 0), the loss will go towards (positive) infinity. Conversely, when the correct class probability goes towards 1, the loss will go towards zero because \\(log(1) = 0\\). Hence, the expression for \\(L_i\\) is low when the correct class probability is high, and it's very high when it is low.

Recall also that the full Softmax classifier loss is then defined as the average cross-entropy loss over the training examples and the regularization:

$$
L =  \underbrace{ \frac{1}{N} \sum_i L_i }_\text{data loss} + \underbrace{ \frac{1}{2} \lambda \sum_k\sum_l W_{k,l}^2 }_\text{regularization loss} \\\\
$$

Given the array of `scores` we've computed above, we can compute the loss. First, the way to obtain the probabilities is straight forward:

```python
num_examples = X.shape[0]
# get unnormalized probabilities
exp_scores = np.exp(scores)
# normalize them for each example
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
```

We now have an array `probs` of size [300 x 3], where each row now contains the class probabilities. In particular, since we've normalized them every row now sums to one. We can now query for the  log probabilities assigned to the correct classes in each example:

```python
correct_logprobs = -np.log(probs[range(num_examples),y])
```

The array `correct_logprobs` is a 1D array of just the probabilities assigned to the correct classes for each example. The full loss is then the average of these log probabilities and the regularization loss:

```python
# compute the loss: average cross-entropy loss and regularization
data_loss = np.sum(correct_logprobs)/num_examples
reg_loss = 0.5*reg*np.sum(W*W)
loss = data_loss + reg_loss
```

In this code, the regularization strength \\(\lambda\\) is stored inside the `reg`. The convenience factor of `0.5` multiplying the regularization will become clear in a second. Evaluating this in the beginning (with random parameters) might give us `loss = 1.1`, which is `-np.log(1.0/3)`, since with small initial random weights all probabilities assigned to all classes are about one third. We now want to make the loss as low as possible, with `loss = 0` as the absolute lower bound. But the lower the loss is, the higher are the probabilities assigned to the correct classes for all examples.

<a name='grad'></a>

### Computing the Analytic Gradient with Backpropagation

We have a way of evaluating the loss, and now we have to minimize it. We'll do so with gradient descent. That is, we start with random parameters (as shown above), and evaluate the gradient of the loss function with respect to the parameters, so that we know how we should change the parameters to decrease the loss. Lets introduce the intermediate variable \\(p\\), which is a vector of the (normalized) probabilities. The loss for one example is:

$$
p_k = \frac{e^{f_k}}{ \sum_j e^{f_j} } \hspace{1in} L_i =-\log\left(p_{y_i}\right)
$$

We now wish to understand how the computed scores inside \\(f\\) should change to decrease the loss \\(L_i\\) that this example contributes to the full objective. In other words, we want to derive the gradient \\( \partial L_i / \partial f_k \\). The loss \\(L_i\\) is computed from \\(p\\), which in turn depends on \\(f\\). It's a fun exercise to the reader to use the chain rule to derive the gradient, but it turns out to be extremely simple and interpretible in the end, after a lot of things cancel out:

$$
\frac{\partial L_i }{ \partial f_k } = p_k - \mathbb{1}(y_i = k)
$$

Notice how elegant and simple this expression is. Suppose the probabilities we computed were `p = [0.2, 0.3, 0.5]`, and that the correct class was the middle one (with probability 0.3). According to this derivation the gradient on the scores would be `df = [0.2, -0.7, 0.5]`. Recalling what the interpretation of the gradient, we see that this result is highly intuitive: increasing the first or last element of the score vector `f` (the scores of the incorrect classes) leads to an *increased* loss (due to the positive signs +0.2 and +0.5) - and increasing the loss is bad, as expected. However, increasing the score of the correct class has *negative* influence on the loss. The gradient of -0.7 is telling us that increasing the correct class score would lead to a decrease of the loss \\(L_i\\), which makes sense.

All of this boils down to the following code. Recall that `probs` stores the probabilities of all classes (as rows) for each example. To get the gradient on the scores, which we call `dscores`, we proceed as follows:

```python
dscores = probs
dscores[range(num_examples),y] -= 1
dscores /= num_examples
```

Lastly, we had that `scores = np.dot(X, W) + b`, so armed with the gradient on `scores` (stored in `dscores`), we can now backpropagate into `W` and `b`:

```python
dW = np.dot(X.T, dscores)
db = np.sum(dscores, axis=0, keepdims=True)
dW += reg*W # don't forget the regularization gradient
```

Where we see that we have backpropped through the matrix multiply operation, and also added the contribution from the regularization. Note that the regularization gradient has the very simple form `reg*W` since we used the constant `0.5` for its loss contribution (i.e. \\(\frac{d}{dw} ( \frac{1}{2} \lambda w^2) = \lambda w\\). This is a common convenience trick that simplifies the gradient expression.

<a name='update'></a>

### Performing a parameter update

Now that we've evaluated the gradient we know how every parameter influences the loss function. We will now perform a parameter update in the *negative* gradient direction to *decrease* the loss:

```python
# perform a parameter update
W += -step_size * dW
b += -step_size * db
```

<a name='together'></a>

### Putting it all together: Training a Softmax Classifier

Putting all of this together, here is the full code for training a Softmax classifier with Gradient descent:

```python
#Train a Linear Classifier

# initialize parameters randomly
W = 0.01 * np.random.randn(D,K)
b = np.zeros((1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = X.shape[0]
for i in range(200):

  # evaluate class scores, [N x K]
  scores = np.dot(X, W) + b

  # compute the class probabilities
  exp_scores = np.exp(scores)
  probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]

  # compute the loss: average cross-entropy loss and regularization
  correct_logprobs = -np.log(probs[range(num_examples),y])
  data_loss = np.sum(correct_logprobs)/num_examples
  reg_loss = 0.5*reg*np.sum(W*W)
  loss = data_loss + reg_loss
  if i % 10 == 0:
    print "iteration %d: loss %f" % (i, loss)

  # compute the gradient on scores
  dscores = probs
  dscores[range(num_examples),y] -= 1
  dscores /= num_examples

  # backpropate the gradient to the parameters (W,b)
  dW = np.dot(X.T, dscores)
  db = np.sum(dscores, axis=0, keepdims=True)

  dW += reg*W # regularization gradient

  # perform a parameter update
  W += -step_size * dW
  b += -step_size * db
```

Running this prints the output:

```
iteration 0: loss 1.096956
iteration 10: loss 0.917265
iteration 20: loss 0.851503
iteration 30: loss 0.822336
iteration 40: loss 0.807586
iteration 50: loss 0.799448
iteration 60: loss 0.794681
iteration 70: loss 0.791764
iteration 80: loss 0.789920
iteration 90: loss 0.788726
iteration 100: loss 0.787938
iteration 110: loss 0.787409
iteration 120: loss 0.787049
iteration 130: loss 0.786803
iteration 140: loss 0.786633
iteration 150: loss 0.786514
iteration 160: loss 0.786431
iteration 170: loss 0.786373
iteration 180: loss 0.786331
iteration 190: loss 0.786302
```

We see that we've converged to something after about 190 iterations. We can evaluate the training set accuracy:

```python
# evaluate training set accuracy
scores = np.dot(X, W) + b
predicted_class = np.argmax(scores, axis=1)
print 'training accuracy: %.2f' % (np.mean(predicted_class == y))
```

This prints **49%**. Not very good at all, but also not surprising given that the dataset is constructed so it is not linearly separable. We can also plot the learned decision boundaries:

<div class="fig figcenter fighighlight">
  <img src="/assets/eg/spiral_linear.png">
  <div class="figcaption">
    Linear classifier fails to learn the toy spiral dataset.
  </div>
</div>

<a name='net'></a>

## Training a Neural Network

Clearly, a linear classifier is inadequate for this dataset and we would like to use a Neural Network. One additional hidden layer will suffice for this toy data. We will now need two sets of weights and biases (for the first and second layers):

```python
# initialize parameters randomly
h = 100 # size of hidden layer
W = 0.01 * np.random.randn(D,h)
b = np.zeros((1,h))
W2 = 0.01 * np.random.randn(h,K)
b2 = np.zeros((1,K))
```

The forward pass to compute scores now changes form:

```python
# evaluate class scores with a 2-layer Neural Network
hidden_layer = np.maximum(0, np.dot(X, W) + b) # note, ReLU activation
scores = np.dot(hidden_layer, W2) + b2
```

Notice that the only change from before is one extra line of code, where we first compute the hidden layer representation and then the scores based on this hidden layer. Crucially, we've also added a non-linearity, which in this case is simple ReLU that thresholds the activations on the hidden layer at zero.

Everything else remains the same. We compute the loss based on the scores exactly as before, and get the gradient for the scores `dscores` exactly as before. However, the way we backpropagate that gradient into the model parameters now changes form, of course. First lets backpropagate the second layer of the Neural Network. This looks identical to the code we had for the Softmax classifier, except we're replacing `X` (the raw data), with the variable `hidden_layer`):

```python
# backpropate the gradient to the parameters
# first backprop into parameters W2 and b2
dW2 = np.dot(hidden_layer.T, dscores)
db2 = np.sum(dscores, axis=0, keepdims=True)
```

However, unlike before we are not yet done, because `hidden_layer` is itself a function of other parameters and the data! We need to continue backpropagation through this variable. Its gradient can be computed as:

```python
dhidden = np.dot(dscores, W2.T)
```

Now we have the gradient on the outputs of the hidden layer. Next, we have to backpropagate the ReLU non-linearity. This turns out to be easy because ReLU during the backward pass is effectively a switch. Since \\(r = max(0, x)\\), we have that \\(\frac{dr}{dx} = 1(x > 0) \\). Combined with the chain rule, we see that the ReLU unit lets the gradient pass through unchanged if its input was greater than 0, but *kills it* if its input was less than zero during the forward pass. Hence, we can backpropagate the ReLU in place simply with:

```python
# backprop the ReLU non-linearity
dhidden[hidden_layer <= 0] = 0
```

And now we finally continue to the first layer weights and biases:

```python
# finally into W,b
dW = np.dot(X.T, dhidden)
db = np.sum(dhidden, axis=0, keepdims=True)
```

We're done! We have the gradients `dW,db,dW2,db2` and can perform the parameter update. Everything else remains unchanged. The full code looks very similar:

```python
# initialize parameters randomly
h = 100 # size of hidden layer
W = 0.01 * np.random.randn(D,h)
b = np.zeros((1,h))
W2 = 0.01 * np.random.randn(h,K)
b2 = np.zeros((1,K))

# some hyperparameters
step_size = 1e-0
reg = 1e-3 # regularization strength

# gradient descent loop
num_examples = X.shape[0]
for i in range(10000):

  # evaluate class scores, [N x K]
  hidden_layer = np.maximum(0, np.dot(X, W) + b) # note, ReLU activation
  scores = np.dot(hidden_layer, W2) + b2

  # compute the class probabilities
  exp_scores = np.exp(scores)
  probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True) # [N x K]

  # compute the loss: average cross-entropy loss and regularization
  correct_logprobs = -np.log(probs[range(num_examples),y])
  data_loss = np.sum(correct_logprobs)/num_examples
  reg_loss = 0.5*reg*np.sum(W*W) + 0.5*reg*np.sum(W2*W2)
  loss = data_loss + reg_loss
  if i % 1000 == 0:
    print "iteration %d: loss %f" % (i, loss)

  # compute the gradient on scores
  dscores = probs
  dscores[range(num_examples),y] -= 1
  dscores /= num_examples

  # backpropate the gradient to the parameters
  # first backprop into parameters W2 and b2
  dW2 = np.dot(hidden_layer.T, dscores)
  db2 = np.sum(dscores, axis=0, keepdims=True)
  # next backprop into hidden layer
  dhidden = np.dot(dscores, W2.T)
  # backprop the ReLU non-linearity
  dhidden[hidden_layer <= 0] = 0
  # finally into W,b
  dW = np.dot(X.T, dhidden)
  db = np.sum(dhidden, axis=0, keepdims=True)

  # add regularization gradient contribution
  dW2 += reg * W2
  dW += reg * W

  # perform a parameter update
  W += -step_size * dW
  b += -step_size * db
  W2 += -step_size * dW2
  b2 += -step_size * db2
```

This prints:

```
iteration 0: loss 1.098744
iteration 1000: loss 0.294946
iteration 2000: loss 0.259301
iteration 3000: loss 0.248310
iteration 4000: loss 0.246170
iteration 5000: loss 0.245649
iteration 6000: loss 0.245491
iteration 7000: loss 0.245400
iteration 8000: loss 0.245335
iteration 9000: loss 0.245292
```

The training accuracy is now:

```python
# evaluate training set accuracy
hidden_layer = np.maximum(0, np.dot(X, W) + b)
scores = np.dot(hidden_layer, W2) + b2
predicted_class = np.argmax(scores, axis=1)
print 'training accuracy: %.2f' % (np.mean(predicted_class == y))
```

Which prints **98%**!. We can also visualize the decision boundaries:

<div class="fig figcenter fighighlight">
  <img src="/assets/eg/spiral_net.png">
  <div class="figcaption">
    Neural Network classifier crushes the spiral dataset.
  </div>
</div>

## Summary

We've worked with a toy 2D dataset and trained both a linear network and a 2-layer Neural Network. We saw that the change from a linear classifier to a Neural Network involves very few changes in the code. The score function changes its form (1 line of code difference), and the backpropagation changes its form (we have to perform one more round of backprop through the hidden layer to the first layer of the network).

- You may want to look at this IPython Notebook code [rendered as HTML](http://cs.stanford.edu/people/karpathy/cs231nfiles/minimal_net.html).
- Or download the [ipynb file](http://cs.stanford.edu/people/karpathy/cs231nfiles/minimal_net.ipynb)


================================================
FILE: optimization-1.md
================================================
---
layout: page
permalink: /optimization-1/
---

Table of Contents:

- [Introduction](#intro)
- [Visualizing the loss function](#vis)
- [Optimization](#optimization)
  - [Strategy #1: Random Search](#opt1)
  - [Strategy #2: Random Local Search](#opt2)
  - [Strategy #3: Following the gradient](#opt3)
- [Computing the gradient](#gradcompute)
  - [Numerically with finite differences](#numerical)
  - [Analytically with calculus](#analytic)
- [Gradient descent](#gd)
- [Summary](#summary)

<a name='intro'></a>

### Introduction

In the previous section we introduced two key components in context of the image classification task:

1. A (parameterized) **score function** mapping the raw image pixels to class scores (e.g. a linear function)
2. A **loss function** that measured the quality of a particular set of parameters based on how well the induced scores agreed with the ground truth labels in the training data. We saw that there are many ways and versions of this (e.g. Softmax/SVM).

Concretely, recall that the linear function had the form \\( f(x_i, W) =  W x_i \\) and the SVM we developed was formulated as:

$$
L = \frac{1}{N} \sum_i \sum_{j\neq y_i} \left[ \max(0, f(x_i; W)_j - f(x_i; W)_{y_i} + 1) \right] + \alpha R(W)
$$

We saw that a setting of the parameters \\(W\\) that produced predictions for examples \\(x_i\\) consistent with their ground truth labels \\(y_i\\) would also have a very low loss \\(L\\). We are now going to introduce the third and last key component: **optimization**. Optimization is the process of finding the set of parameters \\(W\\) that minimize the loss function.

**Foreshadowing:** Once we understand how these three core components interact, we will revisit the first component (the parameterized function mapping) and extend it to functions much more complicated than a linear mapping: First entire Neural Networks, and then Convolutional Neural Networks. The loss functions and the optimization process will remain relatively unchanged.

<a name='vis'></a>

### Visualizing the loss function

The loss functions we'll look at in this class are usually defined over very high-dimensional spaces (e.g. in CIFAR-10 a linear classifier weight matrix is of size [10 x 3073] for a total of 30,730 parameters), making them difficult to visualize. However, we can still gain some intuitions about one by slicing through the high-dimensional space along rays (1 dimension), or along planes (2 dimensions). For example, we can generate a random weight matrix \\(W\\) (which corresponds to a single point in the space), then march along a ray and record the loss function value along the way. That is, we can generate a random direction \\(W_1\\) and compute the loss along this direction by evaluating \\(L(W + a W_1)\\) for different values of \\(a\\). This process generates a simple plot with the value of \\(a\\) as the x-axis and the value of the loss function as the y-axis. We can also carry out the same procedure with two dimensions by evaluating the loss \\( L(W + a W_1 + b W_2) \\) as we vary \\(a, b\\). In a plot, \\(a, b\\) could then correspond to the x-axis and the y-axis, and the value of the loss function can be visualized with a color:

<div class="fig figcenter fighighlight">
  <img src="/assets/svm1d.png">
  <img src="/assets/svm_one.jpg">
  <img src="/assets/svm_all.jpg">
  <div class="figcaption">
    Loss function landscape for the Multiclass SVM (without regularization) for one single example (left,middle) and for a hundred examples (right) in CIFAR-10. Left: one-dimensional loss by only varying <b>a</b>. Middle, Right: two-dimensional loss slice, Blue = low loss, Red = high loss. Notice the piecewise-linear structure of the loss function. The losses for multiple examples are combined with average, so the bowl shape on the right is the average of many piece-wise linear bowls (such as the one in the middle).
  </div>
</div>

We can explain the piecewise-linear structure of the loss function by examining the math. For a single example we have:

$$
L_i = \sum_{j\neq y_i} \left[ \max(0, w_j^Tx_i - w_{y_i}^Tx_i + 1) \right]
$$

It is clear from the equation that the data loss for each example is a sum of (zero-thresholded due to the \\(\max(0,-)\\) function) linear functions of \\(W\\). Moreover, each row of \\(W\\) (i.e. \\(w_j\\)) sometimes has a positive sign in front of it (when it corresponds to a wrong class for an example), and sometimes a negative sign (when it corresponds to the correct class for that example). To make this more explicit, consider a simple dataset that contains three 1-dimensional points and three classes. The full SVM loss (without regularization) becomes:

$$
\begin{align}
L_0 = & \max(0, w_1^Tx_0 - w_0^Tx_0 + 1) + \max(0, w_2^Tx_0 - w_0^Tx_0 + 1) \\\\
L_1 = & \max(0, w_0^Tx_1 - w_1^Tx_1 + 1) + \max(0, w_2^Tx_1 - w_1^Tx_1 + 1) \\\\
L_2 = & \max(0, w_0^Tx_2 - w_2^Tx_2 + 1) + \max(0, w_1^Tx_2 - w_2^Tx_2 + 1) \\\\
L = & (L_0 + L_1 + L_2)/3
\end{align}
$$

Since these examples are 1-dimensional, the data \\(x_i\\) and weights \\(w_j\\) are numbers. Looking at, for instance, \\(w_0\\), some terms above are linear functions of \\(w_0\\) and each is clamped at zero. We can visualize this as follows:

<div class="fig figcenter fighighlight">
  <img src="/assets/svmbowl.png">
  <div class="figcaption">
    1-dimensional illustration of the data loss. The x-axis is a single weight and the y-axis is the loss. The data loss is a sum of multiple terms, each of which is either independent of a particular weight, or a linear function of it that is thresholded at zero. The full SVM data loss is a 30,730-dimensional version of this shape.
  </div>
</div>

As an aside, you may have guessed from its bowl-shaped appearance that the SVM cost function is an example of a [convex function](http://en.wikipedia.org/wiki/Convex_function) There is a large amount of literature devoted to efficiently minimizing these types of functions, and you can also take a Stanford class on the topic ( [convex optimization](http://stanford.edu/~boyd/cvxbook/) ). Once we extend our score functions \\(f\\) to Neural Networks our objective functions will become non-convex, and the visualizations above will not feature bowls but complex, bumpy terrains.

*Non-differentiable loss functions*. As a technical note, you can also see that the *kinks* in the loss function (due to the max operation) technically make the loss function non-differentiable because at these kinks the gradient is not defined. However, the [subgradient](http://en.wikipedia.org/wiki/Subderivative) still exists and is commonly used instead. In this class will use the terms *subgradient* and *gradient* interchangeably.

<a name='optimization'></a>

### Optimization

To reiterate, the loss function lets us quantify the quality of any particular set of weights **W**. The goal of optimization is to find **W** that minimizes the loss function. We will now motivate and slowly develop an approach to optimizing the loss function. For those of you coming to this class with previous experience, this section might seem odd since the working example we'll use (the SVM loss) is a convex problem, but keep in mind that our goal is to eventually optimize Neural Networks where we can't easily use any of the tools developed in the Convex Optimization literature.

<a name='opt1'></a>

#### Strategy #1: A first very bad idea solution: Random search

Since it is so simple to check how good a given set of parameters **W** is, the first (very bad) idea that may come to mind is to simply try out many different random weights and keep track of what works best. This procedure might look as follows:

```python
# assume X_train is the data where each column is an example (e.g. 3073 x 50,000)
# assume Y_train are the labels (e.g. 1D array of 50,000)
# assume the function L evaluates the loss function

bestloss = float("inf") # Python assigns the highest possible float value
for num in range(1000):
  W = np.random.randn(10, 3073) * 0.0001 # generate random parameters
  loss = L(X_train, Y_train, W) # get the loss over the entire training set
  if loss < bestloss: # keep track of the best solution
    bestloss = loss
    bestW = W
  print 'in attempt %d the loss was %f, best %f' % (num, loss, bestloss)

# prints:
# in attempt 0 the loss was 9.401632, best 9.401632
# in attempt 1 the loss was 8.959668, best 8.959668
# in attempt 2 the loss was 9.044034, best 8.959668
# in attempt 3 the loss was 9.278948, best 8.959668
# in attempt 4 the loss was 8.857370, best 8.857370
# in attempt 5 the loss was 8.943151, best 8.857370
# in attempt 6 the loss was 8.605604, best 8.605604
# ... (trunctated: continues for 1000 lines)
```

In the code above, we see that we tried out several random weight vectors **W**, and some of them work better than others. We can take the best weights **W** found by this search and try it out on the test set:

```python
# Assume X_test is [3073 x 10000], Y_test [10000 x 1]
scores = Wbest.dot(Xte_cols) # 10 x 10000, the class scores for all test examples
# find the index with max score in each column (the predicted class)
Yte_predict = np.argmax(scores, axis = 0)
# and calculate accuracy (fraction of predictions that are correct)
np.mean(Yte_predict == Yte)
# returns 0.1555
```

With the best **W** this gives an accuracy of about **15.5%**. Given that guessing classes completely at random achieves only 10%, that's not a very bad outcome for a such a brain-dead random search solution!

**Core idea: iterative refinement**. Of course, it turns out that we can do much better. The core idea is that finding the best set of weights **W** is a very difficult or even impossible problem (especially once **W** contains weights for entire complex neural networks), but the problem of refining a specific set of weights **W** to be slightly better is significantly less difficult. In other words, our approach will be to start with a random **W** and then iteratively refine it, making it slightly better each time.

> Our strategy will be to start with random weights and iteratively refine them over time to get lower loss

**Blindfolded hiker analogy.** One analogy that you may find helpful going forward is to think of yourself as hiking on a hilly terrain with a blindfold on, and trying to reach the bottom. In the example of CIFAR-10, the hills are 30,730-dimensional, since the dimensions of **W** are 10 x 3073. At every point on the hill we achieve a particular loss (the height of the terrain).

<a name='opt2'></a>

#### Strategy #2: Random Local Search

The first strategy you may think of is to try to extend one foot in a random direction and then take a step only if it leads downhill. Concretely, we will start out with a random \\(W\\), generate random perturbations \\( \delta W \\) to it and if the loss at the perturbed \\(W + \delta W\\) is lower, we will perform an update. The code for this procedure is as follows:

```python
W = np.random.randn(10, 3073) * 0.001 # generate random starting W
bestloss = float("inf")
for i in range(1000):
  step_size = 0.0001
  Wtry = W + np.random.randn(10, 3073) * step_size
  loss = L(Xtr_cols, Ytr, Wtry)
  if loss < bestloss:
    W = Wtry
    bestloss = loss
  print 'iter %d loss is %f' % (i, bestloss)
```

Using the same number of loss function evaluations as before (1000), this approach achieves test set classification accuracy of **21.4%**. This is better, but still wasteful and computationally expensive.

<a name='opt3'></a>

#### Strategy #3: Following the Gradient

In the previous section we tried to find a direction in the weight-space that would improve our weight vector (and give us a lower loss). It turns out that there is no need to randomly search for a good direction: we can compute the *best* direction along which we should change our weight vector that is mathematically guaranteed to be the direction of the steepest descent (at least in the limit as the step size goes towards zero). This direction will be related to the **gradient** of the loss function. In our hiking analogy, this approach roughly corresponds to feeling the slope of the hill below our feet and stepping down the direction that feels steepest.

In one-dimensional functions, the slope is the instantaneous rate of change of the function at any point you might be interested in. The gradient is a generalization of slope for functions that don't take a single number but a vector of numbers. Additionally, the gradient is just a vector of slopes (more commonly referred to as **derivatives**) for each dimension in the input space. The mathematical expression for the derivative of a 1-D function with respect its input is:

$$
\frac{df(x)}{dx} = \lim_{h\ \to 0} \frac{f(x + h) - f(x)}{h}
$$

When the functions of interest take a vector of numbers instead of a single number, we call the derivatives **partial derivatives**, and the gradient is simply the vector of partial derivatives in each dimension.

<a name='gradcompute'></a>

### Computing the gradient

There are two ways to compute the gradient: A slow, approximate but easy way (**numerical gradient**), and a fast, exact but more error-prone way that requires calculus (**analytic gradient**). We will now present both.

<a name='numerical'></a>

#### Computing the gradient numerically with finite differences

The formula given above allows us to compute the gradient numerically. Here is a generic function that takes a function `f`, a vector `x` to evaluate the gradient on, and returns the gradient of `f` at `x`:

```python
def eval_numerical_gradient(f, x):
  """
  a naive implementation of numerical gradient of f at x
  - f should be a function that takes a single argument
  - x is the point (numpy array) to evaluate the gradient at
  """

  fx = f(x) # evaluate function value at original point
  grad = np.zeros(x.shape)
  h = 0.00001

  # iterate over all indexes in x
  it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
  while not it.finished:

    # evaluate function at x+h
    ix = it.multi_index
    old_value = x[ix]
    x[ix] = old_value + h # increment by h
    fxh = f(x) # evalute f(x + h)
    x[ix] = old_value # restore to previous value (very important!)

    # compute the partial derivative
    grad[ix] = (fxh - fx) / h # the slope
    it.iternext() # step to next dimension

  return grad
```

Following the gradient formula we gave above, the code above iterates over all dimensions one by one, makes a small change `h` along that dimension and calculates the partial derivative of the loss function along that dimension by seeing how much the function changed. The variable `grad` holds the full gradient in the end.

**Practical considerations**. Note that in the mathematical formulation the gradient is defined in the limit as **h** goes towards zero, but in practice it is often sufficient to use a very small value (such as 1e-5 as seen in the example). Ideally, you want to use the smallest step size that does not lead to numerical issues. Additionally, in practice it often works better to compute the numeric gradient using the **centered difference formula**: \\( [f(x+h) - f(x-h)] / 2 h \\) . See [wiki](http://en.wikipedia.org/wiki/Numerical_differentiation) for details.

We can use the function given above to compute the gradient at any point and for any function. Lets compute the gradient for the CIFAR-10 loss function at some random point in the weight space:

```python

# to use the generic code above we want a function that takes a single argument
# (the weights in our case) so we close over X_train and Y_train
def CIFAR10_loss_fun(W):
  return L(X_train, Y_train, W)

W = np.random.rand(10, 3073) * 0.001 # random weight vector
df = eval_numerical_gradient(CIFAR10_loss_fun, W) # get the gradient
```

The gradient tells us the slope of the loss function along every dimension, which we can use to make an update:

```python
loss_original = CIFAR10_loss_fun(W) # the original loss
print 'original loss: %f' % (loss_original, )

# lets see the effect of multiple step sizes
for step_size_log in [-10, -9, -8, -7, -6, -5,-4,-3,-2,-1]:
  step_size = 10 ** step_size_log
  W_new = W - step_size * df # new position in the weight space
  loss_new = CIFAR10_loss_fun(W_new)
  print 'for step size %f new loss: %f' % (step_size, loss_new)

# prints:
# original loss: 2.200718
# for step size 1.000000e-10 new loss: 2.200652
# for step size 1.000000e-09 new loss: 2.200057
# for step size 1.000000e-08 new loss: 2.194116
# for step size 1.000000e-07 new loss: 2.135493
# for step size 1.000000e-06 new loss: 1.647802
# for step size 1.000000e-05 new loss: 2.844355
# for step size 1.000000e-04 new loss: 25.558142
# for step size 1.000000e-03 new loss: 254.086573
# for step size 1.000000e-02 new loss: 2539.370888
# for step size 1.000000e-01 new loss: 25392.214036
```

**Update in negative gradient direction**. In the code above, notice that to compute `W_new` we are making an update in the negative direction of the gradient `df` since we wish our loss function to decrease, not increase.

**Effect of step size**. The gradient tells us the direction in which the function has the steepest rate of increase, but it does not tell us how far along this direction we should step. As we will see later in the course, choosing the step size (also called the *learning rate*) will become one of the most important (and most headache-inducing) hyperparameter settings in training a neural network. In our blindfolded hill-descent analogy, we feel the hill below our feet sloping in some direction, but the step length we should take is uncertain. If we shuffle our feet carefully we can expect to make consistent but very small progress (this corresponds to having a small step size). Conversely, we can choose to make a large, confident step in an attempt to descend faster, but this may not pay off. As you can see in the code example above, at some point taking a bigger step gives a higher loss as we "overstep".

<div class="fig figleft fighighlight">
  <img src="/assets/stepsize.jpg">
  <div class="figcaption">
    Visualizing the effect of step size. We start at some particular spot W and evaluate the gradient (or rather its negative - the white arrow) which tells us the direction of the steepest decrease in the loss function. Small steps are likely to lead to consistent but slow progress. Large steps can lead to better progress but are more risky. Note that eventually, for a large step size we will overshoot and make the loss worse. The step size (or as we will later call it - the <b>learning rate</b>) will become one of the most important hyperparameters that we will have to carefully tune.
  </div>
  <div style="clear:both;"></div>
</div>

**A problem of efficiency**. You may have noticed that evaluating the numerical gradient has complexity linear in the number of parameters. In our example we had 30730 parameters in total and therefore had to perform 30,731 evaluations of the loss function to evaluate the gradient and to perform only a single parameter update. This problem only gets worse, since modern Neural Networks can easily have tens of millions of parameters. Clearly, this strategy is not scalable and we need something better.

<a name='analytic'></a>

#### Computing the gradient analytically with Calculus

The numerical gradient is very simple to compute using the finite difference approximation, but the downside is that it is approximate (since we have to pick a small value of *h*, while the true gradient is defined as the limit as *h* goes to zero), and that it is very computationally expensive to compute. The second way to compute the gradient is analytically using Calculus, which allows us to derive a direct formula for the gradient (no approximations) that is also very fast to compute. However, unlike the numerical gradient it can be more error prone to implement, which is why in practice it is very common to compute the analytic gradient and compare it to the numerical gradient to check the correctness of your implementation. This is called a **gradient check**.

Lets use the example of the SVM loss function for a single datapoint:

$$
L_i = \sum_{j\neq y_i} \left[ \max(0, w_j^Tx_i - w_{y_i}^Tx_i + \Delta) \right]
$$

We can differentiate the function with respect to the weights. For example, taking the gradient with respect to \\(w_{y_i}\\) we obtain:

$$
\nabla_{w_{y_i}} L_i = - \left( \sum_{j\neq y_i} \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) \right) x_i
$$

where \\(\mathbb{1}\\) is the indicator function that is one if the condition inside is true or zero otherwise. While the expression may look scary when it is written out, when you're implementing this in code you'd simply count the number of classes that didn't meet the desired margin (and hence contributed to the loss function) and then the data vector \\(x_i\\) scaled by this number is the gradient. Notice that this is the gradient only with respect to the row of \\(W\\) that corresponds to the correct class. For the other rows where \\(j \neq y_i \\) the gradient is:

$$
\nabla_{w_j} L_i = \mathbb{1}(w_j^Tx_i - w_{y_i}^Tx_i + \Delta > 0) x_i
$$

Once you derive the expression for the gradient it is straight-forward to implement the expressions and use them to perform the gradient update.

<a name='gd'></a>

### Gradient Descent

Now that we can compute the gradient of the loss function, the procedure of repeatedly evaluating the gradient and then performing a parameter update is called *Gradient Descent*. Its **vanilla** version looks as follows:

```python
# Vanilla Gradient Descent

while True:
  weights_grad = evaluate_gradient(loss_fun, data, weights)
  weights += - step_size * weights_grad # perform parameter update
```

This simple loop is at the core of all Neural Network libraries. There are other ways of performing the optimization (e.g. LBFGS), but Gradient Descent is currently by far the most common and established way of optimizing Neural Network loss functions. Throughout the class we will put some bells and whistles on the details of this loop (e.g. the exact details of the update equation), but the core idea of following the gradient until we're happy with the results will remain the same.

**Mini-batch gradient descent.** In large-scale applications (such as the ILSVRC challenge), the training data can have on order of millions of examples. Hence, it seems wasteful to compute the full loss function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over **batches** of the training data. For example, in current state of the art ConvNets, a typical batch contains 256 examples from the entire training set of 1.2 million. This batch is then used to perform a parameter update:

```python
# Vanilla Minibatch Gradient Descent

while True:
  data_batch = sample_training_data(data, 256) # sample 256 examples
  weights_grad = evaluate_gradient(loss_fun, data_batch, weights)
  weights += - step_size * weights_grad # perform parameter update
```

The reason this works well is that the examples in the training data are correlated. To see this, consider the extreme case where all 1.2 million images in ILSVRC are in fact made up of exact duplicates of only 1000 unique images (one for each class, or in other words 1200 identical copies of each image). Then it is clear that the gradients we would compute for all 1200 identical copies would all be the same, and when we average the data loss over all 1.2 million images we would get the exact same loss as if we only evaluated on a small subset of 1000. In practice of course, the dataset would not contain duplicate images, the gradient from a mini-batch is a good approximation of the gradient of the full objective. Therefore, much faster convergence can be achieved in practice by evaluating the mini-batch gradients to perform more frequent parameter updates.

The extreme case of this is a setting where the mini-batch contains only a single example. This process is called **Stochastic Gradient Descent (SGD)** (or also sometimes **on-line** gradient descent). This is relatively less common to see because in practice due to vectorized code optimizations it can be computationally much more efficient to evaluate the gradient for 100 examples, than the gradient for one example 100 times. Even though SGD technically refers to using a single example at a time to evaluate the gradient, you will hear people use the term SGD even when referring to mini-batch gradient descent (i.e. mentions of MGD for "Minibatch Gradient Descent", or BGD for "Batch gradient descent" are rare to see), where it is usually assumed that mini-batches are used. The size of the mini-batch is a hyperparameter but it is not very common to cross-validate it. It is usually based on memory constraints (if any), or set to some value, e.g. 32, 64 or 128. We use powers of 2 in practice because many vectorized operation implementations work faster when their inputs are sized in powers of 2.

<a name='summary'></a>

### Summary

<div class="fig figcenter fighighlight">
  <img src="/assets/dataflow.jpeg">
  <div class="figcaption">
    Summary of the information flow. The dataset of pairs of <b>(x,y)</b> is given and fixed. The weights start out as random numbers and can change. During the forward pass the score function computes class scores, stored in vector <b>f</b>. The loss function contains two components: The data loss computes the compatibility between the scores <b>f</b> and the labels <b>y</b>. The regularization loss is only a function of the weights. During Gradient Descent, we compute the gradient on the weights (and optionally on data if we wish) and use them to perform a parameter update during Gradient Descent.
  </div>
</div>


In this section,

- We developed the intuition of the loss function as a **high-dimensional optimization landscape** in which we are trying to reach the bottom. The working analogy we developed was that of a blindfolded hiker who wishes to reach the bottom. In particular, we saw that the SVM cost function is piece-wise linear and bowl-shaped.
- We motivated the idea of optimizing the loss function with
**iterative refinement**, where we start with a random set of weights and refine them step by step until the loss is minimized.
- We saw that the **gradient** of a function gives the steepest ascent direction and we discussed a simple but inefficient way of computing it numerically using the finite difference approximation (the finite difference being the value of *h* used in computing the numerical gradient).
- We saw that the parameter update requires a tricky setting of the **step size** (or the **learning rate**) that must be set just right: if it is too low the progress is steady but slow. If it is too high the progress can be faster, but more risky. We will explore this tradeoff in much more detail in future sections.
- We discussed the tradeoffs between computing the **numerical** and **analytic** gradient. The numerical gradient is simple but it is approximate and expensive to compute. The analytic gradient is exact, fast to compute but more error-prone since it requires the derivation of the gradient with math. Hence, in practice we always use the analytic gradient and then perform a **gradient check**, in which its implementation is compared to the numerical gradient.
- We introduced the **Gradient Descent** algorithm which iteratively computes the gradient and performs a parameter update in loop.

**Coming up:** The core takeaway from this section is that the ability to compute the gradient of a loss function with respect to its weights (and have some intuitive understanding of it) is the most important skill needed to design, train and understand neural networks. In the next section we will develop proficiency in computing the gradient analytically using the chain rule, otherwise also referred to as **backpropagation**. This will allow us to efficiently optimize relatively arbitrary loss functions that express all kinds of Neural Networks, including Convolutional Neural Networks.


================================================
FILE: optimization-2.md
================================================
---
layout: page
permalink: /optimization-2/
---

Table of Contents:

- [Introduction](#intro)
- [Simple expressions, interpreting the gradient](#grad)
- [Compound expressions, chain rule, backpropagation](#backprop)
- [Intuitive understanding of backpropagation](#intuitive)
- [Modularity: Sigmoid example](#sigmoid)
- [Backprop in practice: Staged computation](#staged)
- [Patterns in backward flow](#patterns)
- [Gradients for vectorized operations](#mat)
- [Summary](#summary)

<a name='intro'></a>

### Introduction

**Motivation**. In this section we will develop expertise with an intuitive understanding of **backpropagation**, which is a way of computing gradients of expressions through recursive application of **chain rule**. Understanding of this process and its subtleties is critical for you to understand, and effectively develop, design and debug neural networks.

**Problem statement**. The core problem studied in this section is as follows: We are given some function \\(f(x)\\) where \\(x\\) is a vector of inputs and we are interested in computing the gradient of \\(f\\) at \\(x\\) (i.e. \\(\nabla f(x)\\) ).

**Motivation**. Recall that the primary reason we are interested in this problem is that in the specific case of neural networks, \\(f\\) will correspond to the loss function ( \\(L\\) ) and the inputs \\(x\\) will consist of the training data and the neural network weights. For example, the loss could be the SVM loss function and the inputs are both the training data \\((x_i,y_i), i=1 \ldots N\\) and the weights and biases \\(W,b\\). Note that (as is usually the case in Machine Learning) we think of the training data as given and fixed, and of the weights as variables we have control over. Hence, even though we can easily use backpropagation to compute the gradient on the input examples \\(x_i\\), in practice we usually only compute the gradient for the parameters (e.g. \\(W,b\\)) so that we can use it to perform a parameter update. However, as we will see later in the class the gradient on \\(x_i\\) can still be useful sometimes, for example for purposes of visualization and interpreting what the Neural Network might be doing.

If you are coming to this class and you're comfortable with deriving gradients with chain rule, we would still like to encourage you to at least skim this section, since it presents a rarely developed view of backpropagation as backward flow in real-valued circuits and any insights you'll gain may help you throughout the class.

<a name='grad'></a>

### Simple expressions and interpretation of the gradient

Lets start simple so that we can develop the notation and conventions for more complex expressions. Consider a simple multiplication function of two numbers \\(f(x,y) = x y\\). It is a matter of simple calculus to derive the partial derivative for either input:

$$
f(x,y) = x y \hspace{0.5in} \rightarrow \hspace{0.5in} \frac{\partial f}{\partial x} = y \hspace{0.5in} \frac{\partial f}{\partial y} = x 
$$

**Interpretation**. Keep in mind what the derivatives tell you: They indicate the rate of change of a function with respect to that variable surrounding an infinitesimally small region near a particular point:

$$
\frac{df(x)}{dx} = \lim_{h\ \to 0} \frac{f(x + h) - f(x)}{h}
$$

A technical note is that the division sign on the left-hand side is, unlike the division sign on the right-hand side, not a division. Instead, this notation indicates that the operator \\(  \frac{d}{dx} \\) is being applied to the function \\(f\\), and returns a different function (the derivative). A nice way to think about the expression above is that when \\(h\\) is very small, then the function is well-approximated by a straight line, and the derivative is its slope. In other words, the derivative on each variable tells you the sensitivity of the whole expression on its value. For example, if \\(x = 4, y = -3\\) then \\(f(x,y) = -12\\) and the derivative on \\(x\\) \\(\frac{\partial f}{\partial x} = -3\\). This tells us that if we were to increase the value of this variable by a tiny amount, the effect on the whole expression would be to decrease it (due to the negative sign), and by three times that amount. This can be seen by rearranging the above equation ( \\( f(x + h) = f(x) + h \frac{df(x)}{dx} \\) ). Analogously, since \\(\frac{\partial f}{\partial y} = 4\\), we expect that increasing the value of \\(y\\) by some very small amount \\(h\\) would also increase the output of the function (due to the positive sign), and by \\(4h\\).

> The derivative on each variable tells you the sensitivity of the whole expression on its value.

As mentioned, the gradient \\(\nabla f\\) is the vector of partial derivatives, so we have that \\(\nabla f = [\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}] = [y, x]\\). Even though the gradient is technically a vector, we will often use terms such as *"the gradient on x"* instead of the technically correct phrase *"the partial derivative on x"* for simplicity.

We can also derive the derivatives for the addition operation:

$$
f(x,y) = x + y \hspace{0.5in} \rightarrow \hspace{0.5in} \frac{\partial f}{\partial x} = 1 \hspace{0.5in} \frac{\partial f}{\partial y} = 1
$$

that is, the derivative on both \\(x,y\\) is one regardless of what the values of \\(x,y\\) are. This makes sense, since increasing either \\(x,y\\) would increase the output of \\(f\\), and the rate of that increase would be independent of what the actual values of \\(x,y\\) are (unlike the case of multiplication above). The last function we'll use quite a bit in the class is the *max* operation:

$$
f(x,y) = \max(x, y) \hspace{0.5in} \rightarrow \hspace{0.5in} \frac{\partial f}{\partial x} = \mathbb{1}(x >= y) \hspace{0.5in} \frac{\partial f}{\partial y} = \mathbb{1}(y >= x)
$$

That is, the (sub)gradient is 1 on the input that was larger and 0 on the other input. Intuitively, if the inputs are \\(x = 4,y = 2\\), then the max is 4, and the function is not sensitive to the setting of \\(y\\). That is, if we were to increase it by a tiny amount \\(h\\), the function would keep outputting 4, and therefore the gradient is zero: there is no effect. Of course, if we were to change \\(y\\) by a large amount (e.g. larger than 2), then the value of \\(f\\) would change, but the derivatives tell us nothing about the effect of such large changes on the inputs of a function; They are only informative for tiny, infinitesimally small changes on the inputs, as indicated by the \\(\lim_{h \rightarrow 0}\\) in its definition.


<a name='backprop'></a>

### Compound expressions with chain rule

Lets now start to consider more complicated expressions that involve multiple composed functions, such as \\(f(x,y,z) = (x + y) z\\). This expression is still simple enough to differentiate directly, but we'll take a particular approach to it that will be helpful with understanding the intuition behind backpropagation. In particular, note that this expression can be broken down into two expressions: \\(q = x + y\\) and \\(f = q z\\). Moreover, we know how to compute the derivatives of both expressions separately, as seen in the previous section. \\(f\\) is just multiplication of \\(q\\) and \\(z\\), so \\(\frac{\partial f}{\partial q} = z, \frac{\partial f}{\partial z} = q\\), and \\(q\\) is addition of \\(x\\) and \\(y\\) so \\( \frac{\partial q}{\partial x} = 1, \frac{\partial q}{\partial y} = 1 \\). However, we don't necessarily care about the gradient on the intermediate value \\(q\\) - the value of \\(\frac{\partial f}{\partial q}\\) is not useful. Instead, we are ultimately interested in the gradient of \\(f\\) with respect to its inputs \\(x,y,z\\). The **chain rule** tells us that the correct way to "chain" these gradient expressions together is through multiplication. For example, \\(\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial x} \\). In practice this is simply a multiplication of the two numbers that hold the two gradients. Lets see this with an example:

```python
# set some inputs
x = -2; y = 5; z = -4

# perform the forward pass
q = x + y # q becomes 3
f = q * z # f becomes -12

# perform the backward pass (backpropagation) in reverse order:
# first backprop through f = q * z
dfdz = q # df/dz = q, so gradient on z becomes 3
dfdq = z # df/dq = z, so gradient on q becomes -4
dqdx = 1.0
dqdy = 1.0
# now backprop through q = x + y
dfdx = dfdq * dqdx  # The multiplication here is the chain rule!
dfdy = dfdq * dqdy  
```

We are left with the gradient in the variables `[dfdx,dfdy,dfdz]`, which tell us the sensitivity of the variables `x,y,z` on `f`!. This is the simplest example of backpropagation. Going forward, we will use a more concise notation that omits the `df` prefix. For example, we will simply write `dq` instead of `dfdq`, and always assume that the gradient is computed on the final output.

This computation can also be nicely visualized with a circuit diagram:

<div class="fig figleft fighighlight">
<svg style="max-width: 420px" viewbox="0 0 420 220"><defs><marker id="arrowhead" refX="6" refY="2" markerWidth="6" markerHeight="4" orient="auto"><path d="M 0,0 V 4 L6,2 Z"></path></marker></defs><line x1="40" y1="30" x2="110" y2="30" stroke="black" stroke-width="1"></line><text x="45" y="24" font-size="16" fill="green">-2</text><text x="45" y="47" font-size="16" fill="red">-4</text><text x="35" y="24" font-size="16" text-anchor="end" fill="black">x</text><line x1="40" y1="100" x2="110" y2="100" stroke="black" stroke-width="1"></line><text x="45" y="94" font-size="16" fill="green">5</text><text x="45" y="117" font-size="16" fill="red">-4</text><text x="35" y="94" font-size="16" text-anchor="end" fill="black">y</text><line x1="40" y1="170" x2="110" y2="170" stroke="black" stroke-width="1"></line><text x="45" y="164" font-size="16" fill="green">-4</text><text x="45" y="187" font-size="16" fill="red">3</text><text x="35" y="164" font-size="16" text-anchor="end" fill="black">z</text><line x1="210" y1="65" x2="280" y2="65" stroke="black" stroke-width="1"></line><text x="215" y="59" font-size="16" fill="green">3</text><text x="215" y="82" font-size="16" fill="red">-4</text><text x="205" y="59" font-size="16" text-anchor="end" fill="black">q</text><circle cx="170" cy="65" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="170" y="70" font-size="20" fill="black" text-anchor="middle">+</text><line x1="110" y1="30" x2="150" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="110" y1="100" x2="150" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="190" y1="65" x2="210" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="380" y1="117" x2="450" y2="117" stroke="black" stroke-width="1"></line><text x="385" y="111" font-size="16" fill="green">-12</text><text x="385" y="134" font-size="16" fill="red">1</text><text x="375" y="111" font-size="16" text-anchor="end" fill="black">f</text><circle cx="340" cy="117" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="340" y="127" font-size="20" fill="black" text-anchor="middle">*</text><line x1="280" y1="65" x2="320" y2="117" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="110" y1="170" x2="320" y2="117" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="360" y1="117" x2="380" y2="117" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line></svg>

<div class="figcaption">
  The real-valued <i>"circuit"</i> on left shows the visual representation of the computation. The <b>forward pass</b> computes values from inputs to output (shown in green). The <b>backward pass</b> then performs backpropagation which starts at the end and recursively applies the chain rule to compute the gradients (shown in red) all the way to the inputs of the circuit. The gradients can be thought of as flowing backwards through the circuit.
</div>
<div style="clear:both;"></div>
</div>

<a name='intuitive'></a>

### Intuitive understanding of backpropagation

Notice that backpropagation is a beautifully local process. Every gate in a circuit diagram gets some inputs and can right away compute two things: 1. its output value and 2. the *local* gradient of its output with respect to its inputs. Notice that the gates can do this completely independently without being aware of any of the details of the full circuit that they are embedded in. However, once the forward pass is over, during backpropagation the gate will eventually learn about the gradient of its output value on the final output of the entire circuit. Chain rule says that the gate should take that gradient and multiply it into every gradient it normally computes for all of its inputs.

> This extra multiplication (for each input) due to the chain rule can turn a single and relatively useless gate into a cog in a complex circuit such as an entire neural network.

Lets get an intuition for how this works by referring again to the example. The add gate received inputs [-2, 5] and computed output 3. Since the gate is computing the addition operation, its local gradient for both of its inputs is +1. The rest of the circuit computed the final value, which is -12. During the backward pass in which the chain rule is applied recursively backwards through the circuit, the add gate (which is an input to the multiply gate) learns that the gradient for its output was -4. If we anthropomorphize the circuit as wanting to output a higher value (which can help with intuition), then we can think of the circuit as "wanting" the output of the add gate to be lower (due to negative sign), and with a *force* of 4. To continue the recurrence and to chain the gradient, the add gate takes that gradient and multiplies it to all of the local gradients for its inputs (making the gradient on both **x** and **y** 1 * -4 = -4). Notice that this has the desired effect: If **x,y** were to decrease (responding to their negative gradient) then the add gate's output would decrease, which in turn makes the multiply gate's output increase.

Backpropagation can thus be thought of as gates communicating to each other (through the gradient signal) whether they want their outputs to increase or decrease (and how strongly), so as to make the final output value higher.

<a name='sigmoid'></a>

### Modularity: Sigmoid example

The gates we introduced above are relatively arbitrary. Any kind of differentiable function can act as a gate, and we can group multiple gates into a single gate, or decompose a function into multiple gates whenever it is convenient. Lets look at another expression that illustrates this point:

$$
f(w,x) = \frac{1}{1+e^{-(w_0x_0 + w_1x_1 + w_2)}}
$$

as we will see later in the class, this expression describes a 2-dimensional neuron (with inputs **x** and weights **w**) that uses the *sigmoid activation* function. But for now lets think of this very simply as just a function from inputs *w,x* to a single number. The function is made up of multiple gates. In addition to the ones described already above (add, mul, max), there are four more:

$$
f(x) = \frac{1}{x} 
\hspace{1in} \rightarrow \hspace{1in} 
\frac{df}{dx} = -1/x^2 
\\\\
f_c(x) = c + x
\hspace{1in} \rightarrow \hspace{1in} 
\frac{df}{dx} = 1 
\\\\
f(x) = e^x
\hspace{1in} \rightarrow \hspace{1in} 
\frac{df}{dx} = e^x
\\\\
f_a(x) = ax
\hspace{1in} \rightarrow \hspace{1in} 
\frac{df}{dx} = a
$$

Where the functions \\(f_c, f_a\\) translate the input by a constant of \\(c\\) and scale the input by a constant of \\(a\\), respectively. These are technically special cases of addition and multiplication, but we introduce them as (new) unary gates here since we do not need the gradients for the constants \\(c,a\\). The full circuit then looks as follows:

<div class="fig figleft fighighlight">
<svg style="max-width: 799px" viewbox="0 0 799 306"><g transform="scale(0.8)"><defs><marker id="arrowhead" refX="6" refY="2" markerWidth="6" markerHeight="4" orient="auto"><path d="M 0,0 V 4 L6,2 Z"></path></marker></defs><line x1="50" y1="30" x2="90" y2="30" stroke="black" stroke-width="1"></line><text x="55" y="24" font-size="16" fill="green">2.00</text><text x="55" y="47" font-size="16" fill="red">-0.20</text><text x="45" y="24" font-size="16" text-anchor="end" fill="black">w0</text><line x1="50" y1="100" x2="90" y2="100" stroke="black" stroke-width="1"></line><text x="55" y="94" font-size="16" fill="green">-1.00</text><text x="55" y="117" font-size="16" fill="red">0.39</text><text x="45" y="94" font-size="16" text-anchor="end" fill="black">x0</text><line x1="50" y1="170" x2="90" y2="170" stroke="black" stroke-width="1"></line><text x="55" y="164" font-size="16" fill="green">-3.00</text><text x="55" y="187" font-size="16" fill="red">-0.39</text><text x="45" y="164" font-size="16" text-anchor="end" fill="black">w1</text><line x1="50" y1="240" x2="90" y2="240" stroke="black" stroke-width="1"></line><text x="55" y="234" font-size="16" fill="green">-2.00</text><text x="55" y="257" font-size="16" fill="red">-0.59</text><text x="45" y="234" font-size="16" text-anchor="end" fill="black">x1</text><line x1="50" y1="310" x2="90" y2="310" stroke="black" stroke-width="1"></line><text x="55" y="304" font-size="16" fill="green">-3.00</text><text x="55" y="327" font-size="16" fill="red">0.20</text><text x="45" y="304" font-size="16" text-anchor="end" fill="black">w2</text><line x1="170" y1="65" x2="210" y2="65" stroke="black" stroke-width="1"></line><text x="175" y="59" font-size="16" fill="green">-2.00</text><text x="175" y="82" font-size="16" fill="red">0.20</text><circle cx="130" cy="65" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="130" y="75" font-size="20" fill="black" text-anchor="middle">*</text><line x1="90" y1="30" x2="110" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="90" y1="100" x2="110" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="150" y1="65" x2="170" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="170" y1="205" x2="210" y2="205" stroke="black" stroke-width="1"></line><text x="175" y="199" font-size="16" fill="green">6.00</text><text x="175" y="222" font-size="16" fill="red">0.20</text><circle cx="130" cy="205" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="130" y="215" font-size="20" fill="black" text-anchor="middle">*</text><line x1="90" y1="170" x2="110" y2="205" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="90" y1="240" x2="110" y2="205" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="150" y1="205" x2="170" y2="205" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="290" y1="135" x2="330" y2="135" stroke="black" stroke-width="1"></line><text x="295" y="129" font-size="16" fill="green">4.00</text><text x="295" y="152" font-size="16" fill="red">0.20</text><circle cx="250" cy="135" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="250" y="140" font-size="20" fill="black" text-anchor="middle">+</text><line x1="210" y1="65" x2="230" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="210" y1="205" x2="230" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="270" y1="135" x2="290" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="410" y1="222" x2="450" y2="222" stroke="black" stroke-width="1"></line><text x="415" y="216" font-size="16" fill="green">1.00</text><text x="415" y="239" font-size="16" fill="red">0.20</text><circle cx="370" cy="222" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="370" y="227" font-size="20" fill="black" text-anchor="middle">+</text><line x1="330" y1="135" x2="350" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="90" y1="310" x2="350" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="390" y1="222" x2="410" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="530" y1="222" x2="570" y2="222" stroke="black" stroke-width="1"></line><text x="535" y="216" font-size="16" fill="green">-1.00</text><text x="535" y="239" font-size="16" fill="red">-0.20</text><circle cx="490" cy="222" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="490" y="227" font-size="20" fill="black" text-anchor="middle">*-1</text><line x1="450" y1="222" x2="470" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="510" y1="222" x2="530" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="650" y1="222" x2="690" y2="222" stroke="black" stroke-width="1"></line><text x="655" y="216" font-size="16" fill="green">0.37</text><text x="655" y="239" font-size="16" fill="red">-0.53</text><circle cx="610" cy="222" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="610" y="227" font-size="20" fill="black" text-anchor="middle">exp</text><line x1="570" y1="222" x2="590" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="630" y1="222" x2="650" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="770" y1="222" x2="810" y2="222" stroke="black" stroke-width="1"></line><text x="775" y="216" font-size="16" fill="green">1.37</text><text x="775" y="239" font-size="16" fill="red">-0.53</text><circle cx="730" cy="222" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="730" y="227" font-size="20" fill="black" text-anchor="middle">+1</text><line x1="690" y1="222" x2="710" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="750" y1="222" x2="770" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="890" y1="222" x2="930" y2="222" stroke="black" stroke-width="1"></line><text x="895" y="216" font-size="16" fill="green">0.73</text><text x="895" y="239" font-size="16" fill="red">1.00</text><circle cx="850" cy="222" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="850" y="227" font-size="20" fill="black" text-anchor="middle">1/x</text><line x1="810" y1="222" x2="830" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="870" y1="222" x2="890" y2="222" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line></g></svg>
<div class="figcaption">
  Example circuit for a 2D neuron with a sigmoid activation function. The inputs are [x0,x1] and the (learnable) weights of the neuron are [w0,w1,w2]. As we will see later, the neuron computes a dot product with the input and then its activation is softly squashed by the sigmoid function to be in range from 0 to 1.
</div>
<div style="clear:both;"></div>
</div>

In the example above, we see a long chain of function applications that operates on the result of the dot product between **w,x**. The function that these operations implement is called the *sigmoid function* \\(\sigma(x)\\). It turns out that the derivative of the sigmoid function with respect to its input simplifies if you perform the derivation (after a fun tricky part where we add and subtract a 1 in the numerator):

$$
\sigma(x) = \frac{1}{1+e^{-x}} \\\\
\rightarrow \hspace{0.3in} \frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^2} = \left( \frac{1 + e^{-x} - 1}{1 + e^{-x}} \right) \left( \frac{1}{1+e^{-x}} \right) 
= \left( 1 - \sigma(x) \right) \sigma(x)
$$

As we see, the gradient turns out to simplify and becomes surprisingly simple. For example, the sigmoid expression receives the input 1.0 and computes the output 0.73 during the forward pass. The derivation above shows that the *local* gradient would simply be (1 - 0.73) * 0.73 ~= 0.2, as the circuit computed before (see the image above), except this way it would be done with a single, simple and efficient expression (and with less numerical issues). Therefore, in any real practical application it would be very useful to group these operations into a single gate. Lets see the backprop for this neuron in code:

```python
w = [2,-3,-3] # assume some random weights and data
x = [-1, -2]

# forward pass
dot = w[0]*x[0] + w[1]*x[1] + w[2]
f = 1.0 / (1 + math.exp(-dot)) # sigmoid function

# backward pass through the neuron (backpropagation)
ddot = (1 - f) * f # gradient on dot variable, using the sigmoid gradient derivation
dx = [w[0] * ddot, w[1] * ddot] # backprop into x
dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot] # backprop into w
# we're done! we have the gradients on the inputs to the circuit
```

**Implementation protip: staged backpropagation**. As shown in the code above, in practice it is always helpful to break down the forward pass into stages that are easily backpropped through. For example here we created an intermediate variable `dot` which holds the output of the dot product between `w` and `x`. During backward pass we then successively compute (in reverse order) the corresponding variables (e.g. `ddot`, and ultimately `dw, dx`) that hold the gradients of those variables.

The point of this section is that the details of how the backpropagation is performed, and which parts of the forward function we think of as gates, is a matter of convenience. It helps to be aware of which parts of the expression have easy local gradients, so that they can be chained together with the least amount of code and effort. 

<a name='staged'></a>

### Backprop in practice: Staged computation

Lets see this with another example. Suppose that we have a function of the form:

$$
f(x,y) = \frac{x + \sigma(y)}{\sigma(x) + (x+y)^2}
$$

To be clear, this function is completely useless and it's not clear why you would ever want to compute its gradient, except for the fact that it is a good example of backpropagation in practice. It is very important to stress that if you were to launch into performing the differentiation with respect to either \\(x\\) or \\(y\\), you would end up with very large and complex expressions. However, it turns out that doing so is completely unnecessary because we don't need to have an explicit function written down that evaluates the gradient. We only have to know how to compute it. Here is how we would structure the forward pass of such expression:

```python
x = 3 # example values
y = -4

# forward pass
sigy = 1.0 / (1 + math.exp(-y)) # sigmoid in numerator   #(1)
num = x + sigy # numerator                               #(2)
sigx = 1.0 / (1 + math.exp(-x)) # sigmoid in denominator #(3)
xpy = x + y                                              #(4)
xpysqr = xpy**2                                          #(5)
den = sigx + xpysqr # denominator                        #(6)
invden = 1.0 / den                                       #(7)
f = num * invden # done!                                 #(8)
```

Phew, by the end of the expression we have computed the forward pass. Notice that we have structured the code in such way that it contains multiple intermediate variables, each of which are only simple expressions for which we already know the local gradients. Therefore, computing the backprop pass is easy: We'll go backwards and for every variable along the way in the forward pass (`sigy, num, sigx, xpy, xpysqr, den, invden`) we will have the same variable, but one that begins with a `d`, which will hold the gradient of the output of the circuit with respect to that variable. Additionally, note that every single piece in our backprop will involve computing the local gradient of that expression, and chaining it with the gradient on that expression with a multiplication. For each row, we also highlight which part of the forward pass it refers to:

```python
# backprop f = num * invden
dnum = invden # gradient on numerator                             #(8)
dinvden = num                                                     #(8)
# backprop invden = 1.0 / den 
dden = (-1.0 / (den**2)) * dinvden                                #(7)
# backprop den = sigx + xpysqr
dsigx = (1) * dden                                                #(6)
dxpysqr = (1) * dden                                              #(6)
# backprop xpysqr = xpy**2
dxpy = (2 * xpy) * dxpysqr                                        #(5)
# backprop xpy = x + y
dx = (1) * dxpy                                                   #(4)
dy = (1) * dxpy                                                   #(4)
# backprop sigx = 1.0 / (1 + math.exp(-x))
dx += ((1 - sigx) * sigx) * dsigx # Notice += !! See notes below  #(3)
# backprop num = x + sigy
dx += (1) * dnum                                                  #(2)
dsigy = (1) * dnum                                                #(2)
# backprop sigy = 1.0 / (1 + math.exp(-y))
dy += ((1 - sigy) * sigy) * dsigy                                 #(1)
# done! phew
```

Notice a few things:

**Cache forward pass variables**. To compute the backward pass it is very helpful to have some of the variables that were used in the forward pass. In practice you want to structure your code so that you cache these variables, and so that they are available during backpropagation. If this is too difficult, it is possible (but wasteful) to recompute them.

**Gradients add up at forks**. The forward expression involves the variables **x,y** multiple times, so when we perform backpropagation we must be careful to use `+=` instead of `=` to accumulate the gradient on these variables (otherwise we would overwrite it). This follows the *multivariable chain rule* in Calculus, which states that if a variable branches out to different parts of the circuit, then the gradients that flow back to it will add.

<a name='patterns'></a>

### Patterns in backward flow

It is interesting to note that in many cases the backward-flowing gradient can be interpreted on an intuitive level. For example, the three most commonly used gates in neural networks (*add,mul,max*), all have very simple interpretations in terms of how they act during backpropagation. Consider this example circuit:

<div class="fig figleft fighighlight">
<svg style="max-width: 460px" viewbox="0 0 460 290"><g transform="scale(1)"><defs><marker id="arrowhead" refX="6" refY="2" markerWidth="6" markerHeight="4" orient="auto"><path d="M 0,0 V 4 L6,2 Z"></path></marker></defs><line x1="50" y1="30" x2="90" y2="30" stroke="black" stroke-width="1"></line><text x="55" y="24" font-size="16" fill="green">3.00</text><text x="55" y="47" font-size="16" fill="red">-8.00</text><text x="45" y="24" font-size="16" text-anchor="end" fill="black">x</text><line x1="50" y1="100" x2="90" y2="100" stroke="black" stroke-width="1"></line><text x="55" y="94" font-size="16" fill="green">-4.00</text><text x="55" y="117" font-size="16" fill="red">6.00</text><text x="45" y="94" font-size="16" text-anchor="end" fill="black">y</text><line x1="50" y1="170" x2="90" y2="170" stroke="black" stroke-width="1"></line><text x="55" y="164" font-size="16" fill="green">2.00</text><text x="55" y="187" font-size="16" fill="red">2.00</text><text x="45" y="164" font-size="16" text-anchor="end" fill="black">z</text><line x1="50" y1="240" x2="90" y2="240" stroke="black" stroke-width="1"></line><text x="55" y="234" font-size="16" fill="green">-1.00</text><text x="55" y="257" font-size="16" fill="red">0.00</text><text x="45" y="234" font-size="16" text-anchor="end" fill="black">w</text><line x1="170" y1="65" x2="210" y2="65" stroke="black" stroke-width="1"></line><text x="175" y="59" font-size="16" fill="green">-12.00</text><text x="175" y="82" font-size="16" fill="red">2.00</text><circle cx="130" cy="65" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="130" y="75" font-size="20" fill="black" text-anchor="middle">*</text><line x1="90" y1="30" x2="110" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="90" y1="100" x2="110" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="150" y1="65" x2="170" y2="65" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="170" y1="205" x2="210" y2="205" stroke="black" stroke-width="1"></line><text x="175" y="199" font-size="16" fill="green">2.00</text><text x="175" y="222" font-size="16" fill="red">2.00</text><circle cx="130" cy="205" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="130" y="210" font-size="20" fill="black" text-anchor="middle">max</text><line x1="90" y1="170" x2="110" y2="205" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="90" y1="240" x2="110" y2="205" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="150" y1="205" x2="170" y2="205" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="290" y1="135" x2="330" y2="135" stroke="black" stroke-width="1"></line><text x="295" y="129" font-size="16" fill="green">-10.00</text><text x="295" y="152" font-size="16" fill="red">2.00</text><circle cx="250" cy="135" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="250" y="140" font-size="20" fill="black" text-anchor="middle">+</text><line x1="210" y1="65" x2="230" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="210" y1="205" x2="230" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="270" y1="135" x2="290" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="410" y1="135" x2="450" y2="135" stroke="black" stroke-width="1"></line><text x="415" y="129" font-size="16" fill="green">-20.00</text><text x="415" y="152" font-size="16" fill="red">1.00</text><circle cx="370" cy="135" fill="white" stroke="black" stroke-width="1" r="20"></circle><text x="370" y="140" font-size="20" fill="black" text-anchor="middle">*2</text><line x1="330" y1="135" x2="350" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line><line x1="390" y1="135" x2="410" y2="135" stroke="black" stroke-width="1" marker-end="url(#arrowhead)"></line></g></svg>
<div class="figcaption">
  An example circuit demonstrating the intuition behind the operations that backpropagation performs during the backward pass in order to compute the gradients on the inputs. Sum operation distributes gradients equally to all its inputs. Max operation routes the gradient to the higher input. Multiply gate takes the input activations, swaps them and multiplies by its gradient.
</div>
<div style="clear:both;"></div>
</div>

Looking at the diagram above as an example, we can see that:

The **add gate** always takes the gradient on its output and distributes it equally to all of its inputs, regardless of what their values were during the forward pass. This follows from the fact that the local gradient for the add operation is simply +1.0, so the gradients on all inputs will exactly equal the gradients on the output because it will be multiplied by x1.0 (and remain unchanged). In the example circuit above, note that the + gate routed the gradient of 2.00 to both of its inputs, equally and unchanged.

The **max gate** routes the gradient. Unlike the add gate which distributed the gradient unchanged to all its inputs, the max gate distributes the gradient (unchanged) to exactly one of its inputs (the input that had the highest value during the forward pass). This is because the local gradient for a max gate is 1.0 for the highest value, and 0.0 for all other values. In the example circuit above, the max operation routed the gradient of 2.00 to the **z** variable, which had a higher value than **w**, and the gradient on **w** remains zero.

The **multiply gate** is a little less easy to interpret. Its local gradients are the input values (except switched), and this is multiplied by the gradient on its output during the chain rule. In the example above, the gradient on **x** is -8.00, which is -4.00 x 2.00. 

*Unintuitive effects and their consequences*. Notice that if one of the inputs to the multiply gate is very small and the other is very big, then the multiply gate will do something slightly unintuitive: it will assign a relatively huge gradient to the small input and a tiny gradient to the large input. Note that in linear classifiers where the weights are dot producted \\(w^Tx_i\\) (multiplied) with the inputs, this implies that the scale of the data has an effect on the magnitude of the gradient for the weights. For example, if you multiplied all input data examples \\(x_i\\) by 1000 during preprocessing, then the gradient on the weights will be 1000 times larger, and you'd have to lower the learning rate by that factor to compensate. This is why preprocessing matters a lot, sometimes in subtle ways! And having intuitive understanding for how the gradients flow can help you debug some of these cases.

<a name='mat'></a>

### Gradients for vectorized operations

The above sections were concerned with single variables, but all concepts extend in a straight-forward manner to matrix and vector operations. However, one must pay closer attention to dimensions and transpose operations.

**Matrix-Matrix multiply gradient**. Possibly the most tricky operation is the matrix-matrix multiplication (which generalizes all matrix-vector and vector-vector) multiply operations:

```python
# forward pass
W = np.random.randn(5, 10)
X = np.random.randn(10, 3)
D = W.dot(X)

# now suppose we had the gradient on D from above in the circuit
dD = np.random.randn(*D.shape) # same shape as D
dW = dD.dot(X.T) #.T gives the transpose of the matrix
dX = W.T.dot(dD)
```

*Tip: use dimension analysis!* Note that you do not need to remember the expressions for `dW` and `dX`  because they are easy to re-derive based on dimensions. For instance, we know that the gradient on the weights `dW` must be of the same size as `W` after it is computed, and that it must depend on matrix multiplication of `X` and `dD` (as is the case when both `X,W` are single numbers and not matrices). There is always exactly one way of achieving this so that the dimensions work out. For example, `X` is of size [10 x 3] and `dD` of size [5 x 3], so if we want `dW` and `W` has shape [5 x 10], then the only way of achieving this is with `dD.dot(X.T)`, as shown above.

**Work with small, explicit examples**. Some people may find it difficult at first to derive the gradient updates for some vectorized expressions. Our recommendation is to explicitly write out a minimal vectorized example, derive the gradient on paper and then generalize the pattern to its efficient, vectorized form.

Erik Learned-Miller has also written up a longer related document on taking matrix/vector derivatives which you might find helpful. [Find it here](http://cs231n.stanford.edu/vecDerivs.pdf).

<a name='summary'></a>

### Summary

- We developed intuition for what the gradients mean, how they flow backwards in the circuit, and how they communicate which part of the circuit should increase or decrease and with what force to make the final output higher.
- We discussed the importance of **staged computation** for practical implementations of backpropagation. You always want to break up your function into modules for which you can easily derive local gradients, and then chain them with chain rule. Crucially, you almost never want to write out these expressions on paper and differentiate them symbolically in full, because you never need an explicit mathematical equation for the gradient of the input variables. Hence, decompose your expressions into stages such that you can differentiate every stage independently (the stages will be matrix vector multiplies, or max operations, or sum operations, etc.) and then backprop through the variables one step at a time.

In the next section we will start to define neural networks, and backpropagation will allow us to efficiently compute the gradient of a loss function with respect to its parameters. In other words, we're now ready to train neural nets, and the most conceptually difficult part of this class is behind us! ConvNets will then be a small step away.


### References

- [Automatic differentiation in machine learning: a survey](http://arxiv.org/abs/1502.05767)


================================================
FILE: overview.md
================================================
---
layout: page
title: Overview of Computer Vision and Visual Recognition
permalink: /overview/
---

Our introductory lecture covered the history of Computer Vision and many of its current applications, to help you understand the context in which this class is offered. See the [slides](http://vision.stanford.edu/teaching/cs231n/slides/lecture1.pdf) for details.


================================================
FILE: pixelrnn.md
================================================
PixelRNN
========

We now give a brief overview of PixelRNN. PixelRNNs belongs to a family
of explicit density models called **fully visible belief networks
(FVBN)**. We can represent our model with the following equation:
$$p(x) = p(x_1, x_2, \dots, x_n),$$ where the left hand side $p(x)$
represents the likelihood of an entire image $x$, and the right hand
side represents the joint likelihood of each pixel in the image. Using
the Chain Rule, we can decompose this likelihood into a product of
1-dimensional distributions:
$$p(x) = \prod_{i = 1}^n p(x_i \mid x_1, \dots, x_{i - 1}).$$ Maximizing
the likelihood of training data, we obtain our models PixelRNN.

Introduction
============

PixelRNN, first introduced in van der Oord et al. 2016, uses an RNN-like
structure, modeling the pixels one-by-one, to maximize the likelihood
function given above. One of the more difficult tasks in generative
modeling is to create a model that is tractable, and PixelRNN seeks to
address that. It does so by tractably modeling a joint distribution of
the pixels in the image, casting it as a product of conditional
distributions. The factorization turns the joint modeling problem into
one that relates to sequences, i.e., we have to predict the next pixel
given all the previously generated pixels. Thus, we use Recurrent Neural
Networks for this tasks as they learn sequentially. Those same
principles apply here; more precisely, we generate image pixels starting
from the top left corner, and we model each pixel’s dependency on
previous pixels using an RNN (LSTM).

<div class="fig figcenter fighighlight">
  <img src="/assets/pixelrnn.png">
</div>

Specifically, the PixelRNN framework is made up of twelve
two-dimensional LSTM layers, with convolutions applied to each dimension
of the data. There are two types of layers here. One is the Row LSTM
layer where the convolution is applied along each row. The second type
is the Diagonal BiLSTM layer where the convolution is applied to the
diagonals of the image. In addition, the pixel values are modeled as
discrete values using a multinomial distribution implemented with a
softmax layer. This is in contrast to many previous approaches, which
model pixels as continuous values.

Model
=====

The approach of the PixelRNN is as follows. The RNN scans the each
individual pixel, going row-wise, predicting the conditional
distribution over the possible pixel values given what context the
network has. As mentioned before, PixelRNN uses a two-dimensional LSTM
network which begins scanning at the top left of the image and makesits
way to the bottom right. One of the reasons an LSTM is used is that it
can better capture some longer range dependencies between pixels - this
is essential for understanding image composition. The reason a
two-dimensional structure is used is to ensure that the signals
propagate in the left-to-right and top-to-bottom directions well.\

The input image to the network is represented by a 1D vector of pixel
values $\{x_1,..., x_{n^2}\}$ for an $n$-by-$n$ sized image, where
$\{x_1,..., x_{n}\}$ represents the pixels from the first row. Our goal
is to use these pixel values to find a probability distribution $p(X)$
for each image $X$. We define this probability as:
$$p(x) = \prod_{i = 1}^{n^2} p(x_i \mid x_1, \dots, x_{i - 1}).$$

This is the product of the conditional distributions across all the
pixels in the image - for pixel $x_i$, we have
$p(x_i \mid x_1, \dots, x_{i - 1})$. In turn, each of these conditional
distributions is determined by three values, associated with each of the
color channels present in the image (red, green and blue). In other
words:

$$p(x_i \mid x_1, \dots, x_{i - 1}) =  p(x_{i,R} \mid \textbf{x}_{<i}) \cdot p(x_{i,G} \mid \textbf{x}_{<i}, x_{i,R}) \cdot p(x_{i,B} \mid \textbf{x}_{<i}, x_{i,R}, x_{i,G}).$$

In the next section we will see how these distributions are calculated
and used within the Recurrent Neural Network framework proposed in
PixelRNN.

Architecture
------------

As we have seen, there are two distinct components to the
“two-dimensional” LSTM, the Row LSTM and the Diagonal BiLSTM. Figure 2
illustrates how each of these two LSTMs operates, when applied to an RGB
image.

<div class="fig figcenter fighighlight">
  <img src="/assets/Screen Shot 2021-06-15 at 9.41.08 AM.png">
</div>

**Row LSTM** is a unidirectional layer that processes the image row by
row from top to bottom computing features for a whole row at once using
a 1D convolution. As we can see in the image above, the Row LSTM
captures a triangle-shaped context for a given pixel. An LSTM layer has
an input-to-state component and a recurrent state-to-state component
that together determine the four gates inside the LSTM core. In the Row
LSTM, the input-to-state component is computed for the whole
two-dimensional input map with a one-dimensional convolution, row-wise.
The output of the convolution is a 4h × n × n tensor, where the first
dimension represents the four gate vectors for each position in the
input map (h here is the number of output feature maps). Below are the
computations for this state-to-state component, using the previous
hidden state ($h_{i-1}$) and previous cell state ($c_{i-1}$).
$$[o_i, f_i, i_i, g_i]  = \sigma(\textbf{K}^{ss} \circledast h_{i-1} + \textbf{K}^{is} \circledast  \textbf{x}_{i})$$
$$c_i  = f_i \odot c_{i-1} + i_i \odot g_i$$
$$h_i  = o_i \odot \tanh(c_{i})$$

Here, $\textbf{x}_i$ is the row of the input representation and
$\textbf{K}^{ss}$, $\textbf{K}^{is}$ are the kernel weights for
state-to-state and input-to-state respectively. $g_i, o_i, f_i$ and
$i_i$ are the content, output, forget and input gates. $\sigma$
represents the activation function (tanh activation for the content
gate, and sigmoid for the rest of the gates).\

**Diagonal BiLSTM** The Diagonal BiLSTM is able to capture the entire
image context by scanning along both diagonals of the image, for each
direction of the LSTM. We first compute the input-to-state and
state-to-state components of the layer. For each of the directions, the
input-to-state component is simply a 1×1 convolution $K^{is}$,
generating a $4h × n × n$ tensor (Here again the dimension represents
the four gate vectors for each position in the input map where h is the
number of output feature maps). The state-to-state is calculated using
the $K^{ss}$ that has a kernel of size 2 × 1. This step takes the
previous hidden and cell states, combines the contribution of the
input-to-state component and produces the next hidden and cell states,
as explained in the equations for Row LSTM above. We repeat this process
for each of the two directions.

Performance
===========

When originally presented, the PixelRNN model’s performance was tested
on some of the most prominent datasets in the computer vision space -
ImageNet and CIFAR-10. The results in some cases were state-of-the-art.
On the ImageNet data set, achieved an NLL score of 3.86 and 3.63 on the
the 32x32 and 64x64 image sizes respectively. On CiFAR-10, it achievied
a NLL score of 3.00, which was state-of-the-art at the time of
publication.

References
==========

1) CS231n Lecture 11 'Generative Modeling'
2) Pixel Recurrent Neural Networks (Oord et. al.) 2016


================================================
FILE: poster-2018.md
================================================
---
layout: page
title: CS231n Poster Session 2018
permalink: /poster-session-2018/
---

<div class='fig figcenter fighighlight'>
  <img src='/assets/huang.jpg'>
</div>

* **Location**: [Jen-Hsun Huang Engineering Center](https://www.google.com/maps/place/Jen-Hsun+Huang+Engineering+Center/@37.4277284,-122.1763852,17z/data=!3m1!4b1!4m5!3m4!1s0x808fbb2ad1efaf1d:0xe4be58a43178043f!8m2!3d37.4277284!4d-122.1741965)
* **Parking: Parking information can be found [here](https://lbre.stanford.edu/sites/default/files/parking_and_circulation_map_0.pdf "title")**
* **Sponsor Setup Time**: 11:15am - 12:00pm
* **Poster Session**: 12:00pm - 3:15pm
* **Award Ceremony**: 3:15pm - 3:30pm


The 2018 Stanford CS231N poster session will showcase projects in Convolutional Neural Networks for Visual Recognition that students have worked on over the past quarter. This year, 650 students will be presenting over 300 projects. 
The topics range from Generative Adversarial Networks (GANs), healthcare and medical imaging, art and style transfer, satellite imaging, self-driving cars, video understanding and more! See the list below for the projects that will be presented.

Catered food and refreshments will be made available over the course of the event.

This poster session has been made possible thanks to generous donations from Waymo, Bloomberg, Zoox, and NVIDIA!

### List of Projects (3 digit number is ID to locate the poster) ###

* **101**: Enhance! Image Super-Resolution
* **102**: Super Resolution using CNNs and GANS
* **103**: Modern Approaches for Real-Time Neural Style Transfer
* **104**: A convolutional classification approach to colorization
* **105**: Face swapping and harmonization using neural nets
* **106**: Defendr
* **107**: Automatic Graphic Generation from Text using GANs for Artists and Engineers
* **108**: Generating Novel Images of Landmarks with WGANs
* **109**: Generating Artwork with GANs
* **110**: Audio2GIF
* **111**: End-to-End Super Resolution Object Detection Networks
* **112**: Conditional Face Generation with Deep Convolutional Generative Adversarial Networks
* **113**: Learning a Meta-Discriminator for GANs
* **115**: Exploring Adversarial Input Spaces for Convolutional Neural Network Defense
* **118**: Contour-Aware Image-to-Image Translation
* **119**: Human Portrait Colorization
* **120**: Adversarial Image Segmentation 
* **121**: Artwork Classification and Style Transfer
* **122**: Automatic Colorization through Transfer Learning
* **123**: CNN models for classifying emotions in art images
* **124**: Deep Learning for Image Completion
* **125**: Computer Generated Art 
* **127**: Filter discriminators: adversarial loss for convolutional kernels
* **128**: Anime Character Generation with GAN
* **129**: Modifying the AttnGAN Discriminator for Improved Feature-Specific Text-to-Image Synthesis
* **130**: Data Genie 
* **131**: Fully Unsupervised Aerial-to-Map Translation with Cycle-GAN
* **132**: NeuralHash: An Adversarial Steganographic Method For Robust, Imperceptible Watermarking
* **133**: Mapping Decision Boundaries of Convolutional Neural Network Classifiers
* **134**: Distributionally Robust Adversarial Training
* **135**: Generating realistic morphologies of neurons in rodent hippocampus with DGCAN
* **136**: Text2Mesh using StackGAN
* **137**: Detection and removal of biases using adversarial architectures
* **138**: Cloud Removal in Hyperspectral Satellite Images using Generative Adversarial Networks
* **140**: In your face! GAN Face Generation
* **141**: Anonymity as a Service: Addressing Perceptual Biases by Anonymizing Facial Features
* **144**: Global an Local Neural Style Transfer
* **146**: End-to-End Neural Color Transfer of Subjects in Portrait Photography
* **147**: Text to Artistic Image Generation with GANs
* **148**: Be Creative: Style Transfer Mix & Match
* **149**: Determine the Origin of Historical Artifacts
* **150**: Style Transfer Incorporating Shape Deformability for Animation Genre Replication
* **200**: Dog Breed Identification
* **203**: Convolutional Neural Networks for Landmark Recognition
* **204**: World Landmark Recognition
* **205**: Distracted Driver Detection
* **206**: Deep Shopping: Object Detection Based Fashion Recommendation
* **208**: Capsule Networks on Modified Real-World Datasets
* **209**: Aircraft Type Recognition with Convolutional Neural Networks
* **210**: FloraDex: A Fine-Grained Flower Recognizer
* **212**: Handing Noisy Bounding-Box Annotations with Faster R-CNN
* **213**: A Convolutional Recurrent Attention Model for Fine-Grained Classification
* **215**: Probabilistic Convolutional Neural Network for Probabilistic Inference
* **217**: Optical Music Recognition based on Convolutional Neural Networks
* **218**: Fine Grain Image Classification on Dog Breeds
* **219**: TINE-CNN Augmentation: Automatic data augmentation for any image classification task
* **220**: Fashion Product Recognition in Fine-Grained Visual Categorization
* **222**: One-Shot Learning of Cosmetic Objects
* **223**: Exploring Movie Poster Classification and Generation with Deep Convolutional Neural Networks
* **224**: Classifying Electromagnetic Showers from Calorimeter Images with CNNs
* **228**: Deep Stereo Fusion: Deep Learning for Disparity Map Estimation and Image Fusion with Dual Camera Phone Imagery
* **229**: Automobile-Based Dual-Camera Object Segmentation
* **230**: Fine-grained Classification of Furniture and Home Goods Images
* **231**: Fine-grained Image Classification with Batch Contrastive Loss
* **232**: Don't Judge a Movie by its Poster
* **233**: Product multi-class classification using pretrained CNN models
* **234**: SIFT++
* **237**: Spatio-temporal large traffic network speed prediction using CNN
* **238**: Shazam for Fashion
* **239**: Composite Architecture for High Performance Image Classification
* **240**: Traffic monitoring from seismic data
* **241**: Image-based Merged Di-photon Identification for the ATLAS Experiment at the Large Hadron Collider
* **242**: DeepClean:  Deep Bayesian Restoration of Interferometric Images
* **243**: 3D GAN Object Generation and Reconstruction
* **244**: Image Segmentation for Autonomous Cars
* **245**: Multi-Task Facial Landmark Detection with CNN
* **247**: 3DNetView: Training Neural Networks to Perform 3D Image Reconstruction
* **248**: Architecture for Automatic Fashion Product Labeling
* **249**: Understanding Deforestation in the Amazon Basin with Neural Networks
* **250**: Gotta Train 'Em All: Classifying Pokémon using Deep Learning
* **252**: Movie Recommendation System Enhanced by Image Data and ConvNet
* **253**: Biomedical Image Segmentation for Nuclei Detection
* **254**: Self Attention Generative Adversarial Networks for High-Dimensional Scene Representations from Single 2D Images
* **255**: Learning to Detect Light Field Features
* **300**: Neural Techniques for Pose Guided Image Generation
* **301**: Conversational Group Detection With Deep Convolutional Networks
* **302**: Apparent Age Estimation From Facial Images
* **303**: American Sign Language Gesture Detection
* **306**: Human Body Pose Estimation
* **307**: Learning to Feel: Training a CNN to recognize emotion
* **308**: Monocular 3D Human Bounding Box Estimation
* **309**: Decrypting human emotions
* **310**: Semantic Segmentation(will update correct title)
* **311**: Predicting Human Emotions from Images and Captions
* **312**: Transformation Generalizability of Novel Objects in Human and Computational Vision
* **314**: Classification of Dance Styles Using CNNs
* **316**: Detecting Assaults in Surveillance Videos
* **317**: Video Hand Gesture Recognition
* **319**: Improving Affectnet: Emotion Classification
* **320**: Video Gesture Classification Using Combined RGB and Depth Features
* **321**: Predicting Hand Pose and Gesture from Monocular RGB Images
* **402**: Medical Image Super-Resolution using GANs
* **403**: 3D Reconstruction and Alignment of MRIs for Improved Medical Diagnostics
* **405**: Learning electrode probing strategy in retinal prosthesis systems
* **408**: PineappLeNet: PineappLeNet: Synthesizing Dynamic Contrast-Enhanced (DCE) Magnetic Resonance Data
* **409**: Classifying Dementia Ratings from MRI Imaging Data
* **413**: Protein Functional Site Detection Using Amino Acid Contact Maps
* **415**: Using Deep Learning for UIP Classification and Cyst Volume Calculation
* **417**: BoneNet: Convolutional Methods for Abnormality Detection in the Lower Extremities 
* **418**: Does it look good? Evaluating GANs for Medical Imaging Applications
* **419**: Detecting Unburnt Skin Using Deep Learning
* **423**: Heirarchical Semantic Segmentation of Brain Tumors in MRIs using Convolutional Neural Networks
* **424**: Cross-Institute Histopathology Image Stain Normalization with Deep Convolutional Neural Networks
* **426**: Automated Burn Prognosis of Percent Total Body Surface Area
* **429**: Neural Stain Normalization and Unsupervised Classification of Cell Nuclei in Histopathological Breast Cancer Images
* **430**: Segmentation of Stroke Lesion in T1-Weighted MRIs
* **431**: Applying Deep Learning to Chest X-Rays for Pneumonia Detection
* **432**: 3D Brain Tumor Segmentation
* **434**: Thoracic Imaging Temporal Interpolation
* **437**: Automated Iris Detection
* **438**: Using Transfer Learning to classify Brain Tumors from Pathology Images
* **440**: Preprocessing Histopathology Stains with Deep Learning
* **442**: U-Net Architecture for Segmenting Nuclei in Medical Images
* **444**: Diagnosing Retinal Pathology from Optical Coherence Tomography Using VGG19 and InceptionV3
* **445**: Peak finding for crystallography
* **446**: Reconstruction of multi-shot diffusion-weighted MRI using deep learning
* **450**: Super-resolution MRIs with Semi-Supervised GANs
* **451**: Classification and Segmentation of Brain Lesions after Stroke
* **452**: Neural Networks for the Identification of Viable Eggs in In-vitro Fertilization
* **471**: Dense Convolutional Networks for Abnormality Classification in Mammograms
* **455**: 3D BrainNet: Brain Tumor Segmentation with Convolutional Neural Networks and Fully Connected CRFs
* **456**: Lung Nodule Candidate Generation and Cancer Prediction
* **457**: Anonymizing MRI Images via U-Net Image Segmentation
* **458**: Tackling Diabetic Retinopathy with Visual Recognition
* **460**: Unpaired Magnetic Resonance Image-to-Image Translation for Prostate using Cycle-Consistent Adversarial Networks
* **461**: Rethinking Radiology: An Analysis of Different Approaches to BraTS
* **462**: Classification Techniques for White Blood Cell Types
* **470**: Segmentation of Stroke Lesions in T1-Weighted Brain MRIs using Deep Learning
* **500**: D4QN: Distributed Deep Reinforcement Learning in the Arcade Learning Environment
* **501**: Single Shot Robotic Grasp Pose Detection
* **502**: Occupancy Grid Mapping for Robust and Efficient Learning
* **504**: Robotic Grasping Planning using Convolutional Neural Networks
* **508**: RevNet: An End-to-End Model for training self-driving simulated vehicles 
* **509**: LiDAR & RGB Fusion for 3D Bounding Box Estimation
* **510**: Autonomous Driving Video Segmentation with Deep Learning
* **511**: Recent Tesla Model X Autopilot Accident Analysis and Possible Solution via  Traffic Sign  and Lan Detection
* **512**: Smaller is Better: Image Compression using Deep Learning
* **513**: Multi-Task Learning in Visual Attribute Recognition
* **514**: A look at the topology of convolutional neural networks
* **517**: Deep Learning on LIDAR  Point Cloud for 3D Object Detection
* **518**: Semantic Segmentation in the Traffic Environment
* **519**: Suction Affordance Maps for Grasping of Diverse Objects
* **521**: Comparing Deep Neuroevolution to RL algorithms on Atari and Mujoco Environments
* **524**: Using Human Gameplay to Augment Deep Q-Networks for Crypt of the NecroDancer
* **525**: Comparing Learning Algorithms with Pac-Man 
* **526**: Twist to Grasp: Rotational Regional Proposals for Grasp Detection
* **529**: Learning to Convolve
* **530**: Adversarial Examples for YOLO Object Detection
* **531**: High Fidelity Image Compression Using CNNs
* **533**: Video Object Segmentation for Autonomous Vehicles
* **534**: Object Detection in Autonomous Driving
* **535**: Improving 3D Data Resolution: Using CNNs to Upscale Resolution of LIDAR Data
* **536**: Semantic Segmentation on Autonomous Vehicle Data 
* **537**: Traffic Sign Detection and Classification using Capsule Network
* **539**: Using t-SNE to Visualize Network Dynamics
* **540**: Quantization Methodology for Efficient CNN Inference
* **600**: Identifying Modes of Fishing from Ship Images
* **601**: Enhancing Climate Data Resolution using Residual Networks
* **602**: Detection of Cryogenic tanks
* **604**: Tents Density Mapping of Refugee Camp via High-Resolution Remote Sensing Imagery
* **607**: Convolutional Neural Networks for Radargram Segmentation
* **609**: Power Plant Type Classification and Numerical Prediction Using Satellite Imagery Data
* **612**: Semi-Supervised Image Segmentation for Satellite Imagery
* **700**: DeepGIFs: Using Deep Learning to Understand and Synthesize Motion
* **701**: Conventional Video Object Segmentation using Mask R-CNN and Online Learning
* **702**: Moments in Time Challenge: Spatiotemporal information extraction from sequential video inputs using joint CNN-LSTM framework
* **703**: Automated Visual Weak Supervision for Object Recognition in Videos
* **704**: Using CNNs to Identify Player Actions in Basketball Videos
* **705**: Video Compression with 3D CNNs
* **706**: Action Recognition in the Moments In Time Dataset
* **707**: Convolutional Neural Networks for MLB Pitch Recognition
* **711**: Video Classification on the YouTube-8M Dataset
* **712**: A Multi-Faceted Approach to Video Action Classification
* **713**: Basketball Shot Quality Assessment with Deep Learning
* **714**: Moments in Time: Deep Action Recognition
* **800**: Spotting and Transcribing Structured Nutrition Information from Product Images
* **802**: A Novel Approach using Weak Supervision to Label Digital Screenshots\\ for Behavioral Analysis with Transfer Learning
* **803**: End-To-End Trainable Model for Text Recognition
* **804**: Converting Handwriting to Latex
* **807**: Layerwise Quantization for Neural Networks
* **808**: Image Retrieval and Image-Text Feature Alignment
* **809**: Emoji-Language Image Captioning
* **810**: Optimizing Reddit Posts
* **811**: Culinary Images' Feature-Extraction And Recipe Generation Using Deep Convolutional Neural Network
* **812**: Automatic Code Generation from UI Screenshots
* **813**: GUCCI GAN: Grouped Unique Contextually-Classified Inpainting GAN
* **814**: Adversarial Visual Question Answering
* **816**: Handwritten Mathematics to LaTeX Code
* **817**: Image captioning with attention
* **818**: Identifying and Solving CAPTCHAs with Deep Learning
* **819**: Mobile-Focused Networks for Classification of Chinese Text in the Wild
* **820**: End-to-end Dense Video Captioning with Reinforcement Learning
* **823**: Image Captioning using Policy Gradient Method
* **825**: Chinese Character Synthesization From Components
* **826**: Photo Quality Assessment with Deep CNNs
* **827**: Bridging 3D shape and natural language
* **828**: Neural Image Captioning on MSCOCO Dataset


================================================
FILE: poster.md
================================================
---
layout: page
title: CS231n Poster Session
permalink: /poster-session/
---

# 2017 Stanford CS231n Poster Session #
<div class='fig figcenter fighighlight'>
  <img src='/assets/bing.jpg'>
</div>

* **Location: [Bing Concert Hall](https://www.google.com/maps/place/Bing+Concert+Hall/@37.4320543,-122.1660855,15z/data=!4m5!3m4!1s0x0:0xf5f2c6a783cabf93!8m2!3d37.4320543!4d-122.1660855, "title")**
* **Parking: Parking information can be found [here](https://lbre.stanford.edu/sites/default/files/parking_and_circulation_map_0.pdf "title")**
* **Sponsor Setup Time: 11:15 am - 12:00 pm**
* **Poster Session: 12:00 pm - 2:45 pm**
* **Award Ceremony: 2:55 pm - 3:15 pm**

The 2017 Stanford CS231N poster session will showcase projects in Convolutional Neural Networks for Visual Recognition that students have worked on over the past quarter. This year, 750 students will be presenting over 350 projects. 
The topics range from Generative Adversarial Networks (GANs), healthcare and medical imaging, art and style transfer, satellite imaging, self-driving cars, video understanding and more! See the complete list of projects and poster session map below to find the location of a specific poster. We will be awarding 10+ awards to the top posters! The top prizes will be $500+ in value! Stanford affiliates (faculty, staff, students, alumni) and their guests are welcome to attend. Catered food and refreshments will be made available over the course of the event. This poster session is made possible through the generous support of Benchmark, Andreessen Horowitz, Nvidia and Apple! 

### Bing Map: See the Project IDs in the list below and find them on the map ###
<div class='fig figcenter fighighlight'>
  <img src='/assets/map.png'>
</div>


### List of Projects (3 Digit Number is ID to Locate the Poster) ###

* **101**		Invasive Species Detection									
* **102**		Scene Classification with Convolutional Neural Networks									
* **104**		Adaptive Regularization for Neural Networks									
* **105**		Image-based Product Recommendation System with Convolutional Neural Networks									
* **107**		Metric Learning for Clustering Images from Unknown Classes									
* **108**		Intra-Class and Inter-Class Feature Learning through Deep Metric Learning for Large Scale Image Retrieval									
* **110**		Neural Combinatorial Optimization for Solving Jigsaw Puzzles									
* **112**		Faster R-CNN with RoI Refinement									
* **113**		CNNs for Object Recognition in Surveillance									
* **114**		Greedy Layer-wise Training for Weakly-supervised Object Localization and Segmentation									
* **116**		Design And Analysis of a Hardware CNN Accelerator									
* **118**		XNOR-Net on FPGA									
* **119**		Convolutional Neural Network Classification of Functional Shoe Types									
* **121**		CHILDNet: Curiosity-driven Human-In-the-Loop Deep Network									
* **122**		Automated Smart TV UI Performance Testing with Visual Recognition									
* **125**		Labeling Images with thematic and emotional content									
* **126**		Classifying U.S. Houses by Architectural Style									
* **128**		Finding Protests in Social Media Data									
* **129**		Deep Visual Learning of Reddit Images									
* **130**		Fast Softmax Sampling for Deep Neural Networks									
* **131**		Prototypical one-shot learning using high-dimensional embeddings									
* **133**		Malicious Dropout									
* **134**		Compression and Acceleration of CNN Training and Evaluation									
* **135**		YOLO for real time object detection on mobile									
* **136**		Identifying Architectural Styles by Convolutional Neural Network									
* **200**		UAV Depth Perception from Visual Images using a Deep Convolutional Neural Network									
* **203**		Depth Estimation from Single Image Using Convolutional Neural Networks									
* **204**		Improving 3D Scene Reconstruction through Object Recognition and Segmentation									
* **205**		Depth-Enhanced Classification Network									
* **207**		Depth Regression from a Single Monocular Image using a Multi-Scale Deep Network									
* **209**		Understanding Physical Geometry of a Scene from Single Monocular Images									
* **210**		StereoPhonic: Depth From Stereo on Phones									
* **211**		Deep Video Interpolation									
* **212**		To Post or Not To Post: Using CNNs to Classify Social Media Worthy Images									
* **213**		Item Removal Detection for Retail Environments with Neural Networks									
* **214**		Extracting Kinematic Information Using Pose Estimation									
* **216**		Assessing Driver Distraction from Real-Time Video									
* **217**		Hand Gesture Recognition using 3D models and Convolutional Neural Network									
* **218**		Hand Gesture Recognition using Image Processing and Convolutional Neural Network									
* **219**		Mobile, Marker-less, 3D Body Pose Estimation									
* **220**		Deep 3D Human Key Point Estimation									
* **221**		Touchy Feely: Real-time Emotion Recognition									
* **222**		Face detection and recognition with YOLO									
* **223**		Reconstructing Obfuscated Human Faces									
* **224**		Recognizing facial expressions using deep learning									
* **225**		Detection of Hand Grasping Tasks for “Grab and Go” Groceries									
* **226**		Ava Makeup									
* **227**		Lip Reading Word Classification									
* **229**		Gaze Capture with CNNs									
* **230**		Deep Networks for Robust Depth Estimation with Single-Photon Sensors									
* **231**		Facial Emotion Recognition for Wild Images									
* **300**		Effectiveness of Style Transfer as a Data Augmentation Technique									
* **301**		Historical and Modern Image-to-Image Translation with Generative Adversarial Networks									
* **302**		Colorization using ConvNets and GAN									
* **304**		Automatic Manga Colorization with Hint									
* **305**		Generative Adversarial Networks for Custom Image Generation									
* **306**		Evaluation of Image Completion Algorithm: Deep Convolutional Generative Adversarial Nets vs. Exemplar-Based Inpainting									
* **307**		Autoencoders for Inverse Rendering									
* **308**		Adversarial Generator-Encoder Networks Based Visual Recommender System									
* **309**		Semi-supervised learning via adversarial training									
* **311**		Chinese Painting Generation Using Deep Convolutional Generative Adversarial Networks									
* **312**		Super Resolution on Video using GANs									
* **313**		Using Generative Models for Semi-Supervised Learning									
* **314**		Class-Conditional Super-resolution with GANs									
* **315**		FlowGAN									
* **316**		Generative Text to Image Synthesis with Applications to Reinforcement Learning in Minecraft									
* **317**		Frame Interpolation Using Generative Adversarial Networks									
* **318**		Can We Train Dogs and Humans Together: Hidden Distribution GANs									
* **319**		Multi-task Learning on Multi-Spectral Images Using GANs for Predicting Poverty									
* **320**		From 2D Sketch to 3D Shading									
* **322**		Art Generation from text with GANs									
* **323**		Re(live) Photos: Predicting Neighboring Frames with GANs									
* **324**		Multi-instances text-to-image generation with StackGAN									
* **325**		Plant Disease Detection with Deep Learning									
* **327**		Label-Free Object Detection in Video									
* **328**		Filling the Blanks: GANs vs RNNs									
* **329**		Towards Facial Reconstruction from Sparse Data: Utilizing Encoder-Decoder Networks with GANs to Infer Facial Rotations									
* **330**		maaGMA: Modified-Adversarial-Autoencoding Generator with Multiple Adversaries									
* **331**		visual search by brushing									
* **400**		Using Convolutional Neural Networks to Predict Completion Year of Fine Art Paintings									
* **401**		Multi-style Transfer: Generalizing Style Transfer to an Artist									
* **402**		Style Transfer of Images Incorporating Depth Perception and Object Segmentation									
* **403**		Automatic Sketch Colourization									
* **404**		Semantic Segmented Style Transfer									
* **405**		Mixed Style Transfer									
* **406**		Artist Identification with Convolutional Neural Networks									
* **407**		Learning Instance Normalization Parameters for Real-Time Arbitrary Style Transfer									
* **409**		Automatic image colorization using deep neural networks									
* **410**		Classifying Rjiksmuseum Paintings by Artist									
* **411**		From Renaissance to Pop: A Study of Artistic Eras using Deep CNNs									
* **412**		Style transfer between two photographs									
* **414**		ArtTalk: Labeling Images with Thematic and Emotional Content									
* **415**		Labeling Paintings with Thematic and Emotional Content									
* **416**		Localized Style Transfer Using Semantic Segmentation									
* **417**		Smooth In-Image Style Transitions									
* **418**		AutoColorisation									
* **419**		Judging Thematic Similarity in Paintings									
* **420**		Freehand Sketch Recognition									
* **421**		EXIF Estimation With Convolutional Neural Networks									
* **423**		Full Resolutional Video Compression Using Recurrent Convolutional Neural Networks									
* **424**		HiDDeN: Hiding Data with Deep Networks									
* **425**		Line Drawing Colorization									
* **426**		DeepSynth: Synthesizing A Musical Instrument With Video									
* **428**		Exploring Style Transfer									
* **500**		Mice behaviour analysis in open field test									
* **501**		Unsupervised Deep Learning with variational autoencoder for Interpretable Mammogram CBIR and Risk scoring									
* **502**		Brain Tumor Segmentation									
* **503**		Vision-Based Approach to Senior Healthcare: Depth-Based Activity Recognition with Convolutional Neural Networks									
* **505**		Privacy-Preserving Knowledge Transfer for Hand Hygiene Detection									
* **506**		Depth-Based Activity Recognition in ICUs Using Convolutional Neural Networks									
* **507**		BURNED: Efficient and Accurate Burn Prognosis Using Deep Learning									
* **511**		Differentiating Tumor Cells from Healthy Cells in a Tumor Biopsy using CNNs									
* **512**		Multimodal Brain MRI Tumor Segmentation via Convolutional Neural Networks									
* **513**		Accelerating dynamic magnetic resonance image reconstruction using deep convolutional neural networks									
* **514**		Predict DNA methylation states from whole slide images of brain tumors									
* **515**		Predicting Lung Cancer Incidence from CT Imagery									
* **516**		Automatic Neuronal Cell Classification in Calcium Imaging with Convolutional Neural Networks									
* **517**		GlimpseNet: Multi-Instance Generative Attention for Full Mammogram Diagnosis									
* **518**		Deep Convolutional Neural Networks for Lung Cancer Detection									
* **519**		Predicting therapeutic efficacy in Non Small Cell Lung Cancer									
* **520**		Breast Cancer Image Segmentation: Cancer Cell vs Stroma									
* **521**		MRI to MGMT: Predicting Drug Efficacy for Glioblastoma Patients									
* **523**		Prediction of Head and Neck Cancer Submolecular Types from Pathology Images									
* **524**		Automated Detection of Diabetic Retinopathy using Deep Learning									
* **525**		Patch-based Head and Neck Cancer Subtype Classification									
* **526**		Deep Neural Nets for Brain Tumor Segmentation									
* **527**		Deep Learning for Chest X-ray abnormality detection									
* **528**		Multiparametric MR Image Analysis for Prostate Cancer Assessment with Convolutional Neural Networks									
* **530**		MR Contrast Prediction Using Deep Learning									
* **531**		Brendan - A Deep Convolutional Network for Representing Latent Features of Protein-Ligand Binding Poses									
* **533**		Using Deep Learning for Segmentation of Microscopy Images									
* **534**		Quantum Annealing Assisted Deep Learning for Lung Cancer Detection									
* **536**		Early Stage Integrated Circuit Design Efficacy Prediction using Congestion and Cell Density Images									
* **537**		Geological scenario identification using seismic impedance data									
* **538**		Variational Autoencoders for Classical Ising Models									
* **539**		Convolutional Neural Networks for Pile Up Identification in ATLAS									
* **541**		Deep Learning application to proton radiography analysis									
* **542**		Wave-dynamics simulation using deep neural networks									
* **545**		RouteAI: A Convolutional Neural Network based Router for Integrated Circuits									
* **547**		Convolutional Neural Networks For Automated Surface-Wettability Characterization									
* **548**		Data Classification with Residual Network for Single Particle Imaging Experiments									
* **550**		Extraction of Building Footprints from Satellite Imagery									
* **551**		Predicting Land Use and Atmospheric Conditions from Amazon Rainforest Satellite Imagery									
* **552**		Temporal Poverty Prediction									
* **553**		UAV Road Mapping Using Convolutional Neural Nets									
* **554**		Motion Prediction from Trajectory and Top-Down Visual Data									
* **555**		Understanding Localized Remote Sensing-Based Crop Yield Prediction using Convolutional Neural Networks									
* **556**		Neighborhood Watch: Using CNNs to Predict Income Brackets from Google StreetView Images									
* **557**		CNNs for wide-area precipitation estimation from geostationary satellite imagery									
* **558**		Mapping Tidal Salt Marshes									
* **559**		Predicting Health using Satellite Images									
* **561**		Convolutional Neural Nets for Segmentation of Satellite Imagery									
* **563**		Discovering Scenic Roads Using Neural Networks									
* **564**		Rail Network Detection from Aerial Imagery using Deep Learning									
* **601**		AI Attack: Learning to Play a Video Game Using Visual Inputs									
* **602**		Deep Predictive Action Conditional Neural Network for Frame Prediction in Atari Games									
* **603**		Playing Go without Game Tree Search Using Convolutional Neural Networks									
* **604**		Learning a Visual State Representation for Generative Adversarial Imitation Learning									
* **605**		Playing Geometry Dash with Deep Reinforcement Learning									
* **606**		End-to-end models for task-oriented visual dialogue with reinforcement									
* **607**		Classifying grocery items images using Convolutional Neural Networks									
* **608**		Imitation Learning with THOR									
* **610**		Differentiable Neural Computer for Maze Navigation									
* **611**		Playing Atari Games with Deep Learning									
* **612**		Trajectory-Aware Visual Navigation in Indoor Scenes with Target-driven Deep Reinforcement Learning									
* **614**		Indoor Navigation with Imitation Learning									
* **616**		Deep Q-Learning with OpenAI Gym									
* **617**		A3C Methods for General Game Playing									
* **618**		Deep Reinforcement Learning using Memory-based Approaches									
* **619**		FoxNet: A Deep-Learning Agent for Nintendo’s Star Fox 64									
* **620**		Target-Driven Navigation with Imitation Learning									
* **621**		End-to-End Learning for Fighting Forest Fires (EELFFF)									
* **622**		Teach Me To Tango - Fidelity Estimation of Visual Odometry Systems for Robust Navigation									
* **623**		Indoor Target-driven Visual Navigation									
* **624**		NeuralKart: A Real-Time Mario Kart 64 AI									
* **625**		Understanding Driver’s Intention Using Convolutional Neural Networks									
* **626**		Self-Driving Car Steering Angle Prediction Based on Image Recognition									
* **627**		Object Detection and Its Implementation on Android Devices									
* **629**		Convolutional Architectures for Self-Driving Cars									
* **630**		Real-Time Multiple Object Tracking for Autonomous Cars									
* **631**		Single Stage Detector for Object Detection									
* **632**		Convolutional Neural Network Information Fusion based on Dempster-Shafer Theory for Urban Scene Understanding									
* **633**		Street View Instance Segmentation using R-CNN models									
* **700**		Detecting videos containing (child) pornography									
* **701**		Superflow: Frame Prediction with Convolutional Neural Network									
* **702**		YouTube8-M: Video Classification									
* **705**		YouTube-8M Video Classificatoin									
* **707**		YOLO-based Adaptive Window Two-stream Convolutional Neural Network for Video Classification									
* **708**		Spotlight: A Smart Video Highlight Generator									
* **709**		Video Understanding: From Video Classification to Video Captioning									
* **710**		Selecting Youtube Video Thumbnails via Convolutional Neural Networks									
* **711**		Google Cloud and YouTube-8M Video Understanding Challenge									
* **713**		Prediction of Personality First Impression with Deep Bimodal LSTM									
* **714**		Video frame Interpolation and extrapolation									
* **715**		Using Convolutional Neural Networks to Classify Noisy Sports Videos									
* **716**		Detecting Guns in Video Content									
* **717**		Soccer Stats with Computer Vision									
* **718**		Real-time Surveillance Video Analytics System									
* **800**		Using Graphical Programming Languages Designed for Novices to Train Neural Networks to Understand and Write Simple Programs									
* **801**		Siamese Convolutional Neural Networks for Authorship Verification									
* **804**		Deep Convolutional Neural Networks for Automatic Speech Recognition									
* **805**		CLEVR Advancements for High-Level Visual Reasoning Programs									
* **806**		Pix2Sketch									
* **807**		Testing Image Understanding through Question Answering									
* **808**		Real-time Object detection									
* **810**		Handwritten Text Recognition using Deep Learning									
* **811**		Bilinear Pooling and Co-Attention Inspired Models for Visual Question Answering									
* **812**		Applying NLP Deep Learning Ideas to Image Classification									
* **814**		You CAN Judge a Book by its Cover									
* **815**		Image to Latex									
* **816**		Combining RNN and CNN to Understand the Usefulness of Yelp Reviews									
* **817**		Where is Waldo, a deep-learning approach to general purpose template matching									
* **818**		An Automated Art Historian									
* **819**		Deep Bayesian Pragmatics for Image Captioning									
* **900**		Predicting Deforestation in the Amazon									
* **902**		Amazon Rainforest Satellite Image Labeling									
* **903**		Understanding the Amazon from Space									
* **904**		Understand Amazon Deforestation using Neural Network									
* **905**		Multi-label Classification on Satellite Images of the Amazon Rainforest									
* **906**		Deep Multi-Label Classification for High Resolution Satellite Imagery of Rainforests									
* **907**		Classification of natural landmarks and human footprint of Amazon using satellite data									
* **908**		Image Classification with Deep Learning									
* **909**		Understanding the Amazon Basin from Space									
* **911**		Deeprootz: Mapping the Amazon									
* **912**		Understanding the Amazon from Space									
* **913**		Understanding the Amazon from Space									
* **914**		Understanding the Amazon from Space									
* **915**		DeepRootz: Classifying satellite images of the Amazon rainforest									
* **916**		Trainforest									
* **917**		Land Cover Classification in the Amazon									
* **918**		Tracking Human Impact on the Amazon rainforest									
* **919**		Understanding the Amazon from Space									
* **920**		Guiding the management of cervical cancer with convolutional neural networks									
* **921**		Cervix Screening Classification for Cancer Preventive Treatment									
* **922**		Deep Learning Approaches for Determining Optimal Cervical Cancer Treatment									
* **923**		Cervix Type Classification Using Deep Learning and Image Classification									
* **924**		Deep Learning for Cervical Cancer									
* **925**		Identifying Cervix Types using Deep Convolutional Networks									
* **926**		Exploring Methods on Tiny ImageNet Problem									
* **927**		Wide Residual Network for the Tiny Image Net Challenge									
* **928**		The Power of Inception: Tackling the Tiny ImageNet Challenge									
* **929**		Unconstrained Handwriting Recognition									
* **930**		Tiny ImageNet Challenge									
* **931**		Deep Convolutional Neural Networks for ImageNet Classification									
* **933**		Overwhelming Tiny ImageNet: Bayesian Optimization on Residual Networks									
* **934**		A Dense Take on Inception									
* **935**		Tiny ImageNet Challenge									
* **937**		Classification on Tiny ImageNet									
* **938**		Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classiﬁcation Problems									
* **939**		Attention Networks for ImageNet Classification									
* **940**		Image Classification for Tiny ImageNet Challenge									
* **941**		Analyzing Deforestation in the Amazon									
		

================================================
FILE: python-colab.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "dzNng6vCL9eP"
   },
   "source": [
    "#CS231n Python Tutorial With Google Colab"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0vJLt3JRL9eR"
   },
   "source": [
    "This tutorial was originally written by [Justin Johnson](https://web.eecs.umich.edu/~justincj/) for cs231n. It was adapted as a Jupyter notebook for cs228 by [Volodymyr Kuleshov](http://web.stanford.edu/~kuleshov/) and [Isaac Caswell](https://symsys.stanford.edu/viewing/symsysaffiliate/21335).\n",
    "\n",
    "This version has been adapted for Colab by Kevin Zakka for the Spring 2020 edition of [cs231n](https://cs231n.github.io/). It runs Python3 by default."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "qVrTo-LhL9eS"
   },
   "source": [
    "##Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9t1gKp9PL9eV"
   },
   "source": [
    "Python is a great general-purpose programming language on its own, but with the help of a few popular libraries (numpy, scipy, matplotlib) it becomes a powerful environment for scientific computing.\n",
    "\n",
    "We expect that many of you will have some experience with Python and numpy; for the rest of you, this section will serve as a quick crash course both on the Python programming language and on the use of Python for scientific computing.\n",
    "\n",
    "Some of you may have previous knowledge in Matlab, in which case we also recommend the numpy for Matlab users page (https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-users.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U1PvreR9L9eW"
   },
   "source": [
    "In this tutorial, we will cover:\n",
    "\n",
    "* Basic Python: Basic data types (Containers, Lists, Dictionaries, Sets, Tuples), Functions, Classes\n",
    "* Numpy: Arrays, Array indexing, Datatypes, Array math, Broadcasting\n",
    "* Matplotlib: Plotting, Subplots, Images\n",
    "* IPython: Creating notebooks, Typical workflows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "nxvEkGXPM3Xh"
   },
   "source": [
    "## A Brief Note on Python Versions\n",
    "\n",
    "As of Janurary 1, 2020, Python has [officially dropped support](https://www.python.org/doc/sunset-python-2/) for `python2`. We'll be using Python 3.7 for this iteration of the course. You can check your Python version at the command line by running `python --version`. In Colab, we can enforce the Python version by clicking `Runtime -> Change Runtime Type` and selecting `python3`. Note that as of April 2020, Colab uses Python 3.6.9 which should run everything without any errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "1L4Am0QATgOc",
    "outputId": "bb5ee3ac-8683-44ab-e599-a2077510f327"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python 3.6.9\n"
     ]
    }
   ],
   "source": [
    "!python --version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "JAFKYgrpL9eY"
   },
   "source": [
    "##Basics of Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RbFS6tdgL9ea"
   },
   "source": [
    "Python is a high-level, dynamically typed multiparadigm programming language. Python code is often said to be almost like pseudocode, since it allows you to express very powerful ideas in very few lines of code while being very readable. As an example, here is an implementation of the classic quicksort algorithm in Python:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "cYb0pjh1L9eb",
    "outputId": "9a8e37de-1dc1-4092-faee-06ad4ff2d73a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1, 1, 2, 3, 6, 8, 10]\n"
     ]
    }
   ],
   "source": [
    "def quicksort(arr):\n",
    "    if len(arr) <= 1:\n",
    "        return arr\n",
    "    pivot = arr[len(arr) // 2]\n",
    "    left = [x for x in arr if x < pivot]\n",
    "    middle = [x for x in arr if x == pivot]\n",
    "    right = [x for x in arr if x > pivot]\n",
    "    return quicksort(left) + middle + quicksort(right)\n",
    "\n",
    "print(quicksort([3,6,8,10,1,2,1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "NwS_hu4xL9eo"
   },
   "source": [
    "###Basic data types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "DL5sMSZ9L9eq"
   },
   "source": [
    "####Numbers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "MGS0XEWoL9er"
   },
   "source": [
    "Integers and floats work as you would expect from other languages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "KheDr_zDL9es",
    "outputId": "1db9f4d3-2e0d-4008-f78a-161ed52c4359"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3 <class 'int'>\n",
      "ERROR! Session/line number was not unique in database. History logging moved to new session 60\n"
     ]
    }
   ],
   "source": [
    "x = 3\n",
    "print(x, type(x))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "sk_8DFcuL9ey",
    "outputId": "dd60a271-3457-465d-e16a-41acf12a56ab"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n",
      "2\n",
      "6\n",
      "9\n"
     ]
    }
   ],
   "source": [
    "print(x + 1)   # Addition\n",
    "print(x - 1)   # Subtraction\n",
    "print(x * 2)   # Multiplication\n",
    "print(x ** 2)  # Exponentiation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "U4Jl8K0tL9e4",
    "outputId": "07e3db14-3781-42b7-8ba6-042b3f9f72ba"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "9\n",
      "18\n"
     ]
    }
   ],
   "source": [
    "x += 1\n",
    "print(x)\n",
    "x *= 2\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "w-nZ0Sg_L9e9",
    "outputId": "3aa579f8-9540-46ef-935e-be887781ecb4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'float'>\n",
      "2.5 3.5 5.0 6.25\n"
     ]
    }
   ],
   "source": [
    "y = 2.5\n",
    "print(type(y))\n",
    "print(y, y + 1, y * 2, y ** 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "r2A9ApyaL9fB"
   },
   "source": [
    "Note that unlike many languages, Python does not have unary increment (x++) or decrement (x--) operators.\n",
    "\n",
    "Python also has built-in types for long integers and complex numbers; you can find all of the details in the [documentation](https://docs.python.org/3.7/library/stdtypes.html#numeric-types-int-float-long-complex)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "EqRS7qhBL9fC"
   },
   "source": [
    "####Booleans"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Nv_LIVOJL9fD"
   },
   "source": [
    "Python implements all of the usual operators for Boolean logic, but uses English words rather than symbols (`&&`, `||`, etc.):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "RvoImwgGL9fE",
    "outputId": "1517077b-edca-463f-857b-6a8c386cd387"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'bool'>\n"
     ]
    }
   ],
   "source": [
    "t, f = True, False\n",
    "print(type(t))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YQgmQfOgL9fI"
   },
   "source": [
    "Now we let's look at the operations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "6zYm7WzCL9fK",
    "outputId": "f3cebe76-5af4-473a-8127-88a1fd60560f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "False\n",
      "True\n",
      "False\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "print(t and f) # Logical AND;\n",
    "print(t or f)  # Logical OR;\n",
    "print(not t)   # Logical NOT;\n",
    "print(t != f)  # Logical XOR;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UQnQWFEyL9fP"
   },
   "source": [
    "####Strings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "AijEDtPFL9fP",
    "outputId": "2a6b0cd7-58f1-43cf-e6b7-bf940d532549"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello 5\n"
     ]
    }
   ],
   "source": [
    "hello = 'hello'   # String literals can use single quotes\n",
    "world = \"world\"   # or double quotes; it does not matter\n",
    "print(hello, len(hello))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "saDeaA7hL9fT",
    "outputId": "2837d0ab-9ae5-4053-d087-bfa0af81c344"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello world\n"
     ]
    }
   ],
   "source": [
    "hw = hello + ' ' + world  # String concatenation\n",
    "print(hw)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "Nji1_UjYL9fY",
    "outputId": "0149b0ca-425a-4a34-8e24-8dff7080922e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hello world 12\n"
     ]
    }
   ],
   "source": [
    "hw12 = '{} {} {}'.format(hello, world, 12)  # string formatting\n",
    "print(hw12)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "bUpl35bIL9fc"
   },
   "source": [
    "String objects have a bunch of useful methods; for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 121
    },
    "colab_type": "code",
    "id": "VOxGatlsL9fd",
    "outputId": "ab009df3-8643-4d3e-f85f-a813b70db9cb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello\n",
      "HELLO\n",
      "  hello\n",
      " hello \n",
      "he(ell)(ell)o\n",
      "world\n"
     ]
    }
   ],
   "source": [
    "s = \"hello\"\n",
    "print(s.capitalize())  # Capitalize a string\n",
    "print(s.upper())       # Convert a string to uppercase; prints \"HELLO\"\n",
    "print(s.rjust(7))      # Right-justify a string, padding with spaces\n",
    "print(s.center(7))     # Center a string, padding with spaces\n",
    "print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another\n",
    "print('  world '.strip())  # Strip leading and trailing whitespace"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "06cayXLtL9fi"
   },
   "source": [
    "You can find a list of all string methods in the [documentation](https://docs.python.org/3.7/library/stdtypes.html#string-methods)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "p-6hClFjL9fk"
   },
   "source": [
    "###Containers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FD9H18eQL9fk"
   },
   "source": [
    "Python includes several built-in container types: lists, dictionaries, sets, and tuples."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UsIWOe0LL9fn"
   },
   "source": [
    "####Lists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wzxX7rgWL9fn"
   },
   "source": [
    "A list is the Python equivalent of an array, but is resizeable and can contain elements of different types:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "hk3A8pPcL9fp",
    "outputId": "b545939a-580c-4356-db95-7ad3670b46e4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 1, 2] 2\n",
      "2\n"
     ]
    }
   ],
   "source": [
    "xs = [3, 1, 2]   # Create a list\n",
    "print(xs, xs[2])\n",
    "print(xs[-1])     # Negative indices count from the end of the list; prints \"2\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "YCjCy_0_L9ft",
    "outputId": "417c54ff-170b-4372-9099-0f756f8e48af"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 1, 'foo']\n"
     ]
    }
   ],
   "source": [
    "xs[2] = 'foo'    # Lists can contain elements of different types\n",
    "print(xs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "vJ0x5cF-L9fx",
    "outputId": "a97731a3-70e1-4553-d9e0-2aea227cac80"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3, 1, 'foo', 'bar']\n"
     ]
    }
   ],
   "source": [
    "xs.append('bar') # Add a new element to the end of the list\n",
    "print(xs)  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "cxVCNRTNL9f1",
    "outputId": "508fbe59-20aa-48b5-a1b2-f90363e7a104"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bar [3, 1, 'foo']\n"
     ]
    }
   ],
   "source": [
    "x = xs.pop()     # Remove and return the last element of the list\n",
    "print(x, xs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ilyoyO34L9f4"
   },
   "source": [
    "As usual, you can find all the gory details about lists in the [documentation](https://docs.python.org/3.7/tutorial/datastructures.html#more-on-lists)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ovahhxd_L9f5"
   },
   "source": [
    "####Slicing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YeSYKhv9L9f6"
   },
   "source": [
    "In addition to accessing list elements one at a time, Python provides concise syntax to access sublists; this is known as slicing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 139
    },
    "colab_type": "code",
    "id": "ninq666bL9f6",
    "outputId": "c3c2ed92-7358-4fdb-bbc0-e90f82e7e941"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 2, 3, 4]\n",
      "[2, 3]\n",
      "[2, 3, 4]\n",
      "[0, 1]\n",
      "[0, 1, 2, 3, 4]\n",
      "[0, 1, 2, 3]\n",
      "[0, 1, 8, 9, 4]\n"
     ]
    }
   ],
   "source": [
    "nums = list(range(5))    # range is a built-in function that creates a list of integers\n",
    "print(nums)         # Prints \"[0, 1, 2, 3, 4]\"\n",
    "print(nums[2:4])    # Get a slice from index 2 to 4 (exclusive); prints \"[2, 3]\"\n",
    "print(nums[2:])     # Get a slice from index 2 to the end; prints \"[2, 3, 4]\"\n",
    "print(nums[:2])     # Get a slice from the start to index 2 (exclusive); prints \"[0, 1]\"\n",
    "print(nums[:])      # Get a slice of the whole list; prints [\"0, 1, 2, 3, 4]\"\n",
    "print(nums[:-1])    # Slice indices can be negative; prints [\"0, 1, 2, 3]\"\n",
    "nums[2:4] = [8, 9] # Assign a new sublist to a slice\n",
    "print(nums)         # Prints \"[0, 1, 8, 9, 4]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "UONpMhF4L9f_"
   },
   "source": [
    "####Loops"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_DYz1j6QL9f_"
   },
   "source": [
    "You can loop over the elements of a list like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "4cCOysfWL9gA",
    "outputId": "560e46c7-279c-409a-838c-64bea8d321c4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cat\n",
      "dog\n",
      "monkey\n"
     ]
    }
   ],
   "source": [
    "animals = ['cat', 'dog', 'monkey']\n",
    "for animal in animals:\n",
    "    print(animal)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KxIaQs7pL9gE"
   },
   "source": [
    "If you want access to the index of each element within the body of a loop, use the built-in `enumerate` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "JjGnDluWL9gF",
    "outputId": "81421905-17ea-4c5a-bcc0-176de19fd9bd"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#1: cat\n",
      "#2: dog\n",
      "#3: monkey\n"
     ]
    }
   ],
   "source": [
    "animals = ['cat', 'dog', 'monkey']\n",
    "for idx, animal in enumerate(animals):\n",
    "    print('#{}: {}'.format(idx + 1, animal))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "arrLCcMyL9gK"
   },
   "source": [
    "####List comprehensions:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "5Qn2jU_pL9gL"
   },
   "source": [
    "When programming, frequently we want to transform one type of data into another. As a simple example, consider the following code that computes square numbers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "IVNEwoMXL9gL",
    "outputId": "d571445b-055d-45f0-f800-24fd76ceec5a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 4, 9, 16]\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "squares = []\n",
    "for x in nums:\n",
    "    squares.append(x ** 2)\n",
    "print(squares)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "7DmKVUFaL9gQ"
   },
   "source": [
    "You can make this code simpler using a list comprehension:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "kZxsUfV6L9gR",
    "outputId": "4254a7d4-58ba-4f70-a963-20c46b485b72"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 1, 4, 9, 16]\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "squares = [x ** 2 for x in nums]\n",
    "print(squares)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-D8ARK7tL9gV"
   },
   "source": [
    "List comprehensions can also contain conditions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "yUtgOyyYL9gV",
    "outputId": "1ae7ab58-8119-44dc-8e57-fda09197d026"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0, 4, 16]\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "even_squares = [x ** 2 for x in nums if x % 2 == 0]\n",
    "print(even_squares)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "H8xsUEFpL9gZ"
   },
   "source": [
    "####Dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kkjAGMAJL9ga"
   },
   "source": [
    "A dictionary stores (key, value) pairs, similar to a `Map` in Java or an object in Javascript. You can use it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "XBYI1MrYL9gb",
    "outputId": "8e24c1da-0fc0-4b4c-a3e6-6f758a53b7da"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "cute\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data\n",
    "print(d['cat'])       # Get an entry from a dictionary; prints \"cute\"\n",
    "print('cat' in d)     # Check if a dictionary has a given key; prints \"True\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "pS7e-G-HL9gf",
    "outputId": "feb4bf18-c0a3-42a2-eaf5-3fc390f36dcf"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "wet\n"
     ]
    }
   ],
   "source": [
    "d['fish'] = 'wet'    # Set an entry in a dictionary\n",
    "print(d['fish'])      # Prints \"wet\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 165
    },
    "colab_type": "code",
    "id": "tFY065ItL9gi",
    "outputId": "7e42a5f0-1856-4608-a927-0930ab37a66c"
   },
   "outputs": [
    {
     "ename": "KeyError",
     "evalue": "ignored",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-31-78fc9745d9cf>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0md\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'monkey'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m  \u001b[0;31m# KeyError: 'monkey' not a key of d\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m: 'monkey'"
     ]
    }
   ],
   "source": [
    "print(d['monkey'])  # KeyError: 'monkey' not a key of d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "8TjbEWqML9gl",
    "outputId": "ef14d05e-401d-4d23-ed1a-0fe6b4c77d6f"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n",
      "wet\n"
     ]
    }
   ],
   "source": [
    "print(d.get('monkey', 'N/A'))  # Get an element with a default; prints \"N/A\"\n",
    "print(d.get('fish', 'N/A'))    # Get an element with a default; prints \"wet\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "0EItdNBJL9go",
    "outputId": "652a950f-b0c2-4623-98bd-0191b300cd57"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "del d['fish']        # Remove an element from a dictionary\n",
    "print(d.get('fish', 'N/A')) # \"fish\" is no longer a key; prints \"N/A\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "wqm4dRZNL9gr"
   },
   "source": [
    "You can find all you need to know about dictionaries in the [documentation](https://docs.python.org/2/library/stdtypes.html#dict)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "IxwEqHlGL9gr"
   },
   "source": [
    "It is easy to iterate over the keys in a dictionary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "rYfz7ZKNL9gs",
    "outputId": "155bdb17-3179-4292-c832-8166e955e942"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "A person has 2 legs\n",
      "A cat has 4 legs\n",
      "A spider has 8 legs\n"
     ]
    }
   ],
   "source": [
    "d = {'person': 2, 'cat': 4, 'spider': 8}\n",
    "for animal, legs in d.items():\n",
    "    print('A {} has {} legs'.format(animal, legs))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "17sxiOpzL9gz"
   },
   "source": [
    "Dictionary comprehensions: These are similar to list comprehensions, but allow you to easily construct dictionaries. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "8PB07imLL9gz",
    "outputId": "e9ddf886-39ed-4f35-dd80-64a19d2eec9b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{0: 0, 2: 4, 4: 16}\n"
     ]
    }
   ],
   "source": [
    "nums = [0, 1, 2, 3, 4]\n",
    "even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}\n",
    "print(even_num_to_square)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "V9MHfUdvL9g2"
   },
   "source": [
    "####Sets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Rpm4UtNpL9g2"
   },
   "source": [
    "A set is an unordered collection of distinct elements. As a simple example, consider the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "MmyaniLsL9g2",
    "outputId": "8f152d48-0a07-432a-cf98-8de4fd57ddbb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "animals = {'cat', 'dog'}\n",
    "print('cat' in animals)   # Check if an element is in a set; prints \"True\"\n",
    "print('fish' in animals)  # prints \"False\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "ElJEyK86L9g6",
    "outputId": "b9d7dab9-5a98-41cd-efbc-786d0c4377f7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "3\n"
     ]
    }
   ],
   "source": [
    "animals.add('fish')      # Add an element to a set\n",
    "print('fish' in animals)\n",
    "print(len(animals))       # Number of elements in a set;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "5uGmrxdPL9g9",
    "outputId": "e644d24c-26c6-4b43-ab15-8aa81fe884d4"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n",
      "2\n"
     ]
    }
   ],
   "source": [
    "animals.add('cat')       # Adding an element that is already in the set does nothing\n",
    "print(len(animals))       \n",
    "animals.remove('cat')    # Remove an element from a set\n",
    "print(len(animals))       "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "zk2DbvLKL9g_"
   },
   "source": [
    "_Loops_: Iterating over a set has the same syntax as iterating over a list; however since sets are unordered, you cannot make assumptions about the order in which you visit the elements of the set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "K47KYNGyL9hA",
    "outputId": "4477f897-4355-4816-b39b-b93ffbac4bf0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#1: dog\n",
      "#2: cat\n",
      "#3: fish\n"
     ]
    }
   ],
   "source": [
    "animals = {'cat', 'dog', 'fish'}\n",
    "for idx, animal in enumerate(animals):\n",
    "    print('#{}: {}'.format(idx + 1, animal))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "puq4S8buL9hC"
   },
   "source": [
    "Set comprehensions: Like lists and dictionaries, we can easily construct sets using set comprehensions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "iw7k90k3L9hC",
    "outputId": "72d6b824-6d31-47b2-f929-4cf434590ee5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{0, 1, 2, 3, 4, 5}\n"
     ]
    }
   ],
   "source": [
    "from math import sqrt\n",
    "print({int(sqrt(x)) for x in range(30)})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "qPsHSKB1L9hF"
   },
   "source": [
    "####Tuples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kucc0LKVL9hG"
   },
   "source": [
    "A tuple is an (immutable) ordered list of values. A tuple is in many ways similar to a list; one of the most important differences is that tuples can be used as keys in dictionaries and as elements of sets, while lists cannot. Here is a trivial example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "9wHUyTKxL9hH",
    "outputId": "cdc5f620-04fe-4b0b-df7a-55b061d23d88"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'tuple'>\n",
      "5\n",
      "1\n"
     ]
    }
   ],
   "source": [
    "d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys\n",
    "t = (5, 6)       # Create a tuple\n",
    "print(type(t))\n",
    "print(d[t])       \n",
    "print(d[(1, 2)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 165
    },
    "colab_type": "code",
    "id": "HoO8zYKzL9hJ",
    "outputId": "28862bfc-0298-40d7-f8c4-168e109d2d93"
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "ignored",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-43-c8aeb8cd20ae>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mt\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "t[0] = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "AXA4jrEOL9hM"
   },
   "source": [
    "###Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "WaRms-QfL9hN"
   },
   "source": [
    "Python functions are defined using the `def` keyword. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "kiMDUr58L9hN",
    "outputId": "9f53bf9a-7b2a-4c51-9def-398e4677cd6c"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "negative\n",
      "zero\n",
      "positive\n"
     ]
    }
   ],
   "source": [
    "def sign(x):\n",
    "    if x > 0:\n",
    "        return 'positive'\n",
    "    elif x < 0:\n",
    "        return 'negative'\n",
    "    else:\n",
    "        return 'zero'\n",
    "\n",
    "for x in [-1, 0, 1]:\n",
    "    print(sign(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U-QJFt8TL9hR"
   },
   "source": [
    "We will often define functions to take optional keyword arguments, like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "PfsZ3DazL9hR",
    "outputId": "6e6af832-67d8-4d8c-949b-335927684ae3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, Bob!\n",
      "HELLO, FRED\n"
     ]
    }
   ],
   "source": [
    "def hello(name, loud=False):\n",
    "    if loud:\n",
    "        print('HELLO, {}'.format(name.upper()))\n",
    "    else:\n",
    "        print('Hello, {}!'.format(name))\n",
    "\n",
    "hello('Bob')\n",
    "hello('Fred', loud=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ObA9PRtQL9hT"
   },
   "source": [
    "###Classes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hAzL_lTkL9hU"
   },
   "source": [
    "The syntax for defining classes in Python is straightforward:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "RWdbaGigL9hU",
    "outputId": "4f6615c5-75a7-4ce4-8ea1-1e7f5e4e9fc3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello, Fred!\n",
      "HELLO, FRED\n"
     ]
    }
   ],
   "source": [
    "class Greeter:\n",
    "\n",
    "    # Constructor\n",
    "    def __init__(self, name):\n",
    "        self.name = name  # Create an instance variable\n",
    "\n",
    "    # Instance method\n",
    "    def greet(self, loud=False):\n",
    "        if loud:\n",
    "          print('HELLO, {}'.format(self.name.upper()))\n",
    "        else:\n",
    "          print('Hello, {}!'.format(self.name))\n",
    "\n",
    "g = Greeter('Fred')  # Construct an instance of the Greeter class\n",
    "g.greet()            # Call an instance method; prints \"Hello, Fred\"\n",
    "g.greet(loud=True)   # Call an instance method; prints \"HELLO, FRED!\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "3cfrOV4dL9hW"
   },
   "source": [
    "##Numpy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "fY12nHhyL9hX"
   },
   "source": [
    "Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find this [tutorial](http://wiki.scipy.org/NumPy_for_Matlab_Users) useful to get started with Numpy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "lZMyAdqhL9hY"
   },
   "source": [
    "To use Numpy, we first need to import the `numpy` package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "58QdX8BLL9hZ"
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "DDx6v1EdL9hb"
   },
   "source": [
    "###Arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "f-Zv3f7LL9hc"
   },
   "source": [
    "A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_eMTRnZRL9hc"
   },
   "source": [
    "We can initialize numpy arrays from nested Python lists, and access elements using square brackets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "-l3JrGxCL9hc",
    "outputId": "8d9dad18-c734-4a8a-ca8c-44060a40fb79"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'> (3,) 1 2 3\n",
      "[5 2 3]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([1, 2, 3])  # Create a rank 1 array\n",
    "print(type(a), a.shape, a[0], a[1], a[2])\n",
    "a[0] = 5                 # Change an element of the array\n",
    "print(a)                  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "ma6mk-kdL9hh",
    "outputId": "0b54ff2f-e7f1-4b30-c653-9bf81cb8fbb0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2 3]\n",
      " [4 5 6]]\n"
     ]
    }
   ],
   "source": [
    "b = np.array([[1,2,3],[4,5,6]])   # Create a rank 2 array\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "ymfSHAwtL9hj",
    "outputId": "5bd292d8-c751-43b9-d480-f357dde52342"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2, 3)\n",
      "1 2 4\n"
     ]
    }
   ],
   "source": [
    "print(b.shape)\n",
    "print(b[0, 0], b[0, 1], b[1, 0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "F2qwdyvuL9hn"
   },
   "source": [
    "Numpy also provides many functions to create arrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "mVTN_EBqL9hn",
    "outputId": "d267c65f-ba90-4043-cedb-f468ab1bcc5d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0. 0.]\n",
      " [0. 0.]]\n"
     ]
    }
   ],
   "source": [
    "a = np.zeros((2,2))  # Create an array of all zeros\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "skiKlNmlL9h5",
    "outputId": "7d1ec1b5-a1fe-4f44-cbe3-cdeacad425f1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1. 1.]]\n"
     ]
    }
   ],
   "source": [
    "b = np.ones((1,2))   # Create an array of all ones\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "HtFsr03bL9h7",
    "outputId": "2688b157-2fad-4fc6-f20b-8633207f0326"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[7 7]\n",
      " [7 7]]\n"
     ]
    }
   ],
   "source": [
    "c = np.full((2,2), 7) # Create a constant array\n",
    "print(c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "-QcALHvkL9h9",
    "outputId": "5035d6fe-cb7e-4222-c972-55fe23c9d4c0"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1. 0.]\n",
      " [0. 1.]]\n"
     ]
    }
   ],
   "source": [
    "d = np.eye(2)        # Create a 2x2 identity matrix\n",
    "print(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "RCpaYg9qL9iA",
    "outputId": "25f0b387-39cf-42f3-8701-de860cc75e2e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.8690054  0.57244319]\n",
      " [0.29647245 0.81464494]]\n"
     ]
    }
   ],
   "source": [
    "e = np.random.random((2,2)) # Create an array filled with random values\n",
    "print(e)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jI5qcSDfL9iC"
   },
   "source": [
    "###Array indexing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "M-E4MUeVL9iC"
   },
   "source": [
    "Numpy offers several ways to index into arrays."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "QYv4JyIEL9iD"
   },
   "source": [
    "Slicing: Similar to Python lists, numpy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "wLWA0udwL9iD",
    "outputId": "99f08618-c513-4982-8982-b146fc72dab3"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2 3]\n",
      " [6 7]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Create the following rank 2 array with shape (3, 4)\n",
    "# [[ 1  2  3  4]\n",
    "#  [ 5  6  7  8]\n",
    "#  [ 9 10 11 12]]\n",
    "a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n",
    "\n",
    "# Use slicing to pull out the subarray consisting of the first 2 rows\n",
    "# and columns 1 and 2; b is the following array of shape (2, 2):\n",
    "# [[2 3]\n",
    "#  [6 7]]\n",
    "b = a[:2, 1:3]\n",
    "print(b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "KahhtZKYL9iF"
   },
   "source": [
    "A slice of an array is a view into the same data, so modifying it will modify the original array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "1kmtaFHuL9iG",
    "outputId": "ee3ab60c-4064-4a9e-b04c-453d3955f1d1"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n",
      "77\n"
     ]
    }
   ],
   "source": [
    "print(a[0, 1])\n",
    "b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]\n",
    "print(a[0, 1]) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "_Zcf3zi-L9iI"
   },
   "source": [
    "You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. Note that this is quite different from the way that MATLAB handles array slicing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "G6lfbPuxL9iJ",
    "outputId": "a225fe9d-2a29-4e14-a243-2b7d583bd4bc"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1  2  3  4]\n",
      " [ 5  6  7  8]\n",
      " [ 9 10 11 12]]\n"
     ]
    }
   ],
   "source": [
    "# Create the following rank 2 array with shape (3, 4)\n",
    "a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "NCye3NXhL9iL"
   },
   "source": [
    "Two ways of accessing the data in the middle row of the array.\n",
    "Mixing integer indexing with slices yields an array of lower rank,\n",
    "while using only slices yields an array of the same rank as the\n",
    "original array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "EOiEMsmNL9iL",
    "outputId": "ab2ebe48-9002-45a8-9462-fd490b467f40"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5 6 7 8] (4,)\n",
      "[[5 6 7 8]] (1, 4)\n",
      "[[5 6 7 8]] (1, 4)\n"
     ]
    }
   ],
   "source": [
    "row_r1 = a[1, :]    # Rank 1 view of the second row of a  \n",
    "row_r2 = a[1:2, :]  # Rank 2 view of the second row of a\n",
    "row_r3 = a[[1], :]  # Rank 2 view of the second row of a\n",
    "print(row_r1, row_r1.shape)\n",
    "print(row_r2, row_r2.shape)\n",
    "print(row_r3, row_r3.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 104
    },
    "colab_type": "code",
    "id": "JXu73pfDL9iN",
    "outputId": "6c589b85-e9b0-4c13-a39d-4cd9fb2f41ac"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 2  6 10] (3,)\n",
      "\n",
      "[[ 2]\n",
      " [ 6]\n",
      " [10]] (3, 1)\n"
     ]
    }
   ],
   "source": [
    "# We can make the same distinction when accessing columns of an array:\n",
    "col_r1 = a[:, 1]\n",
    "col_r2 = a[:, 1:2]\n",
    "print(col_r1, col_r1.shape)\n",
    "print()\n",
    "print(col_r2, col_r2.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "VP3916bOL9iP"
   },
   "source": [
    "Integer array indexing: When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "TBnWonIDL9iP",
    "outputId": "c29fa2cd-234e-4765-c70a-6889acc63573"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 4 5]\n",
      "[1 4 5]\n"
     ]
    }
   ],
   "source": [
    "a = np.array([[1,2], [3, 4], [5, 6]])\n",
    "\n",
    "# An example of integer array indexing.\n",
    "# The returned array will have shape (3,) and \n",
    "print(a[[0, 1, 2], [0, 1, 0]])\n",
    "\n",
    "# The above example of integer array indexing is equivalent to this:\n",
    "print(np.array([a[0, 0], a[1, 1], a[2, 0]]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "n7vuati-L9iR",
    "outputId": "c3e9ba14-f66e-4202-999e-2e1aed5bd631"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2 2]\n",
      "[2 2]\n"
     ]
    }
   ],
   "source": [
    "# When using integer array indexing, you can reuse the same\n",
    "# element from the source array:\n",
    "print(a[[0, 0], [1, 1]])\n",
    "\n",
    "# Equivalent to the previous integer array indexing example\n",
    "print(np.array([a[0, 1], a[0, 1]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kaipSLafL9iU"
   },
   "source": [
    "One useful trick with integer array indexing is selecting or mutating one element from each row of a matrix:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "ehqsV7TXL9iU",
    "outputId": "de509c40-4ee4-4b7c-e75d-1a936a3350e7"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1  2  3]\n",
      " [ 4  5  6]\n",
      " [ 7  8  9]\n",
      " [10 11 12]]\n"
     ]
    }
   ],
   "source": [
    "# Create a new array from which we will select elements\n",
    "a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "pAPOoqy5L9iV",
    "outputId": "f812e29b-9218-4767-d3a8-e9854e754e68"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 1  6  7 11]\n"
     ]
    }
   ],
   "source": [
    "# Create an array of indices\n",
    "b = np.array([0, 2, 0, 1])\n",
    "\n",
    "# Select one element from each row of a using the indices in b\n",
    "print(a[np.arange(4), b])  # Prints \"[ 1  6  7 11]\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "6v1PdI1DL9ib",
    "outputId": "89f50f82-de1b-4417-e55c-edbc0ee07584"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[11  2  3]\n",
      " [ 4  5 16]\n",
      " [17  8  9]\n",
      " [10 21 12]]\n"
     ]
    }
   ],
   "source": [
    "# Mutate one element from each row of a using the indices in b\n",
    "a[np.arange(4), b] += 10\n",
    "print(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kaE8dBGgL9id"
   },
   "source": [
    "Boolean array indexing: Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "32PusjtKL9id",
    "outputId": "8782e8ec-b78d-44d7-8141-23e39750b854"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[False False]\n",
      " [ True  True]\n",
      " [ True  True]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "a = np.array([[1,2], [3, 4], [5, 6]])\n",
    "\n",
    "bool_idx = (a > 2)  # Find the elements of a that are bigger than 2;\n",
    "                    # this returns a numpy array of Booleans of the same\n",
    "                    # shape as a, where each slot of bool_idx tells\n",
    "                    # whether that element of a is > 2.\n",
    "\n",
    "print(bool_idx)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "cb2IRMXaL9if",
    "outputId": "5983f208-3738-472d-d6ab-11fe85b36c95"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[3 4 5 6]\n",
      "[3 4 5 6]\n"
     ]
    }
   ],
   "source": [
    "# We use boolean array indexing to construct a rank 1 array\n",
    "# consisting of the elements of a corresponding to the True values\n",
    "# of bool_idx\n",
    "print(a[bool_idx])\n",
    "\n",
    "# We can do all of the above in a single concise statement:\n",
    "print(a[a > 2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CdofMonAL9ih"
   },
   "source": [
    "For brevity we have left out a lot of details about numpy array indexing; if you want to know more you should read the documentation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jTctwqdQL9ih"
   },
   "source": [
    "###Datatypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kSZQ1WkIL9ih"
   },
   "source": [
    "Every numpy array is a grid of elements of the same type. Numpy provides a large set of numeric datatypes that you can use to construct arrays. Numpy tries to guess a datatype when you create an array, but functions that construct arrays usually also include an optional argument to explicitly specify the datatype. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "4za4O0m5L9ih",
    "outputId": "2ea4fb80-a4df-43f9-c162-5665895c13ae"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "int64 float64 int64\n"
     ]
    }
   ],
   "source": [
    "x = np.array([1, 2])  # Let numpy choose the datatype\n",
    "y = np.array([1.0, 2.0])  # Let numpy choose the datatype\n",
    "z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype\n",
    "\n",
    "print(x.dtype, y.dtype, z.dtype)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "RLVIsZQpL9ik"
   },
   "source": [
    "You can read all about numpy datatypes in the [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TuB-fdhIL9ik"
   },
   "source": [
    "###Array math"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "18e8V8elL9ik"
   },
   "source": [
    "Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "gHKvBrSKL9il",
    "outputId": "a8a924b1-9d60-4b68-8fd3-e4657ae3f08b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 6.  8.]\n",
      " [10. 12.]]\n",
      "[[ 6.  8.]\n",
      " [10. 12.]]\n"
     ]
    }
   ],
   "source": [
    "x = np.array([[1,2],[3,4]], dtype=np.float64)\n",
    "y = np.array([[5,6],[7,8]], dtype=np.float64)\n",
    "\n",
    "# Elementwise sum; both produce the array\n",
    "print(x + y)\n",
    "print(np.add(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "1fZtIAMxL9in",
    "outputId": "122f1380-6144-4d6c-9d31-f62d839889a2"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-4. -4.]\n",
      " [-4. -4.]]\n",
      "[[-4. -4.]\n",
      " [-4. -4.]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise difference; both produce the array\n",
    "print(x - y)\n",
    "print(np.subtract(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "nil4AScML9io",
    "outputId": "038c8bb2-122b-4e59-c0a8-a091014fe68e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5. 12.]\n",
      " [21. 32.]]\n",
      "[[ 5. 12.]\n",
      " [21. 32.]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise product; both produce the array\n",
    "print(x * y)\n",
    "print(np.multiply(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "0JoA4lH6L9ip",
    "outputId": "12351a74-7871-4bc2-97ce-a508bf4810da"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.2        0.33333333]\n",
      " [0.42857143 0.5       ]]\n",
      "[[0.2        0.33333333]\n",
      " [0.42857143 0.5       ]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise division; both produce the array\n",
    "# [[ 0.2         0.33333333]\n",
    "#  [ 0.42857143  0.5       ]]\n",
    "print(x / y)\n",
    "print(np.divide(x, y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "g0iZuA6bL9ir",
    "outputId": "29927dda-4167-4aa8-fbda-9008b09e4356"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1.         1.41421356]\n",
      " [1.73205081 2.        ]]\n"
     ]
    }
   ],
   "source": [
    "# Elementwise square root; produces the array\n",
    "# [[ 1.          1.41421356]\n",
    "#  [ 1.73205081  2.        ]]\n",
    "print(np.sqrt(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "a5d_uujuL9it"
   },
   "source": [
    "Note that unlike MATLAB, `*` is elementwise multiplication, not matrix multiplication. We instead use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "I3FnmoSeL9iu",
    "outputId": "46f4575a-2e5e-4347-a34e-0cc5bd280110"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "219\n",
      "219\n"
     ]
    }
   ],
   "source": [
    "x = np.array([[1,2],[3,4]])\n",
    "y = np.array([[5,6],[7,8]])\n",
    "\n",
    "v = np.array([9,10])\n",
    "w = np.array([11, 12])\n",
    "\n",
    "# Inner product of vectors; both produce 219\n",
    "print(v.dot(w))\n",
    "print(np.dot(v, w))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "vmxPbrHASVeA"
   },
   "source": [
    "You can also use the `@` operator which is equivalent to numpy's `dot` operator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "id": "vyrWA-mXSdtt",
    "outputId": "a9aae545-2c93-4649-b220-b097655955f6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "219\n"
     ]
    }
   ],
   "source": [
    "print(v @ w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "zvUODeTxL9iw",
    "outputId": "4093fc76-094f-4453-a421-a212b5226968"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[29 67]\n",
      "[29 67]\n",
      "[29 67]\n"
     ]
    }
   ],
   "source": [
    "# Matrix / vector product; both produce the rank 1 array [29 67]\n",
    "print(x.dot(v))\n",
    "print(np.dot(x, v))\n",
    "print(x @ v)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 121
    },
    "colab_type": "code",
    "id": "3V_3NzNEL9iy",
    "outputId": "af2a89f9-af5d-47a6-9ad2-06a84b521b94"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[19 22]\n",
      " [43 50]]\n",
      "[[19 22]\n",
      " [43 50]]\n",
      "[[19 22]\n",
      " [43 50]]\n"
     ]
    }
   ],
   "source": [
    "# Matrix / matrix product; both produce the rank 2 array\n",
    "# [[19 22]\n",
    "#  [43 50]]\n",
    "print(x.dot(y))\n",
    "print(np.dot(x, y))\n",
    "print(x @ y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "FbE-1If_L9i0"
   },
   "source": [
    "Numpy provides many useful functions for performing computations on arrays; one of the most useful is `sum`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "DZUdZvPrL9i0",
    "outputId": "99cad470-d692-4b25-91c9-a57aa25f4c6e"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "[4 6]\n",
      "[3 7]\n"
     ]
    }
   ],
   "source": [
    "x = np.array([[1,2],[3,4]])\n",
    "\n",
    "print(np.sum(x))  # Compute sum of all elements; prints \"10\"\n",
    "print(np.sum(x, axis=0))  # Compute sum of each column; prints \"[4 6]\"\n",
    "print(np.sum(x, axis=1))  # Compute sum of each row; prints \"[3 7]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ahdVW4iUL9i3"
   },
   "source": [
    "You can find the full list of mathematical functions provided by numpy in the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).\n",
    "\n",
    "Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 104
    },
    "colab_type": "code",
    "id": "63Yl1f3oL9i3",
    "outputId": "c75ac7ba-4351-42f8-a09c-a4e0d966ab50"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2]\n",
      " [3 4]]\n",
      "transpose\n",
      " [[1 3]\n",
      " [2 4]]\n"
     ]
    }
   ],
   "source": [
    "print(x)\n",
    "print(\"transpose\\n\", x.T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 104
    },
    "colab_type": "code",
    "id": "mkk03eNIL9i4",
    "outputId": "499eec5a-55b7-473a-d4aa-9d023d63885a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 2 3]]\n",
      "transpose\n",
      " [[1]\n",
      " [2]\n",
      " [3]]\n"
     ]
    }
   ],
   "source": [
    "v = np.array([[1,2,3]])\n",
    "print(v )\n",
    "print(\"transpose\\n\", v.T)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "REfLrUTcL9i7"
   },
   "source": [
    "###Broadcasting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "EygGAMWqL9i7"
   },
   "source": [
    "Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.\n",
    "\n",
    "For example, suppose that we want to add a constant vector to each row of a matrix. We could do it like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "WEEvkV1ZL9i7",
    "outputId": "3896d03c-3ece-4aa8-f675-aef3a220574d"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  2  4]\n",
      " [ 5  5  7]\n",
      " [ 8  8 10]\n",
      " [11 11 13]]\n"
     ]
    }
   ],
   "source": [
    "# We will add the vector v to each row of the matrix x,\n",
    "# storing the result in the matrix y\n",
    "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n",
    "v = np.array([1, 0, 1])\n",
    "y = np.empty_like(x)   # Create an empty matrix with the same shape as x\n",
    "\n",
    "# Add the vector v to each row of the matrix x with an explicit loop\n",
    "for i in range(4):\n",
    "    y[i, :] = x[i, :] + v\n",
    "\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "2OlXXupEL9i-"
   },
   "source": [
    "This works; however when the matrix `x` is very large, computing an explicit loop in Python could be slow. Note that adding the vector v to each row of the matrix `x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically, then performing elementwise summation of `x` and `vv`. We could implement this approach like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "vS7UwAQQL9i-",
    "outputId": "8621e502-c25d-4a18-c973-886dbfd1df36"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1 0 1]\n",
      " [1 0 1]\n",
      " [1 0 1]\n",
      " [1 0 1]]\n"
     ]
    }
   ],
   "source": [
    "vv = np.tile(v, (4, 1))  # Stack 4 copies of v on top of each other\n",
    "print(vv)                # Prints \"[[1 0 1]\n",
    "                         #          [1 0 1]\n",
    "                         #          [1 0 1]\n",
    "                         #          [1 0 1]]\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "N0hJphSIL9jA",
    "outputId": "def6a757-170c-43bf-8728-732dfb133273"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  2  4]\n",
      " [ 5  5  7]\n",
      " [ 8  8 10]\n",
      " [11 11 13]]\n"
     ]
    }
   ],
   "source": [
    "y = x + vv  # Add x and vv elementwise\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "zHos6RJnL9jB"
   },
   "source": [
    "Numpy broadcasting allows us to perform this computation without actually creating multiple copies of v. Consider this version, using broadcasting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "id": "vnYFb-gYL9jC",
    "outputId": "df3bea8a-ad72-4a83-90bb-306b55c6fb93"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  2  4]\n",
      " [ 5  5  7]\n",
      " [ 8  8 10]\n",
      " [11 11 13]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# We will add the vector v to each row of the matrix x,\n",
    "# storing the result in the matrix y\n",
    "x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])\n",
    "v = np.array([1, 0, 1])\n",
    "y = x + v  # Add v to each row of x using broadcasting\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "08YyIURKL9jH"
   },
   "source": [
    "The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape `(3,)` due to broadcasting; this line works as if v actually had shape `(4, 3)`, where each row was a copy of `v`, and the sum was performed elementwise.\n",
    "\n",
    "Broadcasting two arrays together follows these rules:\n",
    "\n",
    "1. If the arrays do not have the same rank, prepend the shape of the lower rank array with 1s until both shapes have the same length.\n",
    "2. The two arrays are said to be compatible in a dimension if they have the same size in the dimension, or if one of the arrays has size 1 in that dimension.\n",
    "3. The arrays can be broadcast together if they are compatible in all dimensions.\n",
    "4. After broadcasting, each array behaves as if it had shape equal to the elementwise maximum of shapes of the two input arrays.\n",
    "5. In any dimension where one array had size 1 and the other array had size greater than 1, the first array behaves as if it were copied along that dimension\n",
    "\n",
    "If this explanation does not make sense, try reading the explanation from the [documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html) or this [explanation](http://wiki.scipy.org/EricsBroadcastingDoc).\n",
    "\n",
    "Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).\n",
    "\n",
    "Here are some applications of broadcasting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 69
    },
    "colab_type": "code",
    "id": "EmQnwoM9L9jH",
    "outputId": "f59e181e-e2d4-416c-d094-c4d003ce8509"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 4  5]\n",
      " [ 8 10]\n",
      " [12 15]]\n"
     ]
    }
   ],
   "source": [
    "# Compute outer product of vectors\n",
    "v = np.array([1,2,3])  # v has shape (3,)\n",
    "w = np.array([4,5])    # w has shape (2,)\n",
    "# To compute an outer product, we first reshape v to be a column\n",
    "# vector of shape (3, 1); we can then broadcast it against w to yield\n",
    "# an output of shape (3, 2), which is the outer product of v and w:\n",
    "\n",
    "print(np.reshape(v, (3, 1)) * w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "PgotmpcnL9jK",
    "outputId": "567763d3-073a-4e3c-9ebe-6c7d2b6d3446"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2 4 6]\n",
      " [5 7 9]]\n"
     ]
    }
   ],
   "source": [
    "# Add a vector to each row of a matrix\n",
    "x = np.array([[1,2,3], [4,5,6]])\n",
    "# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),\n",
    "# giving the following matrix:\n",
    "\n",
    "print(x + v)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "T5hKS1QaL9jK",
    "outputId": "5f14ac5c-7a21-4216-e91d-cfce5720a804"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5  6  7]\n",
      " [ 9 10 11]]\n"
     ]
    }
   ],
   "source": [
    "# Add a vector to each column of a matrix\n",
    "# x has shape (2, 3) and w has shape (2,).\n",
    "# If we transpose x then it has shape (3, 2) and can be broadcast\n",
    "# against w to yield a result of shape (3, 2); transposing this result\n",
    "# yields the final result of shape (2, 3) which is the matrix x with\n",
    "# the vector w added to each column. Gives the following matrix:\n",
    "\n",
    "print((x.T + w).T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "JDUrZUl6L9jN",
    "outputId": "53e99a89-c599-406d-9fe3-7aa35ae5fb90"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 5  6  7]\n",
      " [ 9 10 11]]\n"
     ]
    }
   ],
   "source": [
    "# Another solution is to reshape w to be a row vector of shape (2, 1);\n",
    "# we can then broadcast it directly against x to produce the same\n",
    "# output.\n",
    "print(x + np.reshape(w, (2, 1)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 52
    },
    "colab_type": "code",
    "id": "VzrEo4KGL9jP",
    "outputId": "53c9d4cc-32d5-46b0-d090-53c7db57fb32"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 2  4  6]\n",
      " [ 8 10 12]]\n"
     ]
    }
   ],
   "source": [
    "# Multiply a matrix by a constant:\n",
    "# x has shape (2, 3). Numpy treats scalars as arrays of shape ();\n",
    "# these can be broadcast together to shape (2, 3), producing the\n",
    "# following array:\n",
    "print(x * 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "89e2FXxFL9jQ"
   },
   "source": [
    "Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "iF3ZtwVNL9jQ"
   },
   "source": [
    "This brief overview has touched on many of the important things that you need to know about numpy, but is far from complete. Check out the [numpy reference](http://docs.scipy.org/doc/numpy/reference/) to find out much more about numpy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tEINf4bEL9jR"
   },
   "source": [
    "##Matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0hgVWLaXL9jR"
   },
   "source": [
    "Matplotlib is a plotting library. In this section give a brief introduction to the `matplotlib.pyplot` module, which provides a plotting system similar to that of MATLAB."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "cmh_7c6KL9jR"
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jOsaA5hGL9jS"
   },
   "source": [
    "By running this special iPython command, we will be displaying plots inline:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ijpsmwGnL9jT"
   },
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "U5Z_oMoLL9jV"
   },
   "source": [
    "###Plotting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "6QyFJ7dhL9jV"
   },
   "source": [
    "The most important function in `matplotlib` is plot, which allows you to plot 2D data. Here is a simple example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 282
    },
    "colab_type": "code",
    "id": "pua52BGeL9jW",
    "outputId": "9ac3ee0f-7ff7-463b-b901-c33d21a2b10c"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x7f0f3a0b4208>]"
      ]
     },
     "execution_count": 105,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAD4CAYAAADhNOGaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0\ndHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3dd3hU95X4//cZVVRBSEKNJnoRCBDN\ndhIXjCk2uAdXkjixs4m9qd442d0461+cZLPZ1HWKY8d2bMfYwdh0Y9wLYBBFEh0hiioSAoSEunR+\nf2jIV8GiajR3ynk9zzyauWXukRjm3Hvup4iqYowxJni5nA7AGGOMsywRGGNMkLNEYIwxQc4SgTHG\nBDlLBMYYE+RCnQ7gUiQmJuqgQYOcDsMYY/zK5s2bj6pq0pnL/TIRDBo0iNzcXKfDMMYYvyIih7pa\nbqUhY4wJcpYIjDEmyFkiMMaYIGeJwBhjgpwlAmOMCXIeSQQi8hcRqRSR7WdZLyLyWxEpFJF8EZnY\nad1CEdnnfiz0RDzGGGMunKeuCJ4FZp1j/WxgmPtxP/AHABFJAB4FpgJTgEdFpI+HYjLGGHMBPNKP\nQFU/EJFB59hkPvBX7RjzeoOI9BaRVOBKYK2qHgMQkbV0JJSXPBFXoGlubWfjgWNU1jZysqGF2sZW\nUnv3YurgBDL69EJEnA7RGJ9x5GQjH+47yqmmVtralXZVhiTHMD2zL5FhIU6H51O81aEsHSju9LrE\nvexsyz9FRO6n42qCAQMG9EyUPkhVyS+pYcmWEpbllXG8vqXL7VLiIpkxOpmHrh5Gv7hIL0dpjG+o\nqW/hpU2HeWN7BduKT3S5TWSYi8uHJDIvO40bxqXhctkJlN/0LFbVJ4EnAXJycoJiNp3K2kb+47Xt\nvLnzCBGhLq4d3Y8bs9MZkhxDXGQoMZGhHDxaz8aDx9hQVM3Lm4pZvLmEL1+RyQOfyyQ2MszpX8EY\nr1BVXttayuMrd1F9qpms9Hi+O3M4M0b3IzEmghD31XJeyQne3V3J27sr+caibfx1/SEemz+GMWnx\nDv8GzhJPzVDmLg2tUNWxXaz7E/Ceqr7kfr2HjrLQlcCVqvpAV9udTU5OjgbyEBOqyrK8Mh5dtoP6\n5ja+OWMYd08bSNx5vtgPV9fzizf3sCyvjMSYcP5w9yQmD0rwUtTGOOPg0VN8f0kB64uqmTCgNz++\ncex5v9jb25VXt5Tws9W7OV7fzD3TBvKDuaOICA3skpGIbFbVnE8t91IimAs8CMyh48bwb1V1ivtm\n8WbgdCuiLcCk0/cMziaQE0Fbu/KDJQW8nFtMdv/e/OK28QxNjrmo9ygoqeEbi7ZScryBn96cxS2T\nMnooWmOcta34BF98ZiNt7cr3Zo/kjskDLqrUU9PQwq/W7uXZdQeZlpnAn+7JIb5X4F5J92giEJGX\n6Di7TwSO0NESKAxAVf8oHXcx/4+OG8H1wBdVNde975eAH7jf6nFVfeZ8xwvURNDS1s63Xt7Givxy\nvn7VEL41YzihIZfWsKumvoV/eXEz6/ZX87Urh/DdmSOsFmoCygd7q/jqC5tJjIng+fumMLBv9CW/\n1+tbS3l4cR6ZiTE8+6XJpMb38mCkvqPHrwi8KRATQVNrGw/+bStrdx7h+7NH8sDnhnT7PVva2vnh\n0h28tPEwX7p8MD+8YbQHIjXGecvzyvj2K9sYmhzLc1+cTLIHGkh8tO8oX31hM7GRobz0lWkMSrz0\nxOKrzpYIrGexD2hrV772whbW7jzCf80b45EkABAW4uInN43li5cP4i8fH+CpD4s88r7GOGnd/qN8\n6+VtTOjfh0X3T/NIEgC4YlgiLz8wjcaWNr707CZqztJCLxBZIvABP1+zm7d3V/LY/DEsvGyQR99b\nRPiPuaOZPTaFH6/cxYr8Mo++vzHedODoKf7lhS0MSozmqS94vp4/Ji2eP92TQ/Hxer76wmaaW9s9\n+v6+yhKBw5bnlfGn94u4a+oA7p0+qEeOEeISfvX5bCYP6sO3X85j08Fz3os3xifVNLRw33ObcAk8\nvTDnvK3oLtWUwQn89y3jWF9UzX++vh1/LJ9fLEsEDtpZdpJ/W5xPzsA+PHrDmB49VmRYCH++N4f0\nPr146G9bOVHf3KPHM8aT2tqVh17ayuHqev5w96Ru3Ri+EDdPzOChq4fycm4xz6472KPH8gWWCBxy\nsrGFB17IJa5XKL+/eyLhoT3/T9E7KpzfLpjA0bomvr+kICjOdExgePqjIj7YW8Vj88cyLbOvV475\nrRnDmTEqmZ+u3s2+I7VeOaZTLBE45KerdlN6vIHf3zWJ5FjvDQmRlRHPd2aOYPX2Cv6+ucRrxzXm\nUu09Ussv1uxl5uh+3DGlv9eO63IJP7tlHLERoXzrlW20tAXu/QJLBA74uPAoL208zFc+k8mkgd4f\nbPX+z2YyLTOBHy3bwcGjp7x+fGMuVEtbO995JY+YyFAevynL6wMrJsZE8PhNWWwvPcnv3t7n1WN7\nkyUCLzvV1Mr3Xs0nMzGab1073JEYQlzCL2/PJtQlfOfvebS3W4nI+KY/vLefgtIafnzjWJJiIxyJ\nYdbYFG6ZmMET7+1n6+HjjsTQ0ywReNnP39hN6YkGfn7rOEeHwk3r3Yv/vH40mw8dZ/EWKxEZ37Oz\n7CS/fXsf88anMScr1dFYHp03mpS4SB5enB+QJSJLBF60+dAxnlt/iIXTB5HjA4PB3TIxg4kDevPf\nq3cHVecZ4/tUlR8t30FcrzD+a17Ptqi7EHGRYfxo3hgKK+t4fv0hp8PxOEsEXtLerjy2fCcpcZH8\n26wRTocDdNwMe2z+WI7XN/O/a/c4HY4x/7CqoIKNB47xnZnD6RMd7nQ4AMwYlcxnhiXyq7f2Ul3X\n5HQ4HmWJwEuW5pWSV1LDw9eNICrcd6aBGJsez93TBvLChkPsKKtxOhxjaGxp4yerdjEyJZYFk31n\nEioR4dEbRtPQ3MYv3gysEydLBF7Q0NzGz9/YQ1Z6PDdN6HICNkd959oR9IkK54dLd1jfAuO4P39Q\nROmJBh69YQwhPjZi7tDkWBZeNohFm4opKAmcEydLBF7w1IdFlNc08h9zR/nkUNDxUWE8fN0INh86\nzpodR5wOxwSx8poGfv/efmaPTWH6EO90HLtY/3rNMBKiwvnR8sA5cbJE0MOOnGzkD+/vZ9aYFKZ6\nqUfkpbh1UgaZidH8cu0e2qw5qXHI/765lzZVfjBnlNOhnFV8rzC+M7PjxOntXZVOh+MRlgh62G/e\n3kdLWzuPzB7pdCjnFBri4tszh7P3SB3L8kqdDscEoQNHT7FkSwn3TBtI/4Qop8M5p9tyMhiQEMUv\n1+4NiH44HkkEIjJLRPaISKGIPNLF+l+JyDb3Y6+InOi0rq3TumWeiMdXlJ5o4O+5xXx+cn+/mORi\nzthURqXG8au1+wKyrbTxbb97ex/hoS6+6qH5OHpSWIiLb1wzjJ3lJ1mzo8LpcLqt24lAREKAJ4DZ\nwGjgDhH5p6mwVPVbqpqtqtnA74AlnVY3nF6nqvO6G48v+f27hQD8y5VDHY7kwrhcwsPXDefwsXpe\nyS12OhwTRPZX1fH6tlLunT7IsR7EF+vGCelkJkXzq7f2+n051RNXBFOAQlUtUtVmYBEw/xzb3wG8\n5IHj+rSyEw28klvMbTn9Se/tP/OfXjUimUkD+/C7twtpbGlzOhwTJH779j4iw0J44LOZTodywUJc\nwjdndJRTVxaUOx1Ot3giEaQDnU8fS9zLPkVEBgKDgXc6LY4UkVwR2SAiN57tICJyv3u73KqqKg+E\n3bP+8N5+AL52pe9f5nYmInx35ggqTjbaVYHxin1HalmWV8a90wfRN8Y/rgZOuz4rlRH9Yvn1W3tp\n9eNyqrdvFi8AFqtq51PNge7JlO8Efi0iXX5zquqTqpqjqjlJSUneiPWSldc08PKmYm6dlEFGH9++\n6dWVaZkJTBzQmyc/KPLrD7fxD795ex9RYSHc70dXA6e5XMK3rh1GUdUpv74q8EQiKAU6DxKe4V7W\nlQWcURZS1VL3zyLgPWCCB2Jy1J/eL6Jdla/5yb2BM4kIX/3cEEqON/j1h9v4vkPVp1hVUM490weR\n4CNDSVysmaNTGJIUzZMfFPltvwJPJIJNwDARGSwi4XR82X+q9Y+IjAT6AOs7LesjIhHu54nA5cBO\nD8TkmGOnmlm06TA3TUj3+SZw5zJjVD+GJsfwx/f998NtfN/THx0gxCV86fJBTodyyVwu4f7PZrKj\n7CQfF1Y7Hc4l6XYiUNVW4EFgDbALeEVVd4jIYyLSuRXQAmCR/vO3yiggV0TygHeBn6mqXyeCFzcc\norGlna/44WVuZy6X8MBnM9lVfpL39/r+PRnjf46fauaV3GJuzE4nOc57s/T1hBsnpJMUG8GfPtjv\ndCiXxCOjn6nqKmDVGct+eMbrH3Wx3zogyxMx+ILGljaeW3+Izw1PYni/WKfD6bb52en8cu1e/vj+\nfq4ckex0OCbAPB8gJ00AEaEhfOGyQfzPmj3sKKthTFq80yFdFOtZ7EHL8so4WtfEVz7j/x9sgPBQ\nF/ddMZgNRccCdmYm44zGljaeW3eQq0YExkkTwN1TBxIdHsKfPyhyOpSLZonAQ1SVpz88wMiUWC4f\n6rtjCl2sBVMGEBcZytMfHXA6FBNAlmwppfpUM/d/1r+aV59LfFQYC6YMYHl+OSXH650O56JYIvCQ\nD/YdZc+RWr78mUyvT7Ddk2IiQrk9pz9vbK/gyMlGp8MxAaC9XXnqwyKy0uOZlun8TH2e9KUrBiPA\nc+sOOh3KRbFE4CFPfVhEcmwE88anOR2Kx907fRBtqry4IfCm6DPe98G+KoqOnuLLnxkcUCdNAOm9\ne3HdmBReyS2hodl/euZbIvCAwspaPtx3lHunDyQ8NPD+pAP6RnH1iGT+tvEwTa3+8+E2vun59YdI\njIlg9lhnJ6TvKfdOH0hNQ4tfjeIbeN9aDnhhw2HCQoQFU3xnWj1PW3jZII7WNbPKOpiZbig+Vs87\neyq5Y0r/gDxpApgyOIER/WJ5bt0hv+mDE5j/El5U39zKq5tLmJOVSqKfjZNyMa4YmkhmUjTPrrPy\nkLl0L35yGJcId04N3JMmEeGe6QPZWX6SLX7S2s4SQTct21ZGbVMrd08b6HQoPcrlEhZOH0Re8Qm2\nFZ84/w7GnKGxpY2XNx1mxqhkUuP9Z0TeS3HThHRiI0L563r/OHGyRNANqsrzGw4xMiWWnIF9nA6n\nx90yKYOYiFD+6mctIoxvWJlfzvH6Fu6dPsjpUHpcdEQot0zKYFVBOVW1TU6Hc16WCLphW/EJdpSd\n5K5pAwOu9UNXYiJCuWlCOisKyjlR3+x0OMbP/HXDITKTornMRyel97R7pg+kpU1ZtPGw06GclyWC\nbnhhw2Giw0O4aUKX0y8EpAVT+tPc2s5rW/2nRYRxXkFJDXnFJ7gnSE6aAIYkxXDF0EQWbSr2+RnM\nLBFcouOnmlmeX8ZNE9OJifDIkE1+YUxaPOMy4lm0sdhvWkQY5y3adJiIUBc3T8xwOhSvWjClP6Un\nGvio8KjToZyTJYJL9NrWUppb27lramDfJO7KgskD2HOklq1209hcgIbmNpZtK2NuVirxvcKcDser\nrh3djz5RYby8ybfLQ5YILoGq8kpuMeMz4hmVGud0OF43LzuNqPAQv6h9GuetKiintqmV2yf3P//G\nASYiNISbJ2awducRqut896axJYJLUFBaw+6KWm7LCb4PNnTcNL5hXBrL88qpbWxxOhzj417eVMyg\nvlFMHRxY4wpdqM9P7k9Lm/r0fTWPJAIRmSUie0SkUEQe6WL9F0SkSkS2uR9f7rRuoYjscz8WeiKe\nnvZKbjERoS7mZQfeuEIXasGU/jS0tLEsr8zpUIwPK6qqY+PBY9w+uX/Q3CQ+0/B+sUwc0JtFm3z3\nvlq3E4GIhABPALOB0cAdIjK6i01fVtVs9+Mp974JwKPAVGAK8KiI+HSD/MaWNpZuK2NOVipxkcFV\n7+wsu39vRqbEsmhjsdOhGB/2Sm4JIS7h1iC7SXymBZMHUFhZ57M9jT1xRTAFKFTVIlVtBhYB8y9w\n3+uAtap6TFWPA2uBWR6Iqce8sb2C2sZWbg/SstBpIsKCyf0pKK1hV/lJp8MxPqilrZ3Fm0u4akSy\n309F2V1zx6USHR7isydOnkgE6UDn367EvexMt4hIvogsFpHT36IXuq/PeHlTMQMSgrfe2dm87HTC\nQoRXN5c4HYrxQe/uruRoXRMLgvAm8ZmiI0KZl53Givxy6ppanQ7nU7x1s3g5MEhVx9Fx1v/cxb6B\niNwvIrkikltV5cxk6oer61lfVM3tORm4XMFZ7+wsITqcq0Yk8/q2Mlrb2p0Ox/iYxZtLSIqN4MoR\nSU6H4hNunZRBQ0sbb2yvcDqUT/FEIigFOqf8DPeyf1DValU93XbqKWDShe7b6T2eVNUcVc1JSnLm\ng7V4SwkiHWPumA63TMrgaF0TH+xzJjkb33TsVDPv7qnkxuw0QkOscSLAxAF9GNg3iiVbfO8K2hP/\nQpuAYSIyWETCgQXAss4biEjnGSjmAbvcz9cAM0Wkj/sm8Uz3Mp+jqry2tYQrhiYG/MiJF+OqEcn0\niQrj1c2+2zTOeN+K/DJa2jToehKfi4hw84QM1hdVU3qiwelw/km3E4GqtgIP0vEFvgt4RVV3iMhj\nIjLPvdm/isgOEckD/hX4gnvfY8D/R0cy2QQ85l7mc3IPHaf4WENQjSt0IcJDXczPTmftziPU1Fuf\nAtPh1S2ljEqNC8oOl+dy88R0VOF1H+tT4JFrNlVdparDVXWIqj7uXvZDVV3mfv59VR2jquNV9SpV\n3d1p37+o6lD34xlPxNMTlmwppVdYCNeNSXE6FJ9z66QMmtvaWZ5vfQoMFFbWkVd8glsm2knTmfon\nRDFlcAKvbinxqT4FVry7AI0tbazIL2PW2BSig2iAuQs1Ji2OEf1iWWythwzw2tYSXEJQd7g8l1sm\nplNUdcqnJniyRHAB3tldSW1jq5WFzkJEuGVSOtuKT7C/qs7pcIyD2tuV17aU8tnhSSTHBnffgbOZ\nk5VKRKiLJVt8pzxkieACLNlSSnJsBJcPTXQ6FJ91Y3Y6IrDUx2qfxrs2HKimrKbRbhKfQ2xkGNeN\nSWF5fhlNrW1OhwNYIjivY6eaeW9PJfOz0wixvgNnlRwXyWVD+rI0r8ynap/Gu5ZsKSU2IpSZo/s5\nHYpPu3liOifqW3hvj280u7ZEcB4r8stobbdmcBdifnY6h6rrfar2abyn0d1ZatbYFCLDQpwOx6dd\nMTSRvtHhLNvmGw0sLBGcx2tbSxmZEmvN4C7ArLEphIe6WOojH27jXe/srqSuqZUb7V7aeYWGuLh+\nXCpv7TriE0O5WyI4h8PV9Ww9fIL52fbBvhBxkWFcMzK54yrKhpwIOku3lZIUG8G0zOCYnL675mWn\n09Tazps7jjgdiiWCczndLv6G8ann2dKcNj87naN1zT4/R6vxrJqGFt7dU8UN4+xe2oWaOKA3GX16\nsdQH5vSwRHAOS7eVkjOwDxl9opwOxW9cNTKJuMhQn6l9Gu9Ys6OC5tZ26ztwEUSE+dlpfLSviqpa\nZ6extERwFrsrTrL3SJ19sC9SRGgIc7JSWbOjgoZm32gaZ3resm1lDOwbxfiMeKdD8Svzs9NpV1jp\ncK98SwRnsWxbGSEuYU6WlYUu1rzsNE41t7F2l/O1T9PzKk82sm7/UeaPTwva6Sgv1fB+HQ1RnC4P\nWSLogqqyLK+My4cmkhgT4XQ4fmfa4L70i4tguQ/UPk3PW5FfTrvakBKXan52GlsPn+Bwdb1jMVgi\n6MKWwycoOd7A/PH2wb4ULpcwNyuN9/dUcdIHmsaZnrU0r4wxaXEMTY51OhS/dIP7e2ZZnnO98i0R\ndGF5XhkRoS5mjrHekZfq+vGpNLf5RtM403MOV9eTV3yCeXbSdMnSe/di0sA+rMgvdywGSwRnaG1r\nZ0V+OVePTCY2MszpcPzWhP69Se/dixU2NHVAO93Eeu44u5fWHTeMS2V3RS2FlbWOHN8SwRk2HjjG\n0bqmf1yumUsjItwwPo2P9h3l+Klmp8MxPWRFfjkTBvS2JtbdNCcrFRFYnufMVYFHEoGIzBKRPSJS\nKCKPdLH+2yKyU0TyReRtERnYaV2biGxzP5adua+3Lc8vJyo8hKtGJDsdit+7flwqre3KGzt8b7Ju\n0337q+rYVX6S68fZSVN3JcdFMnVwAivynRm0sduJQERCgCeA2cBo4A4RGX3GZluBHFUdBywGft5p\nXYOqZrsf83BQa1s7b2wvZ8aofvQKt0GzumtMWhyZidHWeihArcwvRwTmWhNrj7h+XBr7q06xu8L7\n5SFPXBFMAQpVtUhVm4FFwPzOG6jqu6p6um3UBsAnh/Jct7+a4/UtVu/0EBHh+nGpbCiqprK20elw\njIetyC9j8sAEUuJtAhpPmD02hRCXOHLi5IlEkA4Ud3pd4l52NvcBqzu9jhSRXBHZICI3nm0nEbnf\nvV1uVVXPjOG9Mr+cmIhQPjc8qUfePxjdMD6NdoXVBVYeCiR7j9Sy90gd19s4XB7TNyaCy4b0ZUV+\nudfLQ169WSwidwM5wP90WjxQVXOAO4Ffi8iQrvZV1SdVNUdVc5KSPP9F3dzazhs7Krh2dD8bS92D\nhvWLZUS/WGs9FGBW5JXhEpg91hKBJ10/LpXDx+opKK3x6nE9kQhKgf6dXme4l/0TEZkB/DswT1X/\nMcKSqpa6fxYB7wETPBDTRfu48Cg1DS1cb2Uhj5s7LpXcQ8epqLHyUCBQVVbklzMtsy9Jsdbz3pOu\nG5NCqEu83qfAE4lgEzBMRAaLSDiwAPin1j8iMgH4Ex1JoLLT8j4iEuF+nghcDuz0QEwXbUV+ObGR\noVwxzOYl9rQ5WamowurtznWYMZ6zq7yWoqOnrLVQD+gdFc5nhiWy0svloW4nAlVtBR4E1gC7gFdU\ndYeIPCYip1sB/Q8QA/z9jGaio4BcEckD3gV+pqpeTwRNrW28ubOC68akEBFqZSFPG5ocw8iUWFYV\nWCIIBCsLOgZkvM563veIOVmplJ5oIK/Ee+WhUE+8iaquAladseyHnZ7POMt+64AsT8TQHR/uPUpt\nY6u1FupBc7JS+eXavVTUNForEz+mqqwqqGB6Zl/62oCMPWLm6BR+EFLAqoJysvv39soxrWcxsKqg\nnLjIUC4fYmWhnnJ6OG8rD/m3XeW1HDh6yoZn70HxUWFcMdS75aGgTwRNrR3j5s8c0zHxuukZVh4K\nDKsKyq0s5AXeLg8F/Tffx4XuspCd4fS4uVmpbDporYf8VUdZqJxpmQlWFuphM0enEBYiXjtxCvpE\nsDK/oqMsNNTKQj1tzjgrD/mz3RUdrYWsLNTz4qPCuNyL5aGgTgTNre2s3VnBtaOtLOQNQ5I6ykMr\nHRx33Vy6VQXluKSjrbvpeafLQ/leKA8F9bffx4VHOdnYytxx9sH2lrlZ1rnMH6kqKws6OpHZ9K3e\ncZ27PLTSC+WhoE4EKwvKiY2wspA3zXaXFd6w8pBf2V1RS1GVlYW8yZvloaBNBM2t7bzpHlvIOpF5\nz9DkGEb0i2XVdhuEzp+sdpeFZo21q2dvOl0e6umxh4I2Eazb31EWsjMc75udlcKmg8dsaGo/smp7\nBVMHW1nI22aO7keoS1jVw6P3Bm0iWF1QQUxEKJ8ZbmUhbzs99tAauyrwC/uO1FJYWcecLLsa8Lbe\nUeFMH9KX1dt7tjwUlImgpa2dNTsrmDEq2cpCDhiWHMOQpOgeP8sxnrGqoAKx1kKOmZOVyqHqenaW\nn+yxYwRlIvik6Bgn6lv+cePSeJeIMDcrlU8OVHO0run8OxhHrd5ezuSBCSTH2RhRTpg5uh8hLunR\nyZ2CMhGs2t4xQb3NROac2VmptCussYntfdr+qjp2V9Qy28pCjukbE8G0zARWFfRceSjoEkFbu7Jm\newVXj0y2mcgcNDIllsGJ0TaFpY97w30fx1oLOWv22FSKjp5i75G6Hnn/oEsEGw8co/pUs7UWcpiI\nMHtsCuuLqjl2qtnpcMxZrCooZ+KA3qTG93I6lKB23ZgUROixsYeCLhGs3l5OZJiLK0dYWchpc7JS\naWtX1u60qwJfdKj6FDvKTtpJkw9Iio1gyqCEHhunyyOJQERmicgeESkUkUe6WB8hIi+7138iIoM6\nrfu+e/keEbnOE/GcTXu7snp7BVeNSCYq3CNz8phuGJMWx4CEKGs95KNWW1nIp8zJSmXvkToKK2s9\n/t7d/jYUkRDgCeBaoATYJCLLzphy8j7guKoOFZEFwH8DnxeR0XTMcTwGSAPeEpHhqtrW3bi6svnw\ncapqm6y1kI8QEWZnpfD0hweoqW8hPirM6ZBMJ6sLyhmXEU9GnyinQzF0dMSMjgglpQfKdJ64IpgC\nFKpqkao2A4uA+WdsMx94zv18MXCNiIh7+SJVbVLVA0Ch+/16xKqCcsJDXVw9MrmnDmEu0uyxqbS2\nK2t3HXE6FNNJyfF68kpqrCzkQ5JjI7l1UgYxEZ6vZngiEaQDxZ1el7iXdbmNe7L7GqDvBe4LgIjc\nLyK5IpJbVVV1SYG2tSuzxqT0yB/SXJrxGfGkxUfaIHQ+5nRrodlWFgoKfvONqKpPAk8C5OTkXFJj\n2sfmj/XaHKDmwnSUh1J5fv0hahtbiI208pAvWFVQzpi0OAb2jXY6FOMFnrgiKAX6d3qd4V7W5TYi\nEgrEA9UXuK9HdVSkjC+Zk5VCc1s77+yudDoUA5TXNLDl8AkrCwURTySCTcAwERksIuF03PxddsY2\ny4CF7ue3Au9ox6n5MmCBu1XRYGAYsNEDMRk/MqF/H/rFRdjE9j7CykLBp9ulIVVtFZEHgTVACPAX\nVd0hIo8Buaq6DHgaeF5ECoFjdCQL3Nu9AuwEWoGv91SLIeO7XC5h9thUXtp4mFNNrUTbPRxHrS6o\nYGRKLJlJMU6HYrzEI/0IVHWVqg5X1SGq+rh72Q/dSQBVbVTV21R1qKpOUdWiTvs+7t5vhKqu9kQ8\nxv/MHptCU2s77+6x8pCTKk82sunQMWaPtbJQMAm6nsXGN+UMSiAxJsLGHnLYmh0VqGJzDwQZSwTG\nJ4S4hFlj+/HO7koamq066OKZgxsAABUDSURBVJRVBRUMTY5hWL9Yp0MxXmSJwPiMOVmpNLS08Z6V\nhxxxtK6JTw5U203iIGSJwPiMKYMS6BsdbhPbO2TNjgraFWs2GoQsERifERriYuaYFN7ZdYTGFisP\neduqgnIyE6MZmWJloWBjicD4lLlZqZxqbuP9vZc2jIi5NNV1TWwoOsbsrBTrdBmELBEYnzI1M4E+\nUWGsts5lXvXmziO0tauVhYKUJQLjU8JCXMwcncJbuyqtPORFqwrKGdQ3itGpcU6HYhxgicD4nDnj\nUqlrauWjfUedDiUoHD/VzLr91czOSrWyUJCyRGB8zmVD+hLfK8zGHvKSN3dW0NauzLWyUNCyRGB8\nTkd5qB9rdx6hqdXKQz1tZUEFAxKiGJNmZaFgZYnA+KS541KpbWrlw71WHupJJ+qbWVd4lDlWFgpq\nlgiMT7p8aCLxvcJYaeWhHvXmjiO0tquNLRTkLBEYnxQW4uK6Mf14a6d1LutJKwrKGZAQRVZ6vNOh\nGAdZIjA+a+64tI7ykLUe6hHHTzXzceFR5o6zslCws0RgfNZlQ/rSOyqMlfllTocSkNbssNZCpkO3\nEoGIJIjIWhHZ5/7Zp4ttskVkvYjsEJF8Efl8p3XPisgBEdnmfmR3Jx4TWMJCXMwak8JaKw/1iJUF\n5QxOjLbWQqbbVwSPAG+r6jDgbffrM9UD96rqGGAW8GsR6d1p/cOqmu1+bOtmPCbAzLGxh3pEdV0T\n6/ZXM9daCxm6nwjmA8+5nz8H3HjmBqq6V1X3uZ+XAZVAUjePa4LE9CF96RNlncs87Y3TZaFxVhYy\n3U8E/VT19P/QCqDfuTYWkSlAOLC/0+LH3SWjX4lIxDn2vV9EckUkt6rKzg6DRViIi1ljO8pDNnOZ\n56zMLyczyYacNh3OmwhE5C0R2d7FY37n7VRVAT3H+6QCzwNfVNV29+LvAyOByUAC8L2z7a+qT6pq\njqrmJCXZBUUwuWFcGvXNbTaxvYdU1Taxoaia660sZNxCz7eBqs442zoROSIiqapa7v6i7/J/qojE\nASuBf1fVDZ3e+/TVRJOIPAN896KiN0FhamZfEmMiWJ5XZsMke8Ab28tp147mucZA90tDy4CF7ucL\ngaVnbiAi4cBrwF9VdfEZ61LdP4WO+wvbuxmPCUAhLuH6cam8s7uS2sYWp8Pxe8vyyhjeL4YRVhYy\nbt1NBD8DrhWRfcAM92tEJEdEnnJvczvwWeALXTQTfVFECoACIBH4cTfjMQHqhvGpNLW289auI06H\n4tdKTzSw6eBx5o23qwHz/5y3NHQuqloNXNPF8lzgy+7nLwAvnGX/q7tzfBM8JvTvQ3rvXizPK+em\nCRlOh+O3TnfOu97KQqYT61ls/ILLXR76YG8VJ+qbnQ7Hby3LK2N8RjyDEqOdDsX4EEsExm/cMD6N\n1nblje0VTofil4qq6theepIbrCxkzmCJwPiNMWlxDE6MZrmNPXRJluWVIYIlAvMplgiM3xARbhiX\nyvr91VSebHQ6HL+iqizLK2Pq4AT6xUU6HY7xMZYIjF+Zl51Ou8LyfBty4mLsKDtJUdUp5o1PdzoU\n44MsERi/MjQ5hrHpcSzdVup0KH5leV4ZoS5h9libicx8miUC43duzE4nv6SG/VV1TofiF9rbO8pC\nnx2eRJ/ocKfDMT7IEoHxOzeMT8MlsHSrXRVciA0HqimvaeSmCVYWMl2zRGD8Tr+4SC4bksjr28ro\nGOvQnMvrW0uJiQhlxqhzDg5sgpglAuOX5mencfhYPVsOn3A6FJ/W2NLG6oIKZo1NoVd4iNPhGB9l\nicD4pVljU4gIddlN4/N4a9cRaptaudnKQuYcLBEYvxQbGcaM0f1YkV9OS1v7+XcIUq9vLSUlLpKp\nmX2dDsX4MEsExm/dmJ3OsVPNfGDzGXfp2Klm3ttTxfzsNEJcNgGNOTtLBMZvfW54EgnR4SzZYuWh\nrqzIL6O1XbnRykLmPCwRGL8VHupifnYaa3cesRFJu/Da1lJGpsQyKjXO6VCMj+tWIhCRBBFZKyL7\n3D/7nGW7tk6T0izrtHywiHwiIoUi8rJ7NjNjLtitkzJobmtnWZ4NRNdZYWUdWw+f4OaJdjVgzq+7\nVwSPAG+r6jDgbffrrjSoarb7Ma/T8v8GfqWqQ4HjwH3djMcEmTFp8YxKjWPx5hKnQ/Epf99cTIhL\nbBIfc0G6mwjmA8+5nz9Hx7zDF8Q9T/HVwOl5jC9qf2NOu3VSBvklNeypqHU6FJ/Q2tbOki2lXDUi\nmaTYCKfDMX6gu4mgn6qeHgayAjhb18VIEckVkQ0icvrLvi9wQlVb3a9LgLNex4rI/e73yK2qslYi\n5v+Zn51GqEt4dYtdFQC8v7eKqtombs+xqwFzYc6bCETkLRHZ3sVjfufttKOv/9n6+w9U1RzgTuDX\nIjLkYgNV1SdVNUdVc5KSki52dxPAEmMiuGpkMku2lNJqfQr4e24JiTHhXDUy2elQjJ84byJQ1Rmq\nOraLx1LgiIikArh/Vp7lPUrdP4uA94AJQDXQW0RC3ZtlANYO0FySWydlcLSuiQ/2BffVYnVdE2/t\nOsJNE9IJC7FGgebCdPeTsgxY6H6+EFh65gYi0kdEItzPE4HLgZ3uK4h3gVvPtb8xF+KqEckkRIfz\n8qZip0Nx1OvbOvoO3JbT3+lQjB/pbiL4GXCtiOwDZrhfIyI5IvKUe5tRQK6I5NHxxf8zVd3pXvc9\n4NsiUkjHPYOnuxmPCVLhoS5unZTB27sqg3YaS1Xl77nFjO/fm+H9Yp0Ox/iRbiUCVa1W1WtUdZi7\nhHTMvTxXVb/sfr5OVbNUdbz759Od9i9S1SmqOlRVb1PVpu79OiaYLZjcn9Z25e9B2pQ0v6SG3RW1\n3DbJbhKbi2NFRBMwMpNimJaZwKJNh2lvD755Cl785BBR4SHMz05zOhTjZywRmIByx5QBFB9r4KPC\no06H4lU1DS0syytjfnY6sZFhTodj/IwlAhNQZo1NoU9UGC9tPOx0KF712pYSGlvauWvqAKdDMX7I\nEoEJKBGhIdw6KYO1O49QWRscN41VlRc/Ocz4jHjGpsc7HY7xQ5YITMBZMGUAre0aNOMPbTp4nH2V\nddw1daDToRg/ZYnABJwh7pvGf/vkMG1BcNP4xU8OERsZyvXjU50OxfgpSwQmIN07fRAlxxt4e9cR\np0PpUdV1TawuqOCWiRlEhYeefwdjumCJwASkmaP7kRYfybPrDjodSo96ObeY5rZ27rSbxKYbLBGY\ngBQa4uKe6YNYt7+a3RUnnQ6nR7S0tfPXdYe4fGhf60lsusUSgQlYCyb3JzLMxXMBelWwqqCcipON\n3HfFYKdDMX7OEoEJWH2iw7lpQjpLtpRy/FRgzWmsqjz90QEyk6K5crgNN226xxKBCWgLLxtEU2s7\niwJsVNLcQ8fJL6nhi5cPxuUSp8Mxfs4SgQloI1PiuGxIX55ff5CWAJq05ukPDxDfK4xbbHJ64wGW\nCEzA+9LlgymraWRFfpnToXhE8bF63txZwZ1TB1iTUeMRlghMwLt6ZDIj+sXy+3f3B8SopM98fBCX\nCAunD3I6FBMgLBGYgOdyCV+7agj7Kut4y887mB2ta+JvGw8xLzuNlPhIp8MxAaJbiUBEEkRkrYjs\nc//s08U2V4nItk6PRhG50b3uWRE50GlddnfiMeZs5malMiAhiife20/HLKn+6akPD9DU2s7Xrxrq\ndCgmgHT3iuAR4G1VHQa87X79T1T1XVXNVtVs4GqgHniz0yYPn16vqtu6GY8xXQoNcfHA5zLJKz7B\n+v3VTodzSY6faub59Qe5flwaQ5JinA7HBJDuJoL5wHPu588BN55n+1uB1apa383jGnPRbpmYQVJs\nBE+8V+h0KJfkmY8PcKq5jQftasB4WHcTQT9VLXc/rwD6nWf7BcBLZyx7XETyReRXIhJxth1F5H4R\nyRWR3Kqqqm6EbIJVZFgIX/nMYD4urGbL4eNOh3NRTja28My6g8wak8KIFBtOwnjWeROBiLwlItu7\neMzvvJ12FF7PWnwVkVQgC1jTafH3gZHAZCAB+N7Z9lfVJ1U1R1VzkpKSzhe2MV26a+pA+kaH8z9v\n7PGrewV/XXeQ2sZWHrzargaM5503EajqDFUd28VjKXDE/QV/+ou+8hxvdTvwmqq2dHrvcu3QBDwD\nTOner2PMuUVHhPLg1UNZX1TNh/v8Y17jmvoW/vzhAa4emWwzkJke0d3S0DJgofv5QmDpOba9gzPK\nQp2SiNBxf2F7N+Mx5rzunDqA9N69+Pma3X7Rr+CJ9wo52djCw9eNcDoUE6C6mwh+BlwrIvuAGe7X\niEiOiDx1eiMRGQT0B94/Y/8XRaQAKAASgR93Mx5jzisiNIRvXzuc7aUnWbW9/Pw7OKj4WD3PfnyQ\nWyZmMCo1zulwTIDqVv90Va0GrulieS7w5U6vDwKfGhRFVa/uzvGNuVQ3TkjnyQ+K+MWaPVw3JoWw\nEN/sW/mLN/fgcsF3Zg53OhQTwHzz029MDwtxCQ9fN4KD1fUs2njY6XC6lF9ygqXbyrjvisGkxvdy\nOhwTwCwRmKB1zahkpg5O4Bdv7uVoXZPT4fwTVeWnq3aTEB3OA58b4nQ4JsBZIjBBS0T48Y1jOdXU\nyk9X7XY6nH+yPL+c9UXVfHPGMOIiw5wOxwQ4SwQmqA3rF8v9n83k1S0lbCjyjaEnjp9q5r+W7WB8\nRjx3TR3odDgmCFgiMEHvoauHkdGnF//x+naaW52fvObxVbuoaWjhpzePI8RmHzNeYInABL1e4SH8\n17wxFFbW8ecPixyN5aN9R1m8uYQHPpfJ6DRrLmq8wxKBMcA1o/oxe2wKv3lrH9tLaxyJoaG5je+/\nlk9mYjQPXT3MkRhMcLJEYIzbT27KIiE6nIde2kpdU6vXj//osu0UH2vgJzdnERkW4vXjm+BlicAY\ntz7R4fxmQTaHqk/xw9e9O9rJy5sO80puCQ9dPZRpmX29emxjLBEY08nUzL5845rhLNlayqubS7xy\nzO2lNfzn0h1cMTSRb86wHsTG+ywRGHOGB68eytTBCfzH69t7fN6CE/XNfPWFzfR1X41YKyHjBEsE\nxpwhxCX8350TSY6L4IvPbGLvkdoeOU59cyv3P7+ZIycbeeKuifSNOeu8TMb0KEsExnQhKTaCF+6b\nSkSoi3ue/oTiY56dXbWhuY0vPbuJ3IPH+N/bs5k4oI9H39+Yi2GJwJiz6J8QxV/vm0JDcxv3PP0J\nJcc9kwwamtu477lNbDxwjF/ens288WkeeV9jLpUlAmPOYWRKHM98cQrVdc3M+7+PWVfYvVnNymsa\nWPiXjawvquYXt43nxgmfGp3dGK+zRGDMeUwa2IelD15O3+hw7n76E578YP8lzXe8ZkcFs3/zIdvL\navj157O5eWJGD0RrzMXrViIQkdtEZIeItItIzjm2myUie0SkUEQe6bR8sIh84l7+soiEdyceY3pK\nZlIMr339cmaNTeEnq3Zz2x/X8+G+qgtKCIer6/ne4nweeH4z/ftEseKhK5ifbVcCxnfIpZzZ/GNn\nkVFAO/An4LvumcnO3CYE2AtcC5QAm4A7VHWniLwCLFHVRSLyRyBPVf9wvuPm5ORobu6nDmVMj1NV\nXtpYzO/e2Ud5TSMTBvTmzikDGJsez9DkGMJCXLS3K1V1TWwvreHFTw7z7p5KXCJ8+TOD+c61IwgP\ntQtx4wwR2ayqnzpp7+5Ulbvcb36uzaYAhapa5N52ETBfRHYBVwN3urd7DvgRcN5EYIxTRIQ7pw7g\nlknpLN5cwu/f3c/Di/MBCA91kRQTQWVtIy1tHSdYiTERPHTVUO6YOsBmGTM+q1uJ4AKlA8WdXpcA\nU4G+wAlVbe20/KzXyyJyP3A/wIABA3omUmMuUERoCHdNHciCyQM4cLSOHWUn2Vl2kqraJlLiI0nt\n3YsBCVFMz+xrVwDG5503EYjIW0BKF6v+XVWXej6krqnqk8CT0FEa8tZxjTmXEJcwNDmWocmxVvc3\nfuu8iUBVZ3TzGKVA/06vM9zLqoHeIhLqvio4vdwYY4wXeeOadRMwzN1CKBxYACzTjrvU7wK3urdb\nCHjtCsMYY0yH7jYfvUlESoDpwEoRWeNeniYiqwDcZ/sPAmuAXcArqrrD/RbfA74tIoV03DN4ujvx\nGGOMuXjdaj7qFGs+aowxF+9szUetOYMxxgQ5SwTGGBPkLBEYY0yQs0RgjDFBzi9vFotIFXDoEndP\nBLo3lrD/s7+B/Q2C/feH4PwbDFTVpDMX+mUi6A4Rye3qrnkwsb+B/Q2C/fcH+xt0ZqUhY4wJcpYI\njDEmyAVjInjS6QB8gP0N7G8Q7L8/2N/gH4LuHoExxph/FoxXBMYYYzqxRGCMMUEuqBKBiMwSkT0i\nUigijzgdjzeJSH8ReVdEdorIDhH5htMxOUVEQkRkq4iscDoWJ4hIbxFZLCK7RWSXiEx3OiZvE5Fv\nuf8fbBeRl0Qk0umYnBQ0iUBEQoAngNnAaOAOERntbFRe1Qp8R1VHA9OArwfZ79/ZN+gYEj1Y/QZ4\nQ1VHAuMJsr+FiKQD/wrkqOpYIISOeVKCVtAkAmAKUKiqRaraDCwC5jsck9eoarmqbnE/r6XjP3/Q\nza0oIhnAXOApp2NxgojEA5/FPfeHqjar6glno3JEKNBLREKBKKDM4XgcFUyJIB0o7vS6hCD8IgQQ\nkUHABOATZyNxxK+BfwPanQ7EIYOBKuAZd3nsKRGJdjoob1LVUuAXwGGgHKhR1TedjcpZwZQIDCAi\nMcCrwDdV9aTT8XiTiFwPVKrqZqdjcVAoMBH4g6pOAE4BwXa/rA8d1YDBQBoQLSJ3OxuVs4IpEZQC\n/Tu9znAvCxoiEkZHEnhRVZc4HY8DLgfmichBOkqDV4vIC86G5HUlQImqnr4aXExHYggmM4ADqlql\nqi3AEuAyh2NyVDAlgk3AMBEZLCLhdNwcWuZwTF4jIkJHXXiXqv7S6XicoKrfV9UMVR1Ex7//O6oa\nVGeCqloBFIvICPeia4CdDobkhMPANBGJcv+/uIYgu2F+plCnA/AWVW0VkQeBNXS0EviLqu5wOCxv\nuhy4BygQkW3uZT9Q1VUOxmSc8RDwovuEqAj4osPxeJWqfiIii4EtdLSm20qQDzdhQ0wYY0yQC6bS\nkDHGmC5YIjDGmCBnicAYY4KcJQJjjAlylgiMMSbIWSIwxpggZ4nAGGOC3P8PYUfkhBhZgZUAAAAA\nSUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Compute the x and y coordinates for points on a sine curve\n",
    "x = np.arange(0, 3 * np.pi, 0.1)\n",
    "y = np.sin(x)\n",
    "\n",
    "# Plot the points using matplotlib\n",
    "plt.plot(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "9W2VAcLiL9jX"
   },
   "source": [
    "With just a little bit of extra work we can easily plot multiple lines at once, and add a title, legend, and axis labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 312
    },
    "colab_type": "code",
    "id": "TfCQHJ5AL9jY",
    "outputId": "fdb9c033-0f06-4041-a69d-a0f3a54c7206"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.legend.Legend at 0x7f0f39c04780>"
      ]
     },
     "execution_count": 106,
     "metadata": {
      "tags": []
     },
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZAAAAEWCAYAAABIVsEJAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0\ndHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nOydd3hU55X/P0ddQgXUCyCaKOp0424D\nphmEjXuJneYkm2w2cTZZJ9mNvcl6f042m7KJN1mn2Ym7sU3HBveCMYgiVChCNElIQkggVFA/vz/u\nyJGxJISm3Lmj+3me+8zMrd8B3Tn3Pee854iqYmNjY2Njc6n4mS3AxsbGxsaa2AbExsbGxmZI2AbE\nxsbGxmZI2AbExsbGxmZI2AbExsbGxmZI2AbExsbGxmZI2AbEZlggIneLyBazdVwMEXlHRL7k4Wv+\nQET+6Mlr2vgGtgGx8RlE5EoR2SYiDSJSLyIfishsAFV9RlVvMFujs4jIZBF5SUROO77nPhF5UET8\nh3pOVf1PVfWo0bLxDWwDYuMTiEgksAH4DRANpAD/DrSZqcuViMhE4GOgHMhS1SjgVmAWEGGmNpvh\niW1AbHyFyQCq+pyqdqnqeVXdoqr7AETkfhH5oGdnEVER+aqIlIrIWRF5XESk1/YviMh+ETkjIq+L\nSGp/F3aMCKodI4L3RCSj17YnHefeKCKNIvKxwxD0bF8oIgccx/4WkD4vYvDvwDZVfVBVqxzf96Cq\n3qWqZx3nWyEixY7v9I6ITOt1rX8RkUqHjoMiMt+x/hERedrxfpzj3+Y+ETnhGOn8sNc5/ETkIREp\nE5E6EXlRRKIv+r9j45PYBsTGVzgEdInIUyKyRERGDeKYG4HZQDZwG7AIQETygB8ANwNxwPvAcwOc\nZzOQBsQDu4FnLth+B8aP/yjgMPCo4zqxwCvAvwKxQBlwxQDXWQCs7m+jiEx26PyWQ/cmYL2IBInI\nFOAbwGxVjXB812MDXOtKYAowH/hRL0P0j8BK4BogGTgDPD7AeWx8GNuA2PgEqnoO40dPgT8AtSKy\nTkQSBjjsMVU9q6ongLeBXMf6rwL/T1X3q2on8J9Abn+jEFX9s6o2qmob8AiQIyJRvXZ5VVV3OM71\nTK/rLAWKVXW1qnYAvwKqB9AbA1QNsP12YKOqbnWc7+dAKHA50AUEA+kiEqiqx1S1bIBz/btjFFcA\nFAA5jvVfBX6oqhW9vu8tIhIwwLlsfBTbgNj4DI4f/PtVdTSQifGE/KsBDun9Y90ChDvepwK/driB\nzgL1GK6llAtPICL+IvKYw6Vzjr8/1ccO4jrJGPGMHv3a+3Mf1AFJA2xPBo73Ol+343wpqnoYY2Ty\nCHBKRJ4XkeQBzjXQv82rvf5t9mMYp4EMtY2PYhsQG59EVQ8AT2IYkkulHPiKqo7stYSq6rY+9r0L\nyMNwL0UB4xzrB4pl9FAFjOn54IjBjOl/d94AVg2w/STGD/yF56sEUNVnVfVKxz4K/HQQGi+kHFhy\nwb9NiKpWDuFcNhbHNiA2PoGITBWR74jIaMfnMcCdwPYhnO73wPd7guEiEiUit/azbwRGplcdEIbh\n7hosG4EMEbnZ4QL6JpA4wP4PA5eLyH+JSKJD2yQReVpERgIvAstEZL6IBALfcWjbJiJTROR6EQkG\nWoHzQPclaO3h98CjPe48EYlzxIxshiG2AbHxFRqBucDHItKMYTiKMH5ELwlVfRXj6fx5h1uqCFjS\nz+5/xXAbVQIlXILBUtXTGGm4j2EYoDTgwwH2LwPmYYxyikWkAXgZyAcaVfUgcA9GKvNpYDmwXFXb\nMeIfjznWV2ME/L8/WK29+DWwDtgiIo0Y33fuEM5j4wOI3VDKxsbGxmYo2CMQGxsbG5shYRsQGxsb\nG5shYRsQGxsbG5shYRsQGxsbG5shMaxmj8bGxuq4cePMlmFjY2NjKXbt2nVaVeMuXD+sDMi4cePI\nz883W4aNjY2NpRCR432tt11YNjY2NjZDwjYgNjY2NjZDwjYgNjY2NjZDwjYgNjY2NjZDwjYgNjY2\nNjZDwlQDIiJ/FpFTIlLUz3YRkf8RkcMisk9EZvTadp+jHWmpiNznOdU2NjY2NmD+CORJYPEA25dg\nVChNAx4Afgfg6MH8MEYV0DnAw4NsYWpjY2Nj4yJMnQeiqu+JyLgBdskD/uro1LZdREaKSBJwLbBV\nVesBRGQrhiEaqG/10Cl4HpprISYNYtNgZCr4W2cKTVe3srf8LLWNrZw738m51g6SR4Yye1w0cRHB\nZsuz8QUaa6DuMDRWQWM1BARD8gxIzDTeW4izLe1sK6ujqa2T7m6lW2Fi3Ahmpo4iwN/sZ27vwtt/\nBVP4dIvPCse6/tZ/BhF5AGP0wtixY4emovhVOPTa3z8HR8Gs+2Hu1yByoA6j5nKoppGXd1ewZk8l\nNefa+txnQuwIFqQn8NVrJhI9IsjDCm0sTXcXlG6F/D9D6RaMJocX4BcIKTPhym/B5MUgg2nU6Hma\n2jp5ZXcFrxVV8/HRerq6P/tdIkMCuHpyHHm5KSyYFo946XfxJN5uQJxGVZ8AngCYNWvW0Jqf3PUC\ntNTD6VKoKzVumm2/gY/+F7JvgwWPQHi860Q7ydmWdh5ZV8yavSfx9xOunRzHD5elMDFuBJEhgUSE\nBHD0dDM7j9Wz/Ug9f3z/CM99fIKvXjuRL1wxntAgf7O/go23U/oGbPg2NJyA8AS46jsw7kqISIKI\nBGhrgpN7jKX4FXjuDkieDtf9ECYt8BpDoqq8XlzNI+tKqD7XysS4EXzl6gksSE8gLjwYfz9D576K\ns7x14BRvH6xlw74qrpwUyyMr0pkUH2HyNzAX0xtKOVxYG1T1M72rReT/gHdU9TnH54MY7qtrgWtV\n9St97dcfs2bNUpeVMqk/Ctv/F3Y9BaGj4JY/GTeQyWwtqeEHrxZyprmdr107kfsuH0ds+MAuhNKa\nRn762kHe2F9DUlQIv79nJjljRnpIsY2l6DgPW38EO56A+HS49vswZQn4B/Z/TFeH4QZ+72dw9gTM\n+Bws/bnprq3Ks+f5tzVFvHXgFNOSIvmPlRnMTI0e8JjOrm6e3XGCn79+kJb2Lr541Xj++YYpBPq4\na0tEdqnqrM+s93IDsgz4BrAUI2D+P6o6xxFE3wX0ZGXtBmb2xET6w6UGpIfqInjpPqg/YjxdXfkg\n+Hn+j6m7W/nxhhKe3HaMqYkR/PdtOWQkR13SOXYcrefbL+zldFMbv7gtl2XZ3uueszGB2kPwwj1w\n+iBc9nWY/yMIDBn88Z3t8O5j8P5/w+jZcNvfTHMBF59s4L4/76SlvZMHF07m/svHXVJ843RTGz97\n7QAv5ldw7ZQ4Hr9rBiOCfdeh45UGRESewxhNxAI1GJlVgQCq+nsxnIy/xQiQtwCfV9V8x7FfAH7g\nONWjqvqXi13PLQYEoK0R1n8LilZD9u2w8vceNSJd3cpDL+/jpV0VfP6KcXx/yTSCAoZ2/dNNbXzl\nb7vYdfwM31k4mW9cP8n29drAqQPw1HJA4eYnYOL1Qz9X8RpY8w8QHG64h5Onu0zmYNh+pI4vP5VP\neEgAf/3CHNIShu6Gem7HCX74aiGZKVH8+f7ZFx3tWxWvNCCexm0GBEAV3vs5vP0fMPN+uPFXHvHz\ndnR18+CLBawvOMk/zU/jWwvSnP7Bb+3o4vuvFPLqnkq+ft1EvrtoqovU2liSUwfgqRtB/OC+DRA3\n2flz1pTAs7dDexN84XXXnHMQbCmu5hvP7WHMqFD+9sW5JI8Mdfqcb+6v4evP7iY+IoRnvzyX0aPC\nXKDUu+jPgPi2486TiMA13zWCibuehNd/aBgVN9LdrXzr+b2sLzjJQ0um8u2Fk10yWggJ9OcXt+Vw\n55yxPP52GU9v77OSs81wwB3GAyAhHT63Bvz84emboaHSNecdgPxj9Xzj2T1MS4pk9Vcvd4nxAJg/\nLYHnvnwZZ1ra+eKT+TS2drjkvFbANiCu5vp/g7lfhe2Pw7s/c+ulfv1mKRsLq/jB0ql89ZqJLj23\niPCTvAzmT43nR2uL2FpS49Lz21iAlnp45lbDeNy/0fWjhJiJcPdqOH8Wnl5lXM9NlNe38JW/7SJ5\nZAhPfX42o1ycsj597Ch+d/dMDtc28Y/P7aGzq9ul5/dWbAPiakRg8WOQcye8859Gyq8b2FJcza/f\nLGXVjNF8+aoJbrlGgL8fv7lrOlkpUfzjc7vZW37WLdex8UK6u+DlL0FTNdzxnDGB1h0k58Kdz0J9\nGbz4OeO6LqaprZMv/zWf9q5u/njfbEaGuWe+05VpsfwkL5N3Dtbykw0lbrmGt2EbEHcgAjf+EhIy\n4ZUHoKHCpac/fKqRb7+wl5zRUTx6U6Zbg9xhQQH8yREc/Pozuzk3jIbnw5p3fwplb8KSn8Home69\n1virYfmv4dj7RoaWC+lx85aeauJ/757BpPhwl57/Qu6aO5YvXzWepz46zjMf+77r1zYg7iIwFG59\nErraYfUXjFx4F9Dc1skDf91FaJA/v793JiGB7p/0FxsezK/vmE71uVYeXlvs9uvZmMzB1wwDknuP\nkRDiCXLuhKzb4J3/Bye2u+y0f/3oGG/sr+Ffl03jqrTPtPR2Cw8tmcZVabH8ZEMJR083e+SaZmEb\nEHcSm2Y8WZV/DG/9xCWn/K/XD3K0rpnf3jWDpCjXBAEHw8zUUXzz+jRe3VPJ2r3uD3jamETTKVjz\nVUjMhmU/99yMcRFY9t9GnbmXvwTnzzh9yiO1TTz22gGunRLH/ZePc17jIPH3E35+aw7BAf58+4W9\nPh0PsQ2Iu8m6xXiK+/DXUL7DqVPtOFrPk9uOcd+8cVw2IcY1+i6Br183kZmpo/jXV4sor2/x+PVt\nPMDm70F7M6z6kzGK9iQhkUZFh8YqWPdNp7IYu7qVf36pgOAAf366Ktvjc5kSIkP4j5WZ7C0/y+/f\nLfPotT2JbUA8wQ2PQuRoWP9PQ3ZlnW/v4nurCxgTHcr3Fk9xscDBEeDvx69uz0WB764uYDjNIRoW\nHNhkFA695nsem5fxGVJmGhUd9q+Dg5uGfJo/vH+E3SfO8uO8DBIiL2G2vAtZnpPM8pxkfvVGKUWV\nDaZocDe2AfEEweGw9L/gVIlRhHEI/GLrQY7VtfDTVdmEBZlXMmFMdBg/WDqN7UfqWVdw0jQdNi6m\ntQE2PmgkflzxLXO1XP6PEJ8Bm//FGA1dIqU1jfxiyyGWZCayIifZDQIHz0/yMogJD+KfXyrwSVeW\nbUA8xdSlMG25EZysP3pJhxaUn+VPHxzl7rljuXxirJsEDp7bZ48he3QU/7Fx/7CaNOXTbH0Ymmpg\nxW8GLozoCfwDjXhIQ/klz6VSVR5ZX0xokD8/WeneDMXBMDIsiEeWZ3CgupHndpwwVYs7sA2IJ1ny\nM6M/wsYHB+3fVTWKJEaPCOahJd5RUsTfT/hxXianm9r49RulZsuxcZaKfNj1F7jsHyBlxsX39wSp\n84wssI9+a8yGHyRbS2r48HAd316Q5jV1qRZnJjJvQgz/vfUQZ5rbzZbjUmwD4kkik40KpmVvwYEN\ngzpkY2EVu46f4buLJhMRYvKTYS9yx4zkjtlj+Mu2YxysbjRbjs1QUTXK7oyIN0qzexMLfwzBEbDx\nO4N64Grr7OLRTftJiw/n7stSPSBwcIgID69I59z5Dn6x9ZDZclyKbUA8zawvQOwUeOPfoatzwF1b\nO7p4bPMBpiZGcMvMMR4SOHi+u2gqESEB/GhtkR1Qtyr710P5drj+h0aszpsYEWM0azv+AZSsueju\nf/nwGMfrWvjR8nSv688xNTGSey9L5ZmPj7O/6pzZclyGd/0rDwf8A4yboq4U9vx1wF3/8uExKs6c\n51+XpX/SGc2biB4RxHcWTubjo/W8deCU2XJsLpXOdnjjYYibZriLvJHp9xr63np0wAeuU42t/ObN\nUhZMi/fYhMFL5dsLJxMVGsgj64p95oHLNiBmMGUJjLkM3nms3yyT001tPP72YeZPjefKNPMD5/1x\nx5yxjI0O4+dbDtHdRx9pGy8m/89GI7SFPzYebLwRP3+Y/2/GA1fBs/3u9suth2jv6uaHy9I9KO7S\nGBkWxLcdD1zvHqo1W45LsA2IGYgYN21TjdFXvQ9+82YprR1d/GDZNA+LuzQC/f349sI09ledY1NR\nldlybAbL+bNGRuD4ayBtodlqBmbKUqOD4TuPGS11L6C8voWX8iu4a85YxseOMEHg4Llj9lhSRoby\ni62HfGIUYqoBEZHFInJQRA6LyEN9bP+liOx1LIdE5GyvbV29tq3zrHIXMHYuTL3RmKHefPpTm6ob\nWnluRzm3zBzNxDgv80v3wYqcFCYnhPOLLYd8MtfdJ/not3C+Hm74iefKlQwVEZj/MJyrhJ1//Mzm\n37xVip+f8A/XTTJB3KURFODHN+dPYl9FA2/st77b1zQDIiL+wOPAEiAduFNEPjX+VNVvq2ququYC\nvwFe6bX5fM82VV3hMeGuZP7D0NEMH/zyU6t//24Z3ap83QI3BBhpvd+5YQpHTjfzyh67TpbXc/4s\nfPx/xrykpByz1QyO8VfBxPnw/i+MSY8Ojp1u5uXdldw9d6xpM84vlZtnjCY1JoxfbLW+29fMEcgc\n4LCqHlHVduB5IG+A/e8EnvOIMk8RNxkyb4H8v3zSTKfmXCvP7jjBzTNSGBNtndaYN6QnkDM6il+/\nUUpbp+t7Oti4kB1PQNs5uPp7Ziu5NOb/yBg1bf/dJ6v+561SAv2Fr13r2oZq7iTQ349/mm+4fV8r\nrjZbjlOYaUBSgPJenysc6z6DiKQC44G3eq0OEZF8EdkuIiv7u4iIPODYL7+21gsDV1c9aIxCHDfF\n798to6tb+cZ1bmrg4yZEjFFI5dnzvLzLHoV4LW2N8NHjMHkJJGWbrebSSM41dH/8e2hv5khtE2v2\nVHLvZanER1hj9NFDXm4KE+NG8Muth+iy8CjEKkH0O4DVqtr70TbV0eT9LuBXItLnI4iqPqGqs1R1\nVlycF6b3xU8zYiE7/o/a2lqe/fgEN09PYWyMdUYfPVyVFkv26CieeK/M0jeFT7PjD9B6Fq75rtlK\nhsZVDxql3nc9xW/eOkxwgD9fcXE7Z0/g7yd8a8FkSk818VqRdUchZhqQSqD37LjRjnV9cQcXuK9U\ntdLxegR4B5jueoke4up/htYG9q35bzq7lW9cb43Yx4WICF+7ZiLH6losfVP4LO3NRvB80gKj6q0V\nGTMHUq+k88PfsLngBPdcNtZrSpZcKkuzkhgXE8YT75VZNiPLTAOyE0gTkfEiEoRhJD6TTSUiU4FR\nwEe91o0SkWDH+1jgCsC6TYiTp9Mx/npyK57hlqxoUmO8OxVxIG7ISGR87Ah+/651bwqfZdeT0FJn\nvdjHhVz1bQKaTrLS732+cOV4s9UMGX8/4UtXTaCgooHtR+rNljMkTDMgqtoJfAN4HdgPvKiqxSLy\nYxHpnVV1B/C8fvrXaBqQLyIFwNvAY6pqXQMCrI+6ixg5x7djPrr4zl6Mv5/wwNUTKKxsYFtZndly\nbHro6jTmHI27ykghtzANSVdTouN4MGwzSRFBZstxiltmjiZmRBBPvGfNplOmxkBUdZOqTlbViar6\nqGPdj1R1Xa99HlHVhy44bpuqZqlqjuP1T57W7kraO7v5ackoDgRlklj854vWyPJ2bp6RQnxEML97\nx5o3hU+yfy2cq4B5XzdbidM8veMEj3esIL693KjlZWFCAv257/JxvH2w1pJFSa0SRPdpNuw7Sc25\nNjrnfA0aTjjVic0bCA7w5wtXjueDw6cprPDNTmyWQtXIvIqeCGmLzFbjFG2dXTy57RiNE5YY32fb\n/5gtyWnuvSyV0EB/nnjviNlSLhnbgJiMqvKH94+SFh9OxnV3wMixn8pztyp3zx1LRHAAf/zAejeF\nz1G+Ayp3wWVfAz9r3/Jr95yktrGNB66ZDHO/anyvinyzZTnFqBFB3D57DOsKKqlq+GypFm/G2n9N\nPsC2sjr2V53jS1eNR/wDYM5X4MQ2OLnXbGlOERESyKqZo9lUWMWpxlaz5QxvPvothIyE3LvMVuIU\n3d3KH94/QnpSJFdMioHcOyEowphVb3G+eOV4urqVJ7cdM1vKJWEbEJP5w/tHiA0PIi/XMYdyxr0Q\nFG5MlrI4n5uXSkeX8uzHvtfK0zKcOWY0L5t5PwRZN7sP4MOy05SeajIetkSMZlPT74HiV6HR2mnj\nY6LDuCE9kRd3ltPaYZ1KDrYBMZEjtU28c7CWey5LJSTQ31gZEgW5d0PhamisMVegk0yIC+eayXE8\n8/EJ2jvtIoum8PETIH4w5wGzlTjN3z46TvSIIJZlJ/195ZwvQ3enUZre4nxuXipnWjrYsM86Va1t\nA2Iiz3x8ggA/4a65Yz+9Ye5XfOamuP/ycdQ2tlm+5o8laW+GPX+D9JUQ1WeVIMtQefY8b+yv4fbZ\nYwgO8P/7hpiJMHmRca90tpkn0AXMmxjDpPhw/vrRMbOlDBrbgJjE+fYuVu+qYFFm4mfr+HxyU/zJ\n6BpnYa6ZHMe4mDCesphv1ycoXG0UTZzzZbOVOM2zHx8HjOSMzzD3K9Bca7iyLIyI8Ll5qeyraGBv\n+dmLH+AF2AbEJNbvO0nD+Q7uvSy17x1mf8m4KQ5s8KwwF+PnJ9w7bxy7jp+hqNJO6fUYqsYDSHw6\njLH2xMG2zi6e31HO9VMTGD2qjxpxE66D2Ck+ETe8aXoKI4L8LTMKsQ2ISTyz/Thp8eHMHR/d9w4T\nrzdSenf9xbPC3MCts0YTFuRvuQwTS3NyN1QVwKwveH/DqIuwubCauuZ27p3Xz8OWiPHAdXKPsViY\nnuzFDQVV1DV5v0vONiAmsK/iLAUVDdxzWaqRTdIXfv4w4z44+h6cPuxZgS4mMiSQldNT2OAYddl4\ngJ1/hsARkH272Uqc5m/bjzMuJoyrJsX2v1P2bRAQCrue8pwwN3HvZam0d3Xz/M7yi+9sMrYBMYGn\ntx8nNNCfm2ZcJLA5/V7wC/CJUcids8fS2tHN2r12rxC3c/4MFL0MWbdASKTZapyi+GQDu46f4Z7L\nUvHzG2AkFToSMm+GwpegrclzAt1AWkIE8ybE8NyOE17fsdA2IB6moaWDdQUnWTk9mciQwIF3jkiA\nKUth77PQYe3JeFmjo8hIjuS5HeV2lV53U/ACdJ6H2V80W4nTvLCznKAAP26ZOfriO8+8H9qbDONp\nce6YM4aKM+e9viCpbUA8zJq9lbR2dHP33H78uRcy6/NGG0+LF40DuGPOWPZXnWOfXR/LfagaKa0p\nM63T77wfWju6eHVPJYszEhkZNoiqu6NnQ9w0o2y9xVmUkUhUaCAv5Hu3G8s2IB7mhZ3lZKZEkpkS\nNbgDxl8Lo8b5hBsrLzeZ0EB/nt9pz0x3Gye2w+mDMPPzZitxmteKqmls7eSO2WMuvjMYwfSZ9zsS\nCPa5VZu7CQn056bpKbxeVM2ZZu9N5bcNiAcpqmygpOoct80a5A0BRvG7mffD8Q+h9pDbtHmCyJBA\nlmUnsW7vSZrbrF2y3mvZ87RRCifjJrOVOM3zO08wNjqMyybEDP6g7NsgIAR2Wz+YfvvsMbR3dfPq\nHu+NG9oGxIO8lG/4c/NyLnFWcM5dIP6w92n3CPMgd84ZQ3N7F+sLTpotxfdoazIm02WshOBws9U4\nxfG6ZrYfqee2WaMHDp5fSFi0MfN+34vGTHwLMy0pkpwxI3lhp/fGDU01ICKyWEQOishhEXmoj+33\ni0itiOx1LF/qte0+ESl1LPd5Vvml09rRxZq9J1mckUhU2EWC5xcSkQBpNxjBUYs3m5oxdhRp8eE8\nZ4EURctRsgY6mo3sPYvzYn45fgK3zLyE0XoPM+8zZuD7Qtxw9hgO1jR67cx00wyIiPgDjwNLgHTg\nThFJ72PXF1Q117H80XFsNPAwMBeYAzwsIqM8JH1IbCmpoeF8B7cP1p97IdPvhqZqKHvLtcI8jIhw\n++wxFJSf5VCN9TqweTV7noaYSZafed7Z1c1L+RVcOyWexKiQix9wIWPnGXHDvc+4XJunWZ6TTFiQ\nPy946QOXmSOQOcBhVT2iqu3A80DeII9dBGxV1XpVPQNsBRa7SadLeCm/nJSRocy7FH9ub9IWQViM\nT7ixVk5PIcBPeHlXhdlSfIfTh+HER0Z5c4vPPH/3UC2nGtuG/rAlYlS0PvoenLV2wkZ4cAA3Ziex\nvuAkLe3e530w04CkAL3NaoVj3YWsEpF9IrJaRHr+ogZ7LCLygIjki0h+bW2tK3RfMhVnWvjg8Glu\nvVR/bm8CgiDrNji4GVrqXSvQw8SGB3PtlDhe3VNJZ5dd5t0l7H3GiJPl3Gm2EqdZvauC2PAgrp8a\nP/ST5NxhvBa84BpRJnLLTCNu+LoXVrT29iD6emCcqmZjjDIuObVCVZ9Q1VmqOisuLs7lAgfDy7uM\nLIpBTYYaiOl3Q1e7UWXV4qyaMZpTjW18cPi02VKsT3cXFDwHkxZARKLZapzibEs7b+4/xYqcFAL9\nnfh5GjkWxl9tGFYvDUAPllmpoxgTHfrJ74g3YaYBqQR6j1FHO9Z9gqrWqWpPRbE/AjMHe6y3oKq8\nsqeCeRNi+q4keikkZkFitk+4sa6fFk9UaCAv7/bK/zZrUfYWNFYZ7iuLs2FfFe1d3dx8sTI/gyHn\nLjhz1JgbY2H8/ISbpo/mw7LTXtcz3UwDshNIE5HxIhIE3AGs672DiPRqPcYKYL/j/evADSIyyhE8\nv8GxzuvYfeIsx+tauGm6ixr6TL/HqLJaXeSa85lEcIA/ebnJbCmu5lyrXWDRKQqeh9BRMNmrw4CD\n4pXdFUxJiCAj2QU1vNJXGHNifCCYvmpGCqqwZo93pb+bZkBUtRP4BsYP/37gRVUtFpEfi8gKx27f\nFJFiESkAvgnc7zi2HvgJhhHaCfzYsc7reGV3BSGBfizJSrr4zoMh61bwC4R9z7vmfCayasZo2jq7\n2WihFp5eR1sjHNgIGTcbcTILc/R0M7tPnOXmGSn9V6m+FIJGGHNCitdAe4vz5zOR1JgRzEodxcu7\nK7xqToipMRBV3aSqk1V1oqo+6lj3I1Vd53j/fVXNUNUcVb1OVQ/0OvbPqjrJsXhlnY+2zi427Kti\nUUYi4cEBrjlpWLQxJ6RwtUhlsUcAACAASURBVOH7tjDZo6OYFB/Oajsba+jsX28UTvSBsu2v7q7A\nT4wsPZeRexe0N1q+MRvAzTNGc/hUE4Ve1JjN24PolubtA7U0nO9wnfuqh+zbDJ/3sfdde14PIyKs\nmjGaXcfPcPS0tWcNm0bB8zBqPIyZY7YSp+juVl7ZU8kVk2JJiBzC3I/+GDsPosbCPutnYy3LTiIo\nwI9XvChuaBsQN/Lqngpiw4O5cqBGOENh8mIIjjTKNVicldOTEcHuEzIUGiqNuQ7Zt1t+7sfOY/VU\nnDnPqhlOZipeiJ+f0Rel7G1oMieN31VEhQayMD2BtXsrae/0jvR324C4ibMt7bx14BR5uckEOJOO\n2BeBIUaAsGSd5X27SVGhzB0fzdq9J73Kt2sJilYDaoxILc4ruysZEeTPDRkJrj959u2gXVD8iuvP\n7WFWzUjhTEsH7x7yDmNoGxA3sWFfFR1d6nr3VQ/Ztxu+3UOb3XN+D7IyN4Wjp5vtPiGXSsELRg+M\nmIlmK3GK1o4uNhVVsTgzibAgF8UKexM/1UiB9wE31lVpcYwKC2SdlxQjtQ2Im1izp5LJCeGuSUfs\ni9QrITLFJ9xYS7KSCPL3Y+1e77gpLEF1IZwq9ong+TsHa2ls7SQvN9l9F8m6DSp3QV2Z+67hAQL9\n/ViWncTWkmqavKAlgm1A3EB5fQv5x8+Ql+uidMS+6PHtHn4Dmq09mzsqNJDrpsaxft9Jury8B7TX\nsO8F8Asw0nctzrqCSmLDg7h84hDrxA2GrFsAMXqmW5y83BRaO7rZWmJ+aRPbgLiB9fuMJ+kVOW58\nogLj6bO70+gBYXFW5qZQ29jGtjJrG0OP0N0NRa8YpUtGuPFH1wM0tnbwxv5T3JjthlhhbyKTYfxV\nhuG1eKxt5thRpIwM9YoRu21A3MC6vSeZMXYkY6KdLF1yMRIyID7DJ9xY102NJyI4wOtm2nol5dvh\nXCVkrjJbidO8XlxDe2c3K9zpvuoh6zaoPwKVu91/LTfi5yesyE3m/dLT1DW1XfwAd2ox9eo+yKGa\nRg5UN7p/9NFD1iqo2GH5stUhgf4szkzk9eJqWjusPUHS7RSuhoBQmLLUbCVOs3ZvJWOiQ5k+ZqT7\nL5a+AvyDodD6D1x5ucl0dSubCs2t4mAbEBezbu9J/ASWZXvIgPQ8hRZZP0Vx5fQUmto6eWN/jdlS\nvJeuDqPz4JTFlm9bW9vYxoeHT5OX48ZYYW9ComDyDYbL1+JVHKYmRjIlIYI1JruxbAPiQlSVdQUn\nuWJSLHERwZ656KhxkDLLMSfA2lw2IYa4iGA2FNi1sfrlyLvQUgeZt5itxGk2FVbRrbg3++pCMldB\nUw0c/9Bz13QTK3KT2XX8DOX15s0Fsw2IC9lbfpYT9S0s95T7qofMVUZaZ+0hz17Xxfj7Ccuyknjr\n4Cka7Qq9fVP0MgRHQdpCs5U4zdq9lUxNjCAtIcJzF01bBIEjjH9Hi9PjJjdzTohtQFzIuoKTBAX4\nsTjTw019Mm4CxCduiuU5SbR3drO1xHZjfYaOVqMo4LTlEOChEa6bKK9vYfeJs54JnvcmKAymLoWS\ntYY70MKMiQ5j+tiRbDCxmrVtQFxEV7eyYV8V102JIzIk0LMXj0yCcVcaBsTiKYrTxxgpimbeFF5L\n6RZoO2ckTlicjY7g73JPxQp7k7kKzp+BI+94/tou5sbsZPZXnaOstsmU69sGxEXsOFpPbWOb591X\nPWTeDHWlhivLwvj5Ccuyk3jvUC1nW9rNluNdFK2GEXEw7mqzlTjNhn0nyRnjgVT3vph4vRFQ94ER\n+7KsJEQwLW5oqgERkcUiclBEDovIQ31sf1BESkRkn4i8KSKpvbZ1ichex7LuwmM9zYZ9JwkN9Of6\nqfHmCJiWZ8xM9oFg+vLsZDq7ldeLzZ9p6zW0NcGhLUaDJH831IvyIMdON1NUeY7l2S5qsnapBAQb\nbsD9Gwy3oIVJjAphdmo0G/aZEwcxzYCIiD/wOLAESAfuFJH0C3bbA8xS1WxgNfCzXtvOq2quY1mB\niXR2dfNaUTXXT4t3TzG4wTAiBiZcZ6TzWtyNlZkSSWpMGOvtbKy/c+g1o3FUpvVLl/T82C11VZfO\noZC5yihGenireRpcxI05SZSeauJgdaPHr92vARGR34jI//S3uODac4DDqnpEVduB54G83juo6tuq\n2pOjth1wcbMA17D9SD11ze3mPVH1kHkzNJQbReMsjIiwPDuZbWWnOW3yTFuvofhVCE+EMZeZrcRp\nNuyrYlbqKJJHhponYtzVEBbrE26sJZlJ+AmmjEIGGoHkA7sGWJwlBSjv9bnCsa4/vgj0rl0eIiL5\nIrJdRFb2d5CIPODYL7+21j019DcWnmREkD/XTjHJfdXDlKVGv3QfqI11Y04S3QqbTZ5p6xW0noPS\nrZCx0iiiaWEOnzIqNdxo9sOWfwCk58HB16Dd2t0w4yKCmTcxhg37qjzeU6ffv0ZVfar3Arx0wWeP\nISL3ALOA/+q1OlVVZwF3Ab8SkT6bIqjqE6o6S1VnxcXFuVxbR1c3m4uqWZCeQEigv8vPf0mEjoRJ\n86F4jVFwz8JMSYggLT6c9XY2luG+6mrzicq76wuqEDHZfdVDxk2GW7B0i9lKnObG7GSOnm6m+OQ5\nj173oo8zIjJPREqAA47POSLyvy64diUwptfn0Y51F15/AfBDYIWqfuLPUNVKx+sR4B1gugs0XTLb\nyuo429LBjWakI/ZFxk1wrgIq881W4hQiRjbWzmP1nDpn7UCn0xS9YvR+GT3bbCVOoaps2HeSueOj\niXdl3/Ohkno5jIj3iRH74oxEAvzkk0rgnmIw4+FfAYuAOgBVLQBckUe4E0gTkfEiEgTcAXwqm0pE\npgP/h2E8TvVaP0pEgh3vY4ErgBIXaLpkNhScJCI4gKsnu7jv+VCZsgT8g3zipliWlYQqbC4axtlY\n589C2ZtG9pXF3VcHaxopq232XJ24i+HnbxRYPLTF8m6sUSOCuGJSLJsKPevGGtRfpKqWX7DK6Upk\nqtoJfAN4HdgPvKiqxSLyYxHpyar6LyAceOmCdN1pQL6IFABvA4+pqscNSHtnN68XV7MwI4HgAJPd\nVz2ERMGkhT7hxkpLiGByQvgnk86GJQc3QVe7o9qAtdm4rwo/gSWertQwED1urEOvm63EaZZlJVFe\nf57CSs+1hh6MASkXkcsBFZFAEflnjB98p1HVTao6WVUnquqjjnU/UtV1jvcLVDXhwnRdVd2mqlmq\nmuN4/ZMr9FwqHx4+zbnWTvMDgheScRM0njTKvFucpVnD3I1V/CpEjYHRs8xW4hSqysbCKi6bEENs\nuBeVYRk7D8ITfGLEfkNGAgF+4tEHrsEYkK8CX8fIkDoJ5Do+D3s2FlYRERLAlZNcH5x3iimLjb4H\nPnBTDGs31vkzUPa2kS3kiXLnbuRgTSNHapu9I3jeGz9/49+3dIsxWdPCjAwL4nIPu7EuakBU9bSq\n3u0YCcSp6j2qWucJcd5Me2c3W4qrWZieQFCAl/mmgyOMaq0la203lpU5sAm6O3wi+2qTw33l8UKj\ngyF9JXS2Qqn13Vg3etiNNZgsrAkisl5EakXklIisFZEJnhDnzXxYZrivlnnbE1UPGTdBY5XR/tTi\nLMtKHp5urJI1EDUWUmaYrcQpetxXc8d7mfuqh7GXGZM0fWDE7mk31mAenZ8FXgSSgGTgJeA5d4qy\nApv2VRERHMCVaV6SfXUhkxcZbqyStWYrcZpl2YnDz411/qzDfbXC8u6rQzVNlNU2s9TbYoU9fOLG\n2gptni8H4ko87cYajAEJU9W/qWqnY3ka8IIkbvPo6OpmS0kNC9O9KPvqQoIjYNICKFlneTfWpPgI\npiREsHE4TSo8uNlwX6X3W2TBMmwsdLivMrzQfdVDRo8by/qTCpdlJVJef56iSvdPKhyoFla0iEQD\nm0XkIREZJyKpIvI9YJPblXkxHx4+TcP5Du8LCF5IxkojG8vikwrBkY11fBi5sUrWQORon8i+2lRY\nxZzx0Z5r8zwUxsx1ZGOtMVuJ09yQbkwq3FDo/kmFA41AdmHUw7oN+ArGfIt3gK8Bt7tdmRezubCa\niOAArvKWyYP9MXmRY1Kh9W+KHjfWa8OhxHtrA5S95RPZV4dqmjh8qsl7Y4U9+PnDtBWGG8sHJhV6\nyo01UC2s8ao6wfF64TJsg+gdXd28XmLUvvJa91UPIVEwcb4RB7F4ifdJ8UZtrE3DIRvr4GvG5MH0\nvIvv6+VsKjRqXy3yxuyrC0nPc9TGsn6J9x43lrtrYw0q/1REMkXkNhH5XM/iVlVezEeO2ldeNZt2\nINLzHLWxrF3iHWBJVtInnR99mpI1EJFs+dpXAJuLqpgzLpr4CAuETVMvN0q8l1h/xL4wPRF/P3H7\nA9dg0ngfBn7jWK7DaOpkagMnM9lcVEV4cABXT/ayyYP9MWWJUeLdB26KpVmJdCu+3amw9RwcftMw\n/BavfXX4VCOHapq8P1bYg5+/0anw0BZob7n4/l5M9Igg5k2IcbsbazB/obcA84FqVf08kANEuU2R\nF9PZ1c3rxTXMnxZvfun2wRI6EiZeB8XWd2NNSYhgQtwINhf5sBvr0OtG6XYfcF9tLjQMvVdOHuyP\njJXQ0WwUsLQ4S7ISOVbXwgE3diocjAE5r6rdQKeIRAKn+HQZ9mHDjqP11De3syTTIk9UPaTnQcMJ\nOLnbbCVOISIszUzio7I66ny1U2HJGkfnwblmK3GaTUXVzEodRYI3lG4fLKlXQmi0TySeLMpIxE/c\n25RtMAYkX0RGAn/AyMzaDXzkNkVezKaiKkID/bnGKu6rHqYsBb8AY06IxVnicGNtKakxW4rraWuC\nw28Ykwct7r46erqZ/VXnWGIV91UP/gEw7UajiVeHtVPGY8ODmTs+hk1unIA7mFpY/6CqZ1X198BC\n4D6HK2tY0dWtvFZUw/VT4wkNsoj7qoewaBh/jU9kY6UnRZIaE+ab2VilW4zJbL7gvnK4GS3lvuoh\nfSW0N/mEG2tpViKHTzVRWuMeN9ZAEwlnXLgA0UCA4/2wIv9YPaeb2liSZcEbAowfpTNHobrQbCVO\nISIszUpiW1kdZ5rbzZbjWkrWwog4o8S4xdlUWEXumJGkjAw1W8qlM/5qCBnpEyP2RRmJiMCmQveM\nQgYagfz3AMvP3aLGi9lcVE1wgB/XTYk3W8rQmHojiL9P1MZamplEV7eydb8PubHaW4wRyLTlRjaQ\nhTlR10JR5TmWWvVhyz/QuF8OboZOa8fa4iNDmJ0a7bbEk4EmEl43wHK9Ky4uIotF5KCIHBaRh/rY\nHiwiLzi2fywi43pt+75j/UERWeQKPf3R3a1sLqri2ilxjAgOcOel3MeIGBh3pRGktbgbKzMlktGj\nQt0aHPQ4ZW9CR4tPua8sl2zSm/Q8aGuAI++arcRplmQlcqC6kbJa1/c7MS1SJyL+wOPAEiAduFNE\n0i/Y7YvAGVWdBPwS+Knj2HSMHuoZwGLgfx3ncwt7ys9Qc67NOvns/ZGeB3WH4ZRLGkqaRo8b6wNH\nTTKfoGStkf2TeqXZSpxmU1E1mSmRjIkOM1vK0JlwDQRH+sSIfVlWEv/v5iy31CIzM9VjDnBYVY+o\najvwPHDh41ce8JTj/WpgvoiIY/3zqtqmqkeBw47zuYVNhdUE+ftx/VSLuq96mLYcEJ+4KZZkJtLR\npbzpC26sjlajfMm0G40sIAtTcaaFgvKz1n/YCgg2JuEe2ABd1n5IiY8M4c45Y4kMCXT5uc00IClA\nea/PFY51fe6jqp1AAxAzyGMBEJEHRCRfRPJra2uHJPR8RxcL0uOJcMN/gEcJj4fUK3zCgOSOGUly\nVIjbgoMe5cjb0N7oE+6r1xwpo5Z2X/WQngetZ+Hoe2Yr8VoGU8rkChEZ4Xh/j4j8QkRS3S/NNajq\nE6o6S1VnxcUNbf7Gf96UxeN3+UjiWfoKqN0PtQfNVuIUIsLizCTeK62lqa3TbDnOUbLWKHw57mqz\nlTjN5qJqpiVFMj52hNlSnGfi9RAUDvutn43lLgYzAvkd0CIiOcB3gDLgry64diWfntE+2rGuz31E\nJACjhErdII91KWLxstqfMG258eoDo5ClWYm0d3bz1oFTZksZOp3tRu/zqTdCQJDZapyiuqGVXcfP\nsNSKcz/6IjDUaImwfwN0WfwhxU0MxoB0qlGNKw/4rao+DkS44No7gTQRGS8iQRhB8QtN/TrgPsf7\nW4C3HFrWAXc4srTGA2nADhdo8n0ik40yGT6Q4z5j7CjiI4KtnY115B0j28cn3FeO7Curxz96k54H\nLafhxDazlXglgzEgjSLyfeAeYKOI+AFOBwMcMY1vAK8D+4EXVbVYRH4sIj3Vfv8ExIjIYeBB4CHH\nscUYfdpLgNeAr6tql7Oahg3pK6GmEOrKzFbiFH5+wuLMRN4+eIqWdos+IZasNbJ9JlxrthKn2VxU\nzeSEcCbFh5stxXVMWgiBYT5RG8sdDMaA3A60AV9U1WoMd9F/ueLiqrpJVSer6kRVfdSx7kequs7x\nvlVVb1XVSao6R1WP9Dr2UcdxU1R1syv0DBt8yI21JDOJ1o5u3jk4tAQJU+nqMLJ8piwxsn4sTG1j\nGzuO1ftG8Lw3QWGQthD2r4du+xn1QgZTC6taVX+hqu87Pp9QVVfEQGzMYuQYSJnlEz1C5oyPJjY8\nyJq1sY6+Z2T5+ID76vXialSxfvpuX6TnQfMpOLHdbCVex0C1sD5wvDaKyLleS6OIuLdPoo37Sc+D\nqgKoP2q2Eqfw9xMWZSTy1oFTtHZY7AmxZK2R5TPRJYUdTGVzURUT4kYwOcGH3Fc9pC2CgBCfGLG7\nmoFKmVzpeI1Q1cheS4SqRnpOoo1bSHeEmXwgRXFpVhIt7V3WcmN1dRruq8mLjGwfC1PX1Mb2I/Us\nyUz0nWzF3gSHw6QFxr3S3W22Gq9iMPNAFvSx7r6+9rWxEKPGQVKuTzxVzR0fTfSIIGt1Kjz+IbTU\nGQkNFuf14hq6utU33Vc9pK+ExiqosJM9ezOYIPqPROR3IjJCRBJEZD2w3N3CbDxAeh5U7oKzJ8xW\n4hQB/n4sykjgzf0WcmOVrDWyeyZ95vnMcmwuqmJcTBjpST7smJi8CPyDfeKBy5UMxoBcgzF5cC/w\nAfCsqt7iVlU2nqEneOsDc0KWZCbR1NbJ+6WnzZZycbq7jKyetIVGlo+FqW9uZ1tZHUuzknzTfdVD\nSCRMmm8YENuN9QmDMSCjMAoVlmGk86aKT/+lDCNiJkJilk9kY82bGMPIsEBrZGMd32Zk9fiA+2pL\ncbXvu696SM+Dc5XGqN0GGJwB2Q68pqqLgdlAMvChW1XZeI70PKjYCQ0VZitxikB/P25IT+CNkhra\nOr3cjVWyBgIcZTIszqaiasZGh5GR7MPuqx4mLwa/QJ944HIVgzEgC1T1zwCqel5Vv4ljRriND5B+\nk/HqC26srCQa2zr5wJvdWN1dxr912kIIsnbBwbMt7Ww7fNr33Vc9hI40Uq5L1lq+KZurGMxEwhMi\nMkpE5ojI1SJi/ZKhNn8ndhIkZEHxq2YrcZorJsYSGRLARm92Y534yHBfZfiC+6qGzm5l2XBwX/WQ\nsRIayqFyt9lKvILBpPF+CXgPo2bVvzteH3GvLBuPkpFnpCda3I0VFODHDRmJbPVmN1bxGmNSWpov\nuK+qGD0qlMyUYeC+6mHKUsONVfyK2Uq8gsG4sP4JI/ZxXFWvA6YDZ92qysaz+JAba1l2Eo2tXurG\n6u4yJqOlLTQmp1mYhpYOPjx8mmXDxX3Vg+3G+hSDMSCtqtoKICLBqnoAmOJeWTYeJXYSJGT6RHDQ\nq91YJ7ZDU41vZF+VVNPRNUyyry4k4yaHG8vOxhqMAakQkZHAGmCriKwFjrtXlo3HSV8J5R9Dg1v7\ncrmdoAA/FmUksrXYC91YJQ731eTFZitxmg37qhgTHUr26CizpXieKUscbizrxw2dZTBB9JtU9ayq\nPgL8G0aPDus/Qtl8mp6grg/UxlqWbWRjvX/Ii9xY3d2Gi3DSAsu7r840tzvcV8nDy33VQ+jIv08q\nHOZurMGMQD5BVd9V1XWq2u4uQTYmEZtmuLF84KnqikmxRIUGepcb68RH0FRtuD8szpaSajq7lRuz\nh6H7qgfbjQVcogFxFSISLSJbRaTU8Tqqj31yReQjESkWkX0icnuvbU+KyFER2etYcj37DXyUT9xY\n1s7GCnTUxnqjpMZ7amMVv+KYPOgb7qvUmGEyebA/piwB/yCfeOByBlMMCMZExDdVNQ14k74nJrYA\nn1PVDGAx8CtHLKaH76pqrmPZ637Jw4DMm41XH2jfuSw72XBjeUM2Vlen4e6YvMjy7qu6pja2ldUN\nv+yrCwmJgonzjXtlGNfGGsw8kH/sa4TgJHnAU473T9FHTEVVD6lqqeP9SeAUEOdiHTa9iZkIidk+\nkeN+uaM21sZ9J82WAsc/gObavxtoC9NTun3ZcHZf9ZBxE5yrgMp8s5WYxmBGIAnAThF5UUQWu6iQ\nYoKq9jioqx3X6BcRmQMEYRR07OFRh2vrlyLSb0NpEXlARPJFJL+21kINh8wic5Xh1z1zzGwlThHo\n78eidGNSoelurKJXIHAEpN1grg4XsLHwJBNiR/h26fbBMmWJUeK96GWzlZjGYLKw/hVIw8i+uh8o\nFZH/FJGJAx0nIm+ISFEfy6caQKuqAv2mMohIEvA34POq2jNW/D4wFWOCYzTwLwPof0JVZ6nqrLg4\newBzUXqCvEXWH4Usz0mmub2Ltw+cMk9EV4eR2TZlieU7D55uauOjsjqWZQ9z91UPIZEw+QYjDtLt\nJbE2DzOoGIjjR77asXRilHhfLSI/G+CYBaqa2ceyFqhxGIYeA9HnHS4ikcBG4Iequr3XuavUoA34\nC0a5eRtXMCoVUmb5hBvrsgnRxIYHsd5MN9aRd+H8GZ9wX20uqqZbGZ6TB/sjc5UxOfT4NrOVmMJg\nYiD/JCK7gJ9hlHHPUtWvATOBVUO87jqgpy3ufcBn2nyJSBDwKvBXVV19wbYe4yMY8ZOiIeqw6YvM\nVVBdCKdLzVbiFAH+fizLSuLN/adoaus0R0TxKxAc6ROdB9fvPcmk+HCmJkaYLcV7SFtkuCeHqRtr\nMCOQaOBmVV2kqi+pageAw5104xCv+xiwUERKgQWOz4jILBH5o2Of24Crgfv7SNd9RkQKgUIgFviP\nIeqw6YuMlYD4jBurrbObN0pqPH/xzjbYvwGmLoOAfsN0lqCq4Tw7jtWzImeYTh7sj6Awwz1ZstZw\nVw4zBhMDeVhV+yxdoqr7h3JRVa1T1fmqmuZwddU71uer6pcc759W1cBeqbqfpOuq6vWqmuVwid2j\nqk1D0WHTD5HJMHaeT7ixZowdRXJUCOsLTHBjlb0FbQ2QYX331YYCI+dlRU6yyUq8kMxVcL7ecFcO\nM8yaB2Lj7WTeDLUHoKbYbCVO4ecn3JiTzHultZxt8XABhcKXIDQaJl7n2eu6gXUFJ8keHcW4WGs3\nwXILk+ZDcJRPPHBdKrYBsemb9JUg/lC4+uL7ejnLs5Pp6FJeL6723EXbmuDAJsMd6B/oueu6gaOn\nmymsbLBHH/0REAzTboT96w235TDCNiA2fRMeBxOuNQyIxQvGZaZEMi4mjPUFHqyNdXAzdJ6HrFs9\nd003sW7vSUTgxmzbgPRL5s3Qdg5Kt5qtxKPYBsSmf7JuhYYTUL7DbCVOISIsz0lmW9lpahs99IRY\n+BJEpsCYyzxzPTehqqwrqGT2uGgSo0LMluO9jL8WwmKh8EWzlXgU24DY9M+0G43+FYUvma3EaVbk\nJNOtsMETc0Ja6qHsTSO46mftW2x/VSNltc22++pi+AcYo5CDr0Frg9lqPIa1/7pt3EtwhJGiWPyK\n5VMU0xIiyEiOZM0eDzTMKlkD3Z2+4b4qOEmAn9iTBwdD1m3Q1WbEQoYJtgGxGZis26ClDo68Y7YS\np1mZm0JBRQNHat2c9V24GmInQ2KWe6/jZrq7lXV7K7kyLZboEUFmy/F+Rs+CUeN9YsQ+WGwDYjMw\nkxZAyEifuCmW5yQjAmv2utGN1VABxz80Rh8Wn3D38dF6Tja0ctP0FLOlWAMR4//96HvQ6MGMPxOx\nDYjNwAQEQXqeMaO6vdlsNU6RGBXC5RNjWLu3EnVXZllP2nPmUKv8eA9r9lQyIsifG9ITzZZiHbJv\nA+0eNqVNbANic3Gyb4OOZiM11eLk5aZwvK6FPeVnXX9yVdj3AoyebfRWsTCtHV1sKqxicWYSoUH+\nZsuxDrFpkJQL+4ZHNpZtQGwuztjLjZTUfS+YrcRpFmcmEhTgx1p3BNOrC+FUCWTffvF9vZw395+i\nsa3Tdl8NhezboGqv5YuRDgbbgNhcHD8/40fx8JvQaEJRQhcSGRLIwmkJbNhXRUeXi1uRFjwPfoE+\n4b56dU8lCZHBzJsYY7YU65G5CsTPJx64LoZtQGwGR86doF0+EUzPy02mrrmd90td2KGyq9OYRDZ5\nEYRFu+68JlDf3M47B0+Rl5uCv5+1EwFMISLRqOJQ8LzP90u3DYjN4IibDCkzoeA5s5U4zbVT4hkV\nFsjLu13oxip7y+h7nnOn685pEhv3naSzW1mZa7uvhkzOXdBQDsc/MFuJW7ENiM3gybkTaooMX7+F\nCQrwIy83ha3FNTS0uGiC5L7nIXSUT/Q9f3VPJVMTI0hPtvueD5mpy4xGYnut/8A1EKYYEBGJFpGt\nIlLqeB3Vz35dvZpJreu1fryIfCwih0XkBUf3Qht3k7nK8PH7wE1xy8zRtHd1s84VpU1aG+DARuPf\nJ8Daf4pltU3sPnHWDp47S1CYUYm5ZK1RmdlHMWsE8hDwpqqmAW86PvfF+V7NpFb0Wv9T4JeqOgk4\nA3zRvXJtAMO3P3mR4evvMqlFrIvISI5kamIEq/PLnT9ZyVrobPUJ99VL+RX4+wk3zbANiNPk3m2k\nv+9fd/F9LYpZBiQPgLXpRAAAHflJREFUeMrx/imMvuaDwtEH/Xqgp1HFJR1v4yS5dxm+/rI3zVbi\nFCLCLTNHU1DRwKGaRudOtvc5iJlkxIgsTGdXN6/sruC6KXHER9iVd51mzFyIngB7nzVbidswy4Ak\nqGpPc4ZqIKGf/UJEJF9EtotIj5GIAc6qas8jcAVgPy55ikkLjS57PnBTrJyeQoCf8PKuiqGf5PRh\nOLHNeNq0eOmS90prOdXYxq2zxpgtxTcQMUalx96HsyfMVuMW3GZAROQNESnqY8nrvZ8aNSX6qyuR\nqqqzgLuAX4nIJU/vFZEHHEYov7bWhWmbw5WAIGNOyMFN0FxnthqniA0P5top8byyp5LOoc4J2fM3\no3Nj7l2uFWcCL+VXEDMiiOunxpstxXfIucN4LXjeXB1uwm0GRFUXqGpmH8taoEZEkgAcr6f6OUel\n4/UI8A4wHagDRopIgGO30UC/+Ziq+oSqzlLVWXFxcS77fsOaGfdCV7tPTJS6ZeZoahvbeL/09KUf\n3NVppDVPXmTk/luY+uZ23thfw03TUwj0t5MzXcbIsTD+atj7jE/OCTHrL2UdcJ/j/X3A2gt3EJFR\nIhLseB8LXAGUOEYsbwO3DHS8jRtJyICUWbD7Kcu3u71+ajzRI4J4YecQgumlW6CpBqbf63phHmbN\nnko6utR2X7mD6Z+DM8fg6LtmK3E5ZhmQx4CFIlIKLHB8RkRmicgfHftMA/JFpADDYDymqiWObf8C\nPCgihzFiIn/yqHobmPE5qD0AFTvNVuIUQQF+rJqRwhv7azjV2HppB+/+K4QnWH7uh6ryYn45OaOj\nmJIYYbYc32PacmOO0O6nLr6vxTDFgKhqnarOV9U0h6ur3rE+X1W/5Hi/TVWzVDXH8fqnXscfUdU5\nqjpJVW9VVQ81urb5hMybIXCET9wUd8wZS2e3svpSgumN1cYIJOdOo52phSmsbOBAdSO32KMP9xAY\nYvyd7N8AzUNwlXoxtrPTZmgERxhGpOgVaD1nthqnmBgXztzx0Ty/o5zu7kG65PY+a9QG8wH31TPb\nTxAa6E9ert333G3MuA+6O3wie7E3tgGxGToz74eOFqNnusW5a+5YTtS38GHZIJ4Qu7thz9OQegXE\nTnK/ODfScL6DdQUnyctNJjIk0Gw5vkv8VBhzmU/EDXtjGxCboZMyE+LTYZf13ViLMhIZFRbIczsG\nka9/9F2oLzOeKi3Omj2VnO/o4u65qWZL8X1m3gd1h42Wxz6CbUBsho6I8SN6cjec3GO2GqcICfRn\n1YzRbCmuobbxIiG1nX+EsBij1pGFUVWe+fg42aOjyBodZbYc3yd9JQRHwa4nzVbiMmwDYuMcOXdA\nYBjs+OPF9/Vy7pxrBNNf2jVASm9DhTGJcsbnICDYc+LcQP7xMxyqaeLuuWPNljI8CAozuhWWrPOZ\nYLptQGycI3SkMTO9aDW01Jutxil6gunP7ThBV3/B9Py/GD7sWV/wrDg38Mz240QEB7A8xw6ee4zZ\nX4SuNp/IXgTbgNi4gjlfNqrR7vmb2Uqc5nPzxlFef563DvRRHKGz3bjxJy82ZhhbmPrmdjYVVnPz\njBTCgqydhmwp4qcZM9N3/tnyFa3BNiA2riAhA1KvNGID3V1mq3GKRRkJJEWF8JcPj3524/51RiXi\n2V/yvDAX88LOctq7urnLDp57nrlfhXMVcHCj2UqcxjYgNq5hzpeNiqOlW8xW4hQB/n7cOy+VbWV1\nHKy+oMz7zj/CqPEw8XpzxLmIjq5untp2jCsmxdgzz82gZwT78f+ZrcRpbANi4xqmLoOIZNjxhNlK\nnObO2WMJDvDjyW3H/r6yuhBOfGT4sP2sfdtsKqyi+lwrX7hivNlShid+/jD7y0Y6b3WR2Wqcwtp3\ngo334B8Isz4PZW9B7SGz1TjFqBFB3DQ9hVf3VHC2pd1Yue23EBRu+ZnnqsqfPzjKhNgRXDfFLttu\nGtPvgYBQ2PH/27v3sKrK7IHj3wWCoGgoKhZewLQQRSjIS14iLE3Hqay8dZlKzcxL1pTlzPSbcWa6\n2Tjl5DQ2qZU2lo2XtDGdGstSQw1QxAuZWioYGoMFoqJc1u+PfXAIIW7nnH2OvJ/n4Qn22WefdXbC\nOu/77r2Wd49CTAIxnCfuPvBtDFtfsTuSeruvbziFRaUsTc6EvKPWVWZX/8K66syLpR7+np1Zedzf\nNxwfH+9ugOXVmrS0LulNX+bVVy+aBGI4T1Abq7FS2jtw8rjd0dRLZNvm9OkUwuKkQ5RunQdaai1+\nermFm7/hkkA/bo9rZ3coRq+JUHwGkr23mLhJIIZzXTvVajbl5UNzgHH9IsjPO0FJ8hvWXcQtvPuK\npcwTp/lwzzHG9OxgLt31BKFR0GUwbJsH507bHU2dmARiOFfI5Vb/g+QFcPZk9ft7sMTINkwN3oJf\ncQGlfabaHU69vfH5IXxEuPda706EF5V+j8LpXKs4pxcyCcRwvr7ToDAPtnv3jYU+Wsw98gFbS7vy\ncX6Y3eHUS27BWd754gg3x1zGpZcE2h2OUaZjH6tKb9JcKCmyO5pas2UcKyItgXeBcOAQMFJVv6+w\nz/XAS+U2RQKjVXWViLwJXAfkOR67T1XT6hJLUVERWVlZFBbWshvdRSogIIB27drh51eP0t7t4q0b\nC7e8Yt0f4uulZcJ3r6TJmWO8FziWLzcc4IaubRDxzoXnBZu/obC4hEnXe3f5+YtSv0fhnVFWb52Y\nUXZHUyt2TYTOAD5W1edFZIbj5yfL76CqG4BYOJ9wDgDl71KbrqrL6xtIVlYWzZo1Izw83Gv/ODiL\nqpKbm0tWVhYREfW8R6Dvw/D2SNi1HGLHOCdAdyopho0vQJsooq8aybur97LlYC7Xdm5ld2S19sPp\ncyxOOsTPoi+lc5sgu8MxKuoyyGqLsPkliB7hVfcZ2RXpLUBZNbFFQHV1se8A1qmq01eaCgsLCQkJ\nafDJA0BECAkJcc5orPONEBoNG//knTV/di+3ejckzOCO+A60btaYv3160O6o6uT1zw9x6lwJUxLN\n6MMj+fhA30cgJwP2f2h3NLViVwIJVdVsx/fHgNBq9h8NvFNh2zMiki4iL4lIlXW1RWSCiKSISEpO\nTk5V+9Q07oue086Fjw9c/2ur8VL6Uucc011KiuGzWRDaHSJ/ToCfL+P7RbD5wH/ZceT76p/vQfIL\ni3jj828Y3C2UyLbN7Q7HqEr32yC4I2x41up46SVclkBEZL2I7K7k65by+6mqAlX2eBSRS4FooHxq\n/hXWmsg1QEsqTH9VOP5rqhqvqvGtW7euz1syauvKIXDZ1fDpLKuSrbfY9U848TUkzDg/nXBX746E\nNPVn9kf7bA6udhYnHeJkYTFTE7vYHYrxU3z9rA9cx9IhY7Xd0dSYyxKIqt6gqt0r+VoNHHckhrIE\nUUnt7PNGAu+p6vlLFFQ1Wy1ngTeAnq56H+7wzDPP0K1bN3r06EFsbCzbtm1j/Pjx7N271+7Q6kcE\nEn8DeUdgx2K7o6mZkmL47AVoGw2Rw85vDmrciCmJnfn8QC6b9lc+kvU0eaeLmL/pGxIj29A9zHQc\n9HjRI6B1V/jkGa+Z9rVrCut9oKyh9L3AT6XcMVSYviqXfARr/cRrK5Jt2bKFNWvWsH37dtLT01m/\nfj3t27dnwYIFREVF2R1e/V0+0LpMceNsKDpjdzTVS18K338DCb+yEmA5d/bqQFhwIC/8ex+lVTWc\n8iCvfHqA/MIiHh90pd2hGDXh4wuJT0HufthZccbeM9l1FdbzwD9FZBxwGGuUgYjEAxNVdbzj53Cg\nPfBZhecvEZHWgABpgFNqTPz+X3vY+22+Mw51XtRlzfndz7tV+Xh2djatWrWicWNrGadVK+sqn4SE\nBGbPnk18fDxBQUFMmzaNNWvWEBgYyOrVqwkNDSUnJ4eJEydy5MgRAObMmUPfvn2dGn+9iVi/FIuG\nQcrr0Gey3RFV7dxpaw760li4cugFDzdu5Msvb7yCx5btZO3ubIb18NxOfpknTvPm54e47ap2RF1m\n1j68RuTPHNO+z1u1sjy8bbItIxBVzVXVgaraxTHVdcKxPaUseTh+PqSqYapaWuH5iaoa7ZgSu1tV\nC9z9Hpxl0KBBZGZmcsUVVzBp0iQ++6xiroRTp07Ru3dvdu7cyYABA5g/fz4A06ZN49FHHyU5OZkV\nK1YwfryHNjqK6A+dEqwrsjy5cFzSy5B/FG567oLRR5lbrwrjytBm/Pmjrygq8dzFzj9/tA8ReHzw\nFXaHYtSGCAz8rdVwKuV1u6OplimIU85PjRRcJSgoiNTUVDZt2sSGDRsYNWoUzz///I/28ff3Z9gw\naz4+Li6O//znPwCsX7/+R+sk+fn5FBQUEBTkgdf6D34WXu0PG56Bn/3Z7mgulJcFm+dYNa86Xlvl\nbr4+wvTBVzJ+cQpLkzO5p7fnlQXZlZXHqrRvmZRwubnr3Bt1SoCI66xRSPRIaBpid0RVMgnEA/j6\n+pKQkEBCQgLR0dEsWrToR4/7+fmdv7zW19eX4mJrga20tJStW7cSEBDg9phrLbSb1Qo2eb5V9r1t\ntN0R/dj6mVbF3Rv/UO2uA7u2oXenlsz+cB9Du7clJMhzphlUlWfXZtCyqT8TEy63OxyjLkRgyCyY\n1xc+ngk3z7U7oip5zy2PF6l9+/axf//+8z+npaXRsWPNPtUOGjSIuXP/948rLa1O1Vzc5/pfQWAL\nWPsEqActQmd+AbuWwbVTalRxV0T44y3dOXW2mOfWfemGAGtuTXo2W77OZdrALjQP8NISMga06Qq9\nH4LtiyEz2e5oqmQSiM0KCgq49957iYqKokePHuzdu5eZM2fW6Lkvv/wyKSkp9OjRg6ioKF599VXX\nBltfgS2s+d0jSbB7hd3RWEpLYN2TENQW+v2yxk/rEtqMCQM6sTw1i21f57owwJr7/tQ5Zr6/hx7t\nLuFuD5xaM2opYQY0uxTWPmb9O/VAop70SdDF4uPjNSUl5UfbMjIy6Nq1q00ReSaXnpPSEph/PRR8\nB5O22t/hb/McWP87uH0hRN9Rq6eeOVfCjS99RqCfLx883B//RvZ+Hnt82U7e23GUf03pZ668uljs\nXgHLx8LQ2VZhUpuISKqqxlfcbkYghnv5+MKwOXAqB9ZOtzeW7760FvUjh0H322v99EB/X35/czf2\nf1fAgs1fuyDAmtu8/78sT83iwQGdTPK4mHS7zVpQ//iP8EOm3dFcwCQQw/3CroYBT1glQ3avtCeG\nkmJYNREaN7MSWh1rgA3sGsqQ7m2Z85/97D6aV/0TXODMuRJ+/d4uIlo15eGBpmTJRUUEfv4X0BJY\nOcHjprJMAjHs0f8xCIuDNY9C/rfuf/3P58C3O6xLioPqVyPt2eHRtGzqz9R3dlBw1v0lKGa+v4cj\nJ07z7PBoAvx83f76hou1jLD+nR5Jgk0v2h3Nj5gEYtjDtxEMf83qn756snuvyspMtq6x73YbdBte\n78O1aOrPnNGxHM49xW9Xu7eqzrvJR3g3JZOpiZ3pc7nn3i9g1FOPUdD9Dvj0OeuqQQ9hEohhn1ad\nYdDTcPAT6xfDHfKOwtI74ZIwp97Q2LtTCA8P7MLK7UdZkZrltOP+lN1H8/i/1Xvo17kVj9xg7ji/\nqInAsBeheRisGA9nPKOtgEkghr3ix8JVd1v9N9JcXEDu3GlYOsYq6jhmKTRp6dTDT03sQq+Iljy1\narfL+4bknS5i0pLthDT15y+jY/H1MT1tLnoBl8DtC6wp36V3QZH9bbhNAvEAx44dY/To0Vx++eXE\nxcUxdOhQvvrqq1odY+jQofzwww8uitCFRKxF7IgB8P5UOLTZNa+jak2VZafDHQutG7WczNdHmHvn\nVbRu1pj730zmq+Mnnf4aAKfPFTPhrRSy887wyl1Xe9Sd8IaLdegFw1+Fw5/DygdsX1Q3CcRmqsrw\n4cNJSEjg4MGDpKam8txzz3H8+PFaHWft2rUEB9t8T0Vd+frByLegZSfrk9V3Gc49fmkprHsC9qyE\nG2bCFYOde/xy2jQL4B/jeuHv68M9C7eRecK5XZjPnCth3JspJB86wZ9HxnJ1hxZOPb7hBaLvgEHP\nQMb78O8ZtlZ1MLWwyls3A47tcu4x20bDkOerfHjDhg34+fkxceL/KtLHxMSgqkyfPp1169YhIjz1\n1FOMGjWK7OxsRo0aRX5+PsXFxcybN4/+/fsTHh5OSkoKBQUFDBkyhH79+pGUlERYWBirV68mMDCQ\ngwcPMnnyZHJycmjSpAnz588nMjLSue+3rgKD4a5/wsJB8PpNMPptCHdCafqSIlj1kFWqpM8U6Dut\n/sesRoeQJiwe15ORr27hnoXbWPJAb8KC61/UsLCohPGLk9n2TS4vjozl5hjPLSdvuNi1U+BkNmz5\nK/g0stYSfdx/BZ4Zgdhs9+7dxMXFXbB95cqVpKWlsXPnTtavX8/06dPJzs7m7bffZvDgwecfi42N\nveC5+/fvZ/LkyezZs4fg4GBWrLDKhkyYMIG5c+eSmprK7NmzmTRpksvfX620CIdxH0HT1vDWrbBr\nef2Od+60tWC+axkM/J31S+asnu/ViGzbnDfu70luwTl+PnczSQf+W6/jHcsr5BcLvyDpYC6zR8Rw\n61VhTorU8Fo3/hF6Pghb/wZvj4JC99+HZEYg5f3ESMHdNm/ezJgxY/D19SU0NJTrrruO5ORkrrnm\nGsaOHUtRURG33nprpQkkIiLi/Pa4uDgOHTpEQUEBSUlJjBgx4vx+Z8+eddv7qbGyJLL0TlgxzprO\nGvA4+NXyE/yRbfCvaZDzpbXGEn+/S8L9KXEdW7B6Sl8efCuVuxduY8aQSB7o3+l8ZeWa+nDPMZ5c\nkc654lLmjIrllliTPAzAxweGvgBtIq2qDgtuhNFLoJX7bia1ZQQiIiNEZI+IlDq6EFa1300isk9E\nDojIjHLbI0Rkm2P7uyLi757Ina9bt26kpqbWeP8BAwawceNGwsLCuO+++1i8+MJe42XdDeF/5d9L\nS0sJDg4mLS3t/FdGhpPXGpylSUu4ZxXEjIFNs+GvPWHPqprN9RbmwZpfwuuD4exJuGu5LcmjTKfW\nQbw3uS83dW/Ls2u/ZMSrW9i0P4ea1KA7knuaJ5en8+BbqbRv0YQ1U/uZ5GFcKH6s9fty6jv4W2/4\n4DE4Wbs11LqyawprN3AbsLGqHUTEF3gFGAJEAWNEpKxJ+CzgJVXtDHwPjHNtuK6TmJjI2bNnee21\n185vS09PJzg4mHfffZeSkhJycnLYuHEjPXv25PDhw4SGhvLAAw8wfvx4tm/fXqPXad68ORERESxb\ntgywFu937tzpkvfkFH4B1tUm966BgOaw7F6Yn2j1Vv92h7UwXqboDGT8y7o+/sVukPoG9J4Ek7dB\nlxvsew8OQY0b8cqdV/Ps8GiO/nCGexZ+wW3zkliWkklGdv75zoalpcrx/EI+zjjO2DeTuW72BpZv\nz+LB6zqx4qFr6dTaAxuFGZ4hor9VnPTqX0Dqm/ByrNU2IWONVbjURWyZwlLVDKC6oXxP4ICqfu3Y\ndylwi4hkAInAnY79FgEzgXmuiteVRIT33nuPRx55hFmzZhEQEEB4eDhz5syhoKCAmJgYRIQXXniB\ntm3bsmjRIv70pz/h5+dHUFBQpSOQqixZsoSHHnqIp59+mqKiIkaPHk1MTIwL350TRPSHBzfC9kXW\nL8Ynf7S+/INAfKD4LJQ4puICW0L34RA/Di67cGrPTiLCnb06cHtcGMtTs/jbhoNMX54OgH8jH1oH\nNea7k4UUlVgjk1ZBjZl6fWfG9OpgugoaNdOsLQx7ybpYZMOz1gepL/5uPdYiAsa84/TL120t5y4i\nnwKPq2pKJY/dAdxU1iNdRO4BemEli62O0Qci0h5Yp6rdq3iNCcAEgA4dOsQdPnz4R4+bcu4X8uhz\nUvAdHNwAR1OtBNLIHxoFQofeEN7PuiTYC5SUKt/8t4A93+az99t8ck6epe0lAVwaHEiHlk3o0ynE\n9vLwhpcrKoTsnZC51Sp/cus8azRfB1WVc3fZCERE1gNtK3noN6q62lWvW5Gqvga8BlY/EHe9ruEi\nQW0gZpT15cV8fYTObZrRuU0zs65huIZfgHXjYYdeLnsJlyUQVa3v5PNRoH25n9s5tuUCwSLSSFWL\ny203DMMw3MiTx8jJQBfHFVf+wGjgfbXm3DYAZe3j7gXqNaJpSF0Zq2POhWEYNWXXZbzDRSQL6AN8\nICIfOrZfJiJrARyjiynAh0AG8E9V3eM4xJPAL0XkABACLKxrLAEBAeTm5po/nFjJIzc3l4CAALtD\nMQzDCzT4nuhFRUVkZWVRWGh/ZUtPEBAQQLt27fDz847FaMMwXM/ti+jews/Pj4iICLvDMAzD8Dqe\nvAZiGIZheDCTQAzDMIw6MQnEMAzDqJMGtYguIjnA4Wp3rFwroH41ub2fOQfmHDT09w8N8xx0VNXW\nFTc2qARSHyKSUtlVCA2JOQfmHDT09w/mHJRnprAMwzCMOjEJxDAMw6gTk0Bq7rXqd7nomXNgzkFD\nf/9gzsF5Zg3EMAzDqBMzAjEMwzDqxCQQwzAMo05MAqkBEblJRPaJyAERmWF3PO4kIu1FZIOI7BWR\nPSIyze6Y7CIiviKyQ0TW2B2LHUQkWESWi8iXIpIhIn3sjsndRORRx+/BbhF5R0QadOlqk0CqISK+\nwCvAECAKGCMiUfZG5VbFwGOqGgX0BiY3sPdf3jSs1gIN1V+Af6tqJBBDAzsXIhIGPAzEO1po+2L1\nKWqwTAKpXk/ggKp+rarngKXALTbH5Daqmq2q2x3fn8T6o9HgerCKSDvgZ8ACu2Oxg4hcAgzA0XtH\nVc+p6g/2RmWLRkCgiDQCmgDf2hyPrUwCqV4YkFnu5ywa4B9QABEJB64CttkbiS3mAE8ApXYHYpMI\nIAd4wzGNt0BEmtodlDup6lFgNnAEyAbyVPUje6Oyl0kgRo2ISBCwAnhEVfPtjsedRGQY8J2qptod\ni40aAVcD81T1KuAU0NDWA1tgzT5EAJcBTUXkbnujspdJINU7CrQv93M7x7YGQ0T8sJLHElVdaXc8\nNugL3Cwih7CmMBNF5B/2huR2WUCWqpaNPpdjJZSG5AbgG1XNUdUiYCVwrc0x2cokkOolA11EJEJE\n/LEWzd63OSa3ERHBmvfOUNUX7Y7HDqr6K1Vtp6rhWP//P1HVBvXJU1WPAZkicqVj00Bgr40h2eEI\n0FtEmjh+LwbSwC4kqKjBt7StjqoWi8gU4EOsqy5eV9U9NoflTn2Be4BdIpLm2PZrVV1rY0yGPaYC\nSxwfpL4G7rc5HrdS1W0ishzYjnV14g4aeFkTU8rEMAzDqBMzhWUYhmHUiUkghmEYRp2YBGIYhmHU\niUkghmEYRp2YBGIYhmHUiUkghuFGIpJUi30/FZH4avY5JCKtanHM+0TkrzXd3zB+ikkghuFGqtqg\n71w2Li4mgRhGJUTkGhFJF5EAEWnq6AHRvZL9VolIquPxCY5tHUVkv4i0EhEfEdkkIoMcjxU4/nup\niGwUkTRHb4n+1cQzT0RSHK/z+woPPyEiu0TkCxHp7Ni/tYisEJFkx1dfp5wYwyjH3IluGJVQ1WQR\neR94GggE/qGquyvZdayqnhCRQCBZRFao6mERmQXMA74A9lZStfVO4ENVfcbRc6ZJNSH9xvE6vsDH\nItJDVdMdj+WparSI/AKravAwrN4dL6nqZhHpgFVJoWvtz4RhVM0kEMOo2h+waqEVYjUSqszDIjLc\n8X17oAuQq6oLRGQEMBGIreR5ycDrjkKVq1Q1rZJ9yhvpGOE0Ai7Fam5WlkDeKffflxzf3wBEWSWb\nAGjuqKhsGE5jprAMo2ohQBDQDLigdamIJGD9oe6jqjFYtZECHI81warcjOMYP6KqG7EaNB0F3nSM\nHiolIhHA48BAVe0BfFAhHq3kex+gt6rGOr7CVLWg2ndsGLVgEohhVO3vwP8BS4BZlTx+CfC9qp4W\nkUislr9lZjme91tgfsUnikhH4LiqzsfqcvhTpdGbY/XfyBORUKz2yuWNKvffLY7vP8Iqflj2epWN\nggyjXswUlmFUwjEiKFLVtx3rDkkikqiqn5Tb7d/ARBHJAPYBWx3PvQ64BuirqiUicruI3K+qb5R7\nbgIwXUSKgAKgyhGIqu4UkR3Al1jdMT+vsEsLEUkHzgJjHNseBl5xbG8EbMSaTjMMpzHVeA3DMIw6\nMVNYhmEYRp2YBGIYhmHUiUkghmEYRp2YBGIYhmHUiUkghmEYRp2YBGIYhmHUiUkghmEYRp38P60k\nhJOkqACmAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "y_sin = np.sin(x)\n",
    "y_cos = np.cos(x)\n",
    "\n",
    "# Plot the points using matplotlib\n",
    "plt.plot(x, y_sin)\n",
    "plt.plot(x, y_cos)\n",
    "plt.xlabel('x axis label')\n",
    "plt.ylabel('y axis label')\n",
    "plt.title('Sine and Cosine')\n",
    "plt.legend(['Sine', 'Cosine'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "R5IeAY03L9ja"
   },
   "source": [
    "###Subplots "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CfUzwJg0L9ja"
   },
   "source": [
    "You can plot different things in the same figure using the subplot function. Here is an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 281
    },
    "colab_type": "code",
    "id": "dM23yGH9L9ja",
    "outputId": "14dfa5ea-f453-4da5-a2ee-fea0de8f72d9"
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXIAAAEICAYAAABCnX+uAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0\ndHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3deVzU1f7H8ddh2HcFFAVZVBQ3ZHPN\nbLG6mpZmmkuall3LVtv35bbdbrua3XJLM9MsM8tKK7PScgMRRVFBcEFUQARk387vD/D+bLFchvnO\n8nk+Hj4eMsLMe0bn7fme+Z7zVVprhBBC2C4nowMIIYS4MFLkQghh46TIhRDCxkmRCyGEjZMiF0II\nGydFLoQQNk6KXDgspdSNSqlvjc4hxIVSch65sHdKqX7AK0AXoA5IB6ZqrbcYGkwIM3E2OoAQTUkp\n5QusBKYASwFX4GKgyshcQpiTTK0Ie9cBQGu9WGtdp7Wu0Fp/q7XerpSaqJRaf+oblVJaKXW7UipD\nKVWklJqplFKn/fktSql0pdQJpdRqpVS4EU9IiN+TIhf2bi9Qp5RaoJQapJRq9jffPwToAcQANwD/\nAFBKDQUeB4YDQcA6YHGTpRbiHEiRC7umtS4B+gEamA3kK6W+UEq1PMOPvKy1LtJaHwTWArGNt98O\n/Ftrna61rgVeAmJlVC6sgRS5sHuN5TtRax0KdAVaA2+d4duPnvb7csC78ffhwLTGKZcioBBQQEgT\nxRbirEmRC4eitd4NzKeh0M/FIeA2rbX/ab88tNa/mj2kEOdIilzYNaVUtFLqAaVUaOPXbYAxwMZz\nvKt3gceUUl0a78dPKTXSvGmFOD9S5MLenQR6AZuUUmU0FHga8MC53InWejnwH2CJUqqk8T4GmTmr\nEOdFFgQJIYSNkxG5EELYOClyIYSwcVLkQghh46TIhRDCxhmyaVZgYKCOiIgw4qGFEMJmJScnF2it\ng35/uyFFHhERQVJSkhEPLYQQNkspdeDPbjfL1IpSap5SKk8plWaO+xNCCHH2zDVHPh8YaKb7EkII\ncQ7MMrWitf5ZKRVhjvuyNxXVdSQdKGTP0ZNkHCslM7+Ukooaquvqqa6tx8PFRLCfO8F+7kQEeNEj\nojlxYf64u5iMji6ExR04XsaGfcfJKigju6CMg8fLqaytQ2vQaHzcXAhr7klYgCftg7zp2z6A0Gae\nRsc2nMXmyJVSk4HJAGFhYZZ6WEMUV9SwKu0I3+3KY31mPpU19QAEeLnSvoU37YK8cXNxwtXkRHl1\nHUeKK9i47zjLUw6jNbg6OxEf5s/Q2BCGxLTCx93F4GckRNPQWpN84ARfpuby09589h8vBxreA+HN\nPQkP8MLLzYQClFIUlVeTkXeSH/bkUV3b8L5qG+hF/w5BjEgIpWuIn4HPxjhmW6LfOCJfqbX+213l\nEhMTtT1+2Lkvv5T5v+xn2dYcyqvrCPH34IpOLbgsugXdQvwI8Hb7y58vLq9hy/5CNmUfZ+2efDLz\nSnF3ceLqbq2Y1C+SLq0d8x+psD/l1bV8npLLwo0HSD9SgoeLiT7tArikQxD9ogKJCPDC5KTO+PP1\n9ZrM/FLWZRSwPiOfX/cdp6q2nu6hfoztFcbQ2BC7PKpVSiVrrRP/cLsU+YXbX1DGy9/sZtXOo7ia\nnLg2tjUT+kTQNcSX064Udk601qTmFLM06RBfbMultKqWq7sFc98VHYhq6WPmZyCEZdTU1bNk80Gm\nrcmgoLSaTq18ualPOENjW+Ppev4TBMUVNSzfmsNHmw+y91gprfzcue+KDgyPD8HZZD/LZaTIm0Bx\nRQ0z1mSwYMN+XE1OTOoXyfg+EQT5/PXI+3weZ+66LOb9sp+y6lpGJoTy+NWd8Pd0NevjCNGUVu88\nysvf7Ca7oIyekc158KqO9Ihodt6DnT+jtebXfcd5ZfUeUg8V0b6FN08O7sSlHVuY7TGM1KRFrpRa\nDFwKBALHgGe01nPP9P32UOTf7TrGY59t53hZNSMTQnnwqo608HVv0sc8UVbNOz9mMu+X/TTzdOHZ\na7swuFsrs74RhDC346VVPLUija93HCWqhTePDorm8ugWTfrvVmvNqrSjvLp6D1kFZVwfH8rTQzrj\n52nbnzc1+Yj8XNhykZ+srOH5lbtYmpRD51a+vDIixuIfsOzMLebRZTvYcbiYKzu35JXrY2jmJaNz\nYX2+2XGEJz9P42RlLVOvjGLyxW0tOtVRVVvHjDWZ/PenfQR4ufLv4d0Y0OlMl2u1flLkZrAjp5gp\ni5LJLapgyqXtuHdAB1ydjZl/q62rZ94v2by6eg8tfNx5e2wccWF/d4F4ISyjurae51bu5MONB+kW\n4sfrN3Sng4Gf7aQdLubBT1LZffQkt13Sloeu6miTc+dS5BdoeUoOjy7bQaC3G9PHxJIQ3tzoSACk\nHirijkVbyTtZyeNXd2Ji3wiZahGGOlpcyZRFyaQcLLKq0qyqreO5L3exaNNB+rQNYPqYOLN/ntXU\npMjPU21dPS9/s5s567Pp3bY5M8fG/+1phJZWXF7DA59s4/v0PMb0bMPzQ7taxRtHOJ7kAye4bWES\nFdV1vDqyO1d3a2V0pD9YlpzD48t30MzTlbkTE23qtN4zFbm82/9CZU0dt3+YzJz12UzsG8HCSb2s\nrsQB/DxdmH1TIndd1p7Fmw8xaUESpVW1RscSDua7XccYO3sj3m7OfH7nRVZZ4gDXJ4Ty2R19UQpG\nvbeRXzILjI50waTIz6C4oobxczexZncezw/twrPXdsHFike5Sike/EdH/j28G+szCxj57gbySiqN\njiUcxOLNB7ltYRLRwT4sm9LX6tc6dGntx2d39CXE34OJ72/m85TDRke6INbbTAbKK6lk1Hsb2Hao\niBlj4hjfJ8LoSGdtTM8w5k5I5MDxMkbN2siR4gqjIwk79/YPGTz22Q76dwhi8eTeVnnU+mda+Xmw\n9PY+JIQ3Y+rH21jw636jI503KfLfOVZSyahZGzlYWM68iT0YEtPa6Ejn7NKOLVg4qScFJ6u44b0N\nHCosNzqSsFNvfb+X177dy/C4EGbflHhBqzON4OfhwoJbenJV55Y888VO5q3PNjrSeZEiP03eyUrG\nzN5IXkklCyf15OKoP1yIw2YkhDfnw1t7UVxew+hZGzlwvMzoSMLOvPX9Xt76PoMRCaG8OrK7VU89\n/hU3ZxMzb4xnYJdgnlu5iznrsoyOdM5s85VvAvknqxg7exNHiyuZf0tPqzm98EJ0b+PPR//sTXl1\nLWNnb5JpFmE2077P+F+J/+f6mL/c4MoWuJicmDE2jqu7BfPCV+k2V+ZS5DScvjduziYOn6hg3sQe\n9Iiw/RI/pWuIHwsn9aKkouE5Hi+tMjqSsHHz1mfz5vd7uT7ePkr8FBeTE9NG/3+Zf5J0yOhIZ83h\ni7yypo5JC7aQXVDGnAmJ9G4bYHQks+sa4secCYnknKhgwvubKamsMTqSsFGfpxzmuZW7GNglmFdG\n2E+Jn+JicuLNUbFcHBXIo5/t4NudR42OdFYcushr6+q566MUkg+e4I1R3bmofaDRkZpMr7YBvDsu\ngd1HTvLPBUlU1dYZHUnYmLV78njwk1T6tA3grdGxdlfip7g5m3h3XAJdQ/y4a3EKG/YdNzrS33LY\nItda8+TnaXyffoxnr+lik2ennKvLolvw+g3d2ZRdyMOfbqe+3vKreoVtSj1UxJQPk4lu5cOsmxLs\n8qINp/Nyc2b+xB6EN/dk8gdJ7D120uhIf8lhi/ydH/exZMsh7rqsPRP6Rhgdx2KGxobw8MCOrNiW\nyxvf7TU6jrABOSfKmbQgiSAfN+bf3NNhLj3YzMuV+bf0xN3VxM3vbyH/pPV+vuSQRf7V9iO8unoP\nw2Jb88BVHYyOY3FTLmnHmJ5teHttJks2HzQ6jrBiJZU1TJrfMBX3/sQeBNrIYh9zCfH3YO6ERI6X\nVXHrBw17yFgjhyvy1ENF3L90GwnhzXj5+hiH3ClQKcVzQ7vSv0MQT3yexq92sNeEML9TnyHtyy/l\n3XEJtG9h3cvum0pMqD/TRsexPaeI+z7eZpVTkg5V5LlFFdz6QcMh4nvj7X+e76+4mJyYOTaOtoFe\n3PHRVg4el9Wf4rde+Cqdn/fm88KwrnZ9IsDZ+EeXYJ64uhOrdh7lrTUZRsf5A4cp8sqaOm5bmExF\ndR3zHPAQ8c/4uDfsmqg1/PMD2TFR/L+lSYeY/+t+JvWLZHTPMKPjWIVJ/SIZkRDK9DUZrEo7YnSc\n33CIItda88TyNHYcLubNUbGGXqnE2kQEevH22Dgy8k5a7WGjsKyUgyd4cnkaF7UP4LFB0UbHsRpK\nKV4Y1pXYNv7cvzSV3UdLjI70Pw5R5At+3c+yrTncOyCKKzvb7vX6msrFUUE8Mbgz3+06xvQfrO+w\nUVhOXkklt3+YTEs/N94eEy8XKPkddxcT741PwNvNmX9+kMSJsmqjIwEOUOSbso7z/FfpXNGpJfcO\niDI6jtW65aIIrosLYdqaDH7ck2d0HGGAmrp67li0lZKKWmaNT5QLep9BS1933h2fwLHiKqZayVGs\nXRd5Xkkld36UQniAJ2+M6o6Tna5EMwelFC9d142OLX24d8k22frWAb38zW6SDpzgPyNi6NTK1+g4\nVi0+rBlPX9OZn/bmW8VRrN0W+alTp8qqanl3XAK+DrKI4UJ4uDYsTa7XmimLkqmssc5zZoX5fbX9\nCHMbL2l4bXf7X+VsDjf2CmN4vHUcxdptkb/67R427y/k38O7yYeb5yAi0Is3b4gl7XAJz36x0+g4\nwgL25Zfy8KepxIX58/jVnYyOYzOUUrw4rOEodurH28g5YdxRrF0W+bc7j/LeT1mM6x3GsLgQo+PY\nnCs6t+SOS9uxZMshlqfkGB1HNKGK6jqmfJiMm4uJd26Mx9XZLiuhyZw6iq2r09z5UQrVtfWG5LC7\nv7VDheU88EkqMaF+PDWks9FxbNb9V3agZ2RznlieRmZeqdFxRBN55os0MvJKeWtULK38PIyOY5Mi\nAr14ZUQMqYeK+M+q3YZksKsir66t567FKQDMHBuPm7Pjrty8UM4mJ2aMicPDxcSdi7Za7R4T4vx9\ntjWHpUk53HVZe/p3sN3LGlqDQd1aMbFvBHPXZxuyh7ldFfkrq3aTeqiIV0fE0Ka5p9FxbF5LX3fe\nHBXL3ryTPPNFmtFxhBll5p3kieVp9IxsLqflmsljV0fTLcSPBz9JtfhZX3ZT5N/vOsac9dlM6BPO\nwK6tjI5jN/p3COLOS9uzNCmHFdsOGx1HmEFlTR13LkrBw9XE9NFxsujHTNycTcwcG4/WcPfiFGrq\nLDdfbhd/g7lFFTz4aSpdWvvymHzqbnZTr4giMbwZTyxPY39BmdFxxAV6buUu9hw7yRs3dCfYz93o\nOHYlLMCTl6+PYduhIl7/1nL7/dt8kdfVa6Yu2UZNbT1vj4136B0Nm4qzyYlpY+IwOSnuXmzcJ/Pi\nwn2z4wgfbTrIbf3bcmnHFkbHsUuDY1oxpmcY7/60j5/35lvkMW2+yGf8kMHm/YU8P6wrkYFeRsex\nWyH+HrwyIoYdh4sN+2ReXJicE+U8smw73UP9eOCqjkbHsWtPD+lMh5be3L801SJXFrLpIt+cXcj0\nNRkMjwtheHyo0XHs3j+6BDOhTzhz12ezVvZjsSm1dfVMXbKNeg0zxsj54k3Nw9XEjDHxnKys4f6l\nTb8fi83+bRaVVzN1SQphzT15blhXo+M4jMeu7kR0sA8PLk0l72Sl0XHEWZq+JoOkAyd48bquhAXI\nGV2W0DHYh6ev6cy6jALmrM9q0seyySLXWvPosh3kl1YxY0w83m7ORkdyGO4uJmaMiaOsupYHlqZa\nxc5v4q9tzDrO22szGZkQytBYWelsSWN7hjGwSzCvrt7DjpziJnscmyzyxZsPsWrnUR7+RzTdQv2M\njuNwolr68NQQy4w0xIUpKq/mvo+3ER7gxbPXdjE6jsNRSvHy9d0I9HbjniUNm/g1BbMUuVJqoFJq\nj1IqUyn1qDnu80wyjp3kuZU7uTgqkEn9IpvyocRfODXSeGXVHrbnFBkdR/wJrTWPLNtOQWkV00fH\n4SVHrobw93TlzVGx7D9exjNNtBHdBRe5UsoEzAQGAZ2BMUqpJtnkpLKmjrsXp+Dl6szrN8j+4kY6\nNdII8nHjnsUpcr1PK/TR5oOs3nmMh/7RUY5cDda7bQB3XdaeT5NzWN0ES/jNMSLvCWRqrbO01tXA\nEmCoGe73D15ZtYfdR0/y2sjutPCRhQxGOzXSOFBYLlveWpmMYyd5fuUuLo4K5NZ+bY2OI4B7B0Tx\nyMBoLo4KNPt9m6PIQ4BDp32d03jbbyilJiulkpRSSfn553eS/NXdgnnoHx25LFoWMliL00caX6Tm\nGh1H8P9Hrp6uzrw+Uo5crYWzyYkpl7bD09X8U1wW+7BTaz1La52otU4MCjq/ndYSI5pz52XtzZxM\nXKh7BkQRF+bPE5/tkEvEWYH/rNrdeOQaQwtfOXJ1BOYo8sNAm9O+Dm28TTgIF5MT00fHAXDvkhRq\nLbhZkPittbvzeP+X/UzsG8Hl0S2NjiMsxBxFvgWIUkpFKqVcgdHAF2a4X2FD2jT35IXrurL1YBHT\n1hh/MVpHlFdSyYOfpBId7MOjg6KNjiMs6IKLXGtdC9wFrAbSgaVaa/nkywENjQ1hREIob6/NZMO+\n40bHcSj19Zr7l6ZSVl3LjDFxsnmcgzHLHLnW+mutdQetdTut9YvmuE9hm/51bRciAry47+NtnCir\nNjqOw3jv5yzWZxbwzDVdiJKLjTscm1zZKayXl5szM8bEcbysioeXbUdrWcLf1FIOnuD1b/cwuFsr\nRvdo8/c/IOyOFLkwu64hfjwyMJrvdh3jgw0HjI5j10oqa7hnSQotfd15aXg3lJJTDR2RFLloEpP6\nRXJ5dAte/CqdtMNNt1mQI9Na89hnO8gtqmT6mFj8PFyMjiQMIkUumoRSitdGdqeZlwt3yxL+JvHR\n5oN8tf0ID1zVgYTw5kbHEQaSIhdNprmXK9NGx3HgeBlPfZ4m8+VmlH6khOe+bFiCf3v/dkbHEQaT\nIhdNqnfbAO4ZEMXylMN8kpRjdBy7UF5dy10fbcXXw4U3R8XKEnwhRS6a3t2XR3FR+wCeWpFG+pES\no+PYNK01TyxPI6ugjGmjYgn0djM6krACUuSiyZmcFG+NisPPw4U7F22V+fILsHjzIZanHGbqgA70\nbW/+XfSEbZIiFxYR5OPG9DFx7D9exmOf7ZD58vOQdriYZ7/YSf8OQdx9uWweJ/6fFLmwmN5tA3jg\nqo58mZrLgl/3Gx3HphRX1DBlUTIB3q68JfPi4nekyIVFTbmkHVd0asELX6WzZX+h0XFsQn295oGl\n2zhSVMnbY+Np7uVqdCRhZaTIhUU5OSlevyGW0GYe3LFoK3kllUZHsnozfsjk+/Q8nhzciYTwZkbH\nEVZIilxYnJ+HC++OT6C0spY7P9pKjexffkZr0o/x5vd7GR4fwoS+EUbHEVZKilwYIjrYl5ev78aW\n/Sd4fuUuo+NYpeyCMqZ+vI2uIb68dJ3soyLOzPwXjxPiLA2NDSHtcDGz12XTMdiHG3uFGx3JapRU\n1jD5gyScnRTvjkuQ/cXFX5IRuTDUo4M6cWnHIJ5ZsVMuRtGotq6euz9KIbugjHduTCC0mafRkYSV\nkyIXhjI5KaaPiSM8wJM7FiVz8LhcvPnFr9P5aW8+LwzrSp92AUbHETZAilwYztfdhTkTelCv4ZYF\nWygurzE6kmEWbTrA+7/sZ1K/SEb3DDM6jrARUuTCKkQGevHuuAQOHi9n8sIkqmrrjI5kcWv35PH0\nip1c1jGIx6/uZHQcYUOkyIXV6NMugFdHxrApu5AHP9lOfb3jLONPPVTEHR9uJTrYh+lj4jDJyk1x\nDuSsFWFVhsaGkFtUyX9W7aa1nzuPOcDIdH9BGbfM30Kgjyvv39wDH3e50o84N1Lkwurcfklbcosq\neO/nLPw9XZlyqf1eOCHvZCU3zduMBhbc3JMWPu5GRxI2SIpcWB2lFM9e24WSyhr+s2o3Xm4mbuoT\nYXQsszteWsWNszdRUFrFolt70TbI2+hIwkZJkQurZHJquOZneXUdT6/YiaerMyMSQo2OZTZF5dWM\nm7uZg4XlzL+5J3FhsoeKOH/yYaewWi4mJ2aMiaNf+0Ae/jSV5Sn2cam4ksoabpq3mX15pcy+KVHO\nFRcXTIpcWDV3FxOzbkqgd9sA7l+aykebDhod6YIcL61i7OyNpB8p4b/j4unfIcjoSMIOSJELq+fp\n6sy8iT24tEMQjy/fwdz12UZHOi9Hiiu44b0NZBwr5b3xCQzo1NLoSMJOSJELm+DuYuK98YkM6hrM\n8yt38drqPTZ1nnl2QRkj/ruBvJIqFk7qxeXRUuLCfKTIhc1wdW6YMx+V2Ia312Zyz5IUKmusfwXo\nhn3HGf7OL1TU1LF4cm96RjY3OpKwM3LWirApziYnXr6+G5FBXrz8zW4OF1Uw+6ZEAr3djI72pxZt\nOsAzK3YSHuDJ3Ak9iAj0MjqSsEMyIhc2RynF7Ze04783xpN+pIQh09ezKcu6tsCtrKnjqc/TeGJ5\nGv2iAll+50VS4qLJSJELmzWoWys+vb0vHq4mxszeyIw1GdRZwbz5nqMnGTbzFxZuPMDk/m2ZO6EH\nvrLsXjQhKXJh07qG+PHl3f24pntrXv9uLzfO2UhWfqkhWerrNfN/yeaat9dTUFrF+xN78PjVnWQD\nLNHklNaWH8EkJibqpKQkiz+usF9aaz5JzuH5lbuoqqlnyqXtmHJpO4tdIm3boSKe+WInqYeKuKxj\nEK+M6E6Qj3XO2wvbpZRK1lon/v52+bBT2AWlFDcktuHSjkG8sDKdaWsyWLHtMFOv6MCQmFY4m5rm\n4PNIcQVvfLuXT5JzCPJx4/WR3RkeHyIXShYWJSNyYZfWZeTzwsp09hw7SWSgF3dc2o5rY1vj5mye\nEfruoyXM+jmLL7blohTcclEkdw+IwttNxkai6ZxpRH5BRa6UGgk8C3QCemqtz6qdpciFJdTXa77d\ndYzpazLYdaQEX3dnBse0Znh8CAlhzXA6x7nrYyWVrEo7ytc7jrApuxAPFxOjerRhUr9I2jSXCySL\nptdURd4JqAfeAx6UIhfWSGvNuowClqccZlXaUSpq6vDzcCEuzJ+EsGZ0auVLoI8bAV6u+Lq7UF5T\nS1lVLUXlNew+epJdR0pIO1zM9pxiAKJaeDM0tjU39gqnmZerwc9OOJImmSPXWqc33vmF3I0QTUop\nRf8OQfTvEMQLw2r5Pv0YG/YdJ/nACX7ck/+3P+/r7kzn1r7cf2UHBnUNJqqljwVSC3H2LDahp5Sa\nDEwGCAuTq4MLY3i5OTM0NoShsSEAFJfXkH28jMKyKgpKqzlZWYunqwkvN2d83J1pH+RNaDMPGawI\nq/a3Ra6U+h4I/pM/ekJrveJsH0hrPQuYBQ1TK2edUIgm5OfpQqynv9ExhLggf1vkWusrLBFECCHE\n+ZGVnUIIYeMu9KyV64AZQBBQBGzTWv/jLH4uHzhwng8bCBSc58/aC3kN5DVw9OcPjvkahGut/3BZ\nKUMWBF0IpVTSn51+40jkNZDXwNGfP8hrcDqZWhFCCBsnRS6EEDbOFot8ltEBrIC8BvIaOPrzB3kN\n/sfm5siFsBSl1E7gTq31j0ZnEeKv2OKIXIg/pZQaq5RKUkqVKqWOKKW+UUr1O9/701p3kRIXtkCK\nXNgFpdT9wFvAS0BLIAx4BxhqZC4hLMGmilwpNVAptUcplamUetToPJaklGqjlFqrlNqllNqplLrX\n6ExGUUqZlFIpSqmVjV/7Ac/RMA3ymda6TGtdo7X+Umv9kFLKTSn1llIqt/HXW0opt8afDVRKrVRK\nFSmlCpVS65RSTo1/tl8pdUXj759VSi1VSn2glDrZ+HeQeFqm1kqpZUqpfKVUtlLqniZ8/v5KqU+V\nUruVUulKqT5N9VjWSil1X+PfQZpSarFSyt3oTEaymSJXSpmAmcAgoDMwRinV2dhUFlULPKC17gz0\nBu50sOd/unuB9NO+7gO4A8vP8P1P0PCaxQLdgZ7Ak41/9gCQQ8OitpbA48CZPji6FlgC+ANfAG8D\nNBb/l0AqEAIMAKYqpf52cdx5mgas0lpH0/B80v/m++2KUioEuAdI1Fp3BUzAaGNTGctmipyGN1+m\n1jpLa11NwxvKYQ6btdZHtNZbG39/koY3b4ixqSxPKRUKDAbmnHZzAFCgta49w4/dCDyntc7TWucD\n/wLGN/5ZDdCKhhVzNVrrdfrMZwCs11p/rbWuAxbSUKIAPYAgrfVzWutqrXUWMJsmKJfGo4/+wFyA\nxscrMvfj2ABnwEMp5Qx4ArkG5zGULRV5CHDotK9zcMAiA1BKRQBxwCZjkxjiLeBhGi5ocspxILDx\nTf1nWvPbLSEONN4G8CqQCXyrlMr6mym7o6f9vhxwb3zMcKB14/RMkVKqiIaRfcuzfVLnIBLIB95v\nnF6ao5TyaoLHsVpa68PAa8BB4AhQrLX+1thUxrKlIheAUsobWAZM1VqXGJ3HkpRSQ4A8rXXy7/5o\nA1AFDDvDj+bSULanhDXehtb6pNb6Aa11WxqmTu5XSg04x2iHgGyttf9pv3y01lef4/2cDWcgHviv\n1joOKAMc7fOiZjQcjUfS8B+yl1JqnLGpjGVLRX4YaHPa16GNtzkMpZQLDSW+SGv9mdF5DHARcK1S\naj8NU2uXK6U+1FoXA08DM5VSw5RSnkopF6XUIKXUK8Bi4EmlVJBSKrDxez+Ehv8clFLtVcOVI4qB\nOn472j8bm4GTSqlHlFIejR/GdlVK9TDLs/6tHCBHa33qaOxTGordkVxBw3+c+VrrGuAzoK/BmQxl\nS0W+BYhSSkUqpVxpmH/8wuBMFtNYNHOBdK31G0bnMYLW+jGtdajWOoKGv/8ftNbjGv/sdeB+Gj7E\nzKdhlHwX8DnwApAEbAd2AFsbbwOIAr4HSmkY2b+jtV57jrnqgCE0fJiaTcOOfHMAv/N9rn/xWEeB\nQ0qpjo03DQB2mftxrNxBoHfjf9iKhtfAoT7w/T2bWtmplLqahjlSEzBPa/2iwZEspnFhyzoaiujU\niPFxrfXXxqUyjlLqUhou+MbigIgAABwhSURBVD3E6CyWppSKpeE/ClcgC7hZa33C2FSWpZT6FzCK\nhrO5UoBbtdZVxqYyjk0VuRBCiD+ypakVIYQQf0KKXAghbJwUuRBC2LgzLaBoUoGBgToiIsKIhxZC\nCJuVnJxc8GfX7DRLkSul5tFw+lVe494HfykiIoKkpCRzPLQQQjgMpdSfXrTeXFMr84GBZrovIYQQ\n58AsI3Kt9c+N+380qfQjJeSfrMLf0wU/Dxeae7ni4+7S1A8rhM04UVZNfmkVZVW1VFTXARDo40aQ\ntxt+Hi44OSmDE4qmYLE5cqXUZGAyQFhY2Hndx4cbD7Bo08Hf3BbW3JOYUD9i2/hzWXQL2gV5X3BW\nIWyB1podh4v5ftcxth8uJv1ICcdKzrwmxs3Zie6h/iRENKNHRDP6tgvE3cVkwcSiqZhtQVDjiHzl\n2cyRJyYm6vOZI88tqiC3qIKi8hqKK2o4WlLJjpxitucUkVtcCUCX1r5c070118WF0NLXofeaF3Yq\nM+8kizYdZHXaUXKLKzE5KaJaeNOplS+dWvnQys8DbzdnPF1N1GsoKK2ioLSKg4XlbD1wgp25JdTW\na3zcnLm6Wyuuiw+hZ0RzGa3bAKVUstY68Q+321KR/5Xcogq+STvKl6m5bDtUhKvJiesTQritfzsi\nAh1ql09hh7TWbMouZPbPWazZnYersxP9o4IY2DWYAdEtaObletb3VVFdx5b9hazYlss3aUcor66j\nQ0tv7hkQxdVdW0mhWzG7L/LT7S8oY876LJYm5VBbV8+w2BAeHRRNCxmhCxuUfqSE577cxYas4zT3\ncuWmPuGM7x1OgLfbBd93eXUt3+w4yn9/2kdmXilRLby5/8oODOwaTMN+VMKaNGmRK6UWA5cCgcAx\n4Bmt9dwzfX9TF/kpeSWVzFmfzfxf9+NqcuK+KzswoU84ziZZByWs34myat74bi+LNh3A18OFqQOi\nGN0zrEnmtevqNV/vOML0NRlk5JVySYcgXhjWlTbNPc3+WOL8NfmI/FxYqshP2V9QxjNf7OSnvflE\nB/vw1uhYooN9Lfb4QpyrtbvzeOjTVE6U1zCuVxj3XdkBf8+znz45X3X1mg827Oe11Xuo05p7B3Rg\ncv+2mGS6xSo4dJFDwxzj6p3HeGpFGsUVNTw5uBPje4fL4aOwKhXVdbz0dToLNx4gOtiHN0fF0qmV\n5QcdR4oreGbFTr7ddYzebZszbXScnDxgBRy+yE8pKK3iwU9S+XFPPld0aslrI2MsMtIR4u9k5Zcy\neWEymXml3NovkocGdsTN2bjTA7XWfJqcw9MrduLpauLNUbH07/CH1eHCgs5U5A43WRzo7ca8CT14\nakhnftqbx3Xv/EpWfqnRsYSD+2lvPkNn/kJhWTUfTurFk0M6G1riAEopRia24Yu7LiLA25UJ729m\n5tpM5BoG1sfhihzAyUkxqV8kH/2zN8UVNVz3zq/8mllgdCzhgLTWzFmXxc3vbybE34MVd15Ev6hA\no2P9RlRLH1bc2Y9rYlrz6uo9PLJsOzV153pZU9GUHLLIT+kR0ZwVd15ES183bpq3maVbDhkdSTiQ\n+nrNv77cxQtfpXNV52CWTelrtWeJeLiamDY6lnsub8/SpBwmvr+Z4ooao2OJRg5d5ABtmnuybEpf\n+rQL4OFl25m7PtvoSMIB1NTV88Anqcz/dT+T+kXyzo3xeLkZsqv0WVNKcf9VHXltZHc2ZxcyetZG\njpc67GUyrYrDFzmAj7sLcyYkMqhrMM+v3MX0NRkyDyiaTGVNHVM+TGZ5ymEevKoDTw7uZFOrKUck\nhDJnQg+y8ksZNWsjeSWVRkdyeFLkjdycTcwYE8f18aG88d1eXv5mt5S5MLvKmjr++UESa3bn8fyw\nrtx1eZRNngJ7SYcg5t/ck9yiCm54bwOHiyqMjuTQpMhP42xy4tURMYzvHc57P2fxxnd7jY4k7Eh1\nbT13LNrKuowC/nN9w78zW9anXQALJ/XieGk1o2dt4GixjMyNIkX+O05Oin9d24VRiW2Y8UMmM9dm\nGh1J2IGaunru+mgrP+zO46XrunFDYhujI5lFQngzFt7ai8LSasbN3SRz5gaRIv8TTk6Kl4Z3Y2hs\nw+lW8+QDUHEB6us1DyxN5dtdx/jXtV0Y2+v89uO3VrFt/Jk7sQeHCsu5ad5mSirlbBZLkyI/A5OT\n4vWR3RnYJZjnVu5ixbbDRkcSNkhrzfNf7eKL1FweGRjNhL4RRkdqEr3bBvDu+AT2HjvJpPlbqKyp\nMzqSQ5Ei/wvOJiemjYmld9vmPPhJKr/IoiFxjmb9nMX7v+znlosiuf2StkbHaVKXdWzBW6PiSDpw\ngqlLtlFXLycLWIoU+d9wczbx3vhE2gZ6c/vCZNKPlBgdSdiI5Sk5/Pub3QyJacWTgzvZ5Nkp52pw\nTCueGtyZVTuP8uJX6UbHcRhS5GfBz8OF92/ugZebMxPf30yunGol/sbGrOM89Ml2+rQN4PUbutvU\neeIX6pZ+kdxyUSTzfsmWBXYWIkV+llr7ezD/lh6UVdVx64IkyqtrjY4krNT+gjJu/zCZ8ABP3h2f\nYPjmV0Z4YnAnBnYJ5oWvdvHtzqNGx7F7UuTnIDrYl+ljYkk/WsIDS1OplzlA8TvFFTVMWrAFgLkT\neuDn4WJwImOYnBRvjY4lJtSfqR9vY/dRmZJsSlLk5+jy6JY8PqgT36Qd5a3vZcGQ+H+1jeeKHyws\n591xCQ5/0W93FxOzxifg7ebMrQuS5BzzJiRFfh5uvTiSkQmhTP8hky9Tc42OI6zEy9/sZl1GAS8M\n60rvtgFGx7EKLX3dmXVTInknq5iyaCvVtbL9bVOQIj8PSilevK4bieHNePjT7XLYKFix7TBz1mcz\noU84o3rY14KfCxXbxp9Xro9hc3Yhz6/cZXQcuyRFfp5cnZ1458Z4fNyduW1hsuzN7MB25ZbwyLLt\n9IxozpNDOhsdxyoNiwthcv+2LNx4gGXJOUbHsTtS5Begha87/x0XT25RBfd9vE0+/HRAReXV3PZh\nEv4ersy8MR4Xk7ylzuThf3SkT9sAHl++g525xUbHsSvyr+4CJYQ35+khnflhdx7T1mQYHUdYUH29\nZurH2zhWXMV/x8UT5ONmdCSr5mxyYsbYOJp5unL7h8kUlVcbHcluSJGbwbje4QyPD2H6Dxn8vDff\n6DjCQt75MZMf9+Tz1DWdiQtrZnQcmxDo7cY74+I5WlwpR7FmJEVuBkopXhzWjQ4tfJj68TaOFMvK\nT3v3a2YBb3y3l6GxrRlnZ7sZNrX4sGY8fU0X1u7J592f9xkdxy5IkZuJh6uJmTfGU1VTx90fpchV\nxu3YsZJK7lmSQtsgb166rptD7KFibuN6hTEkphWvrd7DpqzjRsexeVLkZtS+hTcvDe9G0oETvLp6\nj9FxRBOoravn7sUplFXV8V8buGCytVJK8e/h3QgP8OLuxSkUyGKhCyJFbmZDY0MY1zuMWT9n8cPu\nY0bHEWY2fU0Gm7MLefG6rkS19DE6jk3zcXdh5th4iitqZL78AkmRN4EnB3emcytfHliaKvPlduSX\nzAJmrM1kZEIow+NDjY5jFzq39uVf13ZhXUYB//1J5svPlxR5E3B3MfH22Diqauu5d/E2amW+3Obl\nn6zi3iXbaBfkzb+GdjE6jl0Z1aMN13RvzRvf7SVpf6HRcWySFHkTaRvkzQvDurJ5fyHT5fxym1Zf\nr7l/6TZOVtbw9tg4PF1lXtyclFK8dF1XQvw9uGdxipxffh6kyJvQ8PhQRiSEMmNtJr/uk8vE2apZ\n67JYl1HAM9d0ITrY1+g4dsnH3YW3x8aRX1rFw59uR2uZLz8XUuRN7LmhXYgM9OK+j7dRWCYjDVuT\ncvAEr63ew+BurRjTs43RcexaTKg/jwyM5ttdx1i48YDRcWyKFHkT83R1ZvroOE6U1fDwp6ky0rAh\nJZU13LMkhZa+7rw0XM4Xt4RbLork0o5BvPBVuuwqeg6kyC2ga4gfjw6K5vv0PD7YICMNW6C15snl\naeQWVTJ9TKzDXunH0pycFK+N7I6vuwv3LE6horrO6Eg2QYrcQm6+KILLo1vw4tfppB+RkYa1W7b1\nMF+k5nLvgCgSwpsbHcehBHq78cYN3dl7rJQXvpL9y8+GWYpcKTVQKbVHKZWplHrUHPdpb5RSvDoi\nBj8PF+6WkYZVyy4o4+kVafSKbM6dl7U3Oo5D6t8hiNv6t2XRpoOsSpOLN/+dCy5ypZQJmAkMAjoD\nY5RSsrv+nwhoHGlk5slIw1pV19Zz75IUXExOvDkqFpOTzIsb5YGrOhIT6sejn22XhXV/wxwj8p5A\nptY6S2tdDSwBhprhfu3SxVEy0rBmr3+3h+05xfzn+hha+3sYHcehuTo7MW10HNW19Uxdso06WcJ/\nRuYo8hDg0Glf5zTe9htKqclKqSSlVFJ+vmPv2f3AVR3pFiIjDWuzPqOA937KYmyvMAZ2DTY6jgAi\nA73417Vd2JRdyLuyhP+MLPZhp9Z6ltY6UWudGBQUZKmHtUquzk5MH9Mw0rjvYxlpWIPjpVXct3Qb\n7Vt489RgmRm0JiMSQv+3hH/rwRNGx7FK5ijyw8DpKyVCG28Tf+HUSGNjlow0jKa15qFPt1NcUcOM\nMXF4uJqMjiROo5TihWFdCfZ1594lKZRUyoXOf88cRb4FiFJKRSqlXIHRwBdmuF+7JyMN67Dg1/38\nsDuPxwZF06mVLMG3Rn4eLkwfE0tuUSVPLk+ThXW/c8FFrrWuBe4CVgPpwFKt9c4LvV9HoJTixeu6\n0srPnXsWy0jDCOlHSnjpm91cHt2CiX0jjI4j/kJCeHOmDojii9Rclm2Vg/7TmWWOXGv9tda6g9a6\nndb6RXPcp6PwdXdh2ug4jhTLSMPSyqtruXtxCn4eLrw6IkaW4NuAOy5rT6/I5jy9Io2s/FKj41gN\nWdlpBRLCm/1vpPFJco7RcRzGc1/uYl9+KW/eEEuAt5vRccRZMDkp3hwVi4vJiXuWpFBdK3v9gxS5\n1bjjsvb0btucZ1bsJDNPRhpN7cvUXJZsOcTtl7SjX1Sg0XHEOWjt78ErI2JIO1zCK6t2Gx3HKkiR\nWwmTk2La6IYzJu76aCuVNbKEv6kcKizn8c92EBfmz/1XdjA6jjgP/+gSzE19wpmzPluujYsUuVVp\n6evOayNj2H30JP/+Ot3oOHappq6euxengILpo+NwMclbwFY9fnUnOjVeG/docaXRcQwl/4qtzOXR\nLZnUL5IFGw6wKu2I0XHszqur97DtUBEvD4+hTXNPo+OIC/Cba+MuSXHohXVS5FbokYHRdA/146FP\nt3OosNzoOHbjh93HmPVzFuN6hzE4ppXRcYQZtAvy5rmhXdmUXcg0B742rhS5FXJ1duLtsfEA3PXR\nVvlk3gxyiyq4f2kqnVv58qQswbcrIxJCuT4+lBk/ZLAuwzH3cZIit1Jtmnvy6ojupOYU8+9vZL78\nQtTU1XPP4hRqauuZeWM87i6yBN/ePD+sC+2DvJm6ZBvHShxvvlyK3IoN7BrMzRdF8P4v+/lmh8yX\nn69XVu0m6cAJXhrejchAL6PjiCbg6erMOzfGU15dx92LU6itc6yjWClyK/fYoE50b+PPQ59ul5Vs\n52FV2hFmr8tmfO9whsb+YXdlYUeiWvrw4nVd2ZxdyGvf7jU6jkVJkVs5V2cn3rkxHheTYsqHWymv\nrjU6ks3Iyi/lwU+2072NP08O6WR0HGEBw+NDGdMzjHd/2udQF26RIrcBIf4eTB8Tx968kzz22Q7Z\nj+UsVFTXcceirbiYFO/cGI+bs8yLO4pnr+1M91A/Hvwk1WGOYqXIbcTFUUE8cGUHVmzLZcGv+42O\nY9W01jyybDt7jp3krdFxhMgl2xyKm7OJd8Yl4GJS3P5hMmVV9n8UK0VuQ+64tD1XdGrBC1+ls2Hf\ncaPjWK1ZP2fxRWouD17VkUs6OPbVqBxViL8HM8bEk5lXysPLttv9UawUuQ1xclK8MSqW8ABP7vxo\nqywW+hM/7c3nP6t2c3W3YO64tJ3RcYSB+kUF8vDAaL7afoSZazONjtOkpMhtjK+7C7NvSqSmrp7J\nC5Plw8/T7C8o4+6PttKhpQ+vjugu+4sLbuvflmGxrXnt2718u9N+P/yUIrdBbYO8mTEmjj1HS3jo\nk+3UO/AeE6cUV9QwacEWnJwUs8Yn4uXmbHQkYQWUUrx8fQwxoX7c9/E29hw9aXSkJiFFbqMu7diC\nRwdF89WOI7z+3R6j4xiqpq6eOxYlc7CwnHfHJRAWIJthif/n7mJi1vhEPN2cmbRgC/knq4yOZHZS\n5Dbsnxe3ZUzPMGau3cfSLYeMjmMIrTVPfZ7GL5nH+ffwGHq3DTA6krBCwX7uzLkpkYLSKm79IImK\navva71+K3IYppXhuaBcujgrk8eU7WJ9RYHQki3vv5yyWbDnEXZe1Z0RCqNFxhBXr3saf6aPj2J5T\nxD12tu2tFLmNczE1rPxs38KbKR8msyu3xOhIFvPZ1hxe/mY3g2NayZV+xFm5qkswzwzpzHe7jvH8\nyl12c1qiFLkd8HF3Yd7EHni7O3PTvM3sLygzOlKT+2H3MR76dDt92wXwxg3dcXKSM1TE2Zl4USS3\n9otk/q/7eefHfUbHMQspcjvR2t+DhZN6Uldfz7i5m+z60lfJBwq5Y9FWOrXy4b3xCbL8Xpyzx6/u\nxHVxIby6eo9drJSWIrcj7Vv4sOCWnpwoq+ameZs4UVZtdCSzSztczC3zk2jl58H8m3vi4+5idCRh\ng5ycFK+OiOHKzi155oudLEvOMTrSBZEitzMxof7MnpDI/uPl3DhnE4V2VOZph4u5cc4mvN2c+eCW\nngR6uxkdSdgwZ5MTM8bEcVH7AB5etp2vttvunv9S5Haob7tAZt+UyL78UsbO3sjxUts/b3ZnbjHj\n5jaU+JLJveXCycIsTp1jHh/mz92Lt7I8xTZH5lLkduqSDkHMndCD7IIyxszeaNOLIHbkNIzEPV1M\nLP6nlLgwLy83Z+bf3JNekQHcvzTVJtdkSJHbsX5Rgbw/sQcHC8u54b0NHDhue2ez/Lw3n1GzNuDl\n6sySyX1k1aZoEl5uzsyb2IN+7QN5eNl25v+SbXSkcyJFbuf6tg9k0a29OFFezfB3fiX1UJHRkc7a\n5ymHuWX+FsIDvFh+R18pcdGkPFxNzL4pkSs7t+TZL3fxwspdNrOPkRS5A0gIb86yKX3xcDUxetZG\n1qQfMzrSX9JaM3NtJlM/3kZiRDM+vq03LXzdjY4lHIC7i4l3xyUwsW8Ec9Znc8eirTaxnF+K3EG0\nC/Lmszv60r6FN7d+kMS07zOscrRRWlXLlA+38urqPVzbvTXzb+6Jr5xiKCzI5KR49touPDWkM6t3\nHWXUrA1Wv/e/FLkDaeHjzse39WZYbAhvfr+XWxZsoajcek5P3JdfyrCZv/Bd+jGeHNyJaaNjcXeR\nxT7CGJP6RfLeuASy88sYPH2dVe9nLkXuYDxdnXnjhu68MKwrv2YeZ/D09fySaexmW/X1mgW/7mfI\n9PUUllWzcFJPbr24rVwYQhjuqi7BrLynH2EBnkxemMzzK3dRWWN9Uy3KiE1jEhMTdVJSksUfV/zW\ntkNF3PfxtoZTFHu24bGrO1l8GuNQYTmPLNvOr/uO079DEP+5vhut/ORiycK6VNXW8eJX6Xyw4QCR\ngV68dF03+rSz/JbJSqlkrXXiH26XIndslTV1vPndXmavy6KFjzuPDOrItd1DMDXxJlRlVbXM+jmL\n2euyUMCTQzozukcbGYULq7Y+o4DHl+9oOKU3MZSH/hFNkI/lVhhLkYu/lHqoiMeX72BnbgnRwT48\neFVHBnRqYfZiraqtY1nyYd78fi/5J6sY3K0Vjw6KlkU+wmZUVNcxbU0Gs9dl4WJSTOgTweT+bQmw\nwJYRTVLkSqmRwLNAJ6Cn1vqs2lmK3DrV12u+2nGEN77bS3ZBGdHBPoztFcbQ2BD8PC5syuVIcQUf\nbTrI4s0HKSitJjG8GY8P7kR8WDMzpRfCsrILypixJoPPtx3G3cXEiIRQRia0oWuIb5MdWTZVkXcC\n6oH3gAelyO1DTV09y7ceZsGG/ezMLcHdxYkrOwdzcftA+rYPILTZ34+etdZk5pXy45581u7JY1N2\nIfVaMyC6BTf1ieDiqECZRhF2ITOvlJlrM/lqxxGqa+uJDvZhSEwrerUNICbUz6zbLDfp1IpS6kek\nyO3SjpxiPtp8kO92HaOgcfOtEH8PwgM8adPMk1b+7piUol5DXX09h4sqyS4oJaugjKLyGgA6tPTm\nik4tGdMzTKZQhN0qLq/hy+25fJKc878V1K7OTnRp7Utrfw+Cfd1p6evGoK6tzvt9YHiRK6UmA5MB\nwsLCEg4cOHDBjyssR2vN3mOl/JJZQMqhIg4VlpNzouJ/5X5KsK87kYFeRAZ50aW1L5d2bEGIv5yF\nIhxLYVk1SfsL2ZxdSFpuMcdKqjhaXElFTR0fTupFv6jA87rf8y5ypdT3QPCf/NETWusVjd/zIzIi\nd0g1dfUAmJRCKWS6RIgz0FpTWlWLq7PTeU+3nKnInc/iwa84r0cUDsHFJGvKhDgbSqkmu6KVvAuF\nEMLGXVCRK6WuU0rlAH2Ar5RSq80TSwghxNkyZEGQUiofON9POwMBYzcHMZ68BvIaOPrzB8d8DcK1\n1kG/v9GQIr8QSqmkP5vsdyTyGshr4OjPH+Q1OJ3MkQshhI2TIhdCCBtni0U+y+gAVkBeA3kNHP35\ng7wG/2Nzc+RCCCF+yxZH5EIIIU4jRS6EEDbOpopcKTVQKbVHKZWplHrU6DyWpJRqo5Raq5TapZTa\nqZS61+hMRlFKmZRSKUqplUZnMYJSyl8p9alSardSKl0p1cfoTJamlLqv8X2QppRarJRyNzqTkWym\nyJVSJmAmMAjoDIxRSnU2NpVF1QIPaK07A72BOx3s+Z/uXiDd6BAGmgas0lpHA91xsNdCKRUC3AMk\naq27AiZgtLGpjGUzRQ70BDK11lla62pgCTDU4EwWo7U+orXe2vj7kzS8eUOMTWV5SqlQYDAwx+gs\nRlBK+QH9gbkAWutqrXWRsakM4Qx4KKWcAU8g1+A8hrKlIg8BDp32dQ4OWGQASqkIIA7YZGwSQ7wF\nPEzDlakcUSSQD7zfOL00RynlZXQoS9JaHwZeAw4CR4BirfW3xqYyli0VuQCUUt7AMmCq1rrE6DyW\npJQaAuRprZONzmIgZyAe+K/WOg4oAxzt86JmNByNRwKtAS+l1DhjUxnLlor8MNDmtK9DG29zGEop\nFxpKfJHW+jOj8xjgIuBapdR+GqbWLldKfWhsJIvLAXK01qeOxj6lodgdyRVAttY6X2tdA3wG9DU4\nk6Fsqci3AFFKqUillCsNH258YXAmi1ENl96ZC6Rrrd8wOo8RtNaPaa1DtdYRNPz9/6C1dqiRmNb6\nKHBIKdWx8aYBwC4DIxnhINBbKeXZ+L4YgIN94Pt7f3uFIGuhta5VSt0FrKbhU+p5WuudBseypIuA\n8cAOpdS2xtse11p/bWAmYYy7gUWNA5os4GaD81iU1nqTUupTYCsNZ3Ol4ODL9WWJvhBC2DhbmloR\nQgjxJ6TIhRDCxkmRCyGEjZMiF0IIGydFLoQQNk6KXAghbJwUuRBC2Lj/AyMO7/scLs1PAAAAAElF\nTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "tags": []
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Compute the x and y coordinates for points on sine and cosine curves\n",
    "x = np.arange(0, 3 * np.pi, 0.1)\n",
    "y_sin = np.sin(x)\n",
    "y_cos = np.cos(x)\n",
    "\n",
    "# Set up a subplot grid that has height 2 and width 1,\n",
    "# and set the first such subplot as active.\n",
    "plt.subplot(2, 1, 1)\n",
    "\n",
    "# Make the first plot\n",
    "plt.plot(x, y_sin)\n",
    "plt.title('Sine')\n",
    "\n",
    "# Set the second subplot as active, and make the second plot.\n",
    "plt.subplot(2, 1, 2)\n",
    "plt.plot(x, y_cos)\n",
    "plt.title('Cosine')\n",
    "\n",
    "# Show the figure.\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "gLtsST5SL9jc"
   },
   "source": [
    "You can read much more about the `subplot` function in the [documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 0,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "eJXA5AWSL9jc"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "colab-tutorial.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}


================================================
FILE: python-numpy-tutorial.md
================================================
---
layout: page
title: Python Numpy Tutorial (with Jupyter and Colab)
permalink: /python-numpy-tutorial/
---

<div>
  <a href="https://colab.research.google.com/github/cs231n/cs231n.github.io/blob/master/python-colab.ipynb" target="_blank">
    <img class="colab-badge" src="/assets/badges/colab-open.svg" alt="Colab Notebook"/>
  </a>
</div>

This tutorial was originally contributed by [Justin Johnson](http://cs.stanford.edu/people/jcjohns/).

We will use the Python programming language for all assignments in this course.
Python is a great general-purpose programming language on its own, but with the
help of a few popular libraries (numpy, scipy, matplotlib) it becomes a powerful
environment for scientific computing.

We expect that many of you will have some experience with Python and numpy;
for the rest of you, this section will serve as a quick crash course on both
the Python programming language and its use for scientific
computing. We'll also introduce notebooks, which are a very convenient way
of tinkering with Python code. Some of you may have previous knowledge in a
different language, in which case we also recommend referencing:
[NumPy for Matlab users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html),
[Python for R users](http://www.data-analysis-in-python.org/python_for_r.html), and/or
[Python for SAS users](https://nbviewer.jupyter.org/github/RandyBetancourt/PythonForSASUsers/tree/master/).


**Table of Contents**

- [Jupyter and Colab Notebooks](#jupyter-and-colab-notebooks)
- [Python](#python)
  - [Python versions](#python-versions)
  - [Basic data types](#basic-data-types)
  - [Containers](#containers)
    - [Lists](#lists)
    - [Dictionaries](#dictionaries)
    - [Sets](#sets)
    - [Tuples](#tuples)
  - [Functions](#functions)
  - [Classes](#classes)
- [Numpy](#numpy)
  - [Arrays](#arrays)
  - [Array indexing](#array-indexing)
  - [Datatypes](#datatypes)
  - [Array math](#array-math)
  - [Broadcasting](#broadcasting)
  - [Numpy Documentation](#numpy-documentation)
- [SciPy](#scipy)
  - [Image operations](#image-operations)
  - [MATLAB files](#matlab-files)
  - [Distance between points](#distance-between-points)
- [Matplotlib](#matplotlib)
  - [Plotting](#plotting)
  - [Subplots](#subplots)
  - [Images](#images)

## Jupyter and Colab Notebooks

Before we dive into Python, we'd like to briefly talk about *notebooks*.
A Jupyter notebook lets you write and execute
Python code *locally* in your web browser. Jupyter notebooks
make it very easy to tinker with code and execute it in bits
and pieces; for this reason they are widely used in scientific
computing.
Colab on the other hand is Google's flavor of
Jupyter notebooks that is particularly suited for machine
learning and data analysis and that runs entirely in the *cloud*.
Colab is basically Jupyter notebook on steroids: it's free, requires no setup,
comes preinstalled with many packages, is easy to share with the world,
and benefits from free access to hardware accelerators like GPUs and TPUs (with some caveats).

**Run Tutorial in Colab (recommended)**. If you wish to run this tutorial entirely in Colab, click the `Open in Colab` badge at the very top of this page.

**Run Tutorial in Jupyter Notebook**. If you wish to run the notebook locally with Jupyter, make sure your virtual environment is installed correctly (as per the [setup instructions]({{site.baseurl}}/setup-instructions/)), activate it, then run `pip install notebook` to install Jupyter notebook. Next, [open the notebook](https://raw.githubusercontent.com/cs231n/cs231n.github.io/master/jupyter-notebook-tutorial.ipynb) and download it to a directory of your choice by right-clicking on the page and selecting `Save Page As`. Then `cd` to that directory and run `jupyter notebook`.

<div class='fig figcenter'>
  <img src='/assets/ipython-tutorial/file-browser.png'>
</div>

This should automatically launch a notebook server at `http://localhost:8888`.
If everything worked correctly, you should see a screen like this, showing all
available notebooks in the current directory. Click `jupyter-notebook-tutorial.ipynb`
and follow the instructions in the notebook. Otherwise, you can continue reading the
tutorial with code snippets below.

<a name='python'></a>

## Python

Python is a high-level, dynamically typed multiparadigm programming language.
Python code is often said to be almost like pseudocode, since it allows you
to express very powerful ideas in very few lines of code while being very
readable. As an example, here is an implementation of the classic quicksort
algorithm in Python:

```python
def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([3,6,8,10,1,2,1]))
# Prints "[1, 1, 2, 3, 6, 8, 10]"
```

### Python versions
As of Janurary 1, 2020, Python has [officially dropped support](https://www.python.org/doc/sunset-python-2/) for `python2`.
**For this class all code will use Python 3.7**. Ensure you have gone through the [setup instructions](setup.md)
and correctly installed a `python3` virtual environment before proceeding with this tutorial.
You can double-check your Python version at the command line after activating your environment
by running `python --version`.

<a name='python-basic'></a>

### Basic data types

Like most languages, Python has a number of basic types including integers,
floats, booleans, and strings. These data types behave in ways that are
familiar from other programming languages.

**Numbers:** Integers and floats work as you would expect from other languages:

```python
x = 3
print(type(x)) # Prints "<class 'int'>"
print(x)       # Prints "3"
print(x + 1)   # Addition; prints "4"
print(x - 1)   # Subtraction; prints "2"
print(x * 2)   # Multiplication; prints "6"
print(x ** 2)  # Exponentiation; prints "9"
x += 1
print(x)  # Prints "4"
x *= 2
print(x)  # Prints "8"
y = 2.5
print(type(y)) # Prints "<class 'float'>"
print(y, y + 1, y * 2, y ** 2) # Prints "2.5 3.5 5.0 6.25"
```
Note that unlike many languages, Python does not have unary increment (`x++`)
or decrement (`x--`) operators.

Python also has built-in types for complex numbers;
you can find all of the details
[in the documentation](https://docs.python.org/3.5/library/stdtypes.html#numeric-types-int-float-complex).

**Booleans:** Python implements all of the usual operators for Boolean logic,
but uses English words rather than symbols (`&&`, `||`, etc.):

```python
t = True
f = False
print(type(t)) # Prints "<class 'bool'>"
print(t and f) # Logical AND; prints "False"
print(t or f)  # Logical OR; prints "True"
print(not t)   # Logical NOT; prints "False"
print(t != f)  # Logical XOR; prints "True"
```

**Strings:** Python has great support for strings:

```python
hello = 'hello'    # String literals can use single quotes
world = "world"    # or double quotes; it does not matter.
print(hello)       # Prints "hello"
print(len(hello))  # String length; prints "5"
hw = hello + ' ' + world  # String concatenation
print(hw)  # prints "hello world"
hw12 = '%s %s %d' % (hello, world, 12)  # sprintf style string formatting
print(hw12)  # prints "hello world 12"
```

String objects have a bunch of useful methods; for example:

```python
s = "hello"
print(s.capitalize())  # Capitalize a string; prints "Hello"
print(s.upper())       # Convert a string to uppercase; prints "HELLO"
print(s.rjust(7))      # Right-justify a string, padding with spaces; prints "  hello"
print(s.center(7))     # Center a string, padding with spaces; prints " hello "
print(s.replace('l', '(ell)'))  # Replace all instances of one substring with another;
                                # prints "he(ell)(ell)o"
print('  world '.strip())  # Strip leading and trailing whitespace; prints "world"
```
You can find a list of all string methods [in the documentation](https://docs.python.org/3.5/library/stdtypes.html#string-methods).

<a name='python-containers'></a>

### Containers
Python includes several built-in container types: lists, dictionaries, sets, and tuples.

<a name='python-lists'></a>

#### Lists
A list is the Python equivalent of an array, but is resizeable
and can contain elements of different types:

```python
xs = [3, 1, 2]    # Create a list
print(xs, xs[2])  # Prints "[3, 1, 2] 2"
print(xs[-1])     # Negative indices count from the end of the list; prints "2"
xs[2] = 'foo'     # Lists can contain elements of different types
print(xs)         # Prints "[3, 1, 'foo']"
xs.append('bar')  # Add a new element to the end of the list
print(xs)         # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop()      # Remove and return the last element of the list
print(x, xs)      # Prints "bar [3, 1, 'foo']"
```
As usual, you can find all the gory details about lists
[in the documentation](https://docs.python.org/3.5/tutorial/datastructures.html#more-on-lists).

**Slicing:**
In addition to accessing list elements one at a time, Python provides
concise syntax to access sublists; this is known as *slicing*:

```python
nums = list(range(5))     # range is a built-in function that creates a list of integers
print(nums)               # Prints "[0, 1, 2, 3, 4]"
print(nums[2:4])          # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print(nums[2:])           # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print(nums[:2])           # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print(nums[:])            # Get a slice of the whole list; prints "[0, 1, 2, 3, 4]"
print(nums[:-1])          # Slice indices can be negative; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9]        # Assign a new sublist to a slice
print(nums)               # Prints "[0, 1, 8, 9, 4]"
```
We will see slicing again in the context of numpy arrays.

**Loops:** You can loop over the elements of a list like this:

```python
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print(animal)
# Prints "cat", "dog", "monkey", each on its own line.
```

If you want access to the index of each element within the body of a loop,
use the built-in `enumerate` function:

```python
animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line
```

**List comprehensions:**
When programming, frequently we want to transform one type of data into another.
As a simple example, consider the following code that computes square numbers:

```python
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print(squares)   # Prints [0, 1, 4, 9, 16]
```

You can make this code simpler using a **list comprehension**:

```python
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print(squares)   # Prints [0, 1, 4, 9, 16]
```

List comprehensions can also contain conditions:

```python
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print(even_squares)  # Prints "[0, 4, 16]"
```

<a name='python-dicts'></a>

#### Dictionaries
A dictionary stores (key, value) pairs, similar to a `Map` in Java or
an object in Javascript. You can use it like this:

```python
d = {'cat': 'cute', 'dog': 'furry'}  # Create a new dictionary with some data
print(d['cat'])       # Get an entry from a dictionary; prints "cute"
print('cat' in d)     # Check if a dictionary has a given key; prints "True"
d['fish'] = 'wet'     # Set an entry in a dictionary
print(d['fish'])      # Prints "wet"
# print(d['monkey'])  # KeyError: 'monkey' not a key of d
print(d.get('monkey', 'N/A'))  # Get an element with a default; prints "N/A"
print(d.get('fish', 'N/A'))    # Get an element with a default; prints "wet"
del d['fish']         # Remove an element from a dictionary
print(d.get('fish', 'N/A')) # "fish" is no longer a key; prints "N/A"
```
You can find all you need to know about dictionaries
[in the documentation](https://docs.python.org/3.5/library/stdtypes.html#dict).

**Loops:** It is easy to iterate over the keys in a dictionary:

```python
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"
```

If you want access to keys and their corresponding values, use the `items` method:

```python
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.items():
    print('A %s has %d legs' % (animal, legs))
# Prints "A person has 2 legs", "A cat has 4 legs", "A spider has 8 legs"
```

**Dictionary comprehensions:**
These are similar to list comprehensions, but allow you to easily construct
dictionaries. For example:

```python
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print(even_num_to_square)  # Prints "{0: 0, 2: 4, 4: 16}"
```

<a name='python-sets'></a>

#### Sets
A set is an unordered collection of distinct elements. As a simple example, consider
the following:

```python
animals = {'cat', 'dog'}
print('cat' in animals)   # Check if an element is in a set; prints "True"
print('fish' in animals)  # prints "False"
animals.add('fish')       # Add an element to a set
print('fish' in animals)  # Prints "True"
print(len(animals))       # Number of elements in a set; prints "3"
animals.add('cat')        # Adding an element that is already in the set does nothing
print(len(animals))       # Prints "3"
animals.remove('cat')     # Remove an element from a set
print(len(animals))       # Prints "2"
```

As usual, everything you want to know about sets can be found
[in the documentation](https://docs.python.org/3.5/library/stdtypes.html#set).


**Loops:**
Iterating over a set has the same syntax as iterating over a list;
however since sets are unordered, you cannot make assumptions about the order
in which you visit the elements of the set:

```python
animals = {'cat', 'dog', 'fish'}
for idx, animal in enumerate(animals):
    print('#%d: %s' % (idx + 1, animal))
# Prints "#1: fish", "#2: dog", "#3: cat"
```

**Set comprehensions:**
Like lists and dictionaries, we can easily construct sets using set comprehensions:

```python
from math import sqrt
nums = {int(sqrt(x)) for x in range(30)}
print(nums)  # Prints "{0, 1, 2, 3, 4, 5}"
```

<a name='python-tuples'></a>

#### Tuples
A tuple is an (immutable) ordered list of values.
A tuple is in many ways similar to a list; one of the most important differences is that
tuples can be used as keys in dictionaries and as elements of sets, while lists cannot.
Here is a trivial example:

```python
d = {(x, x + 1): x for x in range(10)}  # Create a dictionary with tuple keys
t = (5, 6)        # Create a tuple
print(type(t))    # Prints "<class 'tuple'>"
print(d[t])       # Prints "5"
print(d[(1, 2)])  # Prints "1"
```
[The documentation](https://docs.python.org/3.5/tutorial/datastructures.html#tuples-and-sequences) has more information about tuples.

<a name='python-functions'></a>

### Functions
Python functions are defined using the `def` keyword. For example:

```python
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))
# Prints "negative", "zero", "positive"
```

We will often define functions to take optional keyword arguments, like this:

```python
def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Bob') # Prints "Hello, Bob"
hello('Fred', loud=True)  # Prints "HELLO, FRED!"
```
There is a lot more information about Python functions
[in the documentation](https://docs.python.org/3.5/tutorial/controlflow.html#defining-functions).

<a name='python-classes'></a>

### Classes

The syntax for defining classes in Python is straightforward:

```python
class Greeter(object):

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"
```
You can read a lot more about Python classes
[in the documentation](https://docs.python.org/3.5/tutorial/classes.html).

<a name='numpy'></a>

## Numpy

[Numpy](http://www.numpy.org/) is the core library for scientific computing in Python.
It provides a high-performance multidimensional array object, and tools for working with these
arrays. If you are already familiar with MATLAB, you might find
[this tutorial useful](https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html) to get started with Numpy.

<a name='numpy-arrays'></a>

### Arrays
A numpy array is a grid of values, all of the same type, and is indexed by a tuple of
nonnegative integers. The number of dimensions is the *rank* of the array; the *shape*
of an array is a tuple of integers giving the size of the array along each dimension.

We can initialize numpy arrays from nested Python lists,
and access elements using square brackets:

```python
import numpy as np

a = np.array([1, 2, 3])   # Create a rank 1 array
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5, 2, 3]"

b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"
```

Numpy also provides many functions to create arrays:

```python
import numpy as np

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[ 0.  0.]
                      #          [ 0.  0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[ 1.  1.]]"

c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[ 7.  7.]
                       #          [ 7.  7.]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[ 0.91940167  0.08143941]
                             #               [ 0.68744134  0.87236687]]"
```
You can read about other methods of array creation
[in the documentation](http://docs.scipy.org/doc/numpy/user/basics.creation.html#arrays-creation).

<a name='numpy-array-indexing'></a>

### Array indexing
Numpy offers several ways to index into arrays.

**Slicing:**
Similar to Python lists, numpy arrays can be sliced.
Since arrays may be multidimensional, you must specify a slice for each dimension
of the array:

```python
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"
```

You can also mix integer indexing with slice indexing.
However, doing so will yield an array of lower rank than the original array.
Note that this is quite different from the way that MATLAB handles array
slicing:

```python
import numpy as np

# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"
```

**Integer array indexing:**
When you index into numpy arrays using slicing, the resulting array view
will always be a subarray of the original array. In contrast, integer array
indexing allows you to construct arbitrary arrays using the data from another
array. Here is an example:

```python
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,) and
print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"
```

One useful trick with integer array indexing is selecting or mutating one
element from each row of a matrix:

```python
import numpy as np

# Create a new array from which we will select elements
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])

print(a)  # prints "array([[ 1,  2,  3],
          #                [ 4,  5,  6],
          #                [ 7,  8,  9],
          #                [10, 11, 12]])"

# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"

# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)  # prints "array([[11,  2,  3],
          #                [ 4,  5, 16],
          #                [17,  8,  9],
          #                [10, 21, 12]])
```

**Boolean array indexing:**
Boolean array indexing lets you pick out arbitrary elements of an array.
Frequently this type of indexing is used to select the elements of an array
that satisfy some condition. Here is an example:

```python
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)   # Find the elements of a that are bigger than 2;
                     # this returns a numpy array of Booleans of the same
                     # shape as a, where each slot of bool_idx tells
                     # whether that element of a is > 2.

print(bool_idx)      # Prints "[[False False]
                     #          [ True  True]
                     #          [ True  True]]"

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  # Prints "[3 4 5 6]"

# We can do all of the above in a single concise statement:
print(a[a > 2])     # Prints "[3 4 5 6]"
```

For brevity we have left out a lot of details about numpy array indexing;
if you want to know more you should
[read the documentation](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html).

<a name='numpy-datatypes'></a>

### Datatypes
Every numpy array is a grid of elements of the same type.
Numpy provides a large set of numeric datatypes that you can use to construct arrays.
Numpy tries to guess a datatype when you create an array, but functions that construct
arrays usually also include an optional argument to explicitly specify the datatype.
Here is an example:

```python
import numpy as np

x = np.array([1, 2])   # Let numpy choose the datatype
print(x.dtype)         # Prints "int64"

x = np.array([1.0, 2.0])   # Let numpy choose the datatype
print(x.dtype)             # Prints "float64"

x = np.array([1, 2], dtype=np.int64)   # Force a particular datatype
print(x.dtype)                         # Prints "int64"
```
You can read all about numpy datatypes
[in the documentation](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html).

<a name='numpy-math'></a>

### Array math
Basic mathematical functions operate elementwise on arrays, and are available
both as operator overloads and as functions in the numpy module:

```python
import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))
```

Note that unlike MATLAB, `*` is elementwise multiplication, not matrix
multiplication. We instead use the `dot` function to compute inner
products of vectors, to multiply a vector by a matrix, and to
multiply matrices. `dot` is available both as a function in the numpy
module and as an instance method of array objects:

```python
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of vectors; both produce 219
print(v.dot(w))
print(np.dot(v, w))

# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(v))
print(np.dot(x, v))

# Matrix / matrix product; both produce the rank 2 array
# [[19 22]
#  [43 50]]
print(x.dot(y))
print(np.dot(x, y))
```

Numpy provides many useful functions for performing computations on
arrays; one of the most useful is `sum`:

```python
import numpy as np

x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"
```
You can find the full list of mathematical functions provided by numpy
[in the documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

Apart from computing mathematical functions using arrays, we frequently
need to reshape or otherwise manipulate data in arrays. The simplest example
of this type of operation is transposing a matrix; to transpose a matrix,
simply use the `T` attribute of an array object:

```python
import numpy as np

x = np.array([[1,2], [3,4]])
print(x)    # Prints "[[1 2]
            #          [3 4]]"
print(x.T)  # Prints "[[1 3]
            #          [2 4]]"

# Note that taking the transpose of a rank 1 array does nothing:
v = np.array([1,2,3])
print(v)    # Prints "[1 2 3]"
print(v.T)  # Prints "[1 2 3]"
```
Numpy provides many more functions for manipulating arrays; you can see the full list
[in the documentation](http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html).


<a name='numpy-broadcasting'></a>

### Broadcasting
Broadcasting is a powerful mechanism that allows numpy to work with arrays of different
shapes when performing arithmetic operations. Frequently we have a smaller array and a
larger array, and we want to use the smaller array multiple times to perform some operation
on the larger array.

For example, suppose that we want to add a constant vector to each
row of a matrix. We could do it like this:

```python
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an empty matrix with the same shape as x

# Add the vector v to each row of the matrix x with an explicit loop
for i in range(4):
    y[i, :] = x[i, :] + v

# Now y is the following
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
print(y)
```

This works; however when the matrix `x` is very large, computing an explicit loop
in Python could be slow. Note that adding the vector `v` to each row of the matrix
`x` is equivalent to forming a matrix `vv` by stacking multiple copies of `v` vertically,
then performing elementwise summation of `x` and `vv`. We could implement this
approach like this:

```python
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1))   # Stack 4 copies of v on top of each other
print(vv)                 # Prints "[[1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]
                          #          [1 0 1]]"
y = x + vv  # Add x and vv elementwise
print(y)  # Prints "[[ 2  2  4
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"
```

Numpy broadcasting allows us to perform this computation without actually
creating multiple copies of `v`. Consider this version, using broadcasting:

```python
import numpy as np

# We will add the vector v to each row of the matrix x,
# storing the result in the matrix y
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Add v to each row of x using broadcasting
print(y)  # Prints "[[ 2  2  4]
          #          [ 5  5  7]
          #          [ 8  8 10]
          #          [11 11 13]]"
```

The line `y = x + v` works even though `x` has shape `(4, 3)` and `v` has shape
`(3,)` due to broadcasting; this line works as if `v` actually had shape `(4, 3)`,
where each row was a copy of `v`, and the sum was performed elementwise.

Broadcasting two arrays together follows these rules:

1. If the arrays do not have the same rank, prepend the shape of the lower rank array
   with 1s until both shapes have the same length.
2. The two arrays are said to be *compatible* in a dimension if they have the same
   size in the dimension, or if one of the arrays has size 1 in that dimension.
3. The arrays can be broadcast together if they are compatible in all dimensions.
4. After broadcasting, each array behaves as if it had shape equal to the elementwise
   maximum of shapes of the two input arrays.
5. In any dimension where one array had size 1 and the other array had size greater than 1,
   the first array behaves as if it were copied along that dimension

If this explanation does not make sense, try reading the explanation
[from the documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
or [this explanation](https://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc).


Functions that support broadcasting are known as *universal functions*. You can find
the list of all universal functions
[in the documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

Here are some applications of broadcasting:

```python
import numpy as np

# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print(np.reshape(v, (3, 1)) * w)

# Add a vector to each row of a matrix
x = np.array([[1,2,3], [4,5,6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:
# [[2 4 6]
#  [5 7 9]]
print(x + v)

# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:
# [[ 5  6  7]
#  [ 9 10 11]]
print((x.T + w).T)
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))

# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
# [[ 2  4  6]
#  [ 8 10 12]]
print(x * 2)
```

Broadcasting typically makes your code more concise and faster, so you
should strive to use it where possible.

### Numpy Documentation
This brief overview has touched on many of the important things that you need to
know about numpy, but is far from complete. Check out the
[numpy reference](http://docs.scipy.org/doc/numpy/reference/)
to find out much more about numpy.

<a name='scipy'></a>

## SciPy
Numpy provides a high-performance multidimensional array and basic tools to
compute with and manipulate these arrays.
[SciPy](http://docs.scipy.org/doc/scipy/reference/)
builds on this, and provides
a large number of functions that operate on numpy arrays and are useful for
different types of scientific and engineering applications.

The best way to get familiar with SciPy is to
[browse the documentation](http://docs.scipy.org/doc/scipy/reference/index.html).
We will highlight some parts of SciPy that you might find useful for this class.

<a name='scipy-image'></a>

### Image operations
SciPy provides some basic functions to work with images.
For example, it has functions to read images from disk into numpy arrays,
to write numpy arrays to disk as images, and to resize images.
Here is a simple example that showcases these functions:

```python
from scipy.misc import imread, imsave, imresize

# Read an JPEG image into a numpy array
img = imread('assets/cat.jpg')
print(img.dtype, img.shape)  # Prints "uint8 (400, 248, 3)"

# We can tint the image by scaling each of the color channels
# by a different scalar constant. The image has shape (400, 248, 3);
# we multiply it by the array [1, 0.95, 0.9] of shape (3,);
# numpy broadcasting means that this leaves the red channel unchanged,
# and multiplies the green and blue channels by 0.95 and 0.9
# respectively.
img_tinted = img * [1, 0.95, 0.9]

# Resize the tinted image to be 300 by 300 pixels.
img_tinted = imresize(img_tinted, (300, 300))

# Write the tinted image back to disk
imsave('assets/cat_tinted.jpg', img_tinted)
```

<div class='fig figcenter fighighlight'>
  <img src='/assets/cat.jpg'>
  <img src='/assets/cat_tinted.jpg'>
  <div class='figcaption'>
    Left: The original image.
    Right: The tinted and resized image.
  </div>
</div>

<a name='scipy-matlab'></a>

### MATLAB files
The functions `scipy.io.loadmat` and `scipy.io.savemat` allow you to read and
write MATLAB files. You can read about them
[in the documentation](http://docs.scipy.org/doc/scipy/reference/io.html).

<a name='scipy-dist'></a>

### Distance between points
SciPy defines some useful functions for computing distances between sets of points.

The function `scipy.spatial.distance.pdist` computes the distance between all pairs
of points in a given set:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Create the following array where each row is a point in 2D space:
# [[0 1]
#  [1 0]
#  [2 0]]
x = np.array([[0, 1], [1, 0], [2, 0]])
print(x)

# Compute the Euclidean distance between all rows of x.
# d[i, j] is the Euclidean distance between x[i, :] and x[j, :],
# and d is the following array:
# [[ 0.          1.41421356  2.23606798]
#  [ 1.41421356  0.          1.        ]
#  [ 2.23606798  1.          0.        ]]
d = squareform(pdist(x, 'euclidean'))
print(d)
```
You can read all the details about this function
[in the documentation](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html).

A similar function (`scipy.spatial.distance.cdist`) computes the distance between all pairs
across two sets of points; you can read about it
[in the documentation](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html).

<a name='matplotlib'></a>

## Matplotlib
[Matplotlib](http://matplotlib.org/) is a plotting library.
In this section give a brief introduction to the `matplotlib.pyplot` module,
which provides a plotting system similar to that of MATLAB.

<a name='matplotlib-plotting'></a>

### Plotting
The most important function in matplotlib is `plot`,
which allows you to plot 2D data. Here is a simple example:

```python
import numpy as np
import matplotlib.pyplot as plt

# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

# Plot the points using matplotlib
plt.plot(x, y)
plt.show()  # You must call plt.show() to make graphics appear.
```

Running this code produces the following plot:

<div class='fig figcenter fighighlight'>
  <img src='/assets/sine.png'>
</div>

With just a little bit of extra work we can easily plot multiple lines
at once, and add a title, legend, and axis labels:

```python
import numpy as np
import matplotlib.pyplot as plt

# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Plot the points using matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()
```
<div class='fig figcenter fighighlight'>
  <img src='/assets/sine_cosine.png'>
</div>

You can read much more about the `plot` function
[in the documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot).

<a name='matplotlib-subplots'></a>

### Subplots
You can plot different things in the same figure using the `subplot` function.
Here is an example:

```python
import numpy as np
import matplotlib.pyplot as plt

# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)

# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')

# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')

# Show the figure.
plt.show()
```

<div class='fig figcenter fighighlight'>
  <img src='/assets/sine_cosine_subplot.png'>
</div>

You can read much more about the `subplot` function
[in the documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot).

<a name='matplotlib-images'></a>

### Images
You can use the `imshow` function to show images. Here is an example:

```python
import numpy as np
from scipy.misc import imread, imresize
import matplotlib.pyplot as plt

img = imread('assets/cat.jpg')
img_tinted = img * [1, 0.95, 0.9]

# Show the original image
plt.subplot(1, 2, 1)
plt.imshow(img)

# Show the tinted image
plt.subplot(1, 2, 2)

# A slight gotcha with imshow is that it might give strange results
# if presented with data that is not uint8. To work around this, we
# explicitly cast the image to uint8 before displaying it.
plt.imshow(np.uint8(img_tinted))
plt.show()
```

<div class='fig figcenter fighighlight'>
  <img src='/assets/cat_tinted_imshow.png'>
</div>


================================================
FILE: rnn.md
================================================
---
layout: page
permalink: /rnn/
---


Table of Contents:

- [Introduction to RNN](#intro)
- [RNN example as Character-level language model](#char)
- [Multilayer RNNs](#multi)
- [Long-Short Term Memory (LSTM)](#lstm)


<a name='intro'></a>

## Introduction to RNN

In this lecture note, we're going to be talking about the Recurrent Neural Networks (RNNs). One
great thing about the RNNs is that they offer a lot of flexibility on how we wire up the neural
network architecture. Normally when we're working with neural networks (Figure 1), we are given a fixed sized
input vector (red), then we process it with some hidden layers (green), and  we produce a
fixed sized output vector (blue) as depicted in the leftmost model ("Vanilla" Neural Networks) in Figure 1.
While **"Vanilla" Neural Networks** receive a single input and produce one label for that image, there are tasks where
the model produce a sequence of outputs as shown in the one-to-many model in Figure 1. **Recurrent Neural Networks** allow 
us to operate over sequences of input, output, or both at the same time. 
* An example of **one-to-many** model is image captioning where we are given a fixed sized image and produce a sequence of words that describe the content of that image through RNN (second model in Figure 1).
* An example of **many-to-one** task is action prediction where we look at a sequence of video frames instead of a single image and produce
a label of what action was happening in the video as shown in the third model in Figure 1. Another example of many-to-one task is 
sentiment classification in NLP where we are given a sequence of words of a sentence and then classify what sentiment (e.g. positive or negative) that sentence is.
* An example of **many-to-many** task is video-captioning where the input is a sequence of video frames and the output is caption that describes
what was in the video as shown in the fourth model in Figure 1. Another example of many-to-many task is machine translation in NLP, where we can have an
RNN that takes a sequence of words of a sentence in English, and then this RNN is asked to produce a sequence of words of a sentence in French. 
* There is a also a **variation of many-to-many** task as shown in the last model in Figure 1, 
where the model generates an output at every timestep. An example of this many-to-many task is video classification on a frame level
where the model classifies every single frame of video with some number of classes. We should note that we don't want 
this prediction to only be a function of the current timestep (current frame of the video), but also all the timesteps (frames)
that have come before this video. 

In general, RNNs allow us to wire up an architecture, where the prediction at every single timestep is a
function of all the timesteps that have come before.

<div class="fig figcenter fighighlight">
  <img src="/assets/rnn/types.png" width="100%">
  <div class="figcaption"> <b> Figure 1.</b> Different (non-exhaustive) types of Recurrent Neural Network architectures. Red boxes are input vectors. Green boxes are hidden layers. Blue boxes are output vectors.</div>
</div>

### Why are existing convnets insufficient? 
The existing convnets are insufficient to deal with tasks that have inputs and outputs with variable sequence lengths. 
In the example of video captioning, inputs have variable number of frames (e.g. 10-minute and 10-hour long video) and outputs are captions
of variable length. Convnets can only take in inputs with a fixed size of width and height and cannot generalize over 
inputs with different sizes. In order to tackle this problem, we introduce Recurrent Neural Networks (RNNs). 

### Recurrent Neural Network
RNN is basically a blackbox (Left of Figure 2), where it has an “internal state” that is updated as a sequence is processed. At every single timestep, we feed in an input vector into RNN where it modifies that state as a function of what it receives. When we tune RNN weights, 
RNN will show different behaviors in terms of how its state evolves as it receives these inputs. 
We are also interested in producing an output based on the RNN state, so we can produce these output vectors on top of the RNN (as depicted in Figure 2).

If we unroll an RNN model (Right of Figure 2), then there are inputs (e.g. video frame) at different timesteps shown as $$x_1, x_2, x_3$$ ... $$x_t$$. 
RNN at each timestep takes in two inputs -- an input frame ($$x_i$$) and previous representation of what it seems so far (i.e. history) -- to generate an output $$y_i$$ and update its history, which will get forward propagated over time. All the RNN blocks in Figure 2 (Right) are the same block that share the same parameter, but have different inputs and history at each timestep.

<div class="fig figcenter fighighlight">
  <img src="/assets/rnn/rnn_blackbox.png" width="16%" >
  <img src="/assets/rnn/unrolledRNN.png" width="60%" >
  <div class="figcaption"><b>Figure 2. </b>Simplified RNN box (Left) and Unrolled RNN (Right).</div>
</div>

More precisely, RNN can be represented as a recurrence formula of some function $$f_W$$ with
parameters $$W$$:

$$
h_t = f_W(h_{t-1}, x_t)
$$

where at every timestep it receives some previous state as a vector $$h_{t-1}$$ of previous
iteration timestep $$t-1$$ and current input vector $$x_t$$ to produce the current state as a vector
$$h_t$$. A fixed function $$f_W$$ with weights $$W$$ is applied at every single timestep and that allows us to use
the Recurrent Neural Network on sequences without having to commit to the size of the sequence because
we apply the exact same function at every single timestep, no matter how long the input or output
sequences are.

In the most simplest form of RNN, which we call a Vanilla RNN, the network is just a single hidden
state $$h$$ where we use a recurrence formula that basically tells us how we should update our hidden
state $$h$$ as a function of previous hidden state $$h_{t-1}$$ and the current input $$x_t$$. In particular, we're
going to have weight matrices $$W_{hh}$$ and $$W_{xh}$$, where they will project both the hidden
state $$h_{t-1}$$ from the previous timestep and the current input $$x_t$$, and then those are going to be summed
and squished with $$tanh$$ function to update the hidden state $$h_t$$ at timestep $$t$$. This recurrence
is telling us how $$h$$ will change as a function of its history and also the current input at this
timestep:

$$
h_t = tanh(W_{hh}h_{t-1} + W_{xh}x_t)
$$

<div class="fig figcenter">
  <img src="/assets/rnn/vanilla_rnn_mformula_1.png" width="80%" >
</div>

We can base predictions on top of $$h_t$$ by using just another matrix projection on top
of the hidden state. This is the simplest complete case in which you can wire up a neural network:

$$
y_t = W_{hy}h_t
$$

<div class="fig figcenter">
  <img src="/assets/rnn/vanilla_rnn_mformula_2.png" width="40%" >
</div>

So far we have showed RNN in terms of abstract vectors $$x, h, y$$, however we can endow these vectors
with semantics in the following section.


<a name='char'></a>

## RNN example as Character-level language model

One of the simplest ways in which we can use an RNN is in the case of a character-level language model
since it's intuitive to understand. The way this RNN will work is we will feed a sequence of characters
into the RNN and at every single timestep, we will ask the RNN to predict the next character in the
sequence. The prediction of RNN will be in the form of score distribution of the characters in the vocabulary
for what RNN thinks should come next in the sequence that it has seen so far.

So suppose, in a very simple example (Figure 3), we have the training sequence of just one string $$\text{"hello"}$$, and we have a vocabulary
$$V \in \{\text{"h"}, \text{"e"}, \text{"l"}, \text{"o"}\}$$ of 4 characters in the entire dataset. We are going to try to get an RNN to
learn to predict the next character in the sequence on this training data.

<div class="fig figcenter fighighlight">
  <img src="/assets/rnn/char_level_language_model.png" width="60%" >
  <div class="figcaption">Figure 3. Simplified Character-level Language Model RNN.</div>
</div>

As shown in Figure 3, we'll feed in one character at a time into an RNN, first $$\text{"h"}$$, then
$$\text{"e"}$$, then $$\text{"l"}$$, and finally $$\text{"l"}$$. All characters are encoded in the representation
of what's called a one-hot vector, where only one unique bit of the vector is turned on for each unique
character in the vocabulary. For example:

$$
\begin{bmatrix}1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \text{"h"}\ \ 
\begin{bmatrix}0 \\ 1 \\ 0 \\ 0 \end{bmatrix} = \text{"e"}\ \ 
\begin{bmatrix}0 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \text{"l"}\ \ 
\begin{bmatrix}0 \\ 0 \\ 0 \\ 1 \end{bmatrix} = \text{"o"}
$$

Then we're going to use the recurrence formula from the previous section at every single timestep.
Suppose we start off with $$h$$ as a vector of size 3 with all zeros. By using this fixed recurrence
formula, we're going to end up with a 3-dimensional representation of the next hidden state $$h$$
that basically at any point in time summarizes all the characters that have come until then:

$$
\begin{aligned}
\begin{bmatrix}0.3 \\ -0.1 \\ 0.9 \end{bmatrix} &= f_W(W_{hh}\begin{bmatrix}0 \\ 0 \\ 0 \end{bmatrix} + W_{xh}\begin{bmatrix}1 \\ 0 \\ 0 \\ 0 \end{bmatrix}) \ \ \ \ &(1) \\
\begin{bmatrix}1.0 \\ 0.3 \\ 0.1 \end{bmatrix} &= f_W(W_{hh}\begin{bmatrix}0.3 \\ -0.1 \\ 0.9 \end{bmatrix} + W_{xh}\begin{bmatrix}0 \\ 1 \\ 0 \\ 0 \end{bmatrix}) \ \ \ \ &(2) \\
\begin{bmatrix}0.1 \\ -0.5 \\ -0.3 \end{bmatrix} &= f_W(W_{hh}\begin{bmatrix}1.0 \\ 0.3 \\ 0.1 \end{bmatrix} + W_{xh}\begin{bmatrix}0 \\ 0 \\ 1 \\ 0 \end{bmatrix}) \ \ \ \ &(3) \\
\begin{bmatrix}-0.3 \\ 0.9 \\ 0.7 \end{bmatrix} &= f_W(W_{hh}\begin{bmatrix}0.1 \\ -0.5 \\ -0.3 \end{bmatrix} + W_{xh}\begin{bmatrix}0 \\ 0 \\ 1 \\ 0 \end{bmatrix}) \ \ \ \ &(4)
\end{aligned}
$$

As we apply this recurrence at every timestep, we're going to predict what should be the next character
in the sequence at every timestep. Since we have four characters in vocabulary $$V$$, we're going to
predict 4-dimensional vector of logits at every single timestep.

As shown in Figure 3, in the very first timestep we fed in $$\text{"h"}$$, and the RNN with its current
setting of weights computed a vector of logits:

$$
\begin{bmatrix}1.0 \\ 2.2 \\ -3.0 \\ 4.1 \end{bmatrix} \rightarrow \begin{bmatrix}\text{"h"} \\ \text{"e"} \\ \text{"l"}\\ \text{"o"} \end{bmatrix}
$$

where RNN thinks that the next character $$\text{"h"}$$ is $$1.0$$ likely to come next, $$\text{"e"}$$ is $$2.2$$ likely,
$$\text{"e"}$$ is $$-3.0$$ likely, and $$\text{"o"}$$ is $$4.1$$ likely to come next. In this case,
RNN incorrectly suggests that $$\text{"o"}$$ should come next, as the score of $$4.1$$ is the highest.
However, of course, we know that in this training sequence $$\text{"e"}$$ should follow $$\text{"h"}$$,
so in fact the score of $$2.2$$ is the correct answer as it's highlighted in green in Figure 3, and we want that to be high and all other scores
to be low. At every single timestep we have a target for what next character should come in the sequence,
therefore the error signal is backpropagated as a gradient of the loss function through the connections.
As a loss function we could choose to have a softmax classifier, for example, so we just get all those losses
flowing down from the top backwards to calculate the gradients on all the weight matrices to figure out how to
shift the matrices so that the correct probabilities are coming out of the RNN. Similarly we can imagine
how to scale up the training of the model over larger training dataset.


<a name='multi'></a>

## Multilayer RNNs

So far we have only shown RNNs with just one layer. However, we're not limited to only a single layer architectures.
One of the ways, RNNs are used today is in more complex manner. RNNs can be stacked together in multiple layers,
which gives more depth, and empirically deeper architectures tend to work better (Figure 4).

<div class="fig figcenter fighighlight">
  <img src="/assets/rnn/multilayer_rnn.png" width="40%" >
  <div class="figcaption">Figure 4. Multilayer RNN example.</div>
</div>

For example, in Figure 4, there are three separate RNNs each with their own set of weights. Three RNNs
are stacked on top of each other, so the input of the second RNN (second RNN layer in Figure 4) is the
vector of the hidden state vector of the first RNN (first RNN layer in Figure 4). All stacked RNNs
are trained jointly, and the diagram in Figure 4 represents one computational graph.


<a name='lstm'></a>

## Long-Short Term Memory (LSTM)

So far we have seen only a simple recurrence formula for the Vanilla RNN. In practice, we actually will
rarely ever use Vanilla RNN formula. Instead, we will use what we call a Long-Short Term Memory (LSTM)
RNN.

### Vanilla RNN Gradient Flow & Vanishing Gradient Problem
An RNN block takes in input $$x_t$$ and previous hidden representation $$h_{t-1}$$ and learn a transformation, which is then passed through tanh to produce the hidden representation $$h_{t}$$ for the next time step and output $$y_{t}$$ as shown in the equation below.

$$ h_t = tanh(W_{hh}h_{t-1} + W_{xh}x_t) $$

For the back propagation, Let's examine how the output at the very last timestep affects the weights at the very first time step.
The partial derivative of $$h_t$$ with respect to $$h_{t-1}$$ is written as: 
$$ \frac{\partial h_t}{\partial h_{t-1}} =  tanh^{'}(W_{hh}h_{t-1} + W_{xh}x_t)W_{hh} $$

We update the weights $$W_{hh}$$ by getting the derivative of the loss at the very last time step $$L_{t}$$ with respect to $$W_{hh}$$. 

$$
\begin{aligned}
\frac{\partial L_{t}}{\partial W_{hh}} = \frac{\partial L_{t}}{\partial h_{t}} \frac{\partial h_{t}}{\partial h_{t-1} } \dots \frac{\partial h_{1}}{\partial W_{hh}} \\
= \frac{\partial L_{t}}{\partial h_{t}}(\prod_{t=2}^{T} \frac{\partial h_{t}}{\partial  h_{t-1}})\frac{\partial h_{1}}{\partial W_{hh}} \\
= \frac{\partial L_{t}}{\partial h_{t}}(\prod_{t=2}^{T}  tanh^{'}(W_{hh}h_{t-1} + W_{xh}x_t)W_{hh}^{T-1})\frac{\partial h_{1}}{\partial W_{hh}} \\
\end{aligned}
$$

* **Vanishing gradient:** We see that $$tanh^{'}(W_{hh}h_{t-1} + W_{xh}x_t)$$ will almost always be less than 1 because tanh is always between negative one and one. Thus, as $$t$$ gets larger (i.e. longer timesteps),  the gradient ($$\frac{\partial L_{t}}{\partial W} $$) will descrease in value and get close to zero. 
This will lead to vanishing gradient problem, where gradients at future time steps rarely impact gradients at the very first time step. This is problematic when we model long sequence of inputs because the updates will be extremely slow. 

* **Removing non-linearity (tanh):** If we remove non-linearity (tanh) to solve the vanishing gradient problem, then we will be left with  
$$
\begin{aligned}
\frac{\partial L_{t}}{\partial W} = \frac{\partial L_{t}}{\partial h_{t}}(\prod_{t=2}^{T} W_{hh}^{T-1})\frac{\partial h_{1}}{\partial W} 
\end{aligned}
$$
    * Exploding gradients: If the largest singular value of W_{hh} is greater than 1, then the gradients will blow up and the model will get very large gradients coming back from future time steps. Exploding gradient often leads to getting gradients that are NaNs. 
    * Vanishing gradients: If the laregest singular value of W_{hh} is smaller than 1, then we will have vanishing gradient problem as mentioned above which will significantly slow down learning.   
 
In practice, we can treat the exploding gradient problem through gradient clipping, which is clipping large gradient values to a maximum threshold. However, since vanishing gradient problem still exists in cases where largest singular value of W_{hh} matrix is less than one, LSTM was designed to avoid this problem. 


### LSTM Formulation

The following is the precise formulation for LSTM. On step $$t$$, there is a hidden state $$h_t$$ and
a cell state $$c_t$$. Both $$h_t$$ and $$c_t$$ are vectors of size $$n$$. One distinction of LSTM from
Vanilla RNN is that LSTM has this additional $$c_t$$ cell state, and intuitively it can be thought of as
$$c_t$$ stores long-term information. LSTM can read, erase, and write information to and from this $$c_t$$ cell.
The way LSTM alters $$c_t$$ cell is through three special gates: $$i, f, o$$ which correspond to “input”,
“forget”, and “output” gates. The values of these gates vary from closed (0) to open (1). All $$i, f, o$$
gates are vectors of size $$n$$.

At every timestep we have an input vector $$x_t$$, previous hidden state $$h_{t-1}$$, previous cell state $$c_{t-1}$$,
and LSTM computes the next hidden state $$h_t$$, and next cell state $$c_t$$ at timestep $$t$$ as follows:

$$
\begin{aligned}
f_t &= \sigma(W_{hf}h_{t_1} + W_{xf}x_t) \\
i_t &= \sigma(W_{hi}h_{t_1} + W_{xi}x_t) \\
o_t &= \sigma(W_{ho}h_{t_1} + W_{xo}x_t) \\
g_t &= \text{tanh}(W_{hg}h_{t_1} + W_{xg}x_t) \\
\end{aligned}
$$

<div class="fig figcenter">
  <img src="/assets/rnn/lstm_mformula_1.png" width="50%" >
</div>

$$
\begin{aligned}
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \text{tanh}(c_t) \\
\end{aligned}
$$

<div class="fig figcenter">
  <img src="/assets/rnn/lstm_mformula_2.png" width="40%" >
</div>

where $$\odot$$ is an element-wise Hadamard product. $$g_t$$ in the above formulas is an intermediary
calculation cache that's later used with $$o$$ gate in the above formulas.

Since all $$f, i, o$$ gate vector values range from 0 to 1, because they were squashed by sigmoid function
$$\sigma$$, when multiplied element-wise, we can see that:

  * **Forget Gate:** Forget gate $$f_t$$ at time step $$t$$ controls how much information needs to be "removed" from the previous cell state $$c_{t-1}$$. 
This forget gate learns to erase hidden representations from the previous time steps, which is why LSTM will have two 
hidden represtnations $$h_t$$ and cell state $$c_t$$. This $$c_t$$ will get propagated over time and learn whether to forget
the previous cell state or not.
  * **Input Gate:** Input gate $$i_t$$ at time step $$t$$ controls how much information needs to be "added" to the next cell state $$c_t$$ from previous hidden state $$h_{t-1}$$ and input $$x_t$$. Instead of tanh, the "input" gate $$i$$ has a sigmoid function, which converts inputs to values between zero and one.
This serves as a switch, where values are either almost always zero or almost always one. This "input" gate decides whether to take the RNN output that is produced by the "gate" gate $$g$$ and multiplies the output with input gate $$i$$.
  * **Output Gate:** Output gate $$o_t$$ at time step $$t$$ controls how much information needs to be "shown" as output in the current hidden state $$h_t$$. 

The key idea of LSTM is the cell state, the horizontal line running through between recurrent timesteps. You can imagine the cell
state to be some kind of highway of information passing through straight down the entire chain, with
only some minor linear interactions. With the formulation above, it's easy for information to just flow
along this highway (Figure 5). Thus, even when there is a bunch of LSTMs stacked together, we can get an uninterrupted gradient flow where the gradients flow back through cell states instead of hidden states $$h$$ without vanishing in every time step.

This greatly fixes the gradient vanishing/exploding problem we have outlined above. Figure 5 also shows that gradient contains a vector of activations of the "forget" gate. This allows better control of gradients values by using suitable parameter updates of the "forget" gate.

<div class="fig figcenter fighighlight">
  <img src="/assets/rnn/lstm_highway.png" width="70%" >
  <div class="figcaption">Figure 5. LSTM cell state highway.</div>
</div>

### Does LSTM solve the vanishing gradient problem? 
LSTM architecture makes it easier for the RNN to preserve information over many recurrent time steps. For example,
if the forget gate is set to 1, and the input gate is set to 0, then the infomation of the cell state
will always be preserved over many recurrent time steps. For a Vanilla RNN, in contrast, it's much harder to preserve information
in hidden states in recurrent time steps by just making use of a single weight matrix.

LSTMs do not guarantee that there is no vanishing/exploding gradient problems, but it does provide an
easier way for the model to learn long-distance dependencies.


================================================
FILE: setup.md
================================================
---
layout: page
title: Software Setup
permalink: /setup-instructions/
---

This year, the recommended way to work on assignments is through [Google Colaboratory](https://colab.research.google.com/). However, if you already own GPU-backed hardware and would prefer to work locally, we provide you with instructions for setting up a virtual environment.

- [Working remotely on Google Colaboratory](#working-remotely-on-google-colaboratory)
- [Working locally on your machine](#working-locally-on-your-machine)
  - [Anaconda virtual environment](#anaconda-virtual-environment)
  - [Python venv](#python-venv)
  - [Installing packages](#installing-packages)

### Working remotely on Google Colaboratory

Google Colaboratory is basically a combination of Jupyter notebook and Google Drive. It runs entirely in the cloud and comes
preinstalled with many packages (e.g. PyTorch and Tensorflow) so everyone has access to the same
dependencies. Even cooler is the fact that Colab benefits from free access to hardware accelerators
like GPUs (K80, P100) and TPUs which will be particularly useful for assignments 2 and 3.

**Requirements**. To use Colab, you must have a Google account with an associated Google Drive. Assuming you have both, you can connect Colab to your Drive with the following steps:

1. Click the wheel in the top right corner and select `Settings`.
2. Click on the `Manage Apps` tab.
3. At the top, select `Connect more apps` which should bring up a `GSuite Marketplace` window.
4. Search for **Colab** then click `Add`.

**Workflow**. Every assignment provides you with a download link to a zip file containing Colab notebooks and Python starter code. You can upload the folder to Drive, open the notebooks in Colab and work on them, then save your progress back to Drive. We encourage you to watch the tutorial video below which covers the recommended workflow using assignment 1 as an example.

<iframe style="display: block; margin: auto;" width="560" height="315" src="https://www.youtube.com/embed/DsGd2e9JNH4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Best Practices**. There are a few things you should be aware of when working with Colab. The first thing to note is that resources aren't guaranteed (this is the price for being free). If you are idle for a certain amount of time or your total connection time exceeds the maximum allowed time (~12 hours), the Colab VM will disconnect. This means any unsaved progress will be lost. <font color="red"><strong>Thus, get into the habit of frequently saving your code whilst working on assignments.</strong></font> To read more about resource limitations in Colab, read their FAQ [here](https://research.google.com/colaboratory/faq.html).

**Using a GPU**. Using a GPU is as simple as switching the runtime in Colab. Specifically, click `Runtime -> Change runtime type -> Hardware Accelerator -> GPU` and your Colab instance will automatically be backed by GPU compute.

If you're interested in learning more about Colab, we encourage you to visit the resources below:

* [Intro to Google Colab](https://www.youtube.com/watch?v=inN8seMm7UI)
* [Welcome to Colab](https://colab.research.google.com/notebooks/intro.ipynb)
* [Overview of Colab Features](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)

### Working locally on your machine
If you wish to work locally, you should use a virtual environment. You can install one via Anaconda (recommended) or via Python's native `venv` module. Ensure you are using Python 3.7 as **we are no longer supporting Python 2**.

#### Anaconda virtual environment
We strongly recommend using the free [Anaconda Python distribution](https://www.anaconda.com/download/), which provides an easy way for you to handle package dependencies. Please be sure to download the Python 3 version, which currently installs Python 3.7. The neat thing about Anaconda is that it ships with [MKL optimizations](https://docs.anaconda.com/mkl-optimizations/) by default, which means your `numpy` and `scipy` code benefit from significant speed-ups without having to change a single line of code.

Once you have Anaconda installed, it makes sense to create a virtual environment for the course. If you choose not to use a virtual environment (strongly not recommended!), it is up to you to make sure that all dependencies for the code are installed globally on your machine. To set up a virtual environment called `cs231n`, run the following in your terminal:

```bash
# this will create an anaconda environment
# called cs231n in 'path/to/anaconda3/envs/'
conda create -n cs231n python=3.7
```

To activate and enter the environment, run `conda activate cs231n`. To deactivate the environment, either run `conda deactivate cs231n` or exit the terminal. Note that every time you want to work on the assignment, you should rerun `conda activate cs231n`.

```bash
# sanity check that the path to the python
# binary matches that of the anaconda env
# after you activate it
which python
# for example, on my machine, this prints
# $ '/Users/kevin/anaconda3/envs/sci/bin/python'
```

You may refer to [this page](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) for more detailed instructions on managing virtual environments with Anaconda.

**Note:** If you've chosen to go the Anaconda route, you can safely skip the next section and move straight to [Installing Packages](#installing-packages).

<a name='venv'></a>
#### Python venv

As of 3.3, Python natively ships with a lightweight virtual environment module called [venv](https://docs.python.org/3/library/venv.html). Each virtual environment packages its own independent set of installed Python packages that are isolated from system-wide Python packages and runs a Python version that matches that of the binary that was used to create it. To set up a virtual environment called `cs231n`, run the following in your terminal:

```bash
# this will create a virtual environment
# called cs231n in your home directory
python3.7 -m venv ~/cs231n
```

To activate and enter the environment, run `source ~/cs231n/bin/activate`. To deactivate the environment, either run `deactivate` or exit the terminal. Note that every time you want to work on the assignment, you should rerun `source ~/cs231n/bin/activate`.

```bash
# sanity check that the path to the python
# binary matches that of the virtual env
# after you activate it
which python
# for example, on my machine, this prints
# $ '/Users/kevin/cs231n/bin/python'
```

<a name='packages'></a>
#### Installing packages

Once you've **setup** and **activated** your virtual environment (via `conda` or `venv`), you should install the libraries needed to run the assignments using `pip`. To do so, run:

```bash
# again, ensure your virtual env (either conda or venv)
# has been activated before running the commands below
cd assignment1  # cd to the assignment directory

# install assignment dependencies.
# since the virtual env is activated,
# this pip is associated with the
# python binary of the environment
pip install -r requirements.txt
```


================================================
FILE: student-contributions/BackPropagationBasicMatrixOperations.bib
================================================
@misc{wiki:Matrix_calculus,
   author = "Wikipedia",
   title = "{Matrix calculus} --- {W}ikipedia{,} The Free Encyclopedia",
   year = "2021",
   url = {http://en.wikipedia.org/w/index.php?title=Matrix\%20calculus&oldid=1027030038},
   note = "[Online; accessed 13-June-2021]"
}

@misc{li_krishna_xu_2021, place={Stanford}, title={Convolutional Neural Networks}, url={http://cs231n.stanford.edu/slides/2021/lecture_5.pdf}, journal={CS231N Lectures}, publisher={CA}, author={Li, Fei-Fei and Krishna, Ranjay and Xu, Danfei}, year={2021}, month={4}}


================================================
FILE: student-contributions/BackPropagationBasicMatrixOperations.tex
================================================
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{mathtools} % Required for display matrices. Extension on top of amsmath package.
\usepackage{bm} % for rendering vectors correctly
\usepackage{xcolor}
\usepackage{amssymb} % for rendering dimension symbol R
\usepackage[nocfg,notintoc]{nomencl}
\makenomenclature
\usepackage{booktabs}
%\renewcommand{\arraystretch}{2.0} % affects matrices too.
\usepackage[letterpaper, hmargin=0.8in]{geometry}
\usepackage{fancyhdr}
\pagestyle{fancy}

% with this we ensure that the chapter and section
% headings are in lowercase.
%\renewcommand{\chaptermark}[1]{\markboth{#1}{}}
\renewcommand{\sectionmark}[1]{%
\markright{\thesection\ #1}}
\fancyhf{} % delete current header and footer
\fancyhead[L,R]{\bfseries\thepage}
\fancyhead[LO]{\bfseries\rightmark}
%\fancyhead[RE]{\bfseries\leftmark}
\renewcommand{\headrulewidth}{0.5pt}
\renewcommand{\footrulewidth}{0pt}
\addtolength{\headheight}{0.5pt} % space for the rule
\fancypagestyle{plain}{%
\fancyhead{} % get rid of headers on plain pages
\renewcommand{\headrulewidth}{0pt} % and the line
}

\usepackage{biblatex}
\addbibresource{BackPropagationBasicMatrixOperations.bib}

% hyperref package doesn't seem to be working.


\renewcommand{\nomname}{} % We don't to use any word here.
\newcommand{\transpose}[1]{#1^\top}
\newcommand{\vecr}[1]{\bm{#1}}
\newcommand{\matr}[1]{\mathbf{#1}} % undergraduate algebra version
%\newcommand{\matr}[1]{#1}          % pure math version
%\newcommand{\matr}[1]{\bm{#1}}     % ISO complying version

\newcommand{\eqncomment}[1]{
\footnotesize
\textcolor{gray}{
\begin{pmatrix*}[l]
\text{#1}
\end{pmatrix*}
}}
\newcommand{\longeqncomment}[2]
{\footnotesize
\textcolor{gray}{
\begin{pmatrix*}[l]
\text{#1} \\
\text{#2}
\end{pmatrix*}
}}

\title{CS231N: Backpropagation - Vector and Matrix Calculus}
\author{
  Ashish Jain \\
  Stanford University \\
  \texttt{ashishj@stanford.edu/cs231n@ashishjain.io}
 }
\date{\today}

\begin{document}
\pagestyle{empty}
\maketitle

\tableofcontents

\newpage
\pagestyle{fancy}
\section{Introduction}
\subsection{Intended Audience}
This document is primarily targeted towards CS231N students who don't have a formal background in vector and matrix calculus.

\subsection{Learning Goals}
CS231N assignments have a heavy emphasis on vector and matrix calculus. Several different notation and layout conventions are popular in the literature which can be confusing and overwhelming for students just starting out with vector and matrix calculus. This document firmly aligns and sticks with the layout conventions for matrix calculus taught in CS231N lectures \cite{li_krishna_xu_2021}. This document hopes to bring the students who lack a formal background in vector and matrix calculus up to speed as well as serve as a reference while solving the various assignments. 

\subsection{Content}
Through small non-trivial examples, we show how and why certain backpropagation operations reduce to the equations they reduce to for basic operations such as matrix multiplication, some element-wise operation on a matrix, element-wise operations between two matrices, reductions of a matrix to a vector and broadcasting of a vector to a matrix. The example matrices and vectors are deliberately kept small to ensure one can convince oneself or alternately verify on paper if the contents of this document are correct (please feel free to submit bugs or corrections). The examples considered while small are kept as general as possible throughout the derivations. Therefore, a motivated student can easily extend these examples to general proofs.

Each section from section \ref{Matrix Multiplication} onward is written independently of each other, and therefore, you can directly jump to a section of your interest.

\section{Nomenclature}
\vspace{-2.5em}
\nomenclature[N]{\(\vecr{x}\)}{Row or column vector}
\nomenclature[N]{\(\matr{X}\)}{Two dimensional matrix}
\nomenclature[N]{\(L\)}{Loss/Cost scalar}
\nomenclature[N]{\(\mathbb{R}\)}{Real numbers}
\nomenclature[O]{\(\mathrm{d}\text{Variable}\)}{$\frac{\partial L}{\partial \text{Variable}}$}

\printnomenclature

\section{Layout Convention}
Given a vector $\vecr{y}$ where $\bm{y} \in \mathbb{R}^{m}$ and a vector $\bm{x}$ where $\bm{x} \in \mathbb{R}^{n}$, we are going to follow the \textbf{denominator layout} convention whereby $\frac{\partial y}{\partial x}$ is written as $n \times m$ matrix. This is in contrast to the \textbf{numerator layout} whereby $\frac{\partial y}{\partial x}$ is written as $m \times n$ matrix. For more details, please refer to the Wikipedia page \cite{wiki:Matrix_calculus}.

For example, if $\bm{x} \in \mathbb{R}^{3}$ and $\bm{y} \in \mathbb{R}^{2}$ then:

\begin{flalign}
\frac{\partial \bm{y}}{\partial \bm{x}} &= 
\begin{bmatrix}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{1}} \\[0.5em]
\frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{2}} \\[0.5em]
\frac{\partial y_{1}}{\partial x_{3}} & \frac{\partial y_{2}}{\partial x_{3}} \\[0.5em]
\end{bmatrix}
& \longeqncomment{Denominator layout where layout is according to $\transpose{\vecr{y}}$ across axis 1 and $\vecr{x}$ across axis 0.}{In other words, the elements of $\vecr{y}$ are laid out in columns and the elements of $\vecr{x}$ are laid out in rows.}
\nonumber
\end{flalign}

\section{Summary}
\small
\begin{tabular}{cllll}
\toprule
Index & Name & $\matr{Z}$/$\vecr{z}$ & $\frac{\partial L}{\partial \matr{X}}$/$\frac{\partial L}{\partial \vecr{x}}$ & $\frac{\partial L}{\partial \matr{Y}}$/$\frac{\partial L}{\partial \vecr{y}}$ \\[0.3em]

\midrule

1 & Matrix Multiplication & $\matr{Z} = \matr{X}\matr{Y}$ &
$\frac{\partial L}{\partial \matr{X}}=\frac{\partial L}{\partial \matr{Z}} \transpose{\matr{Y}}$ &
$\frac{\partial L}{\partial \matr{Y}}=\transpose{\matr{X}} \frac{\partial L}{\partial \matr{Z}}$ \\[1em]

2 & Element-wise function & $\matr{Z} = g(\matr{X})$ &
$\frac{\partial L}{\partial \matr{X}}=g'(\matr{X}) \circ \frac{\partial L}{\partial \matr{Z}}$ & \\[1em]

3 & Hadamard Product & $\matr{Z} = \matr{X} \circ \matr{Y}$ &
$\frac{\partial L}{\partial \matr{X}}=\matr{Y} \circ \frac{\partial L}{\partial \matr{Z}}$ &
$\frac{\partial L}{\partial \matr{Y}}=\matr{X} \circ \frac{\partial L}{\partial \matr{Z}}$ \\[1em]

4 & Matrix Addition & $\matr{Z} = \matr{X} + \matr{Y}$ &
$\frac{\partial L}{\partial \matr{X}}=\frac{\partial L}{\partial \matr{Z}}$ &
$\frac{\partial L}{\partial \matr{Y}}=\frac{\partial L}{\partial \matr{Z}}$\\[1em]

5 & Transpose & $\matr{Z} = \transpose{\matr{X}}$ &
$\frac{\partial L}{\partial \matr{X}}=\transpose{\frac{\partial L}{\partial \matr{Z}}}$ & \\[1em]

6 & Sum along axis=0 & $\vecr{z}$ = \verb|np.sum(|${\matr{X}}$\verb|, axis=0)| &
$\frac{\partial L}{\partial \matr{X}}=\mathbf{1}_{\text{rows}(\matr{X}),1} \frac{\partial L}{\partial \vecr{z}}$ & \\[1em]

7 & Sum along axis=1 & $\vecr{z}$ = \verb|np.sum(|${\matr{X}}$\verb|, axis=1)| &
$\frac{\partial L}{\partial \matr{X}}=\frac{\partial L}{\partial \vecr{z}} \mathbf{1}_{1, \text{cols}(\matr{X})}$ & \\[1em]

8 & Broadcasting a column vector & $\matr{Z} = \vecr{x} \mathbf{1}_{1,\text{C}}$ &
$\frac{\partial L}{\partial \vecr{x}}=\mathtt{np.sum(} \frac{\partial L}{\partial \matr{Z}} \mathtt{, axis=1)}$ & \\[1em]

9 & Broadcasting a row vector & $\matr{Z} = \mathbf{1}_{\text{R},1} \vecr{x}$ &
$\frac{\partial L}{\partial \vecr{x}}=\mathtt{np.sum(} \frac{\partial L}{\partial \matr{Z}} \mathtt{, axis=0)}$ & \\[0.3em]
\bottomrule
\end{tabular}

\normalsize

\section{NumPy}
\footnotesize
\begin{tabular}{cllll}
\toprule
Index & Name & $\matr{Z}$/$\vecr{z}$ &
\verb dX  = $\frac{\partial L}{\partial \matr{X}}$/ \verb dx  = $\frac{\partial L}{\partial \vecr{x}}$ &
\verb dY  = $\frac{\partial L}{\partial \matr{Y}}$/ \verb dy  = $\frac{\partial L}{\partial \vecr{y}}$ \\[0.3em]

\midrule
1 & Matrix Multiplication & \verb Z  = \verb X@Y &
\verb dX  = \verb dZ@Y.T &
\verb dY  = \verb X.T@dZ \\[0.7em]

2 & Element-wise function & \verb Z  = \verb g(X) &
\verb dX  = \verb g'(X) *\verb dZ & \\[0.7em]

3 & Hadamard Product & \verb Z  = \verb X*Y &
\verb dX  = \verb Y*dZ &
\verb dY  = \verb X*dZ \\[0.7em]

4 & Matrix Addition & \verb Z  = \verb X+Y &
\verb dX  = \verb dZ &
\verb dY  = \verb dZ \\[0.7em]

5 & Transpose & \verb Z  = \verb X.T &
\verb dX  = \verb dZ.T & \\[0.7em]

6 & Sum along axis=0 & \verb z  = \verb|np.sum(X,axis=0)| &
\verb dX  = \verb|np.ones((X.shape[0],1))@dz| & \\[0.7em]

7 & Sum along axis=1 & \verb z  = \verb|np.sum(X,axis=1)| &
\verb dX  = \verb|dz@np.ones((1,X.shape[1]))| & \\[0.7em]

8 & Broadcasting a column vector & \verb Z  = \verb|x+np.zeros((1,C))| &
\verb dx  = \verb|np.sum(dZ,axis=1)| & \\[0.7em]

9 & Broadcasting a row vector & \verb Z  = \verb|x+np.zeros((R,1))| &
\verb dx  = \verb|np.sum(dZ,axis=0)| & \\[0.3em]
\bottomrule
\end{tabular}

\normalsize
\section{Matrix Multiplication} \label{Matrix Multiplication}
\subsection{Forward Pass}
Let $\matr{X}$ be a $2 \times 3$ matrix, and let $\matr{Y}$ be a $3 \times 2$ matrix. Let $\matr{Z} = \matr{X}\matr{Y}$.

\begin{flalign}
\matr{X} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} \\%[0.5em]
x_{2,1} & x_{2,2} & x_{2,3} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\begin{flalign}
\matr{Y} &=
\begin{bmatrix}
y_{1,1} & y_{1,2} \\%[0.5em]
y_{2,1} & y_{2,2} \\%[0.5em]
y_{3,1} & y_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\vspace{1em}
\noindent Given $\matr{Z} = \matr{X}\matr{Y}$, Z is a $2 \times 2$ matrix which can be expressed as:

\begin{flalign}
\matr{Z} &= \begin{bmatrix}
z_{1,1} & z_{1,2}\\[0.5em]
z_{2,1} & z_{2,2}\\[0.5em]
\end{bmatrix}
&
\nonumber
\\
&=
\begin{bmatrix}
x_{1,1}.y_{1,1} + x_{1,2}.y_{2,1} + x_{1,3}.y_{3,1} & x_{1,1}.y_{1,2} + x_{1,2}.y_{2,2} + x_{1,3}.y_{3,2}\\[0.5em]
x_{2,1}.y_{1,1} + x_{2,2}.y_{2,1} + x_{2,3}.y_{3,1} & x_{2,1}.y_{1,2} + x_{2,2}.y_{2,2} + x_{2,3}.y_{3,2}\\[0.5em]
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We are given $\frac{\partial L}{\partial \matr{Z}}$. It will be of shape $2 \times 2$.

\begin{flalign}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.5em]
\end{bmatrix} & \eqncomment{$\frac{\partial L}{\partial \matr{Z}}$ is the same shape as $\matr{Z}$ as $L$ is a scalar} \label{dZ_matrix_multiplication}
\end{flalign}

\noindent We need to compute $\frac{\partial L}{\partial \matr{X}}$ and $\frac{\partial L}{\partial \matr{Y}}$. Using chain rule, we get:

\begin{flalign} \label{dX_matrix_multiplication}
\frac{\partial L}{\partial \matr{X}} &= \frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\begin{flalign} \label{dY_matrix_multiplication}
\frac{\partial L}{\partial \matr{Y}} &= \frac{\partial \matr{Z}}{\partial \matr{Y}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{X}}$}
To compute $\frac{\partial L}{\partial \matr{X}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{X}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$, $\matr{Y}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{X}}$, we will reshape it back to a matrix with the same shape as $\matr{X}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{2,2}}{\partial x_{1,1}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{1,2}}{\partial x_{1,2}} & \frac{\partial z_{2,1}}{\partial x_{1,2}} & \frac{\partial z_{2,2}}{\partial x_{1,2}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial x_{1,3}} & \frac{\partial z_{1,2}}{\partial x_{1,3}} & \frac{\partial z_{2,1}}{\partial x_{1,3}} & \frac{\partial z_{2,2}}{\partial x_{1,3}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{1,2}}{\partial x_{2,1}} & \frac{\partial z_{2,1}}{\partial x_{2,1}} & \frac{\partial z_{2,2}}{\partial x_{2,1}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial x_{2,2}} & \frac{\partial z_{1,2}}{\partial x_{2,2}} & \frac{\partial z_{2,1}}{\partial x_{2,2}} & \frac{\partial z_{2,2}}{\partial x_{2,2}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial x_{2,3}} & \frac{\partial z_{1,2}}{\partial x_{2,3}} & \frac{\partial z_{2,1}}{\partial x_{2,3}} & \frac{\partial z_{2,2}}{\partial x_{2,3}} \\[0.5em]
\end{bmatrix}
& \eqncomment{$\matr{X}$, $\matr{Z}$ are being treated as column vectors. Therefore, $\frac{\partial \matr{Z}}{\partial \matr{X}}$ is of shape $6\times4$.}
\nonumber \\
\label{dZbydX_matrix_multiplication}
&=
\begin{bmatrix}
y_{1,1} & y_{1,2} & 0 & 0 \\[0.5em]
y_{2,1} & y_{2,2} & 0 & 0 \\[0.5em]
y_{3,1} & y_{3,2} & 0 & 0 \\[0.5em]
0 & 0 & y_{1,1} & y_{1,2} \\[0.5em]
0 & 0 & y_{2,1} & y_{2,2} \\[0.5em]
0 & 0 & y_{3,1} & y_{3,2} \\[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \matr{Z}}$ in equation \ref{dZ_matrix_multiplication} expressed as a column vector will be:

\begin{flalign}
\label{dZAsColumnVector_matrix_multiplication}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix} & \eqncomment{Reshaping $\frac{\partial L}{\partial \matr{Z}}$ from shape $2 \times 2$ to $4 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_matrix_multiplication} and \ref{dZAsColumnVector_matrix_multiplication} into equation \ref{dX_matrix_multiplication}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\nonumber \\
&=
\begin{bmatrix}
y_{1,1} & y_{1,2} & 0 & 0 \\[0.5em]
y_{2,1} & y_{2,2} & 0 & 0 \\[0.5em]
y_{3,1} & y_{3,2} & 0 & 0 \\[0.5em]
0 & 0 & y_{1,1} & y_{1,2} \\[0.5em]
0 & 0 & y_{2,1} & y_{2,2} \\[0.5em]
0 & 0 & y_{3,1} & y_{3,2} \\[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{X}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\ 
&= 
\begin{bmatrix}
y_{1,1}.\frac{\partial L}{\partial z_{1,1}} + y_{1,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
y_{2,1}.\frac{\partial L}{\partial z_{1,1}} + y_{2,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
y_{3,1}.\frac{\partial L}{\partial z_{1,1}} + y_{3,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
y_{1,1}.\frac{\partial L}{\partial z_{2,1}} + y_{1,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
y_{2,1}.\frac{\partial L}{\partial z_{2,1}} + y_{2,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
y_{3,1}.\frac{\partial L}{\partial z_{2,1}} + y_{3,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_matrix_multiplication}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dXAsColumnVector_matrix_multiplication} as a matrix of shape $\matr{X}$, we get:
\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\begin{bmatrix}
y_{1,1}.\frac{\partial L}{\partial z_{1,1}} + y_{1,2}.\frac{\partial L}{\partial z_{1,2}} &
y_{2,1}.\frac{\partial L}{\partial z_{1,1}} + y_{2,2}.\frac{\partial L}{\partial z_{1,2}} &
y_{3,1}.\frac{\partial L}{\partial z_{1,1}} + y_{3,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
y_{1,1}.\frac{\partial L}{\partial z_{2,1}} + y_{1,2}.\frac{\partial L}{\partial z_{2,2}} &
y_{2,1}.\frac{\partial L}{\partial z_{2,1}} + y_{2,2}.\frac{\partial L}{\partial z_{2,2}} &
y_{3,1}.\frac{\partial L}{\partial z_{2,1}} + y_{3,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}
\nonumber \\ 
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}
\begin{bmatrix}
y_{1,1} & y_{2,1} & y_{3,1} \\%[0.5em]
y_{1,2} & y_{2,2} & y_{3,2} \\%[0.5em]
\end{bmatrix}
& \eqncomment{Decomposing into a matmul operation}
\nonumber \\
&=
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
\underbrace{
\transpose{\begin{bmatrix}
y_{1,1} & y_{1,2} \\%[0.5em]
y_{2,1} & y_{2,2} \\%[0.5em]
y_{3,1} & y_{3,2} \\%[0.5em]
\end{bmatrix}}}_{\transpose{\matr{Y}}}
\nonumber \\ \label{dXFinal}
&= \frac{\partial L}{\partial \matr{Z}} \transpose{\matr{Y}}
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{Y}}$}
To compute $\frac{\partial L}{\partial \matr{Y}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{Y}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$, $\matr{Y}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{Y}}$, we will reshape it back to a matrix with the same shape as $\matr{Y}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{Y}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial y_{1,1}} & \frac{\partial z_{1,2}}{\partial y_{1,1}} & \frac{\partial z_{2,1}}{\partial y_{1,1}} & \frac{\partial z_{2,2}}{\partial y_{1,1}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial y_{1,2}} & \frac{\partial z_{1,2}}{\partial y_{1,2}} & \frac{\partial z_{2,1}}{\partial y_{1,2}} & \frac{\partial z_{2,2}}{\partial y_{1,2}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial y_{2,1}} & \frac{\partial z_{1,2}}{\partial y_{2,1}} & \frac{\partial z_{2,1}}{\partial y_{2,1}} & \frac{\partial z_{2,2}}{\partial y_{2,1}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial y_{2,2}} & \frac{\partial z_{1,2}}{\partial y_{2,2}} & \frac{\partial z_{2,1}}{\partial y_{2,2}} & \frac{\partial z_{2,2}}{\partial y_{2,2}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial y_{3,1}} & \frac{\partial z_{1,2}}{\partial y_{3,1}} & \frac{\partial z_{2,1}}{\partial y_{3,1}} & \frac{\partial z_{2,2}}{\partial y_{3,1}} \\[0.5em]
\frac{\partial z_{1,1}}{\partial y_{3,2}} & \frac{\partial z_{1,2}}{\partial y_{3,2}} & \frac{\partial z_{2,1}}{\partial y_{3,2}} & \frac{\partial z_{2,2}}{\partial y_{3,2}} \\[0.5em]
\end{bmatrix}
& \eqncomment{$\matr{Y}$, $\matr{Z}$ are being treated as column vectors. Therefore, $\frac{\partial \matr{Z}}{\partial \matr{Y}}$ is of shape $6\times4$.}
\nonumber
\\
&=
\begin{bmatrix}
x_{1,1} & 0 & x_{2,1} & 0 \\[0.5em]
0 & x_{1,1} & 0 & x_{2,1} \\[0.5em]
x_{1,2} & 0 & x_{2,2} & 0 \\[0.5em]
0 & x_{1,2} & 0 & x_{2,2} \\[0.5em]
x_{1,3} & 0 & x_{2,3} & 0 \\[0.5em]
0 & x_{1,3} & 0 & x_{2,3} \\[0.5em]
\end{bmatrix} \label{dZbydY_matrix_multiplication}
\end{flalign}

\noindent Plugging equations \ref{dZbydY_matrix_multiplication} and \ref{dZAsColumnVector_matrix_multiplication} into equation \ref{dY_matrix_multiplication}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{Y}} &=
\begin{bmatrix}
x_{1,1} & 0 & x_{2,1} & 0 \\[0.5em]
0 & x_{1,1} & 0 & x_{2,1} \\[0.5em]
x_{1,2} & 0 & x_{2,2} & 0 \\[0.5em]
0 & x_{1,2} & 0 & x_{2,2} \\[0.5em]
x_{1,3} & 0 & x_{2,3} & 0 \\[0.5em]
0 & x_{1,3} & 0 & x_{2,3} \\[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{Y}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
x_{1,1}.\frac{\partial L}{\partial z_{1,1}} + x_{2,1}.\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
x_{1,1}.\frac{\partial L}{\partial z_{1,2}} + x_{2,1}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
x_{1,2}.\frac{\partial L}{\partial z_{1,1}} + x_{2,2}.\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
x_{1,2}.\frac{\partial L}{\partial z_{1,2}} + x_{2,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
x_{1,3}.\frac{\partial L}{\partial z_{1,1}} + x_{2,3}.\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
x_{1,3}.\frac{\partial L}{\partial z_{1,2}} + x_{2,3}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix} \label{dYAsColumnVector_matrix_multiplication}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dYAsColumnVector_matrix_multiplication} as a matrix of shape $\matr{Y}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{Y}} &=
\begin{bmatrix}
x_{1,1}.\frac{\partial L}{\partial z_{1,1}} + x_{2,1}.\frac{\partial L}{\partial z_{2,1}} &
x_{1,1}.\frac{\partial L}{\partial z_{1,2}} + x_{2,1}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
x_{1,2}.\frac{\partial L}{\partial z_{1,1}} + x_{2,2}.\frac{\partial L}{\partial z_{2,1}} &
x_{1,2}.\frac{\partial L}{\partial z_{1,2}} + x_{2,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
x_{1,3}.\frac{\partial L}{\partial z_{1,1}} + x_{2,3}.\frac{\partial L}{\partial z_{2,1}} &
x_{1,3}.\frac{\partial L}{\partial z_{1,2}} + x_{2,3}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}
\nonumber \\
&=
\begin{bmatrix}
x_{1,1} & x_{2,1} \\%[0.5em]
x_{1,2} & x_{2,2} \\%[0.5em]
x_{1,3} & x_{2,3} \\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{Decomposing into a matmul operation}
\nonumber \\
&=
\underbrace{
\transpose{
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} \\%[0.5em]
x_{2,1} & x_{2,2} & x_{2,3} \\%[0.5em]
\end{bmatrix}}}_{\transpose{\matr{X}}}
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
\nonumber \\
&= \transpose{\matr{X}} \frac{\partial L}{\partial \matr{Z}}
\end{flalign}

\section{Element-wise operation on a Matrix}
\subsection{Forward Pass}
Consider some function $g(x)$ which is applied element-wise on a matrix $\matr{X}$ of shape $3 \times 2$. Let $\matr{Z} = g(\matr{X})$. $\matr{Z}$ will be of shape $3 \times 2$.

\begin{flalign}
\matr{X} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} \\%[0.5em]
x_{2,1} & x_{2,2} \\%[0.5em]
x_{3,1} & x_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent $\matr{Z}$ can be expressed as:

\begin{flalign}
\matr{Z} &=
\begin{bmatrix}
z_{1,1} & z_{1,2} \\%[0.5em]
z_{2,1} & z_{2,2} \\%[0.5em]
z_{3,1} & z_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber \\
&=
\begin{bmatrix}
g(x_{1,1}) & g(x_{1,2}) \\[0.5em]
g(x_{2,1}) & g(x_{2,2}) \\[0.5em]
g(x_{3,1}) & g(x_{3,2}) \\[0.5em]
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \matr{Z}}$ of shape $3 \times 2$.

\begin{flalign}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.5em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.5em]
\end{bmatrix} & \eqncomment{$\frac{\partial L}{\partial \matr{Z}}$ is the same shape as $\matr{Z}$ as $L$ is a scalar}
\label{dZ_elewise_single_matrix}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \matr{X}}$. Using chain rule, we get:

\begin{flalign} \label{dX_elewise_single_matrix}
\frac{\partial L}{\partial \matr{X}} &= \frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{X}}$}
To compute $\frac{\partial L}{\partial \matr{X}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{X}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{X}}$, we will reshape it back to a matrix with the same shape as $\matr{X}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{2,2}}{\partial x_{1,1}} & \frac{\partial z_{3,1}}{\partial x_{1,1}} & \frac{\partial z_{3,2}}{\partial x_{1,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{1,2}}{\partial x_{1,2}} & \frac{\partial z_{2,1}}{\partial x_{1,2}} & \frac{\partial z_{2,2}}{\partial x_{1,2}} & \frac{\partial z_{3,1}}{\partial x_{1,2}} & \frac{\partial z_{3,2}}{\partial x_{1,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{1,2}}{\partial x_{2,1}} & \frac{\partial z_{2,1}}{\partial x_{2,1}} & \frac{\partial z_{2,2}}{\partial x_{2,1}} & \frac{\partial z_{3,1}}{\partial x_{2,1}} & \frac{\partial z_{3,2}}{\partial x_{2,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,2}} & \frac{\partial z_{1,2}}{\partial x_{2,2}} & \frac{\partial z_{2,1}}{\partial x_{2,2}} & \frac{\partial z_{2,2}}{\partial x_{2,2}} & \frac{\partial z_{3,1}}{\partial x_{2,2}} & \frac{\partial z_{3,2}}{\partial x_{2,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,1}} & \frac{\partial z_{1,2}}{\partial x_{3,1}} & \frac{\partial z_{2,1}}{\partial x_{3,1}} & \frac{\partial z_{2,2}}{\partial x_{3,1}} & \frac{\partial z_{3,1}}{\partial x_{3,1}} & \frac{\partial z_{3,2}}{\partial x_{3,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,2}} & \frac{\partial z_{1,2}}{\partial x_{3,2}} & \frac{\partial z_{2,1}}{\partial x_{3,2}} & \frac{\partial z_{2,2}}{\partial x_{3,2}} & \frac{\partial z_{3,1}}{\partial x_{3,2}} & \frac{\partial z_{3,2}}{\partial x_{3,2}}\\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{X}$, $\matr{Z}$ are being treated as column vectors.}{Therefore, $\frac{\partial \matr{Z}}{\partial \matr{X}}$ is of shape $6\times6$.}
\nonumber
\\ \label{dZbydX_elewise_single_matrix}
&=
\begin{bmatrix*}[c]
g'(x_{1,1}) & 0 & 0 & 0 & 0 & 0 \\[0.0em]
0 & g'(x_{1,2}) & 0 & 0 & 0 & 0 \\[0.0em]
0 & 0 & g'(x_{2,1}) & 0 & 0 & 0 \\[0.0em]
0 & 0 & 0 & g'(x_{2,2}) & 0 & 0 \\[0.0em]
0 & 0 & 0 & 0 & g'(x_{3,1}) & 0 \\[0.0em]
0 & 0 & 0 & 0 & 0 & g'(x_{3,2}) \\[0.0em]
\end{bmatrix*}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \matr{Z}}$ in equation \ref{dZ_elewise_single_matrix} expressed as a column vector will be:

\begin{flalign} \label{dZAsColumnVector_elewise_single_matrix}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{Reshaping $\frac{\partial L}{\partial \matr{Z}}$ from shape $3 \times 2$ to $6 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_elewise_single_matrix} and \ref{dZAsColumnVector_elewise_single_matrix} into equation \ref{dX_elewise_single_matrix}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\nonumber \\
&=
\begin{bmatrix}
g'(x_{1,1}) & 0 & 0 & 0 & 0 & 0 \\[0.5em]
0 & g'(x_{1,2}) & 0 & 0 & 0 & 0 \\[0.5em]
0 & 0 & g'(x_{2,1}) & 0 & 0 & 0 \\[0.5em]
0 & 0 & 0 & g'(x_{2,2}) & 0 & 0 \\[0.5em]
0 & 0 & 0 & 0 & g'(x_{3,1}) & 0 \\[0.5em]
0 & 0 & 0 & 0 & 0 & g'(x_{3,2}) \\[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{X}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being}{treated as column vectors.}
\nonumber \\
&=
\begin{bmatrix}
g'(x_{1,1}).\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
g'(x_{1,2}).\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
g'(x_{2,1}).\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
g'(x_{2,2}).\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
g'(x_{3,1}).\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
g'(x_{3,2}).\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_elewise_single_matrix}
\end{flalign}

\noindent Note, we will be using $\circ$ to denote element-wise multiplication between matrices popularly known as Hadamard product. Also $g'(\matr{X})$ like $g(\matr{X})$ will be applied element-wise.

\begin{flalign}
g'(\matr{X}) &=
\begin{bmatrix}
g'(x_{1,1}) & g'(x_{1,2}) \\[0.5em]
g'(x_{2,1}) & g'(x_{2,2}) \\[0.5em]
g'(x_{3,1}) & g'(x_{3,2}) \\[0.5em]
\end{bmatrix} & \nonumber
\end{flalign}

\noindent Now, reshaping column vector in equation \ref{dXAsColumnVector_elewise_single_matrix} as a matrix of shape $\matr{X}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\begin{bmatrix}
g'(x_{1,1}).\frac{\partial L}{\partial z_{1,1}} &
g'(x_{1,2}).\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
g'(x_{2,1}).\frac{\partial L}{\partial z_{2,1}} &
g'(x_{2,2}).\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
g'(x_{3,1}).\frac{\partial L}{\partial z_{3,1}} &
g'(x_{3,2}).\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
\nonumber \\
&=
\underbrace{
\begin{bmatrix}
g'(x_{1,1}) & g'(x_{1,2}) \\[0.5em]
g'(x_{2,1}) & g'(x_{2,2}) \\[0.5em]
g'(x_{3,1}) & g'(x_{3,2}) \\[0.5em]
\end{bmatrix}}_{g'(\matr{X})}
\circ
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.5em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.5em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
& \eqncomment{Decomposing into an element-wise multiplication between matrices.}
\nonumber \\
&=
g'(\matr{X}) \circ \frac{\partial L}{\partial \matr{Z}}
\end{flalign}

\section{Hadamard product}
\subsection{Forward Pass}
Let $\matr{X}$ be a $3 \times 2$ matrix, and let $\matr{Y}$ be a $3 \times 2$ matrix. Let $\matr{Z} = \matr{X} \circ \matr{Y}$ that is element-wise multiplication between $\matr{X}$ and $\matr{Y}$.

\begin{flalign}
\matr{X} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} \\%[0.5em]
x_{2,1} & x_{2,2} \\%[0.5em]
x_{3,1} & x_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\begin{flalign}
\matr{Y} &=
\begin{bmatrix}
y_{1,1} & y_{1,2} \\%[0.5em]
y_{2,1} & y_{2,2} \\%[0.5em]
y_{3,1} & y_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent Given $\matr{Z} = \matr{X} \circ \matr{Y}$, $\matr{Z}$ is a $3 \times 2$ matrix which can be expressed as:

\begin{flalign}
\matr{Z} &= \begin{bmatrix}
z_{1,1} & z_{1,2}\\[0.5em]
z_{2,1} & z_{2,2}\\[0.5em]
z_{3,1} & z_{3,2}\\[0.5em]
\end{bmatrix}
&
\nonumber
\\
&=
\begin{bmatrix}
x_{1,1}.y_{1,1} & x_{1,2}.y_{1,2} \\[0.5em]
x_{2,1}.y_{2,1} & x_{2,2}.y_{2,2} \\[0.5em]
x_{3,1}.y_{3,1} & x_{3,2}.y_{3,2} \\[0.5em]
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \matr{Z}}$ of shape $3 \times 2$.

\begin{flalign}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.5em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.5em]
\end{bmatrix}
& \eqncomment{$\frac{\partial L}{\partial \matr{Z}}$ is the same shape as $\matr{Z}$ as $L$ is a scalar}
\label{dZ_hadamard_product}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \matr{X}}$ and $\frac{\partial L}{\partial \matr{Y}}$. Using chain rule, we get:

\begin{flalign} \label{dX_hadamard_product}
\frac{\partial L}{\partial \matr{X}} &= \frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\begin{flalign} \label{dY_hadamard_product}
\frac{\partial L}{\partial \matr{Y}} &= \frac{\partial \matr{Z}}{\partial \matr{Y}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{X}}$}
To compute $\frac{\partial L}{\partial \matr{X}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{X}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$, $\matr{Y}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{X}}$, we will reshape it back to a matrix with the same shape as $\matr{X}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{2,2}}{\partial x_{1,1}} & \frac{\partial z_{3,1}}{\partial x_{1,1}} & \frac{\partial z_{3,2}}{\partial x_{1,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{1,2}}{\partial x_{1,2}} & \frac{\partial z_{2,1}}{\partial x_{1,2}} & \frac{\partial z_{2,2}}{\partial x_{1,2}} & \frac{\partial z_{3,1}}{\partial x_{1,2}} & \frac{\partial z_{3,2}}{\partial x_{1,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{1,2}}{\partial x_{2,1}} & \frac{\partial z_{2,1}}{\partial x_{2,1}} & \frac{\partial z_{2,2}}{\partial x_{2,1}} & \frac{\partial z_{3,1}}{\partial x_{2,1}} & \frac{\partial z_{3,2}}{\partial x_{2,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,2}} & \frac{\partial z_{1,2}}{\partial x_{2,2}} & \frac{\partial z_{2,1}}{\partial x_{2,2}} & \frac{\partial z_{2,2}}{\partial x_{2,2}} & \frac{\partial z_{3,1}}{\partial x_{2,2}} & \frac{\partial z_{3,2}}{\partial x_{2,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,1}} & \frac{\partial z_{1,2}}{\partial x_{3,1}} & \frac{\partial z_{2,1}}{\partial x_{3,1}} & \frac{\partial z_{2,2}}{\partial x_{3,1}} & \frac{\partial z_{3,1}}{\partial x_{3,1}} & \frac{\partial z_{3,2}}{\partial x_{3,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,2}} & \frac{\partial z_{1,2}}{\partial x_{3,2}} & \frac{\partial z_{2,1}}{\partial x_{3,2}} & \frac{\partial z_{2,2}}{\partial x_{3,2}} & \frac{\partial z_{3,1}}{\partial x_{3,2}} & \frac{\partial z_{3,2}}{\partial x_{3,2}}\\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{X}$, $\matr{Z}$ are being treated as column vectors.}{Therefore, $\frac{\partial \matr{Z}}{\partial \matr{X}}$ is of shape $6\times6$.}
\nonumber
\\ \label{dZbydX_hadamard_product}
&=
\begin{bmatrix}
y_{1,1} & 0 & 0 & 0 & 0 & 0 \\[0.5em]
0 & y_{1,2} & 0 & 0 & 0 & 0 \\[0.5em]
0 & 0 & y_{2,1} & 0 & 0 & 0 \\[0.5em]
0 & 0 & 0 & y_{2,2} & 0 & 0 \\[0.5em]
0 & 0 & 0 & 0 & y_{3,1} & 0 \\[0.5em]
0 & 0 & 0 & 0 & 0 & y_{3,2} \\[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \matr{Z}}$ in equation \ref{dZ_hadamard_product} expressed as a column vector will be:

\begin{flalign} \label{dZAsColumnVector_hadamard_product}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{Reshaping $\frac{\partial L}{\partial \matr{Z}}$ from shape $3 \times 2$ to $6 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_hadamard_product} and \ref{dZAsColumnVector_hadamard_product} into equation \ref{dX_hadamard_product}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\nonumber \\
&=
\begin{bmatrix}
y_{1,1} & 0 & 0 & 0 & 0 & 0 \\[0.5em]
0 & y_{1,2} & 0 & 0 & 0 & 0 \\[0.5em]
0 & 0 & y_{2,1} & 0 & 0 & 0 \\[0.5em]
0 & 0 & 0 & y_{2,2} & 0 & 0 \\[0.5em]
0 & 0 & 0 & 0 & y_{3,1} & 0 \\[0.5em]
0 & 0 & 0 & 0 & 0 & y_{3,2} \\[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{X}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
y_{1,1}.\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
y_{1,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
y_{2,1}.\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
y_{2,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
y_{3,1}.\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
y_{3,2}.\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_hadamard_product}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dXAsColumnVector_hadamard_product} as a matrix of shape $\matr{X}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\begin{bmatrix}
y_{1,1}.\frac{\partial L}{\partial z_{1,1}} &
y_{1,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
y_{2,1}.\frac{\partial L}{\partial z_{2,1}} &
y_{2,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
y_{3,1}.\frac{\partial L}{\partial z_{3,1}} &
y_{3,2}.\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
\nonumber \\
&=
\underbrace{
\begin{bmatrix}
y_{1,1} & y_{1,2} \\%[0.5em]
y_{2,1} & y_{2,2} \\%[0.5em]
y_{3,1} & y_{3,2} \\%[0.5em]
\end{bmatrix}}_{\matr{Y}}
\circ
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.5em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.5em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
& \eqncomment{Decomposing into an element-wise multiplication between matrices.}
\nonumber \\
&=
\matr{Y} \circ \frac{\partial L}{\partial \matr{Z}}
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{Y}}$}
To compute $\frac{\partial L}{\partial \matr{Y}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{Y}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$, $\matr{Y}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{Y}}$, we will reshape it back to a matrix with the same shape as $\matr{Y}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{Y}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial y_{1,1}} & \frac{\partial z_{1,2}}{\partial y_{1,1}} & \frac{\partial z_{2,1}}{\partial y_{1,1}} & \frac{\partial z_{2,2}}{\partial y_{1,1}} & \frac{\partial z_{3,1}}{\partial y_{1,1}} & \frac{\partial z_{3,2}}{\partial y_{1,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{1,2}} & \frac{\partial z_{1,2}}{\partial y_{1,2}} & \frac{\partial z_{2,1}}{\partial y_{1,2}} & \frac{\partial z_{2,2}}{\partial y_{1,2}} & \frac{\partial z_{3,1}}{\partial y_{1,2}} & \frac{\partial z_{3,2}}{\partial y_{1,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{2,1}} & \frac{\partial z_{1,2}}{\partial y_{2,1}} & \frac{\partial z_{2,1}}{\partial y_{2,1}} & \frac{\partial z_{2,2}}{\partial y_{2,1}} & \frac{\partial z_{3,1}}{\partial y_{2,1}} & \frac{\partial z_{3,2}}{\partial y_{2,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{2,2}} & \frac{\partial z_{1,2}}{\partial y_{2,2}} & \frac{\partial z_{2,1}}{\partial y_{2,2}} & \frac{\partial z_{2,2}}{\partial y_{2,2}} & \frac{\partial z_{3,1}}{\partial y_{2,2}} & \frac{\partial z_{3,2}}{\partial y_{2,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{3,1}} & \frac{\partial z_{1,2}}{\partial y_{3,1}} & \frac{\partial z_{2,1}}{\partial y_{3,1}} & \frac{\partial z_{2,2}}{\partial y_{3,1}} & \frac{\partial z_{3,1}}{\partial y_{3,1}} & \frac{\partial z_{3,2}}{\partial y_{3,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{3,2}} & \frac{\partial z_{1,2}}{\partial y_{3,2}} & \frac{\partial z_{2,1}}{\partial y_{3,2}} & \frac{\partial z_{2,2}}{\partial y_{3,2}} & \frac{\partial z_{3,1}}{\partial y_{3,2}} & \frac{\partial z_{3,2}}{\partial y_{3,2}}\\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{Y}$, $\matr{Z}$ are being treated as column vectors.}{Therefore, $\frac{\partial \matr{Z}}{\partial \matr{Y}}$ is of shape $6\times6$.}
\nonumber
\\ \label{dZbydY_hadamard_product}
&=
\begin{bmatrix}
x_{1,1} & 0 & 0 & 0 & 0 & 0 \\[0.5em]
0 & x_{1,2} & 0 & 0 & 0 & 0 \\[0.5em]
0 & 0 & x_{2,1} & 0 & 0 & 0 \\[0.5em]
0 & 0 & 0 & x_{2,2} & 0 & 0 \\[0.5em]
0 & 0 & 0 & 0 & x_{3,1} & 0 \\[0.5em]
0 & 0 & 0 & 0 & 0 & x_{3,2} \\[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Plugging equations \ref{dZbydY_hadamard_product} and \ref{dZAsColumnVector_hadamard_product} into equation \ref{dY_hadamard_product}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{Y}} &=
\frac{\partial \matr{Z}}{\partial \matr{Y}}\frac{\partial L}{\partial \matr{Z}} &
\nonumber \\
&=
\begin{bmatrix}
x_{1,1} & 0 & 0 & 0 & 0 & 0 \\[0.5em]
0 & x_{1,2} & 0 & 0 & 0 & 0 \\[0.5em]
0 & 0 & x_{2,1} & 0 & 0 & 0 \\[0.5em]
0 & 0 & 0 & x_{2,2} & 0 & 0 \\[0.5em]
0 & 0 & 0 & 0 & x_{3,1} & 0 \\[0.5em]
0 & 0 & 0 & 0 & 0 & x_{3,2} \\[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{Y}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
x_{1,1}.\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
x_{1,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
x_{2,1}.\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
x_{2,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
x_{3,1}.\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
x_{3,2}.\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix} \label{dYAsColumnVector_hadamard_product}
\end{flalign}

\noindent Now, reshaping column vector in equation \ref{dYAsColumnVector_hadamard_product} as a matrix of shape $\matr{Y}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\begin{bmatrix}
x_{1,1}.\frac{\partial L}{\partial z_{1,1}} &
x_{1,2}.\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
x_{2,1}.\frac{\partial L}{\partial z_{2,1}} &
x_{2,2}.\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
x_{3,1}.\frac{\partial L}{\partial z_{3,1}} &
x_{3,2}.\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
\nonumber \\
&=
\underbrace{
\begin{bmatrix}
x_{1,1} & x_{1,2} \\%[0.5em]
x_{2,1} & x_{2,2} \\%[0.5em]
x_{3,1} & x_{3,2} \\%[0.5em]
\end{bmatrix}}_{\matr{X}}
\circ
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.5em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.5em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
& \eqncomment{Decomposing into an element-wise multiplication between matrices.}
\nonumber \\
&=
\matr{X} \circ \frac{\partial L}{\partial \matr{Z}}
\end{flalign}

\section{Matrix Addition}
\subsection{Forward Pass}
Let $\matr{X}$ be a $3 \times 2$ matrix, and let $\matr{Y}$ be a $3 \times 2$ matrix. Let $\matr{Z} = \matr{X} + \matr{Y}$ that is element-wise addition between $\matr{X}$ and $\matr{Y}$.

\begin{flalign}
\matr{X} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} \\%[0.5em]
x_{2,1} & x_{2,2} \\%[0.5em]
x_{3,1} & x_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\begin{flalign}
\matr{Y} &=
\begin{bmatrix}
y_{1,1} & y_{1,2} \\%[0.5em]
y_{2,1} & y_{2,2} \\%[0.5em]
y_{3,1} & y_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent Given $\matr{Z} = \matr{X} + \matr{Y}$, Z is a $3 \times 2$ matrix which can be expressed as:

\begin{flalign}
\matr{Z} &= \begin{bmatrix}
z_{1,1} & z_{1,2}\\[0.5em]
z_{2,1} & z_{2,2}\\[0.5em]
z_{3,1} & z_{3,2}\\[0.5em]
\end{bmatrix}
&
\nonumber
\\
&=
\begin{bmatrix}
x_{1,1} + y_{1,1} & x_{1,2} + y_{1,2} \\[0.5em]
x_{2,1} + y_{2,1} & x_{2,2} + y_{2,2} \\[0.5em]
x_{3,1} + y_{3,1} & x_{3,2} + y_{3,2} \\[0.5em]
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \matr{Z}}$ of shape $3 \times 2$.

\begin{flalign}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.5em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.5em]
\end{bmatrix}
& \eqncomment{$\frac{\partial L}{\partial \matr{Z}}$ is the same shape as $\matr{Z}$ as $L$ is a scalar}
\label{dZ_matrix_addition}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \matr{X}}$ and $\frac{\partial L}{\partial \matr{Y}}$. Using chain rule, we get:

\begin{flalign} \label{dX_matrix_addition}
\frac{\partial L}{\partial \matr{X}} &= \frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\begin{flalign} \label{dY_matrix_addition}
\frac{\partial L}{\partial \matr{Y}} &= \frac{\partial \matr{Z}}{\partial \matr{Y}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{X}}$}
To compute $\frac{\partial L}{\partial \matr{X}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{X}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$, $\matr{Y}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{X}}$, we will reshape it back to a matrix with the same shape as $\matr{X}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{2,2}}{\partial x_{1,1}} & \frac{\partial z_{3,1}}{\partial x_{1,1}} & \frac{\partial z_{3,2}}{\partial x_{1,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{1,2}}{\partial x_{1,2}} & \frac{\partial z_{2,1}}{\partial x_{1,2}} & \frac{\partial z_{2,2}}{\partial x_{1,2}} & \frac{\partial z_{3,1}}{\partial x_{1,2}} & \frac{\partial z_{3,2}}{\partial x_{1,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{1,2}}{\partial x_{2,1}} & \frac{\partial z_{2,1}}{\partial x_{2,1}} & \frac{\partial z_{2,2}}{\partial x_{2,1}} & \frac{\partial z_{3,1}}{\partial x_{2,1}} & \frac{\partial z_{3,2}}{\partial x_{2,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,2}} & \frac{\partial z_{1,2}}{\partial x_{2,2}} & \frac{\partial z_{2,1}}{\partial x_{2,2}} & \frac{\partial z_{2,2}}{\partial x_{2,2}} & \frac{\partial z_{3,1}}{\partial x_{2,2}} & \frac{\partial z_{3,2}}{\partial x_{2,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,1}} & \frac{\partial z_{1,2}}{\partial x_{3,1}} & \frac{\partial z_{2,1}}{\partial x_{3,1}} & \frac{\partial z_{2,2}}{\partial x_{3,1}} & \frac{\partial z_{3,1}}{\partial x_{3,1}} & \frac{\partial z_{3,2}}{\partial x_{3,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,2}} & \frac{\partial z_{1,2}}{\partial x_{3,2}} & \frac{\partial z_{2,1}}{\partial x_{3,2}} & \frac{\partial z_{2,2}}{\partial x_{3,2}} & \frac{\partial z_{3,1}}{\partial x_{3,2}} & \frac{\partial z_{3,2}}{\partial x_{3,2}}\\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{X}$, $\matr{Z}$ are being treated as column vectors.}{Therefore, $\frac{\partial \matr{Z}}{\partial \matr{X}}$ is of shape $6\times6$.}
\nonumber
\\ \label{dZbydX_matrix_addition}
&=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 1 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 1 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 1 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 1 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \matr{Z}}$ in equation \ref{dZ_matrix_addition} expressed as a column vector will be:

\begin{flalign} \label{dZAsColumnVector_matrix_addition}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{Reshaping $\frac{\partial L}{\partial \matr{Z}}$ from shape $3 \times 2$ to $6 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_matrix_addition} and \ref{dZAsColumnVector_matrix_addition} into equation \ref{dX_matrix_addition}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\nonumber \\
&=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 1 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 1 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 1 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 1 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{X}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_matrix_addition}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dXAsColumnVector_matrix_addition} as a matrix of shape $\matr{X}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} &
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} &
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} &
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
&
\nonumber \\
&=
\frac{\partial L}{\partial \matr{Z}}
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{Y}}$}
To compute $\frac{\partial L}{\partial \matr{Y}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{Y}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$, $\matr{Y}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{Y}}$, we will reshape it back to a matrix with the same shape as $\matr{Y}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{Y}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial y_{1,1}} & \frac{\partial z_{1,2}}{\partial y_{1,1}} & \frac{\partial z_{2,1}}{\partial y_{1,1}} & \frac{\partial z_{2,2}}{\partial y_{1,1}} & \frac{\partial z_{3,1}}{\partial y_{1,1}} & \frac{\partial z_{3,2}}{\partial y_{1,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{1,2}} & \frac{\partial z_{1,2}}{\partial y_{1,2}} & \frac{\partial z_{2,1}}{\partial y_{1,2}} & \frac{\partial z_{2,2}}{\partial y_{1,2}} & \frac{\partial z_{3,1}}{\partial y_{1,2}} & \frac{\partial z_{3,2}}{\partial y_{1,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{2,1}} & \frac{\partial z_{1,2}}{\partial y_{2,1}} & \frac{\partial z_{2,1}}{\partial y_{2,1}} & \frac{\partial z_{2,2}}{\partial y_{2,1}} & \frac{\partial z_{3,1}}{\partial y_{2,1}} & \frac{\partial z_{3,2}}{\partial y_{2,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{2,2}} & \frac{\partial z_{1,2}}{\partial y_{2,2}} & \frac{\partial z_{2,1}}{\partial y_{2,2}} & \frac{\partial z_{2,2}}{\partial y_{2,2}} & \frac{\partial z_{3,1}}{\partial y_{2,2}} & \frac{\partial z_{3,2}}{\partial y_{2,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{3,1}} & \frac{\partial z_{1,2}}{\partial y_{3,1}} & \frac{\partial z_{2,1}}{\partial y_{3,1}} & \frac{\partial z_{2,2}}{\partial y_{3,1}} & \frac{\partial z_{3,1}}{\partial y_{3,1}} & \frac{\partial z_{3,2}}{\partial y_{3,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial y_{3,2}} & \frac{\partial z_{1,2}}{\partial y_{3,2}} & \frac{\partial z_{2,1}}{\partial y_{3,2}} & \frac{\partial z_{2,2}}{\partial y_{3,2}} & \frac{\partial z_{3,1}}{\partial y_{3,2}} & \frac{\partial z_{3,2}}{\partial y_{3,2}}\\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{Y}$, $\matr{Z}$ are being treated as column vectors.}{Therefore, $\frac{\partial \matr{Z}}{\partial \matr{Y}}$ is of shape $6\times6$.}
\nonumber
\\ \label{dZbydY_matrix_addition}
&=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 1 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 1 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 1 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 1 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Plugging equations \ref{dZbydY_matrix_addition} and \ref{dZAsColumnVector_matrix_addition} into equation \ref{dY_matrix_addition}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{Y}} &=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 1 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 1 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 1 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 1 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{Y}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix} \label{dYAsColumnVector_matrix_addition}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dYAsColumnVector_matrix_addition} as a matrix of shape $\matr{Y}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{Y}} &=
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} &
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} &
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} &
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
&
\nonumber \\
&=
\frac{\partial L}{\partial \matr{Z}}
\end{flalign}

\section{Transpose}
\subsection{Forward Pass}
Suppose we are given a matrix $\matr{X}$ of shape $3 \times 2$. Let $\matr{Z} = \transpose{\matr{X}}$. $\matr{Z}$ will be of shape $2 \times 3$.

\begin{flalign}
\matr{X} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} \\%[0.5em]
x_{2,1} & x_{2,2} \\%[0.5em]
x_{3,1} & x_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent $\matr{Z}$ can be expressed as:

\begin{flalign}
\matr{Z} &=
\begin{bmatrix}
z_{1,1} & z_{1,2} & z_{1,3}\\%[0.5em]
z_{2,1} & z_{2,2} & z_{2,3}\\%[0.5em]
\end{bmatrix} &
\nonumber \\
&=
\begin{bmatrix}
x_{1,1} & x_{2,1} & x_{3,1} \\%[0.5em]
x_{1,2} & x_{2,2} & x_{3,2} \\%[0.5em]
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \matr{Z}}$ of shape $2 \times 3$.

\begin{flalign}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} & \frac{\partial L}{\partial z_{1,3}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} & \frac{\partial L}{\partial z_{2,3}} \\[0.5em]
\end{bmatrix}
& \eqncomment{$\frac{\partial L}{\partial \matr{Z}}$ is the same shape as $\matr{Z}$ as $L$ is a scalar}
\label{dZ_transpose}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \matr{X}}$. Using chain rule, we get:

\begin{flalign} \label{dX_transpose}
\frac{\partial L}{\partial \matr{X}} &= \frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{X}}$}
To compute $\frac{\partial L}{\partial \matr{X}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \matr{X}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrices $\matr{X}$ and $\matr{Z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{X}}$, we will reshape it back to a matrix with the same shape as $\matr{X}$.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} & \frac{\partial z_{1,3}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{2,2}}{\partial x_{1,1}} & \frac{\partial z_{3,3}}{\partial x_{1,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{1,2}}{\partial x_{1,2}} & \frac{\partial z_{1,3}}{\partial x_{1,2}} & \frac{\partial z_{2,1}}{\partial x_{1,2}} & \frac{\partial z_{2,2}}{\partial x_{1,2}} & \frac{\partial z_{3,3}}{\partial x_{1,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{1,2}}{\partial x_{2,1}} & \frac{\partial z_{1,3}}{\partial x_{2,1}} & \frac{\partial z_{2,1}}{\partial x_{2,1}} & \frac{\partial z_{2,2}}{\partial x_{2,1}} & \frac{\partial z_{3,3}}{\partial x_{2,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,2}} & \frac{\partial z_{1,2}}{\partial x_{2,2}} & \frac{\partial z_{1,3}}{\partial x_{2,2}} & \frac{\partial z_{2,1}}{\partial x_{2,2}} & \frac{\partial z_{2,2}}{\partial x_{2,2}} & \frac{\partial z_{3,3}}{\partial x_{2,2}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,1}} & \frac{\partial z_{1,2}}{\partial x_{3,1}} & \frac{\partial z_{1,3}}{\partial x_{3,1}} & \frac{\partial z_{2,1}}{\partial x_{3,1}} & \frac{\partial z_{2,2}}{\partial x_{3,1}} & \frac{\partial z_{3,3}}{\partial x_{3,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,2}} & \frac{\partial z_{1,2}}{\partial x_{3,2}} & \frac{\partial z_{1,3}}{\partial x_{3,2}} & \frac{\partial z_{2,1}}{\partial x_{3,2}} & \frac{\partial z_{2,2}}{\partial x_{3,2}} & \frac{\partial z_{3,3}}{\partial x_{3,2}}\\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{X}$, $\matr{Z}$ are being treated as column vectors.}{Therefore, $\frac{\partial \matr{Z}}{\partial \matr{X}}$ is of shape $6\times6$.}
\nonumber
\\ \label{dZbydX_transpose}
&=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 1 & 0 & 0 \\%[0.5em]
0 & 1 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 1 & 0 \\%[0.5em]
0 & 0 & 1 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \matr{Z}}$ in equation \ref{dZ_transpose} expressed as a column vector will be:

\begin{flalign} \label{dZAsColumnVector_transpose}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,3}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix}
& \eqncomment{Reshaping $\frac{\partial L}{\partial \matr{Z}}$ from shape $2 \times 3$ to $6 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_transpose} and \ref{dZAsColumnVector_transpose} into equation \ref{dX_transpose}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\frac{\partial \matr{Z}}{\partial \matr{X}}\frac{\partial L}{\partial \matr{Z}} &
\nonumber \\
&=
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 1 & 0 & 0 \\%[0.5em]
0 & 1 & 0 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 1 & 0 \\%[0.5em]
0 & 0 & 1 & 0 & 0 & 0 \\%[0.5em]
0 & 0 & 0 & 0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,3}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{X}$, $\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,3}} \\[0.7em]
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_transpose}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dXAsColumnVector_transpose} as a matrix of shape $\matr{X}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} &
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} &
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,3}} &
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix}
&
\nonumber \\
&=
\underbrace{
\transpose{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} & \frac{\partial L}{\partial z_{1,3}} \\[0.5em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} & \frac{\partial L}{\partial z_{2,3}} \\[0.5em]
\end{bmatrix}}}_{\transpose{\frac{\partial L}{\partial \matr{Z}}}}
\nonumber \\
&=
\transpose{\frac{\partial L}{\partial \matr{Z}}}
\end{flalign}

\section{Sum along axis=0}
\subsection{Forward Pass}
Suppose we are given a matrix $\matr{X}$ of shape $3 \times 2$. Let $\vecr{z}$ = \verb|np.sum(|${\matr{X}}$\verb|, axis=0)|. $\vecr{z}$ will be of shape $1 \times 2$.

\begin{flalign}
\matr{X} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} \\%[0.5em]
x_{2,1} & x_{2,2} \\%[0.5em]
x_{3,1} & x_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent $\vecr{z}$ can be expressed as:

\begin{flalign}
\vecr{z} &=
\begin{bmatrix}
z_{1,1} & z_{1,2} \\%[0.5em]
\end{bmatrix}
&
\nonumber \\
&=
\begin{bmatrix}
x_{1,1} + x_{2,1} + x_{3,1} &
x_{1,2} + x_{2,2} + x_{3,2} \\%[0.5em]
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \vecr{z}}$ of shape $1 \times 2$.

\begin{flalign}
\frac{\partial L}{\partial \vecr{z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.3em]
\end{bmatrix}
& \eqncomment{$\frac{\partial L}{\partial \vecr{z}}$ is the same shape as $\vecr{z}$ as $L$ is a scalar}
\label{dZ_sum_along_axis_0}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \matr{X}}$. Using chain rule, we get:

\begin{flalign} \label{dX_sum_along_axis_0}
\frac{\partial L}{\partial \matr{X}} &= \frac{\partial \vecr{z}}{\partial \matr{X}}\frac{\partial L}{\partial \vecr{z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{X}}$}
To compute $\frac{\partial L}{\partial \matr{X}}$, we need to compute $\frac{\partial \vecr{z}}{\partial \matr{X}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrix $\matr{X}$ as well as vector $\vecr{z}$ as column vectors, and compute Jacobians on them. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{X}}$, we will reshape it back to a matrix with the same shape as $\matr{X}$.

\begin{flalign}
\frac{\partial \vecr{z}}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{1,2}}{\partial x_{1,2}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{1,2}}{\partial x_{2,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,2}} & \frac{\partial z_{1,2}}{\partial x_{2,2}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,1}} & \frac{\partial z_{1,2}}{\partial x_{3,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,2}} & \frac{\partial z_{1,2}}{\partial x_{3,2}} \\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{X}$, $\vecr{z}$ are being treated as column vectors.}{Therefore, $\frac{\partial \vecr{z}}{\partial \matr{X}}$ is of shape $6\times2$.}
\nonumber
\\ \label{dZbydX_sum_along_axis_0}
&=
\begin{bmatrix}
1 & 0 \\%[0.5em]
0 & 1 \\%[0.5em]
1 & 0 \\%[0.5em]
0 & 1 \\%[0.5em]
1 & 0 \\%[0.5em]
0 & 1 \\%[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \vecr{z}}$ in equation \ref{dZ_sum_along_axis_0} expressed as a column vector will be:

\begin{flalign} \label{dZAsColumnVector_sum_along_axis_0}
\frac{\partial L}{\partial \vecr{z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\end{bmatrix} &
& \eqncomment{Reshaping $\frac{\partial L}{\partial \vecr{z}}$ from shape $1 \times 2$ to $2 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_sum_along_axis_0} and \ref{dZAsColumnVector_sum_along_axis_0} into equation \ref{dX_sum_along_axis_0}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\frac{\partial \vecr{z}}{\partial \matr{X}}\frac{\partial L}{\partial \vecr{z}} &
\nonumber \\
&=
\begin{bmatrix}
1 & 0 \\%[0.5em]
0 & 1 \\%[0.5em]
1 & 0 \\%[0.5em]
0 & 1 \\%[0.5em]
1 & 0 \\%[0.5em]
0 & 1 \\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{X}$, $\vecr{z}$ and $\frac{\partial L}{\partial \vecr{z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_sum_along_axis_0}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dXAsColumnVector_sum_along_axis_0} as a matrix of shape $\matr{X}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} &
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,1}} &
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,1}} &
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\end{bmatrix} \nonumber
\\
&=
\begin{bmatrix}
1 \\
1 \\
1 \\
\end{bmatrix}
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.3em]
\end{bmatrix}}_{\frac{\partial L}{\partial \vecr{z}}}
& \eqncomment{Decomposing into a matmul operation}
\nonumber
\\
&=
\mathbf{1}_{3,1} \frac{\partial L}{\partial \vecr{z}} \nonumber
& \eqncomment{We are using a bold 1 namely $\mathbf{1}$ to denote matrix of ones}
\\
&=
\mathbf{1}_{\text{rows}(\matr{X}),1} \frac{\partial L}{\partial \vecr{z}}
& \eqncomment{Generalizing beyond our considered example}
\end{flalign}

\section{Sum along axis=1}
\subsection{Forward Pass}
Suppose we are given a matrix $\matr{X}$ of shape $3 \times 2$. Let $\vecr{z}$ = \verb|np.sum(|${\matr{X}}$\verb|, axis=1)|. $\vecr{z}$ will be of shape $3 \times 1$.

\begin{flalign}
\matr{X} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} \\%[0.5em]
x_{2,1} & x_{2,2} \\%[0.5em]
x_{3,1} & x_{3,2} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent $\vecr{z}$ can be expressed as:

\begin{flalign}
\vecr{z} &=
\begin{bmatrix}
z_{1,1} \\
z_{2,1} \\
z_{3,1} \\
\end{bmatrix}
&
\nonumber \\
&=
\begin{bmatrix}
x_{1,1} + x_{1,2} \\
x_{2,1} + x_{2,2} \\
x_{3,1} + x_{3,2} \\
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \vecr{z}}$ of shape $3 \times 1$.

\begin{flalign}
\frac{\partial L}{\partial \vecr{z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\frac{\partial L}{\partial \vecr{z}}$ is the same shape as $\vecr{z}$ as $L$ is a scalar}
\label{dZAsColumnVector_sum_along_axis_1}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \matr{X}}$. Using chain rule, we get:

\begin{flalign} \label{dX_sum_along_axis_1}
\frac{\partial L}{\partial \matr{X}} &= \frac{\partial \vecr{z}}{\partial \matr{X}}\frac{\partial L}{\partial \vecr{z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \matr{X}}$}
To compute $\frac{\partial L}{\partial \matr{X}}$, we need to compute $\frac{\partial \vecr{z}}{\partial \matr{X}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrix $\matr{X}$ as a column vector, and compute Jacobians on the column vectors instead. Once we have computed the column vector corresponding to $\frac{\partial L}{\partial \matr{X}}$, we will reshape it back to a matrix with the same shape as $\matr{X}$.

\begin{flalign}
\frac{\partial \vecr{z}}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{3,1}}{\partial x_{1,1}}\\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{2,1}}{\partial x_{1,2}} & \frac{\partial z_{3,1}}{\partial x_{1,2}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{2,1}}{\partial x_{2,1}} & \frac{\partial z_{3,1}}{\partial x_{2,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,2}} & \frac{\partial z_{2,1}}{\partial x_{2,2}} & \frac{\partial z_{3,1}}{\partial x_{2,2}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,1}} & \frac{\partial z_{2,1}}{\partial x_{3,1}} & \frac{\partial z_{3,1}}{\partial x_{3,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,2}} & \frac{\partial z_{2,1}}{\partial x_{3,2}} & \frac{\partial z_{3,1}}{\partial x_{3,2}} \\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{X}$ is being treated as a column vector.}{Therefore, $\frac{\partial \vecr{z}}{\partial \matr{X}}$ is of shape $6\times3$.}
\nonumber
\\ \label{dZbydX_sum_along_axis_1}
&=
\begin{bmatrix}
1 & 0 & 0 \\%[0.5em]
1 & 0 & 0 \\%[0.5em]
0 & 1 & 0 \\%[0.5em]
0 & 1 & 0 \\%[0.5em]
0 & 0 & 1 \\%[0.5em]
0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_sum_along_axis_1} and \ref{dZAsColumnVector_sum_along_axis_1} into equation \ref{dX_sum_along_axis_1}, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\frac{\partial \vecr{z}}{\partial \matr{X}}\frac{\partial L}{\partial \vecr{z}} &
\nonumber \\
&=
\begin{bmatrix}
1 & 0 & 0 \\%[0.5em]
1 & 0 & 0 \\%[0.5em]
0 & 1 & 0 \\%[0.5em]
0 & 1 & 0 \\%[0.5em]
0 & 0 & 1 \\%[0.5em]
0 & 0 & 1 \\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{X}$ and $\frac{\partial L}{\partial \vecr{z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_sum_along_axis_1}
\end{flalign}

\noindent Reshaping column vector in equation \ref{dXAsColumnVector_sum_along_axis_1} as a matrix of shape $\matr{X}$, we get:

\begin{flalign}
\frac{\partial L}{\partial \matr{X}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} &
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} &
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} &
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\end{bmatrix} \nonumber
\\
&=
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\end{bmatrix}}_{\frac{\partial L}{\partial \vecr{z}}}
\begin{bmatrix}
1 & 1 \\%[0.3em]
\end{bmatrix}
& \eqncomment{Decomposing into a matmul operation}
\nonumber
\\
&=
\frac{\partial L}{\partial \vecr{z}} \mathbf{1}_{1,2}
& \eqncomment{We are using a bold 1 namely $\mathbf{1}$ to denote matrix of ones}
\nonumber
\\
&=
\frac{\partial L}{\partial \vecr{z}} \mathbf{1}_{1, \text{cols}(\matr{X})}
& \eqncomment{Generalizing beyond our considered example}
\end{flalign}

\section{Broadcasting a column vector}
\subsection{Forward Pass}
Suppose we are given a vector $\vecr{x}$ of shape $3 \times 1$. Let $\matr{Z} = \vecr{x} \mathbf{1}_{1,\text{C}}$ where $\mathbf{1}$ denotes a matrix of ones. $\matr{Z}$ will be of shape $3 \times \text{C}$. Let us suppose that C = 2.

\begin{flalign}
\vecr{x} &=
\begin{bmatrix}
x_{1,1} \\%[0.5em]
x_{2,1} \\%[0.5em]
x_{3,1} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent $\matr{Z}$ can be expressed as:

\begin{flalign}
\matr{Z} &=
\begin{bmatrix}
z_{1,1} & z_{1,2} \\
z_{2,1} & z_{2,2} \\
z_{3,1} & z_{2,3} \\
\end{bmatrix}
&
\nonumber \\
&=
\begin{bmatrix}
x_{1,1} & x_{1,1} \\
x_{2,1} & x_{2,1} \\
x_{3,1} & x_{3,1} \\
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \matr{Z}}$ of shape $3 \times 2$.

\begin{flalign}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix} \label{dZ_broadcast_column_vector}
& \eqncomment{$\frac{\partial L}{\partial \vecr{z}}$ is the same shape as $\vecr{z}$ as $L$ is a scalar}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \vecr{x}}$. Using chain rule, we get:

\begin{flalign} \label{dX_broadcast_column_vector}
\frac{\partial L}{\partial \vecr{x}} &= \frac{\partial \matr{Z}}{\partial \vecr{x}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \vecr{x}}$}
To compute $\frac{\partial L}{\partial \vecr{x}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \vecr{x}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrix $\matr{Z}$ as a column vector, and compute Jacobians on the column vectors instead.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \vecr{x}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{2,2}}{\partial x_{1,1}} & \frac{\partial z_{3,1}}{\partial x_{1,1}} & \frac{\partial z_{3,2}}{\partial x_{1,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{2,1}} & \frac{\partial z_{1,2}}{\partial x_{2,1}} & \frac{\partial z_{2,1}}{\partial x_{2,1}} & \frac{\partial z_{2,2}}{\partial x_{2,1}} & \frac{\partial z_{3,1}}{\partial x_{2,1}} & \frac{\partial z_{3,2}}{\partial x_{2,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{3,1}} & \frac{\partial z_{1,2}}{\partial x_{3,1}} & \frac{\partial z_{2,1}}{\partial x_{3,1}} & \frac{\partial z_{2,2}}{\partial x_{3,1}} & \frac{\partial z_{3,1}}{\partial x_{3,1}} & \frac{\partial z_{3,2}}{\partial x_{3,1}} \\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{Z}$ is being treated as a column vector.}{Therefore, $\frac{\partial \matr{Z}}{\partial \vecr{x}}$ is of shape $3\times6$.}
\nonumber
\\ \label{dZbydX_broadcast_column_vector}
&=
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0\\%[0.5em]
0 & 0 & 1 & 1 & 0 & 0\\%[0.5em]
0 & 0 & 0 & 0 & 1 & 1\\%[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \matr{Z}}$ in equation \ref{dZ_broadcast_column_vector} expressed as a column vector will be:

\begin{flalign} \label{dZAsColumnVector_broadcast_column_vector}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{Reshaping $\frac{\partial L}{\partial \matr{Z}}$ from shape $3 \times 2$ to $6 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_broadcast_column_vector} and \ref{dZAsColumnVector_broadcast_column_vector} into equation \ref{dX_broadcast_column_vector}, we get:

\begin{flalign}
\frac{\partial L}{\partial \vecr{x}} &=
\frac{\partial \matr{Z}}{\partial \vecr{x}}\frac{\partial L}{\partial \matr{Z}}
&
\nonumber \\
&=
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0\\%[0.5em]
0 & 0 & 1 & 1 & 0 & 0\\%[0.5em]
0 & 0 & 0 & 0 & 1 & 1\\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} \\[0.7em]
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{Z}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} +
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} +
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} +
\frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix} \nonumber
\\
&=
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{3,1}} & \frac{\partial L}{\partial z_{3,2}} \\[0.7em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}} }
\begin{bmatrix}
1 \\
1 \\
\end{bmatrix}
& \eqncomment{Decomposing into a matmul operation}
\nonumber
\\
&=
\frac{\partial L}{\partial \matr{Z}} \mathbf{1}_{\text{2},1}
& \eqncomment{We are using a bold 1 namely $\mathbf{1}$ to denote matrix of ones}
\nonumber
\\
&=
\frac{\partial L}{\partial \matr{Z}} \mathbf{1}_{\text{C},1}
& \eqncomment{Generalizing beyond our considered example}
\nonumber
\\
&= \mathtt{np.sum(} \matr{Z} \mathtt{, axis=1)}
& \eqncomment{Using $\mathtt{NumPy}$ notation for brevity}
\end{flalign}

\section{Broadcasting a row vector}
\subsection{Forward Pass}
Suppose we are given a vector $\vecr{x}$ of shape $1 \times 3$. Let $\matr{Z} = \mathbf{1}_{\text{R},1} \vecr{x}$ where $\mathbf{1}$ denotes a matrix of ones. $\matr{Z}$ will be of shape $\text{R} \times 3$. Let us suppose that R = 2.

\begin{flalign}
\vecr{x} &=
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} \\%[0.5em]
\end{bmatrix} &
\nonumber
\end{flalign}

\noindent $\matr{Z}$ can be expressed as:

\begin{flalign}
\matr{Z} &=
\begin{bmatrix}
z_{1,1} & z_{1,2} & z_{1,3} \\
z_{2,1} & z_{2,2} & z_{2,3} \\
\end{bmatrix}
&
\nonumber \\
&=
\begin{bmatrix}
x_{1,1} & x_{1,2} & x_{1,3} \\%[0.5em]
x_{1,1} & x_{1,2} & x_{1,3} \\%[0.5em]
\end{bmatrix}
\nonumber
\end{flalign}

\subsection{Backward Pass}
We have $\frac{\partial L}{\partial \matr{Z}}$ of shape $2 \times 3$.

\begin{flalign}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} & \frac{\partial L}{\partial z_{1,3}}\\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} & \frac{\partial L}{\partial z_{2,3}}\\[0.7em]
\end{bmatrix}
& \eqncomment{$\frac{\partial L}{\partial \vecr{z}}$ is the same shape as $\vecr{z}$ as $L$ is a scalar}
\label{dZ_broadcast_row_vector}
\end{flalign}

\noindent We now need to compute $\frac{\partial L}{\partial \vecr{x}}$. Using chain rule, we get:

\begin{flalign} \label{dX_broadcast_row_vector}
\frac{\partial L}{\partial \vecr{x}} &= \frac{\partial \matr{Z}}{\partial \vecr{x}}\frac{\partial L}{\partial \matr{Z}} &
\end{flalign}

\subsubsection{Computing $\frac{\partial L}{\partial \vecr{x}}$}
To compute $\frac{\partial L}{\partial \vecr{x}}$, we need to compute $\frac{\partial \matr{Z}}{\partial \vecr{x}}$. To make it easy for us to think about and capture the Jacobian in a two dimensional matrix (as opposed to a tensor), we will reshape matrix $\matr{Z}$ as well as vector $\vecr{x}$ as a column vector, and compute Jacobians on the column vectors instead.

\begin{flalign}
\frac{\partial \matr{Z}}{\partial \vecr{x}} &=
\begin{bmatrix}
\frac{\partial z_{1,1}}{\partial x_{1,1}} & \frac{\partial z_{1,2}}{\partial x_{1,1}} & \frac{\partial z_{1,3}}{\partial x_{1,1}} & \frac{\partial z_{2,1}}{\partial x_{1,1}} & \frac{\partial z_{2,2}}{\partial x_{1,1}} & \frac{\partial z_{2,3}}{\partial x_{1,1}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,2}} & \frac{\partial z_{1,2}}{\partial x_{1,2}} & \frac{\partial z_{1,3}}{\partial x_{1,2}} & \frac{\partial z_{2,1}}{\partial x_{1,2}} & \frac{\partial z_{2,2}}{\partial x_{1,2}} & \frac{\partial z_{2,3}}{\partial x_{1,2}} \\[0.7em]
\frac{\partial z_{1,1}}{\partial x_{1,3}} & \frac{\partial z_{1,2}}{\partial x_{1,3}} & \frac{\partial z_{1,3}}{\partial x_{1,3}} & \frac{\partial z_{2,1}}{\partial x_{1,3}} & \frac{\partial z_{2,2}}{\partial x_{1,3}} & \frac{\partial z_{2,3}}{\partial x_{1,3}} \\[0.7em]
\end{bmatrix}
& \longeqncomment{$\matr{Z}$ and $\vecr{x}$ are being treated as column vectors.}{Therefore, $\frac{\partial \matr{Z}}{\partial \vecr{x}}$ is of shape $3\times6$.}
\nonumber
\\ \label{dZbydX_broadcast_row_vector}
&=
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0\\%[0.5em]
0 & 1 & 0 & 0 & 1 & 0\\%[0.5em]
0 & 0 & 1 & 0 & 0 & 1\\%[0.5em]
\end{bmatrix}
\end{flalign}

\noindent Now, $\frac{\partial L}{\partial \matr{Z}}$ in equation \ref{dZ_broadcast_row_vector} expressed as a column vector will be:

\begin{flalign} \label{dZAsColumnVector_broadcast_row_vector}
\frac{\partial L}{\partial \matr{Z}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,3}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix}
& \eqncomment{Reshaping $\frac{\partial L}{\partial \matr{Z}}$ from shape $2 \times 3$ to $6 \times 1$}
\end{flalign}

\noindent Plugging equations \ref{dZbydX_broadcast_row_vector} and \ref{dZAsColumnVector_broadcast_row_vector} into equation \ref{dX_broadcast_row_vector}, we get:

\begin{flalign}
\frac{\partial L}{\partial \vecr{x}} &=
\frac{\partial \matr{Z}}{\partial \vecr{x}}\frac{\partial L}{\partial \matr{Z}}
&
\nonumber \\
&=
\begin{bmatrix}
1 & 0 & 0 & 1 & 0 & 0\\%[0.5em]
0 & 1 & 0 & 0 & 1 & 0\\%[0.5em]
0 & 0 & 1 & 0 & 0 & 1\\%[0.5em]
\end{bmatrix}
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,3}} \\[0.7em]
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix}
& \eqncomment{$\matr{Z}$, $\vecr{x}$ and $\frac{\partial L}{\partial \matr{Z}}$ are being treated as column vectors}
\nonumber \\
&=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} +
\frac{\partial L}{\partial z_{2,1}} \\[0.7em]
\frac{\partial L}{\partial z_{1,2}} +
\frac{\partial L}{\partial z_{2,2}} \\[0.7em]
\frac{\partial L}{\partial z_{1,3}} +
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix} \label{dXAsColumnVector_broadcast_row_vector}
\end{flalign}

\noindent Now reshaping $\frac{\partial L}{\partial \vecr{x}}$ from column vector of shape $3 \times 1$ in equation \ref{dXAsColumnVector_broadcast_row_vector} into row vector of shape $1 \times 3$ we get:

\begin{flalign}
\frac{\partial L}{\partial \vecr{x}} &=
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} +
\frac{\partial L}{\partial z_{2,1}} &
\frac{\partial L}{\partial z_{1,2}} +
\frac{\partial L}{\partial z_{2,2}} &
\frac{\partial L}{\partial z_{1,3}} +
\frac{\partial L}{\partial z_{2,3}} \\[0.7em]
\end{bmatrix}
\nonumber \\
&=
\begin{bmatrix}
1 & 1\\
\end{bmatrix}
\underbrace{
\begin{bmatrix}
\frac{\partial L}{\partial z_{1,1}} & \frac{\partial L}{\partial z_{1,2}} & \frac{\partial L}{\partial z_{1,3}}\\[0.7em]
\frac{\partial L}{\partial z_{2,1}} & \frac{\partial L}{\partial z_{2,2}} & \frac{\partial L}{\partial z_{2,3}}\\[0.7em]
\end{bmatrix}}_{\frac{\partial L}{\partial \matr{Z}}}
& \eqncomment{Decomposing into a matmul operation}
\nonumber \\
&=
\mathbf{1}_{1, \text{2}} \frac{\partial L}{\partial \matr{Z}}
& \eqncomment{We are using a bold 1 namely $\mathbf{1}$ to denote matrix of ones}
\nonumber \\
&=
\mathbf{1}_{1, \text{R}} \frac{\partial L}{\partial \matr{Z}}
& \eqncomment{Generalizing beyond our considered example}
\nonumber \\
&=
\mathtt{np.sum(} \matr{Z} \mathtt{, axis=0)}
& \eqncomment{Using $\mathtt{NumPy}$ notation for brevity}
\end{flalign}

\medskip

\printbibliography

\end{document}


================================================
FILE: student-contributions/Makefile
================================================
# You want latexmk to *always* run, because make does not have all the info.
# Also, include non-file targets in .PHONY so they are run regardless of any
# file of the given name existing.
.PHONY: BackPropagationBasicMatrixOperations.pdf

# The first rule in a Makefile is the one executed by default ("make"). It
# should always be the "all" rule, so that "make" and "make all" are identical.
all: BackPropagationBasicMatrixOperations.pdf

# MAIN LATEXMK RULE

CC = latexmk 

# -pdf tells latexmk to generate PDF directly (instead of DVI).
# -pdflatex="" tells latexmk to call a specific backend with specific options.
# -use-make tells latexmk to call make for generating missing files.

# -interaction=nonstopmode keeps the pdflatex backend from stopping at a
# missing file reference and interactively asking you for an alternative.
CFLAGS = -pdf -pdflatex="pdflatex -interaction=nonstopmode" -use-make

BackPropagationBasicMatrixOperations.pdf: BackPropagationBasicMatrixOperations.tex
	$(CC) $(CFLAGS) BackPropagationBasicMatrixOperations.tex

# latexmk
# -CA clean up (remove) all nonessential files.
clean:
	$(CC) -CA


================================================
FILE: student-contributions/latexmkrc
================================================
# Settings
$xdvipdfmx = "xdvipdfmx -z 6 -o %D %O %S";

###############################
# Post processing of pdf file #
###############################

# assume the jobname is 'output' for sharelatex
my $ORIG_PDF_AGE = -M "output.pdf"; # get age of existing pdf if present

END {
    my $NEW_PDF_AGE = -M "output.pdf";
    return if !defined($NEW_PDF_AGE); # bail out if no pdf file
    return if defined($ORIG_PDF_AGE) && $NEW_PDF_AGE == $ORIG_PDF_AGE; # bail out if pdf was not updated
    $qpdf //= "/usr/local/bin/qpdf";
    $qpdf = $ENV{QPDF} if defined($ENV{QPDF}) && -x $ENV{QPDF};
    return if ! -x $qpdf; # check that qpdf exists
    $qpdf_opts //= "--linearize --newline-before-endstream";
    $qpdf_opts = $ENV{QPDF_OPTS} if defined($ENV{QPDF_OPTS});
    my $status = system($qpdf, split(' ', $qpdf_opts), "output.pdf", "output.pdf.opt");
    my $exitcode = ($status >> 8);
    print "qpdf exit code=$exitcode\n";
    # qpdf returns 0 for success, 3 for warnings (output pdf still created)
    return if !($exitcode == 0 || $exitcode == 3);
    print "Renaming optimised file to output.pdf\n";
    rename("output.pdf.opt", "output.pdf");
}

##############
# Glossaries #
##############
add_cus_dep( 'glo', 'gls', 0, 'glo2gls' );
add_cus_dep( 'acn', 'acr', 0, 'glo2gls');  # from Overleaf v1
sub glo2gls {
    system("makeglossaries $_[0]");
}

#############
# makeindex #
#############
@ist = glob("*.ist");
if (scalar(@ist) > 0) {
    $makeindex = "makeindex -s $ist[0] %O -o %D %S";
}

################
# nomenclature #
################
add_cus_dep("nlo", "nls", 0, "nlo2nls");
sub nlo2nls {
        system("makeindex $_[0].nlo -s nomencl.ist -o $_[0].nls -t $_[0].nlg");
}

#########
# Knitr #
#########
my $root_file = $ARGV[-1];

add_cus_dep( 'Rtex', 'tex', 0, 'rtex_to_tex');
sub rtex_to_tex {
    do_knitr("$_[0].Rtex");
}

sub do_knitr {
    my $dirname = dirname $_[0];
    my $basename = basename $_[0];
    system("Rscript -e \"library('knitr'); setwd('$dirname'); knit('$basename')\"");
}

my $rtex_file = $root_file =~ s/\.tex$/.Rtex/r;
unless (-e $root_file) {
    if (-e $rtex_file) {
        do_knitr($rtex_file);
    }
}

##########
# feynmf #
##########
push(@file_not_found, '^feynmf: Files .* and (.*) not found:$');
add_cus_dep("mf", "tfm", 0, "mf_to_tfm");
sub mf_to_tfm { system("mf '\\mode:=laserjet; input $_[0]'"); }

push(@file_not_found, '^feynmf: Label file (.*) not found:$');
add_cus_dep("mf", "t1", 0, "mf_to_label1");
sub mf_to_label1 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t1"); }
add_cus_dep("mf", "t2", 0, "mf_to_label2");
sub mf_to_label2 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t2"); }
add_cus_dep("mf", "t3", 0, "mf_to_label3");
sub mf_to_label3 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t3"); }
add_cus_dep("mf", "t4", 0, "mf_to_label4");
sub mf_to_label4 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t4"); }
add_cus_dep("mf", "t5", 0, "mf_to_label5");
sub mf_to_label5 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t5"); }
add_cus_dep("mf", "t6", 0, "mf_to_label6");
sub mf_to_label6 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t6"); }
add_cus_dep("mf", "t7", 0, "mf_to_label7");
sub mf_to_label7 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t7"); }
add_cus_dep("mf", "t8", 0, "mf_to_label8");
sub mf_to_label8 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t8"); }
add_cus_dep("mf", "t9", 0, "mf_to_label9");
sub mf_to_label9 { system("mf '\\mode:=laserjet; input $_[0]' && touch $_[0].t9"); }

##########
# feynmp #
##########
push(@file_not_found, '^dvipdf: Could not find figure file (.*); continuing.$');
add_cus_dep("mp", "1", 0, "mp_to_eps");
sub mp_to_eps {
    system("mpost $_[0]");
    return 0;
}

#############
# asymptote #
#############
sub asy {return system("asy --offscreen '$_[0]'");}
add_cus_dep("asy","eps",0,"asy");
add_cus_dep("asy","pdf",0,"asy");
add_cus_dep("asy","tex",0,"asy");

#############
# metapost  #  # from Overleaf v1
#############
add_cus_dep('mp', '1', 0, 'mpost');
sub mpost {
    my $file = $_[0];
    my ($name, $path) = fileparse($file);
    pushd($path);
    my $return = system "mpost $name";
    popd();
    return $return;
}

##########
# chktex #
##########
unlink 'output.chktex' if -f 'output.chktex';
if (defined $ENV{'CHKTEX_OPTIONS'}) {
    use File::Basename;
    use Cwd;

    # identify the main file
    my $target = $ARGV[-1];
    my $file = basename($target);

    if ($file =~ /\.tex$/) {
        # change directory for a limited scope
        my $orig_dir = cwd();
        my $subdir = dirname($target);
        chdir($subdir);
        # run chktex on main file
        $status = system("/usr/bin/run-chktex.sh", $orig_dir, $file);
        # go back to original directory
        chdir($orig_dir);

        # in VALIDATE mode we always exit after running chktex
        # otherwise we exit if EXIT_ON_ERROR is set

        if ($ENV{'CHKTEX_EXIT_ON_ERROR'} || $ENV{'CHKTEX_VALIDATE'}) {
            # chktex doesn't let us access the error info via exit status
            # so look through the output
            open(my $fh, "<", "output.chktex");
            my $errors = 0;
            {
                local $/ = "\n";
                while(<$fh>) {
                    if (/^\S+:\d+:\d+: Error:/) {
                        $errors++;
                        print;
                    }
                }
            }
            close($fh);
            exit(1) if $errors > 0;
            exit(0) if $ENV{'CHKTEX_VALIDATE'};
        }
    }
}


================================================
FILE: terminal-tutorial.md
================================================
---
layout: page
title: Terminal.com Tutorial
permalink: /terminal-tutorial/
---
For the assignments, we offer an option to use [Terminal](https://www.stanfordterminalcloud.com) for developing and testing your implementations. Notice that we're not using the main Terminal.com site but a subdomain which has been assigned specifically for this class. [Terminal](https://www.stanfordterminalcloud.com) is an online computing platform that allows us to access pre-configured command line environments. Note that, it's not required to use [Terminal](https://www.stanfordterminalcloud.com) for your assignments; however, it might make life easier with all the required dependencies and development toolkits configured for you.

This tutorial lists the necessary steps of working on the assignments using Terminal. First of all, [sign up your own account](https://www.stanfordterminalcloud.com/signup). Log in [Terminal](https://www.stanfordterminalcloud.com) with the account that you have just created.

For each assignment, we will provide you a link to a shared terminal snapshot. These snapshots are pre-configured command line environments with the starter code, where you can write your implementations and execute the code.

Here's an example of what a snapshot page looked like for an assignment in 2015:

<div class='fig figcenter fighighlight'>
  <img src='/assets/terminal-shared.jpg'>
</div>

Yours will look similar. Click the "Start" button on the lower right corner. This will clone the shared snapshot to your own account. Now you should be able to find the terminal under the [My Terminals](https://www.stanfordterminalcloud.com/terminals) tab.

<div class='fig figcenter fighighlight'>
  <img src='/assets/terminal-my.jpg'>
</div>

Yours will look similar. You are all set! To work on the assignments, click the link to your terminal (shown in the red box in the above image). This link will open up the user interface layer over an AWS machine. It will look something similar to this:

<div class='fig figcenter fighighlight'>
  <img src='/assets/terminal-development.jpg'>
</div>

We have set up the Jupyter Notebook and other dependencies in the terminal. Launch a new console window with the small + sign (if you don't already have one), navigate around and look for the assignment folder and code. Launch a Jupyer notebook and work on the assignment. If your're a student enrolled in the class you will submit your assignment through Coursework:

<div class='fig figcenter fighighlight'>
  <img src='/assets/terminal-coursework.jpg'>
</div>

For more information about [Terminal](https://www.stanfordterminalcloud.com), check out the [FAQ](https://www.stanfordterminalcloud.com/faq) page.

**Important Note:** the usage of Terminal is charged on an hourly rate based on the instance type. A medium type instance costs $0.124 per hour. If you are enrolled in the class email Serena Yeung (syyeung@cs.stanford.edu) to request Terminal credits. We will send you $3 the first time around, and you can request more funds on a rolling basis when you run out. Please be responsible with the funds we allocate you.

================================================
FILE: transfer-learning.md
================================================
---
layout: page
permalink: /transfer-learning/
---

(These notes are currently in draft form and under development)

Table of Contents:

- [Transfer Learning](#tf)
- [Additional References](#add)

<a name='tf'></a>

## Transfer Learning

In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. The three major Transfer Learning scenarios look as follows:

- **ConvNet as fixed feature extractor**. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer's outputs are the 1000 class scores for a different task like ImageNet), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image that contains the activations of the hidden layer immediately before the classifier. We call these features **CNN codes**. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset.
- **Fine-tuning the ConvNet**. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it's possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet becomes progressively more specific to the details of the classes contained in the original dataset. In case of ImageNet for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.
- **Pretrained models**. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a [Model Zoo](https://github.com/BVLC/caffe/wiki/Model-Zoo) where people share their network weights.

**When and how to fine-tune?** How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:

1. *New dataset is small and similar to original dataset*. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
2. *New dataset is large and similar to the original dataset*. Since we have more data, we can have more confidence that we won't overfit if we were to try to fine-tune through the full network.
3. *New dataset is small but very different from the original dataset*. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier form the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.
4. *New dataset is large and very different from the original dataset*. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.

**Practical advice**. There are a few additional things to keep in mind when performing Transfer Learning:

- *Constraints from pretrained models*. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can't arbitrarily take out Conv layers from the pretrained network. However, some changes are straight-forward: Due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides "fit"). In case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: For example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x512]. Therefore, the FC layer looking at this volume is equivalent to having a Convolutional Layer that has receptive field size 6x6, and is applied with padding of 0.
- *Learning rates*. It's common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don't wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).

<a name='tf'></a>

## Additional References

- [CNN Features off-the-shelf: an Astounding Baseline for Recognition](http://arxiv.org/abs/1403.6382) trains SVMs on features from ImageNet-pretrained ConvNet and reports several state of the art results.
- [DeCAF](http://arxiv.org/abs/1310.1531) reported similar findings in 2013. The framework in this paper (DeCAF) was a Python-based precursor to the C++ Caffe library.
- [How transferable are features in deep neural networks?](http://arxiv.org/abs/1411.1792) studies the transfer learning performance in detail, including some unintuitive findings about layer co-adaptations.


================================================
FILE: transformers.md
================================================
Table of Contents:

- [Transformers Overview](#overview)
- [Why Transformers?](#why)
- [Multi-Headed Attention](#multihead)
- [Multi-Headed Attention Tips](#tips)
- [Transformer Steps: Encoder-Decoder](#steps)

<a name='overview'></a>

### Transformer Overview

In ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762), Vaswani et al. introduced the Transformer, which
introduces parallelism and enables models to learn long-range dependencies--thereby helping solve two key issues with
RNNs: their slow speed of training and their difficulty in encoding long-range dependencies. Transformers are highly
scalable and highly parallelizable, allowing for faster training, larger models, and better performance across vision
and language tasks. Transformers are beginning to replace RNNs and LSTMs and may soon replace convolutions as well.

<a name='why'></a>

### Why Transformers?

- Transformers are great for working with long input sequences since the attention calculation looks at all inputs. In
  contrast, RNNs struggle to encode long-range dependencies. LSTMs are much better at capturing long-range dependencies
  by using the input, output, and forget gates.
- Transformers can operate over unordered sets or ordered sequences with positional encodings (using positional encoding
  to add ordering the sets). In contrast, RNN/LSTM expect an ordered sequence of inputs.
- Transformers use parallel computation where all alignment and attention scores for all inputs can be done in parallel.
  In contrast, RNN/LSTM uses sequential computation since the hidden state at a current timestep can only be computed
  after the previous states are calculated which makes them often slow to train.

<a name='multihead'></a>

### Multi-Headed Attention

Let’s refresh our concepts from the attention unit to help us with transformers.
<br>

- **Dot-Product Attention:**

<div class="fig figcenter fighighlight">
  <img src="/assets/att/dotproduct.png" width="80%">
  <div class="figcaption">Dot-Product Attention</div>
</div>
<br>
With query q (D,), value vectors {v_1,...,v_n} where v_i (D,), key vectors {k_1,...,k_n} where k_i (D,), attention weights a_i, and output c (D,).
The output is a weighted average over the value vectors.

- **Self-Attention:** we derive values, keys, and queries from the input

<div class="fig figcenter fighighlight">
  <img src="/assets/att/vkq.png" width="80%">
  <div class="figcaption">Value, Key, and Query</div>
</div>
<br>
Combining the above two, we can now implement multi-headed scaled dot product attention for transformers. 

- **Multi-Headed Scaled Dot Product Attention:** We learn a parameter matrix V_i, K_i, Q_i (DxD) for each head i, which
  increases the model’s expressivity to attend to different parts of the input. We apply a scaling term (1/sqrt(d/h)) to
  the dot-product attention described previously in order to reduce the effect of large magnitude vectors.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/softmax.png" width="80%">
  <div class="figcaption">Multi-Headed Scaled Dot Product Attention</div>
</div>
<br>
We can then apply dropout, generate the output of the attention layer, and finally add a linear transformation to the output of the attention operation, which allows the model to learn the relationship between heads, thereby improving the model’s expressivity.

<a name='tips'></a>

### Step-by-Step Multi-Headed Attention with Intermediate Dimensions

There's a lot happening throughout the Multi-Headed Attention so hopefully this chart will help further clarify the
intermediate steps and how the dimensions change after each step!

<div class="fig figcenter fighighlight">
  <img src="/assets/att/multiheadgraph.PNG" width="80%">
  <div class="figcaption">Step-by-Step Multi-Headed Attention with Intermediate Dimensions</div>
</div>

### A couple tips on Permute and Reshape:

To create the multiple heads, we divide the embedding dimension by the number of heads and use Reshape (Ex: Reshape
allows us to go from shape (N x S x D) to (N x S x H x D//H) ). It is important to note that Reshape doesn’t change the
ordering of your data. It simply takes the original data and ‘reshapes’ it into the dimensions you provide. We use
Permute (or can use Transpose) to rearrange the ordering of dimensions of the data (Ex: Permute allows us to rearrange
the dimensions from (N x S x H x D//H) to (N x H x S x D//H) ). Notice why we needed to use Permute before Reshaping
after the final MatMul operation. Our current tensor had a shape of (N x H x S x D//H) but in order to reshape it to
be (N x S x D) we needed to first ensure that the H and D//H dimensions are right next to each other because reshape
doesn’t change the ordering of the data. Therefore we use Permute first to rearrange the dimensions from (N x H x S x
D//H) to (N x S x H x D//H) and then can use reshape to get the shape of  (N x S x D).

<a name='steps'></a>

### Transformer Steps: Encoder-Decoder

### Encoder Block

The role of the Encoder block is to encode all the image features (where the spatial features are extracted using
pretrained CNN) into a set of context vectors. The context vectors outputted are a representation of the input sequence
in a higher dimensional space. We define the Encoder as c = T_W(z) where z is the spatial CNN features and T_w(.) is the
transformer encoder. In the "Attention Is All You Need" paper a transformer encoder block made up of N encoder blocks (N
= 6, D = 512) is used.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/encoder.png" width="80%">
  <div class="figcaption">Encoder Block</div>
</div>
<br>

Let’s walk through the steps of the Encoder block!

- We first take in a set of input vectors X (where each input vector represents a word for instance)
- We then add positional encoding to the input vectors.
- We pass the positional encoded vectors through the **Multi-head self-attention layer** (where each vector attends on
  all the other vectors). The output of this layer gives us a set of context vectors.
- We have a Residual Connection after the Multi-head self-attention layer which allows us to bypass the attention layer
  if it’s not needed.
- We then apply Layer Normalization on the output which normalizes each individual vector.
- We then apply MLP over each vector individually.
- We then have another Residual Connection.
- A final Layer Normalization on the output.
- And finally the set of context vectors C is outputted!

### Decoder Block

The Decoder block takes in the set of context vectors C outputted from the encoder block and set of input vectors X and
outputs a set of vectors Y which defines the output sequence. We define the Decoder as y_t = T_S(y_{0:t-1},c) where T_D(
.) is the transformer decoder. In the"Attention Is All You Need" paper a transformer decoder block made up of N decoder
blocks (N = 6, D = 512) is used.

<div class="fig figcenter fighighlight">
  <img src="/assets/att/decoder.png" width="80%">
  <div class="figcaption">Decoder Block</div>
</div>
<br>

Let’s walk through the steps of the Decoder block!

- We take in the set of input vectors X and context vectors C (outputted from Encoder block)
- We then add positional encoding to the input vectors X.
- We pass the positional encoded vectors through the **Masked Multi-head self-attention layer**. The mask ensures that
  we only attend over previous inputs.
- We have a Residual Connection after this layer which allows us to bypass the attention layer if it’s not needed.
- We then apply Layer Normalization on the output which normalizes each individual vector.
- Then we pass the output through another **Multi-head attention layer** which takes in the context vectors outputted by
  the Encoder block as well as the output of the Layer Normalization. In this step the Key comes from the set of context
  vectors C, the Value comes from the set of context vectors C, and the Query comes from the output of the Layer
  Normalization step.
- We then have another Residual Connection.
- Apply another Layer Normalization.
- Apply MLP over each vector individually.
- Another Residual Connection
- A final Layer Normalization
- And finally we pass the output through a Fully-connected layer which produces the final set of output vectors Y which
  is the output sequence.

### Additional Notes on Layer Normalization and MLP

**Layer Normalization:** As seen in the Encoder and Decoder block implementation, we use Layer Normalization after the
Residual Connections in both the Encoder and Decoder Blocks. Recall that in Layer Normalization we are normalizing
across the feature dimension (so we are applying LayerNorm over the image features). Using Layer Normalization at these
points helps us prevent issues with vanishing or exploding gradients, helps stabilize the network, and can reduce
training time.

**MLP:** Both the encoder and decoder blocks contain position-wise fully-connected feed-forward networks, which are
“applied to each position separately and identically” (Vaswani et al.). The linear transformations use different
parameters across layers. FFN(x) = max(0, xW_1 + b_1)W_2 + b_2. Additionally, the combination of a self-attention layer
and a point-wise feed-forward layer reduces the complexity required by convolutional layers.

### Additional Resources

Additional resources related to implementation:

- ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762)


================================================
FILE: understanding-cnn.md
================================================
---
layout: page
permalink: /understanding-cnn/
---

<a name='vis'></a>

(this page is currently in draft form)

## Visualizing what ConvNets learn

Several approaches for understanding and visualizing Convolutional Networks have been developed in the literature, partly as a response the common criticism that the learned features in a Neural Network are not interpretable. In this section we briefly survey some of these approaches and related work.

### Visualizing the activations and first-layer weights

**Layer Activations**. The most straight-forward visualization technique is to show the activations of the network during the forward pass. For ReLU networks, the activations usually start out looking relatively blobby and dense, but as the training progresses the activations usually become more sparse and localized. One dangerous pitfall that can be easily noticed with this visualization is that some activation maps may be all zero for many different inputs, which can indicate *dead* filters, and can be a symptom of high learning rates.

<div class="fig figcenter fighighlight">
  <img src="/assets/cnnvis/act1.jpeg" width="49%">
  <img src="/assets/cnnvis/act2.jpeg" width="49%">
  <div class="figcaption">
    Typical-looking activations on the first CONV layer (left), and the 5th CONV layer (right) of a trained AlexNet looking at a picture of a cat. Every box shows an activation map corresponding to some filter. Notice that the activations are sparse (most values are zero, in this visualization shown in black) and mostly local.
  </div>
</div>

**Conv/FC Filters.** The second common strategy is to visualize the weights. These are usually most interpretable on the first CONV layer which is looking directly at the raw pixel data, but it is possible to also show the filter weights deeper in the network. The weights are useful to visualize because well-trained networks usually display nice and smooth filters without any noisy patterns. Noisy patterns can be an indicator of a network that hasn't been trained for long enough, or possibly a very low regularization strength that may have led to overfitting.

<div class="fig figcenter fighighlight">
  <img src="/assets/cnnvis/filt1.jpeg" width="49%">
  <img src="/assets/cnnvis/filt2.jpeg" width="49%">
  <div class="figcaption">
    Typical-looking filters on the first CONV layer (left), and the 2nd CONV layer (right) of a trained AlexNet. Notice that the first-layer weights are very nice and smooth, indicating nicely converged network. The color/grayscale features are clustered because the AlexNet contains two separate streams of processing, and an apparent consequence of this architecture is that one stream develops high-frequency grayscale features and the other low-frequency color features. The 2nd CONV layer weights are not as interpretable, but it is apparent that they are still smooth, well-formed, and absent of noisy patterns.
  </div>
</div>

### Retrieving images that maximally activate a neuron

Another visualization technique is to take a large dataset of images, feed them through the network and keep track of which images maximally activate some neuron. We can then visualize the images to get an understanding of what the neuron is looking for in its receptive field. One such visualization (among others) is shown in [Rich feature hierarchies for accurate object detection and semantic segmentation](http://arxiv.org/abs/1311.2524) by Ross Girshick et al.:

<div class="fig figcenter fighighlight">
  <img src="/assets/cnnvis/pool5max.jpeg" width="100%">
  <div class="figcaption">
    Maximally activating images for some POOL5 (5th pool layer) neurons of an AlexNet. The activation values and the receptive field of the particular neuron are shown in white. (In particular, note that the POOL5 neurons are a function of a relatively large portion of the input image!) It can be seen that some neurons are responsive to upper bodies, text, or specular highlights.
  </div>
</div>

One problem with this approach is that ReLU neurons do not necessarily have any semantic meaning by themselves. Rather, it is more appropriate to think of multiple ReLU neurons as the basis vectors of some space that represents in image patches. In other words, the visualization is showing the patches at the edge of the cloud of representations, along the (arbitrary) axes that correspond to the filter weights. This can also be seen by the fact that neurons in a ConvNet operate linearly over the input space, so any arbitrary rotation of that space is a no-op. This point was further argued in [Intriguing properties of neural networks](http://arxiv.org/abs/1312.6199) by Szegedy et al., where they perform a similar visualization along arbitrary directions in the representation space.

### Embedding the codes with t-SNE 

ConvNets can be interpreted as gradually transforming the images into a representation in which the classes are separable by a linear classifier. We can get a rough idea about the topology of this space by embedding images into two dimensions so that their low-dimensional representation has approximately equal distances than their high-dimensional representation. There are many embedding methods that have been developed with the intuition of embedding high-dimensional vectors in a low-dimensional space while preserving the pairwise distances of the points. Among these, [t-SNE](http://lvdmaaten.github.io/tsne/) is one of the best-known methods that consistently produces visually-pleasing results.

To produce an embedding, we can take a set of images and use the ConvNet to extract the CNN codes (e.g. in AlexNet the 4096-dimensional vector right before the classifier, and crucially, including the ReLU non-linearity). We can then plug these into t-SNE and get 2-dimensional vector for each image. The corresponding images can them be visualized in a grid:

<div class="fig figcenter fighighlight">
  <img src="/assets/cnnvis/tsne.jpeg" width="100%">
  <div class="figcaption">
    t-SNE embedding of a set of images based on their CNN codes. Images that are nearby each other are also close in the CNN representation space, which implies that the CNN "sees" them as being very similar. Notice that the similarities are more often class-based and semantic rather than pixel and color-based. For more details on how this visualization was produced the associated code, and more related visualizations at different scales refer to <a href="http://cs.stanford.edu/people/karpathy/cnnembed/">t-SNE visualization of CNN codes</a>.
  </div>
</div>

### Occluding parts of the image

Suppose that a ConvNet classifies an image as a dog. How can we be certain that it's actually picking up on the dog in the image as opposed to some contextual cues from the background or some other miscellaneous object? One way of investigating which part of the image some classification prediction is coming from is by plotting the probability of the class of interest (e.g. dog class) as a function of the position of an occluder object. That is, we iterate over regions of the image, set a patch of the image to be all zero, and look at the probability of the class. We can visualize the probability as a 2-dimensional heat map. This approach has been used in Matthew Zeiler's [Visualizing and Understanding Convolutional Networks](http://arxiv.org/abs/1311.2901):

<div class="fig figcenter fighighlight">
  <img src="/assets/cnnvis/occlude.jpeg" width="100%">
  <div class="figcaption">
    Three input images (top). Notice that the occluder region is shown in grey. As we slide the occluder over the image we record the probability of the correct class and then visualize it as a heatmap (shown below each image). For instance, in the left-most image we see that the probability of Pomeranian plummets when the occluder covers the face of the dog, giving us some level of confidence that the dog's face is primarily responsible for the high classification score. Conversely, zeroing out other parts of the image is seen to have relatively negligible impact.
  </div>
</div>

### Visualizing the data gradient and friends

**Data Gradient**.

[Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](http://arxiv.org/abs/1312.6034)

**DeconvNet**.

[Visualizing and Understanding Convolutional Networks](http://arxiv.org/abs/1311.2901)

**Guided Backpropagation**.

[Striving for Simplicity: The All Convolutional Net](http://arxiv.org/abs/1412.6806)

### Reconstructing original images based on CNN Codes

[Understanding Deep Image Representations by Inverting Them](http://arxiv.org/abs/1412.0035)

### How much spatial information is preserved?

[Do ConvNets Learn Correspondence?](http://papers.nips.cc/paper/5420-do-convnets-learn-correspondence.pdf) (tldr: yes)

### Plotting performance as a function of image attributes

[ImageNet Large Scale Visual Recognition Challenge](http://arxiv.org/abs/1409.0575)

## Fooling ConvNets

[Explaining and Harnessing Adversarial Examples](http://arxiv.org/abs/1412.6572)

## Comparing ConvNets to Human labelers

[What I learned from competing against a ConvNet on ImageNet](http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/)