[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n.hypothesis/\n.pytest_cache/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# pyenv\n.python-version\n\n# celery beat schedule file\ncelerybeat-schedule\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2018 Joost Bastings\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# The Annotated Encoder Decoder with Attention\n\nRead the [blog post](https://bastings.github.io/annotated_encoder_decoder/) or simply run the jupyter notebook from this repository.\n"
  },
  {
    "path": "_config.yml",
    "content": "title: The Annotated Encoder Decoder\ndescription: A PyTorch tutorial implementing Bahdanau et al. (2015)\ngoogle_analytics: UA-126252625-1\nshow_downloads: true\ntheme: jekyll-theme-cayman\nkramdown:\n   math_engine: mathjax\n   syntax_highlighter: rouge\n   \ngems:\n  - jekyll-mentions\n"
  },
  {
    "path": "annotated_encoder_decoder.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# The Annotated Encoder-Decoder with Attention\\n\",\n    \"\\n\",\n    \"Recently, Alexander Rush wrote a blog post called [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html), describing the Transformer model from the paper [Attention is All You Need](https://arxiv.org/abs/1706.03762). This post can be seen as a **prequel** to that: *we will implement an Encoder-Decoder with Attention* using (Gated) Recurrent Neural Networks, very closely following the original attention-based neural machine translation paper [\\\"Neural Machine Translation by Jointly Learning to Align and Translate\\\"](https://arxiv.org/abs/1409.0473) of Bahdanau et al. (2015). \\n\",\n    \"\\n\",\n    \"The idea is that going through both blog posts will make you familiar with two very influential sequence-to-sequence architectures. If you have any comments or suggestions, please let me know: [@BastingsJasmijn](https://twitter.com/BastingsJasmijn).\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Model Architecture\\n\",\n    \"\\n\",\n    \"We will model the probability $p(Y\\\\mid X)$ of a target sequence $Y=(y_1, \\\\dots, y_{N})$ given a source sequence $X=(x_1, \\\\dots, x_M)$ directly with a neural network: an Encoder-Decoder.\\n\",\n    \"\\n\",\n    \"<img src=\\\"images/bahdanau.png\\\" width=\\\"636\\\">\\n\",\n    \"\\n\",\n    \"#### Encoder \\n\",\n    \"\\n\",\n    \"The encoder reads in the source sentence (*at the bottom of the figure*) and produces a sequence of hidden states $\\\\mathbf{h}_1, \\\\dots, \\\\mathbf{h}_M$, one for each source word. 
These states should capture the meaning of a word in the context of the given sentence.\\n\",\n    \"\\n\",\n    \"We will use a bi-directional recurrent neural network (Bi-RNN) as the encoder; a Bi-GRU in particular.\\n\",\n    \"\\n\",\n    \"First of all, we **embed** the source words. \\n\",\n    \"We simply look up the **word embedding** for each word in a (randomly initialized) lookup table.\\n\",\n    \"We will denote the word embedding for word $i$ in a given sentence with $\\\\mathbf{x}_i$.\\n\",\n    \"By embedding words, our model may exploit the fact that certain words (e.g. *cat* and *dog*) are semantically similar, and can be processed in a similar way.\\n\",\n    \"\\n\",\n    \"Now, how do we get hidden states $\\\\mathbf{h}_1, \\\\dots, \\\\mathbf{h}_M$? A forward GRU reads the source sentence left-to-right, while a backward GRU reads it right-to-left.\\n\",\n    \"Each of them follows a simple recursive formula: \\n\",\n    \"$$\\\\mathbf{h}_j = \\\\text{GRU}( \\\\mathbf{x}_j , \\\\mathbf{h}_{j - 1} )$$\\n\",\n    \"i.e. we obtain the next state from the previous state and the current input word embedding (the backward GRU uses $\\\\mathbf{h}_{j + 1}$ in place of $\\\\mathbf{h}_{j - 1}$).\\n\",\n    \"\\n\",\n    \"The hidden state of the forward GRU at time step $j$ will know what words **precede** the word at that time step, but it doesn't know what words will follow. In contrast, the backward GRU will only know what words **follow** the word at time step $j$. By **concatenating** those two hidden states (*shown in blue in the figure*), we get $\\\\mathbf{h}_j$, which captures word $j$ in its full sentence context.\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"#### Decoder \\n\",\n    \"\\n\",\n    \"The decoder (*at the top of the figure*) is a GRU with hidden state $\\\\mathbf{s}_i$. 
It follows a similar formula to the encoder, but takes one extra input $\\\\mathbf{c}_{i}$ (*shown in yellow*).\\n\",\n    \"\\n\",\n    \"$$\\\\mathbf{s}_{i} = f( \\\\mathbf{s}_{i - 1}, \\\\mathbf{y}_{i - 1}, \\\\mathbf{c}_i )$$\\n\",\n    \"\\n\",\n    \"Here, $\\\\mathbf{y}_{i - 1}$ is the previously generated target word (*not shown*).\\n\",\n    \"\\n\",\n    \"At each time step, an **attention mechanism** dynamically selects the part of the source sentence that is most relevant for predicting the current target word. It does so by comparing the last decoder state with each source hidden state. The result is a context vector $\\\\mathbf{c}_i$ (*shown in yellow*).\\n\",\n    \"The attention mechanism is explained in more detail later.\\n\",\n    \"\\n\",\n    \"After computing the decoder state $\\\\mathbf{s}_i$, a non-linear function $g$ (which applies a [softmax](https://en.wikipedia.org/wiki/Softmax_function)) gives us the probability of the target word $y_i$ for this time step:\\n\",\n    \"\\n\",\n    \"$$ p(y_i \\\\mid y_{<i}, x_1^M) = g(\\\\mathbf{s}_i, \\\\mathbf{c}_i, \\\\mathbf{y}_{i - 1})$$\\n\",\n    \"\\n\",\n    \"Because $g$ applies a softmax, it provides a vector the size of the output vocabulary that sums to 1.0: it is a distribution over all target words. At test time, we would select the word with the highest probability for our translation.\\n\",\n    \"\\n\",\n    \"Now, for optimization, a [cross-entropy loss](https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html#cross-entropy) is minimized, which maximizes the probability of selecting the correct word at this time step. All parameters (including word embeddings) are then updated to maximize this probability.\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Prelims\\n\",\n    \"\\n\",\n    \"This tutorial requires **PyTorch >= 0.4.1** and was tested with **Python 3.6**.  
\\n\",\n    \"\\n\",\n    \"Make sure you have those versions, and install the packages below if you don't have them yet.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"#!pip install torch numpy matplotlib sacrebleu\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"CUDA: True\\n\",\n      \"cuda:0\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"%matplotlib inline\\n\",\n    \"import numpy as np\\n\",\n    \"import torch\\n\",\n    \"import torch.nn as nn\\n\",\n    \"import torch.nn.functional as F\\n\",\n    \"import math, copy, time\\n\",\n    \"import matplotlib.pyplot as plt\\n\",\n    \"from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence\\n\",\n    \"from IPython.core.debugger import set_trace\\n\",\n    \"\\n\",\n    \"# we will use CUDA if it is available\\n\",\n    \"USE_CUDA = torch.cuda.is_available()\\n\",\n    \"DEVICE = torch.device('cuda:0' if USE_CUDA else 'cpu')  # first GPU if available, else CPU\\n\",\n    \"print(\\\"CUDA:\\\", USE_CUDA)\\n\",\n    \"print(DEVICE)\\n\",\n    \"\\n\",\n    \"seed = 42\\n\",\n    \"np.random.seed(seed)\\n\",\n    \"torch.manual_seed(seed)\\n\",\n    \"torch.cuda.manual_seed(seed)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Let's start coding!\\n\",\n    \"\\n\",\n    \"## Model class\\n\",\n    \"\\n\",\n    \"Our base model class `EncoderDecoder` is very similar to the one in *The Annotated Transformer*.\\n\",\n    \"\\n\",\n    \"One difference is that our encoder also returns its final states (`encoder_final` below), which are used to initialize the decoder RNN. 
We also provide the sequence lengths as the RNNs require those.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class EncoderDecoder(nn.Module):\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    A standard Encoder-Decoder architecture. Base for this and many \\n\",\n    \"    other models.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    def __init__(self, encoder, decoder, src_embed, trg_embed, generator):\\n\",\n    \"        super(EncoderDecoder, self).__init__()\\n\",\n    \"        self.encoder = encoder\\n\",\n    \"        self.decoder = decoder\\n\",\n    \"        self.src_embed = src_embed\\n\",\n    \"        self.trg_embed = trg_embed\\n\",\n    \"        self.generator = generator\\n\",\n    \"        \\n\",\n    \"    def forward(self, src, trg, src_mask, trg_mask, src_lengths, trg_lengths):\\n\",\n    \"        \\\"\\\"\\\"Take in and process masked src and target sequences.\\\"\\\"\\\"\\n\",\n    \"        encoder_hidden, encoder_final = self.encode(src, src_mask, src_lengths)\\n\",\n    \"        return self.decode(encoder_hidden, encoder_final, src_mask, trg, trg_mask)\\n\",\n    \"    \\n\",\n    \"    def encode(self, src, src_mask, src_lengths):\\n\",\n    \"        return self.encoder(self.src_embed(src), src_mask, src_lengths)\\n\",\n    \"    \\n\",\n    \"    def decode(self, encoder_hidden, encoder_final, src_mask, trg, trg_mask,\\n\",\n    \"               decoder_hidden=None):\\n\",\n    \"        return self.decoder(self.trg_embed(trg), encoder_hidden, encoder_final,\\n\",\n    \"                            src_mask, trg_mask, hidden=decoder_hidden)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"To keep things easy we also keep the `Generator` class the same. 
\\n\",\n    \"It simply projects the pre-output layer ($x$ in the `forward` function below) to obtain the output layer, so that the final dimension is the target vocabulary size.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class Generator(nn.Module):\\n\",\n    \"    \\\"\\\"\\\"Define standard linear + log-softmax generation step.\\\"\\\"\\\"\\n\",\n    \"    def __init__(self, hidden_size, vocab_size):\\n\",\n    \"        super(Generator, self).__init__()\\n\",\n    \"        self.proj = nn.Linear(hidden_size, vocab_size, bias=False)\\n\",\n    \"\\n\",\n    \"    def forward(self, x):\\n\",\n    \"        return F.log_softmax(self.proj(x), dim=-1)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Encoder\\n\",\n    \"\\n\",\n    \"Our encoder is a bi-directional GRU. \\n\",\n    \"\\n\",\n    \"Because we want to process multiple sentences at the same time for speed reasons (it is more efficient on GPU), we need to support **mini-batches**. Sentences in a mini-batch may have different lengths, which means that the RNN needs to unroll further for certain sentences while it might already have finished for others:\\n\",\n    \"\\n\",\n    \"```\\n\",\n    \"Example: mini-batch with 3 source sentences of different lengths (7, 5, and 3).\\n\",\n    \"End-of-sequence is marked with a \\\"3\\\" here, and padding positions with \\\"1\\\".\\n\",\n    \"\\n\",\n    \"+---------------+\\n\",\n    \"| 4 5 9 8 7 8 3 |\\n\",\n    \"+---------------+\\n\",\n    \"| 5 4 8 7 3 1 1 |\\n\",\n    \"+---------------+\\n\",\n    \"| 5 8 3 1 1 1 1 |\\n\",\n    \"+---------------+\\n\",\n    \"```\\n\",\n    \"You can see that, when computing hidden states for this mini-batch, for sentences #2 and #3 we will need to stop updating the hidden state after we have encountered \\\"3\\\". 
We don't want to incorporate the padding values (1s).\\n\",\n    \"\\n\",\n    \"Luckily, PyTorch has convenient helper functions called `pack_padded_sequence` and `pad_packed_sequence`.\\n\",\n    \"These functions take care of masking and padding, so that the resulting word representations are simply zeros after a sentence stops.\\n\",\n    \"\\n\",\n    \"The code below reads in a source sentence (a sequence of word embeddings) and produces the hidden states.\\n\",\n    \"It also returns a final vector, a summary of the complete sentence, by concatenating the first and the last hidden states (they have both seen the whole sentence, each in a different direction). We will use the final vector to initialize the decoder.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 5,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class Encoder(nn.Module):\\n\",\n    \"    \\\"\\\"\\\"Encodes a sequence of word embeddings\\\"\\\"\\\"\\n\",\n    \"    def __init__(self, input_size, hidden_size, num_layers=1, dropout=0.):\\n\",\n    \"        super(Encoder, self).__init__()\\n\",\n    \"        self.num_layers = num_layers\\n\",\n    \"        self.rnn = nn.GRU(input_size, hidden_size, num_layers, \\n\",\n    \"                          batch_first=True, bidirectional=True, dropout=dropout)\\n\",\n    \"        \\n\",\n    \"    def forward(self, x, mask, lengths):\\n\",\n    \"        \\\"\\\"\\\"\\n\",\n    \"        Applies a bidirectional GRU to sequence of embeddings x.\\n\",\n    \"        The input mini-batch x needs to be sorted by length.\\n\",\n    \"        x should have dimensions [batch, time, dim].\\n\",\n    \"        \\\"\\\"\\\"\\n\",\n    \"        packed = pack_padded_sequence(x, lengths, batch_first=True)\\n\",\n    \"        output, final = self.rnn(packed)\\n\",\n    \"        output, _ = pad_packed_sequence(output, batch_first=True)\\n\",\n    \"\\n\",\n    \"        # we need to manually concatenate the final 
states for both directions\\n\",\n    \"        fwd_final = final[0:final.size(0):2]\\n\",\n    \"        bwd_final = final[1:final.size(0):2]\\n\",\n    \"        final = torch.cat([fwd_final, bwd_final], dim=2)  # [num_layers, batch, 2*dim]\\n\",\n    \"\\n\",\n    \"        return output, final\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Decoder\\n\",\n    \"\\n\",\n    \"The decoder is a conditional GRU. Rather than starting with an empty state like the encoder, its initial hidden state results from a projection of the encoder final vector. \\n\",\n    \"\\n\",\n    \"#### Training\\n\",\n    \"In `forward` you can find a for-loop that computes the decoder hidden states one time step at a time. \\n\",\n    \"Note that, during training, we know exactly what the target words should be! (They are in `trg_embed`.) This means that we are not even checking here what the prediction is! We simply feed the correct previous target word embedding to the GRU at each time step. This is called **teacher forcing**.\\n\",\n    \"\\n\",\n    \"The `forward` function returns all decoder hidden states and pre-output vectors. Elsewhere these are used to compute the loss, after which the parameters are updated.\\n\",\n    \"\\n\",\n    \"#### Prediction\\n\",\n    \"At prediction time, the `forward` function is only used for a single time step. 
After predicting a word from the returned pre-output vector, we can call it again, supplying it the word embedding of the previously predicted word and the last state.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class Decoder(nn.Module):\\n\",\n    \"    \\\"\\\"\\\"A conditional RNN decoder with attention.\\\"\\\"\\\"\\n\",\n    \"    \\n\",\n    \"    def __init__(self, emb_size, hidden_size, attention, num_layers=1, dropout=0.5,\\n\",\n    \"                 bridge=True):\\n\",\n    \"        super(Decoder, self).__init__()\\n\",\n    \"        \\n\",\n    \"        self.hidden_size = hidden_size\\n\",\n    \"        self.num_layers = num_layers\\n\",\n    \"        self.attention = attention\\n\",\n    \"        self.dropout = dropout\\n\",\n    \"                 \\n\",\n    \"        self.rnn = nn.GRU(emb_size + 2*hidden_size, hidden_size, num_layers,\\n\",\n    \"                          batch_first=True, dropout=dropout)\\n\",\n    \"                 \\n\",\n    \"        # to initialize from the final encoder state\\n\",\n    \"        self.bridge = nn.Linear(2*hidden_size, hidden_size, bias=True) if bridge else None\\n\",\n    \"\\n\",\n    \"        self.dropout_layer = nn.Dropout(p=dropout)\\n\",\n    \"        self.pre_output_layer = nn.Linear(hidden_size + 2*hidden_size + emb_size,\\n\",\n    \"                                          hidden_size, bias=False)\\n\",\n    \"        \\n\",\n    \"    def forward_step(self, prev_embed, encoder_hidden, src_mask, proj_key, hidden):\\n\",\n    \"        \\\"\\\"\\\"Perform a single decoder step (1 word)\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"        # compute context vector using attention mechanism\\n\",\n    \"        query = hidden[-1].unsqueeze(1)  # [#layers, B, D] -> [B, 1, D]\\n\",\n    \"        context, attn_probs = self.attention(\\n\",\n    \"            query=query, proj_key=proj_key,\\n\",\n    
\"            value=encoder_hidden, mask=src_mask)\\n\",\n    \"\\n\",\n    \"        # update rnn hidden state\\n\",\n    \"        rnn_input = torch.cat([prev_embed, context], dim=2)\\n\",\n    \"        output, hidden = self.rnn(rnn_input, hidden)\\n\",\n    \"        \\n\",\n    \"        pre_output = torch.cat([prev_embed, output, context], dim=2)\\n\",\n    \"        pre_output = self.dropout_layer(pre_output)\\n\",\n    \"        pre_output = self.pre_output_layer(pre_output)\\n\",\n    \"\\n\",\n    \"        return output, hidden, pre_output\\n\",\n    \"    \\n\",\n    \"    def forward(self, trg_embed, encoder_hidden, encoder_final, \\n\",\n    \"                src_mask, trg_mask, hidden=None, max_len=None):\\n\",\n    \"        \\\"\\\"\\\"Unroll the decoder one step at a time.\\\"\\\"\\\"\\n\",\n    \"                                         \\n\",\n    \"        # the maximum number of steps to unroll the RNN\\n\",\n    \"        if max_len is None:\\n\",\n    \"            max_len = trg_mask.size(-1)\\n\",\n    \"\\n\",\n    \"        # initialize decoder hidden state\\n\",\n    \"        if hidden is None:\\n\",\n    \"            hidden = self.init_hidden(encoder_final)\\n\",\n    \"        \\n\",\n    \"        # pre-compute projected encoder hidden states\\n\",\n    \"        # (the \\\"keys\\\" for the attention mechanism)\\n\",\n    \"        # this is only done for efficiency\\n\",\n    \"        proj_key = self.attention.key_layer(encoder_hidden)\\n\",\n    \"        \\n\",\n    \"        # here we store all intermediate hidden states and pre-output vectors\\n\",\n    \"        decoder_states = []\\n\",\n    \"        pre_output_vectors = []\\n\",\n    \"        \\n\",\n    \"        # unroll the decoder RNN for max_len steps\\n\",\n    \"        for i in range(max_len):\\n\",\n    \"            prev_embed = trg_embed[:, i].unsqueeze(1)\\n\",\n    \"            output, hidden, pre_output = self.forward_step(\\n\",\n    \"              
prev_embed, encoder_hidden, src_mask, proj_key, hidden)\\n\",\n    \"            decoder_states.append(output)\\n\",\n    \"            pre_output_vectors.append(pre_output)\\n\",\n    \"\\n\",\n    \"        decoder_states = torch.cat(decoder_states, dim=1)\\n\",\n    \"        pre_output_vectors = torch.cat(pre_output_vectors, dim=1)\\n\",\n    \"        return decoder_states, hidden, pre_output_vectors  # [B, N, D]\\n\",\n    \"\\n\",\n    \"    def init_hidden(self, encoder_final):\\n\",\n    \"        \\\"\\\"\\\"Returns the initial decoder state,\\n\",\n    \"        conditioned on the final encoder state.\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"        if encoder_final is None:\\n\",\n    \"            return None  # start with zeros\\n\",\n    \"\\n\",\n    \"        return torch.tanh(self.bridge(encoder_final))            \\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Attention                                                                                                                                                                               \\n\",\n    \"\\n\",\n    \"At every time step, the decoder has access to *all* source word representations $\\\\mathbf{h}_1, \\\\dots, \\\\mathbf{h}_M$. 
\\n\",\n    \"An attention mechanism allows the model to focus on the currently most relevant part of the source sentence.\\n\",\n    \"The state of the decoder is represented by its GRU hidden state; before $\\\\mathbf{s}_i$ is computed, the most recent one is $\\\\mathbf{s}_{i-1}$.\\n\",\n    \"So if we want to know which source word representation(s) $\\\\mathbf{h}_j$ are most relevant, we will need to define a function that takes the decoder state and a source representation as input.\\n\",\n    \"\\n\",\n    \"Here we use the MLP-based, additive attention that was used in Bahdanau et al.:\\n\",\n    \"\\n\",\n    \"<img src=\\\"images/attention.png\\\" width=\\\"280\\\">\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"We apply an MLP with tanh-activation to both the previous decoder state $\\\\bf s_{i-1}$ (the *query*) and each encoder state $\\\\bf h_j$ (the *key*), and then project the result to a single value (i.e. a scalar) to get the *attention energy* $e_{ij}$. \\n\",\n    \"\\n\",\n    \"Once all energies are computed, they are normalized by a softmax so that they sum to one: \\n\",\n    \"\\n\",\n    \"$$ \\\\alpha_{ij} = \\\\text{softmax}(\\\\mathbf{e}_i)[j] $$\\n\",\n    \"\\n\",\n    \"$$\\\\sum_j \\\\alpha_{ij} = 1.0$$ \\n\",\n    \"\\n\",\n    \"The context vector for time step $i$ is then a weighted sum of the encoder hidden states (the *values*):\\n\",\n    \"$$\\\\mathbf{c}_i = \\\\sum_j \\\\alpha_{ij} \\\\mathbf{h}_j$$\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class BahdanauAttention(nn.Module):\\n\",\n    \"    \\\"\\\"\\\"Implements Bahdanau (MLP) attention\\\"\\\"\\\"\\n\",\n    \"    \\n\",\n    \"    def __init__(self, hidden_size, key_size=None, query_size=None):\\n\",\n    \"        super(BahdanauAttention, self).__init__()\\n\",\n    \"        \\n\",\n    \"        # We assume a bi-directional encoder so key_size is 2*hidden_size\\n\",\n    \"        key_size = 2 * hidden_size if key_size is None else key_size\\n\",\n    \"        query_size = 
hidden_size if query_size is None else query_size\\n\",\n    \"\\n\",\n    \"        self.key_layer = nn.Linear(key_size, hidden_size, bias=False)\\n\",\n    \"        self.query_layer = nn.Linear(query_size, hidden_size, bias=False)\\n\",\n    \"        self.energy_layer = nn.Linear(hidden_size, 1, bias=False)\\n\",\n    \"        \\n\",\n    \"        # to store the attention probabilities\\n\",\n    \"        self.alphas = None\\n\",\n    \"        \\n\",\n    \"    def forward(self, query=None, proj_key=None, value=None, mask=None):\\n\",\n    \"        assert mask is not None, \\\"mask is required\\\"\\n\",\n    \"\\n\",\n    \"        # We first project the query (the decoder state).\\n\",\n    \"        # The projected keys (the encoder states) were already pre-computed.\\n\",\n    \"        query = self.query_layer(query)\\n\",\n    \"        \\n\",\n    \"        # Calculate scores.\\n\",\n    \"        scores = self.energy_layer(torch.tanh(query + proj_key))\\n\",\n    \"        scores = scores.squeeze(2).unsqueeze(1)\\n\",\n    \"        \\n\",\n    \"        # Mask out invalid positions.\\n\",\n    \"        # The mask marks valid positions, so we invert it using `mask == 0`.\\n\",\n    \"        scores = scores.masked_fill(mask == 0, -float('inf'))\\n\",\n    \"        \\n\",\n    \"        # Turn scores to probabilities.\\n\",\n    \"        alphas = F.softmax(scores, dim=-1)\\n\",\n    \"        self.alphas = alphas\\n\",\n    \"        \\n\",\n    \"        # The context vector is the weighted sum of the values.\\n\",\n    \"        context = torch.bmm(alphas, value)\\n\",\n    \"        \\n\",\n    \"        # context shape: [B, 1, 2D], alphas shape: [B, 1, M]\\n\",\n    \"        return context, alphas\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Embeddings and Softmax\\n\",\n    \"\\n\",\n    \"We use learned embeddings to convert the input tokens and output tokens to vectors of dimension `emb_size`.\\n\",\n    \"\\n\",\n    \"We will simply use PyTorch's [nn.Embedding](https://pytorch.org/docs/stable/nn.html?highlight=embedding#torch.nn.Embedding) class.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Full Model\\n\",\n    \"\\n\",\n    \"Here we define a function from hyperparameters to a full model. \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def make_model(src_vocab, tgt_vocab, emb_size=256, hidden_size=512, num_layers=1, dropout=0.1):\\n\",\n    \"    \\\"Helper: Construct a model from hyperparameters.\\\"\\n\",\n    \"\\n\",\n    \"    attention = BahdanauAttention(hidden_size)\\n\",\n    \"\\n\",\n    \"    model = EncoderDecoder(\\n\",\n    \"        Encoder(emb_size, hidden_size, num_layers=num_layers, dropout=dropout),\\n\",\n    \"        Decoder(emb_size, hidden_size, attention, num_layers=num_layers, dropout=dropout),\\n\",\n    \"        nn.Embedding(src_vocab, emb_size),\\n\",\n    \"        nn.Embedding(tgt_vocab, emb_size),\\n\",\n    \"        Generator(hidden_size, tgt_vocab))\\n\",\n    \"\\n\",\n    \"    return model.cuda() if USE_CUDA else model\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Training\\n\",\n    \"\\n\",\n    \"This section describes the training regime for our models.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"We stop for a quick interlude to introduce some of the tools \\n\",\n    \"needed to train a standard encoder-decoder model. 
First we define a batch object that holds the src and target sentences for training, as well as their lengths and masks. \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Batches and Masking\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class Batch:\\n\",\n    \"    \\\"\\\"\\\"Object for holding a batch of data with mask during training.\\n\",\n    \"    Input is a batch from a torch text iterator.\\n\",\n    \"    \\\"\\\"\\\"\\n\",\n    \"    def __init__(self, src, trg, pad_index=0):\\n\",\n    \"        \\n\",\n    \"        src, src_lengths = src\\n\",\n    \"        \\n\",\n    \"        self.src = src\\n\",\n    \"        self.src_lengths = src_lengths\\n\",\n    \"        self.src_mask = (src != pad_index).unsqueeze(-2)\\n\",\n    \"        self.nseqs = src.size(0)\\n\",\n    \"        \\n\",\n    \"        self.trg = None\\n\",\n    \"        self.trg_y = None\\n\",\n    \"        self.trg_mask = None\\n\",\n    \"        self.trg_lengths = None\\n\",\n    \"        self.ntokens = None\\n\",\n    \"\\n\",\n    \"        if trg is not None:\\n\",\n    \"            trg, trg_lengths = trg\\n\",\n    \"            self.trg = trg[:, :-1]\\n\",\n    \"            self.trg_lengths = trg_lengths\\n\",\n    \"            self.trg_y = trg[:, 1:]\\n\",\n    \"            self.trg_mask = (self.trg_y != pad_index)\\n\",\n    \"            self.ntokens = (self.trg_y != pad_index).data.sum().item()\\n\",\n    \"        \\n\",\n    \"        if USE_CUDA:\\n\",\n    \"            self.src = self.src.cuda()\\n\",\n    \"            self.src_mask = self.src_mask.cuda()\\n\",\n    \"\\n\",\n    \"            if trg is not None:\\n\",\n    \"                self.trg = self.trg.cuda()\\n\",\n    \"                self.trg_y = self.trg_y.cuda()\\n\",\n    \"                self.trg_mask = self.trg_mask.cuda()\\n\",\n    \"       
         \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Training Loop\\n\",\n    \"The code below trains the model for 1 epoch (=1 pass through the training data).\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 10,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def run_epoch(data_iter, model, loss_compute, print_every=50):\\n\",\n    \"    \\\"\\\"\\\"Standard Training and Logging Function\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    start = time.time()\\n\",\n    \"    total_tokens = 0\\n\",\n    \"    total_loss = 0\\n\",\n    \"    print_tokens = 0\\n\",\n    \"\\n\",\n    \"    for i, batch in enumerate(data_iter, 1):\\n\",\n    \"        \\n\",\n    \"        out, _, pre_output = model.forward(batch.src, batch.trg,\\n\",\n    \"                                           batch.src_mask, batch.trg_mask,\\n\",\n    \"                                           batch.src_lengths, batch.trg_lengths)\\n\",\n    \"        loss = loss_compute(pre_output, batch.trg_y, batch.nseqs)\\n\",\n    \"        total_loss += loss\\n\",\n    \"        total_tokens += batch.ntokens\\n\",\n    \"        print_tokens += batch.ntokens\\n\",\n    \"        \\n\",\n    \"        if model.training and i % print_every == 0:\\n\",\n    \"            elapsed = time.time() - start\\n\",\n    \"            print(\\\"Epoch Step: %d Loss: %f Tokens per Sec: %f\\\" %\\n\",\n    \"                    (i, loss / batch.nseqs, print_tokens / elapsed))\\n\",\n    \"            start = time.time()\\n\",\n    \"            print_tokens = 0\\n\",\n    \"\\n\",\n    \"    return math.exp(total_loss / float(total_tokens))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Training Data and Batching\\n\",\n    \"\\n\",\n    \"We will use torch text for batching. This is discussed in more detail below. 
\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Optimizer\\n\",\n    \"\\n\",\n    \"We will use the [Adam optimizer](https://arxiv.org/abs/1412.6980) with default settings ($\\\\beta_1=0.9$, $\\\\beta_2=0.999$ and $\\\\epsilon=10^{-8}$).\\n\",\n    \"\\n\",\n    \"We will use $0.0003$ as the learning rate here, but for different problems another learning rate may be more appropriate. You will have to tune that.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# A First  Example\\n\",\n    \"\\n\",\n    \"We can begin by trying out a simple copy-task. Given a random set of input symbols from a small vocabulary, the goal is to generate back those same symbols. \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Synthetic Data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def data_gen(num_words=11, batch_size=16, num_batches=100, length=10, pad_index=0, sos_index=1):\\n\",\n    \"    \\\"\\\"\\\"Generate random data for a src-tgt copy task.\\\"\\\"\\\"\\n\",\n    \"    for i in range(num_batches):\\n\",\n    \"        data = torch.from_numpy(\\n\",\n    \"          np.random.randint(1, num_words, size=(batch_size, length)))\\n\",\n    \"        data[:, 0] = sos_index\\n\",\n    \"        data = data.cuda() if USE_CUDA else data\\n\",\n    \"        src = data[:, 1:]\\n\",\n    \"        trg = data\\n\",\n    \"        src_lengths = [length-1] * batch_size\\n\",\n    \"        trg_lengths = [length] * batch_size\\n\",\n    \"        yield Batch((src, src_lengths), (trg, trg_lengths), pad_index=pad_index)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Loss Computation\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 12,\n   \"metadata\": 
{},\n   \"outputs\": [],\n   \"source\": [\n    \"class SimpleLossCompute:\\n\",\n    \"    \\\"\\\"\\\"A simple loss compute and train function.\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    def __init__(self, generator, criterion, opt=None):\\n\",\n    \"        self.generator = generator\\n\",\n    \"        self.criterion = criterion\\n\",\n    \"        self.opt = opt\\n\",\n    \"\\n\",\n    \"    def __call__(self, x, y, norm):\\n\",\n    \"        x = self.generator(x)\\n\",\n    \"        loss = self.criterion(x.contiguous().view(-1, x.size(-1)),\\n\",\n    \"                              y.contiguous().view(-1))\\n\",\n    \"        loss = loss / norm\\n\",\n    \"\\n\",\n    \"        if self.opt is not None:\\n\",\n    \"            loss.backward()          \\n\",\n    \"            self.opt.step()\\n\",\n    \"            self.opt.zero_grad()\\n\",\n    \"\\n\",\n    \"        return loss.data.item() * norm\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Printing examples\\n\",\n    \"\\n\",\n    \"To monitor progress during training, we will translate a few examples.\\n\",\n    \"\\n\",\n    \"We use greedy decoding for simplicity; that is, at each time step, starting at the first token, we choose the one with that maximum probability, and we never revisit that choice. 
\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 13,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def greedy_decode(model, src, src_mask, src_lengths, max_len=100, sos_index=1, eos_index=None):\\n\",\n    \"    \\\"\\\"\\\"Greedily decode a sentence.\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    with torch.no_grad():\\n\",\n    \"        encoder_hidden, encoder_final = model.encode(src, src_mask, src_lengths)\\n\",\n    \"        prev_y = torch.ones(1, 1).fill_(sos_index).type_as(src)\\n\",\n    \"        trg_mask = torch.ones_like(prev_y)\\n\",\n    \"\\n\",\n    \"    output = []\\n\",\n    \"    attention_scores = []\\n\",\n    \"    hidden = None\\n\",\n    \"\\n\",\n    \"    for i in range(max_len):\\n\",\n    \"        with torch.no_grad():\\n\",\n    \"            out, hidden, pre_output = model.decode(\\n\",\n    \"              encoder_hidden, encoder_final, src_mask,\\n\",\n    \"              prev_y, trg_mask, hidden)\\n\",\n    \"\\n\",\n    \"            # we predict from the pre-output layer, which is\\n\",\n    \"            # a combination of Decoder state, prev emb, and context\\n\",\n    \"            prob = model.generator(pre_output[:, -1])\\n\",\n    \"\\n\",\n    \"        _, next_word = torch.max(prob, dim=1)\\n\",\n    \"        next_word = next_word.data.item()\\n\",\n    \"        output.append(next_word)\\n\",\n    \"        prev_y = torch.ones(1, 1).type_as(src).fill_(next_word)\\n\",\n    \"        attention_scores.append(model.decoder.attention.alphas.cpu().numpy())\\n\",\n    \"    \\n\",\n    \"    output = np.array(output)\\n\",\n    \"        \\n\",\n    \"    # cut off everything starting from </s> \\n\",\n    \"    # (only when eos_index provided)\\n\",\n    \"    if eos_index is not None:\\n\",\n    \"        first_eos = np.where(output==eos_index)[0]\\n\",\n    \"        if len(first_eos) > 0:\\n\",\n    \"            output = output[:first_eos[0]]      \\n\",\n    \"    \\n\",\n    
\"    return output, np.concatenate(attention_scores, axis=1)\\n\",\n    \"  \\n\",\n    \"\\n\",\n    \"def lookup_words(x, vocab=None):\\n\",\n    \"    if vocab is not None:\\n\",\n    \"        x = [vocab.itos[i] for i in x]\\n\",\n    \"\\n\",\n    \"    return [str(t) for t in x]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 14,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def print_examples(example_iter, model, n=2, max_len=100, \\n\",\n    \"                   sos_index=1, \\n\",\n    \"                   src_eos_index=None, \\n\",\n    \"                   trg_eos_index=None, \\n\",\n    \"                   src_vocab=None, trg_vocab=None):\\n\",\n    \"    \\\"\\\"\\\"Prints N examples. Assumes batch size of 1.\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    model.eval()\\n\",\n    \"    count = 0\\n\",\n    \"    print()\\n\",\n    \"    \\n\",\n    \"    if src_vocab is not None and trg_vocab is not None:\\n\",\n    \"        src_eos_index = src_vocab.stoi[EOS_TOKEN]\\n\",\n    \"        trg_sos_index = trg_vocab.stoi[SOS_TOKEN]\\n\",\n    \"        trg_eos_index = trg_vocab.stoi[EOS_TOKEN]\\n\",\n    \"    else:\\n\",\n    \"        src_eos_index = None\\n\",\n    \"        trg_sos_index = 1\\n\",\n    \"        trg_eos_index = None\\n\",\n    \"        \\n\",\n    \"    for i, batch in enumerate(example_iter):\\n\",\n    \"      \\n\",\n    \"        src = batch.src.cpu().numpy()[0, :]\\n\",\n    \"        trg = batch.trg_y.cpu().numpy()[0, :]\\n\",\n    \"\\n\",\n    \"        # remove </s> (if it is there)\\n\",\n    \"        src = src[:-1] if src[-1] == src_eos_index else src\\n\",\n    \"        trg = trg[:-1] if trg[-1] == trg_eos_index else trg      \\n\",\n    \"      \\n\",\n    \"        result, _ = greedy_decode(\\n\",\n    \"          model, batch.src, batch.src_mask, batch.src_lengths,\\n\",\n    \"          max_len=max_len, sos_index=trg_sos_index, eos_index=trg_eos_index)\\n\",\n    \"        
print(\\\"Example #%d\\\" % (i+1))\\n\",\n    \"        print(\\\"Src : \\\", \\\" \\\".join(lookup_words(src, vocab=src_vocab)))\\n\",\n    \"        print(\\\"Trg : \\\", \\\" \\\".join(lookup_words(trg, vocab=trg_vocab)))\\n\",\n    \"        print(\\\"Pred: \\\", \\\" \\\".join(lookup_words(result, vocab=trg_vocab)))\\n\",\n    \"        print()\\n\",\n    \"        \\n\",\n    \"        count += 1\\n\",\n    \"        if count == n:\\n\",\n    \"            break\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Training the copy task\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 15,\n   \"metadata\": {\n    \"scrolled\": false\n   },\n   \"outputs\": [],\n   \"source\": [\n    \"def train_copy_task():\\n\",\n    \"    \\\"\\\"\\\"Train the simple copy task.\\\"\\\"\\\"\\n\",\n    \"    num_words = 11\\n\",\n    \"    criterion = nn.NLLLoss(reduction=\\\"sum\\\", ignore_index=0)\\n\",\n    \"    model = make_model(num_words, num_words, emb_size=32, hidden_size=64)\\n\",\n    \"    optim = torch.optim.Adam(model.parameters(), lr=0.0003)\\n\",\n    \"    eval_data = list(data_gen(num_words=num_words, batch_size=1, num_batches=100))\\n\",\n    \" \\n\",\n    \"    dev_perplexities = []\\n\",\n    \"    \\n\",\n    \"    if USE_CUDA:\\n\",\n    \"        model.cuda()\\n\",\n    \"\\n\",\n    \"    for epoch in range(10):\\n\",\n    \"        \\n\",\n    \"        print(\\\"Epoch %d\\\" % epoch)\\n\",\n    \"\\n\",\n    \"        # train\\n\",\n    \"        model.train()\\n\",\n    \"        data = data_gen(num_words=num_words, batch_size=32, num_batches=100)\\n\",\n    \"        run_epoch(data, model,\\n\",\n    \"                  SimpleLossCompute(model.generator, criterion, optim))\\n\",\n    \"\\n\",\n    \"        # evaluate\\n\",\n    \"        model.eval()\\n\",\n    \"        with torch.no_grad(): \\n\",\n    \"            perplexity = run_epoch(eval_data, model,\\n\",\n    \"  
                                 SimpleLossCompute(model.generator, criterion, None))\\n\",\n    \"            print(\\\"Evaluation perplexity: %f\\\" % perplexity)\\n\",\n    \"            dev_perplexities.append(perplexity)\\n\",\n    \"            print_examples(eval_data, model, n=2, max_len=9)\\n\",\n    \"        \\n\",\n    \"    return dev_perplexities\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 16,\n   \"metadata\": {\n    \"scrolled\": false\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/home/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1\\n\",\n      \"  \\\"num_layers={}\\\".format(dropout, num_layers))\\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Epoch 0\\n\",\n      \"Epoch Step: 50 Loss: 19.887581 Tokens per Sec: 7748.957397\\n\",\n      \"Epoch Step: 100 Loss: 17.856726 Tokens per Sec: 7925.338918\\n\",\n      \"Evaluation perplexity: 7.172198\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  8 3 7 5 8 3 7 5 8\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 8 8 8 8 8 8 8\\n\",\n      \"\\n\",\n      \"Epoch 1\\n\",\n      \"Epoch Step: 50 Loss: 15.715487 Tokens per Sec: 8662.903188\\n\",\n      \"Epoch Step: 100 Loss: 12.368280 Tokens per Sec: 7860.172940\\n\",\n      \"Evaluation perplexity: 3.709498\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 7 5 10 8 7 5 7\\n\",\n      \"\\n\",\n      
\"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 5 6 2 6 8 2 5\\n\",\n      \"\\n\",\n      \"Epoch 2\\n\",\n      \"Epoch Step: 50 Loss: 9.246480 Tokens per Sec: 7971.095313\\n\",\n      \"Epoch Step: 100 Loss: 7.701921 Tokens per Sec: 7876.198908\\n\",\n      \"Evaluation perplexity: 2.303158\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 7 3 10 5 8 7 5\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 5 6 2 6 8 5 2\\n\",\n      \"\\n\",\n      \"Epoch 3\\n\",\n      \"Epoch Step: 50 Loss: 6.166847 Tokens per Sec: 8069.631171\\n\",\n      \"Epoch Step: 100 Loss: 5.673258 Tokens per Sec: 7855.858586\\n\",\n      \"Evaluation perplexity: 1.775795\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 7 5 10 3 7 8 5\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 3 6 5 2 8 6 8\\n\",\n      \"\\n\",\n      \"Epoch 4\\n\",\n      \"Epoch Step: 50 Loss: 4.830031 Tokens per Sec: 8094.515152\\n\",\n      \"Epoch Step: 100 Loss: 4.152125 Tokens per Sec: 7999.315744\\n\",\n      \"Evaluation perplexity: 1.572305\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 5 7 10 3 7 8 5\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 3 6 5 2 8 6 2\\n\",\n      \"\\n\",\n      \"Epoch 5\\n\",\n      \"Epoch Step: 50 Loss: 3.638369 Tokens per Sec: 8112.868501\\n\",\n      \"Epoch Step: 100 Loss: 3.784709 Tokens per Sec: 
7843.288141\\n\",\n      \"Evaluation perplexity: 1.433951\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 7 5 3 10 7 8 7\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 3 6 5 2 8 6 2\\n\",\n      \"\\n\",\n      \"Epoch 6\\n\",\n      \"Epoch Step: 50 Loss: 2.802792 Tokens per Sec: 8128.952327\\n\",\n      \"Epoch Step: 100 Loss: 2.403310 Tokens per Sec: 7893.746819\\n\",\n      \"Evaluation perplexity: 1.284198\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 5 7 10 3 7 8 5\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 3 6 5 2 8 6 2\\n\",\n      \"\\n\",\n      \"Epoch 7\\n\",\n      \"Epoch Step: 50 Loss: 2.174423 Tokens per Sec: 8181.341663\\n\",\n      \"Epoch Step: 100 Loss: 1.838792 Tokens per Sec: 7833.160747\\n\",\n      \"Evaluation perplexity: 1.173110\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 5 7 10 3 7 8 5\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 3 6 5 2 8 6 2\\n\",\n      \"\\n\",\n      \"Epoch 8\\n\",\n      \"Epoch Step: 50 Loss: 1.226522 Tokens per Sec: 8267.548130\\n\",\n      \"Epoch Step: 100 Loss: 1.090876 Tokens per Sec: 7842.856308\\n\",\n      \"Evaluation perplexity: 1.123090\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 5 7 10 3 7 8 5\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      
\"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 3 6 5 2 8 6 2\\n\",\n      \"\\n\",\n      \"Epoch 9\\n\",\n      \"Epoch Step: 50 Loss: 1.216270 Tokens per Sec: 8181.132215\\n\",\n      \"Epoch Step: 100 Loss: 0.636999 Tokens per Sec: 7866.309111\\n\",\n      \"Evaluation perplexity: 1.088564\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Trg :  4 8 5 7 10 3 7 8 5\\n\",\n      \"Pred:  4 8 5 7 10 3 7 8 5\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Trg :  8 8 3 6 5 2 8 6 2\\n\",\n      \"Pred:  8 8 3 6 5 2 8 6 2\\n\",\n      \"\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"image/png\": \"iVBORw0KGgoAAAANSUhEUgAAAYcAAAElCAYAAAAPyi6bAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3Xl8XHW9//HXJ5ksTdqkbdp0L91bFilgla0spYiKG4qIiF5xwQVZVFx+3quIuFwVxAXtlUVEUUQU1KtXBSmUrWwtlNLaBbpDl3RNmqTZP78/zpl0moXMpMmczMz7+XjMI5kzc875zKSd95zv93y/x9wdERGRRHlRFyAiIgOPwkFERDpROIiISCcKBxER6UThICIinSgcRESkE4WDdGJm15qZJ9x2m9njZva2iOpxM/tqP237WjNrSbg/NFx2bH/sL926+Fsm3n4TYV2LzOzBqPYvPYtFXYAMWK3A3PD3UcDVwN/M7Bx3/1d0ZfW524B/JNwfCnwdeBlYHklFfS/xb5loZ7oLkcyhcJBuuftT8d/N7GFgM3AlcFjhYGZF7t54mOX1CXd/BXgl6joORzLvZ+LfUiQZalaSpLh7DbAWmBJfZmZDzOyHZrbFzBrNbI2ZXZq4XrzZxsxmm9kjZlYPfD98zM3sGjP7jpntMLM6M7vXzEb1VI+Zvd7M/mFm1eF695vZ0QmPHxfWdE2H9e4zs1fNrCKxvvD3ScCG8Kl3JjS/nGlmfzazZ7qo46zwOSe/Rq2LzOxBM/uAma01swYze8bM3tjFcy80syVmdsDMdpnZL8xsWMLjk8L9XWpmN5lZFbCjp/erJ2a20cxuM7PPmtnmcP8LzWx6h+cVmdl3w795k5m9ZGZXm5l1eN6YcHvbwr/DOjP7Thf7fZuZLTezejNbamanHe5rkT7i7rrpdsgNuBZo6bAsBmwDHgjvFwCLCT6YLgPOJvjQbwU+1WFbrQTNNJ8H5gEnho85wbf2fwFvBz4G7AIWd9i3A19NuD8HOBCu927gncDjwG5gbMLzvgg0J+zv40Ab8KauXitQFG7PCZqWTgpvZcBbw+XHdqjtd8CLPbyfi8L37mXgonAfS4FqYGTC8z4T1rcAeDPwYeBV4AkgL3zOpLCOrcBvw7rO6+lvGf79Ot4s4Xkbw7/Fc2F9FxEE5QagKOF5vweagP8EzgF+ENbz7YTnVITb2xb+2zgLuAS4rcN7shV4AXg/cC7wLLAPGBr1/wHdXOGgW+dbFx8oY4Gfhx8Cl4bP+Y/
wg+wNHda9NfxQyEvYlgMf7WI/8Q+5xA+fd4bLz+nwvMRweAhYBsQSlpURBMv3EpblAQ8DLwHHA7XAjV291oT78Q/fD3Z4Xl74QXlTwrIKoAG4sof3cxEdggWoJAi474T3BxOExU0d1j01XPetHep7JIW/pXdzSwzxjWE9lQnLjg2f94nw/uvC+1/ssI+bw3WHhve/RRDKs3p4TxqAiQnLTgi3/76o/w/o5mpWkm7lE/wHbyb49vpB4Fp3vzV8/M0EzUzPm1ksfgPuB0YD0zps73+72c9f/dD28r8CjQTf2Dsxs0HA6QTfYEnYbz3wJNDeLOHubQQhNiJ8bD3wlZ5femfhtm4FPhjWQLhtB+5MYhNr3b29g9vdq4DHOPg6TyYIuN91eD+fBvYnvq7QX1MovxV4Qxe3ezs879GwrniNywmCNV7j6eHPuzqs9zugGIg3k50NPO7uq3uoa6W7b068H/6c2MN6kgbqkJbutBJ8KDiwF9js7i0Jj1cCMwnCoysVCb+3ufuubp5XlXjH3d3MdgJjunn+cILg+k5462hth+1tMbNHgHcBN/vhdYT/guCb+HsJAuHjwL3uvjeJdau6WLaDoIkMgvcTgiakrlR0uJ9SP4O7L0niad3VGP9bDEtYlmh7h8crCEKtJ4e8b+7eGHZdFCexrvQzhYN0q4cPlD3AGoIjiq4kfmt8rXnhKxPvhB2bIwmaprqyj6A560bCo4cOGjps7wKCYHgO+IaZ3efu3W37Nbn7DjP7M3Cpma0HjiJoU09GZRfLRnHwde4Jf15E0DfRUcfTTvtjrv3uanwp/H1vwrJXE54zOvwZfw27gHF9Xp2klZqVpLfuB44Adrv7ki5utUlu5x1mVpR4n6BjuMtTL929juDb9eu62e+K+HPNbBxBe/gvCTrCa4FfdjyzpoP4kUV3315vJmji+S5BU9EjPb9EAGYkDqwzs8pwO/HX+URY3+RuXtemJPdzOE4P64rXeCwwPaHGR8Of7++w3oUEofxseP9BYK6ZzejHWqWf6chBeutO4KPAw2Z2A0F7cSkwCzjJ3d+b5HbaCAbX/ZjgG+n3gCfd/YHXWOfzwCNm9jfgVwTNIaMIOm9fcvefhgHwK4Jvu1e6e62ZfYigI/QK4CfdbHsHwTfgi8xsDUFH6xp33x8+/hDBN+m5wJeSfI0QNL3cZ2ZfC7f5NYKzfn4IwanCZvZl4EdmNgZ4gKAfZSLBWUE/dffFKezvEGbWVR/OfndfmXB/N/BPM/smQUB/h6Cj+o6wxhfN7B7gO2ZWCCwJa/skQcf6vnA7PwQ+BCwys+sIjjDHA6e5+yd6+xokvRQO0ivu3mxm5wD/BXyW4ENsH8EHwd0pbOoWgm/ptxOcsfNPemiqcfcl4YfdtQRnUZUSfPg+RXB6J8DngDMJPpBqw/UeM7PvAd8zswfd/d9dbLvNzC4h+GB8IKxtHkGoxPtE/hRu/1cpvM6VBH0W1xG8Vy8QnJHV3lzk7gvM7BWCU3A/Gi7eQvBNfAO9l0/QId/R0xza8f9PYAVBcI4kOJr5tLs3JTznP8LXcBlBIG8K670x4XXsMbNTCN7D6wg62l8htX8XEjFz12VCJRpm5sDX3P1bUdeSCjN7EVjt7hck+fxFBKfLnt2vhR0GM9sIPOjuH4+6FhkYdOQgkoSwX+QEgsF6xwBqHpGspnAQSc4YghHhe4AvuXtXzTQiWUPNSiIi0olOZRURkU4UDiIi0knG9jmMGDHCJ02aFHUZIiIZZenSpbvcfWRPz8vYcJg0aRJLliQzXYyIiMSZWVKj7dWsJCIinSgcRESkE4WDiIh0onAQEZFOFA4iItKJwkFERDrJqXBwdx5Zu5Nr/rKCF7bs63kFEZEclbHjHHrDzLjurytZt7OOQQX5zJ4wNOqSREQGpJw6cgCYf+QoAB5cldL12UVEckruhcOs4BK563bWsXFXXcTViIgMTGkLBzNbaWa1CbcDZuZmdkK6agB4/RHDKB9UAOj
oQUSkO2kLB3c/2t0Hx28E15z9t7s/l64aAGL5ecybGcw5tXBVVTp3LSKSMSJpVjKzGMEF1G9Ocb0KM5thZjNaWlp6vf94v8OzG/dQfaC519sREclWUfU5nAeUA79Ocb0rgDXAmqqq3n/rP2PmSGJ5RktbcGqriIgcKqpw+CTwe3dPdbDBTcBMYGZlZWWvd15WXMAbJw8HYKH6HUREOkl7OJjZVGA+8PNU13X33e6+1t3XxmKHN0Qj3rS0aM1OWlrbDmtbIiLZJoojh08CL7j70xHsu93ZRwZHHtUHmlm6aW+UpYiIDDhpDQczKwQuoRdHDX3tiIpSplUOBmDhap21JCKSKN1HDu8BioHfpnm/XZofHj1ovIOIyKHSGg7ufre7l7l7bTr3252zw36H9Tvr2KDR0iIi7XJu+oxEJ0wcxrCSYLS0zloSETkop8MhP8+YN1NNSyIiHeV0OEDiaOm9VNdrtLSICCgcOH3GCAryjdY2Z9FanbUkIgIKB4YUF3Di5ApAE/GJiMTlfDjAwVNaF62polmjpUVEFA4A82cF/Q41DS0s2ajR0iIiCgdgYkUJ0+OjpXXWkoiIwiEuftaSptIQEVE4tItPxLdhVx3rdg6IAdwiIpFROISOnziM4aWFgJqWREQUDqH8POPM8NrSD+qUVhHJcQqHBPGJ+JZu2su++qaIqxERiY7CIcFp0w+Olta1pUUklykcEgwpLuCkKcFoaTUtiUguUzh0MH+WRkuLiCgcOoiPd9jf0MKzG/dEXI2ISDQUDh1MGF7CzFFDAE3EJyK5S+HQhfhEfAtX7cDdI65GRCT9FA5diDctbdxdz7qdura0iOQehUMXjpswlAqNlhaRHKZw6EJ+njFvVrxpSf0OIpJ70h4OZna2mT1lZrVmtsvMFqS7hmTEJ+JbsmkPe+s0WlpEcktaw8HMzgT+CNwAVADjgdvSWUOy5k4fSWF+Hm2Ori0tIjkn3UcO/w383N3/6O6N7t7g7s+luYakDC6KceKU4YBGS4tI7klbOJhZKfBGIGZmz4VNSovMbE4K26gwsxlmNqOlpaX/ig3FJ+J7dM1Omlo0WlpEckc6jxyGhfu7CLgEGAs8APzdzIYmuY0rgDXAmqqq/v82Hx/vsL9Ro6VFJLekMxz2hz9/6e7L3b2JoJmpADglyW3cBMwEZlZWVvZDiYcaP6yEWaOD0dIP6pRWEckhaQsHd68GNgIdhxx7F8u628Zud1/r7mtjsVgfV9i1g6OlqzRaWkRyRro7pBcAHzGzo8wsBnwRaAQWp7mOpMVHS2/eU8/LVbq2tIjkhvR8/T7oBmAI8BBQDDwPvDU8qhiQjhs/lBGDC9lV28TC1VVMDyflExHJZmk9cvDANe4+2t2Huvs8d1+WzhpSlZdnzJt5cCI+EZFcoOkzkjA/4drSGi0tIrlA4ZCE06aPaB8t/fAaDYgTkeyncEhCaVGMk6cG15bWRHwikgsUDkmKT8T3yFqNlhaR7KdwSNJZYb9DbWMLz2zQaGkRyW4KhySNGzqII8eUARotLSLZT+GQgnjT0sLVura0iGQ3hUMK4qe0btlzgJc0WlpEspjCIQXHjitn5JAiQE1LIpLdFA4pyMszzpqpa0uLSPZTOKTorLDf4bnNe9ld2xhxNSIi/UPhkKLTpo+gMJaHOzy8ZmfU5YiI9AuFQ4pKCmOc0j5aWv0OIpKdFA69ED9r6dG1O2lsaY24GhGRvqdw6IX5s4J+h7qmVp5er9HSIpJ9FA69MHboII4KR0s/tFpnLYlI9lE49FJ8tPSDqzRaWkSyj8Khl+L9Dq/sPcDaHRotLSLZReHQS6/TaGkRyWIKh17Ky7P2jmmd0ioi2UbhcBjiTUvPb9nHLo2WFpEsonA4DHOnjaAoPlpaZy2JSBZROByGQYX5nDptBKCJ+EQku6QUDmb2rJldamaDU92Rmd1hZs1mVptwuyzV7Qw088NTWh97SaOlRSR7pHrk8DBwHbDNzH5hZiemuP6
v3H1wwm1BiusPOPNnBf0OdU2tPKXR0iKSJVIKB3f/EjAB+DAwGnjCzF40syvNbFh/FJjIzCrMbIaZzWhpaenv3SVldHkxx4wLRkvrrCURyRYp9zm4e4u73+fubwOOAO4Dvge8ama/NbM3vMbq55vZHjNba2bX96J56gpgDbCmqmrgtPHHjx4WrqrSaGkRyQq97pA2s6nA5cAngAPAbUAxwdHENV2schMwCxgBvBs4A7g1xd3eBMwEZlZWVvay8r53dnhK66v7DrB6+/6IqxEROXypdkgXmdnFZvYwwTf404AvA2Pd/Up3Px84D7i647ruvtTdd7h7m7uvBD4HvNfMipLdv7vvdve17r42FoulUnq/OnpsGZXhaGk1LYlINkj1yGE7wbf3F4HZ7j7X3X/t7g0Jz1kMJNMz2xb+tBRrGHDy8qz9rKUHdUqriGSBVMPhc8C48ChhZVdPcPd97j6543Ize7+ZDQ1/nw78APjfDsGSseL9Di+8so+d+zVaWkQyW6rhcDrQqT3HzErN7PYe1v0UsN7M6oAHgKeAj6S4/wHrVI2WFpEskmo4fBgY1MXyQcB/vNaK7n6muw9391J3n+zun3f3mhT3P2ANKsxnbny09Gr1O4hIZks1HAw45FxNMzNgLrCzr4rKVPGJ+B57aRcNzRotLSKZK6lwMLM2M2slCIbtZtYavwEtwL3Ab/qxzowQ75Sub2rlqfW7I65GRKT3kj0f9CKCo4a7CPoOqhMeawI2uPuyPq4t44wqK+Z148p58dVqFq6q4syZA2cshohIKpIKB3f/PYCZbQOecPeBMXfFADT/yMowHHZw3buOJmh1ExHJLD02K5lZ4tffVcBwM6vs6tZ/ZWaO+GjprdUNrNqm0dIikpmSOXLYZmZj3L2KYBBcV5MHxTuq8/uyuEx09NgyRpcVs72mgYWrdnDU2LKoSxIRSVky4XAWB0c8n0XX4SAhM+OsIyu56+nNPLi6iivmT4+6JBGRlPUYDu7+SMLvi/q1mixxdhgOL2zZR9X+BiqHFEddkohISlKdeO8L3SwvNrOb+6akzHfK1BEUFwRvrUZLi0gmSnUQ3FfM7B9mNjK+wMyOBZ4DzuzLwjJZcUE+c6cFb5Em4hORTJRqOBwPDAaWm9mbzexK4GngGeCEvi4uk50dDoh7XKOlRSQDpXqZ0M0EF+n5A/B34AbgY+5+ibvX9UN9GeusWUE4HGhu5cl1Gi0tIpmlN1eCmwdcQNCUVAdcbGYj+rSqLFBZVszs8eUAPKgLAIlIhkm1Q/q7wD8ILu95EnAcMBR40czO6fvyMttZ4TUeHlqta0uLSGZJ9cjhYuBN7n6Nu7e6+yaCS4XeCvy1z6vLcPGJ+LZVN7Bya9bMTi4iOSDVcJidOO4BILwm9DXA2X1XVnY4emwZY8qDMQ4LddaSiGSQVDuk9wCYWYWZnWhmRQmPPdbXxWU6M2vvmH5IFwASkQySap/DYDP7HcGFfRYD48LlN5vZ1/uhvowXn4jvhVeqqarJistli0gOSLVZ6b+BKcCJwIGE5X8D3t1XRWWTk6dWMKggmI/wIY2WFpEMkWo4vBO4yt2f5dAJ+FYRhIZ0UFyQz9zpwZm+Gi0tIpki1XAYCXTVeD6IYNpu6UL7aOmXd2q0tIhkhFTDYTldn5V0MfDs4ZeTneaFndINzW0sXrcr4mpERHqWajhcC/zQzL5BcGGfi8zsN8DnwseSYmZ5ZrbYzNzMxqdYQ8apHFLM7AlDATUtiUhmSPVU1n8C5xHMr9QG/BdwBPBWd380hU19DqhPZd+Z7uz4Ka2rNFpaRAa+lOdWcvcH3f1Mdx/s7iXufpq7P5Ts+mY2A7gM6PLaENlqfnhK6/YajZYWkYGvNxPv9ZqZ5QG3EwTDvl6sX2FmM8xsRktLS5/X15+OHDOEseFoaU3EJyIDXY/hYGYHzKw+mVsS+7sK2O7uf+plvVcAa4A1VVWZ1XZvZu1HD5pKQ0QGuh6vIQ18mkPHNPS
KmU0DrgbmHMZmbgLuAqisrFxzuDWl2/wjK7nzqU28+Go1O2oaGFWma0uLyMDUYzi4+x19tK+5BOMkVpgZHDxqWW5mX3X3BUnUshvYDTBnzuFkTDROmlJBSWE+9U2tLFxVxQdOnBh1SSIiXepVn4OZnWJmHw9vpyS52j3AVIJrQBwHnBsuPwf4dW/qyDTFBfmcFo6WXqh+BxEZwJJpVmpnZhMIPuRP5GCH8lAzewa4wN23dLeuu9eTcPqqmcX3vd3da1OqOoPNP3IU96/cweMv7+JAUyuDCvOjLklEpJNUjxxuJQiUo919uLsPB44mmDrj1lQ25O4b3d3c/ZUUa8ho82ZWYgaNLW088bJGS4vIwJRqOJwBfNrdV8UXhL9fDpzel4Vlq5FDipg9PhgtvVDXeBCRASrVcNgGdDXAoBXQ+ZlJik/Et3BVFW1tGi0tIgNPquFwDfAjM2s/zSb8/Qbga31ZWDaLj3eo2t/Ic5v3RlyNiEhnqYbDfxGMU1hvZq+Y2SvAeuCNwFfM7N/xW18Xmk1mjR7ClJGlAHz53uXUN2XWaG8RyX4pna0E3N0vVeQYM+P75x/Lhbc8xbqddVzzl5XccMHsqMsSEWmXdDiYWT7wMLDc3VOeF0kONWfScD7/phlcf/8a/rj0FU6ZWsF7Tsj62ctFJEMk3azk7q3Av4Bh/VdObvn0GVPbB8V99c8rWLczZ4Z7iMgAl2qfwypAX2/7SF6eceP7jmPkkCLqm1q5/K7ndRlRERkQUg2HLwDXm9kbw2YmOUwjhxTxowuPwwxWbavh2/+3queVRET6Warh8FeCs5WeBBp6MWW3dOHUaSO4fN40AO58ahP/eHFbxBWJSK5L9WylT/VLFcJV86fz9Po9PLNxD1+6dznHjCtnwvCSqMsSkRxlmXo94zlz5viSJUuiLqNPbas+wLk/foy99c3MnjCUP3zyZApjab1Yn4hkOTNb6u49XvMg5U8eM6s0s6vN7H/MbES47FQzm9ybQuWgMeWD2sc7vLBlHzc8kHHXMxKRLJFSOJjZ8cBq4CPAx4Cy8KE3Ad/q29Jy0/wjR/HxuUHO3vLoeh5erSmrRCT9Uj1y+AFwi7sfAzQmLL8fOLXPqspxX3rLLGaPLwfg8/csY3t1Q8QViUiuSTUcTgBu62L5VmDU4ZcjAIWxPG666ASGFMXYW9/MVXc/T6tmbxWRNEo1HFqA0i6WTwX2HH45EjexooTvnn8sAE9v2MNPFr4UcUUikktSDYd/Al80Mwvvu5kNA64jGAMhfehtx47h4hOD2dF/8tBLLF6nK8eJSHr0ZoT064F1QDFwL7ABGAr8Z9+WJgBfe/tRzBo9BHf47N3L2FXb2PNKIiKHKdVw2Au8geBI4WbgKeBqYI67q1mpHxQX5PPTD5zAoIJ8qvY38vl7XtDV40Sk3yUVDmY23Mz+CtQC1cClwA3ufpm7/8LddTpNP5pWOZhvnncMAI+u3cktj62PuCIRyXbJHjl8GzgR+DrwRYIzk37eX0VJZ+99/Xjec/w4AK6/fw1LN+nyoiLSf5INh7cCH3P377j7jcA7gbPNLNW5meQwfPO8Y5gyopTWNufK3z1PdX1z1CWJSJZKNhzGAUvjd9z930ATMDaVnZnZt81sg5nVmFmVmf3RzCamso1cVloU46cfOIHCWB6v7jvAF//4Apk6N5aIDGzJhkM+0PFramu4PBV3Ase5exkwCdiMrkudkqPGlvG1tx8FwAP/3sGvn9wUcUUiko1SaRb6nZk1JdwvBn6ZeB0Hdz/3tTbg7qsT7hrQBsxMtgAzqwAqAGbPnp3salnngydOZPHLu/jHiu18+/9W8fojhnHMuPKoyxKRLJLskcOvgC3AjoTbbwjGOCQu65GZfcDMqgnOfLoKuDaFeq8A1gBrqqpyd0I6M+O75x/L+GGDaGpt4/K7nqO2sSXqskQki0R2PQczG00ws+sT7r4oyXUSjxzWLFu2rP8KzADLtuzjvf+zmJY2513
HjQ0vN2o9rygiOavfrufQV9x9O3Ar8DczG57kOrvdfa27r43FdKLUcROG8uW3zALgL8u28oclr0RckYhki6gvMxYjmMgvpbOe5KCPzZ3MvJkjAbjmf1ewdsf+iCsSkWyQtnAwszwzu9zMKsP744GfARsJLiAkvZCXZ/zgfccxqqyIhuag/+FAU2vUZYlIhkv3kcO5wAozqwOeBuqBs91dvamHYXhpIT9+//HkGazdUcs3/roy6pJEJMOlLRzcvc3dz3X3Sncvdfdx7n6xu69LVw3Z7KQpFVw1fwYAdz+7hb8sezXiikQkk0Xd5yB96PKzpnHylAoA/vO+F9m4qy7iikQkUykcskh+nvHj9x9HRWkhdU2tXP6752hsUf+DiKRO4ZBlKsuKufHC4wBY8WoN3/2H+vpFJHUKhyx0xoyRfOqMqQD88omNPLBye8QViUimUThkqavPmcEJE4cC8MU/LufVfQcirkhEMonCIUsV5Ofxk4uOp6w4RvWBZq783fM0t7ZFXZaIZAiFQxYbP6yE6y8IZq9dumkvP/zX2ogrEpFMoXDIcm8+ejSXnDIJgAWL1vHo2p3RFiQiGUHhkAO+cu4sjh5bBsDn71lGVU1DxBWJyECncMgBRbF8fvqBEygtzGdXbROf/f0yWtt0eVER6Z7CIUdMHlHKd97zOgAWr9vNgodfjrgiERnIFA455F3HjePCORMA+OGDa3l6/e6IKxKRgUrhkGOufefRTK8cTJvDVXcvY09dU88riUjOUTjkmEGF+fzs4hMoLshje00DX/jDC0R1qVgRGbgUDjloxqghXPuOowF4aHUVtz22IeKKRGSgUTjkqAvfMIF3zg6uzvrtv6/ik3cuYdW2moirEpGBQuGQo8yMb7/7GI4ZF4x/uH/lDt7648e47LdLWbNd16EWyXUKhxw2pLiAP192Kje+bzZHVJQA8PcXt/OWHz/K5Xc9x8tVCgmRXGWZ2hk5Z84cX7JkSdRlZI2W1jbue/5VfrLwJV7ZG8zgagbvmj2WK+dPZ8rIwRFXKCJ9wcyWuvucHp+ncJBEza1t3Lv0FW566OX2ab7zDM47fhxXnjWdSSNKI65QRA6HwkEOS1NLG/cs2cLPHn6ZbdXBXEz5ecb5J4zjirOmM2F4ScQVikhvKBykTzS2tPL7Z4OQ2FHTCEAsz7hgzng+M28a44cpJEQyyYALBzP7HvB2YAJQC/wf8GV339Ob7Skc0quhuZW7nt7MgkXr2FUbhERBvnHhGybwmXnTGFM+KOIKRSQZyYZDOs9WagU+CFQAs4HxwB1p3L8chuKCfD46dzKPfWkeX33bkVSUFtLc6vzmqc2c8f1FfP0vK9ihqcBFskZkzUpm9hbgHncv6836OnKIVn1TC3c+uYmfP7KOvfXNABTG8rj4xIl8+sypVA4pjrhCEenKgGtW6rRjs+uBk9z9tBTWqSA48mD27Nlrli1b1l/lSZJqG1v41eKN3PLoeqoPBCFRXJDHh046gk+eMZURg4sirlBEEg3ocDCz8wmalM5w9+dSWO9a4OsAY8aMYevWrf1Sn6Ruf0MzdzyxkVsfW09NQwsAgwry+fApk/jE6VMYXloYcYUiAgM4HMzsAuBm4Hx3fzjFdXXkMMBVH2jm9sc3cPvjG9jfGIREaWE+l5w6iUtPm8LQEoWESJQGZDiY2UeAHwDvcPcnDmdb6nMY2Krrm7nt8fXc/vgG6ppaARhcFOOjcyfzsbmTKR9UEHGFIrlpwIWDmV1J0CT0Fnd/9nC3p3DIDHvrmrj1sfXcsXgj9WFIDCmO8fG5U/jI3EmUFSskRNJpIIaDAy1AY+Jyd+++V0jCAAANN0lEQVTVpD0Kh8yyu7aRWx5dz6+f3MSB5iAkygcVcOlpk7nk1MkMLopFXKFIbhhw4dDXFA6Zaef+Rm5+ZB13PrWJxpY2AIaVFPCJ06dy8UkTdSQh0s8UDjKgVdU0sGDROu56ZjNNYUgATBlRytHjyjlmbBnHjCv
n6LFl6sQW6UMKB8kI26sbWLDoZe5+ZgtNrW1dPmf8sEEcM7acY8aVhcFRzsghGj8h0hsKB8ko1fXNLH91HyterWHF1mpWvFrNpt313T5/VFkRx4wtP+QoY0x5MWaWxqpFMo/CQTJe9YFm/r21hpVhWKzYWsO6nbV09092eGkhR4dBET/SmDi8RIEhkkDhIFmprrGF1dtrgiOMMDBe2rGflrau/x0PKY61B0XQh1HO5BGl5OcpMCQ3KRwkZzQ0t7J2x/5DmqRWb9vfbR9GSWE+R4052OF9zLhyplUOpiBfl1SX7JdsOOjkcsl4xQX5HDt+KMeOH9q+rLm1jZd21LJiazUrwyOMf2+t4UBzK/VNrSzZtJclm/a2P78wlseRo4dw9Lhypo4czMThJUwcXsKE4YMoKdR/E8k9OnKQnNHa5mzYVZvQJFXNyldr2ueA6s6IwUVMHD4oITCCnxMrShg1pJg8NVFJBlGzkkgS2tqcLXvr25ukVm6tYfPuOl7Ze6DbfoxEhfl5jE8IjsTwmDC8RCO/ZcBRs5JIEvLyjCMqSjmiopS3HTumfXlLaxvbaxrYvKeeLXvq2bynns17DrTf31PXBEBTaxvrd9axfmddl9uvKC08eKSRGB4VJYwuK1bHuAxYCgeRLsTy8xg/rITxw0pgaufH9zc0syUhLDaHty176tmyt57m1uCoY3ddE7vrmli2ZV+nbRTm5zF+2KBDwmNCQl/HEE0lIhFSOIj0wpDiAo4aW8BRYztf5ba1zdkRHnV0FR67ahOOOnbVsX5X10cdIwYXMik8qpk8ooRJI0qZVFHKpBGlaq6Sfqd/YSJ9LD/PGDt0EGOHDuKkKRWdHq9rbGHL3no27+4iPPYeaJ9raldtE7tqmw45qypu5JAiJlWUtIfF5PbgKNHZVdIn9K9IJM1Ki2LMGl3GrNGdjzra2pyq/Y1s3F3Hpt11bNhVz8ZddWzcHdwamoPg2Lm/kZ37G3l2Y+fgqBxSFARGe3AERx1HDC9lUGF+v78+yQ4KB5EBJC/PGF1ezOjy4k5HHfHg2BAPi1117b9v2l3fPgV61f5GqvY38syGPZ22P7qsmEkjShKONErDpqsSigsUHHKQwkEkQyQGx8lTOwfH9pqGIDDag6Oejbvr2Ly7vn20+PaaBrbXNPDU+kODwwzGlBUHYTGi9JAmq9HlxQwpimmOqhyjcQ4iWa61zdlWfYCNu+rbgyMeIlv2HDyz6rWUFOYHwVQW3sKQGlVWzJhwecXgIp2amwE0zkFEgKCDPH5a7tzpIw55rKW1jW3VDe3NUxvC4Ni4O+gojw8ErG9qfc3xHPH9VA4pag+R9uBICJFRZcVqvsoQCgeRHBbLz2NCOL7idEYe8lg8OHbUNLT/3F7dwLaaBnZUB81TO2oa2o88giOU4LmvZWhJwcGjj4QQGRXeH1NeTPmgAjVjRUzhICJdSgyO7rS1OXvqm9heHQRHPDASw2R7TQP7Gw7OX7Wvvpl99c2s3r6/2+0WxfIONmOVF1NRWsTgonxKi2KUFMUoLQx+Ly2MUVKUz+CiGCWF+e33i2I6OjlcCgcR6bW8PGPE4CJGDC7imHHl3T6vrrElCI7wyKKrENlZ29h+IafGljY27a5/zasBvpaCfKOk8GCIHBoo+ZQUxQ4JlNKiGKVF+cE6RfFl8fvBOrEcm9Jd4SAi/a60KMbUkYOZOnJwt89pbm1j5/7GQ0JkR3h21Z66JuoaW6hvaqWuqYW6xlbqGlvaT9/tvC2n+kAz1Qea++w1FMbyKCnMpyiWR3FB1z+Lulme6s/4dopieZE1rykcRGRAKMjPax9ZnqyW1jbqmlqpTwiMuqYW6hsPhkh9Uwu18WAJfwb3D65zMHRauj17q6mlrX30ejp1FR6jyor5zcdP7Nf9pjUczOz9wGeA2UCJuyucRKTXYvl5lA/Ko3xQ301S2NTS1ilQ6hqDQGlsaaWxuY2G+M/mVhpbkvvZ1MXyZKaFb2xp63SEVNv
DNUj6Qro/nPcCC4BBwC1p3reISI8KY3kUxgoZWlLY7/tqaW1LOlwaW9poDH8Wxfq//yOt4eDu9wOY2Zm9Wd/MKoAKgNmzZ/ddYSIiEYjl5xHLz6N0AM6ym2nd71cAa4A1VVVVUdciIpK1Mi0cbgJmAjMrKyujrkVEJGtlVDi4+253X+vua2OxgXcYJiKSLTIqHEREJD3SfSprPlAAFIb3i8OHGj1Tp4cVEclC6T5y+BBwALgfyA9/PwAckeY6RETkNaQ1HNz9Dne3Lm4b01mHiIi8toy92I+Z7QQ29WLVfGAUsANo7dOiMpPej0Pp/ThI78WhsuX9OMLdR/b0pIwNh94ysxkEYyVmuvvaqOuJmt6PQ+n9OEjvxaFy7f3Q2UoiItKJwkFERDrJxXDYDXwj/Cl6PzrS+3GQ3otD5dT7kXN9DiIi0rNcPHIQEZEeKBxERKQThYOIiHSicBARkU4UDiIi0onCQUREOlE4iIhIJwoHERHpJKfCwczyzex6M9tpZvvN7F4zGxF1XVEws++Z2UozqzGzrWZ2q5kNj7quqJlZnpktNjM3s/FR1xMlMzvbzJ4ys1oz22VmC6KuKSpmNtrMfh9+duw1s4fMbHbUdfWnnAoH4P8B7wJOBOL/8e+MrpxItQIfBCqA2QTvxx1RFjRAfA6oj7qIqJnZmcAfgRsI/o2MB26LsqaILQCGAzMIpu1eAvzNzCzSqvpRTk2fYWabgOvc/Rfh/anAy8Akd+/NtSGyhpm9BbjH3cuiriUq4ZTM/wDOB54HJrj7K9FWFQ0zexJ4xN3/X9S1DARmthz4qbvfEt6fCawGRrr7rkiL6yc5c+RgZkOBicDS+DJ3XwfUEHxzznXzgReiLiIqZpYH3A58AdgXcTmRMrNS4I1AzMyeC5uUFpnZnKhri9D1wPlmNtLMioFPAI9nazBADoUDMCT8Wd1h+T4gZ78tA5jZ+cCngKuiriVCVwHb3f1PURcyAAwj+Gy4CLgEGAs8APw9/JKVi54guBJcFVALvAe4NNKK+lkuhcP+8Gd5h+VDCY4ecpKZXQDcCrzT3Z+Lup4omNk04Grg8qhrGSDi/1d+6e7L3b0J+G+gADglurKiER5VPgisJfj8KAG+DTxmZqOirK0/5Uw4uPs+YDNwQnyZmU0hOGpYHlVdUTKzjwA3A+9w94ejridCc4GRwAoz2wXEQ3K5mV0WXVnRcPdqYCPQsUPSu1iWC4YDk4Gb3L3G3Zvc/TaCz8+Toy2t/+RMOIRuAb5sZpPNrAz4HnC/u2+Mtqz0M7MrCc5EebO7PxF1PRG7B5gKHBfezg2XnwP8OqqiIrYA+IiZHWVmMeCLQCOwONqy0i/sV1gLXGZmpWYWM7OPEjRVZ+0Xy1jUBaTZdwnaU58FioB/EZzOmYt+DLQADyeejefugyOrKCLuXk/C6avhhyEEfRC10VQVuRsIPvweAooJzt56a3hUkYvOI+iU3kTQvPYycIG7r4+0qn6UU6eyiohIcnKtWUlERJKgcBARkU4UDiIi0onCQUREOlE4iIhIJwoHERHpROEgMgCY2SVm1hB1HSJxCgfJeWZ2R3hxn463nJyuWwRyb4S0SHceBj7QYVlrFIWIDAQ6chAJNLn79g63nQBmttHMrjOz28PLqu40s28mXgXMzMrN7BfhtQ8azOwJMztkUjYzmx5emnavmdWb2fNmNq/Dc04zs2Xh48+Y2fHpefkih1I4iCTnswSz+s4huCDQ1cCnEx7/JXAGcCHwemAdcH98SmczG0NwTYASgon9jgW+1WEfBeGyz4Tb2AfcHU4ZLZJWmltJcp6Z3UEwAWPHDuE/ufuHzGwjsMHd5yWs833gPe4+zcymE8za+SZ3fzB8PD4526/d/Wtm9i3gI8A0dz/QRQ2XEATMbHdfHi47FXgcXcZWIqA+B5HAYuCjHZYlzsj6ZIfHngC+EF4y8kiC6xw8Hn/Q3ZvD6zAfFS46geCykp2CIUELsCLh/tbw5yiC2UB
F0kbhIBKod/eXI66h1d3bEu7HD+vVrCRpp390Isk5qcP9UwiamhqAfwNGcEU5oL1Z6WRgZbjoOeBUMxuUhlpFDpvCQSRQaGajO94SHp9jZl8zsxlm9iGC603/ECA84rgP+LmZnWVmRwG/ILiw1M/C9RcQXDTnPjM72cymmNm7Op6tJDJQqFlJJDAP2NZxYXgEAPAjYBqwFGgiuJLegoSnfhS4EfgDUBo+783uvgPA3bea2Vzg+8D9QD6wmuCsJ5EBR2crifQgPFvp5+7+3ahrEUkXNSuJiEgnCgcREelEzUoiItKJjhxERKQThYOIiHSicBARkU4UDiIi0onCQUREOlE4iIhIJ/8fDb9RGSoVbuYAAAAASUVORK5CYII=\\n\",\n      \"text/plain\": [\n       \"<Figure size 432x288 with 1 Axes>\"\n      ]\n     },\n     \"metadata\": {\n      \"needs_background\": \"light\"\n     },\n     \"output_type\": \"display_data\"\n    }\n   ],\n   \"source\": [\n    \"# train the copy task\\n\",\n    \"dev_perplexities = train_copy_task()\\n\",\n    \"\\n\",\n    \"def plot_perplexity(perplexities):\\n\",\n    \"    \\\"\\\"\\\"plot perplexities\\\"\\\"\\\"\\n\",\n    \"    plt.title(\\\"Perplexity per Epoch\\\")\\n\",\n    \"    plt.xlabel(\\\"Epoch\\\")\\n\",\n    \"    plt.ylabel(\\\"Perplexity\\\")\\n\",\n    \"    plt.plot(perplexities)\\n\",\n    \"    \\n\",\n    \"plot_perplexity(dev_perplexities)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"You can see that the model managed to correctly 'translate' the two examples in the end.\\n\",\n    \"\\n\",\n    \"Moreover, the perplexity of the development data nicely went down towards 1.\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# A Real World Example\\n\",\n    \"\\n\",\n    \"Now we consider a real-world example using the IWSLT German-English Translation task. \\n\",\n    \"This task is much smaller than usual, but it illustrates the whole system. \\n\",\n    \"\\n\",\n    \"The cell below installs torch text and spacy. 
This might take a while.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 17,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"#!pip install git+git://github.com/pytorch/text spacy \\n\",\n    \"#!python -m spacy download en\\n\",\n    \"#!python -m spacy download de\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Data Loading\\n\",\n    \"\\n\",\n    \"We will load the dataset using torchtext, and use spaCy for tokenization.\\n\",\n    \"\\n\",\n    \"This cell might take a while to run the first time, as it will download and tokenize the IWSLT data.\\n\",\n    \"\\n\",\n    \"For speed, we only include short sentences (at most 25 tokens on both the source and the target side), and we include a word in the vocabulary only if it occurs at least 5 times. We also lowercase the data.\\n\",\n    \"\\n\",\n    \"If you have **issues** with torchtext in the cell below (e.g. an `ascii` error), try running `export LC_ALL=\\\"en_US.UTF-8\\\"` before you start `jupyter notebook`.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 18,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# For data loading.\\n\",\n    \"from torchtext import data, datasets\\n\",\n    \"\\n\",\n    \"if True:\\n\",\n    \"    import spacy\\n\",\n    \"    spacy_de = spacy.load('de')\\n\",\n    \"    spacy_en = spacy.load('en')\\n\",\n    \"\\n\",\n    \"    def tokenize_de(text):\\n\",\n    \"        return [tok.text for tok in spacy_de.tokenizer(text)]\\n\",\n    \"\\n\",\n    \"    def tokenize_en(text):\\n\",\n    \"        return [tok.text for tok in spacy_en.tokenizer(text)]\\n\",\n    \"\\n\",\n    \"    UNK_TOKEN = \\\"<unk>\\\"\\n\",\n    \"    PAD_TOKEN = \\\"<pad>\\\"    \\n\",\n    \"    SOS_TOKEN = \\\"<s>\\\"\\n\",\n    \"    EOS_TOKEN = \\\"</s>\\\"\\n\",\n    \"    LOWER = True\\n\",\n    \"    \\n\",\n    \"    # include_lengths=True makes each batch also return the sentence lengths, which the RNNs need\\n\",\n    \"    SRC = 
data.Field(tokenize=tokenize_de, \\n\",\n    \"                     batch_first=True, lower=LOWER, include_lengths=True,\\n\",\n    \"                     unk_token=UNK_TOKEN, pad_token=PAD_TOKEN, init_token=None, eos_token=EOS_TOKEN)\\n\",\n    \"    TRG = data.Field(tokenize=tokenize_en, \\n\",\n    \"                     batch_first=True, lower=LOWER, include_lengths=True,\\n\",\n    \"                     unk_token=UNK_TOKEN, pad_token=PAD_TOKEN, init_token=SOS_TOKEN, eos_token=EOS_TOKEN)\\n\",\n    \"\\n\",\n    \"    MAX_LEN = 25  # NOTE: we filter out a lot of sentences for speed\\n\",\n    \"    train_data, valid_data, test_data = datasets.IWSLT.splits(\\n\",\n    \"        exts=('.de', '.en'), fields=(SRC, TRG), \\n\",\n    \"        filter_pred=lambda x: len(vars(x)['src']) <= MAX_LEN and \\n\",\n    \"            len(vars(x)['trg']) <= MAX_LEN)\\n\",\n    \"    MIN_FREQ = 5  # NOTE: we limit the vocabulary to frequent words for speed\\n\",\n    \"    SRC.build_vocab(train_data.src, min_freq=MIN_FREQ)\\n\",\n    \"    TRG.build_vocab(train_data.trg, min_freq=MIN_FREQ)\\n\",\n    \"    \\n\",\n    \"    PAD_INDEX = TRG.vocab.stoi[PAD_TOKEN]\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Let's look at the data\\n\",\n    \"\\n\",\n    \"It never hurts to look at your data and some statistics.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 19,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Data set sizes (number of sentence pairs):\\n\",\n      \"train 143116\\n\",\n      \"valid 690\\n\",\n      \"test 963 \\n\",\n      \"\\n\",\n      \"First training example:\\n\",\n      \"src: david gallo : das ist bill lange . ich bin dave gallo .\\n\",\n      \"trg: david gallo : this is bill lange . i 'm dave gallo . 
\\n\",\n      \"\\n\",\n      \"Most common words (src):\\n\",\n      \"         .     138325\\n\",\n      \"         ,     105944\\n\",\n      \"       und      41839\\n\",\n      \"       die      40809\\n\",\n      \"       das      33324\\n\",\n      \"       sie      33035\\n\",\n      \"       ich      31153\\n\",\n      \"       ist      31035\\n\",\n      \"        es      27449\\n\",\n      \"       wir      25817 \\n\",\n      \"\\n\",\n      \"Most common words (trg):\\n\",\n      \"         .     137259\\n\",\n      \"         ,      91619\\n\",\n      \"       the      73344\\n\",\n      \"       and      50273\\n\",\n      \"        to      42798\\n\",\n      \"         a      39573\\n\",\n      \"        of      39496\\n\",\n      \"         i      33524\\n\",\n      \"        it      32921\\n\",\n      \"      that      32643 \\n\",\n      \"\\n\",\n      \"First 10 words (src):\\n\",\n      \"00 <unk>\\n\",\n      \"01 <pad>\\n\",\n      \"02 </s>\\n\",\n      \"03 .\\n\",\n      \"04 ,\\n\",\n      \"05 und\\n\",\n      \"06 die\\n\",\n      \"07 das\\n\",\n      \"08 sie\\n\",\n      \"09 ich \\n\",\n      \"\\n\",\n      \"First 10 words (trg):\\n\",\n      \"00 <unk>\\n\",\n      \"01 <pad>\\n\",\n      \"02 <s>\\n\",\n      \"03 </s>\\n\",\n      \"04 .\\n\",\n      \"05 ,\\n\",\n      \"06 the\\n\",\n      \"07 and\\n\",\n      \"08 to\\n\",\n      \"09 a \\n\",\n      \"\\n\",\n      \"Number of German words (types): 15761\\n\",\n      \"Number of English words (types): 13003 \\n\",\n      \"\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"def print_data_info(train_data, valid_data, test_data, src_field, trg_field):\\n\",\n    \"    \\\"\\\"\\\" This prints some useful stuff about our data sets. 
\\\"\\\"\\\"\\n\",\n    \"\\n\",\n    \"    print(\\\"Data set sizes (number of sentence pairs):\\\")\\n\",\n    \"    print('train', len(train_data))\\n\",\n    \"    print('valid', len(valid_data))\\n\",\n    \"    print('test', len(test_data), \\\"\\\\n\\\")\\n\",\n    \"\\n\",\n    \"    print(\\\"First training example:\\\")\\n\",\n    \"    print(\\\"src:\\\", \\\" \\\".join(vars(train_data[0])['src']))\\n\",\n    \"    print(\\\"trg:\\\", \\\" \\\".join(vars(train_data[0])['trg']), \\\"\\\\n\\\")\\n\",\n    \"\\n\",\n    \"    print(\\\"Most common words (src):\\\")\\n\",\n    \"    print(\\\"\\\\n\\\".join([\\\"%10s %10d\\\" % x for x in src_field.vocab.freqs.most_common(10)]), \\\"\\\\n\\\")\\n\",\n    \"    print(\\\"Most common words (trg):\\\")\\n\",\n    \"    print(\\\"\\\\n\\\".join([\\\"%10s %10d\\\" % x for x in trg_field.vocab.freqs.most_common(10)]), \\\"\\\\n\\\")\\n\",\n    \"\\n\",\n    \"    print(\\\"First 10 words (src):\\\")\\n\",\n    \"    print(\\\"\\\\n\\\".join(\\n\",\n    \"        '%02d %s' % (i, t) for i, t in enumerate(src_field.vocab.itos[:10])), \\\"\\\\n\\\")\\n\",\n    \"    print(\\\"First 10 words (trg):\\\")\\n\",\n    \"    print(\\\"\\\\n\\\".join(\\n\",\n    \"        '%02d %s' % (i, t) for i, t in enumerate(trg_field.vocab.itos[:10])), \\\"\\\\n\\\")\\n\",\n    \"\\n\",\n    \"    print(\\\"Number of German words (types):\\\", len(src_field.vocab))\\n\",\n    \"    print(\\\"Number of English words (types):\\\", len(trg_field.vocab), \\\"\\\\n\\\")\\n\",\n    \"    \\n\",\n    \"    \\n\",\n    \"print_data_info(train_data, valid_data, test_data, SRC, TRG)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Iterators\\n\",\n    \"Batching matters a ton for speed. 
We will use torchtext's `BucketIterator` here to get batches containing sentences of (almost) the same length.\\n\",\n    \"\\n\",\n    \"#### Note on sorting batches for RNNs in PyTorch\\n\",\n    \"\\n\",\n    \"For efficiency reasons, packing padded batches for PyTorch RNNs (`pack_padded_sequence`) by default requires each batch to be sorted by length, with the longest sentence in the batch first. For training, we simply sort each batch. \\n\",\n    \"For validation, sorting would cause trouble if we want to compare our translations with an external file that is not sorted. Therefore we set the validation batch size to 1, which keeps the sentences in their original order.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 20,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"train_iter = data.BucketIterator(train_data, batch_size=64, train=True, \\n\",\n    \"                                 sort_within_batch=True, \\n\",\n    \"                                 sort_key=lambda x: (len(x.src), len(x.trg)), repeat=False,\\n\",\n    \"                                 device=DEVICE)\\n\",\n    \"valid_iter = data.Iterator(valid_data, batch_size=1, train=False, sort=False, repeat=False, \\n\",\n    \"                           device=DEVICE)\\n\",\n    \"\\n\",\n    \"\\n\",\n    \"def rebatch(pad_idx, batch):\\n\",\n    \"    \\\"\\\"\\\"Wrap torchtext batch into our own Batch class for pre-processing\\\"\\\"\\\"\\n\",\n    \"    return Batch(batch.src, batch.trg, pad_idx)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Training the System\\n\",\n    \"\\n\",\n    \"Now we train the model. 
\\n\",\n    \"\\n\",\n    \"On a Titan X GPU, this runs at ~18,000 tokens per second with a batch size of 64.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 21,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def train(model, num_epochs=10, lr=0.0003, print_every=100):\\n\",\n    \"    \\\"\\\"\\\"Train a model on IWSLT\\\"\\\"\\\"\\n\",\n    \"    \\n\",\n    \"    if USE_CUDA:\\n\",\n    \"        model.cuda()\\n\",\n    \"\\n\",\n    \"    # optionally add label smoothing; see the Annotated Transformer\\n\",\n    \"    criterion = nn.NLLLoss(reduction=\\\"sum\\\", ignore_index=PAD_INDEX)\\n\",\n    \"    optim = torch.optim.Adam(model.parameters(), lr=lr)\\n\",\n    \"    \\n\",\n    \"    dev_perplexities = []\\n\",\n    \"\\n\",\n    \"    for epoch in range(num_epochs):\\n\",\n    \"      \\n\",\n    \"        print(\\\"Epoch\\\", epoch)\\n\",\n    \"        model.train()\\n\",\n    \"        train_perplexity = run_epoch((rebatch(PAD_INDEX, b) for b in train_iter), \\n\",\n    \"                                     model,\\n\",\n    \"                                     SimpleLossCompute(model.generator, criterion, optim),\\n\",\n    \"                                     print_every=print_every)\\n\",\n    \"        \\n\",\n    \"        model.eval()\\n\",\n    \"        with torch.no_grad():\\n\",\n    \"            print_examples((rebatch(PAD_INDEX, x) for x in valid_iter), \\n\",\n    \"                           model, n=3, src_vocab=SRC.vocab, trg_vocab=TRG.vocab)        \\n\",\n    \"\\n\",\n    \"            dev_perplexity = run_epoch((rebatch(PAD_INDEX, b) for b in valid_iter), \\n\",\n    \"                                       model, \\n\",\n    \"                                       SimpleLossCompute(model.generator, criterion, None))\\n\",\n    \"            print(\\\"Validation perplexity: %f\\\" % dev_perplexity)\\n\",\n    \"            dev_perplexities.append(dev_perplexity)\\n\",\n    \"      
  \\n\",\n    \"    return dev_perplexities\\n\",\n    \"        \"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 22,\n   \"metadata\": {\n    \"scrolled\": false\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Epoch 0\\n\"\n     ]\n    },\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"/home/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1\\n\",\n      \"  \\\"num_layers={}\\\".format(dropout, num_layers))\\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Epoch Step: 100 Loss: 22.353386 Tokens per Sec: 16007.731248\\n\",\n      \"Epoch Step: 200 Loss: 34.410126 Tokens per Sec: 16368.906298\\n\",\n      \"Epoch Step: 300 Loss: 44.763870 Tokens per Sec: 16586.324787\\n\",\n      \"Epoch Step: 400 Loss: 57.584606 Tokens per Sec: 16717.486756\\n\",\n      \"Epoch Step: 500 Loss: 40.508701 Tokens per Sec: 16486.886104\\n\",\n      \"Epoch Step: 600 Loss: 51.919121 Tokens per Sec: 16529.862635\\n\",\n      \"Epoch Step: 700 Loss: 82.279633 Tokens per Sec: 16973.462052\\n\",\n      \"Epoch Step: 800 Loss: 35.026432 Tokens per Sec: 16724.939524\\n\",\n      \"Epoch Step: 900 Loss: 63.407204 Tokens per Sec: 16606.524355\\n\",\n      \"Epoch Step: 1000 Loss: 37.909828 Tokens per Sec: 19105.497130\\n\",\n      \"Epoch Step: 1100 Loss: 90.584244 Tokens per Sec: 19643.264684\\n\",\n      \"Epoch Step: 1200 Loss: 84.000832 Tokens per Sec: 19468.084935\\n\",\n      \"Epoch Step: 1300 Loss: 54.331242 Tokens per Sec: 19679.282614\\n\",\n      \"Epoch Step: 1400 Loss: 49.921040 Tokens per Sec: 19629.820942\\n\",\n      \"Epoch Step: 1500 Loss: 21.851797 Tokens per Sec: 
19565.639729\\n\",\n      \"Epoch Step: 1600 Loss: 55.154270 Tokens per Sec: 19515.738007\\n\",\n      \"Epoch Step: 1700 Loss: 40.758137 Tokens per Sec: 19486.791554\\n\",\n      \"Epoch Step: 1800 Loss: 50.094219 Tokens per Sec: 19761.236905\\n\",\n      \"Epoch Step: 1900 Loss: 90.545143 Tokens per Sec: 19447.650965\\n\",\n      \"Epoch Step: 2000 Loss: 22.882494 Tokens per Sec: 19539.331538\\n\",\n      \"Epoch Step: 2100 Loss: 99.448174 Tokens per Sec: 19278.704892\\n\",\n      \"Epoch Step: 2200 Loss: 16.793839 Tokens per Sec: 19183.702688\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was born years old , i was a <unk> of the <unk> of the <unk> .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father was on his <unk> , the <unk> of the <unk> .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he was very interested in the way , what was pretty much more , and then it was the <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 31.839708\\n\",\n      \"Epoch 1\\n\",\n      \"Epoch Step: 100 Loss: 4.451122 Tokens per Sec: 19110.156367\\n\",\n      \"Epoch Step: 200 Loss: 11.262838 Tokens per Sec: 19538.253630\\n\",\n      \"Epoch Step: 300 Loss: 55.240711 Tokens per Sec: 19584.509548\\n\",\n      \"Epoch Step: 400 Loss: 54.733456 Tokens per Sec: 
19787.183104\\n\",\n      \"Epoch Step: 500 Loss: 38.923244 Tokens per Sec: 19385.772613\\n\",\n      \"Epoch Step: 600 Loss: 63.162933 Tokens per Sec: 19013.165752\\n\",\n      \"Epoch Step: 700 Loss: 47.323864 Tokens per Sec: 18863.104141\\n\",\n      \"Epoch Step: 800 Loss: 43.414978 Tokens per Sec: 19258.337491\\n\",\n      \"Epoch Step: 900 Loss: 87.750214 Tokens per Sec: 19179.949782\\n\",\n      \"Epoch Step: 1000 Loss: 39.787056 Tokens per Sec: 19110.748464\\n\",\n      \"Epoch Step: 1100 Loss: 78.177170 Tokens per Sec: 19272.044197\\n\",\n      \"Epoch Step: 1200 Loss: 37.122997 Tokens per Sec: 19194.535740\\n\",\n      \"Epoch Step: 1300 Loss: 26.103378 Tokens per Sec: 19337.967366\\n\",\n      \"Epoch Step: 1400 Loss: 78.804855 Tokens per Sec: 19018.413406\\n\",\n      \"Epoch Step: 1500 Loss: 61.593956 Tokens per Sec: 19259.272095\\n\",\n      \"Epoch Step: 1600 Loss: 81.611786 Tokens per Sec: 19259.527179\\n\",\n      \"Epoch Step: 1700 Loss: 28.692696 Tokens per Sec: 19230.891840\\n\",\n      \"Epoch Step: 1800 Loss: 84.163223 Tokens per Sec: 19071.272023\\n\",\n      \"Epoch Step: 1900 Loss: 36.782116 Tokens per Sec: 19209.383788\\n\",\n      \"Epoch Step: 2000 Loss: 56.666332 Tokens per Sec: 19127.522297\\n\",\n      \"Epoch Step: 2100 Loss: 5.576357 Tokens per Sec: 18957.458966\\n\",\n      \"Epoch Step: 2200 Loss: 38.791512 Tokens per Sec: 19166.811446\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 years old , i was a <unk> of the <unk> <unk> .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father was 
on his <unk> , in the little <unk> , the <unk> of the <unk> .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he saw very happy , what was pretty much , and it was the <unk> of the <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 19.906190\\n\",\n      \"Epoch 2\\n\",\n      \"Epoch Step: 100 Loss: 58.981544 Tokens per Sec: 19121.747106\\n\",\n      \"Epoch Step: 200 Loss: 34.874680 Tokens per Sec: 19689.768904\\n\",\n      \"Epoch Step: 300 Loss: 27.895102 Tokens per Sec: 19751.401628\\n\",\n      \"Epoch Step: 400 Loss: 52.931011 Tokens per Sec: 16369.447354\\n\",\n      \"Epoch Step: 500 Loss: 77.191933 Tokens per Sec: 16337.808093\\n\",\n      \"Epoch Step: 600 Loss: 65.645668 Tokens per Sec: 16307.871308\\n\",\n      \"Epoch Step: 700 Loss: 7.141161 Tokens per Sec: 16420.432824\\n\",\n      \"Epoch Step: 800 Loss: 76.990250 Tokens per Sec: 17512.558218\\n\",\n      \"Epoch Step: 900 Loss: 43.835995 Tokens per Sec: 16399.672659\\n\",\n      \"Epoch Step: 1000 Loss: 68.026192 Tokens per Sec: 16598.504664\\n\",\n      \"Epoch Step: 1100 Loss: 23.746111 Tokens per Sec: 16368.137311\\n\",\n      \"Epoch Step: 1200 Loss: 42.117832 Tokens per Sec: 16324.872475\\n\",\n      \"Epoch Step: 1300 Loss: 47.894409 Tokens per Sec: 16532.223380\\n\",\n      \"Epoch Step: 1400 Loss: 43.772861 Tokens per Sec: 16472.315811\\n\",\n      \"Epoch Step: 1500 Loss: 60.978756 Tokens per Sec: 16368.088307\\n\",\n      \"Epoch Step: 1600 Loss: 59.143227 Tokens per Sec: 16553.220745\\n\",\n      \"Epoch Step: 1700 Loss: 34.091373 Tokens per Sec: 16557.579342\\n\",\n      \"Epoch Step: 1800 Loss: 11.551711 Tokens per Sec: 16639.281663\\n\",\n      \"Epoch Step: 1900 Loss: 40.060520 Tokens per Sec: 
16666.679672\\n\",\n      \"Epoch Step: 2000 Loss: 21.947863 Tokens per Sec: 16403.240568\\n\",\n      \"Epoch Step: 2100 Loss: 12.891315 Tokens per Sec: 16656.630033\\n\",\n      \"Epoch Step: 2200 Loss: 12.300262 Tokens per Sec: 16592.045153\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 years old , i was a <unk> of the <unk> of the <unk> .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father was on his little , <unk> , <unk> the <unk> of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he looked very happy to what was pretty much more , because it was the <unk> of the <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 15.555337\\n\",\n      \"Epoch 3\\n\",\n      \"Epoch Step: 100 Loss: 36.178066 Tokens per Sec: 16064.364293\\n\",\n      \"Epoch Step: 200 Loss: 20.046204 Tokens per Sec: 16557.065342\\n\",\n      \"Epoch Step: 300 Loss: 53.514584 Tokens per Sec: 16375.767859\\n\",\n      \"Epoch Step: 400 Loss: 29.280447 Tokens per Sec: 16687.195842\\n\",\n      \"Epoch Step: 500 Loss: 64.491814 Tokens per Sec: 16491.438857\\n\",\n      \"Epoch Step: 600 Loss: 62.286755 Tokens per Sec: 16443.863308\\n\",\n      \"Epoch Step: 700 Loss: 60.861393 Tokens per Sec: 16303.304238\\n\",\n      \"Epoch Step: 800 Loss: 25.101744 Tokens per Sec: 
16437.206262\\n\",\n      \"Epoch Step: 900 Loss: 41.884624 Tokens per Sec: 16712.862598\\n\",\n      \"Epoch Step: 1000 Loss: 65.880905 Tokens per Sec: 16406.042864\\n\",\n      \"Epoch Step: 1100 Loss: 34.799385 Tokens per Sec: 16257.804744\\n\",\n      \"Epoch Step: 1200 Loss: 57.244125 Tokens per Sec: 16403.685499\\n\",\n      \"Epoch Step: 1300 Loss: 6.766514 Tokens per Sec: 16262.412676\\n\",\n      \"Epoch Step: 1400 Loss: 31.528254 Tokens per Sec: 16723.894609\\n\",\n      \"Epoch Step: 1500 Loss: 4.534189 Tokens per Sec: 16512.533272\\n\",\n      \"Epoch Step: 1600 Loss: 50.852787 Tokens per Sec: 16820.837828\\n\",\n      \"Epoch Step: 1700 Loss: 30.657820 Tokens per Sec: 16574.791159\\n\",\n      \"Epoch Step: 1800 Loss: 75.787910 Tokens per Sec: 16441.350335\\n\",\n      \"Epoch Step: 1900 Loss: 23.563347 Tokens per Sec: 16836.284727\\n\",\n      \"Epoch Step: 2000 Loss: 10.594786 Tokens per Sec: 16522.362683\\n\",\n      \"Epoch Step: 2100 Loss: 40.561062 Tokens per Sec: 16508.617285\\n\",\n      \"Epoch Step: 2200 Loss: 15.348518 Tokens per Sec: 16624.360367\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 11 years old , i was a <unk> of the <unk> <unk> joy .\\n\",\n      \"\\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father was on his little , <unk> , <unk> , the <unk> of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn 
die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he saw very happy , what was pretty much , because it was the <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 13.563748\\n\",\n      \"Epoch 4\\n\",\n      \"Epoch Step: 100 Loss: 9.601490 Tokens per Sec: 16309.901017\\n\",\n      \"Epoch Step: 200 Loss: 13.329712 Tokens per Sec: 16693.352689\\n\",\n      \"Epoch Step: 300 Loss: 61.213333 Tokens per Sec: 16774.275779\\n\",\n      \"Epoch Step: 400 Loss: 37.759483 Tokens per Sec: 16628.037095\\n\",\n      \"Epoch Step: 500 Loss: 35.616104 Tokens per Sec: 16677.874896\\n\",\n      \"Epoch Step: 600 Loss: 58.753849 Tokens per Sec: 16452.736708\\n\",\n      \"Epoch Step: 700 Loss: 11.741160 Tokens per Sec: 16615.759446\\n\",\n      \"Epoch Step: 800 Loss: 24.230316 Tokens per Sec: 16804.673563\\n\",\n      \"Epoch Step: 900 Loss: 27.786499 Tokens per Sec: 16373.396939\\n\",\n      \"Epoch Step: 1000 Loss: 65.063515 Tokens per Sec: 16520.381173\\n\",\n      \"Epoch Step: 1100 Loss: 34.756481 Tokens per Sec: 16492.656502\\n\",\n      \"Epoch Step: 1200 Loss: 43.993877 Tokens per Sec: 17075.912389\\n\",\n      \"Epoch Step: 1300 Loss: 36.514729 Tokens per Sec: 16812.641454\\n\",\n      \"Epoch Step: 1400 Loss: 58.995735 Tokens per Sec: 16535.979640\\n\",\n      \"Epoch Step: 1500 Loss: 29.516464 Tokens per Sec: 16500.141569\\n\",\n      \"Epoch Step: 1600 Loss: 10.143467 Tokens per Sec: 16613.933279\\n\",\n      \"Epoch Step: 1700 Loss: 53.287037 Tokens per Sec: 16756.922926\\n\",\n      \"Epoch Step: 1800 Loss: 24.687494 Tokens per Sec: 16477.783348\\n\",\n      \"Epoch Step: 1900 Loss: 21.578268 Tokens per Sec: 16808.344988\\n\",\n      \"Epoch Step: 2000 Loss: 60.965946 Tokens per Sec: 16651.623717\\n\",\n      \"Epoch Step: 2100 Loss: 18.895075 Tokens per Sec: 16636.292649\\n\",\n      \"Epoch Step: 2200 Loss: 53.253704 Tokens per 
Sec: 16642.799323\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 years old , i was a <unk> of the <unk> <unk> joy .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my dad listened on his little , <unk> radio <unk> the bbc of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he saw a happy very happy , which was pretty much , because he was the most famous <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 12.664111\\n\",\n      \"Epoch 5\\n\",\n      \"Epoch Step: 100 Loss: 21.919912 Tokens per Sec: 16266.471497\\n\",\n      \"Epoch Step: 200 Loss: 31.320656 Tokens per Sec: 16527.955427\\n\",\n      \"Epoch Step: 300 Loss: 40.778984 Tokens per Sec: 16517.710752\\n\",\n      \"Epoch Step: 400 Loss: 63.466324 Tokens per Sec: 16770.294841\\n\",\n      \"Epoch Step: 500 Loss: 49.329956 Tokens per Sec: 16694.936223\\n\",\n      \"Epoch Step: 600 Loss: 52.290169 Tokens per Sec: 16755.442966\\n\",\n      \"Epoch Step: 700 Loss: 51.911785 Tokens per Sec: 16768.565847\\n\",\n      \"Epoch Step: 800 Loss: 25.005857 Tokens per Sec: 16813.186507\\n\",\n      \"Epoch Step: 900 Loss: 50.679825 Tokens per Sec: 17109.031968\\n\",\n      \"Epoch Step: 1000 Loss: 13.069316 Tokens per Sec: 16692.984251\\n\",\n      \"Epoch Step: 1100 Loss: 12.595688 Tokens per 
Sec: 16546.293379\\n\",\n      \"Epoch Step: 1200 Loss: 46.846031 Tokens per Sec: 16491.379305\\n\",\n      \"Epoch Step: 1300 Loss: 30.238283 Tokens per Sec: 16558.196936\\n\",\n      \"Epoch Step: 1400 Loss: 23.865877 Tokens per Sec: 16556.353749\\n\",\n      \"Epoch Step: 1500 Loss: 42.451859 Tokens per Sec: 16784.645679\\n\",\n      \"Epoch Step: 1600 Loss: 37.048477 Tokens per Sec: 16651.129133\\n\",\n      \"Epoch Step: 1700 Loss: 17.043219 Tokens per Sec: 16655.630464\\n\",\n      \"Epoch Step: 1800 Loss: 17.227308 Tokens per Sec: 16688.568658\\n\",\n      \"Epoch Step: 1900 Loss: 23.672441 Tokens per Sec: 16609.439477\\n\",\n      \"Epoch Step: 2000 Loss: 19.385946 Tokens per Sec: 16586.442474\\n\",\n      \"Epoch Step: 2100 Loss: 25.717686 Tokens per Sec: 16879.694187\\n\",\n      \"Epoch Step: 2200 Loss: 22.427767 Tokens per Sec: 16844.504307\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 years old , i was <unk> by the morning of joy .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father listened on his little , gray radio waves the bbc of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he saw a very happy ending , which was pretty unusual , since then they were <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 
12.246438\\n\",\n      \"Epoch 6\\n\",\n      \"Epoch Step: 100 Loss: 19.048712 Tokens per Sec: 19024.102757\\n\",\n      \"Epoch Step: 200 Loss: 31.636736 Tokens per Sec: 19387.779254\\n\",\n      \"Epoch Step: 300 Loss: 15.952754 Tokens per Sec: 19559.196457\\n\",\n      \"Epoch Step: 400 Loss: 24.849632 Tokens per Sec: 18968.450791\\n\",\n      \"Epoch Step: 500 Loss: 47.227837 Tokens per Sec: 19009.957585\\n\",\n      \"Epoch Step: 600 Loss: 8.887992 Tokens per Sec: 19024.581918\\n\",\n      \"Epoch Step: 700 Loss: 58.158920 Tokens per Sec: 16834.343585\\n\",\n      \"Epoch Step: 800 Loss: 32.257362 Tokens per Sec: 16725.454783\\n\",\n      \"Epoch Step: 900 Loss: 5.977044 Tokens per Sec: 16398.470679\\n\",\n      \"Epoch Step: 1000 Loss: 51.871101 Tokens per Sec: 16302.492231\\n\",\n      \"Epoch Step: 1100 Loss: 44.715164 Tokens per Sec: 16505.477988\\n\",\n      \"Epoch Step: 1200 Loss: 4.128096 Tokens per Sec: 19255.909773\\n\",\n      \"Epoch Step: 1300 Loss: 53.065189 Tokens per Sec: 19016.853318\\n\",\n      \"Epoch Step: 1400 Loss: 23.775473 Tokens per Sec: 18877.681861\\n\",\n      \"Epoch Step: 1500 Loss: 15.587101 Tokens per Sec: 18916.694718\\n\",\n      \"Epoch Step: 1600 Loss: 59.449795 Tokens per Sec: 19166.565245\\n\",\n      \"Epoch Step: 1700 Loss: 48.393402 Tokens per Sec: 18836.264938\\n\",\n      \"Epoch Step: 1800 Loss: 45.651253 Tokens per Sec: 18823.983316\\n\",\n      \"Epoch Step: 1900 Loss: 51.898994 Tokens per Sec: 19015.027947\\n\",\n      \"Epoch Step: 2000 Loss: 16.392334 Tokens per Sec: 19180.065119\\n\",\n      \"Epoch Step: 2100 Loss: 20.312500 Tokens per Sec: 19059.061076\\n\",\n      \"Epoch Step: 2200 Loss: 41.126842 Tokens per Sec: 19110.648056\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when 
i was 11 , i was a <unk> of the <unk> <unk> joy .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father listened to his little , <unk> radio shack the <unk> of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he looked very happy , which was pretty unusual , and then they had the news <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 12.045694\\n\",\n      \"Epoch 7\\n\",\n      \"Epoch Step: 100 Loss: 22.484320 Tokens per Sec: 19136.387726\\n\",\n      \"Epoch Step: 200 Loss: 54.793003 Tokens per Sec: 19562.003455\\n\",\n      \"Epoch Step: 300 Loss: 52.516510 Tokens per Sec: 19494.585192\\n\",\n      \"Epoch Step: 400 Loss: 25.631699 Tokens per Sec: 19127.415568\\n\",\n      \"Epoch Step: 500 Loss: 15.818419 Tokens per Sec: 18909.082434\\n\",\n      \"Epoch Step: 600 Loss: 40.660767 Tokens per Sec: 19063.824782\\n\",\n      \"Epoch Step: 700 Loss: 21.253407 Tokens per Sec: 19011.780769\\n\",\n      \"Epoch Step: 800 Loss: 9.494976 Tokens per Sec: 19032.447976\\n\",\n      \"Epoch Step: 900 Loss: 21.503059 Tokens per Sec: 19120.646494\\n\",\n      \"Epoch Step: 1000 Loss: 34.198826 Tokens per Sec: 18751.274337\\n\",\n      \"Epoch Step: 1100 Loss: 21.471136 Tokens per Sec: 19119.629059\\n\",\n      \"Epoch Step: 1200 Loss: 45.433662 Tokens per Sec: 19158.978952\\n\",\n      \"Epoch Step: 1300 Loss: 48.697639 Tokens per Sec: 18852.568454\\n\",\n      \"Epoch Step: 1400 Loss: 48.406239 Tokens per Sec: 19090.121092\\n\",\n      \"Epoch Step: 1500 Loss: 10.506186 Tokens per Sec: 
18996.606224\\n\"\n     ]\n    },\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Epoch Step: 1600 Loss: 22.061657 Tokens per Sec: 18889.519602\\n\",\n      \"Epoch Step: 1700 Loss: 11.148299 Tokens per Sec: 19179.133196\\n\",\n      \"Epoch Step: 1800 Loss: 16.580446 Tokens per Sec: 19184.709044\\n\",\n      \"Epoch Step: 1900 Loss: 20.219671 Tokens per Sec: 18889.205997\\n\",\n      \"Epoch Step: 2000 Loss: 21.245464 Tokens per Sec: 18869.151894\\n\",\n      \"Epoch Step: 2100 Loss: 29.567142 Tokens per Sec: 18825.496347\\n\",\n      \"Epoch Step: 2200 Loss: 22.790722 Tokens per Sec: 18923.950021\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 years old , i was <unk> a <unk> of the <unk> <unk> joy .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father listened to his little , <unk> radio <unk> the <unk> of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he looked very happy , which was pretty unusual , because he was going to put him in the <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 11.837098\\n\",\n      \"Epoch 8\\n\",\n      \"Epoch Step: 100 Loss: 49.162842 Tokens per Sec: 19241.082862\\n\",\n      \"Epoch Step: 200 Loss: 35.163906 Tokens per Sec: 19633.028114\\n\",\n      
\"Epoch Step: 300 Loss: 10.108455 Tokens per Sec: 17179.927672\\n\",\n      \"Epoch Step: 400 Loss: 12.883712 Tokens per Sec: 16510.876579\\n\",\n      \"Epoch Step: 500 Loss: 32.006828 Tokens per Sec: 16459.413702\\n\",\n      \"Epoch Step: 600 Loss: 21.056961 Tokens per Sec: 16640.683528\\n\",\n      \"Epoch Step: 700 Loss: 5.884560 Tokens per Sec: 16567.539919\\n\",\n      \"Epoch Step: 800 Loss: 17.562445 Tokens per Sec: 16529.548052\\n\",\n      \"Epoch Step: 900 Loss: 25.654568 Tokens per Sec: 16629.045928\\n\",\n      \"Epoch Step: 1000 Loss: 30.116678 Tokens per Sec: 16519.515326\\n\",\n      \"Epoch Step: 1100 Loss: 49.594883 Tokens per Sec: 16766.220937\\n\",\n      \"Epoch Step: 1200 Loss: 35.545147 Tokens per Sec: 16729.972737\\n\",\n      \"Epoch Step: 1300 Loss: 12.314122 Tokens per Sec: 16479.824355\\n\",\n      \"Epoch Step: 1400 Loss: 5.982590 Tokens per Sec: 16592.352361\\n\",\n      \"Epoch Step: 1500 Loss: 23.507740 Tokens per Sec: 16396.264595\\n\",\n      \"Epoch Step: 1600 Loss: 36.874157 Tokens per Sec: 16554.722618\\n\",\n      \"Epoch Step: 1700 Loss: 13.514697 Tokens per Sec: 16605.822594\\n\",\n      \"Epoch Step: 1800 Loss: 6.016938 Tokens per Sec: 16390.681327\\n\",\n      \"Epoch Step: 1900 Loss: 44.648132 Tokens per Sec: 16575.965569\\n\",\n      \"Epoch Step: 2000 Loss: 21.025373 Tokens per Sec: 16363.246501\\n\",\n      \"Epoch Step: 2100 Loss: 32.213993 Tokens per Sec: 16395.313089\\n\",\n      \"Epoch Step: 2200 Loss: 29.033810 Tokens per Sec: 16528.855537\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 years old , i was <unk> a <unk> of the <unk> <unk> joy .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> 
der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father listened to his little , gray radio shack , the radio of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he looked very happy , which was pretty unusual , because he was the news of the most famous .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 11.868392\\n\",\n      \"Epoch 9\\n\",\n      \"Epoch Step: 100 Loss: 33.819195 Tokens per Sec: 16155.433696\\n\",\n      \"Epoch Step: 200 Loss: 26.771244 Tokens per Sec: 16447.243194\\n\",\n      \"Epoch Step: 300 Loss: 22.235714 Tokens per Sec: 16557.847083\\n\",\n      \"Epoch Step: 400 Loss: 16.233931 Tokens per Sec: 16802.777289\\n\",\n      \"Epoch Step: 500 Loss: 34.811615 Tokens per Sec: 16637.208199\\n\",\n      \"Epoch Step: 600 Loss: 11.960271 Tokens per Sec: 16478.541533\\n\",\n      \"Epoch Step: 700 Loss: 32.807648 Tokens per Sec: 16526.645827\\n\",\n      \"Epoch Step: 800 Loss: 25.779436 Tokens per Sec: 16572.304586\\n\",\n      \"Epoch Step: 900 Loss: 18.101871 Tokens per Sec: 16472.573763\\n\",\n      \"Epoch Step: 1000 Loss: 34.465992 Tokens per Sec: 16489.131609\\n\",\n      \"Epoch Step: 1100 Loss: 47.311241 Tokens per Sec: 16501.563937\\n\",\n      \"Epoch Step: 1200 Loss: 22.709623 Tokens per Sec: 16416.828638\\n\",\n      \"Epoch Step: 1300 Loss: 45.883862 Tokens per Sec: 16338.132985\\n\",\n      \"Epoch Step: 1400 Loss: 21.321081 Tokens per Sec: 16680.505744\\n\",\n      \"Epoch Step: 1500 Loss: 11.126824 Tokens per Sec: 16636.646687\\n\",\n      \"Epoch Step: 1600 Loss: 32.759712 Tokens per Sec: 16440.968759\\n\",\n      \"Epoch Step: 1700 Loss: 19.354910 Tokens per Sec: 
16476.318234\\n\",\n      \"Epoch Step: 1800 Loss: 14.631118 Tokens per Sec: 16490.663260\\n\",\n      \"Epoch Step: 1900 Loss: 2.233373 Tokens per Sec: 16390.177497\\n\",\n      \"Epoch Step: 2000 Loss: 42.503407 Tokens per Sec: 16498.365808\\n\",\n      \"Epoch Step: 2100 Loss: 35.935966 Tokens per Sec: 16257.764127\\n\",\n      \"Epoch Step: 2200 Loss: 37.685387 Tokens per Sec: 16498.916279\\n\",\n      \"\\n\",\n      \"Example #1\\n\",\n      \"Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\\n\",\n      \"Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\",\n      \"Pred:  when i was 11 , i was a <unk> of <unk> <unk> joy .\\n\",\n      \"\\n\",\n      \"Example #2\\n\",\n      \"Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\\n\",\n      \"Trg :  my father was listening to bbc news on his small , gray radio .\\n\",\n      \"Pred:  my father listened to his little , gray radio shack the bbc of the bbc .\\n\",\n      \"\\n\",\n      \"Example #3\\n\",\n      \"Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\\n\",\n      \"Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\\n\",\n      \"Pred:  he looked very happy , which was pretty unusual since then , they were <unk> the <unk> .\\n\",\n      \"\\n\",\n      \"Validation perplexity: 11.886973\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"model = make_model(len(SRC.vocab), len(TRG.vocab),\\n\",\n    \"                   emb_size=256, hidden_size=256,\\n\",\n    \"                   num_layers=1, dropout=0.2)\\n\",\n    \"dev_perplexities = train(model, print_every=100)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 23,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"image/png\": 
\"iVBORw0KGgoAAAANSUhEUgAAAY8AAAElCAYAAAAcHW5vAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3Xl8XHW9//HXJ1uTpkvapCmlC23pgoAtS1lboEUERQU38IKgoqKALCqoV+8VERVFEEEUQUQRVJYrcO8PZce20IWlLdCytbSldG+atkmbptk/vz/OmXY6TUlOOslJZt7Px+M8MnPmzJlPJu2853y/3/M95u6IiIhEkRN3ASIi0vMoPEREJDKFh4iIRKbwEBGRyBQeIiISmcJDREQiU3hIZGZ2jZl50rLJzGaZ2cdiqsfN7L87ad/XmFlT0v2ScN2Ezni9rtbK3zJ5+WuMdc0ws2fien1pW17cBUiP1QxMCW8PBq4E/mlmp7r70/GVlXZ/BB5Pul8C/AhYCiyMpaL0S/5bJtvY1YVIz6HwkA5z9xcSt81sOrASuBzYp/Aws17uXr+P5aWFu68GVsddx75oz/uZ/LcUaQ81W0lauPtWYAkwOrHOzPqa2a/NbJWZ1ZvZYjO7MPl5iWYhM5toZjPNrBb4ZfiYm9nVZnadmW0ws+1m9pCZDW6rHjM70sweN7Pq8HlPmtkhSY8fFtZ0dcrzHjazNWZWmlxfeHsk8G646b1JzTtTzex/zeylVuo4OdzmuPepdYaZPWNm55rZEjOrM7OXzOzoVrb9nJnNM7MdZlZpZneZ2YCkx0eGr3ehmd1qZhXAhrber7aY2Qoz+6OZfdPMVoav/6yZjU3ZrpeZ/SL8mzeY2TtmdqWZWcp2Q8L9rQv/DsvM7LpWXvdjZrbQzGrNbL6ZnbCvv4ukibtr0RJpAa4BmlLW5QHrgKfC+/nAHIIPrkuAUwhCoRm4KGVfzQTNQN8GpgHHhI85wbf+p4GPA18BKoE5Ka/twH8n3Z8E7Aif9yngDGAWsAnYP2m77wCNSa/3VaAF+HBrvyvQK9yfEzRdHRsu/YCPhusnpNR2H7CojfdzRvjeLQXOCV9jPlANDEra7hthfbcBpwFfBNYAs4GccJuRYR1rgb+FdX2yrb9l+PdLXSxpuxXh32JBWN85BEH6LtArabsHgAbgB8CpwK/Cen6WtE1puL914b+Nk4EvAX9MeU/WAq8B/wGcDrwMVAElcf8f0OIKDy3Rl1Y+cPYHbg8/JC4Mt/lC+EF3VMpz7ww/NHKS9uXAl1t5ncSHYPKH0xnh+lNTtksOj38DrwJ5Sev6EQTP9UnrcoDpwDvA4UANcFNrv2vS/cSH83kp2+WEH6S3Jq0rBeqAy9t4P2eQEjxAOUEAXhfe70MQJremPHdy+NyPptQ3M8Lf0veyJIf8irCe8qR1E8Ltvhbe/2B4/zspr3FH+NyS8P5PCUL7oDbekzpgRNK6I8L9nx33/wEtrmYr6bBcgg+ARoJvv+cB17j7neHjpxE0Y71iZnmJBXgS2A8Yk7K//7eX13nUd2+vfxSoJ/jGvwczKwJOJPgGTNLr1gJzgZ3NHu7eQhByZeFjy4Hvt/2r7ync153AeWENhPt24N527GKJu+/sgHf3CuB5dv2exxEE4H0p7+eLwLbk3yv0aITym4GjWlkeStnuubCuRI0LCYI3UeOJ4c+/pzzvPqAQSDTDnQLMcve326jrDXdfmXw//DmijedJF1CHuXRUM8GHhgNbgJXu3pT0eDkwniBcWlOadLvF3Sv3sl1F8h13dzPbCAzZy/YDCYLtunBJtSRlf6vMbCZwJnCH71tH/V0E3+Q/SxAYXwUecvct7XhuRSvrNhA0wUHwfkLQRNWa0pT7kfo53H1eOzbbW42Jv8WApHXJ1qc8XkoQem3Z7X1z9/qw66SwHc+VTqbwkA5r4wNnM7CY4IikNcnfOt/vugDlyXfCjtdBBE1fra
kiaC67ifDoI0Vdyv7OIgiOBcCPzexhd9/bvt+Xu28ws/8FLjSz5cDBBG367VHeyrrB7Po9N4c/zyHoG0mVOqy2M661sLca3wlvb0latyZpm/3Cn4nfoRIYmvbqpEup2Uo6y5PAAcAmd5/XylLTzv18wsx6Jd8n6LhudWipu28n+Hb+wb287uuJbc1sKEF7/J8JOuprgD+njgxKkTgy2du33zsImpB+QdAUNbPtXxGAccknHppZebifxO85O6xv1F5+r/fa+Tr74sSwrkSNE4CxSTU+F/78j5TnfY4gtF8O7z8DTDGzcZ1Yq3QyHXlIZ7kX+DIw3cxuJGivLgYOAo5198+2cz8tBCcf3kLwjfZ6YK67P/U+z/k2MNPM/gn8haC5ZTBB5/I77v7bMCD+QvBt+XJ3rzGz8wk6ai8DfrOXfW8g+AZ9jpktJugIXuzu28LH/03wTXwK8N12/o4QNO08bGY/DPf5Q4JRS7+GYCi0mX0PuNnMhgBPEfTjjCAY1fRbd58T4fV2Y2at9SFtc/c3ku5vAp4ws58QBPh1BB3pd4c1LjKzB4HrzKwAmBfW9nWCjv+qcD+/Bs4HZpjZtQRHqMOAE9z9ax39HaRrKTykU7h7o5mdCvwX8E2CD7kqgg+K+yPs6g8E3/L/RDDi6AnaaApy93nhh+E1BKPAigk+nF8gGL4K8C1gKsEHVk34vOfN7HrgejN7xt3fbGXfLWb2JYIPzqfC2qYRhE6iT+aRcP9/ifB7vkHQZ3ItwXv1GsGIsp3NUe5+m5mtJhhi/OVw9SqCb/Lv0nG5BAMGUr3I7gMTngBeJwjWQQRHQxe7e0PSNl8If4dLCAL7vbDem5J+j81mdjzBe3gtwUCA1UT7dyExM3ddhla6JzNz4Ifu/tO4a4nCzBYBb7v7We3cfgbBcOBTOrWwfWBmK4Bn3P2rcdci3YOOPETSIOyXOYLgZMZDATW/SEZTeIikxxCCM+o3A99199aagUQyhpqtREQkMg3VFRGRyBQeIiISWcb2eZSVlfnIkSPjLkNEpEeZP39+pbsPamu7jA2PkSNHMm9ee6brERGRBDNr12wFarYSEZHIFB4iIhKZwkNERCJTeIiISGQKDxERiUzhISIikWXsUN2OemNtNU+9sYGhA4o4e9LwuMsREemWdOSR4r6XVnLLs+/w9xdXxl2KiEi3pfBIMXVccJXN11ZXsammvo2tRUSyk8IjxfFjSinIzcEdnn+nMu5yRES6JYVHit4FeRwzeiAA0xdXxFyNiEj3pPBoxdTxQdPVc0s20tyi652IiKRSeLRi6vhgQskttY0sXF0VczUiIt2PwqMVo8uKGT6wCIDpizfGXI2ISPej8GiFmTEtbLqaqX4PEZE9KDz2ItF09drqaio1ZFdEZDcKj704bnQZBXnB2/PcEjVdiYgkU3jsRVFBLseOLgXU7yEikkrh8T6mhU1Xz7+jIbsiIskUHu8jcb5HVW0jr67SkF0RkQSFx/sYVVbMAaW9AZihUVciIjspPNqQGLI7Q/0eIiI7dWl4mNnPzOxdM9tqZhVm9g8zG5H0+BfMbJmZ1ZrZi2Z2ZFfW15qTwn6PRWuqqdhWF3M1IiLdQ1cfedwLHObu/YCRwErgfgAzmwL8HrgYGAA8BDxmZv26uMbdHDe6lF47h+xqll0REeji8HD3t929OrxrQAswPrx/IfCwuz/l7vXADUA98KmurDFVYX4uxx0YDNlVv4eISKDL+zzM7FwzqwZqgCuAa8KHJgLzE9u5uwOvhOvbu+9SMxtnZuOamprSVvPUcUHT1XNLNtLU3JK2/YqI9FRdHh7u/nd37w8MIQiOReFDfYHqlM2rgCjNVpcBi4HFFRXpO0pIDNndWtekIbsiIsQ42srd1wN3Av80s4HANqB/ymYlwNYIu72VoBlsfHl5eVrqBBhZVsyosmJAF4gSEYH4h+rmAcXA/sBrwBGJB8zMgMPC9e3i7pvcfYm7L8nLy0troYmJEjVkV0
SkC8PDzHLM7FIzKw/vDwN+B6wA3iY4Cvm0mX3IzAqAK4FC4JGuqvH9JJqu3li7lYqtGrIrItmtq488TgdeN7PtwItALXCKuze5+yzgEoIQqQbOBk539yjNVp3mmFEDKcwP3q4ZmmVXRLJcl4WHu7e4++nuXu7uxe4+1N0/7+7Lkra5x91Hu3uRux/t7vPfb59dqTA/l+MPLANgppquRCTLxd3n0aMk+j2ee0dDdkUkuyk8Ipg6Luj32FbXxIKVGrIrItlL4RHBiNLejB6kIbsiIgqPiDTLroiIwiOyRL/HW+u2sr5aQ3ZFJDspPCI6etRAivJzAZi5RE1XIpKdFB4R9crLZfKYxCy7aroSkeyk8OiAk8J+j1nvVNKoIbsikoUUHh2QmKJ9W30T89/bEnM1IiJdT+HRAcMH9mZMeR9ATVcikp0UHh2UOPrQ1QVFJBspPDpo2kFBv8fb67exrnpHzNWIiHQthUcHTRo5gN4FwZBdNV2JSLZReHRQMGQ3mGVXTVcikm0UHvsgcbb57KWbaGjSkF0RyR4Kj32QuLpgTX0T897bHHM1IiJdR+GxD4aWFDFucDBkVxeIEpFsovDYR4mjD03RLiLZROGxjxL9Hks21LCmSkN2RSQ7KDz20aQDBlK8c8iujj5EJDsoPPZRQV4OU8Ymhuyq30NEsoPCIw0S/R5zllZS39QcczUiIp1P4ZEGiX6P7Q3NzFuhWXZFJPMpPNJgSP8iDtqvL6B+DxHJDgqPNDkpPPqYrn4PEckCCo80mRb2eyytqGH1ltqYqxER6VwKjzQ58oAB9O2VB2jUlYhkPoVHmuTn5miWXRHJGgqPNJp20K5ZdjVkV0QymcIjjU4aF/R77Ghs5qV3NcuuiGQuhUca7de/kA8M6Qeo30NEMpvCI82m7hyyq34PEclcCo80SwzZXb5xO6s2a8iuiGQmhUeaHTGihL6FiSG7OvoQkcyk8EizvNwcTghn2dXZ5iKSqRQenWDnLLvLKqlr1JBdEck8Co9OMHVc0Gle19jCixqyKyIZSOHRCcr7FXLI/okhu+r3EJHMo/DoJIkhuzPV7yEiGUjh0UkS/R7LK7fz3qbtMVcjIpJeXRYeZna9mb1hZlvNbK2Z3WlmA5Me/5KZtZhZTdJyX1fVl26HDy+hX6Fm2RWRzNSVRx7NwHlAKTARGAbcnbLNcnfvk7Sc04X1pVVebg4njNPZ5iKSmbosPNz9B+7+irs3uvtG4BZgajpfw8xKzWycmY1rampK5647JHG2+dxlmzRkV0QySqTwMLOXzexCM+uThtf+EPBayrrhZrbezFaZ2f1mNiriPi8DFgOLKyri/7Z/UnjkUd/UwgvLN8VcjYhI+kQ98pgOXAusM7O7zOyYjryomX0GuAi4Imn1c8AHgf2Bo4A64GkzK46w61uB8cD48vLyjpSWVoP69uKDQ/sD6vcQkcwSKTzc/bvAcOCLwH7AbDNbZGaXm9mA9uzDzM4C7gTOcPcFSfte7u5L3L3F3dcDFxIEybER6tsU7mNJXl5ehN+s8ySG7Op8DxHJJJH7PNy9yd0fdvePAQcADwPXA2vM7G9mdtTenmtmFwB3AJ9w9+ltvVS4WNQau5NEeKzYVMu7lRqyKyKZocMd5mZ2IHAp8DVgB/BHoJDgaOTqVra/HLgROM3dZ7fy+MfMbJgFBgK/AyqBFzpaY3dw2PABlPTOB3T0ISKZI2qHeS8z+7yZTSfomD4B+B6wv7tf7u6fAT4JXNnK028B+gHTk8/lSHp8KvASUAO8QTCk98PuXrPHnnqQ3BzjhLGJIbvq9xCRzBC1Y2A9QVPSX4FL3f2NVraZA+wxG6C7v2/zk7t/B/hOxHp6hGnjB/Hoa2t5YfkmdjQ0U1SQG3dJIiL7JGqz1beAoeFRRmvBgbtXuXvUIbYZ7cRwyG6DhuyKSIaIGh4n0srRipkVm9mf0lNS5inr04sJw4IhuzrbXEQyQdTw+CJQ1Mr6IuAL+15O5kpMlD
hj8UbcPeZqRET2TdTwMII+j10rzAyYAqg3+H0khuyu3FzLcg3ZFZEerl3hEc5220wQHOvNrDmxAE3AQwSd6LIXE4eVMGDnkF3lrIj0bO0dbXUOwVHH3wmmFalOeqwBeNfdX01zbRklN8c4cdwg/u/VtcxYXMFXpmhMgYj0XO0KD3d/AMDM1gGz3T3+KWt7oKnjg/B4cflmahua6F3QPaZQERGJqs1mKzNLnmHwLWCgmZW3tnRemZnhxLGDMIOG5hbmLtOQXRHpudrT57EuKRjWA+taWRLr5X2U9unFhGElgIbsikjP1p52k5PZdcb4yaSMtpJopo0fxGurqnYO2Q0Gq4mI9Cxthoe7z0y6PaNTq8kCU8eXc/Mz77B6yw6WbdzOmPJ0XFdLRKRrRZ0Y8aq9rC80szvSU1JmmzC0P6XFBYBm2RWRnivqSYLfN7PHzWxQYoWZTQAWkObrkWeqnHDILuh8DxHpuaKGx+FAH2ChmZ0WXqPjRYKp1I9Id3GZKnG2+UvvbmZ7vUY9i0jPE/UytCuBk4D/AR4juLjTV9z9S+6uOTfaKXnI7hwN2RWRHqgjVxKcBpxF0FS1Hfi8mZWltaoMN6C4gMOGa8iuiPRcUTvMfwE8DtwJHAscBpQAi8zs1PSXl7mmhbPsztQsuyLSA0U98vg8waVhr3b3Znd/j+BStHcCj6a9ugyW6PdYU7WDpRU9+kq7IpKFoobHxOTzPgDcvcXdrwZOSV9Zme/Q/ftT1icYsqumKxHpaaJ2mG8GMLNSMzvGzHolPfZ8uovLZBqyKyI9WdQ+jz5mdh/BhZ/mAEPD9XeY2Y86ob6Mlri64MsrNlOjIbsi0oNEbbb6OTAaOAbYkbT+n8Cn0lVUtjhxbBk5Bo3NzuyllXGXIyLSblHD4wzgCnd/md0nSHyLIFQkgpLeBRw+YgCgpisR6VmihscgYEMr64sIrjQoEU0bn+j3qNCQXRHpMaKGx0JaH1X1eeDlfS8n+yT6PdZV17Fkg4bsikjPEPU6qNcA/zCzYUAucI6ZfYDgjPMPp7m2rHDwkH6U9elFZU090xdXMH6/vnGXJCLSpqhDdZ8APkkwv1UL8F/AAcBH3f259JeX+XJybOcJg5qiXUR6ishzW7n7M+4+1d37uHtvdz/B3f/dGcVli0R4zFuxhW11jTFXIyLSto5MjChpdsKYQeTmGE0tGrIrIj1Dm+FhZjvMrLY9S1cUnIn6987niBHBLLsasisiPUF7OswvZvdzOqQTTB1fzssrtjAjnGXXTCOfRaT7ajM83P3uLqgj600dP4gbnlzM+q11vL1+Gx8Y0i/ukkRE9qpDfR5mdryZfTVcjk93Udno4CH9KO8bzDOpWXZFpLuLOjHicDObC8wCfhkus8zsBTMb3hkFZguz5CG76vcQke4t6pHHnQRNXYe4+0B3HwgcQjA1yZ3pLi7bJM42n//eFrZqyK6IdGNRw+Mk4GJ3fyuxIrx9KXBiOgvLRlPGlpGbYzS3OLPe0ZBdEem+oobHOqC1C080A2qo30f9CvM58oDELLt6O0Wk+4oaHlcDN5vZiMSK8PaNwA/TWVi2Su730Cy7ItJdRQ2P/wImAcvNbLWZrQaWA0cD3zezNxNLugvNFtPCfo+KbfW8uW5rzNWIiLQu6qy693f0hczseuDjwHCgBvgX8L3EddHDbb4A/AgYAiwCLnH3+R19zZ7ooP36sl+/QtZvrWPG4o0csn//uEsSEdlDu8PDzHKB6cBCd6/qwGs1A+cBrwMlwD3A3QRXJ8TMpgC/J7ic7UzgCuAxMxvr7lnzFTwxZPf+l1cxY3EF35g2Ju6SRET20O5mK3dvBp4GBnTkhdz9B+7+irs3uvtG4BZgatImFwIPu/tT7l4P3ADUk4XXRk/0eyxYWUV1rYbsikj3E7XP4y1gWJpe+0PAa0n3JwI7m6g86C1+JVzfLmZWambjzGxcU1Nrg8J6hsljysgLh+w+v1QnDIpI9xM1PK4CbjCzo8
NmrA4xs88AFxE0TSX0BapTNq0CokzydBmwGFhcUdFzh7r2Lcxn0sjgAO9305exo6E55opERHYXNTweJRhtNReo68iU7GZ2FsHZ6Ge4+4Kkh7YBqb3DJUCU/o5bgfHA+PLy8ghP634uO3ksZvDWuq381yOLNGxXRLqVqKOtLtqXFzOzC4BfAZ9w99kpD78GHJG0rQGHAQ+3d//uvgnYBDBp0qR9KTV2k8eUcdWp47nhycU8/MoaJg4v4YvHj4y7LBERIGJ4uPtfOvpCZnY5wTDc09z95VY2uRN4wsz+AjwPXA4UAo909DV7uotPOpDXVlXx1Jsb+Mk/3+SQ/fsxaeTAuMsSEYk+JbuZlZvZlWb2ezMrC9dNNrNRbTz1FoL+i+lmVpNYEg+6+yzgEoIQqQbOBk7PpmG6qXJyjF+dPZHRZcU0tTiX/G0BFVvr4i5LRCTylOyHA28DFwBfYVdn9oeBn77fc93d3D3f3fskLynb3OPuo929yN2PzrYTBFvTtzCfO84/kt4FuVRsq+cbf19AY3NL3GWJSJaLeuTxK+AP7n4owTkYCU8Ck9NWlexm7OC+3PDZYMTyyyu28LN/vdXGM0REOlfU8DgC+GMr69cCg/e9HNmbj00YwtdOHA3A3XNW8Mgrq2OuSESyWdTwaAKKW1l/ILC5lfWSRt89bTzHjS4F4PsPL+LNtVnbHSQiMYsaHk8A3wmH0QK4mQ0AriU4B0Q6UV5uDreeezhD+hdS19jCRX+dr+lLRCQWHTnD/EhgGcEw2oeAdwlO5vtBekuT1pT16cXvzzuSgtwcVm6u5ZsPvEJLi04gFJGuFTU8tgBHERxp3AG8AFwJTEqeWl0612HDS/jxmYcAMH3xRm559p2YKxKRbNOukwTNbCDwF+AjBIHzAvB5d1/ReaXJ+znn6BG8urKKB+at4pZn32HCsP586AMasyAiXaO9Rx4/A44hOEP8OwQjq27vrKKkfX585iFMGBZMB/bNB15lReX2mCsSkWzR3vD4KPAVd7/O3W8iuIDTKWYWdW4sSaPC/Fx+f96RDCwuYFtdExf9dT61DT13KnoR6TnaGx5D2f1aG28CDcD+nVGUtN/QkiJuPedwcgzeXr+N7z+sGXhFpPO1NzxygdQxoc3heonZ5DFlfPcjBwHwf6+u5c+zV8RbkIhkvCjNTveZWUPS/ULgz8nX8XD309NWmUTy9RNH89qqKh5/fT3XPfYWhw7tz9GjNAOviHSO9h55/AVYBWxIWv5KcI5H8jqJiZlxw1kTOXDQrhl4N2gGXhHpJJap7eOTJk3yefPmxV1Gl1taUcMnfzebmvomjhhRwv1fO46CvMgz74tIljKz+e7e5tX09KmSYcaU9+HGsyYAsGBlFT/915sxVyQimUjhkYE+cugQLjrpQADumfseD83XDLwikl4Kjwx11anjmDKmDIAfPLKI19dUx1yRiGQShUeGysvN4TfnHM7QkiLqm1q4+G/zqaptaPuJIiLtoPDIYAOLC/j9eUdQkJfDqs07uPz+V2nWDLwikgYKjww3YVgJPz3zUACeW7KRm59ZEnNFIpIJFB5Z4OyjhnPO0SMAuPXfS3n6TZ2SIyL7RuGRJa4542AmDi8B4NsPvMq7moFXRPaBwiNL9MrL5fbzjqC0uIBt9U18/d55bK/XDLwi0jEKjywypH8Rt54bzMC7ZEMN33tooWbgFZEOUXhkmeMPLOP7H/0AAP9cuI67Zr0bc0Ui0hMpPLLQV08YxccmDAHg54+/zdxlm2KuSER6GoVHFjIzfvmZCYwt70Nzi3PZfQtYV70j7rJEpAdReGSp4l553H7+kfTtlUdlTQMX/3UB9U3NcZclIj2EwiOLHTioD786eyIAr66q4tpHNQOviLSPwiPLnXrIflw6bQwAf3txJQ/OWxVzRSLSEyg8hG99eBwnjhsEwH//7+ssWq0ZeEXk/Sk8hNwc45bPHcawAUU0NLVw0V/ns2W7ZuAVkb1TeAgAA4oLuP
28I+mVl8Oaqh1cfv8rmoFXRPZK4SE7HTq0Pz/71AcBeP6dSn711OKYKxKR7krhIbv57JHDOO/YYAbe22Ys48k31sdckYh0RwoP2cPVHz+Ew0cEM/Be+eBrLNtYE3NFItLdKDxkDwV5Ofz+80dS1qeAmvomvn7vfCq21cVdloh0IwoPadV+/Qv57blHkJtjLK2o4eQbZ/KH55bR0NQSd2ki0g0oPGSvjh1dyk1nT6RvYR419U1c99jbfOTm55j+dkXcpYlIzLo0PMzsP8zseTPbamZNKY9NNTM3s5qkZU5X1id7OvOwocy4airnHD0cM1heuZ0L7n6ZC/78EsvVFyKStbr6yGMLcBvwzb083uzufZKW47uwNtmL0j69+PmnJ/DopVM4auQAAKYv3shpNz/HdY+9xba6xpgrFJGu1qXh4e5Puvt9wPKufF1Jj0OH9ufBrx/Hb845nCH9C2lsdv7w3HKm3TiTB+etokUnFYpkje7W55FrZqvMbL2Z/cvMJkZ5spmVmtk4MxvX1KTrc3cGM+OMifvz7JUncfnJYyjIy6Gypp7v/mMhn7ptNgtWbom7RBHpAt0pPN4GDgNGAQcBC4F/m9n+EfZxGbAYWFxRoU7dztS7II9vnzqeZ799Eh89dD8AXltdzadvm8O3H3iVDVs1tFckk5l71zc1mNlU4Bl3z2tju3eAX7j7Xe3cbylQCjBx4sTFr7766r6WKu00Z2klP370TRZv2AZA74JcLj15DF+ZMopeebkxVyci7WVm8919Ulvbdacjj9a0ANbejd19k7svcfcleXnvm0uSZsePKeNfl0/h2jMPoX9RPrUNzfzyicWc+uvnePrNDcTxJUVEOk9XD9XNNbNCoCC8XxguZmYnm9kYM8sxsz5mdg0wGHiyK2uUjsvLzeELx41kxlVTOf/YA8gxeG9TLRfeM48v/OklllZsi7tEEUmTrj7yOB/YQRAIueHtHcABwETgWWAbwWisY4EPu7subdfDDCgu4CefPJR/XX4Cx44eCASz9J528/P8+NE3qN6hob0iPV0sfR5dYdKkST5v3ry4y8h67s7jr6/nZ/96izVVOwAYWFzAVaeO53NHDSc3p92tkiLSBTKlz0P5PCIUAAAMbElEQVR6ODPj9A8O4dkrT+Jbp4yjMD+Hzdsb+MEjizjjt7N4ecXmuEsUkQ5QeEiXKMzP5YpTxvLslVP5+IQhALyxditn3T6Xy+57hbXhUYmI9AwKD+lSQ0uK+O25R/Dg14/j4CH9AHj0tbV86Fcz+c2z71DX2BxzhSLSHgoPicXRowby6GVTuO5TH2RgcQE7Gpu56eklnHLTTB5ftE5De0W6OYWHxCY3xzj3mBFMv3IqF0weSW6OsXrLDi7+2wLOvfNF3l6/Ne4SRWQvFB4Su/698/nRJw7hiStO4ISxZQDMXb6J0295nqv/73WqahtirlBEUik8pNsYO7gv93z5aP5w/pGMGNibFod75r7H1BtncO/cFTQ16yqGIt2FzvOQbqmusZm7Zr3L76YvpbYh6EQfVVbMtPHlTB5TyjGjS+nTS1PQiKRbe8/zUHhIt7a+uo7rn3ibR15Zs9v6vBxj4vASJh9YyuQxZRw+YgAFeTqQFtlXCg+FR0ZZuLqKxxatZ/bSSl5fW03qP9ui/FyOGjWQKWNKOf7AMg4e0o8cnb0uEpnCQ+GRsapqG3hh+SZmLa1kztJNLK/cvsc2A3rnc/yBZRw/ppTJB5ZxQGlvzBQmIm1ReCg8ssbaqh3MXlrJnGWbmL20kopt9XtsM7SkiMljgiau4w4spbxvYQyVinR/Cg+FR1Zyd5ZW1DB7aSWzl23ihWWb2Fa/5yWJxw/uu/Oo5JjRA+lbmB9DtSLdj8JD4SFAU3MLi9ZU7zwqmffeFhqadh/ym5tjTBzWn8ljysLO9xJd/VCylsJD4SGtqGtsZt6KLcxeVsnspZUsWrNn53thfg5HjRzI5DFlTBmjznfJLgoPhYe0Q3VtI3
OXb2LOskpmLa1k+cY9O99Leudz3OjSnUcmI9X5LhlM4aHwkA5YV72DOUs3hX0mlWzYumfne7/CPIYO6M3QkiKGDQiWoSVFDA1/DiwuULhIj6XwUHjIPnJ3lm3cHgTJ0krmLt/Etro9O99TFeXn7gySoUnhEvzsTXnfXmoGk25L4aHwkDRrbnHeWFvNso01rN68gzVVwbJ6S/AztSN+bwpycxhSUhiES0kRwwb03hk2wwYUsV//QvJzdba8xKO94aHJgUTaKTfHmDCshAnDSvZ4rKXFqdxez5otu8JkzZZEuNSyZssOtodzdDU0t/Deplre21Tb6uvkGOzXrzDl6KX3bk1jhfkaDSbxUniIpEFOjlHet5DyvoUcPmLAHo+7O9U7Glm9R7jU7rxfVdsIQIvD2uo61lbX8TJbWn29sj4FDCwuoG9hPn0L85J+5tEv6XbfXrs/3q8onz698shVs5nsI4WHSBcwM0p6F1DSu4BDh/ZvdZua+ibWJh2prA4DJhEuG5POnK+saaCypuPXOenTK29XwEQNoMJ8+hQqgLKdwkOkm+jTK49xg/sybnDfVh+va2xmXXXdziOWqtpGttU1sa0u+Lk16fa2+sRjTTS37NmvWVPfRE19E+uqO15vcUEufQvz6Ve0K2z6Fe0KmOTbicf6FebTL7zdKy9Ho9J6MIWHSA9RmJ/LqLJiRpUVt/s57s6OxuadIbM1DJSdIVO3K2S2trIucbuplQDa3tDM9oZmOnq14Pxcaz1kwkDqGwZN33CbXbfDo6BeeZ02aq2lxWlxp8UJfwa3m1scb+22Oy0tjofbO+x8DIKf7uA4LS3BT09e57tvH+xn1zrHIbEuafvE66RuD84JYwdR3InXvFF4iGQwM6N3QR69C/IY3K9jk0G6O3WNLUnhkxo44ZHPjsad67bu2BVGiZ+pGpudTdsb2LS9Y81vZsHRWr/CfIp75eLhh3jiA7y5ZffbiQ/X5A/65qRgSA6MTDDjqqkKDxGJj5lRVJBLUUEu5f06to/mFqemPjz62bHrKCgInF2h1FrobN3RyNa6Rhqbd/9Ud2dniGWrHAv+PjkGhoGx83ZnZ6DCQ0Q6XW6O0b8on/5F+bDnYLQ2uTv1TS1hkOwZLLX1zZhBjhm5ObbzQ3W322bk5ATb7FqCkXKt3c412/nBnJuz++0cMyzpdk742skf5GbsrMnCdTlG+AFvGLt/8FsOe64L95F47s7HukFfkcJDRLo9M6MwP5fC/I4f/Uh66TRWERGJTOEhIiKRKTxERCQyhYeIiESm8BARkcgUHiIiEpnCQ0REIsvYi0GZ2UbgvQ4+PRcYDGwAmtNWVM+k92J3ej92p/djl0x5Lw5w90FtbZSx4bEvzGwcsBgY7+5L4q4nTnovdqf3Y3d6P3bJtvdCzVYiIhKZwkNERCJTeLRuE/Dj8Ge203uxO70fu9P7sUtWvRfq8xARkch05CEiIpEpPEREJDKFh4iIRKbwEBGRyBQeIiISmcJDREQiU3iIiEhkCg8REYlM4ZHEzHLN7AYz22hm28zsITMri7uuOJjZ9Wb2hpltNbO1ZnanmQ2Mu664mVmOmc0xMzezYXHXEyczO8XMXjCzGjOrNLPb4q4pLma2n5k9EH52bDGzf5vZxLjr6kwKj939J3AmcAyQ+GC4N75yYtUMnAeUAhMJ3o+74yyom/gWUBt3EXEzs6nAP4AbCf6NDAP+GGdNMbsNGAiMI5iWfR7wTzOzWKvqRJqeJImZvQdc6+53hfcPBJYCI929o9cGyQhm9hHgQXfvF3ctcQmn3H4c+AzwCjDc3VfHW1U8zGwuMNPd/zPuWroDM1sI/Nbd/xDeHw+8DQxy98pYi+skOvIImVkJMAKYn1jn7suArQTfvLPdh4DX4i4iLmaWA/wJuAqoirmcWJlZMXA0kGdmC8ImqxlmNinu2mJ0A/AZMxtkZoXA14BZmRocoPBI1jf8WZ2yvgrI2m/bAGb2GeAi4Iq4a4nRFc
B6d38k7kK6gQEEnx3nAF8C9geeAh4Lv4Rlo9kEVxKsAGqATwMXxlpRJ1N47LIt/Nk/ZX0JwdFHVjKzs4A7gTPcfUHc9cTBzMYAVwKXxl1LN5H4v/Jnd1/o7g3Az4F84Pj4yopHeFT6DLCE4POjN/Az4HkzGxxnbZ1J4RFy9ypgJXBEYp2ZjSY46lgYV11xMrMLgDuAT7j79LjridEUYBDwuplVAokQXWhml8RXVjzcvRpYAaR2mHor67LBQGAUcKu7b3X3Bnf/I8Hn63HxltZ5FB67+wPwPTMbZWb9gOuBJ919RbxldT0zu5xgJM1p7j477npi9iBwIHBYuJwerj8VuCeuomJ2G3CBmR1sZnnAd4B6YE68ZXW9sF9jCXCJmRWbWZ6ZfZmgKTxjv3jmxV1AN/MLgvbcl4FewNMEw1Wz0S1AEzA9ebShu/eJraKYuHstScNzww9LCPpAauKpKnY3Enw4/hsoJBh99tHwqCQbfZKg0/w9gua7pcBZ7r481qo6kYbqiohIZGq2EhGRyBQeIiISmcJDREQiU3iIiEhkCg8REYlM4SEiIpEpPER6CDP7kpnVxV2HCCg8RNrFzO4OLwCVumTllOwiOsNcpP2mA+emrGuOoxCRuOnIQ6T9Gtx9fcqyEcDMVpjZtWb2p/DSvRvN7CfJV5Izs/5mdld4/Ys6M5ttZrtNnGdmY8PLH28xs1oze8XMpqVsc4KZvRo+/pKZHd41v77ILgoPkfT5JsHMzJMILhp1JXBx0uN/Bk4CPgccCSwDnkxM221mQwiuC9GbYPLFCcBPU14jP1z3jXAfVcD94bTgIl1Gc1uJtIOZ3U0wSWZqh/Uj7n6+ma0A3nX3aUnP+SXwaXcfY2ZjCWZe/bC7PxM+nphA7x53/6GZ/RS4ABjj7jtaqeFLBAE00d0XhusmA7PQpZKli6nPQ6T95gBfTlmXPKvu3JTHZgNXhZcl/QDBtS5mJR5098bwWuAHh6uOILh06R7BkaQJeD3p/trw52CCGV1FuoTCQ6T9at19acw1NLt7S9L9RNOBmq2kS+kfnEj6HJty/3iCpqw64E3ACK5KCOxstjoOeCNctQCYbGZFXVCryD5ReIi0X4GZ7Ze6JD0+ycx+aGbjzOx8gmue/xogPGJ5GLjdzE42s4OBuwguPva78Pm3EVxY6WEzO87MRpvZmamjrUS6AzVbibTfNGBd6srwCALgZmAMMB9oILga421Jm34ZuAn4H6A43O40d98A4O5rzWwK8EvgSSAXeJtg1JZIt6LRViJpEI62ut3dfxF3LSJdQc1WIiISmcJDREQiU7OViIhEpiMPERGJTOEhIiKRKTxERCQyhYeIiESm8BARkcgUHiIiEtn/B1CUqx3mK+1vAAAAAElFTkSuQmCC\\n\",\n      \"text/plain\": [\n       \"<Figure size 432x288 with 1 Axes>\"\n      ]\n     },\n     \"metadata\": {\n      \"needs_background\": \"light\"\n     },\n     \"output_type\": \"display_data\"\n    }\n   ],\n   \"source\": [\n    \"plot_perplexity(dev_perplexities)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Prediction and Evaluation\\n\",\n    \"\\n\",\n    \"Once trained we can use the model to produce a set of translations. 
\\n\",\n    \"\\n\",\n    \"If we translate the whole validation set, we can use [SacreBLEU](https://github.com/mjpost/sacreBLEU) to get a [BLEU score](https://en.wikipedia.org/wiki/BLEU), which is the most common way to evaluate translations.\\n\",\n    \"\\n\",\n    \"#### Important sidenote\\n\",\n    \"Typically you would run SacreBLEU from the **command line** on the output file and the original (possibly tokenized) development reference file. This gives you a version string that records how the BLEU score was calculated: for example, whether it was lowercased, whether (and how) it was tokenized, and what smoothing was used. If you want to learn more about how BLEU scores are (and should be) reported, check out [this paper](https://arxiv.org/abs/1804.08771).\\n\",\n    \"\\n\",\n    \"However, right now our pre-processed data is only in memory, so we'll calculate the BLEU score right from this notebook for demonstration purposes.\\n\",\n    \"\\n\",\n    \"We'll first test the raw BLEU function:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 24,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import sacrebleu\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 25,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"100.00000000000004\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"# this should result in a perfect BLEU of 100%\\n\",\n    \"hypotheses = [\\\"this is a test\\\"]\\n\",\n    \"references = [\\\"this is a test\\\"]\\n\",\n    \"bleu = sacrebleu.raw_corpus_bleu(hypotheses, [references], .01).score\\n\",\n    \"print(bleu)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 26,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"22.360679774997894\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n   
 \"# here the BLEU score will be lower, because some n-grams won't match\\n\",\n    \"hypotheses = [\\\"this is a test\\\"]\\n\",\n    \"references = [\\\"this is a fest\\\"]\\n\",\n    \"bleu = sacrebleu.raw_corpus_bleu(hypotheses, [references], .01).score\\n\",\n    \"print(bleu)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Since we did some filtering for speed, our validation set contains 690 sentences.\\n\",\n    \"The references are the tokenized target sentences. They should contain the original words rather than the `<unk>` tokens that replace out-of-vocabulary words in our network's input and output, so we'll take the references straight out of the `valid_data` object:\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 27,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"690\"\n      ]\n     },\n     \"execution_count\": 27,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"len(valid_data)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 28,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"690\\n\",\n      \"when i was 11 , i remember waking up one morning to the sound of joy in my house .\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"references = [\\\" \\\".join(example.trg) for example in valid_data]\\n\",\n    \"print(len(references))\\n\",\n    \"print(references[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 29,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"\\\"i 'm always the one taking the picture .\\\"\"\n      ]\n     },\n     \"execution_count\": 29,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"references[-2]\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   
\"metadata\": {},\n   \"source\": [\n    \"**Now we translate the validation set!**\\n\",\n    \"\\n\",\n    \"This might take a little bit of time.\\n\",\n    \"\\n\",\n    \"Note that `greedy_decode` will cut off the sentence when it encounters the end-of-sequence symbol, if we provide it with the index of that symbol.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 30,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"hypotheses = []\\n\",\n    \"alphas = []  # save the last attention scores\\n\",\n    \"for batch in valid_iter:\\n\",\n    \"  batch = rebatch(PAD_INDEX, batch)\\n\",\n    \"  pred, attention = greedy_decode(\\n\",\n    \"    model, batch.src, batch.src_mask, batch.src_lengths, max_len=25,\\n\",\n    \"    sos_index=TRG.vocab.stoi[SOS_TOKEN],\\n\",\n    \"    eos_index=TRG.vocab.stoi[EOS_TOKEN])\\n\",\n    \"  hypotheses.append(pred)\\n\",\n    \"  alphas.append(attention)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 31,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"array([  70,   11,   24, 1460,    5,   11,   24,    9,    0,   10,    0,\\n\",\n       \"          0, 1806,    4])\"\n      ]\n     },\n     \"execution_count\": 31,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"# we will still need to convert the indices to actual words!\\n\",\n    \"hypotheses[0]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 32,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"['when',\\n\",\n       \" 'i',\\n\",\n       \" 'was',\\n\",\n       \" '11',\\n\",\n       \" ',',\\n\",\n       \" 'i',\\n\",\n       \" 'was',\\n\",\n       \" 'a',\\n\",\n       \" '<unk>',\\n\",\n       \" 'of',\\n\",\n       \" '<unk>',\\n\",\n       \" '<unk>',\\n\",\n       \" 'joy',\\n\",\n       \" '.']\"\n      ]\n     },\n     
\"execution_count\": 32,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"hypotheses = [lookup_words(x, TRG.vocab) for x in hypotheses]\\n\",\n    \"hypotheses[0]\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 33,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"690\\n\",\n      \"when i was 11 , i was a <unk> of <unk> <unk> joy .\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"# finally, the SacreBLEU raw scorer requires string input, so we convert the lists to strings\\n\",\n    \"hypotheses = [\\\" \\\".join(x) for x in hypotheses]\\n\",\n    \"print(len(hypotheses))\\n\",\n    \"print(hypotheses[0])\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 34,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"23.4681520210298\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"# now we can compute the BLEU score!\\n\",\n    \"bleu = sacrebleu.raw_corpus_bleu(hypotheses, [references], .01).score\\n\",\n    \"print(bleu)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Attention Visualization\\n\",\n    \"\\n\",\n    \"We can also visualize the attention scores of the decoder.\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 47,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def plot_heatmap(src, trg, scores):\\n\",\n    \"\\n\",\n    \"    fig, ax = plt.subplots()\\n\",\n    \"    heatmap = ax.pcolor(scores, cmap='viridis')\\n\",\n    \"\\n\",\n    \"    ax.set_xticklabels(trg, minor=False, rotation='vertical')\\n\",\n    \"    ax.set_yticklabels(src, minor=False)\\n\",\n    \"\\n\",\n    \"    # put the major ticks at the middle of each cell\\n\",\n    \"    # and the x-ticks on top\\n\",\n   
 \"    ax.xaxis.tick_top()\\n\",\n    \"    ax.set_xticks(np.arange(scores.shape[1]) + 0.5, minor=False)\\n\",\n    \"    ax.set_yticks(np.arange(scores.shape[0]) + 0.5, minor=False)\\n\",\n    \"    ax.invert_yaxis()\\n\",\n    \"\\n\",\n    \"    plt.colorbar(heatmap)\\n\",\n    \"    plt.show()\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 71,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"src ['\\\"', 'jetzt', 'kannst', 'du', 'auf', 'eine', 'richtige', 'schule', 'gehen', ',', '\\\"', 'sagte', 'er', '.', '</s>']\\n\",\n      \"ref ['\\\"', 'you', 'can', 'go', 'to', 'a', 'real', 'school', 'now', ',', '\\\"', 'he', 'said', '.', '</s>']\\n\",\n      \"pred ['\\\"', 'now', 'you', 'can', 'go', 'to', 'a', 'right', 'school', ',', '\\\"', 'he', 'said', '.', '</s>']\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"image/png\": \"iVBORw0KGgoAAAANSUhEUgAAAZUAAAEhCAYAAAC3AD1YAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzt3XmYXFWd//H3J52EAFmAhM1AIAgBRcUdHVwQUXDHUQcVUAZFGR3UnzCKimwiojhuzKgDqFHEFVHABZElCq64sChKhBACkS1AEgJIku7v749zmtwUvdzqul23qvJ5Pc99quou556qdOrU2b5HEYGZmVkVJtSdATMz6x0uVMzMrDIuVMzMrDIuVMzMrDIuVMzMrDIuVMzMrDIuVMzMrDIuVMzMrDIuVMzMrDIT686AWa+SdEaZ8yLibeOdF7N2caFiNn4m1Z0Bs3aTY3+ZmVlVXFMxaxNJAp4JbA8sAa4K/6qzHuOailkbSNoeuBB4HHAXsBXwV+CVEbGkzryZVcmjv8za47PAVcAWEbE9MBP4LfC5WnNlVjHXVMzaQNJdwA4R8VBh3ybA4ojYqr6cmVXLNRWz9vgnMKNh3wxgdQ15MRs3LlTM2uP7wPcl7SNpJ0n7AOcC36s5X2aVcvOXWRtI2hj4DPAmYCPgYeBrwHuKTWJm3c6Filkb5WHFWwJ3ezhx55P0fGCriPhu3XnpFm7+MhuBpA8Ns/8DY0hrcJ7K84Bn5NfW2T4K/K8kf1eW5JqK2QgkrYyI6UPsvzcitmgiHc9T6TKS5gB/BP4OnBgRF9Wcpa7g0tdsCJIeI+kxwARJ2w6+ztvzSX0izfA8le7zBtIAi3PycyvBNRWzIUgaAIb6zyGgH/hwRJzaRHqep9JlJP0JOAq4Pm/bRkSzPyY2OI79VQNJT4+I39edDxvRXFIBcjWwR2H/AKmT/Z9Npjc4T6U40muDnaci6bgy50XESeOdl6FIejywDXB5RISkq4GX4yHgo3KhUo9LJPUDlwOXAJdExI015
8kKIuKW/HSzipIcnKfyIWAxsCPwETbcL6nnFp6LNHjhDuAWYAfSF/rPa8jXoDcC5xZG6H0779tQ/71Kc/NXDST1Ac8CXgjsSxoRdAfws4g4vM682aNJejbwdGBacX9EnNJEGp6nMgxJnwKWAR8b/BLPo+tmRcRRNeXpJuCQiPhVfj0LuBl4TETcX0eeuoULlZpJ2gN4NfAeYOOI2KjkdX9n6DZ/ImLeGPIxjUd/af6j2XR6jaQTgA+SmsEeKByKiNhnDOl5nkoDScuAbSJibWHfROCOiJhVQ362Bo6LiHc27P8wcGFEXN3uPHUTFyo1kHQoqYbyQtIvtEvztqDsryBJb27YNRt4K3BWk7+gnw18FXhscTfpS7OvbDq9StIdwKsi4rd156VXSboVeEXxy1rSU0hf4NvVlzMbCxcqNcgji/4OfBj4XkT0V5TuE4FPRsR+TVxzLalf5yzW/yVe7FfYYEm6kzTqZ6DFdOYBpzN0M9rkVtLudrmp693A/7Guv+ltwOnN/EAaT5L2Bvoj4oq689LpXKjUIAcT3DdvuwBXAD8j9an8rYV0JwDLh5qsN8I19wPT3RQzNEkfJQ37PbPFdH4F3AbM59GFd50d0h1B0iHAIcB2wFLg7Ij4Wo35uRg4OSJ+IendwMdIQ8mPi4hP15WvbuBCpWaSNgfeCRwNTCvb5JQn5hVtChxGakZ4QhP3vwT4z1YKs14j6Wes66+aQBqZdANwe/G8iHhxE2muBGZGxJqq8mnjJ88rmh0RayT9GTgCWA78ICJ2rjd3nc1DimsgaRvW1VReCMwCfkVqhirrNtbvqBep6aCxr2U0lwIXSPoiaQTaIyLiG02m1SuubHhdRZPH30ihWZZWkFbXk7RNRNyRnzf+QHpEjYNFJucCZWtSQMkrASR5ouooXFOpgaS1pNFEl5IKkiuanUwnaYeGXfdHxL1jyMvNwxyKiNip2fRsHUn/Uni5B6nA/wSPLrx/1c58dYJiTLVhohfUOlhE0h9JoXUeC8yLiNdLmgn81REQRuZCpQaSthhLAdDpJE0FXgZsD9wK/CgiVtWbq9ZIet4whx4Gbhn8tT3MtWU69zfIUXaSto+IW/Pzxh9Ij6hrsIikfUmjIh8mjf67TtKbgAMj4mV15KlbuFCpSatfwHm+w3tJw4gH0zgL+HSrI5XGQtLupMEG/awbwdMHvDgi/tzu/FRF0hpSv0oxTH3xP83PgYMiYr3+lg2JpJ2A15P6IN4paVdgYkT8peasNU3S1hFx5zDHJgG4X2xkLlRqUMUXcA73cRjwceAmUjX9fcD8iDi5ibxsDBxL6tvZksKXZzPNX7lz+0rgpBwrSTndvSPihWXT6TR5VNLLSBMgB0OInAxcDPwa+BSwKiIOHCWdzYHVEfFAYd+mwKSIWD5O2R93kl4EnEcKObR3REyXtBdwbES8pIl0dgP25tF/g22N/SVpBfBnclidiLipnffvCRHhrc0bqUA5nnWFukhzVi5tIo0bgd0a9u0KLGoyL18k/Sd6J7AqP/6V9KXQTDr3kL4gi/smAfeM4fOZChxIGhF3IDC1xn+rmxvvT5pncnN+vjVp5vdo6VwBPLNh357Az+v8W6zg8/kDsH9+fl9+3Bi4s4k03kBqZrqq8LiaFBOv3e9nI1LgyLOAO/P/jY8AT6v7s+6WzTWVGki6hxSWYk1h3yTSl9PMkmncC2w9RBp3RnOLRy0FnhsRiyQtj4jNcoTW06OJGkaOlbR/RPy9sG8X4OKImNtEOh3VjJb/reZFxD2FfbOAhRGxRdm5Qfnfa1YUmiZzDLi7m/n36jSDfzP5+SMLlzWziFkesntSRHxH0n0Rsbmkw0g/mt43frkfNV8TgL2AA/I2ETgf+AEp+kXbm5m7gRfpqsdy0pdl0Y7AyibS+BPwXw37jiaNKmvG1IhYlJ+vljQ5Iq4HntFkOl8FfiTpMEn75C+FC0mT/ZrxGdLM6jkR8VxgDvAF0kicOvyIFF34+ZLm5
pnV5wI/zMf3JDWLjeafwCYN+zYFur19/lZJ682LyvHsFjeRxhygcQ34r5EmQ9YmIgYi4oqIOCoiHgu8glQj/29Sc6gNpe6q0oa4AccBC0l9Ivvkx78BxzeRxpNYFyr8F/nxDuBJTeblauBx+fkvSJO8Xg/c2mQ6fcAHSJMEH8yPHyB12DaTTmXNaBX9W00FvkwqFAby45fJTWKkdVceVyKd75BWeZyQX4tUgJ5X999ji5/P4flv+WBgBfAaUpPRIU2ksQTYLD//G2nJ5a2AlTW9py2Bo4fYf2jh/8qkduapmzY3f9UgN3u8j/RHOjhyaz5wWhQitY6Sxvak/8SNI8iaqe0g6UBS881Pc6fr94HJwH9ExJeaSGdO8SXrRkg9HMOMphkmnUqa0aqWm0JmActiDM0eedjsZcAUYBGwE6nf4AURsbjCrLadpMOBd5EK2MXAZyLirCau/zJprtZXJJ0EvIVUg/tdRPzbOGR5tPxMIE0u3j8irs37ppEmru4UEcvanadu4kKlRvkPdTqFIapRcgZxHup6MalD8YIYY1DKPFLn9shNYLlfZh4wI5qYlDfMBLZBDwPfIq0dMmKhp7Qi4MHAqazrU3kf8K2IOKGJ/Ig03HqoUW1Nh6yvQh5p93LSe7qdFEy0J9ZSUUVLJ+R/tzfktL5W1+cj6dPAQxHxwfz6IOBN0USw1g1W3VWlDXEDnk1qHuovbAOkKKhl03gscArpF9XtpIB3jx1DXv5M+vXVmPZ1TaZzGGlY6T75+n1IEQPeDuwH/A44o0Q6Exi6Ga2vyfycQmoOPI0UwPG0/PpTJa79U+H530nNO4/amszPyeTRX8CL8ntbRRqAUPvfZM1/yx332ZAWzrup8Pp84M11f97dsLmmUoMqw83nqvrLSF/qLyHNFTmT9Ct41Ka0YriMhv33R8S0oa4ZJp2/Ac+JQtOApC1JzRq75Qlyv4hR1seQ9NeIeNwQ+6+LiCc2kZ/FpJnQ1xRGFD0LeF9E/Oso174xIr6htFDUUaRmxkf9Yo6IrzaRn1uB3SNipaRfkDqmV5KCeTY7KKJjVPG33KmfjdJCeG8k/YhYAmwXXvVxVC5USpB0OblpJypoOlHF4eYlTQH+jfQFOJe08Nck4C0RcfEo195EatdfUti3A2n+xI5N5GE5aUZ1cXLfVOC2WDfkdNSCarhzBguGJvJTjC21jDT8ur+VdFohaUVEzMgTHv9Bili8ttn8VJCPjvtb7pTPZoh8nUQaqHEN6QfKiD9GLHGU4nLmV5zeb0kTFVsKNy/paaR+g9eTZtV/HjgnIlZJOhj4CmlFyJF8Hzhb0ttJTT275HTOazI7VwDzJR1NGjQwh9QvckXO6xNpCKTY8F4Gh2hOLDwftHNOsxlLJc3JheUi4CW5cGl2CO/vJT0pcodtC+7Js8afAPw2f2lu3GKawCNBQS8jrfUxWhTk+VXcs6CKv+Vx+2xa9E3SnKnHk2piVoILlXKqrs61HG5e0tWkEUTfBl4UEb9vSOfrkj5VIqnjSUNkr2fd+zyXNMO/GW8FvkGagT6YzgJS8wGkvpLDR7j+RflxUuE5pPb5O0jNe834AvA0UrPFp0kT1kR6v824HLhQ0hmkYduPjPwq+2+VfYY0+xzgoPz4PNLn3qqvkkYAXgrsNsq5Lf8tS3pj4WUVSyeM52cDgKQfRZOBICPir/mHyLOAV1WVl17n5q8ScpMBpIiyVTQZtBxuXtIRpFpJJW28uf9jR9Iqh3e3kM5sUu1oaYlfzUNdf3pEHDnW+4+Q7nakuSVN/aKu4t+qkNYuwNqIuDm/nkdat6NtkQKq+Fse4TMpaurzGe/PRtIXIuI/xnDda0nNunVNvu06LlTMzKwyDtNiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaEyRpJmSjpBUqnlf8crDafTXel0Ul6cTvvS2aDUHSa5WzfSmiNBWr+8tjScTnel00l5cTrtS2dD2
lxTMTOzyjhMyzD6pm0aE2cNH3U7BgYYWLmKCdOnoglDl81T7hx51dmIAVavfYDJEzclLYsyjDUjL4sSDLB64CEmT9gYjfQ7QRr+2GB+Bh5k8oRNRsxPrBk50G8wwGoeZjIbjZyfUfRiOp2UF6dTXTr3c9+yiNhyrPfZ7wWbxj33llu89Q/XPvzTiNh/rPcab45SPIyJszZn2xNbi2u426ceGP2kEnT7mOM7rm/SpEqSWXt76SXnh9f8Mu9mHeuSOLepxfUaLbu3n9/+dMT16x4xadubZrVyr/HmQsXMrHZBf4/80HKhYmZWswAGKl+2qR49V6hIWsC61e0OjYi9a8uMmVlJA7imYmZmFQiCNW7+6j15gtNMgElztq05N2a2oQig381fnamhuWt+k5cfSV7DfGDlqopyZGY2ul7pU/Hkx/WdDuwK7Dph+tS682JmG4gA+iNKbZ2u52oqrYiIe4B7ADaaW27MuJlZFXqjR8WFiplZ7YJwn4qZmVUjAtb0RpniQsXMrH6in5Fj83ULFypmZjULYMA1ld42ZekaHnfs7S2lsXbOmIOWrqdv1haVpKN7l1eSTt+M6ZWkU4X+FSurSahHJp410sRqgojG2pEjU1vrXFMxM7NKpMmPLlTMzKwCAayJ3pg2WNu7kPQTSe+r6/5mZp0iEP1MKLV1utpyGBEviYhPlDlX0mJJB5dNu9nzzczqNhAqtXU6N3+ZmdWsl/pU6mz+WiDp2Px8jqRzJd0h6XZJZ0ialo9dCMwBzpK0StLFkp6Rnxe3kHTAUOfX9R7NzMoR/TGh1Nbpas+hpCnAZcD1wFzg8cB2wGcBIuIVwBLgrRExNSJeHBFX5edTI2Iq8DHgRuDKoc5vIi8zJc2TNC96JhKPmXW6tPLjhFJbp+uE5q+XA4qI4/LrhyR9GPiVpMMjon+kiyUdArwL2CsilrWYl0dC36/uf6jFpMzMyokQq6Ov7mxUohMKlbnAHEmNM/MC2AZYOtyFkvYlhavfLyJurCAvpwPfAJjct/ENFaRnZlbKQI/0qXRCoXILsDAidh/hnEe1RUl6EvAd4M0R8dvRzi+jGPp+xuStx5KEmVnTUkd95zdtldEJ7+KHwGRJH5Q0TclsSa8unHMHsMvgC0mzgR8Dx0bE+UOkud75ZmadrT0d9ZL6JJ0m6W5J90v6nqRZI5x/tKSb8rl/l/SO0e5Re6ESEQ8C+5A66P8GrAAuBZ5cOO1k4GBJ90n6CfAiYDbwiYYRYC8f5nwzs47Vxo76Y4BXAXuSBkQBnD3UiZJeCZwIHBQR04A3AadJetFIN6iz+asPWA0QEbcCw05WjIgfk2omRfObPN/MrGP1l5/Y2CdpXuH1Pbnpvoy3ASdFxCKAHNXkRkk7RMQtDefuDFwTEb8BiIhfS7oW2AP42XA3qKVQyXNQdiYNA+5Ia6dP5p4XzGkpjXteWs0IskmTNqoknX/eMbeSdHY7bmHLaQw4unBXqSLasSMdDy8Qa6L01/HWQHEg0YnACaNdJGkz0hy+Pzxy34ibJK0kFRSNhcq3gMMk7QX8GtgLmAdcNNJ92l6oSHoKsICUsQvafX8zs07TZEf9ncDehddlaynT8uOKhv3LgaHWs7gLOBe4nHVdJe+JiD+PdJO2FyoR8SdgRrvva2bWqQI10/zVHxFjaS64Pz82fv9uBgzVdPBh4A2k/u2/kvq9L5D0UER8abib1N5Rb2Zm499RHxHLSdFGnjq4T9JOpFrKtUNc8jTg+xFxfSR/AX4AvGKk+4w5h44EbGZWjQjaFfvrDOD9kuZKmg58HPhpRCwe4txfAgdI2gVA0uOAAyj0yQylEyY/VkbSjsDNwPYRcVu9uTEzKyd11LclTMupwObAVcBGpFFcBwNIOgj4vxxPEeA0UlPZz/JclnuB7+Y0htVThYqZWbdqx4z6HEvx6Lw1HjsHOKfwei1pXssxzdyjknchaRNJ50v6kaTtJF2UZ2yukHSFpKcVzj1B0qWSTpF0V
95OLBzfW9JaSQfmmZwrJH2nEApfkj4q6R95ludiSUfmy6/JjzfkyZAfbvJ9rItSPOChqmbWHkG5Bbq6YZGulgsVSdsAPwf+Abwyp/l5YAdSQMg/AudJKg50fx6pw+gx+ZoP5rHQg/qAF5PGTs8DnkKKRAxpNv2bgT3zLM9nAlfmY3vkx11z2PuPNPl2jiSN/75h7T9XNXmpmdnYeTnhZHfSpJjvRsR/RER/RCyJiAsi4sGIeAg4ljThphiLa2FEfDEi1ubZmlcDT29I+5iIWBURd5JGHAweXw1MAXaXNCUi7srDlKtwOrArsOvEKVNHO9fMrBIBDMSEUlunazWH/w48QKqZACBplqSvSVqSZ2remg9tWbju9oZ0HmDdxBxI47DvHup4RCwAPkgqrO7KK0E2FkhjEhH3RMTCiFioCZ3/j2dmvUL0l9w6XavfnMcA15FGB2ye930M2JbUPDUd2D7vr+zTiIgzIuI5pOa1q4Hz8iF3hJhZ1wlgTfSV2jpdq4XKWuAg4M/AAklbkSbSPAjcJ2kqaRx0ZSQ9U9JzJW0EPEyaJTq4OuTdpILFYe/NrGtEyM1fgyJiICIOJ4Wrv4LUFLYVKR7NtcCvWPelX4WppPXrl+V7vBg4MOflIVJogW9KWi7pQxXe18xs3LRp8uO4G/M8lYjYseH1e4H35pfPbjj964XzThgirb0Lzxc05qt4TURcRiHMwBBpnQKcMmLmzcw6SFpPpfP7S8rw5MdhTLz3QTb/TmuDymb+YsvRTyqjr6J21Mn/rCSZgYeqScfGn8PNdwt1RS2kDBcqZmY1S0OKXVMxM7MKtDH217jr2vpWDgcTOYikmVlXa9Ma9ePONRUzs5ql0Pdu/jIzs4r0Sp9K59elMknbSLogRy1eCOxfODZf0lkN53sRMTPrCilKcW9Mfuymmso5pHWU5wAbA+dWfQNJM4GZAFMfiTpjZja+UpiWzi8wyuiKQkXSbGAfYOeIWAGsyGuwXFzxrY4EjgdYHZ6LYWbtoq6ohZTRLe9iu/x4S2HfzeNwn0dC30/WlHFI3sxsaAOo1NbpuqKmAizNjzsAN+XnOxaO3w/MGnwhaSIp/lhTIuIeUjwxZkyYOZZ8mpk1rZdGf3VFTSUibgMWAJ+QNF3S1sBxhVP+ALxQ0twcvfijwKRHp2Rm1pl6paO+83O4zhuBjUiLfl0BfK1w7BzgAtLSxTeRlipe2piAmVkn6qU16rul+YuIuB14ecPu4jDit+Zt0P+Oe6bMzCoQwNouqIWU0TWFSrsFEP0tLgOz6sFK8sIWm1WSTP/0agYfTNhlx5bTuH/XGa1nBJh+6Q2VpNN/732VpIMq+mIIL2K6oemGpq0yXKiYmdWtS5q2ynChYmZWMy/SZWZmleqVmkpvNOI1kDRZ0rcl3SdpWd35MTMbyeAiXR791bleCzwTmB0RFfWWm5mNj0CsHeiN3/i9WqjsBNzkAsXMukWv9Kl0bNEo6d2S/ibpfklLJH1MUl8+FpKeUzh3b0lr8/P/Ic2231vSKknzm7jnTEnzJM0LD+k0s3YJN3+1w23AS4DFwJOBi/Lz/xvpooj4z9yP8pyI2LfJe66LUoyjFJtZewz2qfSCjq2pRMT3IuLmSP4EnA28cJxvuy5KMY5SbGbt45rKOJP0BuC9pP6RicBk4Dfjec9ilOLpjlJsZm0SiP4e6ajvyHchaXvg68DJwLYRMYMUy2uwmF4FbFq45DHtzaGZWbV6ZT2VjixUgKmkvN0NrJH0LOCQwvE/AG/O81F2JNVozMy6UvRQR31HFioR8VdSh/n5wHLgGOCbhVP+E9gZuBf4DjC/zVk0M6tUhEptna5j+1Qi4iTgpGGO/Zk0ubHofwrHTxi/nJmZVa07aiFldGyhUrsIYu2alpJYW1U49YrS0YRq/mhjo41aTmPGg9tUkBNY+IHdKknnscf8rpJ0YiAqScc2PO2oheS5fqcChwJTgIuBt0fEkOGsJG0FnEZay2oSsAh4a
UT8Y7h7dGTzl5nZhiQC+gdUamvRMcCrgD2B7fK+s4c6UdIU4FJgNWmqxWbAQaSBUsNyTcXMrAM0MbKrT9K8wut78nSIMt4GnBQRiwAkvQ+4UdIOEXFLw7lvJhUk74iIwWabv4x2g46tqUj6Yg65YmbW04KmOuq3Bm4obEeWuYekzYA5pNGz6b4RNwErgT2GuOQFwN+B+ZLuyWGz/t9o9+nYmkpEHFF3HszM2qOpjvo7gb0Lr8vWUqblxxUN+5cD04c4fxapYHkP8O/Ak4CLJN0VEecMd5OOLVTMzDYkUX6MR39ELBzDLe7PjzMa9m9Gqq0Mdf7SiPhsfv17SV8n9ckMW6jU2vwlaRNJn5R0s6R7JV0kaed8bL6kswrnhqR3SLoqRy7+jaTdCscnSvqgpIWSlkv6paSn1/G+zMyaNd7zVCJiObAEeOrgPkk7kWop1w5xydWklrlHJTXSferuUzkT2A14FrAN8Fvgh5ImDXP+ocBrSNWyW0kBIAedSCpB9wdmAl8mVdU2L5uZ9ULf49D3ZtYeafTXhFJbi84A3i9prqTpwMeBn0bE4iHOnQ/MlPROSX2S9iCN/jpvpBvUVqhImgW8kTSy4M6IWE0qGLYlDXcbymkRsSQiHia94afntAS8C/iviFgUEf0R8SXgduBlTWTrSHLn12oeHsvbMjMbk4hyW4tOBS4ErgKWAn3AwQCSDpL0yHDhPBrspcBbSc1j5wInRMS3R7pBnX0qc/PjtalMeMQkYPthrrm98PwB1nU8zSLFC7tQUvFjn8S6sdhlnA58A2AyG93QxHVmZi1px+THiOgHjs5b47FzaOgriYgFwFOauUedhcrgmOhdIuLuxoOS9msirWWkQmbfiLhqrBlaL/S9thhrMmZmTQm6I65XGbU1f0XEXaRaweclzYY0jlrSqyVNbTKtAD4LfFLSLjmtqZL2k+Sw+GbW8aLk1unq7qg/nNSHsUDS/cB1wOsY22c3GNX4fEkrSZN2jqD+92hmNrKAGFCprdPVOk8lIh4Ejs1bo0MbzlXD6wUU8h8Ra4FP5c3MrKv0SvOXJz+ORC1WcqKiYcmt5mMwmcmTK0mnCiv32KqSdCY+UM1/xItu/WMl6bz0CXtXkk7/fY2Tnseoqr9BG3cVjOzqCC5UzMxqNhj7qxe4UDEzq1sALlTMzKwqvdL81XRjfZ51eU3Jc9eL39XEPVZJenaz15mZdadyI7+6YfRX04VKRJwTEUPF3m+apL0lrR3iHlMj4tdV3MPMrCv0yESVppu/JE0qrAJmZmatit7pqB+1piJpsaTjJF2eg40dJenGwvFJOeT8DTkk/U2SXltIYiNJZ+Zw9EslvT1f9xjgJ6SlMVfl7c35WEh6TuEeb8nprpR0tqSvS5pfOD5H0rmS7pB0u6QzJE2jSY5SbGa16ZGaStnmr8OB95ICODYOoD+ZFOXydaS4/M8HigvIvJYUFXMLUhTg/8nrIf8DeAlpwZmpeftq440lPQ/4n5yHLYAfA/9WOD4FuAy4nhSk8vGkIJKfbUyrBEcpNrOaqOTW2coWKmdGxJ9yjK2HBnfmkPPvJIWcvzaS2yKiuODLZRFxQUQMRMR5pKUrn9xEHt8EfDciLouItRHxTdK6K4NeDigijouIhyLiPuDDwEGS+pq4D6QoxbsCu05moyYvNTNrwUDJrcOV7VNZPMz+LYFNWb9m0uj2htfFkPVlzAZ+37DvlsLzucAcScsbzgnSwl9Ly97IUYrNrBYb4DyV4crHu4EHgV1IARybVabcXQrs0LBvDrAoP78FWBgRu4/h/mZmHWGDnadSlJvDPg98QtITlGwn6Uklk7iD1FE/d4RzzgZeK+kFeUnLA0nLDw/6ITA5DxaYlvMwW9Krx/SmzMzqsIF11I/kQ8B3gB8A9wMLgJ3LXBgRC4EvAL/Lo8MOGeKcnwPvJq05fx+pD+UHkHrSc6TjfUgd9H8jDSS4lOb6bczM6hUqt3W4UZu/ImLHhtfzSevDD
75eDZyUt8ZrDy2R3juAdzTsawxzfyZw5uBrSb8Gri4cv5W8zrKZWTdSF9RCyuiK2F953stFwGrSOisfquAlAAAR/klEQVRPJ40KG8+bor5mB4+tL/orykqL+XhEfzUZWrPXE1tOY9NbHqggJzD9ijsqSeeln3l+Jensc+WtlaRz2V6zK0mnf+X9laTjEPrjLARdEIKljK4oVIDXAGcBfcCNwKsjYiwDA8zMOpNrKu0TEW+oOw9mZuOqRwqV2tZvbwzFMobrDy2GizEz62o9MvqrK2oqZmY9bQOc/GhmZuOoV0Z/VdL8Jeldkm7OUYqXSjol799R0ndz5ODlkn4paWbh0idJuipf9xtJuxXSXCDp2Ib7DNtkJmlingC5sHCvp1fx/szMxl2PNH+1XKhImgecCrw8IqYBuwMXSNqEFD34LmA3YBZwFGlY8KBDSSO7ZgG3kgI6jtWJwKuA/YGZpMmSF0navIn3si70vYdQmlkbKcptna6KmspaUjzm3SVNjYjlEfEb0sz3jYF3R8SKHGH4NxFRHDh/WkQsiYiHSRMqx1SzyNGS30WKlrwoIvoj4kukYJYvayKpQuj7f44lK2ZmY9MjM+pbLlQiYhFwEGm9k39IulLSi4EdgUUR8ajlgguKEYybjV5cNAuYClyYm76W56jFO5HWVimrEPp+yhizYmbWpLJNX11QU6mkoz6vk3KepMnAEcD5wNuBuZL6IsY0t/x+Ulh94JGVIoezjFQo7RsRV43hXkBD6PsJM0c528ysQl1QYJRRRZ/KrpL2z30oa0gBHQP4Hqn/5NOSZuSO9Gc1sczvH4ADJG2Zr/nocCfmaMmfBT4paZecr6mS9hulMDIz6wgaKLd1uir6VCYDx5GaspaT+jZeExEPkKIHb09aa2UZcBowqWS6nwb+CtxECh75o1HOP55UQzpf0sp8zyOocYKnmVlpbv5KIuI64F+GObYIGHJdkyEiES8o5iciVgD/2nCZCsfns3605LXAp/JmZtY1umVkVxme/DgMTZjAhOlTW0ojHniomrxsunE16Ww2o5J0Fh3ceuXvsWeXrbCO4t7GVaTHJtauqSSdS55cegT7iCZMqSaitCZUM1qoqojbNoIuGNlVhgsVM7NO4JqKmZlVxc1fZmZWjeiOkV1ldMTIKEknSLqk7nyYmdWmDaO/JPVJOk3S3Tnm4vckzSpx3X/k2IvHjnZuRxQqZmYbvPYMKT6GFCNxT9ZFGzl7pAsk7UCK23hdmRu4UDEz6wBtCij5NuDjOUbiCuB9wP654BjOl4APAfeWuUFlhYqkbSRdKGlFDj//llxd2jEfP1zSn/PxP+X4YA1J6BRJd+XtxIaDT5D001xtWyLpY5Im5WM75nsdIun6XK27WNK2Tb6HdVGK6ZEGTjPrNX2D31N5KxVTStJmwBxStBIAIuImYCWwxzDXvB14ICK+XTZzVdZUziGFZdkeeA5wSCFjhwPvJwWe3JxU6p0naefC9c8DlgCPAV4JfFDSXvn6rYCfA+cBs4FnAy8CPtCQhwNzOrNJccNOavI9rItSPOAoxWbWRuWbv7Ymf0/l7ciSdxgMkbWiYf9yYHrjyZLmAMcC7yj/JqpbpGs7UkiW/4qIlRFxF/CRwinvBk6KiGsiYiAifgxcDry+cM7CiPjiYIh8UmiWwVD4bwKuiYj/i4jVEbEU+FjeX3RiRCyLiJXAN2g+lP66KMUTHKXYzNokmor9dSf5eypvZdehGlx2pHEW9Gak2kqjs4CT8/dtaVUNKZ6dH5cU9t1SeD4X+F9Jn2u4922F18Uw+LB+KPy5wF45nP0gAX0N17QUSr8YpXjGxC2budTMrDXl+0v6I2Jh08lHLJe0BHgq6Uc7knYi1VKuHeKSFwFPkzQYzHcG8AxJ+0XEc4e7T1WFymBJNgdYVHg+6Bbg+Ij47hjTvwW4JCKaWXDLzKwriLZNfjwDeL+ky0k/oD8O/DQiFg9x7vYNr78LXAH890g3qKT5KyJuA
xYAp0qaJmlLUlvcoE8DJ0h6spKNJT2nuCb9KL4GPF3SYZKmSJogaSdJ+1eRfzOz2rVnSPGpwIXAVaTKQB9wMICkgySteiQ7EbcVN+BhYGVE3DnSDarsqH8jsAmpSeuXpFIN4OGIOBP4BPAV4D5SM9mHKRkGPyLuAF4AHAAszml8n7Syo5lZdys5nLjV2kxeav3oiJgVEdMi4l8jYlk+dk5EDBtFNyL2joiTR7tHZWFaIuJ20rr0AEjaj1Sy3ZGPfxX46jDXnjDEvr0bXl9PGhU21PWLKYTFz/vmUwiNb2bW0XpkFkNlhYqkJ5M+lutIHesnA9/OqzJ2nejvZ2DFUAMimkujEg9XNLz53vsqSWbe225tOQ1NrCj0fV/jWI2xUUXpTJi9TSXpLH9GNenM+NGfK0mnf9Wq0U+ylvRKQMkqm782J80jWQVcSRpN8O4K0zcz611e+XF9EXE5sPOoJ5qZ2fq6pMAow6Hvzcw6QK80f7lQMTPrBC5UzMysKr2ySJcLFTOzurlPpTflENIzAaayWc25MbMNhWiYaNfFvEjX+taFvseh782sjXpkSLELlfWtC32PQ9+bWfu0aeXHcefmr4Ji6Pvp2qLm3JjZBqULCowyerqmIukESYvrzoeZ2YiaW6Sro/V6TWUOKSS/mVln65GaSq8XKs8BXlh3JszMRtMN/SVl9HShEhHz6s5DR1E1rZ0xUMFff1URnCtSVUTpCRWlc8delSTD1LdXMzT+vrOf2HIaW5z9+wpyAhOmbFRJOmufsksl6QDwi3NbT8OFipmZVcU1FTMzq0bgRbrMzKwawjUVMzOrkguVziRpAevWpj+0ca17M7NOpO5cef1Req5QMTPrOl0S16sMFyoFjlJsZnVxn0qHamjumt/k5UcCxwM4SrGZtVM3hGApo6djf42BoxSbWT16JPR9z9VUWuEoxWZWiy4Ja1+GCxUzs07gQsXMzKrgyY9mZlYpVRGotQO4UDEzq1uXdMKX4UJlBJWEeO8k0TljFmNt5+SlSmuXLK0knZ2/Wc1AkYe2nl1JOssOWN1yGvu+u/U0AH531JMqSWejqxdXkk5VemVIsQsVM7NO0CO/YV2omJl1AHfUm5lZNQLokYCSXTmjXtJiSQfXnQ8zs6pooNzW6VxTMTOrWS/NUxnXmoqkd0m6WdL9kpZKOiXv/4qkW/P+6yW9seG6l+X9qyT9UNKn8zopSLoQmAOclY9fnPdPlPRBSQslLZf0S0lPH8/3Z2ZWiYjyW4cbt0JF0jzgVODlETEN2B24IB++EngysBlwEjBf0uPzdY8FzgM+ko9/GnjLYLoR8QpgCfDWiJgaES/Oh04EXgXsTwpf/2XgIkmbN5HnmZLmSZoXvbJgtJl1BUW5rdONZ01lLalWt7ukqRGxPCJ+AxARX4qIeyKiPyK+BVwL7J2vewPw24j4ZkSsjYhLgfNHupEkAe8C/isiFuV0vwTcDrysiTwfCdwA3LCah5u4zMysRW2IUiypT9Jpku7OLUXfkzRrmHNfKukyScsk3SfpCknPHe0e41aoRMQi4CDgcOAfkq6U9GJJEySdJOkGSSskLQf2ALbMl84GbmlIrvF1o1nAVODC3PS1PKe7E7BdE9kuhL7fqInLzMxa06aayjGkFp09WffdePYw525O+k7cmfT9/A3gJ5K2H+kG49pRHxHnAedJmgwcQapxvDVvLwauj4gBSb8n1WoAluZjRXMaXje2TS0DHgD2jYirWsivQ9+bWfsF0F+6xOjL3QuD7snfXWW8DTgp/+hH0vuAGyXtEBHr/XiPiHMarv2CpOOBZwC3DneD8exT2VXS/pI2AdYAK0gf3XRS09jdwARJh5FqKoO+Bewp6d9yVe0FwAENyd8B7DL4IiIC+CzwSUm75PtPlbSfpMeM01s0M6tMEzWVrcnN9Hk7slT60makH+h/GNwXETcBK1n/O3i4659IahW6bqTzxrNPZTJwHKlfYzmpz+M1wFeB3wI3kmolj
weuGLwoIm4EXkfqeF8BHEWqnhU7OU4GDs7tfD/J+44n1YTOl7QS+DupdtSVc3HMbANTfvTXneRm+rydXvIO0/Ljiob9y0k/9oclaSvge8AnI+LvI507bs1fEXEd8C/DHH7dKNdewLqRYkj6JoV+lYj4MfDjhmvWAp/Km5lZV2miv6Q/IhaO4Rb358cZDfs3I9VWhpRbe34GXAx8YLSbdOTkR0mvJA07XkkavfUaYL+2Z6SDovrahqXvupsqSWfqDZMqSWfe7SP2zZbyy633rCAnMPXaRZWkE2vXVpJOJdoQ+j4ilktaAjwVuBpA0k6kWsq1Q10jaUfgUuD7EXF0mft0ZKECPI80z2QKaU7KERFxeb1ZMjMbHwJUvqO+FWcA75d0OWlQ0seBn0bE4kflSdoNuASYHxHHlr1BR/Y3RMTRETErT258fER8ue48mZmNJ0WU2lp0KnAhcBWpT7sPOBhA0kGSVhXOfT9pisd7cvSSwe2gkW7QqTUVM7MNR5tWfoyIfuDovDUeOwc4p/D634F/b/YeLlTMzGrXHXG9ynChYmbWAbohrlcZPV+oSJoUEWvqzoeZ2Yh6pKbSkR31o5G0iaRP5rD690q6SNLO+dgCSZ+R9IM8CfKoJtJ1lGIza79Io7/KbJ2uKwsV4ExgN+BZwDakGfo/lDQ4KP8w4HOkST6fayJdRyk2s3q0IUpxO3RdoZLDNL8ReEdE3BkRq0khXbYlRd4EODciLovkwSaSd5RiM6tFm4YUj7tu7FOZmx+vTcuoPGISMDjtd/FYEnaUYjOrTRcUGGV0Y6EyGANsl4i4u/GgpLfz6ND4ZmadK+iZb62ua/6KiLtIi8V8XtJsSCGdJb1a0tR6c2dm1jxRrumrG5q/uq5QyQ4ndagvkHQ/Kb7/6+iKbiwzsyEMDJTbOlw3Nn+RO9+PzVujvdubGzOzFvVQ81dXFipmHaui5RL6H2hm0OIIVlX0TfW7xnWdmrfJ5MkVZATufPNTK0ln6188qkt27Ja3nkQ3NG2V4ULFzKwTuFAxM7NqOKCkmZlVJYAuCMFShgsVM7MO4D4VMzOrjgsVMzOrRAADLlR6jqSZwEyAqcyoOTdmtuHonY76bp1RP14c+t7M6hFRbutwLlTW59D3ZtZ+AfQPlNs6nJu/Chz63szqEZVFY6ibCxUzs07QBU1bZfRs85ekL0r6Sd35MDMb1eDorzJbh+vZmkpEHFF3HszMSuuRmkrPFipVUF9fS9dHf39FOTEbI1XTGNHq/wWo7v/Dyrmjn1PG1NetqSYhgH0rSMOFipmZVSICeuRHqAsVM7NO4JqKmZlVxoWKmZlVoztGdpXRkUOKJU2RtFLStnXnxcxs3AVEDJTaOl1HFCqS+iRtVdj1IuAvEXH7KNdtM745MzNrkx4J01JroSJpT0mfAW4DDi0cOgD4fj5nX0l/yjWXZZIuKZx3nKRFkj4qafcK8jNT0jxJ84LeqIqaWReIgIGBcluHa3uhIunxkj4i6UbgW8A/gZdExCfy8T7gFcAP8iVfAz4HzABmAycXknsn8KZ87FJJ10g6RtIOY8xeIUrxP8eYhJnZGDhKcXMkvU7S1cDPgGnAwRExNyKOiYirC6fuBSyLiIX59WrgscDWEfFwRCwYPDGSKyPiP0kFzlHAzsAfJf1S0guazGYhSvGUsbxNM7MxiYGBUluna2dNZTawE/AX4Brgb8Oc90jTV/YqYBfgOknXS3rPUBdFRH8h7RtJhcNWQ507nIi4JyIWRsRCoWYuNTNrQclaimsq60TEZ4CtgbOAVwJLJF0o6WBJ0wunHsC6pi8i4pqIOJBUQLwd+JikfQaPS9pS0hGSLgf+CjwTOBHYJiK+Pe5vzMysVT0UULKtfSoR8VBEfCciXg1sD5xH6hP5h6TDJO0BTAJ+DyBpsqQ3S5oVEQHcBwwA/fn4ccDNwH7A54FtI+KQiPhxRKxt53szMxurIMVGK7N1utomP0bECuArwFfycOItgdcB5
+cCZNCBwH9LmgLcBRwfET/Px34IfDanZWbWncKLdFUqIu4C7pJ0DnB0Yf9q4KUjXPfHNmTPzGzcRRc0bZWh6JCOH0mTgWOAUzqh6UrS3cAtdefDzLrCDhGx5VgvlnQRMKvk6csiYv+x3mu8dUyhYmZm3a8jwrSYmVlvcKFiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaFiZmaVcaFiZmaV+f+MBdiOezSSbgAAAABJRU5ErkJggg==\\n\",\n      \"text/plain\": [\n       \"<Figure size 432x288 with 2 Axes>\"\n      ]\n     },\n     \"metadata\": {\n      \"needs_background\": \"light\"\n     },\n     \"output_type\": \"display_data\"\n    }\n   ],\n   \"source\": [\n    \"# This plots a chosen sentence, for which we saved the attention scores above.\\n\",\n    \"idx = 5\\n\",\n    \"src = valid_data[idx].src + [\\\"</s>\\\"]\\n\",\n    \"trg = valid_data[idx].trg + [\\\"</s>\\\"]\\n\",\n    \"pred = hypotheses[idx].split() + [\\\"</s>\\\"]\\n\",\n    \"pred_att = alphas[idx][0].T[:, :len(pred)]\\n\",\n    \"print(\\\"src\\\", src)\\n\",\n    \"print(\\\"ref\\\", trg)\\n\",\n    \"print(\\\"pred\\\", pred)\\n\",\n    \"plot_heatmap(src, pred, pred_att)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Congratulations! You've finished this notebook.\\n\",\n    \"\\n\",\n    \"What didn't we cover?\\n\",\n    \"\\n\",\n    \"- Subwords / Byte Pair Encoding [[paper]](https://arxiv.org/abs/1508.07909) [[github]](https://github.com/rsennrich/subword-nmt) let you deal with unknown words. 
\\n\",\n    \"- You can implement a [multiplicative/bilinear attention mechanism](https://arxiv.org/abs/1508.04025) instead of the additive one used here.\\n\",\n    \"- We used greedy decoding here to get translations, but you can get better results with beam search.\\n\",\n    \"- The original model only uses a single dropout layer (in the decoder), but you can experiment with adding more dropout layers, for example on the word embeddings and the source word representations.\\n\",\n    \"- You can experiment with multiple encoder/decoder layers.\\n\",\n    \"- Experiment with a benchmarked and improved codebase: [Joey NMT](https://github.com/joeynmt/joeynmt)\"\n   ]\n  },\n  {\n    \"metadata\": {},\n    \"cell_type\": \"markdown\",\n    \"source\": [\n      \"If this was useful to your research, please consider citing:\\n\",\n      \"\\n\",\n      \"> J Bastings. 2018. The Annotated Encoder-Decoder with Attention. https://bastings.github.io/annotated_encoder_decoder/\\n\",\n      \"\\n\",\n      \"Or use the following BibTeX entry:\\n\",\n      \"```\\n\",\n      \"@misc{bastings2018annotated,\\n\",\n      \"  title={The Annotated Encoder-Decoder with Attention},\\n\",\n      \"  author={Bastings, J.},\\n\",\n      \"  journal={https://bastings.github.io/annotated\\\\_encoder\\\\_decoder/},\\n\",\n      \"  year={2018}\\n\",\n      \"}\\n\",\n      \"```\"\n    ]\n  }  \n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.6.5\"\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
  {
    "path": "index.md",
    "content": "<script src=\"https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML\" type=\"text/javascript\"></script>\n\n# The Annotated Encoder-Decoder with Attention\n\nRecently, Alexander Rush wrote a blog post called [The Annotated Transformer](http://nlp.seas.harvard.edu/2018/04/03/attention.html), describing the Transformer model from the paper [Attention is All You Need](https://arxiv.org/abs/1706.03762). This post can be seen as a **prequel** to that: *we will implement an Encoder-Decoder with Attention* using (Gated) Recurrent Neural Networks, very closely following the original attention-based neural machine translation paper [\"Neural Machine Translation by Jointly Learning to Align and Translate\"](https://arxiv.org/abs/1409.0473) of Bahdanau et al. (2015).\n\nThe idea is that going through both blog posts will make you familiar with two very influential sequence-to-sequence architectures. If you have any comments or suggestions, please let me know: [@BastingsJasmijn](https://twitter.com/BastingsJasmijn).\n\n[Click here to open this notebook in Google Colab.](https://colab.research.google.com/github/bastings/annotated_encoder_decoder/blob/master/annotated_encoder_decoder.ipynb)\n\n# Model Architecture\n\nWe will model the probability $$p(Y\\mid X)$$ of a target sequence $$Y=(y_1, \\dots, y_{N})$$ given a source sequence $$X=(x_1, \\dots, x_M)$$ directly with a neural network: an Encoder-Decoder.\n\n<img src=\"images/bahdanau.png\" width=\"636\">\n\n#### Encoder\n\nThe encoder reads in the source sentence (*at the bottom of the figure*) and produces a sequence of hidden states $$\\mathbf{h}_1, \\dots, \\mathbf{h}_M$$, one for each source word. These states should capture the meaning of each word in the context of the given sentence.\n\nWe will use a bi-directional recurrent neural network (Bi-RNN) as the encoder; a Bi-GRU in particular.\n\nFirst, we **embed** the source words. 
\nWe simply look up the **word embedding** for each word in a (randomly initialized) lookup table.\nWe will denote the word embedding for word $$i$$ in a given sentence with $$\\mathbf{x}_i$$.\nBy embedding words, our model may exploit the fact that certain words (e.g. *cat* and *dog*) are semantically similar, and can be processed in a similar way.\n\nNow, how do we get hidden states $$\\mathbf{h}_1, \\dots, \\mathbf{h}_M$$? A forward GRU reads the source sentence left-to-right, while a backward GRU reads it right-to-left.\nEach of them follows a simple recursive formula: \n$$\\mathbf{h}_j = \\text{GRU}( \\mathbf{x}_j , \\mathbf{h}_{j - 1} )$$\ni.e. we obtain the current state from the previous state and the current input word embedding (for the backward GRU, the previous state is $$\\mathbf{h}_{j + 1}$$).\n\nThe hidden state of the forward GRU at time step $$j$$ will know what words **precede** the word at that time step, but it doesn't know what words will follow. In contrast, the backward GRU will only know what words **follow** the word at time step $$j$$. By **concatenating** those two hidden states (*shown in blue in the figure*), we get $$\\mathbf{h}_j$$, which captures word $$j$$ in its full sentence context.\n\n\n#### Decoder\n\nThe decoder (*at the top of the figure*) is a GRU with hidden state $$\\mathbf{s}_i$$. It follows a similar formula to the encoder, but takes one extra input $$\\mathbf{c}_{i}$$ (*shown in yellow*).\n\n$$\\mathbf{s}_{i} = f( \\mathbf{s}_{i - 1}, \\mathbf{y}_{i - 1}, \\mathbf{c}_i )$$\n\nHere, $$\\mathbf{y}_{i - 1}$$ is the previously generated target word (*not shown*).\n\nAt each time step, an **attention mechanism** dynamically selects the part of the source sentence that is most relevant for predicting the current target word. It does so by comparing the last decoder state with each source hidden state. 
The result is a context vector $$\\mathbf{c}_i$$ (*shown in yellow*).\nThe attention mechanism is explained in more detail later.\n\nAfter computing the decoder state $$\\mathbf{s}_i$$, a non-linear function $$g$$ (which applies a [softmax](https://en.wikipedia.org/wiki/Softmax_function)) gives us the probability of the target word $$y_i$$ for this time step:\n\n$$ p(y_i \\mid y_{<i}, x_1^M) = g(\\mathbf{s}_i, \\mathbf{c}_i, \\mathbf{y}_{i - 1})$$\n\nBecause $$g$$ applies a softmax, it provides a vector the size of the output vocabulary that sums to 1.0: it is a distribution over all target words. At test time, we can select the word with the highest probability for our translation.\n\nFor optimization, we minimize a [cross-entropy loss](https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html#cross-entropy), which is the same as maximizing the (log-)probability of the correct word at this time step. All parameters (including word embeddings) are then updated to maximize this probability.\n\n\n\n# Prelims\n\nThis tutorial requires **PyTorch >= 0.4.1** and was tested with **Python 3.6**.  
\n\nMake sure you have those versions, and install the packages below if you don't have them yet.\n\n\n```python\n#!pip install torch numpy matplotlib sacrebleu\n```\n\n\n```python\n%matplotlib inline\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math, copy, time\nimport matplotlib.pyplot as plt\nfrom torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence\nfrom IPython.core.debugger import set_trace\n\n# we will use CUDA if it is available\nUSE_CUDA = torch.cuda.is_available()\nDEVICE = torch.device('cuda:0' if USE_CUDA else 'cpu')  # fall back to CPU without a GPU\nprint(\"CUDA:\", USE_CUDA)\nprint(DEVICE)\n\nseed = 42\nnp.random.seed(seed)\ntorch.manual_seed(seed)\ntorch.cuda.manual_seed(seed)\n```\n\n    CUDA: True\n    cuda:0\n\n\n# Let's start coding!\n\n## Model class\n\nOur base model class `EncoderDecoder` is very similar to the one in *The Annotated Transformer*.\n\nOne difference is that our encoder also returns its final states (`encoder_final` below), which are used to initialize the decoder RNN. We also provide the sequence lengths, since the encoder RNN needs them to handle padding.\n\n\n```python\nclass EncoderDecoder(nn.Module):\n    \"\"\"\n    A standard Encoder-Decoder architecture. 
Base for this and many \n    other models.\n    \"\"\"\n    def __init__(self, encoder, decoder, src_embed, trg_embed, generator):\n        super(EncoderDecoder, self).__init__()\n        self.encoder = encoder\n        self.decoder = decoder\n        self.src_embed = src_embed\n        self.trg_embed = trg_embed\n        self.generator = generator\n        \n    def forward(self, src, trg, src_mask, trg_mask, src_lengths, trg_lengths):\n        \"\"\"Take in and process masked src and target sequences.\"\"\"\n        encoder_hidden, encoder_final = self.encode(src, src_mask, src_lengths)\n        return self.decode(encoder_hidden, encoder_final, src_mask, trg, trg_mask)\n    \n    def encode(self, src, src_mask, src_lengths):\n        return self.encoder(self.src_embed(src), src_mask, src_lengths)\n    \n    def decode(self, encoder_hidden, encoder_final, src_mask, trg, trg_mask,\n               decoder_hidden=None):\n        return self.decoder(self.trg_embed(trg), encoder_hidden, encoder_final,\n                            src_mask, trg_mask, hidden=decoder_hidden)\n```\n\nTo keep things simple, we also keep the `Generator` class the same.\nIt projects the pre-output vector (`x` in the `forward` function below) to the output layer, so that the final dimension is the target vocabulary size, and then applies a log-softmax.\n\n\n```python\nclass Generator(nn.Module):\n    \"\"\"Define standard linear + softmax generation step.\"\"\"\n    def __init__(self, hidden_size, vocab_size):\n        super(Generator, self).__init__()\n        self.proj = nn.Linear(hidden_size, vocab_size, bias=False)\n\n    def forward(self, x):\n        return F.log_softmax(self.proj(x), dim=-1)\n```\n\n## Encoder\n\nOur encoder is a bi-directional GRU.\n\nFor speed, we want to process multiple sentences at the same time (this is much more efficient on a GPU), so we need to support **mini-batches**. 
Sentences in a mini-batch may have different lengths, which means that the RNN needs to unroll further for certain sentences while it might already have finished for others:\n\n```\nExample: mini-batch with 3 source sentences of different lengths (7, 5, and 3).\nEnd-of-sequence is marked with a \"3\" here, and padding positions with \"1\".\n\n+---------------+\n| 4 5 9 8 7 8 3 |\n+---------------+\n| 5 4 8 7 3 1 1 |\n+---------------+\n| 5 8 3 1 1 1 1 |\n+---------------+\n```\nYou can see that, when computing hidden states for this mini-batch, for sentences #2 and #3 we will need to stop updating the hidden state after we have encountered \"3\". We don't want to incorporate the padding values (1s).\n\nLuckily, PyTorch has convenient helper functions called `pack_padded_sequence` and `pad_packed_sequence`.\nThese functions take care of masking and padding, so that the resulting word representations are simply zeros after a sentence stops.\n\nThe code below reads in a source sentence (a sequence of word embeddings) and produces the hidden states.\nIt also returns a final vector, a summary of the complete sentence, by concatenating the final hidden states of the forward and the backward GRU (they have both seen the whole sentence, each in a different direction). 
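
A minimal, self-contained sketch of the packing behaviour just described, assuming a recent PyTorch install; the toy sizes, the random inputs and names such as `lengths` and `summary` are illustrative choices, not part of the tutorial code. It pads a batch with lengths 7, 5 and 3, runs a two-layer bidirectional GRU over the packed batch, and builds the summary vector from the forward and backward final states.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Toy sizes, chosen only for illustration (hypothetical, not from the tutorial)
batch, max_len, emb_size, hid, num_layers = 3, 7, 4, 5, 2
lengths = torch.tensor([7, 5, 3])  # sorted longest first, as the Encoder expects
x = torch.randn(batch, max_len, emb_size)  # padded embeddings [batch, time, dim]

rnn = nn.GRU(emb_size, hid, num_layers, batch_first=True, bidirectional=True)

# Packing tells the GRU where each sentence really ends, so padding is skipped
packed = pack_padded_sequence(x, lengths, batch_first=True)
with torch.no_grad():
    output, final = rnn(packed)
output, _ = pad_packed_sequence(output, batch_first=True)

# output: [batch, max_len, 2 * hid]; positions past each length are zero-padded
# final:  [num_layers * 2, batch, hid], ordered layer0-fwd, layer0-bwd, layer1-fwd, ...
fwd_final = final[0::2]  # forward direction of every layer
bwd_final = final[1::2]  # backward direction of every layer
summary = torch.cat([fwd_final, bwd_final], dim=2)  # [num_layers, batch, 2 * hid]

print(output.shape, summary.shape)
```

After unpacking, the padded positions of `output` come out as zeros, and for each sentence the top-layer forward final state equals the forward half of `output` at its last real time step, while the backward final state equals the backward half at time step 0. The `final` tensor is laid out as `[num_layers * 2, batch, hidden]` with the two directions interleaved per layer, which is why slicing with a step of 2 separates them.
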
We will use the final vector to initialize the decoder.\n\n\n```python\nclass Encoder(nn.Module):\n    \"\"\"Encodes a sequence of word embeddings\"\"\"\n    def __init__(self, input_size, hidden_size, num_layers=1, dropout=0.):\n        super(Encoder, self).__init__()\n        self.num_layers = num_layers\n        self.rnn = nn.GRU(input_size, hidden_size, num_layers, \n                          batch_first=True, bidirectional=True, dropout=dropout)\n        \n    def forward(self, x, mask, lengths):\n        \"\"\"\n        Applies a bidirectional GRU to a sequence of embeddings x.\n        The input mini-batch x needs to be sorted by length (longest first).\n        x should have dimensions [batch, time, dim].\n        \"\"\"\n        packed = pack_padded_sequence(x, lengths, batch_first=True)\n        output, final = self.rnn(packed)\n        output, _ = pad_packed_sequence(output, batch_first=True)\n\n        # final is [num_layers*2, batch, dim] with the two directions interleaved,\n        # so we need to manually concatenate the final states for both directions\n        fwd_final = final[0:final.size(0):2]\n        bwd_final = final[1:final.size(0):2]\n        final = torch.cat([fwd_final, bwd_final], dim=2)  # [num_layers, batch, 2*dim]\n\n        return output, final\n```\n\n### Decoder\n\nThe decoder is a conditional GRU. Rather than starting with an empty state like the encoder, its initial hidden state results from a projection of the encoder final vector.\n\n#### Training\nIn `forward` you can find a for-loop that computes the decoder hidden states one time step at a time. \nNote that, during training, we know exactly what the target words should be! (They are in `trg_embed`.) This means that we are not even checking here what the prediction is! We simply feed the correct previous target word embedding to the GRU at each time step. This is called teacher forcing.\n\nThe `forward` function returns all decoder hidden states and pre-output vectors. 
Elsewhere these are used to compute the loss, after which the parameters are updated.\n\n#### Prediction\nAt prediction time, the `forward` function is only used for a single time step. After predicting a word from the returned pre-output vector, we can call it again, supplying it the word embedding of the previously predicted word and the last state.\n\n\n```python\nclass Decoder(nn.Module):\n    \"\"\"A conditional RNN decoder with attention.\"\"\"\n    \n    def __init__(self, emb_size, hidden_size, attention, num_layers=1, dropout=0.5,\n                 bridge=True):\n        super(Decoder, self).__init__()\n        \n        self.hidden_size = hidden_size\n        self.num_layers = num_layers\n        self.attention = attention\n        self.dropout = dropout\n                 \n        self.rnn = nn.GRU(emb_size + 2*hidden_size, hidden_size, num_layers,\n                          batch_first=True, dropout=dropout)\n                 \n        # to initialize from the final encoder state\n        self.bridge = nn.Linear(2*hidden_size, hidden_size, bias=True) if bridge else None\n\n        self.dropout_layer = nn.Dropout(p=dropout)\n        self.pre_output_layer = nn.Linear(hidden_size + 2*hidden_size + emb_size,\n                                          hidden_size, bias=False)\n        \n    def forward_step(self, prev_embed, encoder_hidden, src_mask, proj_key, hidden):\n        \"\"\"Perform a single decoder step (1 word)\"\"\"\n\n        # compute context vector using attention mechanism\n        query = hidden[-1].unsqueeze(1)  # [#layers, B, D] -> [B, 1, D]\n        context, attn_probs = self.attention(\n            query=query, proj_key=proj_key,\n            value=encoder_hidden, mask=src_mask)\n\n        # update rnn hidden state\n        rnn_input = torch.cat([prev_embed, context], dim=2)\n        output, hidden = self.rnn(rnn_input, hidden)\n        \n        pre_output = torch.cat([prev_embed, output, context], dim=2)\n        pre_output = 
self.dropout_layer(pre_output)\n        pre_output = self.pre_output_layer(pre_output)\n\n        return output, hidden, pre_output\n    \n    def forward(self, trg_embed, encoder_hidden, encoder_final, \n                src_mask, trg_mask, hidden=None, max_len=None):\n        \"\"\"Unroll the decoder one step at a time.\"\"\"\n\n        # the maximum number of steps to unroll the RNN\n        if max_len is None:\n            max_len = trg_mask.size(-1)\n\n        # initialize decoder hidden state\n        if hidden is None:\n            hidden = self.init_hidden(encoder_final)\n        \n        # pre-compute projected encoder hidden states\n        # (the \"keys\" for the attention mechanism)\n        # this is only done for efficiency\n        proj_key = self.attention.key_layer(encoder_hidden)\n        \n        # here we store all intermediate hidden states and pre-output vectors\n        decoder_states = []\n        pre_output_vectors = []\n        \n        # unroll the decoder RNN for max_len steps\n        for i in range(max_len):\n            prev_embed = trg_embed[:, i].unsqueeze(1)\n            output, hidden, pre_output = self.forward_step(\n              prev_embed, encoder_hidden, src_mask, proj_key, hidden)\n            decoder_states.append(output)\n            pre_output_vectors.append(pre_output)\n\n        decoder_states = torch.cat(decoder_states, dim=1)\n        pre_output_vectors = torch.cat(pre_output_vectors, dim=1)\n        return decoder_states, hidden, pre_output_vectors  # [B, N, D]\n\n    def init_hidden(self, encoder_final):\n        \"\"\"Returns the initial decoder state,\n        conditioned on the final encoder state.\"\"\"\n\n        # without a final encoder state or a bridge layer, start with zeros\n        if encoder_final is None or self.bridge is None:\n            return None\n\n        return torch.tanh(self.bridge(encoder_final))\n\n```\n\n### Attention
\n\nAt every time step, the decoder has access to *all* source word representations $$\\mathbf{h}_1, \\dots, \\mathbf{h}_M$$. \nAn attention mechanism allows the model to focus on the currently most relevant part of the source sentence.\nThe state of the decoder is represented by the GRU hidden state $$\\mathbf{s}_i$$.\nSo if we want to know which source word representation(s) $$\\mathbf{h}_j$$ are most relevant, we need a function that takes those two things as input.\n\nHere we use the MLP-based, additive attention that was used in Bahdanau et al.:\n\n<img src=\"images/attention.png\" width=\"280\">\n\n\nWe apply an MLP with tanh activation to both the current decoder state $$\\bf s_i$$ (the *query*) and each encoder state $$\\bf h_j$$ (the *key*), and then project the result to a single value (i.e. a scalar) to get the *attention energy* $$e_{ij}$$:\n\n$$e_{ij} = \\mathbf{v}^\\top \\tanh(\\mathbf{W}_q \\mathbf{s}_i + \\mathbf{W}_k \\mathbf{h}_j)$$\n\nwhere $$\\mathbf{W}_q$$, $$\\mathbf{W}_k$$ and $$\\mathbf{v}$$ are learned parameters, corresponding to `query_layer`, `key_layer` and `energy_layer` in the code below.\n\nOnce all energies are computed, they are normalized by a softmax so that they sum to one: \n\n$$ \\alpha_{ij} = \\text{softmax}(\\mathbf{e}_i)[j] $$\n\n$$\\sum_j \\alpha_{ij} = 1.0$$ \n\nThe context vector for time step $$i$$ is then a weighted sum of the encoder hidden states (the *values*):\n$$\\mathbf{c}_i = \\sum_j \\alpha_{ij} \\mathbf{h}_j$$\n\n\n```python\nclass BahdanauAttention(nn.Module):\n    \"\"\"Implements Bahdanau (MLP) attention\"\"\"\n    \n    def __init__(self, hidden_size, key_size=None, query_size=None):\n        super(BahdanauAttention, self).__init__()\n        \n        # We assume a bi-directional encoder so key_size is 2*hidden_size\n        key_size = 2 * hidden_size if key_size is None else key_size\n        query_size = hidden_size if query_size is None else query_size\n\n        self.key_layer = nn.Linear(key_size, hidden_size, bias=False)\n        self.query_layer = nn.Linear(query_size, hidden_size, bias=False)\n        self.energy_layer = nn.Linear(hidden_size, 1, bias=False)\n        \n        # to 
store attention scores\n        self.alphas = None\n        \n    def forward(self, query=None, proj_key=None, value=None, mask=None):\n        assert mask is not None, \"mask is required\"\n\n        # We first project the query (the decoder state).\n        # The projected keys (the encoder states) were already pre-computed.\n        query = self.query_layer(query)\n        \n        # Calculate scores.\n        scores = self.energy_layer(torch.tanh(query + proj_key))\n        scores = scores.squeeze(2).unsqueeze(1)  # [B, M, 1] -> [B, 1, M]\n        \n        # Mask out invalid positions.\n        # The mask marks valid positions, so we set the scores where `mask == 0` to -inf.\n        scores = scores.masked_fill(mask == 0, -float('inf'))\n        \n        # Turn scores to probabilities.\n        alphas = F.softmax(scores, dim=-1)\n        self.alphas = alphas\n        \n        # The context vector is the weighted sum of the values.\n        context = torch.bmm(alphas, value)\n        \n        # context shape: [B, 1, 2D], alphas shape: [B, 1, M]\n        return context, alphas\n```\n\n## Embeddings and Softmax\n\nWe use learned embeddings to convert the input tokens and output tokens to vectors of dimension `emb_size`.\n\nWe will simply use PyTorch's [nn.Embedding](https://pytorch.org/docs/stable/nn.html?highlight=embedding#torch.nn.Embedding) class.\n\n## Full Model\n\nHere we define a function from hyperparameters to a full model. 
\n\n\n```python\ndef make_model(src_vocab, tgt_vocab, emb_size=256, hidden_size=512, num_layers=1, dropout=0.1):\n    \"Helper: Construct a model from hyperparameters.\"\n\n    attention = BahdanauAttention(hidden_size)\n\n    model = EncoderDecoder(\n        Encoder(emb_size, hidden_size, num_layers=num_layers, dropout=dropout),\n        Decoder(emb_size, hidden_size, attention, num_layers=num_layers, dropout=dropout),\n        nn.Embedding(src_vocab, emb_size),\n        nn.Embedding(tgt_vocab, emb_size),\n        Generator(hidden_size, tgt_vocab))\n\n    return model.cuda() if USE_CUDA else model\n```\n\n# Training\n\nThis section describes the training regime for our models.\n\nWe stop for a quick interlude to introduce some of the tools \nneeded to train a standard encoder-decoder model. First we define a batch object that holds the source and target sentences for training, as well as their lengths and masks. \n\n## Batches and Masking\n\n\n```python\nclass Batch:\n    \"\"\"Object for holding a batch of data with mask during training.\n    Input is a batch from a torchtext iterator.\n    \"\"\"\n    def __init__(self, src, trg, pad_index=0):\n        \n        src, src_lengths = src\n        \n        self.src = src\n        self.src_lengths = src_lengths\n        self.src_mask = (src != pad_index).unsqueeze(-2)\n        self.nseqs = src.size(0)\n        \n        self.trg = None\n        self.trg_y = None\n        self.trg_mask = None\n        self.trg_lengths = None\n        self.ntokens = None\n\n        if trg is not None:\n            trg, trg_lengths = trg\n            self.trg = trg[:, :-1]  # decoder input: drop the last token\n            self.trg_lengths = trg_lengths\n            self.trg_y = trg[:, 1:]  # gold output: drop the leading <s>\n            self.trg_mask = (self.trg_y != pad_index)\n            self.ntokens = (self.trg_y != pad_index).data.sum().item()\n        \n        if USE_CUDA:\n            self.src = self.src.cuda()\n            self.src_mask = self.src_mask.cuda()\n\n            if trg is not None:\n      
          self.trg = self.trg.cuda()\n                self.trg_y = self.trg_y.cuda()\n                self.trg_mask = self.trg_mask.cuda()\n                \n```\n\n## Training Loop\nThe code below runs the model for one epoch (one pass through the data). If `loss_compute` holds an optimizer, the parameters are updated; otherwise only the loss is computed. The function returns the perplexity over all target tokens.\n\n\n```python\ndef run_epoch(data_iter, model, loss_compute, print_every=50):\n    \"\"\"Standard Training and Logging Function\"\"\"\n\n    start = time.time()\n    total_tokens = 0\n    total_loss = 0\n    print_tokens = 0\n\n    for i, batch in enumerate(data_iter, 1):\n        \n        out, _, pre_output = model.forward(batch.src, batch.trg,\n                                           batch.src_mask, batch.trg_mask,\n                                           batch.src_lengths, batch.trg_lengths)\n        loss = loss_compute(pre_output, batch.trg_y, batch.nseqs)\n        total_loss += loss\n        total_tokens += batch.ntokens\n        print_tokens += batch.ntokens\n        \n        if model.training and i % print_every == 0:\n            elapsed = time.time() - start\n            print(\"Epoch Step: %d Loss: %f Tokens per Sec: %f\" %\n                    (i, loss / batch.nseqs, print_tokens / elapsed))\n            start = time.time()\n            print_tokens = 0\n\n    return math.exp(total_loss / float(total_tokens))\n```\n\n## Training Data and Batching\n\nWe will use torchtext for batching. This is discussed in more detail below. \n\n## Optimizer\n\nWe will use the [Adam optimizer](https://arxiv.org/abs/1412.6980) with default settings ($$\\beta_1=0.9$$, $$\\beta_2=0.999$$ and $$\\epsilon=10^{-8}$$).\n\nWe will use 0.0003 as the learning rate here, but for different problems another learning rate may be more appropriate. You will have to tune that.\n\n# A First Example\n\nWe can begin by trying out a simple copy task. Given a random sequence of input symbols from a small vocabulary, the goal is to generate back those same symbols. 
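\n\nTo make the copy task concrete, here is a minimal plain-Python sketch (no PyTorch) of a single copy-task pair. It assumes `sos_index = 1` and reserves 0 for padding, as `data_gen` below does. It also mimics the shift performed by the `Batch` class: the decoder input drops the last target token, while the gold output drops the leading `<s>`. The helper name `make_copy_pair` is hypothetical.\n\n```python\nimport random\n\n\ndef make_copy_pair(num_words=11, length=10, sos_index=1, seed=0):\n    # Hypothetical helper: one copy-task example, mirroring data_gen.\n    rng = random.Random(seed)\n    # draw length - 1 symbols from 1..num_words-1 (0 is reserved for padding)\n    src = [rng.randint(1, num_words - 1) for _ in range(length - 1)]\n    trg = [sos_index] + src  # the target is <s> followed by the source\n    return src, trg\n\n\nsrc, trg = make_copy_pair()\ndecoder_input = trg[:-1]  # like Batch.trg: starts with <s>, drops the last token\ngold_output = trg[1:]     # like Batch.trg_y: the tokens the decoder must predict\nprint('src :', src)\nprint('in  :', decoder_input)\nprint('gold:', gold_output)\n```\n\nNote that `gold_output` equals `src`: at every position the model is fed the previous symbol and must predict the current one.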
\n\n## Synthetic Data\n\n\n```python\ndef data_gen(num_words=11, batch_size=16, num_batches=100, length=10, pad_index=0, sos_index=1):\n    \"\"\"Generate random data for a src-tgt copy task.\"\"\"\n    for i in range(num_batches):\n        data = torch.from_numpy(\n          np.random.randint(1, num_words, size=(batch_size, length)))\n        data[:, 0] = sos_index\n        data = data.cuda() if USE_CUDA else data\n        src = data[:, 1:]\n        trg = data\n        src_lengths = [length-1] * batch_size\n        trg_lengths = [length] * batch_size\n        yield Batch((src, src_lengths), (trg, trg_lengths), pad_index=pad_index)\n```\n\n## Loss Computation\n\n\n```python\nclass SimpleLossCompute:\n    \"\"\"A simple loss compute and train function.\"\"\"\n\n    def __init__(self, generator, criterion, opt=None):\n        self.generator = generator\n        self.criterion = criterion\n        self.opt = opt\n\n    def __call__(self, x, y, norm):\n        x = self.generator(x)\n        loss = self.criterion(x.contiguous().view(-1, x.size(-1)),\n                              y.contiguous().view(-1))\n        loss = loss / norm  # norm is the number of sequences (see run_epoch)\n\n        if self.opt is not None:\n            loss.backward()\n            self.opt.step()\n            self.opt.zero_grad()\n\n        return loss.data.item() * norm  # the summed (un-normalized) loss\n```\n\n### Printing examples\n\nTo monitor progress during training, we will translate a few examples.\n\nWe use greedy decoding for simplicity; that is, at each time step, starting at the first token, we choose the word with the maximum probability, and we never revisit that choice. 
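\n\nAs a minimal sketch of this idea, independent of the model above, the toy decoder below replaces the network with a hypothetical bigram table `next_probs`: row `t` holds made-up next-token probabilities after token `t`, with assumed ids 1 for `<s>` and 2 for `</s>`. At each step it takes the argmax and stops at `</s>`. (`greedy_decode` below instead always runs `max_len` steps and cuts the output at the first `eos_index` afterwards; for greedy search both give the same result.)\n\n```python\nimport numpy as np\n\n# Hypothetical toy 'model': row t holds next-token probabilities after token t.\n# Assumed ids: 0 = padding, 1 = <s>, 2 = </s>, 3 and 4 = ordinary words.\nnext_probs = np.array([\n    [0.2, 0.2, 0.2, 0.2, 0.2],  # after 0 (unused)\n    [0.0, 0.0, 0.1, 0.6, 0.3],  # after <s>\n    [0.2, 0.2, 0.2, 0.2, 0.2],  # after </s> (unused)\n    [0.0, 0.0, 0.3, 0.1, 0.6],  # after word 3\n    [0.0, 0.0, 0.7, 0.2, 0.1],  # after word 4\n])\n\n\ndef toy_greedy_decode(probs, sos_index=1, eos_index=2, max_len=10):\n    output = []\n    prev = sos_index\n    for _ in range(max_len):\n        # greedy: commit to the most probable token and never revisit it\n        next_word = int(np.argmax(probs[prev]))\n        if next_word == eos_index:\n            break  # stop at </s>\n        output.append(next_word)\n        prev = next_word\n    return output\n\n\nprint(toy_greedy_decode(next_probs))  # [3, 4]\n```\n\nBecause an early choice is never reconsidered, greedy search is not guaranteed to find the most probable sequence overall; it is used here because it is simple and fast.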
\n\n\n```python\ndef greedy_decode(model, src, src_mask, src_lengths, max_len=100, sos_index=1, eos_index=None):\n    \"\"\"Greedily decode a sentence.\"\"\"\n\n    with torch.no_grad():\n        encoder_hidden, encoder_final = model.encode(src, src_mask, src_lengths)\n        prev_y = torch.ones(1, 1).fill_(sos_index).type_as(src)\n        trg_mask = torch.ones_like(prev_y)\n\n    output = []\n    attention_scores = []\n    hidden = None\n\n    for i in range(max_len):\n        with torch.no_grad():\n            out, hidden, pre_output = model.decode(\n              encoder_hidden, encoder_final, src_mask,\n              prev_y, trg_mask, hidden)\n\n            # we predict from the pre-output layer, which is\n            # a combination of Decoder state, prev emb, and context\n            prob = model.generator(pre_output[:, -1])\n\n        _, next_word = torch.max(prob, dim=1)\n        next_word = next_word.data.item()\n        output.append(next_word)\n        prev_y = torch.ones(1, 1).type_as(src).fill_(next_word)\n        attention_scores.append(model.decoder.attention.alphas.cpu().numpy())\n    \n    output = np.array(output)\n        \n    # cut off everything starting from </s> \n    # (only when eos_index provided)\n    if eos_index is not None:\n        first_eos = np.where(output==eos_index)[0]\n        if len(first_eos) > 0:\n            output = output[:first_eos[0]]      \n    \n    return output, np.concatenate(attention_scores, axis=1)\n  \n\ndef lookup_words(x, vocab=None):\n    if vocab is not None:\n        x = [vocab.itos[i] for i in x]\n\n    return [str(t) for t in x]\n```\n\n\n```python\ndef print_examples(example_iter, model, n=2, max_len=100, \n                   sos_index=1, \n                   src_eos_index=None, \n                   trg_eos_index=None, \n                   src_vocab=None, trg_vocab=None):\n    \"\"\"Prints N examples. 
Assumes batch size of 1.\"\"\"\n\n    model.eval()\n    count = 0\n    print()\n    \n    if src_vocab is not None and trg_vocab is not None:\n        src_eos_index = src_vocab.stoi[EOS_TOKEN]\n        trg_sos_index = trg_vocab.stoi[SOS_TOKEN]\n        trg_eos_index = trg_vocab.stoi[EOS_TOKEN]\n    else:\n        src_eos_index = None\n        trg_sos_index = 1\n        trg_eos_index = None\n        \n    for i, batch in enumerate(example_iter):\n      \n        src = batch.src.cpu().numpy()[0, :]\n        trg = batch.trg_y.cpu().numpy()[0, :]\n\n        # remove </s> (if it is there)\n        src = src[:-1] if src[-1] == src_eos_index else src\n        trg = trg[:-1] if trg[-1] == trg_eos_index else trg      \n      \n        result, _ = greedy_decode(\n          model, batch.src, batch.src_mask, batch.src_lengths,\n          max_len=max_len, sos_index=trg_sos_index, eos_index=trg_eos_index)\n        print(\"Example #%d\" % (i+1))\n        print(\"Src : \", \" \".join(lookup_words(src, vocab=src_vocab)))\n        print(\"Trg : \", \" \".join(lookup_words(trg, vocab=trg_vocab)))\n        print(\"Pred: \", \" \".join(lookup_words(result, vocab=trg_vocab)))\n        print()\n        \n        count += 1\n        if count == n:\n            break\n```\n\n## Training the copy task\n\n\n```python\ndef train_copy_task():\n    \"\"\"Train the simple copy task.\"\"\"\n    num_words = 11\n    criterion = nn.NLLLoss(reduction=\"sum\", ignore_index=0)\n    model = make_model(num_words, num_words, emb_size=32, hidden_size=64)\n    optim = torch.optim.Adam(model.parameters(), lr=0.0003)\n    eval_data = list(data_gen(num_words=num_words, batch_size=1, num_batches=100))\n \n    dev_perplexities = []\n    \n    if USE_CUDA:\n        model.cuda()\n\n    for epoch in range(10):\n        \n        print(\"Epoch %d\" % epoch)\n\n        # train\n        model.train()\n        data = data_gen(num_words=num_words, batch_size=32, num_batches=100)\n        run_epoch(data, model,\n         
         SimpleLossCompute(model.generator, criterion, optim))\n\n        # evaluate\n        model.eval()\n        with torch.no_grad(): \n            perplexity = run_epoch(eval_data, model,\n                                   SimpleLossCompute(model.generator, criterion, None))\n            print(\"Evaluation perplexity: %f\" % perplexity)\n            dev_perplexities.append(perplexity)\n            print_examples(eval_data, model, n=2, max_len=9)\n        \n    return dev_perplexities\n```\n\n\n```python\n# train the copy task\ndev_perplexities = train_copy_task()\n\ndef plot_perplexity(perplexities):\n    \"\"\"plot perplexities\"\"\"\n    plt.title(\"Perplexity per Epoch\")\n    plt.xlabel(\"Epoch\")\n    plt.ylabel(\"Perplexity\")\n    plt.plot(perplexities)\n    \nplot_perplexity(dev_perplexities)\n```\n\n    /home/jb/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1\n      \"num_layers={}\".format(dropout, num_layers))\n\n\n    Epoch 0\n    Epoch Step: 50 Loss: 19.887581 Tokens per Sec: 7748.957397\n    Epoch Step: 100 Loss: 17.856726 Tokens per Sec: 7925.338918\n    Evaluation perplexity: 7.172198\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  8 3 7 5 8 3 7 5 8\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 8 8 8 8 8 8 8\n    \n    Epoch 1\n    Epoch Step: 50 Loss: 15.715487 Tokens per Sec: 8662.903188\n    Epoch Step: 100 Loss: 12.368280 Tokens per Sec: 7860.172940\n    Evaluation perplexity: 3.709498\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 7 5 10 8 7 5 7\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 5 6 2 6 8 2 5\n    \n    Epoch 2\n    Epoch Step: 50 Loss: 
9.246480 Tokens per Sec: 7971.095313\n    Epoch Step: 100 Loss: 7.701921 Tokens per Sec: 7876.198908\n    Evaluation perplexity: 2.303158\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 7 3 10 5 8 7 5\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 5 6 2 6 8 5 2\n    \n    Epoch 3\n    Epoch Step: 50 Loss: 6.166847 Tokens per Sec: 8069.631171\n    Epoch Step: 100 Loss: 5.673258 Tokens per Sec: 7855.858586\n    Evaluation perplexity: 1.775795\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 7 5 10 3 7 8 5\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 3 6 5 2 8 6 8\n    \n    Epoch 4\n    Epoch Step: 50 Loss: 4.830031 Tokens per Sec: 8094.515152\n    Epoch Step: 100 Loss: 4.152125 Tokens per Sec: 7999.315744\n    Evaluation perplexity: 1.572305\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 5 7 10 3 7 8 5\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 3 6 5 2 8 6 2\n    \n    Epoch 5\n    Epoch Step: 50 Loss: 3.638369 Tokens per Sec: 8112.868501\n    Epoch Step: 100 Loss: 3.784709 Tokens per Sec: 7843.288141\n    Evaluation perplexity: 1.433951\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 7 5 3 10 7 8 7\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 3 6 5 2 8 6 2\n    \n    Epoch 6\n    Epoch Step: 50 Loss: 2.802792 Tokens per Sec: 8128.952327\n    Epoch Step: 100 Loss: 2.403310 Tokens per Sec: 7893.746819\n    Evaluation perplexity: 1.284198\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 5 7 10 3 7 8 5\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 3 6 5 2 
8 6 2\n    \n    Epoch 7\n    Epoch Step: 50 Loss: 2.174423 Tokens per Sec: 8181.341663\n    Epoch Step: 100 Loss: 1.838792 Tokens per Sec: 7833.160747\n    Evaluation perplexity: 1.173110\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 5 7 10 3 7 8 5\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 3 6 5 2 8 6 2\n    \n    Epoch 8\n    Epoch Step: 50 Loss: 1.226522 Tokens per Sec: 8267.548130\n    Epoch Step: 100 Loss: 1.090876 Tokens per Sec: 7842.856308\n    Evaluation perplexity: 1.123090\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 5 7 10 3 7 8 5\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 3 6 5 2 8 6 2\n    \n    Epoch 9\n    Epoch Step: 50 Loss: 1.216270 Tokens per Sec: 8181.132215\n    Epoch Step: 100 Loss: 0.636999 Tokens per Sec: 7866.309111\n    Evaluation perplexity: 1.088564\n    \n    Example #1\n    Src :  4 8 5 7 10 3 7 8 5\n    Trg :  4 8 5 7 10 3 7 8 5\n    Pred:  4 8 5 7 10 3 7 8 5\n    \n    Example #2\n    Src :  8 8 3 6 5 2 8 6 2\n    Trg :  8 8 3 6 5 2 8 6 2\n    Pred:  8 8 3 6 5 2 8 6 2\n    \n\n\n\n![Perplexity per epoch](images/output_36_2.png)\n\n\nYou can see that the model managed to correctly 'translate' the two examples in the end.\n\nMoreover, the perplexity of the development data steadily went down towards 1 (about 1.09 after the last epoch).\n\n# A Real-World Example\n\nNow we consider a real-world example using the IWSLT German-English translation task. \nThis task is much smaller than typical large-scale translation benchmarks such as WMT, but it illustrates the whole system. \n\nThe cell below installs torchtext and spaCy. 
This might take a while.\n\n\n```python\n#!pip install git+git://github.com/pytorch/text spacy \n#!python -m spacy download en\n#!python -m spacy download de\n```\n\n## Data Loading\n\nWe will load the dataset using torchtext, and use spaCy for tokenization.\n\nThis cell might take a while to run the first time, as it will download and tokenize the IWSLT data.\n\nFor speed we only include short sentences, and we include a word in the vocabulary only if it occurs at least 5 times. In this case we also lowercase the data.\n\nThe code below uses the original torchtext API (`data.Field`, `datasets.IWSLT`, `data.BucketIterator`). Recent torchtext releases no longer include this API, so you may need an older torchtext version to run it.\n\nIf you have **issues** with torchtext in the cell below (e.g. an `ascii` error), try running `export LC_ALL=\"en_US.UTF-8\"` before you start `jupyter notebook`.\n\n\n```python\n# For data loading.\nfrom torchtext import data, datasets\n\nif True:\n    import spacy\n    spacy_de = spacy.load('de')\n    spacy_en = spacy.load('en')\n\n    def tokenize_de(text):\n        return [tok.text for tok in spacy_de.tokenizer(text)]\n\n    def tokenize_en(text):\n        return [tok.text for tok in spacy_en.tokenizer(text)]\n\n    UNK_TOKEN = \"<unk>\"\n    PAD_TOKEN = \"<pad>\"\n    SOS_TOKEN = \"<s>\"\n    EOS_TOKEN = \"</s>\"\n    LOWER = True\n    \n    # we include lengths to provide to the RNNs\n    SRC = data.Field(tokenize=tokenize_de, \n                     batch_first=True, lower=LOWER, include_lengths=True,\n                     unk_token=UNK_TOKEN, pad_token=PAD_TOKEN, init_token=None, eos_token=EOS_TOKEN)\n    TRG = data.Field(tokenize=tokenize_en, \n                     batch_first=True, lower=LOWER, include_lengths=True,\n                     unk_token=UNK_TOKEN, pad_token=PAD_TOKEN, init_token=SOS_TOKEN, eos_token=EOS_TOKEN)\n\n    MAX_LEN = 25  # NOTE: we filter out a lot of sentences for speed\n    train_data, valid_data, test_data = datasets.IWSLT.splits(\n        exts=('.de', '.en'), fields=(SRC, TRG), \n        filter_pred=lambda x: len(vars(x)['src']) <= MAX_LEN and \n            len(vars(x)['trg']) <= MAX_LEN)\n    MIN_FREQ = 5  # 
NOTE: we limit the vocabulary to frequent words for speed\n    SRC.build_vocab(train_data.src, min_freq=MIN_FREQ)\n    TRG.build_vocab(train_data.trg, min_freq=MIN_FREQ)\n    \n    PAD_INDEX = TRG.vocab.stoi[PAD_TOKEN]\n\n```\n\n### Let's look at the data\n\nIt never hurts to look at your data and some statistics.\n\n\n```python\ndef print_data_info(train_data, valid_data, test_data, src_field, trg_field):\n    \"\"\" This prints some useful stuff about our data sets. \"\"\"\n\n    print(\"Data set sizes (number of sentence pairs):\")\n    print('train', len(train_data))\n    print('valid', len(valid_data))\n    print('test', len(test_data), \"\\n\")\n\n    print(\"First training example:\")\n    print(\"src:\", \" \".join(vars(train_data[0])['src']))\n    print(\"trg:\", \" \".join(vars(train_data[0])['trg']), \"\\n\")\n\n    print(\"Most common words (src):\")\n    print(\"\\n\".join([\"%10s %10d\" % x for x in src_field.vocab.freqs.most_common(10)]), \"\\n\")\n    print(\"Most common words (trg):\")\n    print(\"\\n\".join([\"%10s %10d\" % x for x in trg_field.vocab.freqs.most_common(10)]), \"\\n\")\n\n    print(\"First 10 words (src):\")\n    print(\"\\n\".join(\n        '%02d %s' % (i, t) for i, t in enumerate(src_field.vocab.itos[:10])), \"\\n\")\n    print(\"First 10 words (trg):\")\n    print(\"\\n\".join(\n        '%02d %s' % (i, t) for i, t in enumerate(trg_field.vocab.itos[:10])), \"\\n\")\n\n    print(\"Number of German words (types):\", len(src_field.vocab))\n    print(\"Number of English words (types):\", len(trg_field.vocab), \"\\n\")\n    \n    \nprint_data_info(train_data, valid_data, test_data, SRC, TRG)\n```\n\n    Data set sizes (number of sentence pairs):\n    train 143116\n    valid 690\n    test 963 \n    \n    First training example:\n    src: david gallo : das ist bill lange . ich bin dave gallo .\n    trg: david gallo : this is bill lange . i 'm dave gallo . \n    \n    Most common words (src):\n             .     
138325\n             ,     105944\n           und      41839\n           die      40809\n           das      33324\n           sie      33035\n           ich      31153\n           ist      31035\n            es      27449\n           wir      25817 \n    \n    Most common words (trg):\n             .     137259\n             ,      91619\n           the      73344\n           and      50273\n            to      42798\n             a      39573\n            of      39496\n             i      33524\n            it      32921\n          that      32643 \n    \n    First 10 words (src):\n    00 <unk>\n    01 <pad>\n    02 </s>\n    03 .\n    04 ,\n    05 und\n    06 die\n    07 das\n    08 sie\n    09 ich \n    \n    First 10 words (trg):\n    00 <unk>\n    01 <pad>\n    02 <s>\n    03 </s>\n    04 .\n    05 ,\n    06 the\n    07 and\n    08 to\n    09 a \n    \n    Number of German words (types): 15761\n    Number of English words (types): 13003 \n    \n\n\n## Iterators\nBatching matters a lot for speed. We will use torchtext's `BucketIterator` here to get batches containing sentences of (almost) the same length.\n\n#### Note on sorting batches for RNNs in PyTorch\n\nFor efficiency reasons, PyTorch's `pack_padded_sequence` (which our encoder uses) by default expects the sentences in a batch to be sorted by length, with the longest sentence first. For training, we simply sort each batch. \nFor validation, we would run into trouble if we want to compare our translations with some external file that was not sorted. 
Therefore we simply set the validation batch size to 1, so that we can keep it in the original order.\n\n\n```python\ntrain_iter = data.BucketIterator(train_data, batch_size=64, train=True, \n                                 sort_within_batch=True, \n                                 sort_key=lambda x: (len(x.src), len(x.trg)), repeat=False,\n                                 device=DEVICE)\nvalid_iter = data.Iterator(valid_data, batch_size=1, train=False, sort=False, repeat=False, \n                           device=DEVICE)\n\n\ndef rebatch(pad_idx, batch):\n    \"\"\"Wrap torchtext batch into our own Batch class for pre-processing\"\"\"\n    return Batch(batch.src, batch.trg, pad_idx)\n```\n\n## Training the System\n\nNow we train the model. \n\nOn a Titan X GPU, this runs at ~18,000 tokens per second with a batch size of 64.\n\n\n```python\ndef train(model, num_epochs=10, lr=0.0003, print_every=100):\n    \"\"\"Train a model on IWSLT\"\"\"\n    \n    if USE_CUDA:\n        model.cuda()\n\n    # optionally add label smoothing; see the Annotated Transformer\n    criterion = nn.NLLLoss(reduction=\"sum\", ignore_index=PAD_INDEX)\n    optim = torch.optim.Adam(model.parameters(), lr=lr)\n    \n    dev_perplexities = []\n\n    for epoch in range(num_epochs):\n      \n        print(\"Epoch\", epoch)\n        model.train()\n        train_perplexity = run_epoch((rebatch(PAD_INDEX, b) for b in train_iter), \n                                     model,\n                                     SimpleLossCompute(model.generator, criterion, optim),\n                                     print_every=print_every)\n        \n        model.eval()\n        with torch.no_grad():\n            print_examples((rebatch(PAD_INDEX, x) for x in valid_iter), \n                           model, n=3, src_vocab=SRC.vocab, trg_vocab=TRG.vocab)        \n\n            dev_perplexity = run_epoch((rebatch(PAD_INDEX, b) for b in valid_iter), \n                                       model, \n                
                       SimpleLossCompute(model.generator, criterion, None))\n            print(\"Validation perplexity: %f\" % dev_perplexity)\n            dev_perplexities.append(dev_perplexity)\n        \n    return dev_perplexities\n        \n```\n\n\n```python\nmodel = make_model(len(SRC.vocab), len(TRG.vocab),\n                   emb_size=256, hidden_size=256,\n                   num_layers=1, dropout=0.2)\ndev_perplexities = train(model, print_every=100)\n```\n\n    Epoch 0\n\n\n    /home/jb/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.2 and num_layers=1\n      \"num_layers={}\".format(dropout, num_layers))\n\n\n    Epoch Step: 100 Loss: 22.353386 Tokens per Sec: 16007.731248\n    Epoch Step: 200 Loss: 34.410126 Tokens per Sec: 16368.906298\n    Epoch Step: 300 Loss: 44.763870 Tokens per Sec: 16586.324787\n    Epoch Step: 400 Loss: 57.584606 Tokens per Sec: 16717.486756\n    Epoch Step: 500 Loss: 40.508701 Tokens per Sec: 16486.886104\n    Epoch Step: 600 Loss: 51.919121 Tokens per Sec: 16529.862635\n    Epoch Step: 700 Loss: 82.279633 Tokens per Sec: 16973.462052\n    Epoch Step: 800 Loss: 35.026432 Tokens per Sec: 16724.939524\n    Epoch Step: 900 Loss: 63.407204 Tokens per Sec: 16606.524355\n    Epoch Step: 1000 Loss: 37.909828 Tokens per Sec: 19105.497130\n    Epoch Step: 1100 Loss: 90.584244 Tokens per Sec: 19643.264684\n    Epoch Step: 1200 Loss: 84.000832 Tokens per Sec: 19468.084935\n    Epoch Step: 1300 Loss: 54.331242 Tokens per Sec: 19679.282614\n    Epoch Step: 1400 Loss: 49.921040 Tokens per Sec: 19629.820942\n    Epoch Step: 1500 Loss: 21.851797 Tokens per Sec: 19565.639729\n    Epoch Step: 1600 Loss: 55.154270 Tokens per Sec: 19515.738007\n    Epoch Step: 1700 Loss: 40.758137 Tokens per Sec: 19486.791554\n    Epoch Step: 1800 Loss: 50.094219 Tokens per Sec: 19761.236905\n   
 Epoch Step: 1900 Loss: 90.545143 Tokens per Sec: 19447.650965\n    Epoch Step: 2000 Loss: 22.882494 Tokens per Sec: 19539.331538\n    Epoch Step: 2100 Loss: 99.448174 Tokens per Sec: 19278.704892\n    Epoch Step: 2200 Loss: 16.793839 Tokens per Sec: 19183.702688\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was born years old , i was a <unk> of the <unk> of the <unk> .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father was on his <unk> , the <unk> of the <unk> .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he was very interested in the way , what was pretty much more , and then it was the <unk> .\n    \n    Validation perplexity: 31.839708\n    Epoch 1\n    Epoch Step: 100 Loss: 4.451122 Tokens per Sec: 19110.156367\n    Epoch Step: 200 Loss: 11.262838 Tokens per Sec: 19538.253630\n    Epoch Step: 300 Loss: 55.240711 Tokens per Sec: 19584.509548\n    Epoch Step: 400 Loss: 54.733456 Tokens per Sec: 19787.183104\n    Epoch Step: 500 Loss: 38.923244 Tokens per Sec: 19385.772613\n    Epoch Step: 600 Loss: 63.162933 Tokens per Sec: 19013.165752\n    Epoch Step: 700 Loss: 47.323864 Tokens per Sec: 18863.104141\n    Epoch Step: 800 Loss: 43.414978 Tokens per Sec: 19258.337491\n    Epoch Step: 900 Loss: 87.750214 Tokens per Sec: 19179.949782\n    Epoch Step: 1000 Loss: 39.787056 Tokens per Sec: 19110.748464\n    Epoch Step: 1100 Loss: 78.177170 Tokens per Sec: 19272.044197\n    Epoch Step: 1200 Loss: 37.122997 
Tokens per Sec: 19194.535740\n    Epoch Step: 1300 Loss: 26.103378 Tokens per Sec: 19337.967366\n    Epoch Step: 1400 Loss: 78.804855 Tokens per Sec: 19018.413406\n    Epoch Step: 1500 Loss: 61.593956 Tokens per Sec: 19259.272095\n    Epoch Step: 1600 Loss: 81.611786 Tokens per Sec: 19259.527179\n    Epoch Step: 1700 Loss: 28.692696 Tokens per Sec: 19230.891840\n    Epoch Step: 1800 Loss: 84.163223 Tokens per Sec: 19071.272023\n    Epoch Step: 1900 Loss: 36.782116 Tokens per Sec: 19209.383788\n    Epoch Step: 2000 Loss: 56.666332 Tokens per Sec: 19127.522297\n    Epoch Step: 2100 Loss: 5.576357 Tokens per Sec: 18957.458966\n    Epoch Step: 2200 Loss: 38.791512 Tokens per Sec: 19166.811446\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 years old , i was a <unk> of the <unk> <unk> .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father was on his <unk> , in the little <unk> , the <unk> of the <unk> .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he saw very happy , what was pretty much , and it was the <unk> of the <unk> .\n    \n    Validation perplexity: 19.906190\n    Epoch 2\n    Epoch Step: 100 Loss: 58.981544 Tokens per Sec: 19121.747106\n    Epoch Step: 200 Loss: 34.874680 Tokens per Sec: 19689.768904\n    Epoch Step: 300 Loss: 27.895102 Tokens per Sec: 19751.401628\n    Epoch Step: 400 Loss: 52.931011 Tokens per Sec: 16369.447354\n    Epoch Step: 500 Loss: 77.191933 Tokens per Sec: 16337.808093\n   
 Epoch Step: 600 Loss: 65.645668 Tokens per Sec: 16307.871308\n    Epoch Step: 700 Loss: 7.141161 Tokens per Sec: 16420.432824\n    Epoch Step: 800 Loss: 76.990250 Tokens per Sec: 17512.558218\n    Epoch Step: 900 Loss: 43.835995 Tokens per Sec: 16399.672659\n    Epoch Step: 1000 Loss: 68.026192 Tokens per Sec: 16598.504664\n    Epoch Step: 1100 Loss: 23.746111 Tokens per Sec: 16368.137311\n    Epoch Step: 1200 Loss: 42.117832 Tokens per Sec: 16324.872475\n    Epoch Step: 1300 Loss: 47.894409 Tokens per Sec: 16532.223380\n    Epoch Step: 1400 Loss: 43.772861 Tokens per Sec: 16472.315811\n    Epoch Step: 1500 Loss: 60.978756 Tokens per Sec: 16368.088307\n    Epoch Step: 1600 Loss: 59.143227 Tokens per Sec: 16553.220745\n    Epoch Step: 1700 Loss: 34.091373 Tokens per Sec: 16557.579342\n    Epoch Step: 1800 Loss: 11.551711 Tokens per Sec: 16639.281663\n    Epoch Step: 1900 Loss: 40.060520 Tokens per Sec: 16666.679672\n    Epoch Step: 2000 Loss: 21.947863 Tokens per Sec: 16403.240568\n    Epoch Step: 2100 Loss: 12.891315 Tokens per Sec: 16656.630033\n    Epoch Step: 2200 Loss: 12.300262 Tokens per Sec: 16592.045153\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 years old , i was a <unk> of the <unk> of the <unk> .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father was on his little , <unk> , <unk> the <unk> of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he looked very happy to what was pretty 
much more , because it was the <unk> of the <unk> .\n    \n    Validation perplexity: 15.555337\n    Epoch 3\n    Epoch Step: 100 Loss: 36.178066 Tokens per Sec: 16064.364293\n    Epoch Step: 200 Loss: 20.046204 Tokens per Sec: 16557.065342\n    Epoch Step: 300 Loss: 53.514584 Tokens per Sec: 16375.767859\n    Epoch Step: 400 Loss: 29.280447 Tokens per Sec: 16687.195842\n    Epoch Step: 500 Loss: 64.491814 Tokens per Sec: 16491.438857\n    Epoch Step: 600 Loss: 62.286755 Tokens per Sec: 16443.863308\n    Epoch Step: 700 Loss: 60.861393 Tokens per Sec: 16303.304238\n    Epoch Step: 800 Loss: 25.101744 Tokens per Sec: 16437.206262\n    Epoch Step: 900 Loss: 41.884624 Tokens per Sec: 16712.862598\n    Epoch Step: 1000 Loss: 65.880905 Tokens per Sec: 16406.042864\n    Epoch Step: 1100 Loss: 34.799385 Tokens per Sec: 16257.804744\n    Epoch Step: 1200 Loss: 57.244125 Tokens per Sec: 16403.685499\n    Epoch Step: 1300 Loss: 6.766514 Tokens per Sec: 16262.412676\n    Epoch Step: 1400 Loss: 31.528254 Tokens per Sec: 16723.894609\n    Epoch Step: 1500 Loss: 4.534189 Tokens per Sec: 16512.533272\n    Epoch Step: 1600 Loss: 50.852787 Tokens per Sec: 16820.837828\n    Epoch Step: 1700 Loss: 30.657820 Tokens per Sec: 16574.791159\n    Epoch Step: 1800 Loss: 75.787910 Tokens per Sec: 16441.350335\n    Epoch Step: 1900 Loss: 23.563347 Tokens per Sec: 16836.284727\n    Epoch Step: 2000 Loss: 10.594786 Tokens per Sec: 16522.362683\n    Epoch Step: 2100 Loss: 40.561062 Tokens per Sec: 16508.617285\n    Epoch Step: 2200 Loss: 15.348518 Tokens per Sec: 16624.360367\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 11 years old , i was a <unk> of the <unk> <unk> joy .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my 
father was listening to bbc news on his small , gray radio .\n    Pred:  my father was on his little , <unk> , <unk> , the <unk> of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he saw very happy , what was pretty much , because it was the <unk> .\n    \n    Validation perplexity: 13.563748\n    Epoch 4\n    Epoch Step: 100 Loss: 9.601490 Tokens per Sec: 16309.901017\n    Epoch Step: 200 Loss: 13.329712 Tokens per Sec: 16693.352689\n    Epoch Step: 300 Loss: 61.213333 Tokens per Sec: 16774.275779\n    Epoch Step: 400 Loss: 37.759483 Tokens per Sec: 16628.037095\n    Epoch Step: 500 Loss: 35.616104 Tokens per Sec: 16677.874896\n    Epoch Step: 600 Loss: 58.753849 Tokens per Sec: 16452.736708\n    Epoch Step: 700 Loss: 11.741160 Tokens per Sec: 16615.759446\n    Epoch Step: 800 Loss: 24.230316 Tokens per Sec: 16804.673563\n    Epoch Step: 900 Loss: 27.786499 Tokens per Sec: 16373.396939\n    Epoch Step: 1000 Loss: 65.063515 Tokens per Sec: 16520.381173\n    Epoch Step: 1100 Loss: 34.756481 Tokens per Sec: 16492.656502\n    Epoch Step: 1200 Loss: 43.993877 Tokens per Sec: 17075.912389\n    Epoch Step: 1300 Loss: 36.514729 Tokens per Sec: 16812.641454\n    Epoch Step: 1400 Loss: 58.995735 Tokens per Sec: 16535.979640\n    Epoch Step: 1500 Loss: 29.516464 Tokens per Sec: 16500.141569\n    Epoch Step: 1600 Loss: 10.143467 Tokens per Sec: 16613.933279\n    Epoch Step: 1700 Loss: 53.287037 Tokens per Sec: 16756.922926\n    Epoch Step: 1800 Loss: 24.687494 Tokens per Sec: 16477.783348\n    Epoch Step: 1900 Loss: 21.578268 Tokens per Sec: 16808.344988\n    Epoch Step: 2000 Loss: 60.965946 Tokens per Sec: 16651.623717\n    Epoch Step: 2100 Loss: 18.895075 Tokens per Sec: 16636.292649\n    Epoch Step: 2200 Loss: 53.253704 Tokens per Sec: 16642.799323\n  
  \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 years old , i was a <unk> of the <unk> <unk> joy .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my dad listened on his little , <unk> radio <unk> the bbc of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he saw a happy very happy , which was pretty much , because he was the most famous <unk> .\n    \n    Validation perplexity: 12.664111\n    Epoch 5\n    Epoch Step: 100 Loss: 21.919912 Tokens per Sec: 16266.471497\n    Epoch Step: 200 Loss: 31.320656 Tokens per Sec: 16527.955427\n    Epoch Step: 300 Loss: 40.778984 Tokens per Sec: 16517.710752\n    Epoch Step: 400 Loss: 63.466324 Tokens per Sec: 16770.294841\n    Epoch Step: 500 Loss: 49.329956 Tokens per Sec: 16694.936223\n    Epoch Step: 600 Loss: 52.290169 Tokens per Sec: 16755.442966\n    Epoch Step: 700 Loss: 51.911785 Tokens per Sec: 16768.565847\n    Epoch Step: 800 Loss: 25.005857 Tokens per Sec: 16813.186507\n    Epoch Step: 900 Loss: 50.679825 Tokens per Sec: 17109.031968\n    Epoch Step: 1000 Loss: 13.069316 Tokens per Sec: 16692.984251\n    Epoch Step: 1100 Loss: 12.595688 Tokens per Sec: 16546.293379\n    Epoch Step: 1200 Loss: 46.846031 Tokens per Sec: 16491.379305\n    Epoch Step: 1300 Loss: 30.238283 Tokens per Sec: 16558.196936\n    Epoch Step: 1400 Loss: 23.865877 Tokens per Sec: 16556.353749\n    Epoch Step: 1500 Loss: 42.451859 Tokens per Sec: 16784.645679\n    Epoch Step: 1600 Loss: 
37.048477 Tokens per Sec: 16651.129133\n    Epoch Step: 1700 Loss: 17.043219 Tokens per Sec: 16655.630464\n    Epoch Step: 1800 Loss: 17.227308 Tokens per Sec: 16688.568658\n    Epoch Step: 1900 Loss: 23.672441 Tokens per Sec: 16609.439477\n    Epoch Step: 2000 Loss: 19.385946 Tokens per Sec: 16586.442474\n    Epoch Step: 2100 Loss: 25.717686 Tokens per Sec: 16879.694187\n    Epoch Step: 2200 Loss: 22.427767 Tokens per Sec: 16844.504307\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 years old , i was <unk> by the morning of joy .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father listened on his little , gray radio waves the bbc of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he saw a very happy ending , which was pretty unusual , since then they were <unk> .\n    \n    Validation perplexity: 12.246438\n    Epoch 6\n    Epoch Step: 100 Loss: 19.048712 Tokens per Sec: 19024.102757\n    Epoch Step: 200 Loss: 31.636736 Tokens per Sec: 19387.779254\n    Epoch Step: 300 Loss: 15.952754 Tokens per Sec: 19559.196457\n    Epoch Step: 400 Loss: 24.849632 Tokens per Sec: 18968.450791\n    Epoch Step: 500 Loss: 47.227837 Tokens per Sec: 19009.957585\n    Epoch Step: 600 Loss: 8.887992 Tokens per Sec: 19024.581918\n    Epoch Step: 700 Loss: 58.158920 Tokens per Sec: 16834.343585\n    Epoch Step: 800 Loss: 32.257362 Tokens per Sec: 16725.454783\n    Epoch Step: 900 Loss: 5.977044 Tokens per Sec: 
16398.470679\n    Epoch Step: 1000 Loss: 51.871101 Tokens per Sec: 16302.492231\n    Epoch Step: 1100 Loss: 44.715164 Tokens per Sec: 16505.477988\n    Epoch Step: 1200 Loss: 4.128096 Tokens per Sec: 19255.909773\n    Epoch Step: 1300 Loss: 53.065189 Tokens per Sec: 19016.853318\n    Epoch Step: 1400 Loss: 23.775473 Tokens per Sec: 18877.681861\n    Epoch Step: 1500 Loss: 15.587101 Tokens per Sec: 18916.694718\n    Epoch Step: 1600 Loss: 59.449795 Tokens per Sec: 19166.565245\n    Epoch Step: 1700 Loss: 48.393402 Tokens per Sec: 18836.264938\n    Epoch Step: 1800 Loss: 45.651253 Tokens per Sec: 18823.983316\n    Epoch Step: 1900 Loss: 51.898994 Tokens per Sec: 19015.027947\n    Epoch Step: 2000 Loss: 16.392334 Tokens per Sec: 19180.065119\n    Epoch Step: 2100 Loss: 20.312500 Tokens per Sec: 19059.061076\n    Epoch Step: 2200 Loss: 41.126842 Tokens per Sec: 19110.648056\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 , i was a <unk> of the <unk> <unk> joy .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father listened to his little , <unk> radio shack the <unk> of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he looked very happy , which was pretty unusual , and then they had the news <unk> .\n    \n    Validation perplexity: 12.045694\n    Epoch 7\n    Epoch Step: 100 Loss: 22.484320 Tokens per Sec: 19136.387726\n    Epoch Step: 200 Loss: 54.793003 Tokens per Sec: 19562.003455\n    Epoch Step: 
300 Loss: 52.516510 Tokens per Sec: 19494.585192\n    Epoch Step: 400 Loss: 25.631699 Tokens per Sec: 19127.415568\n    Epoch Step: 500 Loss: 15.818419 Tokens per Sec: 18909.082434\n    Epoch Step: 600 Loss: 40.660767 Tokens per Sec: 19063.824782\n    Epoch Step: 700 Loss: 21.253407 Tokens per Sec: 19011.780769\n    Epoch Step: 800 Loss: 9.494976 Tokens per Sec: 19032.447976\n    Epoch Step: 900 Loss: 21.503059 Tokens per Sec: 19120.646494\n    Epoch Step: 1000 Loss: 34.198826 Tokens per Sec: 18751.274337\n    Epoch Step: 1100 Loss: 21.471136 Tokens per Sec: 19119.629059\n    Epoch Step: 1200 Loss: 45.433662 Tokens per Sec: 19158.978952\n    Epoch Step: 1300 Loss: 48.697639 Tokens per Sec: 18852.568454\n    Epoch Step: 1400 Loss: 48.406239 Tokens per Sec: 19090.121092\n    Epoch Step: 1500 Loss: 10.506186 Tokens per Sec: 18996.606224\n    Epoch Step: 1600 Loss: 22.061657 Tokens per Sec: 18889.519602\n    Epoch Step: 1700 Loss: 11.148299 Tokens per Sec: 19179.133196\n    Epoch Step: 1800 Loss: 16.580446 Tokens per Sec: 19184.709044\n    Epoch Step: 1900 Loss: 20.219671 Tokens per Sec: 18889.205997\n    Epoch Step: 2000 Loss: 21.245464 Tokens per Sec: 18869.151894\n    Epoch Step: 2100 Loss: 29.567142 Tokens per Sec: 18825.496347\n    Epoch Step: 2200 Loss: 22.790722 Tokens per Sec: 18923.950021\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 years old , i was <unk> a <unk> of the <unk> <unk> joy .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father listened to his little , <unk> radio <unk> the <unk> of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da 
ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he looked very happy , which was pretty unusual , because he was going to put him in the <unk> .\n    \n    Validation perplexity: 11.837098\n    Epoch 8\n    Epoch Step: 100 Loss: 49.162842 Tokens per Sec: 19241.082862\n    Epoch Step: 200 Loss: 35.163906 Tokens per Sec: 19633.028114\n    Epoch Step: 300 Loss: 10.108455 Tokens per Sec: 17179.927672\n    Epoch Step: 400 Loss: 12.883712 Tokens per Sec: 16510.876579\n    Epoch Step: 500 Loss: 32.006828 Tokens per Sec: 16459.413702\n    Epoch Step: 600 Loss: 21.056961 Tokens per Sec: 16640.683528\n    Epoch Step: 700 Loss: 5.884560 Tokens per Sec: 16567.539919\n    Epoch Step: 800 Loss: 17.562445 Tokens per Sec: 16529.548052\n    Epoch Step: 900 Loss: 25.654568 Tokens per Sec: 16629.045928\n    Epoch Step: 1000 Loss: 30.116678 Tokens per Sec: 16519.515326\n    Epoch Step: 1100 Loss: 49.594883 Tokens per Sec: 16766.220937\n    Epoch Step: 1200 Loss: 35.545147 Tokens per Sec: 16729.972737\n    Epoch Step: 1300 Loss: 12.314122 Tokens per Sec: 16479.824355\n    Epoch Step: 1400 Loss: 5.982590 Tokens per Sec: 16592.352361\n    Epoch Step: 1500 Loss: 23.507740 Tokens per Sec: 16396.264595\n    Epoch Step: 1600 Loss: 36.874157 Tokens per Sec: 16554.722618\n    Epoch Step: 1700 Loss: 13.514697 Tokens per Sec: 16605.822594\n    Epoch Step: 1800 Loss: 6.016938 Tokens per Sec: 16390.681327\n    Epoch Step: 1900 Loss: 44.648132 Tokens per Sec: 16575.965569\n    Epoch Step: 2000 Loss: 21.025373 Tokens per Sec: 16363.246501\n    Epoch Step: 2100 Loss: 32.213993 Tokens per Sec: 16395.313089\n    Epoch Step: 2200 Loss: 29.033810 Tokens per Sec: 16528.855537\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    
Pred:  when i was 11 years old , i was <unk> a <unk> of the <unk> <unk> joy .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father listened to his little , gray radio shack , the radio of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he looked very happy , which was pretty unusual , because he was the news of the most famous .\n    \n    Validation perplexity: 11.868392\n    Epoch 9\n    Epoch Step: 100 Loss: 33.819195 Tokens per Sec: 16155.433696\n    Epoch Step: 200 Loss: 26.771244 Tokens per Sec: 16447.243194\n    Epoch Step: 300 Loss: 22.235714 Tokens per Sec: 16557.847083\n    Epoch Step: 400 Loss: 16.233931 Tokens per Sec: 16802.777289\n    Epoch Step: 500 Loss: 34.811615 Tokens per Sec: 16637.208199\n    Epoch Step: 600 Loss: 11.960271 Tokens per Sec: 16478.541533\n    Epoch Step: 700 Loss: 32.807648 Tokens per Sec: 16526.645827\n    Epoch Step: 800 Loss: 25.779436 Tokens per Sec: 16572.304586\n    Epoch Step: 900 Loss: 18.101871 Tokens per Sec: 16472.573763\n    Epoch Step: 1000 Loss: 34.465992 Tokens per Sec: 16489.131609\n    Epoch Step: 1100 Loss: 47.311241 Tokens per Sec: 16501.563937\n    Epoch Step: 1200 Loss: 22.709623 Tokens per Sec: 16416.828638\n    Epoch Step: 1300 Loss: 45.883862 Tokens per Sec: 16338.132985\n    Epoch Step: 1400 Loss: 21.321081 Tokens per Sec: 16680.505744\n    Epoch Step: 1500 Loss: 11.126824 Tokens per Sec: 16636.646687\n    Epoch Step: 1600 Loss: 32.759712 Tokens per Sec: 16440.968759\n    Epoch Step: 1700 Loss: 19.354910 Tokens per Sec: 16476.318234\n    Epoch Step: 1800 Loss: 14.631118 Tokens per Sec: 16490.663260\n    Epoch Step: 1900 Loss: 
2.233373 Tokens per Sec: 16390.177497\n    Epoch Step: 2000 Loss: 42.503407 Tokens per Sec: 16498.365808\n    Epoch Step: 2100 Loss: 35.935966 Tokens per Sec: 16257.764127\n    Epoch Step: 2200 Loss: 37.685387 Tokens per Sec: 16498.916279\n    \n    Example #1\n    Src :  als ich 11 jahre alt war , wurde ich eines morgens von den <unk> heller freude geweckt .\n    Trg :  when i was 11 , i remember waking up one morning to the sound of joy in my house .\n    Pred:  when i was 11 , i was a <unk> of <unk> <unk> joy .\n    \n    Example #2\n    Src :  mein vater hörte sich auf seinem kleinen , grauen radio die <unk> der bbc an .\n    Trg :  my father was listening to bbc news on his small , gray radio .\n    Pred:  my father listened to his little , gray radio shack the bbc of the bbc .\n    \n    Example #3\n    Src :  er sah sehr glücklich aus , was damals ziemlich ungewöhnlich war , da ihn die nachrichten meistens <unk> .\n    Trg :  there was a big smile on his face which was unusual then , because the news mostly depressed him .\n    Pred:  he looked very happy , which was pretty unusual since then , they were <unk> the <unk> .\n    \n    Validation perplexity: 11.886973\n\n\n\n```python\nplot_perplexity(dev_perplexities)\n```\n\n\n![png](images/output_49_0.png)\n\n\n## Prediction and Evaluation\n\nOnce trained, we can use the model to produce a set of translations.\n\nIf we translate the whole validation set, we can use [SacreBLEU](https://github.com/mjpost/sacreBLEU) to get a [BLEU score](https://en.wikipedia.org/wiki/BLEU), which is the most common automatic metric for evaluating translations.\n\n#### Important sidenote\nTypically you would run SacreBLEU from the **command line** on the output file and the original (possibly tokenized) development reference file. This gives you a version string that records how the BLEU score was calculated: for example, whether the text was lowercased, whether (and how) it was tokenized, and what smoothing was used. 
If you want to learn more about how BLEU scores are (and should be) reported, check out [this paper](https://arxiv.org/abs/1804.08771).\n\nHowever, right now our pre-processed data is only in memory, so we'll calculate the BLEU score right from this notebook for demonstration purposes.\n\nWe'll first test the raw BLEU function:\n\n\n```python\nimport sacrebleu\n```\n\n\n```python\n# this should result in a perfect BLEU of 100%\nhypotheses = [\"this is a test\"]\nreferences = [\"this is a test\"]\nbleu = sacrebleu.raw_corpus_bleu(hypotheses, [references], .01).score\nprint(bleu)\n```\n\n    100.00000000000004\n\n\n\n```python\n# here the BLEU score will be lower, because some n-grams won't match\nhypotheses = [\"this is a test\"]\nreferences = [\"this is a fest\"]\nbleu = sacrebleu.raw_corpus_bleu(hypotheses, [references], .01).score\nprint(bleu)\n```\n\n    22.360679774997894\n\n\nSince we did some filtering for speed, our validation set contains 690 sentences.\nThe references should be the tokenized target sentences, but without the `<unk>` symbols that our network sees in place of out-of-vocabulary words. 
So we'll take the references straight out of the `valid_data` object:\n\n\n```python\nlen(valid_data)\n```\n\n\n\n\n    690\n\n\n\n\n```python\nreferences = [\" \".join(example.trg) for example in valid_data]\nprint(len(references))\nprint(references[0])\n```\n\n    690\n    when i was 11 , i remember waking up one morning to the sound of joy in my house .\n\n\n\n```python\nreferences[-2]\n```\n\n\n\n\n    \"i 'm always the one taking the picture .\"\n\n\n\n**Now we translate the validation set!**\n\nThis might take a little bit of time.\n\nNote that `greedy_decode` will cut off the sentence when it encounters the end-of-sequence symbol, provided we pass it the index of that symbol (`eos_index`).\n\n\n```python\nhypotheses = []\nalphas = []  # save the last attention scores\nfor batch in valid_iter:\n    batch = rebatch(PAD_INDEX, batch)\n    pred, attention = greedy_decode(\n        model, batch.src, batch.src_mask, batch.src_lengths, max_len=25,\n        sos_index=TRG.vocab.stoi[SOS_TOKEN],\n        eos_index=TRG.vocab.stoi[EOS_TOKEN])\n    hypotheses.append(pred)\n    alphas.append(attention)\n```\n\n\n```python\n# we will still need to convert the indices to actual words!\nhypotheses[0]\n```\n\n\n\n\n    array([  70,   11,   24, 1460,    5,   11,   24,    9,    0,   10,    0,\n              0, 1806,    4])\n\n\n\n\n```python\nhypotheses = [lookup_words(x, TRG.vocab) for x in hypotheses]\nhypotheses[0]\n```\n\n\n\n\n    ['when',\n     'i',\n     'was',\n     '11',\n     ',',\n     'i',\n     'was',\n     'a',\n     '<unk>',\n     'of',\n     '<unk>',\n     '<unk>',\n     'joy',\n     '.']\n\n\n\n\n```python\n# finally, the SacreBLEU raw scorer requires string input, so we convert the lists to strings\nhypotheses = [\" \".join(x) for x in hypotheses]\nprint(len(hypotheses))\nprint(hypotheses[0])\n```\n\n    690\n    when i was 11 , i was a <unk> of <unk> <unk> joy .\n\n\n\n```python\n# now we can compute the BLEU score!\nbleu = sacrebleu.raw_corpus_bleu(hypotheses, [references], 
.01).score\nprint(bleu)\n```\n\n    23.4681520210298\n\n\n## Attention Visualization\n\nWe can also visualize the attention scores of the decoder.\n\n\n```python\ndef plot_heatmap(src, trg, scores):\n\n    fig, ax = plt.subplots()\n    heatmap = ax.pcolor(scores, cmap='viridis')\n\n    # put the major ticks at the middle of each cell\n    # and the x-ticks on top\n    ax.xaxis.tick_top()\n    ax.set_xticks(np.arange(scores.shape[1]) + 0.5, minor=False)\n    ax.set_yticks(np.arange(scores.shape[0]) + 0.5, minor=False)\n    ax.invert_yaxis()\n\n    # set the labels only after fixing the tick positions,\n    # so that each label lines up with its cell\n    ax.set_xticklabels(trg, minor=False, rotation='vertical')\n    ax.set_yticklabels(src, minor=False)\n\n    plt.colorbar(heatmap)\n    plt.show()\n```\n\n\n```python\n# This plots a chosen sentence, for which we saved the attention scores above.\nidx = 5\nsrc = valid_data[idx].src + [\"</s>\"]\ntrg = valid_data[idx].trg + [\"</s>\"]\npred = hypotheses[idx].split() + [\"</s>\"]\npred_att = alphas[idx][0].T[:, :len(pred)]\nprint(\"src\", src)\nprint(\"ref\", trg)\nprint(\"pred\", pred)\nplot_heatmap(src, pred, pred_att)\n```\n\n    src ['\"', 'jetzt', 'kannst', 'du', 'auf', 'eine', 'richtige', 'schule', 'gehen', ',', '\"', 'sagte', 'er', '.', '</s>']\n    ref ['\"', 'you', 'can', 'go', 'to', 'a', 'real', 'school', 'now', ',', '\"', 'he', 'said', '.', '</s>']\n    pred ['\"', 'now', 'you', 'can', 'go', 'to', 'a', 'right', 'school', ',', '\"', 'he', 'said', '.', '</s>']\n\n\n\n![png](images/output_66_1.png)\n\n\n# Congratulations! You've finished this notebook.\n\nWhat didn't we cover?\n\n- Subwords / Byte Pair Encoding [[paper]](https://arxiv.org/abs/1508.07909) [[github]](https://github.com/rsennrich/subword-nmt) let you deal with unknown words. 
\n- You can implement a [multiplicative/bilinear attention mechanism](https://arxiv.org/abs/1508.04025) instead of the additive one used here.\n- We used greedy decoding here to get translations, but you can get better results with beam search.\n- The original model only uses a single dropout layer (in the decoder), but you can experiment with adding more dropout layers, for example on the word embeddings and the source word representations.\n- You can experiment with multiple encoder/decoder layers.\n- Experiment with a benchmarked and improved codebase: [Joey NMT](https://github.com/joeynmt/joeynmt)\n\nIf this was useful to your research, please consider citing:\n\n> J. Bastings. 2018. The Annotated Encoder-Decoder with Attention. https://bastings.github.io/annotated_encoder_decoder/\n\nOr use the following Bibtex:\n\n```\n@misc{bastings2018annotated,\n  title={The Annotated Encoder-Decoder with Attention},\n  author={Bastings, J.},\n  journal={https://bastings.github.io/annotated\\_encoder\\_decoder/},\n  year={2018}\n}\n```\n"
  }
]