Repository: iamtrask/Grokking-Deep-Learning
Branch: master
Commit: e665168b4aef
Files: 106
Total size: 75.0 MB

Directory structure:
gitextract_3ri_1omh/
├── .gitignore
├── Chapter10 - Intro to Convolutional Neural Networks - Learning Edges and Corners.ipynb
├── Chapter11 - Intro to Word Embeddings - Neural Networks that Understand Language.ipynb
├── Chapter12 - Intro to Recurrence - Predicting the Next Word.ipynb
├── Chapter13 - Intro to Automatic Differentiation - Let's Build A Deep Learning Framework.ipynb
├── Chapter14 - Exploding Gradients Examples.ipynb
├── Chapter14 - Intro to LSTMs - Learn to Write Like Shakespeare.ipynb
├── Chapter14 - Intro to LSTMs - Part 2 - Learn to Write Like Shakespeare.ipynb
├── Chapter15 - Intro to Federated Learning - Deep Learning on Unseen Data.ipynb
├── Chapter3 -  Forward Propagation - Intro to Neural Prediction.ipynb
├── Chapter4 - Gradient Descent - Intro to Neural Learning.ipynb
├── Chapter5 - Generalizing Gradient Descent - Learning Multiple Weights at a Time.ipynb
├── Chapter6 - Intro to Backpropagation - Building Your First DEEP Neural Network.ipynb
├── Chapter8 - Intro to Regularization - Learning Signal and Ignoring Noise.ipynb
├── Chapter9 - Intro to Activation Functions - Modeling Probabilities.ipynb
├── MNISTPreprocessor.ipynb
├── README.md
├── docker-compose.yml
├── floyd.yml
├── ham.txt
├── labels.txt
├── reviews.txt
├── shakespear.txt
├── spam.txt
└── tasksv11/
    ├── LICENSE
    ├── README
    ├── en/
    │   ├── qa10_indefinite-knowledge_test.txt
    │   ├── qa10_indefinite-knowledge_train.txt
    │   ├── qa11_basic-coreference_test.txt
    │   ├── qa11_basic-coreference_train.txt
    │   ├── qa12_conjunction_test.txt
    │   ├── qa12_conjunction_train.txt
    │   ├── qa13_compound-coreference_test.txt
    │   ├── qa13_compound-coreference_train.txt
    │   ├── qa14_time-reasoning_test.txt
    │   ├── qa14_time-reasoning_train.txt
    │   ├── qa15_basic-deduction_test.txt
    │   ├── qa15_basic-deduction_train.txt
    │   ├── qa16_basic-induction_test.txt
    │   ├── qa16_basic-induction_train.txt
    │   ├── qa17_positional-reasoning_test.txt
    │   ├── qa17_positional-reasoning_train.txt
    │   ├── qa18_size-reasoning_test.txt
    │   ├── qa18_size-reasoning_train.txt
    │   ├── qa19_path-finding_test.txt
    │   ├── qa19_path-finding_train.txt
    │   ├── qa1_single-supporting-fact_test.txt
    │   ├── qa1_single-supporting-fact_train.txt
    │   ├── qa20_agents-motivations_test.txt
    │   ├── qa20_agents-motivations_train.txt
    │   ├── qa2_two-supporting-facts_test.txt
    │   ├── qa2_two-supporting-facts_train.txt
    │   ├── qa3_three-supporting-facts_test.txt
    │   ├── qa3_three-supporting-facts_train.txt
    │   ├── qa4_two-arg-relations_test.txt
    │   ├── qa4_two-arg-relations_train.txt
    │   ├── qa5_three-arg-relations_test.txt
    │   ├── qa5_three-arg-relations_train.txt
    │   ├── qa6_yes-no-questions_test.txt
    │   ├── qa6_yes-no-questions_train.txt
    │   ├── qa7_counting_test.txt
    │   ├── qa7_counting_train.txt
    │   ├── qa8_lists-sets_test.txt
    │   ├── qa8_lists-sets_train.txt
    │   ├── qa9_simple-negation_test.txt
    │   └── qa9_simple-negation_train.txt
    └── shuffled/
        ├── qa10_indefinite-knowledge_test.txt
        ├── qa10_indefinite-knowledge_train.txt
        ├── qa11_basic-coreference_test.txt
        ├── qa11_basic-coreference_train.txt
        ├── qa12_conjunction_test.txt
        ├── qa12_conjunction_train.txt
        ├── qa13_compound-coreference_test.txt
        ├── qa13_compound-coreference_train.txt
        ├── qa14_time-reasoning_test.txt
        ├── qa14_time-reasoning_train.txt
        ├── qa15_basic-deduction_test.txt
        ├── qa15_basic-deduction_train.txt
        ├── qa16_basic-induction_test.txt
        ├── qa16_basic-induction_train.txt
        ├── qa17_positional-reasoning_test.txt
        ├── qa17_positional-reasoning_train.txt
        ├── qa18_size-reasoning_test.txt
        ├── qa18_size-reasoning_train.txt
        ├── qa19_path-finding_test.txt
        ├── qa19_path-finding_train.txt
        ├── qa1_single-supporting-fact_test.txt
        ├── qa1_single-supporting-fact_train.txt
        ├── qa20_agents-motivations_test.txt
        ├── qa20_agents-motivations_train.txt
        ├── qa2_two-supporting-facts_test.txt
        ├── qa2_two-supporting-facts_train.txt
        ├── qa3_three-supporting-facts_test.txt
        ├── qa3_three-supporting-facts_train.txt
        ├── qa4_two-arg-relations_test.txt
        ├── qa4_two-arg-relations_train.txt
        ├── qa5_three-arg-relations_test.txt
        ├── qa5_three-arg-relations_train.txt
        ├── qa6_yes-no-questions_test.txt
        ├── qa6_yes-no-questions_train.txt
        ├── qa7_counting_test.txt
        ├── qa7_counting_train.txt
        ├── qa8_lists-sets_test.txt
        ├── qa8_lists-sets_train.txt
        ├── qa9_simple-negation_test.txt
        └── qa9_simple-negation_train.txt

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================

# Created by https://www.gitignore.io/api/jupyternotebook

### JupyterNotebook ###
.ipynb_checkpoints
*/.ipynb_checkpoints/*

# Remove previous ipynb_checkpoints
#   git rm -r .ipynb_checkpoints/
#


# End of https://www.gitignore.io/api/jupyternotebook


================================================
FILE: Chapter10 - Intro to Convolutional Neural Networks - Learning Edges and Corners.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Upgrading our MNIST Network"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "I:0 Test-Acc:0.0288 Train-Acc:0.055\n",
      "I:1 Test-Acc:0.0273 Train-Acc:0.037\n",
      "I:2 Test-Acc:0.028 Train-Acc:0.037\n",
      "I:3 Test-Acc:0.0292 Train-Acc:0.04\n",
      "I:4 Test-Acc:0.0339 Train-Acc:0.046\n",
      "I:5 Test-Acc:0.0478 Train-Acc:0.068\n",
      "I:6 Test-Acc:0.076 Train-Acc:0.083\n",
      "I:7 Test-Acc:0.1316 Train-Acc:0.096\n",
      "I:8 Test-Acc:0.2137 Train-Acc:0.127\n",
      "I:9 Test-Acc:0.2941 Train-Acc:0.148\n",
      "I:10 Test-Acc:0.3563 Train-Acc:0.181\n",
      "I:11 Test-Acc:0.4023 Train-Acc:0.209\n",
      "I:12 Test-Acc:0.4358 Train-Acc:0.238\n",
      "I:13 Test-Acc:0.4473 Train-Acc:0.286\n",
      "I:14 Test-Acc:0.4389 Train-Acc:0.274\n",
      "I:15 Test-Acc:0.3951 Train-Acc:0.257\n",
      "I:16 Test-Acc:0.2222 Train-Acc:0.243\n",
      "I:17 Test-Acc:0.0613 Train-Acc:0.112\n",
      "I:18 Test-Acc:0.0266 Train-Acc:0.035\n",
      "I:19 Test-Acc:0.0127 Train-Acc:0.026\n",
      "I:20 Test-Acc:0.0133 Train-Acc:0.022\n",
      "I:21 Test-Acc:0.0185 Train-Acc:0.038\n",
      "I:22 Test-Acc:0.0363 Train-Acc:0.038\n",
      "I:23 Test-Acc:0.0928 Train-Acc:0.067\n",
      "I:24 Test-Acc:0.1994 Train-Acc:0.081\n",
      "I:25 Test-Acc:0.3086 Train-Acc:0.154\n",
      "I:26 Test-Acc:0.4276 Train-Acc:0.204\n",
      "I:27 Test-Acc:0.5323 Train-Acc:0.256\n",
      "I:28 Test-Acc:0.5919 Train-Acc:0.305\n",
      "I:29 Test-Acc:0.6324 Train-Acc:0.341\n",
      "I:30 Test-Acc:0.6608 Train-Acc:0.426\n",
      "I:31 Test-Acc:0.6815 Train-Acc:0.439\n",
      "I:32 Test-Acc:0.7048 Train-Acc:0.462\n",
      "I:33 Test-Acc:0.7171 Train-Acc:0.484\n",
      "I:34 Test-Acc:0.7313 Train-Acc:0.505\n",
      "I:35 Test-Acc:0.7355 Train-Acc:0.53\n",
      "I:36 Test-Acc:0.7417 Train-Acc:0.548\n",
      "I:37 Test-Acc:0.747 Train-Acc:0.534\n",
      "I:38 Test-Acc:0.7491 Train-Acc:0.55\n",
      "I:39 Test-Acc:0.7459 Train-Acc:0.562\n",
      "I:40 Test-Acc:0.7352 Train-Acc:0.54\n",
      "I:41 Test-Acc:0.7082 Train-Acc:0.496\n",
      "I:42 Test-Acc:0.6487 Train-Acc:0.456\n",
      "I:43 Test-Acc:0.5209 Train-Acc:0.353\n",
      "I:44 Test-Acc:0.3305 Train-Acc:0.234\n",
      "I:45 Test-Acc:0.2052 Train-Acc:0.174\n",
      "I:46 Test-Acc:0.2149 Train-Acc:0.136\n",
      "I:47 Test-Acc:0.2679 Train-Acc:0.171\n",
      "I:48 Test-Acc:0.3237 Train-Acc:0.172\n",
      "I:49 Test-Acc:0.3581 Train-Acc:0.186\n",
      "I:50 Test-Acc:0.4202 Train-Acc:0.21\n",
      "I:51 Test-Acc:0.5165 Train-Acc:0.223\n",
      "I:52 Test-Acc:0.6007 Train-Acc:0.262\n",
      "I:53 Test-Acc:0.6476 Train-Acc:0.308\n",
      "I:54 Test-Acc:0.676 Train-Acc:0.363\n",
      "I:55 Test-Acc:0.696 Train-Acc:0.402\n",
      "I:56 Test-Acc:0.7077 Train-Acc:0.434\n",
      "I:57 Test-Acc:0.7204 Train-Acc:0.441\n",
      "I:58 Test-Acc:0.7303 Train-Acc:0.475\n",
      "I:59 Test-Acc:0.7359 Train-Acc:0.475\n",
      "I:60 Test-Acc:0.7401 Train-Acc:0.525\n",
      "I:61 Test-Acc:0.7493 Train-Acc:0.517\n",
      "I:62 Test-Acc:0.7533 Train-Acc:0.517\n",
      "I:63 Test-Acc:0.7606 Train-Acc:0.538\n",
      "I:64 Test-Acc:0.7644 Train-Acc:0.554\n",
      "I:65 Test-Acc:0.7724 Train-Acc:0.57\n",
      "I:66 Test-Acc:0.7788 Train-Acc:0.586\n",
      "I:67 Test-Acc:0.7855 Train-Acc:0.595\n",
      "I:68 Test-Acc:0.7853 Train-Acc:0.591\n",
      "I:69 Test-Acc:0.7925 Train-Acc:0.605\n",
      "I:70 Test-Acc:0.7973 Train-Acc:0.64\n",
      "I:71 Test-Acc:0.8013 Train-Acc:0.621\n",
      "I:72 Test-Acc:0.8029 Train-Acc:0.626\n",
      "I:73 Test-Acc:0.8092 Train-Acc:0.631\n",
      "I:74 Test-Acc:0.8099 Train-Acc:0.638\n",
      "I:75 Test-Acc:0.8156 Train-Acc:0.661\n",
      "I:76 Test-Acc:0.8156 Train-Acc:0.639\n",
      "I:77 Test-Acc:0.8184 Train-Acc:0.65\n",
      "I:78 Test-Acc:0.8216 Train-Acc:0.67\n",
      "I:79 Test-Acc:0.8246 Train-Acc:0.675\n",
      "I:80 Test-Acc:0.8237 Train-Acc:0.666\n",
      "I:81 Test-Acc:0.8273 Train-Acc:0.673\n",
      "I:82 Test-Acc:0.8273 Train-Acc:0.704\n",
      "I:83 Test-Acc:0.8314 Train-Acc:0.674\n",
      "I:84 Test-Acc:0.8292 Train-Acc:0.686\n",
      "I:85 Test-Acc:0.8335 Train-Acc:0.699\n",
      "I:86 Test-Acc:0.8359 Train-Acc:0.694\n",
      "I:87 Test-Acc:0.8375 Train-Acc:0.704\n",
      "I:88 Test-Acc:0.8373 Train-Acc:0.697\n",
      "I:89 Test-Acc:0.8398 Train-Acc:0.704\n",
      "I:90 Test-Acc:0.8393 Train-Acc:0.687\n",
      "I:91 Test-Acc:0.8436 Train-Acc:0.705\n",
      "I:92 Test-Acc:0.8437 Train-Acc:0.711\n",
      "I:93 Test-Acc:0.8446 Train-Acc:0.721\n",
      "I:94 Test-Acc:0.845 Train-Acc:0.719\n",
      "I:95 Test-Acc:0.8469 Train-Acc:0.724\n",
      "I:96 Test-Acc:0.8476 Train-Acc:0.726\n",
      "I:97 Test-Acc:0.848 Train-Acc:0.718\n",
      "I:98 Test-Acc:0.8496 Train-Acc:0.719\n",
      "I:99 Test-Acc:0.85 Train-Acc:0.73\n",
      "I:100 Test-Acc:0.8511 Train-Acc:0.737\n",
      "I:101 Test-Acc:0.8503 Train-Acc:0.73\n",
      "I:102 Test-Acc:0.8504 Train-Acc:0.717\n",
      "I:103 Test-Acc:0.8528 Train-Acc:0.74\n",
      "I:104 Test-Acc:0.8532 Train-Acc:0.733\n",
      "I:105 Test-Acc:0.8537 Train-Acc:0.73\n",
      "I:106 Test-Acc:0.8568 Train-Acc:0.721\n",
      "I:107 Test-Acc:0.857 Train-Acc:0.75\n",
      "I:108 Test-Acc:0.8558 Train-Acc:0.731\n",
      "I:109 Test-Acc:0.8578 Train-Acc:0.744\n",
      "I:110 Test-Acc:0.8588 Train-Acc:0.754\n",
      "I:111 Test-Acc:0.8579 Train-Acc:0.732\n",
      "I:112 Test-Acc:0.8582 Train-Acc:0.747\n",
      "I:113 Test-Acc:0.8593 Train-Acc:0.747\n",
      "I:114 Test-Acc:0.8598 Train-Acc:0.751\n",
      "I:115 Test-Acc:0.8603 Train-Acc:0.74\n",
      "I:116 Test-Acc:0.86 Train-Acc:0.753\n",
      "I:117 Test-Acc:0.8588 Train-Acc:0.746\n",
      "I:118 Test-Acc:0.861 Train-Acc:0.741\n",
      "I:119 Test-Acc:0.8616 Train-Acc:0.731\n",
      "I:120 Test-Acc:0.8629 Train-Acc:0.753\n",
      "I:121 Test-Acc:0.8609 Train-Acc:0.743\n",
      "I:122 Test-Acc:0.8627 Train-Acc:0.752\n",
      "I:123 Test-Acc:0.8646 Train-Acc:0.76\n",
      "I:124 Test-Acc:0.8649 Train-Acc:0.766\n",
      "I:125 Test-Acc:0.8659 Train-Acc:0.752\n",
      "I:126 Test-Acc:0.868 Train-Acc:0.756\n",
      "I:127 Test-Acc:0.8648 Train-Acc:0.767\n",
      "I:128 Test-Acc:0.8662 Train-Acc:0.747\n",
      "I:129 Test-Acc:0.8669 Train-Acc:0.753\n",
      "I:130 Test-Acc:0.8694 Train-Acc:0.753\n",
      "I:131 Test-Acc:0.8692 Train-Acc:0.76\n",
      "I:132 Test-Acc:0.8658 Train-Acc:0.756\n",
      "I:133 Test-Acc:0.8666 Train-Acc:0.769\n",
      "I:134 Test-Acc:0.8692 Train-Acc:0.77\n",
      "I:135 Test-Acc:0.8681 Train-Acc:0.757\n",
      "I:136 Test-Acc:0.8705 Train-Acc:0.77\n",
      "I:137 Test-Acc:0.8706 Train-Acc:0.77\n",
      "I:138 Test-Acc:0.8684 Train-Acc:0.768\n",
      "I:139 Test-Acc:0.8664 Train-Acc:0.774\n",
      "I:140 Test-Acc:0.8666 Train-Acc:0.756\n",
      "I:141 Test-Acc:0.8705 Train-Acc:0.783\n",
      "I:142 Test-Acc:0.87 Train-Acc:0.775\n",
      "I:143 Test-Acc:0.8729 Train-Acc:0.769\n",
      "I:144 Test-Acc:0.8725 Train-Acc:0.776\n",
      "I:145 Test-Acc:0.8721 Train-Acc:0.772\n",
      "I:146 Test-Acc:0.8718 Train-Acc:0.765\n",
      "I:147 Test-Acc:0.8746 Train-Acc:0.777\n",
      "I:148 Test-Acc:0.8746 Train-Acc:0.77\n",
      "I:149 Test-Acc:0.8734 Train-Acc:0.778\n",
      "I:150 Test-Acc:0.873 Train-Acc:0.785\n",
      "I:151 Test-Acc:0.8732 Train-Acc:0.76\n",
      "I:152 Test-Acc:0.8727 Train-Acc:0.779\n",
      "I:153 Test-Acc:0.8754 Train-Acc:0.772\n",
      "I:154 Test-Acc:0.8729 Train-Acc:0.773\n",
      "I:155 Test-Acc:0.8758 Train-Acc:0.784\n",
      "I:156 Test-Acc:0.8732 Train-Acc:0.774\n",
      "I:157 Test-Acc:0.8743 Train-Acc:0.782\n",
      "I:158 Test-Acc:0.8762 Train-Acc:0.772\n",
      "I:159 Test-Acc:0.8755 Train-Acc:0.79\n",
      "I:160 Test-Acc:0.8751 Train-Acc:0.774\n",
      "I:161 Test-Acc:0.8749 Train-Acc:0.782\n",
      "I:162 Test-Acc:0.8744 Train-Acc:0.78\n",
      "I:163 Test-Acc:0.8765 Train-Acc:0.782\n",
      "I:164 Test-Acc:0.8738 Train-Acc:0.796\n",
      "I:165 Test-Acc:0.8753 Train-Acc:0.798\n",
      "I:166 Test-Acc:0.8767 Train-Acc:0.794\n",
      "I:167 Test-Acc:0.8746 Train-Acc:0.784\n",
      "I:168 Test-Acc:0.8769 Train-Acc:0.796\n",
      "I:169 Test-Acc:0.8758 Train-Acc:0.789\n",
      "I:170 Test-Acc:0.8764 Train-Acc:0.79\n",
      "I:171 Test-Acc:0.873 Train-Acc:0.791\n",
      "I:172 Test-Acc:0.8765 Train-Acc:0.797\n",
      "I:173 Test-Acc:0.8772 Train-Acc:0.789\n",
      "I:174 Test-Acc:0.8778 Train-Acc:0.781\n",
      "I:175 Test-Acc:0.8758 Train-Acc:0.799\n",
      "I:176 Test-Acc:0.8773 Train-Acc:0.785\n",
      "I:177 Test-Acc:0.8766 Train-Acc:0.796\n",
      "I:178 Test-Acc:0.8782 Train-Acc:0.803\n",
      "I:179 Test-Acc:0.8789 Train-Acc:0.794\n",
      "I:180 Test-Acc:0.8778 Train-Acc:0.794\n",
      "I:181 Test-Acc:0.8778 Train-Acc:0.8\n",
      "I:182 Test-Acc:0.8785 Train-Acc:0.791\n",
      "I:183 Test-Acc:0.8777 Train-Acc:0.787\n",
      "I:184 Test-Acc:0.8769 Train-Acc:0.781\n",
      "I:185 Test-Acc:0.8765 Train-Acc:0.786\n",
      "I:186 Test-Acc:0.8765 Train-Acc:0.793\n",
      "I:187 Test-Acc:0.8785 Train-Acc:0.796\n",
      "I:188 Test-Acc:0.879 Train-Acc:0.789\n",
      "I:189 Test-Acc:0.8763 Train-Acc:0.79\n",
      "I:190 Test-Acc:0.8774 Train-Acc:0.787\n",
      "I:191 Test-Acc:0.8766 Train-Acc:0.782\n",
      "I:192 Test-Acc:0.8803 Train-Acc:0.798\n",
      "I:193 Test-Acc:0.8781 Train-Acc:0.789\n",
      "I:194 Test-Acc:0.8795 Train-Acc:0.785\n",
      "I:195 Test-Acc:0.8791 Train-Acc:0.807\n",
      "I:196 Test-Acc:0.8778 Train-Acc:0.796\n",
      "I:197 Test-Acc:0.8783 Train-Acc:0.801\n",
      "I:198 Test-Acc:0.8778 Train-Acc:0.81\n",
      "I:199 Test-Acc:0.8771 Train-Acc:0.784\n",
      "I:200 Test-Acc:0.8776 Train-Acc:0.792\n",
      "I:201 Test-Acc:0.8784 Train-Acc:0.794\n",
      "I:202 Test-Acc:0.8787 Train-Acc:0.795\n",
      "I:203 Test-Acc:0.8803 Train-Acc:0.781\n",
      "I:204 Test-Acc:0.8798 Train-Acc:0.804\n",
      "I:205 Test-Acc:0.8779 Train-Acc:0.779\n",
      "I:206 Test-Acc:0.8788 Train-Acc:0.792\n",
      "I:207 Test-Acc:0.8764 Train-Acc:0.793\n",
      "I:208 Test-Acc:0.8792 Train-Acc:0.792\n",
      "I:209 Test-Acc:0.8798 Train-Acc:0.803\n",
      "I:210 Test-Acc:0.8788 Train-Acc:0.804\n",
      "I:211 Test-Acc:0.8793 Train-Acc:0.797\n",
      "I:212 Test-Acc:0.8764 Train-Acc:0.791\n",
      "I:213 Test-Acc:0.8801 Train-Acc:0.801\n",
      "I:214 Test-Acc:0.8814 Train-Acc:0.799\n",
      "I:215 Test-Acc:0.8806 Train-Acc:0.79\n",
      "I:216 Test-Acc:0.8799 Train-Acc:0.8\n",
      "I:217 Test-Acc:0.8803 Train-Acc:0.802\n",
      "I:218 Test-Acc:0.8782 Train-Acc:0.807\n",
      "I:219 Test-Acc:0.8818 Train-Acc:0.797\n",
      "I:220 Test-Acc:0.8793 Train-Acc:0.799\n",
      "I:221 Test-Acc:0.8789 Train-Acc:0.815\n",
      "I:222 Test-Acc:0.8791 Train-Acc:0.816\n",
      "I:223 Test-Acc:0.8793 Train-Acc:0.809\n",
      "I:224 Test-Acc:0.8814 Train-Acc:0.795\n",
      "I:225 Test-Acc:0.8798 Train-Acc:0.799\n",
      "I:226 Test-Acc:0.8805 Train-Acc:0.806\n",
      "I:227 Test-Acc:0.88 Train-Acc:0.808\n",
      "I:228 Test-Acc:0.8782 Train-Acc:0.801\n",
      "I:229 Test-Acc:0.8802 Train-Acc:0.814\n",
      "I:230 Test-Acc:0.8807 Train-Acc:0.8\n",
      "I:231 Test-Acc:0.8809 Train-Acc:0.798\n",
      "I:232 Test-Acc:0.8805 Train-Acc:0.82\n",
      "I:233 Test-Acc:0.8795 Train-Acc:0.794\n",
      "I:234 Test-Acc:0.8807 Train-Acc:0.806\n",
      "I:235 Test-Acc:0.8806 Train-Acc:0.808\n",
      "I:236 Test-Acc:0.8787 Train-Acc:0.802\n",
      "I:237 Test-Acc:0.8796 Train-Acc:0.81\n",
      "I:238 Test-Acc:0.8766 Train-Acc:0.805\n",
      "I:239 Test-Acc:0.8781 Train-Acc:0.792\n",
      "I:240 Test-Acc:0.8787 Train-Acc:0.809\n",
      "I:241 Test-Acc:0.8762 Train-Acc:0.802\n",
      "I:242 Test-Acc:0.8775 Train-Acc:0.811\n",
      "I:243 Test-Acc:0.8804 Train-Acc:0.814\n",
      "I:244 Test-Acc:0.8794 Train-Acc:0.804\n",
      "I:245 Test-Acc:0.8788 Train-Acc:0.801\n",
      "I:246 Test-Acc:0.8777 Train-Acc:0.795\n",
      "I:247 Test-Acc:0.8785 Train-Acc:0.808\n",
      "I:248 Test-Acc:0.8788 Train-Acc:0.803\n",
      "I:249 Test-Acc:0.8773 Train-Acc:0.813\n",
      "I:250 Test-Acc:0.8786 Train-Acc:0.808\n",
      "I:251 Test-Acc:0.8787 Train-Acc:0.803\n",
      "I:252 Test-Acc:0.8789 Train-Acc:0.812\n",
      "I:253 Test-Acc:0.8792 Train-Acc:0.804\n",
      "I:254 Test-Acc:0.8779 Train-Acc:0.815\n",
      "I:255 Test-Acc:0.8796 Train-Acc:0.811\n",
      "I:256 Test-Acc:0.8798 Train-Acc:0.806\n",
      "I:257 Test-Acc:0.88 Train-Acc:0.803\n",
      "I:258 Test-Acc:0.8776 Train-Acc:0.795\n",
      "I:259 Test-Acc:0.8798 Train-Acc:0.803\n",
      "I:260 Test-Acc:0.8799 Train-Acc:0.805\n",
      "I:261 Test-Acc:0.8789 Train-Acc:0.807\n",
      "I:262 Test-Acc:0.8784 Train-Acc:0.804\n",
      "I:263 Test-Acc:0.8792 Train-Acc:0.806\n",
      "I:264 Test-Acc:0.8777 Train-Acc:0.796\n",
      "I:265 Test-Acc:0.8785 Train-Acc:0.821\n",
      "I:266 Test-Acc:0.8794 Train-Acc:0.81\n",
      "I:267 Test-Acc:0.8783 Train-Acc:0.816\n",
      "I:268 Test-Acc:0.8777 Train-Acc:0.812\n",
      "I:269 Test-Acc:0.8791 Train-Acc:0.812\n",
      "I:270 Test-Acc:0.878 Train-Acc:0.813\n",
      "I:271 Test-Acc:0.8784 Train-Acc:0.82\n",
      "I:272 Test-Acc:0.8792 Train-Acc:0.821\n",
      "I:273 Test-Acc:0.8781 Train-Acc:0.823\n",
      "I:274 Test-Acc:0.8788 Train-Acc:0.816\n",
      "I:275 Test-Acc:0.8793 Train-Acc:0.82\n",
      "I:276 Test-Acc:0.8781 Train-Acc:0.829\n",
      "I:277 Test-Acc:0.8795 Train-Acc:0.809\n",
      "I:278 Test-Acc:0.875 Train-Acc:0.806\n",
      "I:279 Test-Acc:0.8795 Train-Acc:0.813\n",
      "I:280 Test-Acc:0.88 Train-Acc:0.816\n",
      "I:281 Test-Acc:0.8796 Train-Acc:0.819\n",
      "I:282 Test-Acc:0.8802 Train-Acc:0.809\n",
      "I:283 Test-Acc:0.8804 Train-Acc:0.811\n",
      "I:284 Test-Acc:0.8779 Train-Acc:0.808\n",
      "I:285 Test-Acc:0.8816 Train-Acc:0.82\n",
      "I:286 Test-Acc:0.8792 Train-Acc:0.822\n",
      "I:287 Test-Acc:0.8791 Train-Acc:0.817\n",
      "I:288 Test-Acc:0.8769 Train-Acc:0.814\n",
      "I:289 Test-Acc:0.8785 Train-Acc:0.807\n",
      "I:290 Test-Acc:0.8778 Train-Acc:0.817\n",
      "I:291 Test-Acc:0.8794 Train-Acc:0.82\n",
      "I:292 Test-Acc:0.8804 Train-Acc:0.824\n",
      "I:293 Test-Acc:0.8779 Train-Acc:0.812\n",
      "I:294 Test-Acc:0.8784 Train-Acc:0.816\n",
      "I:295 Test-Acc:0.877 Train-Acc:0.817\n",
      "I:296 Test-Acc:0.8767 Train-Acc:0.826\n",
      "I:297 Test-Acc:0.8774 Train-Acc:0.816\n",
      "I:298 Test-Acc:0.8774 Train-Acc:0.804\n",
      "I:299 Test-Acc:0.8774 Train-Acc:0.814"
     ]
    }
   ],
   "source": [
    "import numpy as np, sys\n",
    "np.random.seed(1)\n",
    "\n",
    "from keras.datasets import mnist\n",
    "\n",
    "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
    "\n",
    "images, labels = (x_train[0:1000].reshape(1000,28*28) / 255,\n",
    "                  y_train[0:1000])\n",
    "\n",
    "\n",
    "one_hot_labels = np.zeros((len(labels),10))\n",
    "for i,l in enumerate(labels):\n",
    "    one_hot_labels[i][l] = 1\n",
    "labels = one_hot_labels\n",
    "\n",
    "test_images = x_test.reshape(len(x_test),28*28) / 255\n",
    "test_labels = np.zeros((len(y_test),10))\n",
    "for i,l in enumerate(y_test):\n",
    "    test_labels[i][l] = 1\n",
    "\n",
    "def tanh(x):\n",
    "    return np.tanh(x)\n",
    "\n",
    "def tanh2deriv(output):\n",
    "    return 1 - (output ** 2)\n",
    "\n",
    "def softmax(x):\n",
    "    temp = np.exp(x)\n",
    "    return temp / np.sum(temp, axis=1, keepdims=True)\n",
    "\n",
    "alpha, iterations = (2, 300)\n",
    "pixels_per_image, num_labels = (784, 10)\n",
    "batch_size = 128\n",
    "\n",
    "input_rows = 28\n",
    "input_cols = 28\n",
    "\n",
    "kernel_rows = 3\n",
    "kernel_cols = 3\n",
    "num_kernels = 16\n",
    "\n",
    "hidden_size = ((input_rows - kernel_rows) * \n",
    "               (input_cols - kernel_cols)) * num_kernels\n",
    "\n",
    "# weights_0_1 = 0.02*np.random.random((pixels_per_image,hidden_size))-0.01\n",
    "kernels = 0.02*np.random.random((kernel_rows*kernel_cols,\n",
    "                                 num_kernels))-0.01\n",
    "\n",
    "weights_1_2 = 0.2*np.random.random((hidden_size,\n",
    "                                    num_labels)) - 0.1\n",
    "\n",
    "\n",
    "\n",
    "def get_image_section(layer,row_from, row_to, col_from, col_to):\n",
    "    section = layer[:,row_from:row_to,col_from:col_to]\n",
    "    return section.reshape(-1,1,row_to-row_from, col_to-col_from)\n",
    "\n",
    "for j in range(iterations):\n",
    "    correct_cnt = 0\n",
    "    for i in range(int(len(images) / batch_size)):\n",
    "        batch_start, batch_end=((i * batch_size),((i+1)*batch_size))\n",
    "        layer_0 = images[batch_start:batch_end]\n",
    "        layer_0 = layer_0.reshape(layer_0.shape[0],28,28)\n",
    "        layer_0.shape\n",
    "\n",
    "        sects = list()\n",
    "        for row_start in range(layer_0.shape[1]-kernel_rows):\n",
    "            for col_start in range(layer_0.shape[2] - kernel_cols):\n",
    "                sect = get_image_section(layer_0,\n",
    "                                         row_start,\n",
    "                                         row_start+kernel_rows,\n",
    "                                         col_start,\n",
    "                                         col_start+kernel_cols)\n",
    "                sects.append(sect)\n",
    "\n",
    "        expanded_input = np.concatenate(sects,axis=1)\n",
    "        es = expanded_input.shape\n",
    "        flattened_input = expanded_input.reshape(es[0]*es[1],-1)\n",
    "\n",
    "        kernel_output = flattened_input.dot(kernels)\n",
    "        layer_1 = tanh(kernel_output.reshape(es[0],-1))\n",
    "        dropout_mask = np.random.randint(2,size=layer_1.shape)\n",
    "        layer_1 *= dropout_mask * 2\n",
    "        layer_2 = softmax(np.dot(layer_1,weights_1_2))\n",
    "\n",
    "        for k in range(batch_size):\n",
    "            labelset = labels[batch_start+k:batch_start+k+1]\n",
    "            _inc = int(np.argmax(layer_2[k:k+1]) == \n",
    "                               np.argmax(labelset))\n",
    "            correct_cnt += _inc\n",
    "\n",
    "        layer_2_delta = (labels[batch_start:batch_end]-layer_2)\\\n",
    "                        / (batch_size * layer_2.shape[0])\n",
    "        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * \\\n",
    "                        tanh2deriv(layer_1)\n",
    "        layer_1_delta *= dropout_mask\n",
    "        weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n",
    "        l1d_reshape = layer_1_delta.reshape(kernel_output.shape)\n",
    "        k_update = flattened_input.T.dot(l1d_reshape)\n",
    "        kernels -= alpha * k_update\n",
    "    \n",
    "    test_correct_cnt = 0\n",
    "\n",
    "    for i in range(len(test_images)):\n",
    "\n",
    "        layer_0 = test_images[i:i+1]\n",
    "#         layer_1 = tanh(np.dot(layer_0,weights_0_1))\n",
    "        layer_0 = layer_0.reshape(layer_0.shape[0],28,28)\n",
    "        layer_0.shape\n",
    "\n",
    "        sects = list()\n",
    "        for row_start in range(layer_0.shape[1]-kernel_rows):\n",
    "            for col_start in range(layer_0.shape[2] - kernel_cols):\n",
    "                sect = get_image_section(layer_0,\n",
    "                                         row_start,\n",
    "                                         row_start+kernel_rows,\n",
    "                                         col_start,\n",
    "                                         col_start+kernel_cols)\n",
    "                sects.append(sect)\n",
    "\n",
    "        expanded_input = np.concatenate(sects,axis=1)\n",
    "        es = expanded_input.shape\n",
    "        flattened_input = expanded_input.reshape(es[0]*es[1],-1)\n",
    "\n",
    "        kernel_output = flattened_input.dot(kernels)\n",
    "        layer_1 = tanh(kernel_output.reshape(es[0],-1))\n",
    "        layer_2 = np.dot(layer_1,weights_1_2)\n",
    "\n",
    "        test_correct_cnt += int(np.argmax(layer_2) == \n",
    "                                np.argmax(test_labels[i:i+1]))\n",
    "    if(j % 1 == 0):\n",
    "        sys.stdout.write(\"\\n\"+ \\\n",
    "         \"I:\" + str(j) + \\\n",
    "         \" Test-Acc:\"+str(test_correct_cnt/float(len(test_images)))+\\\n",
    "         \" Train-Acc:\" + str(correct_cnt/float(len(images))))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: Chapter11 - Intro to Word Embeddings - Neural Networks that Understand Language.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Download the IMDB Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download reviews.txt and labels.txt from here: https://github.com/udacity/deep-learning/tree/master/sentiment-network\n",
    "\n",
    "def pretty_print_review_and_label(i):\n",
    "    print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n",
    "\n",
    "with open('reviews.txt','r') as g: # What we know!\n",
    "    reviews = list(map(lambda x:x[:-1],g.readlines()))\n",
    "\n",
    "with open('labels.txt','r') as g: # What we WANT to know!\n",
    "    labels = list(map(lambda x:x[:-1].upper(),g.readlines()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Capturing Word Correlation in Input Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sent Encoding:[1 1 0 1]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "onehots = {}\n",
    "onehots['cat'] = np.array([1,0,0,0])\n",
    "onehots['the'] = np.array([0,1,0,0])\n",
    "onehots['dog'] = np.array([0,0,1,0])\n",
    "onehots['sat'] = np.array([0,0,0,1])\n",
    "\n",
    "sentence = ['the','cat','sat']\n",
    "x = onehots[sentence[0]] + \\\n",
    "    onehots[sentence[1]] + \\\n",
    "    onehots[sentence[2]]\n",
    "\n",
    "print(\"Sent Encoding:\" + str(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Predicting Movie Reviews"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "with open('reviews.txt') as f:\n",
    "    raw_reviews = f.readlines()\n",
    "\n",
    "with open('labels.txt') as f:\n",
    "    raw_labels = f.readlines()\n",
    "\n",
    "tokens = list(map(lambda x:set(x.split(\" \")),raw_reviews))\n",
    "\n",
    "vocab = set()\n",
    "for sent in tokens:\n",
    "    for word in sent:\n",
    "        if(len(word)>0):\n",
    "            vocab.add(word)\n",
    "vocab = list(vocab)\n",
    "\n",
    "word2index = {}\n",
    "for i,word in enumerate(vocab):\n",
    "    word2index[word]=i\n",
    "\n",
    "input_dataset = list()\n",
    "for sent in tokens:\n",
    "    sent_indices = list()\n",
    "    for word in sent:\n",
    "        try:\n",
    "            sent_indices.append(word2index[word])\n",
    "        except KeyError:\n",
    "            pass  # skip tokens missing from word2index, such as the empty string\n",
    "    input_dataset.append(list(set(sent_indices)))\n",
    "\n",
    "target_dataset = list()\n",
    "for label in raw_labels:\n",
    "    if label == 'positive\\n':\n",
    "        target_dataset.append(1)\n",
    "    else:\n",
    "        target_dataset.append(0)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "import numpy as np\n",
    "np.random.seed(1)\n",
    "\n",
    "def sigmoid(x):\n",
    "    return 1/(1 + np.exp(-x))\n",
    "\n",
    "alpha, iterations = (0.01, 2)\n",
    "hidden_size = 100\n",
    "\n",
    "weights_0_1 = 0.2*np.random.random((len(vocab),hidden_size)) - 0.1\n",
    "weights_1_2 = 0.2*np.random.random((hidden_size,1)) - 0.1\n",
    "\n",
    "correct,total = (0,0)\n",
    "for iter in range(iterations):\n",
    "    \n",
    "    # train on first 24,000\n",
    "    for i in range(len(input_dataset)-1000):\n",
    "\n",
    "        x,y = (input_dataset[i],target_dataset[i])\n",
    "        layer_1 = sigmoid(np.sum(weights_0_1[x],axis=0)) #embed + sigmoid\n",
    "        layer_2 = sigmoid(np.dot(layer_1,weights_1_2)) # linear + sigmoid\n",
    "\n",
    "        layer_2_delta = layer_2 - y # compare pred with truth\n",
    "        layer_1_delta = layer_2_delta.dot(weights_1_2.T) #backprop\n",
    "\n",
    "        weights_0_1[x] -= layer_1_delta * alpha\n",
    "        weights_1_2 -= np.outer(layer_1,layer_2_delta) * alpha\n",
    "\n",
    "        if(np.abs(layer_2_delta) < 0.5):\n",
    "            correct += 1\n",
    "        total += 1\n",
    "        if(i % 10 == 9):\n",
    "            progress = str(i/float(len(input_dataset)))\n",
    "            sys.stdout.write('\\rIter:'+str(iter)\\\n",
    "                             +' Progress:'+progress[2:4]\\\n",
    "                             +'.'+progress[4:6]\\\n",
    "                             +'% Training Accuracy:'\\\n",
    "                             + str(correct/float(total)) + '%')\n",
    "    print()\n",
    "correct,total = (0,0)\n",
    "for i in range(len(input_dataset)-1000,len(input_dataset)):\n",
    "\n",
    "    x = input_dataset[i]\n",
    "    y = target_dataset[i]\n",
    "\n",
    "    layer_1 = sigmoid(np.sum(weights_0_1[x],axis=0))\n",
    "    layer_2 = sigmoid(np.dot(layer_1,weights_1_2))\n",
    "    \n",
    "    if(np.abs(layer_2 - y) < 0.5):\n",
    "        correct += 1\n",
    "    total += 1\n",
    "print(\"Test Accuracy:\" + str(correct / float(total)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'',\n",
       " '\\n',\n",
       " '.',\n",
       " 'a',\n",
       " 'about',\n",
       " 'adults',\n",
       " 'age',\n",
       " 'all',\n",
       " 'and',\n",
       " 'as',\n",
       " 'at',\n",
       " 'believe',\n",
       " 'bromwell',\n",
       " 'burn',\n",
       " 'can',\n",
       " 'cartoon',\n",
       " 'classic',\n",
       " 'closer',\n",
       " 'comedy',\n",
       " 'down',\n",
       " 'episode',\n",
       " 'expect',\n",
       " 'far',\n",
       " 'fetched',\n",
       " 'financially',\n",
       " 'here',\n",
       " 'high',\n",
       " 'i',\n",
       " 'immediately',\n",
       " 'in',\n",
       " 'insightful',\n",
       " 'inspector',\n",
       " 'is',\n",
       " 'isn',\n",
       " 'it',\n",
       " 'knew',\n",
       " 'lead',\n",
       " 'life',\n",
       " 'line',\n",
       " 'm',\n",
       " 'many',\n",
       " 'me',\n",
       " 'much',\n",
       " 'my',\n",
       " 'of',\n",
       " 'one',\n",
       " 'other',\n",
       " 'pathetic',\n",
       " 'pettiness',\n",
       " 'pity',\n",
       " 'pomp',\n",
       " 'profession',\n",
       " 'programs',\n",
       " 'ran',\n",
       " 'reality',\n",
       " 'recalled',\n",
       " 'remind',\n",
       " 'repeatedly',\n",
       " 'right',\n",
       " 's',\n",
       " 'sack',\n",
       " 'same',\n",
       " 'satire',\n",
       " 'saw',\n",
       " 'school',\n",
       " 'schools',\n",
       " 'scramble',\n",
       " 'see',\n",
       " 'situation',\n",
       " 'some',\n",
       " 'student',\n",
       " 'students',\n",
       " 'such',\n",
       " 'survive',\n",
       " 't',\n",
       " 'teachers',\n",
       " 'teaching',\n",
       " 'than',\n",
       " 'that',\n",
       " 'the',\n",
       " 'their',\n",
       " 'think',\n",
       " 'through',\n",
       " 'time',\n",
       " 'to',\n",
       " 'tried',\n",
       " 'welcome',\n",
       " 'what',\n",
       " 'when',\n",
       " 'which',\n",
       " 'who',\n",
       " 'whole',\n",
       " 'years',\n",
       " 'your'}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tokens[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Comparing Word Embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "import math \n",
    "\n",
    "def similar(target='beautiful'):\n",
    "    target_index = word2index[target]\n",
    "    scores = Counter()\n",
    "    for word,index in word2index.items():\n",
    "        raw_difference = weights_0_1[index] - (weights_0_1[target_index])\n",
    "        squared_difference = raw_difference * raw_difference\n",
    "        scores[word] = -math.sqrt(sum(squared_difference))\n",
    "\n",
    "    return scores.most_common(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('beautiful', -0.0), ('heart', -0.7461901055360456), ('captures', -0.7767713774499612), ('impact', -0.7851006592549541), ('unexpected', -0.8024296074764704), ('bit', -0.8041029062033365), ('touching', -0.8041105203290175), ('true', -0.8092335336931215), ('worth', -0.8095649927927353), ('strong', -0.8095814455120289)]\n"
     ]
    }
   ],
   "source": [
    "print(similar('beautiful'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('terrible', -0.0), ('boring', -0.7591663900380615), ('lame', -0.7732283645546325), ('horrible', -0.788081854105546), ('disappointing', -0.7893120726668719), ('avoid', -0.7939105009456955), ('badly', -0.8054784389155504), ('annoying', -0.8067172753479477), ('dull', -0.8072650189634973), ('mess', -0.8139036459320503)]\n"
     ]
    }
   ],
   "source": [
    "print(similar('terrible'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Filling in the Blank"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys,random,math\n",
    "from collections import Counter\n",
    "import numpy as np\n",
    "\n",
    "np.random.seed(1)\n",
    "random.seed(1)\n",
    "f = open('reviews.txt')\n",
    "raw_reviews = f.readlines()\n",
    "f.close()\n",
    "\n",
    "tokens = list(map(lambda x:(x.split(\" \")),raw_reviews))\n",
    "wordcnt = Counter()\n",
    "for sent in tokens:\n",
    "    for word in sent:\n",
    "        wordcnt[word] -= 1\n",
    "vocab = list(set(map(lambda x:x[0],wordcnt.most_common())))\n",
    "\n",
    "word2index = {}\n",
    "for i,word in enumerate(vocab):\n",
    "    word2index[word]=i\n",
    "\n",
    "concatenated = list()\n",
    "input_dataset = list()\n",
    "for sent in tokens:\n",
    "    sent_indices = list()\n",
    "    for word in sent:\n",
    "        try:\n",
    "            sent_indices.append(word2index[word])\n",
    "            concatenated.append(word2index[word])\n",
    "        except:\n",
    "            \"\"\n",
    "    input_dataset.append(sent_indices)\n",
    "concatenated = np.array(concatenated)\n",
    "random.shuffle(input_dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Progress:0.99998[('terrible', -0.0), ('horrible', -3.488841411481131), ('bad', -4.0636425093941595), ('brilliant', -4.211247495138625), ('pathetic', -4.304645745396163), ('fantastic', -4.341998952418319), ('fabulous', -4.356925869405997), ('phenomenal', -4.361301237074382), ('marvelous', -4.3856957968039145), ('spectacular', -4.413156799233535)]\n"
     ]
    }
   ],
   "source": [
    "alpha, iterations = (0.05, 2)\n",
    "hidden_size,window,negative = (50,2,5)\n",
    "\n",
    "weights_0_1 = (np.random.rand(len(vocab),hidden_size) - 0.5) * 0.2\n",
    "weights_1_2 = np.random.rand(len(vocab),hidden_size)*0\n",
    "\n",
    "layer_2_target = np.zeros(negative+1)\n",
    "layer_2_target[0] = 1\n",
    "\n",
    "def similar(target='beautiful'):\n",
    "  target_index = word2index[target]\n",
    "\n",
    "  scores = Counter()\n",
    "  for word,index in word2index.items():\n",
    "    raw_difference = weights_0_1[index] - (weights_0_1[target_index])\n",
    "    squared_difference = raw_difference * raw_difference\n",
    "    scores[word] = -math.sqrt(sum(squared_difference))\n",
    "  return scores.most_common(10)\n",
    "\n",
    "def sigmoid(x):\n",
    "    return 1/(1 + np.exp(-x))\n",
    "\n",
    "for rev_i,review in enumerate(input_dataset * iterations):\n",
    "  for target_i in range(len(review)):\n",
    "        \n",
    "    # since it's really expensive to predict every vocabulary\n",
    "    # we're only going to predict a random subset\n",
    "    target_samples = [review[target_i]]+list(concatenated\\\n",
    "    [(np.random.rand(negative)*len(concatenated)).astype('int').tolist()])\n",
    "\n",
    "    left_context = review[max(0,target_i-window):target_i]\n",
    "    right_context = review[target_i+1:min(len(review),target_i+window)]\n",
    "\n",
    "    layer_1 = np.mean(weights_0_1[left_context+right_context],axis=0)\n",
    "    layer_2 = sigmoid(layer_1.dot(weights_1_2[target_samples].T))\n",
    "    layer_2_delta = layer_2 - layer_2_target\n",
    "    layer_1_delta = layer_2_delta.dot(weights_1_2[target_samples])\n",
    "\n",
    "    weights_0_1[left_context+right_context] -= layer_1_delta * alpha\n",
    "    weights_1_2[target_samples] -= np.outer(layer_2_delta,layer_1)*alpha\n",
    "\n",
    "  if(rev_i % 250 == 0):\n",
    "    sys.stdout.write('\\rProgress:'+str(rev_i/float(len(input_dataset)\n",
    "        *iterations)) + \"   \" + str(similar('terrible')))\n",
    "  sys.stdout.write('\\rProgress:'+str(rev_i/float(len(input_dataset)\n",
    "        *iterations)))\n",
    "print(similar('terrible'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# King - Man + Woman ~= Queen"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "def analogy(positive=['terrible','good'],negative=['bad']):\n",
    "    \n",
    "    norms = np.sum(weights_0_1 * weights_0_1,axis=1)\n",
    "    norms.resize(norms.shape[0],1)\n",
    "    \n",
    "    normed_weights = weights_0_1 * norms\n",
    "    \n",
    "    query_vect = np.zeros(len(weights_0_1[0]))\n",
    "    for word in positive:\n",
    "        query_vect += normed_weights[word2index[word]]\n",
    "    for word in negative:\n",
    "        query_vect -= normed_weights[word2index[word]]\n",
    "    \n",
    "    scores = Counter()\n",
    "    for word,index in word2index.items():\n",
    "        raw_difference = weights_0_1[index] - query_vect\n",
    "        squared_difference = raw_difference * raw_difference\n",
    "        scores[word] = -math.sqrt(sum(squared_difference))\n",
    "        \n",
    "    return scores.most_common(10)[1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('terrific', -210.46593317724228),\n",
       " ('perfect', -210.52652806032205),\n",
       " ('worth', -210.53162266358495),\n",
       " ('good', -210.55072184482773),\n",
       " ('terrible', -210.58429046605724),\n",
       " ('decent', -210.87945442008805),\n",
       " ('superb', -211.01143515971094),\n",
       " ('great', -211.1327058081335),\n",
       " ('worthy', -211.13577238103477)]"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "analogy(['terrible','good'],['bad'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[('simon', -193.82490698964878),\n",
       " ('obsessed', -193.91805919583555),\n",
       " ('stanwyck', -194.22311983847902),\n",
       " ('sandler', -194.22846640800597),\n",
       " ('branagh', -194.24551334589853),\n",
       " ('daniel', -194.24631020485714),\n",
       " ('peter', -194.29908544092078),\n",
       " ('tony', -194.31388897167716),\n",
       " ('aged', -194.35115773165094)]"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "analogy(['elizabeth','he'],['she'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
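A note on the `analogy` cell in the notebook above: it scales each embedding row by its sum of squares rather than dividing by the row's length. A common alternative for analogy queries is to rescale every row to unit length and rank candidates by cosine similarity. The sketch below illustrates that alternative on made-up toy vectors. `unit_normalize`, `analogy_cosine`, `toy_vectors`, and `toy_word2index` are hypothetical names that do not appear in the notebook, and the vectors are hand-picked for illustration, not trained.

```python
import numpy as np

def unit_normalize(matrix):
    # Divide each row by its L2 norm so every row has length 1.
    norms = np.sqrt(np.sum(matrix * matrix, axis=1, keepdims=True))
    return matrix / np.maximum(norms, 1e-12)  # guard against all-zero rows

def analogy_cosine(vectors, word2index, positive, negative, topn=3):
    # Build the query vector from unit-length embeddings.
    normed = unit_normalize(vectors)
    query = np.zeros(vectors.shape[1])
    for word in positive:
        query += normed[word2index[word]]
    for word in negative:
        query -= normed[word2index[word]]
    query = query / max(np.linalg.norm(query), 1e-12)

    # Cosine similarity of every word with the query.
    scores = normed.dot(query)

    # Rank all words except the ones used to build the query.
    exclude = set(positive) | set(negative)
    ranked = sorted(
        ((word, float(scores[index]))
         for word, index in word2index.items() if word not in exclude),
        key=lambda pair: -pair[1],
    )
    return ranked[:topn]

# Toy 3-dimensional embeddings (assumed values, for illustration only).
toy_words = ['king', 'man', 'woman', 'queen', 'apple']
toy_word2index = {word: i for i, word in enumerate(toy_words)}
toy_vectors = np.array([
    [1.0, 1.0, 0.0],  # king
    [0.0, 1.0, 0.0],  # man
    [0.0, 0.0, 1.0],  # woman
    [1.0, 0.0, 1.0],  # queen
    [0.2, 0.1, 0.1],  # apple
])

result = analogy_cosine(toy_vectors, toy_word2index,
                        positive=['king', 'woman'], negative=['man'])
print(result)
```

With these toy vectors the top-ranked word for king - man + woman is `queen`. On trained embeddings the ranking depends on the data and the training run.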


================================================
FILE: Chapter12 - Intro to Recurrence - Predicting the Next Word.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Download & Preprocess the IMDB Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Download reviews.txt and labels.txt from here: https://github.com/udacity/deep-learning/tree/master/sentiment-network\n",
    "\n",
    "def pretty_print_review_and_label(i):\n",
    "   print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n",
    "\n",
    "g = open('reviews.txt','r') # What we know!\n",
    "reviews = list(map(lambda x:x[:-1],g.readlines()))\n",
    "g.close()\n",
    "\n",
    "g = open('labels.txt','r') # What we WANT to know!\n",
    "labels = list(map(lambda x:x[:-1].upper(),g.readlines()))\n",
    "g.close()\n",
    "\n",
    "\n",
    "# Preprocess dataset:\n",
    "\n",
    "import sys\n",
    "\n",
    "f = open('reviews.txt')\n",
    "raw_reviews = f.readlines()\n",
    "f.close()\n",
    "\n",
    "f = open('labels.txt')\n",
    "raw_labels = f.readlines()\n",
    "f.close()\n",
    "\n",
    "tokens = list(map(lambda x:set(x.split(\" \")),raw_reviews))\n",
    "\n",
    "vocab = set()\n",
    "for sent in tokens:\n",
    "    for word in sent:\n",
    "        if(len(word)>0):\n",
    "            vocab.add(word)\n",
    "vocab = list(vocab)\n",
    "\n",
    "word2index = {}\n",
    "for i,word in enumerate(vocab):\n",
    "    word2index[word]=i\n",
    "\n",
    "input_dataset = list()\n",
    "for sent in tokens:\n",
    "    sent_indices = list()\n",
    "    for word in sent:\n",
    "        try:\n",
    "            sent_indices.append(word2index[word])\n",
    "        except:\n",
    "            \"\"\n",
    "    input_dataset.append(list(set(sent_indices)))\n",
    "\n",
    "target_dataset = list()\n",
    "for label in raw_labels:\n",
    "    if label == 'positive\\n':\n",
    "        target_dataset.append(1)\n",
    "    else:\n",
    "        target_dataset.append(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Surprising Power of Averaged Word Vectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['this tim burton remake of the original  ',\n",
       " 'certainly one of the dozen or so worst m',\n",
       " 'boring and appallingly acted  summer phe']"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "norms = np.sum(weights_0_1 * weights_0_1,axis=1)\n",
    "norms.resize(norms.shape[0],1)\n",
    "normed_weights = weights_0_1 * norms\n",
    "\n",
    "def make_sent_vect(words):\n",
    "    indices = list(map(lambda x:word2index[x],filter(lambda x:x in word2index,words)))\n",
    "    return np.mean(normed_weights[indices],axis=0)\n",
    "\n",
    "reviews2vectors = list()\n",
    "for review in tokens: # tokenized reviews\n",
    "    reviews2vectors.append(make_sent_vect(review))\n",
    "reviews2vectors = np.array(reviews2vectors)\n",
    "\n",
    "def most_similar_reviews(review):\n",
    "    v = make_sent_vect(review)\n",
    "    scores = Counter()\n",
    "    for i,val in enumerate(reviews2vectors.dot(v)):\n",
    "        scores[i] = val\n",
    "    most_similar = list()\n",
    "    \n",
    "    for idx,score in scores.most_common(3):\n",
    "        most_similar.append(raw_reviews[idx][0:40])\n",
    "    return most_similar\n",
    "\n",
    "most_similar_reviews(['boring','awful'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Matrices that Change Absolutely Nothing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[1. 0. 0.]\n",
      " [0. 1. 0.]\n",
      " [0. 0. 1.]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "a = np.array([1,2,3])\n",
    "b = np.array([0.1,0.2,0.3])\n",
    "c = np.array([-1,-0.5,0])\n",
    "d = np.array([0,0,0])\n",
    "\n",
    "identity = np.eye(3)\n",
    "print(identity)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1. 2. 3.]\n",
      "[0.1 0.2 0.3]\n",
      "[-1.  -0.5  0. ]\n",
      "[0. 0. 0.]\n"
     ]
    }
   ],
   "source": [
    "print(a.dot(identity))\n",
    "print(b.dot(identity))\n",
    "print(c.dot(identity))\n",
    "print(d.dot(identity))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[13 15 17]\n",
      "[13. 15. 17.]\n"
     ]
    }
   ],
   "source": [
    "this = np.array([2,4,6])\n",
    "movie = np.array([10,10,10])\n",
    "rocks = np.array([1,1,1])\n",
    "\n",
    "print(this + movie + rocks)\n",
    "print((this.dot(identity) + movie).dot(identity) + rocks)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Forward Propagation in Python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def softmax(x_):\n",
    "    x = np.atleast_2d(x_)\n",
    "    temp = np.exp(x)\n",
    "    return temp / np.sum(temp, axis=1, keepdims=True)\n",
    "\n",
    "word_vects = {}\n",
    "word_vects['yankees'] = np.array([[0.,0.,0.]])\n",
    "word_vects['bears'] = np.array([[0.,0.,0.]])\n",
    "word_vects['braves'] = np.array([[0.,0.,0.]])\n",
    "word_vects['red'] = np.array([[0.,0.,0.]])\n",
    "word_vects['socks'] = np.array([[0.,0.,0.]])\n",
    "word_vects['lose'] = np.array([[0.,0.,0.]])\n",
    "word_vects['defeat'] = np.array([[0.,0.,0.]])\n",
    "word_vects['beat'] = np.array([[0.,0.,0.]])\n",
    "word_vects['tie'] = np.array([[0.,0.,0.]])\n",
    "\n",
    "sent2output = np.random.rand(3,len(word_vects))\n",
    "\n",
    "identity = np.eye(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0.11111111 0.11111111 0.11111111 0.11111111 0.11111111 0.11111111\n",
      "  0.11111111 0.11111111 0.11111111]]\n"
     ]
    }
   ],
   "source": [
    "layer_0 = word_vects['red']\n",
    "layer_1 = layer_0.dot(identity) + word_vects['socks']\n",
    "layer_2 = layer_1.dot(identity) + word_vects['defeat']\n",
    "\n",
    "pred = softmax(layer_2.dot(sent2output))\n",
    "print(pred)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How do we Backpropagate into this?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = np.array([1,0,0,0,0,0,0,0,0]) # target one-hot vector for \"yankees\"\n",
    "\n",
    "pred_delta = pred - y\n",
    "layer_2_delta = pred_delta.dot(sent2output.T)\n",
    "defeat_delta = layer_2_delta * 1 # can ignore the \"1\" like prev. chapter\n",
    "layer_1_delta = layer_2_delta.dot(identity.T)\n",
    "socks_delta = layer_1_delta * 1 # again... can ignore the \"1\"\n",
    "layer_0_delta = layer_1_delta.dot(identity.T)\n",
    "alpha = 0.01\n",
    "word_vects['red'] -= layer_0_delta * alpha\n",
    "word_vects['socks'] -= socks_delta * alpha\n",
    "word_vects['defeat'] -= defeat_delta * alpha\n",
    "identity -= np.outer(layer_0,layer_1_delta) * alpha\n",
    "identity -= np.outer(layer_1,layer_2_delta) * alpha\n",
    "sent2output -= np.outer(layer_2,pred_delta) * alpha"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's Train it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['mary', 'moved', 'to', 'the', 'bathroom.'], ['john', 'went', 'to', 'the', 'hallway.'], ['where', 'is', 'mary?', '\\tbathroom\\t1']]\n"
     ]
    }
   ],
   "source": [
    "import sys,random,math\n",
    "from collections import Counter\n",
    "import numpy as np\n",
    "\n",
    "f = open('tasksv11/en/qa1_single-supporting-fact_train.txt','r')\n",
    "raw = f.readlines()\n",
    "f.close()\n",
    "\n",
    "tokens = list()\n",
    "for line in raw[0:1000]:\n",
    "    tokens.append(line.lower().replace(\"\\n\",\"\").split(\" \")[1:])\n",
    "\n",
    "print(tokens[0:3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "vocab = set()\n",
    "for sent in tokens:\n",
    "    for word in sent:\n",
    "        vocab.add(word)\n",
    "\n",
    "vocab = list(vocab)\n",
    "\n",
    "word2index = {}\n",
    "for i,word in enumerate(vocab):\n",
    "    word2index[word]=i\n",
    "    \n",
    "def words2indices(sentence):\n",
    "    idx = list()\n",
    "    for word in sentence:\n",
    "        idx.append(word2index[word])\n",
    "    return idx\n",
    "\n",
    "def softmax(x):\n",
    "    e_x = np.exp(x - np.max(x))\n",
    "    return e_x / e_x.sum(axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(1)\n",
    "embed_size = 10\n",
    "\n",
    "# word embeddings\n",
    "embed = (np.random.rand(len(vocab),embed_size) - 0.5) * 0.1\n",
    "\n",
    "# embedding -> embedding (initially the identity matrix)\n",
    "recurrent = np.eye(embed_size)\n",
    "\n",
    "# sentence embedding for empty sentence\n",
    "start = np.zeros(embed_size)\n",
    "\n",
    "# embedding -> output weights\n",
    "decoder = (np.random.rand(embed_size, len(vocab)) - 0.5) * 0.1\n",
    "\n",
    "# one hot lookups (for loss function)\n",
    "one_hot = np.eye(len(vocab))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Forward Propagation with Arbitrary Length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict(sent):\n",
    "    \n",
    "    layers = list()\n",
    "    layer = {}\n",
    "    layer['hidden'] = start\n",
    "    layers.append(layer)\n",
    "\n",
    "    loss = 0\n",
    "\n",
    "    # forward propagate\n",
    "    preds = list()\n",
    "    for target_i in range(len(sent)):\n",
    "\n",
    "        layer = {}\n",
    "\n",
    "        # try to predict the next term\n",
    "        layer['pred'] = softmax(layers[-1]['hidden'].dot(decoder))\n",
    "\n",
    "        loss += -np.log(layer['pred'][sent[target_i]])\n",
    "\n",
    "        # generate the next hidden state\n",
    "        layer['hidden'] = layers[-1]['hidden'].dot(recurrent) + embed[sent[target_i]]\n",
    "        layers.append(layer)\n",
    "\n",
    "    return layers, loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Backpropagation with Arbitrary Length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "# forward\n",
    "for iter in range(30000):\n",
    "    alpha = 0.001\n",
    "    sent = words2indices(tokens[iter%len(tokens)][1:])\n",
    "    layers,loss = predict(sent) \n",
    "\n",
    "    # back propagate\n",
    "    for layer_idx in reversed(range(len(layers))):\n",
    "        layer = layers[layer_idx]\n",
    "        target = sent[layer_idx-1]\n",
    "\n",
    "        if(layer_idx > 0):  # if not the first layer\n",
    "            layer['output_delta'] = layer['pred'] - one_hot[target]\n",
    "            new_hidden_delta = layer['output_delta'].dot(decoder.transpose())\n",
    "\n",
    "            # if the last layer - don't pull from a later one becasue it doesn't exist\n",
    "            if(layer_idx == len(layers)-1):\n",
    "                layer['hidden_delta'] = new_hidden_delta\n",
    "            else:\n",
    "                layer['hidden_delta'] = new_hidden_delta + layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())\n",
    "        else: # if the first layer\n",
    "            layer['hidden_delta'] = layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Weight Update with Arbitrary Length"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Perplexity:82.09227500075585\n",
      "Perplexity:81.87615610433569\n",
      "Perplexity:81.53705034457951\n",
      "Perplexity:80.88879456876245\n",
      "Perplexity:79.50015694256045\n",
      "Perplexity:76.04440447063566\n",
      "Perplexity:63.76523100870378\n",
      "Perplexity:34.69262611144399\n",
      "Perplexity:21.77439314730968\n",
      "Perplexity:19.74440305631078\n",
      "Perplexity:18.813349002926333\n",
      "Perplexity:17.920571868736154\n",
      "Perplexity:16.84823833832929\n",
      "Perplexity:15.302868260393344\n",
      "Perplexity:12.898616378336536\n",
      "Perplexity:9.781678937443305\n",
      "Perplexity:7.546724222346714\n",
      "Perplexity:6.4277474041777305\n",
      "Perplexity:5.685698933881173\n",
      "Perplexity:5.240514920446924\n",
      "Perplexity:4.916476504398705\n",
      "Perplexity:4.674677629541541\n",
      "Perplexity:4.494159385603734\n",
      "Perplexity:4.365041755388302\n",
      "Perplexity:4.289971726173599\n",
      "Perplexity:4.243384558378477\n",
      "Perplexity:4.192001080475404\n",
      "Perplexity:4.132556753967558\n",
      "Perplexity:4.071667181580819\n",
      "Perplexity:4.0167814473718435\n"
     ]
    }
   ],
   "source": [
    "# forward\n",
    "for iter in range(30000):\n",
    "    alpha = 0.001\n",
    "    sent = words2indices(tokens[iter%len(tokens)][1:])\n",
    "\n",
    "    layers,loss = predict(sent) \n",
    "\n",
    "    # back propagate\n",
    "    for layer_idx in reversed(range(len(layers))):\n",
    "        layer = layers[layer_idx]\n",
    "        target = sent[layer_idx-1]\n",
    "\n",
    "        if(layer_idx > 0):\n",
    "            layer['output_delta'] = layer['pred'] - one_hot[target]\n",
    "            new_hidden_delta = layer['output_delta'].dot(decoder.transpose())\n",
    "\n",
    "            # if the last layer - don't pull from a \n",
    "            # later one becasue it doesn't exist\n",
    "            if(layer_idx == len(layers)-1):\n",
    "                layer['hidden_delta'] = new_hidden_delta\n",
    "            else:\n",
    "                layer['hidden_delta'] = new_hidden_delta + layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())\n",
    "        else:\n",
    "            layer['hidden_delta'] = layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())\n",
    "\n",
    "    # update weights\n",
    "    start -= layers[0]['hidden_delta'] * alpha / float(len(sent))\n",
    "    for layer_idx,layer in enumerate(layers[1:]):\n",
    "        \n",
    "        decoder -= np.outer(layers[layer_idx]['hidden'], layer['output_delta']) * alpha / float(len(sent))\n",
    "        \n",
    "        embed_idx = sent[layer_idx]\n",
    "        embed[embed_idx] -= layers[layer_idx]['hidden_delta'] * alpha / float(len(sent))\n",
    "        recurrent -= np.outer(layers[layer_idx]['hidden'], layer['hidden_delta']) * alpha / float(len(sent))\n",
    "        \n",
    "    if(iter % 1000 == 0):\n",
    "        print(\"Perplexity:\" + str(np.exp(loss/len(sent))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Execution and Output Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['sandra', 'moved', 'to', 'the', 'garden.']\n",
      "Prev Input:sandra      True:moved          Pred:is\n",
      "Prev Input:moved       True:to             Pred:to\n",
      "Prev Input:to          True:the            Pred:the\n",
      "Prev Input:the         True:garden.        Pred:bedroom.\n"
     ]
    }
   ],
   "source": [
    "sent_index = 4\n",
    "\n",
    "l,_ = predict(words2indices(tokens[sent_index]))\n",
    "\n",
    "print(tokens[sent_index])\n",
    "\n",
    "for i,each_layer in enumerate(l[1:-1]):\n",
    "    input = tokens[sent_index][i]\n",
    "    true = tokens[sent_index][i+1]\n",
    "    pred = vocab[each_layer['pred'].argmax()]\n",
    "    print(\"Prev Input:\" + input + (' ' * (12 - len(input))) +\\\n",
    "          \"True:\" + true + (\" \" * (15 - len(true))) + \"Pred:\" + pred)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
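A note on the perplexity figures printed by the training loop in the notebook above: the loop reports `np.exp(loss/len(sent))`, where `loss` is the sum of the negative log-probabilities the model assigned to each true next word. The sketch below isolates that calculation so it can be tried on hand-written probabilities. `perplexity` is a hypothetical helper name, not part of the notebook, and the probability lists are made up for illustration.

```python
import numpy as np

def perplexity(token_probs):
    # token_probs: probability the model gave to each correct next token.
    token_probs = np.asarray(token_probs, dtype=float)
    # Summed negative log-likelihood, as accumulated in predict().
    loss = -np.sum(np.log(token_probs))
    # Exponentiate the average per-token loss.
    return float(np.exp(loss / len(token_probs)))

# A model that guesses uniformly among 4 choices at every step.
uniform_score = perplexity([0.25, 0.25, 0.25, 0.25])

# A model that is fairly confident about each correct token.
confident_score = perplexity([0.9, 0.8, 0.95])

print(uniform_score, confident_score)
```

A model that spreads its probability evenly over V choices scores a perplexity of about V, and a model that always assigns probability 1 to the correct token scores 1, so lower values mean better predictions.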


================================================
FILE: Chapter13 - Intro to Automatic Differentiation - Let's Build A Deep Learning Framework.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1: Introduction to Tensors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 2 3 4 5]\n",
      "[ 2  4  6  8 10]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self, data):\n",
    "        self.data = np.array(data)\n",
    "    \n",
    "    def __add__(self, other):\n",
    "        return Tensor(self.data + other.data)\n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())\n",
    "    \n",
    "x = Tensor([1,2,3,4,5])\n",
    "print(x)\n",
    "\n",
    "y = x + x\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 2: Introduction to Autograd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self, data, creators=None, creation_op = None):\n",
    "        self.data = np.array(data)\n",
    "        self.creation_op = creation_op\n",
    "        self.creators = creators\n",
    "        self.grad = None\n",
    "    \n",
    "    def backward(self, grad):\n",
    "        self.grad = grad\n",
    "        \n",
    "        if(self.creation_op == \"add\"):\n",
    "            self.creators[0].backward(grad)\n",
    "            self.creators[1].backward(grad)\n",
    "\n",
    "    def __add__(self, other):\n",
    "        return Tensor(self.data + other.data,  creators=[self,other], creation_op=\"add\")\n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())\n",
    "    \n",
    "x = Tensor([1,2,3,4,5])\n",
    "y = Tensor([2,2,2,2,2])\n",
    "\n",
    "z = x + y\n",
    "z.backward(Tensor(np.array([1,1,1,1,1])))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 1 1 1 1]\n",
      "[1 1 1 1 1]\n",
      "[array([1, 2, 3, 4, 5]), array([2, 2, 2, 2, 2])]\n",
      "add\n"
     ]
    }
   ],
   "source": [
    "print(x.grad)\n",
    "print(y.grad)\n",
    "print(z.creators)\n",
    "print(z.creation_op)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 1 1 1 1]\n"
     ]
    }
   ],
   "source": [
    "a = Tensor([1,2,3,4,5])\n",
    "b = Tensor([2,2,2,2,2])\n",
    "c = Tensor([5,4,3,2,1])\n",
    "d = Tensor([-1,-2,-3,-4,-5])\n",
    "\n",
    "e = a + b\n",
    "f = c + d\n",
    "g = e + f\n",
    "\n",
    "g.backward(Tensor(np.array([1,1,1,1,1])))\n",
    "\n",
    "print(a.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 3: Tensors That Are Used Multiple Times"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([False, False, False, False, False])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = Tensor([1,2,3,4,5])\n",
    "b = Tensor([2,2,2,2,2])\n",
    "c = Tensor([5,4,3,2,1])\n",
    "\n",
    "d = a + b\n",
    "e = b + c\n",
    "f = d + e\n",
    "f.backward(Tensor(np.array([1,1,1,1,1])))\n",
    "\n",
    "b.grad.data == np.array([2,2,2,2,2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 4: Upgrading Autograd to Support Multiple Tensors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ True  True  True  True  True]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,100000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True        \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    "            if(grad is None):\n",
    "                grad = FloatTensor(np.ones_like(self.data))\n",
    "            \n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for override waiting for children if\n",
    "            # \"backprop\" was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "    def __add__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data + other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"add\")\n",
    "        return Tensor(self.data + other.data)\n",
    "\n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())  \n",
    "    \n",
    "a = Tensor([1,2,3,4,5], autograd=True)\n",
    "b = Tensor([2,2,2,2,2], autograd=True)\n",
    "c = Tensor([5,4,3,2,1], autograd=True)\n",
    "\n",
    "d = a + b\n",
    "e = b + c\n",
    "f = d + e\n",
    "\n",
    "f.backward(Tensor(np.array([1,1,1,1,1])))\n",
    "\n",
    "print(b.grad.data == np.array([2,2,2,2,2]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 5: Add Support for Negation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ True  True  True  True  True]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,100000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True        \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    "            if(grad is None):\n",
    "                grad = FloatTensor(np.ones_like(self.data))\n",
    "            \n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for override waiting for children if\n",
    "            # \"backprop\" was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "                if(self.creation_op == \"neg\"):\n",
    "                    self.creators[0].backward(self.grad.__neg__())\n",
    "                    \n",
    "    def __add__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data + other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"add\")\n",
    "        return Tensor(self.data + other.data)\n",
    "\n",
    "    def __neg__(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data * -1,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"neg\")\n",
    "        return Tensor(self.data * -1) \n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())  \n",
    "    \n",
    "a = Tensor([1,2,3,4,5], autograd=True)\n",
    "b = Tensor([2,2,2,2,2], autograd=True)\n",
    "c = Tensor([5,4,3,2,1], autograd=True)\n",
    "\n",
    "d = a + (-b)\n",
    "e = (-b) + c\n",
    "f = d + e\n",
    "\n",
    "f.backward(Tensor(np.array([1,1,1,1,1])))\n",
    "\n",
    "print(b.grad.data == np.array([-2,-2,-2,-2,-2]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 6: Add Support for Additional Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ True  True  True  True  True]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,100000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    " \n",
    "            if(grad is None):\n",
    "                grad = Tensor(np.ones_like(self.data))\n",
    "\n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for override waiting for children if\n",
    "            # \"backprop\" was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "                if(self.creation_op == \"sub\"):\n",
    "                    self.creators[0].backward(Tensor(self.grad.data), self)\n",
    "                    self.creators[1].backward(Tensor(self.grad.__neg__().data), self)\n",
    "\n",
    "                if(self.creation_op == \"mul\"):\n",
    "                    new = self.grad * self.creators[1]\n",
    "                    self.creators[0].backward(new , self)\n",
    "                    new = self.grad * self.creators[0]\n",
    "                    self.creators[1].backward(new, self)                    \n",
    "                    \n",
    "                if(self.creation_op == \"mm\"):\n",
    "                    c0 = self.creators[0]\n",
    "                    c1 = self.creators[1]\n",
    "                    new = self.grad.mm(c1.transpose())\n",
    "                    c0.backward(new)\n",
    "                    new = self.grad.transpose().mm(c0).transpose()\n",
    "                    c1.backward(new)\n",
    "                    \n",
    "                if(self.creation_op == \"transpose\"):\n",
    "                    self.creators[0].backward(self.grad.transpose())\n",
    "\n",
    "                if(\"sum\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.expand(dim,\n",
    "                                                               self.creators[0].data.shape[dim]))\n",
    "\n",
    "                if(\"expand\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.sum(dim))\n",
    "                    \n",
    "                if(self.creation_op == \"neg\"):\n",
    "                    self.creators[0].backward(self.grad.__neg__())\n",
    "                    \n",
    "    def __add__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data + other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"add\")\n",
    "        return Tensor(self.data + other.data)\n",
    "\n",
    "    def __neg__(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data * -1,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"neg\")\n",
    "        return Tensor(self.data * -1)\n",
    "    \n",
    "    def __sub__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data - other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"sub\")\n",
    "        return Tensor(self.data - other.data)\n",
    "    \n",
    "    def __mul__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data * other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"mul\")\n",
    "        return Tensor(self.data * other.data)    \n",
    "\n",
    "    def sum(self, dim):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.sum(dim),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sum_\"+str(dim))\n",
    "        return Tensor(self.data.sum(dim))\n",
    "    \n",
    "    def expand(self, dim,copies):\n",
    "\n",
    "        trans_cmd = list(range(0,len(self.data.shape)))\n",
    "        trans_cmd.insert(dim,len(self.data.shape))\n",
    "        new_data = self.data.repeat(copies).reshape(list(self.data.shape) + [copies]).transpose(trans_cmd)\n",
    "        \n",
    "        if(self.autograd):\n",
    "            return Tensor(new_data,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"expand_\"+str(dim))\n",
    "        return Tensor(new_data)\n",
    "    \n",
    "    def transpose(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.transpose(),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"transpose\")\n",
    "        \n",
    "        return Tensor(self.data.transpose())\n",
    "    \n",
    "    def mm(self, x):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.dot(x.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self,x],\n",
    "                          creation_op=\"mm\")\n",
    "        return Tensor(self.data.dot(x.data))\n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())  \n",
    "    \n",
    "a = Tensor([1,2,3,4,5], autograd=True)\n",
    "b = Tensor([2,2,2,2,2], autograd=True)\n",
    "c = Tensor([5,4,3,2,1], autograd=True)\n",
    "\n",
    "d = a + b\n",
    "e = b + c\n",
    "f = d + e\n",
    "\n",
    "f.backward(Tensor(np.array([1,1,1,1,1])))\n",
    "\n",
    "print(b.grad.data == np.array([2,2,2,2,2]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A few Notes on Sum and Expand"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = Tensor(np.array([[1,2,3],\n",
    "                     [4,5,6]]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([5, 7, 9])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.sum(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 6, 15])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.sum(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[[1, 1, 1, 1],\n",
       "        [2, 2, 2, 2],\n",
       "        [3, 3, 3, 3]],\n",
       "\n",
       "       [[4, 4, 4, 4],\n",
       "        [5, 5, 5, 5],\n",
       "        [6, 6, 6, 6]]])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.expand(dim=2, copies=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 7: Use Autograd to Train a Neural Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Previously we would train a model like this"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5.066439994622395\n",
      "0.4959907791902342\n",
      "0.4180671892167177\n",
      "0.35298133007809646\n",
      "0.2972549636567377\n",
      "0.2492326038163328\n",
      "0.20785392075862477\n",
      "0.17231260916265176\n",
      "0.14193744536652986\n",
      "0.11613979792168384\n"
     ]
    }
   ],
   "source": [
    "import numpy\n",
    "np.random.seed(0)\n",
    "\n",
    "data = np.array([[0,0],[0,1],[1,0],[1,1]])\n",
    "target = np.array([[0],[1],[0],[1]])\n",
    "\n",
    "weights_0_1 = np.random.rand(2,3)\n",
    "weights_1_2 = np.random.rand(3,1)\n",
    "\n",
    "for i in range(10):\n",
    "    \n",
    "    # Predict\n",
    "    layer_1 = data.dot(weights_0_1)\n",
    "    layer_2 = layer_1.dot(weights_1_2)\n",
    "    \n",
    "    # Compare\n",
    "    diff = (layer_2 - target)\n",
    "    sqdiff = (diff * diff)\n",
    "    loss = sqdiff.sum(0) # mean squared error loss\n",
    "\n",
    "    # Learn: this is the backpropagation piece\n",
    "    layer_1_grad = diff.dot(weights_1_2.transpose())\n",
    "    weight_1_2_update = layer_1.transpose().dot(diff)\n",
    "    weight_0_1_update = data.transpose().dot(layer_1_grad)\n",
    "    \n",
    "    weights_1_2 -= weight_1_2_update * 0.1\n",
    "    weights_0_1 -= weight_0_1_update * 0.1\n",
    "    print(loss[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.58128304]\n",
      "[0.48988149]\n",
      "[0.41375111]\n",
      "[0.34489412]\n",
      "[0.28210124]\n",
      "[0.2254484]\n",
      "[0.17538853]\n",
      "[0.1324231]\n",
      "[0.09682769]\n",
      "[0.06849361]\n"
     ]
    }
   ],
   "source": [
    "import numpy\n",
    "np.random.seed(0)\n",
    "\n",
    "data = Tensor(np.array([[0,0],[0,1],[1,0],[1,1]]), autograd=True)\n",
    "target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True)\n",
    "\n",
    "w = list()\n",
    "w.append(Tensor(np.random.rand(2,3), autograd=True))\n",
    "w.append(Tensor(np.random.rand(3,1), autograd=True))\n",
    "\n",
    "for i in range(10):\n",
    "\n",
    "    # Predict\n",
    "    pred = data.mm(w[0]).mm(w[1])\n",
    "    \n",
    "    # Compare\n",
    "    loss = ((pred - target)*(pred - target)).sum(0)\n",
    "    \n",
    "    # Learn\n",
    "    loss.backward(Tensor(np.ones_like(loss.data)))\n",
    "\n",
    "    for w_ in w:\n",
    "        w_.data -= w_.grad.data * 0.1\n",
    "        w_.grad.data *= 0\n",
    "\n",
    "    print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 8: Adding Automatic Optimization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "class SGD(object):\n",
    "    \n",
    "    def __init__(self, parameters, alpha=0.1):\n",
    "        self.parameters = parameters\n",
    "        self.alpha = alpha\n",
    "    \n",
    "    def zero(self):\n",
    "        for p in self.parameters:\n",
    "            p.grad.data *= 0\n",
    "        \n",
    "    def step(self, zero=True):\n",
    "        \n",
    "        for p in self.parameters:\n",
    "            \n",
    "            p.data -= p.grad.data * self.alpha\n",
    "            \n",
    "            if(zero):\n",
    "                p.grad.data *= 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.58128304]\n",
      "[0.48988149]\n",
      "[0.41375111]\n",
      "[0.34489412]\n",
      "[0.28210124]\n",
      "[0.2254484]\n",
      "[0.17538853]\n",
      "[0.1324231]\n",
      "[0.09682769]\n",
      "[0.06849361]\n"
     ]
    }
   ],
   "source": [
    "import numpy\n",
    "np.random.seed(0)\n",
    "\n",
    "data = Tensor(np.array([[0,0],[0,1],[1,0],[1,1]]), autograd=True)\n",
    "target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True)\n",
    "\n",
    "w = list()\n",
    "w.append(Tensor(np.random.rand(2,3), autograd=True))\n",
    "w.append(Tensor(np.random.rand(3,1), autograd=True))\n",
    "\n",
    "optim = SGD(parameters=w, alpha=0.1)\n",
    "\n",
    "for i in range(10):\n",
    "\n",
    "    # Predict\n",
    "    pred = data.mm(w[0]).mm(w[1])\n",
    "    \n",
    "    # Compare\n",
    "    loss = ((pred - target)*(pred - target)).sum(0)\n",
    "    \n",
    "    # Learn\n",
    "    loss.backward(Tensor(np.ones_like(loss.data)))\n",
    "    optim.step()\n",
    "\n",
    "    print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 9: Adding Support for Layer Types"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Layer(object):\n",
    "    \n",
    "    def __init__(self):\n",
    "        self.parameters = list()\n",
    "        \n",
    "    def get_parameters(self):\n",
    "        return self.parameters\n",
    "\n",
    "\n",
    "class Linear(Layer):\n",
    "\n",
    "    def __init__(self, n_inputs, n_outputs):\n",
    "        super().__init__()\n",
    "        W = np.random.randn(n_inputs, n_outputs) * np.sqrt(2.0/(n_inputs))\n",
    "        self.weight = Tensor(W, autograd=True)\n",
    "        self.bias = Tensor(np.zeros(n_outputs), autograd=True)\n",
    "        \n",
    "        self.parameters.append(self.weight)\n",
    "        self.parameters.append(self.bias)\n",
    "\n",
    "    def forward(self, input):\n",
    "        return input.mm(self.weight)+self.bias.expand(0,len(input.data))\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 10: Layers Which Contain Layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2.33428272]\n",
      "[0.06743796]\n",
      "[0.0521849]\n",
      "[0.04079507]\n",
      "[0.03184365]\n",
      "[0.02479336]\n",
      "[0.01925443]\n",
      "[0.01491699]\n",
      "[0.01153118]\n",
      "[0.00889602]\n"
     ]
    }
   ],
   "source": [
    "\n",
    "class Sequential(Layer):\n",
    "    \n",
    "    def __init__(self, layers=list()):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.layers = layers\n",
    "    \n",
    "    def add(self, layer):\n",
    "        self.layers.append(layer)\n",
    "        \n",
    "    def forward(self, input):\n",
    "        for layer in self.layers:\n",
    "            input = layer.forward(input)\n",
    "        return input\n",
    "    \n",
    "    def get_parameters(self):\n",
    "        params = list()\n",
    "        for l in self.layers:\n",
    "            params += l.get_parameters()\n",
    "        return params\n",
    "    \n",
    "import numpy\n",
    "np.random.seed(0)\n",
    "\n",
    "data = Tensor(np.array([[0,0],[0,1],[1,0],[1,1]]), autograd=True)\n",
    "target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True)\n",
    "\n",
    "model = Sequential([Linear(2,3), Linear(3,1)])\n",
    "\n",
    "optim = SGD(parameters=model.get_parameters(), alpha=0.05)\n",
    "\n",
    "for i in range(10):\n",
    "    \n",
    "    # Predict\n",
    "    pred = model.forward(data)\n",
    "    \n",
    "    # Compare\n",
    "    loss = ((pred - target)*(pred - target)).sum(0)\n",
    "    \n",
    "    # Learn\n",
    "    loss.backward(Tensor(np.ones_like(loss.data)))\n",
    "    optim.step()\n",
    "    print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 11: Loss Function Layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[2.33428272]\n",
      "[0.06743796]\n",
      "[0.0521849]\n",
      "[0.04079507]\n",
      "[0.03184365]\n",
      "[0.02479336]\n",
      "[0.01925443]\n",
      "[0.01491699]\n",
      "[0.01153118]\n",
      "[0.00889602]\n"
     ]
    }
   ],
   "source": [
    "class MSELoss(Layer):\n",
    "    \n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, pred, target):\n",
    "        return ((pred - target)*(pred - target)).sum(0)\n",
    "    \n",
    "import numpy\n",
    "np.random.seed(0)\n",
    "\n",
    "data = Tensor(np.array([[0,0],[0,1],[1,0],[1,1]]), autograd=True)\n",
    "target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True)\n",
    "\n",
    "model = Sequential([Linear(2,3), Linear(3,1)])\n",
    "criterion = MSELoss()\n",
    "\n",
    "optim = SGD(parameters=model.get_parameters(), alpha=0.05)\n",
    "\n",
    "for i in range(10):\n",
    "    \n",
    "    # Predict\n",
    "    pred = model.forward(data)\n",
    "    \n",
    "    # Compare\n",
    "    loss = criterion.forward(pred, target)\n",
    "    \n",
    "    # Learn\n",
    "    loss.backward(Tensor(np.ones_like(loss.data)))\n",
    "    optim.step()\n",
    "    print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 12: Non-linearity Layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,100000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    " \n",
    "            if(grad is None):\n",
    "                grad = Tensor(np.ones_like(self.data))\n",
    "\n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for; skip waiting for children if\n",
    "            # backward() was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "                if(self.creation_op == \"sub\"):\n",
    "                    self.creators[0].backward(Tensor(self.grad.data), self)\n",
    "                    self.creators[1].backward(Tensor(self.grad.__neg__().data), self)\n",
    "\n",
    "                if(self.creation_op == \"mul\"):\n",
    "                    new = self.grad * self.creators[1]\n",
    "                    self.creators[0].backward(new , self)\n",
    "                    new = self.grad * self.creators[0]\n",
    "                    self.creators[1].backward(new, self)                    \n",
    "                    \n",
    "                if(self.creation_op == \"mm\"):\n",
    "                    c0 = self.creators[0]\n",
    "                    c1 = self.creators[1]\n",
    "                    new = self.grad.mm(c1.transpose())\n",
    "                    c0.backward(new)\n",
    "                    new = self.grad.transpose().mm(c0).transpose()\n",
    "                    c1.backward(new)\n",
    "                    \n",
    "                if(self.creation_op == \"transpose\"):\n",
    "                    self.creators[0].backward(self.grad.transpose())\n",
    "\n",
    "                if(\"sum\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.expand(dim,\n",
    "                                                               self.creators[0].data.shape[dim]))\n",
    "\n",
    "                if(\"expand\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.sum(dim))\n",
    "                    \n",
    "                if(self.creation_op == \"neg\"):\n",
    "                    self.creators[0].backward(self.grad.__neg__())\n",
    "                    \n",
    "                if(self.creation_op == \"sigmoid\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (self * (ones - self)))\n",
    "                \n",
    "                if(self.creation_op == \"tanh\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (ones - (self * self)))\n",
    "                    \n",
    "    def __add__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data + other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"add\")\n",
    "        return Tensor(self.data + other.data)\n",
    "\n",
    "    def __neg__(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data * -1,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"neg\")\n",
    "        return Tensor(self.data * -1)\n",
    "    \n",
    "    def __sub__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data - other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"sub\")\n",
    "        return Tensor(self.data - other.data)\n",
    "    \n",
    "    def __mul__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data * other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"mul\")\n",
    "        return Tensor(self.data * other.data)    \n",
    "\n",
    "    def sum(self, dim):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.sum(dim),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sum_\"+str(dim))\n",
    "        return Tensor(self.data.sum(dim))\n",
    "    \n",
    "    def expand(self, dim,copies):\n",
    "\n",
    "        trans_cmd = list(range(0,len(self.data.shape)))\n",
    "        trans_cmd.insert(dim,len(self.data.shape))\n",
    "        new_data = self.data.repeat(copies).reshape(list(self.data.shape) + [copies]).transpose(trans_cmd)\n",
    "        \n",
    "        if(self.autograd):\n",
    "            return Tensor(new_data,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"expand_\"+str(dim))\n",
    "        return Tensor(new_data)\n",
    "    \n",
    "    def transpose(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.transpose(),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"transpose\")\n",
    "        \n",
    "        return Tensor(self.data.transpose())\n",
    "    \n",
    "    def mm(self, x):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.dot(x.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self,x],\n",
    "                          creation_op=\"mm\")\n",
    "        return Tensor(self.data.dot(x.data))\n",
    "    \n",
    "    def sigmoid(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(1 / (1 + np.exp(-self.data)),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sigmoid\")\n",
    "        return Tensor(1 / (1 + np.exp(-self.data)))\n",
    "\n",
    "    def tanh(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(np.tanh(self.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"tanh\")\n",
    "        return Tensor(np.tanh(self.data))\n",
    "        \n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())  \n",
    "    \n",
    "class Tanh(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.tanh()\n",
    "    \n",
    "class Sigmoid(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.sigmoid()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1.06372865]\n",
      "[0.75148144]\n",
      "[0.57384259]\n",
      "[0.39574294]\n",
      "[0.2482279]\n",
      "[0.15515294]\n",
      "[0.10423398]\n",
      "[0.07571169]\n",
      "[0.05837623]\n",
      "[0.04700013]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "np.random.seed(0)\n",
    "\n",
    "data = Tensor(np.array([[0,0],[0,1],[1,0],[1,1]]), autograd=True)\n",
    "target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True)\n",
    "\n",
    "model = Sequential([Linear(2,3), Tanh(), Linear(3,1), Sigmoid()])\n",
    "criterion = MSELoss()\n",
    "\n",
    "optim = SGD(parameters=model.get_parameters(), alpha=1)\n",
    "\n",
    "for i in range(10):\n",
    "    \n",
    "    # Predict\n",
    "    pred = model.forward(data)\n",
    "    \n",
    "    # Compare\n",
    "    loss = criterion.forward(pred, target)\n",
    "    \n",
    "    # Learn\n",
    "    loss.backward(Tensor(np.ones_like(loss.data)))\n",
    "    optim.step()\n",
    "    print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 13: The Embedding Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Embedding(Layer):\n",
    "    \n",
    "    def __init__(self, vocab_size, dim):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.vocab_size = vocab_size\n",
    "        self.dim = dim\n",
    "        \n",
    "        # this random initialization style is just a convention from word2vec\n",
    "        self.weight = (np.random.rand(vocab_size, dim) - 0.5) / dim\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 14: Add Indexing to Autograd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,100000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    " \n",
    "            if(grad is None):\n",
    "                grad = Tensor(np.ones_like(self.data))\n",
    "\n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for; skip waiting for children if\n",
    "            # backward() was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "                if(self.creation_op == \"sub\"):\n",
    "                    self.creators[0].backward(Tensor(self.grad.data), self)\n",
    "                    self.creators[1].backward(Tensor(self.grad.__neg__().data), self)\n",
    "\n",
    "                if(self.creation_op == \"mul\"):\n",
    "                    new = self.grad * self.creators[1]\n",
    "                    self.creators[0].backward(new , self)\n",
    "                    new = self.grad * self.creators[0]\n",
    "                    self.creators[1].backward(new, self)                    \n",
    "                    \n",
    "                if(self.creation_op == \"mm\"):\n",
    "                    c0 = self.creators[0]\n",
    "                    c1 = self.creators[1]\n",
    "                    new = self.grad.mm(c1.transpose())\n",
    "                    c0.backward(new)\n",
    "                    new = self.grad.transpose().mm(c0).transpose()\n",
    "                    c1.backward(new)\n",
    "                    \n",
    "                if(self.creation_op == \"transpose\"):\n",
    "                    self.creators[0].backward(self.grad.transpose())\n",
    "\n",
    "                if(\"sum\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.expand(dim,\n",
    "                                                               self.creators[0].data.shape[dim]))\n",
    "\n",
    "                if(\"expand\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.sum(dim))\n",
    "                    \n",
    "                if(self.creation_op == \"neg\"):\n",
    "                    self.creators[0].backward(self.grad.__neg__())\n",
    "                    \n",
    "                if(self.creation_op == \"sigmoid\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (self * (ones - self)))\n",
    "                \n",
    "                if(self.creation_op == \"tanh\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (ones - (self * self)))\n",
    "                \n",
    "                if(self.creation_op == \"index_select\"):\n",
    "                    # scatter-add: each selected row's gradient is accumulated\n",
    "                    # back into the row of the source it was read from\n",
    "                    new_grad = np.zeros_like(self.creators[0].data)\n",
    "                    indices_ = self.index_select_indices.data.flatten()\n",
    "                    grad_ = self.grad.data.reshape(len(indices_), -1)\n",
    "                    for i in range(len(indices_)):\n",
    "                        new_grad[indices_[i]] += grad_[i]\n",
    "                    self.creators[0].backward(Tensor(new_grad))\n",
    "                    \n",
    "    def __add__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data + other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"add\")\n",
    "        return Tensor(self.data + other.data)\n",
    "\n",
    "    def __neg__(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data * -1,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"neg\")\n",
    "        return Tensor(self.data * -1)\n",
    "    \n",
    "    def __sub__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data - other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"sub\")\n",
    "        return Tensor(self.data - other.data)\n",
    "    \n",
    "    def __mul__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data * other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"mul\")\n",
    "        return Tensor(self.data * other.data)    \n",
    "\n",
    "    def sum(self, dim):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.sum(dim),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sum_\"+str(dim))\n",
    "        return Tensor(self.data.sum(dim))\n",
    "    \n",
    "    def expand(self, dim,copies):\n",
    "\n",
    "        trans_cmd = list(range(0,len(self.data.shape)))\n",
    "        trans_cmd.insert(dim,len(self.data.shape))\n",
    "        new_data = self.data.repeat(copies).reshape(list(self.data.shape) + [copies]).transpose(trans_cmd)\n",
    "        \n",
    "        if(self.autograd):\n",
    "            return Tensor(new_data,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"expand_\"+str(dim))\n",
    "        return Tensor(new_data)\n",
    "    \n",
    "    def transpose(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.transpose(),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"transpose\")\n",
    "        \n",
    "        return Tensor(self.data.transpose())\n",
    "    \n",
    "    def mm(self, x):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.dot(x.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self,x],\n",
    "                          creation_op=\"mm\")\n",
    "        return Tensor(self.data.dot(x.data))\n",
    "    \n",
    "    def sigmoid(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(1 / (1 + np.exp(-self.data)),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sigmoid\")\n",
    "        return Tensor(1 / (1 + np.exp(-self.data)))\n",
    "\n",
    "    def tanh(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(np.tanh(self.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"tanh\")\n",
    "        return Tensor(np.tanh(self.data))\n",
    "    \n",
    "    def index_select(self, indices):\n",
    "\n",
    "        if(self.autograd):\n",
    "            new = Tensor(self.data[indices.data],\n",
    "                         autograd=True,\n",
    "                         creators=[self],\n",
    "                         creation_op=\"index_select\")\n",
    "            new.index_select_indices = indices\n",
    "            return new\n",
    "        return Tensor(self.data[indices.data])\n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())  \n",
    "    \n",
    "class Tanh(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.tanh()\n",
    "    \n",
    "class Sigmoid(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.sigmoid()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[0. 0. 0. 0. 0.]\n",
      " [1. 1. 1. 1. 1.]\n",
      " [2. 2. 2. 2. 2.]\n",
      " [2. 2. 2. 2. 2.]\n",
      " [1. 1. 1. 1. 1.]]\n"
     ]
    }
   ],
   "source": [
    "x = Tensor(np.eye(5), autograd=True)\n",
    "x.index_select(Tensor([[1,2,3],[2,3,4]])).backward()\n",
    "print(x.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 15: The Embedding Layer (revisited)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Embedding(Layer):\n",
    "    \n",
    "    def __init__(self, vocab_size, dim):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.vocab_size = vocab_size\n",
    "        self.dim = dim\n",
    "        \n",
    "        # this random initialization style is just a convention from word2vec\n",
    "        self.weight = Tensor((np.random.rand(vocab_size, dim) - 0.5) / dim, autograd=True)\n",
    "        \n",
    "        self.parameters.append(self.weight)\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return self.weight.index_select(input)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.98874126]\n",
      "[0.6658868]\n",
      "[0.45639889]\n",
      "[0.31608168]\n",
      "[0.2260925]\n",
      "[0.16877423]\n",
      "[0.13120515]\n",
      "[0.10555487]\n",
      "[0.08731868]\n",
      "[0.07387834]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "np.random.seed(0)\n",
    "\n",
    "data = Tensor(np.array([1,2,1,2]), autograd=True)\n",
    "target = Tensor(np.array([[0],[1],[0],[1]]), autograd=True)\n",
    "\n",
    "embed = Embedding(5,3)\n",
    "model = Sequential([embed, Tanh(), Linear(3,1), Sigmoid()])\n",
    "criterion = MSELoss()\n",
    "\n",
    "optim = SGD(parameters=model.get_parameters(), alpha=0.5)\n",
    "\n",
    "for i in range(10):\n",
    "    \n",
    "    # Predict\n",
    "    pred = model.forward(data)\n",
    "    \n",
    "    # Compare\n",
    "    loss = criterion.forward(pred, target)\n",
    "    \n",
    "    # Learn\n",
    "    loss.backward(Tensor(np.ones_like(loss.data)))\n",
    "    optim.step()\n",
    "    print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 16: The Cross Entropy Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,100000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    " \n",
    "            if(grad is None):\n",
    "                grad = Tensor(np.ones_like(self.data))\n",
    "\n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for; skip waiting for children if\n",
    "            # backward() was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "                if(self.creation_op == \"sub\"):\n",
    "                    self.creators[0].backward(Tensor(self.grad.data), self)\n",
    "                    self.creators[1].backward(Tensor(self.grad.__neg__().data), self)\n",
    "\n",
    "                if(self.creation_op == \"mul\"):\n",
    "                    new = self.grad * self.creators[1]\n",
    "                    self.creators[0].backward(new , self)\n",
    "                    new = self.grad * self.creators[0]\n",
    "                    self.creators[1].backward(new, self)                    \n",
    "                    \n",
    "                if(self.creation_op == \"mm\"):\n",
    "                    c0 = self.creators[0]\n",
    "                    c1 = self.creators[1]\n",
    "                    new = self.grad.mm(c1.transpose())\n",
    "                    c0.backward(new)\n",
    "                    new = self.grad.transpose().mm(c0).transpose()\n",
    "                    c1.backward(new)\n",
    "                    \n",
    "                if(self.creation_op == \"transpose\"):\n",
    "                    self.creators[0].backward(self.grad.transpose())\n",
    "\n",
    "                if(\"sum\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.expand(dim,\n",
    "                                                               self.creators[0].data.shape[dim]))\n",
    "\n",
    "                if(\"expand\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.sum(dim))\n",
    "                    \n",
    "                if(self.creation_op == \"neg\"):\n",
    "                    self.creators[0].backward(self.grad.__neg__())\n",
    "                    \n",
    "                if(self.creation_op == \"sigmoid\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (self * (ones - self)))\n",
    "                \n",
    "                if(self.creation_op == \"tanh\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (ones - (self * self)))\n",
    "                \n",
    "                if(self.creation_op == \"index_select\"):\n",
    "                    # scatter-add: each selected row's gradient is accumulated\n",
    "                    # back into the row of the source it was read from\n",
    "                    new_grad = np.zeros_like(self.creators[0].data)\n",
    "                    indices_ = self.index_select_indices.data.flatten()\n",
    "                    grad_ = self.grad.data.reshape(len(indices_), -1)\n",
    "                    for i in range(len(indices_)):\n",
    "                        new_grad[indices_[i]] += grad_[i]\n",
    "                    self.creators[0].backward(Tensor(new_grad))\n",
    "                    \n",
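    "                if(self.creation_op == \"cross_entropy\"):\n",
    "                    # gradient of softmax + cross entropy with respect to\n",
    "                    # the logits: softmax output minus the target distribution\n",
    "                    dx = self.softmax_output - self.target_dist\n",
    "                    self.creators[0].backward(Tensor(dx))\n",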
    "                    \n",
    "    def __add__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data + other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"add\")\n",
    "        return Tensor(self.data + other.data)\n",
    "\n",
    "    def __neg__(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data * -1,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"neg\")\n",
    "        return Tensor(self.data * -1)\n",
    "    \n",
    "    def __sub__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data - other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"sub\")\n",
    "        return Tensor(self.data - other.data)\n",
    "    \n",
    "    def __mul__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data * other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"mul\")\n",
    "        return Tensor(self.data * other.data)    \n",
    "\n",
    "    def sum(self, dim):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.sum(dim),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sum_\"+str(dim))\n",
    "        return Tensor(self.data.sum(dim))\n",
    "    \n",
    "    def expand(self, dim,copies):\n",
    "\n",
    "        trans_cmd = list(range(0,len(self.data.shape)))\n",
    "        trans_cmd.insert(dim,len(self.data.shape))\n",
    "        new_data = self.data.repeat(copies).reshape(list(self.data.shape) + [copies]).transpose(trans_cmd)\n",
    "        \n",
    "        if(self.autograd):\n",
    "            return Tensor(new_data,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"expand_\"+str(dim))\n",
    "        return Tensor(new_data)\n",
    "    \n",
    "    def transpose(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.transpose(),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"transpose\")\n",
    "        \n",
    "        return Tensor(self.data.transpose())\n",
    "    \n",
    "    def mm(self, x):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.dot(x.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self,x],\n",
    "                          creation_op=\"mm\")\n",
    "        return Tensor(self.data.dot(x.data))\n",
    "    \n",
    "    def sigmoid(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(1 / (1 + np.exp(-self.data)),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sigmoid\")\n",
    "        return Tensor(1 / (1 + np.exp(-self.data)))\n",
    "\n",
    "    def tanh(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(np.tanh(self.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"tanh\")\n",
    "        return Tensor(np.tanh(self.data))\n",
    "    \n",
    "    def index_select(self, indices):\n",
    "\n",
    "        if(self.autograd):\n",
    "            new = Tensor(self.data[indices.data],\n",
    "                         autograd=True,\n",
    "                         creators=[self],\n",
    "                         creation_op=\"index_select\")\n",
    "            new.index_select_indices = indices\n",
    "            return new\n",
    "        return Tensor(self.data[indices.data])\n",
    "    \n",
    "    def cross_entropy(self, target_indices):\n",
    "\n",
    "        temp = np.exp(self.data)\n",
    "        softmax_output = temp / np.sum(temp,\n",
    "                                       axis=len(self.data.shape)-1,\n",
    "                                       keepdims=True)\n",
    "        \n",
    "        t = target_indices.data.flatten()\n",
    "        p = softmax_output.reshape(len(t),-1)\n",
    "        target_dist = np.eye(p.shape[1])[t]\n",
    "        loss = -(np.log(p) * (target_dist)).sum(1).mean()\n",
    "    \n",
    "        if(self.autograd):\n",
    "            out = Tensor(loss,\n",
    "                         autograd=True,\n",
    "                         creators=[self],\n",
    "                         creation_op=\"cross_entropy\")\n",
    "            out.softmax_output = softmax_output\n",
    "            out.target_dist = target_dist\n",
    "            return out\n",
    "\n",
    "        return Tensor(loss)\n",
    "        \n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())  \n",
    "    \n",
    "class Tanh(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.tanh()\n",
    "    \n",
    "class Sigmoid(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.sigmoid()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "class CrossEntropyLoss(object):\n",
    "    \n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input, target):\n",
    "        return input.cross_entropy(target)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.3885032434928422\n",
      "0.9558181509266037\n",
      "0.6823083585795604\n",
      "0.5095259967493119\n",
      "0.39574491472895856\n",
      "0.31752527285348264\n",
      "0.2617222861964216\n",
      "0.22061283923954234\n",
      "0.18946427334830068\n",
      "0.16527389263866668\n"
     ]
    }
   ],
   "source": [
    "import numpy\n",
    "np.random.seed(0)\n",
    "\n",
    "# data indices\n",
    "data = Tensor(np.array([1,2,1,2]), autograd=True)\n",
    "\n",
    "# target indices\n",
    "target = Tensor(np.array([0,1,0,1]), autograd=True)\n",
    "\n",
    "model = Sequential([Embedding(3,3), Tanh(), Linear(3,4)])\n",
    "criterion = CrossEntropyLoss()\n",
    "\n",
    "optim = SGD(parameters=model.get_parameters(), alpha=0.1)\n",
    "\n",
    "for i in range(10):\n",
    "    \n",
    "    # Predict\n",
    "    pred = model.forward(data)\n",
    "    \n",
    "    # Compare\n",
    "    loss = criterion.forward(pred, target)\n",
    "    \n",
    "    # Learn\n",
    "    loss.backward(Tensor(np.ones_like(loss.data)))\n",
    "    optim.step()\n",
    "    print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 17: The Recurrent Neural Network Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 400,
   "metadata": {},
   "outputs": [],
   "source": [
    "class RNNCell(Layer):\n",
    "    \n",
    "    def __init__(self, n_inputs, n_hidden, n_output, activation='sigmoid'):\n",
    "        super().__init__()\n",
    "\n",
    "        self.n_inputs = n_inputs\n",
    "        self.n_hidden = n_hidden\n",
    "        self.n_output = n_output\n",
    "        \n",
    "        if(activation == 'sigmoid'):\n",
    "            self.activation = Sigmoid()\n",
    "        elif(activation == 'tanh'):\n",
    "            self.activation == Tanh()\n",
    "        else:\n",
    "            raise Exception(\"Non-linearity not found\")\n",
    "\n",
    "        self.w_ih = Linear(n_inputs, n_hidden)\n",
    "        self.w_hh = Linear(n_hidden, n_hidden)\n",
    "        self.w_ho = Linear(n_hidden, n_output)\n",
    "        \n",
    "        self.parameters += self.w_ih.get_parameters()\n",
    "        self.parameters += self.w_hh.get_parameters()\n",
    "        self.parameters += self.w_ho.get_parameters()        \n",
    "    \n",
    "    def forward(self, input, hidden):\n",
    "        from_prev_hidden = self.w_hh.forward(hidden)\n",
    "        combined = self.w_ih.forward(input) + from_prev_hidden\n",
    "        new_hidden = self.activation.forward(combined)\n",
    "        output = self.w_ho.forward(new_hidden)\n",
    "        return output, new_hidden\n",
    "    \n",
    "    def init_hidden(self, batch_size=1):\n",
    "        return Tensor(np.zeros((batch_size,self.n_hidden)), autograd=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 401,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys,random,math\n",
    "from collections import Counter\n",
    "import numpy as np\n",
    "\n",
    "f = open('tasksv11/en/qa1_single-supporting-fact_train.txt','r')\n",
    "raw = f.readlines()\n",
    "f.close()\n",
    "\n",
    "tokens = list()\n",
    "for line in raw[0:1000]:\n",
    "    tokens.append(line.lower().replace(\"\\n\",\"\").split(\" \")[1:])\n",
    "\n",
    "new_tokens = list()\n",
    "for line in tokens:\n",
    "    new_tokens.append(['-'] * (6 - len(line)) + line)\n",
    "\n",
    "tokens = new_tokens\n",
    "\n",
    "vocab = set()\n",
    "for sent in tokens:\n",
    "    for word in sent:\n",
    "        vocab.add(word)\n",
    "\n",
    "vocab = list(vocab)\n",
    "\n",
    "word2index = {}\n",
    "for i,word in enumerate(vocab):\n",
    "    word2index[word]=i\n",
    "    \n",
    "def words2indices(sentence):\n",
    "    idx = list()\n",
    "    for word in sentence:\n",
    "        idx.append(word2index[word])\n",
    "    return idx\n",
    "\n",
    "indices = list()\n",
    "for line in tokens:\n",
    "    idx = list()\n",
    "    for w in line:\n",
    "        idx.append(word2index[w])\n",
    "    indices.append(idx)\n",
    "\n",
    "data = np.array(indices)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 404,
   "metadata": {},
   "outputs": [],
   "source": [
    "embed = Embedding(vocab_size=len(vocab),dim=16)\n",
    "model = RNNCell(n_inputs=16, n_hidden=16, n_output=len(vocab))\n",
    "\n",
    "criterion = CrossEntropyLoss()\n",
    "optim = SGD(parameters=model.get_parameters() + embed.get_parameters(), alpha=0.05)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 405,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.47631100976371393 % Correct: 0.01\n",
      "Loss: 0.17189538896184856 % Correct: 0.28\n",
      "Loss: 0.1460940222788725 % Correct: 0.37\n",
      "Loss: 0.13845863915406884 % Correct: 0.37\n",
      "Loss: 0.135574472565278 % Correct: 0.37\n"
     ]
    }
   ],
   "source": [
    "for iter in range(1000):\n",
    "    batch_size = 100\n",
    "    total_loss = 0\n",
    "    \n",
    "    hidden = model.init_hidden(batch_size=batch_size)\n",
    "\n",
    "    for t in range(5):\n",
    "        input = Tensor(data[0:batch_size,t], autograd=True)\n",
    "        rnn_input = embed.forward(input=input)\n",
    "        output, hidden = model.forward(input=rnn_input, hidden=hidden)\n",
    "\n",
    "    target = Tensor(data[0:batch_size,t+1], autograd=True)    \n",
    "    loss = criterion.forward(output, target)\n",
    "    loss.backward()\n",
    "    optim.step()\n",
    "    total_loss += loss.data\n",
    "    if(iter % 200 == 0):\n",
    "        p_correct = (target.data == np.argmax(output.data,axis=1)).mean()\n",
    "        print(\"Loss:\",total_loss / (len(data)/batch_size),\"% Correct:\",p_correct)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 406,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Context: - mary moved to the \n",
      "True: bathroom.\n",
      "Pred: office.\n"
     ]
    }
   ],
   "source": [
    "batch_size = 1\n",
    "hidden = model.init_hidden(batch_size=batch_size)\n",
    "for t in range(5):\n",
    "    input = Tensor(data[0:batch_size,t], autograd=True)\n",
    "    rnn_input = embed.forward(input=input)\n",
    "    output, hidden = model.forward(input=rnn_input, hidden=hidden)\n",
    "\n",
    "target = Tensor(data[0:batch_size,t+1], autograd=True)    \n",
    "loss = criterion.forward(output, target)\n",
    "\n",
    "ctx = \"\"\n",
    "for idx in data[0:batch_size][0][0:-1]:\n",
    "    ctx += vocab[idx] + \" \"\n",
    "print(\"Context:\",ctx)\n",
    "print(\"True:\",vocab[target.data[0]])\n",
    "print(\"Pred:\", vocab[output.data.argmax()])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: Chapter14 - Exploding Gradients Examples.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 158,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Activations\n",
      "[0.93940638 0.96852968]\n",
      "[0.9919462  0.99121735]\n",
      "[0.99301385 0.99302901]\n",
      "[0.9930713  0.99307098]\n",
      "[0.99307285 0.99307285]\n",
      "[0.99307291 0.99307291]\n",
      "[0.99307291 0.99307291]\n",
      "[0.99307291 0.99307291]\n",
      "[0.99307291 0.99307291]\n",
      "[0.99307291 0.99307291]\n",
      "\n",
      "Gradients\n",
      "[0.03439552 0.03439552]\n",
      "[0.00118305 0.00118305]\n",
      "[4.06916726e-05 4.06916726e-05]\n",
      "[1.39961115e-06 1.39961115e-06]\n",
      "[4.81403643e-08 4.81403637e-08]\n",
      "[1.65582672e-09 1.65582765e-09]\n",
      "[5.69682675e-11 5.69667160e-11]\n",
      "[1.97259346e-12 1.97517920e-12]\n",
      "[8.45387597e-14 8.02306381e-14]\n",
      "[1.45938177e-14 2.16938983e-14]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "sigmoid = lambda x:1/(1 + np.exp(-x))\n",
    "relu = lambda x:(x>0).astype(float)*x\n",
    "\n",
    "weights = np.array([[1,4],[4,1]])\n",
    "activation = sigmoid(np.array([1,0.01]))\n",
    "\n",
    "print(\"Activations\")\n",
    "activations = list()\n",
    "for iter in range(10):\n",
    "    activation = sigmoid(activation.dot(weights))\n",
    "    activations.append(activation)\n",
    "    print(activation)\n",
    "print(\"\\nGradients\")\n",
    "gradient = np.ones_like(activation)\n",
    "for activation in reversed(activations):\n",
    "    gradient = (activation * (1 - activation) * gradient)\n",
    "    gradient = gradient.dot(weights.transpose())\n",
    "    print(gradient)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Relu Activations\n",
      "[23.71814585 23.98025559]\n",
      "[119.63916823 118.852839  ]\n",
      "[595.05052421 597.40951192]\n",
      "[2984.68857188 2977.61160877]\n",
      "[14895.13500696 14916.36589628]\n",
      "[74560.59859209 74496.90592414]\n",
      "[372548.22228863 372739.30029248]\n",
      "[1863505.42345854 1862932.18944699]\n",
      "[9315234.18124649 9316953.88328115]\n",
      "[46583049.71437107 46577890.60826711]\n",
      "\n",
      "Relu Gradients\n",
      "[5. 5.]\n",
      "[25. 25.]\n",
      "[125. 125.]\n",
      "[625. 625.]\n",
      "[3125. 3125.]\n",
      "[15625. 15625.]\n",
      "[78125. 78125.]\n",
      "[390625. 390625.]\n",
      "[1953125. 1953125.]\n",
      "[9765625. 9765625.]\n"
     ]
    }
   ],
   "source": [
    "print(\"Relu Activations\")\n",
    "activations = list()\n",
    "for iter in range(10):\n",
    "    activation = relu(activation.dot(weights))\n",
    "    activations.append(activation)\n",
    "    print(activation)\n",
    "\n",
    "print(\"\\nRelu Gradients\")\n",
    "gradient = np.ones_like(activation)\n",
    "for activation in reversed(activations):\n",
    "    gradient = ((activation > 0) * gradient).dot(weights.transpose())\n",
    "    print(gradient)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: Chapter14 - Intro to LSTMs - Learn to Write Like Shakespeare.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,1000000000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    " \n",
    "            if(grad is None):\n",
    "                grad = Tensor(np.ones_like(self.data))\n",
    "\n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    return\n",
    "                    print(self.id)\n",
    "                    print(self.creation_op)\n",
    "                    print(len(self.creators))\n",
    "                    for c in self.creators:\n",
    "                        print(c.creation_op)\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for override waiting for children if\n",
    "            # \"backprop\" was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "                if(self.creation_op == \"sub\"):\n",
    "                    self.creators[0].backward(Tensor(self.grad.data), self)\n",
    "                    self.creators[1].backward(Tensor(self.grad.__neg__().data), self)\n",
    "\n",
    "                if(self.creation_op == \"mul\"):\n",
    "                    new = self.grad * self.creators[1]\n",
    "                    self.creators[0].backward(new , self)\n",
    "                    new = self.grad * self.creators[0]\n",
    "                    self.creators[1].backward(new, self)                    \n",
    "                    \n",
    "                if(self.creation_op == \"mm\"):\n",
    "                    c0 = self.creators[0]\n",
    "                    c1 = self.creators[1]\n",
    "                    new = self.grad.mm(c1.transpose())\n",
    "                    c0.backward(new)\n",
    "                    new = self.grad.transpose().mm(c0).transpose()\n",
    "                    c1.backward(new)\n",
    "                    \n",
    "                if(self.creation_op == \"transpose\"):\n",
    "                    self.creators[0].backward(self.grad.transpose())\n",
    "\n",
    "                if(\"sum\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.expand(dim,\n",
    "                                                               self.creators[0].data.shape[dim]))\n",
    "\n",
    "                if(\"expand\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.sum(dim))\n",
    "                    \n",
    "                if(self.creation_op == \"neg\"):\n",
    "                    self.creators[0].backward(self.grad.__neg__())\n",
    "                    \n",
    "                if(self.creation_op == \"sigmoid\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (self * (ones - self)))\n",
    "                \n",
    "                if(self.creation_op == \"tanh\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (ones - (self * self)))\n",
    "                \n",
    "                if(self.creation_op == \"index_select\"):\n",
    "                    new_grad = np.zeros_like(self.creators[0].data)\n",
    "                    indices_ = self.index_select_indices.data.flatten()\n",
    "                    grad_ = grad.data.reshape(len(indices_), -1)\n",
    "                    for i in range(len(indices_)):\n",
    "                        new_grad[indices_[i]] += grad_[i]\n",
    "                    self.creators[0].backward(Tensor(new_grad))\n",
    "                    \n",
    "                if(self.creation_op == \"cross_entropy\"):\n",
    "                    dx = self.softmax_output - self.target_dist\n",
    "                    self.creators[0].backward(Tensor(dx))\n",
    "                    \n",
    "    def __add__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data + other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"add\")\n",
    "        return Tensor(self.data + other.data)\n",
    "\n",
    "    def __neg__(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data * -1,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"neg\")\n",
    "        return Tensor(self.data * -1)\n",
    "    \n",
    "    def __sub__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data - other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"sub\")\n",
    "        return Tensor(self.data - other.data)\n",
    "    \n",
    "    def __mul__(self, other):\n",
    "        if(self.autograd and other.autograd):\n",
    "            return Tensor(self.data * other.data,\n",
    "                          autograd=True,\n",
    "                          creators=[self,other],\n",
    "                          creation_op=\"mul\")\n",
    "        return Tensor(self.data * other.data)    \n",
    "\n",
    "    def sum(self, dim):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.sum(dim),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sum_\"+str(dim))\n",
    "        return Tensor(self.data.sum(dim))\n",
    "    \n",
    "    def expand(self, dim,copies):\n",
    "\n",
    "        trans_cmd = list(range(0,len(self.data.shape)))\n",
    "        trans_cmd.insert(dim,len(self.data.shape))\n",
    "        new_data = self.data.repeat(copies).reshape(list(self.data.shape) + [copies]).transpose(trans_cmd)\n",
    "        \n",
    "        if(self.autograd):\n",
    "            return Tensor(new_data,\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"expand_\"+str(dim))\n",
    "        return Tensor(new_data)\n",
    "    \n",
    "    def transpose(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.transpose(),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"transpose\")\n",
    "        \n",
    "        return Tensor(self.data.transpose())\n",
    "    \n",
    "    def mm(self, x):\n",
    "        if(self.autograd):\n",
    "            return Tensor(self.data.dot(x.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self,x],\n",
    "                          creation_op=\"mm\")\n",
    "        return Tensor(self.data.dot(x.data))\n",
    "    \n",
    "    def sigmoid(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(1 / (1 + np.exp(-self.data)),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"sigmoid\")\n",
    "        return Tensor(1 / (1 + np.exp(-self.data)))\n",
    "\n",
    "    def tanh(self):\n",
    "        if(self.autograd):\n",
    "            return Tensor(np.tanh(self.data),\n",
    "                          autograd=True,\n",
    "                          creators=[self],\n",
    "                          creation_op=\"tanh\")\n",
    "        return Tensor(np.tanh(self.data))\n",
    "    \n",
    "    def index_select(self, indices):\n",
    "\n",
    "        if(self.autograd):\n",
    "            new = Tensor(self.data[indices.data],\n",
    "                         autograd=True,\n",
    "                         creators=[self],\n",
    "                         creation_op=\"index_select\")\n",
    "            new.index_select_indices = indices\n",
    "            return new\n",
    "        return Tensor(self.data[indices.data])\n",
    "    \n",
    "    def softmax(self):\n",
    "        temp = np.exp(self.data)\n",
    "        softmax_output = temp / np.sum(temp,\n",
    "                                       axis=len(self.data.shape)-1,\n",
    "                                       keepdims=True)\n",
    "        return softmax_output\n",
    "    \n",
    "    def cross_entropy(self, target_indices):\n",
    "\n",
    "        temp = np.exp(self.data)\n",
    "        softmax_output = temp / np.sum(temp,\n",
    "                                       axis=len(self.data.shape)-1,\n",
    "                                       keepdims=True)\n",
    "        \n",
    "        t = target_indices.data.flatten()\n",
    "        p = softmax_output.reshape(len(t),-1)\n",
    "        target_dist = np.eye(p.shape[1])[t]\n",
    "        loss = -(np.log(p) * (target_dist)).sum(1).mean()\n",
    "    \n",
    "        if(self.autograd):\n",
    "            out = Tensor(loss,\n",
    "                         autograd=True,\n",
    "                         creators=[self],\n",
    "                         creation_op=\"cross_entropy\")\n",
    "            out.softmax_output = softmax_output\n",
    "            out.target_dist = target_dist\n",
    "            return out\n",
    "\n",
    "        return Tensor(loss)\n",
    "        \n",
    "    \n",
    "    def __repr__(self):\n",
    "        return str(self.data.__repr__())\n",
    "    \n",
    "    def __str__(self):\n",
    "        return str(self.data.__str__())  \n",
    "\n",
    "class Layer(object):\n",
    "    \n",
    "    def __init__(self):\n",
    "        self.parameters = list()\n",
    "        \n",
    "    def get_parameters(self):\n",
    "        return self.parameters\n",
    "\n",
    "    \n",
    "class SGD(object):\n",
    "    \n",
    "    def __init__(self, parameters, alpha=0.1):\n",
    "        self.parameters = parameters\n",
    "        self.alpha = alpha\n",
    "    \n",
    "    def zero(self):\n",
    "        for p in self.parameters:\n",
    "            p.grad.data *= 0\n",
    "        \n",
    "    def step(self, zero=True):\n",
    "        \n",
    "        for p in self.parameters:\n",
    "            \n",
    "            p.data -= p.grad.data * self.alpha\n",
    "            \n",
    "            if(zero):\n",
    "                p.grad.data *= 0\n",
    "\n",
    "\n",
    "class Linear(Layer):\n",
    "\n",
    "    def __init__(self, n_inputs, n_outputs, bias=True):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.use_bias = bias\n",
    "        \n",
    "        W = np.random.randn(n_inputs, n_outputs) * np.sqrt(2.0/(n_inputs))\n",
    "        self.weight = Tensor(W, autograd=True)\n",
    "        if(self.use_bias):\n",
    "            self.bias = Tensor(np.zeros(n_outputs), autograd=True)\n",
    "        \n",
    "        self.parameters.append(self.weight)\n",
    "        \n",
    "        if(self.use_bias):        \n",
    "            self.parameters.append(self.bias)\n",
    "\n",
    "    def forward(self, input):\n",
    "        if(self.use_bias):\n",
    "            return input.mm(self.weight)+self.bias.expand(0,len(input.data))\n",
    "        return input.mm(self.weight)\n",
    "\n",
    "\n",
    "class Sequential(Layer):\n",
    "    \n",
    "    def __init__(self, layers=list()):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.layers = layers\n",
    "    \n",
    "    def add(self, layer):\n",
    "        self.layers.append(layer)\n",
    "        \n",
    "    def forward(self, input):\n",
    "        for layer in self.layers:\n",
    "            input = layer.forward(input)\n",
    "        return input\n",
    "    \n",
    "    def get_parameters(self):\n",
    "        params = list()\n",
    "        for l in self.layers:\n",
    "            params += l.get_parameters()\n",
    "        return params\n",
    "\n",
    "\n",
    "class Embedding(Layer):\n",
    "    \n",
    "    def __init__(self, vocab_size, dim):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.vocab_size = vocab_size\n",
    "        self.dim = dim\n",
    "        \n",
    "        # this random initialiation style is just a convention from word2vec\n",
    "        self.weight = Tensor((np.random.rand(vocab_size, dim) - 0.5) / dim, autograd=True)\n",
    "        \n",
    "        self.parameters.append(self.weight)\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return self.weight.index_select(input)\n",
    "\n",
    "\n",
    "class Tanh(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.tanh()\n",
    "\n",
    "\n",
    "class Sigmoid(Layer):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input):\n",
    "        return input.sigmoid()\n",
    "    \n",
    "\n",
    "class CrossEntropyLoss(object):\n",
    "    \n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "    \n",
    "    def forward(self, input, target):\n",
    "        return input.cross_entropy(target)\n",
    "\n",
    "    \n",
    "class RNNCell(Layer):\n",
    "    \n",
    "    def __init__(self, n_inputs, n_hidden, n_output, activation='sigmoid'):\n",
    "        super().__init__()\n",
    "\n",
    "        self.n_inputs = n_inputs\n",
    "        self.n_hidden = n_hidden\n",
    "        self.n_output = n_output\n",
    "        \n",
    "        if(activation == 'sigmoid'):\n",
    "            self.activation = Sigmoid()\n",
    "        elif(activation == 'tanh'):\n",
    "            self.activation == Tanh()\n",
    "        else:\n",
    "            raise Exception(\"Non-linearity not found\")\n",
    "\n",
    "        self.w_ih = Linear(n_inputs, n_hidden)\n",
    "        self.w_hh = Linear(n_hidden, n_hidden)\n",
    "        self.w_ho = Linear(n_hidden, n_output)\n",
    "        \n",
    "        self.parameters += self.w_ih.get_parameters()\n",
    "        self.parameters += self.w_hh.get_parameters()\n",
    "        self.parameters += self.w_ho.get_parameters()        \n",
    "    \n",
    "    def forward(self, input, hidden):\n",
    "        from_prev_hidden = self.w_hh.forward(hidden)\n",
    "        combined = self.w_ih.forward(input) + from_prev_hidden\n",
    "        new_hidden = self.activation.forward(combined)\n",
    "        output = self.w_ho.forward(new_hidden)\n",
    "        return output, new_hidden\n",
    "    \n",
    "    def init_hidden(self, batch_size=1):\n",
    "        return Tensor(np.zeros((batch_size,self.n_hidden)), autograd=True)\n",
    "    \n",
    "class LSTMCell(Layer):\n",
    "    \n",
    "    def __init__(self, n_inputs, n_hidden, n_output):\n",
    "        super().__init__()\n",
    "\n",
    "        self.n_inputs = n_inputs\n",
    "        self.n_hidden = n_hidden\n",
    "        self.n_output = n_output\n",
    "\n",
    "        self.xf = Linear(n_inputs, n_hidden)\n",
    "        self.xi = Linear(n_inputs, n_hidden)\n",
    "        self.xo = Linear(n_inputs, n_hidden)        \n",
    "        self.xc = Linear(n_inputs, n_hidden)        \n",
    "        \n",
    "        self.hf = Linear(n_hidden, n_hidden, bias=False)\n",
    "        self.hi = Linear(n_hidden, n_hidden, bias=False)\n",
    "        self.ho = Linear(n_hidden, n_hidden, bias=False)\n",
    "        self.hc = Linear(n_hidden, n_hidden, bias=False)        \n",
    "        \n",
    "        self.w_ho = Linear(n_hidden, n_output, bias=False)\n",
    "        \n",
    "        self.parameters += self.xf.get_parameters()\n",
    "        self.parameters += self.xi.get_parameters()\n",
    "        self.parameters += self.xo.get_parameters()\n",
    "        self.parameters += self.xc.get_parameters()\n",
    "\n",
    "        self.parameters += self.hf.get_parameters()\n",
    "        self.parameters += self.hi.get_parameters()        \n",
    "        self.parameters += self.ho.get_parameters()        \n",
    "        self.parameters += self.hc.get_parameters()                \n",
    "        \n",
    "        self.parameters += self.w_ho.get_parameters()        \n",
    "    \n",
    "    def forward(self, input, hidden):\n",
    "        \n",
    "        prev_hidden = hidden[0]        \n",
    "        prev_cell = hidden[1]\n",
    "        \n",
    "        f = (self.xf.forward(input) + self.hf.forward(prev_hidden)).sigmoid()\n",
    "        i = (self.xi.forward(input) + self.hi.forward(prev_hidden)).sigmoid()\n",
    "        o = (self.xo.forward(input) + self.ho.forward(prev_hidden)).sigmoid()        \n",
    "        g = (self.xc.forward(input) + self.hc.forward(prev_hidden)).tanh()        \n",
    "        c = (f * prev_cell) + (i * g)\n",
    "\n",
    "        h = o * c.tanh()\n",
    "        \n",
    "        output = self.w_ho.forward(h)\n",
    "        return output, (h, c)\n",
    "    \n",
    "    def init_hidden(self, batch_size=1):\n",
    "        init_hidden = Tensor(np.zeros((batch_size,self.n_hidden)), autograd=True)\n",
    "        init_cell = Tensor(np.zeros((batch_size,self.n_hidden)), autograd=True)\n",
    "        init_hidden.data[:,0] += 1\n",
    "        init_cell.data[:,0] += 1\n",
    "        return (init_hidden, init_cell)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1: RNN Character Language Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys,random,math\n",
    "from collections import Counter\n",
    "import numpy as np\n",
    "import sys\n",
    "\n",
    "np.random.seed(0)\n",
    "\n",
    "# dataset from http://karpathy.github.io/2015/05/21/rnn-effectiveness/\n",
    "f = open('shakespear.txt','r')\n",
    "raw = f.read()\n",
    "f.close()\n",
    "\n",
    "vocab = list(set(raw))\n",
    "word2index = {}\n",
    "for i,word in enumerate(vocab):\n",
    "    word2index[word]=i\n",
    "indices = np.array(list(map(lambda x:word2index[x], raw)))\n",
    "\n",
    "embed = Embedding(vocab_size=len(vocab),dim=512)\n",
    "model = LSTMCell(n_inputs=512, n_hidden=512, n_output=len(vocab))\n",
    "model.w_ho.weight.data *= 0\n",
    "\n",
    "criterion = CrossEntropyLoss()\n",
    "optim = SGD(parameters=model.get_parameters() + embed.get_parameters(), alpha=0.05)\n",
    "\n",
    "def generate_sample(n=30, init_char=' '):\n",
    "    s = \"\"\n",
    "    hidden = model.init_hidden(batch_size=1)\n",
    "    input = Tensor(np.array([word2index[init_char]]))\n",
    "    for i in range(n):\n",
    "        rnn_input = embed.forward(input)\n",
    "        output, hidden = model.forward(input=rnn_input, hidden=hidden)\n",
    "#         output.data *= 25\n",
    "#         temp_dist = output.softmax()\n",
    "#         temp_dist /= temp_dist.sum()\n",
    "\n",
    "#         m = (temp_dist > np.random.rand()).argmax()\n",
    "        m = output.data.argmax()\n",
    "        c = vocab[m]\n",
    "        input = Tensor(np.array([m]))\n",
    "        s += c\n",
    "    return s\n",
    "\n",
    "batch_size = 16\n",
    "bptt = 25\n",
    "n_batches = int((indices.shape[0] / (batch_size)))\n",
    "\n",
    "trimmed_indices = indices[:n_batches*batch_size]\n",
    "batched_indices = trimmed_indices.reshape(batch_size, n_batches).transpose()\n",
    "\n",
    "input_batched_indices = batched_indices[0:-1]\n",
    "target_batched_indices = batched_indices[1:]\n",
    "\n",
    "n_bptt = int(((n_batches-1) / bptt))\n",
    "input_batches = input_batched_indices[:n_bptt*bptt].reshape(n_bptt,bptt,batch_size)\n",
    "target_batches = target_batched_indices[:n_bptt*bptt].reshape(n_bptt, bptt, batch_size)\n",
    "min_loss = 1000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train(iterations=400):\n",
    "    for iter in range(iterations):\n",
    "        total_loss = 0\n",
    "        n_loss = 0\n",
    "\n",
    "        hidden = model.init_hidden(batch_size=batch_size)\n",
    "        batches_to_train = len(input_batches)\n",
    "    #     batches_to_train = 32\n",
    "        for batch_i in range(batches_to_train):\n",
    "\n",
    "            hidden = (Tensor(hidden[0].data, autograd=True), Tensor(hidden[1].data, autograd=True))\n",
    "\n",
    "            losses = list()\n",
    "            for t in range(bptt):\n",
    "                input = Tensor(input_batches[batch_i][t], autograd=True)\n",
    "                rnn_input = embed.forward(input=input)\n",
    "                output, hidden = model.forward(input=rnn_input, hidden=hidden)\n",
    "\n",
    "                target = Tensor(target_batches[batch_i][t], autograd=True)    \n",
    "                batch_loss = criterion.forward(output, target)\n",
    "\n",
    "                if(t == 0):\n",
    "                    losses.append(batch_loss)\n",
    "                else:\n",
    "                    losses.append(batch_loss + losses[-1])\n",
    "\n",
    "            loss = losses[-1]\n",
    "\n",
    "            loss.backward()\n",
    "            optim.step()\n",
    "            total_loss += loss.data / bptt\n",
    "\n",
    "            epoch_loss = np.exp(total_loss / (batch_i+1))\n",
    "            if(epoch_loss < min_loss):\n",
    "                min_loss = epoch_loss\n",
    "                print()\n",
    "\n",
    "            log = \"\\r Iter:\" + str(iter)\n",
    "            log += \" - Alpha:\" + str(optim.alpha)[0:5]\n",
    "            log += \" - Batch \"+str(batch_i+1)+\"/\"+str(len(input_batches))\n",
    "            log += \" - Min Loss:\" + str(min_loss)[0:5]\n",
    "            log += \" - Loss:\" + str(epoch_loss)\n",
    "            if(batch_i == 0):\n",
    "                log += \" - \" + generate_sample(n=70, init_char='T').replace(\"\\n\",\" \")\n",
    "            if(batch_i % 1 == 0):\n",
    "                sys.stdout.write(log)\n",
    "        optim.alpha *= 0.99\n",
    "    #     print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      " Iter:0 - Alpha:0.05 - Batch 1/249 - Min Loss:62.00 - Loss:62.000000000000064 -           eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee\n",
      " Iter:0 - Alpha:0.05 - Batch 2/249 - Min Loss:61.99 - Loss:61.999336055802885\n",
      " Iter:0 - Alpha:0.05 - Batch 3/249 - Min Loss:61.98 - Loss:61.989983546689196\n",
      " Iter:0 - Alpha:0.05 - Batch 4/249 - Min Loss:61.97 - Loss:61.972948235165255\n",
      " Iter:0 - Alpha:0.05 - Batch 5/249 - Min Loss:61.94 - Loss:61.941383549495384\n",
      " Iter:0 - Alpha:0.05 - Batch 6/249 - Min Loss:61.88 - Loss:61.88023671827271\n",
      " Iter:0 - Alpha:0.05 - Batch 7/249 - Min Loss:61.77 - Loss:61.77690827437837\n",
      " Iter:0 - Alpha:0.05 - Batch 8/249 - Min Loss:61.52 - Loss:61.52953899883961\n",
      " Iter:0 - Alpha:0.05 - Batch 9/249 - Min Loss:61.00 - Loss:61.00486153547285\n",
      " Iter:0 - Alpha:0.05 - Batch 10/249 - Min Loss:60.23 - Loss:60.236912186726684\n",
      " Iter:0 - Alpha:0.05 - Batch 11/249 - Min Loss:58.70 - Loss:58.7055559369767\n",
      " Iter:0 - Alpha:0.05 - Batch 12/249 - Min Loss:56.73 - Loss:56.73775220158473\n",
      " Iter:0 - Alpha:0.05 - Batch 13/249 - Min Loss:54.10 - Loss:54.10996106485584\n",
      " Iter:0 - Alpha:0.05 - Batch 14/249 - Min Loss:52.75 - Loss:52.75637293050057\n",
      " Iter:0 - Alpha:0.05 - Batch 15/249 - Min Loss:51.07 - Loss:51.078681882080105\n",
      " Iter:0 - Alpha:0.05 - Batch 16/249 - Min Loss:49.37 - Loss:49.37743406427449\n",
      " Iter:0 - Alpha:0.05 - Batch 17/249 - Min Loss:47.81 - Loss:47.81006661764188\n",
      " Iter:0 - Alpha:0.05 - Batch 18/249 - Min Loss:46.68 - Loss:46.68131330399904\n",
      " Iter:0 - Alpha:0.05 - Batch 19/249 - Min Loss:45.76 - Loss:45.76135529411921\n",
      " Iter:0 - Alpha:0.05 - Batch 20/249 - Min Loss:44.63 - Loss:44.63742967139992\n",
      " Iter:0 - Alpha:0.05 - Batch 21/249 - Min Loss:43.43 - Loss:43.43315342999167\n",
      " Iter:0 - Alpha:0.05 - Batch 22/249 - Min Loss:43.13 - Loss:43.133727315170454\n",
      " Iter:0 - Alpha:0.05 - Batch 23/249 - Min Loss:43.08 - Loss:43.08924458053491\n",
      " Iter:0 - Alpha:0.05 - Batch 24/249 - Min Loss:42.48 - Loss:42.48625785761426\n",
      " Iter:0 - Alpha:0.05 - Batch 25/249 - Min Loss:41.59 - Loss:41.59564764008973\n",
      " Iter:0 - Alpha:0.05 - Batch 26/249 - Min Loss:40.64 - Loss:40.64633262212879\n",
      " Iter:0 - Alpha:0.05 - Batch 27/249 - Min Loss:40.08 - Loss:40.08437857978491\n",
      " Iter:0 - Alpha:0.05 - Batch 28/249 - Min Loss:39.38 - Loss:39.38197568983813\n",
      " Iter:0 - Alpha:0.05 - Batch 29/249 - Min Loss:38.85 - Loss:38.85036603038319\n",
      " Iter:0 - Alpha:0.05 - Batch 30/249 - Min Loss:38.32 - Loss:38.32050246588233\n",
      " Iter:0 - Alpha:0.05 - Batch 31/249 - Min Loss:38.02 - Loss:38.028742643067304\n",
      " Iter:0 - Alpha:0.05 - Batch 32/249 - Min Loss:37.57 - Loss:37.579230715808585\n",
      " Iter:0 - Alpha:0.05 - Batch 33/249 - Min Loss:37.15 - Loss:37.1513332533316\n",
      " Iter:0 - Alpha:0.05 - Batch 34/249 - Min Loss:36.72 - Loss:36.72716819545398\n",
      " Iter:0 - Alpha:0.05 - Batch 35/249 - Min Loss:36.50 - Loss:36.505523013835905\n",
      " Iter:0 - Alpha:0.05 - Batch 36/249 - Min Loss:36.26 - Loss:36.264791172196766\n",
      " Iter:0 - Alpha:0.05 - Batch 37/249 - Min Loss:35.93 - Loss:35.93241785924657\n",
      " Iter:0 - Alpha:0.05 - Batch 39/249 - Min Loss:35.69 - Loss:35.69162009265761\n",
      " Iter:0 - Alpha:0.05 - Batch 40/249 - Min Loss:35.39 - Loss:35.391766263709975\n",
      " Iter:0 - Alpha:0.05 - Batch 41/249 - Min Loss:35.24 - Loss:35.24024995428248\n",
      " Iter:0 - Alpha:0.05 - Batch 42/249 - Min Loss:35.16 - Loss:35.16636943540858\n",
      " Iter:0 - Alpha:0.05 - Batch 43/249 - Min Loss:34.82 - Loss:34.82115954562641\n",
      " Iter:0 - Alpha:0.05 - Batch 44/249 - Min Loss:34.60 - Loss:34.60065020684661\n",
      " Iter:0 - Alpha:0.05 - Batch 45/249 - Min Loss:34.54 - Loss:34.549754104397785\n",
      " Iter:0 - Alpha:0.05 - Batch 46/249 - Min Loss:34.32 - Loss:34.32999305867251\n",
      " Iter:0 - Alpha:0.05 - Batch 47/249 - Min Loss:34.11 - Loss:34.117257032452486\n",
      " Iter:0 - Alpha:0.05 - Batch 48/249 - Min Loss:33.90 - Loss:33.90487349892798\n",
      " Iter:0 - Alpha:0.05 - Batch 49/249 - Min Loss:33.75 - Loss:33.75966234624244\n",
      " Iter:0 - Alpha:0.05 - Batch 50/249 - Min Loss:33.61 - Loss:33.61016131907992\n",
      " Iter:0 - Alpha:0.05 - Batch 51/249 - Min Loss:33.31 - Loss:33.31167842936299\n",
      " Iter:0 - Alpha:0.05 - Batch 52/249 - Min Loss:33.11 - Loss:33.11904817623289\n",
      " Iter:0 - Alpha:0.05 - Batch 53/249 - Min Loss:33.00 - Loss:33.004345599234625\n",
      " Iter:0 - Alpha:0.05 - Batch 54/249 - Min Loss:32.81 - Loss:32.817426265009786\n",
      " Iter:0 - Alpha:0.05 - Batch 55/249 - Min Loss:32.60 - Loss:32.60854105028041\n",
      " Iter:0 - Alpha:0.05 - Batch 56/249 - Min Loss:32.42 - Loss:32.42050431188535\n",
      " Iter:0 - Alpha:0.05 - Batch 57/249 - Min Loss:32.19 - Loss:32.198334721563576\n",
      " Iter:0 - Alpha:0.05 - Batch 58/249 - Min Loss:32.02 - Loss:32.027256645802886\n",
      " Iter:0 - Alpha:0.05 - Batch 59/249 - Min Loss:31.81 - Loss:31.818082530542316\n",
      " Iter:0 - Alpha:0.05 - Batch 60/249 - Min Loss:31.63 - Loss:31.631415472551268\n",
      " Iter:0 - Alpha:0.05 - Batch 61/249 - Min Loss:31.39 - Loss:31.393959746851287\n",
      " Iter:0 - Alpha:0.05 - Batch 62/249 - Min Loss:31.19 - Loss:31.19942305047541\n",
      " Iter:0 - Alpha:0.05 - Batch 63/249 - Min Loss:30.95 - Loss:30.95343987653838\n",
      " Iter:0 - Alpha:0.05 - Batch 64/249 - Min Loss:30.74 - Loss:30.7489265136333\n",
      " Iter:0 - Alpha:0.05 - Batch 65/249 - Min Loss:30.66 - Loss:30.665132078887083\n",
      " Iter:0 - Alpha:0.05 - Batch 66/249 - Min Loss:30.60 - Loss:30.605053974596405\n",
      " Iter:0 - Alpha:0.05 - Batch 67/249 - Min Loss:30.45 - Loss:30.456479778353064\n",
      " Iter:0 - Alpha:0.05 - Batch 68/249 - Min Loss:30.24 - Loss:30.241772045820696\n",
      " Iter:0 - Alpha:0.05 - Batch 69/249 - Min Loss:30.14 - Loss:30.14494883095973\n",
      " Iter:0 - Alpha:0.05 - Batch 70/249 - Min Loss:30.00 - Loss:30.004543678506863\n",
      " Iter:0 - Alpha:0.05 - Batch 71/249 - Min Loss:29.89 - Loss:29.899002156894124\n",
      " Iter:0 - Alpha:0.05 - Batch 72/249 - Min Loss:29.85 - Loss:29.85125401656389\n",
      " Iter:0 - Alpha:0.05 - Batch 73/249 - Min Loss:29.74 - Loss:29.742558662511755\n",
      " Iter:0 - Alpha:0.05 - Batch 74/249 - Min Loss:29.55 - Loss:29.554013392531395\n",
      " Iter:0 - Alpha:0.05 - Batch 75/249 - Min Loss:29.41 - Loss:29.413451221065877\n",
      " Iter:0 - Alpha:0.05 - Batch 76/249 - Min Loss:29.30 - Loss:29.300279999145584\n",
      " Iter:0 - Alpha:0.05 - Batch 77/249 - Min Loss:29.16 - Loss:29.162816380218032\n",
      " Iter:0 - Alpha:0.05 - Batch 78/249 - Min Loss:29.09 - Loss:29.09592033129947\n",
      " Iter:0 - Alpha:0.05 - Batch 79/249 - Min Loss:28.96 - Loss:28.969066622778954\n",
      " Iter:0 - Alpha:0.05 - Batch 80/249 - Min Loss:28.84 - Loss:28.847024497772598\n",
      " Iter:0 - Alpha:0.05 - Batch 81/249 - Min Loss:28.68 - Loss:28.682792440948468\n",
      " Iter:0 - Alpha:0.05 - Batch 82/249 - Min Loss:28.59 - Loss:28.598158009843733\n",
      " Iter:0 - Alpha:0.05 - Batch 86/249 - Min Loss:28.54 - Loss:28.672844929166207\n",
      " Iter:0 - Alpha:0.05 - Batch 87/249 - Min Loss:28.50 - Loss:28.5084865620548\n",
      " Iter:0 - Alpha:0.05 - Batch 88/249 - Min Loss:28.35 - Loss:28.352102311265327\n",
      " Iter:0 - Alpha:0.05 - Batch 89/249 - Min Loss:28.19 - Loss:28.192980003869685\n",
      " Iter:0 - Alpha:0.05 - Batch 90/249 - Min Loss:28.05 - Loss:28.052399362314123\n",
      " Iter:0 - Alpha:0.05 - Batch 91/249 - Min Loss:27.92 - Loss:27.928541406577803\n",
      " Iter:0 - Alpha:0.05 - Batch 92/249 - Min Loss:27.82 - Loss:27.826934205543306\n",
      " Iter:0 - Alpha:0.05 - Batch 93/249 - Min Loss:27.67 - Loss:27.67999675592469\n",
      " Iter:0 - Alpha:0.05 - Batch 94/249 - Min Loss:27.54 - Loss:27.549420657967516\n",
      " Iter:0 - Alpha:0.05 - Batch 95/249 - Min Loss:27.40 - Loss:27.40841828220945\n",
      " Iter:0 - Alpha:0.05 - Batch 96/249 - Min Loss:27.31 - Loss:27.318381171485257\n",
      " Iter:0 - Alpha:0.05 - Batch 97/249 - Min Loss:27.19 - Loss:27.199553368220013\n",
      " Iter:0 - Alpha:0.05 - Batch 98/249 - Min Loss:27.08 - Loss:27.085005756069428\n",
      " Iter:0 - Alpha:0.05 - Batch 99/249 - Min Loss:26.92 - Loss:26.920652799695258\n",
      " Iter:0 - Alpha:0.05 - Batch 100/249 - Min Loss:26.77 - Loss:26.778338171170603\n",
      " Iter:0 - Alpha:0.05 - Batch 101/249 - Min Loss:26.63 - Loss:26.63444492542303\n",
      " Iter:0 - Alpha:0.05 - Batch 102/249 - Min Loss:26.50 - Loss:26.503214805264342\n",
      " Iter:0 - Alpha:0.05 - Batch 103/249 - Min Loss:26.38 - Loss:26.38627303021265\n",
      " Iter:0 - Alpha:0.05 - Batch 104/249 - Min Loss:26.28 - Loss:26.28984072160501\n",
      " Iter:0 - Alpha:0.05 - Batch 105/249 - Min Loss:26.19 - Loss:26.197356511158755\n",
      " Iter:0 - Alpha:0.05 - Batch 106/249 - Min Loss:26.11 - Loss:26.110568826540085\n",
      " Iter:0 - Alpha:0.05 - Batch 107/249 - Min Loss:26.08 - Loss:26.08702025964796\n",
      " Iter:0 - Alpha:0.05 - Batch 108/249 - Min Loss:25.98 - Loss:25.98179026707607\n",
      " Iter:0 - Alpha:0.05 - Batch 109/249 - Min Loss:25.89 - Loss:25.891771314535525\n",
      " Iter:0 - Alpha:0.05 - Batch 110/249 - Min Loss:25.80 - Loss:25.802742883596974\n",
      " Iter:0 - Alpha:0.05 - Batch 111/249 - Min Loss:25.71 - Loss:25.717937269876025\n",
      " Iter:0 - Alpha:0.05 - Batch 112/249 - Min Loss:25.64 - Loss:25.641765977541738\n",
      " Iter:0 - Alpha:0.05 - Batch 113/249 - Min Loss:25.57 - Loss:25.57325789739905\n",
      " Iter:0 - Alpha:0.05 - Batch 114/249 - Min Loss:25.49 - Loss:25.499436282313912\n",
      " Iter:0 - Alpha:0.05 - Batch 115/249 - Min Loss:25.44 - Loss:25.442407943785195\n",
      " Iter:0 - Alpha:0.05 - Batch 116/249 - Min Loss:25.35 - Loss:25.35542832928056\n",
      " Iter:0 - Alpha:0.05 - Batch 117/249 - Min Loss:25.27 - Loss:25.276678269744394\n",
      " Iter:0 - Alpha:0.05 - Batch 118/249 - Min Loss:25.17 - Loss:25.175983940560986\n",
      " Iter:0 - Alpha:0.05 - Batch 119/249 - Min Loss:25.10 - Loss:25.10534037752\n",
      " Iter:0 - Alpha:0.05 - Batch 120/249 - Min Loss:25.01 - Loss:25.010115025991002\n",
      " Iter:0 - Alpha:0.05 - Batch 121/249 - Min Loss:24.92 - Loss:24.920317590856904\n",
      " Iter:0 - Alpha:0.05 - Batch 122/249 - Min Loss:24.82 - Loss:24.822277409798907\n",
      " Iter:0 - Alpha:0.05 - Batch 123/249 - Min Loss:24.72 - Loss:24.72203946090018\n",
      " Iter:0 - Alpha:0.05 - Batch 124/249 - Min Loss:24.62 - Loss:24.626004292971295\n",
      " Iter:0 - Alpha:0.05 - Batch 125/249 - Min Loss:24.54 - Loss:24.548408835881084\n",
      " Iter:0 - Alpha:0.05 - Batch 126/249 - Min Loss:24.48 - Loss:24.481874816478964\n",
      " Iter:0 - Alpha:0.05 - Batch 127/249 - Min Loss:24.39 - Loss:24.397815037786064\n",
      " Iter:0 - Alpha:0.05 - Batch 128/249 - Min Loss:24.29 - Loss:24.29141790917974\n",
      " Iter:0 - Alpha:0.05 - Batch 129/249 - Min Loss:24.19 - Loss:24.19678204070675\n",
      " Iter:0 - Alpha:0.05 - Batch 130/249 - Min Loss:24.12 - Loss:24.126267171636325\n",
      " Iter:0 - Alpha:0.05 - Batch 131/249 - Min Loss:24.03 - Loss:24.031656923161755\n",
      " Iter:0 - Alpha:0.05 - Batch 132/249 - Min Loss:23.93 - Loss:23.93408720034178\n",
      " Iter:0 - Alpha:0.05 - Batch 133/249 - Min Loss:23.85 - Loss:23.850310797547163\n",
      " Iter:0 - Alpha:0.05 - Batch 134/249 - Min Loss:23.76 - Loss:23.76517754738187\n",
      " Iter:0 - Alpha:0.05 - Batch 135/249 - Min Loss:23.71 - Loss:23.715494372742555\n",
      " Iter:0 - Alpha:0.05 - Batch 136/249 - Min Loss:23.70 - Loss:23.701413122242627\n",
      " Iter:0 - Alpha:0.05 - Batch 137/249 - Min Loss:23.62 - Loss:23.628566884474214\n",
      " Iter:0 - Alpha:0.05 - Batch 138/249 - Min Loss:23.53 - Loss:23.536622255870185\n",
      " Iter:0 - Alpha:0.05 - Batch 140/249 - Min Loss:23.51 - Loss:23.553200160956592\n",
      " Iter:0 - Alpha:0.05 - Batch 141/249 - Min Loss:23.49 - Loss:23.4954391983157\n",
      " Iter:0 - Alpha:0.05 - Batch 142/249 - Min Loss:23.40 - Loss:23.407151066893466\n",
      " Iter:0 - Alpha:0.05 - Batch 143/249 - Min Loss:23.32 - Loss:23.322406347920353\n",
      " Iter:0 - Alpha:0.05 - Batch 144/249 - Min Loss:23.23 - Loss:23.23392038507518\n",
      " Iter:0 - Alpha:0.05 - Batch 145/249 - Min Loss:23.15 - Loss:23.158962886202442\n",
      " Iter:0 - Alpha:0.05 - Batch 146/249 - Min Loss:23.13 - Loss:23.13027389905198\n",
      " Iter:0 - Alpha:0.05 - Batch 147/249 - Min Loss:23.08 - Loss:23.08432984727065\n",
      " Iter:0 - Alpha:0.05 - Batch 148/249 - Min Loss:23.05 - Loss:23.052812278291672\n",
      " Iter:0 - Alpha:0.05 - Batch 149/249 - Min Loss:22.97 - Loss:22.977617993036645\n",
      " Iter:0 - Alpha:0.05 - Batch 150/249 - Min Loss:22.90 - Loss:22.90860982682122\n",
      " Iter:0 - Alpha:0.05 - Batch 151/249 - Min Loss:22.86 - Loss:22.86285029247915\n",
      " Iter:0 - Alpha:0.05 - Batch 152/249 - Min Loss:22.79 - Loss:22.798324855724506\n",
      " Iter:0 - Alpha:0.05 - Batch 153/249 - Min Loss:22.71 - Loss:22.714156980919036\n",
      " Iter:0 - Alpha:0.05 - Batch 154/249 - Min Loss:22.64 - Loss:22.649942295215556\n",
      " Iter:0 - Alpha:0.05 - Batch 155/249 - Min Loss:22.60 - Loss:22.60987787211344\n",
      " Iter:0 - Alpha:0.05 - Batch 156/249 - Min Loss:22.58 - Loss:22.586330875896408\n",
      " Iter:0 - Alpha:0.05 - Batch 157/249 - Min Loss:22.53 - Loss:22.539866221935924\n",
      " Iter:0 - Alpha:0.05 - Batch 158/249 - Min Loss:22.48 - Loss:22.481459540414978\n",
      " Iter:0 - Alpha:0.05 - Batch 159/249 - Min Loss:22.44 - Loss:22.44458654448001\n",
      " Iter:0 - Alpha:0.05 - Batch 160/249 - Min Loss:22.38 - Loss:22.383503615633796\n",
      " Iter:0 - Alpha:0.05 - Batch 161/249 - Min Loss:22.31 - Loss:22.317076639885897\n",
      " Iter:0 - Alpha:0.05 - Batch 162/249 - Min Loss:22.25 - Loss:22.259430027034902\n",
      " Iter:0 - Alpha:0.05 - Batch 163/249 - Min Loss:22.19 - Loss:22.19904509067437\n",
      " Iter:0 - Alpha:0.05 - Batch 164/249 - Min Loss:22.14 - Loss:22.148098444656362\n",
      " Iter:0 - Alpha:0.05 - Batch 165/249 - Min Loss:22.10 - Loss:22.101528454497767\n",
      " Iter:0 - Alpha:0.05 - Batch 166/249 - Min Loss:22.04 - Loss:22.04778889335714\n",
      " Iter:0 - Alpha:0.05 - Batch 167/249 - Min Loss:22.01 - Loss:22.015539256206182\n",
      " Iter:0 - Alpha:0.05 - Batch 168/249 - Min Loss:21.97 - Loss:21.97610738320577\n",
      " Iter:0 - Alpha:0.05 - Batch 169/249 - Min Loss:21.91 - Loss:21.91886403464624\n",
      " Iter:0 - Alpha:0.05 - Batch 170/249 - Min Loss:21.87 - Loss:21.87209957981593\n",
      " Iter:0 - Alpha:0.05 - Batch 171/249 - Min Loss:21.83 - Loss:21.83197231706187\n",
      " Iter:0 - Alpha:0.05 - Batch 172/249 - Min Loss:21.79 - Loss:21.797138291435964\n",
      " Iter:0 - Alpha:0.05 - Batch 173/249 - Min Loss:21.75 - Loss:21.752929637839053\n",
      " Iter:0 - Alpha:0.05 - Batch 174/249 - Min Loss:21.71 - Loss:21.71451452173225\n",
      " Iter:0 - Alpha:0.05 - Batch 175/249 - Min Loss:21.68 - Loss:21.68635181976672\n",
      " Iter:0 - Alpha:0.05 - Batch 176/249 - Min Loss:21.64 - Loss:21.643368363210932\n",
      " Iter:0 - Alpha:0.05 - Batch 177/249 - Min Loss:21.59 - Loss:21.59547234826887\n",
      " Iter:0 - Alpha:0.05 - Batch 178/249 - Min Loss:21.54 - Loss:21.548969384961\n",
      " Iter:0 - Alpha:0.05 - Batch 179/249 - Min Loss:21.52 - Loss:21.528790503616882\n",
      " Iter:0 - Alpha:0.05 - Batch 180/249 - Min Loss:21.48 - Loss:21.484217083500187\n",
      " Iter:0 - Alpha:0.05 - Batch 181/249 - Min Loss:21.43 - Loss:21.43585185733848\n",
      " Iter:0 - Alpha:0.05 - Batch 182/249 - Min Loss:21.38 - Loss:21.386494316884\n",
      " Iter:0 - Alpha:0.05 - Batch 183/249 - Min Loss:21.33 - Loss:21.333293821976987\n",
      " Iter:0 - Alpha:0.05 - Batch 184/249 - Min Loss:21.29 - Loss:21.292714546692963\n",
      " Iter:0 - Alpha:0.05 - Batch 185/249 - Min Loss:21.24 - Loss:21.2479883334045\n",
      " Iter:0 - Alpha:0.05 - Batch 186/249 - Min Loss:21.21 - Loss:21.21454412091733\n",
      " Iter:0 - Alpha:0.05 - Batch 187/249 - Min Loss:21.18 - Loss:21.18709730607852\n",
      " Iter:0 - Alpha:0.05 - Batch 188/249 - Min Loss:21.18 - Loss:21.18500879669473\n",
      " Iter:0 - Alpha:0.05 - Batch 189/249 - Min Loss:21.14 - Loss:21.149192169258725\n",
      " Iter:0 - Alpha:0.05 - Batch 190/249 - Min Loss:21.12 - Loss:21.121492290739525\n",
      " Iter:0 - Alpha:0.05 - Batch 191/249 - Min Loss:21.09 - Loss:21.090529064322375\n",
      " Iter:0 - Alpha:0.05 - Batch 192/249 - Min Loss:21.06 - Loss:21.061615675909103\n",
      " Iter:0 - Alpha:0.05 - Batch 193/249 - Min Loss:21.00 - Loss:21.000099027990963\n",
      " Iter:0 - Alpha:0.05 - Batch 194/249 - Min Loss:20.98 - Loss:20.98468231590407\n",
      " Iter:0 - Alpha:0.05 - Batch 195/249 - Min Loss:20.98 - Loss:20.983618463178825\n",
      " Iter:0 - Alpha:0.05 - Batch 196/249 - Min Loss:20.94 - Loss:20.941073976275163\n",
      " Iter:0 - Alpha:0.05 - Batch 197/249 - Min Loss:20.90 - Loss:20.909982209463195\n",
      " Iter:0 - Alpha:0.05 - Batch 198/249 - Min Loss:20.87 - Loss:20.8792320482417\n",
      " Iter:0 - Alpha:0.05 - Batch 199/249 - Min Loss:20.83 - Loss:20.836144396820785\n",
      " Iter:0 - Alpha:0.05 - Batch 200/249 - Min Loss:20.78 - Loss:20.786702940643394\n",
      " Iter:0 - Alpha:0.05 - Batch 201/249 - Min Loss:20.75 - Loss:20.755678930297545\n",
      " Iter:0 - Alpha:0.05 - Batch 202/249 - Min Loss:20.72 - Loss:20.72621131450987\n",
      " Iter:0 - Alpha:0.05 - Batch 203/249 - Min Loss:20.68 - Loss:20.68893807384813\n",
      " Iter:0 - Alpha:0.05 - Batch 204/249 - Min Loss:20.65 - Loss:20.650964182346172\n",
      " Iter:0 - Alpha:0.05 - Batch 205/249 - Min Loss:20.61 - Loss:20.617254141780762\n",
      " Iter:0 - Alpha:0.05 - Batch 206/249 - Min Loss:20.57 - Loss:20.573633574910954\n",
      " Iter:0 - Alpha:0.05 - Batch 207/249 - Min Loss:20.53 - Loss:20.53542759736034\n",
      " Iter:0 - Alpha:0.05 - Batch 208/249 - Min Loss:20.49 - Loss:20.49739996174571\n",
      " Iter:0 - Alpha:0.05 - Batch 209/249 - Min Loss:20.44 - Loss:20.449686230569824\n",
      " Iter:0 - Alpha:0.05 - Batch 210/249 - Min Loss:20.40 - Loss:20.403291221330196\n",
      " Iter:0 - Alpha:0.05 - Batch 211/249 - Min Loss:20.34 - Loss:20.3475100925815\n",
      " Iter:0 - Alpha:0.05 - Batch 212/249 - Min Loss:20.30 - Loss:20.301789206440752\n",
      " Iter:0 - Alpha:0.05 - Batch 213/249 - Min Loss:20.26 - Loss:20.26542625947414\n",
      " Iter:0 - Alpha:0.05 - Batch 214/249 - Min Loss:20.23 - Loss:20.23111001746059\n",
      " Iter:0 - Alpha:0.05 - Batch 215/249 - Min Loss:20.18 - Loss:20.18481394544832\n",
      " Iter:0 - Alpha:0.05 - Batch 216/249 - Min Loss:20.14 - Loss:20.144828981387178\n",
      " Iter:0 - Alpha:0.05 - Batch 217/249 - Min Loss:20.11 - Loss:20.117436374376467\n",
      " Iter:0 - Alpha:0.05 - Batch 218/249 - Min Loss:20.07 - Loss:20.07285790792156\n",
      " Iter:0 - Alpha:0.05 - Batch 219/249 - Min Loss:20.01 - Loss:20.019440903545878\n",
      " Iter:0 - Alpha:0.05 - Batch 220/249 - Min Loss:19.99 - Loss:19.995961465663378\n",
      " Iter:0 - Alpha:0.05 - Batch 221/249 - Min Loss:19.98 - Loss:19.986751087464363\n",
      " Iter:0 - Alpha:0.05 - Batch 222/249 - Min Loss:19.95 - Loss:19.95413004376577\n",
      " Iter:0 - Alpha:0.05 - Batch 223/249 - Min Loss:19.92 - Loss:19.920535628027963\n",
      " Iter:0 - Alpha:0.05 - Batch 224/249 - Min Loss:19.88 - Loss:19.88982739248521\n",
      " Iter:0 - Alpha:0.05 - Batch 225/249 - Min Loss:19.85 - Loss:19.85661224385163\n",
      " Iter:0 - Alpha:0.05 - Batch 226/249 - Min Loss:19.81 - Loss:19.818270498440558\n",
      " Iter:0 - Alpha:0.05 - Batch 227/249 - Min Loss:19.78 - Loss:19.78879746898553\n",
      " Iter:0 - Alpha:0.05 - Batch 228/249 - Min Loss:19.77 - Loss:19.773987163273635\n",
      " Iter:0 - Alpha:0.05 - Batch 229/249 - Min Loss:19.74 - Loss:19.74321893462951\n",
      " Iter:0 - Alpha:0.05 - Batch 230/249 - Min Loss:19.71 - Loss:19.714114028548735\n",
      " Iter:0 - Alpha:0.05 - Batch 231/249 - Min Loss:19.69 - Loss:19.696300464252033\n",
      " Iter:0 - Alpha:0.05 - Batch 232/249 - Min Loss:19.66 - Loss:19.664281554171573\n",
      " Iter:0 - Alpha:0.05 - Batch 233/249 - Min Loss:19.64 - Loss:19.64497900030178\n",
      " Iter:0 - Alpha:0.05 - Batch 234/249 - Min Loss:19.61 - Loss:19.61390018795743\n",
      " Iter:0 - Alpha:0.05 - Batch 235/249 - Min Loss:19.57 - Loss:19.578057993323174\n",
      " Iter:0 - Alpha:0.05 - Batch 236/249 - Min Loss:19.53 - Loss:19.539290922887076\n",
      " Iter:0 - Alpha:0.05 - Batch 237/249 - Min Loss:19.51 - Loss:19.51336152702854\n",
      " Iter:0 - Alpha:0.05 - Batch 238/249 - Min Loss:19.49 - Loss:19.494216997205427\n",
      " Iter:0 - Alpha:0.05 - Batch 239/249 - Min Loss:19.47 - Loss:19.474255434858627\n",
      " Iter:0 - Alpha:0.05 - Batch 240/249 - Min Loss:19.45 - Loss:19.45452512691269\n",
      " Iter:0 - Alpha:0.05 - Batch 241/249 - Min Loss:19.41 - Loss:19.41567509465724\n",
      " Iter:0 - Alpha:0.05 - Batch 242/249 - Min Loss:19.38 - Loss:19.384809146622928\n",
      " Iter:0 - Alpha:0.05 - Batch 244/249 - Min Loss:19.35 - Loss:19.35866317796798\n",
      " Iter:0 - Alpha:0.05 - Batch 245/249 - Min Loss:19.33 - Loss:19.33615864113177\n",
      " Iter:0 - Alpha:0.05 - Batch 246/249 - Min Loss:19.31 - Loss:19.311764056907894\n",
      " Iter:0 - Alpha:0.05 - Batch 247/249 - Min Loss:19.28 - Loss:19.28639653162862\n",
      " Iter:0 - Alpha:0.05 - Batch 248/249 - Min Loss:19.25 - Loss:19.259355873841894\n",
      " Iter:0 - Alpha:0.05 - Batch 249/249 - Min Loss:19.23 - Loss:19.233214769358476\n",
      " Iter:1 - Alpha:0.049 - Batch 1/249 - Min Loss:13.06 - Loss:13.063830116471486 - her t tere t tere t tere t tere t tere t tere t tere t tere t tere t t\n",
      " Iter:1 - Alpha:0.049 - Batch 3/249 - Min Loss:12.94 - Loss:13.045405787590937\n",
      " Iter:1 - Alpha:0.049 - Batch 4/249 - Min Loss:12.93 - Loss:12.931715871054474\n",
      " Iter:1 - Alpha:0.049 - Batch 249/249 - Min Loss:12.88 - Loss:13.297767268945559\n",
      " Iter:2 - Alpha:0.049 - Batch 249/249 - Min Loss:11.98 - Loss:12.467682706214898 hen theren theren therer then theren theren therer then theren theren \n",
      " Iter:3 - Alpha:0.048 - Batch 2/249 - Min Loss:11.43 - Loss:11.463986229895673 - hen theres thes thes thes thes thes thes thes thes thes thes thes thes\n",
      " Iter:3 - Alpha:0.048 - Batch 3/249 - Min Loss:11.43 - Loss:11.433608994379455\n",
      " Iter:3 - Alpha:0.048 - Batch 4/249 - Min Loss:11.29 - Loss:11.292592685693808\n",
      " Iter:5 - Alpha:0.047 - Batch 55/249 - Min Loss:11.19 - Loss:11.211233778075991- hend seates, and seates, and seates, and seates, and seates, and seate\n",
      " Iter:5 - Alpha:0.047 - Batch 56/249 - Min Loss:11.17 - Loss:11.172951972803446\n",
      " Iter:5 - Alpha:0.047 - Batch 100/249 - Min Loss:11.14 - Loss:11.15109606209378\n",
      " Iter:5 - Alpha:0.047 - Batch 101/249 - Min Loss:11.13 - Loss:11.13802901054302\n",
      " Iter:5 - Alpha:0.047 - Batch 102/249 - Min Loss:11.11 - Loss:11.119285037049702\n",
      " Iter:5 - Alpha:0.047 - Batch 103/249 - Min Loss:11.11 - Loss:11.112100526728414\n",
      " Iter:5 - Alpha:0.047 - Batch 104/249 - Min Loss:11.10 - Loss:11.104394523353339\n",
      " Iter:5 - Alpha:0.047 - Batch 105/249 - Min Loss:11.09 - Loss:11.099332194264191\n",
      " Iter:5 - Alpha:0.047 - Batch 107/249 - Min Loss:11.08 - Loss:11.084563410952635\n",
      " Iter:5 - Alpha:0.047 - Batch 128/249 - Min Loss:11.07 - Loss:11.080755098118637\n",
      " Iter:5 - Alpha:0.047 - Batch 129/249 - Min Loss:11.07 - Loss:11.074532810524575\n",
      " Iter:5 - Alpha:0.047 - Batch 130/249 - Min Loss:11.07 - Loss:11.070308307744044\n",
      " Iter:5 - Alpha:0.047 - Batch 131/249 - Min Loss:11.05 - Loss:11.053205895772722\n",
      " Iter:5 - Alpha:0.047 - Batch 133/249 - Min Loss:11.03 - Loss:11.041628774766117\n",
      " Iter:5 - Alpha:0.047 - Batch 135/249 - Min Loss:11.03 - Loss:11.035707041019423\n",
      " Iter:5 - Alpha:0.047 - Batch 137/249 - Min Loss:11.02 - Loss:11.024606766677806\n",
      " Iter:5 - Alpha:0.047 - Batch 138/249 - Min Loss:11.00 - Loss:11.007025857860137\n",
      " Iter:5 - Alpha:0.047 - Batch 144/249 - Min Loss:11.00 - Loss:11.004993617647228\n",
      " Iter:5 - Alpha:0.047 - Batch 152/249 - Min Loss:11.00 - Loss:11.003699896686872\n",
      " Iter:5 - Alpha:0.047 - Batch 153/249 - Min Loss:10.99 - Loss:10.996498599690339\n",
      " Iter:5 - Alpha:0.047 - Batch 208/249 - Min Loss:10.98 - Loss:10.995703693907218\n",
      " Iter:5 - Alpha:0.047 - Batch 209/249 - Min Loss:10.98 - Loss:10.987160856768003\n",
      " Iter:5 - Alpha:0.047 - Batch 210/249 - Min Loss:10.97 - Loss:10.979813712335138\n",
      " Iter:5 - Alpha:0.047 - Batch 211/249 - Min Loss:10.96 - Loss:10.961275932129482\n",
      " Iter:5 - Alpha:0.047 - Batch 212/249 - Min Loss:10.95 - Loss:10.954692498871907\n",
      " Iter:5 - Alpha:0.047 - Batch 213/249 - Min Loss:10.94 - Loss:10.948507347452539\n",
      " Iter:5 - Alpha:0.047 - Batch 214/249 - Min Loss:10.94 - Loss:10.947598863066297\n",
      " Iter:5 - Alpha:0.047 - Batch 215/249 - Min Loss:10.93 - Loss:10.937779129758908\n",
      " Iter:5 - Alpha:0.047 - Batch 217/249 - Min Loss:10.93 - Loss:10.933565142331665\n",
      " Iter:5 - Alpha:0.047 - Batch 218/249 - Min Loss:10.92 - Loss:10.927695951351394\n",
      " Iter:5 - Alpha:0.047 - Batch 223/249 - Min Loss:10.91 - Loss:10.920012740008673\n",
      " Iter:5 - Alpha:0.047 - Batch 224/249 - Min Loss:10.91 - Loss:10.915941202659539\n",
      " Iter:5 - Alpha:0.047 - Batch 225/249 - Min Loss:10.91 - Loss:10.911412416260603\n",
      " Iter:5 - Alpha:0.047 - Batch 226/249 - Min Loss:10.90 - Loss:10.908265904635522\n",
      " Iter:5 - Alpha:0.047 - Batch 235/249 - Min Loss:10.90 - Loss:10.905597365624304\n",
      " Iter:5 - Alpha:0.047 - Batch 236/249 - Min Loss:10.89 - Loss:10.898794557204706\n",
      " Iter:5 - Alpha:0.047 - Batch 237/249 - Min Loss:10.89 - Loss:10.890970313767603\n",
      " Iter:5 - Alpha:0.047 - Batch 240/249 - Min Loss:10.88 - Loss:10.88534610730771\n",
      " Iter:5 - Alpha:0.047 - Batch 241/249 - Min Loss:10.87 - Loss:10.876534714857252\n",
      " Iter:5 - Alpha:0.047 - Batch 242/249 - Min Loss:10.87 - Loss:10.87459117637956\n",
      " Iter:5 - Alpha:0.047 - Batch 248/249 - Min Loss:10.86 - Loss:10.872254639188432\n",
      " Iter:5 - Alpha:0.047 - Batch 249/249 - Min Loss:10.86 - Loss:10.86373688723013\n",
      " Iter:6 - Alpha:0.047 - Batch 1/249 - Min Loss:10.69 - Loss:10.690402702580894 - hen theres, and theres, and theres, and theres, and theres, and theres\n",
      " Iter:7 - Alpha:0.046 - Batch 140/249 - Min Loss:10.55 - Loss:10.736922784153954heres, and seent thees, and seent thees, and seent thees, and seent th"
     ]
    }
   ],
   "source": [
    "train(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " Iter:91 - Alpha:0.016 - Batch 176/249 - Min Loss:9.900 - Loss:11.975722569949843\n"
     ]
    }
   ],
   "source": [
    "train(100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Intestay thee.\n",
      "\n",
      "SIR:\n",
      "It thou my thar the sentastar the see the see:\n",
      "Imentary take the subloud I\n",
      "Stall my thentaring fook the senternight pead me, the gakentlenternot they day them.\n",
      "\n",
      "KENNOR:\n",
      "I stay the see talk :\n",
      "Non the seady!\n",
      "\n",
      "Sustar thou shour in the suble the see the senternow the antently the see the seaventlace peake,\n",
      "I sentlentony my thent:\n",
      "I the sentastar thamy this not thame.\n",
      "\n",
      "From the stay the sentastar star the see the senternce thentlent\n",
      "stay you, he shad be his say the senterny astak\n"
     ]
    }
   ],
   "source": [
    "def generate_sample(n=30, init_char=' '):\n",
    "    # Generate n characters, feeding each prediction back in as the next input.\n",
    "    s = \"\"\n",
    "    hidden = model.init_hidden(batch_size=1)\n",
    "    input = Tensor(np.array([word2index[init_char]]))\n",
    "    for i in range(n):\n",
    "        rnn_input = embed.forward(input)\n",
    "        output, hidden = model.forward(input=rnn_input, hidden=hidden)\n",
    "        output.data *= 15  # sharpen the scores before the softmax\n",
    "        temp_dist = output.softmax()\n",
    "        temp_dist /= temp_dist.sum()  # only used by the commented-out sampling line\n",
    "\n",
    "#         m = (temp_dist > np.random.rand()).argmax() # sample from predictions\n",
    "        m = output.data.argmax() # take the max prediction\n",
    "        c = vocab[m]\n",
    "        input = Tensor(np.array([m]))  # the chosen character becomes the next input\n",
    "        s += c\n",
    "    return s\n",
    "print(generate_sample(n=500, init_char='\\n'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
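
Note on sampling in `generate_sample` (first Chapter14 LSTM notebook): the commented-out line compares the whole distribution against a single random number and takes the first index that exceeds it, which is not the same as drawing an index with the probabilities the softmax assigns. A minimal sketch of one alternative follows, assuming only NumPy. The helper name `sample_next_index` and its `sharpness` parameter are hypothetical and do not appear in the notebook; `sharpness` stands in for the constant that the notebook multiplies `output.data` by.

```python
import numpy as np

def sample_next_index(scores, sharpness=1.0, rng=None):
    """Draw one index from softmax(sharpness * scores).

    Hypothetical helper, not part of the notebook. `scores` may be a
    1-D array or a (1, vocab_size) array such as `output.data`.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(scores, dtype=float).ravel() * sharpness
    scaled = scaled - scaled.max()      # shift so exp() cannot overflow
    probs = np.exp(scaled)
    probs /= probs.sum()                # normalize to a probability vector
    index = int(rng.choice(len(probs), p=probs))
    return index, probs

# Example: three made-up scores, moderately sharpened.
index, probs = sample_next_index([[2.0, 1.0, 0.1]], sharpness=2.0)
print(index, probs)
```

A larger `sharpness` pushes the draw towards the arg-max choice that the notebook currently uses, while a value near 1 keeps more variety in the generated text.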


================================================
FILE: Chapter14 - Intro to LSTMs - Part 2 - Learn to Write Like Shakespeare.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "class Tensor (object):\n",
    "    \n",
    "    def __init__(self,data,\n",
    "                 autograd=False,\n",
    "                 creators=None,\n",
    "                 creation_op=None,\n",
    "                 id=None):\n",
    "        \n",
    "        self.data = np.array(data)\n",
    "        self.autograd = autograd\n",
    "        self.grad = None\n",
    "        if(id is None):\n",
    "            self.id = np.random.randint(0,100000)\n",
    "        else:\n",
    "            self.id = id\n",
    "        \n",
    "        self.creators = creators\n",
    "        self.creation_op = creation_op\n",
    "        self.children = {}\n",
    "        \n",
    "        if(creators is not None):\n",
    "            for c in creators:\n",
    "                if(self.id not in c.children):\n",
    "                    c.children[self.id] = 1\n",
    "                else:\n",
    "                    c.children[self.id] += 1\n",
    "\n",
    "    def all_children_grads_accounted_for(self):\n",
    "        for id,cnt in self.children.items():\n",
    "            if(cnt != 0):\n",
    "                return False\n",
    "        return True \n",
    "        \n",
    "    def backward(self,grad=None, grad_origin=None):\n",
    "        if(self.autograd):\n",
    " \n",
    "            if(grad is None):\n",
    "                grad = Tensor(np.ones_like(self.data))\n",
    "\n",
    "            if(grad_origin is not None):\n",
    "                if(self.children[grad_origin.id] == 0):\n",
    "                    raise Exception(\"cannot backprop more than once\")\n",
    "                else:\n",
    "                    self.children[grad_origin.id] -= 1\n",
    "\n",
    "            if(self.grad is None):\n",
    "                self.grad = grad\n",
    "            else:\n",
    "                self.grad += grad\n",
    "            \n",
    "            # grads must not have grads of their own\n",
    "            assert grad.autograd == False\n",
    "            \n",
    "            # only continue backpropping if there's something to\n",
    "            # backprop into and if all gradients (from children)\n",
    "            # are accounted for override waiting for children if\n",
    "            # \"backprop\" was called on this variable directly\n",
    "            if(self.creators is not None and \n",
    "               (self.all_children_grads_accounted_for() or \n",
    "                grad_origin is None)):\n",
    "\n",
    "                if(self.creation_op == \"add\"):\n",
    "                    self.creators[0].backward(self.grad, self)\n",
    "                    self.creators[1].backward(self.grad, self)\n",
    "                    \n",
    "                if(self.creation_op == \"sub\"):\n",
    "                    self.creators[0].backward(Tensor(self.grad.data), self)\n",
    "                    self.creators[1].backward(Tensor(self.grad.__neg__().data), self)\n",
    "\n",
    "                if(self.creation_op == \"mul\"):\n",
    "                    new = self.grad * self.creators[1]\n",
    "                    self.creators[0].backward(new , self)\n",
    "                    new = self.grad * self.creators[0]\n",
    "                    self.creators[1].backward(new, self)                    \n",
    "                    \n",
    "                if(self.creation_op == \"mm\"):\n",
    "                    c0 = self.creators[0]\n",
    "                    c1 = self.creators[1]\n",
    "                    new = self.grad.mm(c1.transpose())\n",
    "                    c0.backward(new)\n",
    "                    new = self.grad.transpose().mm(c0).transpose()\n",
    "                    c1.backward(new)\n",
    "                    \n",
    "                if(self.creation_op == \"transpose\"):\n",
    "                    self.creators[0].backward(self.grad.transpose())\n",
    "\n",
    "                if(\"sum\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.expand(dim,\n",
    "                                                               self.creators[0].data.shape[dim]))\n",
    "\n",
    "                if(\"expand\" in self.creation_op):\n",
    "                    dim = int(self.creation_op.split(\"_\")[1])\n",
    "                    self.creators[0].backward(self.grad.sum(dim))\n",
    "                    \n",
    "                if(self.creation_op == \"neg\"):\n",
    "                    self.creators[0].backward(self.grad.__neg__())\n",
    "                    \n",
    "                if(self.creation_op == \"sigmoid\"):\n",
    "                    ones = Tensor(np.ones_like(self.grad.data))\n",
    "                    self.creators[0].backward(self.grad * (self * (ones - self)))\n",
    "                \n",
    "                if(self.creation_op == \"tanh\"):\n",
    "

Condensed preview: 106 files, each showing path, character count, and a content snippet. The full structured content is 13,513K chars.

[
  {
    "path": ".gitignore",
    "chars": 255,
    "preview": "\n# Created by https://www.gitignore.io/api/jupyternotebook\n\n### JupyterNotebook ###\n.ipynb_checkpoints\n*/.ipynb_checkpoi"
  },
  {
    "path": "Chapter10 - Intro to Convolutional Neural Networks - Learning Edges and Corners.ipynb",
    "chars": 21342,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Upgrading our MNIST Network\"\n   ]"
  },
  {
    "path": "Chapter11 - Intro to Word Embeddings - Neural Networks that Understand Language.ipynb",
    "chars": 16631,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Download the IMDB Dataset\"\n   ]\n "
  },
  {
    "path": "Chapter12 - Intro to Recurrence - Predicting the Next Word.ipynb",
    "chars": 17527,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Download & Preprocess the IMDB Da"
  },
  {
    "path": "Chapter13 - Intro to Automatic Differentiation - Let's Build A Deep Learning Framework.ipynb",
    "chars": 78590,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Part 1: Introduction to Tensors\"\n"
  },
  {
    "path": "Chapter14 - Exploding Gradients Examples.ipynb",
    "chars": 3771,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 158,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name"
  },
  {
    "path": "Chapter14 - Intro to LSTMs - Learn to Write Like Shakespeare.ipynb",
    "chars": 55354,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n "
  },
  {
    "path": "Chapter14 - Intro to LSTMs - Part 2 - Learn to Write Like Shakespeare.ipynb",
    "chars": 43446,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 100,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": ["
  },
  {
    "path": "Chapter15 - Intro to Federated Learning - Deep Learning on Unseen Data.ipynb",
    "chars": 45426,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 93,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n"
  },
  {
    "path": "Chapter3 -  Forward Propagation - Intro to Neural Prediction.ipynb",
    "chars": 16589,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# A Simple Neural Network Making a "
  },
  {
    "path": "Chapter4 - Gradient Descent - Intro to Neural Learning.ipynb",
    "chars": 91972,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Compare: Does our network make go"
  },
  {
    "path": "Chapter5 - Generalizing Gradient Descent - Learning Multiple Weights at a Time.ipynb",
    "chars": 12268,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Gradient Descent Learning with Mu"
  },
  {
    "path": "Chapter6 - Intro to Backpropagation - Building Your First DEEP Neural Network.ipynb",
    "chars": 24043,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Creating a Matrix or Two in Pytho"
  },
  {
    "path": "Chapter8 - Intro to Regularization - Learning Signal and Ignoring Noise.ipynb",
    "chars": 21649,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# 3 Layer Network on MNIST\"\n   ]\n  "
  },
  {
    "path": "Chapter9 - Intro to Activation Functions - Modeling Probabilities.ipynb",
    "chars": 5302,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Upgrading our MNIST Network\"\n   ]"
  },
  {
    "path": "MNISTPreprocessor.ipynb",
    "chars": 7309,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n "
  },
  {
    "path": "README.md",
    "chars": 3496,
    "preview": "# Grokking-Deep-Learning\n[![Run on FloydHub](https://static.floydhub.com/button/button-small.svg)](https://floydhub.com/"
  },
  {
    "path": "docker-compose.yml",
    "chars": 581,
    "preview": "#\n# Run the Jupyter notebooks in a container using Docker Compose\n#\n# Start container:\n# docker-compose up -d\n#\n# Open h"
  },
  {
    "path": "floyd.yml",
    "chars": 33,
    "preview": "env: tensorflow-1.9\nmachine: cpu\n"
  },
  {
    "path": "labels.txt",
    "chars": 225000,
    "preview": "positive\nnegative\npositive\nnegative\npositive\nnegative\npositive\nnegative\npositive\nnegative\npositive\nnegative\npositive\nneg"
  },
  {
    "path": "shakespear.txt",
    "chars": 99993,
    "preview": "That, poor contempt, or claim'd thou slept so faithful,\nI may contrive our father; and, in their defeated queen,\nHer fle"
  },
  {
    "path": "tasksv11/LICENSE",
    "chars": 19561,
    "preview": "CC License\n\nbAbI tasks data\n\nCopyright (c) 2015-present, Facebook, Inc. All rights reserved.\n\nCreative Commons Legal Cod"
  },
  {
    "path": "tasksv11/README",
    "chars": 2980,
    "preview": "Towards AI Complete Question Answering: A Set of Prerequisite Toy Tasks\n------------------------------------------------"
  },
  {
    "path": "tasksv11/en/qa10_indefinite-knowledge_test.txt",
    "chars": 99632,
    "preview": "1 Mary is in the school.\n2 Bill is in the kitchen.\n3 Is Bill in the bedroom? \tno\t2\n4 Bill journeyed to the bedroom.\n5 Fr"
  },
  {
    "path": "tasksv11/en/qa10_indefinite-knowledge_train.txt",
    "chars": 99217,
    "preview": "1 Fred is either in the school or the park.\n2 Mary went back to the office.\n3 Is Mary in the office? \tyes\t2\n4 Bill is ei"
  },
  {
    "path": "tasksv11/en/qa11_basic-coreference_test.txt",
    "chars": 104800,
    "preview": "1 John journeyed to the hallway.\n2 After that he journeyed to the garden.\n3 Where is John? \tgarden\t1 2\n4 John moved to t"
  },
  {
    "path": "tasksv11/en/qa11_basic-coreference_train.txt",
    "chars": 104520,
    "preview": "1 Mary went back to the bathroom.\n2 After that she went to the bedroom.\n3 Where is Mary? \tbedroom\t1 2\n4 Daniel moved to "
  },
  {
    "path": "tasksv11/en/qa12_conjunction_test.txt",
    "chars": 114489,
    "preview": "1 John and Mary travelled to the hallway.\n2 Sandra and Mary journeyed to the bedroom.\n3 Where is Mary? \tbedroom\t2\n4 Mary"
  },
  {
    "path": "tasksv11/en/qa12_conjunction_train.txt",
    "chars": 114560,
    "preview": "1 Mary and Daniel travelled to the bathroom.\n2 John and Daniel travelled to the office.\n3 Where is Daniel? \toffice\t2\n4 S"
  },
  {
    "path": "tasksv11/en/qa13_compound-coreference_test.txt",
    "chars": 116444,
    "preview": "1 John and Mary went back to the hallway.\n2 Then they went to the bathroom.\n3 Where is John? \tbathroom\t1 2\n4 Mary and Jo"
  },
  {
    "path": "tasksv11/en/qa13_compound-coreference_train.txt",
    "chars": 116516,
    "preview": "1 Mary and Daniel went to the bathroom.\n2 Then they journeyed to the hallway.\n3 Where is Daniel? \thallway\t1 2\n4 Sandra a"
  },
  {
    "path": "tasksv11/en/qa14_time-reasoning_test.txt",
    "chars": 151420,
    "preview": "1 This morning Mary moved to the kitchen.\n2 This afternoon Mary moved to the cinema.\n3 Yesterday Bill went to the bedroo"
  },
  {
    "path": "tasksv11/en/qa14_time-reasoning_train.txt",
    "chars": 152131,
    "preview": "1 Bill went back to the cinema yesterday.\n2 Julie went to the school this morning.\n3 Fred went to the park yesterday.\n4 "
  },
  {
    "path": "tasksv11/en/qa15_basic-deduction_test.txt",
    "chars": 87769,
    "preview": "1 Wolves are afraid of mice.\n2 Sheep are afraid of mice.\n3 Winona is a sheep.\n4 Mice are afraid of cats.\n5 Cats are afra"
  },
  {
    "path": "tasksv11/en/qa15_basic-deduction_train.txt",
    "chars": 87688,
    "preview": "1 Mice are afraid of wolves.\n2 Gertrude is a mouse.\n3 Cats are afraid of sheep.\n4 Winona is a mouse.\n5 Sheep are afraid "
  },
  {
    "path": "tasksv11/en/qa16_basic-induction_test.txt",
    "chars": 285966,
    "preview": "1 Brian is a swan.\n2 Julius is a rhino.\n3 Brian is gray.\n4 Lily is a swan.\n5 Bernhard is a lion.\n6 Greg is a swan.\n7 Ber"
  },
  {
    "path": "tasksv11/en/qa16_basic-induction_train.txt",
    "chars": 281808,
    "preview": "1 Julius is a rhino.\n2 Brian is a swan.\n3 Brian is white.\n4 Greg is a frog.\n5 Julius is yellow.\n6 Greg is gray.\n7 Bernha"
  },
  {
    "path": "tasksv11/en/qa17_positional-reasoning_test.txt",
    "chars": 68578,
    "preview": "1 The pink rectangle is to the left of the triangle.\n2 The triangle is to the left of the red square.\n3 Is the pink rect"
  },
  {
    "path": "tasksv11/en/qa17_positional-reasoning_train.txt",
    "chars": 68596,
    "preview": "1 The triangle is above the pink rectangle.\n2 The blue square is to the left of the triangle.\n3 Is the pink rectangle to"
  },
  {
    "path": "tasksv11/en/qa18_size-reasoning_test.txt",
    "chars": 103626,
    "preview": "1 The suitcase fits inside the box.\n2 The chocolate fits inside the box.\n3 The container is bigger than the box of choco"
  },
  {
    "path": "tasksv11/en/qa18_size-reasoning_train.txt",
    "chars": 104815,
    "preview": "1 The box of chocolates fits inside the chest.\n2 The box is bigger than the chest.\n3 The box is bigger than the suitcase"
  },
  {
    "path": "tasksv11/en/qa19_path-finding_test.txt",
    "chars": 247449,
    "preview": "1 The garden is west of the bathroom.\n2 The bedroom is north of the hallway.\n3 The office is south of the hallway.\n4 The"
  },
  {
    "path": "tasksv11/en/qa19_path-finding_train.txt",
    "chars": 247496,
    "preview": "1 The office is east of the hallway.\n2 The kitchen is north of the office.\n3 The garden is west of the bedroom.\n4 The of"
  },
  {
    "path": "tasksv11/en/qa1_single-supporting-fact_test.txt",
    "chars": 94477,
    "preview": "1 John travelled to the hallway.\n2 Mary journeyed to the bathroom.\n3 Where is John? \thallway\t1\n4 Daniel went back to the"
  },
  {
    "path": "tasksv11/en/qa1_single-supporting-fact_train.txt",
    "chars": 94346,
    "preview": "1 Mary moved to the bathroom.\n2 John went to the hallway.\n3 Where is Mary? \tbathroom\t1\n4 Daniel went back to the hallway"
  },
  {
    "path": "tasksv11/en/qa20_agents-motivations_test.txt",
    "chars": 67963,
    "preview": "1 Jason is thirsty.\n2 Where will jason go?\tkitchen\t1\n3 Antoine is bored.\n4 Where will antoine go?\tgarden\t3\n5 Jason went "
  },
  {
    "path": "tasksv11/en/qa20_agents-motivations_train.txt",
    "chars": 67892,
    "preview": "1 Sumit is tired.\n2 Where will sumit go?\tbedroom\t1\n3 Sumit went back to the bedroom.\n4 Why did sumit go to the bedroom?\t"
  },
  {
    "path": "tasksv11/en/qa2_two-supporting-facts_test.txt",
    "chars": 179835,
    "preview": "1 Mary got the milk there.\n2 John moved to the bedroom.\n3 Sandra went back to the kitchen.\n4 Mary travelled to the hallw"
  },
  {
    "path": "tasksv11/en/qa2_two-supporting-facts_train.txt",
    "chars": 177412,
    "preview": "1 Mary moved to the bathroom.\n2 Sandra journeyed to the bedroom.\n3 Mary got the football there.\n4 John went to the kitch"
  },
  {
    "path": "tasksv11/en/qa3_three-supporting-facts_test.txt",
    "chars": 526527,
    "preview": "1 Mary got the milk.\n2 John moved to the bedroom.\n3 Daniel journeyed to the office.\n4 John grabbed the apple there.\n5 Jo"
  },
  {
    "path": "tasksv11/en/qa3_three-supporting-facts_train.txt",
    "chars": 529204,
    "preview": "1 Mary moved to the bathroom.\n2 Sandra journeyed to the bedroom.\n3 Mary got the football there.\n4 John went back to the "
  },
  {
    "path": "tasksv11/en/qa4_two-arg-relations_test.txt",
    "chars": 117394,
    "preview": "1 The hallway is east of the bathroom.\n2 The bedroom is west of the bathroom.\n3 What is the bathroom east of?\tbedroom\t2\n"
  },
  {
    "path": "tasksv11/en/qa4_two-arg-relations_train.txt",
    "chars": 117470,
    "preview": "1 The office is north of the kitchen.\n2 The garden is south of the kitchen.\n3 What is north of the kitchen?\toffice\t1\n1 T"
  },
  {
    "path": "tasksv11/en/qa5_three-arg-relations_test.txt",
    "chars": 207037,
    "preview": "1 Fred picked up the football there.\n2 Fred gave the football to Jeff.\n3 What did Fred give to Jeff? \tfootball\t2\n4 Bill "
  },
  {
    "path": "tasksv11/en/qa5_three-arg-relations_train.txt",
    "chars": 199473,
    "preview": "1 Bill travelled to the office.\n2 Bill picked up the football there.\n3 Bill went to the bedroom.\n4 Bill gave the footbal"
  },
  {
    "path": "tasksv11/en/qa6_yes-no-questions_test.txt",
    "chars": 100109,
    "preview": "1 Mary got the milk there.\n2 John moved to the bedroom.\n3 Is John in the kitchen? \tno\t2\n4 Mary discarded the milk.\n5 Joh"
  },
  {
    "path": "tasksv11/en/qa6_yes-no-questions_train.txt",
    "chars": 100561,
    "preview": "1 Mary moved to the bathroom.\n2 Sandra journeyed to the bedroom.\n3 Is Sandra in the hallway? \tno\t2\n4 Mary went back to t"
  },
  {
    "path": "tasksv11/en/qa7_counting_test.txt",
    "chars": 130136,
    "preview": "1 Mary got the milk there.\n2 John moved to the bedroom.\n3 How many objects is Mary carrying? \tone\t1\n4 Sandra went back t"
  },
  {
    "path": "tasksv11/en/qa7_counting_train.txt",
    "chars": 135895,
    "preview": "1 Mary moved to the bathroom.\n2 Sandra journeyed to the bedroom.\n3 John went to the kitchen.\n4 Mary took the football th"
  },
  {
    "path": "tasksv11/en/qa8_lists-sets_test.txt",
    "chars": 126863,
    "preview": "1 Mary got the milk there.\n2 John moved to the bedroom.\n3 What is Mary carrying? \tmilk\t1\n4 John picked up the football t"
  },
  {
    "path": "tasksv11/en/qa8_lists-sets_train.txt",
    "chars": 124487,
    "preview": "1 Mary moved to the bathroom.\n2 Sandra journeyed to the bedroom.\n3 Mary got the football there.\n4 John went to the kitch"
  },
  {
    "path": "tasksv11/en/qa9_simple-negation_test.txt",
    "chars": 97208,
    "preview": "1 John is in the hallway.\n2 Sandra is in the kitchen.\n3 Is Sandra in the bedroom? \tno\t2\n4 Sandra journeyed to the bedroo"
  },
  {
    "path": "tasksv11/en/qa9_simple-negation_train.txt",
    "chars": 97310,
    "preview": "1 Mary is no longer in the bedroom.\n2 Daniel moved to the hallway.\n3 Is Mary in the bedroom? \tno\t1\n4 Sandra moved to the"
  },
  {
    "path": "tasksv11/shuffled/qa10_indefinite-knowledge_test.txt",
    "chars": 99632,
    "preview": "1 Utxi el em qzh lozbbu.\n2 Zeuu el em qzh keqozhm.\n3 Xl Zeuu em qzh phaxbby? \tmb\t2\n4 Zeuu cbdxmhiha qb qzh phaxbby.\n5 Tx"
  },
  {
    "path": "tasksv11/shuffled/qa10_indefinite-knowledge_train.txt",
    "chars": 99217,
    "preview": "1 Txha el heqzhx em qzh lozbbu bx qzh vtxk.\n2 Utxi jhmq ptok qb qzh bsseoh.\n3 Xl Utxi em qzh bsseoh? \tihl\t2\n4 Zeuu el he"
  },
  {
    "path": "tasksv11/shuffled/qa11_basic-coreference_test.txt",
    "chars": 104273,
    "preview": "1 Hbzm cbdxmhiha qb qzh ztuujti.\n2 Csqhx qztq zh cbdxmhiha qb qzh rtxahm.\n3 Mzhxh el Hbzm? \trtxahm\t1 2\n4 Hbzm ybnha qb q"
  },
  {
    "path": "tasksv11/shuffled/qa11_basic-coreference_train.txt",
    "chars": 104043,
    "preview": "1 Utxi jhmq ptok qb qzh ptqzxbby.\n2 Csqhx qztq zh jhmq qb qzh phaxbby.\n3 Mzhxh el Utxi? \tphaxbby\t1 2\n4 Ytmehu ybnha qb q"
  },
  {
    "path": "tasksv11/shuffled/qa12_conjunction_test.txt",
    "chars": 114489,
    "preview": "1 Hbzm tma Utxi qxtnhuuha qb qzh ztuujti.\n2 Otmaxt tma Utxi cbdxmhiha qb qzh phaxbby.\n3 Mzhxh el Utxi? \tphaxbby\t2\n4 Utxi"
  },
  {
    "path": "tasksv11/shuffled/qa12_conjunction_train.txt",
    "chars": 114560,
    "preview": "1 Utxi tma Ytmehu qxtnhuuha qb qzh ptqzxbby.\n2 Hbzm tma Ytmehu qxtnhuuha qb qzh bsseoh.\n3 Mzhxh el Ytmehu? \tbsseoh\t2\n4 O"
  },
  {
    "path": "tasksv11/shuffled/qa13_compound-coreference_test.txt",
    "chars": 116444,
    "preview": "1 Hbzm tma Utxi jhmq ptok qb qzh ztuujti.\n2 Rzhm qzhi jhmq qb qzh ptqzxbby.\n3 Mzhxh el Hbzm? \tptqzxbby\t1 2\n4 Utxi tma Hb"
  },
  {
    "path": "tasksv11/shuffled/qa13_compound-coreference_train.txt",
    "chars": 116516,
    "preview": "1 Utxi tma Ytmehu jhmq qb qzh ptqzxbby.\n2 Rzhm qzhi cbdxmhiha qb qzh ztuujti.\n3 Mzhxh el Ytmehu? \tztuujti\t1 2\n4 Otmaxt t"
  },
  {
    "path": "tasksv11/shuffled/qa14_time-reasoning_test.txt",
    "chars": 151420,
    "preview": "1 Rzel ybxmemr Utxi ybnha qb qzh keqozhm.\n2 Rzel tsqhxmbbm Utxi ybnha qb qzh oemhyt.\n3 Vhlqhxati Zeuu jhmq qb qzh phaxbb"
  },
  {
    "path": "tasksv11/shuffled/qa14_time-reasoning_train.txt",
    "chars": 152131,
    "preview": "1 Zeuu jhmq ptok qb qzh oemhyt ihlqhxati.\n2 Hdueh jhmq qb qzh lozbbu qzel ybxmemr.\n3 Txha jhmq qb qzh vtxk ihlqhxati.\n4 "
  },
  {
    "path": "tasksv11/shuffled/qa15_basic-deduction_test.txt",
    "chars": 87769,
    "preview": "1 Mbunhl txh tsxtea bs yeoh.\n2 Ozhhv txh tsxtea bs yeoh.\n3 Membmt el t lzhhv.\n4 Ueoh txh tsxtea bs otql.\n5 Wtql txh tsxt"
  },
  {
    "path": "tasksv11/shuffled/qa15_basic-deduction_train.txt",
    "chars": 87688,
    "preview": "1 Ueoh txh tsxtea bs jbunhl.\n2 Ahxqxdah el t ybdlh.\n3 Wtql txh tsxtea bs lzhhv.\n4 Membmt el t ybdlh.\n5 Ozhhv txh tsxtea "
  },
  {
    "path": "tasksv11/shuffled/qa16_basic-induction_test.txt",
    "chars": 285966,
    "preview": "1 Zxetm el t ljtm.\n2 Hduedl el t xzemb.\n3 Zxetm el rxti.\n4 Feui el t ljtm.\n5 Zhxmztxa el t uebm.\n6 Axhr el t ljtm.\n7 Zhx"
  },
  {
    "path": "tasksv11/shuffled/qa16_basic-induction_train.txt",
    "chars": 281808,
    "preview": "1 Hduedl el t xzemb.\n2 Zxetm el t ljtm.\n3 Zxetm el jzeqh.\n4 Axhr el t sxbr.\n5 Hduedl el ihuubj.\n6 Axhr el rxti.\n7 Zhxmzt"
  },
  {
    "path": "tasksv11/shuffled/qa17_positional-reasoning_test.txt",
    "chars": 68578,
    "preview": "1 Rzh vemk xhoqtmruh el qb qzh uhsq bs qzh qxetmruh.\n2 Rzh qxetmruh el qb qzh uhsq bs qzh xha lfdtxh.\n3 Xl qzh vemk xhoq"
  },
  {
    "path": "tasksv11/shuffled/qa17_positional-reasoning_train.txt",
    "chars": 68596,
    "preview": "1 Rzh qxetmruh el tpbnh qzh vemk xhoqtmruh.\n2 Rzh pudh lfdtxh el qb qzh uhsq bs qzh qxetmruh.\n3 Xl qzh vemk xhoqtmruh qb"
  },
  {
    "path": "tasksv11/shuffled/qa18_size-reasoning_test.txt",
    "chars": 103626,
    "preview": "1 Rzh ldeqotlh seql emleah qzh pbw.\n2 Rzh ozbobutqh seql emleah qzh pbw.\n3 Rzh obmqtemhx el perrhx qztm qzh pbw bs ozbob"
  },
  {
    "path": "tasksv11/shuffled/qa18_size-reasoning_train.txt",
    "chars": 104815,
    "preview": "1 Rzh pbw bs ozbobutqhl seql emleah qzh ozhlq.\n2 Rzh pbw el perrhx qztm qzh ozhlq.\n3 Rzh pbw el perrhx qztm qzh ldeqotlh"
  },
  {
    "path": "tasksv11/shuffled/qa19_path-finding_test.txt",
    "chars": 247449,
    "preview": "1 Rzh rtxahm el jhlq bs qzh ptqzxbby.\n2 Rzh phaxbby el mbxqz bs qzh ztuujti.\n3 Rzh bsseoh el lbdqz bs qzh ztuujti.\n4 Rzh"
  },
  {
    "path": "tasksv11/shuffled/qa19_path-finding_train.txt",
    "chars": 247496,
    "preview": "1 Rzh bsseoh el htlq bs qzh ztuujti.\n2 Rzh keqozhm el mbxqz bs qzh bsseoh.\n3 Rzh rtxahm el jhlq bs qzh phaxbby.\n4 Rzh bs"
  },
  {
    "path": "tasksv11/shuffled/qa1_single-supporting-fact_test.txt",
    "chars": 94477,
    "preview": "1 Hbzm qxtnhuuha qb qzh ztuujti.\n2 Utxi cbdxmhiha qb qzh ptqzxbby.\n3 Mzhxh el Hbzm? \tztuujti\t1\n4 Ytmehu jhmq ptok qb qzh"
  },
  {
    "path": "tasksv11/shuffled/qa1_single-supporting-fact_train.txt",
    "chars": 94346,
    "preview": "1 Utxi ybnha qb qzh ptqzxbby.\n2 Hbzm jhmq qb qzh ztuujti.\n3 Mzhxh el Utxi? \tptqzxbby\t1\n4 Ytmehu jhmq ptok qb qzh ztuujti"
  },
  {
    "path": "tasksv11/shuffled/qa20_agents-motivations_test.txt",
    "chars": 67963,
    "preview": "1 Htlbm el qzexlqi.\n2 Mzhxh jeuu ctlbm rb?\tkeqozhm\t1\n3 Cmqbemh el pbxha.\n4 Mzhxh jeuu tmqbemh rb?\trtxahm\t3\n5 Htlbm jhmq "
  },
  {
    "path": "tasksv11/shuffled/qa20_agents-motivations_train.txt",
    "chars": 67892,
    "preview": "1 Odyeq el qexha.\n2 Mzhxh jeuu ldyeq rb?\tphaxbby\t1\n3 Odyeq jhmq ptok qb qzh phaxbby.\n4 Mzi aea ldyeq rb qb qzh phaxbby?\t"
  },
  {
    "path": "tasksv11/shuffled/qa2_two-supporting-facts_test.txt",
    "chars": 179835,
    "preview": "1 Utxi rbq qzh yeuk qzhxh.\n2 Hbzm ybnha qb qzh phaxbby.\n3 Otmaxt jhmq ptok qb qzh keqozhm.\n4 Utxi qxtnhuuha qb qzh ztuuj"
  },
  {
    "path": "tasksv11/shuffled/qa2_two-supporting-facts_train.txt",
    "chars": 177412,
    "preview": "1 Utxi ybnha qb qzh ptqzxbby.\n2 Otmaxt cbdxmhiha qb qzh phaxbby.\n3 Utxi rbq qzh sbbqptuu qzhxh.\n4 Hbzm jhmq qb qzh keqoz"
  },
  {
    "path": "tasksv11/shuffled/qa3_three-supporting-facts_test.txt",
    "chars": 526527,
    "preview": "1 Utxi rbq qzh yeuk.\n2 Hbzm ybnha qb qzh phaxbby.\n3 Ytmehu cbdxmhiha qb qzh bsseoh.\n4 Hbzm rxtppha qzh tvvuh qzhxh.\n5 Hb"
  },
  {
    "path": "tasksv11/shuffled/qa3_three-supporting-facts_train.txt",
    "chars": 529204,
    "preview": "1 Utxi ybnha qb qzh ptqzxbby.\n2 Otmaxt cbdxmhiha qb qzh phaxbby.\n3 Utxi rbq qzh sbbqptuu qzhxh.\n4 Hbzm jhmq ptok qb qzh "
  },
  {
    "path": "tasksv11/shuffled/qa4_two-arg-relations_test.txt",
    "chars": 117394,
    "preview": "1 Rzh ztuujti el htlq bs qzh ptqzxbby.\n2 Rzh phaxbby el jhlq bs qzh ptqzxbby.\n3 Mztq el qzh ptqzxbby htlq bs?\tphaxbby\t2\n"
  },
  {
    "path": "tasksv11/shuffled/qa4_two-arg-relations_train.txt",
    "chars": 117470,
    "preview": "1 Rzh bsseoh el mbxqz bs qzh keqozhm.\n2 Rzh rtxahm el lbdqz bs qzh keqozhm.\n3 Mztq el mbxqz bs qzh keqozhm?\tbsseoh\t1\n1 R"
  },
  {
    "path": "tasksv11/shuffled/qa5_three-arg-relations_test.txt",
    "chars": 207037,
    "preview": "1 Txha veokha dv qzh sbbqptuu qzhxh.\n2 Txha rtnh qzh sbbqptuu qb Hhss.\n3 Mztq aea Txha renh qb Hhss? \tsbbqptuu\t2\n4 Zeuu "
  },
  {
    "path": "tasksv11/shuffled/qa5_three-arg-relations_train.txt",
    "chars": 199473,
    "preview": "1 Zeuu qxtnhuuha qb qzh bsseoh.\n2 Zeuu veokha dv qzh sbbqptuu qzhxh.\n3 Zeuu jhmq qb qzh phaxbby.\n4 Zeuu rtnh qzh sbbqptu"
  },
  {
    "path": "tasksv11/shuffled/qa6_yes-no-questions_test.txt",
    "chars": 100109,
    "preview": "1 Utxi rbq qzh yeuk qzhxh.\n2 Hbzm ybnha qb qzh phaxbby.\n3 Xl Hbzm em qzh keqozhm? \tmb\t2\n4 Utxi aelotxaha qzh yeuk.\n5 Hbz"
  },
  {
    "path": "tasksv11/shuffled/qa6_yes-no-questions_train.txt",
    "chars": 100561,
    "preview": "1 Utxi ybnha qb qzh ptqzxbby.\n2 Otmaxt cbdxmhiha qb qzh phaxbby.\n3 Xl Otmaxt em qzh ztuujti? \tmb\t2\n4 Utxi jhmq ptok qb q"
  },
  {
    "path": "tasksv11/shuffled/qa7_counting_test.txt",
    "chars": 130136,
    "preview": "1 Utxi rbq qzh yeuk qzhxh.\n2 Hbzm ybnha qb qzh phaxbby.\n3 Dbj ytmi bpchoql el Utxi otxxiemr? \tbmh\t1\n4 Otmaxt jhmq ptok q"
  },
  {
    "path": "tasksv11/shuffled/qa7_counting_train.txt",
    "chars": 135895,
    "preview": "1 Utxi ybnha qb qzh ptqzxbby.\n2 Otmaxt cbdxmhiha qb qzh phaxbby.\n3 Hbzm jhmq qb qzh keqozhm.\n4 Utxi qbbk qzh sbbqptuu qz"
  },
  {
    "path": "tasksv11/shuffled/qa8_lists-sets_test.txt",
    "chars": 126863,
    "preview": "1 Utxi rbq qzh yeuk qzhxh.\n2 Hbzm ybnha qb qzh phaxbby.\n3 Mztq el Utxi otxxiemr? \tyeuk\t1\n4 Hbzm veokha dv qzh sbbqptuu q"
  },
  {
    "path": "tasksv11/shuffled/qa8_lists-sets_train.txt",
    "chars": 124487,
    "preview": "1 Utxi ybnha qb qzh ptqzxbby.\n2 Otmaxt cbdxmhiha qb qzh phaxbby.\n3 Utxi rbq qzh sbbqptuu qzhxh.\n4 Hbzm jhmq qb qzh keqoz"
  },
  {
    "path": "tasksv11/shuffled/qa9_simple-negation_test.txt",
    "chars": 97208,
    "preview": "1 Hbzm el em qzh ztuujti.\n2 Otmaxt el em qzh keqozhm.\n3 Xl Otmaxt em qzh phaxbby? \tmb\t2\n4 Otmaxt cbdxmhiha qb qzh phaxbb"
  },
  {
    "path": "tasksv11/shuffled/qa9_simple-negation_train.txt",
    "chars": 97310,
    "preview": "1 Utxi el mb ubmrhx em qzh phaxbby.\n2 Ytmehu ybnha qb qzh ztuujti.\n3 Xl Utxi em qzh phaxbby? \tmb\t1\n4 Otmaxt ybnha qb qzh"
  }
]

// ... and 3 more files (download for full content)
