Full Code of jmugan/modern_practical_nlp for AI

Repository: jmugan/modern_practical_nlp
Branch: master
Commit: 1ac3e8102ce7
Files: 6
Total size: 512.5 KB

Directory structure:
gitextract_yp9_knfn/

├── Episode_1_Text_to_Vectors.ipynb
├── Episode_2_Classifying_with_Vectors.ipynb
├── Episode_3_Visualizing_Vectors.ipynb
├── Episode_4_Generating_Text_and_Extracting_Info.ipynb
├── README.md
└── jmugan_tweets.txt

================================================
FILE CONTENTS
================================================

================================================
FILE: Episode_1_Text_to_Vectors.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Modern Practical Natural Language Processing\n",
    "\n",
    "This course covers how to use NLP to solve practical problems.\n",
    "\n",
    "We envision four videos.\n",
    "1. Overview and Converting Text to Vectors (this one)\n",
    "  * For finding similar documents\n",
    "  * \"I have this document or text. Which others talk about the same things?\"\n",
    "2. Learning with Vectors and Classification\n",
    "  * For classifying documents\n",
    "  * \"I need to put these documents into buckets.\"\n",
    "3. Sequence Generation\n",
    "  * For translation and document summarization\n",
    "  * \"I need to create quick summaries of these documents, maybe in Urdu.\"\n",
    "4. Extracting Pieces of Information from Text\n",
    "  * For pulling out sentences and documents that talk about specific things\n",
    "  * \"I need every mention of a street address or business in Garland, Texas.\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Additional Details\n",
    "\n",
    "The idea is to make short videos that focus on the aspects of NLP that currently work well and are useful.\n",
    "\n",
    "Speech-to-text now works pretty well, so these methods will also be useful for the audio portions of videos.\n",
    "\n",
    "All code will be available on GitHub at https://github.com/jmugan/modern_practical_nlp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# About Me, Jonathan Mugan\n",
    "* PhD in Computer Science in 2010 from UT Austin\n",
    "* Thesis work was about how a robot could wake up in the world and figure out what is going on\n",
    "* Work at [DeUmbra](https://deumbra.com/) where we build AI for the DoD\n",
    "  * We also work in healthcare, which I can talk about. A future video (not in this series) will cover how we use graph neural networks to identify who is at risk for opioid overdose\n",
    "* Wrote *The Curiosity Cycle: Preparing Your Child for the Ongoing Technological Explosion* http://www.jonathanmugan.com/CuriosityCycle/\n",
    "* Also do independent consulting work\n",
    "* You can reach me at jonathanwilliammugan@gmail.com or on Twitter at [@jmugan](https://twitter.com/jmugan)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# The Limits of NLP\n",
    "## Computers can't read\n",
    "* Reading requires mapping language to internal concepts grounded in behaving in the same general environment as the writer.\n",
    "  * Computers don’t have those concepts.\n",
    "  * Example: “I pulled the wagon.” Computers don’t know that wagons can carry things or that pulling exerts a gentle tension on the arm and leg muscles as one walks.\n",
    "  \n",
    "## Computers can't write\n",
    "* Writing requires mapping internal concepts to language, where those concepts are grounded in behaving in the same general environment as the expected reader.\n",
    "  * Computers don’t have those concepts."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Additional Information on NLP, AI, and Their Limits and Promise\n",
    "* [Generating Natural-Language Text with Neural Networks](https://medium.com/@jmugan/generating-natural-language-text-with-neural-networks-e983bb48caad)\n",
    "* [Why Is There Life? and What Does It Have to Do with AI?](https://towardsdatascience.com/why-is-there-life-and-what-does-it-have-to-do-with-ai-2195ac91532f)\n",
    "* [Chatbots: Theory and Practice](https://medium.com/intuitionmachine/chatbots-theory-and-practice-3274f7e6d648)\n",
    "* [You and Your Bot: A New Kind of Lifelong Relationship](https://chatbotsmagazine.com/you-and-your-bot-a-new-kind-of-lifelong-relationship-6a9649feeb71)\n",
    "* [Computers Could Understand Natural Language Using Simulated Physics](https://chatbotslife.com/computers-could-understand-natural-language-using-simulated-physics-26e9706013da)\n",
    "* [The Two Paths from Natural Language Processing to Artificial Intelligence](https://medium.com/intuitionmachine/the-two-paths-from-natural-language-processing-to-artificial-intelligence-d5384ddbfc18)\n",
    "* [DeepGrammar: Grammar Checking Using Deep Learning](https://www.linkedin.com/pulse/deep-grammar-checking-using-learning-jonathan-mugan)\n",
    "* [Deep Learning for Natural Language Processing](https://www.linkedin.com/pulse/deep-learning-natural-language-processing-jonathan-mugan)\n",
    "* [What Deep Learning Really Means](https://www.linkedin.com/pulse/20141114065942-42285562-what-deep-learning-really-means)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## NLP Works Around Computers Not Having the Experience or Conceptual Framework to Read and Write\n",
    "* NLP is about how to make natural language amenable to computation even though computers can’t read or write.\n",
    "* Representing text as *vectors* has transformed NLP in the last 10 years.\n",
    "* There are also *symbolic methods* that are practically useful; we will cover those too."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Now the Meat of Video 1: Generating Vectors from Text\n",
    "* By converting a sentence, paragraph, or document to a vector, we can measure its distance to the vectors of other texts.\n",
    "  * We can find similar texts this way.\n",
    "* This is similar to how computer vision works. We can say there is a cat in an image, and we can say that one image is similar to another, but we don't really have a human-understandable representation of what exactly is going on.\n",
    "* But it is still useful."
   ]
  },
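  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The distance idea above can be sketched with cosine similarity. This is a minimal illustration under stated assumptions, not code from the videos: the three short vectors are made up, the `cosine_similarity` helper is hypothetical, and real text vectors have hundreds of dimensions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "from typing import List\n",
    "\n",
    "\n",
    "def cosine_similarity(a: List[float], b: List[float]) -> float:\n",
    "    # Hypothetical helper: dot product divided by the product of the vector lengths.\n",
    "    dot = sum(x * y for x, y in zip(a, b))\n",
    "    norm_a = math.sqrt(sum(x * x for x in a))\n",
    "    norm_b = math.sqrt(sum(y * y for y in b))\n",
    "    return dot / (norm_a * norm_b)\n",
    "\n",
    "\n",
    "# Made-up 3-dimensional vectors standing in for three texts.\n",
    "doc_a = [0.9, 0.1, 0.3]\n",
    "doc_b = [0.8, 0.2, 0.4]\n",
    "doc_c = [-0.7, 0.9, -0.2]\n",
    "\n",
    "# A value closer to 1 means the two vectors point in a more similar direction.\n",
    "print(cosine_similarity(doc_a, doc_b))\n",
    "print(cosine_similarity(doc_a, doc_c))"
   ]
  },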
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tools to make vectors from text\n",
    "* PyTorch https://pytorch.org/\n",
    "  * Like TensorFlow, does automatic differentiation\n",
    "* HuggingFace Transformers library https://huggingface.co/transformers/index.html\n",
    "  * Built on top of PyTorch\n",
    "* spaCy https://spacy.io/\n",
    "  * A widely used toolkit among NLP practitioners"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set up a Python environment. If you have Anaconda, you can run the following from the command line:\n",
    "\n",
    "`conda create -n nlp_videos python=3.8`\n",
    "\n",
    "`conda activate nlp_videos`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install PyTorch from the command line\n",
    "\n",
    "`pip install torch torchvision`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install HuggingFace Transformers from the command line\n",
    "\n",
    "`pip install transformers`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import a bunch of stuff\n",
    "from typing import List, Tuple\n",
    "import scipy\n",
    "import numpy as np\n",
    "import os.path\n",
    "from pathlib import Path\n",
    "import pickle\n",
    "\n",
    "import torch  # PyTorch\n",
    "\n",
    "from transformers import BertModel, BertTokenizer, BertConfig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://huggingface.co/transformers/pretrained_models.html\n",
    "model_name = 'bert-base-uncased'\n",
    "\n",
    "# Need to use the same tokenizer that was used to train the model so that it breaks \n",
    "# up words into tokens the same way.\n",
    "tokenizer = BertTokenizer.from_pretrained(model_name)\n",
    "\n",
    "# This model is huge!!!!!!!!\n",
    "model = BertModel.from_pretrained(model_name)\n",
    "\n",
    "# Parameters used by the pre-trained model\n",
    "config = BertConfig.from_pretrained(model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define a function to convert text to tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_tokens(text: str,\n",
    "               tokenizer: BertTokenizer,\n",
    "               config: BertConfig) -> List[str]:\n",
    "    \n",
    "    tokens = tokenizer.tokenize(text)\n",
    "    \n",
    "    # make sure it isn't too long\n",
    "    max_length = config.max_position_embeddings\n",
    "    tokens = tokens[:max_length-1] # Will add special begin token\n",
    "    \n",
    "    # cls token to hold vector \n",
    "    # https://huggingface.co/transformers/main_classes/tokenizer.html\n",
    "    tokens = [tokenizer.cls_token] + tokens\n",
    "    \n",
    "    return tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['[CLS]', 'i', 'went', 'to', 'the', 'store', '.']\n"
     ]
    }
   ],
   "source": [
    "text = \"I went to the store.\"\n",
    "tokens = get_tokens(text, tokenizer, config)\n",
    "print(tokens)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convert tokens into integer ids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[101, 1045, 2253, 2000, 1996, 3573, 1012]\n"
     ]
    }
   ],
   "source": [
    "token_ids: List[int] = tokenizer.convert_tokens_to_ids(tokens)\n",
    "print(token_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convert the input into a Torch tensor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([7]) tensor([ 101, 1045, 2253, 2000, 1996, 3573, 1012])\n"
     ]
    }
   ],
   "source": [
    "token_ids_tensor = torch.tensor(token_ids)\n",
    "print(token_ids_tensor.shape, token_ids_tensor)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Make it shape `(1 x num_tokens)`\n",
    "\n",
    "In real applications, we would send a bunch of sentences in at once."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "torch.Size([1, 7]) tensor([[ 101, 1045, 2253, 2000, 1996, 3573, 1012]])\n"
     ]
    }
   ],
   "source": [
    "token_ids_tensor = torch.unsqueeze(token_ids_tensor, 0)\n",
    "print(token_ids_tensor.shape, token_ids_tensor)"
   ]
  },
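  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sending several sentences at once requires every row of the tensor to have the same length. One common way to get there, sketched here in plain Python as an assumption rather than the approach used in these videos, is to pad the shorter sequences and keep a mask of which positions are real tokens. The `pad_batch` helper is hypothetical; the pad id of 0 matches the `[PAD]` token of `bert-base-uncased`. Both lists could then be converted to tensors, with the mask passed as the model's `attention_mask` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List, Tuple\n",
    "\n",
    "\n",
    "def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:\n",
    "    # Hypothetical helper: pad every sequence to the length of the longest one.\n",
    "    max_len = max(len(ids) for ids in batch)\n",
    "    padded = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]\n",
    "    # 1 marks a real token, 0 marks padding the model should ignore.\n",
    "    mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]\n",
    "    return padded, mask\n",
    "\n",
    "\n",
    "# The first list is the sentence from above; the second reuses its first three ids.\n",
    "batch = [[101, 1045, 2253, 2000, 1996, 3573, 1012], [101, 1045, 2253]]\n",
    "padded_ids, attention_mask = pad_batch(batch)\n",
    "print(padded_ids)\n",
    "print(attention_mask)"
   ]
  },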
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's convert it into a vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'torch.Tensor'> tensor([[-6.8463e-01, -2.8959e-01, -2.9154e-01,  6.7127e-01,  3.9507e-01,\n",
      "         -2.1678e-01,  6.2011e-01,  1.9677e-01, -2.8793e-02, -9.4984e-01,\n",
      "          6.5669e-02,  4.6028e-01,  9.6631e-01, -2.3616e-01,  9.4689e-01,\n",
      "         -6.0630e-01, -4.7309e-01, -4.3481e-01,  2.0222e-01, -3.4150e-01,\n",
      "          8.3457e-01,  9.8431e-01,  2.2365e-01,  3.2114e-01,  4.1334e-01,\n",
      "          2.7076e-01, -3.6277e-01,  9.4006e-01,  8.4959e-01,  6.9592e-01,\n",
      "         -6.7077e-01, -7.6625e-02, -9.8442e-01, -1.7111e-01, -7.3174e-01,\n",
      "         -9.3164e-01,  7.5510e-02, -3.6217e-01, -1.8402e-02,  2.4065e-01,\n",
      "         -8.9202e-01,  2.1222e-01,  9.8343e-01, -7.1324e-01,  6.2176e-01,\n",
      "         -1.2014e-01, -9.9812e-01,  7.7176e-02, -8.9065e-01, -4.3335e-01,\n",
      "          2.4937e-01,  1.9725e-01, -1.0021e-02,  2.5571e-01,  2.0711e-01,\n",
      "         -4.4047e-01, -1.4055e-01,  1.4745e-01, -4.4538e-02, -2.2215e-01,\n",
      "         -5.2094e-01,  3.4585e-01, -5.1174e-01, -7.0657e-01, -1.8190e-01,\n",
      "         -1.1501e-01, -1.8398e-01, -4.6214e-02,  7.4689e-02, -1.3377e-01,\n",
      "          6.5313e-01,  4.1187e-01,  4.7467e-01, -8.5554e-01, -4.7246e-01,\n",
      "          2.6163e-01, -5.9148e-01,  9.9996e-01, -2.2857e-01, -9.7987e-01,\n",
      "          7.7000e-01, -3.0872e-01,  6.1258e-01,  5.1875e-01, -5.7974e-01,\n",
      "         -9.9979e-01,  3.0398e-01, -2.2005e-01, -9.7479e-01, -1.5614e-01,\n",
      "          5.7737e-01, -1.0548e-01,  4.3095e-01,  5.8140e-01, -3.3167e-01,\n",
      "         -5.9095e-01, -6.8403e-02, -3.7091e-01, -1.2306e-01, -3.5265e-01,\n",
      "          2.0860e-01,  3.8559e-02, -2.1643e-01, -7.3485e-02,  1.6054e-01,\n",
      "         -1.7230e-01, -6.1174e-01, -3.2474e-01,  2.1762e-01,  4.0952e-01,\n",
      "          4.8370e-01, -6.3214e-02,  2.3135e-01, -8.5628e-01,  5.7745e-02,\n",
      "         -6.2461e-02, -9.6422e-01, -5.9186e-01, -9.7909e-01,  6.3998e-01,\n",
      "         -3.9063e-01, -3.8640e-01,  9.2037e-01, -2.2520e-01,  9.6262e-03,\n",
      "         -2.1822e-01,  1.9589e-01, -9.9999e-01, -1.9221e-01, -5.8043e-01,\n",
      "          4.0236e-01, -2.3272e-01, -9.5746e-01, -9.5151e-01,  3.5794e-01,\n",
      "          8.5313e-01,  6.6643e-03,  9.4455e-01, -1.5654e-01,  9.3562e-01,\n",
      "          4.7287e-01, -3.5763e-01, -1.6274e-01, -4.1491e-01,  5.4784e-01,\n",
      "          1.2096e-01, -4.5101e-01,  1.7629e-01, -2.0530e-01, -2.0993e-01,\n",
      "         -6.8818e-01, -1.1124e-02, -2.3549e-01, -8.2398e-01, -2.2184e-01,\n",
      "          9.6111e-01, -2.9530e-01,  1.3862e-01,  4.2531e-01, -3.6020e-02,\n",
      "         -4.8034e-01,  6.0736e-01,  7.4829e-01,  1.3367e-01, -1.7527e-01,\n",
      "          4.8676e-02, -5.9256e-01,  1.1316e-01, -6.5533e-01,  4.6883e-01,\n",
      "          2.5188e-01, -1.1483e-01, -2.0792e-01, -9.8412e-01, -6.5169e-02,\n",
      "         -4.1505e-02,  9.7155e-01,  4.7783e-01,  8.8895e-02, -1.8174e-01,\n",
      "         -2.0750e-01,  3.6108e-01, -9.0281e-01,  9.6643e-01,  6.3364e-02,\n",
      "          8.6277e-02, -4.3666e-01, -2.8309e-01, -8.4300e-01, -1.6856e-01,\n",
      "          6.5029e-01, -4.9486e-02, -5.9637e-01, -2.2820e-02, -3.2272e-01,\n",
      "         -8.1942e-02, -4.9286e-01,  3.8231e-01, -1.7478e-01, -2.7009e-01,\n",
      "         -2.5142e-01,  8.7474e-01,  7.1716e-01,  4.4309e-01, -1.8028e-01,\n",
      "          4.2791e-01, -7.5773e-01, -7.4819e-01, -6.1290e-02, -4.7889e-02,\n",
      "          1.3942e-01,  9.7752e-01, -7.6803e-01,  1.9361e-02, -8.3478e-01,\n",
      "         -9.6340e-01,  3.9162e-02, -8.5415e-01, -2.0570e-01, -2.8596e-01,\n",
      "          3.7838e-01, -5.4226e-01, -6.4749e-01,  2.5859e-01, -8.8820e-01,\n",
      "         -8.6905e-01,  3.0495e-01, -3.1694e-01,  1.6252e-01, -2.7408e-01,\n",
      "          9.7091e-01,  3.1627e-01, -5.9503e-01,  6.9102e-01,  9.5153e-01,\n",
      "         -7.0397e-01, -8.3993e-01,  7.9146e-01, -1.3707e-01,  4.1921e-01,\n",
      "         -3.4158e-01,  8.4210e-01,  2.3320e-01,  5.9743e-01, -8.6541e-01,\n",
      "         -3.4436e-01, -5.8305e-01,  4.6551e-01, -1.8610e-01, -6.8970e-01,\n",
      "         -3.6817e-01,  5.6169e-01,  1.1734e-01,  6.1270e-01, -2.4431e-01,\n",
      "          7.2129e-01, -9.6205e-01, -9.4983e-01, -7.1281e-01,  2.9124e-01,\n",
      "         -9.8616e-01,  5.0502e-01,  1.4242e-01,  5.0880e-01, -3.0191e-01,\n",
      "         -1.2370e-01, -9.5590e-01,  4.8255e-01,  6.0764e-02,  7.7390e-01,\n",
      "         -5.6401e-01, -7.3786e-01, -1.0773e-01, -9.2285e-01,  9.0839e-02,\n",
      "         -1.4504e-01,  4.3344e-01, -1.2290e-01, -8.5779e-01,  2.1065e-01,\n",
      "          5.7555e-01,  3.3000e-01,  4.6564e-01,  9.3778e-01,  9.9824e-01,\n",
      "          9.6806e-01,  8.2033e-01,  2.1980e-01, -9.8223e-01, -1.2467e-01,\n",
      "          9.7780e-01, -6.0067e-01, -9.9982e-01, -9.1369e-01, -5.2972e-01,\n",
      "          2.0041e-01, -9.9999e-01, -2.2998e-01,  9.7607e-02, -7.3847e-01,\n",
      "         -2.2182e-02,  9.4859e-01,  8.6662e-01, -9.9995e-01,  1.9446e-01,\n",
      "          8.8456e-01, -6.6574e-01,  5.5210e-02, -3.6430e-01,  9.6708e-01,\n",
      "          5.0735e-01,  6.0674e-01, -3.9902e-02,  3.6633e-01, -6.4363e-01,\n",
      "         -6.3040e-01, -3.0765e-02, -5.7105e-01,  8.8454e-01,  1.2028e-01,\n",
      "         -5.3834e-01, -8.3262e-01,  4.8732e-01, -2.1228e-01,  3.4476e-01,\n",
      "         -9.5176e-01, -9.8953e-02,  7.2055e-02,  5.6133e-01,  1.4952e-01,\n",
      "         -7.6354e-02, -3.9572e-01,  6.6977e-02,  1.5441e-01,  1.9891e-01,\n",
      "          6.4225e-01, -6.3491e-01, -2.4015e-01,  5.0210e-01, -4.3184e-01,\n",
      "          6.9309e-03, -9.7179e-01,  9.3479e-01, -4.3015e-02, -3.3167e-01,\n",
      "          9.9997e-01,  2.3615e-01, -7.1675e-01,  2.3047e-01,  1.0895e-01,\n",
      "         -7.4958e-01,  9.9992e-01,  1.8075e-01, -9.7035e-01, -4.9875e-01,\n",
      "          1.1086e-02, -3.2893e-01, -6.0028e-01,  9.4290e-01, -3.8645e-02,\n",
      "          1.1772e-01, -4.4417e-01,  9.8164e-01, -9.8379e-01,  8.6616e-01,\n",
      "         -5.3697e-01, -9.5878e-01,  8.9757e-01,  9.3980e-01, -6.7731e-01,\n",
      "          1.1224e-01,  3.0228e-02,  5.3302e-02,  1.1718e-01, -7.3350e-01,\n",
      "          4.5478e-01,  6.6144e-01, -9.1131e-02,  8.7689e-01, -1.1498e-01,\n",
      "         -5.7983e-01, -1.8680e-01, -2.0697e-01,  5.5213e-01,  6.4853e-01,\n",
      "          1.8649e-01, -1.2323e-01, -1.1785e-01, -9.6719e-02, -8.3518e-01,\n",
      "         -8.9681e-01,  6.8071e-01,  9.9997e-01,  9.6424e-02,  6.6242e-01,\n",
      "          3.5163e-01,  1.3645e-03,  2.0233e-02,  1.7773e-01,  3.2743e-01,\n",
      "          3.2634e-02, -1.9178e-01,  4.0169e-01, -8.7595e-01, -9.8522e-01,\n",
      "          7.1714e-01,  1.4060e-01, -1.2248e-01,  9.9393e-01,  2.6829e-01,\n",
      "         -2.1408e-01, -3.9473e-02,  2.3414e-01,  4.1849e-02,  1.5338e-01,\n",
      "         -2.7583e-01,  9.4316e-01, -1.2325e-01,  5.8779e-01,  6.0759e-01,\n",
      "          2.3006e-01, -5.1844e-01, -3.6375e-01, -5.5054e-02, -9.2117e-01,\n",
      "         -1.7819e-01, -9.1414e-01,  9.4992e-01, -4.7370e-01,  1.2210e-01,\n",
      "          1.0127e-01,  5.8619e-01,  9.9996e-01, -7.7297e-01,  6.3918e-01,\n",
      "         -1.6503e-02,  8.5887e-01, -9.6903e-01, -6.5689e-01, -4.2514e-01,\n",
      "          1.8557e-01,  2.8983e-01, -6.4668e-02,  1.5650e-01, -8.9022e-01,\n",
      "         -3.9330e-01,  1.0839e-01, -8.7953e-01, -9.7536e-01,  1.5899e-01,\n",
      "          6.0560e-01,  1.2089e-01, -8.8287e-01, -4.4234e-01, -4.5809e-01,\n",
      "          1.5670e-01, -1.6881e-01, -9.1873e-01,  5.8025e-01, -3.0912e-01,\n",
      "          2.1214e-01, -2.0353e-01,  5.6712e-01,  7.5314e-02,  7.2038e-01,\n",
      "         -2.5369e-01, -4.3001e-01, -1.8585e-01, -7.0050e-01,  7.6715e-01,\n",
      "         -6.0295e-01, -3.1446e-01, -2.9695e-02,  9.9997e-01, -3.1701e-01,\n",
      "          5.6117e-01,  5.0637e-01,  7.0757e-01, -2.9232e-02,  2.3940e-01,\n",
      "          3.9347e-01,  2.4624e-01,  2.8964e-01,  1.3107e-01, -2.5525e-01,\n",
      "         -1.2822e-01,  5.5914e-01,  6.6970e-01, -4.9123e-02,  7.8410e-01,\n",
      "         -1.6722e-01,  1.1560e-01,  1.6259e-01,  1.9482e-01,  8.9179e-01,\n",
      "         -1.2318e-01, -2.1431e-01, -1.5294e-01, -2.1595e-01, -2.7673e-01,\n",
      "         -3.6827e-01,  9.9985e-01,  9.1469e-02,  2.9056e-01, -9.7954e-01,\n",
      "         -4.0206e-01, -4.9129e-01,  9.9745e-01,  8.4380e-01, -1.4853e-01,\n",
      "          3.7771e-01,  1.8318e-01, -2.6459e-02,  4.7974e-01, -5.4117e-03,\n",
      "          4.3701e-03, -2.4330e-03,  9.8072e-02,  9.3859e-01, -4.5234e-01,\n",
      "         -9.7426e-01, -6.5835e-01,  2.5815e-02, -9.3973e-01,  9.8055e-01,\n",
      "         -1.9610e-01, -5.1893e-02, -1.7175e-02, -8.7915e-02,  4.1474e-01,\n",
      "          5.4786e-02, -9.6506e-01,  1.0456e-01,  1.0606e-01,  9.6423e-01,\n",
      "         -3.3901e-02, -6.2046e-01, -9.4383e-01, -2.5289e-01,  5.9409e-01,\n",
      "         -6.7694e-02, -8.1933e-01,  9.5689e-01, -9.3558e-01,  3.6238e-01,\n",
      "          9.9938e-01,  2.2634e-01, -6.6810e-01,  6.0785e-03, -3.5088e-01,\n",
      "          1.8301e-01, -1.5569e-01,  5.9032e-01, -9.3563e-01, -3.5780e-01,\n",
      "         -1.0177e-01,  1.1414e-01, -1.2837e-01, -2.0000e-01,  4.0425e-01,\n",
      "          8.2765e-02, -5.8469e-01, -2.9229e-01,  4.9537e-03,  3.8927e-01,\n",
      "          6.6923e-01, -1.7641e-01, -1.2864e-01,  5.8341e-02, -9.9779e-02,\n",
      "         -9.1635e-01, -2.5509e-01, -2.5024e-01, -9.9048e-01,  5.4205e-01,\n",
      "         -9.9997e-01,  1.9936e-01, -6.2746e-01, -3.1429e-01,  8.8385e-01,\n",
      "          6.0793e-01,  4.4053e-01, -6.6756e-01,  2.4216e-01,  8.5366e-01,\n",
      "          5.4879e-01, -2.7448e-01,  5.0527e-01, -6.2542e-01,  1.3795e-01,\n",
      "         -1.2833e-01,  1.9358e-01,  7.6996e-02,  3.8815e-01, -2.4976e-01,\n",
      "          9.9998e-01,  1.1730e-01, -6.5224e-01, -7.9764e-01,  7.6038e-02,\n",
      "         -1.9107e-01,  9.9939e-01, -6.3015e-01, -9.5948e-01,  2.4225e-01,\n",
      "         -6.0806e-01, -8.4127e-01,  2.9755e-01,  2.6237e-01, -5.3159e-01,\n",
      "         -6.0424e-01,  9.1510e-01,  7.9815e-01, -6.3712e-01,  7.8975e-01,\n",
      "         -2.2756e-01, -4.5958e-01,  8.4606e-04,  6.4294e-02,  9.7041e-01,\n",
      "          5.4821e-01,  7.7400e-01, -2.0310e-01, -6.1499e-01,  9.6664e-01,\n",
      "          2.8915e-01,  1.8473e-01, -1.9784e-01,  9.9980e-01,  3.2068e-01,\n",
      "         -8.8019e-01, -7.7237e-02, -9.0901e-01, -1.9813e-01, -8.8418e-01,\n",
      "          8.1715e-02,  2.2635e-01,  9.2042e-01, -1.0348e-01,  9.3367e-01,\n",
      "         -2.8730e-01, -2.4075e-03, -2.3414e-01,  4.7300e-01,  3.0885e-01,\n",
      "         -9.0943e-01, -9.7602e-01, -9.7999e-01,  1.5841e-01, -1.4602e-01,\n",
      "         -1.3396e-01,  1.1578e-01, -4.0504e-02, -1.6110e-01, -9.2639e-02,\n",
      "         -9.9980e-01,  9.1276e-01,  3.0597e-01, -1.8919e-01,  9.4640e-01,\n",
      "          3.2521e-01,  5.2297e-01,  1.3397e-01, -9.6512e-01, -8.7257e-01,\n",
      "         -1.5825e-01, -1.4047e-01,  6.4015e-01,  3.1501e-01,  8.4411e-01,\n",
      "         -7.6320e-02, -3.5128e-01, -5.6055e-01, -6.5825e-01, -8.2077e-01,\n",
      "         -9.8028e-01,  2.9467e-01,  5.9930e-01, -7.7228e-01,  9.2316e-01,\n",
      "         -6.5145e-01, -1.7735e-01,  5.6600e-01, -2.6573e-01,  3.6191e-01,\n",
      "          3.2193e-01,  1.0799e-01, -2.6561e-02,  6.1199e-01,  8.2953e-01,\n",
      "          8.0335e-01,  9.7372e-01,  6.5444e-02,  4.5875e-01,  4.3933e-01,\n",
      "          3.4857e-01,  9.1538e-01, -8.5089e-01,  8.6904e-02,  5.5658e-01,\n",
      "         -4.8172e-01,  2.4070e-01, -2.1773e-01, -6.0461e-01,  4.1906e-01,\n",
      "         -3.5208e-01,  2.9005e-01, -2.7767e-01,  8.3306e-02, -1.6652e-01,\n",
      "         -3.5441e-01, -3.2011e-01, -3.0685e-01,  6.5583e-01,  5.1440e-01,\n",
      "          9.0891e-01,  6.0811e-01,  5.7899e-02, -5.3289e-01,  9.7129e-02,\n",
      "          2.5882e-01, -8.9232e-01,  5.0798e-01,  1.1261e-02,  5.6267e-01,\n",
      "          6.5232e-01,  2.9266e-02,  8.6119e-01, -2.6809e-01, -1.5990e-01,\n",
      "         -6.9884e-02, -4.5516e-01,  4.8223e-01, -6.4390e-01, -4.1659e-01,\n",
      "         -4.6073e-01,  4.8635e-02,  1.9580e-01,  9.9074e-01, -3.5556e-01,\n",
      "          2.8231e-01, -4.0274e-01, -2.1153e-01,  1.2159e-01, -3.8889e-01,\n",
      "         -9.9987e-01,  1.1610e-01,  4.5443e-01,  1.9631e-02, -4.8330e-01,\n",
      "          5.9117e-01,  1.7777e-01, -9.2548e-01, -2.7403e-01,  6.1358e-01,\n",
      "          1.9727e-01, -2.7212e-01, -1.0140e-01,  5.6724e-01,  8.6259e-01,\n",
      "          3.6661e-01,  7.1262e-01, -3.1770e-01,  5.2326e-01,  6.4676e-01,\n",
      "         -5.8528e-01, -4.9630e-01,  8.7960e-01]], grad_fn=<TanhBackward>)\n"
     ]
    }
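   ],
   "source": [
    "outputs = model(token_ids_tensor)\n",
    "\n",
    "# Index by position so this works whether the installed transformers version\n",
    "# returns a plain tuple (older releases) or a ModelOutput object (newer releases).\n",
    "last_hidden_state, pooler_output = outputs[0], outputs[1]\n",
    "\n",
    "# The pooler output is the last-layer hidden state of the first token ([CLS])\n",
    "# passed through a linear layer and a tanh, so every value lies between -1 and 1.\n",
    "# Since this uses attention, it takes the whole sequence into account.\n",
    "vector = pooler_output\n",
    "\n",
    "print(type(vector), vector)"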
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's convert it into Numpy and make it one-dimensional"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'numpy.ndarray'> [-6.84628487e-01 -2.89591521e-01 -2.91541070e-01  6.71265423e-01\n",
      "  3.95071477e-01 -2.16784209e-01  6.20105147e-01  1.96766943e-01\n",
      " -2.87933350e-02 -9.49840426e-01  6.56687692e-02  4.60277617e-01\n",
      "  9.66312706e-01 -2.36155078e-01  9.46891129e-01 -6.06303632e-01\n",
      " -4.73092347e-01 -4.34805721e-01  2.02217102e-01 -3.41503441e-01\n",
      "  8.34570289e-01  9.84307647e-01  2.23645478e-01  3.21142077e-01\n",
      "  4.13339913e-01  2.70759493e-01 -3.62768382e-01  9.40062046e-01\n",
      "  8.49591494e-01  6.95917845e-01 -6.70773387e-01 -7.66250417e-02\n",
      " -9.84417677e-01 -1.71114579e-01 -7.31735408e-01 -9.31641102e-01\n",
      "  7.55103827e-02 -3.62168103e-01 -1.84017215e-02  2.40652755e-01\n",
      " -8.92024696e-01  2.12224603e-01  9.83429790e-01 -7.13237345e-01\n",
      "  6.21758878e-01 -1.20137155e-01 -9.98118401e-01  7.71760195e-02\n",
      " -8.90646696e-01 -4.33352202e-01  2.49371767e-01  1.97246686e-01\n",
      " -1.00208428e-02  2.55708367e-01  2.07114801e-01 -4.40470636e-01\n",
      " -1.40554667e-01  1.47446528e-01 -4.45383005e-02 -2.22154856e-01\n",
      " -5.20936608e-01  3.45849454e-01 -5.11738360e-01 -7.06572950e-01\n",
      " -1.81898311e-01 -1.15012474e-01 -1.83977082e-01 -4.62138429e-02\n",
      "  7.46894032e-02 -1.33774593e-01  6.53133094e-01  4.11871672e-01\n",
      "  4.74670112e-01 -8.55535150e-01 -4.72458601e-01  2.61630505e-01\n",
      " -5.91484666e-01  9.99962032e-01 -2.28569582e-01 -9.79867816e-01\n",
      "  7.69997954e-01 -3.08717132e-01  6.12579882e-01  5.18750727e-01\n",
      " -5.79740167e-01 -9.99791801e-01  3.03983480e-01 -2.20049590e-01\n",
      " -9.74786937e-01 -1.56140268e-01  5.77371418e-01 -1.05475947e-01\n",
      "  4.30951089e-01  5.81398845e-01 -3.31668705e-01 -5.90946019e-01\n",
      " -6.84028342e-02 -3.70905131e-01 -1.23055927e-01 -3.52654070e-01\n",
      "  2.08595157e-01  3.85590382e-02 -2.16434225e-01 -7.34849498e-02\n",
      "  1.60537064e-01 -1.72299922e-01 -6.11736298e-01 -3.24742407e-01\n",
      "  2.17623830e-01  4.09515023e-01  4.83697861e-01 -6.32140711e-02\n",
      "  2.31352299e-01 -8.56281161e-01  5.77445626e-02 -6.24606647e-02\n",
      " -9.64218497e-01 -5.91862440e-01 -9.79088306e-01  6.39976859e-01\n",
      " -3.90627444e-01 -3.86396587e-01  9.20367837e-01 -2.25204900e-01\n",
      "  9.62616783e-03 -2.18224168e-01  1.95887834e-01 -9.99987602e-01\n",
      " -1.92212701e-01 -5.80429733e-01  4.02364343e-01 -2.32717037e-01\n",
      " -9.57460582e-01 -9.51511085e-01  3.57939929e-01  8.53128493e-01\n",
      "  6.66428730e-03  9.44551826e-01 -1.56540543e-01  9.35615659e-01\n",
      "  4.72874522e-01 -3.57629716e-01 -1.62744746e-01 -4.14905965e-01\n",
      "  5.47836840e-01  1.20960295e-01 -4.51014608e-01  1.76286623e-01\n",
      " -2.05302909e-01 -2.09933981e-01 -6.88180745e-01 -1.11242337e-02\n",
      " -2.35485718e-01 -8.23984623e-01 -2.21838370e-01  9.61110651e-01\n",
      " -2.95296848e-01  1.38620794e-01  4.25305992e-01 -3.60202454e-02\n",
      " -4.80337173e-01  6.07358575e-01  7.48292267e-01  1.33673429e-01\n",
      " -1.75273851e-01  4.86755706e-02 -5.92557430e-01  1.13160603e-01\n",
      " -6.55327499e-01  4.68833297e-01  2.51879752e-01 -1.14830986e-01\n",
      " -2.07915619e-01 -9.84116793e-01 -6.51691183e-02 -4.15052399e-02\n",
      "  9.71545279e-01  4.77825493e-01  8.88953581e-02 -1.81736618e-01\n",
      " -2.07501307e-01  3.61075282e-01 -9.02807891e-01  9.66431141e-01\n",
      "  6.33644387e-02  8.62771571e-02 -4.36664551e-01 -2.83090413e-01\n",
      " -8.42999756e-01 -1.68564975e-01  6.50294781e-01 -4.94863689e-02\n",
      " -5.96365750e-01 -2.28201989e-02 -3.22715014e-01 -8.19422379e-02\n",
      " -4.92862284e-01  3.82305920e-01 -1.74775273e-01 -2.70093858e-01\n",
      " -2.51422495e-01  8.74737144e-01  7.17155099e-01  4.43087578e-01\n",
      " -1.80278927e-01  4.27908182e-01 -7.57730663e-01 -7.48189867e-01\n",
      " -6.12897873e-02 -4.78890687e-02  1.39420003e-01  9.77522969e-01\n",
      " -7.68028498e-01  1.93608403e-02 -8.34781587e-01 -9.63399231e-01\n",
      "  3.91616747e-02 -8.54150474e-01 -2.05695122e-01 -2.85961628e-01\n",
      "  3.78379524e-01 -5.42263269e-01 -6.47491515e-01  2.58589417e-01\n",
      " -8.88202250e-01 -8.69045138e-01  3.04953486e-01 -3.16939861e-01\n",
      "  1.62522838e-01 -2.74077952e-01  9.70908046e-01  3.16269398e-01\n",
      " -5.95034599e-01  6.91015005e-01  9.51529503e-01 -7.03967214e-01\n",
      " -8.39933217e-01  7.91463673e-01 -1.37074366e-01  4.19213355e-01\n",
      " -3.41576040e-01  8.42101932e-01  2.33200401e-01  5.97429276e-01\n",
      " -8.65413129e-01 -3.44361335e-01 -5.83049357e-01  4.65510070e-01\n",
      " -1.86100230e-01 -6.89699054e-01 -3.68169695e-01  5.61690569e-01\n",
      "  1.17335021e-01  6.12698376e-01 -2.44306251e-01  7.21290469e-01\n",
      " -9.62048054e-01 -9.49831903e-01 -7.12814510e-01  2.91243792e-01\n",
      " -9.86161709e-01  5.05021393e-01  1.42423317e-01  5.08802593e-01\n",
      " -3.01911443e-01 -1.23704679e-01 -9.55899835e-01  4.82549012e-01\n",
      "  6.07642196e-02  7.73899198e-01 -5.64008772e-01 -7.37863898e-01\n",
      " -1.07733138e-01 -9.22845125e-01  9.08392072e-02 -1.45039484e-01\n",
      "  4.33436275e-01 -1.22899801e-01 -8.57792377e-01  2.10649326e-01\n",
      "  5.75548708e-01  3.30001593e-01  4.65644717e-01  9.37779844e-01\n",
      "  9.98240054e-01  9.68057632e-01  8.20330799e-01  2.19797492e-01\n",
      " -9.82231200e-01 -1.24674484e-01  9.77800369e-01 -6.00667715e-01\n",
      " -9.99817908e-01 -9.13691044e-01 -5.29718995e-01  2.00410217e-01\n",
      " -9.99987125e-01 -2.29984701e-01  9.76074189e-02 -7.38466859e-01\n",
      " -2.21821181e-02  9.48592246e-01  8.66617620e-01 -9.99952793e-01\n",
      "  1.94457278e-01  8.84555161e-01 -6.65743113e-01  5.52103221e-02\n",
      " -3.64303052e-01  9.67083454e-01  5.07346332e-01  6.06739342e-01\n",
      " -3.99024002e-02  3.66331011e-01 -6.43628657e-01 -6.30395293e-01\n",
      " -3.07651833e-02 -5.71051121e-01  8.84538949e-01  1.20282300e-01\n",
      " -5.38344979e-01 -8.32620263e-01  4.87318933e-01 -2.12279305e-01\n",
      "  3.44758391e-01 -9.51757252e-01 -9.89525095e-02  7.20545277e-02\n",
      "  5.61325371e-01  1.49524227e-01 -7.63539076e-02 -3.95724148e-01\n",
      "  6.69770613e-02  1.54408976e-01  1.98911846e-01  6.42251253e-01\n",
      " -6.34914577e-01 -2.40154058e-01  5.02099752e-01 -4.31837559e-01\n",
      "  6.93090353e-03 -9.71793175e-01  9.34785664e-01 -4.30152826e-02\n",
      " -3.31673473e-01  9.99972880e-01  2.36150980e-01 -7.16750324e-01\n",
      "  2.30471417e-01  1.08951956e-01 -7.49580443e-01  9.99923050e-01\n",
      "  1.80747479e-01 -9.70346451e-01 -4.98750210e-01  1.10859834e-02\n",
      " -3.28933179e-01 -6.00282550e-01  9.42904294e-01 -3.86454463e-02\n",
      "  1.17722236e-01 -4.44166392e-01  9.81641889e-01 -9.83789444e-01\n",
      "  8.66157413e-01 -5.36971211e-01 -9.58775699e-01  8.97572100e-01\n",
      "  9.39799368e-01 -6.77313149e-01  1.12242155e-01  3.02283894e-02\n",
      "  5.33015206e-02  1.17180012e-01 -7.33501196e-01  4.54784691e-01\n",
      "  6.61437333e-01 -9.11309123e-02  8.76886487e-01 -1.14977278e-01\n",
      " -5.79825401e-01 -1.86798289e-01 -2.06969395e-01  5.52134931e-01\n",
      "  6.48525834e-01  1.86487138e-01 -1.23227246e-01 -1.17850266e-01\n",
      " -9.67185870e-02 -8.35179389e-01 -8.96809399e-01  6.80714130e-01\n",
      "  9.99968231e-01  9.64235067e-02  6.62416458e-01  3.51625174e-01\n",
      "  1.36445567e-03  2.02326421e-02  1.77731991e-01  3.27429146e-01\n",
      "  3.26343887e-02 -1.91776007e-01  4.01694953e-01 -8.75952363e-01\n",
      " -9.85216916e-01  7.17136025e-01  1.40597507e-01 -1.22481257e-01\n",
      "  9.93934870e-01  2.68285513e-01 -2.14082897e-01 -3.94734740e-02\n",
      "  2.34138295e-01  4.18490544e-02  1.53384566e-01 -2.75834113e-01\n",
      "  9.43155348e-01 -1.23246826e-01  5.87794423e-01  6.07589424e-01\n",
      "  2.30061203e-01 -5.18435419e-01 -3.63748938e-01 -5.50540388e-02\n",
      " -9.21173453e-01 -1.78191528e-01 -9.14138138e-01  9.49922621e-01\n",
      " -4.73696887e-01  1.22101106e-01  1.01273373e-01  5.86193085e-01\n",
      "  9.99960899e-01 -7.72973001e-01  6.39182568e-01 -1.65028721e-02\n",
      "  8.58873606e-01 -9.69030261e-01 -6.56886160e-01 -4.25135881e-01\n",
      "  1.85572311e-01  2.89826870e-01 -6.46679550e-02  1.56500235e-01\n",
      " -8.90223265e-01 -3.93299162e-01  1.08385630e-01 -8.79534364e-01\n",
      " -9.75355566e-01  1.58991352e-01  6.05597854e-01  1.20888151e-01\n",
      " -8.82865548e-01 -4.42339748e-01 -4.58085924e-01  1.56696096e-01\n",
      " -1.68813601e-01 -9.18732047e-01  5.80247581e-01 -3.09122860e-01\n",
      "  2.12144047e-01 -2.03534871e-01  5.67118585e-01  7.53136873e-02\n",
      "  7.20383346e-01 -2.53689378e-01 -4.30007577e-01 -1.85849831e-01\n",
      " -7.00501502e-01  7.67153025e-01 -6.02951407e-01 -3.14460278e-01\n",
      " -2.96945479e-02  9.99970555e-01 -3.17008764e-01  5.61168790e-01\n",
      "  5.06368399e-01  7.07565308e-01 -2.92320810e-02  2.39395663e-01\n",
      "  3.93467486e-01  2.46238425e-01  2.89643347e-01  1.31072745e-01\n",
      " -2.55250186e-01 -1.28223553e-01  5.59136808e-01  6.69703603e-01\n",
      " -4.91230711e-02  7.84098208e-01 -1.67220399e-01  1.15600668e-01\n",
      "  1.62588030e-01  1.94819525e-01  8.91793966e-01 -1.23181760e-01\n",
      " -2.14313745e-01 -1.52940795e-01 -2.15952396e-01 -2.76730835e-01\n",
      " -3.68270665e-01  9.99846220e-01  9.14686546e-02  2.90558368e-01\n",
      " -9.79540884e-01 -4.02055115e-01 -4.91288841e-01  9.97451246e-01\n",
      "  8.43795240e-01 -1.48530632e-01  3.77714396e-01  1.83181688e-01\n",
      " -2.64594313e-02  4.79741544e-01 -5.41166356e-03  4.37006261e-03\n",
      " -2.43297848e-03  9.80723873e-02  9.38587427e-01 -4.52343583e-01\n",
      " -9.74259734e-01 -6.58348322e-01  2.58153658e-02 -9.39729035e-01\n",
      "  9.80549276e-01 -1.96101218e-01 -5.18931895e-02 -1.71751007e-02\n",
      " -8.79152864e-02  4.14739043e-01  5.47859445e-02 -9.65056896e-01\n",
      "  1.04558483e-01  1.06062792e-01  9.64227557e-01 -3.39010209e-02\n",
      " -6.20462656e-01 -9.43826437e-01 -2.52888769e-01  5.94085097e-01\n",
      " -6.76936656e-02 -8.19332540e-01  9.56893861e-01 -9.35577273e-01\n",
      "  3.62383097e-01  9.99376357e-01  2.26340652e-01 -6.68098807e-01\n",
      "  6.07851427e-03 -3.50878119e-01  1.83013320e-01 -1.55687496e-01\n",
      "  5.90323150e-01 -9.35633540e-01 -3.57800782e-01 -1.01766042e-01\n",
      "  1.14137232e-01 -1.28365502e-01 -1.99995205e-01  4.04254109e-01\n",
      "  8.27645883e-02 -5.84692597e-01 -2.92294472e-01  4.95370431e-03\n",
      "  3.89271051e-01  6.69233680e-01 -1.76408023e-01 -1.28644943e-01\n",
      "  5.83405569e-02 -9.97789875e-02 -9.16350901e-01 -2.55089194e-01\n",
      " -2.50243396e-01 -9.90482867e-01  5.42046785e-01 -9.99969125e-01\n",
      "  1.99364290e-01 -6.27459228e-01 -3.14289927e-01  8.83848667e-01\n",
      "  6.07934117e-01  4.40531135e-01 -6.67558134e-01  2.42158934e-01\n",
      "  8.53662610e-01  5.48789859e-01 -2.74479300e-01  5.05267262e-01\n",
      " -6.25422180e-01  1.37953520e-01 -1.28331512e-01  1.93576068e-01\n",
      "  7.69957006e-02  3.88148278e-01 -2.49758705e-01  9.99982476e-01\n",
      "  1.17298804e-01 -6.52236521e-01 -7.97644317e-01  7.60384873e-02\n",
      " -1.91071138e-01  9.99392867e-01 -6.30147219e-01 -9.59475875e-01\n",
      "  2.42252082e-01 -6.08055294e-01 -8.41268301e-01  2.97548950e-01\n",
      "  2.62366086e-01 -5.31587303e-01 -6.04243040e-01  9.15102124e-01\n",
      "  7.98154771e-01 -6.37118876e-01  7.89745927e-01 -2.27564409e-01\n",
      " -4.59584385e-01  8.46059818e-04  6.42944351e-02  9.70409095e-01\n",
      "  5.48211753e-01  7.73996472e-01 -2.03097641e-01 -6.14989102e-01\n",
      "  9.66643095e-01  2.89150745e-01  1.84731692e-01 -1.97842211e-01\n",
      "  9.99799073e-01  3.20680529e-01 -8.80194962e-01 -7.72371590e-02\n",
      " -9.09011543e-01 -1.98125198e-01 -8.84180903e-01  8.17148089e-02\n",
      "  2.26351470e-01  9.20415580e-01 -1.03479266e-01  9.33674216e-01\n",
      " -2.87304580e-01 -2.40752520e-03 -2.34143049e-01  4.72999781e-01\n",
      "  3.08846265e-01 -9.09434378e-01 -9.76023614e-01 -9.79994416e-01\n",
      "  1.58407822e-01 -1.46016181e-01 -1.33959278e-01  1.15776278e-01\n",
      " -4.05044630e-02 -1.61096469e-01 -9.26388875e-02 -9.99804139e-01\n",
      "  9.12760854e-01  3.05970848e-01 -1.89190030e-01  9.46398735e-01\n",
      "  3.25214028e-01  5.22969782e-01  1.33965105e-01 -9.65115786e-01\n",
      " -8.72572541e-01 -1.58247799e-01 -1.40467674e-01  6.40147448e-01\n",
      "  3.15005809e-01  8.44109178e-01 -7.63198659e-02 -3.51275116e-01\n",
      " -5.60550869e-01 -6.58248484e-01 -8.20772350e-01 -9.80284035e-01\n",
      "  2.94667512e-01  5.99301040e-01 -7.72279859e-01  9.23164725e-01\n",
      " -6.51449859e-01 -1.77346095e-01  5.66000819e-01 -2.65734702e-01\n",
      "  3.61909896e-01  3.21929604e-01  1.07993722e-01 -2.65614409e-02\n",
      "  6.11993313e-01  8.29534948e-01  8.03346157e-01  9.73720789e-01\n",
      "  6.54435307e-02  4.58745241e-01  4.39331889e-01  3.48565638e-01\n",
      "  9.15383101e-01 -8.50885570e-01  8.69044736e-02  5.56581616e-01\n",
      " -4.81716871e-01  2.40704149e-01 -2.17731923e-01 -6.04605138e-01\n",
      "  4.19061750e-01 -3.52084994e-01  2.90052563e-01 -2.77674407e-01\n",
      "  8.33056569e-02 -1.66522965e-01 -3.54407400e-01 -3.20105851e-01\n",
      " -3.06851268e-01  6.55826509e-01  5.14403284e-01  9.08911705e-01\n",
      "  6.08110785e-01  5.78991361e-02 -5.32887876e-01  9.71290991e-02\n",
      "  2.58820146e-01 -8.92317355e-01  5.07979870e-01  1.12613374e-02\n",
      "  5.62671065e-01  6.52322888e-01  2.92660873e-02  8.61185431e-01\n",
      " -2.68088043e-01 -1.59904942e-01 -6.98840916e-02 -4.55157697e-01\n",
      "  4.82234299e-01 -6.43897057e-01 -4.16591495e-01 -4.60731089e-01\n",
      "  4.86354046e-02  1.95800930e-01  9.90741372e-01 -3.55561614e-01\n",
      "  2.82314390e-01 -4.02741760e-01 -2.11525574e-01  1.21593408e-01\n",
      " -3.88887376e-01 -9.99871790e-01  1.16095670e-01  4.54431295e-01\n",
      "  1.96306035e-02 -4.83297318e-01  5.91171026e-01  1.77768096e-01\n",
      " -9.25477743e-01 -2.74028271e-01  6.13578856e-01  1.97269052e-01\n",
      " -2.72115350e-01 -1.01401471e-01  5.67236960e-01  8.62587094e-01\n",
      "  3.66609961e-01  7.12623894e-01 -3.17700714e-01  5.23258626e-01\n",
      "  6.46758616e-01 -5.85283220e-01 -4.96297508e-01  8.79601717e-01]\n"
     ]
    }
   ],
   "source": [
    "np_vector = vector.detach().numpy()\n",
    "\n",
    "# make it one-dimensional\n",
    "np_vector = np_vector.squeeze()\n",
    "\n",
    "print(type(np_vector), np_vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's make it a nice function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_vector(text: str) -> np.ndarray:\n",
    "    tokens = get_tokens(text, tokenizer, config)\n",
    "    token_ids: List[int] = tokenizer.convert_tokens_to_ids(tokens)\n",
    "    token_ids_tensor = torch.tensor(token_ids)\n",
    "    token_ids_tensor = torch.unsqueeze(token_ids_tensor, 0)\n",
    "    last_hidden_state, pooler_output = model(token_ids_tensor)\n",
    "    vector = pooler_output\n",
    "    #vector = last_hidden_state.mean(dim=1)  # Can do this too\n",
    "    np_vector = vector.detach().numpy()\n",
    "    np_vector = np_vector.squeeze()\n",
    "    return np_vector"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# This uses the transformer model and takes the sequence of words into account\n",
    "\n",
    "* The transformer model is graph neural network with a lot of layers. One way it is trained is that for each token position, it looks around at all of the neighboring tokens to decide what that token should be. \n",
    "  * Great explanation of transformers http://nlp.seas.harvard.edu/2018/04/03/attention.html\n",
    "  * Bert builds on transformers https://arxiv.org/pdf/1810.04805.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using word order is the hard way\n",
    "* The easy way is to learn a vector for each word and then make the vector for the sentence the average of the word vectors.\n",
    "* Let's try that with spaCy."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Install spaCy\n",
    "\n",
    "`pip install -U spacy`\n",
    "\n",
    "The large model has good word vectors.\n",
    "\n",
    "`python -m spacy download en_vectors_web_lg`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dog [-5.71195185e-02  5.26851341e-02  3.02558858e-03 -4.85166162e-02\n",
      "  7.04297796e-03  4.18558009e-02 -2.47040205e-02 -3.97829153e-02\n",
      "  9.61403828e-03  3.08416426e-01 -8.91298205e-02  4.13809419e-02\n",
      " -9.56399292e-02  3.31533775e-02 -4.87142392e-02  2.60333400e-02\n",
      "  7.14079365e-02  1.51968971e-01  2.08966229e-02 -6.43049553e-02\n",
      " -5.94667979e-02 -2.27007996e-02  3.80284972e-02 -6.94757923e-02\n",
      "  5.18392026e-02 -6.17074501e-03 -3.47954780e-02 -5.93601651e-02\n",
      "  1.26659293e-02 -3.63281034e-02 -7.91833773e-02  1.74062699e-02\n",
      " -1.18751619e-02  7.83303455e-02  5.17652743e-02  2.18392313e-02\n",
      "  7.92445242e-02 -1.28953964e-01 -6.98042149e-03  5.48504330e-02\n",
      "  5.40258288e-02  2.05084905e-02 -3.87009755e-02 -5.26268445e-02\n",
      " -1.83460359e-02 -2.14468315e-02 -5.41338809e-02  7.04937521e-03\n",
      "  1.81341972e-02 -1.17702372e-02  2.03862209e-02  4.62589078e-02\n",
      "  3.87080871e-02  6.20330237e-02 -4.51670177e-02  1.12892650e-01\n",
      "  3.77171375e-02  1.44092571e-02 -4.73138280e-02  6.13008291e-02\n",
      "  2.37244479e-02  1.52537664e-02  1.27128465e-02  4.07113060e-02\n",
      "  5.70356324e-02 -5.57631850e-02  6.42864779e-02  1.92232449e-02\n",
      " -4.10567857e-02 -3.24425101e-03 -4.97250892e-02 -3.26941572e-02\n",
      "  2.87531149e-02 -3.01080253e-02  3.86483744e-02  1.30377114e-02\n",
      " -2.93019041e-02 -9.34902802e-02  2.69404072e-02 -3.80398706e-02\n",
      "  1.31707862e-02  6.15837574e-02 -6.94772154e-02 -5.44651411e-02\n",
      " -3.11501548e-02 -6.28164038e-02  1.39392331e-01  9.58574563e-02\n",
      "  1.19429782e-01 -2.58314554e-02  2.47168168e-02  5.94966561e-02\n",
      "  2.28870474e-02 -1.49139725e-02 -5.96629940e-02 -5.06989732e-02\n",
      " -2.39377059e-02 -9.02202949e-02  5.46258017e-02 -4.98217680e-02\n",
      "  2.48604119e-02  7.61024877e-02  2.86379531e-02  5.38509572e-03\n",
      "  6.69707060e-02 -6.30453005e-02  2.39419732e-02 -2.37216037e-02\n",
      " -3.41528542e-02 -1.43267959e-02  4.31268290e-02  6.07506223e-02\n",
      "  4.80588190e-02 -6.18183464e-02  1.61267109e-02  8.80876929e-03\n",
      "  8.78744386e-03 -1.99142061e-02  1.16607649e-02 -5.56323864e-03\n",
      "  7.31367571e-03  4.08392623e-02  8.24960247e-02 -8.19500759e-02\n",
      " -4.92658690e-02  1.44049916e-02  2.05625147e-02  1.64480216e-03\n",
      " -4.79138009e-02 -2.50025857e-02 -5.07899635e-02 -3.04577723e-02\n",
      "  1.62489782e-03  6.77384362e-02 -5.32623567e-03 -4.19240445e-02\n",
      " -2.48305555e-02  4.30145115e-02  8.57546255e-02 -9.49575007e-03\n",
      " -3.84578586e-01 -9.99591574e-02  5.76484017e-02  8.93900022e-02\n",
      "  8.96828771e-02 -7.75028989e-02 -1.36757852e-03  3.77228223e-02\n",
      "  3.32557410e-02 -7.37680029e-03 -9.34916956e-04  2.64058332e-03\n",
      " -6.49632215e-02 -1.00020291e-02 -4.35348675e-02 -1.99298444e-03\n",
      " -2.90147141e-02  5.27462699e-02 -4.59987298e-02 -1.20343953e-01\n",
      "  3.85175757e-02 -1.70053411e-02 -1.35883493e-02 -8.59636292e-02\n",
      "  6.02942472e-03  3.50542329e-02  5.46584977e-03 -3.62072582e-03\n",
      " -1.32090310e-02 -3.03625148e-02  5.13529740e-02  2.71735713e-03\n",
      "  8.92009027e-03 -1.86005253e-02 -2.15335574e-04  8.27988461e-02\n",
      " -2.69503575e-02  1.11044407e-01  1.48954894e-03  1.55366912e-01\n",
      "  1.44163659e-02 -5.15349545e-02 -1.70067623e-02 -4.89957370e-02\n",
      " -7.91961774e-02  3.66764292e-02  4.74233031e-02  4.71929833e-02\n",
      " -4.44973782e-02 -1.07407607e-01 -1.07042231e-01 -1.32323466e-02\n",
      " -1.58850159e-02 -8.13955963e-02  2.36562043e-02  9.03923213e-02\n",
      "  3.41301076e-02 -4.15302217e-02  1.28214672e-01  1.76650248e-02\n",
      " -8.21064636e-02  6.82232482e-03 -6.07762150e-02  3.47556695e-02\n",
      "  6.71512587e-03  5.07473126e-02  6.28988594e-02 -3.27780396e-02\n",
      "  9.38869342e-02 -1.05184026e-03 -5.38225211e-02  3.23572084e-02\n",
      " -5.28002940e-02  4.41518985e-02 -1.02514010e-02 -3.48182246e-02\n",
      " -5.65294968e-03  7.62759373e-02 -5.89706115e-02  2.35481523e-02\n",
      "  4.79223318e-02  1.55253168e-02  5.29154539e-02 -7.92288780e-02\n",
      " -1.10980429e-01  2.02611070e-02 -5.09378277e-02  5.91980889e-02\n",
      "  3.04904711e-02  2.61740927e-02 -6.78223222e-02 -3.12852184e-03\n",
      " -3.36012244e-02 -3.24723683e-02  4.93653901e-02  3.36481407e-02\n",
      "  1.05562201e-02 -1.25703886e-02  4.06871364e-02 -6.67389631e-02\n",
      " -6.24339506e-02 -3.76389399e-02 -4.36329655e-02 -2.16956362e-02\n",
      " -1.20662432e-02  4.03914154e-02 -2.62750350e-02 -3.14515643e-02\n",
      " -1.58793293e-02 -3.58859473e-03  6.53541926e-03  5.02482848e-03\n",
      "  3.19420621e-02  7.32988268e-02 -9.26073772e-05  1.41546251e-02\n",
      " -2.02099252e-02  2.86280029e-02  4.02833670e-02 -4.09060828e-02\n",
      "  5.36931399e-03 -5.34685105e-02 -1.66072566e-02 -9.52844992e-02\n",
      " -6.57764450e-03  5.51404655e-02 -4.59148455e-03 -7.71872653e-03\n",
      " -6.45238981e-02  2.77977102e-02 -4.18984517e-02  1.20860048e-01\n",
      "  1.47078205e-02  1.37922252e-02  1.61210243e-02  5.61612695e-02\n",
      "  8.39433447e-03  3.12468316e-02  2.67925449e-02 -2.25927494e-02\n",
      " -1.46452645e-02  4.71503325e-02  8.74038413e-03 -4.24358696e-02\n",
      "  6.32813126e-02  6.72891736e-02  3.74086201e-02 -2.62949392e-02\n",
      "  2.08312236e-02 -4.47987858e-03  3.25690443e-03 -3.68641019e-02\n",
      " -4.38775048e-02  2.49442935e-04 -2.69588884e-02  7.78952986e-02\n",
      "  4.43495214e-02  3.51068377e-02  4.25510295e-02 -1.06432298e-02]\n",
      "cat [-0.02255263 -0.00366243 -0.03497775 -0.03499272 -0.02751459  0.04896256\n",
      " -0.03305583 -0.04307406  0.01909796  0.17446963 -0.09604063 -0.01473697\n",
      " -0.09339724  0.00156134 -0.03839799  0.04759746  0.00565485  0.17818175\n",
      " -0.02651471 -0.03884254 -0.04709154  0.00581141 -0.02351957 -0.02018316\n",
      "  0.05528664 -0.04574589 -0.06079944 -0.05832369  0.05517288  0.00209001\n",
      " -0.10320591  0.00060861 -0.02046156  0.04874253  0.03695355 -0.02097198\n",
      "  0.08066227 -0.1204059  -0.02659854 -0.01934194  0.0244027   0.0223281\n",
      " -0.01024261 -0.05077522  0.02768373 -0.01235537 -0.070189    0.05924574\n",
      " -0.02056934 -0.05258637  0.03326389 -0.02155424 -0.00722771  0.05057764\n",
      " -0.04777259  0.03072378  0.01476226 -0.03573963  0.00678629  0.06577188\n",
      "  0.0045481  -0.00206876 -0.01396131 -0.02720924  0.02909524 -0.05660983\n",
      "  0.10499312  0.02430241  0.00088479  0.00372679 -0.02037625 -0.0171012\n",
      " -0.0472966  -0.02126835  0.00422014  0.08111281 -0.06348474 -0.08965967\n",
      "  0.03738464 -0.0404187   0.02239845  0.04383744 -0.04682211  0.02476194\n",
      " -0.03150063 -0.06597994  0.18222317  0.0766912   0.0841349   0.0211516\n",
      "  0.0138477   0.10686714 -0.00315096 -0.05044891 -0.03034808 -0.05415654\n",
      "  0.03301242 -0.03841595  0.04254718 -0.02539808  0.00868591  0.0915786\n",
      "  0.04726367 -0.01185259  0.05319408 -0.0766912   0.06339043 -0.04495407\n",
      " -0.0334929   0.02278613 -0.00722846  0.03522323  0.06961273 -0.10115378\n",
      " -0.04925295  0.01264216 -0.03311421 -0.00678555  0.051585   -0.02177877\n",
      " -0.02701316 -0.02677367  0.14501068 -0.15010136 -0.07086407  0.04272231\n",
      "  0.08439385 -0.04971097 -0.05729089 -0.04114765 -0.03435956 -0.0363204\n",
      " -0.05641375  0.03715413  0.05529412  0.02192995 -0.05667569  0.04660207\n",
      " -0.0425831   0.0553046  -0.42171478 -0.05735675 -0.00334884  0.08438487\n",
      "  0.06006899 -0.06306263 -0.01693056 -0.02592048  0.02112017 -0.01974908\n",
      "  0.02768224  0.01462276 -0.01457022 -0.03590428  0.02489366 -0.04274327\n",
      "  0.00057858  0.07976867 -0.04844766 -0.05799289  0.04043068 -0.05116289\n",
      " -0.04146498 -0.10070472 -0.01612228 -0.00930859 -0.03709576 -0.01061008\n",
      " -0.0312806   0.00934077  0.00334869  0.0200694   0.01953354 -0.0292569\n",
      " -0.07012464  0.11634952 -0.00658273  0.0572834  -0.03498973  0.15652274\n",
      " -0.02151084 -0.05336173 -0.0120813  -0.04647185 -0.08654927 -0.04201132\n",
      " -0.01042956  0.01031745 -0.02428894 -0.09569785 -0.09302603  0.01679734\n",
      " -0.02539958 -0.0817819   0.07433371  0.06969954  0.01321604 -0.0725899\n",
      "  0.10367442 -0.01032463 -0.08039284  0.0311369  -0.06434391 -0.01784362\n",
      "  0.0175727  -0.0276059   0.06555634 -0.01850073  0.05399039 -0.0293497\n",
      " -0.05293662  0.02815224 -0.07575419  0.02163657 -0.00364746 -0.01612377\n",
      " -0.00172135  0.0877647  -0.00815185  0.00114487 -0.00842666  0.0407031\n",
      "  0.03457061 -0.04385091 -0.03641021  0.01544272 -0.01498918  0.10610975\n",
      "  0.02604771 -0.00056144 -0.06930887  0.01767149 -0.02463321 -0.05779082\n",
      "  0.02173985  0.01468712 -0.01848875 -0.01567173  0.05844643 -0.04584768\n",
      " -0.09785478 -0.00066231 -0.00507064  0.00555531 -0.04112071  0.00079552\n",
      "  0.04600784  0.01875219 -0.02922845 -0.0246392   0.01309989 -0.00764981\n",
      " -0.02443263  0.07798446  0.01619861 -0.00903767 -0.10737457 -0.00962861\n",
      "  0.0554468  -0.06145056 -0.04083332 -0.04522949  0.00236064 -0.06444719\n",
      "  0.05335723  0.02572739 -0.08172352 -0.03224305 -0.00671909 -0.01586183\n",
      " -0.08141368  0.08069071  0.01061816  0.01464476  0.01465509  0.02665093\n",
      "  0.02843215  0.07478425 -0.02773462  0.00766882  0.00292988  0.03712868\n",
      "  0.0470601  -0.04386289  0.08118017  0.06985971  0.03894284 -0.06691545\n",
      "  0.04234062 -0.00499115 -0.04966607 -0.01631836 -0.00349119  0.03152308\n",
      " -0.04435534  0.12200002  0.00576636  0.06885982  0.02572589 -0.0446113 ]\n",
      "banana [ 2.0228e-01 -7.6618e-02  3.7032e-01  3.2845e-02 -4.1957e-01  7.2069e-02\n",
      " -3.7476e-01  5.7460e-02 -1.2401e-02  5.2949e-01 -5.2380e-01 -1.9771e-01\n",
      " -3.4147e-01  5.3317e-01 -2.5331e-02  1.7380e-01  1.6772e-01  8.3984e-01\n",
      "  5.5107e-02  1.0547e-01  3.7872e-01  2.4275e-01  1.4745e-02  5.5951e-01\n",
      "  1.2521e-01 -6.7596e-01  3.5842e-01 -4.0028e-02  9.5949e-02 -5.0690e-01\n",
      " -8.5318e-02  1.7980e-01  3.3867e-01  1.3230e-01  3.1021e-01  2.1878e-01\n",
      "  1.6853e-01  1.9874e-01 -5.7385e-01 -1.0649e-01  2.6669e-01  1.2838e-01\n",
      " -1.2803e-01 -1.3284e-01  1.2657e-01  8.6723e-01  9.6721e-02  4.8306e-01\n",
      "  2.1271e-01 -5.4990e-02 -8.2425e-02  2.2408e-01  2.3975e-01 -6.2260e-02\n",
      "  6.2194e-01 -5.9900e-01  4.3201e-01  2.8143e-01  3.3842e-02 -4.8815e-01\n",
      " -2.1359e-01  2.7401e-01  2.4095e-01  4.5950e-01 -1.8605e-01 -1.0497e+00\n",
      " -9.7305e-02 -1.8908e-01 -7.0929e-01  4.0195e-01 -1.8768e-01  5.1687e-01\n",
      "  1.2520e-01  8.4150e-01  1.2097e-01  8.8239e-02 -2.9196e-02  1.2151e-03\n",
      "  5.6825e-02 -2.7421e-01  2.5564e-01  6.9793e-02 -2.2258e-01 -3.6006e-01\n",
      " -2.2402e-01 -5.3699e-02  1.2022e+00  5.4535e-01 -5.7998e-01  1.0905e-01\n",
      "  4.2167e-01  2.0662e-01  1.2936e-01 -4.1457e-02 -6.6777e-01  4.0467e-01\n",
      " -1.5218e-02 -2.7640e-01 -1.5611e-01 -7.9198e-02  4.0037e-02 -1.2944e-01\n",
      " -2.4090e-04 -2.6785e-01 -3.8115e-01 -9.7245e-01  3.1726e-01 -4.3951e-01\n",
      "  4.1934e-01  1.8353e-01 -1.5260e-01 -1.0808e-01 -1.0358e+00  7.6217e-02\n",
      "  1.6519e-01  2.6526e-04  1.6616e-01 -1.5281e-01  1.8123e-01  7.0274e-01\n",
      "  5.7956e-03  5.1664e-02 -5.9745e-02 -2.7551e-01 -3.9049e-01  6.1132e-02\n",
      "  5.5430e-01 -8.7997e-02 -4.1681e-01  3.2826e-01 -5.2549e-01 -4.4288e-01\n",
      "  8.2183e-03  2.4486e-01 -2.2982e-01 -3.4981e-01  2.6894e-01  3.9166e-01\n",
      " -4.1904e-01  1.6191e-01 -2.6263e+00  6.4134e-01  3.9743e-01 -1.2868e-01\n",
      " -3.1946e-01 -2.5633e-01 -1.2220e-01  3.2275e-01 -7.9933e-02 -1.5348e-01\n",
      "  3.1505e-01  3.0591e-01  2.6012e-01  1.8553e-01 -2.4043e-01  4.2886e-02\n",
      "  4.0622e-01 -2.4256e-01  6.3870e-01  6.9983e-01 -1.4043e-01  2.5209e-01\n",
      "  4.8984e-01 -6.1067e-02 -3.6766e-01 -5.5089e-01 -3.8265e-01 -2.0843e-01\n",
      "  2.2832e-01  5.1218e-01  2.7868e-01  4.7652e-01  4.7951e-02 -3.4008e-01\n",
      " -3.2873e-01 -4.1967e-01 -7.5499e-02 -3.8954e-01 -2.9622e-02 -3.4070e-01\n",
      "  2.2170e-01 -6.2856e-02 -5.1903e-01 -3.7774e-01 -4.3477e-03 -5.8301e-01\n",
      " -8.7546e-02 -2.3929e-01 -2.4711e-01 -2.5887e-01 -2.9894e-01  1.3715e-01\n",
      "  2.9892e-02  3.6544e-02 -4.9665e-01 -1.8160e-01  5.2939e-01  2.1992e-01\n",
      " -4.4514e-01  3.7798e-01 -5.7062e-01 -4.6946e-02  8.1806e-02  1.9279e-02\n",
      "  3.3246e-01 -1.4620e-01  1.7156e-01  3.9981e-01  3.6217e-01  1.2816e-01\n",
      "  3.1644e-01  3.7569e-01 -7.4690e-02 -4.8480e-02 -3.1401e-01 -1.9286e-01\n",
      " -3.1294e-01 -1.7553e-02 -1.7514e-01 -2.7587e-02 -1.0000e+00  1.8387e-01\n",
      "  8.1434e-01 -1.8913e-01  5.0999e-01 -9.1960e-03 -1.9295e-03  2.8189e-01\n",
      "  2.7247e-02  4.3409e-01 -5.4967e-01 -9.7426e-02 -2.4540e-01 -1.7203e-01\n",
      " -8.8650e-02 -3.0298e-01 -1.3591e-01 -2.7765e-01  3.1286e-03  2.0556e-01\n",
      " -1.5772e-01 -5.2308e-01 -6.4701e-01 -3.7014e-01  6.9393e-02  1.1401e-01\n",
      "  2.7594e-01 -1.3875e-01 -2.7268e-01  6.6891e-01 -5.6454e-02  2.4017e-01\n",
      " -2.6730e-01  2.9860e-01  1.0083e-01  5.5592e-01  3.2849e-01  7.6858e-02\n",
      "  1.5528e-01  2.5636e-01 -1.0772e-01 -1.2359e-01  1.1827e-01 -9.9029e-02\n",
      " -3.4328e-01  1.1502e-01 -3.7808e-01 -3.9012e-02 -3.4593e-01 -1.9404e-01\n",
      " -3.3580e-01 -6.2334e-02  2.8919e-01  2.8032e-01 -5.3741e-01  6.2794e-01\n",
      "  5.6955e-02  6.2147e-01 -2.5282e-01  4.1670e-01 -1.0108e-02 -2.5434e-01\n",
      "  4.0003e-01  4.2432e-01  2.2672e-01  1.7553e-01  2.3049e-01  2.8323e-01\n",
      "  1.3882e-01  3.1218e-03  1.7057e-01  3.6685e-01  2.5247e-03 -6.4009e-01\n",
      " -2.9765e-01  7.8943e-01  3.3168e-01 -1.1966e+00 -4.7156e-02  5.3175e-01]\n",
      "afskfsd [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
      " 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n"
     ]
    }
   ],
   "source": [
    "import spacy\n",
    "nlp = spacy.load(\"en_vectors_web_lg\")\n",
    "tokens = nlp(\"dog cat banana afskfsd\")\n",
    "for token in tokens:\n",
    "    print(token, token.vector)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's make a function to get the average vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_average_vector_for_text(text: str) -> np.ndarray:\n",
    "    tokens = nlp(text)\n",
    "    vectors = []\n",
    "    for token in tokens:\n",
    "        vectors.append(token.vector)\n",
    "    all_vecs = np.array(vectors)\n",
    "    #print(\"shape: \", all_vecs.shape)\n",
    "    vec = np.mean(all_vecs, axis=0)\n",
    "    return vec"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's compare the average vector method with the transfomer method\n",
    "* We will compare them on the task of finding similar tweets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load Tweets\n",
    "Can be found at with the code at https://github.com/jmugan/modern_practical_nlp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "all_tweets: List[str] = []\n",
    "with open('jmugan_tweets.txt', 'r') as f:\n",
    "    for tweet in f:\n",
    "        tweet = tweet.strip()  # remove newline\n",
    "        all_tweets.append(tweet)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Generate the Vectors\n",
    "* Save them after that because it is a bit slow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data directory:  /Users/jmugan/temp\n",
      "Loading saved transformer vecs.\n",
      "Generating average vecs.\n"
     ]
    }
   ],
   "source": [
    "home = str(Path.home())\n",
    "data_dir = os.path.join(home, 'temp')\n",
    "print(\"Data directory: \", data_dir)\n",
    "\n",
    "transformer_vec_pickle_file = os.path.join(data_dir, 'transformer_vecs.pkl')\n",
    "average_vec_pickle_file = os.path.join(data_dir, 'average_vecs.pkl')\n",
    "\n",
    "if os.path.isfile(transformer_vec_pickle_file):\n",
    "    print(\"Loading saved transformer vecs.\")\n",
    "    with open(transformer_vec_pickle_file, 'rb') as f:\n",
    "        all_transformer_vecs = pickle.load(f)\n",
    "else:\n",
    "    print(\"Generating transformer vecs.\")\n",
    "    all_transformer_vecs: List[np.ndarray] = []\n",
    "    for tweet in all_tweets:\n",
    "        vector = make_vector(tweet)\n",
    "        all_transformer_vecs.append(vector)\n",
    "    with open(transformer_vec_pickle_file, 'wb') as f:\n",
    "        pickle.dump(all_transformer_vecs, f)\n",
    "\n",
    "if os.path.isfile(average_vec_pickle_file):\n",
    "    print(\"Loading saved average vecs.\")\n",
    "    with open(average_vec_pickle_file, 'rb') as f:\n",
    "        all_average_vecs = pickle.load(f)\n",
    "else:\n",
    "    print(\"Generating average vecs.\")\n",
    "    all_average_vecs: List[np.ndarray] = []\n",
    "    for tweet in all_tweets:\n",
    "        vector = get_average_vector_for_text(tweet)\n",
    "        all_average_vecs.append(vector)\n",
    "    with open(average_vec_pickle_file, 'wb') as f:\n",
    "        pickle.dump(all_average_vecs, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Code to Find Most Similar"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_most_similar_vector(text_vec: np.ndarray, vectors: List[np.ndarray]\n",
    "                           ) -> Tuple[float, str]:\n",
    "    \"\"\"\n",
    "    This is simple and slow. In reality you want to use something like \n",
    "    Faiss to find the closest vector\n",
    "    https://engineering.fb.com/data-infrastructure/faiss-a-library-for-efficient-similarity-search/\n",
    "    \"\"\"\n",
    "    closest_index = None\n",
    "    smallest_distance = 10000\n",
    "    for i, vector in enumerate(vectors):\n",
    "        #dist = np.linalg.norm(vector - text_vec)  # fancy people use cosine distance\n",
    "        dist = scipy.spatial.distance.cosine(vector, text_vec)\n",
    "        if dist < smallest_distance:\n",
    "            smallest_distance = dist\n",
    "            closest_index = i\n",
    "    return smallest_distance, all_tweets[closest_index]\n",
    "\n",
    "\n",
    "def get_most_similar_tweet_transformer(text: str) -> Tuple[float, str]:\n",
    "    text_vec = make_vector(text)\n",
    "    return get_most_similar_vector(text_vec, all_transformer_vecs)\n",
    "\n",
    "def get_most_similar_tweet_average(text: str) -> Tuple[float, str]:\n",
    "    text_vec = get_average_vector_for_text(text)\n",
    "    return get_most_similar_vector(text_vec, all_average_vecs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Try it Out"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "orig:  Cats don't like to wrestle.\n",
      "Transformer (0.00): Cats don't like to wrestle.\n",
      "Average (0.00): Cats don't like to wrestle.\n",
      "\n",
      "orig:  This transformer vector approach doesn't work well on tweets because it was trained on other data.\n",
      "Transformer (0.04): Children used to build either glue or snap-on models. Now, Legos are so detailed that kids build with those instead.\n",
      "Average (0.07): Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\n",
      "\n",
      "orig:  I like to watch movies.\n",
      "Transformer (0.03): I just watched 127 Hours with the special lady. Felt kind of bad sitting there with my microwave popcorn and big glass of cold water.\n",
      "Average (0.10): I like a lot of kids movies, but I just can't get into these Ice Age films.\n",
      "\n",
      "orig:  My children like to climb trees, drink water, and read about aardvarks.\n",
      "Transformer (0.02): It's funny how, in a group, the result of hoarding is more hoarding.\n",
      "Average (0.11): Amazing how kids practice actions. At age 2, my boy would go up the step, down the step. Repeat. He loved it, drove me nuts.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "orig = \"Cats don't like to wrestle.\"\n",
    "tweet_1 = (\"This transformer vector approach doesn't work well on tweets because\" +\n",
    "          \" it was trained on other data.\")\n",
    "tweet_2 = \"I like to watch movies.\"  \n",
    "tweet_3 = \"My children like to climb trees, drink water, and read about aardvarks.\"\n",
    "\n",
    "new_tweets = [orig, tweet_1, tweet_2, tweet_3]\n",
    "\n",
    "for tweet in new_tweets:\n",
    "    print(\"orig: \",tweet)\n",
    "    dist, closest = get_most_similar_tweet_transformer(tweet)\n",
    "    print(f\"Transformer ({dist:.2f}): {closest}\")\n",
    "    dist, closest = get_most_similar_tweet_average(tweet)\n",
    "    print(f\"Average ({dist:.2f}): {closest}\")\n",
    "    print()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tweets are Weird\n",
    "* Bert would likely work better on text closer to what it was trained on (books and Wikipedia) than on tweets. Tweets are weird.\n",
    "* You can fine-tune Bert on your data for finding similar documents, but it is a little more involved. Here's a place to get you started: https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "* We have seen that we can convert natural language into vectors, which is a representation amenable to computation and automation.\n",
    "* In the next video, we'll look at fine-tuning these vectors for classification so that we can put documents into buckets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.7.4 64-bit ('base': conda)",
   "language": "python",
   "name": "python37464bitbaseconda6a86cbe610af4db3860115ffdb24f1cc"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: Episode_2_Classifying_with_Vectors.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Classifying Text with Vectors\n",
    "\n",
    "* Last episode we converted text to vectors and used those vectors to find similar documents. \n",
    "* This episode we are going to use those vectors for classification.\n",
    "* We are also going to use machine learning to make the vectors align with our classification goal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import List, Tuple\n",
    "import os.path\n",
    "from pathlib import Path\n",
    "import random\n",
    "\n",
    "import torch\n",
    "import torch.nn.functional as F\n",
    "from transformers import BertModel, BertTokenizer, BertConfig"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Install TensorBoard\n",
    "\n",
    "We are going to use it to graph our results.\n",
    "\n",
    "`pip install tensorboard`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.tensorboard import SummaryWriter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's get the data\n",
    "\n",
    "* We are going to take my tweets and classify whether they are about movies or not.\n",
    "* Since most tweets aren't about movies, we have to downsample the tweets that aren't. Imbalanced classes are a whole topic that we can cover another time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data directory:  /Users/jmugan/temp\n",
      "Downsampled dataset: 45 positive tweets out of 101\n"
     ]
    }
   ],
   "source": [
    "home = str(Path.home())\n",
    "data_dir = os.path.join(home, 'temp')\n",
    "print(\"Data directory: \", data_dir)\n",
    "\n",
    "all_tweets: List[str] = []\n",
    "with open('jmugan_tweets.txt', 'r') as f:\n",
    "    for tweet in f:\n",
    "        tweet = tweet.strip()  # remove newline\n",
    "        all_tweets.append(tweet)\n",
    "\n",
    "def is_movie_tweet(tweet: str) -> bool:\n",
    "    return 'movie' in tweet.lower()\n",
    "\n",
    "def down_sample(all_tweets: List[str]) -> List[str]:\n",
    "    down_sampled = []\n",
    "    num_pos = 0\n",
    "    for tweet in all_tweets:\n",
    "        if is_movie_tweet(tweet):\n",
    "            num_pos += 1\n",
    "            down_sampled.append(tweet)\n",
    "        elif random.random() > .95:  # let in about 5% of the others\n",
    "            down_sampled.append(tweet)\n",
    "    print(f\"Downsampled dataset: {num_pos} positive tweets out of {len(down_sampled)}\")\n",
    "    return down_sampled\n",
    "\n",
    "# Normally, we would break the data into train, valid, and test, \n",
    "# but this is just a quick video\n",
    "training_tweets = down_sample(all_tweets)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# We need a `Dataset` to represent our data set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.data import Dataset\n",
    "\n",
    "# https://pytorch.org/docs/stable/data.html\n",
    "# You don't have to load the whole data into memory\n",
    "class TweetDataset(Dataset):\n",
    "\n",
    "    def __init__(self, tweets: List[str]):\n",
    "        self.tweets = tweets\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.tweets)\n",
    "\n",
    "    def __getitem__(self, index: int):\n",
    "        return self.tweets[index]\n",
    "\n",
    "    #def __iter__(self):\n",
    "    #    # you could be pulling these from a file instead so the whole thing\n",
    "    #    # doesn't have to sit in memory, see torch.utils.data.IterableDataset\n",
    "    #    for tweet in self.tweets:\n",
    "    #        yield tweet\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# We need a `DataLoader` to batch and preprocess our data and pass it to the learner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# recall from last time\n",
    "def get_tokens(text: str,\n",
    "               tokenizer: BertTokenizer,\n",
    "               config: BertConfig) -> List[str]:\n",
    "    tokens = tokenizer.tokenize(text)\n",
    "    # make sure it isn't too long\n",
    "    max_length = config.max_position_embeddings\n",
    "    tokens = tokens[:max_length-1] # Will add special begin token\n",
    "    # cls token to hold vector https://huggingface.co/transformers/main_classes/tokenizer.html\n",
    "    tokens = [tokenizer.cls_token] + tokens\n",
    "    return tokens\n",
    "\n",
    "\n",
    "\n",
    "def get_labels(batch: List[str]) -> List[int]:\n",
    "    \"\"\"\n",
    "    Normally you have the labeled data from somewhere, maybe somebody hand labeled it.\n",
    "    Here, we will just use a simple-stupid function\n",
    "    \"\"\"\n",
    "    labels: List[int] = []\n",
    "    for tweet in batch:\n",
    "        if is_movie_tweet(tweet):\n",
    "            labels.append(1)\n",
    "        else:\n",
    "            labels.append(0)\n",
    "    return labels\n",
    "\n",
    "\n",
    "# We need a function that puts masks on because the tweets have different lengths\n",
    "def preprocess_batch(batch: List[str]\n",
    "    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, List[str]]:\n",
    "    \"\"\"\n",
    "    We get a list of up to batch-size tweets and have to convert it to token ids and make a mask.\n",
    "    \"\"\"\n",
    "\n",
    "    # First tokenize\n",
    "    tokenized_tweets: List[List[str]] = [get_tokens(tweet, tokenizer, config) \n",
    "                                         for tweet in batch]\n",
    "\n",
    "    # find the max length\n",
    "    lengths = [len(tokenized_tweet) for tokenized_tweet in tokenized_tweets]\n",
    "    max_length = max(lengths)\n",
    "\n",
    "    # get batch size\n",
    "    batch_size = len(batch)\n",
    "\n",
    "    # let's make it tensors\n",
    "    input_data = torch.zeros(batch_size, max_length, dtype=torch.long)\n",
    "    mask_data = torch.zeros(batch_size, max_length, dtype=torch.long)\n",
    "\n",
    "    for i, tokenized_tweet in enumerate(tokenized_tweets):\n",
    "        token_ids = tokenizer.convert_tokens_to_ids(tokenized_tweet)\n",
    "        tensor = torch.tensor(token_ids)\n",
    "        input_data[i,:len(token_ids)] = tensor\n",
    "        mask_data[i,:len(token_ids)] = 1\n",
    "\n",
    "    # get labels\n",
    "    labels = get_labels(batch)\n",
    "    labels_tensor = torch.tensor(labels, dtype=torch.long)\n",
    "\n",
    "    return input_data, mask_data, labels_tensor, batch\n",
    "\n",
    "\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "BATCH_SIZE = 20\n",
    "train_data_loader = DataLoader(\n",
    "    TweetDataset(training_tweets),\n",
    "    batch_size = BATCH_SIZE,\n",
    "    shuffle = True,\n",
    "    collate_fn=preprocess_batch\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Now let's define our classifier as a PyTorch module\n",
    "* It takes a Bert model as the encoder and declares a matrix of size `hidden_size x 2` to map the Bert vector down to two dimensions (tweet about movies or not)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Classifier(torch.nn.Module):\n",
    "    def __init__(self, encoder: BertModel):\n",
    "        \"\"\"\n",
    "        The init function of a module specifies the model parameters\n",
    "        \"\"\"\n",
    "        super().__init__()\n",
    "        self.encoder = encoder\n",
    "        # We have two classes, about movies and not about movies\n",
    "        self.classes = torch.nn.Linear(encoder.config.hidden_size, 2)\n",
    "\n",
    "    def forward(self, input_data: torch.Tensor,\n",
    "                mask_data: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:\n",
    "        \"\"\"\n",
    "        Forward is called when you call the model instance\n",
    "        \"\"\"\n",
    "\n",
    "        # The mask is so it doesn't pay attention to any tokens that are just \n",
    "        # filling up space. You need this because the input tensor is of shape \n",
    "        # batch_size x max_len where max_len is the longest tweet in that batch.\n",
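    "        # Index the output instead of unpacking it, so this works whether the\n",
    "        # encoder returns a plain tuple or a ModelOutput (newer transformers versions).\n",
    "        encoder_outputs = self.encoder(input_data, attention_mask=mask_data)\n",
    "        pooler_output = encoder_outputs[1]\n",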
    "\n",
    "        # shape batch_size x encoder.config.hidden_size\n",
    "        vectors = pooler_output\n",
    "\n",
    "        # shape batch_size x 2\n",
    "        logits = self.classes(vectors)\n",
    "\n",
    "        # shape batch_size x 2; do softmax and log to pass \n",
    "        # into negative log likelihood later\n",
    "        log_softmax = F.log_softmax(logits, dim=1)\n",
    "\n",
    "        # We will use the vectors in the next video\n",
    "        return log_softmax, vectors\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Let's pull up our Bert model like we did before"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_name = 'bert-base-uncased'\n",
    "# Need to use the same tokenizer that was used to train the model so that it breaks up words\n",
    "# into tokens the same way.\n",
    "tokenizer = BertTokenizer.from_pretrained(model_name)\n",
    "\n",
    "# This model is huge!!!!!!!!\n",
    "model = BertModel.from_pretrained(model_name)\n",
    "\n",
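    "# The config tells us the maximum input length (max_position_embeddings),\n",
    "# which get_tokens uses to truncate long tweets.\n",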
    "config = BertConfig.from_pretrained(model_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# We can now instantiate our classifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Classifier(\n",
       "  (encoder): BertModel(\n",
       "    (embeddings): BertEmbeddings(\n",
       "      (word_embeddings): Embedding(30522, 768, padding_idx=0)\n",
       "      (position_embeddings): Embedding(512, 768)\n",
       "      (token_type_embeddings): Embedding(2, 768)\n",
       "      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "      (dropout): Dropout(p=0.1, inplace=False)\n",
       "    )\n",
       "    (encoder): BertEncoder(\n",
       "      (layer): ModuleList(\n",
       "        (0): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (1): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (2): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (3): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (4): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (5): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (6): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (7): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (8): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (9): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (10): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "        (11): BertLayer(\n",
       "          (attention): BertAttention(\n",
       "            (self): BertSelfAttention(\n",
       "              (query): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (key): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (value): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "            (output): BertSelfOutput(\n",
       "              (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "              (dropout): Dropout(p=0.1, inplace=False)\n",
       "            )\n",
       "          )\n",
       "          (intermediate): BertIntermediate(\n",
       "            (dense): Linear(in_features=768, out_features=3072, bias=True)\n",
       "          )\n",
       "          (output): BertOutput(\n",
       "            (dense): Linear(in_features=3072, out_features=768, bias=True)\n",
       "            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n",
       "            (dropout): Dropout(p=0.1, inplace=False)\n",
       "          )\n",
       "        )\n",
       "      )\n",
       "    )\n",
       "    (pooler): BertPooler(\n",
       "      (dense): Linear(in_features=768, out_features=768, bias=True)\n",
       "      (activation): Tanh()\n",
       "    )\n",
       "  )\n",
       "  (classes): Linear(in_features=768, out_features=2, bias=True)\n",
       ")"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "classifier = Classifier(model)\n",
    "classifier.train()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Set up our optimizers\n",
    "We use a much smaller learning rate for Bert than for the new classification layer so we don't change the pretrained weights too much."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "bert_optimizer = torch.optim.Adam(classifier.encoder.parameters(), lr=0.000005)\n",
    "head_optimizer = torch.optim.Adam(classifier.classes.parameters(), lr=0.001)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Time to train the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "STARTING EPOCH:  0\n",
      "Loss: 0.72\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], Labels: [1 1 0 0 1 1 0 0 0 1 0 0 0 0 1 1 1 0 0 0], Accuracy: 0.6\n",
      "['Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', 'Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.', 'Amazing how black socks change to blue when you take them out of the dresser drawer.', 'My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', \"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", 'Recently watched the movie The Bay. Anyone want to go for a swim?', 'Coat hangers are very poorly behaved objects.', 'My inbox now has 0 messages. It feels good to get organized.', \"I wish my financial/retirement companies would stop sending me mail. With all these unnecessary words, I'm likely to miss important ones.\", 'Remember that kid\\'s movie Escape from Planet Earth? One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', 'Dreamed I played basketball with a giant strawberry for a ball. And there were South American soccer players camped out outside my window.', \"It's funny. In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", \"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\", 'Just saw The Muppets. Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', 'Sugar is the enemy. Sugar gives me life. 
#Productivity', 'Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. I tell myself that when I buy printer ink.', 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.']\n",
      "Loss: 0.74\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], Labels: [1 0 0 1 1 0 0 0 0 0 1 0 0 1 1 1 0 1 0 1], Accuracy: 0.55\n",
      "['Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing', \"Hate when I can't find my mouse pointer. Even worse with two screens.\", 'Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?', \"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\", 'I learned a new word yesterday. aliteracy: people who can read but choose not to.', 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', \"Recently watched Anna Karenina. That movie would make no sense at all if I hadn't read the book.\", 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.', \"I hate when I have to get up early. That's enough, but my body will often get me up even earlier than that for no good reason.\", 'One of my superpowers is the ability to abandon a book or movie halfway through.', \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing', \"It's funny how deep questions and ignorant questions look alike. 
E.g., what is science?\", \"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", 'I love the smell of coffee in the morning. It smells like victory.', \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\"]\n",
      "Loss: 0.68\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], Labels: [0 1 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0], Accuracy: 0.6\n",
      "['I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', 'Went to Costco for the first time not too long ago. That place is a disaster movie.', 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', \"It's amazing how much joy in life comes from making and executing plans.\", 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people', 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', 'When they show scientists in movies, they never show them making PowerPoint slides.', \"Why can't we get good disaster movies?\", 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', 'I keep buying pyrite but I must be doing something wrong.', \"I hate when I write a note to myself but then later don't remember what the note refers to.\", \"It's weird how music makes everything more exciting like movies, tech videos, parties. It must tap into some deep, social part of the brain.\", \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", 'Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.', 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? 
Google has spoiled us all.', \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\"]\n",
      "Loss: 0.68\n",
      "Predictions: [0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0], Labels: [1 1 0 0 1 0 1 1 0 0 0 1 1 1 1 1 0 1 0 0], Accuracy: 0.55\n",
      "['Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", \"I've never been one to see symbolism in books or movies. I like to take fictional stories at face value, as if they were my own experience.\", 'Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. #TwoThumbsUp', 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', 'There is a tension in writing. You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', 'Good comedy consists of statements that are both true and absurd.', 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.', \"One bummer about getting older is that I've already seen every new movie that comes out.\", \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', 'Movies should bring back quicksand. We all need to be reminded of the dangers.', 'I coach kindergarten soccer. 
My standard April fools day joke is to email the parents about having two-a-day practices.', \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", 'Want evidence of intelligent machines? Captchas are now too hard for me.', \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\"]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.73\n",
      "Predictions: [1 1 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 1], Labels: [0 1 0 1 0 1 0 0 1 1 0 1 1 0 0 0 0 0 0 1], Accuracy: 0.4\n",
      "[\"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\", 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'Talking about the 6 Star Wars movies with my kids. Terms like \"first one\" and \"last one\" result in limitless confusion.', \"My son (7) wants to know why he can't have a credit card.\", \"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\", 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\", 'I love it when movies about ghosts are \"based on a true story.\"', 'Just watched Tyrannosaur. That movie breaks your heart over and over again. Fewer dinosaurs than I expected.', 'I love how a good book becomes part of your mental life forever.', 'Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\", 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", \"Last one in the office. 
I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", 'I was once at a party where someone spilled red wine, and they used white wine to clean it up. I wonder if the opposite would work.', 'I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". How cool is that?', 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.']\n",
      "Loss: 0.70\n",
      "Predictions: [0], Labels: [1], Accuracy: 0.0\n",
      "['Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.']\n",
      "STARTING EPOCH:  1\n",
      "Loss: 0.74\n",
      "Predictions: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], Labels: [1 0 0 0 0 1 1 1 1 1 0 0 0 1 0 0 0 1 1 0], Accuracy: 0.45\n",
      "['Movies should bring back quicksand. We all need to be reminded of the dangers.', \"I hate when I write a note to myself but then later don't remember what the note refers to.\", \"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\", 'One of my superpowers is the ability to abandon a book or movie halfway through.', \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", \"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\", \"I've never been one to see symbolism in books or movies. I like to take fictional stories at face value, as if they were my own experience.\", \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', 'I coach kindergarten soccer. My standard April fools day joke is to email the parents about having two-a-day practices.', 'Coat hangers are very poorly behaved objects.', \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\", \"It's amazing how much joy in life comes from making and executing plans.\", 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.', 'Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. 
#TwoThumbsUp', 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing', 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.']\n",
      "Loss: 0.64\n",
      "Predictions: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], Labels: [1 0 1 1 1 1 1 0 1 0 1 0 0 1 1 1 0 1 0 1], Accuracy: 0.65\n",
      "['Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', \"It's funny. In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", 'Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', \"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", 'When they show scientists in movies, they never show them making PowerPoint slides.', 'Recently watched the movie The Bay. Anyone want to go for a swim?', \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\", 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', \"Hate when I can't find my mouse pointer. Even worse with two screens.\", 'Remember that kid\\'s movie Escape from Planet Earth? One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?', \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', 'Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. I tell myself that when I buy printer ink.', \"Recently watched Anna Karenina. 
That movie would make no sense at all if I hadn't read the book.\", 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.']\n",
      "Loss: 0.84\n",
      "Predictions: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], Labels: [0 1 1 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 1], Accuracy: 0.35\n",
      "[\"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", 'Just saw The Muppets. Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.', 'Talking about the 6 Star Wars movies with my kids. Terms like \"first one\" and \"last one\" result in limitless confusion.', \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", 'I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". How cool is that?', 'I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', 'Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', 'My inbox now has 0 messages. It feels good to get organized.', 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', 'Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.', 'I learned a new word yesterday. aliteracy: people who can read but choose not to.', 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', 'Sugar is the enemy. Sugar gives me life. #Productivity', 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', 'Good comedy consists of statements that are both true and absurd.', 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', 'Dreamed I played basketball with a giant strawberry for a ball. 
And there were South American soccer players camped out outside my window.', 'I was once at a party where someone spilled red wine, and they used white wine to clean it up. I wonder if the opposite would work.', 'My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', 'I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.78\n",
      "Predictions: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], Labels: [0 0 0 1 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 1], Accuracy: 0.4\n",
      "[\"I hate when I have to get up early. That's enough, but my body will often get me up even earlier than that for no good reason.\", 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing', \"Last one in the office. I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", \"One bummer about getting older is that I've already seen every new movie that comes out.\", \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", \"My son (7) wants to know why he can't have a credit card.\", 'Amazing how black socks change to blue when you take them out of the dresser drawer.', \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\", 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.', 'Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", 'Just watched Tyrannosaur. That movie breaks your heart over and over again. 
Fewer dinosaurs than I expected.', \"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", \"It's weird how music makes everything more exciting like movies, tech videos, parties. It must tap into some deep, social part of the brain.\"]\n",
      "Loss: 0.71\n",
      "Predictions: [0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1], Labels: [0 1 1 0 0 1 0 1 0 1 1 0 0 1 0 1 0 0 0 0], Accuracy: 0.45\n",
      "['I love the smell of coffee in the morning. It smells like victory.', \"Why can't we get good disaster movies?\", 'Went to Costco for the first time not too long ago. That place is a disaster movie.', \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\", 'Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.', 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.', 'There is a tension in writing. You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', 'I love it when movies about ghosts are \"based on a true story.\"', \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\", 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', 'Want evidence of intelligent machines? Captchas are now too hard for me.', \"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", 'I love how a good book becomes part of your mental life forever.', \"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\", 'I keep buying pyrite but I must be doing something wrong.', \"I wish my financial/retirement companies would stop sending me mail. 
With all these unnecessary words, I'm likely to miss important ones.\", \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.']\n",
      "Loss: 0.90\n",
      "Predictions: [1], Labels: [0], Accuracy: 0.0\n",
      "[\"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\"]\n",
      "STARTING EPOCH:  2\n",
      "Loss: 0.69\n",
      "Predictions: [1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0], Labels: [0 1 0 0 0 1 1 0 1 0 1 1 0 1 0 1 0 0 1 0], Accuracy: 0.65\n",
      "['I was once at a party where someone spilled red wine, and they used white wine to clean it up. I wonder if the opposite would work.', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.', \"My son (7) wants to know why he can't have a credit card.\", \"Hate when I can't find my mouse pointer. Even worse with two screens.\", 'Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', 'There is a tension in writing. You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', \"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\", \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\", \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\", 'Coat hangers are very poorly behaved objects.', \"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\", 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing', 'Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', \"It's funny. 
In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", 'I love how a good book becomes part of your mental life forever.', 'Movies should bring back quicksand. We all need to be reminded of the dangers.', 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.']\n",
      "Loss: 0.61\n",
      "Predictions: [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0], Labels: [1 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 1 1 0], Accuracy: 0.7\n",
      "['Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.', 'My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', 'I coach kindergarten soccer. My standard April fools day joke is to email the parents about having two-a-day practices.', 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'Dreamed I played basketball with a giant strawberry for a ball. And there were South American soccer players camped out outside my window.', 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', 'I love the smell of coffee in the morning. It smells like victory.', \"Why can't we get good disaster movies?\", 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', \"Last one in the office. I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", 'I keep buying pyrite but I must be doing something wrong.', \"It's weird how music makes everything more exciting like movies, tech videos, parties. It must tap into some deep, social part of the brain.\", 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', 'I learned a new word yesterday. aliteracy: people who can read but choose not to.', \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", 'Remember that kid\\'s movie Escape from Planet Earth? 
One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\"]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.68\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0], Labels: [0 1 1 0 1 0 1 0 1 1 0 1 0 0 0 0 0 1 1 0], Accuracy: 0.6\n",
      "[\"I hate when I write a note to myself but then later don't remember what the note refers to.\", \"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", \"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', \"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\", 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', \"I've never been one to see symbolism in books or movies. I like to take fictional stories at face value, as if they were my own experience.\", 'I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', 'When they show scientists in movies, they never show them making PowerPoint slides.', 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', \"I hate when I have to get up early. 
That's enough, but my body will often get me up even earlier than that for no good reason.\", 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', \"It's amazing how much joy in life comes from making and executing plans.\", 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", 'Recently watched the movie The Bay. Anyone want to go for a swim?', 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.']\n",
      "Loss: 0.77\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], Labels: [1 1 0 0 1 1 1 0 1 0 0 0 0 1 1 1 0 1 0 1], Accuracy: 0.45\n",
      "['Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. #TwoThumbsUp', 'Talking about the 6 Star Wars movies with my kids. Terms like \"first one\" and \"last one\" result in limitless confusion.', \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\", \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.', \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing', 'Sugar is the enemy. Sugar gives me life. #Productivity', \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\", 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.', \"I wish my financial/retirement companies would stop sending me mail. With all these unnecessary words, I'm likely to miss important ones.\", 'Want evidence of intelligent machines? Captchas are now too hard for me.', 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', \"One bummer about getting older is that I've already seen every new movie that comes out.\", 'One of my superpowers is the ability to abandon a book or movie halfway through.', \"Recently watched Anna Karenina. That movie would make no sense at all if I hadn't read the book.\", \"Hate wearing a belt, but I need to dress fancy for work. 
I'm waiting for elastic belts to come back in fashion.\", 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", 'Just saw The Muppets. Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.']\n",
      "Loss: 0.74\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], Labels: [1 1 0 0 0 0 0 1 0 1 1 1 0 1 1 0 0 0 0 1], Accuracy: 0.55\n",
      "['Went to Costco for the first time not too long ago. That place is a disaster movie.', 'I love it when movies about ghosts are \"based on a true story.\"', \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", \"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", 'I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". How cool is that?', 'My inbox now has 0 messages. It feels good to get organized.', 'Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', 'I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people', 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?', 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', \"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", 'Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.', 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', 'Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. 
I tell myself that when I buy printer ink.', 'Amazing how black socks change to blue when you take them out of the dresser drawer.', 'Good comedy consists of statements that are both true and absurd.', 'Just watched Tyrannosaur. That movie breaks your heart over and over again. Fewer dinosaurs than I expected.']\n",
      "Loss: 0.18\n",
      "Predictions: [0], Labels: [0], Accuracy: 1.0\n",
      "['Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?']\n",
      "STARTING EPOCH:  3\n",
      "Loss: 0.78\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0], Labels: [0 1 0 0 1 0 0 1 0 1 0 1 1 1 0 1 0 0 1 0], Accuracy: 0.5\n",
      "[\"I hate when I write a note to myself but then later don't remember what the note refers to.\", \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", \"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\", 'Want evidence of intelligent machines? Captchas are now too hard for me.', 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\", 'Coat hangers are very poorly behaved objects.', \"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", 'My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", \"Last one in the office. I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", \"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\", \"Why can't we get good disaster movies?\", \"It's weird how music makes everything more exciting like movies, tech videos, parties. It must tap into some deep, social part of the brain.\", 'Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'When they show scientists in movies, they never show them making PowerPoint slides.', 'I was once at a party where someone spilled red wine, and they used white wine to clean it up. I wonder if the opposite would work.', 'I learned a new word yesterday. 
aliteracy: people who can read but choose not to.', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\"]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.76\n",
      "Predictions: [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1], Labels: [0 0 1 0 1 0 1 1 0 0 0 0 0 1 1 0 1 1 1 0], Accuracy: 0.55\n",
      "['I love the smell of coffee in the morning. It smells like victory.', 'My inbox now has 0 messages. It feels good to get organized.', \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'I love how a good book becomes part of your mental life forever.', 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", \"I hate when I have to get up early. That's enough, but my body will often get me up even earlier than that for no good reason.\", \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", 'I keep buying pyrite but I must be doing something wrong.', \"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', 'Went to Costco for the first time not too long ago. That place is a disaster movie.', 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', 'Just watched Tyrannosaur. That movie breaks your heart over and over again. Fewer dinosaurs than I expected.', 'Just got a wrong-number-call asking for a girl named Shirley. 
Always reminds me of that old movie Ruthless People.', 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.']\n",
      "Loss: 0.57\n",
      "Predictions: [0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0], Labels: [0 0 1 1 1 0 0 1 0 1 0 0 0 0 1 1 1 0 0 0], Accuracy: 0.75\n",
      "['Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. I tell myself that when I buy printer ink.', \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\", 'Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', \"One bummer about getting older is that I've already seen every new movie that comes out.\", 'One of my superpowers is the ability to abandon a book or movie halfway through.', 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", 'Sugar is the enemy. Sugar gives me life. #Productivity', 'Talking about the 6 Star Wars movies with my kids. Terms like \"first one\" and \"last one\" result in limitless confusion.', 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', 'I love it when movies about ghosts are \"based on a true story.\"', 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.', 'Recently watched the movie The Bay. Anyone want to go for a swim?', 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', 'I coach kindergarten soccer. 
My standard April fools day joke is to email the parents about having two-a-day practices.', 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing']\n",
      "Loss: 0.64\n",
      "Predictions: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], Labels: [0 0 0 0 1 0 1 0 1 0 0 0 1 0 1 1 1 0 1 0], Accuracy: 0.6\n",
      "['I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", \"I wish my financial/retirement companies would stop sending me mail. With all these unnecessary words, I'm likely to miss important ones.\", 'Amazing how black socks change to blue when you take them out of the dresser drawer.', 'I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people', 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.', \"I've never been one to see symbolism in books or movies. I like to take fictional stories at face value, as if they were my own experience.\", \"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\", 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.', \"It's funny. In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', 'Remember that kid\\'s movie Escape from Planet Earth? One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', 'Movies should bring back quicksand. 
We all need to be reminded of the dangers.', 'Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.', 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\"]\n",
      "Loss: 0.79\n",
      "Predictions: [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0], Labels: [1 0 1 0 0 0 1 0 1 1 1 0 0 1 0 1 1 0 0 1], Accuracy: 0.4\n",
      "[\"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", \"My son (7) wants to know why he can't have a credit card.\", \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\", \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", 'I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". How cool is that?', 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.', 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?', \"Hate when I can't find my mouse pointer. Even worse with two screens.\", 'Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. #TwoThumbsUp', \"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\", \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'There is a tension in writing. You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', \"Recently watched Anna Karenina. That movie would make no sense at all if I hadn't read the book.\", 'Dreamed I played basketball with a giant strawberry for a ball. And there were South American soccer players camped out outside my window.', 'Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', 'Just saw The Muppets. 
Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.', \"It's amazing how much joy in life comes from making and executing plans.\", 'Good comedy consists of statements that are both true and absurd.', 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.79\n",
      "Predictions: [0], Labels: [1], Accuracy: 0.0\n",
      "['Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.']\n",
      "STARTING EPOCH:  4\n",
      "Loss: 0.58\n",
      "Predictions: [1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0], Labels: [1 0 0 0 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0 0], Accuracy: 0.65\n",
      "['Recently watched the movie The Bay. Anyone want to go for a swim?', 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.', 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\", 'Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", 'I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". How cool is that?', 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.', 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', \"It's funny. In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", 'There is a tension in writing. 
You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', 'Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\", \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\"]\n",
      "Loss: 0.60\n",
      "Predictions: [0 0 1 0 0 1 0 0 0 1 1 1 0 0 0 0 1 0 1 0], Labels: [1 0 1 0 0 1 0 1 0 1 1 1 0 0 0 1 1 1 0 0], Accuracy: 0.75\n",
      "[\"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.', \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\", \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\", \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", 'One of my superpowers is the ability to abandon a book or movie halfway through.', 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", 'Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.', 'Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', 'I keep buying pyrite but I must be doing something wrong.', \"My son (7) wants to know why he can't have a credit card.\", \"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\", 'Talking about the 6 Star Wars movies with my kids. 
Terms like \"first one\" and \"last one\" result in limitless confusion.', 'Movies should bring back quicksand. We all need to be reminded of the dangers.', 'Coat hangers are very poorly behaved objects.', 'Good comedy consists of statements that are both true and absurd.']\n",
      "Loss: 0.51\n",
      "Predictions: [0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 1 0 0 0], Labels: [0 0 1 0 1 1 0 1 0 0 1 1 0 0 0 1 0 0 0 0], Accuracy: 0.9\n",
      "[\"It's amazing how much joy in life comes from making and executing plans.\", \"I wish my financial/retirement companies would stop sending me mail. With all these unnecessary words, I'm likely to miss important ones.\", \"One bummer about getting older is that I've already seen every new movie that comes out.\", 'I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', 'When they show scientists in movies, they never show them making PowerPoint slides.', 'Remember that kid\\'s movie Escape from Planet Earth? One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', 'My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', 'I love how a good book becomes part of your mental life forever.', \"I hate when I write a note to myself but then later don't remember what the note refers to.\", 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\", 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.', 'My inbox now has 0 messages. It feels good to get organized.', \"Recently watched Anna Karenina. That movie would make no sense at all if I hadn't read the book.\", 'Want evidence of intelligent machines? Captchas are now too hard for me.', \"I hate when I have to get up early. That's enough, but my body will often get me up even earlier than that for no good reason.\", 'I was once at a party where someone spilled red wine, and they used white wine to clean it up. 
I wonder if the opposite would work.', \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\"]\n",
      "Loss: 0.56\n",
      "Predictions: [0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 0 0 1], Labels: [1 0 0 0 1 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1], Accuracy: 0.7\n",
      "['Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. #TwoThumbsUp', \"Hate when I can't find my mouse pointer. Even worse with two screens.\", 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.', 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'Just saw The Muppets. Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.', 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', 'Went to Costco for the first time not too long ago. That place is a disaster movie.', \"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", \"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\", 'Sugar is the enemy. Sugar gives me life. #Productivity', 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", \"Last one in the office. I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? 
Get off my property.', \"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.63\n",
      "Predictions: [0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 1 0 0 1], Labels: [0 0 1 1 0 0 0 1 0 0 1 0 1 0 1 1 1 0 0 1], Accuracy: 0.65\n",
      "['Amazing how black socks change to blue when you take them out of the dresser drawer.', 'I coach kindergarten soccer. My standard April fools day joke is to email the parents about having two-a-day practices.', \"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", \"It's weird how music makes everything more exciting like movies, tech videos, parties. It must tap into some deep, social part of the brain.\", 'I learned a new word yesterday. aliteracy: people who can read but choose not to.', 'Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. I tell myself that when I buy printer ink.', 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', \"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\", \"I've never been one to see symbolism in books or movies. I like to take fictional stories at face value, as if they were my own experience.\", 'Dreamed I played basketball with a giant strawberry for a ball. And there were South American soccer players camped out outside my window.', \"Why can't we get good disaster movies?\", \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.', 'I love it when movies about ghosts are \"based on a true story.\"', 'Just watched Tyrannosaur. That movie breaks your heart over and over again. 
Fewer dinosaurs than I expected.', 'I love the smell of coffee in the morning. It smells like victory.', 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing', 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?']\n",
      "Loss: 0.38\n",
      "Predictions: [1], Labels: [1], Accuracy: 1.0\n",
      "['I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people']\n",
      "STARTING EPOCH:  5\n",
      "Loss: 0.58\n",
      "Predictions: [1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1], Labels: [0 1 0 0 0 0 0 1 1 0 1 1 1 1 1 0 1 0 0 0], Accuracy: 0.6\n",
      "['I was once at a party where someone spilled red wine, and they used white wine to clean it up. I wonder if the opposite would work.', \"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", \"I wish my financial/retirement companies would stop sending me mail. With all these unnecessary words, I'm likely to miss important ones.\", \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\", \"It's funny. In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\", 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', \"One bummer about getting older is that I've already seen every new movie that comes out.\", \"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\", 'Remember that kid\\'s movie Escape from Planet Earth? One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', 'Just saw The Muppets. Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.', \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\", \"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\", 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. 
Not as sappy as most disaster movies.', 'Amazing how black socks change to blue when you take them out of the dresser drawer.', \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", 'Want evidence of intelligent machines? Captchas are now too hard for me.', \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\", 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.']\n",
      "Loss: 0.47\n",
      "Predictions: [1 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1], Labels: [1 1 1 0 1 1 0 0 1 0 1 0 0 0 1 1 1 1 1 0], Accuracy: 0.8\n",
      "['When they show scientists in movies, they never show them making PowerPoint slides.', 'I love it when movies about ghosts are \"based on a true story.\"', 'Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.', 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. #TwoThumbsUp', \"I've never been one to see symbolism in books or movies. I like to take fictional stories at face value, as if they were my own experience.\", 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing', 'Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', 'Sugar is the enemy. Sugar gives me life. #Productivity', 'One of my superpowers is the ability to abandon a book or movie halfway through.', 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.', 'I love how a good book becomes part of your mental life forever.', \"I hate when I write a note to myself but then later don't remember what the note refers to.\", 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", 'Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', 'Just watched Tyrannosaur. That movie breaks your heart over and over again. 
Fewer dinosaurs than I expected.', 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.']\n",
      "Loss: 0.64\n",
      "Predictions: [0 1 1 1 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 0], Labels: [0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 1 1 0 0 0], Accuracy: 0.55\n",
      "[\"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", 'I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', 'Dreamed I played basketball with a giant strawberry for a ball. And there were South American soccer players camped out outside my window.', 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.', \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', 'I keep buying pyrite but I must be doing something wrong.', \"Hate when I can't find my mouse pointer. Even worse with two screens.\", 'Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", 'Movies should bring back quicksand. We all need to be reminded of the dangers.', 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', 'There is a tension in writing. You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", 'Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', 'Went to Costco for the first time not too long ago. That place is a disaster movie.', \"Surprisingly, the search feature on Google docs doesn't work very well. 
If only they had access to search experts.\", 'My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.61\n",
      "Predictions: [1 1 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 0 1], Labels: [0 0 0 0 1 0 0 0 1 1 1 0 1 0 1 0 1 0 1 1], Accuracy: 0.6\n",
      "['I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". How cool is that?', 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', 'Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'I love the smell of coffee in the morning. It smells like victory.', 'Recently watched the movie The Bay. Anyone want to go for a swim?', 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', 'I learned a new word yesterday. aliteracy: people who can read but choose not to.', 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\", 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?', 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.', 'Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. I tell myself that when I buy printer ink.', \"Why can't we get good disaster movies?\", 'My inbox now has 0 messages. It feels good to get organized.', 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. 
#TheHobbit', 'Good comedy consists of statements that are both true and absurd.', \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.']\n",
      "Loss: 0.71\n",
      "Predictions: [0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1], Labels: [0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 1], Accuracy: 0.5\n",
      "['I coach kindergarten soccer. My standard April fools day joke is to email the parents about having two-a-day practices.', \"I hate when I have to get up early. That's enough, but my body will often get me up even earlier than that for no good reason.\", 'Talking about the 6 Star Wars movies with my kids. Terms like \"first one\" and \"last one\" result in limitless confusion.', \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', \"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", \"It's amazing how much joy in life comes from making and executing plans.\", \"Last one in the office. I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', 'Coat hangers are very poorly behaved objects.', \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", \"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", \"My son (7) wants to know why he can't have a credit card.\", \"It's weird how music makes everything more exciting like movies, tech videos, parties. 
It must tap into some deep, social part of the brain.\", \"Recently watched Anna Karenina. That movie would make no sense at all if I hadn't read the book.\", 'I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people', 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing']\n",
      "Loss: 0.93\n",
      "Predictions: [0], Labels: [1], Accuracy: 0.0\n",
      "[\"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\"]\n",
      "STARTING EPOCH:  6\n",
      "Loss: 0.56\n",
      "Predictions: [0 1 1 0 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1], Labels: [0 0 1 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0], Accuracy: 0.8\n",
      "[\"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.', 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.', \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", 'One of my superpowers is the ability to abandon a book or movie halfway through.', \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\", 'I love the smell of coffee in the morning. It smells like victory.', \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\", 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', 'My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', \"I wish my financial/retirement companies would stop sending me mail. 
With all these unnecessary words, I'm likely to miss important ones.\", 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', 'Sugar is the enemy. Sugar gives me life. #Productivity', 'Good comedy consists of statements that are both true and absurd.']\n",
      "Loss: 0.48\n",
      "Predictions: [0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 1], Labels: [0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1], Accuracy: 0.85\n",
      "[\"My son (7) wants to know why he can't have a credit card.\", 'Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', 'Want evidence of intelligent machines? Captchas are now too hard for me.', \"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", 'I love how a good book becomes part of your mental life forever.', 'Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. I tell myself that when I buy printer ink.', 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.', 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', \"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\", \"Last one in the office. I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'Movies should bring back quicksand. We all need to be reminded of the dangers.', \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', \"Hate when I can't find my mouse pointer. Even worse with two screens.\", \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\", 'I coach kindergarten soccer. My standard April fools day joke is to email the parents about having two-a-day practices.', \"It's funny. 
In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', 'Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. #TwoThumbsUp']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.42\n",
      "Predictions: [0 0 1 0 1 1 1 0 1 0 0 1 0 1 1 1 1 0 1 0], Labels: [0 0 1 0 1 1 1 0 1 0 0 1 0 0 1 0 1 0 1 0], Accuracy: 0.9\n",
      "['Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', 'Went to Costco for the first time not too long ago. That place is a disaster movie.', 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing', \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", 'I love it when movies about ghosts are \"based on a true story.\"', \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\", 'Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", \"I hate when I have to get up early. That's enough, but my body will often get me up even earlier than that for no good reason.\", 'Talking about the 6 Star Wars movies with my kids. Terms like \"first one\" and \"last one\" result in limitless confusion.', 'I learned a new word yesterday. aliteracy: people who can read but choose not to.', \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\", \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", 'I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". 
How cool is that?', 'When they show scientists in movies, they never show them making PowerPoint slides.', 'Dreamed I played basketball with a giant strawberry for a ball. And there were South American soccer players camped out outside my window.', \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'I keep buying pyrite but I must be doing something wrong.']\n",
      "Loss: 0.43\n",
      "Predictions: [1 1 0 0 1 0 0 1 1 0 0 0 1 0 1 1 1 0 1 1], Labels: [1 1 0 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0 1 1], Accuracy: 0.85\n",
      "[\"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", 'Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', \"It's amazing how much joy in life comes from making and executing plans.\", \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\", \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", 'I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', 'Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', 'Just saw The Muppets. Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.', \"I hate when I write a note to myself but then later don't remember what the note refers to.\", 'Coat hangers are very poorly behaved objects.', 'I was once at a party where someone spilled red wine, and they used white wine to clean it up. I wonder if the opposite would work.', \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?', 'Just watched Tyrannosaur. That movie breaks your heart over and over again. Fewer dinosaurs than I expected.', 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.', 'Amazing how black socks change to blue when you take them out of the dresser drawer.', \"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", 'Just saw Contagion. 
I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', \"It's weird how music makes everything more exciting like movies, tech videos, parties. It must tap into some deep, social part of the brain.\"]\n",
      "Loss: 0.45\n",
      "Predictions: [1 1 1 1 0 1 0 1 1 1 1 0 1 0 0 0 0 0 1 0], Labels: [1 1 1 1 0 1 0 0 1 1 1 0 1 0 1 1 0 1 1 1], Accuracy: 0.75\n",
      "['It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', 'Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.', \"Recently watched Anna Karenina. That movie would make no sense at all if I hadn't read the book.\", 'Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.', 'There is a tension in writing. You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', \"One bummer about getting older is that I've already seen every new movie that comes out.\", 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', 'Remember that kid\\'s movie Escape from Planet Earth? One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', 'I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people', 'My inbox now has 0 messages. It feels good to get organized.', 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing', \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", 'Recently watched the movie The Bay. Anyone want to go for a swim?', \"I've never been one to see symbolism in books or movies. 
I like to take fictional stories at face value, as if they were my own experience.\", 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.', \"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\", 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', \"Why can't we get good disaster movies?\"]\n",
      "Loss: 0.20\n",
      "Predictions: [1], Labels: [1], Accuracy: 1.0\n",
      "[\"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\"]\n",
      "STARTING EPOCH:  7\n",
      "Loss: 0.40\n",
      "Predictions: [0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0], Labels: [0 1 0 1 1 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1], Accuracy: 0.8\n",
      "['My kids saw a typewriter today and were blown away. \"What IS that thing?\" Like it was from an alien world.', 'Finally watched Dr. Strangelove. Great movie, but the last scene with the character Dr. Strangelove was ... stupid.', 'I\\'ve been in the workforce for 15 years, and I still can\\'t get in the habit of saying \"good morning\" instead of \"hey\" or \"what\\'s up?\"', 'Amazing how much of our time we spend in fantasy---movies, books, and TV. We live multiple simultaneous narratives.', 'Just watched Tyrannosaur. That movie breaks your heart over and over again. Fewer dinosaurs than I expected.', 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.', \"I've never been one to see symbolism in books or movies. I like to take fictional stories at face value, as if they were my own experience.\", \"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", 'Compared to the cold and hungry existence lived by our ancestors, our society is almost perfect. I tell myself that when I buy printer ink.', 'I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', 'I learned a new word yesterday. aliteracy: people who can read but choose not to.', \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", 'Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'Just saw Limitless. Great movie. But he said doubling your money every day was too slow. Clearly, even with NZT, he was still limited.', \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\", 'I love how a good book becomes part of your mental life forever.', 'Went to Costco for the first time not too long ago. 
That place is a disaster movie.', 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.', \"It's amazing how much joy in life comes from making and executing plans.\", 'Young people sometimes read the same books and watch the same movies multiple times. When we get older we stop doing that. Why?']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.43\n",
      "Predictions: [1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1], Labels: [1 1 1 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1], Accuracy: 0.85\n",
      "['I wonder why we raise our kids on movies with clear good guys and bad guys, when for the most part, the real world is just people', 'Just saw Black Swan. Man, that was stressful. I need to relax, maybe watch a movie or something.', \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\", 'I just wrote the word \"print\" and Google Docs flagged it as misspelled. It wanted me to write \"printf\". How cool is that?', 'One of my superpowers is the ability to abandon a book or movie halfway through.', 'Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', 'Coat hangers are very poorly behaved objects.', 'Network programming is somehow very satisfying. All of these happy little computers talking to each other.', 'My inbox now has 0 messages. It feels good to get organized.', \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\", 'Good comedy consists of statements that are both true and absurd.', \"It's weird how music makes everything more exciting like movies, tech videos, parties. It must tap into some deep, social part of the brain.\", \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\", 'Sprinkling unexpected words into your prose can keep it fresh but can also distract the reader. #Trade-offs #Writing', 'When younger, the soreness in my muscles after playing soccer was kind of cool. It meant I had worked hard. Now the soreness just hurts.', \"It's funny. 
In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", \"Mute button didn't work on the box loudly squawking ads at the gas pump. Is the world not noisy enough? Those things are evil.\", \"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. I loved it\"]\n",
      "Loss: 0.44\n",
      "Predictions: [1 1 0 1 1 1 0 0 1 1 0 1 0 0 0 0 0 0 0 1], Labels: [1 1 0 1 1 1 0 0 1 1 1 1 1 0 1 0 1 0 0 0], Accuracy: 0.75\n",
      "['Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', 'Want evidence of intelligent machines? Captchas are now too hard for me.', 'Loved that movie Argo. It felt good to feel good about America, Hollywood, and Canada. #TwoThumbsUp', 'When they show scientists in movies, they never show them making PowerPoint slides.', \"Why can't we get good disaster movies?\", \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", 'Dreamed I played basketball with a giant strawberry for a ball. And there were South American soccer players camped out outside my window.', 'Talking about the 6 Star Wars movies with my kids. Terms like \"first one\" and \"last one\" result in limitless confusion.', \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", 'I love it when movies about ghosts are \"based on a true story.\"', 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', 'I love getting reminders from Microsoft Outlook telling me that I am 23 hours late for a meeting.', 'Movies should bring back quicksand. We all need to be reminded of the dangers.', 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', 'I love the smell of coffee in the morning. It smells like victory.', \"I hate when I have to get up early. 
That's enough, but my body will often get me up even earlier than that for no good reason.\", \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\"]\n",
      "Loss: 0.33\n",
      "Predictions: [0 0 1 1 1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 1], Labels: [0 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 0 1 1 1], Accuracy: 0.95\n",
      "[\"My son (7) wants to know why he can't have a credit card.\", 'My daughter, age 3, kept asking me what time it was today. I guess she had somewhere to be.', 'Recently watched the movie The Bay. Anyone want to go for a swim?', \"Recently watched Anna Karenina. That movie would make no sense at all if I hadn't read the book.\", 'Evil Dead was last horror movie I saw. Never seen that kind of movie before. The brain is not meant to process that kind of sensory input.', 'HT @TheOnion: 5-Year-Old Critics Agree: Movie \"Cars\" Only Gets Better After 40th Viewing', 'Always surprised people keep those email ad signatures, such as \"Sent from my iPhone.\" Consider changing to \"Sent from my subconscious.\"', 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'Just saw The Muppets. Interesting movie about a small fringe group fighting to keep America dependent on foreign oil.', 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', \"Last one in the office. I'm riding a scooter down an empty hall, and I'm suddenly reminded of the Big Wheel scene from The Shining.\", 'I coach kindergarten soccer. My standard April fools day joke is to email the parents about having two-a-day practices.', \"One bummer about getting older is that I've already seen every new movie that comes out.\", \"I bet that primitive people used to just sit around and tell stories about monsters. Anyway, I'm off to the movies.\", \"After having watched many Stars Wars movies with my kids lately, I've decided that R2-D2 is the unsung hero. The AI is strong in him.\", \"According to toddlers, a cat's tail was made for pulling. The perfect Gibsonian affordance.\", \"Hate when I can't find my mouse pointer. 
Even worse with two screens.\", \"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", 'Told my wife I rented a movie about an automobile tire that comes to life and murders people. She just walked out of the room. #Marriage', 'Love it when movies show you what it would be like to be someone else. Saw Last Train Home. Great movie about migrant workers in China.']\n",
      "Loss: 0.26\n",
      "Predictions: [0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1], Labels: [0 0 1 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 1], Accuracy: 0.95\n",
      "['Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', \"I hate when I write a note to myself but then later don't remember what the note refers to.\", \"A lot of my recurring dreams are like old movies that I don't quite remember. I know how they end, but I don't know how they'll get there.\", \"Camping in the living room eating S'mores and watching a movie. It's surprising that they don't have more safety railings on the Death Star.\", 'There is a tension in writing. You want to keep going to maintain momentum, but you also need to stop periodically to gain perspective.', \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.', 'Sugar is the enemy. Sugar gives me life. #Productivity', 'Amazing how black socks change to blue when you take them out of the dresser drawer.', 'Motto for the future of information: \"If it\\'s not relevant, get it out of my way.\"', 'I was once at a party where someone spilled red wine, and they used white wine to clean it up. I wonder if the opposite would work.', 'It would be nice to be able to cite videos in some permanent place.  Things disappear on YouTube.', \"I've largely cut sugar from my diet this week. Remember that Paul Giamatti movie Cold Souls? Me too.\", 'I keep buying pyrite but I must be doing something wrong.', \"Companies should realize that surveys must be short. I'm not going to click on  25 ovals, I just want to say the hotel carpet was dirty.\", \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", \"Was just explaining @klout and how a person's influence is measured relative to the max of 100 held by Justin Bieber. This is our society.\", 'Remember that kid\\'s movie Escape from Planet Earth? 
One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', \"I wish my financial/retirement companies would stop sending me mail. With all these unnecessary words, I'm likely to miss important ones.\", 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?']\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loss: 0.17\n",
      "Predictions: [1], Labels: [1], Accuracy: 1.0\n",
      "[\"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\"]\n",
      "STARTING EPOCH:  8\n",
      "Loss: 0.28\n",
      "Predictions: [1 0 1 1 0 1 0 1 0 0 0 0 0 1 0 1 0 1 0 1], Labels: [1 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1], Accuracy: 0.9\n",
      "[\"There must be a special place in hell for the people who decided you can't skip movie previews on DVD.\", \"My stomach is uncomfortably full, but I am still distracted by hunger. What's that story about the guy pushing the rock up the hill?\", 'Wrestling with my boys. Good to have a little cave-man time before I go back to work tomorrow.', 'I love reading the reviews for kids movies. \"... the archetypes do not resonate as one would expect given the ...\" Kids want burp jokes.', \"I hate when I have to get up early. That's enough, but my body will often get me up even earlier than that for no good reason.\", \"It's amazing how much joy in life comes from making and executing plans.\", 'Fame and fortune await the first person to develop a robot that powers itself by eating fire ants.', 'Just saw Contagion. I thought it was pretty good. I like that it stayed with the big picture. Not as sappy as most disaster movies.', 'Coat hangers are very poorly behaved objects.', \"Man, how do people watch TV? You're sitting there all happy, and then suddenly someone tries to sell you something. Over, and over again.\", \"It's funny. In negotiations, the first person to specify a number loses, but the first person to establish a number as an anchor wins.\", \"Hate wearing a belt, but I need to dress fancy for work. I'm waiting for elastic belts to come back in fashion.\", \"Surprisingly, the search feature on Google docs doesn't work very well. If only they had access to search experts.\", \"I like a lot of kids movies, but I just can't get into these Ice Age films.\", \"I successfully resurrected my wife's old laptop by installing Ubuntu. My work here is done.\", \"Finally saw Sleep Dealer (Traficante de Suenos). If you like dystopian movies about migrant works and drones, this one's for you. 
I loved it\", 'Good comedy consists of statements that are both true and absurd.', 'Isn\\'t it weird that an actor is \"in\" a movie but \"on\" a TV show?', \"Hate when I can't find my mouse pointer. Even worse with two screens.\", \"For picking movies, I've learned that not only must I enjoy it for 2 hours, but I must also enjoy having it in my head for days after.\"]\n",
      "Loss: 0.22\n",
      "Predictions: [0 0 0 1 0 0 0 1 1 0 0 1 1 0 0 1 0 0 0 1], Labels: [0 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 0 0 0 1], Accuracy: 0.95\n",
      "['Should the streets be widened to help emergency vehicles? Of course. Should my yard be shortened to widen the street? Get off my property.', \"I used to go one way to work. But lately, I've been asking Google Maps, and it sends me another way. I'm starting to see its wisdom.\", 'I coach kindergarten soccer. My standard April fools day joke is to email the parents about having two-a-day practices.', 'It would be funny to make a movie where characters argue using only maxims. Early bird gets the worm! Second mouse gets the cheese!', 'Finally saw Napoleon Dynamite. Love how you can always hear the birds chirping in the background. And the steaks.', 'I\\'m used to \"twenty years ago\" meaning the 1970s, but that was twenty years ago.', 'I keep buying pyrite but I must be doing something wrong.', 'The weird thing about the Lord of the Rings movies is that they all seem to just end. Like they ran out of film or something. #TheHobbit', 'Someday, we are going to look back and be embarrassed about the number of comic book movies made during this time period.', \"The movie I watched about a tire that comes to life and starts killing people wasn't as good as I had hoped it would be.\", 'I love how a good book becomes part of your mental life forever.', 'Remember that kid\\'s movie Escape from Planet Earth? One character had the coolest coffee cup ever. It said, \"I love safety.\" I want that.', \"With a big bucket of popcorn, it doesn't matter what the movie is. You're happy.\", 'Office is out of coffee. Luckily, I have emergency backup. Kind of like the strategic petroleum reserve.', 'I feel lucky to have a job where I learn every day. The world changes so fast--you have to run to keep up. #scary', \"Some movies don't seem to have enough material to fill up the trailer. Don't know what they do for the other 1 hour and 28 minutes.\", 'Science moves forward by more accurate predictions. Math by new theorems. How does philosophy move forward?', 'Sugar is the enemy. 
Sugar gives me life. #Productivity', 'If it\\'s \"not necessary to dial a 1 when calling this number\" can\\'t the phone company just figure it out? Google has spoiled us all.', 'When they show scientists in movies, they never show them making PowerPoint slides.']\n",
      "Loss: 0.45\n",
      "Predictions: [0 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 1], Labels: [0 0 0 0 1 1 1 0 1 1 0 0 0 0 1 1 1 0 0 1], Accuracy: 0.8\n",
      "[\"It's funny how deep questions and ignorant questions look alike. E.g., what is science?\", 'It would be nice if we could abandon all of this inane talk about celebrities and go back to gossiping about people that we actually know.', \"Just got done working in the yard. We could solve the world's energy problems if we could somehow harness the power of cords to get tangled.\", \"I wish my financial/retirement companies would stop sending me mail. With all these unnecessary words, I'm likely to miss important ones.\", 'Just got a wrong-number-call asking for a girl named Shirley. Always reminds me of that old movie Ruthless People.', \"A lot of 