[
  {
    "path": ".gitignore",
    "content": "/data\n/papers\n/weights\n/summaries\n*.swp\n*.pyc\n*.zip\n*.xlsx\n*.gz\ndmn_original.py\n"
  },
  {
    "path": "LICENSE.txt",
    "content": "The MIT License (MIT)\n\nCopyright (c) 2016 Alex Barron\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Dynamic Memory Networks in TensorFlow\n\nDMN+ implementation in TensorFlow for question answering on the bAbI 10k dataset.\n\nStructure and parameters from [Dynamic Memory Networks for Visual and Textual Question Answering](https://arxiv.org/abs/1603.01417) which is henceforth referred to as Xiong et al.\n\nAdapted from Stanford's [cs224d](http://cs224d.stanford.edu/) assignment 2 starter code and using methods from [Dynamic Memory Networks in Theano](https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano) for importing the Babi-10k dataset.\n\n## Repository Contents\n| file | description |\n| --- | --- |\n| `dmn_plus.py` | contains the DMN+ model |\n| `dmn_train.py` | trains the model on a specified (-b) babi task|\n| `dmn_test.py` | tests the model on a specified (-b) babi task |\n| `babi_input.py` | prepares bAbI data for input into DMN |\n| `attention_gru_cell.py` | contains a custom Attention GRU cell implementation |\n| `fetch_babi_data.sh` | shell script to fetch bAbI tasks (from [DMNs in Theano](https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano)) |\n\n## Usage\nInstall [TensorFlow r1.4](https://www.tensorflow.org/install/)\n\nRun the included shell script to fetch the data\n\n\tbash fetch_babi_data.sh\n\nUse 'dmn_train.py' to train the DMN+ model contained in 'dmn_plus.py'\n\n\tpython dmn_train.py --babi_task_id 2\n\nOnce training is finished, test the model on a specified task\n\n\tpython dmn_test.py --babi_task_id 2\n\nThe l2 regularization constant can be set with -l2-loss (-l). All other parameters were specified by [Xiong et al](https://arxiv.org/abs/1603.01417) and can be found in the 'Config' class in 'dmn_plus.py'.\n\n## Benchmarks\nThe TensorFlow DMN+ reaches close to state of the art performance on the 10k dataset with weak supervision (no supporting facts).\n\nEach task was trained on separately with l2 = 0.001. 
As the paper suggests, 10 training runs were used for tasks 2, 3, 17 and 18 (configurable with --num-runs), where the weights which produce the lowest validation loss in any run are used for testing. \n\nThe pre-trained weights which achieve these benchmarks are available in 'pretrained'.\n\nI haven't yet had the time to fully optimize the l2 parameter which is not specified by the paper. My hypothesis is that fully optimizing l2 regularization would close the final significant performance gap between the TensorFlow DMN+ and original DMN+ on task 3. \n\nBelow are the full results for each bAbI task (tasks where both implementations achieved 0 test error are omitted):\n\n| Task ID | TensorFlow DMN+| Xiong et al DMN+ |\n| :---: | :---: | :---: |\n| 2 | 0.9 | 0.3 |\n| 3 | 18.4 | 1.1 |\n| 5 | 0.5 | 0.5 |\n| 7 | 2.8 | 2.4 |\n| 8 | 0.5 | 0.0 |\n| 9 | 0.1 | 0.0 |\n| 14 | 0.0 | 0.2 |\n| 16 | 46.2 | 45.3 |\n| 17 | 5.0 | 4.2 |\n| 18 | 2.2 | 2.1 |\n\n\n\n"
  },
  {
    "path": "attention_gru_cell.py",
    "content": "from __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport math\n\nfrom tensorflow.python.framework import ops\nfrom tensorflow.python.ops import array_ops\nfrom tensorflow.python.ops import clip_ops\nfrom tensorflow.python.ops import embedding_ops\nfrom tensorflow.python.ops import init_ops\nfrom tensorflow.python.ops import math_ops\nfrom tensorflow.python.ops import nn_ops\nfrom tensorflow.python.ops import partitioned_variables\nfrom tensorflow.python.ops import variable_scope as vs\n\nfrom tensorflow.python.ops.math_ops import sigmoid\nfrom tensorflow.python.ops.math_ops import tanh\nfrom tensorflow.python.ops.rnn_cell_impl import RNNCell\n\nfrom tensorflow.python.platform import tf_logging as logging\nfrom tensorflow.python.util import nest\n\n\nclass AttentionGRUCell(RNNCell):\n    \"\"\"Gated Recurrent Unit incoporating attention (cf. https://arxiv.org/abs/1603.01417).\n       Adapted from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py\n\n       NOTE: Takes an input of shape:  (batch_size, max_time_step, input_dim + 1)\n       Where an input vector of shape: (batch_size, max_time_step, input_dim)\n       and scalar attention of shape:  (batch_size, max_time_step, 1)\n       are concatenated along the final axis\"\"\"\n\n    def __init__(self, num_units, input_size=None, activation=tanh):\n        if input_size is not None:\n            logging.warn(\"%s: The input_size parameter is deprecated.\", self)\n        self._num_units = num_units\n        self._activation = activation\n\n    @property\n    def state_size(self):\n        return self._num_units\n\n\n    @property\n    def output_size(self):\n        return self._num_units\n\n    def __call__(self, inputs, state, scope=None):\n        \"\"\"Attention GRU with nunits cells.\"\"\"\n        with vs.variable_scope(scope or 
\"attention_gru_cell\"):\n            with vs.variable_scope(\"gates\"):  # Reset gate and update gate.\n                # We start with bias of 1.0 to not reset and not update.\n                if inputs.get_shape()[-1] != self._num_units + 1:\n                    raise ValueError(\"Input should be passed as word input concatenated with 1D attention on end axis\")\n                # extract input vector and attention\n                inputs, g = array_ops.split(inputs,\n                        num_or_size_splits=[self._num_units,1],\n                        axis=1)\n                r = _linear([inputs, state], self._num_units, True)\n                r = sigmoid(r)\n            with vs.variable_scope(\"candidate\"):\n                r = r*_linear(state, self._num_units, False)\n            with vs.variable_scope(\"input\"):\n                x = _linear(inputs, self._num_units, True)\n            h_hat = self._activation(r + x)\n\n            new_h = (1 - g) * state + g * h_hat\n        return new_h, new_h\n\ndef _linear(args, output_size, bias, bias_start=0.0):\n    \"\"\"Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.\n    Args:\n    args: a 2D Tensor or a list of 2D, batch x n, Tensors.\n    output_size: int, second dimension of W[i].\n    bias: boolean, whether to add a bias term or not.\n    bias_start: starting value to initialize the bias; 0 by default.\n    Returns:\n    A 2D Tensor with shape [batch x output_size] equal to\n    sum_i(args[i] * W[i]), where W[i]s are newly created matrices.\n    Raises:\n    ValueError: if some of the arguments has unspecified or wrong shape.\n    \"\"\"\n    if args is None or (nest.is_sequence(args) and not args):\n        raise ValueError(\"`args` must be specified\")\n    if not nest.is_sequence(args):\n        args = [args]\n\n    # Calculate the total size of arguments on dimension 1.\n    total_arg_size = 0\n    shapes = [a.get_shape() for a in args]\n    for shape in shapes:\n        if shape.ndims != 
2:\n            raise ValueError(\"linear is expecting 2D arguments: %s\" % shapes)\n        if shape[1].value is None:\n            raise ValueError(\"linear expects shape[1] to be provided for shape %s, \"\n                \"but saw %s\" % (shape, shape[1]))\n        else:\n            total_arg_size += shape[1].value\n\n    dtype = [a.dtype for a in args][0]\n\n    # Now the computation.\n    scope = vs.get_variable_scope()\n    with vs.variable_scope(scope) as outer_scope:\n        weights = vs.get_variable(\n            \"weights\", [total_arg_size, output_size], dtype=dtype)\n        if len(args) == 1:\n            res = math_ops.matmul(args[0], weights)\n        else:\n            res = math_ops.matmul(array_ops.concat(args, 1), weights)\n        if not bias:\n            return res\n        with vs.variable_scope(outer_scope) as inner_scope:\n            inner_scope.set_partitioner(None)\n            biases = vs.get_variable(\n                        \"biases\", [output_size],\n                      dtype=dtype,\n                    initializer=init_ops.constant_initializer(bias_start, dtype=dtype))\n        return nn_ops.bias_add(res, biases)\n"
  },
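The cell above implements the attention-based GRU from Xiong et al., in which the per-fact scalar attention `g` replaces the standard GRU update gate. Below is a standalone NumPy sketch of one step; it is illustrative only (toy sizes, random stand-ins for the learned `_linear` weights; `attention_gru_step` and the `W_*`/`U_*` names are not part of this repo):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy hidden size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# random stand-ins for the learned _linear weight matrices and biases
W_r, U_r, b_r = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.ones(d)
W_h, U_h, b_h = rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d)

def attention_gru_step(x, h, g):
    """One step of the attention GRU: the scalar attention g plays
    the role of the standard GRU update gate."""
    r = sigmoid(x @ W_r + h @ U_r + b_r)            # reset gate
    h_hat = np.tanh(r * (h @ U_h) + x @ W_h + b_h)  # candidate state
    return (1 - g) * h + g * h_hat                  # g gates how far h moves

x, h = rng.normal(size=d), rng.normal(size=d)
# a fact with zero attention leaves the episode state untouched
assert np.allclose(attention_gru_step(x, h, 0.0), h)
```

This is why the episode summary ends up dominated by the facts the attention mechanism scores highly: unattended facts pass through the recurrence unchanged.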
  {
    "path": "babi_input.py",
    "content": "from __future__ import division\nfrom __future__ import print_function\n\nimport sys\n\nimport os as os\nimport numpy as np\n\n# can be sentence or word\ninput_mask_mode = \"sentence\"\n\n# adapted from https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano/\ndef init_babi(fname):\n    \n    print(\"==> Loading test from %s\" % fname)\n    tasks = []\n    task = None\n    for i, line in enumerate(open(fname)):\n        id = int(line[0:line.find(' ')])\n        if id == 1:\n            task = {\"C\": \"\", \"Q\": \"\", \"A\": \"\", \"S\": \"\"} \n            counter = 0\n            id_map = {}\n            \n        line = line.strip()\n        line = line.replace('.', ' . ')\n        line = line[line.find(' ')+1:]\n        # if not a question\n        if line.find('?') == -1:\n            task[\"C\"] += line\n            id_map[id] = counter\n            counter += 1\n            \n        else:\n            idx = line.find('?')\n            tmp = line[idx+1:].split('\\t')\n            task[\"Q\"] = line[:idx]\n            task[\"A\"] = tmp[1].strip()\n            task[\"S\"] = []\n            for num in tmp[2].split():\n                task[\"S\"].append(id_map[int(num.strip())])\n            tasks.append(task.copy())\n\n    return tasks\n\n\ndef get_babi_raw(id, test_id):\n    babi_map = {\n        \"1\": \"qa1_single-supporting-fact\",\n        \"2\": \"qa2_two-supporting-facts\",\n        \"3\": \"qa3_three-supporting-facts\",\n        \"4\": \"qa4_two-arg-relations\",\n        \"5\": \"qa5_three-arg-relations\",\n        \"6\": \"qa6_yes-no-questions\",\n        \"7\": \"qa7_counting\",\n        \"8\": \"qa8_lists-sets\",\n        \"9\": \"qa9_simple-negation\",\n        \"10\": \"qa10_indefinite-knowledge\",\n        \"11\": \"qa11_basic-coreference\",\n        \"12\": \"qa12_conjunction\",\n        \"13\": \"qa13_compound-coreference\",\n        \"14\": \"qa14_time-reasoning\",\n        \"15\": \"qa15_basic-deduction\",\n        
\"16\": \"qa16_basic-induction\",\n        \"17\": \"qa17_positional-reasoning\",\n        \"18\": \"qa18_size-reasoning\",\n        \"19\": \"qa19_path-finding\",\n        \"20\": \"qa20_agents-motivations\",\n        \"MCTest\": \"MCTest\",\n        \"19changed\": \"19changed\",\n        \"joint\": \"all_shuffled\", \n        \"sh1\": \"../shuffled/qa1_single-supporting-fact\",\n        \"sh2\": \"../shuffled/qa2_two-supporting-facts\",\n        \"sh3\": \"../shuffled/qa3_three-supporting-facts\",\n        \"sh4\": \"../shuffled/qa4_two-arg-relations\",\n        \"sh5\": \"../shuffled/qa5_three-arg-relations\",\n        \"sh6\": \"../shuffled/qa6_yes-no-questions\",\n        \"sh7\": \"../shuffled/qa7_counting\",\n        \"sh8\": \"../shuffled/qa8_lists-sets\",\n        \"sh9\": \"../shuffled/qa9_simple-negation\",\n        \"sh10\": \"../shuffled/qa10_indefinite-knowledge\",\n        \"sh11\": \"../shuffled/qa11_basic-coreference\",\n        \"sh12\": \"../shuffled/qa12_conjunction\",\n        \"sh13\": \"../shuffled/qa13_compound-coreference\",\n        \"sh14\": \"../shuffled/qa14_time-reasoning\",\n        \"sh15\": \"../shuffled/qa15_basic-deduction\",\n        \"sh16\": \"../shuffled/qa16_basic-induction\",\n        \"sh17\": \"../shuffled/qa17_positional-reasoning\",\n        \"sh18\": \"../shuffled/qa18_size-reasoning\",\n        \"sh19\": \"../shuffled/qa19_path-finding\",\n        \"sh20\": \"../shuffled/qa20_agents-motivations\",\n    }\n    if (test_id == \"\"):\n        test_id = id \n    babi_name = babi_map[id]\n    babi_test_name = babi_map[test_id]\n    babi_train_raw = init_babi(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'data/en-10k/%s_train.txt' % babi_name))\n    babi_test_raw = init_babi(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'data/en-10k/%s_test.txt' % babi_test_name))\n    return babi_train_raw, babi_test_raw\n\n            \ndef load_glove(dim):\n    word2vec = {}\n    \n    print(\"==> loading 
glove\")\n    with open((\"./data/glove/glove.6B/glove.6B.\" + str(dim) + \"d.txt\")) as f:\n        for line in f:    \n            l = line.split()\n            word2vec[l[0]] = map(float, l[1:])\n            \n    print(\"==> glove is loaded\")\n    \n    return word2vec\n\n\ndef create_vector(word, word2vec, word_vector_size, silent=True):\n    # if the word is missing from Glove, create some fake vector and store in glove!\n    vector = np.random.uniform(0.0,1.0,(word_vector_size,))\n    word2vec[word] = vector\n    if (not silent):\n        print(\"utils.py::create_vector => %s is missing\" % word)\n    return vector\n\ndef process_word(word, word2vec, vocab, ivocab, word_vector_size, to_return=\"word2vec\", silent=True):\n    if not word in word2vec:\n        create_vector(word, word2vec, word_vector_size, silent)\n    if not word in vocab: \n        next_index = len(vocab)\n        vocab[word] = next_index\n        ivocab[next_index] = word\n    \n    if to_return == \"word2vec\":\n        return word2vec[word]\n    elif to_return == \"index\":\n        return vocab[word]\n    elif to_return == \"onehot\":\n        raise Exception(\"to_return = 'onehot' is not implemented yet\")\n\ndef process_input(data_raw, floatX, word2vec, vocab, ivocab, embed_size, split_sentences=False):\n    questions = []\n    inputs = []\n    answers = []\n    input_masks = []\n    for x in data_raw:\n        if split_sentences:\n            inp = x[\"C\"].lower().split(' . 
') \n            inp = [w for w in inp if len(w) > 0]\n            inp = [i.split() for i in inp]\n        else:\n            inp = x[\"C\"].lower().split(' ') \n            inp = [w for w in inp if len(w) > 0]\n\n        q = x[\"Q\"].lower().split(' ')\n        q = [w for w in q if len(w) > 0]\n\n        if split_sentences: \n            inp_vector = [[process_word(word = w, \n                                        word2vec = word2vec, \n                                        vocab = vocab, \n                                        ivocab = ivocab, \n                                        word_vector_size = embed_size, \n                                        to_return = \"index\") for w in s] for s in inp]\n        else:\n            inp_vector = [process_word(word = w, \n                                        word2vec = word2vec, \n                                        vocab = vocab, \n                                        ivocab = ivocab, \n                                        word_vector_size = embed_size, \n                                        to_return = \"index\") for w in inp]\n                                    \n        q_vector = [process_word(word = w, \n                                    word2vec = word2vec, \n                                    vocab = vocab, \n                                    ivocab = ivocab, \n                                    word_vector_size = embed_size, \n                                    to_return = \"index\") for w in q]\n        \n        if split_sentences:\n            inputs.append(inp_vector)\n        else:\n            inputs.append(np.vstack(inp_vector).astype(floatX))\n        questions.append(np.vstack(q_vector).astype(floatX))\n        answers.append(process_word(word = x[\"A\"], \n                                        word2vec = word2vec, \n                                        vocab = vocab, \n                                        ivocab = ivocab, \n                                    
    word_vector_size = embed_size, \n                                        to_return = \"index\"))\n        # NOTE: here we assume the answer is one word! \n\n        if not split_sentences:\n            if input_mask_mode == 'word':\n                input_masks.append(np.array([index for index, w in enumerate(inp)], dtype=np.int32)) \n            elif input_mask_mode == 'sentence': \n                input_masks.append(np.array([index for index, w in enumerate(inp) if w == '.'], dtype=np.int32)) \n            else:\n                raise Exception(\"invalid input_mask_mode\")\n\n    return inputs, questions, answers, input_masks\n\ndef get_lens(inputs, split_sentences=False):\n    lens = np.zeros((len(inputs)), dtype=int)\n    for i, t in enumerate(inputs):\n        lens[i] = t.shape[0]\n    return lens\n\ndef get_sentence_lens(inputs):\n    lens = np.zeros((len(inputs)), dtype=int)\n    sen_lens = []\n    max_sen_lens = []\n    for i, t in enumerate(inputs):\n        sentence_lens = np.zeros((len(t)), dtype=int)\n        for j, s in enumerate(t):\n            sentence_lens[j] = len(s)\n        lens[i] = len(t)\n        sen_lens.append(sentence_lens)\n        max_sen_lens.append(np.max(sentence_lens))\n    return lens, sen_lens, max(max_sen_lens)\n    \n\ndef pad_inputs(inputs, lens, max_len, mode=\"\", sen_lens=None, max_sen_len=None):\n    if mode == \"mask\":\n        padded = [np.pad(inp, (0, max_len - lens[i]), 'constant', constant_values=0) for i, inp in enumerate(inputs)]\n        return np.vstack(padded)\n\n    elif mode == \"split_sentences\":\n        padded = np.zeros((len(inputs), max_len, max_sen_len))\n        for i, inp in enumerate(inputs):\n            padded_sentences = [np.pad(s, (0, max_sen_len - sen_lens[i][j]), 'constant', constant_values=0) for j, s in enumerate(inp)]\n            # trim array according to max allowed inputs\n            if len(padded_sentences) > max_len:\n                padded_sentences = 
padded_sentences[(len(padded_sentences)-max_len):]\n                lens[i] = max_len\n            padded_sentences = np.vstack(padded_sentences)\n            padded_sentences = np.pad(padded_sentences, ((0, max_len - lens[i]),(0,0)), 'constant', constant_values=0)\n            padded[i] = padded_sentences\n        return padded\n\n    padded = [np.pad(np.squeeze(inp, axis=1), (0, max_len - lens[i]), 'constant', constant_values=0) for i, inp in enumerate(inputs)]\n    return np.vstack(padded)\n\ndef create_embedding(word2vec, ivocab, embed_size):\n    embedding = np.zeros((len(ivocab), embed_size))\n    for i in range(len(ivocab)):\n        word = ivocab[i]\n        embedding[i] = word2vec[word]\n    return embedding\n\ndef load_babi(config, split_sentences=False):\n    vocab = {}\n    ivocab = {}\n\n    babi_train_raw, babi_test_raw = get_babi_raw(config.babi_id, config.babi_test_id)\n\n    if config.word2vec_init:\n        assert config.embed_size == 100\n        word2vec = load_glove(config.embed_size)\n    else:\n        word2vec = {}\n\n    # set word at index zero to be end of sentence token so padding with zeros is consistent\n    process_word(word = \"<eos>\", \n                word2vec = word2vec, \n                vocab = vocab, \n                ivocab = ivocab, \n                word_vector_size = config.embed_size, \n                to_return = \"index\")\n\n    print('==> get train inputs')\n    train_data = process_input(babi_train_raw, config.floatX, word2vec, vocab, ivocab, config.embed_size, split_sentences)\n    print('==> get test inputs')\n    test_data = process_input(babi_test_raw, config.floatX, word2vec, vocab, ivocab, config.embed_size, split_sentences)\n\n    if config.word2vec_init:\n        assert config.embed_size == 100\n        word_embedding = create_embedding(word2vec, ivocab, config.embed_size)\n    else:\n        word_embedding = np.random.uniform(-config.embedding_init, config.embedding_init, (len(ivocab), 
config.embed_size))\n\n    inputs, questions, answers, input_masks = train_data if config.train_mode else test_data\n\n    if split_sentences:\n        input_lens, sen_lens, max_sen_len = get_sentence_lens(inputs)\n        max_mask_len = max_sen_len\n    else:\n        input_lens = get_lens(inputs)\n        mask_lens = get_lens(input_masks)\n        max_mask_len = np.max(mask_lens)\n\n    q_lens = get_lens(questions)\n\n    max_q_len = np.max(q_lens)\n    max_input_len = min(np.max(input_lens), config.max_allowed_inputs)\n\n    #pad out arrays to max\n    if split_sentences:\n        inputs = pad_inputs(inputs, input_lens, max_input_len, \"split_sentences\", sen_lens, max_sen_len)\n        input_masks = np.zeros(len(inputs))\n    else:\n        inputs = pad_inputs(inputs, input_lens, max_input_len)\n        input_masks = pad_inputs(input_masks, mask_lens, max_mask_len, \"mask\")\n\n    questions = pad_inputs(questions, q_lens, max_q_len)\n\n    answers = np.stack(answers)\n\n    if config.train_mode:\n        train = questions[:config.num_train], inputs[:config.num_train], q_lens[:config.num_train], input_lens[:config.num_train], input_masks[:config.num_train], answers[:config.num_train]\n\n        valid = questions[config.num_train:], inputs[config.num_train:], q_lens[config.num_train:], input_lens[config.num_train:], input_masks[config.num_train:], answers[config.num_train:]\n        return train, valid, word_embedding, max_q_len, max_input_len, max_mask_len, len(vocab)\n\n    else:\n        test = questions, inputs, q_lens, input_lens, input_masks, answers\n        return test, word_embedding, max_q_len, max_input_len, max_mask_len, len(vocab)\n"
  },
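To make the file format that `init_babi` expects concrete, here is a standalone sketch of its parsing loop run on three hand-written lines in the bAbI layout (statement lines numbered from 1, question lines carrying a tab-separated answer and supporting-fact ids). The toy story is made up for illustration; the loop mirrors the logic in `init_babi` above:

```python
# Toy lines in the bAbI file layout:
# "<id> <statement>."  or  "<id> <question>?\t<answer>\t<supporting ids>"
lines = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
]

tasks, task, id_map, counter = [], None, {}, 0
for line in lines:
    sid = int(line[:line.find(' ')])
    if sid == 1:                        # a new story restarts numbering at 1
        task = {"C": "", "Q": "", "A": "", "S": ""}
        counter, id_map = 0, {}
    line = line.strip().replace('.', ' . ')
    line = line[line.find(' ') + 1:]    # drop the leading statement id
    if '?' not in line:                 # context statement
        task["C"] += line
        id_map[sid] = counter           # map file ids to 0-based fact indices
        counter += 1
    else:                               # question: record answer + support
        idx = line.find('?')
        tmp = line[idx + 1:].split('\t')
        task["Q"] = line[:idx]
        task["A"] = tmp[1].strip()
        task["S"] = [id_map[int(n)] for n in tmp[2].split()]
        tasks.append(task.copy())

assert tasks[0]["A"] == "bathroom"
assert tasks[0]["S"] == [0]             # the supporting fact is statement 1
```

Note that `"S"` holds 0-based indices into the story's fact list (via `id_map`), not the raw statement ids from the file; the DMN+ here trains weakly supervised, so these indices are kept only for possible strong supervision.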
  {
    "path": "dmn_plus.py",
    "content": "from __future__ import print_function\nfrom __future__ import division\n\nimport sys\nimport time\n\nimport numpy as np\nfrom copy import deepcopy\n\nimport tensorflow as tf\nfrom attention_gru_cell import AttentionGRUCell\n\nfrom tensorflow.contrib.cudnn_rnn.python.ops import cudnn_rnn_ops\n\nimport babi_input\n\nclass Config(object):\n    \"\"\"Holds model hyperparams and data information.\"\"\"\n\n    batch_size = 100\n    embed_size = 80\n    hidden_size = 80\n\n    max_epochs = 256\n    early_stopping = 20\n\n    dropout = 0.9\n    lr = 0.001\n    l2 = 0.001\n\n    cap_grads = False\n    max_grad_val = 10\n    noisy_grads = False\n\n    word2vec_init = False\n    embedding_init = np.sqrt(3)\n\n    # NOTE not currently used hence non-sensical anneal_threshold\n    anneal_threshold = 1000\n    anneal_by = 1.5\n\n    num_hops = 3\n    num_attention_features = 4\n\n    max_allowed_inputs = 130\n    num_train = 9000\n\n    floatX = np.float32\n\n    babi_id = \"1\"\n    babi_test_id = \"\"\n\n    train_mode = True\n\ndef _add_gradient_noise(t, stddev=1e-3, name=None):\n    \"\"\"Adds gradient noise as described in http://arxiv.org/abs/1511.06807\n    The input Tensor `t` should be a gradient.\n    The output will be `t` + gaussian noise.\n    0.001 was said to be a good fixed value for memory networks.\"\"\"\n    with tf.variable_scope('gradient_noise'):\n        gn = tf.random_normal(tf.shape(t), stddev=stddev)\n        return tf.add(t, gn)\n\n# from https://github.com/domluna/memn2n\ndef _position_encoding(sentence_size, embedding_size):\n    \"\"\"We could have used RNN for parsing sentence but that tends to overfit.\n    The simpler choice would be to take sum of embedding but we loose loose positional information.\n    Position encoding is described in section 4.1 in \"End to End Memory Networks\" in more detail (http://arxiv.org/pdf/1503.08895v5.pdf)\"\"\"\n    encoding = np.ones((embedding_size, sentence_size), dtype=np.float32)\n    ls = 
sentence_size+1\n    le = embedding_size+1\n    for i in range(1, le):\n        for j in range(1, ls):\n            encoding[i-1, j-1] = (i - (le-1)/2) * (j - (ls-1)/2)\n    encoding = 1 + 4 * encoding / embedding_size / sentence_size\n    return np.transpose(encoding)\n\nclass DMN_PLUS(object):\n\n    def load_data(self, debug=False):\n        \"\"\"Loads train/valid/test data and sentence encoding\"\"\"\n        if self.config.train_mode:\n            self.train, self.valid, self.word_embedding, self.max_q_len, self.max_sentences, self.max_sen_len, self.vocab_size = babi_input.load_babi(self.config, split_sentences=True)\n        else:\n            self.test, self.word_embedding, self.max_q_len, self.max_sentences, self.max_sen_len, self.vocab_size = babi_input.load_babi(self.config, split_sentences=True)\n        self.encoding = _position_encoding(self.max_sen_len, self.config.embed_size)\n\n    def add_placeholders(self):\n        \"\"\"add data placeholder to graph\"\"\"\n        self.question_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size, self.max_q_len))\n        self.input_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size, self.max_sentences, self.max_sen_len))\n\n        self.question_len_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size,))\n        self.input_len_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size,))\n\n        self.answer_placeholder = tf.placeholder(tf.int64, shape=(self.config.batch_size,))\n\n        self.dropout_placeholder = tf.placeholder(tf.float32)\n\n    def get_predictions(self, output):\n        preds = tf.nn.softmax(output)\n        pred = tf.argmax(preds, 1)\n        return pred\n\n    def add_loss_op(self, output):\n        \"\"\"Calculate loss\"\"\"\n        loss = tf.reduce_sum(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output, labels=self.answer_placeholder))\n\n        # add l2 regularization for all variables except biases\n        
for v in tf.trainable_variables():\n            if not 'bias' in v.name.lower():\n                loss += self.config.l2*tf.nn.l2_loss(v)\n\n        tf.summary.scalar('loss', loss)\n\n        return loss\n        \n    def add_training_op(self, loss):\n        \"\"\"Calculate and apply gradients\"\"\"\n        opt = tf.train.AdamOptimizer(learning_rate=self.config.lr)\n        gvs = opt.compute_gradients(loss)\n\n        # optionally cap and noise gradients to regularize\n        if self.config.cap_grads:\n            gvs = [(tf.clip_by_norm(grad, self.config.max_grad_val), var) for grad, var in gvs]\n        if self.config.noisy_grads:\n            gvs = [(_add_gradient_noise(grad), var) for grad, var in gvs]\n\n        train_op = opt.apply_gradients(gvs)\n        return train_op\n  \n\n    def get_question_representation(self):\n        \"\"\"Get question vectors via embedding and GRU\"\"\"\n        questions = tf.nn.embedding_lookup(self.embeddings, self.question_placeholder)\n\n        gru_cell = tf.contrib.rnn.GRUCell(self.config.hidden_size)\n        _, q_vec = tf.nn.dynamic_rnn(gru_cell,\n                questions,\n                dtype=np.float32,\n                sequence_length=self.question_len_placeholder\n        )\n\n        return q_vec\n\n    def get_input_representation(self):\n        \"\"\"Get fact (sentence) vectors via embedding, positional encoding and bi-directional GRU\"\"\"\n        # get word vectors from embedding\n        inputs = tf.nn.embedding_lookup(self.embeddings, self.input_placeholder)\n\n        # use encoding to get sentence representation\n        inputs = tf.reduce_sum(inputs * self.encoding, 2)\n\n        forward_gru_cell = tf.contrib.rnn.GRUCell(self.config.hidden_size)\n        backward_gru_cell = tf.contrib.rnn.GRUCell(self.config.hidden_size)\n        outputs, _ = tf.nn.bidirectional_dynamic_rnn(\n                forward_gru_cell,\n                backward_gru_cell,\n                inputs,\n                
dtype=np.float32,\n                sequence_length=self.input_len_placeholder\n        )\n\n        # sum forward and backward output vectors\n        fact_vecs = tf.reduce_sum(tf.stack(outputs), axis=0)\n\n        # apply dropout\n        fact_vecs = tf.nn.dropout(fact_vecs, self.dropout_placeholder)\n\n        return fact_vecs\n\n    def get_attention(self, q_vec, prev_memory, fact_vec, reuse):\n        \"\"\"Use question vector and previous memory to create scalar attention for current fact\"\"\"\n        with tf.variable_scope(\"attention\", reuse=reuse):\n\n            features = [fact_vec*q_vec,\n                        fact_vec*prev_memory,\n                        tf.abs(fact_vec - q_vec),\n                        tf.abs(fact_vec - prev_memory)]\n\n            feature_vec = tf.concat(features, 1)\n\n            attention = tf.contrib.layers.fully_connected(feature_vec,\n                            self.config.embed_size,\n                            activation_fn=tf.nn.tanh,\n                            reuse=reuse, scope=\"fc1\")\n\n            attention = tf.contrib.layers.fully_connected(attention,\n                            1,\n                            activation_fn=None,\n                            reuse=reuse, scope=\"fc2\")\n\n        return attention\n\n    def generate_episode(self, memory, q_vec, fact_vecs, hop_index):\n        \"\"\"Generate episode by applying attention to current fact vectors through a modified GRU\"\"\"\n\n        attentions = [tf.squeeze(\n            self.get_attention(q_vec, memory, fv, bool(hop_index) or bool(i)), axis=1)\n            for i, fv in enumerate(tf.unstack(fact_vecs, axis=1))]\n\n        attentions = tf.transpose(tf.stack(attentions))\n        self.attentions.append(attentions)\n        attentions = tf.nn.softmax(attentions)\n        attentions = tf.expand_dims(attentions, axis=-1)\n\n        reuse = True if hop_index > 0 else False\n\n        # concatenate fact vectors and attentions for input into 
attGRU\n        gru_inputs = tf.concat([fact_vecs, attentions], 2)\n\n        with tf.variable_scope('attention_gru', reuse=reuse):\n            _, episode = tf.nn.dynamic_rnn(AttentionGRUCell(self.config.hidden_size),\n                    gru_inputs,\n                    dtype=np.float32,\n                    sequence_length=self.input_len_placeholder\n            )\n\n        return episode\n\n    def add_answer_module(self, rnn_output, q_vec):\n        \"\"\"Linear softmax answer module\"\"\"\n\n        rnn_output = tf.nn.dropout(rnn_output, self.dropout_placeholder)\n\n        output = tf.layers.dense(tf.concat([rnn_output, q_vec], 1),\n                self.vocab_size,\n                activation=None)\n\n        return output\n\n    def inference(self):\n        \"\"\"Performs inference on the DMN model\"\"\"\n\n        # input fusion module\n        with tf.variable_scope(\"question\", initializer=tf.contrib.layers.xavier_initializer()):\n            print('==> get question representation')\n            q_vec = self.get_question_representation()\n\n\n        with tf.variable_scope(\"input\", initializer=tf.contrib.layers.xavier_initializer()):\n            print('==> get input representation')\n            fact_vecs = self.get_input_representation()\n\n        # keep track of attentions for possible strong supervision\n        self.attentions = []\n\n        # memory module\n        with tf.variable_scope(\"memory\", initializer=tf.contrib.layers.xavier_initializer()):\n            print('==> build episodic memory')\n\n            # generate n_hops episodes\n            prev_memory = q_vec\n\n            for i in range(self.config.num_hops):\n                # get a new episode\n                print('==> generating episode', i)\n                episode = self.generate_episode(prev_memory, q_vec, fact_vecs, i)\n\n                # untied weights for memory update\n                with tf.variable_scope(\"hop_%d\" % i):\n                    prev_memory = 
tf.layers.dense(tf.concat([prev_memory, episode, q_vec], 1),\n                            self.config.hidden_size,\n                            activation=tf.nn.relu)\n\n            output = prev_memory\n\n        # pass memory module output through linear answer module\n        with tf.variable_scope(\"answer\", initializer=tf.contrib.layers.xavier_initializer()):\n            output = self.add_answer_module(output, q_vec)\n\n        return output\n\n\n    def run_epoch(self, session, data, num_epoch=0, train_writer=None, train_op=None, verbose=2, train=False):\n        config = self.config\n        dp = config.dropout\n        if train_op is None:\n            # evaluation mode: no train op, so disable dropout\n            dp = 1\n        total_steps = len(data[0]) // config.batch_size\n        total_loss = []\n        accuracy = 0\n\n        # shuffle data\n        p = np.random.permutation(len(data[0]))\n        qp, ip, ql, il, im, a = data\n        qp, ip, ql, il, im, a = qp[p], ip[p], ql[p], il[p], im[p], a[p]\n\n        for step in range(total_steps):\n            index = range(step*config.batch_size, (step+1)*config.batch_size)\n            feed = {self.question_placeholder: qp[index],\n                  self.input_placeholder: ip[index],\n                  self.question_len_placeholder: ql[index],\n                  self.input_len_placeholder: il[index],\n                  self.answer_placeholder: a[index],\n                  self.dropout_placeholder: dp}\n\n            if train_op is None:\n                loss, pred, summary = session.run(\n                    [self.calculate_loss, self.pred, self.merged], feed_dict=feed)\n            else:\n                loss, pred, summary, _ = session.run(\n                    [self.calculate_loss, self.pred, self.merged, train_op], feed_dict=feed)\n\n            if train_writer is not None:\n                train_writer.add_summary(summary, num_epoch*total_steps + step)\n\n            answers = 
a[step*config.batch_size:(step+1)*config.batch_size]\n            accuracy += np.sum(pred == answers)/float(len(answers))\n\n\n            total_loss.append(loss)\n            if verbose and step % verbose == 0:\n                sys.stdout.write('\\r{} / {} : loss = {}'.format(\n                  step, total_steps, np.mean(total_loss)))\n                sys.stdout.flush()\n\n\n        if verbose:\n            sys.stdout.write('\\r')\n\n        return np.mean(total_loss), accuracy/float(total_steps)\n\n\n    def __init__(self, config):\n        self.config = config\n        self.variables_to_save = {}\n        self.load_data(debug=False)\n        self.add_placeholders()\n\n        # set up embedding\n        self.embeddings = tf.Variable(self.word_embedding.astype(np.float32), name=\"Embedding\")\n\n        self.output = self.inference()\n        self.pred = self.get_predictions(self.output)\n        self.calculate_loss = self.add_loss_op(self.output)\n        self.train_step = self.add_training_op(self.calculate_loss)\n        self.merged = tf.summary.merge_all()\n\n"
  },
  {
    "path": "dmn_test.py",
    "content": "from __future__ import print_function\nfrom __future__ import division\n\nimport tensorflow as tf\n\nimport argparse\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\"-b\", \"--babi_task_id\", help=\"specify babi task 1-20 (default=1)\")\nparser.add_argument(\"-t\", \"--dmn_type\", help=\"specify type of dmn (default=plus)\")\nargs = parser.parse_args()\n\ndmn_type = args.dmn_type if args.dmn_type is not None else \"plus\"\n\nif dmn_type == \"original\":\n    from dmn_original import Config\n    config = Config()\nelif dmn_type == \"plus\":\n    from dmn_plus import Config\n    config = Config()\nelse:\n    raise NotImplementedError(dmn_type + ' DMN type is not currently implemented')\n\nif args.babi_task_id is not None:\n    config.babi_id = args.babi_task_id\n\nconfig.strong_supervision = False\nconfig.train_mode = False\n\nprint('Testing DMN ' + dmn_type + ' on babi task', config.babi_id)\n\n# create model\nwith tf.variable_scope('DMN') as scope:\n    if dmn_type == \"original\":\n        from dmn_original import DMN\n        model = DMN(config)\n    elif dmn_type == \"plus\":\n        from dmn_plus import DMN_PLUS\n        model = DMN_PLUS(config)\n\nprint('==> initializing variables')\ninit = tf.global_variables_initializer()\nsaver = tf.train.Saver()\n\nwith tf.Session() as session:\n    session.run(init)\n\n    print('==> restoring weights')\n    saver.restore(session, 'weights/task' + str(model.config.babi_id) + '.weights')\n\n    print('==> running DMN')\n    test_loss, test_accuracy = model.run_epoch(session, model.test)\n\n    print('')\n    print('Test accuracy:', test_accuracy)\n"
  },
  {
    "path": "dmn_train.py",
    "content": "from __future__ import print_function\nfrom __future__ import division\n\nimport tensorflow as tf\n\nimport time\nimport argparse\nimport os\n\n\nparser = argparse.ArgumentParser()\nparser.add_argument(\"-b\", \"--babi_task_id\", help=\"specify babi task 1-20 (default=1)\")\nparser.add_argument(\"-r\", \"--restore\", help=\"restore previously trained weights (default=false)\")\nparser.add_argument(\"-s\", \"--strong_supervision\", help=\"use labelled supporting facts (default=false)\")\nparser.add_argument(\"-t\", \"--dmn_type\", help=\"specify type of dmn (default=plus)\")\nparser.add_argument(\"-l\", \"--l2_loss\", type=float, default=0.001, help=\"specify l2 loss constant\")\nparser.add_argument(\"-n\", \"--num_runs\", type=int, help=\"specify the number of model runs\")\n\nargs = parser.parse_args()\n\ndmn_type = args.dmn_type if args.dmn_type is not None else \"plus\"\n\nif dmn_type == \"plus\":\n    from dmn_plus import Config\n\n    config = Config()\nelse:\n    raise NotImplementedError(dmn_type + ' DMN type is not currently implemented')\n\nconfig.babi_id = args.babi_task_id if args.babi_task_id is not None else str(1)\nconfig.l2 = args.l2_loss\nconfig.strong_supervision = args.strong_supervision if args.strong_supervision is not None else False\nnum_runs = args.num_runs if args.num_runs is not None else 1\n\nprint('Training DMN ' + dmn_type + ' on babi task', config.babi_id)\n\nbest_overall_val_loss = float('inf')\n\n# create model\nwith tf.variable_scope('DMN') as scope:\n    if dmn_type == \"plus\":\n        from dmn_plus import DMN_PLUS\n\n        model = DMN_PLUS(config)\n\nfor run in range(num_runs):\n\n    print('Starting run', run)\n\n    print('==> initializing variables')\n    init = tf.global_variables_initializer()\n    saver = tf.train.Saver()\n\n    with tf.Session() as session:\n\n        sum_dir = 
'summaries/train/' + time.strftime(\"%Y-%m-%d %H %M\")\n        if not os.path.exists(sum_dir):\n            os.makedirs(sum_dir)\n        train_writer = tf.summary.FileWriter(sum_dir, session.graph)\n\n        session.run(init)\n\n        best_val_epoch = 0\n        prev_epoch_loss = float('inf')\n        best_val_loss = float('inf')\n        best_val_accuracy = 0.0\n\n        if args.restore:\n            print('==> restoring weights')\n            saver.restore(session, 'weights/task' + str(model.config.babi_id) + '.weights')\n\n        print('==> starting training')\n        for epoch in range(config.max_epochs):\n            print('Epoch {}'.format(epoch))\n            start = time.time()\n\n            train_loss, train_accuracy = model.run_epoch(\n                session, model.train, epoch, train_writer,\n                train_op=model.train_step, train=True)\n            valid_loss, valid_accuracy = model.run_epoch(session, model.valid)\n            print('Training loss: {}'.format(train_loss))\n            print('Validation loss: {}'.format(valid_loss))\n            print('Training accuracy: {}'.format(train_accuracy))\n            print('Validation accuracy: {}'.format(valid_accuracy))\n\n            if valid_loss < best_val_loss:\n                best_val_loss = valid_loss\n                best_val_epoch = epoch\n                if best_val_loss < best_overall_val_loss:\n                    print('Saving weights')\n                    best_overall_val_loss = best_val_loss\n                    best_val_accuracy = valid_accuracy\n                    saver.save(session, 'weights/task' + str(model.config.babi_id) + '.weights')\n\n            # anneal\n            if train_loss > prev_epoch_loss * model.config.anneal_threshold:\n                model.config.lr /= model.config.anneal_by\n                print('annealed lr to %f' % model.config.lr)\n\n            prev_epoch_loss = train_loss\n\n            if epoch - best_val_epoch > config.early_stopping:\n                break\n            print('Total time: {}'.format(time.time() - start))\n\n        print('Best validation accuracy:', best_val_accuracy)\n"
  },
  {
    "path": "fetch_babi_data.sh",
    "content": "#!/bin/bash\nset -e\n\nurl=http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz\nfname=$(basename \"$url\")\n\ncurl -SLO \"$url\"\ntar zxvf \"$fname\"\nmkdir -p data\nmv tasks_1-20_v1-2/* data/\nrm -r tasks_1-20_v1-2\nrm \"$fname\"\n\nmkdir -p weights\n"
  }
]