Repository: barronalex/Dynamic-Memory-Networks-in-TensorFlow
Branch: master
Commit: 6b35d5b397f7
Files: 9
Total size: 38.8 KB
Directory structure:
gitextract_74facvp9/
├── .gitignore
├── LICENSE.txt
├── README.md
├── attention_gru_cell.py
├── babi_input.py
├── dmn_plus.py
├── dmn_test.py
├── dmn_train.py
└── fetch_babi_data.sh
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
/data
/papers
/weights
/summaries
*.swp
*.pyc
*.zip
*.xlsx
*.gz
dmn_original.py
================================================
FILE: LICENSE.txt
================================================
The MIT License (MIT)
Copyright (c) 2016 Alex Barron
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# Dynamic Memory Networks in TensorFlow
DMN+ implementation in TensorFlow for question answering on the bAbI 10k dataset.
Structure and parameters from [Dynamic Memory Networks for Visual and Textual Question Answering](https://arxiv.org/abs/1603.01417) which is henceforth referred to as Xiong et al.
Adapted from Stanford's [cs224d](http://cs224d.stanford.edu/) assignment 2 starter code and using methods from [Dynamic Memory Networks in Theano](https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano) for importing the bAbI 10k dataset.
## Repository Contents
| file | description |
| --- | --- |
| `dmn_plus.py` | contains the DMN+ model |
| `dmn_train.py` | trains the model on a specified (-b) babi task|
| `dmn_test.py` | tests the model on a specified (-b) babi task |
| `babi_input.py` | prepares bAbI data for input into DMN |
| `attention_gru_cell.py` | contains a custom Attention GRU cell implementation |
| `fetch_babi_data.sh` | shell script to fetch bAbI tasks (from [DMNs in Theano](https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano)) |
## Usage
Install [TensorFlow r1.4](https://www.tensorflow.org/install/), then run the included shell script to fetch the data:

    bash fetch_babi_data.sh

Use `dmn_train.py` to train the DMN+ model contained in `dmn_plus.py`:

    python dmn_train.py --babi_task_id 2

Once training is finished, test the model on the specified task:

    python dmn_test.py --babi_task_id 2

The L2 regularization constant can be set with `--l2_loss` (`-l`). All other parameters were specified by [Xiong et al](https://arxiv.org/abs/1603.01417) and can be found in the `Config` class in `dmn_plus.py`.
## Benchmarks
The TensorFlow DMN+ reaches close to state of the art performance on the 10k dataset with weak supervision (no supporting facts).
Each task was trained separately with l2 = 0.001. As the paper suggests, 10 training runs were used for tasks 2, 3, 17 and 18 (configurable with `--num_runs`), and the weights from the run with the lowest validation loss were used for testing.
The pre-trained weights which achieve these benchmarks are available in `pretrained`.
I haven't yet had time to fully optimize the l2 parameter, which is not specified by the paper. My hypothesis is that fully optimizing l2 regularization would close the remaining significant performance gap between the TensorFlow DMN+ and the original DMN+ on task 3.
Below are the full results for each bAbI task (tasks where both implementations achieved 0 test error are omitted):
| Task ID | TensorFlow DMN+| Xiong et al DMN+ |
| :---: | :---: | :---: |
| 2 | 0.9 | 0.3 |
| 3 | 18.4 | 1.1 |
| 5 | 0.5 | 0.5 |
| 7 | 2.8 | 2.4 |
| 8 | 0.5 | 0.0 |
| 9 | 0.1 | 0.0 |
| 14 | 0.0 | 0.2 |
| 16 | 46.2 | 45.3 |
| 17 | 5.0 | 4.2 |
| 18 | 2.2 | 2.1 |
================================================
FILE: attention_gru_cell.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import math
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import clip_ops
from tensorflow.python.ops import embedding_ops
from tensorflow.python.ops import init_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import partitioned_variables
from tensorflow.python.ops import variable_scope as vs
from tensorflow.python.ops.math_ops import sigmoid
from tensorflow.python.ops.math_ops import tanh
from tensorflow.python.ops.rnn_cell_impl import RNNCell
from tensorflow.python.platform import tf_logging as logging
from tensorflow.python.util import nest
class AttentionGRUCell(RNNCell):
"""Gated Recurrent Unit incorporating attention (cf. https://arxiv.org/abs/1603.01417).
Adapted from https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py
NOTE: Takes an input of shape: (batch_size, max_time_step, input_dim + 1)
Where an input vector of shape: (batch_size, max_time_step, input_dim)
and scalar attention of shape: (batch_size, max_time_step, 1)
are concatenated along the final axis"""
def __init__(self, num_units, input_size=None, activation=tanh):
if input_size is not None:
logging.warn("%s: The input_size parameter is deprecated.", self)
self._num_units = num_units
self._activation = activation
@property
def state_size(self):
return self._num_units
@property
def output_size(self):
return self._num_units
def __call__(self, inputs, state, scope=None):
"""Attention GRU with nunits cells."""
with vs.variable_scope(scope or "attention_gru_cell"):
with vs.variable_scope("gates"): # Reset gate only; the update gate is replaced by the attention scalar g.
if inputs.get_shape()[-1] != self._num_units + 1:
raise ValueError("Input should be passed as word input concatenated with 1D attention on end axis")
# extract input vector and attention
inputs, g = array_ops.split(inputs,
num_or_size_splits=[self._num_units,1],
axis=1)
r = _linear([inputs, state], self._num_units, True)
r = sigmoid(r)
with vs.variable_scope("candidate"):
r = r*_linear(state, self._num_units, False)
with vs.variable_scope("input"):
x = _linear(inputs, self._num_units, True)
h_hat = self._activation(r + x)
new_h = (1 - g) * state + g * h_hat
return new_h, new_h
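The step above swaps the GRU's learned update gate for the attention scalar `g` that arrives concatenated to the input. A standalone NumPy sketch of a single step (the `W_*` matrices are illustrative random stand-ins for the `_linear` variables, not the cell's actual weights):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_units = 2, 4

# per-timestep input: word vector concatenated with its scalar attention
word = rng.standard_normal((batch, num_units))
g_att = rng.uniform(size=(batch, 1))
inputs = np.concatenate([word, g_att], axis=1)   # (batch, num_units + 1)

state = np.zeros((batch, num_units))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative random weights standing in for the _linear variables
W_r = rng.standard_normal((2 * num_units, num_units)) * 0.1
W_c = rng.standard_normal((num_units, num_units)) * 0.1
W_x = rng.standard_normal((num_units, num_units)) * 0.1

x, g = inputs[:, :num_units], inputs[:, num_units:]    # split off attention
r = sigmoid(np.concatenate([x, state], axis=1) @ W_r)  # reset gate
h_hat = np.tanh(r * (state @ W_c) + x @ W_x)           # candidate state
new_h = (1 - g) * state + g * h_hat                    # attention replaces the update gate
print(new_h.shape)  # (2, 4)
```

With a zero initial state this reduces to `new_h = g * tanh(x @ W_x)`, which makes the role of `g` as a soft gate on each fact explicit.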
def _linear(args, output_size, bias, bias_start=0.0):
"""Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.
Args:
args: a 2D Tensor or a list of 2D, batch x n, Tensors.
output_size: int, second dimension of W[i].
bias: boolean, whether to add a bias term or not.
bias_start: starting value to initialize the bias; 0 by default.
Returns:
A 2D Tensor with shape [batch x output_size] equal to
sum_i(args[i] * W[i]), where W[i]s are newly created matrices.
Raises:
ValueError: if some of the arguments has unspecified or wrong shape.
"""
if args is None or (nest.is_sequence(args) and not args):
raise ValueError("`args` must be specified")
if not nest.is_sequence(args):
args = [args]
# Calculate the total size of arguments on dimension 1.
total_arg_size = 0
shapes = [a.get_shape() for a in args]
for shape in shapes:
if shape.ndims != 2:
raise ValueError("linear is expecting 2D arguments: %s" % shapes)
if shape[1].value is None:
raise ValueError("linear expects shape[1] to be provided for shape %s, "
"but saw %s" % (shape, shape[1]))
else:
total_arg_size += shape[1].value
dtype = [a.dtype for a in args][0]
# Now the computation.
scope = vs.get_variable_scope()
with vs.variable_scope(scope) as outer_scope:
weights = vs.get_variable(
"weights", [total_arg_size, output_size], dtype=dtype)
if len(args) == 1:
res = math_ops.matmul(args[0], weights)
else:
res = math_ops.matmul(array_ops.concat(args, 1), weights)
if not bias:
return res
with vs.variable_scope(outer_scope) as inner_scope:
inner_scope.set_partitioner(None)
biases = vs.get_variable(
"biases", [output_size],
dtype=dtype,
initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
return nn_ops.bias_add(res, biases)
================================================
FILE: babi_input.py
================================================
from __future__ import division
from __future__ import print_function
import sys
import os as os
import numpy as np
# can be sentence or word
input_mask_mode = "sentence"
# adapted from https://github.com/YerevaNN/Dynamic-memory-networks-in-Theano/
def init_babi(fname):
print("==> Loading data from %s" % fname)
tasks = []
task = None
for i, line in enumerate(open(fname)):
id = int(line[0:line.find(' ')])
if id == 1:
task = {"C": "", "Q": "", "A": "", "S": ""}
counter = 0
id_map = {}
line = line.strip()
line = line.replace('.', ' . ')
line = line[line.find(' ')+1:]
# if not a question
if line.find('?') == -1:
task["C"] += line
id_map[id] = counter
counter += 1
else:
idx = line.find('?')
tmp = line[idx+1:].split('\t')
task["Q"] = line[:idx]
task["A"] = tmp[1].strip()
task["S"] = []
for num in tmp[2].split():
task["S"].append(id_map[int(num.strip())])
tasks.append(task.copy())
return tasks
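For reference, the bAbI text format that `init_babi` consumes numbers every line; a question line contains a `?` followed by tab-separated answer and supporting-fact line numbers. A minimal self-contained walk-through of the same parsing logic (the three-line story is an invented example in the standard format):

```python
# Three lines in the standard bAbI format (invented example):
sample = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary?\tbathroom\t1",
]

task = {"C": "", "Q": "", "A": "", "S": []}
id_map = {}
counter = 0
for raw in sample:
    sid = int(raw[:raw.find(' ')])          # leading line number
    line = raw.strip().replace('.', ' . ')  # isolate '.' as its own token
    line = line[line.find(' ') + 1:]        # drop the line number
    if '?' not in line:                     # story line
        task["C"] += line
        id_map[sid] = counter
        counter += 1
    else:                                   # question: Q \t answer \t supporting ids
        idx = line.find('?')
        tmp = line[idx + 1:].split('\t')
        task["Q"] = line[:idx]
        task["A"] = tmp[1].strip()
        task["S"] = [id_map[int(n)] for n in tmp[2].split()]

print(task["Q"], '->', task["A"], task["S"])  # Where is Mary -> bathroom [0]
```

Note that the supporting-fact ids are remapped through `id_map` from file line numbers to zero-based positions within the story.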
def get_babi_raw(id, test_id):
babi_map = {
"1": "qa1_single-supporting-fact",
"2": "qa2_two-supporting-facts",
"3": "qa3_three-supporting-facts",
"4": "qa4_two-arg-relations",
"5": "qa5_three-arg-relations",
"6": "qa6_yes-no-questions",
"7": "qa7_counting",
"8": "qa8_lists-sets",
"9": "qa9_simple-negation",
"10": "qa10_indefinite-knowledge",
"11": "qa11_basic-coreference",
"12": "qa12_conjunction",
"13": "qa13_compound-coreference",
"14": "qa14_time-reasoning",
"15": "qa15_basic-deduction",
"16": "qa16_basic-induction",
"17": "qa17_positional-reasoning",
"18": "qa18_size-reasoning",
"19": "qa19_path-finding",
"20": "qa20_agents-motivations",
"MCTest": "MCTest",
"19changed": "19changed",
"joint": "all_shuffled",
"sh1": "../shuffled/qa1_single-supporting-fact",
"sh2": "../shuffled/qa2_two-supporting-facts",
"sh3": "../shuffled/qa3_three-supporting-facts",
"sh4": "../shuffled/qa4_two-arg-relations",
"sh5": "../shuffled/qa5_three-arg-relations",
"sh6": "../shuffled/qa6_yes-no-questions",
"sh7": "../shuffled/qa7_counting",
"sh8": "../shuffled/qa8_lists-sets",
"sh9": "../shuffled/qa9_simple-negation",
"sh10": "../shuffled/qa10_indefinite-knowledge",
"sh11": "../shuffled/qa11_basic-coreference",
"sh12": "../shuffled/qa12_conjunction",
"sh13": "../shuffled/qa13_compound-coreference",
"sh14": "../shuffled/qa14_time-reasoning",
"sh15": "../shuffled/qa15_basic-deduction",
"sh16": "../shuffled/qa16_basic-induction",
"sh17": "../shuffled/qa17_positional-reasoning",
"sh18": "../shuffled/qa18_size-reasoning",
"sh19": "../shuffled/qa19_path-finding",
"sh20": "../shuffled/qa20_agents-motivations",
}
if (test_id == ""):
test_id = id
babi_name = babi_map[id]
babi_test_name = babi_map[test_id]
babi_train_raw = init_babi(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'data/en-10k/%s_train.txt' % babi_name))
babi_test_raw = init_babi(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'data/en-10k/%s_test.txt' % babi_test_name))
return babi_train_raw, babi_test_raw
def load_glove(dim):
word2vec = {}
print("==> loading glove")
with open(("./data/glove/glove.6B/glove.6B." + str(dim) + "d.txt")) as f:
for line in f:
l = line.split()
word2vec[l[0]] = list(map(float, l[1:]))  # list() so the vectors are usable under Python 3
print("==> glove is loaded")
return word2vec
def create_vector(word, word2vec, word_vector_size, silent=True):
# if the word is missing from GloVe, create a random vector and store it in word2vec
vector = np.random.uniform(0.0,1.0,(word_vector_size,))
word2vec[word] = vector
if (not silent):
print("utils.py::create_vector => %s is missing" % word)
return vector
def process_word(word, word2vec, vocab, ivocab, word_vector_size, to_return="word2vec", silent=True):
if not word in word2vec:
create_vector(word, word2vec, word_vector_size, silent)
if not word in vocab:
next_index = len(vocab)
vocab[word] = next_index
ivocab[next_index] = word
if to_return == "word2vec":
return word2vec[word]
elif to_return == "index":
return vocab[word]
elif to_return == "onehot":
raise Exception("to_return = 'onehot' is not implemented yet")
def process_input(data_raw, floatX, word2vec, vocab, ivocab, embed_size, split_sentences=False):
questions = []
inputs = []
answers = []
input_masks = []
for x in data_raw:
if split_sentences:
inp = x["C"].lower().split(' . ')
inp = [w for w in inp if len(w) > 0]
inp = [i.split() for i in inp]
else:
inp = x["C"].lower().split(' ')
inp = [w for w in inp if len(w) > 0]
q = x["Q"].lower().split(' ')
q = [w for w in q if len(w) > 0]
if split_sentences:
inp_vector = [[process_word(word = w,
word2vec = word2vec,
vocab = vocab,
ivocab = ivocab,
word_vector_size = embed_size,
to_return = "index") for w in s] for s in inp]
else:
inp_vector = [process_word(word = w,
word2vec = word2vec,
vocab = vocab,
ivocab = ivocab,
word_vector_size = embed_size,
to_return = "index") for w in inp]
q_vector = [process_word(word = w,
word2vec = word2vec,
vocab = vocab,
ivocab = ivocab,
word_vector_size = embed_size,
to_return = "index") for w in q]
if split_sentences:
inputs.append(inp_vector)
else:
inputs.append(np.vstack(inp_vector).astype(floatX))
questions.append(np.vstack(q_vector).astype(floatX))
answers.append(process_word(word = x["A"],
word2vec = word2vec,
vocab = vocab,
ivocab = ivocab,
word_vector_size = embed_size,
to_return = "index"))
# NOTE: here we assume the answer is one word!
if not split_sentences:
if input_mask_mode == 'word':
input_masks.append(np.array([index for index, w in enumerate(inp)], dtype=np.int32))
elif input_mask_mode == 'sentence':
input_masks.append(np.array([index for index, w in enumerate(inp) if w == '.'], dtype=np.int32))
else:
raise Exception("invalid input_mask_mode")
return inputs, questions, answers, input_masks
def get_lens(inputs, split_sentences=False):
lens = np.zeros((len(inputs)), dtype=int)
for i, t in enumerate(inputs):
lens[i] = t.shape[0]
return lens
def get_sentence_lens(inputs):
lens = np.zeros((len(inputs)), dtype=int)
sen_lens = []
max_sen_lens = []
for i, t in enumerate(inputs):
sentence_lens = np.zeros((len(t)), dtype=int)
for j, s in enumerate(t):
sentence_lens[j] = len(s)
lens[i] = len(t)
sen_lens.append(sentence_lens)
max_sen_lens.append(np.max(sentence_lens))
return lens, sen_lens, max(max_sen_lens)
def pad_inputs(inputs, lens, max_len, mode="", sen_lens=None, max_sen_len=None):
if mode == "mask":
padded = [np.pad(inp, (0, max_len - lens[i]), 'constant', constant_values=0) for i, inp in enumerate(inputs)]
return np.vstack(padded)
elif mode == "split_sentences":
padded = np.zeros((len(inputs), max_len, max_sen_len))
for i, inp in enumerate(inputs):
padded_sentences = [np.pad(s, (0, max_sen_len - sen_lens[i][j]), 'constant', constant_values=0) for j, s in enumerate(inp)]
# trim array according to max allowed inputs
if len(padded_sentences) > max_len:
padded_sentences = padded_sentences[(len(padded_sentences)-max_len):]
lens[i] = max_len
padded_sentences = np.vstack(padded_sentences)
padded_sentences = np.pad(padded_sentences, ((0, max_len - lens[i]),(0,0)), 'constant', constant_values=0)
padded[i] = padded_sentences
return padded
padded = [np.pad(np.squeeze(inp, axis=1), (0, max_len - lens[i]), 'constant', constant_values=0) for i, inp in enumerate(inputs)]
return np.vstack(padded)
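As an illustration, the `mode="mask"` branch above right-pads variable-length index arrays with the zero (`<eos>`) index so they stack into one matrix; a standalone NumPy toy example:

```python
import numpy as np

# toy variable-length index sequences (0 is the <eos>/padding index)
inputs = [np.array([3, 7]), np.array([5, 1, 9])]
lens = np.array([len(x) for x in inputs])
max_len = lens.max()

# right-pad each sequence to max_len, then stack into a (num_seqs, max_len) matrix
padded = np.vstack([
    np.pad(inp, (0, max_len - lens[i]), 'constant', constant_values=0)
    for i, inp in enumerate(inputs)
])
print(padded.tolist())  # [[3, 7, 0], [5, 1, 9]]
```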
def create_embedding(word2vec, ivocab, embed_size):
embedding = np.zeros((len(ivocab), embed_size))
for i in range(len(ivocab)):
word = ivocab[i]
embedding[i] = word2vec[word]
return embedding
def load_babi(config, split_sentences=False):
vocab = {}
ivocab = {}
babi_train_raw, babi_test_raw = get_babi_raw(config.babi_id, config.babi_test_id)
if config.word2vec_init:
assert config.embed_size == 100
word2vec = load_glove(config.embed_size)
else:
word2vec = {}
# set word at index zero to be end of sentence token so padding with zeros is consistent
process_word(word = "<eos>",
word2vec = word2vec,
vocab = vocab,
ivocab = ivocab,
word_vector_size = config.embed_size,
to_return = "index")
print('==> get train inputs')
train_data = process_input(babi_train_raw, config.floatX, word2vec, vocab, ivocab, config.embed_size, split_sentences)
print('==> get test inputs')
test_data = process_input(babi_test_raw, config.floatX, word2vec, vocab, ivocab, config.embed_size, split_sentences)
if config.word2vec_init:
assert config.embed_size == 100
word_embedding = create_embedding(word2vec, ivocab, config.embed_size)
else:
word_embedding = np.random.uniform(-config.embedding_init, config.embedding_init, (len(ivocab), config.embed_size))
inputs, questions, answers, input_masks = train_data if config.train_mode else test_data
if split_sentences:
input_lens, sen_lens, max_sen_len = get_sentence_lens(inputs)
max_mask_len = max_sen_len
else:
input_lens = get_lens(inputs)
mask_lens = get_lens(input_masks)
max_mask_len = np.max(mask_lens)
q_lens = get_lens(questions)
max_q_len = np.max(q_lens)
max_input_len = min(np.max(input_lens), config.max_allowed_inputs)
#pad out arrays to max
if split_sentences:
inputs = pad_inputs(inputs, input_lens, max_input_len, "split_sentences", sen_lens, max_sen_len)
input_masks = np.zeros(len(inputs))
else:
inputs = pad_inputs(inputs, input_lens, max_input_len)
input_masks = pad_inputs(input_masks, mask_lens, max_mask_len, "mask")
questions = pad_inputs(questions, q_lens, max_q_len)
answers = np.stack(answers)
if config.train_mode:
train = questions[:config.num_train], inputs[:config.num_train], q_lens[:config.num_train], input_lens[:config.num_train], input_masks[:config.num_train], answers[:config.num_train]
valid = questions[config.num_train:], inputs[config.num_train:], q_lens[config.num_train:], input_lens[config.num_train:], input_masks[config.num_train:], answers[config.num_train:]
return train, valid, word_embedding, max_q_len, max_input_len, max_mask_len, len(vocab)
else:
test = questions, inputs, q_lens, input_lens, input_masks, answers
return test, word_embedding, max_q_len, max_input_len, max_mask_len, len(vocab)
================================================
FILE: dmn_plus.py
================================================
from __future__ import print_function
from __future__ import division
import sys
import time
import numpy as np
from copy import deepcopy
import tensorflow as tf
from attention_gru_cell import AttentionGRUCell
from tensorflow.contrib.cudnn_rnn.python.ops import cudnn_rnn_ops
import babi_input
class Config(object):
"""Holds model hyperparams and data information."""
batch_size = 100
embed_size = 80
hidden_size = 80
max_epochs = 256
early_stopping = 20
dropout = 0.9
lr = 0.001
l2 = 0.001
cap_grads = False
max_grad_val = 10
noisy_grads = False
word2vec_init = False
embedding_init = np.sqrt(3)
# NOTE: not currently used, hence the nonsensical anneal_threshold
anneal_threshold = 1000
anneal_by = 1.5
num_hops = 3
num_attention_features = 4
max_allowed_inputs = 130
num_train = 9000
floatX = np.float32
babi_id = "1"
babi_test_id = ""
train_mode = True
def _add_gradient_noise(t, stddev=1e-3, name=None):
"""Adds gradient noise as described in http://arxiv.org/abs/1511.06807
The input Tensor `t` should be a gradient.
The output will be `t` + gaussian noise.
0.001 was said to be a good fixed value for memory networks."""
with tf.variable_scope('gradient_noise'):
gn = tf.random_normal(tf.shape(t), stddev=stddev)
return tf.add(t, gn)
# from https://github.com/domluna/memn2n
def _position_encoding(sentence_size, embedding_size):
"""We could have used an RNN to parse the sentence, but that tends to overfit.
The simpler choice of summing the embeddings loses positional information.
Position encoding is described in section 4.1 in "End to End Memory Networks" in more detail (http://arxiv.org/pdf/1503.08895v5.pdf)"""
encoding = np.ones((embedding_size, sentence_size), dtype=np.float32)
ls = sentence_size+1
le = embedding_size+1
for i in range(1, le):
for j in range(1, ls):
encoding[i-1, j-1] = (i - (le-1)/2) * (j - (ls-1)/2)
encoding = 1 + 4 * encoding / embedding_size / sentence_size
return np.transpose(encoding)
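The point of this encoding is that an element-wise weighted sum of word vectors stays order-sensitive, unlike a plain sum. A standalone NumPy check (the encoding logic is copied from `_position_encoding` above; the word vectors are random toy data):

```python
import numpy as np

def position_encoding(sentence_size, embedding_size):
    # same formula as _position_encoding above
    encoding = np.ones((embedding_size, sentence_size), dtype=np.float32)
    for i in range(1, embedding_size + 1):
        for j in range(1, sentence_size + 1):
            encoding[i - 1, j - 1] = (i - embedding_size / 2) * (j - sentence_size / 2)
    encoding = 1 + 4 * encoding / embedding_size / sentence_size
    return np.transpose(encoding)  # (sentence_size, embedding_size)

rng = np.random.default_rng(0)
words = rng.standard_normal((3, 4))          # 3 word vectors, dim 4
enc = position_encoding(3, 4)

sent = (words * enc).sum(axis=0)             # order-sensitive sentence vector
sent_rev = (words[::-1] * enc).sum(axis=0)   # same words, reversed order
print(np.allclose(sent, sent_rev))           # False: a plain sum would be order-invariant
```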
class DMN_PLUS(object):
def load_data(self, debug=False):
"""Loads train/valid/test data and sentence encoding"""
if self.config.train_mode:
self.train, self.valid, self.word_embedding, self.max_q_len, self.max_sentences, self.max_sen_len, self.vocab_size = babi_input.load_babi(self.config, split_sentences=True)
else:
self.test, self.word_embedding, self.max_q_len, self.max_sentences, self.max_sen_len, self.vocab_size = babi_input.load_babi(self.config, split_sentences=True)
self.encoding = _position_encoding(self.max_sen_len, self.config.embed_size)
def add_placeholders(self):
"""add data placeholder to graph"""
self.question_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size, self.max_q_len))
self.input_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size, self.max_sentences, self.max_sen_len))
self.question_len_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size,))
self.input_len_placeholder = tf.placeholder(tf.int32, shape=(self.config.batch_size,))
self.answer_placeholder = tf.placeholder(tf.int64, shape=(self.config.batch_size,))
self.dropout_placeholder = tf.placeholder(tf.float32)
def get_predictions(self, output):
preds = tf.nn.softmax(output)
pred = tf.argmax(preds, 1)
return pred
def add_loss_op(self, output):
"""Calculate loss"""
loss = tf.reduce_sum(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output, labels=self.answer_placeholder))
# add l2 regularization for all variables except biases
for v in tf.trainable_variables():
if not 'bias' in v.name.lower():
loss += self.config.l2*tf.nn.l2_loss(v)
tf.summary.scalar('loss', loss)
return loss
def add_training_op(self, loss):
"""Calculate and apply gradients"""
opt = tf.train.AdamOptimizer(learning_rate=self.config.lr)
gvs = opt.compute_gradients(loss)
# optionally cap and noise gradients to regularize
if self.config.cap_grads:
gvs = [(tf.clip_by_norm(grad, self.config.max_grad_val), var) for grad, var in gvs]
if self.config.noisy_grads:
gvs = [(_add_gradient_noise(grad), var) for grad, var in gvs]
train_op = opt.apply_gradients(gvs)
return train_op
def get_question_representation(self):
"""Get question vectors via embedding and GRU"""
questions = tf.nn.embedding_lookup(self.embeddings, self.question_placeholder)
gru_cell = tf.contrib.rnn.GRUCell(self.config.hidden_size)
_, q_vec = tf.nn.dynamic_rnn(gru_cell,
questions,
dtype=np.float32,
sequence_length=self.question_len_placeholder
)
return q_vec
def get_input_representation(self):
"""Get fact (sentence) vectors via embedding, positional encoding and bi-directional GRU"""
# get word vectors from embedding
inputs = tf.nn.embedding_lookup(self.embeddings, self.input_placeholder)
# use encoding to get sentence representation
inputs = tf.reduce_sum(inputs * self.encoding, 2)
forward_gru_cell = tf.contrib.rnn.GRUCell(self.config.hidden_size)
backward_gru_cell = tf.contrib.rnn.GRUCell(self.config.hidden_size)
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
forward_gru_cell,
backward_gru_cell,
inputs,
dtype=np.float32,
sequence_length=self.input_len_placeholder
)
# sum forward and backward output vectors
fact_vecs = tf.reduce_sum(tf.stack(outputs), axis=0)
# apply dropout
fact_vecs = tf.nn.dropout(fact_vecs, self.dropout_placeholder)
return fact_vecs
def get_attention(self, q_vec, prev_memory, fact_vec, reuse):
"""Use question vector and previous memory to create scalar attention for current fact"""
with tf.variable_scope("attention", reuse=reuse):
features = [fact_vec*q_vec,
fact_vec*prev_memory,
tf.abs(fact_vec - q_vec),
tf.abs(fact_vec - prev_memory)]
feature_vec = tf.concat(features, 1)
attention = tf.contrib.layers.fully_connected(feature_vec,
self.config.embed_size,
activation_fn=tf.nn.tanh,
reuse=reuse, scope="fc1")
attention = tf.contrib.layers.fully_connected(attention,
1,
activation_fn=None,
reuse=reuse, scope="fc2")
return attention
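The feature construction above can be reproduced on toy tensors; here the two fully connected layers are stood in for by illustrative random matrices `W1` and `W2` (the real weights are learned):

```python
import numpy as np

rng = np.random.default_rng(1)
batch, hidden = 2, 4

f = rng.standard_normal((batch, hidden))   # fact vector
q = rng.standard_normal((batch, hidden))   # question vector
m = rng.standard_normal((batch, hidden))   # previous memory

# the four interaction features, concatenated along the feature axis
features = np.concatenate([f * q, f * m, np.abs(f - q), np.abs(f - m)], axis=1)
print(features.shape)  # (2, 16): num_attention_features * hidden_size

# two-layer scoring head (illustrative random weights)
W1 = rng.standard_normal((4 * hidden, hidden)) * 0.1
W2 = rng.standard_normal((hidden, 1)) * 0.1
score = np.tanh(features @ W1) @ W2        # one scalar attention per fact
print(score.shape)  # (2, 1)
```

The scalar scores across all facts are later softmaxed in `generate_episode` before being fed to the Attention GRU.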
def generate_episode(self, memory, q_vec, fact_vecs, hop_index):
"""Generate episode by applying attention to current fact vectors through a modified GRU"""
attentions = [tf.squeeze(
self.get_attention(q_vec, memory, fv, bool(hop_index) or bool(i)), axis=1)
for i, fv in enumerate(tf.unstack(fact_vecs, axis=1))]
attentions = tf.transpose(tf.stack(attentions))
self.attentions.append(attentions)
attentions = tf.nn.softmax(attentions)
attentions = tf.expand_dims(attentions, axis=-1)
reuse = True if hop_index > 0 else False
# concatenate fact vectors and attentions for input into attGRU
gru_inputs = tf.concat([fact_vecs, attentions], 2)
with tf.variable_scope('attention_gru', reuse=reuse):
_, episode = tf.nn.dynamic_rnn(AttentionGRUCell(self.config.hidden_size),
gru_inputs,
dtype=np.float32,
sequence_length=self.input_len_placeholder
)
return episode
def add_answer_module(self, rnn_output, q_vec):
"""Linear softmax answer module"""
rnn_output = tf.nn.dropout(rnn_output, self.dropout_placeholder)
output = tf.layers.dense(tf.concat([rnn_output, q_vec], 1),
self.vocab_size,
activation=None)
return output
def inference(self):
"""Performs inference on the DMN model"""
# input fusion module
with tf.variable_scope("question", initializer=tf.contrib.layers.xavier_initializer()):
print('==> get question representation')
q_vec = self.get_question_representation()
with tf.variable_scope("input", initializer=tf.contrib.layers.xavier_initializer()):
print('==> get input representation')
fact_vecs = self.get_input_representation()
# keep track of attentions for possible strong supervision
self.attentions = []
# memory module
with tf.variable_scope("memory", initializer=tf.contrib.layers.xavier_initializer()):
print('==> build episodic memory')
# generate n_hops episodes
prev_memory = q_vec
for i in range(self.config.num_hops):
# get a new episode
print('==> generating episode', i)
episode = self.generate_episode(prev_memory, q_vec, fact_vecs, i)
# untied weights for memory update
with tf.variable_scope("hop_%d" % i):
prev_memory = tf.layers.dense(tf.concat([prev_memory, episode, q_vec], 1),
self.config.hidden_size,
activation=tf.nn.relu)
output = prev_memory
# pass memory module output through linear answer module
with tf.variable_scope("answer", initializer=tf.contrib.layers.xavier_initializer()):
output = self.add_answer_module(output, q_vec)
return output
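Each hop's untied memory update above is a single ReLU layer over the concatenated [previous memory; episode; question]. A NumPy sketch with illustrative random weights (a fixed stand-in `episode` replaces the per-hop `generate_episode` output):

```python
import numpy as np

rng = np.random.default_rng(2)
batch, hidden, num_hops = 2, 4, 3

q_vec = rng.standard_normal((batch, hidden))
episode = rng.standard_normal((batch, hidden))  # stand-in for generate_episode output

prev_memory = q_vec                             # memory is initialised to the question
for hop in range(num_hops):
    W = rng.standard_normal((3 * hidden, hidden)) * 0.1  # untied weights per hop
    concat = np.concatenate([prev_memory, episode, q_vec], axis=1)
    prev_memory = np.maximum(concat @ W, 0.0)   # ReLU(W [m; e; q])
print(prev_memory.shape)  # (2, 4)
```

Drawing a fresh `W` each hop mirrors the untied `hop_%d` variable scopes in `inference`.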
def run_epoch(self, session, data, num_epoch=0, train_writer=None, train_op=None, verbose=2, train=False):
config = self.config
dp = config.dropout
if train_op is None:
dp = 1
total_steps = len(data[0]) // config.batch_size
total_loss = []
accuracy = 0
# shuffle data
p = np.random.permutation(len(data[0]))
qp, ip, ql, il, im, a = data
qp, ip, ql, il, im, a = qp[p], ip[p], ql[p], il[p], im[p], a[p]
for step in range(total_steps):
index = range(step*config.batch_size,(step+1)*config.batch_size)
feed = {self.question_placeholder: qp[index],
self.input_placeholder: ip[index],
self.question_len_placeholder: ql[index],
self.input_len_placeholder: il[index],
self.answer_placeholder: a[index],
self.dropout_placeholder: dp}
if train_op is None:
loss, pred, summary, = session.run(
[self.calculate_loss, self.pred, self.merged], feed_dict=feed)
else:
loss, pred, summary, _ = session.run(
[self.calculate_loss, self.pred, self.merged, train_op], feed_dict=feed)
if train_writer is not None:
train_writer.add_summary(summary, num_epoch*total_steps + step)
answers = a[step*config.batch_size:(step+1)*config.batch_size]
accuracy += np.sum(pred == answers)/float(len(answers))
total_loss.append(loss)
if verbose and step % verbose == 0:
sys.stdout.write('\r{} / {} : loss = {}'.format(
step, total_steps, np.mean(total_loss)))
sys.stdout.flush()
if verbose:
sys.stdout.write('\r')
return np.mean(total_loss), accuracy/float(total_steps)
def __init__(self, config):
self.config = config
self.variables_to_save = {}
self.load_data(debug=False)
self.add_placeholders()
# set up embedding
self.embeddings = tf.Variable(self.word_embedding.astype(np.float32), name="Embedding")
self.output = self.inference()
self.pred = self.get_predictions(self.output)
self.calculate_loss = self.add_loss_op(self.output)
self.train_step = self.add_training_op(self.calculate_loss)
self.merged = tf.summary.merge_all()
================================================
FILE: dmn_test.py
================================================
from __future__ import print_function
from __future__ import division
import tensorflow as tf
import numpy as np
import time
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-b", "--babi_task_id", help="specify babi task 1-20 (default=1)")
parser.add_argument("-t", "--dmn_type", help="specify type of dmn (default=plus)")
args = parser.parse_args()
dmn_type = args.dmn_type if args.dmn_type is not None else "plus"
if dmn_type == "original":
from dmn_original import Config
config = Config()
elif dmn_type == "plus":
from dmn_plus import Config
config = Config()
else:
raise NotImplementedError(dmn_type + ' DMN type is not currently implemented')
if args.babi_task_id is not None:
config.babi_id = args.babi_task_id
config.strong_supervision = False
config.train_mode = False
print('Testing DMN ' + dmn_type + ' on babi task', config.babi_id)
# create model
with tf.variable_scope('DMN') as scope:
if dmn_type == "original":
from dmn_original import DMN
model = DMN(config)
elif dmn_type == "plus":
from dmn_plus import DMN_PLUS
model = DMN_PLUS(config)
print('==> initializing variables')
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as session:
session.run(init)
print('==> restoring weights')
saver.restore(session, 'weights/task' + str(model.config.babi_id) + '.weights')
print('==> running DMN')
test_loss, test_accuracy = model.run_epoch(session, model.test)
print('')
print('Test accuracy:', test_accuracy)
================================================
FILE: dmn_train.py
================================================
from __future__ import print_function
from __future__ import division
import tensorflow as tf
import time
import argparse
import os
parser = argparse.ArgumentParser()
parser.add_argument("-b", "--babi_task_id", help="specify babi task 1-20 (default=1)")
parser.add_argument("-r", "--restore", help="restore previously trained weights (default=false)")
parser.add_argument("-s", "--strong_supervision", help="use labelled supporting facts (default=false)")
parser.add_argument("-t", "--dmn_type", help="specify type of dmn (default=plus)")
parser.add_argument("-l", "--l2_loss", type=float, default=0.001, help="specify l2 loss constant")
parser.add_argument("-n", "--num_runs", type=int, help="specify the number of model runs")
args = parser.parse_args()
dmn_type = args.dmn_type if args.dmn_type is not None else "plus"
if dmn_type == "plus":
from dmn_plus import Config
config = Config()
else:
raise NotImplementedError(dmn_type + ' DMN type is not currently implemented')
config.babi_id = args.babi_task_id if args.babi_task_id is not None else str(1)
config.l2 = args.l2_loss if args.l2_loss is not None else 0.001
config.strong_supervision = args.strong_supervision if args.strong_supervision is not None else False
num_runs = args.num_runs if args.num_runs is not None else 1
print('Training DMN ' + dmn_type + ' on babi task', config.babi_id)
best_overall_val_loss = float('inf')
# create model
with tf.variable_scope('DMN') as scope:
if dmn_type == "plus":
from dmn_plus import DMN_PLUS
model = DMN_PLUS(config)
for run in range(num_runs):
print('Starting run', run)
print('==> initializing variables')
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as session:
sum_dir = 'summaries/train/' + time.strftime("%Y-%m-%d %H %M")
if not os.path.exists(sum_dir):
os.makedirs(sum_dir)
train_writer = tf.summary.FileWriter(sum_dir, session.graph)
session.run(init)
best_val_epoch = 0
prev_epoch_loss = float('inf')
best_val_loss = float('inf')
best_val_accuracy = 0.0
if args.restore:
print('==> restoring weights')
saver.restore(session, 'weights/task' + str(model.config.babi_id) + '.weights')
print('==> starting training')
for epoch in range(config.max_epochs):
print('Epoch {}'.format(epoch))
start = time.time()
train_loss, train_accuracy = model.run_epoch(
session, model.train, epoch, train_writer,
train_op=model.train_step, train=True)
valid_loss, valid_accuracy = model.run_epoch(session, model.valid)
print('Training loss: {}'.format(train_loss))
print('Validation loss: {}'.format(valid_loss))
print('Training accuracy: {}'.format(train_accuracy))
print('Vaildation accuracy: {}'.format(valid_accuracy))
if valid_loss < best_val_loss:
best_val_loss = valid_loss
best_val_epoch = epoch
if best_val_loss < best_overall_val_loss:
print('Saving weights')
best_overall_val_loss = best_val_loss
best_val_accuracy = valid_accuracy
saver.save(session, 'weights/task' + str(model.config.babi_id) + '.weights')
# anneal
if train_loss > prev_epoch_loss * model.config.anneal_threshold:
model.config.lr /= model.config.anneal_by
print('annealed lr to %f' % model.config.lr)
prev_epoch_loss = train_loss
if epoch - best_val_epoch > config.early_stopping:
break
print('Total time: {}'.format(time.time() - start))
print('Best validation accuracy:', best_val_accuracy)
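The annealing rule in the training loop above decays the learning rate whenever training loss fails to improve by a large enough factor. A minimal sketch of that rule in isolation (the threshold and decay values are illustrative, not the repo's actual `Config` defaults):

```python
def anneal(lr, train_loss, prev_epoch_loss, anneal_threshold=1.0, anneal_by=1.5):
    """Shrink the learning rate when training loss stops improving.

    If this epoch's loss exceeds last epoch's loss times the threshold,
    divide the learning rate by anneal_by; otherwise leave it unchanged.
    """
    if train_loss > prev_epoch_loss * anneal_threshold:
        lr /= anneal_by
    return lr

lr = 0.003
lr = anneal(lr, train_loss=0.50, prev_epoch_loss=0.40)  # loss got worse -> decayed
lr = anneal(lr, train_loss=0.30, prev_epoch_loss=0.40)  # loss improved -> unchanged
```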
================================================
FILE: fetch_babi_data.sh
================================================
#!/bin/bash

url=http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz
fname=$(basename "$url")

curl -SLO "$url"
tar zxvf "$fname"

mkdir -p data
mv tasks_1-20_v1-2/* data/
rm -r tasks_1-20_v1-2
rm "$fname"

mkdir -p weights
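After the script runs, `babi_input.py` reads the task files from `data/`. A hypothetical sanity check for the download, assuming the standard bAbI v1.2 tarball layout (the subdirectory names below come from that tarball, not from this repo's code):

```python
import os

# Standard bAbI v1.2 subdirectories (an assumption about the tarball layout,
# not something this repo defines).
EXPECTED_SUBDIRS = ("en", "en-10k", "hn", "hn-10k", "shuffled", "shuffled-10k")

def missing_babi_dirs(data_dir="data"):
    """Return the expected bAbI subdirectories that are absent under data_dir."""
    return [d for d in EXPECTED_SUBDIRS
            if not os.path.isdir(os.path.join(data_dir, d))]

# After a successful fetch, missing_babi_dirs("data") should be empty.
```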
SYMBOL INDEX (34 symbols across 3 files)
FILE: attention_gru_cell.py
class AttentionGRUCell (line 26) | class AttentionGRUCell(RNNCell):
method __init__ (line 35) | def __init__(self, num_units, input_size=None, activation=tanh):
method state_size (line 42) | def state_size(self):
method output_size (line 47) | def output_size(self):
method __call__ (line 50) | def __call__(self, inputs, state, scope=None):
function _linear (line 72) | def _linear(args, output_size, bias, bias_start=0.0):
FILE: babi_input.py
function init_babi (line 13) | def init_babi(fname):
function get_babi_raw (line 47) | def get_babi_raw(id, test_id):
function load_glove (line 102) | def load_glove(dim):
function create_vector (line 116) | def create_vector(word, word2vec, word_vector_size, silent=True):
function process_word (line 124) | def process_word(word, word2vec, vocab, ivocab, word_vector_size, to_ret...
function process_input (line 139) | def process_input(data_raw, floatX, word2vec, vocab, ivocab, embed_size,...
function get_lens (line 201) | def get_lens(inputs, split_sentences=False):
function get_sentence_lens (line 207) | def get_sentence_lens(inputs):
function pad_inputs (line 221) | def pad_inputs(inputs, lens, max_len, mode="", sen_lens=None, max_sen_le...
function create_embedding (line 242) | def create_embedding(word2vec, ivocab, embed_size):
function load_babi (line 249) | def load_babi(config, split_sentences=False):
FILE: dmn_plus.py
class Config (line 17) | class Config(object):
function _add_gradient_noise (line 55) | def _add_gradient_noise(t, stddev=1e-3, name=None):
function _position_encoding (line 65) | def _position_encoding(sentence_size, embedding_size):
class DMN_PLUS (line 78) | class DMN_PLUS(object):
method load_data (line 80) | def load_data(self, debug=False):
method add_placeholders (line 88) | def add_placeholders(self):
method get_predictions (line 100) | def get_predictions(self, output):
method add_loss_op (line 105) | def add_loss_op(self, output):
method add_training_op (line 118) | def add_training_op(self, loss):
method get_question_representation (line 133) | def get_question_representation(self):
method get_input_representation (line 146) | def get_input_representation(self):
method get_attention (line 172) | def get_attention(self, q_vec, prev_memory, fact_vec, reuse):
method generate_episode (line 195) | def generate_episode(self, memory, q_vec, fact_vecs, hop_index):
method add_answer_module (line 221) | def add_answer_module(self, rnn_output, q_vec):
method inference (line 232) | def inference(self):
method run_epoch (line 275) | def run_epoch(self, session, data, num_epoch=0, train_writer=None, tra...
method __init__ (line 326) | def __init__(self, config):