Repository: uber-common/differentiable-plasticity
Branch: master
Commit: 5bd29a18cc20
Files: 106
Total size: 747.0 KB
Directory structure:
gitextract_rz2xt4mk/
├── .gitignore
├── LICENSE
├── NOTICE.md
├── README.md
├── awd-lstm-lm/
│ ├── .gitignore
│ ├── LICENSE
│ ├── OpusHdfsCopy.py
│ ├── OpusPrepare.sh
│ ├── README.md
│ ├── TESTCOMMAND
│ ├── data.py
│ ├── embed_regularize.py
│ ├── finetune.py
│ ├── generate.py
│ ├── getdata.sh
│ ├── locked_dropout.py
│ ├── main.py
│ ├── model.py
│ ├── model.py.old
│ ├── mylstm.py
│ ├── mylstm.py.orig
│ ├── opus.docker.old
│ ├── plotresults.py
│ ├── plotresultssingle.py
│ ├── pointer.py
│ ├── request_devbox.json
│ ├── request_full.json
│ ├── request_opus.json
│ ├── request_opus.json.old
│ ├── request_plast.json
│ ├── splitcross.py
│ ├── test.py
│ ├── tmp.py
│ ├── utils.py
│ └── weight_drop.py
├── images/
│ ├── OpusHdfsCopy.py
│ ├── README.md
│ ├── anim.py
│ ├── images.py
│ ├── plotresults.py
│ ├── request.json
│ ├── showcompletion_eta.py
│ └── testpics.py
├── maze/
│ ├── OpusHdfsCopy.py
│ ├── README.md
│ ├── anim.py
│ ├── animbatch.py
│ ├── batch.py
│ ├── makefigure.py
│ ├── makemaze.py
│ ├── maze.py
│ ├── opus.docker
│ ├── opus.docker.old
│ ├── plotfigure.py
│ ├── plotresults.py
│ ├── request.json
│ ├── request_devbox.json
│ ├── request_modplast.json
│ ├── request_modul.json
│ ├── request_plastic.json
│ ├── request_rnn.json
│ ├── request_rnn100neurons.json
│ ├── testbatch.py
│ └── testnobatch.py
├── omniglot/
│ ├── .ipynb_checkpoints/
│ │ └── Omniglot Data Loading-checkpoint.ipynb
│ ├── README.md
│ ├── omniglot.py
│ ├── opus.docker
│ ├── plotresults.py
│ ├── request.json
│ └── test_omniglot_allseeds.py
├── opus.docker
├── request_devbox.json
├── request_lstm.json
├── request_lstm_simple.json
├── simple/
│ ├── .gitignore
│ ├── OpusHdfsCopy.py
│ ├── README.md
│ ├── full.py
│ ├── lstm.py
│ ├── opus.docker
│ ├── plotresults.py
│ ├── request.json
│ ├── request_lstm.json
│ ├── simple.py
│ └── simplest.py
├── simplemaze/
│ ├── README.md
│ └── maze.py
└── sr/
├── .gitignore
├── OpusHdfsCopy.py
├── README.md
├── anim.py
├── cueshown0.dat.npy
├── makefigure.py
├── modul.py
├── modulator0.dat.npy
├── opus.docker.old
├── plotmodulator.py
├── plotresults.py
├── request.json
├── request_batch.json
├── request_easy.json
├── rewardsprevstep0.dat.npy
├── srbatch.py
├── srrun.py
└── srrun1episode.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*/*.data
*/data
*/*.pt
*/*.swp
*/*.txt
*/*.png
*/*.dat
*/tmp
*.swp
*.txt
*.png
*.gif
*.dat
loss*
grads_*
__pycache__/*
*/__pycache__/*
*/__pycache__/
torchmod*
params*
tmp*/
================================================
FILE: LICENSE
================================================
"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by the text below.
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.
"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or Object form, made available under this License.
This License governs use of the accompanying Work, and your use of the Work constitutes acceptance of this License.
You may use this Work for any non-commercial purpose, subject to the restrictions in this License. Some purposes which can be non-commercial are teaching, academic research, and personal experimentation. You may also distribute this Work with books or other teaching materials, or publish the Work on websites, that are intended to teach the use of the Work.
You may not use or distribute this Work, or any derivative works, outputs, or results from the Work, in any form for commercial purposes. Non-exhaustive examples of commercial purposes would be running business operations, licensing, leasing, or selling the Work, or distributing the Work for use with commercial products.
You may modify this Work and distribute the modified Work for non-commercial purposes, however, you may not grant rights to the Work or derivative works that are broader than or in conflict with those provided by this License. For example, you may not distribute modifications of the Work under terms that would permit commercial use, or under terms that purport to require the Work or derivative works to be sublicensed to others.
In return, we require that you agree:
1. Not to remove any copyright or other notices from the Work.
2. That if you distribute the Work in Source or Object form, you will include a verbatim copy of this License.
3. That if you distribute derivative works of the Work in Source form, you do so only under a license that includes all of the provisions of this License and is not in conflict with this License, and if you distribute derivative works of the Work solely in Object form you do so only under a license that complies with this License.
4. That if you have modified the Work or created derivative works from the Work, and distribute such modifications or derivative works, you will cause the modified files to carry prominent notices so that recipients know that they are not receiving the original Work. Such notices must state: (i) that you have changed the Work; and (ii) the date of any changes.
5. If you publicly use the Work or any output or result of the Work, you will provide a notice with such use that provides any person who uses, views, accesses, interacts with, or is otherwise exposed to the Work (i) with information of the nature of the Work, (ii) with a link to the Work, and (iii) a notice that the Work is available under this License.
6. THAT THE WORK COMES "AS IS", WITH NO WARRANTIES. THIS MEANS NO EXPRESS, IMPLIED OR STATUTORY WARRANTY, INCLUDING WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR ANY WARRANTY OF TITLE OR NON-INFRINGEMENT. ALSO, YOU MUST PASS THIS DISCLAIMER ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS.
7. THAT NEITHER UBER TECHNOLOGIES, INC. NOR ANY OF ITS AFFILIATES, SUPPLIERS, SUCCESSORS, NOR ASSIGNS WILL BE LIABLE FOR ANY DAMAGES RELATED TO THE WORK OR THIS LICENSE, INCLUDING DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL OR INCIDENTAL DAMAGES, TO THE MAXIMUM EXTENT THE LAW PERMITS, NO MATTER WHAT LEGAL THEORY IT IS BASED ON. ALSO, YOU MUST PASS THIS LIMITATION OF LIABILITY ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS.
8. That if you sue anyone over patents that you think may apply to the Work or anyone's use of the Work, your license to the Work ends automatically.
9. That your rights under the License end automatically if you breach it in any way.
10. Uber Technologies, Inc. reserves all rights not expressly granted to you in this License.
================================================
FILE: NOTICE.md
================================================
The `awd-lstm-lm` directory (language modelling with plastic LSTMs) was forked
from the [Salesforce Language Model
Toolkit](https://github.com/salesforce/awd-lstm-lm/), which implements the
baseline language modelling system used in our experiments (this baseline is
the model described in [Merity et al. (2017), Regularizing and Optimizing LSTM
Language Models](https://arxiv.org/abs/1708.02182)).
License for the Salesforce Language Model Toolkit:
Copyright (c) 2017,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: README.md
================================================
## Differentiable plasticity
This repo contains implementations of the algorithms described in [Differentiable plasticity: training plastic networks with gradient descent](https://arxiv.org/abs/1804.02464), a research paper from Uber AI Labs.
NOTE: please see also our more recent work on differentiable *neuromodulated* plasticity: the "[backpropamine](https://github.com/uber-research/backpropamine)" framework.
There are four different experiments included here:
- `simple`: Binary pattern memorization and completion. Read this one first!
- `images`: Natural image memorization and completion
- `omniglot`: One-shot learning in the Omniglot task
- `maze`: Maze exploration task (reinforcement learning)
We strongly recommend studying the `simple/simplest.py` program first, as it is deliberately kept as simple as possible while showing full-fledged differentiable plasticity learning.
The code requires Python 3 and PyTorch 0.3.0 or later. The `images` code also requires scikit-learn. By default our code requires a GPU, but most programs can be run on CPU by simply uncommenting the relevant lines (for others, remove all occurrences of `.cuda()`).
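To give a feel for the mechanism before diving into `simple/simplest.py`, here is an illustrative sketch (not code from this repo) of a single plastic layer following the update rule described in the paper: each connection has a fixed weight `w`, a trainable plasticity coefficient `alpha`, and a Hebbian trace that is updated at every time step with a learned rate `eta`. All names and sizes below are illustrative.

```python
import torch

class PlasticLayer(torch.nn.Module):
    """Toy plastic layer: effective weight = fixed part + alpha * Hebbian trace."""
    def __init__(self, n):
        super().__init__()
        self.w = torch.nn.Parameter(0.01 * torch.randn(n, n))      # fixed weights
        self.alpha = torch.nn.Parameter(0.01 * torch.randn(n, n))  # plasticity coefficients
        self.eta = torch.nn.Parameter(0.01 * torch.ones(1))        # learning rate of the trace

    def forward(self, x, hebb):
        # x: (1, n) activity vector; hebb: (n, n) Hebbian trace carried across steps
        y = torch.tanh(x @ (self.w + self.alpha * hebb))
        # Hebbian update: running average of outer products of pre/post activity
        hebb = (1 - self.eta) * hebb + self.eta * (x.t() @ y)
        return y, hebb
```

The fixed weights `w`, the coefficients `alpha`, and `eta` are all trained by gradient descent across episodes, while `hebb` changes within an episode; this within-episode/across-episodes split is the core idea the experiments here build on.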
To comment, please open an issue. We will not be accepting pull requests but encourage further study of this research. To learn more, check out our accompanying article on the [Uber Engineering Blog](https://eng.uber.com/differentiable-plasticity).
## Copyright and licensing information
Copyright (c) 2018-2019 Uber Technologies, Inc.
All code is licensed under the Uber Non-Commercial License (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at the root directory of this project.
See the LICENSE file in this repository for the specific language governing
permissions and limitations under the License.
================================================
FILE: awd-lstm-lm/.gitignore
================================================
maintmp.py
HDFS/
*.patch
model_*
results_*
*.pt
*.swp
__pycache__/
data/
corpus*
================================================
FILE: awd-lstm-lm/LICENSE
================================================
BSD 3-Clause License
Copyright (c) 2017,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================
FILE: awd-lstm-lm/OpusHdfsCopy.py
================================================
import os
import os.path
def checkHdfs():
    return os.path.isfile('/opt/hadoop/latest/bin/hdfs')

def transferFileToHdfsPath(sourcepath, targetpath):
    hdfspath = targetpath
    targetdir = os.path.dirname(targetpath)
    os.system('/opt/hadoop/latest/bin/hdfs dfs -mkdir -p {}'.format(targetdir))
    result = os.system(
        '/opt/hadoop/latest/bin/hdfs dfs -copyFromLocal -f {} {}'.format(sourcepath, hdfspath)
    )
    if result != 0:
        raise OSError('Cannot copyFromLocal {} {} returned {}'.format(sourcepath, hdfspath, result))

def transferFileToHdfsDir(sourcepath, targetdir):
    hdfspath = os.path.join(targetdir, os.path.basename(sourcepath))
    os.system('/opt/hadoop/latest/bin/hdfs dfs -mkdir -p {}'.format(targetdir))
    result = os.system(
        '/opt/hadoop/latest/bin/hdfs dfs -copyFromLocal -f {} {}'.format(sourcepath, hdfspath)
    )
    if result != 0:
        raise OSError('Cannot copyFromLocal {} {} returned {}'.format(sourcepath, hdfspath, result))
================================================
FILE: awd-lstm-lm/OpusPrepare.sh
================================================
cd /home/work
# $HOME is not the same as ~ !!!!
# Installing pyenv and putting it in the path
curl -L https://raw.githubusercontent.com/yyuu/pyenv-installer/master/bin/pyenv-installer | bash
echo "HOME is $HOME"
echo 'export PATH="$HOME/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
' > $HOME/.bashrc
# Installing python 3.5 and making it default
source $HOME/.bashrc
pyenv install 3.5.2
pyenv local 3.5.2
# Note: when we exit the script, environments go away and we need to re-source ~/.bashrc and re-run pyenv local 3.5.2
# Installing numpy and PyTorch
pip install numpy==1.14
pip install torch
apt-get install unzip # Some machines seem not to have it?
# Downloading the data
sh ./getdata.sh
================================================
FILE: awd-lstm-lm/README.md
================================================
# LSTMs with neuromodulated plasticity
This code implements language modelling on the Penn Treebank dataset, using LSTMs with neuromodulated plasticity ("backpropamine"), as described in [Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity (Miconi et al., ICLR 2019)](https://openreview.net/forum?id=r1lrAiA5Ym), a paper from Uber AI Labs.
The code is forked from the [Salesforce Language Model Toolkit](https://github.com/Smerity/awd-lstm-lm) and reuses most of its parameters and design choices. The main differences are that we do not implement DropConnect and that we reduce the batch size to 6 for computational reasons. This code requires Python 3 and PyTorch 1.0.
To comment, please open an issue. Note that the code is provided "as is": we cannot provide support or accept pull requests at this time.
## Usage
Before running this code, run `getdata.sh` to obtain the Penn Treebank data.
Plasticity and neuromodulation: `python3 main.py --batch_size 6 --data data/penn --dropouti 0.4 --dropouth 0.25 --epoch 500 --save PTB.pt --wdrop 0 --model PLASTICLSTM --modultype modplasth2mod --modulout fanout --nhid 1149 --alphatype perneuron --asgdtime 125 --agdiv 1149`
Plasticity without neuromodulation: `python3 main.py --batch_size 6 --data data/penn --dropouti 0.4 --dropouth 0.25 --epoch 500 --save PTB.pt --wdrop 0 --model PLASTICLSTM --modultype none --modulout none --nhid 1149 --alphatype perneuron --asgdtime 125 --agdiv 1149`
No plasticity, just plain LSTM: `python3 main.py --batch_size 6 --data data/penn --dropouti 0.4 --dropouth 0.25 --epoch 500 --save PTB.pt --wdrop 0 --model MYLSTM --modultype modplasth2mod --modulout fanout --nhid 1150 --alphatype full --asgdtime 125 --agdiv 1150`
Note that in all of the above, we use per-neuron plasticity coefficients and reduce the number of neurons in plastic LSTMs (`nhid`) to ensure that plastic LSTMs do not have more trainable parameters.
## Code organization
The main program is `main.py`. There is some interface code in `model.py`. The code for actual plastic LSTMs is in `mylstm.py`.
## Plastic LSTMs
The code for plastic LSTMs is relatively straightforward, as can be seen in `mylstm.py`.
However, note that in `main.py` we selectively reduce the gradient for `alpha`
parameters when using plastic LSTMs with either per-neuron or single `alpha`.
More precisely, we divide the gradient on `alpha` coefficients by a value that should be roughly equal
to the number of neurons in the LSTM. This greatly enhances stability without
forcing a reduction in learning rates.
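The gradient scaling described above can be sketched as a small helper called between `loss.backward()` and `optimizer.step()`. This is an illustrative sketch, not the exact code from `main.py`; the parameter-name match on `'alpha'` and the divisor value are assumptions for the example.

```python
import torch

def scale_alpha_grads(model, agdiv=1149):
    """Divide the gradient of plasticity (`alpha`) coefficients by roughly
    the number of LSTM neurons, leaving all other gradients untouched.
    Matching parameters by the substring 'alpha' is an assumption here."""
    for name, p in model.named_parameters():
        if 'alpha' in name and p.grad is not None:
            p.grad.data.div_(agdiv)
```

In the training loop this would be invoked right after backpropagation, so that the `alpha` parameters effectively see a much smaller learning rate than the rest of the network.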
================================================
FILE: awd-lstm-lm/TESTCOMMAND
================================================
python test.py --model MYLSTM --nhid 1150 --file ./HDFS/ptb/model__SqUsq_MYLSTM_clip_cv2.0_modplasth2mod_fanout_i2c_perneuron_asgdtime125_agdiv1150_lr30_3l_1150h_0.5lstm_rngseed1.dat
================================================
FILE: awd-lstm-lm/data.py
================================================
import os
import torch
from collections import Counter
class Dictionary(object):
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []
        self.counter = Counter()
        self.total = 0

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        token_id = self.word2idx[word]
        self.counter[token_id] += 1
        self.total += 1
        return self.word2idx[word]

    def __len__(self):
        return len(self.idx2word)

class Corpus(object):
    def __init__(self, path):
        self.dictionary = Dictionary()
        self.train = self.tokenize(os.path.join(path, 'train.txt'))
        self.valid = self.tokenize(os.path.join(path, 'valid.txt'))
        self.test = self.tokenize(os.path.join(path, 'test.txt'))

    def tokenize(self, path):
        """Tokenizes a text file."""
        assert os.path.exists(path)
        # Add words to the dictionary
        with open(path, 'r') as f:
            tokens = 0
            for line in f:
                words = line.split() + ['<eos>']
                tokens += len(words)
                for word in words:
                    self.dictionary.add_word(word)
        # Tokenize file content
        with open(path, 'r') as f:
            ids = torch.LongTensor(tokens)
            token = 0
            for line in f:
                words = line.split() + ['<eos>']
                for word in words:
                    ids[token] = self.dictionary.word2idx[word]
                    token += 1
        return ids
================================================
FILE: awd-lstm-lm/embed_regularize.py
================================================
import numpy as np
import pdb
import torch
def embedded_dropout(embed, words, dropout=0.1, scale=None):
    if dropout:
        mask = embed.weight.data.new().resize_((embed.weight.size(0), 1)).bernoulli_(1 - dropout).expand_as(embed.weight) / (1 - dropout)
        masked_embed_weight = mask * embed.weight
    else:
        masked_embed_weight = embed.weight
    if scale:
        masked_embed_weight = scale.expand_as(masked_embed_weight) * masked_embed_weight

    padding_idx = embed.padding_idx
    if padding_idx is None:
        padding_idx = -1

    X = torch.nn.functional.embedding(words, masked_embed_weight,
        padding_idx, embed.max_norm, embed.norm_type,
        embed.scale_grad_by_freq, embed.sparse
    )
    return X

if __name__ == '__main__':
    V = 50
    h = 4
    bptt = 10
    batch_size = 2

    embed = torch.nn.Embedding(V, h)

    words = np.random.random_integers(low=0, high=V-1, size=(batch_size, bptt))
    words = torch.LongTensor(words)

    origX = embed(words)
    X = embedded_dropout(embed, words)

    print(origX)
    print(X)
================================================
FILE: awd-lstm-lm/finetune.py
================================================
import argparse
import time
import math
import numpy as np
np.random.seed(331)
import torch
import torch.nn as nn
import data
import model
from utils import batchify, get_batch, repackage_hidden
parser = argparse.ArgumentParser(description='PyTorch PennTreeBank RNN/LSTM Language Model')
parser.add_argument('--data', type=str, default='data/penn/',
                    help='location of the data corpus')
parser.add_argument('--model', type=str, default='LSTM',
                    help='type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)')
parser.add_argument('--emsize', type=int, default=400,
                    help='size of word embeddings')
parser.add_argument('--nhid', type=int, default=1150,
                    help='number of hidden units per layer')
parser.add_argument('--nlayers', type=int, default=3,
                    help='number of layers')
parser.add_argument('--lr', type=float, default=30,
                    help='initial learning rate')
parser.add_argument('--clip', type=float, default=0.25,
                    help='gradient clipping')
parser.add_argument('--epochs', type=int, default=8000,
                    help='upper epoch limit')
parser.add_argument('--batch_size', type=int, default=80, metavar='N',
                    help='batch size')
parser.add_argument('--bptt', type=int, default=70,
                    help='sequence length')
parser.add_argument('--dropout', type=float, default=0.4,
                    help='dropout applied to layers (0 = no dropout)')
parser.add_argument('--dropouth', type=float, default=0.3,
                    help='dropout for rnn layers (0 = no dropout)')
parser.add_argument('--dropouti', type=float, default=0.65,
                    help='dropout for input embedding layers (0 = no dropout)')
parser.add_argument('--dropoute', type=float, default=0.1,
                    help='dropout to remove words from embedding layer (0 = no dropout)')
parser.add_argument('--wdrop', type=float, default=0.5,
                    help='amount of weight dropout to apply to the RNN hidden-to-hidden matrix')
parser.add_argument('--tied', action='store_false',
                    help='tie the word embedding and softmax weights')
parser.add_argument('--seed', type=int, default=1111,
                    help='random seed')
parser.add_argument('--nonmono', type=int, default=5,
                    help='non-monotonicity window for early stopping')
parser.add_argument('--cuda', action='store_false',
                    help='use CUDA')
parser.add_argument('--log-interval', type=int, default=200, metavar='N',
                    help='report interval')
randomhash = ''.join(str(time.time()).split('.'))
parser.add_argument('--save', type=str, default=randomhash+'.pt',
                    help='path to save the final model')
parser.add_argument('--alpha', type=float, default=2,
                    help='alpha L2 regularization on RNN activation (alpha = 0 means no regularization)')
parser.add_argument('--beta', type=float, default=1,
                    help='beta slowness regularization applied on RNN activation (beta = 0 means no regularization)')
parser.add_argument('--wdecay', type=float, default=1.2e-6,
                    help='weight decay applied to all weights')
args = parser.parse_args()

# Set the random seed manually for reproducibility.
torch.manual_seed(args.seed)
if torch.cuda.is_available():
    if not args.cuda:
        print("WARNING: You have a CUDA device, so you should probably run with --cuda")
    else:
        torch.cuda.manual_seed(args.seed)
###############################################################################
# Load data
###############################################################################
corpus = data.Corpus(args.data)
eval_batch_size = 10
test_batch_size = 1
train_data = batchify(corpus.train, args.batch_size, args)
val_data = batchify(corpus.valid, eval_batch_size, args)
test_data = batchify(corpus.test, test_batch_size, args)
###############################################################################
# Build the model
###############################################################################
ntokens = len(corpus.dictionary)
model = model.RNNModel(args.model, ntokens, args.emsize, args.nhid, args.nlayers, args.dropout, args.dropouth, args.dropouti, args.dropoute, args.wdrop, args.tied)
if args.cuda:
    model.cuda()
total_params = sum(x.size()[0] * x.size()[1] if len(x.size()) > 1 else x.size()[0] for x in model.parameters())
print('Args:', args)
print('Model total parameters:', total_params)
criterion = nn.CrossEntropyLoss()
###############################################################################
# Training code
###############################################################################
def evaluate(data_source, batch_size=10):
    # Turn on evaluation mode which disables dropout.
    if args.model == 'QRNN': model.reset()
    model.eval()
    total_loss = 0
    ntokens = len(corpus.dictionary)
    hidden = model.init_hidden(batch_size)
    for i in range(0, data_source.size(0) - 1, args.bptt):
        data, targets = get_batch(data_source, i, args, evaluation=True)
        output, hidden = model(data, hidden)
        output_flat = output.view(-1, ntokens)
        total_loss += len(data) * criterion(output_flat, targets).data
        hidden = repackage_hidden(hidden)
    return total_loss[0] / len(data_source)
def train():
    # Turn on training mode which enables dropout.
    if args.model == 'QRNN': model.reset()
    total_loss = 0
    start_time = time.time()
    ntokens = len(corpus.dictionary)
    hidden = model.init_hidden(args.batch_size)
    batch, i = 0, 0
    while i < train_data.size(0) - 1 - 1:
        bptt = args.bptt if np.random.random() < 0.95 else args.bptt / 2.
        # Prevent excessively small or negative sequence lengths
        seq_len = max(5, int(np.random.normal(bptt, 5)))
        # There's a very small chance that it could select a very long sequence length resulting in OOM
        seq_len = min(seq_len, args.bptt + 10)

        lr2 = optimizer.param_groups[0]['lr']
        optimizer.param_groups[0]['lr'] = lr2 * seq_len / args.bptt
        model.train()
        data, targets = get_batch(train_data, i, args, seq_len=seq_len)

        # Starting each batch, we detach the hidden state from how it was previously produced.
        # If we didn't, the model would try backpropagating all the way to start of the dataset.
        hidden = repackage_hidden(hidden)
        optimizer.zero_grad()

        output, hidden, rnn_hs, dropped_rnn_hs = model(data, hidden, return_h=True)
        raw_loss = criterion(output.view(-1, ntokens), targets)

        loss = raw_loss
        # Activation Regularization
        loss = loss + sum(args.alpha * dropped_rnn_h.pow(2).mean() for dropped_rnn_h in dropped_rnn_hs[-1:])
        # Temporal Activation Regularization (slowness)
        loss = loss + sum(args.beta * (rnn_h[1:] - rnn_h[:-1]).pow(2).mean() for rnn_h in rnn_hs[-1:])
        loss.backward()

        # `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
        torch.nn.utils.clip_grad_norm(model.parameters(), args.clip)
        optimizer.step()

        total_loss += raw_loss.data
        optimizer.param_groups[0]['lr'] = lr2
        if batch % args.log_interval == 0 and batch > 0:
            cur_loss = total_loss[0] / args.log_interval
            elapsed = time.time() - start_time
            print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.2f} | ms/batch {:5.2f} | '
                  'loss {:5.2f} | ppl {:8.2f}'.format(
                epoch, batch, len(train_data) // args.bptt, optimizer.param_groups[0]['lr'],
                elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss)))
            total_loss = 0
            start_time = time.time()
        ###
        batch += 1
        i += seq_len
# Load the best saved model.
with open(args.save, 'rb') as f:
    model = torch.load(f)

# Loop over epochs.
lr = args.lr
stored_loss = evaluate(val_data)
best_val_loss = []
# At any point you can hit Ctrl + C to break out of training early.
try:
    #optimizer = torch.optim.ASGD(model.parameters(), lr=args.lr, weight_decay=args.wdecay)
    optimizer = torch.optim.ASGD(model.parameters(), lr=args.lr, t0=0, lambd=0., weight_decay=args.wdecay)
    for epoch in range(1, args.epochs+1):
        epoch_start_time = time.time()
        train()
        if 't0' in optimizer.param_groups[0]:
            tmp = {}
            for prm in model.parameters():
                tmp[prm] = prm.data.clone()
                prm.data = optimizer.state[prm]['ax'].clone()

            val_loss2 = evaluate(val_data)
            print('-' * 89)
            print('| end of epoch {:3d} | time: {:5.2f}s | valid loss {:5.2f} | '
                  'valid ppl {:8.2f}'.format(epoch, (time.time() - epoch_start_time),
                                             val_loss2, math.exp(val_loss2)))
            print('-' * 89)

            if val_loss2 < stored_loss:
                with open(args.save, 'wb') as f:
                    torch.save(model, f)
                print('Saving Averaged!')
                stored_loss = val_loss2

            for prm in model.parameters():
                prm.data = tmp[prm].clone()

            if (len(best_val_loss) > args.nonmono and val_loss2 > min(best_val_loss[:-args.nonmono])):
                print('Done!')
                import sys
                sys.exit(1)
                optimizer = torch.optim.ASGD(model.parameters(), lr=args.lr, t0=0, lambd=0., weight_decay=args.wdecay)
                #optimizer.param_groups[0]['lr'] /= 2.

            best_val_loss.append(val_loss2)

except KeyboardInterrupt:
    print('-' * 89)
    print('Exiting from training early')
# Load the best saved model.
with open(args.save, 'rb') as f:
    model = torch.load(f)

# Run on test data.
test_loss = evaluate(test_data, test_batch_size)
print('=' * 89)
print('| End of training | test loss {:5.2f} | test ppl {:8.2f}'.format(
    test_loss, math.exp(test_loss)))
print('=' * 89)
================================================
FILE: awd-lstm-lm/generate.py
================================================
###############################################################################
# Language Modeling on Penn Tree Bank
#
# This file generates new sentences sampled from the language model
#
###############################################################################
import argparse
import torch
from torch.autograd import Variable
import data
parser = argparse.ArgumentParser(description='PyTorch PTB Language Model')
# Model parameters.
parser.add_argument('--data', type=str, default='./data/penn',
help='location of the data corpus')
parser.add_argument('--model', type=str, default='LSTM',
help='type of recurrent net (LSTM, QRNN)')
parser.add_argument('--checkpoint', type=str, default='./model.pt',
help='model checkpoint to use')
parser.add_argument('--outf', type=str, default='generated.txt',
help='output file for generated text')
parser.add_argument('--words', type=int, default='1000',
help='number of words to generate')
parser.add_argument('--seed', type=int, default=1111,
help='random seed')
parser.add_argument('--cuda', action='store_true',
help='use CUDA')
parser.add_argument('--temperature', type=float, default=1.0,
help='temperature - higher will increase diversity')
parser.add_argument('--log-interval', type=int, default=100,
help='reporting interval')
args = parser.parse_args()
# Set the random seed manually for reproducibility.
torch.manual_seed(args.seed)
if torch.cuda.is_available():
if not args.cuda:
print("WARNING: You have a CUDA device, so you should probably run with --cuda")
else:
torch.cuda.manual_seed(args.seed)
if args.temperature < 1e-3:
parser.error("--temperature has to be greater or equal 1e-3")
with open(args.checkpoint, 'rb') as f:
model = torch.load(f)
model.eval()
if args.model == 'QRNN':
model.reset()
if args.cuda:
model.cuda()
else:
model.cpu()
corpus = data.Corpus(args.data)
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(1)
input = Variable(torch.rand(1, 1).mul(ntokens).long(), volatile=True)
if args.cuda:
input.data = input.data.cuda()
with open(args.outf, 'w') as outf:
for i in range(args.words):
output, hidden = model(input, hidden)
word_weights = output.squeeze().data.div(args.temperature).exp().cpu()
word_idx = torch.multinomial(word_weights, 1)[0]
input.data.fill_(word_idx)
word = corpus.dictionary.idx2word[word_idx]
outf.write(word + ('\n' if i % 20 == 19 else ' '))
if i % args.log_interval == 0:
print('| Generated {}/{} words'.format(i, args.words))
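The generation loop above divides the logits by a temperature before exponentiating and sampling with `torch.multinomial`. A minimal, torch-free sketch of that sampling rule (the function name and example logits are hypothetical, not part of the repo):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    # Scale logits by 1/temperature: T < 1 sharpens the distribution,
    # T > 1 flattens it, increasing diversity (as the --temperature help says).
    scaled = [l / temperature for l in logits]
    # Subtract the max before exponentiating, for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # Analogue of torch.multinomial(word_weights, 1): draw one index
    # with probability proportional to its weight.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# With a very low temperature, sampling collapses onto the argmax.
print(sample_with_temperature([1.0, 5.0, 2.0], temperature=0.01))
```

This is why the script rejects temperatures below 1e-3: dividing by a near-zero temperature overflows the exponentials.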
================================================
FILE: awd-lstm-lm/getdata.sh
================================================
echo "=== Acquiring datasets ==="
echo "---"
mkdir -p save
mkdir -p data
cd data
#echo "- Downloading WikiText-2 (WT2)"
#wget --quiet --continue https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
#unzip -q wikitext-2-v1.zip
#cd wikitext-2
#mv wiki.train.tokens train.txt
#mv wiki.valid.tokens valid.txt
#mv wiki.test.tokens test.txt
#cd ..
#
#echo "- Downloading WikiText-103 (WT103)"
#wget --continue https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
#unzip -q wikitext-103-v1.zip
#cd wikitext-103
#mv wiki.train.tokens train.txt
#mv wiki.valid.tokens valid.txt
#mv wiki.test.tokens test.txt
#cd ..
#
#echo "- Downloading enwik8 (Character)"
#mkdir -p enwik8
#cd enwik8
#wget --continue http://mattmahoney.net/dc/enwik8.zip
#python prep_enwik8.py
#cd ..
echo "- Downloading Penn Treebank (PTB)"
wget --quiet --continue http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
tar -xzf simple-examples.tgz
mkdir -p penn
cd penn
mv ../simple-examples/data/ptb.train.txt train.txt
mv ../simple-examples/data/ptb.test.txt test.txt
mv ../simple-examples/data/ptb.valid.txt valid.txt
cd ..
#echo "- Downloading Penn Treebank (Character)"
#mkdir -p pennchar
#cd pennchar
#mv ../simple-examples/data/ptb.char.train.txt train.txt
#mv ../simple-examples/data/ptb.char.test.txt test.txt
#mv ../simple-examples/data/ptb.char.valid.txt valid.txt
#cd ..
#
rm -rf simple-examples/
# echo "- Downloading WikiText-2 (WT2)"
# wget --quiet --continue https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-v1.zip
# unzip -q wikitext-2-v1.zip
# cd wikitext-2
# mv wiki.train.tokens train.txt
# mv wiki.valid.tokens valid.txt
# mv wiki.test.tokens test.txt
#
echo "---"
echo "Happy language modeling :)"
================================================
FILE: awd-lstm-lm/locked_dropout.py
================================================
import torch
import torch.nn as nn
from torch.autograd import Variable
class LockedDropout(nn.Module):
def __init__(self):
super().__init__()
def forward(self, x, dropout=0.5):
if not self.training or not dropout:
return x
m = x.data.new(1, x.size(1), x.size(2)).bernoulli_(1 - dropout)
mask = Variable(m, requires_grad=False) / (1 - dropout)
mask = mask.expand_as(x)
return mask * x
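LockedDropout samples a single Bernoulli mask of shape (1, batch, features) and expands it over the whole sequence, so a dropped unit stays dropped at every timestep; kept units are rescaled by 1/(1-p) so the expected activation is unchanged. A rough list-based sketch of that invariant, without torch (names hypothetical):

```python
import random

def locked_dropout(x, p=0.5, rng=random):
    """x: a sequence of timesteps, each a list of feature values.
    One mask is sampled per feature and reused at every timestep,
    mirroring LockedDropout's single (1, batch, features) mask."""
    if not p:
        return x
    nfeat = len(x[0])
    # Inverted dropout: keep with prob 1-p, scale kept units by 1/(1-p).
    mask = [(1.0 / (1.0 - p)) if rng.random() > p else 0.0 for _ in range(nfeat)]
    return [[m * v for m, v in zip(mask, step)] for step in x]

seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = locked_dropout(seq, p=0.5)
# Any feature zeroed at one timestep is zeroed at all timesteps.
```

Ordinary `nn.Dropout` would instead resample the mask at every timestep, breaking temporal consistency of the noise.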
================================================
FILE: awd-lstm-lm/main.py
================================================
import OpusHdfsCopy
from OpusHdfsCopy import transferFileToHdfsDir, checkHdfs
import argparse
import time
import math
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
import pdb
import data
import model
from utils import batchify, get_batch, repackage_hidden
parser = argparse.ArgumentParser(description='PyTorch PennTreeBank RNN/LSTM Language Model')
parser.add_argument('--data', type=str, default='data/penn/',
help='location of the data corpus')
parser.add_argument('--model', type=str, default='PLASTICLSTM',
help='type of recurrent net (LSTM, QRNN, GRU, PLASTICLSTM, MYLSTM, FASTPLASTICLSTM, SIMPLEPLASTICLSTM)')
parser.add_argument('--alphatype', type=str, default='full',
help="type of alpha matrix: (full, perneuron, single)")
parser.add_argument('--modultype', type=str, default='none',
help="type of modulation: (none, modplasth2mod, modplastc2mod)")
parser.add_argument('--modulout', type=str, default='single',
help="modulatory output (single or fanout)")
parser.add_argument('--cliptype', type=str, default='clip',
help="clip type (decay, clip, aditya)")
parser.add_argument('--hebboutput', type=str, default='i2c',
help='output used for hebbian computations (i2c, h2co, cell, hidden)')
parser.add_argument('--emsize', type=int, default=400,
help='size of word embeddings')
parser.add_argument('--nhid', type=int, default=1150,
help='number of hidden units per layer')
parser.add_argument('--nlayers', type=int, default=3,
help='number of layers')
parser.add_argument('--clipval', type=float, default=2.0,
help='value of the hebbian trace clipping')
parser.add_argument('--lr', type=float, default=30,
help='initial learning rate')
parser.add_argument('--agdiv', type=float, default=1150.0,
help='divider of the gradient of alpha')
parser.add_argument('--clip', type=float, default=0.25,
help='gradient clipping')
parser.add_argument('--epochs', type=int, default=300,
help='upper epoch limit')
parser.add_argument('--batch_size', type=int, default=80, metavar='N',
help='batch size')
parser.add_argument('--bptt', type=int, default=70,
help='sequence length')
parser.add_argument('--dropout', type=float, default=0.4,
help='dropout applied to layers (0 = no dropout)')
parser.add_argument('--dropouth', type=float, default=0.3,
help='dropout for rnn layers (0 = no dropout)')
parser.add_argument('--dropouti', type=float, default=0.65,
help='dropout for input embedding layers (0 = no dropout)')
parser.add_argument('--dropoute', type=float, default=0.1,
help='dropout to remove words from embedding layer (0 = no dropout)')
parser.add_argument('--proplstm', type=float, default=0.5,
help='for split-lstms: proportion of LSTM cells in the recurrent layer')
parser.add_argument('--wdrop', type=float, default=0.5,
help='amount of weight dropout to apply to the RNN hidden to hidden matrix')
parser.add_argument('--seed', type=int, default=1111,
help='random seed')
parser.add_argument('--asgdtime', type=int, default=-1,
help='number of epochs before switching to ASGD (if positive)')
parser.add_argument('--nonmono', type=int, default=5,
help='range of non-monotonicity before switching to ASGD (if asgdtime is negative)')
parser.add_argument('--cuda', action='store_false',
help='disable CUDA (CUDA is enabled by default; passing --cuda turns it off)')
parser.add_argument('--numgpu', type=int, default=0,
help='which GPU to use (no effect if CUDA is not used)')
parser.add_argument('--log-interval', type=int, default=200, metavar='N',
help='report interval')
randomhash = ''.join(str(time.time()).split('.'))
parser.add_argument('--save', type=str, default=randomhash+'.pt',
help='path to save the final model')
parser.add_argument('--alpha', type=float, default=2,
help='alpha L2 regularization on RNN activation (alpha = 0 means no regularization)')
parser.add_argument('--beta', type=float, default=1,
help='beta slowness regularization applied on RNN activation (beta = 0 means no regularization)')
parser.add_argument('--wdecay', type=float, default=1.2e-6,
help='weight decay applied to all weights')
parser.add_argument('--resume', type=str, default='',
help='path of model to resume')
parser.add_argument('--optimizer', type=str, default='sgd',
help='optimizer to use (sgd, adam)')
parser.add_argument('--when', nargs="+", type=int, default=[-1],
help='When (which epochs) to divide the learning rate by 10 - accepts multiple')
args = parser.parse_args()
args.tied = True
# Set the random seed manually for reproducibility.
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if torch.cuda.is_available():
if not args.cuda:
print("WARNING: You have a CUDA device, but CUDA was disabled (note: --cuda is a store_false flag, so passing it turns CUDA off)")
else:
torch.cuda.manual_seed(args.seed)
else:
print("NOTE: no CUDA device detected.")
import platform
print("PyTorch version:", torch.__version__, "Numpy version:", np.version.version, "Python version:", platform.python_version(), "GPU used (if any):", args.numgpu)
###############################################################################
# Load data
###############################################################################
def model_save(fn):
with open(fn, 'wb') as f:
torch.save([model, criterion, optimizer], f)
def model_load(fn):
global model, criterion, optimizer
with open(fn, 'rb') as f:
model, criterion, optimizer = torch.load(f)
import os
import hashlib
fn = 'corpus.{}.data'.format(hashlib.md5(args.data.encode()).hexdigest())
if os.path.exists(fn):
print('Loading cached dataset...')
corpus = torch.load(fn)
else:
print('Producing dataset...')
corpus = data.Corpus(args.data)
torch.save(corpus, fn)
eval_batch_size = 10
test_batch_size = 1
train_data = batchify(corpus.train, args.batch_size, args)
val_data = batchify(corpus.valid, eval_batch_size, args)
test_data = batchify(corpus.test, test_batch_size, args)
#train_data = train_data[:5000,:] # For debugging
###############################################################################
# Build the model
###############################################################################
from splitcross import SplitCrossEntropyLoss
criterion = None
ntokens = len(corpus.dictionary)
# Configuration parameters of the plastic LSTM. See mylstm.py for details.
myparams={}
myparams['clipval'] = args.clipval
myparams['cliptype'] = args.cliptype
myparams['modultype'] = args.modultype
myparams['modulout'] = args.modulout
myparams['hebboutput'] = args.hebboutput
myparams['alphatype'] = args.alphatype
suffix = '_SqUsq_'+args.model+'_'+myparams['cliptype']+'_cv'+str(myparams['clipval'])+'_'+myparams['modultype']+'_'+myparams['modulout']+'_'+myparams['hebboutput']+'_'+myparams['alphatype']+'_asgdtime'+str(args.asgdtime)+'_agdiv'+str(int(args.agdiv))+'_lr'+str(args.lr)+'_'+str(args.nlayers)+'l_'+str(args.nhid)+'h_'+str(args.proplstm)+'lstm_rngseed'+str(args.seed)
print("Suffix:", suffix)
MODELFILENAME = 'model_'+suffix+'.dat'
RESULTSFILENAME = 'results_'+suffix+'.txt'
FILENAMESTOSAVE = [MODELFILENAME, RESULTSFILENAME] # We will append to this list the additional files at each learning rate reduction, if any
print("Plasticity and neuromodulation parameters:", myparams)
model = model.RNNModel(args.model, ntokens, args.emsize, args.nhid, args.proplstm, args.nlayers, args.dropout, args.dropouth, args.dropouti, args.dropoute, args.wdrop, args.tied, myparams)
###
if args.resume:
print('Resuming model ...')
model_load(args.resume)
optimizer.param_groups[0]['lr'] = args.lr
model.dropouti, model.dropouth, model.dropout, model.dropoute = args.dropouti, args.dropouth, args.dropout, args.dropoute
if args.wdrop:
from weight_drop import WeightDrop
for rnn in model.rnns:
if type(rnn) == WeightDrop: rnn.dropout = args.wdrop
elif rnn.zoneout > 0: rnn.zoneout = args.wdrop
###
if not criterion:
splits = []
if ntokens > 500000:
# One Billion
# This produces fairly even matrix mults for the buckets:
# 0: 11723136, 1: 10854630, 2: 11270961, 3: 11219422
splits = [4200, 35000, 180000]
elif ntokens > 75000:
# WikiText-103
splits = [2800, 20000, 76000]
print('Using', splits)
criterion = SplitCrossEntropyLoss(args.emsize, splits=splits, verbose=False)
###
params = list(model.parameters()) + list(criterion.parameters())
if args.cuda:
model = model.cuda(args.numgpu)
criterion = criterion.cuda(args.numgpu)
params = list(model.parameters()) + list(criterion.parameters())
###
#total_params = sum(x.size()[0] * x.size()[1] if len(x.size()) > 1 else x.size()[0] for x in params if x.size()) # Smerity version, doesn't work when size==3
total_params = sum(x.numel() for x in params if x.numel())
print('Args:', args)
print('Model total parameters:', total_params)
###############################################################################
# Training code
###############################################################################
def evaluate(data_source, batch_size=10):
# Turn on evaluation mode which disables dropout.
model.eval()
with torch.no_grad():
if args.model == 'QRNN': model.reset()
total_loss = 0
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(batch_size)
for i in range(0, data_source.size(0) - 1, args.bptt):
data, targets = get_batch(data_source, i, args, evaluation=True)
output, hidden = model(data, hidden)
total_loss += len(data) * criterion(model.decoder.weight, model.decoder.bias, output, targets).data
hidden = repackage_hidden(hidden)
#return total_loss[0] / len(data_source) # Error under modern PyTorch
return total_loss / len(data_source)
def train():
# Turn on training mode which enables dropout.
if args.model == 'QRNN': model.reset()
total_loss = 0
start_time = time.time()
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(args.batch_size)
batch, i = 0, 0
while i < train_data.size(0) - 1 - 1:
bptt = args.bptt if np.random.random() < 0.95 else args.bptt / 2.
# Prevent excessively small or negative sequence lengths
seq_len = max(5, int(np.random.normal(bptt, 5)))
# There's a very small chance that it could select a very long sequence length resulting in OOM
# NOTE: this was commented out in smerity's code!
seq_len = min(seq_len, args.bptt + 10)
lr2 = optimizer.param_groups[0]['lr']
optimizer.param_groups[0]['lr'] = lr2 * seq_len / args.bptt
model.train()
data, targets = get_batch(train_data, i, args, seq_len=seq_len)
# Starting each batch, we detach the hidden state from how it was previously produced.
# If we didn't, the model would try backpropagating all the way to start of the dataset.
# NOTE: Now 'hidden' includes the Hebbian traces if using plasticity.
hidden = repackage_hidden(hidden)
optimizer.zero_grad()
output, hidden, rnn_hs, dropped_rnn_hs = model(data, hidden, return_h=True)
raw_loss = criterion(model.decoder.weight, model.decoder.bias, output, targets)
loss = raw_loss
# Activation Regularization
if args.alpha: loss = loss + sum(args.alpha * dropped_rnn_h.pow(2).mean() for dropped_rnn_h in dropped_rnn_hs[-1:])
# Temporal Activation Regularization (slowness)
if args.beta: loss = loss + sum(args.beta * (rnn_h[1:] - rnn_h[:-1]).pow(2).mean() for rnn_h in rnn_hs[-1:])
loss.backward()
# When using plastic LSTMs,
# We divide the gradient on the alphas by the number of inputs, i.e.
# the number of recurrent neurons, but only if plasticity is
# 'perneuron' or 'single' (as opposed to 'full').
# This is necessary to preserve stability while using the same learning rate as Merity et al.
if args.model == 'PLASTICLSTM' or args.model == 'SPLITLSTM' or args.model == 'FASTPLASTICLSTM':
if args.alphatype == 'perneuron' or args.alphatype == 'single': # Based on other experiments, this is actually not good for full-plasticity
for x in model.rnns:
if hasattr(x.alpha.grad, 'data'):
x.alpha.grad.data /= args.agdiv
# `clip_grad_norm` helps prevent the exploding gradient problem in RNNs / LSTMs.
if args.clip: torch.nn.utils.clip_grad_norm(model.parameters(), args.clip)
# OPTIMIZATION STEP
optimizer.step()
total_loss += raw_loss.data
optimizer.param_groups[0]['lr'] = lr2
if batch % args.log_interval == 0 and batch > 0:
cur_loss = total_loss / args.log_interval
elapsed = time.time() - start_time
print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:05.5f} | ms/batch {:5.2f} | '
'loss {:5.2f} | ppl {:8.2f} | bpc {:8.3f}'.format(
epoch, batch, len(train_data) // args.bptt, optimizer.param_groups[0]['lr'],
elapsed * 1000 / args.log_interval, cur_loss, math.exp(cur_loss), cur_loss / math.log(2)))
total_loss = 0
start_time = time.time()
###
batch += 1
i += seq_len
# Loop over epochs.
lr = args.lr
best_val_loss = []
stored_loss = 100000000
# At any point you can hit Ctrl + C to break out of training early.
try:
optimizer = None
if args.optimizer == 'sgd':
optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, weight_decay=args.wdecay)
if args.optimizer == 'adam':
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.wdecay)
allvallosses = []
for epoch in range(1, args.epochs+1):
epoch_start_time = time.time()
train()
if 't0' in optimizer.param_groups[0]: # Are we in the ASGD regime?
tmp = {}
for prm in model.parameters():
tmp[prm] = prm.data.clone()
# NOTE (TM): the following line may cause trouble after the switch to ASGD if some declared pytorch Parameters of the network are not actually used in the computational graph
prm.data = optimizer.state[prm]['ax'].clone()
val_loss2 = evaluate(val_data, eval_batch_size)
print('-' * 89)
print('| end of epoch {:3d} (t0 on) | time: {:5.2f}s | valid loss {:5.2f} | '
'valid ppl {:8.2f} | valloss2 ppl {:8.2f}'.format(
epoch, (time.time() - epoch_start_time), val_loss, math.exp(val_loss), math.exp(val_loss2)))
print('-' * 89)
if val_loss2 < stored_loss:
model_save(MODELFILENAME)
print('Saving Averaged!')
stored_loss = val_loss2
for prm in model.parameters():
prm.data = tmp[prm].clone()
allvallosses.append(val_loss2)
else:
val_loss = evaluate(val_data, eval_batch_size)
print('-' * 89)
print('| end of epoch {:3d} | time: {:5.2f}s | valid loss {:5.2f} | '
'valid ppl {:8.2f} | valid bpc {:8.3f}'.format(
epoch, (time.time() - epoch_start_time), val_loss, math.exp(val_loss), val_loss / math.log(2)))
print('-' * 89)
if val_loss < stored_loss:
model_save(MODELFILENAME)
print('Saving model (new best validation)')
stored_loss = val_loss
if args.optimizer == 'sgd' and 't0' not in optimizer.param_groups[0]:
if (args.asgdtime < 0 and len(best_val_loss)>args.nonmono and val_loss > min(best_val_loss[:-args.nonmono])) or (args.asgdtime > 0 and len(best_val_loss) == args.asgdtime) :
print('Switching to ASGD')
optimizer = torch.optim.ASGD(model.parameters(), lr=args.lr, t0=0, lambd=0., weight_decay=args.wdecay)
if epoch in args.when:
print('Saving model before learning rate decreased')
EPOCHFILENAME = '{}.e{}'.format(MODELFILENAME, epoch)
model_save(EPOCHFILENAME)
FILENAMESTOSAVE.append(EPOCHFILENAME)
print('Dividing learning rate by 10')
optimizer.param_groups[0]['lr'] /= 10.
best_val_loss.append(val_loss)
allvallosses.append(val_loss)
np.savetxt(RESULTSFILENAME, allvallosses)
# Saving files remotely.... (Uber only!)
if os.path.isdir('/mnt/share/tmiconi'):
print("Transferring to NFS storage...")
for fn in FILENAMESTOSAVE:
result = os.system(
'cp {} {}'.format(fn, '/mnt/share/tmiconi/ptb/'+fn))
print("Done!")
#if checkHdfs():
# print("Transferring to HDFS...")
# for fn in FILENAMESTOSAVE:
# transferFileToHdfsDir(fn, '/ailabs/tmiconi/ptb/')
except KeyboardInterrupt:
print('-' * 89)
print('Exiting from training early')
# Load the best saved model.
model_load(MODELFILENAME)
# Run on test data.
test_loss = evaluate(test_data, test_batch_size)
print('=' * 89)
print('| End of training | test loss {:5.2f} | test ppl {:8.2f} | test bpc {:8.3f}'.format(
test_loss, math.exp(test_loss), test_loss / math.log(2)))
print('=' * 89)
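The `train()` loop above samples a random sequence length around `args.bptt` (usually the full length, occasionally half), clamps it, and rescales the learning rate by `seq_len / bptt` so shorter chunks contribute proportionally smaller updates. A sketch of that schedule in isolation (function name hypothetical, constants taken from the defaults above):

```python
import random

def sample_seq_len_and_lr(bptt=70, base_lr=30.0, rng=random):
    # 95% of the time use the full bptt; occasionally halve it,
    # as in main.py's train() loop.
    mean = bptt if rng.random() < 0.95 else bptt / 2.0
    # Gaussian jitter around the mean; clamp to prevent excessively
    # small (or negative) lengths and OOM-inducing long ones.
    seq_len = max(5, int(rng.gauss(mean, 5)))
    seq_len = min(seq_len, bptt + 10)
    # Rescale the learning rate so each chunk's update is proportional
    # to the number of tokens it contains.
    lr = base_lr * seq_len / bptt
    return seq_len, lr
```

In `train()` the original learning rate is restored after each `optimizer.step()`, so the rescaling applies per chunk only.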
================================================
FILE: awd-lstm-lm/model.py
================================================
import torch
import torch.nn as nn
#from torch.autograd import Variable
from embed_regularize import embedded_dropout
from locked_dropout import LockedDropout
from weight_drop import WeightDrop
import random, pdb
import mylstm
class RNNModel(nn.Module):
"""Container module with an encoder, a recurrent module, and a decoder."""
def __init__(self, rnn_type, ntoken, ninp, nhid, proplstm, nlayers, dropout=0.5, dropouth=0.5, dropouti=0.5, dropoute=0.1, wdrop=0, tie_weights=False, params={}):
super(RNNModel, self).__init__()
self.lockdrop = LockedDropout()
self.idrop = nn.Dropout(dropouti)
self.hdrop = nn.Dropout(dropouth)
self.drop = nn.Dropout(dropout)
self.encoder = nn.Embedding(ntoken, ninp)
assert rnn_type in ['LSTM', 'QRNN', 'GRU', 'MYLSTM', 'MYFASTLSTM', 'SIMPLEPLASTICLSTM', 'FASTPLASTICLSTM', 'PLASTICLSTM', 'SPLITLSTM'], 'RNN type is not supported'
if rnn_type == 'LSTM':
self.rnns = [torch.nn.LSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), 1, dropout=0) for l in range(nlayers)]
#for rr in self.rnns:
# rr.flatten_parameters()
if wdrop:
print("Using WeightDrop!")
self.rnns = [WeightDrop(rnn, ['weight_hh_l0'], dropout=wdrop) for rnn in self.rnns]
elif rnn_type == 'MYLSTM':
self.rnns = [mylstm.MyLSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid)) for l in range(nlayers)]
elif rnn_type == 'MYFASTLSTM':
self.rnns = [mylstm.MyFastLSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid)) for l in range(nlayers)]
elif rnn_type == 'PLASTICLSTM':
self.rnns = [mylstm.PlasticLSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), params) for l in range(nlayers)]
elif rnn_type == 'SIMPLEPLASTICLSTM':
# Note that this one ignores the 'params' argument, which is only kept to preserve identical signature with PlasticLSTM
self.rnns = [mylstm.SimplePlasticLSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), params) for l in range(nlayers)]
elif rnn_type == 'FASTPLASTICLSTM':
self.rnns = [mylstm.MyFastPlasticLSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), params) for l in range(nlayers)]
elif rnn_type == 'SPLITLSTM': # Not used
self.rnns = [mylstm.SplitLSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), proplstm, params) for l in range(nlayers)]
elif rnn_type == 'GRU':
self.rnns = [torch.nn.GRU(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else ninp, 1, dropout=0) for l in range(nlayers)]
if wdrop:
self.rnns = [WeightDrop(rnn, ['weight_hh_l0'], dropout=wdrop) for rnn in self.rnns]
elif rnn_type == 'QRNN':
from torchqrnn import QRNNLayer
self.rnns = [QRNNLayer(input_size=ninp if l == 0 else nhid, hidden_size=nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), save_prev_x=True, zoneout=0, window=2 if l == 0 else 1, output_gate=True) for l in range(nlayers)]
for rnn in self.rnns:
rnn.linear = WeightDrop(rnn.linear, ['weight'], dropout=wdrop)
print(self.rnns)
self.rnns = torch.nn.ModuleList(self.rnns)
self.decoder = nn.Linear(nhid, ntoken)
# Optionally tie weights as in:
# "Using the Output Embedding to Improve Language Models" (Press & Wolf 2016)
# https://arxiv.org/abs/1608.05859
# and
# "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling" (Inan et al. 2016)
# https://arxiv.org/abs/1611.01462
if tie_weights:
#if nhid != ninp:
# raise ValueError('When using the tied flag, nhid must be equal to emsize')
self.decoder.weight = self.encoder.weight
self.init_weights()
self.rnn_type = rnn_type
self.ninp = ninp
self.nhid = nhid
self.proplstm = proplstm
self.nlayers = nlayers
self.dropout = dropout
self.dropouti = dropouti
self.dropouth = dropouth
self.dropoute = dropoute
self.tie_weights = tie_weights
def reset(self):
if self.rnn_type == 'QRNN': [r.reset() for r in self.rnns]
def init_weights(self):
initrange = 0.1
self.encoder.weight.data.uniform_(-initrange, initrange)
self.decoder.bias.data.fill_(0)
self.decoder.weight.data.uniform_(-initrange, initrange)
def forward(self, input, hidden, return_h=False):
emb = embedded_dropout(self.encoder, input, dropout=self.dropoute if self.training else 0)
#emb = self.idrop(emb)
emb = self.lockdrop(emb, self.dropouti)
raw_output = emb
new_hidden = []
#raw_output, hidden = self.rnn(emb, hidden)
raw_outputs = []
outputs = []
for l, rnn in enumerate(self.rnns):
current_input = raw_output
# Each rnn is a layer!
# each raw_output has shape seq_len x batch_size x nb_hidden
# new_h is a tuple of 2 elements, each of size 1 x batch_size x nb_hidden (last h and last c)
if self.rnn_type not in ('MYLSTM', 'MYFASTLSTM', 'SIMPLEPLASTICLSTM', 'PLASTICLSTM', 'FASTPLASTICLSTM', 'SPLITLSTM'):
raw_output, new_h = rnn(raw_output, hidden[l])
else:
single_h = hidden[l] # actually a tuple, includes the h and the c (and for plastic LSTMs, includes Hebb as third element!)
singleouts = []
for z in range(raw_output.shape[0]):
singleout, single_h = rnn(raw_output[z], single_h)
#if z==0:
# print("RANDOM NUMBER 1:",float(torch.rand(1)))
singleouts.append(singleout)
new_h = single_h # the last (h,c[,hebb]) after the sequence is processed
raw_output = torch.stack(singleouts)
new_hidden.append(new_h)
raw_outputs.append(raw_output)
if l != self.nlayers - 1:
#self.hdrop(raw_output)
# lockdrop will zero out some output units over the whole sequence (separately chosen for each batch, but fixed across sequence)
#pdb.set_trace()
raw_output = self.lockdrop(raw_output, self.dropouth)
outputs.append(raw_output)
#pdb.set_trace()
hidden = new_hidden
#pdb.set_trace()
output = self.lockdrop(raw_output, self.dropout)
outputs.append(output)
result = output.view(output.size(0)*output.size(1), output.size(2))
if return_h:
return result, hidden, raw_outputs, outputs
return result, hidden
def init_hidden(self, bsz):
weight = next(self.parameters()).data
if self.rnn_type == 'MYLSTM' or self.rnn_type == 'MYFASTLSTM':
return [((weight.new(bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()),
(weight.new(bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()))
for l in range(self.nlayers)]
elif self.rnn_type == 'PLASTICLSTM' or self.rnn_type == 'SIMPLEPLASTICLSTM':
return [(
(weight.new(bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()), # h state
(weight.new(bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()), # c state
(weight.new(bsz, self.rnns[l].w.shape[0], self.rnns[l].w.shape[1]).zero_()) # hebbian trace for the recurrent weights
#(weight.new(bsz, self.rnns[l].isize, self.rnns[l].hsize).zero_()) # hebbian trace for the input weights (not necessarily used)
)
for l in range(self.nlayers)]
elif self.rnn_type == 'FASTPLASTICLSTM':
return [(
(weight.new(bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()), # h state
(weight.new(bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()), # c state
(weight.new(bsz, self.rnns[l].hsize, self.rnns[l].hsize).zero_()) # hebbian trace of recurrent weights
#(weight.new(bsz, self.rnns[l].isize, self.rnns[l].hsize).zero_()) # hebbian trace for the input weights (not necessarily used)
#(weight.new(bsz, self.rnns[l].w.shape[0], self.rnns[l].w.shape[1]).zero_()), # hebbian trace for the recurrent weights
#(weight.new(bsz, self.rnns[l].win.shape[0], self.rnns[l].win.shape[1]).zero_()) # hebbian trace for the input weights (not necessarily used)
)
for l in range(self.nlayers)]
elif self.rnn_type == 'SPLITLSTM':
return [(
(weight.new(bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()), # H state
(weight.new(bsz, self.rnns[l].lsize ).zero_()), # C state
(weight.new(bsz, self.rnns[l].w.shape[0], self.rnns[l].w.shape[1]).zero_()), # hebb
(weight.new(bsz, self.rnns[l].win.shape[0], self.rnns[l].win.shape[1]).zero_()) # hebbin
)
for l in range(self.nlayers)]
elif self.rnn_type == 'LSTM' :
return [((weight.new(1, bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()),
(weight.new(1, bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()))
for l in range(self.nlayers)]
elif self.rnn_type == 'QRNN' or self.rnn_type == 'GRU':
return [(weight.new(1, bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_())
for l in range(self.nlayers)]
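Every branch of `RNNModel.__init__` and `init_hidden` above repeats the same size rule: layer 0 takes the embedding size as input, inner layers use `nhid`, and the last layer's output shrinks back to `ninp` when weights are tied (so the decoder can share the encoder's embedding matrix). A small helper making that rule explicit (hypothetical, not part of the repo):

```python
def layer_sizes(ninp, nhid, nlayers, tie_weights):
    """Return (input_size, output_size) per layer, mirroring the
    `ninp if l == 0 else nhid` and
    `nhid if l != nlayers - 1 else (ninp if tie_weights else nhid)`
    expressions used throughout RNNModel."""
    sizes = []
    for l in range(nlayers):
        in_size = ninp if l == 0 else nhid
        out_size = nhid if l != nlayers - 1 else (ninp if tie_weights else nhid)
        sizes.append((in_size, out_size))
    return sizes

# AWD-LSTM defaults: 400-dim embeddings, 1150 hidden units, 3 layers, tied.
print(layer_sizes(400, 1150, 3, True))  # [(400, 1150), (1150, 1150), (1150, 400)]
```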
================================================
FILE: awd-lstm-lm/model.py.old
================================================
import torch
import torch.nn as nn
from torch.autograd import Variable
from embed_regularize import embedded_dropout
from locked_dropout import LockedDropout
from weight_drop import WeightDrop
class RNNModel(nn.Module):
"""Container module with an encoder, a recurrent module, and a decoder."""
def __init__(self, rnn_type, ntoken, ninp, nhid, nlayers, dropout=0.5, dropouth=0.5, dropouti=0.5, dropoute=0.1, wdrop=0, tie_weights=False):
super(RNNModel, self).__init__()
self.lockdrop = LockedDropout()
self.idrop = nn.Dropout(dropouti)
self.hdrop = nn.Dropout(dropouth)
self.drop = nn.Dropout(dropout)
self.encoder = nn.Embedding(ntoken, ninp)
assert rnn_type in ['LSTM', 'QRNN', 'GRU'], 'RNN type is not supported'
if rnn_type == 'LSTM':
self.rnns = [torch.nn.LSTM(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), 1, dropout=0) for l in range(nlayers)]
if wdrop:
self.rnns = [WeightDrop(rnn, ['weight_hh_l0'], dropout=wdrop) for rnn in self.rnns]
if rnn_type == 'GRU':
self.rnns = [torch.nn.GRU(ninp if l == 0 else nhid, nhid if l != nlayers - 1 else ninp, 1, dropout=0) for l in range(nlayers)]
if wdrop:
self.rnns = [WeightDrop(rnn, ['weight_hh_l0'], dropout=wdrop) for rnn in self.rnns]
elif rnn_type == 'QRNN':
from torchqrnn import QRNNLayer
self.rnns = [QRNNLayer(input_size=ninp if l == 0 else nhid, hidden_size=nhid if l != nlayers - 1 else (ninp if tie_weights else nhid), save_prev_x=True, zoneout=0, window=2 if l == 0 else 1, output_gate=True) for l in range(nlayers)]
for rnn in self.rnns:
rnn.linear = WeightDrop(rnn.linear, ['weight'], dropout=wdrop)
print(self.rnns)
self.rnns = torch.nn.ModuleList(self.rnns)
self.decoder = nn.Linear(nhid, ntoken)
# Optionally tie weights as in:
# "Using the Output Embedding to Improve Language Models" (Press & Wolf 2016)
# https://arxiv.org/abs/1608.05859
# and
# "Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling" (Inan et al. 2016)
# https://arxiv.org/abs/1611.01462
if tie_weights:
#if nhid != ninp:
# raise ValueError('When using the tied flag, nhid must be equal to emsize')
self.decoder.weight = self.encoder.weight
self.init_weights()
self.rnn_type = rnn_type
self.ninp = ninp
self.nhid = nhid
self.nlayers = nlayers
self.dropout = dropout
self.dropouti = dropouti
self.dropouth = dropouth
self.dropoute = dropoute
self.tie_weights = tie_weights
def reset(self):
if self.rnn_type == 'QRNN': [r.reset() for r in self.rnns]
def init_weights(self):
initrange = 0.1
self.encoder.weight.data.uniform_(-initrange, initrange)
self.decoder.bias.data.fill_(0)
self.decoder.weight.data.uniform_(-initrange, initrange)
def forward(self, input, hidden, return_h=False):
emb = embedded_dropout(self.encoder, input, dropout=self.dropoute if self.training else 0)
#emb = self.idrop(emb)
emb = self.lockdrop(emb, self.dropouti)
raw_output = emb
new_hidden = []
#raw_output, hidden = self.rnn(emb, hidden)
raw_outputs = []
outputs = []
for l, rnn in enumerate(self.rnns):
current_input = raw_output
raw_output, new_h = rnn(raw_output, hidden[l])
new_hidden.append(new_h)
raw_outputs.append(raw_output)
if l != self.nlayers - 1:
#self.hdrop(raw_output)
raw_output = self.lockdrop(raw_output, self.dropouth)
outputs.append(raw_output)
hidden = new_hidden
output = self.lockdrop(raw_output, self.dropout)
outputs.append(output)
result = output.view(output.size(0)*output.size(1), output.size(2))
if return_h:
return result, hidden, raw_outputs, outputs
return result, hidden
def init_hidden(self, bsz):
weight = next(self.parameters()).data
if self.rnn_type == 'LSTM':
return [(Variable(weight.new(1, bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()),
Variable(weight.new(1, bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_()))
for l in range(self.nlayers)]
elif self.rnn_type == 'QRNN' or self.rnn_type == 'GRU':
return [Variable(weight.new(1, bsz, self.nhid if l != self.nlayers - 1 else (self.ninp if self.tie_weights else self.nhid)).zero_())
for l in range(self.nlayers)]
================================================
FILE: awd-lstm-lm/mylstm.py
================================================
# Plastic LSTMs, with neuromodulation (backpropamine),
# as described in Miconi et al. ICLR 2019,
# by Thomas Miconi and Aditya Rawal.
# Copyright (c) 2018-2019 Uber Technologies, Inc.
#
# Licensed under the Uber Non-Commercial License (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at the root directory of this project.
import torch
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F
import numpy as np
import pdb
# SimplePlasticLSTM is a full-fledged implementation of Plastic LSTMs that uses
# default settings and is not parametrizable beyond input size and hidden size.
# This allows for simpler code and easier understanding. See "PlasticLSTM"
# below for a more customizable version.
class SimplePlasticLSTM(nn.Module):
def __init__(self, isize, hsize, params): # Note that 'params' is ignored for this class; we keep it to preserve the constructor's signature
super(SimplePlasticLSTM, self).__init__()
self.softmax= torch.nn.functional.softmax
self.activ = F.tanh
# Plastic connection trainable parameters, i.e. w and alpha:
self.w = torch.nn.Parameter(.02 * torch.rand(hsize, hsize) - .01)
self.alpha = torch.nn.Parameter(.0001 * torch.rand(1,1,hsize)) # One alpha per neuron (all incoming connections to a neuron share same alpha)
#self.alpha = torch.nn.Parameter(.0001 * torch.ones(1)) # One alpha for the whole network
#self.alpha = torch.nn.Parameter(.0001 * torch.rand(hsize, hsize)) # One alpha per connection
self.h2f = torch.nn.Linear(hsize, hsize)
self.h2i = torch.nn.Linear(hsize, hsize)
self.h2opt = torch.nn.Linear(hsize, hsize)
#self.h2c = torch.nn.Linear(hsize, hsize) # This (equivalent to Whg in PyTorch LSTM docs / Uc in Wikipedia description of LSTM) is replaced by the plastic connection
self.x2f = torch.nn.Linear(isize, hsize)
self.x2opt = torch.nn.Linear(isize, hsize)
self.x2i = torch.nn.Linear(isize, hsize)
self.x2c = torch.nn.Linear(isize, hsize)
# Modulator output (M(t))
self.h2mod = torch.nn.Linear(hsize, 1) # Takes input from the h-state, computes the neuromodulator output
self.modfanout = torch.nn.Linear(1, hsize) # Projects the network's common neuromodulator output onto each neuron
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h, c and hebb
hebb = hidden[2]
fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
# To implement plasticity, we replace h2c / Whg / Uc with a plastic connection composed of w, alpha and hebb
# Note that h2c / Whg / Uc is the matrix of weights that takes in the
# previous time-step h, and whose output (after adding the current input
# and passing through tanh) is multiplied by the input gates before being
# added to the cell state
# Note: Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
# This is probably not the most elegant way to do it, but it works (remember that there is one alpha per neuron, applied to all input connections of this neuron)
h2coutput = hidden[0].unsqueeze(1).bmm(self.w + torch.mul(self.alpha, hebb)).squeeze(1)
x2coutput = self.x2c(inputs)
inputstocell = F.tanh(x2coutput + h2coutput) # This intermediary state is reused in the Hebbian computations below
# Finally, compute the new cell and hidden states
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, inputstocell)
hactiv = torch.mul(opt, F.tanh(cell))
# Now we need to update the Hebbian traces, including any neuromodulation.
deltahebb = torch.bmm(hidden[0].unsqueeze(2), inputstocell.unsqueeze(1))
myeta = F.tanh(self.h2mod(hactiv)).unsqueeze(2) # Shape: BatchSize x 1 x 1
# The output of the following line has shape BatchSize x 1 x NHidden, i.e. 1 line and NHidden columns for each
# batch element.
# When multiplying by deltahebb (BatchSize x NHidden x NHidden), broadcasting will provide a different
# value for each column but the same value for all rows within each column. This is equivalent to providing
# the same neuromodulation to all the inputs to a given cell, while letting neuromodulation differ from
# cell to cell, as required for the fanout concept.
myeta = self.modfanout(myeta).squeeze().unsqueeze(1)
hebb = torch.clamp(hebb + myeta * deltahebb, min=-2.0, max=2.0)
# Note that "hactiv" (i.e. the new h-state) is duplicated in the return
# values. This is to maintain the signature used by main.py/model.py (which is from Merity et al.'s code)
# and is not necessary for other applications.
hidden = (hactiv, cell, hebb)
activout = hactiv
return activout, hidden
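The plastic-connection arithmetic in the forward pass above can be checked in isolation. The following is a minimal numpy sketch (not part of the repository; sizes, names and the scalar eta are chosen purely for illustration) of the effective weight `w + alpha * hebb`, the batched matrix product, and the clamped outer-product trace update:

```python
import numpy as np

B, H = 2, 4  # illustrative batch size and hidden size
rng = np.random.default_rng(0)

w = rng.standard_normal((H, H)) * 0.01         # fixed (trainable) weights
alpha = rng.standard_normal((1, 1, H)) * 1e-4  # one alpha per neuron (per column)
hebb = np.zeros((B, H, H))                     # Hebbian trace, one matrix per batch element
h = rng.standard_normal((B, H))                # previous hidden state
inputstocell = np.tanh(rng.standard_normal((B, H)))  # stand-in for the tanh'd cell input

# Effective plastic weight: w + alpha * hebb, broadcast over the batch
effective = w + alpha * hebb
# Same as hidden.unsqueeze(1).bmm(effective).squeeze(1) in the PyTorch code
h2coutput = np.einsum('bi,bij->bj', h, effective)

# Hebbian update: outer product of pre (h) and post (inputstocell), hard-clamped
deltahebb = np.einsum('bi,bj->bij', h, inputstocell)
eta = 0.01  # a plain scalar here; the module computes it from the hidden state
hebb = np.clip(hebb + eta * deltahebb, -2.0, 2.0)

assert h2coutput.shape == (B, H)
assert hebb.shape == (B, H, H)
```

Because the trace starts at zero, the first step reduces to an ordinary fixed-weight product; the trace only bends the effective weights on later steps.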
# A more customizable version of plastic LSTMs, using parameters passed in the 'params' argument.
class PlasticLSTM(nn.Module):
def __init__(self, isize, hsize, params):
super(PlasticLSTM, self).__init__()
self.softmax= torch.nn.functional.softmax
#if params['activ'] == 'tanh':
self.activ = F.tanh
# Default values for configuration parameters:
self.cliptype, self.modultype, self.hebboutput, self.modulout, self.clipval, self.alphatype = 'clip', 'modplasth2mod', 'i2c', 'fanout', 2.0, 'perneuron'
# Description of the parameters:
# alphatype: do we have one alpha coefficient for each connection
# ('full'), one per neuron ('perneuron' - i.e. all input connections to
# a given neuron share the same alpha), or one for the entire network
# ('single')?
# modultype: 'none' (non-modulated plasticity) , 'modplasth2mod'
# (neuromodulation takes input from the current h-state) or
# 'modplastc2mod' (neuromodulation takes input from the current
# c-state).
# cliptype: 'clip', 'aditya' or 'decay' - specifies how the Hebbian traces should be constrained.
# clipval: maximum magnitude of the Hebbian trace values (default 2.0)
# modulout: 'single' (all connections receive the same neuromodulator
# output) or 'fanout' (neuromodulator input goes through a 1xN linear layer to reach each neuron)
# hebboutput: what counts as the "output" in the Hebbian product of input by output. Better to leave it at 'i2c'.
if 'cliptype' in params:
self.cliptype = params['cliptype']
if 'modultype' in params:
self.modultype = params['modultype']
if 'hebboutput' in params:
self.hebboutput = params['hebboutput']
if 'modulout' in params:
self.modulout= params['modulout']
if 'clipval' in params:
self.clipval= params['clipval']
if 'alphatype' in params:
self.alphatype= params['alphatype']
# Plastic connection trainable parameters, i.e. w and alpha:
self.w = torch.nn.Parameter(.02 * torch.rand(hsize, hsize) - .01)
if self.alphatype == 'perneuron':
self.alpha = torch.nn.Parameter(.0001 * torch.rand(1,1,hsize))
elif self.alphatype == 'single':
self.alpha = torch.nn.Parameter(.0001 * torch.ones(1))
elif self.alphatype == 'full':
self.alpha = torch.nn.Parameter(.0001 * torch.rand(hsize, hsize))
else:
raise ValueError("Must select an appropriate alpha type (current incorrect value is: " + str(self.alphatype) + ")")
if self.modultype == 'none':
self.eta = torch.nn.Parameter(.01 * torch.ones(1)) # Everyone has the same eta (Note: if a parameter is not actually used, there can be problems with ASGD handling in main.py)
self.h2f = torch.nn.Linear(hsize, hsize)
self.h2i = torch.nn.Linear(hsize, hsize)
self.h2opt = torch.nn.Linear(hsize, hsize)
#self.h2c = torch.nn.Linear(hsize, hsize) # This (equivalent to Whg in PyTorch LSTM docs / Uc in Wikipedia description of LSTM) is replaced by the plastic connection
self.x2f = torch.nn.Linear(isize, hsize)
self.x2opt = torch.nn.Linear(isize, hsize)
self.x2i = torch.nn.Linear(isize, hsize)
self.x2c = torch.nn.Linear(isize, hsize)
if self.modultype != 'none':
# This is the layer that computes the neuromodulator output at any time step, based on current hidden state.
# Although called 'h2mod', it may take input from h or c depending on modultype value
self.h2mod = torch.nn.Linear(hsize, 1)
# Is the modulation just a single scalar, or do we pass it through a 'fanout' weight matrix to get one different value for each target neuron?
if self.modulout == 'fanout':
self.modfanout = torch.nn.Linear(1, hsize)
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h, c and hebb
hebb = hidden[2]
fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
# To implement plasticity, we replace h2c / Whg / Uc with a plastic connection composed of w, alpha and hebb
# Note that h2c / Whg / Uc is the matrix of weights that takes in the
# previous time-step h, and whose output (after adding the current input
# and passing through tanh) is multiplied by the input gates before being
# added to the cell state
# Note: Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
if self.cliptype == 'aditya': # Clipping Hebbian traces a posteriori
h2coutput = hidden[0].unsqueeze(1).bmm(self.w + torch.mul(self.alpha, torch.clamp(hebb, min=-self.clipval, max=self.clipval))).squeeze(1)
else:
h2coutput = hidden[0].unsqueeze(1).bmm(self.w + torch.mul(self.alpha, hebb)).squeeze(1)
x2coutput = self.x2c(inputs)
inputstocell = F.tanh(x2coutput + h2coutput) # This intermediary state is reused in the Hebbian computations below
# Finally, compute the new cell and hidden states
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, inputstocell)
hactiv = torch.mul(opt, F.tanh(cell))
# Now we need to compute the updates to the Hebbian traces, including any neuromodulation.
# For the Hebbian computation, what counts as "output"?
if self.hebboutput == 'i2c':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), inputstocell.unsqueeze(1))
elif self.hebboutput == 'h2co':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), h2coutput.unsqueeze(1))
elif self.hebboutput == 'cell':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), cell.unsqueeze(1))
elif self.hebboutput == 'hidden':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), hactiv.unsqueeze(1))
else:
raise ValueError("Must choose Hebbian target output")
# What is the source of the neuromodulator computation (if any)?
if self.modultype == 'none':
myeta = self.eta
elif self.modultype == 'modplasth2mod': # The neuromodulation takes input from the h-state
myeta = F.tanh(self.h2mod(hactiv)).unsqueeze(2) # Shape: BatchSize x 1 x 1
elif self.modultype == 'modplastc2mod': # The neuromodulation takes input from the c-state
myeta = F.tanh(self.h2mod(cell)).unsqueeze(2)
else:
raise ValueError("Must choose modulation type")
# If we use "fanout" neuromodulation, the neuromodulator output is passed through a (trainable) linear layer before hitting the neurons.
if self.modultype != 'none' and self.modulout == 'fanout':
# The output of the following line has shape BatchSize x 1 x NHidden, i.e. 1 line and NHidden columns for each
# batch element.
# When multiplying by deltahebb (BatchSize x NHidden x NHidden), broadcasting will provide a different
# value for each column but the same value for all rows within each column. This is equivalent to providing
# the same neuromodulation to all the inputs to a given cell, while letting neuromodulation differ from
# cell to cell, as required for the fanout concept.
myeta = self.modfanout(myeta).squeeze().unsqueeze(1)
# Various possible ways to clip the Hebbian trace
if self.cliptype == 'decay': # Exponential decay
hebb = (1 - myeta) * hebb + myeta * deltahebb
elif self.cliptype == 'clip': # Just a hard clip
hebb = torch.clamp(hebb + myeta * deltahebb, min=-self.clipval, max=self.clipval)
elif self.cliptype == 'aditya': # For this one, the clipping only occurs a posteriori (see above); hebb itself can grow arbitrarily
hebb = hebb + myeta * deltahebb
else:
raise ValueError("Must choose clip type")
# Note that "hactiv" (i.e. the new h-state) is duplicated in the return
# values. This is to maintain the signature used by main.py/model.py
# and is not necessary for other applications.
hidden = (hactiv, cell, hebb)
activout = hactiv
return activout, hidden
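The three `cliptype` options differ only in how the updated trace is constrained. A small numpy illustration (the concrete values are arbitrary and not from the repository) of one update step near the clip bound:

```python
import numpy as np

hebb = np.array([1.9])       # current trace value, near the clip bound
deltahebb = np.array([1.0])  # Hebbian outer-product term
eta = np.array([0.5])        # neuromodulated learning rate
clipval = 2.0

# 'decay': exponential decay toward the new term; stays bounded if the inputs are
decay = (1 - eta) * hebb + eta * deltahebb
# 'clip': plain additive update, hard-clamped to [-clipval, clipval]
clip = np.clip(hebb + eta * deltahebb, -clipval, clipval)
# 'aditya': unconstrained accumulation; clipping is applied only when the trace is read
aditya = hebb + eta * deltahebb
```

With these numbers, 'decay' pulls the trace back to 1.45, 'clip' saturates at 2.0, and 'aditya' lets the stored trace grow to 2.4 while the read-out is clipped elsewhere.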
# A slightly faster implementation of plastic LSTMs: it cuts computation time by ~30% by grouping all matrix multiplications into two. Not fully debugged; use at your own risk.
class MyFastPlasticLSTM(nn.Module):
def __init__(self, isize, hsize, params):
super(MyFastPlasticLSTM, self).__init__()
self.softmax= torch.nn.functional.softmax
self.activ = F.tanh
ok=0
if 'cliptype' in params:
self.cliptype = params['cliptype']
ok+=1
if 'modultype' in params:
self.modultype = params['modultype']
ok+=1
if 'hebboutput' in params:
self.hebboutput = params['hebboutput']
ok+=1
if 'modulout' in params:
self.modulout= params['modulout']
ok+=1
if 'clipval' in params:
self.clipval= params['clipval']
ok+=1
if 'alphatype' in params:
self.alphatype= params['alphatype']
ok+=1
if ok < 6:
raise ValueError('When constructing MyFastPlasticLSTM, must pass "params" dictionary including cliptype, clipval, modultype, modulout, alphatype and hebboutput')
# We group all weight matrices into two, just like the C implementation of LSTMs in PyTorch does. Faster!
# Note: this creates some redundant biases (though not many)
self.h2f_i_opt_c = torch.nn.Linear(hsize, 4*hsize) # Weights from h to f, i, o and c
self.x2f_i_opt_c = torch.nn.Linear(isize, 4*hsize) # Weights from x to f, i, o and c
self.isize = isize
self.hsize = hsize
if self.modultype != 'none':
self.h2mod = torch.nn.Linear(hsize, 1) # Although called 'h2mod', it may take input from h or c depending on modultype value
if self.modulout == 'fanout':
self.modfanout = torch.nn.Linear(1, hsize)
if self.alphatype == 'perneuron':
self.alpha = torch.nn.Parameter(.0001 * torch.rand(1,1,hsize))
#self.alpha = Variable(.0001 * torch.ones(1).cuda(), requires_grad=True) #torch.rand(1,1,hsize))
elif self.alphatype == 'single':
self.alpha = torch.nn.Parameter(.0001 * torch.ones(1))
elif self.alphatype == 'full':
self.alpha = torch.nn.Parameter(.0001 * torch.rand(hsize, hsize))
else:
raise ValueError("Must select alpha type (current incorrect value is: " + str(self.alphatype) + ")")
if self.modultype == 'none':
self.eta = torch.nn.Parameter(.01 * torch.ones(1)) # Everyone has the same eta (Note: if a parameter is not actually used, there can be problems with ASGD handling in main.py)
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h and c states
hsize = self.hsize
#fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
#ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
#opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
alloutputs = self.x2f_i_opt_c(inputs) + self.h2f_i_opt_c(hidden[0])
# hidden[0] and hidden[1] are the h state and the c state; hidden[2] is the hebbian trace
hebb = hidden[2]
fgt = F.sigmoid(alloutputs[:,:hsize])
ipt = F.sigmoid(alloutputs[:,hsize:2*hsize])
opt = F.sigmoid(alloutputs[:,2*hsize:3*hsize])
handx2coutput_w = alloutputs[:,3*hsize:]
if self.cliptype == 'aditya':
h2coutput_hebb = hidden[0].unsqueeze(1).bmm(torch.mul(self.alpha, self.clipval * torch.tanh(hebb))).squeeze(1) # Slightly different version
else:
h2coutput_hebb = hidden[0].unsqueeze(1).bmm(torch.mul(self.alpha, hebb)).squeeze(1)
inputtoc = F.tanh(handx2coutput_w + h2coutput_hebb)
# Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, inputtoc)
hactiv = torch.mul(opt, F.tanh(cell))
#if self.hebboutput == 'i2c':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), inputtoc.unsqueeze(1))
if self.modultype == 'none':
myeta = self.eta
elif self.modultype == 'modplasth2mod':
myeta = F.tanh(self.h2mod(hactiv)).unsqueeze(2) # Shape: BatchSize x 1 x 1
elif self.modultype == 'modplastc2mod':
myeta = F.tanh(self.h2mod(cell)).unsqueeze(2)
else:
raise ValueError("Must choose modulation type")
if self.modultype != 'none' and self.modulout == 'fanout':
# Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
# The output of the following line has shape BatchSize x 1 x NHidden, i.e. 1 line and NHidden columns for each
# batch element. When multiplying by hebb (BatchSize x NHidden x NHidden), broadcasting will provide a different
# value of myeta for each cell but the same value for all inputs of a cell, as required by fanout concept.
myeta = self.modfanout(myeta).squeeze().unsqueeze(1)
if self.cliptype == 'decay':
hebb = (1 - myeta) * hebb + myeta * deltahebb
elif self.cliptype == 'clip':
hebb = torch.clamp(hebb + myeta * deltahebb, min=-self.clipval, max=self.clipval)
elif self.cliptype == 'aditya' :
hebb = hebb + myeta * deltahebb
else:
raise ValueError("Must choose clip type")
hidden = (hactiv, cell, hebb)
activout = hactiv
return activout, hidden #, hebb, et, pw
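The broadcasting claim in the fanout comments above (shape BatchSize x 1 x NHidden multiplying BatchSize x NHidden x NHidden) is easy to verify directly. A standalone numpy check, with invented sizes:

```python
import numpy as np

B, H = 2, 3
rng = np.random.default_rng(1)
myeta = rng.standard_normal((B, 1, H))      # fanout output: one eta per target neuron (column)
deltahebb = rng.standard_normal((B, H, H))  # one outer-product matrix per batch element

update = myeta * deltahebb  # broadcasts over the row dimension

# Within each column j, every row is scaled by the same eta value,
# i.e. all inputs to neuron j share one neuromodulatory signal.
for b in range(B):
    for j in range(H):
        assert np.allclose(update[b, :, j], myeta[b, 0, j] * deltahebb[b, :, j])
```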
# Standard, non-plastic LSTM, reimplemented "by hand" to check if our
# implementation is correct, and to ensure that our comparisons use the closest
# possible non-plastic equivalent to our plastic LSTMs. Gets almost identical
# results to the PyTorch internal LSTM used by the original smerity code.
class MyLSTM(nn.Module):
def __init__(self, isize, hsize):
super(MyLSTM, self).__init__()
self.softmax= torch.nn.functional.softmax
#if params['activ'] == 'tanh':
self.activ = F.tanh
self.h2f = torch.nn.Linear(hsize, hsize)
self.h2i = torch.nn.Linear(hsize, hsize)
self.h2opt = torch.nn.Linear(hsize, hsize)
self.h2c = torch.nn.Linear(hsize, hsize)
self.x2f = torch.nn.Linear(isize, hsize)
self.x2opt = torch.nn.Linear(isize, hsize)
self.x2i = torch.nn.Linear(isize, hsize)
self.x2c = torch.nn.Linear(isize, hsize)
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h and c states
fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, F.tanh(self.x2c(inputs) + self.h2c(hidden[0])))
hactiv = torch.mul(opt, F.tanh(cell))
#pdb.set_trace()
hidden = (hactiv, cell)
activout = hactiv #self.h2o(hactiv)
#pdb.set_trace()
return activout, hidden #, hebb, et, pw
# A faster MyLSTM: ~30% faster than MyLSTM, obtained by grouping the weight matrices and matrix multiplications. Not fully debugged; use at your own risk.
class MyFastLSTM(nn.Module):
def __init__(self, isize, hsize):
super(MyFastLSTM, self).__init__()
self.softmax= torch.nn.functional.softmax
#if params['activ'] == 'tanh':
self.activ = F.tanh
# We group all weight matrices into two, just like the C implementation of LSTMs in PyTorch does
# Note: this creates some redundant biases (though not many)
self.h2f_i_opt_c = torch.nn.Linear(hsize, 4*hsize) # Weights from h to f, i, o and c
self.x2f_i_opt_c = torch.nn.Linear(isize, 4*hsize) # Weights from x to f, i, o and c
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h and c states
#fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0])) #
#ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0])) #
#opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0])) #
alloutputs = self.x2f_i_opt_c(inputs) + self.h2f_i_opt_c(hidden[0])
hsize = self.hsize
# You can gain ~ 5% in speed by grouping these three :
fgt = F.sigmoid(alloutputs[:,:hsize])
ipt = F.sigmoid(alloutputs[:,hsize:2*hsize])
opt = F.sigmoid(alloutputs[:,2*hsize:3*hsize])
inputtoc = F.tanh(alloutputs[:,3*hsize:])
#cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, F.tanh(self.x2c(inputs) + self.h2c(hidden[0])))#
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, inputtoc)
hactiv = torch.mul(opt, F.tanh(cell))
hidden = (hactiv, cell)
activout = hactiv
#pdb.set_trace()
return activout, hidden #, hebb, et, pw
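Grouping the four gate matrices into a single (isize, 4*hsize) matrix, as MyFastLSTM and MyFastPlasticLSTM do, is numerically equivalent to keeping them separate; slicing the grouped output recovers each gate's pre-activation. A quick numpy check (illustrative sizes, not from the repository):

```python
import numpy as np

isize, hsize, B = 3, 4, 2
rng = np.random.default_rng(2)

# Four separate input-to-gate weight matrices (f, i, o, c)...
Ws = [rng.standard_normal((isize, hsize)) for _ in range(4)]
x = rng.standard_normal((B, isize))

# ...versus one grouped matrix built by concatenating their columns.
Wgrouped = np.concatenate(Ws, axis=1)
allout = x @ Wgrouped  # one matrix multiply instead of four

# Slicing the grouped output recovers each gate's pre-activation exactly.
for k in range(4):
    assert np.allclose(allout[:, k*hsize:(k+1)*hsize], x @ Ws[k])
```

The speedup comes from launching one large GEMM instead of four small ones; the redundancy noted in the comments is only in the biases, since each grouped Linear carries one bias per output column.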
================================================
FILE: awd-lstm-lm/mylstm.py.orig
================================================
import torch
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F
import numpy as np
import pdb
class PlasticLSTM(nn.Module):
def __init__(self, isize, hsize, params):
super(PlasticLSTM, self).__init__()
self.softmax= torch.nn.functional.softmax
#if params['activ'] == 'tanh':
self.activ = F.tanh
ok=0
if 'cliptype' in params:
self.cliptype = params['cliptype']
ok+=1
if 'modultype' in params:
self.modultype = params['modultype']
ok+=1
if 'hebboutput' in params:
self.hebboutput = params['hebboutput']
ok+=1
if 'modulout' in params:
self.modulout= params['modulout']
ok+=1
if 'alphatype' in params:
self.alphatype= params['alphatype']
ok+=1
if ok < 5:
raise ValueError('When using PlasticLSTM, must specify cliptype, modultype, modulout, alphatype and hebboutput in params')
# Plastic connection parameters:
self.w = torch.nn.Parameter(.02 * torch.rand(hsize, hsize) - .01)
if self.alphatype == 'perneuron':
self.alpha = torch.nn.Parameter(.0001 * torch.rand(1,1,hsize))
#self.alpha = Variable(.0001 * torch.ones(1).cuda(), requires_grad=True) #torch.rand(1,1,hsize))
elif self.alphatype == 'full':
self.alpha = torch.nn.Parameter(.0001 * torch.rand(hsize, hsize))
else:
raise ValueError("Must select alpha type (current incorrect value is: " + str(self.alphatype) + ")")
if self.modultype == 'none':
self.eta = torch.nn.Parameter(.01 * torch.ones(1)) # Everyone has the same eta (Note: if a parameter is not actually used, there can be problems with ASGD handling in main.py)
#self.eta = .01
self.h2f = torch.nn.Linear(hsize, hsize)
self.h2i = torch.nn.Linear(hsize, hsize)
self.h2opt = torch.nn.Linear(hsize, hsize)
#self.h2c = torch.nn.Linear(hsize, hsize) # This (equivalent to Whg in the PyTorch docs, Uc in Wikipedia) is replaced by the plastic connection
self.x2f = torch.nn.Linear(isize, hsize)
self.x2opt = torch.nn.Linear(isize, hsize)
self.x2i = torch.nn.Linear(isize, hsize)
self.x2c = torch.nn.Linear(isize, hsize)
# Is the modulation just a single scalar, or do we pass it through a 'fanout' weight matrix?
if self.modultype != 'none':
self.h2mod = torch.nn.Linear(hsize, 1) # Although called 'h2mod', it may take input from h or c depending on modultype value
if self.modulout == 'fanout':
self.modfanout = torch.nn.Linear(1, hsize)
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h, c and hebb
hebb = hidden[2]
fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
#cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, F.tanh(self.x2c(inputs) + self.h2c(hidden[0])))
# To implement plasticity, we replace h2c / Whg / Uc with a plastic connection composed of w, alpha and hebb
# Note that h2c / Whg / Uc is the matrix of weights that takes in the
# previous time-step h, and whose output (after adding the current input
# and passing through tanh) is multiplied by the input gates before being
# added to the cell state
if self.cliptype == 'aditya':
# Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
h2coutput = hidden[0].unsqueeze(1).bmm(self.w + torch.mul(self.alpha, torch.clamp(hebb, min=-1.0, max=1.0))).squeeze()
else:
h2coutput = hidden[0].unsqueeze(1).bmm(self.w + torch.mul(self.alpha, hebb)).squeeze()
#if np.random.rand() < .1:
# pdb.set_trace()
inputstocell = F.tanh(self.x2c(inputs) + h2coutput)
#inputstocell = F.tanh(self.x2c(inputs) + torch.matmul(hidden[0].unsqueeze(1), self.w.unsqueeze(0)).squeeze(1))
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, inputstocell) # self.h2c(hidden[0])))
#pdb.set_trace()
hactiv = torch.mul(opt, F.tanh(cell))
#pdb.set_trace()
# For the Hebbian computation, what counts as "output"?
if self.hebboutput == 'i2c':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), inputstocell.unsqueeze(1))
elif self.hebboutput == 'h2co':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), h2coutput.unsqueeze(1))
elif self.hebboutput == 'cell':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), cell.unsqueeze(1))
elif self.hebboutput == 'hidden':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), hactiv.unsqueeze(1))
else:
raise ValueError("Must choose Hebbian target output")
# What is the source of the neuromodulator computation (if any)?
if self.modultype == 'none':
myeta = self.eta
elif self.modultype == 'modplasth2mod':
myeta = F.tanh(self.h2mod(hactiv)).unsqueeze(2) # Shape: BatchSize x 1 x 1
elif self.modultype == 'modplastc2mod':
myeta = F.tanh(self.h2mod(cell)).unsqueeze(2)
else:
raise ValueError("Must choose modulation type")
#pdb.set_trace()
if self.modultype != 'none' and self.modulout == 'fanout':
# Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
# The output of the following line has shape BatchSize x 1 x NHidden, i.e. 1 line and NHidden columns for each
# batch element. When multiplying by hebb (BatchSize x NHidden x NHidden), broadcasting will provide a different
# value for each cell but the same value for all inputs of a cell, as required by fanout concept.
myeta = self.modfanout(myeta).squeeze().unsqueeze(1)
if self.cliptype == 'decay':
hebb = (1 - myeta) * hebb + myeta * deltahebb
elif self.cliptype == 'clip':
hebb = torch.clamp(hebb + myeta * deltahebb, min=-1.0, max=1.0)
elif self.cliptype == 'aditya':
hebb = hebb + myeta * deltahebb
else:
raise ValueError("Must choose clip type")
hidden = (hactiv, cell, hebb)
activout = hactiv #self.h2o(hactiv)
#if np.isnan(np.sum(hactiv.data.cpu().numpy())) or np.isnan(np.sum(hidden[1].data.cpu().numpy())) :
# raise ValueError("Nan detected !")
#pdb.set_trace()
return activout, hidden #, hebb, et, pw
class MyLSTM(nn.Module):
# Standard, non-plastic LSTM, reimplemented "by hand" to check if our
# implementation is correct. Gets almost identical results to the PyTorch
# internal LSTM used by the original smerity code.
def __init__(self, isize, hsize):
super(MyLSTM, self).__init__()
self.softmax= torch.nn.functional.softmax
#if params['activ'] == 'tanh':
self.activ = F.tanh
self.h2f = torch.nn.Linear(hsize, hsize)
self.h2i = torch.nn.Linear(hsize, hsize)
self.h2opt = torch.nn.Linear(hsize, hsize)
self.h2c = torch.nn.Linear(hsize, hsize)
self.x2f = torch.nn.Linear(isize, hsize)
self.x2opt = torch.nn.Linear(isize, hsize)
self.x2i = torch.nn.Linear(isize, hsize)
self.x2c = torch.nn.Linear(isize, hsize)
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h and c states
fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, F.tanh(self.x2c(inputs) + self.h2c(hidden[0])))
hactiv = torch.mul(opt, F.tanh(cell))
#pdb.set_trace()
hidden = (hactiv, cell)
activout = hactiv #self.h2o(hactiv)
#if np.isnan(np.sum(hactiv.data.cpu().numpy())) or np.isnan(np.sum(hidden[1].data.cpu().numpy())) :
# raise ValueError("Nan detected !")
#pdb.set_trace()
return activout, hidden #, hebb, et, pw
================================================
FILE: awd-lstm-lm/opus.docker.old
================================================
#tmiconi_rl
#latest
#.
#FROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10
FROM opus-deep-learning-py3:master-prod-2019_2_5_4_54_39
#FROM opus-deep-learning:master--2018_9_20_18_2_31
RUN mkdir /home/work
COPY ./*.py /home/work/
COPY ./*.sh /home/work/
COPY ./*.md /home/work/
ENV LC_ALL C.UTF-8
ENV LANG C.UTF-8
================================================
FILE: awd-lstm-lm/plotresults.py
================================================
import numpy as np
import glob
import matplotlib.pyplot as plt
import scipy
from scipy import stats
colorz = ['r', 'b', 'g', 'c', 'm', 'y', 'orange', 'k']
groupnames = glob.glob('./HDFS/ptb/results*seed0.txt')
#groupnames = glob.glob('./HDFS/ptbprevious/results*seed0.txt')
#groupnames = glob.glob('./HDFS/ptbold/results*.txt')
#groupnames = glob.glob('./tmp/loss_*new*eplen_250*rngseed_0.txt')
#groupnames = glob.glob('./tmp/loss_*new*.9_*rngseed_0.txt')
# If only a few runs are available (e.g. 7), smoothing the losses within each run
# gives more reliable performance estimates. Note: smoothing is currently disabled;
# mavg returns its input unchanged (uncomment the cumsum lines to enable it).
def mavg(x, N=20):
return x
#cumsum = np.cumsum(np.insert(x, 0, 0))
#return (cumsum[N:] - cumsum[:-N]) / N
plt.ion()
#plt.figure(figsize=(5,4)) # Smaller figure = relative larger fonts
plt.figure()
allmedianls = []
alllosses = []
poscol = 0
minminlen = 999999
for numgroup, groupname in enumerate(groupnames):
if "ults__" not in groupname:
continue
g = groupname[:-6]+"*"
print("====", groupname)
fnames = glob.glob(g)
fulllosses=[]
losses=[]
lgts=[]
for fn in fnames:
if "COPY" in fn:
continue
if False:
#if "seed_3" in fn:
# continue
#if "seed_7" in fn:
# continue
if "seed_8" in fn:
continue
if "seed_9" in fn:
continue
if "seed_10" in fn:
continue
if "seed_11" in fn:
continue
if "seed_12" in fn:
continue
if "seed_13" in fn:
continue
if "seed_14" in fn:
continue
if "seed_15" in fn:
continue
z = np.loadtxt(fn)
#z = mavg(z, 10) # For each run, we average the losses over K successive episodes
#z = z[::10] # Decimation - speed things up!
print(len(z))
#if len(z) < 100:
# print(fn, len(z))
# continue
#z = z[:90]
lgts.append(len(z))
fulllosses.append(z)
minlen = min(lgts)
if minlen < minminlen:
minminlen = minlen
print(minlen)
#if minlen < 1000:
# continue
for z in fulllosses:
losses.append(z[:minlen])
losses = np.array(losses)
alllosses.append(losses)
meanl = np.mean(losses, axis=0)
stdl = np.std(losses, axis=0)
cil = stdl / np.sqrt(losses.shape[0]) * 1.96 # 95% confidence interval - assuming normality
#cil = stdl / np.sqrt(losses.shape[0]) * 2.5 # 95% confidence interval - approximated with the t-distribution for 7 d.f.
medianl = np.median(losses, axis=0)
allmedianls.append(medianl)
q1l = np.percentile(losses, 25, axis=0)
q3l = np.percentile(losses, 75, axis=0)
highl = np.max(losses, axis=0)
lowl = np.min(losses, axis=0)
#highl = meanl+stdl
#lowl = meanl-stdl
xx = range(len(meanl))
# xticks and labels
#xt = range(0, len(meanl), 1000)
xt = range(0, 10001, 2000)
xtl = [str(10 * 10 * i) for i in xt] # Accounts for the decimation above and for the fact that only every 10th loss is recorded in the files
#plt.plot(mavg(meanl, 100), label=g) #, color='blue')
#plt.fill_between(xx, lowl, highl, alpha=.2)
#plt.fill_between(xx, q1l, q3l, alpha=.1)
#plt.plot(meanl) #, color='blue')
####plt.plot(mavg(medianl, 100), label=g) #, color='blue') # mavg changes the number of points !
#plt.plot(mavg(q1l, 100), label=g, alpha=.3) #, color='blue')
#plt.plot(mavg(q3l, 100), label=g, alpha=.3) #, color='blue')
#plt.fill_between(xx, q1l, q3l, alpha=.2)
#plt.plot(medianl, label=g) #, color='blue')
AVGSIZE = 1
xlen = len(mavg(q1l, AVGSIZE))
#mylabel = g[g.find('type'):]
mylabel = g
myls = '-'
if poscol >= len(colorz):
myls = "--"
plt.plot(mavg(medianl, AVGSIZE), label=mylabel, color=colorz[poscol % len(colorz)], ls=myls) # mavg changes the number of points !
plt.fill_between( range(xlen), mavg(q1l, AVGSIZE), mavg(q3l, AVGSIZE), alpha=.2, color=colorz[poscol % len(colorz)])
#xlen = len(mavg(meanl, AVGSIZE))
#plt.plot(mavg(meanl, AVGSIZE), label=g, color=colorz[poscol % len(colorz)]) # mavg changes the number of points !
#plt.fill_between( range(xlen), mavg(meanl - cil, AVGSIZE), mavg(meanl + cil, AVGSIZE), alpha=.2, color=colorz[poscol % len(colorz)])
poscol += 1
#plt.fill_between( range(xlen), mavg(lowl, 100), mavg(highl, 100), alpha=.2, color=colorz[numgroup % len(colorz)])
#plt.plot(mavg(losses[0], 1000), label=g, color=colorz[numgroup % len(colorz)])
#for curve in losses[1:]:
# plt.plot(mavg(curve, 1000), color=colorz[numgroup % len(colorz)])
ps = []
# Adapt for varying lengths across groups
#for n in range(0, alllosses[0].shape[1], 3):
#for n in range(0, minminlen):
# ps.append(scipy.stats.ranksums(alllosses[0][:,n], alllosses[1][:,n]).pvalue)
#ps = np.array(ps)
plt.legend(loc='best', fontsize=12)
#plt.xlabel('Loss (sum square diff. b/w final output and target)')
plt.xlabel('Number of Episodes')
plt.ylabel('Loss')
#plt.xticks(xt, xtl)
#plt.tight_layout()
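The plotting code above repeatedly calls a helper `mavg` that is defined elsewhere in this file. As a point of reference, a minimal trailing moving-average sketch consistent with how it is used here (note the comment that `mavg` changes the number of points, and that `AVGSIZE = 1` should leave the curve unchanged) might look like:

```python
import numpy as np

def mavg(x, n):
    # Trailing moving average over a window of n points.
    # Like the helper used above, this returns len(x) - n + 1 points,
    # which is why the surrounding code recomputes xlen after smoothing.
    c = np.cumsum(np.concatenate(([0.0], np.asarray(x, dtype=float))))
    return (c[n:] - c[:-n]) / n
```

With `n = 1` this returns the input unchanged, matching the `AVGSIZE = 1` setting above; larger windows shorten the curve, which is why `fill_between` uses `range(xlen)` rather than the original x axis.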
================================================
FILE: awd-lstm-lm/plotresultssingle.py
================================================
import numpy as np
import matplotlib.pyplot as plt
import glob
fns = glob.glob('./HDFS/ptb/results_*.txt')
plt.figure()
numcurve = 0
for (ii, fn) in enumerate(fns):
#if 'B_' not in fn and 'MYLSTM' not in fn:
# continue
if 'rngseed' in fn:
if 'seed0' not in fn:
continue
if 'agdiv10' in fn:
continue
#if '44' not in fn:
# continue
print(fn)
#if 'perneuron' in fn:
# continue
numcurve += 1
if numcurve > 20:
ls = ':'
elif numcurve > 10:
ls = '--'
else:
ls = '-'
#z = np.loadtxt(fn)
z = np.exp(np.loadtxt(fn))
plt.plot(z, label=fn, ls=ls)
plt.legend(loc='upper right')
plt.show()
================================================
FILE: awd-lstm-lm/pointer.py
================================================
import argparse
import time
import math
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
import data
import model
from utils import batchify, get_batch, repackage_hidden
parser = argparse.ArgumentParser(description='PyTorch PennTreeBank RNN/LSTM Language Model')
parser.add_argument('--data', type=str, default='data/penn',
help='location of the data corpus')
parser.add_argument('--model', type=str, default='LSTM',
help='type of recurrent net (LSTM, QRNN)')
parser.add_argument('--save', type=str,default='best.pt',
help='model to use the pointer over')
parser.add_argument('--cuda', action='store_false',
                    help='passing this flag DISABLES CUDA (CUDA is used by default)')
parser.add_argument('--bptt', type=int, default=5000,
help='sequence length')
parser.add_argument('--window', type=int, default=3785,
help='pointer window length')
parser.add_argument('--theta', type=float, default=0.6625523432485668,
help='mix between uniform distribution and pointer softmax distribution over previous words')
parser.add_argument('--lambdasm', type=float, default=0.12785920428335693,
help='linear mix between only pointer (1) and only vocab (0) distribution')
args = parser.parse_args()
###############################################################################
# Load data
###############################################################################
corpus = data.Corpus(args.data)
eval_batch_size = 1
test_batch_size = 1
#train_data = batchify(corpus.train, args.batch_size)
val_data = batchify(corpus.valid, test_batch_size, args)
test_data = batchify(corpus.test, test_batch_size, args)
###############################################################################
# Build the model
###############################################################################
ntokens = len(corpus.dictionary)
criterion = nn.CrossEntropyLoss()
def one_hot(idx, size, cuda=True):
a = np.zeros((1, size), np.float32)
a[0][idx] = 1
v = Variable(torch.from_numpy(a))
if cuda: v = v.cuda()
return v
def evaluate(data_source, batch_size=10, window=args.window):
# Turn on evaluation mode which disables dropout.
if args.model == 'QRNN': model.reset()
model.eval()
total_loss = 0
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(batch_size)
next_word_history = None
pointer_history = None
for i in range(0, data_source.size(0) - 1, args.bptt):
if i > 0: print(i, len(data_source), math.exp(total_loss / i))
data, targets = get_batch(data_source, i, evaluation=True, args=args)
output, hidden, rnn_outs, _ = model(data, hidden, return_h=True)
rnn_out = rnn_outs[-1].squeeze()
output_flat = output.view(-1, ntokens)
###
# Fill pointer history
start_idx = len(next_word_history) if next_word_history is not None else 0
next_word_history = torch.cat([one_hot(t.data[0], ntokens) for t in targets]) if next_word_history is None else torch.cat([next_word_history, torch.cat([one_hot(t.data[0], ntokens) for t in targets])])
#print(next_word_history)
pointer_history = Variable(rnn_out.data) if pointer_history is None else torch.cat([pointer_history, Variable(rnn_out.data)], dim=0)
#print(pointer_history)
###
# Built-in cross entropy
# total_loss += len(data) * criterion(output_flat, targets).data[0]
###
# Manual cross entropy
# softmax_output_flat = torch.nn.functional.softmax(output_flat)
# soft = torch.gather(softmax_output_flat, dim=1, index=targets.view(-1, 1))
# entropy = -torch.log(soft)
# total_loss += len(data) * entropy.mean().data[0]
###
# Pointer manual cross entropy
loss = 0
        softmax_output_flat = torch.nn.functional.softmax(output_flat, dim=-1)
for idx, vocab_loss in enumerate(softmax_output_flat):
p = vocab_loss
if start_idx + idx > window:
valid_next_word = next_word_history[start_idx + idx - window:start_idx + idx]
valid_pointer_history = pointer_history[start_idx + idx - window:start_idx + idx]
logits = torch.mv(valid_pointer_history, rnn_out[idx])
theta = args.theta
                ptr_attn = torch.nn.functional.softmax(theta * logits, dim=-1).view(-1, 1)
ptr_dist = (ptr_attn.expand_as(valid_next_word) * valid_next_word).sum(0).squeeze()
lambdah = args.lambdasm
p = lambdah * ptr_dist + (1 - lambdah) * vocab_loss
###
target_loss = p[targets[idx].data]
loss += (-torch.log(target_loss)).data[0]
total_loss += loss / batch_size
###
hidden = repackage_hidden(hidden)
next_word_history = next_word_history[-window:]
pointer_history = pointer_history[-window:]
return total_loss / len(data_source)
# Load the best saved model.
with open(args.save, 'rb') as f:
if not args.cuda:
model = torch.load(f, map_location=lambda storage, loc: storage)
else:
model = torch.load(f)
print(model)
# Run on val data.
val_loss = evaluate(val_data, test_batch_size)
print('=' * 89)
print('| End of pointer | val loss {:5.2f} | val ppl {:8.2f}'.format(
val_loss, math.exp(val_loss)))
print('=' * 89)
# Run on test data.
test_loss = evaluate(test_data, test_batch_size)
print('=' * 89)
print('| End of pointer | test loss {:5.2f} | test ppl {:8.2f}'.format(
test_loss, math.exp(test_loss)))
print('=' * 89)
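`evaluate` above blends the model's vocabulary softmax with a pointer (cache) distribution over the last `window` targets: p = lambdasm * p_ptr + (1 - lambdasm) * p_vocab, where the pointer attention is a softmax over theta-scaled dot products between the current hidden state and the cached hidden states. A minimal NumPy sketch of that mixture (toy sizes and names, not the script's actual tensors):

```python
import numpy as np

def cache_mixture(p_vocab, cached_hiddens, cached_onehots, h, theta, lam):
    """Blend a vocabulary softmax with a pointer distribution over cached words.

    p_vocab:        (V,) softmax over the vocabulary
    cached_hiddens: (W, H) hidden states for the last W positions
    cached_onehots: (W, V) one-hot targets for those positions
    h:              (H,) current hidden state
    """
    logits = cached_hiddens @ h                       # similarity to each cached state
    attn = np.exp(theta * logits - np.max(theta * logits))
    attn /= attn.sum()                                # pointer attention (softmax)
    p_ptr = attn @ cached_onehots                     # attention mass copied onto cached words
    return lam * p_ptr + (1.0 - lam) * p_vocab
```

Because both components are probability distributions, the mixture still sums to one; words that never occurred in the cache receive only the `(1 - lam)` vocabulary share.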
================================================
FILE: awd-lstm-lm/request_devbox.json
================================================
{
"dockerImage":"tmiconi_rl",
"tag":"master-test-2018_11_27_11_33_25",
"cpus":2.0,
"ramMB":26000,
"gpus":1,
"diskMB":8000,
"cluster":"opusprodda3e",
"environment":"devel",
"user":"tmiconi",
"resourcePool": "/ailabs/p1/tmiconi",
"instances":1,
"isService":false,
"cronSchedule":"",
"custom":{},
"application":"testversion",
"maxRetries":1,
"constraints":{"sku":"p40_24gb"},
"accessTypes":[],
"dependencies":[],
"cronCollisionPolicy":"CANCEL_NEW",
"emailOnFail":[],
"emailOnSucceed":[]
}
================================================
FILE: awd-lstm-lm/request_full.json
================================================
{
"dockerImage":"tmiconi_rl",
"tag":"master-test-2019_1_22_14_38_35",
"name":"PLASTICLSTM_bs6_clip2_cliptype_clip_alphatype_full_modultype_modplasth2mod_modulout_fanout_asgdtime_85_1067n_5r",
"cpus":2.0,
"cmdLine":"export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \u0026\u0026 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/nvidia/bin:/opt/hadoop/latest/bin \u0026\u0026 export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib64/ \u0026\u0026 export LC_ALL=C.UTF-8 \u0026\u0026 export LANG=C.UTF-8 \u0026\u0026 cd /home/work/ \u0026\u0026 bash ./OpusPrepare.sh \u0026\u0026 source /.bashrc \u0026\u0026 pyenv local 3.5.2 \u0026\u0026 python main.py --batch_size 6 --data data/penn --dropouti 0.4 --dropouth 0.25 --epoch 300 --save PTB.pt --wdrop 0 --model PLASTICLSTM --modultype modplasth2mod --modulout fanout --nhid 1067 --alphatype full --asgdtime 85 --clipval 2.0 --cliptype clip --seed {{mesos.instance}} ",
"ramMB":25000,
"gpus":1,
"diskMB":6000,
"cluster":"opusprodda1",
"environment":"devel",
"user":"tmiconi",
"resourcePool": "/ailabs/p1/tmiconi",
"instances":5,
"isService":false,
"cronSchedule":"",
"custom":{},
"application":"testversion",
"maxRetries":1,
"constraints":{"sku":"p6000"},
"accessTypes":[],
"dependencies":[],
"cronCollisionPolicy":"CANCEL_NEW",
"emailOnFail":[],
"emailOnSucceed":[]
}
================================================
FILE: awd-lstm-lm/request_opus.json
================================================
{
"dockerImage":"tmiconi_rl",
"tag":"master-test-2019_3_13_17_37_3",
"name":"newcode_SqUsq_clp2_PLASTICLSTM_agdiv1150_opus_alphatype_full_modultype_modplasth2mod_modulout_fanout_asgdtime_125_1068n_5run",
"cpus":2.0,
"cmdLine":"export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \u0026\u0026 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/nvidia/bin:/opt/hadoop/latest/bin \u0026\u0026 export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib64/ \u0026\u0026 export LC_ALL=C.UTF-8 \u0026\u0026 export LANG=C.UTF-8 \u0026\u0026 cd /home/work/ \u0026\u0026 apt-get install unzip \u0026\u0026 sh ./getdata.sh \u0026\u0026 python3 main.py --batch_size 6 --data data/penn --dropouti 0.4 --dropouth 0.25 --epoch 500 --save PTB.pt --wdrop 0 --model PLASTICLSTM --modultype modplasth2mod --modulout fanout --nhid 1068 --alphatype full --asgdtime 125 --agdiv 1150 --seed {{mesos.instance}} ",
"ramMB":25000,
"gpus":1,
"diskMB":6000,
"cluster":"opusprodda1",
"environment":"devel",
"user":"tmiconi",
"resourcePool": "/ailabs/p1/tmiconi",
"instances":5,
"isService":false,
"cronSchedule":"",
"custom":{},
"application":"testversion",
"maxRetries":1,
"constraints":{"sku":"p6000"},
"accessTypes":[],
"dependencies":[],
"cronCollisionPolicy":"CANCEL_NEW",
"emailOnFail":[],
"emailOnSucceed":[]
}
================================================
FILE: awd-lstm-lm/request_opus.json.old
================================================
{
"dockerImage":"tmiconi_rl",
"tag":"master-test-2018_12_11_15_39_4",
"name":"PLSTM_plastin_bs3_clip2_opus_alphatype_perneuron_modultype_modplasth2mod_modulout_fanout_asgdtime_65_1149n_5run",
"cpus":2.0,
"cmdLine":"export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \u0026\u0026 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/nvidia/bin:/opt/hadoop/latest/bin \u0026\u0026 export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib64/ \u0026\u0026 export LC_ALL=C.UTF-8 \u0026\u0026 export LANG=C.UTF-8 \u0026\u0026 cd /home/work/ \u0026\u0026 apt-get install unzip \u0026\u0026 sh ./getdata.sh \u0026\u0026 python3 main.py --batch_size 3 --data data/penn --dropouti 0.4 --dropouth 0.25 --epoch 300 --save PTB.pt --wdrop 0 --model PLASTICLSTM --modultype modplasth2mod --modulout fanout --nhid 1149 --alphatype perneuron --asgdtime 65 --clipval 2.0 --seed {{mesos.instance}} ",
"ramMB":25000,
"gpus":1,
"diskMB":6000,
"cluster":"opusprodda1",
"environment":"devel",
"user":"tmiconi",
"resourcePool": "/ailabs/p1/tmiconi",
"instances":5,
"isService":false,
"cronSchedule":"",
"custom":{},
"application":"testversion",
"maxRetries":1,
"constraints":{"sku":"p6000"},
"accessTypes":[],
"dependencies":[],
"cronCollisionPolicy":"CANCEL_NEW",
"emailOnFail":[],
"emailOnSucceed":[]
}
================================================
FILE: awd-lstm-lm/request_plast.json
================================================
{
"dockerImage":"tmiconi_rl",
"tag":"master-test-2018_12_11_15_39_4",
"name":"PLSTM_plastin_bs3_clip2_opus_alphatype_perneuron_modultype_nomodul_modulout_single_asgdtime_44_1149n_5run",
"cpus":2.0,
"cmdLine":"export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \u0026\u0026 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/nvidia/bin:/opt/hadoop/latest/bin \u0026\u0026 export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib64/ \u0026\u0026 export LC_ALL=C.UTF-8 \u0026\u0026 export LANG=C.UTF-8 \u0026\u0026 cd /home/work/ \u0026\u0026 apt-get install unzip \u0026\u0026 sh ./getdata.sh \u0026\u0026 python3 main.py --batch_size 3 --data data/penn --dropouti 0.4 --dropouth 0.25 --epoch 300 --save PTB.pt --wdrop 0 --model PLASTICLSTM --modultype none --modulout single --nhid 1149 --alphatype perneuron --asgdtime 44 --clipval 2.0 --seed {{mesos.instance}} ",
"ramMB":25000,
"gpus":1,
"diskMB":6000,
"cluster":"opusprodda1",
"environment":"devel",
"user":"tmiconi",
"resourcePool": "/ailabs/p1/tmiconi",
"instances":5,
"isService":false,
"cronSchedule":"",
"custom":{},
"application":"testversion",
"maxRetries":1,
"constraints":{"sku":"p6000"},
"accessTypes":[],
"dependencies":[],
"cronCollisionPolicy":"CANCEL_NEW",
"emailOnFail":[],
"emailOnSucceed":[]
}
================================================
FILE: awd-lstm-lm/splitcross.py
================================================
from collections import defaultdict
import torch
import torch.nn as nn
import numpy as np
class SplitCrossEntropyLoss(nn.Module):
r'''SplitCrossEntropyLoss calculates an approximate softmax'''
def __init__(self, hidden_size, splits, verbose=False):
# We assume splits is [0, split1, split2, N] where N >= |V|
# For example, a vocab of 1000 words may have splits [0] + [100, 500] + [inf]
super(SplitCrossEntropyLoss, self).__init__()
self.hidden_size = hidden_size
self.splits = [0] + splits + [100 * 1000000]
self.nsplits = len(self.splits) - 1
self.stats = defaultdict(list)
self.verbose = verbose
# Each of the splits that aren't in the head require a pretend token, we'll call them tombstones
# The probability given to this tombstone is the probability of selecting an item from the represented split
if self.nsplits > 1:
self.tail_vectors = nn.Parameter(torch.zeros(self.nsplits - 1, hidden_size))
self.tail_bias = nn.Parameter(torch.zeros(self.nsplits - 1))
def logprob(self, weight, bias, hiddens, splits=None, softmaxed_head_res=None, verbose=False):
# First we perform the first softmax on the head vocabulary and the tombstones
if softmaxed_head_res is None:
start, end = self.splits[0], self.splits[1]
head_weight = None if end - start == 0 else weight[start:end]
head_bias = None if end - start == 0 else bias[start:end]
# We only add the tombstones if we have more than one split
if self.nsplits > 1:
head_weight = self.tail_vectors if head_weight is None else torch.cat([head_weight, self.tail_vectors])
head_bias = self.tail_bias if head_bias is None else torch.cat([head_bias, self.tail_bias])
# Perform the softmax calculation for the word vectors in the head for all splits
# We need to guard against empty splits as torch.cat does not like random lists
head_res = torch.nn.functional.linear(hiddens, head_weight, bias=head_bias)
softmaxed_head_res = torch.nn.functional.log_softmax(head_res, dim=-1)
if splits is None:
splits = list(range(self.nsplits))
results = []
running_offset = 0
for idx in splits:
# For those targets in the head (idx == 0) we only need to return their loss
if idx == 0:
results.append(softmaxed_head_res[:, :-(self.nsplits - 1)])
# If the target is in one of the splits, the probability is the p(tombstone) * p(word within tombstone)
else:
start, end = self.splits[idx], self.splits[idx + 1]
tail_weight = weight[start:end]
tail_bias = bias[start:end]
# Calculate the softmax for the words in the tombstone
tail_res = torch.nn.functional.linear(hiddens, tail_weight, bias=tail_bias)
# Then we calculate p(tombstone) * p(word in tombstone)
# Adding is equivalent to multiplication in log space
head_entropy = (softmaxed_head_res[:, -idx]).contiguous()
tail_entropy = torch.nn.functional.log_softmax(tail_res, dim=-1)
results.append(head_entropy.view(-1, 1) + tail_entropy)
if len(results) > 1:
return torch.cat(results, dim=1)
return results[0]
def split_on_targets(self, hiddens, targets):
# Split the targets into those in the head and in the tail
split_targets = []
split_hiddens = []
# Determine to which split each element belongs (for each start split value, add 1 if equal or greater)
# This method appears slower at least for WT-103 values for approx softmax
#masks = [(targets >= self.splits[idx]).view(1, -1) for idx in range(1, self.nsplits)]
#mask = torch.sum(torch.cat(masks, dim=0), dim=0)
###
# This is equally fast for smaller splits as method below but scales linearly
mask = None
for idx in range(1, self.nsplits):
partial_mask = targets >= self.splits[idx]
mask = mask + partial_mask if mask is not None else partial_mask
###
#masks = torch.stack([targets] * (self.nsplits - 1))
#mask = torch.sum(masks >= self.split_starts, dim=0)
for idx in range(self.nsplits):
# If there are no splits, avoid costly masked select
if self.nsplits == 1:
split_targets, split_hiddens = [targets], [hiddens]
continue
# If all the words are covered by earlier targets, we have empties so later stages don't freak out
if sum(len(t) for t in split_targets) == len(targets):
split_targets.append([])
split_hiddens.append([])
continue
# Are you in our split?
tmp_mask = mask == idx
split_targets.append(torch.masked_select(targets, tmp_mask))
split_hiddens.append(hiddens.masked_select(tmp_mask.unsqueeze(1).expand_as(hiddens)).view(-1, hiddens.size(1)))
return split_targets, split_hiddens
def forward(self, weight, bias, hiddens, targets, verbose=False):
if self.verbose or verbose:
for idx in sorted(self.stats):
print('{}: {}'.format(idx, int(np.mean(self.stats[idx]))), end=', ')
print()
total_loss = None
if len(hiddens.size()) > 2: hiddens = hiddens.view(-1, hiddens.size(2))
split_targets, split_hiddens = self.split_on_targets(hiddens, targets)
# First we perform the first softmax on the head vocabulary and the tombstones
start, end = self.splits[0], self.splits[1]
head_weight = None if end - start == 0 else weight[start:end]
head_bias = None if end - start == 0 else bias[start:end]
# We only add the tombstones if we have more than one split
if self.nsplits > 1:
head_weight = self.tail_vectors if head_weight is None else torch.cat([head_weight, self.tail_vectors])
head_bias = self.tail_bias if head_bias is None else torch.cat([head_bias, self.tail_bias])
# Perform the softmax calculation for the word vectors in the head for all splits
# We need to guard against empty splits as torch.cat does not like random lists
combo = torch.cat([split_hiddens[i] for i in range(self.nsplits) if len(split_hiddens[i])])
###
all_head_res = torch.nn.functional.linear(combo, head_weight, bias=head_bias)
softmaxed_all_head_res = torch.nn.functional.log_softmax(all_head_res, dim=-1)
if self.verbose or verbose:
self.stats[0].append(combo.size()[0] * head_weight.size()[0])
running_offset = 0
for idx in range(self.nsplits):
# If there are no targets for this split, continue
if len(split_targets[idx]) == 0: continue
# For those targets in the head (idx == 0) we only need to return their loss
if idx == 0:
softmaxed_head_res = softmaxed_all_head_res[running_offset:running_offset + len(split_hiddens[idx])]
entropy = -torch.gather(softmaxed_head_res, dim=1, index=split_targets[idx].view(-1, 1))
# If the target is in one of the splits, the probability is the p(tombstone) * p(word within tombstone)
else:
softmaxed_head_res = softmaxed_all_head_res[running_offset:running_offset + len(split_hiddens[idx])]
if self.verbose or verbose:
start, end = self.splits[idx], self.splits[idx + 1]
tail_weight = weight[start:end]
self.stats[idx].append(split_hiddens[idx].size()[0] * tail_weight.size()[0])
# Calculate the softmax for the words in the tombstone
tail_res = self.logprob(weight, bias, split_hiddens[idx], splits=[idx], softmaxed_head_res=softmaxed_head_res)
# Then we calculate p(tombstone) * p(word in tombstone)
# Adding is equivalent to multiplication in log space
head_entropy = softmaxed_head_res[:, -idx]
# All indices are shifted - if the first split handles [0,...,499] then the 500th in the second split will be 0 indexed
indices = (split_targets[idx] - self.splits[idx]).view(-1, 1)
# Warning: if you don't squeeze, you get an N x 1 return, which acts oddly with broadcasting
tail_entropy = torch.gather(torch.nn.functional.log_softmax(tail_res, dim=-1), dim=1, index=indices).squeeze()
entropy = -(head_entropy + tail_entropy)
###
running_offset += len(split_hiddens[idx])
total_loss = entropy.float().sum() if total_loss is None else total_loss + entropy.float().sum()
return (total_loss / len(targets)).type_as(weight)
if __name__ == '__main__':
np.random.seed(42)
torch.manual_seed(42)
if torch.cuda.is_available():
torch.cuda.manual_seed(42)
V = 8
H = 10
N = 100
E = 10
embed = torch.nn.Embedding(V, H)
crit = SplitCrossEntropyLoss(hidden_size=H, splits=[V // 2])
bias = torch.nn.Parameter(torch.ones(V))
optimizer = torch.optim.SGD(list(embed.parameters()) + list(crit.parameters()), lr=1)
for _ in range(E):
prev = torch.autograd.Variable((torch.rand(N, 1) * 0.999 * V).int().long())
x = torch.autograd.Variable((torch.rand(N, 1) * 0.999 * V).int().long())
y = embed(prev).squeeze()
c = crit(embed.weight, bias, y, x.view(N))
print('Crit', c.exp().data[0])
logprobs = crit.logprob(embed.weight, bias, y[:2]).exp()
print(logprobs)
print(logprobs.sum(dim=1))
optimizer.zero_grad()
c.backward()
optimizer.step()
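The two-level factorization implemented by `SplitCrossEntropyLoss` can be summarized as: for a word in tail split i, log p(word) = log p(tombstone_i | head softmax) + log p(word | split i), and adding log-probabilities corresponds to the product of the two probabilities. A toy sketch of that decomposition with one tail split and made-up logits (plain NumPy, not the module's tensors):

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax.
    x = x - np.max(x)
    return x - np.log(np.sum(np.exp(x)))

# Toy setup: 4 head words plus 1 tombstone standing in for a 3-word tail split.
head_logits = np.array([1.0, 0.5, 0.2, -0.3, 0.8])  # last entry is the tombstone
tail_logits = np.array([0.4, -0.1, 0.9])

log_head = log_softmax(head_logits)
log_tail = log_softmax(tail_logits)

# Full-vocabulary log-probs: head words directly; tail words get
# log p(tombstone) + log p(word | split), i.e. a product in probability space.
log_p = np.concatenate([log_head[:-1], log_head[-1] + log_tail])
```

The head probabilities of the non-tombstone words plus the tombstone mass spread over the tail words still form a proper distribution over the full 7-word vocabulary.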
================================================
FILE: awd-lstm-lm/test.py
================================================
import OpusHdfsCopy
from OpusHdfsCopy import transferFileToHdfsDir, checkHdfs
import argparse
import time
import math
import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable
import pdb
import data
import model
from utils import batchify, get_batch, repackage_hidden
torch.nn.Module.dump_patches=True
parser = argparse.ArgumentParser(description='PyTorch PennTreeBank RNN/LSTM Language Model Testing of Saved Models')
parser.add_argument('--file', type=str, default='',
help='name of the file containing the saved model to be tested')
parser.add_argument('--data', type=str, default='data/penn/',
help='location of the data corpus')
parser.add_argument('--model', type=str, default='LSTM',
help='type of recurrent net (LSTM, QRNN, GRU)')
parser.add_argument('--alphatype', type=str, default='full',
                    help="type of alpha matrix: (full, perneuron, fanout)")
parser.add_argument('--modultype', type=str, default='none',
help="type of modulation: (none, modplasth2mod, modplastc2mod)")
parser.add_argument('--modulout', type=str, default='single',
help="modulatory output (single or fanout)")
parser.add_argument('--cliptype', type=str, default='clip',
help="clip type (decay, clip, aditya)")
parser.add_argument('--hebboutput', type=str, default='i2c',
help='output used for hebbian computations (i2c, h2co, cell, hidden)')
parser.add_argument('--emsize', type=int, default=400,
help='size of word embeddings')
parser.add_argument('--nhid', type=int, default=1150,
help='number of hidden units per layer')
parser.add_argument('--nlayers', type=int, default=3,
help='number of layers')
parser.add_argument('--lr', type=float, default=30,
help='initial learning rate')
parser.add_argument('--clip', type=float, default=0.25,
help='gradient clipping')
parser.add_argument('--numgpu', type=int, default=0,
help='which GPU to use? (no effect if GPU not used at all)')
parser.add_argument('--epochs', type=int, default=8000,
help='upper epoch limit')
parser.add_argument('--batch_size', type=int, default=80, metavar='N',
help='batch size')
parser.add_argument('--bptt', type=int, default=70,
help='sequence length')
parser.add_argument('--dropout', type=float, default=0.4,
help='dropout applied to layers (0 = no dropout)')
parser.add_argument('--dropouth', type=float, default=0.3,
help='dropout for rnn layers (0 = no dropout)')
parser.add_argument('--dropouti', type=float, default=0.65,
help='dropout for input embedding layers (0 = no dropout)')
parser.add_argument('--dropoute', type=float, default=0.1,
help='dropout to remove words from embedding layer (0 = no dropout)')
parser.add_argument('--wdrop', type=float, default=0.5,
help='amount of weight dropout to apply to the RNN hidden to hidden matrix')
parser.add_argument('--seed', type=int, default=1111,
help='random seed')
parser.add_argument('--nonmono', type=int, default=5,
                    help='non-monotone interval: number of validation checks without improvement before switching to ASGD')
parser.add_argument('--cuda', action='store_false',
                    help='passing this flag DISABLES CUDA (CUDA is used by default)')
parser.add_argument('--log-interval', type=int, default=200, metavar='N',
help='report interval')
randomhash = ''.join(str(time.time()).split('.'))
parser.add_argument('--save', type=str, default=randomhash+'.pt',
help='path to save the final model')
parser.add_argument('--alpha', type=float, default=2,
help='alpha L2 regularization on RNN activation (alpha = 0 means no regularization)')
parser.add_argument('--beta', type=float, default=1,
help='beta slowness regularization applied on RNN activiation (beta = 0 means no regularization)')
parser.add_argument('--wdecay', type=float, default=1.2e-6,
help='weight decay applied to all weights')
parser.add_argument('--resume', type=str, default='',
help='path of model to resume')
parser.add_argument('--optimizer', type=str, default='sgd',
help='optimizer to use (sgd, adam)')
parser.add_argument('--when', nargs="+", type=int, default=[-1],
help='When (which epochs) to divide the learning rate by 10 - accepts multiple')
args = parser.parse_args()
args.tied = True
# Set the random seed manually for reproducibility.
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if torch.cuda.is_available():
if not args.cuda:
print("WARNING: You have a CUDA device, so you should probably run with --cuda")
else:
torch.cuda.manual_seed(args.seed)
###############################################################################
# Load data
###############################################################################
def model_save(fn):
with open(fn, 'wb') as f:
torch.save([model, criterion, optimizer], f)
def model_load(fn):
global model, criterion, optimizer
with open(fn, 'rb') as f:
model, criterion, optimizer = torch.load(f, map_location=torch.device(args.numgpu))
import platform
print("Torch version:", torch.__version__, "Numpy version:", np.version.version, "Python version:", platform.python_version())
import os
import hashlib
fn = 'corpus.{}.data'.format(hashlib.md5(args.data.encode()).hexdigest())
if os.path.exists(fn):
print('Loading cached dataset...')
corpus = torch.load(fn)
else:
print('Producing dataset...')
corpus = data.Corpus(args.data)
torch.save(corpus, fn)
eval_batch_size = 10
test_batch_size = 1
train_data = batchify(corpus.train, args.batch_size, args)
val_data = batchify(corpus.valid, eval_batch_size, args)
test_data = batchify(corpus.test, test_batch_size, args)
#train_data = train_data[:5000,:] # For debugging
###############################################################################
# Build the model
###############################################################################
from splitcross import SplitCrossEntropyLoss
criterion = None
ntokens = len(corpus.dictionary)
myparams={}
myparams['cliptype'] = args.cliptype
myparams['modultype'] = args.modultype
myparams['modulout'] = args.modulout
myparams['hebboutput'] = args.hebboutput
myparams['alphatype'] = args.alphatype
suffix = args.model+'_'+myparams['cliptype']+'_'+myparams['modultype']+'_'+myparams['modulout']+'_'+myparams['hebboutput']+'_'+myparams['alphatype']+'_lr'+str(args.lr)+'_'+str(args.nlayers)+'l_'+str(args.nhid)+'h'
RESULTSFILENAME = 'results_'+suffix+'.txt'
MODELFILENAME = args.file
###
if not criterion:
splits = []
if ntokens > 500000:
# One Billion
# This produces fairly even matrix mults for the buckets:
# 0: 11723136, 1: 10854630, 2: 11270961, 3: 11219422
splits = [4200, 35000, 180000]
elif ntokens > 75000:
# WikiText-103
splits = [2800, 20000, 76000]
print('Using', splits)
criterion = SplitCrossEntropyLoss(args.emsize, splits=splits, verbose=False)
###
#params = list(model.parameters()) + list(criterion.parameters())
#if args.cuda:
# model = model.cuda()
# criterion = criterion.cuda()
# params = list(model.parameters()) + list(criterion.parameters())
####
#total_params = sum(x.size()[0] * x.size()[1] if len(x.size()) > 1 else x.size()[0] for x in params if x.size())
#print('Args:', args)
#print('Model total parameters:', total_params)
###############################################################################
# Training code
###############################################################################
def evaluate(data_source, batch_size=10):
# Turn on evaluation mode which disables dropout.
model.eval()
with torch.no_grad():
if args.model == 'QRNN': model.reset()
total_loss = 0
ntokens = len(corpus.dictionary)
hidden = model.init_hidden(batch_size)
for i in range(0, data_source.size(0) - 1, args.bptt):
data, targets = get_batch(data_source, i, args, evaluation=True)
output, hidden = model(data, hidden)
total_loss += len(data) * criterion(model.decoder.weight, model.decoder.bias, output, targets).data
hidden = repackage_hidden(hidden)
#return total_loss[0] / len(data_source)
return total_loss / len(data_source)
# Loop over epochs.
lr = args.lr
best_val_loss = []
stored_loss = 100000000
print("MyParams:", myparams)
print("Args:", args)
# Load the best saved model.
model_load(MODELFILENAME)
NUMGPU = args.numgpu
params = list(model.parameters()) + list(criterion.parameters())
if args.cuda:
model = model.cuda(device=NUMGPU)
criterion = criterion.cuda(device=NUMGPU)
params = list(model.parameters()) + list(criterion.parameters())
###
total_params = sum(x.numel() for x in params)
print('Args:', args)
print('Model total parameters:', total_params)
#pdb.set_trace()
# Run on test data.
test_loss = evaluate(test_data, test_batch_size)
print('=' * 89)
print('| End of training | test loss {:5.2f} | test ppl {:8.2f} | test bpc {:8.3f}'.format(
test_loss, math.exp(test_loss), test_loss / math.log(2)))
print('=' * 89)
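The final report above derives perplexity and bits-per-character from the mean cross-entropy, which is measured in nats: ppl = exp(loss) and bpc = loss / ln 2 (the nat-to-bit conversion). A minimal sketch of that conversion:

```python
import math

def report_metrics(mean_nll_nats):
    # Perplexity is the exponential of the mean negative log-likelihood (in nats);
    # dividing by ln 2 converts nats to bits per token/character.
    ppl = math.exp(mean_nll_nats)
    bpc = mean_nll_nats / math.log(2)
    return ppl, bpc
```

For example, a mean loss of ln 2 nats corresponds to a perplexity of 2 and exactly 1 bit per token.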
================================================
FILE: awd-lstm-lm/tmp.py
================================================
import torch
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F
import numpy as np
import pdb
class PlasticLSTM(nn.Module):
def __init__(self, isize, hsize, params):
super(PlasticLSTM, self).__init__()
        self.softmax = torch.nn.functional.softmax
        #if params['activ'] == 'tanh':
        self.activ = F.tanh
        ok = 0
        if 'cliptype' in params:
            self.cliptype = params['cliptype']
            ok += 1
        if 'modultype' in params:
            self.modultype = params['modultype']
            ok += 1
        if 'hebboutput' in params:
            self.hebboutput = params['hebboutput']
            ok += 1
        if 'modulout' in params:
            self.modulout = params['modulout']
            ok += 1
        if 'alphatype' in params:
            self.alphatype = params['alphatype']
            ok += 1
if ok < 5:
raise ValueError('When using PlasticLSTM, must specify cliptype, modultype, modulout, alphatype and hebboutput in params')
# Plastic connection parameters:
self.w = torch.nn.Parameter(.02 * torch.rand(hsize, hsize) - .01)
if self.alphatype == 'fanout':
self.alpha = torch.nn.Parameter(.001 * torch.ones(1)) #torch.rand(1,1,hsize))
else:
self.alpha = torch.nn.Parameter(.00001 * torch.rand(hsize, hsize))
if self.modultype == 'none':
self.eta = torch.nn.Parameter(.01 * torch.ones(1)) # Everyone has the same eta (Note: if a parameter is not actually used, there can be problems with ASGD handling in main.py)
#self.eta = .01
self.h2f = torch.nn.Linear(hsize, hsize)
self.h2i = torch.nn.Linear(hsize, hsize)
self.h2opt = torch.nn.Linear(hsize, hsize)
#self.h2c = torch.nn.Linear(hsize, hsize) # This (equivalent to Whg in the PyTorch docs, Uc in Wikipedia) is replaced by the plastic connection
self.x2f = torch.nn.Linear(isize, hsize)
self.x2opt = torch.nn.Linear(isize, hsize)
self.x2i = torch.nn.Linear(isize, hsize)
self.x2c = torch.nn.Linear(isize, hsize)
if self.modultype != 'none':
self.h2mod = torch.nn.Linear(hsize, 1) # Although called 'h2mod', it may take input from h or c depending on modultype value
if self.modulout == 'fanout':
self.modfanout = torch.nn.Linear(1, hsize)
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h, c and hebb
hebb = hidden[2]
fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
#cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, F.tanh(self.x2c(inputs) + self.h2c(hidden[0])))
# To implement plasticity, we replace h2c / Whg / Uc with a plastic connection composed of w, alpha and hebb
# Note that h2c / Whg / Uc is the matrix of weights that takes in the
# previous time-step h, and whose output (after adding the current input
# and passing through tanh) is multiplied by the input gates before being
# added to the cell state
if self.cliptype == 'aditya':
# Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
h2coutput = hidden[0].unsqueeze(1).bmm(self.w + torch.mul(self.alpha, torch.clamp(hebb, min=-1.0, max=1.0))).squeeze()
else:
h2coutput = hidden[0].unsqueeze(1).bmm(self.w + torch.mul(self.alpha, hebb)).squeeze()
#if np.random.rand() < .1:
# pdb.set_trace()
inputstocell = F.tanh(self.x2c(inputs) + h2coutput)
#inputstocell = F.tanh(self.x2c(inputs) + torch.matmul(hidden[0].unsqueeze(1), self.w.unsqueeze(0)).squeeze(1))
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, inputstocell) # self.h2c(hidden[0])))
#pdb.set_trace()
hactiv = torch.mul(opt, F.tanh(cell))
#pdb.set_trace()
if self.hebboutput == 'i2c':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), inputstocell.unsqueeze(1))
elif self.hebboutput == 'h2co':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), h2coutput.unsqueeze(1))
elif self.hebboutput == 'cell':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), cell.unsqueeze(1))
elif self.hebboutput == 'hidden':
deltahebb = torch.bmm(hidden[0].unsqueeze(2), hactiv.unsqueeze(1))
else:
raise ValueError("Must choose Hebbian target output")
if self.modultype == 'none':
myeta = self.eta
elif self.modultype == 'modplasth2mod':
myeta = F.tanh(self.h2mod(hactiv)).unsqueeze(2) # Shape: BatchSize x 1 x 1
elif self.modultype == 'modplastc2mod':
myeta = F.tanh(self.h2mod(cell)).unsqueeze(2)
else:
raise ValueError("Must choose modulation type")
#pdb.set_trace()
if self.modultype != 'none' and self.modulout == 'fanout':
# Each *column* in w, hebb and alpha constitutes the inputs to a single cell
# For w and alpha, columns are 2nd dimension (i.e. dim 1); for hebb, it's dimension 2 (dimension 0 is batch)
# The output of the following line has shape BatchSize x 1 x NHidden, i.e. 1 line and NHidden columns for each
# batch element. When multiplying by hebb (BatchSize x NHidden x NHidden), broadcasting will provide a different
# value for each cell but the same value for all inputs of a cell, as required by fanout concept.
myeta = self.modfanout(myeta).squeeze().unsqueeze(1)
if self.cliptype == 'decay':
hebb = (1 - myeta) * hebb + myeta * deltahebb
elif self.cliptype == 'clip':
hebb = torch.clamp(hebb + myeta * deltahebb, min=-1.0, max=1.0)
elif self.cliptype == 'aditya':
hebb = hebb + myeta * deltahebb
else:
raise ValueError("Must choose clip type")
hidden = (hactiv, cell, hebb)
activout = hactiv #self.h2o(hactiv)
#if np.isnan(np.sum(hactiv.data.cpu().numpy())) or np.isnan(np.sum(hidden[1].data.cpu().numpy())) :
# raise ValueError("Nan detected !")
return activout, hidden #, hebb, et, pw
class MyLSTM(nn.Module):
def __init__(self, isize, hsize):
super(MyLSTM, self).__init__()
self.softmax = torch.nn.functional.softmax
#if params['activ'] == 'tanh':
self.activ = F.tanh
self.h2f = torch.nn.Linear(hsize, hsize)
self.h2i = torch.nn.Linear(hsize, hsize)
self.h2opt = torch.nn.Linear(hsize, hsize)
self.h2c = torch.nn.Linear(hsize, hsize)
self.x2f = torch.nn.Linear(isize, hsize)
self.x2opt = torch.nn.Linear(isize, hsize)
self.x2i = torch.nn.Linear(isize, hsize)
self.x2c = torch.nn.Linear(isize, hsize)
self.isize = isize
self.hsize = hsize
def forward(self, inputs, hidden): #, hebb, et, pw): # hidden is a tuple of h and c states
fgt = F.sigmoid(self.x2f(inputs) + self.h2f(hidden[0]))
ipt = F.sigmoid(self.x2i(inputs) + self.h2i(hidden[0]))
opt = F.sigmoid(self.x2opt(inputs) + self.h2opt(hidden[0]))
cell = torch.mul(fgt, hidden[1]) + torch.mul(ipt, F.tanh(self.x2c(inputs) + self.h2c(hidden[0])))
hactiv = torch.mul(opt, F.tanh(cell))
#pdb.set_trace()
hidden = (hactiv, cell)
activout = hactiv #self.h2o(hactiv)
#if np.isnan(np.sum(hactiv.data.cpu().numpy())) or np.isnan(np.sum(hidden[1].data.cpu().numpy())) :
# raise ValueError("Nan detected !")
#pdb.set_trace()
return activout, hidden #, hebb, et, pw
================================================
FILE: awd-lstm-lm/utils.py
================================================
import torch
#from torch.autograd import Variable
def repackage_hidden(h):
"""Wraps hidden states in new Tensors, to detach them from their history."""
#if type(h) == Variable:
#return Variable(h.data)
if isinstance(h, torch.Tensor):
return h.detach()
else:
return tuple(repackage_hidden(v) for v in h)
def batchify(data, bsz, args):
# Work out how cleanly we can divide the dataset into bsz parts.
nbatch = data.size(0) // bsz
# Trim off any extra elements that wouldn't cleanly fit (remainders).
data = data.narrow(0, 0, nbatch * bsz)
# Evenly divide the data across the bsz batches.
data = data.view(bsz, -1).t().contiguous()
if args.cuda:
data = data.cuda(device=args.numgpu)
return data
def get_batch(source, i, args, seq_len=None, evaluation=False):
seq_len = min(seq_len if seq_len else args.bptt, len(source) - 1 - i)
data = source[i:i+seq_len]
target = source[i+1:i+1+seq_len].view(-1)
return data, target
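The layout produced by `batchify` and `get_batch` can be illustrated with a small self-contained NumPy re-implementation (illustration only; the toy stream length, `bsz`, and `bptt` below are arbitrary assumptions, and the real helpers operate on torch tensors with CUDA handling):

```python
import numpy as np

data = np.arange(26)   # a toy token stream of length 26
bsz = 4                # batch size
bptt = 3               # sequence length per batch

# batchify: trim the remainder, then lay the stream out as bsz parallel columns
nbatch = data.shape[0] // bsz                    # 26 // 4 = 6
data = data[:nbatch * bsz].reshape(bsz, -1).T    # shape (6, 4); column j holds a contiguous chunk

# get_batch: inputs are rows i..i+seq_len, targets are the same rows shifted by one token
i = 0
seq_len = min(bptt, len(data) - 1 - i)
inputs = data[i:i + seq_len]                        # shape (3, 4)
targets = data[i + 1:i + 1 + seq_len].reshape(-1)   # flattened, length 12
```

Each column of the batchified matrix is a contiguous slice of the original stream, so consecutive rows give each batch element its next token, which is why the target is simply the input shifted down by one row.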
================================================
FILE: awd-lstm-lm/weight_drop.py
================================================
import torch
from torch.nn import Parameter
from functools import wraps
class WeightDrop(torch.nn.Module):
def __init__(self, module, weights, dropout=0, variational=False):
super(WeightDrop, self).__init__()
self.module = module
self.weights = weights
self.dropout = dropout
self.variational = variational
self._setup()
def widget_demagnetizer_y2k_edition(*args, **kwargs):
# We need to replace flatten_parameters with a nothing function
# It must be a function rather than a lambda as otherwise pickling explodes
# We can't write boring code though, so ... WIDGET DEMAGNETIZER Y2K EDITION!
# (╯°□°)╯︵ ┻━┻
return
def _setup(self):
# Terrible temporary solution to an issue regarding compacting weights re: CUDNN RNN
if issubclass(type(self.module), torch.nn.RNNBase):
self.module.flatten_parameters = self.widget_demagnetizer_y2k_edition
for name_w in self.weights:
print('Applying weight drop of {} to {}'.format(self.dropout, name_w))
w = getattr(self.module, name_w)
del self.module._parameters[name_w]
self.module.register_parameter(name_w + '_raw', Parameter(w.data))
def _setweights(self):
for name_w in self.weights:
raw_w = getattr(self.module, name_w + '_raw')
w = None
if self.variational:
mask = torch.autograd.Variable(torch.ones(raw_w.size(0), 1))
if raw_w.is_cuda: mask = mask.cuda()
mask = torch.nn.functional.dropout(mask, p=self.dropout, training=True)
w = mask.expand_as(raw_w) * raw_w
else:
w = torch.nn.functional.dropout(raw_w, p=self.dropout, training=self.training)
setattr(self.module, name_w, w)
def forward(self, *args):
self._setweights()
return self.module.forward(*args)
if __name__ == '__main__':
import torch
from weight_drop import WeightDrop
# Input is (seq, batch, input)
x = torch.autograd.Variable(torch.randn(2, 1, 10)).cuda()
h0 = None
###
print('Testing WeightDrop')
print('=-=-=-=-=-=-=-=-=-=')
###
print('Testing WeightDrop with Linear')
lin = WeightDrop(torch.nn.Linear(10, 10), ['weight'], dropout=0.9)
lin.cuda()
run1 = [x.sum() for x in lin(x).data]
run2 = [x.sum() for x in lin(x).data]
print('All items should be different')
print('Run 1:', run1)
print('Run 2:', run2)
assert run1[0] != run2[0]
assert run1[1] != run2[1]
print('---')
###
print('Testing WeightDrop with LSTM')
wdrnn = WeightDrop(torch.nn.LSTM(10, 10), ['weight_hh_l0'], dropout=0.9)
wdrnn.cuda()
run1 = [x.sum() for x in wdrnn(x, h0)[0].data]
run2 = [x.sum() for x in wdrnn(x, h0)[0].data]
print('First timesteps should be equal, all others should differ')
print('Run 1:', run1)
print('Run 2:', run2)
# First time step, not influenced by hidden to hidden weights, should be equal
assert run1[0] == run2[0]
# Second step should not
assert run1[1] != run2[1]
print('---')
================================================
FILE: images/OpusHdfsCopy.py
================================================
# Uber-only code for interacting with hdfs
#
# Copyright (c) 2018 Uber Technologies, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import os.path
def checkHdfs():
return os.path.isfile('/opt/hadoop/latest/bin/hdfs')
def transferFileToHdfsPath(sourcepath, targetpath):
hdfspath = targetpath
targetdir = os.path.dirname(targetpath)
os.system('/opt/hadoop/latest/bin/hdfs dfs -mkdir -p {}'.format(targetdir))
result = os.system(
'/opt/hadoop/latest/bin/hdfs dfs -copyFromLocal -f {} {}'.format(sourcepath, hdfspath)
)
if result != 0:
raise OSError('Cannot copyFromLocal {} {} returned {}'.format(sourcepath, hdfspath, result))
def transferFileToHdfsDir(sourcepath, targetdir):
hdfspath = os.path.join(targetdir, os.path.basename(sourcepath))
os.system('/opt/hadoop/latest/bin/hdfs dfs -mkdir -p {}'.format(targetdir))
result = os.system(
'/opt/hadoop/latest/bin/hdfs dfs -copyFromLocal -f {} {}'.format(sourcepath, hdfspath)
)
if result != 0:
raise OSError('Cannot copyFromLocal {} {} returned {}'.format(sourcepath, hdfspath, result))
================================================
FILE: images/README.md
================================================
## Images
This code implements the image completion task: three images are shown several times, then one of the images is half-erased and presented, and the network must reconstruct the missing portion of the image.
To run this code, you must download the [CIFAR10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) (Python version), and copy the `data_batch_*` files into this directory.
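At the heart of this task is the differentiable Hebbian update implemented in `images.py` (`Network.forward`): clamped neurons take the input value, free neurons follow `tanh(y · (w + α ⊙ Hebb))`, and the Hebbian trace decays toward the outer product of pre- and post-synaptic activity. A minimal NumPy sketch of that rule (illustration only; the names `w`, `alpha`, `eta`, `hebb` mirror the source, while the 4-neuron size and the random initialization here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
w = 0.01 * rng.standard_normal((n, n))      # fixed (baseline) weights
alpha = 0.01 * rng.standard_normal((n, n))  # per-connection plasticity coefficients
eta = 0.01                                  # "learning rate" of plasticity
hebb = np.zeros((n, n))                     # Hebbian trace, starts at zero
y = np.zeros(n)                             # network state (row vector)

def step(x, y, hebb):
    # Neurons receiving nonzero input are clamped to the input value,
    # as in a standard Hopfield-style network
    clamps = (x != 0).astype(float)
    yout = np.tanh(y @ (w + alpha * hebb)) * (1 - clamps) + x * clamps
    # Decaying Hebbian trace: outer product of previous and current activity
    hebb = (1 - eta) * hebb + eta * np.outer(y, yout)
    return yout, hebb

x = np.array([1.0, -1.0, 0.0, 0.0])  # two clamped inputs, two free neurons
for _ in range(5):
    y, hebb = step(x, y, hebb)
```

The real model runs this on GPU tensors over whole episodes and trains `w`, `alpha`, and `eta` by backpropagating the reconstruction loss through the plastic updates.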
================================================
FILE: images/anim.py
================================================
# Make an animation from the activities of the network over time
#
# Copyright (c) 2018 Uber Technologies, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
from numpy import random
import torch.nn.functional as F
import scipy
import scipy.misc
from torch import optim
import random
import sys
import pickle
import pdb
import time
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import glob
np.set_printoptions(precision=3)
import images as pics
from images import Network
fig = plt.figure()
plt.axis('off')
# Note that this is a different file from the ones used in training
with open('./data_batch_5', 'rb') as fo:
imagedict = pickle.load(fo, encoding='bytes')
imagedata = imagedict[b'data']
#suffix = 'eta_prestime_20_probadegrade_0.5_interpresdelay_2_learningrate_0.0001_prestimetest_3_rngseed_0_nbiter_50000_nbprescycles_3_inputboost_1.0_eta_0.01_nbpatterns_3_patternsize_1024' # This one used for first draft of the paper, rngseed 4
#suffix = 'eta_inputboost_1.0_learningrate_0.0001_nbprescycles_3_interpresdelay_2_eta_0.01_rngseed_0_probadegrade_0.5_nbiter_150000_nbpatterns_3_prestimetest_3_patternsize_1024_prestime_20'
#suffix="eta_nbpatterns_3_inputboost_1.0_nbprescycles_3_prestime_20_prestimetest_5_interpresdelay_2_patternsize_1024_nbiter_50000_probadegrade_0.5_learningrate_0.0001_eta_0.01_rngseed_0"
suffix='etarefiner_eta_0.01_nbpatterns_3_interpresdelay_2_patternsize_1024_prestime_20_learningrate_1e-05_nbprescycles_3_rngseed_0_prestimetest_3_probadegrade_0.5_inputboost_1.0_nbiter_150000'
#fn = './tmp/results_'+suffix+'.dat'
fn = './results_'+suffix+'.dat'
with open(fn, 'rb') as fo:
myw = pickle.load(fo)
myalpha = pickle.load(fo)
myeta = pickle.load(fo)
myall_losses = pickle.load(fo)
myparams = pickle.load(fo)
net = Network(myparams)
#np.random.seed(params['rngseed']); random.seed(params['rngseed']); torch.manual_seed(params['rngseed'])
#rngseed=18
#rngseed=4
rngseed=7
np.random.seed(rngseed); random.seed(rngseed); torch.manual_seed(rngseed)
#print myall_losses
ttype = torch.cuda.FloatTensor # Must match the one in pics_eta.py
#ttype = torch.FloatTensor # Must match the one in pics_eta.py
net.w.data = torch.from_numpy(myw).type(ttype)
net.alpha.data = torch.from_numpy(myalpha).type(ttype)
net.eta.data = torch.from_numpy(myeta).type(ttype)
print(net.w.data[:10,:10])
print(net.eta.data)
NBPICS = 1 # 10
nn = 1 # (shadows the torch.nn import above; the nn module is not used after this point)
imagesize = int(np.sqrt(myparams['patternsize']))
outputs={}
FILLINGSTEPS = myparams['prestimetest'] + myparams['interpresdelay'] + 1
# Two ways to do it: show the full actual process, or show a "simplified" version where you just show the three images and the pattern completion (slowed down)
SIMPLIFIED = 0
if SIMPLIFIED:
for numpic in range(NBPICS):
print("Pattern", numpic)
z = np.random.rand()
z = np.random.rand()
inputsTensor, targetPattern = pics.generateInputsAndTarget(myparams, contiguousperturbation=True)
y = net.initialZeroState()
hebb = net.initialZeroHebb()
net.zeroDiagAlpha()
ax_imgs = []
print("Running the episode...")
for numstep in range(myparams['nbsteps']):
y, hebb = net(Variable(inputsTensor[numstep], requires_grad=False), y, hebb)
output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize))
#output = scipy.misc.imresize(output, 4.0)
#plt.subplot(NBPICS, FILLINGSTEPS, nn)
#plt.axis('off')
#plt.imshow(output, cmap='gray', vmin=-1.0, vmax=1.0)
#if numstep == 1 or numstep == myparams['prestime'] + myparams['interpresdelay'] + 1 or \
#numstep == 2 * (myparams['prestime'] + myparams['interpresdelay']) + 1 or \
# Show the last set of 3 patterns, and the completion:
if numstep == myparams['nbsteps'] - myparams['prestimetest'] - myparams['interpresdelay'] - 2 or \
numstep == myparams['nbsteps'] - myparams['prestimetest'] - (myparams['interpresdelay'] + myparams['prestime']) - myparams['interpresdelay'] - 2 or \
numstep == myparams['nbsteps'] - myparams['prestimetest'] - (myparams['interpresdelay'] + myparams['prestime']) *2 - myparams['interpresdelay'] - 2 or \
numstep >= myparams['nbsteps'] - myparams['prestimetest'] :
if numstep == myparams['nbsteps'] - myparams['prestimetest'] :
output_half = output.copy()
output_half[16:,:] = 0 # NOTE: we are assuming that the grayed part will be the bottom one, which is only true for half the cases
a1 = plt.imshow(output_half, animated=True, cmap='gray', vmin=-1.0, vmax=1.0)
else:
a1 = plt.imshow(output, animated=True, cmap='gray', vmin=-1.0, vmax=1.0)
#a2 = plt.text(1, 1, str(numstep)+"/"+str(myparams['nbsteps']), fontsize=12, color='r')
if numstep < myparams['nbsteps'] - myparams['prestimetest'] :
a3 = plt.text(1, 1, "Pattern "+str(nn), fontsize=12, color='r')
else:
a3 = plt.text(1, 1, "Pattern completion", fontsize=12, color='r')
ax_imgs.append([a1, a3])
#ax_imgs.append([fullimg])
nn += 1
#scipy.misc.imsave('pic'+str(numpic)+'_'+str(numstep)+'.png', output)
#plt.show(block=True)
print("Writing out the animation file")
anim = animation.ArtistAnimation(fig, ax_imgs, repeat_delay=2000) # repeat_delay is ignored...
anim.save('anim_short_'+str(numpic)+'.gif', writer='imagemagick', fps=1)
# All images could be rotated 90deg. This allows us to display each set as a
# vertical column by rotating the final image 90 degrees too.
#output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize))
#pattern1 = inputsTensor.cpu().numpy()[0][0][:-1].reshape((imagesize, imagesize))
#pattern2 = inputsTensor.cpu().numpy()[myparams['prestime']+myparams['interpresdelay']+1][0][:-1].reshape((imagesize, imagesize))
#pattern3 = inputsTensor.cpu().numpy()[2*(myparams['prestime']+myparams['interpresdelay'])+1][0][:-1].reshape((imagesize, imagesize))
#blankedpattern = inputsTensor.cpu().numpy()[-1][0][:-1].reshape((imagesize, imagesize))
#plt.subplot(NBPICS,5,nn)
#plt.axis('off')
#plt.imshow(pattern1, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+1)
#plt.axis('off')
#plt.imshow(pattern2, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+2)
#plt.axis('off')
#plt.imshow(pattern3, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+3)
#plt.axis('off')
#plt.imshow(blankedpattern, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+4)
#plt.imshow(output, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.axis('off')
#nn += 5
#td = targetPattern.cpu().numpy()
#yd = y.data.cpu().numpy()[0][:-1]
#absdiff = np.abs(td-yd)
#print("Mean / median / max abs diff:", np.mean(absdiff), np.median(absdiff), np.max(absdiff))
#print("Correlation (full / sign): ", np.corrcoef(td, yd)[0][1], np.corrcoef(np.sign(td), np.sign(yd))[0][1])
##print inputs[numstep]
#plt.subplots_adjust(wspace=.1, hspace=.1)
else:
for numpic in range(NBPICS):
print("Pattern", numpic)
z = np.random.rand()
z = np.random.rand()
inputsTensor, targetPattern = pics.generateInputsAndTarget(myparams, contiguousperturbation=True)
y = net.initialZeroState()
hebb = net.initialZeroHebb()
net.zeroDiagAlpha()
ax_imgs = []
print("Running the episode...")
for numstep in range(myparams['nbsteps']):
y, hebb = net(Variable(inputsTensor[numstep], requires_grad=False), y, hebb)
output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize))
#output = scipy.misc.imresize(output, 4.0)
#plt.subplot(NBPICS, FILLINGSTEPS, nn)
#plt.axis('off')
#plt.imshow(output, cmap='gray', vmin=-1.0, vmax=1.0)
a1 = plt.imshow(output, animated=True, cmap='gray', vmin=-1.0, vmax=1.0)
a2 = plt.text(1, 1, str(numstep)+"/"+str(myparams['nbsteps']), fontsize=12, color='r')
if numstep < myparams['nbsteps'] - myparams['prestimetest'] - 1:
a3 = plt.text(14, 1, "Pattern presentations", fontsize=12, color='r')
else:
a3 = plt.text(14, 1, "Pattern completion", fontsize=12, color='r')
ax_imgs.append([a1, a2, a3])
#ax_imgs.append([fullimg])
nn += 1
#scipy.misc.imsave('pic'+str(numpic)+'_'+str(numstep)+'.png', output)
# Post-completion, keep the last image up a bit
for numstep_add in range(50):
a1 = plt.imshow(output, animated=True, cmap='gray', vmin=-1.0, vmax=1.0)
a2 = plt.text(1, 1, str(myparams['nbsteps'])+"/"+str(myparams['nbsteps']), fontsize=12, color='r')
a3 = plt.text(14, 1, "Pattern completion", fontsize=12, color='r')
ax_imgs.append([a1, a2, a3])
#plt.show(block=True)
print("Writing out the animation file")
anim = animation.ArtistAnimation(fig, ax_imgs, repeat_delay=2000) # repeat_delay is ignored...
anim.save('anim_full_'+str(numpic)+'.gif', writer='imagemagick', fps=10)
# All images could be rotated 90deg. This allows us to display each set as a
# vertical column by rotating the final image 90 degrees too.
#output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize))
#pattern1 = inputsTensor.cpu().numpy()[0][0][:-1].reshape((imagesize, imagesize))
#pattern2 = inputsTensor.cpu().numpy()[myparams['prestime']+myparams['interpresdelay']+1][0][:-1].reshape((imagesize, imagesize))
#pattern3 = inputsTensor.cpu().numpy()[2*(myparams['prestime']+myparams['interpresdelay'])+1][0][:-1].reshape((imagesize, imagesize))
#blankedpattern = inputsTensor.cpu().numpy()[-1][0][:-1].reshape((imagesize, imagesize))
#plt.subplot(NBPICS,5,nn)
#plt.axis('off')
#plt.imshow(pattern1, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+1)
#plt.axis('off')
#plt.imshow(pattern2, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+2)
#plt.axis('off')
#plt.imshow(pattern3, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+3)
#plt.axis('off')
#plt.imshow(blankedpattern, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+4)
#plt.imshow(output, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.axis('off')
#nn += 5
#td = targetPattern.cpu().numpy()
#yd = y.data.cpu().numpy()[0][:-1]
#absdiff = np.abs(td-yd)
#print("Mean / median / max abs diff:", np.mean(absdiff), np.median(absdiff), np.max(absdiff))
#print("Correlation (full / sign): ", np.corrcoef(td, yd)[0][1], np.corrcoef(np.sign(td), np.sign(yd))[0][1])
##print inputs[numstep]
#plt.subplots_adjust(wspace=.1, hspace=.1)
================================================
FILE: images/images.py
================================================
# Differentiable plasticity: natural image memorization and reconstruction.
#
# Copyright (c) 2018 Uber Technologies, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This program uses the click module rather than argparse to scan command-line arguments. I won't do that again.
# You start getting acceptable results after ~3000 episodes (~15 minutes with a standard GPU). Let it run longer for better results.
# To observe the results, run testpics.py (which uses the output files produced by this program)
import torch
import torch.nn as nn
from torch.autograd import Variable
import click
import numpy as np
from numpy import random
import torch.nn.functional as F
from torch import optim
import random
import sys
import pickle
import pdb
import time
import os
import platform
# Uber-only:
#import OpusHdfsCopy
#from OpusHdfsCopy import transferFileToHdfsDir, checkHdfs
# Loading the image data. This requires downloading the CIFAR 10 dataset (Python version) - https://www.cs.toronto.edu/~kriz/cifar.html
imagedata=np.zeros((0, 1024*3))
for numfile in range(4):
with open('./data_batch_'+str(numfile+1), 'rb') as fo:
#imagedict = pickle.load(fo) # Python 2
imagedict = pickle.load(fo, encoding='bytes') # Python 3
imagedata = np.concatenate((imagedata, imagedict[b'data']), axis=0)
np.set_printoptions(precision=4)
defaultParams = {
'nbpatterns': 3, # number of images per episode
'nbprescycles': 3, # number of presentations for each image
'prestime': 20, # number of time steps for each image presentation
'prestimetest': 3, # number of time steps for the test (degraded) image
'interpresdelay': 2, # number of time steps (with zero input) between two presentations
'patternsize': 1024, # size of the images (32 x 32 = 1024)
'nbiter': 100000, # number of episodes
'probadegrade': .5, # when contiguousperturbation is False (which it shouldn't be), probability of zeroing each pixel in the test image
'lr': 1e-4, # Adam learning rate
'print_every': 10, # how often to print statistics and save files
'homogenous': 0, # whether alpha should be shared across connections
'rngseed':0 # random seed
}
#ttype = torch.FloatTensor # For CPU
ttype = torch.cuda.FloatTensor # For GPU
# Generate the full list of inputs for an episode
def generateInputsAndTarget(params, contiguousperturbation=True):
#print(("Input Boost:", params['inputboost']))
inputT = np.zeros((params['nbsteps'], 1, params['nbneur'])) #inputTensor, initially in numpy format...
# Create the random patterns to be memorized in an episode
# Floating-point, graded patterns, zero-mean
patterns=[]
for nump in range(params['nbpatterns']):
numpic = np.random.randint(imagedata.shape[0])
p = imagedata[numpic].reshape((3, 1024)).sum(0).astype(float)
p = p[:params['patternsize']]
p = p - np.mean(p)
p = p / (1e-8+np.max(np.abs(p)))
#p = (np.random.randint(2, size=params['patternsize']) - .5) *2 # Binary patterns
patterns.append(p)
#print "patterns generated!"
# Now 'patterns' contains the NBPATTERNS patterns to be memorized in this episode - in numpy format
# Creating the test pattern, partially zero'ed out, that the network will have to complete
testpattern = random.choice(patterns).copy()
preservedbits = np.ones(params['patternsize'])
if contiguousperturbation: # Contiguous perturbation = one contiguous half of the image is zeroed out. Default (see above).
preservedbits[int(params['patternsize']/2):] = 0
if np.random.rand() < .5:
preservedbits = 1 - preservedbits
else: # Otherwise, randomly zero out individual pixels. Because natural images are highly autocorrelated, a trivial approximate solution is to take the average of nearby pixels.
preservedbits[:int(params['probadegrade'] * params['patternsize'])] = 0; np.random.shuffle(preservedbits)
degradedtestpattern = testpattern * preservedbits
# Inserting the inputs in the input tensor at the proper places
for nc in range(params['nbprescycles']):
np.random.shuffle(patterns)
for ii in range(params['nbpatterns']):
for nn in range(params['prestime']):
numi = nc * (params['nbpatterns'] * (params['prestime']+params['interpresdelay'])) + ii * (params['prestime']+params['interpresdelay']) + nn
inputT[numi][0][:params['patternsize']] = patterns[ii][:]
for nn in range(params['prestimetest']):
inputT[-params['prestimetest'] + nn][0][:params['patternsize']] = degradedtestpattern[:]
for nn in range(params['nbsteps']):
inputT[nn][0][-1] = 1.0 # Bias neuron is forced to 1
#inputT[nn] *= params['inputboost'] # Strengthen inputs
inputT = torch.from_numpy(inputT).type(ttype) # Convert from numpy to Tensor
target = torch.from_numpy(testpattern).type(ttype)
return inputT, target
class Network(nn.Module):
def __init__(self, params):
super(Network, self).__init__()
# Notice that the vectors are row vectors, and the matrices are transposed wrt the comp neuro order, following deep learning / pytorch conventions
# Each *column* of w targets a single output neuron
self.w = Variable(.01 * torch.randn(params['nbneur'], params['nbneur']).type(ttype), requires_grad=True) # fixed (baseline) weights
if params['homogenous'] == 1:
self.alpha = Variable(.01 * torch.ones(1).type(ttype), requires_grad=True) # plasticity coefficients: homogenous/shared across connections
else:
self.alpha = Variable(.01 * torch.randn(params['nbneur'], params['nbneur']).type(ttype),requires_grad=True) # plasticity coefficients: independent
self.eta = Variable(.01 * torch.ones(1).type(ttype), requires_grad=True) # "learning rate" of plasticity, shared across all connections
self.params = params
def forward(self, input, yin, hebb):
# Inputs are fed by clamping the output of cells that receive input at the input value, like in standard Hopfield networks
# clamps = torch.zeros(1, self.params['nbneur'])
clamps = np.zeros(self.params['nbneur'])
zz = torch.nonzero(input.data[0].cpu()).numpy().squeeze()
#print(zz, zz.shape)
clamps[zz] = 1
#print(clamps)
clamps = Variable(torch.from_numpy(clamps).type(ttype), requires_grad=False).float()
yout = F.tanh( yin.mm(self.w + torch.mul(self.alpha, hebb))) * (1 - clamps) + input * clamps
hebb = (1 - self.eta) * hebb + self.eta * torch.bmm(yin.unsqueeze(2), yout.unsqueeze(1))[0] # bmm used to implement outer product
return yout, hebb
def initialZeroState(self):
return Variable(torch.zeros(1, self.params['nbneur']).type(ttype))
def initialZeroHebb(self):
return Variable(torch.zeros(self.params['nbneur'], self.params['nbneur']).type(ttype))
def train(paramdict=None):
#params = dict(click.get_current_context().params)
print("Starting training...")
params = {}
params.update(defaultParams)
if paramdict:
params.update(paramdict)
print("Passed params: ", params)
print(platform.uname())
sys.stdout.flush()
params['nbsteps'] = params['nbprescycles'] * ((params['prestime'] + params['interpresdelay']) * params['nbpatterns']) + params['prestimetest'] # Total number of steps per episode
params['nbneur'] = params['patternsize'] + 1
suffix = "images_"+"".join([str(x)+"_" if pair[0] not in ('nbneur', 'nbsteps', 'print_every', 'rngseed') else '' for pair in zip(params.keys(), params.values()) for x in pair])[:-1] + '_rngseed_'+str(params['rngseed']) # Turning the parameters into a nice suffix for filenames; rngseed always appears last
# Initialize random seeds (first two redundant?)
print("Setting random seeds")
np.random.seed(params['rngseed']); random.seed(params['rngseed']); torch.manual_seed(params['rngseed'])
#print(click.get_current_context().params)
print("Initializing network")
net = Network(params)
total_loss = 0.0
print("Initializing optimizer")
optimizer = torch.optim.Adam([net.w, net.alpha, net.eta], lr=params['lr'])
all_losses = []
#print_every = 20
nowtime = time.time()
print("Starting episodes...")
sys.stdout.flush()
for numiter in range(params['nbiter']):
# print("Iter ", numiter)
# sys.stdout.flush()
y = net.initialZeroState()
hebb = net.initialZeroHebb()
optimizer.zero_grad()
inputs, target = generateInputsAndTarget(params)
# Running the episode
for numstep in range(params['nbsteps']):
y, hebb = net(Variable(inputs[numstep], requires_grad=False), y, hebb)
# Computing gradients, applying optimizer
loss = (y[0][:params['patternsize']] - Variable(target, requires_grad=False)).pow(2).sum()
loss.backward()
optimizer.step()
lossnum = loss.data[0]
total_loss += lossnum
# Printing statistics, saving files
if (numiter+1) % params['print_every'] == 0:
print(numiter, "====")
td = target.cpu().numpy()
yd = y.data.cpu().numpy()[0][:-1]
print("y: ", yd[:10])
print("target: ", td[:10])
#print("target: ", target.unsqueeze(0)[0][:10])
absdiff = np.abs(td-yd)
print("Mean / median / max abs diff:", np.mean(absdiff), np.median(absdiff), np.max(absdiff))
print("Correlation (full / sign): ", np.corrcoef(td, yd)[0][1], np.corrcoef(np.sign(td), np.sign(yd))[0][1])
#print inputs[numstep]
previoustime = nowtime
nowtime = time.time()
print("Time spent on last", params['print_every'], "iters: ", nowtime - previoustime)
total_loss /= params['print_every']
all_losses.append(total_loss)
print("Mean loss over last", params['print_every'], "iters:", total_loss)
print("Saving local files...")
sys.stdout.flush()
with open('results_'+suffix+'.dat', 'wb') as fo:
pickle.dump(net.w.data.cpu().numpy(), fo)
pickle.dump(net.alpha.data.cpu().numpy(), fo)
pickle.dump(net.eta.data.cpu().numpy(), fo)
pickle.dump(all_losses, fo)
pickle.dump(params, fo)
print("ETA:", net.eta.data.cpu().numpy())
with open('loss_'+suffix+'.txt', 'w') as thefile:
for item in all_losses:
thefile.write("%s\n" % item)
# Uber-only
#print("Saving HDFS files...")
#if checkHdfs():
# print("Transfering to HDFS...")
# transferFileToHdfsDir('results_'+suffix+'.dat', '/ailabs/tmiconi/exp/')
# transferFileToHdfsDir('loss_'+suffix+'.txt', '/ailabs/tmiconi/exp/')
sys.stdout.flush()
sys.stderr.flush()
total_loss = 0
@click.command()
@click.option('--nbpatterns', default=defaultParams['nbpatterns'])
@click.option('--nbprescycles', default=defaultParams['nbprescycles'])
@click.option('--homogenous', default=defaultParams['homogenous'])
@click.option('--prestime', default=defaultParams['prestime'])
@click.option('--prestimetest', default=defaultParams['prestimetest'])
@click.option('--interpresdelay', default=defaultParams['interpresdelay'])
@click.option('--patternsize', default=defaultParams['patternsize'])
@click.option('--nbiter', default=defaultParams['nbiter'])
@click.option('--probadegrade', default=defaultParams['probadegrade'])
@click.option('--lr', default=defaultParams['lr'])
@click.option('--print_every', default=defaultParams['print_every'])
@click.option('--rngseed', default=defaultParams['rngseed'])
def main(nbpatterns, nbprescycles, homogenous, prestime, prestimetest, interpresdelay, patternsize, nbiter, probadegrade, lr, print_every, rngseed):
train(paramdict=dict(click.get_current_context().params))
#print(dict(click.get_current_context().params))
if __name__ == "__main__":
#train()
main()
================================================
FILE: images/plotresults.py
================================================
import numpy as np
import glob
import matplotlib.pyplot as plt
fnames = glob.glob('./tmp/loss_simple_*.txt')
#fnames = glob.glob('./tmp/loss_api_*.txt')
#fnames = glob.glob('./tmp/loss_fixed_*.txt')
plt.ion()
plt.figure(figsize=(5,4)) # Smaller figure = relative larger fonts
fulllosses=[]
losses=[]
lgts=[]
for fn in fnames:
z = np.loadtxt(fn)
lgts.append(len(z))
fulllosses.append(z)
minlen = min(lgts)
for z in fulllosses:
losses.append(z[:minlen])
losses = np.array(losses)
meanl = np.mean(losses, axis=0)
stdl = np.std(losses, axis=0)
highl = np.max(losses, axis=0)
lowl = np.min(losses, axis=0)
#highl = meanl+stdl
#lowl = meanl-stdl
xx = range(len(meanl))
# xticks and labels
xt = range(0, len(meanl), 50)
xtl = [str(10*i) for i in xt]
plt.fill_between(xx, lowl, highl, color='blue', alpha=.5)
plt.plot(meanl, color='blue')
#plt.xlabel('Loss (sum square diff. b/w final output and target)')
plt.xlabel('Number of Episodes')
plt.ylabel('Loss')
plt.xticks(xt, xtl)
plt.tight_layout()
================================================
FILE: images/request.json
================================================
{
"dockerImage":"test_tm",
"tag":"master-test-2017_10_31_17_22_28",
"name":"PicsAPIToCompareWithFixed",
"cpus":2.0,
"cmdLine":"export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \u0026\u0026 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/nvidia/bin:/opt/hadoop/latest/bin \u0026\u0026 export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/nvidia/lib64/ \u0026\u0026 export LC_ALL=C.UTF-8 \u0026\u0026 export LANG=C.UTF-8 \u0026\u0026 cd /home/work/ \u0026\u0026 python3 pics_api.py --rngseed {{mesos.instance}}",
"ramMB":8000,
"gpus":1,
"diskMB":1000,
"cluster":"opusprodda1",
"environment":"devel",
"user":"root",
"instances":10,
"isService":false,
"cronSchedule":"",
"custom":{},
"application":"testversion",
"maxRetries":3,
"constraints":{"sku":"1080ti"},
"accessTypes":[],
"dependencies":[],
"cronCollisionPolicy":"CANCEL_NEW",
"emailOnFail":[],
"emailOnSucceed":[]
}
================================================
FILE: images/showcompletion_eta.py
================================================
# Old code to show the dynamics of pattern completion: display the network's output at each time step.
# Useful for understanding how the network works (i.e. the need to clear remnant activity from previous stimuli).
# May require adjustments to run (e.g. change file names, etc.)
#
# Copyright (c) 2018 Uber Technologies, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
from numpy import random
import torch.nn.functional as F
import scipy
import scipy.misc
from torch import optim
import random
import sys
import pickle
import pdb
import time
np.set_printoptions(precision=3)
import matplotlib.pyplot as plt
plt.ion()
import images as pics
from images import Network
#plt.figure()
# Note that this is a different file from the ones used in training
with open('../data_batch_5', 'rb') as fo:
imagedict = pickle.load(fo, encoding='bytes')
imagedata = imagedict[b'data']
#suffix = 'eta_prestime_20_probadegrade_0.5_interpresdelay_2_learningrate_0.0001_prestimetest_3_rngseed_0_nbiter_50000_nbprescycles_3_inputboost_1.0_eta_0.01_nbpatterns_3_patternsize_1024' # This one used for first draft of the paper, rngseed 4
#suffix = 'eta_inputboost_1.0_learningrate_0.0001_nbprescycles_3_interpresdelay_2_eta_0.01_rngseed_0_probadegrade_0.5_nbiter_150000_nbpatterns_3_prestimetest_3_patternsize_1024_prestime_20'
#suffix="eta_nbpatterns_3_inputboost_1.0_nbprescycles_3_prestime_20_prestimetest_5_interpresdelay_2_patternsize_1024_nbiter_50000_probadegrade_0.5_learningrate_0.0001_eta_0.01_rngseed_0"
suffix='etarefiner_eta_0.01_nbpatterns_3_interpresdelay_2_patternsize_1024_prestime_20_learningrate_1e-05_nbprescycles_3_rngseed_0_prestimetest_3_probadegrade_0.5_inputboost_1.0_nbiter_150000'
#fn = './tmp/results_'+suffix+'.dat'
fn = './results_'+suffix+'.dat'
with open(fn, 'rb') as fo:
myw = pickle.load(fo)
myalpha = pickle.load(fo)
myeta = pickle.load(fo)
myall_losses = pickle.load(fo)
myparams = pickle.load(fo)
net = Network(myparams)
#np.random.seed(params['rngseed']); random.seed(params['rngseed']); torch.manual_seed(params['rngseed'])
#rngseed=4
rngseed=7
np.random.seed(rngseed); random.seed(rngseed); torch.manual_seed(rngseed)
#print myall_losses
ttype = torch.cuda.FloatTensor # Must match the one in pics_eta.py
#ttype = torch.FloatTensor # Must match the one in pics_eta.py
net.w.data = torch.from_numpy(myw).type(ttype)
net.alpha.data = torch.from_numpy(myalpha).type(ttype)
net.eta.data = torch.from_numpy(myeta).type(ttype)
print(net.w.data[:10,:10])
print(net.eta.data)
NBPICS = 10
nn=1
imagesize = int(np.sqrt(myparams['patternsize']))
outputs={}
plt.figure()
FILLINGSTEPS = myparams['prestimetest'] + myparams['interpresdelay'] + 1
for numpic in range(NBPICS):
print("Pattern", numpic)
z = np.random.rand()
z = np.random.rand()
inputsTensor, targetPattern = pics.generateInputsAndTarget(myparams, contiguousperturbation=True)
y = net.initialZeroState()
hebb = net.initialZeroHebb()
net.zeroDiagAlpha()
for numstep in range(myparams['nbsteps']):
y, hebb = net(Variable(inputsTensor[numstep], requires_grad=False), y, hebb)
if numstep >= myparams['nbsteps'] - FILLINGSTEPS:
output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize))
#output = scipy.misc.imresize(output, 4.0)
plt.subplot(NBPICS, FILLINGSTEPS, nn)
plt.axis('off')
plt.imshow(output, cmap='gray', vmin=-1.0, vmax=1.0)
nn += 1
#scipy.misc.imsave('pic'+str(numpic)+'_'+str(numstep)+'.png', output)
plt.show(block=True)
# All images could be rotated 90deg. This allows us to display each set as a
# vertical column by rotating the final image 90 degrees too.
#output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize))
#pattern1 = inputsTensor.cpu().numpy()[0][0][:-1].reshape((imagesize, imagesize))
#pattern2 = inputsTensor.cpu().numpy()[myparams['prestime']+myparams['interpresdelay']+1][0][:-1].reshape((imagesize, imagesize))
#pattern3 = inputsTensor.cpu().numpy()[2*(myparams['prestime']+myparams['interpresdelay'])+1][0][:-1].reshape((imagesize, imagesize))
#blankedpattern = inputsTensor.cpu().numpy()[-1][0][:-1].reshape((imagesize, imagesize))
#plt.subplot(NBPICS,5,nn)
#plt.axis('off')
#plt.imshow(pattern1, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+1)
#plt.axis('off')
#plt.imshow(pattern2, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+2)
#plt.axis('off')
#plt.imshow(pattern3, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+3)
#plt.axis('off')
#plt.imshow(blankedpattern, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.subplot(NBPICS,5,nn+4)
#plt.imshow(output, cmap='gray', vmin=-1.0, vmax=1.0)
#plt.axis('off')
#nn += 5
#td = targetPattern.cpu().numpy()
#yd = y.data.cpu().numpy()[0][:-1]
#absdiff = np.abs(td-yd)
#print("Mean / median / max abs diff:", np.mean(absdiff), np.median(absdiff), np.max(absdiff))
#print("Correlation (full / sign): ", np.corrcoef(td, yd)[0][1], np.corrcoef(np.sign(td), np.sign(yd))[0][1])
##print inputs[numstep]
#plt.subplots_adjust(wspace=.1, hspace=.1)
================================================
FILE: images/testpics.py
================================================
# Generate a figure that shows a number of episodes
#
# Copyright (c) 2018 Uber Technologies, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
from numpy import random
import torch.nn.functional as F
from torch import optim
import random
import sys
import pickle
import pdb
import time
np.set_printoptions(precision=3)
import matplotlib.pyplot as plt
plt.ion()
import images as pics
from images import Network
plt.figure()
# Note that this is a different file from the ones used in training
with open('./data_batch_5', 'rb') as fo:
imagedict = pickle.load(fo, encoding='bytes')
imagedata = imagedict[b'data']
suffix='images_patternsize_1024_interpresdelay_2_nbpatterns_3_lr_0.0001_nbprescycles_3_homogenous_20_nbiter_100000_prestime_20_probadegrade_0.5_prestimetest_3_rngseed_0'
#fn = './tmp/results_'+suffix+'.dat'
fn = './results_'+suffix+'.dat'
with open(fn, 'rb') as fo:
myw = pickle.load(fo)
myalpha = pickle.load(fo)
myeta = pickle.load(fo)
myall_losses = pickle.load(fo)
myparams = pickle.load(fo)
net = Network(myparams)
#np.random.seed(params['rngseed']); random.seed(params['rngseed']); torch.manual_seed(params['rngseed'])
#rngseed=4
rngseed=4
np.random.seed(rngseed); random.seed(rngseed); torch.manual_seed(rngseed)
#print myall_losses
ttype = torch.cuda.FloatTensor # Must match the one in pics_eta.py
#ttype = torch.FloatTensor # Must match the one in pics_eta.py
net.w.data = torch.from_numpy(myw).type(ttype)
net.alpha.data = torch.from_numpy(myalpha).type(ttype)
net.eta.data = torch.from_numpy(myeta).type(ttype)
print(net.w.data[:10,:10])
print(net.eta.data)
NBPICS = 7
nn=1
for numpic in range(NBPICS):
print("Pattern", numpic)
inputsTensor, targetPattern = pics.generateInputsAndTarget(myparams, contiguousperturbation=True)
y = net.initialZeroState()
hebb = net.initialZeroHebb()
#net.zeroDiagAlpha()
for numstep in range(myparams['nbsteps']):
y, hebb = net(Variable(inputsTensor[numstep], requires_grad=False), y, hebb)
# All images could be rotated 90deg. This allows us to display each set as a
# vertical column by rotating the final image 90 degrees too.
imagesize = int(np.sqrt(myparams['patternsize']))
output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize))
pattern1 = inputsTensor.cpu().numpy()[0][0][:-1].reshape((imagesize, imagesize))
pattern2 = inputsTensor.cpu().numpy()[myparams['prestime']+myparams['interpresdelay']+1][0][:-1].reshape((imagesize, imagesize))
pattern3 = inputsTensor.cpu().numpy()[2*(myparams['prestime']+myparams['interpresdelay'])+1][0][:-1].reshape((imagesize, imagesize))
blankedpattern = inputsTensor.cpu().numpy()[-1][0][:-1].reshape((imagesize, imagesize))
#output = y.data.cpu().numpy()[0][:-1].reshape((imagesize, imagesize)).T
#pattern1 = inputsTensor.cpu().numpy()[0][0][:-1].reshape((imagesize, imagesize)).T
#pattern2 = inputsTensor.cpu().numpy()[myparams['prestime']+myparams['interpresdelay']+1][0][:-1].reshape((imagesize, imagesize)).T
#pattern3 = inputsTensor.cpu().numpy()[2*(myparams['prestime']+myparams['interpresdelay'])+1][0][:-1].reshape((imagesize, imagesize)).T
#blankedpattern = inputsTensor.cpu().numpy()[-1][0][:-1].reshape((imagesize, imagesize)).T
plt.subplot(NBPICS,5,nn)
plt.axis('off')
plt.imshow(pattern1, cmap='gray', vmin=-1.0, vmax=1.0)
plt.subplot(NBPICS,5,nn+1)
plt.axis('off')
plt.imshow(pattern2, cmap='gray', vmin=-1.0, vmax=1.0)
plt.subplot(NBPICS,5,nn+2)
plt.axis('off')
plt.imshow(pattern3, cmap='gray', vmin=-1.0, vmax=1.0)
plt.subplot(NBPICS,5,nn+3)
plt.axis('off')
plt.imshow(blankedpattern, cmap='gray', vmin=-1.0, vmax=1.0)
plt.subplot(NBPICS,5,nn+4)
plt.imshow(output, cmap='gray', vmin=-1.0, vmax=1.0)
plt.axis('off')
nn += 5
td = targetPattern.cpu().numpy()
yd = y.data.cpu().numpy()[0][:-1]
absdiff = np.abs(td-yd)
print("Mean / median / max abs diff:", np.mean(absdiff), np.median(absdiff), np.max(absdiff))
print("Correlation (full / sign): ", np.corrcoef(td, yd)[0][1], np.corrcoef(np.sign(td), np.sign(yd))[0][1])
#print inputs[numstep]
plt.subplots_adjust(wspace=.1, hspace=.1)
================================================
FILE: maze/OpusHdfsCopy.py
================================================
# Uber-only code for interacting with hdfs
#
# Copyright (c) 2018 Uber Technologies, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import os.path
def checkHdfs():
return os.path.isfile('/opt/hadoop/latest/bin/hdfs')
def transferFileToHdfsPath(sourcepath, targetpath):
hdfspath = targetpath
targetdir = os.path.dirname(targetpath)
os.system('/opt/hadoop/latest/bin/hdfs dfs -mkdir -p {}'.format(targetdir))
result = os.system(
'/opt/hadoop/latest/bin/hdfs dfs -copyFromLocal -f {} {}'.format(sourcepath, hdfspath)
)
if result != 0:
raise OSError('Cannot copyFromLocal {} {} returned {}'.format(sourcepath, hdfspath, result))
def transferFileToHdfsDir(sourcepath, targetdir):
hdfspath = os.path.join(targetdir, os.path.basename(sourcepath))
os.system('/opt/hadoop/latest/bin/hdfs dfs -mkdir -p {}'.format(targetdir))
result = os.system(
'/opt/hadoop/latest/bin/hdfs dfs -copyFromLocal -f {} {}'.format(sourcepath, hdfspath)
)
if result != 0:
raise OSError('Cannot copyFromLocal {} {} returned {}'.format(sourcepath, hdfspath, result))
================================================
FILE: maze/README.md
================================================
# Grid Maze task
The agent's task is to hit the (invisible) reward location as many times as
possible within a fixed number of time steps. Because the reward location is
randomized at the start of each episode, and the agent is randomly teleported
every time it hits the reward, the agent must discover and memorize the reward
location anew in each episode.
The agent's only inputs consist of a 3x3 neighborhood around the agent's
location, as well as the reward obtained (if any) and the action chosen at the
previous time step.
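As a concrete illustration, the sketch below assembles one time step's input vector following the layout and constants defined in `maze/anim.py` (`RFSIZE`, `ADDINPUT`, `NBACTIONS`, `TOTALNBINPUTS`); the `make_input` helper itself is hypothetical, not code from this repo.

```python
import numpy as np

RFSIZE = 3       # 3x3 receptive field around the agent
ADDINPUT = 4     # previous reward, step number, on-reward flag, bias
NBACTIONS = 4    # up, down, left, right (one-hot of previous action)
TOTALNBINPUTS = RFSIZE * RFSIZE + ADDINPUT + NBACTIONS

def make_input(neighborhood, prev_reward, numstep, on_reward, prev_action):
    """Assemble one time step's input vector (hypothetical helper)."""
    inputs = np.zeros(TOTALNBINPUTS, dtype=np.float32)
    inputs[:RFSIZE * RFSIZE] = neighborhood.ravel()      # visual inputs
    inputs[RFSIZE * RFSIZE + 0] = prev_reward
    inputs[RFSIZE * RFSIZE + 1] = numstep
    inputs[RFSIZE * RFSIZE + 2] = on_reward
    inputs[RFSIZE * RFSIZE + 3] = 1.0                    # bias input
    inputs[RFSIZE * RFSIZE + ADDINPUT + prev_action] = 1.0  # one-hot action
    return inputs

x = make_input(np.ones((3, 3)), prev_reward=0.0, numstep=5,
               on_reward=0.0, prev_action=2)
print(x.shape)  # (17,)
```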
The outer-loop meta-learning algorithm is Advantage Actor-Critic (A2C). All
within-episode learning occurs through the self-modulated plasticity of network
connections.
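The plastic-connection rule used throughout this repo can be sketched as follows: each effective weight is a fixed component `w` plus a plastic component `alpha * hebb`, where the Hebbian trace is updated at every step with learning rate `eta` (the names mirror `net.w`, `net.alpha`, `net.eta` elsewhere in this repo). This is a simplified toy, not the repo's exact forward pass; in the modulated variants, `eta` is replaced by a signal computed by the network itself.

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)
w = rng.standard_normal((N, N)) * 0.1      # fixed weights (trained by SGD)
alpha = rng.standard_normal((N, N)) * 0.1  # per-connection plasticity (trained)
eta = 0.02                                  # plasticity learning rate
hebb = np.zeros((N, N))                     # plastic trace, reset each episode

y = np.zeros(N)
for step in range(10):
    x = rng.standard_normal(N)              # stand-in input
    # Effective weights = fixed part + plastic part
    y_new = np.tanh((w + alpha * hebb).T @ y + x)
    # Hebbian trace: decaying running average of pre/post activity products
    hebb = (1.0 - eta) * hebb + eta * np.outer(y, y_new)
    y = y_new

print(hebb.shape)
```

Because the update is a convex combination of the previous trace and a product of `tanh`-bounded activities, the trace stays bounded without an explicit clip in this toy version.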
For a simpler (but less flexible) implementation of the same task, see the `simplemaze` directory in this repo.
## Visualizations of agent behavior
We show the behavior of the agent over two successive episodes, after 0 and 200,000 meta-learning iterations. The reward location is indicated only for visualization purposes: it is invisible to the agent.
### Episode 0

### Episode 200,000

## Usage
`python3 batch.py --eplen 200 --hs 100 --lr 1e-4 --l2 0 --addpw 3 --pe 1000 --blossv 0.1 --bent 0.03 --rew 10 --save_every 1000 --rsp 1 --type modplast --da tanh --nbiter 200002 --msize 13 --wp 0.0 --bs 30 --gc 4.0 --rngseed 0`
`eplen` is the length of an episode, `hs` is the hidden/recurrent layer size, `bs` is the batch size, and `gc` is the gradient-clipping threshold.
`type` can be "modplast" (simple neuromodulation), "modul" (retroactive modulation), "plastic" (non-modulated plasticity) or "rnn" (no plasticity at all, plain rnn).
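The four `--type` variants differ mainly in how the Hebbian trace is updated each step. The sketch below is a heavily simplified, hedged paraphrase (not the repo's `batch.py` code, which also clips the trace): "rnn" has no trace, "plastic" uses a fixed learned `eta`, "modplast" gates the update with a per-step modulatory signal `m`, and "modul" first accumulates an eligibility trace and lets `m` gate its transfer into the plastic weights.

```python
import numpy as np

def update_trace(kind, hebb, et, pre, post, eta=0.02, m=1.0):
    """One-step trace update for each variant (illustrative only)."""
    if kind == "rnn":                        # no plasticity at all
        return hebb, et
    outer = np.outer(pre, post)
    if kind == "plastic":                    # fixed learned eta
        return (1 - eta) * hebb + eta * outer, et
    if kind == "modplast":                   # eta gated by modulator m
        return (1 - eta) * hebb + eta * m * outer, et
    if kind == "modul":                      # retroactive modulation:
        et = (1 - eta) * et + eta * outer    # accumulate eligibility trace,
        return hebb + m * et, et             # then m gates its transfer
    raise ValueError(kind)

pre = np.array([1.0, 0.0])
post = np.array([0.0, 1.0])
h = np.zeros((2, 2))
e = np.zeros((2, 2))
h, e = update_trace("modplast", h, e, pre, post, m=0.5)
print(h[0, 1])  # eta * m * pre[0] * post[1] = 0.01
```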
================================================
FILE: maze/anim.py
================================================
# python anim.py --nbiter 1000000 --rule oja --squash 0 --hiddensize 200 --lr 1e-4 --eplen 250 --print_every 200 --save_every 1000 --bentropy 0.1 --blossv .03 --randstart 1 --gr .9 --rp 0 --labsize 11 --rngseed 1 --type plastic
import argparse
import pdb
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
from numpy import random
import torch.nn.functional as F
from torch import optim
from torch.optim import lr_scheduler
import random
import sys
import pickle
import time
import os
import OpusHdfsCopy
from OpusHdfsCopy import transferFileToHdfsDir, checkHdfs
import platform
import gridlab
from gridlab import Network
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import glob
np.set_printoptions(precision=4)
ETA = .02 # Not used
ADDINPUT = 4 # 1 input for the previous reward, 1 input for numstep, 1 for whether currently on reward square, 1 "Bias" input
NBACTIONS = 4 # U, D, L, R
RFSIZE = 3 # Receptive Field
TOTALNBINPUTS = RFSIZE * RFSIZE + ADDINPUT + NBACTIONS
fig = plt.figure()
plt.axis('off')
def train(paramdict):
#params = dict(click.get_current_context().params)
print("Starting training...")
params = {}
#params.update(defaultParams)
params.update(paramdict)
print("Passed params: ", params)
print(platform.uname())
#params['nbsteps'] = params['nbshots'] * ((params['prestime'] + params['interpresdelay']) * params['nbclasses']) + params['prestimetest'] # Total number of steps per episode
# This needs to be the same as in the file generated by gridlab, and thus the command line parameters must be identical
suffix = "grid_"+"".join([str(x)+"_" if pair[0] != 'nbsteps' and pair[0] != 'rngseed' and pair[0] != 'save_every' and pair[0] != 'test_every' else '' for pair in sorted(zip(params.keys(), params.values()), key=lambda x:x[0] ) for x in pair])[:-1] + "_rngseed_" + str(params['rngseed']) # Turning the parameters into a nice suffix for filenames
params['rngseed'] = 3
# Initialize random seeds
function train (line 43) | def train(paramdict):
FILE: sr/srrun.py
function train (line 35) | def train(paramdict):
FILE: sr/srrun1episode.py
function train (line 35) | def train(paramdict):
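The symbol index above repeatedly shows the core differentiable-plasticity interface: a `forward(self, input, yin, hebb)` method plus `initialZeroState` / `initialZeroHebb` (e.g. in `simple/full.py` and `simple/simple.py`). As a rough orientation, the recurrent step these networks implement follows the Miconi et al. formulation: each connection is a fixed weight plus a plasticity coefficient times a Hebbian trace that evolves within the episode. The sketch below is a minimal numpy rendition of that update rule, with illustrative names (`plastic_step`, `eta`); the repo's actual PyTorch code additionally backpropagates through `w`, `alpha`, and `eta`, which numpy cannot do.

```python
import numpy as np

def plastic_step(x, h_prev, hebb, w, alpha, eta):
    """One step of a differentiably plastic recurrent layer (sketch)."""
    # Effective connection = fixed weight plus plastic component,
    # where alpha scales a Hebbian trace accumulated within the episode.
    h = np.tanh(x + (w + alpha * hebb).T @ h_prev)
    # Hebbian trace: decaying running average of pre/post activity products.
    hebb = (1.0 - eta) * hebb + eta * np.outer(h_prev, h)
    return h, hebb

n = 4
rng = np.random.default_rng(0)
w = np.zeros((n, n))      # fixed weights (learned by SGD in the repo)
alpha = np.ones((n, n))   # plasticity coefficients (also learned)
hebb = np.zeros((n, n))   # plastic trace, reset at each episode start
h = np.zeros(n)           # cf. initialZeroState / initialZeroHebb above
x = rng.standard_normal(n)
h, hebb = plastic_step(x, h, hebb, w, alpha, eta=0.5)
```

Note that `zeroDiagAlpha` in `simple/full.py` suggests the diagonal of `alpha` is clamped to zero (no plastic self-connections); the sketch omits that detail.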
Condensed preview — 106 files, each entry showing path, character count, and a content snippet; the full structured content runs to 793K chars.
[
{
"path": ".gitignore",
"chars": 174,
"preview": "*/*.data\n*/data\n*/*.pt\n*/*.swp\n*/*.txt\n*/*.png\n*/*.dat\n*/tmp\n*.swp\n*.txt\n*.png\n*.gif\n*.dat\nloss*\ngrads_*\n__pycache__/*\n*"
},
{
"path": "LICENSE",
"chars": 4819,
"preview": "\"License\" shall mean the terms and conditions for use, reproduction, and distribution as defined by the text below.\n \n\"Y"
},
{
"path": "NOTICE.md",
"chars": 1930,
"preview": "The `awd-lstm-lm` directory (language modelling with plastic LSTMs) was forked\nfrom the [Salesforce Language Model\nToolk"
},
{
"path": "README.md",
"chars": 1848,
"preview": "## Differentiable plasticity\n\nThis repo contains implementations of the algorithms described in [Differentiable plastici"
},
{
"path": "awd-lstm-lm/.gitignore",
"chars": 81,
"preview": "maintmp.py\nHDFS/\n*.patch\nmodel_*\nresults_*\n*.pt\n*.swp\n__pycache__/\ndata/\ncorpus*\n"
},
{
"path": "awd-lstm-lm/LICENSE",
"chars": 1500,
"preview": "BSD 3-Clause License\n\nCopyright (c) 2017, \nAll rights reserved.\n\nRedistribution and use in source and binary forms, with"
},
{
"path": "awd-lstm-lm/OpusHdfsCopy.py",
"chars": 996,
"preview": "import os\nimport os.path\n\ndef checkHdfs():\n return os.path.isfile('/opt/hadoop/latest/bin/hdfs')\n\ndef transferFileToH"
},
{
"path": "awd-lstm-lm/OpusPrepare.sh",
"chars": 733,
"preview": "cd /home/work\n\n# $HOME is not the same as ~ !!!!\n\n# Installing pyenv and putting it in the path\ncurl -L https://raw.git"
},
{
"path": "awd-lstm-lm/README.md",
"chars": 2593,
"preview": "# LSTMs with neuromodulated plasticity\n\n\nThis code implements language modelling on the Penn Treebank dataset, using LST"
},
{
"path": "awd-lstm-lm/TESTCOMMAND",
"chars": 184,
"preview": "python test.py --model MYLSTM --nhid 1150 --file ./HDFS/ptb/model__SqUsq_MYLSTM_clip_cv2.0_modplasth2mod_fanout_i2c_per"
},
{
"path": "awd-lstm-lm/data.py",
"chars": 1628,
"preview": "import os\nimport torch\n\nfrom collections import Counter\n\n\nclass Dictionary(object):\n def __init__(self):\n self"
},
{
"path": "awd-lstm-lm/embed_regularize.py",
"chars": 1011,
"preview": "import numpy as np\nimport pdb\nimport torch\n\ndef embedded_dropout(embed, words, dropout=0.1, scale=None):\n if dropout:\n "
},
{
"path": "awd-lstm-lm/finetune.py",
"chars": 10068,
"preview": "import argparse\nimport time\nimport math\nimport numpy as np\nnp.random.seed(331)\nimport torch\nimport torch.nn as nn\n\nimpor"
},
{
"path": "awd-lstm-lm/generate.py",
"chars": 2760,
"preview": "###############################################################################\n# Language Modeling on Penn Tree Bank\n#\n"
},
{
"path": "awd-lstm-lm/getdata.sh",
"chars": 1750,
"preview": "echo \"=== Acquiring datasets ===\"\necho \"---\"\nmkdir -p save\n\nmkdir -p data\ncd data\n\n#echo \"- Downloading WikiText-2 (WT2)"
},
{
"path": "awd-lstm-lm/locked_dropout.py",
"chars": 454,
"preview": "import torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\n\nclass LockedDropout(nn.Module):\n def __init__"
},
{
"path": "awd-lstm-lm/main.py",
"chars": 18059,
"preview": "import OpusHdfsCopy\nfrom OpusHdfsCopy import transferFileToHdfsDir, checkHdfs\nimport argparse\nimport time\nimport math\nim"
},
{
"path": "awd-lstm-lm/model.py",
"chars": 10622,
"preview": "import torch\nimport torch.nn as nn\n#from torch.autograd import Variable\n\nfrom embed_regularize import embedded_dropout\nf"
},
{
"path": "awd-lstm-lm/model.py.old",
"chars": 4929,
"preview": "import torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\n\nfrom embed_regularize import embedded_dropout\nfr"
},
{
"path": "awd-lstm-lm/mylstm.py",
"chars": 23596,
"preview": "# Plastic LSTMs, with neuromodulation (backpropamine), \n# as described in Miconi et al. ICLR 2019,\n# by Thomas Miconi an"
},
{
"path": "awd-lstm-lm/mylstm.py.orig",
"chars": 8824,
"preview": "import torch\nfrom torch import nn\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nimport numpy as np"
},
{
"path": "awd-lstm-lm/opus.docker.old",
"chars": 340,
"preview": "#tmiconi_rl\n#latest\n#.\n\n\n#FROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10\nFROM opus-deep-learning-p"
},
{
"path": "awd-lstm-lm/plotresults.py",
"chars": 5210,
"preview": "import numpy as np\nimport glob\nimport matplotlib.pyplot as plt\nimport scipy\nfrom scipy import stats\n\ncolorz = ['r', 'b',"
},
{
"path": "awd-lstm-lm/plotresultssingle.py",
"chars": 711,
"preview": "import numpy as np\nimport matplotlib.pyplot as plt\nimport glob\n\n\nfns = glob.glob('./HDFS/ptb/results_*.txt')\n\nplt.figure"
},
{
"path": "awd-lstm-lm/pointer.py",
"chars": 5676,
"preview": "import argparse\nimport time\nimport math\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import"
},
{
"path": "awd-lstm-lm/request_devbox.json",
"chars": 575,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_11_27_11_33_25\",\n \"cpus\":2.0,\n \"ramMB\":26000,\n \""
},
{
"path": "awd-lstm-lm/request_full.json",
"chars": 1468,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_1_22_14_38_35\",\n \"name\":\"PLASTICLSTM_bs6_clip2_cliptyp"
},
{
"path": "awd-lstm-lm/request_opus.json",
"chars": 1429,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_3_13_17_37_3\",\n \"name\":\"newcode_SqUsq_clp2_PLASTICLSTM"
},
{
"path": "awd-lstm-lm/request_opus.json.old",
"chars": 1422,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_12_11_15_39_4\",\n \"name\":\"PLSTM_plastin_bs3_clip2_opus_"
},
{
"path": "awd-lstm-lm/request_plast.json",
"chars": 1407,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_12_11_15_39_4\",\n \"name\":\"PLSTM_plastin_bs3_clip2_opus_"
},
{
"path": "awd-lstm-lm/splitcross.py",
"chars": 9995,
"preview": "from collections import defaultdict\n\nimport torch\nimport torch.nn as nn\n\nimport numpy as np\n\n\nclass SplitCrossEntropyLos"
},
{
"path": "awd-lstm-lm/test.py",
"chars": 9438,
"preview": "import OpusHdfsCopy\nfrom OpusHdfsCopy import transferFileToHdfsDir, checkHdfs\nimport argparse\nimport time\nimport math\nim"
},
{
"path": "awd-lstm-lm/tmp.py",
"chars": 8113,
"preview": "import torch\nfrom torch import nn\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nimport numpy as np"
},
{
"path": "awd-lstm-lm/utils.py",
"chars": 1014,
"preview": "import torch\n#from torch.autograd import Variable\n\ndef repackage_hidden(h):\n \"\"\"Wraps hidden states in new Tensors, t"
},
{
"path": "awd-lstm-lm/weight_drop.py",
"chars": 3199,
"preview": "import torch\nfrom torch.nn import Parameter\nfrom functools import wraps\n\nclass WeightDrop(torch.nn.Module):\n def __in"
},
{
"path": "images/OpusHdfsCopy.py",
"chars": 1647,
"preview": "# Uber-only code for interacting with hdfs\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under the Apache "
},
{
"path": "images/README.md",
"chars": 393,
"preview": "## Images\n\nThis code implements the image completion task: three images are shown several times, then one of the image i"
},
{
"path": "images/anim.py",
"chars": 12044,
"preview": "# Make an animation from the activities of the network over time\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Lice"
},
{
"path": "images/images.py",
"chars": 12955,
"preview": "# Differentiable plasticity: natural image memorization and reconstruction.\n#\n# Copyright (c) 2018 Uber Technologies, In"
},
{
"path": "images/plotresults.py",
"chars": 1017,
"preview": "import numpy as np\nimport glob\nimport matplotlib.pyplot as plt\n\nfnames = glob.glob('./tmp/loss_simple_*.txt')\n#fnames = "
},
{
"path": "images/request.json",
"chars": 1005,
"preview": "{\n \"dockerImage\":\"test_tm\", \n \"tag\":\"master-test-2017_10_31_17_22_28\",\n \"name\":\"PicsAPIToCompareWithFixed\",\n "
},
{
"path": "images/showcompletion_eta.py",
"chars": 5855,
"preview": "# Old code to show the dynamics of pattern completion : show the product of the network at each time step\n# Useful to un"
},
{
"path": "images/testpics.py",
"chars": 4845,
"preview": "# Generate a figure that shows a number of episodes\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under th"
},
{
"path": "maze/OpusHdfsCopy.py",
"chars": 1647,
"preview": "# Uber-only code for interacting with hdfs\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under the Apache "
},
{
"path": "maze/README.md",
"chars": 2024,
"preview": "# Maze task\n\nThis code performs the grid-maze task, in which the agent must locate a reward and then navigate back to it"
},
{
"path": "maze/anim.py",
"chars": 17915,
"preview": "# python anim.py --nbiter 1000000 --rule oja --squash 0 --hiddensize 200 --lr 1e-4 --eplen 250 --print_every 200 --sav"
},
{
"path": "maze/animbatch.py",
"chars": 16109,
"preview": "# This code produces animations showing the behavior of an agent for two successive episodes.\n\n# Usage: python animbatch"
},
{
"path": "maze/batch.py",
"chars": 32713,
"preview": "# Backpropamine: differentiable neuromdulated plasticity.\n#\n# Copyright (c) 2018-2019 Uber Technologies, Inc.\n#\n# Licens"
},
{
"path": "maze/makefigure.py",
"chars": 8429,
"preview": "import numpy as np\nimport glob\nimport matplotlib.pyplot as plt\nimport scipy\nfrom scipy import stats\n\n#colorz = ['r', 'b'"
},
{
"path": "maze/makemaze.py",
"chars": 2505,
"preview": "# Not used for the current version.\n\nimport numpy as np\n\ndef genmaze(size, nblines):\n nbiter = 0\n N = size\n m ="
},
{
"path": "maze/maze.py",
"chars": 25943,
"preview": " \n# Differentiable plasticity: maze exploration task.\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under "
},
{
"path": "maze/opus.docker",
"chars": 252,
"preview": "#tmiconi_rl\n#latest\n#.\n\n\n#FROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10\nFROM localhost:5000/opus-"
},
{
"path": "maze/opus.docker.old",
"chars": 292,
"preview": "#tmiconi_rl\n#latest\n#.\n\n\n#FROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10\nFROM opus-deep-learning-p"
},
{
"path": "maze/plotfigure.py",
"chars": 4979,
"preview": "import numpy as np\nimport glob\nimport matplotlib.pyplot as plt\nimport scipy\nfrom scipy import stats\n\ncolorz = ['r', 'b',"
},
{
"path": "maze/plotresults.py",
"chars": 5524,
"preview": "# Code for plotting results\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under the Apache License, Versio"
},
{
"path": "maze/request.json",
"chars": 1250,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_2_9_17_17_4\",\n \"name\":\"Exp10_new_B_gr9_hs_100_labsize_"
},
{
"path": "maze/request_devbox.json",
"chars": 566,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_3_11_4_2\",\n \"cpus\":2.0,\n \"ramMB\":6000,\n \"gpus\""
},
{
"path": "maze/request_modplast.json",
"chars": 1297,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_3_11_4_2\",\n \"name\":\"Maze_Modplast_hs100_eplen200_add"
},
{
"path": "maze/request_modul.json",
"chars": 1291,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_3_11_4_2\",\n \"name\":\"Maze_Modul_hs100_eplen200_addpw3"
},
{
"path": "maze/request_plastic.json",
"chars": 1295,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_3_11_4_2\",\n \"name\":\"Maze_Plastic_hs101_eplen200_addp"
},
{
"path": "maze/request_rnn.json",
"chars": 1286,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_3_11_4_2\",\n \"name\":\"Maze_RNN_hs139_eplen200_addpw3_b"
},
{
"path": "maze/request_rnn100neurons.json",
"chars": 1285,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_3_11_4_2\",\n \"name\":\"Maze_RNN_hs00_eplen200_addpw3_bv"
},
{
"path": "maze/testbatch.py",
"chars": 59817,
"preview": "import argparse\nimport pdb\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport numpy as np\nfro"
},
{
"path": "maze/testnobatch.py",
"chars": 58738,
"preview": "import argparse\nimport pdb\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport numpy as np\nfro"
},
{
"path": "omniglot/.ipynb_checkpoints/Omniglot Data Loading-checkpoint.ipynb",
"chars": 87277,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 6,\n \"metadata\": {},\n \"outputs\": [\n {\n \"name\":"
},
{
"path": "omniglot/README.md",
"chars": 575,
"preview": "# Omniglot experiment\n\nThis code performs the Omniglot task (fast learning of image-label mappings).\n\nTo run this code, "
},
{
"path": "omniglot/omniglot.py",
"chars": 19973,
"preview": "# Differentiable plasticity: Omniglot task.\n\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under the Apache "
},
{
"path": "omniglot/opus.docker",
"chars": 326,
"preview": "#tmiconi_omniglot\n#latest\n#.\n\n\nFROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10\n\nRUN pip3 install sc"
},
{
"path": "omniglot/plotresults.py",
"chars": 2256,
"preview": "import numpy as np\nimport glob\nimport matplotlib.pyplot as plt\n\ngroupnames = glob.glob('./tmp/loss*rngseed_0.txt')\n#fnam"
},
{
"path": "omniglot/request.json",
"chars": 1237,
"preview": "{\n \"dockerImage\":\"tmiconi_omniglot\", \n \"tag\":\"master-test-2018_6_22_10_40_5\", \n \"name\":\"Exp7_OmniglotNoSepPlast"
},
{
"path": "omniglot/test_omniglot_allseeds.py",
"chars": 13519,
"preview": "# Differentiable plasticity: Omniglot task.\n\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under the Apache "
},
{
"path": "opus.docker",
"chars": 711,
"preview": "#tmiconi_rl\n#latest\n#.\n\n\n#FROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10\nFROM opus-deep-learning-p"
},
{
"path": "request_devbox.json",
"chars": 570,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_21_10_41_12\",\n \"cpus\":2.0,\n \"ramMB\":26000,\n \"g"
},
{
"path": "request_lstm.json",
"chars": 1440,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_21_10_41_12\",\n \"name\":\"newcode_PLASTICLSTM_agdiv1149"
},
{
"path": "request_lstm_simple.json",
"chars": 1452,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2019_5_21_10_41_12\",\n \"name\":\"newcode_SIMPLEPLASTICLSTM_agd"
},
{
"path": "simple/.gitignore",
"chars": 12,
"preview": "*.txt\n*.dat\n"
},
{
"path": "simple/OpusHdfsCopy.py",
"chars": 1647,
"preview": "# Uber-only code for interacting with hdfs\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under the Apache "
},
{
"path": "simple/README.md",
"chars": 1249,
"preview": "# Pattern memorization and completion\n\nThis code implements the pattern completion task. Five binary pattern of 1000 ele"
},
{
"path": "simple/full.py",
"chars": 10794,
"preview": "# Differentiable plasticity: binary pattern memorization and reconstruction\n#\n# Copyright (c) 2018 Uber Technologies, In"
},
{
"path": "simple/lstm.py",
"chars": 11695,
"preview": "# Memorization of two 50-bit binary patterns per episode, with LSTMs. Takes a very long time to learn the task, and even"
},
{
"path": "simple/opus.docker",
"chars": 252,
"preview": "#tmiconi_rl\n#latest\n#.\n\n\n#FROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10\nFROM localhost:5000/opus-"
},
{
"path": "simple/plotresults.py",
"chars": 1685,
"preview": "# Code to plot learning curves\n#\n#\n# Copyright (c) 2018 Uber Technologies, Inc.\n#\n# Licensed under the Apache License, V"
},
{
"path": "simple/request.json",
"chars": 1202,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_6_5_9_32_56\",\n \"name\":\"Exp_simple_1Miter_0addneur_plas"
},
{
"path": "simple/request_lstm.json",
"chars": 1209,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_6_6_16_30_17\",\n \"name\":\"ExpD_simple_lstm_1Miter_1949ad"
},
{
"path": "simple/simple.py",
"chars": 8224,
"preview": "# Differentiable plasticity: simple binary pattern memorization and reconstruction.\n#\n# Copyright (c) 2018 Uber Technolo"
},
{
"path": "simple/simplest.py",
"chars": 5253,
"preview": "# Differentiable plasticity: simplest fully functional code.\n\n# Copyright (c) 2018 Uber Technologies, Inc.\n# Licensed un"
},
{
"path": "simplemaze/README.md",
"chars": 4656,
"preview": "# Simple code for the grid maze task.\n\nThis code is a deliberately simplified version of the `maze` experiment. The\ncode"
},
{
"path": "simplemaze/maze.py",
"chars": 19266,
"preview": "# Backpropamine: differentiable neuromdulated plasticity.\n#\n# Copyright (c) 2018-2019 Uber Technologies, Inc.\n#\n# Licens"
},
{
"path": "sr/.gitignore",
"chars": 29,
"preview": "tmp/\ntmp/*\n*.txt\n*.dat\n*.swp\n"
},
{
"path": "sr/OpusHdfsCopy.py",
"chars": 996,
"preview": "import os\nimport os.path\n\ndef checkHdfs():\n return os.path.isfile('/opt/hadoop/latest/bin/hdfs')\n\ndef transferFileToH"
},
{
"path": "sr/README.md",
"chars": 1738,
"preview": "# Target discovery task\n\nA simple stimulus-response (\"SR\") association task.\n\nAt the start of each episode, we generate "
},
{
"path": "sr/anim.py",
"chars": 12689,
"preview": "import argparse\nimport pdb \nimport torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport numpy as np\nfr"
},
{
"path": "sr/makefigure.py",
"chars": 5435,
"preview": "import numpy as np\nimport glob\nimport matplotlib.pyplot as plt\nimport scipy\nfrom scipy import stats\n\ncolorz = ['g', 'ora"
},
{
"path": "sr/modul.py",
"chars": 14664,
"preview": "import pdb\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport numpy as np\nimport torch.nn.fun"
},
{
"path": "sr/opus.docker.old",
"chars": 291,
"preview": "#tmiconi_rl\n#latest\n#.\n\n\n#FROM localhost:5000/opus-deep-learning:master-test-2017_9_7_20_56_10\n#FROM opus-deep-learning:"
},
{
"path": "sr/plotmodulator.py",
"chars": 1468,
"preview": "import numpy as np; import matplotlib.pyplot as plt \n\nc = np.load('cueshown0.dat.npy'); r = np.load('rewardspre"
},
{
"path": "sr/plotresults.py",
"chars": 6851,
"preview": "import numpy as np\nimport glob\nimport matplotlib.pyplot as plt\nimport scipy\nfrom scipy import stats\n\ncolorz = ['r', 'b',"
},
{
"path": "sr/request.json",
"chars": 1403,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_9_21_9_55_16\",\n \"name\":\"Exp3lvlCS5_gc5_15runs_bent0.03"
},
{
"path": "sr/request_batch.json",
"chars": 1380,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_10_9_15_13_17\",\n \"name\":\"ExpSRbatch6_gc2.0_10runs_bent"
},
{
"path": "sr/request_easy.json",
"chars": 1380,
"preview": "{\n \"dockerImage\":\"tmiconi_rl\", \n \"tag\":\"master-test-2018_10_4_11_45_47\",\n \"name\":\"ExpSRbatch3_gc2.5_10runs_bent"
},
{
"path": "sr/srbatch.py",
"chars": 30743,
"preview": "# Stimulus-response task as described in Miconi et al. ICLR 2019.\n\n# Copyright (c) 2018-2019 Uber Technologies, Inc.\n#\n#"
},
{
"path": "sr/srrun.py",
"chars": 27022,
"preview": "import argparse\nimport pdb\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport numpy as np\nfro"
},
{
"path": "sr/srrun1episode.py",
"chars": 28852,
"preview": "import argparse\nimport pdb\nimport torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport numpy as np\nfro"
}
]
// ... and 3 more files (not shown in this preview)
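The `sr/modul.py` entries in the symbol index show a `RetroModulRNN.forward(self, inputs, hidden, hebb, et, pw)` signature: alongside the Hebbian trace, this variant carries an eligibility trace `et` and separate plastic weights `pw`, matching the retroactive neuromodulation ("backpropamine") scheme of Miconi et al. ICLR 2019, in which a modulatory signal decides when accumulated candidate updates are committed. The numpy sketch below illustrates that gating; names and the scalar modulator `m` are illustrative (in the repo the modulator is computed by the network itself, and the whole step is differentiable PyTorch).

```python
import numpy as np

def retromodul_step(x, h_prev, pw, et, w, alpha, m, trace_decay=0.9):
    """One step of retroactively modulated plasticity (sketch)."""
    # Plastic weights pw play the role of the Hebbian trace in the
    # non-modulated version: they shape the effective connectivity.
    h = np.tanh(x + (w + alpha * pw).T @ h_prev)
    # The eligibility trace accumulates candidate Hebbian updates...
    et = trace_decay * et + (1.0 - trace_decay) * np.outer(h_prev, h)
    # ...and the modulator m gates how much of the trace is committed.
    pw = pw + m * et
    return h, pw, et

n = 3
w = np.zeros((n, n)); alpha = np.ones((n, n))
pw = np.zeros((n, n)); et = np.zeros((n, n))
h = np.full(n, 0.5)
x = np.ones(n)
h, pw, et = retromodul_step(x, h, pw, et, w, alpha, m=0.0)
```

With `m = 0` nothing is written into `pw`, yet the trace keeps accumulating, so a later reward-driven modulator pulse can retroactively credit recent activity pairings.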
About this extraction
This page contains the full source code of the uber-common/differentiable-plasticity GitHub repository, extracted and formatted as plain text: 106 files (747.0 KB, approximately 241.0k tokens), plus a symbol index of 183 extracted functions, classes, methods, constants, and types.