Neural networks are all the rage these days, and with good reason. Microsoft Research's winning entry in the 2015 ImageNet competition classifies images with a 3.57% error rate (human performance is around 5.1%). Google used a variant to crush one of the world's best Go players 4-1. Crazy things are happening in the field, with no sign of slowing down. In this project, I've applied recurrent neural nets to learn a predictive model over symbolic sequences of music.
Disclaimer: This post assumes familiarity with machine learning and neural networks. For an excellent in-depth overview of RNNs, I highly recommend Andrej Karpathy's blog post here.
Music Language Modeling
Music Language Modeling is the problem of modeling symbolic sequences of polyphonic music in a completely general piano roll representation. Piano roll representation is a key distinction here: we're working with the symbolic note sequences as they would appear on sheet music, rather than more complex, acoustically rich audio signals. MIDI files are perfect for this, as they encode the note information exactly as it would be displayed on a piano roll.
The most straightforward way to learn such a model is to discretize a piece of music into uniform time steps. There are 88 possible pitches on a piano roll, from A0 to C8, so every time step is encoded as an 88-dimensional binary vector, where a value of 1 at index $i$ indicates that pitch $i$ is playing during that time step. We then plug this sequence of input vectors into an RNN architecture, where at each step the target is to predict the next time step of the sequence. A trained model outputs the conditional distribution of notes at a time step, given all the time steps that have occurred before it.
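To make the representation concrete, here's a minimal sketch of the naive piano roll encoding (illustrative only; the repo's real parser lives in midi_util.py):

```python
# A tiny sketch of the naive piano roll encoding: one 88-dimensional
# binary vector per time step. Names here are illustrative, not the repo's.
import numpy as np

NUM_PITCHES = 88   # A0 (MIDI 21) through C8 (MIDI 108)
MIDI_OFFSET = 21   # MIDI pitch number of A0

def encode_time_steps(active_notes_per_step):
    """active_notes_per_step: a list of lists of MIDI pitches, one per step."""
    seq = np.zeros((len(active_notes_per_step), NUM_PITCHES))
    for t, pitches in enumerate(active_notes_per_step):
        for pitch in pitches:
            seq[t, pitch - MIDI_OFFSET] = 1
    return seq

# a C major triad held for two time steps, then a rest
example = encode_time_steps([[60, 64, 67], [60, 64, 67], []])
```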
One problem with this naive formulation is that the number of possible note configurations is far too large ($2^{N}$ for $N$ possible notes) for the softmax classification approach normally used in image classification and language modeling. Instead, we can use a sigmoid cross-entropy loss to separately predict the probability that each note class is active. However, this approach does not capture the complex joint distribution of notes typically found within a time step. For example, C is much more likely than C# to be playing when E and G are also active, but separate classification targets implicitly assume independence between note probabilities at the same time step. Modeling Temporal Dependencies in High-Dimensional Sequences (Boulanger-Lewandowski, 2012), perhaps the most successful research paper on MLM so far, attempts to solve this problem using energy-based generative models such as the Restricted Boltzmann Machine (RBM). The authors propose the combined RNN-RBM architecture, which achieves state-of-the-art performance on several music datasets.
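To see where the independence assumption sneaks in, here's the naive per-note loss in plain numpy (a sketch; the repo's TensorFlow version is the sigmoid cross-entropy in model.py's Model class):

```python
# Sketch of the naive formulation's loss: 88 independent Bernoulli
# cross-entropy terms, one per pitch.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def naive_note_loss(logits, targets):
    """logits, targets: length-88 arrays; targets are 0/1 note activations."""
    p = sigmoid(logits)
    # each pitch is scored on its own, so the model can't express
    # "C is likely *because* E and G are active"
    return -np.sum(targets * np.log(p) + (1 - targets) * np.log(1 - p))
```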
Model
For my model, I decided to introduce more musical structure into learning. Many musical pieces can be separated into two parts: a melody and a harmony. I make the following two assumptions about a piece of music: first, the melody is monophonic (at most one note playing at every time step); second, the harmony at each time step can be classified into a chord class. For example, a C, E, and G active during a time step would be classified as C Major. These are strong assumptions, but they lead to the nice property of exactly one active melody class and one active harmony class at every time step. This allows us to use the sum of two softmax cross-entropy losses as the loss function for our model.
My model works in the following way: for every time step, I encode the melody note into a one-hot binary vector. I then use the notes playing in the harmony to infer the chord class, and turn that into a one-hot binary vector as well. The full input vector is a concatenation of the melody and harmony vectors. This input vector then passes through hidden layer(s) of LSTM cells. The loss function is the sum of two separate softmax losses over the respective melody and harmony parts of the output layer.
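Concretely, a single time step's input vector looks like this (a sketch with illustrative names; the repo builds the real vectors in nottingham_util.py's combine helper):

```python
# One input vector: a one-hot melody vector concatenated with a
# one-hot chord vector.
import numpy as np

NUM_MELODY = 34   # melody pitch classes, including one class for a rest
NUM_CHORDS = 32   # chord classes, including one for "no chord"

def encode_step(melody_class, chord_class):
    x = np.zeros(NUM_MELODY + NUM_CHORDS)
    x[melody_class] = 1                # one-hot melody part
    x[NUM_MELODY + chord_class] = 1    # one-hot harmony part
    return x
```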
If we have $M$ melody classes and $H$ harmony classes, the loss at a time step, given the output layer $z \in \mathbb{R}^{M+H}$, a target melody class $m$, and a target harmony class $h$, is

$$\mathcal{L}(z, m, h) = -\alpha \log \frac{e^{z_m}}{\sum_{i=1}^{M} e^{z_i}} - (1 - \alpha) \log \frac{e^{z_{M+h}}}{\sum_{i=M+1}^{M+H} e^{z_i}}$$

where $\alpha$ is what I call the melody coefficient, controlling how much the loss function is affected by its respective melody and harmony loss terms.
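The same loss in a few lines of numpy (a sketch; the trained model computes it with TensorFlow ops in NottinghamModel.init_loss):

```python
# Weighted dual-softmax negative log-likelihood for one time step.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # subtract max for numerical stability
    return e / e.sum()

def dual_softmax_loss(z, m, h, num_melody, alpha=0.5):
    """z: output layer of length M + H; m, h: target class indices."""
    melody_probs = softmax(z[:num_melody])
    harmony_probs = softmax(z[num_melody:])
    return -alpha * np.log(melody_probs[m]) \
           - (1 - alpha) * np.log(harmony_probs[h])
```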
Experiments
The Nottingham dataset is a collection of 1200 jigs and folk tunes, most of which fit the assumptions specified above: they have a simple monophonic melody on top of recognizable chords. You can download all the Nottingham tunes as MIDI files here. I discretized each of these sequences into time steps of sixteenth notes (1/4 of a quarter note) and used the mingus Python package to detect the chord classes in the harmonies. After filtering out some sequences that didn't fit the assumptions, I ended up with 32 chord classes and 34 possible melody notes (one class in each representing a rest), for a total input dimension of 66 over 997 sequences. The average sequence length was 516 time steps (roughly 32 measures in 4/4). Finally, all the sequences were split into 65% training, 15% validation, and 15% testing.
An example musical sequence from the Nottingham dataset
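Here's roughly how a time step's harmony gets its chord label (condensed from parse_nottingham_to_sequence in nottingham_util.py; the exact shorthand mingus returns, e.g. 'CM', depends on its naming conventions):

```python
# Collapse the active pitches to pitch classes and ask mingus to name
# the chord they form.
import mingus.core.notes
import mingus.core.chords

midi_pitches = [48, 52, 55]   # C3, E3, G3
names = [mingus.core.notes.int_to_note(p % 12) for p in midi_pitches]
candidates = mingus.core.chords.determine(names, shorthand=True)
print(candidates[0])          # first candidate, e.g. 'CM' for C major
```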
I used Google's TensorFlow library to implement my model. The architecture that worked best was 2 stacked hidden layers of 200 LSTM units each. I batched sequences by length and used an unrolling length of 128 time steps (8 measures in 4/4 time) for backpropagation through time (BPTT). I used RMSProp with a learning rate of 0.005 and a decay rate of 0.9 for minibatch gradient descent. When searching over the hyperparameter space, I trained each model for 250 epochs and saved the model with the lowest validation loss.
Training and validation loss plotted over the number of epochs for a model with 2 stacked layers of 200 LSTM units, with dropout keep probabilities of 50% on the hidden layers and 80% on the inputs. Overfitting starts showing up after about 20 epochs.
One big issue I ran into during training was extreme overfitting. Adding dropout on the non-recurrent connections helped somewhat, but did not completely eliminate the issue. The best configuration I found, and the one I ended up using, was a keep probability of 50% on the hidden layers and 80% on the inputs.
Results
The best model I found achieved an overall accuracy of 77.84% on the test set. One nice consequence of my model is that I can evaluate the melody and harmony accuracies separately; these came out to 64.15% and 91.57%, respectively. The higher harmony accuracy makes sense, because most of the pieces in the dataset sustain chords for 8 or 16 time steps (a half or whole note in 4/4 time).
Alright, enough numbers, let's get to the fun stuff. Once the model is trained, generating music from it is just a matter of sampling a melody and harmony from the probability distribution at each time step and plugging the result back into the network. Rinse and repeat. I present eight pieces generated by my model below. I "primed" each with the first 16 time steps (1 measure in 4/4 time) of a random test sequence, and then let them do their thing for 2048 time steps.
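In code, the generation loop looks roughly like this (a condensed sketch of rnn_sample.py, using the repo's model and sampler objects):

```python
# Prime the hidden state on real time steps, then repeatedly sample a
# (melody, harmony) step and feed it back in as the next input.
import numpy as np

def generate(session, model, sampler, prime_steps, num_steps, input_dim):
    state = model.get_cell_zero_state(session, 1)
    seq = list(prime_steps)
    chord = seq[0]
    for step in seq[1:] + [None] * num_steps:
        feed = {model.seq_input: np.reshape(chord, [1, 1, input_dim]),
                model.initial_state: state}
        probs, state = session.run([model.probs, model.final_state],
                                   feed_dict=feed)
        if step is not None:
            chord = step    # still priming: use the real next step
        else:
            chord = sampler.sample_notes(np.reshape(probs, [input_dim]))
            seq.append(chord)
    return seq
```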
Some turned out sounding better than others to my ears, but overall the model clearly does not produce human-level compositions. The lack of long-term structure, such as repeated phrases and themes, is especially revealing. However, for the most part the model plays a melody in key with the harmony it chooses. The melody also tends to stay in the same key signature over short phrases, and sometimes the harmony accompanies it with short chord progressions in that same key. There are also small pockets of coherent rhythmic structure, although the "time signature" is sporadic over the course of a piece.
Many thanks go out to Fei Sha for providing valuable advice; this work was completed as part of my final project for his research seminar. If you're interested in learning more, the final report contains more details about the model and a few more experimental results. The source code is also available on GitHub here if you'd like to train your own models! (Warning: messy code.)
================================================
FILE: install.sh
================================================
conda create -n music_rnn python=2.7
source activate music_rnn
pip install -r requirements.txt
mkdir models
mkdir data
# http://www-etud.iro.umontreal.ca/~boulanni/icml2012
wget http://www-etud.iro.umontreal.ca/~boulanni/Nottingham.zip -O data/Nottingham.zip
unzip data/Nottingham.zip -d data/
================================================
FILE: midi_util.py
================================================
import sys, os
from collections import defaultdict
import numpy as np
import midi
RANGE = 128
def round_tick(tick, time_step):
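    """ Round a tick count to the nearest multiple of time_step """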
return int(round(tick/float(time_step)) * time_step)
def ingest_notes(track, verbose=False):
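    """ Collect [start_tick, end_tick] pairs for every pitch in a track """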
notes = { n: [] for n in range(RANGE) }
current_tick = 0
for msg in track:
# ignore all end of track events
if isinstance(msg, midi.EndOfTrackEvent):
continue
if msg.tick > 0:
current_tick += msg.tick
# velocity of 0 is equivalent to note off, so treat as such
if isinstance(msg, midi.NoteOnEvent) and msg.get_velocity() != 0:
if len(notes[msg.get_pitch()]) > 0 and \
len(notes[msg.get_pitch()][-1]) != 2:
if verbose:
print "Warning: double NoteOn encountered, deleting the first"
print msg
else:
notes[msg.get_pitch()] += [[current_tick]]
        elif isinstance(msg, midi.NoteOffEvent) or \
             (isinstance(msg, midi.NoteOnEvent) and msg.get_velocity() == 0):
            # sanity check: no notes end without being started
            if len(notes[msg.get_pitch()]) == 0 or \
               len(notes[msg.get_pitch()][-1]) != 1:
                if verbose:
                    print "Warning: skipping NoteOff Event with no corresponding NoteOn"
                    print msg
            else:
                notes[msg.get_pitch()][-1] += [current_tick]
return notes, current_tick
def round_notes(notes, track_ticks, time_step, R=None, O=None):
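    """
    Quantize note intervals into a [track_ticks/time_step x R] binary piano
    roll, where R is the note range and O is a pitch offset to subtract.
    Notes shorter than a time step are resolved via the disputed dict below.
    """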
if not R:
R = RANGE
if not O:
O = 0
sequence = np.zeros((track_ticks/time_step, R))
disputed = { t: defaultdict(int) for t in range(track_ticks/time_step) }
for note in notes:
for (start, end) in notes[note]:
start_t = round_tick(start, time_step) / time_step
end_t = round_tick(end, time_step) / time_step
# normal case where note is long enough
if end - start > time_step/2 and start_t != end_t:
sequence[start_t:end_t, note - O] = 1
# cases where note is within bounds of time step
elif start > start_t * time_step:
disputed[start_t][note] += (end - start)
elif end <= end_t * time_step:
disputed[end_t-1][note] += (end - start)
# case where a note is on the border
else:
before_border = start_t * time_step - start
if before_border > 0:
disputed[start_t-1][note] += before_border
after_border = end - start_t * time_step
if after_border > 0 and end < track_ticks:
disputed[start_t][note] += after_border
# solve disputed
for seq_idx in range(sequence.shape[0]):
if np.count_nonzero(sequence[seq_idx, :]) == 0 and len(disputed[seq_idx]) > 0:
# print seq_idx, disputed[seq_idx]
sorted_notes = sorted(disputed[seq_idx].items(),
key=lambda x: x[1])
max_val = max(x[1] for x in sorted_notes)
top_notes = filter(lambda x: x[1] >= max_val, sorted_notes)
for note, _ in top_notes:
sequence[seq_idx, note - O] = 1
return sequence
def parse_midi_to_sequence(input_filename, time_step, verbose=False):
sequence = []
pattern = midi.read_midifile(input_filename)
if len(pattern) < 1:
raise Exception("No pattern found in midi file")
if verbose:
print "Track resolution: {}".format(pattern.resolution)
print "Number of tracks: {}".format(len(pattern))
print "Time step: {}".format(time_step)
    # Track ingestion stage: reuse ingest_notes on each track and merge the
    # note intervals across tracks
    notes = { n: [] for n in range(RANGE) }
    track_ticks = 0
    for track in pattern:
        track_notes, current_tick = ingest_notes(track, verbose)
        for pitch in track_notes:
            notes[pitch] += track_notes[pitch]
        track_ticks = max(current_tick, track_ticks)
track_ticks = round_tick(track_ticks, time_step)
if verbose:
print "Track ticks (rounded): {} ({} time steps)".format(track_ticks, track_ticks/time_step)
sequence = round_notes(notes, track_ticks, time_step)
return sequence
class MidiWriter(object):
def __init__(self, verbose=False):
self.verbose = verbose
self.note_range = RANGE
def note_off(self, val, tick):
self.track.append(midi.NoteOffEvent(tick=tick, pitch=val))
return 0
def note_on(self, val, tick):
self.track.append(midi.NoteOnEvent(tick=tick, pitch=val, velocity=70))
return 0
def dump_sequence_to_midi(self, sequence, output_filename, time_step,
resolution, metronome=24):
if self.verbose:
print "Dumping sequence to MIDI file: {}".format(output_filename)
print "Resolution: {}".format(resolution)
print "Time Step: {}".format(time_step)
pattern = midi.Pattern(resolution=resolution)
self.track = midi.Track()
# metadata track
meta_track = midi.Track()
time_sig = midi.TimeSignatureEvent()
time_sig.set_numerator(4)
time_sig.set_denominator(4)
time_sig.set_metronome(metronome)
time_sig.set_thirtyseconds(8)
meta_track.append(time_sig)
pattern.append(meta_track)
# reshape to (SEQ_LENGTH X NUM_DIMS)
sequence = np.reshape(sequence, [-1, self.note_range])
time_steps = sequence.shape[0]
if self.verbose:
print "Total number of time steps: {}".format(time_steps)
tick = time_step
self.notes_on = { n: False for n in range(self.note_range) }
# for seq_idx in range(188, 220):
for seq_idx in range(time_steps):
notes = np.nonzero(sequence[seq_idx, :])[0].tolist()
# this tick will only be assigned to first NoteOn/NoteOff in
# this time_step
# NoteOffEvents come first so they'll have the tick value
# go through all notes that are currently on and see if any
# turned off
for n in self.notes_on:
if self.notes_on[n] and n not in notes:
tick = self.note_off(n, tick)
self.notes_on[n] = False
# Turn on any notes that weren't previously on
for note in notes:
if not self.notes_on[note]:
tick = self.note_on(note, tick)
self.notes_on[note] = True
tick += time_step
# flush out notes
for n in self.notes_on:
if self.notes_on[n]:
self.note_off(n, tick)
tick = 0
self.notes_on[n] = False
pattern.append(self.track)
midi.write_midifile(output_filename, pattern)
if __name__ == '__main__':
pass
================================================
FILE: model.py
================================================
import os
import logging
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn_cell
from tensorflow.models.rnn import rnn, seq2seq
import nottingham_util
class Model(object):
"""
Cross-Entropy Naive Formulation
A single time step may have multiple notes active, so a sigmoid cross entropy loss
is used to match targets.
    seq_input: a [ T x B x D ] matrix, where T is the number of time steps in
        the batch, B is the batch size, and D is the number of input dimensions
"""
def __init__(self, config, training=False):
self.config = config
self.time_batch_len = time_batch_len = config.time_batch_len
self.input_dim = input_dim = config.input_dim
hidden_size = config.hidden_size
num_layers = config.num_layers
dropout_prob = config.dropout_prob
input_dropout_prob = config.input_dropout_prob
cell_type = config.cell_type
self.seq_input = \
tf.placeholder(tf.float32, shape=[self.time_batch_len, None, input_dim])
if (dropout_prob <= 0.0 or dropout_prob > 1.0):
raise Exception("Invalid dropout probability: {}".format(dropout_prob))
if (input_dropout_prob <= 0.0 or input_dropout_prob > 1.0):
raise Exception("Invalid input dropout probability: {}".format(input_dropout_prob))
# setup variables
with tf.variable_scope("rnnlstm"):
output_W = tf.get_variable("output_w", [hidden_size, input_dim])
output_b = tf.get_variable("output_b", [input_dim])
self.lr = tf.constant(config.learning_rate, name="learning_rate")
self.lr_decay = tf.constant(config.learning_rate_decay, name="learning_rate_decay")
def create_cell(input_size):
if cell_type == "vanilla":
cell_class = rnn_cell.BasicRNNCell
elif cell_type == "gru":
cell_class = rnn_cell.BasicGRUCell
elif cell_type == "lstm":
cell_class = rnn_cell.BasicLSTMCell
else:
raise Exception("Invalid cell type: {}".format(cell_type))
cell = cell_class(hidden_size, input_size = input_size)
if training:
return rnn_cell.DropoutWrapper(cell, output_keep_prob = dropout_prob)
else:
return cell
if training:
self.seq_input_dropout = tf.nn.dropout(self.seq_input, keep_prob = input_dropout_prob)
else:
self.seq_input_dropout = self.seq_input
self.cell = rnn_cell.MultiRNNCell(
[create_cell(input_dim)] + [create_cell(hidden_size) for i in range(1, num_layers)])
        # seq_input is [time_batch_len x batch_size x input_dim], so the
        # batch size is the second dimension
        batch_size = tf.shape(self.seq_input_dropout)[1]
self.initial_state = self.cell.zero_state(batch_size, tf.float32)
inputs_list = tf.unpack(self.seq_input_dropout)
# rnn outputs a list of [batch_size x H] outputs
outputs_list, self.final_state = rnn.rnn(self.cell, inputs_list,
initial_state=self.initial_state)
outputs = tf.pack(outputs_list)
outputs_concat = tf.reshape(outputs, [-1, hidden_size])
logits_concat = tf.matmul(outputs_concat, output_W) + output_b
logits = tf.reshape(logits_concat, [self.time_batch_len, -1, input_dim])
# probabilities of each note
self.probs = self.calculate_probs(logits)
self.loss = self.init_loss(logits, logits_concat)
self.train_step = tf.train.RMSPropOptimizer(self.lr, decay = self.lr_decay) \
.minimize(self.loss)
def init_loss(self, outputs, _):
self.seq_targets = \
tf.placeholder(tf.float32, [self.time_batch_len, None, self.input_dim])
        batch_size = tf.shape(self.seq_input_dropout)[1]
cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(outputs, self.seq_targets)
return tf.reduce_sum(cross_ent) / self.time_batch_len / tf.to_float(batch_size)
def calculate_probs(self, logits):
return tf.sigmoid(logits)
def get_cell_zero_state(self, session, batch_size):
return self.cell.zero_state(batch_size, tf.float32).eval(session=session)
class NottinghamModel(Model):
"""
Dual softmax formulation
A single time step should be a concatenation of two one-hot-encoding binary vectors.
Loss function is a sum of two softmax loss functions over [:r] and [r:] respectively,
where r is the number of melody classes
"""
def init_loss(self, outputs, outputs_concat):
self.seq_targets = \
tf.placeholder(tf.int64, [self.time_batch_len, None, 2])
batch_size = tf.shape(self.seq_targets)[1]
        with tf.variable_scope("rnnlstm"):
            # a Variable rather than a constant, so assign_melody_coeff
            # can update it after construction
            self.melody_coeff = tf.Variable(self.config.melody_coeff, trainable=False)
r = nottingham_util.NOTTINGHAM_MELODY_RANGE
targets_concat = tf.reshape(self.seq_targets, [-1, 2])
melody_loss = tf.nn.sparse_softmax_cross_entropy_with_logits( \
outputs_concat[:, :r], \
targets_concat[:, 0])
harmony_loss = tf.nn.sparse_softmax_cross_entropy_with_logits( \
outputs_concat[:, r:], \
targets_concat[:, 1])
losses = tf.add(self.melody_coeff * melody_loss, (1 - self.melody_coeff) * harmony_loss)
return tf.reduce_sum(losses) / self.time_batch_len / tf.to_float(batch_size)
def calculate_probs(self, logits):
steps = []
for t in range(self.time_batch_len):
melody_softmax = tf.nn.softmax(logits[t, :, :nottingham_util.NOTTINGHAM_MELODY_RANGE])
harmony_softmax = tf.nn.softmax(logits[t, :, nottingham_util.NOTTINGHAM_MELODY_RANGE:])
steps.append(tf.concat(1, [melody_softmax, harmony_softmax]))
return tf.pack(steps)
def assign_melody_coeff(self, session, melody_coeff):
if melody_coeff < 0.0 or melody_coeff > 1.0:
raise Exception("Invalid melody coeffecient")
session.run(tf.assign(self.melody_coeff, melody_coeff))
class NottinghamSeparate(Model):
"""
Single softmax formulation
Regular single classification formulation, used to train baseline models
where the melody and harmony are trained separately
"""
def init_loss(self, outputs, outputs_concat):
self.seq_targets = \
tf.placeholder(tf.int64, [self.time_batch_len, None])
batch_size = tf.shape(self.seq_targets)[1]
with tf.variable_scope("rnnlstm"):
self.melody_coeff = tf.constant(self.config.melody_coeff)
targets_concat = tf.reshape(self.seq_targets, [-1])
losses = tf.nn.sparse_softmax_cross_entropy_with_logits( \
outputs_concat, targets_concat)
return tf.reduce_sum(losses) / self.time_batch_len / tf.to_float(batch_size)
def calculate_probs(self, logits):
steps = []
for t in range(self.time_batch_len):
softmax = tf.nn.softmax(logits[t, :, :])
steps.append(softmax)
return tf.pack(steps)
================================================
FILE: nottingham_util.py
================================================
import numpy as np
import os
import midi
import cPickle
from pprint import pprint
import midi_util
import mingus
import mingus.core.chords
import mingus.core.notes
import sampling
PICKLE_LOC = 'data/nottingham.pickle'
NOTTINGHAM_MELODY_MAX = 88
NOTTINGHAM_MELODY_MIN = 55
# add one to the range for silence in melody
NOTTINGHAM_MELODY_RANGE = NOTTINGHAM_MELODY_MAX - NOTTINGHAM_MELODY_MIN + 1 + 1
CHORD_BASE = 48
CHORD_BLACKLIST = ['major third', 'minor third', 'perfect fifth']
NO_CHORD = 'NONE'
SHARPS_TO_FLATS = {
"A#": "Bb",
"B#": "C",
"C#": "Db",
"D#": "Eb",
"E#": "F",
"F#": "Gb",
"G#": "Ab",
}
def resolve_chord(chord):
"""
Resolves rare chords to their closest common chord, to limit the total
amount of chord classes.
"""
if chord in CHORD_BLACKLIST:
return None
# take the first of dual chords
if "|" in chord:
chord = chord.split("|")[0]
    # strip extensions: 7ths, 9ths, 11ths, and 6ths
if chord.endswith("11"):
chord = chord[:-2]
if chord.endswith("7") or chord.endswith("9") or chord.endswith("6"):
chord = chord[:-1]
# replace 'dim' with minor
if chord.endswith("dim"):
chord = chord[:-3] + "m"
return chord
def prepare_nottingham_pickle(time_step, chord_cutoff=64, filename=PICKLE_LOC, verbose=False):
"""
time_step: the time step to discretize all notes into
chord_cutoff: if chords are seen less than this cutoff, they are ignored and marked as
as rests in the resulting dataset
filename: the location where the pickle will be saved to
"""
data = {}
store = {}
chords = {}
max_seq = 0
seq_lens = []
for d in ["train", "test", "valid"]:
print "Parsing {}...".format(d)
parsed = parse_nottingham_directory("data/Nottingham/{}".format(d), time_step, verbose=False)
metadata = [s[0] for s in parsed]
seqs = [s[1] for s in parsed]
data[d] = seqs
data[d + '_metadata'] = metadata
lens = [len(s[1]) for s in seqs]
seq_lens += lens
max_seq = max(max_seq, max(lens))
for _, harmony in seqs:
for h in harmony:
if h not in chords:
chords[h] = 1
else:
chords[h] += 1
avg_seq = float(sum(seq_lens)) / len(seq_lens)
chords = { c: i for c, i in chords.iteritems() if chords[c] >= chord_cutoff }
chord_mapping = { c: i for i, c in enumerate(chords.keys()) }
num_chords = len(chord_mapping)
store['chord_to_idx'] = chord_mapping
if verbose:
pprint(chords)
print "Number of chords: {}".format(num_chords)
print "Max Sequence length: {}".format(max_seq)
print "Avg Sequence length: {}".format(avg_seq)
print "Num Sequences: {}".format(len(seq_lens))
def combine(melody, harmony):
full = np.zeros((melody.shape[0], NOTTINGHAM_MELODY_RANGE + num_chords))
assert melody.shape[0] == len(harmony)
# for all melody sequences that don't have any notes, add the empty melody marker (last one)
for i in range(melody.shape[0]):
if np.count_nonzero(melody[i, :]) == 0:
melody[i, NOTTINGHAM_MELODY_RANGE-1] = 1
# all melody encodings should now have exactly one 1
for i in range(melody.shape[0]):
assert np.count_nonzero(melody[i, :]) == 1
# add all the melodies
full[:, :melody.shape[1]] += melody
harmony_idxs = [ chord_mapping[h] if h in chord_mapping else chord_mapping[NO_CHORD] \
for h in harmony ]
harmony_idxs = [ NOTTINGHAM_MELODY_RANGE + h for h in harmony_idxs ]
full[np.arange(len(harmony)), harmony_idxs] = 1
# all full encodings should have exactly two 1's
for i in range(full.shape[0]):
assert np.count_nonzero(full[i, :]) == 2
return full
for d in ["train", "test", "valid"]:
print "Combining {}".format(d)
store[d] = [ combine(m, h) for m, h in data[d] ]
store[d + '_metadata'] = data[d + '_metadata']
with open(filename, 'w') as f:
cPickle.dump(store, f, protocol=-1)
return True
def parse_nottingham_directory(input_dir, time_step, verbose=False):
"""
input_dir: a directory containing MIDI files
returns a list of [T x D] matrices, where each matrix represents a
a sequence with T time steps over D dimensions
"""
files = [ os.path.join(input_dir, f) for f in os.listdir(input_dir)
if os.path.isfile(os.path.join(input_dir, f)) ]
sequences = [ \
parse_nottingham_to_sequence(f, time_step=time_step, verbose=verbose) \
for f in files ]
if verbose:
print "Total sequences: {}".format(len(sequences))
# filter out the non 2-track MIDI's
sequences = filter(lambda x: x[1] != None, sequences)
if verbose:
print "Total sequences left: {}".format(len(sequences))
return sequences
def parse_nottingham_to_sequence(input_filename, time_step, verbose=False):
"""
input_filename: a MIDI filename
returns a [T x D] matrix representing a sequence with T time steps over
D dimensions
"""
sequence = []
pattern = midi.read_midifile(input_filename)
metadata = {
"path": input_filename,
"name": input_filename.split("/")[-1].split(".")[0]
}
# Most nottingham midi's have 3 tracks. metadata info, melody, harmony
# throw away any tracks that don't fit this
if len(pattern) != 3:
if verbose:
"Skipping track with {} tracks".format(len(pattern))
return (metadata, None)
    ticks_per_quarter = -1
for msg in pattern[0]:
if isinstance(msg, midi.TimeSignatureEvent):
metadata["ticks_per_quarter"] = msg.get_metronome()
ticks_per_quarter = msg.get_metronome()
if verbose:
print "{}".format(input_filename)
print "Track resolution: {}".format(pattern.resolution)
print "Number of tracks: {}".format(len(pattern))
print "Time step: {}".format(time_step)
print "Ticks per quarter: {}".format(ticks_per_quarter)
# Track ingestion stage
track_ticks = 0
melody_notes, melody_ticks = midi_util.ingest_notes(pattern[1])
harmony_notes, harmony_ticks = midi_util.ingest_notes(pattern[2])
track_ticks = midi_util.round_tick(max(melody_ticks, harmony_ticks), time_step)
if verbose:
print "Track ticks (rounded): {} ({} time steps)".format(track_ticks, track_ticks/time_step)
melody_sequence = midi_util.round_notes(melody_notes, track_ticks, time_step,
R=NOTTINGHAM_MELODY_RANGE, O=NOTTINGHAM_MELODY_MIN)
for i in range(melody_sequence.shape[0]):
if np.count_nonzero(melody_sequence[i, :]) > 1:
if verbose:
print "Double note found: {}: {} ({})".format(i, np.nonzero(melody_sequence[i, :]), input_filename)
return (metadata, None)
harmony_sequence = midi_util.round_notes(harmony_notes, track_ticks, time_step)
harmonies = []
for i in range(harmony_sequence.shape[0]):
notes = np.where(harmony_sequence[i] == 1)[0]
if len(notes) > 0:
notes_shift = [ mingus.core.notes.int_to_note(h%12) for h in notes]
chord = mingus.core.chords.determine(notes_shift, shorthand=True)
if len(chord) == 0:
# try flat combinations
notes_shift = [ SHARPS_TO_FLATS[n] if n in SHARPS_TO_FLATS else n for n in notes_shift]
chord = mingus.core.chords.determine(notes_shift, shorthand=True)
if len(chord) == 0:
if verbose:
print "Could not determine chord: {} ({}, {}), defaulting to last steps chord" \
.format(notes_shift, input_filename, i)
if len(harmonies) > 0:
harmonies.append(harmonies[-1])
else:
harmonies.append(NO_CHORD)
else:
resolved = resolve_chord(chord[0])
if resolved:
harmonies.append(resolved)
else:
harmonies.append(NO_CHORD)
else:
harmonies.append(NO_CHORD)
return (metadata, (melody_sequence, harmonies))
class NottinghamMidiWriter(midi_util.MidiWriter):
def __init__(self, chord_to_idx, verbose=False):
super(NottinghamMidiWriter, self).__init__(verbose)
self.idx_to_chord = { i: c for c, i in chord_to_idx.items() }
self.note_range = NOTTINGHAM_MELODY_RANGE + len(self.idx_to_chord)
def dereference_chord(self, idx):
if idx not in self.idx_to_chord:
raise Exception("No chord index found: {}".format(idx))
shorthand = self.idx_to_chord[idx]
if shorthand == NO_CHORD:
return []
chord = mingus.core.chords.from_shorthand(shorthand)
return [ CHORD_BASE + mingus.core.notes.note_to_int(n) for n in chord ]
def note_on(self, val, tick):
if val >= NOTTINGHAM_MELODY_RANGE:
notes = self.dereference_chord(val - NOTTINGHAM_MELODY_RANGE)
else:
# if note is the top of the range, then it stands for gap in melody
if val == NOTTINGHAM_MELODY_RANGE - 1:
notes = []
else:
notes = [NOTTINGHAM_MELODY_MIN + val]
# print 'turning on {}'.format(notes)
for note in notes:
self.track.append(midi.NoteOnEvent(tick=tick, pitch=note, velocity=70))
tick = 0 # notes that come right after each other should have zero tick
return tick
def note_off(self, val, tick):
if val >= NOTTINGHAM_MELODY_RANGE:
notes = self.dereference_chord(val - NOTTINGHAM_MELODY_RANGE)
else:
notes = [NOTTINGHAM_MELODY_MIN + val]
# print 'turning off {}'.format(notes)
for note in notes:
self.track.append(midi.NoteOffEvent(tick=tick, pitch=note))
tick = 0
return tick
class NottinghamSampler(object):
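    """
    Samples one melody note and one chord from the dual-softmax output at
    each time step, optionally capping how long either may repeat.
    """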
def __init__(self, chord_to_idx, method = 'sample', harmony_repeat_max = 16, melody_repeat_max = 16, verbose=False):
self.verbose = verbose
self.idx_to_chord = { i: c for c, i in chord_to_idx.items() }
self.method = method
self.hlast = 0
self.hcount = 0
self.hrepeat = harmony_repeat_max
self.mlast = 0
self.mcount = 0
self.mrepeat = melody_repeat_max
def visualize_probs(self, probs):
if not self.verbose:
return
melodies = sorted(list(enumerate(probs[:NOTTINGHAM_MELODY_RANGE])),
key=lambda x: x[1], reverse=True)[:4]
harmonies = sorted(list(enumerate(probs[NOTTINGHAM_MELODY_RANGE:])),
key=lambda x: x[1], reverse=True)[:4]
harmonies = [(self.idx_to_chord[i], j) for i, j in harmonies]
print 'Top Melody Notes: '
pprint(melodies)
print 'Top Harmony Notes: '
pprint(harmonies)
def sample_notes_static(self, probs):
top_m = probs[:NOTTINGHAM_MELODY_RANGE].argsort()
if top_m[-1] == self.mlast and self.mcount >= self.mrepeat:
top_m = top_m[:-1]
self.mcount = 0
elif top_m[-1] == self.mlast:
self.mcount += 1
else:
self.mcount = 0
self.mlast = top_m[-1]
top_melody = top_m[-1]
top_h = probs[NOTTINGHAM_MELODY_RANGE:].argsort()
if top_h[-1] == self.hlast and self.hcount >= self.hrepeat:
top_h = top_h[:-1]
self.hcount = 0
elif top_h[-1] == self.hlast:
self.hcount += 1
else:
self.hcount = 0
self.hlast = top_h[-1]
top_chord = top_h[-1] + NOTTINGHAM_MELODY_RANGE
chord = np.zeros([len(probs)], dtype=np.int32)
chord[top_melody] = 1.0
chord[top_chord] = 1.0
return chord
def sample_notes_dist(self, probs):
idxed = [(i, p) for i, p in enumerate(probs)]
notes = [n[0] for n in idxed]
ps = np.array([n[1] for n in idxed])
r = NOTTINGHAM_MELODY_RANGE
assert np.allclose(np.sum(ps[:r]), 1.0)
assert np.allclose(np.sum(ps[r:]), 1.0)
# renormalize so numpy doesn't complain
ps[:r] = ps[:r] / ps[:r].sum()
ps[r:] = ps[r:] / ps[r:].sum()
melody = np.random.choice(notes[:r], p=ps[:r])
harmony = np.random.choice(notes[r:], p=ps[r:])
chord = np.zeros([len(probs)], dtype=np.int32)
chord[melody] = 1.0
chord[harmony] = 1.0
return chord
def sample_notes(self, probs):
self.visualize_probs(probs)
if self.method == 'static':
return self.sample_notes_static(probs)
elif self.method == 'sample':
return self.sample_notes_dist(probs)
def accuracy(batch_probs, data, num_samples=1):
"""
Batch Probs: { num_time_steps: [ time_step_1, time_step_2, ... ] }
Data: [
[ [ data ], [ target ] ], # batch with one time step
[ [ data1, data2 ], [ target1, target2 ] ], # batch with two time steps
...
]
"""
def calc_accuracy():
total = 0
melody_correct, harmony_correct = 0, 0
melody_incorrect, harmony_incorrect = 0, 0
for _, batch_targets in data:
num_time_steps = len(batch_targets)
for ts_targets, ts_probs in zip(batch_targets, batch_probs[num_time_steps]):
                assert ts_targets.shape[:2] == ts_probs.shape[:2]
for seq_idx in range(ts_targets.shape[1]):
for step_idx in range(ts_targets.shape[0]):
idxed = [(n, p) for n, p in \
enumerate(ts_probs[step_idx, seq_idx, :])]
notes = [n[0] for n in idxed]
ps = np.array([n[1] for n in idxed])
r = NOTTINGHAM_MELODY_RANGE
assert np.allclose(np.sum(ps[:r]), 1.0)
assert np.allclose(np.sum(ps[r:]), 1.0)
# renormalize so numpy doesn't complain
ps[:r] = ps[:r] / ps[:r].sum()
ps[r:] = ps[r:] / ps[r:].sum()
melody = np.random.choice(notes[:r], p=ps[:r])
harmony = np.random.choice(notes[r:], p=ps[r:])
melody_target = ts_targets[step_idx, seq_idx, 0]
if melody_target == melody:
melody_correct += 1
else:
melody_incorrect += 1
harmony_target = ts_targets[step_idx, seq_idx, 1] + r
if harmony_target == harmony:
harmony_correct += 1
else:
harmony_incorrect += 1
return (melody_correct, melody_incorrect, harmony_correct, harmony_incorrect)
maccs, haccs, taccs = [], [], []
for i in range(num_samples):
print "Sample {}".format(i)
m, mi, h, hi = calc_accuracy()
maccs.append( float(m) / float(m + mi))
haccs.append( float(h) / float(h + hi))
taccs.append( float(m + h) / float(m + h + mi + hi) )
print "Melody Precision/Recall: {}".format(sum(maccs)/len(maccs))
print "Harmony Precision/Recall: {}".format(sum(haccs)/len(haccs))
print "Total Precision/Recall: {}".format(sum(taccs)/len(taccs))
def seperate_accuracy(batch_probs, data, num_samples=1):
def calc_accuracy():
total = 0
total_correct, total_incorrect = 0, 0
for _, batch_targets in data:
num_time_steps = len(batch_targets)
for ts_targets, ts_probs in zip(batch_targets, batch_probs[num_time_steps]):
                assert ts_targets.shape[:2] == ts_probs.shape[:2]
for seq_idx in range(ts_targets.shape[1]):
for step_idx in range(ts_targets.shape[0]):
idxed = [(n, p) for n, p in \
enumerate(ts_probs[step_idx, seq_idx, :])]
notes = [n[0] for n in idxed]
ps = np.array([n[1] for n in idxed])
r = NOTTINGHAM_MELODY_RANGE
assert np.allclose(np.sum(ps), 1.0)
ps = ps / ps.sum()
note = np.random.choice(notes, p=ps)
target = ts_targets[step_idx, seq_idx]
if target == note:
total_correct += 1
else:
total_incorrect += 1
return (total_correct, total_incorrect)
taccs = []
for i in range(num_samples):
print "Sample {}".format(i)
c, ic = calc_accuracy()
taccs.append( float(c) / float(c + ic))
print "Precision/Recall: {}".format(sum(taccs)/len(taccs))
def i_vi_iv_v(chord_to_idx, repeats, input_dim):
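    """ Build a harmony-only I-vi-IV-V progression in C, 16 time steps per chord """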
r = NOTTINGHAM_MELODY_RANGE
i = np.zeros(input_dim)
i[r + chord_to_idx['CM']] = 1
vi = np.zeros(input_dim)
vi[r + chord_to_idx['Am']] = 1
iv = np.zeros(input_dim)
iv[r + chord_to_idx['FM']] = 1
v = np.zeros(input_dim)
v[r + chord_to_idx['GM']] = 1
full_seq = [i] * 16 + [vi] * 16 + [iv] * 16 + [v] * 16
full_seq = full_seq * repeats
return full_seq
if __name__ == '__main__':
resolution = 480
time_step = 120
assert resolve_chord("GM7") == "GM"
assert resolve_chord("G#dim|AM7") == "G#m"
assert resolve_chord("Dm9") == "Dm"
assert resolve_chord("AM11") == "AM"
prepare_nottingham_pickle(time_step, verbose=True)
================================================
FILE: requirements.txt
================================================
matplotlib
mingus
numpy
git+https://github.com/vishnubob/python-midi#egg=midi
# Linux, Python 2.7, GPU
https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl
================================================
FILE: rnn.py
================================================
import os, sys
import argparse
import time
import itertools
import cPickle
import logging
import random
import string
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import nottingham_util
import util
from model import Model, NottinghamModel
def get_config_name(config):
def replace_dot(s): return s.replace(".", "p")
return "nl_" + str(config.num_layers) + "_hs_" + str(config.hidden_size) + \
replace_dot("_mc_{}".format(config.melody_coeff)) + \
replace_dot("_dp_{}".format(config.dropout_prob)) + \
replace_dot("_idp_{}".format(config.input_dropout_prob)) + \
replace_dot("_tb_{}".format(config.time_batch_len))
class DefaultConfig(object):
# model parameters
num_layers = 2
hidden_size = 200
melody_coeff = 0.5
dropout_prob = 0.5
input_dropout_prob = 0.8
cell_type = 'lstm'
# learning parameters
max_time_batches = 9
time_batch_len = 128
learning_rate = 5e-3
learning_rate_decay = 0.9
num_epochs = 250
# metadata
dataset = 'softmax'
model_file = ''
def __repr__(self):
return """Num Layers: {}, Hidden Size: {}, Melody Coeff: {}, Dropout Prob: {}, Input Dropout Prob: {}, Cell Type: {}, Time Batch Len: {}, Learning Rate: {}, Decay: {}""".format(self.num_layers, self.hidden_size, self.melody_coeff, self.dropout_prob, self.input_dropout_prob, self.cell_type, self.time_batch_len, self.learning_rate, self.learning_rate_decay)
if __name__ == '__main__':
np.random.seed()
parser = argparse.ArgumentParser(description='Script to train and save a model.')
parser.add_argument('--dataset', type=str, default='softmax',
# choices = ['bach', 'nottingham', 'softmax'],
choices = ['softmax'])
parser.add_argument('--model_dir', type=str, default='models')
parser.add_argument('--run_name', type=str, default=time.strftime("%m%d_%H%M"))
args = parser.parse_args()
if args.dataset == 'softmax':
resolution = 480
time_step = 120
model_class = NottinghamModel
with open(nottingham_util.PICKLE_LOC, 'r') as f:
pickle = cPickle.load(f)
chord_to_idx = pickle['chord_to_idx']
input_dim = pickle["train"][0].shape[1]
print 'Finished loading data, input dim: {}'.format(input_dim)
else:
raise Exception("Other datasets not yet implemented")
initializer = tf.random_uniform_initializer(-0.1, 0.1)
best_config = None
best_valid_loss = None
# set up run dir
run_folder = os.path.join(args.model_dir, args.run_name)
if os.path.exists(run_folder):
raise Exception("Run name {} already exists, choose a different one", format(run_folder))
os.makedirs(run_folder)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
logger.addHandler(logging.FileHandler(os.path.join(run_folder, "training.log")))
grid = {
"dropout_prob": [0.5],
"input_dropout_prob": [0.8],
"melody_coeff": [0.5],
"num_layers": [2],
"hidden_size": [200],
"num_epochs": [250],
"learning_rate": [5e-3],
"learning_rate_decay": [0.9],
"time_batch_len": [128],
}
# Generate product of hyperparams
runs = list(list(itertools.izip(grid, x)) for x in itertools.product(*grid.itervalues()))
logger.info("{} runs detected".format(len(runs)))
for combination in runs:
config = DefaultConfig()
config.dataset = args.dataset
config.model_name = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(12)) + '.model'
for attr, value in combination:
setattr(config, attr, value)
if config.dataset == 'softmax':
data = util.load_data('', time_step, config.time_batch_len, config.max_time_batches, nottingham=pickle)
config.input_dim = data["input_dim"]
else:
raise Exception("Other datasets not yet implemented")
logger.info(config)
config_file_path = os.path.join(run_folder, get_config_name(config) + '.config')
with open(config_file_path, 'w') as f:
cPickle.dump(config, f)
with tf.Graph().as_default(), tf.Session() as session:
with tf.variable_scope("model", reuse=None):
train_model = model_class(config, training=True)
with tf.variable_scope("model", reuse=True):
valid_model = model_class(config, training=False)
saver = tf.train.Saver(tf.all_variables(), max_to_keep=40)
tf.initialize_all_variables().run()
# training
early_stop_best_loss = None
start_saving = False
saved_flag = False
train_losses, valid_losses = [], []
start_time = time.time()
for i in range(config.num_epochs):
loss = util.run_epoch(session, train_model,
data["train"]["data"], training=True, testing=False)
train_losses.append((i, loss))
if i == 0:
continue
logger.info('Epoch: {}, Train Loss: {}, Time Per Epoch: {}'.format(\
i, loss, (time.time() - start_time)/i))
valid_loss = util.run_epoch(session, valid_model, data["valid"]["data"], training=False, testing=False)
valid_losses.append((i, valid_loss))
logger.info('Valid Loss: {}'.format(valid_loss))
if early_stop_best_loss == None:
early_stop_best_loss = valid_loss
elif valid_loss < early_stop_best_loss:
early_stop_best_loss = valid_loss
if start_saving:
logger.info('Best loss so far encountered, saving model.')
saver.save(session, os.path.join(run_folder, config.model_name))
saved_flag = True
elif not start_saving:
start_saving = True
logger.info('Valid loss increased for the first time, will start saving models')
saver.save(session, os.path.join(run_folder, config.model_name))
saved_flag = True
if not saved_flag:
saver.save(session, os.path.join(run_folder, config.model_name))
        # cap the loss axis so charts are comparable across runs
axes = plt.gca()
if config.dataset == 'softmax':
axes.set_ylim([0, 2])
else:
axes.set_ylim([0, 100])
plt.plot([t[0] for t in train_losses], [t[1] for t in train_losses])
plt.plot([t[0] for t in valid_losses], [t[1] for t in valid_losses])
plt.legend(['Train Loss', 'Validation Loss'])
chart_file_path = os.path.join(run_folder, get_config_name(config) + '.png')
plt.savefig(chart_file_path)
plt.clf()
logger.info("Config {}, Loss: {}".format(config, early_stop_best_loss))
if best_valid_loss == None or early_stop_best_loss < best_valid_loss:
logger.info("Found best new model!")
best_valid_loss = early_stop_best_loss
best_config = config
logger.info("Best Config: {}, Loss: {}".format(best_config, best_valid_loss))
================================================
FILE: rnn_sample.py
================================================
import os, sys
import argparse
import time
import itertools
import cPickle
import numpy as np
import tensorflow as tf
import util
import nottingham_util
from model import Model, NottinghamModel
from rnn import DefaultConfig
if __name__ == '__main__':
np.random.seed()
    parser = argparse.ArgumentParser(description='Script to generate a MIDI file sample from a trained model.')
parser.add_argument('--config_file', type=str, required=True)
parser.add_argument('--sample_melody', action='store_true', default=False)
parser.add_argument('--sample_harmony', action='store_true', default=False)
parser.add_argument('--sample_seq', type=str, default='random',
choices = ['random', 'chords'])
parser.add_argument('--conditioning', type=int, default=-1)
parser.add_argument('--sample_length', type=int, default=512)
args = parser.parse_args()
with open(args.config_file, 'r') as f:
config = cPickle.load(f)
if config.dataset == 'softmax':
config.time_batch_len = 1
config.max_time_batches = -1
model_class = NottinghamModel
with open(nottingham_util.PICKLE_LOC, 'r') as f:
pickle = cPickle.load(f)
chord_to_idx = pickle['chord_to_idx']
time_step = 120
resolution = 480
# use time batch len of 1 so that every target is covered
test_data = util.batch_data(pickle['test'], time_batch_len = 1,
max_time_batches = -1, softmax = True)
else:
raise Exception("Other datasets not yet implemented")
print config
with tf.Graph().as_default(), tf.Session() as session:
with tf.variable_scope("model", reuse=None):
sampling_model = model_class(config)
saver = tf.train.Saver(tf.all_variables())
model_path = os.path.join(os.path.dirname(args.config_file),
config.model_name)
saver.restore(session, model_path)
state = sampling_model.get_cell_zero_state(session, 1)
if args.sample_seq == 'chords':
        # 16 time steps per measure; the I-vi-IV-V progression spans 64 steps
repeats = args.sample_length / 64
sample_seq = nottingham_util.i_vi_iv_v(chord_to_idx, repeats, config.input_dim)
print 'Sampling melody using a I, VI, IV, V progression'
elif args.sample_seq == 'random':
sample_index = np.random.choice(np.arange(len(pickle['test'])))
sample_seq = [ pickle['test'][sample_index][i, :]
for i in range(pickle['test'][sample_index].shape[0]) ]
chord = sample_seq[0]
seq = [chord]
if args.conditioning > 0:
for i in range(1, args.conditioning):
seq_input = np.reshape(chord, [1, 1, config.input_dim])
feed = {
sampling_model.seq_input: seq_input,
sampling_model.initial_state: state,
}
state = session.run(sampling_model.final_state, feed_dict=feed)
chord = sample_seq[i]
seq.append(chord)
if config.dataset == 'softmax':
writer = nottingham_util.NottinghamMidiWriter(chord_to_idx, verbose=False)
sampler = nottingham_util.NottinghamSampler(chord_to_idx, verbose=False)
else:
# writer = midi_util.MidiWriter()
# sampler = sampling.Sampler(verbose=False)
raise Exception("Other datasets not yet implemented")
for i in range(max(args.sample_length - len(seq), 0)):
seq_input = np.reshape(chord, [1, 1, config.input_dim])
feed = {
sampling_model.seq_input: seq_input,
sampling_model.initial_state: state,
}
[probs, state] = session.run(
[sampling_model.probs, sampling_model.final_state],
feed_dict=feed)
probs = np.reshape(probs, [config.input_dim])
chord = sampler.sample_notes(probs)
if config.dataset == 'softmax':
r = nottingham_util.NOTTINGHAM_MELODY_RANGE
if args.sample_melody:
chord[r:] = 0
chord[r:] = sample_seq[i][r:]
elif args.sample_harmony:
chord[:r] = 0
chord[:r] = sample_seq[i][:r]
seq.append(chord)
writer.dump_sequence_to_midi(seq, "best.midi",
time_step=time_step, resolution=resolution)
================================================
FILE: rnn_separate.py
================================================
import os, sys
import argparse
import time
import itertools
import cPickle
import logging
import random
import string
import pprint
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import midi_util
import nottingham_util
import sampling
import util
from rnn import get_config_name, DefaultConfig
from model import Model, NottinghamSeparate
if __name__ == '__main__':
np.random.seed()
parser = argparse.ArgumentParser(description='Music RNN')
parser.add_argument('--choice', type=str, default='melody',
choices = ['melody', 'harmony'])
parser.add_argument('--dataset', type=str, default='softmax',
choices = ['bach', 'nottingham', 'softmax'])
parser.add_argument('--model_dir', type=str, default='models')
parser.add_argument('--run_name', type=str, default=time.strftime("%m%d_%H%M"))
args = parser.parse_args()
if args.dataset == 'softmax':
resolution = 480
time_step = 120
model_class = NottinghamSeparate
with open(nottingham_util.PICKLE_LOC, 'r') as f:
pickle = cPickle.load(f)
chord_to_idx = pickle['chord_to_idx']
input_dim = pickle["train"][0].shape[1]
print 'Finished loading data, input dim: {}'.format(input_dim)
else:
raise Exception("Other datasets not yet implemented")
initializer = tf.random_uniform_initializer(-0.1, 0.1)
best_config = None
best_valid_loss = None
# set up run dir
run_folder = os.path.join(args.model_dir, args.run_name)
if os.path.exists(run_folder):
raise Exception("Run name {} already exists, choose a different one", format(run_folder))
os.makedirs(run_folder)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
logger.addHandler(logging.FileHandler(os.path.join(run_folder, "training.log")))
# grid
grid = {
"dropout_prob": [0.65],
"input_dropout_prob": [0.9],
"num_layers": [1],
"hidden_size": [100]
}
# Generate product of hyperparams
runs = list(list(itertools.izip(grid, x)) for x in itertools.product(*grid.itervalues()))
logger.info("{} runs detected".format(len(runs)))
for combination in runs:
config = DefaultConfig()
config.dataset = args.dataset
config.model_name = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(12)) + '.model'
for attr, value in combination:
setattr(config, attr, value)
if config.dataset == 'softmax':
data = util.load_data('', time_step, config.time_batch_len, config.max_time_batches, nottingham=pickle)
config.input_dim = data["input_dim"]
else:
raise Exception("Other datasets not yet implemented")
# cut away unnecessary parts
r = nottingham_util.NOTTINGHAM_MELODY_RANGE
if args.choice == 'melody':
print "Using only melody"
for d in ['train', 'test', 'valid']:
new_data = []
for batch_data, batch_targets in data[d]["data"]:
new_data.append(([tb[:, :, :r] for tb in batch_data],
[tb[:, :, 0] for tb in batch_targets]))
data[d]["data"] = new_data
else:
print "Using only harmony"
for d in ['train', 'test', 'valid']:
new_data = []
for batch_data, batch_targets in data[d]["data"]:
new_data.append(([tb[:, :, r:] for tb in batch_data],
[tb[:, :, 1] for tb in batch_targets]))
data[d]["data"] = new_data
input_dim = data["input_dim"] = data["train"]["data"][0][0][0].shape[2]
config.input_dim = input_dim
print "New input dim: {}".format(input_dim)
logger.info(config)
config_file_path = os.path.join(run_folder, get_config_name(config) + '.config')
with open(config_file_path, 'w') as f:
cPickle.dump(config, f)
with tf.Graph().as_default(), tf.Session() as session:
with tf.variable_scope("model", reuse=None):
train_model = model_class(config, training=True)
with tf.variable_scope("model", reuse=True):
valid_model = model_class(config, training=False)
saver = tf.train.Saver(tf.all_variables())
tf.initialize_all_variables().run()
# training
early_stop_best_loss = None
start_saving = False
saved_flag = False
train_losses, valid_losses = [], []
start_time = time.time()
for i in range(config.num_epochs):
loss = util.run_epoch(session, train_model, data["train"]["data"], training=True, testing=False)
train_losses.append((i, loss))
if i == 0:
continue
valid_loss = util.run_epoch(session, valid_model, data["valid"]["data"], training=False, testing=False)
valid_losses.append((i, valid_loss))
logger.info('Epoch: {}, Train Loss: {}, Valid Loss: {}, Time Per Epoch: {}'.format(\
i, loss, valid_loss, (time.time() - start_time)/i))
# if it's best validation loss so far, save it
if early_stop_best_loss == None:
early_stop_best_loss = valid_loss
elif valid_loss < early_stop_best_loss:
early_stop_best_loss = valid_loss
if start_saving:
logger.info('Best loss so far encountered, saving model.')
saver.save(session, os.path.join(run_folder, config.model_name))
saved_flag = True
elif not start_saving:
start_saving = True
logger.info('Valid loss increased for the first time, will start saving models')
saver.save(session, os.path.join(run_folder, config.model_name))
saved_flag = True
if not saved_flag:
saver.save(session, os.path.join(run_folder, config.model_name))
# set loss axis max to 20
axes = plt.gca()
if config.dataset == 'softmax':
axes.set_ylim([0, 2])
else:
axes.set_ylim([0, 100])
plt.plot([t[0] for t in train_losses], [t[1] for t in train_losses])
plt.plot([t[0] for t in valid_losses], [t[1] for t in valid_losses])
plt.legend(['Train Loss', 'Validation Loss'])
chart_file_path = os.path.join(run_folder, get_config_name(config) + '.png')
plt.savefig(chart_file_path)
plt.clf()
logger.info("Config {}, Loss: {}".format(config, early_stop_best_loss))
if best_valid_loss == None or early_stop_best_loss < best_valid_loss:
logger.info("Found best new model!")
best_valid_loss = early_stop_best_loss
best_config = config
logger.info("Best Config: {}, Loss: {}".format(best_config, best_valid_loss))
================================================
FILE: rnn_test.py
================================================
import os, sys
import argparse
import cPickle
import numpy as np
import tensorflow as tf
import util
import nottingham_util
from model import Model, NottinghamModel, NottinghamSeparate
from rnn import DefaultConfig
if __name__ == '__main__':
np.random.seed()
    parser = argparse.ArgumentParser(description="Script to test a model's performance against the test set")
parser.add_argument('--config_file', type=str, required=True)
parser.add_argument('--num_samples', type=int, default=1)
parser.add_argument('--seperate', action='store_true', default=False)
parser.add_argument('--choice', type=str, default='melody',
choices = ['melody', 'harmony'])
args = parser.parse_args()
with open(args.config_file, 'r') as f:
config = cPickle.load(f)
if config.dataset == 'softmax':
config.time_batch_len = 1
config.max_time_batches = -1
with open(nottingham_util.PICKLE_LOC, 'r') as f:
pickle = cPickle.load(f)
if args.seperate:
model_class = NottinghamSeparate
test_data = util.batch_data(pickle['test'], time_batch_len = 1,
max_time_batches = -1, softmax = True)
r = nottingham_util.NOTTINGHAM_MELODY_RANGE
if args.choice == 'melody':
print "Using only melody"
new_data = []
for batch_data, batch_targets in test_data:
new_data.append(([tb[:, :, :r] for tb in batch_data],
[tb[:, :, 0] for tb in batch_targets]))
test_data = new_data
else:
print "Using only harmony"
new_data = []
for batch_data, batch_targets in test_data:
new_data.append(([tb[:, :, r:] for tb in batch_data],
[tb[:, :, 1] for tb in batch_targets]))
test_data = new_data
else:
model_class = NottinghamModel
# use time batch len of 1 so that every target is covered
test_data = util.batch_data(pickle['test'], time_batch_len = 1,
max_time_batches = -1, softmax = True)
else:
raise Exception("Other datasets not yet implemented")
print config
with tf.Graph().as_default(), tf.Session() as session:
with tf.variable_scope("model", reuse=None):
test_model = model_class(config, training=False)
saver = tf.train.Saver(tf.all_variables())
model_path = os.path.join(os.path.dirname(args.config_file),
config.model_name)
saver.restore(session, model_path)
test_loss, test_probs = util.run_epoch(session, test_model, test_data,
training=False, testing=True)
print 'Testing Loss: {}'.format(test_loss)
if config.dataset == 'softmax':
if args.seperate:
nottingham_util.seperate_accuracy(test_probs, test_data, num_samples=args.num_samples)
else:
nottingham_util.accuracy(test_probs, test_data, num_samples=args.num_samples)
else:
util.accuracy(test_probs, test_data, num_samples=50)
    sys.exit(0)
================================================
FILE: sampling.py
================================================
import numpy as np
from pprint import pprint
import midi_util
class Sampler(object):
def __init__(self, min_prob=0.5, num_notes = 4, method = 'sample', verbose=False):
self.min_prob = min_prob
self.num_notes = num_notes
self.method = method
self.verbose = verbose
def visualize_probs(self, probs):
if not self.verbose:
return
print 'Highest four probs: '
pprint(sorted(list(enumerate(probs)), key=lambda x: x[1],
reverse=True)[:4])
def sample_notes_prob(self, probs, max_notes=-1):
""" Samples all notes that are over a certain probability"""
self.visualize_probs(probs)
top_idxs = list()
for idx in probs.argsort()[::-1]:
if max_notes > 0 and len(top_idxs) >= max_notes:
break
if probs[idx] < self.min_prob:
break
top_idxs.append(idx)
chord = np.zeros([len(probs)], dtype=np.int32)
chord[top_idxs] = 1.0
return chord
def sample_notes_static(self, probs):
top_idxs = probs.argsort()[-self.num_notes:][::-1]
chord = np.zeros([len(probs)], dtype=np.int32)
chord[top_idxs] = 1.0
return chord
def sample_notes_bernoulli(self, probs):
chord = np.zeros([len(probs)], dtype=np.int32)
for note, prob in enumerate(probs):
if np.random.binomial(1, prob) > 0:
chord[note] = 1
return chord
    def sample_notes(self, probs):
        """ Dispatches to the sampling strategy selected by self.method """
        self.visualize_probs(probs)
if self.method == 'sample':
return self.sample_notes_bernoulli(probs)
elif self.method == 'static':
return self.sample_notes_static(probs)
elif self.method == 'min_prob':
return self.sample_notes_prob(probs)
else:
raise Exception("Unrecognized method: {}".format(self.method))
================================================
FILE: util.py
================================================
import os
import math
import cPickle
from collections import defaultdict
from random import shuffle
import numpy as np
import tensorflow as tf
import midi_util
import nottingham_util
def parse_midi_directory(input_dir, time_step):
"""
input_dir: data directory full of midi files
time_step: the number of ticks to use as a time step for discretization
    Returns a list of (filename, [T x D] matrix) tuples, where T is the number
    of time steps and D is the range of notes.
"""
files = [ os.path.join(input_dir, f) for f in os.listdir(input_dir)
if os.path.isfile(os.path.join(input_dir, f)) ]
sequences = [ \
(f, midi_util.parse_midi_to_sequence(f, time_step=time_step)) \
for f in files ]
return sequences
def batch_data(sequences, time_batch_len=128, max_time_batches=10,
softmax=False, verbose=False):
"""
    sequences: a list of [T x D] matrices, each matrix representing a sequence
    time_batch_len: the unrolling length that will be used by BPTT
    max_time_batches: the max number of time batches to consider. Any sequences
        longer than max_time_batches * time_batch_len will be ignored.
        Can be set to -1 to use as many time batches as needed.
    softmax: should be set to True if using the dual-softmax formulation
returns [
[ [ data ], [ target ] ], # batch with one time step
[ [ data1, data2 ], [ target1, target2 ] ], # batch with two time steps
...
]
"""
assert time_batch_len > 0
dims = sequences[0].shape[1]
sequence_lens = [s.shape[0] for s in sequences]
if verbose:
avg_seq_len = sum(sequence_lens) / len(sequences)
print "Average Sequence Length: {}".format(avg_seq_len)
print "Max Sequence Length: {}".format(time_batch_len)
print "Number of sequences: {}".format(len(sequences))
batches = defaultdict(list)
for sequence in sequences:
# -1 because we can't predict the first step
num_time_steps = ((sequence.shape[0]-1) // time_batch_len)
if num_time_steps < 1:
continue
if max_time_batches > 0 and num_time_steps > max_time_batches:
continue
batches[num_time_steps].append(sequence)
if verbose:
print "Batch distribution:"
print [(k, len(v)) for (k, v) in batches.iteritems()]
def arrange_batch(sequences, num_time_steps):
sequences = [s[:(num_time_steps*time_batch_len)+1, :] for s in sequences]
stacked = np.dstack(sequences)
# swap axes so that shape is (SEQ_LENGTH X BATCH_SIZE X INPUT_DIM)
data = np.swapaxes(stacked, 1, 2)
targets = np.roll(data, -1, axis=0)
# cutoff final time step
data = data[:-1, :, :]
targets = targets[:-1, :, :]
assert data.shape == targets.shape
if softmax:
r = nottingham_util.NOTTINGHAM_MELODY_RANGE
labels = np.ones((targets.shape[0], targets.shape[1], 2), dtype=np.int32)
assert np.all(np.sum(targets[:, :, :r], axis=2) == 1)
assert np.all(np.sum(targets[:, :, r:], axis=2) == 1)
labels[:, :, 0] = np.argmax(targets[:, :, :r], axis=2)
labels[:, :, 1] = np.argmax(targets[:, :, r:], axis=2)
targets = labels
assert targets.shape[:2] == data.shape[:2]
assert data.shape[0] == num_time_steps * time_batch_len
# split them up into time batches
tb_data = np.split(data, num_time_steps, axis=0)
tb_targets = np.split(targets, num_time_steps, axis=0)
assert len(tb_data) == len(tb_targets) == num_time_steps
for i in range(len(tb_data)):
assert tb_data[i].shape[0] == time_batch_len
assert tb_targets[i].shape[0] == time_batch_len
if softmax:
assert np.all(np.sum(tb_data[i], axis=2) == 2)
return (tb_data, tb_targets)
return [ arrange_batch(b, n) for n, b in batches.iteritems() ]
def load_data(data_dir, time_step, time_batch_len, max_time_batches, nottingham=None):
"""
nottingham: The sequences object as created in prepare_nottingham_pickle
(see nottingham_util for more). If None, parse all the MIDI
files from data_dir
time_step: the time_step used to parse midi files (only used if data_dir
is provided)
time_batch_len and max_time_batches: see batch_data()
returns {
"train": {
"data": [ batch_data() ],
"metadata: { ... }
},
"valid": { ... }
"test": { ... }
}
"""
data = {}
for dataset in ['train', 'test', 'valid']:
# For testing, use ALL the sequences
if dataset == 'test':
max_time_batches = -1
# Softmax formualation preparsed into sequences
if nottingham:
sequences = nottingham[dataset]
metadata = nottingham[dataset + '_metadata']
# Cross-entropy formulation needs to be parsed
else:
sf = parse_midi_directory(os.path.join(data_dir, dataset), time_step)
sequences = [s[1] for s in sf]
files = [s[0] for s in sf]
metadata = [{
'path': f,
'name': f.split("/")[-1].split(".")[0]
} for f in files]
dataset_data = batch_data(sequences, time_batch_len, max_time_batches, softmax = True if nottingham else False)
data[dataset] = {
"data": dataset_data,
"metadata": metadata,
}
data["input_dim"] = dataset_data[0][0][0].shape[2]
return data
def run_epoch(session, model, batches, training=False, testing=False):
"""
session: Tensorflow session object
model: model object (see model.py)
batches: data object loaded from util_data()
training: A backpropagation iteration will be performed on the dataset
if this flag is active
returns average loss per time step over all batches.
    if testing flag is active: returns [ loss, probs ], where probs contains
        the probability values for each note
"""
# shuffle batches
shuffle(batches)
target_tensors = [model.loss, model.final_state]
if testing:
target_tensors.append(model.probs)
batch_probs = defaultdict(list)
if training:
target_tensors.append(model.train_step)
losses = []
for data, targets in batches:
# save state over unrolling time steps
batch_size = data[0].shape[1]
num_time_steps = len(data)
state = model.get_cell_zero_state(session, batch_size)
probs = list()
for tb_data, tb_targets in zip(data, targets):
if testing:
tbd = tb_data
tbt = tb_targets
            else:
                # shuffle the sequences within the batch (and their states)
                num_seqs = tb_data.shape[1]
                permutations = np.random.permutation(num_seqs)
                tbd = np.zeros_like(tb_data)
                tbd[:, np.arange(num_seqs), :] = tb_data[:, permutations, :]
                tbt = np.zeros_like(tb_targets)
                tbt[:, np.arange(num_seqs), :] = tb_targets[:, permutations, :]
                state[np.arange(num_seqs)] = state[permutations]
feed_dict = {
model.initial_state: state,
model.seq_input: tbd,
model.seq_targets: tbt,
}
results = session.run(target_tensors, feed_dict=feed_dict)
losses.append(results[0])
state = results[1]
if testing:
batch_probs[num_time_steps].append(results[2])
loss = sum(losses) / len(losses)
if testing:
return [loss, batch_probs]
else:
return loss
def accuracy(batch_probs, data, num_samples=20):
"""
batch_probs: probs object returned from run_epoch
data: data object passed into run_epoch
num_samples: the number of times to sample each note (an average over all
these samples will be used)
returns the accuracy metric according to
http://ismir2009.ismir.net/proceedings/PS2-21.pdf
"""
false_positives, false_negatives, true_positives = 0, 0, 0
for _, batch_targets in data:
        num_time_steps = len(batch_targets)
for ts_targets, ts_probs in zip(batch_targets, batch_probs[num_time_steps]):
            assert ts_targets.shape == ts_probs.shape
for seq_idx in range(ts_targets.shape[1]):
for step_idx in range(ts_targets.shape[0]):
for note_idx, prob in enumerate(ts_probs[step_idx, seq_idx, :]):
num_occurrences = np.random.binomial(num_samples, prob)
if ts_targets[step_idx, seq_idx, note_idx] == 0.0:
false_positives += num_occurrences
else:
false_negatives += (num_samples - num_occurrences)
true_positives += num_occurrences
accuracy = (float(true_positives) / float(true_positives + false_positives + false_negatives))
print "Precision: {}".format(float(true_positives) / (float(true_positives + false_positives)))
print "Recall: {}".format(float(true_positives) / (float(true_positives + false_negatives)))
print "Accuracy: {}".format(accuracy)