[
  {
    "path": ".gitignore",
    "content": "data\n2012code\nmodels\npython-midi\nresearch\ntensorflow\n\n*.midi\n*.pyc\nimg/.DS_Store\n"
  },
  {
    "path": "README.md",
    "content": "Overview\n============\nA project that trains a LSTM recurrent neural network over a dataset of MIDI files. More information can be found on the [writeup about this project](http://yoavz.com/music_rnn/) or the [final report](http://yoavz.com/music_rnn_paper.pdf) written. *Warning: Some parts of this codebase are unfinished.*\n\nDependencies\n============\n\n* Python 2.7\n* Anaconda\n* Numpy (http://www.numpy.org/)\n* Tensorflow (https://github.com/tensorflow/tensorflow) - 0.8\n* Python Midi (https://github.com/vishnubob/python-midi.git)\n* Mingus (https://github.com/bspaans/python-mingus)\n* Matplotlib (http://matplotlib.org/)\n\nBasic Usage\n===========\n\n1. Run `./install.sh` to create conda env, install dependencies and download data\n2. `source activate music_rnn` to activate the conda environment\n3. Run `python nottingham_util.py` to generate the sequences and chord mapping file to `data/nottingham.pickle`\n4. Run `python rnn.py --run_name YOUR_RUN_NAME_HERE` to start training the model. Use the grid object in `rnn.py` to edit hyperparameter\n   configurations.\n5. `source deactivate` to deactivate the conda environment\n"
  },
  {
    "path": "css/style.css",
    "content": "body {\n  font-family: 'Maven Pro', sans-serif;\n  margin: 20px;\n}\n\n.audio-wrapper {\n  padding: 10px 0px;\n}\n\n.audio-wrapper p {\n  font-size: 14px;\n  text-align: center;\n  color: gray;\n  margin-top: 5px;\n  margin-bottom: 0;\n}\n\n.img-wrapper {\n  text-align: center;\n}\n\n.img-wrapper > img {\n  max-width: 560px;\n  width: 100%;\n}\n\n.img-wrapper > p {\n  font-size: 14px;\n  text-align: center;\n  color: gray;\n  margin-top: 0;\n}\n\na {\n  color:gray;\n  text-decoration:none;\n}\n\na:hover {\n  color:lightgray;\n}\n\n.container {\n  max-width: 700px;\n  margin: 0 auto;\n}\n\nh2:after {\n  margin-top: 5px;\n  content: ' ';\n  display: block;\n  border: 1px solid black;\n}\n\n/* graph titles */\nh3 {\n  text-align: center;\n}\n\n/* LEGEND styling */\n\n.legend {\n  overflow: auto;\n}\n\n.legend .legend-title {\n  text-align: left;\n  margin-bottom: 8px;\n  font-weight: bold;\n  font-size: 90%;\n}\n\n.legend .legend-scale ul {\n  margin-top: 10px;\n  margin-bottom: 0px;\n  padding: 0;\n  float: left;\n  list-style: none;\n}\n\n.legend .legend-scale ul li {\n  display: block;\n  float: left;\n  width: 100px;\n  text-align: center;\n  font-size: 80%;\n  list-style: none;\n}\n\n.legend ul.pie-legend li span {\n  display: inline-block;\n  height: 15px;\n  width: 75px;\n}\n\n.legend ul.bar-legend li span {\n  display: inline-block;\n  height: 15px;\n  width: 75px;\n}\n"
  },
  {
    "path": "index.html",
    "content": "<!DOCTYPE html>\n<html lang=\"en\">\n  <head>\n    <meta charset=\"utf-8\">\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n    <meta name=\"description\" content=\"Music Language Modeling with Recurrent Neural Networks\">\n    <meta name=\"author\" content=\"Yoav Zimmerman\">\n\n    <title>Music Language Modeling with RNN's</title>\n\n    <link href='http://fonts.googleapis.com/css?family=Maven+Pro:400,700' rel='stylesheet' type='text/css'>\n    <link rel=\"stylesheet\" type=\"text/css\" href=\"css/mediaelementplayer.min.css\" />\n    <link rel=\"stylesheet\" type=\"text/css\" href=\"css/style.css\">\n\n    <!-- load and configure LaTeX -->\n    <script type=\"text/javascript\" src=\"http://cdn.mathjax.org/mathjax/latest/MathJax.js\">\n    MathJax.Hub.Config({\n     extensions: [\"tex2jax.js\",\"TeX/AMSmath.js\",\"TeX/AMSsymbols.js\"],\n     jax: [\"input/TeX\", \"output/HTML-CSS\"],\n     tex2jax: {\n         inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n         displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ],\n     },\n     \"HTML-CSS\": { availableFonts: [\"TeX\"] }\n    });\n    </script> \n\n  </head>\n\n  <body>\n\n    <div class=\"container\">\n\n      <h1>Music Language Modeling with Recurrent Neural Networks</h1>\n\n      <div class=\"section\">\n        <h2>TL;DR</h2>\n        <p>I trained a <a href=\"https://en.wikipedia.org/wiki/Long_short-term_memory\">Long Short-Term Memory (LSTM) Recurrent Neural Network</a> on a dataset of around 650 jigs and folk tunes. I sampled from this model to generate the following musical pieces: </p>\n        <div class=\"audio-wrapper\">\n          <audio src=\"generated_music/mp3/5.mp3\" width=\"100%\" preload=\"none\"></audio>\n        </div>\n        <div class=\"audio-wrapper\">\n          <audio src=\"generated_music/mp3/7.mp3\" width=\"100%\" preload=\"none\"></audio>\n        </div>\n        <p>You can find 8 more pieces <a href=\"#track-marker\">here</a>\n      </div>\n\n      <div class=\"section\">\n        <h2>Introduction</h2>\n        <p>Neural Networks are all the rage these days, and with good reason. Microsoft Research's <a href=\"http://image-net.org/challenges/LSVRC/2015/results\">winning model</a> on the 2015 ImageNet competition is classifying images with 3.57% error rate (human performance is 5.1%). Google used a variant to <a href=\"https://deepmind.com/alpha-go.html\">crush</a> one of the world's best Go players 4-1. Crazy things are happening in the field, with no sign of slowing down. In this project, I've applied recurrent neural nets to learn a predictive model over symbolic sequences of music.</p>\n        <i>Disclaimer: This post assumes a familiarity with machine learning and neural networks. For a good overview of RNN's, I highly recommend reading <a href=\"http://karpathy.github.io/2015/05/21/rnn-effectiveness/\">Andrej Karpathy's excellent blog post here</a> for an in-depth explanation</i>.\n      </div>\n      \n      <div class=\"section\">\n        <h2>Music Language Modeling</h2>\n        <p>Music Language Modeling is the problem of modeling symbolic sequences of polyphonic music in a completely general piano roll representation. <i>Piano roll representation</i> is a key distinction here, meaning we're going to use the symbolic note sequences as represented by sheet music, as opposed to more complex, acoustically rich audio signals. 
MIDI files are perfect for this, as they encode all the note information exactly as it would be displayed on a piano roll.</p> \n        <div class=\"img-wrapper\">\n          <img src=\"img/encoding.png\"></img>\n        </div>\n        <p>The most straightforward formulation is to discretize a piece of music into uniform time steps. There are 88 possible pitches from A0 to C8 in a MIDI file, so every time step is encoded into an 88-dimensional binary vector as shown above. A value of 1 at index <i>i</i> indicates that pitch <i>i</i> is playing at a given time step. Then we plug this sequence of input vectors into an RNN architecture, where at each step the target is to predict the next time step of the sequence. A trained model outputs the conditional distribution of notes at a time step, given all the time steps that have occurred before it.</p>\n\n        <p>One problem with this naive formulation is that the number of potential note configurations is too high ($2^{N}$ for $N$ possible notes) to take the softmax classification approach normally used in image classification and language modeling. Instead, we need to use a sigmoid cross-entropy loss function to predict <i>separately</i> whether each note class is active or not. However, this approach does not make much sense for the complex joint distribution of notes usually found in a time step. For example, <i>C</i> is much more likely than <i>C#</i> to be playing when <i>E</i> and <i>G</i> are also active, but separate classification targets implicitly assume independence between note probabilities at the same time step. <a href=\"http://www-etud.iro.umontreal.ca/~boulanni/ICML2012.pdf\">Modeling Temporal Dependencies in High-Dimensional Sequences (Boulanger-Lewandowski, 2012)</a>, perhaps the most successful research paper on MLM so far, attempts to solve this problem using energy-based generative models such as the Restricted Boltzmann Machine (RBM). They propose the combined RNN-RBM architecture, which achieves state-of-the-art performance on several music datasets.</p>\n      </div>\n\n      <div class=\"section\">\n        <h2>Model</h2>\n        <p> For my model, I decided to take the approach of introducing more musical structure into learning. Many musical pieces can be separated into two parts: a melody and a harmony. I make the following two assumptions about a piece of music for my model: First, the melody is <i>monophonic</i> (at most one note playing at every time step). Second, the harmony at each time step can be classified into a chord class. For example, a <i>C, E, G</i> active during a time step would be classified as <i>C Major</i>. These are strong assumptions, but they lead to the nice property of exactly one active melody class and one active harmony class at every time step. This allows us to take the sum of two softmax functions as the loss function for our model.</p>\n        <div class=\"img-wrapper\">\n          <img src=\"img/dual_softmax_fig.png\"></img>\n        </div>\n        <p>My model works in the following way: for every time step I encode the melody note into a one-hot-encoding binary vector. I then use the notes playing in the harmony to infer the chord class, and turn that into a one-hot-encoding binary vector as well. The full input vector is a concatenation of the melody and harmony vectors. This input vector then passes through hidden layer(s) of LSTM cells.</p>
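\n        <p>As a concrete sketch of this input encoding (a minimal example, not code from the project; <code>melody_idx</code> and <code>chord_idx</code> are hypothetical class indices, and the 34/32 class counts come from the experiments below):</p>\n        <pre>\nimport numpy as np\n\nM, H = 34, 32  # melody and chord class counts\n\ndef encode_step(melody_idx, chord_idx):\n    v = np.zeros(M + H)\n    v[melody_idx] = 1      # one-hot melody\n    v[M + chord_idx] = 1   # one-hot harmony, offset by M\n    return v\n</pre>\n        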
<p>The loss function is the sum of two separate softmax loss functions over the respective melody and harmony parts of the output layer.</p>\n\n        <p style=\"text-align: center\">\n          $L(z, m, h) = - \\alpha \\, \\log \\bigg( \\frac{ e^{z_m} }{ \\sum_{n=0}^{M-1}{ e^{z_n}}} \\bigg) - (1 - \\alpha) \\, \\log \\bigg( \\frac{ e^{z_{M+h}} }{ \\sum_{n=M}^{M+H-1}{ e^{z_n}}} \\bigg)$\n        </p>\n\n        <p>If we have $M$ melody classes and $H$ harmony classes, the function above describes the negative log-likelihood loss at a time step given the output layer $z \\in \\mathbb{R}^{M+H}$, a target melody class $m$, and a target harmony class $h$. $\\alpha$ is what I call the <i>melody coefficient</i>: it controls how heavily the loss function weights the respective melody and harmony loss terms.</p>\n      </div>\n\n      <div class=\"section\">\n        <h2>Experiments</h2>\n        <p>The <a href=\"http://ifdo.ca/~seymour/nottingham/nottingham.html\">Nottingham dataset</a> is a collection of 1200 jigs and folk tunes, most of which fit the assumptions specified above: they have a simple monophonic melody on top of recognizable chords. You can download all the Nottingham tunes as MIDI files <a href=\"http://www-etud.iro.umontreal.ca/~boulanni/icml2012\">here</a>. I discretized each of these sequences into time steps of sixteenth notes (1/4 of a quarter note), and used the <a href=\"https://github.com/bspaans/python-mingus\">mingus</a> python package to detect the chord classes in the harmonies. After filtering out some sequences that didn't fit the assumptions, I ended up with 32 chord classes and 34 possible melody notes (1 class from each of these represented a rest) for a total input dimension of 66 over 997 sequences. The average length of a sequence was 516 time steps (roughly 32 measures in 4/4). Finally, all the sequences were split up into 65% training, 15% validation, and 15% testing.</p>\n\n        <div class=\"audio-wrapper\">\n          <audio src=\"generated_music/mp3/nottingham_sample.mp3\" width=\"100%\" preload=\"none\"></audio>\n          <p>An example musical sequence from the Nottingham dataset</p>\n        </div>\n\n        <p>I used Google's <a href=\"https://www.tensorflow.org/\">TensorFlow</a> library to implement my model. The architecture that I found worked best was 2 stacked hidden layers of 200 LSTM units each. I batched sequences by length, and used an unrolling length of 128 (8 measures in 4/4 time signature) for <a href=\"https://en.wikipedia.org/wiki/Backpropagation_through_time\">Backpropagation through time (BPTT)</a>. I used RMSProp with a learning rate of 0.005 and a decay rate of 0.9 for minibatch gradient descent. When searching over the hyperparameter space, I trained each model for 250 epochs, and saved the model with the lowest validation loss.</p>\n        \n        <div class=\"img-wrapper\">\n          <img src=\"img/overfitting.png\"></img>\n          <p>Training and validation loss plotted over epochs for a model with 2 stacked layers of 200 LSTM units, with 50% dropout on hidden layers and 80% dropout on input layers. Overfitting issues start showing up after about 20 epochs.</p>\n        </div>\n\n        <p>One big issue I ran into when training was extreme overfitting. Adding dropout on the non-recurrent connections helped some, but did not completely eliminate the issue. 
The best dropout configuration I found, and the one I ended up using, was 50% on the hidden layers and 80% on the input layers.</p>\n      </div>\n\n      <div class=\"section\">\n        <h2>Results</h2>\n        <p>The best model I found achieved an overall accuracy of 77.84% on the test set. One nice consequence of my model is that I can evaluate the melody and harmony accuracies separately, which ended up being 64.15% and 91.57% for the melody and harmony respectively. The higher harmony accuracy makes sense, because most of the pieces in the dataset hold chords for 8 or 16 time steps (a half or whole note in 4/4 time).</p>\n        <p>Alright, enough numbers, let's get to the fun stuff. Once the model is trained, generating music from it is just a matter of sampling a melody and harmony from the probability distribution at each time step and plugging it back into the network. Rinse and repeat. I present to you 8 more pieces generated by my model below. I \"primed\" each with the starting 16 time steps (1 measure in 4/4 time) from a random test sequence, and then let them do their thing for 2048 time steps.</p>\n      </div>\n      <div class=\"audio-wrapper\">\n        <audio src=\"generated_music/mp3/3.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <div class=\"audio-wrapper\" id=\"track-marker\">\n        <audio src=\"generated_music/mp3/1.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <div class=\"audio-wrapper\">\n        <audio src=\"generated_music/mp3/2.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <div class=\"audio-wrapper\">\n        <audio src=\"generated_music/mp3/4.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <div class=\"audio-wrapper\">\n        <audio src=\"generated_music/mp3/6.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <div class=\"audio-wrapper\">\n        <audio src=\"generated_music/mp3/8.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <div class=\"audio-wrapper\">\n        <audio src=\"generated_music/mp3/9.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <div class=\"audio-wrapper\">\n        <audio src=\"generated_music/mp3/10.mp3\" width=\"100%\" preload=\"none\"></audio>\n      </div>\n      <p>Some turned out sounding better than others to my ears, but overall the model clearly does not produce human-level compositions. The lack of long-term structure such as repeated phrases and themes is especially revealing. However, for the most part, the model seems to play a melody in key with the harmony that it chooses. The melody also tends to stay in the same key signature for short-term phrases, and sometimes the harmony accompanies it with short chord progressions in that same key. There also seem to be small pieces of coherent rhythmic structure, although the \"time signature\" overall throughout a piece is sporadic.</p>\n    \n    <div class=\"section\">\n      <hr>\n      <p>Many thanks go out to <a href=\"http://web.cs.ucla.edu/~feisha/\">Fei Sha</a> for providing valuable advice; this work was completed as part of my final project for his research seminar. If you're interested in learning more, <a href=\"http://yoavz.com/music_rnn_paper.pdf\">the final report</a> contains more about the model and a few more experimental results. The source code is also <a href=\"http://github.com/yoavz/music_rnn\">available on github here</a> if you'd like to use my code to train your own models! (warning: messy code)</p>
\n    </div>\n\n    </div>\n\n    <!-- Placed at the end of the document so the pages load faster -->\n    <script src=\"js/jquery-1.11.2.min.js\"></script>\n    <script src=\"js/mediaelement-and-player.min.js\"></script>\n    <script>\n      $('audio').mediaelementplayer();\n    </script>\n  </body>\n</html>\n"
  },
  {
    "path": "install.sh",
    "content": "conda create -n music_rnn python=2.7\nsource activate music_rnn\n\npip install -r requirements.txt\n\nmkdir models\n\nmkdir data\n# http://www-etud.iro.umontreal.ca/~boulanni/icml2012\nwget http://www-etud.iro.umontreal.ca/~boulanni/Nottingham.zip -O data/Nottingham.zip\nunzip data/Nottingham.zip -d data/\n"
  },
  {
    "path": "midi_util.py",
    "content": "import sys, os\nfrom collections import defaultdict\nimport numpy as np\nimport midi\n\nRANGE = 128\n\ndef round_tick(tick, time_step):\n    return int(round(tick/float(time_step)) * time_step)\n\ndef ingest_notes(track, verbose=False):\n\n    notes = { n: [] for n in range(RANGE) }\n    current_tick = 0\n\n    for msg in track:\n        # ignore all end of track events\n        if isinstance(msg, midi.EndOfTrackEvent):\n            continue\n\n        if msg.tick > 0: \n            current_tick += msg.tick\n\n        # velocity of 0 is equivalent to note off, so treat as such\n        if isinstance(msg, midi.NoteOnEvent) and msg.get_velocity() != 0:\n            if len(notes[msg.get_pitch()]) > 0 and \\\n               len(notes[msg.get_pitch()][-1]) != 2:\n                if verbose:\n                    print \"Warning: double NoteOn encountered, deleting the first\"\n                    print msg\n            else:\n                notes[msg.get_pitch()] += [[current_tick]]\n        elif isinstance(msg, midi.NoteOffEvent) or \\\n            (isinstance(msg, midi.NoteOnEvent) and msg.get_velocity() == 0):\n            # sanity check: no notes end without being started\n            if len(notes[msg.get_pitch()][-1]) != 1:\n                if verbose:\n                    print \"Warning: skipping NoteOff Event with no corresponding NoteOn\"\n                    print msg\n            else: \n                notes[msg.get_pitch()][-1] += [current_tick]\n\n    return notes, current_tick\n\ndef round_notes(notes, track_ticks, time_step, R=None, O=None):\n    if not R:\n        R = RANGE\n    if not O:\n        O = 0\n\n    sequence = np.zeros((track_ticks/time_step, R))\n    disputed = { t: defaultdict(int) for t in range(track_ticks/time_step) }\n    for note in notes:\n        for (start, end) in notes[note]:\n            start_t = round_tick(start, time_step) / time_step\n            end_t = round_tick(end, time_step) / time_step\n            # normal case where note is long enough\n            if end - start > time_step/2 and start_t != end_t:\n                sequence[start_t:end_t, note - O] = 1\n            # cases where note is within bounds of time step \n            elif start > start_t * time_step:\n                disputed[start_t][note] += (end - start)\n            elif end <= end_t * time_step:\n                disputed[end_t-1][note] += (end - start)\n            # case where a note is on the border \n            else:\n                before_border = start_t * time_step - start\n                if before_border > 0:\n                    disputed[start_t-1][note] += before_border\n                after_border = end - start_t * time_step\n                if after_border > 0 and end < track_ticks:\n                    disputed[start_t][note] += after_border\n\n    # solve disputed\n    for seq_idx in range(sequence.shape[0]):\n        if np.count_nonzero(sequence[seq_idx, :]) == 0 and len(disputed[seq_idx]) > 0:\n            # print seq_idx, disputed[seq_idx]\n            sorted_notes = sorted(disputed[seq_idx].items(),\n                                  key=lambda x: x[1])\n            max_val = max(x[1] for x in sorted_notes)\n            top_notes = filter(lambda x: x[1] >= max_val, sorted_notes)\n            for note, _ in top_notes:\n                sequence[seq_idx, note - O] = 1\n\n    return sequence\n\ndef parse_midi_to_sequence(input_filename, time_step, verbose=False):\n    sequence = []\n    pattern = midi.read_midifile(input_filename)\n\n    if 
len(pattern) < 1:\n        raise Exception(\"No pattern found in midi file\")\n\n    if verbose:\n        print \"Track resolution: {}\".format(pattern.resolution)\n        print \"Number of tracks: {}\".format(len(pattern))\n        print \"Time step: {}\".format(time_step)\n\n    # Track ingestion stage: same logic as ingest_notes, but merged over all\n    # tracks in the pattern\n    notes = { n: [] for n in range(RANGE) }\n    track_ticks = 0\n    for track in pattern:\n        current_tick = 0\n        for msg in track:\n            # ignore all end of track events\n            if isinstance(msg, midi.EndOfTrackEvent):\n                continue\n\n            if msg.tick > 0: \n                current_tick += msg.tick\n\n            # velocity of 0 is equivalent to note off, so treat as such\n            if isinstance(msg, midi.NoteOnEvent) and msg.get_velocity() != 0:\n                if len(notes[msg.get_pitch()]) > 0 and \\\n                   len(notes[msg.get_pitch()][-1]) != 2:\n                    if verbose:\n                        print \"Warning: double NoteOn encountered, ignoring the second\"\n                        print msg\n                else:\n                    notes[msg.get_pitch()] += [[current_tick]]\n            elif isinstance(msg, midi.NoteOffEvent) or \\\n                (isinstance(msg, midi.NoteOnEvent) and msg.get_velocity() == 0):\n                # sanity check: no notes end without being started\n                if len(notes[msg.get_pitch()][-1]) != 1:\n                    if verbose:\n                        print \"Warning: skipping NoteOff Event with no corresponding NoteOn\"\n                        print msg\n                else: \n                    notes[msg.get_pitch()][-1] += [current_tick]\n\n        track_ticks = max(current_tick, track_ticks)\n\n    track_ticks = round_tick(track_ticks, time_step)\n    if verbose:\n        print \"Track ticks (rounded): {} ({} time steps)\".format(track_ticks, track_ticks/time_step)\n\n    sequence = round_notes(notes, track_ticks, time_step)\n\n    return sequence\n\nclass MidiWriter(object):\n\n    def __init__(self, verbose=False):\n        self.verbose = verbose\n        self.note_range = RANGE\n\n    def note_off(self, val, tick):\n        self.track.append(midi.NoteOffEvent(tick=tick, pitch=val))\n        return 0\n\n    def note_on(self, val, tick):\n        self.track.append(midi.NoteOnEvent(tick=tick, pitch=val, velocity=70))\n        return 0\n\n    def dump_sequence_to_midi(self, sequence, output_filename, time_step, \n                              resolution, metronome=24):\n        if self.verbose:\n            print \"Dumping sequence to MIDI file: {}\".format(output_filename)\n            print \"Resolution: {}\".format(resolution)\n            print \"Time Step: {}\".format(time_step)\n\n        pattern = midi.Pattern(resolution=resolution)\n        self.track = midi.Track()\n\n        # metadata track\n        meta_track = midi.Track()\n        time_sig = midi.TimeSignatureEvent()\n        time_sig.set_numerator(4)\n        time_sig.set_denominator(4)\n        time_sig.set_metronome(metronome)\n        time_sig.set_thirtyseconds(8)\n        meta_track.append(time_sig)\n        pattern.append(meta_track)\n\n        # reshape to (SEQ_LENGTH X NUM_DIMS)\n        sequence = np.reshape(sequence, [-1, self.note_range])\n\n        time_steps = sequence.shape[0]\n        if self.verbose:\n            print \"Total number of time steps: {}\".format(time_steps)\n\n        tick = time_step\n        self.notes_on = { n: False for n in range(self.note_range) }\n        
for seq_idx in range(time_steps):\n            notes = np.nonzero(sequence[seq_idx, :])[0].tolist()\n\n            # this tick will only be assigned to first NoteOn/NoteOff in\n            # this time_step\n\n            # NoteOffEvents come first so they'll have the tick value\n            # go through all notes that are currently on and see if any\n            # turned off\n            for n in self.notes_on:\n                if self.notes_on[n] and n not in notes:\n                    tick = self.note_off(n, tick)\n                    self.notes_on[n] = False\n\n            # Turn on any notes that weren't previously on\n            for note in notes:\n                if not self.notes_on[note]:\n                    tick = self.note_on(note, tick)\n                    self.notes_on[note] = True\n\n            tick += time_step\n\n        # flush out any notes that are still on\n        for n in self.notes_on:\n            if self.notes_on[n]:\n                self.note_off(n, tick)\n                tick = 0\n                self.notes_on[n] = False\n\n        pattern.append(self.track)\n        midi.write_midifile(output_filename, pattern)\n
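\nif __name__ == '__main__':\n    # A minimal round-trip sketch (not part of the original pipeline): parse\n    # a MIDI file given on the command line into a piano-roll sequence, then\n    # write it straight back out. The 120/480 values mirror the time_step and\n    # resolution used elsewhere in this repo; the output path is arbitrary.\n    if len(sys.argv) > 1:\n        seq = parse_midi_to_sequence(sys.argv[1], time_step=120, verbose=True)\n        writer = MidiWriter(verbose=True)\n        writer.dump_sequence_to_midi(seq, 'roundtrip.midi', time_step=120,\n                                     resolution=480)\n"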
  },
  {
    "path": "model.py",
    "content": "import os\nimport logging\nimport numpy as np\nimport tensorflow as tf    \nfrom tensorflow.models.rnn import rnn_cell\nfrom tensorflow.models.rnn import rnn, seq2seq\n\nimport nottingham_util\n\nclass Model(object):\n    \"\"\" \n    Cross-Entropy Naive Formulation\n    A single time step may have multiple notes active, so a sigmoid cross entropy loss\n    is used to match targets.\n\n    seq_input: a [ T x B x D ] matrix, where T is the time steps in the batch, B is the\n               batch size, and D is the amount of dimensions\n    \"\"\"\n    \n    def __init__(self, config, training=False):\n        self.config = config\n        self.time_batch_len = time_batch_len = config.time_batch_len\n        self.input_dim = input_dim = config.input_dim\n        hidden_size = config.hidden_size\n        num_layers = config.num_layers\n        dropout_prob = config.dropout_prob\n        input_dropout_prob = config.input_dropout_prob\n        cell_type = config.cell_type\n\n        self.seq_input = \\\n            tf.placeholder(tf.float32, shape=[self.time_batch_len, None, input_dim])\n\n        if (dropout_prob <= 0.0 or dropout_prob > 1.0):\n            raise Exception(\"Invalid dropout probability: {}\".format(dropout_prob))\n\n        if (input_dropout_prob <= 0.0 or input_dropout_prob > 1.0):\n            raise Exception(\"Invalid input dropout probability: {}\".format(input_dropout_prob))\n\n        # setup variables\n        with tf.variable_scope(\"rnnlstm\"):\n            output_W = tf.get_variable(\"output_w\", [hidden_size, input_dim])\n            output_b = tf.get_variable(\"output_b\", [input_dim])\n            self.lr = tf.constant(config.learning_rate, name=\"learning_rate\")\n            self.lr_decay = tf.constant(config.learning_rate_decay, name=\"learning_rate_decay\")\n\n        def create_cell(input_size):\n            if cell_type == \"vanilla\":\n                cell_class = rnn_cell.BasicRNNCell\n            elif cell_type == \"gru\":\n                cell_class = rnn_cell.BasicGRUCell\n            elif cell_type == \"lstm\":\n                cell_class = rnn_cell.BasicLSTMCell\n            else:\n                raise Exception(\"Invalid cell type: {}\".format(cell_type))\n\n            cell = cell_class(hidden_size, input_size = input_size)\n            if training:\n                return rnn_cell.DropoutWrapper(cell, output_keep_prob = dropout_prob)\n            else:\n                return cell\n\n        if training:\n            self.seq_input_dropout = tf.nn.dropout(self.seq_input, keep_prob = input_dropout_prob)\n        else:\n            self.seq_input_dropout = self.seq_input\n\n        self.cell = rnn_cell.MultiRNNCell(\n            [create_cell(input_dim)] + [create_cell(hidden_size) for i in range(1, num_layers)])\n\n        batch_size = tf.shape(self.seq_input_dropout)[0]\n        self.initial_state = self.cell.zero_state(batch_size, tf.float32)\n        inputs_list = tf.unpack(self.seq_input_dropout)\n\n        # rnn outputs a list of [batch_size x H] outputs\n        outputs_list, self.final_state = rnn.rnn(self.cell, inputs_list, \n                                                 initial_state=self.initial_state)\n\n        outputs = tf.pack(outputs_list)\n        outputs_concat = tf.reshape(outputs, [-1, hidden_size])\n        logits_concat = tf.matmul(outputs_concat, output_W) + output_b\n        logits = tf.reshape(logits_concat, [self.time_batch_len, -1, input_dim])\n\n        # probabilities of each note\n        self.probs = 
self.calculate_probs(logits)\n        self.loss = self.init_loss(logits, logits_concat)\n        self.train_step = tf.train.RMSPropOptimizer(self.lr, decay = self.lr_decay) \\\n                            .minimize(self.loss)\n\n    def init_loss(self, outputs, _):\n        self.seq_targets = \\\n            tf.placeholder(tf.float32, [self.time_batch_len, None, self.input_dim])\n\n        # normalize by the batch size (second dimension of the input)\n        batch_size = tf.shape(self.seq_input_dropout)[1]\n        cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(outputs, self.seq_targets)\n        return tf.reduce_sum(cross_ent) / self.time_batch_len / tf.to_float(batch_size)\n\n    def calculate_probs(self, logits):\n        return tf.sigmoid(logits)\n\n    def get_cell_zero_state(self, session, batch_size):\n        return self.cell.zero_state(batch_size, tf.float32).eval(session=session)\n\nclass NottinghamModel(Model):\n    \"\"\" \n    Dual softmax formulation \n\n    A single time step should be a concatenation of two one-hot-encoding binary vectors.\n    Loss function is a sum of two softmax loss functions over [:r] and [r:] respectively,\n    where r is the number of melody classes\n    \"\"\"\n\n    def init_loss(self, outputs, outputs_concat):\n        self.seq_targets = \\\n            tf.placeholder(tf.int64, [self.time_batch_len, None, 2])\n        batch_size = tf.shape(self.seq_targets)[1]\n\n        with tf.variable_scope(\"rnnlstm\"):\n            self.melody_coeff = tf.constant(self.config.melody_coeff)\n\n        r = nottingham_util.NOTTINGHAM_MELODY_RANGE\n        targets_concat = tf.reshape(self.seq_targets, [-1, 2])\n\n        melody_loss = tf.nn.sparse_softmax_cross_entropy_with_logits( \\\n            outputs_concat[:, :r], \\\n            targets_concat[:, 0])\n        harmony_loss = tf.nn.sparse_softmax_cross_entropy_with_logits( \\\n            outputs_concat[:, r:], \\\n            targets_concat[:, 1])\n        losses = tf.add(self.melody_coeff * melody_loss, (1 - self.melody_coeff) * harmony_loss)\n        return tf.reduce_sum(losses) / self.time_batch_len / tf.to_float(batch_size)\n\n    def calculate_probs(self, logits):\n        steps = []\n        for t in range(self.time_batch_len):\n            melody_softmax = tf.nn.softmax(logits[t, :, :nottingham_util.NOTTINGHAM_MELODY_RANGE])\n            harmony_softmax = tf.nn.softmax(logits[t, :, nottingham_util.NOTTINGHAM_MELODY_RANGE:])\n            steps.append(tf.concat(1, [melody_softmax, harmony_softmax]))\n        return tf.pack(steps)\n\n    def assign_melody_coeff(self, session, melody_coeff):\n        if melody_coeff < 0.0 or melody_coeff > 1.0:\n            raise Exception(\"Invalid melody coefficient\")\n\n        session.run(tf.assign(self.melody_coeff, melody_coeff))\n\nclass NottinghamSeparate(Model):\n    \"\"\" \n    Single softmax formulation \n    \n    Regular single classification formulation, used to train baseline models\n    where the melody and harmony are trained separately\n    \"\"\"\n\n    def init_loss(self, outputs, outputs_concat):\n        self.seq_targets = \\\n            tf.placeholder(tf.int64, [self.time_batch_len, None])\n        batch_size = tf.shape(self.seq_targets)[1]\n\n        with tf.variable_scope(\"rnnlstm\"):\n            self.melody_coeff = tf.constant(self.config.melody_coeff)\n\n        targets_concat = tf.reshape(self.seq_targets, [-1])\n        losses = tf.nn.sparse_softmax_cross_entropy_with_logits( \\\n            outputs_concat, targets_concat)\n\n        return tf.reduce_sum(losses) / self.time_batch_len / 
tf.to_float(batch_size)\n\n    def calculate_probs(self, logits):\n        steps = []\n        for t in range(self.time_batch_len):\n            # single softmax over the full class dimension at each time step\n            softmax = tf.nn.softmax(logits[t, :, :])\n            steps.append(softmax)\n        return tf.pack(steps)\n
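\nif __name__ == '__main__':\n    # Construction sketch (mirrors how rnn.py builds the model; the config\n    # values below are illustrative, not saved hyperparameters). seq_input is\n    # [time_batch_len x batch_size x input_dim]; seq_targets for the dual\n    # softmax is [time_batch_len x batch_size x 2], holding a melody class\n    # index and a harmony class index per time step.\n    class SketchConfig(object):\n        time_batch_len = 128\n        input_dim = 66\n        hidden_size = 200\n        num_layers = 2\n        dropout_prob = 0.5\n        input_dropout_prob = 0.8\n        cell_type = 'lstm'\n        learning_rate = 5e-3\n        learning_rate_decay = 0.9\n        melody_coeff = 0.5\n\n    with tf.Graph().as_default():\n        model = NottinghamModel(SketchConfig(), training=False)\n        print model.seq_input\n        print model.seq_targets\n"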
  },
  {
    "path": "nottingham_util.py",
    "content": "import numpy as np\nimport os\nimport midi\nimport cPickle\nfrom pprint import pprint\n\nimport midi_util\nimport mingus\nimport mingus.core.chords\nimport sampling\n\nPICKLE_LOC = 'data/nottingham.pickle'\nNOTTINGHAM_MELODY_MAX = 88\nNOTTINGHAM_MELODY_MIN = 55\n# add one to the range for silence in melody\nNOTTINGHAM_MELODY_RANGE = NOTTINGHAM_MELODY_MAX - NOTTINGHAM_MELODY_MIN + 1 + 1\nCHORD_BASE = 48\nCHORD_BLACKLIST = ['major third', 'minor third', 'perfect fifth']\nNO_CHORD = 'NONE'\nSHARPS_TO_FLATS = {\n    \"A#\": \"Bb\",\n    \"B#\": \"C\",\n    \"C#\": \"Db\",\n    \"D#\": \"Eb\",\n    \"E#\": \"F\",\n    \"F#\": \"Gb\",\n    \"G#\": \"Ab\",\n}\n\ndef resolve_chord(chord):\n    \"\"\"\n    Resolves rare chords to their closest common chord, to limit the total\n    amount of chord classes.\n    \"\"\"\n    if chord in CHORD_BLACKLIST:\n        return None\n    # take the first of dual chords\n    if \"|\" in chord:\n        chord = chord.split(\"|\")[0]\n    # remove 7ths, 11ths, 9s, 6th,\n    if chord.endswith(\"11\"):\n        chord = chord[:-2]\n    if chord.endswith(\"7\") or chord.endswith(\"9\") or chord.endswith(\"6\"):\n        chord = chord[:-1]\n    # replace 'dim' with minor\n    if chord.endswith(\"dim\"):\n        chord = chord[:-3] + \"m\"\n    return chord\n\ndef prepare_nottingham_pickle(time_step, chord_cutoff=64, filename=PICKLE_LOC, verbose=False):\n    \"\"\"\n    time_step: the time step to discretize all notes into\n    chord_cutoff: if chords are seen less than this cutoff, they are ignored and marked as\n                  as rests in the resulting dataset\n    filename: the location where the pickle will be saved to\n    \"\"\"\n\n    data = {}\n    store = {}\n    chords = {}\n    max_seq = 0\n    seq_lens = []\n    \n    for d in [\"train\", \"test\", \"valid\"]:\n        print \"Parsing {}...\".format(d)\n        parsed = parse_nottingham_directory(\"data/Nottingham/{}\".format(d), time_step, verbose=False)\n        metadata = [s[0] for s in parsed]\n        seqs = [s[1] for s in parsed]\n        data[d] = seqs\n        data[d + '_metadata'] = metadata\n        lens = [len(s[1]) for s in seqs]\n        seq_lens += lens\n        max_seq = max(max_seq, max(lens))\n        \n        for _, harmony in seqs:\n            for h in harmony:\n                if h not in chords:\n                    chords[h] = 1\n                else:\n                    chords[h] += 1\n\n    avg_seq = float(sum(seq_lens)) / len(seq_lens)\n\n    chords = { c: i for c, i in chords.iteritems() if chords[c] >= chord_cutoff }\n    chord_mapping = { c: i for i, c in enumerate(chords.keys()) }\n    num_chords = len(chord_mapping)\n    store['chord_to_idx'] = chord_mapping\n    if verbose:\n        pprint(chords)\n        print \"Number of chords: {}\".format(num_chords)\n        print \"Max Sequence length: {}\".format(max_seq)\n        print \"Avg Sequence length: {}\".format(avg_seq)\n        print \"Num Sequences: {}\".format(len(seq_lens))\n\n    def combine(melody, harmony):\n        full = np.zeros((melody.shape[0], NOTTINGHAM_MELODY_RANGE + num_chords))\n\n        assert melody.shape[0] == len(harmony)\n\n        # for all melody sequences that don't have any notes, add the empty melody marker (last one)\n        for i in range(melody.shape[0]):\n            if np.count_nonzero(melody[i, :]) == 0:\n                melody[i, NOTTINGHAM_MELODY_RANGE-1] = 1\n\n        # all melody encodings should now have exactly one 1\n        for i in range(melody.shape[0]):\n    
        assert np.count_nonzero(melody[i, :]) == 1\n\n        # add all the melodies\n        full[:, :melody.shape[1]] += melody\n\n        harmony_idxs = [ chord_mapping[h] if h in chord_mapping else chord_mapping[NO_CHORD] \\\n                         for h in harmony ]\n        harmony_idxs = [ NOTTINGHAM_MELODY_RANGE + h for h in harmony_idxs ]\n        full[np.arange(len(harmony)), harmony_idxs] = 1\n\n        # all full encodings should have exactly two 1's\n        for i in range(full.shape[0]):\n            assert np.count_nonzero(full[i, :]) == 2\n\n        return full\n\n    for d in [\"train\", \"test\", \"valid\"]:\n        print \"Combining {}\".format(d)\n        store[d] = [ combine(m, h) for m, h in data[d] ]\n        store[d + '_metadata'] = data[d + '_metadata']\n\n    with open(filename, 'w') as f:\n        cPickle.dump(store, f, protocol=-1)\n\n    return True\n\ndef parse_nottingham_directory(input_dir, time_step, verbose=False):\n    \"\"\"\n    input_dir: a directory containing MIDI files\n\n    returns a list of (metadata, (melody_sequence, harmonies)) tuples, one for\n    each MIDI file that could be parsed\n    \"\"\"\n\n    files = [ os.path.join(input_dir, f) for f in os.listdir(input_dir)\n              if os.path.isfile(os.path.join(input_dir, f)) ] \n    sequences = [ \\\n        parse_nottingham_to_sequence(f, time_step=time_step, verbose=verbose) \\\n        for f in files ]\n\n    if verbose:\n        print \"Total sequences: {}\".format(len(sequences))\n    \n    # filter out the files that could not be parsed (wrong track count,\n    # polyphonic melody)\n    sequences = filter(lambda x: x[1] is not None, sequences)\n\n    if verbose:\n        print \"Total sequences left: {}\".format(len(sequences))\n\n    return sequences\n\ndef parse_nottingham_to_sequence(input_filename, time_step, verbose=False):\n    \"\"\"\n    input_filename: a MIDI filename\n\n    returns a (metadata, (melody_sequence, harmonies)) tuple, where the second\n    element is None if the file doesn't fit the expected format\n    \"\"\"\n    sequence = []\n    pattern = midi.read_midifile(input_filename)\n\n    metadata = {\n        \"path\": input_filename,\n        \"name\": input_filename.split(\"/\")[-1].split(\".\")[0]\n    }\n\n    # Most nottingham midi's have 3 tracks: 
metadata info, melody, harmony\n    # throw away any that don't fit this\n    if len(pattern) != 3:\n        if verbose:\n            print \"Skipping file with {} tracks\".format(len(pattern))\n        return (metadata, None)\n\n    ticks_per_quarter = -1\n    for msg in pattern[0]:\n        if isinstance(msg, midi.TimeSignatureEvent):\n            metadata[\"ticks_per_quarter\"] = msg.get_metronome()\n            ticks_per_quarter = msg.get_metronome()\n\n    if verbose:\n        print \"{}\".format(input_filename)\n        print \"Track resolution: {}\".format(pattern.resolution)\n        print \"Number of tracks: {}\".format(len(pattern))\n        print \"Time step: {}\".format(time_step)\n        print \"Ticks per quarter: {}\".format(ticks_per_quarter)\n\n    # Track ingestion stage\n    track_ticks = 0\n\n    melody_notes, melody_ticks = midi_util.ingest_notes(pattern[1])\n    harmony_notes, harmony_ticks = midi_util.ingest_notes(pattern[2])\n\n    track_ticks = midi_util.round_tick(max(melody_ticks, harmony_ticks), time_step)\n    if verbose:\n        print \"Track ticks (rounded): {} ({} time steps)\".format(track_ticks, track_ticks/time_step)\n    \n    melody_sequence = midi_util.round_notes(melody_notes, track_ticks, time_step, \n                                  R=NOTTINGHAM_MELODY_RANGE, O=NOTTINGHAM_MELODY_MIN)\n\n    for i in range(melody_sequence.shape[0]):\n        if np.count_nonzero(melody_sequence[i, :]) > 1:\n            if verbose:\n                print \"Double note found: {}: {} ({})\".format(i, np.nonzero(melody_sequence[i, :]), input_filename)\n            return (metadata, None)\n\n    harmony_sequence = midi_util.round_notes(harmony_notes, track_ticks, time_step)\n\n    harmonies = []\n    for i in range(harmony_sequence.shape[0]):\n        notes = np.where(harmony_sequence[i] == 1)[0]\n        if len(notes) > 0:\n            notes_shift = [ mingus.core.notes.int_to_note(h%12) for h in notes]\n            chord = mingus.core.chords.determine(notes_shift, shorthand=True)\n            if len(chord) == 0:\n                # try flat combinations\n                notes_shift = [ SHARPS_TO_FLATS[n] if n in SHARPS_TO_FLATS else n for n in notes_shift]\n                chord = mingus.core.chords.determine(notes_shift, shorthand=True)\n            if len(chord) == 0:\n                if verbose:\n                    print \"Could not determine chord: {} ({}, {}), defaulting to last step's chord\" \\\n                          .format(notes_shift, input_filename, i)\n                if len(harmonies) > 0:\n                    harmonies.append(harmonies[-1])\n                else:\n                    harmonies.append(NO_CHORD)\n            else:\n                resolved = resolve_chord(chord[0])\n                if resolved:\n                    harmonies.append(resolved)\n                else:\n                    harmonies.append(NO_CHORD)\n        else:\n            harmonies.append(NO_CHORD)\n\n    return (metadata, (melody_sequence, harmonies))\n\nclass NottinghamMidiWriter(midi_util.MidiWriter):\n\n    def __init__(self, chord_to_idx, verbose=False):\n        super(NottinghamMidiWriter, self).__init__(verbose)\n        self.idx_to_chord = { i: c for c, i in chord_to_idx.items() }\n        self.note_range = NOTTINGHAM_MELODY_RANGE + len(self.idx_to_chord)\n\n    def dereference_chord(self, idx):\n        if idx not in self.idx_to_chord:\n            raise Exception(\"No chord index found: {}\".format(idx))\n        shorthand = self.idx_to_chord[idx]\n        
if shorthand == NO_CHORD:\n            return []\n        chord = mingus.core.chords.from_shorthand(shorthand)\n        return [ CHORD_BASE + mingus.core.notes.note_to_int(n) for n in chord ]\n\n    def note_on(self, val, tick):\n        if val >= NOTTINGHAM_MELODY_RANGE:\n            notes = self.dereference_chord(val - NOTTINGHAM_MELODY_RANGE)\n        else:\n            # if note is the top of the range, then it stands for gap in melody\n            if val == NOTTINGHAM_MELODY_RANGE - 1:\n                notes = []\n            else:\n                notes = [NOTTINGHAM_MELODY_MIN + val]\n\n        # print 'turning on {}'.format(notes)\n        for note in notes:\n            self.track.append(midi.NoteOnEvent(tick=tick, pitch=note, velocity=70))\n            tick = 0 # notes that come right after each other should have zero tick\n\n        return tick\n\n    def note_off(self, val, tick):\n        if val >= NOTTINGHAM_MELODY_RANGE:\n            notes = self.dereference_chord(val - NOTTINGHAM_MELODY_RANGE)\n        else:\n            notes = [NOTTINGHAM_MELODY_MIN + val]\n\n        # print 'turning off {}'.format(notes)\n        for note in notes:\n            self.track.append(midi.NoteOffEvent(tick=tick, pitch=note))\n            tick = 0\n\n        return tick\n\nclass NottinghamSampler(object):\n\n    def __init__(self, chord_to_idx, method = 'sample', harmony_repeat_max = 16, melody_repeat_max = 16, verbose=False):\n        self.verbose = verbose \n        self.idx_to_chord = { i: c for c, i in chord_to_idx.items() }\n        self.method = method\n\n        self.hlast = 0\n        self.hcount = 0\n        self.hrepeat = harmony_repeat_max\n\n        self.mlast = 0\n        self.mcount = 0\n        self.mrepeat = melody_repeat_max \n\n    def visualize_probs(self, probs):\n        if not self.verbose:\n            return\n\n        melodies = sorted(list(enumerate(probs[:NOTTINGHAM_MELODY_RANGE])), \n                     key=lambda x: x[1], reverse=True)[:4]\n        harmonies = sorted(list(enumerate(probs[NOTTINGHAM_MELODY_RANGE:])), \n                     key=lambda x: x[1], reverse=True)[:4]\n        harmonies = [(self.idx_to_chord[i], j) for i, j in harmonies]\n        print 'Top Melody Notes: '\n        pprint(melodies)\n        print 'Top Harmony Notes: '\n        pprint(harmonies)\n\n    def sample_notes_static(self, probs):\n        top_m = probs[:NOTTINGHAM_MELODY_RANGE].argsort()\n        if top_m[-1] == self.mlast and self.mcount >= self.mrepeat:\n            top_m = top_m[:-1]\n            self.mcount = 0\n        elif top_m[-1] == self.mlast:\n            self.mcount += 1\n        else:\n            self.mcount = 0\n        self.mlast = top_m[-1]\n        top_melody = top_m[-1]\n\n        top_h = probs[NOTTINGHAM_MELODY_RANGE:].argsort()\n        if top_h[-1] == self.hlast and self.hcount >= self.hrepeat:\n            top_h = top_h[:-1]\n            self.hcount = 0\n        elif top_h[-1] == self.hlast:\n            self.hcount += 1\n        else:\n            self.hcount = 0\n        self.hlast = top_h[-1]\n        top_chord = top_h[-1] + NOTTINGHAM_MELODY_RANGE\n\n        chord = np.zeros([len(probs)], dtype=np.int32)\n        chord[top_melody] = 1.0\n        chord[top_chord] = 1.0\n        return chord\n\n    def sample_notes_dist(self, probs):\n        idxed = [(i, p) for i, p in enumerate(probs)]\n\n        notes = [n[0] for n in idxed]\n        ps = np.array([n[1] for n in idxed])\n        r = NOTTINGHAM_MELODY_RANGE\n\n        assert np.allclose(np.sum(ps[:r]), 
1.0)\n        assert np.allclose(np.sum(ps[r:]), 1.0)\n\n        # renormalize so numpy doesn't complain\n        ps[:r] = ps[:r] / ps[:r].sum()\n        ps[r:] = ps[r:] / ps[r:].sum()\n\n        melody = np.random.choice(notes[:r], p=ps[:r])\n        harmony = np.random.choice(notes[r:], p=ps[r:])\n\n        chord = np.zeros([len(probs)], dtype=np.int32)\n        chord[melody] = 1.0\n        chord[harmony] = 1.0\n        return chord\n\n\n    def sample_notes(self, probs):\n        self.visualize_probs(probs)\n        if self.method == 'static':\n            return self.sample_notes_static(probs)\n        elif self.method == 'sample':\n            return self.sample_notes_dist(probs)\n\ndef accuracy(batch_probs, data, num_samples=1):\n    \"\"\"\n    Batch Probs: { num_time_steps: [ time_step_1, time_step_2, ... ] }\n    Data: [ \n        [ [ data ], [ target ] ], # batch with one time step\n        [ [ data1, data2 ], [ target1, target2 ] ], # batch with two time steps\n        ...\n    ]\n    \"\"\"\n\n    def calc_accuracy():\n        melody_correct, harmony_correct = 0, 0\n        melody_incorrect, harmony_incorrect = 0, 0\n        for _, batch_targets in data:\n            num_time_steps = len(batch_targets)\n            for ts_targets, ts_probs in zip(batch_targets, batch_probs[num_time_steps]):\n\n                assert ts_targets.shape[:2] == ts_probs.shape[:2]\n\n                for seq_idx in range(ts_targets.shape[1]):\n                    for step_idx in range(ts_targets.shape[0]):\n                        idxed = [(n, p) for n, p in \\\n                                 enumerate(ts_probs[step_idx, seq_idx, :])]\n                        notes = [n[0] for n in idxed]\n                        ps = np.array([n[1] for n in idxed])\n                        r = NOTTINGHAM_MELODY_RANGE\n\n                        assert np.allclose(np.sum(ps[:r]), 1.0)\n                        assert np.allclose(np.sum(ps[r:]), 1.0)\n\n                        # renormalize so numpy doesn't complain\n                        ps[:r] = ps[:r] / ps[:r].sum()\n                        ps[r:] = ps[r:] / ps[r:].sum()\n\n                        melody = np.random.choice(notes[:r], p=ps[:r])\n                        harmony = np.random.choice(notes[r:], p=ps[r:])\n\n                        melody_target = ts_targets[step_idx, seq_idx, 0]\n                        if melody_target == melody:\n                            melody_correct += 1\n                        else:\n                            melody_incorrect += 1\n\n                        harmony_target = ts_targets[step_idx, seq_idx, 1] + r\n                        if harmony_target == harmony:\n                            harmony_correct += 1\n                        else:\n                            harmony_incorrect += 1\n\n        return (melody_correct, melody_incorrect, harmony_correct, harmony_incorrect)\n\n    maccs, haccs, taccs = [], [], []\n    for i in range(num_samples):\n        print \"Sample {}\".format(i)\n        m, mi, h, hi = calc_accuracy()\n        maccs.append( float(m) / float(m + mi))\n        haccs.append( float(h) / float(h + hi))\n        taccs.append( float(m + h) / float(m + h + mi + hi) )\n\n    print \"Melody Accuracy: {}\".format(sum(maccs)/len(maccs))\n    print \"Harmony Accuracy: {}\".format(sum(haccs)/len(haccs))\n    print \"Total Accuracy: {}\".format(sum(taccs)/len(taccs))\n\ndef seperate_accuracy(batch_probs, data, num_samples=1):\n\n    def calc_accuracy():\n        
total_correct, total_incorrect = 0, 0\n        for _, batch_targets in data:\n            num_time_steps = len(batch_targets)\n            for ts_targets, ts_probs in zip(batch_targets, batch_probs[num_time_steps]):\n\n                assert ts_targets.shape == ts_probs.shape[:2]\n\n                for seq_idx in range(ts_targets.shape[1]):\n                    for step_idx in range(ts_targets.shape[0]):\n\n                        idxed = [(n, p) for n, p in \\\n                                 enumerate(ts_probs[step_idx, seq_idx, :])]\n                        notes = [n[0] for n in idxed]\n                        ps = np.array([n[1] for n in idxed])\n                        r = NOTTINGHAM_MELODY_RANGE\n\n                        assert np.allclose(np.sum(ps), 1.0)\n                        ps = ps / ps.sum()\n                        note = np.random.choice(notes, p=ps)\n\n                        target = ts_targets[step_idx, seq_idx]\n                        if target == note:\n                            total_correct += 1\n                        else:\n                            total_incorrect += 1\n\n        return (total_correct, total_incorrect)\n\n    taccs = []\n    for i in range(num_samples):\n        print \"Sample {}\".format(i)\n        c, ic = calc_accuracy()\n        taccs.append( float(c) / float(c + ic))\n\n    print \"Accuracy: {}\".format(sum(taccs)/len(taccs))\n\ndef i_vi_iv_v(chord_to_idx, repeats, input_dim):\n    r = NOTTINGHAM_MELODY_RANGE\n\n    i = np.zeros(input_dim)\n    i[r + chord_to_idx['CM']] = 1\n\n    vi = np.zeros(input_dim)\n    vi[r + chord_to_idx['Am']] = 1\n\n    iv = np.zeros(input_dim)\n    iv[r + chord_to_idx['FM']] = 1\n\n    v = np.zeros(input_dim)\n    v[r + chord_to_idx['GM']] = 1\n\n    full_seq = [i] * 16 + [vi] * 16 + [iv] * 16 + [v] * 16\n    full_seq = full_seq * repeats\n    \n    return full_seq\n\nif __name__ == '__main__':\n\n    resolution = 480\n    time_step = 120\n\n    assert resolve_chord(\"GM7\") == \"GM\"\n    assert resolve_chord(\"G#dim|AM7\") == \"G#m\"\n    assert resolve_chord(\"Dm9\") == \"Dm\"\n    assert resolve_chord(\"AM11\") == \"AM\"\n\n    prepare_nottingham_pickle(time_step, verbose=True)\n
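\n    # A small sampling sketch (hypothetical 4-chord mapping, uniform\n    # distributions) showing how NottinghamSampler draws one time step:\n    mapping = {'CM': 0, 'Am': 1, 'FM': 2, 'GM': 3}\n    sampler = NottinghamSampler(mapping, method='sample')\n    probs = np.concatenate([\n        np.ones(NOTTINGHAM_MELODY_RANGE) / NOTTINGHAM_MELODY_RANGE,\n        np.ones(len(mapping)) / float(len(mapping))])\n    print sampler.sample_notes(probs)\n"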
  },
  {
    "path": "requirements.txt",
    "content": "matplotlib\nmingus\nnumpy\ngit+https://github.com/vishnubob/python-midi#egg=midi\n# Linux, Python 2.7, GPU\nhttps://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl\n"
  },
  {
    "path": "rnn.py",
    "content": "import os, sys\nimport argparse\nimport time\nimport itertools\nimport cPickle\nimport logging\nimport random\nimport string\n\nimport numpy as np\nimport tensorflow as tf    \nimport matplotlib.pyplot as plt\n\nimport nottingham_util\nimport util\nfrom model import Model, NottinghamModel\n\ndef get_config_name(config):\n    def replace_dot(s): return s.replace(\".\", \"p\")\n    return \"nl_\" + str(config.num_layers) + \"_hs_\" + str(config.hidden_size) + \\\n            replace_dot(\"_mc_{}\".format(config.melody_coeff)) + \\\n            replace_dot(\"_dp_{}\".format(config.dropout_prob)) + \\\n            replace_dot(\"_idp_{}\".format(config.input_dropout_prob)) + \\\n            replace_dot(\"_tb_{}\".format(config.time_batch_len)) \n\nclass DefaultConfig(object):\n    # model parameters\n    num_layers = 2\n    hidden_size = 200\n    melody_coeff = 0.5\n    dropout_prob = 0.5\n    input_dropout_prob = 0.8\n    cell_type = 'lstm'\n\n    # learning parameters\n    max_time_batches = 9 \n    time_batch_len = 128\n    learning_rate = 5e-3\n    learning_rate_decay = 0.9\n    num_epochs = 250\n\n    # metadata\n    dataset = 'softmax'\n    model_file = ''\n\n    def __repr__(self):\n        return \"\"\"Num Layers: {}, Hidden Size: {}, Melody Coeff: {}, Dropout Prob: {}, Input Dropout Prob: {}, Cell Type: {}, Time Batch Len: {}, Learning Rate: {}, Decay: {}\"\"\".format(self.num_layers, self.hidden_size, self.melody_coeff, self.dropout_prob, self.input_dropout_prob, self.cell_type, self.time_batch_len, self.learning_rate, self.learning_rate_decay)\n    \nif __name__ == '__main__':\n    np.random.seed()      \n\n    parser = argparse.ArgumentParser(description='Script to train and save a model.')\n    parser.add_argument('--dataset', type=str, default='softmax',\n                        # choices = ['bach', 'nottingham', 'softmax'],\n                        choices = ['softmax'])\n    parser.add_argument('--model_dir', type=str, default='models')\n    parser.add_argument('--run_name', type=str, default=time.strftime(\"%m%d_%H%M\"))\n\n    args = parser.parse_args()\n\n    if args.dataset == 'softmax':\n        resolution = 480\n        time_step = 120\n        model_class = NottinghamModel\n        with open(nottingham_util.PICKLE_LOC, 'r') as f:\n            pickle = cPickle.load(f)\n            chord_to_idx = pickle['chord_to_idx']\n\n        input_dim = pickle[\"train\"][0].shape[1]\n        print 'Finished loading data, input dim: {}'.format(input_dim)\n    else:\n        raise Exception(\"Other datasets not yet implemented\")\n\n    initializer = tf.random_uniform_initializer(-0.1, 0.1)\n\n    best_config = None\n    best_valid_loss = None\n\n    # set up run dir\n    run_folder = os.path.join(args.model_dir, args.run_name)\n    if os.path.exists(run_folder):\n        raise Exception(\"Run name {} already exists, choose a different one\", format(run_folder))\n    os.makedirs(run_folder)\n\n    logger = logging.getLogger(__name__) \n    logger.setLevel(logging.INFO)\n    logger.addHandler(logging.StreamHandler())\n    logger.addHandler(logging.FileHandler(os.path.join(run_folder, \"training.log\")))\n\n    grid = {\n        \"dropout_prob\": [0.5],\n        \"input_dropout_prob\": [0.8],\n        \"melody_coeff\": [0.5],\n        \"num_layers\": [2],\n        \"hidden_size\": [200],\n        \"num_epochs\": [250],\n        \"learning_rate\": [5e-3],\n        \"learning_rate_decay\": [0.9],\n        \"time_batch_len\": [128],\n    }\n\n    # Generate product of 
hyperparams, pairing every grid key with one chosen value\n    runs = list(list(itertools.izip(grid, x)) for x in itertools.product(*grid.itervalues()))\n    logger.info(\"{} runs detected\".format(len(runs)))\n\n    for combination in runs:\n\n        config = DefaultConfig()\n        config.dataset = args.dataset\n        config.model_name = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(12)) + '.model'\n        for attr, value in combination:\n            setattr(config, attr, value)\n\n        if config.dataset == 'softmax':\n            data = util.load_data('', time_step, config.time_batch_len, config.max_time_batches, nottingham=pickle)\n            config.input_dim = data[\"input_dim\"]\n        else:\n            raise Exception(\"Other datasets not yet implemented\")\n\n        logger.info(config)\n        config_file_path = os.path.join(run_folder, get_config_name(config) + '.config')\n        with open(config_file_path, 'w') as f: \n            cPickle.dump(config, f)\n\n        with tf.Graph().as_default(), tf.Session() as session:\n            with tf.variable_scope(\"model\", reuse=None):\n                train_model = model_class(config, training=True)\n            with tf.variable_scope(\"model\", reuse=True):\n                valid_model = model_class(config, training=False)\n\n            saver = tf.train.Saver(tf.all_variables(), max_to_keep=40)\n            tf.initialize_all_variables().run()\n\n            # training\n            early_stop_best_loss = None\n            start_saving = False\n            saved_flag = False\n            train_losses, valid_losses = [], []\n            start_time = time.time()\n            for i in range(config.num_epochs):\n                loss = util.run_epoch(session, train_model, \n                    data[\"train\"][\"data\"], training=True, testing=False)\n                train_losses.append((i, loss))\n                if i == 0:\n                    continue\n\n                logger.info('Epoch: {}, Train Loss: {}, Time Per Epoch: {}'.format(\\\n                        i, loss, (time.time() - start_time)/i))\n                valid_loss = util.run_epoch(session, valid_model, data[\"valid\"][\"data\"], training=False, testing=False)\n                valid_losses.append((i, valid_loss))\n                logger.info('Valid Loss: {}'.format(valid_loss))\n\n                if early_stop_best_loss is None:\n                    early_stop_best_loss = valid_loss\n                elif valid_loss < early_stop_best_loss:\n                    early_stop_best_loss = valid_loss\n                    if start_saving:\n                        logger.info('Best loss so far encountered, saving model.')\n                        saver.save(session, os.path.join(run_folder, config.model_name))\n                        saved_flag = True\n                elif not start_saving:\n                    start_saving = True \n                    logger.info('Valid loss increased for the first time, will start saving models')\n                    saver.save(session, os.path.join(run_folder, config.model_name))\n                    saved_flag = True\n\n            if not saved_flag:\n                saver.save(session, os.path.join(run_folder, config.model_name))\n\n            # cap the loss axis so charts are comparable across runs\n            axes = plt.gca()\n            if config.dataset == 'softmax':\n                axes.set_ylim([0, 2])\n            else:\n                axes.set_ylim([0, 100])\n            plt.plot([t[0] for t in train_losses], [t[1] for t in train_losses])\n            
plt.plot([t[0] for t in valid_losses], [t[1] for t in valid_losses])\n            plt.legend(['Train Loss', 'Validation Loss'])\n            chart_file_path = os.path.join(run_folder, get_config_name(config) + '.png')\n            plt.savefig(chart_file_path)\n            plt.clf()\n\n            logger.info(\"Config {}, Loss: {}\".format(config, early_stop_best_loss))\n            if best_valid_loss == None or early_stop_best_loss < best_valid_loss:\n                logger.info(\"Found best new model!\")\n                best_valid_loss = early_stop_best_loss\n                best_config = config\n\n    logger.info(\"Best Config: {}, Loss: {}\".format(best_config, best_valid_loss))\n"
  },
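  {
    "path": "examples/grid_expansion_sketch.py",
    "content": "\"\"\"\nIllustrative sketch, not part of the original project: shows how the\nhyperparameter grid in rnn.py expands into a list of runs. Every value in the\ngrid dict is a list of candidate settings; itertools.product enumerates each\ncombination, and izip pairs the combination back up with its attribute names\n(iterating a dict and calling itervalues() visit keys in the same order).\nRunnable standalone under Python 2.7.\n\"\"\"\n\nimport itertools\n\ngrid = {\n    \"dropout_prob\": [0.5, 0.65],\n    \"hidden_size\": [100, 200],\n}\n\n# one run per element of the cartesian product of all value lists\nruns = list(list(itertools.izip(grid, x))\n            for x in itertools.product(*grid.itervalues()))\n\nprint '{} runs detected'.format(len(runs))  # 2 * 2 = 4\nfor combination in runs:\n    # each combination is a list of (attr, value) pairs, ready for setattr\n    print combination\n"
  },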
  {
    "path": "rnn_sample.py",
    "content": "import os, sys\nimport argparse\nimport time\nimport itertools\nimport cPickle\n\nimport numpy as np\nimport tensorflow as tf    \n\nimport util\nimport nottingham_util\nfrom model import Model, NottinghamModel\nfrom rnn import DefaultConfig\n\nif __name__ == '__main__':\n    np.random.seed()      \n\n    parser = argparse.ArgumentParser(description='Script to generated a MIDI file sample from a trained model.')\n    parser.add_argument('--config_file', type=str, required=True)\n    parser.add_argument('--sample_melody', action='store_true', default=False)\n    parser.add_argument('--sample_harmony', action='store_true', default=False)\n    parser.add_argument('--sample_seq', type=str, default='random',\n        choices = ['random', 'chords'])\n    parser.add_argument('--conditioning', type=int, default=-1)\n    parser.add_argument('--sample_length', type=int, default=512)\n\n    args = parser.parse_args()\n\n    with open(args.config_file, 'r') as f: \n        config = cPickle.load(f)\n\n    if config.dataset == 'softmax':\n        config.time_batch_len = 1\n        config.max_time_batches = -1\n        model_class = NottinghamModel\n        with open(nottingham_util.PICKLE_LOC, 'r') as f:\n            pickle = cPickle.load(f)\n        chord_to_idx = pickle['chord_to_idx']\n\n        time_step = 120\n        resolution = 480\n\n        # use time batch len of 1 so that every target is covered\n        test_data = util.batch_data(pickle['test'], time_batch_len = 1, \n            max_time_batches = -1, softmax = True)\n    else:\n        raise Exception(\"Other datasets not yet implemented\")\n\n    print config\n\n    with tf.Graph().as_default(), tf.Session() as session:\n        with tf.variable_scope(\"model\", reuse=None):\n            sampling_model = model_class(config)\n\n        saver = tf.train.Saver(tf.all_variables())\n        model_path = os.path.join(os.path.dirname(args.config_file), \n            config.model_name)\n        saver.restore(session, model_path)\n\n        state = sampling_model.get_cell_zero_state(session, 1)\n        if args.sample_seq == 'chords':\n            # 16 - one measure, 64 - chord progression\n            repeats = args.sample_length / 64\n            sample_seq = nottingham_util.i_vi_iv_v(chord_to_idx, repeats, config.input_dim)\n            print 'Sampling melody using a I, VI, IV, V progression'\n\n        elif args.sample_seq == 'random':\n            sample_index = np.random.choice(np.arange(len(pickle['test'])))\n            sample_seq = [ pickle['test'][sample_index][i, :] \n                for i in range(pickle['test'][sample_index].shape[0]) ]\n\n        chord = sample_seq[0]\n        seq = [chord]\n\n        if args.conditioning > 0:\n            for i in range(1, args.conditioning):\n                seq_input = np.reshape(chord, [1, 1, config.input_dim])\n                feed = {\n                    sampling_model.seq_input: seq_input,\n                    sampling_model.initial_state: state,\n                }\n                state = session.run(sampling_model.final_state, feed_dict=feed)\n                chord = sample_seq[i]\n                seq.append(chord)\n\n        if config.dataset == 'softmax':\n            writer = nottingham_util.NottinghamMidiWriter(chord_to_idx, verbose=False)\n            sampler = nottingham_util.NottinghamSampler(chord_to_idx, verbose=False)\n        else:\n            # writer = midi_util.MidiWriter()\n            # sampler = sampling.Sampler(verbose=False)\n            raise 
Exception(\"Other datasets not yet implemented\")\n\n        for i in range(max(args.sample_length - len(seq), 0)):\n            seq_input = np.reshape(chord, [1, 1, config.input_dim])\n            feed = {\n                sampling_model.seq_input: seq_input,\n                sampling_model.initial_state: state,\n            }\n            [probs, state] = session.run(\n                [sampling_model.probs, sampling_model.final_state],\n                feed_dict=feed)\n            probs = np.reshape(probs, [config.input_dim])\n            chord = sampler.sample_notes(probs)\n\n            if config.dataset == 'softmax':\n                r = nottingham_util.NOTTINGHAM_MELODY_RANGE\n                if args.sample_melody:\n                    chord[r:] = 0\n                    chord[r:] = sample_seq[i][r:]\n                elif args.sample_harmony:\n                    chord[:r] = 0\n                    chord[:r] = sample_seq[i][:r]\n\n            seq.append(chord)\n\n        writer.dump_sequence_to_midi(seq, \"best.midi\", \n            time_step=time_step, resolution=resolution)\n"
  },
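  {
    "path": "examples/clamping_sketch.py",
    "content": "\"\"\"\nIllustrative sketch, not part of the original project: demonstrates the\nclamping behind rnn_sample.py's --sample_melody / --sample_harmony flags.\nA chord vector holds the one-hot melody in its first NOTTINGHAM_MELODY_RANGE\nentries and the harmony class in the rest; one half is kept from the model's\nsample while the other is copied from a reference sequence. The constants\nbelow are small stand-ins for the real dimensions.\n\"\"\"\n\nimport numpy as np\n\nr = 3  # stand-in for nottingham_util.NOTTINGHAM_MELODY_RANGE\n\nsampled = np.array([0, 1, 0, 1, 0], dtype=np.int32)    # drawn from the model\nreference = np.array([1, 0, 0, 0, 1], dtype=np.int32)  # ground-truth step\n\n# --sample_melody: keep the sampled melody, clamp harmony to the reference\nchord = sampled.copy()\nchord[r:] = reference[r:]\nprint 'melody sampled, harmony clamped:', chord  # [0 1 0 0 1]\n\n# --sample_harmony: keep the sampled harmony, clamp melody to the reference\nchord = sampled.copy()\nchord[:r] = reference[:r]\nprint 'harmony sampled, melody clamped:', chord  # [1 0 0 1 0]\n"
  },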
  {
    "path": "rnn_separate.py",
    "content": "import os, sys\nimport argparse\nimport time\nimport itertools\nimport cPickle\nimport logging\nimport random\nimport string\nimport pprint\n \nimport numpy as np\nimport tensorflow as tf    \nimport matplotlib.pyplot as plt\n\nimport midi_util\nimport nottingham_util\nimport sampling\nimport util\nfrom rnn import get_config_name, DefaultConfig\nfrom model import Model, NottinghamSeparate\n\nif __name__ == '__main__':\n    np.random.seed()      \n\n    parser = argparse.ArgumentParser(description='Music RNN')\n    parser.add_argument('--choice', type=str, default='melody',\n                        choices = ['melody', 'harmony'])\n    parser.add_argument('--dataset', type=str, default='softmax',\n                        choices = ['bach', 'nottingham', 'softmax'])\n    parser.add_argument('--model_dir', type=str, default='models')\n    parser.add_argument('--run_name', type=str, default=time.strftime(\"%m%d_%H%M\"))\n\n    args = parser.parse_args()\n\n    if args.dataset == 'softmax':\n        resolution = 480\n        time_step = 120\n        model_class = NottinghamSeparate\n        with open(nottingham_util.PICKLE_LOC, 'r') as f:\n            pickle = cPickle.load(f)\n            chord_to_idx = pickle['chord_to_idx']\n\n        input_dim = pickle[\"train\"][0].shape[1]\n        print 'Finished loading data, input dim: {}'.format(input_dim)\n    else:\n        raise Exception(\"Other datasets not yet implemented\")\n\n\n    initializer = tf.random_uniform_initializer(-0.1, 0.1)\n\n    best_config = None\n    best_valid_loss = None\n\n    # set up run dir\n    run_folder = os.path.join(args.model_dir, args.run_name)\n    if os.path.exists(run_folder):\n        raise Exception(\"Run name {} already exists, choose a different one\", format(run_folder))\n    os.makedirs(run_folder)\n\n    logger = logging.getLogger(__name__) \n    logger.setLevel(logging.INFO)\n    logger.addHandler(logging.StreamHandler())\n    logger.addHandler(logging.FileHandler(os.path.join(run_folder, \"training.log\")))\n\n    # grid\n    grid = {\n        \"dropout_prob\": [0.65],\n        \"input_dropout_prob\": [0.9],\n        \"num_layers\": [1],\n        \"hidden_size\": [100]\n    }\n\n    # Generate product of hyperparams\n    runs = list(list(itertools.izip(grid, x)) for x in itertools.product(*grid.itervalues()))\n    logger.info(\"{} runs detected\".format(len(runs)))\n\n    for combination in runs:\n\n        config = DefaultConfig()\n        config.dataset = args.dataset\n        config.model_name = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(12)) + '.model'\n        for attr, value in combination:\n            setattr(config, attr, value)\n\n        if config.dataset == 'softmax':\n            data = util.load_data('', time_step, config.time_batch_len, config.max_time_batches, nottingham=pickle)\n            config.input_dim = data[\"input_dim\"]\n        else:\n            raise Exception(\"Other datasets not yet implemented\")\n\n        # cut away unnecessary parts\n        r = nottingham_util.NOTTINGHAM_MELODY_RANGE\n        if args.choice == 'melody':\n            print \"Using only melody\"\n            for d in ['train', 'test', 'valid']:\n                new_data = []\n                for batch_data, batch_targets in data[d][\"data\"]:\n                    new_data.append(([tb[:, :, :r] for tb in batch_data],\n                                     [tb[:, :, 0] for tb in batch_targets]))\n                data[d][\"data\"] = new_data\n        else:\n  
          print \"Using only harmony\"\n            for d in ['train', 'test', 'valid']:\n                new_data = []\n                for batch_data, batch_targets in data[d][\"data\"]:\n                    new_data.append(([tb[:, :, r:] for tb in batch_data],\n                                     [tb[:, :, 1] for tb in batch_targets]))\n                data[d][\"data\"] = new_data\n\n        input_dim = data[\"input_dim\"] = data[\"train\"][\"data\"][0][0][0].shape[2]\n        config.input_dim = input_dim\n        print \"New input dim: {}\".format(input_dim)\n\n        logger.info(config)\n        config_file_path = os.path.join(run_folder, get_config_name(config) + '.config')\n        with open(config_file_path, 'w') as f: \n            cPickle.dump(config, f)\n\n        with tf.Graph().as_default(), tf.Session() as session:\n            with tf.variable_scope(\"model\", reuse=None):\n                train_model = model_class(config, training=True)\n            with tf.variable_scope(\"model\", reuse=True):\n                valid_model = model_class(config, training=False)\n\n            saver = tf.train.Saver(tf.all_variables())\n            tf.initialize_all_variables().run()\n\n            # training\n            early_stop_best_loss = None\n            start_saving = False\n            saved_flag = False\n            train_losses, valid_losses = [], []\n            start_time = time.time()\n            for i in range(config.num_epochs):\n                loss = util.run_epoch(session, train_model, data[\"train\"][\"data\"], training=True, testing=False)\n                train_losses.append((i, loss))\n                if i == 0:\n                    continue\n\n                valid_loss = util.run_epoch(session, valid_model, data[\"valid\"][\"data\"], training=False, testing=False)\n                valid_losses.append((i, valid_loss))\n\n                logger.info('Epoch: {}, Train Loss: {}, Valid Loss: {}, Time Per Epoch: {}'.format(\\\n                        i, loss, valid_loss, (time.time() - start_time)/i))\n\n                # if it's best validation loss so far, save it\n                if early_stop_best_loss == None:\n                    early_stop_best_loss = valid_loss\n                elif valid_loss < early_stop_best_loss:\n                    early_stop_best_loss = valid_loss\n                    if start_saving:\n                        logger.info('Best loss so far encountered, saving model.')\n                        saver.save(session, os.path.join(run_folder, config.model_name))\n                        saved_flag = True\n                elif not start_saving:\n                    start_saving = True \n                    logger.info('Valid loss increased for the first time, will start saving models')\n                    saver.save(session, os.path.join(run_folder, config.model_name))\n                    saved_flag = True\n\n            if not saved_flag:\n                saver.save(session, os.path.join(run_folder, config.model_name))\n\n            # set loss axis max to 20\n            axes = plt.gca()\n            if config.dataset == 'softmax':\n                axes.set_ylim([0, 2])\n            else:\n                axes.set_ylim([0, 100])\n            plt.plot([t[0] for t in train_losses], [t[1] for t in train_losses])\n            plt.plot([t[0] for t in valid_losses], [t[1] for t in valid_losses])\n            plt.legend(['Train Loss', 'Validation Loss'])\n            chart_file_path = os.path.join(run_folder, get_config_name(config) + '.png')\n     
       plt.savefig(chart_file_path)\n            plt.clf()\n\n            logger.info(\"Config {}, Loss: {}\".format(config, early_stop_best_loss))\n            if best_valid_loss == None or early_stop_best_loss < best_valid_loss:\n                logger.info(\"Found best new model!\")\n                best_valid_loss = early_stop_best_loss\n                best_config = config\n\n    logger.info(\"Best Config: {}, Loss: {}\".format(best_config, best_valid_loss))\n"
  },
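  {
    "path": "examples/separate_slicing_sketch.py",
    "content": "\"\"\"\nIllustrative sketch, not part of the original project: the tensor slicing\nrnn_separate.py uses to train on melody or harmony alone. Inputs are\n(time, batch, input_dim) arrays whose first NOTTINGHAM_MELODY_RANGE features\nencode melody and whose remainder encode harmony; target column 0 holds the\nmelody class label and column 1 the harmony class label. Shapes here are\nsmall stand-ins.\n\"\"\"\n\nimport numpy as np\n\nr = 3                                             # stand-in melody range\ntb_data = np.zeros((4, 2, 5))                     # (time_batch_len, batch, input_dim)\ntb_targets = np.zeros((4, 2, 2), dtype=np.int32)  # (..., [melody, harmony] labels)\n\n# melody-only model: first r input features, label column 0\nmelody_data, melody_targets = tb_data[:, :, :r], tb_targets[:, :, 0]\nprint melody_data.shape, melody_targets.shape    # (4, 2, 3) (4, 2)\n\n# harmony-only model: remaining input features, label column 1\nharmony_data, harmony_targets = tb_data[:, :, r:], tb_targets[:, :, 1]\nprint harmony_data.shape, harmony_targets.shape  # (4, 2, 2) (4, 2)\n"
  },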
  {
    "path": "rnn_test.py",
    "content": "import os, sys\nimport argparse\nimport cPickle\n\nimport numpy as np\nimport tensorflow as tf    \n\nimport util\nimport nottingham_util\nfrom model import Model, NottinghamModel, NottinghamSeparate\nfrom rnn import DefaultConfig\n\nif __name__ == '__main__':\n    np.random.seed()      \n\n    parser = argparse.ArgumentParser(description='Script to test a models performance against the test set')\n    parser.add_argument('--config_file', type=str, required=True)\n    parser.add_argument('--num_samples', type=int, default=1)\n    parser.add_argument('--seperate', action='store_true', default=False)\n    parser.add_argument('--choice', type=str, default='melody',\n                        choices = ['melody', 'harmony'])\n    args = parser.parse_args()\n\n    with open(args.config_file, 'r') as f: \n        config = cPickle.load(f)\n\n    if config.dataset == 'softmax':\n        config.time_batch_len = 1\n        config.max_time_batches = -1\n        with open(nottingham_util.PICKLE_LOC, 'r') as f:\n            pickle = cPickle.load(f)\n        if args.seperate:\n            model_class = NottinghamSeparate\n            test_data = util.batch_data(pickle['test'], time_batch_len = 1, \n                max_time_batches = -1, softmax = True)\n            r = nottingham_util.NOTTINGHAM_MELODY_RANGE\n            if args.choice == 'melody':\n                print \"Using only melody\"\n                new_data = []\n                for batch_data, batch_targets in test_data:\n                    new_data.append(([tb[:, :, :r] for tb in batch_data],\n                                     [tb[:, :, 0] for tb in batch_targets]))\n                test_data = new_data\n            else:\n                print \"Using only harmony\"\n                new_data = []\n                for batch_data, batch_targets in test_data:\n                    new_data.append(([tb[:, :, r:] for tb in batch_data],\n                                     [tb[:, :, 1] for tb in batch_targets]))\n                test_data = new_data\n        else:\n            model_class = NottinghamModel\n            # use time batch len of 1 so that every target is covered\n            test_data = util.batch_data(pickle['test'], time_batch_len = 1, \n                max_time_batches = -1, softmax = True)\n    else:\n        raise Exception(\"Other datasets not yet implemented\")\n        \n    print config\n\n    with tf.Graph().as_default(), tf.Session() as session:\n        with tf.variable_scope(\"model\", reuse=None):\n            test_model = model_class(config, training=False)\n\n        saver = tf.train.Saver(tf.all_variables())\n        model_path = os.path.join(os.path.dirname(args.config_file), \n            config.model_name)\n        saver.restore(session, model_path)\n        \n        test_loss, test_probs = util.run_epoch(session, test_model, test_data, \n            training=False, testing=True)\n        print 'Testing Loss: {}'.format(test_loss)\n\n        if config.dataset == 'softmax':\n            if args.seperate:\n                nottingham_util.seperate_accuracy(test_probs, test_data, num_samples=args.num_samples)\n            else:\n                nottingham_util.accuracy(test_probs, test_data, num_samples=args.num_samples)\n\n        else:\n            util.accuracy(test_probs, test_data, num_samples=50)\n\n    sys.exit(1)\n"
  },
  {
    "path": "sampling.py",
    "content": "import numpy as np\nfrom pprint import pprint\n\nimport midi_util\n\n\nclass Sampler(object):\n\n    def __init__(self, min_prob=0.5, num_notes = 4, method = 'sample', verbose=False):\n        self.min_prob = min_prob\n        self.num_notes = num_notes\n        self.method = method\n        self.verbose = verbose\n\n    def visualize_probs(self, probs):\n        if not self.verbose:\n            return\n        print 'Highest four probs: '\n        pprint(sorted(list(enumerate(probs)), key=lambda x: x[1], \n               reverse=True)[:4])\n\n    def sample_notes_prob(self, probs, max_notes=-1):\n        \"\"\" Samples all notes that are over a certain probability\"\"\"\n        self.visualize_probs(probs)\n        top_idxs = list()\n        for idx in probs.argsort()[::-1]:\n            if max_notes > 0 and len(top_idxs) >= max_notes:\n                break\n            if probs[idx] < self.min_prob:\n                break\n            top_idxs.append(idx)\n        chord = np.zeros([len(probs)], dtype=np.int32)\n        chord[top_idxs] = 1.0\n        return chord\n\n    def sample_notes_static(self, probs):\n        top_idxs = probs.argsort()[-self.num_notes:][::-1]\n        chord = np.zeros([len(probs)], dtype=np.int32)\n        chord[top_idxs] = 1.0\n        return chord\n\n    def sample_notes_bernoulli(self, probs):\n        chord = np.zeros([len(probs)], dtype=np.int32)\n        for note, prob in enumerate(probs):\n            if np.random.binomial(1, prob) > 0:\n                chord[note] = 1\n        return chord\n\n    def sample_notes(self, probs):\n        \"\"\" Samples a static amount of notes from probabilities by highest prob \"\"\"\n        self.visualize_probs(probs)\n        if self.method == 'sample':\n            return self.sample_notes_bernoulli(probs)\n        elif self.method == 'static':\n            return self.sample_notes_static(probs)\n        elif self.method == 'min_prob':\n            return self.sample_notes_prob(probs)\n        else:\n            raise Exception(\"Unrecognized method: {}\".format(self.method))\n"
  },
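  {
    "path": "examples/sampler_usage_sketch.py",
    "content": "\"\"\"\nIllustrative sketch, not part of the original project: exercises the three\nstrategies in sampling.py on a toy probability vector. Assumes it is run from\nthe repository root so that sampling.py (and its midi_util dependency) can be\nimported.\n\"\"\"\n\nimport numpy as np\n\nfrom sampling import Sampler\n\nprobs = np.array([0.9, 0.1, 0.8, 0.05, 0.6])\n\n# 'static': always take the num_notes highest-probability notes\nprint Sampler(method='static', num_notes=2).sample_notes(probs)     # notes 0 and 2\n\n# 'min_prob': take every note whose probability clears min_prob\nprint Sampler(method='min_prob', min_prob=0.5).sample_notes(probs)  # notes 0, 2, 4\n\n# 'sample' (the default): an independent Bernoulli draw per note\nprint Sampler(method='sample').sample_notes(probs)                  # stochastic\n"
  },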
  {
    "path": "util.py",
    "content": "import os\nimport math\nimport cPickle\nfrom collections import defaultdict\nfrom random import shuffle\n\nimport numpy as np\nimport tensorflow as tf    \n\nimport midi_util\nimport nottingham_util\n\ndef parse_midi_directory(input_dir, time_step):\n    \"\"\" \n    input_dir: data directory full of midi files\n    time_step: the number of ticks to use as a time step for discretization\n\n    Returns a list of [T x D] matrices, where T is the amount of time steps\n    and D is the range of notes.\n    \"\"\"\n    files = [ os.path.join(input_dir, f) for f in os.listdir(input_dir)\n              if os.path.isfile(os.path.join(input_dir, f)) ] \n    sequences = [ \\\n        (f, midi_util.parse_midi_to_sequence(f, time_step=time_step)) \\\n        for f in files ]\n\n    return sequences\n\ndef batch_data(sequences, time_batch_len=128, max_time_batches=10,\n               softmax=False, verbose=False):\n    \"\"\"\n    sequences: a list of [T x D] matrices, each matrix representing a sequencey\n    time_batch_len: the unrolling length that will be used by BPTT. \n    max_time_batches: the max amount of time batches to consider. Any sequences \n                      longert than max_time_batches * time_batch_len will be ignored\n                      Can be set to -1 to all time batches needed.\n    softmax: Flag should be set to true if using the dual-softmax formualtion\n\n    returns [\n        [ [ data ], [ target ] ], # batch with one time step\n        [ [ data1, data2 ], [ target1, target2 ] ], # batch with two time steps\n        ...\n    ]\n    \"\"\"\n\n    assert time_batch_len > 0\n\n    dims = sequences[0].shape[1]\n    sequence_lens = [s.shape[0] for s in sequences]\n\n    if verbose:\n        avg_seq_len = sum(sequence_lens) / len(sequences)\n        print \"Average Sequence Length: {}\".format(avg_seq_len)\n        print \"Max Sequence Length: {}\".format(time_batch_len)\n        print \"Number of sequences: {}\".format(len(sequences))\n\n    batches = defaultdict(list)\n    for sequence in sequences:\n        # -1 because we can't predict the first step\n        num_time_steps = ((sequence.shape[0]-1) // time_batch_len) \n        if num_time_steps < 1:\n            continue\n        if max_time_batches > 0 and num_time_steps > max_time_batches:\n            continue\n        batches[num_time_steps].append(sequence)\n\n    if verbose:\n        print \"Batch distribution:\"\n        print [(k, len(v)) for (k, v) in batches.iteritems()]\n\n    def arrange_batch(sequences, num_time_steps):\n        sequences = [s[:(num_time_steps*time_batch_len)+1, :] for s in sequences]\n        stacked = np.dstack(sequences)\n        # swap axes so that shape is (SEQ_LENGTH X BATCH_SIZE X INPUT_DIM)\n        data = np.swapaxes(stacked, 1, 2)\n        targets = np.roll(data, -1, axis=0)\n        # cutoff final time step\n        data = data[:-1, :, :]\n        targets = targets[:-1, :, :]\n        assert data.shape == targets.shape\n\n        if softmax:\n            r = nottingham_util.NOTTINGHAM_MELODY_RANGE\n            labels = np.ones((targets.shape[0], targets.shape[1], 2), dtype=np.int32)\n            assert np.all(np.sum(targets[:, :, :r], axis=2) == 1)\n            assert np.all(np.sum(targets[:, :, r:], axis=2) == 1)\n            labels[:, :, 0] = np.argmax(targets[:, :, :r], axis=2)\n            labels[:, :, 1] = np.argmax(targets[:, :, r:], axis=2)\n            targets = labels\n            assert targets.shape[:2] == data.shape[:2]\n\n        assert data.shape[0] == 
num_time_steps * time_batch_len\n\n        # split them up into time batches\n        tb_data = np.split(data, num_time_steps, axis=0)\n        tb_targets = np.split(targets, num_time_steps, axis=0)\n\n        assert len(tb_data) == len(tb_targets) == num_time_steps\n        for i in range(len(tb_data)):\n            assert tb_data[i].shape[0] == time_batch_len\n            assert tb_targets[i].shape[0] == time_batch_len\n            if softmax:\n                assert np.all(np.sum(tb_data[i], axis=2) == 2)\n\n        return (tb_data, tb_targets)\n\n    return [ arrange_batch(b, n) for n, b in batches.iteritems() ]\n        \ndef load_data(data_dir, time_step, time_batch_len, max_time_batches, nottingham=None):\n    \"\"\"\n    nottingham: The sequences object as created in prepare_nottingham_pickle\n                (see nottingham_util for more). If None, parse all the MIDI\n                files from data_dir\n    time_step: the time_step used to parse midi files (only used if data_dir\n               is provided)\n    time_batch_len and max_time_batches: see batch_data()\n\n    returns { \n        \"train\": {\n            \"data\": [ batch_data() ],\n            \"metadata: { ... }\n        },\n        \"valid\": { ... }\n        \"test\": { ... }\n    }\n    \"\"\"\n\n    data = {}\n    for dataset in ['train', 'test', 'valid']:\n\n        # For testing, use ALL the sequences\n        if dataset == 'test':\n            max_time_batches = -1\n\n        # Softmax formualation preparsed into sequences\n        if nottingham:\n            sequences = nottingham[dataset]\n            metadata = nottingham[dataset + '_metadata']\n        # Cross-entropy formulation needs to be parsed\n        else:\n            sf = parse_midi_directory(os.path.join(data_dir, dataset), time_step)\n            sequences = [s[1] for s in sf]\n            files = [s[0] for s in sf]\n            metadata = [{\n                'path': f,\n                'name': f.split(\"/\")[-1].split(\".\")[0]\n            } for f in files]\n\n        dataset_data = batch_data(sequences, time_batch_len, max_time_batches, softmax = True if nottingham else False)\n\n        data[dataset] = {\n            \"data\": dataset_data,\n            \"metadata\": metadata,\n        }\n\n        data[\"input_dim\"] = dataset_data[0][0][0].shape[2]\n\n    return data\n\n\ndef run_epoch(session, model, batches, training=False, testing=False):\n    \"\"\"\n    session: Tensorflow session object\n    model: model object (see model.py)\n    batches: data object loaded from util_data()\n\n    training: A backpropagation iteration will be performed on the dataset\n    if this flag is active\n\n    returns average loss per time step over all batches.\n    if testing flag is active: returns [ loss, probs ] where is the probability\n        values for each note\n    \"\"\"\n\n    # shuffle batches\n    shuffle(batches)\n\n    target_tensors = [model.loss, model.final_state]\n    if testing:\n        target_tensors.append(model.probs)\n        batch_probs = defaultdict(list)\n    if training:\n        target_tensors.append(model.train_step)\n\n    losses = []\n    for data, targets in batches:\n        # save state over unrolling time steps\n        batch_size = data[0].shape[1]\n        num_time_steps = len(data)\n        state = model.get_cell_zero_state(session, batch_size) \n        probs = list()\n\n        for tb_data, tb_targets in zip(data, targets):\n            if testing:\n                tbd = tb_data\n                tbt = 
tb_targets\n            else:\n                # shuffle all the batches of input, state, and target\n                batches = tb_data.shape[1]\n                permutations = np.random.permutation(batches)\n                tbd = np.zeros_like(tb_data)\n                tbd[:, np.arange(batches), :] = tb_data[:, permutations, :]\n                tbt = np.zeros_like(tb_targets)\n                tbt[:, np.arange(batches), :] = tb_targets[:, permutations, :]\n                state[np.arange(batches)] = state[permutations]\n\n            feed_dict = {\n                model.initial_state: state,\n                model.seq_input: tbd,\n                model.seq_targets: tbt,\n            }\n            results = session.run(target_tensors, feed_dict=feed_dict)\n\n            losses.append(results[0])\n            state = results[1]\n            if testing:\n                batch_probs[num_time_steps].append(results[2])\n\n    loss = sum(losses) / len(losses)\n\n    if testing:\n        return [loss, batch_probs]\n    else:\n        return loss\n\ndef accuracy(batch_probs, data, num_samples=20):\n    \"\"\"\n    batch_probs: probs object returned from run_epoch\n    data: data object passed into run_epoch\n    num_samples: the number of times to sample each note (an average over all\n    these samples will be used)\n\n    returns the accuracy metric according to\n    http://ismir2009.ismir.net/proceedings/PS2-21.pdf\n    \"\"\"\n\n    false_positives, false_negatives, true_positives = 0, 0, 0 \n    for _, batch_targets in data:\n        num_time_steps = len(batch_data)\n        for ts_targets, ts_probs in zip(batch_targets, batch_probs[num_time_steps]):\n\n            assert ts_targets.shape == ts_targets.shape\n\n            for seq_idx in range(ts_targets.shape[1]):\n                for step_idx in range(ts_targets.shape[0]):\n                    for note_idx, prob in enumerate(ts_probs[step_idx, seq_idx, :]):\n                        num_occurrences = np.random.binomial(num_samples, prob)\n                        if ts_targets[step_idx, seq_idx, note_idx] == 0.0:\n                            false_positives += num_occurrences\n                        else:\n                            false_negatives += (num_samples - num_occurrences)\n                            true_positives += num_occurrences\n                \n    accuracy = (float(true_positives) / float(true_positives + false_positives + false_negatives)) \n\n    print \"Precision: {}\".format(float(true_positives) / (float(true_positives + false_positives)))\n    print \"Recall: {}\".format(float(true_positives) / (float(true_positives + false_negatives)))\n    print \"Accuracy: {}\".format(accuracy)\n"
  }
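  ,
  {
    "path": "examples/batching_sketch.py",
    "content": "\"\"\"\nIllustrative sketch, not part of the original project: a stripped-down\nversion of the batching in util.batch_data for a single sequence. The target\nat each step is the input at the next step, which is why one step is dropped\nand why a sequence needs at least time_batch_len + 1 steps to yield a time\nbatch.\n\"\"\"\n\nimport numpy as np\n\ntime_batch_len = 4\nseq = np.arange(9 * 2).reshape(9, 2)  # toy [T x D] sequence, T=9, D=2\n\n# -1 because we can't predict the first step\nnum_time_steps = (seq.shape[0] - 1) // time_batch_len  # = 2\n\n# keep num_time_steps * time_batch_len + 1 steps, then shift by one\nseq = seq[:num_time_steps * time_batch_len + 1, :]\ndata, targets = seq[:-1, :], seq[1:, :]\n\n# split along time into the unrolling chunks fed to BPTT\ntb_data = np.split(data, num_time_steps, axis=0)\ntb_targets = np.split(targets, num_time_steps, axis=0)\n\nprint len(tb_data), tb_data[0].shape                   # 2 (4, 2)\n# each target chunk is its data chunk advanced by one step\nprint np.array_equal(tb_targets[0][0], tb_data[0][1])  # True\n"
  }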
]