Repository: francesclluis/source-separation-wavenet
Branch: master
Commit: c80bb531f32d
Files: 16
Total size: 76.7 MB
Directory structure:
source-separation-wavenet/
├── LICENSE
├── README.md
├── config.md
├── config_multi_instrument.json
├── config_singing_voice.json
├── datasets.py
├── environment.yml
├── layers.py
├── main.py
├── models.py
├── separate.py
├── sessions/
│ ├── multi-instrument/
│ │ ├── checkpoints/
│ │ │ └── checkpoint.00045-0.hdf5
│ │ └── config.json
│ └── singing-voice/
│ ├── checkpoints/
│ │ └── checkpoint.00058-0.hdf5
│ └── config.json
└── util.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2018 Francesc Lluís Salvadó

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
A Wavenet for Music Source Separation
====
A neural network for end-to-end music source separation, as described in [End-to-end music source separation:
is it possible in the waveform domain?](https://arxiv.org/abs/1810.12187)
Listen to separated samples [here](http://jordipons.me/apps/end-to-end-music-source-separation/)
What is a Wavenet for Music Source Separation?
-----
The Wavenet for Music Source Separation is a fully convolutional neural network that directly operates on the raw audio waveform.
It is an adaptation of [Wavenet](https://deepmind.com/blog/wavenet-generative-model-raw-audio/) that turns the original causal model (generative and slow) into a non-causal model (discriminative and parallelizable). This idea was originally proposed by [Rethage et al.](https://arxiv.org/abs/1706.07162) for speech denoising and is adapted here for monaural music source separation. Their [code](https://github.com/drethage/speech-denoising-wavenet) is reused.
The main difference between the original Wavenet and this non-causal adaptation is that samples from the future can be used to predict the present one. By removing the autoregressive, causal nature of the original Wavenet, the fully convolutional model can predict an entire target field instead of one sample at a time; thanks to this parallelization, it can run in real time on a GPU.
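As a rough illustration, the relationship between the target field and the model's input can be sketched with back-of-the-envelope arithmetic. The numbers below come from `config_multi_instrument.json` (`num_stacks` = 4, `dilations` = 9, kernel length 3); this is a sketch under those assumptions, and the exact bookkeeping in `models.py` may differ slightly.

```python
# Back-of-the-envelope sketch (not repo code): a non-causal dilated conv with
# kernel length 3 and dilation d sees d samples on each side, so it widens the
# receptive field by 2 * d. Dilations double from 1 to 2**dilations, and the
# pattern repeats once per stack.

def receptive_field(num_stacks, dilations, kernel_length=3):
    per_layer = [(kernel_length - 1) * 2 ** p for p in range(dilations + 1)]
    return num_stacks * sum(per_layer) + 1

rf = receptive_field(num_stacks=4, dilations=9)  # 8185 samples of context
input_length = 1601 + rf - 1                     # target field + needed context
print(rf, input_length)
```

Every sample in the 1601-sample target field thus needs roughly 8185 samples of surrounding context, which the model consumes in a single forward pass.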
<img src="img/wavenet_target_field.jpg">
See the diagram below for a summary of the network architecture.
<img src="img/wavenet_diagram.jpg">
Installation
-----
1. `git clone https://github.com/francesclluis/source-separation-wavenet.git`
2. Install [conda](https://conda.io/docs/user-guide/install/index.html)
3. `conda env create -f environment.yml`
4. `source activate sswavenet`
*Currently the project requires **Keras 2.1** and **Theano 1.0.1**; the large dilations present in the architecture are not supported by the current version of Tensorflow.*
Usage
-----
A pre-trained multi-instrument model (the best-performing model described in the paper) can be found in `sessions/multi-instrument/checkpoints` and is ready to be used out-of-the-box. The parameterization of this model is specified in `sessions/multi-instrument/config.json`.
A pre-trained singing-voice model (the best-performing model described in the paper) can be found in `sessions/singing-voice/checkpoints` and is ready to be used out-of-the-box. The parameterization of this model is specified in `sessions/singing-voice/config.json`.
*Download the dataset as described [below](https://github.com/francesclluis/source-separation-wavenet#dataset)*
#### Source Separation:
Example (multi-instrument): `THEANO_FLAGS=device=cuda python main.py --mode inference --config sessions/multi-instrument/config.json --mixture_input_path audio/`
Example (singing-voice): `THEANO_FLAGS=device=cuda python main.py --mode inference --config sessions/singing-voice/config.json --mixture_input_path audio/`
###### Speedup
To achieve faster source separation, one can increase the target-field length with the optional `--target_field_length` argument. This sets the number of samples that are separated in a single forward propagation, saving redundant calculations. In the following example, it is increased to roughly 10x the length used during training, and the batch size is reduced to 4.
Faster Example: `THEANO_FLAGS=device=cuda python main.py --mode inference --target_field_length 16001 --batch_size 4 --config sessions/multi-instrument/config.json --mixture_input_path audio/`
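The speedup can be estimated with simple arithmetic. The sketch below assumes each forward pass emits `target_field_length` separated samples and that fragments are grouped into batches; the repo's padding logic may shift the exact counts slightly.

```python
import math

# Sketch (assumed arithmetic, not repo code): a clip of num_samples needs
# ceil(num_samples / target_field_length) fragments, processed batch_size
# fragments at a time.

def num_batches(num_samples, target_field_length, batch_size):
    fragments = int(math.ceil(num_samples / float(target_field_length)))
    return int(math.ceil(fragments / float(batch_size)))

three_min = 3 * 60 * 16000  # a 3-minute clip at 16 kHz
print(num_batches(three_min, 1601, 10))   # target field as trained
print(num_batches(three_min, 16001, 4))   # 10x target field, smaller batch
```

For the 3-minute clip above, the 10x target field cuts the number of batched forward passes from 180 to 45 even with the smaller batch size.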
#### Training:
Example (multi-instrument): `THEANO_FLAGS=device=cuda python main.py --mode training --target multi-instrument --config config_multi_instrument.json`
Example (singing-voice): `THEANO_FLAGS=device=cuda python main.py --mode training --target singing-voice --config config_singing_voice.json`
#### Configuration
A detailed description of all configurable parameters can be found in [config.md](https://github.com/francesclluis/source-separation-wavenet/blob/master/config.md)
#### Optional command-line arguments:
Argument | Valid Inputs | Default | Description
-------- | ---- | ---------- | -----
mode | [training, inference] | training | Mode of operation (train a model or separate audio)
target | [multi-instrument, singing-voice] | multi-instrument | Target of the model to train
config | string | config.json | Path to JSON-formatted config file
print_model_summary | bool | False | Prints verbose summary of the model
load_checkpoint | string | None | Path to hdf5 file containing a snapshot of model weights
#### Additional arguments during source separation:
Argument | Valid Inputs | Default | Description
-------- | ------------ | ------- | -----------
one_shot | bool | False | Separates each audio file in a single forward propagation
target_field_length | int | as defined in config.json | Overrides parameter in config.json for separating with different target-field lengths than used in training
batch_size | int | as defined in config.json | # of samples per batch
Dataset
-----
The MUSDB18 dataset is used for training the model. It is provided by the community-based Signal Separation Evaluation Campaign (SiSEC).
1. [Download here](https://sigsep.github.io/datasets/musdb.html#download)
2. Decode dataset to WAV format as explained [here](https://github.com/sigsep/sigsep-mus-io)
3. Extract to `data/MUSDB`
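After extraction, it may help to sanity-check the decoded layout. This sketch assumes the folder structure implied by `datasets.py` (each track folder holds `mixture.wav` plus the four stem WAVs); `check_musdb` and `missing_stems` are hypothetical helper names, not part of the repo.

```python
import os

# Sketch: verify that every decoded MUSDB18 track folder contains the
# mixture and the four stems as WAV files (layout inferred from datasets.py,
# which derives stem paths by replacing 'mixture.wav' in each track path).

STEMS = ['mixture', 'vocals', 'drums', 'bass', 'other']

def missing_stems(track_dir):
    return [s for s in STEMS if not os.path.isfile(os.path.join(track_dir, s + '.wav'))]

def check_musdb(root='data/MUSDB'):
    for subset in ['train', 'test']:
        subset_dir = os.path.join(root, subset)
        if not os.path.isdir(subset_dir):
            print('missing subset folder: ' + subset_dir)
            continue
        for track in sorted(os.listdir(subset_dir)):
            missing = missing_stems(os.path.join(subset_dir, track))
            if missing:
                print(track + ' is missing: ' + ', '.join(missing))

check_musdb('data/MUSDB')
```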
================================================
FILE: config.md
================================================
config.json - Configuring a training session
----
The parameters present in a `config.json` file allow one to configure a training session. Each of these parameters is described below:
### Dataset
How the data is used for training
* **extract_voice_percentage**: (float) Probability of drawing a training fragment from regions that contain singing voice (rather than from silent parts of the vocal stem)
* **in_memory_percentage**: (float) Percentage of the dataset to load into memory, useful when dataset requires more memory than available
* **path**: (string) Path to dataset
* **sample_rate**: (int) Sample rate to which all samples are resampled
* **type**: (string) Identifier of which dataset is being used for training
### Model
What the model will be
* **condition_encoding**: (string) Numerical representation used to encode integer condition values, either binary or one-hot
* **dilations**: (int) Maximum dilation factor as an exponent of 2, e.g. dilations = 9 results in a maximum dilation of 2^9 = 512
* **filters**:
    * **lengths**:
        * **res**: (int) Length of convolution kernels in residual blocks
        * **final**: ([int, int]) Lengths of convolution kernels in final layers, individually definable
        * **skip**: (int) Length of convolution kernels in skip connections
    * **depths**:
        * **res**: (int) Number of filters in residual-block convolution layers
        * **skip**: (int) Number of filters in skip connections
        * **final**: ([int, int]) Number of filters in final layers, individually definable
* **num_stacks**: (int) Number of stacks, as defined in the paper
* **target_field_length**: (int) Length of the output
* **target_padding**: (int) Number of samples used for padding the target_field *per side*
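To make `target_field_length` and `target_padding` concrete, here is a sketch of the index arithmetic that cuts the padded target field out of the centre of an input fragment, mirroring the `get_samples_of_interest_indices` helpers in `datasets.py`. `padded_target_indices` is an illustrative name, and the fragment length 9785 is an assumed value.

```python
# Sketch (illustrative, not repo code): the padded target field is centred on
# the middle sample of the input fragment and extends half the target field
# plus target_padding samples to each side.

def padded_target_indices(fragment_length, target_field_length, target_padding):
    target_sample_index = fragment_length // 2   # centre of the fragment
    half = target_field_length // 2
    return range(target_sample_index - half - target_padding,
                 target_sample_index + half + target_padding + 1)

idx = padded_target_indices(9785, 1601, 1)
print(len(idx))  # target_field_length + 2 * target_padding samples
```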
### Training
How training will be carried out
* **batch_size**: (int) Number of samples per batch
* **early_stopping_patience**: (int) Number of epochs to wait without improvement in accuracy before stopping training
* **loss**: (in the case of multi-instrument)
    * **out_1**: First term of the three-term loss (vocals)
        * **l1**: (float) Weight given to the L1 loss
        * **l2**: (float) Weight given to the L2 loss
        * **weight**: (float) Weight given to the first term
    * **out_2**: Second term of the three-term loss (drums)
        * **l1**: (float) Weight given to the L1 loss
        * **l2**: (float) Weight given to the L2 loss
        * **weight**: (float) Weight given to the second term
    * **out_3**: Third term of the three-term loss (bass)
        * **l1**: (float) Weight given to the L1 loss
        * **l2**: (float) Weight given to the L2 loss
        * **weight**: (float) Weight given to the third term
* **loss**: (in the case of singing-voice)
    * **out_1**: First term of the two-term loss (singing voice)
        * **l1**: (float) Weight given to the L1 loss
        * **l2**: (float) Weight given to the L2 loss
        * **weight**: (float) Weight given to the first term
    * **out_2**: Second term of the two-term loss (dissimilarity singing voice)
        * **l1**: (float) Weight given to the L1 loss
        * **l2**: (float) Weight given to the L2 loss
        * **weight**: (float) Weight given to the second term
* **num_epochs**: (int) Maximum number of epochs to train for
* **num_steps_test**: (int) Total number of steps (batches of samples) to yield from validation generator before stopping at the end of every epoch.
* **num_steps_train**: (int) Total number of steps (batches of samples) to yield from training generator before declaring one epoch finished and starting the next epoch.
* **path**: (string) Path to the folder containing all files pertaining to the training session
* **verbosity**: (int) Keras verbosity level
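As an illustration of how the `l1`/`l2`/`weight` entries combine, the sketch below assembles one loss term in plain Python. `loss_term` is a hypothetical helper, not the repo's Keras loss: each output contributes weight * (l1 * mean absolute error + l2 * mean squared error), and a negative `weight` (as on `out_2` in `config_singing_voice.json`) pushes that output away from its reference instead of toward it.

```python
# Sketch (illustrative, not repo code): one term of the weighted L1/L2 loss.

def loss_term(y_true, y_pred, l1, l2, weight):
    err = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in err) / len(err)    # mean absolute error (L1)
    mse = sum(e * e for e in err) / len(err)     # mean squared error (L2)
    return weight * (l1 * mae + l2 * mse)

# Pure-L1 term with unit weight, as in both shipped configs:
print(loss_term([0.0, 1.0, -1.0], [0.0, 0.5, -0.5], l1=1, l2=0, weight=1))
```

The total training loss is then the sum of these terms over all outputs.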
================================================
FILE: config_multi_instrument.json
================================================
{
    "dataset": {
        "in_memory_percentage": 1,
        "extract_voice_percentage": 0,
        "path": "data/MUSDB",
        "sample_rate": 16000,
        "type": "musdb18"
    },
    "model": {
        "condition_encoding": "binary",
        "dilations": 9,
        "filters": {
            "lengths": {
                "res": 3,
                "final": [3, 3],
                "skip": 1
            },
            "depths": {
                "res": 64,
                "skip": 64,
                "final": [2048, 256]
            }
        },
        "num_stacks": 4,
        "target_field_length": 1601,
        "target_padding": 1
    },
    "optimizer": {
        "decay": 0.0,
        "epsilon": 1e-08,
        "lr": 0.001,
        "momentum": 0.9,
        "type": "adam"
    },
    "training": {
        "batch_size": 10,
        "early_stopping_patience": 16,
        "loss": {
            "out_1": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            },
            "out_2": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            },
            "out_3": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            }
        },
        "num_epochs": 250,
        "num_steps_test": 500,
        "num_steps_train": 2000,
        "path": "sessions/003",
        "verbosity": 1
    }
}
================================================
FILE: config_singing_voice.json
================================================
{
    "dataset": {
        "in_memory_percentage": 1,
        "extract_voice_percentage": 0.5,
        "path": "data/MUSDB",
        "sample_rate": 16000,
        "type": "musdb18"
    },
    "model": {
        "condition_encoding": "binary",
        "dilations": 9,
        "filters": {
            "lengths": {
                "res": 3,
                "final": [3, 3],
                "skip": 1
            },
            "depths": {
                "res": 64,
                "skip": 64,
                "final": [2048, 256]
            }
        },
        "num_stacks": 4,
        "target_field_length": 1601,
        "target_padding": 1
    },
    "optimizer": {
        "decay": 0.0,
        "epsilon": 1e-08,
        "lr": 0.001,
        "momentum": 0.9,
        "type": "adam"
    },
    "training": {
        "batch_size": 10,
        "early_stopping_patience": 16,
        "loss": {
            "out_1": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            },
            "out_2": {
                "l1": 1,
                "l2": 0,
                "weight": -0.05
            }
        },
        "num_epochs": 250,
        "num_steps_test": 500,
        "num_steps_train": 2000,
        "path": "sessions/002",
        "verbosity": 1
    }
}
================================================
FILE: datasets.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Datasets.py

import util
import os
import numpy as np
import musdb
import logging


class SingingVoiceMUSDB18Dataset():

    def __init__(self, config, model):
        self.model = model
        self.path = config['dataset']['path']
        self.sample_rate = config['dataset']['sample_rate']
        self.file_paths = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                           'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.sequences = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                          'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.voice_indices = {'train': [], 'val': []}
        self.batch_size = config['training']['batch_size']
        self.extract_voice_percent = config['dataset']['extract_voice_percentage']
        self.in_memory_percentage = config['dataset']['in_memory_percentage']
        self.num_sequences_in_memory = 0
        self.condition_encode_function = util.get_condition_input_encode_func(config['model']['condition_encoding'])

    def load_dataset(self):
        print('Loading MUSDB18 dataset for singing voice separation...')
        mus = musdb.DB(root_dir=self.path, is_wav=True)
        tracks = mus.load_mus_tracks(subsets='train')
        np.random.seed(seed=1337)
        val_idx = np.random.choice(len(tracks), size=25, replace=False)
        train_idx = [i for i in range(len(tracks)) if i not in val_idx]
        val_tracks = [tracks[i] for i in val_idx]
        train_tracks = [tracks[i] for i in train_idx]
        for condition in ['mixture', 'vocals']:
            self.file_paths['val'][condition] = [track.path[:-11] + condition + '.wav' for track in val_tracks]
        for condition in ['mixture', 'vocals']:
            self.file_paths['train'][condition] = [track.path[:-11] + condition + '.wav' for track in train_tracks]
        self.load_songs()
        return self

    def load_songs(self):
        for set in ['train', 'val']:
            for condition in ['mixture', 'vocals']:
                for filepath in self.file_paths[set][condition]:
                    if condition == 'vocals':
                        sequence = util.load_wav(filepath, self.sample_rate)
                        self.sequences[set][condition].append(sequence)
                        self.num_sequences_in_memory += 1
                        if self.extract_voice_percent > 0:
                            self.voice_indices[set].append(util.get_sequence_with_singing_indices(sequence))
                    else:
                        if self.in_memory_percentage == 1 or np.random.uniform(0, 1) <= (
                                self.in_memory_percentage - 0.5) * 2:
                            sequence = util.load_wav(filepath, self.sample_rate)
                            self.sequences[set][condition].append(sequence)
                            self.num_sequences_in_memory += 1
                        else:
                            self.sequences[set][condition].append([-1])

    def get_num_sequences_in_dataset(self):
        return len(self.sequences['train']['vocals']) + len(self.sequences['train']['mixture']) + len(
            self.sequences['val']['vocals']) + len(self.sequences['val']['mixture'])

    def retrieve_sequence(self, set, condition, sequence_num):
        if len(self.sequences[set][condition][sequence_num]) == 1:
            sequence = util.load_wav(self.file_paths[set][condition][sequence_num], self.sample_rate)
            if (float(self.num_sequences_in_memory) / self.get_num_sequences_in_dataset()) < self.in_memory_percentage:
                self.sequences[set][condition][sequence_num] = sequence
                self.num_sequences_in_memory += 1
        else:
            sequence = self.sequences[set][condition][sequence_num]
        return np.array(sequence)

    def get_random_batch_generator(self, set):
        if set not in ['train', 'val']:
            raise ValueError("Argument SET must be either 'train' or 'val'")
        while True:
            sample_indices = np.random.randint(0, len(self.sequences[set]['vocals']), self.batch_size)
            batch_inputs = []
            batch_outputs_1 = []
            batch_outputs_2 = []
            for i, sample_i in enumerate(sample_indices):
                while True:
                    starting_index = 0
                    mixture = self.retrieve_sequence(set, 'mixture', sample_i)
                    vocals = self.retrieve_sequence(set, 'vocals', sample_i)
                    accompaniment = mixture - vocals
                    if np.random.uniform(0, 1) < self.extract_voice_percent:
                        indices = self.voice_indices[set][sample_i]
                        vocals_indices, _ = util.get_indices_subsequence(indices)
                        vocals = vocals[vocals_indices[0]:vocals_indices[1]]
                        starting_index = vocals_indices[0]
                    if len(vocals) < self.model.input_length:
                        sample_i = np.random.randint(0, len(self.sequences[set]['vocals']))
                    else:
                        break
                offset_1 = np.squeeze(np.random.randint(0, len(vocals) - self.model.input_length + 1, 1))
                vocals_fragment = vocals[offset_1:offset_1 + self.model.input_length]
                offset_2 = offset_1 + starting_index
                accompaniment_fragment = accompaniment[offset_2:offset_2 + self.model.input_length]
                input = accompaniment_fragment + vocals_fragment
                output_vocals = vocals_fragment
                output_accompaniment = accompaniment_fragment
                batch_inputs.append(input)
                batch_outputs_1.append(output_vocals)
                batch_outputs_2.append(output_accompaniment)
            batch_inputs = np.array(batch_inputs, dtype='float32')
            batch_outputs_1 = np.array(batch_outputs_1, dtype='float32')
            batch_outputs_2 = np.array(batch_outputs_2, dtype='float32')
            batch_outputs_1 = batch_outputs_1[:, self.model.get_padded_target_field_indices()]
            batch_outputs_2 = batch_outputs_2[:, self.model.get_padded_target_field_indices()]
            batch = {'data_input': batch_inputs}, {'data_output_1': batch_outputs_1,
                                                   'data_output_2': batch_outputs_2}
            yield batch

    def get_condition_input_encode_func(self, representation):
        if representation == 'binary':
            return util.binary_encode
        else:
            return util.one_hot_encode

    def get_target_sample_index(self):
        return int(np.floor(self.fragment_length / 2.0))

    def get_samples_of_interest_indices(self, causal=False):
        if causal:
            return -1
        else:
            target_sample_index = self.get_target_sample_index()
            return range(target_sample_index - self.half_target_field_length - self.target_padding,
                         target_sample_index + self.half_target_field_length + self.target_padding + 1)

    def get_sample_weight_vector_length(self):
        if self.samples_of_interest_only:
            return len(self.get_samples_of_interest_indices())
        else:
            return self.fragment_length


class MultiInstrumentMUSDB18Dataset():

    def __init__(self, config, model):
        self.model = model
        self.path = config['dataset']['path']
        self.sample_rate = config['dataset']['sample_rate']
        self.file_paths = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                           'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.sequences = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                          'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.voice_indices = {'train': [], 'val': []}
        self.batch_size = config['training']['batch_size']
        self.extract_voice_percent = config['dataset']['extract_voice_percentage']
        self.in_memory_percentage = config['dataset']['in_memory_percentage']
        self.num_sequences_in_memory = 0
        self.condition_encode_function = util.get_condition_input_encode_func(config['model']['condition_encoding'])

    def load_dataset(self):
        print('Loading MUSDB18 dataset for multi-instrument separation...')
        mus = musdb.DB(root_dir=self.path, is_wav=True)
        tracks = mus.load_mus_tracks(subsets='train')
        np.random.seed(seed=1337)
        val_idx = np.random.choice(len(tracks), size=25, replace=False)
        train_idx = [i for i in range(len(tracks)) if i not in val_idx]
        val_tracks = [tracks[i] for i in val_idx]
        train_tracks = [tracks[i] for i in train_idx]
        for condition in ['mixture', 'vocals', 'drums', 'other', 'bass']:
            self.file_paths['val'][condition] = [track.path[:-11] + condition + '.wav' for track in val_tracks]
        for condition in ['mixture', 'vocals', 'drums', 'other', 'bass']:
            self.file_paths['train'][condition] = [track.path[:-11] + condition + '.wav' for track in train_tracks]
        self.load_songs()
        return self

    def load_songs(self):
        for set in ['train', 'val']:
            for condition in ['vocals', 'mixture', 'drums', 'other', 'bass']:
                for filepath in self.file_paths[set][condition]:
                    if condition == 'vocals':
                        sequence = util.load_wav(filepath, self.sample_rate)
                        self.sequences[set][condition].append(sequence)
                        self.num_sequences_in_memory += 1
                        if self.extract_voice_percent > 0:
                            self.voice_indices[set].append(util.get_sequence_with_singing_indices(sequence))
                    else:
                        if self.in_memory_percentage == 1 or np.random.uniform(0, 1) <= (
                                self.in_memory_percentage - 0.5) * 2:
                            sequence = util.load_wav(filepath, self.sample_rate)
                            self.sequences[set][condition].append(sequence)
                            self.num_sequences_in_memory += 1
                        else:
                            self.sequences[set][condition].append([-1])

    def get_num_sequences_in_dataset(self):
        return len(self.sequences['train']['vocals']) + len(self.sequences['train']['mixture']) + len(
            self.sequences['val']['vocals']) + len(self.sequences['val']['mixture'])

    def retrieve_sequence(self, set, condition, sequence_num):
        if len(self.sequences[set][condition][sequence_num]) == 1:
            sequence = util.load_wav(self.file_paths[set][condition][sequence_num], self.sample_rate)
            if (float(self.num_sequences_in_memory) / self.get_num_sequences_in_dataset()) < self.in_memory_percentage:
                self.sequences[set][condition][sequence_num] = sequence
                self.num_sequences_in_memory += 1
        else:
            sequence = self.sequences[set][condition][sequence_num]
        return np.array(sequence)

    def get_random_batch_generator(self, set):
        if set not in ['train', 'val']:
            raise ValueError("Argument SET must be either 'train' or 'val'")
        while True:
            sample_indices = np.random.randint(0, len(self.sequences[set]['vocals']), self.batch_size)
            batch_inputs = []
            batch_outputs_1 = []
            batch_outputs_2 = []
            batch_outputs_3 = []
            for i, sample_i in enumerate(sample_indices):
                while True:
                    starting_index = 0
                    vocals = self.retrieve_sequence(set, 'vocals', sample_i)
                    bass = self.retrieve_sequence(set, 'bass', sample_i)
                    drums = self.retrieve_sequence(set, 'drums', sample_i)
                    other = self.retrieve_sequence(set, 'other', sample_i)
                    if np.random.uniform(0, 1) < self.extract_voice_percent:
                        indices = self.voice_indices[set][sample_i]
                        vocals_indices, _ = util.get_indices_subsequence(indices)
                        vocals = vocals[vocals_indices[0]:vocals_indices[1]]
                        starting_index = vocals_indices[0]
                    if len(vocals) < self.model.input_length:
                        sample_i = np.random.randint(0, len(self.sequences[set]['vocals']))
                    else:
                        break
                offset_1 = np.squeeze(np.random.randint(0, len(vocals) - self.model.input_length + 1, 1))
                vocals_fragment = vocals[offset_1:offset_1 + self.model.input_length]
                offset_2 = offset_1 + starting_index
                bass_fragment = bass[offset_2:offset_2 + self.model.input_length]
                drums_fragment = drums[offset_2:offset_2 + self.model.input_length]
                other_fragment = other[offset_2:offset_2 + self.model.input_length]
                input = vocals_fragment + bass_fragment + drums_fragment + other_fragment
                output_vocals = vocals_fragment
                output_drums = drums_fragment
                output_bass = bass_fragment
                batch_inputs.append(input)
                batch_outputs_1.append(output_vocals)
                batch_outputs_2.append(output_drums)
                batch_outputs_3.append(output_bass)
            batch_inputs = np.array(batch_inputs, dtype='float32')
            batch_outputs_1 = np.array(batch_outputs_1, dtype='float32')
            batch_outputs_2 = np.array(batch_outputs_2, dtype='float32')
            batch_outputs_3 = np.array(batch_outputs_3, dtype='float32')
            batch_outputs_1 = batch_outputs_1[:, self.model.get_padded_target_field_indices()]
            batch_outputs_2 = batch_outputs_2[:, self.model.get_padded_target_field_indices()]
            batch_outputs_3 = batch_outputs_3[:, self.model.get_padded_target_field_indices()]
            batch = {'data_input': batch_inputs}, {'data_output_1': batch_outputs_1,
                                                   'data_output_2': batch_outputs_2,
                                                   'data_output_3': batch_outputs_3}
            yield batch

    def get_condition_input_encode_func(self, representation):
        if representation == 'binary':
            return util.binary_encode
        else:
            return util.one_hot_encode

    def get_target_sample_index(self):
        return int(np.floor(self.fragment_length / 2.0))

    def get_samples_of_interest_indices(self, causal=False):
        if causal:
            return -1
        else:
            target_sample_index = self.get_target_sample_index()
            return range(target_sample_index - self.half_target_field_length - self.target_padding,
                         target_sample_index + self.half_target_field_length + self.target_padding + 1)

    def get_sample_weight_vector_length(self):
        if self.samples_of_interest_only:
            return len(self.get_samples_of_interest_indices())
        else:
            return self.fragment_length
================================================
FILE: environment.yml
================================================
name: sswavenet
channels:
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - intel-openmp=2018.0.0=hc7b2577_8
  - mkl=2018.0.1=h19d6760_4
  - mkl-service=1.1.2=py27hb2d42c5_4
  - ca-certificates=2018.1.18=0
  - certifi=2018.1.18=py27_0
  - h5py=2.7.1=py27_2
  - hdf5=1.10.1=2
  - keras=2.1.5=py27_0
  - libgpuarray=0.7.5=0
  - mako=1.0.7=py27_0
  - markupsafe=1.0=py27_0
  - openssl=1.0.2n=0
  - pygpu=0.7.5=py27_0
  - pyyaml=3.12=py27_1
  - six=1.11.0=py27_1
  - theano=1.0.1=py27_1
  - yaml=0.1.7=0
  - libedit=3.1=heed3624_0
  - libffi=3.2.1=hd88cf55_4
  - libgcc-ng=7.2.0=hdf63c60_3
  - libgfortran=3.0.0=1
  - libgfortran-ng=7.2.0=hdf63c60_3
  - libstdcxx-ng=7.2.0=hdf63c60_3
  - ncurses=6.0=h9df7e31_2
  - numpy=1.14.2=py27hdbf6ddf_0
  - pip=9.0.1=py27_5
  - python=2.7.14=h1571d57_30
  - readline=7.0=ha6073c6_4
  - scipy=1.0.0=py27hf5f0f52_0
  - setuptools=38.5.1=py27_0
  - sqlite=3.22.0=h1bed415_0
  - tk=8.6.7=hc745277_3
  - wheel=0.30.0=py27h2bc6bb2_1
  - zlib=1.2.11=ha838bed_2
  - pip:
      - cffi==1.11.5
      - functools32==3.2.3.post2
      - jsonschema==2.6.0
      - musdb==0.2.3
      - museval==0.2.0
      - pyaml==17.12.1
      - pycparser==2.18
      - simplejson==3.13.2
      - soundfile==0.9.0
      - stempeg==0.1.3
      - tqdm==4.19.7
================================================
FILE: layers.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Layers.py

import keras


class AddSingletonDepth(keras.layers.Layer):

    def call(self, x, mask=None):
        x = keras.backend.expand_dims(x, -1)  # add a dimension on the right
        if keras.backend.ndim(x) == 4:
            return keras.backend.permute_dimensions(x, (0, 3, 1, 2))
        else:
            return x

    def compute_output_shape(self, input_shape):
        if len(input_shape) == 3:
            return input_shape[0], 1, input_shape[1], input_shape[2]
        else:
            return input_shape[0], input_shape[1], 1


class Subtract(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(Subtract, self).__init__(**kwargs)

    def call(self, x, mask=None):
        return x[0] - x[1]

    def compute_output_shape(self, input_shape):
        return input_shape[0]


class Add(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(Add, self).__init__(**kwargs)

    def call(self, x, mask=None):
        output = x[0]
        for i in range(1, len(x)):
            output += x[i]
        return output

    def compute_output_shape(self, input_shape):
        return input_shape[0]


class Slice(keras.layers.Layer):

    def __init__(self, selector, output_shape, **kwargs):
        self.selector = selector
        self.desired_output_shape = output_shape
        super(Slice, self).__init__(**kwargs)

    def call(self, x, mask=None):
        selector = self.selector
        if len(self.selector) == 2 and not type(self.selector[1]) is slice and not type(self.selector[1]) is int:
            x = keras.backend.permute_dimensions(x, [0, 2, 1])
            selector = (self.selector[1], self.selector[0])
        y = x[selector]
        if len(self.selector) == 2 and not type(self.selector[1]) is slice and not type(self.selector[1]) is int:
            y = keras.backend.permute_dimensions(y, [0, 2, 1])
        return y

    def compute_output_shape(self, input_shape):
        output_shape = (None,)
        for i, dim_length in enumerate(self.desired_output_shape):
            if dim_length == Ellipsis:
                output_shape = output_shape + (input_shape[i + 1],)
            else:
                output_shape = output_shape + (dim_length,)
        return output_shape
================================================
FILE: main.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Main.py
import sys
import logging
import optparse
import json
import os
import models
import datasets
import util
import separate
def set_system_settings():
sys.setrecursionlimit(50000)
logging.getLogger().setLevel(logging.INFO)
def get_command_line_arguments():
parser = optparse.OptionParser()
parser.set_defaults(config='sessions/multi-instrument/config.json')
parser.set_defaults(mode='training')
parser.set_defaults(target='multi-instrument')
parser.set_defaults(load_checkpoint=None)
parser.set_defaults(condition_value=0)
parser.set_defaults(batch_size=None)
parser.set_defaults(one_shot=False)
parser.set_defaults(mixture_input_path=None)
parser.set_defaults(print_model_summary=False)
parser.set_defaults(target_field_length=None)
parser.add_option('--mode', dest='mode')
parser.add_option('--target', dest='target')
parser.add_option('--print_model_summary', dest='print_model_summary')
parser.add_option('--config', dest='config')
parser.add_option('--load_checkpoint', dest='load_checkpoint')
parser.add_option('--condition_value', dest='condition_value')
parser.add_option('--batch_size', dest='batch_size')
parser.add_option('--one_shot', dest='one_shot')
parser.add_option('--mixture_input_path', dest='mixture_input_path')
parser.add_option('--target_field_length', dest='target_field_length')
(options, args) = parser.parse_args()
return options
def load_config(config_filepath):
try:
config_file = open(config_filepath, 'r')
except IOError:
logging.error('No readable config file at path: ' + config_filepath)
exit()
else:
with config_file:
return json.load(config_file)
def get_dataset(config, cla, model):
if config['dataset']['type'] == 'musdb18':
if cla.target == 'singing-voice':
return datasets.SingingVoiceMUSDB18Dataset(config, model).load_dataset()
elif cla.target == 'multi-instrument':
return datasets.MultiInstrumentMUSDB18Dataset(config, model).load_dataset()
def training(config, cla):
# Instantiate Model
if cla.target == 'singing-voice':
model = models.SingingVoiceSeparationWavenet(config, load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
elif cla.target == 'multi-instrument':
model = models.MultiInstrumentSeparationWavenet(config, load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
else:
raise Exception("Argument target must be either 'singing-voice' or 'multi-instrument'")
dataset = get_dataset(config, cla, model)
num_steps_train = config['training']['num_steps_train']
num_steps_val = config['training']['num_steps_test']
train_set_generator = dataset.get_random_batch_generator('train')
val_set_generator = dataset.get_random_batch_generator('val')
model.fit_model(train_set_generator, num_steps_train, val_set_generator, num_steps_val,
config['training']['num_epochs'])
def get_valid_output_folder_path(outputs_folder_path):
j = 1
while True:
output_folder_name = 'samples_%d' % j
output_folder_path = os.path.join(outputs_folder_path, output_folder_name)
if not os.path.isdir(output_folder_path):
os.mkdir(output_folder_path)
break
j += 1
return output_folder_path
def inference(config, cla):
if cla.batch_size is not None:
batch_size = int(cla.batch_size)
else:
batch_size = config['training']['batch_size']
if cla.target_field_length is not None:
cla.target_field_length = int(cla.target_field_length)
if not bool(cla.one_shot):
if config['model']['type'] == 'singing-voice':
model = models.SingingVoiceSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
elif config['model']['type'] == 'multi-instrument':
model = models.MultiInstrumentSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
print 'Performing inference...'
else:
print 'Performing one-shot inference...'
samples_folder_path = os.path.join(config['training']['path'], 'samples')
output_folder_path = get_valid_output_folder_path(samples_folder_path)
# If mixture_input_path points to a single wav file, set filenames to that one file and keep its directory as the input path
if cla.mixture_input_path.endswith('.wav'):
filenames = [cla.mixture_input_path.rsplit('/', 1)[-1]]
cla.mixture_input_path = cla.mixture_input_path.rsplit('/', 1)[0] + '/'
else:
if not cla.mixture_input_path.endswith('/'):
cla.mixture_input_path += '/'
filenames = [filename for filename in os.listdir(cla.mixture_input_path) if filename.endswith('.wav')]
for filename in filenames:
mixture_input = util.load_wav(cla.mixture_input_path + filename, config['dataset']['sample_rate'])
input = {'mixture': mixture_input}
output_filename_prefix = filename[0:-4]
if bool(cla.one_shot):
if len(input['mixture']) % 2 == 0: # If input length is even, remove one sample
input['mixture'] = input['mixture'][:-1]
if config['model']['type'] == 'singing-voice':
model = models.SingingVoiceSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
elif config['model']['type'] == 'multi-instrument':
model = models.MultiInstrumentSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
print "Separating: " + filename
separate.separate_sample(model, input, batch_size, output_filename_prefix,
config['dataset']['sample_rate'], output_folder_path, config['model']['type'])
def main():
set_system_settings()
cla = get_command_line_arguments()
config = load_config(cla.config)
if cla.mode == 'training':
training(config, cla)
elif cla.mode == 'inference':
inference(config, cla)
if __name__ == "__main__":
main()
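Each inference run writes its separated stems into a fresh numbered folder under the session's `samples/` directory. A minimal, self-contained sketch of the numbering scheme used by `get_valid_output_folder_path` (the helper name here is mine, and a temporary directory stands in for the session path):

```python
import os
import tempfile

def next_samples_folder(outputs_folder_path):
    # Same scheme as get_valid_output_folder_path in main.py: create and
    # return the first 'samples_<j>' folder that does not exist yet.
    j = 1
    while True:
        path = os.path.join(outputs_folder_path, 'samples_%d' % j)
        if not os.path.isdir(path):
            os.mkdir(path)
            return path
        j += 1

root = tempfile.mkdtemp()
first = next_samples_folder(root)   # <root>/samples_1
second = next_samples_folder(root)  # <root>/samples_2
```

Because the counter restarts from 1 on every call, runs never overwrite each other's outputs: an existing `samples_1` simply pushes the next run into `samples_2`.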
================================================
FILE: models.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Models.py
import keras
import util
import os
import numpy as np
import layers
import logging
#Singing Voice Separation Wavenet Model
class SingingVoiceSeparationWavenet():
def __init__(self, config, load_checkpoint=None, input_length=None, target_field_length=None, print_model_summary=False):
self.config = config
self.verbosity = config['training']['verbosity']
self.num_stacks = self.config['model']['num_stacks']
if type(self.config['model']['dilations']) is int:
self.dilations = [2 ** i for i in range(0, self.config['model']['dilations'] + 1)]
elif type(self.config['model']['dilations']) is list:
self.dilations = self.config['model']['dilations']
self.receptive_field_length = util.compute_receptive_field_length(config['model']['num_stacks'], self.dilations,
config['model']['filters']['lengths']['res'],
1)
if input_length is not None:
self.input_length = input_length
self.target_field_length = self.input_length - (self.receptive_field_length - 1)
if target_field_length is not None:
self.target_field_length = target_field_length
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
else:
self.target_field_length = config['model']['target_field_length']
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
self.target_padding = config['model']['target_padding']
self.padded_target_field_length = self.target_field_length + 2 * self.target_padding
self.half_target_field_length = self.target_field_length / 2
self.half_receptive_field_length = self.receptive_field_length / 2
self.num_residual_blocks = len(self.dilations) * self.num_stacks
self.activation = keras.layers.Activation('relu')
self.samples_of_interest_indices = self.get_padded_target_field_indices()
self.target_sample_indices = self.get_target_field_indices()
self.optimizer = self.get_optimizer()
self.out_1_loss = self.get_out_1_loss()
self.out_2_loss = self.get_out_2_loss()
self.metrics = self.get_metrics()
self.epoch_num = 0
self.checkpoints_path = ''
self.samples_path = ''
self.history_filename = ''
self.config['model']['num_residual_blocks'] = self.num_residual_blocks
self.config['model']['receptive_field_length'] = self.receptive_field_length
self.config['model']['input_length'] = self.input_length
self.config['model']['target_field_length'] = self.target_field_length
self.config['model']['type'] = 'singing-voice'
self.model = self.setup_model(load_checkpoint, print_model_summary)
def setup_model(self, load_checkpoint=None, print_model_summary=False):
self.checkpoints_path = os.path.join(self.config['training']['path'], 'checkpoints')
self.samples_path = os.path.join(self.config['training']['path'], 'samples')
self.history_filename = 'history_' + self.config['training']['path'][
self.config['training']['path'].rindex('/') + 1:] + '.csv'
model = self.build_model()
if os.path.exists(self.checkpoints_path) and util.dir_contains_files(self.checkpoints_path):
if load_checkpoint is not None:
last_checkpoint_path = load_checkpoint
self.epoch_num = 0
else:
checkpoints = os.listdir(self.checkpoints_path)
checkpoints.sort(key=lambda x: os.stat(os.path.join(self.checkpoints_path, x)).st_mtime)
last_checkpoint = checkpoints[-1]
last_checkpoint_path = os.path.join(self.checkpoints_path, last_checkpoint)
self.epoch_num = int(last_checkpoint[11:16])  # five-digit epoch field in 'checkpoint.{epoch:05d}-...'
print 'Loading model from epoch: %d' % self.epoch_num
model.load_weights(last_checkpoint_path)
else:
print 'Building new model...'
if not os.path.exists(self.config['training']['path']):
os.mkdir(self.config['training']['path'])
if not os.path.exists(self.checkpoints_path):
os.mkdir(self.checkpoints_path)
self.epoch_num = 0
if not os.path.exists(self.samples_path):
os.mkdir(self.samples_path)
if print_model_summary:
model.summary()
model.compile(optimizer=self.optimizer,
loss={'data_output_1': self.out_1_loss, 'data_output_2': self.out_2_loss}, metrics=self.metrics)
self.config['model']['num_params'] = model.count_params()
config_path = os.path.join(self.config['training']['path'], 'config.json')
if not os.path.exists(config_path):
util.pretty_json_dump(self.config, config_path)
if print_model_summary:
util.pretty_json_dump(self.config)
return model
def get_optimizer(self):
return keras.optimizers.Adam(lr=self.config['optimizer']['lr'], decay=self.config['optimizer']['decay'],
epsilon=self.config['optimizer']['epsilon'])
def get_out_1_loss(self):
if self.config['training']['loss']['out_1']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_1']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_1']['l1'],
self.config['training']['loss']['out_1']['l2'])
def get_out_2_loss(self):
if self.config['training']['loss']['out_2']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_2']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_2']['l1'],
self.config['training']['loss']['out_2']['l2'])
def get_callbacks(self):
return [
keras.callbacks.EarlyStopping(patience=self.config['training']['early_stopping_patience'], verbose=1,
monitor='loss'),
keras.callbacks.ModelCheckpoint(os.path.join(self.checkpoints_path,
'checkpoint.{epoch:05d}-{val_loss:.3f}.hdf5')),
keras.callbacks.CSVLogger(os.path.join(self.config['training']['path'], self.history_filename), append=True)
]
def fit_model(self, train_set_generator, num_steps_train, test_set_generator, num_steps_test, num_epochs):
print('Fitting model with %d training steps and %d validation steps per epoch...' % (num_steps_train, num_steps_test))
self.model.fit_generator(train_set_generator,
num_steps_train,
epochs=num_epochs,
validation_data=test_set_generator,
validation_steps=num_steps_test,
callbacks=self.get_callbacks(),
verbose=self.verbosity,
initial_epoch=self.epoch_num)
def separate_batch(self, inputs):
return self.model.predict_on_batch(inputs)
def get_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length,
target_sample_index + self.half_target_field_length + 1)
def get_padded_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length - self.target_padding,
target_sample_index + self.half_target_field_length + self.target_padding + 1)
def get_target_sample_index(self):
return int(np.floor(self.input_length / 2.0))
def get_metrics(self):
return [
keras.metrics.mean_absolute_error,
self.valid_mean_absolute_error
]
def valid_mean_absolute_error(self, y_true, y_pred):
return keras.backend.mean(
keras.backend.abs(y_true[:, 1:-2] - y_pred[:, 1:-2]))
def build_model(self):
data_input = keras.engine.Input(
shape=(self.input_length,),
name='data_input')
data_expanded = layers.AddSingletonDepth()(data_input)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'], padding='same',
use_bias=False,
name='initial_causal_conv')(data_expanded)
skip_connections = []
res_block_i = 0
for stack_i in range(self.num_stacks):
layer_in_stack = 0
for dilation in self.dilations:
res_block_i += 1
data_out, skip_out = self.dilated_residual_block(data_out, res_block_i, layer_in_stack, dilation, stack_i)
if skip_out is not None:
skip_connections.append(skip_out)
layer_in_stack += 1
data_out = keras.layers.Add()(skip_connections)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][0],
self.config['model']['filters']['lengths']['final'][0],
padding='same',
use_bias=False)(data_out)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][1],
self.config['model']['filters']['lengths']['final'][1], padding='same',
use_bias=False)(data_out)
data_out = keras.layers.Convolution1D(1, 1)(data_out)
data_out_vocals_1 = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_1')(
data_out)
data_out_vocals_2 = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_2')(
data_out)
return keras.engine.Model(inputs=[data_input], outputs=[data_out_vocals_1, data_out_vocals_2])
def dilated_residual_block(self, data_x, res_block_i, layer_i, dilation, stack_i):
original_x = data_x
data_out = keras.layers.Conv1D(2 * self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'],
dilation_rate=dilation, padding='same',
use_bias=False,
name='res_%d_dilated_conv_d%d_s%d' % (
res_block_i, dilation, stack_i),
activation=None)(data_x)
data_out_1 = layers.Slice(
(Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_1_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
data_out_2 = layers.Slice(
(Ellipsis, slice(self.config['model']['filters']['depths']['res'],
2 * self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_2_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
tanh_out = keras.layers.Activation('tanh')(data_out_1)
sigm_out = keras.layers.Activation('sigmoid')(data_out_2)
data_x = keras.layers.Multiply(name='res_%d_gated_activation_%d_s%d' % (res_block_i, layer_i, stack_i))(
[tanh_out, sigm_out])
data_x = keras.layers.Convolution1D(
self.config['model']['filters']['depths']['res'] + self.config['model']['filters']['depths']['skip'], 1,
padding='same', use_bias=False)(data_x)
res_x = layers.Slice((Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_3_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((Ellipsis, slice(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['depths']['res'] +
self.config['model']['filters']['depths']['skip'])),
(self.input_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_data_slice_4_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((slice(self.samples_of_interest_indices[0], self.samples_of_interest_indices[-1] + 1, 1),
Ellipsis), (self.padded_target_field_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_keep_samples_of_interest_d%d_s%d' % (res_block_i, dilation, stack_i))(skip_x)
res_x = keras.layers.Add()([original_x, res_x])
return res_x, skip_x
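`setup_model` recovers the resume epoch from the newest checkpoint filename via the slice `last_checkpoint[11:16]`. This works because `ModelCheckpoint` writes names of the form `checkpoint.{epoch:05d}-{val_loss:...}.hdf5`, and the prefix `'checkpoint.'` is exactly 11 characters long, so positions 11 through 15 always hold the zero-padded epoch digits. A quick check against the checkpoints shipped in this repository:

```python
# 'checkpoint.' is 11 characters, so the epoch field occupies [11:16].
prefix = 'checkpoint.'
assert len(prefix) == 11

multi_instrument_ckpt = 'checkpoint.00045-0.hdf5'  # sessions/multi-instrument
singing_voice_ckpt = 'checkpoint.00058-0.hdf5'     # sessions/singing-voice

epoch_multi = int(multi_instrument_ckpt[11:16])
epoch_voice = int(singing_voice_ckpt[11:16])
print(epoch_multi, epoch_voice)  # 45 58
```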
# Multi-Instrument Separation Wavenet Model
class MultiInstrumentSeparationWavenet():
def __init__(self, config, load_checkpoint=None, input_length=None, target_field_length=None, print_model_summary=False):
self.config = config
self.verbosity = config['training']['verbosity']
self.num_stacks = self.config['model']['num_stacks']
if type(self.config['model']['dilations']) is int:
self.dilations = [2 ** i for i in range(0, self.config['model']['dilations'] + 1)]
elif type(self.config['model']['dilations']) is list:
self.dilations = self.config['model']['dilations']
self.receptive_field_length = util.compute_receptive_field_length(config['model']['num_stacks'], self.dilations,
config['model']['filters']['lengths']['res'],
1)
if input_length is not None:
self.input_length = input_length
self.target_field_length = self.input_length - (self.receptive_field_length - 1)
if target_field_length is not None:
self.target_field_length = target_field_length
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
else:
self.target_field_length = config['model']['target_field_length']
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
self.target_padding = config['model']['target_padding']
self.padded_target_field_length = self.target_field_length + 2 * self.target_padding
self.half_target_field_length = self.target_field_length / 2
self.half_receptive_field_length = self.receptive_field_length / 2
self.num_residual_blocks = len(self.dilations) * self.num_stacks
self.activation = keras.layers.Activation('relu')
self.samples_of_interest_indices = self.get_padded_target_field_indices()
self.target_sample_indices = self.get_target_field_indices()
self.optimizer = self.get_optimizer()
self.out_1_loss = self.get_out_1_loss()
self.out_2_loss = self.get_out_2_loss()
self.out_3_loss = self.get_out_3_loss()
self.metrics = self.get_metrics()
self.epoch_num = 0
self.checkpoints_path = ''
self.samples_path = ''
self.history_filename = ''
self.config['model']['num_residual_blocks'] = self.num_residual_blocks
self.config['model']['receptive_field_length'] = self.receptive_field_length
self.config['model']['input_length'] = self.input_length
self.config['model']['target_field_length'] = self.target_field_length
self.config['model']['type'] = 'multi-instrument'
self.model = self.setup_model(load_checkpoint, print_model_summary)
def setup_model(self, load_checkpoint=None, print_model_summary=False):
self.checkpoints_path = os.path.join(self.config['training']['path'], 'checkpoints')
self.samples_path = os.path.join(self.config['training']['path'], 'samples')
self.history_filename = 'history_' + self.config['training']['path'][
self.config['training']['path'].rindex('/') + 1:] + '.csv'
model = self.build_model()
if os.path.exists(self.checkpoints_path) and util.dir_contains_files(self.checkpoints_path):
if load_checkpoint is not None:
last_checkpoint_path = load_checkpoint
self.epoch_num = 0
else:
checkpoints = os.listdir(self.checkpoints_path)
checkpoints.sort(key=lambda x: os.stat(os.path.join(self.checkpoints_path, x)).st_mtime)
last_checkpoint = checkpoints[-1]
last_checkpoint_path = os.path.join(self.checkpoints_path, last_checkpoint)
self.epoch_num = int(last_checkpoint[11:16])  # five-digit epoch field in 'checkpoint.{epoch:05d}-...'
print 'Loading model from epoch: %d' % self.epoch_num
model.load_weights(last_checkpoint_path)
else:
print 'Building new model...'
if not os.path.exists(self.config['training']['path']):
os.mkdir(self.config['training']['path'])
if not os.path.exists(self.checkpoints_path):
os.mkdir(self.checkpoints_path)
self.epoch_num = 0
if not os.path.exists(self.samples_path):
os.mkdir(self.samples_path)
if print_model_summary:
model.summary()
model.compile(optimizer=self.optimizer,
loss={'data_output_1': self.out_1_loss, 'data_output_2': self.out_2_loss,
'data_output_3': self.out_3_loss}, metrics=self.metrics)
self.config['model']['num_params'] = model.count_params()
config_path = os.path.join(self.config['training']['path'], 'config.json')
if not os.path.exists(config_path):
util.pretty_json_dump(self.config, config_path)
if print_model_summary:
util.pretty_json_dump(self.config)
return model
def get_optimizer(self):
return keras.optimizers.Adam(lr=self.config['optimizer']['lr'], decay=self.config['optimizer']['decay'],
epsilon=self.config['optimizer']['epsilon'])
def get_out_1_loss(self):
if self.config['training']['loss']['out_1']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_1']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_1']['l1'],
self.config['training']['loss']['out_1']['l2'])
def get_out_2_loss(self):
if self.config['training']['loss']['out_2']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_2']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_2']['l1'],
self.config['training']['loss']['out_2']['l2'])
def get_out_3_loss(self):
if self.config['training']['loss']['out_3']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_3']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_3']['l1'],
self.config['training']['loss']['out_3']['l2'])
def get_callbacks(self):
return [
keras.callbacks.EarlyStopping(patience=self.config['training']['early_stopping_patience'], verbose=1,
monitor='loss'),
keras.callbacks.ModelCheckpoint(os.path.join(self.checkpoints_path,
'checkpoint.{epoch:05d}-{val_loss:.3f}.hdf5')),
keras.callbacks.CSVLogger(os.path.join(self.config['training']['path'], self.history_filename), append=True)
]
def fit_model(self, train_set_generator, num_steps_train, test_set_generator, num_steps_test, num_epochs):
print('Fitting model with %d training steps and %d validation steps per epoch...' % (num_steps_train, num_steps_test))
self.model.fit_generator(train_set_generator,
num_steps_train,
epochs=num_epochs,
validation_data=test_set_generator,
validation_steps=num_steps_test,
callbacks=self.get_callbacks(),
verbose=self.verbosity,
initial_epoch=self.epoch_num)
def separate_batch(self, inputs):
return self.model.predict_on_batch(inputs)
def get_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length,
target_sample_index + self.half_target_field_length + 1)
def get_padded_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length - self.target_padding,
target_sample_index + self.half_target_field_length + self.target_padding + 1)
def get_target_sample_index(self):
return int(np.floor(self.input_length / 2.0))
def get_metrics(self):
return [
keras.metrics.mean_absolute_error,
self.valid_mean_absolute_error
]
def valid_mean_absolute_error(self, y_true, y_pred):
return keras.backend.mean(
keras.backend.abs(y_true[:, 1:-2] - y_pred[:, 1:-2]))
def build_model(self):
data_input = keras.engine.Input(
shape=(self.input_length,),
name='data_input')
data_expanded = layers.AddSingletonDepth()(data_input)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'], padding='same',
use_bias=False,
name='initial_causal_conv')(data_expanded)
skip_connections = []
res_block_i = 0
for stack_i in range(self.num_stacks):
layer_in_stack = 0
for dilation in self.dilations:
res_block_i += 1
data_out, skip_out = self.dilated_residual_block(data_out, res_block_i, layer_in_stack, dilation, stack_i)
if skip_out is not None:
skip_connections.append(skip_out)
layer_in_stack += 1
data_out = keras.layers.Add()(skip_connections)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][0],
self.config['model']['filters']['lengths']['final'][0],
padding='same',
use_bias=False)(data_out)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][1],
self.config['model']['filters']['lengths']['final'][1], padding='same',
use_bias=False)(data_out)
data_out = keras.layers.Convolution1D(3, 1)(data_out)
data_out_vocals = layers.Slice((Ellipsis, slice(0, 1)), (self.padded_target_field_length, 1),
name='slice_data_output_1')(data_out)
data_out_drums = layers.Slice((Ellipsis, slice(1, 2)), (self.padded_target_field_length, 1),
name='slice_data_output_2')(data_out)
data_out_bass = layers.Slice((Ellipsis, slice(2, 3)), (self.padded_target_field_length, 1),
name='slice_data_output_3')(data_out)
data_out_vocals = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_1')(
data_out_vocals)
data_out_drums = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_2')(
data_out_drums)
data_out_bass = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_3')(
data_out_bass)
return keras.engine.Model(inputs=[data_input], outputs=[data_out_vocals, data_out_drums, data_out_bass])
def dilated_residual_block(self, data_x, res_block_i, layer_i, dilation, stack_i):
original_x = data_x
data_out = keras.layers.Conv1D(2 * self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'],
dilation_rate=dilation, padding='same',
use_bias=False,
name='res_%d_dilated_conv_d%d_s%d' % (
res_block_i, dilation, stack_i),
activation=None)(data_x)
data_out_1 = layers.Slice(
(Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_1_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
data_out_2 = layers.Slice(
(Ellipsis, slice(self.config['model']['filters']['depths']['res'],
2 * self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_2_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
tanh_out = keras.layers.Activation('tanh')(data_out_1)
sigm_out = keras.layers.Activation('sigmoid')(data_out_2)
data_x = keras.layers.Multiply(name='res_%d_gated_activation_%d_s%d' % (res_block_i, layer_i, stack_i))(
[tanh_out, sigm_out])
data_x = keras.layers.Convolution1D(
self.config['model']['filters']['depths']['res'] + self.config['model']['filters']['depths']['skip'], 1,
padding='same', use_bias=False)(data_x)
res_x = layers.Slice((Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_3_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((Ellipsis, slice(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['depths']['res'] +
self.config['model']['filters']['depths']['skip'])),
(self.input_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_data_slice_4_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((slice(self.samples_of_interest_indices[0], self.samples_of_interest_indices[-1] + 1, 1),
Ellipsis), (self.padded_target_field_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_keep_samples_of_interest_d%d_s%d' % (res_block_i, dilation, stack_i))(skip_x)
res_x = keras.layers.Add()([original_x, res_x])
return res_x, skip_x
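Both model classes derive `input_length` from the receptive field and the target field in `__init__`. Assuming `util.compute_receptive_field_length` implements the standard arithmetic for 'same'-padded dilated convolutions (a filter of length 3 at dilation d widens the receptive field by 2·d samples), the values stored in the session configs can be reproduced directly:

```python
# Values from sessions/*/config.json: 'dilations': 9 expands to 2**0..2**9,
# 'num_stacks': 4, res filter length 3, 'target_field_length': 1601.
num_stacks = 4
dilations = [2 ** i for i in range(0, 9 + 1)]
filter_length = 3
target_field_length = 1601

# Each dilated conv adds (filter_length - 1) * d samples to the receptive field.
receptive_field_length = 1 + num_stacks * (filter_length - 1) * sum(dilations)
input_length = receptive_field_length + (target_field_length - 1)
num_residual_blocks = num_stacks * len(dilations)

print(receptive_field_length, input_length, num_residual_blocks)  # 8185 9785 40
```

These match `receptive_field_length: 8185`, `input_length: 9785`, and `num_residual_blocks: 40` in both shipped configs, which is a useful sanity check when changing `dilations`, `num_stacks`, or `target_field_length`.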
================================================
FILE: separate.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Separate.py
from __future__ import division
import os
import util
import tqdm
import numpy as np
def separate_sample(model, input, batch_size, output_filename_prefix, sample_rate, output_path, target):
if target == 'singing-voice':
if len(input['mixture']) < model.receptive_field_length:
raise ValueError('Input is not long enough to be used with this model.')
num_output_samples = input['mixture'].shape[0] - (model.receptive_field_length - 1)
num_fragments = int(np.ceil(num_output_samples / model.target_field_length))
num_batches = int(np.ceil(num_fragments / batch_size))
vocals_output = []
num_pad_values = 0
fragment_i = 0
for batch_i in tqdm.tqdm(range(0, num_batches)):
if batch_i == num_batches - 1: # If it's the last batch
batch_size = num_fragments - batch_i * batch_size
input_batch = np.zeros((batch_size, model.input_length))
# Assemble batch
for batch_fragment_i in range(0, batch_size):
if fragment_i + model.target_field_length > num_output_samples:
remainder = input['mixture'][fragment_i:]
current_fragment = np.zeros((model.input_length,))
current_fragment[:remainder.shape[0]] = remainder
num_pad_values = model.input_length - remainder.shape[0]
else:
current_fragment = input['mixture'][fragment_i:fragment_i + model.input_length]
input_batch[batch_fragment_i, :] = current_fragment
fragment_i += model.target_field_length
separated_output_fragments = model.separate_batch({'data_input': input_batch})
if type(separated_output_fragments) is list:
vocals_output_fragment = separated_output_fragments[0]
vocals_output_fragment = vocals_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
vocals_output_fragment = vocals_output_fragment.flatten().tolist()
if type(separated_output_fragments) is float: # Degenerate single-sample output
vocals_output_fragment = [separated_output_fragments]
vocals_output = vocals_output + vocals_output_fragment
vocals_output = np.array(vocals_output)
if num_pad_values != 0:
vocals_output = vocals_output[:-num_pad_values]
mixture_valid_signal = input['mixture'][
model.half_receptive_field_length:model.half_receptive_field_length + len(vocals_output)]
accompaniment_output = mixture_valid_signal - vocals_output
output_vocals_filename = output_filename_prefix + '_vocals.wav'
output_accompaniment_filename = output_filename_prefix + '_accompaniment.wav'
output_vocals_filepath = os.path.join(output_path, output_vocals_filename)
output_accompaniment_filepath = os.path.join(output_path, output_accompaniment_filename)
util.write_wav(vocals_output, output_vocals_filepath, sample_rate)
util.write_wav(accompaniment_output, output_accompaniment_filepath, sample_rate)
if target == 'multi-instrument':
if len(input['mixture']) < model.receptive_field_length:
raise ValueError('Input is not long enough to be used with this model.')
num_output_samples = input['mixture'].shape[0] - (model.receptive_field_length - 1)
num_fragments = int(np.ceil(num_output_samples / model.target_field_length))
num_batches = int(np.ceil(num_fragments / batch_size))
vocals_output = []
drums_output = []
bass_output = []
num_pad_values = 0
fragment_i = 0
for batch_i in tqdm.tqdm(range(0, num_batches)):
if batch_i == num_batches - 1: # If it's the last batch
batch_size = num_fragments - batch_i * batch_size
input_batch = np.zeros((batch_size, model.input_length))
# Assemble batch
for batch_fragment_i in range(0, batch_size):
if fragment_i + model.target_field_length > num_output_samples:
remainder = input['mixture'][fragment_i:]
current_fragment = np.zeros((model.input_length,))
current_fragment[:remainder.shape[0]] = remainder
num_pad_values = model.input_length - remainder.shape[0]
else:
current_fragment = input['mixture'][fragment_i:fragment_i + model.input_length]
input_batch[batch_fragment_i, :] = current_fragment
fragment_i += model.target_field_length
separated_output_fragments = model.separate_batch({'data_input': input_batch})
if type(separated_output_fragments) is list:
vocals_output_fragment = separated_output_fragments[0]
drums_output_fragment = separated_output_fragments[1]
bass_output_fragment = separated_output_fragments[2]
vocals_output_fragment = vocals_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
vocals_output_fragment = vocals_output_fragment.flatten().tolist()
drums_output_fragment = drums_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
drums_output_fragment = drums_output_fragment.flatten().tolist()
bass_output_fragment = bass_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
bass_output_fragment = bass_output_fragment.flatten().tolist()
if type(separated_output_fragments) is float: # Degenerate single-sample output
vocals_output_fragment = [separated_output_fragments]
drums_output_fragment = [separated_output_fragments]
bass_output_fragment = [separated_output_fragments]
vocals_output = vocals_output + vocals_output_fragment
drums_output = drums_output + drums_output_fragment
bass_output = bass_output + bass_output_fragment
vocals_output = np.array(vocals_output)
drums_output = np.array(drums_output)
bass_output = np.array(bass_output)
if num_pad_values != 0:
vocals_output = vocals_output[:-num_pad_values]
drums_output = drums_output[:-num_pad_values]
bass_output = bass_output[:-num_pad_values]
mixture_valid_signal = input['mixture'][
model.half_receptive_field_length:model.half_receptive_field_length + len(vocals_output)]
other_output = mixture_valid_signal - vocals_output - drums_output - bass_output
output_vocals_filename = output_filename_prefix + '_vocals.wav'
output_drums_filename = output_filename_prefix + '_drums.wav'
output_bass_filename = output_filename_prefix + '_bass.wav'
output_other_filename = output_filename_prefix + '_other.wav'
output_vocals_filepath = os.path.join(output_path, output_vocals_filename)
output_drums_filepath = os.path.join(output_path, output_drums_filename)
output_bass_filepath = os.path.join(output_path, output_bass_filename)
output_other_filepath = os.path.join(output_path, output_other_filename)
util.write_wav(vocals_output, output_vocals_filepath, sample_rate)
util.write_wav(drums_output, output_drums_filepath, sample_rate)
util.write_wav(bass_output, output_bass_filepath, sample_rate)
util.write_wav(other_output, output_other_filepath, sample_rate)
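`separate_sample` tiles the mixture into overlapping windows of `input_length` samples, each contributing `target_field_length` separated samples; the final window is zero-padded to `input_length` and the surplus output is trimmed afterwards. The bookkeeping can be illustrated with small, hypothetical field sizes (the shipped models use 9785 / 8185 / 1601):

```python
import numpy as np

# Hypothetical field sizes for illustration only.
input_length = 25
receptive_field_length = 17
target_field_length = input_length - (receptive_field_length - 1)  # 9

mixture = np.arange(100, dtype=float)
num_output_samples = mixture.shape[0] - (receptive_field_length - 1)  # 84
num_fragments = int(np.ceil(num_output_samples / float(target_field_length)))

# The last fragment starts short of a full window, so it is zero-padded to
# input_length and num_pad_values output samples are dropped at the end,
# exactly as in separate_sample.
last_start = (num_fragments - 1) * target_field_length
num_pad_values = input_length - mixture[last_start:].shape[0]

# After trimming, every valid output sample is produced exactly once:
produced = num_fragments * target_field_length - num_pad_values
print(num_fragments, num_pad_values, produced)  # 10 6 84
```

The same identity explains the `vocals_output[:-num_pad_values]` trim above: the concatenated fragment outputs always line up with `num_output_samples` once the padding is removed.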
================================================
FILE: sessions/multi-instrument/checkpoints/checkpoint.00045-0.hdf5
================================================
[File too large to display: 38.3 MB]
================================================
FILE: sessions/multi-instrument/config.json
================================================
{
"dataset": {
"extract_voice_percentage": 0,
"in_memory_percentage": 1,
"path": "MUS",
"sample_rate": 16000,
"type": "musdb18"
},
"model": {
"condition_encoding": "binary",
"dilations": 9,
"filters": {
"depths": {
"final": [
2048,
256
],
"res": 64,
"skip": 64
},
"lengths": {
"final": [
3,
3
],
"res": 3,
"skip": 1
}
},
"input_length": 9785,
"num_params": 3277763,
"num_residual_blocks": 40,
"num_stacks": 4,
"receptive_field_length": 8185,
"target_field_length": 1601,
"target_padding": 1,
"type": "multi-instrument"
},
"optimizer": {
"decay": 0.0,
"epsilon": 1e-08,
"lr": 0.001,
"momentum": 0.9,
"type": "adam"
},
"training": {
"batch_size": 10,
"early_stopping_patience": 16,
"loss": {
"out_1": {
"l1": 1,
"l2": 0,
"weight": 1
},
"out_2": {
"l1": 1,
"l2": 0,
"weight": 1
},
"out_3": {
"l1": 1,
"l2": 0,
"weight": 1
}
},
"num_epochs": 250,
"num_steps_test": 500,
"num_steps_train": 2000,
"path": "sessions/multi-instrument",
"verbosity": 1
}
}
================================================
FILE: sessions/singing-voice/checkpoints/checkpoint.00058-0.hdf5
================================================
[File too large to display: 38.3 MB]
================================================
FILE: sessions/singing-voice/config.json
================================================
{
"dataset": {
"extract_voice_percentage": 0.5,
"in_memory_percentage": 1,
"path": "data/MUS",
"sample_rate": 16000,
"type": "musdb18"
},
"model": {
"condition_encoding": "binary",
"dilations": 9,
"filters": {
"depths": {
"final": [
2048,
256
],
"res": 64,
"skip": 64
},
"lengths": {
"final": [
3,
3
],
"res": 3,
"skip": 1
}
},
"input_length": 9785,
"num_params": 3277249,
"num_residual_blocks": 40,
"num_stacks": 4,
"receptive_field_length": 8185,
"target_field_length": 1601,
"target_padding": 1,
"type": "singing-voice"
},
"optimizer": {
"decay": 0.0,
"epsilon": 1e-08,
"lr": 0.001,
"momentum": 0.9,
"type": "adam"
},
"training": {
"batch_size": 10,
"early_stopping_patience": 16,
"loss": {
"out_1": {
"l1": 1,
"l2": 0,
"weight": 1
},
"out_2": {
"l1": 1,
"l2": 0,
"weight": -0.05
}
},
"num_epochs": 250,
"num_steps_test": 500,
"num_steps_train": 2000,
"path": "sessions/singing-voice",
"verbosity": 1
}
}
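Note the negative `weight` (-0.05) on `out_2` in the singing-voice loss config above, versus the uniform weights in the multi-instrument session. A hedged numpy sketch of how such per-output entries plausibly combine, assuming the total loss is the weighted sum over outputs of `l1 * MAE + l2 * MSE` (mirroring `util.l1_l2_loss`); the prediction values here are made up for illustration:

```python
import numpy as np

def l1_l2(y_true, y_pred, l1_w, l2_w):
    # Weighted combination of mean absolute and mean squared error.
    diff = y_pred - y_true
    return l1_w * np.mean(np.abs(diff)) + l2_w * np.mean(diff ** 2)

losses = {'out_1': {'l1': 1, 'l2': 0, 'weight': 1},
          'out_2': {'l1': 1, 'l2': 0, 'weight': -0.05}}
y_true = {'out_1': np.zeros(4), 'out_2': np.zeros(4)}
y_pred = {'out_1': np.full(4, 0.1), 'out_2': np.full(4, 0.2)}

total = sum(cfg['weight'] * l1_l2(y_true[k], y_pred[k], cfg['l1'], cfg['l2'])
            for k, cfg in losses.items())
print(round(total, 3))  # 0.09 = 1*0.1 + (-0.05)*0.2
```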
================================================
FILE: util.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Util.py
# Utility functions for dealing with audio signals and training a Source Separation Wavenet
import os
import numpy as np
import json
import warnings
import scipy.signal
import scipy.stats
import soundfile as sf
import keras
import glob
def l1_l2_loss(y_true, y_pred, l1_weight, l2_weight):
loss = 0
if l1_weight != 0:
loss += l1_weight*keras.losses.mean_absolute_error(y_true, y_pred)
if l2_weight != 0:
loss += l2_weight * keras.losses.mean_squared_error(y_true, y_pred)
return loss
def compute_receptive_field_length(stacks, dilations, filter_length, target_field_length):
    half_filter_length = (filter_length - 1) // 2  # integer division: keep the result a valid sample count
    length = 0
    for d in dilations:
        length += d * half_filter_length
    length = 2 * length
    length = stacks * length
    length += target_field_length
    return length
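A worked check of this formula against the shipped session configs. The assumption here is that the configs' `"dilations": 9` denotes dilation rates 2^0 .. 2^9 per stack; with 4 stacks and a residual filter length of 3 the numbers line up with the stored `receptive_field_length` and `input_length`:

```python
# Assumed dilation schedule: powers of two up to 2**9.
dilations = [2 ** i for i in range(10)]            # 1, 2, 4, ..., 512
half_filter_length = (3 - 1) // 2                  # "res" filter length 3
per_stack = 2 * sum(d * half_filter_length for d in dilations)
receptive_field = 4 * per_stack + 1                # 4 stacks, single target sample
print(receptive_field)                             # 8185, as in config.json

# With target_field_length = 1601, the required input length follows:
print(receptive_field + 1601 - 1)                  # 9785, the configs' "input_length"
```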
def wav_to_float(x):
    try:
        max_value = np.iinfo(x.dtype).max
        min_value = np.iinfo(x.dtype).min
    except ValueError:  # np.iinfo raises ValueError for floating-point dtypes
        max_value = np.finfo(x.dtype).max
        min_value = np.finfo(x.dtype).min
    x = x.astype('float64', casting='safe')
    x -= min_value
    x /= ((max_value - min_value) / 2.)
    x -= 1.
    return x
def float_to_uint8(x):
x += 1.
x /= 2.
uint8_max_value = np.iinfo('uint8').max
x *= uint8_max_value
x = x.astype('uint8')
return x
def keras_float_to_uint8(x):
x += 1.
x /= 2.
uint8_max_value = 255
x *= uint8_max_value
return x
def linear_to_ulaw(x, u=255):
x = np.sign(x) * (np.log(1 + u * np.abs(x)) / np.log(1 + u))
return x
def keras_linear_to_ulaw(x, u=255.0):
x = keras.backend.sign(x) * (keras.backend.log(1 + u * keras.backend.abs(x)) / keras.backend.log(1 + u))
return x
def uint8_to_float(x):
max_value = np.iinfo('uint8').max
min_value = np.iinfo('uint8').min
x = x.astype('float32', casting='unsafe')
x -= min_value
x /= ((max_value - min_value) / 2.)
x -= 1.
return x
def keras_uint8_to_float(x):
max_value = 255
min_value = 0
x -= min_value
x /= ((max_value - min_value) / 2.)
x -= 1.
return x
def ulaw_to_linear(x, u=255.0):
y = np.sign(x) * (1 / float(u)) * (((1 + float(u)) ** np.abs(x)) - 1)
return y
def keras_ulaw_to_linear(x, u=255.0):
y = keras.backend.sign(x) * (1 / u) * (((1 + u) ** keras.backend.abs(x)) - 1)
return y
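The numpy companding pair above should be exact inverses of each other: `ulaw_to_linear` undoes `linear_to_ulaw` for any signal in [-1, 1]. A quick self-contained numerical check (functions copied from above):

```python
import numpy as np

def linear_to_ulaw(x, u=255):
    # mu-law compression: sign(x) * log(1 + u|x|) / log(1 + u)
    return np.sign(x) * (np.log(1 + u * np.abs(x)) / np.log(1 + u))

def ulaw_to_linear(x, u=255.0):
    # mu-law expansion: the exact inverse of the compression above
    return np.sign(x) * (1 / float(u)) * (((1 + float(u)) ** np.abs(x)) - 1)

x = np.linspace(-1.0, 1.0, 101)
roundtrip = ulaw_to_linear(linear_to_ulaw(x))
print(np.allclose(roundtrip, x))  # True
```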
def one_hot_encode(x, num_values=256):
if isinstance(x, int):
x = np.array([x])
if isinstance(x, list):
x = np.array(x)
return np.eye(num_values, dtype='uint8')[x.astype('uint8')]
def one_hot_decode(x):
return np.argmax(x, axis=-1)
def preemphasis(signal, alpha=0.95):
return np.append(signal[0], signal[1:] - alpha * signal[:-1])
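`preemphasis` implements the first-order high-pass filter y[0] = x[0], y[n] = x[n] - alpha*x[n-1]; on a constant signal everything after the first sample collapses to (1 - alpha):

```python
import numpy as np

def preemphasis(signal, alpha=0.95):
    # y[0] = x[0]; y[n] = x[n] - alpha * x[n-1] for n >= 1
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

x = np.ones(4)
print(preemphasis(x))  # [1.   0.05 0.05 0.05]
```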
def binary_encode(x, max_value):
if isinstance(x, int):
x = np.array([x])
if isinstance(x, list):
x = np.array(x)
width = np.ceil(np.log2(max_value)).astype(int)
return (((x[:, None] & (1 << np.arange(width)))) > 0).astype(int)
def get_condition_input_encode_func(representation):
if representation == 'binary':
return binary_encode
else:
return one_hot_encode
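The two condition encoders dispatched above produce quite different shapes for the same condition id: `binary_encode` yields a compact bit vector (least significant bit first), while `one_hot_encode` yields a full indicator row. A self-contained sketch with slightly simplified input handling (`np.atleast_1d` replaces the int/list branches):

```python
import numpy as np

def binary_encode(x, max_value):
    # Bit vector per value, LSB first; width covers ids up to max_value.
    x = np.atleast_1d(np.asarray(x))
    width = int(np.ceil(np.log2(max_value)))
    return ((x[:, None] & (1 << np.arange(width))) > 0).astype(int)

def one_hot_encode(x, num_values=256):
    # Indicator row per value.
    x = np.atleast_1d(np.asarray(x))
    return np.eye(num_values, dtype='uint8')[x.astype('uint8')]

print(binary_encode(5, 8))              # [[1 0 1]]  (5 = 0b101, LSB first)
print(one_hot_encode(2, num_values=4))  # [[0 0 1 0]]
```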
def ensure_keys_in_dict(keys, dictionary):
    return all(key in dictionary for key in keys)
def get_subdict_from_dict(keys, dictionary):
return dict((k, dictionary[k]) for k in keys if k in dictionary)
def pretty_json_dump(values, file_path=None):
    if file_path is None:
        print(json.dumps(values, sort_keys=True, indent=4, separators=(',', ': ')))
    else:
        with open(file_path, 'w') as f:  # close the file handle when done
            json.dump(values, f, sort_keys=True, indent=4, separators=(',', ': '))
def read_wav(filename):
# Reads in a wav audio file, averages both if stereo, converts the signal to float64 representation
audio_signal, sample_rate = sf.read(filename)
if audio_signal.ndim > 1:
audio_signal = (audio_signal[:, 0] + audio_signal[:, 1])/2.0
if audio_signal.dtype != 'float64':
audio_signal = wav_to_float(audio_signal)
return audio_signal, sample_rate
def load_wav(wav_path, desired_sample_rate):
sequence, sample_rate = read_wav(wav_path)
sequence = ensure_sample_rate(sequence, desired_sample_rate, sample_rate)
return sequence
def write_wav(x, filename, sample_rate):
    if not isinstance(x, np.ndarray):
        x = np.array(x)
with warnings.catch_warnings():
warnings.simplefilter("error")
sf.write(filename, x, sample_rate)
def ensure_sample_rate(x, desired_sample_rate, file_sample_rate):
if file_sample_rate != desired_sample_rate:
return scipy.signal.resample_poly(x, desired_sample_rate, file_sample_rate)
return x
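`ensure_sample_rate` delegates to polyphase resampling, which reduces the rate ratio internally (e.g. 16000/44100 to 160/441). A small self-contained check with a synthetic one-second tone, matching the 16 kHz target rate used in the session configs:

```python
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 44100, 16000                 # source-file rate down to the model's rate
t = np.arange(sr_in) / float(sr_in)          # one second of audio
x = np.sin(2 * np.pi * 440.0 * t)            # synthetic 440 Hz tone
y = resample_poly(x, sr_out, sr_in)          # up/down ratio reduced internally by gcd
print(len(y))  # 16000
```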
def normalize(x):
max_peak = np.max(np.abs(x))
return x / max_peak
def get_sequence_with_singing_indices(full_sequence):
signal_magnitude = np.abs(full_sequence)
chunk_length = 800
chunks_energies = []
    for i in range(0, len(signal_magnitude), chunk_length):
        chunks_energies.append(np.mean(signal_magnitude[i:i + chunk_length]))
threshold = np.max(chunks_energies) * .1
chunks_energies = np.asarray(chunks_energies)
chunks_energies[np.where(chunks_energies < threshold)] = 0
onsets = np.zeros(len(chunks_energies))
onsets[np.nonzero(chunks_energies)] = 1
onsets = np.diff(onsets)
start_ind = np.squeeze(np.where(onsets == 1))
finish_ind = np.squeeze(np.where(onsets == -1))
if finish_ind[0] < start_ind[0]:
finish_ind = finish_ind[1:]
if start_ind[-1] > finish_ind[-1]:
start_ind = start_ind[:-1]
indices_inici_final = np.insert(finish_ind, np.arange(len(start_ind)), start_ind)
return np.squeeze((np.asarray(indices_inici_final) + 1) * chunk_length)
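The core of the onset/offset detection above is differentiating a binary activity mask: after thresholding, `np.diff` turns each 0-to-1 transition into +1 (singing starts) and each 1-to-0 transition into -1 (singing stops). A toy illustration with a hand-made mask:

```python
import numpy as np

# Per-chunk voice-activity mask (1 = energy above threshold).
active = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
transitions = np.diff(active)
starts = np.where(transitions == 1)[0]      # chunk index just before singing begins
finishes = np.where(transitions == -1)[0]   # chunk index just before singing ends
print(starts.tolist(), finishes.tolist())   # [1, 6] [4, 8]
```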
def get_indices_subsequence(indices):
start_indice = 2 * np.random.randint(0, np.ceil(len(indices) / 2))
vocals_indices = (indices[start_indice], indices[start_indice + 1])
accompaniment_indices = vocals_indices
return vocals_indices, accompaniment_indices
def contains_voice(fragment, sequence):
signal_fragment_magnitude = np.abs(fragment)
signal_sequence_magnitude = np.abs(sequence)
chunk_length = 800
chunks_fragment_energies = []
    for i in range(0, len(signal_fragment_magnitude), chunk_length):
        chunks_fragment_energies.append(np.mean(signal_fragment_magnitude[i:i + chunk_length]))
    chunks_sequence_energies = []
    for i in range(0, len(signal_sequence_magnitude), chunk_length):
        chunks_sequence_energies.append(np.mean(signal_sequence_magnitude[i:i + chunk_length]))
threshold = np.max(chunks_sequence_energies) * .1
chunks_fragment_energies = np.asarray(chunks_fragment_energies)
chunks_fragment_energies[np.where(chunks_fragment_energies < threshold)] = 0
    return np.count_nonzero(chunks_fragment_energies) > 0
def dir_contains_files(path):
for f in os.listdir(path):
if not f.startswith('.'):
return True
return False
SYMBOL INDEX (107 symbols across 6 files)
FILE: datasets.py
class SingingVoiceMUSDB18Dataset (line 11) | class SingingVoiceMUSDB18Dataset():
method __init__ (line 13) | def __init__(self, config, model):
method load_dataset (line 28) | def load_dataset(self):
method load_songs (line 46) | def load_songs(self):
method get_num_sequences_in_dataset (line 70) | def get_num_sequences_in_dataset(self):
method retrieve_sequence (line 74) | def retrieve_sequence(self, set, condition, sequence_num):
method get_random_batch_generator (line 87) | def get_random_batch_generator(self, set):
method get_condition_input_encode_func (line 143) | def get_condition_input_encode_func(self, representation):
method get_target_sample_index (line 150) | def get_target_sample_index(self):
method get_samples_of_interest_indices (line 153) | def get_samples_of_interest_indices(self, causal=False):
method get_sample_weight_vector_length (line 162) | def get_sample_weight_vector_length(self):
class MultiInstrumentMUSDB18Dataset (line 169) | class MultiInstrumentMUSDB18Dataset():
method __init__ (line 171) | def __init__(self, config, model):
method load_dataset (line 186) | def load_dataset(self):
method load_songs (line 204) | def load_songs(self):
method get_num_sequences_in_dataset (line 228) | def get_num_sequences_in_dataset(self):
method retrieve_sequence (line 232) | def retrieve_sequence(self, set, condition, sequence_num):
method get_random_batch_generator (line 245) | def get_random_batch_generator(self, set):
method get_condition_input_encode_func (line 311) | def get_condition_input_encode_func(self, representation):
method get_target_sample_index (line 318) | def get_target_sample_index(self):
method get_samples_of_interest_indices (line 321) | def get_samples_of_interest_indices(self, causal=False):
method get_sample_weight_vector_length (line 330) | def get_sample_weight_vector_length(self):
FILE: layers.py
class AddSingletonDepth (line 7) | class AddSingletonDepth(keras.layers.Layer):
method call (line 9) | def call(self, x, mask=None):
method compute_output_shape (line 17) | def compute_output_shape(self, input_shape):
class Subtract (line 24) | class Subtract(keras.layers.Layer):
method __init__ (line 26) | def __init__(self, **kwargs):
method call (line 29) | def call(self, x, mask=None):
method compute_output_shape (line 32) | def compute_output_shape(self, input_shape):
class Add (line 36) | class Add(keras.layers.Layer):
method __init__ (line 38) | def __init__(self, **kwargs):
method call (line 41) | def call(self, x, mask=None):
method compute_output_shape (line 47) | def compute_output_shape(self, input_shape):
class Slice (line 51) | class Slice(keras.layers.Layer):
method __init__ (line 53) | def __init__(self, selector, output_shape, **kwargs):
method call (line 58) | def call(self, x, mask=None):
method compute_output_shape (line 73) | def compute_output_shape(self, input_shape):
FILE: main.py
function set_system_settings (line 15) | def set_system_settings():
function get_command_line_arguments (line 20) | def get_command_line_arguments():
function load_config (line 49) | def load_config(config_filepath):
function get_dataset (line 60) | def get_dataset(config, cla, model):
function training (line 69) | def training(config, cla):
function get_valid_output_folder_path (line 92) | def get_valid_output_folder_path(outputs_folder_path):
function inference (line 104) | def inference(config, cla):
function main (line 170) | def main():
FILE: models.py
class SingingVoiceSeparationWavenet (line 13) | class SingingVoiceSeparationWavenet():
method __init__ (line 15) | def __init__(self, config, load_checkpoint=None, input_length=None, ta...
method setup_model (line 66) | def setup_model(self, load_checkpoint=None, print_model_summary=False):
method get_optimizer (line 118) | def get_optimizer(self):
method get_out_1_loss (line 123) | def get_out_1_loss(self):
method get_out_2_loss (line 132) | def get_out_2_loss(self):
method get_callbacks (line 141) | def get_callbacks(self):
method fit_model (line 151) | def fit_model(self, train_set_generator, num_steps_train, test_set_gen...
method separate_batch (line 164) | def separate_batch(self, inputs):
method get_target_field_indices (line 167) | def get_target_field_indices(self):
method get_padded_target_field_indices (line 174) | def get_padded_target_field_indices(self):
method get_target_sample_index (line 181) | def get_target_sample_index(self):
method get_metrics (line 184) | def get_metrics(self):
method valid_mean_absolute_error (line 191) | def valid_mean_absolute_error(self, y_true, y_pred):
method build_model (line 195) | def build_model(self):
method dilated_residual_block (line 245) | def dilated_residual_block(self, data_x, res_block_i, layer_i, dilatio...
class MultiInstrumentSeparationWavenet (line 299) | class MultiInstrumentSeparationWavenet():
method __init__ (line 301) | def __init__(self, config, load_checkpoint=None, input_length=None, ta...
method setup_model (line 353) | def setup_model(self, load_checkpoint=None, print_model_summary=False):
method get_optimizer (line 406) | def get_optimizer(self):
method get_out_1_loss (line 411) | def get_out_1_loss(self):
method get_out_2_loss (line 420) | def get_out_2_loss(self):
method get_out_3_loss (line 429) | def get_out_3_loss(self):
method get_callbacks (line 437) | def get_callbacks(self):
method fit_model (line 447) | def fit_model(self, train_set_generator, num_steps_train, test_set_gen...
method separate_batch (line 460) | def separate_batch(self, inputs):
method get_target_field_indices (line 463) | def get_target_field_indices(self):
method get_padded_target_field_indices (line 470) | def get_padded_target_field_indices(self):
method get_target_sample_index (line 477) | def get_target_sample_index(self):
method get_metrics (line 480) | def get_metrics(self):
method valid_mean_absolute_error (line 487) | def valid_mean_absolute_error(self, y_true, y_pred):
method build_model (line 491) | def build_model(self):
method dilated_residual_block (line 552) | def dilated_residual_block(self, data_x, res_block_i, layer_i, dilatio...
FILE: separate.py
function separate_sample (line 11) | def separate_sample(model, input, batch_size, output_filename_prefix, sa...
FILE: util.py
function l1_l2_loss (line 16) | def l1_l2_loss(y_true, y_pred, l1_weight, l2_weight):
function compute_receptive_field_length (line 29) | def compute_receptive_field_length(stacks, dilations, filter_length, tar...
function wav_to_float (line 41) | def wav_to_float(x):
function float_to_uint8 (line 56) | def float_to_uint8(x):
function keras_float_to_uint8 (line 66) | def keras_float_to_uint8(x):
function linear_to_ulaw (line 75) | def linear_to_ulaw(x, u=255):
function keras_linear_to_ulaw (line 81) | def keras_linear_to_ulaw(x, u=255.0):
function uint8_to_float (line 87) | def uint8_to_float(x):
function keras_uint8_to_float (line 98) | def keras_uint8_to_float(x):
function ulaw_to_linear (line 108) | def ulaw_to_linear(x, u=255.0):
function keras_ulaw_to_linear (line 114) | def keras_ulaw_to_linear(x, u=255.0):
function one_hot_encode (line 120) | def one_hot_encode(x, num_values=256):
function one_hot_decode (line 129) | def one_hot_decode(x):
function preemphasis (line 134) | def preemphasis(signal, alpha=0.95):
function binary_encode (line 139) | def binary_encode(x, max_value):
function get_condition_input_encode_func (line 149) | def get_condition_input_encode_func(representation):
function ensure_keys_in_dict (line 157) | def ensure_keys_in_dict(keys, dictionary):
function get_subdict_from_dict (line 164) | def get_subdict_from_dict(keys, dictionary):
function pretty_json_dump (line 169) | def pretty_json_dump(values, file_path=None):
function read_wav (line 177) | def read_wav(filename):
function load_wav (line 191) | def load_wav(wav_path, desired_sample_rate):
function write_wav (line 198) | def write_wav(x, filename, sample_rate):
function ensure_sample_rate (line 208) | def ensure_sample_rate(x, desired_sample_rate, file_sample_rate):
function normalize (line 215) | def normalize(x):
function get_sequence_with_singing_indices (line 220) | def get_sequence_with_singing_indices(full_sequence):
function get_indices_subsequence (line 251) | def get_indices_subsequence(indices):
function contains_voice (line 260) | def contains_voice(fragment, sequence):
function dir_contains_files (line 285) | def dir_contains_files(path):