Repository: francesclluis/source-separation-wavenet
Branch: master
Commit: c80bb531f32d
Files: 16
Total size: 76.7 MB
Directory structure:
gitextract_t_rsro8e/
├── LICENSE
├── README.md
├── config.md
├── config_multi_instrument.json
├── config_singing_voice.json
├── datasets.py
├── environment.yml
├── layers.py
├── main.py
├── models.py
├── separate.py
├── sessions/
│ ├── multi-instrument/
│ │ ├── checkpoints/
│ │ │ └── checkpoint.00045-0.hdf5
│ │ └── config.json
│ └── singing-voice/
│ ├── checkpoints/
│ │ └── checkpoint.00058-0.hdf5
│ └── config.json
└── util.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2018 Francesc Lluís Salvadó
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
A Wavenet for Music Source Separation
====
A neural network for end-to-end music source separation, as described in [End-to-end music source separation:
is it possible in the waveform domain?](https://arxiv.org/abs/1810.12187)
Listen to separated samples [here](http://jordipons.me/apps/end-to-end-music-source-separation/)
What is a Wavenet for Music Source Separation?
-----
The Wavenet for Music Source Separation is a fully convolutional neural network that directly operates on the raw audio waveform.
It is an adaptation of [Wavenet](https://deepmind.com/blog/wavenet-generative-model-raw-audio/) that turns the original causal model (which is generative and slow) into a non-causal model (which is discriminative and parallelizable). This idea was originally proposed by [Rethage et al.](https://arxiv.org/abs/1706.07162) for speech denoising and is here adapted for monaural music source separation. Their [code](https://github.com/drethage/speech-denoising-wavenet) is reused.
The main difference between the original Wavenet and the non-causal adaptation used here is that some samples from the future can be used to predict the present one. As a result of removing the autoregressive causal nature of the original Wavenet, this fully convolutional model is able to predict a target field instead of one sample at a time. Thanks to this parallelization, it is possible to run the model in real-time on a GPU.
See the diagram below for a summary of the network architecture.
Installation
-----
1. `git clone https://github.com/francesclluis/source-separation-wavenet.git`
2. Install [conda](https://conda.io/docs/user-guide/install/index.html)
3. `conda env create -f environment.yml`
4. `source activate sswavenet`
*Currently the project requires **Keras 2.1** and **Theano 1.0.1**; the large dilations present in the architecture are not supported by the current version of Tensorflow.*
Usage
-----
A pre-trained multi-instrument model (the best-performing model described in the paper) can be found in `sessions/multi-instrument/checkpoints` and is ready to be used out-of-the-box. The parameterization of this model is specified in `sessions/multi-instrument/config.json`.
A pre-trained singing-voice model (the best-performing model described in the paper) can be found in `sessions/singing-voice/checkpoints` and is ready to be used out-of-the-box. The parameterization of this model is specified in `sessions/singing-voice/config.json`.
*Download the dataset as described [below](https://github.com/francesclluis/source-separation-wavenet#dataset)*
#### Source Separation:
Example (multi-instrument): `THEANO_FLAGS=device=cuda python main.py --mode inference --config sessions/multi-instrument/config.json --mixture_input_path audio/`
Example (singing-voice): `THEANO_FLAGS=device=cuda python main.py --mode inference --config sessions/singing-voice/config.json --mixture_input_path audio/`
###### Speedup
To achieve faster source separation, one can increase the target-field length via the optional `--target_field_length` argument. This defines the number of samples that are separated in a single forward propagation, saving redundant calculations. In the following example, it is increased to 10x the length used during training, and the batch_size is reduced to 4.
Faster Example: `THEANO_FLAGS=device=cuda python main.py --mode inference --target_field_length 16001 --batch_size 4 --config sessions/multi-instrument/config.json --mixture_input_path audio/`
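As a rough illustration of the savings (illustrative numbers, assuming a 3-minute mixture at the configured 16 kHz sample rate):

```python
# Each forward pass emits target_field_length separated samples, so the
# number of passes needed for a clip is ceil(samples / target_field_length).
samples = 180 * 16000                  # 3 minutes of audio at 16 kHz
passes_default = -(-samples // 1601)   # target field length from config.json
passes_faster = -(-samples // 16001)   # 10x larger target field
print(passes_default, passes_faster)   # roughly a 10x reduction in passes
```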
#### Training:
Example (multi-instrument): `THEANO_FLAGS=device=cuda python main.py --mode training --target multi-instrument --config config_multi_instrument.json`
Example (singing-voice): `THEANO_FLAGS=device=cuda python main.py --mode training --target singing-voice --config config_singing_voice.json`
#### Configuration
A detailed description of all configurable parameters can be found in [config.md](https://github.com/francesclluis/source-separation-wavenet/blob/master/config.md)
#### Optional command-line arguments:
Argument | Valid Inputs | Default | Description
-------- | ---- | ---------- | -----
mode | [training, inference] | training | Train a model or separate audio with a trained one
target | [multi-instrument, singing-voice] | multi-instrument | Target of the model to train
config | string | config.json | Path to JSON-formatted config file
print_model_summary | bool | False | Prints verbose summary of the model
load_checkpoint | string | None | Path to hdf5 file containing a snapshot of model weights
#### Additional arguments during source separation:
Argument | Valid Inputs | Default | Description
-------- | ------------ | ------- | -----------
one_shot | bool | False | Separates each audio file in a single forward propagation
target_field_length | int | as defined in config.json | Overrides parameter in config.json for separating with different target-field lengths than used in training
batch_size | int | as defined in config.json | # of samples per batch
Dataset
-----
The MUSDB18 dataset is used for training the model. It is provided by the Community-Based Signal Separation Evaluation Campaign (SiSEC).
1. [Download here](https://sigsep.github.io/datasets/musdb.html#download)
2. Decode dataset to WAV format as explained [here](https://github.com/sigsep/sigsep-mus-io)
3. Extract to `data/MUSDB`
================================================
FILE: config.md
================================================
config.json - Configuring a training session
----
The parameters present in a `config.json` file allow one to configure a training session. Each of these parameters is described below:
### Dataset
How the data is used for training
* **extract_voice_percentage**: (float) Probability of drawing training fragments that contain singing voice (rather than silence in the vocal stem)
* **in_memory_percentage**: (float) Percentage of the dataset to load into memory, useful when dataset requires more memory than available
* **path**: (string) Path to dataset
* **sample_rate**: (int) Sample rate to which all samples should be resampled
* **type**: (string) Identifier of which dataset is being used for training
### Model
What the model will be
* **condition_encoding**: (string) Which numerical representation to encode integer condition values to, either binary or one-hot
* **dilations**: (int) Maximum dilation factor as an exponent of 2, e.g. dilations = 9 results in a maximum dilation of 2^9 = 512
* **filters**:
* **lengths**:
* **res**: (int) Lengths of convolution kernels in residual blocks
* **final**: ([int, int]) Lengths of convolution kernels in final layers, individually definable
* **skip**: (int) Lengths of convolution kernels in skip connections
* **depths**:
* **res**: (int) Number of filters in residual-block convolution layers
* **skip**: (int) Number of filters in skip connections
* **final**: ([int, int]) Number of filters in final layers, individually definable
* **num_stacks**: (int) Number of stacks, as defined in the paper
* **target_field_length**: (int) Length of the output
* **target_padding**: (int) Number of samples used for padding the target_field *per side*
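For intuition, `dilations` and `num_stacks` together describe the per-layer dilation factors of the network. A minimal sketch (the layer layout is assumed here; the authoritative construction lives in `models.py`):

```python
# Each stack is assumed to contain one dilated convolution per power of two
# up to 2**dilations, and the stacks are simply repeated.
def dilation_schedule(dilations, num_stacks):
    one_stack = [2 ** i for i in range(dilations + 1)]  # 1, 2, 4, ..., 2**dilations
    return one_stack * num_stacks

schedule = dilation_schedule(9, 4)  # values from config_multi_instrument.json
print(len(schedule), max(schedule))  # 40 dilated layers, maximum dilation 512
```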
### Training
How training will be carried out
* **batch_size**: (int) Number of samples per batch
* **early_stopping_patience**: (int) Number of epochs to wait without improvement in accuracy before stopping training
* **loss**: (in the case of multi-instrument)
* **out_1**: First term in the three term loss (vocals)
* **l1**: (float) Percentage weight given to L1 loss
* **l2**: (float) Percentage weight given to L2 loss
* **weight**: (float) Percentage weight given to first term
* **out_2**: Second term in the three term loss (drums)
* **l1**: (float) Percentage weight given to L1 loss
* **l2**: (float) Percentage weight given to L2 loss
* **weight**: (float) Percentage weight given to second term
* **out_3**: Third term in the three term loss (bass)
* **l1**: (float) Percentage weight given to L1 loss
* **l2**: (float) Percentage weight given to L2 loss
* **weight**: (float) Percentage weight given to third term
* **loss**: (in the case of singing-voice)
* **out_1**: First term in the two term loss (singing voice)
* **l1**: (float) Percentage weight given to L1 loss
* **l2**: (float) Percentage weight given to L2 loss
* **weight**: (float) Percentage weight given to first term
* **out_2**: Second term in the two term loss (dissimilarity singing voice)
* **l1**: (float) Percentage weight given to L1 loss
* **l2**: (float) Percentage weight given to L2 loss
* **weight**: (float) Percentage weight given to second term
* **num_epochs**: (int) Maximum number of epochs to train for
* **num_steps_test**: (int) Total number of steps (batches of samples) to yield from validation generator before stopping at the end of every epoch.
* **num_steps_train**: (int) Total number of steps (batches of samples) to yield from training generator before declaring one epoch finished and starting the next epoch.
* **path**: (string) Path to the folder containing all files pertaining to the training session
* **verbosity**: (int) Keras verbosity level
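The loss parameters above can be read as a weighted sum of per-output L1/L2 distances. A minimal sketch of that combination (function names are hypothetical; the actual loss is assembled in `models.py`):

```python
import numpy as np

def weighted_l1_l2(y_true, y_pred, l1, l2):
    # Mix L1 and L2 distances according to the "l1"/"l2" config weights.
    diff = y_true - y_pred
    return l1 * np.mean(np.abs(diff)) + l2 * np.mean(diff ** 2)

def total_loss(pairs, term_configs):
    # Sum each output's loss term scaled by its "weight" entry.
    return sum(cfg['weight'] * weighted_l1_l2(t, p, cfg['l1'], cfg['l2'])
               for (t, p), cfg in zip(pairs, term_configs))

# Example with the singing-voice weights: out_1 weight 1, out_2 weight -0.05.
t1, p1 = np.zeros(4), np.ones(4)
t2, p2 = np.zeros(4), 2 * np.ones(4)
loss = total_loss([(t1, p1), (t2, p2)],
                  [{'l1': 1, 'l2': 0, 'weight': 1},
                   {'l1': 1, 'l2': 0, 'weight': -0.05}])
```

Note the negative `weight` on the second singing-voice term: it rewards dissimilarity between the two outputs rather than penalizing it.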
================================================
FILE: config_multi_instrument.json
================================================
{
    "dataset": {
        "in_memory_percentage": 1,
        "extract_voice_percentage": 0,
        "path": "data/MUSDB",
        "sample_rate": 16000,
        "type": "musdb18"
    },
    "model": {
        "condition_encoding": "binary",
        "dilations": 9,
        "filters": {
            "lengths": {
                "res": 3,
                "final": [3, 3],
                "skip": 1
            },
            "depths": {
                "res": 64,
                "skip": 64,
                "final": [2048, 256]
            }
        },
        "num_stacks": 4,
        "target_field_length": 1601,
        "target_padding": 1
    },
    "optimizer": {
        "decay": 0.0,
        "epsilon": 1e-08,
        "lr": 0.001,
        "momentum": 0.9,
        "type": "adam"
    },
    "training": {
        "batch_size": 10,
        "early_stopping_patience": 16,
        "loss": {
            "out_1": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            },
            "out_2": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            },
            "out_3": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            }
        },
        "num_epochs": 250,
        "num_steps_test": 500,
        "num_steps_train": 2000,
        "path": "sessions/003",
        "verbosity": 1
    }
}
================================================
FILE: config_singing_voice.json
================================================
{
    "dataset": {
        "in_memory_percentage": 1,
        "extract_voice_percentage": 0.5,
        "path": "data/MUSDB",
        "sample_rate": 16000,
        "type": "musdb18"
    },
    "model": {
        "condition_encoding": "binary",
        "dilations": 9,
        "filters": {
            "lengths": {
                "res": 3,
                "final": [3, 3],
                "skip": 1
            },
            "depths": {
                "res": 64,
                "skip": 64,
                "final": [2048, 256]
            }
        },
        "num_stacks": 4,
        "target_field_length": 1601,
        "target_padding": 1
    },
    "optimizer": {
        "decay": 0.0,
        "epsilon": 1e-08,
        "lr": 0.001,
        "momentum": 0.9,
        "type": "adam"
    },
    "training": {
        "batch_size": 10,
        "early_stopping_patience": 16,
        "loss": {
            "out_1": {
                "l1": 1,
                "l2": 0,
                "weight": 1
            },
            "out_2": {
                "l1": 1,
                "l2": 0,
                "weight": -0.05
            }
        },
        "num_epochs": 250,
        "num_steps_test": 500,
        "num_steps_train": 2000,
        "path": "sessions/002",
        "verbosity": 1
    }
}
================================================
FILE: datasets.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Datasets.py
import util
import os
import numpy as np
import musdb
import logging
class SingingVoiceMUSDB18Dataset():

    def __init__(self, config, model):
        self.model = model
        self.path = config['dataset']['path']
        self.sample_rate = config['dataset']['sample_rate']
        self.file_paths = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                           'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.sequences = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                          'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.voice_indices = {'train': [], 'val': []}
        self.batch_size = config['training']['batch_size']
        self.extract_voice_percent = config['dataset']['extract_voice_percentage']
        self.in_memory_percentage = config['dataset']['in_memory_percentage']
        self.num_sequences_in_memory = 0
        self.condition_encode_function = util.get_condition_input_encode_func(config['model']['condition_encoding'])

    def load_dataset(self):
        print('Loading MUSDB18 dataset for singing voice separation...')
        mus = musdb.DB(root_dir=self.path, is_wav=True)
        tracks = mus.load_mus_tracks(subsets='train')
        np.random.seed(seed=1337)
        val_idx = np.random.choice(len(tracks), size=25, replace=False)
        train_idx = [i for i in range(len(tracks)) if i not in val_idx]
        val_tracks = [tracks[i] for i in val_idx]
        train_tracks = [tracks[i] for i in train_idx]
        for condition in ['mixture', 'vocals']:
            self.file_paths['val'][condition] = [track.path[:-11] + condition + '.wav' for track in val_tracks]
        for condition in ['mixture', 'vocals']:
            self.file_paths['train'][condition] = [track.path[:-11] + condition + '.wav' for track in train_tracks]
        self.load_songs()
        return self

    def load_songs(self):
        for set in ['train', 'val']:
            for condition in ['mixture', 'vocals']:
                for filepath in self.file_paths[set][condition]:
                    if condition == 'vocals':
                        sequence = util.load_wav(filepath, self.sample_rate)
                        self.sequences[set][condition].append(sequence)
                        self.num_sequences_in_memory += 1
                        if self.extract_voice_percent > 0:
                            self.voice_indices[set].append(util.get_sequence_with_singing_indices(sequence))
                    else:
                        if self.in_memory_percentage == 1 or np.random.uniform(0, 1) <= (self.in_memory_percentage - 0.5) * 2:
                            sequence = util.load_wav(filepath, self.sample_rate)
                            self.sequences[set][condition].append(sequence)
                            self.num_sequences_in_memory += 1
                        else:
                            self.sequences[set][condition].append([-1])

    def get_num_sequences_in_dataset(self):
        return len(self.sequences['train']['vocals']) + len(self.sequences['train']['mixture']) + len(
            self.sequences['val']['vocals']) + len(self.sequences['val']['mixture'])

    def retrieve_sequence(self, set, condition, sequence_num):
        if len(self.sequences[set][condition][sequence_num]) == 1:
            sequence = util.load_wav(self.file_paths[set][condition][sequence_num], self.sample_rate)
            if (float(self.num_sequences_in_memory) / self.get_num_sequences_in_dataset()) < self.in_memory_percentage:
                self.sequences[set][condition][sequence_num] = sequence
                self.num_sequences_in_memory += 1
        else:
            sequence = self.sequences[set][condition][sequence_num]
        return np.array(sequence)

    def get_random_batch_generator(self, set):
        if set not in ['train', 'val']:
            raise ValueError("Argument SET must be either 'train' or 'val'")
        while True:
            sample_indices = np.random.randint(0, len(self.sequences[set]['vocals']), self.batch_size)
            batch_inputs = []
            batch_outputs_1 = []
            batch_outputs_2 = []
            for i, sample_i in enumerate(sample_indices):
                while True:
                    starting_index = 0
                    mixture = self.retrieve_sequence(set, 'mixture', sample_i)
                    vocals = self.retrieve_sequence(set, 'vocals', sample_i)
                    accompaniment = mixture - vocals
                    if np.random.uniform(0, 1) < self.extract_voice_percent:
                        indices = self.voice_indices[set][sample_i]
                        vocals_indices, _ = util.get_indices_subsequence(indices)
                        vocals = vocals[vocals_indices[0]:vocals_indices[1]]
                        starting_index = vocals_indices[0]
                    if len(vocals) < self.model.input_length:
                        sample_i = np.random.randint(0, len(self.sequences[set]['vocals']))
                    else:
                        break
                offset_1 = np.squeeze(np.random.randint(0, len(vocals) - self.model.input_length + 1, 1))
                vocals_fragment = vocals[offset_1:offset_1 + self.model.input_length]
                offset_2 = offset_1 + starting_index
                accompaniment_fragment = accompaniment[offset_2:offset_2 + self.model.input_length]
                input = accompaniment_fragment + vocals_fragment
                output_vocals = vocals_fragment
                output_accompaniment = accompaniment_fragment
                batch_inputs.append(input)
                batch_outputs_1.append(output_vocals)
                batch_outputs_2.append(output_accompaniment)
            batch_inputs = np.array(batch_inputs, dtype='float32')
            batch_outputs_1 = np.array(batch_outputs_1, dtype='float32')
            batch_outputs_2 = np.array(batch_outputs_2, dtype='float32')
            batch_outputs_1 = batch_outputs_1[:, self.model.get_padded_target_field_indices()]
            batch_outputs_2 = batch_outputs_2[:, self.model.get_padded_target_field_indices()]
            batch = {'data_input': batch_inputs}, {'data_output_1': batch_outputs_1,
                                                   'data_output_2': batch_outputs_2}
            yield batch

    def get_condition_input_encode_func(self, representation):
        if representation == 'binary':
            return util.binary_encode
        else:
            return util.one_hot_encode

    def get_target_sample_index(self):
        return int(np.floor(self.fragment_length / 2.0))

    def get_samples_of_interest_indices(self, causal=False):
        if causal:
            return -1
        else:
            target_sample_index = self.get_target_sample_index()
            return range(target_sample_index - self.half_target_field_length - self.target_padding,
                         target_sample_index + self.half_target_field_length + self.target_padding + 1)

    def get_sample_weight_vector_length(self):
        if self.samples_of_interest_only:
            return len(self.get_samples_of_interest_indices())
        else:
            return self.fragment_length
class MultiInstrumentMUSDB18Dataset():

    def __init__(self, config, model):
        self.model = model
        self.path = config['dataset']['path']
        self.sample_rate = config['dataset']['sample_rate']
        self.file_paths = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                           'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.sequences = {'train': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []},
                          'val': {'vocals': [], 'mixture': [], 'drums': [], 'other': [], 'bass': []}}
        self.voice_indices = {'train': [], 'val': []}
        self.batch_size = config['training']['batch_size']
        self.extract_voice_percent = config['dataset']['extract_voice_percentage']
        self.in_memory_percentage = config['dataset']['in_memory_percentage']
        self.num_sequences_in_memory = 0
        self.condition_encode_function = util.get_condition_input_encode_func(config['model']['condition_encoding'])

    def load_dataset(self):
        print('Loading MUSDB18 dataset for multi-instrument separation...')
        mus = musdb.DB(root_dir=self.path, is_wav=True)
        tracks = mus.load_mus_tracks(subsets='train')
        np.random.seed(seed=1337)
        val_idx = np.random.choice(len(tracks), size=25, replace=False)
        train_idx = [i for i in range(len(tracks)) if i not in val_idx]
        val_tracks = [tracks[i] for i in val_idx]
        train_tracks = [tracks[i] for i in train_idx]
        for condition in ['mixture', 'vocals', 'drums', 'other', 'bass']:
            self.file_paths['val'][condition] = [track.path[:-11] + condition + '.wav' for track in val_tracks]
        for condition in ['mixture', 'vocals', 'drums', 'other', 'bass']:
            self.file_paths['train'][condition] = [track.path[:-11] + condition + '.wav' for track in train_tracks]
        self.load_songs()
        return self

    def load_songs(self):
        for set in ['train', 'val']:
            for condition in ['vocals', 'mixture', 'drums', 'other', 'bass']:
                for filepath in self.file_paths[set][condition]:
                    if condition == 'vocals':
                        sequence = util.load_wav(filepath, self.sample_rate)
                        self.sequences[set][condition].append(sequence)
                        self.num_sequences_in_memory += 1
                        if self.extract_voice_percent > 0:
                            self.voice_indices[set].append(util.get_sequence_with_singing_indices(sequence))
                    else:
                        if self.in_memory_percentage == 1 or np.random.uniform(0, 1) <= (self.in_memory_percentage - 0.5) * 2:
                            sequence = util.load_wav(filepath, self.sample_rate)
                            self.sequences[set][condition].append(sequence)
                            self.num_sequences_in_memory += 1
                        else:
                            self.sequences[set][condition].append([-1])

    def get_num_sequences_in_dataset(self):
        return len(self.sequences['train']['vocals']) + len(self.sequences['train']['mixture']) + len(
            self.sequences['val']['vocals']) + len(self.sequences['val']['mixture'])

    def retrieve_sequence(self, set, condition, sequence_num):
        if len(self.sequences[set][condition][sequence_num]) == 1:
            sequence = util.load_wav(self.file_paths[set][condition][sequence_num], self.sample_rate)
            if (float(self.num_sequences_in_memory) / self.get_num_sequences_in_dataset()) < self.in_memory_percentage:
                self.sequences[set][condition][sequence_num] = sequence
                self.num_sequences_in_memory += 1
        else:
            sequence = self.sequences[set][condition][sequence_num]
        return np.array(sequence)

    def get_random_batch_generator(self, set):
        if set not in ['train', 'val']:
            raise ValueError("Argument SET must be either 'train' or 'val'")
        while True:
            sample_indices = np.random.randint(0, len(self.sequences[set]['vocals']), self.batch_size)
            batch_inputs = []
            batch_outputs_1 = []
            batch_outputs_2 = []
            batch_outputs_3 = []
            for i, sample_i in enumerate(sample_indices):
                while True:
                    starting_index = 0
                    vocals = self.retrieve_sequence(set, 'vocals', sample_i)
                    bass = self.retrieve_sequence(set, 'bass', sample_i)
                    drums = self.retrieve_sequence(set, 'drums', sample_i)
                    other = self.retrieve_sequence(set, 'other', sample_i)
                    if np.random.uniform(0, 1) < self.extract_voice_percent:
                        indices = self.voice_indices[set][sample_i]
                        vocals_indices, _ = util.get_indices_subsequence(indices)
                        vocals = vocals[vocals_indices[0]:vocals_indices[1]]
                        starting_index = vocals_indices[0]
                    if len(vocals) < self.model.input_length:
                        sample_i = np.random.randint(0, len(self.sequences[set]['vocals']))
                    else:
                        break
                offset_1 = np.squeeze(np.random.randint(0, len(vocals) - self.model.input_length + 1, 1))
                vocals_fragment = vocals[offset_1:offset_1 + self.model.input_length]
                offset_2 = offset_1 + starting_index
                bass_fragment = bass[offset_2:offset_2 + self.model.input_length]
                drums_fragment = drums[offset_2:offset_2 + self.model.input_length]
                other_fragment = other[offset_2:offset_2 + self.model.input_length]
                input = vocals_fragment + bass_fragment + drums_fragment + other_fragment
                output_vocals = vocals_fragment
                output_drums = drums_fragment
                output_bass = bass_fragment
                batch_inputs.append(input)
                batch_outputs_1.append(output_vocals)
                batch_outputs_2.append(output_drums)
                batch_outputs_3.append(output_bass)
            batch_inputs = np.array(batch_inputs, dtype='float32')
            batch_outputs_1 = np.array(batch_outputs_1, dtype='float32')
            batch_outputs_2 = np.array(batch_outputs_2, dtype='float32')
            batch_outputs_3 = np.array(batch_outputs_3, dtype='float32')
            batch_outputs_1 = batch_outputs_1[:, self.model.get_padded_target_field_indices()]
            batch_outputs_2 = batch_outputs_2[:, self.model.get_padded_target_field_indices()]
            batch_outputs_3 = batch_outputs_3[:, self.model.get_padded_target_field_indices()]
            batch = {'data_input': batch_inputs}, {'data_output_1': batch_outputs_1,
                                                   'data_output_2': batch_outputs_2,
                                                   'data_output_3': batch_outputs_3}
            yield batch

    def get_condition_input_encode_func(self, representation):
        if representation == 'binary':
            return util.binary_encode
        else:
            return util.one_hot_encode

    def get_target_sample_index(self):
        return int(np.floor(self.fragment_length / 2.0))

    def get_samples_of_interest_indices(self, causal=False):
        if causal:
            return -1
        else:
            target_sample_index = self.get_target_sample_index()
            return range(target_sample_index - self.half_target_field_length - self.target_padding,
                         target_sample_index + self.half_target_field_length + self.target_padding + 1)

    def get_sample_weight_vector_length(self):
        if self.samples_of_interest_only:
            return len(self.get_samples_of_interest_indices())
        else:
            return self.fragment_length
================================================
FILE: environment.yml
================================================
name: sswavenet
channels:
- anaconda
- conda-forge
- defaults
dependencies:
- intel-openmp=2018.0.0=hc7b2577_8
- mkl=2018.0.1=h19d6760_4
- mkl-service=1.1.2=py27hb2d42c5_4
- ca-certificates=2018.1.18=0
- certifi=2018.1.18=py27_0
- h5py=2.7.1=py27_2
- hdf5=1.10.1=2
- keras=2.1.5=py27_0
- libgpuarray=0.7.5=0
- mako=1.0.7=py27_0
- markupsafe=1.0=py27_0
- openssl=1.0.2n=0
- pygpu=0.7.5=py27_0
- pyyaml=3.12=py27_1
- six=1.11.0=py27_1
- theano=1.0.1=py27_1
- yaml=0.1.7=0
- libedit=3.1=heed3624_0
- libffi=3.2.1=hd88cf55_4
- libgcc-ng=7.2.0=hdf63c60_3
- libgfortran=3.0.0=1
- libgfortran-ng=7.2.0=hdf63c60_3
- libstdcxx-ng=7.2.0=hdf63c60_3
- ncurses=6.0=h9df7e31_2
- numpy=1.14.2=py27hdbf6ddf_0
- pip=9.0.1=py27_5
- python=2.7.14=h1571d57_30
- readline=7.0=ha6073c6_4
- scipy=1.0.0=py27hf5f0f52_0
- setuptools=38.5.1=py27_0
- sqlite=3.22.0=h1bed415_0
- tk=8.6.7=hc745277_3
- wheel=0.30.0=py27h2bc6bb2_1
- zlib=1.2.11=ha838bed_2
- pip:
- cffi==1.11.5
- functools32==3.2.3.post2
- jsonschema==2.6.0
- musdb==0.2.3
- museval==0.2.0
- pyaml==17.12.1
- pycparser==2.18
- simplejson==3.13.2
- soundfile==0.9.0
- stempeg==0.1.3
- tqdm==4.19.7
================================================
FILE: layers.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Layers.py
import keras
class AddSingletonDepth(keras.layers.Layer):

    def call(self, x, mask=None):
        x = keras.backend.expand_dims(x, -1)  # add a dimension to the right
        if keras.backend.ndim(x) == 4:
            return keras.backend.permute_dimensions(x, (0, 3, 1, 2))
        else:
            return x

    def compute_output_shape(self, input_shape):
        if len(input_shape) == 3:
            return input_shape[0], 1, input_shape[1], input_shape[2]
        else:
            return input_shape[0], input_shape[1], 1


class Subtract(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(Subtract, self).__init__(**kwargs)

    def call(self, x, mask=None):
        return x[0] - x[1]

    def compute_output_shape(self, input_shape):
        return input_shape[0]


class Add(keras.layers.Layer):

    def __init__(self, **kwargs):
        super(Add, self).__init__(**kwargs)

    def call(self, x, mask=None):
        output = x[0]
        for i in range(1, len(x)):
            output += x[i]
        return output

    def compute_output_shape(self, input_shape):
        return input_shape[0]


class Slice(keras.layers.Layer):

    def __init__(self, selector, output_shape, **kwargs):
        self.selector = selector
        self.desired_output_shape = output_shape
        super(Slice, self).__init__(**kwargs)

    def call(self, x, mask=None):
        selector = self.selector
        if len(self.selector) == 2 and not type(self.selector[1]) is slice and not type(self.selector[1]) is int:
            x = keras.backend.permute_dimensions(x, [0, 2, 1])
            selector = (self.selector[1], self.selector[0])
        y = x[selector]
        if len(self.selector) == 2 and not type(self.selector[1]) is slice and not type(self.selector[1]) is int:
            y = keras.backend.permute_dimensions(y, [0, 2, 1])
        return y

    def compute_output_shape(self, input_shape):
        output_shape = (None,)
        for i, dim_length in enumerate(self.desired_output_shape):
            if dim_length == Ellipsis:
                output_shape = output_shape + (input_shape[i + 1],)
            else:
                output_shape = output_shape + (dim_length,)
        return output_shape
================================================
FILE: main.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Main.py
import sys
import logging
import optparse
import json
import os
import models
import datasets
import util
import separate
def set_system_settings():
sys.setrecursionlimit(50000)
logging.getLogger().setLevel(logging.INFO)
def get_command_line_arguments():
parser = optparse.OptionParser()
parser.set_defaults(config='sessions/multi-instrument/config.json')
parser.set_defaults(mode='training')
parser.set_defaults(target='multi-instrument')
parser.set_defaults(load_checkpoint=None)
parser.set_defaults(condition_value=0)
parser.set_defaults(batch_size=None)
parser.set_defaults(one_shot=False)
parser.set_defaults(mixture_input_path=None)
parser.set_defaults(print_model_summary=False)
parser.set_defaults(target_field_length=None)
parser.add_option('--mode', dest='mode')
parser.add_option('--target', dest='target')
parser.add_option('--print_model_summary', dest='print_model_summary')
parser.add_option('--config', dest='config')
parser.add_option('--load_checkpoint', dest='load_checkpoint')
parser.add_option('--condition_value', dest='condition_value')
parser.add_option('--batch_size', dest='batch_size')
parser.add_option('--one_shot', dest='one_shot')
parser.add_option('--mixture_input_path', dest='mixture_input_path')
parser.add_option('--target_field_length', dest='target_field_length')
(options, args) = parser.parse_args()
return options
def load_config(config_filepath):
try:
config_file = open(config_filepath, 'r')
except IOError:
logging.error('No readable config file at path: ' + config_filepath)
exit()
else:
with config_file:
return json.load(config_file)
def get_dataset(config, cla, model):
if config['dataset']['type'] == 'musdb18':
if cla.target == 'singing-voice':
return datasets.SingingVoiceMUSDB18Dataset(config, model).load_dataset()
elif cla.target == 'multi-instrument':
return datasets.MultiInstrumentMUSDB18Dataset(config, model).load_dataset()
def training(config, cla):
# Instantiate Model
if cla.target == 'singing-voice':
model = models.SingingVoiceSeparationWavenet(config, load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
elif cla.target == 'multi-instrument':
model = models.MultiInstrumentSeparationWavenet(config, load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
else:
raise Exception("Argument target must be either 'singing-voice' or 'multi-instrument'")
dataset = get_dataset(config, cla, model)
num_steps_train = config['training']['num_steps_train']
num_steps_val = config['training']['num_steps_test']
train_set_generator = dataset.get_random_batch_generator('train')
val_set_generator = dataset.get_random_batch_generator('val')
model.fit_model(train_set_generator, num_steps_train, val_set_generator, num_steps_val,
config['training']['num_epochs'])
def get_valid_output_folder_path(outputs_folder_path):
j = 1
while True:
output_folder_name = 'samples_%d' % j
output_folder_path = os.path.join(outputs_folder_path, output_folder_name)
if not os.path.isdir(output_folder_path):
os.mkdir(output_folder_path)
break
j += 1
return output_folder_path
def inference(config, cla):
if cla.batch_size is not None:
batch_size = int(cla.batch_size)
else:
batch_size = config['training']['batch_size']
if cla.target_field_length is not None:
cla.target_field_length = int(cla.target_field_length)
if not bool(cla.one_shot):
if config['model']['type'] == 'singing-voice':
model = models.SingingVoiceSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
elif config['model']['type'] == 'multi-instrument':
model = models.MultiInstrumentSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
        print('Performing inference...')
else:
        print('Performing one-shot inference...')
samples_folder_path = os.path.join(config['training']['path'], 'samples')
output_folder_path = get_valid_output_folder_path(samples_folder_path)
    # If input_path is a single wav file, process only that file; otherwise process every wav file in the folder
if cla.mixture_input_path.endswith('.wav'):
filenames = [cla.mixture_input_path.rsplit('/', 1)[-1]]
cla.mixture_input_path = cla.mixture_input_path.rsplit('/', 1)[0] + '/'
else:
if not cla.mixture_input_path.endswith('/'):
cla.mixture_input_path += '/'
filenames = [filename for filename in os.listdir(cla.mixture_input_path) if filename.endswith('.wav')]
for filename in filenames:
mixture_input = util.load_wav(cla.mixture_input_path + filename, config['dataset']['sample_rate'])
input = {'mixture': mixture_input}
output_filename_prefix = filename[0:-4]
if bool(cla.one_shot):
if len(input['mixture']) % 2 == 0: # If input length is even, remove one sample
input['mixture'] = input['mixture'][:-1]
if config['model']['type'] == 'singing-voice':
model = models.SingingVoiceSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
elif config['model']['type'] == 'multi-instrument':
model = models.MultiInstrumentSeparationWavenet(config, target_field_length=cla.target_field_length,
load_checkpoint=cla.load_checkpoint,
print_model_summary=cla.print_model_summary)
        print('Separating: ' + filename)
separate.separate_sample(model, input, batch_size, output_filename_prefix,
config['dataset']['sample_rate'], output_folder_path, config['model']['type'])
def main():
set_system_settings()
cla = get_command_line_arguments()
config = load_config(cla.config)
if cla.mode == 'training':
training(config, cla)
elif cla.mode == 'inference':
inference(config, cla)
if __name__ == "__main__":
main()
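The single-file branch of `inference` above splits a path ending in `.wav` into a directory prefix (with trailing slash restored) and a bare filename via `rsplit('/', 1)`. A minimal standalone sketch of that idiom; the example path is hypothetical and `split_wav_path` is an illustrative helper, not part of the repo:

```python
def split_wav_path(path):
    # Mirrors the rsplit('/', 1) handling in inference(): split off the
    # filename, then rebuild the directory prefix with a trailing slash.
    # (Like the original, a bare filename with no '/' is not handled.)
    filename = path.rsplit('/', 1)[-1]
    directory = path.rsplit('/', 1)[0] + '/'
    return directory, filename

directory, filename = split_wav_path('data/mixes/song.wav')
```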
================================================
FILE: models.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Models.py
import keras
import util
import os
import numpy as np
import layers
import logging
#Singing Voice Separation Wavenet Model
class SingingVoiceSeparationWavenet():
def __init__(self, config, load_checkpoint=None, input_length=None, target_field_length=None, print_model_summary=False):
self.config = config
self.verbosity = config['training']['verbosity']
self.num_stacks = self.config['model']['num_stacks']
if type(self.config['model']['dilations']) is int:
self.dilations = [2 ** i for i in range(0, self.config['model']['dilations'] + 1)]
elif type(self.config['model']['dilations']) is list:
self.dilations = self.config['model']['dilations']
self.receptive_field_length = util.compute_receptive_field_length(config['model']['num_stacks'], self.dilations,
config['model']['filters']['lengths']['res'],
1)
if input_length is not None:
self.input_length = input_length
self.target_field_length = self.input_length - (self.receptive_field_length - 1)
if target_field_length is not None:
self.target_field_length = target_field_length
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
else:
self.target_field_length = config['model']['target_field_length']
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
self.target_padding = config['model']['target_padding']
self.padded_target_field_length = self.target_field_length + 2 * self.target_padding
        self.half_target_field_length = self.target_field_length // 2
        self.half_receptive_field_length = self.receptive_field_length // 2
self.num_residual_blocks = len(self.dilations) * self.num_stacks
self.activation = keras.layers.Activation('relu')
self.samples_of_interest_indices = self.get_padded_target_field_indices()
self.target_sample_indices = self.get_target_field_indices()
self.optimizer = self.get_optimizer()
self.out_1_loss = self.get_out_1_loss()
self.out_2_loss = self.get_out_2_loss()
self.metrics = self.get_metrics()
self.epoch_num = 0
self.checkpoints_path = ''
self.samples_path = ''
self.history_filename = ''
self.config['model']['num_residual_blocks'] = self.num_residual_blocks
self.config['model']['receptive_field_length'] = self.receptive_field_length
self.config['model']['input_length'] = self.input_length
self.config['model']['target_field_length'] = self.target_field_length
self.config['model']['type'] = 'singing-voice'
self.model = self.setup_model(load_checkpoint, print_model_summary)
def setup_model(self, load_checkpoint=None, print_model_summary=False):
self.checkpoints_path = os.path.join(self.config['training']['path'], 'checkpoints')
self.samples_path = os.path.join(self.config['training']['path'], 'samples')
self.history_filename = 'history_' + self.config['training']['path'][
self.config['training']['path'].rindex('/') + 1:] + '.csv'
model = self.build_model()
if os.path.exists(self.checkpoints_path) and util.dir_contains_files(self.checkpoints_path):
if load_checkpoint is not None:
last_checkpoint_path = load_checkpoint
self.epoch_num = 0
else:
checkpoints = os.listdir(self.checkpoints_path)
checkpoints.sort(key=lambda x: os.stat(os.path.join(self.checkpoints_path, x)).st_mtime)
last_checkpoint = checkpoints[-1]
last_checkpoint_path = os.path.join(self.checkpoints_path, last_checkpoint)
self.epoch_num = int(last_checkpoint[11:16])
                print('Loading model from epoch: %d' % self.epoch_num)
model.load_weights(last_checkpoint_path)
else:
            print('Building new model...')
if not os.path.exists(self.config['training']['path']):
os.mkdir(self.config['training']['path'])
if not os.path.exists(self.checkpoints_path):
os.mkdir(self.checkpoints_path)
self.epoch_num = 0
if not os.path.exists(self.samples_path):
os.mkdir(self.samples_path)
if print_model_summary:
model.summary()
model.compile(optimizer=self.optimizer,
loss={'data_output_1': self.out_1_loss, 'data_output_2': self.out_2_loss}, metrics=self.metrics)
self.config['model']['num_params'] = model.count_params()
config_path = os.path.join(self.config['training']['path'], 'config.json')
if not os.path.exists(config_path):
util.pretty_json_dump(self.config, config_path)
if print_model_summary:
util.pretty_json_dump(self.config)
return model
def get_optimizer(self):
return keras.optimizers.Adam(lr=self.config['optimizer']['lr'], decay=self.config['optimizer']['decay'],
epsilon=self.config['optimizer']['epsilon'])
def get_out_1_loss(self):
if self.config['training']['loss']['out_1']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_1']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_1']['l1'],
self.config['training']['loss']['out_1']['l2'])
def get_out_2_loss(self):
if self.config['training']['loss']['out_2']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_2']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_2']['l1'],
self.config['training']['loss']['out_2']['l2'])
def get_callbacks(self):
return [
keras.callbacks.EarlyStopping(patience=self.config['training']['early_stopping_patience'], verbose=1,
monitor='loss'),
keras.callbacks.ModelCheckpoint(os.path.join(self.checkpoints_path,
'checkpoint.{epoch:05d}-{val_loss:.3f}.hdf5')),
keras.callbacks.CSVLogger(os.path.join(self.config['training']['path'], self.history_filename), append=True)
]
def fit_model(self, train_set_generator, num_steps_train, test_set_generator, num_steps_test, num_epochs):
        print('Fitting model with %d training steps and %d validation steps per epoch...' % (num_steps_train, num_steps_test))
self.model.fit_generator(train_set_generator,
num_steps_train,
epochs=num_epochs,
validation_data=test_set_generator,
validation_steps=num_steps_test,
callbacks=self.get_callbacks(),
verbose=self.verbosity,
initial_epoch=self.epoch_num)
def separate_batch(self, inputs):
return self.model.predict_on_batch(inputs)
def get_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length,
target_sample_index + self.half_target_field_length + 1)
def get_padded_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length - self.target_padding,
target_sample_index + self.half_target_field_length + self.target_padding + 1)
def get_target_sample_index(self):
return int(np.floor(self.input_length / 2.0))
def get_metrics(self):
return [
keras.metrics.mean_absolute_error,
self.valid_mean_absolute_error
]
def valid_mean_absolute_error(self, y_true, y_pred):
return keras.backend.mean(
keras.backend.abs(y_true[:, 1:-2] - y_pred[:, 1:-2]))
def build_model(self):
data_input = keras.engine.Input(
shape=(self.input_length,),
name='data_input')
data_expanded = layers.AddSingletonDepth()(data_input)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'], padding='same',
use_bias=False,
name='initial_causal_conv')(data_expanded)
skip_connections = []
res_block_i = 0
for stack_i in range(self.num_stacks):
layer_in_stack = 0
for dilation in self.dilations:
res_block_i += 1
data_out, skip_out = self.dilated_residual_block(data_out, res_block_i, layer_in_stack, dilation, stack_i)
if skip_out is not None:
skip_connections.append(skip_out)
layer_in_stack += 1
data_out = keras.layers.Add()(skip_connections)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][0],
self.config['model']['filters']['lengths']['final'][0],
padding='same',
use_bias=False)(data_out)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][1],
self.config['model']['filters']['lengths']['final'][1], padding='same',
use_bias=False)(data_out)
data_out = keras.layers.Convolution1D(1, 1)(data_out)
data_out_vocals_1 = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_1')(
data_out)
data_out_vocals_2 = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_2')(
data_out)
return keras.engine.Model(inputs=[data_input], outputs=[data_out_vocals_1, data_out_vocals_2])
def dilated_residual_block(self, data_x, res_block_i, layer_i, dilation, stack_i):
original_x = data_x
data_out = keras.layers.Conv1D(2 * self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'],
dilation_rate=dilation, padding='same',
use_bias=False,
name='res_%d_dilated_conv_d%d_s%d' % (
res_block_i, dilation, stack_i),
activation=None)(data_x)
data_out_1 = layers.Slice(
(Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_1_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
data_out_2 = layers.Slice(
(Ellipsis, slice(self.config['model']['filters']['depths']['res'],
2 * self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_2_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
tanh_out = keras.layers.Activation('tanh')(data_out_1)
sigm_out = keras.layers.Activation('sigmoid')(data_out_2)
data_x = keras.layers.Multiply(name='res_%d_gated_activation_%d_s%d' % (res_block_i, layer_i, stack_i))(
[tanh_out, sigm_out])
data_x = keras.layers.Convolution1D(
self.config['model']['filters']['depths']['res'] + self.config['model']['filters']['depths']['skip'], 1,
padding='same', use_bias=False)(data_x)
res_x = layers.Slice((Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_3_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((Ellipsis, slice(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['depths']['res'] +
self.config['model']['filters']['depths']['skip'])),
(self.input_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_data_slice_4_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((slice(self.samples_of_interest_indices[0], self.samples_of_interest_indices[-1] + 1, 1),
Ellipsis), (self.padded_target_field_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_keep_samples_of_interest_d%d_s%d' % (res_block_i, dilation, stack_i))(skip_x)
res_x = keras.layers.Add()([original_x, res_x])
return res_x, skip_x
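The residual block above implements the WaveNet gated activation unit: the dilated convolution's output is sliced into two halves, one passed through tanh and the other through a sigmoid, and the two are multiplied element-wise (the `Multiply([tanh_out, sigm_out])` step). A minimal NumPy sketch of just that gate, independent of Keras and of this model's tensors:

```python
import numpy as np

def gated_activation(filter_out, gate_out):
    # Element-wise tanh(filter) * sigmoid(gate), as used in
    # dilated_residual_block; sigmoid written out with np.exp.
    return np.tanh(filter_out) * (1.0 / (1.0 + np.exp(-gate_out)))
```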
# Multi-Instrument Separation Wavenet Model
class MultiInstrumentSeparationWavenet():
def __init__(self, config, load_checkpoint=None, input_length=None, target_field_length=None, print_model_summary=False):
self.config = config
self.verbosity = config['training']['verbosity']
self.num_stacks = self.config['model']['num_stacks']
if type(self.config['model']['dilations']) is int:
self.dilations = [2 ** i for i in range(0, self.config['model']['dilations'] + 1)]
elif type(self.config['model']['dilations']) is list:
self.dilations = self.config['model']['dilations']
self.receptive_field_length = util.compute_receptive_field_length(config['model']['num_stacks'], self.dilations,
config['model']['filters']['lengths']['res'],
1)
if input_length is not None:
self.input_length = input_length
self.target_field_length = self.input_length - (self.receptive_field_length - 1)
if target_field_length is not None:
self.target_field_length = target_field_length
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
else:
self.target_field_length = config['model']['target_field_length']
self.input_length = self.receptive_field_length + (self.target_field_length - 1)
self.target_padding = config['model']['target_padding']
self.padded_target_field_length = self.target_field_length + 2 * self.target_padding
        self.half_target_field_length = self.target_field_length // 2
        self.half_receptive_field_length = self.receptive_field_length // 2
self.num_residual_blocks = len(self.dilations) * self.num_stacks
self.activation = keras.layers.Activation('relu')
self.samples_of_interest_indices = self.get_padded_target_field_indices()
self.target_sample_indices = self.get_target_field_indices()
self.optimizer = self.get_optimizer()
self.out_1_loss = self.get_out_1_loss()
self.out_2_loss = self.get_out_2_loss()
self.out_3_loss = self.get_out_3_loss()
self.metrics = self.get_metrics()
self.epoch_num = 0
self.checkpoints_path = ''
self.samples_path = ''
self.history_filename = ''
self.config['model']['num_residual_blocks'] = self.num_residual_blocks
self.config['model']['receptive_field_length'] = self.receptive_field_length
self.config['model']['input_length'] = self.input_length
self.config['model']['target_field_length'] = self.target_field_length
self.config['model']['type'] = 'multi-instrument'
self.model = self.setup_model(load_checkpoint, print_model_summary)
def setup_model(self, load_checkpoint=None, print_model_summary=False):
self.checkpoints_path = os.path.join(self.config['training']['path'], 'checkpoints')
self.samples_path = os.path.join(self.config['training']['path'], 'samples')
self.history_filename = 'history_' + self.config['training']['path'][
self.config['training']['path'].rindex('/') + 1:] + '.csv'
model = self.build_model()
if os.path.exists(self.checkpoints_path) and util.dir_contains_files(self.checkpoints_path):
if load_checkpoint is not None:
last_checkpoint_path = load_checkpoint
self.epoch_num = 0
else:
checkpoints = os.listdir(self.checkpoints_path)
checkpoints.sort(key=lambda x: os.stat(os.path.join(self.checkpoints_path, x)).st_mtime)
last_checkpoint = checkpoints[-1]
last_checkpoint_path = os.path.join(self.checkpoints_path, last_checkpoint)
self.epoch_num = int(last_checkpoint[11:16])
                print('Loading model from epoch: %d' % self.epoch_num)
model.load_weights(last_checkpoint_path)
else:
            print('Building new model...')
if not os.path.exists(self.config['training']['path']):
os.mkdir(self.config['training']['path'])
if not os.path.exists(self.checkpoints_path):
os.mkdir(self.checkpoints_path)
self.epoch_num = 0
if not os.path.exists(self.samples_path):
os.mkdir(self.samples_path)
if print_model_summary:
model.summary()
model.compile(optimizer=self.optimizer,
loss={'data_output_1': self.out_1_loss, 'data_output_2': self.out_2_loss,
'data_output_3': self.out_3_loss}, metrics=self.metrics)
self.config['model']['num_params'] = model.count_params()
config_path = os.path.join(self.config['training']['path'], 'config.json')
if not os.path.exists(config_path):
util.pretty_json_dump(self.config, config_path)
if print_model_summary:
util.pretty_json_dump(self.config)
return model
def get_optimizer(self):
return keras.optimizers.Adam(lr=self.config['optimizer']['lr'], decay=self.config['optimizer']['decay'],
epsilon=self.config['optimizer']['epsilon'])
def get_out_1_loss(self):
if self.config['training']['loss']['out_1']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_1']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_1']['l1'],
self.config['training']['loss']['out_1']['l2'])
def get_out_2_loss(self):
if self.config['training']['loss']['out_2']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_2']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_2']['l1'],
self.config['training']['loss']['out_2']['l2'])
def get_out_3_loss(self):
if self.config['training']['loss']['out_3']['weight'] == 0:
return lambda y_true, y_pred: y_true * 0
return lambda y_true, y_pred: self.config['training']['loss']['out_3']['weight'] * util.l1_l2_loss(
y_true, y_pred, self.config['training']['loss']['out_3']['l1'],
self.config['training']['loss']['out_3']['l2'])
def get_callbacks(self):
return [
keras.callbacks.EarlyStopping(patience=self.config['training']['early_stopping_patience'], verbose=1,
monitor='loss'),
keras.callbacks.ModelCheckpoint(os.path.join(self.checkpoints_path,
'checkpoint.{epoch:05d}-{val_loss:.3f}.hdf5')),
keras.callbacks.CSVLogger(os.path.join(self.config['training']['path'], self.history_filename), append=True)
]
def fit_model(self, train_set_generator, num_steps_train, test_set_generator, num_steps_test, num_epochs):
        print('Fitting model with %d training steps and %d validation steps per epoch...' % (num_steps_train, num_steps_test))
self.model.fit_generator(train_set_generator,
num_steps_train,
epochs=num_epochs,
validation_data=test_set_generator,
validation_steps=num_steps_test,
callbacks=self.get_callbacks(),
verbose=self.verbosity,
initial_epoch=self.epoch_num)
def separate_batch(self, inputs):
return self.model.predict_on_batch(inputs)
def get_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length,
target_sample_index + self.half_target_field_length + 1)
def get_padded_target_field_indices(self):
target_sample_index = self.get_target_sample_index()
return range(target_sample_index - self.half_target_field_length - self.target_padding,
target_sample_index + self.half_target_field_length + self.target_padding + 1)
def get_target_sample_index(self):
return int(np.floor(self.input_length / 2.0))
def get_metrics(self):
return [
keras.metrics.mean_absolute_error,
self.valid_mean_absolute_error
]
def valid_mean_absolute_error(self, y_true, y_pred):
return keras.backend.mean(
keras.backend.abs(y_true[:, 1:-2] - y_pred[:, 1:-2]))
def build_model(self):
data_input = keras.engine.Input(
shape=(self.input_length,),
name='data_input')
data_expanded = layers.AddSingletonDepth()(data_input)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'], padding='same',
use_bias=False,
name='initial_causal_conv')(data_expanded)
skip_connections = []
res_block_i = 0
for stack_i in range(self.num_stacks):
layer_in_stack = 0
for dilation in self.dilations:
res_block_i += 1
data_out, skip_out = self.dilated_residual_block(data_out, res_block_i, layer_in_stack, dilation, stack_i)
if skip_out is not None:
skip_connections.append(skip_out)
layer_in_stack += 1
data_out = keras.layers.Add()(skip_connections)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][0],
self.config['model']['filters']['lengths']['final'][0],
padding='same',
use_bias=False)(data_out)
data_out = self.activation(data_out)
data_out = keras.layers.Convolution1D(self.config['model']['filters']['depths']['final'][1],
self.config['model']['filters']['lengths']['final'][1], padding='same',
use_bias=False)(data_out)
data_out = keras.layers.Convolution1D(3, 1)(data_out)
data_out_vocals = layers.Slice((Ellipsis, slice(0, 1)), (self.padded_target_field_length, 1),
name='slice_data_output_1')(data_out)
data_out_drums = layers.Slice((Ellipsis, slice(1, 2)), (self.padded_target_field_length, 1),
name='slice_data_output_2')(data_out)
data_out_bass = layers.Slice((Ellipsis, slice(2, 3)), (self.padded_target_field_length, 1),
name='slice_data_output_3')(data_out)
data_out_vocals = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_1')(
data_out_vocals)
data_out_drums = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_2')(
data_out_drums)
data_out_bass = keras.layers.Lambda(lambda x: keras.backend.squeeze(x, 2),
output_shape=lambda shape: (shape[0], shape[1]), name='data_output_3')(
data_out_bass)
return keras.engine.Model(inputs=[data_input], outputs=[data_out_vocals, data_out_drums, data_out_bass])
def dilated_residual_block(self, data_x, res_block_i, layer_i, dilation, stack_i):
original_x = data_x
data_out = keras.layers.Conv1D(2 * self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['lengths']['res'],
dilation_rate=dilation, padding='same',
use_bias=False,
name='res_%d_dilated_conv_d%d_s%d' % (
res_block_i, dilation, stack_i),
activation=None)(data_x)
data_out_1 = layers.Slice(
(Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_1_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
data_out_2 = layers.Slice(
(Ellipsis, slice(self.config['model']['filters']['depths']['res'],
2 * self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_2_d%d_s%d' % (self.num_residual_blocks, dilation, stack_i))(data_out)
tanh_out = keras.layers.Activation('tanh')(data_out_1)
sigm_out = keras.layers.Activation('sigmoid')(data_out_2)
data_x = keras.layers.Multiply(name='res_%d_gated_activation_%d_s%d' % (res_block_i, layer_i, stack_i))(
[tanh_out, sigm_out])
data_x = keras.layers.Convolution1D(
self.config['model']['filters']['depths']['res'] + self.config['model']['filters']['depths']['skip'], 1,
padding='same', use_bias=False)(data_x)
res_x = layers.Slice((Ellipsis, slice(0, self.config['model']['filters']['depths']['res'])),
(self.input_length, self.config['model']['filters']['depths']['res']),
name='res_%d_data_slice_3_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((Ellipsis, slice(self.config['model']['filters']['depths']['res'],
self.config['model']['filters']['depths']['res'] +
self.config['model']['filters']['depths']['skip'])),
(self.input_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_data_slice_4_d%d_s%d' % (res_block_i, dilation, stack_i))(data_x)
skip_x = layers.Slice((slice(self.samples_of_interest_indices[0], self.samples_of_interest_indices[-1] + 1, 1),
Ellipsis), (self.padded_target_field_length, self.config['model']['filters']['depths']['skip']),
name='res_%d_keep_samples_of_interest_d%d_s%d' % (res_block_i, dilation, stack_i))(skip_x)
res_x = keras.layers.Add()([original_x, res_x])
return res_x, skip_x
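Both constructors above derive `input_length` from the receptive field and the target field: `input_length = receptive_field_length + (target_field_length - 1)`. The receptive-field formula itself lives in `util.compute_receptive_field_length`; the sketch below is an inference from the shipped session configs (num_stacks=4, dilations 1..512 from `"dilations": 9`, filter length 3, `target_field_length` 1601), not a copy of that function, and should be read as an assumption that reproduces the recorded values:

```python
def receptive_field_length(num_stacks, dilations, filter_length):
    # Assumed formula: each 'same'-padded dilated conv of length 3 adds
    # (filter_length - 1) * d samples of symmetric context; the stacks
    # repeat the dilation pattern, plus one for the centre sample.
    return (filter_length - 1) * num_stacks * sum(dilations) + 1

dilations = [2 ** i for i in range(10)]        # "dilations": 9 -> 1, 2, ..., 512
rfl = receptive_field_length(4, dilations, 3)  # 8185, as in sessions/*/config.json
input_length = rfl + (1601 - 1)                # 9785, matching "input_length"
```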
================================================
FILE: separate.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Separate.py
from __future__ import division
import os
import util
import tqdm
import numpy as np
def separate_sample(model, input, batch_size, output_filename_prefix, sample_rate, output_path, target):
if target == 'singing-voice':
if len(input['mixture']) < model.receptive_field_length:
raise ValueError('Input is not long enough to be used with this model.')
num_output_samples = input['mixture'].shape[0] - (model.receptive_field_length - 1)
num_fragments = int(np.ceil(num_output_samples / model.target_field_length))
num_batches = int(np.ceil(num_fragments / batch_size))
vocals_output = []
num_pad_values = 0
fragment_i = 0
for batch_i in tqdm.tqdm(range(0, num_batches)):
            if batch_i == num_batches - 1:  # If it's the last batch, it may be smaller
batch_size = num_fragments - batch_i * batch_size
input_batch = np.zeros((batch_size, model.input_length))
# Assemble batch
for batch_fragment_i in range(0, batch_size):
if fragment_i + model.target_field_length > num_output_samples:
remainder = input['mixture'][fragment_i:]
current_fragment = np.zeros((model.input_length,))
current_fragment[:remainder.shape[0]] = remainder
num_pad_values = model.input_length - remainder.shape[0]
else:
current_fragment = input['mixture'][fragment_i:fragment_i + model.input_length]
input_batch[batch_fragment_i, :] = current_fragment
fragment_i += model.target_field_length
separated_output_fragments = model.separate_batch({'data_input': input_batch})
if type(separated_output_fragments) is list:
vocals_output_fragment = separated_output_fragments[0]
vocals_output_fragment = vocals_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
vocals_output_fragment = vocals_output_fragment.flatten().tolist()
if type(separated_output_fragments) is float:
vocals_output_fragment = [vocals_output_fragment]
vocals_output = vocals_output + vocals_output_fragment
vocals_output = np.array(vocals_output)
if num_pad_values != 0:
vocals_output = vocals_output[:-num_pad_values]
mixture_valid_signal = input['mixture'][
model.half_receptive_field_length:model.half_receptive_field_length + len(vocals_output)]
accompaniment_output = mixture_valid_signal - vocals_output
output_vocals_filename = output_filename_prefix + '_vocals.wav'
output_accompaniment_filename = output_filename_prefix + '_accompaniment.wav'
output_vocals_filepath = os.path.join(output_path, output_vocals_filename)
output_accompaniment_filepath = os.path.join(output_path, output_accompaniment_filename)
util.write_wav(vocals_output, output_vocals_filepath, sample_rate)
util.write_wav(accompaniment_output, output_accompaniment_filepath, sample_rate)
if target == 'multi-instrument':
if len(input['mixture']) < model.receptive_field_length:
raise ValueError('Input is not long enough to be used with this model.')
num_output_samples = input['mixture'].shape[0] - (model.receptive_field_length - 1)
num_fragments = int(np.ceil(num_output_samples / model.target_field_length))
num_batches = int(np.ceil(num_fragments / batch_size))
vocals_output = []
drums_output = []
bass_output = []
num_pad_values = 0
fragment_i = 0
for batch_i in tqdm.tqdm(range(0, num_batches)):
            if batch_i == num_batches - 1:  # If it's the last batch, it may be smaller
batch_size = num_fragments - batch_i * batch_size
input_batch = np.zeros((batch_size, model.input_length))
# Assemble batch
for batch_fragment_i in range(0, batch_size):
if fragment_i + model.target_field_length > num_output_samples:
remainder = input['mixture'][fragment_i:]
current_fragment = np.zeros((model.input_length,))
current_fragment[:remainder.shape[0]] = remainder
num_pad_values = model.input_length - remainder.shape[0]
else:
current_fragment = input['mixture'][fragment_i:fragment_i + model.input_length]
input_batch[batch_fragment_i, :] = current_fragment
fragment_i += model.target_field_length
separated_output_fragments = model.separate_batch({'data_input': input_batch})
if type(separated_output_fragments) is list:
vocals_output_fragment = separated_output_fragments[0]
drums_output_fragment = separated_output_fragments[1]
bass_output_fragment = separated_output_fragments[2]
vocals_output_fragment = vocals_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
vocals_output_fragment = vocals_output_fragment.flatten().tolist()
drums_output_fragment = drums_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
drums_output_fragment = drums_output_fragment.flatten().tolist()
bass_output_fragment = bass_output_fragment[:,
model.target_padding: model.target_padding + model.target_field_length]
bass_output_fragment = bass_output_fragment.flatten().tolist()
if type(separated_output_fragments) is float:
vocals_output_fragment = [vocals_output_fragment]
if type(drums_output_fragment) is float:
drums_output_fragment = [drums_output_fragment]
if type(bass_output_fragment) is float:
bass_output_fragment = [bass_output_fragment]
vocals_output = vocals_output + vocals_output_fragment
drums_output = drums_output + drums_output_fragment
bass_output = bass_output + bass_output_fragment
vocals_output = np.array(vocals_output)
drums_output = np.array(drums_output)
bass_output = np.array(bass_output)
if num_pad_values != 0:
vocals_output = vocals_output[:-num_pad_values]
drums_output = drums_output[:-num_pad_values]
bass_output = bass_output[:-num_pad_values]
mixture_valid_signal = input['mixture'][
model.half_receptive_field_length:model.half_receptive_field_length + len(vocals_output)]
other_output = mixture_valid_signal - vocals_output - drums_output - bass_output
output_vocals_filename = output_filename_prefix + '_vocals.wav'
output_drums_filename = output_filename_prefix + '_drums.wav'
output_bass_filename = output_filename_prefix + '_bass.wav'
output_other_filename = output_filename_prefix + '_other.wav'
output_vocals_filepath = os.path.join(output_path, output_vocals_filename)
output_drums_filepath = os.path.join(output_path, output_drums_filename)
output_bass_filepath = os.path.join(output_path, output_bass_filename)
output_other_filepath = os.path.join(output_path, output_other_filename)
util.write_wav(vocals_output, output_vocals_filepath, sample_rate)
util.write_wav(drums_output, output_drums_filepath, sample_rate)
util.write_wav(bass_output, output_bass_filepath, sample_rate)
util.write_wav(other_output, output_other_filepath, sample_rate)
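`separate_sample` slides the model over the mixture in hops of `target_field_length`, zero-padding the final fragment, so only samples with a full receptive field around them are predicted. A standalone sketch of that bookkeeping; the ten-second 16 kHz example length is hypothetical, while 8185/1601/10 match the shipped session configs:

```python
import math

def fragment_counts(num_mixture_samples, receptive_field_length,
                    target_field_length, batch_size):
    # Mirrors the arithmetic at the top of separate_sample().
    num_output_samples = num_mixture_samples - (receptive_field_length - 1)
    num_fragments = int(math.ceil(num_output_samples / float(target_field_length)))
    num_batches = int(math.ceil(num_fragments / float(batch_size)))
    return num_output_samples, num_fragments, num_batches

# Ten seconds of 16 kHz audio with the shipped model geometry:
counts = fragment_counts(160000, 8185, 1601, 10)
```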
================================================
FILE: sessions/multi-instrument/checkpoints/checkpoint.00045-0.hdf5
================================================
[File too large to display: 38.3 MB]
================================================
FILE: sessions/multi-instrument/config.json
================================================
{
"dataset": {
"extract_voice_percentage": 0,
"in_memory_percentage": 1,
"path": "MUS",
"sample_rate": 16000,
"type": "musdb18"
},
"model": {
"condition_encoding": "binary",
"dilations": 9,
"filters": {
"depths": {
"final": [
2048,
256
],
"res": 64,
"skip": 64
},
"lengths": {
"final": [
3,
3
],
"res": 3,
"skip": 1
}
},
"input_length": 9785,
"num_params": 3277763,
"num_residual_blocks": 40,
"num_stacks": 4,
"receptive_field_length": 8185,
"target_field_length": 1601,
"target_padding": 1,
"type": "multi-instrument"
},
"optimizer": {
"decay": 0.0,
"epsilon": 1e-08,
"lr": 0.001,
"momentum": 0.9,
"type": "adam"
},
"training": {
"batch_size": 10,
"early_stopping_patience": 16,
"loss": {
"out_1": {
"l1": 1,
"l2": 0,
"weight": 1
},
"out_2": {
"l1": 1,
"l2": 0,
"weight": 1
},
"out_3": {
"l1": 1,
"l2": 0,
"weight": 1
}
},
"num_epochs": 250,
"num_steps_test": 500,
"num_steps_train": 2000,
"path": "sessions/multi-instrument",
"verbosity": 1
}
}
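The geometry fields in this config are mutually consistent. A quick sanity check, assuming the dilation pattern is powers of two up to 2^9 (matching `"dilations": 9`) and that the receptive field is evaluated for a single target sample, mirroring `util.compute_receptive_field_length`:

```python
# Reproduce the receptive-field arithmetic from the config values above.
num_stacks = 4
dilations = [2 ** i for i in range(10)]   # assumed pattern: 1, 2, ..., 512
filter_length = 3                          # "res" filter length
target_field_length = 1601

half_filter = (filter_length - 1) // 2
per_stack = 2 * sum(d * half_filter for d in dilations)   # 2 * 1023 = 2046
receptive_field_length = num_stacks * per_stack + 1       # one target sample
input_length = receptive_field_length + target_field_length - 1

print(receptive_field_length)  # 8185, matching the config
print(input_length)            # 9785, matching the config
```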
================================================
FILE: sessions/singing-voice/checkpoints/checkpoint.00058-0.hdf5
================================================
[File too large to display: 38.3 MB]
================================================
FILE: sessions/singing-voice/config.json
================================================
{
"dataset": {
"extract_voice_percentage": 0.5,
"in_memory_percentage": 1,
"path": "data/MUS",
"sample_rate": 16000,
"type": "musdb18"
},
"model": {
"condition_encoding": "binary",
"dilations": 9,
"filters": {
"depths": {
"final": [
2048,
256
],
"res": 64,
"skip": 64
},
"lengths": {
"final": [
3,
3
],
"res": 3,
"skip": 1
}
},
"input_length": 9785,
"num_params": 3277249,
"num_residual_blocks": 40,
"num_stacks": 4,
"receptive_field_length": 8185,
"target_field_length": 1601,
"target_padding": 1,
"type": "singing-voice"
},
"optimizer": {
"decay": 0.0,
"epsilon": 1e-08,
"lr": 0.001,
"momentum": 0.9,
"type": "adam"
},
"training": {
"batch_size": 10,
"early_stopping_patience": 16,
"loss": {
"out_1": {
"l1": 1,
"l2": 0,
"weight": 1
},
"out_2": {
"l1": 1,
"l2": 0,
"weight": -0.05
}
},
"num_epochs": 250,
"num_steps_test": 500,
"num_steps_train": 2000,
"path": "sessions/singing-voice",
"verbosity": 1
}
}
================================================
FILE: util.py
================================================
# A Wavenet For Source Separation - Francesc Lluis - 25.10.2018
# Util.py
# Utility functions for dealing with audio signals and training a Source Separation Wavenet
import os
import numpy as np
import json
import warnings
import scipy.signal
import soundfile as sf
import keras
def l1_l2_loss(y_true, y_pred, l1_weight, l2_weight):
loss = 0
if l1_weight != 0:
loss += l1_weight*keras.losses.mean_absolute_error(y_true, y_pred)
if l2_weight != 0:
loss += l2_weight * keras.losses.mean_squared_error(y_true, y_pred)
return loss
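`l1_l2_loss` blends Keras' mean-absolute and mean-squared errors with configurable weights; the shipped configs use pure L1 (`l1: 1, l2: 0`). A numpy sketch of the same blend (taking a plain overall mean rather than Keras' per-sample reduction):

```python
import numpy as np

def l1_l2_loss_np(y_true, y_pred, l1_weight, l2_weight):
    # Same weighted blend as util.l1_l2_loss, with numpy in place of keras.losses.
    loss = 0.0
    if l1_weight != 0:
        loss += l1_weight * np.mean(np.abs(y_true - y_pred))
    if l2_weight != 0:
        loss += l2_weight * np.mean((y_true - y_pred) ** 2)
    return loss

y_true = np.array([0.0, 1.0, -1.0])
y_pred = np.array([0.5, 0.5, -0.5])
# Pure L1, as in the shipped configs: mean(|diff|) = 0.5
print(l1_l2_loss_np(y_true, y_pred, 1, 0))  # 0.5
```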
def compute_receptive_field_length(stacks, dilations, filter_length, target_field_length):
    half_filter_length = (filter_length - 1) // 2  # integer division; plain / would yield a float under Python 3
length = 0
for d in dilations:
length += d*half_filter_length
length = 2*length
length = stacks * length
length += target_field_length
return length
def wav_to_float(x):
try:
max_value = np.iinfo(x.dtype).max
min_value = np.iinfo(x.dtype).min
    except ValueError:  # np.iinfo raises ValueError for floating-point dtypes
        max_value = np.finfo(x.dtype).max
        min_value = np.finfo(x.dtype).min
x = x.astype('float64', casting='safe')
x -= min_value
x /= ((max_value - min_value) / 2.)
x -= 1.
return x
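`wav_to_float` maps the integer dtype's full range linearly onto [-1, 1]. Applying the same shift-scale-centre steps to `int16` input, for example:

```python
import numpy as np

x = np.array([-32768, 0, 32767], dtype=np.int16)
info = np.iinfo(x.dtype)

y = x.astype('float64')
y -= info.min                       # shift to [0, 65535]
y /= (info.max - info.min) / 2.0    # scale to [0, 2]
y -= 1.0                            # centre on [-1, 1]

print(y)  # [-1.0, ~1.5e-05, 1.0]: min and max land exactly on the rails
```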
def float_to_uint8(x):
x += 1.
x /= 2.
uint8_max_value = np.iinfo('uint8').max
x *= uint8_max_value
x = x.astype('uint8')
return x
def keras_float_to_uint8(x):
x += 1.
x /= 2.
uint8_max_value = 255
x *= uint8_max_value
return x
def linear_to_ulaw(x, u=255):
x = np.sign(x) * (np.log(1 + u * np.abs(x)) / np.log(1 + u))
return x
def keras_linear_to_ulaw(x, u=255.0):
x = keras.backend.sign(x) * (keras.backend.log(1 + u * keras.backend.abs(x)) / keras.backend.log(1 + u))
return x
def uint8_to_float(x):
max_value = np.iinfo('uint8').max
min_value = np.iinfo('uint8').min
x = x.astype('float32', casting='unsafe')
x -= min_value
x /= ((max_value - min_value) / 2.)
x -= 1.
return x
def keras_uint8_to_float(x):
max_value = 255
min_value = 0
x -= min_value
x /= ((max_value - min_value) / 2.)
x -= 1.
return x
def ulaw_to_linear(x, u=255.0):
y = np.sign(x) * (1 / float(u)) * (((1 + float(u)) ** np.abs(x)) - 1)
return y
def keras_ulaw_to_linear(x, u=255.0):
y = keras.backend.sign(x) * (1 / u) * (((1 + u) ** keras.backend.abs(x)) - 1)
return y
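`linear_to_ulaw` and `ulaw_to_linear` are exact inverses (μ-law companding with μ=255), which is easy to confirm numerically; companding also expands small amplitudes, which is the point of applying it before quantization:

```python
import numpy as np

def linear_to_ulaw(x, u=255):
    return np.sign(x) * (np.log(1 + u * np.abs(x)) / np.log(1 + u))

def ulaw_to_linear(x, u=255.0):
    return np.sign(x) * (1 / u) * ((1 + u) ** np.abs(x) - 1)

x = np.linspace(-1, 1, 11)
roundtrip = ulaw_to_linear(linear_to_ulaw(x))
print(np.allclose(roundtrip, x))  # True

# Companding expands small amplitudes: 1% of full scale maps to ~23%.
print(round(float(linear_to_ulaw(0.01)), 2))  # 0.23
```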
def one_hot_encode(x, num_values=256):
if isinstance(x, int):
x = np.array([x])
if isinstance(x, list):
x = np.array(x)
return np.eye(num_values, dtype='uint8')[x.astype('uint8')]
def one_hot_decode(x):
return np.argmax(x, axis=-1)
def preemphasis(signal, alpha=0.95):
return np.append(signal[0], signal[1:] - alpha * signal[:-1])
def binary_encode(x, max_value):
if isinstance(x, int):
x = np.array([x])
if isinstance(x, list):
x = np.array(x)
width = np.ceil(np.log2(max_value)).astype(int)
return (((x[:, None] & (1 << np.arange(width)))) > 0).astype(int)
def get_condition_input_encode_func(representation):
if representation == 'binary':
return binary_encode
else:
return one_hot_encode
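`binary_encode` turns a condition class id into a little-endian bit vector of width `ceil(log2(max_value))`, a much more compact conditioning input than one-hot. For instance (the class id and `max_value` here are illustrative, not taken from the repo):

```python
import numpy as np

def binary_encode(x, max_value):
    # Little-endian binary encoding, as in util.binary_encode.
    if isinstance(x, (int, list)):
        x = np.array(x, ndmin=1)
    width = int(np.ceil(np.log2(max_value)))
    return ((x[:, None] & (1 << np.arange(width))) > 0).astype(int)

# Class id 5 with 9 possible classes needs ceil(log2(9)) = 4 bits.
print(binary_encode(5, 9))  # [[1 0 1 0]]  (bit order: LSB first)
```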
def ensure_keys_in_dict(keys, dictionary):
    return all(key in dictionary for key in keys)
def get_subdict_from_dict(keys, dictionary):
return dict((k, dictionary[k]) for k in keys if k in dictionary)
def pretty_json_dump(values, file_path=None):
    if file_path is None:
        print(json.dumps(values, sort_keys=True, indent=4, separators=(',', ': ')))
    else:
        with open(file_path, 'w') as f:
            json.dump(values, f, sort_keys=True, indent=4, separators=(',', ': '))
def read_wav(filename):
    # Reads a wav audio file, averages the two channels if stereo, and converts the signal to float64
audio_signal, sample_rate = sf.read(filename)
if audio_signal.ndim > 1:
audio_signal = (audio_signal[:, 0] + audio_signal[:, 1])/2.0
if audio_signal.dtype != 'float64':
audio_signal = wav_to_float(audio_signal)
return audio_signal, sample_rate
def load_wav(wav_path, desired_sample_rate):
sequence, sample_rate = read_wav(wav_path)
sequence = ensure_sample_rate(sequence, desired_sample_rate, sample_rate)
return sequence
def write_wav(x, filename, sample_rate):
    if not isinstance(x, np.ndarray):
        x = np.array(x)
with warnings.catch_warnings():
warnings.simplefilter("error")
sf.write(filename, x, sample_rate)
def ensure_sample_rate(x, desired_sample_rate, file_sample_rate):
if file_sample_rate != desired_sample_rate:
return scipy.signal.resample_poly(x, desired_sample_rate, file_sample_rate)
return x
def normalize(x):
max_peak = np.max(np.abs(x))
return x / max_peak
def get_sequence_with_singing_indices(full_sequence):
signal_magnitude = np.abs(full_sequence)
chunk_length = 800
chunks_energies = []
    for i in range(0, len(signal_magnitude), chunk_length):
chunks_energies.append(np.mean(signal_magnitude[i:i + chunk_length]))
threshold = np.max(chunks_energies) * .1
chunks_energies = np.asarray(chunks_energies)
chunks_energies[np.where(chunks_energies < threshold)] = 0
onsets = np.zeros(len(chunks_energies))
onsets[np.nonzero(chunks_energies)] = 1
onsets = np.diff(onsets)
    # np.where(...)[0] keeps 1-d arrays even for a single voiced region;
    # np.squeeze would yield a 0-d array and break the indexing below.
    start_ind = np.where(onsets == 1)[0]
    finish_ind = np.where(onsets == -1)[0]
if finish_ind[0] < start_ind[0]:
finish_ind = finish_ind[1:]
if start_ind[-1] > finish_ind[-1]:
start_ind = start_ind[:-1]
indices_inici_final = np.insert(finish_ind, np.arange(len(start_ind)), start_ind)
return np.squeeze((np.asarray(indices_inici_final) + 1) * chunk_length)
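The core trick above is thresholding per-chunk energies at 10% of the peak and taking a first difference to locate voiced-region boundaries. On a toy energy sequence (values are hypothetical) with two voiced regions:

```python
import numpy as np

chunk_length = 800  # chunk size used by util.py; at 16 kHz, 50 ms
# Hypothetical mean |amplitude| per 800-sample chunk.
energies = np.array([0., 0., 5., 6., 0., 0., 0., 4., 5., 0.])

active = (energies >= energies.max() * 0.1).astype(int)
edges = np.diff(active)              # +1 at a voice onset, -1 at an offset
starts = np.where(edges == 1)[0]     # chunks 1 and 6 precede onsets
ends = np.where(edges == -1)[0]      # chunks 3 and 8 precede offsets

# Interleave the boundaries and convert chunk indices to sample indices.
boundaries = (np.sort(np.concatenate([starts, ends])) + 1) * chunk_length
print(boundaries)  # [1600 3200 5600 7200]
```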
def get_indices_subsequence(indices):
    start_index = 2 * np.random.randint(0, (len(indices) + 1) // 2)
    vocals_indices = (indices[start_index], indices[start_index + 1])
    accompaniment_indices = vocals_indices
    return vocals_indices, accompaniment_indices
def contains_voice(fragment, sequence):
signal_fragment_magnitude = np.abs(fragment)
signal_sequence_magnitude = np.abs(sequence)
chunk_length = 800
chunks_fragment_energies = []
    for i in range(0, len(signal_fragment_magnitude), chunk_length):
        chunks_fragment_energies.append(np.mean(signal_fragment_magnitude[i:i + chunk_length]))
    chunks_sequence_energies = []
    for i in range(0, len(signal_sequence_magnitude), chunk_length):
        chunks_sequence_energies.append(np.mean(signal_sequence_magnitude[i:i + chunk_length]))
threshold = np.max(chunks_sequence_energies) * .1
chunks_fragment_energies = np.asarray(chunks_fragment_energies)
chunks_fragment_energies[np.where(chunks_fragment_energies < threshold)] = 0
    return np.count_nonzero(chunks_fragment_energies) > 0
def dir_contains_files(path):
for f in os.listdir(path):
if not f.startswith('.'):
return True
return False