Repository: MaigoAkisame/cmu-thesis
Branch: master
Commit: 7578dda2de95
Files: 26
Total size: 93.5 KB
Directory structure:
gitextract_nbiev_ku/
├── .gitignore
├── LICENSE
├── README.md
├── code/
│ ├── audioset/
│ │ ├── Net.py
│ │ ├── eval-TALNet.sh
│ │ ├── eval.py
│ │ ├── train.py
│ │ ├── util_f1.py
│ │ ├── util_in.py
│ │ └── util_out.py
│ ├── dcase/
│ │ ├── Net.py
│ │ ├── eval.py
│ │ ├── train.py
│ │ ├── util_f1.py
│ │ ├── util_in.py
│ │ └── util_out.py
│ └── sequential/
│ ├── Net.py
│ ├── ctc.py
│ ├── ctl.py
│ ├── eval.py
│ ├── train.py
│ ├── util_f1.py
│ ├── util_in.py
│ └── util_out.py
├── data/
│ └── download.sh
└── workspace/
└── .gitignore
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*.pyc
data/dcase
data/audioset
data/sequential
workspace/dcase
workspace/audioset
workspace/sequential
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2018 Yun Wang
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# cmu-thesis
This repository contains the code for three experiments in my PhD thesis, [Polyphonic Sound Event Detection with Weak Labeling](http://www.cs.cmu.edu/~yunwang/papers/cmu-thesis.pdf):
* Sound event detection with **presence/absence labeling** on the **[DCASE 2017 challenge](http://www.cs.tut.fi/sgn/arg/dcase2017/challenge/task-large-scale-sound-event-detection)** (Chapter 3.2)
* Sound event detection with **presence/absence labeling** on **[Google Audio Set](https://research.google.com/audioset/)** (Chapter 3.3)
* Sound event detection with **sequential labeling** on a subset of **[Google Audio Set](https://research.google.com/audioset/)** (Chapter 4)
## Prerequisites
Hardware:
* A GPU
* Large storage (1 TB recommended)
Software:
* Python 2.7
* PyTorch (I used version 0.4.0a0+d3b6c5e)
* numpy, scipy, [joblib](https://pypi.org/project/joblib/)
## Quick Start
```bash
# Clone the repository
git clone https://github.com/MaigoAkisame/cmu-thesis.git
# Download the data: may take up to 1 day!
cd cmu-thesis/data
./download.sh
# Train a model for the DCASE experiment using default settings
cd ../code/dcase
python train.py # Needs to run on a GPU
# Evaluate the model at Checkpoint 25
python eval.py --ckpt=25 # Needs to run on a GPU for the first time
# Download and evaluate the TALNet model for the Audio Set experiment
cd ../audioset
./eval-TALNet.sh # Needs to run on a GPU for the first time
```
## Organization of the Repository
### code
The `code` directory contains three sub-directories: `dcase`, `audioset`, and `sequential`. These contain the code for the three experiments. In each subdirectory:
* `Net.py` defines the network architecture (you don't need to execute this script directly);
* `train.py` trains the network;
* `eval.py` evaluates the network's performance.
The `train.py` and `eval.py` scripts can take many command-line arguments, which specify the architecture of the network and the hyperparameters used during training. If you encounter "out of memory" errors, a good first step is to reduce the batch size.
Some scripts that may be of special interest:
* `code/*/util_in.py`: Implements data balancing so that each minibatch contains roughly equal numbers of recordings of each event type;
* `code/sequential/ctc.py`: My implementation of connectionist temporal classification (CTC);
* `code/sequential/ctl.py`: My implementation of connectionist temporal localization (CTL).
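The balancing in `util_in.py` works by keeping one infinite per-class stream of recordings and drawing from the streams in turn. Below is a toy sketch of that idea only; the repository's actual implementation uses multiprocessing queues, worker processes, and shuffling of the queue order, none of which are shown here:

```python
import itertools
import numpy as np

def class_generator(class_id):
    # Stand-in for an endless stream of (features, label) pairs of one class.
    for i in itertools.count():
        yield (class_id, i)

def balanced_batches(n_classes, batch_size):
    # Cycle through the per-class generators so every batch contains
    # roughly equal numbers of recordings of each class.
    gens = [class_generator(c) for c in range(n_classes)]
    batch = []
    for gen in itertools.cycle(gens):
        batch.append(next(gen))
        if len(batch) == batch_size:
            yield batch
            batch = []

batch = next(balanced_batches(n_classes=4, batch_size=8))
counts = np.bincount([c for c, _ in batch], minlength=4)
print(counts)  # [2 2 2 2]: each of the 4 classes appears twice in a batch of 8
```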
### data
The script `data/download.sh` will download and extract the following three archives in the `data` directory:
* [dcase.tgz](http://islpc21.is.cs.cmu.edu/yunwang/git/cmu-thesis/data/dcase.tgz) (4.9 GB)
* [audioset.tgz](http://islpc21.is.cs.cmu.edu/yunwang/git/cmu-thesis/data/audioset.tgz) (341 GB)
* [sequential.tgz](http://islpc21.is.cs.cmu.edu/yunwang/git/cmu-thesis/data/sequential.tgz) (63 GB)
These archives contain Matlab data files (with the `.mat` extension) that store the filterbank features and ground truth labels. They can be loaded with the `scipy.io.loadmat` function in Python. Each Matlab file contains three matrices:
* `feat`: Filterbank features, a float32 array of shape (n, 400, 64) (n recordings, 400 frames, 64 frequency bins);
* `labels`:
* Presence/absence labeling, a boolean array of shape (n, m) (n recordings, m event types), or
* Strong labeling, a boolean array of shape (n, 100, m) (n recordings, 100 frames, m event types);
* `hashes`: A character array of size (n, 11), containing the YouTube hash IDs of the recordings.
Training recordings are organized by class (so data balancing can be done easily), and each Matlab file contains up to 101 recordings. Validation and test/evaluation recordings are stored in Matlab files that contain up to 500 recordings each.
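For instance, the layout described above round-trips cleanly through `scipy.io`. The snippet below builds a dummy file with made-up shapes (2 recordings, 17 event types; the `hashes` char array is omitted for brevity) and reads it back:

```python
import numpy as np
from scipy.io import loadmat, savemat

# Dummy file with the same layout as the real archives:
# 2 recordings, 400 frames, 64 frequency bins, 17 event types.
savemat('dummy.mat', {
    'feat': np.random.rand(2, 400, 64).astype('float32'),
    'labels': np.zeros((2, 17), dtype=bool),
})
data = loadmat('dummy.mat')
print(data['feat'].shape)    # (2, 400, 64)
print(data['labels'].shape)  # (2, 17)
```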
Because the data is so huge, I do not provide the code for downloading the raw data, extracting features, and organizing the features and labels into Matlab data files. The whole process took me more than a month and endless babysitting!
### workspace
The training logs, trained models, predictions on the test/evaluation recordings, and evaluation results will be generated in this directory. The sub-directory names will reflect the network architecture and hyperparameters for training.
The script `code/audioset/eval-TALNet.sh` will download the TALNet model and store it at `workspace/audioset/TALNet/model/TALNet.pt`. At the time of my graduation (October 2018), this was the best model that could both classify and localize sound events on Google Audio Set.
## Citing
If you use this code in your research, please cite my PhD thesis:
* Yun Wang, "Polyphonic sound event detection with weak labeling", PhD thesis, Carnegie Mellon University, Oct. 2018.
and/or the following publications:
* Yun Wang, Juncheng Li and Florian Metze, "A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling," arXiv e-prints, Oct. 2018. [Online]. Available: <http://arxiv.org/abs/1810.09050>.
* Yun Wang and Florian Metze, "Connectionist temporal localization for sound event detection with sequential labeling," arXiv e-prints, Oct. 2018. [Online]. Available: <http://arxiv.org/abs/1810.09052>.
================================================
FILE: code/audioset/Net.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy
class ConvBlock(nn.Module):
def __init__(self, n_input_feature_maps, n_output_feature_maps, kernel_size, batch_norm = False, pool_stride = None):
super(ConvBlock, self).__init__()
assert all(x % 2 == 1 for x in kernel_size)
self.n_input = n_input_feature_maps
self.n_output = n_output_feature_maps
self.kernel_size = kernel_size
self.batch_norm = batch_norm
self.pool_stride = pool_stride
# "~batch_norm" should be written as "not batch_norm"; otherwise ~True will evaluate to -2 and be treated as True.
# But I'll keep this error to avoid breaking existing models.
self.conv = nn.Conv2d(self.n_input, self.n_output, self.kernel_size, padding = tuple(x/2 for x in self.kernel_size), bias = ~batch_norm)
if batch_norm: self.bn = nn.BatchNorm2d(self.n_output)
nn.init.xavier_uniform(self.conv.weight)
def forward(self, x):
x = self.conv(x)
if self.batch_norm: x = self.bn(x)
x = F.relu(x)
if self.pool_stride is not None: x = F.max_pool2d(x, self.pool_stride)
return x
class Net(nn.Module):
def __init__(self, args):
super(Net, self).__init__()
self.__dict__.update(args.__dict__) # Instill all args into self
assert self.n_conv_layers % self.n_pool_layers == 0
self.input_n_freq_bins = n_freq_bins = 64
self.output_size = 527
self.conv = []
pool_interval = self.n_conv_layers / self.n_pool_layers
n_input = 1
for i in range(self.n_conv_layers):
if (i + 1) % pool_interval == 0: # this layer has pooling
n_freq_bins /= 2
n_output = self.embedding_size / n_freq_bins
pool_stride = (2, 2) if i < pool_interval * 2 else (1, 2)
else:
n_output = self.embedding_size * 2 / n_freq_bins
pool_stride = None
layer = ConvBlock(n_input, n_output, self.kernel_size, batch_norm = self.batch_norm, pool_stride = pool_stride)
self.conv.append(layer)
self.__setattr__('conv' + str(i + 1), layer)
n_input = n_output
self.gru = nn.GRU(self.embedding_size, self.embedding_size / 2, 1, batch_first = True, bidirectional = True)
self.fc_prob = nn.Linear(self.embedding_size, self.output_size)
if self.pooling == 'att':
self.fc_att = nn.Linear(self.embedding_size, self.output_size)
# Better initialization
nn.init.orthogonal(self.gru.weight_ih_l0); nn.init.constant(self.gru.bias_ih_l0, 0)
nn.init.orthogonal(self.gru.weight_hh_l0); nn.init.constant(self.gru.bias_hh_l0, 0)
nn.init.orthogonal(self.gru.weight_ih_l0_reverse); nn.init.constant(self.gru.bias_ih_l0_reverse, 0)
nn.init.orthogonal(self.gru.weight_hh_l0_reverse); nn.init.constant(self.gru.bias_hh_l0_reverse, 0)
nn.init.xavier_uniform(self.fc_prob.weight); nn.init.constant(self.fc_prob.bias, 0)
if self.pooling == 'att':
nn.init.xavier_uniform(self.fc_att.weight); nn.init.constant(self.fc_att.bias, 0)
def forward(self, x):
x = x.view((-1, 1, x.size(1), x.size(2))) # x becomes (batch, channel, time, freq)
for i in range(len(self.conv)):
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x = self.conv[i](x) # x becomes (batch, channel, time, freq)
x = x.permute(0, 2, 1, 3).contiguous() # x becomes (batch, time, channel, freq)
x = x.view((-1, x.size(1), x.size(2) * x.size(3))) # x becomes (batch, time, embedding_size)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x, _ = self.gru(x) # x becomes (batch, time, embedding_size)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
frame_prob = F.sigmoid(self.fc_prob(x)) # shape of frame_prob: (batch, time, output_size)
frame_prob = torch.clamp(frame_prob, 1e-7, 1 - 1e-7)
if self.pooling == 'max':
global_prob, _ = frame_prob.max(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'ave':
global_prob = frame_prob.mean(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'lin':
global_prob = (frame_prob * frame_prob).sum(dim = 1) / frame_prob.sum(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'exp':
global_prob = (frame_prob * frame_prob.exp()).sum(dim = 1) / frame_prob.exp().sum(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'att':
frame_att = F.softmax(self.fc_att(x), dim = 1)
global_prob = (frame_prob * frame_att).sum(dim = 1)
return global_prob, frame_prob, frame_att
def predict(self, x, verbose = True, batch_size = 100):
# Predict in batches. Both input and output are numpy arrays.
# If verbose == True, return all of global_prob, frame_prob and att
# If verbose == False, only return global_prob
result = []
for i in range(0, len(x), batch_size):
with torch.no_grad():
input = Variable(torch.from_numpy(x[i : i + batch_size])).cuda()
output = self.forward(input)
if not verbose: output = output[:1]
result.append([var.data.cpu().numpy() for var in output])
result = tuple(numpy.concatenate(items) for items in zip(*result))
return result if verbose else result[0]
================================================
FILE: code/audioset/eval-TALNet.sh
================================================
TALNet_FILE=../../workspace/audioset/TALNet/model/TALNet.pt
if ! [ -f $TALNet_FILE ]; then
mkdir -p $(dirname $TALNet_FILE)
wget -O $TALNet_FILE http://islpc21.is.cs.cmu.edu/yunwang/git/cmu-thesis/model/TALNet.pt
fi
python eval.py --TALNet
================================================
FILE: code/audioset/eval.py
================================================
import sys, os, os.path
import argparse
import numpy
from util_out import *
from util_f1 import *
from scipy.io import loadmat, savemat
# Parse input arguments
def mybool(s):
return s.lower() in ['t', 'true', 'y', 'yes', '1']
parser = argparse.ArgumentParser()
parser.add_argument('--TALNet', action = 'store_true') # specify this to evaluate the pre-trained TALNet model
parser.add_argument('--embedding_size', type = int, default = 1024) # this is the embedding size after a pooling layer
# after a non-pooling layer, the embedding size will be twice this much
parser.add_argument('--n_conv_layers', type = int, default = 10)
parser.add_argument('--kernel_size', type = str, default = '3') # 'n' or 'nxm'
parser.add_argument('--n_pool_layers', type = int, default = 5) # the pooling layers will be inserted uniformly into the conv layers
# there should be at least 2 and at most 6 pooling layers
# the first two pooling layers will have stride (2,2); later ones will have stride (1,2)
parser.add_argument('--batch_norm', type = mybool, default = True)
parser.add_argument('--dropout', type = float, default = 0.0)
parser.add_argument('--pooling', type = str, default = 'lin', choices = ['max', 'ave', 'lin', 'exp', 'att'])
parser.add_argument('--batch_size', type = int, default = 250)
parser.add_argument('--ckpt_size', type = int, default = 1000) # how many batches per checkpoint
parser.add_argument('--optimizer', type = str, default = 'adam', choices = ['adam', 'sgd'])
parser.add_argument('--init_lr', type = float, default = 1e-3)
parser.add_argument('--lr_patience', type = int, default = 3)
parser.add_argument('--lr_factor', type = float, default = 0.8)
parser.add_argument('--random_seed', type = int, default = 15213)
parser.add_argument('--ckpt', type = int)
args = parser.parse_args()
if 'x' not in args.kernel_size:
args.kernel_size = args.kernel_size + 'x' + args.kernel_size
# Locate model file and prepare directories for prediction and evaluation
expid = 'TALNet' if args.TALNet else 'embed%d-%dC%dP-kernel%s-%s-drop%.1f-%s-batch%d-ckpt%d-%s-lr%.0e-pat%d-fac%.1f-seed%d' % (
args.embedding_size,
args.n_conv_layers,
args.n_pool_layers,
args.kernel_size,
'bn' if args.batch_norm else 'nobn',
args.dropout,
args.pooling,
args.batch_size,
args.ckpt_size,
args.optimizer,
args.init_lr,
args.lr_patience,
args.lr_factor,
args.random_seed
)
WORKSPACE = os.path.join('../../workspace/audioset', expid)
PRED_PATH = os.path.join(WORKSPACE, 'pred')
if not os.path.exists(PRED_PATH): os.makedirs(PRED_PATH)
EVAL_PATH = os.path.join(WORKSPACE, 'eval')
if not os.path.exists(EVAL_PATH): os.makedirs(EVAL_PATH)
if args.TALNet:
MODEL_FILE = os.path.join(WORKSPACE, 'model', 'TALNet.pt')
PRED_FILE = os.path.join(PRED_PATH, 'TALNet.mat')
EVAL_FILE = os.path.join(EVAL_PATH, 'TALNet.txt')
else:
MODEL_FILE = os.path.join(WORKSPACE, 'model', 'checkpoint%d.pt' % args.ckpt)
PRED_FILE = os.path.join(PRED_PATH, 'checkpoint%d.mat' % args.ckpt)
EVAL_FILE = os.path.join(EVAL_PATH, 'checkpoint%d.txt' % args.ckpt)
with open(EVAL_FILE, 'w'):
pass
def write_log(s):
print s
with open(EVAL_FILE, 'a') as f:
f.write(s + '\n')
if os.path.exists(PRED_FILE):
# Load saved predictions, no need to use GPU
data = loadmat(PRED_FILE)
dcase_thres = data['dcase_thres'].ravel()
dcase_test_y = data['dcase_test_y']
dcase_test_frame_y = data['dcase_test_frame_y']
dcase_test_outputs = []
dcase_test_outputs.append(data['dcase_test_global_prob'])
dcase_test_outputs.append(data['dcase_test_frame_prob'])
if args.pooling == 'att':
dcase_test_outputs.append(data['dcase_test_frame_att'])
gas_eval_y = data['gas_eval_y']
gas_eval_global_prob = data['gas_eval_global_prob']
else:
import torch
import torch.nn as nn
from torch.optim import *
from torch.optim.lr_scheduler import *
from torch.autograd import Variable
from Net import Net
from util_in import *
# Load model
args.kernel_size = tuple(int(x) for x in args.kernel_size.split('x'))
model = Net(args).cuda()
model.load_state_dict(torch.load(MODEL_FILE)['model'])
model.eval()
# Load DCASE data
dcase_valid_x, dcase_valid_y, _ = bulk_load('DCASE_valid')
dcase_test_x, dcase_test_y, dcase_test_hashes = bulk_load('DCASE_test')
dcase_test_frame_y = load_dcase_test_frame_truth()
DCASE_CLASS_IDS = [318, 324, 341, 321, 307, 310, 314, 397, 325, 326, 323, 319, 14, 342, 329, 331, 316]
# Predict on DCASE data
dcase_valid_global_prob = model.predict(dcase_valid_x, verbose = False)[:, DCASE_CLASS_IDS]
dcase_thres = optimize_micro_avg_f1(dcase_valid_global_prob, dcase_valid_y)
dcase_test_outputs = model.predict(dcase_test_x, verbose = True)
dcase_test_outputs = tuple(x[..., DCASE_CLASS_IDS] for x in dcase_test_outputs)
# Load GAS data
gas_eval_x, gas_eval_y, gas_eval_hashes = bulk_load('GAS_eval')
# Predict on GAS data
gas_eval_global_prob = model.predict(gas_eval_x, verbose = False)
# Save predictions
data = {}
data['dcase_thres'] = dcase_thres
data['dcase_test_hashes'] = dcase_test_hashes
data['dcase_test_y'] = dcase_test_y
data['dcase_test_frame_y'] = dcase_test_frame_y
data['dcase_test_global_prob'] = dcase_test_outputs[0]
data['dcase_test_frame_prob'] = dcase_test_outputs[1]
if args.pooling == 'att':
data['dcase_test_frame_att'] = dcase_test_outputs[2]
data['gas_eval_hashes'] = gas_eval_hashes
data['gas_eval_y'] = gas_eval_y
data['gas_eval_global_prob'] = gas_eval_global_prob
savemat(PRED_FILE, data)
# Evaluation on DCASE 2017
write_log('Performance on DCASE 2017:')
write_log('')
write_log(' || || Task A (recording level) || Task B (1-second segment level) ')
write_log(' CLASS || THRES || TP | FN | FP | Prec. | Recall | F1 || TP | FN | FP | Prec. | Recall | F1 | Sub | Del | Ins | ER ')
FORMAT1 = ' Micro Avg || || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f | %#4d | %#4d | %#4d | %6.02f '
FORMAT2 = ' %######9d || %8.0006f || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f | | | | '
SEP = ''.join('+' if c == '|' else '-' for c in FORMAT1)
write_log(SEP)
# dcase_test_y and dcase_test_frame_y are inconsistent in some places
# so when you evaluate Task A, use a "fake_dcase_test_frame_y" derived from dcase_test_y
fake_dcase_test_frame_y = numpy.tile(numpy.expand_dims(dcase_test_y, 1), (1, 100, 1))
# Micro-average performance across all classes
res_taskA = dcase_sed_eval(dcase_test_outputs, args.pooling, dcase_thres, fake_dcase_test_frame_y, 100, verbose = True)
res_taskB = dcase_sed_eval(dcase_test_outputs, args.pooling, dcase_thres, dcase_test_frame_y, 10, verbose = True)
write_log(FORMAT1 % (res_taskA.TP, res_taskA.FN, res_taskA.FP, res_taskA.precision, res_taskA.recall, res_taskA.F1,
res_taskB.TP, res_taskB.FN, res_taskB.FP, res_taskB.precision, res_taskB.recall, res_taskB.F1,
res_taskB.sub, res_taskB.dele, res_taskB.ins, res_taskB.ER))
write_log(SEP)
# Class-wise performance
N_CLASSES = dcase_test_outputs[0].shape[-1]
for i in range(N_CLASSES):
outputs = [x[..., i:i+1] for x in dcase_test_outputs]
res_taskA = dcase_sed_eval(outputs, args.pooling, dcase_thres[i], fake_dcase_test_frame_y[..., i:i+1], 100, verbose = True)
res_taskB = dcase_sed_eval(outputs, args.pooling, dcase_thres[i], dcase_test_frame_y[..., i:i+1], 10, verbose = True)
write_log(FORMAT2 % (i, dcase_thres[i],
res_taskA.TP, res_taskA.FN, res_taskA.FP, res_taskA.precision, res_taskA.recall, res_taskA.F1,
res_taskB.TP, res_taskB.FN, res_taskB.FP, res_taskB.precision, res_taskB.recall, res_taskB.F1))
# Evaluation on Google Audio Set
write_log('')
write_log('Performance on Google Audio Set:')
write_log('')
write_log(" CLASS || AP | AUC | d' ")
FORMAT = ' %00007s || %5.3f | %5.3f |%6.03f '
SEP = ''.join('+' if c == '|' else '-' for c in FORMAT)
write_log(SEP)
classwise = []
N_CLASSES = gas_eval_global_prob.shape[-1]
for i in range(N_CLASSES):
classwise.append(gas_eval(gas_eval_global_prob[:,i], gas_eval_y[:,i])) # AP, AUC, dprime
map, mauc = numpy.array(classwise).mean(axis = 0)[:2]
write_log(FORMAT % ('Average', map, mauc, dprime(mauc)))
write_log(SEP)
for i in range(N_CLASSES):
write_log(FORMAT % ((str(i),) + classwise[i]))
================================================
FILE: code/audioset/train.py
================================================
import sys, os, os.path, time
import argparse
import numpy
import torch
import torch.nn as nn
from torch.optim import *
from torch.optim.lr_scheduler import *
from torch.autograd import Variable
from Net import Net
from util_in import *
from util_out import *
from util_f1 import *
torch.backends.cudnn.benchmark = True
# Parse input arguments
def mybool(s):
return s.lower() in ['t', 'true', 'y', 'yes', '1']
parser = argparse.ArgumentParser()
parser.add_argument('--embedding_size', type = int, default = 1024) # this is the embedding size after a pooling layer
# after a non-pooling layer, the embedding size will be twice this much
parser.add_argument('--n_conv_layers', type = int, default = 10)
parser.add_argument('--kernel_size', type = str, default = '3') # 'n' or 'nxm'
parser.add_argument('--n_pool_layers', type = int, default = 5) # the pooling layers will be inserted uniformly into the conv layers
# there should be at least 2 and at most 6 pooling layers
# the first two pooling layers will have stride (2,2); later ones will have stride (1,2)
parser.add_argument('--batch_norm', type = mybool, default = True)
parser.add_argument('--dropout', type = float, default = 0.0)
parser.add_argument('--pooling', type = str, default = 'lin', choices = ['max', 'ave', 'lin', 'exp', 'att'])
parser.add_argument('--batch_size', type = int, default = 250)
parser.add_argument('--ckpt_size', type = int, default = 1000) # how many batches per checkpoint
parser.add_argument('--optimizer', type = str, default = 'adam', choices = ['adam', 'sgd'])
parser.add_argument('--init_lr', type = float, default = 1e-3)
parser.add_argument('--lr_patience', type = int, default = 3)
parser.add_argument('--lr_factor', type = float, default = 0.8)
parser.add_argument('--max_ckpt', type = int, default = 30)
parser.add_argument('--random_seed', type = int, default = 15213)
args = parser.parse_args()
if 'x' not in args.kernel_size:
args.kernel_size = args.kernel_size + 'x' + args.kernel_size
numpy.random.seed(args.random_seed)
# Prepare log file and model directory
expid = 'embed%d-%dC%dP-kernel%s-%s-drop%.1f-%s-batch%d-ckpt%d-%s-lr%.0e-pat%d-fac%.1f-seed%d' % (
args.embedding_size,
args.n_conv_layers,
args.n_pool_layers,
args.kernel_size,
'bn' if args.batch_norm else 'nobn',
args.dropout,
args.pooling,
args.batch_size,
args.ckpt_size,
args.optimizer,
args.init_lr,
args.lr_patience,
args.lr_factor,
args.random_seed
)
WORKSPACE = os.path.join('../../workspace/audioset', expid)
MODEL_PATH = os.path.join(WORKSPACE, 'model')
if not os.path.exists(MODEL_PATH): os.makedirs(MODEL_PATH)
LOG_FILE = os.path.join(WORKSPACE, 'train.log')
with open(LOG_FILE, 'w'):
pass
def write_log(s):
timestamp = time.strftime('%m-%d %H:%M:%S')
msg = '[' + timestamp + '] ' + s
print msg
with open(LOG_FILE, 'a') as f:
f.write(msg + '\n')
# Load data
write_log('Loading data ...')
train_gen = batch_generator(batch_size = args.batch_size, random_seed = args.random_seed)
gas_valid_x, gas_valid_y, _ = bulk_load('GAS_valid')
gas_eval_x, gas_eval_y, _ = bulk_load('GAS_eval')
dcase_valid_x, dcase_valid_y, _ = bulk_load('DCASE_valid')
dcase_test_x, dcase_test_y, _ = bulk_load('DCASE_test')
dcase_test_frame_truth = load_dcase_test_frame_truth()
DCASE_CLASS_IDS = [318, 324, 341, 321, 307, 310, 314, 397, 325, 326, 323, 319, 14, 342, 329, 331, 316]
# Build model
args.kernel_size = tuple(int(x) for x in args.kernel_size.split('x'))
model = Net(args).cuda()
if args.optimizer == 'sgd':
optimizer = SGD(model.parameters(), lr = args.init_lr, momentum = 0.9, nesterov = True)
elif args.optimizer == 'adam':
optimizer = Adam(model.parameters(), lr = args.init_lr)
scheduler = ReduceLROnPlateau(optimizer, mode = 'max', factor = args.lr_factor, patience = args.lr_patience) if args.lr_factor < 1.0 else None
criterion = nn.BCELoss()
# Train model
write_log('Training model ...')
write_log(' || GAS_VALID || GAS_EVAL || D_VAL || DCASE_TEST ')
write_log(" CKPT | LR | Tr.LOSS || MAP | MAUC | d' || MAP | MAUC | d' || Gl.F1 || Gl.F1 | Fr.ER | Fr.F1 | 1s.ER | 1s.F1 ")
FORMAT = ' %#4d | %8.0003g | %8.0006f || %5.3f | %5.3f |%6.03f || %5.3f | %5.3f |%6.03f || %5.3f || %5.3f | %5.3f | %5.3f | %5.3f | %5.3f '
SEP = ''.join('+' if c == '|' else '-' for c in FORMAT)
write_log(SEP)
for checkpoint in range(1, args.max_ckpt + 1):
# Train for args.ckpt_size batches
model.train()
train_loss = 0
for batch in range(1, args.ckpt_size + 1):
x, y = next(train_gen)
optimizer.zero_grad()
global_prob = model(x)[0]
global_prob.clamp_(min = 1e-7, max = 1 - 1e-7)
loss = criterion(global_prob, y)
train_loss += loss.data[0]
if numpy.isnan(train_loss) or numpy.isinf(train_loss): break
loss.backward()
optimizer.step()
sys.stderr.write('Checkpoint %d, Batch %d / %d, avg train loss = %f\r' % \
(checkpoint, batch, args.ckpt_size, train_loss / batch))
del x, y, global_prob, loss # This line and next line: to save GPU memory
torch.cuda.empty_cache() # I don't know if they're useful or not
train_loss /= args.ckpt_size
# Evaluate model
model.eval()
sys.stderr.write('Evaluating model on GAS_VALID ...\r')
global_prob = model.predict(gas_valid_x, verbose = False)
gv_map, gv_mauc, gv_dprime = gas_eval(global_prob, gas_valid_y)
sys.stderr.write('Evaluating model on GAS_EVAL ... \r')
global_prob = model.predict(gas_eval_x, verbose = False)
ge_map, ge_mauc, ge_dprime = gas_eval(global_prob, gas_eval_y)
sys.stderr.write('Evaluating model on DCASE_VALID ...\r')
global_prob = model.predict(dcase_valid_x, verbose = False)[:, DCASE_CLASS_IDS]
thres = optimize_micro_avg_f1(global_prob, dcase_valid_y)
dv_f1 = f1(global_prob >= thres, dcase_valid_y)
sys.stderr.write('Evaluating model on DCASE_TEST ... \r')
outputs = model.predict(dcase_test_x, verbose = True)
outputs = tuple(x[..., DCASE_CLASS_IDS] for x in outputs)
dt_f1 = f1(outputs[0] >= thres, dcase_test_y)
dt_frame_er, dt_frame_f1 = dcase_sed_eval(outputs, args.pooling, thres, dcase_test_frame_truth, 1)
dt_1s_er, dt_1s_f1 = dcase_sed_eval(outputs, args.pooling, thres, dcase_test_frame_truth, 10)
# Write log
write_log(FORMAT % (
checkpoint, optimizer.param_groups[0]['lr'], train_loss,
gv_map, gv_mauc, gv_dprime,
ge_map, ge_mauc, ge_dprime,
dv_f1, dt_f1, dt_frame_er, dt_frame_f1, dt_1s_er, dt_1s_f1
))
# Abort if training has gone mad
if numpy.isnan(train_loss) or numpy.isinf(train_loss):
write_log('Aborted.')
break
# Save model. Too bad I can't save the scheduler
MODEL_FILE = os.path.join(MODEL_PATH, 'checkpoint%d.pt' % checkpoint)
state = {'model': model.state_dict(), 'optimizer': optimizer.state_dict()}
sys.stderr.write('Saving model to %s ...\r' % MODEL_FILE)
torch.save(state, MODEL_FILE)
# Update learning rate
if scheduler is not None:
scheduler.step(gv_map)
write_log('DONE!')
================================================
FILE: code/audioset/util_f1.py
================================================
import numpy
# Compute F1 given predictions and truth
def f1(pred, truth):
return 2.0 * (pred & truth).sum() / (pred.sum() + truth.sum())
# Given scores and truth for a single class (as 1-D numpy arrays), find optimal threshold and corresponding F1
# Statistics of other classes may be given to optimize micro-average F1
def optimize_f1(scores, truth, extraNcorr = 0, extraNtrue = 0, extraNpred = 0):
# Start with predicting everything as negative
best_thres = numpy.inf
best_f1 = 0.0
num = extraNcorr # number of correctly predicted instances
den = extraNtrue + extraNpred + truth.sum() # number of predicted instances + true instances
instances = [(-numpy.inf, False)] + sorted(zip(scores, truth))
# Lower the threshold gradually
for i in range(len(instances) - 1, 0, -1):
if instances[i][1]: num += 1
den += 1
if instances[i][0] > instances[i-1][0]: # Can put threshold here
f1 = 2.0 * num / den
if f1 > best_f1:
best_thres = (instances[i][0] + instances[i-1][0]) / 2
best_f1 = f1
return best_thres, best_f1
# Given scores and truth for many classes (as 2-D numpy arrays),
# find the optimal class-specific thresholds (as a 1-D numpy array) that maximizes the micro-average F1
# The algorithm is stochastic, but I have always observed deterministic results
def optimize_micro_avg_f1(scores, truth):
# First optimize each class individually
nClasses = truth.shape[1]
thres = numpy.zeros(nClasses, dtype = 'float64')
for i in range(nClasses):
thres[i], _ = optimize_f1(scores[:,i], truth[:,i])
Ntrue = truth.sum(axis = 0)
Npred = (scores >= thres).sum(axis = 0)
Ncorr = ((scores >= thres) & truth).sum(axis = 0)
# Repeatedly re-tune the threshold for each class until convergence
candidates = range(nClasses)
while len(candidates) > 0:
i = numpy.random.choice(candidates)
candidates.remove(i)
old_thres = thres[i]
thres[i], _ = optimize_f1(
scores[:,i],
truth[:,i],
extraNcorr = Ncorr.sum() - Ncorr[i],
extraNtrue = Ntrue.sum() - Ntrue[i],
extraNpred = Npred.sum() - Npred[i],
)
if thres[i] != old_thres:
Npred[i] = (scores[:,i] >= thres[i]).sum(axis = 0)
Ncorr[i] = ((scores[:,i] >= thres[i]) & truth[:,i]).sum(axis = 0)
candidates = range(nClasses)
candidates.remove(i)
return thres
================================================
FILE: code/audioset/util_in.py
================================================
import sys, os, os.path, glob
import cPickle
from scipy.io import loadmat
import numpy
from multiprocessing import Process, Queue
import torch
from torch.autograd import Variable
N_CLASSES = 527
N_WORKERS = 6
GAS_FEATURE_DIR = '../../data/audioset'
DCASE_FEATURE_DIR = '../../data/dcase'
with open(os.path.join(GAS_FEATURE_DIR, 'normalizer.pkl'), 'rb') as f:
mu, sigma = cPickle.load(f)
def sample_generator(file_list, random_seed = 15213):
rng = numpy.random.RandomState(random_seed)
while True:
rng.shuffle(file_list)
for filename in file_list:
data = loadmat(filename)
feat = ((data['feat'] - mu) / sigma).astype('float32')
labels = data['labels'].astype('float32')
for i in range(len(data['feat'])):
yield feat[i], labels[i]
def worker(queues, file_lists, random_seed):
generators = [sample_generator(file_lists[i], random_seed + i) for i in range(len(file_lists))]
while True:
for gen, q in zip(generators, queues):
q.put(next(gen))
def batch_generator(batch_size, random_seed = 15213):
queues = [Queue(5) for class_id in range(N_CLASSES)]
file_lists = [sorted(glob.glob(os.path.join(GAS_FEATURE_DIR, 'GAS_train_unbalanced_class%03d_part*.mat' % class_id))) for class_id in range(N_CLASSES)]
for worker_id in range(N_WORKERS):
p = Process(target = worker, args = (queues[worker_id::N_WORKERS], file_lists[worker_id::N_WORKERS], random_seed))
p.daemon = True
p.start()
rng = numpy.random.RandomState(random_seed)
batch = []
while True:
rng.shuffle(queues)
for q in queues:
batch.append(q.get())
if len(batch) == batch_size:
yield tuple(Variable(torch.from_numpy(numpy.stack(x))).cuda() for x in zip(*batch))
batch = []
def bulk_load(prefix):
feat = []; labels = []; hashes = []
for filename in sorted(glob.glob(os.path.join(GAS_FEATURE_DIR, '%s_*.mat' % prefix)) +
glob.glob(os.path.join(DCASE_FEATURE_DIR, '%s_*.mat' % prefix))):
data = loadmat(filename)
feat.append(((data['feat'] - mu) / sigma).astype('float32'))
labels.append(data['labels'].astype('bool'))
hashes.append(data['hashes'])
return numpy.concatenate(feat), numpy.concatenate(labels), numpy.concatenate(hashes)
def load_dcase_test_frame_truth():
return cPickle.load(open(os.path.join(DCASE_FEATURE_DIR, 'DCASE_test_frame_label.pkl'), 'rb'))
================================================
FILE: code/audioset/util_out.py
================================================
from scipy import stats
import numpy
def roc(pred, truth):
data = numpy.array(sorted(zip(pred, truth), reverse = True))
pred, truth = data[:,0], data[:,1].astype("bool")
TP = truth.cumsum()
FP = (1 - truth).cumsum()
mask = numpy.concatenate([numpy.diff(pred) < 0, numpy.array([True])])
TP = numpy.concatenate([numpy.array([0]), TP[mask]])
FP = numpy.concatenate([numpy.array([0]), FP[mask]])
return TP, FP
def ap_and_auc(pred, truth):
TP, FP = roc(pred, truth)
    auc = ((TP[1:] + TP[:-1]) * numpy.diff(FP)).sum() / (2.0 * TP[-1] * FP[-1]) # trapezoidal rule; 2.0 forces float division under Python 2
    precision = 1.0 * TP[1:] / (TP + FP)[1:] # 1.0 forces float division under Python 2
weight = numpy.diff(TP)
ap = (precision * weight).sum() / TP[-1]
return ap, auc
def dprime(auc):
return stats.norm().ppf(auc) * numpy.sqrt(2.0)
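`dprime` maps AUC through the inverse normal CDF, scaled by sqrt(2). A quick sanity check (my own illustration, not part of the pipeline) confirms that chance-level AUC gives d' = 0:

```python
from scipy import stats
import numpy

def dprime(auc):
    # d' = sqrt(2) * Phi^{-1}(AUC), where Phi is the standard normal CDF
    return stats.norm().ppf(auc) * numpy.sqrt(2.0)

chance = dprime(0.5)  # AUC of a random classifier
```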
def gas_eval(pred, truth):
if truth.ndim == 1:
ap, auc = ap_and_auc(pred, truth)
else:
ap, auc = numpy.array([ap_and_auc(pred[:,i], truth[:,i]) for i in range(truth.shape[1]) if truth[:,i].any()]).mean(axis = 0)
return ap, auc, dprime(auc)
def dcase_sed_eval(outputs, pooling, thres, truth, seg_len, verbose = False):
pred = outputs[1].reshape((-1, seg_len, outputs[1].shape[-1]))
if pooling == 'max':
seg_prob = pred.max(axis = 1)
elif pooling == 'ave':
seg_prob = pred.mean(axis = 1)
elif pooling == 'lin':
seg_prob = (pred * pred).sum(axis = 1) / pred.sum(axis = 1)
elif pooling == 'exp':
seg_prob = (pred * numpy.exp(pred)).sum(axis = 1) / numpy.exp(pred).sum(axis = 1)
elif pooling == 'att':
att = outputs[2].reshape((-1, seg_len, outputs[2].shape[-1]))
seg_prob = (pred * att).sum(axis = 1) / att.sum(axis = 1)
pred = seg_prob >= thres
truth = truth.reshape((-1, seg_len, truth.shape[-1])).max(axis = 1)
if not verbose:
Ntrue = truth.sum(axis = 1)
Npred = pred.sum(axis = 1)
Ncorr = (truth & pred).sum(axis = 1)
Nmiss = Ntrue - Ncorr
Nfa = Npred - Ncorr
error_rate = 1.0 * numpy.maximum(Nmiss, Nfa).sum() / Ntrue.sum()
f1 = 2.0 * Ncorr.sum() / (Ntrue + Npred).sum()
return error_rate, f1
else:
class Object(object):
pass
res = Object()
res.TP = (truth & pred).sum()
res.FN = (truth & ~pred).sum()
res.FP = (~truth & pred).sum()
res.precision = 100.0 * res.TP / (res.TP + res.FP)
res.recall = 100.0 * res.TP / (res.TP + res.FN)
res.F1 = 200.0 * res.TP / (2 * res.TP + res.FP + res.FN)
res.sub = numpy.minimum((truth & ~pred).sum(axis = 1), (~truth & pred).sum(axis = 1)).sum()
res.dele = res.FN - res.sub
res.ins = res.FP - res.sub
res.ER = 100.0 * (res.sub + res.dele + res.ins) / (res.TP + res.FN)
return res
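The pooling branches in `dcase_sed_eval` mirror the pooling functions used by the model. As a sketch with made-up frame probabilities, linear softmax ("lin") and exponential softmax ("exp") pooling both yield a value between average and max pooling:

```python
import numpy

p = numpy.array([0.2, 0.8])  # frame-level probabilities (illustrative)
max_pool = p.max()
ave_pool = p.mean()
lin_pool = (p * p).sum() / p.sum()                         # linear softmax pooling
exp_pool = (p * numpy.exp(p)).sum() / numpy.exp(p).sum()   # exponential softmax pooling
```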
================================================
FILE: code/dcase/Net.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy
class Net(nn.Module):
def __init__(self, args):
super(Net, self).__init__()
self.pooling = args.pooling
self.dropout = args.dropout
self.conv1 = nn.Conv2d(1, 32, (5, 5), padding = (2, 2)) # (1, 400, 64) -> (32, 400, 64)
self.conv2 = nn.Conv2d(32, 64, (5, 5), padding = (2, 2)) # (32, 400, 32) -> (64, 400, 32)
self.conv3 = nn.Conv2d(64, 128, (5, 5), padding = (2, 2)) # (64, 200, 16) -> (128, 200, 16)
self.gru = nn.GRU(1024, 100, 1, batch_first = True, bidirectional = True)
self.fc_prob = nn.Linear(200, 17)
if self.pooling == 'att':
self.fc_att = nn.Linear(200, 17)
# Better initialization
nn.init.xavier_uniform(self.conv1.weight); nn.init.constant(self.conv1.bias, 0)
nn.init.xavier_uniform(self.conv2.weight); nn.init.constant(self.conv2.bias, 0)
nn.init.xavier_uniform(self.conv3.weight); nn.init.constant(self.conv3.bias, 0)
nn.init.orthogonal(self.gru.weight_ih_l0); nn.init.constant(self.gru.bias_ih_l0, 0)
nn.init.orthogonal(self.gru.weight_hh_l0); nn.init.constant(self.gru.bias_hh_l0, 0)
nn.init.orthogonal(self.gru.weight_ih_l0_reverse); nn.init.constant(self.gru.bias_ih_l0_reverse, 0)
nn.init.orthogonal(self.gru.weight_hh_l0_reverse); nn.init.constant(self.gru.bias_hh_l0_reverse, 0)
nn.init.xavier_uniform(self.fc_prob.weight); nn.init.constant(self.fc_prob.bias, 0)
if self.pooling == 'att':
nn.init.xavier_uniform(self.fc_att.weight); nn.init.constant(self.fc_att.bias, 0)
def forward(self, x):
# shape of x: (batch, time, frequency) = (batch, 400, 64)
x = x.view((-1, 1, x.size(1), x.size(2))) # x becomes (batch, channel, time, frequency) = (batch, 1, 400, 64)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x = F.max_pool2d(F.relu(self.conv1(x)), (1, 2)) # (batch, 32, 400, 32)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2)) # (batch, 64, 200, 16)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x = F.max_pool2d(F.relu(self.conv3(x)), (2, 2)) # (batch, 128, 100, 8)
x = x.permute(0, 2, 1, 3).contiguous() # x becomes (batch, time, channel, frequency) = (batch, 100, 128, 8)
x = x.view((-1, x.size(1), x.size(2) * x.size(3))) # x becomes (batch, time, channel * frequency) = (batch, 100, 1024)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x, _ = self.gru(x) # (batch, 100, 200)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
frame_prob = F.sigmoid(self.fc_prob(x)) # shape of frame_prob: (batch, time, class) = (batch, 100, 17)
if self.pooling == 'max':
global_prob, _ = frame_prob.max(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'ave':
global_prob = frame_prob.mean(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'lin':
global_prob = (frame_prob * frame_prob).sum(dim = 1) / frame_prob.sum(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'exp':
global_prob = (frame_prob * frame_prob.exp()).sum(dim = 1) / frame_prob.exp().sum(dim = 1)
return global_prob, frame_prob
elif self.pooling == 'att':
frame_att = F.softmax(self.fc_att(x), dim = 1)
global_prob = (frame_prob * frame_att).sum(dim = 1)
return global_prob, frame_prob, frame_att
def predict(self, x, verbose = True, batch_size = 100):
# Predict in batches. Both input and output are numpy arrays.
# If verbose == True, return all of global_prob, frame_prob and att
# If verbose == False, only return global_prob
result = []
for i in range(0, len(x), batch_size):
with torch.no_grad():
input = Variable(torch.from_numpy(x[i : i + batch_size])).cuda()
output = self.forward(input)
if not verbose: output = output[:1]
result.append([var.data.cpu().numpy() for var in output])
result = tuple(numpy.concatenate(items) for items in zip(*result))
return result if verbose else result[0]
================================================
FILE: code/dcase/eval.py
================================================
import sys, os, os.path
import argparse
import numpy
from util_out import *
from util_f1 import *
from scipy.io import loadmat, savemat
# Parse input arguments
parser = argparse.ArgumentParser(description = '')
parser.add_argument('--pooling', type = str, default = 'lin', choices = ['max', 'ave', 'lin', 'exp', 'att'])
parser.add_argument('--dropout', type = float, default = 0.0)
parser.add_argument('--batch_size', type = int, default = 100)
parser.add_argument('--ckpt_size', type = int, default = 500)
parser.add_argument('--optimizer', type = str, default = 'adam', choices = ['adam', 'sgd'])
parser.add_argument('--init_lr', type = float, default = 3e-4)
parser.add_argument('--lr_patience', type = int, default = 3)
parser.add_argument('--lr_factor', type = float, default = 0.5)
parser.add_argument('--random_seed', type = int, default = 15213)
parser.add_argument('--ckpt', type = int)
args = parser.parse_args()
# Locate model file and prepare directories for prediction and evaluation
expid = '%s-drop%.1f-batch%d-ckpt%d-%s-lr%.0e-pat%d-fac%.1f-seed%d' % (
args.pooling,
args.dropout,
args.batch_size,
args.ckpt_size,
args.optimizer,
args.init_lr,
args.lr_patience,
args.lr_factor,
args.random_seed
)
WORKSPACE = os.path.join('../../workspace/dcase', expid)
MODEL_FILE = os.path.join(WORKSPACE, 'model', 'checkpoint%d.pt' % args.ckpt)
PRED_PATH = os.path.join(WORKSPACE, 'pred')
if not os.path.exists(PRED_PATH): os.makedirs(PRED_PATH)
PRED_FILE = os.path.join(PRED_PATH, 'checkpoint%d.mat' % args.ckpt)
EVAL_PATH = os.path.join(WORKSPACE, 'eval')
if not os.path.exists(EVAL_PATH): os.makedirs(EVAL_PATH)
EVAL_FILE = os.path.join(EVAL_PATH, 'checkpoint%d.txt' % args.ckpt)
with open(EVAL_FILE, 'w'):
pass
def write_log(s):
print s
with open(EVAL_FILE, 'a') as f:
f.write(s + '\n')
if os.path.exists(PRED_FILE):
# Load saved predictions, no need to use GPU
data = loadmat(PRED_FILE)
thres = data['thres'].ravel()
test_y = data['test_y']
test_frame_y = data['test_frame_y']
test_outputs = []
test_outputs.append(data['test_global_prob'])
test_outputs.append(data['test_frame_prob'])
if args.pooling == 'att':
test_outputs.append(data['test_frame_att'])
else:
import torch
import torch.nn as nn
from torch.optim import *
from torch.optim.lr_scheduler import *
from torch.autograd import Variable
from Net import Net
from util_in import *
# Load model
model = Net(args).cuda()
model.load_state_dict(torch.load(MODEL_FILE)['model'])
model.eval()
# Load data
valid_x, valid_y, _ = bulk_load('DCASE_valid')
test_x, test_y, test_hashes = bulk_load('DCASE_test')
test_frame_y = load_dcase_test_frame_truth()
# Predict
valid_global_prob = model.predict(valid_x, verbose = False)
thres = optimize_micro_avg_f1(valid_global_prob, valid_y)
test_outputs = model.predict(test_x, verbose = True)
# Save predictions
data = {}
data['thres'] = thres
data['test_hashes'] = test_hashes
data['test_y'] = test_y
data['test_frame_y'] = test_frame_y
data['test_global_prob'] = test_outputs[0]
data['test_frame_prob'] = test_outputs[1]
if args.pooling == 'att':
data['test_frame_att'] = test_outputs[2]
savemat(PRED_FILE, data)
# Evaluation
write_log(' || || Task A (recording level) || Task B (1-second segment level) ')
write_log(' CLASS || THRES || TP | FN | FP | Prec. | Recall | F1 || TP | FN | FP | Prec. | Recall | F1 | Sub | Del | Ins | ER ')
FORMAT1 = ' Micro Avg || || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f | %#4d | %#4d | %#4d | %6.02f '
FORMAT2 = ' %######9d || %8.0006f || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f || %#4d | %#4d | %#4d | %6.02f | %6.02f | %6.02f | | | | '
SEP = ''.join('+' if c == '|' else '-' for c in FORMAT1) # the extra '#'/'0' flags pad FORMAT1 so this separator matches the width of the data rows
write_log(SEP)
# test_y and test_frame_y are inconsistent in some places
# so when you evaluate Task A, use a "fake_test_frame_y" derived from test_y
fake_test_frame_y = numpy.tile(numpy.expand_dims(test_y, 1), (1, 100, 1))
# Micro-average performance across all classes
res_taskA = dcase_sed_eval(test_outputs, args.pooling, thres, fake_test_frame_y, 100, verbose = True)
res_taskB = dcase_sed_eval(test_outputs, args.pooling, thres, test_frame_y, 10, verbose = True)
write_log(FORMAT1 % (res_taskA.TP, res_taskA.FN, res_taskA.FP, res_taskA.precision, res_taskA.recall, res_taskA.F1,
res_taskB.TP, res_taskB.FN, res_taskB.FP, res_taskB.precision, res_taskB.recall, res_taskB.F1,
res_taskB.sub, res_taskB.dele, res_taskB.ins, res_taskB.ER))
write_log(SEP)
# Class-wise performance
N_CLASSES = test_outputs[0].shape[-1]
for i in range(N_CLASSES):
outputs = [x[..., i:i+1] for x in test_outputs]
res_taskA = dcase_sed_eval(outputs, args.pooling, thres[i], fake_test_frame_y[..., i:i+1], 100, verbose = True)
res_taskB = dcase_sed_eval(outputs, args.pooling, thres[i], test_frame_y[..., i:i+1], 10, verbose = True)
write_log(FORMAT2 % (i, thres[i],
res_taskA.TP, res_taskA.FN, res_taskA.FP, res_taskA.precision, res_taskA.recall, res_taskA.F1,
res_taskB.TP, res_taskB.FN, res_taskB.FP, res_taskB.precision, res_taskB.recall, res_taskB.F1))
================================================
FILE: code/dcase/train.py
================================================
import sys, os, os.path, time
import argparse
import numpy
import torch
import torch.nn as nn
from torch.optim import *
from torch.optim.lr_scheduler import *
from torch.autograd import Variable
from Net import Net
from util_in import *
from util_out import *
from util_f1 import *
torch.backends.cudnn.benchmark = True
# Parse input arguments
parser = argparse.ArgumentParser(description = '')
parser.add_argument('--pooling', type = str, default = 'lin', choices = ['max', 'ave', 'lin', 'exp', 'att'])
parser.add_argument('--dropout', type = float, default = 0.0)
parser.add_argument('--batch_size', type = int, default = 100)
parser.add_argument('--ckpt_size', type = int, default = 500)
parser.add_argument('--optimizer', type = str, default = 'adam', choices = ['adam', 'sgd'])
parser.add_argument('--init_lr', type = float, default = 3e-4)
parser.add_argument('--lr_patience', type = int, default = 3)
parser.add_argument('--lr_factor', type = float, default = 0.5)
parser.add_argument('--max_ckpt', type = int, default = 50)
parser.add_argument('--random_seed', type = int, default = 15213)
args = parser.parse_args()
numpy.random.seed(args.random_seed)
# Prepare log file and model directory
expid = '%s-drop%.1f-batch%d-ckpt%d-%s-lr%.0e-pat%d-fac%.1f-seed%d' % (
args.pooling,
args.dropout,
args.batch_size,
args.ckpt_size,
args.optimizer,
args.init_lr,
args.lr_patience,
args.lr_factor,
args.random_seed
)
WORKSPACE = os.path.join('../../workspace/dcase', expid)
MODEL_PATH = os.path.join(WORKSPACE, 'model')
if not os.path.exists(MODEL_PATH): os.makedirs(MODEL_PATH)
LOG_FILE = os.path.join(WORKSPACE, 'train.log')
with open(LOG_FILE, 'w'):
pass
def write_log(s):
timestamp = time.strftime('%Y-%m-%d %H:%M:%S')
msg = '[' + timestamp + '] ' + s
print msg
with open(LOG_FILE, 'a') as f:
f.write(msg + '\n')
# Load data
write_log('Loading data ...')
valid_x, valid_y, _ = bulk_load('DCASE_valid')
test_x, test_y, _ = bulk_load('DCASE_test')
test_frame_y = load_dcase_test_frame_truth()
# Build model
write_log('Building model ...')
model = Net(args).cuda()
if args.optimizer == 'sgd':
optimizer = SGD(model.parameters(), lr = args.init_lr, momentum = 0.9, nesterov = True)
elif args.optimizer == 'adam':
optimizer = Adam(model.parameters(), lr = args.init_lr)
if args.lr_factor < 1.0:
scheduler = ReduceLROnPlateau(optimizer, mode = 'min', factor = args.lr_factor, patience = args.lr_patience)
criterion = nn.BCELoss()
def bce_loss(input, target):
return -numpy.log(numpy.where(target, input, 1 - input)).sum() / input.size
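`bce_loss` above is a numpy mirror of `nn.BCELoss` for scoring held-out data without the GPU. On a toy example (probabilities and labels invented) it reduces to the average negative log-likelihood:

```python
import numpy

def bce_loss(input, target):
    # mean binary cross-entropy: -log p for positives, -log(1 - p) for negatives
    return -numpy.log(numpy.where(target, input, 1 - input)).sum() / input.size

probs = numpy.array([0.9, 0.2])
truth = numpy.array([True, False])
loss = bce_loss(probs, truth)  # -(log 0.9 + log 0.8) / 2
```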
# Train model
write_log('Training model ...')
write_log(' || D_VAL || DCASE_TEST ')
write_log(' CKPT | LR | Tr.LOSS | Val.LOSS || Gl.F1 || Gl.F1 | Fr.ER | Fr.F1 | 1s.ER | 1s.F1 ')
FORMAT = ' %#4d | %8.0003g | %8.0006f | %8.0006f || %5.3f || %5.3f | %5.3f | %5.3f | %5.3f | %5.3f '
SEP = ''.join('+' if c == '|' else '-' for c in FORMAT) # the extra '#'/'0' flags pad FORMAT so this separator matches the width of the data rows
write_log(SEP)
gen_train = batch_generator(args.batch_size, args.random_seed)
for ckpt in range(1, args.max_ckpt + 1):
model.train()
train_loss = 0
for i in range(args.ckpt_size):
x, y = next(gen_train)
optimizer.zero_grad()
global_prob = model(x)[0]
global_prob.clamp_(min = 1e-7, max = 1 - 1e-7)
loss = criterion(global_prob, y)
train_loss += loss.data[0]
loss.backward()
optimizer.step()
sys.stderr.write('Checkpoint %d, Batch %d / %d, avg train loss = %f\r' % (ckpt, i + 1, args.ckpt_size, train_loss / (i + 1)))
train_loss /= args.ckpt_size
# Compute validation loss, validation F1 and test F1
model.eval()
valid_global_prob = model.predict(valid_x, verbose = False)
valid_loss = bce_loss(valid_global_prob, valid_y)
thres = optimize_micro_avg_f1(valid_global_prob, valid_y)
valid_global_f1 = f1(valid_global_prob >= thres, valid_y)
test_outputs = model.predict(test_x, verbose = True)
test_global_f1 = f1(test_outputs[0] >= thres, test_y)
    test_frame_er, test_frame_f1 = dcase_sed_eval(test_outputs, args.pooling, thres, test_frame_y, 1) # each frame is its own segment
    test_1s_er, test_1s_f1 = dcase_sed_eval(test_outputs, args.pooling, thres, test_frame_y, 10) # every 10 frames form a 1-second segment
# Write log
write_log(FORMAT % (
ckpt, optimizer.param_groups[0]['lr'], train_loss, valid_loss,
valid_global_f1, test_global_f1, test_frame_er, test_frame_f1, test_1s_er, test_1s_f1
))
# Abort if training has gone mad
if numpy.isnan(train_loss) or numpy.isinf(train_loss):
write_log('Aborted.')
break
# Save model. Too bad I can't save the scheduler
MODEL_FILE = os.path.join(MODEL_PATH, 'checkpoint%d.pt' % ckpt)
state = {'model': model.state_dict(), 'optimizer': optimizer.state_dict()}
torch.save(state, MODEL_FILE)
# Update learning rate
if args.lr_factor < 1.0:
scheduler.step(valid_loss)
write_log('DONE!')
================================================
FILE: code/dcase/util_f1.py
================================================
import numpy
# Compute F1 given predictions and truth
def f1(pred, truth):
return 2.0 * (pred & truth).sum() / (pred.sum() + truth.sum())
# Given scores and truth for a single class (as 1-D numpy arrays), find optimal threshold and corresponding F1
# Statistics of other classes may be given to optimize micro-average F1
def optimize_f1(scores, truth, extraNcorr = 0, extraNtrue = 0, extraNpred = 0):
# Start with predicting everything as negative
best_thres = numpy.inf
best_f1 = 0.0
num = extraNcorr # number of correctly predicted instances
den = extraNtrue + extraNpred + truth.sum() # number of predicted instances + true instances
instances = [(-numpy.inf, False)] + sorted(zip(scores, truth))
# Lower the threshold gradually
for i in range(len(instances) - 1, 0, -1):
if instances[i][1]: num += 1
den += 1
if instances[i][0] > instances[i-1][0]: # Can put threshold here
f1 = 2.0 * num / den
if f1 > best_f1:
best_thres = (instances[i][0] + instances[i-1][0]) / 2
best_f1 = f1
return best_thres, best_f1
# Given scores and truth for many classes (as 2-D numpy arrays),
# find the optimal class-specific thresholds (as a 1-D numpy array) that maximizes the micro-average F1
# The algorithm is stochastic, but I have always observed deterministic results
def optimize_micro_avg_f1(scores, truth):
# First optimize each class individually
nClasses = truth.shape[1]
thres = numpy.zeros(nClasses, dtype = 'float64')
for i in range(nClasses):
thres[i], _ = optimize_f1(scores[:,i], truth[:,i])
Ntrue = truth.sum(axis = 0)
Npred = (scores >= thres).sum(axis = 0)
Ncorr = ((scores >= thres) & truth).sum(axis = 0)
    # Repeatedly re-tune the threshold for each class until convergence
candidates = range(nClasses)
while len(candidates) > 0:
i = numpy.random.choice(candidates)
candidates.remove(i)
old_thres = thres[i]
thres[i], _ = optimize_f1(
scores[:,i],
truth[:,i],
extraNcorr = Ncorr.sum() - Ncorr[i],
extraNtrue = Ntrue.sum() - Ntrue[i],
extraNpred = Npred.sum() - Npred[i],
)
if thres[i] != old_thres:
Npred[i] = (scores[:,i] >= thres[i]).sum(axis = 0)
Ncorr[i] = ((scores[:,i] >= thres[i]) & truth[:,i]).sum(axis = 0)
candidates = range(nClasses)
candidates.remove(i)
return thres
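The sweep in `optimize_f1` only needs to examine thresholds that fall between distinct consecutive scores (plus one above all scores). A standalone toy illustration of that search, with invented scores and labels:

```python
import numpy

scores = numpy.array([0.9, 0.8, 0.4, 0.2])
truth = numpy.array([True, False, True, False])

def micro_f1(thres):
    pred = scores >= thres
    return 2.0 * (pred & truth).sum() / (pred.sum() + truth.sum())

# Candidate thresholds: one above all scores, then midway between consecutive scores
candidates = [1.0, 0.85, 0.6, 0.3]
best_f1, best_thres = max((micro_f1(t), t) for t in candidates)
```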
================================================
FILE: code/dcase/util_in.py
================================================
import sys, os, os.path, glob
import cPickle
from scipy.io import loadmat
import numpy
from multiprocessing import Process, Queue
import torch
from torch.autograd import Variable
N_CLASSES = 17
N_WORKERS = 6
FEATURE_DIR = '../../data/dcase'
with open(os.path.join(FEATURE_DIR, 'normalizer.pkl'), 'rb') as f:
mu, sigma = cPickle.load(f)
def sample_generator(file_list, random_seed = 15213):
rng = numpy.random.RandomState(random_seed)
while True:
rng.shuffle(file_list)
for filename in file_list:
data = loadmat(filename)
feat = ((data['feat'] - mu) / sigma).astype('float32')
labels = data['labels'].astype('float32')
for i in range(len(data['feat'])):
yield feat[i], labels[i]
def worker(queues, file_lists, random_seed):
generators = [sample_generator(file_lists[i], random_seed + i) for i in range(len(file_lists))]
while True:
for gen, q in zip(generators, queues):
q.put(next(gen))
def batch_generator(batch_size, random_seed = 15213):
queues = [Queue(5) for class_id in range(N_CLASSES)]
file_lists = [sorted(glob.glob(os.path.join(FEATURE_DIR, 'DCASE_train_class%02d_part*.mat' % class_id))) for class_id in range(N_CLASSES)]
for worker_id in range(N_WORKERS):
p = Process(target = worker, args = (queues[worker_id::N_WORKERS], file_lists[worker_id::N_WORKERS], random_seed))
p.daemon = True
p.start()
rng = numpy.random.RandomState(random_seed)
batch = []
while True:
rng.shuffle(queues)
for q in queues:
batch.append(q.get())
if len(batch) == batch_size:
yield tuple(Variable(torch.from_numpy(numpy.stack(x))).cuda() for x in zip(*batch))
batch = []
def bulk_load(prefix):
feat = []; labels = []; hashes = []
for filename in sorted(glob.glob(os.path.join(FEATURE_DIR, '%s_*.mat' % prefix))):
data = loadmat(filename)
feat.append(((data['feat'] - mu) / sigma).astype('float32'))
labels.append(data['labels'].astype('bool'))
hashes.append(data['hashes'])
return numpy.concatenate(feat), numpy.concatenate(labels), numpy.concatenate(hashes)
def load_dcase_test_frame_truth():
return cPickle.load(open(os.path.join(FEATURE_DIR, 'DCASE_test_frame_label.pkl'), 'rb'))
================================================
FILE: code/dcase/util_out.py
================================================
import numpy
def dcase_sed_eval(outputs, pooling, thres, truth, seg_len, verbose = False):
pred = outputs[1].reshape((-1, seg_len, outputs[1].shape[-1]))
if pooling == 'max':
seg_prob = pred.max(axis = 1)
elif pooling == 'ave':
seg_prob = pred.mean(axis = 1)
elif pooling == 'lin':
seg_prob = (pred * pred).sum(axis = 1) / pred.sum(axis = 1)
elif pooling == 'exp':
seg_prob = (pred * numpy.exp(pred)).sum(axis = 1) / numpy.exp(pred).sum(axis = 1)
elif pooling == 'att':
att = outputs[2].reshape((-1, seg_len, outputs[2].shape[-1]))
seg_prob = (pred * att).sum(axis = 1) / att.sum(axis = 1)
pred = seg_prob >= thres
truth = truth.reshape((-1, seg_len, truth.shape[-1])).max(axis = 1)
if not verbose:
Ntrue = truth.sum(axis = 1)
Npred = pred.sum(axis = 1)
Ncorr = (truth & pred).sum(axis = 1)
Nmiss = Ntrue - Ncorr
Nfa = Npred - Ncorr
error_rate = 1.0 * numpy.maximum(Nmiss, Nfa).sum() / Ntrue.sum()
f1 = 2.0 * Ncorr.sum() / (Ntrue + Npred).sum()
return error_rate, f1
else:
class Object(object):
pass
res = Object()
res.TP = (truth & pred).sum()
res.FN = (truth & ~pred).sum()
res.FP = (~truth & pred).sum()
res.precision = 100.0 * res.TP / (res.TP + res.FP)
res.recall = 100.0 * res.TP / (res.TP + res.FN)
res.F1 = 200.0 * res.TP / (2 * res.TP + res.FP + res.FN)
res.sub = numpy.minimum((truth & ~pred).sum(axis = 1), (~truth & pred).sum(axis = 1)).sum()
res.dele = res.FN - res.sub
res.ins = res.FP - res.sub
res.ER = 100.0 * (res.sub + res.dele + res.ins) / (res.TP + res.FN)
return res
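In the verbose branch above, misses and false alarms that occur in the same segment pair up as substitutions; the leftovers become deletions and insertions. A small standalone check with invented truth/prediction matrices (rows = segments, columns = classes):

```python
import numpy

truth = numpy.array([[True, False], [True, True]])
pred = numpy.array([[False, True], [True, False]])
FN = (truth & ~pred).sum()  # misses
FP = (~truth & pred).sum()  # false alarms
# A miss and a false alarm within the same segment count as one substitution
sub = numpy.minimum((truth & ~pred).sum(axis=1), (~truth & pred).sum(axis=1)).sum()
dele = FN - sub             # unpaired misses
ins = FP - sub              # unpaired false alarms
```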
================================================
FILE: code/sequential/Net.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy
class ConvBlock(nn.Module):
def __init__(self, n_input_feature_maps, n_output_feature_maps, kernel_size_2d, batch_norm = False, pool_stride = None):
super(ConvBlock, self).__init__()
assert all(x % 2 == 1 for x in kernel_size_2d)
self.n_input = n_input_feature_maps
self.n_output = n_output_feature_maps
self.kernel_size = kernel_size_2d
self.batch_norm = batch_norm
self.pool_stride = pool_stride
# "~batch_norm" should be written as "not batch_norm"; otherwise ~True will evaluate to -2 and be treated as True.
# But I'll keep this error to avoid breaking existing models.
self.conv = nn.Conv2d(self.n_input, self.n_output, self.kernel_size, padding = tuple(x/2 for x in self.kernel_size), bias = ~batch_norm)
if batch_norm: self.bn = nn.BatchNorm2d(self.n_output)
nn.init.xavier_uniform(self.conv.weight)
def forward(self, x):
x = self.conv(x)
if self.batch_norm: x = self.bn(x)
x = F.relu(x)
if self.pool_stride is not None: x = F.max_pool2d(x, self.pool_stride)
return x
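`ConvBlock` asserts odd kernel sizes so that a padding of `kernel_size // 2` preserves the time and frequency dimensions. A quick arithmetic check of that "same" padding rule, with illustrative numbers:

```python
# Output length of a stride-1 convolution: L_out = L_in + 2*pad - (k - 1)
L_in, k = 400, 5
pad = k // 2          # integer division, as in the ConvBlock constructor
L_out = L_in + 2 * pad - (k - 1)
```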
class Net(nn.Module):
def __init__(self, args):
super(Net, self).__init__()
self.__dict__.update(args.__dict__) # Instill all args into self
assert self.n_conv_layers % self.n_pool_layers == 0
self.input_n_freq_bins = n_freq_bins = 64
self.output_size = 71 if self.mode == 'ctc' else 35
self.conv = []
pool_interval = self.n_conv_layers / self.n_pool_layers
n_input = 1
for i in range(self.n_conv_layers):
if (i + 1) % pool_interval == 0: # this layer has pooling
n_freq_bins /= 2
n_output = self.embedding_size / n_freq_bins
pool_stride = (2, 2) if i < pool_interval * 2 else (1, 2)
else:
n_output = self.embedding_size * 2 / n_freq_bins
pool_stride = None
layer = ConvBlock(n_input, n_output, self.kernel_size, batch_norm = self.batch_norm, pool_stride = pool_stride)
self.conv.append(layer)
self.__setattr__('conv' + str(i + 1), layer)
n_input = n_output
self.gru = nn.GRU(self.embedding_size, self.embedding_size / 2, 1, batch_first = True, bidirectional = True)
self.fc = nn.Linear(self.embedding_size, self.output_size)
# Better initialization
nn.init.orthogonal(self.gru.weight_ih_l0); nn.init.constant(self.gru.bias_ih_l0, 0)
nn.init.orthogonal(self.gru.weight_hh_l0); nn.init.constant(self.gru.bias_hh_l0, 0)
nn.init.orthogonal(self.gru.weight_ih_l0_reverse); nn.init.constant(self.gru.bias_ih_l0_reverse, 0)
nn.init.orthogonal(self.gru.weight_hh_l0_reverse); nn.init.constant(self.gru.bias_hh_l0_reverse, 0)
nn.init.xavier_uniform(self.fc.weight); nn.init.constant(self.fc.bias, 0)
def forward(self, x):
x = x.view((-1, 1, x.size(1), x.size(2))) # x becomes (batch, channel, time, freq)
for i in range(len(self.conv)):
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x = self.conv[i](x) # x becomes (batch, channel, time, freq)
x = x.permute(0, 2, 1, 3).contiguous() # x becomes (batch, time, channel, freq)
x = x.view((-1, x.size(1), x.size(2) * x.size(3))) # x becomes (batch, time, embedding_size)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
x, _ = self.gru(x) # x becomes (batch, time, embedding_size)
if self.dropout > 0: x = F.dropout(x, p = self.dropout, training = self.training)
if self.mode == 'ctc':
log_prob = F.log_softmax(self.fc(x), dim = -1) # shape of log_prob: (batch, time, output_size)
return log_prob # returns the log probability
else:
frame_prob = F.sigmoid(self.fc(x)) # shape of frame_prob: (batch, time, output_size)
frame_prob = torch.clamp(frame_prob, 1e-7, 1 - 1e-7)
return frame_prob
def predict(self, x, batch_size = 300):
# Predict in batches. Both input and output are numpy arrays.
result = []
for i in range(0, len(x), batch_size):
with torch.no_grad():
input = Variable(torch.from_numpy(x[i : i + batch_size])).cuda()
output = self.forward(input)
result.append(output.data.cpu().numpy())
return numpy.concatenate(result)
================================================
FILE: code/sequential/ctc.py
================================================
import numpy
numpy.seterr(divide = 'ignore')
import torch
from torch.autograd import Variable
def logsumexp(*args):
M = reduce(torch.max, args)
mask = M != -numpy.inf
M[mask] += torch.log(sum(torch.exp(x[mask] - M[mask]) for x in args))
# Must pick the valid part out, otherwise the gradient will contain NaNs
return M
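`logsumexp` subtracts the running maximum before exponentiating to avoid overflow, and masks out -inf entries so the gradient stays finite. A numpy-only sketch of the same stabilization trick (function name is mine):

```python
import numpy

def logsumexp_np(xs):
    # log(sum(exp(x))) computed stably by factoring out the maximum
    xs = numpy.asarray(xs, dtype='float64')
    M = xs.max()
    if M == -numpy.inf:
        return -numpy.inf
    return M + numpy.log(numpy.exp(xs - M).sum())

val = logsumexp_np([1000.0, 1000.0])  # naive exp(1000) would overflow
```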
# Input arguments:
# logProb: a 3-D Variable of size N_SEQS * N_FRAMES * N_LABELS containing LOG probabilities.
# seqLen: a list or numpy array indicating the number of valid frames in each sequence.
# label: a list of label sequences.
# Note on implementation:
# Anything that will be backpropped must be a Variable;
# Anything used as an index must be a torch.cuda.LongTensor.
def ctc_loss(logProb, seqLen, label, debug = False):
seqLen = numpy.array(seqLen)
nSeqs, nFrames = logProb.size(0), logProb.size(1)
# Find out the lengths of the label sequences
labelLen = torch.from_numpy(numpy.array([len(x) for x in label])).cuda()
# Insert blank symbol at the beginning, at the end, and between all symbols of the label sequences
nStates = max(len(x) for x in label) * 2 + 1
extendedLabel = numpy.zeros((nSeqs, nStates), dtype = 'int64')
for i in range(nSeqs):
extendedLabel[i, 1 : (len(label[i]) * 2) : 2] = label[i]
label = torch.from_numpy(extendedLabel).cuda()
# Compute alpha trellis
dummyColumn = Variable(-numpy.inf * torch.ones((nSeqs, 1)).cuda())
allSeqIndex = torch.from_numpy(numpy.tile(numpy.arange(nSeqs), (nStates, 1)).T).cuda()
uttLogProb = Variable(torch.zeros(nSeqs).cuda())
for frame in range(nFrames):
if frame == 0:
            # Initialize the log probabilities of the first two states to log(1), and all other states to log(0)
alpha = Variable(-numpy.inf * torch.ones((nSeqs, nStates)).cuda())
alpha[:, :2] = 0
else:
# Receive probability from previous frame
p2 = alpha[:, :-2].clone()
p2[label[:, 2:] == label[:, :-2]] = -numpy.inf
# Probability can pass across labels two steps apart if they are different
alpha = logsumexp(alpha,
torch.cat([dummyColumn, alpha[:, :-1]], 1),
torch.cat([dummyColumn, dummyColumn, p2], 1))
# Multiply with the probability of current frame
alpha += logProb[allSeqIndex, frame, label]
# Collect probability for ends of utterances
seqIndex = (seqLen == frame + 1).nonzero()[0]
if len(seqIndex) > 0:
seqIndex = torch.from_numpy(seqIndex).cuda()
ll = labelLen[seqIndex]
p = alpha[seqIndex, ll * 2].clone()
if (ll > 0).any():
p[ll > 0] = logsumexp(p[ll > 0], alpha[seqIndex[ll > 0], ll[ll > 0] * 2 - 1])
uttLogProb[seqIndex] = p
# Return the per-frame negative log probability of all utterances (and per-utterance log probs if debug == True)
loss = -uttLogProb.sum() / seqLen.sum()
if debug:
return loss, uttLogProb
else:
return loss
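The label extension at the top of `ctc_loss` interleaves blanks (token 0) before, between, and after the symbols, so a label sequence of length n becomes 2n + 1 states. A standalone illustration of that construction:

```python
import numpy

label = [2, 1, 1, 3]  # e.g. "BAAC"
extended = numpy.zeros(len(label) * 2 + 1, dtype='int64')
extended[1::2] = label  # odd positions hold the labels, even positions stay blank
```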
if __name__ == '__main__':
torch.set_printoptions(precision = 5)
label = numpy.array([[2, 1, 1, 3], # BAAC
[0, 0, 0, 0], # null
[1, 0, 0, 0], # A
[3, 2, 0, 0], # CB
[0, 0, 0, 0], # null
[1, 0, 0, 0], # A
[3, 2, 0, 0]]) # CB
seqLen = numpy.array([5, 3, 3, 3, 1, 1, 1])
logProb = numpy.log(numpy.tile(numpy.array([[[0.1, 0.2, 0.3, 0.4]]], dtype = 'float32'), (len(seqLen), max(seqLen), 1)))
logProb = Variable(torch.from_numpy(logProb).cuda(), requires_grad = True)
loss, uttLogProb = ctc_loss(logProb, seqLen, label, debug = True)
print loss, torch.exp(uttLogProb)
# Expected output of torch.exp(uttLogProb): [0.00048, 0.001, 0.022, 0.12, 0.1, 0.2, 0]
loss.backward()
# print logProb.grad
================================================
FILE: code/sequential/ctl.py
================================================
import numpy
numpy.seterr(divide = 'ignore')
import torch
from torch.autograd import Variable
def cuda(x):
return x.cuda() if torch.cuda.is_available() else x
def tensor(array):
if array.dtype == 'bool':
array = array.astype('uint8')
return cuda(torch.from_numpy(array))
def variable(array):
if isinstance(array, numpy.ndarray):
array = tensor(array)
return cuda(Variable(array))
def logsumexp(*args):
M = reduce(torch.max, args)
mask = M != -numpy.inf
M[mask] += torch.log(sum(torch.exp(x[mask] - M[mask]) for x in args))
# Must pick the valid part out, otherwise the gradient will contain NaNs
return M
# Input arguments:
# frameProb: a 3-D Variable of size N_SEQS * N_FRAMES * N_CLASSES containing the probability of each event at each frame.
# seqLen: a list or numpy array indicating the number of valid frames in each sequence.
# label: a list of label sequences.
# Note on implementation:
# Anything that will be backpropped must be a Variable;
# Anything used as an index must be a torch.cuda.LongTensor.
def ctl_loss(frameProb, seqLen, label, maxConcur = 1, debug = False):
seqLen = numpy.array(seqLen)
nSeqs, nFrames, nClasses = frameProb.size()
# Clear the content in the frames of frameProb beyond seqLen
frameIndex = numpy.tile(numpy.arange(nFrames), (nSeqs, 1))
mask = variable(numpy.expand_dims(frameIndex < seqLen.reshape((nSeqs, 1)), 2))
z = variable(torch.zeros(frameProb.size()))
frameProb = torch.where(mask, frameProb, z)
# Convert frameProb (probabilities of events) into probabilities of event boundaries
z = variable(1e-7 * torch.ones((nSeqs, 1, nClasses))) # Real zeros would cause NaNs in the gradient
frameProb = torch.cat([z, frameProb, z], dim = 1)
startProb = torch.clamp(frameProb[:, 1:] - frameProb[:, :-1], min = 1e-7)
endProb = torch.clamp(frameProb[:, :-1] - frameProb[:, 1:], min = 1e-7)
boundaryProb = torch.stack([startProb, endProb], dim = 3).view((nSeqs, nFrames + 1, nClasses * 2))
blankLogProb = torch.log(1 - boundaryProb).sum(dim = 2)
# blankLogProb[seq, frame] = log probability of emitting nothing at this frame
deltaLogProb = torch.log(boundaryProb) - torch.log(1 - boundaryProb)
# deltaLogProb[seq, frame, token] = log prob of emitting token minus log prob of not emitting token
# Find out the lengths of the label sequences
labelLen = tensor(numpy.array([len(x) for x in label]))
# Put the label sequences into a Variable
maxLabelLen = max(len(x) for x in label)
L = numpy.zeros((nSeqs, maxLabelLen), dtype = 'int64')
for i in range(nSeqs):
L[i, :len(label[i])] = numpy.array(label[i]) - 1 # minus one because we no longer have a dedicated blank token
label = tensor(L)
if maxConcur > maxLabelLen:
maxConcur = maxLabelLen
# Compute alpha trellis
# alpha[m, n] = log probability of having emitted n tokens in the m-th sequence at the current frame
nStates = maxLabelLen + 1
alpha = variable(-numpy.inf * torch.ones((nSeqs, nStates)))
alpha[:, 0] = 0
seqIndex = tensor(numpy.tile(numpy.arange(nSeqs), (nStates, 1)).T)
dummyColumns = variable(-numpy.inf * torch.ones((nSeqs, maxConcur)))
uttLogProb = variable(torch.zeros(nSeqs))
for frame in range(nFrames + 1): # +1 because we are considering boundaries
# Case 0: don't emit anything at current frame
p = alpha + blankLogProb[:, frame].view((-1, 1))
alpha = p
for i in range(1, maxConcur + 1):
# Case i: emit i tokens at current frame
p = p[:, :-1] + deltaLogProb[seqIndex[:, i:], frame, label[:, (i-1):]]
alpha = logsumexp(alpha, torch.cat([dummyColumns[:, :i], p], dim = 1))
# Collect probability for ends of utterances
finishedSeqs = (seqLen == frame).nonzero()[0]
if len(finishedSeqs) > 0:
finishedSeqs = tensor(finishedSeqs)
uttLogProb[finishedSeqs] = alpha[finishedSeqs, labelLen[finishedSeqs]].clone()
# Return the per-frame negative log probability of all utterances (and per-utterance log probs if debug == True)
loss = -uttLogProb.sum() / (seqLen + 1).sum()
if debug:
return loss, uttLogProb
else:
return loss
if __name__ == '__main__':
def strip(variable):
return variable.data.cpu().numpy()
torch.set_printoptions(precision = 5)
frameProb = numpy.array([[[0.1, 0.9, 0.9], [0.1, 0.9, 0.9], [0.1, 0.9, 0.9], [0.1, 0.9, 0.1]]], dtype = 'float32') # event B all the time; event C in the first three frames
frameProb = numpy.tile(frameProb, (4, 1, 1))
frameProb = Variable(tensor(frameProb), requires_grad = True)
label = [[3, 5, 6, 4], [3, 4], [5, 6], []] # <B><C></C></B>; <B></B>; <C></C>; empty
seqLen = numpy.array([4, 4, 4, 4])
loss, uttLogProb = ctl_loss(frameProb, seqLen, label, maxConcur = 1, debug = True)
print strip(loss), strip(torch.exp(uttLogProb))
loss, uttLogProb = ctl_loss(frameProb, seqLen, label, maxConcur = 2, debug = True)
print strip(loss), strip(torch.exp(uttLogProb))
loss, uttLogProb = ctl_loss(frameProb, seqLen, label, maxConcur = 3, debug = True)
print strip(loss), strip(torch.exp(uttLogProb))
# Reference output:
# [ 1.45882034] [ 2.10689101e-03 2.61903927e-03 1.27433671e-03 3.03234774e-05] # Prob of first label sequence is small
# [ 1.26348567] [ 1.04593262e-01 2.61992868e-03 1.27623521e-03 3.03234774e-05] # Prob of first label sequence gets big, because <B><C> can be emitted at the same time
# [ 1.263484 ] [ 1.04596682e-01 2.61992868e-03 1.27623521e-03 3.03234774e-05] # Prob of first label sequence stays almost the same, because it doesn't need to emit three tokens at the same time
loss.backward()
print frameProb.grad
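The conversion inside `ctl_loss` from frame-wise event probabilities to boundary (start/end) probabilities can be exercised on its own; this is a standalone NumPy sketch of that step (function name is mine), padding with a small floor on both sides so that a frame-to-frame increase reads as a start probability and a decrease as an end probability:

```python
import numpy

def boundary_probs(frame_prob, floor = 1e-7):
    # frame_prob: nFrames x nClasses event probabilities.
    # Pad with near-zeros on both sides, then the positive part of each
    # frame-to-frame difference is a start probability and the negative
    # part an end probability, both clamped to a small positive floor.
    nFrames, nClasses = frame_prob.shape
    z = floor * numpy.ones((1, nClasses), dtype = frame_prob.dtype)
    p = numpy.concatenate([z, frame_prob, z])
    start = numpy.clip(p[1:] - p[:-1], floor, None)
    end = numpy.clip(p[:-1] - p[1:], floor, None)
    return start, end   # both have nFrames + 1 boundary slots

p = numpy.array([[0.1], [0.9], [0.9], [0.2]], dtype = 'float32')
start, end = boundary_probs(p)
print(start.ravel())   # a large start probability at the 0.1 -> 0.9 jump
print(end.ravel())     # a large end probability at the 0.9 -> 0.2 drop
```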
================================================
FILE: code/sequential/eval.py
================================================
import sys, os, os.path
import argparse
import numpy
from util_out import *
from util_f1 import *
from scipy.io import loadmat, savemat
# Parse input arguments
def mybool(s):
return s.lower() in ['t', 'true', 'y', 'yes', '1']
parser = argparse.ArgumentParser()
parser.add_argument('--mode', type = str, default = 'ctl', choices = ['strong', 'mil', 'ctc', 'ctl', 'combine'])
parser.add_argument('--embedding_size', type = int, default = 512)
# This is the embedding size after a pooling layer or after the GRU layer
# After a non-pooling layer, the embedding size will be twice this much
parser.add_argument('--n_conv_layers', type = int, default = 6)
parser.add_argument('--kernel_size', type = str, default = '3') # 'n' or 'nxm'
parser.add_argument('--n_pool_layers', type = int, default = 6)
# the pooling layers will be inserted uniformly into the conv layers
# there should be at least 2 and at most 6 pooling layers
# the first two pooling layers will have stride (2,2); later ones will have stride (1,2)
parser.add_argument('--max_concur', type = int, default = 1)
parser.add_argument('--mil_weight', type = float, default = 3.3)
parser.add_argument('--ctl_weight', type = float, default = 1.0)
parser.add_argument('--batch_norm', type = mybool, default = True)
parser.add_argument('--dropout', type = float, default = 0.0)
parser.add_argument('--batch_size', type = int, default = 500)
parser.add_argument('--ckpt_size', type = int, default = 200) # how many batches per checkpoint
parser.add_argument('--optimizer', type = str, default = 'adam', choices = ['adam', 'sgd'])
parser.add_argument('--init_lr', type = float, default = 1e-3)
parser.add_argument('--lr_patience', type = int, default = 3)
parser.add_argument('--lr_factor', type = float, default = 1.0)
parser.add_argument('--random_seed', type = int, default = 15213)
parser.add_argument('--ckpt', type = int)
args = parser.parse_args()
if 'x' not in args.kernel_size:
args.kernel_size = args.kernel_size + 'x' + args.kernel_size
# Locate model file and prepare directories for prediction and evaluation
expid = '%s-embed%d-%dC%dP-kernel%s%s%s-%s-drop%.1f-batch%d-ckpt%d-%s-lr%.0e-pat%d-fac%.1f-seed%d' % (
args.mode,
args.embedding_size,
args.n_conv_layers,
args.n_pool_layers,
args.kernel_size,
'-concur%d' % args.max_concur if args.mode in ['ctl', 'combine'] else '',
'-weight%g:%g' % (args.mil_weight, args.ctl_weight) if args.mode == 'combine' else '',
'bn' if args.batch_norm else 'nobn',
args.dropout,
args.batch_size,
args.ckpt_size,
args.optimizer,
args.init_lr,
args.lr_patience,
args.lr_factor,
args.random_seed
)
WORKSPACE = os.path.join('../../workspace/sequential', expid)
MODEL_FILE = os.path.join(WORKSPACE, 'model', 'checkpoint%d.pt' % args.ckpt)
PRED_PATH = os.path.join(WORKSPACE, 'pred')
if not os.path.exists(PRED_PATH): os.makedirs(PRED_PATH)
PRED_FILE = os.path.join(PRED_PATH, 'checkpoint%d.mat' % args.ckpt)
EVAL_PATH = os.path.join(WORKSPACE, 'eval')
if not os.path.exists(EVAL_PATH): os.makedirs(EVAL_PATH)
EVAL_FILE = os.path.join(EVAL_PATH, 'checkpoint%d.txt' % args.ckpt)
with open(EVAL_FILE, 'w'):
pass
def write_log(s):
print s
with open(EVAL_FILE, 'a') as f:
f.write(s + '\n')
if os.path.exists(PRED_FILE):
# Load saved predictions, no need to use GPU
data = loadmat(PRED_FILE)
thres = data['thres'].ravel()
eval_frame_y = data['eval_frame_y']
eval_frame_prob = data['eval_frame_prob']
else:
import torch
import torch.nn as nn
from torch.optim import *
from torch.optim.lr_scheduler import *
from torch.autograd import Variable
from Net import Net
from util_in import *
# Load model
args.kernel_size = tuple(int(x) for x in args.kernel_size.split('x'))
model = Net(args).cuda()
model.load_state_dict(torch.load(MODEL_FILE)['model'])
model.eval()
# Load data
valid_x, valid_frame_y, _, _ = bulk_load('GAS_valid')
eval_x, eval_frame_y, _, eval_hashes = bulk_load('GAS_eval')
# Predict
if args.mode == 'ctc':
thres = numpy.array([0.5] * eval_frame_y.shape[-1])
eval_log_prob = model.predict(eval_x)
eval_frame_prob = ctc_decode(eval_log_prob).astype('float32')
else:
valid_frame_prob = model.predict(valid_x)
thres, valid_f1 = optimize_gas_valid(valid_frame_prob, valid_frame_y)
eval_frame_prob = model.predict(eval_x)
# Save predictions
data = {}
data['thres'] = thres
data['eval_hashes'] = eval_hashes
data['eval_frame_y'] = eval_frame_y
data['eval_frame_prob'] = eval_frame_prob
if args.mode == 'ctc':
data['eval_log_prob'] = eval_log_prob
savemat(PRED_FILE, data)
# Evaluation
write_log(' CLASS || THRES || TP | FN | FP | Prec. | Recall | F1 ')
FORMAT1 = ' Macro Avg || || | | | | | %6.02f '
FORMAT2 = ' %######9d || %8.0006f || %##5d | %##5d | %##5d | %6.02f | %6.02f | %6.02f '
SEP = ''.join('+' if c == '|' else '-' for c in FORMAT1)
write_log(SEP)
TP, FN, FP, precision, recall, f1 = evaluate_gas_eval(eval_frame_prob, thres, eval_frame_y, verbose = True)
write_log(FORMAT1 % f1.mean())
write_log(SEP)
N_CLASSES = len(f1)
for i in range(N_CLASSES):
write_log(FORMAT2 % (i, thres[i], TP[i], FN[i], FP[i], precision[i], recall[i], f1[i]))
================================================
FILE: code/sequential/train.py
================================================
import sys, os, os.path, time
import argparse
import numpy
import torch
import torch.nn as nn
from torch.optim import *
from torch.optim.lr_scheduler import *
from torch.autograd import Variable
from Net import Net
from ctc import ctc_loss
from ctl import ctl_loss
from util_in import *
from util_out import *
from util_f1 import *
torch.backends.cudnn.benchmark = True
# Parse input arguments
def mybool(s):
return s.lower() in ['t', 'true', 'y', 'yes', '1']
parser = argparse.ArgumentParser()
parser.add_argument('--mode', type = str, default = 'ctl', choices = ['strong', 'mil', 'ctc', 'ctl', 'combine'])
parser.add_argument('--embedding_size', type = int, default = 512)
# This is the embedding size after a pooling layer or after the GRU layer
# After a non-pooling layer, the embedding size will be twice this much
parser.add_argument('--n_conv_layers', type = int, default = 6)
parser.add_argument('--kernel_size', type = str, default = '3') # 'n' or 'nxm'
parser.add_argument('--n_pool_layers', type = int, default = 6)
# the pooling layers will be inserted uniformly into the conv layers
# there should be at least 2 and at most 6 pooling layers
# the first two pooling layers will have stride (2,2); later ones will have stride (1,2)
parser.add_argument('--max_concur', type = int, default = 1) # for mode == 'ctl' or 'combine' only
parser.add_argument('--mil_weight', type = float, default = 3.3) # for mode == 'combine' only
parser.add_argument('--ctl_weight', type = float, default = 1.0) # for mode == 'combine' only
parser.add_argument('--batch_norm', type = mybool, default = True)
parser.add_argument('--dropout', type = float, default = 0.0)
parser.add_argument('--batch_size', type = int, default = 500)
parser.add_argument('--ckpt_size', type = int, default = 200) # how many batches per checkpoint
parser.add_argument('--optimizer', type = str, default = 'adam', choices = ['adam', 'sgd'])
parser.add_argument('--init_lr', type = float, default = 1e-3)
parser.add_argument('--lr_patience', type = int, default = 3)
parser.add_argument('--lr_factor', type = float, default = 1.0)
parser.add_argument('--max_ckpt', type = int, default = 100)
parser.add_argument('--random_seed', type = int, default = 15213)
args = parser.parse_args()
if 'x' not in args.kernel_size:
args.kernel_size = args.kernel_size + 'x' + args.kernel_size
numpy.random.seed(args.random_seed)
# Prepare log file and model directory
expid = '%s-embed%d-%dC%dP-kernel%s%s%s-%s-drop%.1f-batch%d-ckpt%d-%s-lr%.0e-pat%d-fac%.1f-seed%d' % (
args.mode,
args.embedding_size,
args.n_conv_layers,
args.n_pool_layers,
args.kernel_size,
'-concur%d' % args.max_concur if args.mode in ['ctl', 'combine'] else '',
'-weight%g:%g' % (args.mil_weight, args.ctl_weight) if args.mode == 'combine' else '',
'bn' if args.batch_norm else 'nobn',
args.dropout,
args.batch_size,
args.ckpt_size,
args.optimizer,
args.init_lr,
args.lr_patience,
args.lr_factor,
args.random_seed
)
WORKSPACE = os.path.join('../../workspace/sequential', expid)
MODEL_PATH = os.path.join(WORKSPACE, 'model')
if not os.path.exists(MODEL_PATH): os.makedirs(MODEL_PATH)
LOG_FILE = os.path.join(WORKSPACE, 'train.log')
with open(LOG_FILE, 'w'):
pass
def write_log(s):
timestamp = time.strftime('%m-%d %H:%M:%S')
msg = '[' + timestamp + '] ' + s
print msg
with open(LOG_FILE, 'a') as f:
f.write(msg + '\n')
# Load data
write_log('Loading data ...')
train_gen = batch_generator(batch_size = args.batch_size, random_seed = args.random_seed)
gas_valid_x, gas_valid_y_frame, gas_valid_y_seq, _ = bulk_load('GAS_valid')
gas_eval_x, gas_eval_y_frame, gas_eval_y_seq, _ = bulk_load('GAS_eval')
# Build model
args.kernel_size = tuple(int(x) for x in args.kernel_size.split('x'))
model = Net(args).cuda()
if args.optimizer == 'sgd':
optimizer = SGD(model.parameters(), lr = args.init_lr, momentum = 0.9, nesterov = True)
elif args.optimizer == 'adam':
optimizer = Adam(model.parameters(), lr = args.init_lr)
scheduler = ReduceLROnPlateau(optimizer, mode = 'max', factor = args.lr_factor, patience = args.lr_patience) if args.lr_factor < 1.0 else None
# Train model
write_log('Training model ...')
write_log(' CKPT | LR | Tr.LOSS || G.Val.F1 | G.Ev.F1 ')
FORMAT = ' %#4d | %8.0003g | %8.0006f || %8.0002f | %8.0002f '
SEP = ''.join('+' if c == '|' else '-' for c in FORMAT)
write_log(SEP)
checkpoint = 0
best_gv_f1 = None
best_ge_f1 = None
bce_loss = nn.BCELoss()
for checkpoint in range(1, args.max_ckpt + 1):
# Train for args.ckpt_size batches
model.train()
train_loss = 0
for batch in range(1, args.ckpt_size + 1):
x, y_global, y_seq, y_frame = next(train_gen)
optimizer.zero_grad()
if args.mode == 'strong':
frame_prob = model(x)
loss = bce_loss(frame_prob, y_frame)
elif args.mode == 'mil':
frame_prob = model(x)
global_prob = (frame_prob * frame_prob).sum(dim = 1) / frame_prob.sum(dim = 1) # linear softmax pooling function
loss = bce_loss(global_prob, y_global)
elif args.mode == 'ctc':
log_prob = model(x)
seq_len = numpy.array([log_prob.shape[1]] * log_prob.shape[0]) # actually all batches are the same size
loss = ctc_loss(log_prob, seq_len, y_seq)
elif args.mode == 'ctl':
frame_prob = model(x)
seq_len = numpy.array([frame_prob.shape[1]] * frame_prob.shape[0]) # actually all batches are the same size
loss = ctl_loss(frame_prob, seq_len, y_seq, args.max_concur)
elif args.mode == 'combine':
frame_prob = model(x)
global_prob = (frame_prob * frame_prob).sum(dim = 1) / frame_prob.sum(dim = 1) # linear softmax pooling function
mil_loss = bce_loss(global_prob, y_global)
seq_len = numpy.array([frame_prob.shape[1]] * frame_prob.shape[0]) # actually all batches are the same size
ctl_loss_ = ctl_loss(frame_prob, seq_len, y_seq, args.max_concur)
loss = mil_loss * args.mil_weight + ctl_loss_ * args.ctl_weight
train_loss += loss.data[0]
if numpy.isnan(train_loss) or numpy.isinf(train_loss): break
loss.backward()
optimizer.step()
sys.stderr.write('Checkpoint %d, Batch %d / %d, avg train loss = %f\r' % \
(checkpoint, batch, args.ckpt_size, train_loss / batch))
train_loss /= args.ckpt_size
# Evaluate model
model.eval()
def predict(x):
if args.mode != 'ctc':
return model.predict(x)
else:
log_prob = model.predict(x)
return ctc_decode(log_prob).astype('float32')
sys.stderr.write('Evaluating model on GAS_VALID ...\r')
frame_prob = predict(gas_valid_x)
thres, gv_f1 = optimize_gas_valid(frame_prob, gas_valid_y_frame)
sys.stderr.write('Evaluating model on GAS_EVAL ...\r')
frame_prob = predict(gas_eval_x)
ge_f1 = evaluate_gas_eval(frame_prob, thres, gas_eval_y_frame, verbose = False)
# Write log
write_log(FORMAT % (checkpoint, optimizer.param_groups[0]['lr'], train_loss, gv_f1, ge_f1))
# Abort if training has gone mad
if numpy.isnan(train_loss) or numpy.isinf(train_loss):
write_log('Aborted.')
break
# Save model regularly. Too bad I can't save the scheduler
MODEL_FILE = os.path.join(MODEL_PATH, 'checkpoint%d.pt' % checkpoint)
state = {'model': model.state_dict(), 'optimizer': optimizer.state_dict()}
sys.stderr.write('Saving model to %s ...\r' % MODEL_FILE)
torch.save(state, MODEL_FILE)
# Update learning rate
if scheduler is not None:
scheduler.step(gv_f1)
# Update best results
if best_gv_f1 is None or gv_f1 > best_gv_f1:
best_gv_f1 = gv_f1
best_gv_ckpt = checkpoint
if best_ge_f1 is None or ge_f1 > best_ge_f1:
best_ge_f1 = ge_f1
best_ge_ckpt = checkpoint
write_log('DONE!')
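The "linear softmax" pooling used in the `mil` and `combine` branches of the training loop (sum of squared frame probabilities over their sum) weights each frame by its own probability, so confident frames dominate the clip-level score. A standalone NumPy check:

```python
import numpy

def linear_softmax_pool(frame_prob):
    # frame_prob: batch x frames x classes, values in (0, 1].
    # Each frame contributes with weight equal to its own probability,
    # landing the pooled score between the per-frame max and mean.
    return (frame_prob * frame_prob).sum(axis = 1) / frame_prob.sum(axis = 1)

p = numpy.array([[[0.9], [0.1], [0.1], [0.1]]])
pooled = linear_softmax_pool(p)
print(pooled)   # 0.7: between the max (0.9) and the mean (0.3)
```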
================================================
FILE: code/sequential/util_f1.py
================================================
import numpy
# Compute F1 given predictions and truth
def f1(pred, truth):
return 2.0 * (pred & truth).sum() / (pred.sum() + truth.sum())
# Given scores and truth for a single class (as 1-D numpy arrays), find optimal threshold and corresponding F1
# Statistics of other classes may be given to optimize micro-average F1
def optimize_f1(scores, truth, extraNcorr = 0, extraNtrue = 0, extraNpred = 0):
# Start with predicting everything as negative
best_thres = numpy.inf
best_f1 = 0.0
num = extraNcorr # number of correctly predicted instances
den = extraNtrue + extraNpred + truth.sum() # number of predicted instances + true instances
instances = [(-numpy.inf, False)] + sorted(zip(scores, truth))
# Lower the threshold gradually
for i in range(len(instances) - 1, 0, -1):
if instances[i][1]: num += 1
den += 1
if instances[i][0] > instances[i-1][0]: # Can put threshold here
f1 = 2.0 * num / den
if f1 > best_f1:
best_thres = (instances[i][0] + instances[i-1][0]) / 2
best_f1 = f1
return best_thres, best_f1
# Given scores and truth for many classes (as 2-D numpy arrays),
# find the optimal class-specific thresholds (as a 1-D numpy array) that maximizes the micro-average F1
# The algorithm is stochastic, but I have always observed deterministic results
def optimize_micro_avg_f1(scores, truth):
# First optimize each class individually
nClasses = truth.shape[1]
thres = numpy.zeros(nClasses, dtype = 'float64')
for i in range(nClasses):
thres[i], _ = optimize_f1(scores[:,i], truth[:,i])
Ntrue = truth.sum(axis = 0)
Npred = (scores >= thres).sum(axis = 0)
Ncorr = ((scores >= thres) & truth).sum(axis = 0)
# Repeatedly re-tune the threshold for each class until convergence
candidates = range(nClasses)
while len(candidates) > 0:
i = numpy.random.choice(candidates)
candidates.remove(i)
old_thres = thres[i]
thres[i], _ = optimize_f1(
scores[:,i],
truth[:,i],
extraNcorr = Ncorr.sum() - Ncorr[i],
extraNtrue = Ntrue.sum() - Ntrue[i],
extraNpred = Npred.sum() - Npred[i],
)
if thres[i] != old_thres:
Npred[i] = (scores[:,i] >= thres[i]).sum(axis = 0)
Ncorr[i] = ((scores[:,i] >= thres[i]) & truth[:,i]).sum(axis = 0)
candidates = range(nClasses)
candidates.remove(i)
return thres
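The single-class threshold sweep in `optimize_f1` can be cross-checked against an exhaustive search (a standalone re-implementation for illustration; only midpoints between adjacent sorted scores, plus the two infinities, can change the prediction set):

```python
import numpy

def brute_force_best_f1(scores, truth):
    # Exhaustive counterpart of the sweep in optimize_f1: try a threshold
    # between every pair of adjacent sorted scores, plus -inf and +inf,
    # and keep the one maximizing F1 = 2 * TP / (Npred + Ntrue).
    s = numpy.sort(scores)
    cands = [-numpy.inf, numpy.inf] + list((s[:-1] + s[1:]) / 2.0)
    best_thres, best_f1 = numpy.inf, 0.0
    for th in cands:
        pred = scores >= th
        den = pred.sum() + truth.sum()
        f1 = 2.0 * (pred & truth).sum() / den if den else 0.0
        if f1 > best_f1:
            best_thres, best_f1 = th, f1
    return best_thres, best_f1

scores = numpy.array([0.9, 0.8, 0.4, 0.3, 0.1])
truth = numpy.array([1, 1, 0, 1, 0], dtype = bool)
th, f1 = brute_force_best_f1(scores, truth)
print(th, f1)   # threshold 0.2 catches all three positives at one false alarm
```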
================================================
FILE: code/sequential/util_in.py
================================================
import sys, os, os.path, glob
import cPickle
from scipy.io import loadmat
import numpy
from multiprocessing import Process, Queue
import torch
from torch.autograd import Variable
N_CLASSES = 35
N_WORKERS = 6
FEATURE_DIR = '../../data/sequential'
with open(os.path.join(FEATURE_DIR, 'normalizer.pkl'), 'rb') as f:
mu, sigma = cPickle.load(f)
def sample_generator(file_list, random_seed = 15213):
rng = numpy.random.RandomState(random_seed)
while True:
rng.shuffle(file_list)
for filename in file_list:
data = loadmat(filename)
feat = ((data['feat'] - mu) / sigma).astype('float32')
labels = data['labels'].astype('bool')
for i in range(len(data['feat'])):
yield feat[i], labels[i]
def worker(queues, file_lists, random_seed):
generators = [sample_generator(file_lists[i], random_seed + i) for i in range(len(file_lists))]
while True:
for gen, q in zip(generators, queues):
q.put(next(gen))
def batch_generator(batch_size, random_seed = 15213):
queues = [Queue(5) for class_id in range(N_CLASSES)]
file_lists = [sorted(glob.glob(os.path.join(FEATURE_DIR, 'GAS_train_unbalanced_class%02d_part*.mat' % class_id))) for class_id in range(N_CLASSES)]
for worker_id in range(N_WORKERS):
p = Process(target = worker, args = (queues[worker_id::N_WORKERS], file_lists[worker_id::N_WORKERS], random_seed))
p.daemon = True
p.start()
rng = numpy.random.RandomState(random_seed)
batch_x = []; batch_y_global = []; batch_y_seq = []; batch_y_frame = []
while True:
rng.shuffle(queues)
for q in queues:
x, y_frame = q.get()
batch_x.append(x)
batch_y_global.append(y_frame.max(axis = -2))
batch_y_seq.append(mask2ctc(y_frame))
batch_y_frame.append(y_frame)
if len(batch_x) == batch_size:
yield Variable(torch.from_numpy(numpy.stack(batch_x))).cuda(), \
Variable(torch.from_numpy(numpy.stack(batch_y_global).astype('float32'))).cuda(), \
batch_y_seq, \
Variable(torch.from_numpy(numpy.stack(batch_y_frame).astype('float32'))).cuda()
batch_x = []; batch_y_global = []; batch_y_seq = []; batch_y_frame = []
def bulk_load(prefix):
data = loadmat(os.path.join(FEATURE_DIR, prefix + '.mat'))
x = ((data['feat'] - mu) / sigma).astype('float32')
y_frame = data['labels'].astype('bool')
y_seq = [mask2ctc(y) for y in y_frame]
return x, y_frame, y_seq, data['hashes']
def mask2ctc(mask):
z = numpy.zeros((1, mask.shape[-1]), dtype = 'bool')
zp = numpy.concatenate([z, mask])
pz = numpy.concatenate([mask, z])
onset = (pz & ~zp).nonzero()
offset = (zp & ~pz).nonzero()
boundaries = sorted([(t, 1, event) for (t, event) in zip(*onset)] + [(t, -1, event) for (t, event) in zip(*offset)]) # time, onset/offset, event id
return [bound[2] * 2 + {1:1, -1:2}[bound[1]] for bound in boundaries]
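As a sanity check, the boundary-token encoding above can be run on a toy mask (re-implemented standalone here so it runs in isolation; token ids follow the same convention as above, `2*event + 1` for an onset and `2*event + 2` for an offset, 0-based events):

```python
import numpy

def mask2ctc(mask):
    # mask: frames x classes boolean matrix of event activity.
    # Diff the mask against a zero-padded shifted copy to find onsets and
    # offsets, then emit boundary tokens sorted by time.
    z = numpy.zeros((1, mask.shape[-1]), dtype = bool)
    zp = numpy.concatenate([z, mask])   # mask shifted down by one frame
    pz = numpy.concatenate([mask, z])
    onset = (pz & ~zp).nonzero()
    offset = (zp & ~pz).nonzero()
    boundaries = sorted([(t, 1, e) for (t, e) in zip(*onset)] +
                        [(t, -1, e) for (t, e) in zip(*offset)])
    return [e * 2 + {1: 1, -1: 2}[s] for (t, s, e) in boundaries]

# Event 0 active in frames 0-1, event 1 active in frame 1 only:
mask = numpy.array([[1, 0],
                    [1, 1],
                    [0, 0]], dtype = bool)
print(mask2ctc(mask))   # [1, 3, 2, 4]: onset A, onset B, offset A, offset B
```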
================================================
FILE: code/sequential/util_out.py
================================================
import numpy
from util_f1 import *
from joblib import Parallel, delayed
N_JOBS = 6
def ctc_decode(log_prob):
# Decode log_prob (boundary probabilities, batch * frame * (2n+1)) to frame_pred (boolean event decisions, batch * frame * n)
nSeqs, nFrames, nLabels = log_prob.shape
nClasses = (nLabels - 1) / 2
frame_pred = numpy.zeros((nSeqs, nFrames, nClasses), dtype = 'bool')
for i in range(nSeqs):
onset = [None] * nClasses
prev_token = 0
for t, token in zip(range(nFrames), log_prob[i].argmax(axis = 1)):
if token == 0: continue
if token % 2 == 1: # onset of event
event = (token - 1) / 2
onset[event] = t
else: # offset of event
event = token / 2 - 1
if onset[event] is not None:
frame_pred[i, onset[event] : t + 1, event] = True
onset[event] = None
return frame_pred
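The greedy decoding above (pair each onset token with the next matching offset token, fill the frames in between) can be demonstrated standalone; this sketch mirrors the same token convention, with Python 3 integer division and an illustrative name:

```python
import numpy

def ctc_decode_greedy(log_prob):
    # Greedy decode of boundary tokens (0 = blank, 2e+1 = onset of event e,
    # 2e+2 = offset of event e) into a boolean frame-level event mask.
    nSeqs, nFrames, nLabels = log_prob.shape
    nClasses = (nLabels - 1) // 2
    frame_pred = numpy.zeros((nSeqs, nFrames, nClasses), dtype = bool)
    for i in range(nSeqs):
        onset = [None] * nClasses
        for t, token in enumerate(log_prob[i].argmax(axis = 1)):
            if token == 0:
                continue
            if token % 2 == 1:                  # onset of an event
                onset[(token - 1) // 2] = t
            else:                               # offset of an event
                e = token // 2 - 1
                if onset[e] is not None:
                    frame_pred[i, onset[e] : t + 1, e] = True
                    onset[e] = None
    return frame_pred

# One sequence, one class: onset token at frame 1, offset token at frame 3
p = numpy.full((1, 5, 3), -10.0)
p[0, :, 0] = 0.0          # blank wins by default
p[0, 1, 1] = 1.0          # token 1 = onset of event 0
p[0, 3, 2] = 1.0          # token 2 = offset of event 0
print(ctc_decode_greedy(p)[0, :, 0])   # frames 1..3 of event 0 are active
```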
def optimize_gas_valid(pred, y):
nClasses = y.shape[-1]
result = Parallel(n_jobs = N_JOBS)(delayed(optimize_f1)(pred[..., i].ravel(), y[..., i].ravel()) for i in range(nClasses))
thres = numpy.array([r[0] for r in result], dtype = 'float64')
class_f1 = numpy.array([r[1] for r in result], dtype = 'float32') * 100.0
return thres, class_f1.mean()
def TP_FN_FP(pred, truth):
TP = (pred & truth).sum()
FN = (~pred & truth).sum()
FP = (pred & ~truth).sum()
return (TP, FN, FP)
def evaluate_gas_eval(pred, thres, truth, verbose = False):
# if verbose == False, return only the macro-average F1
# if verbose == True, return the class-wise TP, FN, FP, precision, recall, F1
pred = pred >= thres
nClasses = len(thres)
stats = Parallel(n_jobs = N_JOBS)(delayed(TP_FN_FP)(pred[..., i], truth[..., i]) for i in range(nClasses))
TP, FN, FP = numpy.array(stats, dtype = 'int32').T
f1 = 200.0 * TP / (2 * TP + FN + FP)
if not verbose: return f1.mean()
precision = 100.0 * TP / (TP + FP)
recall = 100.0 * TP / (TP + FN)
return TP, FN, FP, precision, recall, f1
================================================
FILE: data/download.sh
================================================
archives="audioset.tgz sequential.tgz dcase.tgz"
for archive in $archives; do
wget http://islpc21.is.cs.cmu.edu/yunwang/git/cmu-thesis/data/$archive && ((tar zxf $archive && rm $archive) &)
done
while [ $(ls $archives 2>/dev/null | wc -l) -ne 0 ]; do
echo -ne "Extracting file $(ls ${archives//.tgz/\/*} 2>/dev/null | wc -l) of 47457 ...\r"
sleep 10;
done
echo -e "\nAll files extracted. DONE!"
================================================
FILE: workspace/.gitignore
================================================
},
{
"path": "code/sequential/eval.py",
"chars": 5424,
"preview": "import sys, os, os.path\nimport argparse\nimport numpy\nfrom util_out import *\nfrom util_f1 import *\nfrom scipy.io import l"
},
{
"path": "code/sequential/train.py",
"chars": 8102,
"preview": "import sys, os, os.path, time\nimport argparse\nimport numpy\nimport torch\nimport torch.nn as nn\nfrom torch.optim import *\n"
},
{
"path": "code/sequential/util_f1.py",
"chars": 2556,
"preview": "import numpy\n\n# Compute F1 given predictions and truth\ndef f1(pred, truth):\n return 2.0 * (pred & truth).sum() / (pre"
},
{
"path": "code/sequential/util_in.py",
"chars": 3071,
"preview": "import sys, os, os.path, glob\nimport cPickle\nfrom scipy.io import loadmat\nimport numpy\nfrom multiprocessing import Proce"
},
{
"path": "code/sequential/util_out.py",
"chars": 2107,
"preview": "import numpy\nfrom util_f1 import *\nfrom joblib import Parallel, delayed\n\nN_JOBS = 6\n\ndef ctc_decode(log_prob):\n # Dec"
},
{
"path": "data/download.sh",
"chars": 401,
"preview": "archives=\"audioset.tgz sequential.tgz dcase.tgz\"\nfor archive in $archives; do\n wget http://islpc21.is.cs.cmu.edu/yunwan"
},
{
"path": "workspace/.gitignore",
"chars": 0,
"preview": ""
}
]
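The util_f1.py preview above (identical across the three experiment directories) shows the shared F1 computation: `2.0 * (pred & truth).sum() / (pred.sum() + truth.sum())`. A self-contained sketch with hypothetical example arrays, showing why this equals the usual 2·TP / (2·TP + FP + FN) definition:

```python
import numpy

def f1(pred, truth):
    # pred.sum() = TP + FP (predicted positives)
    # truth.sum() = TP + FN (actual positives)
    # so 2*TP / (pred.sum() + truth.sum()) == 2*TP / (2*TP + FP + FN)
    return 2.0 * (pred & truth).sum() / (pred.sum() + truth.sum())

# Hypothetical binary predictions over 5 events
pred  = numpy.array([1, 1, 0, 1, 0], dtype=bool)
truth = numpy.array([1, 0, 0, 1, 1], dtype=bool)
print(f1(pred, truth))  # TP=2, pred.sum()=3, truth.sum()=3 -> 4/6 ~ 0.667
```

The boolean `&` makes the intersection count true positives directly; the caller is responsible for thresholding continuous scores into booleans first.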
About this extraction
This page contains the full source code of the MaigoAkisame/cmu-thesis GitHub repository, extracted and formatted as plain text: 26 files (93.5 KB), approximately 27.0k tokens, and a symbol index with 72 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract, built by Nikandr Surkov.