[
  {
    "path": "README.md",
    "content": "# TDNN\nSimple Time Delay Neural Network (TDNN) implementation in PyTorch. Uses the unfold method (`torch.nn.functional.unfold`) to slide a window of local context over the input sequence.\n\n![TDNN diagram](misc/diagram.png?raw=true \"Diagram\") [1] https://www.danielpovey.com/files/2015_interspeech_multisplice.pdf\n\n# Factorized TDNN (TDNN-F)\n\nI've also implemented the Factorized TDNN from Kaldi (TDNN-F) in PyTorch here: https://github.com/cvqluu/Factorized-TDNN\n\n## Usage\n\nTo recreate the TDNN (frame-level) part of the x-vector network in [2]:\n\n```python\nimport torch\nimport torch.nn as nn\n\nfrom tdnn import TDNN\n\n# Assuming 24-dim MFCCs per frame\nframe1 = TDNN(input_dim=24, output_dim=512, context_size=5, dilation=1)\nframe2 = TDNN(input_dim=512, output_dim=512, context_size=3, dilation=2)\nframe3 = TDNN(input_dim=512, output_dim=512, context_size=3, dilation=3)\nframe4 = TDNN(input_dim=512, output_dim=512, context_size=1, dilation=1)\nframe5 = TDNN(input_dim=512, output_dim=1500, context_size=1, dilation=1)\n\n# Chain the frame-level layers\nframe_layers = nn.Sequential(frame1, frame2, frame3, frame4, frame5)\n\n# Input to frame1 is of shape (batch_size, T, 24)\n# Output of frame5 will be (batch_size, T-14, 1500)\nx = torch.randn(8, 200, 24)\nout = frame_layers(x)  # shape (8, 186, 1500)\n```\n\nWith the default stride of 1, each layer with context size `c` and dilation `d` shortens the sequence by `d*(c-1)` frames, so frame1 to frame3 remove 4 + 4 + 6 = 14 frames in total, while frame4 and frame5 remove none.\n\n![x-vector configuration](misc/xvec_config.png?raw=true \"Diagram\") [2] https://www.danielpovey.com/files/2018_icassp_xvectors.pdf\n"
  },
  {
    "path": "tdnn.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass TDNN(nn.Module):\n\n    def __init__(\n        self,\n        input_dim=23,\n        output_dim=512,\n        context_size=5,\n        stride=1,\n        dilation=1,\n        batch_norm=True,\n        dropout_p=0.0,\n    ):\n        '''\n        TDNN as defined by https://www.danielpovey.com/files/2015_interspeech_multisplice.pdf\n\n        The affine transformation is not applied globally to all frames, but to smaller\n        windows of local context.\n\n        stride: step, in frames, between successive context windows\n        batch_norm: True to include batch normalisation after the nonlinearity (and dropout, if enabled)\n        dropout_p: dropout probability applied after the nonlinearity (0.0 disables dropout)\n\n        Context size and dilation determine the frames selected\n        (although context size is not really defined in the traditional sense).\n        For example:\n            context size 5 and dilation 1 is equivalent to [-2, -1, 0, 1, 2]\n            context size 3 and dilation 2 is equivalent to [-2, 0, 2]\n            context size 1 and dilation 1 is equivalent to [0]\n        '''\n        super(TDNN, self).__init__()\n        self.context_size = context_size\n        self.stride = stride\n        self.input_dim = input_dim\n        self.output_dim = output_dim\n        self.dilation = dilation\n        self.dropout_p = dropout_p\n        self.batch_norm = batch_norm\n\n        self.kernel = nn.Linear(input_dim * context_size, output_dim)\n        self.nonlinearity = nn.ReLU()\n        if self.batch_norm:\n            self.bn = nn.BatchNorm1d(output_dim)\n        if self.dropout_p:\n            self.drop = nn.Dropout(p=self.dropout_p)\n\n    def forward(self, x):\n        '''\n        input: size (batch, seq_len, input_features)\n        output: size (batch, new_seq_len, output_features)\n        '''\n\n        _, _, d = x.shape\n        assert d == self.input_dim, (\n            'Input dimension was wrong. Expected ({}), got ({})'.format(self.input_dim, d)\n        )\n        # (batch, seq_len, input_dim) -> (batch, 1, seq_len, input_dim)\n        x = x.unsqueeze(1)\n\n        # Unfold input into smaller temporal contexts\n        x = F.unfold(\n            x,\n            (self.context_size, self.input_dim),\n            stride=(self.stride, self.input_dim),\n            dilation=(self.dilation, 1),\n        )\n\n        # N, input_dim*context_size, new_t = x.shape\n        x = x.transpose(1, 2)\n        x = self.kernel(x)\n        x = self.nonlinearity(x)\n\n        if self.dropout_p:\n            x = self.drop(x)\n\n        if self.batch_norm:\n            # BatchNorm1d expects (batch, channels, time)\n            x = x.transpose(1, 2)\n            x = self.bn(x)\n            x = x.transpose(1, 2)\n\n        return x\n"
  }
]