[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n.hypothesis/\n.pytest_cache/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# pyenv\n.python-version\n\n# celery beat schedule file\ncelerybeat-schedule\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n\n# Auto-generated content above this. Manually added content below.\n.vscode/\n.cache/\n*cache*\n/.project\ntmp/\ndata/\nsite/\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing to PyWarm\n\nPyWarm is developed on [GitHub](https://github.com/blue-season/pywarm). \n\nPlease use GitHub to file Bug reports and submit pull requests. \n\nPlease document and test before submissions.\n\nPyWarm is developed with Python 3.7, but has been tested to work with Python 3.6+.\n\n# Coding Style\n\nFor the rational behind the distinct coding style use in PyWarm, please check\n\n[A Coding Style for Python](https://blue-season.github.io/a-coding-style-for-python/).\n"
  },
  {
    "path": "LICENSE.md",
    "content": "MIT License\n\nCopyright (c) 2019 blue-season\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "\n[![PyWarm - A cleaner way to build neural networks for PyTorch](https://github.com/blue-season/pywarm/raw/gh-pages/docs/pywarm-logo.png)](https://blue-season.github.io/pywarm/)\n\n# PyWarm\n\nA cleaner way to build neural networks for PyTorch.\n\n[![PyPI Python Version](https://img.shields.io/pypi/pyversions/pywarm)](https://github.com/blue-season/pywarm)\n[![PyPI Version](https://img.shields.io/pypi/v/pywarm)](https://pypi.org/project/pywarm/)\n[![License](https://img.shields.io/github/license/blue-season/pywarm)](https://github.com/blue-season/pywarm/blob/master/LICENSE)\n\n[Examples](https://blue-season.github.io/pywarm/docs/example/)  |  [Tutorial](https://blue-season.github.io/pywarm/docs/tutorial/)  |   [API reference](https://blue-season.github.io/pywarm/reference/warm/functional/)\n\n----\n\n## Introduction\n\nPyWarm is a lightweight, high-level neural network construction API for PyTorch.\nIt enables defining all parts of NNs in the functional way.\n\nWith PyWarm, you can put *all* network data flow logic in the `forward()` method of\nyour model, without the need to define children modules in the `__init__()` method\nand then call it again in the `forward()`.\nThis result in a much more readable model definition in fewer lines of code.\n\nPyWarm only aims to simplify the network definition, and does not attempt to cover\nmodel training, validation or data handling.\n\n----\n\nFor example, a convnet for MNIST:\n(If needed, click the tabs to switch between Warm and Torch versions)\n\n\n``` Python tab=\"Warm\" linenums=\"1\"\n# powered by PyWarm\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.functional as W\n\n\nclass ConvNet(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        warm.up(self, [2, 1, 28, 28])\n\n    def forward(self, x):\n        x = W.conv(x, 20, 5, activation='relu')\n        x = F.max_pool2d(x, 2)\n        x = W.conv(x, 50, 5, activation='relu')\n        x = F.max_pool2d(x, 2)\n        x = x.view(-1, 800)\n        x = W.linear(x, 500, activation='relu')\n        x = W.linear(x, 10)\n        return F.log_softmax(x, dim=1)\n```\n\n``` Python tab=\"Torch\" linenums=\"1\"\n# vanilla PyTorch version, taken from\n# pytorch tutorials/beginner_source/blitz/neural_networks_tutorial.py \nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass ConvNet(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(1, 20, 5, 1)\n        self.conv2 = nn.Conv2d(20, 50, 5, 1)\n        self.fc1 = nn.Linear(4*4*50, 500)\n        self.fc2 = nn.Linear(500, 10)\n\n    def forward(self, x):\n        x = F.relu(self.conv1(x))\n        x = F.max_pool2d(x, 2, 2)\n        x = F.relu(self.conv2(x))\n        x = F.max_pool2d(x, 2, 2)\n        x = x.view(-1, 4*4*50)\n        x = F.relu(self.fc1(x))\n        x = self.fc2(x)\n        return F.log_softmax(x, dim=1)\n```\n\n----\n\nA couple of things you may have noticed:\n\n-   First of all, in the PyWarm version, the entire network definition and\n    data flow logic resides in the `forward()` method. You don't have to look\n    up and down repeatedly to understand what `self.conv1`, `self.fc1` etc.\n    is doing.\n\n-   You do not need to track and specify `in_channels` (or `in_features`, etc.)\n    for network layers. PyWarm can infer the information for you. 
\nFor deeper neural networks, see additional [examples](https://blue-season.github.io/pywarm/docs/example/).\n\n----\n## Installation\n\n    pip3 install pywarm\n\n----\n## Quick start: 30 seconds to PyWarm\n\nIf you already have experience with PyTorch, using PyWarm is very straightforward:\n\n-   First, import PyWarm in your model file:\n```Python\nimport warm\nimport warm.functional as W\n```\n\n-   Second, remove child module definitions in the model's `__init__()` method.\n    Instead, use `W.conv`, `W.linear` ... etc. in the model's `forward()` method,\n    just as you would use `F.max_pool2d`, `F.relu` ... etc. from `torch.nn.functional`.\n\n    For example, instead of writing:\n\n```Python\n# Torch\nclass MyModule(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size)\n        # other child module definitions\n    def forward(self, x):\n        x = self.conv1(x)\n        # more forward steps\n```\n\n-   You can now write in the warm way:\n\n```Python\n# Warm\nclass MyWarmModule(nn.Module):\n    def __init__(self):\n        super().__init__()\n        warm.up(self, input_shape_or_data)\n    def forward(self, x):\n        x = W.conv(x, out_channels, kernel_size) # no in_channels needed\n        # more forward steps\n```\n\n-   Finally, don't forget to warmify the model by adding\n    \n    `warm.up(self, input_shape_or_data)`\n\n    at the end of the model's `__init__()` method. You need to supply\n    `input_shape_or_data`, which is either a tensor of input data,\n    or just its shape, e.g. `[2, 1, 28, 28]` for MNIST inputs.\n    \n    The model is now ready to use, just like any other PyTorch model.\n\nCheck out the [tutorial](https://blue-season.github.io/pywarm/docs/tutorial/)\nand [examples](https://blue-season.github.io/pywarm/docs/example/) if you want to learn more!\n\n----\n## Testing\n\nClone the repository first, then\n\n    cd pywarm\n    pytest -v\n\n----\n## Documentation\n\nDocumentation is generated using the excellent [Portray](https://timothycrosley.github.io/portray/) package.\n\n-   [Examples](https://blue-season.github.io/pywarm/docs/example/)\n\n-   [Tutorial](https://blue-season.github.io/pywarm/docs/tutorial/)\n\n-   [API reference](https://blue-season.github.io/pywarm/reference/warm/functional/)\n"
  },
  {
    "path": "docs/example.md",
    "content": "\n# PyWarm Examples\n\n## ResNet\n\nA more detailed example, the ResNet18 network defined in PyWarm and vanilla PyTorch:\n\n``` Python tab=\"Warm\" linenums=\"1\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.functional as W\n\n\ndef basic(x, size, stride):\n    y = W.conv(x, size, 3, stride=stride, padding=1, bias=False)\n    y = W.batch_norm(y, activation='relu')\n    y = W.conv(y, size, 3, stride=1, padding=1, bias=False)\n    y = W.batch_norm(y)\n    if y.shape[1] != x.shape[1]: # channel size mismatch, needs projection\n        x = W.conv(x, y.shape[1], 1, stride=stride, bias=False)\n        x = W.batch_norm(x)\n    y = y+x # residual shortcut connection\n    return F.relu(y)\n\n\ndef stack(x, num_block, size, stride, block=basic):\n    for s in [stride]+[1]*(num_block-1):\n        x = block(x, size, s)\n    return x\n\n\nclass ResNet(nn.Module):\n\n    def __init__(self, block=basic,\n            stack_spec=((2, 64, 1), (2, 128, 2), (2, 256, 2), (2, 512, 2))):\n        super().__init__()\n        self.block = block\n        self.stack_spec = stack_spec\n        warm.up(self, [2, 3, 32, 32])\n\n    def forward(self, x):\n        y = W.conv(x, 64, 7, stride=2, padding=3, bias=False)\n        y = W.batch_norm(y, activation='relu')\n        y = F.max_pool2d(y, 3, stride=2, padding=1)\n        for spec in self.stack_spec:\n            y = stack(y, *spec, block=self.block)\n        y = F.adaptive_avg_pool2d(y, 1)\n        y = torch.flatten(y, 1)\n        y = W.linear(y, 1000)\n        return y\n\n\nresnet18 = ResNet()\n```\n\n``` Python tab=\"Torch\" linenums=\"1\"\n# code based on torchvision/models/resnet.py\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\ndef conv3x3(size_in, size_out, stride=1):\n    return nn.Conv2d(size_in, size_out, kernel_size=3, stride=stride,\n        padding=1, groups=1, bias=False, dilation=1, )\n\n\ndef conv1x1(size_in, size_out, stride=1):\n    return nn.Conv2d(size_in, size_out, kernel_size=1, stride=stride,\n        padding=0, groups=1, bias=False, dilation=1, )\n\n\nclass BasicBlock(nn.Module):\n\n    expansion = 1\n\n    def __init__(self, size_in, size_out, stride=1, downsample=None):\n        super().__init__()\n        self.conv1 = conv3x3(size_in, size_out, stride)\n        self.bn1 = nn.BatchNorm2d(size_out)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(size_out, size_out)\n        self.bn2 = nn.BatchNorm2d(size_out)\n        self.downsample = downsample\n\n    def forward(self, x):\n        identity = x\n        y = self.conv1(x)\n        y = self.bn1(y)\n        y = self.relu(y)\n        y = self.conv2(y)\n        y = self.bn2(y)\n        if self.downsample is not None:\n            identity = self.downsample(x)\n        y += identity\n        y = self.relu(y)\n        return y\n\n\nclass ResNet(nn.Module):\n\n    def __init__(self,\n            block=BasicBlock, num_block=[2, 2, 2, 2]):\n        super().__init__()\n        self.size_in = 64\n        self.conv1 = nn.Conv2d(3, self.size_in, kernel_size=7, stride=2,\n            padding=3, bias=False)\n        self.bn1 = nn.BatchNorm2d(self.size_in)\n        self.relu = nn.ReLU(inplace=True)\n        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n        self.stack1 = self._make_stack(block, 64, num_block[0], 1)\n        self.stack2 = self._make_stack(block, 128, num_block[1], 2)\n        self.stack3 = self._make_stack(block, 256, num_block[2], 2)\n        
self.stack4 = self._make_stack(block, 512, num_block[3], 2)\n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.fc = nn.Linear(512, 1000)\n\n    def _make_stack(self, block, size_out, num_blocks, stride):\n        downsample = None\n        if stride != 1 or self.size_in != size_out:\n            downsample = nn.Sequential(\n                conv1x1(self.size_in, size_out, stride),\n                nn.BatchNorm2d(size_out), )\n        strides = [stride]+[1]*(num_blocks-1)\n        stacks = []\n        for stride in strides:\n            stacks.append(\n                block(self.size_in, size_out, stride, downsample))\n            self.size_in = size_out\n            downsample = None # only the first block needs the projection\n        return nn.Sequential(*stacks)\n\n    def forward(self, x):\n        y = self.conv1(x)\n        y = self.bn1(y)\n        y = self.relu(y)\n        y = self.maxpool(y)\n        y = self.stack1(y)\n        y = self.stack2(y)\n        y = self.stack3(y)\n        y = self.stack4(y)\n        y = self.avg_pool(y)\n        y = torch.flatten(y, 1)\n        y = self.fc(y)\n        return y\n\n\nresnet18 = ResNet()\n```\n\n-   The PyWarm version significantly reduces the self-repetition of code found in the vanilla PyTorch version.\n\n-   Note that when warming the model via `warm.up(self, [2, 3, 32, 32])`,\n    we set the first `Batch` dimension to 2 because the model uses `batch_norm`,\n    which will not work when `Batch` is 1.\n\n----\n\n## MobileNet\n\n``` Python tab=\"Warm\" linenums=\"1\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.functional as W\n\n\ndef conv_bn_relu(x, size, stride=1, expand=1, kernel=3, groups=1):\n    x = W.conv(x, size, kernel, padding=(kernel-1)//2,\n        stride=stride, groups=groups, bias=False, )\n    return W.batch_norm(x, activation='relu6')\n\n\ndef bottleneck(x, size_out, stride, expand):\n    size_in = x.shape[1]\n    size_mid = size_in*expand\n    y = conv_bn_relu(x, size_mid, kernel=1) if expand > 1 else x\n    y = conv_bn_relu(y, size_mid, stride, kernel=3, groups=size_mid)\n    y = W.conv(y, size_out, kernel=1, bias=False)\n    y = W.batch_norm(y)\n    if stride == 1 and size_in == size_out:\n        y += x # residual shortcut\n    return y\n\n\ndef conv1x1(x, *arg):\n    return conv_bn_relu(x, *arg, kernel=1)\n\n\ndef pool(x, *arg):\n    return x.mean([2, 3])\n\n\ndef classify(x, size, *arg):\n    x = W.dropout(x, rate=0.2)\n    return W.linear(x, size)\n\n\ndefault_spec = (\n    (None, 32, 1, 2, conv_bn_relu),  # t, c, n, s, operator\n    (1, 16, 1, 1, bottleneck),\n    (6, 24, 2, 2, bottleneck),\n    (6, 32, 3, 2, bottleneck),\n    (6, 64, 4, 2, bottleneck),\n    (6, 96, 3, 1, bottleneck),\n    (6, 160, 3, 2, bottleneck),\n    (6, 320, 1, 1, bottleneck),\n    (None, 1280, 1, 1, conv1x1),\n    (None, None, 1, None, pool),\n    (None, 1000, 1, None, classify), )\n\n\nclass MobileNetV2(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        warm.up(self, [2, 3, 224, 224])\n\n    def forward(self, x):\n        for t, c, n, s, op in default_spec:\n            for i in range(n):\n                stride = s if i == 0 else 1\n                x = op(x, c, stride, t)\n        return x\n\n\nnet = MobileNetV2()\n```\n\n``` Python tab=\"Torch\" linenums=\"1\"\n# code based on torchvision/models/mobilenet.py\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass ConvBNReLU(nn.Sequential):\n\n    def __init__(self, in_planes, out_planes, \n            kernel_size=3, stride=1, groups=1):\n        padding = (kernel_size-1)//2\n        
super(ConvBNReLU, self).__init__(\n            nn.Conv2d(in_planes, out_planes, kernel_size, \n                stride, padding, groups=groups, bias=False),\n            nn.BatchNorm2d(out_planes),\n            nn.ReLU6(inplace=True), )\n\n\nclass BottleNeck(nn.Module):\n\n    def __init__(self, inp, oup, stride, expand_ratio):\n        super().__init__()\n        self.stride = stride\n        assert stride in [1, 2]\n        hidden_dim = int(round(inp * expand_ratio))\n        self.use_res_connect = self.stride == 1 and inp == oup\n        layers = []\n        if expand_ratio != 1:\n            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))\n        layers.extend([\n            ConvBNReLU(hidden_dim, hidden_dim, \n                stride=stride, groups=hidden_dim),\n            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),\n            nn.BatchNorm2d(oup), ])\n        self.conv = nn.Sequential(*layers)\n\n    def forward(self, x):\n        if self.use_res_connect:\n            return x + self.conv(x)\n        else:\n            return self.conv(x)\n\n\ndefault_spec = [\n    [1, 16, 1, 1], # t, c, n, s\n    [6, 24, 2, 2],\n    [6, 32, 3, 2],\n    [6, 64, 4, 2],\n    [6, 96, 3, 1],\n    [6, 160, 3, 2],\n    [6, 320, 1, 1], ]\n\n\nclass MobileNetV2(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        input_channel = 32\n        last_channel = 1280\n        features = [ConvBNReLU(3, input_channel, stride=2)]\n        for t, c, n, s in default_spec:\n            output_channel = c\n            for i in range(n):\n                stride = s if i == 0 else 1\n                features.append(\n                    BottleNeck(\n                        input_channel, output_channel,\n                        stride, expand_ratio=t))\n                input_channel = output_channel\n        features.append(ConvBNReLU(input_channel, \n            last_channel, kernel_size=1))\n        self.features = nn.Sequential(*features)\n        self.classifier = nn.Sequential(\n            nn.Dropout(0.2),\n            nn.Linear(last_channel, 1000), )\n\n    def forward(self, x):\n        x = self.features(x)\n        x = x.mean([2, 3])\n        x = self.classifier(x)\n        return x\n\n\nnet = MobileNetV2()\n```\n\n## Transformer\n\n```Python\n\"\"\"\nThe Transformer model from paper Attention is all you need.\nThe Transformer instance accepts two inputs:\nx is Tensor with shape (Batch, Channel, LengthX).\n    usually a source sequence from embedding (in such cases,\n    Channel equals the embedding size).\ny is Tensor with shape (Batch, Channel, lengthY).\n    usually a target sequence, also from embedding.\n**kw is passed down to inner components.\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.functional as W\n\n\ndef multi_head_attention(x, y=None, num_head=8, dropout=0.1, mask=None, **kw):\n    def split_heads(t):\n        return t.reshape(batch, num_head, size//num_head, t.shape[-1])\n    def merge_heads(t):\n        return t.reshape(batch, -1, t.shape[-1])\n    if y is None:\n        y = x # self attention\n    batch, size = x.shape[:2]\n    assert size%num_head == 0, 'num_head must be a divisor of size.'\n    assert y.shape[:2] == x.shape[:2], 'The first 2 dims of x, y must match.'\n    q = W.linear(x, size) # query\n    k = W.linear(y, size) # key\n    v = W.linear(y, size) # value\n    q = split_heads(q)\n    k = split_heads(k)\n    v = split_heads(v)\n    q *= (size//num_head)**(-0.5)\n    a = q.transpose(2, 
3).contiguous().matmul(k) # attention weights\n    if mask is not None:\n        a += mask\n    a = F.softmax(a, dim=-1)\n    a = W.dropout(a, dropout)\n    x = v.matmul(a.transpose(2, 3).contiguous())\n    x = merge_heads(x)\n    return W.linear(x, size)\n\n\ndef feed_forward(x, size_ff=2048, dropout=0.1, **kw):\n    y = W.linear(x, size_ff, activation='relu')\n    y = W.dropout(y, dropout)\n    return W.linear(y, x.shape[1])\n\n\ndef residual_add(x, layer, dropout=0.1, **kw):\n    y = W.layer_norm(x)\n    y = layer(y, **kw)\n    y = W.dropout(y, dropout)\n    return x+y\n\n\ndef encoder(x, num_encoder=6, **kw):\n    for i in range(num_encoder):\n        x = residual_add(x, multi_head_attention, **kw)\n        x = residual_add(x, feed_forward, **kw)\n    return W.layer_norm(x)\n\n\ndef decoder(x, y, num_decoder=6, mask_x=None, mask_y=None, **kw):\n    for i in range(num_decoder):\n        y = residual_add(y, multi_head_attention, mask=mask_y, **kw)\n        y = residual_add(x, multi_head_attention, y=y, mask=mask_x, **kw)\n        y = residual_add(y, feed_forward, **kw)\n    return W.layer_norm(y)\n\n\ndef transformer(x, y, **kw):\n    x = encoder(x, **kw)\n    x = decoder(x, y, **kw)\n    return x\n\n\nclass Transformer(nn.Module):\n\n    def __init__(self, *shape, **kw):\n        super().__init__()\n        self.kw = kw\n        warm.up(self, *shape)\n        \n    def forward(self, x, y):\n        return transformer(x, y, **self.kw)\n\n```\n\n\n## EfficientNet\n\nFor a brief overview, check the [blog post](https://blue-season.github.io/efficientnet-in-5-minutes/).\n\n```python\n\"\"\"\nEfficientNet model from https://arxiv.org/abs/1905.11946\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.functional as W\n\n\ndef swish(x):\n    return x*torch.sigmoid(x)\n\n\ndef squeeze_excitation(x, size_se):\n    if size_se == 0:\n        return x\n    size_in = x.shape[1]\n    x = F.adaptive_avg_pool2d(x, 1)\n    x = W.conv(x, size_se, 1, activation=swish)\n    return W.conv(x, size_in, 1, activation=swish)\n\n\ndef drop_connect(x, rate):\n    if rate == 0:\n        return x\n    rate = 1.0-rate\n    drop_mask = rate + torch.rand([x.shape[0], 1, 1, 1],\n        device=x.device, requires_grad=False)\n    return x/rate*drop_mask.floor()\n\n\ndef conv_pad_same(x, size, kernel=1, stride=1, **kw):\n    \"\"\" Same padding so that out_size*stride == in_size. \"\"\"\n    pad = 0\n    if kernel != 1 or stride != 1:\n        in_size, s, k = [torch.as_tensor(v)\n            for v in (x.shape[2:], stride, kernel)]\n        pad = torch.max(((in_size+s-1)//s-1)*s+k-in_size, torch.tensor(0))\n        left, right = pad//2, pad-pad//2\n        if torch.all(left == right):\n            pad = tuple(left.tolist())\n        else:\n            left, right = left.tolist(), right.tolist()\n            pad = sum(zip(left[::-1], right[::-1]), ())\n            x = F.pad(x, pad)\n            pad = 0\n    return W.conv(x, size, kernel, stride=stride, padding=pad, **kw)\n\n\ndef conv_bn_act(x, size, kernel=1, stride=1, groups=1, \n        bias=False, eps=1e-3, momentum=1e-2, act=swish):\n    x = conv_pad_same(x, size, kernel, stride=stride, groups=groups, bias=bias)\n    return W.batch_norm(x, eps=eps, momentum=momentum, activation=act)\n\n\ndef mb_block(x, size_out, expand=1, kernel=1, stride=1,\n        se_ratio=0.25, dc_ratio=0.2):\n    \"\"\" Mobilenet Bottleneck Block. 
\"\"\"\n    size_in = x.shape[1]\n    size_mid = size_in*expand\n    y = conv_bn_act(x, size_mid, 1) if expand > 1 else x\n    y = conv_bn_act(y, size_mid, kernel, stride=stride, groups=size_mid)\n    y = squeeze_excitation(y, int(size_in*se_ratio))\n    y = conv_bn_act(y, size_out, 1, act=None)\n    if stride == 1 and size_in == size_out:\n        y = drop_connect(y, dc_ratio)\n        y += x\n    return y\n\n\nspec_b0 = (\n# size, expand, kernel, stride, repeat, squeeze_excitation, drop_connect\n    (16, 1, 3, 1, 1, 0.25, 0.2),\n    (24, 6, 3, 2, 2, 0.25, 0.2),\n    (40, 6, 5, 2, 2, 0.25, 0.2),\n    (80, 6, 3, 2, 3, 0.25, 0.2),\n    (112, 6, 5, 1, 3, 0.25, 0.2),\n    (192, 6, 5, 2, 4, 0.25, 0.2),\n    (320, 6, 3, 1, 1, 0.25, 0.2), )\n\n\nclass WarmEfficientNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        warm.up(self, [2, 3, 32, 32])\n    def forward(self, x):\n        x = conv_bn_act(x, 32, kernel=3, stride=2)\n        for size, expand, kernel, stride, repeat, se, dc in spec_b0:\n            for i in range(repeat):\n                stride = stride if i == 0 else 1\n                x = mb_block(x, size, expand, kernel, stride, se, dc)\n        x = conv_bn_act(x, 1280)\n        x = F.adaptive_avg_pool2d(x, 1)\n        x = W.dropout(x, 0.2)\n        x = x.view(x.shape[0], -1)\n        x = W.linear(x, 1000)\n        return x\n```\n"
  },
  {
    "path": "docs/text.mako",
    "content": "## Define mini-templates for each portion of the doco.\n\n<%!\n  def indent(s, spaces=4):\n      new = s.replace('\\n', '\\n' + ' ' * spaces)\n      return ' ' * spaces + new.strip()\n%>\n\n<%def name=\"deflist(s)\">:${indent(s)[1:]}</%def>\n\n<%def name=\"h3(s)\">### ${s}\n</%def>\n\n<%def name=\"function(func)\" buffered=\"True\">\n    <%\n        returns = show_type_annotations and func.return_annotation() or ''\n        if returns:\n            returns = ' -> ' + returns\n    %>\n${\"---\"}\n${\"### \" + func.name}\n\n\n```python3\ndef :\n    ${\",\\n  \".join(func.params(annotate=show_type_annotations))} ${returns}\n```\n${func.docstring}\n\n% if show_source_code and func.source and func.obj is not getattr(func.inherits, 'obj', None):\n\n??? example \"View Source\"\n        ${\"\\n        \".join(func.source.split(\"\\n\"))}\n\n% endif\n</%def>\n\n<%def name=\"variable(var)\" buffered=\"True\">\n```python3\n${var.name}\n```\n${var.docstring | deflist}\n</%def>\n\n<%def name=\"class_(cls)\" buffered=\"True\">\n${\"---\"}\n${\"### \" + cls.name}\n\n```python3\ndef :\n    ${\",\\n  \".join(cls.params(annotate=show_type_annotations))}\n```\n\n${cls.docstring}\n\n% if show_source_code and cls.source:\n\n??? example \"View Source\"\n        ${\"\\n        \".join(cls.source.split(\"\\n\"))}\n\n------\n\n% endif\n\n<%\n  class_vars = cls.class_variables(show_inherited_members, sort=sort_identifiers)\n  static_methods = cls.functions(show_inherited_members, sort=sort_identifiers)\n  inst_vars = cls.instance_variables(show_inherited_members, sort=sort_identifiers)\n  methods = cls.methods(show_inherited_members, sort=sort_identifiers)\n  mro = cls.mro()\n  subclasses = cls.subclasses()\n%>\n% if mro:\n${h3('Ancestors (in MRO)')}\n    % for c in mro:\n* ${c.refname}\n    % endfor\n% endif\n\n% if subclasses:\n${h3('Descendants')}\n    % for c in subclasses:\n* ${c.refname}\n    % endfor\n% endif\n\n% if class_vars:\n${h3('Class variables')}\n    % for v in class_vars:\n${variable(v)}\n\n    % endfor\n% endif\n\n% if static_methods:\n${h3('Static methods')}\n    % for f in static_methods:\n${function(f)}\n\n    % endfor\n% endif\n\n% if inst_vars:\n${h3('Instance variables')}\n% for v in inst_vars:\n${variable(v)}\n\n% endfor\n% endif\n% if methods:\n${h3('Methods')}\n    % for m in methods:\n${function(m)}\n\n% endfor\n% endif\n\n</%def>\n\n## Start the output logic for an entire module.\n\n<%\n  variables = module.variables()\n  classes = module.classes()\n  functions = module.functions()\n  submodules = module.submodules()\n  heading = 'Namespace' if module.is_namespace else 'Module'\n%>\n\n${\"# \" + heading} ${module.name}\n\n${module.docstring}\n\n% if show_source_code:\n\n??? example \"View Source\"\n        ${\"\\n        \".join(module.source.split(\"\\n\"))}\n\n% endif\n\n\n% if submodules:\nSub-modules\n-----------\n    % for m in submodules:\n* [${m.name}](${m.name.split(\".\")[-1]}/)\n    % endfor\n% endif\n\n% if variables:\nVariables\n---------\n    % for v in variables:\n${variable(v)}\n\n    % endfor\n% endif\n\n% if functions:\nFunctions\n---------\n    % for f in functions:\n${function(f)}\n\n    % endfor\n% endif\n\n% if classes:\nClasses\n-------\n    % for c in classes:\n${class_(c)}\n\n    % endfor\n% endif\n"
  },
  {
    "path": "docs/tutorial.md",
    "content": "\n# PyWarm Basic Tutorial\n\n## Import\n\nTo get started, first import PyWarm in your project:\n\n```Python\nimport warm\nimport warm.functional as W\n```\n\n## Rewrite\n\nNow you can replace child module definitions with function calls. \nFor example, instead of:\n\n```Python\n# Torch\nclass MyModule(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size)\n        # other child module definitions\n\n    def forward(self, x):\n        x = self.conv1(x)\n        # more forward steps\n```\n\nYou now use the warm functions:\n\n```Python\n# Warm\nclass MyWarmModule(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        warm.up(self, input_shape_or_data)\n\n    def forward(self, x):\n        x = W.conv(x, out_channels, kernel_size) # no in_channels needed\n        # more forward steps\n```\n\nNotice the `warm.up(self, input_shape_or_data)` at the end of the `__init__()` method.\nIt is required so that PyWarm can infer all shapes of itermediate steps and set up trainable parameters.\nThe only argument `input_shape_or_data` can either be a tensor, e.g. `torch.randn(2, 1, 28, 28)`,\nor just the shape, e.g. `[2, 1, 28, 28]` for the model inputs. If the model has multiple inputs,\nyou may supple them in a list or a dictionary.\n\nAlthough it is recommended that you attach `warm.up()` to the end of the `__init__()` of your model, you can actually\nuse it on the class instances outside of the definition, like a normal function call:\n\n```Python\nclass MyWarmModule(nn.Module):\n\n    def __init__(self):\n        super().__init__() # no warm.up here\n\n    def forward(self, x):\n        x = W.conv(x, 10, 3)\n        # forward step, powered by PyWarm\n\n\nmodel = MyWarmModule() # call warm.up outside of the module definition\n\nwarm.up(model, [2, 1, 28, 28])\n```\n\n**Note**: If the model contains `batch_norm` layers, you need to specify the `Batch` dimension to at least 2.\n\n# Advanced Topics\n\n## Default shapes\n\nPyWarm has a unified functional interface, that by default all functions accept and return tensors with shape\n`(Batch, Channel, *)`, where `*` is any number of additional dimensions. For example, for 2d images,\nthe `*` usually stands for `(Height, Width)`, and for 1d time series, the `*` means `(Time,)`.\n\nThis convention is optimized for the performance of Convolutional networks. It may become less efficient if your\nmodel relies heavily on dense (Linear) or recurrent (LSTM, GRU) layers. You can use different input and\noutput shapes by specifying `in_shape`, `out_shape` keyword arguments in the function calls. These keywords\naccept only letters `'B'`, `'C'` and `'D'`, which stand for `Batch`, `Channel`, and `*` (extra Dimensions)\nrespectively. So for example if for a 1d time series you want to have `(Time, Batch, Channel)` as the output shape,\nyou can specify `out_shape='DBC'`.\n\n## Dimensional awareness\n\nPyWarm functions can automatically identify 1d, 2d and 3d input data, so the same function can be used on different\ndimensional cases. For example, the single `W.conv` is enough to replace `nn.Conv1d, nn.Conv2d, nn.Conv3d`.\nSimilarly, you don't need `nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d` for differnt inputs, a single `W.batch_norm`\ncan replace them all.\n\n## Shape inference\n\nMany neural network layers will perform a transformation of shapes. 
\n## Dimensional awareness\n\nPyWarm functions can automatically identify 1d, 2d and 3d input data, so the same function can be used on different\ndimensional cases. For example, the single `W.conv` is enough to replace `nn.Conv1d, nn.Conv2d, nn.Conv3d`.\nSimilarly, you don't need `nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d` for different inputs; a single `W.batch_norm`\ncan replace them all.\n\n## Shape inference\n\nMany neural network layers perform a transformation of shapes. For example, after a convolution operation,\nthe shape is changed from `(Batch, ChannelIn, *)` to `(Batch, ChannelOut, *)`. PyTorch nn Modules require the user to\nkeep track of both `in_channels` and `out_channels`. PyWarm relieves this pain by inferring the `in_channels` for you,\nso you can focus more on the nature of your tasks, rather than chores.\n\n## Argument passdown\n\nIf the signature of a PyWarm function does not list every argument of its torch nn Module counterpart, it will pass down\nadditional keyword arguments to the underlying nn Module. For example, if you want to specify a stride of 2 for a conv layer,\njust use `W.conv(..., stride=2)`. The only thing to remember is that you have to spell out the full keyword, instead of\nrelying on the position of arguments.\n\n## Parameter initialization per layer\n\nUnlike PyTorch's approach, parameter initialization can be specified directly in PyWarm's functional interface.\nFor example:\n\n```Python\nx = W.conv(x, 20, 1, init_weight='kaiming_uniform_')\n```\nThis makes it easier to create layer-specific initialization in PyWarm. You no longer need to go through\n`self.modules()` and `self.parameters()` to create custom initializations.\n\nBy default, PyWarm will look into `torch.nn.init` for initialization function names.\nAlternatively, you may just specify a callable, or a tuple `(fn, kwargs)` if the callable accepts more than 1 input.\n\nIf the initialization is not specified or `None` is used, the corresponding layer will get the default initializations used\nin torch nn modules.\n\n## Apply activation nonlinearity to the output\n\nPyWarm's functional interface supports adding an optional keyword argument `activation=name`, where\n`name` is a callable or just its name, which represents an activation (nonlinearity) function\nin `torch.nn.functional` or just `torch`. By default no activation is used.\n\n## Mix and Match\n\nYou are not limited to using only PyWarm's functional interface. It is completely OK to mix and match the old\nPyTorch way of child module definitions with PyWarm's function API. For example:\n\n```Python\nclass MyModel(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        # other stuff\n        self.conv1 = nn.Conv2d(2, 30, 7, padding=3)\n        # other stuff\n\n    def forward(self, x):\n        y = F.relu(self.conv1(x))\n        y = W.conv(y, 40, 3, activation='relu')\n```\n\n## Custom layer names\n\nNormally you do not have to specify layer names when using the functional API.\nPyWarm will track and count usage of each layer type and automatically assign names for you. For example,\nsubsequent convolutional layer calls via `W.conv` will create `conv_1`, `conv_2`, ... etc. in the parent module.\n\nNevertheless, if you want to ensure certain layers have particular names, you can specify a `name='my_name'`\nkeyword argument in the call.\n\nAlternatively, if you still want PyWarm to count usage and increment the ordinal for you, but only want to customize\nthe base type name, you can use the `base_name='my_prefix'` keyword instead. The PyWarm modules will then have\nnames like `my_prefix_1`, `my_prefix_2` in the parent module.\n\nSee the PyWarm [resnet example in the examples folder](https://github.com/blue-season/pywarm/blob/master/examples/resnet.py)\nfor how to use these features to load pre-trained model parameters into PyWarm models.\n"
  },
  {
    "path": "examples/efficientnet.py",
    "content": "\n# 09-20-2019;\n\"\"\"\nEfficientNet\n\"\"\"\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nsys.path.append('..')\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.util\nimport warm.functional as W\nfrom warm.engine import namespace\n\n\ndef swish(x):\n    return x*torch.sigmoid(x)\n\n\ndef conv_pad_same(x, size, kernel=1, stride=1, **kw):\n    pad = 0\n    if kernel != 1 or stride != 1:\n        in_size, s, k = [torch.as_tensor(v) for v in (x.shape[2:], stride, kernel)]\n        pad = torch.max(((in_size+s-1)//s-1)*s+k-in_size, torch.tensor(0))\n        left, right = pad//2, pad-pad//2\n        if torch.all(left == right):\n            pad = tuple(left.tolist())\n        else:\n            left, right = left.tolist(), right.tolist()\n            pad = sum(zip(left[::-1], right[::-1]), ())\n            x = F.pad(x, pad)\n            pad = 0\n    return W.conv(x, size, kernel, stride=stride, padding=pad, **kw)\n\n\n@namespace\ndef conv_bn_act(x, size, kernel=1, stride=1, groups=1, bias=False, eps=1e-3, momentum=1e-2, act=swish, name='', **kw):\n    x = conv_pad_same(x, size, kernel, stride=stride, groups=groups, bias=bias, name=name+'-conv')\n    return W.batch_norm(x, eps=eps, momentum=momentum, activation=act, name=name+'-bn')\n\n\n@namespace\ndef mb_block(x, size_out, expand=1, kernel=1, stride=1, se_ratio=0.25, dc_ratio=0.2, **kw):\n    \"\"\" MobileNet Bottleneck Block. \"\"\"\n    size_in = x.shape[1]\n    size_mid = size_in*expand\n    y = conv_bn_act(x, size_mid, 1, **kw) if expand > 1 else x\n    y = conv_bn_act(y, size_mid, kernel, stride=stride, groups=size_mid, **kw)\n    y = squeeze_excitation(y, int(size_in*se_ratio), **kw)\n    y = conv_bn_act(y, size_out, 1, act=None, **kw)\n    if stride == 1 and size_in == size_out:\n        y = drop_connect(y, dc_ratio)\n        y += x\n    return y\n\n\n@namespace\ndef squeeze_excitation(x, size_se, name='', **kw):\n    if size_se == 0:\n        return x\n    size_in = x.shape[1]\n    x = F.adaptive_avg_pool2d(x, 1)\n    x = W.conv(x, size_se, 1, activation=swish, name=name+'-conv1')\n    return W.conv(x, size_in, 1, activation=swish, name=name+'-conv2')\n\n\ndef drop_connect(x, rate):\n    \"\"\" Randomly set entire batch to 0. 
\"\"\"\n    if rate == 0:\n        return x\n    rate = 1.0-rate\n    drop_mask = torch.rand([x.shape[0], 1, 1, 1], device=x.device, requires_grad=False)+rate\n    return x/rate*drop_mask.floor()\n\n\nspec_b0 = (\n    (16, 1, 3, 1, 1, 0.25, 0.2), # size, expand, kernel, stride, repeat, se_ratio, dc_ratio\n    (24, 6, 3, 2, 2, 0.25, 0.2),\n    (40, 6, 5, 2, 2, 0.25, 0.2),\n    (80, 6, 3, 2, 3, 0.25, 0.2),\n    (112, 6, 5, 1, 3, 0.25, 0.2),\n    (192, 6, 5, 2, 4, 0.25, 0.2),\n    (320, 6, 3, 1, 1, 0.25, 0.2), )\n\n\nclass WarmEfficientNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        warm.up(self, [2, 3, 32, 32])\n    def forward(self, x):\n        x = conv_bn_act(x, 32, kernel=3, stride=2, name='head')\n        for size, expand, kernel, stride, repeat, se_ratio, dc_ratio in spec_b0:\n            for i in range(repeat):\n                stride = stride if i == 0 else 1\n                x = mb_block(x, size, expand, kernel, stride, se_ratio, dc_ratio)\n        x = conv_bn_act(x, 1280, name='tail')\n        x = F.adaptive_avg_pool2d(x, 1)\n        x = W.dropout(x, 0.2)\n        x = x.view(x.shape[0], -1)\n        x = W.linear(x, 1000)\n        return x\n\n\nif __name__ == '__main__':\n    m = WarmEfficientNet()\n    warm.util.summary(m)\n"
  },
  {
    "path": "examples/lstm.py",
    "content": "# 09-07-2019;\n\"\"\"\nLSTM sequence model example, based on\nhttps://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html\n\"\"\"\nimport argparse\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nsys.path.append('..')\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport warm\nimport warm.functional as W\n\n\ntraining_data = [\n    ('The dog ate the apple'.split(), ['DET', 'NN', 'V', 'DET', 'NN']),\n    ('Everybody read that book'.split(), ['NN', 'V', 'DET', 'NN']), ]\ntesting_data = [('The dog ate the book'.split(), ['DET', 'NN', 'V', 'DET', 'NN'])]\nword_to_ix = {}\nfor sent, tags in training_data:\n    for word in sent:\n        if word not in word_to_ix:\n            word_to_ix[word] = len(word_to_ix)\ntag_to_ix = {'DET': 0, 'NN': 1, 'V': 2}\nix_to_tag = {v:k for k, v in tag_to_ix.items()}\n\n\nclass WarmTagger(nn.Module):\n    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):\n        super().__init__()\n        self.arg = (embedding_dim, hidden_dim, vocab_size, tagset_size)\n        warm.up(self, torch.tensor([0, 1], dtype=torch.long))\n    def forward(self, x): # D\n        embedding_dim, hidden_dim, vocab_size, tagset_size = self.arg\n        y = W.embedding(x, embedding_dim, vocab_size) # D->DC\n        y = W.lstm(y.T[None, ...], hidden_dim) # DC->BCD\n        y = W.linear(y, tagset_size) # BCD\n        y = F.log_softmax(y, dim=1) # BCD\n        return y[0].T # DC\n\n\nclass TorchTagger(nn.Module):\n    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):\n        super().__init__()\n        self.hidden_dim = hidden_dim\n        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)\n        self.lstm = nn.LSTM(embedding_dim, hidden_dim)\n        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)\n    def forward(self, sentence):\n        embeds = self.word_embeddings(sentence)\n        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))\n        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))\n        tag_scores = F.log_softmax(tag_space, dim=1)\n        return tag_scores\n\n\ndef prepare_sequence(seq, to_ix):\n    idxs = [to_ix[w] for w in seq]\n    return torch.tensor(idxs, dtype=torch.long)\n\n\ndef main():\n    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')\n    parser.add_argument(\n        '--warm', action='store_true', help='use warm instead of vanilla pytorch.')\n    p = parser.parse_args()\n    torch.manual_seed(1)\n    #\n    arg = (6, 6, len(word_to_ix), len(tag_to_ix))\n    model = WarmTagger(*arg) if p.warm else TorchTagger(*arg)\n    print(f'Using {model._get_name()}.')\n    loss_function = nn.NLLLoss()\n    optimizer = optim.SGD(model.parameters(), lr=0.1)\n    #\n    for epoch in range(300):\n        for sentence, tags in training_data:\n            model.zero_grad()\n            sentence_in = prepare_sequence(sentence, word_to_ix)\n            targets = prepare_sequence(tags, tag_to_ix)\n            tag_scores = model(sentence_in)\n            loss = loss_function(tag_scores, targets)\n            loss.backward()\n            optimizer.step()\n    #\n    with torch.no_grad():\n        inputs = prepare_sequence(testing_data[0][0], word_to_ix)\n        tag_scores = model(inputs)\n        ix = torch.argmax(tag_scores, -1).numpy()\n        print(testing_data[0][0])\n        print('Network tags:\\n', [ix_to_tag[i] for i in ix])\n        print('True 
tags:\\n', testing_data[0][1])\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "examples/mnist.py",
    "content": "# 08-27-2019;\n\"\"\"\nMNIST training example.\nUse `python mnist.py` to run with PyTorch NN.\nUse `python mnist.py --warm` to run with PyWarm NN.\nUse `python mnist.py --help` to see a list of cli argument options.\n\"\"\"\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nsys.path.append('..')\nimport argparse\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nfrom torchvision import datasets, transforms\nimport warm\nimport warm.functional as W\n\n\nclass WarmNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        warm.up(self, [1, 1, 28, 28])\n    def forward(self, x):\n        x = W.conv(x, 20, 5, activation='relu')\n        x = F.max_pool2d(x, 2)\n        x = W.conv(x, 50, 5, activation='relu')\n        x = F.max_pool2d(x, 2)\n        x = x.view(-1, 800)\n        x = W.linear(x, 500, activation='relu')\n        x = W.linear(x, 10)\n        return F.log_softmax(x, dim=1)\n\n\nclass TorchNet(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(1, 20, 5, 1)\n        self.conv2 = nn.Conv2d(20, 50, 5, 1)\n        self.fc1 = nn.Linear(4*4*50, 500)\n        self.fc2 = nn.Linear(500, 10)\n    def forward(self, x):\n        x = F.relu(self.conv1(x))\n        x = F.max_pool2d(x, 2, 2)\n        x = F.relu(self.conv2(x))\n        x = F.max_pool2d(x, 2, 2)\n        x = x.view(-1, 4*4*50)\n        x = F.relu(self.fc1(x))\n        x = self.fc2(x)\n        return F.log_softmax(x, dim=1)\n\n\ndef train(p, model, device, train_loader, optimizer, epoch):\n    model.train()\n    for batch_idx, (data, target) in enumerate(train_loader):\n        data, target = data.to(device), target.to(device)\n        optimizer.zero_grad()\n        output = model(data)\n        loss = F.nll_loss(output, target)\n        loss.backward()\n        optimizer.step()\n        if batch_idx%p.log_interval == 0:\n            print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n                epoch, batch_idx*len(data), len(train_loader.dataset),\n                100.*batch_idx/len(train_loader), loss.item()))\n\n\ndef test(p, model, device, test_loader):\n    model.eval()\n    test_loss = 0\n    correct = 0\n    size = len(test_loader.dataset)\n    with torch.no_grad():\n        for data, target in test_loader:\n            data, target = data.to(device), target.to(device)\n            output = model(data)\n            test_loss += F.nll_loss(output, target, reduction='sum').item()\n            pred = output.argmax(dim=1, keepdim=True)\n            correct += pred.eq(target.view_as(pred)).sum().item()\n    test_loss /= size\n    print(f'\\nTest loss: {test_loss:.4f}, Accuracy: {correct}/{size} ({100*correct/size:.2f}%)\\n')\n\n\ndef main():\n    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')\n    parser.add_argument(\n        '--warm', action='store_true', help='use warm instead of vanilla pytorch.')\n    parser.add_argument(\n        '--batch-size', type=int, default=128, metavar='N', help='input batch size for training (default: 128)')\n    parser.add_argument(\n        '--test-batch-size', type=int, default=1000, metavar='N', help='input batch size for testing (default: 1000)')\n    parser.add_argument(\n        '--epochs', type=int, default=3, metavar='N', help='number of epochs to train (default: 3)')\n    parser.add_argument(\n        '--lr', type=float, default=0.02, metavar='LR', help='learning rate (default: 
0.02)')\n    parser.add_argument(\n        '--momentum', type=float, default=0.5, metavar='M', help='SGD momentum (default: 0.5)')\n    parser.add_argument(\n        '--no-cuda', action='store_true', default=False, help='disables CUDA training')\n    parser.add_argument(\n        '--seed', type=int, default=1, metavar='S', help='random seed (default: 1)')\n    parser.add_argument(\n        '--log-interval', type=int, default=10, metavar='N', help='number of batches between logging training status')\n    parser.add_argument(\n        '--save-model', action='store_true', default=False, help='for saving the current model')\n    p = parser.parse_args()\n    #\n    torch.manual_seed(p.seed)\n    use_cuda = not p.no_cuda and torch.cuda.is_available()\n    device = 'cuda' if use_cuda else 'cpu'\n    kw = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}\n    data_transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,)), ])\n    train_data = datasets.MNIST('../data', train=True, download=True, transform=data_transform)\n    test_data = datasets.MNIST('../data', train=False, download=True, transform=data_transform)\n    train_loader = torch.utils.data.DataLoader(train_data, batch_size=p.batch_size, shuffle=True, **kw)\n    test_loader = torch.utils.data.DataLoader(test_data, batch_size=p.test_batch_size, shuffle=True, **kw)\n    model = WarmNet() if p.warm else TorchNet()\n    print(f'Using {model._get_name()}.')\n    model = model.to(device)\n    optimizer = optim.SGD(model.parameters(), lr=p.lr, momentum=p.momentum)\n    print(f'Training with {p.epochs} epochs on {device} device.')\n    #\n    for i in range(p.epochs):\n        train(p, model, device, train_loader, optimizer, i)\n        test(p, model, device, test_loader)\n    #\n    if p.save_model:\n        torch.save(model.state_dict(), 'mnist_cnn.pt')\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "examples/mobilenet.py",
    "content": "# 09-03-2019;\n\"\"\"\nConstruct a WarmMobileNetV2() using PyWarm, then copy state dicts\nfrom torchvision.models.mobilenet_v2() into WarmMobileNetV2(),\ncompare if it produce identical results as the official one.\n\"\"\"\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nsys.path.append('..')\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.util\nimport warm.functional as W\n\n\ndef conv_bn_relu(x, size, stride=1, expand=1, kernel=3, groups=1, name=''):\n    x = W.conv(x, size, kernel, padding=(kernel-1)//2, stride=stride, groups=groups, bias=False,\n        name=f'{name}-0', )\n    return W.batch_norm(x, activation='relu6', name=f'{name}-1')\n\n\ndef bottleneck(x, size_out, stride, expand, name=''):\n    size_in = x.shape[1]\n    size_mid = size_in*expand\n    y = conv_bn_relu(x, size_mid, kernel=1, name=f'{name}-conv-0') if expand > 1 else x\n    y = conv_bn_relu(y, size_mid, stride, kernel=3, groups=size_mid, name=f'{name}-conv-{1 if expand > 1 else 0}')\n    y = W.conv(y, size_out, kernel=1, bias=False, name=f'{name}-conv-{2 if expand > 1 else 1}')\n    y = W.batch_norm(y, name=f'{name}-conv-{3 if expand > 1 else 2}')\n    if stride == 1 and size_in == size_out:\n        y += x # residual shortcut\n    return y\n\n\ndef conv1x1(x, *arg, **kw):\n    return conv_bn_relu(x, *arg, kernel=1, **kw)\n\n\ndef pool(x, *arg, **kw):\n    return x.mean([2, 3])\n\n\ndef classify(x, size, *arg, **kw):\n    x = W.dropout(x, rate=0.2, name='classifier-0')\n    return W.linear(x, size, name='classifier-1')\n\n\ndefault_spec = (\n    (None, 32, 1, 2, conv_bn_relu),  # t, c, n, s, operator\n    (1, 16, 1, 1, bottleneck),\n    (6, 24, 2, 2, bottleneck),\n    (6, 32, 3, 2, bottleneck),\n    (6, 64, 4, 2, bottleneck),\n    (6, 96, 3, 1, bottleneck),\n    (6, 160, 3, 2, bottleneck),\n    (6, 320, 1, 1, bottleneck),\n    (None, 1280, 1, 1, conv1x1),\n    (None, None, 1, None, pool),\n    (None, 1000, 1, None, classify), )\n\n\nclass WarmMobileNetV2(nn.Module):\n    def __init__(self):\n        super().__init__()\n        warm.up(self, [2, 3, 224, 224])\n    def forward(self, x):\n        count = 0\n        for t, c, n, s, op in default_spec:\n            for i in range(n):\n                stride = s if i == 0 else 1\n                x = op(x, c, stride, t, name=f'features-{count}')\n                count += 1\n        return x\n\n\ndef test():\n    \"\"\" Compare the classification result of WarmMobileNetV2 versus torchvision mobilenet_v2. \"\"\"\n    new = WarmMobileNetV2()\n    from torchvision.models import mobilenet_v2\n    old = mobilenet_v2()\n    state = old.state_dict()\n    for k in list(state.keys()): # Map parameters of old, e.g. layer2.0.conv1.weight\n        s = k.split('.') # to parameters of new, e.g. layer2-0-conv1.weight\n        s = '-'.join(s[:-1])+'.'+s[-1]\n        state[s] = state.pop(k)\n    new.load_state_dict(state)\n    warm.util.summary(old)\n    warm.util.summary(new)\n    x = torch.randn(1, 3, 224, 224)\n    with torch.no_grad():\n        old.eval()\n        y_old = old(x)\n        new.eval()\n        y_new = new(x)\n        if torch.equal(y_old, y_new):\n            print('Success! Same results from old and new.')\n        else:\n            print('Warning! New and old produce different results.')\n\n\nif __name__ == '__main__':\n    test()\n"
  },
  {
    "path": "examples/resnet.py",
    "content": "# 08-29-2019;\n\"\"\"\nConstruct a WarmResNet() using PyWarm, then copy state dicts\nfrom torchvision.models.resnet18() into WarmResNet(),\ncompare if it produce identical results as the official one.\n\"\"\"\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nsys.path.append('..')\nimport time\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.util\nimport warm.functional as W\n\n\ndef basic(x, size, stride, stack_index, block_index):\n    \"\"\" The basic block. \"\"\"\n    prefix = f'layer{stack_index+1}-{block_index}-'\n    y = W.conv(x, size, 3, stride=stride, padding=1, bias=False, name=prefix+'conv1')\n    y = W.batch_norm(y, activation='relu', name=prefix+'bn1')\n    y = W.conv(y, size, 3, stride=1, padding=1, bias=False, name=prefix+'conv2')\n    y = W.batch_norm(y, name=prefix+'bn2')\n    if y.shape[1] != x.shape[1]:\n        x = W.conv(x, y.shape[1], 1, stride=stride, bias=False, name=prefix+'downsample-0')\n        x = W.batch_norm(x, name=prefix+'downsample-1')\n    return F.relu(y+x)\n\n\ndef stack(x, num_block, size, stride, stack_index, block=basic):\n    \"\"\" A stack of num_block blocks. \"\"\"\n    for block_index, s in enumerate([stride]+[1]*(num_block-1)):\n        x = block(x, size, s, stack_index, block_index)\n    return x\n\n\nclass WarmResNet(nn.Module):\n    def __init__(self, block=basic, stack_spec=((2, 64, 1), (2, 128, 2), (2, 256, 2), (2, 512, 2))):\n        super().__init__()\n        self.block = block\n        self.stack_spec = stack_spec\n        warm.up(self, [2, 3, 32, 32])\n    def forward(self, x):\n        y = W.conv(x, 64, 7, stride=2, padding=3, bias=False, name='conv1')\n        y = W.batch_norm(y, activation='relu', name='bn1')\n        y = F.max_pool2d(y, 3, stride=2, padding=1)\n        for i, spec in enumerate(self.stack_spec):\n            y = stack(y, *spec, i, block=self.block)\n        y = F.adaptive_avg_pool2d(y, 1)\n        y = torch.flatten(y, 1)\n        y = W.linear(y, 1000, name='fc')\n        return y\n\n\ndef test_time(fn, *arg, repeat=10, **kw):\n    dur = 0.0\n    for i in range(repeat):\n        start = time.time()\n        y = fn(*arg, **kw)\n        dur += time.time()-start\n    return dur\n\n\ndef test():\n    \"\"\" Compare the classification result of WarmResNet versus torchvision resnet18. \"\"\"\n    new = WarmResNet()\n    from torchvision.models import resnet18\n    old = resnet18()\n    state = old.state_dict()\n    for k in list(state.keys()): # Map parameters of old, e.g. layer2.0.conv1.weight\n        s = k.split('.') # to parameters of new, e.g. layer2-0-conv1.weight\n        s = '-'.join(s[:-1])+'.'+s[-1]\n        state[s] = state.pop(k)\n    new.load_state_dict(state)\n    warm.util.summary(old)\n    warm.util.summary(new)\n    x = torch.randn(2, 3, 224, 224)\n    with torch.no_grad():\n        old.eval()\n        y_old = old(x)\n        new.eval()\n        y_new = new(x)\n        if torch.equal(y_old, y_new):\n            print('Success! Same results from old and new.')\n        else:\n            print('Warning! New and old produce different results.')\n        t_old = test_time(old, x)\n        t_new = test_time(new, x)\n        print('Total forward time for old:', t_old, 'seconds.')\n        print('Total forward time for new:', t_new, 'seconds.')\n\n\nif __name__ == '__main__':\n    test()\n"
  },
  {
    "path": "examples/transformer.py",
    "content": "# 09-05-2019;\n\"\"\"\nThe Transformer model from paper *Attention is all you need*.\n\"\"\"\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nsys.path.append('..')\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warm\nimport warm.util\nimport warm.functional as W\n\n\ndef multi_head_attention(x, y=None, num_head=8, dropout=0.1, mask=None, **kw):\n    def split_heads(t): # (B, C, L) -> (B, N, H, L) where N*H == C\n        return t.reshape(batch, num_head, size//num_head, t.shape[-1])\n    def merge_heads(t): # (B, N, H, L) -> (B, C, L)\n        return t.reshape(batch, -1, t.shape[-1]) # (B, C, L)\n    if y is None:\n        y = x # self attention\n    batch, size = x.shape[:2] # B, C, Lx\n    assert size%num_head == 0, 'num_head must be a divisor of size.'\n    assert y.shape[:2] == x.shape[:2], 'The first 2 dims of x, y must match.'\n    q = W.linear(x, size) # query\n    k = W.linear(y, size) # key\n    v = W.linear(y, size) # value\n    q = split_heads(q) # (B, N, H, Lx)\n    k = split_heads(k) # (B, N, H, Ly)\n    v = split_heads(v) # (B, N, H, Ly)\n    q *= (size//num_head)**(-0.5)\n    a = q.transpose(2, 3).contiguous().matmul(k) # attention weights, (B, N, Lx, Ly)\n    if mask is not None:\n        a += mask\n    a = F.softmax(a, dim=-1)\n    a = W.dropout(a, dropout)\n    x = v.matmul(a.transpose(2, 3).contiguous()) # (B, N, H, Lx)\n    x = merge_heads(x) # (B, C, Lx)\n    return W.linear(x, size)\n\n\ndef feed_forward(x, size_ff=2048, dropout=0.1, **kw):\n    y = W.linear(x, size_ff, activation='relu')\n    y = W.dropout(y, dropout)\n    return W.linear(y, x.shape[1])\n\n\ndef residual_add(x, layer, dropout=0.1, **kw):\n    y = W.layer_norm(x)\n    y = layer(y, **kw)\n    y = W.dropout(y, dropout)\n    return x+y\n\n\ndef encoder(x, num_encoder=6, **kw):\n    for i in range(num_encoder):\n        x = residual_add(x, multi_head_attention, **kw)\n        x = residual_add(x, feed_forward, **kw)\n    return W.layer_norm(x)\n\n\ndef decoder(x, y, num_decoder=6, mask_x=None, mask_y=None, **kw):\n    for i in range(num_decoder):\n        y = residual_add(y, multi_head_attention, mask=mask_y, **kw)\n        y = residual_add(x, multi_head_attention, y=y, mask=mask_x, **kw)\n        y = residual_add(y, feed_forward, **kw)\n    return W.layer_norm(y)\n\n\ndef transformer(x, y, **kw):\n    x = encoder(x, **kw)\n    x = decoder(x, y, **kw)\n    return x\n\n\nclass Transformer(nn.Module):\n    def __init__(self, *shape, **kw):\n        super().__init__()\n        self.kw = kw\n        warm.up(self, *shape)\n    def forward(self, x, y):\n        return transformer(x, y, **self.kw)\n"
  },
  {
    "path": "pyproject.toml",
    "content": "[tool.poetry]\nname = 'PyWarm'\nversion = '0.4.1'\ndescription = 'A cleaner way to build neural networks for PyTorch.'\nlicense = 'MIT'\nauthors = ['blue-season <very.blue.season@gmail.com>']\nreadme = 'README.md'\nrepository = 'https://github.com/blue-season/pywarm'\nhomepage = 'https://github.com/blue-season/pywarm'\nkeywords = ['pywarm', 'pytorch', 'neural network', 'deep learning']\npackages = [ { include='warm' }, ]\n\n\n[tool.poetry.dependencies]\npython = '>=3.6'\n\n\n[tool.poetry.dev-dependencies]\ntoml = '>=0.9'\npytest = '>=3.0'\ntorch = '>=1.0'\ntorchvision = '>=0.4'\n\n\n[tool.portray]\nmodules = ['warm']\n\n\n[tool.portray.mkdocs]\nmarkdown_extensions = ['pymdownx.superfences']\n\n\n[tool.portray.mkdocs.theme]\nlogo = 'docs/pywarm-logo-small-light.gif'\nfavicon = 'docs/pywarm-logo-small-dark.gif'\nname = 'material'\npalette = {primary='deep orange', accent='pink'}\n\n\n[tool.portray.pdoc3]\nconfig = ['show_source_code=False',\n    'show_type_annotations=False',\n    'sort_identifiers=True',\n    'show_inherited_members=False']\ntemplate_dir = 'docs'\n"
  },
  {
    "path": "tests/test_engine.py",
    "content": "# 08-31-2019;\n\"\"\"\nTest cases for warm.engine.\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport copy\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nfrom warm import engine\n\n\ndef test_set_get_default_parent():\n    a = nn.Identity()\n    b = nn.Identity()\n    engine.set_default_parent(a)\n    assert engine.get_default_parent() is a, 'get_default_parent result mismatchs set_default_parent.'\n    engine.set_default_parent(b)\n    assert engine.get_default_parent() is b, 'get_default_parent result mismatchs set_default_parent.'\n\n\ndef test_auto_name():\n    a = nn.Identity()\n    for i in range(10):\n        assert engine._auto_name('test', a) == f'test_{i+1}', 'new calls to _auto_name failed to increment name count.'\n    a(None) # test if forward pre hook is triggered to reset names\n    assert engine._auto_name('test', a) == 'test_1', 'forward_pre_hook did not work.'\n\n\ndef test_initialize():\n    a = nn.Parameter(torch.zeros(3, 4))\n    b = nn.Parameter(torch.zeros(3, 4))\n    c = nn.Parameter(torch.zeros(3, 4))\n    torch.manual_seed(1)\n    engine.initialize_(a, 'normal_')\n    torch.manual_seed(1)\n    nn.init.normal_(b)\n    assert torch.equal(a, b), 'initialize_ with str spec did not work correctly.'\n    assert not torch.equal(a, c), 'initialize_ with str spec did not work.'\n    torch.manual_seed(1)\n    engine.initialize_(c, nn.init.normal_)\n    assert torch.equal(a, c), 'initialize_ with function spec did not work correctly.'\n\n\ndef test_activate():\n    a = torch.randn(3, 4)\n    b = copy.deepcopy(a)\n    a = engine.activate(a, 'hardshrink')\n    b = F.hardshrink(b)\n    assert torch.equal(a, b), 'activate with str spec did not work correctly.'\n    a = engine.activate(a, 'relu')\n    b = F.relu(b)\n    assert torch.equal(a, b), 'activate with str spec did not work correctly.'\n\n\ndef test_permute():\n    x = torch.randn(1, 2, 3)\n    y = engine.permute(x, 'BCD', 'DCB')\n    assert list(y.shape) == [3, 2, 1], 'permute 3d tensor with str in_shape and str out_shape did not work correctly.'\n    y = engine.permute(x, 'BCD', None)\n    assert list(y.shape) == [1, 2, 3], 'permute tensor with None out_shape did not work corretly.'\n    y = engine.permute(x, 'BCD', [1, 0, 2])\n    assert list(y.shape) == [2, 1, 3], 'permute tensor with list out_shape did not work corretly.'\n    x = torch.randn(1, 2, 3, 4)\n    y = engine.permute(x, 'BCD', 'DCB')\n    assert list(y.shape) == [3, 4, 2, 1], 'permute 4d tensor with str in_shape and str out_shape did not work correctly.'\n    y = engine.permute(x, 'DBC', 'CDB')\n    assert list(y.shape) == [4, 1, 2, 3], 'permute 4d tensor with str in_shape and str out_shape did not work correctly.'\n    x = torch.randn(1, 2, 3, 4, 5)\n    y = engine.permute(x, 'BDC', 'BCD')\n    assert list(y.shape) == [1, 5, 2, 3, 4], 'permute 5d tensor with str in_shape and str out_shape did not work correctly.'\n    x = torch.randn(1, 2)\n    y = engine.permute(x, 'BDC', 'BCD')\n    assert list(y.shape) == [1, 2], 'permute 2d tensor with str in_shape and str out_shape did not work correctly.'\n    y = engine.permute(x, 'CBD', 'DBC')\n    assert list(y.shape) == [2, 1], 'permute 2d tensor with str in_shape and str out_shape did not work correctly.'\n\n\ndef test_unused_kwargs():\n    kw = {'unused1':0, 'unused2':0, 'base_class':0}\n    unused = engine.unused_kwargs(kw)\n    assert 'base_class' not in unused, 'unused_kwargs leaks used.'\n    assert 
set(unused.keys()) == {'unused1', 'unused2'}, 'unused_kwargs did not filter kw correctly.'\n\n\ndef test_prepare_model_is_ready():\n    class TestModel(nn.Module):\n        def forward(self, x):\n            x = engine.forward(x, nn.Linear, 'linear',\n                base_arg=(x.shape[-1], 4, False), # in_features, out_features, bias\n                in_shape=None, out_shape=None, base_shape=None,\n                initialization={'weight':'ones_'}, activation=(F.dropout, {'p':1.0}), )\n            return x\n    x = torch.randn(1, 2, 3)\n    m = TestModel()\n    assert not engine.is_ready(m), 'is_ready did not work correctly.'\n    engine.prepare_model_(m, x)\n    assert engine.is_ready(m), 'prepare_model_ did not work correctly.'\n    assert m.linear_1.bias is None, 'linear_1 should not have bias.'\n    assert torch.allclose(m.linear_1.weight, torch.Tensor([1.0])), 'linear_1.weight should be initialized to all 1s.'\n    y = m(x)\n    assert torch.allclose(y, torch.Tensor([0.0])), 'y should be all 0s because we dropout everything.'\n    assert list(y.shape) == [1, 2, 4], 'y should have shape [1, 2, 4] after linear projection.'\n\n\ndef test_forward():\n    x = torch.randn(1, 2, 3)\n    m = nn.Module()\n    engine.set_default_parent(m)\n    class TripleOut(nn.Module): # to test tuple_out\n        def forward(self, x, b=1, c='2'):\n            return x+b, x, c\n    y = engine.forward(x, base_class=TripleOut, base_name='tri', tuple_out=False)\n    assert isinstance(y, torch.Tensor), 'tuple_out did not work correctly.'\n    y = engine.forward(x, base_class=TripleOut, base_name='tri', tuple_out=True)\n    assert isinstance(y, tuple) and len(y) == 3 and y[-1] == '2', 'tuple_out did not work correctly.'\n    y = engine.forward(x, base_class=TripleOut, base_name='tri', forward_kw={'c':3}, tuple_out=True)\n    assert y[-1] == 3, 'forward_kw did not work correctly.'\n    y = engine.forward(x, base_class=TripleOut, base_name='tri', forward_arg=(2.0,))\n    assert torch.allclose(y-x, torch.Tensor([2.0])), 'forward_arg did not work correctly.'\n    y = engine.forward(x, base_class=TripleOut, activation=(F.dropout, {'p':1.0}))\n    assert torch.allclose(y, torch.Tensor([0.0])), 'activation did not work correctly.'\n    y = engine.forward(\n        x, base_class=nn.Linear, base_kw={'out_features':4}, infer_kw={'in_features':'C'}, base_shape='BDC')\n    assert  y.shape[1] == 4, 'base_kw, infer_kw did not work correctly.'\n\n\ndef test_namespace():\n    m = nn.Module()\n    engine.set_default_parent(m)\n    @engine.namespace\n    def f1(name=''):\n        return ';'.join([f2(name=name) for i in range(2)])\n    @engine.namespace\n    def f2(name=''):\n        return name\n    s0, s1, s2 = [f1() for i in range(3)]\n    assert s0 == 'f1_1-f2_1;f1_1-f2_2'\n    assert s1 == 'f1_2-f2_1;f1_2-f2_2'\n    assert s2 == 'f1_3-f2_1;f1_3-f2_2'\n"
  },
  {
    "path": "tests/test_functional.py",
    "content": "# 08-31-2019;\n\"\"\"\nTest cases for warm.functional.\n\"\"\"\nimport torch\nimport torch.nn as nn\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nimport warm.module as mm\nimport warm.functional as W\n\n\ndef test_conv():\n    m = nn.Module()\n    x = torch.randn(1, 2, 8) # BCD\n    torch.manual_seed(100)\n    y0 = nn.Conv1d(2, 3, 3)(x)\n    torch.manual_seed(100)\n    y1 = W.conv(x, 3, 3, parent=m)\n    assert torch.equal(y0, y1), 'conv incorrect output on 1d signal.'\n    m = nn.Module()\n    x = torch.randn(1, 2, 3, 4) # BCD\n    torch.manual_seed(100)\n    y0 = nn.Conv2d(2, 3, 3)(x)\n    torch.manual_seed(100)\n    y1 = W.conv(x, 3, 3, parent=m)\n    assert torch.equal(y0, y1), 'conv incorrect output on 2d signal.'\n\n\ndef test_linear():\n    m = nn.Module()\n    x = torch.randn(1, 2, 3) # BDC\n    torch.manual_seed(100)\n    y0 = nn.Linear(3, 4)(x)\n    torch.manual_seed(100)\n    y1 = W.linear(x, 4, parent=m, in_shape='BDC', out_shape='BDC')\n    assert torch.equal(y0, y1), 'linear incorrect output on 1d signal.'\n    m = nn.Module()\n    x = torch.randn(1, 2, 3, 4) # BDC\n    torch.manual_seed(100)\n    y0 = nn.Linear(4, 3)(x)\n    torch.manual_seed(100)\n    y1 = W.linear(x, 3, parent=m, in_shape='BDC', out_shape='BDC')\n    assert torch.equal(y0, y1), 'batch_norm incorrect output on 2d signal.'\n\n\ndef test_batch_norm():\n    m = nn.Module()\n    x = torch.randn(1, 2, 3) # BCD\n    torch.manual_seed(100)\n    y0 = nn.BatchNorm1d(2)(x)\n    torch.manual_seed(100)\n    y1 = W.batch_norm(x, parent=m)\n    m = nn.Module()\n    assert torch.equal(y0, y1), 'batch_norm incorrect output on 1d signal.'\n    x = torch.randn(1, 2, 3, 4) # BCD\n    torch.manual_seed(100)\n    y0 = nn.BatchNorm2d(2)(x)\n    torch.manual_seed(100)\n    y1 = W.batch_norm(x, parent=m)\n    assert torch.equal(y0, y1), 'batch_norm incorrect output on 2d signal.'\n\n\ndef test_lstm():\n    m = nn.Module()\n    x = torch.randn(3, 2, 1) # DBC\n    torch.manual_seed(100)\n    y0, *_ = nn.LSTM(1, 2, num_layers=2)(x)\n    torch.manual_seed(100)\n    y1 = W.lstm(x, 2, num_layers=2, parent=m, init_weight_hh=None, in_shape='DBC', out_shape='DBC')\n    assert torch.equal(y0, y1)\n    y1, s1 = W.lstm(x, 2, parent=m, tuple_out=True) # test tuple out\n    assert len(s1) == 2\n    y2 = W.lstm((y1, s1), 2, parent=m) # test tuple in\n    assert torch.is_tensor(y2)\n\n\ndef test_gru():\n    m = nn.Module()\n    x = torch.randn(3, 2, 1) # DBC\n    torch.manual_seed(100)\n    y0, *_ = nn.GRU(1, 2, num_layers=2)(x)\n    torch.manual_seed(100)\n    y1 = W.gru(x, 2, num_layers=2, parent=m, init_weight_hh=None, in_shape='DBC', out_shape='DBC')\n    assert torch.equal(y0, y1)\n\n\ndef test_identity():\n    x = torch.randn(1, 2, 3)\n    assert torch.equal(W.identity(x, 7, 8, a='b'), x)\n\n\ndef test_dropout():\n    m = nn.Module()\n    x = torch.ones(2, 6, 6, 6)\n    torch.manual_seed(100)\n    y0 = nn.Dropout(0.3)(x)\n    torch.manual_seed(100)\n    y1 = W.dropout(x, 0.3, parent=m)\n    assert torch.equal(y0, y1)\n    torch.manual_seed(100)\n    y0 = nn.Dropout2d(0.3)(x)\n    torch.manual_seed(100)\n    y1 = W.dropout(x, 0.3, by_channel=True, parent=m)\n    assert torch.equal(y0, y1)\n\n\ndef test_transformer():\n    m = nn.Module()\n    x = torch.randn(10, 2, 4)\n    y = torch.randn(6, 2, 4)\n    torch.manual_seed(100)\n    z0 = nn.Transformer(4, 2, 1, 1, dim_feedforward=8)(x, y)\n    torch.manual_seed(100)\n    z1 = W.transformer(x, y, 1, 1, 2, dim_feedforward=8, 
in_shape='DBC', out_shape='DBC', parent=m)\n    assert torch.equal(z0, z1)\n    torch.manual_seed(100)\n    z1 = W.transformer(x, y, 1, 1, 2, dim_feedforward=8, in_shape='DBC', out_shape='DBC', parent=m, causal=True)\n    assert not torch.equal(z0, z1)\n    z1 = W.transformer(x, None, 2, 0, 2, dim_feedforward=8, in_shape='DBC', out_shape='DBC', parent=m)\n    assert z1.shape == x.shape\n\n\ndef test_layer_norm():\n    m = nn.Module()\n    x = torch.randn(1, 2, 3, 4, 5)\n    y0 = nn.LayerNorm([3, 4, 5])(x)\n    y1 = W.layer_norm(x, [2, -2, -1], parent=m)\n    assert torch.equal(y0, y1)\n    y0 = nn.LayerNorm(5)(x)\n    y1 = W.layer_norm(x, dim=-1, parent=m)\n    assert torch.equal(y0, y1)\n    x0 = x.permute(0, 4, 2, 1, 3)\n    y0 = nn.LayerNorm([2, 4])(x0)\n    y0 = y0.permute(0, 3, 2, 4, 1)\n    y1 = W.layer_norm(x, dim=[1, -2], parent=m)\n    assert torch.equal(y0, y1)\n\n\ndef test_embedding():\n    m = nn.Module()\n    x = torch.randint(0, 20, (1, 2, 3, 4, 5))\n    torch.manual_seed(10)\n    y0 = nn.Embedding(20, 8)(x)\n    torch.manual_seed(10)\n    y1 = W.embedding(x, 8, 20, parent=m)\n    assert torch.equal(y0, y1)\n    torch.manual_seed(10)\n    y1 = W.embedding(x, 8, 20, in_shape='DCB', parent=m) # shapes should have no effect\n    assert torch.equal(y0, y1)\n    torch.manual_seed(10)\n    y1 = W.embedding(x, 8, 20, out_shape='CBD', parent=m) # shapes should have no effect\n    assert torch.equal(y0, y1)\n    y1 = W.embedding(x, 8, parent=m) # should work without a explicit vocabulary size\n    torch.manual_seed(10)\n    y1 = W.embedding(x.double(), 8, parent=m) # should work with non integer tensors.\n    assert torch.equal(y0, y1)\n"
  },
  {
    "path": "tests/test_module.py",
    "content": "# 08-31-2019;\n\"\"\"\nTest cases for warm.module.\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nimport warm.module as mm\nimport warm.functional as W\n\n\ndef test_lambda():\n    f = lambda x: x*2\n    m = mm.Lambda(f)\n    x = torch.randn(1, 2)\n    assert torch.equal(f(x), m(x)), 'lambda did not work correctly.'\n    def f(x, w, b=5):\n        return x*w+b\n    m = mm.Lambda(f, 2, b=1)\n    assert torch.equal(f(x, 2, 1), m(x)), 'function with args and kwargs did not work correctly.'\n    x = torch.randn(3, 2, 4)\n    m = mm.Lambda(W.permute, 'BDC', 'BCD')\n    assert list(m(x).shape) == [3, 4, 2], 'lambda permute did not work correctly.'\n\n\ndef test_sequential():\n    s = mm.Sequential(\n        nn.Linear(1, 2),\n        nn.LSTM(2, 3, batch_first=True), # lstm and gru return multiple outputs\n        nn.GRU(3, 4, batch_first=True),\n        mm.Lambda(W.permute, 'BDC', 'BCD'),\n        nn.Conv1d(4, 5, 1), )\n    x = torch.randn(3, 2, 1)\n    assert list(s(x).shape) == [3, 5, 2]\n\n\ndef test_shortcut():\n    l = nn.Linear(1, 1, bias=False)\n    nn.init.constant_(l.weight, 2.0)\n    s = mm.Shortcut(l)\n    x = torch.ones(1, 1)\n    assert torch.allclose(s(x), torch.Tensor([3.0]))\n"
  },
  {
    "path": "tests/test_util.py",
    "content": "# 08-31-2019;\n\"\"\"\nTest cases for warm.util.\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nfrom warm import util\n\n\ndef test_camel_to_snake():\n    assert util.camel_to_snake('CamelAndSnake') == 'camel_and_snake'\n    assert util.camel_to_snake('camelAndSnake') == 'camel_and_snake'\n    assert util.camel_to_snake('camelANDSnake') == 'camel_and_snake'\n    assert util.camel_to_snake('CAMELAndSnake') == 'camel_and_snake'\n    assert util.camel_to_snake('CAMELAndSNAKE') == 'camel_and_snake'\n    assert util.camel_to_snake('CamelAndSnake_') == 'camel_and_snake_'\n    assert util.camel_to_snake('_CamelAndSnake') == '__camel_and_snake'\n\n\ndef test_summary_str():\n    from examples.resnet import WarmResNet\n    m = WarmResNet()\n    s = util.summary_str(m)\n    assert len(s) > 0\n\n\ndef test_summary():\n    from examples.resnet import WarmResNet\n    m = WarmResNet()\n    util.summary(m)\n"
  },
  {
    "path": "tests/test_warm.py",
    "content": "# 09-10-2019;\n\"\"\"\nTest cases for the warm module.\n\"\"\"\nimport torch.nn as nn\nfrom pathlib import Path\nimport sys\nsys.path.append(str(Path(__file__).parent.parent))\nimport warm\n\n\ndef test_warm_up():\n    m = nn.Identity()\n    assert not warm.engine.is_ready(m), 'is_ready did not work correctly.'\n    warm.up(m, [1, 2, 3])\n    assert warm.engine.is_ready(m), 'warm.up did not work correctly.'\n"
  },
  {
    "path": "warm/__init__.py",
    "content": "# 09-10-2019;\n\n\"\"\" `warm.up` is an alias of\n[`warm.engine.prepare_model_`](https://blue-season.github.io/pywarm/reference/warm/engine/#prepare_model_). \"\"\"\nfrom warm.engine import prepare_model_ as up\n"
  },
  {
    "path": "warm/engine.py",
    "content": "# 08-26-2019;\n\"\"\"\nPyWarm engine to the functional interface.\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom warm import util\n\n\n_DEFAULT_PARENT_MODULE = None\n\n\ndef set_default_parent(parent):\n    \"\"\" Set the default `parent` module. \"\"\"\n    global _DEFAULT_PARENT_MODULE\n    _DEFAULT_PARENT_MODULE = parent\n\n\ndef get_default_parent():\n    \"\"\" Get the default `parent` module. \"\"\"\n    global _DEFAULT_PARENT_MODULE\n    return _DEFAULT_PARENT_MODULE\n\n\ndef _auto_name(name, parent):\n    \"\"\" Track the count of reference to `name` from `parent`. \"\"\"\n    if not is_ready(parent):\n        parent._pywarm_auto_name_dict = {}\n        def _hook(model, x):\n            model._pywarm_auto_name_dict = {}\n        parent._pywarm_forward_pre_hook = parent.register_forward_pre_hook(_hook)\n    track = parent._pywarm_auto_name_dict\n    if name not in track:\n        track[name] = 0\n    track[name] += 1\n    return f'{name}_{track[name]}'\n\n\ndef prepare_model_(model, *data, device='cpu'):\n    \"\"\" Initialize all childen modules defined by `warm` in a parent `model`.\\n\n    -  `model: Module`; The parent model to be prepared.\n    -  `data: Tensor, or list of int`; A batch of data with the correct shape and type to be forwarded by model.\n        `data` can also be a list of `int`, in which case it is interpreted as the shape of the input data.\n    -  `device: str, or torch.device`; Should be the same for `model` and `data`. Default: `'cpu'`.\n    -  `return: Module`; The prepared model, with all children modules defined by `warm` initialized. \"\"\"\n    _auto_name('', model)\n    set_default_parent(model)\n    def _prep_data(d):\n        if isinstance(d, (np.ndarray, torch.Tensor)):\n            return torch.as_tensor(d).to(device)\n        elif isinstance(d, (list, tuple)):\n            if all(isinstance(x, int) for x in d):\n                return torch.randn(*d, device=device)\n            return [_prep_data(x) for x in d]\n        elif isinstance(d, dict):\n            return {k:_prep_data(v) for k, v in d.items()}\n    with torch.no_grad():\n        is_training = model.training\n        data = [_prep_data(d) for d in data]\n        model.eval()\n        model.to(device)\n        model(*data)\n        model.train(is_training)\n    return model\n\n\ndef is_ready(model):\n    \"\"\" Check if a `model` is prepared. \"\"\"\n    return hasattr(model, '_pywarm_forward_pre_hook')\n\n\ndef activate(x, spec, lookup=None):\n    \"\"\" Activate tensors with given nonlinearity `spec`ification.\\n\n    -  `x: Tensor or list of Tensor`; The tensors to be initialized.\n    -  `spec: str or callable or 2-tuple`; If a `str`, should be one of the nonlinearity functions contained in\n        `torch.nn.functional` or `torch`. If a `callable`, it will be applied to `x` directly, i.e. `spec(x)`.\n        If a 2-`tuple`, it must be of format `(callable, kwargs)`, i.e. `callable(x, **kwargs)`.\n    -  `lookup: None or list of module`; Parent modules to look for `spec`. If `None`, `[nn.functional, torch]` is used.\n    -  `return: Tensor or list of Tensor`; Activation results. 
\"\"\"\n    if spec is None:\n        return x\n    lookup = lookup or [nn.functional, torch]\n    if isinstance(spec, str):\n        for look in lookup:\n            try:\n                spec = getattr(look, spec)\n                break\n            except:\n                pass\n        if isinstance(spec, str):\n            raise ValueError(f'Unknown spec {spec}.')\n    if callable(spec):\n        spec = (spec, {})\n    fn, kw = spec\n    if isinstance(x, (list, tuple)):\n        return [fn(y, **kw) for y in x]\n    return fn(x, **kw)\n\n\ndef initialize_(x, spec):\n    \"\"\" Initialize parameters with given nonlinearity `spec`ification.\\n\n    -  `x: Tensor or list of Tensor`; The tensors to be initialized.\n    -  `spec: str or callable or 2-tuple`; If a `str`, should be one of the nonlinearity functions contained in\n        `torch.nn.init`. If a `callable`, it will be applied to `x` directly, i.e. `spec(x)`. If a 2-`tuple`,\n        it must be of format `(callable, kwargs)`, i.e. `callable(x, **kwargs)`. \"\"\"\n    activate(x, spec, lookup=[nn.init])\n\n\ndef permute(x, in_shape='BCD', out_shape='BCD', **kw):\n    \"\"\" Permute the dimensions of a tensor.\\n\n    -  `x: Tensor`; The nd-tensor to be permuted.\n    -  `in_shape: str`; The dimension shape of `x`. Can only have characters `'B'` or `'C'` or `'D'`,\n        which stand for Batch, Channel, or extra Dimensions. The default value `'BCD'` means\n        the input tensor `x` should be at lest 2-d with shape `(Batch, Channel, Dim0, Dim1, Dim2, ...)`,\n        where `Dim0, Dim1, Dim2 ...` stand for any number of extra dimensions.\n    -  `out_shape: str or tuple or None`; The dimension shape of returned tensor.  Default: `'BCD'`.\n        If a `str`, it is restricted to the same three characters `'B'`, `'C'` or `'D'` as the `in_shape`.\n        If a `tuple`, `in_shape` is ignored, and simply `x.permute(out_shape)` is returned.\n        If `None`, no permution will be performed.\n    -  `return: Tensor`; Permuted nd-tensor. \"\"\"\n    if (in_shape == out_shape) or (out_shape is None):\n        return x\n    if isinstance(out_shape, (list, tuple, torch.Size)):\n        return x.permute(*out_shape)\n    if isinstance(in_shape, str) and isinstance(out_shape, str) :\n        assert set(in_shape) == set(out_shape) <= {'B', 'C', 'D'}, 'In and out shapes must have save set of chars among B, C, and D.'\n        in_shape = in_shape.lower().replace('d', '...')\n        out_shape = out_shape.lower().replace('d', '...')\n        return torch.einsum(f'{in_shape}->{out_shape}', x)\n    return x\n\n\ndef unused_kwargs(kw):\n    \"\"\" Filter out entries used by `forward` and return the rest. 
\"\"\"\n    fn_kw = dict(base_class=None,\n        base_name=None, name=None, base_arg=None, base_kw=None, parent=None,\n        infer_kw=None, in_shape='BCD', base_shape=None, out_shape='BCD', tuple_out=False,\n        forward_arg=None, forward_kw=None, initialization=None, activation=None, )\n    return {k:v for k, v in kw.items() if k not in fn_kw}\n\n\ndef forward(x, base_class, \n        base_name=None, name=None, base_arg=None, base_kw=None, parent=None,\n        infer_kw=None, in_shape='BCD', base_shape='BCD', out_shape='BCD', tuple_out=False,\n        forward_arg=None, forward_kw=None, initialization=None, activation=None, **kw):\n    \"\"\" A forward template that creates child instances at the first time it is called.\\n\n    -  `x: Tensor`; The nd-tensor to be forwarded.\n    -  `base_class: Module`; A child `torch.nn.Module` that will be created at the first time this function is called.\n    -  `base_name: str`; Name for the `base_class`. Default: base_class name.\n    -  `name: str`; Name for the child module instance. Default: class name plus ordinal.\n    -  `base_arg: tuple`; Positional args to be passed to create the child module instance. Default: None.\n    -  `base_kw: dict`; KWargs to be passed to create the child module instance. Default: None.\n    -  `parent: Module`; The parent of the child instance.  Default: None. If `None`, will use `get_default_parent`.\n    -  `infer_kw: dict`; Key should be valid for the child instance. Value shoud be a character,\n        one of `'B'`, `'C'`, or `'D'` (see `permute`), to substitute for a dimension of `x`. Default: None.\n    -  `in_shape: str`; The dimension shape of `x`. See also `permute`. Default: `'BCD'`.\n    -  `base_shape: str`; The dimension shape required by the child module. See also `permute`. Default: `'BCD'`.\n    -  `out_shape: str or tuple or None`; The dimension shape of returned tensor. See also `permute`. Default: `'BCD'`.\n    -  `tuple_out: bool`; Whether the child module will return more than 1 outputs (e.g. `nn.RNN`).\n        If `True`, the returned value of the function will be a tuple containing all outputs. Default: False.\n    -  `forward_arg: tuple`; positional args to be passed when calling the child module instance. Default: None.\n    -  `forward_kw: dict`; KWargs to be passed when calling the child module instance. Default: None.\n    -  `initialization: dict`; Keys are name of parameters to initialize. Values are init specs, which can be \n        a, `str`, a `callable`, or `2-tuple`; See the `spec` argument of `initialize_` for details. Default: None.\n    -  `activation: str or callable or 2-tuple`; See the `spec` argument of `activate`. Default: None.\n    -  `return: Tensor or tuple`; If `tuple_out` is `True`, the returned value will be a `tuple`. 
\"\"\"\n    parent = parent or get_default_parent()\n    if name is None:\n        base_name = base_name or util.camel_to_snake(base_class.__name__)\n        name = _auto_name(base_name, parent)\n    if name not in parent._modules:\n        if infer_kw is not None:\n            shape = in_shape\n            if 'D' in shape:\n                shape = list(shape)\n                shape[shape.index('D')] = 'D'*(x.ndim-len(shape)+1)\n                shape = ''.join(shape)\n            infer_kw = {\n                k:x.shape[shape.find(v) if isinstance(v, str) else v]\n                for k, v in infer_kw.items()}\n        base = base_class(*(base_arg or []), **(infer_kw or {}), **(base_kw or {}), )\n        parent.add_module(name, base)\n        if initialization is not None:\n            s = parent.state_dict()\n            for k, v in initialization.items():\n                initialize_(s[name+'.'+k], v)\n    x = permute(x, in_shape, base_shape)\n    y = parent._modules[name](x, *(forward_arg or []), **(forward_kw or {}))\n    r = []\n    if isinstance(y, tuple):\n        y, *r = y\n    y = permute(y, base_shape, out_shape)\n    y = activate(y, activation)\n    if tuple_out:\n        return (y, *r)\n    return y\n\n\nimport functools\ndef namespace(f):\n    \"\"\" After decoration, the function name and call count will be appended to the `name` kw. \"\"\"\n    @functools.wraps(f)\n    def _wrapped(*arg, **kw):\n        parent = kw.get('parent', get_default_parent())\n        name = kw.get('name', '')\n        name = '_warmns_' + name + ('-' if name else '') + f.__name__\n        name = _auto_name(name, parent)\n        kw['name'] = name.replace('_warmns_', '')\n        return f(*arg, **kw)\n    return _wrapped\n"
  },
  {
    "path": "warm/functional.py",
    "content": "# 08-27-2019;\n\"\"\"\nWraps around various torch.nn Modules to fit into a functional interface.\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nfrom warm import engine\nfrom warm import util\n\n\npermute = engine.permute\n\n\ndef conv(x, size, kernel, init_weight=None, init_bias=None, bias=True, **kw):\n    \"\"\" Convolution layer.\\n\n    -  `x: Tensor`; With shape `(Batch, Channel, *)` where `*` Can be 1d or 2d or 3d.\n        If 3d, shapes are `(Batch, Channel, Length)`.\n        If 4d, shapes are `(Batch, Channel, Height, Width)`.\n        If 5d, shapes are `(Batch, Channel, Depth, Height, Width)`.\n    -  `size: int`; Size of hidden filters, and size of the output channel.\n    -  `kernel: int or tuple`; Size of the convolution kernel.\n    -  `init_weight: None or str or callable`; Initialization specification for the weight tensor.\n        If a `str`, should be one of the nonlinearity functions contained in `torch.nn.init`.\n        If a `callable`, it will be applied to `x` directly, i.e. `spec(x)`. If a 2-`tuple`,\n        it must be of format `(callable, kwargs)`, i.e. `callable(x, **kwargs)`.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.ConvNd`s default scheme.\n    -  `init_bias: None or str or callable`; Same as `init_weight`, but for the bias tensor.\n    -  `bias: bool`; If `True`, adds a learnable bias to the output. Default: `True`.\n    -  `**kw:dict`; Any additional KWargs are passed down to `torch.nn.ConvNd`, where N can be 1, 2 or 3.\n        as well as `warm.engine.forward`. Refer to their docs for details. Some of the additional ConvNd arguments:\n        `stride, padding, dilation, groups`.\n    -  `return: Tensor`; With shape `(Batch, Size, *)` where `*` can be 1d, 2d, 3d that depends on `x`. \"\"\"\n    d = x.ndim-3\n    assert d in [0, 1, 2], 'Incompatible number of dims for input x.'\n    inferred_kw = dict(\n        base_name='conv',\n        base_class=[nn.Conv1d, nn.Conv2d, nn.Conv3d][d],\n        base_kw={\n            'out_channels':size,\n            'kernel_size':kernel,\n            'bias':bias,\n            **engine.unused_kwargs(kw), },\n        infer_kw={'in_channels':'C'},\n        initialization={'weight':init_weight, **({'bias':init_bias} if bias else {})}, )\n    return engine.forward(x, **{**inferred_kw, **kw})\n\n\ndef linear(x, size, init_weight=None, init_bias=None, bias=True, **kw):\n    \"\"\" Linear transformation layer.\\n\n    -  `x: Tensor`; 2d or more, with shapes `(Batch, Channel, *)` where `*` means any number of additional dimensions.\n    -  `size: int`; Size of hidden features, and size of the output channel.\n    -  `init_weight: None or str or callable`; Initialization specification for the weight tensor.\n        If a `str`, should be one of the nonlinearity functions contained in `torch.nn.init`.\n        If a `callable`, it will be applied to `x` directly, i.e. `spec(x)`. If a 2-`tuple`,\n        it must be of format `(callable, kwargs)`, i.e. `callable(x, **kwargs)`.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.Linear`s default scheme.\n    -  `init_bias: None or str or callable`; Same as `init_weight`, but for the bias tensor.\n    -  `bias: bool`; If `True`, adds a learnable bias to the output. Default: `True`.\n    -  `**kw:dict`; Any additional KWargs are passed down to `warm.engine.forward`. 
Refer to its docs for details.\n    -  `return: Tensor`; With shape `(Batch, Size, *)` where `*` can be 1d, 2d, 3d that depends on `x`. \"\"\"\n    inferred_kw = dict(\n        base_name='linear',\n        base_class=nn.Linear,\n        base_kw={'out_features':size, 'bias':bias},\n        base_shape='BDC',\n        infer_kw={'in_features':'C'},\n        initialization={'weight':init_weight, **({'bias':init_bias} if bias else {})}, )\n    return engine.forward(x, **{**inferred_kw, **kw})\n\n\ndef batch_norm(x, **kw):\n    \"\"\" Batch Normalization layer.\\n\n    -  `x: Tensor`; 2d or more, with shapes `(Batch, Channel, *)` where `*` means any number of additional dimensions.\n    -  `**kw: dict`; Any additional KWargs are passed down to `torch.nn.BatchNormNd`, where N can be 1, 2 or 3.\n        as well as `warm.engine.forward`. Refer to their docs for details. Some of the additional BatchNorm arguments:\n        `eps, momentum, affine, track_running_stats`.\n    -  `return: Tensor`; Same shape as input  `x`. \"\"\"\n    d = x.ndim-3\n    assert d in [0, 1, 2], 'Incompatible number of dims for input x.'\n    inferred_kw = dict(\n        base_name='batch_norm',\n        base_class=[nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d][d],\n        base_kw={'num_features':x.shape[1]}, )\n    return engine.forward(x, **{**inferred_kw, **kw})\n\n\ndef lstm(x, size,\n        init_weight_hh='orthogonal_', init_weight_ih=None, init_bias_hh=None, init_bias_ih=None,\n        bias=True, num_layers=1, **kw):\n    \"\"\" Long Short Term Memory layer.\\n\n    -  `x: Tensor or tuple`; If tuple, must be of format `(x, (h_0, c_0))`, where `x` is a 3d tensor,\n        with shapes `(Batch, Channel, Length)`.\n    -  `size: int`; Size of hidden features, and size of the output channel.\n    -  `init_weight_hh: None or str or callable`; Initialization specification for the hidden-hidden weight tensor.\n        If a `str`, should be one of the nonlinearity functions contained in `torch.nn.init`.\n        If a `callable`, it will be applied to `x` directly, i.e. `spec(x)`. If a 2-`tuple`,\n        it must be of format `(callable, kwargs)`, i.e. `callable(x, **kwargs)`.\n        Default: `'orthogonal_'`.\n    -  `init_weight_ih: None or str or callable`; Initialization specification for the input-hidden weight tensor.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.LSTM`s default scheme.\n    -  `init_bias_hh: None or str or callable`; Initialization specification for the hidden-hidden bias tensor.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.LSTM`s default scheme.\n    -  `init_bias_ih: None or str or callable`; Initialization specification for the input-hidden bias tensor.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.LSTM`s default scheme.\n    -  `bias: bool`; If `False`, then the layer does not use `bias_ih` and `bias_hh`. Default: `True`.\n    -  `num_layers: int`; Number of the recurrent layers. Default: 1.\n    -  `tuple_out: bool`; If `True`, the returned value will be a tuple `(out, (h_n, c_n))`. Default: False.\n    -  `**kw: dict`; Any additional KWargs are passed down to `torch.nn.LSTM`, as well as `warm.engine.forward`.\n        Refer to their docs for details. 
Some of the additional LSTM arguments: `dropout, bidirectional, batch_first`.\n    -  `return: Tensor or tuple`; If `tuple_out` is set to true, will return `(out, (h_n, c_n))`, otherwise just `out`.\n        `out` has shape `(Batch, Size*Directions, Length)`,\n            where Directions = 2 if `bidirectional` else 1.\n        `h_n` is the hidden states with shape `(num_layers*Directions, Batch, Size)`.\n        `c_n` is the cell states with shape `(num_layers*Directions, Batch, Size)`. \"\"\"\n    states = None\n    if isinstance(x, tuple):\n        x, *states = x\n    init = dict(\n        weight_hh=init_weight_hh,\n        weight_ih=init_weight_ih,\n        bias_hh=init_bias_hh,\n        bias_ih=init_bias_ih, )\n    inferred_kw = dict(\n        base_name='lstm',\n        base_class=nn.LSTM,\n        base_kw={\n            'hidden_size':size,\n            'num_layers':num_layers,\n            **engine.unused_kwargs(kw), },\n        base_shape='DBC',\n        infer_kw={'input_size':'C'},\n        forward_arg=states,\n        initialization={\n            f'{k}_l{l}':init[k] for k in ['weight_hh', 'weight_ih']+(['bias_hh', 'bias_ih'] if bias else [])\n            for l in range(num_layers)}, )\n    return engine.forward(x, **{**inferred_kw, **kw})\n\n\ndef gru(*arg, **kw):\n    \"\"\" Gated Recurrent Unit layer.\\n\n    -  `x: Tensor or tuple`; If tuple, must be of format `(x, h_0)`, where `x` is a 3d tensor,\n        with shapes `(Batch, Channel, Length)`.\n    -  `size: int`; Size of hidden features, and size of the output channel.\n    -  `init_weight_hh: None or str or callable`; Initialization specification for the hidden-hidden weight tensor.\n        If a `str`, should be one of the nonlinearity functions contained in `torch.nn.init`.\n        If a `callable`, it will be applied to `x` directly, i.e. `spec(x)`. If a 2-`tuple`,\n        it must be of format `(callable, kwargs)`, i.e. `callable(x, **kwargs)`.\n        Default: `'orthogonal_'`.\n    -  `init_weight_ih: None or str or callable`; Initialization specification for the input-hidden weight tensor.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.GRU`s default scheme.\n    -  `init_bias_hh: None or str or callable`; Initialization specification for the hidden-hidden bias tensor.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.GRU`s default scheme.\n    -  `init_bias_ih: None or str or callable`; Initialization specification for the input-hidden bias tensor.\n        Default: `None`, and the weight tensor is initialized using `torch.nn.GRU`s default scheme.\n    -  `bias: bool`; If `False`, then the layer does not use `bias_ih` and `bias_hh`. Default: `True`.\n    -  `num_layers: int`; Number of the recurrent layers. Default: 1.\n    -  `tuple_out: bool`; If `True`, the returned value will be a tuple `(out, h_n)`. Default: False.\n    -  `**kw: dict`; Any additional KWargs are passed down to `torch.nn.GRU`, as well as `warm.engine.forward`.\n        Refer to their docs for details. 
Some of the additional GRU arguments: `dropout, bidirectional, batch_first`.\n    -  `return: Tensor or tuple`; If `tuple_out` is set to true, will return `(out, h_n)`, otherwise just `out`.\n        `out` has shape `(Batch, Size*Directions, Length)`,\n            where Directions = 2 if `bidirectional` else 1.\n        `h_n` is the hidden states with shape `(num_layers*Directions, Batch, Size)`. \"\"\"\n    return lstm(*arg, base_name='gru', base_class=nn.GRU, **kw)\n\n\ndef identity(x, *arg, **kw):\n    \"\"\" Identity layer that returns the first input and ignores the rest of the arguments. \"\"\"\n    return x\n\n\ndef dropout(x, rate=0.5, by_channel=False, **kw):\n    \"\"\" Dropout layer.\\n\n    During training, randomly zeros part of input tensor `x`, at probability `rate`.\\n\n    -  `x: Tensor`; Can be of any shape if `by_channel` is false, or 2d and up if `by_channel` is true.\n    -  `rate: float`; The probability of dropout. Default 0.5.\n    -  `by_channel: bool`; If true, will dropout entire channels (all `'D'` dimensions will be 0 if x is `'BCD'`).\n        `by_channel` true requires `x` to be 2d or more.\n    -  `inplace: bool`; If true, the operation will be in-place and the input `x` will be altered.\n    -  `return: Tensor`; Same shape as `x`. \"\"\"\n    inferred_kw = dict(\n        base_name='dropout',\n        base_class=[nn.Dropout, nn.Dropout2d][by_channel],\n        base_kw={'p':rate},\n        base_shape=[None, 'BCD'][by_channel], )\n    return engine.forward(x, **{**inferred_kw, **kw})\n\n\ndef transformer(x, y=None, num_encoder=6, num_decoder=6, num_head=8,\n        mask=None, causal=False, in_shape='BCD', **kw):\n    \"\"\" Transformer layer.\\n\n    This layer covers functionality of `Transformer`, `TransformerEncoder`, and `TransformerDecoder`.\n    See [`torch.nn.Transformer`](https://pytorch.org/docs/stable/nn.html#transformer) for more details.\\n\n    -  `x: Tensor`; The source sequence, with shape `(Batch, Channel, LengthX)`.\n        `Channel` is usually from embedding.\n    -  `y: None or Tensor`; The target sequence. Also with shape `(Batch, Channel, LengthY)`.\n        If not present, defaults to equal `x`.\n    -  `num_encoder: int`; Number of encoder layers. Set to 0 to disable encoder and use only decoder. Default 6.\n    -  `num_decoder: int`; Number of decoder layers. Set to 0 to disable decoder and use only encoder. Default 6.\n    -  `num_head: int`; Number of heads for multi-headed attention. Default 8.\n    -  `mask: None or dict`; Keys are among: `src_mask`, `tgt_mask`, `memory_mask`,\n        `src_key_padding_mask`, `tgt_key_padding_mask`, `memory_key_padding_mask`.\n        See the `forward` method of `torch.nn.Transformer` for details.\n    -  `causal: bool`; Default false. If true, will add causal masks to source and target, so that\n        current value only depends on the past, not the future, in the sequences.\n    -  `**kw: dict`; Any additional KWargs are passed down to `torch.nn.Transformer`, as well as `warm.engine.forward`.\n    -  `return: Tensor`; Same shape as `y`, if `num_decoder` > 0. Otherwise same shape as `x`. 
\"\"\"\n    def _causal_mask(n):\n        mask = (torch.triu(torch.ones(n, n)) == 1).transpose(0, 1)\n        return mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))\n    if y is None:\n        y = x\n    y = permute(y, in_shape, 'DBC')\n    mask = mask or {}\n    if causal:\n        i = in_shape.find('D')\n        mx = _causal_mask(x.shape[i])\n        mask['src_mask'] = mask.pop('src_mask', 0.0)+mx\n        my = _causal_mask(y.shape[0])\n        mask['tgt_mask'] = mask.pop('tgt_mask', 0.0)+my\n    encoder = identity if num_encoder == 0 else None\n    decoder = identity if num_decoder == 0 else None\n    inferred_kw = dict(\n        base_name='transformer',\n        base_class=nn.Transformer,\n        base_shape='DBC',\n        base_kw=dict(\n            d_model=x.shape[in_shape.find('C')],\n            custom_encoder=encoder,\n            custom_decoder=decoder,\n            nhead=num_head,\n            num_encoder_layers=num_encoder,\n            num_decoder_layers=num_decoder, \n            **engine.unused_kwargs(kw), ),\n        in_shape=in_shape,\n        forward_kw=mask,\n        forward_arg=(y, ), )\n    return engine.forward(x, **{**inferred_kw, **kw})\n\n\ndef layer_norm(x, dim=1, **kw):\n    \"\"\" Layer Normalization.\\n\n    -  `x: Tensor`; Can be of any shape.\n    -  `dim: int or list of int`; Dimensions to be normalized. Default: 1.\n    -  `**kw: dict`; Any additional KWargs are passed down to `torch.nn.LayerNorm`, as well as `warm.engine.forward`.\n    -  `return: Tensor`; Same shape as `x`. \"\"\"\n    if dim != -1:\n        if isinstance(dim, int):\n            dim = [dim]\n        dim_norm = [x.ndim+i if i < 0 else i for i in dim]\n        order = [i for i in range(x.ndim) if i not in dim_norm]+dim_norm\n        x = x.permute(order)\n        norm_shape = x.shape[-len(dim_norm):]\n    else:\n        norm_shape = [x.shape[-1]]\n    inferred_kw = dict(\n        base_name='layer_norm',\n        base_class=nn.LayerNorm,\n        base_kw={'normalized_shape':norm_shape}, )\n    x = engine.forward(x, **{**inferred_kw, **kw})\n    if dim != -1:\n        x = x.permute(np.argsort(order).tolist())\n    return x\n\n\ndef embedding(x, size, vocabulary=None, **kw):\n    \"\"\" Embedding layer.\\n\n    The input is usually a list of indices (integers), and the output is a dense matrix which\n    maps indices to dense vectors. Thus the output will have 1 more dimension than the input.\\n\n    **Note**: The output of this function is always one more dimension than the input. For input with shape `(*)`,\n    The output will be `(*, size)`. Any shape specifications in the KWargs are ignored. \\n\n    -  `x: Tensor`; Contains indices into the vocabulary. Will be converted to `LongTensor` of integers.\n        Can be of any shape.\n    -  `size: int`; The size of embedding vector.\n    -  `vocabulary: int or None`; The size of vocabulary of embedding, or max number of unique indices in `x`.\n        By default it is set to `max(x)-min(x)+1`.\n    -  `**kw: dict`; Any additional KWargs are passed down to `torch.nn.LayerNorm`, as well as `warm.engine.forward`.\n    -  `return: Tensor`; With the embedded dim appended to the shape of x.\n        Thus with shape `(*, Size)`, where `*` is the shape of `x`. 
\"\"\"\n    x = x.type(torch.LongTensor)\n    if vocabulary is None:\n        vocabulary = x.max()-x.min()+1\n    kw.pop('in_shape', None)\n    kw.pop('out_shape', None)\n    kw.pop('base_shape', None)\n    inferred_kw = dict(\n        base_name='embedding',\n        base_class=nn.Embedding,\n        base_kw=dict(\n            num_embeddings=vocabulary,\n            embedding_dim=size,\n            **engine.unused_kwargs(kw), ),\n        base_shape=None,\n        in_shape=None,\n        out_shape=None, )\n    return engine.forward(x, **{**inferred_kw, **kw})\n"
  },
  {
    "path": "warm/module.py",
    "content": "# 08-27-2019;\n\"\"\"\nCustom modules to enhance the nn Sequential experience.\n\nPyWarm's core concept is to use a functional interface to simplify network building.\nHowever, if you still prefer the classical way of defining child modules in `__init__()`,\nPyWarm provides some utilities to help organize child modules better.\n\n- `Lambda` can be used to wrap one line data transformations, like `x.view()`, `x.permute()` etc, into modules.\n\n- `Sequential` is an extension to `nn.Sequential` that better accomodates PyTorch RNNs.\n\n- `Shortcut` is another extension to `nn.Sequential` that will also perform a shortcut addition (AKA residual connection)\nfor the input with output, so that residual blocks can be written in an entire sequential way.\n\nFor example, to define the basic block type for resnet:\n\n\n```Python\nimport torch.nn as nn\nimport warm.module as wm\n\n\ndef basic_block(size_in, size_out, stride=1):\n    block = wm.Shortcut(\n        nn.Conv2d(size_in, size_out, 3, stride, 1, bias=False),\n        nn.BatchNorm2d(size_out),\n        nn.ReLU(),\n        nn.Conv2d(size_out, size_out, 3, 1, 1, bias=False),\n        nn.BatchNorm2d(size_out),\n        projection=wm.Lambda(\n            lambda x: x if x.shape[1] == size_out else nn.Sequential(\n                nn.Conv2d(size_in, size_out, 1, stride, bias=False),\n                nn.BatchNorm2d(size_out), )(x), ), )\n    return block\n```\n\"\"\"\n\n\nimport torch.nn as nn\n\n\nclass Lambda(nn.Module):\n    \"\"\" Wraps a callable and all its call arguments.\\n\n    -  `fn: callable`; The callable being wrapped.\n    -  `*arg: list`; Arguments to be passed to `fn`.\n    -  `**kw: dict`; KWargs to be passed to `fn`. \"\"\"\n    def __init__(self, fn, *arg, **kw):\n        super().__init__()\n        self.fn = fn\n        self.arg = arg\n        self.kw = kw\n    def forward(self, x):\n        \"\"\" forward. \"\"\"\n        return self.fn(x, *self.arg, **self.kw)\n\n\nclass Sequential(nn.Sequential):\n    \"\"\" Similar to `nn.Sequential`, except that child modules can have multiple outputs (e.g. `nn.RNN`).\\n\n    -  `*arg: list of Modules`; Same as `nn.Sequential`. \"\"\"\n    def forward(self, x):\n        \"\"\" forward. \"\"\"\n        for module in self._modules.values():\n            if isinstance(x, tuple):\n                try:\n                    x = module(x)\n                except Exception:\n                    x = module(x[0])\n            else:\n                x = module(x)\n        return x\n\n\nclass Shortcut(Sequential):\n    \"\"\" Similar to `nn.Sequential`, except that it performs a shortcut addition for the input and output.\\n\n    -  `*arg: list of Modules`; Same as `nn.Sequential`.\n    -  `projection: None or callable`; If `None`, input with be added directly to the output.\n        otherwise input will be passed to the `projection` first, usually to make the shapes match. \"\"\"\n    def __init__(self, *arg, projection=None):\n        super().__init__(*arg)\n        self.projection = projection or nn.Identity()\n    def forward(self, x):\n        \"\"\" forward. \"\"\"\n        return super().forward(x)+self.projection(x)\n"
  },
  {
    "path": "warm/util.py",
    "content": "# 08-28-2019;\n\"\"\"\nShort utilities.\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport numpy as np\nimport re\n\n\n\"\"\" Create a property for class torch.Tensor called ndim, for pytorch earlier than 1.2. \"\"\"\nif not hasattr(torch.Tensor, 'ndim'):\n    torch.Tensor.ndim = property(lambda x: x.dim())\n\n\ndef camel_to_snake(name):\n    \"\"\" Convert a camelCaseString to its snake_case_equivalent. \"\"\"\n    s1 = re.sub('(.)([A-Z][a-z]+)', r'\\1_\\2', name)\n    return re.sub('([a-z0-9])([A-Z])', r'\\1_\\2', s1).lower()\n\n\ndef summary_str(model):\n    \"\"\" Get a string representation of model building blocks and parameter counts. \"\"\"\n    indent_list, name_list, count_list = [], [], []\n    def module_info(m, name, indent_level):\n        count_list.append(sum([np.prod(list(p.size())) for p in m.parameters()]))\n        indent_list.append(indent_level)\n        name_list.append(name)\n        for name, child in m.named_children():\n            if name.isdigit():\n                name = child._get_name()\n            module_info(child, name, indent_level+1)\n    module_info(model, model._get_name(), 0)\n    max_indent = max(indent_list)*4\n    max_name = max(len(x) for x in name_list)+max_indent+2\n    max_param = len(str(count_list[0]))+max_name+2\n    out = ['Blocks{:>{w}}'.format('Params', w=max_param-6)]\n    out += ['-'*max_param]\n    for indent, name, param in zip(indent_list, name_list, count_list):\n        s0 = '    '*indent\n        s1 = '{:{w}}'.format(name, w=max_name-len(s0))\n        s2 = '{:>{w}}'.format(param, w=max_param-len(s1)-len(s0))\n        out += [s0+s1+s2]\n    return '\\n'.join(out)\n\n\ndef summary(model):\n    \"\"\" Print a summary about model building blocks and parameter counts. \"\"\"\n    print(summary_str(model))\n"
  }
]