[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nenv/\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\n*.egg-info/\n.installed.cfg\n*.egg\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*,cover\n.hypothesis/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# IPython Notebook\n.ipynb_checkpoints\n\n# pyenv\n.python-version\n\n# celery beat schedule file\ncelerybeat-schedule\n\n# dotenv\n.env\n\n# virtualenv\nvenv/\nENV/\n\n# Spyder project settings\n.spyderproject\n\n# Rope project settings\n.ropeproject\n"
  },
  {
    "path": ".travis.yml",
    "content": "sudo: false\nlanguage: python\ndist: xenial\n\npython:\n  - \"2.7\"\n  - \"3.7\"\n\nenv:\n  - TORCH_VERSION=1.0.1\n  - TORCH_VERSION=0.4.1\n\ncache:\n  apt: true\n  directories:\n    - $HOME/.cache/pip\n\ninstall:\n\n  - wget http://repo.continuum.io/miniconda/Miniconda-3.6.0-Linux-x86_64.sh -O miniconda.sh\n  - bash miniconda.sh -b -p $HOME/miniconda\n  - export PATH=\"$HOME/miniconda/bin:$PATH\"\n  - conda config --set always_yes yes --set changeps1 no\n  - conda update -q conda\n  - hash -r\n  - conda info -a\n  - conda create -q -n testenv python=$TRAVIS_PYTHON_VERSION\n  - source activate testenv\n  - conda install -c pytorch pytorch-cpu=$TORCH_VERSION numpy scipy pytest cython\n\n  # install package\n  - cd pytorch\n  - pip install .\n\nscript:\n  - mkdir empty_dir\n  - pytest pytest -vs --pyargs torchsparseattn\n  - cd .. \n"
  },
  {
    "path": "LICENSE",
    "content": "BSD 3-Clause License\n\nCopyright (c) 2017, Vlad Niculae\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are permitted provided that the following conditions are met:\n\n* Redistributions of source code must retain the above copyright notice, this\n  list of conditions and the following disclaimer.\n\n* Redistributions in binary form must reproduce the above copyright notice,\n  this list of conditions and the following disclaimer in the documentation\n  and/or other materials provided with the distribution.\n\n* Neither the name of the copyright holder nor the names of its\n  contributors may be used to endorse or promote products derived from\n  this software without specific prior written permission.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\nAND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\nIMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\nDISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE\nFOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL\nDAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR\nSERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER\nCAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,\nOR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\nOF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n"
  },
  {
    "path": "README.md",
    "content": "# Sparse and structured attention mechanisms\n[![Build Status](https://travis-ci.org/vene/sparse-structured-attention.svg?branch=master)](https://travis-ci.org/vene/sparse-structured-attention)\n[![PyPI version](https://badge.fury.io/py/torchsparseattn.svg)](https://badge.fury.io/py/torchsparseattn)\n\n<p align=\"center\"><img src=\"fusedmax.png\" /></p>\n\n--------------------------------------------------------------------------------\n\nEfficient implementation of structured sparsity inducing\nattention mechanisms: fusedmax, oscarmax and sparsemax.\n\n**Note**: If you are just looking for sparsemax, I recommend the implementation in the [entmax](https://github.com/deep-spin/entmax).\n\nCurrently available for pytorch >= 0.4.1. (For older versions, use a previous\nrelease of this package.) Requires python >= 2.7, cython, numpy, scipy.\n\nUsage example:\n\n```python\n\nIn [1]: import torch\nIn [2]: import torchsparseattn\nIn [3]: a = torch.tensor([1, 2.1, 1.9], dtype=torch.double)\nIn [4]: lengths = torch.tensor([3])\nIn [5]: fusedmax = torchsparseattn.Fusedmax(alpha=.1)\nIn [6]: fusedmax(a, lengths)\nOut[6]: tensor([0.0000, 0.5000, 0.5000], dtype=torch.float64)\n```\n\nFor details, check out our paper:\n\n> Vlad Niculae and Mathieu Blondel\n> A Regularized Framework for Sparse and Structured Neural Attention\n> In: Proceedings of NIPS, 2017. \n> https://arxiv.org/abs/1705.07704 \n\nSee also:\n\n> André F. T. Martins and Ramón Fernandez Astudillo\n> From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification\n> In: Proceedings of ICML, 2016\n> https://arxiv.org/abs/1602.02068\n\n> X. Zeng and M. Figueiredo,\n> The ordered weighted L1 norm: Atomic formulation, dual norm, and projections.\n> eprint http://arxiv.org/abs/1409.4271\n\n"
  },
  {
    "path": "pytorch/MANIFEST.in",
    "content": "include MANIFEST.in\nrecursive-include torchsparseattn *.c *.h *.cpp *.pyx *.pxd\n"
  },
  {
    "path": "pytorch/setup.py",
    "content": "import numpy\nfrom setuptools import setup, find_packages, Extension\n\nfrom Cython.Build import cythonize\n\nextensions = [\n    Extension('torchsparseattn._isotonic',\n              [\"torchsparseattn/_isotonic.pyx\"],\n              include_dirs=[numpy.get_include()]),\n    Extension('torchsparseattn._fused',\n              [\"torchsparseattn/_fused.pyx\"],\n              include_dirs=[numpy.get_include()]),\n    Extension('torchsparseattn._fused_jv',\n              [\"torchsparseattn/_fused_jv.pyx\"]),\n]\n\nextensions = cythonize(extensions)\n\n\nsetup(name=\"torchsparseattn\",\n      version=\"0.3.dev0\",\n      description=\"Sparse structured attention mechanisms for pytorch\",\n      author=\"Vlad Niculae\",\n      author_email=\"vlad@vene.ro\",\n      license=\"BSD 3-clause\",\n      packages=find_packages(),\n      ext_modules=extensions,\n      install_requires=['numpy'],\n      zip_safe=False,\n      classifiers=[\n          'Intended Audience :: Science/Research',\n          'Intended Audience :: Developers', 'License :: OSI Approved',\n          'Programming Language :: C', 'Programming Language :: Python',\n          'Topic :: Software Development',\n          'Topic :: Scientific/Engineering',\n          'Operating System :: Microsoft :: Windows',\n          'Operating System :: POSIX', 'Operating System :: Unix',\n          'Operating System :: MacOS']\n)\n"
  },
  {
    "path": "pytorch/torchsparseattn/__init__.py",
    "content": "from .fused import Fusedmax, FusedProxFunction\nfrom .oscar import Oscarmax, OscarProxFunction\nfrom .sparsemax import Sparsemax, SparsemaxFunction\n\n__version__ = __VERSION__ = '0.3.dev0'\n"
  },
  {
    "path": "pytorch/torchsparseattn/_fused.pyx",
    "content": "# encoding: utf-8\n# cython: cdivision=True\n# cython: boundscheck=False\n# cython: wraparound=False\n#\n# Authors: Fabian Pedregosa\n# Bundled file from lightning library\n\n\"\"\"\nThese are some helper functions to compute the proximal operator of some common penalties\n\"\"\"\n\ncimport numpy as np\nfrom cython cimport floating\n\ncpdef prox_tv1d(np.ndarray[ndim=1, dtype=floating] w, floating stepsize):\n    \"\"\"\n    Computes the proximal operator of the 1-dimensional total variation operator.\n\n    This solves a problem of the form\n\n         argmin_x TV(x) + (1/(2 stepsize)) ||x - w||^2\n\n    where TV(x) is the one-dimensional total variation\n\n    Parameters\n    ----------\n    w: array\n        vector of coefficieents\n    stepsize: float\n        step size (sometimes denoted gamma) in proximal objective function\n\n    References\n    ----------\n    Condat, Laurent. \"A direct algorithm for 1D total variation denoising.\"\n    IEEE Signal Processing Letters (2013)\n    \"\"\"\n    cdef long width, k, k0, kplus, kminus\n    cdef floating umin, umax, vmin, vmax, twolambda, minlambda\n    width = w.size\n\n    # /to avoid invalid memory access to input[0] and invalid lambda values\n    if width > 0 and stepsize >= 0:\n        k, k0 = 0, 0\t\t\t# k: current sample location, k0: beginning of current segment\n        umin = stepsize  # u is the dual variable\n        umax = - stepsize\n        vmin = w[0] - stepsize\n        vmax = w[0] + stepsize\t  # bounds for the segment's value\n        kplus = 0\n        kminus = 0 \t# last positions where umax=-lambda, umin=lambda, respectively\n        twolambda = 2.0 * stepsize  # auxiliary variable\n        minlambda = -stepsize\t\t# auxiliary variable\n        while True:\t\t\t\t# simple loop, the exit test is inside\n            while k >= width-1: \t# we use the right boundary condition\n                if umin < 0.0:\t\t\t# vmin is too high -> negative jump necessary\n                    while True:\n                        w[k0] = vmin\n                        k0 += 1\n                        if k0 > kminus:\n                            break\n                    k = k0\n                    kminus = k\n                    vmin = w[kminus]\n                    umin = stepsize\n                    umax = vmin + umin - vmax\n                elif umax > 0.0:    # vmax is too low -> positive jump necessary\n                    while True:\n                        w[k0] = vmax\n                        k0 += 1\n                        if k0 > kplus:\n                            break\n                    k = k0\n                    kplus = k\n                    vmax = w[kplus]\n                    umax = minlambda\n                    umin = vmax + umax -vmin\n                else:\n                    vmin += umin / (k-k0+1)\n                    while True:\n                        w[k0] = vmin\n                        k0 += 1\n                        if k0 > k:\n                            break\n                    return\n            umin += w[k + 1] - vmin\n            if umin < minlambda:       # negative jump necessary\n                while True:\n                    w[k0] = vmin\n                    k0 += 1\n                    if k0 > kminus:\n                        break\n                k = k0\n                kminus = k\n                kplus = kminus\n                vmin = w[kplus]\n                vmax = vmin + twolambda\n                umin = stepsize\n                umax = minlambda\n            
else:\n                umax += w[k + 1] - vmax\n                if umax > stepsize:\n                    while True:\n                        w[k0] = vmax\n                        k0 += 1\n                        if k0 > kplus:\n                            break\n                    k = k0\n                    kminus = k\n                    kplus = kminus\n                    vmax = w[kplus]\n                    vmin = vmax - twolambda\n                    umin = stepsize\n                    umax = minlambda\n                else:                   # no jump necessary, we continue\n                    k += 1\n                    if umin >= stepsize:\t\t# update of vmin\n                        kminus = k\n                        vmin += (umin - stepsize) / (kminus - k0 + 1)\n                        umin = stepsize\n                    if umax <= minlambda:\t    # update of vmax\n                        kplus = k\n                        vmax += (umax + stepsize) / (kplus - k0 + 1)\n                        umax = minlambda\n"
  },
  {
    "path": "pytorch/torchsparseattn/_fused_jv.pyx",
    "content": "cimport cython\nfrom cython cimport floating\n\n\n@cython.boundscheck(False)\n@cython.wraparound(False)\n@cython.cdivision(True)\ndef _inplace_fused_prox_jv(floating[::1] y_hat, floating[::1] dout):\n    cdef Py_ssize_t n_features = dout.shape[0]\n    cdef Py_ssize_t i, last_ix\n    cdef unsigned int n\n    cdef floating acc\n    for i in range(n_features + 1):\n        if i in (0, n_features) or y_hat[i] != y_hat[i - 1]:\n            if i > 0:\n                dout[last_ix:i] = acc / n\n\n            if i < n_features:\n                last_ix = i\n                acc = dout[i]\n                n = 1\n\n        else:\n            acc += dout[i]\n            n += 1\n    return dout\n\n\n\n"
  },
  {
    "path": "pytorch/torchsparseattn/_isotonic.pyx",
    "content": "# Author: Nelle Varoquaux, Andrew Tulloch, Antony Lee\n\n# Uses the pool adjacent violators algorithm (PAVA), with the\n# enhancement of searching for the longest decreasing subsequence to\n# pool at each step.\n\nimport numpy as np\ncimport numpy as np\ncimport cython\nfrom cython cimport floating\n\n\n@cython.boundscheck(False)\n@cython.wraparound(False)\n@cython.cdivision(True)\ndef _inplace_contiguous_isotonic_regression(floating[::1] y, floating[::1] w):\n    cdef:\n        Py_ssize_t n = y.shape[0], i, k\n        floating prev_y, sum_wy, sum_w\n        Py_ssize_t[::1] target = np.arange(n, dtype=np.intp)\n\n    # target describes a list of blocks.  At any time, if [i..j] (inclusive) is\n    # an active block, then target[i] := j and target[j] := i.\n\n    # For \"active\" indices (block starts):\n    # w[i] := sum{w_orig[j], j=[i..target[i]]}\n    # y[i] := sum{y_orig[j]*w_orig[j], j=[i..target[i]]} / w[i]\n\n    with nogil:\n        i = 0\n        while i < n:\n            k = target[i] + 1\n            if k == n:\n                break\n            if y[i] < y[k]:\n                i = k\n                continue\n            sum_wy = w[i] * y[i]\n            sum_w = w[i]\n            while True:\n                # We are within a decreasing subsequence.\n                prev_y = y[k]\n                sum_wy += w[k] * y[k]\n                sum_w += w[k]\n                k = target[k] + 1\n                if k == n or prev_y < y[k]:\n                    # Non-singleton decreasing subsequence is finished,\n                    # update first entry.\n                    y[i] = sum_wy / sum_w\n                    w[i] = sum_w\n                    target[i] = k - 1\n                    target[k - 1] = i\n                    if i > 0:\n                        # Backtrack if we can.  
This makes the algorithm\n                        # single-pass and ensures O(n) complexity.\n                        i = target[i - 1]\n                    # Otherwise, restart from the same point.\n                    break\n        # Reconstruct the solution.\n        i = 0\n        while i < n:\n            k = target[i] + 1\n            y[i + 1 : k] = y[i]\n            i = k\n\n\n@cython.boundscheck(False)\n@cython.wraparound(False)\n@cython.cdivision(True)\ndef _make_unique(np.ndarray[dtype=floating] X,\n                 np.ndarray[dtype=floating] y,\n                 np.ndarray[dtype=floating] sample_weights):\n    \"\"\"Average targets for duplicate X, drop duplicates.\n\n    Aggregates duplicate X values into a single X value where\n    the target y is a (sample_weighted) average of the individual\n    targets.\n\n    Assumes that X is ordered, so that all duplicates follow each other.\n    \"\"\"\n    unique_values = len(np.unique(X))\n    if unique_values == len(X):\n        return X, y, sample_weights\n    cdef np.ndarray[dtype=floating] y_out = np.empty(unique_values)\n    cdef np.ndarray[dtype=floating] x_out = np.empty(unique_values)\n    cdef np.ndarray[dtype=floating] weights_out = np.empty(unique_values)\n\n    cdef floating current_x = X[0]\n    cdef floating current_y = 0\n    cdef floating current_weight = 0\n    cdef floating y_old = 0\n    cdef int i = 0\n    cdef int current_count = 0\n    cdef int j\n    cdef floating x\n    cdef int n_samples = len(X)\n    for j in range(n_samples):\n        x = X[j]\n        if x != current_x:\n            # next unique value\n            x_out[i] = current_x\n            weights_out[i] = current_weight / current_count\n            y_out[i] = current_y / current_weight\n            i += 1\n            current_x = x\n            current_weight = sample_weights[j]\n            current_y = y[j] * sample_weights[j]\n            current_count = 1\n        else:\n            current_weight += sample_weights[j]\n            current_y += y[j] * sample_weights[j]\n            current_count += 1\n\n    x_out[i] = current_x\n    weights_out[i] = current_weight / current_count\n    y_out[i] = current_y / current_weight\n    return x_out, y_out, weights_out\n"
  },
  {
    "path": "pytorch/torchsparseattn/base.py",
    "content": "from torch import nn\nfrom torch import autograd as ta\n\nclass _BaseBatchProjection(ta.Function):\n    \"\"\"Applies a sample-wise normalizing projection over a batch.\"\"\"\n\n    def forward(self, x, lengths=None):\n\n        requires_squeeze = False\n        if x.dim() == 1:\n            x = x.unsqueeze(0)\n            requires_squeeze = True\n\n        n_samples, max_dim = x.size()\n\n        has_lengths = True\n        if lengths is None:\n            has_lengths = False\n            lengths = [max_dim] * n_samples\n\n        y_star = x.new()\n        y_star.resize_as_(x)\n        y_star.zero_()\n\n        for i in range(n_samples):\n            y_star[i, :lengths[i]] = self.project(x[i, :lengths[i]])\n\n        if requires_squeeze:\n            y_star = y_star.squeeze()\n\n        self.mark_non_differentiable(y_star)\n        if has_lengths:\n            self.mark_non_differentiable(lengths)\n            self.save_for_backward(y_star, lengths)\n        else:\n            self.save_for_backward(y_star)\n\n        return y_star\n\n    def backward(self, dout):\n\n        if not self.needs_input_grad[0]:\n            return None\n\n        if len(self.needs_input_grad) > 1 and self.needs_input_grad[1]:\n            raise ValueError(\"Cannot differentiate {} w.r.t. the \"\n                             \"sequence lengths\".format(self.__name__))\n\n        saved = self.saved_tensors\n        if len(saved) == 2:\n            y_star, lengths = saved\n        else:\n            y_star, = saved\n            lengths = None\n\n        requires_squeeze = False\n        if y_star.dim() == 1:\n            y_star = y_star.unsqueeze(0)\n            dout = dout.unsqueeze(0)\n            requires_squeeze = True\n\n        n_samples, max_dim = y_star.size()\n        din = dout.new()\n        din.resize_as_(y_star)\n        din.zero_()\n\n        if lengths is None:\n            lengths = [max_dim] * n_samples\n\n        for i in range(n_samples):\n            din[i, :lengths[i]] = self.project_jv(dout[i, :lengths[i]],\n                                                  y_star[i, :lengths[i]])\n\n        if requires_squeeze:\n            din = din.squeeze()\n\n        return din, None\n"
  },
  {
    "path": "pytorch/torchsparseattn/fused.py",
    "content": "\"\"\"Fusedmax attention\n\nClusters neighboring attention weights into groups with equal weight.\n\nA Regularized Framework for Sparse and Structured Neural Attention\nVlad Niculae, Mathieu Blondel\nhttps://arxiv.org/abs/1705.07704\n\"\"\"\n\nfrom __future__ import division\n\nimport torch\nfrom torch import nn\nfrom torch import autograd as ta\nimport warnings\n\nfrom .base import _BaseBatchProjection\nfrom .sparsemax import SparsemaxFunction\nfrom ._fused import prox_tv1d\n\n\ndef _inplace_fused_prox_jv_slow(y_hat, dout):\n    \"\"\"not efficient in python for long seqs, but template for a cython impl\"\"\"\n\n    n_features = len(dout)\n\n    for i in range(n_features + 1):\n        if i in (0, n_features) or y_hat[i] != y_hat[i - 1]:\n            if i > 0:\n                dout[last_ix:i] = acc / n\n\n            if i < n_features:\n                last_ix = i\n                acc = dout[i]\n                n = 1\n        else:\n            acc += dout[i]\n            n += 1\n    return dout\n\n\ntry:\n    from ._fused_jv import _inplace_fused_prox_jv\nexcept ImportError:\n    warnings.warn(\"Could not import cython implementation of fused backward \"\n                  \"pass. Slow implementation used instead.\")\n    _inplace_fused_prox_jv = _inplace_fused_prox_jv_slow\n\n\ndef fused_prox_jv_slow(y_hat, dout):\n    dout = dout.clone()\n    _inplace_fused_prox_jv_slow(y_hat, dout)\n    return dout\n\n\ndef fused_prox_jv_fast(y_hat, dout):\n    dout = dout.clone()\n    _inplace_fused_prox_jv(y_hat.detach().numpy(), dout.numpy())\n    return dout\n\n\nclass FusedProxFunction(_BaseBatchProjection):\n\n    def __init__(self, alpha=1):\n        self.alpha = alpha\n\n    def project(self, x):\n        x_np = x.detach().numpy().copy()\n        prox_tv1d(x_np, self.alpha)\n        y_hat = torch.from_numpy(x_np)\n        return y_hat\n\n    def project_jv(self, dout, y_hat):\n        dout = dout.clone()\n        _inplace_fused_prox_jv(y_hat.detach().numpy(), dout.numpy())\n        return dout\n\n\nclass Fusedmax(nn.Module):\n    def __init__(self, alpha=1):\n        self.alpha = alpha\n        super(Fusedmax, self).__init__()\n\n    def forward(self, x, lengths=None):\n        fused_prox = FusedProxFunction(self.alpha)\n        sparsemax = SparsemaxFunction()\n        return sparsemax(fused_prox(x, lengths), lengths)\n\n\nif __name__ == '__main__':\n    from timeit import timeit\n    torch.manual_seed(1)\n\n    for dim in (5, 10, 50, 100, 500, 1000):\n\n        x = torch.randn(dim)\n        x_var = ta.Variable(x, requires_grad=True)\n        y_hat = FusedProxFunction()(x_var).data\n        dout = torch.arange(0, dim)\n        print(\"dimension={}\".format(dim))\n        print(\"slow\", timeit(\"fused_prox_jv_slow(y_hat, dout)\",\n                             globals=globals(),\n                             number=10000))\n        print(\"fast\", timeit(\"fused_prox_jv_fast(y_hat, dout)\",\n                             globals=globals(),\n                             number=10000))\n"
  },
  {
    "path": "pytorch/torchsparseattn/isotonic.py",
    "content": "\"\"\"\nIsotonic Regression that preserves 32bit inputs.\n\nbackported from scikit-learn pull request\nhttps://github.com/scikit-learn/scikit-learn/pull/9106\"\"\"\n\nimport numpy as np\n\nfrom ._isotonic import _inplace_contiguous_isotonic_regression\n\n\ndef isotonic_regression(y, sample_weight=None, y_min=None, y_max=None,\n                        increasing=True):\n    \"\"\"Solve the isotonic regression model::\n\n        min sum w[i] (y[i] - y_[i]) ** 2\n\n        subject to y_min = y_[1] <= y_[2] ... <= y_[n] = y_max\n\n    where:\n        - y[i] are inputs (real numbers)\n        - y_[i] are fitted\n        - w[i] are optional strictly positive weights (default to 1.0)\n\n    Read more in the :ref:`User Guide <isotonic>`.\n\n    Parameters\n    ----------\n    y : iterable of floating-point values\n        The data.\n\n    sample_weight : iterable of floating-point values, optional, default: None\n        Weights on each point of the regression.\n        If None, weight is set to 1 (equal weights).\n\n    y_min : optional, default: None\n        If not None, set the lowest value of the fit to y_min.\n\n    y_max : optional, default: None\n        If not None, set the highest value of the fit to y_max.\n\n    increasing : boolean, optional, default: True\n        Whether to compute ``y_`` is increasing (if set to True) or decreasing\n        (if set to False)\n\n    Returns\n    -------\n    y_ : list of floating-point values\n        Isotonic fit of y.\n\n    References\n    ----------\n    \"Active set algorithms for isotonic regression; A unifying framework\"\n    by Michael J. Best and Nilotpal Chakravarti, section 3.\n    \"\"\"\n    order = np.s_[:] if increasing else np.s_[::-1]\n    # y = as_float_array(y)  # avoid sklearn dependency; we always pass arrays\n    y = np.array(y[order], dtype=y.dtype)\n    if sample_weight is None:\n        sample_weight = np.ones(len(y), dtype=y.dtype)\n    else:\n        sample_weight = np.array(sample_weight[order], dtype=y.dtype)\n\n    _inplace_contiguous_isotonic_regression(y, sample_weight)\n    if y_min is not None or y_max is not None:\n        # Older versions of np.clip don't accept None as a bound, so use np.inf\n        if y_min is None:\n            y_min = -np.inf\n        if y_max is None:\n            y_max = np.inf\n        np.clip(y, y_min, y_max, y)\n    return y[order]\n"
  },
  {
    "path": "pytorch/torchsparseattn/oscar.py",
    "content": "\"\"\"Oscarmax attention\n\nClusters attention weights into groups with equal weight, regardless of index.\n\nA Regularized Framework for Sparse and Structured Neural Attention\nVlad Niculae, Mathieu Blondel\nhttps://arxiv.org/abs/1705.07704\n\"\"\"\n\nimport numpy as np\nimport torch\nfrom torch import nn\nfrom torch import autograd as ta\n\nfrom .isotonic import isotonic_regression\nfrom .base import _BaseBatchProjection\nfrom .sparsemax import SparsemaxFunction\n\n\ndef oscar_prox_jv(y_hat, dout):\n    y_hat = y_hat.detach().numpy()\n    din = dout.clone().zero_()\n    dout = dout.numpy()\n    din_np = din.numpy()\n\n    sign = np.sign(y_hat)\n    y_hat = np.abs(y_hat)\n\n    uniq, inv, counts = np.unique(y_hat, return_inverse=True,\n                                  return_counts=True)\n    n_unique = len(uniq)\n    tmp = np.zeros((n_unique,), dtype=y_hat.dtype)\n    np.add.at(tmp, inv, dout * sign)\n    tmp /= counts\n    tmp.take(inv, mode='clip', out=din_np)\n    din_np *= sign\n    return din\n\n\ndef prox_owl(v, w):\n    \"\"\"Proximal operator of the OWL norm dot(w, reversed(sort(v)))\n\n    Follows description and notation from:\n    X. Zeng, M. Figueiredo,\n    The ordered weighted L1 norm: Atomic formulation, dual norm,\n    and projections.\n    eprint http://arxiv.org/abs/1409.4271\n    \"\"\"\n\n    # wlog operate on absolute values\n    v_abs = np.abs(v)\n    ix = np.argsort(v_abs)[::-1]\n    v_abs = v_abs[ix]\n    # project to K+ (monotone non-negative decreasing cone)\n    v_abs = isotonic_regression(v_abs - w, y_min=0, increasing=False)\n\n    # undo the sorting\n    inv_ix = np.zeros_like(ix)\n    inv_ix[ix] = np.arange(len(v))\n    v_abs = v_abs[inv_ix]\n\n    return np.sign(v) * v_abs\n\n\ndef _oscar_weights(alpha, beta, size):\n    w = np.arange(size - 1, -1, -1, dtype=np.float32)\n    w *= beta\n    w += alpha\n    return w\n\n\nclass OscarProxFunction(_BaseBatchProjection):\n    \"\"\"Proximal operator of the OSCAR regularizer.\n\n    ||w||_oscar = alpha ||w||_1 + beta * sum_i<j max { |w_i|, |w_j| }\n\n    Implemented via the OWL norm with appropriate choice of weights, as\n    described in:\n\n    X. Zeng, M. Figueiredo,\n    The ordered weighted L1 norm: Atomic formulation, dual norm,\n    and projections.\n    eprint http://arxiv.org/abs/1409.4271\n\n    Backward pass is described in:\n    V. Niculae, M. 
Blondel,\n    A Regularized Framework for Sparse and Structured Neural Attention.\n    eprint https://arxiv.org/abs/1705.07704\n    \"\"\"\n\n    def __init__(self, alpha=0, beta=1):\n        self.alpha = alpha\n        self.beta = beta\n\n    def project(self, x):\n        x_np = x.detach().numpy().copy()\n        weights = _oscar_weights(self.alpha, self.beta, x_np.shape[0])\n        y_hat_np = prox_owl(x_np, weights)\n        y_hat = torch.from_numpy(y_hat_np)\n        return y_hat\n\n    def project_jv(self, dout, y_hat):\n        return oscar_prox_jv(y_hat, dout)\n\n\nclass Oscarmax(nn.Module):\n    def __init__(self, beta=1):\n        self.beta = beta\n        super(Oscarmax, self).__init__()\n\n    def forward(self, x, lengths=None):\n        oscar_prox = OscarProxFunction(beta=self.beta)\n        sparsemax = SparsemaxFunction()\n        return sparsemax(oscar_prox(x, lengths), lengths)\n\n\nif __name__ == '__main__':\n    from timeit import timeit\n    torch.manual_seed(1)\n\n    for dim in (5, 10, 50, 100, 500, 1000):\n\n        x = torch.randn(dim)\n        x_var = ta.Variable(x, requires_grad=True)\n\n        def _run_backward(x):\n            y_hat = OscarProxFunction(beta=0.1)(x)\n            val = y_hat.mean()\n            val.backward()\n\n        print(\"dimension={}\".format(dim))\n        print(\"la\", timeit(\"_run_backward(x_var)\",\n                           globals=globals(),\n                           number=10000))\n"
  },
  {
    "path": "pytorch/torchsparseattn/sparsemax.py",
    "content": "# encoding: utf8\n\n\"\"\"\nFrom Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label\nClassification. André F. T. Martins, Ramón Fernandez Astudillo\nIn: Proc. of ICML 2016, https://arxiv.org/abs/1602.02068\n\"\"\"\n\nfrom __future__ import division\n\nimport numpy as np\nimport torch\nfrom torch import nn\nfrom .base import _BaseBatchProjection\n\n\ndef project_simplex(v, z=1):\n    v_sorted, _ = torch.sort(v, dim=0, descending=True)\n    cssv = torch.cumsum(v_sorted, dim=0) - z\n    ind = torch.arange(1, 1 + len(v)).to(dtype=v.dtype)\n    cond = v_sorted - cssv / ind > 0\n    rho = ind.masked_select(cond)[-1]\n    tau = cssv.masked_select(cond)[-1] / rho\n    w = torch.clamp(v - tau, min=0)\n    return w\n\n\ndef sparsemax_grad(dout, w_star):\n    supp = w_star > 0\n    masked = dout.masked_select(supp)\n    nnz = supp.to(dtype=dout.dtype).sum()\n    masked -= masked.sum() / nnz\n    out = dout.new(dout.size()).zero_()\n    out[supp] = masked\n    return(out)\n\n\nclass SparsemaxFunction(_BaseBatchProjection):\n\n    def project(self, x):\n        return project_simplex(x)\n\n    def project_jv(self, dout, y_star):\n        return sparsemax_grad(dout, y_star)\n\n\nclass Sparsemax(nn.Module):\n\n    def forward(self, x, lengths=None):\n        sparsemax = SparsemaxFunction()\n        return sparsemax(x, lengths)\n\n"
  },
  {
    "path": "pytorch/torchsparseattn/test_attention.py",
    "content": "import pytest\n\nimport torch\nfrom torch import nn\nfrom torch.autograd import Variable\n\nfrom . import Sparsemax, Fusedmax, Oscarmax\n\n\nclass AttentionRegressor(nn.Module):\n\n    def __init__(self, projection, n_features=100):\n        super(AttentionRegressor, self).__init__()\n        self.projection = projection\n        self.attn_template = nn.Parameter(torch.Tensor(n_features))\n        self.attn_template.data.uniform_(-0.1, 0.1)\n\n    def forward(self, X, lengths):\n\n        # compute scores for each input word\n        scores = torch.matmul(X, self.attn_template)\n        weights = self.projection(scores, lengths)\n        weighted_avg = torch.bmm(X.transpose(1, 2),\n                                 weights.unsqueeze(-1)).squeeze(-1)\n        pred = weighted_avg.sum(dim=1)  # very simple prediction rule\n        return pred\n\n\n@pytest.mark.parametrize('projection', [Sparsemax(),\n                                        Fusedmax(0.1),\n                                        Oscarmax(0.01)])\ndef test_attention(projection):\n    n_samples = 20\n    max_len = 10\n    torch.manual_seed(1)\n    n_features = 50\n\n    X = torch.zeros(n_samples, max_len, n_features)\n\n    # generate lengths in [1, max_len]\n    lengths = 1 + (torch.rand(n_samples) * max_len).long()\n\n    for i in range(n_samples):\n        X[i, :lengths[i], :] = torch.randn(lengths[i], n_features)\n\n    X = Variable(X)\n    lengths = Variable(lengths)\n    targets = Variable(torch.randn(n_samples))\n\n    regr = AttentionRegressor(projection, n_features=n_features)\n    loss_func = nn.MSELoss()\n    optim = torch.optim.SGD(regr.parameters(), lr=0.0001)\n\n    pred = regr(X, lengths)\n\n    init_obj = loss_func(pred, targets)\n\n    for it in range(50):\n        optim.zero_grad()\n        pred = regr(X, lengths)\n        obj = loss_func(pred, targets)\n        obj.backward()\n        optim.step()\n\n    final_obj = obj\n    assert final_obj < init_obj\n    assert regr.attn_template.grad.size() == (n_features,)\n"
  },
  {
    "path": "pytorch/torchsparseattn/test_fused.py",
    "content": "from __future__ import division\n\nimport pytest\nfrom numpy.testing import assert_allclose\nimport torch\nfrom torch.autograd import gradcheck, Variable\n\nfrom .fused import fused_prox_jv_slow, fused_prox_jv_fast\nfrom .fused import FusedProxFunction\n\n\ndef _fused_prox_jacobian(y_hat, dout=None):\n    \"\"\"reference naive implementation: construct the jacobian\"\"\"\n    dim = y_hat.shape[0]\n    groups = torch.zeros(dim)\n    J = torch.zeros(dim, dim)\n    current_group = 0\n\n    for i in range(1, dim):\n        if y_hat[i] == y_hat[i - 1]:\n            groups[i] = groups[i - 1]\n        else:\n            current_group += 1\n            groups[i] = current_group\n\n    for i in range(dim):\n        for j in range(dim):\n            if groups[i] == groups[j]:\n                n_fused = (groups == groups[i]).sum()\n                J[i, j] = 1 / n_fused.to(y_hat.dtype)\n\n    if dout is not None:\n        return torch.mv(J, dout)\n    else:\n        return J\n\n\n@pytest.mark.parametrize('alpha', [0.001, 0.01, 0.1, 1])\ndef test_jv(alpha):\n\n    torch.manual_seed(1)\n    torch.set_default_tensor_type('torch.DoubleTensor')\n\n    for _ in range(30):\n        x = Variable(torch.randn(15))\n        dout = torch.randn(15)\n\n        y_hat = FusedProxFunction(alpha=alpha)(x).data\n\n\n        ref = _fused_prox_jacobian(y_hat, dout)\n        din_slow = fused_prox_jv_slow(y_hat, dout)\n        din_fast = fused_prox_jv_fast(y_hat, dout)\n        assert_allclose(ref.numpy(), din_slow.numpy(), atol=1e-5)\n        assert_allclose(ref.numpy(), din_fast.numpy(), atol=1e-5)\n\n\n@pytest.mark.parametrize('alpha', [0.001, 0.01, 0.1, 1])\ndef test_finite_diff(alpha):\n    torch.manual_seed(1)\n    torch.set_default_tensor_type('torch.DoubleTensor')\n\n    for _ in range(30):\n        x = Variable(torch.randn(20), requires_grad=True)\n        func = FusedProxFunction(alpha=alpha)\n        assert gradcheck(func, (x,), eps=1e-4, atol=1e-3)\n"
  },
  {
    "path": "pytorch/torchsparseattn/test_oscar.py",
    "content": "from __future__ import division\n\nimport pytest\nfrom numpy.testing import assert_allclose\nimport numpy as np\nimport torch\nfrom torch.autograd import gradcheck, Variable\n\nfrom .oscar import OscarProxFunction, oscar_prox_jv\n\n\ndef _oscar_prox_jacobian(y_star, dout=None):\n    y_star = y_star.numpy()\n    dim = y_star.shape[0]\n    J = torch.zeros(dim, dim)\n\n    _, inv, counts = np.unique(np.abs(y_star),\n                               return_inverse=True,\n                               return_counts=True)\n\n    for i in range(dim):\n        for j in range(dim):\n            if (inv[i] == inv[j] and\n                    y_star[i] != 0):\n                J[i, j] = (np.sign(y_star[i]) * np.sign(y_star[j])\n                           / counts[inv[i]])\n    if dout is not None:\n        return torch.mv(J, dout)\n    else:\n        return J\n\n\n@pytest.mark.parametrize('alpha', [0.001, 0.01, 0.1, 1])\n@pytest.mark.parametrize('beta', [0.001, 0.01, 0.1, 1])\ndef test_jv(alpha, beta):\n\n    torch.manual_seed(1)\n    torch.set_default_tensor_type('torch.DoubleTensor')\n\n    for _ in range(30):\n        x = Variable(torch.randn(15))\n        dout = torch.randn(15)\n        y_hat = OscarProxFunction(alpha=alpha, beta=beta)(x).data\n\n        ref = _oscar_prox_jacobian(y_hat, dout)\n        din = oscar_prox_jv(y_hat, dout)\n        assert_allclose(ref.numpy(), din.numpy(), atol=1e-5)\n\n\n@pytest.mark.parametrize('alpha', [0.001, 0.01, 0.1, 1])\n@pytest.mark.parametrize('beta', [0.001, 0.01, 0.1, 1])\ndef test_finite_diff(alpha, beta):\n    torch.manual_seed(1)\n    torch.set_default_tensor_type('torch.DoubleTensor')\n\n    for _ in range(30):\n        x = Variable(torch.randn(20), requires_grad=True)\n        func = OscarProxFunction(alpha, beta=beta)\n        assert gradcheck(func, (x,), eps=1e-5, atol=1e-3)\n"
  },
  {
    "path": "pytorch/torchsparseattn/test_sparsemax.py",
    "content": "import torch\nfrom torch.autograd import gradcheck, Variable\nfrom .sparsemax import SparsemaxFunction\n\n\ndef test_sparsemax():\n\n    torch.manual_seed(1)\n    torch.set_default_tensor_type('torch.DoubleTensor')\n\n    for _ in range(30):\n        func = SparsemaxFunction()\n        x = Variable(torch.randn(20), requires_grad=True)\n        assert gradcheck(func, (x,), eps=1e-4, atol=1e-3)\n"
  }
]