Repository: lmnt-com/haste
Branch: master
Commit: ceba32ecd373
Files: 102
Total size: 762.7 KB
Directory structure:
gitextract_l70fejx2/
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── Makefile
├── README.md
├── benchmarks/
│ ├── benchmark_gru.cc
│ ├── benchmark_lstm.cc
│ ├── cudnn_wrappers.h
│ └── report.py
├── build/
│ ├── MANIFEST.in
│ ├── common.py
│ ├── setup.pytorch.py
│ └── setup.tf.py
├── docs/
│ ├── pytorch/
│ │ ├── haste_pytorch/
│ │ │ ├── GRU.md
│ │ │ ├── IndRNN.md
│ │ │ ├── LSTM.md
│ │ │ ├── LayerNormGRU.md
│ │ │ └── LayerNormLSTM.md
│ │ └── haste_pytorch.md
│ └── tf/
│ ├── haste_tf/
│ │ ├── GRU.md
│ │ ├── GRUCell.md
│ │ ├── IndRNN.md
│ │ ├── LSTM.md
│ │ ├── LayerNorm.md
│ │ ├── LayerNormGRU.md
│ │ ├── LayerNormGRUCell.md
│ │ ├── LayerNormLSTM.md
│ │ ├── LayerNormLSTMCell.md
│ │ └── ZoneoutWrapper.md
│ └── haste_tf.md
├── examples/
│ ├── device_ptr.h
│ ├── gru.cc
│ └── lstm.cc
├── frameworks/
│ ├── pytorch/
│ │ ├── __init__.py
│ │ ├── base_rnn.py
│ │ ├── gru.cc
│ │ ├── gru.py
│ │ ├── indrnn.cc
│ │ ├── indrnn.py
│ │ ├── layer_norm_gru.cc
│ │ ├── layer_norm_gru.py
│ │ ├── layer_norm_indrnn.cc
│ │ ├── layer_norm_indrnn.py
│ │ ├── layer_norm_lstm.cc
│ │ ├── layer_norm_lstm.py
│ │ ├── lstm.cc
│ │ ├── lstm.py
│ │ ├── support.cc
│ │ └── support.h
│ └── tf/
│ ├── __init__.py
│ ├── arena.h
│ ├── base_rnn.py
│ ├── gru.cc
│ ├── gru.py
│ ├── gru_cell.py
│ ├── indrnn.cc
│ ├── indrnn.py
│ ├── layer_norm.cc
│ ├── layer_norm.py
│ ├── layer_norm_gru.cc
│ ├── layer_norm_gru.py
│ ├── layer_norm_gru_cell.py
│ ├── layer_norm_indrnn.cc
│ ├── layer_norm_indrnn.py
│ ├── layer_norm_lstm.cc
│ ├── layer_norm_lstm.py
│ ├── layer_norm_lstm_cell.py
│ ├── lstm.cc
│ ├── lstm.py
│ ├── support.cc
│ ├── support.h
│ ├── weight_config.py
│ └── zoneout_wrapper.py
├── lib/
│ ├── blas.h
│ ├── device_assert.h
│ ├── gru_backward_gpu.cu.cc
│ ├── gru_forward_gpu.cu.cc
│ ├── haste/
│ │ ├── gru.h
│ │ ├── indrnn.h
│ │ ├── layer_norm.h
│ │ ├── layer_norm_gru.h
│ │ ├── layer_norm_indrnn.h
│ │ ├── layer_norm_lstm.h
│ │ └── lstm.h
│ ├── haste.h
│ ├── indrnn_backward_gpu.cu.cc
│ ├── indrnn_forward_gpu.cu.cc
│ ├── inline_ops.h
│ ├── layer_norm_backward_gpu.cu.cc
│ ├── layer_norm_forward_gpu.cu.cc
│ ├── layer_norm_gru_backward_gpu.cu.cc
│ ├── layer_norm_gru_forward_gpu.cu.cc
│ ├── layer_norm_indrnn_backward_gpu.cu.cc
│ ├── layer_norm_indrnn_forward_gpu.cu.cc
│ ├── layer_norm_lstm_backward_gpu.cu.cc
│ ├── layer_norm_lstm_forward_gpu.cu.cc
│ ├── lstm_backward_gpu.cu.cc
│ └── lstm_forward_gpu.cu.cc
└── validation/
├── pytorch.py
├── pytorch_speed.py
├── tf.py
└── tf_pytorch.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*.a
*.o
*.so
*.whl
benchmark_lstm
benchmark_gru
haste_lstm
haste_gru
================================================
FILE: CHANGELOG.md
================================================
# ChangeLog
## 0.4.0 (2020-04-13)
### Added
- New layer normalized GRU layer (`LayerNormGRU`).
- New IndRNN layer.
- CPU support for all PyTorch layers.
- Support for building PyTorch API on Windows.
- Added `state` argument to PyTorch layers to specify initial state.
- Added weight transforms to TensorFlow API (see docs for details).
- Added `get_weights` method to extract weights from RNN layers (TensorFlow).
- Added `to_native_weights` and `from_native_weights` to PyTorch API for `LSTM` and `GRU` layers.
- Validation tests to check for correctness.
### Changed
- Performance improvements to GRU layer.
- BREAKING CHANGE: PyTorch layers default to CPU instead of GPU.
- BREAKING CHANGE: `h` must not be transposed before passing it to `gru::BackwardPass::Iterate`.
### Fixed
- Multi-GPU training with TensorFlow caused by invalid sharing of `cublasHandle_t`.
## 0.3.0 (2020-03-09)
### Added
- PyTorch support.
- New layer normalized LSTM layer (`LayerNormLSTM`).
- New fused layer normalization layer.
### Fixed
- Occasional uninitialized memory use in TensorFlow LSTM implementation.
## 0.2.0 (2020-02-12)
### Added
- New time-fused API for LSTM (`lstm::ForwardPass::Run`, `lstm::BackwardPass::Run`).
- Benchmarking code to evaluate the performance of an implementation.
### Changed
- Performance improvements to existing iterative LSTM API.
- BREAKING CHANGE: `h` must not be transposed before passing it to `lstm::BackwardPass::Iterate`.
- BREAKING CHANGE: `dv` does not need to be allocated and `v` must be passed instead to `lstm::BackwardPass::Iterate`.
## 0.1.0 (2020-01-29)
### Added
- Initial release of Haste.
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2020 LMNT, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: Makefile
================================================
AR ?= ar
CXX ?= g++
NVCC ?= nvcc -ccbin $(CXX)
PYTHON ?= python
ifeq ($(OS),Windows_NT)
LIBHASTE := haste.lib
CUDA_HOME ?= $(CUDA_PATH)
AR := lib
AR_FLAGS := /nologo /out:$(LIBHASTE)
NVCC_FLAGS := -x cu -Xcompiler "/MD"
else
LIBHASTE := libhaste.a
CUDA_HOME ?= /usr/local/cuda
AR ?= ar
AR_FLAGS := -crv $(LIBHASTE)
NVCC_FLAGS := -std=c++11 -x cu -Xcompiler -fPIC
endif
LOCAL_CFLAGS := -I/usr/include/eigen3 -I$(CUDA_HOME)/include -Ilib -O3
LOCAL_LDFLAGS := -L$(CUDA_HOME)/lib64 -L. -lcudart -lcublas
GPU_ARCH_FLAGS := -gencode arch=compute_37,code=compute_37 -gencode arch=compute_60,code=compute_60 -gencode arch=compute_70,code=compute_70
# Small enough project that we can just recompile all the time.
.PHONY: all haste haste_tf haste_pytorch libhaste_tf examples benchmarks clean
all: haste haste_tf haste_pytorch examples benchmarks
haste:
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/lstm_forward_gpu.cu.cc -o lib/lstm_forward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/lstm_backward_gpu.cu.cc -o lib/lstm_backward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/gru_forward_gpu.cu.cc -o lib/gru_forward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/gru_backward_gpu.cu.cc -o lib/gru_backward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_forward_gpu.cu.cc -o lib/layer_norm_forward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_backward_gpu.cu.cc -o lib/layer_norm_backward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_lstm_forward_gpu.cu.cc -o lib/layer_norm_lstm_forward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_lstm_backward_gpu.cu.cc -o lib/layer_norm_lstm_backward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_gru_forward_gpu.cu.cc -o lib/layer_norm_gru_forward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_gru_backward_gpu.cu.cc -o lib/layer_norm_gru_backward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/indrnn_backward_gpu.cu.cc -o lib/indrnn_backward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/indrnn_forward_gpu.cu.cc -o lib/indrnn_forward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_indrnn_forward_gpu.cu.cc -o lib/layer_norm_indrnn_forward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(NVCC) $(GPU_ARCH_FLAGS) -c lib/layer_norm_indrnn_backward_gpu.cu.cc -o lib/layer_norm_indrnn_backward_gpu.o $(NVCC_FLAGS) $(LOCAL_CFLAGS)
$(AR) $(AR_FLAGS) lib/*.o
libhaste_tf: haste
$(eval TF_CFLAGS := $(shell $(PYTHON) -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))'))
$(eval TF_LDFLAGS := $(shell $(PYTHON) -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))'))
$(CXX) -std=c++11 -c frameworks/tf/lstm.cc -o frameworks/tf/lstm.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -std=c++11 -c frameworks/tf/gru.cc -o frameworks/tf/gru.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -std=c++11 -c frameworks/tf/layer_norm.cc -o frameworks/tf/layer_norm.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -std=c++11 -c frameworks/tf/layer_norm_gru.cc -o frameworks/tf/layer_norm_gru.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -std=c++11 -c frameworks/tf/layer_norm_indrnn.cc -o frameworks/tf/layer_norm_indrnn.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -std=c++11 -c frameworks/tf/layer_norm_lstm.cc -o frameworks/tf/layer_norm_lstm.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -std=c++11 -c frameworks/tf/indrnn.cc -o frameworks/tf/indrnn.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -std=c++11 -c frameworks/tf/support.cc -o frameworks/tf/support.o $(LOCAL_CFLAGS) $(TF_CFLAGS) -fPIC
$(CXX) -shared frameworks/tf/*.o libhaste.a -o frameworks/tf/libhaste_tf.so $(LOCAL_LDFLAGS) $(TF_LDFLAGS) -fPIC
# Dependencies handled by setup.py
haste_tf:
@$(eval TMP := $(shell mktemp -d))
@cp -r . $(TMP)
@cat build/common.py build/setup.tf.py > $(TMP)/setup.py
@(cd $(TMP); $(PYTHON) setup.py -q bdist_wheel)
@cp $(TMP)/dist/*.whl .
@rm -rf $(TMP)
# Dependencies handled by setup.py
haste_pytorch:
@$(eval TMP := $(shell mktemp -d))
@cp -r . $(TMP)
@cat build/common.py build/setup.pytorch.py > $(TMP)/setup.py
@(cd $(TMP); $(PYTHON) setup.py -q bdist_wheel)
@cp $(TMP)/dist/*.whl .
@rm -rf $(TMP)
dist:
@$(eval TMP := $(shell mktemp -d))
@cp -r . $(TMP)
@cp build/MANIFEST.in $(TMP)
@cat build/common.py build/setup.tf.py > $(TMP)/setup.py
@(cd $(TMP); $(PYTHON) setup.py -q sdist)
@cp $(TMP)/dist/*.tar.gz .
@rm -rf $(TMP)
@$(eval TMP := $(shell mktemp -d))
@cp -r . $(TMP)
@cp build/MANIFEST.in $(TMP)
@cat build/common.py build/setup.pytorch.py > $(TMP)/setup.py
@(cd $(TMP); $(PYTHON) setup.py -q sdist)
@cp $(TMP)/dist/*.tar.gz .
@rm -rf $(TMP)
examples: haste
$(CXX) -std=c++11 examples/lstm.cc $(LIBHASTE) $(LOCAL_CFLAGS) $(LOCAL_LDFLAGS) -o haste_lstm -Wno-ignored-attributes
$(CXX) -std=c++11 examples/gru.cc $(LIBHASTE) $(LOCAL_CFLAGS) $(LOCAL_LDFLAGS) -o haste_gru -Wno-ignored-attributes
benchmarks: haste
$(CXX) -std=c++11 benchmarks/benchmark_lstm.cc $(LIBHASTE) $(LOCAL_CFLAGS) $(LOCAL_LDFLAGS) -o benchmark_lstm -Wno-ignored-attributes -lcudnn
$(CXX) -std=c++11 benchmarks/benchmark_gru.cc $(LIBHASTE) $(LOCAL_CFLAGS) $(LOCAL_LDFLAGS) -o benchmark_gru -Wno-ignored-attributes -lcudnn
clean:
rm -fr benchmark_lstm benchmark_gru haste_lstm haste_gru haste_*.whl haste_*.tar.gz
find . \( -iname '*.o' -o -iname '*.so' -o -iname '*.a' -o -iname '*.lib' \) -delete
================================================
FILE: README.md
================================================
<div align="center">
<img src="https://lmnt.com/assets/haste-logo_social_media.png">
</div>
--------------------------------------------------------------------------------
[GitHub release](https://github.com/lmnt-com/haste/releases) [Open in Colab](https://colab.research.google.com/drive/1hzYhcyvbXYMAUwa3515BszSkhx1UUFSt) [License](LICENSE)
**We're hiring!**
If you like what we're building here, [come join us at LMNT](https://explore.lmnt.com).
Haste is a CUDA implementation of fused RNN layers with built-in [DropConnect](http://proceedings.mlr.press/v28/wan13.html) and [Zoneout](https://arxiv.org/abs/1606.01305) regularization. These layers are exposed through C++ and Python APIs for easy integration into your own projects or machine learning frameworks.
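DropConnect drops individual entries of a weight matrix rather than activations; in Haste it is applied to the recurrent weights. A minimal pure-Python sketch of the idea (illustrative only — not Haste's CUDA kernel; the function name and list-of-lists representation are ours):

```python
import random

def dropconnect(weights, rate, rnd=random):
    """DropConnect: zero each weight independently with probability `rate`.

    Unlike dropout (which drops activations), DropConnect drops entries of
    the weight matrix itself. Surviving weights are scaled by 1/(1-rate)
    so the expected value of each weight is unchanged.
    """
    scale = 1.0 / (1.0 - rate)
    return [[0.0 if rnd.random() < rate else w * scale for w in row]
            for row in weights]

# With rate=0.0 nothing is dropped and no rescaling occurs.
print(dropconnect([[1.0, 2.0], [3.0, 4.0]], rate=0.0))
# → [[1.0, 2.0], [3.0, 4.0]]
```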
Which RNN types are supported?
- [GRU](https://en.wikipedia.org/wiki/Gated_recurrent_unit)
- [IndRNN](http://arxiv.org/abs/1803.04831)
- [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory)
- [Layer Normalized GRU](https://arxiv.org/abs/1607.06450)
- [Layer Normalized LSTM](https://arxiv.org/abs/1607.06450)
What's included in this project?
- a standalone C++ API (`libhaste`)
- a TensorFlow Python API (`haste_tf`)
- a PyTorch API (`haste_pytorch`)
- examples for writing your own custom C++ inference / training code using `libhaste`
- benchmarking programs to evaluate the performance of RNN implementations
For questions or feedback about Haste, please open an issue on GitHub or send us an email at [haste@lmnt.com](mailto:haste@lmnt.com).
## Install
Here's what you'll need to get started:
- a [CUDA Compute Capability](https://developer.nvidia.com/cuda-gpus) 3.7+ GPU (required)
- [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) 10.0+ (required)
- [TensorFlow GPU](https://www.tensorflow.org/install/gpu) 1.14+ or 2.0+ for TensorFlow integration (optional)
- [PyTorch](https://pytorch.org) 1.3+ for PyTorch integration (optional)
- [Eigen 3](http://eigen.tuxfamily.org/) to build the C++ examples (optional)
- [cuDNN Developer Library](https://developer.nvidia.com/rdp/cudnn-archive) to build benchmarking programs (optional)
Once you have the prerequisites, you can install with pip or by building the source code.
### Using pip
```
pip install haste_pytorch
pip install haste_tf
```
### Building from source
```
make # Build everything
make haste          # Build C++ API
make haste_tf # Build TensorFlow API
make haste_pytorch # Build PyTorch API
make examples
make benchmarks
```
If you built the TensorFlow or PyTorch API, install it with `pip`:
```
pip install haste_tf-*.whl
pip install haste_pytorch-*.whl
```
If the CUDA Toolkit that you're building against is not in `/usr/local/cuda`, you must specify the
`$CUDA_HOME` environment variable before running make:
```
CUDA_HOME=/usr/local/cuda-10.2 make
```
## Performance
Our LSTM and GRU benchmarks indicate that Haste has the fastest publicly available implementation for nearly all problem sizes. The following charts show our LSTM results, but the GRU results are qualitatively similar.
<table>
<tr><td><img src="https://lmnt.com/assets/haste/benchmark/report_n=16_c=128.png"></td><td><img src="https://lmnt.com/assets/haste/benchmark/report_n=32_c=256.png"></td></tr>
<tr></tr>
<tr><td><img src="https://lmnt.com/assets/haste/benchmark/report_n=64_c=128.png"></td><td><img src="https://lmnt.com/assets/haste/benchmark/report_n=128_c=256.png"></td></tr>
</table>
Here is our complete LSTM benchmark result grid:
<br>
[`N=1 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=64.png)
[`N=1 C=128`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=128.png)
[`N=1 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=256.png)
[`N=1 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=1_c=512.png)
<br>
[`N=32 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=64.png)
[`N=32 C=128`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=128.png)
[`N=32 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=256.png)
[`N=32 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=32_c=512.png)
<br>
[`N=64 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=64.png)
[`N=64 C=128`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=128.png)
[`N=64 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=256.png)
[`N=64 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=64_c=512.png)
<br>
[`N=128 C=64`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=64.png)
[`N=128 C=128`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=128.png)
[`N=128 C=256`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=256.png)
[`N=128 C=512`](https://lmnt.com/assets/haste/benchmark/report_n=128_c=512.png)
## Documentation
### TensorFlow API
```python
import haste_tf as haste
gru_layer = haste.GRU(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)
indrnn_layer = haste.IndRNN(num_units=256, direction='bidirectional', zoneout=0.1)
lstm_layer = haste.LSTM(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)
norm_gru_layer = haste.LayerNormGRU(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)
norm_lstm_layer = haste.LayerNormLSTM(num_units=256, direction='bidirectional', zoneout=0.1, dropout=0.05)
# `x` is a tensor with shape [N,T,C]
x = tf.random.normal([5, 25, 128])
y, state = gru_layer(x, training=True)
y, state = indrnn_layer(x, training=True)
y, state = lstm_layer(x, training=True)
y, state = norm_gru_layer(x, training=True)
y, state = norm_lstm_layer(x, training=True)
```
The TensorFlow Python API is documented in [`docs/tf/haste_tf.md`](docs/tf/haste_tf.md).
### PyTorch API
```python
import torch
import haste_pytorch as haste
gru_layer = haste.GRU(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
indrnn_layer = haste.IndRNN(input_size=128, hidden_size=256, zoneout=0.1)
lstm_layer = haste.LSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
norm_gru_layer = haste.LayerNormGRU(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
norm_lstm_layer = haste.LayerNormLSTM(input_size=128, hidden_size=256, zoneout=0.1, dropout=0.05)
gru_layer.cuda()
indrnn_layer.cuda()
lstm_layer.cuda()
norm_gru_layer.cuda()
norm_lstm_layer.cuda()
# `x` is a CUDA tensor with shape [T,N,C]
x = torch.rand([25, 5, 128]).cuda()
y, state = gru_layer(x)
y, state = indrnn_layer(x)
y, state = lstm_layer(x)
y, state = norm_gru_layer(x)
y, state = norm_lstm_layer(x)
```
The PyTorch API is documented in [`docs/pytorch/haste_pytorch.md`](docs/pytorch/haste_pytorch.md).
### C++ API
The C++ API is documented in [`lib/haste/*.h`](lib/haste/) and there are code samples in [`examples/`](examples/).
## Code layout
- [`benchmarks/`](benchmarks): programs to evaluate performance of RNN implementations
- [`docs/tf/`](docs/tf): API reference documentation for `haste_tf`
- [`docs/pytorch/`](docs/pytorch): API reference documentation for `haste_pytorch`
- [`examples/`](examples): examples for writing your own C++ inference / training code using `libhaste`
- [`frameworks/tf/`](frameworks/tf): TensorFlow Python API and custom op code
- [`frameworks/pytorch/`](frameworks/pytorch): PyTorch API and custom op code
- [`lib/`](lib): CUDA kernels and C++ API
- [`validation/`](validation): scripts to validate output and gradients of RNN layers
## Implementation notes
- the GRU implementation is based on `1406.1078v1` (same as cuDNN) rather than `1406.1078v3`
- Zoneout on LSTM cells is applied to the hidden state only, and not the cell state
- the layer normalized LSTM implementation uses [these equations](https://github.com/lmnt-com/haste/issues/1)
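The zoneout behavior noted above (applied to the hidden state only, never the cell state) can be sketched in pure Python; this illustrates the technique itself, not Haste's CUDA implementation, and the function name is ours:

```python
import random

def zoneout_hidden(h_prev, h_new, rate, training, rnd=random):
    """Zoneout: randomly preserve units of the previous hidden state.

    During training, each unit keeps its old value with probability `rate`;
    at inference, the deterministic expectation (a fixed interpolation
    between old and new states) is used instead.
    """
    if training:
        return [old if rnd.random() < rate else new
                for old, new in zip(h_prev, h_new)]
    # Inference: deterministic blend matching the training-time expectation.
    return [rate * old + (1.0 - rate) * new
            for old, new in zip(h_prev, h_new)]

# At inference with rate=0.5 the output is the midpoint of old and new states.
print(zoneout_hidden([0.0] * 4, [1.0] * 4, rate=0.5, training=False))
# → [0.5, 0.5, 0.5, 0.5]
```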
## References
1. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. _Neural Computation_, _9_(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
1. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. _arXiv:1406.1078 [cs, stat]_. http://arxiv.org/abs/1406.1078.
1. Wan, L., Zeiler, M., Zhang, S., Cun, Y. L., & Fergus, R. (2013). Regularization of Neural Networks using DropConnect. In _International Conference on Machine Learning_ (pp. 1058–1066). Presented at the International Conference on Machine Learning. http://proceedings.mlr.press/v28/wan13.html.
1. Krueger, D., Maharaj, T., Kramár, J., Pezeshki, M., Ballas, N., Ke, N. R., et al. (2017). Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations. _arXiv:1606.01305 [cs]_. http://arxiv.org/abs/1606.01305.
1. Ba, J., Kiros, J.R., & Hinton, G.E. (2016). Layer Normalization. _arXiv:1607.06450 [cs, stat]_. https://arxiv.org/abs/1607.06450.
1. Li, S., Li, W., Cook, C., Zhu, C., & Gao, Y. (2018). Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN. _arXiv:1803.04831 [cs]_. http://arxiv.org/abs/1803.04831.
## Citing this work
To cite this work, please use the following BibTeX entry:
```
@misc{haste2020,
title = {Haste: a fast, simple, and open RNN library},
author = {Sharvil Nanavati},
year = 2020,
month = "Jan",
howpublished = {\url{https://github.com/lmnt-com/haste/}},
}
```
## License
[Apache 2.0](LICENSE)
================================================
FILE: benchmarks/benchmark_gru.cc
================================================
// Copyright 2020 LMNT, Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ==============================================================================
#include <Eigen/Dense>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <functional>
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <cudnn.h>
#include <getopt.h>
#include <iostream>
#include <string>
#include <unsupported/Eigen/CXX11/Tensor>
#include <vector>
#include "../examples/device_ptr.h"
#include "cudnn_wrappers.h"
#include "haste.h"
using haste::v0::gru::BackwardPass;
using haste::v0::gru::ForwardPass;
using std::string;
using Tensor1 = Eigen::Tensor<float, 1>;
using Tensor2 = Eigen::Tensor<float, 2>;
using Tensor3 = Eigen::Tensor<float, 3>;
static constexpr int DEFAULT_SAMPLE_SIZE = 10;
static constexpr int DEFAULT_TIME_STEPS = 50;
static cudnnHandle_t g_cudnn_handle;
static cublasHandle_t g_blas_handle;
float TimeLoop(std::function<void()> fn, int iterations) {
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start);
for (int i = 0; i < iterations; ++i)
fn();
float elapsed_ms;
cudaEventRecord(stop);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsed_ms, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
return elapsed_ms / iterations;
}
float CudnnInference(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& bx,
const Tensor1& br,
const Tensor3& x) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
device_ptr<Tensor3> x_dev(x);
device_ptr<Tensor2> h_dev(batch_size * hidden_size);
device_ptr<Tensor2> c_dev(batch_size * hidden_size);
device_ptr<Tensor3> y_dev(time_steps * batch_size * hidden_size);
device_ptr<Tensor2> h_out_dev(batch_size * hidden_size);
device_ptr<Tensor2> c_out_dev(batch_size * hidden_size);
h_dev.zero();
c_dev.zero();
// Descriptors all the way down. Nice.
RnnDescriptor<float> rnn_descriptor(g_cudnn_handle, hidden_size, CUDNN_GRU);
TensorDescriptorArray<float> x_descriptors(time_steps, { batch_size, input_size, 1 });
TensorDescriptorArray<float> y_descriptors(time_steps, { batch_size, hidden_size, 1 });
auto h_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
auto c_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
auto h_out_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
auto c_out_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
size_t workspace_size;
cudnnGetRNNWorkspaceSize(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&x_descriptors,
&workspace_size);
auto workspace_dev = device_ptr<Tensor1>::NewByteSized(workspace_size);
size_t w_count;
cudnnGetRNNParamsSize(
g_cudnn_handle,
*rnn_descriptor,
*&x_descriptors,  // operator&() yields the descriptor array; dereferencing gives its first element
&w_count,
CUDNN_DATA_FLOAT);
auto w_dev = device_ptr<Tensor1>::NewByteSized(w_count);
FilterDescriptor<float> w_descriptor(w_dev.Size());
float ms = TimeLoop([&]() {
cudnnRNNForwardInference(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&x_descriptors,
x_dev.data,
*h_descriptor,
h_dev.data,
*c_descriptor,
c_dev.data,
*w_descriptor,
w_dev.data,
&y_descriptors,
y_dev.data,
*h_out_descriptor,
h_out_dev.data,
*c_out_descriptor,
c_out_dev.data,
workspace_dev.data,
workspace_size);
}, sample_size);
return ms;
}
float CudnnTrain(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& bx,
const Tensor1& br,
const Tensor3& x,
const Tensor3& dh) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
device_ptr<Tensor3> y_dev(time_steps * batch_size * hidden_size);
device_ptr<Tensor3> dy_dev(time_steps * batch_size * hidden_size);
device_ptr<Tensor2> dhy_dev(batch_size * hidden_size);
device_ptr<Tensor2> dcy_dev(batch_size * hidden_size);
device_ptr<Tensor2> hx_dev(batch_size * hidden_size);
device_ptr<Tensor2> cx_dev(batch_size * hidden_size);
device_ptr<Tensor2> dx_dev(time_steps * batch_size * input_size);
device_ptr<Tensor2> dhx_dev(batch_size * hidden_size);
device_ptr<Tensor2> dcx_dev(batch_size * hidden_size);
RnnDescriptor<float> rnn_descriptor(g_cudnn_handle, hidden_size, CUDNN_GRU);
TensorDescriptorArray<float> y_descriptors(time_steps, { batch_size, hidden_size, 1 });
TensorDescriptorArray<float> dy_descriptors(time_steps, { batch_size, hidden_size, 1 });
TensorDescriptorArray<float> dx_descriptors(time_steps, { batch_size, input_size, 1 });
TensorDescriptor<float> dhy_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> dcy_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> hx_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> cx_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> dhx_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> dcx_descriptor({ 1, batch_size, hidden_size });
size_t workspace_size = 0;
cudnnGetRNNWorkspaceSize(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
&workspace_size);
auto workspace_dev = device_ptr<Tensor1>::NewByteSized(workspace_size);
size_t w_count = 0;
cudnnGetRNNParamsSize(
g_cudnn_handle,
*rnn_descriptor,
*&dx_descriptors,  // operator&() yields the descriptor array; dereferencing gives its first element
&w_count,
CUDNN_DATA_FLOAT);
auto w_dev = device_ptr<Tensor1>::NewByteSized(w_count);
FilterDescriptor<float> w_descriptor(w_dev.Size());
size_t reserve_size = 0;
cudnnGetRNNTrainingReserveSize(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
&reserve_size);
auto reserve_dev = device_ptr<Tensor1>::NewByteSized(reserve_size);
float ms = TimeLoop([&]() {
cudnnRNNForwardTraining(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
dx_dev.data,
*hx_descriptor,
hx_dev.data,
*cx_descriptor,
cx_dev.data,
*w_descriptor,
w_dev.data,
&y_descriptors,
y_dev.data,
*dhy_descriptor,
dhy_dev.data,
*dcy_descriptor,
dcy_dev.data,
workspace_dev.data,
workspace_size,
reserve_dev.data,
reserve_size);
cudnnRNNBackwardData(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&y_descriptors,
y_dev.data,
&dy_descriptors,
dy_dev.data,
*dhy_descriptor,
dhy_dev.data,
*dcy_descriptor,
dcy_dev.data,
*w_descriptor,
w_dev.data,
*hx_descriptor,
hx_dev.data,
*cx_descriptor,
cx_dev.data,
&dx_descriptors,
dx_dev.data,
*dhx_descriptor,
dhx_dev.data,
*dcx_descriptor,
dcx_dev.data,
workspace_dev.data,
workspace_size,
reserve_dev.data,
reserve_size);
cudnnRNNBackwardWeights(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
dx_dev.data,
*hx_descriptor,
hx_dev.data,
&y_descriptors,
y_dev.data,
workspace_dev.data,
workspace_size,
*w_descriptor,
w_dev.data,
reserve_dev.data,
reserve_size);
}, sample_size);
return ms;
}
float HasteInference(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& bx,
const Tensor1& br,
const Tensor3& x) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
// Copy weights over to GPU.
device_ptr<Tensor2> W_dev(W);
device_ptr<Tensor2> R_dev(R);
device_ptr<Tensor1> bx_dev(bx);
device_ptr<Tensor1> br_dev(br);
device_ptr<Tensor3> x_dev(x);
device_ptr<Tensor3> h_dev((time_steps + 1) * batch_size * hidden_size);
device_ptr<Tensor2> tmp_Wx_dev(time_steps * batch_size * hidden_size * 3);
device_ptr<Tensor2> tmp_Rh_dev(batch_size * hidden_size * 3);
h_dev.zero();
// Settle down the GPU and off we go!
cudaDeviceSynchronize();
float ms = TimeLoop([&]() {
ForwardPass<float> forward(
false,
batch_size,
input_size,
hidden_size,
g_blas_handle);
forward.Run(
time_steps,
W_dev.data,
R_dev.data,
bx_dev.data,
br_dev.data,
x_dev.data,
h_dev.data,
nullptr,
tmp_Wx_dev.data,
tmp_Rh_dev.data,
0.0f,
nullptr);
}, sample_size);
return ms;
}
float HasteTrain(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& bx,
const Tensor1& br,
const Tensor3& x,
const Tensor3& dh) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
device_ptr<Tensor2> W_dev(W);
device_ptr<Tensor2> R_dev(R);
device_ptr<Tensor3> x_dev(x);
device_ptr<Tensor3> h_dev((time_steps + 1) * batch_size * hidden_size);
device_ptr<Tensor3> v_dev(time_steps * batch_size * hidden_size * 4);
device_ptr<Tensor2> tmp_Wx_dev(time_steps * batch_size * hidden_size * 3);
device_ptr<Tensor2> tmp_Rh_dev(batch_size * hidden_size * 3);
device_ptr<Tensor2> W_t_dev(W);
device_ptr<Tensor2> R_t_dev(R);
device_ptr<Tensor1> bx_dev(bx);
device_ptr<Tensor1> br_dev(br);
device_ptr<Tensor3> x_t_dev(x);
// These gradients should actually come "from above" but we're just allocating
// a bunch of uninitialized memory and passing it in.
device_ptr<Tensor3> dh_new_dev(dh);
device_ptr<Tensor3> dx_dev(time_steps * batch_size * input_size);
device_ptr<Tensor2> dW_dev(input_size * hidden_size * 3);
device_ptr<Tensor2> dR_dev(hidden_size * hidden_size * 3);
device_ptr<Tensor2> dbx_dev(hidden_size * 3);
device_ptr<Tensor2> dbr_dev(hidden_size * 3);
device_ptr<Tensor2> dh_dev(batch_size * hidden_size);
device_ptr<Tensor3> dp_dev(time_steps * batch_size * hidden_size * 3);
device_ptr<Tensor3> dq_dev(time_steps * batch_size * hidden_size * 3);
// Zero the gradient accumulators so the backward pass doesn't accumulate into
// uninitialized memory (mirrors the LSTM benchmark).
dW_dev.zero();
dR_dev.zero();
dbx_dev.zero();
dbr_dev.zero();
dh_dev.zero();
ForwardPass<float> forward(
true,
batch_size,
input_size,
hidden_size,
g_blas_handle);
BackwardPass<float> backward(
batch_size,
input_size,
hidden_size,
g_blas_handle);
static const float alpha = 1.0f;
static const float beta = 0.0f;
cudaDeviceSynchronize();
float ms = TimeLoop([&]() {
forward.Run(
time_steps,
W_dev.data,
R_dev.data,
bx_dev.data,
br_dev.data,
x_dev.data,
h_dev.data,
v_dev.data,
tmp_Wx_dev.data,
tmp_Rh_dev.data,
0.0f,
nullptr);
// Haste needs `x`, `W`, and `R` to be transposed between the forward
// pass and backward pass. Add these transposes in here to get a fair
// measurement of the overall time it takes to run an entire training
// loop.
cublasSgeam(
g_blas_handle,
CUBLAS_OP_T, CUBLAS_OP_N,
batch_size * time_steps, input_size,
&alpha,
x_dev.data, input_size,
&beta,
x_dev.data, batch_size * time_steps,
x_t_dev.data, batch_size * time_steps);
cublasSgeam(
g_blas_handle,
CUBLAS_OP_T, CUBLAS_OP_N,
input_size, hidden_size * 3,
&alpha,
W_dev.data, hidden_size * 3,
&beta,
W_dev.data, input_size,
W_t_dev.data, input_size);
cublasSgeam(
g_blas_handle,
CUBLAS_OP_T, CUBLAS_OP_N,
hidden_size, hidden_size * 3,
&alpha,
R_dev.data, hidden_size * 3,
&beta,
R_dev.data, hidden_size,
R_t_dev.data, hidden_size);
backward.Run(
time_steps,
W_t_dev.data,
R_t_dev.data,
bx_dev.data,
br_dev.data,
x_t_dev.data,
h_dev.data,
v_dev.data,
dh_new_dev.data,
dx_dev.data,
dW_dev.data,
dR_dev.data,
dbx_dev.data,
dbr_dev.data,
dh_dev.data,
dp_dev.data,
dq_dev.data,
nullptr);
}, sample_size);
return ms;
}
void usage(const char* name) {
printf("Usage: %s [OPTION]...\n", name);
printf(" -h, --help\n");
printf(" -i, --implementation IMPL <haste|cudnn> (default: haste)\n");
printf(" -m, --mode MODE <inference|training> (default: training)\n");
printf(" -s, --sample_size NUM number of runs to average over (default: %d)\n",
DEFAULT_SAMPLE_SIZE);
printf(" -t, --time_steps NUM number of time steps in RNN (default: %d)\n",
DEFAULT_TIME_STEPS);
}
int main(int argc, char* const* argv) {
srand(time(0));
cudnnCreate(&g_cudnn_handle);
cublasCreate(&g_blas_handle);
static struct option long_options[] = {
{ "help", no_argument, 0, 'h' },
{ "implementation", required_argument, 0, 'i' },
{ "mode", required_argument, 0, 'm' },
{ "sample_size", required_argument, 0, 's' },
{ "time_steps", required_argument, 0, 't' },
{ 0, 0, 0, 0 }
};
int c;
int opt_index;
bool inference_flag = false;
bool haste_flag = true;
int sample_size = DEFAULT_SAMPLE_SIZE;
int time_steps = DEFAULT_TIME_STEPS;
while ((c = getopt_long(argc, argv, "hi:m:s:t:", long_options, &opt_index)) != -1)
switch (c) {
case 'h':
usage(argv[0]);
return 0;
case 'i':
if (optarg[0] == 'c' || optarg[0] == 'C')
haste_flag = false;
break;
case 'm':
if (optarg[0] == 'i' || optarg[0] == 'I')
inference_flag = true;
break;
case 's':
sscanf(optarg, "%d", &sample_size);
break;
case 't':
sscanf(optarg, "%d", &time_steps);
break;
}
printf("# Benchmark configuration:\n");
printf("# Mode: %s\n", inference_flag ? "inference" : "training");
printf("# Implementation: %s\n", haste_flag ? "Haste" : "cuDNN");
printf("# Sample size: %d\n", sample_size);
printf("# Time steps: %d\n", time_steps);
printf("#\n");
printf("# batch_size,hidden_size,input_size,time_ms\n");
for (const int N : { 1, 16, 32, 64, 128 }) {
for (const int H : { 128, 256, 512, 768, 1024, 1536, 2048, 3072, 4096 }) {
for (const int C : { 64, 128, 256, 512 }) {
Tensor2 W(H * 3, C);
Tensor2 R(H * 3, H);
Tensor1 bx(H * 3);
Tensor1 br(H * 3);
Tensor3 x(C, N, time_steps);
Tensor3 dh(H, N, time_steps + 1);
float ms;
if (inference_flag) {
if (haste_flag)
ms = HasteInference(sample_size, W, R, bx, br, x);
else
ms = CudnnInference(sample_size, W, R, bx, br, x);
} else {
if (haste_flag)
ms = HasteTrain(sample_size, W, R, bx, br, x, dh);
else
ms = CudnnTrain(sample_size, W, R, bx, br, x, dh);
}
printf("%d,%d,%d,%f\n", N, H, C, ms);
}
}
}
cublasDestroy(g_blas_handle);
cudnnDestroy(g_cudnn_handle);
return 0;
}
================================================
FILE: benchmarks/benchmark_lstm.cc
================================================
// Copyright 2020 LMNT, Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ==============================================================================
#include <Eigen/Dense>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cuda.h>
#include <cuda_runtime_api.h>
#include <cudnn.h>
#include <functional>
#include <getopt.h>
#include <iostream>
#include <string>
#include <unsupported/Eigen/CXX11/Tensor>
#include <vector>
#include "../examples/device_ptr.h"
#include "cudnn_wrappers.h"
#include "haste.h"
using haste::v0::lstm::BackwardPass;
using haste::v0::lstm::ForwardPass;
using std::string;
using Tensor1 = Eigen::Tensor<float, 1>;
using Tensor2 = Eigen::Tensor<float, 2>;
using Tensor3 = Eigen::Tensor<float, 3>;
static constexpr int DEFAULT_SAMPLE_SIZE = 10;
static constexpr int DEFAULT_TIME_STEPS = 50;
static cudnnHandle_t g_cudnn_handle;
static cublasHandle_t g_blas_handle;
float TimeLoop(std::function<void()> fn, int iterations) {
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start);
for (int i = 0; i < iterations; ++i)
fn();
float elapsed_ms;
cudaEventRecord(stop);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsed_ms, start, stop);
cudaEventDestroy(start);
cudaEventDestroy(stop);
return elapsed_ms / iterations;
}
float CudnnInference(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& b,
const Tensor3& x) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
device_ptr<Tensor3> x_dev(x);
device_ptr<Tensor2> h_dev(batch_size * hidden_size);
device_ptr<Tensor2> c_dev(batch_size * hidden_size);
device_ptr<Tensor3> y_dev(time_steps * batch_size * hidden_size);
device_ptr<Tensor2> h_out_dev(batch_size * hidden_size);
device_ptr<Tensor2> c_out_dev(batch_size * hidden_size);
h_dev.zero();
c_dev.zero();
// Descriptors all the way down. Nice.
RnnDescriptor<float> rnn_descriptor(g_cudnn_handle, hidden_size, CUDNN_LSTM);
TensorDescriptorArray<float> x_descriptors(time_steps, { batch_size, input_size, 1 });
TensorDescriptorArray<float> y_descriptors(time_steps, { batch_size, hidden_size, 1 });
auto h_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
auto c_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
auto h_out_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
auto c_out_descriptor = TensorDescriptor<float>({ 1, batch_size, hidden_size });
size_t workspace_size;
cudnnGetRNNWorkspaceSize(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&x_descriptors,
&workspace_size);
auto workspace_dev = device_ptr<Tensor1>::NewByteSized(workspace_size);
size_t w_count;
cudnnGetRNNParamsSize(
g_cudnn_handle,
*rnn_descriptor,
*&x_descriptors,  // operator&() yields the descriptor array; dereferencing gives its first element
&w_count,
CUDNN_DATA_FLOAT);
auto w_dev = device_ptr<Tensor1>::NewByteSized(w_count);
FilterDescriptor<float> w_descriptor(w_dev.Size());
float ms = TimeLoop([&]() {
cudnnRNNForwardInference(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&x_descriptors,
x_dev.data,
*h_descriptor,
h_dev.data,
*c_descriptor,
c_dev.data,
*w_descriptor,
w_dev.data,
&y_descriptors,
y_dev.data,
*h_out_descriptor,
h_out_dev.data,
*c_out_descriptor,
c_out_dev.data,
workspace_dev.data,
workspace_size);
}, sample_size);
return ms;
}
float CudnnTrain(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& b,
const Tensor3& x,
const Tensor3& dh,
const Tensor3& dc) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
device_ptr<Tensor3> y_dev(time_steps * batch_size * hidden_size);
device_ptr<Tensor3> dy_dev(time_steps * batch_size * hidden_size);
device_ptr<Tensor2> dhy_dev(batch_size * hidden_size);
device_ptr<Tensor2> dcy_dev(batch_size * hidden_size);
device_ptr<Tensor2> hx_dev(batch_size * hidden_size);
device_ptr<Tensor2> cx_dev(batch_size * hidden_size);
device_ptr<Tensor2> dx_dev(time_steps * batch_size * input_size);
device_ptr<Tensor2> dhx_dev(batch_size * hidden_size);
device_ptr<Tensor2> dcx_dev(batch_size * hidden_size);
RnnDescriptor<float> rnn_descriptor(g_cudnn_handle, hidden_size, CUDNN_LSTM);
TensorDescriptorArray<float> y_descriptors(time_steps, { batch_size, hidden_size, 1 });
TensorDescriptorArray<float> dy_descriptors(time_steps, { batch_size, hidden_size, 1 });
TensorDescriptorArray<float> dx_descriptors(time_steps, { batch_size, input_size, 1 });
TensorDescriptor<float> dhy_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> dcy_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> hx_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> cx_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> dhx_descriptor({ 1, batch_size, hidden_size });
TensorDescriptor<float> dcx_descriptor({ 1, batch_size, hidden_size });
size_t workspace_size = 0;
cudnnGetRNNWorkspaceSize(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
&workspace_size);
auto workspace_dev = device_ptr<Tensor1>::NewByteSized(workspace_size);
size_t w_count = 0;
cudnnGetRNNParamsSize(
g_cudnn_handle,
*rnn_descriptor,
*&dx_descriptors,  // operator&() yields the descriptor array; dereferencing gives its first element
&w_count,
CUDNN_DATA_FLOAT);
auto w_dev = device_ptr<Tensor1>::NewByteSized(w_count);
FilterDescriptor<float> w_descriptor(w_dev.Size());
size_t reserve_size = 0;
cudnnGetRNNTrainingReserveSize(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
&reserve_size);
auto reserve_dev = device_ptr<Tensor1>::NewByteSized(reserve_size);
float ms = TimeLoop([&]() {
cudnnRNNForwardTraining(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
dx_dev.data,
*hx_descriptor,
hx_dev.data,
*cx_descriptor,
cx_dev.data,
*w_descriptor,
w_dev.data,
&y_descriptors,
y_dev.data,
*dhy_descriptor,
dhy_dev.data,
*dcy_descriptor,
dcy_dev.data,
workspace_dev.data,
workspace_size,
reserve_dev.data,
reserve_size);
cudnnRNNBackwardData(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&y_descriptors,
y_dev.data,
&dy_descriptors,
dy_dev.data,
*dhy_descriptor,
dhy_dev.data,
*dcy_descriptor,
dcy_dev.data,
*w_descriptor,
w_dev.data,
*hx_descriptor,
hx_dev.data,
*cx_descriptor,
cx_dev.data,
&dx_descriptors,
dx_dev.data,
*dhx_descriptor,
dhx_dev.data,
*dcx_descriptor,
dcx_dev.data,
workspace_dev.data,
workspace_size,
reserve_dev.data,
reserve_size);
cudnnRNNBackwardWeights(
g_cudnn_handle,
*rnn_descriptor,
time_steps,
&dx_descriptors,
dx_dev.data,
*hx_descriptor,
hx_dev.data,
&y_descriptors,
y_dev.data,
workspace_dev.data,
workspace_size,
*w_descriptor,
w_dev.data,
reserve_dev.data,
reserve_size);
}, sample_size);
return ms;
}
float HasteInference(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& b,
const Tensor3& x) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
// Copy weights over to GPU.
device_ptr<Tensor2> W_dev(W);
device_ptr<Tensor2> R_dev(R);
device_ptr<Tensor1> b_dev(b);
device_ptr<Tensor3> x_dev(x);
device_ptr<Tensor3> h_dev((time_steps + 1) * batch_size * hidden_size);
device_ptr<Tensor3> c_dev((time_steps + 1) * batch_size * hidden_size);
device_ptr<Tensor3> v_dev(time_steps * batch_size * hidden_size * 4);
device_ptr<Tensor2> tmp_Rh_dev(batch_size * hidden_size * 4);
h_dev.zero();
c_dev.zero();
// Settle down the GPU and off we go!
cudaDeviceSynchronize();
float ms = TimeLoop([&]() {
ForwardPass<float> forward(
false,
batch_size,
input_size,
hidden_size,
g_blas_handle);
forward.Run(
time_steps,
W_dev.data,
R_dev.data,
b_dev.data,
x_dev.data,
h_dev.data,
c_dev.data,
v_dev.data,
tmp_Rh_dev.data,
0.0f,
nullptr);
}, sample_size);
return ms;
}
float HasteTrain(
int sample_size,
const Tensor2& W,
const Tensor2& R,
const Tensor1& b,
const Tensor3& x,
const Tensor3& dh,
const Tensor3& dc) {
const int time_steps = x.dimension(2);
const int batch_size = x.dimension(1);
const int input_size = x.dimension(0);
const int hidden_size = R.dimension(1);
Eigen::array<int, 3> transpose_x({ 1, 2, 0 });
Tensor3 x_t = x.shuffle(transpose_x);
Eigen::array<int, 2> transpose({ 1, 0 });
Tensor2 W_t = W.shuffle(transpose);
Tensor2 R_t = R.shuffle(transpose);
device_ptr<Tensor2> W_dev(W);
device_ptr<Tensor2> R_dev(R);
device_ptr<Tensor3> x_dev(x);
device_ptr<Tensor3> h_dev((time_steps + 1) * batch_size * hidden_size);
device_ptr<Tensor3> c_dev((time_steps + 1) * batch_size * hidden_size);
device_ptr<Tensor3> v_dev(time_steps * batch_size * hidden_size * 4);
device_ptr<Tensor2> tmp_Rh_dev(batch_size * hidden_size * 4);
device_ptr<Tensor2> W_t_dev(W_t);
device_ptr<Tensor2> R_t_dev(R_t);
device_ptr<Tensor1> b_dev(b);
device_ptr<Tensor3> x_t_dev(x_t);
// These gradients should actually come "from above" but we're just allocating
// a bunch of uninitialized memory and passing it in.
device_ptr<Tensor3> dh_new_dev(dh);
device_ptr<Tensor3> dc_new_dev(dc);
device_ptr<Tensor3> dx_dev(time_steps * batch_size * input_size);
device_ptr<Tensor2> dW_dev(input_size * hidden_size * 4);
device_ptr<Tensor2> dR_dev(hidden_size * hidden_size * 4);
device_ptr<Tensor2> db_dev(hidden_size * 4);
device_ptr<Tensor2> dh_dev((time_steps + 1) * batch_size * hidden_size);
device_ptr<Tensor2> dc_dev((time_steps + 1) * batch_size * hidden_size);
dW_dev.zero();
dR_dev.zero();
db_dev.zero();
dh_dev.zero();
dc_dev.zero();
ForwardPass<float> forward(
true,
batch_size,
input_size,
hidden_size,
g_blas_handle);
BackwardPass<float> backward(
batch_size,
input_size,
hidden_size,
g_blas_handle);
static const float alpha = 1.0f;
static const float beta = 0.0f;
cudaDeviceSynchronize();
float ms = TimeLoop([&]() {
forward.Run(
time_steps,
W_dev.data,
R_dev.data,
b_dev.data,
x_dev.data,
h_dev.data,
c_dev.data,
v_dev.data,
tmp_Rh_dev.data,
0.0f,
nullptr);
// Haste needs `x`, `W`, and `R` to be transposed between the forward
// pass and backward pass. Add these transposes in here to get a fair
// measurement of the overall time it takes to run an entire training
// loop.
cublasSgeam(
g_blas_handle,
CUBLAS_OP_T, CUBLAS_OP_N,
batch_size * time_steps, input_size,
&alpha,
x_dev.data, input_size,
&beta,
x_dev.data, batch_size * time_steps,
x_t_dev.data, batch_size * time_steps);
cublasSgeam(
g_blas_handle,
CUBLAS_OP_T, CUBLAS_OP_N,
input_size, hidden_size * 4,
&alpha,
W_dev.data, hidden_size * 4,
&beta,
W_dev.data, input_size,
W_t_dev.data, input_size);
cublasSgeam(
g_blas_handle,
CUBLAS_OP_T, CUBLAS_OP_N,
hidden_size, hidden_size * 4,
&alpha,
R_dev.data, hidden_size * 4,
&beta,
R_dev.data, hidden_size,
R_t_dev.data, hidden_size);
backward.Run(
time_steps,
W_t_dev.data,
R_t_dev.data,
b_dev.data,
x_t_dev.data,
h_dev.data,
c_dev.data,
dh_new_dev.data,
dc_new_dev.data,
dx_dev.data,
dW_dev.data,
dR_dev.data,
db_dev.data,
dh_dev.data,
dc_dev.data,
v_dev.data,
nullptr);
}, sample_size);
return ms;
}
void usage(const char* name) {
printf("Usage: %s [OPTION]...\n", name);
printf(" -h, --help\n");
printf(" -i, --implementation IMPL <haste|cudnn> (default: haste)\n");
printf(" -m, --mode MODE <inference|training> (default: training)\n");
printf(" -s, --sample_size NUM number of runs to average over (default: %d)\n",
DEFAULT_SAMPLE_SIZE);
printf(" -t, --time_steps NUM number of time steps in RNN (default: %d)\n",
DEFAULT_TIME_STEPS);
}
int main(int argc, char* const* argv) {
srand(time(0));
cudnnCreate(&g_cudnn_handle);
cublasCreate(&g_blas_handle);
static struct option long_options[] = {
{ "help", no_argument, 0, 'h' },
{ "implementation", required_argument, 0, 'i' },
{ "mode", required_argument, 0, 'm' },
{ "sample_size", required_argument, 0, 's' },
{ "time_steps", required_argument, 0, 't' },
{ 0, 0, 0, 0 }
};
int c;
int opt_index;
bool inference_flag = false;
bool haste_flag = true;
int sample_size = DEFAULT_SAMPLE_SIZE;
int time_steps = DEFAULT_TIME_STEPS;
while ((c = getopt_long(argc, argv, "hi:m:s:t:", long_options, &opt_index)) != -1)
switch (c) {
case 'h':
usage(argv[0]);
return 0;
case 'i':
if (optarg[0] == 'c' || optarg[0] == 'C')
haste_flag = false;
break;
case 'm':
if (optarg[0] == 'i' || optarg[0] == 'I')
inference_flag = true;
break;
case 's':
sscanf(optarg, "%d", &sample_size);
break;
case 't':
sscanf(optarg, "%d", &time_steps);
break;
}
printf("# Benchmark configuration:\n");
printf("# Mode: %s\n", inference_flag ? "inference" : "training");
printf("# Implementation: %s\n", haste_flag ? "Haste" : "cuDNN");
printf("# Sample size: %d\n", sample_size);
printf("# Time steps: %d\n", time_steps);
printf("#\n");
printf("# batch_size,hidden_size,input_size,time_ms\n");
for (const int N : { 1, 16, 32, 64, 128 }) {
for (const int H : { 128, 256, 512, 768, 1024, 1536, 2048, 3072, 4096 }) {
for (const int C : { 64, 128, 256, 512 }) {
Tensor2 W(H * 4, C);
Tensor2 R(H * 4, H);
Tensor1 b(H * 4);
Tensor3 x(C, N, time_steps);
Tensor3 dh(H, N, time_steps + 1);
Tensor3 dc(H, N, time_steps + 1);
float ms;
if (inference_flag) {
if (haste_flag)
ms = HasteInference(sample_size, W, R, b, x);
else
ms = CudnnInference(sample_size, W, R, b, x);
} else {
if (haste_flag)
ms = HasteTrain(sample_size, W, R, b, x, dh, dc);
else
ms = CudnnTrain(sample_size, W, R, b, x, dh, dc);
}
printf("%d,%d,%d,%f\n", N, H, C, ms);
}
}
}
cublasDestroy(g_blas_handle);
cudnnDestroy(g_cudnn_handle);
return 0;
}
================================================
FILE: benchmarks/cudnn_wrappers.h
================================================
// Copyright 2020 LMNT, Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// ==============================================================================
#pragma once
#include <cassert>
#include <cudnn.h>
#include <vector>
template<typename T>
struct CudnnDataType {};
template<>
struct CudnnDataType<float> {
static constexpr auto value = CUDNN_DATA_FLOAT;
};
template<>
struct CudnnDataType<double> {
static constexpr auto value = CUDNN_DATA_DOUBLE;
};
template<typename T>
class TensorDescriptor {
public:
TensorDescriptor(const std::vector<int>& dims) {
std::vector<int> strides;
int stride = 1;
for (int i = dims.size() - 1; i >= 0; --i) {
strides.insert(strides.begin(), stride);
stride *= dims[i];
}
cudnnCreateTensorDescriptor(&descriptor_);
cudnnSetTensorNdDescriptor(descriptor_, CudnnDataType<T>::value, dims.size(), &dims[0], &strides[0]);
}
~TensorDescriptor() {
cudnnDestroyTensorDescriptor(descriptor_);
}
cudnnTensorDescriptor_t& operator*() {
return descriptor_;
}
private:
cudnnTensorDescriptor_t descriptor_;
};
template<typename T>
class TensorDescriptorArray {
public:
TensorDescriptorArray(int count, const std::vector<int>& dims) {
std::vector<int> strides;
int stride = 1;
for (int i = dims.size() - 1; i >= 0; --i) {
strides.insert(strides.begin(), stride);
stride *= dims[i];
}
for (int i = 0; i < count; ++i) {
cudnnTensorDescriptor_t descriptor;
cudnnCreateTensorDescriptor(&descriptor);
cudnnSetTensorNdDescriptor(descriptor, CudnnDataType<T>::value, dims.size(), &dims[0], &strides[0]);
descriptors_.push_back(descriptor);
}
}
~TensorDescriptorArray() {
for (auto& desc : descriptors_)
cudnnDestroyTensorDescriptor(desc);
}
// Overloaded address-of: exposes the raw descriptor array that the cuDNN RNN
// API expects for its per-time-step descriptor arguments.
cudnnTensorDescriptor_t* operator&() {
return &descriptors_[0];
}
private:
std::vector<cudnnTensorDescriptor_t> descriptors_;
};
class DropoutDescriptor {
public:
DropoutDescriptor(const cudnnHandle_t& handle) {
cudnnCreateDropoutDescriptor(&descriptor_);
cudnnSetDropoutDescriptor(descriptor_, handle, 0.0f, nullptr, 0, 0LL);
}
~DropoutDescriptor() {
cudnnDestroyDropoutDescriptor(descriptor_);
}
cudnnDropoutDescriptor_t& operator*() {
return descriptor_;
}
private:
cudnnDropoutDescriptor_t descriptor_;
};
template<typename T>
class RnnDescriptor {
public:
RnnDescriptor(const cudnnHandle_t& handle, int size, cudnnRNNMode_t algorithm) : dropout_(handle) {
cudnnCreateRNNDescriptor(&descriptor_);
cudnnSetRNNDescriptor(
handle,
descriptor_,
size,
1,
*dropout_,
CUDNN_LINEAR_INPUT,
CUDNN_UNIDIRECTIONAL,
algorithm,
CUDNN_RNN_ALGO_STANDARD,
CudnnDataType<T>::value);
}
~RnnDescriptor() {
cudnnDestroyRNNDescriptor(descriptor_);
}
cudnnRNNDescriptor_t& operator*() {
return descriptor_;
}
private:
cudnnRNNDescriptor_t descriptor_;
DropoutDescriptor dropout_;
};
template<typename T>
class FilterDescriptor {
public:
FilterDescriptor(const size_t size) {
int filter_dim[] = { (int)size, 1, 1 };
cudnnCreateFilterDescriptor(&descriptor_);
cudnnSetFilterNdDescriptor(descriptor_, CudnnDataType<T>::value, CUDNN_TENSOR_NCHW, 3, filter_dim);
}
~FilterDescriptor() {
cudnnDestroyFilterDescriptor(descriptor_);
}
cudnnFilterDescriptor_t& operator*() {
return descriptor_;
}
private:
cudnnFilterDescriptor_t descriptor_;
};
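// A hypothetical usage sketch for these RAII wrappers (the names `handle`,
// `hidden_size`, `time_steps`, `batch_size`, and `input_size` are placeholders,
// not declared in this header); see the benchmarks for the real call sites:
//
//   RnnDescriptor<float> rnn(handle, hidden_size, CUDNN_LSTM);
//   TensorDescriptorArray<float> xs(time_steps, { batch_size, input_size, 1 });
//   size_t workspace_bytes = 0;
//   cudnnGetRNNWorkspaceSize(handle, *rnn, time_steps, &xs, &workspace_bytes);
//   // All descriptors are released automatically when they leave scope.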
================================================
FILE: benchmarks/report.py
================================================
# Copyright 2020 LMNT, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import argparse
import matplotlib.pyplot as plt
import numpy as np
import os
def extract(x, predicate):
return np.array(list(filter(predicate, x)))
def main(args):
np.set_printoptions(suppress=True)
A = np.loadtxt(args.A, delimiter=',')
B = np.loadtxt(args.B, delimiter=',')
faster = 1.0 - A[:,-1] / B[:,-1]
print('A is faster than B by:')
print(f' mean: {np.mean(faster)*100:7.4}%')
print(f' std: {np.std(faster)*100:7.4}%')
print(f' median: {np.median(faster)*100:7.4}%')
print(f' min: {np.min(faster)*100:7.4}%')
print(f' max: {np.max(faster)*100:7.4}%')
for batch_size in np.unique(A[:,0]):
for input_size in np.unique(A[:,2]):
a = extract(A, lambda x: x[0] == batch_size and x[2] == input_size)
b = extract(B, lambda x: x[0] == batch_size and x[2] == input_size)
fig, ax = plt.subplots(dpi=200)
ax.set_xticks(a[:,1])
ax.set_xticklabels(a[:,1].astype(np.int32), rotation=60)
ax.tick_params(axis='y', which='both', length=0)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.title(f'batch size={int(batch_size)}, input size={int(input_size)}')
plt.plot(a[:,1], a[:,-1], color=args.color[0])
plt.plot(b[:,1], b[:,-1], color=args.color[1])
plt.xlabel('hidden size')
plt.ylabel('time (ms)')
plt.legend(args.name, frameon=False)
plt.tight_layout()
if args.save:
os.makedirs(args.save[0], exist_ok=True)
plt.savefig(f'{args.save[0]}/report_n={int(batch_size)}_c={int(input_size)}.png', dpi=200)
else:
plt.show()
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--name', nargs=2, default=['A', 'B'])
parser.add_argument('--color', nargs=2, default=['#1f77b4', '#2ca02c'])
parser.add_argument('--save', nargs=1, default=None)
parser.add_argument('A')
parser.add_argument('B')
main(parser.parse_args())
================================================
FILE: build/MANIFEST.in
================================================
include Makefile
include frameworks/tf/*.h
include frameworks/tf/*.cc
include frameworks/pytorch/*.h
include frameworks/pytorch/*.cc
include lib/*.cc
include lib/*.h
include lib/haste/*.h
================================================
FILE: build/common.py
================================================
# Copyright 2020 LMNT, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
VERSION = '0.5.0-rc0'
DESCRIPTION = 'Haste: a fast, simple, and open RNN library.'
AUTHOR = 'LMNT, Inc.'
AUTHOR_EMAIL = 'haste@lmnt.com'
URL = 'https://haste.lmnt.com'
LICENSE = 'Apache 2.0'
CLASSIFIERS = [
  'Development Status :: 4 - Beta',
  'Intended Audience :: Developers',
  'Intended Audience :: Education',
  'Intended Audience :: Science/Research',
  'License :: OSI Approved :: Apache Software License',
  'Programming Language :: Python :: 3.6',
  'Programming Language :: Python :: 3.7',
  'Programming Language :: Python :: 3.8',
  'Topic :: Scientific/Engineering :: Mathematics',
  'Topic :: Software Development :: Libraries :: Python Modules',
  'Topic :: Software Development :: Libraries',
]
================================================
FILE: build/setup.pytorch.py
================================================
# Copyright 2020 LMNT, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import os
import sys
from glob import glob
from platform import platform
from torch.utils import cpp_extension
from setuptools import setup
from setuptools.dist import Distribution
class BuildHaste(cpp_extension.BuildExtension):
  def run(self):
    os.system('make haste')
    super().run()


base_path = os.path.dirname(os.path.realpath(__file__))

if 'Windows' in platform():
  CUDA_HOME = os.environ.get('CUDA_HOME', os.environ.get('CUDA_PATH'))
  extra_args = []
else:
  CUDA_HOME = os.environ.get('CUDA_HOME', '/usr/local/cuda')
  extra_args = ['-Wno-sign-compare']

with open('frameworks/pytorch/_version.py', 'wt') as f:
  f.write(f'__version__ = "{VERSION}"')

extension = cpp_extension.CUDAExtension(
    'haste_pytorch_lib',
    sources = glob('frameworks/pytorch/*.cc'),
    extra_compile_args = extra_args,
    include_dirs = [os.path.join(base_path, 'lib'), os.path.join(CUDA_HOME, 'include')],
    libraries = ['haste'],
    library_dirs = ['.'])

setup(name = 'haste_pytorch',
    version = VERSION,
    description = DESCRIPTION,
    long_description = open('README.md', 'r', encoding='utf-8').read(),
    long_description_content_type = 'text/markdown',
    author = AUTHOR,
    author_email = AUTHOR_EMAIL,
    url = URL,
    license = LICENSE,
    keywords = 'pytorch machine learning rnn lstm gru custom op',
    packages = ['haste_pytorch'],
    package_dir = { 'haste_pytorch': 'frameworks/pytorch' },
    install_requires = [],
    ext_modules = [extension],
    cmdclass = { 'build_ext': BuildHaste },
    classifiers = CLASSIFIERS)
================================================
FILE: build/setup.tf.py
================================================
# Copyright 2020 LMNT, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import os
import sys
from setuptools import setup
from setuptools.dist import Distribution
from distutils.command.build import build as _build
class BinaryDistribution(Distribution):
  """This class is needed in order to create OS-specific wheels."""
  def has_ext_modules(self):
    return True


class BuildHaste(_build):
  def run(self):
    os.system('make libhaste_tf')
    super().run()


with open('frameworks/tf/_version.py', 'wt') as f:
  f.write(f'__version__ = "{VERSION}"')

setup(name = 'haste_tf',
    version = VERSION,
    description = DESCRIPTION,
    long_description = open('README.md', 'r').read(),
    long_description_content_type = 'text/markdown',
    author = AUTHOR,
    author_email = AUTHOR_EMAIL,
    url = URL,
    license = LICENSE,
    keywords = 'tensorflow machine learning rnn lstm gru custom op',
    packages = ['haste_tf'],
    package_dir = { 'haste_tf': 'frameworks/tf' },
    package_data = { 'haste_tf': ['*.so'] },
    install_requires = [],
    zip_safe = False,
    distclass = BinaryDistribution,
    cmdclass = { 'build': BuildHaste },
    classifiers = CLASSIFIERS)
================================================
FILE: docs/pytorch/haste_pytorch/GRU.md
================================================
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="haste_pytorch.GRU" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="__call__"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="add_module"/>
<meta itemprop="property" content="apply"/>
<meta itemprop="property" content="buffers"/>
<meta itemprop="property" content="children"/>
<meta itemprop="property" content="cpu"/>
<meta itemprop="property" content="cuda"/>
<meta itemprop="property" content="double"/>
<meta itemprop="property" content="eval"/>
<meta itemprop="property" content="extra_repr"/>
<meta itemprop="property" content="float"/>
<meta itemprop="property" content="forward"/>
<meta itemprop="property" content="from_native_weights"/>
<meta itemprop="property" content="half"/>
<meta itemprop="property" content="load_state_dict"/>
<meta itemprop="property" content="modules"/>
<meta itemprop="property" content="named_buffers"/>
<meta itemprop="property" content="named_children"/>
<meta itemprop="property" content="named_modules"/>
<meta itemprop="property" content="named_parameters"/>
<meta itemprop="property" content="parameters"/>
<meta itemprop="property" content="register_backward_hook"/>
<meta itemprop="property" content="register_buffer"/>
<meta itemprop="property" content="register_forward_hook"/>
<meta itemprop="property" content="register_forward_pre_hook"/>
<meta itemprop="property" content="register_parameter"/>
<meta itemprop="property" content="requires_grad_"/>
<meta itemprop="property" content="reset_parameters"/>
<meta itemprop="property" content="share_memory"/>
<meta itemprop="property" content="state_dict"/>
<meta itemprop="property" content="to"/>
<meta itemprop="property" content="to_native_weights"/>
<meta itemprop="property" content="train"/>
<meta itemprop="property" content="type"/>
<meta itemprop="property" content="zero_grad"/>
</div>
# haste_pytorch.GRU
<!-- Insert buttons and diff -->
## Class `GRU`
Gated Recurrent Unit layer.
<!-- Placeholder for "Used in" -->
This GRU layer offers a fused, GPU-accelerated PyTorch op for inference
and training. There are two commonly-used variants of GRU cells. This one
implements arXiv:1406.1078v1, which applies the reset gate to the hidden
state after the matrix multiplication. cuDNN also implements this variant.
The other variant, arXiv:1406.1078v3, applies the reset gate before the
matrix multiplication and is currently unsupported.
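The distinction between the two variants can be sketched in NumPy (all names and sizes below are illustrative, not part of the Haste API; the input-projection term is omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Single-step sketch with hidden_size=4.
rng = np.random.default_rng(0)
h_prev = rng.standard_normal(4)         # previous hidden state
W_h = rng.standard_normal((4, 4))       # recurrent weights of the candidate gate
r = sigmoid(rng.standard_normal(4))     # reset gate activations

# 1406.1078v1 (this layer, and cuDNN): reset gate applied AFTER the matmul.
candidate_v1 = np.tanh(r * (h_prev @ W_h))

# 1406.1078v3 (unsupported here): reset gate applied BEFORE the matmul.
candidate_v3 = np.tanh((r * h_prev) @ W_h)
```

The two formulations generally produce different candidate activations, which is why weights trained under one variant cannot simply be loaded into the other.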
This layer has built-in support for DropConnect and Zoneout, which are
both techniques used to regularize RNNs.
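As a rough illustration of what these two regularizers do (a NumPy sketch with made-up names; the real implementation runs fused on the GPU and applies the masks only during training):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size = 4
h_prev = rng.standard_normal(hidden_size)               # previous hidden state
h_new = rng.standard_normal(hidden_size)                # stand-in for the freshly computed state
W_rec = rng.standard_normal((hidden_size, hidden_size)) # recurrent weight matrix

# DropConnect (dropout=0.25): zero out entries of the recurrent *weights*,
# rather than the activations.
drop_mask = rng.random(W_rec.shape) > 0.25
W_rec_dropped = W_rec * drop_mask

# Zoneout (zoneout=0.1): each unit keeps its previous value with probability 0.1
# instead of updating to the new one.
keep_prev = rng.random(hidden_size) < 0.1
h_t = np.where(keep_prev, h_prev, h_new)
```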
See [\_\_init\_\_](#__init__) and [forward](#forward) for usage.
See [from_native_weights](#from_native_weights) and
[to_native_weights](#to_native_weights) for compatibility with PyTorch GRUs.
<h2 id="__init__"><code><a name="__init__">__init__</a></code></h2>
``` python
__init__(
input_size,
hidden_size,
batch_first=False,
dropout=0.0,
zoneout=0.0
)
```
Initialize the parameters of the GRU layer.
#### Arguments:
* <b>`input_size`</b>: int, the feature dimension of the input.
* <b>`hidden_size`</b>: int, the feature dimension of the output.
* <b>`batch_first`</b>: (optional) bool, if `True`, then the input and output
tensors are provided as `(batch, seq, feature)`.
* <b>`dropout`</b>: (optional) float, sets the dropout rate for DropConnect
regularization on the recurrent matrix.
* <b>`zoneout`</b>: (optional) float, sets the zoneout rate for Zoneout
regularization.
#### Variables:
* <b>`kernel`</b>: the input projection weight matrix. Dimensions
(input_size, hidden_size * 3) with `z,r,h` gate layout. Initialized
with Xavier uniform initialization.
* <b>`recurrent_kernel`</b>: the recurrent projection weight matrix. Dimensions
(hidden_size, hidden_size * 3) with `z,r,h` gate layout. Initialized
with orthogonal initialization.
* <b>`bias`</b>: the input projection bias vector. Dimensions (hidden_size * 3) with
`z,r,h` gate layout. Initialized to zeros.
* <b>`recurrent_bias`</b>: the recurrent projection bias vector. Dimensions
(hidden_size * 3) with `z,r,h` gate layout. Initialized to zeros.
## Methods
<h3 id="__call__"><code><a name="__call__">__call__</a></code></h3>
``` python
__call__(
*input,
**kwargs
)
```
Call self as a function.
<h3 id="add_module"><code><a name="add_module">add_module</a></code></h3>
``` python
add_module(
name,
module
)
```
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
#### Args:
name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
<h3 id="apply"><code><a name="apply">apply</a></code></h3>
``` python
apply(fn)
```
Applies ``fn`` recursively to every submodule (as returned by ``.children()``)
as well as self. Typical use includes initializing the parameters of a model
(see also :ref:`nn-init-doc`).
#### Args:
fn (:class:`Module` -> None): function to be applied to each submodule
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> def init_weights(m):
>>> print(m)
>>> if type(m) == nn.Linear:
>>> m.weight.data.fill_(1.0)
>>> print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
```
<h3 id="buffers"><code><a name="buffers">buffers</a></code></h3>
``` python
buffers(recurse=True)
```
Returns an iterator over module buffers.
#### Args:
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`torch.Tensor`</b>: module buffer
Example::
```
>>> for buf in model.buffers():
>>> print(type(buf.data), buf.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="children"><code><a name="children">children</a></code></h3>
``` python
children()
```
Returns an iterator over immediate children modules.
#### Yields:
* <b>`Module`</b>: a child module
<h3 id="cpu"><code><a name="cpu">cpu</a></code></h3>
``` python
cpu()
```
Moves all model parameters and buffers to the CPU.
#### Returns:
* <b>`Module`</b>: self
<h3 id="cuda"><code><a name="cuda">cuda</a></code></h3>
``` python
cuda(device=None)
```
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So
it should be called before constructing optimizer if the module will
live on GPU while being optimized.
#### Arguments:
device (int, optional): if specified, all parameters will be
copied to that device
#### Returns:
* <b>`Module`</b>: self
<h3 id="double"><code><a name="double">double</a></code></h3>
``` python
double()
```
Casts all floating point parameters and buffers to ``double`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="eval"><code><a name="eval">eval</a></code></h3>
``` python
eval()
```
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
This is equivalent to :meth:`self.train(False) <torch.nn.Module.train>`.
#### Returns:
* <b>`Module`</b>: self
<h3 id="extra_repr"><code><a name="extra_repr">extra_repr</a></code></h3>
``` python
extra_repr()
```
Sets the extra representation of the module.
To print customized extra information, you should reimplement
this method in your own modules. Both single-line and multi-line
strings are acceptable.
<h3 id="float"><code><a name="float">float</a></code></h3>
``` python
float()
```
Casts all floating point parameters and buffers to float datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="forward"><code><a name="forward">forward</a></code></h3>
``` python
forward(
input,
state=None,
lengths=None
)
```
Runs a forward pass of the GRU layer.
#### Arguments:
* <b>`input`</b>: Tensor, a batch of input sequences to pass through the GRU.
Dimensions (seq_len, batch_size, input_size) if `batch_first` is
`False`, otherwise (batch_size, seq_len, input_size).
* <b>`state`</b>: (optional) Tensor, the initial hidden state of the GRU.
  Dimensions (1, batch_size, hidden_size). Defaults to zeros if not given.
* <b>`lengths`</b>: (optional) Tensor, list of sequence lengths for each batch
element. Dimension (batch_size). This argument may be omitted if
all batch elements are unpadded and have the same sequence length.
#### Returns:
* <b>`output`</b>: Tensor, the output of the GRU layer. Dimensions
(seq_len, batch_size, hidden_size) if `batch_first` is `False` (default)
or (batch_size, seq_len, hidden_size) if `batch_first` is `True`. Note
that if `lengths` was specified, the `output` tensor will not be
masked. It's the caller's responsibility to either not use the invalid
entries or to mask them out before using them.
* <b>`h_n`</b>: the hidden state for the last sequence item. Dimensions
(1, batch_size, hidden_size).
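Since `output` is not masked when `lengths` is given, the caller can build a mask from `lengths` and zero out the invalid timesteps; a NumPy sketch with `batch_first=False` shapes (names and sizes are illustrative):

```python
import numpy as np

seq_len, batch_size, hidden_size = 5, 3, 2
output = np.ones((seq_len, batch_size, hidden_size))  # stand-in for the GRU output
lengths = np.array([5, 3, 1])                         # per-batch sequence lengths

# mask[t, b] is True while timestep t is valid for batch element b.
mask = np.arange(seq_len)[:, None] < lengths[None, :]  # (seq_len, batch_size)
masked = output * mask[:, :, None]                     # broadcast over hidden_size
```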
<h3 id="from_native_weights"><code><a name="from_native_weights">from_native_weights</a></code></h3>
``` python
from_native_weights(
weight_ih_l0,
weight_hh_l0,
bias_ih_l0,
bias_hh_l0
)
```
Copies and converts the provided PyTorch GRU weights into this layer.
#### Arguments:
* <b>`weight_ih_l0`</b>: Parameter, the input-hidden weights of the PyTorch GRU layer.
* <b>`weight_hh_l0`</b>: Parameter, the hidden-hidden weights of the PyTorch GRU layer.
* <b>`bias_ih_l0`</b>: Parameter, the input-hidden bias of the PyTorch GRU layer.
* <b>`bias_hh_l0`</b>: Parameter, the hidden-hidden bias of the PyTorch GRU layer.
<h3 id="half"><code><a name="half">half</a></code></h3>
``` python
half()
```
Casts all floating point parameters and buffers to ``half`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="load_state_dict"><code><a name="load_state_dict">load_state_dict</a></code></h3>
``` python
load_state_dict(
state_dict,
strict=True
)
```
Copies parameters and buffers from :attr:`state_dict` into
this module and its descendants. If :attr:`strict` is ``True``, then
the keys of :attr:`state_dict` must exactly match the keys returned
by this module's :meth:`~torch.nn.Module.state_dict` function.
#### Arguments:
state_dict (dict): a dict containing parameters and
persistent buffers.
strict (bool, optional): whether to strictly enforce that the keys
in :attr:`state_dict` match the keys returned by this module's
:meth:`~torch.nn.Module.state_dict` function. Default: ``True``
#### Returns:
``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields:
* **missing_keys** is a list of str containing the missing keys
* **unexpected_keys** is a list of str containing the unexpected keys
<h3 id="modules"><code><a name="modules">modules</a></code></h3>
``` python
modules()
```
Returns an iterator over all modules in the network.
#### Yields:
* <b>`Module`</b>: a module in the network
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
```
<h3 id="named_buffers"><code><a name="named_buffers">named_buffers</a></code></h3>
``` python
named_buffers(
prefix='',
recurse=True
)
```
Returns an iterator over module buffers, yielding both the
name of the buffer as well as the buffer itself.
#### Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`(string, torch.Tensor)`</b>: Tuple containing the name and buffer
Example::
```
>>> for name, buf in self.named_buffers():
>>> if name in ['running_var']:
>>> print(buf.size())
```
<h3 id="named_children"><code><a name="named_children">named_children</a></code></h3>
``` python
named_children()
```
Returns an iterator over immediate children modules, yielding both
the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple containing a name and child module
Example::
```
>>> for name, module in model.named_children():
>>> if name in ['conv4', 'conv5']:
>>> print(module)
```
<h3 id="named_modules"><code><a name="named_modules">named_modules</a></code></h3>
``` python
named_modules(
memo=None,
prefix=''
)
```
Returns an iterator over all modules in the network, yielding
both the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple of name and module
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
```
<h3 id="named_parameters"><code><a name="named_parameters">named_parameters</a></code></h3>
``` python
named_parameters(
prefix='',
recurse=True
)
```
Returns an iterator over module parameters, yielding both the
name of the parameter as well as the parameter itself.
#### Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`(string, Parameter)`</b>: Tuple containing the name and parameter
Example::
```
>>> for name, param in self.named_parameters():
>>> if name in ['bias']:
>>> print(param.size())
```
<h3 id="parameters"><code><a name="parameters">parameters</a></code></h3>
``` python
parameters(recurse=True)
```
Returns an iterator over module parameters.
This is typically passed to an optimizer.
#### Args:
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`Parameter`</b>: module parameter
Example::
```
>>> for param in model.parameters():
>>> print(type(param.data), param.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="register_backward_hook"><code><a name="register_backward_hook">register_backward_hook</a></code></h3>
``` python
register_backward_hook(hook)
```
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module
inputs are computed. The hook should have the following signature::
hook(module, grad_input, grad_output) -> Tensor or None
The :attr:`grad_input` and :attr:`grad_output` may be tuples if the
module has multiple inputs or outputs. The hook should not modify its
arguments, but it can optionally return a new gradient with respect to
input that will be used in place of :attr:`grad_input` in subsequent
computations.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
.. warning::
The current implementation will not have the presented behavior
for complex :class:`Module` that perform many operations.
In some failure cases, :attr:`grad_input` and :attr:`grad_output` will only
contain the gradients for a subset of the inputs and outputs.
For such :class:`Module`, you should use :func:`torch.Tensor.register_hook`
directly on a specific input or output to get the required gradients.
<h3 id="register_buffer"><code><a name="register_buffer">register_buffer</a></code></h3>
``` python
register_buffer(
name,
tensor
)
```
Adds a persistent buffer to the module.
This is typically used to register a buffer that should not be
considered a model parameter. For example, BatchNorm's ``running_mean``
is not a parameter, but is part of the persistent state.
Buffers can be accessed as attributes using given names.
#### Args:
name (string): name of the buffer. The buffer can be accessed
from this module using the given name
tensor (Tensor): buffer to be registered.
Example::
```
>>> self.register_buffer('running_mean', torch.zeros(num_features))
```
<h3 id="register_forward_hook"><code><a name="register_forward_hook">register_forward_hook</a></code></h3>
``` python
register_forward_hook(hook)
```
Registers a forward hook on the module.
The hook will be called every time after :func:`forward` has computed an output.
It should have the following signature::
hook(module, input, output) -> None or modified output
The hook can modify the output. It can modify the input in-place, but
this will not have an effect on the forward pass since it is called after
:func:`forward` has completed.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_forward_pre_hook"><code><a name="register_forward_pre_hook">register_forward_pre_hook</a></code></h3>
``` python
register_forward_pre_hook(hook)
```
Registers a forward pre-hook on the module.
The hook will be called every time before :func:`forward` is invoked.
It should have the following signature::
hook(module, input) -> None or modified input
The hook can modify the input. The user can either return a tuple or a
single modified value in the hook. The value will be wrapped into a tuple
if a single value is returned (unless that value is already a tuple).
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_parameter"><code><a name="register_parameter">register_parameter</a></code></h3>
``` python
register_parameter(
name,
param
)
```
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
#### Args:
name (string): name of the parameter. The parameter can be accessed
from this module using the given name
param (Parameter): parameter to be added to the module.
<h3 id="requires_grad_"><code><a name="requires_grad_">requires_grad_</a></code></h3>
``` python
requires_grad_(requires_grad=True)
```
Changes whether autograd should record operations on parameters in this
module.
This method sets the parameters' :attr:`requires_grad` attributes
in-place.
This method is helpful for freezing part of the module for finetuning
or training parts of a model individually (e.g., GAN training).
#### Args:
requires_grad (bool): whether autograd should record operations on
parameters in this module. Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="reset_parameters"><code><a name="reset_parameters">reset_parameters</a></code></h3>
``` python
reset_parameters()
```
Resets this layer's parameters to their initial values.
<h3 id="share_memory"><code><a name="share_memory">share_memory</a></code></h3>
``` python
share_memory()
```
<h3 id="state_dict"><code><a name="state_dict">state_dict</a></code></h3>
``` python
state_dict(
destination=None,
prefix='',
keep_vars=False
)
```
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are
included. Keys are corresponding parameter and buffer names.
#### Returns:
* <b>`dict`</b>: a dictionary containing a whole state of the module
Example::
```
>>> module.state_dict().keys()
['bias', 'weight']
```
<h3 id="to"><code><a name="to">to</a></code></h3>
``` python
to(
*args,
**kwargs
)
```
Moves and/or casts the parameters and buffers.
This can be called as
.. function:: to(device=None, dtype=None, non_blocking=False)
.. function:: to(dtype, non_blocking=False)
.. function:: to(tensor, non_blocking=False)
Its signature is similar to :meth:`torch.Tensor.to`, but only accepts
floating point desired :attr:`dtype` s. In addition, this method will
only cast the floating point parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved to
:attr:`device`, if that is given, but with dtypes unchanged. When
:attr:`non_blocking` is set, it tries to convert/move asynchronously
with respect to the host if possible, e.g., moving CPU Tensors with
pinned memory to CUDA devices.
See below for examples.
.. note::
This method modifies the module in-place.
#### Args:
device (:class:`torch.device`): the desired device of the parameters
and buffers in this module
dtype (:class:`torch.dtype`): the desired floating point type of
the floating point parameters and buffers in this module
tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16)
```
<h3 id="to_native_weights"><code><a name="to_native_weights">to_native_weights</a></code></h3>
``` python
to_native_weights()
```
Converts Haste GRU weights to native PyTorch GRU weights.
#### Returns:
* <b>`weight_ih_l0`</b>: Parameter, the input-hidden weights of the GRU layer.
* <b>`weight_hh_l0`</b>: Parameter, the hidden-hidden weights of the GRU layer.
* <b>`bias_ih_l0`</b>: Parameter, the input-hidden bias of the GRU layer.
* <b>`bias_hh_l0`</b>: Parameter, the hidden-hidden bias of the GRU layer.
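The layout difference behind this conversion can be sketched as follows: native PyTorch GRUs store `weight_ih_l0` as (3 * hidden_size, input_size) with `r,z,n` gate order (per the PyTorch docs), while Haste stores `kernel` as (input_size, 3 * hidden_size) with `z,r,h` order. An illustrative NumPy sketch of the kernel conversion (biases and recurrent weights are handled analogously; details of the actual method may differ):

```python
import numpy as np

input_size, hidden_size = 3, 2
H = hidden_size

# Haste layout: (input_size, 3*hidden_size), gate order z, r, h.
kernel = np.arange(input_size * 3 * H, dtype=np.float32).reshape(input_size, 3 * H)
W_z, W_r, W_h = kernel[:, :H], kernel[:, H:2*H], kernel[:, 2*H:]

# Native PyTorch layout: (3*hidden_size, input_size), gate order r, z, n.
# Reorder the gate blocks and transpose.
weight_ih_l0 = np.concatenate([W_r, W_z, W_h], axis=1).T
```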
<h3 id="train"><code><a name="train">train</a></code></h3>
``` python
train(mode=True)
```
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
#### Args:
mode (bool): whether to set training mode (``True``) or evaluation
mode (``False``). Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="type"><code><a name="type">type</a></code></h3>
``` python
type(dst_type)
```
Casts all parameters and buffers to :attr:`dst_type`.
#### Arguments:
dst_type (type or string): the desired type
#### Returns:
* <b>`Module`</b>: self
<h3 id="zero_grad"><code><a name="zero_grad">zero_grad</a></code></h3>
``` python
zero_grad()
```
Sets gradients of all model parameters to zero.
================================================
FILE: docs/pytorch/haste_pytorch/IndRNN.md
================================================
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="haste_pytorch.IndRNN" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="__call__"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="add_module"/>
<meta itemprop="property" content="apply"/>
<meta itemprop="property" content="buffers"/>
<meta itemprop="property" content="children"/>
<meta itemprop="property" content="cpu"/>
<meta itemprop="property" content="cuda"/>
<meta itemprop="property" content="double"/>
<meta itemprop="property" content="eval"/>
<meta itemprop="property" content="extra_repr"/>
<meta itemprop="property" content="float"/>
<meta itemprop="property" content="forward"/>
<meta itemprop="property" content="half"/>
<meta itemprop="property" content="load_state_dict"/>
<meta itemprop="property" content="modules"/>
<meta itemprop="property" content="named_buffers"/>
<meta itemprop="property" content="named_children"/>
<meta itemprop="property" content="named_modules"/>
<meta itemprop="property" content="named_parameters"/>
<meta itemprop="property" content="parameters"/>
<meta itemprop="property" content="register_backward_hook"/>
<meta itemprop="property" content="register_buffer"/>
<meta itemprop="property" content="register_forward_hook"/>
<meta itemprop="property" content="register_forward_pre_hook"/>
<meta itemprop="property" content="register_parameter"/>
<meta itemprop="property" content="requires_grad_"/>
<meta itemprop="property" content="reset_parameters"/>
<meta itemprop="property" content="share_memory"/>
<meta itemprop="property" content="state_dict"/>
<meta itemprop="property" content="to"/>
<meta itemprop="property" content="train"/>
<meta itemprop="property" content="type"/>
<meta itemprop="property" content="zero_grad"/>
</div>
# haste_pytorch.IndRNN
<!-- Insert buttons and diff -->
## Class `IndRNN`
Independently Recurrent Neural Network layer.
<!-- Placeholder for "Used in" -->
This layer offers a fused, GPU-accelerated PyTorch op for inference and
training. It also supports Zoneout regularization.
See [\_\_init\_\_](#__init__) and [forward](#forward) for usage.
<h2 id="__init__"><code><a name="__init__">__init__</a></code></h2>
``` python
__init__(
input_size,
hidden_size,
batch_first=False,
zoneout=0.0
)
```
Initialize the parameters of the IndRNN layer.
#### Arguments:
* <b>`input_size`</b>: int, the feature dimension of the input.
* <b>`hidden_size`</b>: int, the feature dimension of the output.
* <b>`batch_first`</b>: (optional) bool, if `True`, then the input and output
tensors are provided as `(batch, seq, feature)`.
* <b>`zoneout`</b>: (optional) float, sets the zoneout rate for Zoneout
regularization.
#### Variables:
* <b>`kernel`</b>: the input projection weight matrix. Dimensions
(input_size, hidden_size). Initialized with Xavier uniform
initialization.
* <b>`recurrent_scale`</b>: the recurrent scale weight vector. Dimensions
(hidden_size). Initialized uniformly in [-0.5, 0.5]. Note that this
initialization scheme is different than in the original authors'
implementation. See https://github.com/lmnt-com/haste/issues/7 for
details.
* <b>`bias`</b>: the RNN bias vector. Dimensions (hidden_size). Initialized to zeros.
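The recurrence these variables implement can be sketched in pure Python. This is a minimal reference, not the fused CUDA kernel; a `tanh` activation is assumed here for illustration:

``` python
import math

def indrnn_step(x, h_prev, kernel, recurrent_scale, bias):
    """One IndRNN time step: h_t = tanh(x_t @ kernel + recurrent_scale * h_{t-1} + bias).

    x: (input_size,) list
    h_prev, recurrent_scale, bias: (hidden_size,) lists
    kernel: (input_size, hidden_size) list of lists
    """
    input_size = len(x)
    hidden_size = len(bias)
    h = []
    for j in range(hidden_size):
        # Each hidden unit sees only its own previous activation,
        # scaled elementwise -- the "independently recurrent" part.
        acc = bias[j] + recurrent_scale[j] * h_prev[j]
        for i in range(input_size):
            acc += x[i] * kernel[i][j]
        h.append(math.tanh(acc))
    return h

# Toy step: input_size=2, hidden_size=3, zero initial state and bias.
h = indrnn_step([1.0, -1.0],
                [0.0, 0.0, 0.0],
                [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                [0.5, -0.5, 0.25],
                [0.0, 0.0, 0.0])
print(len(h))  # 3
```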
## Methods
<h3 id="__call__"><code><a name="__call__">__call__</a></code></h3>
``` python
__call__(
*input,
**kwargs
)
```
Call self as a function.
<h3 id="add_module"><code><a name="add_module">add_module</a></code></h3>
``` python
add_module(
name,
module
)
```
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
#### Args:
name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
<h3 id="apply"><code><a name="apply">apply</a></code></h3>
``` python
apply(fn)
```
Applies ``fn`` recursively to every submodule (as returned by ``.children()``)
as well as self. Typical use includes initializing the parameters of a model
(see also :ref:`nn-init-doc`).
#### Args:
fn (:class:`Module` -> None): function to be applied to each submodule
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> def init_weights(m):
>>> print(m)
>>> if type(m) == nn.Linear:
>>> m.weight.data.fill_(1.0)
>>> print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
```
<h3 id="buffers"><code><a name="buffers">buffers</a></code></h3>
``` python
buffers(recurse=True)
```
Returns an iterator over module buffers.
#### Args:
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`torch.Tensor`</b>: module buffer
Example::
```
>>> for buf in model.buffers():
>>> print(type(buf.data), buf.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="children"><code><a name="children">children</a></code></h3>
``` python
children()
```
Returns an iterator over immediate children modules.
#### Yields:
* <b>`Module`</b>: a child module
<h3 id="cpu"><code><a name="cpu">cpu</a></code></h3>
``` python
cpu()
```
Moves all model parameters and buffers to the CPU.
#### Returns:
* <b>`Module`</b>: self
<h3 id="cuda"><code><a name="cuda">cuda</a></code></h3>
``` python
cuda(device=None)
```
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So
it should be called before constructing the optimizer if the module will
live on the GPU while being optimized.
#### Arguments:
device (int, optional): if specified, all parameters will be
copied to that device
#### Returns:
* <b>`Module`</b>: self
<h3 id="double"><code><a name="double">double</a></code></h3>
``` python
double()
```
Casts all floating point parameters and buffers to ``double`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="eval"><code><a name="eval">eval</a></code></h3>
``` python
eval()
```
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
This is equivalent to :meth:`self.train(False) <torch.nn.Module.train>`.
#### Returns:
* <b>`Module`</b>: self
<h3 id="extra_repr"><code><a name="extra_repr">extra_repr</a></code></h3>
``` python
extra_repr()
```
Sets the extra representation of the module.
To print customized extra information, you should reimplement
this method in your own modules. Both single-line and multi-line
strings are acceptable.
<h3 id="float"><code><a name="float">float</a></code></h3>
``` python
float()
```
Casts all floating point parameters and buffers to float datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="forward"><code><a name="forward">forward</a></code></h3>
``` python
forward(
input,
state=None,
lengths=None
)
```
Runs a forward pass of the IndRNN layer.
#### Arguments:
* <b>`input`</b>: Tensor, a batch of input sequences to pass through the IndRNN.
Dimensions (seq_len, batch_size, input_size) if `batch_first` is
`False`, otherwise (batch_size, seq_len, input_size).
* <b>`state`</b>: (optional) Tensor, the initial state for each batch element in
`input`. Dimensions (1, batch_size, hidden_size). Defaults to zeros.
* <b>`lengths`</b>: (optional) Tensor, list of sequence lengths for each batch
element. Dimension (batch_size). This argument may be omitted if
all batch elements are unpadded and have the same sequence length.
#### Returns:
* <b>`output`</b>: Tensor, the output of the IndRNN layer. Dimensions
(seq_len, batch_size, hidden_size) if `batch_first` is `False` (default)
or (batch_size, seq_len, hidden_size) if `batch_first` is `True`. Note
that if `lengths` was specified, the `output` tensor will not be
masked. It's the caller's responsibility to either not use the invalid
entries or to mask them out before using them.
* <b>`state`</b>: the hidden state for the last sequence item. Dimensions
(1, batch_size, hidden_size).
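Since `output` is not masked when `lengths` is given, callers typically zero out the padded steps before any reduction over time. A minimal sketch with nested lists standing in for tensors of shape (seq_len, batch_size, hidden_size):

``` python
def mask_output(output, lengths):
    """Zero out time steps at or past each sequence's length.

    output: nested list with shape (seq_len, batch_size, hidden_size)
    lengths: list of ints, one per batch element
    """
    seq_len = len(output)
    masked = []
    for t in range(seq_len):
        step = []
        for b, h in enumerate(output[t]):
            # Keep valid steps, replace padded ones with zeros.
            step.append(h if t < lengths[b] else [0.0] * len(h))
        masked.append(step)
    return masked

# seq_len=3, batch_size=2, hidden_size=1; second sequence has length 2.
out = [[[1.0], [4.0]], [[2.0], [5.0]], [[3.0], [6.0]]]
masked = mask_output(out, [3, 2])
print(masked[2])  # [[3.0], [0.0]]
```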
<h3 id="half"><code><a name="half">half</a></code></h3>
``` python
half()
```
Casts all floating point parameters and buffers to ``half`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="load_state_dict"><code><a name="load_state_dict">load_state_dict</a></code></h3>
``` python
load_state_dict(
state_dict,
strict=True
)
```
Copies parameters and buffers from :attr:`state_dict` into
this module and its descendants. If :attr:`strict` is ``True``, then
the keys of :attr:`state_dict` must exactly match the keys returned
by this module's :meth:`~torch.nn.Module.state_dict` function.
#### Arguments:
state_dict (dict): a dict containing parameters and
persistent buffers.
strict (bool, optional): whether to strictly enforce that the keys
in :attr:`state_dict` match the keys returned by this module's
:meth:`~torch.nn.Module.state_dict` function. Default: ``True``
#### Returns:
``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields:
* **missing_keys** is a list of str containing the missing keys
* **unexpected_keys** is a list of str containing the unexpected keys
<h3 id="modules"><code><a name="modules">modules</a></code></h3>
``` python
modules()
```
Returns an iterator over all modules in the network.
#### Yields:
* <b>`Module`</b>: a module in the network
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
print(idx, '->', m)
```
0 -> Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
<h3 id="named_buffers"><code><a name="named_buffers">named_buffers</a></code></h3>
``` python
named_buffers(
prefix='',
recurse=True
)
```
Returns an iterator over module buffers, yielding both the
name of the buffer as well as the buffer itself.
#### Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`(string, torch.Tensor)`</b>: Tuple containing the name and buffer
Example::
```
>>> for name, buf in self.named_buffers():
>>> if name in ['running_var']:
>>> print(buf.size())
```
<h3 id="named_children"><code><a name="named_children">named_children</a></code></h3>
``` python
named_children()
```
Returns an iterator over immediate children modules, yielding both
the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple containing a name and child module
Example::
```
>>> for name, module in model.named_children():
>>> if name in ['conv4', 'conv5']:
>>> print(module)
```
<h3 id="named_modules"><code><a name="named_modules">named_modules</a></code></h3>
``` python
named_modules(
memo=None,
prefix=''
)
```
Returns an iterator over all modules in the network, yielding
both the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple of name and module
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
print(idx, '->', m)
```
0 -> ('', Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
<h3 id="named_parameters"><code><a name="named_parameters">named_parameters</a></code></h3>
``` python
named_parameters(
prefix='',
recurse=True
)
```
Returns an iterator over module parameters, yielding both the
name of the parameter as well as the parameter itself.
#### Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`(string, Parameter)`</b>: Tuple containing the name and parameter
Example::
```
>>> for name, param in self.named_parameters():
>>> if name in ['bias']:
>>> print(param.size())
```
<h3 id="parameters"><code><a name="parameters">parameters</a></code></h3>
``` python
parameters(recurse=True)
```
Returns an iterator over module parameters.
This is typically passed to an optimizer.
#### Args:
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`Parameter`</b>: module parameter
Example::
```
>>> for param in model.parameters():
>>> print(type(param.data), param.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="register_backward_hook"><code><a name="register_backward_hook">register_backward_hook</a></code></h3>
``` python
register_backward_hook(hook)
```
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module
inputs are computed. The hook should have the following signature::
hook(module, grad_input, grad_output) -> Tensor or None
The :attr:`grad_input` and :attr:`grad_output` may be tuples if the
module has multiple inputs or outputs. The hook should not modify its
arguments, but it can optionally return a new gradient with respect to
input that will be used in place of :attr:`grad_input` in subsequent
computations.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
.. warning ::
The current implementation will not have the presented behavior
for complex :class:`Module` that perform many operations.
In some failure cases, :attr:`grad_input` and :attr:`grad_output` will only
contain the gradients for a subset of the inputs and outputs.
For such :class:`Module`, you should use :func:`torch.Tensor.register_hook`
directly on a specific input or output to get the required gradients.
<h3 id="register_buffer"><code><a name="register_buffer">register_buffer</a></code></h3>
``` python
register_buffer(
name,
tensor
)
```
Adds a persistent buffer to the module.
This is typically used to register a buffer that should not be
considered a model parameter. For example, BatchNorm's ``running_mean``
is not a parameter, but is part of the persistent state.
Buffers can be accessed as attributes using given names.
#### Args:
name (string): name of the buffer. The buffer can be accessed
from this module using the given name
tensor (Tensor): buffer to be registered.
Example::
```
>>> self.register_buffer('running_mean', torch.zeros(num_features))
```
<h3 id="register_forward_hook"><code><a name="register_forward_hook">register_forward_hook</a></code></h3>
``` python
register_forward_hook(hook)
```
Registers a forward hook on the module.
The hook will be called every time after :func:`forward` has computed an output.
It should have the following signature::
hook(module, input, output) -> None or modified output
The hook can modify the output. It can also modify the input in-place,
but that will have no effect on forward since the hook is called after
:func:`forward` has already run.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_forward_pre_hook"><code><a name="register_forward_pre_hook">register_forward_pre_hook</a></code></h3>
``` python
register_forward_pre_hook(hook)
```
Registers a forward pre-hook on the module.
The hook will be called every time before :func:`forward` is invoked.
It should have the following signature::
hook(module, input) -> None or modified input
The hook can modify the input. The user can return either a tuple or a
single modified value from the hook. A single return value will be
wrapped into a tuple (unless it is already a tuple).
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_parameter"><code><a name="register_parameter">register_parameter</a></code></h3>
``` python
register_parameter(
name,
param
)
```
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
#### Args:
name (string): name of the parameter. The parameter can be accessed
from this module using the given name
param (Parameter): parameter to be added to the module.
<h3 id="requires_grad_"><code><a name="requires_grad_">requires_grad_</a></code></h3>
``` python
requires_grad_(requires_grad=True)
```
Changes whether autograd should record operations on parameters in this
module.
This method sets the parameters' :attr:`requires_grad` attributes
in-place.
This method is helpful for freezing part of the module for finetuning
or training parts of a model individually (e.g., GAN training).
#### Args:
requires_grad (bool): whether autograd should record operations on
parameters in this module. Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="reset_parameters"><code><a name="reset_parameters">reset_parameters</a></code></h3>
``` python
reset_parameters()
```
<h3 id="share_memory"><code><a name="share_memory">share_memory</a></code></h3>
``` python
share_memory()
```
<h3 id="state_dict"><code><a name="state_dict">state_dict</a></code></h3>
``` python
state_dict(
destination=None,
prefix='',
keep_vars=False
)
```
Returns a dictionary containing the whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are
included. Keys are the corresponding parameter and buffer names.
#### Returns:
* <b>`dict`</b>: a dictionary containing the whole state of the module
Example::
```
>>> module.state_dict().keys()
['bias', 'weight']
```
<h3 id="to"><code><a name="to">to</a></code></h3>
``` python
to(
*args,
**kwargs
)
```
Moves and/or casts the parameters and buffers.
This can be called as
.. function:: to(device=None, dtype=None, non_blocking=False)
.. function:: to(dtype, non_blocking=False)
.. function:: to(tensor, non_blocking=False)
Its signature is similar to :meth:`torch.Tensor.to`, but only accepts
floating point desired :attr:`dtype` s. In addition, this method will
only cast the floating point parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved to
:attr:`device`, if that is given, but with dtypes unchanged. When
:attr:`non_blocking` is set, it tries to convert/move asynchronously
with respect to the host if possible, e.g., moving CPU Tensors with
pinned memory to CUDA devices.
See below for examples.
.. note::
This method modifies the module in-place.
#### Args:
device (:class:`torch.device`): the desired device of the parameters
and buffers in this module
dtype (:class:`torch.dtype`): the desired floating point type of
the floating point parameters and buffers in this module
tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16)
```
<h3 id="train"><code><a name="train">train</a></code></h3>
``` python
train(mode=True)
```
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
#### Args:
mode (bool): whether to set training mode (``True``) or evaluation
mode (``False``). Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="type"><code><a name="type">type</a></code></h3>
``` python
type(dst_type)
```
Casts all parameters and buffers to :attr:`dst_type`.
#### Arguments:
dst_type (type or string): the desired type
#### Returns:
* <b>`Module`</b>: self
<h3 id="zero_grad"><code><a name="zero_grad">zero_grad</a></code></h3>
``` python
zero_grad()
```
Sets gradients of all model parameters to zero.
================================================
FILE: docs/pytorch/haste_pytorch/LSTM.md
================================================
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="haste_pytorch.LSTM" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="__call__"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="add_module"/>
<meta itemprop="property" content="apply"/>
<meta itemprop="property" content="buffers"/>
<meta itemprop="property" content="children"/>
<meta itemprop="property" content="cpu"/>
<meta itemprop="property" content="cuda"/>
<meta itemprop="property" content="double"/>
<meta itemprop="property" content="eval"/>
<meta itemprop="property" content="extra_repr"/>
<meta itemprop="property" content="float"/>
<meta itemprop="property" content="forward"/>
<meta itemprop="property" content="from_native_weights"/>
<meta itemprop="property" content="half"/>
<meta itemprop="property" content="load_state_dict"/>
<meta itemprop="property" content="modules"/>
<meta itemprop="property" content="named_buffers"/>
<meta itemprop="property" content="named_children"/>
<meta itemprop="property" content="named_modules"/>
<meta itemprop="property" content="named_parameters"/>
<meta itemprop="property" content="parameters"/>
<meta itemprop="property" content="register_backward_hook"/>
<meta itemprop="property" content="register_buffer"/>
<meta itemprop="property" content="register_forward_hook"/>
<meta itemprop="property" content="register_forward_pre_hook"/>
<meta itemprop="property" content="register_parameter"/>
<meta itemprop="property" content="requires_grad_"/>
<meta itemprop="property" content="reset_parameters"/>
<meta itemprop="property" content="share_memory"/>
<meta itemprop="property" content="state_dict"/>
<meta itemprop="property" content="to"/>
<meta itemprop="property" content="to_native_weights"/>
<meta itemprop="property" content="train"/>
<meta itemprop="property" content="type"/>
<meta itemprop="property" content="zero_grad"/>
</div>
# haste_pytorch.LSTM
<!-- Insert buttons and diff -->
## Class `LSTM`
Long Short-Term Memory layer.
<!-- Placeholder for "Used in" -->
This LSTM layer offers a fused, GPU-accelerated PyTorch op for inference
and training. Although this implementation is comparable in performance to
cuDNN's LSTM, it offers additional options not typically found in other
high-performance implementations. DropConnect and Zoneout regularization are
built-in, and this layer allows setting a non-zero initial forget gate bias.
See [\_\_init\_\_](#__init__) and [forward](#forward) for general usage.
See [from_native_weights](#from_native_weights) and
[to_native_weights](#to_native_weights) for compatibility with PyTorch LSTMs.
<h2 id="__init__"><code><a name="__init__">__init__</a></code></h2>
``` python
__init__(
input_size,
hidden_size,
batch_first=False,
forget_bias=1.0,
dropout=0.0,
zoneout=0.0
)
```
Initialize the parameters of the LSTM layer.
#### Arguments:
* <b>`input_size`</b>: int, the feature dimension of the input.
* <b>`hidden_size`</b>: int, the feature dimension of the output.
* <b>`batch_first`</b>: (optional) bool, if `True`, then the input and output
tensors are provided as `(batch, seq, feature)`.
* <b>`forget_bias`</b>: (optional) float, sets the initial bias of the forget gate
for this LSTM cell.
* <b>`dropout`</b>: (optional) float, sets the dropout rate for DropConnect
regularization on the recurrent matrix.
* <b>`zoneout`</b>: (optional) float, sets the zoneout rate for Zoneout
regularization.
#### Variables:
* <b>`kernel`</b>: the input projection weight matrix. Dimensions
(input_size, hidden_size * 4) with `i,g,f,o` gate layout. Initialized
with Xavier uniform initialization.
* <b>`recurrent_kernel`</b>: the recurrent projection weight matrix. Dimensions
(hidden_size, hidden_size * 4) with `i,g,f,o` gate layout. Initialized
with orthogonal initialization.
* <b>`bias`</b>: the projection bias vector. Dimensions (hidden_size * 4) with
`i,g,f,o` gate layout. The forget gate biases are initialized to
`forget_bias` and the rest are zeros.
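The `i,g,f,o` gate layout and the effect of `forget_bias` can be illustrated with a single pure-Python cell step. This is a reference sketch, not the fused kernel; the per-gate pre-activations are taken as the already-computed sums `x @ kernel + h @ recurrent_kernel + bias`:

``` python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(pre, c_prev):
    """One LSTM step from pre-activations laid out as [i, g, f, o] blocks.

    pre: list of length 4 * hidden_size
    c_prev: list of length hidden_size
    Returns (h, c) for this time step.
    """
    n = len(c_prev)
    i = [sigmoid(v) for v in pre[0:n]]        # input gate
    g = [math.tanh(v) for v in pre[n:2*n]]    # candidate cell
    f = [sigmoid(v) for v in pre[2*n:3*n]]    # forget gate
    o = [sigmoid(v) for v in pre[3*n:4*n]]    # output gate
    c = [f[j] * c_prev[j] + i[j] * g[j] for j in range(n)]
    h = [o[j] * math.tanh(c[j]) for j in range(n)]
    return h, c

# hidden_size=1 with zero weights: the forget pre-activation is just the
# bias, so forget_bias=1.0 gives f = sigmoid(1) ~ 0.731, biasing the cell
# toward remembering its previous state early in training.
h, c = lstm_step([0.0, 0.0, 1.0, 0.0], [1.0])
print(round(c[0], 3))  # 0.731
```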
## Methods
<h3 id="__call__"><code><a name="__call__">__call__</a></code></h3>
``` python
__call__(
*input,
**kwargs
)
```
Call self as a function.
<h3 id="add_module"><code><a name="add_module">add_module</a></code></h3>
``` python
add_module(
name,
module
)
```
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
#### Args:
name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
<h3 id="apply"><code><a name="apply">apply</a></code></h3>
``` python
apply(fn)
```
Applies ``fn`` recursively to every submodule (as returned by ``.children()``)
as well as self. Typical use includes initializing the parameters of a model
(see also :ref:`nn-init-doc`).
#### Args:
fn (:class:`Module` -> None): function to be applied to each submodule
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> def init_weights(m):
>>> print(m)
>>> if type(m) == nn.Linear:
>>> m.weight.data.fill_(1.0)
>>> print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
```
<h3 id="buffers"><code><a name="buffers">buffers</a></code></h3>
``` python
buffers(recurse=True)
```
Returns an iterator over module buffers.
#### Args:
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`torch.Tensor`</b>: module buffer
Example::
```
>>> for buf in model.buffers():
>>> print(type(buf.data), buf.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="children"><code><a name="children">children</a></code></h3>
``` python
children()
```
Returns an iterator over immediate children modules.
#### Yields:
* <b>`Module`</b>: a child module
<h3 id="cpu"><code><a name="cpu">cpu</a></code></h3>
``` python
cpu()
```
Moves all model parameters and buffers to the CPU.
#### Returns:
* <b>`Module`</b>: self
<h3 id="cuda"><code><a name="cuda">cuda</a></code></h3>
``` python
cuda(device=None)
```
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So
it should be called before constructing the optimizer if the module will
live on the GPU while being optimized.
#### Arguments:
device (int, optional): if specified, all parameters will be
copied to that device
#### Returns:
* <b>`Module`</b>: self
<h3 id="double"><code><a name="double">double</a></code></h3>
``` python
double()
```
Casts all floating point parameters and buffers to ``double`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="eval"><code><a name="eval">eval</a></code></h3>
``` python
eval()
```
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
This is equivalent to :meth:`self.train(False) <torch.nn.Module.train>`.
#### Returns:
* <b>`Module`</b>: self
<h3 id="extra_repr"><code><a name="extra_repr">extra_repr</a></code></h3>
``` python
extra_repr()
```
Sets the extra representation of the module.
To print customized extra information, you should reimplement
this method in your own modules. Both single-line and multi-line
strings are acceptable.
<h3 id="float"><code><a name="float">float</a></code></h3>
``` python
float()
```
Casts all floating point parameters and buffers to float datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="forward"><code><a name="forward">forward</a></code></h3>
``` python
forward(
input,
state=None,
lengths=None
)
```
Runs a forward pass of the LSTM layer.
#### Arguments:
* <b>`input`</b>: Tensor, a batch of input sequences to pass through the LSTM.
  Dimensions (seq_len, batch_size, input_size) if `batch_first` is
  `False`, otherwise (batch_size, seq_len, input_size).
* <b>`state`</b>: (optional) tuple of Tensors `(h_0, c_0)`, the initial hidden
  and cell states for each batch element in `input`. Dimensions
  (1, batch_size, hidden_size) each. Defaults to zeros.
* <b>`lengths`</b>: (optional) Tensor, list of sequence lengths for each batch
  element. Dimension (batch_size). This argument may be omitted if
  all batch elements are unpadded and have the same sequence length.
#### Returns:
* <b>`output`</b>: Tensor, the output of the LSTM layer. Dimensions
(seq_len, batch_size, hidden_size) if `batch_first` is `False` (default)
or (batch_size, seq_len, hidden_size) if `batch_first` is `True`. Note
that if `lengths` was specified, the `output` tensor will not be
masked. It's the caller's responsibility to either not use the invalid
entries or to mask them out before using them.
* <b>`(h_n, c_n)`</b>: the hidden and cell states, respectively, for the last
sequence item. Dimensions (1, batch_size, hidden_size).
<h3 id="from_native_weights"><code><a name="from_native_weights">from_native_weights</a></code></h3>
``` python
from_native_weights(
weight_ih_l0,
weight_hh_l0,
bias_ih_l0,
bias_hh_l0
)
```
Copies and converts the provided PyTorch LSTM weights into this layer.
#### Arguments:
* <b>`weight_ih_l0`</b>: Parameter, the input-hidden weights of the PyTorch LSTM layer.
* <b>`weight_hh_l0`</b>: Parameter, the hidden-hidden weights of the PyTorch LSTM layer.
* <b>`bias_ih_l0`</b>: Parameter, the input-hidden bias of the PyTorch LSTM layer.
* <b>`bias_hh_l0`</b>: Parameter, the hidden-hidden bias of the PyTorch LSTM layer.
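PyTorch's native LSTM lays out its gate blocks in `i,f,g,o` order, whereas this layer uses `i,g,f,o`, so the conversion has to permute the gate blocks (the real method also handles transposition, summing the two native bias vectors, and device placement). A toy sketch of just the reordering, on a flat per-gate list:

``` python
def reorder_ifgo_to_igfo(flat, hidden_size):
    """Permute gate blocks of a flat vector from PyTorch's i,f,g,o
    layout to haste's i,g,f,o layout."""
    n = hidden_size
    # Split into the four contiguous gate blocks, then reassemble.
    i, f, g, o = (flat[k * n:(k + 1) * n] for k in range(4))
    return i + g + f + o

# hidden_size=2; entries labeled by gate for clarity.
native = ['i0', 'i1', 'f0', 'f1', 'g0', 'g1', 'o0', 'o1']
print(reorder_ifgo_to_igfo(native, 2))
# ['i0', 'i1', 'g0', 'g1', 'f0', 'f1', 'o0', 'o1']
```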
<h3 id="half"><code><a name="half">half</a></code></h3>
``` python
half()
```
Casts all floating point parameters and buffers to ``half`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="load_state_dict"><code><a name="load_state_dict">load_state_dict</a></code></h3>
``` python
load_state_dict(
state_dict,
strict=True
)
```
Copies parameters and buffers from :attr:`state_dict` into
this module and its descendants. If :attr:`strict` is ``True``, then
the keys of :attr:`state_dict` must exactly match the keys returned
by this module's :meth:`~torch.nn.Module.state_dict` function.
#### Arguments:
state_dict (dict): a dict containing parameters and
persistent buffers.
strict (bool, optional): whether to strictly enforce that the keys
in :attr:`state_dict` match the keys returned by this module's
:meth:`~torch.nn.Module.state_dict` function. Default: ``True``
#### Returns:
``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields:
* **missing_keys** is a list of str containing the missing keys
* **unexpected_keys** is a list of str containing the unexpected keys
<h3 id="modules"><code><a name="modules">modules</a></code></h3>
``` python
modules()
```
Returns an iterator over all modules in the network.
#### Yields:
* <b>`Module`</b>: a module in the network
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
print(idx, '->', m)
```
0 -> Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
<h3 id="named_buffers"><code><a name="named_buffers">named_buffers</a></code></h3>
``` python
named_buffers(
prefix='',
recurse=True
)
```
Returns an iterator over module buffers, yielding both the
name of the buffer as well as the buffer itself.
#### Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`(string, torch.Tensor)`</b>: Tuple containing the name and buffer
Example::
```
>>> for name, buf in self.named_buffers():
>>> if name in ['running_var']:
>>> print(buf.size())
```
<h3 id="named_children"><code><a name="named_children">named_children</a></code></h3>
``` python
named_children()
```
Returns an iterator over immediate children modules, yielding both
the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple containing a name and child module
Example::
```
>>> for name, module in model.named_children():
>>> if name in ['conv4', 'conv5']:
>>> print(module)
```
<h3 id="named_modules"><code><a name="named_modules">named_modules</a></code></h3>
``` python
named_modules(
memo=None,
prefix=''
)
```
Returns an iterator over all modules in the network, yielding
both the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple of name and module
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
```
<h3 id="named_parameters"><code><a name="named_parameters">named_parameters</a></code></h3>
``` python
named_parameters(
prefix='',
recurse=True
)
```
Returns an iterator over module parameters, yielding both the
name of the parameter as well as the parameter itself.
#### Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`(string, Parameter)`</b>: Tuple containing the name and parameter
Example::
```
>>> for name, param in self.named_parameters():
>>> if name in ['bias']:
>>> print(param.size())
```
<h3 id="parameters"><code><a name="parameters">parameters</a></code></h3>
``` python
parameters(recurse=True)
```
Returns an iterator over module parameters.
This is typically passed to an optimizer.
#### Args:
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`Parameter`</b>: module parameter
Example::
```
>>> for param in model.parameters():
>>> print(type(param.data), param.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="register_backward_hook"><code><a name="register_backward_hook">register_backward_hook</a></code></h3>
``` python
register_backward_hook(hook)
```
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module
inputs are computed. The hook should have the following signature::
hook(module, grad_input, grad_output) -> Tensor or None
The :attr:`grad_input` and :attr:`grad_output` may be tuples if the
module has multiple inputs or outputs. The hook should not modify its
arguments, but it can optionally return a new gradient with respect to
input that will be used in place of :attr:`grad_input` in subsequent
computations.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
.. warning::
The current implementation will not have the presented behavior
for complex :class:`Module`s that perform many operations.
In some failure cases, :attr:`grad_input` and :attr:`grad_output` will only
contain the gradients for a subset of the inputs and outputs.
For such modules, you should use :func:`torch.Tensor.register_hook`
directly on a specific input or output to get the required gradients.
<h3 id="register_buffer"><code><a name="register_buffer">register_buffer</a></code></h3>
``` python
register_buffer(
name,
tensor
)
```
Adds a persistent buffer to the module.
This is typically used to register a buffer that should not be
considered a model parameter. For example, BatchNorm's ``running_mean``
is not a parameter, but it is part of the module's persistent state.
Buffers can be accessed as attributes using the given names.
#### Args:
name (string): name of the buffer. The buffer can be accessed
from this module using the given name
tensor (Tensor): buffer to be registered.
Example::
```
>>> self.register_buffer('running_mean', torch.zeros(num_features))
```
<h3 id="register_forward_hook"><code><a name="register_forward_hook">register_forward_hook</a></code></h3>
``` python
register_forward_hook(hook)
```
Registers a forward hook on the module.
The hook will be called every time after :func:`forward` has computed an output.
It should have the following signature::
hook(module, input, output) -> None or modified output
The hook can modify the output. It can also modify the input in-place,
but that has no effect on the forward pass, since the hook is called
after :func:`forward` has already run.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
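For example (a generic sketch with ``nn.Linear``, not this layer specifically), a forward hook can capture the output shape of a module:

``` python
import torch
import torch.nn as nn

captured = {}

def shape_hook(module, inputs, output):
    # called after forward(); record the output shape
    captured['shape'] = output.shape

layer = nn.Linear(4, 3)
handle = layer.register_forward_hook(shape_hook)
layer(torch.randn(2, 4))
handle.remove()                 # the hook no longer fires
print(captured['shape'])        # torch.Size([2, 3])
```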
<h3 id="register_forward_pre_hook"><code><a name="register_forward_pre_hook">register_forward_pre_hook</a></code></h3>
``` python
register_forward_pre_hook(hook)
```
Registers a forward pre-hook on the module.
The hook will be called every time before :func:`forward` is invoked.
It should have the following signature::
hook(module, input) -> None or modified input
The hook can modify the input. The user can return either a tuple or a
single modified value from the hook. A single return value will be
wrapped into a tuple (unless it is already a tuple).
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_parameter"><code><a name="register_parameter">register_parameter</a></code></h3>
``` python
register_parameter(
name,
param
)
```
Adds a parameter to the module.
The parameter can be accessed as an attribute using the given name.
#### Args:
name (string): name of the parameter. The parameter can be accessed
from this module using the given name
param (Parameter): parameter to be added to the module.
<h3 id="requires_grad_"><code><a name="requires_grad_">requires_grad_</a></code></h3>
``` python
requires_grad_(requires_grad=True)
```
Changes whether autograd should record operations on parameters in this
module.
This method sets the parameters' :attr:`requires_grad` attributes
in-place.
This method is helpful for freezing part of the module for finetuning
or training parts of a model individually (e.g., GAN training).
#### Args:
requires_grad (bool): whether autograd should record operations on
parameters in this module. Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
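A common use is freezing part of a model; a minimal sketch with a two-layer ``nn.Sequential``:

``` python
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
model[0].requires_grad_(False)  # freeze the first layer in-place

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)                # ['1.weight', '1.bias']
```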
<h3 id="reset_parameters"><code><a name="reset_parameters">reset_parameters</a></code></h3>
``` python
reset_parameters()
```
Resets this layer's parameters to their initial values.
<h3 id="share_memory"><code><a name="share_memory">share_memory</a></code></h3>
``` python
share_memory()
```
<h3 id="state_dict"><code><a name="state_dict">state_dict</a></code></h3>
``` python
state_dict(
destination=None,
prefix='',
keep_vars=False
)
```
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are
included. Keys are corresponding parameter and buffer names.
#### Returns:
* <b>`dict`</b>: a dictionary containing a whole state of the module
Example::
```
>>> module.state_dict().keys()
['bias', 'weight']
```
<h3 id="to"><code><a name="to">to</a></code></h3>
``` python
to(
*args,
**kwargs
)
```
Moves and/or casts the parameters and buffers.
This can be called as
.. function:: to(device=None, dtype=None, non_blocking=False)
.. function:: to(dtype, non_blocking=False)
.. function:: to(tensor, non_blocking=False)
Its signature is similar to :meth:`torch.Tensor.to`, but it only accepts
floating point :attr:`dtype`s. In addition, this method will
only cast the floating point parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved to
:attr:`device`, if that is given, but with dtypes unchanged. When
:attr:`non_blocking` is set, it tries to convert/move asynchronously
with respect to the host if possible, e.g., moving CPU Tensors with
pinned memory to CUDA devices.
See below for examples.
.. note::
This method modifies the module in-place.
#### Args:
device (:class:`torch.device`): the desired device of the parameters
and buffers in this module
dtype (:class:`torch.dtype`): the desired floating point type of
the floating point parameters and buffers in this module
tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16)
```
<h3 id="to_native_weights"><code><a name="to_native_weights">to_native_weights</a></code></h3>
``` python
to_native_weights()
```
Converts Haste LSTM weights to native PyTorch LSTM weights.
#### Returns:
* <b>`weight_ih_l0`</b>: Parameter, the input-hidden weights of the LSTM layer.
* <b>`weight_hh_l0`</b>: Parameter, the hidden-hidden weights of the LSTM layer.
* <b>`bias_ih_l0`</b>: Parameter, the input-hidden bias of the LSTM layer.
* <b>`bias_hh_l0`</b>: Parameter, the hidden-hidden bias of the LSTM layer.
<h3 id="train"><code><a name="train">train</a></code></h3>
``` python
train(mode=True)
```
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
#### Args:
mode (bool): whether to set training mode (``True``) or evaluation
mode (``False``). Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="type"><code><a name="type">type</a></code></h3>
``` python
type(dst_type)
```
Casts all parameters and buffers to :attr:`dst_type`.
#### Arguments:
dst_type (type or string): the desired type
#### Returns:
* <b>`Module`</b>: self
<h3 id="zero_grad"><code><a name="zero_grad">zero_grad</a></code></h3>
``` python
zero_grad()
```
Sets gradients of all model parameters to zero.
================================================
FILE: docs/pytorch/haste_pytorch/LayerNormGRU.md
================================================
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="haste_pytorch.LayerNormGRU" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="__call__"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="add_module"/>
<meta itemprop="property" content="apply"/>
<meta itemprop="property" content="buffers"/>
<meta itemprop="property" content="children"/>
<meta itemprop="property" content="cpu"/>
<meta itemprop="property" content="cuda"/>
<meta itemprop="property" content="double"/>
<meta itemprop="property" content="eval"/>
<meta itemprop="property" content="extra_repr"/>
<meta itemprop="property" content="float"/>
<meta itemprop="property" content="forward"/>
<meta itemprop="property" content="half"/>
<meta itemprop="property" content="load_state_dict"/>
<meta itemprop="property" content="modules"/>
<meta itemprop="property" content="named_buffers"/>
<meta itemprop="property" content="named_children"/>
<meta itemprop="property" content="named_modules"/>
<meta itemprop="property" content="named_parameters"/>
<meta itemprop="property" content="parameters"/>
<meta itemprop="property" content="register_backward_hook"/>
<meta itemprop="property" content="register_buffer"/>
<meta itemprop="property" content="register_forward_hook"/>
<meta itemprop="property" content="register_forward_pre_hook"/>
<meta itemprop="property" content="register_parameter"/>
<meta itemprop="property" content="requires_grad_"/>
<meta itemprop="property" content="reset_parameters"/>
<meta itemprop="property" content="share_memory"/>
<meta itemprop="property" content="state_dict"/>
<meta itemprop="property" content="to"/>
<meta itemprop="property" content="train"/>
<meta itemprop="property" content="type"/>
<meta itemprop="property" content="zero_grad"/>
</div>
# haste_pytorch.LayerNormGRU
<!-- Insert buttons and diff -->
## Class `LayerNormGRU`
Layer Normalized Gated Recurrent Unit layer.
<!-- Placeholder for "Used in" -->
This GRU layer applies layer normalization to the input and recurrent output
activations of a standard GRU. The implementation is fused and
GPU-accelerated. There are two commonly-used variants of GRU cells. This one
implements arXiv:1406.1078v1, which applies the reset gate to the hidden state
after the matrix multiplication. The other variant, arXiv:1406.1078v3, applies
the reset gate before the matrix multiplication and is currently unsupported.
This layer has built-in support for DropConnect and Zoneout, which are
both techniques used to regularize RNNs.
See [\_\_init\_\_](#__init__) and [forward](#forward) for usage.
<h2 id="__init__"><code><a name="__init__">__init__</a></code></h2>
``` python
__init__(
input_size,
hidden_size,
batch_first=False,
dropout=0.0,
zoneout=0.0
)
```
Initialize the parameters of the GRU layer.
#### Arguments:
* <b>`input_size`</b>: int, the feature dimension of the input.
* <b>`hidden_size`</b>: int, the feature dimension of the output.
* <b>`batch_first`</b>: (optional) bool, if `True`, then the input and output
tensors are provided as `(batch, seq, feature)`.
* <b>`dropout`</b>: (optional) float, sets the dropout rate for DropConnect
regularization on the recurrent matrix.
* <b>`zoneout`</b>: (optional) float, sets the zoneout rate for Zoneout
regularization.
#### Variables:
* <b>`kernel`</b>: the input projection weight matrix. Dimensions
(input_size, hidden_size * 3) with `z,r,h` gate layout. Initialized
with Xavier uniform initialization.
* <b>`recurrent_kernel`</b>: the recurrent projection weight matrix. Dimensions
(hidden_size, hidden_size * 3) with `z,r,h` gate layout. Initialized
with orthogonal initialization.
* <b>`bias`</b>: the input projection bias vector. Dimensions (hidden_size * 3) with
`z,r,h` gate layout. Initialized to zeros.
* <b>`recurrent_bias`</b>: the recurrent projection bias vector. Dimensions
(hidden_size * 3) with `z,r,h` gate layout. Initialized to zeros.
* <b>`gamma`</b>: the input and recurrent normalization gain. Dimensions
  (2, hidden_size * 3) with `gamma[0]` specifying the input gain and
  `gamma[1]` specifying the recurrent gain. Initialized to ones.
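The `z,r,h` gate layout means each weight matrix packs the three gate projections side by side along its last dimension. A plain-torch sketch of that layout (illustrative only; the variable names here are hypothetical, not haste's internals):

``` python
import torch

input_size, hidden_size = 4, 3
# the kernel packs the z, r, and h projections along the last dimension
kernel = torch.empty(input_size, hidden_size * 3)
z_w, r_w, h_w = kernel.chunk(3, dim=1)
print(z_w.shape)                # torch.Size([4, 3])
```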
## Methods
<h3 id="__call__"><code><a name="__call__">__call__</a></code></h3>
``` python
__call__(
*input,
**kwargs
)
```
Call self as a function.
<h3 id="add_module"><code><a name="add_module">add_module</a></code></h3>
``` python
add_module(
name,
module
)
```
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
#### Args:
name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
<h3 id="apply"><code><a name="apply">apply</a></code></h3>
``` python
apply(fn)
```
Applies ``fn`` recursively to every submodule (as returned by ``.children()``)
as well as self. Typical use includes initializing the parameters of a model
(see also :ref:`nn-init-doc`).
#### Args:
fn (:class:`Module` -> None): function to be applied to each submodule
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> def init_weights(m):
>>> print(m)
>>> if type(m) == nn.Linear:
>>> m.weight.data.fill_(1.0)
>>> print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
```
<h3 id="buffers"><code><a name="buffers">buffers</a></code></h3>
``` python
buffers(recurse=True)
```
Returns an iterator over module buffers.
#### Args:
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`torch.Tensor`</b>: module buffer
Example::
```
>>> for buf in model.buffers():
>>> print(type(buf.data), buf.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="children"><code><a name="children">children</a></code></h3>
``` python
children()
```
Returns an iterator over immediate children modules.
#### Yields:
* <b>`Module`</b>: a child module
<h3 id="cpu"><code><a name="cpu">cpu</a></code></h3>
``` python
cpu()
```
Moves all model parameters and buffers to the CPU.
#### Returns:
* <b>`Module`</b>: self
<h3 id="cuda"><code><a name="cuda">cuda</a></code></h3>
``` python
cuda(device=None)
```
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects, so
it should be called before constructing the optimizer if the module will
live on the GPU while being optimized.
#### Arguments:
device (int, optional): if specified, all parameters will be
copied to that device
#### Returns:
* <b>`Module`</b>: self
<h3 id="double"><code><a name="double">double</a></code></h3>
``` python
double()
```
Casts all floating point parameters and buffers to ``double`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="eval"><code><a name="eval">eval</a></code></h3>
``` python
eval()
```
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
This is equivalent to :meth:`self.train(False) <torch.nn.Module.train>`.
#### Returns:
* <b>`Module`</b>: self
<h3 id="extra_repr"><code><a name="extra_repr">extra_repr</a></code></h3>
``` python
extra_repr()
```
Sets the extra representation of the module.
To print customized extra information, you should reimplement
this method in your own modules. Both single-line and multi-line
strings are acceptable.
<h3 id="float"><code><a name="float">float</a></code></h3>
``` python
float()
```
Casts all floating point parameters and buffers to float datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="forward"><code><a name="forward">forward</a></code></h3>
``` python
forward(
input,
state=None,
lengths=None
)
```
Runs a forward pass of the GRU layer.
#### Arguments:
* <b>`input`</b>: Tensor, a batch of input sequences to pass through the GRU.
Dimensions (seq_len, batch_size, input_size) if `batch_first` is
`False`, otherwise (batch_size, seq_len, input_size).
* <b>`state`</b>: (optional) Tensor, the initial state for each batch element in
  `input`. Dimensions (1, batch_size, hidden_size). Defaults to zeros.
* <b>`lengths`</b>: (optional) Tensor, list of sequence lengths for each batch
element. Dimension (batch_size). This argument may be omitted if
all batch elements are unpadded and have the same sequence length.
#### Returns:
* <b>`output`</b>: Tensor, the output of the GRU layer. Dimensions
(seq_len, batch_size, hidden_size) if `batch_first` is `False` (default)
or (batch_size, seq_len, hidden_size) if `batch_first` is `True`. Note
that if `lengths` was specified, the `output` tensor will not be
masked. It's the caller's responsibility to either not use the invalid
entries or to mask them out before using them.
* <b>`h_n`</b>: the hidden state for the last sequence item. Dimensions
(1, batch_size, hidden_size).
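The shape convention matches `torch.nn.GRU`. Since this layer requires a CUDA device, the following CPU sketch uses `nn.GRU` (not haste itself) merely to illustrate the same input/output dimensions:

``` python
import torch
import torch.nn as nn

seq_len, batch_size, input_size, hidden_size = 5, 2, 4, 3
gru = nn.GRU(input_size, hidden_size)   # batch_first=False, as here
x = torch.randn(seq_len, batch_size, input_size)
output, h_n = gru(x)
print(output.shape)             # torch.Size([5, 2, 3])
print(h_n.shape)                # torch.Size([1, 2, 3])
```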
<h3 id="half"><code><a name="half">half</a></code></h3>
``` python
half()
```
Casts all floating point parameters and buffers to ``half`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="load_state_dict"><code><a name="load_state_dict">load_state_dict</a></code></h3>
``` python
load_state_dict(
state_dict,
strict=True
)
```
Copies parameters and buffers from :attr:`state_dict` into
this module and its descendants. If :attr:`strict` is ``True``, then
the keys of :attr:`state_dict` must exactly match the keys returned
by this module's :meth:`~torch.nn.Module.state_dict` function.
#### Arguments:
state_dict (dict): a dict containing parameters and
persistent buffers.
strict (bool, optional): whether to strictly enforce that the keys
in :attr:`state_dict` match the keys returned by this module's
:meth:`~torch.nn.Module.state_dict` function. Default: ``True``
#### Returns:
``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields:
* **missing_keys** is a list of str containing the missing keys
* **unexpected_keys** is a list of str containing the unexpected keys
<h3 id="modules"><code><a name="modules">modules</a></code></h3>
``` python
modules()
```
Returns an iterator over all modules in the network.
#### Yields:
* <b>`Module`</b>: a module in the network
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
```
<h3 id="named_buffers"><code><a name="named_buffers">named_buffers</a></code></h3>
``` python
named_buffers(
prefix='',
recurse=True
)
```
Returns an iterator over module buffers, yielding both the
name of the buffer as well as the buffer itself.
#### Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`(string, torch.Tensor)`</b>: Tuple containing the name and buffer
Example::
```
>>> for name, buf in self.named_buffers():
>>> if name in ['running_var']:
>>> print(buf.size())
```
<h3 id="named_children"><code><a name="named_children">named_children</a></code></h3>
``` python
named_children()
```
Returns an iterator over immediate children modules, yielding both
the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple containing a name and child module
Example::
```
>>> for name, module in model.named_children():
>>> if name in ['conv4', 'conv5']:
>>> print(module)
```
<h3 id="named_modules"><code><a name="named_modules">named_modules</a></code></h3>
``` python
named_modules(
memo=None,
prefix=''
)
```
Returns an iterator over all modules in the network, yielding
both the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple of name and module
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
```
<h3 id="named_parameters"><code><a name="named_parameters">named_parameters</a></code></h3>
``` python
named_parameters(
prefix='',
recurse=True
)
```
Returns an iterator over module parameters, yielding both the
name of the parameter as well as the parameter itself.
#### Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`(string, Parameter)`</b>: Tuple containing the name and parameter
Example::
```
>>> for name, param in self.named_parameters():
>>> if name in ['bias']:
>>> print(param.size())
```
<h3 id="parameters"><code><a name="parameters">parameters</a></code></h3>
``` python
parameters(recurse=True)
```
Returns an iterator over module parameters.
This is typically passed to an optimizer.
#### Args:
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`Parameter`</b>: module parameter
Example::
```
>>> for param in model.parameters():
>>> print(type(param.data), param.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="register_backward_hook"><code><a name="register_backward_hook">register_backward_hook</a></code></h3>
``` python
register_backward_hook(hook)
```
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module
inputs are computed. The hook should have the following signature::
hook(module, grad_input, grad_output) -> Tensor or None
The :attr:`grad_input` and :attr:`grad_output` may be tuples if the
module has multiple inputs or outputs. The hook should not modify its
arguments, but it can optionally return a new gradient with respect to
input that will be used in place of :attr:`grad_input` in subsequent
computations.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
.. warning::
The current implementation will not have the presented behavior
for complex :class:`Module`s that perform many operations.
In some failure cases, :attr:`grad_input` and :attr:`grad_output` will only
contain the gradients for a subset of the inputs and outputs.
For such modules, you should use :func:`torch.Tensor.register_hook`
directly on a specific input or output to get the required gradients.
<h3 id="register_buffer"><code><a name="register_buffer">register_buffer</a></code></h3>
``` python
register_buffer(
name,
tensor
)
```
Adds a persistent buffer to the module.
This is typically used to register a buffer that should not be
considered a model parameter. For example, BatchNorm's ``running_mean``
is not a parameter, but it is part of the module's persistent state.
Buffers can be accessed as attributes using the given names.
#### Args:
name (string): name of the buffer. The buffer can be accessed
from this module using the given name
tensor (Tensor): buffer to be registered.
Example::
```
>>> self.register_buffer('running_mean', torch.zeros(num_features))
```
<h3 id="register_forward_hook"><code><a name="register_forward_hook">register_forward_hook</a></code></h3>
``` python
register_forward_hook(hook)
```
Registers a forward hook on the module.
The hook will be called every time after :func:`forward` has computed an output.
It should have the following signature::
hook(module, input, output) -> None or modified output
The hook can modify the output. It can also modify the input in-place,
but that has no effect on the forward pass, since the hook is called
after :func:`forward` has already run.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_forward_pre_hook"><code><a name="register_forward_pre_hook">register_forward_pre_hook</a></code></h3>
``` python
register_forward_pre_hook(hook)
```
Registers a forward pre-hook on the module.
The hook will be called every time before :func:`forward` is invoked.
It should have the following signature::
hook(module, input) -> None or modified input
The hook can modify the input. The user can return either a tuple or a
single modified value from the hook. A single return value will be
wrapped into a tuple (unless it is already a tuple).
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_parameter"><code><a name="register_parameter">register_parameter</a></code></h3>
``` python
register_parameter(
name,
param
)
```
Adds a parameter to the module.
The parameter can be accessed as an attribute using the given name.
#### Args:
name (string): name of the parameter. The parameter can be accessed
from this module using the given name
param (Parameter): parameter to be added to the module.
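A minimal sketch (the parameter name `scale` is arbitrary, chosen for illustration):

``` python
import torch
import torch.nn as nn

m = nn.Module()
m.register_parameter('scale', nn.Parameter(torch.ones(3)))
print('scale' in dict(m.named_parameters()))  # True
print(m.scale.shape)                          # torch.Size([3])
```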
<h3 id="requires_grad_"><code><a name="requires_grad_">requires_grad_</a></code></h3>
``` python
requires_grad_(requires_grad=True)
```
Changes whether autograd should record operations on parameters in this
module.
This method sets the parameters' :attr:`requires_grad` attributes
in-place.
This method is helpful for freezing part of the module for finetuning
or training parts of a model individually (e.g., GAN training).
#### Args:
requires_grad (bool): whether autograd should record operations on
parameters in this module. Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="reset_parameters"><code><a name="reset_parameters">reset_parameters</a></code></h3>
``` python
reset_parameters()
```
Resets this layer's parameters to their initial values.
<h3 id="share_memory"><code><a name="share_memory">share_memory</a></code></h3>
``` python
share_memory()
```
<h3 id="state_dict"><code><a name="state_dict">state_dict</a></code></h3>
``` python
state_dict(
destination=None,
prefix='',
keep_vars=False
)
```
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are
included. Keys are corresponding parameter and buffer names.
#### Returns:
* <b>`dict`</b>: a dictionary containing a whole state of the module
Example::
```
>>> module.state_dict().keys()
['bias', 'weight']
```
<h3 id="to"><code><a name="to">to</a></code></h3>
``` python
to(
*args,
**kwargs
)
```
Moves and/or casts the parameters and buffers.
This can be called as
.. function:: to(device=None, dtype=None, non_blocking=False)
.. function:: to(dtype, non_blocking=False)
.. function:: to(tensor, non_blocking=False)
Its signature is similar to :meth:`torch.Tensor.to`, but it only accepts
floating point :attr:`dtype`s. In addition, this method will
only cast the floating point parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved to
:attr:`device`, if that is given, but with dtypes unchanged. When
:attr:`non_blocking` is set, it tries to convert/move asynchronously
with respect to the host if possible, e.g., moving CPU Tensors with
pinned memory to CUDA devices.
See below for examples.
.. note::
This method modifies the module in-place.
#### Args:
device (:class:`torch.device`): the desired device of the parameters
and buffers in this module
dtype (:class:`torch.dtype`): the desired floating point type of
the floating point parameters and buffers in this module
tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16)
```
<h3 id="train"><code><a name="train">train</a></code></h3>
``` python
train(mode=True)
```
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
#### Args:
mode (bool): whether to set training mode (``True``) or evaluation
mode (``False``). Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="type"><code><a name="type">type</a></code></h3>
``` python
type(dst_type)
```
Casts all parameters and buffers to :attr:`dst_type`.
#### Arguments:
dst_type (type or string): the desired type
#### Returns:
* <b>`Module`</b>: self
<h3 id="zero_grad"><code><a name="zero_grad">zero_grad</a></code></h3>
``` python
zero_grad()
```
Sets gradients of all model parameters to zero.
================================================
FILE: docs/pytorch/haste_pytorch/LayerNormLSTM.md
================================================
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="haste_pytorch.LayerNormLSTM" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="__call__"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="add_module"/>
<meta itemprop="property" content="apply"/>
<meta itemprop="property" content="buffers"/>
<meta itemprop="property" content="children"/>
<meta itemprop="property" content="cpu"/>
<meta itemprop="property" content="cuda"/>
<meta itemprop="property" content="double"/>
<meta itemprop="property" content="eval"/>
<meta itemprop="property" content="extra_repr"/>
<meta itemprop="property" content="float"/>
<meta itemprop="property" content="forward"/>
<meta itemprop="property" content="half"/>
<meta itemprop="property" content="load_state_dict"/>
<meta itemprop="property" content="modules"/>
<meta itemprop="property" content="named_buffers"/>
<meta itemprop="property" content="named_children"/>
<meta itemprop="property" content="named_modules"/>
<meta itemprop="property" content="named_parameters"/>
<meta itemprop="property" content="parameters"/>
<meta itemprop="property" content="register_backward_hook"/>
<meta itemprop="property" content="register_buffer"/>
<meta itemprop="property" content="register_forward_hook"/>
<meta itemprop="property" content="register_forward_pre_hook"/>
<meta itemprop="property" content="register_parameter"/>
<meta itemprop="property" content="requires_grad_"/>
<meta itemprop="property" content="reset_parameters"/>
<meta itemprop="property" content="share_memory"/>
<meta itemprop="property" content="state_dict"/>
<meta itemprop="property" content="to"/>
<meta itemprop="property" content="train"/>
<meta itemprop="property" content="type"/>
<meta itemprop="property" content="zero_grad"/>
</div>
# haste_pytorch.LayerNormLSTM
<!-- Insert buttons and diff -->
## Class `LayerNormLSTM`
Layer Normalized Long Short-Term Memory layer.
<!-- Placeholder for "Used in" -->
This LSTM layer applies layer normalization to the input, recurrent, and
output activations of a standard LSTM. The implementation is fused and
GPU-accelerated. DropConnect and Zoneout regularization are built-in, and
this layer allows setting a non-zero initial forget gate bias.
Details about the exact function this layer implements can be found at
https://github.com/lmnt-com/haste/issues/1.
See [\_\_init\_\_](#__init__) and [forward](#forward) for usage.
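For intuition, each of the three normalizations mentioned above follows the standard layer-norm formula with a learned gain and bias. A minimal plain-Python sketch (not the fused CUDA kernel; the `eps` constant here is an assumption):

``` python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize a vector to zero mean and unit variance, then apply
    # a learned per-element gain (gamma) and bias (beta).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b
            for v, g, b in zip(x, gamma, beta)]

h = layer_norm([1.0, 2.0, 3.0], gamma=[1.0] * 3, beta=[0.0] * 3)
```

In this layer the input and recurrent gains correspond to `gamma[0]` and `gamma[1]`, and the output normalization uses `gamma_h`/`beta_h` (see the Variables section below).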
<h2 id="__init__"><code><a name="__init__">__init__</a></code></h2>
``` python
__init__(
input_size,
hidden_size,
batch_first=False,
forget_bias=1.0,
dropout=0.0,
zoneout=0.0
)
```
Initialize the parameters of the LSTM layer.
#### Arguments:
* <b>`input_size`</b>: int, the feature dimension of the input.
* <b>`hidden_size`</b>: int, the feature dimension of the output.
* <b>`batch_first`</b>: (optional) bool, if `True`, then the input and output
tensors are provided as `(batch, seq, feature)`.
* <b>`forget_bias`</b>: (optional) float, sets the initial bias of the forget gate
for this LSTM cell.
* <b>`dropout`</b>: (optional) float, sets the dropout rate for DropConnect
regularization on the recurrent matrix.
* <b>`zoneout`</b>: (optional) float, sets the zoneout rate for Zoneout
regularization.
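The two regularizers act on different objects: DropConnect drops individual entries of the recurrent weight matrix, while Zoneout preserves individual units of the previous hidden state. A plain-Python sketch of the distinction (illustrative names, not the layer's internals):

``` python
import random

def dropconnect(recurrent_kernel, rate, training=True):
    # DropConnect zeroes individual *weights* of the recurrent matrix.
    if not training:
        return recurrent_kernel
    return [[0.0 if random.random() < rate else w for w in row]
            for row in recurrent_kernel]

def zoneout(h_prev, h_new, rate, training=True):
    # Zoneout keeps individual *state units* from the previous step;
    # at eval time it blends the two states deterministically.
    if not training:
        return [rate * hp + (1 - rate) * hn
                for hp, hn in zip(h_prev, h_new)]
    return [hp if random.random() < rate else hn
            for hp, hn in zip(h_prev, h_new)]
```

Setting either rate to 0 (the default) disables the corresponding regularizer.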
#### Variables:
* <b>`kernel`</b>: the input projection weight matrix. Dimensions
(input_size, hidden_size * 4) with `i,g,f,o` gate layout. Initialized
with Xavier uniform initialization.
* <b>`recurrent_kernel`</b>: the recurrent projection weight matrix. Dimensions
(hidden_size, hidden_size * 4) with `i,g,f,o` gate layout. Initialized
with orthogonal initialization.
* <b>`bias`</b>: the projection bias vector. Dimensions (hidden_size * 4) with
`i,g,f,o` gate layout. The forget gate biases are initialized to
`forget_bias` and the rest are zeros.
* <b>`gamma`</b>: the input and recurrent normalization gain. Dimensions
(2, hidden_size * 4) with `gamma[0]` specifying the input gain and
`gamma[1]` specifying the recurrent gain. Initialized to ones.
* <b>`gamma_h`</b>: the output normalization gain. Dimensions (hidden_size).
Initialized to ones.
* <b>`beta_h`</b>: the output normalization bias. Dimensions (hidden_size).
Initialized to zeros.
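The `i,g,f,o` gate layout and the forget-bias initialization described above can be sketched as follows (plain Python; `init_bias` is a hypothetical helper, not part of the API):

``` python
def init_bias(hidden_size, forget_bias=1.0):
    # Bias vector of length hidden_size * 4 with i,g,f,o gate layout:
    # slices [0:H] = i, [H:2H] = g, [2H:3H] = f, [3H:4H] = o.
    bias = [0.0] * (hidden_size * 4)
    for j in range(2 * hidden_size, 3 * hidden_size):  # the f slice
        bias[j] = forget_bias
    return bias
```

`kernel` and `recurrent_kernel` use the same four-slice layout along their last dimension.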
## Methods
<h3 id="__call__"><code><a name="__call__">__call__</a></code></h3>
``` python
__call__(
*input,
**kwargs
)
```
Call self as a function.
<h3 id="add_module"><code><a name="add_module">add_module</a></code></h3>
``` python
add_module(
name,
module
)
```
Adds a child module to the current module.
The module can be accessed as an attribute using the given name.
#### Args:
name (string): name of the child module. The child module can be
accessed from this module using the given name
module (Module): child module to be added to the module.
<h3 id="apply"><code><a name="apply">apply</a></code></h3>
``` python
apply(fn)
```
Applies ``fn`` recursively to every submodule (as returned by ``.children()``)
as well as self. Typical use includes initializing the parameters of a model
(see also :ref:`nn-init-doc`).
#### Args:
fn (:class:`Module` -> None): function to be applied to each submodule
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> def init_weights(m):
>>> print(m)
>>> if type(m) == nn.Linear:
>>> m.weight.data.fill_(1.0)
>>> print(m.weight)
>>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
>>> net.apply(init_weights)
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Linear(in_features=2, out_features=2, bias=True)
Parameter containing:
tensor([[ 1., 1.],
[ 1., 1.]])
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
Sequential(
(0): Linear(in_features=2, out_features=2, bias=True)
(1): Linear(in_features=2, out_features=2, bias=True)
)
```
<h3 id="buffers"><code><a name="buffers">buffers</a></code></h3>
``` python
buffers(recurse=True)
```
Returns an iterator over module buffers.
#### Args:
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`torch.Tensor`</b>: module buffer
Example::
```
>>> for buf in model.buffers():
>>> print(type(buf.data), buf.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="children"><code><a name="children">children</a></code></h3>
``` python
children()
```
Returns an iterator over immediate children modules.
#### Yields:
* <b>`Module`</b>: a child module
<h3 id="cpu"><code><a name="cpu">cpu</a></code></h3>
``` python
cpu()
```
Moves all model parameters and buffers to the CPU.
#### Returns:
* <b>`Module`</b>: self
<h3 id="cuda"><code><a name="cuda">cuda</a></code></h3>
``` python
cuda(device=None)
```
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects, so
it should be called before constructing the optimizer if the module will
live on the GPU while being optimized.
#### Arguments:
device (int, optional): if specified, all parameters will be
copied to that device
#### Returns:
* <b>`Module`</b>: self
<h3 id="double"><code><a name="double">double</a></code></h3>
``` python
double()
```
Casts all floating point parameters and buffers to ``double`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="eval"><code><a name="eval">eval</a></code></h3>
``` python
eval()
```
Sets the module in evaluation mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
This is equivalent to :meth:`self.train(False) <torch.nn.Module.train>`.
#### Returns:
* <b>`Module`</b>: self
<h3 id="extra_repr"><code><a name="extra_repr">extra_repr</a></code></h3>
``` python
extra_repr()
```
Sets the extra representation of the module.
To print customized extra information, you should reimplement
this method in your own modules. Both single-line and multi-line
strings are acceptable.
<h3 id="float"><code><a name="float">float</a></code></h3>
``` python
float()
```
Casts all floating point parameters and buffers to float datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="forward"><code><a name="forward">forward</a></code></h3>
``` python
forward(
input,
state=None,
lengths=None
)
```
Runs a forward pass of the LSTM layer.
#### Arguments:
* <b>`input`</b>: Tensor, a batch of input sequences to pass through the LSTM.
Dimensions (seq_len, batch_size, input_size) if `batch_first` is
`False`, otherwise (batch_size, seq_len, input_size).
* <b>`lengths`</b>: (optional) Tensor, list of sequence lengths for each batch
element. Dimension (batch_size). This argument may be omitted if
all batch elements are unpadded and have the same sequence length.
#### Returns:
* <b>`output`</b>: Tensor, the output of the LSTM layer. Dimensions
(seq_len, batch_size, hidden_size) if `batch_first` is `False` (default)
or (batch_size, seq_len, hidden_size) if `batch_first` is `True`. Note
that if `lengths` was specified, the `output` tensor will not be
masked. It's the caller's responsibility to either not use the invalid
entries or to mask them out before using them.
* <b>`(h_n, c_n)`</b>: the hidden and cell states, respectively, for the last
sequence item. Dimensions (1, batch_size, hidden_size).
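Since the output is left unmasked when `lengths` is given, callers often zero out the invalid steps themselves. A plain-Python sketch of such masking for the time-major (`batch_first=False`) layout, using nested lists in place of tensors:

``` python
def mask_output(output, lengths):
    # output: nested list with shape (seq_len, batch_size, hidden_size).
    # Zero out steps at or beyond each sequence's length.
    seq_len = len(output)
    for t in range(seq_len):
        for b, length in enumerate(lengths):
            if t >= length:
                output[t][b] = [0.0] * len(output[t][b])
    return output
```

In real code the same effect is usually achieved by multiplying the output with a boolean mask tensor built from `lengths`.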
<h3 id="half"><code><a name="half">half</a></code></h3>
``` python
half()
```
Casts all floating point parameters and buffers to ``half`` datatype.
#### Returns:
* <b>`Module`</b>: self
<h3 id="load_state_dict"><code><a name="load_state_dict">load_state_dict</a></code></h3>
``` python
load_state_dict(
state_dict,
strict=True
)
```
Copies parameters and buffers from :attr:`state_dict` into
this module and its descendants. If :attr:`strict` is ``True``, then
the keys of :attr:`state_dict` must exactly match the keys returned
by this module's :meth:`~torch.nn.Module.state_dict` function.
#### Arguments:
state_dict (dict): a dict containing parameters and
persistent buffers.
strict (bool, optional): whether to strictly enforce that the keys
in :attr:`state_dict` match the keys returned by this module's
:meth:`~torch.nn.Module.state_dict` function. Default: ``True``
#### Returns:
``NamedTuple`` with ``missing_keys`` and ``unexpected_keys`` fields:
* **missing_keys** is a list of str containing the missing keys
* **unexpected_keys** is a list of str containing the unexpected keys
<h3 id="modules"><code><a name="modules">modules</a></code></h3>
``` python
modules()
```
Returns an iterator over all modules in the network.
#### Yields:
* <b>`Module`</b>: a module in the network
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.modules()):
...     print(idx, '->', m)
0 -> Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
)
1 -> Linear(in_features=2, out_features=2, bias=True)
```
<h3 id="named_buffers"><code><a name="named_buffers">named_buffers</a></code></h3>
``` python
named_buffers(
prefix='',
recurse=True
)
```
Returns an iterator over module buffers, yielding both the
name of the buffer as well as the buffer itself.
#### Args:
prefix (str): prefix to prepend to all buffer names.
recurse (bool): if True, then yields buffers of this module
and all submodules. Otherwise, yields only buffers that
are direct members of this module.
#### Yields:
* <b>`(string, torch.Tensor)`</b>: Tuple containing the name and buffer
Example::
```
>>> for name, buf in self.named_buffers():
>>> if name in ['running_var']:
>>> print(buf.size())
```
<h3 id="named_children"><code><a name="named_children">named_children</a></code></h3>
``` python
named_children()
```
Returns an iterator over immediate children modules, yielding both
the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple containing a name and child module
Example::
```
>>> for name, module in model.named_children():
>>> if name in ['conv4', 'conv5']:
>>> print(module)
```
<h3 id="named_modules"><code><a name="named_modules">named_modules</a></code></h3>
``` python
named_modules(
memo=None,
prefix=''
)
```
Returns an iterator over all modules in the network, yielding
both the name of the module as well as the module itself.
#### Yields:
* <b>`(string, Module)`</b>: Tuple of name and module
#### Note:
Duplicate modules are returned only once. In the following
example, ``l`` will be returned only once.
Example::
```
>>> l = nn.Linear(2, 2)
>>> net = nn.Sequential(l, l)
>>> for idx, m in enumerate(net.named_modules()):
...     print(idx, '->', m)
0 -> ('', Sequential(
  (0): Linear(in_features=2, out_features=2, bias=True)
  (1): Linear(in_features=2, out_features=2, bias=True)
))
1 -> ('0', Linear(in_features=2, out_features=2, bias=True))
```
<h3 id="named_parameters"><code><a name="named_parameters">named_parameters</a></code></h3>
``` python
named_parameters(
prefix='',
recurse=True
)
```
Returns an iterator over module parameters, yielding both the
name of the parameter as well as the parameter itself.
#### Args:
prefix (str): prefix to prepend to all parameter names.
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`(string, Parameter)`</b>: Tuple containing the name and parameter
Example::
```
>>> for name, param in self.named_parameters():
>>> if name in ['bias']:
>>> print(param.size())
```
<h3 id="parameters"><code><a name="parameters">parameters</a></code></h3>
``` python
parameters(recurse=True)
```
Returns an iterator over module parameters.
This is typically passed to an optimizer.
#### Args:
recurse (bool): if True, then yields parameters of this module
and all submodules. Otherwise, yields only parameters that
are direct members of this module.
#### Yields:
* <b>`Parameter`</b>: module parameter
Example::
```
>>> for param in model.parameters():
>>> print(type(param.data), param.size())
<class 'torch.FloatTensor'> (20L,)
<class 'torch.FloatTensor'> (20L, 1L, 5L, 5L)
```
<h3 id="register_backward_hook"><code><a name="register_backward_hook">register_backward_hook</a></code></h3>
``` python
register_backward_hook(hook)
```
Registers a backward hook on the module.
The hook will be called every time the gradients with respect to module
inputs are computed. The hook should have the following signature::
hook(module, grad_input, grad_output) -> Tensor or None
The :attr:`grad_input` and :attr:`grad_output` may be tuples if the
module has multiple inputs or outputs. The hook should not modify its
arguments, but it can optionally return a new gradient with respect to
input that will be used in place of :attr:`grad_input` in subsequent
computations.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
.. warning ::
The current implementation will not have the presented behavior
for complex :class:`Module`\s that perform many operations.
In some failure cases, :attr:`grad_input` and :attr:`grad_output` will only
contain the gradients for a subset of the inputs and outputs.
For such :class:`Module`\s, you should use :func:`torch.Tensor.register_hook`
directly on a specific input or output to get the required gradients.
<h3 id="register_buffer"><code><a name="register_buffer">register_buffer</a></code></h3>
``` python
register_buffer(
name,
tensor
)
```
Adds a persistent buffer to the module.
This is typically used to register a buffer that should not be
considered a model parameter. For example, BatchNorm's ``running_mean``
is not a parameter, but is part of the persistent state.
Buffers can be accessed as attributes using given names.
#### Args:
name (string): name of the buffer. The buffer can be accessed
from this module using the given name
tensor (Tensor): buffer to be registered.
Example::
```
>>> self.register_buffer('running_mean', torch.zeros(num_features))
```
<h3 id="register_forward_hook"><code><a name="register_forward_hook">register_forward_hook</a></code></h3>
``` python
register_forward_hook(hook)
```
Registers a forward hook on the module.
The hook will be called every time after :func:`forward` has computed an output.
It should have the following signature::
hook(module, input, output) -> None or modified output
The hook can modify the output. It can modify the input in-place, but
this will have no effect on the forward pass, since the hook is called
after :func:`forward` has already run.
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_forward_pre_hook"><code><a name="register_forward_pre_hook">register_forward_pre_hook</a></code></h3>
``` python
register_forward_pre_hook(hook)
```
Registers a forward pre-hook on the module.
The hook will be called every time before :func:`forward` is invoked.
It should have the following signature::
hook(module, input) -> None or modified input
The hook can modify the input. It may return either a tuple or a
single modified value; a single returned value is wrapped into a tuple
(unless it is already a tuple).
#### Returns:
:class:`torch.utils.hooks.RemovableHandle`:
a handle that can be used to remove the added hook by calling
``handle.remove()``
<h3 id="register_parameter"><code><a name="register_parameter">register_parameter</a></code></h3>
``` python
register_parameter(
name,
param
)
```
Adds a parameter to the module.
The parameter can be accessed as an attribute using given name.
#### Args:
name (string): name of the parameter. The parameter can be accessed
from this module using the given name
param (Parameter): parameter to be added to the module.
<h3 id="requires_grad_"><code><a name="requires_grad_">requires_grad_</a></code></h3>
``` python
requires_grad_(requires_grad=True)
```
Changes whether autograd should record operations on parameters in this
module.
This method sets the parameters' :attr:`requires_grad` attributes
in-place.
This method is helpful for freezing part of the module for finetuning
or training parts of a model individually (e.g., GAN training).
#### Args:
requires_grad (bool): whether autograd should record operations on
parameters in this module. Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="reset_parameters"><code><a name="reset_parameters">reset_parameters</a></code></h3>
``` python
reset_parameters()
```
Resets this layer's parameters to their initial values.
<h3 id="share_memory"><code><a name="share_memory">share_memory</a></code></h3>
``` python
share_memory()
```
See :meth:`torch.Tensor.share_memory_`.
<h3 id="state_dict"><code><a name="state_dict">state_dict</a></code></h3>
``` python
state_dict(
destination=None,
prefix='',
keep_vars=False
)
```
Returns a dictionary containing a whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are
included. Keys are corresponding parameter and buffer names.
#### Returns:
* <b>`dict`</b>: a dictionary containing a whole state of the module
Example::
```
>>> module.state_dict().keys()
['bias', 'weight']
```
<h3 id="to"><code><a name="to">to</a></code></h3>
``` python
to(
*args,
**kwargs
)
```
Moves and/or casts the parameters and buffers.
This can be called as
.. function:: to(device=None, dtype=None, non_blocking=False)
.. function:: to(dtype, non_blocking=False)
.. function:: to(tensor, non_blocking=False)
Its signature is similar to :meth:`torch.Tensor.to`, but only accepts
floating point :attr:`dtype`\s. In addition, this method will
only cast the floating point parameters and buffers to :attr:`dtype`
(if given). The integral parameters and buffers will be moved to
:attr:`device`, if that is given, but with dtypes unchanged. When
:attr:`non_blocking` is set, it tries to convert/move asynchronously
with respect to the host if possible, e.g., moving CPU Tensors with
pinned memory to CUDA devices.
See below for examples.
.. note::
This method modifies the module in-place.
#### Args:
device (:class:`torch.device`): the desired device of the parameters
and buffers in this module
dtype (:class:`torch.dtype`): the desired floating point type of
the floating point parameters and buffers in this module
tensor (torch.Tensor): Tensor whose dtype and device are the desired
dtype and device for all parameters and buffers in this module
#### Returns:
* <b>`Module`</b>: self
Example::
```
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
[-0.5113, -0.2325]], dtype=torch.float64)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
[-0.5112, -0.2324]], dtype=torch.float16)
```
<h3 id="train"><code><a name="train">train</a></code></h3>
``` python
train(mode=True)
```
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of
particular modules for details of their behavior in training/evaluation
mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`,
etc.
#### Args:
mode (bool): whether to set training mode (``True``) or evaluation
mode (``False``). Default: ``True``.
#### Returns:
* <b>`Module`</b>: self
<h3 id="type"><code><a name="type">type</a></code></h3>
``` python
type(dst_type)
```
Casts all parameters and buffers to :attr:`dst_type`.
#### Arguments:
dst_type (type or string): the desired type
#### Returns:
* <b>`Module`</b>: self
<h3 id="zero_grad"><code><a name="zero_grad">zero_grad</a></code></h3>
``` python
zero_grad()
```
Sets gradients of all model parameters to zero.
================================================
FILE: docs/pytorch/haste_pytorch.md
================================================
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="haste_pytorch" />
<meta itemprop="path" content="Stable" />
</div>
# Module: haste_pytorch
Haste: a fast, simple, and open RNN library.
## Classes
[`class GRU`](./haste_pytorch/GRU.md): Gated Recurrent Unit layer.
[`class IndRNN`](./haste_pytorch/IndRNN.md): Independently Recurrent Neural Network layer.
[`class LSTM`](./haste_pytorch/LSTM.md): Long Short-Term Memory layer.
[`class LayerNormGRU`](./haste_pytorch/LayerNormGRU.md): Layer Normalized Gated Recurrent Unit layer.
[`class LayerNormLSTM`](./haste_pytorch/LayerNormLSTM.md): Layer Normalized Long Short-Term Memory layer.
================================================
FILE: docs/tf/haste_tf/GRU.md
================================================
<div itemscope itemtype="http://developers.google.com/ReferenceObject">
<meta itemprop="name" content="haste_tf.GRU" />
<meta itemprop="path" content="Stable" />
<meta itemprop="property" content="bidirectional"/>
<meta itemprop="property" content="name"/>
<meta itemprop="property" content="name_scope"/>
<meta itemprop="property" content="output_size"/>
<meta itemprop="property" content="state_size"/>
<meta itemprop="property" content="submodules"/>
<meta itemprop="property" content="trainable_variables"/>
<meta itemprop="property" content="variables"/>
<meta itemprop="property" content="__call__"/>
<meta itemprop="property" content="__init__"/>
<meta itemprop="property" content="build"/>
<meta itemprop="property" content="with_name_scope"/>
</div>
# haste_tf.GRU
<!-- Insert buttons and diff -->
## Class `GRU`
Gated Recurrent Unit layer.
<!-- Placeholder for "Used in" -->
This GRU layer offers a fused, GPU-accelerated TensorFlow op for inference
and training. There are two commonly-used variants of GRU cells. This one
implements arXiv:1406.1078v1, which applies the reset gate to the hidden
state after matrix multiplication; cuDNN also implements this variant. The
other variant, arXiv:1406.1078v3, applies the reset gate before matrix
multiplication and is currently unsupported.
This layer has built-in support for DropConnect and Zoneout, which are
both techniques used to regularize RNNs.
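The difference between the two variants only matters once the recurrent projection is a full matrix; reset-before and reset-after are identical for a diagonal projection. A plain-Python sketch of the candidate-activation computation in each (illustrative names, single step, biases omitted):

``` python
import math

def matvec(R, h):
    return [sum(w * v for w, v in zip(row, h)) for row in R]

def candidate_v1(Wx, R, h, r):
    # v1 (this layer, and cuDNN): reset gate r multiplies (R @ h),
    # i.e. it is applied *after* the recurrent matmul.
    Rh = matvec(R, h)
    return [math.tanh(wx + ri * rh) for wx, ri, rh in zip(Wx, r, Rh)]

def candidate_v3(Wx, R, h, r):
    # v3 (unsupported here): reset gate r multiplies h *before*
    # the recurrent matmul.
    rh = [ri * hi for ri, hi in zip(r, h)]
    return [math.tanh(wx + v) for wx, v in zip(Wx, matvec(R, rh))]
```

With an off-diagonal `R` and a non-uniform reset gate, the two candidates disagree, which is why checkpoints trained with one variant cannot simply be loaded into the other.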
<h2 id="__init__"><code><a name="__init__">__init__</a></code></h2>
``` python
__init__(
num_units,
direction='unidirectional',
**kwargs
)
```
Initialize the parameters of the GRU layer.
#### Arguments:
* <b>`num_units`</b>: int, the number of units in the GRU cell.
* <b>`direction`</b>: string, 'unidirectional' or 'bidirectional'.
* <b>`**kwargs`</b>: Dict, keyword arguments (see below).
#### Keyword Arguments:
* <b>`kernel_initializer`</b>: (optional) the initializer to use for the input
matrix weights. Defaults to `glorot_uniform`.
* <b>`recurrent_initializer`</b>: (optional) the initializer to use for the
recurrent matrix weights. Defaults to `orthogonal`.
* <b>`bias_initializer`</b>: (optional) the initializer to use for input bias
vectors. Defaults to `zeros`.
* <b>`recurrent_bias_initializer`</b>: (optional) the initializer to use for
recurrent bias vectors. Defaults to `zeros`.
* <b>`kernel_transform`</b>: (optional) a function with signature
`(kernel: Tensor) -> Tensor` that transforms the kernel before it is
used. Defaults to the identity function.
* <b>`recurrent_transform`</b>: (optional) a function with signature
`(recurrent_kernel: Tensor) -> Tensor` that transforms the recurrent
kernel before it is used. Defaults to the identity function.
* <b>`bias_transform`</b>: (optional) a function with signature
`(bias: Tensor) -> Tensor` that transforms the bias before it is used.
Defaults to the identity function.
* <b>`recurrent_bias_transform`</b>: (optional) a function with signature
`(recurrent_bias: Tensor) -> Tensor` that transforms the recurrent bias
before it is used. Defaults to the identity function.
* <b>`dropout`</b>: (optional) float, sets the dropout rate for DropConnect
regularization on the recurrent matrix. Defaults to 0.
* <b>`zoneout`</b>: (optional) float, sets the zoneout rate for Zoneout
regularization. Defaults to 0.
* <b>`dtype`</b>: (optional) the data type for this layer. Defaults to `tf.float32`.
* <b>`name`</b>: (optional) string, the name for this layer.
## Properties
<h3 id="bidirectional"><code>bidirectional</code></h3>
`True` if this is a bidirectional RNN, `False` otherwise.
<h3 id="name"><code>name</code></h3>
Returns the name of this module as passed or determined in the ctor.
NOTE: This is not the same as the `self.name_scope.name` which includes
parent module names.
<h3 id="name_scope"><code>name_scope</code></h3>
Returns a `tf.name_scope` instance for this class.
<h3 id="output_size"><code>output_size</code></h3>
<h3 id="state_size"><code>state_size</code></h3>
<h3 id="submodules"><code>submodules</code></h3>
Sequence of all sub-modules.
Submodules are modules which are properties of this module, or found as
properties of modules which are properties of this module (and so on).
```
a = tf.Module()
b = tf.Module()
c = tf.Module()
a.b = b
b.c = c
assert list(a.submodules) == [b, c]
assert list(b.submodules) == [c]
assert list(c.submodules) == []
```
#### Returns:
A sequence of all submodules.
<h3 id="trainable_variables"><code>trainable_variables</code></h3>
Sequence of trainable variables owned by this module and its submodules.
gitextract_l70fejx2/
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── Makefile
├── README.md
├── benchmarks/
│ ├── benchmark_gru.cc
│ ├── benchmark_lstm.cc
│ ├── cudnn_wrappers.h
│ └── report.py
├── build/
│ ├── MANIFEST.in
│ ├── common.py
│ ├── setup.pytorch.py
│ └── setup.tf.py
├── docs/
│ ├── pytorch/
│ │ ├── haste_pytorch/
│ │ │ ├── GRU.md
│ │ │ ├── IndRNN.md
│ │ │ ├── LSTM.md
│ │ │ ├── LayerNormGRU.md
│ │ │ └── LayerNormLSTM.md
│ │ └── haste_pytorch.md
│ └── tf/
│ ├── haste_tf/
│ │ ├── GRU.md
│ │ ├── GRUCell.md
│ │ ├── IndRNN.md
│ │ ├── LSTM.md
│ │ ├── LayerNorm.md
│ │ ├── LayerNormGRU.md
│ │ ├── LayerNormGRUCell.md
│ │ ├── LayerNormLSTM.md
│ │ ├── LayerNormLSTMCell.md
│ │ └── ZoneoutWrapper.md
│ └── haste_tf.md
├── examples/
│ ├── device_ptr.h
│ ├── gru.cc
│ └── lstm.cc
├── frameworks/
│ ├── pytorch/
│ │ ├── __init__.py
│ │ ├── base_rnn.py
│ │ ├── gru.cc
│ │ ├── gru.py
│ │ ├── indrnn.cc
│ │ ├── indrnn.py
│ │ ├── layer_norm_gru.cc
│ │ ├── layer_norm_gru.py
│ │ ├── layer_norm_indrnn.cc
│ │ ├── layer_norm_indrnn.py
│ │ ├── layer_norm_lstm.cc
│ │ ├── layer_norm_lstm.py
│ │ ├── lstm.cc
│ │ ├── lstm.py
│ │ ├── support.cc
│ │ └── support.h
│ └── tf/
│ ├── __init__.py
│ ├── arena.h
│ ├── base_rnn.py
│ ├── gru.cc
│ ├── gru.py
│ ├── gru_cell.py
│ ├── indrnn.cc
│ ├── indrnn.py
│ ├── layer_norm.cc
│ ├── layer_norm.py
│ ├── layer_norm_gru.cc
│ ├── layer_norm_gru.py
│ ├── layer_norm_gru_cell.py
│ ├── layer_norm_indrnn.cc
│ ├── layer_norm_indrnn.py
│ ├── layer_norm_lstm.cc
│ ├── layer_norm_lstm.py
│ ├── layer_norm_lstm_cell.py
│ ├── lstm.cc
│ ├── lstm.py
│ ├── support.cc
│ ├── support.h
│ ├── weight_config.py
│ └── zoneout_wrapper.py
├── lib/
│ ├── blas.h
│ ├── device_assert.h
│ ├── gru_backward_gpu.cu.cc
│ ├── gru_forward_gpu.cu.cc
│ ├── haste/
│ │ ├── gru.h
│ │ ├── indrnn.h
│ │ ├── layer_norm.h
│ │ ├── layer_norm_gru.h
│ │ ├── layer_norm_indrnn.h
│ │ ├── layer_norm_lstm.h
│ │ └── lstm.h
│ ├── haste.h
│ ├── indrnn_backward_gpu.cu.cc
│ ├── indrnn_forward_gpu.cu.cc
│ ├── inline_ops.h
│ ├── layer_norm_backward_gpu.cu.cc
│ ├── layer_norm_forward_gpu.cu.cc
│ ├── layer_norm_gru_backward_gpu.cu.cc
│ ├── layer_norm_gru_forward_gpu.cu.cc
│ ├── layer_norm_indrnn_backward_gpu.cu.cc
│ ├── layer_norm_indrnn_forward_gpu.cu.cc
│ ├── layer_norm_lstm_backward_gpu.cu.cc
│ ├── layer_norm_lstm_forward_gpu.cu.cc
│ ├── lstm_backward_gpu.cu.cc
│ └── lstm_forward_gpu.cu.cc
└── validation/
├── pytorch.py
├── pytorch_speed.py
├── tf.py
└── tf_pytorch.py
SYMBOL INDEX (413 symbols across 73 files)
FILE: benchmarks/benchmark_gru.cc
function TimeLoop (line 49) | float TimeLoop(std::function<void()> fn, int iterations) {
function CudnnInference (line 65) | float CudnnInference(
function CudnnTrain (line 144) | float CudnnTrain(
function HasteInference (line 281) | float HasteInference(
function HasteTrain (line 333) | float HasteTrain(
function usage (line 462) | void usage(const char* name) {
function main (line 473) | int main(int argc, char* const* argv) {
FILE: benchmarks/benchmark_lstm.cc
function TimeLoop (line 49) | float TimeLoop(std::function<void()> fn, int iterations) {
function CudnnInference (line 65) | float CudnnInference(
function CudnnTrain (line 143) | float CudnnTrain(
function HasteInference (line 280) | float HasteInference(
function HasteTrain (line 331) | float HasteTrain(
function usage (line 469) | void usage(const char* name) {
function main (line 480) | int main(int argc, char* const* argv) {
FILE: benchmarks/cudnn_wrappers.h
type CudnnDataType<float> (line 26) | struct CudnnDataType<float> {
type CudnnDataType<double> (line 31) | struct CudnnDataType<double> {
FILE: benchmarks/report.py
function extract (line 22) | def extract(x, predicate):
function main (line 26) | def main(args):
FILE: build/setup.pytorch.py
class BuildHaste (line 26) | class BuildHaste(cpp_extension.BuildExtension):
method run (line 27) | def run(self):
FILE: build/setup.tf.py
class BinaryDistribution (line 24) | class BinaryDistribution(Distribution):
method has_ext_modules (line 26) | def has_ext_modules(self):
class BuildHaste (line 30) | class BuildHaste(_build):
method run (line 31) | def run(self):
FILE: examples/device_ptr.h
function NewByteSized (line 26) | static device_ptr<T> NewByteSized(size_t bytes) {
method device_ptr (line 30) | explicit device_ptr(size_t size_)
method device_ptr (line 37) | explicit device_ptr(const T& elem)
function ToDevice (line 63) | void ToDevice(const T& src) {
function ToHost (line 68) | void ToHost(T& target) const {
function zero (line 77) | void zero() {
FILE: examples/gru.cc
class ScopeTimer (line 47) | class ScopeTimer {
method ScopeTimer (line 49) | ScopeTimer(const string& msg) : msg_(msg) {
function GruInference (line 71) | void GruInference(
function GruTrain (line 119) | void GruTrain(
function main (line 209) | int main() {
FILE: examples/lstm.cc
class ScopeTimer (line 47) | class ScopeTimer {
method ScopeTimer (line 49) | ScopeTimer(const string& msg) : msg_(msg) {
function LstmInference (line 71) | void LstmInference(const Tensor2& W, const Tensor2& R, const Tensor1& b,...
function LstmTrain (line 114) | void LstmTrain(const Tensor2& W, const Tensor2& R, const Tensor1& b, con...
function LstmTrainIterative (line 219) | void LstmTrainIterative(const Tensor2& W, const Tensor2& R, const Tensor...
function main (line 332) | int main() {
FILE: frameworks/pytorch/base_rnn.py
class BaseRNN (line 25) | class BaseRNN(nn.Module):
method __init__ (line 26) | def __init__(
method _permute (line 40) | def _permute(self, x):
method _get_state (line 45) | def _get_state(self, input, state, state_shape):
method _get_final_state (line 52) | def _get_final_state(self, state, lengths):
method _get_zoneout_mask (line 64) | def _get_zoneout_mask(self, input):
method _is_cuda (line 72) | def _is_cuda(self):
function _validate_state (line 79) | def _validate_state(state, state_shape):
function _zero_state (line 108) | def _zero_state(input, state_shape):
FILE: frameworks/pytorch/gru.cc
function gru_forward (line 31) | std::vector<Tensor> gru_forward(
function gru_backward (line 91) | std::vector<Tensor> gru_backward(
function gru_init (line 162) | void gru_init(py::module& m) {
FILE: frameworks/pytorch/gru.py
function GRUScript (line 33) | def GRUScript(
class GRUFunction (line 69) | class GRUFunction(torch.autograd.Function):
method forward (line 71) | def forward(ctx, training, zoneout_prob, *inputs):
method backward (line 79) | def backward(ctx, grad_h):
class GRU (line 91) | class GRU(BaseRNN):
method __init__ (line 110) | def __init__(self,
method to_native_weights (line 161) | def to_native_weights(self):
method from_native_weights (line 186) | def from_native_weights(self, weight_ih_l0, weight_hh_l0, bias_ih_l0, ...
method reset_parameters (line 210) | def reset_parameters(self):
method forward (line 219) | def forward(self, input, state=None, lengths=None):
method _impl (line 249) | def _impl(self, input, state, zoneout_mask):
FILE: frameworks/pytorch/indrnn.cc
function indrnn_forward (line 31) | Tensor indrnn_forward(
function indrnn_backward (line 84) | std::vector<Tensor> indrnn_backward(
function indrnn_init (line 145) | void indrnn_init(py::module& m) {
FILE: frameworks/pytorch/indrnn.py
function IndRNNScript (line 33) | def IndRNNScript(
class IndRNNFunction (line 57) | class IndRNNFunction(torch.autograd.Function):
method forward (line 59) | def forward(ctx, training, zoneout_prob, *inputs):
method backward (line 66) | def backward(ctx, grad_h):
class IndRNN (line 76) | class IndRNN(BaseRNN):
method __init__ (line 86) | def __init__(
method reset_parameters (line 134) | def reset_parameters(self):
method forward (line 139) | def forward(self, input, state=None, lengths=None):
method _impl (line 171) | def _impl(self, input, state, zoneout_mask):
FILE: frameworks/pytorch/layer_norm_gru.cc
function layer_norm_gru_forward (line 31) | std::vector<Tensor> layer_norm_gru_forward(
function layer_norm_gru_backward (line 117) | std::vector<Tensor> layer_norm_gru_backward(
function layer_norm_gru_init (line 224) | void layer_norm_gru_init(py::module& m) {
FILE: frameworks/pytorch/layer_norm_gru.py
function LayerNormGRUScript (line 33) | def LayerNormGRUScript(
class LayerNormGRUFunction (line 70) | class LayerNormGRUFunction(torch.autograd.Function):
method forward (line 72) | def forward(ctx, training, zoneout_prob, *inputs):
method backward (line 80) | def backward(ctx, grad_h):
class LayerNormGRU (line 92) | class LayerNormGRU(BaseRNN):
method __init__ (line 109) | def __init__(self,
method reset_parameters (line 164) | def reset_parameters(self):
method forward (line 174) | def forward(self, input, state=None, lengths=None):
method _impl (line 206) | def _impl(self, input, state, zoneout_mask):
FILE: frameworks/pytorch/layer_norm_indrnn.cc
function layer_norm_indrnn_forward (line 31) | std::vector<Tensor> layer_norm_indrnn_forward(
function layer_norm_indrnn_backward (line 99) | std::vector<Tensor> layer_norm_indrnn_backward(
function layer_norm_indrnn_init (line 179) | void layer_norm_indrnn_init(py::module& m) {
FILE: frameworks/pytorch/layer_norm_indrnn.py
function LayerNormIndRNNScript (line 33) | def LayerNormIndRNNScript(
class LayerNormIndRNNFunction (line 59) | class LayerNormIndRNNFunction(torch.autograd.Function):
method forward (line 61) | def forward(ctx, training, zoneout_prob, *inputs):
method backward (line 68) | def backward(ctx, grad_h):
class LayerNormIndRNN (line 79) | class LayerNormIndRNN(BaseRNN):
method __init__ (line 91) | def __init__(
method reset_parameters (line 143) | def reset_parameters(self):
method forward (line 150) | def forward(self, input, state=None, lengths=None):
method _impl (line 182) | def _impl(self, input, state, zoneout_mask):
FILE: frameworks/pytorch/layer_norm_lstm.cc
function layer_norm_lstm_forward (line 31) | std::vector<Tensor> layer_norm_lstm_forward(
function layer_norm_lstm_backward (line 141) | std::vector<Tensor> layer_norm_lstm_backward(
function layer_norm_lstm_init (line 272) | void layer_norm_lstm_init(py::module& m) {
FILE: frameworks/pytorch/layer_norm_lstm.py
function LayerNormLSTMScript (line 33) | def LayerNormLSTMScript(
class LayerNormLSTMFunction (line 72) | class LayerNormLSTMFunction(torch.autograd.Function):
method forward (line 74) | def forward(ctx, training, zoneout_prob, *inputs):
method backward (line 82) | def backward(ctx, grad_h, grad_c):
class LayerNormLSTM (line 94) | class LayerNormLSTM(BaseRNN):
method __init__ (line 109) | def __init__(self,
method reset_parameters (line 172) | def reset_parameters(self):
method forward (line 184) | def forward(self, input, state=None, lengths=None):
method _impl (line 215) | def _impl(self, input, state, zoneout_mask):
FILE: frameworks/pytorch/lstm.cc
function lstm_forward (line 31) | std::vector<Tensor> lstm_forward(
function lstm_backward (line 91) | std::vector<Tensor> lstm_backward(
function lstm_init (line 161) | void lstm_init(py::module& m) {
FILE: frameworks/pytorch/lstm.py
function LSTMScript (line 33) | def LSTMScript(
class LSTMFunction (line 69) | class LSTMFunction(torch.autograd.Function):
method forward (line 71) | def forward(ctx, training, zoneout_prob, *inputs):
method backward (line 79) | def backward(ctx, grad_h, grad_c):
class LSTM (line 91) | class LSTM(BaseRNN):
method __init__ (line 106) | def __init__(self,
method to_native_weights (line 159) | def to_native_weights(self):
method from_native_weights (line 182) | def from_native_weights(self, weight_ih_l0, weight_hh_l0, bias_ih_l0, ...
method reset_parameters (line 203) | def reset_parameters(self):
method forward (line 212) | def forward(self, input, state=None, lengths=None):
method _impl (line 243) | def _impl(self, input, state, zoneout_mask):
FILE: frameworks/pytorch/support.cc
function PYBIND11_MODULE (line 25) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: frameworks/pytorch/support.h
type native_type<c10::Half> (line 30) | struct native_type<c10::Half> {
FILE: frameworks/tf/arena.h
method data (line 42) | T* data() {
type Entry (line 66) | struct Entry {
type Entry (line 88) | struct Entry {
FILE: frameworks/tf/base_rnn.py
function reverse_sequence (line 29) | def reverse_sequence(sequence, sequence_length):
function transpose (line 40) | def transpose(tensor_or_tuple, perm):
class BaseRNN (line 47) | class BaseRNN(tf.Module):
method __init__ (line 48) | def __init__(self, rnn_class, num_units, direction, default_name, **kw...
method build (line 63) | def build(self, shape):
method output_size (line 81) | def output_size(self):
method state_size (line 87) | def state_size(self):
method __call__ (line 92) | def __call__(self, inputs, training, sequence_length=None, time_major=...
method bidirectional (line 129) | def bidirectional(self):
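The reverse_sequence helper listed above underpins bidirectional execution: each batch entry is reversed along the time axis over its valid length only, leaving padding in place. A plausible numpy sketch of that behavior (an illustration under the time-major [time, batch, ...] layout Haste uses, not the library's implementation):

```python
import numpy as np

def reverse_sequence(sequence, sequence_length):
    # Reverse each batch entry along the time axis, but only over its
    # valid length; padded steps beyond `sequence_length` stay in place.
    # Assumes `sequence` has shape [time, batch, ...].
    out = sequence.copy()
    for b, n in enumerate(sequence_length):
        out[:n, b] = sequence[:n, b][::-1]
    return out
```

This mirrors tf.reverse_sequence with seq_axis=0 and batch_axis=1, which is what a time-major RNN stack needs before and after running the reverse-direction layer.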
FILE: frameworks/tf/gru.cc
type HasteGruOp (line 78) | struct HasteGruOp : public OpKernel {
method HasteGruOp (line 79) | explicit HasteGruOp(OpKernelConstruction* context) : OpKernel(context) {
method Compute (line 87) | void Compute(OpKernelContext* context) override {
type HasteGruGradOp (line 206) | struct HasteGruGradOp : public OpKernel {
method HasteGruGradOp (line 207) | explicit HasteGruGradOp(OpKernelConstruction* context) : OpKernel(cont...
method Compute (line 209) | void Compute(OpKernelContext* context) override {
FILE: frameworks/tf/gru.py
function gru_gradient (line 36) | def gru_gradient(op, *grads):
class GRULayer (line 62) | class GRULayer(tf.Module):
method __init__ (line 63) | def __init__(self,
method build (line 97) | def build(self, shape):
method get_weights (line 122) | def get_weights(self):
method __call__ (line 133) | def __call__(self, inputs, sequence_length, training):
class GRU (line 175) | class GRU(BaseRNN):
method __init__ (line 190) | def __init__(self, num_units, direction='unidirectional', **kwargs):
FILE: frameworks/tf/gru_cell.py
class GRUCell (line 25) | class GRUCell(rnn_cell.RNNCell):
method __init__ (line 33) | def __init__(self, num_units, name=None, **kwargs):
method state_size (line 41) | def state_size(self):
method output_size (line 45) | def output_size(self):
method build (line 48) | def build(self, shape):
method __call__ (line 68) | def __call__(self, inputs, state, scope=None):
FILE: frameworks/tf/indrnn.cc
type HasteIndrnnOp (line 71) | struct HasteIndrnnOp : public OpKernel {
method HasteIndrnnOp (line 72) | explicit HasteIndrnnOp(OpKernelConstruction* context) : OpKernel(conte...
method Compute (line 77) | void Compute(OpKernelContext* context) override {
type HasteIndrnnGradOp (line 171) | struct HasteIndrnnGradOp : public OpKernel {
method HasteIndrnnGradOp (line 172) | explicit HasteIndrnnGradOp(OpKernelConstruction* context) : OpKernel(c...
method Compute (line 174) | void Compute(OpKernelContext* context) override {
FILE: frameworks/tf/indrnn.py
function indrnn_gradient (line 38) | def indrnn_gradient(op, *grads):
function _get_initializer (line 60) | def _get_initializer(initializer):
class IndRNNLayer (line 72) | class IndRNNLayer(tf.Module):
method __init__ (line 73) | def __init__(self,
method build (line 105) | def build(self, shape):
method get_weights (line 127) | def get_weights(self):
method __call__ (line 134) | def __call__(self, inputs, sequence_length, training):
class IndRNN (line 171) | class IndRNN(BaseRNN):
method __init__ (line 179) | def __init__(self, num_units, direction='unidirectional', **kwargs):
FILE: frameworks/tf/layer_norm.cc
type HasteLayerNormOp (line 55) | struct HasteLayerNormOp : public OpKernel {
method HasteLayerNormOp (line 56) | explicit HasteLayerNormOp(OpKernelConstruction* context) : OpKernel(co...
method Compute (line 58) | void Compute(OpKernelContext* context) override {
type HasteLayerNormGradOp (line 116) | struct HasteLayerNormGradOp : public OpKernel {
method HasteLayerNormGradOp (line 117) | explicit HasteLayerNormGradOp(OpKernelConstruction* context) : OpKerne...
method Compute (line 119) | void Compute(OpKernelContext* context) override {
FILE: frameworks/tf/layer_norm.py
function layer_norm_gradient (line 34) | def layer_norm_gradient(op, *grads):
class LayerNorm (line 43) | class LayerNorm(tf.Module):
method __init__ (line 51) | def __init__(self, name=None):
method build (line 64) | def build(self, shape):
method __call__ (line 82) | def __call__(self, x):
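The LayerNorm module above (and the _layer_norm helpers in the cell classes further down) apply the standard layer normalization rule: normalize each row to zero mean and unit variance over the feature axis, then rescale with a learned gamma (and optionally shift with beta). A minimal numpy sketch of that rule (an illustration, not the CUDA kernel's code; eps is an assumed stabilizer value):

```python
import numpy as np

def layer_norm(x, gamma, beta=0.0, eps=1e-5):
    # Normalize over the last (feature) axis, then scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

The fused kernels in lib/layer_norm_*.cu.cc compute the same quantity per timestep; doing it inside the RNN kernel avoids materializing the pre-activation tensor for a separate normalization pass.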
FILE: frameworks/tf/layer_norm_gru.cc
type HasteLayerNormGruOp (line 81) | struct HasteLayerNormGruOp : public OpKernel {
method HasteLayerNormGruOp (line 82) | explicit HasteLayerNormGruOp(OpKernelConstruction* context) : OpKernel...
method Compute (line 90) | void Compute(OpKernelContext* context) override {
type HasteLayerNormGruGradOp (line 246) | struct HasteLayerNormGruGradOp : public OpKernel {
method HasteLayerNormGruGradOp (line 247) | explicit HasteLayerNormGruGradOp(OpKernelConstruction* context) : OpKe...
method Compute (line 249) | void Compute(OpKernelContext* context) override {
FILE: frameworks/tf/layer_norm_gru.py
function layer_norm_gru_gradient (line 36) | def layer_norm_gru_gradient(op, *grads):
class LayerNormGRULayer (line 63) | class LayerNormGRULayer(tf.Module):
method __init__ (line 64) | def __init__(self,
method build (line 98) | def build(self, shape):
method get_weights (line 124) | def get_weights(self):
method __call__ (line 136) | def __call__(self, inputs, sequence_length, training):
class LayerNormGRU (line 179) | class LayerNormGRU(BaseRNN):
method __init__ (line 194) | def __init__(self, num_units, direction='unidirectional', **kwargs):
FILE: frameworks/tf/layer_norm_gru_cell.py
class LayerNormGRUCell (line 30) | class LayerNormGRUCell(rnn_cell.RNNCell):
method __init__ (line 39) | def __init__(self,
method state_size (line 60) | def state_size(self):
method output_size (line 64) | def output_size(self):
method build (line 67) | def build(self, shape):
method __call__ (line 88) | def __call__(self, inputs, state, training=False, scope=None):
method _layer_norm (line 106) | def _layer_norm(self, x, gamma):
FILE: frameworks/tf/layer_norm_indrnn.cc
type HasteLayerNormIndrnnOp (line 78) | struct HasteLayerNormIndrnnOp : public OpKernel {
method HasteLayerNormIndrnnOp (line 79) | explicit HasteLayerNormIndrnnOp(OpKernelConstruction* context) : OpKer...
method Compute (line 84) | void Compute(OpKernelContext* context) override {
type HasteLayerNormIndrnnGradOp (line 208) | struct HasteLayerNormIndrnnGradOp : public OpKernel {
method HasteLayerNormIndrnnGradOp (line 209) | explicit HasteLayerNormIndrnnGradOp(OpKernelConstruction* context) : O...
method Compute (line 211) | void Compute(OpKernelContext* context) override {
FILE: frameworks/tf/layer_norm_indrnn.py
function layer_norm_indrnn_gradient (line 38) | def layer_norm_indrnn_gradient(op, *grads):
function _get_initializer (line 62) | def _get_initializer(initializer):
class LayerNormIndRNNLayer (line 74) | class LayerNormIndRNNLayer(tf.Module):
method __init__ (line 75) | def __init__(self,
method build (line 108) | def build(self, shape):
method get_weights (line 130) | def get_weights(self):
method __call__ (line 138) | def __call__(self, inputs, sequence_length, training):
class LayerNormIndRNN (line 176) | class LayerNormIndRNN(BaseRNN):
method __init__ (line 186) | def __init__(self, num_units, direction='unidirectional', **kwargs):
FILE: frameworks/tf/layer_norm_lstm.cc
type HasteLayerNormLstmOp (line 86) | struct HasteLayerNormLstmOp : public OpKernel {
method HasteLayerNormLstmOp (line 87) | explicit HasteLayerNormLstmOp(OpKernelConstruction* context) : OpKerne...
method Compute (line 95) | void Compute(OpKernelContext* context) override {
type HasteLayerNormLstmGradOp (line 281) | struct HasteLayerNormLstmGradOp : public OpKernel {
method HasteLayerNormLstmGradOp (line 282) | explicit HasteLayerNormLstmGradOp(OpKernelConstruction* context) : OpK...
method Compute (line 284) | void Compute(OpKernelContext* context) override {
FILE: frameworks/tf/layer_norm_lstm.py
function lstm_gradient (line 37) | def lstm_gradient(op, *grads):
class LayerNormLSTMLayer (line 78) | class LayerNormLSTMLayer(tf.Module):
method __init__ (line 79) | def __init__(self,
method build (line 116) | def build(self, shape):
method get_weights (line 149) | def get_weights(self):
method state_size (line 162) | def state_size(self):
method output_size (line 166) | def output_size(self):
method __call__ (line 169) | def __call__(self, x, sequence_length, training):
class LayerNormLSTM (line 212) | class LayerNormLSTM(BaseRNN):
method __init__ (line 225) | def __init__(self, num_units, direction='unidirectional', **kwargs):
FILE: frameworks/tf/layer_norm_lstm_cell.py
class LayerNormLSTMCell (line 30) | class LayerNormLSTMCell(rnn_cell.RNNCell):
method __init__ (line 39) | def __init__(self,
method state_size (line 61) | def state_size(self):
method output_size (line 65) | def output_size(self):
method build (line 68) | def build(self, shape):
method __call__ (line 84) | def __call__(self, inputs, state, training=False, scope=None):
method _layer_norm (line 106) | def _layer_norm(self, x, gamma, beta):
FILE: frameworks/tf/lstm.cc
type HasteLstmOp (line 77) | struct HasteLstmOp : public OpKernel {
method HasteLstmOp (line 78) | explicit HasteLstmOp(OpKernelConstruction* context) : OpKernel(context) {
method Compute (line 86) | void Compute(OpKernelContext* context) override {
type HasteLstmGradOp (line 215) | struct HasteLstmGradOp : public OpKernel {
method HasteLstmGradOp (line 216) | explicit HasteLstmGradOp(OpKernelConstruction* context) : OpKernel(con...
method Compute (line 218) | void Compute(OpKernelContext* context) override {
FILE: frameworks/tf/lstm.py
function lstm_gradient (line 38) | def lstm_gradient(op, *grads):
class LSTMLayer (line 63) | class LSTMLayer(tf.Module):
method __init__ (line 64) | def __init__(self,
method build (line 102) | def build(self, shape):
method get_weights (line 146) | def get_weights(self):
method state_size (line 183) | def state_size(self):
method output_size (line 187) | def output_size(self):
method __call__ (line 190) | def __call__(self, x, sequence_length, training):
class LSTM (line 230) | class LSTM(BaseRNN):
method __init__ (line 245) | def __init__(self, num_units, direction='unidirectional', **kwargs):
FILE: frameworks/tf/support.cc
type std (line 29) | namespace std {
type hash<Key> (line 32) | struct hash<Key> {
function GetCublasHandle (line 43) | cublasHandle_t GetCublasHandle(tensorflow::OpKernelContext* context) {
function GetCudaStream (line 64) | const cudaStream_t& GetCudaStream(tensorflow::OpKernelContext* context) {
FILE: frameworks/tf/support.h
type tensorflow (line 21) | namespace tensorflow {
FILE: frameworks/tf/weight_config.py
class WeightConfig (line 17) | class WeightConfig:
method __init__ (line 20) | def __init__(self, initializer=None, constraint=None, transform=None):
method override (line 25) | def override(self, initializer, constraint, transform):
FILE: frameworks/tf/zoneout_wrapper.py
class ZoneoutWrapper (line 29) | class ZoneoutWrapper(rnn_cell.RNNCell):
method __init__ (line 38) | def __init__(self, cell, rate, training):
method state_size (line 56) | def state_size(self):
method output_size (line 60) | def output_size(self):
method __call__ (line 63) | def __call__(self, inputs, state, scope=None):
method _apply_zoneout (line 92) | def _apply_zoneout(self, new_tensor, old_tensor):
method _build_mask (line 100) | def _build_mask(self, shape):
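ZoneoutWrapper's _apply_zoneout and _build_mask implement zoneout regularization: during training each unit keeps its previous state value with probability `rate`, and at inference the old and new states are interpolated in expectation. A minimal numpy sketch of that rule (an illustration of the standard zoneout formulation, not the wrapper's code):

```python
import numpy as np

def apply_zoneout(new_h, old_h, rate, training, rng=None):
    # Training: per-unit Bernoulli mask decides which units keep their
    # previous value. Inference: deterministic expectation of that mask.
    if training:
        rng = rng if rng is not None else np.random.default_rng(0)
        keep_old = (rng.random(new_h.shape) < rate).astype(new_h.dtype)
        return keep_old * old_h + (1.0 - keep_old) * new_h
    return rate * old_h + (1.0 - rate) * new_h
```

Note the train/inference asymmetry mirrors dropout's: sampling at train time, its expectation at eval time, which is why the wrapper takes `training` as a constructor argument.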
FILE: lib/blas.h
type set_pointer_mode (line 22) | struct set_pointer_mode {
type enable_tensor_cores (line 34) | struct enable_tensor_cores {
type blas<__half> (line 49) | struct blas<__half> {
type blas<float> (line 54) | struct blas<float> {
type blas<double> (line 59) | struct blas<double> {
FILE: lib/gru_backward_gpu.cu.cc
function __global__ (line 27) | __global__
function __global__ (line 100) | __global__
type haste (line 118) | namespace haste {
type v0 (line 119) | namespace v0 {
type gru (line 120) | namespace gru {
type BackwardPass<T>::private_data (line 123) | struct BackwardPass<T>::private_data {
type BackwardPass<half> (line 411) | struct BackwardPass<half>
type BackwardPass<float> (line 412) | struct BackwardPass<float>
type BackwardPass<double> (line 413) | struct BackwardPass<double>
FILE: lib/gru_forward_gpu.cu.cc
function __global__ (line 28) | __global__
function __global__ (line 89) | __global__
type haste (line 107) | namespace haste {
type v0 (line 108) | namespace v0 {
type gru (line 109) | namespace gru {
type ForwardPass<T>::private_data (line 112) | struct ForwardPass<T>::private_data {
type ForwardPass<half> (line 373) | struct ForwardPass<half>
type ForwardPass<float> (line 374) | struct ForwardPass<float>
type ForwardPass<double> (line 375) | struct ForwardPass<double>
FILE: lib/haste/gru.h
type haste (line 21) | namespace haste {
FILE: lib/haste/indrnn.h
type haste (line 21) | namespace haste {
FILE: lib/haste/layer_norm.h
type haste (line 20) | namespace haste {
FILE: lib/haste/layer_norm_gru.h
type haste (line 21) | namespace haste {
FILE: lib/haste/layer_norm_indrnn.h
type haste (line 21) | namespace haste {
FILE: lib/haste/layer_norm_lstm.h
type haste (line 21) | namespace haste {
FILE: lib/haste/lstm.h
type haste (line 21) | namespace haste {
FILE: lib/indrnn_backward_gpu.cu.cc
function __global__ (line 26) | __global__
type haste (line 78) | namespace haste {
type v0 (line 79) | namespace v0 {
type indrnn (line 80) | namespace indrnn {
type BackwardPass<T>::private_data (line 83) | struct BackwardPass<T>::private_data {
class BackwardPass<float> (line 209) | class BackwardPass<float>
class BackwardPass<double> (line 210) | class BackwardPass<double>
FILE: lib/indrnn_forward_gpu.cu.cc
function __global__ (line 26) | __global__
type haste (line 67) | namespace haste {
type v0 (line 68) | namespace v0 {
type indrnn (line 69) | namespace indrnn {
type ForwardPass<T>::private_data (line 72) | struct ForwardPass<T>::private_data {
class ForwardPass<float> (line 212) | class ForwardPass<float>
class ForwardPass<double> (line 213) | class ForwardPass<double>
FILE: lib/inline_ops.h
function atomicAdd (line 47) | double atomicAdd(double* address, double val) {
FILE: lib/layer_norm_backward_gpu.cu.cc
function __global__ (line 24) | __global__
type haste (line 97) | namespace haste {
type v0 (line 98) | namespace v0 {
type layer_norm (line 99) | namespace layer_norm {
class BackwardPass<float> (line 167) | class BackwardPass<float>
class BackwardPass<double> (line 168) | class BackwardPass<double>
FILE: lib/layer_norm_forward_gpu.cu.cc
function __global__ (line 23) | __global__
type haste (line 92) | namespace haste {
type v0 (line 93) | namespace v0 {
type layer_norm (line 94) | namespace layer_norm {
class ForwardPass<float> (line 152) | class ForwardPass<float>
class ForwardPass<double> (line 153) | class ForwardPass<double>
FILE: lib/layer_norm_gru_backward_gpu.cu.cc
function __global__ (line 26) | __global__
type haste (line 99) | namespace haste {
type v0 (line 100) | namespace v0 {
type layer_norm_gru (line 101) | namespace layer_norm_gru {
type BackwardPass<T>::private_data (line 104) | struct BackwardPass<T>::private_data {
type BackwardPass<float> (line 312) | struct BackwardPass<float>
type BackwardPass<double> (line 313) | struct BackwardPass<double>
FILE: lib/layer_norm_gru_forward_gpu.cu.cc
function __global__ (line 26) | __global__
type haste (line 87) | namespace haste {
type v0 (line 88) | namespace v0 {
type layer_norm_gru (line 89) | namespace layer_norm_gru {
type ForwardPass<T>::private_data (line 92) | struct ForwardPass<T>::private_data {
type ForwardPass<float> (line 307) | struct ForwardPass<float>
type ForwardPass<double> (line 308) | struct ForwardPass<double>
FILE: lib/layer_norm_indrnn_backward_gpu.cu.cc
function __global__ (line 26) | __global__
type haste (line 78) | namespace haste {
type v0 (line 79) | namespace v0 {
type layer_norm_indrnn (line 80) | namespace layer_norm_indrnn {
type BackwardPass<T>::private_data (line 83) | struct BackwardPass<T>::private_data {
class BackwardPass<float> (line 211) | class BackwardPass<float>
class BackwardPass<double> (line 212) | class BackwardPass<double>
FILE: lib/layer_norm_indrnn_forward_gpu.cu.cc
function __global__ (line 26) | __global__
type haste (line 67) | namespace haste {
type v0 (line 68) | namespace v0 {
type layer_norm_indrnn (line 69) | namespace layer_norm_indrnn {
type ForwardPass<T>::private_data (line 72) | struct ForwardPass<T>::private_data {
class ForwardPass<float> (line 215) | class ForwardPass<float>
class ForwardPass<double> (line 216) | class ForwardPass<double>
FILE: lib/layer_norm_lstm_backward_gpu.cu.cc
function __global__ (line 28) | __global__
function __global__ (line 68) | __global__
type haste (line 122) | namespace haste {
type v0 (line 123) | namespace v0 {
type layer_norm_lstm (line 124) | namespace layer_norm_lstm {
type BackwardPass<T>::private_data (line 127) | struct BackwardPass<T>::private_data {
type BackwardPass<float> (line 354) | struct BackwardPass<float>
type BackwardPass<double> (line 355) | struct BackwardPass<double>
FILE: lib/layer_norm_lstm_forward_gpu.cu.cc
function __global__ (line 27) | __global__
function __global__ (line 76) | __global__
type haste (line 113) | namespace haste {
type v0 (line 114) | namespace v0 {
type layer_norm_lstm (line 115) | namespace layer_norm_lstm {
type ForwardPass<T>::private_data (line 118) | struct ForwardPass<T>::private_data {
type ForwardPass<float> (line 342) | struct ForwardPass<float>
type ForwardPass<double> (line 343) | struct ForwardPass<double>
FILE: lib/lstm_backward_gpu.cu.cc
function __global__ (line 28) | __global__
type haste (line 100) | namespace haste {
type v0 (line 101) | namespace v0 {
type lstm (line 102) | namespace lstm {
type BackwardPass<T>::private_data (line 105) | struct BackwardPass<T>::private_data {
type BackwardPass<float> (line 407) | struct BackwardPass<float>
type BackwardPass<double> (line 408) | struct BackwardPass<double>
FILE: lib/lstm_forward_gpu.cu.cc
function __global__ (line 28) | __global__
type haste (line 91) | namespace haste {
type v0 (line 92) | namespace v0 {
type lstm (line 93) | namespace lstm {
type ForwardPass<T>::private_data (line 96) | struct ForwardPass<T>::private_data {
type ForwardPass<float> (line 368) | struct ForwardPass<float>
type ForwardPass<double> (line 369) | struct ForwardPass<double>
FILE: validation/pytorch.py
function self_consistency (line 42) | def self_consistency(rnn, x):
function native_consistency (line 65) | def native_consistency(haste_rnn, pytorch_rnn, x):
function _run_rnn (line 92) | def _run_rnn(rnn_type, x, **kwargs):
function run_rnn (line 100) | def run_rnn(rnn_type, x):
function main (line 105) | def main(args):
FILE: validation/tf.py
function stfu (line 21) | def stfu():
function NativeGRUBuilder (line 27) | def NativeGRUBuilder(hidden_size):
function NativeLSTMBuilder (line 37) | def NativeLSTMBuilder(hidden_size):
function NativeGRUWeights (line 47) | def NativeGRUWeights(native_gru, haste_gru):
function NativeLSTMWeights (line 54) | def NativeLSTMWeights(native_lstm, haste_lstm):
function native_consistency (line 89) | def native_consistency(haste_rnn, native_rnn, x):
function run_rnn (line 107) | def run_rnn(rnn_type, x):
function main (line 114) | def main(args):
FILE: validation/tf_pytorch.py
function stfu (line 25) | def stfu():
function copy_weights_gru (line 31) | def copy_weights_gru(rnn_tf, rnn_pt):
function copy_weights_indrnn (line 44) | def copy_weights_indrnn(rnn_tf, rnn_pt):
function copy_weights_layer_norm_gru (line 55) | def copy_weights_layer_norm_gru(rnn_tf, rnn_pt):
function copy_weights_layer_norm_indrnn (line 70) | def copy_weights_layer_norm_indrnn(rnn_tf, rnn_pt):
function copy_weights_layer_norm_lstm (line 83) | def copy_weights_layer_norm_lstm(rnn_tf, rnn_pt):
function copy_weights_lstm (line 100) | def copy_weights_lstm(rnn_tf, rnn_pt):
function run_rnn (line 144) | def run_rnn(rnn_type, x):
function main (line 166) | def main(args):
Condensed preview — 102 files, each showing path, character count, and a content snippet.
[
{
"path": ".gitignore",
"chars": 69,
"preview": "*.a\n*.o\n*.so\n*.whl\nbenchmark_lstm\nbenchmark_gru\nhaste_lstm\nhaste_gru\n"
},
{
"path": "CHANGELOG.md",
"chars": 1636,
"preview": "# ChangeLog\n\n## 0.4.0 (2020-04-13)\n### Added\n- New layer normalized GRU layer (`LayerNormGRU`).\n- New IndRNN layer.\n- CP"
},
{
"path": "LICENSE",
"chars": 11340,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "Makefile",
"chars": 5630,
"preview": "AR ?= ar\nCXX ?= g++\nNVCC ?= nvcc -ccbin $(CXX)\nPYTHON ?= python\n\nifeq ($(OS),Windows_NT)\nLIBHASTE := haste.lib\nCUDA_HOME"
},
{
"path": "README.md",
"chars": 9549,
"preview": "<div align=\"center\">\n <img src=\"https://lmnt.com/assets/haste-logo_social_media.png\">\n</div>\n\n-------------------------"
},
{
"path": "benchmarks/benchmark_gru.cc",
"chars": 16074,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "benchmarks/benchmark_lstm.cc",
"chars": 16208,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "benchmarks/cudnn_wrappers.h",
"chars": 4247,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "benchmarks/report.py",
"chars": 2631,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "build/MANIFEST.in",
"chars": 188,
"preview": "include Makefile\ninclude frameworks/tf/*.h\ninclude frameworks/tf/*.cc\ninclude frameworks/pytorch/*.h\ninclude frameworks/"
},
{
"path": "build/common.py",
"chars": 1392,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "build/setup.pytorch.py",
"chars": 2237,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "build/setup.tf.py",
"chars": 1796,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "docs/pytorch/haste_pytorch/GRU.md",
"chars": 24606,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_pytorch.GRU"
},
{
"path": "docs/pytorch/haste_pytorch/IndRNN.md",
"chars": 22706,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_pytorch.Ind"
},
{
"path": "docs/pytorch/haste_pytorch/LSTM.md",
"chars": 24573,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_pytorch.LST"
},
{
"path": "docs/pytorch/haste_pytorch/LayerNormGRU.md",
"chars": 23644,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_pytorch.Lay"
},
{
"path": "docs/pytorch/haste_pytorch/LayerNormLSTM.md",
"chars": 23628,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_pytorch.Lay"
},
{
"path": "docs/pytorch/haste_pytorch.md",
"chars": 700,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_pytorch\" />"
},
{
"path": "docs/tf/haste_tf/GRU.md",
"chars": 7593,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.GRU\" />\n"
},
{
"path": "docs/tf/haste_tf/GRUCell.md",
"chars": 19158,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.GRUCell\""
},
{
"path": "docs/tf/haste_tf/IndRNN.md",
"chars": 6957,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.IndRNN\" "
},
{
"path": "docs/tf/haste_tf/LSTM.md",
"chars": 7949,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.LSTM\" />"
},
{
"path": "docs/tf/haste_tf/LayerNorm.md",
"chars": 4346,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.LayerNor"
},
{
"path": "docs/tf/haste_tf/LayerNormGRU.md",
"chars": 7669,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.LayerNor"
},
{
"path": "docs/tf/haste_tf/LayerNormGRUCell.md",
"chars": 19248,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.LayerNor"
},
{
"path": "docs/tf/haste_tf/LayerNormLSTM.md",
"chars": 7443,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.LayerNor"
},
{
"path": "docs/tf/haste_tf/LayerNormLSTMCell.md",
"chars": 19254,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.LayerNor"
},
{
"path": "docs/tf/haste_tf/ZoneoutWrapper.md",
"chars": 19069,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf.ZoneoutW"
},
{
"path": "docs/tf/haste_tf.md",
"chars": 1229,
"preview": "<div itemscope itemtype=\"http://developers.google.com/ReferenceObject\">\n<meta itemprop=\"name\" content=\"haste_tf\" />\n<met"
},
{
"path": "examples/device_ptr.h",
"chars": 2346,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "examples/gru.cc",
"chars": 6080,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "examples/lstm.cc",
"chars": 10164,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/__init__.py",
"chars": 1126,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/base_rnn.py",
"chars": 4389,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/gru.cc",
"chars": 5314,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/gru.py",
"chars": 10236,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/indrnn.cc",
"chars": 4649,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/indrnn.py",
"chars": 6656,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/layer_norm_gru.cc",
"chars": 7641,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/layer_norm_gru.py",
"chars": 8819,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/layer_norm_indrnn.cc",
"chars": 5964,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/layer_norm_indrnn.py",
"chars": 7407,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/layer_norm_lstm.cc",
"chars": 8931,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/layer_norm_lstm.py",
"chars": 9445,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/lstm.cc",
"chars": 5195,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/lstm.py",
"chars": 10163,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/pytorch/support.cc",
"chars": 1106,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/pytorch/support.h",
"chars": 1248,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/__init__.py",
"chars": 1422,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/arena.h",
"chars": 2926,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/base_rnn.py",
"chars": 4439,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/gru.cc",
"chars": 12364,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/gru.py",
"chars": 8702,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/gru_cell.py",
"chars": 2675,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/indrnn.cc",
"chars": 9769,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/indrnn.py",
"chars": 7633,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/layer_norm.cc",
"chars": 5505,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/layer_norm.py",
"chars": 2622,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/layer_norm_gru.cc",
"chars": 15702,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/layer_norm_gru.py",
"chars": 9039,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/layer_norm_gru_cell.py",
"chars": 3480,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/layer_norm_indrnn.cc",
"chars": 12381,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/layer_norm_indrnn.py",
"chars": 8064,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/layer_norm_lstm.cc",
"chars": 17762,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/layer_norm_lstm.py",
"chars": 9538,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/layer_norm_lstm_cell.py",
"chars": 3802,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/lstm.cc",
"chars": 12400,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/lstm.py",
"chars": 11360,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/support.cc",
"chars": 2152,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/support.h",
"chars": 1210,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "frameworks/tf/weight_config.py",
"chars": 1201,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "frameworks/tf/zoneout_wrapper.py",
"chars": 3403,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "lib/blas.h",
"chars": 1805,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/device_assert.h",
"chars": 999,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/gru_backward_gpu.cu.cc",
"chars": 11907,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/gru_forward_gpu.cu.cc",
"chars": 11073,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste/gru.h",
"chars": 8655,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste/indrnn.h",
"chars": 2087,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste/layer_norm.h",
"chars": 2351,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste/layer_norm_gru.h",
"chars": 8428,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste/layer_norm_indrnn.h",
"chars": 2227,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste/layer_norm_lstm.h",
"chars": 8356,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste/lstm.h",
"chars": 12825,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/haste.h",
"chars": 1257,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/indrnn_backward_gpu.cu.cc",
"chars": 5356,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/indrnn_forward_gpu.cu.cc",
"chars": 5582,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/inline_ops.h",
"chars": 2029,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_backward_gpu.cu.cc",
"chars": 4938,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_forward_gpu.cu.cc",
"chars": 4195,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_gru_backward_gpu.cu.cc",
"chars": 9040,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_gru_forward_gpu.cu.cc",
"chars": 9283,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_indrnn_backward_gpu.cu.cc",
"chars": 5500,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_indrnn_forward_gpu.cu.cc",
"chars": 5752,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_lstm_backward_gpu.cu.cc",
"chars": 10687,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/layer_norm_lstm_forward_gpu.cu.cc",
"chars": 10525,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/lstm_backward_gpu.cu.cc",
"chars": 12020,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "lib/lstm_forward_gpu.cu.cc",
"chars": 11436,
"preview": "// Copyright 2020 LMNT, Inc. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the \"License\");\n"
},
{
"path": "validation/pytorch.py",
"chars": 3278,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "validation/pytorch_speed.py",
"chars": 362,
"preview": "import torch\nimport haste_pytorch as haste\n\nfrom time import time\n\n\nseq_len = 2500\nbatch_size = 64\ninput_size = 256\nhidd"
},
{
"path": "validation/tf.py",
"chars": 3733,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
},
{
"path": "validation/tf_pytorch.py",
"chars": 6123,
"preview": "# Copyright 2020 LMNT, Inc. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# y"
}
]
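The manifest entries above share a simple JSON shape: a `path`, a `chars` size, and a truncated `preview`. As a minimal sketch — assuming only those three field names, which are taken from the listing itself — a consumer of this extraction could total the extracted characters per top-level directory:

```python
import json
from collections import defaultdict

def summarize_manifest(entries):
    """Group manifest entries by top-level directory and sum their sizes.

    Each entry is expected to carry the "path" and "chars" fields used in
    the file listing above.
    """
    totals = defaultdict(int)
    for entry in entries:
        # The first path component is the top-level directory (or filename).
        top = entry["path"].split("/", 1)[0]
        totals[top] += entry["chars"]
    return dict(totals)

# Two entries in the same shape as the manifest above.
manifest = json.loads("""
[
  {"path": "lib/blas.h", "chars": 1805, "preview": "..."},
  {"path": "lib/haste/lstm.h", "chars": 12825, "preview": "..."}
]
""")
print(summarize_manifest(manifest))  # {'lib': 14630}
```

This is one way to turn the flat listing into a per-directory size breakdown; the field names are an assumption based on this page's format, not a documented schema.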
About this extraction
This page contains the full source code of the lmnt-com/haste GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 102 files (762.7 KB), approximately 205.3k tokens, and a symbol index with 413 extracted functions, classes, methods, constants, and types. The full output can be copied to the clipboard or downloaded as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.