Repository: LiyuanLucasLiu/LD-Net
Branch: master
Commit: f9489b6e7d43
Files: 34
Total size: 151.5 KB
Directory structure:
gitextract_4ncvupf5/
├── LICENSE
├── ReadMe.md
├── docs/
│ ├── Makefile
│ └── source/
│ ├── conf.py
│ ├── index.rst
│ ├── seq.rst
│ └── word.rst
├── ldnet_ner_prune.sh
├── ldnet_np_prune.sh
├── model_seq/
│ ├── __init__.py
│ ├── crf.py
│ ├── dataset.py
│ ├── elmo.py
│ ├── evaluator.py
│ ├── seqlabel.py
│ ├── seqlm.py
│ ├── sparse_lm.py
│ └── utils.py
├── model_word_ada/
│ ├── LM.py
│ ├── __init__.py
│ ├── adaptive.py
│ ├── basic.py
│ ├── dataset.py
│ ├── densenet.py
│ ├── ldnet.py
│ └── utils.py
├── pre_seq/
│ ├── encode_data.py
│ └── gene_map.py
├── pre_word_ada/
│ ├── encode_data2folder.py
│ └── gene_map.py
├── prune_sparse_seq.py
├── train_lm.py
├── train_seq.py
└── train_seq_elmo.py
================================================
FILE CONTENTS
================================================
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [2018] [Liyuan Liu]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: ReadMe.md
================================================
# LD-Net
[](http://ld-net.readthedocs.io/en/latest/?badge=latest)
[](https://opensource.org/licenses/Apache-2.0)
**Check Our New NER Toolkit🚀🚀🚀**
- **Inference**:
- **[LightNER](https://github.com/LiyuanLucasLiu/LightNER)**: inference w. models pre-trained / trained w. *any* following tools, *efficiently*.
- **Training**:
- **[LD-Net](https://github.com/LiyuanLucasLiu/LD-Net)**: train NER models w. efficient contextualized representations.
- **[VanillaNER](https://github.com/LiyuanLucasLiu/Vanilla_NER)**: train vanilla NER models w. pre-trained embedding.
- **Distant Training**:
- **[AutoNER](https://shangjingbo1226.github.io/AutoNER/)**: train NER models w.o. line-by-line annotations and get competitive performance.
--------------------------------
LD-Net provides sequence labeling models featuring:
- **Efficiency**: constructing *efficient contextualized representations* without retraining language models.
- **Portability**: *well-organized*, *easy-to-modify* and *[well-documented](http://lm-lstm-crf.readthedocs.io/en/latest/)*.
Remarkablely, our pre-trained NER model achieved:
- **92.08** test F1 on the CoNLL03 NER task.
- **160K words/sec** decoding speed (**6X** speedup compared to its original model).
Details about LD-Net can be accessed at: https://arxiv.org/abs/1804.07827.
- [Model notes](#model-notes)
- [Benchmarks](#benchmarks)
- [Pretrained model](#pretrained-model)
- [Language models](#language-models)
- [Named Entity Recognition](#named-entity-recognition)
- [Chunking](#chunking)
- [Training](#model-training)
- [Dependency](#dependency)
- [Data](#data)
- [Model](#model)
- [Command](#command)
- [Inference](#inference)
- [Citation](#citation)
## Model Notes

## Benchmarks
| Model for CoNLL03 | #FLOPs| Mean(F1) | Std(F1) |
| ------------- |-------------| -----| -----|
| Vanilla NER w.o. LM | 3 M | 90.78 | 0.24 |
| LD-Net (w.o. pruning) | 51 M | 91.86 | 0.15 |
| LD-Net (origin, picked based on dev f1) | 51 M | 91.95 | |
| LD-Net (pruned) | **5 M** | 91.84 | 0.14 |
| Model for CoNLL00 | #FLOPs| Mean(F1) | Std(F1) |
| ------------- |-------------| -----| -----|
| Vanilla NP w.o. LM | 3 M | 94.42 | 0.08 |
| LD-Net (w.o. pruning) | 51 M | 96.01 | 0.07 |
| LD-Net (origin, picked based on dev f1) | 51 M | 96.13 | |
| LD-Net (pruned) | **10 M** | 95.66 | 0.04 |
## Pretrained Models
Here we provide both pre-trained language models and pre-trained sequence labeling models.
### Language Models
Our pretrained language model contains word embedding, 10-layer densely-connected LSTM and adative softmax, and achieve an average PPL of 50.06 on the one billion benchmark dataset.
| Forward Language Model | Backward Language Model |
| ------------- |------------- |
| [Download Link](http://dmserv4.cs.illinois.edu/ld0.th) | [Download Link](http://dmserv4.cs.illinois.edu/ld_0.th)|
### Named Entity Recognition
The original pre-trained named entity tagger achieves 91.95 F1, the pruned tagged achieved 92.08 F1.
| Original Tagger | Pruned Tagger |
| ------------- |------------- |
| [Download Link](http://dmserv4.cs.illinois.edu/ner.th) | [Download Link](http://dmserv4.cs.illinois.edu/pner0.th) |
### Chunking
The original pre-trained named entity tagger achieves 96.13 F1, the pruned tagged achieved 95.79 F1.
| Original Tagger | Pruned Tagger |
| ------------- |------------- |
| [Download Link](http://dmserv4.cs.illinois.edu/np.th) | [Download Link](http://dmserv4.cs.illinois.edu/pnp0.th) |
## Training
### Demo Scripts
To pruning the original LD-Net for the CoNLL03 NER, please run:
```
bash ldnet_ner_prune.sh
```
To pruning the original LD-Net for the CoNLL00 Chunking, please run:
```
bash ldnet_np_prune.sh
```
### Dependency
Our package is based on Python 3.6 and the following packages:
```
numpy
tqdm
torch-scope
torch==0.4.1
```
### Data
Pre-process scripts are available in ```pre_seq``` and ```pre_word_ada```, while pre-processed data has been stored in:
| NER | Chunking |
| ------------- |------------- |
| [Download Link](http://dmserv4.cs.illinois.edu/ner_dataset.pk) | [Download Link](http://dmserv4.cs.illinois.edu/np_dataset.pk) |
### Model
Our implementations are available in ```model_seq``` and ```model_word_ada```, and the documentations are hosted in [ReadTheDoc](http://lm-lstm-crf.readthedocs.io/en/latest/)
| NER | Chunking |
| ------------- |------------- |
| [Download Link](http://dmserv4.cs.illinois.edu/ner_dataset.pk) | [Download Link](http://dmserv4.cs.illinois.edu/np_dataset.pk) |
## Inference
For model inference, please check our [LightNER package](https://github.com/LiyuanLucasLiu/LightNER)
## Citation
If you find the implementation useful, please cite the following paper: [Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling](https://arxiv.org/abs/1804.07827)
```
@inproceedings{liu2018efficient,
title = "{Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling}",
author = {Liu, Liyuan and Ren, Xiang and Shang, Jingbo and Peng, Jian and Han, Jiawei},
booktitle = {EMNLP},
year = 2018,
}
```
================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXPROJ = LD_Net
SOURCEDIR = source
BUILDDIR = build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
================================================
FILE: docs/source/conf.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Wrapper documentation build configuration file, created by
# sphinx-quickstart on Thu Sep 14 03:49:01 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
import os
import sys
sys.path.insert(0, os.path.abspath('../..'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.mathjax',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx.ext.githubpages'
]
napoleon_use_ivar = True
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = 'LD-Net'
copyright = '2018, Liyuan Liu'
author = 'Liyuan Liu'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = ''
# The full version, including alpha/beta/rc tags.
release = ''
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = []
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
html_theme_options = {
'collapse_navigation': False,
'display_version': True,
}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
'**': [
'about.html',
'navigation.html',
'relations.html', # needs 'show_related': True theme option to display
'searchbox.html',
'donate.html',
]
}
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'LD_Net'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'ldnet.tex', 'LD-Net Documentation',
'Liyuan Liu', 'manual'),
]
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'LD-Net', 'LD-Net Documentation',
[author], 1)
]
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'LD-Net', 'LD-Net Documentation',
author, 'LD-Net', 'Efficient Contextualized Representations.',
'Miscellaneous'),
]
autodoc_mock_imports = ['torch', 'numpy', 'tensorboardX', 'git', 'tqdm']
intersphinx_mapping = {
'git': ('https://gitpython.readthedocs.io/en/stable/', None),
'tensorboardX': ('https://tensorboardx.readthedocs.io/en/latest/', None),
'python':('https://docs.python.org/3', None),
'numpy': ('http://docs.scipy.org/doc/numpy/', None),
'torch': ('http://pytorch.org/docs/master', None)
}
================================================
FILE: docs/source/index.rst
================================================
.. LD-Net documentation master file.
:github_url: https://github.com/LiyuanLucasLiu/LD-Net
LD-Net documentation
=========================
**Check Our New NER Toolkit🚀🚀🚀**
- **Inference**:
- `LightNER <https://github.com/LiyuanLucasLiu/LightNER>`_: inference w. models pre-trained / trained w. *any* following tools, *efficiently*.
- **Training**:
- `LD-Net <https://github.com/LiyuanLucasLiu/LD-Net>`_: train NER models w. efficient contextualized representations.
- `VanillaNER <https://github.com/LiyuanLucasLiu/Vanilla_NER>`_: train vanilla NER models w. pre-trained embedding.
- **Distant Training**:
- `AutoNER <https://shangjingbo1226.github.io/AutoNER/>`_: train NER models w.o. line-by-line annotations and get competitive performance.
--------------------------
This project provides high-performance word-level language model, and sequence labeling with contextualized representation.
The key feature of this project is the support of langugage model pruning without retraining.
Details about LD-Net can be accessed at: https://arxiv.org/abs/1804.07827.
.. toctree::
:maxdepth: 2
:caption: Language Modeling
word
.. toctree::
:maxdepth: 2
:caption: Sequence Labeling
seq
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
================================================
FILE: docs/source/seq.rst
================================================
Sequence Labeling
==========================
model_seq\.crf module
----------------------
.. automodule:: model_seq.crf
:members:
model_seq\.dataset module
--------------------------
.. automodule:: model_seq.dataset
:members:
model_seq\.elmo module
-----------------------
.. automodule:: model_seq.elmo
:members:
model_seq\.evaluator module
----------------------------
.. automodule:: model_seq.evaluator
:members:
model_seq\.seqlabel module
--------------------------
.. automodule:: model_seq.seqlabel
:members:
model_seq\.seqlm module
------------------------
.. automodule:: model_seq.seqlm
:members:
model_seq\.sparse_lm module
----------------------------
.. automodule:: model_seq.sparse_lm
:members:
model_seq\.utils module
-------------------------
.. automodule:: model_seq.utils
:members:
================================================
FILE: docs/source/word.rst
================================================
Language Modeling
==========================
model_word_ada\.adaptive module
-------------------------------
.. automodule:: model_word_ada.adaptive
:members:
model_word_ada\.basic module
----------------------------
.. automodule:: model_word_ada.basic
:members:
model_word_ada\.dataset module
-------------------------------
.. automodule:: model_word_ada.dataset
:members:
model_word_ada\.densenet module
-------------------------------
.. automodule:: model_word_ada.densenet
:members:
model_word_ada\.ldnet module
----------------------------
.. automodule:: model_word_ada.ldnet
:members:
model_word_ada\.LM module
-------------------------
.. automodule:: model_word_ada.LM
:members:
model_word_ada\.utils module
----------------------------
.. automodule:: model_word_ada.utils
:members:
================================================
FILE: ldnet_ner_prune.sh
================================================
FIRST_RUN=1
DATA_ROOT="data/"
NER_DATASET=$DATA_ROOT/ner_dataset.pk
CHECKPOINT_ROOT="checkpoint/"
NER_CHECKPOINT=$CHECKPOINT_ROOT/ner.th
CHECKPOINT_NAME="p_ner0"
green=`tput setaf 2`
reset=`tput sgr0`
if [ $FIRST_RUN == 1 ] && [ ! -e $NER_DATASET ]; then
echo ${green}=== Downloading Dataset ===${reset}
mkdir -p DATA_ROOT
curl http://dmserv4.cs.illinois.edu/ner_dataset.pk -o $NER_DATASET
fi
if [ $FIRST_RUN == 1 ] && [ ! -e $NER_CHECKPOINT ]; then
echo ${green}=== Downloading Checkpoint ===${reset}
mkdir -p CHECKPOINT_ROOT
curl http://dmserv4.cs.illinois.edu/ner.th -o $NER_CHECKPOINT
fi
echo ${green}=== Pruning NER Model ===${reset}
python prune_sparse_seq.py --cp_root $CHECKPOINT_ROOT --checkpoint_name $CHECKPOINT_NAME --corpus $NER_DATASET --load_seq $NER_CHECKPOINT --seq_lambda0 0.05 --seq_lambda1 2
================================================
FILE: ldnet_np_prune.sh
================================================
FIRST_RUN=1
DATA_ROOT="data/"
NP_DATASET=$DATA_ROOT/np_dataset.pk
CHECKPOINT_ROOT="checkpoint/"
NP_CHECKPOINT=$CHECKPOINT_ROOT/np.th
CHECKPOINT_NAME="p_np0"
green=`tput setaf 2`
reset=`tput sgr0`
if [ $FIRST_RUN == 1 ] && [ ! -e $NP_DATASET ]; then
echo ${green}=== Downloading Dataset ===${reset}
mkdir -p DATA_ROOT
curl http://dmserv4.cs.illinois.edu/np_dataset.pk -o $NP_DATASET
fi
if [ $FIRST_RUN == 1 ] && [ ! -e $NP_CHECKPOINT ]; then
echo ${green}=== Downloading Checkpoint ===${reset}
mkdir -p CHECKPOINT_ROOT
curl http://dmserv4.cs.illinois.edu/np.th -o $NP_CHECKPOINT
fi
echo ${green}=== Pruning NER Model ===${reset}
python prune_sparse_seq.py --cp_root $CHECKPOINT_ROOT --checkpoint_name $CHECKPOINT_NAME --corpus $NP_DATASET --load_seq $NP_CHECKPOINT --seq_lambda0 0.05 --seq_lambda1 2
================================================
FILE: model_seq/__init__.py
================================================
================================================
FILE: model_seq/crf.py
================================================
"""
.. module:: crf
:synopsis: conditional random field
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.optim as optim
import torch.sparse as sparse
import model_seq.utils as utils
class CRF(nn.Module):
"""
Conditional Random Field Module
Parameters
----------
hidden_dim : ``int``, required.
the dimension of the input features.
tagset_size : ``int``, required.
the size of the target labels.
if_bias: ``bool``, optional, (default=True).
whether the linear transformation has the bias term.
"""
def __init__(self,
hidden_dim: int,
tagset_size: int,
if_bias: bool = True):
super(CRF, self).__init__()
self.tagset_size = tagset_size
self.hidden2tag = nn.Linear(hidden_dim, self.tagset_size, bias=if_bias)
self.transitions = nn.Parameter(torch.Tensor(self.tagset_size, self.tagset_size))
def rand_init(self):
"""
random initialization
"""
utils.init_linear(self.hidden2tag)
self.transitions.data.zero_()
def forward(self, feats):
"""
calculate the potential score for the conditional random field.
Parameters
----------
feats: ``torch.FloatTensor``, required.
the input features for the conditional random field, of shape (*, hidden_dim).
Returns
-------
output: ``torch.FloatTensor``.
A float tensor of shape (ins_num, from_tag_size, to_tag_size)
"""
scores = self.hidden2tag(feats).view(-1, 1, self.tagset_size)
ins_num = scores.size(0)
crf_scores = scores.expand(ins_num, self.tagset_size, self.tagset_size) + self.transitions.view(1, self.tagset_size, self.tagset_size).expand(ins_num, self.tagset_size, self.tagset_size)
return crf_scores
class CRFLoss(nn.Module):
"""
The negative loss for the Conditional Random Field Module
Parameters
----------
y_map : ``dict``, required.
a ``dict`` maps from tag string to tag index.
average_batch : ``bool``, optional, (default=True).
whether the return score would be averaged per batch.
"""
def __init__(self,
y_map: dict,
average_batch: bool = True):
super(CRFLoss, self).__init__()
self.tagset_size = len(y_map)
self.start_tag = y_map['<s>']
self.end_tag = y_map['<eof>']
self.average_batch = average_batch
def forward(self, scores, target, mask):
"""
calculate the negative log likehood for the conditional random field.
Parameters
----------
scores: ``torch.FloatTensor``, required.
the potential score for the conditional random field, of shape (seq_len, batch_size, from_tag_size, to_tag_size).
target: ``torch.LongTensor``, required.
the positive path for the conditional random field, of shape (seq_len, batch_size).
mask: ``torch.ByteTensor``, required.
the mask for the unpadded sentence parts, of shape (seq_len, batch_size).
Returns
-------
loss: ``torch.FloatTensor``.
The NLL loss.
"""
seq_len = scores.size(0)
bat_size = scores.size(1)
tg_energy = torch.gather(scores.view(seq_len, bat_size, -1), 2, target.unsqueeze(2)).view(seq_len, bat_size)
tg_energy = tg_energy.masked_select(mask).sum()
seq_iter = enumerate(scores)
_, inivalues = seq_iter.__next__()
partition = inivalues[:, self.start_tag, :].squeeze(1).clone()
for idx, cur_values in seq_iter:
cur_values = cur_values + partition.unsqueeze(2).expand(bat_size, self.tagset_size, self.tagset_size)
cur_partition = utils.log_sum_exp(cur_values)
mask_idx = mask[idx, :].view(bat_size, 1).expand(bat_size, self.tagset_size)
partition.masked_scatter_(mask_idx, cur_partition.masked_select(mask_idx))
partition = partition[:, self.end_tag].sum()
if self.average_batch:
return (partition - tg_energy) / bat_size
else:
return (partition - tg_energy)
class CRFDecode():
"""
The negative loss for the Conditional Random Field Module
Parameters
----------
y_map : ``dict``, required.
a ``dict`` maps from tag string to tag index.
"""
def __init__(self, y_map: dict):
self.tagset_size = len(y_map)
self.start_tag = y_map['<s>']
self.end_tag = y_map['<eof>']
self.y_map = y_map
self.r_y_map = {v:k for k, v in self.y_map.items()}
def decode(self, scores, mask):
"""
find the best path from the potential scores by the viterbi decoding algorithm.
Parameters
----------
scores: ``torch.FloatTensor``, required.
the potential score for the conditional random field, of shape (seq_len, batch_size, from_tag_size, to_tag_size).
mask: ``torch.ByteTensor``, required.
the mask for the unpadded sentence parts, of shape (seq_len, batch_size).
Returns
-------
output: ``torch.LongTensor``.
A LongTensor of shape (seq_len - 1, batch_size)
"""
seq_len = scores.size(0)
bat_size = scores.size(1)
mask = 1 - mask.data
decode_idx = torch.LongTensor(seq_len-1, bat_size)
seq_iter = enumerate(scores)
_, inivalues = seq_iter.__next__()
forscores = inivalues[:, self.start_tag, :]
back_points = list()
for idx, cur_values in seq_iter:
cur_values = cur_values + forscores.contiguous().view(bat_size, self.tagset_size, 1).expand(bat_size, self.tagset_size, self.tagset_size)
forscores, cur_bp = torch.max(cur_values, 1)
cur_bp.masked_fill_(mask[idx].view(bat_size, 1).expand(bat_size, self.tagset_size), self.end_tag)
back_points.append(cur_bp)
pointer = back_points[-1][:, self.end_tag]
decode_idx[-1] = pointer
for idx in range(len(back_points)-2, -1, -1):
back_point = back_points[idx]
index = pointer.contiguous().view(-1, 1)
pointer = torch.gather(back_point, 1, index).view(-1)
decode_idx[idx] = pointer
return decode_idx
def to_spans(self, sequence):
"""
decode the best path to spans.
Parameters
----------
sequence: list, required.
the list of best label indexes paths .
Returns
-------
output: ``set``.
A set of chunks contains the position and type of the entities.
"""
chunks = []
current = None
for i, y in enumerate(sequence):
label = self.r_y_map[y]
if label.startswith('B-'):
if current is not None:
chunks.append('@'.join(current))
current = [label.replace('B-', ''), '%d' % i]
elif label.startswith('S-'):
if current is not None:
chunks.append('@'.join(current))
current = None
base = label.replace('S-', '')
chunks.append('@'.join([base, '%d' % i]))
elif label.startswith('I-'):
if current is not None:
base = label.replace('I-', '')
if base == current[0]:
current.append('%d' % i)
else:
chunks.append('@'.join(current))
current = [base, '%d' % i]
else:
current = [label.replace('I-', ''), '%d' % i]
elif label.startswith('E-'):
if current is not None:
base = label.replace('E-', '')
if base == current[0]:
current.append('%d' % i)
chunks.append('@'.join(current))
current = None
else:
chunks.append('@'.join(current))
current = [base, '%d' % i]
chunks.append('@'.join(current))
current = None
else:
current = [label.replace('E-', ''), '%d' % i]
chunks.append('@'.join(current))
current = None
else:
if current is not None:
chunks.append('@'.join(current))
current = None
if current is not None:
chunks.append('@'.join(current))
return set(chunks)
================================================
FILE: model_seq/dataset.py
================================================
"""
.. module:: dataset
:synopsis: dataset for sequence labeling
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import sys
import pickle
import random
import functools
import itertools
from tqdm import tqdm
class SeqDataset(object):
"""
Dataset for Sequence Labeling
Parameters
----------
dataset : ``list``, required.
The encoded dataset (outputs of preprocess scripts).
flm_pad : ``int``, required.
The pad index for the forward language model.
blm_pad : ``int``, required.
The pad index for the backward language model.
w_pad : ``int``, required.
The pad index for the word-level inputs.
c_con : ``int``, required.
The index of connect character token for character-level inputs.
c_pad : ``int``, required.
The pad index for the character-level inputs.
y_start : ``int``, required.
The index of the start label token.
y_pad : ``int``, required.
The index of the pad label token.
y_size : ``int``, required.
The size of the tag set.
batch_size: ``int``, required.
Batch size.
"""
def __init__(self,
dataset: list,
flm_pad: int,
blm_pad: int,
w_pad: int,
c_con: int,
c_pad: int,
y_start: int,
y_pad: int,
y_size: int,
batch_size: int):
super(SeqDataset, self).__init__()
self.flm_pad = flm_pad
self.blm_pad = blm_pad
self.w_pad = w_pad
self.c_con = c_con
self.c_pad = c_pad
self.y_pad = y_pad
self.y_size = y_size
self.y_start = y_start
self.batch_size = batch_size
self.construct_index(dataset)
self.shuffle()
def shuffle(self):
"""
shuffle dataset
"""
random.shuffle(self.shuffle_list)
def get_tqdm(self, device):
"""
construct dataset reader and the corresponding tqdm.
Parameters
----------
device: ``torch.device``, required.
the target device for the dataset loader.
"""
return tqdm(self.reader(device), mininterval=2, total=self.index_length // self.batch_size, leave=False, file=sys.stdout, ncols=80)
def construct_index(self, dataset):
"""
construct index for the dataset.
Parameters
----------
dataset: ``list``, required.
the encoded dataset (outputs of preprocess scripts).
"""
for instance in dataset:
c_len = [len(tup)+1 for tup in instance[3]]
c_ins = [tup for ins in instance[3] for tup in (ins + [self.c_con])]
instance[3] = c_ins
instance.append(c_len)
self.dataset = dataset
self.index_length = len(dataset)
self.shuffle_list = list(range(0, self.index_length))
def reader(self, device):
"""
construct dataset reader.
Parameters
----------
device: ``torch.device``, required.
the target device for the dataset loader.
Returns
-------
reader: ``iterator``.
A lazy iterable object
"""
cur_idx = 0
while cur_idx < self.index_length:
end_index = min(cur_idx + self.batch_size, self.index_length)
batch = [self.dataset[self.shuffle_list[index]] for index in range(cur_idx, end_index)]
cur_idx = end_index
yield self.batchify(batch, device)
self.shuffle()
def batchify(self, batch, device):
"""
batchify a batch of data and move to a device.
Parameters
----------
batch: ``list``, required.
a sample from the encoded dataset (outputs of preprocess scripts).
device: ``torch.device``, required.
the target device for the dataset loader.
"""
cur_batch_size = len(batch)
char_padded_len = max([len(tup[3]) for tup in batch])
word_padded_len = max([len(tup[0]) for tup in batch])
tmp_batch = [list() for ind in range(11)]
for instance_ind in range(cur_batch_size):
instance = batch[instance_ind]
char_padded_len_ins = char_padded_len - len(instance[3])
word_padded_len_ins = word_padded_len - len(instance[0])
tmp_batch[0].append(instance[3] + [self.c_pad] + [self.c_pad] * char_padded_len_ins)
tmp_batch[2].append([self.c_pad] + instance[3][::-1] + [self.c_pad] * char_padded_len_ins)
tmp_p = list( itertools.accumulate(instance[5]+[1]+[0]* word_padded_len_ins) )
tmp_batch[1].append([(x - 1) * cur_batch_size + instance_ind for x in tmp_p])
tmp_p = list(itertools.accumulate([1]+instance[5][::-1]))[::-1] + [1]*word_padded_len_ins
tmp_batch[3].append([(x - 1) * cur_batch_size + instance_ind for x in tmp_p])
tmp_batch[4].append(instance[0] + [self.flm_pad] + [self.flm_pad] * word_padded_len_ins)
tmp_batch[5].append([self.blm_pad] + instance[1][::-1] + [self.blm_pad] * word_padded_len_ins)
tmp_p = list(range(len(instance[1]), -1, -1)) + list(range(len(instance[1])+1, word_padded_len+1))
tmp_batch[6].append([x * cur_batch_size + instance_ind for x in tmp_p])
tmp_batch[7].append(instance[2] + [self.w_pad] + [self.w_pad] * word_padded_len_ins)
tmp_batch[8].append([self.y_start * self.y_size + instance[4][0]] + [instance[4][ind] * self.y_size + instance[4][ind+1] for ind in range(len(instance[4]) - 1)] + [instance[4][-1] * self.y_size + self.y_pad] + [self.y_pad * self.y_size + self.y_pad] * word_padded_len_ins)
tmp_batch[9].append([1] * len(instance[4]) + [1] + [0] * word_padded_len_ins)
tmp_batch[10].append(instance[4])
tbt = [torch.LongTensor(v).transpose(0, 1).contiguous() for v in tmp_batch[0:9]] + [torch.ByteTensor(tmp_batch[9]).transpose(0, 1).contiguous()]
tbt[1] = tbt[1].view(-1)
tbt[3] = tbt[3].view(-1)
tbt[6] = tbt[6].view(-1)
return [ten.to(device) for ten in tbt] + [tmp_batch[10]]
================================================
FILE: model_seq/elmo.py
================================================
"""
.. module:: elmo
:synopsis: deep contextualized representation
.. moduleauthor:: Liyuan Liu
"""
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_seq.utils as utils
import torch
import torch.nn as nn
import torch.nn.functional as F
class EBUnit(nn.Module):
"""
The basic recurrent unit for the ELMo RNNs wrapper.
Parameters
----------
ori_unit : ``torch.nn.Module``, required.
The original module of rnn unit.
droprate : ``float``, required.
The dropout ratrio.
fix_rate: ``bool``, required.
Whether to fix the rqtio.
"""
def __init__(self, ori_unit, droprate, fix_rate):
super(EBUnit, self).__init__()
self.layer = ori_unit.layer
self.droprate = droprate
self.output_dim = ori_unit.output_dim
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.FloatTensor``, required.
The input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The output of RNNs.
"""
out, _ = self.layer(x)
if self.droprate > 0:
out = F.dropout(out, p=self.droprate, training=self.training)
return out
class ERNN(nn.Module):
"""
The multi-layer recurrent networks for the ELMo RNNs wrapper.
Parameters
----------
ori_drnn : ``torch.nn.Module``, required.
The original module of rnn networks.
droprate : ``float``, required.
The dropout ratrio.
fix_rate: ``bool``, required.
Whether to fix the rqtio.
"""
def __init__(self, ori_drnn, droprate, fix_rate):
super(ERNN, self).__init__()
self.layer_list = [EBUnit(ori_unit, droprate, fix_rate) for ori_unit in ori_drnn.layer._modules.values()]
self.gamma = nn.Parameter(torch.FloatTensor([1.0]))
self.weight_list = nn.Parameter(torch.FloatTensor([0.0] * len(self.layer_list)))
self.layer = nn.ModuleList(self.layer_list)
for param in self.layer.parameters():
param.requires_grad = False
if fix_rate:
self.gamma.requires_grad = False
self.weight_list.requires_grad = False
self.output_dim = self.layer_list[-1].output_dim
def regularizer(self):
"""
Calculate the regularization term.
Returns
----------
The regularization term.
"""
srd_weight = self.weight_list - (1.0 / len(self.layer_list))
return (srd_weight ** 2).sum()
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.FloatTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The ELMo outputs.
"""
out = 0
nw = self.gamma * F.softmax(self.weight_list, dim=0)
for ind in range(len(self.layer_list)):
x = self.layer[ind](x)
out += x * nw[ind]
return out
class ElmoLM(nn.Module):
"""
The language model for the ELMo RNNs wrapper.
Parameters
----------
ori_lm : ``torch.nn.Module``, required.
the original module of language model.
backward : ``bool``, required.
whether the language model is backward.
droprate : ``float``, required.
the dropout ratrio.
fix_rate: ``bool``, required.
whether to fix the rqtio.
"""
def __init__(self, ori_lm, backward, droprate, fix_rate):
super(ElmoLM, self).__init__()
self.rnn = ERNN(ori_lm.rnn, droprate, fix_rate)
self.w_num = ori_lm.w_num
self.w_dim = ori_lm.w_dim
self.word_embed = ori_lm.word_embed
self.word_embed.weight.requires_grad = False
self.output_dim = ori_lm.rnn_output
self.backward = backward
def init_hidden(self):
"""
initialize hidden states.
"""
return
def regularizer(self):
"""
Calculate the regularization term.
Returns
----------
reg: ``list``.
The list of regularization terms.
"""
return self.rnn.regularizer()
def prox(self, lambda0):
"""
the proximal calculator.
"""
return 0.0
def forward(self, w_in, ind=None):
"""
Calculate the output.
Parameters
----------
w_in : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size).
ind : ``torch.LongTensor``, optional, (default=None).
the index tensor for the backward language model, of shape (seq_len, batch_size).
Returns
----------
output: ``torch.FloatTensor``.
The ELMo outputs.
"""
w_emb = self.word_embed(w_in)
out = self.rnn(w_emb)
if self.backward:
out_size = out.size()
out = out.view(out_size[0] * out_size[1], out_size[2]).index_select(0, ind).contiguous().view(out_size)
return out
================================================
FILE: model_seq/evaluator.py
================================================
"""
.. module:: evaluator
:synopsis: evaluator for sequence labeling
.. moduleauthor:: Liyuan Liu
"""
import torch
import numpy as np
import itertools
import model_seq.utils as utils
from torch.autograd import Variable
class eval_batch:
"""
Base class for evaluation, provide method to calculate f1 score and accuracy.
Parameters
----------
decoder : ``torch.nn.Module``, required.
the decoder module, which needs to contain the ``to_span()`` method.
"""
def __init__(self, decoder):
self.decoder = decoder
def reset(self):
"""
reset counters.
"""
self.correct_labels = 0
self.total_labels = 0
self.gold_count = 0
self.guess_count = 0
self.overlap_count = 0
def calc_f1_batch(self, decoded_data, target_data):
"""
update statics for f1 score.
Parameters
----------
decoded_data: ``torch.LongTensor``, required.
the decoded best label index pathes.
target_data: ``torch.LongTensor``, required.
the golden label index pathes.
"""
batch_decoded = torch.unbind(decoded_data, 1)
for decoded, target in zip(batch_decoded, target_data):
length = len(target)
best_path = decoded[:length]
correct_labels_i, total_labels_i, gold_count_i, guess_count_i, overlap_count_i = self.eval_instance(best_path.numpy(), target)
self.correct_labels += correct_labels_i
self.total_labels += total_labels_i
self.gold_count += gold_count_i
self.guess_count += guess_count_i
self.overlap_count += overlap_count_i
def calc_acc_batch(self, decoded_data, target_data):
"""
update statics for accuracy score.
Parameters
----------
decoded_data: ``torch.LongTensor``, required.
the decoded best label index pathes.
target_data: ``torch.LongTensor``, required.
the golden label index pathes.
"""
batch_decoded = torch.unbind(decoded_data, 1)
for decoded, target in zip(batch_decoded, target_data):
# remove padding
length = len(target)
best_path = decoded[:length].numpy()
self.total_labels += length
self.correct_labels += np.sum(np.equal(best_path, gold))
def f1_score(self):
"""
calculate the f1 score based on the inner counter.
"""
if self.guess_count == 0:
return 0.0, 0.0, 0.0, 0.0
precision = self.overlap_count / float(self.guess_count)
recall = self.overlap_count / float(self.gold_count)
if precision == 0.0 or recall == 0.0:
return 0.0, 0.0, 0.0, 0.0
f = 2 * (precision * recall) / (precision + recall)
accuracy = float(self.correct_labels) / self.total_labels
return f, precision, recall, accuracy
def acc_score(self):
"""
calculate the accuracy score based on the inner counter.
"""
if 0 == self.total_labels:
return 0.0
accuracy = float(self.correct_labels) / self.total_labels
return accuracy
def eval_instance(self, best_path, gold):
"""
Calculate statics to update inner counters for one instance.
Parameters
----------
best_path: required.
the decoded best label index pathe.
gold: required.
the golden label index pathes.
"""
total_labels = len(best_path)
correct_labels = np.sum(np.equal(best_path, gold))
gold_chunks = self.decoder.to_spans(gold)
gold_count = len(gold_chunks)
guess_chunks = self.decoder.to_spans(best_path)
guess_count = len(guess_chunks)
overlap_chunks = gold_chunks & guess_chunks
overlap_count = len(overlap_chunks)
return correct_labels, total_labels, gold_count, guess_count, overlap_count
class eval_wc(eval_batch):
"""
evaluation class for LD-Net
Parameters
----------
decoder : ``torch.nn.Module``, required.
the decoder module, which needs to contain the ``to_span()`` and ``decode()`` method.
score_type : ``str``, required.
whether the f1 score or the accuracy is needed.
"""
def __init__(self, decoder, score_type):
eval_batch.__init__(self, decoder)
if 'f' in score_type:
self.eval_b = self.calc_f1_batch
self.calc_s = self.f1_score
else:
self.eval_b = self.calc_acc_batch
self.calc_s = self.acc_score
def calc_score(self, seq_model, dataset_loader):
"""
calculate scores
Parameters
----------
seq_model: required.
sequence labeling model.
dataset_loader: required.
the dataset loader.
Returns
-------
score: ``float``.
calculated score.
"""
seq_model.eval()
self.reset()
for f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w, _, f_y_m, g_y in dataset_loader:
scores = seq_model(f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w)
decoded = self.decoder.decode(scores.data, f_y_m)
self.eval_b(decoded, g_y)
return self.calc_s()
================================================
FILE: model_seq/seqlabel.py
================================================
"""
.. module:: seqlabel
:synopsis: sequence labeling model
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_seq.utils as utils
from model_seq.crf import CRF
class SeqLabel(nn.Module):
"""
Sequence Labeling model augumented with language model.
Parameters
----------
f_lm : ``torch.nn.Module``, required.
The forward language modle for contextualized representations.
b_lm : ``torch.nn.Module``, required.
The backward language modle for contextualized representations.
c_num : ``int`` , required.
The number of characters.
c_dim : ``int`` , required.
The dimension of character embedding.
c_hidden : ``int`` , required.
The dimension of character hidden states.
c_layer : ``int`` , required.
The number of character lstms.
w_num : ``int`` , required.
The number of words.
w_dim : ``int`` , required.
The dimension of word embedding.
w_hidden : ``int`` , required.
The dimension of word hidden states.
w_layer : ``int`` , required.
The number of word lstms.
y_num : ``int`` , required.
The number of tags types.
droprate : ``float`` , required
The dropout ratio.
unit : "str", optional, (default = 'lstm')
The type of the recurrent unit.
"""
def __init__(self, f_lm, b_lm,
c_num: int,
c_dim: int,
c_hidden: int,
c_layer: int,
w_num: int,
w_dim: int,
w_hidden: int,
w_layer: int,
y_num: int,
droprate: float,
unit: str = 'lstm'):
super(SeqLabel, self).__init__()
rnnunit_map = {'rnn': nn.RNN, 'lstm': nn.LSTM, 'gru': nn.GRU}
self.f_lm = f_lm
self.b_lm = b_lm
self.unit_type = unit
self.char_embed = nn.Embedding(c_num, c_dim)
self.word_embed = nn.Embedding(w_num, w_dim)
self.char_seq = nn.Linear(c_hidden * 2, w_dim)
self.lm_seq = nn.Linear(f_lm.output_dim + b_lm.output_dim, w_dim)
self.relu = nn.ReLU()
self.c_hidden = c_hidden
tmp_rnn_dropout = droprate if c_layer > 1 else 0
self.char_fw = rnnunit_map[unit](c_dim, c_hidden, c_layer, dropout = tmp_rnn_dropout)
self.char_bw = rnnunit_map[unit](c_dim, c_hidden, c_layer, dropout = tmp_rnn_dropout)
tmp_rnn_dropout = droprate if w_layer > 1 else 0
self.word_rnn = rnnunit_map[unit](w_dim * 3, w_hidden // 2, w_layer, dropout = tmp_rnn_dropout, bidirectional = True)
self.y_num = y_num
self.crf = CRF(w_hidden, y_num)
self.drop = nn.Dropout(p = droprate)
def to_params(self):
"""
To parameters.
"""
return {
"model_type": "char-lstm-crf",
"forward_lm": self.f_lm.to_params(),
"backward_lm": self.b_lm.to_params(),
"word_embed_num": self.word_embed.num_embeddings,
"word_embed_dim": self.word_embed.embedding_dim,
"char_embed_num": self.char_embed.num_embeddings,
"char_embed_dim": self.char_embed.embedding_dim,
"char_hidden": self.c_hidden,
"char_layers": self.char_fw.num_layers,
"word_hidden": self.word_rnn.hidden_size,
"word_layers": self.word_rnn.num_layers,
"droprate": self.drop.p,
"y_num": self.y_num,
"label_schema": "iobes",
"unit_type": self.unit_type
}
def prune_dense_rnn(self):
"""
Prune dense rnn to be smaller by delecting layers.
"""
f_prune_mask = self.f_lm.prune_dense_rnn()
b_prune_mask = self.b_lm.prune_dense_rnn()
prune_mask = torch.cat([f_prune_mask, b_prune_mask], dim = 0)
mask_index = prune_mask.nonzero().squeeze(1)
self.lm_seq.weight = nn.Parameter(self.lm_seq.weight.data.index_select(1, mask_index).contiguous())
self.lm_seq.in_features = self.lm_seq.weight.size(1)
def set_batch_seq_size(self, sentence):
"""
Set the batch size and sequence length.
"""
tmp = sentence.size()
self.word_seq_length = tmp[0]
self.batch_size = tmp[1]
def load_pretrained_word_embedding(self, pre_word_embeddings):
"""
Load pre-trained word embedding.
"""
self.word_embed.weight = nn.Parameter(pre_word_embeddings)
def rand_init(self):
"""
Random initialization.
"""
utils.init_embedding(self.char_embed.weight)
utils.init_lstm(self.char_fw)
utils.init_lstm(self.char_bw)
utils.init_lstm(self.word_rnn)
utils.init_linear(self.char_seq)
utils.init_linear(self.lm_seq)
self.crf.rand_init()
def forward(self, f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w):
"""
Calculate the output (crf potentials).
Parameters
----------
f_c : ``torch.LongTensor``, required.
Character-level inputs in the forward direction.
f_p : ``torch.LongTensor``, required.
Ouput position of character-level inputs in the forward direction.
b_c : ``torch.LongTensor``, required.
Character-level inputs in the backward direction.
b_p : ``torch.LongTensor``, required.
Ouput position of character-level inputs in the backward direction.
flm_w : ``torch.LongTensor``, required.
Word-level inputs for the forward language model.
blm_w : ``torch.LongTensor``, required.
Word-level inputs for the backward language model.
blm_ind : ``torch.LongTensor``, required.
Ouput position of word-level inputs for the backward language model.
f_w: ``torch.LongTensor``, required.
Word-level inputs for the sequence labeling model.
Returns
-------
output: ``torch.FloatTensor``.
A float tensor of shape (sequence_len, batch_size, from_tag_size, to_tag_size)
"""
self.set_batch_seq_size(f_w)
f_c_e = self.drop(self.char_embed(f_c))
b_c_e = self.drop(self.char_embed(b_c))
f_c_e, _ = self.char_fw(f_c_e)
b_c_e, _ = self.char_bw(b_c_e)
f_c_e = f_c_e.view(-1, self.c_hidden).index_select(0, f_p).view(self.word_seq_length, self.batch_size, self.c_hidden)
b_c_e = b_c_e.view(-1, self.c_hidden).index_select(0, b_p).view(self.word_seq_length, self.batch_size, self.c_hidden)
c_o = self.drop(torch.cat([f_c_e, b_c_e], dim = 2))
c_o = self.char_seq(c_o)
self.f_lm.init_hidden()
self.b_lm.init_hidden()
f_lm_e = self.f_lm(flm_w)
b_lm_e = self.b_lm(blm_w, blm_ind)
lm_o = self.drop(torch.cat([f_lm_e, b_lm_e], dim = 2))
lm_o = self.relu(self.lm_seq(lm_o))
w_e = self.word_embed(f_w)
rnn_in = self.drop(torch.cat([c_o, lm_o, w_e], dim = 2))
rnn_out, _ = self.word_rnn(rnn_in)
crf_out = self.crf(self.drop(rnn_out)).view(self.word_seq_length, self.batch_size, self.y_num, self.y_num)
return crf_out
class Vanilla_SeqLabel(nn.Module):
"""
Sequence Labeling model augumented without language model.
Parameters
----------
f_lm : ``torch.nn.Module``, required.
forward language modle for contextualized representations.
b_lm : ``torch.nn.Module``, required.
backward language modle for contextualized representations.
c_num : ``int`` , required.
number of characters.
c_dim : ``int`` , required.
dimension of character embedding.
c_hidden : ``int`` , required.
dimension of character hidden states.
c_layer : ``int`` , required.
number of character lstms.
w_num : ``int`` , required.
number of words.
w_dim : ``int`` , required.
dimension of word embedding.
w_hidden : ``int`` , required.
dimension of word hidden states.
w_layer : ``int`` , required.
number of word lstms.
y_num : ``int`` , required.
number of tags types.
droprate : ``float`` , required
dropout ratio.
unit : "str", optional, (default = 'lstm')
type of the recurrent unit.
"""
def __init__(self, f_lm, b_lm, c_num, c_dim, c_hidden, c_layer, w_num, w_dim, w_hidden, w_layer, y_num, droprate, unit='lstm'):
super(Vanilla_SeqLabel, self).__init__()
rnnunit_map = {'rnn': nn.RNN, 'lstm': nn.LSTM, 'gru': nn.GRU}
self.char_embed = nn.Embedding(c_num, c_dim)
self.word_embed = nn.Embedding(w_num, w_dim)
self.char_seq = nn.Linear(c_hidden * 2, w_dim)
self.c_hidden = c_hidden
self.char_fw = rnnunit_map[unit](c_dim, c_hidden, c_layer, dropout = droprate)
self.char_bw = rnnunit_map[unit](c_dim, c_hidden, c_layer, dropout = droprate)
self.word_rnn = rnnunit_map[unit](w_dim + w_dim, w_hidden // 2, w_layer, dropout = droprate, bidirectional = True)
self.y_num = y_num
self.crf = CRF(w_hidden, y_num)
self.drop = nn.Dropout(p = droprate)
def set_batch_seq_size(self, sentence):
"""
set batch size and sequence length
"""
tmp = sentence.size()
self.word_seq_length = tmp[0]
self.batch_size = tmp[1]
def load_pretrained_word_embedding(self, pre_word_embeddings):
"""
Load pre-trained word embedding.
"""
self.word_embed.weight = nn.Parameter(pre_word_embeddings)
def rand_init(self):
"""
Random initialization.
"""
utils.init_embedding(self.char_embed.weight)
utils.init_lstm(self.char_fw)
utils.init_lstm(self.char_bw)
utils.init_lstm(self.word_rnn)
utils.init_linear(self.char_seq)
self.crf.rand_init()
def forward(self, f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w):
"""
Calculate the output (crf potentials).
Parameters
----------
f_c : ``torch.LongTensor``, required.
Character-level inputs in the forward direction.
f_p : ``torch.LongTensor``, required.
Ouput position of character-level inputs in the forward direction.
b_c : ``torch.LongTensor``, required.
Character-level inputs in the backward direction.
b_p : ``torch.LongTensor``, required.
Ouput position of character-level inputs in the backward direction.
flm_w : ``torch.LongTensor``, required.
Word-level inputs for the forward language model.
blm_w : ``torch.LongTensor``, required.
Word-level inputs for the backward language model.
blm_ind : ``torch.LongTensor``, required.
Ouput position of word-level inputs for the backward language model.
f_w: ``torch.LongTensor``, required.
Word-level inputs for the sequence labeling model.
Returns
-------
output: ``torch.FloatTensor``.
A float tensor of shape (sequence_len, batch_size, from_tag_size, to_tag_size)
"""
self.set_batch_seq_size(f_w)
f_c_e = self.drop(self.char_embed(f_c))
b_c_e = self.drop(self.char_embed(b_c))
f_c_e, _ = self.char_fw(f_c_e)
b_c_e, _ = self.char_bw(b_c_e)
f_c_e = f_c_e.view(-1, self.c_hidden).index_select(0, f_p).view(self.word_seq_length, self.batch_size, self.c_hidden)
b_c_e = b_c_e.view(-1, self.c_hidden).index_select(0, b_p).view(self.word_seq_length, self.batch_size, self.c_hidden)
c_o = self.drop(torch.cat([f_c_e, b_c_e], dim = 2))
c_o = self.char_seq(c_o)
w_e = self.word_embed(f_w)
rnn_in = self.drop(torch.cat([c_o, w_e], dim = 2))
rnn_out, _ = self.word_rnn(rnn_in)
crf_out = self.crf(self.drop(rnn_out)).view(self.word_seq_length, self.batch_size, self.y_num, self.y_num)
return crf_out
================================================
FILE: model_seq/seqlm.py
================================================
"""
.. module:: seqlm
:synopsis: language model for sequence labeling
.. moduleauthor:: Liyuan Liu
"""
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_seq.utils as utils
import torch
import torch.nn as nn
import torch.nn.functional as F
class BasicSeqLM(nn.Module):
"""
The language model for the dense rnns.
Parameters
----------
ori_lm : ``torch.nn.Module``, required.
the original module of language model.
backward : ``bool``, required.
whether the language model is backward.
droprate : ``float``, required.
the dropout ratrio.
fix_rate: ``bool``, required.
whether to fix the rqtio.
"""
def __init__(self, ori_lm, backward, droprate, fix_rate):
super(BasicSeqLM, self).__init__()
self.rnn = ori_lm.rnn
for param in self.rnn.parameters():
param.requires_grad = False
self.w_num = ori_lm.w_num
self.w_dim = ori_lm.w_dim
self.word_embed = ori_lm.word_embed
self.word_embed.weight.requires_grad = False
self.output_dim = ori_lm.rnn_output
self.backward = backward
def to_params(self):
"""
To parameters.
"""
return {
"rnn_params": self.rnn.to_params(),
"word_embed_num": self.word_embed.num_embeddings,
"word_embed_dim": self.word_embed.embedding_dim
}
def init_hidden(self):
"""
initialize hidden states.
"""
self.rnn.init_hidden()
def regularizer(self):
"""
Calculate the regularization term.
Returns
----------
reg: ``list``.
The list of regularization terms.
"""
return self.rnn.regularizer()
def forward(self, w_in, ind=None):
"""
Calculate the output.
Parameters
----------
w_in : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size).
ind : ``torch.LongTensor``, optional, (default=None).
the index tensor for the backward language model, of shape (seq_len, batch_size).
Returns
----------
output: ``torch.FloatTensor``.
The ELMo outputs.
"""
w_emb = self.word_embed(w_in)
out = self.rnn(w_emb)
if self.backward:
out_size = out.size()
out = out.view(out_size[0] * out_size[1], out_size[2]).index_select(0, ind).contiguous().view(out_size)
return out
================================================
FILE: model_seq/sparse_lm.py
================================================
"""
.. module:: sparse_lm
:synopsis: sparse language model for sequence labeling
.. moduleauthor:: Liyuan Liu
"""
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_seq.utils as utils
class SBUnit(nn.Module):
"""
The basic recurrent unit for the dense-RNNs wrapper.
Parameters
----------
ori_unit : ``torch.nn.Module``, required.
the original module of rnn unit.
droprate : ``float``, required.
the dropout ratrio.
fix_rate: ``bool``, required.
whether to fix the rqtio.
"""
def __init__(self, ori_unit, droprate, fix_rate):
super(SBUnit, self).__init__()
self.unit_type = ori_unit.unit_type
self.layer = ori_unit.layer
self.droprate = droprate
self.input_dim = ori_unit.input_dim
self.increase_rate = ori_unit.increase_rate
self.output_dim = ori_unit.input_dim + ori_unit.increase_rate
def prune_rnn(self, mask):
"""
Prune dense rnn to be smaller by delecting layers.
Parameters
----------
mask : ``torch.ByteTensor``, required.
The selection tensor for the input matrix.
"""
mask_index = mask.nonzero().squeeze(1)
self.layer.weight_ih_l0 = nn.Parameter(self.layer.weight_ih_l0.data.index_select(1, mask_index).contiguous())
self.layer.input_size = self.layer.weight_ih_l0.size(1)
def forward(self, x, weight=1):
"""
Calculate the output.
Parameters
----------
x : ``torch.FloatTensor``, required.
The input tensor, of shape (seq_len, batch_size, input_dim).
weight : ``torch.FloatTensor``, required.
The selection variable.
Returns
----------
output: ``torch.FloatTensor``.
The output of RNNs.
"""
if self.droprate > 0:
new_x = F.dropout(x, p=self.droprate, training=self.training)
else:
new_x = x
out, _ = self.layer(new_x)
out = weight * out
return torch.cat([x, out], 2)
class SDRNN(nn.Module):
"""
The multi-layer recurrent networks for the dense-RNNs wrapper.
Parameters
----------
ori_unit : ``torch.nn.Module``, required.
the original module of rnn unit.
droprate : ``float``, required.
the dropout ratrio.
fix_rate: ``bool``, required.
whether to fix the rqtio.
"""
def __init__(self, ori_drnn, droprate, fix_rate):
super(SDRNN, self).__init__()
if ori_drnn.layer:
self.layer_list = [SBUnit(ori_unit, droprate, fix_rate) for ori_unit in ori_drnn.layer._modules.values()]
self.weight_list = nn.Parameter(torch.FloatTensor([1.0] * len(self.layer_list)))
self.weight_list.requires_grad = not fix_rate
# self.layer = nn.Sequential(*self.layer_list)
self.layer = nn.ModuleList(self.layer_list)
for param in self.layer.parameters():
param.requires_grad = False
else:
self.layer_list = list()
self.weight_list = list()
self.layer = None
# self.output_dim = self.layer_list[-1].output_dim
self.emb_dim = ori_drnn.emb_dim
self.output_dim = ori_drnn.output_dim
self.unit_type = ori_drnn.unit_type
def to_params(self):
"""
To parameters.
"""
return {
"rnn_type": "LDRNN",
"unit_type": self.unit_type,
"layer_num": 0 if not self.layer else len(self.layer),
"emb_dim": self.emb_dim,
"hid_dim": -1 if not self.layer else self.layer[0].increase_rate,
"droprate": -1 if not self.layer else self.layer[0].droprate,
"after_pruned": True
}
def prune_dense_rnn(self):
"""
Prune dense rnn to be smaller by delecting layers.
"""
prune_mask = torch.ones(self.layer_list[0].input_dim)
increase_mask_one = torch.ones(self.layer_list[0].increase_rate)
increase_mask_zero = torch.zeros(self.layer_list[0].increase_rate)
new_layer_list = list()
new_weight_list = list()
for ind in range(0, len(self.layer_list)):
if self.weight_list.data[ind] > 0:
new_weight_list.append(self.weight_list.data[ind])
self.layer_list[ind].prune_rnn(prune_mask)
new_layer_list.append(self.layer_list[ind])
prune_mask = torch.cat([prune_mask, increase_mask_one], dim = 0)
else:
prune_mask = torch.cat([prune_mask, increase_mask_zero], dim = 0)
if not new_layer_list:
self.output_dim = self.layer_list[0].input_dim
self.layer = None
self.weight_list = None
self.layer_list = None
else:
self.layer_list = new_layer_list
self.layer = nn.ModuleList(self.layer_list)
self.weight_list = nn.Parameter(torch.FloatTensor(new_weight_list))
self.weight_list.requires_grad = False
for param in self.layer.parameters():
param.requires_grad = False
return prune_mask
def prox(self):
"""
the proximal calculator.
"""
self.weight_list.data.masked_fill_(self.weight_list.data < 0, 0)
self.weight_list.data.masked_fill_(self.weight_list.data > 1, 1)
none_zero_count = (self.weight_list.data > 0).sum()
return none_zero_count
def regularizer(self):
"""
Calculate the regularization term.
Returns
----------
reg0: ``torch.FloatTensor``.
The value of reg0.
reg1: ``torch.FloatTensor``.
The value of reg1.
reg2: ``torch.FloatTensor``.
The value of reg2.
"""
reg3 = (self.weight_list * (1 - self.weight_list)).sum()
none_zero = self.weight_list.data > 0
none_zero_count = none_zero.sum()
reg0 = none_zero_count
reg1 = self.weight_list[none_zero].sum()
return reg0, reg1, reg3
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.FloatTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The ELMo outputs.
"""
if self.layer_list is not None:
for ind in range(len(self.layer_list)):
x = self.layer[ind](x, self.weight_list[ind])
return x
# return self.layer(x)
class SparseSeqLM(nn.Module):
"""
The language model for the dense rnns with layer-wise selection.
Parameters
----------
ori_lm : ``torch.nn.Module``, required.
the original module of language model.
backward : ``bool``, required.
whether the language model is backward.
droprate : ``float``, required.
the dropout ratrio.
fix_rate: ``bool``, required.
whether to fix the rqtio.
"""
def __init__(self, ori_lm, backward, droprate, fix_rate):
super(SparseSeqLM, self).__init__()
self.rnn = SDRNN(ori_lm.rnn, droprate, fix_rate)
self.w_num = ori_lm.w_num
self.w_dim = ori_lm.w_dim
self.word_embed = ori_lm.word_embed
self.word_embed.weight.requires_grad = False
self.output_dim = ori_lm.rnn_output
self.backward = backward
def to_params(self):
"""
To parameters.
"""
return {
"backward": self.backward,
"rnn_params": self.rnn.to_params(),
"word_embed_num": self.word_embed.num_embeddings,
"word_embed_dim": self.word_embed.embedding_dim
}
def prune_dense_rnn(self):
"""
Prune dense rnn to be smaller by delecting layers.
"""
prune_mask = self.rnn.prune_dense_rnn()
self.output_dim = self.rnn.output_dim
return prune_mask
def init_hidden(self):
"""
initialize hidden states.
"""
return
def regularizer(self):
"""
Calculate the regularization term.
Returns
----------
reg: ``list``.
The list of regularization terms.
"""
return self.rnn.regularizer()
def prox(self):
"""
the proximal calculator.
"""
return self.rnn.prox()
def forward(self, w_in, ind=None):
"""
Calculate the output.
Parameters
----------
w_in : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size).
ind : ``torch.LongTensor``, optional, (default=None).
the index tensor for the backward language model, of shape (seq_len, batch_size).
Returns
----------
output: ``torch.FloatTensor``.
The ELMo outputs.
"""
w_emb = self.word_embed(w_in)
out = self.rnn(w_emb)
if self.backward:
out_size = out.size()
out = out.view(out_size[0] * out_size[1], out_size[2]).index_select(0, ind).contiguous().view(out_size)
return out
================================================
FILE: model_seq/utils.py
================================================
"""
.. module:: utils
:synopsis: utils
.. moduleauthor:: Liyuan Liu
"""
import numpy as np
import torch
import json
import torch
import torch.nn as nn
import torch.nn.init
from torch.autograd import Variable
def log_sum_exp(vec):
"""
log sum exp function.
Parameters
----------
vec : ``torch.FloatTensor``, required.
input vector, of shape(ins_num, from_tag_size, to_tag_size)
Returns
-------
sum: ``torch.FloatTensor``.
log sum exp results, tensor of shape (ins_num, to_tag_size)
"""
max_score, _ = torch.max(vec, 1)
return max_score + torch.log(torch.sum(torch.exp(vec - max_score.unsqueeze(1).expand_as(vec)), 1))
def repackage_hidden(h):
"""
Wraps hidden states in new Variables, to detach them from their history
Parameters
----------
h : ``Tuple`` or ``Tensors``, required.
Tuple or Tensors, hidden states.
Returns
-------
hidden: ``Tuple`` or ``Tensors``.
detached hidden states
"""
if type(h) == torch.Tensor:
return h.detach()
else:
return tuple(repackage_hidden(v) for v in h)
def to_scalar(var):
"""
convert a tensor to a scalar number
"""
return var.view(-1).item()
def init_embedding(input_embedding):
"""
random initialize embedding
"""
bias = np.sqrt(3.0 / input_embedding.size(1))
nn.init.uniform_(input_embedding, -bias, bias)
def init_linear(input_linear):
"""
random initialize linear projection.
"""
bias = np.sqrt(6.0 / (input_linear.weight.size(0) + input_linear.weight.size(1)))
nn.init.uniform_(input_linear.weight, -bias, bias)
if input_linear.bias is not None:
input_linear.bias.data.zero_()
def adjust_learning_rate(optimizer, lr):
"""
adjust learning to the the new value.
Parameters
----------
optimizer : required.
pytorch optimizer.
float : ``float``, required.
the target learning rate.
"""
for param_group in optimizer.param_groups:
param_group['lr'] = lr
def init_lstm(input_lstm):
"""
random initialize lstms
"""
for ind in range(0, input_lstm.num_layers):
weight = eval('input_lstm.weight_ih_l'+str(ind))
bias = np.sqrt(6.0 / (weight.size(0)/4 + weight.size(1)))
nn.init.uniform_(weight, -bias, bias)
weight = eval('input_lstm.weight_hh_l'+str(ind))
bias = np.sqrt(6.0 / (weight.size(0)/4 + weight.size(1)))
nn.init.uniform_(weight, -bias, bias)
if input_lstm.bias:
for ind in range(0, input_lstm.num_layers):
weight = eval('input_lstm.bias_ih_l'+str(ind))
weight.data.zero_()
weight.data[input_lstm.hidden_size: 2 * input_lstm.hidden_size] = 1
weight = eval('input_lstm.bias_hh_l'+str(ind))
weight.data.zero_()
weight.data[input_lstm.hidden_size: 2 * input_lstm.hidden_size] = 1
================================================
FILE: model_word_ada/LM.py
================================================
"""
.. module:: LM
:synopsis: language modeling
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_word_ada.utils as utils
class LM(nn.Module):
"""
The language model model.
Parameters
----------
rnn : ``torch.nn.Module``, required.
The RNNs network.
soft_max : ``torch.nn.Module``, required.
The softmax layer.
w_num : ``int`` , required.
The number of words.
w_dim : ``int`` , required.
The dimension of word embedding.
droprate : ``float`` , required
The dropout ratio.
label_dim : ``int`` , required.
The input dimension of softmax.
"""
def __init__(self, rnn, soft_max, w_num, w_dim, droprate, label_dim = -1, add_relu=False):
super(LM, self).__init__()
self.rnn = rnn
self.soft_max = soft_max
self.w_num = w_num
self.w_dim = w_dim
self.word_embed = nn.Embedding(w_num, w_dim)
self.rnn_output = self.rnn.output_dim
self.add_proj = label_dim > 0
if self.add_proj:
self.project = nn.Linear(self.rnn_output, label_dim)
if add_relu:
self.relu = nn.ReLU()
else:
self.relu = lambda x: x
self.drop = nn.Dropout(p=droprate)
def load_embed(self, origin_lm):
"""
Load embedding from another language model.
"""
self.word_embed = origin_lm.word_embed
self.soft_max = origin_lm.soft_max
def rand_ini(self):
"""
Random initialization.
"""
self.rnn.rand_ini()
# utils.init_linear(self.project)
self.soft_max.rand_ini()
# if not self.tied_weight:
utils.init_embedding(self.word_embed.weight)
if self.add_proj:
utils.init_linear(self.project)
def init_hidden(self):
"""
Initialize hidden states.
"""
self.rnn.init_hidden()
def forward(self, w_in, target):
"""
Calculate the loss.
Parameters
----------
w_in : ``torch.FloatTensor``, required.
the input tensor, of shape (word_num, input_dim).
target : ``torch.FloatTensor``, required.
the target of the language model, of shape (word_num).
Returns
----------
loss: ``torch.FloatTensor``.
The NLL loss.
"""
w_emb = self.word_embed(w_in)
w_emb = self.drop(w_emb)
out = self.rnn(w_emb).contiguous().view(-1, self.rnn_output)
if self.add_proj:
out = self.drop(self.relu(self.project(out)))
# out = self.drop(self.project(out))
out = self.soft_max(out, target)
return out
def log_prob(self, w_in):
"""
Calculate log-probability for the whole dictionary.
Parameters
----------
w_in : ``torch.FloatTensor``, required.
the input tensor, of shape (word_num, input_dim).
Returns
----------
prob: ``torch.FloatTensor``.
The full log-probability.
"""
w_emb = self.word_embed(w_in)
out = self.rnn(w_emb).contiguous().view(-1, self.rnn_output)
if self.add_proj:
out = self.relu(self.project(out))
out = self.soft_max.log_prob(out, w_emb.device)
return out
================================================
FILE: model_word_ada/__init__.py
================================================
================================================
FILE: model_word_ada/adaptive.py
================================================
"""
.. module:: adaptive
:synopsis: adaptive softmax
.. moduleauthor:: Liyuan Liu
"""
import torch
from torch import nn
from math import sqrt
class AdaptiveSoftmax(nn.Module):
"""
The adaptive softmax layer.
Modified from: https://github.com/rosinality/adaptive-softmax-pytorch/blob/master/adasoft.py
Parameters
----------
input_size : ``int``, required.
The input dimension.
cutoff : ``list``, required.
The list of cutoff values.
"""
def __init__(self, input_size, cutoff):
super().__init__()
self.input_size = input_size
self.cutoff = cutoff
self.output_size = cutoff[0] + len(cutoff) - 1
self.head = nn.Linear(input_size, self.output_size)
self.tail = nn.ModuleList()
self.cross_entropy = nn.CrossEntropyLoss(size_average=False)
for i in range(len(self.cutoff) - 1):
seq = nn.Sequential(
nn.Linear(input_size, input_size // 4 ** i, False),
nn.Linear(input_size // 4 ** i, cutoff[i + 1] - cutoff[i], False)
)
self.tail.append(seq)
def rand_ini(self):
"""
Random Initialization.
"""
nn.init.xavier_normal_(self.head.weight)
for tail in self.tail:
nn.init.xavier_normal_(tail[0].weight)
nn.init.xavier_normal_(tail[1].weight)
def log_prob(self, w_in, device):
"""
Calculate log-probability for the whole dictionary.
Parameters
----------
w_in : ``torch.FloatTensor``, required.
the input tensor, of shape (word_num, input_dim).
device: ``torch.device``, required.
the target device for calculation.
Returns
----------
prob: ``torch.FloatTensor``.
The full log-probability.
"""
lsm = nn.LogSoftmax(dim=1).to(device)
head_out = self.head(w_in)
batch_size = head_out.size(0)
prob = torch.zeros(batch_size, self.cutoff[-1]).to(device)
lsm_head = lsm(head_out)
prob.narrow(1, 0, self.output_size).add_(lsm_head.narrow(1, 0, self.output_size).data)
for i in range(len(self.tail)):
pos = self.cutoff[i]
i_size = self.cutoff[i + 1] - pos
buffer = lsm_head.narrow(1, self.cutoff[0] + i, 1)
buffer = buffer.expand(batch_size, i_size)
lsm_tail = lsm(self.tail[i](w_in))
prob.narrow(1, pos, i_size).copy_(buffer.data).add_(lsm_tail.data)
return prob
def forward(self, w_in, target):
"""
Calculate the log-likihood w.o. calculate the full distribution.
Parameters
----------
w_in : ``torch.FloatTensor``, required.
the input tensor, of shape (word_num, input_dim).
target : ``torch.FloatTensor``, required.
the target of the language model, of shape (word_num).
Returns
----------
loss: ``torch.FloatTensor``.
The NLL loss.
"""
batch_size = w_in.size(0)
output = 0.0
first_target = target.clone()
for i in range(len(self.cutoff) - 1):
mask = target.ge(self.cutoff[i]).mul(target.lt(self.cutoff[i + 1]))
if mask.sum() > 0:
first_target[mask] = self.cutoff[0] + i
second_target = target[mask].add(-self.cutoff[i])
second_input = w_in.index_select(0, mask.nonzero().squeeze())
second_output = self.tail[i](second_input)
output += self.cross_entropy(second_output, second_target)
output += self.cross_entropy(self.head(w_in), first_target)
output /= batch_size
return output
================================================
FILE: model_word_ada/basic.py
================================================
"""
.. module:: basic
:synopsis: basic rnn
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_word_ada.utils as utils
class BasicUnit(nn.Module):
"""
The basic recurrent unit for the vanilla stacked RNNs.
Parameters
----------
unit : ``str``, required.
The type of rnn unit.
input_dim : ``int``, required.
The input dimension fo the unit.
hid_dim : ``int``, required.
The hidden dimension fo the unit.
droprate : ``float``, required.
The dropout ratrio.
"""
def __init__(self, unit, input_dim, hid_dim, droprate):
super(BasicUnit, self).__init__()
self.unit_type = unit
rnnunit_map = {'rnn': nn.RNN, 'lstm': nn.LSTM, 'gru': nn.GRU}
self.batch_norm = (unit == 'bnlstm')
self.layer = rnnunit_map[unit](input_dim, hid_dim, 1)
self.droprate = droprate
self.output_dim = hid_dim
self.init_hidden()
def init_hidden(self):
"""
Initialize hidden states.
"""
self.hidden_state = None
def rand_ini(self):
"""
Random Initialization.
"""
if not self.batch_norm:
utils.init_lstm(self.layer)
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The output of RNNs.
"""
out, new_hidden = self.layer(x, self.hidden_state)
self.hidden_state = utils.repackage_hidden(new_hidden)
if self.droprate > 0:
out = F.dropout(out, p=self.droprate, training=self.training)
return out
class BasicRNN(nn.Module):
"""
The multi-layer recurrent networks for the vanilla stacked RNNs.
Parameters
----------
layer_num: ``int``, required.
The number of layers.
unit : ``torch.nn.Module``, required.
The type of rnn unit.
input_dim : ``int``, required.
The input dimension fo the unit.
hid_dim : ``int``, required.
The hidden dimension fo the unit.
droprate : ``float``, required.
The dropout ratrio.
"""
def __init__(self, layer_num, unit, emb_dim, hid_dim, droprate):
super(BasicRNN, self).__init__()
layer_list = [BasicUnit(unit, emb_dim, hid_dim, droprate)] + [BasicUnit(unit, hid_dim, hid_dim, droprate) for i in range(layer_num - 1)]
self.layer = nn.Sequential(*layer_list)
self.output_dim = layer_list[-1].output_dim
self.unit_type = unit
self.init_hidden()
def to_params(self):
"""
To parameters.
"""
return {
"rnn_type": "Basic",
"unit_type": self.layer[0].unit_type,
"layer_num": len(self.layer),
"emb_dim": self.layer[0].layer.input_size,
"hid_dim": self.layer[0].layer.hidden_size,
"droprate": self.layer[0].droprate
}
def init_hidden(self):
"""
Initialize hidden states.
"""
for tup in self.layer.children():
tup.init_hidden()
def rand_ini(self):
"""
Random Initialization.
"""
for tup in self.layer.children():
tup.rand_ini()
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The output of RNNs.
"""
return self.layer(x)
================================================
FILE: model_word_ada/dataset.py
================================================
"""
.. module:: dataset
:synopsis: dataset for language modeling
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import sys
import pickle
import random
from tqdm import tqdm
from torch.utils.data import Dataset
class EvalDataset(object):
"""
Dataset for Language Modeling
Parameters
----------
dataset : ``list``, required.
The encoded dataset (outputs of preprocess scripts).
sequence_length: ``int``, required.
Sequence Length.
"""
def __init__(self, dataset, sequence_length):
super(EvalDataset, self).__init__()
self.dataset = dataset
self.sequence_length = sequence_length
self.construct_index()
def get_tqdm(self, device):
"""
construct dataset reader and the corresponding tqdm.
Parameters
----------
device: ``torch.device``, required.
the target device for the dataset loader.
"""
return tqdm(self.reader(device), mininterval=2, total=self.index_length, leave=False, file=sys.stdout, ncols=80)
def construct_index(self):
"""
construct index for the dataset.
"""
token_per_batch = self.sequence_length
tot_num = len(self.dataset) - 1
res_num = tot_num - tot_num % token_per_batch
self.x = list(torch.unbind(torch.LongTensor(self.dataset[0:res_num]).view(-1, self.sequence_length), 0))
self.y = list(torch.unbind(torch.LongTensor(self.dataset[1:res_num+1]).view(-1, self.sequence_length), 0))
self.x.append(torch.LongTensor(self.dataset[res_num:tot_num]))
self.y.append(torch.LongTensor(self.dataset[res_num+1:tot_num+1]))
self.index_length = len(self.x)
self.cur_idx = 0
def reader(self, device):
"""
construct dataset reader.
Parameters
----------
device: ``torch.device``, required.
the target device for the dataset loader.
Returns
-------
reader: ``iterator``.
A lazy iterable object
"""
if self.cur_idx == self.index_length:
self.cur_idx = 0
raise StopIteration
word_t = self.x[self.cur_idx].to(device).view(-1, 1)
label_t = self.y[self.cur_idx].to(device).view(-1, 1)
self.cur_idx += 1
yield word_t, label_t
class LargeDataset(object):
"""
Lazy Dataset for Language Modeling
Parameters
----------
root : ``str``, required.
The root folder for dataset files.
range_idx : ``int``, required.
The maximum file index for the input files (train_*.pk).
batch_size : ``int``, required.
Batch size.
sequence_length: ``int``, required.
Sequence Length.
"""
def __init__(self, root, range_idx, batch_size, sequence_length):
super(LargeDataset, self).__init__()
self.root = root
self.range_idx = range_idx
self.shuffle_list = list(range(0, range_idx))
self.shuffle()
self.batch_size = batch_size
self.sequence_length = sequence_length
self.token_per_batch = self.batch_size * self.sequence_length
self.total_batch_num = -1
def shuffle(self):
"""
shuffle dataset
"""
random.shuffle(self.shuffle_list)
def get_tqdm(self, device):
"""
construct dataset reader and the corresponding tqdm.
Parameters
----------
device: ``torch.device``, required.
the target device for the dataset loader.
"""
self.batch_count = 0
self.cur_idx = 0
self.file_idx = 0
self.index_length = 0
if self.total_batch_num <= 0:
return tqdm(self.reader(device), mininterval=2, leave=False, file=sys.stdout).__iter__()
else:
return tqdm(self.reader(device), mininterval=2, total=self.total_batch_num, leave=False, file=sys.stdout, ncols=80).__iter__()
def reader(self, device):
"""
construct dataset reader.
Parameters
----------
device: ``torch.device``, required.
the target device for the dataset loader.
Returns
-------
reader: ``iterator``.
A lazy iterable object
"""
while self.file_idx < self.range_idx:
self.open_next()
while self.cur_idx < self.index_length:
word_t = self.x[self.cur_idx].to(device)
# label_t = self.y[self.cur_idx].to(device)
label_t = self.y[self.cur_idx].to(device)
self.cur_idx += 1
yield word_t, label_t
self.total_batch_num = self.batch_count
self.shuffle()
def open_next(self):
"""
Open the next file.
"""
self.dataset = pickle.load(open(self.root + 'train_' + str( self.shuffle_list[self.file_idx])+'.pk', 'rb'))
res_num = len(self.dataset) - 1
res_num = res_num - res_num % self.token_per_batch
self.x = torch.LongTensor(self.dataset[0:res_num]).view(self.batch_size, -1, self.sequence_length).transpose_(0, 1).transpose_(1, 2).contiguous()
self.y = torch.LongTensor(self.dataset[1:res_num+1]).view(self.batch_size, -1, self.sequence_length).transpose_(0, 1).transpose_(1, 2).contiguous()
self.index_length = self.x.size(0)
self.cur_idx = 0
self.batch_count += self.index_length
self.file_idx += 1
================================================
FILE: model_word_ada/densenet.py
================================================
"""
.. module:: densenet
:synopsis: densernn
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_word_ada.utils as utils
class BasicUnit(nn.Module):
"""
The basic recurrent unit for the densely connected RNNs.
Parameters
----------
unit : ``torch.nn.Module``, required.
The type of rnn unit.
input_dim : ``float``, required.
The input dimension fo the unit.
increase_rate : ``float``, required.
The hidden dimension fo the unit.
droprate : ``float``, required.
The dropout ratrio.
"""
def __init__(self, unit, input_dim, increase_rate, droprate):
super(BasicUnit, self).__init__()
rnnunit_map = {'rnn': nn.RNN, 'lstm': nn.LSTM, 'gru': nn.GRU}
self.unit_type = unit
self.layer = rnnunit_map[unit](input_dim, increase_rate, 1)
if 'lstm' == self.unit_type:
utils.init_lstm(self.layer)
self.droprate = droprate
self.input_dim = input_dim
self.increase_rate = increase_rate
self.output_dim = input_dim + increase_rate
self.init_hidden()
def init_hidden(self):
"""
Initialize hidden states.
"""
self.hidden_state = None
def rand_ini(self):
"""
Random Initialization.
"""
return
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The output of RNNs.
"""
if self.droprate > 0:
new_x = F.dropout(x, p=self.droprate, training=self.training)
else:
new_x = x
out, new_hidden = self.layer(new_x, self.hidden_state)
self.hidden_state = utils.repackage_hidden(new_hidden)
out = out.contiguous()
return torch.cat([x, out], 2)
class DenseRNN(nn.Module):
"""
The multi-layer recurrent networks for the densely connected RNNs.
Parameters
----------
layer_num: ``float``, required.
The number of layers.
unit : ``torch.nn.Module``, required.
The type of rnn unit.
input_dim : ``float``, required.
The input dimension fo the unit.
hid_dim : ``float``, required.
The hidden dimension fo the unit.
droprate : ``float``, required.
The dropout ratrio.
"""
def __init__(self, layer_num, unit, emb_dim, hid_dim, droprate):
super(DenseRNN, self).__init__()
self.unit_type = unit
self.layer_list = [BasicUnit(unit, emb_dim + i * hid_dim, hid_dim, droprate) for i in range(layer_num)]
self.layer = nn.Sequential(*self.layer_list) if layer_num > 0 else None
self.output_dim = self.layer_list[-1].output_dim if layer_num > 0 else emb_dim
self.emb_dim = emb_dim
self.init_hidden()
def to_params(self):
"""
To parameters.
"""
return {
"rnn_type": "DenseRNN",
"unit_type": self.layer[0].unit_type,
"layer_num": len(self.layer),
"emb_dim": self.layer[0].input_dim,
"hid_dim": self.layer[0].increase_rate,
"droprate": self.layer[0].droprate
}
def init_hidden(self):
"""
Initialize hidden states.
"""
for tup in self.layer_list:
tup.init_hidden()
def rand_ini(self):
"""
Random Initialization.
"""
for tup in self.layer_list:
tup.rand_ini()
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The output of RNNs.
"""
return self.layer(x)
================================================
FILE: model_word_ada/ldnet.py
================================================
"""
.. module:: ldnet
:synopsis: LD-Net
.. moduleauthor:: Liyuan Liu
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import model_word_ada.utils as utils
import random
class BasicUnit(nn.Module):
"""
The basic recurrent unit for the densely connected RNNs with layer-wise dropout.
Parameters
----------
unit : ``torch.nn.Module``, required.
The type of rnn unit.
input_dim : ``float``, required.
The input dimension fo the unit.
increase_rate : ``float``, required.
The hidden dimension fo the unit.
droprate : ``float``, required.
The dropout ratrio.
layer_dropout : ``float``, required.
The layer-wise dropout ratrio.
"""
def __init__(self, unit, input_dim, increase_rate, droprate, layer_drop = 0):
super(BasicUnit, self).__init__()
rnnunit_map = {'rnn': nn.RNN, 'lstm': nn.LSTM, 'gru': nn.GRU}
self.unit_type = unit
self.layer = rnnunit_map[unit](input_dim, increase_rate, 1)
if 'lstm' == self.unit_type:
utils.init_lstm(self.layer)
self.layer_drop = layer_drop
self.droprate = droprate
self.input_dim = input_dim
self.increase_rate = increase_rate
self.output_dim = input_dim + increase_rate
self.init_hidden()
def init_hidden(self):
"""
Initialize hidden states.
"""
self.hidden_state = None
def rand_ini(self):
"""
Random Initialization.
"""
return
def forward(self, x, p_out):
"""
Calculate the output.
Parameters
----------
x : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
p_out : ``torch.LongTensor``, required.
the final output tensor for the softmax, of shape (seq_len, batch_size, input_dim).
Returns
----------
out: ``torch.FloatTensor``.
The undropped outputs of RNNs to the softmax.
p_out: ``torch.FloatTensor``.
The dropped outputs of RNNs to the next_layer.
"""
if self.droprate > 0:
new_x = F.dropout(x, p=self.droprate, training=self.training)
else:
new_x = x
out, new_hidden = self.layer(new_x, self.hidden_state)
self.hidden_state = utils.repackage_hidden(new_hidden)
out = out.contiguous()
if self.training and random.uniform(0, 1) < self.layer_drop:
deep_out = torch.autograd.Variable( torch.zeros(x.size(0), x.size(1), self.increase_rate) ).cuda()
else:
deep_out = out
o_out = torch.cat([p_out, out], 2)
d_out = torch.cat([x, deep_out], 2)
return d_out, o_out
class LDRNN(nn.Module):
"""
The multi-layer recurrent networks for the densely connected RNNs with layer-wise dropout.
Parameters
----------
layer_num: ``float``, required.
The number of layers.
unit : ``torch.nn.Module``, required.
The type of rnn unit.
input_dim : ``float``, required.
The input dimension fo the unit.
hid_dim : ``float``, required.
The hidden dimension fo the unit.
droprate : ``float``, required.
The dropout ratrio.
layer_dropout : ``float``, required.
The layer-wise dropout ratrio.
"""
def __init__(self, layer_num, unit, emb_dim, hid_dim, droprate, layer_drop):
super(LDRNN, self).__init__()
self.unit_type = unit
self.layer_list = [BasicUnit(unit, emb_dim + i * hid_dim, hid_dim, droprate, layer_drop) for i in range(layer_num)]
self.layer_num = layer_num
self.layer = nn.ModuleList(self.layer_list) if layer_num > 0 else None
self.output_dim = self.layer_list[-1].output_dim if layer_num > 0 else emb_dim
self.emb_dim = emb_dim
self.init_hidden()
def to_params(self):
"""
To parameters.
"""
return {
"rnn_type": "LDRNN",
"unit_type": self.layer[0].unit_type,
"layer_num": len(self.layer),
"emb_dim": self.layer[0].input_dim,
"hid_dim": self.layer[0].increase_rate,
"droprate": self.layer[0].droprate,
"after_pruned": False
}
def init_hidden(self):
"""
Initialize hidden states.
"""
for tup in self.layer_list:
tup.init_hidden()
def rand_ini(self):
"""
Random Initialization.
"""
for tup in self.layer_list:
tup.rand_ini()
def forward(self, x):
"""
Calculate the output.
Parameters
----------
x : ``torch.LongTensor``, required.
the input tensor, of shape (seq_len, batch_size, input_dim).
Returns
----------
output: ``torch.FloatTensor``.
The output of RNNs to the Softmax.
"""
output = x
for ind in range(self.layer_num):
x, output = self.layer_list[ind](x, output)
return output
================================================
FILE: model_word_ada/utils.py
================================================
"""
.. module:: utils
:synopsis: utils
.. moduleauthor:: Liyuan Liu
"""
import numpy as np
import torch
import json
import torch
import torch.nn as nn
import torch.nn.init
from torch.autograd import Variable
def repackage_hidden(h):
"""
Wraps hidden states in new Variables, to detach them from their history
Parameters
----------
h : ``Tuple`` or ``Tensors``, required.
Tuple or Tensors, hidden states.
Returns
-------
hidden: ``Tuple`` or ``Tensors``.
detached hidden states
"""
if type(h) == torch.Tensor:
return h.detach()
else:
return tuple(repackage_hidden(v) for v in h)
def to_scalar(var):
"""
convert a tensor to a scalar number
"""
return var.view(-1).item()
def init_embedding(input_embedding):
"""
random initialize embedding
"""
bias = np.sqrt(3.0 / input_embedding.size(1))
nn.init.uniform_(input_embedding, -bias, bias)
def init_linear(input_linear):
"""
random initialize linear projection.
"""
bias = np.sqrt(6.0 / (input_linear.weight.size(0) + input_linear.weight.size(1)))
nn.init.uniform_(input_linear.weight, -bias, bias)
if input_linear.bias is not None:
input_linear.bias.data.zero_()
def adjust_learning_rate(optimizer, lr):
"""
adjust learning to the the new value.
Parameters
----------
optimizer : required.
pytorch optimizer.
float : ``float``, required.
the target learning rate.
"""
for param_group in optimizer.param_groups:
param_group['lr'] = lr
def init_lstm(input_lstm):
"""
random initialize lstms
"""
for ind in range(0, input_lstm.num_layers):
weight = eval('input_lstm.weight_ih_l'+str(ind))
bias = np.sqrt(6.0 / (weight.size(0)/4 + weight.size(1)))
nn.init.uniform_(weight, -bias, bias)
weight = eval('input_lstm.weight_hh_l'+str(ind))
bias = np.sqrt(6.0 / (weight.size(0)/4 + weight.size(1)))
nn.init.uniform_(weight, -bias, bias)
if input_lstm.bias:
for ind in range(0, input_lstm.num_layers):
weight = eval('input_lstm.bias_ih_l'+str(ind))
weight.data.zero_()
weight.data[input_lstm.hidden_size: 2 * input_lstm.hidden_size] = 1
weight = eval('input_lstm.bias_hh_l'+str(ind))
weight.data.zero_()
weight.data[input_lstm.hidden_size: 2 * input_lstm.hidden_size] = 1
================================================
FILE: pre_seq/encode_data.py
================================================
"""
.. module:: encode_data
:synopsis: encode data for sequence labeling
.. moduleauthor:: Liyuan Liu
"""
import pickle
import argparse
import os
import random
import numpy as np
from tqdm import tqdm
import itertools
import functools
def encode_dataset(input_file, flm_map, blm_map, gw_map, c_map, y_map):
flm_unk = flm_map['<unk>']
blm_unk = blm_map['<unk>']
gw_unk = gw_map['<unk>']
c_con = c_map[' ']
c_unk = c_map['<unk>']
dataset = list()
tmpw_flm, tmpw_blm, tmpw_gw, tmpc, tmpy = list(), list(), list(), list(), list()
with open(input_file, 'r') as fin:
for line in fin:
if line.isspace() or line.startswith('-DOCSTART-'):
if len(tmpw_flm) > 0:
dataset.append([tmpw_flm, tmpw_blm, tmpw_gw, tmpc, tmpy])
tmpw_flm, tmpw_blm, tmpw_gw, tmpc, tmpy = list(), list(), list(), list(), list()
else:
line = line.split()
tmpw_flm.append(flm_map.get(line[0], flm_unk))
tmpw_blm.append(blm_map.get(line[0], blm_unk))
tmpw_gw.append(gw_map.get(line[0].lower(), gw_unk))
tmpy.append(y_map[line[-1]])
tmpc.append([c_map.get(tup, c_unk) for tup in line[0]])
if len(tmpw_flm) > 0:
dataset.append([tmpw_flm, tmpw_blm, tmpw_gw, tmpc, tmpy])
return dataset
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--train_file', default="./data/ner/eng.train.iobes")
parser.add_argument('--test_file', default="./data/ner/eng.testb.iobes")
parser.add_argument('--dev_file', default="./data/ner/eng.testa.iobes")
parser.add_argument('--input_map', default="./data/conll_map.pk")
parser.add_argument('--output_file', default="./data/ner_dataset.pk")
parser.add_argument('--threshold', type=int, default=1)
parser.add_argument('--unk', default='<unk>')
args = parser.parse_args()
with open(args.input_map, 'rb') as f:
p_data = pickle.load(f)
name_list = ['flm_map', 'blm_map', 'gw_map', 'c_map', 'y_map', 'emb_array']
flm_map, blm_map, gw_map, c_map, y_map, emb_array = [p_data[tup] for tup in name_list]
train_dataset = encode_dataset(args.train_file, flm_map, blm_map, gw_map, c_map, y_map)
test_dataset = encode_dataset(args.test_file, flm_map, blm_map, gw_map, c_map, y_map)
dev_dataset = encode_dataset(args.dev_file, flm_map, blm_map, gw_map, c_map, y_map)
with open(args.output_file, 'wb') as f:
pickle.dump({'flm_map': flm_map, 'blm_map': blm_map, 'gw_map': gw_map, 'c_map': c_map, 'y_map': y_map, 'emb_array': emb_array, 'train_data': train_dataset, 'test_data': test_dataset, 'dev_data': dev_dataset}, f)
================================================
FILE: pre_seq/gene_map.py
================================================
"""
.. module:: gene_map
:synopsis: generate map for sequence labeling
.. moduleauthor:: Liyuan Liu
"""
import pickle
import argparse
import os
import random
import numpy as np
from tqdm import tqdm
import itertools
import functools
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--train_corpus', default='./data/ner/eng.train.iobes')
parser.add_argument('--input_embedding', default="./embedding/glove.6B.100d.txt")
parser.add_argument('--output_map', default="./data/conll_map.pk")
parser.add_argument('--flm_map', default="./data/one_billion/test.pk")
parser.add_argument('--blm_map', default="./data/one_billion_reverse/test.pk")
parser.add_argument('--threshold', type=int, default=5)
parser.add_argument('--unk', default='unk')
args = parser.parse_args()
with open(args.flm_map, 'rb') as f:
p_data = pickle.load(f)
flm_map = p_data['w_map']
with open(args.blm_map, 'rb') as f:
p_data = pickle.load(f)
blm_map = p_data['w_map']
gw_map = dict()
embedding_array = list()
for line in open(args.input_embedding, 'r'):
line = line.split()
vector = list(map(lambda t: float(t), filter(lambda n: n and not n.isspace(), line[1:])))
if line[0] == args.unk:
gw_map['<unk>'] = len(gw_map)
else:
gw_map[line[0]] = len(gw_map)
embedding_array.append(vector)
bias = 2 * np.sqrt(3.0 / len(embedding_array[0]))
gw_map['<\n>'] = len(gw_map)
embedding_array.append([random.random() * bias - bias for tup in embedding_array[0]])
w_count = dict()
c_count = dict()
y_map = dict()
# y_map = {'B-LST':0, 'E-LST':1}
with open(args.train_corpus, 'r') as fin:
for line in fin:
if line.isspace() or line.startswith('-DOCSTART-'):
c_count['\n'] = c_count.get('\n', 0) + 1
else:
line = line.split()
for tup in line[0]:
c_count[tup] = c_count.get(tup, 0) + 1
c_count[' '] = c_count.get(' ', 0) + 1
if line[-1] not in y_map:
y_map[line[-1]] = len(y_map)
word = line[0].lower()
if word not in gw_map:
w_count[word] = w_count.get(word, 0) + 1
w_set = {k for k, v in w_count.items() if v > args.threshold}
for k in w_set:
gw_map[k] = len(gw_map)
embedding_array.append([random.random() * bias - bias for tup in embedding_array[0]])
c_set = {k for k, v in c_count.items() if v > args.threshold}
c_map = {v:k for k, v in enumerate(c_set)}
c_map['<unk>'] = len(c_map)
y_map['<s>'] = len(y_map)
y_map['<eof>'] = len(y_map)
with open(args.output_map, 'wb') as f:
pickle.dump({'flm_map': flm_map, 'blm_map': blm_map, 'gw_map': gw_map, 'c_map': c_map, 'y_map': y_map, 'emb_array': embedding_array}, f)
================================================
FILE: pre_word_ada/encode_data2folder.py
================================================
"""
.. module:: encode_data2folder
:synopsis: encode data folder for language modeling
.. moduleauthor:: Liyuan Liu
"""
import pickle
import argparse
import os
import random
import numpy as np
from tqdm import tqdm
import itertools
import functools
def encode_dataset(input_folder, w_map, reverse):
w_eof = w_map['\n']
w_unk = w_map['<unk>']
list_dirs = os.walk(input_folder)
lines = list()
for root, dirs, files in list_dirs:
for file in tqdm(files):
with open(os.path.join(root, file)) as fin:
lines = lines + list(filter(lambda t: t and not t.isspace(), fin.readlines()))
dataset = list()
for line in lines:
dataset += list(map(lambda t: w_map.get(t, w_unk), line.split())) + [w_eof]
if reverse:
dataset = dataset[::-1]
return dataset
def encode_dataset2file(input_folder, t, w_map, reverse):
w_eof = w_map['\n']
w_unk = w_map['<unk>']
list_dirs = os.walk(input_folder)
range_ind = 0
for root, dirs, files in list_dirs:
for file in tqdm(files):
with open(os.path.join(root, file), 'r') as fin:
lines = list(filter(lambda t: t and not t.isspace(), fin.readlines()))
dataset = list()
for line in lines:
dataset += list(map(lambda t: w_map.get(t, w_unk), line.split())) + [w_eof]
if reverse:
dataset = dataset[::-1]
with open(output_folder+'train_'+ str(range_ind) + '.pk', 'wb') as f:
pickle.dump(dataset, f)
range_ind += 1
return range_ind
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--train_folder', default="./data/1b_train")
parser.add_argument('--test_folder', default="./data/1b_test")
parser.add_argument('--input_map', default="./data/1b_map.pk")
parser.add_argument('--output_folder', default="./data/one_billion/")
parser.add_argument('--threshold', type=int, default=3)
parser.add_argument('--unk', default='<unk>')
parser.add_argument('--reverse', action='store_true')
args = parser.parse_args()
with open(args.input_map, 'rb') as f:
w_count = pickle.load(f)
unk_count = sum([v for k, v in w_count.items() if v <= args.threshold])
w_list = [(k, v) for k, v in w_count.items() if v > args.threshold]
w_list.append(('<unk>', unk_count))
w_list.sort(key=lambda t: t[1], reverse=True)
w_map = {kv[0]:v for v, kv in enumerate(w_list)}
range_ind = encode_dataset2file(args.train_folder, args.output_folder, w_map, args.reverse)
test_dataset = encode_dataset(args.test_folder, w_map, args.reverse)
with open(args.output_folder+'test.pk', 'wb') as f:
pickle.dump({'w_map': w_map, 'test_data':test_dataset, 'range' : range_ind}, f)
================================================
FILE: pre_word_ada/gene_map.py
================================================
"""
.. module:: gene_map
:synopsis: gene map for language modeling
.. moduleauthor:: Liyuan Liu
"""
import pickle
import argparse
import os
import random
import numpy as np
from tqdm import tqdm
import itertools
import functools
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--input_folder', default="./data/1b_train")
parser.add_argument('--output_map', default="./data/1b_map.pk")
args = parser.parse_args()
w_count = {'\n':0}
list_dirs = os.walk(args.input_folder)
for root, dirs, files in list_dirs:
for file in tqdm(files):
with open(os.path.join(root, file)) as fin:
for line in fin:
if not line or line.isspace():
continue
line = line.split()
for tup in line:
w_count[tup] = w_count.get(tup, 0) + 1
w_count['\n'] += 1
with open(args.output_map, 'wb') as f:
pickle.dump(w_count, f)
================================================
FILE: prune_sparse_seq.py
================================================
from __future__ import print_function
import datetime
import time
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim
import codecs
import pickle
import math
from model_word_ada.LM import LM
from model_word_ada.basic import BasicRNN
from model_word_ada.densenet import DenseRNN
from model_word_ada.ldnet import LDRNN
from model_seq.crf import CRFLoss, CRFDecode
from model_seq.dataset import SeqDataset
from model_seq.evaluator import eval_wc
from model_seq.seqlabel import SeqLabel, Vanilla_SeqLabel
from model_seq.seqlm import BasicSeqLM
from model_seq.sparse_lm import SparseSeqLM
import model_seq.utils as utils
from torch_scope import wrapper
import argparse
import logging
import json
import os
import sys
import itertools
import functools
logger = logging.getLogger(__name__)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, default="auto")
parser.add_argument('--cp_root', default='./checkpoint')
parser.add_argument('--checkpoint_name', default='p_ner')
parser.add_argument('--git_tracking', action='store_true')
parser.add_argument('--corpus', default='./data/ner_dataset.pk')
parser.add_argument('--load_seq', default='./checkpoint/ner.th')
parser.add_argument('--lm_hid_dim', type=int, default=300)
parser.add_argument('--lm_word_dim', type=int, default=300)
parser.add_argument('--lm_label_dim', type=int, default=1600)
parser.add_argument('--lm_layer_num', type=int, default=10)
parser.add_argument('--lm_droprate', type=float, default=0.5)
parser.add_argument('--lm_rnn_layer', choices=['Basic', 'DenseNet', 'LDNet'], default='LDNet')
parser.add_argument('--lm_rnn_unit', choices=['gru', 'lstm', 'rnn'], default='lstm')
parser.add_argument('--seq_c_dim', type=int, default=30)
parser.add_argument('--seq_c_hid', type=int, default=150)
parser.add_argument('--seq_c_layer', type=int, default=1)
parser.add_argument('--seq_w_dim', type=int, default=100)
parser.add_argument('--seq_w_hid', type=int, default=300)
parser.add_argument('--seq_w_layer', type=int, default=1)
parser.add_argument('--seq_droprate', type=float, default=0.5)
parser.add_argument('--seq_rnn_unit', choices=['gru', 'lstm', 'rnn'], default='lstm')
parser.add_argument('--seq_model', choices=['vanilla', 'lm-aug'], default='lm-aug')
parser.add_argument('--seq_lambda0', type=float, default=0.05)
parser.add_argument('--seq_lambda1', type=float, default=2)
parser.add_argument('--batch_size', type=int, default=10)
parser.add_argument('--patience', type=int, default=5)
parser.add_argument('--epoch', type=int, default=200)
parser.add_argument('--least', type=int, default=50)
parser.add_argument('--clip', type=float, default=5)
parser.add_argument('--lr', type=float, default=0.015)
parser.add_argument('--lr_decay', type=float, default=0.05)
parser.add_argument('--update', choices=['Adam', 'Adagrad', 'Adadelta', 'SGD'], default='SGD')
args = parser.parse_args()
pw = wrapper(os.path.join(args.cp_root, args.checkpoint_name), args.checkpoint_name, enable_git_track=args.git_tracking)
gpu_index = pw.auto_device() if 'auto' == args.gpu else int(args.gpu)
device = torch.device("cuda:" + str(gpu_index) if gpu_index >= 0 else "cpu")
if gpu_index >= 0:
torch.cuda.set_device(gpu_index)
logger.info('Loading data from {}.'.format(args.corpus))
dataset = pickle.load(open(args.corpus, 'rb'))
name_list = ['flm_map', 'blm_map', 'gw_map', 'c_map', 'y_map', 'emb_array', 'train_data', 'test_data', 'dev_data']
flm_map, blm_map, gw_map, c_map, y_map, emb_array, train_data, test_data, dev_data = [dataset[tup] for tup in name_list ]
logger.info('Building language models and seuqence labeling models.')
rnn_map = {'Basic': BasicRNN, 'DenseNet': DenseRNN, 'LDNet': functools.partial(LDRNN, layer_drop = 0)}
flm_rnn_layer = rnn_map[args.lm_rnn_layer](args.lm_layer_num, args.lm_rnn_unit, args.lm_word_dim, args.lm_hid_dim, args.lm_droprate)
blm_rnn_layer = rnn_map[args.lm_rnn_layer](args.lm_layer_num, args.lm_rnn_unit, args.lm_word_dim, args.lm_hid_dim, args.lm_droprate)
flm_model = LM(flm_rnn_layer, None, len(flm_map), args.lm_word_dim, args.lm_droprate, label_dim = args.lm_label_dim)
blm_model = LM(blm_rnn_layer, None, len(blm_map), args.lm_word_dim, args.lm_droprate, label_dim = args.lm_label_dim)
flm_model_seq = SparseSeqLM(flm_model, False, args.lm_droprate, False)
blm_model_seq = SparseSeqLM(blm_model, True, args.lm_droprate, False)
SL_map = {'vanilla':Vanilla_SeqLabel, 'lm-aug': SeqLabel}
seq_model = SL_map[args.seq_model](flm_model_seq, blm_model_seq, len(c_map), args.seq_c_dim, args.seq_c_hid, args.seq_c_layer, len(gw_map), args.seq_w_dim, args.seq_w_hid, args.seq_w_layer, len(y_map), args.seq_droprate, unit=args.seq_rnn_unit)
logger.info('Loading pre-trained models from {}.'.format(args.load_seq))
seq_file = wrapper.restore_checkpoint(args.load_seq)['model']
seq_model.load_state_dict(seq_file)
seq_model.to(device)
crit = CRFLoss(y_map)
decoder = CRFDecode(y_map)
evaluator = eval_wc(decoder, 'f1')
logger.info('Constructing dataset.')
train_dataset, test_dataset, dev_dataset = [SeqDataset(tup_data, flm_map['\n'], blm_map['\n'], gw_map['<\n>'], c_map[' '], c_map['\n'], y_map['<s>'], y_map['<eof>'], len(y_map), args.batch_size) for tup_data in [train_data, test_data, dev_data]]
logger.info('Constructing optimizer.')
param_dict = filter(lambda t: t.requires_grad, seq_model.parameters())
optim_map = {'Adam' : optim.Adam, 'Adagrad': optim.Adagrad, 'Adadelta': optim.Adadelta, 'SGD': functools.partial(optim.SGD, momentum=0.9)}
if args.lr > 0:
optimizer=optim_map[args.update](param_dict, lr=args.lr)
else:
optimizer=optim_map[args.update](param_dict)
logger.info('Saving configues.')
pw.save_configue(args)
logger.info('Setting up training environ.')
best_f1 = float('-inf')
patience_count = 0
batch_index = 0
normalizer = 0
tot_loss = 0
dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(seq_model, dev_dataset.get_tqdm(device))
print(dev_f1)
logger.info('Start training...')
for indexs in range(args.epoch):
logger.info('############')
logger.info('Epoch: {}'.format(indexs))
pw.nvidia_memory_map()
iterator = train_dataset.get_tqdm(device)
seq_model.train()
for f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w, f_y, f_y_m, _ in iterator:
seq_model.zero_grad()
output = seq_model(f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w)
loss = crit(output, f_y, f_y_m)
tot_loss += utils.to_scalar(loss)
normalizer += 1
if args.seq_lambda0 > 0:
f_reg0, f_reg1, f_reg3 = flm_model_seq.regularizer()
b_reg0, b_reg1, b_reg3 = blm_model_seq.regularizer()
loss += args.seq_lambda0 * (f_reg3 + b_reg3)
if (f_reg0 + b_reg0 > args.seq_lambda1):
loss += args.seq_lambda0 * (f_reg1 + b_reg1)
loss.backward()
torch.nn.utils.clip_grad_norm_(seq_model.parameters(), args.clip)
optimizer.step()
flm_model_seq.prox()
blm_model_seq.prox()
batch_index += 1
if 0 == batch_index % 100:
pw.add_loss_vs_batch({'training_loss': tot_loss / (normalizer + 1e-9)}, batch_index, use_logger = False)
tot_loss = 0
normalizer = 0
if args.lr > 0:
current_lr = args.lr / (1 + (indexs + 1) * args.lr_decay)
utils.adjust_learning_rate(optimizer, current_lr)
dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(seq_model, dev_dataset.get_tqdm(device))
nonezero_count = (flm_model_seq.rnn.weight_list.data > 0).int().cpu().sum() + (blm_model_seq.rnn.weight_list.data > 0).cpu().int().sum()
pw.add_loss_vs_batch({'dev_f1': dev_f1, 'none_zero_count': nonezero_count.item()}, indexs, use_logger = True)
pw.add_loss_vs_batch({'dev_pre': dev_pre, 'dev_rec': dev_rec}, indexs, use_logger = False)
logger.info('Saving model...')
pw.save_checkpoint(model = seq_model, is_best = (nonezero_count <= args.seq_lambda1 and dev_f1 > best_f1))
if nonezero_count <= args.seq_lambda1 and dev_f1 > best_f1:
nonezero_count = nonezero_count
test_f1, test_pre, test_rec, test_acc = evaluator.calc_score(seq_model, test_dataset.get_tqdm(device))
best_f1, best_dev_pre, best_dev_rec, best_dev_acc = dev_f1, dev_pre, dev_rec, dev_acc
pw.add_loss_vs_batch({'tot_loss': tot_loss/(normalizer+1e-9), 'test_f1': test_f1}, indexs, use_logger = True)
pw.add_loss_vs_batch({'test_pre': test_pre, 'test_rec': test_rec}, indexs, use_logger = False)
patience_count = 0
elif dev_f1 > best_f1:
test_f1, test_pre, test_rec, test_acc = evaluator.calc_score(seq_model, test_dataset.get_tqdm(device))
pw.add_loss_vs_batch({'tot_loss': tot_loss/(normalizer+1e-9), 'test_f1': test_f1}, indexs, use_logger = True)
pw.add_loss_vs_batch({'test_pre': test_pre, 'test_rec': test_rec}, indexs, use_logger = False)
else:
patience_count += 1
if patience_count >= args.patience and indexs >= args.least:
break
pw.add_loss_vs_batch({'best_test_f1': test_f1, 'best_test_pre': test_pre, 'best_test_rec': test_rec}, 0, use_logger = True, use_writer = False)
pw.add_loss_vs_batch({'best_dev_f1': best_f1, 'best_dev_pre': best_dev_pre, 'best_dev_rec': best_dev_rec}, 0, use_logger = True, use_writer = False)
logger.info('Loading best_performing_model.')
seq_param = pw.restore_best_checkpoint()['model']
seq_model.load_state_dict(seq_param)
seq_model.to(device)
logger.info('Test before deleting layers.')
test_f1, test_pre, test_rec, test_acc = evaluator.calc_score(seq_model, test_dataset.get_tqdm(device))
dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(seq_model, dev_dataset.get_tqdm(device))
pw.add_loss_vs_batch({'best_test_f1': test_f1, 'best_dev_f1': dev_f1}, 1, use_logger = True, use_writer = False)
logger.info('Deleting layers.')
seq_model.cpu()
seq_model.prune_dense_rnn()
seq_model.to(device)
logger.info('Resulting models display.')
print(seq_model)
logger.info('Test after deleting layers.')
test_f1, test_pre, test_rec, test_acc = evaluator.calc_score(seq_model, test_dataset.get_tqdm(device))
dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(seq_model, dev_dataset.get_tqdm(device))
pw.add_loss_vs_batch({'best_test_f1': test_f1, 'best_dev_f1': dev_f1}, 2, use_logger = True, use_writer = False)
seq_model.cpu()
logger.info('Saving model...')
seq_config = seq_model.to_params()
pw.save_checkpoint(model = seq_model,
is_best = True,
s_dict = {'config': seq_config,
'flm_map': flm_map,
'blm_map': blm_map,
'gw_map': gw_map,
'c_map': c_map,
'y_map': y_map})
pw.close()
================================================
FILE: train_lm.py
================================================
from __future__ import print_function
import datetime
import time
import torch
import torch.nn as nn
import torch.optim as optim
import codecs
import pickle
import math
from model_word_ada.LM import LM
from model_word_ada.basic import BasicRNN
from model_word_ada.ldnet import LDRNN
from model_word_ada.densenet import DenseRNN
from model_word_ada.dataset import LargeDataset, EvalDataset
from model_word_ada.adaptive import AdaptiveSoftmax
import model_word_ada.utils as utils
from torch_scope import wrapper
import argparse
import logging
import json
import os
import sys
import itertools
import functools
logger = logging.getLogger(__name__)
def evaluate(data_loader, lm_model, limited = 76800):
lm_model.eval()
lm_model.init_hidden()
total_loss = 0
total_len = 0
for word_t, label_t in data_loader:
label_t = label_t.view(-1)
tmp_len = label_t.size(0)
total_loss += tmp_len * lm_model(word_t, label_t).item()
total_len += tmp_len
if limited >=0 and total_len > limited:
break
ppl = math.exp(total_loss / total_len)
return ppl
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, default="auto")
parser.add_argument('--cp_root', default='./checkpoint')
parser.add_argument('--checkpoint_name', default='ld0')
parser.add_argument('--git_tracking', action='store_true')
parser.add_argument('--dataset_folder', default='./data/one_billion/')
parser.add_argument('--restore_checkpoint', default='')
parser.add_argument('--batch_size', type=int, default=128)
parser.add_argument('--sequence_length', type=int, default=20)
parser.add_argument('--hid_dim', type=int, default=300)
parser.add_argument('--word_dim', type=int, default=300)
parser.add_argument('--label_dim', type=int, default=1600)
parser.add_argument('--layer_num', type=int, default=10)
parser.add_argument('--droprate', type=float, default=0.01)
parser.add_argument('--add_relu', action='store_true')
parser.add_argument('--layer_drop', type=float, default=0.5)
parser.add_argument('--epoch', type=int, default=400)
parser.add_argument('--clip', type=float, default=5)
parser.add_argument('--update', choices=['Adam', 'Adagrad', 'Adadelta'], default='Adam', help='adam is the best')
parser.add_argument('--rnn_layer', choices=['Basic', 'DenseNet', 'LDNet'], default='LDNet')
parser.add_argument('--rnn_unit', choices=['gru', 'lstm', 'rnn'], default='lstm')
parser.add_argument('--lr', type=float, default=0.001)
parser.add_argument('--lr_decay', type=float, default=0.1)
parser.add_argument('--cut_off', nargs='+', default=[4000,40000,200000])
parser.add_argument('--interval', type=int, default=100)
parser.add_argument('--epoch_size', type=int, default=4000)
parser.add_argument('--patience', type=float, default=10)
args = parser.parse_args()
pw = wrapper(os.path.join(args.cp_root, args.checkpoint_name), args.checkpoint_name, enable_git_track=args.git_tracking)
gpu_index = pw.auto_device() if 'auto' == args.gpu else int(args.gpu)
device = torch.device("cuda:" + str(gpu_index) if gpu_index >= 0 else "cpu")
if gpu_index >= 0:
torch.cuda.set_device(gpu_index)
logger.info('Loading dataset.')
dataset = pickle.load(open(args.dataset_folder + 'test.pk', 'rb'))
w_map, test_data, range_idx = dataset['w_map'], dataset['test_data'], dataset['range']
train_loader = LargeDataset(args.dataset_folder, range_idx, args.batch_size, args.sequence_length)
test_loader = EvalDataset(test_data, args.batch_size)
logger.info('Building models.')
rnn_map = {'Basic': BasicRNN, 'DenseNet': DenseRNN, 'LDNet': functools.partial(LDRNN, layer_drop = args.layer_drop)}
rnn_layer = rnn_map[args.rnn_layer](args.layer_num, args.rnn_unit, args.word_dim, args.hid_dim, args.droprate)
cut_off = args.cut_off + [len(w_map) + 1]
if args.label_dim > 0:
soft_max = AdaptiveSoftmax(args.label_dim, cut_off)
else:
soft_max = AdaptiveSoftmax(rnn_layer.output_dim, cut_off)
lm_model = LM(rnn_layer, soft_max, len(w_map), args.word_dim, args.droprate, label_dim = args.label_dim, add_relu=args.add_relu)
lm_model.rand_ini()
logger.info('Building optimizer.')
optim_map = {'Adam' : optim.Adam, 'Adagrad': optim.Adagrad, 'Adadelta': optim.Adadelta}
if args.lr > 0:
optimizer=optim_map[args.update](lm_model.parameters(), lr=args.lr)
else:
optimizer=optim_map[args.update](lm_model.parameters())
if args.restore_checkpoint:
if os.path.isfile(args.restore_checkpoint):
logger.info("loading checkpoint: '{}'".format(args.restore_checkpoint))
model_file = wrapper.restore_checkpoint(args.restore_checkpoint)['model']
lm_model.load_state_dict(model_file, False)
else:
logger.info("no checkpoint found at: '{}'".format(args.restore_checkpoint))
lm_model.to(device)
logger.info('Saving configues.')
pw.save_configue(args)
logger.info('Setting up training environ.')
best_train_ppl = float('inf')
cur_lr = args.lr
batch_index = 0
epoch_loss = 0
patience = 0
writer = SummaryWriter(log_dir='./runs_1b/'+args.log_dir)
name_list = ['batch_loss', 'train_ppl', 'test_ppl']
bloss, tr_ppl, te_ppl = [args.log_dir+'/'+tup for tup in name_list]
try:
for indexs in range(args.epoch):
logger.info('############')
logger.info('Epoch: {}'.format(indexs))
pw.nvidia_memory_map()
lm_model.train()
for word_t, label_t in train_loader.get_tqdm(device):
if 1 == train_loader.cur_idx:
lm_model.init_hidden()
label_t = label_t.view(-1)
lm_model.zero_grad()
loss = lm_model(word_t, label_t)
loss.backward()
torch.nn.utils.clip_grad_norm_(lm_model.parameters(), args.clip)
optimizer.step()
batch_index += 1
if 0 == batch_index % args.interval:
s_loss = utils.to_scalar(loss)
pw.add_loss_vs_batch({'batch_loss': s_loss}, batch_index, use_logger = False)
epoch_loss += utils.to_scalar(loss)
if 0 == batch_index % args.epoch_size:
epoch_ppl = math.exp(epoch_loss / args.epoch_size)
pw.add_loss_vs_batch({'train_ppl': epoch_ppl}, batch_index, use_logger = True)
if epoch_loss < best_train_ppl:
best_train_ppl = epoch_loss
patience = 0
else:
patience += 1
epoch_loss = 0
if patience > args.patience and cur_lr > 0:
patience = 0
cur_lr *= args.lr_decay
best_train_ppl = float('inf')
logger.info('adjust_learning_rate...')
utils.adjust_learning_rate(optimizer, cur_lr)
test_ppl = evaluate(test_loader.get_tqdm(device), lm_model)
pw.add_loss_vs_batch({'test_ppl': test_ppl}, indexs, use_logger = True)
pw.save_checkpoint(model = lm_model, optimizer = optimizer, is_best = True)
except KeyboardInterrupt:
logger.info('Exiting from training early')
test_ppl = evaluate(test_loader.get_tqdm(device), lm_model)
pw.add_loss_vs_batch({'test_ppl': test_ppl}, indexs, use_logger = True)
pw.save_checkpoint(model = lm_model, optimizer = optimizer, is_best = True)
pw.close()
================================================
FILE: train_seq.py
================================================
from __future__ import print_function
import datetime
import time
import torch
import torch.nn as nn
import torch.optim as optim
import codecs
import pickle
import math
from model_word_ada.LM import LM
from model_word_ada.basic import BasicRNN
from model_word_ada.densenet import DenseRNN
from model_word_ada.ldnet import LDRNN
from model_seq.crf import CRFLoss, CRFDecode
from model_seq.dataset import SeqDataset
from model_seq.evaluator import eval_wc
from model_seq.seqlabel import SeqLabel, Vanilla_SeqLabel
from model_seq.seqlm import BasicSeqLM
from model_seq.sparse_lm import SparseSeqLM
import model_seq.utils as utils
from torch_scope import wrapper
import argparse
import logging
import json
import os
import sys
import itertools
import functools
logger = logging.getLogger(__name__)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, default="auto")
parser.add_argument('--cp_root', default='./checkpoint')
parser.add_argument('--checkpoint_name', default='ner')
parser.add_argument('--git_tracking', action='store_true')
parser.add_argument('--corpus', default='./data/ner_dataset.pk')
parser.add_argument('--forward_lm', default='./checkpoint/ld0.th')
parser.add_argument('--backward_lm', default='./checkpoint/ld_0.th')
parser.add_argument('--lm_hid_dim', type=int, default=300)
parser.add_argument('--lm_word_dim', type=int, default=300)
parser.add_argument('--lm_label_dim', type=int, default=-1)
parser.add_argument('--lm_layer_num', type=int, default=10)
parser.add_argument('--lm_droprate', type=float, default=0.5)
parser.add_argument('--lm_rnn_layer', choices=['Basic', 'DenseNet', 'LDNet'], default='LDNet')
parser.add_argument('--lm_rnn_unit', choices=['gru', 'lstm', 'rnn'], default='lstm')
parser.add_argument('--seq_c_dim', type=int, default=30)
parser.add_argument('--seq_c_hid', type=int, default=150)
parser.add_argument('--seq_c_layer', type=int, default=1)
parser.add_argument('--seq_w_dim', type=int, default=100)
parser.add_argument('--seq_w_hid', type=int, default=300)
parser.add_argument('--seq_w_layer', type=int, default=1)
parser.add_argument('--seq_droprate', type=float, default=0.5)
parser.add_argument('--seq_model', choices=['vanilla', 'lm-aug'], default='lm-aug')
parser.add_argument('--seq_rnn_unit', choices=['gru', 'lstm', 'rnn'], default='lstm')
parser.add_argument('--seq_lm_model', choices=['vanilla', 'sparse-lm'], default='vanilla')
parser.add_argument('--batch_size', type=int, default=10)
parser.add_argument('--patience', type=int, default=15)
parser.add_argument('--epoch', type=int, default=200)
parser.add_argument('--clip', type=float, default=5)
parser.add_argument('--lr', type=float, default=0.015)
parser.add_argument('--lr_decay', type=float, default=0.05)
parser.add_argument('--update', choices=['Adam', 'Adagrad', 'Adadelta', 'SGD'], default='SGD')
args = parser.parse_args()
pw = wrapper(os.path.join(args.cp_root, args.checkpoint_name), args.checkpoint_name, enable_git_track=args.git_tracking)
gpu_index = pw.auto_device() if 'auto' == args.gpu else int(args.gpu)
device = torch.device("cuda:" + str(gpu_index) if gpu_index >= 0 else "cpu")
if gpu_index >= 0:
torch.cuda.set_device(gpu_index)
logger.info('Loading data')
dataset = pickle.load(open(args.corpus, 'rb'))
name_list = ['flm_map', 'blm_map', 'gw_map', 'c_map', 'y_map', 'emb_array', 'train_data', 'test_data', 'dev_data']
flm_map, blm_map, gw_map, c_map, y_map, emb_array, train_data, test_data, dev_data = [dataset[tup] for tup in name_list ]
logger.info('Loading language model')
rnn_map = {'Basic': BasicRNN, 'DenseNet': DenseRNN, 'LDNet': functools.partial(LDRNN, layer_drop = 0)}
flm_rnn_layer = rnn_map[args.lm_rnn_layer](args.lm_layer_num, args.lm_rnn_unit, args.lm_word_dim, args.lm_hid_dim, args.lm_droprate)
blm_rnn_layer = rnn_map[args.lm_rnn_layer](args.lm_layer_num, args.lm_rnn_unit, args.lm_word_dim, args.lm_hid_dim, args.lm_droprate)
flm_model = LM(flm_rnn_layer, None, len(flm_map), args.lm_word_dim, args.lm_droprate, label_dim = args.lm_label_dim)
blm_model = LM(blm_rnn_layer, None, len(blm_map), args.lm_word_dim, args.lm_droprate, label_dim = args.lm_label_dim)
flm_file = wrapper.restore_checkpoint(args.forward_lm)['model']
flm_model.load_state_dict(flm_file, False)
blm_file = wrapper.restore_checkpoint(args.backward_lm)['model']
blm_model.load_state_dict(blm_file, False)
slm_map = {'vanilla': BasicSeqLM, 'sparse-lm': SparseSeqLM}
flm_model_seq = slm_map[args.seq_lm_model](flm_model, False, args.lm_droprate, True)
blm_model_seq = slm_map[args.seq_lm_model](blm_model, True, args.lm_droprate, True)
logger.info('Building models')
SL_map = {'vanilla':Vanilla_SeqLabel, 'lm-aug': SeqLabel}
seq_model = SL_map[args.seq_model](flm_model_seq, blm_model_seq, len(c_map), args.seq_c_dim, args.seq_c_hid, args.seq_c_layer, len(gw_map), args.seq_w_dim, args.seq_w_hid, args.seq_w_layer, len(y_map), args.seq_droprate, unit=args.seq_rnn_unit)
seq_model.rand_init()
seq_model.load_pretrained_word_embedding(torch.FloatTensor(emb_array))
seq_model.to(device)
crit = CRFLoss(y_map)
decoder = CRFDecode(y_map)
evaluator = eval_wc(decoder, 'f1')
logger.info('Constructing dataset')
train_dataset, test_dataset, dev_dataset = [SeqDataset(tup_data, flm_map['\n'], blm_map['\n'], gw_map['<\n>'], c_map[' '], c_map['\n'], y_map['<s>'], y_map['<eof>'], len(y_map), args.batch_size) for tup_data in [train_data, test_data, dev_data]]
logger.info('Constructing optimizer')
param_dict = filter(lambda t: t.requires_grad, seq_model.parameters())
optim_map = {'Adam' : optim.Adam, 'Adagrad': optim.Adagrad, 'Adadelta': optim.Adadelta, 'SGD': functools.partial(optim.SGD, momentum=0.9)}
if args.lr > 0:
optimizer=optim_map[args.update](param_dict, lr=args.lr)
else:
optimizer=optim_map[args.update](param_dict)
logger.info('Saving configues.')
pw.save_configue(args)
logger.info('Setting up training environ.')
best_f1 = float('-inf')
patience_count = 0
batch_index = 0
normalizer=0
tot_loss = 0
for indexs in range(args.epoch):
logger.info('############')
logger.info('Epoch: {}'.format(indexs))
pw.nvidia_memory_map()
seq_model.train()
for f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w, f_y, f_y_m, _ in train_dataset.get_tqdm(device):
seq_model.zero_grad()
output = seq_model(f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w)
loss = crit(output, f_y, f_y_m)
tot_loss += utils.to_scalar(loss)
normalizer += 1
loss.backward()
torch.nn.utils.clip_grad_norm_(seq_model.parameters(), args.clip)
optimizer.step()
batch_index += 1
if 0 == batch_index % 100:
pw.add_loss_vs_batch({'training_loss': tot_loss / (normalizer + 1e-9)}, batch_index, use_logger = False)
tot_loss = 0
normalizer = 0
if args.lr > 0:
current_lr = args.lr / (1 + (indexs + 1) * args.lr_decay)
utils.adjust_learning_rate(optimizer, current_lr)
dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(seq_model, dev_dataset.get_tqdm(device))
pw.add_loss_vs_batch({'dev_f1': dev_f1}, indexs, use_logger = True)
pw.add_loss_vs_batch({'dev_pre': dev_pre, 'dev_rec': dev_rec}, indexs, use_logger = False)
logger.info('Saving model...')
pw.save_checkpoint(model = seq_model, is_best = (dev_f1 > best_f1))
if dev_f1 > best_f1:
test_f1, test_pre, test_rec, test_acc = evaluator.calc_score(seq_model, test_dataset.get_tqdm(device))
best_f1, best_dev_pre, best_dev_rec, best_dev_acc = dev_f1, dev_pre, dev_rec, dev_acc
pw.add_loss_vs_batch({'test_f1': test_f1}, indexs, use_logger = True)
pw.add_loss_vs_batch({'test_pre': test_pre, 'test_rec': test_rec}, indexs, use_logger = False)
patience_count = 0
else:
patience_count += 1
if patience_count >= args.patience:
break
pw.close()
================================================
FILE: train_seq_elmo.py
================================================
from __future__ import print_function
import datetime
import time
import torch
import torch.nn as nn
import torch.optim as optim
import codecs
import pickle
import math
import numpy as np
from model_word_ada.LM import LM
from model_word_ada.basic import BasicRNN
from model_word_ada.densenet import DenseRNN
from model_word_ada.ldnet import LDRNN
from model_seq.crf import CRFLoss, CRFDecode
from model_seq.dataset import SeqDataset
from model_seq.evaluator import eval_wc
from model_seq.seqlabel import SeqLabel, Vanilla_SeqLabel
from model_seq.seqlm import BasicSeqLM
from model_seq.elmo import ElmoLM
import model_seq.utils as utils
from torch_scope import wrapper
import argparse
import logging
import json
import os
import sys
import itertools
import functools
logger = logging.getLogger(__name__)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, default="auto")
parser.add_argument('--cp_root', default='./checkpoint')
parser.add_argument('--checkpoint_name', default='elmo_ner')
parser.add_argument('--git_tracking', action='store_true')
parser.add_argument('--corpus', default='./data/ner_dataset.pk')
parser.add_argument('--forward_lm', default='./checkpoint/basic0.th')
parser.add_argument('--backward_lm', default='./checkpoint/basic_0.th')
parser.add_argument('--lm_hid_dim', type=int, default=2048)
parser.add_argument('--lm_word_dim', type=int, default=300)
parser.add_argument('--lm_label_dim', type=int, default=-1)
parser.add_argument('--lm_layer_num', type=int, default=2)
parser.add_argument('--lm_droprate', type=float, default=0.5)
parser.add_argument('--lm_rnn_layer', choices=['Basic'], default='Basic')
parser.add_argument('--lm_rnn_unit', choices=['gru', 'lstm', 'rnn'], default='lstm')
parser.add_argument('--seq_c_dim', type=int, default=30)
parser.add_argument('--seq_c_hid', type=int, default=150)
parser.add_argument('--seq_c_layer', type=int, default=1)
parser.add_argument('--seq_w_dim', type=int, default=100)
parser.add_argument('--seq_w_hid', type=int, default=300)
parser.add_argument('--seq_w_layer', type=int, default=1)
parser.add_argument('--seq_droprate', type=float, default=0.5)
parser.add_argument('--seq_model', choices=['vanilla', 'lm-aug'], default='lm-aug')
parser.add_argument('--seq_rnn_unit', choices=['gru', 'lstm', 'rnn'], default='lstm')
parser.add_argument('--seq_lambda0', type=float, default=0.01)
parser.add_argument('--batch_size', type=int, default=10)
parser.add_argument('--patience', type=int, default=15)
parser.add_argument('--epoch', type=int, default=200)
parser.add_argument('--clip', type=float, default=5)
parser.add_argument('--lr', type=float, default=0.015)
parser.add_argument('--lr_decay', type=float, default=0.05)
parser.add_argument('--update', choices=['Adam', 'Adagrad', 'Adadelta', 'SGD'], default='SGD')
args = parser.parse_args()
pw = wrapper(os.path.join(args.cp_root, args.checkpoint_name), args.checkpoint_name, enable_git_track=args.git_tracking)
gpu_index = pw.auto_device() if 'auto' == args.gpu else int(args.gpu)
device = torch.device("cuda:" + str(gpu_index) if gpu_index >= 0 else "cpu")
if gpu_index >= 0:
torch.cuda.set_device(gpu_index)
logger.info('Loading data')
dataset = pickle.load(open(args.corpus, 'rb'))
name_list = ['flm_map', 'blm_map', 'gw_map', 'c_map', 'y_map', 'emb_array', 'train_data', 'test_data', 'dev_data']
flm_map, blm_map, gw_map, c_map, y_map, emb_array, train_data, test_data, dev_data = [dataset[tup] for tup in name_list ]
logger.info('Loading language model')
rnn_map = {'Basic': BasicRNN}
flm_rnn_layer = rnn_map[args.lm_rnn_layer](args.lm_layer_num, args.lm_rnn_unit, args.lm_word_dim, args.lm_hid_dim, args.lm_droprate)
blm_rnn_layer = rnn_map[args.lm_rnn_layer](args.lm_layer_num, args.lm_rnn_unit, args.lm_word_dim, args.lm_hid_dim, args.lm_droprate)
flm_model = LM(flm_rnn_layer, None, len(flm_map), args.lm_word_dim, args.lm_droprate, label_dim = args.lm_label_dim)
blm_model = LM(blm_rnn_layer, None, len(blm_map), args.lm_word_dim, args.lm_droprate, label_dim = args.lm_label_dim)
flm_file = wrapper.restore_checkpoint(args.forward_lm)['model']
flm_model.load_state_dict(flm_file, False)
blm_file = wrapper.restore_checkpoint(args.backward_lm)['model']
blm_model.load_state_dict(blm_file, False)
flm_model_seq = ElmoLM(flm_model, False, args.lm_droprate, True)
blm_model_seq = ElmoLM(blm_model, True, args.lm_droprate, True)
logger.info('Building model')
SL_map = {'vanilla':Vanilla_SeqLabel, 'lm-aug': SeqLabel}
seq_model = SL_map[args.seq_model](flm_model_seq, blm_model_seq, len(c_map), args.seq_c_dim, args.seq_c_hid, args.seq_c_layer, len(gw_map), args.seq_w_dim, args.seq_w_hid, args.seq_w_layer, len(y_map), args.seq_droprate, unit=args.seq_rnn_unit)
seq_model.rand_init()
seq_model.load_pretrained_word_embedding(torch.FloatTensor(emb_array))
seq_model.to(device)
crit = CRFLoss(y_map)
decoder = CRFDecode(y_map)
evaluator = eval_wc(decoder, 'f1')
print('constructing dataset')
train_dataset, test_dataset, dev_dataset = [SeqDataset(tup_data, flm_map['\n'], blm_map['\n'], gw_map['<\n>'], c_map[' '], c_map['\n'], y_map['<s>'], y_map['<eof>'], len(y_map), args.batch_size) for tup_data in [train_data, test_data, dev_data]]
print('constructing optimizer')
param_dict = filter(lambda t: t.requires_grad, seq_model.parameters())
optim_map = {'Adam' : optim.Adam, 'Adagrad': optim.Adagrad, 'Adadelta': optim.Adadelta, 'SGD': functools.partial(optim.SGD, momentum=0.9)}
if args.lr > 0:
optimizer=optim_map[args.update](param_dict, lr=args.lr)
else:
optimizer=optim_map[args.update](param_dict)
logger.info('Saving configues.')
pw.save_configue(args)
logger.info('Setting up training environ.')
best_f1 = float('-inf')
patience_count = 0
batch_index = 0
normalizer = 0
tot_loss = 0
for indexs in range(args.epoch):
logger.info('############')
logger.info('Epoch: {}'.format(indexs))
pw.nvidia_memory_map()
seq_model.train()
for f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w, f_y, f_y_m, _ in train_dataset.get_tqdm(device):
seq_model.zero_grad()
output = seq_model(f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w)
loss = crit(output, f_y, f_y_m)
tot_loss += utils.to_scalar(loss)
normalizer += 1
if args.seq_lambda0 > 0:
loss += args.seq_lambda0 * (flm_model_seq.regularizer(args.seq_lambda1) + blm_model_seq.regularizer(args.seq_lambda1))
loss.backward()
torch.nn.utils.clip_grad_norm_(seq_model.parameters(), args.clip)
optimizer.step()
batch_index += 1
if 0 == batch_index % 100:
pw.add_loss_vs_batch({'training_loss': tot_loss / (normalizer + 1e-9)}, batch_index, use_logger = False)
tot_loss = 0
normalizer = 0
if args.lr > 0:
current_lr = args.lr / (1 + (indexs + 1) * args.lr_decay)
utils.adjust_learning_rate(optimizer, current_lr)
dev_f1, dev_pre, dev_rec, dev_acc = evaluator.calc_score(seq_model, dev_dataset.get_tqdm(device))
pw.add_loss_vs_batch({'dev_f1': dev_f1}, indexs, use_logger = True)
pw.add_loss_vs_batch({'dev_pre': dev_pre, 'dev_rec': dev_rec}, indexs, use_logger = False)
logger.info('Saving model...')
pw.save_checkpoint(model = seq_model, is_best = (dev_f1 > best_f1))
if dev_f1 > best_f1:
test_f1, test_pre, test_rec, test_acc = evaluator.calc_score(seq_model, test_dataset.get_tqdm(device))
best_f1, best_dev_pre, best_dev_rec, best_dev_acc = dev_f1, dev_pre, dev_rec, dev_acc
pw.add_loss_vs_batch({'test_f1': test_f1}, indexs, use_logger = True)
pw.add_loss_vs_batch({'test_pre': test_pre, 'test_rec': test_rec}, indexs, use_logger = False)
patience_count = 0
else:
patience_count += 1
if patience_count >= args.patience:
break
pw.close()
gitextract_4ncvupf5/ ├── LICENSE ├── ReadMe.md ├── docs/ │ ├── Makefile │ └── source/ │ ├── conf.py │ ├── index.rst │ ├── seq.rst │ └── word.rst ├── ldnet_ner_prune.sh ├── ldnet_np_prune.sh ├── model_seq/ │ ├── __init__.py │ ├── crf.py │ ├── dataset.py │ ├── elmo.py │ ├── evaluator.py │ ├── seqlabel.py │ ├── seqlm.py │ ├── sparse_lm.py │ └── utils.py ├── model_word_ada/ │ ├── LM.py │ ├── __init__.py │ ├── adaptive.py │ ├── basic.py │ ├── dataset.py │ ├── densenet.py │ ├── ldnet.py │ └── utils.py ├── pre_seq/ │ ├── encode_data.py │ └── gene_map.py ├── pre_word_ada/ │ ├── encode_data2folder.py │ └── gene_map.py ├── prune_sparse_seq.py ├── train_lm.py ├── train_seq.py └── train_seq_elmo.py
SYMBOL INDEX (154 symbols across 18 files)
FILE: model_seq/crf.py
class CRF (line 14) | class CRF(nn.Module):
method __init__ (line 27) | def __init__(self,
method rand_init (line 36) | def rand_init(self):
method forward (line 43) | def forward(self, feats):
class CRFLoss (line 63) | class CRFLoss(nn.Module):
method __init__ (line 76) | def __init__(self,
method forward (line 85) | def forward(self, scores, target, mask):
class CRFDecode (line 129) | class CRFDecode():
method __init__ (line 139) | def __init__(self, y_map: dict):
method decode (line 146) | def decode(self, scores, mask):
method to_spans (line 189) | def to_spans(self, sequence):
FILE: model_seq/dataset.py
class SeqDataset (line 19) | class SeqDataset(object):
method __init__ (line 46) | def __init__(self,
method shuffle (line 72) | def shuffle(self):
method get_tqdm (line 78) | def get_tqdm(self, device):
method construct_index (line 90) | def construct_index(self, dataset):
method reader (line 109) | def reader(self, device):
method batchify (line 131) | def batchify(self, batch, device):
FILE: model_seq/elmo.py
class EBUnit (line 18) | class EBUnit(nn.Module):
method __init__ (line 31) | def __init__(self, ori_unit, droprate, fix_rate):
method forward (line 40) | def forward(self, x):
class ERNN (line 61) | class ERNN(nn.Module):
method __init__ (line 74) | def __init__(self, ori_drnn, droprate, fix_rate):
method regularizer (line 93) | def regularizer(self):
method forward (line 104) | def forward(self, x):
class ElmoLM (line 125) | class ElmoLM(nn.Module):
method __init__ (line 141) | def __init__(self, ori_lm, backward, droprate, fix_rate):
method init_hidden (line 155) | def init_hidden(self):
method regularizer (line 161) | def regularizer(self):
method prox (line 172) | def prox(self, lambda0):
method forward (line 178) | def forward(self, w_in, ind=None):
FILE: model_seq/evaluator.py
class eval_batch (line 14) | class eval_batch:
method __init__ (line 23) | def __init__(self, decoder):
method reset (line 26) | def reset(self):
method calc_f1_batch (line 36) | def calc_f1_batch(self, decoded_data, target_data):
method calc_acc_batch (line 60) | def calc_acc_batch(self, decoded_data, target_data):
method f1_score (line 82) | def f1_score(self):
method acc_score (line 96) | def acc_score(self):
method eval_instance (line 105) | def eval_instance(self, best_path, gold):
class eval_wc (line 130) | class eval_wc(eval_batch):
method __init__ (line 141) | def __init__(self, decoder, score_type):
method calc_score (line 151) | def calc_score(self, seq_model, dataset_loader):
FILE: model_seq/seqlabel.py
class SeqLabel (line 13) | class SeqLabel(nn.Module):
method __init__ (line 46) | def __init__(self, f_lm, b_lm,
method to_params (line 87) | def to_params(self):
method prune_dense_rnn (line 109) | def prune_dense_rnn(self):
method set_batch_seq_size (line 120) | def set_batch_seq_size(self, sentence):
method load_pretrained_word_embedding (line 128) | def load_pretrained_word_embedding(self, pre_word_embeddings):
method rand_init (line 134) | def rand_init(self):
method forward (line 146) | def forward(self, f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w):
class Vanilla_SeqLabel (line 208) | class Vanilla_SeqLabel(nn.Module):
method __init__ (line 241) | def __init__(self, f_lm, b_lm, c_num, c_dim, c_hidden, c_layer, w_num,...
method set_batch_seq_size (line 262) | def set_batch_seq_size(self, sentence):
method load_pretrained_word_embedding (line 270) | def load_pretrained_word_embedding(self, pre_word_embeddings):
method rand_init (line 276) | def rand_init(self):
method forward (line 287) | def forward(self, f_c, f_p, b_c, b_p, flm_w, blm_w, blm_ind, f_w):
FILE: model_seq/seqlm.py
class BasicSeqLM (line 18) | class BasicSeqLM(nn.Module):
method __init__ (line 33) | def __init__(self, ori_lm, backward, droprate, fix_rate):
method to_params (line 50) | def to_params(self):
method init_hidden (line 60) | def init_hidden(self):
method regularizer (line 66) | def regularizer(self):
method forward (line 77) | def forward(self, w_in, ind=None):
FILE: model_seq/sparse_lm.py
class SBUnit (line 14) | class SBUnit(nn.Module):
method __init__ (line 27) | def __init__(self, ori_unit, droprate, fix_rate):
method prune_rnn (line 40) | def prune_rnn(self, mask):
method forward (line 53) | def forward(self, x, weight=1):
class SDRNN (line 81) | class SDRNN(nn.Module):
method __init__ (line 94) | def __init__(self, ori_drnn, droprate, fix_rate):
method to_params (line 118) | def to_params(self):
method prune_dense_rnn (line 132) | def prune_dense_rnn(self):
method prox (line 170) | def prox(self):
method regularizer (line 179) | def regularizer(self):
method forward (line 199) | def forward(self, x):
class SparseSeqLM (line 219) | class SparseSeqLM(nn.Module):
method __init__ (line 235) | def __init__(self, ori_lm, backward, droprate, fix_rate):
method to_params (line 249) | def to_params(self):
method prune_dense_rnn (line 260) | def prune_dense_rnn(self):
method init_hidden (line 268) | def init_hidden(self):
method regularizer (line 274) | def regularizer(self):
method prox (line 285) | def prox(self):
method forward (line 291) | def forward(self, w_in, ind=None):
FILE: model_seq/utils.py
function log_sum_exp (line 17) | def log_sum_exp(vec):
function repackage_hidden (line 35) | def repackage_hidden(h):
function to_scalar (line 54) | def to_scalar(var):
function init_embedding (line 60) | def init_embedding(input_embedding):
function init_linear (line 67) | def init_linear(input_linear):
function adjust_learning_rate (line 76) | def adjust_learning_rate(optimizer, lr):
function init_lstm (line 90) | def init_lstm(input_lstm):
FILE: model_word_ada/LM.py
class LM (line 12) | class LM(nn.Module):
method __init__ (line 32) | def __init__(self, rnn, soft_max, w_num, w_dim, droprate, label_dim = ...
method load_embed (line 54) | def load_embed(self, origin_lm):
method rand_ini (line 61) | def rand_ini(self):
method init_hidden (line 74) | def init_hidden(self):
method forward (line 80) | def forward(self, w_in, target):
method log_prob (line 111) | def log_prob(self, w_in):
FILE: model_word_ada/adaptive.py
class AdaptiveSoftmax (line 12) | class AdaptiveSoftmax(nn.Module):
method __init__ (line 24) | def __init__(self, input_size, cutoff):
method rand_ini (line 44) | def rand_ini(self):
method log_prob (line 54) | def log_prob(self, w_in, device):
method forward (line 90) | def forward(self, w_in, target):
FILE: model_word_ada/basic.py
class BasicUnit (line 12) | class BasicUnit(nn.Module):
method __init__ (line 27) | def __init__(self, unit, input_dim, hid_dim, droprate):
method init_hidden (line 42) | def init_hidden(self):
method rand_ini (line 48) | def rand_ini(self):
method forward (line 55) | def forward(self, x):
class BasicRNN (line 78) | class BasicRNN(nn.Module):
method __init__ (line 95) | def __init__(self, layer_num, unit, emb_dim, hid_dim, droprate):
method to_params (line 105) | def to_params(self):
method init_hidden (line 118) | def init_hidden(self):
method rand_ini (line 125) | def rand_ini(self):
method forward (line 132) | def forward(self, x):
FILE: model_word_ada/dataset.py
class EvalDataset (line 18) | class EvalDataset(object):
method __init__ (line 29) | def __init__(self, dataset, sequence_length):
method get_tqdm (line 37) | def get_tqdm(self, device):
method construct_index (line 49) | def construct_index(self):
method reader (line 66) | def reader(self, device):
class LargeDataset (line 91) | class LargeDataset(object):
method __init__ (line 106) | def __init__(self, root, range_idx, batch_size, sequence_length):
method shuffle (line 119) | def shuffle(self):
method get_tqdm (line 125) | def get_tqdm(self, device):
method reader (line 145) | def reader(self, device):
method open_next (line 175) | def open_next(self):
FILE: model_word_ada/densenet.py
class BasicUnit (line 12) | class BasicUnit(nn.Module):
method __init__ (line 27) | def __init__(self, unit, input_dim, increase_rate, droprate):
method init_hidden (line 47) | def init_hidden(self):
method rand_ini (line 53) | def rand_ini(self):
method forward (line 59) | def forward(self, x):
class DenseRNN (line 86) | class DenseRNN(nn.Module):
method __init__ (line 103) | def __init__(self, layer_num, unit, emb_dim, hid_dim, droprate):
method to_params (line 114) | def to_params(self):
method init_hidden (line 127) | def init_hidden(self):
method rand_ini (line 134) | def rand_ini(self):
method forward (line 141) | def forward(self, x):
FILE: model_word_ada/ldnet.py
class BasicUnit (line 13) | class BasicUnit(nn.Module):
method __init__ (line 30) | def __init__(self, unit, input_dim, increase_rate, droprate, layer_dro...
method init_hidden (line 52) | def init_hidden(self):
method rand_ini (line 58) | def rand_ini(self):
method forward (line 64) | def forward(self, x, p_out):
class LDRNN (line 102) | class LDRNN(nn.Module):
method __init__ (line 121) | def __init__(self, layer_num, unit, emb_dim, hid_dim, droprate, layer_...
method to_params (line 134) | def to_params(self):
method init_hidden (line 148) | def init_hidden(self):
method rand_ini (line 155) | def rand_ini(self):
method forward (line 162) | def forward(self, x):
FILE: model_word_ada/utils.py
function repackage_hidden (line 17) | def repackage_hidden(h):
function to_scalar (line 36) | def to_scalar(var):
function init_embedding (line 42) | def init_embedding(input_embedding):
function init_linear (line 49) | def init_linear(input_linear):
function adjust_learning_rate (line 58) | def adjust_learning_rate(optimizer, lr):
function init_lstm (line 72) | def init_lstm(input_lstm):
FILE: pre_seq/encode_data.py
function encode_dataset (line 18) | def encode_dataset(input_file, flm_map, blm_map, gw_map, c_map, y_map):
FILE: pre_word_ada/encode_data2folder.py
function encode_dataset (line 18) | def encode_dataset(input_folder, w_map, reverse):
function encode_dataset2file (line 41) | def encode_dataset2file(input_folder, t, w_map, reverse):
FILE: train_lm.py
function evaluate (line 31) | def evaluate(data_loader, lm_model, limited = 76800):
Condensed preview — 34 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (163K chars).
[
{
"path": "LICENSE",
"chars": 11344,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "ReadMe.md",
"chars": 5315,
"preview": "# LD-Net\n\n[](http://ld-net.readthe"
},
{
"path": "docs/Makefile",
"chars": 631,
"preview": "# Minimal makefile for Sphinx documentation\r\n#\r\n\r\n# You can set these variables from the command line.\r\nSPHINXOPTS =\r"
},
{
"path": "docs/source/conf.py",
"chars": 6083,
"preview": "#!/usr/bin/env python3\r\n# -*- coding: utf-8 -*-\r\n#\r\n# Wrapper documentation build configuration file, created by\r\n# sphi"
},
{
"path": "docs/source/index.rst",
"chars": 1366,
"preview": ".. LD-Net documentation master file.\r\n\r\n:github_url: https://github.com/LiyuanLucasLiu/LD-Net\r\n\r\nLD-Net documentation\r\n="
},
{
"path": "docs/source/seq.rst",
"chars": 818,
"preview": "Sequence Labeling\n==========================\n\nmodel_seq\\.crf module\n----------------------\n.. automodule:: model_seq.crf"
},
{
"path": "docs/source/word.rst",
"chars": 809,
"preview": "Language Modeling\n==========================\n\nmodel_word_ada\\.adaptive module\n-------------------------------\n.. automod"
},
{
"path": "ldnet_ner_prune.sh",
"chars": 843,
"preview": "FIRST_RUN=1\n\nDATA_ROOT=\"data/\"\nNER_DATASET=$DATA_ROOT/ner_dataset.pk\n\nCHECKPOINT_ROOT=\"checkpoint/\"\nNER_CHECKPOINT=$CHEC"
},
{
"path": "ldnet_np_prune.sh",
"chars": 830,
"preview": "FIRST_RUN=1\n\nDATA_ROOT=\"data/\"\nNP_DATASET=$DATA_ROOT/np_dataset.pk\n\nCHECKPOINT_ROOT=\"checkpoint/\"\nNP_CHECKPOINT=$CHECKPO"
},
{
"path": "model_seq/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "model_seq/crf.py",
"chars": 8808,
"preview": "\"\"\"\n.. module:: crf\n :synopsis: conditional random field\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\n\nimport torch\nimport torc"
},
{
"path": "model_seq/dataset.py",
"chars": 6373,
"preview": "\"\"\"\n.. module:: dataset\n :synopsis: dataset for sequence labeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\n\nimport torch\nim"
},
{
"path": "model_seq/elmo.py",
"chars": 5257,
"preview": "\"\"\"\n.. module:: elmo\n :synopsis: deep contextualized representation\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport time\n\ni"
},
{
"path": "model_seq/evaluator.py",
"chars": 5408,
"preview": "\"\"\"\n.. module:: evaluator\n :synopsis: evaluator for sequence labeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch"
},
{
"path": "model_seq/seqlabel.py",
"chars": 12086,
"preview": "\"\"\"\n.. module:: seqlabel\n :synopsis: sequence labeling model\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch\nimport t"
},
{
"path": "model_seq/seqlm.py",
"chars": 2586,
"preview": "\"\"\"\n.. module:: seqlm\n :synopsis: language model for sequence labeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport time"
},
{
"path": "model_seq/sparse_lm.py",
"chars": 9377,
"preview": "\"\"\"\n.. module:: sparse_lm\n :synopsis: sparse language model for sequence labeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\n"
},
{
"path": "model_seq/utils.py",
"chars": 2954,
"preview": "\"\"\"\n.. module:: utils\n :synopsis: utils\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport numpy as np\nimport torch\nimport jso"
},
{
"path": "model_word_ada/LM.py",
"chars": 3480,
"preview": "\"\"\"\n.. module:: LM\n :synopsis: language modeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch\nimport torch.nn as n"
},
{
"path": "model_word_ada/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "model_word_ada/adaptive.py",
"chars": 3813,
"preview": "\"\"\"\n.. module:: adaptive\n :synopsis: adaptive softmax\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch\nfrom torch impo"
},
{
"path": "model_word_ada/basic.py",
"chars": 3858,
"preview": "\"\"\"\n.. module:: basic\n :synopsis: basic rnn\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch\nimport torch.nn as nn\nimp"
},
{
"path": "model_word_ada/dataset.py",
"chars": 5600,
"preview": "\"\"\"\n.. module:: dataset\n :synopsis: dataset for language modeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch\nimp"
},
{
"path": "model_word_ada/densenet.py",
"chars": 4105,
"preview": "\"\"\"\n.. module:: densenet\n :synopsis: densernn\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch\nimport torch.nn as nn\ni"
},
{
"path": "model_word_ada/ldnet.py",
"chars": 5149,
"preview": "\"\"\"\n.. module:: ldnet\n :synopsis: LD-Net\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport"
},
{
"path": "model_word_ada/utils.py",
"chars": 2477,
"preview": "\"\"\"\n.. module:: utils\n :synopsis: utils\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport numpy as np\nimport torch\nimport jso"
},
{
"path": "pre_seq/encode_data.py",
"chars": 2758,
"preview": "\"\"\"\n.. module:: encode_data\n :synopsis: encode data for sequence labeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport p"
},
{
"path": "pre_seq/gene_map.py",
"chars": 2965,
"preview": "\"\"\"\n.. module:: gene_map\n :synopsis: generate map for sequence labeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport pic"
},
{
"path": "pre_word_ada/encode_data2folder.py",
"chars": 2863,
"preview": "\"\"\"\n.. module:: encode_data2folder\n :synopsis: encode data folder for language modeling\n \n.. moduleauthor:: Liyuan Li"
},
{
"path": "pre_word_ada/gene_map.py",
"chars": 1044,
"preview": "\"\"\"\n.. module:: gene_map\n :synopsis: gene map for language modeling\n \n.. moduleauthor:: Liyuan Liu\n\"\"\"\nimport pickle\n"
},
{
"path": "prune_sparse_seq.py",
"chars": 11503,
"preview": "from __future__ import print_function\nimport datetime\nimport time\nimport torch\nimport torch.autograd as autograd\nimport "
},
{
"path": "train_lm.py",
"chars": 7798,
"preview": "from __future__ import print_function\nimport datetime\nimport time\nimport torch\nimport torch.nn as nn\nimport torch.optim "
},
{
"path": "train_seq.py",
"chars": 8415,
"preview": "from __future__ import print_function\nimport datetime\nimport time\nimport torch\nimport torch.nn as nn\nimport torch.optim "
},
{
"path": "train_seq_elmo.py",
"chars": 8374,
"preview": "from __future__ import print_function\nimport datetime\nimport time\nimport torch\nimport torch.nn as nn\nimport torch.optim "
}
]
About this extraction
This page contains the full source code of the LiyuanLucasLiu/LD-Net GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 34 files (151.5 KB), approximately 38.8k tokens, and a symbol index with 154 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.