Full Code of hankcs/HanLP for AI

Repository: hankcs/HanLP
Branch: doc-zh
Commit: ddb1299bddff
Files: 697
Total size: 3.2 MB

Directory structure:
gitextract_p7um9exn/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   ├── config.yml
│   │   └── feature_request.md
│   ├── pull_request_template.md
│   └── workflows/
│       └── unit-tests.yml
├── .gitignore
├── CITATION.cff
├── LICENSE
├── README.md
├── docs/
│   ├── Makefile
│   ├── annotations/
│   │   ├── constituency/
│   │   │   ├── ctb.md
│   │   │   ├── index.md
│   │   │   ├── npcmj.md
│   │   │   └── ptb.md
│   │   ├── dep/
│   │   │   ├── index.md
│   │   │   ├── pmt.md
│   │   │   ├── sd_en.md
│   │   │   ├── sd_zh.md
│   │   │   └── ud.md
│   │   ├── index.md
│   │   ├── ner/
│   │   │   ├── index.md
│   │   │   ├── msra.md
│   │   │   ├── ontonotes.md
│   │   │   └── pku.md
│   │   ├── pos/
│   │   │   ├── 863.md
│   │   │   ├── ctb.md
│   │   │   ├── index.md
│   │   │   ├── npcmj.md
│   │   │   ├── pku.md
│   │   │   └── ud.md
│   │   ├── sdp/
│   │   │   ├── dm.md
│   │   │   ├── index.md
│   │   │   ├── pas.md
│   │   │   ├── psd.md
│   │   │   └── semeval16.md
│   │   ├── srl/
│   │   │   ├── cpb.md
│   │   │   ├── index.md
│   │   │   └── propbank.md
│   │   └── tok/
│   │       ├── ctb.md
│   │       ├── index.md
│   │       └── msr.md
│   ├── api/
│   │   ├── common/
│   │   │   ├── configurable.rst
│   │   │   ├── conll.rst
│   │   │   ├── constant.rst
│   │   │   ├── document.rst
│   │   │   └── index.md
│   │   ├── hanlp/
│   │   │   ├── common/
│   │   │   │   ├── component.rst
│   │   │   │   ├── dataset.md
│   │   │   │   ├── index.md
│   │   │   │   ├── structure.md
│   │   │   │   ├── torch_component.md
│   │   │   │   ├── transform.md
│   │   │   │   └── vocab.md
│   │   │   ├── components/
│   │   │   │   ├── classifiers.md
│   │   │   │   ├── eos.md
│   │   │   │   ├── index.md
│   │   │   │   ├── lemmatizer.md
│   │   │   │   ├── mtl/
│   │   │   │   │   ├── index.md
│   │   │   │   │   ├── mtl.md
│   │   │   │   │   └── tasks/
│   │   │   │   │       ├── constituency.md
│   │   │   │   │       ├── dep.md
│   │   │   │   │       ├── index.md
│   │   │   │   │       ├── lem.md
│   │   │   │   │       ├── ner/
│   │   │   │   │       │   ├── biaffine_ner.md
│   │   │   │   │       │   ├── index.md
│   │   │   │   │       │   └── tag_ner.md
│   │   │   │   │       ├── pos.md
│   │   │   │   │       ├── sdp.md
│   │   │   │   │       ├── srl/
│   │   │   │   │       │   ├── bio_srl.md
│   │   │   │   │       │   ├── index.md
│   │   │   │   │       │   └── rank_srl.md
│   │   │   │   │       ├── task.md
│   │   │   │   │       ├── tok.md
│   │   │   │   │       └── ud.md
│   │   │   │   ├── ner/
│   │   │   │   │   ├── biaffine_ner.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   ├── rnn_ner.md
│   │   │   │   │   └── transformer_ner.md
│   │   │   │   ├── parsers/
│   │   │   │   │   ├── biaffine_dep.md
│   │   │   │   │   ├── biaffine_sdp.md
│   │   │   │   │   ├── crf_constituency_parser.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   └── ud_parser.md
│   │   │   │   ├── pipeline.md
│   │   │   │   ├── srl/
│   │   │   │   │   ├── index.md
│   │   │   │   │   ├── span_bio.md
│   │   │   │   │   └── span_rank.md
│   │   │   │   ├── sts.md
│   │   │   │   ├── taggers/
│   │   │   │   │   ├── index.md
│   │   │   │   │   ├── rnn_tagger.md
│   │   │   │   │   └── transformer_tagger.md
│   │   │   │   └── tokenizers/
│   │   │   │       ├── index.md
│   │   │   │       ├── multi_criteria.md
│   │   │   │       └── transformer.md
│   │   │   ├── datasets/
│   │   │   │   ├── constituency/
│   │   │   │   │   ├── constituency_dataset.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   └── resources.md
│   │   │   │   ├── dep/
│   │   │   │   │   ├── conll_dataset.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   └── resources.md
│   │   │   │   ├── eos/
│   │   │   │   │   ├── eos.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   └── resources.md
│   │   │   │   ├── index.md
│   │   │   │   ├── ner/
│   │   │   │   │   ├── index.md
│   │   │   │   │   ├── json.md
│   │   │   │   │   ├── resources.md
│   │   │   │   │   └── tsv.md
│   │   │   │   ├── pos/
│   │   │   │   │   ├── index.md
│   │   │   │   │   └── resources.md
│   │   │   │   ├── srl/
│   │   │   │   │   ├── conll2012_dataset.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   └── resources.md
│   │   │   │   └── tok/
│   │   │   │       ├── index.md
│   │   │   │       ├── mcws_dataset.md
│   │   │   │       ├── resources.md
│   │   │   │       └── txt.md
│   │   │   ├── hanlp.rst
│   │   │   ├── index.md
│   │   │   ├── layers/
│   │   │   │   ├── decoders/
│   │   │   │   │   ├── biaffine_ner.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   └── linear_crf.md
│   │   │   │   ├── embeddings/
│   │   │   │   │   ├── char_cnn.md
│   │   │   │   │   ├── char_rnn.md
│   │   │   │   │   ├── embedding.md
│   │   │   │   │   ├── fasttext.md
│   │   │   │   │   ├── index.md
│   │   │   │   │   ├── transformer.md
│   │   │   │   │   └── word2vec.md
│   │   │   │   ├── index.md
│   │   │   │   └── transformers/
│   │   │   │       ├── encoder.md
│   │   │   │       ├── index.md
│   │   │   │       └── tokenizer.md
│   │   │   ├── pretrained/
│   │   │   │   ├── amr.md
│   │   │   │   ├── amr2text.md
│   │   │   │   ├── constituency.md
│   │   │   │   ├── dep.md
│   │   │   │   ├── eos.md
│   │   │   │   ├── fasttext.md
│   │   │   │   ├── glove.md
│   │   │   │   ├── index.md
│   │   │   │   ├── mlm.md
│   │   │   │   ├── mtl.md
│   │   │   │   ├── ner.md
│   │   │   │   ├── pos.md
│   │   │   │   ├── sdp.md
│   │   │   │   ├── srl.md
│   │   │   │   ├── sts.md
│   │   │   │   ├── tok.md
│   │   │   │   └── word2vec.md
│   │   │   └── utils/
│   │   │       ├── index.md
│   │   │       └── io_util.md
│   │   ├── restful.rst
│   │   ├── restful_golang.md
│   │   ├── restful_java.md
│   │   └── trie/
│   │       ├── dictionary.md
│   │       ├── index.md
│   │       └── trie.md
│   ├── conf.py
│   ├── configure.md
│   ├── contributing.md
│   ├── data_format.md
│   ├── index.md
│   ├── install.md
│   ├── references.bib
│   ├── references.rst
│   └── tutorial.md
├── hanlp/
│   ├── __init__.py
│   ├── callbacks/
│   │   ├── __init__.py
│   │   └── fine_csv_logger.py
│   ├── common/
│   │   ├── __init__.py
│   │   ├── component.py
│   │   ├── dataset.py
│   │   ├── keras_component.py
│   │   ├── structure.py
│   │   ├── torch_component.py
│   │   ├── transform.py
│   │   ├── transform_tf.py
│   │   ├── vocab.py
│   │   └── vocab_tf.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── amr/
│   │   │   ├── __init__.py
│   │   │   ├── amrbart/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── bart_amr_generation.py
│   │   │   │   ├── bart_amr_parser.py
│   │   │   │   ├── common/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── constant.py
│   │   │   │   │   ├── penman_interface.py
│   │   │   │   │   └── postprocessing.py
│   │   │   │   ├── data_interface/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   └── dataset.py
│   │   │   │   ├── model_interface/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── modeling_bart.py
│   │   │   │   │   └── tokenization_bart.py
│   │   │   │   └── preprocess/
│   │   │   │       ├── __init__.py
│   │   │   │       ├── amr_io.py
│   │   │   │       ├── penman_interface.py
│   │   │   │       └── read_and_process.py
│   │   │   └── seq2seq/
│   │   │       ├── __init__.py
│   │   │       ├── dataset/
│   │   │       │   ├── IO.py
│   │   │       │   ├── __init__.py
│   │   │       │   ├── dataset.py
│   │   │       │   ├── linearization.py
│   │   │       │   ├── penman.py
│   │   │       │   ├── postprocessing.py
│   │   │       │   ├── tokenization_bart.py
│   │   │       │   └── tokenization_t5.py
│   │   │       ├── evaluation.py
│   │   │       ├── optim.py
│   │   │       └── seq2seq_amr_parser.py
│   │   ├── classifiers/
│   │   │   ├── __init__.py
│   │   │   ├── fasttext_classifier.py
│   │   │   ├── transformer_classifier.py
│   │   │   ├── transformer_classifier_hf.py
│   │   │   ├── transformer_classifier_tf.py
│   │   │   └── transformer_regression_hf.py
│   │   ├── distillation/
│   │   │   ├── __init__.py
│   │   │   ├── distillable_component.py
│   │   │   ├── losses.py
│   │   │   └── schedulers.py
│   │   ├── eos/
│   │   │   ├── __init__.py
│   │   │   └── ngram.py
│   │   ├── lambda_wrapper.py
│   │   ├── lemmatizer.py
│   │   ├── lm/
│   │   │   ├── __init__.py
│   │   │   └── mlm.py
│   │   ├── mtl/
│   │   │   ├── __init__.py
│   │   │   ├── multi_task_learning.py
│   │   │   └── tasks/
│   │   │       ├── __init__.py
│   │   │       ├── amr.py
│   │   │       ├── constituency.py
│   │   │       ├── dep.py
│   │   │       ├── dep_2nd.py
│   │   │       ├── lem.py
│   │   │       ├── ner/
│   │   │       │   ├── __init__.py
│   │   │       │   ├── biaffine_ner.py
│   │   │       │   └── tag_ner.py
│   │   │       ├── pos.py
│   │   │       ├── sdp.py
│   │   │       ├── srl/
│   │   │       │   ├── __init__.py
│   │   │       │   ├── bio_srl.py
│   │   │       │   └── rank_srl.py
│   │   │       ├── tok/
│   │   │       │   ├── __init__.py
│   │   │       │   ├── reg_tok.py
│   │   │       │   └── tag_tok.py
│   │   │       └── ud.py
│   │   ├── ner/
│   │   │   ├── __init__.py
│   │   │   ├── biaffine_ner/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── biaffine_ner.py
│   │   │   │   └── biaffine_ner_model.py
│   │   │   ├── ner_tf.py
│   │   │   ├── rnn_ner.py
│   │   │   └── transformer_ner.py
│   │   ├── parsers/
│   │   │   ├── __init__.py
│   │   │   ├── alg.py
│   │   │   ├── alg_tf.py
│   │   │   ├── biaffine/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── biaffine.py
│   │   │   │   ├── biaffine_2nd_dep.py
│   │   │   │   ├── biaffine_dep.py
│   │   │   │   ├── biaffine_model.py
│   │   │   │   ├── biaffine_sdp.py
│   │   │   │   ├── mlp.py
│   │   │   │   ├── structual_attention.py
│   │   │   │   └── variationalbilstm.py
│   │   │   ├── biaffine_parser_tf.py
│   │   │   ├── biaffine_tf/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── alg.py
│   │   │   │   ├── layers.py
│   │   │   │   └── model.py
│   │   │   ├── chu_liu_edmonds.py
│   │   │   ├── conll.py
│   │   │   ├── constituency/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── crf_constituency_model.py
│   │   │   │   ├── crf_constituency_parser.py
│   │   │   │   └── treecrf.py
│   │   │   ├── parse_alg.py
│   │   │   └── ud/
│   │   │       ├── __init__.py
│   │   │       ├── lemma_edit.py
│   │   │       ├── tag_decoder.py
│   │   │       ├── ud_model.py
│   │   │       ├── ud_parser.py
│   │   │       ├── udify_util.py
│   │   │       └── util.py
│   │   ├── pipeline.py
│   │   ├── rnn_language_model_tf.py
│   │   ├── srl/
│   │   │   ├── __init__.py
│   │   │   ├── span_bio/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── baffine_tagging.py
│   │   │   │   └── span_bio.py
│   │   │   └── span_rank/
│   │   │       ├── __init__.py
│   │   │       ├── highway_variational_lstm.py
│   │   │       ├── inference_utils.py
│   │   │       ├── layer.py
│   │   │       ├── span_rank.py
│   │   │       ├── span_ranking_srl_model.py
│   │   │       ├── srl_eval_utils.py
│   │   │       └── util.py
│   │   ├── sts/
│   │   │   ├── __init__.py
│   │   │   └── transformer_sts.py
│   │   ├── taggers/
│   │   │   ├── __init__.py
│   │   │   ├── cnn_tagger_tf.py
│   │   │   ├── ngram_conv/
│   │   │   │   ├── __init__.py
│   │   │   │   └── ngram_conv_tagger.py
│   │   │   ├── pos_tf.py
│   │   │   ├── rnn/
│   │   │   │   ├── __init__.py
│   │   │   │   └── rnntaggingmodel.py
│   │   │   ├── rnn_tagger.py
│   │   │   ├── rnn_tagger_tf.py
│   │   │   ├── tagger.py
│   │   │   ├── tagger_tf.py
│   │   │   ├── transformers/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── metrics_tf.py
│   │   │   │   ├── transformer_tagger.py
│   │   │   │   ├── transformer_tagger_tf.py
│   │   │   │   └── transformer_transform_tf.py
│   │   │   └── util.py
│   │   └── tokenizers/
│   │       ├── __init__.py
│   │       ├── multi_criteria_cws_transformer.py
│   │       ├── tok.py
│   │       ├── tok_tf.py
│   │       └── transformer.py
│   ├── datasets/
│   │   ├── __init__.py
│   │   ├── classification/
│   │   │   ├── __init__.py
│   │   │   └── sentiment.py
│   │   ├── coref/
│   │   │   ├── __init__.py
│   │   │   └── loaders/
│   │   │       ├── __init__.py
│   │   │       └── conll12coref.py
│   │   ├── eos/
│   │   │   ├── __init__.py
│   │   │   ├── eos.py
│   │   │   └── loaders/
│   │   │       ├── __init__.py
│   │   │       └── nn_eos.py
│   │   ├── lm/
│   │   │   ├── __init__.py
│   │   │   └── loaders/
│   │   │       ├── __init__.py
│   │   │       └── lm_dataset.py
│   │   ├── lu/
│   │   │   ├── __init__.py
│   │   │   └── glue.py
│   │   ├── ner/
│   │   │   ├── __init__.py
│   │   │   ├── conll03.py
│   │   │   ├── loaders/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── json_ner.py
│   │   │   │   └── tsv.py
│   │   │   ├── msra.py
│   │   │   ├── resume.py
│   │   │   └── weibo.py
│   │   ├── parsing/
│   │   │   ├── __init__.py
│   │   │   ├── amr.py
│   │   │   ├── ctb5.py
│   │   │   ├── ctb7.py
│   │   │   ├── ctb8.py
│   │   │   ├── ctb9.py
│   │   │   ├── loaders/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── _ctb_utils.py
│   │   │   │   ├── conll_dataset.py
│   │   │   │   └── constituency_dataset.py
│   │   │   ├── pmt1.py
│   │   │   ├── ptb.py
│   │   │   ├── semeval15.py
│   │   │   ├── semeval16.py
│   │   │   └── ud/
│   │   │       ├── __init__.py
│   │   │       ├── ud210.py
│   │   │       ├── ud210m.py
│   │   │       ├── ud23.py
│   │   │       ├── ud23m.py
│   │   │       ├── ud27.py
│   │   │       └── ud27m.py
│   │   ├── pos/
│   │   │   ├── __init__.py
│   │   │   └── ctb5.py
│   │   ├── qa/
│   │   │   ├── __init__.py
│   │   │   └── hotpotqa.py
│   │   ├── srl/
│   │   │   ├── __init__.py
│   │   │   ├── loaders/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── conll2012.py
│   │   │   │   └── ontonotes_loader.py
│   │   │   └── ontonotes5/
│   │   │       ├── __init__.py
│   │   │       ├── _utils.py
│   │   │       ├── chinese.py
│   │   │       └── english.py
│   │   ├── sts/
│   │   │   ├── __init__.py
│   │   │   └── stsb.py
│   │   └── tokenization/
│   │       ├── __init__.py
│   │       ├── ctb6.py
│   │       ├── loaders/
│   │       │   ├── __init__.py
│   │       │   ├── chunking_dataset.py
│   │       │   ├── multi_criteria_cws/
│   │       │   │   ├── __init__.py
│   │       │   │   └── mcws_dataset.py
│   │       │   └── txt.py
│   │       └── sighan2005/
│   │           ├── __init__.py
│   │           ├── as_.py
│   │           ├── cityu.py
│   │           ├── msr.py
│   │           └── pku.py
│   ├── layers/
│   │   ├── __init__.py
│   │   ├── cnn_encoder.py
│   │   ├── crf/
│   │   │   ├── __init__.py
│   │   │   ├── crf.py
│   │   │   ├── crf_layer_tf.py
│   │   │   └── crf_tf.py
│   │   ├── dropout.py
│   │   ├── embeddings/
│   │   │   ├── __init__.py
│   │   │   ├── char_cnn.py
│   │   │   ├── char_cnn_tf.py
│   │   │   ├── char_rnn.py
│   │   │   ├── char_rnn_tf.py
│   │   │   ├── concat_embedding.py
│   │   │   ├── contextual_string_embedding.py
│   │   │   ├── contextual_string_embedding_tf.py
│   │   │   ├── contextual_word_embedding.py
│   │   │   ├── embedding.py
│   │   │   ├── fast_text.py
│   │   │   ├── fast_text_tf.py
│   │   │   ├── util.py
│   │   │   ├── util_tf.py
│   │   │   ├── word2vec.py
│   │   │   └── word2vec_tf.py
│   │   ├── feed_forward.py
│   │   ├── feedforward.py
│   │   ├── scalar_mix.py
│   │   ├── time_distributed.py
│   │   ├── transformers/
│   │   │   ├── __init__.py
│   │   │   ├── encoder.py
│   │   │   ├── loader_tf.py
│   │   │   ├── pt_imports.py
│   │   │   ├── relative_transformer.py
│   │   │   ├── resource.py
│   │   │   ├── tf_imports.py
│   │   │   ├── utils.py
│   │   │   └── utils_tf.py
│   │   └── weight_normalization.py
│   ├── losses/
│   │   ├── __init__.py
│   │   └── sparse_categorical_crossentropy.py
│   ├── metrics/
│   │   ├── __init__.py
│   │   ├── accuracy.py
│   │   ├── amr/
│   │   │   ├── __init__.py
│   │   │   └── smatch_eval.py
│   │   ├── chunking/
│   │   │   ├── __init__.py
│   │   │   ├── binary_chunking_f1.py
│   │   │   ├── bmes_tf.py
│   │   │   ├── chunking_f1.py
│   │   │   ├── chunking_f1_tf.py
│   │   │   ├── conlleval.py
│   │   │   ├── iobes_tf.py
│   │   │   └── sequence_labeling.py
│   │   ├── f1.py
│   │   ├── metric.py
│   │   ├── mtl.py
│   │   ├── parsing/
│   │   │   ├── __init__.py
│   │   │   ├── attachmentscore.py
│   │   │   ├── conllx_eval.py
│   │   │   ├── labeled_f1.py
│   │   │   ├── labeled_f1_tf.py
│   │   │   ├── labeled_score.py
│   │   │   ├── semdep_eval.py
│   │   │   └── span.py
│   │   ├── spearman_correlation.py
│   │   └── srl/
│   │       ├── __init__.py
│   │       └── srlconll.py
│   ├── optimizers/
│   │   ├── __init__.py
│   │   └── adamw/
│   │       ├── __init__.py
│   │       └── optimization.py
│   ├── pretrained/
│   │   ├── __init__.py
│   │   ├── amr.py
│   │   ├── amr2text.py
│   │   ├── classifiers.py
│   │   ├── constituency.py
│   │   ├── dep.py
│   │   ├── eos.py
│   │   ├── fasttext.py
│   │   ├── glove.py
│   │   ├── mtl.py
│   │   ├── ner.py
│   │   ├── pos.py
│   │   ├── rnnlm.py
│   │   ├── sdp.py
│   │   ├── srl.py
│   │   ├── sts.py
│   │   ├── tok.py
│   │   └── word2vec.py
│   ├── transform/
│   │   ├── __init__.py
│   │   ├── conll_tf.py
│   │   ├── glue_tf.py
│   │   ├── table_tf.py
│   │   ├── tacred_tf.py
│   │   ├── text_tf.py
│   │   ├── transformer_tokenizer.py
│   │   ├── tsv_tf.py
│   │   └── txt_tf.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── component_util.py
│   │   ├── file_read_backwards/
│   │   │   ├── __init__.py
│   │   │   ├── buffer_work_space.py
│   │   │   └── file_read_backwards.py
│   │   ├── init_util.py
│   │   ├── io_util.py
│   │   ├── lang/
│   │   │   ├── __init__.py
│   │   │   ├── en/
│   │   │   │   ├── __init__.py
│   │   │   │   └── english_tokenizer.py
│   │   │   ├── ja/
│   │   │   │   ├── __init__.py
│   │   │   │   └── bert_tok.py
│   │   │   └── zh/
│   │   │       ├── __init__.py
│   │   │       ├── char_table.py
│   │   │       └── localization.py
│   │   ├── log_util.py
│   │   ├── rules.py
│   │   ├── span_util.py
│   │   ├── string_util.py
│   │   ├── tf_util.py
│   │   ├── time_util.py
│   │   └── torch_util.py
│   └── version.py
├── plugins/
│   ├── README.md
│   ├── hanlp_common/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── hanlp_common/
│   │   │   ├── __init__.py
│   │   │   ├── amr.py
│   │   │   ├── configurable.py
│   │   │   ├── conll.py
│   │   │   ├── constant.py
│   │   │   ├── document.py
│   │   │   ├── io.py
│   │   │   ├── reflection.py
│   │   │   ├── structure.py
│   │   │   ├── util.py
│   │   │   └── visualization.py
│   │   └── setup.py
│   ├── hanlp_demo/
│   │   ├── README.md
│   │   ├── hanlp_demo/
│   │   │   ├── __init__.py
│   │   │   ├── block_windows.py
│   │   │   ├── en/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── demo_amr.py
│   │   │   │   ├── demo_dep.py
│   │   │   │   ├── demo_lm.py
│   │   │   │   ├── demo_ner.py
│   │   │   │   ├── demo_pipeline.py
│   │   │   │   ├── demo_pos.py
│   │   │   │   ├── demo_sdp.py
│   │   │   │   ├── demo_sentiment_analysis.py
│   │   │   │   ├── demo_tok.py
│   │   │   │   └── train_sst2_albert_base.py
│   │   │   ├── ja/
│   │   │   │   ├── __init__.py
│   │   │   │   └── demo_mtl.py
│   │   │   ├── mul/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── demo_lid.py
│   │   │   │   ├── demo_lid_restful.py
│   │   │   │   ├── demo_mtl.py
│   │   │   │   └── train/
│   │   │   │       ├── __init__.py
│   │   │   │       └── mul_base.py
│   │   │   ├── sent_split.py
│   │   │   └── zh/
│   │   │       ├── __init__.py
│   │   │       ├── abstractive_summarization_restful.ipynb
│   │   │       ├── amr_restful.ipynb
│   │   │       ├── amr_stl.ipynb
│   │   │       ├── classification_restful.ipynb
│   │   │       ├── con_mtl.ipynb
│   │   │       ├── con_restful.ipynb
│   │   │       ├── con_stl.ipynb
│   │   │       ├── cor_restful.ipynb
│   │   │       ├── demo_amr.py
│   │   │       ├── demo_custom_dict.py
│   │   │       ├── demo_custom_dict_stl.py
│   │   │       ├── demo_del_tasks.py
│   │   │       ├── demo_document.py
│   │   │       ├── demo_mlm.py
│   │   │       ├── demo_mtl.py
│   │   │       ├── demo_ner_dict.py
│   │   │       ├── demo_parse_constituency.py
│   │   │       ├── demo_pipeline.py
│   │   │       ├── demo_pos_dict.py
│   │   │       ├── demo_sts.py
│   │   │       ├── demo_word2vec.py
│   │   │       ├── dep_mtl.ipynb
│   │   │       ├── dep_restful.ipynb
│   │   │       ├── dep_stl.ipynb
│   │   │       ├── extractive_summarization_restful.ipynb
│   │   │       ├── gec_restful.ipynb
│   │   │       ├── keyphrase_restful.ipynb
│   │   │       ├── lid_restful.ipynb
│   │   │       ├── lid_stl.ipynb
│   │   │       ├── ner_mtl.ipynb
│   │   │       ├── ner_restful.ipynb
│   │   │       ├── ner_stl.ipynb
│   │   │       ├── pos_mtl.ipynb
│   │   │       ├── pos_restful.ipynb
│   │   │       ├── pos_stl.ipynb
│   │   │       ├── sdp_mtl.ipynb
│   │   │       ├── sdp_restful.ipynb
│   │   │       ├── sdp_stl.ipynb
│   │   │       ├── sentiment_restful.ipynb
│   │   │       ├── srl_mtl.ipynb
│   │   │       ├── srl_restful.ipynb
│   │   │       ├── srl_stl.ipynb
│   │   │       ├── sts_restful.ipynb
│   │   │       ├── sts_stl.ipynb
│   │   │       ├── tf/
│   │   │       │   ├── __init__.py
│   │   │       │   ├── demo_classifier.py
│   │   │       │   ├── demo_client.py
│   │   │       │   ├── demo_cws.py
│   │   │       │   ├── demo_cws_trie.py
│   │   │       │   ├── demo_dep.py
│   │   │       │   ├── demo_fasttext.py
│   │   │       │   ├── demo_multiprocess.py
│   │   │       │   ├── demo_ner.py
│   │   │       │   ├── demo_pipeline.py
│   │   │       │   ├── demo_pos.py
│   │   │       │   ├── demo_sdp.py
│   │   │       │   ├── demo_serving.py
│   │   │       │   └── train/
│   │   │       │       ├── __init__.py
│   │   │       │       ├── cws/
│   │   │       │       │   ├── __init__.py
│   │   │       │       │   ├── train_ctb6_cws_albert.py
│   │   │       │       │   ├── train_ctb6_cws_bert.py
│   │   │       │       │   ├── train_ctb6_cws_convseg.py
│   │   │       │       │   ├── train_large_bert_cws.py
│   │   │       │       │   ├── train_large_conv_cws.py
│   │   │       │       │   ├── train_large_cws_albert.py
│   │   │       │       │   ├── train_large_cws_electra.py
│   │   │       │       │   ├── train_large_rnn_cws.py
│   │   │       │       │   ├── train_msr_cws_albert.py
│   │   │       │       │   ├── train_msr_cws_bert.py
│   │   │       │       │   ├── train_msr_cws_ngram_conv.py
│   │   │       │       │   ├── train_msr_cws_ngram_conv_embed.py
│   │   │       │       │   ├── train_pku980106_conv_cws.py
│   │   │       │       │   ├── train_pku980106_rnn_cws.py
│   │   │       │       │   └── train_pku_conv_cws.py
│   │   │       │       ├── finetune_msra_ner_albert.py
│   │   │       │       ├── train_chnsenticorp_bert.py
│   │   │       │       ├── train_conll03_ner_bert.py
│   │   │       │       ├── train_conll03_ner_flair.py
│   │   │       │       ├── train_ctb5_dep.py
│   │   │       │       ├── train_ctb5_pos_rnn.py
│   │   │       │       ├── train_ctb7_dep.py
│   │   │       │       ├── train_ctb9_pos_albert.py
│   │   │       │       ├── train_ctb9_pos_electra.py
│   │   │       │       ├── train_msra_ner_albert.py
│   │   │       │       ├── train_msra_ner_bert.py
│   │   │       │       ├── train_msra_ner_electra.py
│   │   │       │       ├── train_msra_ner_ngram_conv.py
│   │   │       │       ├── train_msra_ner_rnn.py
│   │   │       │       ├── train_ptb_dep_biaffine_albert.py
│   │   │       │       ├── train_ptb_dep_biaffine_bert.py
│   │   │       │       ├── train_ptb_dep_biaffine_bert_96.6.py
│   │   │       │       ├── train_ptb_dep_biaffine_bert_positional.py
│   │   │       │       ├── train_ptb_dep_sa_albert.py
│   │   │       │       ├── train_ptb_dep_sa_albert_topk.py
│   │   │       │       ├── train_ptb_dep_sa_bert.py
│   │   │       │       ├── train_ptb_dep_sa_pos_bert.py
│   │   │       │       ├── train_ptb_pos_rnn_fasttext.py
│   │   │       │       ├── train_semeval15_dm.py
│   │   │       │       ├── train_semeval15_pas.py
│   │   │       │       ├── train_semeval15_psd.py
│   │   │       │       ├── train_semeval16_news.py
│   │   │       │       └── train_semeval16_text.py
│   │   │       ├── tok_mtl.ipynb
│   │   │       ├── tok_restful.ipynb
│   │   │       ├── tok_stl.ipynb
│   │   │       ├── train/
│   │   │       │   ├── __init__.py
│   │   │       │   ├── finetune_ner.py
│   │   │       │   ├── open_base.py
│   │   │       │   └── open_small.py
│   │   │       ├── train_sota_bert_pku.py
│   │   │       ├── tst_restful.ipynb
│   │   │       └── tutorial.ipynb
│   │   └── setup.py
│   ├── hanlp_restful/
│   │   ├── README.md
│   │   ├── hanlp_restful/
│   │   │   └── __init__.py
│   │   ├── setup.py
│   │   └── tests/
│   │       ├── __init__.py
│   │       └── test_client.py
│   ├── hanlp_restful_golang/
│   │   └── README.md
│   ├── hanlp_restful_java/
│   │   ├── pom.xml
│   │   └── src/
│   │       ├── main/
│   │       │   └── java/
│   │       │       └── com/
│   │       │           └── hankcs/
│   │       │               └── hanlp/
│   │       │                   └── restful/
│   │       │                       ├── BaseInput.java
│   │       │                       ├── CoreferenceResolutionOutput.java
│   │       │                       ├── DocumentInput.java
│   │       │                       ├── HanLPClient.java
│   │       │                       ├── SentenceInput.java
│   │       │                       ├── Span.java
│   │       │                       ├── TokenInput.java
│   │       │                       └── mrp/
│   │       │                           ├── Anchor.java
│   │       │                           ├── Edge.java
│   │       │                           ├── MeaningRepresentation.java
│   │       │                           └── Node.java
│   │       └── test/
│   │           └── java/
│   │               └── com/
│   │                   └── hankcs/
│   │                       └── hanlp/
│   │                           └── restful/
│   │                               ├── HanLPClientTest.java
│   │                               └── MeaningRepresentationTest.java
│   └── hanlp_trie/
│       ├── README.md
│       ├── hanlp_trie/
│       │   ├── __init__.py
│       │   ├── dictionary.py
│       │   └── trie.py
│       ├── setup.py
│       └── tests/
│           ├── __init__.py
│           ├── test_trie.py
│           └── test_trie_dict.py
├── setup.py
└── tests/
    ├── __init__.py
    ├── test_config_tracker.py
    ├── test_mtl.py
    ├── test_pipeline.py
    ├── test_rules.py
    └── test_string_util.py

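The `plugins/hanlp_trie` package in the tree above (`trie.py`, `dictionary.py`) provides a trie-backed dictionary used for longest-match lookups over text. As a rough, self-contained sketch of what such a structure does — this is an illustrative stand-in, not the actual `hanlp_trie` API, and the names `Trie` and `parse_longest` are assumptions for the example:

```python
class Trie:
    """Minimal character trie supporting greedy longest-match scanning."""

    def __init__(self):
        self.children = {}   # char -> child Trie node
        self.value = None    # payload stored at a word-final node
        self.is_word = False

    def __setitem__(self, word, value):
        # Walk/create one node per character, mark the last as a word end.
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word, node.value = True, value

    def parse_longest(self, text):
        """Scan text left to right, emitting the longest dictionary entry
        starting at each position as (start, end, value) triples."""
        found = []
        i = 0
        while i < len(text):
            node, j, last = self, i, None
            while j < len(text) and text[j] in node.children:
                node = node.children[text[j]]
                j += 1
                if node.is_word:
                    last = (i, j, node.value)  # remember longest so far
            if last:
                found.append(last)
                i = last[1]  # resume after the match
            else:
                i += 1
        return found
```

For example, with both `自然` and `自然语言` in the dictionary, scanning `自然语言处理` prefers the longer entry `自然语言`, which is the behavior a segmentation dictionary needs.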
================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: 🐛Found a bug
about: Version number, code to reproduce, and error logs are required
title: ''
labels: bug
assignees: hankcs

---

<!--
Thanks for finding a bug. Please fill out the form below carefully:
-->

**Describe the bug**
A clear and concise description of what the bug is.

**Code to reproduce the issue**
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

```python
```

**Describe the current behavior**
A clear and concise description of what happened.

**Expected behavior**
A clear and concise description of what you expected to happen.

**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Python version:
- HanLP version:

**Other info / logs**
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

* [ ] I've completed this form and searched the web for solutions.
<!-- ⬆️This box must be checked, or your issue will be automatically closed by a bot! -->

================================================
FILE: .github/ISSUE_TEMPLATE/config.yml
================================================
blank_issues_enabled: false
contact_links:
  - name: ⁉️ For questions and help, please use the forum
    url: https://bbs.hankcs.com/
    about: You are welcome to ask for help on the forum


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: 🚀Feature request
about: Suggest a new feature
title: ''
labels: feature request
assignees: hankcs

---

<!--
For questions, please use the forum, not this issue tracker!

The fields below are required; otherwise the issue will be closed immediately.
-->

**Describe the feature and the current behavior/state.**

**Will this change the current api? How?**

**Who will benefit from this feature?**

**Are you willing to contribute it (Yes/No):**

**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Python version:
- HanLP version:

**Any other info**

* [ ] I've carefully completed this form.
<!-- Search before posting. This box must be checked! -->

================================================
FILE: .github/pull_request_template.md
================================================
<!--
Thank you for being interested in contributing to HanLP! You are awesome ✨.
⚠️Changes must be made on dev branch.
-->

# Title of Your Pull Request

## Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

## Type of Change

Please check any relevant options and delete the rest.

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] This change requires a documentation update

## How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details of your test configuration.

## Checklist

Check all items that apply.

- [ ] ⚠️Changes **must** be made on `dev` branch instead of `master`
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] My code follows the style guidelines of this project
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have checked my code and corrected any misspellings


================================================
FILE: .github/workflows/unit-tests.yml
================================================
name: Unit Tests

on:
  push:
    branches: [ "**" ]
  pull_request:
    branches: [ "**" ]

jobs:
  build:

    runs-on: ${{ matrix.os }}
    env:
      HANLP_HOME: ${{ github.workspace }}/data
    strategy:
      fail-fast: false
      matrix:
        os: [ ubuntu-latest, macos-latest, windows-latest ]
        python-version: [ 3.6, 3.7, 3.8, 3.9, '3.10' ]
        exclude:
          # GHA doesn't list 3.6 for ubuntu-22.04
          - os: ubuntu-latest
            python-version: "3.6"

          # MacOS 14.4.1 for arm64 doesn't support Python < 3.8
          - os: macos-latest
            python-version: "3.6"
          - os: macos-latest
            python-version: "3.7"

        include:
          # GHA doesn't list 3.6 for ubuntu-22.04, so run it on ubuntu-20.04
          - os: ubuntu-20.04
            python-version: "3.6"

          # macOS 13 is required for Python < 3.8
          - os: macos-13
            python-version: "3.6"
          - os: macos-13
            python-version: "3.7"

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        shell: bash
        run: |
          python -m pip install -e plugins/hanlp_trie
          python -m pip install -e plugins/hanlp_common
          python -m pip install -e .
          python -m pip install pytest

      - name: Cache data
        uses: actions/cache@v3
        with:
          path: ${{ env.HANLP_HOME }}
          key: hanlp-data

      - name: Test with pytest
        shell: bash
        run: |
          pytest tests
          pytest plugins/hanlp_trie/tests
  deploy:
    needs: build
    if: github.event_name == 'push' && github.ref == 'refs/heads/master'
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: |
          python -m pip install setuptools wheel twine
      - name: Deploy to PyPI
        run: |
          python setup.py sdist bdist_wheel
          python -m twine upload dist/*
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
          TWINE_REPOSITORY: pypi


================================================
FILE: .gitignore
================================================
# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

### Java template
# Compiled class file
*.class

# Log file

# BlueJ files
*.ctxt

# Mobile Tools for Java (J2ME)
.mtj.tmp/

# Package Files #
*.jar
*.war
*.nar
*.ear
*.zip
*.tar.gz
*.rar

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

### Eclipse template
.metadata
bin/
tmp/
*.tmp
*.bak
*.swp
*~.nib
local.properties
.settings/
.loadpath
.recommenders

# External tool builders
.externalToolBuilders/

# Locally stored "Eclipse launch configurations"
*.launch

# PyDev specific (Python IDE for Eclipse)
*.pydevproject

# CDT-specific (C/C++ Development Tooling)
.cproject

# CDT- autotools
.autotools

# Java annotation processor (APT)
.factorypath

# PDT-specific (PHP Development Tools)
.buildpath

# sbteclipse plugin
.target

# Tern plugin
.tern-project

# TeXlipse plugin
.texlipse

# STS (Spring Tool Suite)
.springBeans

# Code Recommenders
.recommenders/

# Annotation Processing
.apt_generated/

# Scala IDE specific (Scala & Java development for Eclipse)
.cache-main
.scala_dependencies
.worksheet

### VisualStudioCode template
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json

### JetBrains template
# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839

# User-specific stuff
.idea/**/workspace.xml
.idea/**/tasks.xml
.idea/**/usage.statistics.xml
.idea/**/dictionaries
.idea/**/shelf

# Generated files
.idea/**/contentModel.xml

# Sensitive or high-churn files
.idea/**/dataSources/
.idea/**/dataSources.ids
.idea/**/dataSources.local.xml
.idea/**/sqlDataSources.xml
.idea/**/dynamic.xml
.idea/**/uiDesigner.xml
.idea/**/dbnavigator.xml

# Gradle
.idea/**/gradle.xml
.idea/**/libraries

# Gradle and Maven with auto-import
# When using Gradle or Maven with auto-import, you should exclude module files,
# since they will be recreated, and may cause churn.  Uncomment if using
# auto-import.
# .idea/modules.xml
# .idea/*.iml
# .idea/modules
# *.iml
# *.ipr

# CMake
cmake-build-*/

# Mongo Explorer plugin
.idea/**/mongoSettings.xml

# File-based project format
*.iws

# IntelliJ
out/

# mpeltonen/sbt-idea plugin
.idea_modules/

# JIRA plugin
atlassian-ide-plugin.xml

# Cursive Clojure plugin
.idea/replstate.xml

# Crashlytics plugin (for Android Studio and IntelliJ)
com_crashlytics_export_strings.xml
crashlytics.properties
crashlytics-build.properties
fabric.properties

# Editor-based Rest HanLPClient
.idea/httpRequests

# Android studio 3.1+ serialized cache file
.idea/caches/build_file_checksums.ser
.idea
*.iml
data
.vscode
*.pkl
*.pdf
_static/
_build/
_templates/

================================================
FILE: CITATION.cff
================================================
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: He
  given-names: Han
  orcid: "https://orcid.org/0009-0005-1778-917X"
title: "HanLP: Han Language Processing"
version: 2.1
date-released: 2015-05-27
url: "https://github.com/hankcs/HanLP"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: He
      given-names: Han
    - family-names: Choi
      given-names: Jinho D.
  title: "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders"
  editors:
    - family-names: Moens
      given-names: Marie-Francine
    - family-names: Huang
      given-names: Xuanjing
    - family-names: Specia
      given-names: Lucia
    - family-names: Yih
      given-names: Scott Wen-tau
  year: 2021
  month: 11
  date-released: 2021-11
  conference:
    name: "2021 Conference on Empirical Methods in Natural Language Processing"
    place: "Online and Punta Cana, Dominican Republic"
    url: "https://aclanthology.org/2021.emnlp-main.451"
  doi: "10.18653/v1/2021.emnlp-main.451"
  url: "https://aclanthology.org/2021.emnlp-main.451"
  publisher: "Association for Computational Linguistics"
  booktitle: "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing"
  location: "Online and Punta Cana, Dominican Republic"
  pages: "5555-5577"


================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: README.md
================================================
<h2 align="center">HanLP: Han Language Processing</h2>

<div align="center">
    <a href="https://github.com/hankcs/HanLP/actions/workflows/unit-tests.yml">
       <img alt="Unit Tests" src="https://github.com/hankcs/hanlp/actions/workflows/unit-tests.yml/badge.svg?branch=master">
    </a>
    <a href="https://pypi.org/project/hanlp/">
        <img alt="PyPI Version" src="https://img.shields.io/pypi/v/hanlp?color=blue">
    </a>
    <a href="https://pypi.org/project/hanlp/">
        <img alt="Python Versions" src="https://img.shields.io/pypi/pyversions/hanlp?colorB=blue">
    </a>
    <a href="https://pepy.tech/project/hanlp">
        <img alt="Downloads" src="https://static.pepy.tech/badge/hanlp">
    </a>
    <a href="https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Ftutorial.ipynb">
        <img alt="Run Online" src="https://mybinder.org/badge_logo.svg">
    </a>
</div>
<h4 align="center">
    <a href="https://github.com/hankcs/HanLP/tree/master">English</a> |
    <a href="https://github.com/hankcs/HanLP/tree/doc-ja">日本語</a> |
    <a href="https://hanlp.hankcs.com/docs/">Docs</a> |
    <a href="https://bbs.hankcs.com/t/topic/3940">Papers</a> |
    <a href="https://bbs.hankcs.com/">Forum</a> |
    <a href="https://github.com/wangedison/hanlp-jupyterlab-docker">docker</a> |
    <a href="https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Ftutorial.ipynb">▶️Run Online</a>
</h4>



HanLP is a multilingual Natural Language Processing toolkit for production environments, built on the dual engines of PyTorch and TensorFlow 2.x, aiming to bring state-of-the-art NLP techniques into real-world use. HanLP is feature-rich, accurate, efficient, trained on up-to-date corpora, cleanly architected, and customizable.

[![demo](https://raw.githubusercontent.com/hankcs/OpenCC-to-HanLP/img/demo.gif)](https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Ftutorial.ipynb)

Leveraging the world's largest multilingual corpus, HanLP 2.1 supports 10 joint tasks and a variety of single tasks on [130 languages](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/mtl.html#hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_MMINILMV2L6), including Simplified Chinese, Traditional Chinese, English, Japanese, Russian, French, and German. HanLP has pretrained dozens of models for more than ten tasks and keeps iterating its corpora and models:

<div align="center">

| Feature | RESTful | Multi-task | Single-task | Model | Annotation Standard |
| --- | --- | --- | --- | --- | --- |
| [Tokenization](https://hanlp.hankcs.com/demos/tok.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/tok_restful.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/tok_mtl.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/tok_stl.ipynb) | [tok](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/tok.html) | [Coarse](https://hanlp.hankcs.com/docs/annotations/tok/msr.html), [Fine](https://hanlp.hankcs.com/docs/annotations/tok/ctb.html) |
| [Part-of-Speech Tagging](https://hanlp.hankcs.com/demos/pos.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/pos_restful.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/pos_mtl.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/pos_stl.ipynb) | [pos](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/pos.html) | [CTB](https://hanlp.hankcs.com/docs/annotations/pos/ctb.html), [PKU](https://hanlp.hankcs.com/docs/annotations/pos/pku.html), [863](https://hanlp.hankcs.com/docs/annotations/pos/863.html) |
| [Named Entity Recognition](https://hanlp.hankcs.com/demos/ner.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/ner_restful.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/ner_mtl.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/ner_stl.ipynb) | [ner](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/ner.html) | [PKU](https://hanlp.hankcs.com/docs/annotations/ner/pku.html), [MSRA](https://hanlp.hankcs.com/docs/annotations/ner/msra.html), [OntoNotes](https://hanlp.hankcs.com/docs/annotations/ner/ontonotes.html) |
| [Dependency Parsing](https://hanlp.hankcs.com/demos/dep.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/dep_restful.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/dep_mtl.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/dep_stl.ipynb) | [dep](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/dep.html) | [SD](https://hanlp.hankcs.com/docs/annotations/dep/sd_zh.html), [UD](https://hanlp.hankcs.com/docs/annotations/dep/ud.html#chinese), [PMT](https://hanlp.hankcs.com/docs/annotations/dep/pmt.html) |
| [Constituency Parsing](https://hanlp.hankcs.com/demos/con.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/con_restful.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/con_mtl.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/con_stl.ipynb) | [con](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/constituency.html) | [Chinese Tree Bank](https://hanlp.hankcs.com/docs/annotations/constituency/ctb.html) |
| [Semantic Dependency Parsing](https://hanlp.hankcs.com/demos/sdp.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/sdp_restful.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/sdp_mtl.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/sdp_stl.ipynb) | [sdp](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/sdp.html) | [CSDP](https://hanlp.hankcs.com/docs/annotations/sdp/semeval16.html#) |
| [Semantic Role Labeling](https://hanlp.hankcs.com/demos/srl.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/srl_restful.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/srl_mtl.ipynb) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/srl_stl.ipynb) | [srl](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/srl.html) | [Chinese Proposition Bank](https://hanlp.hankcs.com/docs/annotations/srl/cpb.html) |
| [Abstract Meaning Representation](https://hanlp.hankcs.com/demos/amr.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/amr_restful.ipynb) | None | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/amr_stl.ipynb) | [amr](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/amr.html) | [CAMR](https://www.hankcs.com/nlp/corpus/introduction-to-chinese-abstract-meaning-representation.html) |
| [Coreference Resolution](https://hanlp.hankcs.com/demos/cor.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/cor_restful.ipynb) | None | None | None | OntoNotes |
| [Semantic Textual Similarity](https://hanlp.hankcs.com/demos/sts.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/sts_restful.ipynb) | None | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/sts_stl.ipynb) | [sts](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/sts.html) | None |
| [Text Style Transfer](https://hanlp.hankcs.com/demos/tst.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/tst_restful.ipynb) | None | None | None | None |
| [Keyphrase Extraction](https://hanlp.hankcs.com/demos/keyphrase.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/keyphrase_restful.ipynb) | None | None | None | None |
| [Extractive Summarization](https://hanlp.hankcs.com/demos/exsum.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/extractive_summarization_restful.ipynb) | None | None | None | None |
| [Abstractive Summarization](https://hanlp.hankcs.com/demos/absum.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/abstractive_summarization_restful.ipynb) | None | None | None | None |
| [Grammatical Error Correction](https://hanlp.hankcs.com/demos/gec.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/gec_restful.ipynb) | None | None | None | None |
| [Text Classification](https://hanlp.hankcs.com/demos/classification.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/classification_restful.ipynb) | None | None | None | None |
| [Sentiment Analysis](https://hanlp.hankcs.com/demos/sentiment.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/sentiment_restful.ipynb) | None | None | None | `[-1,+1]` |
| [Language Identification](https://hanlp.hankcs.com/demos/classification.html) | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/lid_restful.ipynb) | None | [Tutorial](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/lid_stl.ipynb) | None | [ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) |

</div>

- For lemmatization and morphological feature extraction, see the [English tutorial](https://hanlp.hankcs.com/docs/tutorial.html); for [word embeddings](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/word2vec.html) and [masked language modeling (cloze)](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/mlm.html), see the corresponding documentation.
- For Simplified-Traditional Chinese conversion, pinyin, new word discovery, and text clustering, see the [1.x tutorial](https://github.com/hankcs/HanLP/tree/1.x).

HanLP offers two tailor-made APIs, **RESTful** and **native**, targeting lightweight and massive-scale scenarios respectively. Whatever the API and whatever the programming language, HanLP's interfaces stay semantically consistent and its code stays open source. If you use HanLP in your research, please cite our [EMNLP paper](https://aclanthology.org/2021.emnlp-main.451/).

### Lightweight RESTful API

Only a few KB in size, suitable for agile development, mobile apps, and similar scenarios. It is easy to use, needs no GPU or environment setup, and installs in seconds. Backed by larger corpora, larger models, and higher accuracy, it is **strongly recommended**. Since server GPU capacity is limited and anonymous users get only a small quota, [applying for a **free** API key `auth` is recommended](https://bbs.hanlp.com/t/hanlp2-1-restful-api/53).

#### Python

```shell
pip install hanlp_restful
```

Create a client with the server address and your API key:

```python
from hanlp_restful import HanLPClient
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='zh') # auth=None for anonymous access; language: zh for Chinese, mul for multilingual
```

#### Golang

Install with `go get -u github.com/hankcs/gohanlp@main`, then create a client with the server address and your API key:

```go
HanLP := hanlp.HanLPClient(hanlp.WithAuth(""), hanlp.WithLanguage("zh")) // empty auth for anonymous access; language: zh for Chinese, mul for multilingual
```

#### Java

Add the dependency to `pom.xml`:

```xml
<dependency>
    <groupId>com.hankcs.hanlp.restful</groupId>
    <artifactId>hanlp-restful</artifactId>
    <version>0.0.12</version>
</dependency>
```

Create a client with the server address and your API key:

```java
HanLPClient HanLP = new HanLPClient("https://www.hanlp.com/api", null, "zh"); // pass null as auth for anonymous access; "zh" for Chinese, "mul" for multilingual
```

#### Quick Start

Whatever the programming language, call the `parse` interface with a document to get HanLP's accurate analysis:

```java
HanLP.parse("2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。")
```

For more features such as semantic textual similarity, text style transfer, and coreference resolution, see the [documentation](https://hanlp.hankcs.com/docs/api/restful.html) and [test cases](https://github.com/hankcs/HanLP/blob/master/plugins/hanlp_restful/tests/test_client.py).

### Large-scale native API

Built on deep learning frameworks such as PyTorch and TensorFlow, the native API targets **professional** NLP engineers, researchers, and local processing of massive data. It requires Python 3.6 to 3.10; Windows is supported, but *nix is recommended. It runs on CPU, though GPU/TPU is recommended. Install the PyTorch flavor:

```bash
pip install hanlp
```

- Every HanLP release passes [unit tests](https://github.com/hankcs/HanLP/actions?query=branch%3Amaster) on Python 3.6 to 3.10 across Linux, macOS, and Windows, so installation just works.

HanLP ships two kinds of models: multi-task models, which are faster and use less GPU memory, and single-task models, which are more accurate and more flexible.

#### Multi-task Models

The HanLP workflow is to load a model and then call it like a function, e.g. the following joint multi-task model:

```python
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.CLOSE_TOK_POS_NER_SRL_DEP_SDP_CON_ELECTRA_SMALL_ZH) # trained on the world's largest Chinese corpora
HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。'])
```

The native API takes sentences as input, so documents need to be split first with the [multilingual sentence segmentation model](https://github.com/hankcs/HanLP/blob/master/plugins/hanlp_demo/hanlp_demo/sent_split.py) or the [rule-based sentence splitting function](https://github.com/hankcs/HanLP/blob/master/hanlp/utils/rules.py#L19). The RESTful and native APIs are semantically identical by design, so users can swap between them seamlessly. The concise interface also supports flexible arguments; common techniques include:

- Flexible `tasks` scheduling: the fewer the tasks, the faster the inference; see the [tutorial](https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Ftutorial.ipynb). In memory-constrained settings, users can also [remove unneeded tasks](https://github.com/hankcs/HanLP/blob/master/plugins/hanlp_demo/hanlp_demo/zh/demo_del_tasks.py) to slim down the model.
- An efficient trie-based custom dictionary with three kinds of rules: forcing, merging, and correction; see the [demo](https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/tok_mtl.ipynb) and [documentation](https://hanlp.hankcs.com/docs/api/hanlp/components/tokenizers/transformer.html). The effect of the rule system propagates seamlessly to the downstream statistical models, enabling fast adaptation to new domains.
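The trie-based forcing rule above can be illustrated with a toy longest-match sketch. This is a self-contained simplification of the concept only, not HanLP's actual implementation; the helper names are made up for this example:

```python
def build_trie(words):
    """Build a nested-dict trie; '$' marks the end of a dictionary word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = w
    return root

def forced_spans(text, trie):
    """Return longest non-overlapping dictionary matches as (begin, end) spans."""
    spans, i = [], 0
    while i < len(text):
        node, longest = trie, None
        for j in range(i, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            if '$' in node:
                longest = j + 1  # remember the longest match ending here
        if longest:
            spans.append((i, longest))
            i = longest          # skip past the forced span
        else:
            i += 1
    return spans

trie = build_trie(['立方庭', '自然语义科技公司'])
print(forced_spans('阿婆主来到北京立方庭参观自然语义科技公司。', trie))
# → [(7, 10), (12, 20)]
```

In the real tokenizer, such forced spans override the model's segmentation inside them, while the statistical model still segments the rest of the sentence.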

#### Single-task Models

According to our [latest research](https://aclanthology.org/2021.emnlp-main.451), multi-task learning wins on speed and GPU memory, but its accuracy often falls short of single-task models. HanLP therefore pretrains many single-task models and offers an elegant [pipeline API](https://hanlp.hankcs.com/docs/api/hanlp/components/pipeline.html#hanlp.components.pipeline.Pipeline) to assemble them:

```python
import hanlp
HanLP = hanlp.pipeline() \
    .append(hanlp.utils.rules.split_sentence, output_key='sentences') \
    .append(hanlp.load('FINE_ELECTRA_SMALL_ZH'), output_key='tok') \
    .append(hanlp.load('CTB9_POS_ELECTRA_SMALL'), output_key='pos') \
    .append(hanlp.load('MSRA_NER_ELECTRA_SMALL_ZH'), output_key='ner', input_key='tok') \
    .append(hanlp.load('CTB9_DEP_ELECTRA_SMALL', conll=0), output_key='dep', input_key='tok')\
    .append(hanlp.load('CTB9_CON_ELECTRA_SMALL'), output_key='con', input_key='tok')
HanLP('2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。阿婆主来到北京立方庭参观自然语义科技公司。')
```

See the [demo](https://github.com/hankcs/HanLP/tree/doc-zh/plugins/hanlp_demo/hanlp_demo/zh) and [documentation](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/index.html) for more models and usage.

### Output Format

Regardless of API, programming language, or natural language, HanLP's output is uniformly a `json`-serializable, `dict`-compatible [`Document`](https://hanlp.hankcs.com/docs/api/common/document.html):

```json
{
  "tok/fine": [
    ["2021年", "HanLPv2.1", "为", "生产", "环境", "带来", "次", "世代", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"],
    ["阿婆主", "来到", "北京", "立方庭", "参观", "自然", "语义", "科技", "公司", "。"]
  ],
  "tok/coarse": [
    ["2021年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进", "的", "多语种", "NLP", "技术", "。"],
    ["阿婆主", "来到", "北京立方庭", "参观", "自然语义科技公司", "。"]
  ],
  "pos/ctb": [
    ["NT", "NR", "P", "NN", "NN", "VV", "JJ", "NN", "AD", "JJ", "DEG", "CD", "NN", "NR", "NN", "PU"],
    ["NN", "VV", "NR", "NR", "VV", "NN", "NN", "NN", "NN", "PU"]
  ],
  "pos/pku": [
    ["t", "nx", "p", "vn", "n", "v", "b", "n", "d", "a", "u", "a", "n", "nx", "n", "w"],
    ["n", "v", "ns", "ns", "v", "n", "n", "n", "n", "w"]
  ],
  "pos/863": [
    ["nt", "w", "p", "v", "n", "v", "a", "nt", "d", "a", "u", "a", "n", "ws", "n", "w"],
    ["n", "v", "ns", "n", "v", "n", "n", "n", "n", "w"]
  ],
  "ner/pku": [
    [],
    [["北京立方庭", "ns", 2, 4], ["自然语义科技公司", "nt", 5, 9]]
  ],
  "ner/msra": [
    [["2021年", "DATE", 0, 1], ["HanLPv2.1", "ORGANIZATION", 1, 2]],
    [["北京", "LOCATION", 2, 3], ["立方庭", "LOCATION", 3, 4], ["自然语义科技公司", "ORGANIZATION", 5, 9]]
  ],
  "ner/ontonotes": [
    [["2021年", "DATE", 0, 1], ["HanLPv2.1", "ORG", 1, 2]],
    [["北京立方庭", "FAC", 2, 4], ["自然语义科技公司", "ORG", 5, 9]]
  ],
  "srl": [
    [[["2021年", "ARGM-TMP", 0, 1], ["HanLPv2.1", "ARG0", 1, 2], ["为生产环境", "ARG2", 2, 5], ["带来", "PRED", 5, 6], ["次世代最先进的多语种NLP技术", "ARG1", 6, 15]], [["最", "ARGM-ADV", 8, 9], ["先进", "PRED", 9, 10], ["技术", "ARG0", 14, 15]]],
    [[["阿婆主", "ARG0", 0, 1], ["来到", "PRED", 1, 2], ["北京立方庭", "ARG1", 2, 4]], [["阿婆主", "ARG0", 0, 1], ["参观", "PRED", 4, 5], ["自然语义科技公司", "ARG1", 5, 9]]]
  ],
  "dep": [
    [[6, "tmod"], [6, "nsubj"], [6, "prep"], [5, "nn"], [3, "pobj"], [0, "root"], [8, "amod"], [15, "nn"], [10, "advmod"], [15, "rcmod"], [10, "assm"], [13, "nummod"], [15, "nn"], [15, "nn"], [6, "dobj"], [6, "punct"]],
    [[2, "nsubj"], [0, "root"], [4, "nn"], [2, "dobj"], [2, "conj"], [9, "nn"], [9, "nn"], [9, "nn"], [5, "dobj"], [2, "punct"]]
  ],
  "sdp": [
    [[[6, "Time"]], [[6, "Exp"]], [[5, "mPrep"]], [[5, "Desc"]], [[6, "Datv"]], [[13, "dDesc"]], [[0, "Root"], [8, "Desc"], [13, "Desc"]], [[15, "Time"]], [[10, "mDegr"]], [[15, "Desc"]], [[10, "mAux"]], [[8, "Quan"], [13, "Quan"]], [[15, "Desc"]], [[15, "Nmod"]], [[6, "Pat"]], [[6, "mPunc"]]],
    [[[2, "Agt"], [5, "Agt"]], [[0, "Root"]], [[4, "Loc"]], [[2, "Lfin"]], [[2, "ePurp"]], [[8, "Nmod"]], [[9, "Nmod"]], [[9, "Nmod"]], [[5, "Datv"]], [[5, "mPunc"]]]
  ],
  "con": [
    ["TOP", [["IP", [["NP", [["NT", ["2021年"]]]], ["NP", [["NR", ["HanLPv2.1"]]]], ["VP", [["PP", [["P", ["为"]], ["NP", [["NN", ["生产"]], ["NN", ["环境"]]]]]], ["VP", [["VV", ["带来"]], ["NP", [["ADJP", [["NP", [["ADJP", [["JJ", ["次"]]]], ["NP", [["NN", ["世代"]]]]]], ["ADVP", [["AD", ["最"]]]], ["VP", [["JJ", ["先进"]]]]]], ["DEG", ["的"]], ["NP", [["QP", [["CD", ["多"]]]], ["NP", [["NN", ["语种"]]]]]], ["NP", [["NR", ["NLP"]], ["NN", ["技术"]]]]]]]]]], ["PU", ["。"]]]]]],
    ["TOP", [["IP", [["NP", [["NN", ["阿婆主"]]]], ["VP", [["VP", [["VV", ["来到"]], ["NP", [["NR", ["北京"]], ["NR", ["立方庭"]]]]]], ["VP", [["VV", ["参观"]], ["NP", [["NN", ["自然"]], ["NN", ["语义"]], ["NN", ["科技"]], ["NN", ["公司"]]]]]]]], ["PU", ["。"]]]]]]
  ]
}
```
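Because the structure above is a plain dict, downstream code can consume it without HanLP installed. A minimal sketch on a hand-copied fragment of the output: each NER entry is `[entity, type, begin, end]`, where `begin` and `end` are token-level, end-exclusive offsets into the corresponding token list.

```python
# A hand-copied fragment of the Document shown above (second sentence only).
doc = {
    "tok/fine": [["阿婆主", "来到", "北京", "立方庭", "参观", "自然", "语义", "科技", "公司", "。"]],
    "ner/msra": [[["北京", "LOCATION", 2, 3], ["立方庭", "LOCATION", 3, 4],
                  ["自然语义科技公司", "ORGANIZATION", 5, 9]]],
}

for tokens, entities in zip(doc["tok/fine"], doc["ner/msra"]):
    for entity, label, begin, end in entities:
        # Joining the token slice recovers the entity's surface form.
        assert "".join(tokens[begin:end]) == entity
        print(f"{label}: {entity}")
```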

In particular, the Python RESTful and native APIs support monospaced-font [visualization](https://hanlp.hankcs.com/docs/tutorial.html#visualization), rendering linguistic structures directly in the console:

```python
HanLP(['2021年HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。', '阿婆主来到北京立方庭参观自然语义科技公司。']).pretty_print()

Dep Tree    	Token    	Relati	PoS	Tok      	NER Type        	Tok      	SRL PA1     	Tok      	SRL PA2     	Tok      	PoS    3       4       5       6       7       8       9 
────────────	─────────	──────	───	─────────	────────────────	─────────	────────────	─────────	────────────	─────────	─────────────────────────────────────────────────────────
 ┌─────────►	2021年    	tmod  	NT 	2021年    	───►DATE        	2021年    	───►ARGM-TMP	2021年    	            	2021年    	NT ───────────────────────────────────────────►NP ───┐   
 │┌────────►	HanLPv2.1	nsubj 	NR 	HanLPv2.1	───►ORGANIZATION	HanLPv2.1	───►ARG0    	HanLPv2.1	            	HanLPv2.1	NR ───────────────────────────────────────────►NP────┤   
 ││┌─►┌─────	为        	prep  	P  	为        	                	为        	◄─┐         	为        	            	为        	P ───────────┐                                       │   
 │││  │  ┌─►	生产       	nn    	NN 	生产       	                	生产       	  ├►ARG2    	生产       	            	生产       	NN ──┐       ├────────────────────────►PP ───┐       │   
 │││  └─►└──	环境       	pobj  	NN 	环境       	                	环境       	◄─┘         	环境       	            	环境       	NN ──┴►NP ───┘                               │       │   
┌┼┴┴────────	带来       	root  	VV 	带来       	                	带来       	╟──►PRED    	带来       	            	带来       	VV ──────────────────────────────────┐       │       │   
││       ┌─►	次        	amod  	JJ 	次        	                	次        	◄─┐         	次        	            	次        	JJ ───►ADJP──┐                       │       ├►VP────┤   
││  ┌───►└──	世代       	nn    	NN 	世代       	                	世代       	  │         	世代       	            	世代       	NN ───►NP ───┴►NP ───┐               │       │       │   
││  │    ┌─►	最        	advmod	AD 	最        	                	最        	  │         	最        	───►ARGM-ADV	最        	AD ───────────►ADVP──┼►ADJP──┐       ├►VP ───┘       ├►IP
││  │┌──►├──	先进       	rcmod 	JJ 	先进       	                	先进       	  │         	先进       	╟──►PRED    	先进       	JJ ───────────►VP ───┘       │       │               │   
││  ││   └─►	的        	assm  	DEG	的        	                	的        	  ├►ARG1    	的        	            	的        	DEG──────────────────────────┤       │               │   
││  ││   ┌─►	多        	nummod	CD 	多        	                	多        	  │         	多        	            	多        	CD ───►QP ───┐               ├►NP ───┘               │   
││  ││┌─►└──	语种       	nn    	NN 	语种       	                	语种       	  │         	语种       	            	语种       	NN ───►NP ───┴────────►NP────┤                       │   
││  │││  ┌─►	NLP      	nn    	NR 	NLP      	                	NLP      	  │         	NLP      	            	NLP      	NR ──┐                       │                       │   
│└─►└┴┴──┴──	技术       	dobj  	NN 	技术       	                	技术       	◄─┘         	技术       	───►ARG0    	技术       	NN ──┴────────────────►NP ───┘                       │   
└──────────►	。        	punct 	PU 	。        	                	。        	            	。        	            	。        	PU ──────────────────────────────────────────────────┘   

Dep Tree    	Tok	Relat	Po	Tok	NER Type        	Tok	SRL PA1 	Tok	SRL PA2 	Tok	Po    3       4       5       6 
────────────	───	─────	──	───	────────────────	───	────────	───	────────	───	────────────────────────────────
         ┌─►	阿婆主	nsubj	NN	阿婆主	                	阿婆主	───►ARG0	阿婆主	───►ARG0	阿婆主	NN───────────────────►NP ───┐   
┌┬────┬──┴──	来到 	root 	VV	来到 	                	来到 	╟──►PRED	来到 	        	来到 	VV──────────┐               │   
││    │  ┌─►	北京 	nn   	NR	北京 	───►LOCATION    	北京 	◄─┐     	北京 	        	北京 	NR──┐       ├►VP ───┐       │   
││    └─►└──	立方庭	dobj 	NR	立方庭	───►LOCATION    	立方庭	◄─┴►ARG1	立方庭	        	立方庭	NR──┴►NP ───┘       │       │   
│└─►┌───────	参观 	conj 	VV	参观 	                	参观 	        	参观 	╟──►PRED	参观 	VV──────────┐       ├►VP────┤   
│   │  ┌───►	自然 	nn   	NN	自然 	◄─┐             	自然 	        	自然 	◄─┐     	自然 	NN──┐       │       │       ├►IP
│   │  │┌──►	语义 	nn   	NN	语义 	  │             	语义 	        	语义 	  │     	语义 	NN  │       ├►VP ───┘       │   
│   │  ││┌─►	科技 	nn   	NN	科技 	  ├►ORGANIZATION	科技 	        	科技 	  ├►ARG1	科技 	NN  ├►NP ───┘               │   
│   └─►└┴┴──	公司 	dobj 	NN	公司 	◄─┘             	公司 	        	公司 	◄─┘     	公司 	NN──┘                       │   
└──────────►	。  	punct	PU	。  	                	。  	        	。  	        	。  	PU──────────────────────────┘   
```

For the meaning of each tag set, see the [Annotation Guidelines](https://hanlp.hankcs.com/docs/annotations/index.html) and the [Data Format](https://hanlp.hankcs.com/docs/data_format.html). We purchased, annotated, or adopted the world's largest and most diverse corpora for joint multilingual multi-task learning, so HanLP's tag sets also offer the broadest coverage.

## Train Your Own Domain Models

Writing a deep learning model is not hard; reproducing a high accuracy is. The following [code](https://github.com/hankcs/HanLP/blob/master/plugins/hanlp_demo/hanlp_demo/zh/train_sota_bert_pku.py) shows how to train, in 6 minutes, a Chinese word segmentation model on the sighan2005 PKU corpus that surpasses the academic state of the art:

```python
tokenizer = TransformerTaggingTokenizer()
save_dir = 'data/model/cws/sighan2005_pku_bert_base_96.73'
tokenizer.fit(
    SIGHAN2005_PKU_TRAIN_ALL,
    SIGHAN2005_PKU_TEST,  # Conventionally, no devset is used. See Tian et al. (2020).
    save_dir,
    'bert-base-chinese',
    max_seq_len=300,
    char_level=True,
    hard_constraint=True,
    sampler_builder=SortingSamplerBuilder(batch_size=32),
    epochs=3,
    adam_epsilon=1e-6,
    warmup_steps=0.1,
    weight_decay=0.01,
    word_dropout=0.1,
    seed=1660853059,
)
tokenizer.evaluate(SIGHAN2005_PKU_TEST, save_dir)
```

Because the random seed is fixed, the result is guaranteed to be exactly `96.73`. Unlike academic papers or commercial projects with inflated claims, HanLP guarantees that every reported score is reproducible. If you have any doubt, we will treat it as a critical bug of the highest priority and investigate immediately.
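The guarantee rests on deterministic seeding: the same seed replays the same stream of pseudo-random numbers, so every initialization and shuffle repeats exactly. A generic illustration with Python's standard `random` module, not HanLP-specific code:

```python
import random

def noisy_score(seed):
    """Simulate a training run whose result depends on random initialization."""
    random.seed(seed)  # fix the PRNG state before any randomness is consumed
    return round(96 + random.random(), 2)

# The same seed yields the same "score" on every run.
assert noisy_score(1660853059) == noisy_score(1660853059)
```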

See the [demo](https://github.com/hankcs/HanLP/tree/master/plugins/hanlp_demo/hanlp_demo/zh/train) for more training scripts.

## Performance

<table><thead><tr><th rowspan="2">lang</th><th rowspan="2">corpora</th><th rowspan="2">model</th><th colspan="2">tok</th><th colspan="4">pos</th><th colspan="3">ner</th><th rowspan="2">dep</th><th rowspan="2">con</th><th rowspan="2">srl</th><th colspan="4">sdp</th><th rowspan="2">lem</th><th rowspan="2">fea</th><th rowspan="2">amr</th></tr><tr><th>fine</th><th>coarse</th><th>ctb</th><th>pku</th><th>863</th><th>ud</th><th>pku</th><th>msra</th><th>ontonotes</th><th>SemEval16</th><th>DM</th><th>PAS</th><th>PSD</th></tr></thead><tbody><tr><td rowspan="2">mul</td><td rowspan="2">UD2.7<br>OntoNotes5</td><td>small</td><td>98.62</td><td>-</td><td>-</td><td>-</td><td>-</td><td>93.23</td><td>-</td><td>-</td><td>74.42</td><td>79.10</td><td>76.85</td><td>70.63</td><td>-</td><td>91.19</td><td>93.67</td><td>85.34</td><td>87.71</td><td>84.51</td><td>-</td></tr><tr><td>base</td><td>98.97</td><td>-</td><td>-</td><td>-</td><td>-</td><td>90.32</td><td>-</td><td>-</td><td>80.32</td><td>78.74</td><td>71.23</td><td>73.63</td><td>-</td><td>92.60</td><td>96.04</td><td>81.19</td><td>85.08</td><td>82.13</td><td>-</td></tr><tr><td rowspan="5">zh</td><td rowspan="2">open</td><td>small</td><td>97.25</td><td>-</td><td>96.66</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>95.00</td><td>84.57</td><td>87.62</td><td>73.40</td><td>84.57</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr><tr><td>base</td><td>97.50</td><td>-</td><td>97.07</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>96.04</td><td>87.11</td><td>89.84</td><td>77.78</td><td>87.11</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr><tr><td 
rowspan="3">close</td><td>small</td><td>96.70</td><td>95.93</td><td>96.87</td><td>97.56</td><td>95.05</td><td>-</td><td>96.22</td><td>95.74</td><td>76.79</td><td>84.44</td><td>88.13</td><td>75.81</td><td>74.28</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr><tr><td>base</td><td>97.52</td><td>96.44</td><td>96.99</td><td>97.59</td><td>95.29</td><td>-</td><td>96.48</td><td>95.72</td><td>77.77</td><td>85.29</td><td>88.57</td><td>76.52</td><td>73.76</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr><tr><td>ernie</td><td>96.95</td><td>97.29</td><td>96.76</td><td>97.64</td><td>95.22</td><td>-</td><td>97.31</td><td>96.47</td><td>77.95</td><td>85.67</td><td>89.17</td><td>78.51</td><td>74.10</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td><td>-</td></tr></tbody></table>

- According to our [latest research](https://aclanthology.org/2021.emnlp-main.451), single-task learning often outperforms multi-task learning in accuracy. If you care about accuracy more than speed, consider the [single-task models](https://hanlp.hankcs.com/docs/api/hanlp/pretrained/index.html).

HanLP's data preprocessing and splits do not always follow popular practice. For example, HanLP uses the [full MSRA NER corpus](https://bbs.hankcs.com/t/topic/3033) rather than the commonly used truncated version; it adopts the grammatically broader [Stanford Dependencies standard](https://hanlp.hankcs.com/docs/annotations/dep/sd_zh.html) instead of the Zhang and Clark (2008) convention prevailing in academia; and it proposes an [even split of CTB](https://bbs.hankcs.com/t/topic/3024) instead of the academic split, which is uneven and omits 51 gold files. HanLP open-sources a [full set of corpus preprocessing scripts along with the resulting corpora](https://github.com/hankcs/HanLP/blob/master/plugins/hanlp_demo/hanlp_demo/zh/train/open_small.py) in an effort to make Chinese NLP more transparent.

In short, HanLP does what we believe is correct and state of the art, which is not necessarily what is popular or canonical.

## Citation

If you use HanLP in your research, please cite it as follows:

```bibtex
@inproceedings{he-choi-2021-stem,
    title = "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders",
    author = "He, Han and Choi, Jinho D.",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.451",
    pages = "5555--5577",
    abstract = "Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.",
}
```

## License

### Source Code

HanLP's source code is licensed under the **Apache License 2.0** and is free for commercial use. Please include a link to HanLP and its license in your product documentation. HanLP is protected by copyright law; infringement will be prosecuted.

##### 自然语义(青岛)科技有限公司

Since v1.7, HanLP has operated independently, with 自然语义(青岛)科技有限公司 (Natural Semantics (Qingdao) Technology Co., Ltd.) as the project owner, leading the development of subsequent versions and holding their copyright.

##### 上海林原公司

HanLP received strong support in its early days from 上海林原公司 (Shanghai Linyuan), which holds the copyright of version 1.28 and earlier; those versions were also released on its website.

### Pretrained Models

The legal status of machine learning model licensing is unsettled, but in the spirit of respecting the original licenses of the open-source corpora, unless otherwise stated, HanLP's multilingual models are licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/), and its Chinese models are licensed for research and educational use only.

## References

https://hanlp.hankcs.com/docs/references.html



================================================
FILE: docs/Makefile
================================================
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = .
BUILDDIR      = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


================================================
FILE: docs/annotations/constituency/ctb.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Chinese Tree Bank

See also [The Bracketing Guidelines for the Penn Chinese Treebank (3.0)](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1040&context=ircs_reports).

| Tag  | Definition                                   | Notes                                                        | Examples           |
|------|----------------------------------------------|--------------------------------------------------------------|--------------------|
| ADJP | adjective phrase                             | headed by an adjective                                       | 不完全、大型            |
| ADVP | adverbial phrase headed by AD (adverb)       | headed by an adverb                                          | 非常、很              |
| CLP  | classifier phrase                            | formed by a classifier                                       | 系列、大批             |
| CP   | clause headed by C (complementizer)          | usually carries a complementizer such as 的 or 吗             | 张三喜欢李四吗?          |
| DNP  | phrase formed by ‘‘XP + DEG’’                | XP + DEG (的), where XP can be ADJP, DP, QP, PP, etc.; modifies a noun phrase | 大型的、前几年的、五年的、在上海的 |
| DP   | determiner phrase                            | usually formed by a determiner plus a quantity expression    | 这三个、任何            |
| DVP  | phrase formed by ‘‘XP + DEV’’                | XP + 地, modifying a verb phrase (VP)                        | 心情失落地、大批地         |
| FRAG | fragment                                     | fragment                                                     | (完)               |
| INTJ | interjection                                 | interjection or exclamation                                  | 哈哈、切              |
| IP   | simple clause headed by I (INFL)             | simple clause or sentence, usually without a complementizer such as 的 or 吗 | 张三喜欢李四。           |
| LCP  | phrase formed by ‘‘XP + LC’’                 | XP + localizer (LC), expressing location                     | 生活中、田野上           |
| LST  | list marker                                  | list phrase, including punctuation                           | 一.                |
| MSP  | some particles                               | other particles                                              | 所、而、来、去           |
| NN   | common noun                                  | noun                                                         | HanLP、技术          |
| NP   | noun phrase                                  | usually headed by a noun                                     | 美好生活、经济水平         |
| PP   | preposition phrase                           | usually headed by a preposition                              | 在北京、据报道           |
| PRN  | parenthetical                                | parenthetical insertion                                      | ,(张三说),           |
| QP   | quantifier phrase                            | quantifier phrase                                            | 三个、五百辆            |
| TOP  | root node                                    | root node                                                    | 根节点               |
| UCP  | unidentical coordination phrase              | coordination whose conjuncts are of different phrase types   | (养老、医疗)保险         |
| VCD  | coordinated verb compound                    | compound verb                                                | 出版发行              |
| VCP  | verb compounds formed by VV + VC             | verb phrase of the form VV + VC                              | 看作是               |
| VNV  | verb compounds formed by A-not-A or A-one-A  | verb phrase of the form V-not-V                              | 能不能、信不信           |
| VP   | verb phrase                                  | usually headed by a verb                                     | 完成任务、努力工作         |
| VPT  | potential form V-de-R or V-bu-R              | verb phrase of the form V不R or V得R                          | 打不赢、打得过           |
| VRD  | verb resultative compound                    | verb + resultative complement                                | 研制成功、降下来          |
| VSB  | verb compounds formed by a modifier + a head | verb phrase formed by a modifier + a head                    | 拿来支付、仰头望去         |

================================================
FILE: docs/annotations/constituency/index.md
================================================
# Constituency Parsing

## Chinese
```{toctree}
ctb
```

## English
```{toctree}
ptb
```

## Japanese
```{toctree}
npcmj
```



================================================
FILE: docs/annotations/constituency/npcmj.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# NPCMJ

| Tag             | Description                             |
|-----------------|-----------------------------------------|
| ADVP            | adverb phrase                           |
| ADVP-CMPL       | complement adverb phrase                |
| ADVP-MSR        | measurement adverb phrase               |
| ADVP-PRD        | predicate adverb phrase                 |
| ADVP-TMP        | temporal adverb phrase                  |
| CONJP           | conjunction phrase                      |
| CP-EXL          | exclamative                             |
| CP-IMP          | imperative                              |
| CP-FINAL        | projection for sentence final particle  |
| CP-QUE          | question (direct or indirect)           |
| CP-QUE-ADV      | question used adverbially               |
| CP-QUE-OB1      | question used as object                 |
| CP-QUE-PRD      | question used as a nominal predicate    |
| CP-QUE-SBJ      | question used as subject                |
| CP-THT          | complementizer clause                   |
| CP-THT-ADV      | complementizer clause used adverbially  |
| CP-THT-OB1      | complementizer clause used as object    |
| CP-THT-PRD      | complementizer clause used as predicate |
| CP-THT-PRP      | purposive complementizer clause         |
| CP-THT-SBJ      | complementizer clause used as subject   |
| FRAG            | fragment                                |
| FS              | false start                             |
| INTJP           | interjection phrase                     |
| IP-ADV          | adverbial clause                        |
| IP-ADV-CONJ     | coordinated clause                      |
| IP-ADV-PRD      | adverbial clause used as predicate      |
| IP-ADV-SCON     | subordinate clause                      |
| IP-ADV-SCON-CND | conditional clause                      |
| IP-EMB          | gapless noun-modifying clause           |
| IP-IMP          | imperative clause                       |
| IP-MAT          | matrix clause                           |
| IP-NMZ          | nominalized clause                      |
| IP-NMZ-PRD      | nominalized clause used as predicate    |
| IP-REL          | relative clause                         |
| IP-SMC          | small clause                            |
| IP-SMC-CNT      | small clause in continuative form       |
| IP-SMC-OB1      | small clause used as object             |
| IP-SMC-SBJ      | small clause used as subject            |
| IP-SUB          | clause under CP* layer                  |
| multi-sentence  | multiple sentence                       |
| NML             | intermediate nominal layer              |
| NP              | noun phrase                             |
| NP-ADV          | adverbial noun phrase                   |
| NP-CZZ          | causee noun phrase                      |
| NP-DOB1         | derived primary object noun phrase      |
| NP-DSBJ         | derived subject noun phrase             |
| NP-LGS          | logical subject noun phrase             |
| NP-LOC          | locational noun phrase                  |
| NP-MSR          | measure noun phrase                     |
| NP-OB1          | primary object noun phrase              |
| NP-OB2          | secondary object noun phrase            |
| NP-POS          | possessive noun phrase                  |
| NP-PRD          | predicate noun phrase                   |
| NP-SBJ          | subject noun phrase                     |
| NP-SBJ2         | secondary subject noun phrase           |
| NP-TMP          | temporal noun phrase                    |
| NP-TPC          | topic noun phrase                       |
| NP-VOC          | vocative noun phrase                    |
| NUMCLP          | numeral-classifier phrase               |
| PNLP            | prenominal phrase                       |
| PP              | particle phrase                         |
| PP-ADV          | adverbial particle phrase               |
| PP-CMPL         | complement particle phrase              |
| PP-CONJ         | coordination particle phrase            |
| PP-CZZ          | causee particle phrase                  |
| PP-DOB1         | derived primary object particle phrase  |
| PP-DSBJ         | derived subject particle phrase         |
| PP-LGS          | logical subject particle phrase         |
| PP-LOC          | locational particle phrase              |
| PP-MSR          | measure particle phrase                 |
| PP-OB1          | primary object particle phrase          |
| PP-OB2          | secondary object particle phrase        |
| PP-PRD          | predicate particle phrase               |
| PP-PRP          | purpositive particle phrase             |
| PP-SBJ          | subject particle phrase                 |
| PP-SBJ2         | secondary subject particle phrase       |
| PP-SCON         | subordination particle phrase           |
| PP-SCON-CND     | conditional particle phrase             |
| PP-TMP          | temporal particle phrase                |
| PP-TPC          | topic particle phrase                   |
| PP-VOC          | vocative particle phrase                |
| PRN             | parenthetical                           |

================================================
FILE: docs/annotations/constituency/ptb.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Penn Treebank

| Tag    | Description                                                                                                                                                                                                         |
|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ADJP   | Adjective Phrase.                                                                                                                                                                                                   |
| ADVP   | Adverb Phrase.                                                                                                                                                                                                      |
| CONJP  | Conjunction Phrase.                                                                                                                                                                                                 |
| FRAG   | Fragment.                                                                                                                                                                                                           |
| INTJ   | Interjection. Corresponds approximately to the part-of-speech tag UH.                                                                                                                                               |
| LST    | List marker. Includes surrounding punctuation.                                                                                                                                                                      |
| NAC    | Not a Constituent; used to show the scope of certain prenominal modifiers within an NP.                                                                                                                             |
| NP     | Noun Phrase.                                                                                                                                                                                                        |
| NX     | Used within certain complex NPs to mark the head of the NP. Corresponds very roughly to N-bar level but used quite differently.                                                                                     |
| PP     | Prepositional Phrase.                                                                                                                                                                                               |
| PRN    | Parenthetical                                                                                                                                                                                                       |
| PRT    | Particle. Category for words that should be tagged RP.                                                                                                                                                              |
| QP     | Quantifier Phrase (i.e. complex measure/amount phrase); used within NP.                                                                                                                                             |
| ROOT   | Artificial root node placed on top of the tree by parsers; not part of the original PTB bracketing guidelines.                                                                                                      |
| RRC    | Reduced Relative Clause.                                                                                                                                                                                            |
| S      | Simple declarative clause, i.e. one that is not introduced by a (possibly empty) subordinating conjunction or a wh-word and that does not exhibit subject-verb inversion.                                            |
| SBAR   | Clause introduced by a (possibly empty) subordinating conjunction.                                                                                                                                                  |
| SBARQ  | Direct question introduced by a wh-word or a wh-phrase. Indirect questions and relative clauses should be bracketed as SBAR, not SBARQ.                                                                             |
| SINV   | Inverted declarative sentence, i.e. one in which the subject follows the tensed verb or modal.                                                                                                                      |
| SQ     | Inverted yes/no question, or main clause of a wh-question, following the wh-phrase in SBARQ.                                                                                                                        |
| UCP    | Unlike Coordinated Phrase.                                                                                                                                                                                          |
| VP     | Verb Phrase.                                                                                                                                                                                                       |
| WHADJP | Wh-adjective Phrase. Adjectival phrase containing a wh-adverb, as in how hot.                                                                                                                                       |
| WHADVP | Wh-adverb Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing a wh-adverb such as how or why.                                                          |
| WHNP   | Wh-noun Phrase. Introduces a clause with an NP gap. May be null (containing the 0 complementizer) or lexical, containing some wh-word, e.g. who, which book, whose daughter, none of which, or how many leopards.   |
| WHPP   | Wh-prepositional Phrase. Prepositional phrase containing a wh-noun phrase (such as of which or by whose authority) that either introduces a PP gap or is contained by a WHNP.                                       |
| X      | Unknown, uncertain, or unbracketable. X is often used for bracketing typos and in bracketing the…the-constructions.                                                                                                 |
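
To make the bracket notation behind these labels concrete, here is a minimal sketch (plain Python, no HanLP dependency; the helper names are our own) that reads a PTB-style bracketed tree and collects its constituent labels:

```python
import re

def parse_ptb(s):
    """Parse a PTB-style bracketed string into nested (label, children) tuples.
    Terminal words appear as plain strings among the children."""
    tokens = re.findall(r'\(|\)|[^()\s]+', s)
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == '('
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ')':
            if tokens[pos] == '(':
                children.append(parse())
            else:  # terminal word
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return parse()

def labels(tree):
    """Collect all node labels in pre-order, e.g. S, NP, VP and the POS tags."""
    label, children = tree
    out = [label]
    for c in children:
        if isinstance(c, tuple):
            out.extend(labels(c))
    return out

tree = parse_ptb("(S (NP (DT The) (NN dog)) (VP (VBZ barks)))")
print(labels(tree))  # ['S', 'NP', 'DT', 'NN', 'VP', 'VBZ']
```

Phrase labels from the table above (S, NP, VP) appear on internal nodes, while POS tags (DT, NN, VBZ) sit immediately above the words.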



================================================
FILE: docs/annotations/dep/index.md
================================================
# Dependency Parsing

## Chinese

```{toctree}
sd_zh
pmt
```

## English

```{toctree}
sd_en
```

## Multilingual

```{toctree}
ud
```


================================================
FILE: docs/annotations/dep/pmt.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# PKU Multi-view Chinese Treebank


```{eval-rst}

See also :cite:`qiu-etal-2014-multi`.
    
```

| Tag  | Description                                 | 依存关系       |
| ---- | ------------------------------------------- | -------------- |
| ACT  | action object                               | 行为宾语       |
| ADV  | adverbial                                   | 状语           |
| APP  | appositive element                          | 同位           |
| ATT  | attribute                                   | 定语           |
| CMP  | complement                                  | 补语           |
| COO  | other coordination element                  | 一般并列       |
| COS  | share-right-child coordination element      | 共享并列       |
| DE   | de (modifier of 的(special function word))  | 的字           |
| DEI  | dei (modifier of 得(special function word)) | 得字           |
| DI   | di (modifier of 地(special function word))  | 地字           |
| FOC  | focus                                       | 强调           |
| HED  | root of a sentence                          | 核心           |
| IC   | independent clause                          | 小句           |
| IOB  | indirect object                             | 间接宾语       |
| IS   | independent structure                       | 独立结构       |
| ISC  | non-shared independent structure            | 并列式独立结构 |
| LAD  | left additive                               | 前附加         |
| MT   | modality and time                           | 时体           |
| NUM  | number                                      | 数字           |
| POB  | prepositional object                        | 介宾           |
| PUN  | punctuation                                 | 标点           |
| PUS  | cross-clause punctuation                    | 跨句标点       |
| QUC  | post-positional quantity                    | 数量补语       |
| QUCC | non-shared post-positional quantity         | 非共享数量补语 |
| QUN  | quantity                                    | 数量           |
| RAD  | right additive                              | 后附加         |
| RADC | non-shared right additive                   | 非共享后附加   |
| RED  | reduplicate element                         | 重叠           |
| SBV  | subject                                     | 主语           |
| TPC  | topic                                       | 话题           |
| VOB  | direct object                               | 宾语           |
| VV   | serial verb construction                    | 连动           |


================================================
FILE: docs/annotations/dep/sd_en.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Stanford Dependencies English

See also [Stanford typed dependencies manual](https://nlp.stanford.edu/software/dependencies_manual.pdf).

| Tag        | Description                       |
|------------|-----------------------------------|
| abbrev     | abbreviation modifier             |
| acomp      | adjectival complement             |
| advcl      | adverbial clause modifier         |
| advmod     | adverbial modifier                |
| agent      | agent                             |
| amod       | adjectival modifier               |
| appos      | appositional modifier             |
| arg        | argument                          |
| attr       | attributive                       |
| aux        | auxiliary                         |
| auxpass    | passive auxiliary                 |
| cc         | coordination                      |
| ccomp      | clausal complement                |
| comp       | complement                        |
| complm     | complementizer                    |
| conj       | conjunct                          |
| cop        | copula                            |
| csubj      | clausal subject                   |
| csubjpass  | clausal passive subject           |
| dep        | dependent                         |
| det        | determiner                        |
| discourse  | discourse element                 |
| dobj       | direct object                     |
| expl       | expletive                         |
| goeswith   | goes with                         |
| iobj       | indirect object                   |
| mark       | marker                            |
| mod        | modifier                          |
| mwe        | multi-word expression             |
| neg        | negation modifier                 |
| nn         | noun compound modifier            |
| npadvmod   | noun phrase as adverbial modifier |
| nsubj      | nominal subject                   |
| nsubjpass  | passive nominal subject           |
| num        | numeric modifier                  |
| number     | element of compound number        |
| obj        | object                            |
| parataxis  | parataxis                         |
| pcomp      | prepositional complement          |
| pobj       | object of a preposition           |
| poss       | possession modifier               |
| possessive | possessive modifier               |
| preconj    | preconjunct                       |
| pred       | predicate                         |
| predet     | predeterminer                     |
| prep       | prepositional modifier            |
| prepc      | prepositional clausal modifier    |
| prt        | phrasal verb particle             |
| punct      | punctuation                       |
| purpcl     | purpose clause modifier           |
| quantmod   | quantifier phrase modifier        |
| rcmod      | relative clause modifier          |
| ref        | referent                          |
| rel        | relative                          |
| root       | root                              |
| sdep       | semantic dependent                |
| subj       | subject                           |
| tmod       | temporal modifier                 |
| vmod       | verb modifier                     |
| xcomp      | open clausal complement           |
| xsubj      | controlling subject               |

================================================
FILE: docs/annotations/dep/sd_zh.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Stanford Dependencies Chinese


```{eval-rst}

See also :cite:`chang-etal-2009-discriminative`.
    
```

|Tag|Description|中文简称|例句|依存弧|
| ---- | ---- | ---- | ---- | ---- |
|nn|noun compound modifier|复合名词修饰|服务中心|nn(中心,服务)|
|punct|punctuation|标点符号|海关统计表明,|punct(表明,,)|
|nsubj|nominal subject|名词性主语|梅花盛开|nsubj(盛开,梅花)|
|conj|conjunct (links two conjuncts)|连接性状语|设备和原材料|conj(原材料,设备)|
|dobj|direct object|直接宾语|浦东颁布了七十一件文件|dobj(颁布,文件)|
|advmod|adverbial modifier|副词性状语|部门先送上文件|advmod(送上,先)|
|prep|prepositional modifier|介词性修饰语|在实践中逐步完善|prep(完善,在)|
|nummod|number modifier|数词修饰语|七十一件文件|nummod(件,七十一)|
|amod|adjectival modifier|形容词修饰语|跨世纪工程|amod(工程,跨世纪)|
|pobj|prepositional object|介词性宾语|根据有关规定|pobj(根据,规定)|
|rcmod|relative clause modifier|关系从句修饰语|不曾遇到过的情况|rcmod(情况,遇到)|
|cpm|complementizer|补语|开发浦东的经济活动|cpm(开发,的)|
|assm|associative marker|关联标记|企业的商品|assm(企业,的)|
|assmod|associative modifier|关联修饰|企业的商品|assmod(商品,企业)|
|cc|coordinating conjunction|并列关系|设备和原材料|cc(原材料,和)|
|clf|classifier modifier|类别修饰|七十一件文件|clf(文件,件)|
|ccomp|clausal complement|从句补充|银行决定先取得信用评级|ccomp(决定,取得)|
|det|determiner|限定语|这些经济活动|det(活动,这些)|
|lobj|localizer object|范围宾语|近年来|lobj(来,近年)|
|range|dative object that is a quantifier phrase|数量词间接宾语|成交药品一亿多元|range(成交,元)|
|asp|aspect marker|时态标记|发挥了作用|asp(发挥,了)|
|tmod|temporal modifier|时间修饰语|以前不曾遇到过|tmod(遇到,以前)|
|plmod|localizer modifier of a preposition|介词性地点修饰|在这片热土上|plmod(在,上)|
|attr|attributive|属性|贸易额为二百亿美元|attr(为,美元)|
|mmod|modal verb modifier|情态动词|利益能得到保障|mmod(得到,能)|
|loc|localizer|位置补语|占九成以上|loc(占,以上)|
|top|topic|主题|建筑是主要活动|top(是,建筑)|
|pccomp|clausal complement of a preposition|介词补语|据有关部门介绍|pccomp(据,介绍)|
|etc|etc modifier|省略关系|科技、文教等领域|etc(文教,等)|
|lccomp|clausal complement of a localizer|位置补语|中国对外开放中升起的明星|lccomp(中,开放)|
|ordmod|ordinal number modifier|量词修饰|第七个机构|ordmod(个,第七)|
|xsubj|controlling subject|控制主语|银行决定先取得信用评级|xsubj(取得,银行)|
|neg|negative modifier|否定修饰|以前不曾遇到过|neg(遇到,不)|
|rcomp|resultative complement|结果补语|研究成功|rcomp(研究,成功)|
|comod|coordinated verb compound modifier|并列联合动词|颁布实行|comod(颁布,实行)|
|vmod|verb modifier|动词修饰|其在支持外商企业方面的作用|vmod(方面,支持)|
|prtmod|particles such as 所,以,来,而|小品词|在产业化所取得的成就|prtmod(取得,所)|
|ba|“ba” construction|把字关系|把注意力转向市场|ba(转向,把)|
|dvpm|manner DE(地)modifier|地字修饰|有效地防止流失|dvpm(有效,地)|
|dvpmod|a "XP+DEV", phrase that modifies VP|地字动词短语|有效地防止流失|dvpmod(防止,有效)|
|prnmod|parenthetical modifier|插入词修饰|八五期间(1990-1995)|prnmod(期间,1995)|
|cop|copular|系动词|原是自给自足的经济|cop(自给自足,是)|
|pass|passive marker|被动标记|被认定为高技术产业|pass(认定,被)|
|nsubjpass|nominal passive subject|被动名词主语|镍被称作现代工业的维生素|nsubjpass(称作,镍)|
|dep|dependent|其他依赖关系|新华社北京二月十二日电|dep(电,新华社)|
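
The 依存弧 column uses the conventional `relation(head, dependent)` notation. The following sketch (plain Python; the helper name and head-indexed input format are our own, not a HanLP API) renders a parsed sentence in that same notation:

```python
def dependency_arcs(tokens):
    """tokens: list of (word, head_index, relation) triples, head_index 1-based
    with 0 denoting the virtual ROOT. Returns arcs formatted as
    'relation(head,dependent)', matching the notation in the table above."""
    words = [w for w, _, _ in tokens]
    arcs = []
    for word, head, rel in tokens:
        head_word = 'ROOT' if head == 0 else words[head - 1]
        arcs.append(f'{rel}({head_word},{word})')
    return arcs

# 梅花盛开: 梅花 is the nominal subject of 盛开
print(dependency_arcs([('梅花', 2, 'nsubj'), ('盛开', 0, 'root')]))
# ['nsubj(盛开,梅花)', 'root(ROOT,盛开)']
```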


================================================
FILE: docs/annotations/dep/ud.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Universal Dependencies

## Cross-Linguistic

See also [Universal Dependencies](https://universaldependencies.org/docs/u/dep/index.html).

| Tag        | Description                                  |
|------------|----------------------------------------------|
| acl        | clausal modifier of noun (adjectival clause) |
| advcl      | adverbial clause modifier                    |
| advmod     | adverbial modifier                           |
| amod       | adjectival modifier                          |
| appos      | appositional modifier                        |
| aux        | auxiliary                                    |
| auxpass    | passive auxiliary                            |
| case       | case marking                                 |
| cc         | coordinating conjunction                     |
| ccomp      | clausal complement                           |
| compound   | compound                                     |
| conj       | conjunct                                     |
| cop        | copula                                       |
| csubj      | clausal subject                              |
| csubjpass  | clausal passive subject                      |
| dep        | unspecified dependency                       |
| det        | determiner                                   |
| discourse  | discourse element                            |
| dislocated | dislocated elements                          |
| dobj       | direct object                                |
| expl       | expletive                                    |
| foreign    | foreign words                                |
| goeswith   | goes with                                    |
| iobj       | indirect object                              |
| list       | list                                         |
| mark       | marker                                       |
| mwe        | multi-word expression                        |
| name       | name                                         |
| neg        | negation modifier                            |
| nmod       | nominal modifier                             |
| nsubj      | nominal subject                              |
| nsubjpass  | passive nominal subject                      |
| nummod     | numeric modifier                             |
| parataxis  | parataxis                                    |
| punct      | punctuation                                  |
| remnant    | remnant in ellipsis                          |
| reparandum | overridden disfluency                        |
| root       | root                                         |
| vocative   | vocative                                     |
| xcomp      | open clausal complement                      |


## Localization

### Chinese

| Tag              |       简称 |                                                         例句 |
| :--------------- |---------:| -----------------------------------------------------------: |
| acl              |    形容词子句 | ![acl](https://file.hankcs.com/img/ud/1303b5cbe9413044cb800b3c3514b70b.svg) |
| advcl:loc        |  状语从句修饰语 | ![advcl:loc](https://file.hankcs.com/img/ud/e8865563caf0eda7a80043eda8cc43a6.svg) |
| advmod           |       状语 | ![advmod](https://file.hankcs.com/img/ud/3ce9276f4e18d92edb48e58956bbaee7.svg) |
| advmod:dvp       |     状语:地 | ![advmod:dvp](https://file.hankcs.com/img/ud/e90870682b9f0a80736d25977565f96a.svg) |
| advmod:loc       |    状语:限定 | ![advmod:loc](https://file.hankcs.com/img/ud/135e9143e73e5f45290d204d4ad5b30e.svg) |
| advmod:rcomp     |    状语:因果 | ![advmod:rcomp](https://file.hankcs.com/img/ud/aa75be342648bed0846f54a88f71e7a7.svg) |
| amod             |       形容 | ![amod](https://file.hankcs.com/img/ud/dee0097c244c1bd0a1d1ed117932346d.svg) |
| amod:ordmod      |    形容:数量 | ![amod:ordmod](https://file.hankcs.com/img/ud/8bb79245311a4190836dce8439591e91.svg) |
| appos            |       同位 | ![appos](https://file.hankcs.com/img/ud/a74f6a31f68ba5697d0a8906e8476b47.svg) |
| aux:asp          |    助语:时态 | ![aux:asp](https://file.hankcs.com/img/ud/8c32de9b4858c0e4d24ee6da5fb80a6e.svg) |
| aux:ba           |     助语:把 | ![aux:ba](https://file.hankcs.com/img/ud/2c712e3af49fcdbd5914398895904f3c.svg) |
| aux:modal        |    助语:情态 | ![aux:modal](https://file.hankcs.com/img/ud/606946c569e4bfbacbb1b9e13336e247.svg) |
| aux:prtmod       |    助语:分词 | ![aux:prtmod](https://file.hankcs.com/img/ud/fc49d338487dd63687941433a0633f5d.svg) |
| auxpass          |       被动 | ![auxpass](https://file.hankcs.com/img/ud/a6e4a8aabb7bb1bb5c4e9cdf7876e3f7.svg) |
| case             |       条件 | ![case](https://file.hankcs.com/img/ud/35a021e15a9355880cb8720ba34ed936.svg) |
| cc               |     并列连词 | ![cc](https://file.hankcs.com/img/ud/18c6a22520cec2ba60ce636bb410f651.svg) |
| ccomp            |     从句补语 | ![ccomp](https://file.hankcs.com/img/ud/8cc4ea0c6a090f1ba03d02926240c35b.svg) |
| compound:nn      |     复合名词 | ![compound:nn](https://file.hankcs.com/img/ud/587e12141aa42aa9862ea0ac0eb30e09.svg) |
| compound:vc      |     复合动词 | ![compound:vc](https://file.hankcs.com/img/ud/f72cedcb6cec8563d88063b118544a9d.svg) |
| conj             |       连接 | ![conj](https://file.hankcs.com/img/ud/fc924f495d1d5a3a828a0e2262da06cd.svg) |
| cop              |       系动 | ![cop](https://file.hankcs.com/img/ud/a7da58f57adbe9e6bd166ecb514f2d1c.svg) |
| csubj            |     从句主语 | ![csubj](https://file.hankcs.com/img/ud/0adda481e81b3765ed7f4f9d55c153c4.svg) |
| dep              |      未定义 | ![dep](https://file.hankcs.com/img/ud/db15b792f1bfd5e42982832b04c65a79.svg) |
| det              |       限定 | ![det](https://file.hankcs.com/img/ud/17376d13a4e7b0677cd18d13e0990dab.svg) |
| discourse        |       语气 | ![discourse](https://file.hankcs.com/img/ud/d7eb37d5fd13462b237140a08f0ed9a4.svg) |
| dobj             |     直接宾语 | ![dobj](https://file.hankcs.com/img/ud/f5e801103ddc57a9aeff0e272b8f7b44.svg) |
| etc              |       省略 | ![etc](https://file.hankcs.com/img/ud/86d3fd24cae9f585b7730119edaa0248.svg) |
| mark             |       标记 | ![mark](https://file.hankcs.com/img/ud/b17b4027ab368c76a3b6f085d5b561d9.svg) |
| mark:clf         |    标记:量词 | ![mark:clf](https://file.hankcs.com/img/ud/5974c92e3587aa64ba1d572243b9c5cc.svg) |
| name             |       名称 | ![name](https://file.hankcs.com/img/ud/63ea082457dfe6f4fc04f635a8c019f3.svg) |
| neg              |       否定 | ![neg](https://file.hankcs.com/img/ud/e38814231ff9a31dcce5672556375c94.svg) |
| nmod             |     名词修饰 | ![nmod](https://file.hankcs.com/img/ud/e948a8dbcd43984d14c257f0ace1753d.svg) |
| nmod:assmod      |  名词修饰:关联 | ![nmod:assmod](https://file.hankcs.com/img/ud/76349f30cef2c4978a03118d65ac6c81.svg) |
| nmod:poss        | 名词修饰:所有格 | ![nmod:poss](https://file.hankcs.com/img/ud/5b4937dbea42cdff7054e9dd0904bedb.svg) |
| nmod:prep        |  名词修饰:介词 | ![nmod:prep](https://file.hankcs.com/img/ud/63b92981638b758681a82e9f4a9aa04c.svg) |
| nmod:range       |  名词修饰:范围 | ![nmod:range](https://file.hankcs.com/img/ud/217ec98756cfe3750c76f5e5e89b7f54.svg) |
| nmod:tmod        |  名词修饰:时间 | ![nmod:tmod](https://file.hankcs.com/img/ud/166e3b8fb72db52f0ec332d444ea017f.svg) |
| nmod:topic       |  名词修饰:主题 | ![nmod:topic](https://file.hankcs.com/img/ud/93c83c98c188b131211ac5e9ff5242c0.svg) |
| nsubj            |     名词主语 | ![nsubj](https://file.hankcs.com/img/ud/63e3902d4a3045d1d696a0c4ed203563.svg) |
| nsubj:xsubj      | 名词主语: 补语 | ![nsubj:xsubj](https://file.hankcs.com/img/ud/80cb355b9f9732fd888186a1f658b0ac.svg) |
| nsubjpass        |    被动态主语 | ![nsubjpass](https://file.hankcs.com/img/ud/6327fab58ab42d5a417b2e5c7018ac3a.svg) |
| nummod           |       数量 | ![nummod](https://file.hankcs.com/img/ud/0fd20559645265c2c937f06631aa74df.svg) |
| parataxis:prnmod |       并列 | ![parataxis:prnmod](https://file.hankcs.com/img/ud/783a0faf4cd935bb61f5d225a388b79e.svg) |
| punct            |     标点符号 | ![punct](https://file.hankcs.com/img/ud/983410055658352080ae476a5d85e6b5.svg) |
| root             |        根 | ![root](https://file.hankcs.com/img/ud/588101bec0440ffb769172f8b7e9f98e.svg) |
| xcomp            |     从句补语 | ![xcomp](https://file.hankcs.com/img/ud/c72071875f1c01e51acb9e1ec4893113.svg) |


================================================
FILE: docs/annotations/index.md
================================================
# Annotations


```{toctree}
tok/index
pos/index
ner/index
dep/index
sdp/index
srl/index
constituency/index
```



================================================
FILE: docs/annotations/ner/index.md
================================================
# Named Entity Recognition

## Chinese

```{toctree}
pku
msra
```

## Multilingual

```{toctree}
ontonotes
```


================================================
FILE: docs/annotations/ner/msra.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# msra

| Category | Subcategory    | Tag-set of Format-1 | Tag-set of Format-2 |
|----------|----------------|---------------------|---------------------|
| NAMEX    | Person         | P                   | PERSON              |
|          | Location       | L                   | LOCATION            |
|          | Organization   | O                   | ORGANIZATION        |
| TIMEX    | Date           | dat                 | DATE                |
|          | Duration       | dur                 | DURATION            |
|          | Time           | tim                 | TIME                |
| NUMEX    | Percent        | per                 | PERCENT             |
|          | Money          | mon                 | MONEY               |
|          | Frequency      | fre                 | FREQUENCY           |
|          | Integer        | int                 | INTEGER             |
|          | Fraction       | fra                 | FRACTION            |
|          | Decimal        | dec                 | DECIMAL             |
|          | Ordinal        | ord                 | ORDINAL             |
|          | Rate           | rat                 | RATE                |
| MEASUREX | Age            | age                 | AGE                 |
|          | Weight         | wei                 | WEIGHT              |
|          | Length         | len                 | LENGTH              |
|          | Temperature    | tem                 | TEMPERATURE         |
|          | Angle          | ang                 | ANGLE               |
|          | Area           | are                 | AREA                |
|          | Capacity       | cap                 | CAPACITY            |
|          | Speed          | spe                 | SPEED               |
|          | Acceleration   | acc                 | ACCELERATION        |
|          | Other measures | mea                 | MEASURE             |
| ADDREX   | Email          | ema                 | EMAIL               |
|          | Phone          | pho                 | PHONE               |
|          | Fax            | fax                 | FAX                 |
|          | Telex          | tel                 | TELEX               |
|          | WWW            | WWW                 | WWW                 |
|          | Postalcode     | pos                 | POSTALCODE          |
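
Corpora with these labels are often distributed as character-level BIO sequences. Here is a small sketch (plain Python; the helper name and span format are our own) that converts entity spans into BIO tags using the Format-2 labels above:

```python
def spans_to_bio(text, entities):
    """Convert character-level entity spans to per-character BIO tags.
    entities: list of (start, end, label) with end exclusive; label is a
    Format-2 tag from the table above, e.g. 'PERSON', 'LOCATION'.
    Assumes spans do not overlap."""
    tags = ['O'] * len(text)
    for start, end, label in entities:
        tags[start] = 'B-' + label
        for i in range(start + 1, end):
            tags[i] = 'I-' + label
    return tags

text = '张仁伟在北京'
print(spans_to_bio(text, [(0, 3, 'PERSON'), (4, 6, 'LOCATION')]))
# ['B-PERSON', 'I-PERSON', 'I-PERSON', 'O', 'B-LOCATION', 'I-LOCATION']
```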



================================================
FILE: docs/annotations/ner/ontonotes.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# ontonotes

| Tag          | Description                                          |
|--------------|------------------------------------------------------|
| PERSON       | People, including fictional                          |
| NORP         | Nationalities or religious or political groups       |
| FACILITY     | Buildings, airports, highways, bridges, etc.         |
| ORGANIZATION | Companies, agencies, institutions, etc.              |
| GPE          | Countries, cities, states                            |
| LOCATION     | Non-GPE locations, mountain ranges, bodies of water  |
| PRODUCT      | Vehicles, weapons, foods, etc. (Not services)        |
| EVENT        | Named hurricanes, battles, wars, sports events, etc. |
| WORK OF ART  | Titles of books, songs, etc.                         |
| LAW          | Named documents made into laws                       |
| DATE     | Absolute or relative dates or periods        |
| TIME     | Times smaller than a day                     |
| PERCENT  | Percentage                        |
| MONEY    | Monetary values, including unit              |
| QUANTITY | Measurements, as of weight or distance       |
| ORDINAL  | “first”, “second”                             |
| CARDINAL | Numerals that do not fall under another type |
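The tags above appear in the NER output of HanLP's multi-task pipelines, which return entities as `(entity, label, begin, end)` tuples over the token list. As a minimal sketch (not an excerpt of HanLP's official docs), the helper below groups such tuples by their OntoNotes label; the sample entities are hand-written illustrations, not real model output.

```python
from collections import defaultdict

# Illustrative entities in HanLP's tuple format: (entity, label, begin, end).
ner = [
    ("乔丹", "PERSON", 0, 1),
    ("北京", "GPE", 3, 4),
    ("2021年", "DATE", 5, 6),
]

def group_by_label(entities):
    """Group (entity, label, begin, end) tuples by their NER label."""
    groups = defaultdict(list)
    for entity, label, begin, end in entities:
        groups[label].append(entity)
    return dict(groups)

print(group_by_label(ner))
# {'PERSON': ['乔丹'], 'GPE': ['北京'], 'DATE': ['2021年']}
```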



================================================
FILE: docs/annotations/ner/pku.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# pku

| 序号 | 词性 | 名称     | 帮助记忆的诠释                                         | 例子及注解                                                   |
| ---- | ---- | -------- | ------------------------------------------------------ | ------------------------------------------------------------ |
| 1   | nr   | 人名     | 名词代码n和“人(ren)”的声母并在一起。                   | 1. 汉族人及与汉族起名方式相同的非汉族人的姓和名单独切分,并分别标注为nr。张/nr 仁伟/nr, 欧阳/nr 修/nr, 阮/nr 志雄/nr, 朴/nr 贞爱/nr汉族人除有单姓和复姓外,还有双姓,即有的女子出嫁后,在原来的姓上加上丈夫的姓。如:陈方安生。这种情况切分、标注为:陈/nr 方/nr 安生/nr;唐姜氏,切分、标注为:唐/nr 姜氏/nr。2. 姓名后的职务、职称或称呼要分开。江/nr 主席/n, 小平/nr 同志/n, 江/nr 总书记/n,张/nr 教授/n, 王/nr 部长/n, 陈/nr 老总/n, 李/nr 大娘/n, 刘/nr 阿姨/n, 龙/nr 姑姑/n3. 对人的简称、尊称等若为两个字,则合为一个切分单位,并标以nr。老张/nr, 大李/nr, 小郝/nr, 郭老/nr, 陈总/nr4. 明显带排行的亲属称谓要切分开,分不清楚的则不切开。三/m 哥/n, 大婶/n, 大/a 女儿/n, 大哥/n, 小弟/n, 老爸/n5. 一些著名作者的或不易区分姓和名的笔名通常作为一个切分单位。鲁迅/nr, 茅盾/nr, 巴金/nr, 三毛/nr, 琼瑶/nr, 白桦/nr6. 外国人或少数民族的译名(包括日本人的姓名)不予切分,标注为nr。克林顿/nr, 叶利钦/nr, 才旦卓玛/nr, 小林多喜二/nr, 北研二/nr,华盛顿/nr, 爱因斯坦/nr有些西方人的姓名中有小圆点,也不分开。卡尔·马克思/nr |
| 2   | ns   | 地名     | 名词代码n和处所词代码s并在一起。                       | 安徽/ns,深圳/ns,杭州/ns,拉萨/ns,哈尔滨/ns, 呼和浩特/ns, 乌鲁木齐/ns,长江/ns,黄海/ns,太平洋/ns, 泰山/ns, 华山/ns,亚洲/ns, 海南岛/ns,太湖/ns,白洋淀/ns, 俄罗斯/ns,哈萨克斯坦/ns,彼得堡/ns, 伏尔加格勒/ns 1. 国名不论长短,作为一个切分单位。中国/ns, 中华人民共和国/ns, 日本国/ns, 美利坚合众国/ns, 美国/ns2. 地名后有“省”、“市”、“县”、“区”、“乡”、“镇”、“村”、“旗”、“州”、“都”、“府”、“道”等单字的行政区划名称时,不切分开,作为一个切分单位。四川省/ns, 天津市/ns,景德镇/ns沙市市/ns, 牡丹江市/ns,正定县/ns,海淀区/ns, 通州区/ns,东升乡/ns, 双桥镇/ns 南化村/ns,华盛顿州/ns,俄亥俄州/ns,东京都/ns, 大阪府/ns,北海道/ns, 长野县/ns,开封府/ns,宣城县/ns3. 地名后的行政区划有两个以上的汉字,则将地名同行政区划名称切开,不过要将地名同行政区划名称用方括号括起来,并标以短语NS。[芜湖/ns 专区/n] NS,[宣城/ns 地区/n]ns,[内蒙古/ns 自治区/n]NS,[深圳/ns 特区/n]NS, [厦门/ns 经济/n 特区/n]NS, [香港/ns 特别/a 行政区/n]NS,[香港/ns 特区/n]NS, [华盛顿/ns 特区/n]NS,4. 地名后有表示地形地貌的一个字的普通名词,如“江、河、山、洋、海、岛、峰、湖”等,不予切分。鸭绿江/ns,亚马逊河/ns, 喜马拉雅山/ns, 珠穆朗玛峰/ns,地中海/ns,大西洋/ns,洞庭湖/ns, 塞普路斯岛/ns 5. 地名后接的表示地形地貌的普通名词若有两个以上汉字,则应切开。然后将地名同该普通名词标成短语NS。[台湾/ns 海峡/n]NS,[华北/ns 平原/n]NS,[帕米尔/ns 高原/n]NS, [南沙/ns 群岛/n]NS,[京东/ns 大/a 峡谷/n]NS [横断/b 山脉/n]NS6.地名后有表示自然区划的一个字的普通名词,如“ 街,路,道,巷,里,町,庄,村,弄,堡”等,不予切分。 中关村/ns,长安街/ns,学院路/ns, 景德镇/ns, 吴家堡/ns, 庞各庄/ns, 三元里/ns,彼得堡/ns, 北菜市巷/ns, 7.地名后接的表示自然区划的普通名词若有两个以上汉字,则应切开。然后将地名同自然区划名词标成短语NS。[米市/ns 大街/n]NS, [蒋家/nz 胡同/n]NS , [陶然亭/ns 公园/n]NS , 8. 大小地名相连时的标注方式为:北京市/ns 海淀区/ns 海淀镇/ns [南/f 大街/n]NS [蒋家/nz 胡同/n]NS 24/m 号/q , |
| 3   | nt   | 机构团体 | “团”的声母为t,名词代码n和t并在一起。                  | (参见2。短语标记说明--NT)联合国/nt,中共中央/nt,国务院/nt, 北京大学/nt1.大多数团体、机构、组织的专有名称一般是短语型的,较长,且含有地名或人名等专名,再组合,标注为短语NT。[中国/ns 计算机/n 学会/n]NT, [香港/ns 钟表业/n 总会/n]NT, [烟台/ns 大学/n]NT, [香港/ns 理工大学/n]NT, [华东/ns 理工大学/n]NT,[合肥/ns 师范/n 学院/n]NT, [北京/ns 图书馆/n]NT, [富士通/nz 株式会社/n]NT, [香山/ns 植物园/n]NT, [安娜/nz 美容院/n]NT,[上海/ns 手表/n 厂/n]NT, [永和/nz 烧饼铺/n]NT,[北京/ns 国安/nz 队/n]NT,2. 对于在国际或中国范围内的知名的唯一的团体、机构、组织的名称即使前面没有专名,也标为nt或NT。联合国/nt,国务院/nt,外交部/nt, 财政部/nt,教育部/nt, 国防部/nt,[世界/n 贸易/n 组织/n]NT, [国家/n 教育/vn 委员会/n]NT,[信息/n 产业/n 部/n]NT,[全国/n 信息/n 技术/n 标准化/vn 委员会/n]NT,[全国/n 总/b 工会/n]NT,[全国/n 人民/n 代表/n 大会/n]NT,美国的“国务院”,其他国家的“外交部、财政部、教育部”,必须在其所属国的国名之后出现时,才联合标注为NT。[美国/ns 国务院/n]NT,[法国/ns 外交部/n]NT,[美/j 国会/n]NT,日本有些政府机构名称很特别,无论是否出现在“日本”国名之后都标为nt。[日本/ns 外务省/nt]NT,[日/j 通产省/nt]NT通产省/nt 3. 前后相连有上下位关系的团体机构组织名称的处理方式如下:[联合国/nt 教科文/j 组织/n]NT, [中国/ns 银行/n 北京/ns 分行/n]NT,[河北省/ns 正定县/ns 西平乐乡/ns 南化村/ns 党支部/n]NT, 当下位名称含有专名(如“北京/ns 分行/n”、“南化村/ns 党支部/n”、“昌平/ns 分校/n”)时,也可脱离前面的上位名称单独标注为NT。[中国/ns 银行/n]NT [北京/ns 分行/n]NT,北京大学/nt [昌平/ns 分校/n]NT,4. 团体、机构、组织名称中用圆括号加注简称时:[宝山/ns 钢铁/n (/w 宝钢/j )/w 总/b 公司/n]NT,[宝山/ns 钢铁/n 总/b 公司/n]NT,(/w 宝钢/j )/w |
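The examples in this table use the PKU corpus convention of `word/tag` tokens separated by spaces. As a minimal sketch, the function below splits such a line into `(word, tag)` pairs; compound phrase markers like `[中国/ns 银行/n]NT` are out of scope here and would need a bracket-aware parser.

```python
def parse_pku(line):
    """Split a PKU-style 'word/tag' line into (word, tag) pairs.

    rpartition is used so a word containing '/' still splits on the
    last slash, which is the tag separator.
    """
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs

print(parse_pku("张/nr 仁伟/nr 在/p 北京市/ns 工作/v"))
# [('张', 'nr'), ('仁伟', 'nr'), ('在', 'p'), ('北京市', 'ns'), ('工作', 'v')]
```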

================================================
FILE: docs/annotations/pos/863.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# 863

| 词性  |   名称   |              说明              |                                                                                                                                                                                                                                    例子                                                                                                                                                                                                                                    |
| :-- | -----: | ---------------------------: | -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
| a   |    形容词 |        取英语形容词adjective的第1个字母 |                                                                                                                                                                                                                                                                                                                                                                                                                                         [重要/a 步伐/n]NP ,美丽/a ,看似/v 抽象/a , |
| c   |     连词 |      取英语连词conjunction的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                                                                                           合作/vn 与/c 伙伴/n |
| d   |     副词 | 取adverb的第2个字母,因其第1个字母已用于形容词。 |                                                                                                                                                                                                                                                                                                                                                                                                                                                             进一步/d 发展/v , |
| e   |     叹词 |      取英语叹词exclamation的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                                                                             啊/e ,/w 那/r 金灿灿/z 的/u 麦穗/n , |
| f   |    方位词 |                      取汉字“方”。 |                                                                                                                                                                                                                                                                                                                                                                                                                                    军人/n 的/u 眼睛/n 里/f 不/d 是/v 没有/v 风景/n , |
| g   |    语素字 |                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| h   |   前接成分 |               取英语head的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                                          许多/m 非/h 主角/n 人物/n ,办事处/n 的/u “/w 准/h 政府/n ”/w 功能/n 不断/d 加强/v , |
| i   |     成语 |            取英语成语idiom的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                                                                                         一言一行/i ,义无反顾/i , |
| j   |   简称略语 |                   取汉字“简”的声母。 |                                                                                                                                                                                                                                                                                                                                                                                                                                                     [德/j 外长/n]NP ,文教/j , |
| k   |   后接成分 |                        后接成分。 |                                                                                                                                                                                                                                                                                                                                                                                                                                         少年儿童/l 朋友/n 们/k ,身体/n 健康/a 者/k , |
| m   |     数词 |    取英语numeral的第3个字母,n,u已有他用。 | 1.数量词组应切分为数词和量词。 三/m 个/q, 10/m 公斤/q, 一/m 盒/q 点心/n ,但少数数量词已是词典的登录单位,则不再切分。 一个/m , 一些/m ,2. 基数、序数、小数、分数、百分数一律不予切分,为一个切分单位,标注为 m 。一百二十三/m,20万/m, 123.54/m, 一个/m, 第一/m, 第三十五/m, 20%/m, 三分之二/m, 千分之三十/m, 几十/m 人/n, 十几万/m 元/q, 第一百零一/m 个/q ,3. 约数,前加副词、形容词或后加“来、多、左右”等助数词的应予分开。约/d 一百/m 多/m 万/m,仅/d 一百/m 个/q, 四十/m 来/m 个/q,二十/m 余/m 只/q, 十几/m 个/q,三十/m 左右/m ,两个数词相连的及“成百”、“上千”等则不予切分。五六/m 年/q, 七八/m 天/q,十七八/m 岁/q, 成百/m 学生/n,上千/m 人/n, 4.表序关系的“数+名”结构,应予切分。二/m 连/n , 三/m 部/n , |
| mq  |    数量词 |                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| n   |     名词 |             取英语名词noun的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                                                        (参见 动词--v)岗位/n , 城市/n , 机会/n ,她/r 是/v 责任/n 编辑/n , |
| nd  |   方位名词 |           方位名词(nd),表示位置的相对方向 |                                                                                                                                                                                                                                                                                                                                                                                                                  上  下  左  右  前  后  里  外  中  东  西  南  北前边  左面  里头  中间  外部 |
| nh  |     人名 |           人名(nh),表示人的名称的专有名词 |                                                                                                                                                                                                                                                                                                                                                                                                                                        华罗庚  阿凡提  诸葛亮  司马相如  松赞干布  卡尔·马克思 |
| nhf |      姓 |                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| nhs |      名 |                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| ni  |    机构名 |    机构名(ni),表示团体、组织、机构名称的专有名词 |                                                                                                                                                                                                                                                                                                                                                                                                                                                    联合国  教育部  北京大学  中国科学院 |
| nl  |   处所名词 |                处所名词(nl),表示处所 |                                                                                                                                                                                                                                                                                                                                                                                                                                           空中  高处  隔壁  门口  附近  边疆  一旁  野外 |
| ns  |     地名 |         地名(ns),表示地理区域名称的专有名词 |                                                                                                                                                                                                                                                                                                                                                                                                                       亚洲  大西洋  地中海  阿尔卑斯山  加拿大中国  北京  浙江  景德镇  呼和浩特  中关村 |
| nt  |   时间名词 |          时间名词(nt),包括一般所说的时量词 |                                                                                                                                                                                                                                                                                                                                                                                                                                 年  月  日  分  秒现在  过去  昨天  去年  将来  宋朝  星期一 |
| nz  | 其他专有名词 |                   其他专有名词(nz) |                                                                                                                                                                                                                                                                                                                                                                                                                                                           五粮液  宫爆鸡丁  桑塔纳 |
| o   |    拟声词 |    取英语拟声词onomatopoeia的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                                                                          哈哈/o 一/m 笑/v ,装载机/n 隆隆/o 推进/v , |
| p   |     介词 |    取英语介词prepositional的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                对/p 子孙后代/n 负责/v ,以/p 煤/n 养/v 农/Ng ,为/p 治理/v 荒山/n 服务/v , 把/p 青年/n 推/v 上/v 了/u 领导/vn 岗位/n , |
| q   |     量词 |           取英语quantity的第1个字母。 |                                                                                                                                                                                                                                                                                                                                                                                                                                                (参见数词m)首/m 批/q ,一/m 年/q , |
| r   |     代词 |  取英语代词pronoun的第2个字母,因p已用于介词。 |                                                                                                                                                                                                                                                                                                                                                                           单音节代词“本”、“每”、“各”、“诸”后接单音节名词时,和后接的单音节名词合为代词;当后接双音节名词时,应予切分。本报/r, 每人/r, 本社/r, 本/r 地区/n, 各/r 部门/n |
| u   |     助词 |              取英语助词auxiliary。 |                                                                                                                                                                                                                                                                                                                                                                   [[俄罗斯/ns 和/c 北约/j]NP-BL 之间/f [战略/n 伙伴/n 关系/n]NP 的/u 建立/vn]NP 填平/v 了/u [[欧洲/ns 安全/a 政治/n]NP 的/u 鸿沟/n]NP |
| v   |     动词 |             取英语动词verb的第一个字母。 |                                                                                                                                                                                                                                                                                                                (参见 名词--n)[[[欧盟/j 扩大/v]S 的/u [历史性/n 决定/n]NP]NP 和/c [北约/j 开放/v]S]NP-BL [为/p [创建/v [一/m 种/q 新/a 的/u 欧洲/ns 安全/a 格局/n]NP]VP-SBI]PP-MD [奠定/v 了/u 基础/n]V-SBI ,, |
| vd  |   趋向动词 |                趋向动词(vd),表示趋向 |                                                                                                                                                                                                                                                                                                                                                                                                                      (走)上   (趴)下   (进)来   (回)去(跑)上来  (掉)下去  (提)起来  (扔)过去 |
| vl  |   联系动词 |             联系动词(vl),表示关系的判断 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                        是 |
| vu  |   能愿动词 |             能愿动词(vu),表示可能、意愿 |                                                                                                                                                                                                                                                                                                                                                                                                                                             能够  能  应该  可以  可能  情愿  愿意  要 |
| w   |   标点符号 |                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                  ”/w :/w |
| ws  | 非汉字字符串 |                非汉字字符串(ws),如: |                                                                                                                                                                                                                                                                                                                                                                                                                                                    HanLP office  windows |
| x   |   非语素字 |  非语素字只是一个符号,字母x通常用于代表未知数、符号。 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |


================================================
FILE: docs/annotations/pos/ctb.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# ctb

See also [The Part-Of-Speech Tagging Guidelines for the Penn Chinese Treebank (3.0)](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1039&context=ircs_reports).

| Tag  | Description                                          | Chinese | Chinese Description                                                      | Examples              |
|-----|------------------------------------------------------|---------|---------------------------------------------------------|------------------------|
| AD  | adverb                                               | 副词      | 副词                                                      | 仍然、很、大大、约              |
| AS  | aspect marker                                        | 动态助词    | 助词                                                      | 了、着、过                  |
| BA  | `bǎ` in ba-construction                              | 把字句     | 当“把”、“将”出现在结构“NP0 + BA + NP1+VP”时的词性                    | 把、将                    |
| CC  | coordinating conjunction                             | 并列连接词   | 并列连词                                                    | 与、和、或者、还是              |
| CD  | cardinal number                                      | 数词      | 数词或表达数量的词                                               | 一百、好些、若干               |
| CS  | subordinating conjunction                            | 从属连词    | 从属连词                                                    | 如果、那么、就                |
| DEC | `de` as a complementizer or a nominalizer            | 补语成分“的” | 当“的”或“之”作补语标记或名词化标记时的词性,其结构为:S/VP DEC {NP},如,喜欢旅游的大学生   | 的、之                    |
| DEG | `de` as a genitive marker and an associative marker  | 属格“的”   | 当“的”或“之”作所有格时的词性,其结构为:NP/PP/JJ/DT DEG {NP}, 如,他的车、经济的发展 | 的、之                    |
| DER | resultative `de`, `de` in V-de const and V-de-R      | 表结果的“得” | 当“得”出现在结构“V-得-R”时的词性,如,他跑得很快                            | 得                      |
| DEV | manner `de`, `de` before VP                          | 表方式的“地” | 当“地”出现在结构“X-地-VP”时的词性,如,高兴地说                            | 地                      |
| DT  | determiner                                           | 限定词     | 代冠词,通常用来修饰名词                                            | 这、那、该、每、各              |
| ETC | for words like "etc."                                | 表示省略    | “等”、“等等”的词性                                             | 等、等等              |
| EM  | emoji                                                | 表情符     | 表情符、或称颜文字                                      | :)             |
| FW  | foreign words                                        | 外来语     | 外来词                                                     | 卡拉、A型                  |
| IC  | incomplete component                                 | 不完整成分   | 不完整成分,尤指ASR导致的错误                         | 好*xin*、那个*ba*  |
| IJ  | interjection                                         | 句首感叹词   | 感叹词,通常出现在句子首部                                           | 啊                      |
| JJ  | other noun-modifier                                  | 其他名词修饰语 | 形容词                                                     | 共同、新                   |
| LB  | `bèi` in long bei-const                              | 长句式表被动  | 当“被”、“叫”、“给”出现在结构“NP0 + LB + NP1+ VP”结构时 的词性,如,他被我训了一顿  | 被、叫、给                  |
| LC  | localizer                                            | 方位词     | 方位词以及表示范围的限定词                                                     | 前、旁、到、在内、以来、为止               |
| M   | measure word                                         | 量词      | 量词                                                      | 个、群、公里                 |
| MSP | other particle                                       | 其他小品词   | 其他虚词,包括“所”、“以”、“来”和“而”等出现在VP前的词                         | 所、以、来、而                |
| NN  | common noun                                          | 其他名词    | 除专有名词和时间名词外的所有名词                                        | 桌子、生活、经济               |
| NOI | noise that characters are written in the wrong order | 噪声      | 汉字顺序颠倒产生的噪声                    | 事/NOI 类/NOI 各/NOI 故/NOI |
| NR  | proper noun                                          | 专有名词    | 专有名词,通常表示地名、人名、机构名等                                     | 北京、乔丹、微软               |
| NT  | temporal noun                                        | 时间名词    | 表示时间概念的名词                                               | 一月、汉朝、当今               |
| OD  | ordinal number                                       | 序数词     | 序列词                                                     | 第一百                    |
| ON  | onomatopoeia                                         | 象声词     | 象声词                                                     | 哗哗、呼、咯吱              |
| P   | preposition e.g., "from" and "to"                    | 介词      | 介词                                                      | 从、对、根据                 |
| PN  | pronoun                                              | 代词      | 代词,通常用来指代名词                                             | 我、这些、其、自己              |
| PU  | punctuation                                          | 标点符号    | 标点符号                                                    | ?、。、;                  |
| SB  | `bèi` in short bei-const                             | 短句式表被动  | 当“被”、“给”出现在NP0 +SB+ VP结果时的词性,如,他被训了 一顿                  | 被、叫                    |
| SP  | sentence final particle                              | 句末助词    | 经常出现在句尾的词                                               | 吧、呢、啊、吗                |
| URL | web address                                          | 网址      | 网址                                                      | www.hankcs.com         |
| VA  | predicative adjective                                | 表语形容词   | 可以接在“很”后面的形容词谓语                                         | 雪白、厉害                  |
| VC  | copula, be words                                     | 系动词     | 系动词,表示“是”或“非”概念的动词                                       | 是、为、非                  |
| VE  | `yǒu` as the main verb                               | 动词有无    | 表示“有”或“无”概念的动词                                          | 有、没有、无                 |
| VV  | other verb                                           | 其他动词    | 其他普通动词,包括情态词、控制动词、动作动词、心理动词等等                           | 可能、要、走、喜欢              |
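HanLP's multi-task models return tokens and their CTB tags as parallel lists (under keys such as `tok/fine` and `pos/ctb` in the returned Document). The sketch below is illustrative, not official HanLP documentation: the token and tag lists are hand-written stand-ins for real model output, paired with an excerpt of the table above for human-readable labels.

```python
# Hand-written stand-ins for the parallel lists a HanLP Document exposes.
tok = ["微软", "的", "研究员", "喜欢", "跑步"]
pos = ["NR", "DEG", "NN", "VV", "NN"]

# An excerpt of the CTB tag table above, usable as a lookup.
CTB = {
    "NR": "proper noun",
    "DEG": "genitive de",
    "NN": "common noun",
    "VV": "other verb",
}

def explain(tok, pos):
    """Pair each token with its CTB tag and the tag's description."""
    return [(w, t, CTB.get(t, "?")) for w, t in zip(tok, pos)]

for w, t, desc in explain(tok, pos):
    print(f"{w}\t{t}\t{desc}")
```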


================================================
FILE: docs/annotations/pos/index.md
================================================
# Part-of-Speech Tagging

## Chinese
```{toctree}
ctb
pku
863
```

## Japanese
```{toctree}
npcmj
```

## Multilingual

```{toctree}
ud
```





================================================
FILE: docs/annotations/pos/npcmj.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# NPCMJ


| Tag       | Description                       |
|-----------|-----------------------------------|
| ADJI      | イ-adjective                      |
| ADJI-MD   | modal イ-adjective                |
| ADJN      | ナ-adjective                      |
| ADJN-MD   | modal ナ-adjective                |
| ADV       | adverb                            |
| AX        | auxiliary verb (including copula) |
| AXD       | auxiliary verb, past tense        |
| CL        | classifier                        |
| CONJ      | coordinating conjunction          |
| D         | determiner                        |
| FN        | formal noun                       |
| FW        | foreign word                      |
| INTJ      | interjection                      |
| MD        | modal element                     |
| N         | noun                              |
| N-MENTION | mentioned expression              |
| NEG       | negation                          |
| NPR       | proper noun                       |
| NUM       | numeral                           |
| P-COMP    | complementizer particle           |
| P-CONN    | conjunctional particle            |
| P-FINAL   | final particle                    |
| P-INTJ    | interjectional particle           |
| P-OPTR    | toritate particle                 |
| P-ROLE    | role particle                     |
| PASS      | direct passive                    |
| PASS2     | indirect passive                  |
| PNL       | prenominal                        |
| PRO       | pronoun                           |
| PU        | punctuation                       |
| PUL       | left bracket                      |
| PUR       | right bracket                     |
| Q         | quantifier                        |
| QUOT      | quote                             |
| SYM       | symbol                            |
| VB        | verb (or verb stem)               |
| VB0       | light verb                        |
| VB2       | secondary verb                    |
| WADV      | indeterminate adverb              |
| WD        | indeterminate determiner          |
| WNUM      | indeterminate numeral             |
| WPRO      | indeterminate pronoun             |

================================================
FILE: docs/annotations/pos/pku.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# PKU

| 序号 | 词性 | 名称     | 帮助记忆的诠释                                         | 例子及注解                                                   |
| ---- | ---- | -------- | ------------------------------------------------------ | ------------------------------------------------------------ |
| 1    | Ag   | 形语素   | 形容词性语素。形容词代码为a,语素代码g前面置以A。     | 绿色/n 似/d 锦/Ag ,                                         |
| 2    | a    | 形容词   | 取英语形容词adjective的第1个字母                       | [重要/a 步伐/n]NP ,美丽/a ,看似/v 抽象/a ,                |
| 3    | ad   | 副形词   | 直接作状语的形容词。形容词代码a和副词代码d并在一起。   | [积极/ad 谋求/v]V-ZZ ,幻象/n 易/ad 逝/Vg ,                 |
| 4    | an   | 名形词   | 具有名词功能的形容词。形容词代码a和名词代码n并在一起。 | [外交/n 和/c 安全/an]NP-BL ,                                |
| 5    | Bg   | 区别语素 | 区别词性语素。区别词代码为b,语素代码g前面置以B。     | 赤/Ag 橙/Bg 黄/a 绿/a 青/a 蓝/a 紫/a ,                      |
| 6    | b    | 区别词   | 取汉字“别”的声母。                                     | 女/b 司机/n, 金/b 手镯/n, 慢性/b 胃炎/n, 古/b 钱币/n, 副/b 主任/n, 总/b 公司/n 单音节区别词和单音节名词或名语素组合,作为一个词,并标以名词词性n。 |
| 7    | c    | 连词     | 取英语连词conjunction的第1个字母。                     | 合作/vn 与/c 伙伴/n                                          |
| 8    | Dg   | 副语素   | 副词性语素。副词代码为d,语素代码g前面置以D。         | 了解/v 甚/Dg 深/a ,煞/Dg 是/v 喜人/a ,                     |
| 9    | d    | 副词     | 取adverb的第2个字母,因其第1个字母已用于形容词。       | 进一步/d 发展/v ,                                           |
| 10   | e    | 叹词     | 取英语叹词exclamation的第1个字母。                     | 啊/e ,/w 那/r 金灿灿/z 的/u 麦穗/n ,                       |
| 11   | f    | 方位词   | 取汉字“方”。                                           | 军人/n 的/u 眼睛/n 里/f 不/d 是/v 没有/v 风景/n ,           |
| 12   | h    | 前接成分 | 取英语head的第1个字母。                                | 许多/m 非/h 主角/n 人物/n ,办事处/n 的/u “/w 准/h 政府/n ”/w 功能/n 不断/d 加强/v , |
| 13   | i    | 成语     | 取英语成语idiom的第1个字母。                           | 一言一行/i ,义无反顾/i ,                                   |
| 14   | j    | 简称略语 | 取汉字“简”的声母。                                     | [德/j 外长/n]NP ,文教/j ,                                  |
| 15   | k    | 后接成分 | 后接成分。                                             | 少年儿童/l 朋友/n 们/k ,身体/n 健康/a 者/k ,               |
| 16   | l    | 习用语   | 习用语尚未成为成语,有点“临时性”,取“临”的声母。       | 少年儿童/l 朋友/n 们/k ,落到实处/l ,                       |
| 17   | Mg   | 数语素   | 数词性语素。数词代码为m,语素代码g前面置以M。         | 甲/Mg 减下/v 的/u 人/n 让/v 乙/Mg 背上/v ,凡/d “/w 寅/Mg 年/n ”/w 中/f 出生/v 的/u 人/n 生肖/n 都/d 属/v 虎/n , |
| 18   | m    | 数词     | 取英语numeral的第3个字母,n,u已有他用。               | 1.数量词组应切分为数词和量词。 三/m 个/q, 10/m 公斤/q, 一/m 盒/q 点心/n ,但少数数量词已是词典的登录单位,则不再切分。 一个/m , 一些/m ,2. 基数、序数、小数、分数、百分数一律不予切分,为一个切分单位,标注为 m 。一百二十三/m,20万/m, 123.54/m, 一个/m, 第一/m, 第三十五/m, 20%/m, 三分之二/m, 千分之三十/m, 几十/m 人/n, 十几万/m 元/q, 第一百零一/m 个/q ,3. 约数,前加副词、形容词或后加“来、多、左右”等助数词的应予分开。约/d 一百/m 多/m 万/m,仅/d 一百/m 个/q, 四十/m 来/m 个/q,二十/m 余/m 只/q, 十几/m 个/q,三十/m 左右/m ,两个数词相连的及“成百”、“上千”等则不予切分。五六/m 年/q, 七八/m 天/q,十七八/m 岁/q, 成百/m 学生/n,上千/m 人/n, 4.表序关系的“数+名”结构,应予切分。二/m 连/n , 三/m 部/n , |
| 19   | Ng   | 名语素   | 名词性语素。名词代码为n,语素代码g前面置以N。         | 出/v 过/u 两/m 天/q 差/Ng, 理/v 了/u 一/m 次/q 发/Ng,      |
| 20   | n    | 名词     | 取英语名词noun的第1个字母。                            | (参见 动词--v)岗位/n , 城市/n , 机会/n ,她/r 是/v 责任/n 编辑/n , |
| 21   | nr   | 人名     | 名词代码n和“人(ren)”的声母并在一起。                   | 1. 汉族人及与汉族起名方式相同的非汉族人的姓和名单独切分,并分别标注为nr。张/nr 仁伟/nr, 欧阳/nr 修/nr, 阮/nr 志雄/nr, 朴/nr 贞爱/nr汉族人除有单姓和复姓外,还有双姓,即有的女子出嫁后,在原来的姓上加上丈夫的姓。如:陈方安生。这种情况切分、标注为:陈/nr 方/nr 安生/nr;唐姜氏,切分、标注为:唐/nr 姜氏/nr。2. 姓名后的职务、职称或称呼要分开。江/nr 主席/n, 小平/nr 同志/n, 江/nr 总书记/n,张/nr 教授/n, 王/nr 部长/n, 陈/nr 老总/n, 李/nr 大娘/n, 刘/nr 阿姨/n, 龙/nr 姑姑/n3. 对人的简称、尊称等若为两个字,则合为一个切分单位,并标以nr。老张/nr, 大李/nr, 小郝/nr, 郭老/nr, 陈总/nr4. 明显带排行的亲属称谓要切分开,分不清楚的则不切开。三/m 哥/n, 大婶/n, 大/a 女儿/n, 大哥/n, 小弟/n, 老爸/n5. 一些著名作者的或不易区分姓和名的笔名通常作为一个切分单位。鲁迅/nr, 茅盾/nr, 巴金/nr, 三毛/nr, 琼瑶/nr, 白桦/nr6. 外国人或少数民族的译名(包括日本人的姓名)不予切分,标注为nr。克林顿/nr, 叶利钦/nr, 才旦卓玛/nr, 小林多喜二/nr, 北研二/nr,华盛顿/nr, 爱因斯坦/nr有些西方人的姓名中有小圆点,也不分开。卡尔·马克思/nr |
| 22   | ns   | 地名     | 名词代码n和处所词代码s并在一起。                       | 安徽/ns,深圳/ns,杭州/ns,拉萨/ns,哈尔滨/ns, 呼和浩特/ns, 乌鲁木齐/ns,长江/ns,黄海/ns,太平洋/ns, 泰山/ns, 华山/ns,亚洲/ns, 海南岛/ns,太湖/ns,白洋淀/ns, 俄罗斯/ns,哈萨克斯坦/ns,彼得堡/ns, 伏尔加格勒/ns 1. 国名不论长短,作为一个切分单位。中国/ns, 中华人民共和国/ns, 日本国/ns, 美利坚合众国/ns, 美国/ns2. 地名后有“省”、“市”、“县”、“区”、“乡”、“镇”、“村”、“旗”、“州”、“都”、“府”、“道”等单字的行政区划名称时,不切分开,作为一个切分单位。四川省/ns, 天津市/ns,景德镇/ns沙市市/ns, 牡丹江市/ns,正定县/ns,海淀区/ns, 通州区/ns,东升乡/ns, 双桥镇/ns 南化村/ns,华盛顿州/ns,俄亥俄州/ns,东京都/ns, 大阪府/ns,北海道/ns, 长野县/ns,开封府/ns,宣城县/ns3. 地名后的行政区划有两个以上的汉字,则将地名同行政区划名称切开,不过要将地名同行政区划名称用方括号括起来,并标以短语NS。[芜湖/ns 专区/n] NS,[宣城/ns 地区/n]ns,[内蒙古/ns 自治区/n]NS,[深圳/ns 特区/n]NS, [厦门/ns 经济/n 特区/n]NS, [香港/ns 特别/a 行政区/n]NS,[香港/ns 特区/n]NS, [华盛顿/ns 特区/n]NS,4. 地名后有表示地形地貌的一个字的普通名词,如“江、河、山、洋、海、岛、峰、湖”等,不予切分。鸭绿江/ns,亚马逊河/ns, 喜马拉雅山/ns, 珠穆朗玛峰/ns,地中海/ns,大西洋/ns,洞庭湖/ns, 塞普路斯岛/ns 5. 地名后接的表示地形地貌的普通名词若有两个以上汉字,则应切开。然后将地名同该普通名词标成短语NS。[台湾/ns 海峡/n]NS,[华北/ns 平原/n]NS,[帕米尔/ns 高原/n]NS, [南沙/ns 群岛/n]NS,[京东/ns 大/a 峡谷/n]NS [横断/b 山脉/n]NS6.地名后有表示自然区划的一个字的普通名词,如“ 街,路,道,巷,里,町,庄,村,弄,堡”等,不予切分。 中关村/ns,长安街/ns,学院路/ns, 景德镇/ns, 吴家堡/ns, 庞各庄/ns, 三元里/ns,彼得堡/ns, 北菜市巷/ns, 7.地名后接的表示自然区划的普通名词若有两个以上汉字,则应切开。然后将地名同自然区划名词标成短语NS。[米市/ns 大街/n]NS, [蒋家/nz 胡同/n]NS , [陶然亭/ns 公园/n]NS , 8. 大小地名相连时的标注方式为:北京市/ns 海淀区/ns 海淀镇/ns [南/f 大街/n]NS [蒋家/nz 胡同/n]NS 24/m 号/q , |
| 23   | nt   | 机构团体 | “团”的声母为t,名词代码n和t并在一起。                  | (参见2。短语标记说明--NT)联合国/nt,中共中央/nt,国务院/nt, 北京大学/nt1.大多数团体、机构、组织的专有名称一般是短语型的,较长,且含有地名或人名等专名,再组合,标注为短语NT。[中国/ns 计算机/n 学会/n]NT, [香港/ns 钟表业/n 总会/n]NT, [烟台/ns 大学/n]NT, [香港/ns 理工大学/n]NT, [华东/ns 理工大学/n]NT,[合肥/ns 师范/n 学院/n]NT, [北京/ns 图书馆/n]NT, [富士通/nz 株式会社/n]NT, [香山/ns 植物园/n]NT, [安娜/nz 美容院/n]NT,[上海/ns 手表/n 厂/n]NT, [永和/nz 烧饼铺/n]NT,[北京/ns 国安/nz 队/n]NT,2. 对于在国际或中国范围内的知名的唯一的团体、机构、组织的名称即使前面没有专名,也标为nt或NT。联合国/nt,国务院/nt,外交部/nt, 财政部/nt,教育部/nt, 国防部/nt,[世界/n 贸易/n 组织/n]NT, [国家/n 教育/vn 委员会/n]NT,[信息/n 产业/n 部/n]NT,[全国/n 信息/n 技术/n 标准化/vn 委员会/n]NT,[全国/n 总/b 工会/n]NT,[全国/n 人民/n 代表/n 大会/n]NT,美国的“国务院”,其他国家的“外交部、财政部、教育部”,必须在其所属国的国名之后出现时,才联合标注为NT。[美国/ns 国务院/n]NT,[法国/ns 外交部/n]NT,[美/j 国会/n]NT,日本有些政府机构名称很特别,无论是否出现在“日本”国名之后都标为nt。[日本/ns 外务省/nt]NT,[日/j 通产省/nt]NT通产省/nt 3. 前后相连有上下位关系的团体机构组织名称的处理方式如下:[联合国/nt 教科文/j 组织/n]NT, [中国/ns 银行/n 北京/ns 分行/n]NT,[河北省/ns 正定县/ns 西平乐乡/ns 南化村/ns 党支部/n]NT, 当下位名称含有专名(如“北京/ns 分行/n”、“南化村/ns 党支部/n”、“昌平/ns 分校/n”)时,也可脱离前面的上位名称单独标注为NT。[中国/ns 银行/n]NT [北京/ns 分行/n]NT,北京大学/nt [昌平/ns 分校/n]NT,4. 团体、机构、组织名称中用圆括号加注简称时:[宝山/ns 钢铁/n (/w 宝钢/j )/w 总/b 公司/n]NT,[宝山/ns 钢铁/n 总/b 公司/n]NT,(/w 宝钢/j )/w |
| 24   | nx   | 外文字符 | 外文字符。                                             | A/nx 公司/n ,B/nx 先生/n ,X/nx 君/Ng ,24/m K/nx 镀金/n ,C/nx 是/v 光速/n ,Windows98/nx ,PentiumIV/nx ,I LOVE THIS GAME/nx ,HanLP/nx |
| 25   | nz   | 其他专名 | “专”的声母的第1个字母为z,名词代码n和z并在一起。       | (参见2。短语标记说明--NZ)除人名、国名、地名、团体、机构、组织以外的其他专有名词都标以nz。满族/nz,俄罗斯族/nz,汉语/nz,罗马利亚语/nz, 捷克语/nz,中文/nz, 英文/nz, 满人/nz, 哈萨克人/nz, 诺贝尔奖/nz, 茅盾奖/nz, 1.包含专有名称(或简称)的交通线,标以nz;短语型的,标为NZ。津浦路/nz, 石太线/nz, [京/j 九/j 铁路/n]NZ, [京/j 津/j 高速/b 公路/n]NZ, 2. 历史上重要事件、运动等专有名称一般是短语型的,按短语型专有名称处理,标以NZ。[卢沟桥/ns 事件/n]NZ, [西安/ns 事变/n]NZ,[五四/t 运动/n]NZ, [明治/nz 维新/n]NZ,[甲午/t 战争/n]NZ,3.专有名称后接多音节的名词,如“语言”、“文学”、“文化”、“方式”、“精神”等,失去专指性,则应分开。欧洲/ns 语言/n, 法国/ns 文学/n, 西方/ns 文化/n, 贝多芬/nr 交响乐/n, 雷锋/nr 精神/n, 美国/ns 方式/n,日本/ns 料理/n, 宋朝/t 古董/n 4. 商标(包括专名及后接的“牌”、“型”等)是专指的,标以nz,但其后所接的商品仍标以普通名词n。康师傅/nr 方便面/n, 中华牌/nz 香烟/n, 牡丹III型/nz 电视机/n, 联想/nz 电脑/n, 鳄鱼/nz 衬衣/n, 耐克/nz 鞋/n5. 以序号命名的名称一般不认为是专有名称。2/m 号/q 国道/n ,十一/m 届/q 三中全会/j如果前面有专名,合起来作为短语型专名。[中国/ns 101/m 国道/n]NZ, [中共/j 十一/m 届/q 三中全会/j]NZ,6. 书、报、杂志、文档、报告、协议、合同等的名称通常有书名号加以标识,不作为专有名词。由于这些名字往往较长,名字本身按常规处理。《/w 宁波/ns 日报/n 》/w ,《/w 鲁迅/nr 全集/n 》/w,中华/nz 读书/vn 报/n, 杜甫/nr 诗选/n,少数书名、报刊名等专有名称,则不切分。红楼梦/nz, 人民日报/nz,儒林外史/nz 7. 当有些专名无法分辨它们是人名还是地名或机构名时,暂标以nz。[巴黎/ns 贝尔希/nz 体育馆/n]NT,其中“贝尔希”只好暂标为nz。 |
| 26   | o    | 拟声词   | 取英语拟声词onomatopoeia的第1个字母。                  | 哈哈/o 一/m 笑/v ,装载机/n 隆隆/o 推进/v ,                 |
| 27   | p    | 介词     | 取英语介词prepositional的第1个字母。                   | 对/p 子孙后代/n 负责/v ,以/p 煤/n 养/v 农/Ng ,为/p 治理/v 荒山/n 服务/v , 把/p 青年/n 推/v 上/v 了/u 领导/vn 岗位/n , |
| 28   | q    | 量词     | 取英语quantity的第1个字母。                            | (参见数词m)首/m 批/q ,一/m 年/q ,                        |
| 29   | Rg   | 代语素   | 代词性语素。代词代码为r,在语素的代码g前面置以R。      | 读者/n 就/d 是/v 这/r 两/m 棵/q 小树/n 扎根/v 于/p 斯/Rg 、/w 成长/v 于/p 斯/Rg 的/u 肥田/n 沃土/n , |
| 30   | r    | 代词     | 取英语代词pronoun的第2个字母,因p已用于介词。          | 单音节代词“本”、“每”、“各”、“诸”后接单音节名词时,和后接的单音节名词合为代词;当后接双音节名词时,应予切分。本报/r, 每人/r, 本社/r, 本/r 地区/n, 各/r 部门/n |
| 31   | s    | 处所词   | 取英语space的第1个字母。                               | 家里/s 的/u 电脑/n 都/d 联通/v 了/u 国际/n 互联网/n ,西部/s 交通/n 咽喉/n , |
| 32   | Tg   | 时语素   | 时间词性语素。时间词代码为t,在语素的代码g前面置以T。  | 3日/t 晚/Tg 在/p 总统府/n 发表/v 声明/n ,尊重/v 现/Tg 执政/vn 当局/n 的/u 权威/n , |
| 33   | t    | 时间词   | 取英语time的第1个字母。                                | 1. 年月日时分秒,按年、月、日、时、分、秒切分,标注为t 。1997年/t 3月/t 19日/t 下午/t 2时/t 18分/t若数字后无表示时间的“年、月、日、时、分、秒”等的标为数词m。1998/m 中文/n 信息/n 处理/vn 国际/n 会议/n 2. 历史朝代的名称虽然有专有名词的性质,仍标注为t。西周/t, 秦朝/t, 东汉/t, 南北朝/t, 清代/t“牛年、虎年”等一律不予切分,标注为:牛年/t, 虎年/t, 甲午年/t, 甲午/t 战争/n, 庚子/t 赔款/n, 戊戌/t 变法/n |
| 34   | u    | 助词     | 取英语助词auxiliary。                                  | [[俄罗斯/ns 和/c 北约/j]NP-BL 之间/f [战略/n 伙伴/n 关系/n]NP 的/u 建立/vn]NP 填平/v 了/u [[欧洲/ns 安全/a 政治/n]NP 的/u 鸿沟/n]NP |
| 35   | Vg   | 动语素   | 动词性语素。动词代码为v。在语素的代码g前面置以V。      | 洗/v 了/u 一个/m 舒舒服服/z 的/u 澡/Vg                       |
| 36   | v    | 动词     | 取英语动词verb的第一个字母。                           | (参见 名词--n)[[[欧盟/j 扩大/v]S 的/u [历史性/n 决定/n]NP]NP 和/c [北约/j 开放/v]S]NP-BL [为/p [创建/v [一/m 种/q 新/a 的/u 欧洲/ns 安全/a 格局/n]NP]VP-SBI]PP-MD [奠定/v 了/u 基础/n]V-SBI , |
| 37   | vd   | 副动词   | 直接作状语的动词。动词和副词的代码并在一起。           | 形势/n 会/v 持续/vd 好转/v ,认为/v 是/v 电话局/n 收/v 错/vd 了/u 费/n , |
| 38   | vn   | 名动词   | 指具有名词功能的动词。动词和名词的代码并在一起。       | 引起/v 人们/n 的/u 关注/vn 和/c 思考/vn ,收费/vn 电话/n 的/u 号码/n , |
| 39   | w    | 标点符号 |                                                        | ”/w :/w                                                     |
| 40   | x    | 非语素字 | 非语素字只是一个符号,字母x通常用于代表未知数、符号。  |                                                              |
| 41   | Yg   | 语气语素 | 语气词性语素。语气词代码为y。在语素的代码g前面置以Y。  | 唯/d 大力/d 者/k 能/v 致/v 之/u 耳/Yg                        |
| 42   | y    | 语气词   | 取汉字“语”的声母。                                     | 会/v 泄露/v 用户/n 隐私/n 吗/y ,又/d 何在/v 呢/y ?         |
| 43   | z    | 状态词   | 取汉字“状”的声母的前一个字母。                         | 取得/v 扎扎实实/z 的/u 突破性/n 进展/vn ,四季/n 常青/z 的/u 热带/n 树木/n ,短短/z 几/m 年/q 间, |

================================================
FILE: docs/annotations/pos/ud.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Universal Dependencies

See also [Universal Dependencies](https://universaldependencies.org/u/pos/).

| Tag        | Description                                  |
|------------|----------------------------------------------|
| ADJ   | adjective                 |
| ADP   | adposition                |
| ADV   | adverb                    |
| AUX   | auxiliary                 |
| CCONJ | coordinating conjunction  |
| DET   | determiner                |
| INTJ  | interjection              |
| NOUN  | noun                      |
| NUM   | numeral                   |
| PART  | particle                  |
| PRON  | pronoun                   |
| PROPN | proper noun               |
| PUNCT | punctuation               |
| SCONJ | subordinating conjunction |
| SYM   | symbol                    |
| VERB  | verb                      |
| X     | other                     |
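
As a quick reference aid, the table above can be embedded directly in code. Below is a minimal sketch that maps UD tags produced by a tagger to their descriptions; the `UD_POS` dict and `describe` helper are illustrative, not part of HanLP's API.

```python
# The 17 universal POS tags and their descriptions, as listed above.
UD_POS = {
    "ADJ": "adjective", "ADP": "adposition", "ADV": "adverb",
    "AUX": "auxiliary", "CCONJ": "coordinating conjunction",
    "DET": "determiner", "INTJ": "interjection", "NOUN": "noun",
    "NUM": "numeral", "PART": "particle", "PRON": "pronoun",
    "PROPN": "proper noun", "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction", "SYM": "symbol",
    "VERB": "verb", "X": "other",
}

def describe(tags):
    """Map a sequence of UD tags to human-readable descriptions.

    Unknown tags fall back to "other", matching the X category.
    """
    return [UD_POS.get(t, "other") for t in tags]
```

For example, `describe(["PROPN", "VERB"])` returns `["proper noun", "verb"]`.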

================================================
FILE: docs/annotations/sdp/dm.md
================================================
# The reduction of Minimal Recursion Semantics

Please refer to [Minimal Recursion Semantics: An Introduction](https://www.cl.cam.ac.uk/~aac10/papers/mrs.pdf).


================================================
FILE: docs/annotations/sdp/index.md
================================================
# Semantic Dependency Parsing

## Chinese

```{toctree}
semeval16
```

## English

```{toctree}
dm
pas
psd
```



================================================
FILE: docs/annotations/sdp/pas.md
================================================
# Predicate-Argument Structures

Please refer to [Probabilistic disambiguation models for wide-coverage HPSG parsing](https://www.aclweb.org/anthology/P05-1011.pdf).


================================================
FILE: docs/annotations/sdp/psd.md
================================================
# Prague Czech-English Dependency Treebank

Please refer to [Prague Czech-English Dependency Treebank](http://ufal.mff.cuni.cz/pcedt2.0/en/index.html).


================================================
FILE: docs/annotations/sdp/semeval16.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# SemEval2016

## CSDP

SemEval-2016 adopts the CSDP guidelines listed below.

### 语义关系标注标签集

| 大类         | 小类         | 粗粒度关系      | 细粒度关系                                                   |
| ------------ | ------------ | --------------- | ------------------------------------------------------------ |
| 语义周边角色 | 主体角色     | 施事AGT;       | 施事Agt;感事Aft                                             |
|              |              | 当事EXP;       | 当事Exp;领事Poss                                            |
|              | 客体角色     | 受事PAT;       | 受事Pat                                                      |
|              |              | 客事CONT;      | 客事Cont;成事Prod;结局Cons                                 |
|              |              | 涉事DATV;      | 涉事Datv;比较Comp;源事Orig                                 |
|              |              | 系事LINK;      | 类事Clas;属事Belg                                           |
|              | 情境角色     | 工具TOOL;      | 工具Tool                                                     |
|              |              | 材料MATL;      | 材料Matl                                                     |
|              |              | 方式MANN;      | 方式Mann;依据Accd                                           |
|              |              | 范围SCO;       | 范围Sco                                                      |
|              |              | 缘由REAS;      | 缘故Reas;意图Int                                            |
|              |              | 时间TIME;      | 时间Time;时间起点Tini;时间终点Tfin;时段Tdur;时距Trang    |
|              |              | 空间LOC;       | 空间Loc;原处所Lini;终处所Lfin;通过处所Lthru;趋向Dir      |
|              |              | 度量MEAS;      | 数量Quan;起始量Nini;终止量Nfin;数量短语Qp;频率Freq;顺序Seq;变化量Nvar |
|              |              | 状态STAT;      | 状态Stat;起始状态Sini;终止状态Sfin;历经状态Sproc          |
|              |              | 修饰FEAT;      | 描写Desc;宿主Host;名词修饰语Nmod;时间修饰语Tmod           |
| 语义结构关系 | 反关系       | 反施事rAGT;    | 反施事rAgt;反感事rAft                                       |
|              |              | 反当事rEXP。    | 反当事rExp;反领事rPoss                                      |
|              |              | 反受事rPAT;    | 反受事rPat                                                   |
|              |              | 反客事rCONT;   | 反客事rCont;反成事rProd;反结局rCons                        |
|              |              | 反涉事rDATV;   | 反涉事rDatv;反比较rComp;反源事rOrig                        |
|              |              | 反系事rLINK。   | 反类事rClas;反属事rBelg                                     |
|              |              | 反工具rTOOL;   | 反工具rTool                                                  |
|              |              | 反材料rMATL;   | 反材料rMatl                                                  |
|              |              | 反方式rMANN;   | 反方式rMann;反依据rAccd                                     |
|              |              | 反范围rSCO;    | 反范围rSco                                                   |
|              |              | 反缘由rREAS;   | 反缘故rReas;反意图rInt                                      |
|              |              | 反时间rTIME;   | 反时间rTime;反时间起点rTini;反时间终点rTfin;反时段rTdur;反时距rTrang |
|              |              | 反空间rLOC;    | 反空间rLoc;反原处所rLini;反终处所rLfin;反通过处所rLthru;反趋向rDir |
|              |              | 反度量rMEAS;   | 反数量rQuan;反起始量rNini;反终止量rNfin;反数量短语rQp;反频率rFreq;反顺序rSeq;反变化量rNvar |
|              |              | 反状态rSTAT;   | 反状态rStat;反起始状态rSini;反终止状态rSfin;反历经状态rSproc |
|              |              | 反修饰rFEAT;   | 反描写rDesc;反宿主rHost; 反名词修饰语rNmod; 反时间修饰语rTmod |
|              | 嵌套事件关系 | 嵌套施事dAGT;  | 嵌套施事dAgt;嵌套感事dAft                                   |
|              |              | 嵌套当事dEXP。  | 嵌套当事dExp;嵌套领事dPoss                                  |
|              |              | 嵌套受事dPAT;  | 嵌套受事dPat                                                 |
|              |              | 嵌套客事dCONT; | 嵌套客事dCont;嵌套成事dProd;嵌套结局dCons                  |
|              |              | 嵌套涉事dDATV; | 嵌套涉事dDatv;嵌套比较dComp;嵌套源事dOrig                  |
|              |              | 嵌套系事dLINK。 | 嵌套类事dClas;嵌套属事dBelg                                 |
|              |              | 嵌套工具dTOOL; | 嵌套工具dTool                                                |
|              |              | 嵌套材料dMATL; | 嵌套材料dMatl                                                |
|              |              | 嵌套方式dMANN; | 嵌套方式dMann;嵌套依据dAccd                                 |
|              |              | 嵌套范围dSCO;  | 嵌套范围dSco                                                 |
|              |              | 嵌套缘由dREAS; | 嵌套缘故dReas;嵌套意图dInt                                  |
|              |              | 嵌套时间dTIME; | 嵌套时间dTime;嵌套时间起点dTini;嵌套时间终点dTfin;嵌套时段dTdur;嵌套时距dTrang |
|              |              | 嵌套空间dLOC;  | 嵌套空间dLoc;嵌套原处所dLini;嵌套终处所dLfin;嵌套通过处所dLthru;嵌套趋向dDir |
|              |              | 嵌套度量dMEAS; | 嵌套数量dQuan;嵌套起始量dNini;嵌套终止量dNfin;嵌套数量短语dQp;嵌套频率dFreq;嵌套顺序dSeq;嵌套变化量dNvar |
|              |              | 嵌套状态dSTAT; | 嵌套状态dStat;嵌套起始状态dSini;嵌套终止状态dSfin;嵌套历经状态dSproc |
|              |              | 嵌套修饰dFEAT; | 嵌套描写dDesc;嵌套宿主dHost; 嵌套名词修饰语dNmod; 嵌套时间修饰语dTmod |
|              | 事件关系     | 并列关系eCOO;  | 并列eCoo;等同eEqu;分叙eRect;选择eSelt;割舍eAban;选取ePref;总括eSum |
|              |              | 先行关系ePREC; | 先行ePrec;原因eCau;条件eCond;假设eSupp;手段eMetd;让步eConc |
|              |              | 后继关系eSUCC; | 后继eSucc;递进eProg;转折 eAdvt;目的ePurp;结果eResu;推论eInf |
| 语义依附标记 | 标点标记     | 标点标记mPUNC; | 标点标记mPunc                                                |
|              | 依附标记     | 否定标记mNEG;  | 否定标记mNeg                                                 |
|              |              | 关系标记mRELA; | 连词标记mConj;介词标记mPrep                                 |
|              |              | 依附标记mDEPD; | 语气标记mTone;时间标记mTime;范围标记mRang;情态标记mMod;频率标记mFreq;程度标记mDegr;趋向标记mDir;的字标记mAux;多数标记mMaj;插入语标记mPars;离合标记mSepa;实词虚化标记mVain;重复标记mRept |

## SemEval2016

The following table covers a subset of CSDP and provides examples to illustrate each relation.

| 关系类型   | Tag           | Description        | Example                     |
|--------|---------------|--------------------|-----------------------------|
| 施事关系   | Agt           | Agent              | 我送她一束花 (我 <– 送)             |
| 当事关系   | Exp           | Experiencer        | 我跑得快 (跑 –> 我)               |
| 感事关系   | Aft           | Affection          | 我思念家乡 (思念 –> 我)             |
| 领事关系   | Poss          | Possessor          | 他有一本好书 (他 <– 有)             |
| 受事关系   | Pat           | Patient            | 他打了小明 (打 –> 小明)             |
| 客事关系   | Cont          | Content            | 他听到鞭炮声 (听 –> 鞭炮声)           |
| 成事关系   | Prod          | Product            | 他写了本小说 (写 –> 小说)            |
| 源事关系   | Orig          | Origin             | 我军缴获敌人四辆坦克 (缴获 –> 坦克)       |
| 涉事关系   | Datv          | Dative             | 他告诉我个秘密 ( 告诉 –> 我 )         |
| 比较角色   | Comp          | Comitative         | 他成绩比我好 (他 –> 我)             |
| 属事角色   | Belg          | Belongings         | 老赵有俩女儿 (老赵 <– 有)            |
| 类事角色   | Clas          | Classification     | 他是中学生 (是 –> 中学生)            |
| 依据角色   | Accd          | According          | 本庭依法宣判 (依法 <– 宣判)           |
| 缘故角色   | Reas          | Reason             | 他在愁女儿婚事 (愁 –> 婚事)           |
| 意图角色   | Int           | Intention          | 为了金牌他拼命努力 (金牌 <– 努力)        |
| 结局角色   | Cons          | Consequence        | 他跑了满头大汗 (跑 –> 满头大汗)         |
| 方式角色   | Mann          | Manner             | 球慢慢滚进空门 (慢慢 <– 滚)           |
| 工具角色   | Tool          | Tool               | 她用砂锅熬粥 (砂锅 <– 熬粥)           |
| 材料角色   | Matl          | Material           | 她用小米熬粥 (小米 <– 熬粥)           |
| 时间角色   | Time          | Time               | 唐朝有个李白 (唐朝 <– 有)            |
| 空间角色   | Loc           | Location           | 这房子朝南 (朝 –> 南)              |
| 历程角色   | Proc          | Process            | 火车正在过长江大桥 (过 –> 大桥)         |
| 趋向角色   | Dir           | Direction          | 部队奔向南方 (奔 –> 南)             |
| 范围角色   | Sco           | Scope              | 产品应该比质量 (比 –> 质量)           |
| 数量角色   | Quan          | Quantity           | 一年有365天 (有 –> 天)            |
| 数量短语角色 | Qp            | Quantity-phrase    | 三本书 (三 –> 本)                |
| 频率角色   | Freq          | Frequency          | 他每天看书 (每天 <– 看)             |
| 顺序角色   | Seq           | Sequence           | 他跑第一 (跑 –> 第一)              |
| 描写角色   | Desc(Feat)    | Description        | 他长得胖 (长 –> 胖)               |
| 宿主角色   | Host          | Host               | 住房面积 (住房 <– 面积)             |
| 名字修饰角色 | Nmod          | Name-modifier      | 果戈里大街 (果戈里 <– 大街)           |
| 时间修饰角色 | Tmod          | Time-modifier      | 星期一上午 (星期一 <– 上午)           |
| 反角色    | r + main role |                    | 打篮球的小姑娘 (打篮球 <– 姑娘)         |
| 嵌套角色   | d + main role |                    | 爷爷看见孙子在跑 (看见 –> 跑)          |
| 并列关系   | eCoo          | event Coordination | 我喜欢唱歌和跳舞 (唱歌 –> 跳舞)         |
| 选择关系   | eSelt         | event Selection    | 您是喝茶还是喝咖啡 (茶 –> 咖啡)         |
| 等同关系   | eEqu          | event Equivalent   | 他们三个人一起走 (他们 –> 三个人)        |
| 先行关系   | ePrec         | event Precedent    | 首先,先                        |
| 顺承关系   | eSucc         | event Successor    | 随后,然后                       |
| 递进关系   | eProg         | event Progression  | 况且,并且                       |
| 转折关系   | eAdvt         | event Adversative  | 却,然而                        |
| 原因关系   | eCau          | event Cause        | 因为,既然                       |
| 结果关系   | eResu         | event Result       | 因此,以致                       |
| 推论关系   | eInf          | event Inference    | 才,则                         |
| 条件关系   | eCond         | event Condition    | 只要,除非                       |
| 假设关系   | eSupp         | event Supposition  | 如果,要是                       |
| 让步关系   | eConc         | event Concession   | 纵使,哪怕                       |
| 手段关系   | eMetd         | event Method       |                             |
| 目的关系   | ePurp         | event Purpose      | 为了,以便                       |
| 割舍关系   | eAban         | event Abandonment  | 与其,也不                       |
| 选取关系   | ePref         | event Preference   | 不如,宁愿                       |
| 总括关系   | eSum          | event Summary      | 总而言之                        |
| 分叙关系   | eRect         | event Recount      | 例如,比方说                      |
| 连词标记   | mConj         | Conjunction        | 和,或                         |
| 的字标记   | mAux          | Auxiliary          | 的,地,得                       |
| 介词标记   | mPrep         | Preposition        | 把,被                         |
| 语气标记   | mTone         | Tone               | 吗,呢                         |
| 时间标记   | mTime         | Time               | 才,曾经                        |
| 范围标记   | mRang         | Range              | 都,到处                        |
| 程度标记   | mDegr         | Degree             | 很,稍微                        |
| 频率标记   | mFreq         | Frequency Marker   | 再,常常                        |
| 趋向标记   | mDir          | Direction Marker   | 上去,下来                       |
| 插入语标记  | mPars         | Parenthesis Marker | 总的来说,众所周知                   |
| 否定标记   | mNeg          | Negation Marker    | 不,没,未                       |
| 情态标记   | mMod          | Modal Marker       | 幸亏,会,能                      |
| 标点标记   | mPunc         | Punctuation Marker | ,。!                         |
| 重复标记   | mRept         | Repetition Marker  | 走啊走 (走 –> 走)                |
| 多数标记   | mMaj          | Majority Marker    | 们,等                         |
| 实词虚化标记 | mVain         | Vain Marker        |                             |
| 离合标记   | mSepa         | Separation Marker  | 吃了个饭 (吃 –> 饭) 洗了个澡 (洗 –> 澡) |
| 根节点    | Root          | Root               | 全句核心节点                      |

See also [SemEval-2016 Task 9](https://www.hankcs.com/nlp/sdp-corpus.html) and [CSDP](https://csdp-doc.readthedocs.io/zh_CN/latest/%E9%99%84%E5%BD%95/).
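
The arrow notation in the examples (e.g. 我 <– 送) denotes a labeled arc between a head and a dependent. A minimal sketch of storing such a graph as triples follows; the sentence matches the Agt example above, while the two extra arcs and the `arcs_of` helper are illustrative assumptions, not gold annotations.

```python
# 我送她一束花: each arc is (head, dependent, relation).
graph = [
    ("送", "我", "Agt"),   # agent: 我 <- 送, as in the Agt example above
    ("送", "她", "Datv"),  # illustrative: the recipient as a dative role
    ("送", "花", "Pat"),   # illustrative: the thing given as a patient
]

def arcs_of(relation, graph):
    """Return the (head, dependent) pairs carrying the given relation."""
    return [(h, d) for h, d, r in graph if r == relation]
```

Unlike a dependency tree, an SDP graph may attach one token to several heads, which is why a flat list of triples is a natural container.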


================================================
FILE: docs/annotations/srl/cpb.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# Chinese Proposition Bank

|      | 标签       | 角色    | 例子                      |
|------|----------|-------|-------------------------|
| 中心角色 | ARG0     | 施事者   | (ARG0中国政府)提供援助         |
|      | ARG1     | 受事者   | 中国政府提供(ARG1援助)          |
|      | ARG2     | 依谓词而定 | 失业率控制(ARG2在百分之十内)       |
|      | ARG3     | 依谓词而定 | (ARG3从城市)扩大到农村          |
|      | ARG4     | 依谓词而定 | 提高(ARG4百分之二十)          |
| 附属角色 | ARGM-ADV | 状语    | (ARGM-ADV共同)承担          |
|      | ARGM-BNF | 受益者   | (ARGM-BNF为其他国家)进行融资     |
|      | ARGM-CND | 条件    | (ARGM-CND如果成功),他就留下     |
|      | ARGM-DIR | 方向    | (ARGM-DIR向和平)迈出一大步      |
|      | ARGM-EXT | 范围    | 在北京逗留(ARGM-EXT两天)      |
|      | ARGM-FRQ | 频率    | 每半年执行(ARGM-FRQ一次)      |
|      | ARGM-LOC | 地点、位置 | (ARGM-LOC在机场)被捕获        |
|      | ARGM-MNR | 方式    | (ARGM-MNR以中英文)发行        |
|      | ARGM-PRP | 目的或原因 | (ARGM-PRP由于危机)而破产       |
|      | ARGM-TMP | 时间    | 公司(ARGM-TMP去年)成立       |
|      | ARGM-TPC | 主题    | (ARGM-TPC稳定政策),核心是...   |
|      | ARGM-DIS | 话语标记  | (ARGM-DIS因此),他感到不公      |
|      | ARGM-CRD | 并列论元  | (ARGM-CRD与台湾)非正式接触      |
|      | ARGM-PRD | 次谓词   | 指控廉政公署五人(ARGM-PRD接受贿赂) |


```{note}
Although ARG0 and ARG1 share general definitions across all predicates, word sense disambiguation is required to find 
the corresponding definition of each semantic role. Given the word sense of `变化`, say `变化-2`, 
[its second frameset](http://verbs.colorado.edu/chinese/cpb/html_frames/0183-bian-hua.html) can 
be looked up, which defines the following 2 arguments:

1.    ARG0: agent/cause
2.    ARG1: entity arg0 changes

These definitions are different from those of frameset `变化-1`:

1.    ARG0: entity undergoing change

The number of arguments and their definitions can vary considerably across framesets. 
In summary, word sense disambiguation is essential if SRL is to be used to best effect in practical applications.
```
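
The frameset-dependent definitions in the note can be sketched as a simple lookup, using the two `变化` framesets quoted above; the dictionary layout and the `role_definition` helper are illustrative, not CPB's actual file format.

```python
# Role definitions keyed by word sense (frameset), per the note above:
# the same label (ARG0) means different things under different framesets.
FRAMESETS = {
    "变化-1": {"ARG0": "entity undergoing change"},
    "变化-2": {"ARG0": "agent/cause", "ARG1": "entity arg0 changes"},
}

def role_definition(frameset, label):
    """Resolve an SRL label to its meaning under a specific word sense.

    Returns None when the frameset does not define that argument.
    """
    return FRAMESETS[frameset].get(label)
```

This is why an SRL label alone is not self-describing: `变化-1` defines no ARG1 at all, so the lookup must go through the disambiguated sense first.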

================================================
FILE: docs/annotations/srl/index.md
================================================
# Semantic Role Labeling

## Chinese
```{toctree}
cpb
```

## English
```{toctree}
propbank
```



================================================
FILE: docs/annotations/srl/propbank.md
================================================
<!--
# ========================================================================
# Copyright 2020 hankcs
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
# ========================================================================
-->

# English PropBank

| Role | Description                                  |
|------|----------------------------------------|
| ARG0 | agent                                  |
| ARG1 | patient                                |
| ARG2 | instrument, benefactive, attribute     |
| ARG3 | starting point, benefactive, attribute |
| ARG4 | ending point                           |
| ARGM | modifier                               |
| COM  | Comitative                             |
| LOC  | Locative                               |
| DIR  | Directional                            |
| GOL  | Goal                                   |
| MNR  | Manner                                 |
| TMP  | Temporal                               |
| EXT  | Extent                                 |
| REC  | Reciprocals                            |
| PRD  | Secondary Predication                  |
| PRP  | Purpose                                |
| CAU  | Cause                                  |
| DIS  | Discourse                              |
| ADV  | Adverbials                             |
| ADJ  | Adjectival                             |
| MOD  | Modal                                  |
| NEG  | Negation                               |
| DSP  | Direct Speech                          |
| LVB  | Light Verb                             |
| CXN  | Construction                           |
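
The numbered arguments and the modifier function tags above (the latter attach to ARGM, e.g. `ARGM-TMP`) can be turned into a small lookup. A sketch with names invented for illustration:

```python
# Sketch: map PropBank role labels to the descriptions in the table above.
# Function tags are assumed to appear as "ARGM-XXX" composite labels.

FUNCTION_TAGS = {
    "COM": "Comitative", "LOC": "Locative", "DIR": "Directional",
    "GOL": "Goal", "MNR": "Manner", "TMP": "Temporal", "EXT": "Extent",
    "REC": "Reciprocals", "PRD": "Secondary Predication", "PRP": "Purpose",
    "CAU": "Cause", "DIS": "Discourse", "ADV": "Adverbials",
    "ADJ": "Adjectival", "MOD": "Modal", "NEG": "Negation",
    "DSP": "Direct Speech", "LVB": "Light Verb", "CXN": "Construction",
}
NUMBERED = {
    "ARG0": "agent",
    "ARG1": "patient",
    "ARG2": "instrument, benefactive, attribute",
    "ARG3": "starting point, benefactive, attribute",
    "ARG4": "ending point",
}

def describe(label):
    if label in NUMBERED:
        return NUMBERED[label]
    if label.startswith("ARGM-"):
        return FUNCTION_TAGS.get(label[5:], "modifier")
    return FUNCTION_TAGS.get(label, label)

print(describe("ARG0"))      # agent
print(describe("ARGM-TMP"))  # Temporal
```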



================================================
FILE: docs/annotations/tok/ctb.md
================================================
The Segmentation Guidelines for the Penn Chinese Treebank (3.0)
===============================================================

Fei Xia

*University of Pennsylvania*

This is an OCR version. See also the [PDF version](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1038&context=ircs_reports).

## Abstract


This document describes the segmentation guidelines for the Penn Chinese Treebank Project. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The Chinese Treebank has been released via the Linguistic Data Consortium (LDC) and is available to the public.

The segmentation guidelines have been revised several times during the two-year period of the project. The previous two versions were completed in December 1998 and March 1999, respectively. This document is the third and final version. We have added an introduction chapter in order to explain some rationale behind certain decisions in the guidelines. We also include the English gloss to the Chinese words in the guidelines.


In this document, we first discuss the notion of word and tests for wordhood that have been proposed in the literature. Then we give the specification for word segmentation. The specification is organized according to the potential Part-of-Speech tag of an expression and the internal structure of the expression. Next, we specify the treatment for some common collocations. Finally, we compare our guidelines with two segmentation standards: the first (Liu et al., 1993) is used in Mainland China and the second (CKIP, 1996) is used in Academia Sinica in Taiwan.

## Chapter 1 Introduction

This document is designed for the Penn Chinese Treebank Project [XPX+ 00]. The goal of the project is the creation of a 100-thousand-word corpus of Mandarin Chinese text with syntactic bracketing. The annotation consists of two stages: the first stage is word segmentation and part-of-speech (POS) tagging and the second stage is syntactic bracketing. Each stage includes at least two passes, that is, the data are annotated by one annotator, then the resulting files are checked by another annotator. 

The segmentation guidelines, like the POS guidelines and bracketing guidelines, have been revised several times during the project. So far, we have released all three versions on our web site: the first draft was completed in December 1998, after the first pass of word segmentation and POS tagging; the second draft in March 1999, after the second pass of word segmentation and POS tagging. This document, which is the third draft, is revised after the second pass of bracketing. The major changes in the third draft, compared with the previous two drafts, are (1) we add an introduction chapter in order to explain some rationale behind the guidelines, (2) we add the gloss to the Chinese words in the guidelines, and (3) we also turn the guidelines into a technical report, which is published by the Institute for Research in Cognitive Science (IRCS) of the University of Pennsylvania.

### 1.1 Notion of *word*

The difficulty in defining the notion of word is not unique to Chinese, but the problem is certainly more severe for Chinese for a number of reasons. First, Chinese is not written with word delimiters so segmenting a sentence into "words" is not a natural task even for a native speaker. Second, Chinese has little inflectional morphology to ease word identification. Third, there is little consensus in the community on difficult constructions that could affect word segmentation. For instance, the segmentation of verb resultative compounds depends on the syntactic analysis of the construction. One view on how a verb resultative compound is formed says that a simple sentence with a compound is actually bi-clausal and the compound is formed by movement, therefore, the compound should be treated as two words. Another view believes that the compound is formed in the lexicon, and therefore should be one word. The segmentation of the verb resultative compounds depends on which view we adopt for this construction. Fourth, many monosyllabic morphemes that used to be able to stand alone in non-Modern Chinese become bound in Modern Chinese. The influence of non-Modern Chinese makes it difficult to draw the line between bound morphemes and free morphemes, the notions which could otherwise have been very useful for deciding word boundaries.


Our approach is based on both linguistic and engineering consideration. The notion word in our Treebank is roughly a syntactic atom as defined in [SW87], that is, anything that can be inserted into an X° position in syntax. This includes both compounds and simple words.

### 1.2 Tests of wordhood


What tests can be used to decide whether a string of hanzi [Chinese characters] is a word or not? Without loss of generality, we assume the string that we are trying to segment is X-Y, which has two morphemes X and Y. The following tests for establishing word boundaries have been proposed by various authors:


- Bound morpheme: a bound morpheme should be attached to its neighboring morpheme to form a word when possible.


- Productivity: if a rule that combines the expression X-Y does not apply generally (i.e., it is not productive), then X-Y is likely to be a word.


- Frequency of co-occurrence: if the expression X-Y occurs very often, it is likely to be a word.


- Complex internal structure: strings with complex internal structures should be segmented when possible.


- Compositionality: if the meaning of X-Y is not compositional, it is likely to be a word.


- Insertion: if another morpheme can be inserted between X and Y, then X-Y is unlikely to be a word.


- XP-substitution: if a morpheme can not be replaced by a phrase of the same type, then it is likely to be part of a word.


- The number of syllables: several guidelines [LTS93, Chi96] have used syllable numbers in certain cases. For example, in [LTS93], a verb resultative compound is treated as one word if the resultative part is monosyllabic, and it is treated as two words if the resultative part has more than one syllable.


All of these tests are very useful. However, none of them is sufficient by itself for covering the entire range of difficult cases. Either the test is applicable only to limited cases (e.g., the XP-substitution test) or there is no objective way to perform the test as the test refers to vaguely defined properties (e.g., in the productivity test, it is not clear where to draw the line between a productive rule and a non-productive rule). For more discussion on this topic from the linguistics point of view, please refer to [Pac98, SW87].


Since no single test is sufficient, we chose a set of tests for our segmentation guidelines which includes all of the ones mentioned except for the productivity test and the frequency test. Rather than have the annotators try to memorize the entire set and make each decision from these principles, in the guidelines we spell out what the results of applying the tests would be for all of the relevant phenomena. For example, for the treatment of verb resultative compounds, we select the relevant tests (e.g., the number of syllables and the insertion test), and give several examples of the results of applying these tests to verb resultative compounds. This makes it straightforward, and thus efficient, for the annotators to follow the guidelines.

### 1.3 Compatibility with other guidelines


We have studied other groups' guidelines, such as the Segmentation Standard in China [LTS93] and the one in Taiwan [Chi96], and tried to accommodate them in our guidelines when possible.


Since the final result of the Treebank is a list of bracketed sentences, our guidelines have some flexibility with regard to the segmentation of certain constructions. For example, the string 走上来[walk up] is treated as two segments in [LTS93], but one segment in [Chi96]. In our Treebank, we will segment it into two parts, and then group them together as a compound, that is, (走[walk]/V 上来[up]/V)/V. We call 走上来 a word with internal structure. Our annotation, in this case, is compatible with both [LTS93] and [Chi96]. The comparisons of these three guidelines can be found in Appendix A.


Note: For the sake of annotation efficiency, the grouping of the words with internal structure is done at the bracketing stage, rather than at the segmentation stage. In this document, we show the grouping format, but keep in mind that the format is the one AFTER the bracketing is completed. For example, we consider 走上来[walk up] as one word. It is segmented into “走[walk]/V 上来[up]/V” at the segmentation stage, and it will be grouped into (走[walk]/V 上来[up]/V)/V at the bracketing stage. In this paper, we just say 走上来[walk up] should be annotated as (走[walk]/V 上来[up]/V)/V.


Most disagreements among these three guidelines do not make much difference to parsing or sentence interpretation. For most patterns for which the guidelines give different treatments (e.g., numbers and reduplication strings), simple conversion programs can be written to convert the data from one format to another.


Our goal is: in the final output, the word boundary (the highest-level X° in the parse tree) should be as accurate as possible, while the internal structure serves as a bridge for the resource sharing with other systems.

### 1.4 Treatment for unclear cases

There are two types of unclear cases:

- A construction is easy to identify but there is no consensus on its treatment.
  Ex: A-not-A, V-de construction, V-R, potential form (i.e., V-de-R).
  Our approach: we will choose one analysis, and annotate the data according to that analysis. Make sure that the annotation is easy to convert to the structures for other analyses if necessary.
- Two constructions are difficult to tell apart by existing tests.
  Ex: some N+N are compounds, others are phrases.


Our approach: for the sake of consistency and efficiency, we don't disambiguate the two constructions unless making the distinction is crucial for various reasons.

### 1.5 Organization of these guidelines


The guidelines are organized according to the internal structure of the corresponding expressions (e.g., a verb resultative compound is represented as V+V, while a verb-object expression is represented as V+N), so it is easy for the annotators to search the guidelines for reference. The part-of-speech tags used in this paper are identical to the ones used in the POS tagging task except that the tags for verbs are merged into V and the ones for nouns are merged into N. For the descriptions of the complete POS tagset, please refer to our Part-of-Speech Tagging Guidelines for the Penn Chinese Treebank (3.0). The list of POS tags can be found in Appendix B.


In these guidelines, we mainly list the decision for each case without elaborating on other alternatives and the reasoning behind each decision.

## Chapter 2 Specification


In this chapter, we assume that a sentence has been segmented into large chunks, and the next step is to decide whether each chunk should be further divided. The chapter is arranged by the potential POS of the chunk if the chunk is a word. To search through the chapter, first use the “POS” of the chunk to find the section, then use the “word” formation information to find the subsection; or simply use the “word” formation information.

### 2.1    Common noun: NN

#### 2.1.1    Name of relative


Treat it as one word.


Ex:三叔[uncle]/NN,表叔[uncle]/NN,大姑父[uncle]/NN.

#### 2.1.2 CD+N


If a measure word can be inserted between CD and N without changing the meaning, tag it as CD+N; otherwise, tag it as one word (N).


One word:三排[the third platoon]/NN,一方[one side]/NN,三者[three entities]/NN,一行[a group traveling together]/NN,21世纪[the 21st century]/NT.


Two words:一[one]/CD 学生[student]/NN.

#### 2.1.3 DT+N


Treat it as one word if both DT and N are monosyllabic and either DT or N is bound; otherwise, treat it as two words.


Sometimes, it is difficult to decide whether a morpheme is bound or not because of the influence of non-Modern Chinese. To be consistent, we maintain a list of nouns and a list of determiners. If a morpheme is in one of the lists, we consider it as bound:


- monosyllabic bound nouns:校[school],球 (when it means the earth).


- monosyllabic bound determiners:当[this/that]


We also treat 本人[oneself]/NN as one word and tag it as NN.


One word:本人[oneself]/NN,本校[our school]/NN,全球[whole world]/NN,当地[the place mentioned]/NN,当今[present time]/NT,当代[the contemporary era]/NN.


Two words:本[one’s]/DT 单位[organization]/NN.

#### 2.1.4    PN+N


Treat it as one word if both PN and N are monosyllabic and N is bound; otherwise, treat it as two words.


In this case, the current list of bound nouns is:校[school].


One word:我校[my school]/NN.


Two words:我[my]/PN 单位[organization]/NN.

#### 2.1.5    JJ+N


The pattern is: X+N, where X modifies the N, and X is either a JJ or a prefix.


Note: JJ+N can be a phrase. For example, in one of the files we annotated,全国性[nationwide]/JJ 网络[network]/NN is extended into “全国性[nationwide]/JJ 观测[observe]/VV 苏梅克-列维[Shoemaker-Levy]/NR 9号[number 9]/NN 彗星[comet]/NN 撞击[hit]/VV 木星[Jupiter]/NN 的/DEC 网络[network]/NN”.


Segment X+N according to the type of X:


- X is a prefix: treat X+N as one word.[1](#bookmark93) A list of prefixes:阿,非[non-].


Ex:阿爸[father]/NN,非商业化[non-commercial]/JJ 宗旨[purpose]/NN.


A list of JJs:原[former],前[former]


Ex:原[former]/JJ 在[at]/P 华[China]/NR 老挝[Laos]/NR 难民[refugee]/NN;


前[former]/JJ 民主德国[German Democratic Republic]/NR.


- X is a non-predicate adjective:[2](#bookmark94) if both JJ and N are monosyllabic, tag it as one word; otherwise, treat it as JJ+N.


One word:女人[woman]/NN.


Two words:共同[mutual]/JJ 利益[interest]/NN.


- X is an adjective: treat it as one word if X or N is bound or the meaning of X+N is non-compositional. For unclear cases, if both JJ and N are monosyllabic, treat JJ+N as one word (e.g.,鲜花[fresh flower]/NN,强队[strong team]/NN,红茶[black tea]/NN,好评[favorable comment]/NN).


One word:小媳妇[daughter-in-law]/NN,大洲[continent]/NN,大海[sea]/NN.


Two words:厚[thick]/JJ 书[book]/NN.

#### 2.1.6    LC+N


If both LC and N are monosyllabic, treat the string as one word, and tag it as NN or NT according to its meaning.


Ex:前院[front yard]/NN,前天[day before yesterday]/NT,左肩[left shoulder]/NN.

#### 2.1.7    N+LC


Treat N+LC as one word if:[3](#bookmark95)


- the N and LC are monosyllabic; and


- in this context, the N is non-referential or bound; and


- in this context, the N can not be modified by Det-M or other modifiers.


Otherwise, treat it as two words.


- One word (some of them might be two words in other contexts):室内[indoor](室内[indoor]/NN 训练[training]/NN),台下[off stage],眼前[at present],境外[foreign](境外[foreign]/NN 集团[group]/NN),境内外[domestic and international]/NN,海外[oversea](海外[oversea]/NN 市场[market]/NN),背后[at the back]/NN,天下[world]/NN,国内[domestic]/NN,午后[afternoon]/NT,赛前[before the contest]/NT.
- Two words:中午[noon]/NT 以后[afterwards]/LC.

#### 2.1.8    N+N: N1 modifies N2


If it is 1+1 or 2+1 (i.e., N1 has one or two hanzi and N2 has one hanzi), treat N1+N2 as one word (i.e., we treat all monosyllabic nouns as potential “接尾词”[suffix-like nouns]). If a noun with no more than 2 hanzi is followed by multiple “接尾词”, each monosyllabic noun attaches to the preceding string and the whole string is treated as one word (e.g.,物理学家[physicist]/NN).


For other cases, the string is treated as two words.


- One word:北京市[Beijing]/NR,研究室[research lab]/NN,发展史[developmental history]/NN,始祖鸟[proto-bird]/NN,残疾人[the physically challenged]/NN,清晰度[visibility]/NN,紧迫感[sense of urgency]/NN,大奖赛[tournament]/NN,太阳系[the solar system]/NN.
- Two words:北京[Beijing]/NR 大学[University]/NN,玩具[toy]/NN 工厂[factory]/NN,合作[collaboration]/NN 领域[area]/NN,史学[history]/NN 研究[research]/NN.
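
The syllable-count part of the rule above (merge on 1+1 or 2+1; keep longer combinations apart) is mechanical and can be sketched as follows; whether N1 really modifies N2 still requires human judgment, and the function name is invented for illustration:

```python
# Sketch of the mechanical N+N check: treat N1+N2 as one word when the
# pattern is 1+1 or 2+1, i.e. N2 is a monosyllabic "接尾词" and N1 has at
# most two hanzi. Semantic modification is assumed to hold already.

def merge_noun_noun(n1: str, n2: str) -> bool:
    return len(n1) <= 2 and len(n2) == 1

print(merge_noun_noun("研究", "室"))    # True  -> 研究室 is one word
print(merge_noun_noun("北京", "大学"))  # False -> 北京 大学 stays two words
```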

#### 2.1.9    PN+LC


If both PN and LC are monosyllabic, treat PN+LC as one word and tag it as NT or NN.


One word:此间[here]/NN,此前[before this]/NN,其中[among them]/NN,何时[when]/NT.


Two words:这[this]/PN 以后[after]/LC.

#### 2.1.10    V+N


In this pattern, we assume V is VV (for VA+N, please refer to the section for JJ+N). If V modifies N, treat V+N as one word and tag it as a noun.


One word:烤肉[barbecue]/NN,炒菜[stir-fried dishes]/NN,证明信[certificate]/NN,讨论会[symposium]/NN.[4](#bookmark96)

### 2.2 Proper Noun: NR


Currently, if the proper noun is composed of multiple words, we don't group them.

#### 2.2.1    Personal name


Treat it as one word. Don't give the internal structure unless there is a space between two names (in foreign alphabet).


Ex:张胜利/NR,卡尔[Karl]·马克思[Marx]/NR, John/NR Smith/NR.

#### 2.2.2    Personal name with affixes


Treat it as one word.


Ex:老张/NR,张老/NR

#### 2.2.3    Personal name + title


Treat it as two words.


Ex:张/NR 教授[professor]/NN,张/NR 李/NR 两[two]/CD 位/M 教授[professor]/NN.

#### 2.2.4    Name of Organization/Country/School/..


If the pattern is N1+N2, where N2 is a common noun, then if N2 is monosyllabic, treat N1+N2 as one word, else treat N1+N2 as two words.


Simple names:北京市[Beijing]/NR,黄河[the Yellow River]/NR,沙市[Sha City]/NR,黑龙江省[Heilongjiang Province]/NR.


Complex names:北京[Beijing]/NR 大学[University]/NN,北京[Beijing]/NR 第一[First]/OD 服装厂[Clothing Factory]/NN,美国[the United States]/NR 国会[Congress]/NN.

#### 2.2.5 NR+NR: coordination without conjunction


Treat it as two words.


Ex:中[China]/NR 美[the United States]/NR,中[China]/NR 美[the United States]/NR 关系[relation]/NN, 东[Eastern Asia]/NR 新[Singapore]/NR 澳[Macao]/NR.

### 2.3 Temporal noun: NT


The names of years/months/days/hours and so on are words.


Ex: 1998年[1998]/NT 3月[March]/NT 21日[21st]/NT,5点钟[5 o'clock]/NT,初一[the first day of a lunar month]/NT,去年[last year]/NT.

#### 2.3.1 CD+N


If CD+N is the name of a time, treat it as one word (NT). If it is the count of the time, treat it as two words (CD+M).


One word: 1998年[1998]/NT,5点钟[5 o'clock]/NT,90年代[the 90s]/NT.


Two words: 3/CD 年[year]/M, 3/CD 个/M 月[month]/NN.

### 2.4 Localizer: LC


Localizers are separated from the noun that they attach to except for the case mentioned in Section 2.1.7 (i.e., N+LC).


A localizer is either one or two syllables:


- monosyllabic localizers: e.g.内[in],后[after].


- bisyllabic localizers: e.g.之间[between],以来[since],以后[afterwards],左右[around].

### 2.5 Pronoun: PN


Treat it as one word.


Ex:他们[they]/PN,他自己[himself]/PN,自己[self]/PN.

### 2.6    Determiner: DT


We separate DTs from the succeeding words.


Ex:这[this]/DT 三[three]/CD 个/M 人[people]/NN,各[each]/DT 国[nation]/NN.


Currently, we treat 这些[these] as one word, and tag it as DT.


Some examples of bisyllabic DTs:全体[all],其余[the rest],一切[all],这些[these],那些[those],所

### 2.7    Cardinal number: CD


Treat it as one word. Note: the internal structure of a CD is very easy to recover if needed.


Some examples:


- Pure numbers: 一亿三千万[one hundred and thirty million]/CD, 30.1/CD, 123,456/CD, 35.6%/CD, 30万[three hundred thousand]/CD, 30几[thirty odd]/CD.


- Estimation:三四十[between thirty and forty-nine]/CD 岁[years old]/M.


- CD + X + CD (5.5.4): X is a morpheme such as 余[odd],分之[fraction],点[point]. Ex:三十几亿[three billion odd]/CD,三分之一[one third]/CD,三点一[three point one]/CD,好几[multiple]/CD 个/M.


- CD+X: X is a morpheme such as 余[odd],来[over/odd]:四千一百余[four thousand and one hundred odd]/CD 人[people]/NN,三十来[about thirty]/CD 个/M.

### 2.8 Ordinal number: OD


Treat it as one word.


Ex:第一[first]/OD,第三十一[thirty-first]/OD.

### 2.9 Measure word: M


Treat the measure word, including a reduplicated or a compound measure word, as one word. Treat the string such as 分钟[minute] as one word.


Ex:杯[cup]/M,杯杯[cup-cup]/M,架次[number of flights]/M,分钟[minute]/M.

### 2.10 Verb: VA, VC, VE, and VV

#### 2.10.1 Reduplication: AA, ABAB, AABB, AAB, ABB, ABAC


Treat it as one word.


- AA, A is a verb: AA/V 
  Ex:看看[see]/VV,红红[vivid red]/VA.


- ABAB: AB is a verb: ABAB/V
  Ex:研究研究[research]/VV,雪白雪白[snow white]/VA.


- AABB, AB is a verb: AABB/V
  Ex:来来往往[come and go]/VV,高高兴兴[happy]/VA.
  Note: most of the time, AA or BB is not a word.


- AAB(except for AA-看 in 2.10.2):AAB/V 
  Ex:蒙蒙亮
  Note: most of the time, AA or B is not a word.


- ABB: ABB/V
  Ex:绿油油[bright green]/VA,红彤彤[bright red]/VA.
  Note: most of the time, A or BB is not a word.


- ABAC, etc.: ABAC/V
  Ex:马里马虎[careless]/VA,有条有理[orderly]/VA,一清二楚[very clear]/VA.

#### 2.10.2 “Reduplication”: AA-kan, A-one-A, A-le-one-A, A-le-A


Treat it as one word with internal structure.


- AA-看:(AA/V 看/V)/V
  Ex:(说说[say]/VV 看/VV)/V.
  The basic meaning of the word 看 is to “see”, but in this context, it roughly means “try to do something”.


- A-one-A: (A/V one/CD A/V)/V 
  Ex:(想[think]/VV 一[one]/CD 想[think]/VV)/V.

- A-le-A: (A/V le/AS A/V)/V 
  Ex:(想[think]/VV 了/AS 想[think]/VV)/V.


- A-le-one-A: (A/V le/AS one/CD A/V)/V 
  Ex:(想[think]/VV 了/AS 一[one]/CD 想[think]/VV)/V.


Note: V+CD+M is treated as three words, e.g. 看[look]/V 一[one]/CD 眼[eye]/M (take a look).

#### 2.10.3 A-not-A

Treat it as one word with internal structure.

Ex:(来[come]/VV 没[not]/AD 来[come]/VV)/V,(高[happy]/VA 不[not]/AD 高兴[happy]/VA)/V, (喜[like]/VV 不[not]/AD 喜欢[like]/VV)/V.

#### 2.10.4 AD+V


If one or more of the following hold, treat AD+V as one word (V):


- no free word can intervene between AD and V,


- the V cannot be a predicate without the AD,


- the subcategorization frame of AD+V is different from that of the V.


Otherwise, treat it as two words.

- One word:胡说[talk nonsense],胡来[mess things up],敬献[present with great respect],尚余[remain]
  (尚余[still remain]/VV 七十五[75]/CD 名/M 难民[refugee]/NN),历任[have served successively as],并列[tie],不畏[not afraid of].

- Two words:已经[already]/AD 采取[take]/VV,不[not]/AD 应该[should]/VV,没[not]/AD 完成[complete]/VV.

#### 2.10.5 MSP+V


If the V can not be a predicate without the MSP, treat MSP+V as one word (V).


One word:以期[in order to]/VV (以期[in order to]/VV 在[at] 与[with] 美国[the United States]、瑞典[Sweden]、挪威[Norway] 这些[these] 世界[world] 强队[strong teams] 交锋[competition] 中[during]...).

#### 2.10.6 N+V


Some subject-predicate strings can be either a phrase or a word depending on the context.


If a VP-modifier can be inserted between the subject and the predicate part and the “subject” is referential, then the string is a phrase, otherwise it is a word.


One word:头疼[headache]/VA in “他[he]/PN 让[make]/VV 我[me]/PN 很[very]/AD 头疼[headache]/VA〈He gives me a headache〉”.


Two words:头[head]/NN 疼[ache]/VA in “我[I]/PN 头[head]/NN {很[very]/AD} 疼[ache]/VA〈I have a headache〉”.

#### 2.10.7 V+N


If the V and the N are separated (by the aspect markers, by the modifiers of the N, or because the V is reduplicated), treat V+N as two words.


If the V and the N are adjacent,[6](#bookmark98)


- If V-N is semantically transitive and its object can occur after N only when V and N are adjacent (therefore the V is not a ditransitive verb), treat V+N as one word (e.g.,投资[invest]/VV,出席[be present]/VV,关心[care]/VV,为期[scheduled for a specific duration of time]/VV).


- If V and VN have similar meanings and both are semantically intransitive, treat VN as one word (e.g.,睡觉[sleep]/VV).


- If N is “bound”, treat VN as one word (e.g.,游泳[swim]/VV,无望[hopeless]/VV,无效[invalid]/VV,无法[unable to]/VV,辞职[resign]/VV).


- If V-N is 1+1 AND the meaning is non-compositional, treat V-N as one word (e.g.,念书[study]/VV,流血[bleed]/VV).


Examples of V-N as two words:访[visit]/VV 华[China]/NR in the sentence 他[he]/PN 曾[previously]/AD 七[seven]/CD 次[time]/M 访[visit]/VV 华[China]/NR〈He has visited China seven times〉.

#### 2.10.8    V+R


The tests for verb resultative compounds (V-Rs): both V and R are verbs and the potential forms (V-de-R, V-not-R) exist. So our definition of V-R includes resultative and directional verb compounds (e.g.,看见[see] and 走上来[walk up]), but it does NOT include words such as 改善[improve] and 鼓动[agitate].

- We treat it as one word. For the sake of compatibility with other guidelines, we give the internal structure for the words if they have more than 2 syllables or if the R is the following:完[finish]/VV.

- Words without internal structure:吃掉[eat up]/VV,看见[see]/VV,擦净[wipe clean]/VV.

- Words with internal structures:(做[do]/VV 完[finish]/VV)/V,(擦[wipe]/VV 干净[clean]/VV)/V,(认识[realize]/VV 到[reach]/VV)/V.

#### 2.10.9    Potential form: V-de/bu-R


We treat it as one word.

- If V-R exists, give the internal structure of V-de/bu-R; otherwise, don't give one.
  Ex: words with internal structure:(擦[wipe]/VV 不[not]/AD 净[clean]/VA)/V,(擦[wipe]/VV 得/DER 净[clean]/VA)/V.

- words without internal structure:吃不了[unable to eat anymore]/VV,买不起[cannot afford]/VV.


Note: the string “V de R” can be ambiguous between the potential form and the V-de construction. For example, “这[this] 张[M] 桌子[table] 擦[wipe] 得[DER] 干净[clean] 吗[SP]?” can either be a potential form (which means Can this table be wiped clean?), or it could be a V-de construction (which means Has the table been wiped clean?). The two constructions have different syntactic structures. Normally, we can tell them apart by meaning, by the position of the object, or by checking whether adverbs can be inserted between the de and the R.

#### 2.10.10    V+DIR


See Section 2.10.8 (i.e., the section for V+R).


Words with internal structure:(走[walk]/VV 出去[out]/VV)/V,(走[walk]/VV 不[not]/AD 出去[out]/VV)/V.

Words without internal structure:走出[walk out of]/VV,想出[think of]/VV.

#### 2.10.11    V+AS


Treat it as two words.[7](#bookmark99)


Ex:走[walk]/VV 了/AS.

#### 2.10.12 V+DER


The pattern is V-de in the V-de construction. We treat V-de as two words.[8](#bookmark100)

Ex:走[walk]/VV 得/DER (走[walk]/VV 得/DER 很[very]/AD 快[fast]/VA).

#### 2.10.13 Verb coordination without conjunctive words


If the pattern is 1+1, treat it as a word; otherwise, treat it as multiple words.


One word:修建[build]/VV.


Two words:宣传[propagate]/VV 鼓动[agitate]/VV.

#### 2.10.14 V+coverb


The pattern is V+X, where X is monosyllabic and it is either a P or a V.[9](#bookmark101)

- We first decide whether V+X is a word. If it is, we use its syllable count to decide whether to show its internal structure. That is, if V is monosyllabic, don't give the internal structure; otherwise, give the internal structure.


- treat V+X as one word if X is in the following list:给[give];为[become],成[become],作[treat as],到[arrive],出[out];自[from],向[toward],入[in],以[with].
  Ex:

  - 给[give]:送给[give/send to]/VV,交给[hand in]/VV,(赠送[give as a gift to]/VV 给[give]/VV)/V.
  - 为[to],成[become/into],作[do/as],到[arrive],出[out]:(翻译[translate]/VV 成[become]/VV)/V,当作[treat as]/VV,起到[take effect]/VV,找到[find]/VV,(认识[realize]/VV 到[reach]/VV)/V,决出[decide victors]/VV.
  - 自[from],向[toward],入[in],以[with]:来自[come from]/VV,面向[face toward]/VV,…入[into]/VV,迈向[step toward]/VV,报以[respond with]/VV,加以[supplement with]/VV.



- treat V+X as two words if X is in the following list:在[at],似[like].

  - Ex:生[to be born]/VV 在[at]/P,坐[sit]/VV 在[at]/P,留[stay]/VV 在[at]/P,深[deep]/VA 似[like]/P 海[sea]/NN.

-  treat V+X as one word or two words (V+P) according to the meaning of the X, if X is in the following list:于[at].

  - If 于 in V+于 can be replaced by 在[at], tag V+于 as two words (V+P). Otherwise, tag it as one word.
  - One word:等于[equal to]/VV,缘于[due to]/VV,大于[bigger than]/VV,小于[smaller than]/VV,无助于[of no help to]/VV,低于[lower than]/VV,利于[be beneficial for]/VV,有利于[be beneficial for]/VV.
  - Two words:生[to be born]/VV 于[at]/P,建[build]/VV 于[at]/P.
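
The decision procedure in this section is list-driven and can be sketched as a lookup. The meaning test for 于 is approximated here by a caller-supplied flag, which is an assumption for illustration rather than a rule from the guidelines:

```python
# Sketch of the list-driven V+coverb decision in 2.10.14: merge when the
# coverb is on the merge list, split on {在, 似}, and decide 于 by whether
# it can be replaced by 在 (passed in as a flag, since that judgment is
# semantic and cannot be computed from the characters alone).

MERGE = set("给为成作到出自向入以")
SPLIT = {"在", "似"}

def segment(verb: str, coverb: str, yu_means_at: bool = False):
    if coverb in SPLIT or (coverb == "于" and yu_means_at):
        return [verb, coverb]   # two words: V + P
    if coverb in MERGE or coverb == "于":
        return [verb + coverb]  # one word
    return [verb, coverb]       # default: keep apart

print(segment("送", "给"))                    # ['送给']
print(segment("坐", "在"))                    # ['坐', '在']
print(segment("等", "于"))                    # ['等于']
print(segment("生", "于", yu_means_at=True))  # ['生', '于']
```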


#### 2.10.15 Others

Generally, in X+V(or V+X) where X modifies V, if X cannot modify other verbs, or V cannot be a predicate without the X, treat X+V as one word.

- Ex:以期[in order to]/VV

### 2.11 Adverb: AD


Adverbs are separated from the XP that it modifies.


Adverbs that modify numbers:近[almost]/AD 三十[thirty]/CD,5[five]/CD 分[minute]/M 多[odd]/AD 钟[minute]/NN.[10](#bookmark102)


A string such as 极大[extremely big] is an adverb when it modifies VPs, not AD+VA, because the VA (大[big]) cannot modify VPs without the AD (极[extremely]).

#### 2.11.1    Reduplication


When VA(or AD) reduplicates, the resulting word can be an AD.


Ex:好好[well]/AD 干[do]/VV,常常[always]/AD,仅仅[only]/AD.

#### 2.11.2    DT+M/N


The following are tagged as ADs when they modify VP/S:这样[this way]/AD (这样[this way]/AD 做[do]/VV),同机[on the same airplane]/AD (同机[on the same airplane]/AD 到达[arrive]/VV).

#### 2.11.3    P+PN


We treat the following as two words:为[for]/P 此[this]/PN.

#### 2.11.4   P+N


The following can be seen as frozen PPs. Since they have the same function as the ADs, we treat them as words, and tag them as ADs:迄今[until now],沿途[on the way],即席[impromptu],为何[why](为何[why]/AD 愈演愈烈[get worse and worse]/VA),为什么[why](为什么[why]/AD 来[come]/VV).

#### 2.11.5 PN+LC


If a PN+LC totally loses the function of an NP and the string acts like an adverb, treat it as an adverb.


We treat the following as ADs:此外[in addition]/AD.

#### 2.11.6 Others


If in that context a string totally loses the function of the XP(where X is the head of the string) and the string behaves like an adverb, tag it as AD.


We treat the following as ADs:进一步[a step further]/AD.

### 2.12 Preposition: P


Separate it from NP/S that follows it.


Most prepositions are monosyllabic. Some common bisyllabic prepositions are:为了[in order to],随着[along with],沿着[along],本着[in conformity with],鉴于[due to],除了[except],经过[through],作为[being/regard as],截止[until].


When a coverb follows a verb, we have to decide whether the word is part of a verb compound. A list of such coverbs is:于,给,为. See Section 2.10.14 for details.

### 2.13 Subordinating Conjunction: CS


Separate it from the XP that follows it.


A string such as 只有[only] is ambiguous:


- CS:只有[only if]/CS ...才[then]/AD ....


- AD+VE: 他[he] 只[only]/AD 有[have]/VE 三[three]/CD 块/M 钱[money]/NN (He only has three dollars).
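A tagger could approximate this CS-versus-AD+VE distinction using the 才 cue shown in the CS example. The heuristic below is our own illustration, not a rule from the guidelines:

```python
def tag_zhiyou(tokens, i):
    """Decide how to tag 只有 at position i in a pre-segmented clause:
    CS when a matching 才 follows before a clause boundary, else AD+VE."""
    for tok in tokens[i + 1:]:
        if tok in ("，", "。"):  # stop at a clause boundary
            break
        if tok == "才":
            return [("只有", "CS")]
    return [("只", "AD"), ("有", "VE")]
```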

### 2.14 Conjunction: CC


Separate it from the XPs that it conjoins.


Ex: 和[and]/CC, 与[and]/CC.

### 2.15 Particle: DEC, DEG, DEV, DER, AS, SP, ETC, and MSP


Separate it from the XP that it attaches to.[11](#bookmark103)


Most particles are monosyllabic. One of the bisyllabic particles is 的话[if so]/SP.

### 2.16 Interjection: IJ


Treat it as one word.


Ex: 哈[expressing satisfaction and so on]/IJ.

### 2.17 Onomatopoeia: ON


Treat it as one word.


Ex: 哈哈[sound of laughter]/ON, 哗啦啦[sound of water/rain]/ON.

### 2.18 Other noun-modifier: JJ


Separate it from the measure word (M) or the noun (N) that it modifies. Ex:三[three]/CD 大[big]/JJ 杯[glass]/M 水[water]/NN


When JJs modify nouns, the JJs can be adjectives, 区别词 (non-predicate adjectives), or "phrasal words". Most of the "phrasal words" have two parts, X+Y, where both X and Y are monosyllabic and X or Y is the short form of the corresponding word. Some examples of the "phrasal words" are as follows:

#### 2.18.1 V+N


V+N: 随军[being with the army]/JJ 妓女[prostitute]/NN, 旅英[having studied in England]/JJ 学者[scholar]/NN, 成套[forming a complete set]/JJ 设备[equipment]/NN, 发稿[sending manuscripts to press]/JJ 时间[time]/NN, 获奖[receiving award]/JJ 学者[scholar]/NN, 驻华[being stationed in China]/JJ 使馆[embassy]/NN, 给惠[giving benefit]/JJ 国家[nation]/NN.

#### 2.18.2 AD+VA


AD+VA:最新[the newest]/JJ 消息[news]/NN,超大[extra-large]/JJ 规模[scale]/NN 集成[integrate]/NN 电路[circuit]/NN,较大[relatively big]/JJ 增长[growth]/NN.


The common “AD”:最[the most],超[extra-],较[relatively].

#### 2.18.3 VA+N


VA+N/M:高层[high-ranking]/JJ 人士[official]/NN,高速[high speed]/JJ 公路[highway]/NN,大幅[big size]/JJ 标语[slogan]/NN.

#### 2.18.4 CD+N


CD+N/M: 两国[two-nation]/JJ 关系[relation]/NN, 多国[multi-nation]/JJ 部队[troop]/NN.

#### 2.18.5 P+N


P+N/LC:对外[foreign]/JJ 政策[policy]/NN

#### 2.18.6 Others


Others: 关贸[tariff and trade]/JJ 总协定[treaty]/NN, 年均[annual average]/JJ 增长率[growth rate]/NN, 上述[aforementioned]/JJ 三[three]/CD 国[nation]/NN, 历届[all previous sessions]/JJ 世界[world]/NN 体操[gymnastics]/NN 大赛[championship]/NN, 有关[related]/JJ 方面[parties]/NN.

### 2.19 Punctuation: PU


Treat it as one word, except when it is part of another word; for example, the comma in a number (e.g., 123,456/CD) or the middle dot in proper names (e.g., 卡尔[Karl]·马克斯[Marx]/NR).
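The word-internal exceptions can be expressed as a position test: a comma belongs to the token only between digits, and the middle dot only inside a name. A sketch (a hypothetical helper of ours, not an actual CTB tool):

```python
def is_word_internal(text: str, i: int) -> bool:
    """True when the punctuation at text[i] is part of the surrounding word:
    ',' between digits (123,456) or '·' between name parts (卡尔·马克斯)."""
    ch = text[i]
    if ch == ",":
        return 0 < i < len(text) - 1 and text[i - 1].isdigit() and text[i + 1].isdigit()
    if ch == "·":
        return 0 < i < len(text) - 1
    return False
```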

### 2.20 Foreign word: FW


Treat it as one word, except when it is part of another word (e.g., 卡拉OK[Karaoke]/NN).

### 2.21 Others

#### 2.21.1 Idioms


The frozen idioms (成语) are treated as words when they function as an NP or a VP.


Ex: 各有所好[each has his likes and dislikes]/VV, 一比高低[compete]/VV.

#### 2.21.2 Telescopic strings


Telescopic strings are treated as one word if they are not too long (fewer than four characters). If a string is too long, segment it according to pauses.


Short strings: 进出口[imports and exports]/NN 贸易[trade]/NN, 国内外[domestic and foreign]/NN 形势[situation]/NN.


Long strings:交响[symphony]/JJ 乐团[orchestra]/NN,北京[Beijing]/NR 市长[mayor]/NN.
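The length threshold above reduces to a one-line check; deciding where to split a long telescopic string still needs the human pause judgment, so this sketch (ours) covers only the keep-as-one-word side:

```python
def keep_as_one_word(telescopic: str) -> bool:
    """A telescopic string of fewer than four characters stays one token."""
    return len(telescopic) < 4

# 进出口 (3 chars) stays whole; 交响乐团 (4 chars) is segmented at a pause
```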

#### 2.21.3 Short form


Ex: 三好[three-merit]/JJ 学生[student]/NN, 教科文[education, science, and culture]/NN 组织[organization]/NN (UNESCO), 七中[the seventh central government]/NN 全会[convention]/NN.

  

  class CachedDataLoader (line 514) | class CachedDataLoader(object):
    method __init__ (line 515) | def __init__(self, dataloader: torch.utils.data.DataLoader, filename=N...
    method _build_cache (line 522) | def _build_cache(self, dataset, verbose=HANLP_VERBOSE):
    method close (line 530) | def close(self):
    method __iter__ (line 534) | def __iter__(self):
    method __len__ (line 540) | def __len__(self):
  function _prefetch_generator (line 544) | def _prefetch_generator(dataloader, queue, batchify=None):
  class PrefetchDataLoader (line 552) | class PrefetchDataLoader(DataLoader):
    method __init__ (line 553) | def __init__(self, dataloader: torch.utils.data.DataLoader, prefetch: ...
    method _fire_process (line 583) | def _fire_process(self, dataloader, prefetch):
    method __iter__ (line 588) | def __iter__(self):
    method close (line 601) | def close(self):
    method batchify (line 610) | def batchify(self):
    method batchify (line 614) | def batchify(self, batchify):
  class BucketSampler (line 622) | class BucketSampler(Sampler):
    method __init__ (line 624) | def __init__(self, buckets: Dict[float, List[int]], batch_max_tokens, ...
    method __iter__ (line 651) | def __iter__(self):
    method __len__ (line 660) | def __len__(self):
  class KMeansSampler (line 664) | class KMeansSampler(BucketSampler):
    method __init__ (line 665) | def __init__(self, lengths, batch_max_tokens, batch_size=None, shuffle...
  class SortingSampler (line 684) | class SortingSampler(Sampler):
    method __init__ (line 686) | def __init__(self, lengths: List[int], batch_size=None, batch_max_toke...
    method __iter__ (line 728) | def __iter__(self):
    method __len__ (line 734) | def __len__(self) -> int:
  class SamplerBuilder (line 738) | class SamplerBuilder(AutoConfigurable, ABC):
    method build (line 740) | def build(self, lengths: List[int], shuffle=False, gradient_accumulati...
    method __call__ (line 751) | def __call__(self, lengths: List[int], shuffle=False, **kwargs) -> Sam...
    method scale (line 754) | def scale(self, gradient_accumulation):
  class SortingSamplerBuilder (line 774) | class SortingSamplerBuilder(SortingSampler, SamplerBuilder):
    method __init__ (line 776) | def __init__(self, batch_size=None, batch_max_tokens=None, use_effecti...
    method build (line 788) | def build(self, lengths: List[int], shuffle=False, gradient_accumulati...
    method __len__ (line 792) | def __len__(self) -> int:
  class KMeansSamplerBuilder (line 796) | class KMeansSamplerBuilder(KMeansSampler, SamplerBuilder):
    method __init__ (line 798) | def __init__(self, batch_max_tokens, batch_size=None, n_buckets=1):
    method build (line 810) | def build(self, lengths: List[int], shuffle=False, gradient_accumulati...
    method __len__ (line 814) | def __len__(self) -> int:
  class TableDataset (line 818) | class TableDataset(TransformableDataset):
    method __init__ (line 819) | def __init__(self,
    method load_file (line 831) | def load_file(self, filepath: str):
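The `PadSequenceDataLoader.pad_data` entry above is the core batching trick in this file: right-pad variable-length sequences to the batch maximum before tensorizing. A list-based sketch of that idea (HanLP's version produces `torch.Tensor`s and handles nested structures and dtypes; this only shows the padding logic):

```python
# Right-pad each sequence in a batch to the length of the longest one.
def pad_data(batch, pad=0):
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad] * (max_len - len(seq)) for seq in batch]

print(pad_data([[1, 2, 3], [4], [5, 6]]))
# [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
```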

FILE: hanlp/common/keras_component.py
  class KerasComponent (line 33) | class KerasComponent(Component, ABC):
    method __init__ (line 34) | def __init__(self, transform: Transform) -> None:
    method evaluate (line 49) | def evaluate(self, input_path: str, save_dir=None, output=False, batch...
    method num_samples_in (line 102) | def num_samples_in(self, dataset):
    method evaluate_dataset (line 105) | def evaluate_dataset(self, tst_data, callbacks, output, num_batches, *...
    method evaluate_output (line 109) | def evaluate_output(self, tst_data, out, num_batches, metrics: List[tf...
    method evaluate_output_to_file (line 121) | def evaluate_output_to_file(self, batch, outputs, out):
    method _capture_config (line 127) | def _capture_config(self, config: Dict,
    method save_meta (line 149) | def save_meta(self, save_dir, filename='meta.json', **kwargs):
    method load_meta (line 154) | def load_meta(self, save_dir, filename='meta.json'):
    method save_config (line 160) | def save_config(self, save_dir, filename='config.json'):
    method load_config (line 163) | def load_config(self, save_dir, filename='config.json'):
    method save_weights (line 167) | def save_weights(self, save_dir, filename='model.h5'):
    method load_weights (line 170) | def load_weights(self, save_dir, filename='model.h5', **kwargs):
    method save_vocabs (line 176) | def save_vocabs(self, save_dir, filename='vocabs.json'):
    method load_vocabs (line 183) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method load_transform (line 192) | def load_transform(self, save_dir) -> Transform:
    method save (line 205) | def save(self, save_dir: str, **kwargs):
    method load (line 210) | def load(self, save_dir: str, logger=hanlp.utils.log_util.logger, **kw...
    method input_shape (line 220) | def input_shape(self) -> List:
    method build (line 223) | def build(self, logger, **kwargs):
    method compile_model (line 252) | def compile_model(self, optimizer, loss, metrics):
    method build_optimizer (line 260) | def build_optimizer(self, optimizer, **kwargs) -> tf.keras.optimizers....
    method build_loss (line 273) | def build_loss(self, loss, **kwargs):
    method build_transform (line 284) | def build_transform(self, **kwargs):
    method build_vocab (line 287) | def build_vocab(self, trn_data, logger):
    method build_metrics (line 292) | def build_metrics(self, metrics, logger: logging.Logger, **kwargs):
    method build_model (line 297) | def build_model(self, **kwargs) -> tf.keras.Model:
    method fit (line 300) | def fit(self, trn_data, dev_data, save_dir, batch_size, epochs, run_ea...
    method train_loop (line 369) | def train_loop(self, trn_data, dev_data, epochs, num_examples, train_s...
    method build_valid_dataset (line 379) | def build_valid_dataset(self, dev_data, batch_size):
    method build_train_dataset (line 383) | def build_train_dataset(self, trn_data, batch_size, num_examples):
    method build_callbacks (line 389) | def build_callbacks(self, save_dir, logger, **kwargs):
    method on_train_begin (line 425) | def on_train_begin(self):
    method predict (line 431) | def predict(self, data: Any, batch_size=None, **kwargs):
    method predict_batch (line 460) | def predict_batch(self, batch, inputs=None, **kwargs):
    method sample_data (line 467) | def sample_data(self):
    method from_meta (line 471) | def from_meta(meta: dict, **kwargs):
    method export_model_for_serving (line 490) | def export_model_for_serving(self, export_dir=None, version=1, overwri...
    method serve (line 508) | def serve(self, export_dir=None, grpc_port=8500, rest_api_port=0, over...

FILE: hanlp/common/structure.py
  class ConfigTracker (line 11) | class ConfigTracker(Configurable):
    method __init__ (line 13) | def __init__(self, locals_: Dict, exclude=('kwargs', 'self', '__class_...
  class History (line 37) | class History(object):
    method __init__ (line 38) | def __init__(self):
    method step (line 45) | def step(self, gradient_accumulation):
    method num_training_steps (line 57) | def num_training_steps(self, num_batches, gradient_accumulation):

FILE: hanlp/common/torch_component.py
  class TorchComponent (line 29) | class TorchComponent(Component, ABC):
    method __init__ (line 30) | def __init__(self, **kwargs) -> None:
    method _capture_config (line 43) | def _capture_config(self, locals_: Dict,
    method save_weights (line 72) | def save_weights(self, save_dir, filename='model.pt', trainable_only=T...
    method load_weights (line 89) | def load_weights(self, save_dir, filename='model.pt', **kwargs):
    method save_config (line 106) | def save_config(self, save_dir, filename='config.json'):
    method load_config (line 115) | def load_config(self, save_dir, filename='config.json', **kwargs):
    method save_vocabs (line 131) | def save_vocabs(self, save_dir, filename='vocabs.json'):
    method load_vocabs (line 141) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method save (line 152) | def save(self, save_dir: str, **kwargs):
    method load (line 163) | def load(self, save_dir: str, devices=None, verbose=HANLP_VERBOSE, **k...
    method fit (line 189) | def fit(self,
    method build_logger (line 301) | def build_logger(self, name, save_dir):
    method build_dataloader (line 315) | def build_dataloader(self, data, batch_size, shuffle=False, device=Non...
    method build_vocabs (line 330) | def build_vocabs(self, trn: torch.utils.data.Dataset, logger: logging....
    method _savable_config (line 340) | def _savable_config(self):
    method build_optimizer (line 360) | def build_optimizer(self, **kwargs):
    method build_criterion (line 369) | def build_criterion(self, **kwargs):
    method build_metric (line 378) | def build_metric(self, **kwargs):
    method execute_training_loop (line 387) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 408) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 422) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method build_model (line 435) | def build_model(self, training=True, **kwargs) -> torch.nn.Module:
    method evaluate (line 444) | def evaluate(self, tst_data, save_dir=None, logger: logging.Logger = N...
    method generate_prediction_filename (line 505) | def generate_prediction_filename(self, tst_data, save_dir):
    method to (line 512) | def to(self,
    method parallelize (line 570) | def parallelize(self, devices: List[Union[int, torch.device]]):
    method devices (line 574) | def devices(self):
    method device (line 586) | def device(self):
    method on_config_ready (line 594) | def on_config_ready(self, **kwargs):
    method model_ (line 604) | def model_(self) -> nn.Module:
    method predict (line 617) | def predict(self, *args, **kwargs):
    method _create_dummy_placeholder_on (line 628) | def _create_dummy_placeholder_on(device):
    method __call__ (line 634) | def __call__(self, *args, **kwargs):

FILE: hanlp/common/transform.py
  class ToIndex (line 19) | class ToIndex(ABC):
    method __init__ (line 21) | def __init__(self, vocab: Vocab = None) -> None:
    method __call__ (line 28) | def __call__(self, sample):
    method save_vocab (line 31) | def save_vocab(self, save_dir, filename='vocab.json'):
    method load_vocab (line 36) | def load_vocab(self, save_dir, filename='vocab.json'):
  class FieldToIndex (line 43) | class FieldToIndex(ToIndex):
    method __init__ (line 45) | def __init__(self, src, vocab: Vocab, dst=None) -> None:
    method __call__ (line 52) | def __call__(self, sample: dict):
    method save_vocab (line 56) | def save_vocab(self, save_dir, filename=None):
    method load_vocab (line 61) | def load_vocab(self, save_dir, filename=None):
  class VocabList (line 67) | class VocabList(list):
    method __init__ (line 69) | def __init__(self, *fields) -> None:
    method append (line 74) | def append(self, item: Union[str, Tuple[str, Vocab], Tuple[str, str, V...
    method save_vocab (line 90) | def save_vocab(self, save_dir):
    method load_vocab (line 94) | def load_vocab(self, save_dir):
  class VocabDict (line 99) | class VocabDict(SerializableDict):
    method __init__ (line 101) | def __init__(self, *args, **kwargs) -> None:
    method save_vocabs (line 114) | def save_vocabs(self, save_dir, filename='vocabs.json'):
    method load_vocabs (line 127) | def load_vocabs(self, save_dir, filename='vocabs.json', vocab_cls=Vocab):
    method _load_vocabs (line 140) | def _load_vocabs(vd, vocabs: dict, vocab_cls=Vocab):
    method lock (line 163) | def lock(self):
    method unlock (line 171) | def unlock(self):
    method mutable (line 180) | def mutable(self):
    method __call__ (line 184) | def __call__(self, sample: dict):
    method __getattr__ (line 192) | def __getattr__(self, key):
    method __setattr__ (line 197) | def __setattr__(self, key, value):
    method __getitem__ (line 200) | def __getitem__(self, k: str) -> Vocab:
    method __setitem__ (line 203) | def __setitem__(self, k: str, v: Vocab) -> None:
    method summary (line 206) | def summary(self, logger: logging.Logger = None):
    method put (line 220) | def put(self, **kwargs):
  class NamedTransform (line 230) | class NamedTransform(ABC):
    method __init__ (line 231) | def __init__(self, src: str, dst: str = None) -> None:
    method __call__ (line 238) | def __call__(self, sample: dict) -> dict:
  class ConfigurableTransform (line 242) | class ConfigurableTransform(Configurable, ABC):
    method config (line 244) | def config(self):
    method from_config (line 249) | def from_config(cls, config: dict):
  class ConfigurableNamedTransform (line 269) | class ConfigurableNamedTransform(NamedTransform, ConfigurableTransform, ...
  class EmbeddingNamedTransform (line 273) | class EmbeddingNamedTransform(ConfigurableNamedTransform, ABC):
    method __init__ (line 275) | def __init__(self, output_dim: int, src: str, dst: str) -> None:
  class RenameField (line 280) | class RenameField(NamedTransform):
    method __call__ (line 282) | def __call__(self, sample: dict):
  class CopyField (line 287) | class CopyField(object):
    method __init__ (line 288) | def __init__(self, src, dst) -> None:
    method __call__ (line 292) | def __call__(self, sample: dict) -> dict:
  class FilterField (line 297) | class FilterField(object):
    method __init__ (line 298) | def __init__(self, *keys) -> None:
    method __call__ (line 301) | def __call__(self, sample: dict):
  class TransformList (line 306) | class TransformList(list):
    method __init__ (line 321) | def __init__(self, *transforms) -> None:
    method __call__ (line 325) | def __call__(self, sample):
    method index_by_type (line 330) | def index_by_type(self, t):
  class LowerCase (line 336) | class LowerCase(object):
    method __init__ (line 337) | def __init__(self, src, dst=None) -> None:
    method __call__ (line 343) | def __call__(self, sample: dict) -> dict:
  class LowerCase3D (line 352) | class LowerCase3D(LowerCase):
    method __call__ (line 354) | def __call__(self, sample: dict) -> dict:
  class ToChar (line 360) | class ToChar(object):
    method __init__ (line 361) | def __init__(self, src, dst='char', max_word_length=None, min_word_len...
    method __call__ (line 370) | def __call__(self, sample: dict) -> dict:
    method to_chars (line 378) | def to_chars(self, word: str):
  class AppendEOS (line 387) | class AppendEOS(NamedTransform):
    method __init__ (line 389) | def __init__(self, src: str, dst: str = None, eos=EOS) -> None:
    method __call__ (line 393) | def __call__(self, sample: dict) -> dict:
  class WhitespaceTokenizer (line 398) | class WhitespaceTokenizer(NamedTransform):
    method __call__ (line 400) | def __call__(self, sample: dict) -> dict:
    method tokenize (line 409) | def tokenize(text: str):
  class NormalizeDigit (line 413) | class NormalizeDigit(object):
    method __init__ (line 414) | def __init__(self, src, dst=None) -> None:
    method transform (line 421) | def transform(word: str):
    method __call__ (line 430) | def __call__(self, sample: dict) -> dict:
  class Bigram (line 439) | class Bigram(NamedTransform):
    method __init__ (line 441) | def __init__(self, src: str, dst: str = None) -> None:
    method __call__ (line 446) | def __call__(self, sample: dict) -> dict:
  class FieldLength (line 454) | class FieldLength(NamedTransform):
    method __init__ (line 456) | def __init__(self, src: str, dst: str = None, delta=0) -> None:
    method __call__ (line 462) | def __call__(self, sample: dict) -> dict:
  class BMESOtoIOBES (line 467) | class BMESOtoIOBES(object):
    method __init__ (line 468) | def __init__(self, field='tag') -> None:
    method __call__ (line 471) | def __call__(self, sample: dict) -> dict:
    method convert (line 476) | def convert(y: str):
  class NormalizeToken (line 482) | class NormalizeToken(ConfigurableNamedTransform):
    method __init__ (line 484) | def __init__(self, mapper: Union[str, dict], src: str, dst: str = None...
    method __call__ (line 496) | def __call__(self, sample: dict) -> dict:
    method convert (line 507) | def convert(self, token) -> str:
  class PunctuationMask (line 511) | class PunctuationMask(ConfigurableNamedTransform):
    method __init__ (line 512) | def __init__(self, src: str, dst: str = None) -> None:
    method __call__ (line 526) | def __call__(self, sample: dict) -> dict:
  class NormalizeCharacter (line 536) | class NormalizeCharacter(NormalizeToken):
    method convert (line 537) | def convert(self, token) -> str:
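The classes in this file share one convention: a transform is a callable taking a sample `dict` and returning it (possibly with new fields), and `TransformList` applies transforms in order. A self-contained sketch of that convention — the field names (`token`, `token_lower`) are examples, not HanLP's fixed schema:

```python
# A list of transforms that is itself callable, applying members in order.
class TransformList(list):
    def __init__(self, *transforms):
        super().__init__(transforms)

    def __call__(self, sample):
        for transform in self:
            sample = transform(sample)
        return sample

# A src/dst field transform, in the style of LowerCase above.
class LowerCase:
    def __init__(self, src, dst=None):
        self.src, self.dst = src, dst or src

    def __call__(self, sample):
        sample[self.dst] = [t.lower() for t in sample[self.src]]
        return sample

transforms = TransformList(LowerCase('token', 'token_lower'))
print(transforms({'token': ['HanLP', 'Rocks']})['token_lower'])
# ['hanlp', 'rocks']
```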

FILE: hanlp/common/transform_tf.py
  class Transform (line 16) | class Transform(ABC):
    method __init__ (line 18) | def __init__(self, config: SerializableDict = None, map_x=True, map_y=...
    method fit (line 35) | def fit(self, trn_path: str, **kwargs) -> int:
    method build_config (line 51) | def build_config(self):
    method create_types_shapes_values (line 68) | def create_types_shapes_values(self) -> Tuple[Tuple, Tuple, Tuple]:
    method file_to_inputs (line 75) | def file_to_inputs(self, filepath: str, gold=True):
    method inputs_to_samples (line 87) | def inputs_to_samples(self, inputs, gold=False):
    method file_to_samples (line 94) | def file_to_samples(self, filepath: str, gold=True):
    method file_to_dataset (line 106) | def file_to_dataset(self, filepath: str, gold=True, map_x=None, map_y=...
    method inputs_to_dataset (line 148) | def inputs_to_dataset(self, inputs, gold=False, map_x=None, map_y=None...
    method samples_to_dataset (line 162) | def samples_to_dataset(self, samples: Generator, map_x=None, map_y=Non...
    method x_to_idx (line 210) | def x_to_idx(self, x) -> Union[tf.Tensor, Tuple]:
    method y_to_idx (line 214) | def y_to_idx(self, y) -> tf.Tensor:
    method lock_vocabs (line 217) | def lock_vocabs(self):
    method summarize_vocabs (line 222) | def summarize_vocabs(self, logger=None, header='Vocab summary:'):
    method generator_to_callable (line 238) | def generator_to_callable(generator: Generator):
    method str_to_idx (line 241) | def str_to_idx(self, X, Y) -> Tuple[Union[tf.Tensor, Tuple], tf.Tensor]:
    method X_to_inputs (line 244) | def X_to_inputs(self, X: Union[tf.Tensor, Tuple[tf.Tensor]]) -> Iterable:
    method Y_to_outputs (line 247) | def Y_to_outputs(self, Y: Union[tf.Tensor, Tuple[tf.Tensor]], gold=Fal...
    method XY_to_inputs_outputs (line 251) | def XY_to_inputs_outputs(self, X: Union[tf.Tensor, Tuple[tf.Tensor]],
    method input_is_single_sample (line 269) | def input_is_single_sample(self, input: Any) -> bool:
    method input_to_inputs (line 272) | def input_to_inputs(self, input: Any) -> Tuple[Any, bool]:
    method input_truth_output_to_str (line 291) | def input_truth_output_to_str(self, input, truth, output):
    method cleanup (line 307) | def cleanup(self):

FILE: hanlp/common/vocab.py
  class Vocab (line 12) | class Vocab(Serializable):
    method __init__ (line 13) | def __init__(self, idx_to_token: List[str] = None, token_to_idx: Dict ...
    method __setitem__ (line 42) | def __setitem__(self, token: str, idx: int):
    method __getitem__ (line 46) | def __getitem__(self, key: Union[str, int, List]) -> Union[int, str, L...
    method __contains__ (line 67) | def __contains__(self, key: Union[str, int]):
    method add (line 75) | def add(self, token: str) -> int:
    method update (line 95) | def update(self, tokens: Iterable[str]) -> None:
    method get_idx (line 105) | def get_idx(self, token: str) -> int:
    method get_idx_without_add (line 126) | def get_idx_without_add(self, token: str) -> int:
    method get_token (line 132) | def get_token(self, idx: int) -> str:
    method has_key (line 149) | def has_key(self, token):
    method __len__ (line 152) | def __len__(self):
    method lock (line 155) | def lock(self):
    method build_idx_to_token (line 168) | def build_idx_to_token(self):
    method unlock (line 174) | def unlock(self):
    method locked (line 188) | def locked(self):
    method unk_idx (line 195) | def unk_idx(self):
    method pad_idx (line 205) | def pad_idx(self):
    method tokens (line 215) | def tokens(self):
    method __str__ (line 221) | def __str__(self) -> str:
    method summary (line 224) | def summary(self, verbose=True) -> str:
    method __call__ (line 244) | def __call__(self, some_token: Union[str, Iterable[str]]) -> Union[int...
    method to_dict (line 260) | def to_dict(self) -> dict:
    method copy_from (line 275) | def copy_from(self, item: dict):
    method lower (line 290) | def lower(self):
    method first_token (line 305) | def first_token(self):
    method merge (line 314) | def merge(self, other):
    method safe_pad_token (line 324) | def safe_pad_token(self) -> str:
    method safe_pad_token_idx (line 335) | def safe_pad_token_idx(self) -> int:
    method safe_unk_token (line 342) | def safe_unk_token(self) -> str:
    method __repr__ (line 352) | def __repr__(self) -> str:
    method extend (line 357) | def extend(self, tokens: Iterable[str]):
    method reload_idx_to_token (line 361) | def reload_idx_to_token(self, idx_to_token: List[str], pad_idx=0, unk_...
    method set_unk_as_safe_unk (line 369) | def set_unk_as_safe_unk(self):
    method clear (line 374) | def clear(self):
  class CustomVocab (line 379) | class CustomVocab(Vocab):
    method to_dict (line 380) | def to_dict(self) -> dict:
  class LowercaseVocab (line 386) | class LowercaseVocab(CustomVocab):
    method get_idx (line 387) | def get_idx(self, token: str) -> int:
  class VocabWithNone (line 400) | class VocabWithNone(CustomVocab):
    method get_idx (line 401) | def get_idx(self, token: str) -> int:
  class VocabWithFrequency (line 407) | class VocabWithFrequency(CustomVocab):
    method __init__ (line 409) | def __init__(self, counter: Counter = None, min_occur_cnt=0, pad_token...
    method to_dict (line 423) | def to_dict(self) -> dict:
    method copy_from (line 428) | def copy_from(self, item: dict):
    method get_frequency (line 432) | def get_frequency(self, token):
  class VocabCounter (line 439) | class VocabCounter(CustomVocab):
    method __init__ (line 441) | def __init__(self, idx_to_token: List[str] = None, token_to_idx: Dict ...
    method get_idx (line 446) | def get_idx(self, token: str) -> int:
    method trim (line 451) | def trim(self, min_frequency):
    method copy_from (line 464) | def copy_from(self, item: dict):
    method to_dict (line 468) | def to_dict(self) -> dict:
  class Vocab3D (line 474) | class Vocab3D(CustomVocab):
    method __call__ (line 475) | def __call__(self, some_token: Union[str, Iterable[str], Iterable[Iter...
  function create_label_vocab (line 505) | def create_label_vocab() -> Vocab:
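The `Vocab` contract indexed above is: mutable while building (`get_idx` adds unseen tokens), then `lock()` freezes the mapping so unseen tokens resolve to the unknown index. A simplified sketch of that lifecycle (HanLP's `Vocab` additionally manages pad/unk bookkeeping, serialization, and idx-to-token lookup):

```python
# Minimal Vocab: pad and unk reserved up front, add-on-lookup until locked.
class Vocab:
    def __init__(self, pad_token='<pad>', unk_token='<unk>'):
        self.token_to_idx = {}
        self.mutable = True
        self.unk_token = unk_token
        for token in (pad_token, unk_token):
            self.add(token)

    def add(self, token):
        # Assign the next free index, or return the existing one.
        return self.token_to_idx.setdefault(token, len(self.token_to_idx))

    def get_idx(self, token):
        if self.mutable:
            return self.add(token)
        return self.token_to_idx.get(token, self.token_to_idx[self.unk_token])

    def lock(self):
        self.mutable = False
        return self

v = Vocab()
print([v.get_idx(t) for t in ['a', 'b', 'a']])  # [2, 3, 2]
v.lock()
print(v.get_idx('unseen'))  # 1, the <unk> index
```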

FILE: hanlp/common/vocab_tf.py
  class VocabTF (line 12) | class VocabTF(Serializable):
    method __init__ (line 13) | def __init__(self, idx_to_token: List[str] = None, token_to_idx: Dict ...
    method __setitem__ (line 35) | def __setitem__(self, token: str, idx: int):
    method __getitem__ (line 39) | def __getitem__(self, key: Union[str, int, List]) -> Union[int, str, L...
    method __contains__ (line 52) | def __contains__(self, key: Union[str, int]):
    method add (line 60) | def add(self, token: str) -> int:
    method update (line 70) | def update(self, tokens: Iterable[str]) -> None:
    method get_idx (line 84) | def get_idx(self, token: str) -> int:
    method get_idx_without_add (line 94) | def get_idx_without_add(self, token: str) -> int:
    method get_token (line 100) | def get_token(self, idx: int) -> str:
    method has_key (line 109) | def has_key(self, token):
    method __len__ (line 112) | def __len__(self):
    method lock (line 115) | def lock(self):
    method build_idx_to_token (line 123) | def build_idx_to_token(self):
    method build_lookup_table (line 129) | def build_lookup_table(self):
    method unlock (line 135) | def unlock(self):
    method locked (line 145) | def locked(self):
    method unk_idx (line 149) | def unk_idx(self):
    method pad_idx (line 156) | def pad_idx(self):
    method tokens (line 163) | def tokens(self):
    method __str__ (line 166) | def __str__(self) -> str:
    method summary (line 169) | def summary(self, verbose=True) -> str:
    method __call__ (line 180) | def __call__(self, some_token: Union[str, List[str]]) -> Union[int, Li...
    method lookup (line 189) | def lookup(self, token_tensor: tf.Tensor) -> tf.Tensor:
    method to_dict (line 194) | def to_dict(self) -> dict:
    method copy_from (line 203) | def copy_from(self, item: dict):
    method lower (line 210) | def lower(self):
    method first_token (line 219) | def first_token(self):
    method merge (line 226) | def merge(self, other):
    method safe_pad_token (line 231) | def safe_pad_token(self) -> str:
    method safe_pad_token_idx (line 248) | def safe_pad_token_idx(self) -> int:
    method safe_unk_token (line 252) | def safe_unk_token(self) -> str:
  function create_label_vocab (line 269) | def create_label_vocab() -> VocabTF:

FILE: hanlp/components/amr/amrbart/bart_amr_generation.py
  class BART_AMR_Generation (line 25) | class BART_AMR_Generation(TorchComponent):
    method __init__ (line 26) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 32) | def build_dataloader(self, data, batch_size=32, shuffle=False, device=...
    method build_optimizer (line 47) | def build_optimizer(self, **kwargs):
    method build_criterion (line 50) | def build_criterion(self, **kwargs):
    method build_metric (line 53) | def build_metric(self, **kwargs):
    method execute_training_loop (line 56) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 60) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 63) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method build_model (line 66) | def build_model(self, training=True, transformer=None, **kwargs) -> to...
    method input_is_flat (line 76) | def input_is_flat(self, data):
    method predict (line 79) | def predict(
    method predict_batch (line 105) | def predict_batch(self, batch, num_beams, max_length):
    method load_config (line 124) | def load_config(self, save_dir: str, filename='config.json', **kwargs):
    method load_vocabs (line 132) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method load_weights (line 138) | def load_weights(self, save_dir, filename='model.pt', **kwargs):

FILE: hanlp/components/amr/amrbart/bart_amr_parser.py
  class BART_AMR_Parser (line 28) | class BART_AMR_Parser(TorchComponent):
    method __init__ (line 29) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 35) | def build_dataloader(self, data, batch_size=32, shuffle=False, device=...
    method build_optimizer (line 51) | def build_optimizer(self, **kwargs):
    method build_criterion (line 54) | def build_criterion(self, **kwargs):
    method build_metric (line 57) | def build_metric(self, **kwargs):
    method execute_training_loop (line 60) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 64) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method build_model (line 67) | def build_model(self, training=True, transformer=None, **kwargs) -> to...
    method input_is_flat (line 77) | def input_is_flat(self, data):
    method predict (line 80) | def predict(
    method predict_batch (line 106) | def predict_batch(self, batch, num_beams, max_length):
    method load_config (line 152) | def load_config(self, save_dir: str, filename='config.json', **kwargs):
    method load_vocabs (line 160) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method load_weights (line 166) | def load_weights(self, save_dir, filename='model.pt', **kwargs):
    method evaluate_dataloader (line 170) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method evaluate (line 207) | def evaluate(self, tst_data, save_dir=None, logger: logging.Logger = N...

FILE: hanlp/components/amr/amrbart/common/penman_interface.py
  function _get_model (line 36) | def _get_model(dereify):
  function _remove_wiki (line 47) | def _remove_wiki(graph):
  function load (line 60) | def load(source, dereify=None, remove_wiki=False):
  function loads (line 69) | def loads(string, dereify=None, remove_wiki=False):
  function encode (line 78) | def encode(g, top=None, indent=-1, compact=False):

FILE: hanlp/components/amr/amrbart/common/postprocessing.py
  function token_processing (line 39) | def token_processing(tok):
  function decode_into_node_and_backreferences (line 55) | def decode_into_node_and_backreferences(subtoken_ids, tokenizer):
  function index_of (line 233) | def index_of(element, iterable, default=None, start=None, end=None):
  function separate_edges_nodes (line 253) | def separate_edges_nodes(edges_nodes_slice, *other):
  function _split_name_ops (line 277) | def _split_name_ops(graph):
  function _reconstruct_graph_from_nodes (line 314) | def _reconstruct_graph_from_nodes(nodes, backreferences):
  function build_graph (line 440) | def build_graph(nodes, backreferences, restore_name_ops=False):
  class ParsedStatus (line 447) | class ParsedStatus(enum.Enum):
  function connect_graph_if_not_connected (line 453) | def connect_graph_if_not_connected(graph):
  function restore_backreferences_from_pointers (line 488) | def restore_backreferences_from_pointers(nodes):

FILE: hanlp/components/amr/amrbart/data_interface/dataset.py
  class AMRParsingDataSet (line 24) | class AMRParsingDataSet(object):
    method tokenize (line 27) | def tokenize(sample: dict, tokenizer, max_src_length=400, max_tgt_leng...
  class AMR2TextDataSet (line 47) | class AMR2TextDataSet(object):
    method tokenize (line 50) | def tokenize(sample: dict, tokenizer, max_src_length=400, max_tgt_leng...

FILE: hanlp/components/amr/amrbart/model_interface/modeling_bart.py
  function shift_tokens_right (line 75) | def shift_tokens_right(input_ids: torch.Tensor, pad_token_id: int, decod...
  function _make_causal_mask (line 91) | def _make_causal_mask(input_ids_shape: torch.Size, dtype: torch.dtype, p...
  function _expand_mask (line 106) | def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Option...
  class BartLearnedPositionalEmbedding (line 120) | class BartLearnedPositionalEmbedding(nn.Embedding):
    method __init__ (line 125) | def __init__(self, num_embeddings: int, embedding_dim: int):
    method forward (line 131) | def forward(self, input_ids_shape: torch.Size, past_key_values_length:...
  class BartAttention (line 140) | class BartAttention(nn.Module):
    method __init__ (line 143) | def __init__(
    method _shape (line 170) | def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
    method forward (line 173) | def forward(
  class BartEncoderLayer (line 287) | class BartEncoderLayer(nn.Module):
    method __init__ (line 288) | def __init__(self, config: BartConfig):
    method forward (line 304) | def forward(
  class BartDecoderLayer (line 355) | class BartDecoderLayer(nn.Module):
    method __init__ (line 356) | def __init__(self, config: BartConfig):
    method forward (line 382) | def forward(
  class BartClassificationHead (line 472) | class BartClassificationHead(nn.Module):
    method __init__ (line 475) | def __init__(
    method forward (line 487) | def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
  class BartPretrainedModel (line 496) | class BartPretrainedModel(PreTrainedModel):
    method _init_weights (line 502) | def _init_weights(self, module):
    method _set_gradient_checkpointing (line 513) | def _set_gradient_checkpointing(self, module, value=False):
    method dummy_inputs (line 518) | def dummy_inputs(self):
  class PretrainedBartModel (line 528) | class PretrainedBartModel(BartPretrainedModel):
    method __init_subclass__ (line 529) | def __init_subclass__(self):
  class BartEncoder (line 692) | class BartEncoder(BartPretrainedModel):
    method __init__ (line 702) | def __init__(self, config: BartConfig, embed_tokens: Optional[nn.Embed...
    method get_input_embeddings (line 729) | def get_input_embeddings(self):
    method set_input_embeddings (line 732) | def set_input_embeddings(self, value):
    method forward (line 735) | def forward(
  class BartDecoder (line 868) | class BartDecoder(BartPretrainedModel):
    method __init__ (line 877) | def __init__(self, config: BartConfig, embed_tokens: Optional[nn.Embed...
    method get_input_embeddings (line 901) | def get_input_embeddings(self):
    method set_input_embeddings (line 904) | def set_input_embeddings(self, value):
    method _prepare_decoder_attention_mask (line 907) | def _prepare_decoder_attention_mask(self, attention_mask, input_shape,...
    method forward (line 925) | def forward(
  class BartModel (line 1146) | class BartModel(BartPretrainedModel):
    method __init__ (line 1147) | def __init__(self, config: BartConfig):
    method get_input_embeddings (line 1159) | def get_input_embeddings(self):
    method set_input_embeddings (line 1162) | def set_input_embeddings(self, value):
    method get_encoder (line 1167) | def get_encoder(self):
    method get_decoder (line 1170) | def get_decoder(self):
    method forward (line 1181) | def forward(
  class BartForConditionalGeneration (line 1273) | class BartForConditionalGeneration(BartPretrainedModel):
    method __init__ (line 1277) | def __init__(self, config: BartConfig):
    method get_encoder (line 1286) | def get_encoder(self):
    method get_decoder (line 1289) | def get_decoder(self):
    method resize_token_embeddings (line 1292) | def resize_token_embeddings(self, new_num_tokens: int) -> nn.Embedding:
    method _resize_final_logits_bias (line 1297) | def _resize_final_logits_bias(self, new_num_tokens: int) -> None:
    method get_output_embeddings (line 1306) | def get_output_embeddings(self):
    method set_output_embeddings (line 1309) | def set_output_embeddings(self, new_embeddings):
    method forward (line 1315) | def forward(
    method prepare_inputs_for_generation (line 1393) | def prepare_inputs_for_generation(
    method prepare_decoder_input_ids_from_labels (line 1421) | def prepare_decoder_input_ids_from_labels(self, labels: torch.Tensor):
    method _reorder_cache (line 1425) | def _reorder_cache(past, beam_idx):
  class BartForSequenceClassification (line 1442) | class BartForSequenceClassification(BartPretrainedModel):
    method __init__ (line 1443) | def __init__(self, config: BartConfig, **kwargs):
    method forward (line 1464) | def forward(
  class BartForQuestionAnswering (line 1569) | class BartForQuestionAnswering(BartPretrainedModel):
    method __init__ (line 1570) | def __init__(self, config):
    method forward (line 1590) | def forward(
  class BartDecoderWrapper (line 1685) | class BartDecoderWrapper(BartPretrainedModel):
    method __init__ (line 1691) | def __init__(self, config):
    method forward (line 1695) | def forward(self, *args, **kwargs):
  class BartForCausalLM (line 1699) | class BartForCausalLM(BartPretrainedModel):
    method __init__ (line 1700) | def __init__(self, config):
    method get_input_embeddings (line 1712) | def get_input_embeddings(self):
    method set_input_embeddings (line 1715) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 1718) | def get_output_embeddings(self):
    method set_output_embeddings (line 1721) | def set_output_embeddings(self, new_embeddings):
    method set_decoder (line 1724) | def set_decoder(self, decoder):
    method get_decoder (line 1727) | def get_decoder(self):
    method forward (line 1731) | def forward(
    method prepare_inputs_for_generation (line 1874) | def prepare_inputs_for_generation(self, input_ids, past=None, attentio...
    method _reorder_cache (line 1890) | def _reorder_cache(past, beam_idx):
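
`shift_tokens_right` at the top of this module implements the standard BART teacher-forcing shift: decoder inputs are the labels moved one position right, with the decoder start token prepended and loss-masking `-100` values mapped back to padding. A list-based sketch of that behavior (the real function operates on `torch.Tensor`):

```python
def shift_tokens_right(input_ids, pad_token_id, decoder_start_token_id):
    """Shift each row one step right, prepend the decoder start token,
    and replace -100 (ignored label positions) with the pad id."""
    shifted = []
    for row in input_ids:
        new_row = [decoder_start_token_id] + row[:-1]
        shifted.append([pad_token_id if tok == -100 else tok for tok in new_row])
    return shifted
```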

FILE: hanlp/components/amr/amrbart/model_interface/tokenization_bart.py
  class AMRBartTokenizer (line 33) | class AMRBartTokenizer(BartTokenizer):
    method __init__ (line 36) | def __init__(self, vocab_file, merges_file, errors="replace", bos_toke...
    method from_pretrained (line 44) | def from_pretrained(cls, pretrained_model_path, *args, **kwargs):
    method init_amr_vocabulary (line 49) | def init_amr_vocabulary(self):
    method _tokenize (line 66) | def _tokenize(self, text):
    method _tok_bpe (line 83) | def _tok_bpe(self, token):
    method tokenize_amr (line 97) | def tokenize_amr(self, amr_tokens):
    method decode_amr (line 141) | def decode_amr(self, tokens, restore_name_ops=None):
    method _fix_and_make_graph (line 174) | def _fix_and_make_graph(self, nodes):
    method _classify (line 450) | def _classify(self, node):

FILE: hanlp/components/amr/amrbart/preprocess/amr_io.py
  function read_raw_amr_data (line 31) | def read_raw_amr_data(

FILE: hanlp/components/amr/amrbart/preprocess/penman_interface.py
  function _get_model (line 36) | def _get_model(dereify):
  function _remove_wiki (line 47) | def _remove_wiki(graph):
  function load (line 60) | def load(source, dereify=None, remove_wiki=False):
  function loads (line 69) | def loads(string, dereify=None, remove_wiki=False):
  function encode (line 78) | def encode(g, top=None, indent=-1, compact=False):

FILE: hanlp/components/amr/amrbart/preprocess/read_and_process.py
  function _tokenize_encoded_graph (line 33) | def _tokenize_encoded_graph(encoded):
  function dfs_linearize (line 50) | def dfs_linearize(graph, remove_pars=False, use_pointer_tokens=True):
  function main (line 82) | def main():

FILE: hanlp/components/amr/seq2seq/dataset/IO.py
  function read_raw_amr_data (line 7) | def read_raw_amr_data(

FILE: hanlp/components/amr/seq2seq/dataset/dataset.py
  class AMRDataset (line 17) | class AMRDataset(TransformableDataset):
    method __init__ (line 19) | def __init__(self,
    method load_file (line 32) | def load_file(self, filepath: str):
    method get_roles (line 38) | def get_roles(self):
    method get_frames (line 48) | def get_frames(self):
  class AMRPickleDataset (line 60) | class AMRPickleDataset(AMRDataset):
    method load_file (line 62) | def load_file(self, filepath: str):
  function dfs_linearize_tokenize (line 69) | def dfs_linearize_tokenize(sample: dict, tokenizer: PENMANBartTokenizer,...
  function dfs_linearize_levi (line 85) | def dfs_linearize_levi(sample: dict, tokenizer: PENMANBartTokenizer, rem...
  function dfs_linearize_rgcn (line 111) | def dfs_linearize_rgcn(sample: dict, tokenizer: PENMANBartTokenizer) -> ...
  function dfs_linearize_constituency (line 127) | def dfs_linearize_constituency(sample: dict, tokenizer: PENMANBartTokeni...
  function dfs_linearize_tokenize_with_linguistic_structures (line 171) | def dfs_linearize_tokenize_with_linguistic_structures(sample: dict, toke...
  function dep_to_levi (line 220) | def dep_to_levi(tok: List[str], dep: List[Tuple[int, str]]):
  function dfs (line 227) | def dfs(tok: List[str], dep: List[Tuple[int, str]], s, seq):

FILE: hanlp/components/amr/seq2seq/dataset/linearization.py
  class SemanticGraph (line 12) | class SemanticGraph:
    method variables (line 36) | def variables(self) -> Set[str]:
    method resolved_nodes_var (line 42) | def resolved_nodes_var(self) -> List[str]:
    method nodes (line 47) | def nodes(self) -> List[str]:
    method resolved_nodes (line 52) | def resolved_nodes(self) -> List[str]:
    method src_occurrence (line 55) | def src_occurrence(self, var: str) -> int:
  class BaseLinearizer (line 59) | class BaseLinearizer(metaclass=abc.ABCMeta):
    method linearize (line 62) | def linearize(self, *args, **kwargs) -> SemanticGraph:
  class AMRTokens (line 66) | class AMRTokens:
    method is_node (line 98) | def is_node(cls, string: str) -> bool:
    method read_backr (line 106) | def read_backr(cls, string: str) -> Optional:
  function index_default (line 119) | def index_default(
  class AMRLinearizer (line 132) | class AMRLinearizer(BaseLinearizer):
    method __init__ (line 134) | def __init__(
    method _collapse_name_ops (line 143) | def _collapse_name_ops(self, amr):
    method linearize (line 170) | def linearize(self, amr: penman.Graph) -> SemanticGraph:
    method _linearize (line 179) | def _linearize(self, amr: penman.Graph) -> SemanticGraph:
    method _interleave (line 321) | def _interleave(self, graph: SemanticGraph) -> SemanticGraph:
    method _add_pointer_tokens (line 382) | def _add_pointer_tokens(self, graph: SemanticGraph) -> SemanticGraph:
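
`AMRLinearizer.linearize` together with `_add_pointer_tokens` turns a penman graph into a token sequence where re-entrant variables become shared pointer tokens. A minimal sketch of the idea on a toy adjacency dict (the data structure and exact bracketing here are assumptions for illustration):

```python
def dfs_linearize(graph, root):
    """Depth-first linearization: emit a fresh <pointer:k> the first
    time a variable is visited, and re-emit the same pointer on
    re-entry, which encodes the backreference."""
    pointers, tokens = {}, []

    def visit(var):
        if var in pointers:            # re-entrant node -> backreference
            tokens.append(pointers[var])
            return
        pointers[var] = f'<pointer:{len(pointers)}>'
        concept, edges = graph[var]
        tokens.extend(['(', pointers[var], concept])
        for role, tgt in edges:
            tokens.append(role)
            if tgt in graph:
                visit(tgt)
            else:
                tokens.append(tgt)     # constant / attribute value
        tokens.append(')')

    visit(root)
    return tokens

# "The boy wants to go": ARG0 of go-01 re-enters the boy node.
graph = {
    'w': ('want-01', [(':ARG0', 'b'), (':ARG1', 'g')]),
    'b': ('boy', []),
    'g': ('go-01', [(':ARG0', 'b')]),
}
```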

FILE: hanlp/components/amr/seq2seq/dataset/penman.py
  function _get_model (line 22) | def _get_model(dereify):
  function _remove_wiki (line 31) | def _remove_wiki(graph):
  function pm_load (line 44) | def pm_load(source, dereify=None, remove_wiki=False) -> List[penman.Graph]:
  function loads (line 63) | def loads(string, dereify=None, remove_wiki=False):
  function pm_encode (line 72) | def pm_encode(g, top=None, indent=-1, compact=False):
  function role_is_reverted (line 77) | def role_is_reverted(role: str):
  class AMRGraph (line 83) | class AMRGraph(penman.Graph):
    method __str__ (line 84) | def __str__(self):

FILE: hanlp/components/amr/seq2seq/dataset/postprocessing.py
  function token_processing (line 15) | def token_processing(tok):
  function decode_into_node_and_backreferences (line 31) | def decode_into_node_and_backreferences(subtoken_ids, tokenizer):
  function decode_into_node_and_backreferences_without_space (line 191) | def decode_into_node_and_backreferences_without_space(subtoken_ids, toke...
  function index_of (line 360) | def index_of(element, iterable, default=None, start=None, end=None):
  function separate_edges_nodes (line 378) | def separate_edges_nodes(edges_nodes_slice, *other):
  function _split_name_ops (line 405) | def _split_name_ops(graph):
  function _reconstruct_graph_from_nodes (line 442) | def _reconstruct_graph_from_nodes(nodes, backreferences):
  function build_graph (line 563) | def build_graph(nodes, backreferences, restore_name_ops=False):
  class ParsedStatus (line 570) | class ParsedStatus(enum.Enum):
  function connect_graph_if_not_connected (line 576) | def connect_graph_if_not_connected(graph):
  function restore_backreferences_from_pointers (line 610) | def restore_backreferences_from_pointers(nodes):

FILE: hanlp/components/amr/seq2seq/dataset/tokenization_bart.py
  class AMRBartTokenizer (line 15) | class AMRBartTokenizer(BartTokenizer):
    method __init__ (line 24) | def __init__(self, *args, use_pointer_tokens=False, collapse_name_ops=...
    method from_pretrained (line 36) | def from_pretrained(cls, pretrained_model_path, additional_tokens: Ite...
    method init_amr_vocabulary (line 43) | def init_amr_vocabulary(self, additions: Set[str] = None, recategoriza...
    method build_inputs_with_special_tokens (line 81) | def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=No...
    method _tokenize (line 87) | def _tokenize(self, text):
    method _tok_bpe (line 104) | def _tok_bpe(self, token, add_space=True):
    method _get_nodes_and_backreferences (line 120) | def _get_nodes_and_backreferences(self, graph):
    method tokenize_amr (line 125) | def tokenize_amr(self, graph):
    method batch_encode_sentences (line 181) | def batch_encode_sentences(self, sentences, device=torch.device('cpu')):
    method linearize (line 188) | def linearize(self, graph):
    method batch_encode_graphs (line 201) | def batch_encode_graphs(self, graphs, device=torch.device('cpu')):
    method batch_encode_graphs_from_linearized (line 205) | def batch_encode_graphs_from_linearized(self, linearized, extras=None,...
    method decode_amr (line 223) | def decode_amr(self, tokens, restore_name_ops=False):
  class PENMANBartTokenizer (line 257) | class PENMANBartTokenizer(AMRBartTokenizer):
    method __init__ (line 259) | def __init__(self, *args, raw_graph=False, **kwargs):
    method _tokenize_encoded_graph (line 265) | def _tokenize_encoded_graph(self, encoded):
    method tokenize_amr (line 282) | def tokenize_amr(self, graph):
    method _get_nodes_and_backreferences (line 295) | def _get_nodes_and_backreferences(self, graph):
    method _classify (line 327) | def _classify(self, node):
    method _fix_and_make_graph (line 354) | def _fix_and_make_graph(self, nodes):
    method decode_amr (line 635) | def decode_amr(self, tokens, restore_name_ops=None):

FILE: hanlp/components/amr/seq2seq/dataset/tokenization_t5.py
  class AMRT5Tokenizer (line 16) | class AMRT5Tokenizer(T5TokenizerFast):
    method __init__ (line 25) | def __init__(self, *args, use_pointer_tokens=False, collapse_name_ops=...
    method from_pretrained (line 37) | def from_pretrained(cls, pretrained_model_path, additional_tokens: Ite...
    method init_amr_vocabulary (line 44) | def init_amr_vocabulary(self, additions: Set[str] = None, recategoriza...
    method build_inputs_with_special_tokens (line 73) | def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=No...
    method _tokenize (line 79) | def _tokenize(self, text):
    method _tok_bpe (line 96) | def _tok_bpe(self, token, add_space=True):
    method _get_nodes_and_backreferences (line 109) | def _get_nodes_and_backreferences(self, graph):
    method tokenize_amr (line 114) | def tokenize_amr(self, graph):
    method batch_encode_sentences (line 171) | def batch_encode_sentences(self, sentences, device=torch.device('cpu')):
    method linearize (line 178) | def linearize(self, graph):
    method batch_encode_graphs (line 191) | def batch_encode_graphs(self, graphs, device=torch.device('cpu')):
    method batch_encode_graphs_from_linearized (line 195) | def batch_encode_graphs_from_linearized(self, linearized, extras=None,...
    method decode_amr (line 213) | def decode_amr(self, tokens, restore_name_ops=False):
  class PENMANT5Tokenizer (line 248) | class PENMANT5Tokenizer(AMRT5Tokenizer):
    method __init__ (line 250) | def __init__(self, *args, raw_graph=False, **kwargs):
    method _tokenize_encoded_graph (line 256) | def _tokenize_encoded_graph(self, encoded):
    method tokenize_amr (line 275) | def tokenize_amr(self, graph):
    method _get_nodes_and_backreferences (line 288) | def _get_nodes_and_backreferences(self, graph):
    method _classify (line 320) | def _classify(self, node):
    method _fix_and_make_graph (line 347) | def _fix_and_make_graph(self, nodes):
    method decode_amr (line 630) | def decode_amr(self, tokens, restore_name_ops=None):
    method encoder (line 672) | def encoder(self) -> Dict[str, int]:

FILE: hanlp/components/amr/seq2seq/evaluation.py
  function write_predictions (line 6) | def write_predictions(predictions_path, tokenizer, graphs):
  function compute_smatch (line 15) | def compute_smatch(pred, gold):
  function compute_bleu (line 22) | def compute_bleu(gold_sentences, pred_sentences):

FILE: hanlp/components/amr/seq2seq/optim.py
  class RAdam (line 8) | class RAdam(Optimizer):
    method __init__ (line 10) | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weig...
    method __setstate__ (line 29) | def __setstate__(self, state):
    method step (line 32) | def step(self, closure=None):

FILE: hanlp/components/amr/seq2seq/seq2seq_amr_parser.py
  class Seq2seq_AMR_Parser (line 34) | class Seq2seq_AMR_Parser(TorchComponent):
    method __init__ (line 35) | def __init__(self, **kwargs):
    method build_dataloader (line 41) | def build_dataloader(self, data, batch_size,
    method _create_dataloader (line 73) | def _create_dataloader(self, dataset, batch_size, device, sampler, shu...
    method _get_pad_dict (line 77) | def _get_pad_dict(self):
    method finalize_dataset (line 81) | def finalize_dataset(self, dataset, logger: logging.Logger = None):
    method build_dataset (line 85) | def build_dataset(self, data, generate_idx):
    method collect_additional_tokens (line 89) | def collect_additional_tokens(self, additional_tokens, dataset):
    method build_tokenizer (line 99) | def build_tokenizer(self, additional_tokens) -> PENMANBartTokenizer:
    method build_optimizer (line 119) | def build_optimizer(self, trn, lr, epochs, gradient_accumulation, warm...
    method build_criterion (line 132) | def build_criterion(self, **kwargs):
    method build_metric (line 135) | def build_metric(self, **kwargs):
    method execute_training_loop (line 138) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 177) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method _step (line 198) | def _step(self, optimizer, scheduler):
    method report_metrics (line 206) | def report_metrics(self, loss):
    method feed_batch (line 209) | def feed_batch(self, batch):
    method evaluate_dataloader (line 221) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method predict_amrs (line 259) | def predict_amrs(self, batch, beam_size=1):
    method _model_generate (line 287) | def _model_generate(self, batch, beam_size):
    method build_model (line 299) | def build_model(self, training=True, **kwargs) -> torch.nn.Module:
    method _get_model_cls (line 315) | def _get_model_cls(self, transformer: str):
    method _init_new_embeddings (line 325) | def _init_new_embeddings(model, tokenizer):
    method input_is_flat (line 377) | def input_is_flat(self, data):
    method predict (line 380) | def predict(self, data: Union[str, List[str]], beautiful_amr_graph=Tru...
    method fit (line 399) | def fit(self, trn_data, dev_data, save_dir, batch_size=32, epochs=30,
    method on_config_ready (line 490) | def on_config_ready(self, **kwargs):
    method evaluate (line 501) | def evaluate(self, tst_data, save_dir=None, logger: logging.Logger = N...
    method build_vocabs (line 505) | def build_vocabs(self, trn: torch.utils.data.Dataset, logger: logging....

FILE: hanlp/components/classifiers/fasttext_classifier.py
  class FastTextClassifier (line 19) | class FastTextClassifier(Component):
    method __init__ (line 21) | def __init__(self) -> None:
    method load (line 29) | def load(self, save_dir, model_path=None, **kwargs):
    method predict (line 41) | def predict(self, text: Union[str, List[str]], topk=False, prob=False,...
    method labels (line 90) | def labels(self):
    method _strip_prefix (line 94) | def _strip_prefix(label: str):
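
The `_strip_prefix` helper above reflects the fastText convention of prefixing classes with `__label__`; a sketch of the conversion back to a clean label:

```python
def strip_label_prefix(label, prefix='__label__'):
    """Remove fastText's label prefix (e.g. '__label__positive' ->
    'positive'); labels without the prefix pass through unchanged."""
    return label[len(prefix):] if label.startswith(prefix) else label
```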

FILE: hanlp/components/classifiers/transformer_classifier.py
  class TransformerClassificationModel (line 28) | class TransformerClassificationModel(nn.Module):
    method __init__ (line 30) | def __init__(self,
    method forward (line 40) | def forward(self, input_ids, attention_mask, token_type_ids):
  class TransformerComponent (line 52) | class TransformerComponent(TorchComponent, ABC):
    method __init__ (line 53) | def __init__(self, **kwargs) -> None:
    method build_optimizer (line 63) | def build_optimizer(self,
    method fit (line 86) | def fit(self, trn_data, dev_data, save_dir,
    method on_config_ready (line 110) | def on_config_ready(self, **kwargs):
    method build_transformer (line 117) | def build_transformer(self, training=True):
  class TransformerClassifier (line 129) | class TransformerClassifier(TransformerComponent):
    method __init__ (line 131) | def __init__(self, **kwargs) -> None:
    method build_criterion (line 140) | def build_criterion(self, **kwargs):
    method build_metric (line 144) | def build_metric(self, **kwargs):
    method execute_training_loop (line 147) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method label_vocab (line 166) | def label_vocab(self):
    method fit_dataloader (line 169) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method update_metric (line 191) | def update_metric(self, metric, logits: torch.Tensor, target, output=N...
    method compute_loss (line 197) | def compute_loss(self, criterion, logits, target, batch):
    method feed_batch (line 201) | def feed_batch(self, batch) -> torch.LongTensor:
    method evaluate_dataloader (line 206) | def evaluate_dataloader(self,
    method build_model (line 249) | def build_model(self, transformer, training=True, **kwargs) -> torch.n...
    method build_dataloader (line 259) | def build_dataloader(self, data, batch_size, shuffle, device, text_a_k...
    method build_dataset (line 310) | def build_dataset(self, data) -> TransformableDataset:
    method predict (line 321) | def predict(self, data: Union[str, List[str]], batch_size: int = None,...
    method fit (line 356) | def fit(self, trn_data, dev_data, save_dir,
    method build_vocabs (line 378) | def build_vocabs(self, trn, logger, **kwargs):

FILE: hanlp/components/classifiers/transformer_classifier_hf.py
  class TransformerClassifierHF (line 17) | class TransformerClassifierHF(TorchComponent):
    method __init__ (line 18) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 22) | def build_dataloader(self, data, sampler_builder=None, shuffle=False, ...
    method build_optimizer (line 38) | def build_optimizer(self, **kwargs):
    method build_criterion (line 41) | def build_criterion(self, **kwargs):
    method build_metric (line 44) | def build_metric(self, **kwargs):
    method execute_training_loop (line 47) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 51) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 54) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method load_vocabs (line 57) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method load_weights (line 60) | def load_weights(self, save_dir, filename='model.pt', **kwargs):
    method build_model (line 63) | def build_model(self, training=True, save_dir=None, **kwargs) -> torch...
    method predict (line 66) | def predict(self, text: Union[str, List[str]], topk=False, prob=False,...
    method labels (line 124) | def labels(self):

FILE: hanlp/components/classifiers/transformer_classifier_tf.py
  class TransformerTextTransform (line 17) | class TransformerTextTransform(TableTransform):
    method __init__ (line 19) | def __init__(self, config: SerializableDict = None, map_x=False, map_y...
    method inputs_to_samples (line 24) | def inputs_to_samples(self, inputs, gold=False):
    method create_types_shapes_values (line 74) | def create_types_shapes_values(self) -> Tuple[Tuple, Tuple, Tuple]:
    method x_to_idx (line 81) | def x_to_idx(self, x) -> Union[tf.Tensor, Tuple]:
    method y_to_idx (line 85) | def y_to_idx(self, y) -> tf.Tensor:
    method Y_to_outputs (line 96) | def Y_to_outputs(self, Y: Union[tf.Tensor, Tuple[tf.Tensor]], gold=Fal...
    method input_is_single_sample (line 106) | def input_is_single_sample(self, input: Any) -> bool:
  class TransformerClassifierTF (line 110) | class TransformerClassifierTF(KerasComponent):
    method __init__ (line 112) | def __init__(self, bert_text_transform=None) -> None:
    method fit (line 120) | def fit(self, trn_data: Any, dev_data: Any, save_dir: str, transformer...
    method evaluate_output (line 125) | def evaluate_output(self, tst_data, out, num_batches, metric):
    method _y_id_to_str (line 144) | def _y_id_to_str(self, Y_pred) -> str:
    method build_loss (line 147) | def build_loss(self, loss, **kwargs):
    method build_optimizer (line 159) | def build_optimizer(self, optimizer, use_amp, train_steps, warmup_step...
    method build_model (line 177) | def build_model(self, transformer, max_length, **kwargs):
    method build_vocab (line 182) | def build_vocab(self, trn_data, logger):
    method build_metrics (line 188) | def build_metrics(self, metrics, logger, **kwargs):

FILE: hanlp/components/classifiers/transformer_regression_hf.py
  class TransformerRegressionHF (line 17) | class TransformerRegressionHF(TorchComponent):
    method __init__ (line 18) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 22) | def build_dataloader(self, data, sampler_builder=None, shuffle=False, ...
    method build_optimizer (line 38) | def build_optimizer(self, **kwargs):
    method build_criterion (line 41) | def build_criterion(self, **kwargs):
    method build_metric (line 44) | def build_metric(self, **kwargs):
    method execute_training_loop (line 47) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 51) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 54) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method load_vocabs (line 57) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method load_weights (line 60) | def load_weights(self, save_dir, filename='model.pt', **kwargs):
    method build_model (line 63) | def build_model(self, training=True, save_dir=None, **kwargs) -> torch...
    method predict (line 66) | def predict(self, text: Union[str, List[str]], **kwargs):

FILE: hanlp/components/distillation/distillable_component.py
  class DistillableComponent (line 15) | class DistillableComponent(TorchComponent, ABC):
    method build_teacher (line 18) | def build_teacher(self, teacher: str, devices) -> TorchComponent:
    method distill (line 21) | def distill(self,
    method _savable_config (line 50) | def _savable_config(self):

FILE: hanlp/components/distillation/losses.py
  function kd_mse_loss (line 10) | def kd_mse_loss(logits_S, logits_T, temperature=1):
  function kd_ce_loss (line 26) | def kd_ce_loss(logits_S, logits_T, temperature=1):
  function att_mse_loss (line 43) | def att_mse_loss(attention_S, attention_T, mask=None):
  function att_mse_sum_loss (line 64) | def att_mse_sum_loss(attention_S, attention_T, mask=None):
  function att_ce_loss (line 89) | def att_ce_loss(attention_S, attention_T, mask=None):
  function att_ce_mean_loss (line 110) | def att_ce_mean_loss(attention_S, attention_T, mask=None):
  function hid_mse_loss (line 134) | def hid_mse_loss(state_S, state_T, mask=None):
  function cos_loss (line 153) | def cos_loss(state_S, state_T, mask=None):
  function pkd_loss (line 176) | def pkd_loss(state_S, state_T, mask=None):
  function fsp_loss (line 194) | def fsp_loss(state_S, state_T, mask=None):
  function mmd_loss (line 237) | def mmd_loss(state_S, state_T, mask=None):
  class KnowledgeDistillationLoss (line 276) | class KnowledgeDistillationLoss(AutoConfigurable):
    method __init__ (line 277) | def __init__(self, name) -> None:
    method __call__ (line 284) | def __call__(self, *args, **kwargs):
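
`kd_ce_loss` above is the classic temperature-scaled distillation cross-entropy between student and teacher logits. A single-example, stdlib-only sketch of that formula (the real function is batched over tensors):

```python
import math

def softmax(xs):
    m = max(xs)                        # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def kd_ce_loss(logits_s, logits_t, temperature=1.0):
    """Cross-entropy H(p_T, p_S) with both distributions softened by
    the same temperature: -sum_i p_T[i] * log p_S[i]."""
    p_t = softmax([x / temperature for x in logits_t])
    p_s = softmax([x / temperature for x in logits_s])
    return -sum(pt * math.log(ps) for pt, ps in zip(p_t, p_s))
```

The loss is minimized when the student distribution matches the teacher's, at which point it equals the teacher's entropy.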

FILE: hanlp/components/distillation/schedulers.py
  function linear_growth_weight_scheduler (line 11) | def linear_growth_weight_scheduler(x):
  function linear_decay_weight_scheduler (line 15) | def linear_decay_weight_scheduler(x):
  function constant_temperature_scheduler (line 19) | def constant_temperature_scheduler(logits_S, logits_T, base_temperature):
  function flsw_temperature_scheduler_builder (line 26) | def flsw_temperature_scheduler_builder(beta, gamma, eps=1e-4, *args):
  function cwsm_temperature_scheduler_builder (line 44) | def cwsm_temperature_scheduler_builder(beta, *args):
  class LinearTeacherAnnealingScheduler (line 61) | class LinearTeacherAnnealingScheduler(object):
    method __init__ (line 62) | def __init__(self, num_training_steps: int) -> None:
    method step (line 67) | def step(self):
    method __float__ (line 70) | def __float__(self):
  class TemperatureScheduler (line 74) | class TemperatureScheduler(ABC, AutoConfigurable):
    method __init__ (line 76) | def __init__(self, base_temperature) -> None:
    method __call__ (line 80) | def __call__(self, logits_S, logits_T):
    method forward (line 84) | def forward(self, logits_S, logits_T):
    method from_name (line 88) | def from_name(name):
  class FunctionalScheduler (line 98) | class FunctionalScheduler(TemperatureScheduler):
    method __init__ (line 100) | def __init__(self, scheduler_func, base_temperature) -> None:
    method forward (line 104) | def forward(self, logits_S, logits_T):
  class ConstantScheduler (line 108) | class ConstantScheduler(TemperatureScheduler):
    method forward (line 109) | def forward(self, logits_S, logits_T):
  class FlswScheduler (line 113) | class FlswScheduler(FunctionalScheduler):
    method __init__ (line 114) | def __init__(self, beta=1, gamma=1, eps=1e-4, base_temperature=8):
  class CwsmScheduler (line 121) | class CwsmScheduler(FunctionalScheduler):
    method __init__ (line 122) | def __init__(self, beta=1, base_temperature=8):
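
`LinearTeacherAnnealingScheduler` above exposes its current value through `__float__` and advances with `step()`. A sketch of one plausible linear schedule — the decay direction (teacher weight 1 → 0 over training) is an assumption:

```python
class LinearTeacherAnnealingScheduler:
    """Step counter whose float value anneals linearly; float(s) gives
    the current teacher weight so it can be used directly in a loss mix."""

    def __init__(self, num_training_steps):
        self._step = 0
        self._total = num_training_steps

    def step(self):
        self._step += 1

    def __float__(self):
        return max(0.0, 1.0 - self._step / self._total)
```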

FILE: hanlp/components/eos/ngram.py
  class NgramSentenceBoundaryDetectionModel (line 22) | class NgramSentenceBoundaryDetectionModel(nn.Module):
    method __init__ (line 24) | def __init__(self,
    method forward (line 58) | def forward(self, x: torch.Tensor):
  class NgramSentenceBoundaryDetector (line 69) | class NgramSentenceBoundaryDetector(TorchComponent):
    method __init__ (line 71) | def __init__(self, **kwargs) -> None:
    method build_optimizer (line 87) | def build_optimizer(self, **kwargs):
    method build_criterion (line 91) | def build_criterion(self, **kwargs):
    method build_metric (line 94) | def build_metric(self, **kwargs):
    method execute_training_loop (line 97) | def execute_training_loop(self,
    method fit_dataloader (line 124) | def fit_dataloader(self,
    method compute_loss (line 150) | def compute_loss(self, prediction, batch, criterion):
    method evaluate_dataloader (line 155) | def evaluate_dataloader(self,
    method build_model (line 178) | def build_model(self, training=True, **kwargs) -> torch.nn.Module:
    method build_dataloader (line 182) | def build_dataloader(self, data, batch_size, shuffle, device, logger: ...
    method predict (line 191) | def predict(self, data: Union[str, List[str]], batch_size: int = None,...
    method fit (line 245) | def fit(self,
    method build_vocabs (line 273) | def build_vocabs(self, dataset: SentenceBoundaryDetectionDataset, logg...
    method reset_metrics (line 297) | def reset_metrics(self, metrics):
    method report_metrics (line 300) | def report_metrics(self, loss, metrics):
    method update_metrics (line 303) | def update_metrics(self, batch: dict, prediction: torch.FloatTensor, m...
    method feed_batch (line 309) | def feed_batch(self, batch):

FILE: hanlp/components/lambda_wrapper.py
  class LambdaComponent (line 10) | class LambdaComponent(Component):
    method __init__ (line 11) | def __init__(self, function: Callable) -> None:
    method predict (line 18) | def predict(self, data: Any, **kwargs):
    method from_config (line 25) | def from_config(meta: dict, **kwargs):

FILE: hanlp/components/lemmatizer.py
  function add_lemma_rules_to_sample (line 11) | def add_lemma_rules_to_sample(sample: dict):
  class TransformerLemmatizer (line 20) | class TransformerLemmatizer(TransformerTagger):
    method __init__ (line 22) | def __init__(self, **kwargs) -> None:
    method build_dataset (line 30) | def build_dataset(self, data, transform=None, **kwargs):
    method prediction_to_human (line 36) | def prediction_to_human(self, pred, vocab: List[str], batch, token=None):
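
`add_lemma_rules_to_sample` turns (form, lemma) pairs into edit rules so that lemmatization reduces to tagging with `TransformerTagger`. A toy sketch of the idea, assuming a simple suffix-replacement encoding; HanLP's actual rule format is richer than this:

```python
def make_lemma_rule(form: str, lemma: str) -> str:
    """Encode the lemma as a rule 'cut_N+suffix': drop N trailing
    characters of the form, then append the suffix."""
    i = 0  # length of the longest common prefix
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    cut = len(form) - i
    suffix = lemma[i:]
    return f"cut_{cut}+{suffix}"

def apply_lemma_rule(form: str, rule: str) -> str:
    """Invert make_lemma_rule: strip N characters, append the suffix."""
    cut_part, suffix = rule.split("+", 1)
    cut = int(cut_part[len("cut_"):])
    return (form[:-cut] if cut else form) + suffix

rule = make_lemma_rule("studies", "study")  # common prefix is "stud"
```

Because many forms share the same rule (e.g. "studies" and "flies"), the tag vocabulary stays small even for large lexicons.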

FILE: hanlp/components/lm/mlm.py
  class MaskedLanguageModelDataset (line 22) | class MaskedLanguageModelDataset(TransformableDataset):
    method load_file (line 24) | def load_file(self, filepath: str):
  class MaskedLanguageModel (line 28) | class MaskedLanguageModel(TorchComponent):
    method __init__ (line 30) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 34) | def build_dataloader(self, data, batch_size, shuffle=False, device=Non...
    method build_optimizer (line 49) | def build_optimizer(self, **kwargs):
    method build_criterion (line 52) | def build_criterion(self, **kwargs):
    method build_metric (line 55) | def build_metric(self, **kwargs):
    method execute_training_loop (line 58) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 62) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 65) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method build_model (line 68) | def build_model(self, training=True, transformer=None, **kwargs) -> to...
    method input_is_flat (line 71) | def input_is_flat(self, masked_sents):
    method predict (line 74) | def predict(self, masked_sents: Union[str, List[str]], batch_size=32, ...
    method load_config (line 105) | def load_config(self, save_dir, filename='config.json', **kwargs):
    method load_vocabs (line 108) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method load_weights (line 111) | def load_weights(self, save_dir, filename='model.pt', **kwargs):

FILE: hanlp/components/mtl/multi_task_learning.py
  class MultiTaskModel (line 38) | class MultiTaskModel(torch.nn.Module):
    method __init__ (line 40) | def __init__(self,
  class MultiTaskDataLoader (line 52) | class MultiTaskDataLoader(DataLoader):
    method __init__ (line 54) | def __init__(self, training=True, tau: float = 0.8, **dataloaders) -> ...
    method __len__ (line 62) | def __len__(self) -> int:
    method __iter__ (line 67) | def __iter__(self):
    method sampling_weights (line 81) | def sampling_weights(self):
    method sizes (line 89) | def sizes(self):
  class MultiTaskLearning (line 93) | class MultiTaskLearning(TorchComponent):
    method __init__ (line 95) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 114) | def build_dataloader(self,
    method build_transform (line 198) | def build_transform(self, task: Task) -> Tuple[TransformerSequenceToke...
    method build_optimizer (line 208) | def build_optimizer(self,
    method build_criterion (line 265) | def build_criterion(self, **kwargs):
    method build_metric (line 268) | def build_metric(self, **kwargs):
    method execute_training_loop (line 276) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method _close_dataloader (line 311) | def _close_dataloader(self, d):
    method fit_dataloader (line 323) | def fit_dataloader(self,
    method report_metrics (line 384) | def report_metrics(self, loss, metrics: MetricDict):
    method evaluate_dataloader (line 389) | def evaluate_dataloader(self,
    method build_model (line 438) | def build_model(self, training=False, **kwargs) -> torch.nn.Module:
    method predict (line 459) | def predict(self,
    method resolve_tasks (line 570) | def resolve_tasks(self, tasks, skip_tasks) -> List[Iterable[str]]:
    method predict_task (line 593) | def predict_task(self, task: Task, output_key, batch, results, output_...
    method _resolve_task_name (line 601) | def _resolve_task_name(self, dependencies):
    method fit (line 617) | def fit(self,
    method on_config_ready (line 650) | def on_config_ready(self, **kwargs):
    method reset_metrics (line 669) | def reset_metrics(metrics: Dict[str, Metric]):
    method feed_batch (line 673) | def feed_batch(self,
    method _encode (line 695) | def _encode(self, batch, task_name, output_dict=None, cls_is_bos=False...
    method decode_output (line 722) | def decode_output(self, output_dict, batch, task_name=None):
    method update_metrics (line 739) | def update_metrics(self, batch: Dict[str, Any], output_dict: Dict[str,...
    method compute_loss (line 748) | def compute_loss(self,
    method evaluate (line 755) | def evaluate(self, save_dir=None, logger: logging.Logger = None, batch...
    method save_vocabs (line 761) | def save_vocabs(self, save_dir, filename='vocabs.json'):
    method load_vocabs (line 765) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method parallelize (line 769) | def parallelize(self, devices: List[Union[int, torch.device]]):
    method __call__ (line 772) | def __call__(self, data, **kwargs) -> Document:
    method __getitem__ (line 775) | def __getitem__(self, task_name: str) -> Task:
    method __delitem__ (line 778) | def __delitem__(self, task_name: str):
    method __repr__ (line 798) | def __repr__(self):
    method items (line 801) | def items(self):
    method __setattr__ (line 804) | def __setattr__(self, key: str, value):
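
`MultiTaskDataLoader` takes a `tau` parameter (default 0.8) that controls how often each task's loader is sampled during joint training. A self-contained sketch of size-proportional sampling with a tau exponent; the helper name and dict-based interface are illustrative, not the class's actual `sampling_weights` signature:

```python
def sampling_weights(sizes, tau=0.8):
    """Sample tasks proportionally to dataset_size ** tau; tau < 1
    flattens the distribution so small tasks are visited more often
    than their raw share of the data would allow."""
    scaled = {name: n ** tau for name, n in sizes.items()}
    total = sum(scaled.values())
    return {name: w / total for name, w in scaled.items()}

w = sampling_weights({"tok": 10000, "ner": 1000}, tau=0.8)
```

With `tau=1` sampling is exactly proportional to size; with `tau=0` every task is sampled uniformly regardless of size.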

FILE: hanlp/components/mtl/tasks/__init__.py
  class Task (line 27) | class Task(ConfigTracker, TorchComponent, ABC):
    method __init__ (line 29) | def __init__(self,
    method build_dataloader (line 78) | def build_dataloader(self,
    method build_optimizer (line 102) | def build_optimizer(self, decoder: torch.nn.Module, **kwargs):
    method build_batch_wise_scheduler (line 105) | def build_batch_wise_scheduler(self, decoder: torch.nn.Module, **kwargs):
    method compute_loss (line 109) | def compute_loss(self,
    method decode_output (line 117) | def decode_output(self,
    method update_metrics (line 124) | def update_metrics(self,
    method build_model (line 133) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 137) | def build_metric(self, **kwargs):
    method fit_dataloader (line 140) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 143) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, o...
    method execute_training_loop (line 146) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method compute_lens (line 151) | def compute_lens(self, data: Union[List[Dict[str, Any]], str], dataset...
    method feed_batch (line 175) | def feed_batch(self,
    method input_is_flat (line 182) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 194) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...
    method transform_batch (line 198) | def transform_batch(self,
    method _adjust_token (line 233) | def _adjust_token(self, batch, cls_is_bos, sep_is_eos, token_key):
    method build_samples (line 250) | def build_samples(self, inputs, cls_is_bos=False, sep_is_eos=False):
    method build_tokenizer (line 270) | def build_tokenizer(self, tokenizer: TransformerSequenceTokenizer):
    method finalize_document (line 287) | def finalize_document(self, doc: Document, task_name: str):

FILE: hanlp/components/mtl/tasks/amr.py
  class GraphAbstractMeaningRepresentationParsing (line 31) | class GraphAbstractMeaningRepresentationParsing(Task, GraphAbstractMeani...
    method __init__ (line 33) | def __init__(self,
    method build_dataloader (line 70) | def build_dataloader(self,
    method compute_loss (line 94) | def compute_loss(self,
    method decode_output (line 104) | def decode_output(self,
    method update_metrics (line 111) | def update_metrics(self,
    method build_model (line 118) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 121) | def build_metric(self, **kwargs):
    method input_is_flat (line 124) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 127) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...
    method evaluate_dataloader (line 135) | def evaluate_dataloader(self,
    method feed_batch (line 150) | def feed_batch(self,
    method transform_batch (line 163) | def transform_batch(self, batch: Dict[str, Any], results: Dict[str, An...

FILE: hanlp/components/mtl/tasks/constituency.py
  class CRFConstituencyParsing (line 27) | class CRFConstituencyParsing(Task, CRFConstituencyParser):
    method __init__ (line 28) | def __init__(self,
    method build_dataloader (line 78) | def build_dataloader(self,
    method feed_batch (line 101) | def feed_batch(self,
    method compute_loss (line 112) | def compute_loss(self,
    method decode_output (line 121) | def decode_output(self,
    method update_metrics (line 137) | def update_metrics(self,
    method build_model (line 143) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 146) | def build_metric(self, **kwargs):
    method input_is_flat (line 149) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 152) | def prediction_to_result(self, prediction: List, batch: Dict[str, Any]...
    method finalize_document (line 155) | def finalize_document(self, doc: Document, task_name: str):
    method build_samples (line 168) | def build_samples(self, inputs, cls_is_bos=False, sep_is_eos=False):

FILE: hanlp/components/mtl/tasks/dep.py
  class BiaffineDependencyParsing (line 26) | class BiaffineDependencyParsing(Task, BiaffineDependencyParser):
    method __init__ (line 27) | def __init__(self,
    method update_metrics (line 84) | def update_metrics(self, batch: Dict[str, Any],
    method decode_output (line 90) | def decode_output(self,
    method compute_loss (line 98) | def compute_loss(self, batch: Dict[str, Any],
    method build_model (line 106) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 110) | def build_metric(self, **kwargs):
    method build_dataloader (line 113) | def build_dataloader(self, data, transform: TransformList = None, trai...
    method feed_batch (line 133) | def feed_batch(self, h: torch.FloatTensor, batch: Dict[str, torch.Tens...
    method build_optimizer (line 140) | def build_optimizer(self, decoder: torch.nn.Module, **kwargs):
    method input_is_flat (line 149) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 152) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...
    method build_samples (line 165) | def build_samples(self, inputs, cls_is_bos=False, sep_is_eos=False):

FILE: hanlp/components/mtl/tasks/dep_2nd.py
  class BiaffineSecondaryDependencyDecoder (line 22) | class BiaffineSecondaryDependencyDecoder(torch.nn.Module):
    method __init__ (line 23) | def __init__(self, hidden_size, config) -> None:
    method forward (line 28) | def forward(self, contextualized_embeddings: torch.FloatTensor, batch:...
  class BiaffineSecondaryDependencyParsing (line 38) | class BiaffineSecondaryDependencyParsing(Task, BiaffineSecondaryParser):
    method __init__ (line 40) | def __init__(self, trn: str = None, dev: str = None, tst: str = None, ...
    method build_dataloader (line 59) | def build_dataloader(self, data, transform: Callable = None, training=...
    method update_metrics (line 72) | def update_metrics(self, batch: Dict[str, Any],
    method decode_output (line 79) | def decode_output(self, output: Dict[str, Any], batch: Dict[str, Any],...
    method compute_loss (line 83) | def compute_loss(self, batch: Dict[str, Any],
    method build_model (line 89) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 92) | def build_metric(self, **kwargs):
    method build_criterion (line 95) | def build_criterion(self, **kwargs):
    method build_optimizer (line 98) | def build_optimizer(self, decoder: torch.nn.Module, **kwargs):
    method input_is_flat (line 106) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 109) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...

FILE: hanlp/components/mtl/tasks/lem.py
  class LinearDecoder (line 19) | class LinearDecoder(torch.nn.Module):
    method __init__ (line 20) | def __init__(self,
    method forward (line 26) | def forward(self, contextualized_embeddings: torch.FloatTensor, batch:...
  class TransformerLemmatization (line 30) | class TransformerLemmatization(Task, TransformerLemmatizer):
    method __init__ (line 32) | def __init__(self,
    method build_dataloader (line 76) | def build_dataloader(self,
    method compute_loss (line 97) | def compute_loss(self,
    method decode_output (line 103) | def decode_output(self,
    method update_metrics (line 111) | def update_metrics(self,
    method build_model (line 118) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 121) | def build_metric(self, **kwargs):
    method input_is_flat (line 124) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 127) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...

FILE: hanlp/components/mtl/tasks/ner/biaffine_ner.py
  class BiaffineNamedEntityRecognition (line 23) | class BiaffineNamedEntityRecognition(Task, BiaffineNamedEntityRecognizer):
    method __init__ (line 25) | def __init__(self, trn: str = None, dev: str = None, tst: str = None, ...
    method update_metrics (line 56) | def update_metrics(self, batch: Dict[str, Any],
    method decode_output (line 61) | def decode_output(self,
    method compute_loss (line 69) | def compute_loss(self, batch: Dict[str, Any],
    method build_dataloader (line 74) | def build_dataloader(self, data,
    method build_model (line 93) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 97) | def build_metric(self, **kwargs):
    method input_is_flat (line 100) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 103) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...

FILE: hanlp/components/mtl/tasks/ner/tag_ner.py
  class LinearCRFDecoder (line 22) | class LinearCRFDecoder(torch.nn.Module):
    method __init__ (line 23) | def __init__(self,
    method forward (line 33) | def forward(self, contextualized_embeddings: torch.FloatTensor, batch:...
  class TaggingNamedEntityRecognition (line 39) | class TaggingNamedEntityRecognition(Task, TransformerNamedEntityRecogniz...
    method __init__ (line 41) | def __init__(self,
    method build_dataloader (line 112) | def build_dataloader(self,
    method compute_loss (line 135) | def compute_loss(self,
    method decode_output (line 141) | def decode_output(self,
    method update_metrics (line 149) | def update_metrics(self,
    method build_model (line 157) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 160) | def build_metric(self, **kwargs):
    method input_is_flat (line 163) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 166) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...

FILE: hanlp/components/mtl/tasks/pos.py
  class LinearCRFDecoder (line 22) | class LinearCRFDecoder(torch.nn.Module):
    method __init__ (line 23) | def __init__(self,
    method forward (line 38) | def forward(self, contextualized_embeddings: torch.FloatTensor, batch:...
  class TransformerTagging (line 54) | class TransformerTagging(Task, TransformerTagger):
    method __init__ (line 56) | def __init__(self,
    method build_dataloader (line 113) | def build_dataloader(self,
    method compute_loss (line 134) | def compute_loss(self,
    method decode_output (line 140) | def decode_output(self,
    method update_metrics (line 148) | def update_metrics(self,
    method build_model (line 155) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 158) | def build_metric(self, **kwargs):
    method input_is_flat (line 161) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 164) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...

FILE: hanlp/components/mtl/tasks/sdp.py
  class BiaffineSemanticDependencyParsing (line 24) | class BiaffineSemanticDependencyParsing(Task, BiaffineSemanticDependency...
    method __init__ (line 25) | def __init__(self,
    method update_metrics (line 85) | def update_metrics(self, batch: Dict[str, Any],
    method decode_output (line 91) | def decode_output(self,
    method compute_loss (line 99) | def compute_loss(self, batch: Dict[str, Any],
    method build_model (line 107) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 111) | def build_metric(self, **kwargs):
    method build_dataloader (line 114) | def build_dataloader(self, data, transform: TransformList = None, trai...
    method feed_batch (line 130) | def feed_batch(self, h: torch.FloatTensor, batch: Dict[str, torch.Tens...
    method build_optimizer (line 140) | def build_optimizer(self, decoder: torch.nn.Module, **kwargs):
    method input_is_flat (line 149) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 152) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...
    method build_samples (line 169) | def build_samples(self, inputs, cls_is_bos=False, sep_is_eos=False):

FILE: hanlp/components/mtl/tasks/srl/bio_srl.py
  class SpanBIOSemanticRoleLabeling (line 22) | class SpanBIOSemanticRoleLabeling(Task, SpanBIOSemanticRoleLabeler):
    method __init__ (line 24) | def __init__(self,
    method build_dataloader (line 67) | def build_dataloader(self, data, transform: Callable = None, training=...
    method compute_loss (line 79) | def compute_loss(self, batch: Dict[str, Any],
    method decode_output (line 85) | def decode_output(self,
    method update_metrics (line 93) | def update_metrics(self, batch: Dict[str, Any],
    method build_model (line 98) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method feed_batch (line 107) | def feed_batch(self, h: torch.FloatTensor, batch: Dict[str, torch.Tens...
    method build_metric (line 119) | def build_metric(self, **kwargs):
    method input_is_flat (line 122) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 125) | def prediction_to_result(self, prediction: List, batch: Dict[str, Any]...

FILE: hanlp/components/mtl/tasks/srl/rank_srl.py
  class SpanRankingSemanticRoleLabeling (line 21) | class SpanRankingSemanticRoleLabeling(Task, SpanRankingSemanticRoleLabel...
    method __init__ (line 23) | def __init__(self, trn: str = None, dev: str = None, tst: str = None, ...
    method build_dataloader (line 75) | def build_dataloader(self, data, transform: Callable = None, training=...
    method update_metrics (line 85) | def update_metrics(self, batch: Dict[str, Any],
    method decode_output (line 91) | def decode_output(self,
    method compute_loss (line 98) | def compute_loss(self, batch: Dict[str, Any],
    method build_model (line 103) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 106) | def build_metric(self, **kwargs):
    method build_criterion (line 110) | def build_criterion(self, **kwargs):
    method input_is_flat (line 113) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 116) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...

FILE: hanlp/components/mtl/tasks/tok/reg_tok.py
  function generate_token_span_tuple (line 23) | def generate_token_span_tuple(sample: dict):
  class RegressionTokenizingDecoder (line 37) | class RegressionTokenizingDecoder(torch.nn.Linear):
    method __init__ (line 39) | def __init__(self, in_features: int, out_features: int = 1, bias: bool...
    method forward (line 43) | def forward(self, input: Tensor, **kwargs) -> Tensor:
  class RegressionTokenization (line 47) | class RegressionTokenization(Task):
    method __init__ (line 49) | def __init__(self, trn: str = None, dev: str = None, tst: str = None, ...
    method build_criterion (line 55) | def build_criterion(self, **kwargs):
    method build_metric (line 58) | def build_metric(self, **kwargs):
    method build_model (line 62) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method predict (line 65) | def predict(self, data: Union[str, List[str]], batch_size: int = None,...
    method build_dataloader (line 68) | def build_dataloader(self,
    method decode_output (line 95) | def decode_output(self,
    method update_metrics (line 101) | def update_metrics(self, batch: Dict[str, Any],
    method compute_loss (line 106) | def compute_loss(self, batch: Dict[str, Any],

FILE: hanlp/components/mtl/tasks/tok/tag_tok.py
  class LinearCRFDecoder (line 23) | class LinearCRFDecoder(torch.nn.Module):
    method __init__ (line 24) | def __init__(self,
    method forward (line 32) | def forward(self, contextualized_embeddings: torch.FloatTensor, batch:...
  class TaggingTokenization (line 36) | class TaggingTokenization(Task, TransformerTaggingTokenizer):
    method __init__ (line 38) | def __init__(self,
    method build_dataloader (line 95) | def build_dataloader(self, data, transform: TransformList = None, trai...
    method compute_loss (line 116) | def compute_loss(self,
    method decode_output (line 122) | def decode_output(self, output: Union[torch.Tensor, Dict[str, torch.Te...
    method update_metrics (line 126) | def update_metrics(self, batch: Dict[str, Any],
    method build_model (line 131) | def build_model(self, encoder_size, training=True, **kwargs) -> torch....
    method build_metric (line 134) | def build_metric(self, **kwargs):
    method build_criterion (line 137) | def build_criterion(self, model=None, **kwargs):
    method input_is_flat (line 140) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 143) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...
    method build_tokenizer (line 146) | def build_tokenizer(self, tokenizer: TransformerSequenceTokenizer):
    method build_samples (line 164) | def build_samples(self, inputs, cls_is_bos=False, sep_is_eos=False):
    method dict_force (line 168) | def dict_force(self) -> DictInterface:
    method dict_force (line 172) | def dict_force(self, dictionary: Union[DictInterface, Union[Dict[str, ...
    method dict_combine (line 178) | def dict_combine(self) -> DictInterface:
    method dict_combine (line 182) | def dict_combine(self, dictionary: Union[DictInterface, Union[Dict[str...
    method transform_batch (line 186) | def transform_batch(self, batch: Dict[str, Any], results: Dict[str, An...

FILE: hanlp/components/mtl/tasks/ud.py
  class UniversalDependenciesParsing (line 24) | class UniversalDependenciesParsing(Task, UniversalDependenciesParser):
    method __init__ (line 26) | def __init__(self,
    method build_dataloader (line 73) | def build_dataloader(self, data, transform: Callable = None, training=...
    method compute_loss (line 92) | def compute_loss(self, batch: Dict[str, Any],
    method decode_output (line 97) | def decode_output(self, output: Union[torch.Tensor, Dict[str, torch.Te...
    method update_metrics (line 102) | def update_metrics(self, batch: Dict[str, Any],
    method build_model (line 108) | def build_model(self,
    method build_metric (line 128) | def build_metric(self, **kwargs):
    method input_is_flat (line 131) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 134) | def prediction_to_result(self, prediction: Dict[str, Any], batch: Dict...
    method feed_batch (line 137) | def feed_batch(self, h: torch.FloatTensor, batch: Dict[str, torch.Tens...
    method finalize_document (line 146) | def finalize_document(self, doc: Document, task_name: str):

FILE: hanlp/components/ner/biaffine_ner/biaffine_ner.py
  class BiaffineNamedEntityRecognizer (line 24) | class BiaffineNamedEntityRecognizer(TorchComponent):
    method __init__ (line 26) | def __init__(self, **kwargs) -> None:
    method build_optimizer (line 38) | def build_optimizer(self,
    method use_transformer (line 67) | def use_transformer(self):
    method _get_transformer (line 70) | def _get_transformer(self):
    method build_criterion (line 73) | def build_criterion(self, **kwargs):
    method build_metric (line 77) | def build_metric(self, **kwargs) -> F1:
    method execute_training_loop (line 80) | def execute_training_loop(self,
    method fit_dataloader (line 115) | def fit_dataloader(self,
    method evaluate_dataloader (line 151) | def evaluate_dataloader(self,
    method build_model (line 186) | def build_model(self,
    method build_dataloader (line 198) | def build_dataloader(self, data, batch_size, shuffle, device, logger: ...
    method build_dataset (line 223) | def build_dataset(self, data, vocabs, transform):
    method predict (line 232) | def predict(self, data: Union[List[str], List[List[str]]], batch_size:...
    method prediction_to_result (line 253) | def prediction_to_result(token, prediction, predictions: List, ret_tok...
    method input_is_flat (line 267) | def input_is_flat(data):
    method fit (line 271) | def fit(self,
    method build_vocabs (line 338) | def build_vocabs(self, dataset, logger, vocabs, lock=True, label_vocab...
    method reset_metrics (line 350) | def reset_metrics(self, metrics):
    method report_metrics (line 353) | def report_metrics(self, loss, metrics):
    method feed_batch (line 356) | def feed_batch(self, batch) -> Dict[str, Any]:
    method update_metrics (line 361) | def update_metrics(self, batch: dict, prediction: Union[Dict, List], m...
    method get_pred_ner (line 368) | def get_pred_ner(self, sentences, span_scores):

FILE: hanlp/components/ner/biaffine_ner/biaffine_ner_model.py
  function initializer_1d (line 12) | def initializer_1d(input_tensor, initializer):
  class BiaffineNamedEntityRecognitionModel (line 19) | class BiaffineNamedEntityRecognitionModel(nn.Module):
    method __init__ (line 21) | def __init__(self, config, embed: torch.nn.Module, context_layer: torc...
    method forward (line 37) | def forward(self,
  class BiaffineNamedEntityRecognitionDecoder (line 54) | class BiaffineNamedEntityRecognitionDecoder(nn.Module):
    method __init__ (line 55) | def __init__(self, hidden_size, ffnn_size, label_space_size, loss_redu...
    method forward (line 76) | def forward(self, contextualized_embeddings: torch.FloatTensor, batch:...
    method get_dense_span_labels (line 86) | def get_dense_span_labels(self, span_starts, span_ends, span_labels, m...
    method decode (line 101) | def decode(self, contextualized_embeddings, gold_starts, gold_ends, go...

FILE: hanlp/components/ner/ner_tf.py
  class IOBES_NamedEntityRecognizer (line 20) | class IOBES_NamedEntityRecognizer(KerasComponent, ABC):
    method predict_batch (line 22) | def predict_batch(self, batch, inputs=None):
  class IOBES_Transform (line 27) | class IOBES_Transform(Transform):
    method Y_to_outputs (line 29) | def Y_to_outputs(self, Y: Union[tf.Tensor, Tuple[tf.Tensor]], gold=Fal...
  class RNNNamedEntityRecognizerTF (line 35) | class RNNNamedEntityRecognizerTF(RNNTaggerTF, IOBES_NamedEntityRecognizer):
    method fit (line 37) | def fit(self, trn_data: str, dev_data: str = None, save_dir: str = Non...
    method build_loss (line 48) | def build_loss(self, loss, **kwargs):
  class NgramConvNamedEntityRecognizerTF (line 56) | class NgramConvNamedEntityRecognizerTF(NgramConvTaggerTF, IOBES_NamedEnt...
    method fit (line 58) | def fit(self, trn_data: Any, dev_data: Any, save_dir: str, word_embed:...
  class IOBES_TransformerTransform (line 69) | class IOBES_TransformerTransform(IOBES_Transform, TransformerTransform):
  class TransformerNamedEntityRecognizerTF (line 73) | class TransformerNamedEntityRecognizerTF(TransformerTaggerTF):
    method __init__ (line 75) | def __init__(self, transform: TransformerTransform = None) -> None:
    method fit (line 80) | def fit(self, trn_data, dev_data, save_dir, transformer, optimizer='ad...

FILE: hanlp/components/ner/rnn_ner.py
  class RNNNamedEntityRecognizer (line 14) | class RNNNamedEntityRecognizer(RNNTagger):
    method __init__ (line 16) | def __init__(self, **kwargs) -> None:
    method build_metric (line 24) | def build_metric(self, **kwargs):
    method evaluate_dataloader (line 27) | def evaluate_dataloader(self, data, criterion, logger=None, ratio_widt...
    method fit (line 33) | def fit(self, trn_data, dev_data, save_dir, batch_size=50, epochs=100,...
    method update_metrics (line 41) | def update_metrics(self, metric, logits, y, mask, batch, prediction):
    method predict (line 47) | def predict(self, tokens: Any, batch_size: int = None, **kwargs):
    method predict_data (line 50) | def predict_data(self, data, batch_size, **kwargs):
    method save_config (line 65) | def save_config(self, save_dir, filename='config.json'):

FILE: hanlp/components/ner/transformer_ner.py
  class TransformerNamedEntityRecognizer (line 18) | class TransformerNamedEntityRecognizer(TransformerTagger):
    method __init__ (line 20) | def __init__(self, **kwargs) -> None:
    method build_metric (line 35) | def build_metric(self, **kwargs):
    method update_metrics (line 39) | def update_metrics(self, metric, logits, y, mask, batch, prediction):
    method decode_output (line 46) | def decode_output(self, logits, mask, batch, model=None):
    method tag_to_span (line 51) | def tag_to_span(self, batch_tags, batch):
    method decorate_spans (line 104) | def decorate_spans(self, spans, batch):
    method generate_prediction_filename (line 114) | def generate_prediction_filename(self, tst_data, save_dir):
    method prediction_to_human (line 117) | def prediction_to_human(self, pred, vocab, batch):
    method input_is_flat (line 120) | def input_is_flat(self, tokens):
    method fit (line 123) | def fit(self, trn_data, dev_data, save_dir, transformer,
    method build_vocabs (line 204) | def build_vocabs(self, trn, logger, **kwargs):
    method build_dataset (line 214) | def build_dataset(self, data, transform=None, **kwargs):
    method dict_whitelist (line 223) | def dict_whitelist(self) -> DictInterface:
    method dict_whitelist (line 227) | def dict_whitelist(self, dictionary: Union[DictInterface, Union[Dict[s...
    method dict_blacklist (line 233) | def dict_blacklist(self) -> DictInterface:
    method dict_blacklist (line 237) | def dict_blacklist(self, dictionary: Union[DictInterface, Union[Dict[s...

FILE: hanlp/components/parsers/alg.py
  function kmeans (line 29) | def kmeans(x, k, max_it=32):
  function eisner (line 102) | def eisner(scores, mask):
  function backtrack (line 195) | def backtrack(p_i, p_c, heads, i, j, complete):
  function stripe (line 209) | def stripe(x, n, w, offset=(0, 0), dim=1):
  function cky (line 246) | def cky(scores, mask):
  function istree (line 318) | def istree(sequence, proj=False, multiroot=False):
  function tarjan (line 352) | def tarjan(sequence):
  function chuliu_edmonds (line 408) | def chuliu_edmonds(s):
  function mst (line 507) | def mst(scores, mask, multiroot=False):
  function eisner2o (line 566) | def eisner2o(scores, mask):
  function pad (line 721) | def pad(tensors, padding_value=0, total_length=None):
  function decode_dep (line 733) | def decode_dep(s_arc, mask, tree=False, proj=False):

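The decoders in `alg.py` (`eisner`, `mst`, `chuliu_edmonds`, `decode_dep`) all assume or enforce well-formed dependency trees, which is what `istree` checks. A minimal sketch of such a validity check for a head sequence — a simplified stand-in, not the repo's implementation, which also supports `proj` and `multiroot` options:

```python
def is_single_root_tree(heads):
    """heads[i] is the 1-indexed head of token i+1; 0 denotes the root.
    True iff exactly one token attaches to root and no cycles exist."""
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:            # climb toward the root
            if node in seen:        # revisited a node: cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True
```
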
FILE: hanlp/components/parsers/alg_tf.py
  function nonzero (line 11) | def nonzero(t: tf.Tensor) -> tf.Tensor:
  function view (line 15) | def view(t: tf.Tensor, *dims) -> tf.Tensor:
  function arange (line 19) | def arange(n: int) -> tf.Tensor:
  function randperm (line 23) | def randperm(n: int) -> tf.Tensor:
  function tolist (line 27) | def tolist(t: tf.Tensor) -> List:
  function kmeans (line 33) | def kmeans(x, k, seed=None):
  class Tarjan (line 91) | class Tarjan:
    method __init__ (line 94) | def __init__(self, prediction, tokens):
    method strongconnect (line 121) | def strongconnect(self, v, index, stack):
    method edges (line 158) | def edges(self):
    method vertices (line 162) | def vertices(self):
    method indices (line 166) | def indices(self):
    method SCCs (line 170) | def SCCs(self):
  function tarjan (line 174) | def tarjan(parse_probs, length, tokens_to_keep, ensure_tree=True):
  function rel_argmax (line 259) | def rel_argmax(rel_probs, length, root, ensure_tree=True):

FILE: hanlp/components/parsers/biaffine/biaffine.py
  class Biaffine (line 28) | class Biaffine(nn.Module):
    method __init__ (line 54) | def __init__(self, n_in, n_out=1, bias_x=True, bias_y=True):
    method __repr__ (line 65) | def __repr__(self):
    method reset_parameters (line 74) | def reset_parameters(self):
    method forward (line 77) | def forward(self, x, y):

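`Biaffine` scores every pair of positions with a bilinear form, optionally appending a constant bias dimension to each side so the weight tensor can also encode linear terms. A NumPy sketch of the core computation for a single sentence — the torch module additionally handles batching and an `n_out` channel dimension:

```python
import numpy as np

def biaffine_score(x, y, W, bias_x=True, bias_y=True):
    """s[i, j] = x_i^T W y_j over all position pairs.
    x: (n, d), y: (m, d), W: (d + bias_x, d + bias_y) -> s: (n, m)."""
    if bias_x:  # append a constant 1 so W can encode a linear term in y
        x = np.concatenate([x, np.ones((len(x), 1))], axis=-1)
    if bias_y:
        y = np.concatenate([y, np.ones((len(y), 1))], axis=-1)
    return x @ W @ y.T
```
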
FILE: hanlp/components/parsers/biaffine/biaffine_2nd_dep.py
  class BiaffineSeparateDecoder (line 25) | class BiaffineSeparateDecoder(torch.nn.Module):
    method __init__ (line 27) | def __init__(self, hidden_size, config) -> None:
    method forward (line 40) | def forward(self, x, mask):
  class BiaffineJointDecoder (line 44) | class BiaffineJointDecoder(BiaffineDecoder):
    method __init__ (line 45) | def __init__(self, hidden_size, config) -> None:
    method forward (line 56) | def forward(self, x, mask=None, **kwargs: Any):
  class BiaffineSecondaryModel (line 63) | class BiaffineSecondaryModel(torch.nn.Module):
    method __init__ (line 65) | def __init__(self, config, pretrained_embed: torch.Tensor = None, tran...
    method forward (line 72) | def forward(self,
  class BiaffineSecondaryParser (line 82) | class BiaffineSecondaryParser(BiaffineDependencyParser):
    method __init__ (line 84) | def __init__(self) -> None:
    method build_dataset (line 88) | def build_dataset(self, data, bos_transform=None):
    method build_criterion (line 99) | def build_criterion(self, **kwargs):
    method fit (line 103) | def fit(self, trn_data, dev_data, save_dir, feat=None, n_embed=100, pr...
    method build_vocabs (line 115) | def build_vocabs(self, dataset, logger=None, transformer=None):
    method create_model (line 122) | def create_model(self, pretrained_embed, transformer):
    method compute_loss (line 125) | def compute_loss(self, arc_scores, rel_scores, arcs, rels, mask, crite...
    method compute_mask (line 136) | def compute_mask(arc_scores_2nd, batch, mask_1st):
    method unpack_scores (line 142) | def unpack_scores(self, arc_scores, rel_scores):
    method get_pad_dict (line 147) | def get_pad_dict(self):
    method decode (line 152) | def decode(self, arc_scores, rel_scores, mask, batch=None, predicting=...
    method update_metric (line 184) | def update_metric(self, arc_preds, rel_preds, arcs, rels, mask, puncts...
    method build_metric (line 192) | def build_metric(self, **kwargs):
    method collect_outputs_extend (line 197) | def collect_outputs_extend(self, predictions: list, arc_preds, rel_pre...
    method predictions_to_human (line 200) | def predictions_to_human(self, predictions, outputs, data, use_pos, co...

FILE: hanlp/components/parsers/biaffine/biaffine_dep.py
  class BiaffineDependencyParser (line 33) | class BiaffineDependencyParser(TorchComponent):
    method __init__ (line 34) | def __init__(self) -> None:
    method predict (line 41) | def predict(self, data: Any, batch_size=None, batch_max_tokens=None, c...
    method build_samples (line 70) | def build_samples(self, data, use_pos=None):
    method input_is_flat (line 84) | def input_is_flat(self, data, use_pos=None):
    method before_outputs (line 93) | def before_outputs(self, data):
    method post_outputs (line 100) | def post_outputs(self, predictions, data, order, use_pos, build_data, ...
    method predictions_to_human (line 108) | def predictions_to_human(self, predictions, outputs, data, use_pos, co...
    method collect_outputs (line 126) | def collect_outputs(self, arc_scores, rel_scores, mask, batch, predict...
    method collect_outputs_extend (line 138) | def collect_outputs_extend(self, predictions: list, arc_preds, rel_pre...
    method use_pos (line 143) | def use_pos(self):
    method fit (line 146) | def fit(self, trn_data, dev_data, save_dir,
    method execute_training_loop (line 194) | def execute_training_loop(self, trn, dev, devices, epochs, logger, pat...
    method build_optimizer (line 235) | def build_optimizer(self, epochs, trn, gradient_accumulation, **kwargs):
    method build_transformer_tokenizer (line 282) | def build_transformer_tokenizer(self):
    method build_dataloader (line 292) | def build_dataloader(self,
    method cache_dataset (line 330) | def cache_dataset(self, dataset, timer, training=False, logger=None):
    method get_pad_dict (line 334) | def get_pad_dict(self):
    method build_dataset (line 337) | def build_dataset(self, data, bos_transform=None):
    method build_tokenizer_transform (line 350) | def build_tokenizer_transform(self):
    method build_vocabs (line 357) | def build_vocabs(self, dataset, logger=None, transformer=None):
    method build_model (line 403) | def build_model(self, training=True, **kwargs) -> torch.nn.Module:
    method create_model (line 410) | def create_model(self, pretrained_embed, transformer):
    method build_embeddings (line 416) | def build_embeddings(self, training=True):
    method fit_dataloader (line 427) | def fit_dataloader(self,
    method _step (line 460) | def _step(self, optimizer, scheduler, transformer_optimizer, transform...
    method feed_batch (line 472) | def feed_batch(self, batch):
    method _report (line 484) | def _report(self, loss, metric: AttachmentScore):
    method compute_loss (line 487) | def compute_loss(self, arc_scores, rel_scores, arcs, rels, mask, crite...
    method evaluate_dataloader (line 499) | def evaluate_dataloader(self, loader: PadSequenceDataLoader, criterion...
    method update_metric (line 537) | def update_metric(self, arc_preds, rel_preds, arcs, rels, mask, puncts...
    method decode (line 543) | def decode(self, arc_scores, rel_scores, mask, batch=None):
    method build_criterion (line 554) | def build_criterion(self, **kwargs):
    method build_metric (line 558) | def build_metric(self, **kwargs):
    method on_config_ready (line 561) | def on_config_ready(self, **kwargs):
    method prediction_to_head_rel (line 565) | def prediction_to_head_rel(self, arcs: torch.LongTensor, rels: torch.L...

FILE: hanlp/components/parsers/biaffine/biaffine_model.py
  class EncoderWithContextualLayer (line 18) | class EncoderWithContextualLayer(nn.Module):
    method __init__ (line 19) | def __init__(self,
    method forward (line 89) | def forward(self, words, feats, input_ids, token_span, mask, lens):
    method run_rnn (line 143) | def run_rnn(self, embed, lens, seq_len):
    method run_transformer (line 149) | def run_transformer(self, input_ids, token_span):
  class BiaffineDecoder (line 154) | class BiaffineDecoder(nn.Module):
    method __init__ (line 155) | def __init__(self, hidden_size, n_mlp_arc, n_mlp_rel, mlp_dropout, n_r...
    method forward (line 181) | def forward(self, x, mask=None, **kwargs: Any) -> Tuple[torch.Tensor, ...
    method decode (line 189) | def decode(arc_d, arc_h, rel_d, rel_h, mask, arc_attn, rel_attn):
    method apply_mlps (line 200) | def apply_mlps(self, x):
  class BiaffineDependencyModel (line 209) | class BiaffineDependencyModel(nn.Module):
    method __init__ (line 211) | def __init__(self, config, pretrained_embed: torch.Tensor = None, tran...
    method forward (line 221) | def forward(self,

FILE: hanlp/components/parsers/biaffine/biaffine_sdp.py
  class BiaffineSemanticDependencyParser (line 20) | class BiaffineSemanticDependencyParser(BiaffineDependencyParser):
    method __init__ (line 21) | def __init__(self) -> None:
    method get_pad_dict (line 28) | def get_pad_dict(self):
    method build_metric (line 31) | def build_metric(self, **kwargs):
    method build_dataset (line 35) | def build_dataset(self, data, transform=None):
    method build_criterion (line 42) | def build_criterion(self, **kwargs):
    method feed_batch (line 45) | def feed_batch(self, batch):
    method convert_to_3d_puncts (line 52) | def convert_to_3d_puncts(puncts, mask):
    method convert_to_3d_mask (line 58) | def convert_to_3d_mask(arc_scores, mask):
    method compute_loss (line 64) | def compute_loss(self, arc_scores, rel_scores, arcs, rels, mask: torch...
    method cache_dataset (line 79) | def cache_dataset(self, dataset, timer, training=False, logger=None):
    method decode (line 106) | def decode(self, arc_scores, rel_scores, mask, batch=None):
    method collect_outputs_extend (line 134) | def collect_outputs_extend(self, predictions, arc_preds, rel_preds, le...
    method predictions_to_human (line 139) | def predictions_to_human(self, predictions, outputs, data, use_pos, co...
    method fit (line 153) | def fit(self, trn_data, dev_data, save_dir,

FILE: hanlp/components/parsers/biaffine/mlp.py
  class MLP (line 30) | class MLP(nn.Module):
    method __init__ (line 46) | def __init__(self, n_in, n_out, dropout=0, activation=True):
    method __repr__ (line 57) | def __repr__(self):
    method reset_parameters (line 64) | def reset_parameters(self):
    method forward (line 68) | def forward(self, x):

FILE: hanlp/components/parsers/biaffine/structual_attention.py
  class StructuralAttentionLayer (line 21) | class StructuralAttentionLayer(nn.Module):
    method __init__ (line 23) | def __init__(self, hidden_size, n_mlp_arc, n_mlp_rel, mlp_dropout, n_r...
    method forward (line 35) | def forward(self, x, mask):
  class StructuralAttentionModel (line 50) | class StructuralAttentionModel(nn.Module):
    method __init__ (line 51) | def __init__(self,
    method forward (line 77) | def forward(self,
  class MaskedTokenGenerator (line 91) | class MaskedTokenGenerator(object):
    method __init__ (line 93) | def __init__(self, transformer_tokenizer: PreTrainedTokenizer, mask_pr...
    method __call__ (line 103) | def __call__(self, tokens: torch.LongTensor, prefix_mask: torch.LongTe...
  class StructuralAttentionParser (line 118) | class StructuralAttentionParser(BiaffineDependencyParser):
    method __init__ (line 119) | def __init__(self) -> None:
    method build_model (line 124) | def build_model(self, training=True, **kwargs) -> torch.nn.Module:
    method fit (line 129) | def fit(self, trn_data, dev_data, save_dir,
    method feed_batch (line 165) | def feed_batch(self, batch):
    method on_config_ready (line 183) | def on_config_ready(self, **kwargs):
    method compute_loss (line 187) | def compute_loss(self, arc_scores, rel_scores, arcs, rels, mask, crite...
    method build_tokenizer_transform (line 201) | def build_tokenizer_transform(self):
    method build_metric (line 208) | def build_metric(self, training=None, **kwargs):
    method update_metric (line 215) | def update_metric(self, arc_scores, rel_scores, arcs, rels, mask, punc...
    method _report (line 229) | def _report(self, loss, metric):

FILE: hanlp/components/parsers/biaffine/variationalbilstm.py
  class VariationalLSTM (line 33) | class VariationalLSTM(nn.Module):
    method __init__ (line 64) | def __init__(self, input_size, hidden_size, num_layers=1, bidirectiona...
    method __repr__ (line 85) | def __repr__(self):
    method reset_parameters (line 96) | def reset_parameters(self):
    method permute_hidden (line 105) | def permute_hidden(self, hx, permutation):
    method layer_forward (line 113) | def layer_forward(self, x, hx, cell, batch_sizes, reverse=False):
    method forward (line 141) | def forward(self, sequence, hx=None):
  class VariationalLSTMEncoder (line 208) | class VariationalLSTMEncoder(VariationalLSTM, ConfigTracker):
    method __init__ (line 209) | def __init__(self,
    method forward (line 222) | def forward(self, embed, mask):
    method get_output_dim (line 230) | def get_output_dim(self):

FILE: hanlp/components/parsers/biaffine_parser_tf.py
  class BiaffineDependencyParserTF (line 27) | class BiaffineDependencyParserTF(KerasComponent):
    method __init__ (line 28) | def __init__(self, transform: CoNLL_DEP_Transform = None) -> None:
    method build_model (line 35) | def build_model(self, pretrained_embed, n_embed, training, **kwargs) -...
    method _init_config (line 49) | def _init_config(self):
    method load_weights (line 55) | def load_weights(self, save_dir, filename='model.h5', functional=False...
    method fit (line 60) | def fit(self, trn_data, dev_data, save_dir,
    method train_loop (line 94) | def train_loop(self, trn_data, dev_data, epochs, num_examples,
    method evaluate (line 149) | def evaluate(self, input_path: str, save_dir=None, output=False, batch...
    method evaluate_batch (line 155) | def evaluate_batch(self, words, feats, arcs, rels, arc_loss, rel_loss,...
    method _build_metrics (line 162) | def _build_metrics(self):
    method run_metrics (line 171) | def run_metrics(self, arcs, rels, arc_scores, rel_scores, words, mask,...
    method train_batch (line 179) | def train_batch(self, words, feats, arcs, rels, mask, optimizer, arc_l...
    method get_loss (line 187) | def get_loss(self, arc_scores, rel_scores, arcs, rels, mask, arc_loss,...
    method build_optimizer (line 197) | def build_optimizer(self, optimizer='adam', lr=2e-3, mu=.9, nu=.9, eps...
    method build_loss (line 213) | def build_loss(self, arc_loss, rel_loss, **kwargs):
    method sample_data (line 224) | def sample_data(self):
    method num_samples_in (line 227) | def num_samples_in(self, dataset):
    method build_train_dataset (line 230) | def build_train_dataset(self, trn_data, batch_size, num_examples):
    method build_callbacks (line 237) | def build_callbacks(self, save_dir, logger, metrics, **kwargs):
    method build_progbar (line 248) | def build_progbar(self, metrics, training=True):
    method decode (line 253) | def decode(self, arc_scores, rel_scores, mask):
    method evaluate_dataset (line 291) | def evaluate_dataset(self, tst_data, callbacks, output, num_batches, r...
    method predict_batch (line 338) | def predict_batch(self, batch, inputs=None, conll=True, **kwargs):
    method compile_model (line 347) | def compile_model(self, optimizer, loss, metrics):
  class BiaffineSemanticDependencyParserTF (line 351) | class BiaffineSemanticDependencyParserTF(BiaffineDependencyParserTF):
    method __init__ (line 352) | def __init__(self, transform: CoNLL_SDP_Transform = None) -> None:
    method fit (line 359) | def fit(self, trn_data, dev_data, save_dir, n_embed=100, pretrained_em...
    method get_loss (line 371) | def get_loss(self, arc_scores, rel_scores, arcs, rels, mask, arc_loss,...
    method decode (line 383) | def decode(self, arc_scores, rel_scores, mask):
  class BiaffineTransformerDependencyParserTF (line 390) | class BiaffineTransformerDependencyParserTF(BiaffineDependencyParserTF, ...
    method __init__ (line 391) | def __init__(self, transform: CoNLL_Transformer_Transform = None) -> N...
    method build_model (line 397) | def build_model(self, transformer, training, **kwargs) -> tf.keras.Model:
    method build_transformer (line 402) | def build_transformer(self, training, transformer):
    method fit (line 441) | def fit(self, trn_data, dev_data, save_dir, transformer, max_seq_lengt...
    method sample_data (line 463) | def sample_data(self):
    method build_optimizer (line 471) | def build_optimizer(self, optimizer, learning_rate, epsilon, weight_de...
    method build_vocab (line 493) | def build_vocab(self, trn_data, logger):
    method build_callbacks (line 497) | def build_callbacks(self, save_dir, logger, metrics, **kwargs):
    method on_train_begin (line 504) | def on_train_begin(self):
    method train_batch (line 510) | def train_batch(self, words, feats, arcs, rels, mask, optimizer, arc_l...
    method _apply_grads (line 524) | def _apply_grads(self, accum_grads):
    method on_epoch_end (line 540) | def on_epoch_end(self, epoch, logs=None):
  class BiaffineTransformerSemanticDependencyParser (line 545) | class BiaffineTransformerSemanticDependencyParser(BiaffineTransformerDep...
    method __init__ (line 547) | def __init__(self, transform: CoNLL_Transformer_Transform = None) -> N...
    method get_loss (line 552) | def get_loss(self, arc_scores, rel_scores, arcs, rels, mask, arc_loss,...
    method fit (line 556) | def fit(self, trn_data, dev_data, save_dir, transformer, max_seq_lengt...
    method decode (line 566) | def decode(self, arc_scores, rel_scores, mask):

FILE: hanlp/components/parsers/biaffine_tf/alg.py
  function nonzero (line 11) | def nonzero(t: tf.Tensor) -> tf.Tensor:
  function view (line 15) | def view(t: tf.Tensor, *dims) -> tf.Tensor:
  function arange (line 19) | def arange(n: int) -> tf.Tensor:
  function randperm (line 23) | def randperm(n: int) -> tf.Tensor:
  function tolist (line 27) | def tolist(t: tf.Tensor) -> List:
  function kmeans (line 33) | def kmeans(x, k, seed=None):
  class Tarjan (line 91) | class Tarjan:
    method __init__ (line 94) | def __init__(self, prediction, tokens):
    method strongconnect (line 121) | def strongconnect(self, v, index, stack):
    method edges (line 158) | def edges(self):
    method vertices (line 162) | def vertices(self):
    method indices (line 166) | def indices(self):
    method SCCs (line 170) | def SCCs(self):
  function tarjan (line 174) | def tarjan(parse_probs, length, tokens_to_keep, ensure_tree=True):
  function rel_argmax (line 259) | def rel_argmax(rel_probs, length, root, ensure_tree=True):

FILE: hanlp/components/parsers/biaffine_tf/layers.py
  class Biaffine (line 9) | class Biaffine(tf.keras.layers.Layer):
    method __init__ (line 10) | def __init__(self, n_in, n_out=1, bias_x=True, bias_y=True, trainable=...
    method build (line 19) | def build(self, input_shape):
    method extra_repr (line 26) | def extra_repr(self):
    method call (line 36) | def call(self, x, y, **kwargs):
  class MLP (line 50) | class MLP(tf.keras.layers.Layer):
    method __init__ (line 51) | def __init__(self, n_hidden, dropout=0, trainable=True, name=None, dty...
    method call (line 57) | def call(self, x, **kwargs):
  class SharedDropout (line 65) | class SharedDropout(tf.keras.layers.Layer):
    method __init__ (line 67) | def __init__(self, p=0.5, batch_first=True, trainable=True, name=None,...
    method extra_repr (line 73) | def extra_repr(self):
    method call (line 80) | def call(self, x, training=None, **kwargs):
    method get_mask (line 91) | def get_mask(x, p):
  class IndependentDropout (line 98) | class IndependentDropout(tf.keras.layers.Layer):
    method __init__ (line 100) | def __init__(self, p=0.5, trainable=True, name=None, dtype=None, dynam...
    method extra_repr (line 105) | def extra_repr(self):
    method call (line 108) | def call(self, inputs, training=None, **kwargs):

FILE: hanlp/components/parsers/biaffine_tf/model.py
  class BiaffineModelTF (line 9) | class BiaffineModelTF(tf.keras.Model):
    method __init__ (line 11) | def __init__(self, config, embed=None, transformer: TFPreTrainedModel ...
    method call (line 85) | def call(self, inputs, mask_inf=True, **kwargs):
    method run_transformer (line 130) | def run_transformer(self, input_ids, input_mask, prefix_offset):
    method to_functional (line 146) | def to_functional(self):

FILE: hanlp/components/parsers/chu_liu_edmonds.py
  function decode_mst (line 8) | def decode_mst(
  function chu_liu_edmonds (line 100) | def chu_liu_edmonds(
  function _find_cycle (line 274) | def _find_cycle(

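`chu_liu_edmonds` repeatedly contracts cycles found by `_find_cycle` in the current best-incoming-edge graph. A minimal sketch of that cycle-detection step over a parent array — illustrative only; the repo's version operates on score matrices and returns more bookkeeping:

```python
def find_cycle(parents):
    """Return the set of nodes on the first cycle found, else an empty set.
    parents[i] is the parent of node i, or None for a root."""
    color = [0] * len(parents)   # 0 = unvisited, 1 = on current path, 2 = done
    for start in range(len(parents)):
        path, node = [], start
        while node is not None and color[node] == 0:
            color[node] = 1
            path.append(node)
            node = parents[node]
        if node is not None and color[node] == 1:
            return set(path[path.index(node):])  # cycle = tail of the path
        for v in path:
            color[v] = 2
    return set()
```
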
FILE: hanlp/components/parsers/conll.py
  function collapse_enhanced_empty_nodes (line 10) | def collapse_enhanced_empty_nodes(sent: list):
  function read_conll (line 27) | def read_conll(filepath: Union[str, TimingFileIterator], underline_to_no...

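`read_conll` iterates a CoNLL-style file sentence by sentence. The core of any such reader is grouping tab-separated rows into blocks at blank lines while skipping `#` comments; a minimal sketch (the repo's reader additionally handles enhanced/empty nodes, as `collapse_enhanced_empty_nodes` suggests):

```python
def read_conll_blocks(lines):
    """Yield one list of token rows per sentence from CoNLL-formatted lines."""
    sent = []
    for line in lines:
        line = line.strip()
        if not line:          # blank line terminates a sentence
            if sent:
                yield sent
                sent = []
        elif not line.startswith('#'):  # skip comment/metadata lines
            sent.append(line.split('\t'))
    if sent:                  # flush a trailing sentence without a final blank
        yield sent
```
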
FILE: hanlp/components/parsers/constituency/crf_constituency_model.py
  class CRFConstituencyDecoder (line 32) | class CRFConstituencyDecoder(nn.Module):
    method __init__ (line 97) | def __init__(self,
    method forward (line 119) | def forward(self, x, **kwargs):
    method loss (line 147) | def loss(self, s_span, s_label, charts, mask, mbr=True):
    method decode (line 174) | def decode(self, s_span, s_label, mask):
  class CRFConstituencyModel (line 194) | class CRFConstituencyModel(nn.Module):
    method __init__ (line 196) | def __init__(self, encoder, decoder: CRFConstituencyDecoder) -> None:
    method forward (line 201) | def forward(self, batch):

FILE: hanlp/components/parsers/constituency/crf_constituency_parser.py
  class CRFConstituencyParser (line 27) | class CRFConstituencyParser(TorchComponent):
    method __init__ (line 28) | def __init__(self, **kwargs) -> None:
    method build_optimizer (line 37) | def build_optimizer(self, trn, **kwargs):
    method build_criterion (line 41) | def build_criterion(self, decoder=None, **kwargs):
    method build_metric (line 44) | def build_metric(self, **kwargs):
    method execute_training_loop (line 47) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 81) | def fit_dataloader(self,
    method decode_output (line 118) | def decode_output(self, out, mask, batch, span_probs=None, decoder=Non...
    method update_metrics (line 142) | def update_metrics(self, metric, batch, prediction):
    method feed_batch (line 156) | def feed_batch(self, batch: dict):
    method compute_mask (line 161) | def compute_mask(self, batch, offset=1):
    method compute_loss (line 168) | def compute_loss(self, out, y, mask, crf_decoder=None):
    method _step (line 176) | def _step(self, optimizer, scheduler, grad_norm):
    method evaluate_dataloader (line 183) | def evaluate_dataloader(self, data, criterion, logger=None, ratio_widt...
    method build_model (line 206) | def build_model(self, encoder, training=True, **kwargs) -> torch.nn.Mo...
    method build_dataloader (line 211) | def build_dataloader(self,
    method predict (line 237) | def predict(self, data: Union[str, List[str]], **kwargs):
    method input_is_flat (line 258) | def input_is_flat(self, data):
    method build_samples (line 261) | def build_samples(self, data):
    method fit (line 265) | def fit(self,
    method build_dataset (line 299) | def build_dataset(self, data, transform, logger=None):
    method build_vocabs (line 313) | def build_vocabs(self, trn, logger, **kwargs):

FILE: hanlp/components/parsers/constituency/treecrf.py
  class CRFConstituency (line 32) | class CRFConstituency(nn.Module):
    method forward (line 45) | def forward(self, scores, mask, target=None, mbr=False):
    method inside (line 78) | def inside(self, scores, mask):
  class CRF2oDependency (line 105) | class CRF2oDependency(nn.Module):
    method __init__ (line 117) | def __init__(self):
    method forward (line 122) | def forward(self, scores, mask, target=None, mbr=True, partial=False):
    method inside (line 176) | def inside(self, scores, mask, cands=None):
    method loss (line 255) | def loss(self, s_arc, s_sib, s_rel, arcs, sibs, rels, mask, mbr=True, ...
    method decode (line 324) | def decode(self, s_arc, s_sib, s_rel, mask, tree=False, mbr=True, proj...

FILE: hanlp/components/parsers/parse_alg.py
  class Tarjan (line 11) | class Tarjan:
    method __init__ (line 14) | def __init__(self, prediction, tokens):
    method strongconnect (line 41) | def strongconnect(self, v, index, stack):
    method edges (line 78) | def edges(self):
    method vertices (line 82) | def vertices(self):
    method indices (line 86) | def indices(self):
    method SCCs (line 90) | def SCCs(self):
  class UnionFind (line 94) | class UnionFind(object):
    method __init__ (line 96) | def __init__(self, n) -> None:
    method find (line 101) | def find(self, x):
    method unite (line 107) | def unite(self, x, y):
    method same (line 119) | def same(self, x, y):
  function tarjan (line 123) | def tarjan(parse_probs, length, tokens_to_keep, ensure_tree=True):
  function chu_liu_edmonds (line 177) | def chu_liu_edmonds(parse_probs, length):
  function unique_root (line 183) | def unique_root(parse_probs, tokens_to_keep: np.ndarray, length):
  function dfs (line 221) | def dfs(graph, start, end):
  function mst_then_greedy (line 234) | def mst_then_greedy(arc_scores, rel_scores, mask, root_rel_idx, rel_idx=...
  function adjust_root_score (line 255) | def adjust_root_score(arc_scores, parse_preds, root_rel_idx, rel_scores=...
  function add_secondary_arcs_by_scores (line 265) | def add_secondary_arcs_by_scores(arc_scores, rel_scores, tree, root_rel_...
  function add_secondary_arcs_by_preds (line 275) | def add_secondary_arcs_by_preds(arc_scores, arc_preds, rel_preds, tree, ...
  function adjust_root_score_then_add_secondary_arcs (line 304) | def adjust_root_score_then_add_secondary_arcs(arc_scores, rel_scores, tr...

FILE: hanlp/components/parsers/ud/lemma_edit.py
  function min_edit_script (line 9) | def min_edit_script(source, target, allow_copy=False):
  function gen_lemma_rule (line 35) | def gen_lemma_rule(form, lemma, allow_copy=False):
  function apply_lemma_rule (line 79) | def apply_lemma_rule(form, lemma_rule):

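`lemma_edit.py` encodes lemmatization as edit scripts (`gen_lemma_rule`/`apply_lemma_rule`) so a tagger can predict from a closed set of rules instead of generating raw lemmas. A much-simplified suffix-rule sketch of the idea — the actual scripts are richer (e.g. `min_edit_script` with an `allow_copy` option):

```python
def gen_suffix_rule(form, lemma):
    """Encode form -> lemma as (trailing chars to strip, suffix to append)."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1  # length of the common prefix
    return len(form) - i, lemma[i:]

def apply_suffix_rule(form, rule):
    strip, suffix = rule
    return form[:len(form) - strip] + suffix
```

Because the rule abstracts away the stem, one predicted rule generalizes across word forms with the same inflection pattern.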
FILE: hanlp/components/parsers/ud/tag_decoder.py
  class TagDecoder (line 40) | class TagDecoder(torch.nn.Module):
    method __init__ (line 43) | def __init__(self,
    method forward (line 63) | def forward(self,
    method _adaptive_loss (line 79) | def _adaptive_loss(self, hidden, mask, gold_tags, output_dim):
    method _loss (line 95) | def _loss(self, hidden, mask, gold_tags, output_dim):
    method decode (line 109) | def decode(self, output_dict: Dict[str, torch.Tensor]) -> Dict[str, to...

FILE: hanlp/components/parsers/ud/ud_model.py
  class UniversalDependenciesModel (line 16) | class UniversalDependenciesModel(torch.nn.Module):
    method __init__ (line 17) | def __init__(self,
    method forward (line 43) | def forward(self,
  class UniversalDependenciesDecoder (line 51) | class UniversalDependenciesDecoder(torch.nn.Module):
    method __init__ (line 52) | def __init__(self,
    method forward (line 89) | def forward(self,
    method decode (line 135) | def decode(self, output_dict: Dict[str, torch.Tensor]) -> Dict[str, to...

FILE: hanlp/components/parsers/ud/ud_parser.py
  class UniversalDependenciesParser (line 34) | class UniversalDependenciesParser(TorchComponent):
    method __init__ (line 36) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 46) | def build_dataloader(self,
    method build_vocabs (line 71) | def build_vocabs(self, trn, logger, **kwargs):
    method build_dataset (line 86) | def build_dataset(self, data, transform):
    method build_optimizer (line 90) | def build_optimizer(self, trn, **kwargs):
    method build_criterion (line 94) | def build_criterion(self, **kwargs):
    method build_metric (line 97) | def build_metric(self, **kwargs):
    method evaluate_dataloader (line 105) | def evaluate_dataloader(self,
    method build_model (line 132) | def build_model(self,
    method predict (line 155) | def predict(self, data: Union[List[str], List[List[str]]], batch_size:...
    method build_samples (line 182) | def build_samples(self, data: List[List[str]]):
    method fit (line 185) | def fit(self,
    method execute_training_loop (line 214) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 248) | def fit_dataloader(self,
    method decode_output (line 284) | def decode_output(self, outputs, mask, batch):
    method update_metrics (line 291) | def update_metrics(self, metrics, batch, outputs, mask):
    method feed_batch (line 302) | def feed_batch(self, batch: dict):
    method compute_mask (line 310) | def compute_mask(self, batch):
    method _step (line 315) | def _step(self, optimizer, scheduler, grad_norm):
    method input_is_flat (line 321) | def input_is_flat(self, data):
    method prediction_to_human (line 325) | def prediction_to_human(self, outputs: dict, batch):
    method __call__ (line 345) | def __call__(self, data, batch_size=None, **kwargs) -> Union[CoNLLSent...

FILE: hanlp/components/parsers/ud/udify_util.py
  function get_ud_treebank_files (line 31) | def get_ud_treebank_files(dataset_dir: str, treebanks: List[str] = None)...
  function sequence_cross_entropy (line 64) | def sequence_cross_entropy(log_probs: torch.FloatTensor,
  function sequence_cross_entropy_with_logits (line 109) | def sequence_cross_entropy_with_logits(
  function tiny_value_of_dtype (line 279) | def tiny_value_of_dtype(dtype: torch.dtype):
  function combine_initial_dims_to_1d_or_2d (line 301) | def combine_initial_dims_to_1d_or_2d(tensor: torch.Tensor) -> torch.Tensor:
  function uncombine_initial_dims (line 318) | def uncombine_initial_dims(tensor: torch.Tensor, original_size: torch.Si...
  function get_range_vector (line 340) | def get_range_vector(size: int, device: int) -> torch.Tensor:
  function get_device_of (line 357) | def get_device_of(tensor: torch.Tensor) -> int:

FILE: hanlp/components/parsers/ud/util.py
  function generate_lemma_rule (line 8) | def generate_lemma_rule(sample: dict):
  function append_bos (line 15) | def append_bos(sample: dict):
  function sample_form_missing (line 27) | def sample_form_missing(sample: dict):

FILE: hanlp/components/pipeline.py
  class Pipe (line 15) | class Pipe(Component):
    method __init__ (line 17) | def __init__(self, component: Component, input_key: str = None, output...
    method predict (line 33) | def predict(self, doc: Document, **kwargs) -> Document:
    method __repr__ (line 66) | def __repr__(self):
    method from_config (line 72) | def from_config(meta: dict, **kwargs):
  class Pipeline (line 78) | class Pipeline(Component, list):
    method __init__ (line 79) | def __init__(self, *pipes: Pipe) -> None:
    method append (line 86) | def append(self, component: Callable, input_key: Union[str, Iterable[s...
    method insert (line 106) | def insert(self, index: int, component: Callable, input_key: Union[str...
    method __call__ (line 132) | def __call__(self, doc: Union[Document, Any] = None, **kwargs) -> Docu...
    method copy (line 149) | def copy(self):
    method __copy__ (line 152) | def __copy__(self):
    method meta (line 157) | def meta(self):
    method meta (line 164) | def meta(self, value):
    method save (line 167) | def save(self, filepath):
    method load (line 170) | def load(self, filepath):
    method from_config (line 176) | def from_config(meta: Union[dict, str], **kwargs):
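
`Pipeline` chains `Pipe` wrappers, each reading one key of a `Document` and writing its result under another. A self-contained sketch of that pattern using a plain `dict` in place of `Document` (illustrative only; HanLP's `Pipeline.append` additionally accepts raw callables and config metadata):

```python
class Pipe:
    # Wrap a callable so it reads input_key from a dict-like document
    # and writes its result back under output_key.
    def __init__(self, component, input_key, output_key):
        self.component, self.input_key, self.output_key = component, input_key, output_key

    def __call__(self, doc: dict) -> dict:
        doc[self.output_key] = self.component(doc[self.input_key])
        return doc

class Pipeline(list):
    def append(self, component, input_key, output_key):
        super().append(Pipe(component, input_key, output_key))
        return self  # return self to allow chaining, as hanlp's Pipeline does

    def __call__(self, doc: dict) -> dict:
        for pipe in self:
            doc = pipe(doc)
        return doc

nlp = Pipeline().append(str.split, 'text', 'tok').append(len, 'tok', 'n')
print(nlp({'text': 'split me up'}))  # {'text': 'split me up', 'tok': ['split', 'me', 'up'], 'n': 3}
```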

FILE: hanlp/components/rnn_language_model_tf.py
  class RNNLanguageModel (line 12) | class RNNLanguageModel(KerasComponent):
    method __init__ (line 14) | def __init__(self, transform: TextTransform = None) -> None:
    method fit (line 20) | def fit(self, trn_data, dev_data, save_dir,
    method build_model (line 36) | def build_model(self, embedding, rnn_input_dropout, rnn_units, rnn_out...
    method build_optimizer (line 54) | def build_optimizer(self, optimizer, learning_rate, clipnorm, **kwargs):
    method build_train_dataset (line 59) | def build_train_dataset(self, trn_data, batch_size):
    method build_valid_dataset (line 63) | def build_valid_dataset(self, dev_data, batch_size):
    method generate_text (line 67) | def generate_text(self, text: Union[str, List[str]] = '\n', num_steps=...

FILE: hanlp/components/srl/span_bio/baffine_tagging.py
  class BiaffineTaggingDecoder (line 14) | class BiaffineTaggingDecoder(nn.Module):
    method __init__ (line 16) | def __init__(self,
    method forward (line 38) | def forward(self, x: torch.Tensor, **kwargs):
  class SpanBIOSemanticRoleLabelingModel (line 48) | class SpanBIOSemanticRoleLabelingModel(nn.Module):
    method __init__ (line 50) | def __init__(self,
    method forward (line 70) | def forward(self, batch, mask):

FILE: hanlp/components/srl/span_bio/span_bio.py
  class SpanBIOSemanticRoleLabeler (line 33) | class SpanBIOSemanticRoleLabeler(TorchComponent):
    method __init__ (line 35) | def __init__(self, **kwargs) -> None:
    method build_optimizer (line 45) | def build_optimizer(self,
    method build_criterion (line 65) | def build_criterion(self, decoder=None, **kwargs):
    method build_metric (line 75) | def build_metric(self, **kwargs):
    method execute_training_loop (line 78) | def execute_training_loop(self,
    method fit_dataloader (line 123) | def fit_dataloader(self,
    method naive_decode (line 157) | def naive_decode(self, pred, mask, batch, decoder=None):
    method decode_output (line 168) | def decode_output(self, pred, mask, batch, decoder=None):
    method update_metrics (line 196) | def update_metrics(self, metric, prediction, batch):
    method feed_batch (line 204) | def feed_batch(self, batch: dict):
    method compute_mask (line 215) | def compute_mask(self, mask2d):
    method _step (line 226) | def _step(self, optimizer, scheduler, grad_norm):
    method build_model (line 233) | def build_model(self, embed: Embedding, encoder, training, **kwargs) -...
    method build_dataloader (line 246) | def build_dataloader(self, data, batch_size,
    method build_dataset (line 269) | def build_dataset(self, data, transform):
    method build_vocabs (line 276) | def build_vocabs(self, dataset, logger, **kwargs):
    method predict (line 292) | def predict(self, data: Union[str, List[str]], batch_size: int = None,...
    method build_samples (line 311) | def build_samples(self, data):
    method fit (line 315) | def fit(self,
    method compute_loss (line 343) | def compute_loss(self, criterion, pred, srl, mask):
    method evaluate_dataloader (line 355) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method input_is_flat (line 371) | def input_is_flat(self, data) -> bool:
    method prediction_to_result (line 374) | def prediction_to_result(self, prediction: List, batch: Dict[str, Any]...

FILE: hanlp/components/srl/span_rank/highway_variational_lstm.py
  function initializer_1d (line 11) | def initializer_1d(input_tensor, initializer):
  class HighwayBiLSTM (line 18) | class HighwayBiLSTM(nn.Module):
    method __init__ (line 21) | def __init__(self, input_size, hidden_size, num_layers=1, batch_first=...
    method reset_dropout_layer (line 47) | def reset_dropout_layer(self, batch_size):
    method _forward_rnn (line 54) | def _forward_rnn(cell, gate, input, masks, initial, drop_masks=None, h...
    method _forward_brnn (line 66) | def _forward_brnn(cell, gate, input, masks, initial, drop_masks=None, ...
    method forward (line 78) | def forward(self, input, masks, initial=None):
  class StackedHighwayBiLSTM (line 120) | class StackedHighwayBiLSTM(nn.Module):
    method __init__ (line 123) | def __init__(self, input_size, hidden_size, num_layers=1, batch_first=...
    method reset_parameters (line 158) | def reset_parameters(self):
    method reset_dropout_layer (line 166) | def reset_dropout_layer(self, batch_size):
    method reset_state (line 174) | def reset_state(self, batch_size):
    method _forward_rnn (line 182) | def _forward_rnn(cell, gate, input, masks, initial, drop_masks=None, h...
    method _forward_brnn (line 194) | def _forward_brnn(cell, gate, input, masks, initial, drop_masks=None, ...
    method forward (line 206) | def forward(self, input, masks, initial=None):

FILE: hanlp/components/srl/span_rank/inference_utils.py
  function decode_spans (line 7) | def decode_spans(span_starts, span_ends, span_scores, labels_inv):
  function greedy_decode (line 34) | def greedy_decode(predict_dict, srl_labels_inv):
  function get_predicted_clusters (line 99) | def get_predicted_clusters(top_span_starts, top_span_ends, predicted_ant...
  function _decode_non_overlapping_spans (line 124) | def _decode_non_overlapping_spans(starts, ends, scores, max_len, labels_...
  function _dp_decode_non_overlapping_spans (line 147) | def _dp_decode_non_overlapping_spans(starts, ends, scores, max_len, labe...
  function srl_decode (line 212) | def srl_decode(sentence_lengths, predict_dict, srl_labels_inv, config): ...
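
`_dp_decode_non_overlapping_spans` selects a non-overlapping subset of scored argument spans by dynamic programming (weighted interval scheduling over token positions). A minimal sketch under that reading, with spans as `(start, end, score, label)` tuples and inclusive ends (the exact tie-breaking and label handling in HanLP may differ):

```python
def dp_decode_non_overlapping_spans(spans, length):
    # best[i] = best total score achievable using only positions < i
    best = [0.0] * (length + 1)
    choice = [None] * (length + 1)
    for i in range(1, length + 1):
        best[i], choice[i] = best[i - 1], None  # option: leave position i-1 uncovered
        for span in spans:
            start, end, score, _label = span
            if end == i - 1 and score > 0 and best[start] + score > best[i]:
                best[i], choice[i] = best[start] + score, span
    # Backtrack to recover the chosen spans.
    result, i = [], length
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            result.append(choice[i])
            i = choice[i][0]
    return result[::-1]

spans = [(0, 1, 2.0, 'A0'), (1, 2, 3.0, 'A1'), (3, 3, 1.0, 'AM')]
print(dp_decode_non_overlapping_spans(spans, 4))  # [(1, 2, 3.0, 'A1'), (3, 3, 1.0, 'AM')]
```

Note that the overlapping candidate `(0, 1, 'A0')` is dropped because keeping `(1, 2, 'A1')` yields a higher total score.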

FILE: hanlp/components/srl/span_rank/layer.py
  function get_tensor_np (line 12) | def get_tensor_np(t):
  function orthonormal_initializer (line 16) | def orthonormal_initializer(output_size, input_size):
  class LayerNorm (line 54) | class LayerNorm(nn.Module):
    method __init__ (line 55) | def __init__(self, features, eps=1e-8):
    method forward (line 61) | def forward(self, x):
  class DropoutLayer3D (line 67) | class DropoutLayer3D(nn.Module):
    method __init__ (line 68) | def __init__(self, input_size, dropout_rate=0.0):
    method reset_dropout_mask (line 77) | def reset_dropout_mask(self, batch_size, length):
    method forward (line 83) | def forward(self, x):
  class DropoutLayer (line 90) | class DropoutLayer(nn.Module):
    method __init__ (line 91) | def __init__(self, input_size, dropout_rate=0.0):
    method reset_dropout_mask (line 98) | def reset_dropout_mask(self, batch_size):
    method forward (line 102) | def forward(self, x):
  class NonLinear (line 109) | class NonLinear(nn.Module):
    method __init__ (line 110) | def __init__(self, input_size, hidden_size, activation=None):
    method forward (line 124) | def forward(self, x):
    method reset_parameters (line 128) | def reset_parameters(self):
  class Biaffine (line 133) | class Biaffine(nn.Module):
    method __init__ (line 134) | def __init__(self, in1_features, in2_features, out_features,
    method reset_parameters (line 149) | def reset_parameters(self):
    method forward (line 152) | def forward(self, input1, input2):
    method __repr__ (line 175) | def __repr__(self):
  class HighwayLSTMCell (line 182) | class HighwayLSTMCell(nn.Module):
    method __init__ (line 183) | def __init__(self, input_size, hidden_size):
    method reset_parameters (line 194) | def reset_parameters(self):
    method forward (line 205) | def forward(self, x, mask=None, hx=None, dropout=None):
  class VariationalLSTMCell (line 225) | class VariationalLSTMCell(nn.Module):
    method __init__ (line 226) | def __init__(self, input_size, hidden_size):
    method reset_parameters (line 233) | def reset_parameters(self):
    method forward (line 238) | def forward(self, x, mask=None, hx=None, dropout=None):
  class VariationalLSTM (line 253) | class VariationalLSTM(nn.Module):
    method __init__ (line 256) | def __init__(self, input_size, hidden_size, num_layers=1, batch_first=...
    method reset_parameters (line 301) | def reset_parameters(self):  # modified by kiro
    method _forward_rnn (line 313) | def _forward_rnn(cell, input, masks, initial, drop_masks):
    method _forward_brnn (line 328) | def _forward_brnn(cell, input, masks, initial, drop_masks):
    method forward (line 343) | def forward(self, input, masks, initial=None):
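
The `Biaffine` layer above scores a pair of vectors through a learned weight matrix, `s = h1ᵀ U h2`. A dependency-free sketch of that single pairwise score (HanLP's module batches this over all token pairs and can append bias columns to `h1`/`h2`; this loop version is purely illustrative):

```python
def biaffine_score(h1, h2, U):
    # h1: list of d1 floats, h2: list of d2 floats, U: d1 x d2 weight matrix.
    # Returns the scalar bilinear score h1^T U h2.
    return sum(h1[a] * U[a][b] * h2[b]
               for a in range(len(h1))
               for b in range(len(h2)))

U = [[1.0, 0.0],
     [0.0, 2.0]]
print(biaffine_score([1.0, 2.0], [3.0, 4.0], U))  # 1*1*3 + 2*2*4 = 19.0
```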

FILE: hanlp/components/srl/span_rank/span_rank.py
  class SpanRankingSemanticRoleLabeler (line 28) | class SpanRankingSemanticRoleLabeler(TorchComponent):
    method __init__ (line 29) | def __init__(self, **kwargs) -> None:
    method build_optimizer (line 39) | def build_optimizer(self,
    method _get_transformer (line 68) | def _get_transformer(self):
    method build_criterion (line 71) | def build_criterion(self, **kwargs):
    method build_metric (line 75) | def build_metric(self, **kwargs) -> Tuple[F1, F1]:
    method execute_training_loop (line 80) | def execute_training_loop(self,
    method fit_dataloader (line 112) | def fit_dataloader(self,
    method _step (line 148) | def _step(self, optimizer, linear_scheduler):
    method evaluate_dataloader (line 156) | def evaluate_dataloader(self,
    method build_model (line 217) | def build_model(self,
    method build_dataloader (line 229) | def build_dataloader(self, data, batch_size, shuffle, device, logger: ...
    method build_dataset (line 247) | def build_dataset(self, data, generate_idx, logger, transform=None):
    method predict (line 264) | def predict(self, data: Union[str, List[str]], batch_size: int = None,...
    method format_dict_to_results (line 291) | def format_dict_to_results(data, outputs, exclusive_offset=False, with...
    method input_is_flat (line 313) | def input_is_flat(self, data):
    method fit (line 317) | def fit(self,
    method build_vocabs (line 355) | def build_vocabs(self, dataset, logger, **kwargs):
    method reset_metrics (line 371) | def reset_metrics(self, metrics):
    method report_metrics (line 375) | def report_metrics(self, loss, metrics):
    method feed_batch (line 379) | def feed_batch(self, batch) -> Dict[str, Any]:
    method decode_output (line 385) | def decode_output(self, output_dict, batch, training=False):
    method update_metrics (line 410) | def update_metrics(self, batch: dict, output_dict: dict, metrics):

FILE: hanlp/components/srl/span_rank/span_ranking_srl_model.py
  function initializer_1d (line 13) | def initializer_1d(input_tensor, initializer):
  class SpanRankingSRLDecoder (line 20) | class SpanRankingSRLDecoder(nn.Module):
    method __init__ (line 22) | def __init__(self, context_layer_output_dim, label_space_size, config)...
    method reset_parameters (line 59) | def reset_parameters(self):
    method forward (line 83) | def forward(self, hidden_states, batch, mask=None):
    method get_candidate_spans (line 90) | def get_candidate_spans(sent_lengths: torch.Tensor, max_sent_length, m...
    method exclusive_cumsum (line 107) | def exclusive_cumsum(input: torch.Tensor, exclusive=True):
    method flatten_emb (line 128) | def flatten_emb(self, emb):
    method flatten_emb_in_sentence (line 134) | def flatten_emb_in_sentence(self, emb, batch_sentences_mask):
    method get_span_emb (line 139) | def get_span_emb(self, flatted_context_emb, flatted_candidate_starts, ...
    method get_arg_unary_scores (line 181) | def get_arg_unary_scores(self, span_emb):
    method get_pred_unary_scores (line 200) | def get_pred_unary_scores(self, span_emb):
    method extract_spans (line 208) | def extract_spans(self, candidate_scores, candidate_starts, candidate_...
    method batch_index_select (line 234) | def batch_index_select(self, emb, indices):
    method get_batch_topk (line 242) | def get_batch_topk(self, candidate_starts: torch.Tensor, candidate_end...
    method get_dense_span_labels (line 258) | def get_dense_span_labels(self, span_starts, span_ends, span_labels, m...
    method gather_4d (line 281) | def gather_4d(params, indices):
    method get_srl_labels (line 287) | def get_srl_labels(self,
    method get_srl_unary_scores (line 315) | def get_srl_unary_scores(self, span_emb):
    method get_srl_scores (line 323) | def get_srl_scores(self, arg_emb, pred_emb, arg_scores, pred_scores, n...
    method get_srl_softmax_loss (line 348) | def get_srl_softmax_loss(self, srl_scores, srl_labels, num_predicted_a...
    method get_srl_loss_mask (line 355) | def get_srl_loss_mask(self, srl_scores, num_predicted_args, num_predic...
    method decode (line 364) | def decode(self, contextualized_embeddings, sent_lengths, masks, gold_...
  class SpanRankingSRLModel (line 462) | class SpanRankingSRLModel(nn.Module):
    method __init__ (line 464) | def __init__(self, config, embed: torch.nn.Module, context_layer: torc...
    method forward (line 479) | def forward(self,
    method unpack (line 494) | def unpack(batch, mask=None, training=False):
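
`exclusive_cumsum` above computes a running sum shifted right by one, which is the standard way to turn per-sentence lengths into offsets when flattening a batch. A list-based sketch (the real method works on `torch.Tensor`):

```python
def exclusive_cumsum(xs):
    # output[i] = sum(xs[:i]); the element at i itself is excluded.
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out

# Sentence lengths [3, 1, 4] -> start offsets of each sentence in the flat batch.
print(exclusive_cumsum([3, 1, 4]))  # [0, 3, 4]
```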

FILE: hanlp/components/srl/span_rank/srl_eval_utils.py
  function split_example_for_eval (line 14) | def split_example_for_eval(example):
  function evaluate_retrieval (line 43) | def evaluate_retrieval(span_starts, span_ends, span_scores, pred_starts,...
  function _calc_f1 (line 95) | def _calc_f1(total_gold, total_predicted, total_matched, message=None):
  function compute_span_f1 (line 104) | def compute_span_f1(gold_data, predictions, task_name):
  function compute_unlabeled_span_f1 (line 130) | def compute_unlabeled_span_f1(gold_data, predictions, task_name):
  function compute_srl_f1 (line 162) | def compute_srl_f1(sentences, gold_srl, predictions, gold_path=None) -> ...
  function print_sentence_to_conll (line 225) | def print_sentence_to_conll(fout, tokens, labels):
  function read_gold_predicates (line 247) | def read_gold_predicates(gold_path):
  function print_to_conll (line 262) | def print_to_conll(sentences, srl_labels, output_filename, gold_predicat...
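
`_calc_f1` derives precision, recall, and F1 from gold/predicted/matched counts, the core of the span-F1 evaluators in this file. A minimal sketch with the usual zero-division guards (argument names are illustrative):

```python
def calc_f1(total_gold, total_predicted, total_matched):
    precision = total_matched / total_predicted if total_predicted else 0.0
    recall = total_matched / total_gold if total_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

p, r, f = calc_f1(total_gold=10, total_predicted=8, total_matched=6)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.75 0.6 0.6667
```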

FILE: hanlp/components/srl/span_rank/util.py
  function block_orth_normal_initializer (line 5) | def block_orth_normal_initializer(input_size, output_size):

FILE: hanlp/components/sts/transformer_sts.py
  class TransformerSemanticTextualSimilarity (line 23) | class TransformerSemanticTextualSimilarity(TorchComponent):
    method __init__ (line 25) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 37) | def build_dataloader(self, data, batch_size, sent_a_col=None,
    method build_optimizer (line 64) | def build_optimizer(self, trn, epochs, gradient_accumulation=1, lr=1e-...
    method build_criterion (line 74) | def build_criterion(self, **kwargs):
    method build_metric (line 77) | def build_metric(self, **kwargs):
    method execute_training_loop (line 80) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 101) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 130) | def evaluate_dataloader(self, data: DataLoader, logger: logging.Logger...
    method build_model (line 162) | def build_model(self, transformer, training=True, **kwargs) -> torch.n...
    method predict (line 170) | def predict(self, data: Union[List[str], List[List[str]]], batch_size:...
    method fit (line 203) | def fit(self, trn_data, dev_data, save_dir,
    method on_config_ready (line 226) | def on_config_ready(self, transformer, max_seq_len, **kwargs):
    method feed_batch (line 234) | def feed_batch(self, batch) -> SequenceClassifierOutput:
    method decode (line 238) | def decode(self, output: SequenceClassifierOutput):
    method report_metrics (line 241) | def report_metrics(self, loss, metric):

FILE: hanlp/components/taggers/cnn_tagger_tf.py
  class WindowTokenTransform (line 15) | class WindowTokenTransform(TSVTaggingTransform):
    method fit (line 17) | def fit(self, trn_path: str, **kwargs):
    method create_types_shapes_values (line 25) | def create_types_shapes_values(self) -> Tuple[Tuple, Tuple, Tuple]:
    method inputs_to_samples (line 33) | def inputs_to_samples(self, inputs, gold=False):
    method X_to_inputs (line 55) | def X_to_inputs(self, X: Union[tf.Tensor, Tuple[tf.Tensor]]) -> Iterable:
  class CNNTaggingModel (line 63) | class CNNTaggingModel(tf.keras.models.Model):
    method __init__ (line 64) | def __init__(self, filters, num_tags, embed, dropout, kernels, **kwargs):
    method call (line 76) | def call(self, inputs, **kwargs):
  class CNNTaggerTF (line 94) | class CNNTaggerTF(TaggerComponent, ABC):
    method __init__ (line 95) | def __init__(self, transform: WindowTokenTransform = None) -> None:
    method build_model (line 102) | def build_model(self, embedding, **kwargs) -> tf.keras.Model:
    method fit (line 112) | def fit(self, trn_data: Any, dev_data: Any, save_dir: str, embedding=2...
    method input_shape (line 124) | def input_shape(self) -> List:

FILE: hanlp/components/taggers/ngram_conv/ngram_conv_tagger.py
  class NgramTransform (line 19) | class NgramTransform(TSVTaggingTransform):
    method __init__ (line 21) | def __init__(self, config: SerializableDict = None, map_x=True, map_y=...
    method inputs_to_samples (line 26) | def inputs_to_samples(self, inputs, gold=False):
    method x_to_idx (line 38) | def x_to_idx(self, x) -> Union[tf.Tensor, Tuple]:
    method y_to_idx (line 44) | def y_to_idx(self, y) -> tf.Tensor:
    method create_types_shapes_values (line 47) | def create_types_shapes_values(self) -> Tuple[Tuple, Tuple, Tuple]:
    method fit (line 59) | def fit(self, trn_path: str, **kwargs):
    method X_to_inputs (line 76) | def X_to_inputs(self, X: Union[tf.Tensor, Tuple[tf.Tensor]]) -> Iterable:
    method input_truth_output_to_str (line 79) | def input_truth_output_to_str(self, input: List[str], truth: List[str]...
  class NgramConvTaggingModel (line 84) | class NgramConvTaggingModel(tf.keras.models.Model):
    method __init__ (line 85) | def __init__(self, word_embed: tf.keras.layers.Embedding, ngram_embed:...
    method call (line 109) | def call(self, inputs, **kwargs):
  class NgramConvTaggerTF (line 141) | class NgramConvTaggerTF(TaggerComponent):
    method __init__ (line 143) | def __init__(self, transform: NgramTransform = None) -> None:
    method build_model (line 149) | def build_model(self, word_embed, ngram_embed, window_size, weight_nor...
    method fit (line 168) | def fit(self, trn_data: Any, dev_data: Any, save_dir: str, word_embed:...

FILE: hanlp/components/taggers/pos_tf.py
  class CNNPartOfSpeechTaggerTF (line 8) | class CNNPartOfSpeechTaggerTF(CNNTaggerTF):
  class RNNPartOfSpeechTaggerTF (line 12) | class RNNPartOfSpeechTaggerTF(RNNTaggerTF):

FILE: hanlp/components/taggers/rnn/rnntaggingmodel.py
  class RNNTaggingModel (line 31) | class RNNTaggingModel(nn.Module):
    method __init__ (line 33) | def __init__(self,
    method reset_parameters (line 74) | def reset_parameters(self):
    method forward (line 78) | def forward(self,

FILE: hanlp/components/taggers/rnn_tagger.py
  class RNNTagger (line 24) | class RNNTagger(Tagger):
    method __init__ (line 26) | def __init__(self, **kwargs) -> None:
    method execute_training_loop (line 36) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method build_scheduler (line 79) | def build_scheduler(self, optimizer, anneal_factor, anneal_patience, *...
    method fit_dataloader (line 86) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method feed_batch (line 108) | def feed_batch(self, batch):
    method build_model (line 114) | def build_model(self, rnn_input, rnn_hidden, drop, crf, **kwargs) -> t...
    method _convert_embed (line 126) | def _convert_embed(self):
    method build_dataloader (line 132) | def build_dataloader(self, data, batch_size, shuffle, device, logger=N...
    method build_dataset (line 156) | def build_dataset(self, data, transform):
    method build_vocabs (line 159) | def build_vocabs(self, dataset, logger):
    method fit (line 167) | def fit(self, trn_data, dev_data, save_dir,
    method _id_to_tags (line 185) | def _id_to_tags(self, ids):
    method write_output (line 194) | def write_output(self, yhat, y, mask, batch, prediction, output):

FILE: hanlp/components/taggers/rnn_tagger_tf.py
  class RNNTaggerTF (line 17) | class RNNTaggerTF(TaggerComponent):
    method __init__ (line 19) | def __init__(self, transform: Transform = None) -> None:
    method fit (line 24) | def fit(self, trn_data: str, dev_data: str = None, save_dir: str = Non...
    method build_model (line 31) | def build_model(self, embeddings, embedding_trainable, rnn_input_dropo...
    method predict (line 46) | def predict(self, sents: Union[List[str], List[List[str]]], batch_size...
    method save_weights (line 50) | def save_weights(self, save_dir, filename='model.h5'):
    method build_loss (line 60) | def build_loss(self, loss, **kwargs):
    method tag_vocab (line 68) | def tag_vocab(self) -> VocabTF:
    method build_transform (line 71) | def build_transform(self, embeddings, **kwargs):
    method sample_data (line 79) | def sample_data(self):

FILE: hanlp/components/taggers/tagger.py
  class Tagger (line 25) | class Tagger(DistillableComponent, ABC):
    method build_optimizer (line 26) | def build_optimizer(self, optimizer, lr, **kwargs):
    method build_criterion (line 32) | def build_criterion(self, model=None, reduction='mean', decoder=None, ...
    method build_metric (line 43) | def build_metric(self, **kwargs):
    method feed_batch (line 47) | def feed_batch(self, batch):
    method compute_loss (line 50) | def compute_loss(self, criterion, out, y, mask):
    method decode_output (line 58) | def decode_output(self, logits, mask, batch, model=None):
    method execute_training_loop (line 67) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method id_to_tags (line 101) | def id_to_tags(self, ids: torch.LongTensor, lens: List[int]):
    method update_metrics (line 110) | def update_metrics(self, metric, logits, y, mask, batch=None, predicti...
    method evaluate_dataloader (line 114) | def evaluate_dataloader(self, data, criterion, logger=None, ratio_widt...
    method write_prediction (line 140) | def write_prediction(self, prediction, batch, output: TextIO):
    method predict (line 145) | def predict(self, tokens: Any, batch_size: int = None, **kwargs):
    method input_is_flat (line 156) | def input_is_flat(self, tokens):
    method predict_data (line 159) | def predict_data(self, data, batch_size, sampler_builder=None, **kwargs):
    method build_samples (line 176) | def build_samples(self, data: List[str], **kwargs):
    method prediction_to_human (line 179) | def prediction_to_human(self, pred_ids, vocab: List[str], batch):
    method tagging_scheme (line 194) | def tagging_scheme(self):
    method dict_tags (line 205) | def dict_tags(self) -> DictInterface:
    method dict_tags (line 219) | def dict_tags(self,

FILE: hanlp/components/taggers/tagger_tf.py
  class TaggerComponent (line 14) | class TaggerComponent(KerasComponent, ABC):
    method build_metrics (line 16) | def build_metrics(self, metrics, logger: logging.Logger, **kwargs):
    method build_loss (line 27) | def build_loss(self, loss, **kwargs):

FILE: hanlp/components/taggers/transformers/metrics_tf.py
  class Accuracy (line 7) | class Accuracy(tf.keras.metrics.SparseCategoricalAccuracy):
    method __init__ (line 9) | def __init__(self, name='sparse_categorical_accuracy', dtype=None, mas...
    method update_state (line 13) | def update_state(self, y_true, y_pred, sample_weight=None):

FILE: hanlp/components/taggers/transformers/transformer_tagger.py
  class TransformerTaggingModel (line 28) | class TransformerTaggingModel(nn.Module):
    method __init__ (line 29) | def __init__(self,
    method forward (line 54) | def forward(self, lens: torch.LongTensor, input_ids, token_span, token...
  class TransformerTagger (line 67) | class TransformerTagger(TransformerComponent, Tagger):
    method __init__ (line 69) | def __init__(self, **kwargs) -> None:
    method fit_dataloader (line 81) | def fit_dataloader(self,
    method _step (line 131) | def _step(self, optimizer, scheduler, grad_norm, transformer_grad_norm...
    method compute_distill_loss (line 139) | def compute_distill_loss(self, kd_criterion, out_S, out_T, mask, tempe...
    method build_model (line 145) | def build_model(self, training=True, extra_embeddings: Embedding = Non...
    method build_dataloader (line 170) | def build_dataloader(self, data, batch_size, shuffle, device, logger: ...
    method build_dataset (line 204) | def build_dataset(self, data, transform=None, **kwargs):
    method last_transform (line 207) | def last_transform(self):
    method tokenizer_transform (line 212) | def tokenizer_transform(self) -> TransformerSequenceTokenizer:
    method build_vocabs (line 219) | def build_vocabs(self, trn, logger, **kwargs):
    method fit (line 233) | def fit(self,
    method feed_batch (line 269) | def feed_batch(self, batch: dict):
    method distill (line 281) | def distill(self,

FILE: hanlp/components/taggers/transformers/transformer_tagger_tf.py
  class TransformerTaggingModel (line 17) | class TransformerTaggingModel(tf.keras.Model):
    method __init__ (line 18) | def __init__(self, transformer: tf.keras.Model, *args, **kwargs):
    method call (line 22) | def call(self, inputs, training=None, mask=None):
  class TransformerTaggerTF (line 26) | class TransformerTaggerTF(TaggerComponent):
    method __init__ (line 27) | def __init__(self, transform: TransformerTransform = None) -> None:
    method build_model (line 33) | def build_model(self, transformer, max_seq_length, **kwargs) -> tf.ker...
    method fit (line 38) | def fit(self, trn_data, dev_data, save_dir,
    method build_optimizer (line 58) | def build_optimizer(self, optimizer, learning_rate, epsilon, weight_de...
    method build_vocab (line 67) | def build_vocab(self, trn_data, logger):
    method train_loop (line 73) | def train_loop(self, trn_data, dev_data, epochs, num_examples, train_s...
    method build_loss (line 85) | def build_loss(self, loss, **kwargs):
    method load_transform (line 90) | def load_transform(self, save_dir) -> Transform:

FILE: hanlp/components/taggers/transformers/transformer_transform_tf.py
  class TransformerTransform (line 15) | class TransformerTransform(TsvTaggingFormat, Transform):
    method __init__ (line 16) | def __init__(self,
    method max_seq_length (line 28) | def max_seq_length(self):
    method tokenizer (line 33) | def tokenizer(self):
    method tokenizer (line 37) | def tokenizer(self, tokenizer):
    method fit (line 48) | def fit(self, trn_path: str, **kwargs) -> int:
    method create_types_shapes_values (line 56) | def create_types_shapes_values(self) -> Tuple[Tuple, Tuple, Tuple]:
    method lock_vocabs (line 64) | def lock_vocabs(self):
    method inputs_to_samples (line 67) | def inputs_to_samples(self, inputs, gold=False):
    method x_to_idx (line 111) | def x_to_idx(self, x) -> Union[tf.Tensor, Tuple]:
    method y_to_idx (line 114) | def y_to_idx(self, y) -> tf.Tensor:
    method input_is_single_sample (line 117) | def input_is_single_sample(self, input: Union[List[str], List[List[str...
    method Y_to_outputs (line 120) | def Y_to_outputs(self, Y: Union[tf.Tensor, Tuple[tf.Tensor]], gold=Fal...

FILE: hanlp/components/taggers/util.py
  function guess_tagging_scheme (line 8) | def guess_tagging_scheme(labels: List[str]) -> str:
  function guess_allowed_transitions (line 15) | def guess_allowed_transitions(labels) -> List[Tuple[int, int]]:
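
`guess_tagging_scheme` infers whether a tag set uses BIO, BIOES, or BMES by inspecting label prefixes; the allowed-transition list for a CRF can then be derived from the scheme. A heuristic sketch of the guessing step (HanLP's exact decision rules may differ):

```python
def guess_tagging_scheme(labels):
    # Look at the letter before '-' (or the bare label, e.g. 'S').
    prefixes = {label.split('-')[0] for label in labels}
    if prefixes <= {'B', 'M', 'E', 'S'}:
        return 'BMES'
    if 'E' in prefixes or 'S' in prefixes:
        return 'BIOES'
    return 'BIO'

print(guess_tagging_scheme(['B-PER', 'I-PER', 'O']))           # BIO
print(guess_tagging_scheme(['B', 'M', 'E', 'S']))              # BMES
print(guess_tagging_scheme(['B-LOC', 'I-LOC', 'E-LOC', 'O']))  # BIOES
```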

FILE: hanlp/components/tokenizers/multi_criteria_cws_transformer.py
  class MultiCriteriaTransformerTaggingTokenizer (line 17) | class MultiCriteriaTransformerTaggingTokenizer(TransformerTaggingTokeniz...
    method __init__ (line 18) | def __init__(self, **kwargs) -> None:
    method build_dataset (line 28) | def build_dataset(self, data, **kwargs):
    method on_config_ready (line 31) | def on_config_ready(self, **kwargs):
    method last_transform (line 41) | def last_transform(self):
    method build_vocabs (line 48) | def build_vocabs(self, trn, logger, **kwargs):
    method feed_batch (line 52) | def feed_batch(self, batch: dict):
    method build_samples (line 57) | def build_samples(self, data: List[str], criteria=None, **kwargs):
    method build_metric (line 68) | def build_metric(self, **kwargs):
    method update_metrics (line 74) | def update_metrics(self, metric, logits, y, mask, batch, prediction):
    method fit (line 80) | def fit(self, trn_data, dev_data, save_dir, transformer, average_subwo...

FILE: hanlp/components/tokenizers/tok.py
  class RNNTokenizer (line 13) | class RNNTokenizer(RNNTagger):
    method predict (line 15) | def predict(self, sentence: Any, batch_size: int = None, **kwargs):
    method predict_data (line 26) | def predict_data(self, data, batch_size, **kwargs):
    method build_dataset (line 31) | def build_dataset(self, data, transform=None):
    method build_metric (line 39) | def build_metric(self, **kwargs):
    method update_metrics (line 42) | def update_metrics(self, metric, logits, y, mask, batch):
    method fit (line 48) | def fit(self, trn_data, dev_data, save_dir, batch_size=50, epochs=100,...

FILE: hanlp/components/tokenizers/tok_tf.py
  class BMESTokenizerTF (line 21) | class BMESTokenizerTF(KerasComponent):
    method build_metrics (line 23) | def build_metrics(self, metrics, logger: logging.Logger, **kwargs):
  class NgramConvTokenizerTransform (line 30) | class NgramConvTokenizerTransform(TxtFormat, NgramTransform):
    method inputs_to_samples (line 32) | def inputs_to_samples(self, inputs, gold=False):
    method input_is_single_sample (line 39) | def input_is_single_sample(self, input: Union[List[str], List[List[str...
    method Y_to_outputs (line 44) | def Y_to_outputs(self, Y: Union[tf.Tensor, Tuple[tf.Tensor]], gold=Fal...
  class NgramConvTokenizerTF (line 49) | class NgramConvTokenizerTF(BMESTokenizerTF, NgramConvTaggerTF):
    method __init__ (line 51) | def __init__(self) -> None:
    method fit (line 54) | def fit(self, trn_data: Any, dev_data: Any, save_dir: str, word_embed:...
    method evaluate_output_to_file (line 62) | def evaluate_output_to_file(self, batch, outputs, out):
    method build_loss (line 68) | def build_loss(self, loss, **kwargs):
  class TransformerTokenizerTransform (line 74) | class TransformerTokenizerTransform(TxtBMESFormat, TransformerTransform):
    method inputs_to_samples (line 76) | def inputs_to_samples(self, inputs, gold=False):
    method Y_to_tokens (line 80) | def Y_to_tokens(self, tag_vocab, Y, gold, inputs):
  class TransformerTokenizerTF (line 88) | class TransformerTokenizerTF(BMESTokenizerTF, TransformerTaggerTF):
    method __init__ (line 89) | def __init__(self, transform: TransformerTokenizerTransform = None) ->...
  class RNNTokenizerTransform (line 95) | class RNNTokenizerTransform(TxtBMESFormat, TSVTaggingTransform):
  class RNNTokenizerTF (line 99) | class RNNTokenizerTF(BMESTokenizerTF, RNNTaggerTF):
    method __init__ (line 100) | def __init__(self, transform: RNNTokenizerTransform = None) -> None:
    method fit (line 105) | def fit(self, trn_data: str, dev_data: str = None, save_dir: str = Non...

FILE: hanlp/components/tokenizers/transformer.py
  class TransformerTaggingTokenizer (line 21) | class TransformerTaggingTokenizer(TransformerTagger):
    method __init__ (line 23) | def __init__(self, **kwargs) -> None:
    method dict_force (line 43) | def dict_force(self) -> DictInterface:
    method dict_force (line 65) | def dict_force(self, dictionary: Union[DictInterface, Union[Dict[str, ...
    method dict_combine (line 72) | def dict_combine(self) -> DictInterface:
    method dict_combine (line 84) | def dict_combine(self, dictionary: Union[DictInterface, Union[Dict[str...
    method build_metric (line 98) | def build_metric(self, **kwargs):
    method update_metrics (line 102) | def update_metrics(self, metric, logits, y, mask, batch, prediction):
    method decode_output (line 108) | def decode_output(self, logits, mask, batch, model=None):
    method tag_to_span (line 115) | def tag_to_span(self, batch_tags, batch: dict):
    method write_prediction (line 157) | def write_prediction(self, prediction, batch, output: TextIO):
    method tokenizer_transform (line 164) | def tokenizer_transform(self):
    method spans_to_tokens (line 174) | def spans_to_tokens(self, spans, batch, rebuild_span=False):
    method generate_prediction_filename (line 213) | def generate_prediction_filename(self, tst_data, save_dir):
    method prediction_to_human (line 216) | def prediction_to_human(self, pred, vocab, batch, rebuild_span=False):
    method input_is_flat (line 230) | def input_is_flat(self, tokens):
    method build_dataset (line 233) | def build_dataset(self, data, **kwargs):
    method last_transform (line 236) | def last_transform(self):
    method fit (line 240) | def fit(self, trn_data, dev_data, save_dir, transformer, average_subwo...
    method feed_batch (line 297) | def feed_batch(self, batch: dict):
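
TransformerTaggingTokenizer above treats tokenization as character-level BMES tagging, and `tag_to_span` / `spans_to_tokens` recover token boundaries from predicted tags. A minimal stdlib sketch of that recovery step; `bmes_to_spans` is a hypothetical helper for illustration, not HanLP's actual method:

```python
def bmes_to_spans(tags):
    """Convert a BMES tag sequence into half-open (start, end) character spans.

    B = begin of a multi-char token, M = middle, E = end, S = single-char token.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == 'S':
            spans.append((i, i + 1))
            start = None
        elif tag == 'B':
            start = i
        elif tag == 'E':
            # Close the span opened by the last B (fall back to i for a stray E)
            spans.append((start if start is not None else i, i + 1))
            start = None
        # 'M' simply continues the currently open span
    if start is not None:  # unterminated B/M run at the end of the sequence
        spans.append((start, len(tags)))
    return spans

# '商品和服务' tagged B E S B E segments into 商品 / 和 / 服务
print(bmes_to_spans(['B', 'E', 'S', 'B', 'E']))  # [(0, 2), (2, 3), (3, 5)]
```

Real decoding in the class also consults `dict_force` / `dict_combine` to override or merge spans; the sketch covers only the plain tag-to-span conversion.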

FILE: hanlp/datasets/coref/loaders/conll12coref.py
  class Ontonotes (line 15) | class Ontonotes(_Ontonotes):
    method dataset_document_iterator (line 16) | def dataset_document_iterator(self, file_path: str) -> Iterator[List[O...
  class CONLL12CorefDataset (line 51) | class CONLL12CorefDataset(TransformableDataset):
    method __init__ (line 53) | def __init__(self, data: Union[str, List], transform: Union[Callable, ...
    method load_file (line 60) | def load_file(self, filepath: str):
    method text_to_instance (line 77) | def text_to_instance(

FILE: hanlp/datasets/eos/eos.py
  class SentenceBoundaryDetectionDataset (line 14) | class SentenceBoundaryDetectionDataset(TransformableDataset):
    method __init__ (line 16) | def __init__(self,
    method load_file (line 48) | def load_file(self, filepath: str):

FILE: hanlp/datasets/lm/loaders/lm_dataset.py
  class LanguageModelDataset (line 16) | class LanguageModelDataset(TransformSequentialDataset):
    method __init__ (line 18) | def __init__(self,
    method vocab (line 59) | def vocab(self):
    method vocab_path (line 63) | def vocab_path(self):
    method load_file (line 66) | def load_file(self, filepath):
    method __iter__ (line 95) | def __iter__(self):
    method estimate_num_batches (line 113) | def estimate_num_batches(self, seq_len=None):
    method max_seq_len (line 119) | def max_seq_len(self):
    method _read_chunk (line 124) | def _read_chunk(fp, offset, length):
    method _debug_load_cache (line 132) | def _debug_load_cache(self):
    method filecache (line 141) | def filecache(self):

FILE: hanlp/datasets/lu/glue.py
  class SST2Dataset (line 15) | class SST2Dataset(TableDataset):
  function main (line 19) | def main():

FILE: hanlp/datasets/ner/loaders/json_ner.py
  class JsonNERDataset (line 15) | class JsonNERDataset(TransformableDataset):
    method __init__ (line 17) | def __init__(self, data: Union[str, List], transform: Union[Callable, ...
    method load_file (line 34) | def load_file(self, filepath: str):
  function convert_conll03_to_json (line 92) | def convert_conll03_to_json(file_path):
  function unpack_ner (line 130) | def unpack_ner(sample: dict) -> dict:
  function prune_ner_tagset (line 141) | def prune_ner_tagset(sample: dict, tagset: Union[set, Dict[str, str]]):

FILE: hanlp/datasets/ner/loaders/tsv.py
  class TSVTaggingDataset (line 11) | class TSVTaggingDataset(TransformableDataset):
    method __init__ (line 13) | def __init__(self,
    method load_file (line 47) | def load_file(self, filepath):

FILE: hanlp/datasets/parsing/amr.py
  class AbstractMeaningRepresentationDataset (line 23) | class AbstractMeaningRepresentationDataset(TransformableDataset):
    method load_file (line 24) | def load_file(self, filepath: str):
  function generate_oracle (line 29) | def generate_oracle(sample: dict):
  function chars_for_tok (line 38) | def chars_for_tok(sample: dict, max_string_len=20):
  function append_bos (line 48) | def append_bos(sample: dict):
  function get_concepts (line 55) | def get_concepts(sample: dict, vocab: VocabWithFrequency = None, rel_voc...
  function batchify (line 83) | def batchify(data, vocabs: VocabDict, unk_rate=0., device=None, squeeze=...
  function make_batch_for_bart (line 160) | def make_batch_for_bart(augmented_concept, ret, tokenizer, device, train...
  function levi_amr (line 182) | def levi_amr(concept, edge, extra_arc=False):
  function move_dict_to_device (line 194) | def move_dict_to_device(ret, device):
  function subtoken_to_tensor (line 204) | def subtoken_to_tensor(token_field, ret):
  function make_batch_for_squeeze (line 210) | def make_batch_for_squeeze(data, augmented_concept, tokenizer, device, r...
  function linearize (line 259) | def linearize(concept: List, edge: List, label='', prefix=REL, extra_arc...
  function unlinearize (line 289) | def unlinearize(concept: List, edge: List, prefix=REL, extra_arc=False):
  function separate_concept_rel (line 306) | def separate_concept_rel(concept, prefix=REL):
  function remove_unconnected_components (line 316) | def remove_unconnected_components(concept: List, edge: List):
  function largest_connected_component (line 342) | def largest_connected_component(triples: List):
  function to_triples (line 358) | def to_triples(concept: List, edge: List):
  function reverse_edge_for_levi_bfs (line 362) | def reverse_edge_for_levi_bfs(concept, edge):
  function un_kahn (line 370) | def un_kahn(concept, edge):

FILE: hanlp/datasets/parsing/loaders/_ctb_utils.py
  function convert_to_dependency (line 76) | def convert_to_dependency(src, dst, language='zh', version='3.3.0', conl...
  function clean_ctb_bracketed (line 107) | def clean_ctb_bracketed(ctb_root, out_root):
  function _list_treebank_root (line 121) | def _list_treebank_root(ctb_root):
  function list_treebank (line 126) | def list_treebank(ctb_home):
  function load_bracketed_trees (line 132) | def load_bracketed_trees(chtbs) -> List[Tree]:
  function split_str_to_trees (line 144) | def split_str_to_trees(text: str):
  function make_ctb_tasks (line 160) | def make_ctb_tasks(chtbs, out_root, part):
  function reverse_splits (line 210) | def reverse_splits(splits):
  function split_chtb (line 218) | def split_chtb(chtbs: List[str], splits=None):
  function id_of_chtb (line 244) | def id_of_chtb(each: str):
  function make_ctb (line 248) | def make_ctb(ctb_home):
  function load_domains (line 273) | def load_domains(ctb_home):
  function ctb_pos_to_text_format (line 294) | def ctb_pos_to_text_format(path, delimiter='_'):
  function remove_all_ec (line 310) | def remove_all_ec(path):

FILE: hanlp/datasets/parsing/loaders/conll_dataset.py
  class CoNLLParsingDataset (line 12) | class CoNLLParsingDataset(TransformableDataset):
    method __init__ (line 14) | def __init__(self,
    method load_file (line 33) | def load_file(self, filepath):
    method __len__ (line 56) | def __len__(self) -> int:
  function append_bos (line 60) | def append_bos(sample: dict, pos_key='CPOS', bos=ROOT) -> dict:
  function append_bos_eos (line 80) | def append_bos_eos(sample: dict) -> dict:
  function get_sibs (line 90) | def get_sibs(sample: dict) -> dict:

FILE: hanlp/datasets/parsing/loaders/constituency_dataset.py
  class ConstituencyDataset (line 12) | class ConstituencyDataset(TransformableDataset):
    method load_file (line 13) | def load_file(self, filepath: str):
  function unpack_tree_to_features (line 22) | def unpack_tree_to_features(sample: dict):
  function append_bos_eos (line 36) | def append_bos_eos(sample: dict):
  function remove_subcategory (line 43) | def remove_subcategory(sample: dict):
  function binarize (line 52) | def binarize(tree: Tree):
  function factorize (line 105) | def factorize(tree, delete_labels=None, equal_labels=None):
  function build_tree (line 164) | def build_tree(tokens: List[str], sequence):
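
`factorize` above flattens a constituency tree into labeled spans (and `build_tree` inverts that). A simplified analogue using nested lists instead of `nltk.Tree`, without the `delete_labels` / `equal_labels` options; the tree encoding here is an assumption for illustration:

```python
def factorize(tree, pos=0):
    """Flatten a nested-list tree into (start, end, label) spans.

    Leaves are plain token strings; internal nodes are [label, child, ...].
    Returns (spans, end_position); parents precede their children.
    """
    label, children = tree[0], tree[1:]
    spans, offset = [], pos
    for child in children:
        if isinstance(child, str):  # terminal token advances the position
            offset += 1
        else:
            child_spans, offset = factorize(child, offset)
            spans.extend(child_spans)
    return [(pos, offset, label)] + spans, offset

tree = ['S', ['NP', 'I'], ['VP', 'saw', ['NP', 'it']]]
spans, n = factorize(tree)
print(n, spans)  # 3 [(0, 3, 'S'), (0, 1, 'NP'), (1, 3, 'VP'), (2, 3, 'NP')]
```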

FILE: hanlp/datasets/parsing/pmt1.py
  function _make_ptm (line 20) | def _make_ptm():

FILE: hanlp/datasets/parsing/semeval15.py
  function unpack_deps_to_head_deprel (line 16) | def unpack_deps_to_head_deprel(sample: dict, pad_rel=None, arc_key='arc'...
  function append_bos_to_form_pos (line 41) | def append_bos_to_form_pos(sample, pos_key='CPOS'):
  function merge_head_deprel_with_2nd (line 48) | def merge_head_deprel_with_2nd(sample: dict):

FILE: hanlp/datasets/parsing/semeval16.py
  function convert_conll_to_conllu (line 33) | def convert_conll_to_conllu(path):

FILE: hanlp/datasets/parsing/ud/__init__.py
  function concat_treebanks (line 12) | def concat_treebanks(home, version):

FILE: hanlp/datasets/parsing/ud/ud210.py
  function _list_dir (line 19) | def _list_dir(path, home):
  function main (line 37) | def main():

FILE: hanlp/datasets/parsing/ud/ud23.py
  function _list_dir (line 9) | def _list_dir(path, home):
  function main (line 29) | def main():

FILE: hanlp/datasets/parsing/ud/ud27.py
  function _list_dir (line 19) | def _list_dir(path, home):
  function main (line 37) | def main():

FILE: hanlp/datasets/qa/hotpotqa.py
  class HotpotQADataset (line 18) | class HotpotQADataset(TransformableDataset):
    method load_file (line 20) | def load_file(self, filepath):
  class BuildGraph (line 25) | class BuildGraph(object):
    method __init__ (line 27) | def __init__(self, dst='graph') -> None:
    method __call__ (line 31) | def __call__(self, sample: dict):
  function hotpotqa_collate_fn (line 36) | def hotpotqa_collate_fn(samples):
  function flat_sentence (line 75) | def flat_sentence(sample: dict) -> dict:
  function create_sp_label (line 82) | def create_sp_label(sample: dict) -> dict:
  class Type (line 99) | class Type(Enum):
  class Vertex (line 109) | class Vertex(object):
    method __init__ (line 111) | def __init__(self, id, type: Type, text=None) -> None:
    method connect (line 121) | def connect(self, to, rel):
    method __str__ (line 125) | def __str__(self) -> str:
    method __hash__ (line 128) | def __hash__(self) -> int:
    method is_word (line 131) | def is_word(self):
    method is_question (line 134) | def is_question(self):
    method is_sp (line 137) | def is_sp(self):
    method is_sp_root (line 140) | def is_sp_root(self):
    method is_sp_root_candidate (line 143) | def is_sp_root_candidate(self):
  function build_graph (line 147) | def build_graph(each: dict, debug=False):

FILE: hanlp/datasets/srl/loaders/conll2012.py
  class CoNLL2012BIOSRLDataset (line 17) | class CoNLL2012BIOSRLDataset(TransformableDataset):
    method load_file (line 18) | def load_file(self, filepath: str):
    method _make_bio_labels (line 44) | def _make_bio_labels(prop):
    method _remove_B_V (line 76) | def _remove_B_V(labels):
  class CoNLL2012SRLDataset (line 80) | class CoNLL2012SRLDataset(TransformableDataset):
    method __init__ (line 82) | def __init__(self,
    method load_file (line 91) | def load_file(self, filepath: str):
    method build_sample (line 145) | def build_sample(self, sentence, deduplicated_srl, doc, sid):
  function group_pa_by_p (line 152) | def group_pa_by_p(sample: dict) -> dict:
  function group_pa_by_p_ (line 160) | def group_pa_by_p_(srl):
  function filter_v_args (line 170) | def filter_v_args(sample: dict) -> dict:
  function unpack_srl (line 176) | def unpack_srl(sample: dict) -> dict:
  class SpanCandidatesGenerator (line 193) | class SpanCandidatesGenerator(NamedTransform):
    method __init__ (line 195) | def __init__(self, src: str, dst: str = None, max_span_width=None) -> ...
    method __call__ (line 201) | def __call__(self, sample: dict) -> dict:
  class CoNLL2012SRLBIODataset (line 206) | class CoNLL2012SRLBIODataset(CoNLL2012SRLDataset):
    method build_sample (line 207) | def build_sample(self, tokens, deduplicated_srl, doc, sid):

FILE: hanlp/datasets/srl/loaders/ontonotes_loader.py
  class OntonotesSentence (line 13) | class OntonotesSentence:
    method __init__ (line 55) | def __init__(
  class Ontonotes (line 85) | class Ontonotes:
    method dataset_iterator (line 181) | def dataset_iterator(self, file_path: str) -> Iterator[OntonotesSenten...
    method dataset_path_iterator (line 189) | def dataset_path_iterator(file_path: str) -> Iterator[str]:
    method dataset_document_iterator (line 205) | def dataset_document_iterator(self, file_path: str) -> Iterator[List[O...
    method sentence_iterator (line 232) | def sentence_iterator(self, file_path: str) -> Iterator[OntonotesSente...
    method _conll_rows_to_sentence (line 240) | def _conll_rows_to_sentence(self, conll_rows: List[str]) -> OntonotesS...
    method _process_coref_span_annotations_for_word (line 369) | def _process_coref_span_annotations_for_word(
    method _process_span_annotations_for_word (line 420) | def _process_span_annotations_for_word(
  function make_coref_instance (line 463) | def make_coref_instance(
  function _normalize_word (line 579) | def _normalize_word(word):
  function _canonicalize_clusters (line 586) | def _canonicalize_clusters(clusters: List[List[Tuple[int, int]]]) -> Lis...

FILE: hanlp/datasets/srl/ontonotes5/_utils.py
  function flatten (line 24) | def flatten(l):
  function get_doc_key (line 28) | def get_doc_key(doc_id, part):
  class DocumentState (line 32) | class DocumentState(object):
    method __init__ (line 33) | def __init__(self):
    method assert_empty (line 56) | def assert_empty(self):
    method assert_finalizable (line 77) | def assert_finalizable(self):
    method finalize_sentence (line 89) | def finalize_sentence(self):
    method finalize (line 112) | def finalize(self):
  function filter_data (line 147) | def filter_data(input_json_file, output_json_file, doc_ids_file=None, an...
  function normalize_word (line 224) | def normalize_word(word, language):
  function handle_bit (line 233) | def handle_bit(word_index, bit, stack, spans, label_set):
  function handle_line (line 269) | def handle_line(line, document_state: DocumentState, language, labels, s...
  function ontonotes_document_generator (line 339) | def ontonotes_document_generator(input_path, language, labels, stats):
  function convert_to_jsonlines (line 349) | def convert_to_jsonlines(input_path, output_path, language, labels=None,...
  function make_ontonotes_jsonlines (line 364) | def make_ontonotes_jsonlines(conll12_ontonotes_path, output_path, langua...
  function make_ontonotes_language_jsonlines (line 371) | def make_ontonotes_language_jsonlines(conll12_ontonotes_path, output_pat...
  function ensure_python_points_to_python2 (line 405) | def ensure_python_points_to_python2():
  function make_gold_conll (line 414) | def make_gold_conll(ontonotes_path, language):
  function convert_jsonlines_to_IOBES (line 432) | def convert_jsonlines_to_IOBES(json_file, output_file=None, doc_level_of...
  function make_ner_tsv_if_necessary (line 463) | def make_ner_tsv_if_necessary(json_file):
  function batch_make_ner_tsv_if_necessary (line 471) | def batch_make_ner_tsv_if_necessary(json_files):
  function make_pos_tsv_if_necessary (line 476) | def make_pos_tsv_if_necessary(json_file):
  function make_pos_tsv (line 484) | def make_pos_tsv(json_file, output_file):
  function batch_make_pos_tsv_if_necessary (line 494) | def batch_make_pos_tsv_if_necessary(json_files):
  function make_con_txt (line 499) | def make_con_txt(conll_file, output_file):
  function make_con_txt_if_necessary (line 518) | def make_con_txt_if_necessary(json_file):
  function batch_make_con_txt_if_necessary (line 526) | def batch_make_con_txt_if_necessary(json_files):
  function batch_remove_empty_category_if_necessary (line 531) | def batch_remove_empty_category_if_necessary(json_files):
  function make_dep_conllx (line 539) | def make_dep_conllx(con_txt_file, output_file, language='en'):
  function make_dep_conllx_if_necessary (line 544) | def make_dep_conllx_if_necessary(con_txt_file: str, language='en'):
  function batch_make_dep_conllx_if_necessary (line 552) | def batch_make_dep_conllx_if_necessary(con_txt_files, language='en'):
  function make_ner_json_if_necessary (line 557) | def make_ner_json_if_necessary(json_file):
  function batch_make_ner_json_if_necessary (line 565) | def batch_make_ner_json_if_necessary(json_files):
  function make_ner_json (line 570) | def make_ner_json(json_file, output_file):
  function make_srl_json_if_necessary (line 574) | def make_srl_json_if_necessary(json_file):
  function make_coref_json_if_necessary (line 582) | def make_coref_json_if_necessary(json_file):
  function batch_make_srl_json_if_necessary (line 590) | def batch_make_srl_json_if_necessary(json_files):
  function make_srl_json (line 595) | def make_srl_json(json_file, output_file):
  function batch_make_coref_json_if_necessary (line 599) | def batch_make_coref_json_if_necessary(json_files):
  function make_coref_json (line 604) | def make_coref_json(json_file, output_file):
  function load_raw_text (line 608) | def load_raw_text(onf_file) -> List[str]:
  function batch_load_raw_text (line 633) | def batch_load_raw_text(root: str) -> Dict[str, List[str]]:
  function make_raw_text_if_necessary (line 642) | def make_raw_text_if_necessary(home: str):
  class RestoreToken (line 651) | class RestoreToken(NormalizeToken):
    method __init__ (line 652) | def __init__(self, src: str, mapper: Union[str, dict] = None, dst: str...
    method __call__ (line 660) | def __call__(self, sample: dict) -> dict:
  function main (line 667) | def main():

FILE: hanlp/datasets/sts/stsb.py
  class SemanticTextualSimilarityDataset (line 14) | class SemanticTextualSimilarityDataset(TransformableDataset):
    method __init__ (line 15) | def __init__(self,
    method load_file (line 30) | def load_file(self, filepath: str):

FILE: hanlp/datasets/tokenization/loaders/chunking_dataset.py
  class ChunkingDataset (line 12) | class ChunkingDataset(TransformableDataset):
    method __init__ (line 14) | def __init__(self, data: Union[str, List], transform: Union[Callable, ...
    method load_file (line 25) | def load_file(self, filepath):
    method _generate_chars_tags (line 32) | def _generate_chars_tags(filepath, delimiter, max_seq_len):

FILE: hanlp/datasets/tokenization/loaders/multi_criteria_cws/mcws_dataset.py
  class MultiCriteriaTextTokenizingDataset (line 11) | class MultiCriteriaTextTokenizingDataset(TextTokenizingDataset):
    method __init__ (line 12) | def __init__(self,
    method should_load_file (line 25) | def should_load_file(self, data) -> bool:
    method load_file (line 28) | def load_file(self, filepath: Union[Iterable[str], Dict[str, str]]):
  function append_criteria_token (line 87) | def append_criteria_token(sample: dict, criteria_tokens: Dict[str, int],...

FILE: hanlp/datasets/tokenization/loaders/txt.py
  class TextTokenizingDataset (line 12) | class TextTokenizingDataset(TransformableDataset):
    method __init__ (line 13) | def __init__(self,
    method load_file (line 47) | def load_file(self, filepath: str):
  function generate_tags_for_subtokens (line 85) | def generate_tags_for_subtokens(sample: dict, tagging_scheme='BMES'):
  function subtoken_offsets_to_subtokens (line 117) | def subtoken_offsets_to_subtokens(text, token_subtoken_offsets):
  function subtokens_group_to_subtokens (line 124) | def subtokens_group_to_subtokens(tokens, subtoken_offsets_group):
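
`generate_tags_for_subtokens` above produces per-character BMES labels from gold tokens, the supervision signal for the tagging tokenizers. A minimal stdlib sketch of the scheme (the real function additionally handles subtoken offsets and alternative tagging schemes):

```python
def bmes_tags(tokens):
    """Emit one BMES tag per character: S for single-char tokens,
    B ... E (with M fillers) for multi-char tokens."""
    tags = []
    for token in tokens:
        if len(token) == 1:
            tags.append('S')
        else:
            tags.extend(['B'] + ['M'] * (len(token) - 2) + ['E'])
    return tags

print(bmes_tags(['商品', '和', '服务']))  # ['B', 'E', 'S', 'B', 'E']
```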

FILE: hanlp/datasets/tokenization/sighan2005/__init__.py
  function make (line 12) | def make(train):

FILE: hanlp/layers/cnn_encoder.py
  class CnnEncoder (line 7) | class CnnEncoder(torch.nn.Module):
    method __init__ (line 48) | def __init__(
    method get_input_dim (line 81) | def get_input_dim(self) -> int:
    method get_output_dim (line 84) | def get_output_dim(self) -> int:
    method forward (line 87) | def forward(self, tokens: torch.Tensor, mask: torch.BoolTensor):

FILE: hanlp/layers/crf/crf.py
  class CRF (line 28) | class CRF(nn.Module):
    method __init__ (line 57) | def __init__(self, num_tags: int, batch_first: bool = True) -> None:
    method reset_parameters (line 69) | def reset_parameters(self) -> None:
    method __repr__ (line 79) | def __repr__(self) -> str:
    method forward (line 82) | def forward(
    method decode (line 136) | def decode(self, emissions: torch.Tensor,
    method _validate (line 160) | def _validate(
    method _compute_score (line 188) | def _compute_score(
    method _compute_normalizer (line 227) | def _compute_normalizer(
    method _viterbi_decode (line 278) | def _viterbi_decode(self, emissions: torch.FloatTensor,
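
The `CRF` module above scores tag sequences with emission and transition potentials; `_viterbi_decode` extracts the argmax path. A pure-Python sketch of single-sequence Viterbi decoding (the module itself works on batched, masked tensors):

```python
def viterbi_decode(emissions, transitions):
    """emissions: [seq_len][num_tags] scores; transitions[i][j]: score of
    moving from tag i to tag j. Returns the highest-scoring tag sequence."""
    num_tags = len(transitions)
    score = list(emissions[0])  # best score ending in each tag so far
    history = []                # back-pointers per step
    for emit in emissions[1:]:
        next_score, backptr = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            next_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            backptr.append(best_i)
        score = next_score
        history.append(backptr)
    # Trace the best final tag back through the stored pointers
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for backptr in reversed(history):
        best = backptr[best]
        path.append(best)
    return list(reversed(path))

print(viterbi_decode([[2, 0], [0, 2]], [[1, 0], [0, 1]]))  # [0, 1]
```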

FILE: hanlp/layers/crf/crf_layer_tf.py
  class CRF (line 21) | class CRF(tf.keras.layers.Layer):
    method __init__ (line 46) | def __init__(self, num_classes, **kwargs):
    method get_config (line 55) | def get_config(self):
    method build (line 64) | def build(self, input_shape):
    method compute_mask (line 82) | def compute_mask(self, inputs, mask=None):
    method call (line 88) | def call(self, inputs, sequence_lengths=None, mask=None, training=None...
    method compute_output_shape (line 113) | def compute_output_shape(self, input_shape):
    method viterbi_accuracy (line 118) | def viterbi_accuracy(self):
  class CRFLoss (line 131) | class CRFLoss(object):
    method __init__ (line 133) | def __init__(self, crf: CRF, dtype) -> None:
    method __call__ (line 139) | def __call__(self, y_true, y_pred, sample_weight=None, **kwargs):
  class CRFWrapper (line 153) | class CRFWrapper(tf.keras.Model):
    method __init__ (line 154) | def __init__(self, model: tf.keras.Model, num_classes=None, *args, **k...
    method call (line 159) | def call(self, inputs, training=None, mask=None):
    method compute_output_shape (line 164) | def compute_output_shape(self, input_shape):

FILE: hanlp/layers/crf/crf_tf.py
  function crf_sequence_score (line 27) | def crf_sequence_score(inputs, tag_indices, sequence_lengths,
  function crf_multitag_sequence_score (line 75) | def crf_multitag_sequence_score(inputs, tag_bitmap, sequence_lengths,
  function crf_log_norm (line 125) | def crf_log_norm(inputs, sequence_lengths, transition_params):
  function crf_log_likelihood (line 175) | def crf_log_likelihood(inputs,
  function crf_unary_score (line 216) | def crf_unary_score(tag_indices, sequence_lengths, inputs):
  function crf_binary_score (line 256) | def crf_binary_score(tag_indices, sequence_lengths, transition_params):
  function crf_forward (line 295) | def crf_forward(inputs, state, transition_params, sequence_lengths):
  function viterbi_decode (line 333) | def viterbi_decode(score, transition_params):
  class CrfDecodeForwardRnnCell (line 366) | class CrfDecodeForwardRnnCell(tf.keras.layers.AbstractRNNCell):
    method __init__ (line 369) | def __init__(self, transition_params, **kwargs):
    method state_size (line 383) | def state_size(self):
    method output_size (line 387) | def output_size(self):
    method build (line 390) | def build(self, input_shape):
    method call (line 393) | def call(self, inputs, state):
  function crf_decode_forward (line 414) | def crf_decode_forward(inputs, state, transition_params, sequence_lengths):
  function crf_decode_backward (line 437) | def crf_decode_backward(inputs, state):
  function crf_decode (line 461) | def crf_decode(potentials, transition_params, sequence_length):

FILE: hanlp/layers/dropout.py
  class WordDropout (line 9) | class WordDropout(nn.Module):
    method __init__ (line 10) | def __init__(self, p: float, oov_token: int, exclude_tokens: List[int]...
    method token_dropout (line 19) | def token_dropout(tokens: torch.LongTensor,
    method forward (line 62) | def forward(self, tokens: torch.LongTensor) -> torch.LongTensor:
  class SharedDropout (line 66) | class SharedDropout(nn.Module):
    method __init__ (line 68) | def __init__(self, p=0.5, batch_first=True):
    method extra_repr (line 74) | def extra_repr(self):
    method forward (line 81) | def forward(self, x):
    method get_mask (line 92) | def get_mask(x, p):
  class IndependentDropout (line 99) | class IndependentDropout(nn.Module):
    method __init__ (line 101) | def __init__(self, p=0.5):
    method extra_repr (line 127) | def extra_repr(self):
    method forward (line 130) | def forward(self, *items):
  class LockedDropout (line 143) | class LockedDropout(nn.Module):
    method __init__ (line 144) | def __init__(self, dropout_rate=0.5):
    method forward (line 148) | def forward(self, x):
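
`LockedDropout` above differs from standard dropout in that it samples one mask per sequence and reuses it at every timestep (variational dropout), so a dropped feature stays dropped for the whole sequence. A pure-Python sketch of the idea, assuming `[seq_len][dim]` lists rather than tensors:

```python
import random

def locked_dropout(seq, p=0.5, training=True, rng=random):
    """Apply one shared dropout mask across all timesteps of seq.

    Surviving features are scaled by 1/(1-p) (inverted dropout), so the
    expected activation is unchanged."""
    if not training or p == 0:
        return [row[:] for row in seq]
    dim = len(seq[0])
    mask = [0.0 if rng.random() < p else 1.0 / (1 - p) for _ in range(dim)]
    # The same mask multiplies every timestep
    return [[x * m for x, m in zip(row, mask)] for row in seq]

out = locked_dropout([[1.0, 1.0], [1.0, 1.0]], p=0.5, rng=random.Random(0))
# Each feature column is either zeroed at both timesteps or scaled at both
```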

FILE: hanlp/layers/embeddings/char_cnn.py
  class CharCNN (line 15) | class CharCNN(nn.Module):
    method __init__ (line 16) | def __init__(self,
    method forward (line 74) | def forward(self, batch: dict, **kwargs):
    method get_output_dim (line 80) | def get_output_dim(self) -> int:
  class CharCNNEmbedding (line 84) | class CharCNNEmbedding(Embedding, AutoConfigurable):
    method __init__ (line 85) | def __init__(self,
    method transform (line 123) | def transform(self, vocabs: VocabDict, **kwargs) -> Optional[Callable]:
    method vocab_name (line 133) | def vocab_name(self):
    method module (line 137) | def module(self, vocabs: VocabDict, **kwargs) -> Optional[nn.Module]:

FILE: hanlp/layers/embeddings/char_cnn_tf.py
  class CharCNNEmbeddingTF (line 13) | class CharCNNEmbeddingTF(tf.keras.layers.Layer):
    method __init__ (line 14) | def __init__(self, word_vocab: VocabTF, char_vocab: VocabTF,
    method call (line 32) | def call(self, inputs: tf.Tensor, **kwargs):
    method compute_output_shape (line 45) | def compute_output_shape(self, input_shape):
    method get_config (line 48) | def get_config(self):
  function masked_conv1d_and_max (line 59) | def masked_conv1d_and_max(t, weights, conv1d):

FILE: hanlp/layers/embeddings/char_rnn.py
  class CharRNN (line 16) | class CharRNN(nn.Module, EmbeddingDim):
    method __init__ (line 17) | def __init__(self,
    method forward (line 47) | def forward(self, batch, mask, **kwargs):
    method embedding_dim (line 68) | def embedding_dim(self) -> int:
  class CharRNNEmbedding (line 72) | class CharRNNEmbedding(Embedding, AutoConfigurable):
    method __init__ (line 73) | def __init__(self,
    method transform (line 92) | def transform(self, vocabs: VocabDict, **kwargs) -> Optional[Callable]:
    method vocab_name (line 101) | def vocab_name(self):
    method module (line 105) | def module(self, vocabs: VocabDict, **kwargs) -> Optional[nn.Module]:

FILE: hanlp/layers/embeddings/char_rnn_tf.py
  class CharRNNEmbeddingTF (line 11) | class CharRNNEmbeddingTF(tf.keras.layers.Layer):
    method __init__ (line 12) | def __init__(self, word_vocab: VocabTF, char_vocab: VocabTF,
    method call (line 29) | def call(self, inputs: tf.Tensor, **kwargs):
    method get_config (line 54) | def get_config(self):

FILE: hanlp/layers/embeddings/concat_embedding.py
  class ConcatEmbedding (line 10) | class ConcatEmbedding(tf.keras.layers.Layer):
    method __init__ (line 11) | def __init__(self, *embeddings, trainable=True, name=None, dtype=None,...
    method build (line 26) | def build(self, input_shape):
    method compute_mask (line 31) | def compute_mask(self, inputs, mask=None):
    method call (line 38) | def call(self, inputs, **kwargs):
    method get_config (line 48) | def get_config(self):
    method compute_output_shape (line 55) | def compute_output_shape(self, input_shape):

FILE: hanlp/layers/embeddings/contextual_string_embedding.py
  class RNNLanguageModel (line 26) | class RNNLanguageModel(nn.Module):
    method __init__ (line 29) | def __init__(self,
    method forward (line 44) | def forward(self, ids: torch.LongTensor, lens: torch.LongTensor):
    method load_language_model (line 52) | def load_language_model(cls, model_file):
    method save (line 62) | def save(self, file):
  class ContextualStringEmbeddingModule (line 73) | class ContextualStringEmbeddingModule(nn.Module, EmbeddingDim):
    method __init__ (line 75) | def __init__(self, field: str, path: str, trainable=False) -> None:
    method __call__ (line 87) | def __call__(self, batch: dict, **kwargs):
    method embedding_dim (line 94) | def embedding_dim(self):
    method run_lm (line 97) | def run_lm(self, lm, ids: torch.Tensor, offsets: torch.LongTensor):
    method forward (line 102) | def forward(self,
    method embed (line 111) | def embed(self, sents: List[List[str]], vocab: Dict[str, int]):
  class ContextualStringEmbeddingTransform (line 135) | class ContextualStringEmbeddingTransform(Configurable):
    method __init__ (line 137) | def __init__(self, src: str) -> None:
    method __call__ (line 140) | def __call__(self, sample: dict):
  class ContextualStringEmbedding (line 174) | class ContextualStringEmbedding(Embedding):
    method __init__ (line 175) | def __init__(self, field, path, trainable=False) -> None:
    method transform (line 181) | def transform(self, **kwargs) -> Callable:
    method module (line 188) | def module(self, **kwargs) -> nn.Module:
  function main (line 192) | def main():
  function _validate (line 198) | def _validate():

FILE: hanlp/layers/embeddings/contextual_string_embedding_tf.py
  class ContextualStringEmbeddingTF (line 16) | class ContextualStringEmbeddingTF(tf.keras.layers.Layer):
    method __init__ (line 18) | def __init__(self, forward_model_path=None, backward_model_path=None, ...
    method call (line 36) | def call(self, inputs, **kwargs):
    method _load_lm (line 42) | def _load_lm(self, filepath):
    method embed (line 52) | def embed(self, texts: List[List[str]]):
    method _run_rnn (line 74) | def _run_rnn(self, texts, model):
    method _get_raw_string (line 97) | def _get_raw_string(self, sent: List[str], tokenizer):
    method get_config (line 113) | def get_config(self):
    method output_dim (line 123) | def output_dim(self):
    method compute_output_shape (line 130) | def compute_output_shape(self, input_shape):
    method compute_mask (line 133) | def compute_mask(self, inputs, mask=None):

FILE: hanlp/layers/embeddings/contextual_word_embedding.py
  class ContextualWordEmbeddingModule (line 17) | class ContextualWordEmbeddingModule(TransformerEncoder):
    method __init__ (line 18) | def __init__(self,
    method forward (line 53) | def forward(self, batch: dict, mask=None, **kwargs):
    method get_output_dim (line 70) | def get_output_dim(self):
    method get_device (line 73) | def get_device(self):
  class ContextualWordEmbedding (line 78) | class ContextualWordEmbedding(Embedding, AutoConfigurable):
    method __init__ (line 79) | def __init__(self, field: str,
    method transform (line 156) | def transform(self, **kwargs) -> TransformerSequenceTokenizer:
    method module (line 159) | def module(self, training=True, **kwargs) -> Optional[nn.Module]:
    method get_output_dim (line 172) | def get_output_dim(self):
    method get_tokenizer (line 176) | def get_tokenizer(self):
  function find_transformer (line 180) | def find_transformer(embed: nn.Module):

FILE: hanlp/layers/embeddings/embedding.py
  class EmbeddingDim (line 16) | class EmbeddingDim(ABC):
    method embedding_dim (line 19) | def embedding_dim(self) -> int:
    method get_output_dim (line 22) | def get_output_dim(self) -> int:
  class Embedding (line 26) | class Embedding(AutoConfigurable, ABC):
    method __init__ (line 28) | def __init__(self) -> None:
    method transform (line 34) | def transform(self, **kwargs) -> Optional[Callable]:
    method module (line 45) | def module(self, **kwargs) -> Optional[nn.Module]:
  class ConcatModuleList (line 57) | class ConcatModuleList(nn.ModuleList, EmbeddingDim):
    method __init__ (line 59) | def __init__(self, *modules: Optional[Iterable[Module]], dropout=None)...
    method embedding_dim (line 72) | def embedding_dim(self) -> int:
    method get_output_dim (line 75) | def get_output_dim(self) -> int:
    method forward (line 79) | def forward(self, batch: dict, **kwargs):
    method embeddings (line 86) | def embeddings(self):
  class EmbeddingList (line 93) | class EmbeddingList(Embedding):
    method __init__ (line 94) | def __init__(self, *embeddings_, embeddings: dict = None, dropout=None...
    method transform (line 112) | def transform(self, **kwargs):
    method module (line 117) | def module(self, **kwargs):
    method to_list (line 122) | def to_list(self):
  function find_embedding_by_class (line 126) | def find_embedding_by_class(embed: Embedding, cls):
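
As the signatures above show, every `Embedding` pairs a `transform()` (a per-sample callable that prepares a field) with a `module()` (the batch-level embedder), and `EmbeddingList`/`ConcatModuleList` compose several embeddings by concatenating their outputs on the last dimension. A minimal torch-free sketch of that contract (all class and field names here are illustrative, not HanLP's actual code):

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class EmbeddingSketch(ABC):
    """Pairs a per-sample transform with a batch-level embedding callable."""

    @abstractmethod
    def transform(self) -> Callable[[dict], dict]: ...

    @abstractmethod
    def module(self) -> Callable[[dict], List[List[float]]]: ...


class ToyWordEmbedding(EmbeddingSketch):
    """Hypothetical lookup embedding standing in for Word2VecEmbedding."""

    def __init__(self, field: str, table: dict, dim: int):
        self.field, self.table, self.dim = field, table, dim

    def transform(self):
        def add_ids(sample: dict) -> dict:
            sample[self.field + '_id'] = [self.table.get(w, 0)
                                          for w in sample[self.field]]
            return sample
        return add_ids

    def module(self):
        def embed(batch: dict):
            # One dim-sized vector per token id; real code uses nn.Embedding.
            return [[float(i)] * self.dim for i in batch[self.field + '_id']]
        return embed


class ConcatSketch:
    """Stands in for ConcatModuleList: run every module on the same batch,
    then concatenate the per-token vectors."""

    def __init__(self, modules):
        self.modules = modules

    def __call__(self, batch: dict):
        outputs = [m(batch) for m in self.modules]
        return [sum(vecs, []) for vecs in zip(*outputs)]
```

This is why `get_output_dim()` on the concatenating module is simply the sum of its children's dimensions.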

FILE: hanlp/layers/embeddings/fast_text.py
  class FastTextTransform (line 26) | class FastTextTransform(EmbeddingNamedTransform):
    method __init__ (line 27) | def __init__(self, filepath: str, src, dst=None, **kwargs) -> None:
    method __call__ (line 39) | def __call__(self, sample: dict):
    method embed (line 48) | def embed(self, word: str):
  class SelectFromBatchModule (line 52) | class SelectFromBatchModule(torch.nn.Module):
    method __init__ (line 53) | def __init__(self, key) -> None:
    method __call__ (line 57) | def __call__(self, batch: dict, mask=None, **kwargs):
  class FastTextEmbeddingModule (line 61) | class FastTextEmbeddingModule(SelectFromBatchModule):
    method __init__ (line 63) | def __init__(self, key, embedding_dim: int) -> None:
    method __call__ (line 73) | def __call__(self, batch: dict, mask=None, **kwargs):
    method __repr__ (line 80) | def __repr__(self):
    method get_output_dim (line 86) | def get_output_dim(self):
  class FastTextEmbedding (line 90) | class FastTextEmbedding(Embedding, AutoConfigurable):
    method __init__ (line 91) | def __init__(self, src: str, filepath: str) -> None:
    method transform (line 103) | def transform(self, **kwargs) -> Optional[Callable]:
    method module (line 106) | def module(self, **kwargs) -> Optional[nn.Module]:
  class FastTextDataset (line 110) | class FastTextDataset(TransformableDataset):
    method load_file (line 112) | def load_file(self, filepath: str):
  class FastTextEmbeddingComponent (line 116) | class FastTextEmbeddingComponent(TorchComponent):
    method __init__ (line 117) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 125) | def build_dataloader(self, data, shuffle=False, device=None, logger: l...
    method build_optimizer (line 131) | def build_optimizer(self, **kwargs):
    method build_criterion (line 134) | def build_criterion(self, **kwargs):
    method build_metric (line 137) | def build_metric(self, **kwargs):
    method execute_training_loop (line 140) | def execute_training_loop(self, trn: DataLoader, dev: DataLoader, epoc...
    method fit_dataloader (line 144) | def fit_dataloader(self, trn: DataLoader, criterion, optimizer, metric...
    method evaluate_dataloader (line 147) | def evaluate_dataloader(self, data: DataLoader, criterion: Callable, m...
    method load_vocabs (line 150) | def load_vocabs(self, save_dir, filename='vocabs.json'):
    method load_weights (line 153) | def load_weights(self, save_dir, filename='model.pt', **kwargs):
    method build_model (line 156) | def build_model(self, training=True, **kwargs) -> torch.nn.Module:
    method predict (line 160) | def predict(self, data: str, **kwargs):
    method devices (line 166) | def devices(self):
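
The fastText layout above is notable: embedding happens inside the transform (`FastTextTransform.embed`, since fastText can vectorize any word via subwords), so the "module" is a `SelectFromBatchModule` that merely picks the precomputed vectors out of the batch dict. A toy sketch of that division of labor (vector values and helper names are made up for illustration):

```python
class SelectFromBatch:
    """Picks a precomputed field out of a batch dict, mirroring the
    select-only role of FastTextEmbeddingModule (toy stand-in)."""

    def __init__(self, key: str):
        self.key = key

    def __call__(self, batch: dict):
        return batch[self.key]


def toy_fasttext_transform(sample: dict, src: str = 'token',
                           dst: str = 'token_vec') -> dict:
    """Stand-in for FastTextTransform: vectors are attached at transform
    time, so the module has nothing to learn.  A real fastText model
    would derive OOV vectors from subwords; here they fall back to zeros."""
    vectors = {'hello': [0.1, 0.2]}
    sample[dst] = [vectors.get(w, [0.0, 0.0]) for w in sample[src]]
    return sample
```

Keeping the lookup out of the module means no trainable parameters and no GPU memory for the (large) fastText table.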

FILE: hanlp/layers/embeddings/fast_text_tf.py
  class FastTextEmbeddingTF (line 18) | class FastTextEmbeddingTF(tf.keras.layers.Embedding):
    method __init__ (line 20) | def __init__(self, filepath: str, padding=PAD, name=None, **kwargs):
    method embed (line 41) | def embed(self, word):
    method embed_np (line 44) | def embed_np(self, words: np.ndarray):
    method build (line 56) | def build(self, input_shape):
    method compute_output_shape (line 60) | def compute_output_shape(self, input_shape):
    method call (line 63) | def call(self, inputs: tf.Tensor):
    method compute_mask (line 84) | def compute_mask(self, inputs, mask=None):
    method get_config (line 89) | def get_config(self):

FILE: hanlp/layers/embeddings/util.py
  function index_word2vec_with_vocab (line 14) | def index_word2vec_with_vocab(filepath: str,
  function build_word2vec_with_vocab (line 78) | def build_word2vec_with_vocab(embed: Union[str, int],
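
`index_word2vec_with_vocab` aligns rows of a pretrained word2vec text file with a task vocabulary. The core idea can be sketched with stdlib only (this is illustrative; the real function also handles headers, normalization and unknown-word initialization):

```python
import io


def index_word2vec_sketch(text: str, vocab):
    """Toy alignment of a word2vec text file with a vocab: each vocab
    word gets its pretrained row, unseen words get zeros."""
    table = {}
    for line in io.StringIO(text):
        word, *values = line.split()
        table[word] = [float(v) for v in values]
    dim = len(next(iter(table.values())))
    return [table.get(w, [0.0] * dim) for w in vocab]
```

The resulting matrix is ordered by the vocab, so token ids index it directly.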

FILE: hanlp/layers/embeddings/util_tf.py
  function build_embedding (line 23) | def build_embedding(embeddings: Union[str, int, dict], word_vocab: Vocab...
  function any_embedding_in (line 69) | def any_embedding_in(embeddings, *cls):
  function embeddings_require_string_input (line 78) | def embeddings_require_string_input(embeddings):
  function embeddings_require_char_input (line 85) | def embeddings_require_char_input(embeddings):

FILE: hanlp/layers/embeddings/word2vec.py
  class Word2VecEmbeddingModule (line 27) | class Word2VecEmbeddingModule(nn.Module, EmbeddingDim):
    method __init__ (line 28) | def __init__(self, field: str, embed: nn.Embedding, word_dropout: Word...
    method forward (line 57) | def forward(self, batch: dict, **kwargs):
    method embedding_dim (line 76) | def embedding_dim(self) -> int:
    method _apply (line 87) | def _apply(self, fn):
  class Word2VecEmbedding (line 93) | class Word2VecEmbedding(Embedding, AutoConfigurable):
    method __init__ (line 94) | def __init__(self,
    method module (line 139) | def module(self, vocabs: VocabDict, **kwargs) -> Optional[nn.Module]:
    method transform (line 162) | def transform(self, vocabs: VocabDict = None, **kwargs) -> Optional[Ca...
  class Word2VecDataset (line 169) | class Word2VecDataset(TransformableDataset):
    method load_file (line 171) | def load_file(self, filepath: str):
  class Word2VecEmbeddingComponent (line 175) | class Word2VecEmbeddingComponent(TorchComponent):
    method __init__ (line 177) | def __init__(self, **kwargs) -> None:
    method build_dataloader (line 186) | def build_dataloader(self, data: List[str], shuffle=False, device=None...
    method build_optimizer (line 191) | def build_optimizer(self, **kwargs):
    method build_criterion (line 194) | def build_criterion(self, **kwargs):
    method build_metric (line 197) | def build_metric(self, 

Condensed preview — 697 files, each showing path, character count, and a content snippet (3,549K chars of full structured content).
[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 1004,
    "preview": "---\nname: 🐛发现一个bug\nabout: 需提交版本号、触发代码、错误日志\ntitle: ''\nlabels: bug\nassignees: hankcs\n\n---\n\n<!--\n感谢找出bug,请认真填写下表:\n-->\n\n**De"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/config.yml",
    "chars": 122,
    "preview": "blank_issues_enabled: false\ncontact_links:\n  - name: ⁉️ 提问求助请上论坛\n    url: https://bbs.hankcs.com/\n    about: 欢迎前往蝴蝶效应论坛求"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 603,
    "preview": "---\nname: 🚀新功能请愿\nabout: 建议增加一个新功能\ntitle: ''\nlabels: feature request\nassignees: hankcs\n\n---\n\n<!--\n提问请上论坛,不要发这里!\n提问请上论坛,不要"
  },
  {
    "path": ".github/pull_request_template.md",
    "chars": 1467,
    "preview": "<!--\nThank you for being interested in contributing to HanLP! You are awesome ✨.\n⚠️Changes must be made on dev branch.\n-"
  },
  {
    "path": ".github/workflows/unit-tests.yml",
    "chars": 2252,
    "preview": "name: Unit Tests\n\non:\n  push:\n    branches: [ \"**\" ]\n  pull_request:\n    branches: [ \"**\" ]\n\njobs:\n  build:\n\n    runs-on"
  },
  {
    "path": ".gitignore",
    "chars": 4513,
    "preview": "# Created by .ignore support plugin (hsz.mobi)\n### Python template\n# Byte-compiled / optimized / DLL files\n__pycache__/\n"
  },
  {
    "path": "CITATION.cff",
    "chars": 1368,
    "preview": "cff-version: 1.2.0\nmessage: \"If you use this software, please cite it as below.\"\nauthors:\n- family-names: He\n  given-nam"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "README.md",
    "chars": 30164,
    "preview": "<h2 align=\"center\">HanLP: Han Language Processing</h2>\n\n<div align=\"center\">\n    <a href=\"https://github.com/hankcs/HanL"
  },
  {
    "path": "docs/Makefile",
    "chars": 634,
    "preview": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the "
  },
  {
    "path": "docs/annotations/constituency/ctb.md",
    "chars": 4761,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/constituency/index.md",
    "chars": 126,
    "preview": "# Constituency Parsing\n\n## Chinese\n```{toctree}\nctb\n```\n\n## English\n```{toctree}\nptb\n```\n\n## Japanese\n```{toctree}\nnpcmj"
  },
  {
    "path": "docs/annotations/constituency/npcmj.md",
    "chars": 6141,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/constituency/ptb.md",
    "chars": 7405,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/dep/index.md",
    "chars": 135,
    "preview": "# Dependency Parsing\n\n## Chinese\n\n```{toctree}\nsd_zh\npmt\n```\n\n## English\n\n```{toctree}\nsd_en\n```\n\n## Multilingual\n\n```{t"
  },
  {
    "path": "docs/annotations/dep/pmt.md",
    "chars": 3309,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/dep/sd_en.md",
    "chars": 4215,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/dep/sd_zh.md",
    "chars": 3458,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/dep/ud.md",
    "chars": 9067,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/index.md",
    "chars": 113,
    "preview": "# Annotations\n\n\n```{toctree}\ntok/index\npos/index\nner/index\ndep/index\nsdp/index\nsrl/index\nconstituency/index\n```\n\n"
  },
  {
    "path": "docs/annotations/ner/index.md",
    "chars": 111,
    "preview": "# Named Entity Recognition\n\n## Chinese\n\n```{toctree}\npku\nmsra\n```\n\n## Multilingual\n\n```{toctree}\nontonotes\n```\n"
  },
  {
    "path": "docs/annotations/ner/msra.md",
    "chars": 3240,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/ner/ontonotes.md",
    "chars": 2130,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/ner/pku.md",
    "chars": 4241,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/pos/863.md",
    "chars": 18664,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/pos/ctb.md",
    "chars": 7089,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/pos/index.md",
    "chars": 143,
    "preview": "# Part-of-Speech Tagging\n\n## Chinese\n```{toctree}\nctb\npku\n863\n```\n\n## Japanese\n```{toctree}\nnpcmj\n```\n\n## Multilingual\n\n"
  },
  {
    "path": "docs/annotations/pos/npcmj.md",
    "chars": 3068,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/pos/pku.md",
    "chars": 11024,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/pos/ud.md",
    "chars": 1736,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/sdp/dm.md",
    "chars": 159,
    "preview": "# The reduction of Minimal Recursion Semantics\n\nPlease refer to [Minimal Recursion Semantics An Introduction](https://ww"
  },
  {
    "path": "docs/annotations/sdp/index.md",
    "chars": 112,
    "preview": "# Semantic Dependency Parsing\n\n## Chinese\n\n```{toctree}\nsemeval16\n```\n\n## English\n\n```{toctree}\ndm\npas\npsd\n```\n\n"
  },
  {
    "path": "docs/annotations/sdp/pas.md",
    "chars": 166,
    "preview": "# Predicate-Argument Structures\n\nPlease refer to [Probabilistic disambiguation models for wide-coverage HPSG parsing](ht"
  },
  {
    "path": "docs/annotations/sdp/psd.md",
    "chars": 152,
    "preview": "# Prague Czech-English Dependency Treebank\n\nPlease refer to [Prague Czech-English Dependency Treebank](http://ufal.mff.c"
  },
  {
    "path": "docs/annotations/sdp/semeval16.md",
    "chars": 12722,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/srl/cpb.md",
    "chars": 2752,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/srl/index.md",
    "chars": 97,
    "preview": "# Semantic Role Labeling\n\n## Chinese\n```{toctree}\ncpb\n```\n\n## English\n```{toctree}\npropbank\n```\n\n"
  },
  {
    "path": "docs/annotations/srl/propbank.md",
    "chars": 2240,
    "preview": "<!--\n# ========================================================================\n# Copyright 2020 hankcs\n#\n# Licensed und"
  },
  {
    "path": "docs/annotations/tok/ctb.md",
    "chars": 44077,
    "preview": "The Segmentation Guidelines for the Penn Chinese Treebank (3.0)\n========================================================"
  },
  {
    "path": "docs/annotations/tok/index.md",
    "chars": 51,
    "preview": "# Tokenization\n\n## Chinese\n```{toctree}\nctb\nmsr\n```"
  },
  {
    "path": "docs/annotations/tok/msr.md",
    "chars": 72037,
    "preview": "# MSR中文文本标注规范 (5.0 版)\n\n[**Tokenization Guidelines of Chinese Text (V5.0)**](http://sighan.cs.uchicago.edu/bakeoff2006/MS"
  },
  {
    "path": "docs/api/common/configurable.rst",
    "chars": 194,
    "preview": ".. _api/configurable:\n\nconfigurable\n====================\n\n\n.. autoclass:: hanlp_common.configurable.Configurable\n\t:membe"
  },
  {
    "path": "docs/api/common/conll.rst",
    "chars": 216,
    "preview": ".. _api/conll:\n\nconll\n====================\n\n\n.. autoclass:: hanlp_common.conll.CoNLLWord\n\t:members:\n\n.. autoclass:: hanl"
  },
  {
    "path": "docs/api/common/constant.rst",
    "chars": 81,
    "preview": "constant\n====================\n\n\n.. automodule:: hanlp_common.constant\n\t:members:\n"
  },
  {
    "path": "docs/api/common/document.rst",
    "chars": 140,
    "preview": ".. _api/document:\n\ndocument\n====================\n\n.. currentmodule:: hanlp_common\n\n.. autoclass:: hanlp_common.document."
  },
  {
    "path": "docs/api/common/index.md",
    "chars": 122,
    "preview": "# hanlp_common\n\nCommon APIs shared between `hanlp` and `restful`.\n\n```{toctree}\ndocument\nconll\nconfigurable\nconstant\n```"
  },
  {
    "path": "docs/api/hanlp/common/component.rst",
    "chars": 121,
    "preview": "component\n=================\n\n.. currentmodule:: hanlp.common\n\n.. autoclass:: hanlp.common.component.Component\n\t:members:"
  },
  {
    "path": "docs/api/hanlp/common/dataset.md",
    "chars": 1183,
    "preview": "# dataset\n\nThis module provides base definition for datasets, dataloaders and samplers.\n\n## datasets\n\n```{eval-rst}\n.. c"
  },
  {
    "path": "docs/api/hanlp/common/index.md",
    "chars": 110,
    "preview": "# common\n\nCommon base classes.\n\n```{toctree}\nstructure\nvocab\ntransform\ndataset\ncomponent\ntorch_component\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/common/structure.md",
    "chars": 186,
    "preview": "# structure\n\n```{eval-rst}\n.. currentmodule:: hanlp.common\n\n.. autoclass:: hanlp.common.structure.ConfigTracker\n\t:member"
  },
  {
    "path": "docs/api/hanlp/common/torch_component.md",
    "chars": 157,
    "preview": "# torch_component\n\n```{eval-rst}\n.. currentmodule:: hanlp.common.torch_component\n\n.. autoclass:: hanlp.common.torch_comp"
  },
  {
    "path": "docs/api/hanlp/common/transform.md",
    "chars": 124,
    "preview": "# transform\n\n```{eval-rst}\n.. currentmodule:: hanlp.common\n\n.. autoclass:: hanlp.common.transform.VocabDict\n\t:members:\n\n"
  },
  {
    "path": "docs/api/hanlp/common/vocab.md",
    "chars": 192,
    "preview": "# vocab\n\n```{eval-rst}\n.. currentmodule:: hanlp.common\n\n.. autoclass:: hanlp.common.transform.Vocab\n\t:members:\n\t:special"
  },
  {
    "path": "docs/api/hanlp/components/classifiers.md",
    "chars": 183,
    "preview": "# classifiers\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.classifiers\n\n.. autoclass:: hanlp.components.classifier"
  },
  {
    "path": "docs/api/hanlp/components/eos.md",
    "chars": 150,
    "preview": "# eos\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.eos\n\n.. autoclass:: hanlp.components.eos.ngram.NgramSentenceBou"
  },
  {
    "path": "docs/api/hanlp/components/index.md",
    "chars": 164,
    "preview": "# components\n\nNLP components.\n\n```{toctree}\nmtl/index\nclassifiers\neos\ntokenizers/index\nlemmatizer\ntaggers/index\nner/inde"
  },
  {
    "path": "docs/api/hanlp/components/lemmatizer.md",
    "chars": 129,
    "preview": "# lemmatizer\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.lemmatizer\n\n.. autoclass:: TransformerLemmatizer\n\t:membe"
  },
  {
    "path": "docs/api/hanlp/components/mtl/index.md",
    "chars": 79,
    "preview": "# mtl\n\nMulti-Task Learning (MTL) framework.\n\n```{toctree}\nmtl\ntasks/index\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/mtl/mtl.md",
    "chars": 223,
    "preview": "# MultiTaskLearning\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.components.mtl.multi_ta"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/constituency.md",
    "chars": 236,
    "preview": "# con\n\nConstituency parsing.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.components.mtl"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/dep.md",
    "chars": 228,
    "preview": "# dep\n\nDependency parsing.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.components.mtl.t"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/index.md",
    "chars": 122,
    "preview": "# tasks\n\nMulti-Task Learning (MTL) tasks.\n\n```{toctree}\ntask\nconstituency\ndep\nsdp\nud\nlem\npos\ntok\nner/index\nsrl/index\n```"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/lem.md",
    "chars": 222,
    "preview": "# lem\n\nLemmatization.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.components.mtl.tasks."
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/ner/biaffine_ner.md",
    "chars": 270,
    "preview": "# biaffine_ner\n\nBiaffine Named Entity Recognition.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/ner/index.md",
    "chars": 73,
    "preview": "# ner\n\nNamed Entity Recognition.\n\n```{toctree}\ntag_ner\nbiaffine_ner\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/ner/tag_ner.md",
    "chars": 264,
    "preview": "# tag_ner\n\nTagging based Named Entity Recognition.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/pos.md",
    "chars": 225,
    "preview": "# pos\n\nPart-of-speech tagging.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.components.m"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/sdp.md",
    "chars": 245,
    "preview": "# sdp\n\nSemantic Dependency Parsing.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.compone"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/srl/bio_srl.md",
    "chars": 264,
    "preview": "# bio_srl\n\nBIO Tagging based Semantic Role Labeling.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclas"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/srl/index.md",
    "chars": 67,
    "preview": "# srl\n\nSemantic Role Labeling.\n\n```{toctree}\nbio_srl\nrank_srl\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/srl/rank_srl.md",
    "chars": 265,
    "preview": "# rank_srl\n\nSpan Ranking Semantic Role Labeling.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: "
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/task.md",
    "chars": 183,
    "preview": "# Task\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.components.mtl.tasks.Task\n\t:members:"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/tok.md",
    "chars": 224,
    "preview": "# tok\n\nTokenization.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.mtl\n\n.. autoclass:: hanlp.components.mtl.tasks.t"
  },
  {
    "path": "docs/api/hanlp/components/mtl/tasks/ud.md",
    "chars": 303,
    "preview": "# ud\n\nUniversal Dependencies Parsing (lemmatization, features, PoS tagging and dependency parsing).\n\n```{eval-rst}\n.. cu"
  },
  {
    "path": "docs/api/hanlp/components/ner/biaffine_ner.md",
    "chars": 231,
    "preview": "# biaffine_ner\n\nBiaffine Named Entity Recognition.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.ner.transformer_ne"
  },
  {
    "path": "docs/api/hanlp/components/ner/index.md",
    "chars": 89,
    "preview": "# ner\n\nNamed Entity Recognition.\n\n```{toctree}\ntransformer_ner\nrnn_ner\nbiaffine_ner\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/ner/rnn_ner.md",
    "chars": 200,
    "preview": "# rnn_ner\n\nTagging based Named Entity Recognition.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.ner.rnn_ner\n\n.. au"
  },
  {
    "path": "docs/api/hanlp/components/ner/transformer_ner.md",
    "chars": 232,
    "preview": "# transformer_ner\n\nTagging based Named Entity Recognition.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.ner.transf"
  },
  {
    "path": "docs/api/hanlp/components/parsers/biaffine_dep.md",
    "chars": 199,
    "preview": "# biaffine_dep\n\nBiaffine dependency parser.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components\n\n.. autoclass:: hanlp.com"
  },
  {
    "path": "docs/api/hanlp/components/parsers/biaffine_sdp.md",
    "chars": 207,
    "preview": "# biaffine_sdp\n\nBiaffine dependency parser.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components\n\n.. autoclass:: hanlp.com"
  },
  {
    "path": "docs/api/hanlp/components/parsers/crf_constituency_parser.md",
    "chars": 222,
    "preview": "# crf_constituency_parser\n\nBiaffine dependency parser.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components\n\n.. autoclass:"
  },
  {
    "path": "docs/api/hanlp/components/parsers/index.md",
    "chars": 99,
    "preview": "# parsers\n\nParsers.\n\n```{toctree}\nbiaffine_dep\nbiaffine_sdp\nud_parser\ncrf_constituency_parser\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/parsers/ud_parser.md",
    "chars": 256,
    "preview": "# ud_parser\n\nUniversal Dependencies Parsing (lemmatization, features, PoS tagging and dependency parsing).\n\n```{eval-rst"
  },
  {
    "path": "docs/api/hanlp/components/pipeline.md",
    "chars": 198,
    "preview": "# pipeline\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.pipeline\n\n.. autoclass:: hanlp.components.pipeline.Pipe\n\t:"
  },
  {
    "path": "docs/api/hanlp/components/srl/index.md",
    "chars": 69,
    "preview": "# srl\n\nSemantic Role Labelers.\n\n```{toctree}\nspan_rank\nspan_bio\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/srl/span_bio.md",
    "chars": 172,
    "preview": "# span_bio\n\nSpan BIO tagging based SRL.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.srl.span_bio.span_bio\n\n.. aut"
  },
  {
    "path": "docs/api/hanlp/components/srl/span_rank.md",
    "chars": 172,
    "preview": "# span_rank\n\nSpan Rank based SRL.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.srl.span_rank.span_rank\n\n.. autocla"
  },
  {
    "path": "docs/api/hanlp/components/sts.md",
    "chars": 168,
    "preview": "# sts\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.sts\n\n.. autoclass:: hanlp.components.sts.transformer_sts.Transf"
  },
  {
    "path": "docs/api/hanlp/components/taggers/index.md",
    "chars": 69,
    "preview": "# taggers\n\nTaggers.\n\n```{toctree}\ntransformer_tagger\nrnn_tagger\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/taggers/rnn_tagger.md",
    "chars": 161,
    "preview": "# rnn_tagger\n\nRNN based tagger.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components\n\n.. autoclass:: hanlp.components.tagg"
  },
  {
    "path": "docs/api/hanlp/components/taggers/transformer_tagger.md",
    "chars": 206,
    "preview": "# transformer_tagger\n\nTransformer based tagger.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components\n\n.. autoclass:: hanlp"
  },
  {
    "path": "docs/api/hanlp/components/tokenizers/index.md",
    "chars": 72,
    "preview": "# tokenizers\n\nTokenizers.\n\n```{toctree}\ntransformer\nmulti_criteria\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/components/tokenizers/multi_criteria.md",
    "chars": 292,
    "preview": "# multi_criteria\n\nTransformer based Multi-Criteria Word tokenizer.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.to"
  },
  {
    "path": "docs/api/hanlp/components/tokenizers/transformer.md",
    "chars": 218,
    "preview": "# transformer\n\nTransformer based tokenizer.\n\n```{eval-rst}\n.. currentmodule:: hanlp.components.tokenizers.transformer\n\n."
  },
  {
    "path": "docs/api/hanlp/datasets/constituency/constituency_dataset.md",
    "chars": 142,
    "preview": "# constituency_dataset\n\n```{eval-rst}\n\n.. autoclass:: hanlp.datasets.parsing.loaders.constituency_dataset.ConstituencyDa"
  },
  {
    "path": "docs/api/hanlp/datasets/constituency/index.md",
    "chars": 88,
    "preview": "# con\n\nConstituency parsing datasets.\n\n```{toctree}\nconstituency_dataset\nresources\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/constituency/resources.md",
    "chars": 971,
    "preview": "# resources\n\n## Chinese Treebank\n\n\n### CTB8\n\n\n\n````{margin} **Discussion**\n```{seealso}\nAbout our data split on [our for"
  },
  {
    "path": "docs/api/hanlp/datasets/dep/conll_dataset.md",
    "chars": 141,
    "preview": "# conll\n\n```{eval-rst}\n.. currentmodule:: hanlp.datasets.parsing.loaders.conll_dataset \n\n\n.. autoclass:: CoNLLParsingDat"
  },
  {
    "path": "docs/api/hanlp/datasets/dep/index.md",
    "chars": 79,
    "preview": "# dep\n\nDependency parsing datasets.\n\n```{toctree}\nconll_dataset\nresources\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/dep/resources.md",
    "chars": 2607,
    "preview": "# resources\n\n## PKU Multiview Treebank\n\nPKU Multi-view Chinese Treebank, released by PKU-ICL. It contains the sentences "
  },
  {
    "path": "docs/api/hanlp/datasets/eos/eos.md",
    "chars": 128,
    "preview": "# eos\n\n```{eval-rst}\n.. currentmodule:: hanlp.datasets.eos.eos\n\n.. autoclass:: SentenceBoundaryDetectionDataset\n\t:member"
  },
  {
    "path": "docs/api/hanlp/datasets/eos/index.md",
    "chars": 78,
    "preview": "# eos\n\nSentence boundary detection datasets.\n\n```{toctree}\neos\nresources\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/eos/resources.md",
    "chars": 107,
    "preview": "# resources\n\n## nn_eos\n\n```{eval-rst}\n\n.. automodule:: hanlp.datasets.eos.loaders.nn_eos\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/datasets/index.md",
    "chars": 839,
    "preview": "# datasets\n\n```{eval-rst}\nNLP datasets grouped by tasks. For each task, we provide at least one ``torch.utils.data.Datas"
  },
  {
    "path": "docs/api/hanlp/datasets/ner/index.md",
    "chars": 59,
    "preview": "# ner\n\nNER datasets.\n\n```{toctree}\ntsv\njson\nresources\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/ner/json.md",
    "chars": 124,
    "preview": "# json\n\n```{eval-rst}\n.. currentmodule:: hanlp.datasets.ner.loaders.json_ner\n\n.. autoclass:: JsonNERDataset\n\t:members:\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/ner/resources.md",
    "chars": 914,
    "preview": "# resources\n\n## CoNLL 2003\n\n```{eval-rst}\n\n.. automodule:: hanlp.datasets.ner.conll03\n    :members:\n\n```\n\n## MSRA\n\n```{e"
  },
  {
    "path": "docs/api/hanlp/datasets/ner/tsv.md",
    "chars": 121,
    "preview": "# tsv\n\n```{eval-rst}\n.. currentmodule:: hanlp.datasets.ner.loaders.tsv\n\n.. autoclass:: TSVTaggingDataset\n\t:members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/datasets/pos/index.md",
    "chars": 181,
    "preview": "# pos\n\nPoS datasets. \n\n```{eval-rst}\nPoS is a normal tagging task which uses :class:`hanlp.datasets.ner.loaders.tsv.TSVT"
  },
  {
    "path": "docs/api/hanlp/datasets/pos/resources.md",
    "chars": 493,
    "preview": "# resources\n\n## CTB5\n\n```{eval-rst}\n\n.. automodule:: hanlp.datasets.pos.ctb5\n    :members:\n\n```\n\n## CTB8\n\n```{eval-rst}\n"
  },
  {
    "path": "docs/api/hanlp/datasets/srl/conll2012_dataset.md",
    "chars": 124,
    "preview": "# conll2012_dataset\n\n```{eval-rst}\n\n.. autoclass:: hanlp.datasets.srl.loaders.conll2012.CoNLL2012SRLDataset\n\t:members:\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/srl/index.md",
    "chars": 87,
    "preview": "# srl\n\nSemantic Role Labeling datasets.\n\n```{toctree}\nconll2012_dataset\nresources\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/srl/resources.md",
    "chars": 356,
    "preview": "# resources\n\n## OntoNotes 5\n\n### Chinese\n\n```{eval-rst}\n\n.. autodata:: hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_"
  },
  {
    "path": "docs/api/hanlp/datasets/tok/index.md",
    "chars": 76,
    "preview": "# tok\n\nTokenization datasets.\n\n```{toctree}\ntxt\nmcws_dataset\nresources\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/datasets/tok/mcws_dataset.md",
    "chars": 184,
    "preview": "# mcws_dataset\n\n```{eval-rst}\n.. currentmodule:: hanlp.datasets.tokenization.loaders.multi_criteria_cws.mcws_dataset\n\n.."
  },
  {
    "path": "docs/api/hanlp/datasets/tok/resources.md",
    "chars": 996,
    "preview": "# resources\n\n## sighan2005\n\n[The Second International Chinese Word Segmentation Bakeoff](http://sighan.cs.uchicago.edu/b"
  },
  {
    "path": "docs/api/hanlp/datasets/tok/txt.md",
    "chars": 134,
    "preview": "# txt\n\n```{eval-rst}\n.. currentmodule:: hanlp.datasets.tokenization.loaders.txt\n\n.. autoclass:: TextTokenizingDataset\n\t:"
  },
  {
    "path": "docs/api/hanlp/hanlp.rst",
    "chars": 109,
    "preview": ".. _api/main:\n\nhanlp\n==========\n\n.. currentmodule:: hanlp\n\n.. autofunction:: load\n\n.. autofunction:: pipeline"
  },
  {
    "path": "docs/api/hanlp/index.md",
    "chars": 142,
    "preview": "# hanlp\n\nCore APIs for `hanlp`.\n\n```{toctree}\nhanlp\ncommon/index\ncomponents/index\npretrained/index\ndatasets/index\nutils/"
  },
  {
    "path": "docs/api/hanlp/layers/decoders/biaffine_ner.md",
    "chars": 154,
    "preview": "# biaffine_ner\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.components.ner.biaffine_ner.biaffine_ner_model.BiaffineNamedEntityR"
  },
  {
    "path": "docs/api/hanlp/layers/decoders/index.md",
    "chars": 54,
    "preview": "# decoders\n\n```{toctree}\nlinear_crf\nbiaffine_ner\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/layers/decoders/linear_crf.md",
    "chars": 109,
    "preview": "# linear_crf\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.components.mtl.tasks.pos.LinearCRFDecoder\n\t:members:\n\n```\n"
  },
  {
    "path": "docs/api/hanlp/layers/embeddings/char_cnn.md",
    "chars": 177,
    "preview": "# char_cnn\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.layers.embeddings.char_cnn.CharCNN\n\t:members:\n\n.. autoclass:: hanlp.lay"
  },
  {
    "path": "docs/api/hanlp/layers/embeddings/char_rnn.md",
    "chars": 177,
    "preview": "# char_rnn\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.layers.embeddings.char_rnn.CharRNN\n\t:members:\n\n.. autoclass:: hanlp.lay"
  },
  {
    "path": "docs/api/hanlp/layers/embeddings/embedding.md",
    "chars": 257,
    "preview": "# embedding\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.layers.embeddings.embedding.Embedding\n\t:members:\n\n.. autoclass:: hanlp"
  },
  {
    "path": "docs/api/hanlp/layers/embeddings/fasttext.md",
    "chars": 195,
    "preview": "# fasttext\n\n```{eval-rst}\n\n.. autoclass:: hanlp.layers.embeddings.fast_text.FastTextEmbedding\n\t:members:\n\n.. autoclass::"
  },
  {
    "path": "docs/api/hanlp/layers/embeddings/index.md",
    "chars": 90,
    "preview": "# embeddings\n\n```{toctree}\nembedding\nword2vec\nfasttext\nchar_cnn\nchar_rnn\ntransformer\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/layers/embeddings/transformer.md",
    "chars": 243,
    "preview": "# transformer\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.layers.embeddings.contextual_word_embedding.ContextualWordEmbedding\n"
  },
  {
    "path": "docs/api/hanlp/layers/embeddings/word2vec.md",
    "chars": 193,
    "preview": "# word2vec\n\n```{eval-rst}\n\n.. autoclass:: hanlp.layers.embeddings.word2vec.Word2VecEmbedding\n\t:members:\n\n.. autoclass:: "
  },
  {
    "path": "docs/api/hanlp/layers/index.md",
    "chars": 79,
    "preview": "# layers\n\n```{toctree}\nembeddings/index\ntransformers/index\ndecoders/index\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/layers/transformers/encoder.md",
    "chars": 111,
    "preview": "# encoder\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.layers.transformers.encoder.TransformerEncoder\n\t:members:\n\n```\n"
  },
  {
    "path": "docs/api/hanlp/layers/transformers/index.md",
    "chars": 52,
    "preview": "# transformers\n\n```{toctree}\nencoder\ntokenizer\n```\n\n"
  },
  {
    "path": "docs/api/hanlp/layers/transformers/tokenizer.md",
    "chars": 127,
    "preview": "# tokenizer\n\n\n```{eval-rst}\n\n.. autoclass:: hanlp.transform.transformer_tokenizer.TransformerSequenceTokenizer\n\t:members"
  },
  {
    "path": "docs/api/hanlp/pretrained/amr.md",
    "chars": 1037,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/amr2text.md",
    "chars": 1326,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/constituency.md",
    "chars": 1308,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/dep.md",
    "chars": 77,
    "preview": "# dep\n\n```{eval-rst}\n\n.. automodule:: hanlp.pretrained.dep\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/pretrained/eos.md",
    "chars": 78,
    "preview": "# eos\n\n\n```{eval-rst}\n\n.. automodule:: hanlp.pretrained.eos\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/pretrained/fasttext.md",
    "chars": 87,
    "preview": "# fasttext\n\n```{eval-rst}\n\n.. automodule:: hanlp.pretrained.fasttext\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/pretrained/glove.md",
    "chars": 81,
    "preview": "# glove\n\n```{eval-rst}\n\n.. automodule:: hanlp.pretrained.glove\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/pretrained/index.md",
    "chars": 400,
    "preview": "# pretrained\n\n```{eval-rst}\nNLP components grouped by tasks. For each task, we provide at least one :class:`~hanlp.commo"
  },
  {
    "path": "docs/api/hanlp/pretrained/mlm.md",
    "chars": 1363,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/mtl.md",
    "chars": 77,
    "preview": "# mtl\n\n```{eval-rst}\n\n.. automodule:: hanlp.pretrained.mtl\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/pretrained/ner.md",
    "chars": 77,
    "preview": "# ner\n\n```{eval-rst}\n\n.. automodule:: hanlp.pretrained.ner\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/pretrained/pos.md",
    "chars": 1060,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/sdp.md",
    "chars": 77,
    "preview": "# sdp\n\n```{eval-rst}\n\n.. automodule:: hanlp.pretrained.sdp\n    :members:\n\n```"
  },
  {
    "path": "docs/api/hanlp/pretrained/srl.md",
    "chars": 898,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/sts.md",
    "chars": 821,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/tok.md",
    "chars": 1025,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/pretrained/word2vec.md",
    "chars": 2028,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/api/hanlp/utils/index.md",
    "chars": 46,
    "preview": "# utils\n\nUtilities.\n\n```{toctree}\nio_util\n```\n"
  },
  {
    "path": "docs/api/hanlp/utils/io_util.md",
    "chars": 110,
    "preview": "# io_util\n\n```{eval-rst}\n\n.. currentmodule:: hanlp.utils\n\n.. automodule:: hanlp.utils.io_util\n\t:members:\n\n```\n"
  },
  {
    "path": "docs/api/restful.rst",
    "chars": 201,
    "preview": ".. _api/hanlp_restful:\n\nhanlp_restful\n====================\n\n.. currentmodule:: hanlp_restful\n\n.. autoclass:: HanLPClient"
  },
  {
    "path": "docs/api/restful_golang.md",
    "chars": 836,
    "preview": "# Golang RESTful API\n\n## Install\n\n```shell script\ngo get -u github.com/hankcs/gohanlp@main\n```\n\n## Quick Start \n\nObtain "
  },
  {
    "path": "docs/api/restful_java.md",
    "chars": 879,
    "preview": "# Java RESTful API\n\nAdd the following dependency into the `pom.xml` file of your project. \n\n```xml\n<dependency>\n  <group"
  },
  {
    "path": "docs/api/trie/dictionary.md",
    "chars": 183,
    "preview": "# dictionary\n\n```{eval-rst}\n.. currentmodule:: hanlp_trie\n\n.. autoclass:: hanlp_trie.dictionary.DictInterface\n\t:members:"
  },
  {
    "path": "docs/api/trie/index.md",
    "chars": 113,
    "preview": "# hanlp_trie\n\nHanLP trie/dictionary interface and referential implementation.\n\n```{toctree}\ntrie\ndictionary\n```\n\n"
  },
  {
    "path": "docs/api/trie/trie.md",
    "chars": 152,
    "preview": "# trie\n\n```{eval-rst}\n.. currentmodule:: hanlp_trie\n\n.. autoclass:: hanlp_trie.trie.Node\n\t:members:\n\n.. autoclass:: hanl"
  },
  {
    "path": "docs/conf.py",
    "chars": 4781,
    "preview": "# -- Project information -----------------------------------------------------\nimport sys\nimport os\nfrom datetime import"
  },
  {
    "path": "docs/configure.md",
    "chars": 3115,
    "preview": "# Configuration\n\n## Customize ``HANLP_HOME``\n\nAll resources HanLP use will be cached into a directory called `HANLP_HOME"
  },
  {
    "path": "docs/contributing.md",
    "chars": 1603,
    "preview": "# Contributing Guide\n\nThank you for being interested in contributing to `HanLP`! You\nare awesome ✨.\n\nThis guideline cont"
  },
  {
    "path": "docs/data_format.md",
    "chars": 5097,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "docs/index.md",
    "chars": 1885,
    "preview": "# HanLP: Han Language Processing\n\n[![GitHub stars](https://img.shields.io/github/stars/hankcs/HanLP)](https://github.com"
  },
  {
    "path": "docs/install.md",
    "chars": 5206,
    "preview": "# Install\n\n```{figure} _static/install-versions.svg\n---\nwidth: 100%\nfigclass: caption\nalt: HanLP versions\nname: hanlp-ve"
  },
  {
    "path": "docs/references.bib",
    "chars": 43290,
    "preview": "%% This BibTeX bibliography file was created using BibDesk.\n%% https://bibdesk.sourceforge.io/\n\n%% Created for hankcs at"
  },
  {
    "path": "docs/references.rst",
    "chars": 92,
    "preview": "References\n==================\n\n.. bibliography:: references.bib\n\t:cited:\n\t:style: astrostyle"
  },
  {
    "path": "docs/tutorial.md",
    "chars": 5030,
    "preview": "---\njupytext:\n  formats: ipynb,md:myst\n  text_representation:\n    extension: .md\n    format_name: myst\n    format_versio"
  },
  {
    "path": "hanlp/__init__.py",
    "chars": 2073,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-06-13 18:05\nimport hanlp.common\nimport hanlp.components\nimport hanl"
  },
  {
    "path": "hanlp/callbacks/__init__.py",
    "chars": 64,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-12-05 02:10"
  },
  {
    "path": "hanlp/callbacks/fine_csv_logger.py",
    "chars": 2613,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-12-05 02:12\nimport copy\nfrom io import TextIOWrapper\nfrom typing im"
  },
  {
    "path": "hanlp/common/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-08-26 14:45\n"
  },
  {
    "path": "hanlp/common/component.py",
    "chars": 1004,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-08-26 14:45\nimport inspect\nfrom abc import ABC, abstractmethod\nfrom"
  },
  {
    "path": "hanlp/common/dataset.py",
    "chars": 34211,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2020-05-09 20:27\nimport math\nimport os\nimport random\nimport tempfile\nimp"
  },
  {
    "path": "hanlp/common/keras_component.py",
    "chars": 25197,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-08-26 14:45\nimport logging\nimport math\nimport os\nimport sys\nfrom ab"
  },
  {
    "path": "hanlp/common/structure.py",
    "chars": 2454,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-08-26 14:58\nfrom typing import Dict\n\nfrom hanlp_common.configurable"
  },
  {
    "path": "hanlp/common/torch_component.py",
    "chars": 26624,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2020-05-08 21:20\nimport logging\nimport os\nimport re\nimport time\nfrom abc"
  },
  {
    "path": "hanlp/common/transform.py",
    "chars": 15420,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2020-05-03 14:44\nimport logging\nimport os\nfrom abc import ABC, abstractm"
  },
  {
    "path": "hanlp/common/transform_tf.py",
    "chars": 11225,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-10-27 14:22\nimport inspect\nfrom abc import ABC, abstractmethod\nfrom"
  },
  {
    "path": "hanlp/common/vocab.py",
    "chars": 16012,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-06-13 22:42\nfrom collections import Counter\nfrom typing import List"
  },
  {
    "path": "hanlp/common/vocab_tf.py",
    "chars": 8615,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-06-13 22:42\nfrom typing import List, Dict, Union, Iterable\n\nfrom ha"
  },
  {
    "path": "hanlp/components/__init__.py",
    "chars": 95,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2019-08-26 16:10\nfrom .pipeline import Pipeline"
  },
  {
    "path": "hanlp/components/amr/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2020-08-20 17:35\n"
  },
  {
    "path": "hanlp/components/amr/amrbart/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2022-12-05 17:53\n"
  },
  {
    "path": "hanlp/components/amr/amrbart/bart_amr_generation.py",
    "chars": 5619,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2022-12-05 17:56\nimport logging\nimport os.path\nfrom typing import Callab"
  },
  {
    "path": "hanlp/components/amr/amrbart/bart_amr_parser.py",
    "chars": 8970,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2022-12-05 17:56\nimport logging\nimport os.path\nfrom typing import Callab"
  },
  {
    "path": "hanlp/components/amr/amrbart/common/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2022-12-05 17:53\n"
  },
  {
    "path": "hanlp/components/amr/amrbart/common/constant.py",
    "chars": 54540,
    "preview": "# coding:utf-8\n# MIT License\n#\n# Copyright (c) 2022 xfbai\n#\n# Permission is hereby granted, free of charge, to any perso"
  },
  {
    "path": "hanlp/components/amr/amrbart/common/penman_interface.py",
    "chars": 2502,
    "preview": "# coding:utf-8\n# MIT License\n#\n# Copyright (c) 2022 xfbai\n#\n# Permission is hereby granted, free of charge, to any perso"
  },
  {
    "path": "hanlp/components/amr/amrbart/common/postprocessing.py",
    "chars": 17468,
    "preview": "# coding:utf-8\n# MIT License\n#\n# Copyright (c) 2022 xfbai\n#\n# Permission is hereby granted, free of charge, to any perso"
  },
  {
    "path": "hanlp/components/amr/amrbart/data_interface/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2022-12-07 14:36\n"
  },
  {
    "path": "hanlp/components/amr/amrbart/data_interface/dataset.py",
    "chars": 3300,
    "preview": "# coding:utf-8\n# MIT License\n#\n# Copyright (c) 2022 xfbai\n#\n# Permission is hereby granted, free of charge, to any perso"
  },
  {
    "path": "hanlp/components/amr/amrbart/model_interface/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2022-12-03 20:33\n"
  },
  {
    "path": "hanlp/components/amr/amrbart/model_interface/modeling_bart.py",
    "chars": 88401,
    "preview": "# coding=utf-8\n# Copyright 2021 The Fairseq Authors and The HuggingFace Inc. team. All rights reserved.\n#\n# Licensed und"
  },
  {
    "path": "hanlp/components/amr/amrbart/model_interface/tokenization_bart.py",
    "chars": 18599,
    "preview": "# coding:utf-8\n# this is a simplified version of \"https://github.com/SapienzaNLP/spring/blob/main/spring_amr/tokenizatio"
  },
  {
    "path": "hanlp/components/amr/amrbart/preprocess/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2022-12-03 20:33\n"
  },
  {
    "path": "hanlp/components/amr/amrbart/preprocess/amr_io.py",
    "chars": 2450,
    "preview": "# coding:utf-8\n# the code is migrated from https://github.com/SapienzaNLP/spring \n# MIT License\n#\n# Copyright (c) 2022 x"
  },
  {
    "path": "hanlp/components/amr/amrbart/preprocess/penman_interface.py",
    "chars": 2502,
    "preview": "# coding:utf-8\n# MIT License\n#\n# Copyright (c) 2022 xfbai\n#\n# Permission is hereby granted, free of charge, to any perso"
  },
  {
    "path": "hanlp/components/amr/amrbart/preprocess/read_and_process.py",
    "chars": 4808,
    "preview": "# coding:utf-8\n# MIT License\n#\n# Copyright (c) 2022 xfbai\n#\n# Permission is hereby granted, free of charge, to any perso"
  },
  {
    "path": "hanlp/components/amr/seq2seq/__init__.py",
    "chars": 65,
    "preview": "# -*- coding:utf-8 -*-\n# Author: hankcs\n# Date: 2021-04-27 19:24\n"
  },
  {
    "path": "hanlp/components/amr/seq2seq/dataset/IO.py",
    "chars": 995,
    "preview": "import glob\nfrom typing import List, Union, Iterable\nfrom pathlib import Path\nfrom .penman import pm_load as pm_load\n\n\nd"
  }
]

// ... and 497 more files (download for full content)

About this extraction

This page contains the full source code of the hankcs/HanLP GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 697 files (3.2 MB), approximately 879.0k tokens, and a symbol index with 3347 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
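The file index above is a JSON array of objects with `path`, `chars`, and `preview` keys. A minimal sketch of consuming that index, assuming you have parsed it out of the downloaded .txt (the two embedded sample entries below mirror real entries from the listing; a real consumer would `json.load()` the index section instead):

```python
import json

# Sample mirroring the dump's index format: each entry records the file
# path, its size in characters, and a truncated preview string.
index_json = """
[
  {"path": "docs/api/hanlp/pretrained/tok.md", "chars": 1025,
   "preview": "---\\njupytext:\\n  formats: ipynb,md:myst"},
  {"path": "hanlp/common/vocab.py", "chars": 16012,
   "preview": "# -*- coding:utf-8 -*-"}
]
"""
entries = json.loads(index_json)

# Example query: keep only Python sources larger than 10,000 characters.
big_py = [e["path"] for e in entries
          if e["path"].endswith(".py") and e["chars"] > 10_000]
print(big_py)  # → ['hanlp/common/vocab.py']
```

Filtering on `path` and `chars` this way lets you trim the dump to just the files relevant to your task before feeding it to a model.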

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
