Repository: stanfordnlp/stanza
Branch: main
Commit: 516b07140fdf
Files: 579
Total size: 3.8 MB

Directory structure:
gitextract_z9hqe0ws/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   ├── feature_request.md
│   │   └── question.md
│   ├── pull_request_template.md
│   ├── stale.yml
│   └── workflows/
│       └── stanza-tests.yaml
├── .gitignore
├── .travis.yml
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── demo/
│   ├── CONLL_Dependency_Visualizer_Example.ipynb
│   ├── Dependency_Visualization_Testing.ipynb
│   ├── NER_Visualization.ipynb
│   ├── Stanza_Beginners_Guide.ipynb
│   ├── Stanza_CoreNLP_Interface.ipynb
│   ├── arabic_test.conllu.txt
│   ├── corenlp.py
│   ├── en_test.conllu.txt
│   ├── japanese_test.conllu.txt
│   ├── pipeline_demo.py
│   ├── scenegraph.py
│   ├── semgrex visualization.ipynb
│   ├── semgrex.py
│   └── ssurgeon_script.txt
├── doc/
│   └── CoreNLP.proto
├── scripts/
│   ├── config.sh
│   └── download_vectors.sh
├── setup.py
└── stanza/
    ├── __init__.py
    ├── _version.py
    ├── models/
    │   ├── __init__.py
    │   ├── _training_logging.py
    │   ├── charlm.py
    │   ├── classifier.py
    │   ├── classifiers/
    │   │   ├── __init__.py
    │   │   ├── base_classifier.py
    │   │   ├── cnn_classifier.py
    │   │   ├── config.py
    │   │   ├── constituency_classifier.py
    │   │   ├── data.py
    │   │   ├── iterate_test.py
    │   │   ├── trainer.py
    │   │   └── utils.py
    │   ├── common/
    │   │   ├── __init__.py
    │   │   ├── beam.py
    │   │   ├── bert_embedding.py
    │   │   ├── biaffine.py
    │   │   ├── build_short_name_to_treebank.py
    │   │   ├── char_model.py
    │   │   ├── chuliu_edmonds.py
    │   │   ├── constant.py
    │   │   ├── convert_pretrain.py
    │   │   ├── count_ner_coverage.py
    │   │   ├── count_pretrain_coverage.py
    │   │   ├── crf.py
    │   │   ├── data.py
    │   │   ├── doc.py
    │   │   ├── dropout.py
    │   │   ├── exceptions.py
    │   │   ├── foundation_cache.py
    │   │   ├── hlstm.py
    │   │   ├── large_margin_loss.py
    │   │   ├── loss.py
    │   │   ├── maxout_linear.py
    │   │   ├── packed_lstm.py
    │   │   ├── peft_config.py
    │   │   ├── pretrain.py
    │   │   ├── relative_attn.py
    │   │   ├── seq2seq_constant.py
    │   │   ├── seq2seq_model.py
    │   │   ├── seq2seq_modules.py
    │   │   ├── seq2seq_utils.py
    │   │   ├── short_name_to_treebank.py
    │   │   ├── stanza_object.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── constituency/
    │   │   ├── __init__.py
    │   │   ├── base_model.py
    │   │   ├── base_trainer.py
    │   │   ├── dynamic_oracle.py
    │   │   ├── ensemble.py
    │   │   ├── error_analysis_in_order.py
    │   │   ├── evaluate_treebanks.py
    │   │   ├── in_order_compound_oracle.py
    │   │   ├── in_order_oracle.py
    │   │   ├── label_attention.py
    │   │   ├── lstm_model.py
    │   │   ├── lstm_tree_stack.py
    │   │   ├── parse_transitions.py
    │   │   ├── parse_tree.py
    │   │   ├── parser_training.py
    │   │   ├── partitioned_transformer.py
    │   │   ├── positional_encoding.py
    │   │   ├── retagging.py
    │   │   ├── score_converted_dependencies.py
    │   │   ├── state.py
    │   │   ├── text_processing.py
    │   │   ├── top_down_oracle.py
    │   │   ├── trainer.py
    │   │   ├── transformer_tree_stack.py
    │   │   ├── transition_sequence.py
    │   │   ├── tree_embedding.py
    │   │   ├── tree_reader.py
    │   │   ├── tree_stack.py
    │   │   └── utils.py
    │   ├── constituency_parser.py
    │   ├── coref/
    │   │   ├── __init__.py
    │   │   ├── anaphoricity_scorer.py
    │   │   ├── bert.py
    │   │   ├── cluster_checker.py
    │   │   ├── config.py
    │   │   ├── conll.py
    │   │   ├── const.py
    │   │   ├── coref_chain.py
    │   │   ├── coref_config.toml
    │   │   ├── dataset.py
    │   │   ├── loss.py
    │   │   ├── model.py
    │   │   ├── pairwise_encoder.py
    │   │   ├── predict.py
    │   │   ├── rough_scorer.py
    │   │   ├── span_predictor.py
    │   │   ├── tokenizer_customization.py
    │   │   ├── utils.py
    │   │   └── word_encoder.py
    │   ├── depparse/
    │   │   ├── __init__.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── scorer.py
    │   │   └── trainer.py
    │   ├── identity_lemmatizer.py
    │   ├── lang_identifier.py
    │   ├── langid/
    │   │   ├── __init__.py
    │   │   ├── create_ud_data.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   └── trainer.py
    │   ├── lemma/
    │   │   ├── __init__.py
    │   │   ├── attach_lemma_classifier.py
    │   │   ├── data.py
    │   │   ├── edit.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   └── vocab.py
    │   ├── lemma_classifier/
    │   │   ├── __init__.py
    │   │   ├── base_model.py
    │   │   ├── base_trainer.py
    │   │   ├── baseline_model.py
    │   │   ├── constants.py
    │   │   ├── evaluate_many.py
    │   │   ├── evaluate_models.py
    │   │   ├── lstm_model.py
    │   │   ├── prepare_dataset.py
    │   │   ├── train_lstm_model.py
    │   │   ├── train_many.py
    │   │   ├── train_transformer_model.py
    │   │   ├── transformer_model.py
    │   │   └── utils.py
    │   ├── lemmatizer.py
    │   ├── mwt/
    │   │   ├── __init__.py
    │   │   ├── character_classifier.py
    │   │   ├── data.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── mwt_expander.py
    │   ├── ner/
    │   │   ├── __init__.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── ner_tagger.py
    │   ├── parser.py
    │   ├── pos/
    │   │   ├── __init__.py
    │   │   ├── build_xpos_vocab_factory.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   ├── vocab.py
    │   │   ├── xpos_vocab_factory.py
    │   │   └── xpos_vocab_utils.py
    │   ├── tagger.py
    │   ├── tokenization/
    │   │   ├── __init__.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── tokenize_files.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── tokenizer.py
    │   └── wl_coref.py
    ├── pipeline/
    │   ├── __init__.py
    │   ├── _constants.py
    │   ├── constituency_processor.py
    │   ├── core.py
    │   ├── coref_processor.py
    │   ├── demo/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── demo_server.py
    │   │   ├── stanza-brat.css
    │   │   ├── stanza-brat.html
    │   │   ├── stanza-brat.js
    │   │   └── stanza-parseviewer.js
    │   ├── depparse_processor.py
    │   ├── external/
    │   │   ├── __init__.py
    │   │   ├── corenlp_converter_depparse.py
    │   │   ├── jieba.py
    │   │   ├── pythainlp.py
    │   │   ├── spacy.py
    │   │   └── sudachipy.py
    │   ├── langid_processor.py
    │   ├── lemma_processor.py
    │   ├── morphseg_processor.py
    │   ├── multilingual.py
    │   ├── mwt_processor.py
    │   ├── ner_processor.py
    │   ├── pos_processor.py
    │   ├── processor.py
    │   ├── registry.py
    │   ├── sentiment_processor.py
    │   └── tokenize_processor.py
    ├── protobuf/
    │   ├── CoreNLP_pb2.py
    │   └── __init__.py
    ├── resources/
    │   ├── __init__.py
    │   ├── common.py
    │   ├── default_packages.py
    │   ├── installation.py
    │   ├── prepare_resources.py
    │   └── print_charlm_depparse.py
    ├── server/
    │   ├── __init__.py
    │   ├── annotator.py
    │   ├── client.py
    │   ├── dependency_converter.py
    │   ├── java_protobuf_requests.py
    │   ├── main.py
    │   ├── morphology.py
    │   ├── parser_eval.py
    │   ├── semgrex.py
    │   ├── ssurgeon.py
    │   ├── tokensregex.py
    │   ├── tsurgeon.py
    │   └── ud_enhancer.py
    ├── tests/
    │   ├── __init__.py
    │   ├── classifiers/
    │   │   ├── __init__.py
    │   │   ├── test_classifier.py
    │   │   ├── test_constituency_classifier.py
    │   │   ├── test_data.py
    │   │   └── test_process_utils.py
    │   ├── common/
    │   │   ├── __init__.py
    │   │   ├── test_bert_embedding.py
    │   │   ├── test_char_model.py
    │   │   ├── test_chuliu_edmonds.py
    │   │   ├── test_common_data.py
    │   │   ├── test_confusion.py
    │   │   ├── test_constant.py
    │   │   ├── test_data_conversion.py
    │   │   ├── test_data_objects.py
    │   │   ├── test_doc.py
    │   │   ├── test_dropout.py
    │   │   ├── test_foundation_cache.py
    │   │   ├── test_pretrain.py
    │   │   ├── test_relative_attn.py
    │   │   ├── test_short_name_to_treebank.py
    │   │   └── test_utils.py
    │   ├── constituency/
    │   │   ├── __init__.py
    │   │   ├── test_convert_arboretum.py
    │   │   ├── test_convert_it_vit.py
    │   │   ├── test_convert_starlang.py
    │   │   ├── test_ensemble.py
    │   │   ├── test_in_order_compound_oracle.py
    │   │   ├── test_in_order_oracle.py
    │   │   ├── test_lstm_model.py
    │   │   ├── test_parse_transitions.py
    │   │   ├── test_parse_tree.py
    │   │   ├── test_positional_encoding.py
    │   │   ├── test_selftrain_vi_quad.py
    │   │   ├── test_text_processing.py
    │   │   ├── test_top_down_oracle.py
    │   │   ├── test_trainer.py
    │   │   ├── test_transformer_tree_stack.py
    │   │   ├── test_transition_sequence.py
    │   │   ├── test_tree_reader.py
    │   │   ├── test_tree_stack.py
    │   │   ├── test_utils.py
    │   │   └── test_vietnamese.py
    │   ├── datasets/
    │   │   ├── __init__.py
    │   │   ├── coref/
    │   │   │   ├── __init__.py
    │   │   │   └── test_hebrew_iahlt.py
    │   │   ├── ner/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_prepare_ner_file.py
    │   │   │   └── test_utils.py
    │   │   ├── test_common.py
    │   │   └── test_vietnamese_renormalization.py
    │   ├── depparse/
    │   │   ├── __init__.py
    │   │   ├── test_depparse_data.py
    │   │   └── test_parser.py
    │   ├── langid/
    │   │   ├── __init__.py
    │   │   ├── test_langid.py
    │   │   └── test_multilingual.py
    │   ├── lemma/
    │   │   ├── __init__.py
    │   │   ├── test_data.py
    │   │   ├── test_lemma_trainer.py
    │   │   └── test_lowercase.py
    │   ├── lemma_classifier/
    │   │   ├── __init__.py
    │   │   ├── test_data_preparation.py
    │   │   └── test_training.py
    │   ├── morphseg/
    │   │   ├── __init__.py
    │   │   ├── conftest.py
    │   │   ├── test_integration.py
    │   │   ├── test_morpheme_segmenter.py
    │   │   └── test_stanza_integration.py
    │   ├── mwt/
    │   │   ├── __init__.py
    │   │   ├── test_character_classifier.py
    │   │   ├── test_english_corner_cases.py
    │   │   ├── test_prepare_mwt.py
    │   │   └── test_utils.py
    │   ├── ner/
    │   │   ├── __init__.py
    │   │   ├── test_bsf_2_beios.py
    │   │   ├── test_bsf_2_iob.py
    │   │   ├── test_combine_ner_datasets.py
    │   │   ├── test_convert_amt.py
    │   │   ├── test_convert_nkjp.py
    │   │   ├── test_convert_starlang_ner.py
    │   │   ├── test_data.py
    │   │   ├── test_from_conllu.py
    │   │   ├── test_models_ner_scorer.py
    │   │   ├── test_ner_tagger.py
    │   │   ├── test_ner_trainer.py
    │   │   ├── test_ner_training.py
    │   │   ├── test_ner_utils.py
    │   │   ├── test_pay_amt_annotators.py
    │   │   ├── test_split_wikiner.py
    │   │   └── test_suc3.py
    │   ├── pipeline/
    │   │   ├── __init__.py
    │   │   ├── pipeline_device_tests.py
    │   │   ├── test_arabic_pipeline.py
    │   │   ├── test_core.py
    │   │   ├── test_decorators.py
    │   │   ├── test_depparse.py
    │   │   ├── test_english_pipeline.py
    │   │   ├── test_french_pipeline.py
    │   │   ├── test_lemmatizer.py
    │   │   ├── test_pipeline_constituency_processor.py
    │   │   ├── test_pipeline_depparse_processor.py
    │   │   ├── test_pipeline_mwt_expander.py
    │   │   ├── test_pipeline_ner_processor.py
    │   │   ├── test_pipeline_pos_processor.py
    │   │   ├── test_pipeline_sentiment_processor.py
    │   │   ├── test_requirements.py
    │   │   └── test_tokenizer.py
    │   ├── pos/
    │   │   ├── __init__.py
    │   │   ├── test_data.py
    │   │   ├── test_tagger.py
    │   │   └── test_xpos_vocab_factory.py
    │   ├── pytest.ini
    │   ├── resources/
    │   │   ├── __init__.py
    │   │   ├── test_charlm_depparse.py
    │   │   ├── test_common.py
    │   │   ├── test_default_packages.py
    │   │   ├── test_installation.py
    │   │   └── test_prepare_resources.py
    │   ├── server/
    │   │   ├── __init__.py
    │   │   ├── test_client.py
    │   │   ├── test_java_protobuf_requests.py
    │   │   ├── test_morphology.py
    │   │   ├── test_parser_eval.py
    │   │   ├── test_protobuf.py
    │   │   ├── test_semgrex.py
    │   │   ├── test_server_misc.py
    │   │   ├── test_server_pretokenized.py
    │   │   ├── test_server_request.py
    │   │   ├── test_server_start.py
    │   │   ├── test_ssurgeon.py
    │   │   ├── test_tokensregex.py
    │   │   ├── test_tsurgeon.py
    │   │   └── test_ud_enhancer.py
    │   ├── setup.py
    │   └── tokenization/
    │       ├── __init__.py
    │       ├── test_prepare_tokenizer_treebank.py
    │       ├── test_replace_long_tokens.py
    │       ├── test_spaces.py
    │       ├── test_tokenization_lst20.py
    │       ├── test_tokenization_orchid.py
    │       ├── test_tokenize_data.py
    │       ├── test_tokenize_files.py
    │       ├── test_tokenize_utils.py
    │       └── test_vocab.py
    └── utils/
        ├── __init__.py
        ├── avg_sent_len.py
        ├── charlm/
        │   ├── __init__.py
        │   ├── conll17_to_text.py
        │   ├── dump_oscar.py
        │   ├── make_lm_data.py
        │   └── oscar_to_text.py
        ├── confusion.py
        ├── conll.py
        ├── constituency/
        │   ├── __init__.py
        │   ├── check_transitions.py
        │   ├── grep_dev_logs.py
        │   ├── grep_test_logs.py
        │   └── list_tensors.py
        ├── datasets/
        │   ├── __init__.py
        │   ├── common.py
        │   ├── conllu_to_text.py
        │   ├── constituency/
        │   │   ├── __init__.py
        │   │   ├── build_silver_dataset.py
        │   │   ├── common_trees.py
        │   │   ├── convert_alt.py
        │   │   ├── convert_arboretum.py
        │   │   ├── convert_cintil.py
        │   │   ├── convert_ctb.py
        │   │   ├── convert_icepahc.py
        │   │   ├── convert_it_turin.py
        │   │   ├── convert_it_vit.py
        │   │   ├── convert_spmrl.py
        │   │   ├── convert_starlang.py
        │   │   ├── count_common_words.py
        │   │   ├── extract_all_silver_dataset.py
        │   │   ├── extract_silver_dataset.py
        │   │   ├── prepare_con_dataset.py
        │   │   ├── reduce_dataset.py
        │   │   ├── relabel_tags.py
        │   │   ├── selftrain.py
        │   │   ├── selftrain_it.py
        │   │   ├── selftrain_single_file.py
        │   │   ├── selftrain_vi_quad.py
        │   │   ├── selftrain_wiki.py
        │   │   ├── silver_variance.py
        │   │   ├── split_holdout.py
        │   │   ├── split_weighted_ensemble.py
        │   │   ├── tokenize_wiki.py
        │   │   ├── treebank_to_labeled_brackets.py
        │   │   ├── utils.py
        │   │   ├── vtb_convert.py
        │   │   └── vtb_split.py
        │   ├── contract_mwt.py
        │   ├── coref/
        │   │   ├── __init__.py
        │   │   ├── balance_languages.py
        │   │   ├── convert_hebrew_iahlt.py
        │   │   ├── convert_hebrew_mixed.py
        │   │   ├── convert_hindi.py
        │   │   ├── convert_ontonotes.py
        │   │   ├── convert_tamil.py
        │   │   ├── convert_udcoref.py
        │   │   ├── convert_udcoref_1.2.py
        │   │   └── utils.py
        │   ├── corenlp_segmenter_dataset.py
        │   ├── depparse/
        │   │   └── check_results.py
        │   ├── ner/
        │   │   ├── __init__.py
        │   │   ├── build_en_combined.py
        │   │   ├── check_for_duplicates.py
        │   │   ├── combine_ner_datasets.py
        │   │   ├── compare_entities.py
        │   │   ├── conll_to_iob.py
        │   │   ├── convert_amt.py
        │   │   ├── convert_ar_aqmar.py
        │   │   ├── convert_bn_daffodil.py
        │   │   ├── convert_bsf_to_beios.py
        │   │   ├── convert_bsnlp.py
        │   │   ├── convert_en_conll03.py
        │   │   ├── convert_fire_2013.py
        │   │   ├── convert_he_iahlt.py
        │   │   ├── convert_hy_armtdp.py
        │   │   ├── convert_ijc.py
        │   │   ├── convert_kk_kazNERD.py
        │   │   ├── convert_lst20.py
        │   │   ├── convert_mr_l3cube.py
        │   │   ├── convert_my_ucsy.py
        │   │   ├── convert_nkjp.py
        │   │   ├── convert_nner22.py
        │   │   ├── convert_nytk.py
        │   │   ├── convert_ontonotes.py
        │   │   ├── convert_rgai.py
        │   │   ├── convert_sindhi_siner.py
        │   │   ├── convert_starlang_ner.py
        │   │   ├── count_entities.py
        │   │   ├── json_to_bio.py
        │   │   ├── misc_to_date.py
        │   │   ├── ontonotes_multitag.py
        │   │   ├── prepare_ner_dataset.py
        │   │   ├── prepare_ner_file.py
        │   │   ├── preprocess_wikiner.py
        │   │   ├── simplify_en_worldwide.py
        │   │   ├── simplify_ontonotes_to_worldwide.py
        │   │   ├── split_wikiner.py
        │   │   ├── suc_conll_to_iob.py
        │   │   ├── suc_to_iob.py
        │   │   └── utils.py
        │   ├── pos/
        │   │   ├── __init__.py
        │   │   ├── convert_trees_to_pos.py
        │   │   └── remove_columns.py
        │   ├── prepare_depparse_treebank.py
        │   ├── prepare_lemma_classifier.py
        │   ├── prepare_lemma_treebank.py
        │   ├── prepare_mwt_treebank.py
        │   ├── prepare_pos_treebank.py
        │   ├── prepare_tokenizer_data.py
        │   ├── prepare_tokenizer_treebank.py
        │   ├── pretrain/
        │   │   ├── __init__.py
        │   │   └── word_in_pretrain.py
        │   ├── random_split_conllu.py
        │   ├── sentiment/
        │   │   ├── __init__.py
        │   │   ├── add_constituency.py
        │   │   ├── convert_italian_poetry_classification.py
        │   │   ├── convert_italian_sentence_classification.py
        │   │   ├── prepare_sentiment_dataset.py
        │   │   ├── process_MELD.py
        │   │   ├── process_airline.py
        │   │   ├── process_arguana_xml.py
        │   │   ├── process_corona.py
        │   │   ├── process_es_tass2020.py
        │   │   ├── process_it_sentipolc16.py
        │   │   ├── process_ren_chinese.py
        │   │   ├── process_sb10k.py
        │   │   ├── process_scare.py
        │   │   ├── process_slsd.py
        │   │   ├── process_sst.py
        │   │   ├── process_usage_german.py
        │   │   ├── process_utils.py
        │   │   └── process_vsfc_vietnamese.py
        │   ├── thai_syllable_dict_generator.py
        │   ├── tokenization/
        │   │   ├── __init__.py
        │   │   ├── convert_ml_cochin.py
        │   │   ├── convert_my_alt.py
        │   │   ├── convert_text_files.py
        │   │   ├── convert_th_best.py
        │   │   ├── convert_th_lst20.py
        │   │   ├── convert_th_orchid.py
        │   │   ├── convert_vi_vlsp.py
        │   │   └── process_thai_tokenization.py
        │   └── vietnamese/
        │       ├── __init__.py
        │       └── renormalize.py
        ├── default_paths.py
        ├── get_tqdm.py
        ├── helper_func.py
        ├── languages/
        │   ├── __init__.py
        │   └── kazakh_transliteration.py
        ├── lemma/
        │   ├── __init__.py
        │   └── count_ambiguous_lemmas.py
        ├── max_mwt_length.py
        ├── ner/
        │   ├── __init__.py
        │   ├── flair_ner_tag_dataset.py
        │   ├── paying_annotators.py
        │   └── spacy_ner_tag_dataset.py
        ├── pretrain/
        │   ├── __init__.py
        │   └── compare_pretrains.py
        ├── select_backoff.py
        ├── training/
        │   ├── __init__.py
        │   ├── common.py
        │   ├── compose_ete_results.py
        │   ├── remove_constituency_optimizer.py
        │   ├── run_charlm.py
        │   ├── run_constituency.py
        │   ├── run_depparse.py
        │   ├── run_ete.py
        │   ├── run_lemma.py
        │   ├── run_lemma_classifier.py
        │   ├── run_mwt.py
        │   ├── run_ner.py
        │   ├── run_pos.py
        │   ├── run_sentiment.py
        │   ├── run_tokenizer.py
        │   └── separate_ner_pretrain.py
        └── visualization/
            ├── README
            ├── __init__.py
            ├── conll_deprel_visualization.py
            ├── constants.py
            ├── dependency_visualization.py
            ├── ner_visualization.py
            ├── semgrex_app.py
            ├── semgrex_visualizer.py
            ├── ssurgeon_visualizer.py
            └── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Environment (please complete the following information):**
 - OS: [e.g. Windows, Ubuntu, CentOS, macOS]
 - Python version: [e.g. Python 3.6.8 from Anaconda]
 - Stanza version: [e.g., 1.0.0]
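
The environment details listed above can be collected with a short script. This is a minimal sketch and not part of the template itself: the function name `collect_environment` is hypothetical, and it assumes Stanza was installed under the distribution name `stanza`; if that distribution is not found, it reports "not installed".

```python
import platform
import sys
from importlib import metadata


def collect_environment() -> dict:
    """Collect the OS, Python version, and Stanza version for a bug report."""
    try:
        # Assumption: the package was installed under the distribution name "stanza".
        stanza_version = metadata.version("stanza")
    except metadata.PackageNotFoundError:
        stanza_version = "not installed"
    return {
        "OS": f"{platform.system()} {platform.release()}",
        "Python version": sys.version.split()[0],
        "Stanza version": stanza_version,
    }


if __name__ == "__main__":
    # Print the values in the same list layout the template uses.
    for key, value in collect_environment().items():
        print(f" - {key}: {value}")
```

Running the script prints three lines that can be pasted over the placeholder entries.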

**Additional context**
Add any other context about the problem here.


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.


================================================
FILE: .github/ISSUE_TEMPLATE/question.md
================================================
---
name: Question
about: 'Question about general usage. '
title: "[QUESTION]"
labels: question
assignees: ''

---

Before you start, make sure to check out:

* Our documentation: https://stanfordnlp.github.io/stanza/
* Our FAQ: https://stanfordnlp.github.io/stanza/faq.html
* GitHub issues (especially closed ones)

Your question might have an answer in these places!

If you still couldn't find the answer to your question, feel free to delete this text and write down your question. The more information you provide with your question, the faster we will be able to help you!

If your question concerns an issue you're facing when using Stanza, please provide a detailed step-by-step guide to reproduce it. Include at least a minimal code sample that reproduces the problem, rather than just describing it. That greatly helps us locate the issue and resolve it faster!
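
A minimal code sample for a question might look like the sketch below. It is only an illustration: `run_repro` and the example sentence are hypothetical, and it assumes the models for the chosen language are already available locally or can be downloaded when `stanza.Pipeline` is constructed. Replace the language, processors, and text with the smallest input that shows the behavior you are asking about.

```python
def run_repro(text: str, lang: str = "en", processors: str = "tokenize") -> list:
    """Run a small Stanza pipeline on `text` and return the tokens of each sentence."""
    # Imported inside the function so that a missing installation only
    # raises ImportError when the reproduction is actually run.
    import stanza

    # Print the version so it appears next to the output being reported.
    print("Stanza version:", stanza.__version__)
    nlp = stanza.Pipeline(lang=lang, processors=processors)
    doc = nlp(text)
    return [[token.text for token in sentence.tokens] for sentence in doc.sentences]


if __name__ == "__main__":
    # Hypothetical input; substitute the text that triggers your problem.
    for tokens in run_repro("Stanza splits this text. It has two sentences."):
        print(tokens)
```

Pasting both the script and its full output, including any traceback, usually makes the problem much easier to locate.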


================================================
FILE: .github/pull_request_template.md
================================================
**BEFORE YOU START**: please make sure your pull request is against the `dev` branch. 
We cannot accept pull requests against the `main` branch. 
See our [contributing guide](https://github.com/stanfordnlp/stanza/blob/main/CONTRIBUTING.md) for details.

## Description
A brief and concise description of what your pull request is trying to accomplish.

## Fixes Issues
A list of issues/bugs with # references. (e.g., #123)

## Unit test coverage
Are there unit tests in place to make sure your code is functioning correctly?
(see [here](https://github.com/stanfordnlp/stanza/blob/main/stanza/tests/pos/test_tagger.py) for a simple example)

## Known breaking changes/behaviors
Does this break anything in Stanza's existing user interface? If so, what is it and how is it addressed?


================================================
FILE: .github/stale.yml
================================================
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 60
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Issues with these labels will never be considered stale
exemptLabels:
  - pinned
  - security
  - fixed on dev
  - bug
  - enhancement
# Label to use when marking an issue as stale
staleLabel: stale
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
  This issue has been automatically marked as stale because it has not had
  recent activity. It will be closed if no further activity occurs. Thank you
  for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: >
  This issue has been automatically closed due to inactivity.


================================================
FILE: .github/workflows/stanza-tests.yaml
================================================
name: Run Stanza Tests
on: [push]
jobs:
  Run-Stanza-Tests:
    runs-on: self-hosted
    steps:
      - run: echo "🎉 The job was automatically triggered by a ${{ github.event_name }} event."
      - run: echo "🐧 This job is now running on a self-hosted ${{ runner.os }} runner!"
      - run: echo "🔎 The name of your branch is ${{ github.ref }} and your repository is ${{ github.repository }}."
      - name: Check out repository code
        uses: actions/checkout@v2
      - run: echo "💡 The ${{ github.repository }} repository has been cloned to the runner."
      - run: echo "🖥️ The workflow is now ready to test your code on the runner."
      - name: Run Stanza Tests
        run: |
          # set up environment
          echo "Setting up environment..."
          bash
          #. $CONDA_PREFIX/etc/profile.d/conda.sh
          . /home/stanzabuild/miniconda3/etc/profile.d/conda.sh
          conda activate stanza
          export STANZA_TEST_HOME=/scr/stanza_test
          export CORENLP_HOME=$STANZA_TEST_HOME/corenlp_dir
          export CLASSPATH=$CORENLP_HOME/*:
          echo CORENLP_HOME=$CORENLP_HOME
          echo CLASSPATH=$CLASSPATH
          # install from stanza repo being evaluated
          echo PWD: $PWD
          echo PATH: $PATH
          pip3 install -e .
          pip3 install -e .[test]
          pip3 install -e .[transformers]
          pip3 install -e .[tokenizers]
          pip3 install -e .[morphseg]
          # set up for tests
          echo "Running stanza test set up..."
          rm -rf $STANZA_TEST_HOME
          python3 stanza/tests/setup.py
          # run tests
          echo "Running tests..."
          export CUDA_VISIBLE_DEVICES=2
          pytest stanza/tests
          
      - run: echo "🍏 This job's status is ${{ job.status }}."


================================================
FILE: .gitignore
================================================
# kept from original
.DS_Store
*.tmp
*.pkl
*.conllu
*.lem
*.toklabels

# also data w/o any slash to account for symlinks
data
data/
stanza_resources/
stanza_test/
saved_models/
logs/
log/
*_test_treebanks
wandb/

params/*/*.json
!params/*/default.json

# emacs backup files
*~
# VI backup files?
*py.swp

# standard github python project gitignore
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# IDE-related
.vscode/

.idea/vcs.xml
.idea/inspectionProfiles/profiles_settings.xml
.idea/workspace.xml

# Jekyll stuff, triggered by running the docs locally
.jekyll-cache/
.jekyll-metadata
_site/

# symlink / directory for data files
extern_data


================================================
FILE: .travis.yml
================================================
language: python
python:
  - 3.6.5
notifications:
  email: false
install:
  - pip install --quiet .
  - export CORENLP_HOME=~/corenlp-latest CORENLP_VERSION=stanford-corenlp-latest
  - export CORENLP_URL="http://nlp.stanford.edu/software/${CORENLP_VERSION}.zip"
  - wget $CORENLP_URL -O corenlp-latest.zip
  - unzip corenlp-latest.zip > unzip.log
  - export CORENLP_UNZIP=`grep creating unzip.log | head -n 1 | cut -d ":" -f 2`
  - mv $CORENLP_UNZIP $CORENLP_HOME
  - mkdir ~/stanza_test
  - mkdir ~/stanza_test/in
  - mkdir ~/stanza_test/out
  - mkdir ~/stanza_test/scripts
  - cp tests/data/external_server.properties ~/stanza_test/scripts
  - cp tests/data/example_french.json ~/stanza_test/out
  - cp tests/data/tiny_emb.* ~/stanza_test/in
  - export STANZA_TEST_HOME=~/stanza_test
script:
  - python -m pytest -m travis tests/


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to Stanza

We would love to see contributions to Stanza from the community! Contributions that we welcome include bugfixes and enhancements. If you want to report a bug or suggest a feature but don't intend to fix or implement it by yourself, please create a corresponding issue on [our issues page](https://github.com/stanfordnlp/stanza/issues). If you plan to contribute a bugfix or enhancement, please read the following.

## 🛠️ Bugfixes

For bugfixes, please follow these steps:

- Make sure a fix does not already exist, by searching through existing [issues](https://github.com/stanfordnlp/stanza/issues) (including closed ones) and [pull requests](https://github.com/stanfordnlp/stanza/pulls).
- Confirm the bug with us by creating a bug-report issue. In your issue, you should at least include the platform and environment that you are running with, and a minimal code snippet that will reproduce the bug.
- Once the bug is confirmed, you can go ahead with implementing the bugfix, and create a pull request **against the `dev` branch**.

## 💡 Enhancements

For enhancements, please follow these steps:

- Make sure a similar enhancement suggestion does not already exist, by searching through existing [issues](https://github.com/stanfordnlp/stanza/issues).
- Create a feature-request issue and discuss this enhancement with us. We'll need to make sure this enhancement won't break the existing user interface or functionality.
- Once the enhancement is confirmed with us, you can go ahead with implementing it, and create a pull request **against the `dev` branch**.


================================================
FILE: LICENSE
================================================
Copyright 2019 The Board of Trustees of The Leland Stanford Junior University

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


================================================
FILE: README.md
================================================
<div align="center"><img src="https://github.com/stanfordnlp/stanza/raw/dev/images/stanza-logo.png" height="100px"/></div>

<h2 align="center">Stanza: A Python NLP Library for Many Human Languages</h2>

<div align="center">
    <a href="https://github.com/stanfordnlp/stanza/actions">
       <img alt="Run Tests" src="https://github.com/stanfordnlp/stanza/actions/workflows/stanza-tests.yaml/badge.svg">
    </a>
    <a href="https://pypi.org/project/stanza/">
        <img alt="PyPI Version" src="https://img.shields.io/pypi/v/stanza?color=blue">
    </a>
    <a href="https://anaconda.org/stanfordnlp/stanza">
        <img alt="Conda Versions" src="https://img.shields.io/conda/vn/stanfordnlp/stanza?color=blue&label=conda">
    </a>
    <a href="https://pypi.org/project/stanza/">
        <img alt="Python Versions" src="https://img.shields.io/pypi/pyversions/stanza?colorB=blue">
    </a>
</div>

The Stanford NLP Group's official Python NLP library. It contains support for running various accurate natural language processing tools on 60+ languages and for accessing the Java Stanford CoreNLP software from Python. For detailed information please visit our [official website](https://stanfordnlp.github.io/stanza/).

🔥 &nbsp;A new collection of **biomedical** and **clinical** English model packages is now available, offering a seamless experience for syntactic analysis and named entity recognition (NER) on biomedical literature text and clinical notes. For more information, check out our [Biomedical models documentation page](https://stanfordnlp.github.io/stanza/biomed.html).

### References

If you use this library in your research, please kindly cite our [ACL2020 Stanza system demo paper](https://arxiv.org/abs/2003.07082):

```bibtex
@inproceedings{qi2020stanza,
    title={Stanza: A {Python} Natural Language Processing Toolkit for Many Human Languages},
    author={Qi, Peng and Zhang, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D.},
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    year={2020}
}
```

If you use our biomedical and clinical models, please also cite our [Stanza Biomedical Models description paper](https://arxiv.org/abs/2007.14640):

```bibtex
@article{zhang2021biomedical,
    author = {Zhang, Yuhao and Zhang, Yuhui and Qi, Peng and Manning, Christopher D and Langlotz, Curtis P},
    title = {Biomedical and clinical {E}nglish model packages for the {S}tanza {P}ython {NLP} library},
    journal = {Journal of the American Medical Informatics Association},
    year = {2021},
    month = {06},
    issn = {1527-974X}
}
```

The PyTorch implementation of the neural pipeline in this repository is due to [Peng Qi](http://qipeng.me) (@qipeng), [Yuhao Zhang](http://yuhao.im) (@yuhaozhang), and [Yuhui Zhang](https://cs.stanford.edu/~yuhuiz/) (@yuhui-zh15), with help from [Jason Bolton](mailto:jebolton@stanford.edu) (@j38), [Tim Dozat](https://web.stanford.edu/~tdozat/) (@tdozat) and [John Bauer](https://www.linkedin.com/in/john-bauer-b3883b60/) (@AngledLuffa). Maintenance of this repo is currently led by [John Bauer](https://www.linkedin.com/in/john-bauer-b3883b60/).

If you use the CoreNLP software through Stanza, please cite the CoreNLP software package and the respective modules as described [here](https://stanfordnlp.github.io/CoreNLP/#citing-stanford-corenlp-in-papers) ("Citing Stanford CoreNLP in papers"). The CoreNLP client is mostly written by [Arun Chaganty](http://arun.chagantys.org/), and [Jason Bolton](mailto:jebolton@stanford.edu) spearheaded merging the two projects together.

If you use the Semgrex or Ssurgeon part of CoreNLP, please cite [our GURT paper on Semgrex and Ssurgeon](https://aclanthology.org/2023.tlt-1.7/):

```bibtex
@inproceedings{bauer-etal-2023-semgrex,
    title = "Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs",
    author = "Bauer, John  and
      Kiddon, Chlo{\'e}  and
      Yeh, Eric  and
      Shan, Alex  and
      D. Manning, Christopher",
    booktitle = "Proceedings of the 21st International Workshop on Treebanks and Linguistic Theories (TLT, GURT/SyntaxFest 2023)",
    month = mar,
    year = "2023",
    address = "Washington, D.C.",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.tlt-1.7",
    pages = "67--73",
    abstract = "Searching dependency graphs and manipulating them can be a time consuming and challenging task to get right. We document Semgrex, a system for searching dependency graphs, and introduce Ssurgeon, a system for manipulating the output of Semgrex. The compact language used by these systems allows for easy command line or API processing of dependencies. Additionally, integration with publicly released toolkits in Java and Python allows for searching text relations and attributes over natural text.",
}
```

## Issues and Usage Q&A

To ask questions, report issues or request features 🤔, please use the [GitHub Issue Tracker](https://github.com/stanfordnlp/stanza/issues). Before creating a new issue, please make sure to search for existing issues that may solve your problem, or visit the [Frequently Asked Questions (FAQ) page](https://stanfordnlp.github.io/stanza/faq.html) on our website.

## Contributing to Stanza

We welcome community contributions to Stanza in the form of bugfixes 🛠️ and enhancements 💡! If you want to contribute, please first read [our contribution guideline](CONTRIBUTING.md).

## Installation

### pip

Stanza supports Python 3.6 or later. We recommend that you install Stanza via [pip](https://pip.pypa.io/en/stable/installing/), the Python package manager. To install, simply run:
```bash
pip install stanza
```
This should also help resolve all of the dependencies of Stanza, for instance [PyTorch](https://pytorch.org/) 1.3.0 or above.

If you currently have a previous version of `stanza` installed, use:
```bash
pip install stanza -U
```

### Anaconda

To install Stanza via Anaconda, use the following conda command:

```bash
conda install -c stanfordnlp stanza
```

Note that for now installing Stanza via Anaconda does not work for Python 3.10. For Python 3.10 please use pip installation.

### From Source

Alternatively, you can install from the source of this git repository, which will give you more flexibility in developing on top of Stanza. For this option, run
```bash
git clone https://github.com/stanfordnlp/stanza.git
cd stanza
pip install -e .
```

## Running Stanza

### Getting Started with the neural pipeline

To run your first Stanza pipeline, simply follow these steps in your Python interactive interpreter:

```python
>>> import stanza
>>> stanza.download('en')       # Optional: pre-download English models (Pipeline can auto-download if needed)
>>> nlp = stanza.Pipeline('en') # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()
```

If you encounter `requests.exceptions.ConnectionError`, please try to use a proxy:

```python
>>> import stanza
>>> proxies = {'http': 'http://ip:port', 'https': 'http://ip:port'}
>>> stanza.download('en', proxies=proxies)  # Optional: pre-download English models (Pipeline can auto-download if needed)
>>> nlp = stanza.Pipeline('en')             # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()
```

The last command prints each word in the first sentence of the input string (or [`Document`](https://stanfordnlp.github.io/stanza/data_objects.html#document), as it is represented in Stanza), together with the index of the word that governs it in the Universal Dependencies parse of that sentence (its "head") and the dependency relation between the two words. The output should look like:

```
('Barack', '4', 'nsubj:pass')
('Obama', '1', 'flat')
('was', '4', 'aux:pass')
('born', '0', 'root')
('in', '6', 'case')
('Hawaii', '4', 'obl')
('.', '4', 'punct')
```
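As a rough sketch of how the head indices above map back to words: the index is 1-based and `0` marks the root, so the governor of a word is found at position `head - 1` in the sentence's word list. The helper name `head_pairs` and the stand-in word objects are ours, not part of Stanza; the helper only assumes each word exposes `text`, `head` (an integer), and `deprel`.

```python
from types import SimpleNamespace


def head_pairs(words):
    """Return (word, head word, relation) triples for a list of word objects.

    Assumes 1-based integer `head` indices, where 0 marks the root.
    """
    triples = []
    for word in words:
        head_text = words[word.head - 1].text if word.head > 0 else "ROOT"
        triples.append((word.text, head_text, word.deprel))
    return triples


# Stand-in objects mirroring the output shown above. With a real pipeline
# you would pass doc.sentences[0].words instead.
demo_words = [
    SimpleNamespace(text="Barack", head=4, deprel="nsubj:pass"),
    SimpleNamespace(text="Obama", head=1, deprel="flat"),
    SimpleNamespace(text="was", head=4, deprel="aux:pass"),
    SimpleNamespace(text="born", head=0, deprel="root"),
    SimpleNamespace(text="in", head=6, deprel="case"),
    SimpleNamespace(text="Hawaii", head=4, deprel="obl"),
    SimpleNamespace(text=".", head=4, deprel="punct"),
]

for triple in head_pairs(demo_words):
    print(triple)
```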

See [our getting started guide](https://stanfordnlp.github.io/stanza/installation_usage.html#getting-started) for more details.

### Accessing Java Stanford CoreNLP software

Aside from the neural pipeline, this package also includes an official wrapper for accessing the Java Stanford CoreNLP software with Python code.

There are a few initial setup steps.

* Download [Stanford CoreNLP](https://stanfordnlp.github.io/CoreNLP/) and models for the language you wish to use
* Put the model jars in the distribution folder
* Tell the Python code where Stanford CoreNLP is located by setting the `CORENLP_HOME` environment variable (e.g., in *nix): `export CORENLP_HOME=/path/to/stanford-corenlp-4.5.3`
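If you prefer to set `CORENLP_HOME` from Python rather than from the shell, something along these lines should work. This is a minimal sketch: the helper names `set_corenlp_home` and `build_client` are hypothetical, and the annotator list, timeout, and memory values are placeholder choices rather than recommendations.

```python
import os


def set_corenlp_home(path):
    """Set CORENLP_HOME for this process and return the resolved path."""
    resolved = os.path.abspath(os.path.expanduser(path))
    os.environ["CORENLP_HOME"] = resolved
    return resolved


def build_client(annotators=("tokenize", "ssplit", "pos")):
    """Create a CoreNLP client; assumes CORENLP_HOME is already set."""
    # Imported lazily so the helper above can be used without Stanza installed.
    from stanza.server import CoreNLPClient

    return CoreNLPClient(annotators=list(annotators), timeout=30000, memory="4G")


# Example usage (requires Java and a local CoreNLP download):
#   set_corenlp_home("/path/to/stanford-corenlp-4.5.3")
#   with build_client() as client:
#       ann = client.annotate("Barack Obama was born in Hawaii.")
```

The variable has to be set before the client is constructed, since that is when the CoreNLP location is looked up.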

We provide [comprehensive examples](https://stanfordnlp.github.io/stanza/corenlp_client.html) in our documentation that show how one can use CoreNLP through Stanza and extract various annotations from it.

### Online Colab Notebooks

To get you started, we also provide interactive Jupyter notebooks in the `demo` folder. You can also open these notebooks and run them interactively on [Google Colab](https://colab.research.google.com). To view all available notebooks, follow these steps:

* Go to the [Google Colab website](https://colab.research.google.com)
* Navigate to `File` -> `Open notebook`, and choose `GitHub` in the pop-up menu
* Note that you do **not** need to give Colab access permission to your GitHub account
* Type `stanfordnlp/stanza` in the search bar, and press Enter

### Trained Models for the Neural Pipeline

We currently provide models for all of the [Universal Dependencies](https://universaldependencies.org/) treebanks v2.8, as well as NER models for a few widely-spoken languages. You can find instructions for downloading and using these models [here](https://stanfordnlp.github.io/stanza/models.html).

### Batching To Maximize Pipeline Speed

To maximize speed, it is essential to run the pipeline on batches of documents. Running a for loop on one sentence at a time will be very slow. The best approach at this time is to concatenate documents together, with each document separated by a blank line (i.e., two line breaks `\n\n`). The tokenizer will recognize blank lines as sentence breaks. We are actively working on improving multi-document processing.
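A minimal sketch of the concatenation step is shown here. The helper name `join_documents` is ours, not a Stanza function, and the pipeline call is left as a comment because it assumes `nlp` was built as in the getting-started example.

```python
def join_documents(documents):
    """Concatenate documents, separating each pair with a blank line."""
    cleaned = (doc.strip() for doc in documents)
    return "\n\n".join(doc for doc in cleaned if doc)


docs = [
    "Barack Obama was born in Hawaii.",
    "He was elected president in 2008.",
]
batched_text = join_documents(docs)

# One call over the whole batch instead of one call per document:
#   doc = nlp(batched_text)
```

Note that a blank line inside a single document would also be treated as a sentence break, so this sketch does not preserve document boundaries on its own.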

## Training your own neural pipelines

All neural modules in this library can be trained with your own data. The tokenizer, the multi-word token (MWT) expander, the POS/morphological features tagger, the lemmatizer and the dependency parser require [CoNLL-U](https://universaldependencies.org/format.html) formatted data, while the NER model requires the BIOES format. Currently, we do not support model training via the `Pipeline` interface. Therefore, to train your own models, you need to clone this git repository and run training from the source.
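For readers unfamiliar with CoNLL-U, each token line has ten tab-separated columns. The snippet below is only an illustration of that layout, not Stanza's own reader: `parse_conllu_line` is a hypothetical helper, and the annotation values in the example line are made up for demonstration.

```python
# Column names as defined by the CoNLL-U format.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


def parse_conllu_line(line):
    """Split one CoNLL-U token line into a dict keyed by column name."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(
            "expected {} columns, got {}".format(len(CONLLU_FIELDS), len(columns))
        )
    return dict(zip(CONLLU_FIELDS, columns))


# Illustrative token line; an underscore marks an unspecified field.
example = "1\tBarack\tBarack\tPROPN\tNNP\tNumber=Sing\t4\tnsubj:pass\t_\t_"
token = parse_conllu_line(example)
print(token["form"], token["upos"], token["head"], token["deprel"])
```

All values are kept as strings here; a real reader would also need to handle comment lines, blank lines between sentences, and multi-word token ranges.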

For detailed step-by-step guidance on how to train and evaluate your own models, please visit our [training documentation](https://stanfordnlp.github.io/stanza/training.html).

## LICENSE

Stanza is released under the Apache License, Version 2.0. See the [LICENSE](https://github.com/stanfordnlp/stanza/blob/master/LICENSE) file for more details.


================================================
FILE: demo/CONLL_Dependency_Visualizer_Example.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0fd86c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.conll_deprel_visualization import conll_to_visual\n",
    "\n",
    "# load necessary conllu files - expected to be in the demo directory along with the notebook\n",
    "en_file = \"en_test.conllu.txt\"\n",
    "\n",
    "# testing left to right languages\n",
    "conll_to_visual(en_file, \"en\", sent_count=2)\n",
    "conll_to_visual(en_file, \"en\", sent_count=10)\n",
    "#conll_to_visual(en_file, \"en\", display_all=True)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc4b3f9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.conll_deprel_visualization import conll_to_visual\n",
    "\n",
    "jp_file = \"japanese_test.conllu.txt\"\n",
    "conll_to_visual(jp_file, \"ja\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6852b8e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.conll_deprel_visualization import conll_to_visual\n",
    "\n",
    "# testing right to left languages\n",
    "ar_file = \"arabic_test.conllu.txt\"\n",
    "conll_to_visual(ar_file, \"ar\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.22"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: demo/Dependency_Visualization_Testing.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "64b2a9e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.dependency_visualization import visualize_strings\n",
    "\n",
    "ar_strings = ['برلين ترفض حصول شركة اميركية على رخصة تصنيع دبابة \"ليوبارد\" الالمانية', \"هل بإمكاني مساعدتك؟\", \n",
    "              \"أراك في مابعد\", \"لحظة من فضلك\"]\n",
    "# Testing with right to left language\n",
    "visualize_strings(ar_strings, \"ar\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "35ef521b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.dependency_visualization import visualize_strings\n",
    "\n",
    "en_strings = [\"This is a sentence.\", \n",
    "              \"He is wearing a red shirt\",\n",
    "              \"Barack Obama was born in Hawaii. He was elected President of the United States in 2008.\"]\n",
    "# Testing with left to right languages\n",
    "visualize_strings(en_strings, \"en\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3cf10ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.dependency_visualization import visualize_strings\n",
    "\n",
    "zh_strings = [\"中国是一个很有意思的国家。\"]\n",
    "# Testing with a left to right language\n",
    "visualize_strings(zh_strings, \"zh\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2b9b574",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.22"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: demo/NER_Visualization.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abf300bb",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.ner_visualization import visualize_strings\n",
    "\n",
    "en_strings = ['''Samuel Jackson, a Christian man from Utah, went to the JFK Airport for a flight to New York.\n",
    "                 He was thinking of attending the US Open, his favorite tennis tournament besides Wimbledon.\n",
    "                 That would be a dream trip, certainly not possible since it is $5000 attendance and 5000 miles away.\n",
    "                 On the way there, he watched the Super Bowl for 2 hours and read War and Piece by Tolstoy for 1 hour.\n",
    "                 In New York, he crossed the Brooklyn Bridge and listened to the 5th symphony of Beethoven as well as\n",
    "                 \"All I want for Christmas is You\" by Mariah Carey.''', \n",
    "              \"Barack Obama was born in Hawaii. He was elected President of the United States in 2008\"]\n",
    "    \n",
    "visualize_strings(en_strings, \"en\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5670921a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.ner_visualization import visualize_strings\n",
    "\n",
    "zh_strings = ['''来自犹他州的基督徒塞缪尔杰克逊前往肯尼迪机场搭乘航班飞往纽约。\n",
    "                 他正在考虑参加美国公开赛，这是除了温布尔登之外他最喜欢的网球赛事。\n",
    "                 那将是一次梦想之旅，当然不可能，因为它的出勤费为 5000 美元，距离 5000 英里。\n",
    "                 在去的路上，他看了 2 个小时的超级碗比赛，看了 1 个小时的托尔斯泰的《战争与碎片》。\n",
    "                 在纽约，他穿过布鲁克林大桥，聆听了贝多芬的第五交响曲以及 玛丽亚凯莉的“圣诞节我想要的就是你”。''',\n",
    "              \"我觉得罗家费德勒住在加州, 在美国里面。\"]\n",
    "visualize_strings(zh_strings, \"zh\", colors={\"PERSON\": \"yellow\", \"DATE\": \"red\", \"GPE\": \"blue\"})\n",
    "visualize_strings(zh_strings, \"zh\", select=['PERSON', 'DATE'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8d96072",
   "metadata": {},
   "outputs": [],
   "source": [
    "from stanza.utils.visualization.ner_visualization import visualize_strings\n",
    "\n",
    "ar_strings = [\".أعيش في سان فرانسيسكو ، كاليفورنيا. اسمي أليكس وأنا ألتحق بجامعة ستانفورد. أنا أدرس علوم الكمبيوتر وأستاذي هو كريس مانينغ\"\n",
    "             , \"اسمي أليكس ، أنا من الولايات المتحدة.\",  \n",
    "               '''صامويل جاكسون ، رجل مسيحي من ولاية يوتا ، ذهب إلى مطار جون كنيدي في رحلة إلى نيويورك. كان يفكر في حضور بطولة الولايات المتحدة المفتوحة للتنس ، بطولة التنس المفضلة لديه إلى جانب بطولة ويمبلدون. ستكون هذه رحلة الأحلام ، وبالتأكيد ليست ممكنة لأنها تبلغ 5000 دولار للحضور و 5000 ميل. في الطريق إلى هناك ، شاهد Super Bowl لمدة ساعتين وقرأ War and Piece by Tolstoy لمدة ساعة واحدة. في نيويورك ، عبر جسر بروكلين واستمع إلى السيمفونية الخامسة لبيتهوفن وكذلك \"كل ما أريده في عيد الميلاد هو أنت\" لماريا كاري.''']\n",
    "\n",
    "visualize_strings(ar_strings, \"ar\", colors={\"PER\": \"pink\", \"LOC\": \"linear-gradient(90deg, #aa9cfc, #fc9ce7)\", \"ORG\": \"yellow\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22489b27",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.22"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: demo/Stanza_Beginners_Guide.ipynb
================================================
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Stanza-Beginners-Guide.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "56LiYCkPM7V_",
        "colab_type": "text"
      },
      "source": [
        "# Welcome to Stanza!\n",
        "\n",
        "![Latest Version](https://img.shields.io/pypi/v/stanza.svg?colorB=bc4545)\n",
        "![Python Versions](https://img.shields.io/pypi/pyversions/stanza.svg?colorB=bc4545)\n",
        "\n",
        "Stanza is a Python NLP toolkit that supports 60+ human languages. It is built with highly accurate neural network components that enable efficient training and evaluation with your own annotated data, and offers pretrained models on 100 treebanks. Additionally, Stanza provides a stable, officially maintained Python interface to Java Stanford CoreNLP Toolkit.\n",
        "\n",
        "In this tutorial, we will demonstrate how to set up Stanza and annotate text with its native neural network NLP models. For the use of the Python CoreNLP interface, please see other tutorials."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yQff4Di5Nnq0",
        "colab_type": "text"
      },
      "source": [
        "## 1. Installing Stanza\n",
        "\n",
        "Note that Stanza only supports Python 3.6 and above. Installing and importing Stanza are as simple as running the following commands:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "owSj1UtdEvSU",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Install; note that the prefix \"!\" is not needed if you are running in a terminal\n",
        "!pip install stanza\n",
        "\n",
        "# Import the package\n",
        "import stanza"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4ixllwEKeCJg",
        "colab_type": "text"
      },
      "source": [
        "### More Information\n",
        "\n",
        "For common troubleshooting, please visit our [troubleshooting page](https://stanfordnlp.github.io/stanfordnlp/installation_usage.html#troubleshooting)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "aeyPs5ARO79d",
        "colab_type": "text"
      },
      "source": [
        "## 2. Downloading Models\n",
        "\n",
        "You can download models with the `stanza.download` command. The language can be specified with either a full language name (e.g., \"english\"), or a short code (e.g., \"en\"). \n",
        "\n",
        "By default, models will be saved to your `~/stanza_resources` directory. If you want to specify your own path to save the model files, you can pass a `dir=your_path` argument.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "HDwRm-KXGcYo",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Download an English model into the default directory\n",
        "print(\"Downloading English model...\")\n",
        "stanza.download('en')\n",
        "\n",
        "# Similarly, download a (simplified) Chinese model\n",
        "# Note that you can use verbose=False to turn off all printed messages\n",
        "print(\"Downloading Chinese model...\")\n",
        "stanza.download('zh', verbose=False)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7HCfQ0SfdmsU",
        "colab_type": "text"
      },
      "source": [
        "### More Information\n",
        "\n",
        "Pretrained models are provided for 60+ different languages. For all languages, available models and the corresponding short language codes, please check out the [models page](https://stanfordnlp.github.io/stanza/models.html).\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b3-WZJrzWD2o",
        "colab_type": "text"
      },
      "source": [
        "## 3. Processing Text\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XrnKl2m3fq2f",
        "colab_type": "text"
      },
      "source": [
        "### Constructing Pipeline\n",
        "\n",
        "To process a piece of text, you'll need to first construct a `Pipeline` with different `Processor` units. The pipeline is language-specific, so again you'll need to first specify the language (see examples).\n",
        "\n",
        "- By default, the pipeline will include all processors, including tokenization, multi-word token expansion, part-of-speech tagging, lemmatization, dependency parsing and named entity recognition (for supported languages). However, you can always specify what processors you want to include with the `processors` argument.\n",
        "\n",
        "- Stanza's pipeline is CUDA-aware: a CUDA device is used whenever one is available, and the CPU is used otherwise. You can force the pipeline to use the CPU regardless by setting `use_gpu=False`.\n",
        "\n",
        "- Again, you can suppress all printed messages by setting `verbose=False`."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "HbiTSBDPG53o",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Build an English pipeline, with all processors by default\n",
        "print(\"Building an English pipeline...\")\n",
        "en_nlp = stanza.Pipeline('en')\n",
        "\n",
        "# Build a Chinese pipeline, with customized processor list and no logging, and force it to use CPU\n",
        "print(\"Building a Chinese pipeline...\")\n",
        "zh_nlp = stanza.Pipeline('zh', processors='tokenize,lemma,pos,depparse', verbose=False, use_gpu=False)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Go123Bx8e1wt",
        "colab_type": "text"
      },
      "source": [
        "### Annotating Text\n",
        "\n",
        "After a pipeline is successfully constructed, you can get annotations for a piece of text simply by passing the string into the pipeline object. The pipeline will return a `Document` object, from which detailed annotations can be accessed. For example:\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "k_p0h1UTHDMm",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Processing English text\n",
        "en_doc = en_nlp(\"Barack Obama was born in Hawaii.  He was elected president in 2008.\")\n",
        "print(type(en_doc))\n",
        "\n",
        "# Processing Chinese text\n",
        "zh_doc = zh_nlp(\"达沃斯世界经济论坛是每年全球政商界领袖聚在一起的年度盛事。\")\n",
        "print(type(zh_doc))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DavwCP9egzNZ",
        "colab_type": "text"
      },
      "source": [
        "### More Information\n",
        "\n",
        "For more information on how to construct a pipeline and information on different processors, please visit our [pipeline page](https://stanfordnlp.github.io/stanfordnlp/pipeline.html)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O_PYLEGziQWR",
        "colab_type": "text"
      },
      "source": [
        "## 4. Accessing Annotations\n",
        "\n",
        "Annotations can be accessed from the returned `Document` object. \n",
        "\n",
        "A `Document` contains a list of `Sentence`s, and a `Sentence` contains a list of `Token`s and `Word`s. For the most part `Token`s and `Word`s overlap, but some tokens can be divided into multiple words, for instance the French token `aux` is divided into the words `à` and `les`, while in English a word and a token are equivalent. Note that dependency parses are derived over `Word`s.\n",
        "\n",
        "Additionally, a `Span` object is used to represent annotations that are part of a document, such as named entity mentions.\n",
        "\n",
        "\n",
        "The following example iterates over all English sentences and words, and prints the word information one by one:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "B5691SpFHFZ6",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "for i, sent in enumerate(en_doc.sentences):\n",
        "    print(\"[Sentence {}]\".format(i+1))\n",
        "    for word in sent.words:\n",
        "        print(\"{:12s}\\t{:12s}\\t{:6s}\\t{:d}\\t{:12s}\".format(\\\n",
        "              word.text, word.lemma, word.pos, word.head, word.deprel))\n",
        "    print(\"\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-AUkCkNIrusq",
        "colab_type": "text"
      },
      "source": [
        "The following example iterates over all extracted named entity mentions and prints out their character spans and types."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "5Uu0-WmvsnlK",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "print(\"Mention text\\tType\\tStart-End\")\n",
        "for ent in en_doc.ents:\n",
        "    print(\"{}\\t{}\\t{}-{}\".format(ent.text, ent.type, ent.start_char, ent.end_char))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ql1SZlZOnMLo",
        "colab_type": "text"
      },
      "source": [
        "And similarly for the Chinese text:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XsVcEO9tHKPG",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "for i, sent in enumerate(zh_doc.sentences):\n",
        "    print(\"[Sentence {}]\".format(i+1))\n",
        "    for word in sent.words:\n",
        "        print(\"{:12s}\\t{:12s}\\t{:6s}\\t{:d}\\t{:12s}\".format(\\\n",
        "              word.text, word.lemma, word.pos, word.head, word.deprel))\n",
        "    print(\"\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dUhWAs8pnnHT",
        "colab_type": "text"
      },
      "source": [
        "Alternatively, you can directly print a `Word` object to view all its annotations as a Python dict:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "6_UafNb7HHIg",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "word = en_doc.sentences[0].words[0]\n",
        "print(word)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TAQlOsuRoq2V",
        "colab_type": "text"
      },
      "source": [
        "### More Information\n",
        "\n",
        "For more information on the different data objects, please visit our [data objects page](https://stanfordnlp.github.io/stanza/data_objects.html)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hiiWHxYPpmhd",
        "colab_type": "text"
      },
      "source": [
        "## 5. Resources\n",
        "\n",
        "Apart from this interactive tutorial, we also provide tutorials on our website that cover a variety of use cases such as how to use different model \"packages\" for a language, how to use spaCy as a tokenizer, how to process pretokenized text without running the tokenizer, etc. For these tutorials please visit [our Tutorials page](https://stanfordnlp.github.io/stanza/tutorials.html).\n",
        "\n",
        "Other resources that you may find helpful include:\n",
        "\n",
        "- [Stanza Homepage](https://stanfordnlp.github.io/stanza/index.html)\n",
        "- [FAQs](https://stanfordnlp.github.io/stanza/faq.html)\n",
        "- [GitHub Repo](https://github.com/stanfordnlp/stanza)\n",
        "- [Reporting Issues](https://github.com/stanfordnlp/stanza/issues)\n",
        "- [Stanza System Description Paper](http://arxiv.org/abs/2003.07082)\n"
      ]
    }
  ]
}

================================================
FILE: demo/Stanza_CoreNLP_Interface.ipynb
================================================
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Stanza-CoreNLP-Interface.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2-4lzQTC9yxG",
        "colab_type": "text"
      },
      "source": [
        "# Stanza: A Tutorial on the Python CoreNLP Interface\n",
        "\n",
        "![Latest Version](https://img.shields.io/pypi/v/stanza.svg?colorB=bc4545)\n",
        "![Python Versions](https://img.shields.io/pypi/pyversions/stanza.svg?colorB=bc4545)\n",
        "\n",
        "While the Stanza library implements accurate neural network modules for basic functionalities such as part-of-speech tagging and dependency parsing, the [Stanford CoreNLP Java library](https://stanfordnlp.github.io/CoreNLP/) has been developed for years and offers more complementary features such as coreference resolution and relation extraction. To unlock these features, the Stanza library also offers an officially maintained Python interface to the CoreNLP Java library. This interface allows you to get NLP annotations from CoreNLP by writing native Python code.\n",
        "\n",
        "\n",
        "This tutorial walks you through the installation, setup and basic usage of this Python CoreNLP interface. If you want to learn how to use the neural network components in Stanza, please refer to other tutorials."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YpKwWeVkASGt",
        "colab_type": "text"
      },
      "source": [
        "## 1. Installation\n",
        "\n",
        "Before the installation starts, please make sure that you have Python 3 and Java installed on your computer. Since Colab already has them installed, we'll skip this procedure in this notebook."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "k1Az2ECuAfG8",
        "colab_type": "text"
      },
      "source": [
        "### Installing Stanza\n",
        "\n",
        "Installing and importing Stanza are as simple as running the following commands:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xiFwYAgW4Mss",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Install stanza; note that the prefix \"!\" is not needed if you are running in a terminal\n",
        "!pip install stanza\n",
        "\n",
        "# Import stanza\n",
        "import stanza"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2zFvaA8_A32_",
        "colab_type": "text"
      },
      "source": [
        "### Setting up Stanford CoreNLP\n",
        "\n",
        "In order for the interface to work, the Stanford CoreNLP library has to be installed and a `CORENLP_HOME` environment variable has to be pointed to the installation location.\n",
        "\n",
        "Here we are going to show you how to download and install the CoreNLP library on your machine, with Stanza's installation command:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MgK6-LPV-OdA",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Download the Stanford CoreNLP package with Stanza's installation command\n",
        "# This'll take several minutes, depending on the network speed\n",
        "corenlp_dir = './corenlp'\n",
        "stanza.install_corenlp(dir=corenlp_dir)\n",
        "\n",
        "# Set the CORENLP_HOME environment variable to point to the installation location\n",
        "import os\n",
        "os.environ[\"CORENLP_HOME\"] = corenlp_dir"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Jdq8MT-NAhKj",
        "colab_type": "text"
      },
      "source": [
        "That's all for the installation! 🎉  We can now double check if the installation is successful by listing files in the CoreNLP directory. You should be able to see a number of `.jar` files by running the following command:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "K5eIOaJp_tuo",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Examine the CoreNLP installation folder to make sure the installation is successful\n",
        "!ls $CORENLP_HOME"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S0xb9BHt__gx",
        "colab_type": "text"
      },
      "source": [
        "**Note 1**:\n",
        "If you want to use the interface in a terminal (instead of a Colab notebook), you can properly set the `CORENLP_HOME` environment variable with:\n",
        "\n",
        "```bash\n",
        "export CORENLP_HOME=path_to_corenlp_dir\n",
        "```\n",
        "\n",
        "Here we instead set this variable with the Python `os` library, simply because the `export` command is not well supported in Colab notebooks.\n",
        "\n",
        "\n",
        "**Note 2**:\n",
        "The `stanza.install_corenlp()` function is only available since Stanza v1.1.1. If you are using an earlier version of Stanza, please check out our [manual installation page](https://stanfordnlp.github.io/stanza/client_setup.html#manual-installation) for how to install CoreNLP on your computer.\n",
        "\n",
        "**Note 3**:\n",
        "Besides the installation function, we also provide a `stanza.download_corenlp_models()` function to help you download additional CoreNLP models for different languages that are not shipped with the default installation. Check out our [automatic installation website page](https://stanfordnlp.github.io/stanza/client_setup.html#automated-installation) for more information on how to use it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xJsuO6D8D05q",
        "colab_type": "text"
      },
      "source": [
        "## 2. Annotating Text with CoreNLP Interface"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dZNHxXHkH1K2",
        "colab_type": "text"
      },
      "source": [
        "### Constructing CoreNLPClient\n",
        "\n",
        "At a high level, the CoreNLP Python interface works by first starting a background Java CoreNLP server process, and then initializing a client instance in Python which can pass the text to the background server process, and accept the returned annotation results.\n",
        "\n",
        "We wrap these functionalities in a `CoreNLPClient` class. Therefore, we need to start by importing this class from Stanza."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "LS4OKnqJ8wui",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Import client module\n",
        "from stanza.server import CoreNLPClient"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WP4Dz6PIJHeL",
        "colab_type": "text"
      },
      "source": [
        "After the import is done, we can construct a `CoreNLPClient` instance. The constructor takes a Python list of annotator names as an argument. Here let's explore some basic annotators including tokenization, sentence splitting, part-of-speech tagging, lemmatization and named entity recognition (NER).\n",
        "\n",
        "Additionally, the client constructor accepts a `memory` argument, which specifies how much memory will be allocated to the background Java process. An `endpoint` option can be used to specify a port number used by the communication between the server and the client. The default port is 9000. However, since this port is pre-occupied by a system process in Colab, we'll manually set it to 9001 in the following example.\n",
        "\n",
        "Also, here we manually set `be_quiet=True` to avoid an IO issue in Colab notebooks. You should be able to use `be_quiet=False` on your own computer, which will print detailed logging information from CoreNLP during usage.\n",
        "\n",
        "For more options in constructing the clients, please refer to the [CoreNLP Client Options List](https://stanfordnlp.github.io/stanza/corenlp_client.html#corenlp-client-options)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mbOBugvd9JaM",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Construct a CoreNLPClient with some basic annotators, a memory allocation of 4GB, and port number 9001\n",
        "client = CoreNLPClient(\n",
        "    annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
        "    memory='4G', \n",
        "    endpoint='http://localhost:9001',\n",
        "    be_quiet=True)\n",
        "print(client)\n",
        "\n",
        "# Start the background server and wait for some time\n",
        "# Note that in practice this is totally optional, as by default the server will be started when the first annotation is performed\n",
        "client.start()\n",
        "import time; time.sleep(10)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kgTiVjNydmIW",
        "colab_type": "text"
      },
      "source": [
        "After the above code block finishes executing, if you print the background processes, you should be able to find the Java CoreNLP server running."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "spZrJ-oFdkdF",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Print background processes and look for java\n",
        "# You should be able to see a StanfordCoreNLPServer java process running in the background\n",
        "!ps -o pid,cmd | grep java"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KxJeJ0D2LoOs",
        "colab_type": "text"
      },
      "source": [
        "### Annotating Text\n",
        "\n",
        "Annotating a piece of text is as simple as passing the text into an `annotate` function of the client object. After the annotation is complete, a `Document`  object will be returned with all annotations.\n",
        "\n",
        "Note that although in general annotations are very fast, the first annotation might take a while to complete in the notebook. Please stay patient."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "s194RnNg5z95",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Annotate some text\n",
        "text = \"Albert Einstein was a German-born theoretical physicist. He developed the theory of relativity.\"\n",
        "document = client.annotate(text)\n",
        "print(type(document))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "semmA3e0TcM1",
        "colab_type": "text"
      },
      "source": [
        "## 3. Accessing Annotations\n",
        "\n",
        "Annotations can be accessed from the returned `Document` object.\n",
        "\n",
        "A `Document` contains a list of `Sentence`s, which contain a list of `Token`s. Here let's first explore the annotations stored in all tokens."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lIO4B5d6Rk4I",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Iterate over all tokens in all sentences, and print out the word, lemma, pos and ner tags\n",
        "print(\"{:12s}\\t{:12s}\\t{:6s}\\t{}\".format(\"Word\", \"Lemma\", \"POS\", \"NER\"))\n",
        "\n",
        "for i, sent in enumerate(document.sentence):\n",
        "    print(\"[Sentence {}]\".format(i+1))\n",
        "    for t in sent.token:\n",
        "        print(\"{:12s}\\t{:12s}\\t{:6s}\\t{}\".format(t.word, t.lemma, t.pos, t.ner))\n",
        "    print(\"\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "msrJfvu8VV9m",
        "colab_type": "text"
      },
      "source": [
        "Alternatively, you can also browse the NER results by iterating over the entity mentions in each sentence. For example:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ezEjc9LeV2Xs",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Iterate over all detected entity mentions\n",
        "print(\"{:30s}\\t{}\".format(\"Mention\", \"Type\"))\n",
        "\n",
        "for sent in document.sentence:\n",
        "    for m in sent.mentions:\n",
        "        print(\"{:30s}\\t{}\".format(m.entityMentionText, m.entityType))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ueGzBZ3hWzkN",
        "colab_type": "text"
      },
      "source": [
        "To print all annotations a sentence, token or mention has, you can simply print the corresponding object."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4_S8o2BHXIed",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Print annotations of a token\n",
        "print(document.sentence[0].token[0])\n",
        "\n",
        "# Print annotations of a mention\n",
        "print(document.sentence[0].mentions[0])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Qp66wjZ10xia",
        "colab_type": "text"
      },
      "source": [
        "**Note**: Since the Stanza CoreNLP client interface simply ports the CoreNLP annotation results to native Python objects, for a comprehensive list of available annotators and how their annotation results can be accessed, you will need to visit the [Stanford CoreNLP website](https://stanfordnlp.github.io/CoreNLP/)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IPqzMK90X0w3",
        "colab_type": "text"
      },
      "source": [
        "## 4. Shutting Down the CoreNLP Server\n",
        "\n",
        "To shut down the background CoreNLP server process, simply call the `stop` function of the client. Note that once a server is shut down, you'll have to restart the server with the `start()` function before any annotation is requested."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xrJq8lZ3Nw7b",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Shut down the background CoreNLP server\n",
        "client.stop()\n",
        "\n",
        "# Wait a few seconds, then list java processes again to check that the server is gone\n",
        "time.sleep(10)\n",
        "!ps -o pid,cmd | grep java"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "23Vwa_ifYfF7",
        "colab_type": "text"
      },
      "source": [
        "### More Information\n",
        "\n",
        "For more information on how to use the `CoreNLPClient`, please go to the [CoreNLPClient documentation page](https://stanfordnlp.github.io/stanza/corenlp_client.html)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YUrVT6kA_Bzx",
        "colab_type": "text"
      },
      "source": [
        "## 5. Simplifying Client Usage with the Python `with` statement\n",
        "\n",
        "In the above demo, we explicitly called the `client.start()` and `client.stop()` functions to start and stop a client-server connection. However, doing this in practice is usually suboptimal, since you may forget to call the `stop()` function at the end, resulting in an unused server process occupying your machine memory.\n",
        "\n",
        "To solve this, a simple solution is to use the client interface with the [Python `with` statement](https://docs.python.org/3/reference/compound_stmts.html#the-with-statement). The `with` statement provides an elegant way to automatically start and stop the server process in your Python program, without you needing to worry about this. The following code snippet demonstrates how to establish a client, annotate an example text and then stop the server with a simple `with` statement. Note that we **always recommend** using the `with` statement when working with the Stanza CoreNLP client interface."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "H0ct2-R4AvJh",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "print(\"Starting a server with the Python \\\"with\\\" statement...\")\n",
        "with CoreNLPClient(annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner'], \n",
        "                   memory='4G', endpoint='http://localhost:9001', be_quiet=True) as client:\n",
        "    text = \"Albert Einstein was a German-born theoretical physicist.\"\n",
        "    document = client.annotate(text)\n",
        "\n",
        "    print(\"{:30s}\\t{}\".format(\"Mention\", \"Type\"))\n",
        "    for sent in document.sentence:\n",
        "        for m in sent.mentions:\n",
        "            print(\"{:30s}\\t{}\".format(m.entityMentionText, m.entityType))\n",
        "\n",
        "print(\"\\nThe server should be stopped upon exit from the \\\"with\\\" statement.\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "W435Lwc4YqKb",
        "colab_type": "text"
      },
      "source": [
        "## 6. Other Resources\n",
        "\n",
        "- [Stanza Homepage](https://stanfordnlp.github.io/stanza/)\n",
        "- [FAQs](https://stanfordnlp.github.io/stanza/faq.html)\n",
        "- [GitHub Repo](https://github.com/stanfordnlp/stanza)\n",
        "- [Reporting Issues](https://github.com/stanfordnlp/stanza/issues)\n"
      ]
    }
  ]
}

================================================
FILE: demo/arabic_test.conllu.txt
================================================
# newdoc id = assabah.20041005.0017
# newpar id = assabah.20041005.0017:p1
# sent_id = assabah.20041005.0017:p1u1
# text = سوريا: تعديل وزاري واسع يشمل 8 حقائب
# orig_file_sentence ASB_ARB_20041005.0017#1
1	سوريا	سُورِيَا	X	X---------	Foreign=Yes	0	root	0:root	SpaceAfter=No|Vform=سُورِيَا|Gloss=Syria|Root=sUr|Translit=sūriyā|LTranslit=sūriyā
2	:	:	PUNCT	G---------	_	1	punct	1:punct	Vform=:|Translit=:
3	تعديل	تَعدِيل	NOUN	N------S1I	Case=Nom|Definite=Ind|Number=Sing	6	nsubj	6:nsubj	Vform=تَعدِيلٌ|Gloss=adjustment,change,modification,amendment|Root=`_d_l|Translit=taʿdīlun|LTranslit=taʿdīl
4	وزاري	وِزَارِيّ	ADJ	A-----MS1I	Case=Nom|Definite=Ind|Gender=Masc|Number=Sing	3	amod	3:amod	Vform=وِزَارِيٌّ|Gloss=ministry,ministerial|Root=w_z_r|Translit=wizārīyun|LTranslit=wizārīy
5	واسع	وَاسِع	ADJ	A-----MS1I	Case=Nom|Definite=Ind|Gender=Masc|Number=Sing	3	amod	3:amod	Vform=وَاسِعٌ|Gloss=wide,extensive,broad|Root=w_s_`|Translit=wāsiʿun|LTranslit=wāsiʿ
6	يشمل	شَمِل	VERB	VIIA-3MS--	Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Person=3|VerbForm=Fin|Voice=Act	1	parataxis	1:parataxis	Vform=يَشمَلُ|Gloss=comprise,include,contain|Root=^s_m_l|Translit=yašmalu|LTranslit=šamil
7	8	8	NUM	Q---------	NumForm=Digit	6	obj	6:obj	Vform=٨|Translit=8
8	حقائب	حَقِيبَة	NOUN	N------P2I	Case=Gen|Definite=Ind|Number=Plur	7	nmod	7:nmod:gen	Vform=حَقَائِبَ|Gloss=briefcase,suitcase,portfolio,luggage|Root=.h_q_b|Translit=ḥaqāʾiba|LTranslit=ḥaqībat

# newpar id = assabah.20041005.0017:p2
# sent_id = assabah.20041005.0017:p2u1
# text = دمشق (وكالات الانباء) - اجرى الرئيس السوري بشار الاسد تعديلا حكومياً واسعا تم بموجبه إقالة وزيري الداخلية والاعلام عن منصبيها في حين ظل محمد ناجي العطري رئيساً للحكومة.
# orig_file_sentence ASB_ARB_20041005.0017#2
1	دمشق	دمشق	X	U---------	_	0	root	0:root	Vform=دمشق|Root=OOV|Translit=dmšq
2	(	(	PUNCT	G---------	_	3	punct	3:punct	SpaceAfter=No|Vform=(|Translit=(
3	وكالات	وِكَالَة	NOUN	N------P1R	Case=Nom|Definite=Cons|Number=Plur	1	dep	1:dep	Vform=وِكَالَاتُ|Gloss=agency|Root=w_k_l|Translit=wikālātu|LTranslit=wikālat
4	الانباء	نَبَأ	NOUN	N------P2D	Case=Gen|Definite=Def|Number=Plur	3	nmod	3:nmod:gen	SpaceAfter=No|Vform=اَلأَنبَاءِ|Gloss=news_item,report|Root=n_b_'|Translit=al-ʾanbāʾi|LTranslit=nabaʾ
5	)	)	PUNCT	G---------	_	3	punct	3:punct	Vform=)|Translit=)
6	-	-	PUNCT	G---------	_	1	punct	1:punct	Vform=-|Translit=-
7	اجرى	أَجرَى	VERB	VP-A-3MS--	Aspect=Perf|Gender=Masc|Number=Sing|Person=3|Voice=Act	1	advcl	1:advcl:فِي_حِينَ	Vform=أَجرَى|Gloss=conduct,carry_out,perform|Root=^g_r_y|Translit=ʾaǧrā|LTranslit=ʾaǧrā
8	الرئيس	رَئِيس	NOUN	N------S1D	Case=Nom|Definite=Def|Number=Sing	7	nsubj	7:nsubj	Vform=اَلرَّئِيسُ|Gloss=president,head,chairman|Root=r_'_s|Translit=ar-raʾīsu|LTranslit=raʾīs
9	السوري	سُورِيّ	ADJ	A-----MS1D	Case=Nom|Definite=Def|Gender=Masc|Number=Sing	8	amod	8:amod	Vform=اَلسُّورِيُّ|Gloss=Syrian|Root=sUr|Translit=as-sūrīyu|LTranslit=sūrīy
10	بشار	بشار	X	U---------	_	11	nmod	11:nmod	Vform=بشار|Root=OOV|Translit=bšār
11	الاسد	الاسد	X	U---------	_	8	nmod	8:nmod	Vform=الاسد|Root=OOV|Translit=ālāsd
12	تعديلا	تَعدِيل	NOUN	N------S4I	Case=Acc|Definite=Ind|Number=Sing	7	obj	7:obj	Vform=تَعدِيلًا|Gloss=adjustment,change,modification,amendment|Root=`_d_l|Translit=taʿdīlan|LTranslit=taʿdīl
13	حكومياً	حُكُومِيّ	ADJ	A-----MS4I	Case=Acc|Definite=Ind|Gender=Masc|Number=Sing	12	amod	12:amod	Vform=حُكُومِيًّا|Gloss=governmental,state,official|Root=.h_k_m|Translit=ḥukūmīyan|LTranslit=ḥukūmīy
14	واسعا	وَاسِع	ADJ	A-----MS4I	Case=Acc|Definite=Ind|Gender=Masc|Number=Sing	12	amod	12:amod	Vform=وَاسِعًا|Gloss=wide,extensive,broad|Root=w_s_`|Translit=wāsiʿan|LTranslit=wāsiʿ
15	تم	تَمّ	VERB	VP-A-3MS--	Aspect=Perf|Gender=Masc|Number=Sing|Person=3|Voice=Act	12	acl	12:acl	Vform=تَمَّ|Gloss=conclude,take_place|Root=t_m_m|Translit=tamma|LTranslit=tamm
16-18	بموجبه	_	_	_	_	_	_	_	_
16	ب	بِ	ADP	P---------	AdpType=Prep	18	case	18:case	Vform=بِ|Gloss=by,with|Root=bi|Translit=bi|LTranslit=bi
17	موجب	مُوجِب	NOUN	N------S2R	Case=Gen|Definite=Cons|Number=Sing	16	fixed	16:fixed	Vform=مُوجِبِ|Gloss=reason,motive|Root=w_^g_b|Translit=mūǧibi|LTranslit=mūǧib
18	ه	هُوَ	PRON	SP---3MS2-	Case=Gen|Gender=Masc|Number=Sing|Person=3|PronType=Prs	15	nmod	15:nmod:بِ_مُوجِب:gen	Vform=هِ|Gloss=he,she,it|Translit=hi|LTranslit=huwa
19	إقالة	إِقَالَة	NOUN	N------S1R	Case=Nom|Definite=Cons|Number=Sing	15	nsubj	15:nsubj	Vform=إِقَالَةُ|Gloss=dismissal,discharge|Root=q_y_l|Translit=ʾiqālatu|LTranslit=ʾiqālat
20	وزيري	وَزِير	NOUN	N------D2R	Case=Gen|Definite=Cons|Number=Dual	19	nmod	19:nmod:gen	Vform=وَزِيرَي|Gloss=minister|Root=w_z_r|Translit=wazīray|LTranslit=wazīr
21	الداخلية	دَاخِلِيّ	ADJ	A-----FS2D	Case=Gen|Definite=Def|Gender=Fem|Number=Sing	20	amod	20:amod	Vform=اَلدَّاخِلِيَّةِ|Gloss=internal,domestic,interior,of_state|Root=d__h_l|Translit=ad-dāḫilīyati|LTranslit=dāḫilīy
22-23	والاعلام	_	_	_	_	_	_	_	_
22	و	وَ	CCONJ	C---------	_	23	cc	23:cc	Vform=وَ|Gloss=and|Root=wa|Translit=wa|LTranslit=wa
23	الإعلام	إِعلَام	NOUN	N------S2D	Case=Gen|Definite=Def|Number=Sing	21	conj	20:amod|21:conj	Vform=اَلإِعلَامِ|Gloss=information,media|Root=`_l_m|Translit=al-ʾiʿlāmi|LTranslit=ʾiʿlām
24	عن	عَن	ADP	P---------	AdpType=Prep	25	case	25:case	Vform=عَن|Gloss=about,from|Root=`an|Translit=ʿan|LTranslit=ʿan
25-26	منصبيها	_	_	_	_	_	_	_	_
25	منصبي	مَنصِب	NOUN	N------D2R	Case=Gen|Definite=Cons|Number=Dual	19	nmod	19:nmod:عَن:gen	Vform=مَنصِبَي|Gloss=post,position,office|Root=n_.s_b|Translit=manṣibay|LTranslit=manṣib
26	ها	هُوَ	PRON	SP---3FS2-	Case=Gen|Gender=Fem|Number=Sing|Person=3|PronType=Prs	25	nmod	25:nmod:gen	Vform=هَا|Gloss=he,she,it|Translit=hā|LTranslit=huwa
27	في	فِي	ADP	P---------	AdpType=Prep	7	mark	7:mark	Vform=فِي|Gloss=in|Root=fI|Translit=fī|LTranslit=fī
28	حين	حِينَ	ADP	PI------2-	AdpType=Prep|Case=Gen	7	mark	7:mark	Vform=حِينِ|Gloss=when|Root=.h_y_n|Translit=ḥīni|LTranslit=ḥīna
29	ظل	ظَلّ	VERB	VP-A-3MS--	Aspect=Perf|Gender=Masc|Number=Sing|Person=3|Voice=Act	7	parataxis	7:parataxis	Vform=ظَلَّ|Gloss=remain,continue|Root=.z_l_l|Translit=ẓalla|LTranslit=ẓall
30	محمد	محمد	X	U---------	_	32	nmod	32:nmod	Vform=محمد|Root=OOV|Translit=mḥmd
31	ناجي	ناجي	X	U---------	_	32	nmod	32:nmod	Vform=ناجي|Root=OOV|Translit=nāǧy
32	العطري	العطري	X	U---------	_	29	nsubj	29:nsubj	Vform=العطري|Root=OOV|Translit=ālʿṭry
33	رئيساً	رَئِيس	NOUN	N------S4I	Case=Acc|Definite=Ind|Number=Sing	29	xcomp	29:xcomp	Vform=رَئِيسًا|Gloss=president,head,chairman|Root=r_'_s|Translit=raʾīsan|LTranslit=raʾīs
34-35	للحكومة	_	_	_	_	_	_	_	SpaceAfter=No
34	ل	لِ	ADP	P---------	AdpType=Prep	35	case	35:case	Vform=لِ|Gloss=for,to|Root=l|Translit=li|LTranslit=li
35	الحكومة	حُكُومَة	NOUN	N------S2D	Case=Gen|Definite=Def|Number=Sing	33	nmod	33:nmod:لِ:gen	Vform=اَلحُكُومَةِ|Gloss=government,administration|Root=.h_k_m|Translit=al-ḥukūmati|LTranslit=ḥukūmat
36	.	.	PUNCT	G---------	_	1	punct	1:punct	Vform=.|Translit=.

# newpar id = assabah.20041005.0017:p3
# sent_id = assabah.20041005.0017:p3u1
# text = واضافت المصادر ان مهدي دخل الله رئيس تحرير صحيفة الحزب الحاكم والليبرالي التوجهات تسلم منصب وزير الاعلام خلفا لاحمد الحسن فيما تسلم اللواء غازي كنعان رئيس شعبة الامن السياسي منصب وزير الداخلية.
# orig_file_sentence ASB_ARB_20041005.0017#3
1-2	واضافت	_	_	_	_	_	_	_	_
1	و	وَ	CCONJ	C---------	_	0	root	0:root	Vform=وَ|Gloss=and|Root=wa|Translit=wa|LTranslit=wa
2	أضافت	أَضَاف	VERB	VP-A-3FS--	Aspect=Perf|Gender=Fem|Number=Sing|Person=3|Voice=Act	1	parataxis	1:parataxis	Vform=أَضَافَت|Gloss=add,attach,receive_as_guest|Root=.d_y_f|Translit=ʾaḍāfat|LTranslit=ʾaḍāf
3	المصادر	مَصدَر	NOUN	N------P1D	Case=Nom|Definite=Def|Number=Plur	2	nsubj	2:nsubj	Vform=اَلمَصَادِرُ|Gloss=source|Root=.s_d_r|Translit=al-maṣādiru|LTranslit=maṣdar
4	ان	أَنَّ	SCONJ	C---------	_	16	mark	16:mark	Vform=أَنَّ|Gloss=that|Root='_n|Translit=ʾanna|LTranslit=ʾanna
5	مهدي	مهدي	X	U---------	_	6	nmod	6:nmod	Vform=مهدي|Root=OOV|Translit=mhdy
6	دخل	دخل	X	U---------	_	16	nsubj	16:nsubj	Vform=دخل|Root=OOV|Translit=dḫl
7	الله	الله	X	U---------	_	6	nmod	6:nmod	Vform=الله|Root=OOV|Translit=āllh
8	رئيس	رَئِيس	NOUN	N------S4R	Case=Acc|Definite=Cons|Number=Sing	6	nmod	6:nmod:acc	Vform=رَئِيسَ|Gloss=president,head,chairman|Root=r_'_s|Translit=raʾīsa|LTranslit=raʾīs
9	تحرير	تَحرِير	NOUN	N------S2R	Case=Gen|Definite=Cons|Number=Sing	8	nmod	8:nmod:gen	Vform=تَحرِيرِ|Gloss=liberation,liberating,editorship,editing|Root=.h_r_r|Translit=taḥrīri|LTranslit=taḥrīr
10	صحيفة	صَحِيفَة	NOUN	N------S2R	Case=Gen|Definite=Cons|Number=Sing	9	nmod	9:nmod:gen	Vform=صَحِيفَةِ|Gloss=newspaper,sheet,leaf|Root=.s_.h_f|Translit=ṣaḥīfati|LTranslit=ṣaḥīfat
11	الحزب	حِزب	NOUN	N------S2D	Case=Gen|Definite=Def|Number=Sing	10	nmod	10:nmod:gen	Vform=اَلحِزبِ|Gloss=party,band|Root=.h_z_b|Translit=al-ḥizbi|LTranslit=ḥizb
12	الحاكم	حَاكِم	NOUN	N------S2D	Case=Gen|Definite=Def|Number=Sing	11	nmod	11:nmod:gen	Vform=اَلحَاكِمِ|Gloss=ruler,governor|Root=.h_k_m|Translit=al-ḥākimi|LTranslit=ḥākim
13-14	والليبرالي	_	_	_	_	_	_	_	_
13	و	وَ	CCONJ	C---------	_	6	cc	6:cc	Vform=وَ|Gloss=and|Root=wa|Translit=wa|LTranslit=wa
14	الليبرالي	لِيبِرَالِيّ	ADJ	A-----MS4D	Case=Acc|Definite=Def|Gender=Masc|Number=Sing	6	amod	6:amod	Vform=اَللِّيبِرَالِيَّ|Gloss=liberal|Root=lIbirAl|Translit=al-lībirālīya|LTranslit=lībirālīy
15	التوجهات	تَوَجُّه	NOUN	N------P2D	Case=Gen|Definite=Def|Number=Plur	14	nmod	14:nmod:gen	Vform=اَلتَّوَجُّهَاتِ|Gloss=attitude,approach|Root=w_^g_h|Translit=at-tawaǧǧuhāti|LTranslit=tawaǧǧuh
16	تسلم	تَسَلَّم	VERB	VP-A-3MS--	Aspect=Perf|Gender=Masc|Number=Sing|Person=3|Voice=Act	2	ccomp	2:ccomp	Vform=تَسَلَّمَ|Gloss=receive,assume|Root=s_l_m|Translit=tasallama|LTranslit=tasallam
17	منصب	مَنصِب	NOUN	N------S4R	Case=Acc|Definite=Cons|Number=Sing	16	obj	16:obj	Vform=مَنصِبَ|Gloss=post,position,office|Root=n_.s_b|Translit=manṣiba|LTranslit=manṣib
18	وزير	وَزِير	NOUN	N------S2R	Case=Gen|Definite=Cons|Number=Sing	17	nmod	17:nmod:gen	Vform=وَزِيرِ|Gloss=minister|Root=w_z_r|Translit=wazīri|LTranslit=wazīr
19	الاعلام	عَلَم	NOUN	N------P2D	Case=Gen|Definite=Def|Number=Plur	18	nmod	18:nmod:gen	Vform=اَلأَعلَامِ|Gloss=flag,banner,badge|Root=`_l_m|Translit=al-ʾaʿlāmi|LTranslit=ʿalam
20	خلفا	خَلَف	NOUN	N------S4I	Case=Acc|Definite=Ind|Number=Sing	16	obl	16:obl:acc	Vform=خَلَفًا|Gloss=substitute,scion|Root=_h_l_f|Translit=ḫalafan|LTranslit=ḫalaf
21-22	لاحمد	_	_	_	_	_	_	_	_
21	ل	لِ	ADP	P---------	AdpType=Prep	23	case	23:case	Vform=لِ|Gloss=for,to|Root=l|Translit=li|LTranslit=li
22	أحمد	أَحمَد	NOUN	N------S2I	Case=Gen|Definite=Ind|Number=Sing	23	nmod	23:nmod:gen	Vform=أَحمَدَ|Gloss=Ahmad|Root=.h_m_d|Translit=ʾaḥmada|LTranslit=ʾaḥmad
23	الحسن	الحسن	X	U---------	_	20	nmod	20:nmod:لِ	Vform=الحسن|Root=OOV|Translit=ālḥsn
24	فيما	فِيمَا	CCONJ	C---------	_	25	cc	25:cc	Vform=فِيمَا|Gloss=while,during_which|Root=fI|Translit=fīmā|LTranslit=fīmā
25	تسلم	تَسَلَّم	VERB	VP-A-3MS--	Aspect=Perf|Gender=Masc|Number=Sing|Person=3|Voice=Act	16	conj	2:ccomp|16:conj	Vform=تَسَلَّمَ|Gloss=receive,assume|Root=s_l_m|Translit=tasallama|LTranslit=tasallam
26	اللواء	لِوَاء	NOUN	N------S1D	Case=Nom|Definite=Def|Number=Sing	25	nsubj	25:nsubj	Vform=اَللِّوَاءُ|Gloss=banner,flag|Root=l_w_y|Translit=al-liwāʾu|LTranslit=liwāʾ
27	غازي	غازي	X	U---------	_	28	nmod	28:nmod	Vform=غازي|Root=OOV|Translit=ġāzy
28	كنعان	كنعان	X	U---------	_	26	nmod	26:nmod	Vform=كنعان|Root=OOV|Translit=knʿān
29	رئيس	رَئِيس	NOUN	N------S1R	Case=Nom|Definite=Cons|Number=Sing	26	nmod	26:nmod:nom	Vform=رَئِيسُ|Gloss=president,head,chairman|Root=r_'_s|Translit=raʾīsu|LTranslit=raʾīs
30	شعبة	شُعبَة	NOUN	N------S2R	Case=Gen|Definite=Cons|Number=Sing	29	nmod	29:nmod:gen	Vform=شُعبَةِ|Gloss=branch,subdivision|Root=^s_`_b|Translit=šuʿbati|LTranslit=šuʿbat
31	الامن	أَمن	NOUN	N------S2D	Case=Gen|Definite=Def|Number=Sing	30	nmod	30:nmod:gen	Vform=اَلأَمنِ|Gloss=security,safety|Root='_m_n|Translit=al-ʾamni|LTranslit=ʾamn
32	السياسي	سِيَاسِيّ	ADJ	A-----MS2D	Case=Gen|Definite=Def|Gender=Masc|Number=Sing	31	amod	31:amod	Vform=اَلسِّيَاسِيِّ|Gloss=political|Root=s_w_s|Translit=as-siyāsīyi|LTranslit=siyāsīy
33	منصب	مَنصِب	NOUN	N------S4R	Case=Acc|Definite=Cons|Number=Sing	25	obj	25:obj	Vform=مَنصِبَ|Gloss=post,position,office|Root=n_.s_b|Translit=manṣiba|LTranslit=manṣib
34	وزير	وَزِير	NOUN	N------S2R	Case=Gen|Definite=Cons|Number=Sing	33	nmod	33:nmod:gen	Vform=وَزِيرِ|Gloss=minister|Root=w_z_r|Translit=wazīri|LTranslit=wazīr
35	الداخلية	دَاخِلِيّ	ADJ	A-----FS2D	Case=Gen|Definite=Def|Gender=Fem|Number=Sing	34	amod	34:amod	SpaceAfter=No|Vform=اَلدَّاخِلِيَّةِ|Gloss=internal,domestic,interior,of_state|Root=d__h_l|Translit=ad-dāḫilīyati|LTranslit=dāḫilīy
36	.	.	PUNCT	G---------	_	1	punct	1:punct	Vform=.|Translit=.

# newpar id = assabah.20041005.0017:p4
# sent_id = assabah.20041005.0017:p4u1
# text = وذكرت وكالة الانباء السورية ان التعديل شمل ثماني حقائب بينها وزارتا الداخلية والاقتصاد.
# orig_file_sentence ASB_ARB_20041005.0017#4
1-2	وذكرت	_	_	_	_	_	_	_	_
1	و	وَ	CCONJ	C---------	_	0	root	0:root	Vform=وَ|Gloss=and|Root=wa|Translit=wa|LTranslit=wa
2	ذكرت	ذَكَر	VERB	VP-A-3FS--	Aspect=Perf|Gender=Fem|Number=Sing|Person=3|Voice=Act	1	parataxis	1:parataxis	Vform=ذَكَرَت|Gloss=mention,cite,remember|Root=_d_k_r|Translit=ḏakarat|LTranslit=ḏakar
3	وكالة	وِكَالَة	NOUN	N------S1R	Case=Nom|Definite=Cons|Number=Sing	2	nsubj	2:nsubj	Vform=وِكَالَةُ|Gloss=agency|Root=w_k_l|Translit=wikālatu|LTranslit=wikālat
4	الانباء	نَبَأ	NOUN	N------P2D	Case=Gen|Definite=Def|Number=Plur	3	nmod	3:nmod:gen	Vform=اَلأَنبَاءِ|Gloss=news_item,report|Root=n_b_'|Translit=al-ʾanbāʾi|LTranslit=nabaʾ
5	السورية	سُورِيّ	ADJ	A-----FS1D	Case=Nom|Definite=Def|Gender=Fem|Number=Sing	3	amod	3:amod	Vform=اَلسُّورِيَّةُ|Gloss=Syrian|Root=sUr|Translit=as-sūrīyatu|LTranslit=sūrīy
6	ان	أَنَّ	SCONJ	C---------	_	8	mark	8:mark	Vform=أَنَّ|Gloss=that|Root='_n|Translit=ʾanna|LTranslit=ʾanna
7	التعديل	تَعدِيل	NOUN	N------S4D	Case=Acc|Definite=Def|Number=Sing	8	obl	8:obl:acc	Vform=اَلتَّعدِيلَ|Gloss=adjustment,change,modification,amendment|Root=`_d_l|Translit=at-taʿdīla|LTranslit=taʿdīl
8	شمل	شَمِل	VERB	VP-A-3MS--	Aspect=Perf|Gender=Masc|Number=Sing|Person=3|Voice=Act	2	ccomp	2:ccomp	Vform=شَمِلَ|Gloss=comprise,include,contain|Root=^s_m_l|Translit=šamila|LTranslit=šamil
9	ثماني	ثَمَانُون	NUM	QL------4R	Case=Acc|Definite=Cons|NumForm=Word	8	obj	8:obj	Vform=ثَمَانِي|Gloss=eighty|Root=_t_m_n|Translit=ṯamānī|LTranslit=ṯamānūn
10	حقائب	حَقِيبَة	NOUN	N------P2I	Case=Gen|Definite=Ind|Number=Plur	9	nmod	9:nmod:gen	Vform=حَقَائِبَ|Gloss=briefcase,suitcase,portfolio,luggage|Root=.h_q_b|Translit=ḥaqāʾiba|LTranslit=ḥaqībat
11-12	بينها	_	_	_	_	_	_	_	_
11	بين	بَينَ	ADP	PI------4-	AdpType=Prep|Case=Acc	12	case	12:case	Vform=بَينَ|Gloss=between,among|Root=b_y_n|Translit=bayna|LTranslit=bayna
12	ها	هُوَ	PRON	SP---3FS2-	Case=Gen|Gender=Fem|Number=Sing|Person=3|PronType=Prs	10	obl	10:obl:بَينَ:gen	Vform=هَا|Gloss=he,she,it|Translit=hā|LTranslit=huwa
13	وزارتا	وِزَارَة	NOUN	N------D1R	Case=Nom|Definite=Cons|Number=Dual	12	nsubj	12:nsubj	Vform=وِزَارَتَا|Gloss=ministry|Root=w_z_r|Translit=wizāratā|LTranslit=wizārat
14	الداخلية	دَاخِلِيّ	ADJ	A-----FS2D	Case=Gen|Definite=Def|Gender=Fem|Number=Sing	13	amod	13:amod	Vform=اَلدَّاخِلِيَّةِ|Gloss=internal,domestic,interior,of_state|Root=d__h_l|Translit=ad-dāḫilīyati|LTranslit=dāḫilīy
15-16	والاقتصاد	_	_	_	_	_	_	_	SpaceAfter=No
15	و	وَ	CCONJ	C---------	_	16	cc	16:cc	Vform=وَ|Gloss=and|Root=wa|Translit=wa|LTranslit=wa
16	الاقتصاد	اِقتِصَاد	NOUN	N------S2D	Case=Gen|Definite=Def|Number=Sing	14	conj	13:amod|14:conj	Vform=اَلِاقتِصَادِ|Gloss=economy,saving|Root=q_.s_d|Translit=al-i-ʼqtiṣādi|LTranslit=iqtiṣād
17	.	.	PUNCT	G---------	_	1	punct	1:punct	Vform=.|Translit=.

================================================
FILE: demo/corenlp.py
================================================
from stanza.server import CoreNLPClient

# example text
print('---')
print('input text')
print('')

text = "Chris Manning is a nice person. Chris wrote a simple sentence. He also gives oranges to people."

print(text)

# announce that the server is starting
print('---')
print('starting up Java Stanford CoreNLP Server...')

# set up the client
with CoreNLPClient(annotators=['tokenize','ssplit','pos','lemma','ner','parse','depparse','coref'], timeout=60000, memory='16G') as client:
    # submit the request to the server
    ann = client.annotate(text)

    # get the first sentence
    sentence = ann.sentence[0]

    # get the dependency parse of the first sentence
    print('---')
    print('dependency parse of first sentence')
    dependency_parse = sentence.basicDependencies
    print(dependency_parse)
 
    # get the constituency parse of the first sentence
    print('---')
    print('constituency parse of first sentence')
    constituency_parse = sentence.parseTree
    print(constituency_parse)

    # get the first subtree of the constituency parse
    print('---')
    print('first subtree of constituency parse')
    print(constituency_parse.child[0])

    # get the value of the first subtree
    print('---')
    print('value of first subtree of constituency parse')
    print(constituency_parse.child[0].value)

    # get the first token of the first sentence
    print('---')
    print('first token of first sentence')
    token = sentence.token[0]
    print(token)

    # get the part-of-speech tag
    print('---')
    print('part of speech tag of token')
    print(token.pos)

    # get the named entity tag
    print('---')
    print('named entity tag of token')
    print(token.ner)

    # get an entity mention from the first sentence
    print('---')
    print('first entity mention in sentence')
    print(sentence.mentions[0])

    # access the coref chain
    print('---')
    print('coref chains for the example')
    print(ann.corefChain)

    # Use tokensregex patterns to find who wrote a sentence.
    pattern = '([ner: PERSON]+) /wrote/ /an?/ []{0,3} /sentence|article/'
    matches = client.tokensregex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 3
    # length tells you whether or not there are any matches in this sentence.
    assert matches["sentences"][1]["length"] == 1
    # You can access matches like most regex groups.
    assert matches["sentences"][1]["0"]["text"] == "Chris wrote a simple sentence"
    assert matches["sentences"][1]["0"]["1"]["text"] == "Chris"

    # Use semgrex patterns to directly find who wrote what.
    pattern = '{word:wrote} >nsubj {}=subject >obj {}=object'
    matches = client.semgrex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 3
    # length tells you whether or not there are any matches in this sentence.
    assert matches["sentences"][1]["length"] == 1
    # You can access matches like most regex groups.
    assert matches["sentences"][1]["0"]["text"] == "wrote"
    assert matches["sentences"][1]["0"]["$subject"]["text"] == "Chris"
    assert matches["sentences"][1]["0"]["$object"]["text"] == "sentence"
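

# Illustrative sketch, not part of the original demo: the assertions above
# suggest that tokensregex/semgrex results look like
#   {"sentences": [{"length": N, "0": {"text": ...}, "1": {...}, ...}, ...]}
# Assuming that shape holds, the hypothetical helper below (collect_match_texts
# is our own name, not a Stanza API) gathers the matched text for each sentence.
def collect_match_texts(matches):
    """Return one list of matched strings per sentence in a match result."""
    texts = []
    for sentence in matches["sentences"]:
        # matches within a sentence are keyed by their index as a string
        texts.append([sentence[str(i)]["text"] for i in range(sentence["length"])])
    return texts


# Usage on a hand-written result of the assumed shape (no server needed):
#   collect_match_texts({"sentences": [{"length": 0}, {"length": 1, "0": {"text": "wrote"}}]})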



================================================
FILE: demo/en_test.conllu.txt
================================================
# newdoc id = weblog-blogspot.com_zentelligence_20040423000200_ENG_20040423_000200
# sent_id = weblog-blogspot.com_zentelligence_20040423000200_ENG_20040423_000200-0001
# newpar id = weblog-blogspot.com_zentelligence_20040423000200_ENG_20040423_000200-p0001
# text = What if Google Morphed Into GoogleOS?
1	What	what	PRON	WP	PronType=Int	0	root	0:root	_
2	if	if	SCONJ	IN	_	4	mark	4:mark	_
3	Google	Google	PROPN	NNP	Number=Sing	4	nsubj	4:nsubj	_
4	Morphed	morph	VERB	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	1	advcl	1:advcl:if	_
5	Into	into	ADP	IN	_	6	case	6:case	_
6	GoogleOS	GoogleOS	PROPN	NNP	Number=Sing	4	obl	4:obl:into	SpaceAfter=No
7	?	?	PUNCT	.	_	4	punct	4:punct	_

# sent_id = weblog-blogspot.com_zentelligence_20040423000200_ENG_20040423_000200-0002
# text = What if Google expanded on its search-engine (and now e-mail) wares into a full-fledged operating system?
1	What	what	PRON	WP	PronType=Int	0	root	0:root	_
2	if	if	SCONJ	IN	_	4	mark	4:mark	_
3	Google	Google	PROPN	NNP	Number=Sing	4	nsubj	4:nsubj	_
4	expanded	expand	VERB	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	1	advcl	1:advcl:if	_
5	on	on	ADP	IN	_	15	case	15:case	_
6	its	its	PRON	PRP$	Gender=Neut|Number=Sing|Person=3|Poss=Yes|PronType=Prs	15	nmod:poss	15:nmod:poss	_
7	search	search	NOUN	NN	Number=Sing	9	compound	9:compound	SpaceAfter=No
8	-	-	PUNCT	HYPH	_	9	punct	9:punct	SpaceAfter=No
9	engine	engine	NOUN	NN	Number=Sing	15	compound	15:compound	_
10	(	(	PUNCT	-LRB-	_	9	punct	9:punct	SpaceAfter=No
11	and	and	CCONJ	CC	_	13	cc	13:cc	_
12	now	now	ADV	RB	_	13	advmod	13:advmod	_
13	e-mail	e-mail	NOUN	NN	Number=Sing	9	conj	9:conj:and|15:compound	SpaceAfter=No
14	)	)	PUNCT	-RRB-	_	15	punct	15:punct	_
15	wares	wares	NOUN	NNS	Number=Plur	4	obl	4:obl:on	_
16	into	into	ADP	IN	_	22	case	22:case	_
17	a	a	DET	DT	Definite=Ind|PronType=Art	22	det	22:det	_
18	full	full	ADV	RB	_	20	advmod	20:advmod	SpaceAfter=No
19	-	-	PUNCT	HYPH	_	20	punct	20:punct	SpaceAfter=No
20	fledged	fledged	ADJ	JJ	Degree=Pos	22	amod	22:amod	_
21	operating	operating	NOUN	NN	Number=Sing	22	compound	22:compound	_
22	system	system	NOUN	NN	Number=Sing	4	obl	4:obl:into	SpaceAfter=No
23	?	?	PUNCT	.	_	4	punct	4:punct	_

# sent_id = weblog-blogspot.com_zentelligence_20040423000200_ENG_20040423_000200-0003
# text = [via Microsoft Watch from Mary Jo Foley ]
1	[	[	PUNCT	-LRB-	_	4	punct	4:punct	SpaceAfter=No
2	via	via	ADP	IN	_	4	case	4:case	_
3	Microsoft	Microsoft	PROPN	NNP	Number=Sing	4	compound	4:compound	_
4	Watch	Watch	PROPN	NNP	Number=Sing	0	root	0:root	_
5	from	from	ADP	IN	_	6	case	6:case	_
6	Mary	Mary	PROPN	NNP	Number=Sing	4	nmod	4:nmod:from	_
7	Jo	Jo	PROPN	NNP	Number=Sing	6	flat	6:flat	_
8	Foley	Foley	PROPN	NNP	Number=Sing	6	flat	6:flat	_
9	]	]	PUNCT	-RRB-	_	4	punct	4:punct	_

# newdoc id = weblog-blogspot.com_marketview_20050511222700_ENG_20050511_222700
# sent_id = weblog-blogspot.com_marketview_20050511222700_ENG_20050511_222700-0001
# newpar id = weblog-blogspot.com_marketview_20050511222700_ENG_20050511_222700-p0001
# text = (And, by the way, is anybody else just a little nostalgic for the days when that was a good thing?)
1	(	(	PUNCT	-LRB-	_	14	punct	14:punct	SpaceAfter=No
2	And	and	CCONJ	CC	_	14	cc	14:cc	SpaceAfter=No
3	,	,	PUNCT	,	_	14	punct	14:punct	_
4	by	by	ADP	IN	_	6	case	6:case	_
5	the	the	DET	DT	Definite=Def|PronType=Art	6	det	6:det	_
6	way	way	NOUN	NN	Number=Sing	14	obl	14:obl:by	SpaceAfter=No
7	,	,	PUNCT	,	_	14	punct	14:punct	_
8	is	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	14	cop	14:cop	_
9	anybody	anybody	PRON	NN	Number=Sing	14	nsubj	14:nsubj	_
10	else	else	ADJ	JJ	Degree=Pos	9	amod	9:amod	_
11	just	just	ADV	RB	_	13	advmod	13:advmod	_
12	a	a	DET	DT	Definite=Ind|PronType=Art	13	det	13:det	_
13	little	little	ADJ	JJ	Degree=Pos	14	obl:npmod	14:obl:npmod	_
14	nostalgic	nostalgic	NOUN	NN	Number=Sing	0	root	0:root	_
15	for	for	ADP	IN	_	17	case	17:case	_
16	the	the	DET	DT	Definite=Def|PronType=Art	17	det	17:det	_
17	days	day	NOUN	NNS	Number=Plur	14	nmod	14:nmod:for|23:obl:npmod	_
18	when	when	ADV	WRB	PronType=Rel	23	advmod	17:ref	_
19	that	that	PRON	DT	Number=Sing|PronType=Dem	23	nsubj	23:nsubj	_
20	was	be	AUX	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	23	cop	23:cop	_
21	a	a	DET	DT	Definite=Ind|PronType=Art	23	det	23:det	_
22	good	good	ADJ	JJ	Degree=Pos	23	amod	23:amod	_
23	thing	thing	NOUN	NN	Number=Sing	17	acl:relcl	17:acl:relcl	SpaceAfter=No
24	?	?	PUNCT	.	_	14	punct	14:punct	SpaceAfter=No
25	)	)	PUNCT	-RRB-	_	14	punct	14:punct	_

================================================
FILE: demo/japanese_test.conllu.txt
================================================
# newdoc id = test-s1
# sent_id = test-s1
# text = これに不快感を示す住民はいましたが,現在,表立って反対や抗議の声を挙げている住民はいないようです。
1	これ	此れ	PRON	代名詞	_	6	obl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=代名詞|SpaceAfter=No|UnidicInfo=,此れ,これ,これ,コレ,,,コレ,コレ,此れ
2	に	に	ADP	助詞-格助詞	_	1	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,に,に,に,ニ,,,ニ,ニ,に
3	不快	不快	NOUN	名詞-普通名詞-形状詞可能	_	4	compound	_	BunsetuBILabel=B|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,不快,不快,不快,フカイ,,,フカイ,フカイカン,不快感
4	感	感	NOUN	名詞-普通名詞-一般	_	6	obj	_	BunsetuBILabel=I|BunsetuPositionType=SEM_HEAD|LUWBILabel=I|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,感,感,感,カン,,,カン,フカイカン,不快感
5	を	を	ADP	助詞-格助詞	_	4	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,を,を,を,オ,,,ヲ,ヲ,を
6	示す	示す	VERB	動詞-一般-五段-サ行	_	7	acl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=動詞-一般-五段-サ行|SpaceAfter=No|UnidicInfo=,示す,示す,示す,シメス,,,シメス,シメス,示す
7	住民	住民	NOUN	名詞-普通名詞-一般	_	9	nsubj	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,住民,住民,住民,ジューミン,,,ジュウミン,ジュウミン,住民
8	は	は	ADP	助詞-係助詞	_	7	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-係助詞|SpaceAfter=No|UnidicInfo=,は,は,は,ワ,,,ハ,ハ,は
9	い	居る	VERB	動詞-非自立可能-上一段-ア行	_	29	advcl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=動詞-一般-上一段-ア行|PrevUDLemma=いる|SpaceAfter=No|UnidicInfo=,居る,い,いる,イ,,,イル,イル,居る
10	まし	ます	AUX	助動詞-助動詞-マス	_	9	aux	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助動詞-助動詞-マス|SpaceAfter=No|UnidicInfo=,ます,まし,ます,マシ,,,マス,マス,ます
11	た	た	AUX	助動詞-助動詞-タ	_	9	aux	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助動詞-助動詞-タ|SpaceAfter=No|UnidicInfo=,た,た,た,タ,,,タ,タ,た
12	が	が	SCONJ	助詞-接続助詞	_	9	mark	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助詞-接続助詞|SpaceAfter=No|UnidicInfo=,が,が,が,ガ,,,ガ,ガ,が
13	,	,	PUNCT	補助記号-読点	_	9	punct	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=補助記号-読点|SpaceAfter=No|UnidicInfo=,，,,,,,,,,，
14	現在	現在	ADV	名詞-普通名詞-副詞可能	_	16	advmod	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=副詞|SpaceAfter=No|UnidicInfo=,現在,現在,現在,ゲンザイ,,,ゲンザイ,ゲンザイ,現在
15	,	,	PUNCT	補助記号-読点	_	14	punct	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=補助記号-読点|SpaceAfter=No|UnidicInfo=,，,,,,,,,,，
16	表立っ	表立つ	VERB	動詞-一般-五段-タ行	_	24	advcl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=動詞-一般-五段-タ行|SpaceAfter=No|UnidicInfo=,表立つ,表立っ,表立つ,オモテダッ,,,オモテダツ,オモテダツ,表立つ
17	て	て	SCONJ	助詞-接続助詞	_	16	mark	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-接続助詞|SpaceAfter=No|UnidicInfo=,て,て,て,テ,,,テ,テ,て
18	反対	反対	NOUN	名詞-普通名詞-サ変形状詞可能	_	20	nmod	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,反対,反対,反対,ハンタイ,,,ハンタイ,ハンタイ,反対
19	や	や	ADP	助詞-副助詞	_	18	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-副助詞|SpaceAfter=No|UnidicInfo=,や,や,や,ヤ,,,ヤ,ヤ,や
20	抗議	抗議	NOUN	名詞-普通名詞-サ変可能	_	22	nmod	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,抗議,抗議,抗議,コーギ,,,コウギ,コウギ,抗議
21	の	の	ADP	助詞-格助詞	_	20	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,の,の,の,ノ,,,ノ,ノ,の
22	声	声	NOUN	名詞-普通名詞-一般	_	24	obj	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,声,声,声,コエ,,,コエ,コエ,声
23	を	を	ADP	助詞-格助詞	_	22	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,を,を,を,オ,,,ヲ,ヲ,を
24	挙げ	上げる	VERB	動詞-非自立可能-下一段-ガ行	_	27	acl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=動詞-一般-下一段-ガ行|SpaceAfter=No|UnidicInfo=,上げる,挙げ,挙げる,アゲ,,,アゲル,アゲル,上げる
25	て	て	SCONJ	助詞-接続助詞	_	24	mark	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助動詞-上一段-ア行|SpaceAfter=No|UnidicInfo=,て,て,て,テ,,,テ,テイル,ている
26	いる	居る	VERB	動詞-非自立可能-上一段-ア行	_	25	fixed	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=I|LUWPOS=助動詞-上一段-ア行|PrevUDLemma=いる|SpaceAfter=No|UnidicInfo=,居る,いる,いる,イル,,,イル,テイル,ている
27	住民	住民	NOUN	名詞-普通名詞-一般	_	29	nsubj	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,住民,住民,住民,ジューミン,,,ジュウミン,ジュウミン,住民
28	は	は	ADP	助詞-係助詞	_	27	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-係助詞|SpaceAfter=No|UnidicInfo=,は,は,は,ワ,,,ハ,ハ,は
29	い	居る	VERB	動詞-非自立可能-上一段-ア行	_	0	root	_	BunsetuBILabel=B|BunsetuPositionType=ROOT|LUWBILabel=B|LUWPOS=動詞-一般-上一段-ア行|PrevUDLemma=いる|SpaceAfter=No|UnidicInfo=,居る,い,いる,イ,,,イル,イル,居る
30	ない	ない	AUX	助動詞-助動詞-ナイ	Polarity=Neg	29	aux	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助動詞-助動詞-ナイ|SpaceAfter=No|UnidicInfo=,ない,ない,ない,ナイ,,,ナイ,ナイ,ない
31	よう	様	AUX	形状詞-助動詞語幹	_	29	aux	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=形状詞-助動詞語幹|PrevUDLemma=よう|SpaceAfter=No|UnidicInfo=,様,よう,よう,ヨー,,,ヨウ,ヨウ,様
32	です	です	AUX	助動詞-助動詞-デス	_	29	aux	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助動詞-助動詞-デス|PrevUDLemma=だ|SpaceAfter=No|UnidicInfo=,です,です,です,デス,,,デス,デス,です
33	。	。	PUNCT	補助記号-句点	_	29	punct	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=補助記号-句点|SpaceAfter=Yes|UnidicInfo=,。,。,。,,,,,,。

# newdoc id = test-s2
# sent_id = test-s2
# text = 幸福の科学側からは,特にどうしてほしいという要望はいただいていません。
1	幸福	幸福	NOUN	名詞-普通名詞-形状詞可能	_	4	nmod	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,幸福,幸福,幸福,コーフク,,,コウフク,コウフクノカガクガワ,幸福の科学側
2	の	の	ADP	助詞-格助詞	_	1	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=I|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,の,の,の,ノ,,,ノ,コウフクノカガクガワ,幸福の科学側
3	科学	科学	NOUN	名詞-普通名詞-サ変可能	_	4	compound	_	BunsetuBILabel=B|BunsetuPositionType=CONT|LUWBILabel=I|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,科学,科学,科学,カガク,,,カガク,コウフクノカガクガワ,幸福の科学側
4	側	側	NOUN	名詞-普通名詞-一般	_	17	obl	_	BunsetuBILabel=I|BunsetuPositionType=SEM_HEAD|LUWBILabel=I|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,側,側,側,ガワ,,,ガワ,コウフクノカガクガワ,幸福の科学側
5	から	から	ADP	助詞-格助詞	_	4	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,から,から,から,カラ,,,カラ,カラ,から
6	は	は	ADP	助詞-係助詞	_	4	case	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助詞-係助詞|SpaceAfter=No|UnidicInfo=,は,は,は,ワ,,,ハ,ハ,は
7	,	,	PUNCT	補助記号-読点	_	4	punct	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=補助記号-読点|SpaceAfter=No|UnidicInfo=,，,,,,,,,,，
8	特に	特に	ADV	副詞	_	17	advmod	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=副詞|SpaceAfter=No|UnidicInfo=,特に,特に,特に,トクニ,,,トクニ,トクニ,特に
9	どう	どう	ADV	副詞	_	15	advcl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=動詞-一般-サ行変格|SpaceAfter=No|UnidicInfo=,どう,どう,どう,ドー,,,ドウ,ドウスル,どうする
10	し	為る	AUX	動詞-非自立可能-サ行変格	_	9	aux	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=I|LUWPOS=動詞-一般-サ行変格|PrevUDLemma=する|SpaceAfter=No|UnidicInfo=,為る,し,する,シ,,,スル,ドウスル,どうする
11	て	て	SCONJ	助詞-接続助詞	_	9	mark	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助動詞-形容詞|SpaceAfter=No|UnidicInfo=,て,て,て,テ,,,テ,テホシイ,てほしい
12	ほしい	欲しい	AUX	形容詞-非自立可能-形容詞	_	11	fixed	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=I|LUWPOS=助動詞-形容詞|PrevUDLemma=ほしい|SpaceAfter=No|UnidicInfo=,欲しい,ほしい,ほしい,ホシー,,,ホシイ,テホシイ,てほしい
13	と	と	ADP	助詞-格助詞	_	9	case	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,と,と,と,ト,,,ト,トイウ,という
14	いう	言う	VERB	動詞-一般-五段-ワア行	_	13	fixed	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=I|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,言う,いう,いう,イウ,,,イウ,トイウ,という
15	要望	要望	NOUN	名詞-普通名詞-サ変可能	_	17	nsubj	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,要望,要望,要望,ヨーボー,,,ヨウボウ,ヨウボウ,要望
16	は	は	ADP	助詞-係助詞	_	15	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-係助詞|SpaceAfter=No|UnidicInfo=,は,は,は,ワ,,,ハ,ハ,は
17	いただい	頂く	VERB	動詞-非自立可能-五段-カ行	_	0	root	_	BunsetuBILabel=B|BunsetuPositionType=ROOT|LUWBILabel=B|LUWPOS=動詞-一般-五段-カ行|PrevUDLemma=いただく|SpaceAfter=No|UnidicInfo=,頂く,いただい,いただく,イタダイ,,,イタダク,イタダク,頂く
18	て	て	SCONJ	助詞-接続助詞	_	17	mark	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助動詞-上一段-ア行|SpaceAfter=No|UnidicInfo=,て,て,て,テ,,,テ,テイル,ている
19	い	居る	VERB	動詞-非自立可能-上一段-ア行	_	18	fixed	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=I|LUWPOS=助動詞-上一段-ア行|PrevUDLemma=いる|SpaceAfter=No|UnidicInfo=,居る,い,いる,イ,,,イル,テイル,ている
20	ませ	ます	AUX	助動詞-助動詞-マス	_	17	aux	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=B|LUWPOS=助動詞-助動詞-マス|SpaceAfter=No|UnidicInfo=,ます,ませ,ます,マセ,,,マス,マス,ます
21	ん	ず	AUX	助動詞-助動詞-ヌ	Polarity=Neg	17	aux	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助動詞-助動詞-ヌ|PrevUDLemma=ぬ|SpaceAfter=No|UnidicInfo=,ず,ん,ぬ,ン,,,ヌ,ズ,ず
22	。	。	PUNCT	補助記号-句点	_	17	punct	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=補助記号-句点|SpaceAfter=Yes|UnidicInfo=,。,。,。,,,,,,。

# newdoc id = test-s3
# sent_id = test-s3
# text = 星取り参加は当然とされ,不参加は白眼視される。
1	星取り	星取り	NOUN	名詞-普通名詞-一般	_	2	compound	_	BunsetuBILabel=B|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,星取り,星取り,星取り,ホシトリ,,,ホシトリ,ホシトリサンカ,星取り参加
2	参加	参加	NOUN	名詞-普通名詞-サ変可能	_	4	nsubj	_	BunsetuBILabel=I|BunsetuPositionType=SEM_HEAD|LUWBILabel=I|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,参加,参加,参加,サンカ,,,サンカ,ホシトリサンカ,星取り参加
3	は	は	ADP	助詞-係助詞	_	2	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-係助詞|SpaceAfter=No|UnidicInfo=,は,は,は,ワ,,,ハ,ハ,は
4	当然	当然	ADJ	形状詞-一般	_	6	advcl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=形状詞-一般|SpaceAfter=No|UnidicInfo=,当然,当然,当然,トーゼン,,,トウゼン,トウゼン,当然
5	と	と	ADP	助詞-格助詞	_	4	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-格助詞|SpaceAfter=No|UnidicInfo=,と,と,と,ト,,,ト,ト,と
6	さ	為る	VERB	動詞-非自立可能-サ行変格	_	13	acl	_	BunsetuBILabel=B|BunsetuPositionType=SEM_HEAD|LUWBILabel=B|LUWPOS=動詞-一般-サ行変格|PrevUDLemma=する|SpaceAfter=No|UnidicInfo=,為る,さ,する,サ,,,スル,スル,する
7	れ	れる	AUX	助動詞-助動詞-レル	_	6	aux	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助動詞-助動詞-レル|SpaceAfter=No|UnidicInfo=,れる,れ,れる,レ,,,レル,レル,れる
8	,	,	PUNCT	補助記号-読点	_	6	punct	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=補助記号-読点|SpaceAfter=No|UnidicInfo=,，,,,,,,,,，
9	不	不	NOUN	接頭辞	Polarity=Neg	10	compound	_	BunsetuBILabel=B|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,不,不,不,フ,,,フ,フサンカ,不参加
10	参加	参加	NOUN	名詞-普通名詞-サ変可能	_	13	nsubj	_	BunsetuBILabel=I|BunsetuPositionType=SEM_HEAD|LUWBILabel=I|LUWPOS=名詞-普通名詞-一般|SpaceAfter=No|UnidicInfo=,参加,参加,参加,サンカ,,,サンカ,フサンカ,不参加
11	は	は	ADP	助詞-係助詞	_	10	case	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助詞-係助詞|SpaceAfter=No|UnidicInfo=,は,は,は,ワ,,,ハ,ハ,は
12	白眼	白眼	NOUN	名詞-普通名詞-一般	_	13	compound	_	BunsetuBILabel=B|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=動詞-一般-サ行変格|SpaceAfter=No|UnidicInfo=,白眼,白眼,白眼,ハクガン,,,ハクガン,ハクガンシスル,白眼視する
13	視	視	NOUN	接尾辞-名詞的-サ変可能	_	0	root	_	BunsetuBILabel=I|BunsetuPositionType=ROOT|LUWBILabel=I|LUWPOS=動詞-一般-サ行変格|SpaceAfter=No|UnidicInfo=,視,視,視,シ,,,シ,ハクガンシスル,白眼視する
14	さ	為る	AUX	動詞-非自立可能-サ行変格	_	13	aux	_	BunsetuBILabel=I|BunsetuPositionType=FUNC|LUWBILabel=I|LUWPOS=動詞-一般-サ行変格|PrevUDLemma=する|SpaceAfter=No|UnidicInfo=,為る,さ,する,サ,,,スル,ハクガンシスル,白眼視する
15	れる	れる	AUX	助動詞-助動詞-レル	_	13	aux	_	BunsetuBILabel=I|BunsetuPositionType=SYN_HEAD|LUWBILabel=B|LUWPOS=助動詞-助動詞-レル|SpaceAfter=No|UnidicInfo=,れる,れる,れる,レル,,,レル,レル,れる
16	。	。	PUNCT	補助記号-句点	_	13	punct	_	BunsetuBILabel=I|BunsetuPositionType=CONT|LUWBILabel=B|LUWPOS=補助記号-句点|SpaceAfter=Yes|UnidicInfo=,。,。,。,,,,,,。

================================================
FILE: demo/pipeline_demo.py
================================================
"""
A basic demo of the Stanza neural pipeline.
"""

import sys
import argparse
import os

import stanza
from stanza.resources.common import DEFAULT_MODEL_DIR


if __name__ == '__main__':
    # get arguments
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--models_dir', help='location of model files | default: ~/stanza_resources',
                        default=DEFAULT_MODEL_DIR)
    parser.add_argument('-l', '--lang', help='Demo language',
                        default="en")
    parser.add_argument('-c', '--cpu', action='store_true', help='Use cpu as the device.')
    args = parser.parse_args()

    example_sentences = {"en": "Barack Obama was born in Hawaii.  He was elected president in 2008.",
            "zh": "中国文化经历上千年的历史演变，是各区域、各民族古代文化长期相互交流、借鉴、融合的结果。",
            "fr": "Van Gogh grandit au sein d'une famille de l'ancienne bourgeoisie. Il tente d'abord de faire carrière comme marchand d'art chez Goupil & C.",
            "vi": "Trận Trân Châu Cảng (hay Chiến dịch Hawaii theo cách gọi của Bộ Tổng tư lệnh Đế quốc Nhật Bản) là một đòn tấn công quân sự bất ngờ được Hải quân Nhật Bản thực hiện nhằm vào căn cứ hải quân của Hoa Kỳ tại Trân Châu Cảng thuộc tiểu bang Hawaii vào sáng Chủ Nhật, ngày 7 tháng 12 năm 1941, dẫn đến việc Hoa Kỳ sau đó quyết định tham gia vào hoạt động quân sự trong Chiến tranh thế giới thứ hai."}

    if args.lang not in example_sentences:
        print(f'Sorry, but we don\'t have a demo sentence for "{args.lang}" for the moment. Try one of these languages: {list(example_sentences.keys())}')
        sys.exit(1)

    # download the models
    stanza.download(args.lang, dir=args.models_dir)
    # set up a pipeline
    print('---')
    print('Building pipeline...')
    pipeline = stanza.Pipeline(lang=args.lang, dir=args.models_dir, use_gpu=(not args.cpu))
    # process the document
    doc = pipeline(example_sentences[args.lang])
    # access nlp annotations
    print('')
    print('Input: {}'.format(example_sentences[args.lang]))
    print("The tokenizer split the input into {} sentences.".format(len(doc.sentences)))
    print('---')
    print('tokens of first sentence: ')
    doc.sentences[0].print_tokens()
    print('')
    print('---')
    print('dependency parse of first sentence: ')
    doc.sentences[0].print_dependencies()
    print('')
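

# Illustrative sketch, not part of the original demo: print_dependencies()
# above writes the parse to stdout. Assuming each word exposes the .text,
# .head (1-based, 0 for the root) and .deprel attributes, a hypothetical helper
# (dependency_triples is our own name, not a Stanza API) can return the same
# information as data instead.
def dependency_triples(sentence):
    """Return (word text, head text, relation) for every word in a sentence."""
    triples = []
    for word in sentence.words:
        # head is a 1-based index into sentence.words; 0 marks the root
        head_text = sentence.words[word.head - 1].text if word.head > 0 else "ROOT"
        triples.append((word.text, head_text, word.deprel))
    return triples


# e.g. inside the demo above: dependency_triples(doc.sentences[0])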



================================================
FILE: demo/scenegraph.py
================================================
"""
Very short demo for the SceneGraph interface in the CoreNLP server

Requires CoreNLP >= 4.5.5, Stanza >= 1.5.1
"""

import json

from stanza.server import CoreNLPClient

# start_server=None if you have the server running in another process on the same host
# you can start it with whatever normal options CoreNLPClient has
#
# preload=False avoids having the server unnecessarily load annotators
# if you don't plan on using them
with CoreNLPClient(preload=False) as client:
    result = client.scenegraph("Jennifer's antennae are on her head.")
    print(json.dumps(result, indent=2))




================================================
FILE: demo/semgrex visualization.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2787d5f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import stanza\n",
    "from stanza.server.semgrex import Semgrex\n",
    "from stanza.models.common.constant import is_right_to_left\n",
    "import spacy\n",
    "from spacy import displacy\n",
    "from spacy.tokens import Doc\n",
    "from IPython.display import display, HTML\n",
    "\n",
    "\n",
    "\"\"\"\n",
    "IMPORTANT: For the code in this module to run, you must have CoreNLP and Java installed on your machine. Additionally,\n",
    "set the CLASSPATH environment variable to the path of your CoreNLP directory.\n",
    "\n",
    "Example: CLASSPATH=C:\\\\Users\\\\Alex\\\\PycharmProjects\\\\pythonProject\\\\stanford-corenlp-4.5.0\\\\stanford-corenlp-4.5.0\\\\*\n",
    "\"\"\"\n",
    "\n",
    "%env CLASSPATH=C:\\\\stanford-corenlp-4.5.2\\\\stanford-corenlp-4.5.2\\\\*\n",
    "def get_sentences_html(doc, language):\n",
    "    \"\"\"\n",
    "    Returns a list of the HTML strings of the dependency visualizations of a given stanza doc object.\n",
    "\n",
    "    The 'language' arg is the two-letter language code for the document to be processed.\n",
    "\n",
    "    First converts the stanza doc object to a spacy doc object and uses displacy to generate an HTML\n",
    "    string for each sentence of the doc object.\n",
    "    \"\"\"\n",
    "    html_strings = []\n",
    "\n",
    "    # blank model - we don't use any of the model features, just the visualization\n",
    "    nlp = spacy.blank(\"en\")\n",
    "    sentences_to_visualize = []\n",
    "    for sentence in doc.sentences:\n",
    "        words, lemmas, heads, deps, tags = [], [], [], [], []\n",
    "        if is_right_to_left(language):  # order of words displayed is reversed, dependency arcs remain intact\n",
    "            sent_len = len(sentence.words)\n",
    "            for word in reversed(sentence.words):\n",
    "                words.append(word.text)\n",
    "                lemmas.append(word.lemma)\n",
    "                deps.append(word.deprel)\n",
    "                tags.append(word.upos)\n",
    "                if word.head == 0:  # spaCy heads are 0-based token indices and the root is its own head, unlike in Stanza\n",
    "                    heads.append(sent_len - word.id)\n",
    "                else:\n",
    "                    heads.append(sent_len - word.head)\n",
    "        else:  # left to right rendering\n",
    "            for word in sentence.words:\n",
    "                words.append(word.text)\n",
    "                lemmas.append(word.lemma)\n",
    "                deps.append(word.deprel)\n",
    "                tags.append(word.upos)\n",
    "                if word.head == 0:\n",
    "                    heads.append(word.id - 1)\n",
    "                else:\n",
    "                    heads.append(word.head - 1)\n",
    "        document_result = Doc(nlp.vocab, words=words, lemmas=lemmas, heads=heads, deps=deps, pos=tags)\n",
    "        sentences_to_visualize.append(document_result)\n",
    "\n",
    "    for line in sentences_to_visualize:  # render all sentences through displaCy\n",
    "        html_strings.append(displacy.render(line, style=\"dep\",\n",
    "                                            options={\"compact\": True, \"word_spacing\": 30, \"distance\": 100,\n",
    "                                                     \"arrow_spacing\": 20}, jupyter=False))\n",
    "    return html_strings\n",
    "\n",
    "\n",
    "def find_nth(haystack, needle, n):\n",
    "    \"\"\"\n",
    "    Returns the starting index of the nth occurrence of the substring 'needle' in the string 'haystack'.\n",
    "    \"\"\"\n",
    "    start = haystack.find(needle)\n",
    "    while start >= 0 and n > 1:\n",
    "        start = haystack.find(needle, start + len(needle))\n",
    "        n -= 1\n",
    "    return start\n",
    "\n",
    "\n",
    "def round_base(num, base=10):\n",
    "    \"\"\"\n",
    "    Rounds a number to the nearest multiple of the base, e.g. round_base(49.2, base=50) == 50.\n",
    "    \"\"\"\n",
    "    return base * round(num/base)\n",
    "\n",
    "\n",
    "def process_sentence_html(orig_html, semgrex_sentence):\n",
    "    \"\"\"\n",
    "    Takes a semgrex sentence object and modifies the HTML of the original sentence's deprel visualization,\n",
    "    highlighting words involved in the search queries and adding the label of the word inside of the semgrex match.\n",
    "\n",
    "    Returns the modified html string of the sentence's deprel visualization.\n",
    "    \"\"\"\n",
    "    tracker = {}  # keep track of which words have multiple labels\n",
    "    DEFAULT_TSPAN_COUNT = 2  # the original displacy html assigns two <tspan> objects per <text> object\n",
    "    CLOSING_TSPAN_LEN = 8  # </tspan> is 8 chars long\n",
    "    colors = ['#4477AA', '#66CCEE', '#228833', '#CCBB44', '#EE6677', '#AA3377', '#BBBBBB']\n",
    "    css_bolded_class = \"<style> .bolded{font-weight: bold;} </style>\\n\"\n",
    "    found_index = orig_html.find(\"\\n\")  # returns index where the opening <svg> ends\n",
    "    # insert the new style class into html string\n",
    "    orig_html = orig_html[: found_index + 1] + css_bolded_class + orig_html[found_index + 1:]\n",
    "\n",
    "    # Add color to words in the match, bold words in the match\n",
    "    for query in semgrex_sentence.result:\n",
    "        for i, match in enumerate(query.match):\n",
    "            color = colors[i]\n",
    "            paired_dy = 2\n",
    "            for node in match.node:\n",
    "                name, match_index = node.name, node.matchIndex\n",
    "                # edit existing <tspan> to change color and bold the text\n",
    "                start = find_nth(orig_html, \"<text\", match_index)  # finds start of svg <text> of interest\n",
    "                if match_index not in tracker:  # if we've already bolded and colored, keep the first color\n",
    "                    tspan_start = orig_html.find(\"<tspan\",\n",
    "                                                 start)  # finds start of the first svg <tspan> inside of the <text>\n",
    "                    tspan_end = orig_html.find(\"</tspan>\", start)  # finds start of the end of the above <tspan>\n",
    "                    tspan_substr = orig_html[tspan_start: tspan_end + CLOSING_TSPAN_LEN + 1] + \"\\n\"\n",
    "                    # color words in the hit and bold words in the hit\n",
    "                    edited_tspan = tspan_substr.replace('class=\"displacy-word\"', 'class=\"bolded\"').replace(\n",
    "                        'fill=\"currentColor\"', f'fill=\"{color}\"')\n",
    "                    # insert edited <tspan> object into html string\n",
    "                    orig_html = orig_html[: tspan_start] + edited_tspan + orig_html[tspan_end + CLOSING_TSPAN_LEN + 2:]\n",
    "                    tracker[match_index] = DEFAULT_TSPAN_COUNT\n",
    "\n",
    "                # next, we have to insert the new <tspan> object for the label\n",
    "                # Copy old <tspan> to copy formatting when creating new <tspan> later\n",
    "                prev_tspan_start = find_nth(orig_html[start:], \"<tspan\",\n",
    "                                            tracker[match_index] - 1) + start  # find the previous <tspan> start index\n",
    "                prev_tspan_end = find_nth(orig_html[start:], \"</tspan>\",\n",
    "                                          tracker[match_index] - 1) + start  # find the prev </tspan> start index\n",
    "                prev_tspan = orig_html[prev_tspan_start: prev_tspan_end + CLOSING_TSPAN_LEN + 1]\n",
    "\n",
    "                # Find spot to insert new tspan\n",
    "                closing_tspan_start = find_nth(orig_html[start:], \"</tspan>\", tracker[match_index]) + start\n",
    "                up_to_new_tspan = orig_html[: closing_tspan_start + CLOSING_TSPAN_LEN + 1]\n",
    "                rest_need_add_newline = orig_html[closing_tspan_start + CLOSING_TSPAN_LEN + 1:]\n",
    "\n",
    "                # Calculate proper x value in svg\n",
    "                x_value_start = prev_tspan.find('x=\"')\n",
    "                x_value_end = prev_tspan[x_value_start + 3:].find('\"') + 3  # 3 is the length of the 'x=\"' substring\n",
    "                x_value = prev_tspan[x_value_start + 3: x_value_end + x_value_start]\n",
    "\n",
    "                # Calculate proper y value in svg\n",
    "                DEFAULT_DY_VAL, dy = 2, 2\n",
    "                if paired_dy != DEFAULT_DY_VAL and node == match.node[\n",
    "                    1]:  # we're on the second node and need to adjust height to match the paired node\n",
    "                    dy = paired_dy\n",
    "                if node == match.node[0]:\n",
    "                    paired_node_level = 2\n",
    "                    if match.node[1].matchIndex in tracker:  # check if we need to adjust heights of labels\n",
    "                        paired_node_level = tracker[match.node[1].matchIndex]\n",
    "                        dif = tracker[match_index] - paired_node_level\n",
    "                        if dif > 0:  # current node has more labels\n",
    "                            paired_dy = DEFAULT_DY_VAL * dif + 1\n",
    "                            dy = DEFAULT_DY_VAL\n",
    "                        else:  # paired node has more labels, adjust this label down\n",
    "                            dy = DEFAULT_DY_VAL * (abs(dif) + 1)\n",
    "                            paired_dy = DEFAULT_DY_VAL\n",
    "\n",
    "                # Insert new <tspan> object\n",
    "                new_tspan = f'  <tspan class=\"displacy-word\" dy=\"{dy}em\" fill=\"{color}\" x={x_value}>{name[: 3].title()}.</tspan>\\n'  # abbreviate label names to 3 chars\n",
    "                orig_html = up_to_new_tspan + new_tspan + rest_need_add_newline\n",
    "                tracker[match_index] += 1\n",
    "    return orig_html\n",
    "\n",
    "\n",
    "def render_html_strings(edited_html_strings):\n",
    "    \"\"\"\n",
    "    Renders the HTML to make the edits visible\n",
    "    \"\"\"\n",
    "    for html_string in edited_html_strings:\n",
    "        display(HTML(html_string))\n",
    "\n",
    "\n",
    "def visualize_search_doc(doc, semgrex_queries, lang_code, start_match=0, end_match=10):\n",
    "    \"\"\"\n",
    "    Visualizes the semgrex results of running semgrex search on a stanza doc object with the given list of\n",
    "    semgrex queries. Returns a list of the edited HTML strings from the doc. Each element in the list represents\n",
    "    the HTML to render one of the sentences in the document.\n",
    "\n",
    "    'lang_code' is the two-letter language abbreviation for the language that the stanza doc object is written in.\n",
    "\n",
    "\n",
    "    'start_match' and 'end_match' determine which matches to visualize. Works similar to splices, so that\n",
    "    start_match=0 and end_match=10 will display the first 10 semgrex matches.\n",
    "    \"\"\"\n",
    "    matches_count = 0  # Limits number of visualizations\n",
    "    with Semgrex(classpath=\"$CLASSPATH\") as sem:\n",
    "        edited_html_strings = []\n",
    "        semgrex_results = sem.process(doc, *semgrex_queries)\n",
    "        # one html string for each sentence\n",
    "        unedited_html_strings = get_sentences_html(doc, lang_code)\n",
    "        for i in range(len(unedited_html_strings)):\n",
    "\n",
    "            if matches_count >= end_match:  # we've collected enough matches, stop early\n",
    "                break\n",
    "\n",
    "            # check if sentence has matches, if not then do not visualize\n",
    "            has_none = True\n",
    "            for query in semgrex_results.result[i].result:\n",
    "                for match in query.match:\n",
    "                    if match:\n",
    "                        has_none = False\n",
    "\n",
    "            # Process HTML if queries have matches\n",
    "            if not has_none:\n",
    "                if start_match <= matches_count < end_match:\n",
    "                    edited_string = process_sentence_html(unedited_html_strings[i], semgrex_results.result[i])\n",
    "                    edited_string = adjust_dep_arrows(edited_string)\n",
    "                    edited_html_strings.append(edited_string)\n",
    "                matches_count += 1\n",
    "\n",
    "        render_html_strings(edited_html_strings)\n",
    "    return edited_html_strings\n",
    "\n",
    "\n",
    "def visualize_search_str(text, semgrex_queries, lang_code):\n",
    "    \"\"\"\n",
    "    Visualizes the deprel of the semgrex results from running semgrex search on a string with the given list of\n",
    "    semgrex queries. Returns a list of the edited HTML strings. Each element in the list represents\n",
    "    the HTML to render one of the sentences in the document.\n",
    "\n",
    "    Internally, this function converts the string into a stanza doc object before processing the doc object.\n",
    "\n",
    "    'lang_code' is the two-letter language abbreviation for the language that the stanza doc object is written in.\n",
    "    \"\"\"\n",
    "    nlp = stanza.Pipeline(lang_code, processors=\"tokenize, pos, lemma, depparse\")\n",
    "    doc = nlp(text)\n",
    "    return visualize_search_doc(doc, semgrex_queries, lang_code)\n",
    "\n",
    "\n",
    "def adjust_dep_arrows(raw_html):\n",
    "    \"\"\"\n",
    "    The default spaCy dependency visualization has misaligned arrows.\n",
    "    We fix arrows by aligning arrow ends and bodies to the word that they are directed to. If a word has an\n",
    "    arrowhead that is pointing not directly on the word's center, align the arrowhead to match the center of the word.\n",
    "\n",
    "    returns the edited html with fixed arrow placement\n",
    "    \"\"\"\n",
    "    HTML_ARROW_BEGINNING = '<g class=\"displacy-arrow\">'\n",
    "    HTML_ARROW_ENDING = \"</g>\"\n",
    "    HTML_ARROW_ENDING_LEN = 6   # there are 2 newline chars after the arrow ending\n",
    "    arrows_start_idx = find_nth(haystack=raw_html, needle='<g class=\"displacy-arrow\">', n=1)\n",
    "    words_html, arrows_html = raw_html[: arrows_start_idx], raw_html[arrows_start_idx:]  # separate html for words and arrows\n",
    "    final_html = words_html  # continually concatenate to this after processing each arrow\n",
    "    arrow_number = 1  # which arrow we're editing (1-indexed)\n",
    "    start_idx, end_of_class_idx = find_nth(haystack=arrows_html, needle=HTML_ARROW_BEGINNING, n=arrow_number), find_nth(arrows_html, HTML_ARROW_ENDING, arrow_number)\n",
    "    while start_idx != -1:  # edit every arrow\n",
    "        arrow_section = arrows_html[start_idx: end_of_class_idx + HTML_ARROW_ENDING_LEN]  # slice a single svg arrow object\n",
    "        if arrow_section[-1] == \"<\":   # this is the last arrow in the HTML, don't cut the splice early\n",
    "            arrow_section = arrows_html[start_idx:]\n",
    "        edited_arrow_section = edit_dep_arrow(arrow_section)\n",
    "\n",
    "        final_html = final_html + edited_arrow_section  # continually update html with new arrow html until done\n",
    "\n",
    "        # Prepare for next iteration\n",
    "        arrow_number += 1\n",
    "        start_idx = find_nth(arrows_html, '<g class=\"displacy-arrow\">', n=arrow_number)\n",
    "        end_of_class_idx = find_nth(arrows_html, \"</g>\", arrow_number)\n",
    "    return final_html\n",
    "\n",
    "\n",
    "def edit_dep_arrow(arrow_html):\n",
    "    \"\"\"\n",
    "    The formatting of a displacy arrow in svg is the following:\n",
    "    <g class=\"displacy-arrow\">\n",
    "        <path class=\"displacy-arc\" id=\"arrow-c628889ffbf343e3848193a08606f10a-0-0\" stroke-width=\"2px\" d=\"M70,352.0 C70,177.0 390.0,177.0 390.0,352.0\" fill=\"none\" stroke=\"currentColor\"/>\n",
    "        <text dy=\"1.25em\" style=\"font-size: 0.8em; letter-spacing: 1px\">\n",
    "            <textPath xlink:href=\"#arrow-c628889ffbf343e3848193a08606f10a-0-0\" class=\"displacy-label\" startOffset=\"50%\" side=\"left\" fill=\"currentColor\" text-anchor=\"middle\">csubj</textPath>\n",
    "        </text>\n",
    "        <path class=\"displacy-arrowhead\" d=\"M70,354.0 L62,342.0 78,342.0\" fill=\"currentColor\"/>\n",
    "    </g>\n",
    "\n",
    "    We edit the 'd = ...' parts of the <path class ...> section to fix the arrow direction and length\n",
    "\n",
    "    returns the arrow_html with distances fixed\n",
    "    \"\"\"\n",
    "    WORD_SPACING = 50   # words start at x=50 and are separated by 100s so their x values are multiples of 50\n",
    "    M_OFFSET = 4  # length of 'd=\"M' that we search for to extract the number from d=\"M70, for instance\n",
    "    ARROW_PIXEL_SIZE = 4\n",
    "    first_d_idx, second_d_idx = find_nth(arrow_html, 'd=\"M', 1), find_nth(arrow_html, 'd=\"M', 2)  # find where d=\"M starts\n",
    "    first_d_cutoff, second_d_cutoff = arrow_html.find(\",\", first_d_idx), arrow_html.find(\",\", second_d_idx)  # isolate the number after 'M' e.g. 'M70'\n",
    "    # gives svg x values of arrow body starting position and arrowhead position\n",
    "    arrow_position, arrowhead_position = float(arrow_html[first_d_idx + M_OFFSET: first_d_cutoff]), float(arrow_html[second_d_idx + M_OFFSET: second_d_cutoff])\n",
    "    # gives starting index of where 'fill=\"none\"' or 'fill=\"currentColor\"' begin, reference points to end the d= section\n",
    "    first_fill_start_idx, second_fill_start_idx = find_nth(arrow_html, \"fill\", n=1), find_nth(arrow_html, \"fill\", n=3)\n",
    "\n",
    "    # isolate the d= ... section to edit\n",
    "    first_d, second_d = arrow_html[first_d_idx: first_fill_start_idx], arrow_html[second_d_idx: second_fill_start_idx]\n",
    "    first_d_split, second_d_split = first_d.split(\",\"), second_d.split(\",\")\n",
    "\n",
    "    if arrow_position == arrowhead_position:  # This arrow is incoming onto the word, center the arrow/head to word center\n",
    "        corrected_arrow_pos = corrected_arrowhead_pos = round_base(arrow_position, base=WORD_SPACING)\n",
    "\n",
    "        # edit first_d  -- arrow body\n",
    "        second_term = first_d_split[1].split(\" \")[0] + \" \" + str(corrected_arrow_pos)\n",
    "        first_d = 'd=\"M' + str(corrected_arrow_pos) + \",\" + second_term + \",\" + \",\".join(first_d_split[2:])\n",
    "\n",
    "        # edit second_d  -- arrowhead\n",
    "        second_term = second_d_split[1].split(\" \")[0] + \" L\" + str(corrected_arrowhead_pos - ARROW_PIXEL_SIZE)\n",
    "        third_term = second_d_split[2].split(\" \")[0] + \" \" + str(corrected_arrowhead_pos + ARROW_PIXEL_SIZE)\n",
    "        second_d = 'd=\"M' + str(corrected_arrowhead_pos) + \",\" + second_term + \",\" + third_term + \",\" + \",\".join(second_d_split[3:])\n",
    "    else:  # This arrow is outgoing to another word, center the arrow/head to that word's center\n",
    "        corrected_arrowhead_pos = round_base(arrowhead_position, base=WORD_SPACING)\n",
    "\n",
    "        # edit first_d -- arrow body\n",
    "        third_term = first_d_split[2].split(\" \")[0] + \" \" + str(corrected_arrowhead_pos)\n",
    "        fourth_term = first_d_split[3].split(\" \")[0] + \" \" + str(corrected_arrowhead_pos)\n",
    "        terms = [first_d_split[0], first_d_split[1], third_term, fourth_term] + first_d_split[4:]\n",
    "        first_d = \",\".join(terms)\n",
    "\n",
    "        # edit second_d -- arrow head\n",
    "        first_term = f'd=\"M{corrected_arrowhead_pos}'\n",
    "        second_term = second_d_split[1].split(\" \")[0] + \" L\" + str(corrected_arrowhead_pos - ARROW_PIXEL_SIZE)\n",
    "        third_term = second_d_split[2].split(\" \")[0] + \" \" + str(corrected_arrowhead_pos + ARROW_PIXEL_SIZE)\n",
    "        terms = [first_term, second_term, third_term] + second_d_split[3:]\n",
    "        second_d = \",\".join(terms)\n",
    "    # rebuild and return html\n",
    "    return arrow_html[:first_d_idx] + first_d + \" \" + arrow_html[first_fill_start_idx:second_d_idx] + second_d + \" \" + arrow_html[second_fill_start_idx:]\n",
    "\n",
    "\n",
    "def main():\n",
    "    nlp = stanza.Pipeline(\"en\", processors=\"tokenize,pos,lemma,depparse\")\n",
    "\n",
    "    # doc = nlp(\"This a dummy sentence. Banning opal removed all artifact decks from the meta.  I miss playing lantern. This is a dummy sentence.\")\n",
    "    doc = nlp(\"Banning opal removed artifact decks from the meta. Banning tennis resulted in players banning people.\")\n",
    "    # A single result .result[i].result[j] is a list of matches for sentence i on semgrex query j.\n",
    "    queries = [\"{pos:NN}=object <obl {}=action\",\n",
    "               \"{cpos:NOUN}=thing <obj {cpos:VERB}=action\"]\n",
    "    res = visualize_search_doc(doc, queries, \"en\")\n",
    "    print(res[0])  # see the first sentence's deprel visualization HTML\n",
    "    print(\"---------------------------------------\")\n",
    "    print(res[1])  # second sentence's deprel visualization HTML\n",
    "    return\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    main()\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}


================================================
FILE: demo/semgrex.py
================================================
import stanza
from stanza.server.semgrex import Semgrex

nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

doc = nlp("Banning opal removed all artifact decks from the meta.  I miss playing lantern.")
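# classpath="$CLASSPATH" makes stanza take the CoreNLP jars from the CLASSPATH environment variable
with Semgrex(classpath="$CLASSPATH") as sem: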
    semgrex_results = sem.process(doc,
                                  "{pos:NN}=object <obl {}=action",
                                  "{cpos:NOUN}=thing <obj {cpos:VERB}=action")
    print("COMPLETE RESULTS")
    print(semgrex_results)

    print("Number of matches in graph 0 ('Banning opal...') for semgrex query 1 (thing <obj action): %d" % len(semgrex_results.result[0].result[1].match))
    for match_idx, match in enumerate(semgrex_results.result[0].result[1].match):
        print("Match {}:\n-----------\n{}".format(match_idx, match))

    print("graph 1 for semgrex query 0 is an empty match: len %d" % len(semgrex_results.result[1].result[0].match))
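
The nested indexing in this demo (`semgrex_results.result[i].result[j].match`) can be flattened with a small helper. The sketch below is not part of stanza: `collect_named_nodes` is a hypothetical name, and the `SimpleNamespace` objects are stand-ins that only mimic the `result`, `match`, `node`, `name`, and `matchIndex` attributes used elsewhere in this repository, so the example runs without CoreNLP.

```python
from types import SimpleNamespace


def collect_named_nodes(response):
    """Flatten a semgrex-style response into tuples of
    (sentence index, query index, match index, node name, word index).

    Hypothetical helper: it only assumes the attribute layout
    response.result[i].result[j].match[k].node[n].{name, matchIndex}.
    """
    rows = []
    for sent_idx, graph_result in enumerate(response.result):
        for query_idx, query_result in enumerate(graph_result.result):
            for match_idx, match in enumerate(query_result.match):
                for node in match.node:
                    rows.append((sent_idx, query_idx, match_idx, node.name, node.matchIndex))
    return rows


def _node(name, match_index):
    # Stand-in for a named node; not a real protobuf message.
    return SimpleNamespace(name=name, matchIndex=match_index)


# Mock response: one sentence, two queries, only the second query has a match.
mock_response = SimpleNamespace(result=[
    SimpleNamespace(result=[
        SimpleNamespace(match=[]),
        SimpleNamespace(match=[SimpleNamespace(node=[_node("thing", 2), _node("action", 1)])]),
    ]),
])

rows = collect_named_nodes(mock_response)
for row in rows:
    print(row)
```

With a real response, `collect_named_nodes(semgrex_results)` could be called inside the `with Semgrex(...)` block above.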


================================================
FILE: demo/ssurgeon_script.txt
================================================
# To run this, use the stanza/server/ssurgeon.py main file.
# For example:
# python3 stanza/server/ssurgeon.py  --edit_file demo/ssurgeon_script.txt --no_print_input --input_file ../data/ud2_11/UD_English-Pronouns/en_pronouns-ud-test.conllu > en_pronouns.updated.conllu
# This script updates the UD 2.11 version of UD_English-Pronouns to
# better match punctuation attachments, MWT, and no double subjects.

# This turns unwanted csubj into advcl
{}=source >nsubj {} >csubj=bad {}
relabelNamedEdge -edge bad -reln advcl

# This detects punctuation that is not attached to the root and reattaches it
{word:/[.]/}=punct <punct=bad {}=parent << {$}=root : {}=parent << {}=root
removeNamedEdge -edge bad
addEdge -gov root -dep punct -reln punct

# This detects the specific MWT found in the 2.11 dataset
{}=first . {word:/'s|n't|'ll/}=second
combineMWT -node first -node second


================================================
FILE: doc/CoreNLP.proto
================================================
syntax = "proto2";

package edu.stanford.nlp.pipeline;

option java_package = "edu.stanford.nlp.pipeline";
option java_outer_classname = "CoreNLPProtos";

//
// From JAVANLP_HOME, you can build me with the command:
//
//  protoc -I=src/edu/stanford/nlp/pipeline/ --java_out=src src/edu/stanford/nlp/pipeline/CoreNLP.proto
//

//
// To do the python version:
//
//  protoc -I=./doc --python_out=./stanza/protobuf ./doc/CoreNLP.proto
//

//
// An enumeration for the valid languages allowed in CoreNLP
//
enum Language {
  Unknown  = 0;
  Any      = 1;
  Arabic   = 2;
  Chinese  = 3;
  English  = 4;
  German   = 5;
  French   = 6;
  Hebrew   = 7;
  Spanish  = 8;
  UniversalEnglish = 9;
  UniversalChinese = 10;
}

//
// A document; that is, the equivalent of an Annotation.
//
message Document {
  required string     text        = 1;
  repeated Sentence   sentence    = 2;
  repeated CorefChain corefChain  = 3;
  optional string     docID       = 4;
  optional string     docDate     = 7;
  optional uint64     calendar    = 8;

  /**
   * A peculiar field, for the corner case when a Document is
   * serialized without any sentences.
   */
  repeated Token      sentencelessToken = 5;
  repeated Token      character = 10;

  repeated Quote      quote = 6;

  /**
   * This field is for entity mentions across the document.
   */
  repeated NERMention mentions = 9;
  optional bool hasEntityMentionsAnnotation = 13; // used to differentiate between null and empty list

  /**
   * xml information
   */
  optional bool    xmlDoc = 11;
  repeated Section sections = 12;

  /** coref mentions for entire document **/
  repeated Mention         mentionsForCoref                    = 14;
  optional bool hasCorefMentionAnnotation = 15;
  optional bool hasCorefAnnotation = 16;
  repeated int32 corefMentionToEntityMentionMappings = 17;
  repeated int32 entityMentionToCorefMentionMappings = 18;

  extensions 100 to 255;
}

//
// The serialized version of a CoreMap representing a sentence.
//
message Sentence {
  repeated Token            token                               = 1;
  required uint32           tokenOffsetBegin                    = 2;
  required uint32           tokenOffsetEnd                      = 3;
  optional uint32           sentenceIndex                       = 4;
  optional uint32           characterOffsetBegin                = 5;
  optional uint32           characterOffsetEnd                  = 6;
  optional ParseTree        parseTree                           = 7;
  optional ParseTree        binarizedParseTree                  = 31;
  optional ParseTree        annotatedParseTree                  = 32;
  optional string           sentiment                           = 33;
  repeated ParseTree        kBestParseTrees                     = 34;
  optional DependencyGraph  basicDependencies                   = 8;
  optional DependencyGraph  collapsedDependencies               = 9;
  optional DependencyGraph  collapsedCCProcessedDependencies    = 10;
  optional DependencyGraph  alternativeDependencies             = 13;
  repeated RelationTriple   openieTriple                        = 14;   // The OpenIE triples in the sentence
  repeated RelationTriple   kbpTriple                           = 16;   // The KBP triples in this sentence
  repeated SentenceFragment entailedSentence                    = 15;   // The entailed sentences, by natural logic
  repeated SentenceFragment entailedClause                      = 35;   // The entailed clauses, by natural logic
  optional DependencyGraph  enhancedDependencies                = 17;
  optional DependencyGraph  enhancedPlusPlusDependencies        = 18;
  repeated Token            character                           = 19;

  optional uint32           paragraph                           = 11;

  optional string           text                                = 12;   // Only needed if we're only saving the sentence.

  optional uint32           lineNumber                          = 20;

  // Fields set by other annotators in CoreNLP
  optional bool            hasRelationAnnotations              = 51;
  repeated Entity          entity                              = 52;
  repeated Relation        relation                            = 53;
  optional bool            hasNumerizedTokensAnnotation        = 54;
  repeated NERMention      mentions                            = 55;
  repeated Mention         mentionsForCoref                    = 56;
  optional bool            hasCorefMentionsAnnotation          = 57;

  optional string          sentenceID                          = 58;  // Useful when storing sentences (e.g. ForEach)
  optional string          sectionDate                         = 59;  // date of section
  optional uint32          sectionIndex                        = 60;  // section index for this sentence's section
  optional string          sectionName                         = 61;  // name of section
  optional string          sectionAuthor                       = 62;  // author of section
  optional string          docID                               = 63;  // doc id
  optional bool            sectionQuoted                       = 64;  // is this sentence in an xml quote in a post

  optional bool            hasEntityMentionsAnnotation         = 65;  // check if there are entity mentions
  optional bool            hasKBPTriplesAnnotation             = 68;  // check if there are KBP triples
  optional bool            hasOpenieTriplesAnnotation          = 69;  // check if there are OpenIE triples

  // quote stuff
  optional uint32             chapterIndex                     = 66;
  optional uint32             paragraphIndex                   = 67;
  // the quote annotator can sometimes add merged sentences
  optional Sentence           enhancedSentence                 = 70;

  // speaker stuff
  optional string          speaker                             = 71;  // The speaker speaking this sentence
  optional string          speakerType                         = 72;  // The type of speaker speaking this sentence

  extensions 100 to 255;
}

//
// The serialized version of a Token (a CoreLabel).
//
message Token {
  // Fields set by the default annotators [new CoreNLP(new Properties())]
  optional string word              = 1;    // the word's gloss (post-tokenization)
  optional string pos               = 2;    // The word's part of speech tag
  optional string value             = 3;    // The word's 'value', (e.g., parse tree node)
  optional string category          = 4;    // The word's 'category' (e.g., parse tree node)
  optional string before            = 5;    // The whitespace/xml before the token
  optional string after             = 6;    // The whitespace/xml after the token
  optional string originalText      = 7;    // The original text for this token
  optional string ner               = 8;    // The word's NER tag
  optional string coarseNER         = 62;   // The word's coarse NER tag
  optional string fineGrainedNER    = 63;   // The word's fine-grained NER tag
  repeated string nerLabelProbs     = 66;   // listing of probs
  optional string normalizedNER     = 9;    // The word's normalized NER tag
  optional string lemma             = 10;   // The word's lemma
  optional uint32 beginChar         = 11;   // The character offset begin, in the document
  optional uint32 endChar           = 12;   // The character offset end, in the document
  optional uint32 utterance         = 13;   // The utterance tag used in dcoref
  optional string speaker           = 14;   // The speaker speaking this word
  optional string speakerType       = 77;   // The type of speaker speaking this word
  optional uint32 beginIndex        = 15;   // The begin index of, e.g., a span
  optional uint32 endIndex          = 16;   // The end index of, e.g., a span
  optional uint32 tokenBeginIndex   = 17;   // The begin index of the token
  optional uint32 tokenEndIndex     = 18;   // The end index of the token
  optional Timex  timexValue        = 19;   // The time this word refers to
  optional bool   hasXmlContext     = 21;   // Used by clean xml annotator
  repeated string xmlContext        = 22;   // Used by clean xml annotator
  optional uint32 corefClusterID    = 23;   // The [primary] cluster id for this token
  optional string answer            = 24;   // A temporary annotation which is occasionally left in
  //  optional string projectedCategory = 25;   // The syntactic category of the maximal constituent headed by the word. Not used anywhere, so deleted.
  optional uint32    headWordIndex  = 26;   // The index of the head word of this word.
  optional Operator  operator       = 27;   // If this is an operator, which one is it and what is its scope (as per Natural Logic)?
  optional Polarity  polarity       = 28;   // The polarity of this word, according to Natural Logic
  optional string    polarity_dir   = 39;   // The polarity of this word, either "up", "down", or "flat"
  optional Span      span           = 29;   // The span of a leaf node of a tree
  optional string    sentiment      = 30;   // The final sentiment of the sentence
  optional int32     quotationIndex = 31;   // The index of the quotation this token refers to
  optional MapStringString conllUFeatures = 32;
  optional string coarseTag         = 33; //  The coarse POS tag (used to store the UPOS tag)
  optional Span conllUTokenSpan     = 34;
  optional string conllUMisc        = 35;
  optional MapStringString conllUSecondaryDeps = 36;
  optional string   wikipediaEntity = 37;
  optional bool     isNewline = 38;


  // Fields set by other annotators in CoreNLP
  optional string gender          = 51;  // gender annotation (machine reading)
  optional string trueCase        = 52;  // true case type of token
  optional string trueCaseText    = 53;  // true case gloss of token

  //  Chinese character info
  optional string chineseChar     = 54;
  optional string chineseSeg      = 55;
  optional string chineseXMLChar  = 60;

  //  Arabic character info
  optional string arabicSeg       = 76;

  // Section info
  optional string sectionName     = 56;
  optional string sectionAuthor   = 57;
  optional string sectionDate     = 58;
  optional string sectionEndLabel = 59;

  // French tokens have parents
  optional string parent          = 61;

  // mention index info
  repeated uint32 corefMentionIndex = 64;
  optional uint32 entityMentionIndex = 65;

  // mwt stuff
  optional bool isMWT = 67;
  optional bool isFirstMWT = 68;
  optional string mwtText = 69;
  // setting this to a map might be nice, but there are a couple issues
  // for one, there can be values with no key
  // for another, it's a pain to correctly parse, since different treebanks
  // can have different standards for how to write out the misc field
  optional string mwtMisc = 78;

  // number info
  optional uint64 numericValue = 70;
  optional string numericType = 71;
  optional uint64 numericCompositeValue = 72;
  optional string numericCompositeType = 73;

  optional uint32 codepointOffsetBegin   = 74;
  optional uint32 codepointOffsetEnd     = 75;

  // Fields in the CoreLabel java class that are moved elsewhere
  //       string text           @see Document#text + character offsets
  //       uint32 sentenceIndex  @see Sentence#sentenceIndex
  //       string docID          @see Document#docID
  //       uint32 paragraph      @see Sentence#paragraph

  // Most serialized annotations will not have this
  // Some code paths may not correctly process this if serialized,
  // since many places will read the index off the position in a sentence
  // In particular, deserializing a Document using ProtobufAnnotationSerializer
  // will clobber any index value
  // But Semgrex and Ssurgeon in particular need a way
  // to pass around nodes where the node's index is not strictly 1, 2, 3, ...
  // thanks to the empty nodes in UD treebanks such as
  // English EWT or Estonian EWT (not related to each other)
  optional uint32 index          = 79;
  optional uint32 emptyIndex     = 80;

  extensions 100 to 255;
}

//
// An enumeration of valid sentiment values for the sentiment classifier.
//
enum Sentiment {
  STRONG_NEGATIVE   = 0;
  WEAK_NEGATIVE     = 1;
  NEUTRAL           = 2;
  WEAK_POSITIVE     = 3;
  STRONG_POSITIVE   = 4;
}

//
// A quotation marker in text
//
message Quote {
  optional string text           = 1;
  optional uint32 begin          = 2;
  optional uint32 end            = 3;
  optional uint32 sentenceBegin  = 5;
  optional uint32 sentenceEnd    = 6;
  optional uint32 tokenBegin     = 7;
  optional uint32 tokenEnd       = 8;
  optional string docid          = 9;
  optional uint32 index          = 10;
  optional string author         = 11;
  optional string mention        = 12;
  optional uint32 mentionBegin   = 13;
  optional uint32 mentionEnd     = 14;
  optional string mentionType    = 15;
  optional string mentionSieve   = 16;
  optional string speaker        = 17;
  optional string speakerSieve   = 18;
  optional string canonicalMention = 19;
  optional uint32 canonicalMentionBegin = 20;
  optional uint32 canonicalMentionEnd = 21;
  optional DependencyGraph attributionDependencyGraph = 22;
}

//
// A syntactic parse tree, with scores.
//
message ParseTree {
  repeated ParseTree child           = 1;
  optional string    value           = 2;
  optional uint32    yieldBeginIndex = 3;
  optional uint32    yieldEndIndex   = 4;
  optional double    score           = 5;
  optional Sentiment sentiment       = 6;
}

//
// A dependency graph representation.
//
message DependencyGraph {
  message Node {
    required uint32 sentenceIndex  = 1;
    required uint32 index          = 2;
    optional uint32 copyAnnotation = 3;
    optional uint32 emptyIndex     = 4;
  }

  message Edge {
    required uint32 source      = 1;
    required uint32 target      = 2;
    optional string dep         = 3;
    optional bool   isExtra     = 4;
    optional uint32 sourceCopy  = 5;
    optional uint32 targetCopy  = 6;
    optional uint32 sourceEmpty = 8;
    optional uint32 targetEmpty = 9;
    optional Language language  = 7 [default=Unknown];
  }

  repeated Node     node     = 1;
  repeated Edge     edge     = 2;
  repeated uint32   root     = 3 [packed=true];
  // optional: if this graph message is not part of a larger context,
  // the tokens will help reconstruct the actual sentence
  repeated Token    token    = 4;
  // The values in this field will index directly into the node list
  // This is useful so that additional information such as emptyIndex
  // can be considered without having to pass it around a second time
  repeated uint32   rootNode = 5 [packed=true];
}

//
// A coreference chain.
// These fields are not *really* optional. CoreNLP will crash without them.
//
message CorefChain {
  message CorefMention {
    optional int32  mentionID          = 1;
    optional string mentionType        = 2;
    optional string number             = 3;
    optional string gender             = 4;
    optional string animacy            = 5;
    optional uint32 beginIndex         = 6;
    optional uint32 endIndex           = 7;
    optional uint32 headIndex          = 9;
    optional uint32 sentenceIndex      = 10;
    optional uint32 position           = 11;  // the second element of position
  }

  required int32        chainID        = 1;
  repeated CorefMention mention        = 2;
  required uint32       representative = 3;
}

//
// a mention
//

message Mention {
  optional int32 mentionID             = 1;
  optional string mentionType          = 2;
  optional string number               = 3;
  optional string gender               = 4;
  optional string animacy              = 5;
  optional string person               = 6;
  optional uint32 startIndex           = 7;
  optional uint32 endIndex             = 9;
  optional int32 headIndex             = 10;
  optional string headString           = 11;
  optional string nerString            = 12;
  optional int32 originalRef           = 13;
  optional int32 goldCorefClusterID    = 14;
  optional int32 corefClusterID        = 15;
  optional int32 mentionNum            = 16;
  optional int32 sentNum               = 17;
  optional int32 utter                 = 18;
  optional int32 paragraph             = 19;
  optional bool isSubject              = 20;
  optional bool isDirectObject         = 21;
  optional bool isIndirectObject       = 22;
  optional bool isPrepositionObject    = 23;
  optional bool hasTwin                = 24;
  optional bool generic                = 25;
  optional bool isSingleton            = 26;
  optional bool hasBasicDependency     = 27;
  optional bool hasEnhancedDependency  = 28;
  optional bool hasContextParseTree    = 29;
  optional IndexedWord headIndexedWord = 30;
  optional IndexedWord   dependingVerb = 31;
  optional IndexedWord       headWord  = 32;
  optional SpeakerInfo    speakerInfo  = 33;

  repeated IndexedWord sentenceWords   = 50;
  repeated IndexedWord originalSpan    = 51;
  repeated string dependents           = 52;
  repeated string preprocessedTerms    = 53;
  repeated int32 appositions           = 54;
  repeated int32 predicateNominatives  = 55;
  repeated int32 relativePronouns      = 56;
  repeated int32 listMembers           = 57;
  repeated int32 belongToLists         = 58;

}

//
// store the position (sentence, token index) of a CoreLabel
//

message IndexedWord {
  optional  int32 sentenceNum          = 1;
  optional  int32 tokenIndex           = 2;
  optional  int32 docID                = 3;
  optional uint32 copyCount            = 4;
}

//
// speaker info, this is used for Mentions
//

message SpeakerInfo {
  optional string speakerName          = 1;
  repeated int32 mentions              = 2;
}

//
// A Span of text
//
message Span {
  required uint32 begin      = 1;
  required uint32 end        = 2;
}

//
// A Timex object, representing a temporal expression (TIMe EXpression)
// These fields are not *really* optional. CoreNLP will crash without them.
//
message Timex {
  optional string value      = 1;
  optional string altValue   = 2;
  optional string text       = 3;
  optional string type       = 4;
  optional string tid        = 5;
  optional uint32 beginPoint = 6;
  optional uint32 endPoint   = 7;
}

//
// A representation of an entity in a relation.
// This corresponds to the EntityMention, and more broadly the
// ExtractionObject classes.
//
message Entity {
  optional uint32 headStart      = 6;
  optional uint32 headEnd        = 7;
  optional string mentionType    = 8;
  optional string normalizedName = 9;
  optional uint32 headTokenIndex = 10;
  optional string corefID        = 11;
  // inherited from ExtractionObject
  optional string objectID       = 1;
  optional uint32 extentStart    = 2;
  optional uint32 extentEnd      = 3;
  optional string type           = 4;
  optional string subtype        = 5;
  // Implicit
  //       uint32 sentence       @see implicit in sentence
}

//
// A representation of a relation, mirroring RelationMention
//
message Relation {
  repeated string argName   = 6;
  repeated Entity arg       = 7;
  optional string signature = 8;
  // inherited from ExtractionObject
  optional string objectID = 1;
  optional uint32 extentStart    = 2;
  optional uint32 extentEnd      = 3;
  optional string type           = 4;
  optional string subtype        = 5;
  // Implicit
  //       uint32 sentence       @see implicit in sentence
}

//
// A Natural Logic operator
//
message Operator {
  required string name                = 1;
  required int32  quantifierSpanBegin = 2;
  required int32  quantifierSpanEnd   = 3;
  required int32  subjectSpanBegin    = 4;
  required int32  subjectSpanEnd      = 5;
  required int32  objectSpanBegin     = 6;
  required int32  objectSpanEnd       = 7;
}

//
// The seven informative Natural Logic relations
//
enum NaturalLogicRelation {
  EQUIVALENCE        = 0;
  FORWARD_ENTAILMENT = 1;
  REVERSE_ENTAILMENT = 2;
  NEGATION           = 3;
  ALTERNATION        = 4;
  COVER              = 5;
  INDEPENDENCE       = 6;
}

//
// The polarity of a word, according to Natural Logic
//
message Polarity {
  required NaturalLogicRelation projectEquivalence       = 1;
  required NaturalLogicRelation projectForwardEntailment = 2;
  required NaturalLogicRelation projectReverseEntailment = 3;
  required NaturalLogicRelation projectNegation          = 4;
  required NaturalLogicRelation projectAlternation       = 5;
  required NaturalLogicRelation projectCover             = 6;
  required NaturalLogicRelation projectIndependence      = 7;
}

//
// An NER mention in the text
//
message NERMention {
  optional uint32 sentenceIndex                 = 1;
  required uint32 tokenStartInSentenceInclusive = 2;
  required uint32 tokenEndInSentenceExclusive   = 3;
  required string ner                           = 4;
  optional string normalizedNER                 = 5;
  optional string entityType                    = 6;
  optional Timex  timex                         = 7;
  optional string wikipediaEntity               = 8;
  optional string gender                        = 9;
  optional uint32 entityMentionIndex            = 10;
  optional uint32 canonicalEntityMentionIndex   = 11;
  optional string entityMentionText             = 12;
}

//
// An entailed sentence fragment.
// Created by the openie annotator.
//
message SentenceFragment {
  repeated uint32 tokenIndex     = 1;
  optional uint32 root           = 2;
  optional bool   assumedTruth   = 3;
  optional double score          = 4;
}


//
// The index of a token in a document, including the sentence
// index and the offset.
//
message TokenLocation {
 optional uint32 sentenceIndex = 1;
 optional uint32 tokenIndex    = 2;

}


//
// An OpenIE relation triple.
// Created by the openie annotator.
//
message RelationTriple {
  optional string          subject        = 1;   // The surface form of the subject
  optional string          relation       = 2;   // The surface form of the relation (required)
  optional string          object         = 3;   // The surface form of the object
  optional double          confidence     = 4;   // The [optional] confidence of the extraction
  repeated TokenLocation   subjectTokens  = 13; // The tokens comprising the subject of the triple
  repeated TokenLocation   relationTokens = 14; // The tokens comprising the relation of the triple
  repeated TokenLocation   objectTokens   = 15; // The tokens comprising the object of the triple
  optional DependencyGraph tree           = 8;   // The dependency graph fragment for this triple
  optional bool            istmod         = 9;   // If true, this expresses an implicit tmod relation
  optional bool            prefixBe       = 10;  // If true, this relation string is missing a 'be' prefix
  optional bool            suffixBe       = 11;  // If true, this relation string is missing a 'be' suffix
  optional bool            suffixOf       = 12;  // If true, this relation string is missing an 'of' suffix
}


//
// A map from strings to strings.
// Used, minimally, in the CoNLLU featurizer
//
message MapStringString {
  repeated string key   = 1;
  repeated string value = 2;
}

//
// A map from integers to strings.
// Used, minimally, in the CoNLLU featurizer
//
message MapIntString {
  repeated uint32 key   = 1;
  repeated string value = 2;
}
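
// A minimal Python sketch of how the two map messages above could be turned
// into dictionaries. It assumes key[i] pairs with value[i] by position, which
// the field layout suggests but the comments do not state. The class
// MapStringStringStandIn and the helpers to_dict / from_dict are hypothetical
// names used for illustration only; they are not generated protobuf code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MapStringStringStandIn:
    """Hypothetical stand-in for a decoded MapStringString message."""
    key: List[str] = field(default_factory=list)
    value: List[str] = field(default_factory=list)


def to_dict(message: MapStringStringStandIn) -> Dict[str, str]:
    """Pair the parallel lists by position (an assumption, see above)."""
    if len(message.key) != len(message.value):
        raise ValueError("key and value lists must have the same length")
    return dict(zip(message.key, message.value))


def from_dict(mapping: Dict[str, str]) -> MapStringStringStandIn:
    """Split a dictionary back into two parallel lists, preserving order."""
    return MapStringStringStandIn(key=list(mapping.keys()),
                                  value=list(mapping.values()))
```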

//
// Store section info
//

message Section {
  required uint32 charBegin         = 1;
  required uint32 charEnd           = 2;
  optional string author            = 3;
  repeated uint32 sentenceIndexes   = 4;
  optional string datetime          = 5;
  repeated Quote quotes             = 6;
  optional uint32 authorCharBegin   = 7;
  optional uint32 authorCharEnd     = 8;
  required Token xmlTag             = 9;
}



// A message for requesting a semgrex
// Each sentence stores information about the tokens making up the
// corresponding graph
// An alternative would have been to use the existing Document or
// Sentence classes, but the problem with that is it would be
// ambiguous which dependency object to use.
message SemgrexRequest {
  message Dependencies {
    repeated Token           token       = 1;
    required DependencyGraph graph       = 2;
  }

  repeated string            semgrex     = 1;
  repeated Dependencies      query       = 2;
}

// The response from running a semgrex
// If you pass in M semgrex expressions and N dependency graphs,
// this returns MxN nested results.  Each SemgrexResult can match
// multiple times in one graph
//
// You may want to send multiple semgrexes per query because
// translating large numbers of dependency graphs to protobufs
// will be expensive, so doing several queries at once will save time
message SemgrexResponse {
  message NamedNode {
    required string          name        = 1;
    required int32           matchIndex  = 2;
  }

  message NamedRelation {
    required string          name        = 1;
    required string          reln        = 2;
  }

  message NamedEdge {
    required string          name        = 1;
    required int32           source      = 2;
    required int32           target      = 3;
    optional string          reln        = 4;
    optional bool            isExtra     = 5;
    optional uint32          sourceCopy  = 6;
    optional uint32          targetCopy  = 7;
  }

  message VariableString {
    required string          name        = 1;
    required string          value       = 2;
  }

  message Match {
    required int32           matchIndex   = 1;
    repeated NamedNode       node         = 2;
    repeated NamedRelation   reln         = 3;
    repeated NamedEdge       edge         = 6;
    repeated VariableString  varstring    = 7;

    // when processing multiple sentences at once,
    // which sentence this applies to
    // indexed from 0
    optional int32           sentenceIndex  = 4;
    // index of the semgrex expression this match applies to
    // indexed from 0
    optional int32           semgrexIndex = 5;
  }

  message SemgrexResult {
    repeated Match           match       = 1;
  }

  message GraphResult {
    repeated SemgrexResult   result      = 1;
  }

  repeated GraphResult       result      = 1;
}
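
// A minimal Python sketch of walking the nested result structure described
// above: response.result holds GraphResult entries, each GraphResult.result
// holds SemgrexResult entries, and each SemgrexResult.match holds the matches.
// Treating the outer index as the graph and the inner index as the semgrex
// expression is an assumption inferred from the message names. The stand-in
// objects are built with SimpleNamespace rather than real protobuf classes,
// and iter_matches / make_response are hypothetical helper names.

```python
from types import SimpleNamespace


def iter_matches(response):
    """Yield (graph_index, semgrex_index, match) for every match."""
    for graph_index, graph_result in enumerate(response.result):
        for semgrex_index, semgrex_result in enumerate(graph_result.result):
            for match in semgrex_result.match:
                yield graph_index, semgrex_index, match


def make_response(match_counts):
    """Build a stand-in response; match_counts[g][s] is a number of matches.

    matchIndex is filled with a running counter purely for illustration.
    """
    return SimpleNamespace(result=[
        SimpleNamespace(result=[
            SimpleNamespace(match=[SimpleNamespace(matchIndex=k)
                                   for k in range(count)])
            for count in row])
        for row in match_counts])
```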


// A message for processing an Ssurgeon
// Each sentence stores information about the tokens making up the
// corresponding graph
// An alternative would have been to use the existing Document or
// Sentence classes, but the problem with that is it would be
// ambiguous which dependency object to use.  Another problem
// is that if the intent is to use multiple graphs from a
// Sentence, then edits to the nodes of one graph would show up
// in the nodes of the other graph (same backing CoreLabels)
// and the operations themselves may not have the intended effect.
// The Ssurgeon is composed of two pieces, the semgrex and the
// ssurgeon operations, along with some optional documentation.
message SsurgeonRequest {
  message Ssurgeon {
    optional string          semgrex     = 1;
    repeated string          operation   = 2;
    optional string          id          = 3;
    optional string          notes       = 4;
    optional string          language    = 5;
  }

  repeated Ssurgeon          ssurgeon    = 1;
  repeated DependencyGraph   graph       = 2;
}

message SsurgeonResponse {
  message SsurgeonResult {
    optional DependencyGraph graph      = 1;
    optional bool            changed    = 2;
  }

  repeated SsurgeonResult    result      = 1;
}

// It's possible to send in a whole document, but we
// only care about the Sentences and Tokens
message TokensRegexRequest {
  required Document          doc         = 1;
  repeated string            pattern     = 2;
}

// The result will be a nested structure:
// repeated PatternMatch, one for each pattern
// each PatternMatch has a repeated Match,
// which tells you which sentence matched and where
message TokensRegexResponse {
  message MatchLocation {
    optional string          text        = 1;
    optional int32           begin       = 2;
    optional int32           end         = 3;
  }

  message Match {
    required int32           sentence    = 1;
    required MatchLocation   match       = 2;
    repeated MatchLocation   group       = 3;
  }

  message PatternMatch {
    repeated Match           match       = 1;
  }

  repeated PatternMatch      match       = 1;
}

// A protobuf which allows passing in a document with basic
// dependencies to be converted to enhanced dependencies
message DependencyEnhancerRequest {
  required Document          document           = 1;

  oneof ref {
    Language          language           = 2;
    // The expected value of this is a regex which matches relative pronouns
    string            relativePronouns   = 3;
  }
}

// A version of ParseTree with a flattened structure so that deep trees
// don't exceed the protobuf stack depth
message FlattenedParseTree {
  message Node {
    oneof contents {
      bool              openNode           = 1;
      bool              closeNode          = 2;
      string            value              = 3;
    }

    optional double     score              = 4;
  }

  repeated Node         nodes              = 1;
}
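
// A minimal Python sketch of the flattening idea: a nested tree becomes a
// linear sequence of open / value / close nodes, so no recursion depth is
// needed to serialize or rebuild it. The exact ordering used here (open, then
// the label as a value, then the children, then close, with leaves written as
// a bare value) is an assumption, not something this file specifies. Tree,
// flatten and unflatten are hypothetical names, and nodes are plain tuples
// rather than FlattenedParseTree.Node messages.

```python
OPEN, VALUE, CLOSE = "open", "value", "close"


class Tree:
    """Tiny stand-in for a constituency tree node."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def __eq__(self, other):
        return (isinstance(other, Tree) and self.label == other.label
                and self.children == other.children)


def flatten(tree):
    """Turn a nested tree into a flat list of (kind, label) tuples."""
    nodes = []
    stack = [tree]  # holds Tree objects or the CLOSE marker string
    while stack:
        item = stack.pop()
        if isinstance(item, str):
            nodes.append((CLOSE, None))
        elif not item.children:
            nodes.append((VALUE, item.label))
        else:
            nodes.append((OPEN, None))
            nodes.append((VALUE, item.label))
            stack.append(CLOSE)
            stack.extend(reversed(item.children))
    return nodes


def unflatten(nodes):
    """Rebuild the nested tree from the flat list, using an explicit stack."""
    stack, root, expect_label = [], None, False
    for kind, label in nodes:
        if kind == OPEN:
            stack.append(Tree(None))
            expect_label = True
        elif kind == CLOSE:
            finished = stack.pop()
            if stack:
                stack[-1].children.append(finished)
            else:
                root = finished
            expect_label = False
        elif expect_label:
            stack[-1].label = label
            expect_label = False
        elif stack:
            stack[-1].children.append(Tree(label))
        else:
            root = Tree(label)
    return root
```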

// A protobuf for calling the java constituency parser evaluator from elsewhere
message EvaluateParserRequest {
  message ParseResult {
    required FlattenedParseTree         gold           = 1;
    // repeated so you can send in kbest parses, if your parser handles that
    // note that this already includes a score field
    repeated FlattenedParseTree         predicted      = 2;
  }

  repeated ParseResult         treebank       = 1;
}

message EvaluateParserResponse {
  required double              f1             = 1;
  optional double              kbestF1        = 2;
  // keep track of the individual tree F1 scores
  repeated double              treeF1         = 3;
}


// A protobuf for running Tsurgeon operations on constituency trees
message TsurgeonRequest {
  message Operation {
    required string                tregex         = 1;
    repeated string                tsurgeon       = 2;
  }
  repeated Operation               operations     = 1;
  repeated FlattenedParseTree      trees          = 2;
}

// The results of the Tsurgeon operation
message TsurgeonResponse {
  repeated FlattenedParseTree      trees          = 1;
}

// Sent in Morphology requests - a stream of sentences with tagged words
message MorphologyRequest {
  message TaggedWord {
    required string                word           = 1;
    optional string                xpos           = 2;
  }

  repeated TaggedWord              words          = 1;
}

// Sent back from the Morphology request - the words and their tags
message MorphologyResponse {
  message WordTagLemma {
    required string                word           = 1;
    optional string                xpos           = 2;
    required string                lemma          = 3;
  }

  repeated WordTagLemma            words          = 1;
}


// A request for converting constituency trees to dependency graphs
message DependencyConverterRequest {
  repeated FlattenedParseTree      trees          = 1;
}

// The result of using the CoreNLP dependency converter.
// One graph per tree
message DependencyConverterResponse {
  message DependencyConversion {
    required DependencyGraph       graph          = 1;
    optional FlattenedParseTree    tree           = 2;
  }

  repeated DependencyConversion    conversions         = 1;
}



================================================
FILE: scripts/config.sh
================================================
#!/bin/bash
#
# Set environment variables for the training and testing of stanza modules.

# Set UDBASE to the location of UD data folder
# The data should be CoNLL-U format
# For details, see
#   http://universaldependencies.org/conll18/data.html (CoNLL-18 UD data)
#   https://universaldependencies.org/
# When rebuilding models based on Universal Dependencies, download the
#   UD data to some directory, set UDBASE to that directory, and
#   uncomment this line.  Alternatively, put UDBASE in your shell
#   config, Windows env variables, etc as relevant.
# export UDBASE=/path/to/UD

# Set NERBASE to the location of NER data folder
# The data should be BIO format or convertible to that format
# For details, see https://www.aclweb.org/anthology/W03-0419.pdf (CoNLL-03 NER paper)
# There are other NER datasets, supported in
#   stanza/utils/datasets/ner/prepare_ner_dataset.py
# If rebuilding NER data, choose a location for the NER directory
#   and set NERBASE to that variable.
# export NERBASE=/path/to/NER

# Set CONSTITUENCY_BASE to the location of the constituency data folder
# The data will be in some dataset-specific format
# There is a conversion script which will turn this
#   into a PTB style format
#   stanza/utils/datasets/constituency/prepare_con_dataset.py
# If processing constituency data, choose a location for the CON data
#   and set CONSTITUENCY_BASE to that variable.
# export CONSTITUENCY_BASE=/path/to/CON

# Set directories to store processed training/evaluation files
# $DATA_ROOT is a default home for where all the outputs from the
#   preparation scripts will go.  The training scripts will then look
#   for the stanza formatted data in that directory.
export DATA_ROOT=./data
export TOKENIZE_DATA_DIR=$DATA_ROOT/tokenize
export MWT_DATA_DIR=$DATA_ROOT/mwt
export LEMMA_DATA_DIR=$DATA_ROOT/lemma
export POS_DATA_DIR=$DATA_ROOT/pos
export DEPPARSE_DATA_DIR=$DATA_ROOT/depparse
export ETE_DATA_DIR=$DATA_ROOT/ete
export NER_DATA_DIR=$DATA_ROOT/ner
export CHARLM_DATA_DIR=$DATA_ROOT/charlm
export CONSTITUENCY_DATA_DIR=$DATA_ROOT/constituency
export SENTIMENT_DATA_DIR=$DATA_ROOT/sentiment

# Set directories to store external word vector data
export WORDVEC_DIR=./extern_data/wordvec


================================================
FILE: scripts/download_vectors.sh
================================================
#!/bin/bash
#
# Download word vector files for all supported languages. Run as:
#   ./download_vectors.sh WORDVEC_DIR
# where WORDVEC_DIR is the target directory to store the word vector data.

# check arguments
: ${1?"Usage: $0 WORDVEC_DIR"}
WORDVEC_DIR=$1

# constants and functions
CONLL17_URL="https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-1989/word-embeddings-conll17.tar"
CONLL17_TAR="word-embeddings-conll17.tar"

FASTTEXT_BASE_URL="https://dl.fbaipublicfiles.com/fasttext/vectors-wiki"

# TODO: some fasttext vectors are now at
# https://fasttext.cc/docs/en/pretrained-vectors.html
# there are also vectors for
# Welsh, Icelandic, Thai, Sanskrit
# https://fasttext.cc/docs/en/crawl-vectors.html

# We get the Armenian word vectors from here:
# https://github.com/ispras-texterra/word-embeddings-eval-hy
# https://arxiv.org/ftp/arxiv/papers/1906/1906.03134.pdf
# In particular, the glove model (dogfooding):
# https://at.ispras.ru/owncloud/index.php/s/pUUiS1l1jGKNax3/download
# These vectors improved F1 by about 1 on various tasks for Armenian
# and had much better coverage of Western Armenian

# For Erzya, we use word vectors available here:
# https://github.com/mokha/semantics
# @incollection{Alnajjar_2021,
#   doi = {10.31885/9789515150257.24},
#   url = {https://doi.org/10.31885%2F9789515150257.24},
#   year = 2021,
#   month = {mar},
#   publisher = {University of Helsinki},
#   pages = {275--288},
#   author = {Khalid Alnajjar},
#   title = {When Word Embeddings Become Endangered},
#   booktitle = {Multilingual Facilitation}
# }

declare -a FASTTEXT_LANG=("Afrikaans" "Breton" "Buryat" "Chinese" "Faroese" "Gothic" "Kurmanji" "North_Sami" "Serbian" "Upper_Sorbian")
declare -a FASTTEXT_CODE=("af" "br" "bxr" "zh" "fo" "got" "ku" "se" "sr" "hsb")
declare -a LOCAL_CODE=("af" "br" "bxr" "zh" "fo" "got" "kmr" "sme" "sr" "hsb")

color_green='\033[32;1m'
color_clear='\033[0m' # No Color
function msg() {
    echo -e "${color_green}$@${color_clear}"
}

function prepare_fasttext_vec() {
    lang=$1
    ftcode=$2
    code=$3

    cwd=$(pwd)
    mkdir -p $lang
    cd $lang
    msg "=== Downloading fasttext vector file for ${lang}..."
    url="${FASTTEXT_BASE_URL}/wiki.${ftcode}.vec"
    fname="${code}.vectors"
    wget $url -O $fname

    msg "=== Compressing file ${fname}..."
    xz $fname
    cd $cwd
}

# do the actual work
mkdir -p $WORDVEC_DIR
cd $WORDVEC_DIR

msg "Downloading CONLL17 word vectors. This may take a while..."
wget $CONLL17_URL -O $CONLL17_TAR

msg "Extracting CONLL17 word vector files..."
tar -xvf $CONLL17_TAR
rm $CONLL17_TAR

msg "Preparing fasttext vectors for the rest of the languages."
for (( i=0; i<${#FASTTEXT_LANG[*]}; ++i)); do
    prepare_fasttext_vec ${FASTTEXT_LANG[$i]} ${FASTTEXT_CODE[$i]} ${LOCAL_CODE[$i]}
done

# handle old french
# (a relative link target is resolved from inside the Old_French directory)
mkdir -p Old_French
ln -s ../French/fr.vectors.xz Old_French/fro.vectors.xz

msg "All done."


================================================
FILE: setup.py
================================================
import re

# Always prefer setuptools over distutils
from setuptools import setup, find_packages
# To use a consistent encoding
from codecs import open
from os import path

here = path.abspath(path.dirname(__file__))

# read the version from stanza/_version.py
with open(path.join(here, 'stanza/_version.py'), encoding='utf-8') as version_file:
    version_file_contents = version_file.read()
VERSION = re.compile(r'__version__ = "(.*)"').search(version_file_contents).group(1)

# Get the long description from the README file
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='stanza',

    # Versions should comply with PEP440.  For a discussion on single-sourcing
    # the version across setup.py and the project code, see
    # https://packaging.python.org/en/latest/single_source_version.html
    version=VERSION,

    description='A Python NLP Library for Many Human Languages, by the Stanford NLP Group',
    long_description=long_description,
    long_description_content_type="text/markdown",
    # The project's main homepage.
    url='https://github.com/stanfordnlp/stanza',

    # Author details
    author='Stanford Natural Language Processing Group',
    author_email='jebolton@stanford.edu',

    # Choose your license
    license='Apache License 2.0',

    # See https://pypi.python.org/pypi?%3Aaction=list_classifiers
    classifiers=[
        # How mature is this project? Common values are
        #   3 - Alpha
        #   4 - Beta
        #   5 - Production/Stable
        'Development Status :: 4 - Beta',

        # Indicate who your project is intended for
        'Intended Audience :: Developers',
        'Intended Audience :: Education',
        'Intended Audience :: Science/Research',
        'Intended Audience :: Information Technology',
        'Topic :: Scientific/Engineering',
        'Topic :: Scientific/Engineering :: Artificial Intelligence',
        'Topic :: Scientific/Engineering :: Information Analysis',
        'Topic :: Text Processing',
        'Topic :: Text Processing :: Linguistic',
        'Topic :: Software Development',
        'Topic :: Software Development :: Libraries',

        # Specify the Python versions you support here. In particular, ensure
        # that you indicate whether you support Python 2, Python 3 or both.
        'Programming Language :: Python :: 3.9',
        'Programming Language :: Python :: 3.10',
        'Programming Language :: Python :: 3.11',
        'Programming Language :: Python :: 3.12',
        'Programming Language :: Python :: 3.13',
    ],

    # What does your project relate to?
    keywords='natural-language-processing nlp natural-language-understanding stanford-nlp deep-learning',

    # You can just specify the packages manually here if your project is
    # simple. Or you can use find_packages().
    packages=find_packages(exclude=['data', 'docs', 'extern_data', 'figures', 'saved_models']),

    # List run-time dependencies here.  These will be installed by pip when
    # your project is installed. For an analysis of "install_requires" vs pip's
    # requirements files see:
    # https://packaging.python.org/en/latest/requirements.html
    install_requires=[
        'emoji', 
        'numpy', 
        'platformdirs',
        'protobuf>=3.15.0',
        'requests', 
        'networkx',
        'tomli;python_version<"3.11"',
        'torch>=1.13.0',
        'tqdm',
        'udtools>=0.2.4',
    ],

    # List required Python versions
    python_requires='>=3.9',

    # List additional groups of dependencies here (e.g. development
    # dependencies). You can install these using the following syntax,
    # for example:
    # $ pip install -e .[dev,test]
    extras_require={
        'dev': [
            'check-manifest',
        ],
        'test': [
            'coverage', 
            'pytest',
        ],
        'transformers': [
            'transformers>=3.0.0',
            'peft>=0.6.1',
        ],
        'datasets': [
            'datasets',
        ],
        'tokenizers': [
            'jieba',
            'pythainlp',
            'python-crfsuite',
            'spacy',
            'sudachidict_core',
            'sudachipy',
        ],
        'visualization': [
            'spacy',
            'streamlit',
            'ipython',
        ],
        'morphseg': [
            'morphseg>=0.2.0',
        ]
    },

    # If there are data files included in your packages that need to be
    # installed, specify them here.  If using Python 2.6 or less, then these
    # have to be included in MANIFEST.in as well.
    package_data={
        "": ["pipeline/demo/*ttf",
             "pipeline/demo/*css",
             "pipeline/demo/*html",
             "pipeline/demo/*js",
             "pipeline/demo/*gif",],
    },

    include_package_data=True,

    # Although 'package_data' is the preferred approach, in some cases you may
    # need to place data files outside of your packages. See:
    # http://docs.python.org/3.4/distutils/setupscript.html#installing-additional-files # noqa
    # In this case, 'data_file' will be installed into '<sys.prefix>/my_data'
    data_files=[],

    # To provide executable scripts, use entry points in preference to the
    # "scripts" keyword. Entry points provide cross-platform support and allow
    # pip to create the appropriate form of executable for the target platform.
    entry_points={
    },
)


================================================
FILE: stanza/__init__.py
================================================
from stanza.pipeline.core import DownloadMethod, Pipeline
from stanza.pipeline.multilingual import MultilingualPipeline
from stanza.models.common.doc import Document
from stanza.resources.common import download
from stanza.resources.installation import install_corenlp, download_corenlp_models
from stanza._version import __version__, __resources_version__
from stanza.pipeline.morphseg_processor import MorphSegProcessor

import logging
logger = logging.getLogger('stanza')

# if the client application hasn't set the log level, we set it
# ourselves to INFO
if logger.level == 0:
    logger.setLevel(logging.INFO)

log_handler = logging.StreamHandler()
log_formatter = logging.Formatter(fmt="%(asctime)s %(levelname)s: %(message)s",
                              datefmt='%Y-%m-%d %H:%M:%S')
log_handler.setFormatter(log_formatter)

# also, if the client hasn't added any handlers for this logger
# (or a default handler), we add a handler of our own
#
# client can later do
#   logger.removeHandler(stanza.log_handler)
if not logger.hasHandlers():
    logger.addHandler(log_handler)
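
# A minimal sketch of the pattern above, using only the standard logging
# module: set a default level when the client has not chosen one, attach a
# handler only when none is configured, and keep a reference so the client
# can remove it later. attach_default_handler is a hypothetical helper written
# for illustration; it is not part of stanza.

```python
import logging


def attach_default_handler(logger_name):
    """Return (logger, handler); the handler is attached only if none exists."""
    logger = logging.getLogger(logger_name)
    # logging.NOTSET is 0, i.e. the client has not set a level
    if logger.level == logging.NOTSET:
        logger.setLevel(logging.INFO)

    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        fmt="%(asctime)s %(levelname)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"))

    # hasHandlers() also looks at ancestor loggers while propagate is True
    if not logger.hasHandlers():
        logger.addHandler(handler)
    return logger, handler
```

# A client that prefers its own configuration can later call
# logger.removeHandler(handler) on the returned pair.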


================================================
FILE: stanza/_version.py
================================================
""" Single source of truth for version number """

__version__ = "1.11.1"
__resources_version__ = '1.11.0'


================================================
FILE: stanza/models/__init__.py
================================================


================================================
FILE: stanza/models/_training_logging.py
================================================
import logging

logger = logging.getLogger('stanza')
logger.setLevel(logging.DEBUG)

================================================
FILE: stanza/models/charlm.py
================================================
"""
Entry point for training and evaluating a character-level neural language model.
"""

import argparse
from copy import copy
import logging
import lzma
import math
import os
import random
import time
from types import GeneratorType
import numpy as np
import torch

from stanza.models.common.char_model import build_charlm_vocab, CharacterLanguageModel, CharacterLanguageModelTrainer
from stanza.models.common.vocab import CharVocab
from stanza.models.common import utils
from stanza.models import _training_logging

logger = logging.getLogger('stanza')

def repackage_hidden(h):
    """Wraps hidden states in new Tensors,
    to detach them from their history."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    else:
        return tuple(repackage_hidden(v) for v in h)

def batchify(data, bsz, device):
    # Work out how cleanly we can divide the dataset into bsz parts.
    nbatch = data.size(0) // bsz
    # Trim off any extra elements that wouldn't cleanly fit (remainders).
    data = data.narrow(0, 0, nbatch * bsz)
    # Evenly divide the data across the bsz batches.
    data = data.view(bsz, -1) # batch_first is True
    data = data.to(device)
    return data

def get_batch(source, i, seq_len):
    seq_len = min(seq_len, source.size(1) - 1 - i)
    data = source[:, i:i+seq_len]
    target = source[:, i+1:i+1+seq_len].reshape(-1)
    return data, target

def load_file(filename, vocab, direction):
    with utils.open_read_text(filename) as fin:
        data = fin.read()

    idx = vocab['char'].map(data)
    if direction == 'backward': idx = idx[::-1]
    return torch.tensor(idx)

def load_data(path, vocab, direction):
    if os.path.isdir(path):
        filenames = sorted(os.listdir(path))
        for filename in filenames:
            logger.info('Loading data from {}'.format(filename))
            data = load_file(os.path.join(path, filename), vocab, direction)
            yield data
    else:
        data = load_file(path, vocab, direction)
        yield data

def build_argparse():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--train_file', type=str, help="Input plaintext file")
    parser.add_argument('--train_dir', type=str, help="If non-empty, load from directory with multiple training files")
    parser.add_argument('--eval_file', type=str, help="Input plaintext file for the dev/test set")
    parser.add_argument('--shorthand', type=str, help="UD treebank shorthand")

    parser.add_argument('--mode', default='train', choices=['train', 'predict'])
    parser.add_argument('--direction', default='forward', choices=['forward', 'backward'], help="Forward or backward language model")
    parser.add_argument('--forward', action='store_const', dest='direction', const='forward', help="Train a forward language model")
    parser.add_argument('--backward', action='store_const', dest='direction', const='backward', help="Train a backward language model")

    parser.add_argument('--char_emb_dim', type=int, default=100, help="Dimension of unit embeddings")
    parser.add_argument('--char_hidden_dim', type=int, default=1024, help="Dimension of hidden units")
    parser.add_argument('--char_num_layers', type=int, default=1, help="Layers of RNN in the language model")
    parser.add_argument('--char_dropout', type=float, default=0.05, help="Dropout probability")
    parser.add_argument('--char_unit_dropout', type=float, default=1e-5, help="Randomly set an input char to UNK during training")
    parser.add_argument('--char_rec_dropout', type=float, default=0.0, help="Recurrent dropout probability")

    parser.add_argument('--batch_size', type=int, default=100, help="Batch size to use")
    parser.add_argument('--bptt_size', type=int, default=250, help="Sequence length to consider at a time")
    parser.add_argument('--epochs', type=int, default=50, help="Total epochs to train the model for")
    parser.add_argument('--max_grad_norm', type=float, default=0.25, help="Maximum gradient norm to clip to")
    parser.add_argument('--lr0', type=float, default=5, help="Initial learning rate")
    parser.add_argument('--anneal', type=float, default=0.25, help="Anneal the learning rate by this amount when dev performance deteriorates")
    parser.add_argument('--patience', type=int, default=1, help="Patience for annealing the learning rate")
    parser.add_argument('--weight_decay', type=float, default=0.0, help="Weight decay")
    parser.add_argument('--momentum', type=float, default=0.0, help='Momentum for SGD.')
    parser.add_argument('--cutoff', type=int, default=1000, help="Frequency cutoff for char vocab. By default we assume a very large corpus.")
    
    parser.add_argument('--report_steps', type=int, default=50, help="Update step interval to report loss")
    parser.add_argument('--eval_steps', type=int, default=100000, help="Update step interval to run eval on dev; set to -1 to eval after each epoch")
    parser.add_argument('--save_name', type=str, default=None, help="File name to save the model")
    parser.add_argument('--vocab_save_name', type=str, default=None, help="File name to save the vocab")
    parser.add_argument('--checkpoint_save_name', type=str, default=None, help="File name to save the most recent checkpoint")
    parser.add_argument('--no_checkpoint', dest='checkpoint', action='store_false', help="Don't save checkpoints")
    parser.add_argument('--save_dir', type=str, default='saved_models/charlm', help="Directory to save models in")
    parser.add_argument('--summary', action='store_true', help='Use summary writer to record progress.')
    utils.add_device_args(parser)
    parser.add_argument('--seed', type=int, default=1234)

    parser.add_argument('--wandb', action='store_true', help='Start a wandb session and write the results of training.  Only applies to training.  Use --wandb_name instead to specify a name')
    parser.add_argument('--wandb_name', default=None, help='Name of a wandb session to start when training.  Will default to the dataset short name')
    return parser

def build_model_filename(args):
    if args['save_name']:
        save_name = args['save_name']
    else:
        save_name = '{}_{}_charlm.pt'.format(args['shorthand'], args['direction'])
    model_file = os.path.join(args['save_dir'], save_name)
    return model_file

def parse_args(args=None):
    parser = build_argparse()

    args = parser.parse_args(args=args)

    if args.wandb_name:
        args.wandb = True

    args = vars(args)
    return args

def main(args=None):
    args = parse_args(args=args)

    utils.set_random_seed(args['seed'])

    logger.info("Running {} character-level language model in {} mode".format(args['direction'], args['mode']))
    
    utils.ensure_dir(args['save_dir'])

    if args['mode'] == 'train':
        train(args)
    else:
        evaluate(args)

def evaluate_epoch(args, vocab, data, model, criterion):
    """
    Run an evaluation over the entire dataset.
    """
    model.eval()
    device = next(model.parameters()).device
    hidden = None
    total_loss = 0
    if isinstance(data, GeneratorType):
        data = list(data)
        assert len(data) == 1, 'Only support single dev/test file'
        data = data[0]
    batches = batchify(data, args['batch_size'], device)
    with torch.no_grad():
        for i in range(0, batches.size(1) - 1, args['bptt_size']):
            data, target = get_batch(batches, i, args['bptt_size'])
            lens = [data.size(1) for _ in range(data.size(0))]

            output, hidden, decoded = model.forward(data, lens, hidden)
            loss = criterion(decoded.view(-1, len(vocab['char'])), target)
            
            hidden = repackage_hidden(hidden)
            total_loss += data.size(1) * loss.data.item()
    return total_loss / batches.size(1)

def evaluate_and_save(args, vocab, data, trainer, best_loss, model_file, checkpoint_file, writer=None):
    """
    Run an evaluation over the entire dataset, print progress and save the model if necessary.
    """
    start_time = time.time()
    loss = evaluate_epoch(args, vocab, data, trainer.model, trainer.criterion)
    ppl = math.exp(loss)
    elapsed = int(time.time() - start_time)
    # TODO: step the scheduler less often when the eval frequency is higher
    previous_lr = get_current_lr(trainer, args)
    trainer.scheduler.step(loss)
    current_lr = get_current_lr(trainer, args)
    if previous_lr != current_lr:
        logger.info("Updating learning rate to %f", current_lr)
    logger.info(
        "| eval checkpoint @ global step {:10d} | time elapsed {:6d}s | loss {:5.2f} | ppl {:8.2f}".format(
            trainer.global_step,
            elapsed,
            loss,
            ppl,
        )
    )
    if best_loss is None or loss < best_loss:
        best_loss = loss
        trainer.save(model_file, full=False)
        logger.info('new best model saved at step {:10d}'.format(trainer.global_step))
    if writer:
        writer.add_scalar('dev_loss', loss, global_step=trainer.global_step)
        writer.add_scalar('dev_ppl', ppl, global_step=trainer.global_step)
    if checkpoint_file:
        trainer.save(checkpoint_file, full=True)
        logger.info('new checkpoint saved at step {:10d}'.format(trainer.global_step))

    return loss, ppl, best_loss

def get_current_lr(trainer, args):
    return trainer.scheduler.state_dict().get('_last_lr', [args['lr0']])[0]

def load_char_vocab(vocab_file):
    return {'char': CharVocab.load_state_dict(torch.load(vocab_file, lambda storage, loc: storage, weights_only=True))}

def train(args):
    utils.log_training_args(args, logger)
    model_file = build_model_filename(args)

    vocab_file = args['save_dir'] + '/' + args['vocab_save_name'] if args['vocab_save_name'] is not None \
        else '{}/{}_vocab.pt'.format(args['save_dir'], args['shorthand'])

    if args['checkpoint']:
        checkpoint_file = utils.checkpoint_name(args['save_dir'], model_file, args['checkpoint_save_name'])
    else:
        checkpoint_file = None

    if os.path.exists(vocab_file):
        logger.info('Loading existing vocab file')
        vocab = load_char_vocab(vocab_file)
    else:
        logger.info('Building and saving vocab')
        vocab = {'char': build_charlm_vocab(args['train_file'] if args['train_dir'] is None else args['train_dir'], cutoff=args['cutoff'])}
        torch.save(vocab['char'].state_dict(), vocab_file)
    logger.info("Training model with vocab size: {}".format(len(vocab['char'])))

    if checkpoint_file and os.path.exists(checkpoint_file):
        logger.info('Loading existing checkpoint: %s' % checkpoint_file)
        trainer = CharacterLanguageModelTrainer.load(args, checkpoint_file, finetune=True)
    else:
        trainer = CharacterLanguageModelTrainer.from_new_model(args, vocab)

    writer = None
    if args['summary']:
        from torch.utils.tensorboard import SummaryWriter
        summary_dir = '{}/{}_summary'.format(args['save_dir'], args['save_name']) if args['save_name'] is not None \
            else '{}/{}_{}_charlm_summary'.format(args['save_dir'], args['shorthand'], args['direction'])
        writer = SummaryWriter(log_dir=summary_dir)
    
    # evaluate model within epoch if eval_steps is set to a positive value
    eval_within_epoch = False
    if args['eval_steps'] > 0:
        eval_within_epoch = True

    if args['wandb']:
        import wandb
        wandb_name = args['wandb_name'] if args['wandb_name'] else '%s_%s_charlm' % (args['shorthand'], args['direction'])
        wandb.init(name=wandb_name, config=args)
        wandb.run.define_metric('best_loss', summary='min')
        wandb.run.define_metric('ppl', summary='min')

    device = next(trainer.model.parameters()).device

    best_loss = None
    start_epoch = trainer.epoch  # will default to 1 for a new trainer
    for trainer.epoch in range(start_epoch, args['epochs']+1):
        # load train data from train_dir if not empty, otherwise load from file
        if args['train_dir'] is not None:
            train_path = args['train_dir']
        else:
            train_path = args['train_file']
        train_data = load_data(train_path, vocab, args['direction'])
        dev_data = load_file(args['eval_file'], vocab, args['direction']) # dev must be a single file

        # run over entire training set
        for data_chunk in train_data:
            batches = batchify(data_chunk, args['batch_size'], device)
            hidden = None
            total_loss = 0.0
            total_batches = math.ceil((batches.size(1) - 1) / args['bptt_size'])
            iteration, i = 0, 0
            # over the data chunk
            while i < batches.size(1) - 1 - 1:
                trainer.model.train()
                trainer.global_step += 1
                start_time = time.time()
                bptt = args['bptt_size'] if np.random.random() < 0.95 else args['bptt_size']/ 2.
                # prevent excessively small or negative sequence lengths
                seq_len = max(5, int(np.random.normal(bptt, 5)))
                # prevent very large sequence length, must be <= 1.2 x bptt
                seq_len = min(seq_len, int(args['bptt_size'] * 1.2))
                data, target = get_batch(batches, i, seq_len)
                lens = [data.size(1) for _ in range(data.size(0))]
                
                trainer.optimizer.zero_grad()
                output, hidden, decoded = trainer.model.forward(data, lens, hidden)
                loss = trainer.criterion(decoded.view(-1, len(vocab['char'])), target)
                total_loss += loss.data.item()
                loss.backward()

                torch.nn.utils.clip_grad_norm_(trainer.params, args['max_grad_norm'])
                trainer.optimizer.step()

                hidden = repackage_hidden(hidden)

                if (iteration + 1) % args['report_steps'] == 0:
                    cur_loss = total_loss / args['report_steps']
                    elapsed = time.time() - start_time
                    logger.info(
                        "| epoch {:5d} | {:5d}/{:5d} batches | sec/batch {:.6f} | loss {:5.2f} | ppl {:8.2f}".format(
                            trainer.epoch,
                            iteration + 1,
                            total_batches,
                            elapsed / args['report_steps'],
                            cur_loss,
                            math.exp(cur_loss),
                        )
                    )
                    if args['wandb']:
                        wandb.log({'train_loss': cur_loss}, step=trainer.global_step)
                    total_loss = 0.0

                iteration += 1
                i += seq_len

                # evaluate if necessary
                if eval_within_epoch and trainer.global_step % args['eval_steps'] == 0:
                    _, ppl, best_loss = evaluate_and_save(args, vocab, dev_data, trainer, best_loss, model_file, checkpoint_file, writer)
                    if args['wandb']:
                        wandb.log({'ppl': ppl, 'best_loss': best_loss, 'lr': get_current_lr(trainer, args)}, step=trainer.global_step)

        # if eval_steps isn't positive, run evaluation after each epoch; always evaluate after the final epoch
        if not eval_within_epoch or trainer.epoch == args['epochs']:
            _, ppl, best_loss = evaluate_and_save(args, vocab, dev_data, trainer, best_loss, model_file, checkpoint_file, writer)
            if args['wandb']:
                wandb.log({'ppl': ppl, 'best_loss': best_loss, 'lr': get_current_lr(trainer, args)}, step=trainer.global_step)

    if writer:
        writer.close()
    if args['wandb']:
        wandb.finish()
    return

def evaluate(args):
    model_file = build_model_filename(args)

    model = CharacterLanguageModel.load(model_file).to(args['device'])
    vocab = model.vocab
    data = load_data(args['eval_file'], vocab, args['direction'])
    criterion = torch.nn.CrossEntropyLoss()
    
    loss = evaluate_epoch(args, vocab, data, model, criterion)
    logger.info(
        "| best model | loss {:5.2f} | ppl {:8.2f}".format(
            loss,
            math.exp(loss),
        )
    )
    return

if __name__ == '__main__':
    main()


================================================
FILE: stanza/models/classifier.py
================================================
import argparse
import ast
import logging
import os
import random
import re
from enum import Enum

import torch
import torch.nn as nn

from stanza.models.common import loss
from stanza.models.common import utils
from stanza.models.pos.vocab import CharVocab

import stanza.models.classifiers.data as data
from stanza.models.classifiers.trainer import Trainer
from stanza.models.classifiers.utils import WVType, ExtraVectors, ModelType
from stanza.models.common.peft_config import add_peft_args, resolve_peft_args

from stanza.utils.confusion import format_confusion, confusion_to_accuracy, confusion_to_macro_f1


class Loss(Enum):
    CROSS = 1
    WEIGHTED_CROSS = 2
    LOG_CROSS = 3
    FOCAL = 4

class DevScoring(Enum):
    ACCURACY = 'ACC'
    WEIGHTED_F1 = 'WF'

logger = logging.getLogger('stanza')
tlogger = logging.getLogger('stanza.classifiers.trainer')

logging.getLogger('elmoformanylangs').setLevel(logging.WARNING)

DEFAULT_TRAIN='data/sentiment/en_sstplus.train.txt'
DEFAULT_DEV='data/sentiment/en_sst3roots.dev.txt'
DEFAULT_TEST='data/sentiment/en_sst3roots.test.txt'

"""A script for training and testing classifier models, especially on the SST.

If you run the script with no arguments, it will start trying to train
a sentiment model.

python3 -m stanza.models.classifier

This requires the sentiment dataset to be in an `extern_data`
directory, such as by symlinking it from somewhere else.

The default model is a CNN where the word vectors are first mapped to
channels with filters of a few different widths, those channels are
maxpooled over the entire sentence, and then the resulting pools have
fully connected layers until they reach the number of classes in the
training data.  You can see the defaults in the options below.

https://arxiv.org/abs/1408.5882

(Currently the CNN is the only sentence classifier implemented.)

To train with a more complicated CNN arch:

nohup python3 -u -m stanza.models.classifier --max_epochs 400 --filter_channels 1000 --fc_shapes 400,100 > FC41.out 2>&1 &

You can train models with word vectors other than the default word2vec.  For example:

 nohup python3 -u -m stanza.models.classifier  --wordvec_type google --wordvec_dir extern_data/google --max_epochs 200 --filter_channels 1000 --fc_shapes 200,100 --base_name FC21_google > FC21_google.out 2>&1 &

A model trained on the 5 class dataset can be tested on the 2 class dataset with a command line like this:

python3 -u -m stanza.models.classifier  --no_train --load_name saved_models/classifier/sst_en_ewt_FS_3_4_5_C_1000_FC_400_100_classifier.E0165-ACC41.87.pt --test_file data/sentiment/en_sst2roots.test.txt --test_remap_labels "{0:0, 1:0, 3:1, 4:1}"

python3 -u -m stanza.models.classifier  --wordvec_type google --wordvec_dir extern_data/google --no_train --load_name saved_models/classifier/FC21_google_en_ewt_FS_3_4_5_C_1000_FC_200_100_classifier.E0189-ACC45.87.pt --test_file data/sentiment/en_sst2roots.test.txt --test_remap_labels "{0:0, 1:0, 3:1, 4:1}"

A model trained on the 3 class dataset can be tested on the 2 class dataset with a command line like this:

python3 -u -m stanza.models.classifier  --wordvec_type google --wordvec_dir extern_data/google --no_train --load_name saved_models/classifier/FC21_3C_google_en_ewt_FS_3_4_5_C_1000_FC_200_100_classifier.E0101-ACC68.94.pt --test_file data/sentiment/en_sst2roots.test.txt --test_remap_labels "{0:0, 2:1}"
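
As a rough illustration of what the --test_remap_labels value looks like once it is parsed (the option is declared with type=ast.literal_eval), here is a minimal sketch.  The remap_prediction helper is hypothetical and is not part of this script; it only shows that labels missing from the map have no target class, and it does not model --forgive_unmapped_labels.

```python
import ast

# The command line string becomes a plain dict of int -> int
remap = ast.literal_eval("{0:0, 1:0, 3:1, 4:1}")

def remap_prediction(label, remap):
    # hypothetical helper: returns None when the label has no mapping (here, label 2)
    return remap.get(label)

# Map each of the 5 class labels onto the 2 class label set
remapped = [remap_prediction(label, remap) for label in range(5)]
```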

To train models on combined 3 class datasets:

nohup python3 -u -m stanza.models.classifier --max_epochs 400 --filter_channels 1000 --fc_shapes 400,100 --base_name FC41_3class  --extra_wordvec_method CONCAT --extra_wordvec_dim 200  --train_file data/sentiment/en_sstplus.train.txt --dev_file data/sentiment/en_sst3roots.dev.txt --test_file data/sentiment/en_sst3roots.test.txt > FC41_3class.out 2>&1 &

This tests that model:

python3 -u -m stanza.models.classifier --no_train --load_name en_sstplus.pt --test_file data/sentiment/en_sst3roots.test.txt

Here is an example for training a model in a different language:

nohup python3 -u -m stanza.models.classifier --max_epochs 400 --filter_channels 1000 --fc_shapes 400,100 --base_name FC41_german  --train_file data/sentiment/de_sb10k.train.txt --dev_file data/sentiment/de_sb10k.dev.txt --test_file data/sentiment/de_sb10k.test.txt --shorthand de_sb10k --min_train_len 3 --extra_wordvec_method CONCAT --extra_wordvec_dim 100 > de_sb10k.out 2>&1 &

This uses more data, although that wound up being worse for the German model:

nohup python3 -u -m stanza.models.classifier --max_epochs 400 --filter_channels 1000 --fc_shapes 400,100 --base_name FC41_german  --train_file data/sentiment/de_sb10k.train.txt,data/sentiment/de_scare.train.txt,data/sentiment/de_usage.train.txt --dev_file data/sentiment/de_sb10k.dev.txt --test_file data/sentiment/de_sb10k.test.txt --shorthand de_sb10k --min_train_len 3 --extra_wordvec_method CONCAT --extra_wordvec_dim 100 > de_sb10k.out 2>&1 &

nohup python3 -u -m stanza.models.classifier --max_epochs 400 --filter_channels 1000 --fc_shapes 400,100 --base_name FC41_chinese --train_file data/sentiment/zh_ren.train.txt --dev_file data/sentiment/zh_ren.dev.txt --test_file data/sentiment/zh_ren.test.txt --shorthand zh_ren --wordvec_type fasttext --extra_wordvec_method SUM --wordvec_pretrain_file ../stanza_resources/zh-hans/pretrain/gsdsimp.pt > zh_ren.out 2>&1 &

nohup python3 -u -m stanza.models.classifier --max_epochs 400 --filter_channels 1000 --fc_shapes 400,100 --save_name vi_vsfc.pt  --train_file data/sentiment/vi_vsfc.train.json --dev_file data/sentiment/vi_vsfc.dev.json --test_file data/sentiment/vi_vsfc.test.json --shorthand vi_vsfc --wordvec_pretrain_file ../stanza_resources/vi/pretrain/vtb.pt --wordvec_type word2vec --extra_wordvec_method SUM --dev_eval_scoring WEIGHTED_F1 > vi_vsfc.out 2>&1 &

python3 -u -m stanza.models.classifier --no_train --test_file extern_data/sentiment/vietnamese/_UIT-VSFC/test.txt --shorthand vi_vsfc --wordvec_pretrain_file ../stanza_resources/vi/pretrain/vtb.pt --wordvec_type word2vec --load_name vi_vsfc.pt
"""

def convert_fc_shapes(arg):
    """
    Returns a tuple of sizes to use in FC layers.

    For example, converts "100" -> (100,)
    "100,200" -> (100,200)
    """
    arg = arg.strip()
    if not arg:
        return ()
    arg = ast.literal_eval(arg)
    if isinstance(arg, int):
        return (arg,)
    if isinstance(arg, tuple):
        return arg
    return tuple(arg)

# For the most part, these values are for the constituency parser.
# Only the WD for adadelta is originally for sentiment
# Also LR for adadelta and madgrad

# madgrad learning rate experiment on sstplus
# note that the hyperparameters are not cross-validated in tandem, so
# later changes may make some earlier experiments slightly out of date
# LR
#   0.01         failed to converge
#   0.004        failed to converge
#   0.003        0.5572
#   0.002        failed to converge
#   0.001        0.6857
#   0.0008       0.6799
#   0.0005       0.6849
#   0.00025      0.6749
#   0.0001       0.6746
#   0.00001      0.6536
#   0.000001     0.6267
# LR 0.001 produced the best model, but it does occasionally fail to
# converge to a working model, so we set the default to 0.0005 instead
DEFAULT_LEARNING_RATES = { "adamw": 0.0002, "adadelta": 1.0, "sgd": 0.001, "adabelief": 0.00005, "madgrad": 0.0005 }
DEFAULT_LEARNING_EPS = { "adabelief": 1e-12, "adadelta": 1e-6, "adamw": 1e-8 }
DEFAULT_LEARNING_RHO = 0.9
DEFAULT_MOMENTUM = { "madgrad": 0.9, "sgd": 0.9 }
DEFAULT_WEIGHT_DECAY = { "adamw": 0.05, "adadelta": 0.0001, "sgd": 0.01, "adabelief": 1.2e-6, "madgrad": 2e-6 }

def build_argparse():
    """
    Build the argparse for the classifier.

    Refactored so that other utility scripts can use the same parser if needed.
    """
    parser = argparse.ArgumentParser()

    parser.add_argument('--train', dest='train', default=True, action='store_true', help='Train the model (default)')
    parser.add_argument('--no_train', dest='train', action='store_false', help="Don't train the model")

    parser.add_argument('--shorthand', type=str, default='en_ewt', help="Treebank shorthand, eg 'en' for English")

    parser.add_argument('--load_name', type=str, default=None, help='Name for loading an existing model')
    parser.add_argument('--save_dir', type=str, default='saved_models/classifier', help='Root dir for saving models.')
    parser.add_argument('--save_name', type=str, default="{shorthand}_{embedding}_{bert_finetuning}_{classifier_type}_classifier.pt", help='Name for saving the model')

    parser.add_argument('--checkpoint_save_name', type=str, default=None, help="File name to save the most recent checkpoint")
    parser.add_argument('--no_checkpoint', dest='checkpoint', action='store_false', help="Don't save checkpoints")

    parser.add_argument('--save_intermediate_models', default=False, action='store_true',
                        help='Save all intermediate models - this can be a lot!')

    parser.add_argument('--train_file', type=str, default=DEFAULT_TRAIN, help='Input file(s) to train a model from.  Each line is an example.  Should go <label> <tokenized sentence>.  Comma separated list.')
    parser.add_argument('--dev_file', type=str, default=DEFAULT_DEV, help='Input file(s) to use as the dev set.')
    parser.add_argument('--test_file', type=str, default=DEFAULT_TEST, help='Input file(s) to use as the test set.')
    parser.add_argument('--output_predictions', default=False, action='store_true', help='Output predictions when running the test set')
    parser.add_argument('--max_epochs', type=int, default=100)
    parser.add_argument('--tick', type=int, default=50)

    parser.add_argument('--model_type', type=lambda x: ModelType[x.upper()], default=ModelType.CNN,
                        help='Model type to use.  Options: %s' % " ".join(x.name for x in ModelType))

    parser.add_argument('--filter_sizes', default=(3,4,5), type=ast.literal_eval, help='Filter sizes for the layer after the word vectors')
    parser.add_argument('--filter_channels', default=1000, type=ast.literal_eval, help='Number of channels for layers after the word vectors.  Int for same number of channels (scaled by width) for each filter, or tuple/list for exact lengths for each filter')
    parser.add_argument('--fc_shapes', default="400,100", type=convert_fc_shapes, help='Extra fully connected layers to put after the initial filters.  If set to blank, will FC directly from the max pooling to the output layer.')
    parser.add_argument('--dropout', default=0.5, type=float, help='Dropout value to use')

    parser.add_argument('--batch_size', default=50, type=int, help='Batch size when training')
    parser.add_argument('--batch_single_item', default=200, type=int, help='Items of this size go in their own batch')
    parser.add_argument('--dev_eval_batches', default=2000, type=int, help='Run the dev set after this many train batches.  Set to 0 to only do it once per epoch')
    parser.add_argument('--dev_eval_scoring', type=lambda x: DevScoring[x.upper()], default=DevScoring.WEIGHTED_F1,
                        help=('Scoring method to use for choosing the best model.  Options: %s' %
                              " ".join(x.name for x in DevScoring)))

    parser.add_argument('--weight_decay', default=None, type=float, help='Weight decay (eg, l2 reg) to use in the optimizer')
    parser.add_argument('--learning_rate', default=None, type=float, help='Learning rate to use in the optimizer')
    parser.add_argument('--momentum', default=None, type=float, help='Momentum to use in the optimizer')

    parser.add_argument('--optim', default='adadelta', choices=['adadelta', 'madgrad', 'sgd'], help='Optimizer type: SGD, Adadelta, or madgrad.  Installing madgrad and using that is highly recommended')

    parser.add_argument('--test_remap_labels', default=None, type=ast.literal_eval,
                        help='Map of which label each classifier label should map to.  For example, "{0:0, 1:0, 3:1, 4:1}" to map a 5 class sentiment test to a 2 class.  Any labels not mapped will be considered wrong')
    parser.add_argument('--forgive_unmapped_labels', dest='forgive_unmapped_labels', default=True, action='store_true',
                        help='When remapping labels, such as from 5 class to 2 class, pick a different label if the first guess is not remapped.')
    parser.add_argument('--no_forgive_unmapped_labels', dest='forgive_unmapped_labels', action='store_false',
                        help="When remapping labels, such as from 5 class to 2 class, DON'T pick a different label if the first guess is not remapped.")

    parser.add_argument('--loss', type=lambda x: Loss[x.upper()], default=Loss.CROSS,
                        help="Whether to use regular cross entropy or scale it by 1/log(quantity)")
    parser.add_argument('--loss_focal_gamma', default=2, type=float, help='gamma value for a focal loss')
    parser.add_argument('--min_train_len', type=int, default=0,
                        help="Filter out training sentences shorter than this length")

    parser.add_argument('--pretrain_max_vocab', type=int, default=-1)
    parser.add_argument('--wordvec_pretrain_file', type=

gitextract_z9hqe0ws/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   ├── feature_request.md
│   │   └── question.md
│   ├── pull_request_template.md
│   ├── stale.yml
│   └── workflows/
│       └── stanza-tests.yaml
├── .gitignore
├── .travis.yml
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── demo/
│   ├── CONLL_Dependency_Visualizer_Example.ipynb
│   ├── Dependency_Visualization_Testing.ipynb
│   ├── NER_Visualization.ipynb
│   ├── Stanza_Beginners_Guide.ipynb
│   ├── Stanza_CoreNLP_Interface.ipynb
│   ├── arabic_test.conllu.txt
│   ├── corenlp.py
│   ├── en_test.conllu.txt
│   ├── japanese_test.conllu.txt
│   ├── pipeline_demo.py
│   ├── scenegraph.py
│   ├── semgrex visualization.ipynb
│   ├── semgrex.py
│   └── ssurgeon_script.txt
├── doc/
│   └── CoreNLP.proto
├── scripts/
│   ├── config.sh
│   └── download_vectors.sh
├── setup.py
└── stanza/
    ├── __init__.py
    ├── _version.py
    ├── models/
    │   ├── __init__.py
    │   ├── _training_logging.py
    │   ├── charlm.py
    │   ├── classifier.py
    │   ├── classifiers/
    │   │   ├── __init__.py
    │   │   ├── base_classifier.py
    │   │   ├── cnn_classifier.py
    │   │   ├── config.py
    │   │   ├── constituency_classifier.py
    │   │   ├── data.py
    │   │   ├── iterate_test.py
    │   │   ├── trainer.py
    │   │   └── utils.py
    │   ├── common/
    │   │   ├── __init__.py
    │   │   ├── beam.py
    │   │   ├── bert_embedding.py
    │   │   ├── biaffine.py
    │   │   ├── build_short_name_to_treebank.py
    │   │   ├── char_model.py
    │   │   ├── chuliu_edmonds.py
    │   │   ├── constant.py
    │   │   ├── convert_pretrain.py
    │   │   ├── count_ner_coverage.py
    │   │   ├── count_pretrain_coverage.py
    │   │   ├── crf.py
    │   │   ├── data.py
    │   │   ├── doc.py
    │   │   ├── dropout.py
    │   │   ├── exceptions.py
    │   │   ├── foundation_cache.py
    │   │   ├── hlstm.py
    │   │   ├── large_margin_loss.py
    │   │   ├── loss.py
    │   │   ├── maxout_linear.py
    │   │   ├── packed_lstm.py
    │   │   ├── peft_config.py
    │   │   ├── pretrain.py
    │   │   ├── relative_attn.py
    │   │   ├── seq2seq_constant.py
    │   │   ├── seq2seq_model.py
    │   │   ├── seq2seq_modules.py
    │   │   ├── seq2seq_utils.py
    │   │   ├── short_name_to_treebank.py
    │   │   ├── stanza_object.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── constituency/
    │   │   ├── __init__.py
    │   │   ├── base_model.py
    │   │   ├── base_trainer.py
    │   │   ├── dynamic_oracle.py
    │   │   ├── ensemble.py
    │   │   ├── error_analysis_in_order.py
    │   │   ├── evaluate_treebanks.py
    │   │   ├── in_order_compound_oracle.py
    │   │   ├── in_order_oracle.py
    │   │   ├── label_attention.py
    │   │   ├── lstm_model.py
    │   │   ├── lstm_tree_stack.py
    │   │   ├── parse_transitions.py
    │   │   ├── parse_tree.py
    │   │   ├── parser_training.py
    │   │   ├── partitioned_transformer.py
    │   │   ├── positional_encoding.py
    │   │   ├── retagging.py
    │   │   ├── score_converted_dependencies.py
    │   │   ├── state.py
    │   │   ├── text_processing.py
    │   │   ├── top_down_oracle.py
    │   │   ├── trainer.py
    │   │   ├── transformer_tree_stack.py
    │   │   ├── transition_sequence.py
    │   │   ├── tree_embedding.py
    │   │   ├── tree_reader.py
    │   │   ├── tree_stack.py
    │   │   └── utils.py
    │   ├── constituency_parser.py
    │   ├── coref/
    │   │   ├── __init__.py
    │   │   ├── anaphoricity_scorer.py
    │   │   ├── bert.py
    │   │   ├── cluster_checker.py
    │   │   ├── config.py
    │   │   ├── conll.py
    │   │   ├── const.py
    │   │   ├── coref_chain.py
    │   │   ├── coref_config.toml
    │   │   ├── dataset.py
    │   │   ├── loss.py
    │   │   ├── model.py
    │   │   ├── pairwise_encoder.py
    │   │   ├── predict.py
    │   │   ├── rough_scorer.py
    │   │   ├── span_predictor.py
    │   │   ├── tokenizer_customization.py
    │   │   ├── utils.py
    │   │   └── word_encoder.py
    │   ├── depparse/
    │   │   ├── __init__.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── scorer.py
    │   │   └── trainer.py
    │   ├── identity_lemmatizer.py
    │   ├── lang_identifier.py
    │   ├── langid/
    │   │   ├── __init__.py
    │   │   ├── create_ud_data.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   └── trainer.py
    │   ├── lemma/
    │   │   ├── __init__.py
    │   │   ├── attach_lemma_classifier.py
    │   │   ├── data.py
    │   │   ├── edit.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   └── vocab.py
    │   ├── lemma_classifier/
    │   │   ├── __init__.py
    │   │   ├── base_model.py
    │   │   ├── base_trainer.py
    │   │   ├── baseline_model.py
    │   │   ├── constants.py
    │   │   ├── evaluate_many.py
    │   │   ├── evaluate_models.py
    │   │   ├── lstm_model.py
    │   │   ├── prepare_dataset.py
    │   │   ├── train_lstm_model.py
    │   │   ├── train_many.py
    │   │   ├── train_transformer_model.py
    │   │   ├── transformer_model.py
    │   │   └── utils.py
    │   ├── lemmatizer.py
    │   ├── mwt/
    │   │   ├── __init__.py
    │   │   ├── character_classifier.py
    │   │   ├── data.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── mwt_expander.py
    │   ├── ner/
    │   │   ├── __init__.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── ner_tagger.py
    │   ├── parser.py
    │   ├── pos/
    │   │   ├── __init__.py
    │   │   ├── build_xpos_vocab_factory.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── scorer.py
    │   │   ├── trainer.py
    │   │   ├── vocab.py
    │   │   ├── xpos_vocab_factory.py
    │   │   └── xpos_vocab_utils.py
    │   ├── tagger.py
    │   ├── tokenization/
    │   │   ├── __init__.py
    │   │   ├── data.py
    │   │   ├── model.py
    │   │   ├── tokenize_files.py
    │   │   ├── trainer.py
    │   │   ├── utils.py
    │   │   └── vocab.py
    │   ├── tokenizer.py
    │   └── wl_coref.py
    ├── pipeline/
    │   ├── __init__.py
    │   ├── _constants.py
    │   ├── constituency_processor.py
    │   ├── core.py
    │   ├── coref_processor.py
    │   ├── demo/
    │   │   ├── README.md
    │   │   ├── __init__.py
    │   │   ├── demo_server.py
    │   │   ├── stanza-brat.css
    │   │   ├── stanza-brat.html
    │   │   ├── stanza-brat.js
    │   │   └── stanza-parseviewer.js
    │   ├── depparse_processor.py
    │   ├── external/
    │   │   ├── __init__.py
    │   │   ├── corenlp_converter_depparse.py
    │   │   ├── jieba.py
    │   │   ├── pythainlp.py
    │   │   ├── spacy.py
    │   │   └── sudachipy.py
    │   ├── langid_processor.py
    │   ├── lemma_processor.py
    │   ├── morphseg_processor.py
    │   ├── multilingual.py
    │   ├── mwt_processor.py
    │   ├── ner_processor.py
    │   ├── pos_processor.py
    │   ├── processor.py
    │   ├── registry.py
    │   ├── sentiment_processor.py
    │   └── tokenize_processor.py
    ├── protobuf/
    │   ├── CoreNLP_pb2.py
    │   └── __init__.py
    ├── resources/
    │   ├── __init__.py
    │   ├── common.py
    │   ├── default_packages.py
    │   ├── installation.py
    │   ├── prepare_resources.py
    │   └── print_charlm_depparse.py
    ├── server/
    │   ├── __init__.py
    │   ├── annotator.py
    │   ├── client.py
    │   ├── dependency_converter.py
    │   ├── java_protobuf_requests.py
    │   ├── main.py
    │   ├── morphology.py
    │   ├── parser_eval.py
    │   ├── semgrex.py
    │   ├── ssurgeon.py
    │   ├── tokensregex.py
    │   ├── tsurgeon.py
    │   └── ud_enhancer.py
    ├── tests/
    │   ├── __init__.py
    │   ├── classifiers/
    │   │   ├── __init__.py
    │   │   ├── test_classifier.py
    │   │   ├── test_constituency_classifier.py
    │   │   ├── test_data.py
    │   │   └── test_process_utils.py
    │   ├── common/
    │   │   ├── __init__.py
    │   │   ├── test_bert_embedding.py
    │   │   ├── test_char_model.py
    │   │   ├── test_chuliu_edmonds.py
    │   │   ├── test_common_data.py
    │   │   ├── test_confusion.py
    │   │   ├── test_constant.py
    │   │   ├── test_data_conversion.py
    │   │   ├── test_data_objects.py
    │   │   ├── test_doc.py
    │   │   ├── test_dropout.py
    │   │   ├── test_foundation_cache.py
    │   │   ├── test_pretrain.py
    │   │   ├── test_relative_attn.py
    │   │   ├── test_short_name_to_treebank.py
    │   │   └── test_utils.py
    │   ├── constituency/
    │   │   ├── __init__.py
    │   │   ├── test_convert_arboretum.py
    │   │   ├── test_convert_it_vit.py
    │   │   ├── test_convert_starlang.py
    │   │   ├── test_ensemble.py
    │   │   ├── test_in_order_compound_oracle.py
    │   │   ├── test_in_order_oracle.py
    │   │   ├── test_lstm_model.py
    │   │   ├── test_parse_transitions.py
    │   │   ├── test_parse_tree.py
    │   │   ├── test_positional_encoding.py
    │   │   ├── test_selftrain_vi_quad.py
    │   │   ├── test_text_processing.py
    │   │   ├── test_top_down_oracle.py
    │   │   ├── test_trainer.py
    │   │   ├── test_transformer_tree_stack.py
    │   │   ├── test_transition_sequence.py
    │   │   ├── test_tree_reader.py
    │   │   ├── test_tree_stack.py
    │   │   ├── test_utils.py
    │   │   └── test_vietnamese.py
    │   ├── datasets/
    │   │   ├── __init__.py
    │   │   ├── coref/
    │   │   │   ├── __init__.py
    │   │   │   └── test_hebrew_iahlt.py
    │   │   ├── ner/
    │   │   │   ├── __init__.py
    │   │   │   ├── test_prepare_ner_file.py
    │   │   │   └── test_utils.py
    │   │   ├── test_common.py
    │   │   └── test_vietnamese_renormalization.py
    │   ├── depparse/
    │   │   ├── __init__.py
    │   │   ├── test_depparse_data.py
    │   │   └── test_parser.py
    │   ├── langid/
    │   │   ├── __init__.py
    │   │   ├── test_langid.py
    │   │   └── test_multilingual.py
    │   ├── lemma/
    │   │   ├── __init__.py
    │   │   ├── test_data.py
    │   │   ├── test_lemma_trainer.py
    │   │   └── test_lowercase.py
    │   ├── lemma_classifier/
    │   │   ├── __init__.py
    │   │   ├── test_data_preparation.py
    │   │   └── test_training.py
    │   ├── morphseg/
    │   │   ├── __init__.py
    │   │   ├── conftest.py
    │   │   ├── test_integration.py
    │   │   ├── test_morpheme_segmenter.py
    │   │   └── test_stanza_integration.py
    │   ├── mwt/
    │   │   ├── __init__.py
    │   │   ├── test_character_classifier.py
    │   │   ├── test_english_corner_cases.py
    │   │   ├── test_prepare_mwt.py
    │   │   └── test_utils.py
    │   ├── ner/
    │   │   ├── __init__.py
    │   │   ├── test_bsf_2_beios.py
    │   │   ├── test_bsf_2_iob.py
    │   │   ├── test_combine_ner_datasets.py
    │   │   ├── test_convert_amt.py
    │   │   ├── test_convert_nkjp.py
    │   │   ├── test_convert_starlang_ner.py
    │   │   ├── test_data.py
    │   │   ├── test_from_conllu.py
    │   │   ├── test_models_ner_scorer.py
    │   │   ├── test_ner_tagger.py
    │   │   ├── test_ner_trainer.py
    │   │   ├── test_ner_training.py
    │   │   ├── test_ner_utils.py
    │   │   ├── test_pay_amt_annotators.py
    │   │   ├── test_split_wikiner.py
    │   │   └── test_suc3.py
    │   ├── pipeline/
    │   │   ├── __init__.py
    │   │   ├── pipeline_device_tests.py
    │   │   ├── test_arabic_pipeline.py
    │   │   ├── test_core.py
    │   │   ├── test_decorators.py
    │   │   ├── test_depparse.py
    │   │   ├── test_english_pipeline.py
    │   │   ├── test_french_pipeline.py
    │   │   ├── test_lemmatizer.py
    │   │   ├── test_pipeline_constituency_processor.py
    │   │   ├── test_pipeline_depparse_processor.py
    │   │   ├── test_pipeline_mwt_expander.py
    │   │   ├── test_pipeline_ner_processor.py
    │   │   ├── test_pipeline_pos_processor.py
    │   │   ├── test_pipeline_sentiment_processor.py
    │   │   ├── test_requirements.py
    │   │   └── test_tokenizer.py
    │   ├── pos/
    │   │   ├── __init__.py
    │   │   ├── test_data.py
    │   │   ├── test_tagger.py
    │   │   └── test_xpos_vocab_factory.py
    │   ├── pytest.ini
    │   ├── resources/
    │   │   ├── __init__.py
    │   │   ├── test_charlm_depparse.py
    │   │   ├── test_common.py
    │   │   ├── test_default_packages.py
    │   │   ├── test_installation.py
    │   │   └── test_prepare_resources.py
    │   ├── server/
    │   │   ├── __init__.py
    │   │   ├── test_client.py
    │   │   ├── test_java_protobuf_requests.py
    │   │   ├── test_morphology.py
    │   │   ├── test_parser_eval.py
    │   │   ├── test_protobuf.py
    │   │   ├── test_semgrex.py
    │   │   ├── test_server_misc.py
    │   │   ├── test_server_pretokenized.py
    │   │   ├── test_server_request.py
    │   │   ├── test_server_start.py
    │   │   ├── test_ssurgeon.py
    │   │   ├── test_tokensregex.py
    │   │   ├── test_tsurgeon.py
    │   │   └── test_ud_enhancer.py
    │   ├── setup.py
    │   └── tokenization/
    │       ├── __init__.py
    │       ├── test_prepare_tokenizer_treebank.py
    │       ├── test_replace_long_tokens.py
    │       ├── test_spaces.py
    │       ├── test_tokenization_lst20.py
    │       ├── test_tokenization_orchid.py
    │       ├── test_tokenize_data.py
    │       ├── test_tokenize_files.py
    │       ├── test_tokenize_utils.py
    │       └── test_vocab.py
    └── utils/
        ├── __init__.py
        ├── avg_sent_len.py
        ├── charlm/
        │   ├── __init__.py
        │   ├── conll17_to_text.py
        │   ├── dump_oscar.py
        │   ├── make_lm_data.py
        │   └── oscar_to_text.py
        ├── confusion.py
        ├── conll.py
        ├── constituency/
        │   ├── __init__.py
        │   ├── check_transitions.py
        │   ├── grep_dev_logs.py
        │   ├── grep_test_logs.py
        │   └── list_tensors.py
        ├── datasets/
        │   ├── __init__.py
        │   ├── common.py
        │   ├── conllu_to_text.py
        │   ├── constituency/
        │   │   ├── __init__.py
        │   │   ├── build_silver_dataset.py
        │   │   ├── common_trees.py
        │   │   ├── convert_alt.py
        │   │   ├── convert_arboretum.py
        │   │   ├── convert_cintil.py
        │   │   ├── convert_ctb.py
        │   │   ├── convert_icepahc.py
        │   │   ├── convert_it_turin.py
        │   │   ├── convert_it_vit.py
        │   │   ├── convert_spmrl.py
        │   │   ├── convert_starlang.py
        │   │   ├── count_common_words.py
        │   │   ├── extract_all_silver_dataset.py
        │   │   ├── extract_silver_dataset.py
        │   │   ├── prepare_con_dataset.py
        │   │   ├── reduce_dataset.py
        │   │   ├── relabel_tags.py
        │   │   ├── selftrain.py
        │   │   ├── selftrain_it.py
        │   │   ├── selftrain_single_file.py
        │   │   ├── selftrain_vi_quad.py
        │   │   ├── selftrain_wiki.py
        │   │   ├── silver_variance.py
        │   │   ├── split_holdout.py
        │   │   ├── split_weighted_ensemble.py
        │   │   ├── tokenize_wiki.py
        │   │   ├── treebank_to_labeled_brackets.py
        │   │   ├── utils.py
        │   │   ├── vtb_convert.py
        │   │   └── vtb_split.py
        │   ├── contract_mwt.py
        │   ├── coref/
        │   │   ├── __init__.py
        │   │   ├── balance_languages.py
        │   │   ├── convert_hebrew_iahlt.py
        │   │   ├── convert_hebrew_mixed.py
        │   │   ├── convert_hindi.py
        │   │   ├── convert_ontonotes.py
        │   │   ├── convert_tamil.py
        │   │   ├── convert_udcoref.py
        │   │   ├── convert_udcoref_1.2.py
        │   │   └── utils.py
        │   ├── corenlp_segmenter_dataset.py
        │   ├── depparse/
        │   │   └── check_results.py
        │   ├── ner/
        │   │   ├── __init__.py
        │   │   ├── build_en_combined.py
        │   │   ├── check_for_duplicates.py
        │   │   ├── combine_ner_datasets.py
        │   │   ├── compare_entities.py
        │   │   ├── conll_to_iob.py
        │   │   ├── convert_amt.py
        │   │   ├── convert_ar_aqmar.py
        │   │   ├── convert_bn_daffodil.py
        │   │   ├── convert_bsf_to_beios.py
        │   │   ├── convert_bsnlp.py
        │   │   ├── convert_en_conll03.py
        │   │   ├── convert_fire_2013.py
        │   │   ├── convert_he_iahlt.py
        │   │   ├── convert_hy_armtdp.py
        │   │   ├── convert_ijc.py
        │   │   ├── convert_kk_kazNERD.py
        │   │   ├── convert_lst20.py
        │   │   ├── convert_mr_l3cube.py
        │   │   ├── convert_my_ucsy.py
        │   │   ├── convert_nkjp.py
        │   │   ├── convert_nner22.py
        │   │   ├── convert_nytk.py
        │   │   ├── convert_ontonotes.py
        │   │   ├── convert_rgai.py
        │   │   ├── convert_sindhi_siner.py
        │   │   ├── convert_starlang_ner.py
        │   │   ├── count_entities.py
        │   │   ├── json_to_bio.py
        │   │   ├── misc_to_date.py
        │   │   ├── ontonotes_multitag.py
        │   │   ├── prepare_ner_dataset.py
        │   │   ├── prepare_ner_file.py
        │   │   ├── preprocess_wikiner.py
        │   │   ├── simplify_en_worldwide.py
        │   │   ├── simplify_ontonotes_to_worldwide.py
        │   │   ├── split_wikiner.py
        │   │   ├── suc_conll_to_iob.py
        │   │   ├── suc_to_iob.py
        │   │   └── utils.py
        │   ├── pos/
        │   │   ├── __init__.py
        │   │   ├── convert_trees_to_pos.py
        │   │   └── remove_columns.py
        │   ├── prepare_depparse_treebank.py
        │   ├── prepare_lemma_classifier.py
        │   ├── prepare_lemma_treebank.py
        │   ├── prepare_mwt_treebank.py
        │   ├── prepare_pos_treebank.py
        │   ├── prepare_tokenizer_data.py
        │   ├── prepare_tokenizer_treebank.py
        │   ├── pretrain/
        │   │   ├── __init__.py
        │   │   └── word_in_pretrain.py
        │   ├── random_split_conllu.py
        │   ├── sentiment/
        │   │   ├── __init__.py
        │   │   ├── add_constituency.py
        │   │   ├── convert_italian_poetry_classification.py
        │   │   ├── convert_italian_sentence_classification.py
        │   │   ├── prepare_sentiment_dataset.py
        │   │   ├── process_MELD.py
        │   │   ├── process_airline.py
        │   │   ├── process_arguana_xml.py
        │   │   ├── process_corona.py
        │   │   ├── process_es_tass2020.py
        │   │   ├── process_it_sentipolc16.py
        │   │   ├── process_ren_chinese.py
        │   │   ├── process_sb10k.py
        │   │   ├── process_scare.py
        │   │   ├── process_slsd.py
        │   │   ├── process_sst.py
        │   │   ├── process_usage_german.py
        │   │   ├── process_utils.py
        │   │   └── process_vsfc_vietnamese.py
        │   ├── thai_syllable_dict_generator.py
        │   ├── tokenization/
        │   │   ├── __init__.py
        │   │   ├── convert_ml_cochin.py
        │   │   ├── convert_my_alt.py
        │   │   ├── convert_text_files.py
        │   │   ├── convert_th_best.py
        │   │   ├── convert_th_lst20.py
        │   │   ├── convert_th_orchid.py
        │   │   ├── convert_vi_vlsp.py
        │   │   └── process_thai_tokenization.py
        │   └── vietnamese/
        │       ├── __init__.py
        │       └── renormalize.py
        ├── default_paths.py
        ├── get_tqdm.py
        ├── helper_func.py
        ├── languages/
        │   ├── __init__.py
        │   └── kazakh_transliteration.py
        ├── lemma/
        │   ├── __init__.py
        │   └── count_ambiguous_lemmas.py
        ├── max_mwt_length.py
        ├── ner/
        │   ├── __init__.py
        │   ├── flair_ner_tag_dataset.py
        │   ├── paying_annotators.py
        │   └── spacy_ner_tag_dataset.py
        ├── pretrain/
        │   ├── __init__.py
        │   └── compare_pretrains.py
        ├── select_backoff.py
        ├── training/
        │   ├── __init__.py
        │   ├── common.py
        │   ├── compose_ete_results.py
        │   ├── remove_constituency_optimizer.py
        │   ├── run_charlm.py
        │   ├── run_constituency.py
        │   ├── run_depparse.py
        │   ├── run_ete.py
        │   ├── run_lemma.py
        │   ├── run_lemma_classifier.py
        │   ├── run_mwt.py
        │   ├── run_ner.py
        │   ├── run_pos.py
        │   ├── run_sentiment.py
        │   ├── run_tokenizer.py
        │   └── separate_ner_pretrain.py
        └── visualization/
            ├── README
            ├── __init__.py
            ├── conll_deprel_visualization.py
            ├── constants.py
            ├── dependency_visualization.py
            ├── ner_visualization.py
            ├── semgrex_app.py
            ├── semgrex_visualizer.py
            ├── ssurgeon_visualizer.py
            └── utils.py

SYMBOL INDEX (3744 symbols across 468 files)

FILE: stanza/models/charlm.py
  function repackage_hidden (line 24) | def repackage_hidden(h):
  function batchify (line 32) | def batchify(data, bsz, device):
  function get_batch (line 42) | def get_batch(source, i, seq_len):
  function load_file (line 48) | def load_file(filename, vocab, direction):
  function load_data (line 56) | def load_data(path, vocab, direction):
  function build_argparse (line 67) | def build_argparse():
  function build_model_filename (line 112) | def build_model_filename(args):
  function parse_args (line 120) | def parse_args(args=None):
  function main (line 131) | def main(args=None):
  function evaluate_epoch (line 145) | def evaluate_epoch(args, vocab, data, model, criterion):
  function evaluate_and_save (line 170) | def evaluate_and_save(args, vocab, data, trainer, best_loss, model_file,...
  function get_current_lr (line 205) | def get_current_lr(trainer, args):
  function load_char_vocab (line 208) | def load_char_vocab(vocab_file):
  function train (line 211) | def train(args):
  function evaluate (line 339) | def evaluate(args):

FILE: stanza/models/classifier.py
  class Loss (line 24) | class Loss(Enum):
  class DevScoring (line 30) | class DevScoring(Enum):
  function convert_fc_shapes (line 104) | def convert_fc_shapes(arg):
  function build_argparse (line 148) | def build_argparse():
  function build_model_filename (line 294) | def build_model_filename(args):
  function parse_args (line 304) | def parse_args(args=None):
  function dataset_predictions (line 327) | def dataset_predictions(model, dataset):
  function confusion_dataset (line 347) | def confusion_dataset(predictions, dataset, labels):
  function score_dataset (line 366) | def score_dataset(model, dataset, label_map=None,
  function score_dev_set (line 420) | def score_dev_set(model, dev_set, dev_eval_scoring):
  function intermediate_name (line 438) | def intermediate_name(filename, epoch, dev_scoring, score):
  function log_param_sizes (line 445) | def log_param_sizes(model):
  function train_model (line 454) | def train_model(trainer, model_file, checkpoint_file, args, train_set, d...
  function main (line 587) | def main(args=None):

FILE: stanza/models/classifiers/base_classifier.py
  class BaseClassifier (line 21) | class BaseClassifier(ABC, nn.Module):
    method extract_sentences (line 23) | def extract_sentences(self, doc):
    method preprocess_sentences (line 28) | def preprocess_sentences(self, sentences):
    method label_sentences (line 34) | def label_sentences(self, sentences, batch_size=None):

FILE: stanza/models/classifiers/cnn_classifier.py
  class CNNClassifier (line 55) | class CNNClassifier(BaseClassifier):
    method __init__ (line 56) | def __init__(self, pretrain, extra_vocab, labels,
    method add_unsaved_module (line 283) | def add_unsaved_module(self, name, module):
    method is_unsaved_module (line 294) | def is_unsaved_module(self, name):
    method log_configuration (line 297) | def log_configuration(self):
    method log_norms (line 305) | def log_norms(self):
    method build_char_reps (line 312) | def build_char_reps(self, inputs, max_phrase_len, charlm, projection, ...
    method extract_bert_embeddings (line 323) | def extract_bert_embeddings(self, inputs, max_phrase_len, begin_paddin...
    method forward (line 341) | def forward(self, inputs):
    method get_params (line 516) | def get_params(self, skip_modules=True):
    method preprocess_data (line 541) | def preprocess_data(self, sentences):
    method extract_sentences (line 545) | def extract_sentences(self, doc):

FILE: stanza/models/classifiers/config.py
  class CNNConfig (line 8) | class CNNConfig:  # pylint: disable=too-many-instance-attributes, too-fe...
  class ConstituencyConfig (line 44) | class ConstituencyConfig:  # pylint: disable=too-many-instance-attribute...

FILE: stanza/models/classifiers/constituency_classifier.py
  class ConstituencyClassifier (line 23) | class ConstituencyClassifier(BaseClassifier):
    method __init__ (line 24) | def __init__(self, tree_embedding, labels, args):
    method is_unsaved_module (line 43) | def is_unsaved_module(self, name):
    method log_configuration (line 46) | def log_configuration(self):
    method log_norms (line 53) | def log_norms(self):
    method forward (line 62) | def forward(self, inputs):
    method get_params (line 74) | def get_params(self, skip_modules=True):
    method extract_sentences (line 95) | def extract_sentences(self, doc):

FILE: stanza/models/classifiers/data.py
  class SentimentDatum (line 17) | class SentimentDatum:
    method __init__ (line 18) | def __init__(self, sentiment, text, constituency=None):
    method __eq__ (line 23) | def __eq__(self, other):
    method __str__ (line 30) | def __str__(self):
    method _asdict (line 33) | def _asdict(self):
  function update_text (line 39) | def update_text(sentence: List[str], wordvec_type: WVType) -> List[str]:
  function read_dataset (line 72) | def read_dataset(dataset, wordvec_type: WVType, min_len: int) -> List[Se...
  function dataset_labels (line 91) | def dataset_labels(dataset):
  function dataset_vocab (line 105) | def dataset_vocab(dataset):
  function sort_dataset_by_len (line 115) | def sort_dataset_by_len(dataset, keep_index=False):
  function shuffle_dataset (line 132) | def shuffle_dataset(sorted_dataset, batch_size, batch_single_item):
  function check_labels (line 158) | def check_labels(labels, dataset):

FILE: stanza/models/classifiers/iterate_test.py
  function parse_args (line 27) | def parse_args():

FILE: stanza/models/classifiers/trainer.py
  class Trainer (line 29) | class Trainer:
    method __init__ (line 34) | def __init__(self, model, optimizer=None, epochs_trained=0, global_ste...
    method save (line 45) | def save(self, filename, epochs_trained=None, skip_modules=True, save_...
    method load (line 68) | def load(filename, args, foundation_cache=None, load_optimizer=False):
    method load_pretrain (line 210) | def load_pretrain(args, foundation_cache):
    method build_new_model (line 233) | def build_new_model(args, train_set):
    method build_optimizer (line 303) | def build_optimizer(model, args):

FILE: stanza/models/classifiers/utils.py
  class WVType (line 14) | class WVType(Enum):
  class ExtraVectors (line 20) | class ExtraVectors(Enum):
  class ModelType (line 25) | class ModelType(Enum):
  function build_output_layers (line 29) | def build_output_layers(fc_input_size, fc_shapes, num_classes):

FILE: stanza/models/common/beam.py
  function trunc_division (line 34) | def trunc_division(a, b):
  function trunc_division (line 37) | def trunc_division(a, b):
  class Beam (line 40) | class Beam(object):
    method __init__ (line 41) | def __init__(self, size, device=None):
    method get_current_state (line 59) | def get_current_state(self):
    method get_current_origin (line 63) | def get_current_origin(self):
    method advance (line 67) | def advance(self, wordLk, copy_indices=None):
    method sort_best (line 113) | def sort_best(self):
    method get_best (line 116) | def get_best(self):
    method get_hyp (line 121) | def get_hyp(self, k):

FILE: stanza/models/common/bert_embedding.py
  class TextTooLongError (line 16) | class TextTooLongError(ValueError):
    method __init__ (line 20) | def __init__(self, length, max_len, line_num, text):
  function update_max_length (line 26) | def update_max_length(model_name, tokenizer):
  function load_tokenizer (line 37) | def load_tokenizer(model_name, tokenizer_kwargs=None, local_files_only=F...
  function load_bert (line 58) | def load_bert(model_name, tokenizer_kwargs=None, local_files_only=False):
  function tokenize_manual (line 70) | def tokenize_manual(model_name, sent, tokenizer):
  function filter_data (line 91) | def filter_data(model_name, data, tokenizer = None, log_level=logging.DE...
  function needs_length_filter (line 112) | def needs_length_filter(model_name):
  function cloned_feature (line 122) | def cloned_feature(feature, num_layers, detach=True):
  function extract_bart_word_embeddings (line 141) | def extract_bart_word_embeddings(model_name, tokenizer, model, data, dev...
  function extract_phobert_embeddings (line 177) | def extract_phobert_embeddings(model_name, tokenizer, model, data, devic...
  function fix_blank_tokens (line 273) | def fix_blank_tokens(tokenizer, data):
  function extract_llama_embeddings (line 296) | def extract_llama_embeddings(model_name, tokenizer, model, data, device,...
  function extract_xlnet_embeddings (line 338) | def extract_xlnet_embeddings(model_name, tokenizer, model, data, device,...
  function build_cloned_features (line 408) | def build_cloned_features(model, tokenizer, attention_tensor, id_tensor,...
  function convert_to_position_list (line 450) | def convert_to_position_list(sentence, offsets):
  function extract_base_embeddings (line 472) | def extract_base_embeddings(model_name, tokenizer, model, data, device, ...
  function extract_bert_embeddings (line 524) | def extract_bert_embeddings(model_name, tokenizer, model, data, device, ...

FILE: stanza/models/common/biaffine.py
  class PairwiseBilinear (line 5) | class PairwiseBilinear(nn.Module):
    method __init__ (line 9) | def __init__(self, input1_size, input2_size, output_size, bias=True):
    method forward (line 19) | def forward(self, input1, input2):
  class BiaffineScorer (line 35) | class BiaffineScorer(nn.Module):
    method __init__ (line 36) | def __init__(self, input1_size, input2_size, output_size):
    method forward (line 43) | def forward(self, input1, input2):
  class PairwiseBiaffineScorer (line 48) | class PairwiseBiaffineScorer(nn.Module):
    method __init__ (line 49) | def __init__(self, input1_size, input2_size, output_size):
    method forward (line 56) | def forward(self, input1, input2):
  class DeepBiaffineScorer (line 61) | class DeepBiaffineScorer(nn.Module):
    method __init__ (line 62) | def __init__(self, input1_size, input2_size, hidden_size, output_size,...
    method forward (line 73) | def forward(self, input1, input2):

FILE: stanza/models/common/char_model.py
  class CharacterModel (line 33) | class CharacterModel(nn.Module):
    method __init__ (line 34) | def __init__(self, args, vocab, pad=False, bidirectional=False, attent...
    method forward (line 55) | def forward(self, chars, chars_mask, word_orig_idx, sentlens, wordlens):
  function build_charlm_vocab (line 82) | def build_charlm_vocab(path, cutoff=0):
  class CharacterLanguageModel (line 121) | class CharacterLanguageModel(nn.Module):
    method __init__ (line 123) | def __init__(self, args, vocab, pad=False, is_forward_lm=True):
    method forward (line 145) | def forward(self, chars, charlens, hidden=None):
    method get_representation (line 158) | def get_representation(self, chars, charoffsets, charlens, char_orig_i...
    method per_char_representation (line 168) | def per_char_representation(self, words):
    method build_char_representation (line 183) | def build_char_representation(self, sentences):
    method hidden_dim (line 228) | def hidden_dim(self):
    method char_vocab (line 231) | def char_vocab(self):
    method train (line 234) | def train(self, mode=True):
    method full_state (line 245) | def full_state(self):
    method save (line 255) | def save(self, filename):
    method from_full_state (line 261) | def from_full_state(cls, state, finetune=False):
    method load (line 270) | def load(cls, filename, finetune=False):
  class CharacterLanguageModelWordAdapter (line 278) | class CharacterLanguageModelWordAdapter(nn.Module):
    method __init__ (line 282) | def __init__(self, charlms):
    method forward (line 286) | def forward(self, words, wrap=True):
    method hidden_dim (line 299) | def hidden_dim(self):
  class CharacterLanguageModelTrainer (line 302) | class CharacterLanguageModelTrainer():
    method __init__ (line 303) | def __init__(self, model, params, optimizer, criterion, scheduler, epo...
    method save (line 312) | def save(self, filename, full=True):
    method from_new_model (line 328) | def from_new_model(cls, args, vocab):
    method load (line 339) | def load(cls, args, filename, finetune=False):

FILE: stanza/models/common/chuliu_edmonds.py
  function tarjan (line 5) | def tarjan(tree):
  function process_cycle (line 125) | def process_cycle(tree, cycle, scores):
  function expand_contracted_tree (line 164) | def expand_contracted_tree(tree, contracted_tree, cycle_locs, noncycle_l...
  function prepare_scores (line 197) | def prepare_scores(scores):
  function chuliu_edmonds (line 206) | def chuliu_edmonds(scores):
  function chuliu_edmonds_one_root (line 246) | def chuliu_edmonds_one_root(scores):

FILE: stanza/models/common/constant.py
  class UnknownLanguageError (line 9) | class UnknownLanguageError(ValueError):
  function langcode_to_lang (line 493) | def langcode_to_lang(lcode):
  function pretty_langcode_to_lang (line 501) | def pretty_langcode_to_lang(lcode):
  function lang_to_langcode (line 510) | def lang_to_langcode(lang):
  function is_right_to_left (line 525) | def is_right_to_left(lang):
  function treebank_to_short_name (line 534) | def treebank_to_short_name(treebank):
  function treebank_to_langid (line 560) | def treebank_to_langid(treebank):

FILE: stanza/models/common/convert_pretrain.py
  function main (line 25) | def main():

FILE: stanza/models/common/count_ner_coverage.py
  function parse_args (line 4) | def parse_args():
  function read_ner (line 14) | def read_ner(filename):
  function count_coverage (line 25) | def count_coverage(pretrain, words):

FILE: stanza/models/common/count_pretrain_coverage.py
  function parse_args (line 15) | def parse_args():

FILE: stanza/models/common/crf.py
  class CRFLoss (line 12) | class CRFLoss(nn.Module):
    method __init__ (line 17) | def __init__(self, num_tag, batch_average=True):
    method forward (line 22) | def forward(self, inputs, masks, tag_indices):
    method crf_unary_score (line 46) | def crf_unary_score(self, inputs, masks, tag_indices, input_bs, input_...
    method crf_binary_score (line 57) | def crf_binary_score(self, inputs, masks, tag_indices, input_bs, input...
    method crf_log_norm (line 76) | def crf_log_norm(self, inputs, masks, tag_indices):
  function viterbi_decode (line 107) | def viterbi_decode(scores, transition_params):
  function log_sum_exp (line 132) | def log_sum_exp(value, dim=None, keepdim=False):
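
Note on `log_sum_exp` above: the index records only its signature. As a rough illustration of the standard numerically stable log-sum-exp trick that a helper with this name usually relies on, here is a pure-Python sketch over a flat list of floats. The name `log_sum_exp_list` is hypothetical, and this is not the stanza implementation, which takes `value`, `dim`, and `keepdim` arguments.

```python
import math


def log_sum_exp_list(values):
    """Stable log(sum(exp(v))) for a non-empty list of floats (hypothetical helper)."""
    m = max(values)
    # If the maximum is infinite, subtracting it would produce NaN; return it directly.
    if math.isinf(m):
        return m
    # Shifting by the maximum keeps every exponent <= 0, so exp() cannot overflow.
    return m + math.log(sum(math.exp(v - m) for v in values))
```

Shifting by the maximum before exponentiating is what lets large inputs such as 1000.0 be handled without overflow.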

FILE: stanza/models/common/data.py
  function map_to_ids (line 15) | def map_to_ids(tokens, vocab):
  function get_long_tensor (line 19) | def get_long_tensor(tokens_list, batch_size, pad_id=constant.PAD_ID):
  function get_float_tensor (line 33) | def get_float_tensor(features_list, batch_size):
  function sort_all (line 43) | def sort_all(batch, lens):
  function get_augment_ratio (line 51) | def get_augment_ratio(train_data, should_augment_predicate, can_augment_...
  function should_augment_nopunct_predicate (line 88) | def should_augment_nopunct_predicate(sentence):
  function can_augment_nopunct_predicate (line 92) | def can_augment_nopunct_predicate(sentence):
  function augment_punct (line 106) | def augment_punct(train_data, augment_ratio,

FILE: stanza/models/common/doc.py
  class MWTProcessingType (line 22) | class MWTProcessingType(Enum):
  class DocJSONEncoder (line 59) | class DocJSONEncoder(json.JSONEncoder):
    method default (line 60) | def default(self, obj):
  class Document (line 67) | class Document(StanzaObject):
    method __init__ (line 71) | def __init__(self, sentences, text=None, comments=None, empty_sentence...
    method mark_whitespace (line 92) | def mark_whitespace(self):
    method lang (line 114) | def lang(self):
    method lang (line 119) | def lang(self, value):
    method text (line 124) | def text(self):
    method text (line 129) | def text(self, value):
    method sentences (line 134) | def sentences(self):
    method sentences (line 139) | def sentences(self, value):
    method num_tokens (line 144) | def num_tokens(self):
    method num_tokens (line 149) | def num_tokens(self, value):
    method num_words (line 154) | def num_words(self):
    method num_words (line 159) | def num_words(self, value):
    method ents (line 164) | def ents(self):
    method ents (line 169) | def ents(self, value):
    method entities (line 174) | def entities(self):
    method entities (line 179) | def entities(self, value):
    method _process_sentences (line 183) | def _process_sentences(self, sentences, comments=None, empty_sentences...
    method _count_words (line 242) | def _count_words(self):
    method get (line 249) | def get(self, fields, as_sentences=False, from_token=False):
    method set (line 289) | def set(self, fields, contents, to_token=False, to_sentence=False):
    method set_mwt_expansions (line 336) | def set_mwt_expansions(self, expansions,
    method get_mwt_expansions (line 432) | def get_mwt_expansions(self, evaluation=False):
    method build_ents (line 451) | def build_ents(self):
    method sort_features (line 459) | def sort_features(self):
    method iter_words (line 469) | def iter_words(self):
    method iter_tokens (line 474) | def iter_tokens(self):
    method sentence_comments (line 479) | def sentence_comments(self):
    method coref (line 484) | def coref(self):
    method coref (line 491) | def coref(self, chains):
    method _attach_coref_mentions (line 496) | def _attach_coref_mentions(self, chains):
    method reindex_sentences (line 515) | def reindex_sentences(self, start_index):
    method to_dict (line 519) | def to_dict(self):
    method __repr__ (line 524) | def __repr__(self):
    method __format__ (line 527) | def __format__(self, spec):
    method to_serialized (line 534) | def to_serialized(self):
    method from_serialized (line 540) | def from_serialized(cls, serialized_string):
  class Sentence (line 555) | class Sentence(StanzaObject):
    method __init__ (line 559) | def __init__(self, tokens, doc=None, empty_words=None):
    method _process_tokens (line 587) | def _process_tokens(self, tokens):
    method has_enhanced_dependencies (line 624) | def has_enhanced_dependencies(self):
    method enhanced_dependencies (line 631) | def enhanced_dependencies(self):
    method index (line 644) | def index(self):
    method index (line 654) | def index(self, value):
    method id (line 659) | def id(self):
    method id (line 670) | def id(self, value):
    method sent_id (line 676) | def sent_id(self):
    method sent_id (line 681) | def sent_id(self, value):
    method speaker (line 693) | def speaker(self):
    method speaker (line 698) | def speaker(self, value):
    method doc_id (line 716) | def doc_id(self):
    method doc_id (line 721) | def doc_id(self, value):
    method doc (line 733) | def doc(self):
    method doc (line 738) | def doc(self, value):
    method text (line 743) | def text(self):
    method text (line 748) | def text(self, value):
    method dependencies (line 753) | def dependencies(self):
    method dependencies (line 758) | def dependencies(self, value):
    method tokens (line 763) | def tokens(self):
    method tokens (line 768) | def tokens(self, value):
    method words (line 773) | def words(self):
    method words (line 778) | def words(self, value):
    method empty_words (line 783) | def empty_words(self):
    method empty_words (line 788) | def empty_words(self, value):
    method all_words (line 793) | def all_words(self):
    method ents (line 804) | def ents(self):
    method ents (line 809) | def ents(self, value):
    method entities (line 814) | def entities(self):
    method entities (line 819) | def entities(self, value):
    method build_ents (line 823) | def build_ents(self):
    method sentiment (line 838) | def sentiment(self):
    method sentiment (line 843) | def sentiment(self, value):
    method constituency (line 855) | def constituency(self):
    method constituency (line 860) | def constituency(self, value):
    method comments (line 879) | def comments(self):
    method add_comment (line 883) | def add_comment(self, comment):
    method rebuild_dependencies (line 914) | def rebuild_dependencies(self):
    method build_dependencies (line 920) | def build_dependencies(self):
    method build_fake_dependencies (line 940) | def build_fake_dependencies(self):
    method print_dependencies (line 948) | def print_dependencies(self, file=None):
    method dependencies_string (line 953) | def dependencies_string(self):
    method get_roots (line 959) | def get_roots(self):
    method print_tokens (line 967) | def print_tokens(self, file=None):
    method tokens_string (line 972) | def tokens_string(self):
    method print_words (line 978) | def print_words(self, file=None):
    method words_string (line 983) | def words_string(self):
    method to_dict (line 989) | def to_dict(self):
    method __repr__ (line 1003) | def __repr__(self):
    method __format__ (line 1006) | def __format__(self, spec):
  function init_from_misc (line 1035) | def init_from_misc(unit):
  function dict_to_conll_text (line 1062) | def dict_to_conll_text(token_dict, id_connector="-"):
  class Token (line 1122) | class Token(StanzaObject):
    method __init__ (line 1128) | def __init__(self, sentence, token_entry, words=None):
    method id (line 1155) | def id(self):
    method id (line 1160) | def id(self, value):
    method manual_expansion (line 1165) | def manual_expansion(self):
    method manual_expansion (line 1170) | def manual_expansion(self, value):
    method text (line 1175) | def text(self):
    method text (line 1180) | def text(self, value):
    method misc (line 1185) | def misc(self):
    method misc (line 1190) | def misc(self, value):
    method consolidate_whitespace (line 1194) | def consolidate_whitespace(self):
    method spaces_before (line 1248) | def spaces_before(self):
    method spaces_before (line 1253) | def spaces_before(self, value):
    method spaces_after (line 1257) | def spaces_after(self):
    method spaces_after (line 1262) | def spaces_after(self, value):
    method words (line 1266) | def words(self):
    method words (line 1271) | def words(self, value):
    method line_number (line 1278) | def line_number(self):
    method start_char (line 1283) | def start_char(self):
    method end_char (line 1288) | def end_char(self):
    method ner (line 1293) | def ner(self):
    method ner (line 1298) | def ner(self, value):
    method multi_ner (line 1303) | def multi_ner(self):
    method multi_ner (line 1308) | def multi_ner(self, value):
    method sent (line 1313) | def sent(self):
    method sent (line 1318) | def sent(self, value):
    method __repr__ (line 1322) | def __repr__(self):
    method __format__ (line 1325) | def __format__(self, spec):
    method to_conll_text (line 1333) | def to_conll_text(self, fields=DEFAULT_OUTPUT_FIELDS):
    method to_dict (line 1336) | def to_dict(self, fields=DEFAULT_OUTPUT_FIELDS):
    method pretty_print (line 1389) | def pretty_print(self):
    method _is_null (line 1393) | def _is_null(self, value):
    method is_mwt (line 1396) | def is_mwt(self):
  class Word (line 1399) | class Word(StanzaObject):
    method __init__ (line 1403) | def __init__(self, sentence, word_entry):
    method manual_expansion (line 1437) | def manual_expansion(self):
    method manual_expansion (line 1442) | def manual_expansion(self, value):
    method id (line 1447) | def id(self):
    method id (line 1452) | def id(self, value):
    method text (line 1457) | def text(self):
    method text (line 1462) | def text(self, value):
    method lemma (line 1467) | def lemma(self):
    method lemma (line 1472) | def lemma(self, value):
    method upos (line 1477) | def upos(self):
    method upos (line 1482) | def upos(self, value):
    method xpos (line 1487) | def xpos(self):
    method xpos (line 1492) | def xpos(self, value):
    method feats (line 1497) | def feats(self):
    method feats (line 1502) | def feats(self, value):
    method head (line 1507) | def head(self):
    method head (line 1512) | def head(self, value):
    method deprel (line 1517) | def deprel(self):
    method deprel (line 1522) | def deprel(self, value):
    method deps (line 1527) | def deps(self):
    method deps (line 1548) | def deps(self, value):
    method misc (line 1581) | def misc(self):
    method misc (line 1586) | def misc(self, value):
    method line_number (line 1591) | def line_number(self):
    method start_char (line 1596) | def start_char(self):
    method start_char (line 1601) | def start_char(self, value):
    method end_char (line 1605) | def end_char(self):
    method end_char (line 1610) | def end_char(self, value):
    method parent (line 1614) | def parent(self):
    method parent (line 1621) | def parent(self, value):
    method pos (line 1628) | def pos(self):
    method pos (line 1633) | def pos(self, value):
    method coref_chains (line 1638) | def coref_chains(self):
    method coref_chains (line 1651) | def coref_chains(self, chain):
    method sent (line 1656) | def sent(self):
    method sent (line 1661) | def sent(self, value):
    method __repr__ (line 1665) | def __repr__(self):
    method __format__ (line 1668) | def __format__(self, spec):
    method to_conll_text (line 1676) | def to_conll_text(self, fields=DEFAULT_OUTPUT_FIELDS):
    method to_dict (line 1683) | def to_dict(self, fields=DEFAULT_OUTPUT_FIELDS):
    method pretty_print (line 1692) | def pretty_print(self):
    method _is_null (line 1698) | def _is_null(self, value):
  class Span (line 1702) | class Span(StanzaObject):
    method __init__ (line 1707) | def __init__(self, span_entry=None, tokens=None, type=None, doc=None, ...
    method init_from_entry (line 1726) | def init_from_entry(self, span_entry):
    method init_from_tokens (line 1732) | def init_from_tokens(self, tokens, type):
    method doc (line 1763) | def doc(self):
    method doc (line 1768) | def doc(self, value):
    method text (line 1773) | def text(self):
    method text (line 1778) | def text(self, value):
    method tokens (line 1783) | def tokens(self):
    method tokens (line 1788) | def tokens(self, value):
    method words (line 1793) | def words(self):
    method words (line 1798) | def words(self, value):
    method type (line 1803) | def type(self):
    method type (line 1808) | def type(self, value):
    method start_char (line 1813) | def start_char(self):
    method start_char (line 1818) | def start_char(self, value):
    method end_char (line 1823) | def end_char(self):
    method end_char (line 1828) | def end_char(self, value):
    method sent (line 1833) | def sent(self):
    method sent (line 1838) | def sent(self, value):
    method to_dict (line 1842) | def to_dict(self):
    method __repr__ (line 1848) | def __repr__(self):
    method pretty_print (line 1851) | def pretty_print(self):

FILE: stanza/models/common/dropout.py
  class WordDropout (line 4) | class WordDropout(nn.Module):
    method __init__ (line 9) | def __init__(self, dropprob):
    method forward (line 13) | def forward(self, x, replacement=None):
    method extra_repr (line 27) | def extra_repr(self):
  class LockedDropout (line 30) | class LockedDropout(nn.Module):
    method __init__ (line 35) | def __init__(self, dropprob, batch_first=True):
    method forward (line 40) | def forward(self, x):
    method extra_repr (line 52) | def extra_repr(self):
  class SequenceUnitDropout (line 55) | class SequenceUnitDropout(nn.Module):
    method __init__ (line 59) | def __init__(self, dropprob, replacement_id):
    method forward (line 64) | def forward(self, x):
    method extra_repr (line 73) | def extra_repr(self):

FILE: stanza/models/common/exceptions.py
  class ForwardCharlmNotFoundError (line 9) | class ForwardCharlmNotFoundError(FileNotFoundError):
    method __init__ (line 10) | def __init__(self, msg, filename):
  class BackwardCharlmNotFoundError (line 13) | class BackwardCharlmNotFoundError(FileNotFoundError):
    method __init__ (line 14) | def __init__(self, msg, filename):

FILE: stanza/models/common/foundation_cache.py
  class FoundationCache (line 18) | class FoundationCache:
    method __init__ (line 19) | def __init__(self, other=None, local_files_only=False):
    method load_bert (line 34) | def load_bert(self, transformer_name, local_files_only=None):
    method load_bert_with_peft (line 38) | def load_bert_with_peft(self, transformer_name, peft_name, local_files...
    method load_charlm (line 65) | def load_charlm(self, filename):
    method load_pretrain (line 78) | def load_pretrain(self, filename):
  class NoTransformerFoundationCache (line 95) | class NoTransformerFoundationCache(FoundationCache):
    method load_bert (line 104) | def load_bert(self, transformer_name, local_files_only=None):
    method load_bert_with_peft (line 107) | def load_bert_with_peft(self, transformer_name, peft_name, local_files...
  function load_bert (line 110) | def load_bert(model_name, foundation_cache=None, local_files_only=None):
  function load_bert_with_peft (line 119) | def load_bert_with_peft(model_name, peft_name, foundation_cache=None, lo...
  function load_charlm (line 125) | def load_charlm(charlm_file, foundation_cache=None, finetune=False):
  function load_pretrain (line 140) | def load_pretrain(filename, foundation_cache=None):

FILE: stanza/models/common/hlstm.py
  class HLSTMCell (line 8) | class HLSTMCell(nn.modules.rnn.RNNCellBase):
    method __init__ (line 13) | def __init__(self, input_size, hidden_size, bias=True):
    method forward (line 27) | def forward(self, input, c_l_minus_one=None, hx=None):
  class HighwayLSTM (line 55) | class HighwayLSTM(nn.Module):
    method __init__ (line 60) | def __init__(self, input_size, hidden_size,
    method forward (line 91) | def forward(self, input, seqlens, hx=None):

FILE: stanza/models/common/large_margin_loss.py
  class LargeMarginInSoftmaxLoss (line 27) | class LargeMarginInSoftmaxLoss(nn.CrossEntropyLoss):
    method __init__ (line 40) | def __init__(self, reg_lambda=0.3, deg_logit=None,
    method forward (line 47) | def forward(self, input, target):

FILE: stanza/models/common/loss.py
  function SequenceLoss (line 14) | def SequenceLoss(vocab_size):
  function weighted_cross_entropy_loss (line 20) | def weighted_cross_entropy_loss(labels, log_dampened=False):
  class FocalLoss (line 39) | class FocalLoss(nn.Module):
    method __init__ (line 50) | def __init__(self, reduction='mean', gamma=2.0):
    method forward (line 59) | def forward(self, inputs, targets):
  class MixLoss (line 88) | class MixLoss(nn.Module):
    method __init__ (line 93) | def __init__(self, vocab_size, alpha):
    method forward (line 100) | def forward(self, seq_inputs, seq_targets, class_inputs, class_targets):
  class MaxEntropySequenceLoss (line 106) | class MaxEntropySequenceLoss(nn.Module):
    method __init__ (line 113) | def __init__(self, vocab_size, alpha):
    method forward (line 120) | def forward(self, inputs, targets):

FILE: stanza/models/common/maxout_linear.py
  class MaxoutLinear (line 22) | class MaxoutLinear(nn.Module):
    method __init__ (line 23) | def __init__(self, in_channels, out_channels, maxout_k):
    method forward (line 32) | def forward(self, inputs):

FILE: stanza/models/common/packed_lstm.py
  class PackedLSTM (line 6) | class PackedLSTM(nn.Module):
    method __init__ (line 7) | def __init__(self, input_size, hidden_size, num_layers, bias=True, bat...
    method forward (line 18) | def forward(self, input, lengths, hx=None):
  class LSTMwRecDropout (line 27) | class LSTMwRecDropout(nn.Module):
    method __init__ (line 29) | def __init__(self, input_size, hidden_size, num_layers, bias=True, bat...
    method forward (line 48) | def forward(self, input, hx=None):

FILE: stanza/models/common/peft_config.py
  function add_peft_args (line 24) | def add_peft_args(parser):
  function pop_peft_args (line 36) | def pop_peft_args(args):
  function resolve_peft_args (line 52) | def resolve_peft_args(args, logger, check_bert_finetune=True):
  function build_peft_config (line 84) | def build_peft_config(args, logger):
  function build_peft_wrapper (line 97) | def build_peft_wrapper(bert_model, args, logger, adapter_name="default"):
  function load_peft_wrapper (line 111) | def load_peft_wrapper(bert_model, lora_params, args, logger, adapter_name):

FILE: stanza/models/common/pretrain.py
  class PretrainedWordVocab (line 23) | class PretrainedWordVocab(BaseVocab):
    method build_vocab (line 24) | def build_vocab(self):
    method normalize_unit (line 28) | def normalize_unit(self, unit):
  class Pretrain (line 34) | class Pretrain:
    method __init__ (line 37) | def __init__(self, filename=None, vec_filename=None, max_vocab=-1, sav...
    method __len__ (line 44) | def __len__(self):
    method vocab (line 48) | def vocab(self):
    method emb (line 54) | def emb(self):
    method load (line 59) | def load(self):
    method save (line 100) | def save(self, filename):
    method write_text (line 115) | def write_text(self, filename, header=False):
    method read_pretrain (line 131) | def read_pretrain(self):
    method read_from_csv (line 153) | def read_from_csv(filename):
    method read_from_file (line 178) | def read_from_file(filename, max_vocab=None):
  function find_pretrain_file (line 239) | def find_pretrain_file(wordvec_pretrain_file, save_dir, shorthand, lang):

FILE: stanza/models/common/relative_attn.py
  class RelativeAttention (line 9) | class RelativeAttention(nn.Module):
    method __init__ (line 10) | def __init__(self, d_model, num_heads, window=8, dropout=0.2, reverse=...
    method forward (line 50) | def forward(self, x, sink=None):
    method skew_repeat (line 116) | def skew_repeat(self, q):

FILE: stanza/models/common/seq2seq_model.py
  class Seq2SeqModel (line 19) | class Seq2SeqModel(nn.Module):
    method __init__ (line 26) | def __init__(self, args, emb_matrix=None, contextual_embedding=None):
    method add_unsaved_module (line 88) | def add_unsaved_module(self, name, module):
    method init_weights (line 92) | def init_weights(self):
    method zero_state (line 116) | def zero_state(self, inputs):
    method encode (line 123) | def encode(self, enc_inputs, lens):
    method decode (line 134) | def decode(self, dec_inputs, hn, cn, ctx, ctx_mask=None, src=None, nev...
    method embed (line 213) | def embed(self, src, src_mask, pos, raw):
    method forward (line 233) | def forward(self, src, src_mask, tgt_in, pos=None, raw=None):
    method get_log_prob (line 250) | def get_log_prob(self, logits):
    method predict_greedy (line 257) | def predict_greedy(self, src, src_mask, pos=None, raw=None, never_deco...
    method predict (line 298) | def predict(self, src, src_mask, pos=None, beam_size=5, raw=None, neve...

FILE: stanza/models/common/seq2seq_modules.py
  class BasicAttention (line 15) | class BasicAttention(nn.Module):
    method __init__ (line 19) | def __init__(self, dim):
    method forward (line 28) | def forward(self, input, context, mask=None, attn_only=False):
  class SoftDotAttention (line 55) | class SoftDotAttention(nn.Module):
    method __init__ (line 62) | def __init__(self, dim):
    method forward (line 71) | def forward(self, input, context, mask=None, attn_only=False, return_l...
  class LinearAttention (line 106) | class LinearAttention(nn.Module):
    method __init__ (line 111) | def __init__(self, dim):
    method forward (line 119) | def forward(self, input, context, mask=None, attn_only=False):
  class DeepAttention (line 148) | class DeepAttention(nn.Module):
    method __init__ (line 155) | def __init__(self, dim):
    method forward (line 165) | def forward(self, input, context, mask=None, attn_only=False):
  class LSTMAttention (line 194) | class LSTMAttention(nn.Module):
    method __init__ (line 197) | def __init__(self, input_size, hidden_size, batch_first=True, attn_typ...
    method forward (line 218) | def forward(self, input, hidden, ctx, ctx_mask=None, return_logattn=Fa...

FILE: stanza/models/common/seq2seq_utils.py
  function get_optimizer (line 12) | def get_optimizer(name, parameters, lr):
  function change_lr (line 24) | def change_lr(optimizer, new_lr):
  function flatten_indices (line 28) | def flatten_indices(seq_lens, width):
  function keep_partial_grad (line 35) | def keep_partial_grad(grad, topk):
  function save_config (line 44) | def save_config(config, path, verbose=True):
  function load_config (line 51) | def load_config(path, verbose=True):
  function unmap_with_copy (line 58) | def unmap_with_copy(indices, src_tokens, vocab):
  function prune_decoded_seqs (line 74) | def prune_decoded_seqs(seqs):
  function prune_hyp (line 87) | def prune_hyp(hyp):
  function prune (line 97) | def prune(data_list, lens):
  function sort (line 104) | def sort(packed, ref, reverse=True):
  function unsort (line 114) | def unsort(sorted_list, oidx):
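
Note on `sort` and `unsort` above: the index shows a pair of helpers where one returns sorted data and the other takes a sorted list plus an `oidx` argument. A common pattern behind such a pair is to sort by a reference key (for example sequence length), remember the original positions, and later restore the original order. The sketch below illustrates that pattern with plain lists; the names `sort_with_original_index` and `restore_order` are hypothetical and the bodies are an assumption, not the code in `seq2seq_utils.py`.

```python
def sort_with_original_index(items, key_values, reverse=True):
    """Sort items by key_values and also return each item's original position."""
    order = sorted(range(len(items)), key=lambda i: key_values[i], reverse=reverse)
    return [items[i] for i in order], order


def restore_order(sorted_items, original_index):
    """Undo the sort: put each item back at the position it came from."""
    restored = [None] * len(sorted_items)
    for item, idx in zip(sorted_items, original_index):
        restored[idx] = item
    return restored


words = ["a", "bbb", "cc"]
by_length, oidx = sort_with_original_index(words, [len(w) for w in words])
round_trip = restore_order(by_length, oidx)
```

Sorting longest-first is the usual choice when the batch is later packed for a recurrent network, which is why `reverse=True` is the default in this sketch.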

FILE: stanza/models/common/short_name_to_treebank.py
  function short_name_to_treebank (line 355) | def short_name_to_treebank(short_name):
  function canonical_treebank_name (line 702) | def canonical_treebank_name(ud_name):

FILE: stanza/models/common/stanza_object.py
  function _readonly_setter (line 1) | def _readonly_setter(self, name):
  class StanzaObject (line 9) | class StanzaObject(object):
    method add_property (line 15) | def add_property(cls, name, default=None, getter=None, setter=None):
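
Note on `add_property` above: the signature `add_property(cls, name, default=None, getter=None, setter=None)` suggests a class-level hook for attaching properties at runtime. The sketch below shows one way such a hook can be written with the built-in `property`; the class `PropertyHost`, the `_name` storage convention, and the read-only fallback are assumptions for illustration, not the contents of `stanza_object.py`.

```python
class PropertyHost:
    """Hypothetical stand-in for a class that accepts new properties at runtime."""

    @classmethod
    def add_property(cls, name, default=None, getter=None, setter=None):
        storage = "_" + name  # assumed convention: value lives in a private attribute

        def default_getter(self):
            # Fall back to the default when the attribute was never set.
            return getattr(self, storage, default)

        def readonly_setter(self, value):
            raise AttributeError(f"property {name!r} is read-only")

        setattr(cls, name, property(getter or default_getter, setter or readonly_setter))


PropertyHost.add_property("score", default=0)
PropertyHost.add_property("label", setter=lambda self, v: setattr(self, "_label", v))

obj = PropertyHost()
obj.label = "example"
```

Because the property is installed on the class, every existing and future instance gains the attribute, which is convenient when optional processors want to extend a shared data object.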

FILE: stanza/models/common/trainer.py
  class Trainer (line 3) | class Trainer:
    method change_lr (line 4) | def change_lr(self, new_lr):
    method save (line 8) | def save(self, filename):
    method load (line 15) | def load(self, filename):

FILE: stanza/models/common/utils.py
  function get_wordvec_file (line 40) | def get_wordvec_file(wordvec_dir, shorthand, wordvec_type=None):
  function output_stream (line 68) | def output_stream(filename=None):
  function open_read_text (line 82) | def open_read_text(filename, encoding="utf-8"):
  function open_read_binary (line 105) | def open_read_binary(filename):
  function get_adaptive_eval_interval (line 139) | def get_adaptive_eval_interval(cur_dev_size, thres_dev_size, base_interv...
  function ud_scores (line 151) | def ud_scores(gold_conllu_file, system_conllu_file):
  function harmonic_mean (line 181) | def harmonic_mean(a, weights=None):
  function dispatch_optimizer (line 192) | def dispatch_optimizer(name, parameters, opt_logger, lr=None, betas=None...
  function get_optimizer (line 250) | def get_optimizer(name, model, lr, betas=(0.9, 0.999), eps=1e-8, momentu...
  function get_split_optimizer (line 293) | def get_split_optimizer(name, model, lr, betas=(0.9, 0.999), eps=1e-8, m...
  function change_lr (line 338) | def change_lr(optimizer, new_lr):
  function flatten_indices (line 342) | def flatten_indices(seq_lens, width):
  function keep_partial_grad (line 349) | def keep_partial_grad(grad, topk):
  function ensure_dir (line 358) | def ensure_dir(d, verbose=True):
  function save_config (line 365) | def save_config(config, path, verbose=True):
  function load_config (line 372) | def load_config(path, verbose=True):
  function print_config (line 379) | def print_config(config):
  function normalize_text (line 385) | def normalize_text(text):
  function unmap_with_copy (line 388) | def unmap_with_copy(indices, src_tokens, vocab):
  function prune_decoded_seqs (line 404) | def prune_decoded_seqs(seqs):
  function prune_hyp (line 417) | def prune_hyp(hyp):
  function prune (line 427) | def prune(data_list, lens):
  function sort (line 434) | def sort(packed, ref, reverse=True):
  function unsort (line 444) | def unsort(sorted_list, oidx):
  function sort_with_indices (line 454) | def sort_with_indices(data, key=None, reverse=False):
  function split_into_batches (line 471) | def split_into_batches(data, batch_size):
  function tensor_unsort (line 503) | def tensor_unsort(sorted_tensor, oidx):
  function set_random_seed (line 512) | def set_random_seed(seed):
  function find_missing_tags (line 530) | def find_missing_tags(known_tags, test_tags):
  function warn_missing_tags (line 538) | def warn_missing_tags(known_tags, test_tags, test_set_name):
  function checkpoint_name (line 550) | def checkpoint_name(save_dir, save_name, checkpoint_name):
  function default_device (line 570) | def default_device():
  function add_device_args (line 578) | def add_device_args(parser):
  function load_elmo (line 586) | def load_elmo(elmo_model):
  function log_training_args (line 595) | def log_training_args(args, args_logger, name="training"):
  function embedding_name (line 605) | def embedding_name(args):
  function standard_model_file_name (line 627) | def standard_model_file_name(args, model_type, **kwargs):
  function escape_misc_space (line 686) | def escape_misc_space(space):
  function unescape_misc_space (line 708) | def unescape_misc_space(misc_space):
  function space_before_to_misc (line 739) | def space_before_to_misc(space):
  function space_after_to_misc (line 755) | def space_after_to_misc(space):
  function misc_to_space_before (line 766) | def misc_to_space_before(misc):
  function misc_to_space_after (line 780) | def misc_to_space_after(misc):
  function log_norms (line 806) | def log_norms(model):
  function attach_bert_model (line 819) | def attach_bert_model(model, bert_model, bert_tokenizer, use_peft, force...
  function build_save_each_filename (line 834) | def build_save_each_filename(base_filename):
  function build_nonlinearity (line 912) | def build_nonlinearity(nonlinearity):
  function update_word_cutoff (line 924) | def update_word_cutoff(pt, word_cutoff):
  function simplify_punct (line 999) | def simplify_punct(data):

FILE: stanza/models/common/vocab.py
  class BaseVocab (line 18) | class BaseVocab:
    method __init__ (line 21) | def __init__(self, data=None, lang="", idx=0, cutoff=0, lower=False):
    method build_vocab (line 31) | def build_vocab(self):
    method state_dict (line 34) | def state_dict(self):
    method load_state_dict (line 44) | def load_state_dict(cls, state_dict):
    method normalize_unit (line 51) | def normalize_unit(self, unit):
    method unit2id (line 60) | def unit2id(self, unit):
    method id2unit (line 67) | def id2unit(self, id):
    method map (line 70) | def map(self, units):
    method unmap (line 73) | def unmap(self, ids):
    method __str__ (line 76) | def __str__(self):
    method __len__ (line 81) | def __len__(self):
    method __getitem__ (line 84) | def __getitem__(self, key):
    method __contains__ (line 92) | def __contains__(self, key):
    method size (line 96) | def size(self):
  class DeltaVocab (line 99) | class DeltaVocab(BaseVocab):
    method __init__ (line 107) | def __init__(self, data, orig_vocab):
    method build_vocab (line 111) | def build_vocab(self):
  class CompositeVocab (line 128) | class CompositeVocab(BaseVocab):
    method __init__ (line 142) | def __init__(self, data=None, lang="", idx=0, sep="", keyed=False):
    method unit2parts (line 148) | def unit2parts(self, unit):
    method unit2id (line 167) | def unit2id(self, unit):
    method id2unit (line 175) | def id2unit(self, id):
    method build_vocab (line 194) | def build_vocab(self):
    method lens (line 233) | def lens(self):
    method items (line 236) | def items(self, idx):
    method __str__ (line 239) | def __str__(self):
  class BaseMultiVocab (line 244) | class BaseMultiVocab:
    method __init__ (line 249) | def __init__(self, vocab_dict=None):
    method __setitem__ (line 258) | def __setitem__(self, key, item):
    method __getitem__ (line 261) | def __getitem__(self, key):
    method __str__ (line 264) | def __str__(self):
    method __contains__ (line 267) | def __contains__(self, key):
    method keys (line 270) | def keys(self):
    method state_dict (line 273) | def state_dict(self):
    method load_state_dict (line 281) | def load_state_dict(cls, state_dict):
  class CharVocab (line 287) | class CharVocab(BaseVocab):
    method build_vocab (line 288) | def build_vocab(self):

FILE: stanza/models/constituency/base_model.py
  class BaseModel (line 39) | class BaseModel(ABC):
    method __init__ (line 49) | def __init__(self, transition_scheme, unary_limit, reverse_sentence, r...
    method initial_word_queues (line 62) | def initial_word_queues(self, tagged_word_lists):
    method initial_transitions (line 70) | def initial_transitions(self):
    method initial_constituents (line 76) | def initial_constituents(self):
    method get_word (line 82) | def get_word(self, word_node):
    method transform_word_to_constituent (line 88) | def transform_word_to_constituent(self, state):
    method dummy_constituent (line 94) | def dummy_constituent(self, dummy):
    method build_constituents (line 100) | def build_constituents(self, labels, children_lists):
    method push_constituents (line 106) | def push_constituents(self, constituent_stacks, constituents):
    method get_top_constituent (line 114) | def get_top_constituent(self, constituents):
    method push_transitions (line 122) | def push_transitions(self, transition_stacks, transitions):
    method get_top_transition (line 130) | def get_top_transition(self, transitions):
    method root_labels (line 138) | def root_labels(self):
    method unary_limit (line 146) | def unary_limit(self):
    method transition_scheme (line 153) | def transition_scheme(self):
    method has_unary_transitions (line 159) | def has_unary_transitions(self):
    method is_top_down (line 166) | def is_top_down(self):
    method reverse_sentence (line 173) | def reverse_sentence(self):
    method predict (line 179) | def predict(self, states, is_legal=True):
    method weighted_choice (line 182) | def weighted_choice(self, states):
    method predict_gold (line 185) | def predict_gold(self, states, is_legal=True):
    method initial_state_from_preterminals (line 196) | def initial_state_from_preterminals(self, preterminal_lists, gold_tree...
    method initial_state_from_words (line 221) | def initial_state_from_words(self, word_lists):
    method initial_state_from_gold_trees (line 226) | def initial_state_from_gold_trees(self, trees, gold_sequences=None):
    method build_batch_from_trees (line 232) | def build_batch_from_trees(self, batch_size, data_iterator):
    method build_batch_from_trees_with_gold_sequence (line 247) | def build_batch_from_trees_with_gold_sequence(self, batch_size, data_i...
    method build_batch_from_tagged_words (line 259) | def build_batch_from_tagged_words(self, batch_size, data_iterator):
    method parse_sentences (line 277) | def parse_sentences(self, data_iterator, build_batch_fn, batch_size, t...
    method parse_sentences_no_grad (line 356) | def parse_sentences_no_grad(self, data_iterator, build_batch_fn, batch...
    method analyze_trees (line 366) | def analyze_trees(self, trees, batch_size=None, keep_state=True, keep_...
    method parse_tagged_words (line 383) | def parse_tagged_words(self, words, batch_size):
    method bulk_apply (line 404) | def bulk_apply(self, state_batch, transitions, fail=False):
  class SimpleModel (line 478) | class SimpleModel(BaseModel):
    method __init__ (line 489) | def __init__(self, transition_scheme=TransitionScheme.TOP_DOWN_UNARY, ...
    method initial_word_queues (line 492) | def initial_word_queues(self, tagged_word_lists):
    method initial_transitions (line 503) | def initial_transitions(self):
    method initial_constituents (line 506) | def initial_constituents(self):
    method get_word (line 509) | def get_word(self, word_node):
    method transform_word_to_constituent (line 512) | def transform_word_to_constituent(self, state):
    method dummy_constituent (line 515) | def dummy_constituent(self, dummy):
    method build_constituents (line 518) | def build_constituents(self, labels, children_lists):
    method push_constituents (line 528) | def push_constituents(self, constituent_stacks, constituents):
    method get_top_constituent (line 531) | def get_top_constituent(self, constituents):
    method push_transitions (line 534) | def push_transitions(self, transition_stacks, transitions):
    method get_top_transition (line 537) | def get_top_transition(self, transitions):

FILE: stanza/models/constituency/base_trainer.py
  class ModelType (line 12) | class ModelType(Enum):
  class BaseTrainer (line 16) | class BaseTrainer:
    method __init__ (line 17) | def __init__(self, model, optimizer=None, scheduler=None, epochs_train...
    method save (line 29) | def save(self, filename, save_optimizer=True):
    method log_norms (line 47) | def log_norms(self):
    method log_shapes (line 50) | def log_shapes(self):
    method transitions (line 54) | def transitions(self):
    method root_labels (line 58) | def root_labels(self):
    method device (line 62) | def device(self):
    method train (line 65) | def train(self):
    method eval (line 68) | def eval(self):
    method load (line 74) | def load(filename, args=None, load_optimizer=False, foundation_cache=N...

FILE: stanza/models/constituency/dynamic_oracle.py
  function score_candidates_single_block (line 9) | def score_candidates_single_block(model, state, candidates, candidate_idx):
  function score_candidates (line 33) | def score_candidates(model, state, candidates):
  function advance_past_constituents (line 60) | def advance_past_constituents(gold_sequence, cur_index):
  function find_previous_open (line 76) | def find_previous_open(gold_sequence, cur_index):
  function find_in_order_constituent_end (line 94) | def find_in_order_constituent_end(gold_sequence, cur_index):
  class DynamicOracle (line 118) | class DynamicOracle():
    method __init__ (line 119) | def __init__(self, root_labels, oracle_level, repair_types, additional...
    method fix_error (line 132) | def fix_error(self, pred_transition, model, state):

FILE: stanza/models/constituency/ensemble.py
  class Ensemble (line 48) | class Ensemble(nn.Module):
    method __init__ (line 49) | def __init__(self, args, filenames=None, models=None, foundation_cache...
    method detach_submodels (line 98) | def detach_submodels(self):
    method train (line 104) | def train(self, mode=True):
    method transitions (line 112) | def transitions(self):
    method root_labels (line 116) | def root_labels(self):
    method device (line 120) | def device(self):
    method unary_limit (line 123) | def unary_limit(self):
    method transition_scheme (line 129) | def transition_scheme(self):
    method has_unary_transitions (line 132) | def has_unary_transitions(self):
    method is_top_down (line 136) | def is_top_down(self):
    method reverse_sentence (line 140) | def reverse_sentence(self):
    method retag_method (line 144) | def retag_method(self):
    method uses_xpos (line 148) | def uses_xpos(self):
    method get_top_constituent (line 151) | def get_top_constituent(self, constituents):
    method get_top_transition (line 154) | def get_top_transition(self, transitions):
    method log_norms (line 157) | def log_norms(self):
    method log_shapes (line 171) | def log_shapes(self):
    method get_params (line 178) | def get_params(self):
    method initial_state_from_preterminals (line 187) | def initial_state_from_preterminals(self, preterminal_lists, gold_tree...
    method build_batch_from_tagged_words (line 194) | def build_batch_from_tagged_words(self, batch_size, data_iterator):
    method build_batch_from_trees (line 213) | def build_batch_from_trees(self, batch_size, data_iterator):
    method predict (line 230) | def predict(self, states, is_legal=True):
    method bulk_apply (line 263) | def bulk_apply(self, state_batch, transitions, fail=False):
    method parse_tagged_words (line 272) | def parse_tagged_words(self, words, batch_size):
    method parse_sentences (line 295) | def parse_sentences(self, data_iterator, build_batch_fn, batch_size, t...
    method parse_sentences_no_grad (line 362) | def parse_sentences_no_grad(self, data_iterator, build_batch_fn, batch...
  class EnsembleTrainer (line 366) | class EnsembleTrainer(BaseTrainer):
    method __init__ (line 370) | def __init__(self, ensemble, optimizer=None, scheduler=None, epochs_tr...
    method from_files (line 374) | def from_files(args, filenames, foundation_cache=None):
    method get_peft_params (line 379) | def get_peft_params(self):
    method model_type (line 391) | def model_type(self):
    method log_num_words_known (line 394) | def log_num_words_known(self, words):
    method build_optimizer (line 402) | def build_optimizer(args, model, first_optimizer):
    method load_optimizer (line 417) | def load_optimizer(model, checkpoint, first_optimizer, filename):
    method load_scheduler (line 429) | def load_scheduler(model, optimizer, checkpoint, first_optimizer):
    method model_from_params (line 436) | def model_from_params(params, peft_params, args, foundation_cache=None...
  function parse_args (line 459) | def parse_args(args=None):
  function main (line 478) | def main(args=None):

FILE: stanza/models/constituency/error_analysis_in_order.py
  class FirstError (line 15) | class FirstError(Enum):
  function advance_past_unaries (line 29) | def advance_past_unaries(sequence, idx):
  function check_attachment_error (line 34) | def check_attachment_error(gold_sequence, pred_sequence, idx, error_type):
  function analyze_tree (line 59) | def analyze_tree(gold_tree, pred_tree):

FILE: stanza/models/constituency/evaluate_treebanks.py
  function main (line 13) | def main():

FILE: stanza/models/constituency/in_order_compound_oracle.py
  function fix_missing_unary_error (line 6) | def fix_missing_unary_error(gold_transition, pred_transition, gold_seque...
  function fix_wrong_unary_error (line 22) | def fix_wrong_unary_error(gold_transition, pred_transition, gold_sequenc...
  function fix_spurious_unary_error (line 33) | def fix_spurious_unary_error(gold_transition, pred_transition, gold_sequ...
  function fix_open_shift_error (line 42) | def fix_open_shift_error(gold_transition, pred_transition, gold_sequence...
  function fix_open_open_two_subtrees_error (line 63) | def fix_open_open_two_subtrees_error(gold_transition, pred_transition, g...
  function fix_open_open_error (line 81) | def fix_open_open_error(gold_transition, pred_transition, gold_sequence,...
  function fix_open_open_three_subtrees_error (line 111) | def fix_open_open_three_subtrees_error(gold_transition, pred_transition,...
  function fix_open_open_many_subtrees_error (line 114) | def fix_open_open_many_subtrees_error(gold_transition, pred_transition, ...
  function fix_open_close_error (line 117) | def fix_open_close_error(gold_transition, pred_transition, gold_sequence...
  function fix_shift_close_error (line 142) | def fix_shift_close_error(gold_transition, pred_transition, gold_sequenc...
  function fix_shift_open_unambiguous_error (line 161) | def fix_shift_open_unambiguous_error(gold_transition, pred_transition, g...
  function fix_close_shift_unambiguous_error (line 178) | def fix_close_shift_unambiguous_error(gold_transition, pred_transition, ...
  class RepairType (line 197) | class RepairType(Enum):
    method __new__ (line 215) | def __new__(cls, fn, correct=False, debug=False):
    method is_correct (line 228) | def is_correct(self):
  class InOrderCompoundOracle (line 325) | class InOrderCompoundOracle(DynamicOracle):
    method __init__ (line 326) | def __init__(self, root_labels, oracle_level, additional_oracle_levels...

FILE: stanza/models/constituency/in_order_oracle.py
  function fix_wrong_open_root_error (line 6) | def fix_wrong_open_root_error(gold_transition, pred_transition, gold_seq...
  function fix_wrong_open_unary_chain (line 18) | def fix_wrong_open_unary_chain(gold_transition, pred_transition, gold_se...
  function fix_wrong_open_subtrees (line 38) | def fix_wrong_open_subtrees(gold_transition, pred_transition, gold_seque...
  function fix_wrong_open_two_subtrees (line 63) | def fix_wrong_open_two_subtrees(gold_transition, pred_transition, gold_s...
  function fix_wrong_open_multiple_subtrees (line 66) | def fix_wrong_open_multiple_subtrees(gold_transition, pred_transition, g...
  function advance_past_unaries (line 69) | def advance_past_unaries(gold_sequence, cur_index):
  function fix_wrong_open_stuff_unary (line 74) | def fix_wrong_open_stuff_unary(gold_transition, pred_transition, gold_se...
  function fix_wrong_open_general (line 117) | def fix_wrong_open_general(gold_transition, pred_transition, gold_sequen...
  function fix_missed_unary (line 139) | def fix_missed_unary(gold_transition, pred_transition, gold_sequence, go...
  function fix_open_shift (line 154) | def fix_open_shift(gold_transition, pred_transition, gold_sequence, gold...
  function fix_open_close (line 211) | def fix_open_close(gold_transition, pred_transition, gold_sequence, gold...
  function fix_shift_close (line 277) | def fix_shift_close(gold_transition, pred_transition, gold_sequence, gol...
  function fix_close_shift_open_bracket (line 306) | def fix_close_shift_open_bracket(gold_transition, pred_transition, gold_...
  function fix_close_open_shift_unambiguous_bracket (line 347) | def fix_close_open_shift_unambiguous_bracket(gold_transition, pred_trans...
  function fix_close_open_shift_ambiguous_bracket_early (line 350) | def fix_close_open_shift_ambiguous_bracket_early(gold_transition, pred_t...
  function fix_close_open_shift_ambiguous_bracket_late (line 353) | def fix_close_open_shift_ambiguous_bracket_late(gold_transition, pred_tr...
  function fix_close_open_shift_ambiguous_predicted (line 356) | def fix_close_open_shift_ambiguous_predicted(gold_transition, pred_trans...
  function fix_close_open_shift_nested (line 404) | def fix_close_open_shift_nested(gold_transition, pred_transition, gold_s...
  function fix_close_shift_shift (line 444) | def fix_close_shift_shift(gold_transition, pred_transition, gold_sequenc...
  function fix_close_shift_shift_unambiguous (line 481) | def fix_close_shift_shift_unambiguous(gold_transition, pred_transition, ...
  function fix_close_shift_shift_ambiguous_early (line 484) | def fix_close_shift_shift_ambiguous_early(gold_transition, pred_transiti...
  function fix_close_shift_shift_ambiguous_late (line 487) | def fix_close_shift_shift_ambiguous_late(gold_transition, pred_transitio...
  function fix_close_shift_shift_ambiguous_predicted (line 490) | def fix_close_shift_shift_ambiguous_predicted(gold_transition, pred_tran...
  function ambiguous_shift_open_unary_close (line 524) | def ambiguous_shift_open_unary_close(gold_transition, pred_transition, g...
  function ambiguous_shift_open_early_close (line 532) | def ambiguous_shift_open_early_close(gold_transition, pred_transition, g...
  function ambiguous_shift_open_late_close (line 543) | def ambiguous_shift_open_late_close(gold_transition, pred_transition, go...
  function ambiguous_shift_open_predicted_close (line 552) | def ambiguous_shift_open_predicted_close(gold_transition, pred_transitio...
  function report_close_shift (line 587) | def report_close_shift(gold_transition, pred_transition, gold_sequence, ...
  function report_close_open (line 595) | def report_close_open(gold_transition, pred_transition, gold_sequence, g...
  function report_open_open (line 603) | def report_open_open(gold_transition, pred_transition, gold_sequence, go...
  function report_open_shift (line 611) | def report_open_shift(gold_transition, pred_transition, gold_sequence, g...
  function report_open_close (line 619) | def report_open_close(gold_transition, pred_transition, gold_sequence, g...
  function report_shift_open (line 627) | def report_shift_open(gold_transition, pred_transition, gold_sequence, g...
  class RepairType (line 635) | class RepairType(Enum):
    method __new__ (line 786) | def __new__(cls, fn, correct=False, debug=False):
    method is_correct (line 803) | def is_correct(self):
  class InOrderOracle (line 1027) | class InOrderOracle(DynamicOracle):
    method __init__ (line 1028) | def __init__(self, root_labels, oracle_level, additional_oracle_levels...

FILE: stanza/models/constituency/label_attention.py
  class BatchIndices (line 13) | class BatchIndices:
    method __init__ (line 17) | def __init__(self, batch_idxs_np, device):
  class FeatureDropoutFunction (line 36) | class FeatureDropoutFunction(torch.autograd.function.InplaceFunction):
    method forward (line 38) | def forward(cls, ctx, input, batch_idxs, p=0.5, train=False, inplace=F...
    method backward (line 65) | def backward(ctx, grad_output):
  class FeatureDropout (line 72) | class FeatureDropout(nn.Module):
    method __init__ (line 78) | def __init__(self, p=0.5, inplace=False):
    method forward (line 86) | def forward(self, input, batch_idxs):
  class LayerNormalization (line 91) | class LayerNormalization(nn.Module):
    method __init__ (line 92) | def __init__(self, d_hid, eps=1e-3, affine=True):
    method forward (line 101) | def forward(self, z):
  class ScaledDotProductAttention (line 115) | class ScaledDotProductAttention(nn.Module):
    method __init__ (line 116) | def __init__(self, d_model, attention_dropout=0.1):
    method forward (line 122) | def forward(self, q, k, v, attn_mask=None):
  class MultiHeadAttention (line 153) | class MultiHeadAttention(nn.Module):
    method __init__ (line 158) | def __init__(self, n_head, d_model, d_k, d_v, residual_dropout=0.1, at...
    method split_qkv_packed (line 211) | def split_qkv_packed(self, inp, qk_inp=None):
    method pad_and_rearrange (line 237) | def pad_and_rearrange(self, q_s, k_s, v_s, batch_idxs):
    method combine_v (line 265) | def combine_v(self, outputs):
    method forward (line 289) | def forward(self, inp, batch_idxs, qk_inp=None):
  class PositionwiseFeedForward (line 315) | class PositionwiseFeedForward(nn.Module):
    method __init__ (line 323) | def __init__(self, d_hid, d_ff, relu_dropout=0.1, residual_dropout=0.1):
    method forward (line 334) | def forward(self, x, batch_idxs):
  class PartitionedPositionwiseFeedForward (line 345) | class PartitionedPositionwiseFeedForward(nn.Module):
    method __init__ (line 346) | def __init__(self, d_hid, d_ff, d_positional, relu_dropout=0.1, residu...
    method forward (line 358) | def forward(self, x, batch_idxs):
  class LabelAttention (line 376) | class LabelAttention(nn.Module):
    method __init__ (line 381) | def __init__(self, d_model, d_k, d_v, d_l, d_proj, combine_as_self, us...
    method split_qkv_packed (line 461) | def split_qkv_packed(self, inp, k_inp=None):
    method pad_and_rearrange (line 497) | def pad_and_rearrange(self, q_s, k_s, v_s, batch_idxs):
    method combine_v (line 537) | def combine_v(self, outputs):
    method forward (line 567) | def forward(self, inp, batch_idxs, k_inp=None):
  class LabelAttentionModule (line 619) | class LabelAttentionModule(nn.Module):
    method __init__ (line 625) | def __init__(self,
    method forward (line 682) | def forward(self, word_embeddings, tagged_word_lists):

FILE: stanza/models/constituency/lstm_model.py
  class SentenceBoundary (line 69) | class SentenceBoundary(Enum):
  class StackHistory (line 74) | class StackHistory(Enum):
  class ConstituencyComposition (line 203) | class ConstituencyComposition(Enum):
  class LSTMModel (line 215) | class LSTMModel(BaseModel, nn.Module):
    method __init__ (line 216) | def __init__(self, pretrain, forward_charlm, backward_charlm, bert_mod...
    method uses_lattn (line 604) | def uses_lattn(args):
    method uses_pattn (line 608) | def uses_pattn(args):
    method copy_with_new_structure (line 611) | def copy_with_new_structure(self, other):
    method build_output_layers (line 648) | def build_output_layers(self, num_output_layers, final_layer_size, max...
    method num_words_known (line 672) | def num_words_known(self, words):
    method retag_method (line 676) | def retag_method(self):
    method uses_xpos (line 680) | def uses_xpos(self):
    method add_unsaved_module (line 683) | def add_unsaved_module(self, name, module):
    method is_unsaved_module (line 695) | def is_unsaved_module(self, name):
    method get_norms (line 698) | def get_norms(self):
    method log_norms (line 720) | def log_norms(self):
    method log_shapes (line 725) | def log_shapes(self):
    method initial_word_queues (line 732) | def initial_word_queues(self, tagged_word_lists):
    method initial_transitions (line 881) | def initial_transitions(self):
    method initial_constituents (line 887) | def initial_constituents(self):
    method get_word (line 893) | def get_word(self, word_node):
    method transform_word_to_constituent (line 896) | def transform_word_to_constituent(self, state):
    method dummy_constituent (line 912) | def dummy_constituent(self, dummy):
    method build_constituents (line 919) | def build_constituents(self, labels, children_lists):
    method push_constituents (line 1068) | def push_constituents(self, constituent_stacks, constituents):
    method get_top_constituent (line 1085) | def get_top_constituent(self, constituents):
    method push_transitions (line 1094) | def push_transitions(self, transition_stacks, transitions):
    method get_top_transition (line 1104) | def get_top_transition(self, transitions):
    method forward (line 1113) | def forward(self, states):
    method predict (line 1140) | def predict(self, states, is_legal=True):
    method weighted_choice (line 1174) | def weighted_choice(self, states):
    method predict_gold (line 1197) | def predict_gold(self, states):
    method get_params (line 1207) | def get_params(self, skip_modules=True):

FILE: stanza/models/constituency/lstm_tree_stack.py
  class LSTMTreeStack (line 21) | class LSTMTreeStack(nn.Module):
    method __init__ (line 22) | def __init__(self, input_size, hidden_size, num_lstm_layers, dropout, ...
    method initial_state (line 48) | def initial_state(self, initial_value=None):
    method push_states (line 68) | def push_states(self, stacks, values, inputs):
    method output (line 85) | def output(self, stack):

FILE: stanza/models/constituency/parse_transitions.py
  class TransitionScheme (line 16) | class TransitionScheme(Enum):
    method __new__ (line 17) | def __new__(cls, value, short_name):
  class Transition (line 59) | class Transition(ABC):
    method update_state (line 65) | def update_state(self, state, model):
    method delta_opens (line 83) | def delta_opens(self):
    method apply (line 86) | def apply(self, state, model):
    method is_legal (line 97) | def is_legal(self, state, model):
    method components (line 104) | def components(self):
    method short_name (line 114) | def short_name(self):
    method short_label (line 119) | def short_label(self):
    method __lt__ (line 131) | def __lt__(self, other):
    method from_repr (line 143) | def from_repr(desc):
  class Shift (line 171) | class Shift(Transition):
    method update_state (line 172) | def update_state(self, state, model):
    method is_legal (line 182) | def is_legal(self, state, model):
    method short_name (line 221) | def short_name(self):
    method __repr__ (line 224) | def __repr__(self):
    method __eq__ (line 227) | def __eq__(self, other):
    method __hash__ (line 234) | def __hash__(self):
  class CompoundUnary (line 237) | class CompoundUnary(Transition):
    method __init__ (line 238) | def __init__(self, *label):
    method update_state (line 243) | def update_state(self, state, model):
    method is_legal (line 258) | def is_legal(self, state, model):
    method components (line 284) | def components(self):
    method short_name (line 287) | def short_name(self):
    method __repr__ (line 290) | def __repr__(self):
    method __eq__ (line 293) | def __eq__(self, other):
    method __hash__ (line 302) | def __hash__(self):
  class Dummy (line 305) | class Dummy():
    method __init__ (line 309) | def __init__(self, label):
    method is_preterminal (line 312) | def is_preterminal(self):
    method __format__ (line 315) | def __format__(self, spec):
    method __str__ (line 322) | def __str__(self):
    method __eq__ (line 325) | def __eq__(self, other):
    method __hash__ (line 334) | def __hash__(self):
  function too_many_unary_nodes (line 337) | def too_many_unary_nodes(tree, unary_limit):
  class OpenConstituent (line 352) | class OpenConstituent(Transition):
    method __init__ (line 353) | def __init__(self, *label):
    method delta_opens (line 357) | def delta_opens(self):
    method update_state (line 360) | def update_state(self, state, model):
    method is_legal (line 365) | def is_legal(self, state, model):
    method components (line 433) | def components(self):
    method short_name (line 436) | def short_name(self):
    method __repr__ (line 439) | def __repr__(self):
    method __eq__ (line 442) | def __eq__(self, other):
    method __hash__ (line 451) | def __hash__(self):
  class Finalize (line 454) | class Finalize(Transition):
    method __init__ (line 462) | def __init__(self, *label):
    method update_state (line 465) | def update_state(self, state, model):
    method is_legal (line 483) | def is_legal(self, state, model):
    method short_name (line 489) | def short_name(self):
    method __repr__ (line 492) | def __repr__(self):
    method __eq__ (line 495) | def __eq__(self, other):
    method __hash__ (line 502) | def __hash__(self):
  class CloseConstituent (line 505) | class CloseConstituent(Transition):
    method delta_opens (line 506) | def delta_opens(self):
    method update_state (line 509) | def update_state(self, state, model):
    method build_constituents (line 533) | def build_constituents(model, data):
    method is_legal (line 546) | def is_legal(self, state, model):
    method short_name (line 603) | def short_name(self):
    method __repr__ (line 606) | def __repr__(self):
    method __eq__ (line 609) | def __eq__(self, other):
    method __hash__ (line 616) | def __hash__(self):
  function check_transitions (line 619) | def check_transitions(train_transitions, other_transitions, treebank_name):

FILE: stanza/models/constituency/parse_tree.py
  class TreePrintMethod (line 33) | class TreePrintMethod(Enum):
  class Tree (line 46) | class Tree(StanzaObject):
    method __init__ (line 50) | def __init__(self, label=None, children=None):
    method is_leaf (line 60) | def is_leaf(self):
    method is_preterminal (line 63) | def is_preterminal(self):
    method yield_preterminals (line 66) | def yield_preterminals(self):
    method leaf_labels (line 86) | def leaf_labels(self):
    method __len__ (line 96) | def __len__(self):
    method all_leaves_are_preterminals (line 99) | def all_leaves_are_preterminals(self):
    method pretty_print (line 111) | def pretty_print(self, normalize=None):
    method __format__ (line 169) | def __format__(self, spec):
    method __repr__ (line 289) | def __repr__(self):
    method __eq__ (line 292) | def __eq__(self, other):
    method depth (line 305) | def depth(self):
    method visit_preorder (line 310) | def visit_preorder(self, internal=None, preterminal=None, leaf=None):
    method get_unique_constituent_labels (line 336) | def get_unique_constituent_labels(trees):
    method get_constituent_counts (line 346) | def get_constituent_counts(trees):
    method get_unique_tags (line 359) | def get_unique_tags(trees):
    method get_unique_words (line 372) | def get_unique_words(trees):
    method get_common_words (line 385) | def get_common_words(trees, num_words):
    method get_rare_words (line 401) | def get_rare_words(trees, threshold=0.05):
    method get_root_labels (line 417) | def get_root_labels(trees):
    method get_compound_constituents (line 421) | def get_compound_constituents(trees, separate_root=False):
    method simplify_labels (line 445) | def simplify_labels(self, pattern=CONSTITUENT_SPLIT):
    method reverse (line 458) | def reverse(self):
    method remap_constituent_labels (line 471) | def remap_constituent_labels(self, label_map):
    method remap_words (line 485) | def remap_words(self, word_map):
    method replace_words (line 499) | def replace_words(self, words):
    method replace_tags (line 520) | def replace_tags(self, tags):
    method prune_none (line 551) | def prune_none(self):
    method count_unary_depth (line 571) | def count_unary_depth(self):
    method write_treebank (line 587) | def write_treebank(trees, out_file, fmt="{}"):

FILE: stanza/models/constituency/parser_training.py
  class EpochStats (line 38) | class EpochStats(namedtuple("EpochStats", ['epoch_loss', 'transitions_co...
    method __add__ (line 39) | def __add__(self, other):
  function evaluate (line 48) | def evaluate(args, model_file, retag_pipeline):
  function remove_optimizer (line 96) | def remove_optimizer(args, model_save_file, model_load_file):
  function add_grad_clipping (line 114) | def add_grad_clipping(trainer, grad_clipping):
  function build_trainer (line 123) | def build_trainer(args, train_trees, dev_trees, silver_trees, foundation...
  function train (line 201) | def train(args, model_load_file, retag_pipeline):
  function compose_train_data (line 263) | def compose_train_data(trees, sequences):
  function next_epoch_data (line 270) | def next_epoch_data(leftover_training_data, train_data, epoch_size):
  function update_bert_learning_rate (line 293) | def update_bert_learning_rate(args, optimizer, epochs_trained):
  function iterate_training (line 327) | def iterate_training(args, trainer, train_trees, train_sequences, transi...
  function train_model_one_epoch (line 543) | def train_model_one_epoch(epoch, trainer, transition_tensors, process_ou...
  function train_model_one_batch (line 568) | def train_model_one_batch(epoch, batch_idx, model, training_batch, trans...
  function run_dev_set (line 691) | def run_dev_set(model, retagged_trees, original_trees, args, evaluator=N...

FILE: stanza/models/constituency/partitioned_transformer.py
  class FeatureDropoutFunction (line 16) | class FeatureDropoutFunction(torch.autograd.function.InplaceFunction):
    method forward (line 18) | def forward(ctx, input, p=0.5, train=False, inplace=False):
    method backward (line 51) | def backward(ctx, grad_output):
  class FeatureDropout (line 58) | class FeatureDropout(nn.Dropout):
    method forward (line 65) | def forward(self, x):
  class PartitionedReLU (line 79) | class PartitionedReLU(nn.ReLU):
    method forward (line 80) | def forward(self, x):
  class PartitionedLinear (line 88) | class PartitionedLinear(nn.Module):
    method __init__ (line 89) | def __init__(self, in_features, out_features, bias=True):
    method forward (line 94) | def forward(self, x):
  class PartitionedMultiHeadAttention (line 105) | class PartitionedMultiHeadAttention(nn.Module):
    method __init__ (line 106) | def __init__(
    method forward (line 123) | def forward(self, x, mask=None):
  class PartitionedTransformerEncoderLayer (line 147) | class PartitionedTransformerEncoderLayer(nn.Module):
    method __init__ (line 148) | def __init__(self,
    method forward (line 173) | def forward(self, x, mask=None):
  class PartitionedTransformerEncoder (line 185) | class PartitionedTransformerEncoder(nn.Module):
    method __init__ (line 186) | def __init__(self,
    method forward (line 208) | def forward(self, x, mask=None):
  class ConcatPositionalEncoding (line 214) | class ConcatPositionalEncoding(nn.Module):
    method __init__ (line 218) | def __init__(self, d_model=256, max_len=512):
    method forward (line 223) | def forward(self, x):
  class PartitionedTransformerModule (line 230) | class PartitionedTransformerModule(nn.Module):
    method __init__ (line 231) | def __init__(self,
    method forward (line 273) | def forward(self, attention_mask, bert_embeddings):

FILE: stanza/models/constituency/positional_encoding.py
  class SinusoidalEncoding (line 11) | class SinusoidalEncoding(nn.Module):
    method __init__ (line 15) | def __init__(self, model_dim, max_len):
    method build_position (line 20) | def build_position(model_dim, max_len, device=None):
    method forward (line 30) | def forward(self, x):
    method max_len (line 42) | def max_len(self):
  class AddSinusoidalEncoding (line 46) | class AddSinusoidalEncoding(nn.Module):
    method __init__ (line 52) | def __init__(self, d_model=256, max_len=512):
    method forward (line 56) | def forward(self, x, scale=1.0):
  class ConcatSinusoidalEncoding (line 71) | class ConcatSinusoidalEncoding(nn.Module):
    method __init__ (line 77) | def __init__(self, d_model=256, max_len=512):
    method forward (line 81) | def forward(self, x):

FILE: stanza/models/constituency/retagging.py
  function add_retag_args (line 34) | def add_retag_args(parser):
  function postprocess_args (line 46) | def postprocess_args(args):
  function build_retag_pipeline (line 64) | def build_retag_pipeline(args):

FILE: stanza/models/constituency/score_converted_dependencies.py
  function score_converted_dependencies (line 19) | def score_converted_dependencies(args):
  function main (line 48) | def main():

FILE: stanza/models/constituency/state.py
  class State (line 3) | class State(namedtuple('State', ['word_queue', 'transitions', 'constitue...
    method empty_word_queue (line 33) | def empty_word_queue(self):
    method empty_transitions (line 38) | def empty_transitions(self):
    method has_one_constituent (line 43) | def has_one_constituent(self):
    method empty_constituents (line 48) | def empty_constituents(self):
    method num_constituents (line 51) | def num_constituents(self):
    method num_transitions (line 55) | def num_transitions(self):
    method get_word (line 59) | def get_word(self, pos):
    method finished (line 64) | def finished(self, model):
    method get_tree (line 67) | def get_tree(self, model):
    method all_transitions (line 70) | def all_transitions(self, model):
    method all_constituents (line 79) | def all_constituents(self, model):
    method all_words (line 88) | def all_words(self, model):
    method to_string (line 91) | def to_string(self, model):
    method __str__ (line 94) | def __str__(self):
  class MultiState (line 97) | class MultiState(namedtuple('MultiState', ['states', 'gold_tree', 'gold_...
    method finished (line 98) | def finished(self, ensemble):
    method get_tree (line 101) | def get_tree(self, ensemble):
    method empty_constituents (line 105) | def empty_constituents(self):
    method num_constituents (line 108) | def num_constituents(self):
    method num_transitions (line 112) | def num_transitions(self):
    method num_opens (line 117) | def num_opens(self):
    method sentence_length (line 121) | def sentence_length(self):
    method empty_word_queue (line 124) | def empty_word_queue(self):
    method empty_transitions (line 127) | def empty_transitions(self):
    method constituents (line 131) | def constituents(self):
    method transitions (line 139) | def transitions(self):

FILE: stanza/models/constituency/text_processing.py
  function read_tokenized_file (line 14) | def read_tokenized_file(tokenized_file):
  function read_xml_tree_file (line 26) | def read_xml_tree_file(tree_file):
  function parse_tokenized_sentences (line 76) | def parse_tokenized_sentences(args, model, retag_pipeline, sentences):
  function parse_text (line 88) | def parse_text(args, model, retag_pipeline, tokenized_file=None, predict...
  function parse_dir (line 133) | def parse_dir(args, model, retag_pipeline, tokenized_dir, predict_dir):
  function load_model_parse_text (line 142) | def load_model_parse_text(args, model_file, retag_pipeline):

FILE: stanza/models/constituency/top_down_oracle.py
  function find_constituent_end (line 7) | def find_constituent_end(gold_sequence, cur_index):
  function fix_shift_close (line 22) | def fix_shift_close(gold_transition, pred_transition, gold_sequence, gol...
  function fix_open_close (line 42) | def fix_open_close(gold_transition, pred_transition, gold_sequence, gold...
  function fix_one_open_shift (line 61) | def fix_one_open_shift(gold_transition, pred_transition, gold_sequence, ...
  function fix_multiple_open_shift (line 93) | def fix_multiple_open_shift(gold_transition, pred_transition, gold_seque...
  function fix_nested_open_constituent (line 131) | def fix_nested_open_constituent(gold_transition, pred_transition, gold_s...
  function fix_shift_open_immediate_close (line 162) | def fix_shift_open_immediate_close(gold_transition, pred_transition, gol...
  function fix_shift_open_ambiguous_unary (line 186) | def fix_shift_open_ambiguous_unary(gold_transition, pred_transition, gol...
  function fix_shift_open_ambiguous_later (line 209) | def fix_shift_open_ambiguous_later(gold_transition, pred_transition, gol...
  function fix_shift_open_ambiguous_predicted (line 234) | def fix_shift_open_ambiguous_predicted(gold_transition, pred_transition,...
  function fix_close_shift_ambiguous_immediate (line 268) | def fix_close_shift_ambiguous_immediate(gold_transition, pred_transition...
  function fix_close_shift_ambiguous_later (line 298) | def fix_close_shift_ambiguous_later(gold_transition, pred_transition, go...
  function fix_close_shift (line 331) | def fix_close_shift(gold_transition, pred_transition, gold_sequence, gol...
  function fix_close_shift_with_opens (line 387) | def fix_close_shift_with_opens(*args, **kwargs):
  function fix_close_next_correct_predicted (line 390) | def fix_close_next_correct_predicted(gold_transition, pred_transition, g...
  function fix_close_open_correct_open (line 428) | def fix_close_open_correct_open(gold_transition, pred_transition, gold_s...
  function fix_close_open_correct_open_ambiguous_immediate (line 466) | def fix_close_open_correct_open_ambiguous_immediate(*args, **kwargs):
  function fix_close_open_correct_open_ambiguous_later (line 469) | def fix_close_open_correct_open_ambiguous_later(gold_transition, pred_tr...
  function fix_open_open_ambiguous_unary (line 487) | def fix_open_open_ambiguous_unary(gold_transition, pred_transition, gold...
  function fix_open_open_ambiguous_later (line 509) | def fix_open_open_ambiguous_later(gold_transition, pred_transition, gold...
  function fix_open_open_ambiguous_random (line 532) | def fix_open_open_ambiguous_random(gold_transition, pred_transition, gol...
  function report_shift_open (line 557) | def report_shift_open(gold_transition, pred_transition, gold_sequence, g...
  function report_close_shift (line 566) | def report_close_shift(gold_transition, pred_transition, gold_sequence, ...
  function report_close_open (line 574) | def report_close_open(gold_transition, pred_transition, gold_sequence, g...
  function report_open_open (line 582) | def report_open_open(gold_transition, pred_transition, gold_sequence, go...
  class RepairType (line 591) | class RepairType(Enum):
    method __new__ (line 675) | def __new__(cls, fn, correct=False, debug=False):
    method is_correct (line 688) | def is_correct(self):
  class TopDownOracle (line 755) | class TopDownOracle(DynamicOracle):
    method __init__ (line 756) | def __init__(self, root_labels, oracle_level, additional_oracle_levels...

FILE: stanza/models/constituency/trainer.py
  class Trainer (line 30) | class Trainer(BaseTrainer):
    method __init__ (line 36) | def __init__(self, model, optimizer=None, scheduler=None, epochs_train...
    method save (line 39) | def save(self, filename, save_optimizer=True):
    method get_peft_params (line 45) | def get_peft_params(self):
    method model_type (line 53) | def model_type(self):
    method find_and_load_pretrain (line 57) | def find_and_load_pretrain(saved_args, foundation_cache):
    method find_and_load_charlm (line 68) | def find_and_load_charlm(charlm_file, direction, saved_args, foundatio...
    method log_num_words_known (line 79) | def log_num_words_known(self, words):
    method load_optimizer (line 83) | def load_optimizer(model, checkpoint, first_optimizer, filename):
    method load_scheduler (line 95) | def load_scheduler(model, optimizer, checkpoint, first_optimizer):
    method model_from_params (line 102) | def model_from_params(params, peft_params, args, foundation_cache=None...
    method build_trainer (line 206) | def build_trainer(args, train_transitions, train_constituents, tags, w...

FILE: stanza/models/constituency/transformer_tree_stack.py
  class TransformerTreeStack (line 20) | class TransformerTreeStack(nn.Module):
    method __init__ (line 21) | def __init__(self, input_size, output_size, input_dropout, length_limi...
    method attention (line 52) | def attention(self, key, query, value, mask=None):
    method initial_state (line 105) | def initial_state(self, initial_value=None):
    method push_states (line 131) | def push_states(self, stacks, values, inputs):
    method output (line 192) | def output(self, stack):

FILE: stanza/models/constituency/transition_sequence.py
  function yield_top_down_sequence (line 18) | def yield_top_down_sequence(tree, transition_scheme=TransitionScheme.TOP...
  function yield_in_order_sequence (line 59) | def yield_in_order_sequence(tree):
  function yield_in_order_compound_sequence (line 83) | def yield_in_order_compound_sequence(tree, transition_scheme):
  function build_sequence (line 127) | def build_sequence(tree, transition_scheme=TransitionScheme.TOP_DOWN_UNA...
  function build_treebank (line 139) | def build_treebank(trees, transition_scheme=TransitionScheme.TOP_DOWN_UN...
  function all_transitions (line 148) | def all_transitions(transition_lists):
  function convert_trees_to_sequences (line 157) | def convert_trees_to_sequences(trees, treebank_name, transition_scheme, ...
  function main (line 172) | def main():

FILE: stanza/models/constituency/tree_embedding.py
  class TreeEmbedding (line 16) | class TreeEmbedding(nn.Module):
    method __init__ (line 17) | def __init__(self, constituency_parser, args):
    method embed_trees (line 53) | def embed_trees(self, inputs):
    method forward (line 96) | def forward(self, inputs):
    method get_norms (line 99) | def get_norms(self):
    method get_params (line 107) | def get_params(self, skip_modules=True):
    method from_parser_file (line 125) | def from_parser_file(args, foundation_cache=None):
    method model_from_params (line 130) | def model_from_params(params, args, foundation_cache=None):

FILE: stanza/models/constituency/tree_reader.py
  class UnclosedTreeError (line 26) | class UnclosedTreeError(ValueError):
    method __init__ (line 30) | def __init__(self, line_num):
  class ExtraCloseTreeError (line 34) | class ExtraCloseTreeError(ValueError):
    method __init__ (line 38) | def __init__(self, line_num):
  class UnlabeledTreeError (line 42) | class UnlabeledTreeError(ValueError):
    method __init__ (line 48) | def __init__(self, line_num):
  class MixedTreeError (line 52) | class MixedTreeError(ValueError):
    method __init__ (line 56) | def __init__(self, line_num, child_label, children):
  function normalize (line 62) | def normalize(text):
  function read_single_tree (line 65) | def read_single_tree(token_iterator, broken_ok):
  class TokenIterator (line 120) | class TokenIterator:
    method __init__ (line 128) | def __init__(self):
    method set_mark (line 133) | def set_mark(self):
    method get_mark (line 139) | def get_mark(self):
    method __iter__ (line 144) | def __iter__(self):
    method __next__ (line 147) | def __next__(self):
  class TextTokenIterator (line 167) | class TextTokenIterator(TokenIterator):
    method __init__ (line 168) | def __init__(self, text, use_tqdm=True):
  class FileTokenIterator (line 179) | class FileTokenIterator(TokenIterator):
    method __init__ (line 180) | def __init__(self, filename):
    method __enter__ (line 184) | def __enter__(self):
    method __exit__ (line 197) | def __exit__(self, exc_type, exc_value, exc_tb):
  function read_token_iterator (line 201) | def read_token_iterator(token_iterator, broken_ok, tree_callback):
  function read_trees (line 224) | def read_trees(text, broken_ok=False, tree_callback=None, use_tqdm=True):
  function read_tree_file (line 233) | def read_tree_file(filename, broken_ok=False, tree_callback=None):
  function read_directory (line 241) | def read_directory(dirname, broken_ok=False, tree_callback=None):
  function read_treebank (line 251) | def read_treebank(filename, tree_callback=None):
  function main (line 265) | def main():

FILE: stanza/models/constituency/tree_stack.py
  class TreeStack (line 7) | class TreeStack(namedtuple('TreeStack', ['value', 'parent', 'length'])):
    method pop (line 34) | def pop(self):
    method push (line 37) | def push(self, value):
    method __iter__ (line 41) | def __iter__(self):
    method __reversed__ (line 48) | def __reversed__(self):
    method __str__ (line 53) | def __str__(self):
    method __len__ (line 56) | def __len__(self):

FILE: stanza/models/constituency/utils.py
  function retag_tags (line 44) | def retag_tags(doc, pipelines, xpos):
  function retag_trees (line 65) | def retag_trees(trees, pipelines, xpos=True):
  function build_optimizer (line 105) | def build_optimizer(args, model, build_simple_adadelta=False):
  function build_scheduler (line 152) | def build_scheduler(args, optimizer, first_optimizer=False):
  function initialize_linear (line 177) | def initialize_linear(linear, nonlinearity, bias):
  function add_predict_output_args (line 185) | def add_predict_output_args(parser):
  function postprocess_predict_output_args (line 195) | def postprocess_predict_output_args(args):
  function get_open_nodes (line 200) | def get_open_nodes(trees, transition_scheme):
  function verify_transitions (line 213) | def verify_transitions(trees, sequences, transition_scheme, unary_limit,...
  function check_constituents (line 237) | def check_constituents(train_constituents, trees, treebank_name, fail=Tr...
  function check_root_labels (line 258) | def check_root_labels(root_labels, other_trees, treebank_name):
  function remove_duplicate_trees (line 266) | def remove_duplicate_trees(trees, treebank_name):
  function remove_singleton_trees (line 282) | def remove_singleton_trees(trees):

FILE: stanza/models/constituency_parser.py
  function build_argparse (line 185) | def build_argparse():
  function build_model_filename (line 755) | def build_model_filename(args):
  function parse_args (line 790) | def parse_args(args=None):
  function main (line 882) | def main(args=None):

FILE: stanza/models/coref/anaphoricity_scorer.py
  class AnaphoricityScorer (line 10) | class AnaphoricityScorer(torch.nn.Module):
    method __init__ (line 12) | def __init__(self,
    method forward (line 38) | def forward(self, *,  # type: ignore  # pylint: disable=arguments-diff...
    method _ffnn (line 71) | def _ffnn(self, x: torch.Tensor) -> torch.Tensor:
    method _get_pair_matrix (line 92) | def _get_pair_matrix(mentions_batch: torch.Tensor,

FILE: stanza/models/coref/bert.py
  function get_subwords_batches (line 15) | def get_subwords_batches(doc: Doc,

FILE: stanza/models/coref/cluster_checker.py
  class ClusterChecker (line 15) | class ClusterChecker:
    method __init__ (line 19) | def __init__(self):
    method _f1 (line 39) | def _f1(p,r):
    method add_predictions (line 42) | def add_predictions(self,
    method bakeoff (line 87) | def bakeoff(self):
    method mbc (line 92) | def mbc(self):
    method total_lea (line 105) | def total_lea(self):
    method _lea (line 114) | def _lea(key: List[List[Hashable]],
    method _muc (line 139) | def _muc(key: List[List[Hashable]],
    method _b3 (line 177) | def _b3(key: List[List[Hashable]],
    method _phi4 (line 202) | def _phi4(c1, c2):
    method _ceafe (line 206) | def _ceafe(clusters: List[List[Hashable]], gold_clusters: List[List[Ha...

FILE: stanza/models/coref/config.py
  class Config (line 11) | class Config:  # pylint: disable=too-many-instance-attributes, too-few-p...

FILE: stanza/models/coref/conll.py
  function write_conll (line 14) | def write_conll(doc: Doc,
  function open_ (line 93) | def open_(config: Config, epochs: int, data_split: str):

FILE: stanza/models/coref/const.py
  class CorefResult (line 17) | class CorefResult:

FILE: stanza/models/coref/coref_chain.py
  class CorefMention (line 8) | class CorefMention:
    method __init__ (line 9) | def __init__(self, sentence, start_word, end_word):
  class CorefChain (line 14) | class CorefChain:
    method __init__ (line 15) | def __init__(self, index, mentions, representative_text, representativ...
  class CorefAttachment (line 21) | class CorefAttachment:
    method __init__ (line 22) | def __init__(self, chain, is_start, is_end, is_representative):
    method to_json (line 28) | def to_json(self):

FILE: stanza/models/coref/dataset.py
  class CorefDataset (line 9) | class CorefDataset(Dataset):
    method __init__ (line 11) | def __init__(self, path, config, tokenizer):
    method avg_span (line 62) | def avg_span(self):
    method __getitem__ (line 65) | def __getitem__(self, x):
    method __len__ (line 68) | def __len__(self):

FILE: stanza/models/coref/loss.py
  class CorefLoss (line 7) | class CorefLoss(torch.nn.Module):
    method __init__ (line 13) | def __init__(self, bce_weight: float):
    method forward (line 19) | def forward(self,    # type: ignore  # pylint: disable=arguments-diffe...
    method _bce (line 26) | def _bce(self,
    method _nlml (line 34) | def _nlml(input_: torch.Tensor, target: torch.Tensor) -> torch.Tensor:

FILE: stanza/models/coref/model.py
  class CorefModel (line 49) | class CorefModel:  # pylint: disable=too-many-instance-attributes
    method __init__ (line 69) | def __init__(self,
    method training (line 105) | def training(self) -> bool:
    method training (line 110) | def training(self, new_value: bool):
    method evaluate (line 118) | def evaluate(self,
    method load_weights (line 218) | def load_weights(self,
    method load_state_dicts (line 264) | def load_state_dicts(self,
    method build_doc (line 289) | def build_doc(self, doc: dict) -> dict:
    method load_model (line 319) | def load_model(path: str,
    method run (line 352) | def run(self,  # pylint: disable=too-many-locals
    method save_weights (line 426) | def save_weights(self, save_path=None, save_optimizers=True):
    method log_norms (line 454) | def log_norms(self):
    method train (line 463) | def train(self, log=False):
    method _bertify (line 623) | def _bertify(self, doc: Doc) -> torch.Tensor:
    method _build_model (line 663) | def _build_model(self, foundation_cache):
    method disable_zeros_predictor (line 711) | def disable_zeros_predictor(self):
    method _build_optimizers (line 715) | def _build_optimizers(self):
    method _clusterize (line 758) | def _clusterize(self, doc: Doc, scores: torch.Tensor, top_indices: tor...
    method _get_docs (line 802) | def _get_docs(self, path: str) -> List[Doc]:
    method _get_ground_truth (line 808) | def _get_ground_truth(cluster_ids: torch.Tensor,
    method _load_config (line 862) | def _load_config(config_path: str,
    method _set_training (line 874) | def _set_training(self, value: bool):

FILE: stanza/models/coref/pairwise_encoder.py
  class PairwiseEncoder (line 12) | class PairwiseEncoder(torch.nn.Module):
    method __init__ (line 19) | def __init__(self, config: Config):
    method device (line 45) | def device(self) -> torch.device:
    method forward (line 50) | def forward(self,  # type: ignore  # pylint: disable=arguments-differ ...
    method _speaker_map (line 74) | def _speaker_map(doc: Doc) -> List[int]:

FILE: stanza/models/coref/rough_scorer.py
  class RoughScorer (line 12) | class RoughScorer(torch.nn.Module):
    method __init__ (line 18) | def __init__(self, features: int, config: Config):
    method forward (line 25) | def forward(self,  # type: ignore  # pylint: disable=arguments-differ ...
    method _prune (line 44) | def _prune(self,

FILE: stanza/models/coref/span_predictor.py
  class SpanPredictor (line 11) | class SpanPredictor(torch.nn.Module):
    method __init__ (line 12) | def __init__(self, input_size: int, distance_emb_size: int):
    method device (line 30) | def device(self) -> torch.device:
    method forward (line 35) | def forward(self,  # type: ignore  # pylint: disable=arguments-differ ...
    method get_training_data (line 96) | def get_training_data(self,
    method predict (line 111) | def predict(self,

FILE: stanza/models/coref/utils.py
  class GraphNode (line 11) | class GraphNode:
    method __init__ (line 12) | def __init__(self, node_id: int):
    method link (line 17) | def link(self, another: "GraphNode"):
    method __repr__ (line 21) | def __repr__(self) -> str:
  function add_dummy (line 25) | def add_dummy(tensor: torch.Tensor, eps: bool = False):
  function sigmoid_focal_loss (line 38) | def sigmoid_focal_loss(

FILE: stanza/models/coref/word_encoder.py
  class WordEncoder (line 12) | class WordEncoder(torch.nn.Module):  # pylint: disable=too-many-instance...
    method __init__ (line 16) | def __init__(self, features: int, config: Config):
    method device (line 27) | def device(self) -> torch.device:
    method forward (line 32) | def forward(self,  # type: ignore  # pylint: disable=arguments-differ ...
    method _attn_scores (line 60) | def _attn_scores(self,
    method _cluster_ids (line 98) | def _cluster_ids(self, doc: Doc) -> torch.Tensor:

FILE: stanza/models/depparse/data.py
  function data_to_batches (line 15) | def data_to_batches(data, batch_size, eval_mode, sort_during_eval, min_l...
  class DataLoader (line 65) | class DataLoader:
    method __init__ (line 67) | def __init__(self, doc, batch_size, args, pretrain, vocab=None, evalua...
    method init_vocab (line 108) | def init_vocab(self, data):
    method preprocess (line 127) | def preprocess(self, data, vocab, pretrain_vocab, args):
    method __len__ (line 149) | def __len__(self):
    method __getitem__ (line 152) | def __getitem__(self, key):
    method load_doc (line 192) | def load_doc(self, doc):
    method resolve_none (line 198) | def resolve_none(self, data):
    method __iter__ (line 207) | def __iter__(self):
    method set_batch_size (line 211) | def set_batch_size(self, batch_size):
    method reshuffle (line 214) | def reshuffle(self):
    method chunk_batches (line 219) | def chunk_batches(self, data):
  function to_int (line 227) | def to_int(string, ignore_error=False):

FILE: stanza/models/depparse/model.py
  class Parser (line 22) | class Parser(nn.Module):
    method __init__ (line 23) | def __init__(self, args, vocab, emb_matrix=None, foundation_cache=None...
    method add_unsaved_module (line 131) | def add_unsaved_module(self, name, module):
    method log_norms (line 135) | def log_norms(self):
    method forward (line 138) | def forward(self, word, word_mask, wordchars, wordchars_mask, upos, xp...

FILE: stanza/models/depparse/scorer.py
  function score_named_dependencies (line 12) | def score_named_dependencies(pred_doc, gold_doc, output_latex=False):
  function score (line 64) | def score(system_conllu_file, gold_conllu_file, verbose=True):

FILE: stanza/models/depparse/trainer.py
  function unpack_batch (line 27) | def unpack_batch(batch, device):
  class Trainer (line 37) | class Trainer(BaseTrainer):
    method __init__ (line 39) | def __init__(self, args=None, vocab=None, pretrain=None, model_file=None,
    method __init_optim (line 87) | def __init_optim(self):
    method update (line 121) | def update(self, batch, eval=False):
    method predict (line 145) | def predict(self, batch, unsort=True):
    method save (line 164) | def save(self, filename, skip_modules=True, save_optimizer=False):
    method load (line 194) | def load(self, filename, pretrain, args=None, foundation_cache=None, d...

FILE: stanza/models/identity_lemmatizer.py
  function parse_args (line 19) | def parse_args(args=None):
  function main (line 36) | def main(args=None):

FILE: stanza/models/lang_identifier.py
  function parse_args (line 22) | def parse_args(args=None):
  function randomize_lengths_range (line 45) | def randomize_lengths_range(range_list):
  function main (line 54) | def main(args=None):
  function build_indexes (line 63) | def build_indexes(args):
  function train_model (line 85) | def train_model(args):
  function score_log_path (line 141) | def score_log_path(file_path):
  function eval_model (line 153) | def eval_model(args):
  function eval_trainer (line 178) | def eval_trainer(trainer, dev_data, batch_mode=False, fine_grained=True):

FILE: stanza/models/langid/create_ud_data.py
  function parse_args (line 33) | def parse_args(args=None):
  function splits_from_list (line 48) | def splits_from_list(value_list):
  function main (line 52) | def main(args=None):
  function collect_files (line 70) | def collect_files(ud_path, languages, data_format="ud"):
  function generate_examples (line 93) | def generate_examples(lang_id, list_of_files, splits=(0.8,0.1,0.1), min_...
  function sentences_from_file (line 114) | def sentences_from_file(ud_file_path, data_format="ud"):
  function sentence_to_windows (line 130) | def sentence_to_windows(sentence, min_window, max_window):
  function validate_sentence (line 151) | def validate_sentence(current_window, min_window):
  function find (line 160) | def find(s, ch):
  function clean_sentence (line 168) | def clean_sentence(line):
  function example_json (line 202) | def example_json(lang_id, text, eval_length=None):

FILE: stanza/models/langid/data.py
  class DataLoader (line 6) | class DataLoader:
    method __init__ (line 17) | def __init__(self, device=None):
    method load_data (line 25) | def load_data(self, batch_size, data_files, char_index, tag_index, ran...
    method randomize_data (line 100) | def randomize_data(sentences, upper_lim=20, lower_lim=5):
    method build_batch_tensors (line 121) | def build_batch_tensors(self, batch):
    method next (line 132) | def next(self):

FILE: stanza/models/langid/model.py
  class LangIDBiLSTM (line 7) | class LangIDBiLSTM(nn.Module):
    method __init__ (line 18) | def __init__(self, char_to_idx, tag_to_idx, num_layers, embedding_dim,...
    method build_lang_mask (line 60) | def build_lang_mask(self, device):
    method loss (line 72) | def loss(self, Y_hat, Y):
    method forward (line 75) | def forward(self, x):
    method prediction_scores (line 90) | def prediction_scores(self, x):
    method save (line 98) | def save(self, path):
    method load (line 111) | def load(cls, path, device=None, batch_size=64, lang_subset=None):

FILE: stanza/models/langid/trainer.py
  class Trainer (line 7) | class Trainer:
    method __init__ (line 14) | def __init__(self, config, load_model=False, device=None):
    method update (line 27) | def update(self, inputs):
    method predict (line 36) | def predict(self, inputs):
    method save (line 41) | def save(self, label=None):
    method load (line 47) | def load(self, model_path=None, device=None):

FILE: stanza/models/lemma/attach_lemma_classifier.py
  function attach_classifier (line 6) | def attach_classifier(input_filename, output_filename, classifiers):
  function main (line 15) | def main(args=None):

FILE: stanza/models/lemma/data.py
  class DataLoader (line 17) | class DataLoader:
    method __init__ (line 18) | def __init__(self, doc, batch_size, args, vocab=None, evaluation=False...
    method init_vocab (line 66) | def init_vocab(self, data):
    method preprocess (line 74) | def preprocess(self, data, char_vocab, pos_vocab, args):
    method __len__ (line 89) | def __len__(self):
    method __getitem__ (line 92) | def __getitem__(self, key):
    method __iter__ (line 119) | def __iter__(self):
    method raw_data (line 123) | def raw_data(self):
    method load_doc (line 127) | def load_doc(doc, caseless, skip_blank_lemmas, evaluation):
    method extract_correct_forms (line 142) | def extract_correct_forms(data):
    method remove_goeswith (line 180) | def remove_goeswith(data):
    method lowercase_data (line 202) | def lowercase_data(data):
    method skip_blank_lemmas (line 208) | def skip_blank_lemmas(data):
    method resolve_none (line 213) | def resolve_none(data):

FILE: stanza/models/lemma/edit.py
  function get_edit_type (line 7) | def get_edit_type(word, lemma):
  function edit_word (line 15) | def edit_word(word, pred, edit_id):

FILE: stanza/models/lemma/scorer.py
  function score (line 11) | def score(system_conllu_file, gold_conllu_file):

FILE: stanza/models/lemma/trainer.py
  function unpack_batch (line 26) | def unpack_batch(batch, device):
  class Trainer (line 33) | class Trainer(object):
    method __init__ (line 35) | def __init__(self, args=None, vocab=None, emb_matrix=None, model_file=...
    method build_seq2seq (line 63) | def build_seq2seq(self, args, emb_matrix, foundation_cache):
    method update (line 78) | def update(self, batch, eval=False):
    method predict (line 104) | def predict(self, batch, beam_size=1, vocab=None):
    method postprocess (line 127) | def postprocess(self, words, preds, edits=None):
    method has_contextual_lemmatizers (line 148) | def has_contextual_lemmatizers(self):
    method predict_contextual (line 151) | def predict_contextual(self, sentence_words, sentence_tags, preds):
    method update_contextual_preds (line 175) | def update_contextual_preds(self, doc, preds):
    method update_lr (line 198) | def update_lr(self, new_lr):
    method train_dict (line 201) | def train_dict(self, triples, update_word_dict=True):
    method predict_dict (line 221) | def predict_dict(self, pairs):
    method skip_seq2seq (line 236) | def skip_seq2seq(self, pairs):
    method ensemble (line 252) | def ensemble(self, pairs, other_preds):
    method save (line 271) | def save(self, filename, skip_modules=True):
    method load (line 295) | def load(self, filename, args, foundation_cache, lemma_classifier_args...

FILE: stanza/models/lemma/vocab.py
  class Vocab (line 6) | class Vocab(BaseVocab):
    method build_vocab (line 7) | def build_vocab(self):
  class MultiVocab (line 12) | class MultiVocab(BaseMultiVocab):
    method load_state_dict (line 14) | def load_state_dict(cls, state_dict):

FILE: stanza/models/lemma_classifier/base_model.py
  class LemmaClassifier (line 23) | class LemmaClassifier(ABC, nn.Module):
    method __init__ (line 24) | def __init__(self, label_decoder, target_words, target_upos, *args, **...
    method add_unsaved_module (line 33) | def add_unsaved_module(self, name, module):
    method is_unsaved_module (line 37) | def is_unsaved_module(self, name):
    method save (line 40) | def save(self, save_name):
    method model_type (line 52) | def model_type(self):
    method target_indices (line 57) | def target_indices(self, words, tags):
    method predict (line 60) | def predict(self, position_indices: torch.Tensor, sentences: List[List...
    method from_checkpoint (line 69) | def from_checkpoint(checkpoint, args=None):
    method load (line 125) | def load(filename, args=None):

FILE: stanza/models/lemma_classifier/base_trainer.py
  class BaseLemmaClassifierTrainer (line 20) | class BaseLemmaClassifierTrainer(ABC):
    method configure_weighted_loss (line 21) | def configure_weighted_loss(self, label_decoder: Mapping, counts: Mapp...
    method build_model (line 36) | def build_model(self, label_decoder, upos_to_id, known_words, target_w...
    method train (line 41) | def train(self, num_epochs: int, save_name: str, args: Mapping, eval_f...

FILE: stanza/models/lemma_classifier/baseline_model.py
  class BaselineModel (line 12) | class BaselineModel:
    method __init__ (line 14) | def __init__(self, token_to_lemmatize, prediction_lemma, prediction_up...
    method predict (line 19) | def predict(self, token):
    method evaluate (line 23) | def evaluate(self, conll_path):

FILE: stanza/models/lemma_classifier/constants.py
  class ModelType (line 8) | class ModelType(Enum):

FILE: stanza/models/lemma_classifier/evaluate_many.py
  function evaluate_n_models (line 13) | def evaluate_n_models(path_to_models_dir, args):
  function main (line 45) | def main():

FILE: stanza/models/lemma_classifier/evaluate_models.py
  function get_weighted_f1 (line 35) | def get_weighted_f1(mcc_results: Mapping[int, Mapping[str, float]], conf...
  function evaluate_sequences (line 54) | def evaluate_sequences(gold_tag_sequences: List[Any], pred_tag_sequences...
  function model_predict (line 111) | def model_predict(model: nn.Module, position_indices: torch.Tensor, sent...
  function evaluate_model (line 130) | def evaluate_model(model: nn.Module, eval_path: str, verbose: bool = Tru...
  function main (line 183) | def main(args=None, predefined_args=None):

FILE: stanza/models/lemma_classifier/lstm_model.py
  class LemmaClassifierLSTM (line 17) | class LemmaClassifierLSTM(LemmaClassifier):
    method __init__ (line 24) | def __init__(self, model_args, output_dim, pt_embedding, label_decoder...
    method get_save_dict (line 109) | def get_save_dict(self):
    method convert_tags (line 125) | def convert_tags(self, upos_tags: List[List[str]]):
    method forward (line 130) | def forward(self, pos_indices: List[int], sentences: List[List[str]], ...
    method model_type (line 218) | def model_type(self):

FILE: stanza/models/lemma_classifier/prepare_dataset.py
  function load_doc_from_conll_file (line 17) | def load_doc_from_conll_file(path: str):
  class DataProcessor (line 24) | class DataProcessor():
    method __init__ (line 26) | def __init__(self, target_word: str, target_upos: List[str], allowed_l...
    method keep_sentence (line 32) | def keep_sentence(self, sentence):
    method find_all_occurrences (line 38) | def find_all_occurrences(self, sentence) -> List[int]:
    method write_output_file (line 49) | def write_output_file(save_name, target_upos, sentences):
    method process_document (line 64) | def process_document(self, doc, save_name: str) -> None:
  function main (line 99) | def main(args=None):

FILE: stanza/models/lemma_classifier/train_lstm_model.py
  class LemmaClassifierTrainer (line 19) | class LemmaClassifierTrainer(BaseLemmaClassifierTrainer):
    method __init__ (line 24) | def __init__(self, model_args: dict, embedding_file: str, use_charlm: ...
    method build_model (line 75) | def build_model(self, label_decoder, upos_to_id, known_words, target_w...
  function build_argparse (line 79) | def build_argparse():
  function main (line 100) | def main(args=None, predefined_args=None):

FILE: stanza/models/lemma_classifier/train_many.py
  function train_n_models (line 19) | def train_n_models(num_models: int, base_path: str, args):
  function train_n_tfmrs (line 83) | def train_n_tfmrs(num_models: int, base_path: str, args):
  function main (line 112) | def main():

FILE: stanza/models/lemma_classifier/train_transformer_model.py
  class TransformerBaselineTrainer (line 21) | class TransformerBaselineTrainer(BaseLemmaClassifierTrainer):
    method __init__ (line 27) | def __init__(self, model_args: dict, transformer_name: str = "roberta"...
    method set_layer_learning_rates (line 53) | def set_layer_learning_rates(self, transformer_lr: float, mlp_lr: floa...
    method build_model (line 75) | def build_model(self, label_decoder, upos_to_id, known_words, target_w...
  function main (line 79) | def main(args=None, predefined_args=None):

FILE: stanza/models/lemma_classifier/transformer_model.py
  class LemmaClassifierWithTransformer (line 16) | class LemmaClassifierWithTransformer(LemmaClassifier):
    method __init__ (line 17) | def __init__(self, model_args: dict, output_dim: int, transformer_name...
    method get_save_dict (line 50) | def get_save_dict(self):
    method convert_tags (line 64) | def convert_tags(self, upos_tags: List[List[str]]):
    method forward (line 67) | def forward(self, idx_positions: List[int], sentences: List[List[str]]...
    method model_type (line 88) | def model_type(self):

FILE: stanza/models/lemma_classifier/utils.py
  class Dataset (line 15) | class Dataset:
    method __init__ (line 16) | def __init__(self, data_path: str, batch_size: int =DEFAULT_BATCH_SIZE...
    method __len__ (line 103) | def __len__(self):
    method __iter__ (line 109) | def __iter__(self):
  function extract_unknown_token_indices (line 124) | def extract_unknown_token_indices(tokenized_indices: torch.tensor, unkno...
  function get_device (line 138) | def get_device():
  function round_up_to_multiple (line 152) | def round_up_to_multiple(number, multiple):
  function main (line 168) | def main():

FILE: stanza/models/lemmatizer.py
  function build_argparse (line 33) | def build_argparse():
  function parse_args (line 90) | def parse_args(args=None):
  function main (line 103) | def main(args=None):
  function all_lowercase (line 115) | def all_lowercase(doc):
  function build_model_filename (line 122) | def build_model_filename(args):
  function train (line 133) | def train(args):
  function evaluate (line 259) | def evaluate(args):

FILE: stanza/models/mwt/character_classifier.py
  class CharacterClassifier (line 14) | class CharacterClassifier(nn.Module):
    method __init__ (line 15) | def __init__(self, args):
    method encode (line 43) | def encode(self, enc_inputs, lens):
    method embed (line 49) | def embed(self, src, src_mask):
    method forward (line 60) | def forward(self, src, src_mask):

FILE: stanza/models/mwt/data.py
  class DataLoader (line 30) | class DataLoader:
    method __init__ (line 31) | def __init__(self, doc, batch_size, args, vocab=None, evaluation=False...
    method init_vocab (line 67) | def init_vocab(self, data):
    method maybe_augment_apos (line 72) | def maybe_augment_apos(self, datum):
    method process (line 81) | def process(self, sample):
    method prepare_target (line 91) | def prepare_target(self, vocab, datum):
    method __len__ (line 100) | def __len__(self):
    method __getitem__ (line 103) | def __getitem__(self, key):
    method __collate_fn (line 121) | def __collate_fn(data):
    method __iter__ (line 143) | def __iter__(self):
    method to_loader (line 147) | def to_loader(self):
    method load_doc (line 157) | def load_doc(self, doc, evaluation=False):
  class BinaryDataLoader (line 162) | class BinaryDataLoader(DataLoader):
    method prepare_target (line 168) | def prepare_target(self, vocab, datum):

FILE: stanza/models/mwt/scorer.py
  function score (line 6) | def score(system_conllu_file, gold_conllu_file):

FILE: stanza/models/mwt/trainer.py
  function unpack_batch (line 22) | def unpack_batch(batch, device):
  class Trainer (line 29) | class Trainer(BaseTrainer):
    method __init__ (line 31) | def __init__(self, args=None, vocab=None, emb_matrix=None, model_file=...
    method update (line 53) | def update(self, batch, eval=False):
    method predict (line 84) | def predict(self, batch, unsort=True, never_decode_unk=False, vocab=No...
    method train_dict (line 125) | def train_dict(self, pairs):
    method dict_expansion (line 139) | def dict_expansion(self, word):
    method predict_dict (line 163) | def predict_dict(self, words):
    method ensemble (line 174) | def ensemble(self, cands, other_preds):
    method save (line 186) | def save(self, filename):
    method load (line 199) | def load(self, filename):

FILE: stanza/models/mwt/utils.py
  function mwts_composed_of_words (line 7) | def mwts_composed_of_words(doc):
  function resplit_mwt (line 20) | def resplit_mwt(tokens, pipeline, keep_tokens=True):
  function main (line 82) | def main():

FILE: stanza/models/mwt/vocab.py
  class Vocab (line 6) | class Vocab(BaseVocab):
    method build_vocab (line 7) | def build_vocab(self):
    method add_unit (line 15) | def add_unit(self, unit):

FILE: stanza/models/mwt_expander.py
  function build_argparse (line 40) | def build_argparse():
  function parse_args (line 90) | def parse_args(args=None):
  function main (line 99) | def main(args=None):
  function train (line 112) | def train(args):
  function evaluate (line 276) | def evaluate(args):

FILE: stanza/models/ner/data.py
  class DataLoader (line 15) | class DataLoader:
    method __init__ (line 16) | def __init__(self, doc, batch_size, args, pretrain=None, vocab=None, e...
    method init_vocab (line 55) | def init_vocab(self, data):
    method preprocess (line 93) | def preprocess(self, data, vocab, args):
    method __len__ (line 106) | def __len__(self):
    method __getitem__ (line 109) | def __getitem__(self, key):
    method __iter__ (line 150) | def __iter__(self):
    method _load_doc (line 154) | def _load_doc(self, doc, scheme):
    method process_chars (line 165) | def process_chars(self, sents):
    method reshuffle (line 190) | def reshuffle(self):
    method chunk_batches (line 195) | def chunk_batches(self, data):

FILE: stanza/models/ner/model.py
  class NERTagger (line 25) | class NERTagger(nn.Module):
    method __init__ (line 26) | def __init__(self, args, vocab, emb_matrix=None, foundation_cache=None...
    method init_emb (line 131) | def init_emb(self, emb_matrix):
    method add_unsaved_module (line 140) | def add_unsaved_module(self, name, module):
    method log_norms (line 144) | def log_norms(self):
    method forward (line 151) | def forward(self, sentences, wordchars, wordchars_mask, tags, word_ori...
    method extract_static_embeddings (line 281) | def extract_static_embeddings(args, sents, vocab):

FILE: stanza/models/ner/scorer.py
  function score_by_entity (line 13) | def score_by_entity(pred_tag_sequences, gold_tag_sequences, verbose=True...
  function score_by_token (line 89) | def score_by_token(pred_tag_sequences, gold_tag_sequences, verbose=True,...
  function test (line 161) | def test():

FILE: stanza/models/ner/trainer.py
  function unpack_batch (line 22) | def unpack_batch(batch, device):
  function fix_singleton_tags (line 35) | def fix_singleton_tags(tags):
  class Trainer (line 63) | class Trainer(BaseTrainer):
    method __init__ (line 65) | def __init__(self, args=None, vocab=None, pretrain=None, model_file=No...
    method update (line 117) | def update(self, batch, eval=False):
    method predict (line 137) | def predict(self, batch, unsort=True):
    method save (line 176) | def save(self, filename, skip_modules=True):
    method load (line 200) | def load(self, filename, pretrain=None, args=None, foundation_cache=No...
    method get_known_tags (line 258) | def get_known_tags(self):

FILE: stanza/models/ner/utils.py
  function is_basic_scheme (line 14) | def is_basic_scheme(all_tags):
  function is_bio_scheme (line 30) | def is_bio_scheme(all_tags):
  function to_bio2 (line 49) | def to_bio2(tags):
  function basic_to_bio (line 73) | def basic_to_bio(tags):
  function bio2_to_bioes (line 95) | def bio2_to_bioes(tags):
  function normalize_empty_tags (line 127) | def normalize_empty_tags(sentences):
  function process_tags (line 138) | def process_tags(sentences, scheme):
  function decode_from_bioes (line 218) | def decode_from_bioes(tags):
  function merge_tags (line 267) | def merge_tags(*sequences):

FILE: stanza/models/ner/vocab.py
  class TagVocab (line 8) | class TagVocab(BaseVocab):
    method build_vocab (line 10) | def build_vocab(self):
  function convert_tag_vocab (line 16) | def convert_tag_vocab(state_dict):
  class MultiVocab (line 31) | class MultiVocab(BaseMultiVocab):
    method state_dict (line 32) | def state_dict(self):
    method load_state_dict (line 43) | def load_state_dict(cls, state_dict):

FILE: stanza/models/ner_tagger.py
  function build_argparse (line 36) | def build_argparse():
  function parse_args (line 124) | def parse_args(args=None):
  function main (line 136) | def main(args=None):
  function load_pretrain (line 148) | def load_pretrain(args):
  function model_file_name (line 165) | def model_file_name(args):
  function get_known_tags (line 168) | def get_known_tags(tags):
  function warn_missing_tags (line 182) | def warn_missing_tags(tag_vocab, data_tags, error_msg, bioes_to_bio=False):
  function train (line 209) | def train(args):
  function write_ner_results (line 408) | def write_ner_results(filename, batch, preds, predict_tagset):
  function evaluate (line 427) | def evaluate(args):
  function evaluate_model (line 434) | def evaluate_model(loaded_args, trainer, vocab, eval_file):
  function load_model (line 473) | def load_model(args, model_file):

FILE: stanza/models/parser.py
  function build_argparse (line 42) | def build_argparse():
  function parse_args (line 206) | def parse_args(args=None):
  function main (line 217) | def main(args=None):
  function model_file_name (line 229) | def model_file_name(args):
  function load_pretrain (line 233) | def load_pretrain(args):
  function predict_dataset (line 244) | def predict_dataset(trainer, dev_batch):
  function train (line 253) | def train(args):
  function evaluate (line 467) | def evaluate(args):
  function evaluate_trainer (line 482) | def evaluate_trainer(args, trainer, pretrain):

FILE: stanza/models/pos/build_xpos_vocab_factory.py
  function get_xpos_factory (line 20) | def get_xpos_factory(shorthand, fn):
  function main (line 48) | def main():

FILE: stanza/models/pos/data.py
  class Dataset (line 24) | class Dataset:
    method __init__ (line 25) | def __init__(self, doc, args, pretrain, vocab=None, evaluation=False, ...
    method init_vocab (line 66) | def init_vocab(docs, args):
    method preprocess (line 84) | def preprocess(self, data, vocab, pretrain_vocab, args):
    method __len__ (line 102) | def __len__(self):
    method __mask (line 105) | def __mask(self, upos):
    method __getitem__ (line 129) | def __getitem__(self, key):
    method __iter__ (line 216) | def __iter__(self):
    method to_loader (line 220) | def to_loader(self, **kwargs):
    method to_length_limited_loader (line 227) | def to_length_limited_loader(self, batch_size, maximum_tokens):
    method __collate_fn (line 234) | def __collate_fn(data):
    method load_doc (line 280) | def load_doc(doc):
    method resolve_none (line 287) | def resolve_none(data):
  class LengthLimitedBatchSampler (line 296) | class LengthLimitedBatchSampler(Sampler):
    method __init__ (line 305) | def __init__(self, data, batch_size, maximum_tokens):
    method __len__ (line 336) | def __len__(self):
    method __iter__ (line 339) | def __iter__(self):
  class ShuffledDataset (line 347) | class ShuffledDataset:
    method __init__ (line 374) | def __init__(self, datasets, batch_size):
    method __iter__ (line 379) | def __iter__(self):
    method __len__ (line 389) | def __len__(self):

FILE: stanza/models/pos/model.py
  class Tagger (line 22) | class Tagger(nn.Module):
    method __init__ (line 23) | def __init__(self, args, vocab, emb_matrix=None, share_hid=False, foun...
    method add_unsaved_module (line 134) | def add_unsaved_module(self, name, module):
    method log_norms (line 138) | def log_norms(self):
    method forward (line 141) | def forward(self, word, word_mask, wordchars, wordchars_mask, upos, xp...

FILE: stanza/models/pos/scorer.py
  function score (line 10) | def score(system_conllu_file, gold_conllu_file, verbose=True, eval_type=...

FILE: stanza/models/pos/trainer.py
  function unpack_batch (line 19) | def unpack_batch(batch, device):
  class Trainer (line 29) | class Trainer(BaseTrainer):
    method __init__ (line 31) | def __init__(self, args=None, vocab=None, pretrain=None, model_file=No...
    method update (line 63) | def update(self, batch, eval=False):
    method predict (line 91) | def predict(self, batch, unsort=True):
    method save (line 108) | def save(self, filename, skip_modules=True):
    method load (line 133) | def load(self, filename, pretrain, args=None, foundation_cache=None):

FILE: stanza/models/pos/vocab.py
  class WordVocab (line 6) | class WordVocab(BaseVocab):
    method __init__ (line 7) | def __init__(self, data=None, lang="", idx=0, cutoff=0, lower=False, i...
    method id2unit (line 12) | def id2unit(self, id):
    method unit2id (line 18) | def unit2id(self, unit):
    method build_vocab (line 24) | def build_vocab(self):
    method __iter__ (line 36) | def __iter__(self):
    method __str__ (line 43) | def __str__(self):
  class XPOSVocab (line 46) | class XPOSVocab(CompositeVocab):
    method __init__ (line 47) | def __init__(self, data=None, lang="", idx=0, sep="", keyed=False):
  class FeatureVocab (line 50) | class FeatureVocab(CompositeVocab):
    method __init__ (line 51) | def __init__(self, data=None, lang="", idx=0, sep="|", keyed=True):
  class MultiVocab (line 54) | class MultiVocab(BaseMultiVocab):
    method state_dict (line 55) | def state_dict(self):
    method load_state_dict (line 66) | def load_state_dict(cls, state_dict):

FILE: stanza/models/pos/xpos_vocab_factory.py
  function xpos_vocab_factory (line 188) | def xpos_vocab_factory(data, shorthand):

FILE: stanza/models/pos/xpos_vocab_utils.py
  class XPOSType (line 9) | class XPOSType(Enum):
  function filter_data (line 18) | def filter_data(data, idx):
  function choose_simplest_factory (line 28) | def choose_simplest_factory(data, shorthand):
  function build_xpos_vocab (line 44) | def build_xpos_vocab(description, data, shorthand):

FILE: stanza/models/tagger.py
  function build_argparse (line 33) | def build_argparse():
  function parse_args (line 122) | def parse_args(args=None):
  function main (line 139) | def main(args=None):
  function model_file_name (line 151) | def model_file_name(args):
  function save_each_file_name (line 154) | def save_each_file_name(args):
  function load_pretrain (line 159) | def load_pretrain(args):
  function get_eval_type (line 170) | def get_eval_type(dev_batch):
  function load_training_data (line 181) | def load_training_data(args, pretrain):
  function train (line 255) | def train(args):
  function evaluate (line 416) | def evaluate(args):
  function evaluate_trainer (line 431) | def evaluate_trainer(args, trainer, pretrain):

FILE: stanza/models/tokenization/data.py
  function filter_consecutive_whitespaces (line 16) | def filter_consecutive_whitespaces(para):
  class TokenizationDataset (line 34) | class TokenizationDataset:
    method __init__ (line 35) | def __init__(self, tokenizer_args, input_files={'txt': None, 'label': ...
    method labels (line 75) | def labels(self):
    method extract_dict_feat (line 83) | def extract_dict_feat(self, para, idx):
    method para_to_sentences (line 120) | def para_to_sentences(self, para):
    method advance_old_batch (line 186) | def advance_old_batch(self, eval_offsets, old_batch):
  function build_move_punct_set (line 217) | def build_move_punct_set(data, move_back_prob):
  function build_known_mwt (line 240) | def build_known_mwt(data, mwt_expansions):
  class DataLoader (line 264) | class DataLoader(TokenizationDataset):
    method __init__ (line 268) | def __init__(self, args, input_files={'txt': None, 'label': None}, inp...
    method __len__ (line 318) | def __len__(self):
    method init_vocab (line 321) | def init_vocab(self):
    method augment_vocab (line 326) | def augment_vocab(vocab, data, existing_unit, new_unit):
    method init_sent_ids (line 346) | def init_sent_ids(self):
    method has_mwt (line 354) | def has_mwt(self):
    method shuffle (line 364) | def shuffle(self):
    method move_last_char (line 369) | def move_last_char(self, sentence):
    method split_mwt (line 377) | def split_mwt(self, sentence):
    method move_punct_back (line 416) | def move_punct_back(self, sentence):
    method augment_final_punct (line 443) | def augment_final_punct(self, sentence):
    method next (line 456) | def next(self, eval_offsets=None, unit_dropout=0.0, feat_unit_dropout=...
  class SortedDataset (line 619) | class SortedDataset(Dataset):
    method __init__ (line 629) | def __init__(self, dataset):
    method __len__ (line 635) | def __len__(self):
    method __getitem__ (line 638) | def __getitem__(self, index):
    method unsort (line 646) | def unsort(self, arr):
    method collate (line 649) | def collate(self, samples):

FILE: stanza/models/tokenization/model.py
  class Tokenizer (line 9) | class Tokenizer(nn.Module):
    method __init__ (line 10) | def __init__(self, args, nchars, emb_dim, hidden_dim, dropout, feat_dr...
    method add_unsaved_module (line 60) | def add_unsaved_module(self, name, module):
    method forward (line 64) | def forward(self, x, feats, lengths, raw=None):

FILE: stanza/models/tokenization/tokenize_files.py
  function tokenize_to_file (line 32) | def tokenize_to_file(tokenizer, fin, fout, chunk_size=500):
  function main (line 47) | def main(args=None):

FILE: stanza/models/tokenization/trainer.py
  class Trainer (line 16) | class Trainer(BaseTrainer):
    method __init__ (line 17) | def __init__(self, args=None, vocab=None, lexicon=None, dictionary=Non...
    method update (line 35) | def update(self, inputs):
    method predict (line 57) | def predict(self, inputs):
    method save (line 70) | def save(self, filename, skip_modules=True):
    method load (line 94) | def load(self, filename, args, foundation_cache):

FILE: stanza/models/tokenization/utils.py
  function create_dictionary (line 21) | def create_dictionary(lexicon):
  function create_lexicon (line 52) | def create_lexicon(shorthand=None, train_path=None, external_path=None):
  function load_lexicon (line 123) | def load_lexicon(args):
  function load_mwt_dict (line 145) | def load_mwt_dict(filename):
  function process_sentence (line 166) | def process_sentence(sentence, mwt_dict=None):
  function find_spans (line 207) | def find_spans(raw):
  function update_pred_regex (line 225) | def update_pred_regex(raw, pred):
  function predict (line 253) | def predict(trainer, data_generator, batch_size, max_seqlen, use_regex_t...
  function output_predictions (line 326) | def output_predictions(output_file, trainer, data_generator, vocab, mwt_...
  function postprocess_doc (line 344) | def postprocess_doc(doc, postprocessor, orig_text=None):
  function reassemble_doc_from_tokens (line 407) | def reassemble_doc_from_tokens(tokens, mwts, expansions, raw_text):
  function decode_predictions (line 469) | def decode_predictions(vocab, mwt_dict, orig_text, all_raw, all_preds, n...
  function match_tokens_with_text (line 550) | def match_tokens_with_text(sentences, orig_text):
  function eval_model (line 581) | def eval_model(args, trainer, batches, vocab, mwt_dict):

FILE: stanza/models/tokenization/vocab.py
  class Vocab (line 9) | class Vocab(BaseVocab):
    method __init__ (line 10) | def __init__(self, *args, **kwargs):
    method build_vocab (line 14) | def build_vocab(self):
    method append (line 25) | def append(self, unit):
    method normalize_unit (line 30) | def normalize_unit(self, unit):
    method normalize_token (line 34) | def normalize_token(self, token):

FILE: stanza/models/tokenizer.py
  function build_argparse (line 34) | def build_argparse():
  function parse_args (line 104) | def parse_args(args=None):
  function model_file_name (line 114) | def model_file_name(args):
  function main (line 126) | def main(args=None):
  function train (line 143) | def train(args):
  function evaluate (line 248) | def evaluate(args):

FILE: stanza/models/wl_coref.py
  function output_running_time (line 62) | def output_running_time():
  function deterministic (line 73) | def deterministic() -> None:

FILE: stanza/pipeline/constituency_processor.py
  class ConstituencyProcessor (line 16) | class ConstituencyProcessor(UDProcessor):
    method _set_up_requires (line 25) | def _set_up_requires(self):
    method _set_up_model (line 32) | def _set_up_model(self, config, pipeline, device):
    method _set_up_final_config (line 52) | def _set_up_final_config(self, config):
    method process (line 58) | def process(self, document):
    method get_constituents (line 74) | def get_constituents(self):

FILE: stanza/pipeline/core.py
  class DownloadMethod (line 39) | class DownloadMethod(Enum):
  class LanguageNotDownloadedError (line 51) | class LanguageNotDownloadedError(FileNotFoundError):
    method __init__ (line 52) | def __init__(self, lang, lang_dir, model_path):
  class UnsupportedProcessorError (line 58) | class UnsupportedProcessorError(FileNotFoundError):
    method __init__ (line 59) | def __init__(self, processor, lang):
  class IllegalPackageError (line 64) | class IllegalPackageError(ValueError):
    method __init__ (line 65) | def __init__(self, msg):
  class PipelineRequirementsException (line 68) | class PipelineRequirementsException(Exception):
    method __init__ (line 74) | def __init__(self, processor_req_fails):
    method processor_req_fails (line 79) | def processor_req_fails(self):
    method build_message (line 82) | def build_message(self):
    method __str__ (line 87) | def __str__(self):
  function build_default_config_option (line 90) | def build_default_config_option(model_specs):
  function filter_variants (line 111) | def filter_variants(model_specs):
  function build_default_config (line 115) | def build_default_config(resources, lang, model_dir, load_list):
  function normalize_download_method (line 163) | def normalize_download_method(download_method):
  class Pipeline (line 176) | class Pipeline:
    method __init__ (line 178) | def __init__(self,
    method update_kwargs (line 351) | def update_kwargs(kwargs, processor_list):
    method filter_config (line 372) | def filter_config(prefix, config_dict):
    method loaded_processors (line 384) | def loaded_processors(self):
    method process (line 391) | def process(self, doc, processors=None):
    method process_conllu (line 434) | def process_conllu(self, doc, ignore_gapping=True, processors=None):
    method bulk_process (line 445) | def bulk_process(self, docs, *args, **kwargs):
    method stream (line 455) | def stream(self, docs, batch_size=50, *args, **kwargs):
    method __str__ (line 483) | def __str__(self):
    method __call__ (line 490) | def __call__(self, doc, processors=None):
  function main (line 493) | def main():

FILE: stanza/pipeline/coref_processor.py
  function extract_text (line 14) | def extract_text(document, sent_id, start_word, end_word):
  class CorefProcessor (line 56) | class CorefProcessor(UDProcessor):
    method _set_up_model (line 62) | def _set_up_model(self, config, pipeline, device):
    method process (line 91) | def process(self, document):
    method _handle_zero_anaphora (line 200) | def _handle_zero_anaphora(self, document, results, sent_ids, word_pos):

FILE: stanza/pipeline/demo/demo_server.py
  function get_file (line 9) | def get_file(path):
  function static_file (line 16) | def static_file(path):
  function index (line 27) | def index():
  function annotate (line 31) | def annotate():
  function create_app (line 57) | def create_app():

FILE: stanza/pipeline/demo/stanza-brat.js
  function isInt (line 49) | function isInt(value) {
  function posColor (line 71) | function posColor(posTag) {
  function uposColor (line 105) | function uposColor(posTag) {
  function nerColor (line 137) | function nerColor(nerTag) {
  function sentimentColor (line 164) | function sentimentColor(sentiment) {
  function annotators (line 184) | function annotators() {
  function date (line 195) | function date() {
  function ConstituencyParseProcessor (line 213) | function ConstituencyParseProcessor() {
  function render (line 316) | function render(data, reverse) {
  function renderTokensregex (line 834) | function renderTokensregex(data) {
  function renderSemgrex (line 893) | function renderSemgrex(data) {
  function renderTregex (line 971) | function renderTregex(data) {
  function createAnnotationDiv (line 1093) | function createAnnotationDiv(id, annotator, selector, label) {

FILE: stanza/pipeline/demo/stanza-parseviewer.js
  function adjustGraphPositioning (line 105) | function adjustGraphPositioning(svg, g, minWidth, minHeight) {

FILE: stanza/pipeline/depparse_processor.py
  class DepparseProcessor (line 21) | class DepparseProcessor(UDProcessor):
    method __init__ (line 28) | def __init__(self, config, pipeline, device):
    method _set_up_requires (line 32) | def _set_up_requires(self):
    method _set_up_model (line 39) | def _set_up_model(self, config, pipeline, device):
    method get_known_relations (line 49) | def get_known_relations(self):
    method process (line 56) | def process(self, document):

FILE: stanza/pipeline/external/corenlp_converter_depparse.py
  class ConverterDepparse (line 10) | class ConverterDepparse(ProcessorVariant):
    method __init__ (line 14) | def __init__(self, config):
    method process (line 28) | def process(self, document):

FILE: stanza/pipeline/external/jieba.py
  function check_jieba (line 12) | def check_jieba():
  class JiebaTokenizer (line 25) | class JiebaTokenizer(ProcessorVariant):
    method __init__ (line 26) | def __init__(self, config):
    method process (line 43) | def process(self, document):

FILE: stanza/pipeline/external/pythainlp.py
  function check_pythainlp (line 11) | def check_pythainlp():
  class PyThaiNLPTokenizer (line 26) | class PyThaiNLPTokenizer(ProcessorVariant):
    method __init__ (line 27) | def __init__(self, config):
    method process (line 44) | def process(self, document):

FILE: stanza/pipeline/external/spacy.py
  function check_spacy (line 9) | def check_spacy():
  class SpacyTokenizer (line 22) | class SpacyTokenizer(ProcessorVariant):
    method __init__ (line 23) | def __init__(self, config):
    method process (line 48) | def process(self, document):

FILE: stanza/pipeline/external/sudachipy.py
  function check_sudachipy (line 13) | def check_sudachipy():
  class SudachiPyTokenizer (line 29) | class SudachiPyTokenizer(ProcessorVariant):
    method __init__ (line 30) | def __init__(self, config):
    method process (line 45) | def process(self, document):

FILE: stanza/pipeline/langid_processor.py
  class LangIDProcessor (line 17) | class LangIDProcessor(UDProcessor):
    method _set_up_model (line 31) | def _set_up_model(self, config, pipeline, device):
    method _text_to_tensor (line 38) | def _text_to_tensor(self, docs):
    method _id_langs (line 50) | def _id_langs(self, batch_tensor):
    method clean_text (line 67) | def clean_text(text):
    method _process_list (line 84) | def _process_list(self, docs):
    method process (line 113) | def process(self, doc):
    method bulk_process (line 121) | def bulk_process(self, docs):

FILE: stanza/pipeline/lemma_processor.py
  class LemmaProcessor (line 18) | class LemmaProcessor(UDProcessor):
    method __init__ (line 28) | def __init__(self, config, pipeline, device):
    method use_identity (line 35) | def use_identity(self):
    method _set_up_model (line 38) | def _set_up_model(self, config, pipeline, device):
    method _set_up_requires (line 60) | def _set_up_requires(self):
    method process (line 69) | def process(self, document):

FILE: stanza/pipeline/morphseg_processor.py
  class MorphSegProcessor (line 7) | class MorphSegProcessor(UDProcessor):
    method __init__ (line 11) | def __init__(self, config, pipeline, device):
    method _set_up_model (line 18) | def _set_up_model(self, config, pipeline, device):
    method process (line 45) | def process(self, document):

FILE: stanza/pipeline/multilingual.py
  class MultilingualPipeline (line 18) | class MultilingualPipeline:
    method __init__ (line 40) | def __init__(self,
    method _update_pipeline_cache (line 103) | def _update_pipeline_cache(self, lang):
    method process (line 148) | def process(self, doc):
    method __call__ (line 185) | def __call__(self, doc):

FILE: stanza/pipeline/mwt_processor.py
  class MWTProcessor (line 15) | class MWTProcessor(UDProcessor):
    method _set_up_model (line 22) | def _set_up_model(self, config, pipeline, device):
    method build_batch (line 25) | def build_batch(self, document):
    method process (line 28) | def process(self, document):
    method bulk_process (line 52) | def bulk_process(self, docs):

FILE: stanza/pipeline/ner_processor.py
  class NERProcessor (line 21) | class NERProcessor(UDProcessor):
    method _get_dependencies (line 28) | def _get_dependencies(self, config, dep_name):
    method _set_up_model (line 37) | def _set_up_model(self, config, pipeline, device):
    method _set_up_final_config (line 83) | def _set_up_final_config(self, config):
    method __str__ (line 98) | def __str__(self):
    method mark_inactive (line 101) | def mark_inactive(self):
    method process (line 106) | def process(self, document):
    method bulk_process (line 127) | def bulk_process(self, docs):
    method get_known_tags (line 136) | def get_known_tags(self, model_idx=0):

FILE: stanza/pipeline/pos_processor.py
  class POSProcessor (line 19) | class POSProcessor(UDProcessor):
    method _set_up_model (line 26) | def _set_up_model(self, config, pipeline, device):
    method __str__ (line 35) | def __str__(self):
    method get_known_xpos (line 38) | def get_known_xpos(self):
    method is_composite_xpos (line 49) | def is_composite_xpos(self):
    method get_known_upos (line 55) | def get_known_upos(self):
    method get_known_feats (line 62) | def get_known_feats(self):
    method process (line 69) | def process(self, document):

FILE: stanza/pipeline/processor.py
  class ProcessorRequirementsException (line 10) | class ProcessorRequirementsException(Exception):
    method __init__ (line 13) | def __init__(self, processors_list, err_processor, provided_reqs):
    method err_processor (line 22) | def err_processor(self):
    method processor_type (line 27) | def processor_type(self):
    method processors_list (line 31) | def processors_list(self):
    method provided_reqs (line 35) | def provided_reqs(self):
    method build_message (line 38) | def build_message(self):
    method __str__ (line 48) | def __str__(self):
  class Processor (line 52) | class Processor(ABC):
    method __init__ (line 55) | def __init__(self, config, pipeline, device):
    method __str__ (line 73) | def __str__(self):
    method process (line 88) | def process(self, doc):
    method bulk_process (line 92) | def bulk_process(self, docs):
    method _set_up_provides (line 100) | def _set_up_provides(self):
    method _set_up_requires (line 104) | def _set_up_requires(self):
    method _set_up_variant_requires (line 108) | def _set_up_variant_requires(self):
    method _set_up_variants (line 125) | def _set_up_variants(self, config, device):
    method config (line 133) | def config(self):
    method pipeline (line 138) | def pipeline(self):
    method provides (line 143) | def provides(self):
    method requires (line 147) | def requires(self):
    method _check_requirements (line 150) | def _check_requirements(self):
  class ProcessorVariant (line 160) | class ProcessorVariant(ABC):
    method process (line 166) | def process(self, doc):
    method bulk_process (line 177) | def bulk_process(self, docs):
  class UDProcessor (line 182) | class UDProcessor(Processor):
    method __init__ (line 185) | def __init__(self, config, pipeline, device):
    method _set_up_model (line 199) | def _set_up_model(self, config, pipeline, device):
    method _set_up_final_config (line 202) | def _set_up_final_config(self, config):
    method mark_inactive (line 214) | def mark_inactive(self):
    method pretrain (line 220) | def pretrain(self):
    method trainer (line 224) | def trainer(self):
    method vocab (line 228) | def vocab(self):
    method filter_out_option (line 232) | def filter_out_option(option):
    method bulk_process (line 242) | def bulk_process(self, docs):
  class ProcessorRegisterException (line 262) | class ProcessorRegisterException(Exception):
    method __init__ (line 265) | def __init__(self, processor_class, expected_parent):
    method build_message (line 270) | def build_message(self):
    method __str__ (line 273) | def __str__(self):
  function register_processor (line 276) | def register_processor(name):
  function register_processor_variant (line 286) | def register_processor_variant(name, variant):

FILE: stanza/pipeline/sentiment_processor.py
  class SentimentProcessor (line 23) | class SentimentProcessor(UDProcessor):
    method _set_up_model (line 35) | def _set_up_model(self, config, pipeline, device):
    method _set_up_final_config (line 65) | def _set_up_final_config(self, config):
    method process (line 72) | def process(self, document):

FILE: stanza/pipeline/tokenize_processor.py
  class TokenizeProcessor (line 31) | class TokenizeProcessor(UDProcessor):
    method _set_up_model (line 40) | def _set_up_model(self, config, pipeline, device):
    method process_pre_tokenized_text (line 57) | def process_pre_tokenized_text(self, input_src):
    method process (line 82) | def process(self, document):
    method bulk_process (line 121) | def bulk_process(self, docs):

FILE: stanza/protobuf/__init__.py
  function parseFromDelimitedString (line 11) | def parseFromDelimitedString(obj, buf, offset=0):
  function writeToDelimitedString (line 27) | def writeToDelimitedString(obj, stream=None):
  function to_text (line 42) | def to_text(sentence):

FILE: stanza/resources/common.py
  class ResourcesFileNotFoundError (line 45) | class ResourcesFileNotFoundError(FileNotFoundError):
    method __init__ (line 46) | def __init__(self, resources_filepath):
  class UnknownLanguageError (line 50) | class UnknownLanguageError(ValueError):
    method __init__ (line 51) | def __init__(self, unknown):
  class UnknownProcessorError (line 55) | class UnknownProcessorError(ValueError):
    method __init__ (line 56) | def __init__(self, unknown):
  function ensure_dir (line 62) | def ensure_dir(path):
  function get_md5 (line 68) | def get_md5(path):
  function unzip (line 81) | def unzip(path, filename):
  function get_root_from_zipfile (line 89) | def get_root_from_zipfile(filename):
  function file_exists (line 98) | def file_exists(path, md5):
  function assert_file_exists (line 104) | def assert_file_exists(path, md5=None, alternate_md5=None):
  function download_file (line 115) | def download_file(url, path, proxies, raise_for_status=False):
  function request_file (line 138) | def request_file(url, path, proxies=None, md5=None, raise_for_status=Fal...
  function sort_processors (line 168) | def sort_processors(processor_list):
  function add_mwt (line 186) | def add_mwt(processors, resources, lang):
  function maintain_processor_list (line 201) | def maintain_processor_list(resources, lang, package, processors, allow_...
  function add_dependencies (line 299) | def add_dependencies(resources, lang, processor_list):
  function flatten_processor_list (line 327) | def flatten_processor_list(processor_list):
  function set_logging_level (line 350) | def set_logging_level(logging_level, verbose):
  function process_pipeline_parameters (line 376) | def process_pipeline_parameters(lang, model_dir, package, processors):
  function download_resources_json (line 441) | def download_resources_json(model_dir=DEFAULT_MODEL_DIR,
  function load_resources_json (line 468) | def load_resources_json(model_dir=DEFAULT_MODEL_DIR, resources_filepath=...
  function get_language_resources (line 480) | def get_language_resources(resources, lang):
  function list_available_languages (line 494) | def list_available_languages(model_dir=DEFAULT_MODEL_DIR,
  function expand_model_url (line 511) | def expand_model_url(resources, model_url):
  function download_models (line 517) | def download_models(download_list,
  function download (line 556) | def download(

FILE: stanza/resources/default_packages.py
  function build_default_pretrains (line 220) | def build_default_pretrains(default_treebanks):
  function known_nicknames (line 950) | def known_nicknames():

FILE: stanza/resources/installation.py
  function download_corenlp_models (line 34) | def download_corenlp_models(model, version, dir=DEFAULT_CORENLP_DIR, url...
  function install_corenlp (line 88) | def install_corenlp(dir=DEFAULT_CORENLP_DIR, url=DEFAULT_CORENLP_URL, lo...

FILE: stanza/resources/prepare_resources.py
  function parse_args (line 30) | def parse_args():
  function ensure_dir (line 73) | def ensure_dir(dir):
  function copy_file (line 77) | def copy_file(src, dst):
  function get_md5 (line 82) | def get_md5(path):
  function split_model_name (line 87) | def split_model_name(model):
  function split_package (line 105) | def split_package(package, default_use_charlm=True):
  function get_pretrain_package (line 129) | def get_pretrain_package(lang, package, model_pretrains, default_pretrai...
  function get_charlm_package (line 141) | def get_charlm_package(lang, package, model_charlms, default_charlms, de...
  function get_con_dependencies (line 152) | def get_con_dependencies(lang, package):
  function get_pos_charlm_package (line 166) | def get_pos_charlm_package(lang, package):
  function get_pos_dependencies (line 169) | def get_pos_dependencies(lang, package):
  function get_lemma_pretrain_package (line 183) | def get_lemma_pretrain_package(lang, package):
  function get_lemma_charlm_package (line 195) | def get_lemma_charlm_package(lang, package):
  function get_lemma_dependencies (line 198) | def get_lemma_dependencies(lang, package):
  function get_tokenizer_charlm_package (line 213) | def get_tokenizer_charlm_package(lang, package):
  function get_tokenizer_dependencies (line 216) | def get_tokenizer_dependencies(lang, package):
  function get_depparse_charlm_package (line 223) | def get_depparse_charlm_package(lang, package):
  function get_depparse_dependencies (line 226) | def get_depparse_dependencies(lang, package):
  function get_ner_charlm_package (line 240) | def get_ner_charlm_package(lang, package):
  function get_ner_pretrain_package (line 243) | def get_ner_pretrain_package(lang, package):
  function get_ner_dependencies (line 246) | def get_ner_dependencies(lang, package):
  function get_sentiment_dependencies (line 260) | def get_sentiment_dependencies(lang, package):
  function get_dependencies (line 279) | def get_dependencies(processor, lang, package):
  function process_dirs (line 302) | def process_dirs(args):
  function get_default_pos_package (line 341) | def get_default_pos_package(lang, ud_package):
  function get_default_depparse_package (line 349) | def get_default_depparse_package(lang, ud_package):
  function process_default_zips (line 357) | def process_default_zips(args):
  function get_default_processors (line 411) | def get_default_processors(resources, lang):
  function get_default_optional_processors (line 478) | def get_default_optional_processors(resources, lang):
  function update_processor_add_transformer (line 488) | def update_processor_add_transformer(resources, lang, current_processors...
  function get_default_accurate (line 498) | def get_default_accurate(resources, lang):
  function get_optional_accurate (line 536) | def get_optional_accurate(resources, lang):
  function get_default_fast (line 550) | def get_default_fast(resources, lang):
  function process_packages (line 578) | def process_packages(args):
  function process_lcode (line 674) | def process_lcode(args):
  function process_misc (line 702) | def process_misc(args):
  function main (line 713) | def main():

FILE: stanza/resources/print_charlm_depparse.py
  function list_depparse (line 12) | def list_depparse():

FILE: stanza/server/annotator.py
  class Annotator (line 11) | class Annotator(Process):
    method name (line 21) | def name(self):
    method requires (line 28) | def requires(self):
    method provides (line 36) | def provides(self):
    method annotate (line 45) | def annotate(self, ann):
    method properties (line 53) | def properties(self):
    class _Handler (line 64) | class _Handler(BaseHTTPRequestHandler):
      method __init__ (line 67) | def __init__(self, request, client_address, server):
      method do_GET (line 70) | def do_GET(self):
      method do_POST (line 87) | def do_POST(self):
    method __init__ (line 117) | def __init__(self, host="", port=8432):
    method run (line 125) | def run(self):

FILE: stanza/server/client.py
  function is_corenlp_lang (line 50) | def is_corenlp_lang(props_str):
  function validate_corenlp_props (line 59) | def validate_corenlp_props(properties=None, annotators=None, output_form...
  class AnnotationException (line 69) | class AnnotationException(Exception):
  class TimeoutException (line 74) | class TimeoutException(AnnotationException):
  class ShouldRetryException (line 79) | class ShouldRetryException(Exception):
  class PermanentlyFailedException (line 84) | class PermanentlyFailedException(Exception):
  class StartServer (line 88) | class StartServer(enum.Enum):
  function clean_props_file (line 94) | def clean_props_file(props_file):
  class RobustService (line 101) | class RobustService(object):
    method __init__ (line 105) | def __init__(self, start_cmd, stop_cmd, endpoint, stdout=None,
    method is_alive (line 121) | def is_alive(self):
    method start (line 129) | def start(self):
    method atexit_kill (line 161) | def atexit_kill(self):
    method stop (line 168) | def stop(self):
    method __enter__ (line 186) | def __enter__(self):
    method __exit__ (line 190) | def __exit__(self, _, __, ___):
    method ensure_alive (line 193) | def ensure_alive(self):
  function resolve_classpath (line 226) | def resolve_classpath(classpath=None):
  class CoreNLPClient (line 247) | class CoreNLPClient(RobustService):
    method __init__ (line 257) | def __init__(self, start_server=StartServer.FORCE_START,
    method _setup_client_defaults (line 369) | def _setup_client_defaults(self):
    method _setup_server_defaults (line 386) | def _setup_server_defaults(self):
    method _request (line 437) | def _request(self, buf, properties, reset_default=False, **kwargs):
    method annotate (line 476) | def annotate(self, text, annotators=None, output_format=None, properti...
    method update (line 548) | def update(self, doc, annotators=None, properties=None):
    method tokensregex (line 567) | def tokensregex(self, text, pattern, filter=False, to_words=False, ann...
    method semgrex (line 574) | def semgrex(self, text, pattern, filter=False, to_words=False, annotat...
    method fill_tree_proto (line 580) | def fill_tree_proto(self, tree, proto_tree):
    method tregex (line 587) | def tregex(self, text=None, pattern=None, filter=False, annotators=Non...
    method __regex (line 634) | def __regex(self, path, text, pattern, filter, annotators=None, proper...
    method scenegraph (line 696) | def scenegraph(self, text, properties=None):
  function read_corenlp_props (line 737) | def read_corenlp_props(props_path):
  function write_corenlp_props (line 751) | def write_corenlp_props(props_dict, file_path=None):
  function regex_matches_to_indexed_words (line 767) | def regex_matches_to_indexed_words(matches):

FILE: stanza/server/dependency_converter.py
  function send_converter_request (line 13) | def send_converter_request(request, classpath=None):
  function build_request (line 16) | def build_request(doc):
  function process_doc (line 25) | def process_doc(doc, classpath=None):
  function attach_dependencies (line 34) | def attach_dependencies(doc, response):
  class DependencyConverter (line 65) | class DependencyConverter(JavaProtobufContext):
    method __init__ (line 72) | def __init__(self, classpath=None):
    method process (line 75) | def process(self, doc):
  function main (line 84) | def main():

FILE: stanza/server/java_protobuf_requests.py
  function send_request (line 9) | def send_request(request, response_type, java_main, classpath=None):
  function add_tree_nodes (line 26) | def add_tree_nodes(proto_tree, tree, score):
  function build_tree (line 50) | def build_tree(tree, score):
  function from_tree (line 66) | def from_tree(proto_tree):
  function add_token (line 115) | def add_token(token_list, word, token):
  function add_sentence (line 168) | def add_sentence(request_sentences, sentence, num_tokens):
  function add_word_to_graph (line 180) | def add_word_to_graph(graph, word, sent_idx):
  function convert_networkx_graph (line 206) | def convert_networkx_graph(graph_proto, sentence, sent_idx):
  function features_to_string (line 253) | def features_to_string(features):
  function misc_space_pieces (line 260) | def misc_space_pieces(misc):
  function remove_space_misc (line 272) | def remove_space_misc(misc):
  function substitute_space_misc (line 284) | def substitute_space_misc(misc, space_misc):
  class JavaProtobufContext (line 317) | class JavaProtobufContext(object):
    method __init__ (line 321) | def __init__(self, classpath, build_response, java_main, extra_args=No...
    method open_pipe (line 331) | def open_pipe(self):
    method close_pipe (line 336) | def close_pipe(self):
    method __enter__ (line 342) | def __enter__(self):
    method __exit__ (line 346) | def __exit__(self, type, value, traceback):
    method process_request (line 349) | def process_request(self, request):

FILE: stanza/server/main.py
  function dictstr (line 18) | def dictstr(arg):
  function do_annotate (line 33) | def do_annotate(args):
  function main (line 51) | def main():

FILE: stanza/server/morphology.py
  function send_morphology_request (line 14) | def send_morphology_request(request):
  function build_request (line 17) | def build_request(words, xpos_tags):
  function process_text (line 31) | def process_text(words, xpos_tags):
  class Morphology (line 45) | class Morphology(JavaProtobufContext):
    method __init__ (line 54) | def __init__(self, classpath=None):
    method process (line 57) | def process(self, words, xpos_tags):
  function main (line 65) | def main():

FILE: stanza/server/parser_eval.py
  function build_request (line 18) | def build_request(treebank):
  function collate (line 45) | def collate(gold_treebank, predictions_treebank):
  class EvaluateParser (line 56) | class EvaluateParser(JavaProtobufContext):
    method __init__ (line 63) | def __init__(self, classpath=None, kbest=None, silent=False):
    method process (line 74) | def process(self, treebank):
  function main (line 79) | def main():

FILE: stanza/server/semgrex.py
  function send_semgrex_request (line 50) | def send_semgrex_request(request):
  function build_request (line 53) | def build_request(doc, semgrex_patterns, enhanced=False):
  function process_doc (line 77) | def process_doc(doc, *semgrex_patterns, enhanced=False):
  class Semgrex (line 87) | class Semgrex(JavaProtobufContext):
    method __init__ (line 94) | def __init__(self, classpath=None):
    method process (line 97) | def process(self, doc, *semgrex_patterns):
  function annotate_doc (line 104) | def annotate_doc(doc, semgrex_result, semgrex_patterns, matches_only, ex...
  function main (line 180) | def main():

FILE: stanza/server/ssurgeon.py
  function parse_ssurgeon_edits (line 30) | def parse_ssurgeon_edits(ssurgeon_text):
  function read_ssurgeon_edits (line 47) | def read_ssurgeon_edits(edit_file):
  function send_ssurgeon_request (line 51) | def send_ssurgeon_request(request):
  function build_request (line 54) | def build_request(doc, ssurgeon_edits):
  function build_request_one_operation (line 84) | def build_request_one_operation(doc, semgrex_pattern, ssurgeon_edits, ss...
  function process_doc (line 88) | def process_doc(doc, ssurgeon_edits):
  function process_doc_one_operation (line 98) | def process_doc_one_operation(doc, semgrex_pattern, ssurgeon_edits, ssur...
  function build_word_entry (line 103) | def build_word_entry(word_index, graph_word):
  function convert_response_to_doc (line 129) | def convert_response_to_doc(doc, semgrex_response, add_missing_text):
  class Ssurgeon (line 213) | class Ssurgeon(java_protobuf_requests.JavaProtobufContext):
    method __init__ (line 220) | def __init__(self, classpath=None):
    method process (line 223) | def process(self, doc, ssurgeon_edits):
    method process_one_operation (line 230) | def process_one_operation(self, doc, semgrex_pattern, ssurgeon_edits, ...
  function main (line 250) | def main():

FILE: stanza/server/tokensregex.py
  function send_tokensregex_request (line 15) | def send_tokensregex_request(request):
  function process_doc (line 19) | def process_doc(doc, *patterns):
  function main (line 33) | def main():

FILE: stanza/server/tsurgeon.py
  function send_tsurgeon_request (line 24) | def send_tsurgeon_request(request):
  function build_request (line 28) | def build_request(trees, operations):
  function process_trees (line 53) | def process_trees(trees, *operations):
  class Tsurgeon (line 65) | class Tsurgeon(JavaProtobufContext):
    method __init__ (line 72) | def __init__(self, classpath=None):
    method process (line 75) | def process(self, trees, *operations):
  function main (line 81) | def main():

FILE: stanza/server/ud_enhancer.py
  function build_enhancer_request (line 9) | def build_enhancer_request(doc, language, pronouns_pattern):
  function process_doc (line 49) | def process_doc(doc, language=None, pronouns_pattern=None):
  class UniversalEnhancer (line 53) | class UniversalEnhancer(JavaProtobufContext):
    method __init__ (line 60) | def __init__(self, language=None, pronouns_pattern=None, classpath=None):
    method process (line 67) | def process(self, doc):
  function main (line 71) | def main():

FILE: stanza/tests/__init__.py
  function safe_rm (line 82) | def safe_rm(path_to_rm):
  function compare_ignoring_whitespace (line 109) | def compare_ignoring_whitespace(predicted, expected):

FILE: stanza/tests/classifiers/test_classifier.py
  function fake_embeddings (line 24) | def fake_embeddings(tmp_path_factory):
  class TestClassifier (line 48) | class TestClassifier:
    method build_model (line 49) | def build_model(self, tmp_path, fake_embeddings, train_file, dev_file,...
    method run_training (line 74) | def run_training(self, tmp_path, fake_embeddings, train_file, dev_file...
    method test_build_model (line 88) | def test_build_model(self, tmp_path, fake_embeddings, train_file, dev_...
    method test_save_load (line 94) | def test_save_load(self, tmp_path, fake_embeddings, train_file, dev_fi...
    method test_train_basic (line 108) | def test_train_basic(self, tmp_path, fake_embeddings, train_file, dev_...
    method test_train_bilstm (line 111) | def test_train_bilstm(self, tmp_path, fake_embeddings, train_file, dev...
    method test_train_maxpool_width (line 121) | def test_train_maxpool_width(self, tmp_path, fake_embeddings, train_fi...
    method test_train_conv_2d (line 137) | def test_train_conv_2d(self, tmp_path, fake_embeddings, train_file, de...
    method test_train_filter_channels (line 147) | def test_train_filter_channels(self, tmp_path, fake_embeddings, train_...
    method test_train_bert (line 157) | def test_train_bert(self, tmp_path, fake_embeddings, train_file, dev_f...
    method test_finetune_bert (line 170) | def test_finetune_bert(self, tmp_path, fake_embeddings, train_file, de...
    method test_finetune_bert_layers (line 183) | def test_finetune_bert_layers(self, tmp_path, fake_embeddings, train_f...
    method test_finetune_peft (line 239) | def test_finetune_peft(self, tmp_path, fake_embeddings, train_file, de...
    method test_finetune_peft_restart (line 267) | def test_finetune_peft_restart(self, tmp_path, fake_embeddings, train_...

FILE: stanza/tests/classifiers/test_constituency_classifier.py
  class TestConstituencyClassifier (line 17) | class TestConstituencyClassifier:
    method constituency_model (line 19) | def constituency_model(self, fake_embeddings, tmp_path_factory):
    method build_model (line 27) | def build_model(self, tmp_path, constituency_model, fake_embeddings, t...
    method run_training (line 50) | def run_training(self, tmp_path, constituency_model, fake_embeddings, ...
    method test_build_model (line 63) | def test_build_model(self, tmp_path, constituency_model, fake_embeddin...
    method test_save_load (line 69) | def test_save_load(self, tmp_path, constituency_model, fake_embeddings...
    method test_train_basic (line 81) | def test_train_basic(self, tmp_path, constituency_model, fake_embeddin...
    method test_train_pipeline (line 84) | def test_train_pipeline(self, tmp_path, constituency_model, fake_embed...
    method test_train_all_words (line 112) | def test_train_all_words(self, tmp_path, constituency_model, fake_embe...
    method test_train_top_layer (line 117) | def test_train_top_layer(self, tmp_path, constituency_model, fake_embe...
    method test_train_attn (line 122) | def test_train_attn(self, tmp_path, constituency_model, fake_embedding...

FILE: stanza/tests/classifiers/test_data.py
  function train_file (line 34) | def train_file(tmp_path_factory):
  function dev_file (line 42) | def dev_file(tmp_path_factory):
  function test_file (line 50) | def test_file(tmp_path_factory):
  function train_file_with_trees (line 58) | def train_file_with_trees(tmp_path_factory):
  function dev_file_with_trees (line 66) | def dev_file_with_trees(tmp_path_factory):
  class TestClassifierData (line 73) | class TestClassifierData:
    method test_read_data (line 74) | def test_read_data(self, train_file):
    method test_read_data_with_trees (line 81) | def test_read_data_with_trees(self, train_file, train_file_with_trees):
    method test_dataset_vocab (line 93) | def test_dataset_vocab(self, train_file):
    method test_dataset_labels (line 102) | def test_dataset_labels(self, train_file):
    method test_sort_by_length (line 110) | def test_sort_by_length(self, train_file):
    method test_check_labels (line 120) | def test_check_labels(self, train_file):

FILE: stanza/tests/classifiers/test_process_utils.py
  function test_write_list (line 19) | def test_write_list(tmp_path, train_file):
  function test_write_dataset (line 31) | def test_write_dataset(tmp_path, train_file, dev_file, test_file):
  function test_read_snippets (line 46) | def test_read_snippets(tmp_path):
  function test_read_snippets_two_columns (line 64) | def test_read_snippets_two_columns(tmp_path):

FILE: stanza/tests/common/test_bert_embedding.py
  function tiny_bert (line 11) | def tiny_bert():
  function test_load_bert (line 15) | def test_load_bert(tiny_bert):
  function test_run_bert (line 21) | def test_run_bert(tiny_bert):
  function test_run_bert_empty_word (line 26) | def test_run_bert_empty_word(tiny_bert):

FILE: stanza/tests/common/test_char_model.py
  class TestCharModel (line 30) | class TestCharModel:
    method test_single_file_vocab (line 31) | def test_single_file_vocab(self):
    method test_single_file_xz_vocab (line 42) | def test_single_file_xz_vocab(self):
    method test_single_file_dir_vocab (line 53) | def test_single_file_dir_vocab(self):
    method test_multiple_files_vocab (line 64) | def test_multiple_files_vocab(self):
    method test_cutoff_vocab (line 80) | def test_cutoff_vocab(self):
    method test_build_model (line 98) | def test_build_model(self):
    method english_forward (line 155) | def english_forward(self):
    method english_backward (line 165) | def english_backward(self):
    method test_load_model (line 174) | def test_load_model(self, english_forward, english_backward):
    method test_save_load_model (line 181) | def test_save_load_model(self, english_forward, english_backward):

FILE: stanza/tests/common/test_chuliu_edmonds.py
  function test_tarjan_basic (line 14) | def test_tarjan_basic():
  function test_tarjan_cycle (line 23) | def test_tarjan_cycle():

FILE: stanza/tests/common/test_common_data.py
  function test_augment_ratio (line 9) | def test_augment_ratio():
  function test_augment_punct (line 27) | def test_augment_punct():

FILE: stanza/tests/common/test_confusion.py
  function simple_confusion (line 13) | def simple_confusion():
  function short_confusion (line 23) | def short_confusion():
  function test_simple_output (line 62) | def test_simple_output(simple_confusion):
  function test_short_output (line 65) | def test_short_output(short_confusion):
  function test_hide_blank_short_output (line 68) | def test_hide_blank_short_output(short_confusion):
  function test_macro_f1 (line 71) | def test_macro_f1(simple_confusion, short_confusion):
  function test_weighted_f1 (line 75) | def test_weighted_f1(simple_confusion, short_confusion):

FILE: stanza/tests/common/test_constant.py
  function test_treebank (line 15) | def test_treebank():
  function test_lang_to_langcode (line 41) | def test_lang_to_langcode():
  function test_right_to_left (line 48) | def test_right_to_left():
  function test_two_to_three (line 55) | def test_two_to_three():
  function test_langlower (line 62) | def test_langlower():

FILE: stanza/tests/common/test_data_conversion.py
  function test_conll_to_dict (line 41) | def test_conll_to_dict():
  function test_dict_to_conll (line 47) | def test_dict_to_conll():
  function test_dict_to_doc_and_doc_to_dict (line 53) | def test_dict_to_doc_and_doc_to_dict():
  function check_russian_doc (line 101) | def check_russian_doc(doc):
  function test_write_russian_doc (line 131) | def test_write_russian_doc(tmp_path):
  function test_write_to_io (line 171) | def test_write_to_io():
  function test_write_doc2conll_append (line 179) | def test_write_doc2conll_append(tmp_path):
  function test_doc_with_comments (line 190) | def test_doc_with_comments():
  function test_unusual_misc (line 197) | def test_unusual_misc():
  function test_file (line 214) | def test_file():
  function test_zip_file (line 225) | def test_zip_file():
  function test_simple_ner_conversion (line 250) | def test_simple_ner_conversion():
  function test_mwt_ner_conversion (line 283) | def test_mwt_ner_conversion():
  function test_no_offsets_output (line 343) | def test_no_offsets_output():
  function test_deps_conversion (line 376) | def test_deps_conversion():
  function test_empty_deps_conversion (line 413) | def test_empty_deps_conversion():
  function test_empty_deps_at_end_conversion (line 419) | def test_empty_deps_at_end_conversion():
  function check_empty_deps_conversion (line 425) | def check_empty_deps_conversion(input_str, expected_words):
  function test_read_doc_id (line 463) | def test_read_doc_id():
  function test_read_dependency_errors (line 480) | def test_read_dependency_errors():
  function test_read_multiple_doc_ids (line 537) | def test_read_multiple_doc_ids():
  function test_convert_dict (line 560) | def test_convert_dict():
  function test_line_numbers (line 571) | def test_line_numbers():
  function test_speaker (line 601) | def test_speaker():

FILE: stanza/tests/common/test_data_objects.py
  function nlp_pipeline (line 20) | def nlp_pipeline():
  function test_readonly (line 24) | def test_readonly(nlp_pipeline):
  function test_getter (line 32) | def test_getter(nlp_pipeline):
  function test_setter_getter (line 39) | def test_setter_getter(nlp_pipeline):
  function test_backpointer (line 55) | def test_backpointer(nlp_pipeline):

FILE: stanza/tests/common/test_doc.py
  function sentences_dict (line 10) | def sentences_dict():
  function doc (line 18) | def doc(sentences_dict):
  function test_basic_values (line 22) | def test_basic_values(doc, sentences_dict):
  function test_set_sentence (line 34) | def test_set_sentence(doc):
  function test_set_tokens (line 45) | def test_set_tokens(doc):
  function test_constituency_comment (line 58) | def test_constituency_comment(doc):
  function test_sentiment_comment (line 87) | def test_sentiment_comment(doc):
  function test_sent_id_comment (line 116) | def test_sent_id_comment(doc):
  function test_doc_id_comment (line 139) | def test_doc_id_comment(doc):
  function pipeline (line 156) | def pipeline():
  function test_serialized (line 159) | def test_serialized(pipeline):

FILE: stanza/tests/common/test_dropout.py
  function test_word_dropout (line 10) | def test_word_dropout():

FILE: stanza/tests/common/test_foundation_cache.py
  function test_charlm_cache (line 14) | def test_charlm_cache():

FILE: stanza/tests/common/test_pretrain.py
  function check_vocab (line 14) | def check_vocab(vocab):
  function check_embedding (line 21) | def check_embedding(emb, unk=False):
  function check_pretrain (line 33) | def check_pretrain(pt):
  function test_text_pretrain (line 37) | def test_text_pretrain():
  function test_xz_pretrain (line 41) | def test_xz_pretrain():
  function test_gz_pretrain (line 45) | def test_gz_pretrain():
  function test_zip_pretrain (line 49) | def test_zip_pretrain():
  function test_csv_pretrain (line 53) | def test_csv_pretrain():
  function test_resave_pretrain (line 57) | def test_resave_pretrain():
  function test_whitespace (line 86) | def test_whitespace():
  function test_no_header (line 112) | def test_no_header():
  function test_no_header (line 130) | def test_no_header():

FILE: stanza/tests/common/test_relative_attn.py
  function test_attn (line 10) | def test_attn():
  function test_shorter_sequence (line 22) | def test_shorter_sequence():
  function test_reverse (line 35) | def test_reverse():

FILE: stanza/tests/common/test_short_name_to_treebank.py
  function test_short_name (line 8) | def test_short_name():
  function test_canonical_name (line 11) | def test_canonical_name():

FILE: stanza/tests/common/test_utils.py
  function test_wordvec_not_found (line 13) | def test_wordvec_not_found():
  function test_word2vec_xz (line 22) | def test_word2vec_xz():
  function test_fasttext_txt (line 40) | def test_fasttext_txt():
  function test_wordvec_type (line 58) | def test_wordvec_type():
  function test_sort_with_indices (line 80) | def test_sort_with_indices():
  function test_empty_sort_with_indices (line 89) | def test_empty_sort_with_indices():
  function test_split_into_batches (line 98) | def test_split_into_batches():
  function test_find_missing_tags (line 130) | def test_find_missing_tags():
  function test_open_read_text (line 136) | def test_open_read_text():
  function test_checkpoint_name (line 181) | def test_checkpoint_name():
  function test_punct_simplification (line 195) | def test_punct_simplification():

FILE: stanza/tests/constituency/test_convert_arboretum.py
  function test_projective_example (line 163) | def test_projective_example():
  function test_not_fix_example (line 197) | def test_not_fix_example():
  function test_fix_proj_example (line 214) | def test_fix_proj_example():

FILE: stanza/tests/constituency/test_convert_it_vit.py
  function test_process_mwts (line 168) | def test_process_mwts():
  function test_raw_tree (line 178) | def test_raw_tree():
  function test_update_mwts (line 190) | def test_update_mwts():
  function test_read_percent (line 219) | def test_read_percent():

FILE: stanza/tests/constituency/test_convert_starlang.py
  function test_read_tree (line 16) | def test_read_tree():
  function test_missing_word (line 23) | def test_missing_word():
  function test_bad_label (line 31) | def test_bad_label():

FILE: stanza/tests/constituency/test_ensemble.py
  function pipeline (line 21) | def pipeline():
  function saved_ensemble (line 25) | def saved_ensemble(tmp_path_factory, pipeline):
  function check_basic_predictions (line 44) | def check_basic_predictions(trees):
  function test_ensemble_inference (line 54) | def test_ensemble_inference(pipeline):
  function test_ensemble_save (line 71) | def test_ensemble_save(saved_ensemble):
  function test_ensemble_save_load (line 79) | def test_ensemble_save_load(pipeline, saved_ensemble):
  function test_parse_text (line 86) | def test_parse_text(tmp_path, pipeline, saved_ensemble):
  function test_pipeline (line 105) | def test_pipeline(saved_ensemble):

FILE: stanza/tests/constituency/test_in_order_compound_oracle.py
  function trees (line 35) | def trees():
  function gold_sequences (line 43) | def gold_sequences(trees):
  function get_repairs (line 47) | def get_repairs(gold_sequence, wrong_transition, repair_fn):
  function test_fix_shift_close (line 59) | def test_fix_shift_close():
  function test_fix_open_close (line 80) | def test_fix_open_close():

FILE: stanza/tests/constituency/test_in_order_oracle.py
  function get_repairs (line 119) | def get_repairs(gold_sequence, wrong_transition, repair_fn):
  function unary_trees (line 132) | def unary_trees():
  function gold_sequences (line 140) | def gold_sequences(unary_trees):
  function wide_trees (line 145) | def wide_trees():
  function test_wrong_open_root (line 152) | def test_wrong_open_root(gold_sequences):
  function test_missed_unary (line 177) | def test_missed_unary(gold_sequences):
  function test_open_with_stuff (line 222) | def test_open_with_stuff(unary_trees, gold_sequences):
  function test_general_open (line 240) | def test_general_open(gold_sequences):
  function test_missed_unary (line 252) | def test_missed_unary(unary_trees, gold_sequences):
  function test_open_shift (line 285) | def test_open_shift(unary_trees, gold_sequences):
  function test_open_close (line 314) | def test_open_close(unary_trees, gold_sequences):
  function test_shift_close (line 347) | def test_shift_close(unary_trees, gold_sequences):
  function test_close_open_shift_nested (line 385) | def test_close_open_shift_nested(unary_trees, gold_sequences):
  function check_repairs (line 403) | def check_repairs(trees, gold_sequences, expected_trees, transition, rep...
  function test_close_open_shift_unambiguous (line 428) | def test_close_open_shift_unambiguous(unary_trees, gold_sequences):
  function test_close_open_shift_ambiguous_early (line 438) | def test_close_open_shift_ambiguous_early(unary_trees, gold_sequences):
  function test_close_open_shift_ambiguous_late (line 448) | def test_close_open_shift_ambiguous_late(unary_trees, gold_sequences):
  function test_close_shift_shift (line 459) | def test_close_shift_shift(unary_trees, wide_trees):
  function test_close_shift_shift_early (line 480) | def test_close_shift_shift_early(unary_trees, wide_trees):
  function test_close_shift_shift_late (line 502) | def test_close_shift_shift_late(unary_trees, wide_trees):

FILE: stanza/tests/constituency/test_lstm_model.py
  function pretrain_file (line 16) | def pretrain_file():
  function build_model (line 19) | def build_model(pretrain_file, *args):
  function unary_model (line 26) | def unary_model(pretrain_file):
  function test_initial_state (line 29) | def test_initial_state(unary_model):
  function test_shift (line 32) | def test_shift(pretrain_file):
  function test_unary (line 38) | def test_unary(unary_model):
  function test_unary_requires_root (line 41) | def test_unary_requires_root(unary_model):
  function test_open (line 44) | def test_open(unary_model):
  function test_compound_open (line 47) | def test_compound_open(pretrain_file):
  function test_in_order_open (line 51) | def test_in_order_open(pretrain_file):
  function test_close (line 55) | def test_close(unary_model):
  function run_forward_checks (line 58) | def run_forward_checks(model, num_states=1):
  function test_unary_forward (line 96) | def test_unary_forward(unary_model):
  function test_lstm_forward (line 104) | def test_lstm_forward(pretrain_file):
  function test_lstm_layers (line 109) | def test_lstm_layers(pretrain_file):
  function test_multiple_output_forward (line 117) | def test_multiple_output_forward(pretrain_file):
  function test_no_tag_embedding_forward (line 130) | def test_no_tag_embedding_forward(pretrain_file):
  function test_forward_combined_dummy (line 140) | def test_forward_combined_dummy(pretrain_file):
  function test_nonlinearity_init (line 150) | def test_nonlinearity_init(pretrain_file):
  function test_forward_charlm (line 163) | def test_forward_charlm(pretrain_file):
  function test_forward_bert (line 181) | def test_forward_bert(pretrain_file):
  function test_forward_xlnet (line 191) | def test_forward_xlnet(pretrain_file):
  function test_forward_sentence_boundaries (line 201) | def test_forward_sentence_boundaries(pretrain_file):
  function test_forward_constituency_composition (line 214) | def test_forward_constituency_composition(pretrain_file):
  function test_forward_key_position (line 248) | def test_forward_key_position(pretrain_file):
  function test_forward_attn_hidden_size (line 265) | def test_forward_attn_hidden_size(pretrain_file):
  function test_forward_partitioned_attention (line 278) | def test_forward_partitioned_attention(pretrain_file):
  function test_forward_labeled_attention (line 288) | def test_forward_labeled_attention(pretrain_file):
  function test_lattn_partitioned (line 301) | def test_lattn_partitioned(pretrain_file):
  function test_lattn_projection (line 309) | def test_lattn_projection(pretrain_file):
  function test_forward_timing_choices (line 329) | def test_forward_timing_choices(pretrain_file):
  function test_transition_stack (line 339) | def test_transition_stack(pretrain_file):
  function test_constituent_stack (line 358) | def test_constituent_stack(pretrain_file):
  function test_different_transition_sizes (line 377) | def test_different_transition_sizes(pretrain_file):
  function test_relative_attention (line 417) | def test_relative_attention(pretrain_file):
  function test_relative_attention_cat (line 421) | def test_relative_attention_cat(pretrain_file):
  function test_relative_attention_directional (line 431) | def test_relative_attention_directional(pretrain_file):
  function test_relative_attention_sinks (line 438) | def test_relative_attention_sinks(pretrain_file):
  function test_relative_attention_cat_sinks (line 448) | def test_relative_attention_cat_sinks(pretrain_file):
  function test_relative_attention_endpoint_sinks (line 458) | def test_relative_attention_endpoint_sinks(pretrain_file):
  function test_lstm_tree_forward (line 468) | def test_lstm_tree_forward(pretrain_file):
  function test_lstm_tree_cx_forward (line 479) | def test_lstm_tree_cx_forward(pretrain_file):
  function test_maxout (line 490) | def test_maxout(pretrain_file):
  function check_structure_test (line 508) | def check_structure_test(pretrain_file, args1, args2):
  function test_copy_with_new_structure_same (line 554) | def test_copy_with_new_structure_same(pretrain_file):
  function test_copy_with_new_structure_untied (line 562) | def test_copy_with_new_structure_untied(pretrain_file):
  function test_copy_with_new_structure_pattn (line 570) | def test_copy_with_new_structure_pattn(pretrain_file):
  function test_copy_with_new_structure_both (line 575) | def test_copy_with_new_structure_both(pretrain_file):
  function test_copy_with_new_structure_lattn (line 580) | def test_copy_with_new_structure_lattn(pretrain_file):
  function test_parse_tagged_words (line 585) | def test_parse_tagged_words(pretrain_file):

FILE: stanza/tests/constituency/test_parse_transitions.py
  function build_initial_state (line 11) | def build_initial_state(model, num_states=1):
  function test_initial_state (line 21) | def test_initial_state(model=None):
  function test_shift (line 36) | def test_shift(model=None):
  function test_initial_unary (line 82) | def test_initial_unary(model=None):
  function test_unary (line 96) | def test_unary(model=None):
  function test_unary_requires_root (line 120) | def test_unary_requires_root(model=None):
  function test_open (line 154) | def test_open(model=None):
  function test_compound_open (line 183) | def test_compound_open(model=None):
  function test_in_order_open (line 209) | def test_in_order_open(model=None):
  function test_too_many_unaries_close (line 256) | def test_too_many_unaries_close():
  function test_too_many_unaries_open (line 282) | def test_too_many_unaries_open():
  function test_close (line 313) | def test_close(model=None):
  function test_in_order_compound_finalize (line 367) | def test_in_order_compound_finalize(model=None):
  function test_hashes (line 408) | def test_hashes():
  function test_sort (line 455) | def test_sort():
  function test_check_transitions (line 470) | def test_check_transitions():

FILE: stanza/tests/constituency/test_parse_tree.py
  function test_leaf_preterminal (line 10) | def test_leaf_preterminal():
  function test_yield_preterminals (line 30) | def test_yield_preterminals():
  function test_depth (line 38) | def test_depth():
  function test_unique_labels (line 44) | def test_unique_labels():
  function test_unique_tags (line 58) | def test_unique_tags():
  function test_unique_words (line 71) | def test_unique_words():
  function test_rare_words (line 83) | def test_rare_words():
  function test_common_words (line 95) | def test_common_words():
  function test_root_labels (line 107) | def test_root_labels():
  function test_prune_none (line 122) | def test_prune_none():
  function test_simplify_labels (line 136) | def test_simplify_labels():
  function test_remap_constituent_labels (line 144) | def test_remap_constituent_labels():
  function test_remap_constituent_words (line 154) | def test_remap_constituent_words():
  function test_replace_words (line 164) | def test_replace_words():
  function test_compound_constituents (line 176) | def test_compound_constituents():
  function test_equals (line 190) | def test_equals():
  function test_count_unaries (line 290) | def test_count_unaries():
  function test_str_bracket_labels (line 299) | def test_str_bracket_labels():
  function test_all_leaves_are_preterminals (line 307) | def test_all_leaves_are_preterminals():
  function test_latex (line 318) | def test_latex():
  function test_pretty_print (line 328) | def test_pretty_print():
  function test_reverse (line 364) | def test_reverse():

FILE: stanza/tests/constituency/test_positional_encoding.py
  function test_positional_encoding (line 13) | def test_positional_encoding():
  function test_resize (line 19) | def test_resize():
  function test_arange (line 25) | def test_arange():
  function test_add (line 31) | def test_add():

FILE: stanza/tests/constituency/test_selftrain_vi_quad.py
  function test_read_file (line 21) | def test_read_file():

FILE: stanza/tests/constituency/test_text_processing.py
  function pipeline (line 20) | def pipeline():
  function test_read_tokenized_file (line 23) | def test_read_tokenized_file(tmp_path):
  function test_parse_tokenized_sentences (line 32) | def test_parse_tokenized_sentences(pipeline):
  function test_parse_text (line 47) | def test_parse_text(tmp_path, pipeline):
  function test_parse_dir (line 64) | def test_parse_dir(tmp_path, pipeline):
  function test_parse_text (line 89) | def test_parse_text(tmp_path, pipeline):

FILE: stanza/tests/constituency/test_top_down_oracle.py
  function get_single_repair (line 26) | def get_single_repair(gold_sequence, wrong_transition, repair_fn, idx, *...
  function build_state (line 29) | def build_state(model, tree, num_transitions):
  function test_fix_open_shift (line 38) | def test_fix_open_shift():
  function test_fix_open_shift_observed_error (line 56) | def test_fix_open_shift_observed_error():
  function test_open_open_ambiguous_unary_fix (line 100) | def test_open_open_ambiguous_unary_fix():
  function test_open_open_ambiguous_later_fix (line 113) | def test_open_open_ambiguous_later_fix():
  function test_fix_close_shift_ambiguous_immediate (line 153) | def test_fix_close_shift_ambiguous_immediate():
  function test_fix_close_shift_ambiguous_later (line 168) | def test_fix_close_shift_ambiguous_later():
  function test_oracle_with_optional_level (line 181) | def test_oracle_with_optional_level():
  function test_fix_close_shift (line 207) | def test_fix_close_shift():
  function test_fix_close_shift_deeper_tree (line 233) | def test_fix_close_shift_deeper_tree():
  function test_fix_close_shift_open_tree (line 251) | def test_fix_close_shift_open_tree():
  function test_fix_close_open (line 290) | def test_fix_close_open():
  function test_fix_close_open_invalid (line 308) | def test_fix_close_open_invalid():
  function test_fix_close_open_ambiguous_immediate (line 322) | def test_fix_close_open_ambiguous_immediate():
  function test_fix_close_open_ambiguous_later (line 349) | def test_fix_close_open_ambiguous_later():
  function test_shift_close (line 387) | def test_shift_close():
  function test_shift_open_ambiguous_unary (line 413) | def test_shift_open_ambiguous_unary():
  function test_shift_open_ambiguous_later (line 429) | def test_shift_open_ambiguous_later():

FILE: stanza/tests/constituency/test_trainer.py
  function build_trainer (line 61) | def build_trainer(wordvec_pretrain_file, *args, treebank=TREEBANK):
  class TestTrainer (line 78) | class TestTrainer:
    method wordvec_pretrain_file (line 80) | def wordvec_pretrain_file(self):
    method tiny_random_xlnet (line 84) | def tiny_random_xlnet(self, tmp_path_factory):
    method tiny_random_bart (line 103) | def tiny_random_bart(self, tmp_path_factory):
    method test_initial_model (line 116) | def test_initial_model(self, wordvec_pretrain_file):
    method test_save_load_model (line 124) | def test_save_load_model(self, wordvec_pretrain_file):
    method test_relearn_structure (line 147) | def test_relearn_structure(self, wordvec_pretrain_file):
    method write_treebanks (line 170) | def write_treebanks(self, tmpdirname):
    method training_args (line 182) | def training_args(self, wordvec_pretrain_file, tmpdirname, train_treeb...
    method run_train_test (line 196) | def run_train_test(self, wordvec_pretrain_file, tmpdirname, num_epochs...
    method test_train (line 250) | def test_train(self, wordvec_pretrain_file):
    method test_early_dropout (line 257) | def test_early_dropout(self, wordvec_pretrain_file):
    method test_train_silver (line 280) | def test_train_silver(self, wordvec_pretrain_file):
    method test_train_checkpoint (line 291) | def test_train_checkpoint(self, wordvec_pretrain_file):
    method run_multistage_tests (line 317) | def run_multistage_tests(self, wordvec_pretrain_file, tmpdirname, use_...
    method test_multistage_lattn (line 348) | def test_multistage_lattn(self, wordvec_pretrain_file):
    method test_multistage_no_lattn (line 357) | def test_multistage_no_lattn(self, wordvec_pretrain_file):
    method test_multistage_optimizer (line 366) | def test_multistage_optimizer(self, wordvec_pretrain_file):
    method test_grad_clip_hooks (line 393) | def test_grad_clip_hooks(self, wordvec_pretrain_file):
    method test_analyze_trees (line 401) | def test_analyze_trees(self, wordvec_pretrain_file):
    method bert_weights_allclose (line 432) | def bert_weights_allclose(self, bert_model, parser_model):
    method frozen_transformer_test (line 443) | def frozen_transformer_test(self, wordvec_pretrain_file, transformer_n...
    method test_bert_frozen (line 462) | def test_bert_frozen(self, wordvec_pretrain_file):
    method test_xlnet_frozen (line 468) | def test_xlnet_frozen(self, wordvec_pretrain_file, tiny_random_xlnet):
    method test_bart_frozen (line 474) | def test_bart_frozen(self, wordvec_pretrain_file, tiny_random_bart):
    method test_bert_finetune_one_epoch (line 480) | def test_bert_finetune_one_epoch(self, wordvec_pretrain_file):
    method finetune_transformer_test (line 512) | def finetune_transformer_test(self, wordvec_pretrain_file, transformer...
    method test_bert_finetune (line 532) | def test_bert_finetune(self, wordvec_pretrain_file):
    method test_xlnet_finetune (line 538) | def test_xlnet_finetune(self, wordvec_pretrain_file, tiny_random_xlnet):
    method test_stage1_bert_finetune (line 544) | def test_stage1_bert_finetune(self, wordvec_pretrain_file):
    method one_layer_finetune_transformer_test (line 582) | def one_layer_finetune_transformer_test(self, wordvec_pretrain_file, t...
    method test_bert_finetune_one_layer (line 608) | def test_bert_finetune_one_layer(self, wordvec_pretrain_file):
    method test_xlnet_finetune_one_layer (line 611) | def test_xlnet_finetune_one_layer(self, wordvec_pretrain_file, tiny_ra...
    method test_peft_finetune (line 614) | def test_peft_finetune(self, tmp_path, wordvec_pretrain_file):
    method test_peft_twostage_finetune (line 619) | def test_peft_twostage_finetune(self, wordvec_pretrain_file):

FILE: stanza/tests/constituency/test_transformer_tree_stack.py
  function test_initial_state (line 9) | def test_initial_state():
  function test_output (line 20) | def test_output():
  function test_push_state_single (line 30) | def test_push_state_single():
  function test_push_state_same_length (line 47) | def test_push_state_same_length():
  function test_push_state_different_length (line 69) | def test_push_state_different_length():
  function test_mask (line 89) | def test_mask():
  function test_position (line 121) | def test_position():
  function test_length_limit (line 139) | def test_length_limit():
  function test_two_heads (line 171) | def test_two_heads():

FILE: stanza/tests/constituency/test_transition_sequence.py
  function reconstruct_tree (line 13) | def reconstruct_tree(tree, sequence, transition_scheme=TransitionScheme....
  function check_reproduce_tree (line 32) | def check_reproduce_tree(transition_scheme):
  function test_top_down_unary (line 61) | def test_top_down_unary():
  function test_top_down_no_unary (line 64) | def test_top_down_no_unary():
  function test_in_order (line 67) | def test_in_order():
  function test_in_order_compound (line 70) | def test_in_order_compound():
  function test_in_order_unary (line 73) | def test_in_order_unary():
  function test_all_transitions (line 76) | def test_all_transitions():
  function test_all_transitions_no_unary (line 86) | def test_all_transitions_no_unary():
  function test_top_down_compound_unary (line 95) | def test_top_down_compound_unary():
  function test_chinese_tree (line 116) | def test_chinese_tree():
  function test_chinese_tree_reversed (line 131) | def test_chinese_tree_reversed():

FILE: stanza/tests/constituency/test_tree_reader.py
  function test_simple (line 9) | def test_simple():
  function test_newlines (line 23) | def test_newlines():
  function test_parens (line 31) | def test_parens():
  function test_complicated (line 47) | def test_complicated():
  function test_one_word (line 64) | def test_one_word():
  function test_missing_close_parens (line 80) | def test_missing_close_parens():
  function test_mixed_tree (line 91) | def test_mixed_tree():
  function test_unlabeled_tree (line 105) | def test_unlabeled_tree():

FILE: stanza/tests/constituency/test_tree_stack.py
  function test_simple (line 9) | def test_simple():
  function test_iter (line 20) | def test_iter():
  function test_str (line 28) | def test_str():
  function test_len (line 35) | def test_len():
  function test_long_len (line 43) | def test_long_len():

FILE: stanza/tests/constituency/test_utils.py
  function pipeline (line 13) | def pipeline():
  function test_xpos_retag (line 18) | def test_xpos_retag(pipeline):
  function test_upos_retag (line 32) | def test_upos_retag(pipeline):
  function test_replace_tags (line 45) | def test_replace_tags():

FILE: stanza/tests/constituency/test_vietnamese.py
  function test_read_vi_tree (line 33) | def test_read_vi_tree():
  function test_vi_embedding (line 55) | def test_vi_embedding():
  function test_space_formatting (line 77) | def test_space_formatting():
  function test_vlsp_formatting (line 89) | def test_vlsp_formatting():
  function test_language_formatting (line 109) | def test_language_formatting():

FILE: stanza/tests/datasets/coref/test_hebrew_iahlt.py
  function tokenizer (line 10) | def tokenizer():
  function test_extract_doc (line 31) | def test_extract_doc(tokenizer):

FILE: stanza/tests/datasets/ner/test_prepare_ner_file.py
  function check_json_file (line 35) | def check_json_file(doc, raw_text, expected_sentences, expected_tokens):
  function write_and_convert (line 52) | def write_and_convert(tmp_path, raw_text):
  function run_test (line 65) | def run_test(tmp_path, raw_text, expected_sentences, expected_tokens):
  function test_simple (line 69) | def test_simple(tmp_path):
  function test_ner_at_end (line 72) | def test_ner_at_end(tmp_path):
  function test_two_sentences (line 75) | def test_two_sentences(tmp_path):

FILE: stanza/tests/datasets/ner/test_utils.py
  function test_list_doc_entities (line 10) | def test_list_doc_entities(tmp_path):

FILE: stanza/tests/datasets/test_common.py
  function test_fake_deps_no_change (line 65) | def test_fake_deps_no_change():
  function test_fake_deps_all_tokens (line 69) | def test_fake_deps_all_tokens():
  function test_fake_deps_only_root (line 74) | def test_fake_deps_only_root():

FILE: stanza/tests/datasets/test_vietnamese_renormalization.py
  function test_replace_all (line 8) | def test_replace_all():
  function test_replace_file (line 14) | def test_replace_file(tmp_path):

FILE: stanza/tests/depparse/test_depparse_data.py
  function make_fake_data (line 11) | def make_fake_data(*lengths):
  function check_batches (line 19) | def check_batches(batched_data, expected_sizes, expected_order):
  function test_data_to_batches_eval_mode (line 28) | def test_data_to_batches_eval_mode():
  function test_punct_simplification (line 93) | def test_punct_simplification():

FILE: stanza/tests/depparse/test_parser.py
  class TestParser (line 76) | class TestParser:
    method wordvec_pretrain_file (line 78) | def wordvec_pretrain_file(self):
    method run_training (line 81) | def run_training(self, tmp_path, wordvec_pretrain_file, train_text, de...
    method test_train (line 129) | def test_train(self, tmp_path, wordvec_pretrain_file):
    method test_arc_embedding (line 135) | def test_arc_embedding(self, tmp_path, wordvec_pretrain_file):
    method test_no_arc_embedding (line 141) | def test_no_arc_embedding(self, tmp_path, wordvec_pretrain_file):
    method test_zipfile_train (line 147) | def test_zipfile_train(self, tmp_path, wordvec_pretrain_file):
    method test_with_bert_nlayers (line 153) | def test_with_bert_nlayers(self, tmp_path, wordvec_pretrain_file):
    method test_with_bert_finetuning (line 156) | def test_with_bert_finetuning(self, tmp_path, wordvec_pretrain_file):
    method test_with_bert_finetuning_resaved (line 161) | def test_with_bert_finetuning_resaved(self, tmp_path, wordvec_pretrain...
    method test_with_peft (line 187) | def test_with_peft(self, tmp_path, wordvec_pretrain_file):
    method test_single_optimizer_checkpoint (line 192) | def test_single_optimizer_checkpoint(self, tmp_path, wordvec_pretrain_...
    method test_two_optimizers_checkpoint (line 214) | def test_two_optimizers_checkpoint(self, tmp_path, wordvec_pretrain_fi...

FILE: stanza/tests/langid/test_langid.py
  function basic_multilingual (line 17) | def basic_multilingual():
  function enfr_multilingual (line 21) | def enfr_multilingual():
  function en_multilingual (line 25) | def en_multilingual():
  function clean_multilingual (line 29) | def clean_multilingual():
  function test_langid (line 32) | def test_langid(basic_multilingual):
  function test_langid_benchmark (line 45) | def test_langid_benchmark(basic_multilingual):
  function test_text_cleaning (line 558) | def test_text_cleaning(basic_multilingual, clean_multilingual):
  function test_emoji_cleaning (line 573) | def test_emoji_cleaning():
  function test_lang_subset (line 581) | def test_lang_subset(basic_multilingual, enfr_multilingual, en_multiling...
  function test_lang_subset_unlikely_language (line 600) | def test_lang_subset_unlikely_language(en_multilingual):

FILE: stanza/tests/langid/test_multilingual.py
  function run_multilingual_pipeline (line 15) | def run_multilingual_pipeline(en_has_dependencies=True, fr_has_dependenc...
  function test_multilingual_pipeline (line 62) | def test_multilingual_pipeline():
  function test_multilingual_pipeline_small_cache (line 68) | def test_multilingual_p

Condensed preview: 579 files, each showing path, character count, and a content snippet (full structured content: 4,232K chars).

[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 651,
    "preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: bug\nassignees: ''\n\n---\n\n**Describe the "
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 604,
    "preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: enhancement\nassignees: ''\n\n---\n\n**Is"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/question.md",
    "chars": 938,
    "preview": "---\nname: Question\nabout: 'Question about general usage. '\ntitle: \"[QUESTION]\"\nlabels: question\nassignees: ''\n\n---\n\nBefo"
  },
  {
    "path": ".github/pull_request_template.md",
    "chars": 773,
    "preview": "**BEFORE YOU START**: please make sure your pull request is against the `dev` branch. \nWe cannot accept pull requests ag"
  },
  {
    "path": ".github/stale.yml",
    "chars": 781,
    "preview": "# Number of days of inactivity before an issue becomes stale\ndaysUntilStale: 60\n# Number of days of inactivity before a "
  },
  {
    "path": ".github/workflows/stanza-tests.yaml",
    "chars": 1800,
    "preview": "name: Run Stanza Tests\non: [push]\njobs:\n  Run-Stanza-Tests:\n    runs-on: self-hosted\n    steps:\n      - run: echo \"🎉 The"
  },
  {
    "path": ".gitignore",
    "chars": 2653,
    "preview": "# kept from original\n.DS_Store\n*.tmp\n*.pkl\n*.conllu\n*.lem\n*.toklabels\n\n# also data w/o any slash to account for symlinks"
  },
  {
    "path": ".travis.yml",
    "chars": 832,
    "preview": "language: python\npython:\n  - 3.6.5\nnotifications:\n  email: false\ninstall:\n  - pip install --quiet .\n  - export CORENLP_H"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 1597,
    "preview": "# Contributing to Stanza\n\nWe would love to see contributions to Stanza from the community! Contributions that we welcome"
  },
  {
    "path": "LICENSE",
    "chars": 603,
    "preview": "Copyright 2019 The Board of Trustees of The Leland Stanford Junior University\n\nLicensed under the Apache License, Versio"
  },
  {
    "path": "README.md",
    "chars": 11355,
    "preview": "<div align=\"center\"><img src=\"https://github.com/stanfordnlp/stanza/raw/dev/images/stanza-logo.png\" height=\"100px\"/></di"
  },
  {
    "path": "demo/CONLL_Dependency_Visualizer_Example.ipynb",
    "chars": 1761,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"c0fd86c8\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "demo/Dependency_Visualization_Testing.ipynb",
    "chars": 2028,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"64b2a9e0\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "demo/NER_Visualization.ipynb",
    "chars": 3652,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"abf300bb\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "demo/Stanza_Beginners_Guide.ipynb",
    "chars": 12623,
    "preview": "{\n  \"nbformat\": 4,\n  \"nbformat_minor\": 0,\n  \"metadata\": {\n    \"colab\": {\n      \"name\": \"Stanza-Beginners-Guide.ipynb\",\n "
  },
  {
    "path": "demo/Stanza_CoreNLP_Interface.ipynb",
    "chars": 18478,
    "preview": "{\n  \"nbformat\": 4,\n  \"nbformat_minor\": 0,\n  \"metadata\": {\n    \"colab\": {\n      \"name\": \"Stanza-CoreNLP-Interface.ipynb\","
  },
  {
    "path": "demo/arabic_test.conllu.txt",
    "chars": 15516,
    "preview": "# newdoc id = assabah.20041005.0017\n# newpar id = assabah.20041005.0017:p1\n# sent_id = assabah.20041005.0017:p1u1\n# text"
  },
  {
    "path": "demo/corenlp.py",
    "chars": 3191,
    "preview": "from stanza.server import CoreNLPClient\n\n# example text\nprint('---')\nprint('input text')\nprint('')\n\ntext = \"Chris Mannin"
  },
  {
    "path": "demo/en_test.conllu.txt",
    "chars": 4496,
    "preview": "# newdoc id = weblog-blogspot.com_zentelligence_20040423000200_ENG_20040423_000200\n# sent_id = weblog-blogspot.com_zente"
  },
  {
    "path": "demo/japanese_test.conllu.txt",
    "chars": 11631,
    "preview": "# newdoc id = test-s1\n# sent_id = test-s1\n# text = これに不快感を示す住民はいましたが,現在,表立って反対や抗議の声を挙げている住民はいないようです。\n1\tこれ\t此れ\tPRON\t代名詞\t_\t"
  },
  {
    "path": "demo/pipeline_demo.py",
    "chars": 2334,
    "preview": "\"\"\"\nA basic demo of the Stanza neural pipeline.\n\"\"\"\n\nimport sys\nimport argparse\nimport os\n\nimport stanza\nfrom stanza.res"
  },
  {
    "path": "demo/scenegraph.py",
    "chars": 592,
    "preview": "\"\"\"\nVery short demo for the SceneGraph interface in the CoreNLP server\n\nRequires CoreNLP >= 4.5.5, Stanza >= 1.5.1\n\"\"\"\n\n"
  },
  {
    "path": "demo/semgrex visualization.ipynb",
    "chars": 21981,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"id\": \"2787d5f5\",\n   \"metadata\": {},\n   \"output"
  },
  {
    "path": "demo/semgrex.py",
    "chars": 933,
    "preview": "import stanza\nfrom stanza.server.semgrex import Semgrex\n\nnlp = stanza.Pipeline(\"en\", processors=\"tokenize,pos,lemma,depp"
  },
  {
    "path": "demo/ssurgeon_script.txt",
    "chars": 879,
    "preview": "# To run this, use the stanza/server/ssurgeon.py main file.\n# For example:\n# python3 stanza/server/ssurgeon.py  --edit_f"
  },
  {
    "path": "doc/CoreNLP.proto",
    "chars": 31532,
    "preview": "syntax = \"proto2\";\n\npackage edu.stanford.nlp.pipeline;\n\noption java_package = \"edu.stanford.nlp.pipeline\";\noption java_o"
  },
  {
    "path": "scripts/config.sh",
    "chars": 2211,
    "preview": "#!/bin/bash\n#\n# Set environment variables for the training and testing of stanza modules.\n\n# Set UDBASE to the location "
  },
  {
    "path": "scripts/download_vectors.sh",
    "chars": 2907,
    "preview": "#!/bin/bash\n#\n# Download word vector files for all supported languages. Run as:\n#   ./download_vectors.sh WORDVEC_DIR\n# "
  },
  {
    "path": "setup.py",
    "chars": 5415,
    "preview": "# Always prefer setuptools over distutils\nimport re\n\nfrom setuptools import setup, find_packages\n# To use a consistent e"
  },
  {
    "path": "stanza/__init__.py",
    "chars": 1086,
    "preview": "from stanza.pipeline.core import DownloadMethod, Pipeline\nfrom stanza.pipeline.multilingual import MultilingualPipeline\n"
  },
  {
    "path": "stanza/_version.py",
    "chars": 107,
    "preview": "\"\"\" Single source of truth for version number \"\"\"\n\n__version__ = \"1.11.1\"\n__resources_version__ = '1.11.0'\n"
  },
  {
    "path": "stanza/models/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/_training_logging.py",
    "chars": 83,
    "preview": "import logging\n\nlogger = logging.getLogger('stanza')\nlogger.setLevel(logging.DEBUG)"
  },
  {
    "path": "stanza/models/charlm.py",
    "chars": 16203,
    "preview": "\"\"\"\nEntry point for training and evaluating a character-level neural language model.\n\"\"\"\n\nimport argparse\nfrom copy impo"
  },
  {
    "path": "stanza/models/classifier.py",
    "chars": 36241,
    "preview": "import argparse\nimport ast\nimport logging\nimport os\nimport random\nimport re\nfrom enum import Enum\n\nimport torch\nimport t"
  },
  {
    "path": "stanza/models/classifiers/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/classifiers/base_classifier.py",
    "chars": 1938,
    "preview": "from abc import ABC, abstractmethod\n\nimport logging\n\nimport torch\nimport torch.nn as nn\n\nfrom stanza.models.common.utils"
  },
  {
    "path": "stanza/models/classifiers/cnn_classifier.py",
    "chars": 27970,
    "preview": "import dataclasses\nimport logging\nimport math\nimport os\nimport random\nimport re\n\nimport numpy as np\nimport torch\nimport "
  },
  {
    "path": "stanza/models/classifiers/config.py",
    "chars": 1547,
    "preview": "from dataclasses import dataclass\nfrom typing import List, Union\n\n# TODO: perhaps put the enums in this file\nfrom stanza"
  },
  {
    "path": "stanza/models/classifiers/constituency_classifier.py",
    "chars": 4268,
    "preview": "\"\"\"\nA classifier that uses a constituency parser for the base embeddings\n\"\"\"\n\nimport dataclasses\nimport logging\nfrom typ"
  },
  {
    "path": "stanza/models/classifiers/data.py",
    "chars": 6069,
    "preview": "\"\"\"Stanza models classifier data functions.\"\"\"\n\nimport collections\nfrom collections import namedtuple\nimport logging\nimp"
  },
  {
    "path": "stanza/models/classifiers/iterate_test.py",
    "chars": 2168,
    "preview": "\"\"\"Iterate test.\"\"\"\nimport argparse\nimport glob\nimport logging\n\nimport stanza.models.classifier as classifier\nimport sta"
  },
  {
    "path": "stanza/models/classifiers/trainer.py",
    "chars": 16546,
    "preview": "\"\"\"\nOrganizes the model itself and its optimizer in one place\n\nSaving the optimizer allows for easy restarting of traini"
  },
  {
    "path": "stanza/models/classifiers/utils.py",
    "chars": 1023,
    "preview": "from enum import Enum\n\nfrom torch import nn\n\n\"\"\"\nDefines some methods which may occur in multiple model types\n\"\"\"\n# NLP "
  },
  {
    "path": "stanza/models/common/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/common/beam.py",
    "chars": 4481,
    "preview": "from __future__ import division\nimport torch\n\nimport stanza.models.common.seq2seq_constant as constant\n\nr\"\"\"\n Adapted an"
  },
  {
    "path": "stanza/models/common/bert_embedding.py",
    "chars": 27075,
    "preview": "import math\nimport logging\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom to"
  },
  {
    "path": "stanza/models/common/biaffine.py",
    "chars": 3582,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nclass PairwiseBilinear(nn.Module):\n    ''' A bilinea"
  },
  {
    "path": "stanza/models/common/build_short_name_to_treebank.py",
    "chars": 2969,
    "preview": "import glob\nimport os\n\nfrom stanza.models.common.constant import treebank_to_short_name, UnknownLanguageError, treebank_"
  },
  {
    "path": "stanza/models/common/char_model.py",
    "chars": 15746,
    "preview": "\"\"\"\nBased on\n\n@inproceedings{akbik-etal-2018-contextual,\n    title = \"Contextual String Embeddings for Sequence Labeling"
  },
  {
    "path": "stanza/models/common/chuliu_edmonds.py",
    "chars": 11649,
    "preview": "# Adapted from Tim's code here: https://github.com/tdozat/Parser-v3/blob/master/scripts/chuliu_edmonds.py\n\nimport numpy "
  },
  {
    "path": "stanza/models/common/constant.py",
    "chars": 15328,
    "preview": "\"\"\"\nGlobal constants.\n\nThese language codes mirror UD language codes when possible\n\"\"\"\n\nimport re\n\nclass UnknownLanguage"
  },
  {
    "path": "stanza/models/common/convert_pretrain.py",
    "chars": 1856,
    "preview": "\"\"\"\nA utility script to load a word embedding file from a text file and save it as a .pt\n\nRun it as follows:\n  python st"
  },
  {
    "path": "stanza/models/common/count_ner_coverage.py",
    "chars": 1171,
    "preview": "from stanza.models.common import pretrain\nimport argparse\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n    "
  },
  {
    "path": "stanza/models/common/count_pretrain_coverage.py",
    "chars": 1535,
    "preview": "\"\"\"A simple script to count the fraction of words in a UD dataset which are in a particular pretrain.\n\nFor example, this"
  },
  {
    "path": "stanza/models/common/crf.py",
    "chars": 5915,
    "preview": "\"\"\"\nCRF loss and viterbi decoding.\n\"\"\"\n\nimport math\nfrom numbers import Number\nimport numpy as np\nimport torch\nfrom torc"
  },
  {
    "path": "stanza/models/common/data.py",
    "chars": 6111,
    "preview": "\"\"\"\nUtility functions for data transformations.\n\"\"\"\n\nimport logging\nimport random\n\nimport torch\n\nimport stanza.models.co"
  },
  {
    "path": "stanza/models/common/doc.py",
    "chars": 73035,
    "preview": "\"\"\"\nBasic data structures\n\"\"\"\n\nimport io\nfrom itertools import repeat\nimport re\nimport json\nimport pickle\nimport warning"
  },
  {
    "path": "stanza/models/common/dropout.py",
    "chars": 2858,
    "preview": "import torch\nimport torch.nn as nn\n\nclass WordDropout(nn.Module):\n    \"\"\" A word dropout layer that's designed for embed"
  },
  {
    "path": "stanza/models/common/exceptions.py",
    "chars": 452,
    "preview": "\"\"\"\nA couple more specific FileNotFoundError exceptions\n\nThe idea being, the caller can catch it and report a more usefu"
  },
  {
    "path": "stanza/models/common/foundation_cache.py",
    "chars": 5809,
    "preview": "\"\"\"\nKeeps BERT, charlm, word embedings in a cache to save memory\n\"\"\"\n\nfrom collections import namedtuple\nfrom copy impor"
  },
  {
    "path": "stanza/models/common/hlstm.py",
    "chars": 5180,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.nn.utils.rnn import pad_packed_sequence, p"
  },
  {
    "path": "stanza/models/common/large_margin_loss.py",
    "chars": 2638,
    "preview": "\"\"\"\nLargeMarginInSoftmax, from the article\n\n@inproceedings{kobayashi2019bmvc,\n  title={Large Margin In Softmax Cross-Ent"
  },
  {
    "path": "stanza/models/common/loss.py",
    "chars": 4716,
    "preview": "\"\"\"\nDifferent loss functions.\n\"\"\"\n\nimport logging\nimport numpy as np\nimport torch\nimport torch.nn as nn\n\nimport stanza.m"
  },
  {
    "path": "stanza/models/common/maxout_linear.py",
    "chars": 1265,
    "preview": "\"\"\"\nA layer which implements maxout from the \"Maxout Networks\" paper\n\nhttps://arxiv.org/pdf/1302.4389v4.pdf\nGoodfellow, "
  },
  {
    "path": "stanza/models/common/packed_lstm.py",
    "chars": 4855,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.nn.utils.rnn import pad_packed_sequence, p"
  },
  {
    "path": "stanza/models/common/peft_config.py",
    "chars": 5293,
    "preview": "\"\"\"\nSet a few common flags for peft uage\n\"\"\"\n\n\nTRANSFORMER_LORA_RANK = {}\nDEFAULT_LORA_RANK = 64\n\nTRANSFORMER_LORA_ALPHA"
  },
  {
    "path": "stanza/models/common/pretrain.py",
    "chars": 12173,
    "preview": "\"\"\"\nSupports for pretrained data.\n\"\"\"\nimport csv\nimport os\nimport re\n\nimport lzma\nimport logging\nimport numpy as np\nimpo"
  },
  {
    "path": "stanza/models/common/relative_attn.py",
    "chars": 6477,
    "preview": "import logging\n\nimport torch\nfrom torch import nn\nimport torch.nn.functional as F\n\nlogger = logging.getLogger('stanza')\n"
  },
  {
    "path": "stanza/models/common/seq2seq_constant.py",
    "chars": 221,
    "preview": "\"\"\"\nConstants for seq2seq models.\n\"\"\"\n\nPAD = '<PAD>'\nPAD_ID = 0\nUNK = '<UNK>'\nUNK_ID = 1\nSOS = '<SOS>'\nSOS_ID = 2\nEOS = "
  },
  {
    "path": "stanza/models/common/seq2seq_model.py",
    "chars": 16817,
    "preview": "\"\"\"\nThe full encoder-decoder model, built on top of the base seq2seq modules.\n\"\"\"\n\nimport logging\nimport torch\nfrom torc"
  },
  {
    "path": "stanza/models/common/seq2seq_modules.py",
    "chars": 8483,
    "preview": "\"\"\"\nPytorch implementation of basic sequence to Sequence modules.\n\"\"\"\n\nimport logging\nimport torch\nimport torch.nn as nn"
  },
  {
    "path": "stanza/models/common/seq2seq_utils.py",
    "chars": 3334,
    "preview": "\"\"\"\nUtils for seq2seq models.\n\"\"\"\nfrom collections import Counter\nimport random\nimport json\nimport torch\n\nimport stanza."
  },
  {
    "path": "stanza/models/common/short_name_to_treebank.py",
    "chars": 42266,
    "preview": "# This module is autogenerated by build_short_name_to_treebank.py\n# Please do not edit\n\nSHORT_NAMES = {\n    'abq_atb':  "
  },
  {
    "path": "stanza/models/common/stanza_object.py",
    "chars": 1147,
    "preview": "def _readonly_setter(self, name):\n    full_classname = self.__class__.__module__\n    if full_classname is None:\n        "
  },
  {
    "path": "stanza/models/common/trainer.py",
    "chars": 664,
    "preview": "import torch\n\nclass Trainer:\n    def change_lr(self, new_lr):\n        for param_group in self.optimizer.param_groups:\n  "
  },
  {
    "path": "stanza/models/common/utils.py",
    "chars": 37293,
    "preview": "\"\"\"\nUtility functions.\n\"\"\"\n\nimport argparse\nfrom collections import Counter\nfrom contextlib import contextmanager\nimport"
  },
  {
    "path": "stanza/models/common/vocab.py",
    "chars": 11187,
    "preview": "from copy import copy\nfrom collections import Counter, OrderedDict\nfrom collections.abc import Iterable\nimport os\nimport"
  },
  {
    "path": "stanza/models/constituency/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/constituency/base_model.py",
    "chars": 23254,
    "preview": "\"\"\"\nThe BaseModel is passed to the transitions so that the transitions\ncan operate on a parsing state without knowing th"
  },
  {
    "path": "stanza/models/constituency/base_trainer.py",
    "chars": 6998,
    "preview": "from enum import Enum\nimport logging\nimport os\n\nimport torch\n\nfrom pickle import UnpicklingError\nimport warnings\n\nlogger"
  },
  {
    "path": "stanza/models/constituency/dynamic_oracle.py",
    "chars": 6808,
    "preview": "from collections import namedtuple\n\nimport numpy as np\n\nfrom stanza.models.constituency.parse_transitions import Shift, "
  },
  {
    "path": "stanza/models/constituency/ensemble.py",
    "chars": 21325,
    "preview": "\"\"\"\nPrototype of ensembling N models together on the same dataset\n\nThe main inference method is to run the normal transi"
  },
  {
    "path": "stanza/models/constituency/error_analysis_in_order.py",
    "chars": 9067,
    "preview": "\"\"\"\nA tool with an initial set of error analysis for in-order parsing.\n\nAnalyzes the first error created in the parser\n\n"
  },
  {
    "path": "stanza/models/constituency/evaluate_treebanks.py",
    "chars": 1249,
    "preview": "\"\"\"\nRead multiple treebanks, score the results.\n\nReports the k-best score if multiple predicted treebanks are given.\n\"\"\""
  },
  {
    "path": "stanza/models/constituency/in_order_compound_oracle.py",
    "chars": 14298,
    "preview": "from enum import Enum\n\nfrom stanza.models.constituency.dynamic_oracle import advance_past_constituents, find_in_order_co"
  },
  {
    "path": "stanza/models/constituency/in_order_oracle.py",
    "chars": 45987,
    "preview": "from enum import Enum\n\nfrom stanza.models.constituency.dynamic_oracle import advance_past_constituents, find_in_order_co"
  },
  {
    "path": "stanza/models/constituency/label_attention.py",
    "chars": 31043,
    "preview": "import numpy as np\nimport functools\nimport sys\nimport torch\nfrom torch.autograd import Variable\nimport torch.nn as nn\nim"
  },
  {
    "path": "stanza/models/constituency/lstm_model.py",
    "chars": 72333,
    "preview": "\"\"\"\nA version of the BaseModel which uses LSTMs to predict the correct next transition\nbased on the current known state."
  },
  {
    "path": "stanza/models/constituency/lstm_tree_stack.py",
    "chars": 3605,
    "preview": "\"\"\"\nKeeps an LSTM in TreeStack form.\n\nThe TreeStack nodes keep the hx and cx for the LSTM, along with a\n\"value\" which re"
  },
  {
    "path": "stanza/models/constituency/parse_transitions.py",
    "chars": 25171,
    "preview": "\"\"\"\nDefines a series of transitions (open a constituent, close a constituent, etc)\n\"\"\"\n\nfrom abc import ABC, abstractmet"
  },
  {
    "path": "stanza/models/constituency/parse_tree.py",
    "chars": 21860,
    "preview": "\"\"\"\nTree datastructure\n\"\"\"\n\nfrom collections import deque, Counter\nimport copy\nfrom enum import Enum\nfrom io import Stri"
  },
  {
    "path": "stanza/models/constituency/parser_training.py",
    "chars": 41532,
    "preview": "from collections import Counter, namedtuple\nimport copy\nimport logging\nimport os\nimport random\nimport re\n\nimport torch\nf"
  },
  {
    "path": "stanza/models/constituency/partitioned_transformer.py",
    "chars": 11165,
    "preview": "\"\"\"\nTransformer with partitioned content and position features.\n\nSee section 3 of https://arxiv.org/pdf/1805.01052.pdf\n\""
  },
  {
    "path": "stanza/models/constituency/positional_encoding.py",
    "chars": 3049,
    "preview": "\"\"\"\nBased on\nhttps://pytorch.org/tutorials/beginner/transformer_tutorial.html#define-the-model\n\"\"\"\n\nimport math\n\nimport "
  },
  {
    "path": "stanza/models/constituency/retagging.py",
    "chars": 6647,
    "preview": "\"\"\"\nRefactor a few functions specifically for retagging trees\n\nRetagging is important because the gold tags will not be "
  },
  {
    "path": "stanza/models/constituency/score_converted_dependencies.py",
    "chars": 2368,
    "preview": "\"\"\"\nScript which processes a dependency file by using the constituency parser, then converting with the CoreNLP converte"
  },
  {
    "path": "stanza/models/constituency/state.py",
    "chars": 5665,
    "preview": "from collections import namedtuple\n\nclass State(namedtuple('State', ['word_queue', 'transitions', 'constituents', 'gold_"
  },
  {
    "path": "stanza/models/constituency/text_processing.py",
    "chars": 6323,
    "preview": "import os\n\nimport logging\n\nfrom stanza.models.common import utils\nfrom stanza.models.constituency.utils import retag_tag"
  },
  {
    "path": "stanza/models/constituency/top_down_oracle.py",
    "chars": 32621,
    "preview": "from enum import Enum\nimport random\n\nfrom stanza.models.constituency.dynamic_oracle import advance_past_constituents, sc"
  },
  {
    "path": "stanza/models/constituency/trainer.py",
    "chars": 16582,
    "preview": "\"\"\"\nThis file includes a variety of methods needed to train new\nconstituency parsers.  It also includes a method to load"
  },
  {
    "path": "stanza/models/constituency/transformer_tree_stack.py",
    "chars": 7555,
    "preview": "\"\"\"\nBased on\n\nTransition-based Parsing with Stack-Transformers\nRamon Fernandez Astudillo, Miguel Ballesteros, Tahira Nas"
  },
  {
    "path": "stanza/models/constituency/transition_sequence.py",
    "chars": 6117,
    "preview": "\"\"\"\nBuild a transition sequence from parse trees.\n\nSupports multiple transition schemes - TOP_DOWN and variants, IN_ORDE"
  },
  {
    "path": "stanza/models/constituency/tree_embedding.py",
    "chars": 6165,
    "preview": "\"\"\"\nA module to use a Constituency Parser to make an embedding for a tree\n\nThe embedding can be produced just from the w"
  },
  {
    "path": "stanza/models/constituency/tree_reader.py",
    "chars": 9212,
    "preview": "\"\"\"\nReads ParseTree objects from a file, string, or similar input\n\nWorks by first splitting the input into (, ), and all"
  },
  {
    "path": "stanza/models/constituency/tree_stack.py",
    "chars": 2033,
    "preview": "\"\"\"\nA utilitiy class for keeping track of intermediate parse states\n\"\"\"\n\nfrom collections import namedtuple\n\nclass TreeS"
  },
  {
    "path": "stanza/models/constituency/utils.py",
    "chars": 13538,
    "preview": "\"\"\"\nCollects a few of the conparser utility methods which don't belong elsewhere\n\"\"\"\n\nfrom collections import Counter\nim"
  },
  {
    "path": "stanza/models/constituency_parser.py",
    "chars": 56258,
    "preview": "\"\"\"A command line interface to a shift reduce constituency parser.\n\nThis follows the work of\nRecurrent neural network gr"
  },
  {
    "path": "stanza/models/coref/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/coref/anaphoricity_scorer.py",
    "chars": 4856,
    "preview": "\"\"\" Describes AnaphicityScorer, a torch module that for a matrix of\nmentions produces their anaphoricity scores.\n\"\"\"\nimp"
  },
  {
    "path": "stanza/models/coref/bert.py",
    "chars": 2438,
    "preview": "\"\"\"Functions related to BERT or similar models\"\"\"\n\nimport logging\nfrom typing import List, Tuple\n\nimport numpy as np    "
  },
  {
    "path": "stanza/models/coref/cluster_checker.py",
    "chars": 8397,
    "preview": "\"\"\" Describes ClusterChecker, a class used to retrieve LEA scores.\nSee aclweb.org/anthology/P16-1060.pdf. \"\"\"\n\nfrom typi"
  },
  {
    "path": "stanza/models/coref/config.py",
    "chars": 1651,
    "preview": "\"\"\" Describes Config, a simple namespace for config values.\n\nFor description of all config values, refer to config.toml."
  },
  {
    "path": "stanza/models/coref/conll.py",
    "chars": 3999,
    "preview": "\"\"\" Contains functions to produce conll-formatted output files with\npredicted spans and their clustering \"\"\"\n\nfrom colle"
  },
  {
    "path": "stanza/models/coref/const.py",
    "chars": 877,
    "preview": "\"\"\" Contains type aliases for coref module \"\"\"\n\nfrom dataclasses import dataclass\nfrom typing import Any, Dict, List, Tu"
  },
  {
    "path": "stanza/models/coref/coref_chain.py",
    "chars": 1377,
    "preview": "\"\"\"\nCoref chain suitable for attaching to a Document after coref processing\n\"\"\"\n\n# by not using namedtuple, we can use t"
  },
  {
    "path": "stanza/models/coref/coref_config.toml",
    "chars": 10118,
    "preview": "# =============================================================================\n# Before you start changing anything her"
  },
  {
    "path": "stanza/models/coref/dataset.py",
    "chars": 2738,
    "preview": "import json\nimport logging\nfrom torch.utils.data import Dataset\n\nfrom stanza.models.coref.tokenizer_customization import"
  },
  {
    "path": "stanza/models/coref/loss.py",
    "chars": 1465,
    "preview": "\"\"\" Describes the loss function used to train the model, which is a weighted\nsum of NLML and BCE losses. \"\"\"\n\nimport tor"
  },
  {
    "path": "stanza/models/coref/model.py",
    "chars": 39947,
    "preview": "\"\"\" see __init__.py \"\"\"\n\nfrom datetime import datetime\nimport dataclasses\nimport json\nimport logging\nimport os\nimport ra"
  },
  {
    "path": "stanza/models/coref/pairwise_encoder.py",
    "chars": 3542,
    "preview": "\"\"\" Describes PairwiseEncodes, that transforms pairwise features, such as\ndistance between the mentions, same/different "
  },
  {
    "path": "stanza/models/coref/predict.py",
    "chars": 2167,
    "preview": "import argparse\n\nimport json\nimport torch\nfrom tqdm import tqdm\n\nfrom stanza.models.coref.model import CorefModel\n\n\nif _"
  },
  {
    "path": "stanza/models/coref/rough_scorer.py",
    "chars": 2224,
    "preview": "\"\"\" Describes RoughScorer, a simple bilinear module to calculate rough\nanaphoricity scores.\n\"\"\"\n\nfrom typing import Tupl"
  },
  {
    "path": "stanza/models/coref/span_predictor.py",
    "chars": 6032,
    "preview": "\"\"\" Describes SpanPredictor which aims to predict spans by taking as input\nhead word and context embeddings.\n\"\"\"\n\nfrom t"
  },
  {
    "path": "stanza/models/coref/tokenizer_customization.py",
    "chars": 745,
    "preview": "\"\"\" This file defines functions used to modify the default behaviour\nof transformers.AutoTokenizer. These changes are ne"
  },
  {
    "path": "stanza/models/coref/utils.py",
    "chars": 3261,
    "preview": "\"\"\" Contains functions not directly linked to coreference resolution \"\"\"\n\nfrom typing import List, Set\n\nimport torch\nimp"
  },
  {
    "path": "stanza/models/coref/word_encoder.py",
    "chars": 4464,
    "preview": "\"\"\" Describes WordEncoder. Extracts mention vectors from bert-encoded text.\n\"\"\"\n\nfrom typing import Tuple\n\nimport torch\n"
  },
  {
    "path": "stanza/models/depparse/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/depparse/data.py",
    "chars": 10019,
    "preview": "import random\nimport logging\nimport torch\n\nfrom stanza.models.common.bert_embedding import filter_data, needs_length_fil"
  },
  {
    "path": "stanza/models/depparse/model.py",
    "chars": 16702,
    "preview": "import logging\nimport os\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom tor"
  },
  {
    "path": "stanza/models/depparse/scorer.py",
    "chars": 3009,
    "preview": "\"\"\"\nUtils and wrappers for scoring parsers.\n\"\"\"\n\nfrom collections import Counter\nimport logging\n\nfrom stanza.models.comm"
  },
  {
    "path": "stanza/models/depparse/trainer.py",
    "chars": 13269,
    "preview": "\"\"\"\nA trainer class to handle training and testing of models.\n\"\"\"\n\nimport copy\nimport sys\nimport logging\nimport torch\nfr"
  },
  {
    "path": "stanza/models/identity_lemmatizer.py",
    "chars": 2479,
    "preview": "\"\"\"\nAn identity lemmatizer that mimics the behavior of a normal lemmatizer but directly uses word as lemma.\n\"\"\"\n\nimport "
  },
  {
    "path": "stanza/models/lang_identifier.py",
    "chars": 10266,
    "preview": "\"\"\"\nEntry point for training and evaluating a Bi-LSTM language identifier\n\"\"\"\n\nimport argparse\nimport json\nimport loggin"
  },
  {
    "path": "stanza/models/langid/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/langid/create_ud_data.py",
    "chars": 8627,
    "preview": "\"\"\"\nScript for producing training/dev/test data from UD data or sentences\n\nExample output data format (one example per l"
  },
  {
    "path": "stanza/models/langid/data.py",
    "chars": 5206,
    "preview": "import json\nimport random\nimport torch\n\n\nclass DataLoader:\n    \"\"\"\n    Class for loading language id data and providing "
  },
  {
    "path": "stanza/models/langid/model.py",
    "chars": 4790,
    "preview": "import os\n\nimport torch\nimport torch.nn as nn\n\n\nclass LangIDBiLSTM(nn.Module):\n    \"\"\"\n    Multi-layer BiLSTM model for "
  },
  {
    "path": "stanza/models/langid/trainer.py",
    "chars": 1778,
    "preview": "import torch\nimport torch.optim as optim\n\nfrom stanza.models.langid.model import LangIDBiLSTM\n\n\nclass Trainer:\n\n    DEFA"
  },
  {
    "path": "stanza/models/lemma/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/lemma/attach_lemma_classifier.py",
    "chars": 945,
    "preview": "import argparse\n\nfrom stanza.models.lemma.trainer import Trainer\nfrom stanza.models.lemma_classifier.base_model import L"
  },
  {
    "path": "stanza/models/lemma/data.py",
    "chars": 8328,
    "preview": "import random\nimport numpy as np\nimport os\nfrom collections import Counter\nimport logging\nimport torch\n\nimport stanza.mo"
  },
  {
    "path": "stanza/models/lemma/edit.py",
    "chars": 631,
    "preview": "\"\"\"\nUtilities for calculating edits between word and lemma forms.\n\"\"\"\n\nEDIT_TO_ID = {'none': 0, 'identity': 1, 'lower': "
  },
  {
    "path": "stanza/models/lemma/scorer.py",
    "chars": 504,
    "preview": "\"\"\"\nUtils and wrappers for scoring lemmatizers.\n\"\"\"\n\nimport logging\n\nfrom stanza.models.common.utils import ud_scores\n\nl"
  },
  {
    "path": "stanza/models/lemma/trainer.py",
    "chars": 13206,
    "preview": "\"\"\"\nA trainer class to handle training and testing of models.\n\"\"\"\n\nimport os\nimport sys\nimport numpy as np\nfrom collecti"
  },
  {
    "path": "stanza/models/lemma/vocab.py",
    "chars": 649,
    "preview": "from collections import Counter\n\nfrom stanza.models.common.vocab import BaseVocab, BaseMultiVocab\nfrom stanza.models.com"
  },
  {
    "path": "stanza/models/lemma_classifier/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/lemma_classifier/base_model.py",
    "chars": 5845,
    "preview": "\"\"\"\nBase class for the LemmaClassifier types.\n\nVersions include LSTM and Transformer varieties\n\"\"\"\n\nimport logging\n\nfrom"
  },
  {
    "path": "stanza/models/lemma_classifier/base_trainer.py",
    "chars": 5848,
    "preview": "\nfrom abc import ABC, abstractmethod\nimport logging\nimport os\nfrom typing import List, Tuple, Any, Mapping\n\nimport torch"
  },
  {
    "path": "stanza/models/lemma_classifier/baseline_model.py",
    "chars": 2130,
    "preview": "\"\"\"\nBaseline model for the existing lemmatizer which always predicts \"be\" and never \"have\" on the \"'s\" token.\n\nThe Basel"
  },
  {
    "path": "stanza/models/lemma_classifier/constants.py",
    "chars": 437,
    "preview": "from enum import Enum\n\nUNKNOWN_TOKEN = \"unk\"  # token name for unknown tokens\nUNKNOWN_TOKEN_IDX = -1   # custom index we"
  },
  {
    "path": "stanza/models/lemma_classifier/evaluate_many.py",
    "chars": 3336,
    "preview": "\"\"\"\nUtils to evaluate many models of the same type at once\n\"\"\"\nimport argparse\nimport os\nimport logging\n\nfrom stanza.mod"
  },
  {
    "path": "stanza/models/lemma_classifier/evaluate_models.py",
    "chars": 11140,
    "preview": "import os\nimport sys\n\nparentdir = os.path.dirname(__file__)\nparentdir = os.path.dirname(parentdir)\nparentdir = os.path.d"
  },
  {
    "path": "stanza/models/lemma_classifier/lstm_model.py",
    "chars": 11238,
    "preview": "import torch\nimport torch.nn as nn\nimport os\nimport logging\nimport math\nfrom torch.nn.utils.rnn import pad_sequence, pac"
  },
  {
    "path": "stanza/models/lemma_classifier/prepare_dataset.py",
    "chars": 5335,
    "preview": "import argparse\nimport json\nimport os\nimport re\n\nimport stanza\nfrom stanza.models.lemma_classifier import utils\n\nfrom ty"
  },
  {
    "path": "stanza/models/lemma_classifier/train_lstm_model.py",
    "chars": 8048,
    "preview": "\"\"\"\nThe code in this file works to train a lemma classifier for 's\n\"\"\"\n\nimport argparse\nimport logging\nimport os\n\nimport"
  },
  {
    "path": "stanza/models/lemma_classifier/train_many.py",
    "chars": 8380,
    "preview": "\"\"\"\nUtils for training and evaluating multiple models simultaneously\n\"\"\"\n\nimport argparse\nimport os\n\nfrom stanza.models."
  },
  {
    "path": "stanza/models/lemma_classifier/train_transformer_model.py",
    "chars": 6316,
    "preview": "\"\"\"\nThis file contains code used to train a baseline transformer model to classify on a lemma of a particular token.\n\"\"\""
  },
  {
    "path": "stanza/models/lemma_classifier/transformer_model.py",
    "chars": 4022,
    "preview": "import torch\nimport torch.nn as nn\nimport os\nimport sys\nimport logging\n\nfrom transformers import AutoTokenizer, AutoMode"
  },
  {
    "path": "stanza/models/lemma_classifier/utils.py",
    "chars": 7358,
    "preview": "from collections import Counter\nimport json\nimport logging\nimport os\nimport random\nfrom typing import List, Tuple, Any, "
  },
  {
    "path": "stanza/models/lemmatizer.py",
    "chars": 14495,
    "preview": "\"\"\"\nEntry point for training and evaluating a lemmatizer.\n\nThis lemmatizer combines a neural sequence-to-sequence archit"
  },
  {
    "path": "stanza/models/mwt/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/mwt/character_classifier.py",
    "chars": 2515,
    "preview": "\"\"\"\nClassify characters based on an LSTM with learned character representations\n\"\"\"\n\nimport logging\n\nimport torch\nfrom t"
  },
  {
    "path": "stanza/models/mwt/data.py",
    "chars": 6566,
    "preview": "import random\nimport numpy as np\nimport os\nfrom collections import Counter, namedtuple\nimport logging\n\nimport torch\nfrom"
  },
  {
    "path": "stanza/models/mwt/scorer.py",
    "chars": 348,
    "preview": "\"\"\"\nUtils and wrappers for scoring MWT\n\"\"\"\nfrom stanza.models.common.utils import ud_scores\n\ndef score(system_conllu_fil"
  },
  {
    "path": "stanza/models/mwt/trainer.py",
    "chars": 8827,
    "preview": "\"\"\"\nA trainer class to handle training and testing of models.\n\"\"\"\n\nimport sys\nimport numpy as np\nfrom collections import"
  },
  {
    "path": "stanza/models/mwt/utils.py",
    "chars": 4006,
    "preview": "import stanza\n\nfrom stanza.models.common import doc\nfrom stanza.models.tokenization.data import TokenizationDataset\nfrom"
  },
  {
    "path": "stanza/models/mwt/vocab.py",
    "chars": 674,
    "preview": "from collections import Counter\n\nfrom stanza.models.common.vocab import BaseVocab\nimport stanza.models.common.seq2seq_co"
  },
  {
    "path": "stanza/models/mwt_expander.py",
    "chars": 15964,
    "preview": "\"\"\"\nEntry point for training and evaluating a multi-word token (MWT) expander.\n\nThis MWT expander combines a neural sequ"
  },
  {
    "path": "stanza/models/ner/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/ner/data.py",
    "chars": 10152,
    "preview": "import random\nimport logging\nimport torch\n\nfrom stanza.models.common.bert_embedding import filter_data, needs_length_fil"
  },
  {
    "path": "stanza/models/ner/model.py",
    "chars": 15354,
    "preview": "import os\nimport logging\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom tor"
  },
  {
    "path": "stanza/models/ner/scorer.py",
    "chars": 6364,
    "preview": "\"\"\"\nAn NER scorer that calculates F1 score given gold and predicted tags.\n\"\"\"\nimport sys\nimport os\nimport logging\nfrom c"
  },
  {
    "path": "stanza/models/ner/trainer.py",
    "chars": 13399,
    "preview": "\"\"\"\nA trainer class to handle training and testing of models.\n\"\"\"\n\nimport sys\nimport logging\nimport torch\nfrom torch imp"
  },
  {
    "path": "stanza/models/ner/utils.py",
    "chars": 11202,
    "preview": "\"\"\"\nUtility functions for dealing with NER tagging.\n\"\"\"\n\nimport logging\n\nfrom stanza.models.common.vocab import EMPTY\n\nl"
  },
  {
    "path": "stanza/models/ner/vocab.py",
    "chars": 2656,
    "preview": "from collections import Counter, OrderedDict\n\nfrom stanza.models.common.vocab import BaseVocab, BaseMultiVocab, CharVoca"
  },
  {
    "path": "stanza/models/ner_tagger.py",
    "chars": 26980,
    "preview": "\"\"\"\nEntry point for training and evaluating an NER tagger.\n\nThis tagger uses BiLSTM layers with character and word-level"
  },
  {
    "path": "stanza/models/parser.py",
    "chars": 29803,
    "preview": "\"\"\"\nEntry point for training and evaluating a dependency parser.\n\nThis implementation combines a deep biaffine graph-bas"
  },
  {
    "path": "stanza/models/pos/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/pos/build_xpos_vocab_factory.py",
    "chars": 6344,
    "preview": "import argparse\nfrom collections import defaultdict\nimport logging\nimport os\nimport re\nimport sys\nfrom zipfile import Zi"
  },
  {
    "path": "stanza/models/pos/data.py",
    "chars": 16422,
    "preview": "import random\nimport logging\nimport copy\nimport torch\nfrom collections import namedtuple\n\nfrom torch.utils.data import D"
  },
  {
    "path": "stanza/models/pos/model.py",
    "chars": 13822,
    "preview": "import logging\nimport os\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom tor"
  },
  {
    "path": "stanza/models/pos/scorer.py",
    "chars": 670,
    "preview": "\"\"\"\nUtils and wrappers for scoring taggers.\n\"\"\"\nimport logging\n\nfrom stanza.models.common.utils import ud_scores\n\nlogger"
  },
  {
    "path": "stanza/models/pos/trainer.py",
    "chars": 8976,
    "preview": "\"\"\"\nA trainer class to handle training and testing of models.\n\"\"\"\n\nimport sys\nimport logging\nimport torch\nfrom torch imp"
  },
  {
    "path": "stanza/models/pos/vocab.py",
    "chars": 3112,
    "preview": "from collections import Counter, OrderedDict\n\nfrom stanza.models.common.vocab import BaseVocab, BaseMultiVocab, CharVoca"
  },
  {
    "path": "stanza/models/pos/xpos_vocab_factory.py",
    "chars": 11285,
    "preview": "# This is the XPOS factory method generated automatically from stanza.models.pos.build_xpos_vocab_factory.\n# Please don'"
  },
  {
    "path": "stanza/models/pos/xpos_vocab_utils.py",
    "chars": 1595,
    "preview": "from collections import namedtuple\nfrom enum import Enum\nimport logging\nimport os\n\nfrom stanza.models.common.vocab impor"
  },
  {
    "path": "stanza/models/tagger.py",
    "chars": 25047,
    "preview": "\"\"\"\nEntry point for training and evaluating a POS/morphological features tagger.\n\nThis tagger uses highway BiLSTM layers"
  },
  {
    "path": "stanza/models/tokenization/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/models/tokenization/data.py",
    "chars": 31388,
    "preview": "from bisect import bisect_right\nfrom collections import defaultdict\nfrom copy import copy\nimport numpy as np\nimport rand"
  },
  {
    "path": "stanza/models/tokenization/model.py",
    "chars": 5134,
    "preview": "import torch\nimport torch.nn.functional as F\nimport torch.nn as nn\nfrom torch.nn.utils.rnn import pad_packed_sequence, p"
  },
  {
    "path": "stanza/models/tokenization/tokenize_files.py",
    "chars": 3788,
    "preview": "\"\"\"Use a Stanza tokenizer to turn a text file into one tokenized paragraph per line\n\nFor example, the output of this scr"
  },
  {
    "path": "stanza/models/tokenization/trainer.py",
    "chars": 5134,
    "preview": "import sys\nimport logging\nimport torch\nimport torch.nn as nn\nimport torch.optim as optim\n\nfrom stanza.models.common impo"
  },
  {
    "path": "stanza/models/tokenization/utils.py",
    "chars": 26371,
    "preview": "from collections import Counter\nfrom copy import copy\nimport json\nimport numpy as np\nimport re\nimport logging\nimport os\n"
  },
  {
    "path": "stanza/models/tokenization/vocab.py",
    "chars": 1232,
    "preview": "from collections import Counter\nimport re\n\nfrom stanza.models.common.vocab import BaseVocab\nfrom stanza.models.common.vo"
  },
  {
    "path": "stanza/models/tokenizer.py",
    "chars": 15847,
    "preview": "\"\"\"\nEntry point for training and evaluating a neural tokenizer.\n\nThis tokenizer treats tokenization and sentence segment"
  },
  {
    "path": "stanza/models/wl_coref.py",
    "chars": 12040,
    "preview": "\"\"\"\nRuns experiments with CorefModel.\n\nTry 'python wl_coref.py -h' for more details.\n\nCode based on\n\nhttps://github.com/"
  },
  {
    "path": "stanza/pipeline/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "stanza/pipeline/_constants.py",
    "chars": 280,
    "preview": "\"\"\" Module defining constants \"\"\"\n\n# string constants for processor names\nLANGID = 'langid'\nTOKENIZE = 'tokenize'\nMWT = "
  },
  {
    "path": "stanza/pipeline/constituency_processor.py",
    "chars": 3094,
    "preview": "\"\"\"\nProcessor that attaches a constituency tree to a sentence\n\"\"\"\n\nfrom stanza.models.constituency.trainer import Traine"
  }
]

// ... and 379 more files (download for full content)

About this extraction

This page is a plain-text extraction of the stanfordnlp/stanza GitHub repository, formatted for AI agents and large language models (LLMs). The full extraction includes 579 files (3.8 MB), approximately 1.0M tokens, and a symbol index with 3744 extracted functions, classes, methods, constants, and types. Only a preview is shown here; the complete output is available as a .txt download.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
