Repository: daniellibin/gaiic2021_track3_querySim
Branch: master
Commit: 08a8079e1ffd
Files: 701
Total size: 12.2 MB

Directory structure:
gitextract_r3nhpke0/

├── README.md
└── code/
    ├── .gitignore
    ├── Config.py
    ├── Dockerfile
    ├── NEZHA/
    │   ├── configuration_nezha.py
    │   └── modeling_nezha.py
    ├── bert-base-chinese/
    │   └── config.json
    ├── bert-base-count3/
    │   ├── finetuning/
    │   │   ├── .ipynb_checkpoints/
    │   │   │   └── PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb
    │   │   ├── Config.py
    │   │   ├── NEZHA/
    │   │   │   ├── configuration_nezha.py
    │   │   │   └── modeling_nezha.py
    │   │   ├── model.py
    │   │   ├── models/
    │   │   │   └── gitkeep
    │   │   ├── multi_gpu_QA.py
    │   │   └── utils.py
    │   └── pretrain/
    │       ├── NLP_Utils.py
    │       ├── __init__.py
    │       ├── bert_model/
    │       │   └── gitkeep
    │       ├── train_bert.py
    │       └── transformers1/
    │           ├── __init__.py
    │           ├── __main__.py
    │           ├── activations.py
    │           ├── another_try.py
    │           ├── benchmark/
    │           │   ├── __init__.py
    │           │   ├── benchmark.py
    │           │   ├── benchmark_args.py
    │           │   ├── benchmark_args_utils.py
    │           │   └── benchmark_utils.py
    │           ├── benchmark_utils.py
    │           ├── commands/
    │           │   ├── __init__.py
    │           │   ├── convert.py
    │           │   ├── download.py
    │           │   ├── env.py
    │           │   ├── run.py
    │           │   ├── serving.py
    │           │   ├── train.py
    │           │   ├── transformers_cli.py
    │           │   └── user.py
    │           ├── configuration_albert.py
    │           ├── configuration_auto.py
    │           ├── configuration_bart.py
    │           ├── configuration_bert.py
    │           ├── configuration_camembert.py
    │           ├── configuration_ctrl.py
    │           ├── configuration_distilbert.py
    │           ├── configuration_electra.py
    │           ├── configuration_encoder_decoder.py
    │           ├── configuration_flaubert.py
    │           ├── configuration_gpt2.py
    │           ├── configuration_longformer.py
    │           ├── configuration_marian.py
    │           ├── configuration_mmbt.py
    │           ├── configuration_openai.py
    │           ├── configuration_reformer.py
    │           ├── configuration_roberta.py
    │           ├── configuration_t5.py
    │           ├── configuration_transfo_xl.py
    │           ├── configuration_utils.py
    │           ├── configuration_xlm.py
    │           ├── configuration_xlm_roberta.py
    │           ├── configuration_xlnet.py
    │           ├── convert_albert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bart_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_bert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bert_pytorch_checkpoint_to_original_tf.py
    │           ├── convert_dialogpt_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_electra_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_gpt2_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_graph_to_onnx.py
    │           ├── convert_longformer_original_pytorch_lightning_to_pytorch.py
    │           ├── convert_marian_to_pytorch.py
    │           ├── convert_openai_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_pytorch_checkpoint_to_tf2.py
    │           ├── convert_reformer_trax_checkpoint_to_pytorch.py
    │           ├── convert_roberta_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_t5_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_transfo_xl_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_xlm_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_xlnet_original_tf_checkpoint_to_pytorch.py
    │           ├── data/
    │           │   ├── __init__.py
    │           │   ├── data_collator.py
    │           │   ├── datasets/
    │           │   │   ├── __init__.py
    │           │   │   ├── glue.py
    │           │   │   └── language_modeling.py
    │           │   ├── metrics/
    │           │   │   ├── __init__.py
    │           │   │   └── squad_metrics.py
    │           │   └── processors/
    │           │       ├── __init__.py
    │           │       ├── glue.py
    │           │       ├── squad.py
    │           │       ├── utils.py
    │           │       └── xnli.py
    │           ├── file.py
    │           ├── file_utils.py
    │           ├── filep.py
    │           ├── hf_api.py
    │           ├── hf_argparser.py
    │           ├── modelcard.py
    │           ├── modeling_albert.py
    │           ├── modeling_auto.py
    │           ├── modeling_bart.py
    │           ├── modeling_beam_search.py
    │           ├── modeling_bert.py
    │           ├── modeling_camembert.py
    │           ├── modeling_ctrl.py
    │           ├── modeling_distilbert.py
    │           ├── modeling_electra.py
    │           ├── modeling_encoder_decoder.py
    │           ├── modeling_flaubert.py
    │           ├── modeling_gpt2.py
    │           ├── modeling_longformer.py
    │           ├── modeling_marian.py
    │           ├── modeling_mmbt.py
    │           ├── modeling_openai.py
    │           ├── modeling_reformer.py
    │           ├── modeling_roberta.py
    │           ├── modeling_t5.py
    │           ├── modeling_tf_albert.py
    │           ├── modeling_tf_auto.py
    │           ├── modeling_tf_bert.py
    │           ├── modeling_tf_camembert.py
    │           ├── modeling_tf_ctrl.py
    │           ├── modeling_tf_distilbert.py
    │           ├── modeling_tf_electra.py
    │           ├── modeling_tf_flaubert.py
    │           ├── modeling_tf_gpt2.py
    │           ├── modeling_tf_openai.py
    │           ├── modeling_tf_pytorch_utils.py
    │           ├── modeling_tf_roberta.py
    │           ├── modeling_tf_t5.py
    │           ├── modeling_tf_transfo_xl.py
    │           ├── modeling_tf_transfo_xl_utilities.py
    │           ├── modeling_tf_utils.py
    │           ├── modeling_tf_xlm.py
    │           ├── modeling_tf_xlm_roberta.py
    │           ├── modeling_tf_xlnet.py
    │           ├── modeling_transfo_xl.py
    │           ├── modeling_transfo_xl_utilities.py
    │           ├── modeling_utils.py
    │           ├── modeling_xlm.py
    │           ├── modeling_xlm_roberta.py
    │           ├── modeling_xlnet.py
    │           ├── optimization.py
    │           ├── optimization_tf.py
    │           ├── pipelines.py
    │           ├── tokenization_albert.py
    │           ├── tokenization_auto.py
    │           ├── tokenization_bart.py
    │           ├── tokenization_bert.py
    │           ├── tokenization_bert_japanese.py
    │           ├── tokenization_camembert.py
    │           ├── tokenization_ctrl.py
    │           ├── tokenization_distilbert.py
    │           ├── tokenization_electra.py
    │           ├── tokenization_flaubert.py
    │           ├── tokenization_gpt2.py
    │           ├── tokenization_longformer.py
    │           ├── tokenization_marian.py
    │           ├── tokenization_openai.py
    │           ├── tokenization_reformer.py
    │           ├── tokenization_roberta.py
    │           ├── tokenization_t5.py
    │           ├── tokenization_transfo_xl.py
    │           ├── tokenization_utils.py
    │           ├── tokenization_xlm.py
    │           ├── tokenization_xlm_roberta.py
    │           ├── tokenization_xlnet.py
    │           ├── trainer.py
    │           ├── trainer_tf.py
    │           ├── trainer_utils.py
    │           ├── training_args.py
    │           ├── training_args_tf.py
    │           ├── try.py
    │           └── utils_encoder_decoder.py
    ├── bert-base-count3-len100/
    │   └── finetuning/
    │       ├── .ipynb_checkpoints/
    │       │   └── PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb
    │       ├── Config.py
    │       ├── NEZHA/
    │       │   ├── configuration_nezha.py
    │       │   └── modeling_nezha.py
    │       ├── model.py
    │       ├── models/
    │       │   └── gitkeep
    │       ├── multi_gpu_QA.py
    │       └── utils.py
    ├── bert-base-count5/
    │   ├── finetuning/
    │   │   ├── .ipynb_checkpoints/
    │   │   │   └── PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb
    │   │   ├── Config.py
    │   │   ├── NEZHA/
    │   │   │   ├── configuration_nezha.py
    │   │   │   └── modeling_nezha.py
    │   │   ├── model.py
    │   │   ├── models/
    │   │   │   └── gitkeep
    │   │   ├── multi_gpu_QA.py
    │   │   └── utils.py
    │   └── pretrain/
    │       ├── NLP_Utils.py
    │       ├── __init__.py
    │       ├── bert_model/
    │       │   └── gitkeep
    │       ├── train_bert.py
    │       └── transformers1/
    │           ├── __init__.py
    │           ├── __main__.py
    │           ├── activations.py
    │           ├── another_try.py
    │           ├── benchmark/
    │           │   ├── __init__.py
    │           │   ├── benchmark.py
    │           │   ├── benchmark_args.py
    │           │   ├── benchmark_args_utils.py
    │           │   └── benchmark_utils.py
    │           ├── benchmark_utils.py
    │           ├── commands/
    │           │   ├── __init__.py
    │           │   ├── convert.py
    │           │   ├── download.py
    │           │   ├── env.py
    │           │   ├── run.py
    │           │   ├── serving.py
    │           │   ├── train.py
    │           │   ├── transformers_cli.py
    │           │   └── user.py
    │           ├── configuration_albert.py
    │           ├── configuration_auto.py
    │           ├── configuration_bart.py
    │           ├── configuration_bert.py
    │           ├── configuration_camembert.py
    │           ├── configuration_ctrl.py
    │           ├── configuration_distilbert.py
    │           ├── configuration_electra.py
    │           ├── configuration_encoder_decoder.py
    │           ├── configuration_flaubert.py
    │           ├── configuration_gpt2.py
    │           ├── configuration_longformer.py
    │           ├── configuration_marian.py
    │           ├── configuration_mmbt.py
    │           ├── configuration_openai.py
    │           ├── configuration_reformer.py
    │           ├── configuration_roberta.py
    │           ├── configuration_t5.py
    │           ├── configuration_transfo_xl.py
    │           ├── configuration_utils.py
    │           ├── configuration_xlm.py
    │           ├── configuration_xlm_roberta.py
    │           ├── configuration_xlnet.py
    │           ├── convert_albert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bart_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_bert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bert_pytorch_checkpoint_to_original_tf.py
    │           ├── convert_dialogpt_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_electra_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_gpt2_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_graph_to_onnx.py
    │           ├── convert_longformer_original_pytorch_lightning_to_pytorch.py
    │           ├── convert_marian_to_pytorch.py
    │           ├── convert_openai_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_pytorch_checkpoint_to_tf2.py
    │           ├── convert_reformer_trax_checkpoint_to_pytorch.py
    │           ├── convert_roberta_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_t5_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_transfo_xl_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_xlm_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_xlnet_original_tf_checkpoint_to_pytorch.py
    │           ├── data/
    │           │   ├── __init__.py
    │           │   ├── data_collator.py
    │           │   ├── datasets/
    │           │   │   ├── __init__.py
    │           │   │   ├── glue.py
    │           │   │   └── language_modeling.py
    │           │   ├── metrics/
    │           │   │   ├── __init__.py
    │           │   │   └── squad_metrics.py
    │           │   └── processors/
    │           │       ├── __init__.py
    │           │       ├── glue.py
    │           │       ├── squad.py
    │           │       ├── utils.py
    │           │       └── xnli.py
    │           ├── file.py
    │           ├── file_utils.py
    │           ├── filep.py
    │           ├── hf_api.py
    │           ├── hf_argparser.py
    │           ├── modelcard.py
    │           ├── modeling_albert.py
    │           ├── modeling_auto.py
    │           ├── modeling_bart.py
    │           ├── modeling_beam_search.py
    │           ├── modeling_bert.py
    │           ├── modeling_camembert.py
    │           ├── modeling_ctrl.py
    │           ├── modeling_distilbert.py
    │           ├── modeling_electra.py
    │           ├── modeling_encoder_decoder.py
    │           ├── modeling_flaubert.py
    │           ├── modeling_gpt2.py
    │           ├── modeling_longformer.py
    │           ├── modeling_marian.py
    │           ├── modeling_mmbt.py
    │           ├── modeling_openai.py
    │           ├── modeling_reformer.py
    │           ├── modeling_roberta.py
    │           ├── modeling_t5.py
    │           ├── modeling_tf_albert.py
    │           ├── modeling_tf_auto.py
    │           ├── modeling_tf_bert.py
    │           ├── modeling_tf_camembert.py
    │           ├── modeling_tf_ctrl.py
    │           ├── modeling_tf_distilbert.py
    │           ├── modeling_tf_electra.py
    │           ├── modeling_tf_flaubert.py
    │           ├── modeling_tf_gpt2.py
    │           ├── modeling_tf_openai.py
    │           ├── modeling_tf_pytorch_utils.py
    │           ├── modeling_tf_roberta.py
    │           ├── modeling_tf_t5.py
    │           ├── modeling_tf_transfo_xl.py
    │           ├── modeling_tf_transfo_xl_utilities.py
    │           ├── modeling_tf_utils.py
    │           ├── modeling_tf_xlm.py
    │           ├── modeling_tf_xlm_roberta.py
    │           ├── modeling_tf_xlnet.py
    │           ├── modeling_transfo_xl.py
    │           ├── modeling_transfo_xl_utilities.py
    │           ├── modeling_utils.py
    │           ├── modeling_xlm.py
    │           ├── modeling_xlm_roberta.py
    │           ├── modeling_xlnet.py
    │           ├── optimization.py
    │           ├── optimization_tf.py
    │           ├── pipelines.py
    │           ├── tokenization_albert.py
    │           ├── tokenization_auto.py
    │           ├── tokenization_bart.py
    │           ├── tokenization_bert.py
    │           ├── tokenization_bert_japanese.py
    │           ├── tokenization_camembert.py
    │           ├── tokenization_ctrl.py
    │           ├── tokenization_distilbert.py
    │           ├── tokenization_electra.py
    │           ├── tokenization_flaubert.py
    │           ├── tokenization_gpt2.py
    │           ├── tokenization_longformer.py
    │           ├── tokenization_marian.py
    │           ├── tokenization_openai.py
    │           ├── tokenization_reformer.py
    │           ├── tokenization_roberta.py
    │           ├── tokenization_t5.py
    │           ├── tokenization_transfo_xl.py
    │           ├── tokenization_utils.py
    │           ├── tokenization_xlm.py
    │           ├── tokenization_xlm_roberta.py
    │           ├── tokenization_xlnet.py
    │           ├── trainer.py
    │           ├── trainer_tf.py
    │           ├── trainer_utils.py
    │           ├── training_args.py
    │           ├── training_args_tf.py
    │           ├── try.py
    │           └── utils_encoder_decoder.py
    ├── bert-base-count5-len32/
    │   └── finetuning/
    │       ├── .ipynb_checkpoints/
    │       │   └── PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb
    │       ├── Config.py
    │       ├── NEZHA/
    │       │   ├── configuration_nezha.py
    │       │   └── modeling_nezha.py
    │       ├── model.py
    │       ├── models/
    │       │   └── gitkeep
    │       ├── multi_gpu_QA.py
    │       └── utils.py
    ├── build_vocab.py
    ├── docker_build.sh
    ├── main_fusion_thread.py
    ├── model.py
    ├── nezha-base-count3/
    │   ├── finetuning/
    │   │   ├── .ipynb_checkpoints/
    │   │   │   └── PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb
    │   │   ├── Config.py
    │   │   ├── NEZHA/
    │   │   │   ├── configuration_nezha.py
    │   │   │   └── modeling_nezha.py
    │   │   ├── model.py
    │   │   ├── models/
    │   │   │   └── gitkeep
    │   │   ├── multi_gpu_QA.py
    │   │   └── utils.py
    │   └── pretrain/
    │       ├── NEZHA/
    │       │   ├── configuration_nezha.py
    │       │   └── modeling_nezha.py
    │       ├── NLP_Utils.py
    │       ├── __init__.py
    │       ├── nezha_model/
    │       │   └── gitkeep
    │       ├── train_nezha.py
    │       └── transformers1/
    │           ├── __init__.py
    │           ├── __main__.py
    │           ├── activations.py
    │           ├── another_try.py
    │           ├── benchmark/
    │           │   ├── __init__.py
    │           │   ├── benchmark.py
    │           │   ├── benchmark_args.py
    │           │   ├── benchmark_args_utils.py
    │           │   └── benchmark_utils.py
    │           ├── benchmark_utils.py
    │           ├── commands/
    │           │   ├── __init__.py
    │           │   ├── convert.py
    │           │   ├── download.py
    │           │   ├── env.py
    │           │   ├── run.py
    │           │   ├── serving.py
    │           │   ├── train.py
    │           │   ├── transformers_cli.py
    │           │   └── user.py
    │           ├── configuration_albert.py
    │           ├── configuration_auto.py
    │           ├── configuration_bart.py
    │           ├── configuration_bert.py
    │           ├── configuration_camembert.py
    │           ├── configuration_ctrl.py
    │           ├── configuration_distilbert.py
    │           ├── configuration_electra.py
    │           ├── configuration_encoder_decoder.py
    │           ├── configuration_flaubert.py
    │           ├── configuration_gpt2.py
    │           ├── configuration_longformer.py
    │           ├── configuration_marian.py
    │           ├── configuration_mmbt.py
    │           ├── configuration_openai.py
    │           ├── configuration_reformer.py
    │           ├── configuration_roberta.py
    │           ├── configuration_t5.py
    │           ├── configuration_transfo_xl.py
    │           ├── configuration_utils.py
    │           ├── configuration_xlm.py
    │           ├── configuration_xlm_roberta.py
    │           ├── configuration_xlnet.py
    │           ├── convert_albert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bart_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_bert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bert_pytorch_checkpoint_to_original_tf.py
    │           ├── convert_dialogpt_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_electra_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_gpt2_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_graph_to_onnx.py
    │           ├── convert_longformer_original_pytorch_lightning_to_pytorch.py
    │           ├── convert_marian_to_pytorch.py
    │           ├── convert_openai_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_pytorch_checkpoint_to_tf2.py
    │           ├── convert_reformer_trax_checkpoint_to_pytorch.py
    │           ├── convert_roberta_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_t5_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_transfo_xl_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_xlm_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_xlnet_original_tf_checkpoint_to_pytorch.py
    │           ├── data/
    │           │   ├── __init__.py
    │           │   ├── data_collator.py
    │           │   ├── datasets/
    │           │   │   ├── __init__.py
    │           │   │   ├── glue.py
    │           │   │   └── language_modeling.py
    │           │   ├── metrics/
    │           │   │   ├── __init__.py
    │           │   │   └── squad_metrics.py
    │           │   └── processors/
    │           │       ├── __init__.py
    │           │       ├── glue.py
    │           │       ├── squad.py
    │           │       ├── utils.py
    │           │       └── xnli.py
    │           ├── file.py
    │           ├── file_utils.py
    │           ├── filep.py
    │           ├── hf_api.py
    │           ├── hf_argparser.py
    │           ├── modelcard.py
    │           ├── modeling_albert.py
    │           ├── modeling_auto.py
    │           ├── modeling_bart.py
    │           ├── modeling_beam_search.py
    │           ├── modeling_bert.py
    │           ├── modeling_camembert.py
    │           ├── modeling_ctrl.py
    │           ├── modeling_distilbert.py
    │           ├── modeling_electra.py
    │           ├── modeling_encoder_decoder.py
    │           ├── modeling_flaubert.py
    │           ├── modeling_gpt2.py
    │           ├── modeling_longformer.py
    │           ├── modeling_marian.py
    │           ├── modeling_mmbt.py
    │           ├── modeling_openai.py
    │           ├── modeling_reformer.py
    │           ├── modeling_roberta.py
    │           ├── modeling_t5.py
    │           ├── modeling_tf_albert.py
    │           ├── modeling_tf_auto.py
    │           ├── modeling_tf_bert.py
    │           ├── modeling_tf_camembert.py
    │           ├── modeling_tf_ctrl.py
    │           ├── modeling_tf_distilbert.py
    │           ├── modeling_tf_electra.py
    │           ├── modeling_tf_flaubert.py
    │           ├── modeling_tf_gpt2.py
    │           ├── modeling_tf_openai.py
    │           ├── modeling_tf_pytorch_utils.py
    │           ├── modeling_tf_roberta.py
    │           ├── modeling_tf_t5.py
    │           ├── modeling_tf_transfo_xl.py
    │           ├── modeling_tf_transfo_xl_utilities.py
    │           ├── modeling_tf_utils.py
    │           ├── modeling_tf_xlm.py
    │           ├── modeling_tf_xlm_roberta.py
    │           ├── modeling_tf_xlnet.py
    │           ├── modeling_transfo_xl.py
    │           ├── modeling_transfo_xl_utilities.py
    │           ├── modeling_utils.py
    │           ├── modeling_xlm.py
    │           ├── modeling_xlm_roberta.py
    │           ├── modeling_xlnet.py
    │           ├── optimization.py
    │           ├── optimization_tf.py
    │           ├── pipelines.py
    │           ├── tokenization_albert.py
    │           ├── tokenization_auto.py
    │           ├── tokenization_bart.py
    │           ├── tokenization_bert.py
    │           ├── tokenization_bert_japanese.py
    │           ├── tokenization_camembert.py
    │           ├── tokenization_ctrl.py
    │           ├── tokenization_distilbert.py
    │           ├── tokenization_electra.py
    │           ├── tokenization_flaubert.py
    │           ├── tokenization_gpt2.py
    │           ├── tokenization_longformer.py
    │           ├── tokenization_marian.py
    │           ├── tokenization_openai.py
    │           ├── tokenization_reformer.py
    │           ├── tokenization_roberta.py
    │           ├── tokenization_t5.py
    │           ├── tokenization_transfo_xl.py
    │           ├── tokenization_utils.py
    │           ├── tokenization_xlm.py
    │           ├── tokenization_xlm_roberta.py
    │           ├── tokenization_xlnet.py
    │           ├── trainer.py
    │           ├── trainer_tf.py
    │           ├── trainer_utils.py
    │           ├── training_args.py
    │           ├── training_args_tf.py
    │           ├── try.py
    │           └── utils_encoder_decoder.py
    ├── nezha-base-count5/
    │   ├── finetuning/
    │   │   ├── .ipynb_checkpoints/
    │   │   │   └── PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb
    │   │   ├── Config.py
    │   │   ├── NEZHA/
    │   │   │   ├── configuration_nezha.py
    │   │   │   └── modeling_nezha.py
    │   │   ├── model.py
    │   │   ├── models/
    │   │   │   └── gitkeep
    │   │   ├── multi_gpu_QA.py
    │   │   └── utils.py
    │   └── pretrain/
    │       ├── NEZHA/
    │       │   ├── configuration_nezha.py
    │       │   └── modeling_nezha.py
    │       ├── NLP_Utils.py
    │       ├── __init__.py
    │       ├── nezha_model/
    │       │   └── gitkeep
    │       ├── train_nezha.py
    │       └── transformers1/
    │           ├── __init__.py
    │           ├── __main__.py
    │           ├── activations.py
    │           ├── another_try.py
    │           ├── benchmark/
    │           │   ├── __init__.py
    │           │   ├── benchmark.py
    │           │   ├── benchmark_args.py
    │           │   ├── benchmark_args_utils.py
    │           │   └── benchmark_utils.py
    │           ├── benchmark_utils.py
    │           ├── commands/
    │           │   ├── __init__.py
    │           │   ├── convert.py
    │           │   ├── download.py
    │           │   ├── env.py
    │           │   ├── run.py
    │           │   ├── serving.py
    │           │   ├── train.py
    │           │   ├── transformers_cli.py
    │           │   └── user.py
    │           ├── configuration_albert.py
    │           ├── configuration_auto.py
    │           ├── configuration_bart.py
    │           ├── configuration_bert.py
    │           ├── configuration_camembert.py
    │           ├── configuration_ctrl.py
    │           ├── configuration_distilbert.py
    │           ├── configuration_electra.py
    │           ├── configuration_encoder_decoder.py
    │           ├── configuration_flaubert.py
    │           ├── configuration_gpt2.py
    │           ├── configuration_longformer.py
    │           ├── configuration_marian.py
    │           ├── configuration_mmbt.py
    │           ├── configuration_openai.py
    │           ├── configuration_reformer.py
    │           ├── configuration_roberta.py
    │           ├── configuration_t5.py
    │           ├── configuration_transfo_xl.py
    │           ├── configuration_utils.py
    │           ├── configuration_xlm.py
    │           ├── configuration_xlm_roberta.py
    │           ├── configuration_xlnet.py
    │           ├── convert_albert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bart_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_bert_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_bert_pytorch_checkpoint_to_original_tf.py
    │           ├── convert_dialogpt_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_electra_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_gpt2_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_graph_to_onnx.py
    │           ├── convert_longformer_original_pytorch_lightning_to_pytorch.py
    │           ├── convert_marian_to_pytorch.py
    │           ├── convert_openai_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_pytorch_checkpoint_to_tf2.py
    │           ├── convert_reformer_trax_checkpoint_to_pytorch.py
    │           ├── convert_roberta_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_t5_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_transfo_xl_original_tf_checkpoint_to_pytorch.py
    │           ├── convert_xlm_original_pytorch_checkpoint_to_pytorch.py
    │           ├── convert_xlnet_original_tf_checkpoint_to_pytorch.py
    │           ├── data/
    │           │   ├── __init__.py
    │           │   ├── data_collator.py
    │           │   ├── datasets/
    │           │   │   ├── __init__.py
    │           │   │   ├── glue.py
    │           │   │   └── language_modeling.py
    │           │   ├── metrics/
    │           │   │   ├── __init__.py
    │           │   │   └── squad_metrics.py
    │           │   └── processors/
    │           │       ├── __init__.py
    │           │       ├── glue.py
    │           │       ├── squad.py
    │           │       ├── utils.py
    │           │       └── xnli.py
    │           ├── file.py
    │           ├── file_utils.py
    │           ├── filep.py
    │           ├── hf_api.py
    │           ├── hf_argparser.py
    │           ├── modelcard.py
    │           ├── modeling_albert.py
    │           ├── modeling_auto.py
    │           ├── modeling_bart.py
    │           ├── modeling_beam_search.py
    │           ├── modeling_bert.py
    │           ├── modeling_camembert.py
    │           ├── modeling_ctrl.py
    │           ├── modeling_distilbert.py
    │           ├── modeling_electra.py
    │           ├── modeling_encoder_decoder.py
    │           ├── modeling_flaubert.py
    │           ├── modeling_gpt2.py
    │           ├── modeling_longformer.py
    │           ├── modeling_marian.py
    │           ├── modeling_mmbt.py
    │           ├── modeling_openai.py
    │           ├── modeling_reformer.py
    │           ├── modeling_roberta.py
    │           ├── modeling_t5.py
    │           ├── modeling_tf_albert.py
    │           ├── modeling_tf_auto.py
    │           ├── modeling_tf_bert.py
    │           ├── modeling_tf_camembert.py
    │           ├── modeling_tf_ctrl.py
    │           ├── modeling_tf_distilbert.py
    │           ├── modeling_tf_electra.py
    │           ├── modeling_tf_flaubert.py
    │           ├── modeling_tf_gpt2.py
    │           ├── modeling_tf_openai.py
    │           ├── modeling_tf_pytorch_utils.py
    │           ├── modeling_tf_roberta.py
    │           ├── modeling_tf_t5.py
    │           ├── modeling_tf_transfo_xl.py
    │           ├── modeling_tf_transfo_xl_utilities.py
    │           ├── modeling_tf_utils.py
    │           ├── modeling_tf_xlm.py
    │           ├── modeling_tf_xlm_roberta.py
    │           ├── modeling_tf_xlnet.py
    │           ├── modeling_transfo_xl.py
    │           ├── modeling_transfo_xl_utilities.py
    │           ├── modeling_utils.py
    │           ├── modeling_xlm.py
    │           ├── modeling_xlm_roberta.py
    │           ├── modeling_xlnet.py
    │           ├── optimization.py
    │           ├── optimization_tf.py
    │           ├── pipelines.py
    │           ├── tokenization_albert.py
    │           ├── tokenization_auto.py
    │           ├── tokenization_bart.py
    │           ├── tokenization_bert.py
    │           ├── tokenization_bert_japanese.py
    │           ├── tokenization_camembert.py
    │           ├── tokenization_ctrl.py
    │           ├── tokenization_distilbert.py
    │           ├── tokenization_electra.py
    │           ├── tokenization_flaubert.py
    │           ├── tokenization_gpt2.py
    │           ├── tokenization_longformer.py
    │           ├── tokenization_marian.py
    │           ├── tokenization_openai.py
    │           ├── tokenization_reformer.py
    │           ├── tokenization_roberta.py
    │           ├── tokenization_t5.py
    │           ├── tokenization_transfo_xl.py
    │           ├── tokenization_utils.py
    │           ├── tokenization_xlm.py
    │           ├── tokenization_xlm_roberta.py
    │           ├── tokenization_xlnet.py
    │           ├── trainer.py
    │           ├── trainer_tf.py
    │           ├── trainer_utils.py
    │           ├── training_args.py
    │           ├── training_args_tf.py
    │           ├── try.py
    │           └── utils_encoder_decoder.py
    ├── nezha-cn-base/
    │   ├── config.json
    │   └── vocab.txt
    ├── requirements.txt
    ├── run.sh
    ├── serial_main_fusion_thread.py
    └── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# 0. Preface

Some time has passed since the final defense. Our team, ac milan, finished 3rd in the semifinal and 4th in the final. First of all, thanks to my teammates for the carry~

Over the two-plus months of the competition I learned a great deal and met many strong competitors; this write-up records my experience and what I learned along the way.

[GitHub repo]: https://github.com/daniellibin/gaiic2021_track3_querySim
[Competition page]: https://tianchi.aliyun.com/competition/entrance/531851/introduction

# 1. Background

Xiaobu Assistant (小布助手) is the voice assistant OPPO developed in-house for the OPLUS group's three phone brands and IoT devices, providing users with fun, considerate, and convenient conversational services. Intent recognition is a core task in dialogue systems, and semantic matching of short dialogue texts is one of the mainstream approaches to it. This competition asks teams to predict whether a de-identified short-text query pair has the same meaning; submissions are scored and ranked on online evaluation data using the specified metric, and the best score wins.

# 2. Task Description and Data

- ### Training data

  The training data consists of input query pairs and their ground-truth labels. The preliminary round provides 100k training samples and the semifinal 300k. This data is meant for model training, and to guarantee quality every label was manually verified. Each line is one training sample, composed of a query pair and a label, in the following format:

  - Query-pair format: queries are mostly Chinese, possibly containing a few English words (abbreviations, brand names, device model names, etc.); UTF-8 encoded, not word-segmented; the two queries are separated by \t.
  - Label: 0 or 1, where 1 means the query pair matches semantically and 0 means it does not; the label is likewise separated from the query pair by \t (see the parsing sketch below).

  ### Training samples (fields separated by \t):

  ```
  肖战的粉丝叫什么名字 肖战的粉丝叫什么 1
  
  王者荣耀里面打野谁最厉害 王者荣耀什么英雄最好玩 0
  
  我想换个手机 我要换手机 1
  
  我是张睿 我想张睿 0
  
  不想 不想说 0
  ```
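
  A minimal sketch of reading this format (illustrative only; the function name is ours, and the repo's own loading code in code/utils.py may differ):

  ```python
  def read_pairs(path, labeled=True):
      """Parse lines of tab-separated query pairs, optionally followed by a 0/1 label."""
      rows = []
      with open(path, encoding='utf-8') as f:
          for line in f:
              if not line.strip():
                  continue  # skip blank lines
              parts = line.rstrip('\n').split('\t')
              if labeled:
                  q1, q2, y = parts
                  rows.append((q1, q2, int(y)))
              else:
                  rows.append(tuple(parts))
      return rows
  ```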

  ### Test data

  De-identified query pairs. The preliminary round uses an A/B leaderboard scheme, with 25k samples each in the A and B sets (release dates per the competition schedule); teams advance to the semifinal based on the preliminary B-leaderboard ranking. The semifinal likewise uses A/B leaderboards, with 50k samples (disjoint from the preliminary data), and teams advance to the onsite defense based on the semifinal B-leaderboard ranking.

  ### Test samples (fields separated by \t)

  ```
  肖战的粉丝叫什么名字 肖战的粉丝叫什么
  
  王者荣耀里面打野谁最厉害 王者荣耀什么英雄最好玩
  
  我想换个手机 我要换手机
  
  我是张睿 我想张睿
  
  不想 不想说
  ```

# 3. Evaluation

Evaluation combines a performance criterion and an effectiveness criterion; the preliminary round uses the effectiveness criterion, measured by `AUC`.
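
For reference, this is the standard ROC-AUC computed over the predicted match probabilities, e.g. with scikit-learn:

```python
from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_true, y_prob)  # y_prob: predicted P(label == 1) per pair
```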

# 4. Overall Design

![image-20210619163346172](README.assets/image-20210619163346172.png)

## (1) Pretraining

#### a. Model selection

The competition data is de-identified, which makes it effectively a new language: open-source pretrained models cannot be transferred directly.

Pretraining is nevertheless essential. With limited data we want to exploit the information in it as fully as possible, and BERT's MLM pretraining objective can learn text representations, linguistic knowledge, and world knowledge from unsupervised text.

We chose BERT and its variant NEZHA; their main difference is absolute vs. relative position encoding.

With the later model fusion in mind, and since the online environment provides four GPUs, we pretrained four models, each with roughly 100M parameters.

![image-20210619163530653](README.assets/image-20210619163530653.png)

#### b. Masking strategy

The model input is the classic concatenation: [CLS] s1 [SEP] s2 [SEP]

Duality: s1 and s2 are swapped with 50% probability, a data augmentation that leaves the semantics intact.

Length-adaptive dynamic N-gram masking (a sketch follows this list):
- Dynamic masking: pretraining runs for 400 epochs, i.e. millions of iterations, so a fresh random mask can be generated at every iteration, improving the model's generalization.
- N-gram masking: tokens are selected with 15% probability; to raise the training difficulty, a selected token starts a 1-gram, 2-gram, or 3-gram masked span with probability 70%, 20%, or 10% (selected tokens are replaced by [MASK], a random token, or kept as-is with the same probabilities as in the original BERT).
- Length adaptation: masking a long gram in a short text damages its semantics too much, so texts shorter than 7 tokens receive no 3-gram masks and texts shorter than 4 tokens no 2-gram masks.
- Avoiding unlikely long masks: after a masked span, the next token's mask is forcibly skipped, preventing long runs of consecutive masks.
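
A minimal sketch of the masking scheme above, assuming `tokens` is the token list for one text, `vocab` maps tokens to ids, and -100 is the MLM label ignore-index; the function name is ours, special-token handling is omitted, and the 80/10/10 replacement split matches the original BERT as stated:

```python
import random

def ngram_mask(tokens, vocab, mask_prob=0.15):
    """Length-adaptive dynamic n-gram masking (illustrative sketch)."""
    n = len(tokens)
    # Length adaptation: shorter texts allow only shorter grams.
    if n < 4:
        grams, weights = [1], [1.0]
    elif n < 7:
        grams, weights = [1, 2], [0.7, 0.2]
    else:
        grams, weights = [1, 2, 3], [0.7, 0.2, 0.1]

    out, labels = list(tokens), [-100] * n  # -100 = ignored by the MLM loss
    i = 0
    while i < n:
        if random.random() < mask_prob:
            g = random.choices(grams, weights=weights)[0]
            for j in range(i, min(i + g, n)):
                labels[j] = vocab[out[j]]
                r = random.random()          # 80/10/10 split as in original BERT
                if r < 0.8:
                    out[j] = '[MASK]'
                elif r < 0.9:
                    out[j] = random.choice(list(vocab))
            i += g + 1                        # force-skip the token after a span
        else:
            i += 1
    return out, labels
```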

#### c. Other tricks and parameter settings

- Learning-rate warmup and decay

  - Pretraining runs for 400 epochs; over the first 4.5 epochs the learning rate grows linearly from 0 to 5e-5, then decays linearly to 1e-5 (see the scheduler sketch after this list).

- Block shuffle

  - Pretraining takes long, so runtime optimization matters: block shuffle groups samples of similar length into batches ("blocks") and shuffles across blocks, cutting the computation spent on padding. It reduced wall time by about 40% and, in our tests, does not hurt model quality.

- Weight decay

  - Constrains the magnitude of the network weights, mitigating overfitting.

- Common settings of the four models

  ![image-20210619170554408](README.assets/image-20210619170554408.png)
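
A minimal sketch of this schedule (the helper name is ours). The stock `transformers.get_linear_schedule_with_warmup` decays to 0, while this run decays to a 1e-5 floor, hence a custom lambda; the optimizer's base lr is assumed to be the 5e-5 peak:

```python
from torch.optim.lr_scheduler import LambdaLR

def linear_warmup_then_decay(optimizer, total_steps, warmup_steps,
                             peak_lr=5e-5, final_lr=1e-5):
    """Scale the base lr (= peak_lr) linearly 0 -> 1 over warmup_steps,
    then linearly 1 -> final_lr/peak_lr over the remaining steps."""
    def factor(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 1.0 - progress * (1.0 - final_lr / peak_lr)
    return LambdaLR(optimizer, lr_lambda=factor)
```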

## (2) Fine-tuning

#### a. Model setup

- Pretraining exploits the unsupervised information in the text; fine-tuning then uses the supervised sentence-pair labels, casting the task as binary classification: match vs. no match.

- On top of the 4 pretrained models we trained 6 fine-tuned models, differentiated by vocabulary, truncation length, and model structure so that they stay diverse for the later model fusion. Their settings compare as follows:

  ![image-20210619170702479](README.assets/image-20210619170702479.png)

#### b. Classification heads

- Three head structures attached on top of Bert/Nezha

  ![image-20210619170927378](README.assets/image-20210619170927378.png)

BERT is already a strong feature extractor, and the runtime and inference limits are strict, so only simple heads are attached.

#### c. Tricks

- Learning rate

    - Warmup and decay: a small learning rate early in training lets the model stabilize gradually; once relatively stable, training continues at the preset learning rate, which speeds up convergence. Decaying the learning rate afterwards lets the model converge to a better optimum and improves the final score.
    - Different models use different learning rates (2e-5 or 4e-5).

- For model fusion, average the logits (weighted) first and apply softmax afterwards

  - Softmax is then no longer computed independently per model; it combines the information from all models (see the sketch in section (3)).

- Adversarial training (a sketch follows this list)

  - Adversarial training injects noise during training, which regularizes the parameters and improves robustness and generalization.
    Fast Gradient Method (FGM): perturbs the embedding layer along the gradient direction.
    Projected Gradient Descent (PGD): applies the perturbation iteratively, projecting it back into the allowed range at every step.
    We experimented with both FGM and PGD; FGM was faster and scored better.
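
A sketch of FGM in its widely used formulation (the class name and the `word_embeddings` parameter filter are assumptions; epsilon=0.25 per section d below):

```python
import torch

class FGM:
    """Fast Gradient Method: one-step perturbation of the embedding weights
    along the normalized gradient direction."""
    def __init__(self, model, epsilon=0.25, emb_name='word_embeddings'):
        self.model, self.epsilon, self.emb_name = model, epsilon, emb_name
        self.backup = {}

    def attack(self):
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical use per batch:
#   loss.backward()       # 1. gradients on the clean input
#   fgm.attack()          # 2. perturb the embeddings
#   adv_loss.backward()   # 3. accumulate adversarial gradients
#   fgm.restore()         # 4. undo the perturbation
#   optimizer.step(); optimizer.zero_grad()
```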

#### d. Common parameters

Best settings:
- batch_size=32: with thorough pretraining, fine-tuning converges very quickly; a small batch size brings more randomness, making it harder to get stuck in a local optimum too early.
- epoch=3
- dropout=0.2: dropping neurons with some probability during training mitigates overfitting.
- FGM with epsilon=0.25 worked best.

## (3) Model Fusion and Inference

![image-20210619171224369](README.assets/image-20210619171224369.png)
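
A minimal sketch of the logits-level fusion described in (2)c (uniform weights here are an assumption; the actual fusion weights are not stated):

```python
import torch.nn.functional as F

def fused_probs(logits_list, weights=None):
    """Weighted-average the per-model logits first, then softmax once,
    so normalization sees the evidence from all models jointly."""
    if weights is None:
        weights = [1.0 / len(logits_list)] * len(logits_list)  # assumed uniform
    fused = sum(w * l for w, l in zip(weights, logits_list))
    return F.softmax(fused, dim=-1)
```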

## (4) Performance Optimization

#### a. Block shuffle

- The competition caps total online runtime at 80 hours and inference over the 50k test set at 15 minutes (network overhead included), so performance optimization is critical.

  - Block shuffle groups samples of similar length into batches ("blocks") and shuffles across blocks, cutting the computation spent on padding (sketch after this list); pretraining time dropped by about 40%.

  - Online, one pretraining epoch takes a little over 9 minutes, so 400 epochs finish within 65 hours.

    ![image-20210619171438518](README.assets/image-20210619171438518.png)
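
A minimal sketch of block shuffle (the function name is ours). Sorting by length makes each batch nearly padding-free; shuffling the batches, rather than individual samples, keeps the training order random at the block level:

```python
import random

def block_shuffle(samples, batch_size):
    """Sort by length, cut into fixed-size batches of similar-length samples,
    then shuffle the batches themselves."""
    ordered = sorted(samples, key=len)
    batches = [ordered[i:i + batch_size]
               for i in range(0, len(ordered), batch_size)]
    random.shuffle(batches)
    return batches
```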

#### b. Inference acceleration

- ONNX Runtime: ONNX Runtime is an inference engine for machine-learning models that uses built-in graph optimization and various hardware-acceleration features to optimize and speed up inference. A Transformer model like BERT is a graph of many operators; ONNX Runtime's built-in graph optimizer can simplify the graph and reduce the node count, then perform more complex node fusion and layout optimization. Using ONNX Runtime gave the inference stage a very substantial speedup (an export-and-run sketch follows the figure).

![image-20210619171514789](README.assets/image-20210619171514789.png)
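
A minimal export-and-run sketch, assuming `model` is one of the fine-tuned PyTorch models in eval mode; the input names, sequence length, and file name are illustrative:

```python
import torch
import onnxruntime as ort

names = ('input_ids', 'attention_mask', 'token_type_ids')
dummy = tuple(torch.ones(1, 32, dtype=torch.long) for _ in names)

torch.onnx.export(
    model, dummy, 'model.onnx',
    input_names=list(names), output_names=['logits'],
    dynamic_axes={n: {0: 'batch', 1: 'seq'} for n in names},
)

# With the onnxruntime-gpu wheel installed, the CUDA execution provider is
# picked up; graph optimizations are enabled by default.
sess = ort.InferenceSession('model.onnx')
feeds = {n: t.numpy() for n, t in zip(names, dummy)}
logits = sess.run(['logits'], feeds)[0]
```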

#### c. Tuning the CUDA version

- In the widely used CUDA 11 image, we found the online V100s slow. From past project experience, older cards do not always perform best on newer CUDA versions, so we switched the image to CUDA 10.2 with the matching cuDNN 7 and onnxruntime-gpu 1.5.1. Inference sped up considerably, letting us run 6 models within the 15 minutes (previously 4).

![image-20210619171554475](README.assets/image-20210619171554475.png)

#### d. Other details

- Reduce host-to-device transfer overhead: avoid moving tensors from RAM to GPU memory with .to('cuda'); instead, create them on the GPU from the start with torch.tensor(xxx, device='cuda').

- Write a faster tokenization function: the data already separates tokens with spaces, so instead of feeding the whole string through the tokenize function, split on spaces and call convert_tokens_to_ids directly (sketch below).

- ……
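
A sketch of both details; the tokenizer checkpoint and sample line are illustrative (the competition vocabulary is presumably built by build_vocab.py):

```python
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')  # illustrative

line = '12 835 54 9'                       # corpus is already space-separated
ids = tokenizer.convert_tokens_to_ids(line.split(' '))  # skip full tokenize()

# Allocate directly on the GPU instead of building on the host + .to('cuda'):
input_ids = torch.tensor([ids], device='cuda')
```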

# 5. Innovations and Deployment

#### a. Innovations

- Length-adaptive dynamic N-gram masking, combined with the duality (pair-swap) augmentation

- Fusing models with different vocabularies, truncation lengths, and structures, guaranteeing diversity across models

- Tricks such as learning-rate warmup and decay, model weight decay, and adversarial training

- Performance optimization, including block shuffle, ONNX Runtime, CUDA-version tuning, and other detail-level optimizations

#### b. Deployment

- Our models cast semantic matching as classification, a highly general approach that transfers to any NLP task involving sentence relations, such as open-domain intent recognition (this competition), query-query matching, query-answer matching, and textual entailment

- Inference is fast: excluding network overhead, the 6-model fusion used in the competition (4 BERT, 2 NEZHA) reaches 77 QPS at AUC 0.9579, and a single BERT reaches 595 QPS at AUC 0.948, giving up less than one percentage point of AUC

- Real production environments are more complex: short texts easily lose semantic context and are relatively more affected by noise (a few characters the user mistypes or ASR misrecognizes can be a large fraction of a short text), so supporting techniques such as coreference resolution, text completion, and text correction may be needed

- Deep learning is not a cure-all; in real deployments, continuous bad-case analysis, supplemented with rule-based methods where appropriate, improves system robustness

# 6. Summary

- Overall
  - We worked on four fronts: pretraining, fine-tuning, model fusion, and inference, with targeted strategy improvements and innovations at each stage, backed by performance optimization. The result is a strong end-to-end solution that transfers to any sentence-relation NLP task, with good practicality and novelty.
- Strengths, weaknesses, and outlook
  - Strengths: strong results, fast, and highly general models
  - Weaknesses: an interaction-based (cross-encoder) model must see the full sentence pair on every forward pass, so it is unsuited to recall over massive text collections; it fits the fine-ranking stage applied after a small candidate set has been recalled
  - Outlook: on the research side, make full use of the "nuclear weapon" that is pretrained models by designing more targeted, better-motivated pretraining tasks, and explore multi-turn matching that incorporates context and external knowledge. On the application side, start from bad cases, keep improving the algorithm and mining user needs, and make Xiaobu a more knowledgeable, more fluent, and more human-like assistant



# 7. Solutions from Other Top Teams

# 1. AI小花

https://github.com/nilboy/gaic_track3_pair_sim

![](README.assets/image-20210619194942961.png)

![image-20210619194951696](README.assets/image-20210619194951696.png)

# 2. [none]

![image-20210619200120075](README.assets/image-20210619200120075.png)

![image-20210619200126641](README.assets/image-20210619200126641.png) 

![image-20210619200137247](README.assets/image-20210619200137247.png)

# 3. 赛道3-白[MASK]

![image-20210619204111156](README.assets/image-20210619204111156.png)

![image-20210619204017855](README.assets/image-20210619204017855.png)

![image-20210619204120283](README.assets/image-20210619204120283.png)

![image-20210619204128548](README.assets/image-20210619204128548.png)

# 4. 科讯嘉联灵珠团队

![image-20210619205915821](README.assets/image-20210619205915821.png)

![image-20210619210050654](README.assets/image-20210619210050654.png)

# 5. LOL王者

![image-20210619210251396](README.assets/image-20210619210251396.png)

![image-20210619210301353](README.assets/image-20210619210301353.png)



================================================
FILE: code/.gitignore
================================================
bert-base-chinese/pytorch_model.bin
nezha-cn-base/pytorch_model.bin
.idea
.DS_Store
__pycache__


================================================
FILE: code/Config.py
================================================
from transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \
    get_linear_schedule_with_warmup, XLNetModel, XLNetTokenizer, XLNetConfig, ElectraModel, ElectraConfig, ElectraTokenizer, \
    RobertaTokenizer, RobertaModel, RobertaConfig
from NEZHA.modeling_nezha import NeZhaModel
from NEZHA.configuration_nezha import NeZhaConfig


MODELS = {
    'BertForClass': BertModel,
    'BertForClass_MultiDropout': BertModel,
    'BertLastTwoCls': BertModel,
    'BertLastCls': BertModel,
    'BertLastTwoClsPooler': BertModel,
    'BertLastTwoEmbeddings': BertModel,
    'BertLastTwoEmbeddingsPooler': BertModel,
    'BertLastFourCls': BertModel,
    'BertLastFourClsPooler': BertModel,
    'BertLastFourEmbeddings': BertModel,
    'BertLastFourEmbeddingsPooler': BertModel,
    'BertDynCls': BertModel,
    'BertDynEmbeddings': BertModel,
    'BertRNN': BertModel,
    'BertCNN': BertModel,  # was XLNetModel; TOKENIZERS/CONFIGS map BertCNN to Bert classes
    'BertRCNN': BertModel,
    'XLNet': XLNetModel,
    'Electra': ElectraModel,
    'NEZHA': NeZhaModel
    }

TOKENIZERS = {
    'BertForClass': BertTokenizer,
    'BertForClass_MultiDropout': BertTokenizer,
    'BertLastTwoCls': BertTokenizer,
    'BertLastCls': BertTokenizer,
    'BertLastTwoClsPooler': BertTokenizer,
    'BertLastTwoEmbeddings': BertTokenizer,
    'BertLastTwoEmbeddingsPooler': BertTokenizer,
    'BertLastFourCls': BertTokenizer,
    'BertLastFourClsPooler': BertTokenizer,
    'BertLastFourEmbeddings': BertTokenizer,
    'BertLastFourEmbeddingsPooler': BertTokenizer,
    'BertDynCls': BertTokenizer,
    'BertDynEmbeddings': BertTokenizer,
    'BertRNN': BertTokenizer,
    'BertCNN': BertTokenizer,
    'BertRCNN': BertTokenizer,
    'XLNet': XLNetTokenizer,
    'Electra': ElectraTokenizer,
    'NEZHA': BertTokenizer
    }

CONFIGS = {
    'BertForClass': BertConfig,
    'BertForClass_MultiDropout': BertConfig,
    'BertLastTwoCls': BertConfig,
    'BertLastCls': BertConfig,
    'BertLastTwoClsPooler': BertConfig,
    'BertLastTwoEmbeddings': BertConfig,
    'BertLastTwoEmbeddingsPooler': BertConfig,
    'BertLastFourCls': BertConfig,
    'BertLastFourClsPooler': BertConfig,
    'BertLastFourEmbeddings': BertConfig,
    'BertLastFourEmbeddingsPooler': BertConfig,
    'BertDynCls': BertConfig,
    'BertDynEmbeddings': BertConfig,
    'BertRNN': BertConfig,
    'BertCNN': BertConfig,
    'BertRCNN': BertConfig,
    'XLNet': XLNetConfig,
    'Electra': ElectraConfig,
    'NEZHA': NeZhaConfig

    }
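
# Usage sketch (illustrative): the key and checkpoint path below are
# assumptions; the wrapper model classes that consume these registries live
# elsewhere in the repo (e.g. code/model.py).
#
#   from Config import MODELS, TOKENIZERS, CONFIGS
#
#   model_name = 'NEZHA'
#   config = CONFIGS[model_name].from_pretrained('nezha-cn-base')
#   tokenizer = TOKENIZERS[model_name].from_pretrained('nezha-cn-base')
#   backbone = MODELS[model_name].from_pretrained('nezha-cn-base', config=config)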

================================================
FILE: code/Dockerfile
================================================
# Base Images
## Build from a Tianchi base image (change the FROM base image as needed; the Tianchi open-list images are recommended: https://tianchi.aliyun.com/forum/postDetail?postId=67720)
#FROM registry.cn-shanghai.aliyuncs.com/tcc-public/pytorch:1.6-cuda10.1-py3
FROM registry.cn-shanghai.aliyuncs.com/xiaobu_match/match:cuda10.2base

## Copy everything in the current directory into the image's root directory
ADD . /

## Install dependencies; add pip packages to requirements.txt
#RUN apt-get update && apt-get install -y curl


#RUN   pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
#pip install --no-cache-dir -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple transformers==4.2.0
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tqdm
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple flask
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple psutil
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple onnx
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple onnxruntime-gpu==1.7.0
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple sklearn
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple onnxruntime_tools
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple sympy
#RUN    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple sentencepiece


## Set the default working directory to the root (run.sh and the generated result files must be placed here for the submission to run)
WORKDIR /


## On container start, run: sh run.sh
CMD ["sh", "run.sh"]


================================================
FILE: code/NEZHA/configuration_nezha.py
================================================

from transformers import PretrainedConfig

NEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP = {}

class NeZhaConfig(PretrainedConfig):
    r"""
        This is the configuration class to store the configuration of a :class:`NeZhaModel`.
        It is used to instantiate a NEZHA model according to the specified arguments, defining the model
        architecture. The class is adapted from :class:`~transformers.AlbertConfig`, and its defaults mirror
        the ALBERT `xxlarge <https://huggingface.co/albert-xxlarge-v2>`__ architecture.

        Configuration objects inherit from  :class:`~transformers.PretrainedConfig` and can be used
        to control the model outputs. Read the documentation from  :class:`~transformers.PretrainedConfig`
        for more information.


        Args:
            vocab_size (:obj:`int`, optional, defaults to 30000):
                Vocabulary size of the NEZHA model. Defines the different tokens that
                can be represented by the `inputs_ids` passed to the forward method of :class:`NeZhaModel`.
            embedding_size (:obj:`int`, optional, defaults to 128):
                Dimensionality of vocabulary embeddings.
            hidden_size (:obj:`int`, optional, defaults to 4096):
                Dimensionality of the encoder layers and the pooler layer.
            num_hidden_layers (:obj:`int`, optional, defaults to 12):
                Number of hidden layers in the Transformer encoder.
            num_hidden_groups (:obj:`int`, optional, defaults to 1):
                Number of groups for the hidden layers, parameters in the same group are shared.
            num_attention_heads (:obj:`int`, optional, defaults to 64):
                Number of attention heads for each attention layer in the Transformer encoder.
            intermediate_size (:obj:`int`, optional, defaults to 16384):
                The dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
            inner_group_num (:obj:`int`, optional, defaults to 1):
                The number of inner repetition of attention and ffn.
            hidden_act (:obj:`str` or :obj:`function`, optional, defaults to "gelu_new"):
                The non-linear activation function (function or string) in the encoder and pooler.
                If string, "gelu", "relu", "swish" and "gelu_new" are supported.
            hidden_dropout_prob (:obj:`float`, optional, defaults to 0):
                The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
            attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0):
                The dropout ratio for the attention probabilities.
            max_position_embeddings (:obj:`int`, optional, defaults to 512):
                The maximum sequence length that this model might ever be used with. Typically set this to something
                large (e.g., 512 or 1024 or 2048).
            type_vocab_size (:obj:`int`, optional, defaults to 2):
                The vocabulary size of the `token_type_ids` passed into :class:`NeZhaModel`.
            initializer_range (:obj:`float`, optional, defaults to 0.02):
                The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
            layer_norm_eps (:obj:`float`, optional, defaults to 1e-12):
                The epsilon used by the layer normalization layers.
            classifier_dropout_prob (:obj:`float`, optional, defaults to 0.1):
                The dropout ratio for attached classifiers.
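            max_relative_position (:obj:`int`, optional, defaults to 64):
                Clipping distance for the relative position encodings used by NEZHA's self-attention.
            use_relative_position (:obj:`bool`, optional, defaults to True):
                Whether to use functional relative position encodings instead of absolute position embeddings.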

        Example::

            from NEZHA.configuration_nezha import NeZhaConfig
            from NEZHA.modeling_nezha import NeZhaModel

            # Initializing a BERT-base style NEZHA configuration
            configuration = NeZhaConfig(
                hidden_size=768,
                num_attention_heads=12,
                intermediate_size=3072,
            )

            # Initializing a model from that configuration
            model = NeZhaModel(configuration)

            # Accessing the model configuration
            configuration = model.config

        Attributes:
            pretrained_config_archive_map (Dict[str, str]):
                A dictionary containing all the available pre-trained checkpoints.
    """

    pretrained_config_archive_map = NEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP
    model_type = "nezha"

    def __init__(
        self,
        vocab_size=30000,
        embedding_size=128,
        hidden_size=4096,
        num_hidden_layers=12,
        num_hidden_groups=1,
        num_attention_heads=64,
        intermediate_size=16384,
        inner_group_num=1,
        hidden_act="gelu_new",
        hidden_dropout_prob=0,
        attention_probs_dropout_prob=0,
        max_position_embeddings=512,
        max_relative_position=64,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        classifier_dropout_prob=0.1,
        use_relative_position=True,
        pad_token_id=0,
        bos_token_id=2,
        eos_token_id=3,
        **kwargs
    ):
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.vocab_size = vocab_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_hidden_groups = num_hidden_groups
        self.num_attention_heads = num_attention_heads
        self.inner_group_num = inner_group_num
        self.hidden_act = hidden_act
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.max_relative_position = max_relative_position
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.use_relative_position=use_relative_position
        self.classifier_dropout_prob = classifier_dropout_prob


================================================
FILE: code/NEZHA/modeling_nezha.py
================================================
import math
import os
import warnings
from dataclasses import dataclass
from typing import Optional, Tuple

import torch
import torch.utils.checkpoint
from torch import nn
from torch.nn import CrossEntropyLoss, MSELoss

from transformers.activations import ACT2FN
from transformers.file_utils import (
    ModelOutput,
    add_code_sample_docstrings,
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
    replace_return_docstrings,
)
from transformers.modeling_outputs import (
    BaseModelOutputWithPastAndCrossAttentions,
    BaseModelOutputWithPoolingAndCrossAttentions,
    CausalLMOutputWithCrossAttentions,
    MaskedLMOutput,
    MultipleChoiceModelOutput,
    NextSentencePredictorOutput,
    QuestionAnsweringModelOutput,
    SequenceClassifierOutput,
    TokenClassifierOutput,
)
from transformers.modeling_utils import (
    PreTrainedModel,
    apply_chunking_to_forward,
    find_pruneable_heads_and_indices,
    prune_linear_layer,
)

from transformers.models.bert.configuration_bert import BertConfig

import logging
logger = logging.getLogger(__name__)

_CHECKPOINT_FOR_DOC = "bert-base-uncased"
_CONFIG_FOR_DOC = "BertConfig"
_TOKENIZER_FOR_DOC = "BertTokenizer"


def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
    """Load tf checkpoints in a pytorch model."""
    try:
        import re

        import numpy as np
        import tensorflow as tf
    except ImportError:
        logger.error(
            "Loading a TensorFlow model in PyTorch, requires TensorFlow to be installed. Please see "
            "https://www.tensorflow.org/install/ for installation instructions."
        )
        raise
    tf_path = os.path.abspath(tf_checkpoint_path)
    logger.info("Converting TensorFlow checkpoint from {}".format(tf_path))
    # Load weights from TF model
    init_vars = tf.train.list_variables(tf_path)
    names = []
    arrays = []
    for name, shape in init_vars:
        logger.info("Loading TF weight {} with shape {}".format(name, shape))
        array = tf.train.load_variable(tf_path, name)
        names.append(name)
        arrays.append(array)

    for name, array in zip(names, arrays):
        name = name.split("/")
        # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculated m and v
        # which are not required for using pretrained model
        if any(
            n in ["adam_v", "adam_m", "AdamWeightDecayOptimizer", "AdamWeightDecayOptimizer_1", "global_step"]
            for n in name
        ):
            logger.info("Skipping {}".format("/".join(name)))
            continue
        pointer = model
        for m_name in name:
            if re.fullmatch(r"[A-Za-z]+_\d+", m_name):
                scope_names = re.split(r"_(\d+)", m_name)
            else:
                scope_names = [m_name]
            if scope_names[0] == "kernel" or scope_names[0] == "gamma":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "output_bias" or scope_names[0] == "beta":
                pointer = getattr(pointer, "bias")
            elif scope_names[0] == "output_weights":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "squad":
                pointer = getattr(pointer, "classifier")
            else:
                try:
                    pointer = getattr(pointer, scope_names[0])
                except AttributeError:
                    logger.info("Skipping {}".format("/".join(name)))
                    continue
            if len(scope_names) >= 2:
                num = int(scope_names[1])
                pointer = pointer[num]
        if m_name[-11:] == "_embeddings":
            pointer = getattr(pointer, "weight")
        elif m_name == "kernel":
            array = np.transpose(array)
        try:
            assert (
                pointer.shape == array.shape
            ), f"Pointer shape {pointer.shape} and array shape {array.shape} mismatched"
        except AssertionError as e:
            e.args += (pointer.shape, array.shape)
            raise
        logger.info("Initialize PyTorch weight {}".format(name))
        pointer.data = torch.from_numpy(array)
    return model


class BertEmbeddings(nn.Module):
    """Construct the embeddings from word, position and token_type embeddings."""

    def __init__(self, config):
        super().__init__()
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
        # any TensorFlow checkpoint file
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, input_ids=None, token_type_ids=None, inputs_embeds=None):
        if input_ids is not None:
            input_shape = input_ids.size()
        else:
            input_shape = inputs_embeds.size()[:-1]
        if token_type_ids is None:
            device = input_ids.device if input_ids is not None else inputs_embeds.device
            token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)

        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)

        embeddings = inputs_embeds + token_type_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings

def relative_position_encoding(depth, max_length=512, max_relative_position=64):
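    # Builds a (max_length, max_length, depth) table of sinusoidal embeddings of
    # pairwise relative distances: the distance (j - i) is clamped to
    # [-max_relative_position, max_relative_position], shifted to a non-negative
    # index, and looked up in a fixed sin/cos table (even dims: sin, odd: cos).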
    vocab_size = max_relative_position * 2 + 1
    range_vec = torch.arange(max_length)
    range_mat = range_vec.repeat(max_length).view(max_length, max_length)
    distance_mat = range_mat - torch.t(range_mat)
    distance_mat_clipped = torch.clamp(distance_mat, -max_relative_position, max_relative_position)
    final_mat = distance_mat_clipped + max_relative_position

    embeddings_table = torch.zeros(vocab_size, depth)
    position = torch.arange(0, vocab_size, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, depth, 2).float() * (-math.log(10000.0) / depth))
    embeddings_table[:, 0::2] = torch.sin(position * div_term)
    embeddings_table[:, 1::2] = torch.cos(position * div_term)
    embeddings_table = embeddings_table.unsqueeze(0).transpose(0, 1).squeeze(1)

    flat_relative_positions_matrix = final_mat.view(-1)
    one_hot_relative_positions_matrix = torch.nn.functional.one_hot(flat_relative_positions_matrix,
                                                                    num_classes=vocab_size).float()
    positions_encoding = torch.matmul(one_hot_relative_positions_matrix, embeddings_table)
    my_shape = list(final_mat.size())
    my_shape.append(depth)
    positions_encoding = positions_encoding.view(my_shape)
    return positions_encoding

class BertSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                "The hidden size (%d) is not a multiple of the number of attention "
                "heads (%d)" % (config.hidden_size, config.num_attention_heads)
            )

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config.hidden_size, self.all_head_size)
        self.key = nn.Linear(config.hidden_size, self.all_head_size)
        self.value = nn.Linear(config.hidden_size, self.all_head_size)

        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
        self.position_embedding_type = getattr(config, "position_embedding_type", "absolute")
        if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query":
            self.max_position_embeddings = config.max_position_embeddings
            self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size)

        self.is_decoder = config.is_decoder

    def transpose_for_scores(self, x):
        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(*new_x_shape)
        return x.permute(0, 2, 1, 3)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_value=None,
        output_attentions=False,
        relations_kv=None
    ):
        mixed_query_layer = self.query(hidden_states)

        # If this is instantiated as a cross-attention module, the keys
        # and values come from an encoder; the attention mask needs to be
        # such that the encoder's padding tokens are not attended to.
        is_cross_attention = encoder_hidden_states is not None

        if is_cross_attention and past_key_value is not None:
            # reuse k,v, cross_attentions
            key_layer = past_key_value[0]
            value_layer = past_key_value[1]
            attention_mask = encoder_attention_mask
        elif is_cross_attention:
            key_layer = self.transpose_for_scores(self.key(encoder_hidden_states))
            value_layer = self.transpose_for_scores(self.value(encoder_hidden_states))
            attention_mask = encoder_attention_mask
        elif past_key_value is not None:
            key_layer = self.transpose_for_scores(self.key(hidden_states))
            value_layer = self.transpose_for_scores(self.value(hidden_states))
            key_layer = torch.cat([past_key_value[0], key_layer], dim=2)
            value_layer = torch.cat([past_key_value[1], value_layer], dim=2)
        else:
            key_layer = self.transpose_for_scores(self.key(hidden_states))
            value_layer = self.transpose_for_scores(self.value(hidden_states))

        query_layer = self.transpose_for_scores(mixed_query_layer)

        if self.is_decoder:
            # if cross_attention save Tuple(torch.Tensor, torch.Tensor) of all cross attention key/value_states.
            # Further calls to cross_attention layer can then reuse all cross-attention
            # key/value_states (first "if" case)
            # if uni-directional self-attention (decoder) save Tuple(torch.Tensor, torch.Tensor) of
            # all previous decoder key/value_states. Further calls to uni-directional self-attention
            # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case)
            # if encoder bi-directional self-attention `past_key_value` is always `None`
            past_key_value = (key_layer, value_layer)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))

        batch_size, num_attention_heads, from_seq_length, to_seq_length = attention_scores.size()

        # NEZHA relative-position term on the key side: every query vector q_i is
        # also dotted with the relative position encodings a_ij = relations_kv[i, j].
        # The permute/view below folds batch and heads into one dimension so that a
        # single batched matmul against relations_kv, which has shape
        # (from_seq_length, to_seq_length, attention_head_size), covers all heads.
        query_layer_t = query_layer.permute(2, 0, 1, 3)
        query_layer_r = query_layer_t.contiguous().view(from_seq_length, batch_size * num_attention_heads,
                                                        self.attention_head_size)
        key_position_scores = torch.matmul(query_layer_r, relations_kv.permute(0, 2, 1))
        key_position_scores_r = key_position_scores.view(from_seq_length, batch_size,
                                                         num_attention_heads, from_seq_length)
        key_position_scores_r_t = key_position_scores_r.permute(1, 2, 0, 3)
        attention_scores = attention_scores + key_position_scores_r_t

        attention_scores = attention_scores / math.sqrt(self.attention_head_size)
        if attention_mask is not None:
            # Apply the attention mask (precomputed for all layers in NeZhaModel forward())
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = nn.Softmax(dim=-1)(attention_scores)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = torch.matmul(attention_probs, value_layer)

        # NEZHA relative-position term on the value side: the same attention
        # probabilities also take a weighted sum over the relative position
        # encodings, which is added to the ordinary weighted sum of the values.
        attention_probs_t = attention_probs.permute(2, 0, 1, 3)
        attentions_probs_r = attention_probs_t.contiguous().view(from_seq_length, batch_size * num_attention_heads,
                                                                 to_seq_length)
        value_position_scores = torch.matmul(attentions_probs_r, relations_kv)
        value_position_scores_r = value_position_scores.view(from_seq_length, batch_size,
                                                             num_attention_heads, self.attention_head_size)
        value_position_scores_r_t = value_position_scores_r.permute(1, 2, 0, 3)
        context_layer = context_layer + value_position_scores_r_t

        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.view(*new_context_layer_shape)

        outputs = (context_layer, attention_probs) if output_attentions else (context_layer,)

        if self.is_decoder:
            outputs = outputs + (past_key_value,)
        return outputs
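

# In equation form (Shaw et al. 2018, as used by NEZHA), the score for query
# position i attending to key position j computed above is
#     e_ij = (q_i . k_j + q_i . a_ij) / sqrt(d_head)
# and the context vector adds a relative term to the weighted values:
#     z_i = sum_j alpha_ij * (v_j + a_ij)
# where a_ij = relations_kv[i, j] is the shared sinusoidal table used for both
# the key and value sides.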


class BertSelfOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class BertAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.self = BertSelfAttention(config)
        self.output = BertSelfOutput(config)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        if len(heads) == 0:
            return
        heads, index = find_pruneable_heads_and_indices(
            heads, self.self.num_attention_heads, self.self.attention_head_size, self.pruned_heads
        )

        # Prune linear layers
        self.self.query = prune_linear_layer(self.self.query, index)
        self.self.key = prune_linear_layer(self.self.key, index)
        self.self.value = prune_linear_layer(self.self.value, index)
        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)

        # Update hyper params and store pruned heads
        self.self.num_attention_heads = self.self.num_attention_heads - len(heads)
        self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_value=None,
        output_attentions=False,
        relations_kv=None
    ):
        self_outputs = self.self(
            hidden_states,
            attention_mask,
            head_mask,
            encoder_hidden_states,
            encoder_attention_mask,
            past_key_value,
            output_attentions,
            relations_kv=relations_kv
        )
        attention_output = self.output(self_outputs[0], hidden_states)
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs


class BertIntermediate(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states


class BertOutput(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class BertLayer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.chunk_size_feed_forward = config.chunk_size_feed_forward
        self.seq_len_dim = 1
        self.attention = BertAttention(config)
        self.is_decoder = config.is_decoder
        self.add_cross_attention = config.add_cross_attention
        if self.add_cross_attention:
            assert self.is_decoder, f"{self} should be used as a decoder model if cross attention is added"
            self.crossattention = BertAttention(config)
        self.intermediate = BertIntermediate(config)
        self.output = BertOutput(config)

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_value=None,
        output_attentions=False,
        relations_kv=None
    ):
        # decoder uni-directional self-attention cached key/values tuple is at positions 1,2
        self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None
        self_attention_outputs = self.attention(
            hidden_states,
            attention_mask,
            head_mask,
            output_attentions=output_attentions,
            past_key_value=self_attn_past_key_value,
            relations_kv=relations_kv
        )
        attention_output = self_attention_outputs[0]

        # if decoder, the last output is tuple of self-attn cache
        if self.is_decoder:
            outputs = self_attention_outputs[1:-1]
            present_key_value = self_attention_outputs[-1]
        else:
            outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights

        cross_attn_present_key_value = None
        if self.is_decoder and encoder_hidden_states is not None:
            assert hasattr(
                self, "crossattention"
            ), f"If `encoder_hidden_states` are passed, {self} has to be instantiated with cross-attention layers by setting `config.add_cross_attention=True`"

            # cross_attn cached key/values tuple is at positions 3,4 of past_key_value tuple
            cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None
            cross_attention_outputs = self.crossattention(
                attention_output,
                attention_mask,
                head_mask,
                encoder_hidden_states,
                encoder_attention_mask,
                cross_attn_past_key_value,
                output_attentions,
            )
            attention_output = cross_attention_outputs[0]
            outputs = outputs + cross_attention_outputs[1:-1]  # add cross attentions if we output attention weights

            # add cross-attn cache to positions 3,4 of present_key_value tuple
            cross_attn_present_key_value = cross_attention_outputs[-1]
            present_key_value = present_key_value + cross_attn_present_key_value

        layer_output = apply_chunking_to_forward(
            self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, attention_output
        )
        outputs = (layer_output,) + outputs

        # if decoder, return the attn key/values as the last output
        if self.is_decoder:
            outputs = outputs + (present_key_value,)

        return outputs

    def feed_forward_chunk(self, attention_output):
        intermediate_output = self.intermediate(attention_output)
        layer_output = self.output(intermediate_output, attention_output)
        return layer_output
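

# Note on chunking: apply_chunking_to_forward splits the tensor along
# seq_len_dim into chunks of config.chunk_size_feed_forward and applies
# feed_forward_chunk to each slice, trading peak activation memory for extra
# kernel launches; a chunk size of 0 (the default) processes the whole
# sequence in one call.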


class NeZhaEncoder(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.layer = nn.ModuleList([BertLayer(config) for _ in range(config.num_hidden_layers)])
        # Precompute the relative position encodings once for the maximum
        # sequence length; a non-persistent buffer follows the module across
        # devices via .to()/.cuda() and is not written to the state dict.
        self.register_buffer(
            "relative_positions_encoding",
            relative_position_encoding(
                max_length=config.max_position_embeddings,
                depth=int(config.hidden_size / config.num_attention_heads),
                max_relative_position=config.max_relative_position,
            ),
            persistent=False,
        )

    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_values=None,
        use_cache=None,
        output_attentions=False,
        output_hidden_states=False,
        return_dict=False,
    ):
        to_seq_length = hidden_states.shape[1]
        relations_kv = self.relative_positions_encoding[:to_seq_length, :to_seq_length, :]
        all_hidden_states = () if output_hidden_states else None
        all_self_attentions = () if output_attentions else None
        all_cross_attentions = () if output_attentions and self.config.add_cross_attention else None

        next_decoder_cache = () if use_cache else None
        for i, layer_module in enumerate(self.layer):
            if output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)

            layer_head_mask = head_mask[i] if head_mask is not None else None
            past_key_value = past_key_values[i] if past_key_values is not None else None

            if getattr(self.config, "gradient_checkpointing", False) and self.training:

                if use_cache:
                    logger.warning(
                        "`use_cache=True` is incompatible with `config.gradient_checkpointing=True`. Setting "
                        "`use_cache=False`..."
                    )
                    use_cache = False

                def create_custom_forward(module):
                    def custom_forward(*inputs):
                        # relations_kv is threaded through explicitly so that the
                        # checkpointed layers also see the relative position encodings.
                        return module(*inputs, past_key_value, output_attentions, relations_kv)

                    return custom_forward

                layer_outputs = torch.utils.checkpoint.checkpoint(
                    create_custom_forward(layer_module),
                    hidden_states,
                    attention_mask,
                    layer_head_mask,
                    encoder_hidden_states,
                    encoder_attention_mask,
                )
            else:
                layer_outputs = layer_module(
                    hidden_states,
                    attention_mask,
                    layer_head_mask,
                    encoder_hidden_states,
                    encoder_attention_mask,
                    past_key_value,
                    output_attentions,
                    relations_kv=relations_kv,
                )

            hidden_states = layer_outputs[0]
            if use_cache:
                next_decoder_cache += (layer_outputs[-1],)
            if output_attentions:
                all_self_attentions = all_self_attentions + (layer_outputs[1],)
                if self.config.add_cross_attention:
                    all_cross_attentions = all_cross_attentions + (layer_outputs[2],)

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        if not return_dict:
            return tuple(
                v
                for v in [
                    hidden_states,
                    next_decoder_cache,
                    all_hidden_states,
                    all_self_attentions,
                    all_cross_attentions,
                ]
                if v is not None
            )
        return BaseModelOutputWithPastAndCrossAttentions(
            last_hidden_state=hidden_states,
            past_key_values=next_decoder_cache,
            hidden_states=all_hidden_states,
            attentions=all_self_attentions,
            cross_attentions=all_cross_attentions,
        )


class BertPooler(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.activation = nn.Tanh()

    def forward(self, hidden_states):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token.
        first_token_tensor = hidden_states[:, 0]
        pooled_output = self.dense(first_token_tensor)
        pooled_output = self.activation(pooled_output)
        return pooled_output


class BertPredictionHeadTransform(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        if isinstance(config.hidden_act, str):
            self.transform_act_fn = ACT2FN[config.hidden_act]
        else:
            self.transform_act_fn = config.hidden_act
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.transform_act_fn(hidden_states)
        hidden_states = self.LayerNorm(hidden_states)
        return hidden_states


class BertLMPredictionHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.transform = BertPredictionHeadTransform(config)

        # The output weights are the same as the input embeddings, but there is
        # an output-only bias for each token.
        self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

        self.bias = nn.Parameter(torch.zeros(config.vocab_size))

        # Need a link between the two variables so that the bias is correctly resized with `resize_token_embeddings`
        self.decoder.bias = self.bias

    def forward(self, hidden_states):
        hidden_states = self.transform(hidden_states)
        hidden_states = self.decoder(hidden_states)
        return hidden_states


class BertOnlyMLMHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.predictions = BertLMPredictionHead(config)

    def forward(self, sequence_output):
        prediction_scores = self.predictions(sequence_output)
        return prediction_scores


class BertOnlyNSPHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.seq_relationship = nn.Linear(config.hidden_size, 2)

    def forward(self, pooled_output):
        seq_relationship_score = self.seq_relationship(pooled_output)
        return seq_relationship_score


class BertPreTrainingHeads(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.predictions = BertLMPredictionHead(config)
        self.seq_relationship = nn.Linear(config.hidden_size, 2)

    def forward(self, sequence_output, pooled_output):
        prediction_scores = self.predictions(sequence_output)
        seq_relationship_score = self.seq_relationship(pooled_output)
        return prediction_scores, seq_relationship_score


class BertPreTrainedModel(PreTrainedModel):
    """
    An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
    models.
    """

    config_class = BertConfig
    load_tf_weights = load_tf_weights_in_bert
    base_model_prefix = "bert"

    def _init_weights(self, module):
        """ Initialize the weights """
        if isinstance(module, nn.Linear):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)


@dataclass
class BertForPreTrainingOutput(ModelOutput):
    """
    Output type of :class:`~transformers.BertForPreTraining`.

    Args:
        loss (`optional`, returned when ``labels`` is provided, ``torch.FloatTensor`` of shape :obj:`(1,)`):
            Total loss as the sum of the masked language modeling loss and the next sequence prediction
            (classification) loss.
        prediction_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
            Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
        seq_relationship_logits (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, 2)`):
            Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation
            before SoftMax).
        hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_hidden_states=True`` is passed or when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
            of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``output_attentions=True`` is passed or when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape :obj:`(batch_size, num_heads,
            sequence_length, sequence_length)`.

            Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
            heads.
    """

    loss: Optional[torch.FloatTensor] = None
    prediction_logits: torch.FloatTensor = None
    seq_relationship_logits: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None


BERT_START_DOCSTRING = r"""

    This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the generic
    methods the library implements for all its models (such as downloading or saving, resizing the input embeddings,
    pruning heads etc.)

    This model is also a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__
    subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to
    general usage and behavior.

    Parameters:
        config (:class:`~transformers.BertConfig`): Model configuration class with all the parameters of the model.
            Initializing with a config file does not load the weights associated with the model, only the
            configuration. Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model
            weights.
"""

BERT_INPUTS_DOCSTRING = r"""
    Args:
        input_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`):
            Indices of input sequence tokens in the vocabulary.

            Indices can be obtained using :class:`~transformers.BertTokenizer`. See
            :meth:`transformers.PreTrainedTokenizer.encode` and :meth:`transformers.PreTrainedTokenizer.__call__` for
            details.

            `What are input IDs? <../glossary.html#input-ids>`__
        attention_mask (:obj:`torch.FloatTensor` of shape :obj:`({0})`, `optional`):
            Mask to avoid performing attention on padding token indices. Mask values selected in ``[0, 1]``:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.

            `What are attention masks? <../glossary.html#attention-mask>`__
        token_type_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
            Segment token indices to indicate first and second portions of the inputs. Indices are selected in ``[0,
            1]``:

            - 0 corresponds to a `sentence A` token,
            - 1 corresponds to a `sentence B` token.

            `What are token type IDs? <../glossary.html#token-type-ids>`_
        position_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`):
            Indices of positions of each input sequence tokens in the position embeddings. Selected in the range ``[0,
            config.max_position_embeddings - 1]``.

            `What are position IDs? <../glossary.html#position-ids>`_
        head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`):
            Mask to nullify selected heads of the self-attention modules. Mask values selected in ``[0, 1]``:

            - 1 indicates the head is **not masked**,
            - 0 indicates the head is **masked**.

        inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`({0}, hidden_size)`, `optional`):
            Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
            This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
            vectors than the model's internal embedding lookup matrix.
        output_attentions (:obj:`bool`, `optional`):
            Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
            tensors for more detail.
        output_hidden_states (:obj:`bool`, `optional`):
            Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
            more detail.
        return_dict (:obj:`bool`, `optional`):
            Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
"""


@add_start_docstrings(
    "The bare Bert Model transformer outputting raw hidden-states without any specific head on top.",
    BERT_START_DOCSTRING,
)
class NeZhaModel(BertPreTrainedModel):
    """

    The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
    cross-attention is added between the self-attention layers, following the architecture described in `Attention is
    all you need <https://arxiv.org/abs/1706.03762>`__ by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
    Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

    To behave as a decoder the model needs to be initialized with the :obj:`is_decoder` argument of the configuration
    set to :obj:`True`. To be used in a Seq2Seq model, the model needs to be initialized with both the :obj:`is_decoder`
    argument and :obj:`add_cross_attention` set to :obj:`True`; an :obj:`encoder_hidden_states` is then expected as an
    input to the forward pass.
    """

    def __init__(self, config, add_pooling_layer=True):
        super().__init__(config)
        self.config = config

        self.embeddings = BertEmbeddings(config)
        self.encoder = NeZhaEncoder(config)

        self.pooler = BertPooler(config) if add_pooling_layer else None

        self.init_weights()

    def get_input_embeddings(self):
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value):
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune):
        """
        Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base
        class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layer[layer].attention.prune_heads(heads)

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=BaseModelOutputWithPoolingAndCrossAttentions,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        past_key_values=None,
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        encoder_hidden_states  (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
            the model is configured as a decoder.
        encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
            the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.
        past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
            Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.

            If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids`
            (those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)`
            instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`.
        use_cache (:obj:`bool`, `optional`):
            If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up
            decoding (see :obj:`past_key_values`).
        """
        output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
        output_hidden_states = (
            output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
        )
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        if self.config.is_decoder:
            use_cache = use_cache if use_cache is not None else self.config.use_cache
        else:
            use_cache = False

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        elif input_ids is not None:
            input_shape = input_ids.size()
            batch_size, seq_length = input_shape
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.size()[:-1]
            batch_size, seq_length = input_shape
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        device = input_ids.device if input_ids is not None else inputs_embeds.device

        # past_key_values_length
        past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0

        if attention_mask is None:
            attention_mask = torch.ones((batch_size, seq_length + past_key_values_length), device=device)
        if token_type_ids is None:
            token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)

        # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
        # ourselves in which case we just need to make it broadcastable to all heads.
        extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)

        # If a 2D or 3D attention mask is provided for the cross-attention
        # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length]
        if self.config.is_decoder and encoder_hidden_states is not None:
            encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size()
            encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
            if encoder_attention_mask is None:
                encoder_attention_mask = torch.ones(encoder_hidden_shape, device=device)
            encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
        else:
            encoder_extended_attention_mask = None

        # Prepare head mask if needed
        # 1.0 in head_mask indicate we keep the head
        # attention_probs has shape bsz x n_heads x N x N
        # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
        # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        embedding_output = self.embeddings(
            input_ids=input_ids,
            token_type_ids=token_type_ids,
            inputs_embeds=inputs_embeds,
        )
        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_extended_attention_mask,
            past_key_values=past_key_values,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        sequence_output = encoder_outputs[0]
        pooled_output = self.pooler(sequence_output) if self.pooler is not None else None

        if not return_dict:
            return (sequence_output, pooled_output) + encoder_outputs[1:]

        return BaseModelOutputWithPoolingAndCrossAttentions(
            last_hidden_state=sequence_output,
            pooler_output=pooled_output,
            past_key_values=encoder_outputs.past_key_values,
            hidden_states=encoder_outputs.hidden_states,
            attentions=encoder_outputs.attentions,
            cross_attentions=encoder_outputs.cross_attentions,
        )
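

# Usage sketch (illustrative, added for documentation; the arguments are
# assumptions -- the config must provide `max_relative_position`, as the
# NeZhaConfig in this repository does):
def _sketch_nezha_model(config):
    model = NeZhaModel(config)
    input_ids = torch.randint(0, config.vocab_size, (2, 16))
    attention_mask = torch.ones_like(input_ids)
    sequence_output, pooled_output = model(input_ids, attention_mask=attention_mask)[:2]
    # sequence_output: (2, 16, hidden_size); pooled_output: (2, hidden_size)
    return sequence_output, pooled_output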


@add_start_docstrings(
    """
    Bert Model with two heads on top as done during the pretraining: a `masked language modeling` head and a `next
    sentence prediction (classification)` head.
    """,
    BERT_START_DOCSTRING,
)
class BertForPreTraining(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.bert = NeZhaModel(config)
        self.cls = BertPreTrainingHeads(config)

        self.init_weights()

    def get_output_embeddings(self):
        return self.cls.predictions.decoder

    def set_output_embeddings(self, new_embeddings):
        self.cls.predictions.decoder = new_embeddings

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @replace_return_docstrings(output_type=BertForPreTrainingOutput, config_class=_CONFIG_FOR_DOC)
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        next_sentence_label=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape ``(batch_size, sequence_length)``, `optional`):
            Labels for computing the masked language modeling loss. Indices should be in ``[-100, 0, ...,
            config.vocab_size]`` (see ``input_ids`` docstring) Tokens with indices set to ``-100`` are ignored
            (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``
        next_sentence_label (``torch.LongTensor`` of shape ``(batch_size,)``, `optional`):
            Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair
            (see :obj:`input_ids` docstring) Indices should be in ``[0, 1]``:

            - 0 indicates sequence B is a continuation of sequence A,
            - 1 indicates sequence B is a random sequence.

        Returns:

        Example::

            >>> from transformers import BertTokenizer, BertForPreTraining
            >>> import torch

            >>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
            >>> model = BertForPreTraining.from_pretrained('bert-base-uncased')

            >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
            >>> outputs = model(**inputs)

            >>> prediction_logits = outputs.prediction_logits
            >>> seq_relationship_logits = outputs.seq_relationship_logits
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output, pooled_output = outputs[:2]
        prediction_scores, seq_relationship_score = self.cls(sequence_output, pooled_output)

        total_loss = None
        if labels is not None and next_sentence_label is not None:
            loss_fct = CrossEntropyLoss()
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))
            next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2), next_sentence_label.view(-1))
            total_loss = masked_lm_loss + next_sentence_loss

        if not return_dict:
            output = (prediction_scores, seq_relationship_score) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return BertForPreTrainingOutput(
            loss=total_loss,
            prediction_logits=prediction_scores,
            seq_relationship_logits=seq_relationship_score,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )


@add_start_docstrings(
    """Bert Model with a `language modeling` head on top for CLM fine-tuning. """, BERT_START_DOCSTRING
)
class BertLMHeadModel(BertPreTrainedModel):

    _keys_to_ignore_on_load_unexpected = [r"pooler"]
    _keys_to_ignore_on_load_missing = [r"predictions.decoder.bias"]

    def __init__(self, config):
        super().__init__(config)

        if not config.is_decoder:
            logger.warning("If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`")

        self.bert = NeZhaModel(config, add_pooling_layer=False)
        self.cls = BertOnlyMLMHead(config)

        self.init_weights()

    def get_output_embeddings(self):
        return self.cls.predictions.decoder

    def set_output_embeddings(self, new_embeddings):
        self.cls.predictions.decoder = new_embeddings

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @replace_return_docstrings(output_type=CausalLMOutputWithCrossAttentions, config_class=_CONFIG_FOR_DOC)
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        labels=None,
        past_key_values=None,
        use_cache=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        encoder_hidden_states  (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
            Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
            the model is configured as a decoder.
        encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
            the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``:

            - 1 for tokens that are **not masked**,
            - 0 for tokens that are **masked**.
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in
            ``[-100, 0, ..., config.vocab_size]`` (see ``input_ids`` docstring) Tokens with indices set to ``-100`` are
            ignored (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``
        past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
            Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.

            If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids`
            (those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)`
            instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`.
        use_cache (:obj:`bool`, `optional`):
            If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up
            decoding (see :obj:`past_key_values`).

        Returns:

        Example::

            >>> from transformers import BertTokenizer, BertLMHeadModel, BertConfig
            >>> import torch

            >>> tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
            >>> config = BertConfig.from_pretrained("bert-base-cased")
            >>> config.is_decoder = True
            >>> model = BertLMHeadModel.from_pretrained('bert-base-cased', config=config)

            >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
            >>> outputs = model(**inputs)

            >>> prediction_logits = outputs.logits
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        if labels is not None:
            use_cache = False

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            past_key_values=past_key_values,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]
        prediction_scores = self.cls(sequence_output)

        lm_loss = None
        if labels is not None:
            # we are doing next-token prediction; shift prediction scores and input ids by one
            shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous()
            labels = labels[:, 1:].contiguous()
            loss_fct = CrossEntropyLoss()
            lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (prediction_scores,) + outputs[2:]
            return ((lm_loss,) + output) if lm_loss is not None else output

        return CausalLMOutputWithCrossAttentions(
            loss=lm_loss,
            logits=prediction_scores,
            past_key_values=outputs.past_key_values,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
            cross_attentions=outputs.cross_attentions,
        )

    def prepare_inputs_for_generation(self, input_ids, past=None, attention_mask=None, **model_kwargs):
        input_shape = input_ids.shape
        # if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly
        if attention_mask is None:
            attention_mask = input_ids.new_ones(input_shape)

        # cut decoder_input_ids if past is used
        if past is not None:
            input_ids = input_ids[:, -1:]

        return {"input_ids": input_ids, "attention_mask": attention_mask, "past_key_values": past}

    def _reorder_cache(self, past, beam_idx):
        reordered_past = ()
        for layer_past in past:
            reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),)
        return reordered_past


@add_start_docstrings("""Bert Model with a `language modeling` head on top. """, BERT_START_DOCSTRING)
class NeZhaForMaskedLM(BertPreTrainedModel):

    _keys_to_ignore_on_load_unexpected = [r"pooler"]
    _keys_to_ignore_on_load_missing = [r"predictions.decoder.bias"]

    def __init__(self, config):
        super().__init__(config)

        if config.is_decoder:
            logger.warning(
                "If you want to use `NeZhaForMaskedLM` make sure `config.is_decoder=False` for "
                "bi-directional self-attention."
            )

        self.bert = NeZhaModel(config, add_pooling_layer=False)
        self.cls = BertOnlyMLMHead(config)

        self.init_weights()

    def get_output_embeddings(self):
        return self.cls.predictions.decoder

    def set_output_embeddings(self, new_embeddings):
        self.cls.predictions.decoder = new_embeddings

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=MaskedLMOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Labels for computing the masked language modeling loss. Indices should be in ``[-100, 0, ...,
            config.vocab_size]`` (see ``input_ids`` docstring) Tokens with indices set to ``-100`` are ignored
            (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]``
        """

        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]
        prediction_scores = self.cls(sequence_output)

        masked_lm_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()  # -100 index = padding token
            masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1))

        if not return_dict:
            output = (prediction_scores,) + outputs[2:]
            return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output

        return MaskedLMOutput(
            loss=masked_lm_loss,
            logits=prediction_scores,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )

    def prepare_inputs_for_generation(self, input_ids, attention_mask=None, **model_kwargs):
        input_shape = input_ids.shape
        effective_batch_size = input_shape[0]

        #  add a dummy token
        assert self.config.pad_token_id is not None, "The PAD token should be defined for generation"
        attention_mask = torch.cat([attention_mask, attention_mask.new_zeros((attention_mask.shape[0], 1))], dim=-1)
        dummy_token = torch.full(
            (effective_batch_size, 1), self.config.pad_token_id, dtype=torch.long, device=input_ids.device
        )
        input_ids = torch.cat([input_ids, dummy_token], dim=1)

        return {"input_ids": input_ids, "attention_mask": attention_mask}


@add_start_docstrings(
    """Bert Model with a `next sentence prediction (classification)` head on top. """,
    BERT_START_DOCSTRING,
)
class BertForNextSentencePrediction(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.bert = NeZhaModel(config)
        self.cls = BertOnlyNSPHead(config)

        self.init_weights()

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @replace_return_docstrings(output_type=NextSentencePredictorOutput, config_class=_CONFIG_FOR_DOC)
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
        **kwargs
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
            Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair
            (see ``input_ids`` docstring). Indices should be in ``[0, 1]``:

            - 0 indicates sequence B is a continuation of sequence A,
            - 1 indicates sequence B is a random sequence.

        Returns:

        Example::

            >>> from transformers import BertTokenizer, BertForNextSentencePrediction
            >>> import torch

            >>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
            >>> model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

            >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
            >>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
            >>> encoding = tokenizer(prompt, next_sentence, return_tensors='pt')

            >>> outputs = model(**encoding, labels=torch.LongTensor([1]))
            >>> logits = outputs.logits
            >>> assert logits[0, 0] < logits[0, 1] # next sentence was random
        """

        if "next_sentence_label" in kwargs:
            warnings.warn(
                "The `next_sentence_label` argument is deprecated and will be removed in a future version, use `labels` instead.",
                FutureWarning,
            )
            labels = kwargs.pop("next_sentence_label")

        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        seq_relationship_scores = self.cls(pooled_output)

        next_sentence_loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            next_sentence_loss = loss_fct(seq_relationship_scores.view(-1, 2), labels.view(-1))

        if not return_dict:
            output = (seq_relationship_scores,) + outputs[2:]
            return ((next_sentence_loss,) + output) if next_sentence_loss is not None else output

        return NextSentencePredictorOutput(
            loss=next_sentence_loss,
            logits=seq_relationship_scores,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )


@add_start_docstrings(
    """
    Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled
    output) e.g. for GLUE tasks.
    """,
    BERT_START_DOCSTRING,
)
class BertForSequenceClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = NeZhaModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        self.init_weights()

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=SequenceClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
            Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ...,
            config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
            If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            if self.num_labels == 1:
                #  We are doing regression
                loss_fct = MSELoss()
                loss = loss_fct(logits.view(-1), labels.view(-1))
            else:
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
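
# Illustrative usage sketch for ``BertForSequenceClassification`` (not part of
# the original file). The checkpoint path and tokenizer are placeholders; it
# assumes a local NeZha checkpoint directory whose config sets ``num_labels=2``:
#
#     >>> model = BertForSequenceClassification.from_pretrained("./nezha-base")
#     >>> encoding = tokenizer("query a", "query b", return_tensors="pt")
#     >>> outputs = model(**encoding, labels=torch.LongTensor([1]))
#     >>> loss, logits = outputs[0], outputs[1]  # return_dict defaults to False here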


@add_start_docstrings(
    """
    Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a
    softmax) e.g. for RocStories/SWAG tasks.
    """,
    BERT_START_DOCSTRING,
)
class BertForMultipleChoice(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.bert = NeZhaModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, 1)

        self.init_weights()

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=MultipleChoiceModelOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
            Labels for computing the multiple choice classification loss. Indices should be in ``[0, ...,
            num_choices-1]`` where :obj:`num_choices` is the size of the second dimension of the input tensors. (See
            :obj:`input_ids` above)
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]

        input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
        attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
        token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
        inputs_embeds = (
            inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1))
            if inputs_embeds is not None
            else None
        )

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        pooled_output = outputs[1]

        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        reshaped_logits = logits.view(-1, num_choices)

        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            loss = loss_fct(reshaped_logits, labels)

        if not return_dict:
            output = (reshaped_logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return MultipleChoiceModelOutput(
            loss=loss,
            logits=reshaped_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
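
# Shape walk-through for ``BertForMultipleChoice`` (illustrative, not part of
# the original file): ``input_ids`` of shape (batch_size, num_choices, seq_len)
# is flattened to (batch_size * num_choices, seq_len) before the encoder, and
# the per-choice scores are reshaped back so that ``CrossEntropyLoss`` selects
# one choice per example:
#
#     >>> input_ids = torch.randint(0, 100, (2, 4, 16))  # 2 examples, 4 choices each
#     >>> outputs = model(input_ids, labels=torch.tensor([0, 3]))
#     >>> outputs[1].shape  # reshaped_logits
#     torch.Size([2, 4])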


@add_start_docstrings(
    """
    Bert Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for
    Named-Entity-Recognition (NER) tasks.
    """,
    BERT_START_DOCSTRING,
)
class BertForTokenClassification(BertPreTrainedModel):

    _keys_to_ignore_on_load_unexpected = [r"pooler"]

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = NeZhaModel(config, add_pooling_layer=False)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        self.init_weights()

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=TokenClassifierOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
            Labels for computing the token classification loss. Indices should be in ``[0, ..., config.num_labels -
            1]``.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        sequence_output = self.dropout(sequence_output)
        logits = self.classifier(sequence_output)

        loss = None
        if labels is not None:
            loss_fct = CrossEntropyLoss()
            # Only keep active parts of the loss
            if attention_mask is not None:
                active_loss = attention_mask.view(-1) == 1
                active_logits = logits.view(-1, self.num_labels)
                active_labels = torch.where(
                    active_loss, labels.view(-1), torch.tensor(loss_fct.ignore_index).type_as(labels)
                )
                loss = loss_fct(active_logits, active_labels)
            else:
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        if not return_dict:
            output = (logits,) + outputs[2:]
            return ((loss,) + output) if loss is not None else output

        return TokenClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
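
# Masking sketch for the loss computation above (illustrative, not part of the
# original file): positions where ``attention_mask`` is 0 are remapped to
# ``loss_fct.ignore_index`` (-100 by default for ``CrossEntropyLoss``), so
# padding contributes nothing to the loss:
#
#     >>> attention_mask = torch.tensor([[1, 1, 0]])
#     >>> labels = torch.tensor([[5, 7, 9]])
#     >>> torch.where(attention_mask.view(-1) == 1, labels.view(-1), torch.tensor(-100))
#     tensor([   5,    7, -100])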


@add_start_docstrings(
    """
    Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (linear
    layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
    """,
    BERT_START_DOCSTRING,
)
class BertForQuestionAnswering(BertPreTrainedModel):

    _keys_to_ignore_on_load_unexpected = [r"pooler"]

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = NeZhaModel(config, add_pooling_layer=False)
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        self.init_weights()

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    @add_code_sample_docstrings(
        tokenizer_class=_TOKENIZER_FOR_DOC,
        checkpoint=_CHECKPOINT_FOR_DOC,
        output_type=QuestionAnsweringModelOutput,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        head_mask=None,
        inputs_embeds=None,
        start_positions=None,
        end_positions=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=False,
    ):
        r"""
        start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
            Labels for position (index) of the start of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (:obj:`sequence_length`). Positions outside of the
            sequence are not taken into account for computing the loss.
        end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
            Labels for position (index) of the end of the labelled span for computing the token classification loss.
            Positions are clamped to the length of the sequence (:obj:`sequence_length`). Positions outside of the
            sequence are not taken into account for computing the loss.
        """
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )

        sequence_output = outputs[0]

        logits = self.qa_outputs(sequence_output)
        start_logits, end_logits = logits.split(1, dim=-1)
        start_logits = start_logits.squeeze(-1)
        end_logits = end_logits.squeeze(-1)

        total_loss = None
        if start_positions is not None and end_positions is not None:
            # If we are on multi-GPU, the position tensors may carry an extra dimension; squeeze it
            if len(start_positions.size()) > 1:
                start_positions = start_positions.squeeze(-1)
            if len(end_positions.size()) > 1:
                end_positions = end_positions.squeeze(-1)
            # Sometimes the start/end positions are outside our model inputs; we ignore these terms
            ignored_index = start_logits.size(1)
            start_positions.clamp_(0, ignored_index)
            end_positions.clamp_(0, ignored_index)

            loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
            start_loss = loss_fct(start_logits, start_positions)
            end_loss = loss_fct(end_logits, end_positions)
            total_loss = (start_loss + end_loss) / 2

        if not return_dict:
            output = (start_logits, end_logits) + outputs[2:]
            return ((total_loss,) + output) if total_loss is not None else output

        return QuestionAnsweringModelOutput(
            loss=total_loss,
            start_logits=start_logits,
            end_logits=end_logits,
            hidden_states=outputs.hidden_states,
            attentions=outputs.attentions,
        )
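
# Decoding sketch for ``BertForQuestionAnswering`` (illustrative, not part of
# the original file): the usual way to turn start/end logits into an answer
# span for a single example, ignoring constraints such as end >= start:
#
#     >>> start_logits, end_logits = model(**encoding)[:2]
#     >>> start_index = int(start_logits[0].argmax())
#     >>> end_index = int(end_logits[0].argmax())
#     >>> answer_ids = encoding["input_ids"][0, start_index : end_index + 1]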


================================================
FILE: code/bert-base-chinese/config.json
================================================
{
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}


================================================
FILE: code/bert-base-count3/finetuning/.ipynb_checkpoints/PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.  \n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inference PyTorch Bert Model with ONNX Runtime on GPU"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this tutorial, you'll learn how to load a Bert model from PyTorch, convert it to ONNX, and inference it for high performance using ONNX Runtime and NVIDIA GPU. In the following sections, we are going to use the Bert model trained with Stanford Question Answering Dataset (SQuAD) dataset as an example. Bert SQuAD model is used in question answering scenarios, where the answer to every question is a segment of text from the corresponding reading passage, or the question might be unanswerable.\n",
    "\n",
    "This notebook is for GPU inference. For CPU inference, please look at another notebook [Inference PyTorch Bert Model with ONNX Runtime on CPU](PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0. Prerequisites ##\n",
    "It requires your machine to have a GPU, and a python environment with [PyTorch](https://pytorch.org/) installed before running this notebook.\n",
    "\n",
    "#### GPU Environment Setup using AnaConda\n",
    "\n",
    "First, we install [AnaConda](https://www.anaconda.com/distribution/) in a target machine and open an AnaConda prompt window when it is done. Then run the following commands to create a conda environment. This notebook is tested with PyTorch 1.5.0 and OnnxRuntime 1.3.0.\n",
    "\n",
    "```console\n",
    "conda create -n gpu_env python=3.7\n",
    "conda activate gpu_env\n",
    "conda install pytorch torchvision cudatoolkit=10.1 -c pytorch\n",
    "conda install -c anaconda ipykernel\n",
    "conda install -c conda-forge ipywidgets\n",
    "python -m ipykernel install --user --name=gpu_env_py37\n",
    "jupyter notebook\n",
    "```\n",
    "Finally, launch Jupyter Notebook and you can choose gpu_env_py37 as kernel to run this notebook.\n",
    "\n",
    "Onnxruntime-gpu need specified version of CUDA and cuDNN. You can find the corresponding version in [requirements](https://github.com/microsoft/onnxruntime/tree/rel-1.3.0#system-requirements). If the version is different from above cudatoolkit version, you have to install them separately, and add their bin directories to PATH environment variable (See [CUDA and cuDNN Path](#CUDA-and-cuDNN-Path) below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[33mWARNING: Skipping onnxruntime-gpu as it is not installed.\u001b[0m\r\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip uninstall --quiet --yes onnxruntime-gpu\n",
    "!{sys.executable} -m pip install --quiet onnxruntime-gpu\n",
    "!{sys.executable} -m pip install --quiet --upgrade transformers\n",
    "!{sys.executable} -m pip install --quiet --upgrade onnxconverter_common\n",
    "!{sys.executable} -m pip install --quiet --upgrade onnxruntime-tools\n",
    "!{sys.executable} -m pip install --quiet wget netron pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load Pretrained Bert model ##"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We begin by downloading the SQuAD data file and store them in the specified location. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "cache_dir = \"./squad\"\n",
    "if not os.path.exists(cache_dir):\n",
    "    os.makedirs(cache_dir)\n",
    "\n",
    "predict_file_url = \"https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json\"\n",
    "predict_file = os.path.join(cache_dir, \"dev-v1.1.json\")\n",
    "if not os.path.exists(predict_file):\n",
    "    import wget\n",
    "    print(\"Start downloading predict file.\")\n",
    "    wget.download(predict_file_url, predict_file)\n",
    "    print(\"Predict file downloaded.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first define some constant variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Whether allow overwriting existing ONNX model and download the latest script from GitHub\n",
    "enable_overwrite = True\n",
    "\n",
    "# Total samples to inference, so that we can get average latency\n",
    "total_samples = 1000\n",
    "\n",
    "# ONNX opset version\n",
    "opset_version=11"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify some model configuration variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# For fine-tuned large model, the model name is \"bert-large-uncased-whole-word-masking-finetuned-squad\". Here we use bert-base for demo.\n",
    "model_name_or_path = \"bert-base-cased\"\n",
    "max_seq_length = 128\n",
    "doc_stride = 128\n",
    "max_query_length = 64"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Start to load model from pretrained. This step could take a few minutes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 48/48 [00:04<00:00, 11.28it/s]\n",
      "convert squad examples to features: 100%|██████████| 1000/1000 [00:09<00:00, 102.15it/s]\n",
      "add example index and unique id: 100%|██████████| 1000/1000 [00:00<00:00, 161306.98it/s]\n"
     ]
    }
   ],
   "source": [
    "# The following code is adapted from HuggingFace transformers\n",
    "# https://github.com/huggingface/transformers/blob/master/examples/run_squad.py\n",
    "\n",
    "from transformers import (BertConfig, BertForQuestionAnswering, BertTokenizer)\n",
    "\n",
    "# Load pretrained model and tokenizer\n",
    "config_class, model_class, tokenizer_class = (BertConfig, BertForQuestionAnswering, BertTokenizer)\n",
    "config = config_class.from_pretrained(model_name_or_path, cache_dir=cache_dir)\n",
    "tokenizer = tokenizer_class.from_pretrained(model_name_or_path, do_lower_case=True, cache_dir=cache_dir)\n",
    "model = model_class.from_pretrained(model_name_or_path,\n",
    "                                    from_tf=False,\n",
    "                                    config=config,\n",
    "                                    cache_dir=cache_dir)\n",
    "# load some examples\n",
    "from transformers.data.processors.squad import SquadV1Processor\n",
    "\n",
    "processor = SquadV1Processor()\n",
    "examples = processor.get_dev_examples(None, filename=predict_file)\n",
    "\n",
    "from transformers import squad_convert_examples_to_features\n",
    "features, dataset = squad_convert_examples_to_features( \n",
    "            examples=examples[:total_samples], # convert enough examples for this notebook\n",
    "            tokenizer=tokenizer,\n",
    "            max_seq_length=max_seq_length,\n",
    "            doc_stride=doc_stride,\n",
    "            max_query_length=max_query_length,\n",
    "            is_training=False,\n",
    "            return_dataset='pt'\n",
    "        )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Export the loaded model ##\n",
    "Once the model is loaded, we can export the loaded PyTorch model to ONNX."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model exported at  ./onnx/bert-base-cased-squad_opset11.onnx\n"
     ]
    }
   ],
   "source": [
    "output_dir = \"./onnx\"\n",
    "if not os.path.exists(output_dir):\n",
    "    os.makedirs(output_dir)   \n",
    "export_model_path = os.path.join(output_dir, 'bert-base-cased-squad_opset{}.onnx'.format(opset_version))\n",
    "\n",
    "import torch\n",
    "use_gpu = torch.cuda.is_available()\n",
    "device = torch.device(\"cuda\" if use_gpu else \"cpu\")\n",
    "\n",
    "# Get the first example data to run the model and export it to ONNX\n",
    "data = dataset[0]\n",
    "inputs = {\n",
    "    'input_ids':      data[0].to(device).reshape(1, max_seq_length),\n",
    "    'attention_mask': data[1].to(device).reshape(1, max_seq_length),\n",
    "    'token_type_ids': data[2].to(device).reshape(1, max_seq_length)\n",
    "}\n",
    "\n",
    "# Set model to inference mode, which is required before exporting the model because some operators behave differently in \n",
    "# inference and training mode.\n",
    "model.eval()\n",
    "model.to(device)\n",
    "\n",
    "if enable_overwrite or not os.path.exists(export_model_path):\n",
    "    with torch.no_grad():\n",
    "        symbolic_names = {0: 'batch_size', 1: 'max_seq_len'}\n",
    "        torch.onnx.export(model,                                            # model being run\n",
    "                          args=tuple(inputs.values()),                      # model input (or a tuple for multiple inputs)\n",
    "                          f=export_model_path,                              # where to save the model (can be a file or file-like object)\n",
    "                          opset_version=opset_version,                      # the ONNX version to export the model to\n",
    "                          do_constant_folding=True,                         # whether to execute constant folding for optimization\n",
    "                          input_names=['input_ids',                         # the model's input names\n",
    "                                       'input_mask', \n",
    "                                       'segment_ids'],\n",
    "                          output_names=['start', 'end'],                    # the model's output names\n",
    "                          dynamic_axes={'input_ids': symbolic_names,        # variable length axes\n",
    "                                        'input_mask' : symbolic_names,\n",
    "                                        'segment_ids' : symbolic_names,\n",
    "                                        'start' : symbolic_names,\n",
    "                                        'end' : symbolic_names})\n",
    "        print(\"Model exported at \", export_model_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. PyTorch Inference ##\n",
    "Use PyTorch to evaluate an example input for comparison purpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "PyTorch cuda Inference time = 16.57 ms\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "# Measure the latency. It is not accurate using Jupyter Notebook, it is recommended to use standalone python script.\n",
    "latency = []\n",
    "with torch.no_grad():\n",
    "    for i in range(total_samples):\n",
    "        data = dataset[i]\n",
    "        inputs = {\n",
    "            'input_ids':      data[0].to(device).reshape(1, max_seq_length),\n",
    "            'attention_mask': data[1].to(device).reshape(1, max_seq_length),\n",
    "            'token_type_ids': data[2].to(device).reshape(1, max_seq_length)\n",
    "        }\n",
    "        start = time.time()\n",
    "        outputs = model(**inputs)\n",
    "        latency.append(time.time() - start)\n",
    "print(\"PyTorch {} Inference time = {} ms\".format(device.type, format(sum(latency) * 1000 / len(latency), '.2f')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Inference ONNX Model with ONNX Runtime ##\n",
    "\n",
    "### CUDA and cuDNN Path\n",
    "onnxruntime-gpu has dependency on [CUDA](https://developer.nvidia.com/cuda-downloads) and [cuDNN](https://developer.nvidia.com/cudnn):\n",
    "\n",
    "* [onnxruntime-gpu v1.3.0](https://github.com/microsoft/onnxruntime/tree/rel-1.3.0#system-requirements) requires CUDA Runtime 10.1 and CUDNN 7.6.5.\n",
    "* [onnxruntime-gpu v1.2.0](https://github.com/microsoft/onnxruntime/releases/tag/v1.2.0) requires CUDA Runtime 10.1 and CUDNN 7.6.5.\n",
    "\n",
    "During installing PyTorch 1.5, we installed cudatoolkit 10.1.243 in this conda environment. That shall be good for onnxruntime-gpu 1.3.0 in Jupyter Notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Change to True when onnxruntime (like onnxruntime-gpu 1.0.0 ~ 1.1.2) cannot be imported.\n",
    "add_cuda_path = False\n",
    "\n",
    "if add_cuda_path:\n",
    "    # Add path of CUDA 10.0 and CUDNN 7.6 for onnxruntime-gpu 1.0.0 ~ 1.1.2\n",
    "    cuda_dir = 'D:/NVidia/CUDA/v10.1/bin'\n",
    "    cudnn_dir = 'D:/NVidia/CUDA/v10.1/bin'\n",
    "    if not (os.path.exists(cuda_dir) and os.path.exists(cudnn_dir)):\n",
    "        raise ValueError(\"Please specify correct path for CUDA and cuDNN. Otherwise onnxruntime cannot be imported.\")\n",
    "    else:\n",
    "        if cuda_dir == cudnn_dir:\n",
    "            os.environ[\"PATH\"] = cuda_dir + ';' + os.environ[\"PATH\"]\n",
    "        else:\n",
    "            os.environ[\"PATH\"] = cuda_dir + ';' + cudnn_dir + ';' + os.environ[\"PATH\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### OpenMP Environment Variable\n",
    "\n",
    "OpenMP environment variables are optional for GPU inference of standard Bert model. It has little performance impact on Bert model since most nodes are executed in GPU. \n",
    "\n",
    "You can find the best setting based on [Performance Test Tool](#Performance-Test-Tool) result in later part of this notebook.\n",
    "\n",
    "**Attention: Setting environment variables shall be done before importing onnxruntime**. Otherwise, they might not take effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional. You can change them according to Performance Test Tool result.\n",
    "#os.environ[\"OMP_NUM_THREADS\"] = '1'\n",
    "#os.environ[\"OMP_WAIT_POLICY\"] = 'PASSIVE'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are ready to inference the model with ONNX Runtime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "OnnxRuntime gpu Inference time = 4.43 ms\n"
     ]
    }
   ],
   "source": [
    "import psutil\n",
    "import onnxruntime\n",
    "import numpy\n",
    "\n",
    "assert 'CUDAExecutionProvider' in onnxruntime.get_available_providers()\n",
    "device_name = 'gpu'\n",
    "\n",
    "sess_options = onnxruntime.SessionOptions()\n",
    "\n",
    "# Optional: store the optimized graph and view it using Netron to verify that model is fully optimized.\n",
    "# Note that this will increase session creation time so enable it for debugging only.\n",
    "sess_options.optimized_model_filepath = os.path.join(output_dir, \"optimized_model_{}.onnx\".format(device_name))\n",
    "\n",
    "# Please change the value according to best setting in Performance Test Tool result.\n",
    "sess_options.intra_op_num_threads=psutil.cpu_count(logical=True)\n",
    "\n",
    "session = onnxruntime.InferenceSession(export_model_path, sess_options)\n",
    "\n",
    "latency = []\n",
    "for i in range(total_samples):\n",
    "    data = dataset[i]\n",
    "    # TODO: use IO Binding (see https://github.com/microsoft/onnxruntime/pull/4206) to improve performance.\n",
    "    ort_inputs = {\n",
    "        'input_ids':  data[0].cpu().reshape(1, max_seq_length).numpy(),\n",
    "        'input_mask': data[1].cpu().reshape(1, max_seq_length).numpy(),\n",
    "        'segment_ids': data[2].cpu().reshape(1, max_seq_length).numpy()\n",
    "    }\n",
    "    start = time.time()\n",
    "    ort_outputs = session.run(None, ort_inputs)\n",
    "    latency.append(time.time() - start)\n",
    "    \n",
    "print(\"OnnxRuntime {} Inference time = {} ms\".format(device_name, format(sum(latency) * 1000 / len(latency), '.2f')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can compare the output of PyTorch and ONNX Runtime. We can see some results are not close. It is because ONNX Runtime uses some approximation in CUDA optimization. Based on our evaluation on SQuAD data set, F1 score is on par for models before and after optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "***** Verifying correctness *****\n",
      "PyTorch and ONNX Runtime output 0 are close: True\n",
      "maximum_diff=9.499490261077881e-07 average_diff=1.4225952327251434e-07\n",
      "PyTorch and ONNX Runtime output 1 are close: True\n",
      "maximum_diff=6.92903995513916e-07 average_diff=1.2441887520253658e-07\n"
     ]
    }
   ],
   "source": [
    "print(\"***** Verifying correctness *****\")\n",
    "for i in range(2):    \n",
    "    print('PyTorch and ONNX Runtime output {} are close:'.format(i), numpy.allclose(ort_outputs[i], outputs[i].cpu(), rtol=1e-02, atol=1e-02))\n",
    "    diff = ort_outputs[i] - outputs[i].cpu().numpy()\n",
    "    max_diff = numpy.max(numpy.abs(diff))\n",
    "    avg_diff = numpy.average(numpy.abs(diff))\n",
    "    print(f'maximum_diff={max_diff} average_diff={avg_diff}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inference with Actual Sequence Length\n",
    "Note that ONNX model is exported using dynamic length axis. It is recommended to use actual sequence input without padding instead of fixed length input for best performance. Let's see how it can be applied to this model.\n",
    "\n",
    "From an example input below, we can see zero padding at the end of each sequence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'input_ids': tensor([[  101,  1293,  1242,  2557,  1127,  1226,  1104,  1103,  3613, 16429,\n",
       "           5235,   136,   102,  3613, 16429,  5988,   170,   107,  1353,  1671,\n",
       "           1992,  1342,   107,  5235,   117,  1107,  1134,  1473,  3683,  3538,\n",
       "           1125,   170,  1476,   118,  1248,  2595,  4086,  1714,  1104,  2965,\n",
       "          15897,  1104,  3613, 16429,   119,  1473,  3683,  3538,  3222,  1149,\n",
       "           2551,  1168, 23759,  1116,  1121,  1506,  1103, 10280,  2231,  1111,\n",
       "           1103,  1714, 16355,   119,   102,     0,     0,     0,     0,     0,\n",
       "              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n",
       "              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n",
       "              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n",
       "              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n",
       "              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,\n",
       "              0,     0,     0,     0,     0,     0,     0,     0]],\n",
       "        device='cuda:0'),\n",
       " 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
       "          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
       "          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,\n",
       "          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
       "          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
       "          0, 0, 0, 0, 0, 0, 0, 0]], device='cuda:0'),\n",
       " 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
       "          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
       "          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,\n",
       "          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
       "          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
       "          0, 0, 0, 0, 0, 0, 0, 0]], device='cuda:0')}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# An example input (we can see padding). From attention_mask, we can deduce the actual length.\n",
    "inputs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The original sequence length is 128. After removing paddings, the sequence length is reduced. Input with smaller sequence length need less computation, thus we can see there is improvement on inference latency. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Average length 101\n",
      "OnnxRuntime gpu Inference time with actual sequence length = 4.23 ms\n"
     ]
    }
   ],
   "source": [
    "import statistics\n",
    "\n",
    "latency = []\n",
    "lengths = []\n",
    "for i in range(total_samples):\n",
    "    data = dataset[i]\n",
    "    # Instead of using fixed length (128), we can use actual sequence length (less than 128), which helps to get better performance.\n",
    "    actual_sequence_length = sum(data[1].numpy())\n",
    "    lengths.append(actual_sequence_length)\n",
    "    opt_inputs = {\n",
    "        'input_ids':  data[0].numpy()[:actual_sequence_length].reshape(1, actual_sequence_length),\n",
    "        'input_mask': data[1].numpy()[:actual_sequence_length].reshape(1, actual_sequence_length),\n",
    "        'segment_ids': data[2].numpy()[:actual_sequence_length].reshape(1, actual_sequence_length)\n",
    "    }\n",
    "    start = time.time()\n",
    "    opt_outputs = session.run(None, opt_inputs)\n",
    "    latency.append(time.time() - start)\n",
    "print(\"Average length\", statistics.mean(lengths))\n",
    "print(\"OnnxRuntime {} Inference time with actual sequence length = {} ms\".format(device_name, format(sum(latency) * 1000 / len(latency), '.2f')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's compare the output and see whether the results are close.\n",
    "\n",
    "**Note**: Need end-to-end evaluation on performance and accuracy if you use this strategy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "***** Comparing results with/without paddings *****\n",
      "Output 0 are close: True\n",
      "Output 1 are close: True\n"
     ]
    }
   ],
   "source": [
    "print(\"***** Comparing results with/without paddings *****\")\n",
    "for i in range(2):\n",
    "    print('Output {} are close:'.format(i), numpy.allclose(opt_outputs[i], ort_outputs[i][:,:len(opt_outputs[i][0])], rtol=1e-03, atol=1e-03))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Offline Optimization and Test Tools\n",
    "\n",
    "It is recommended to try [OnnxRuntime Transformer Model Optimization Tool](https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers) on the exported ONNX models. It could help verify whether the model can be fully optimized, and get performance test results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Transformer Optimizer\n",
    "\n",
    "Although OnnxRuntime could optimize Bert model exported by PyTorch. Sometime, model cannot be fully optimized due to different reasons:\n",
    "* A new subgraph pattern is generated by new version of export tool, and the pattern is not covered by older version of OnnxRuntime. \n",
    "* The exported model uses dynamic axis and this makes it harder for shape inference of the graph. That blocks some optimization to be applied.\n",
    "* Some optimization is better to be done offline. Like change input tensor type from int64 to int32 to avoid extra Cast nodes, or convert model to float16 to achieve better performance in V100 or T4 GPU.\n",
    "\n",
    "We have python script **optimizer.py**, which is more flexible in graph pattern matching and model conversion (like float32 to float16). You can also use it to verify whether a Bert model is fully optimized.\n",
    "\n",
    "In this example, we can see that it introduces optimization that is not provided by onnxruntime: SkipLayerNormalization and bias fusion, which is not fused in OnnxRuntime due to shape inference as mentioned.\n",
    "\n",
    "It will also tell whether the model is fully optimized or not. If not, that means you might need change the script to fuse some new pattern of subgraph.\n",
    "\n",
    "Example Usage:\n",
    "```\n",
    "from onnxruntime_tools import optimizer\n",
    "optimized_model = optimizer.optimize_model(export_model_path, model_type='bert', num_heads=12, hidden_size=768)\n",
    "optimized_model.save_model_to_file(optimized_model_path)\n",
    "```\n",
    "\n",
    "You can also use optimizer_cli like the following:"
   ]
  },
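  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, a minimal sketch of the CLI invocation (file names here are placeholders; the Float32 Model section below runs it with the real paths):\n",
    "```console\n",
    "python -m onnxruntime_tools.optimizer_cli --input bert.onnx --output bert_opt.onnx\n",
    "```"
   ]
  },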
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Float32 Model\n",
    "Let us optimize the ONNX model using the script. The first example will output model with float32 to store weights. This is the choice for most GPUs without Tensor Core.\n",
    "\n",
    "If your GPU (like V100 or T4) has Tensor Core, jump to [Float16 Model](#6.-Model-Optimization-with-Float16) section since that will give you better performance than Float32 model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "optimize_by_onnxruntime: Save optimized model by onnxruntime to ./onnx/bert-base-cased-squad_opset11_o1_cpu.onnx\n",
      "               apply: Fused LayerNormalization count: 25\n",
      "               apply: Fused Gelu count: 12\n",
      "               apply: Fused SkipLayerNormalization count: 25\n",
      "               apply: Fused Attention count: 12\n",
      "         prune_graph: Graph pruned: 0 inputs, 0 outputs and 5 nodes are removed\n",
      "               apply: Fused EmbedLayerNormalization(with mask) count: 1\n",
      "         prune_graph: Graph pruned: 0 inputs, 0 outputs and 12 nodes are removed\n",
      "         prune_graph: Graph pruned: 0 inputs, 0 outputs and 0 nodes are removed\n",
      "               apply: Fused BiasGelu count: 12\n",
      "               apply: Fused SkipLayerNormalization(add bias) count: 24\n",
      "            optimize: opset verion: 11\n",
      "  save_model_to_file: Output model to ./onnx/bert-base-cased-squad_opt_gpu_fp32.onnx\n",
      "get_fused_operator_statistics: Optimized operators:{'EmbedLayerNormalization': 1, 'Attention': 12, 'Gelu': 0, 'FastGelu': 0, 'BiasGelu': 12, 'LayerNormalization': 0, 'SkipLayerNormalization': 24}\n",
      "                main: The model has been fully optimized.\n"
     ]
    }
   ],
   "source": [
    "optimized_fp32_model_path = './onnx/bert-base-cased-squad_opt_{}_fp32.onnx'.format('gpu' if use_gpu else 'cpu')\n",
    "\n",
    "!python -m onnxruntime_tools.optimizer_cli --input $export_model_path --output $optimized_fp32_model_path"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Optimized Graph\n",
    "We can open the optimized model using [Netron](https://github.com/lutzroeder/netron) to visualize.\n",
    "\n",
    "The graph is like the following:\n",
    "<img src='images/optimized_bert_gpu.png'>\n",
    "\n",
    "Sometime, optimized graph is slightly different. For example, FastGelu is replaced by BiasGelu for CPU inference; When the option --input_int32 is used, Cast nodes for inputs are removed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "import netron\n",
    "\n",
    "# change it to True if want to view the optimized model in browser\n",
    "enable_netron = False\n",
    "if enable_netron:\n",
    "    # If you encounter error \"access a socket in a way forbidden by its access permissions\", install Netron as standalone application instead.\n",
    "    netron.start(optimized_fp32_model_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performance Test Tool\n",
    "\n",
    "The following will create 1000 random inputs of batch_size 1 and sequence length 128, then measure the average latency and throughput numbers.\n",
    "\n",
    "Note that the test uses fixed sequence length. If you use [dynamic sequence length](#Inference-with-Actual-Sequence-Length), actual performance depends on the distribution of sequence length.\n",
    "\n",
    "**Attention**: Latency numbers from Jupyter Notebook are not accurate. See [Attional Info](#7.-Additional-Info) for more info."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "test setting TestSetting(batch_size=1, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=None, omp_wait_policy=None, intra_op_num_threads=None, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=1 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=0,OMP_NUM_THREADS=,OMP_WAIT_POLICY=,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 4.92 ms, Throughput = 203.24 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 4.90 ms, Throughput = 203.88 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=1,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 5.07 ms, Throughput = 197.16 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 4.82 ms, Throughput = 207.33 QPS\n",
      "skip duplicated test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "skip duplicated test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=1,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=1,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 4.93 ms, Throughput = 202.92 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 4.91 ms, Throughput = 203.55 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp32.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 4.88 ms, Throughput = 204.90 QPS\n",
      "Test summary is saved to onnx/perf_results_GPU_B1_S128_20200617-232134.txt\n"
     ]
    }
   ],
   "source": [
    "GPU_OPTION = '--use_gpu' if use_gpu else ''\n",
    "\n",
    "!python -m onnxruntime_tools.transformers.bert_perf_test --model $optimized_fp32_model_path --batch_size 1 --sequence_length 128 --samples 1000 --test_times 1 --inclusive --all $GPU_OPTION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load the summary file and take a look. Note that blank value in OMP_NUM_THREADS or OMP_WAIT_POLICY means the environment variable does not exist."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Float32 model perf results from ./onnx/perf_results_GPU_B1_S128_20200617-232134.txt\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Latency(ms)</th>\n",
       "      <th>Latency_P50</th>\n",
       "      <th>Latency_P75</th>\n",
       "      <th>Latency_P90</th>\n",
       "      <th>Latency_P95</th>\n",
       "      <th>Latency_P99</th>\n",
       "      <th>Throughput(QPS)</th>\n",
       "      <th>intra_op_num_threads</th>\n",
       "      <th>OMP_NUM_THREADS</th>\n",
       "      <th>OMP_WAIT_POLICY</th>\n",
       "      <th>contiguous</th>\n",
       "      <th>warmup</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>4.82</td>\n",
       "      <td>4.53</td>\n",
       "      <td>4.57</td>\n",
       "      <td>5.15</td>\n",
       "      <td>7.25</td>\n",
       "      <td>8.75</td>\n",
       "      <td>207.33</td>\n",
       "      <td>1</td>\n",
       "      <td>12</td>\n",
       "      <td>ACTIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4.88</td>\n",
       "      <td>4.54</td>\n",
       "      <td>4.58</td>\n",
       "      <td>6.47</td>\n",
       "      <td>7.13</td>\n",
       "      <td>8.68</td>\n",
       "      <td>204.90</td>\n",
       "      <td>12</td>\n",
       "      <td>12</td>\n",
       "      <td>PASSIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4.90</td>\n",
       "      <td>4.54</td>\n",
       "      <td>4.57</td>\n",
       "      <td>6.16</td>\n",
       "      <td>7.64</td>\n",
       "      <td>8.82</td>\n",
       "      <td>203.88</td>\n",
       "      <td>1</td>\n",
       "      <td>12</td>\n",
       "      <td>PASSIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4.91</td>\n",
       "      <td>4.55</td>\n",
       "      <td>4.59</td>\n",
       "      <td>6.70</td>\n",
       "      <td>7.43</td>\n",
       "      <td>8.78</td>\n",
       "      <td>203.55</td>\n",
       "      <td>12</td>\n",
       "      <td>12</td>\n",
       "      <td>ACTIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4.92</td>\n",
       "      <td>4.57</td>\n",
       "      <td>4.60</td>\n",
       "      <td>6.50</td>\n",
       "      <td>7.82</td>\n",
       "      <td>8.90</td>\n",
       "      <td>203.24</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>4.93</td>\n",
       "      <td>4.55</td>\n",
       "      <td>4.59</td>\n",
       "      <td>6.66</td>\n",
       "      <td>7.57</td>\n",
       "      <td>8.80</td>\n",
       "      <td>202.92</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "      <td>PASSIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>5.07</td>\n",
       "      <td>4.56</td>\n",
       "      <td>4.61</td>\n",
       "      <td>7.19</td>\n",
       "      <td>8.11</td>\n",
       "      <td>9.01</td>\n",
       "      <td>197.16</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "      <td>ACTIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Latency(ms)  Latency_P50  Latency_P75  Latency_P90  Latency_P95  \\\n",
       "0         4.82         4.53         4.57         5.15         7.25   \n",
       "1         4.88         4.54         4.58         6.47         7.13   \n",
       "2         4.90         4.54         4.57         6.16         7.64   \n",
       "3         4.91         4.55         4.59         6.70         7.43   \n",
       "4         4.92         4.57         4.60         6.50         7.82   \n",
       "5         4.93         4.55         4.59         6.66         7.57   \n",
       "6         5.07         4.56         4.61         7.19         8.11   \n",
       "\n",
       "   Latency_P99  Throughput(QPS)  intra_op_num_threads OMP_NUM_THREADS  \\\n",
       "0         8.75           207.33                     1              12   \n",
       "1         8.68           204.90                    12              12   \n",
       "2         8.82           203.88                     1              12   \n",
       "3         8.78           203.55                    12              12   \n",
       "4         8.90           203.24                     0                   \n",
       "5         8.80           202.92                    12               1   \n",
       "6         9.01           197.16                    12               1   \n",
       "\n",
       "  OMP_WAIT_POLICY contiguous  warmup  \n",
       "0          ACTIVE       None    True  \n",
       "1         PASSIVE       None    True  \n",
       "2         PASSIVE       None    True  \n",
       "3          ACTIVE       None    True  \n",
       "4                       None    True  \n",
       "5         PASSIVE       None    True  \n",
       "6          ACTIVE       None    True  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "import glob     \n",
    "import pandas\n",
    "latest_result_file = max(glob.glob(\"./onnx/perf_results_GPU_B1_S128_*.txt\"), key=os.path.getmtime)\n",
    "result_data = pandas.read_table(latest_result_file, converters={'OMP_NUM_THREADS': str, 'OMP_WAIT_POLICY':str})\n",
    "print(\"Float32 model perf results from\", latest_result_file)\n",
    "# Remove some columns that have same values for all rows.\n",
    "columns_to_remove = ['model', 'graph_optimization_level', 'batch_size', 'sequence_length', 'test_cases', 'test_times', 'use_gpu']\n",
    "result_data.drop(columns_to_remove, axis=1, inplace=True)\n",
    "result_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From above result, we can see that latency is very close for different settings. The default setting (intra_op_num_threads=0, OMP_NUM_THREADS and OMP_WAIT_POLICY does not exist) performs the best. \n",
    "\n",
    "### Model Results Comparison Tool\n",
    "\n",
    "When a BERT model is optimized, some approximation is used in calculation. If your BERT model has three inputs, a script compare_bert_results.py can be used to do a quick verification. The tool will generate some fake input data, and compare the inference outputs of the original and optimized models. If outputs are all close, it is safe to use the optimized model.\n",
    "\n",
    "For GPU inference, the absolute or relative difference is larger than those numbers of CPU inference. Note that slight difference in output will not impact final result. We did end-to-end evaluation using SQuAD data set using a fine-tuned squad model, and F1 score is almost the same before/after optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100% passed for 100 random inputs given thresholds (rtol=0.01, atol=0.01).\r\n",
      "maximum absolute difference=1.9222497940063477e-06\r\n",
      "maximum relative difference=0.05027933046221733\r\n"
     ]
    }
   ],
   "source": [
    "!python -m onnxruntime_tools.transformers.compare_bert_results --baseline_model $export_model_path --optimized_model $optimized_fp32_model_path --batch_size 1 --sequence_length 128 --samples 100 --rtol 0.01 --atol 0.01 $GPU_OPTION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Model Optimization with Float16\n",
    "\n",
    "The optimizer.py script have an option **--float16** to convert model to use float16 to store weights. After the conversion, it could be faster to run in GPU with tensor cores like V100 or T4.\n",
    "\n",
    "Let's run tools to measure the performance on V100. The results show significant performance improvement: latency is about 3.4 ms for float32 model, and 1.8 ms for float16 model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "optimize_by_onnxruntime: Save optimized model by onnxruntime to ./onnx/bert-base-cased-squad_opset11_o1_cpu.onnx\n",
      "               apply: Fused LayerNormalization count: 25\n",
      "               apply: Fused Gelu count: 12\n",
      "               apply: Fused SkipLayerNormalization count: 25\n",
      "               apply: Fused Attention count: 12\n",
      "         prune_graph: Graph pruned: 0 inputs, 0 outputs and 5 nodes are removed\n",
      "               apply: Fused EmbedLayerNormalization(with mask) count: 1\n",
      "         prune_graph: Graph pruned: 0 inputs, 0 outputs and 12 nodes are removed\n",
      "         prune_graph: Graph pruned: 0 inputs, 0 outputs and 0 nodes are removed\n",
      "               apply: Fused BiasGelu count: 12\n",
      "               apply: Fused SkipLayerNormalization(add bias) count: 24\n",
      "            optimize: opset verion: 11\n",
      "  save_model_to_file: Output model to ./onnx/bert-base-cased-squad_opt_gpu_fp16.onnx\n",
      "get_fused_operator_statistics: Optimized operators:{'EmbedLayerNormalization': 1, 'Attention': 12, 'Gelu': 0, 'FastGelu': 0, 'BiasGelu': 12, 'LayerNormalization': 0, 'SkipLayerNormalization': 24}\n",
      "                main: The model has been fully optimized.\n"
     ]
    }
   ],
   "source": [
    "optimized_fp16_model_path = './onnx/bert-base-cased-squad_opt_{}_fp16.onnx'.format('gpu' if use_gpu else 'cpu')\n",
    "!python -m onnxruntime_tools.optimizer_cli --input $export_model_path --output $optimized_fp16_model_path --float16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "test setting TestSetting(batch_size=1, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=None, omp_wait_policy=None, intra_op_num_threads=None, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=1 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=0,OMP_NUM_THREADS=,OMP_WAIT_POLICY=,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.01 ms, Throughput = 331.90 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.12 ms, Throughput = 320.00 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=1,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.02 ms, Throughput = 331.39 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.01 ms, Throughput = 332.53 QPS\n",
      "skip duplicated test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "skip duplicated test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=1,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=1,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.04 ms, Throughput = 328.67 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.01 ms, Throughput = 331.72 QPS\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=12,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=PASSIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.04 ms, Throughput = 329.32 QPS\n",
      "Test summary is saved to onnx/perf_results_GPU_B1_S128_20200617-232234.txt\n"
     ]
    }
   ],
   "source": [
    "GPU_OPTION = '--use_gpu' if use_gpu else ''\n",
    "!python -m onnxruntime_tools.transformers.bert_perf_test --model $optimized_fp16_model_path --batch_size 1 --sequence_length 128 --samples 1000 --test_times 1 --inclusive --all $GPU_OPTION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Float32 model perf results from ./onnx/perf_results_GPU_B1_S128_20200617-232234.txt\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Latency(ms)</th>\n",
       "      <th>Latency_P50</th>\n",
       "      <th>Latency_P75</th>\n",
       "      <th>Latency_P90</th>\n",
       "      <th>Latency_P95</th>\n",
       "      <th>Latency_P99</th>\n",
       "      <th>Throughput(QPS)</th>\n",
       "      <th>intra_op_num_threads</th>\n",
       "      <th>OMP_NUM_THREADS</th>\n",
       "      <th>OMP_WAIT_POLICY</th>\n",
       "      <th>contiguous</th>\n",
       "      <th>warmup</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3.01</td>\n",
       "      <td>2.79</td>\n",
       "      <td>2.81</td>\n",
       "      <td>2.86</td>\n",
       "      <td>5.08</td>\n",
       "      <td>7.16</td>\n",
       "      <td>332.53</td>\n",
       "      <td>1</td>\n",
       "      <td>12</td>\n",
       "      <td>ACTIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3.01</td>\n",
       "      <td>2.80</td>\n",
       "      <td>2.81</td>\n",
       "      <td>2.88</td>\n",
       "      <td>4.52</td>\n",
       "      <td>7.05</td>\n",
       "      <td>331.90</td>\n",
       "      <td>0</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.01</td>\n",
       "      <td>2.78</td>\n",
       "      <td>2.80</td>\n",
       "      <td>2.92</td>\n",
       "      <td>5.01</td>\n",
       "      <td>7.02</td>\n",
       "      <td>331.72</td>\n",
       "      <td>12</td>\n",
       "      <td>12</td>\n",
       "      <td>ACTIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3.02</td>\n",
       "      <td>2.79</td>\n",
       "      <td>2.80</td>\n",
       "      <td>2.85</td>\n",
       "      <td>6.34</td>\n",
       "      <td>7.04</td>\n",
       "      <td>331.39</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "      <td>ACTIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3.04</td>\n",
       "      <td>2.80</td>\n",
       "      <td>2.82</td>\n",
       "      <td>2.93</td>\n",
       "      <td>5.56</td>\n",
       "      <td>7.08</td>\n",
       "      <td>329.32</td>\n",
       "      <td>12</td>\n",
       "      <td>12</td>\n",
       "      <td>PASSIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>3.04</td>\n",
       "      <td>2.79</td>\n",
       "      <td>2.81</td>\n",
       "      <td>2.92</td>\n",
       "      <td>6.37</td>\n",
       "      <td>7.08</td>\n",
       "      <td>328.67</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "      <td>PASSIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3.12</td>\n",
       "      <td>2.79</td>\n",
       "      <td>2.82</td>\n",
       "      <td>2.96</td>\n",
       "      <td>6.66</td>\n",
       "      <td>7.20</td>\n",
       "      <td>320.00</td>\n",
       "      <td>1</td>\n",
       "      <td>12</td>\n",
       "      <td>PASSIVE</td>\n",
       "      <td>None</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Latency(ms)  Latency_P50  Latency_P75  Latency_P90  Latency_P95  \\\n",
       "0         3.01         2.79         2.81         2.86         5.08   \n",
       "1         3.01         2.80         2.81         2.88         4.52   \n",
       "2         3.01         2.78         2.80         2.92         5.01   \n",
       "3         3.02         2.79         2.80         2.85         6.34   \n",
       "4         3.04         2.80         2.82         2.93         5.56   \n",
       "5         3.04         2.79         2.81         2.92         6.37   \n",
       "6         3.12         2.79         2.82         2.96         6.66   \n",
       "\n",
       "   Latency_P99  Throughput(QPS)  intra_op_num_threads OMP_NUM_THREADS  \\\n",
       "0         7.16           332.53                     1              12   \n",
       "1         7.05           331.90                     0                   \n",
       "2         7.02           331.72                    12              12   \n",
       "3         7.04           331.39                    12               1   \n",
       "4         7.08           329.32                    12              12   \n",
       "5         7.08           328.67                    12               1   \n",
       "6         7.20           320.00                     1              12   \n",
       "\n",
       "  OMP_WAIT_POLICY contiguous  warmup  \n",
       "0          ACTIVE       None    True  \n",
       "1                       None    True  \n",
       "2          ACTIVE       None    True  \n",
       "3          ACTIVE       None    True  \n",
       "4         PASSIVE       None    True  \n",
       "5         PASSIVE       None    True  \n",
       "6         PASSIVE       None    True  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "import glob     \n",
    "import pandas\n",
    "latest_result_file = max(glob.glob(\"./onnx/perf_results_GPU_B1_S128_*.txt\"), key=os.path.getmtime)\n",
    "result_data = pandas.read_table(latest_result_file, converters={'OMP_NUM_THREADS': str, 'OMP_WAIT_POLICY':str})\n",
    "print(\"Float32 model perf results from\", latest_result_file)\n",
    "# Remove some columns that have same values for all rows.\n",
    "columns_to_remove = ['model', 'graph_optimization_level', 'batch_size', 'sequence_length', 'test_cases', 'test_times', 'use_gpu']\n",
    "result_data.drop(columns_to_remove, axis=1, inplace=True)\n",
    "result_data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Throughput Tuning\n",
    "\n",
    "Some application need best throughput under some constraint on latency. This can be done by testing performance of different batch sizes. The tool could help on this.\n",
    "\n",
    "Here is an example that check the performance of multiple batch sizes (1, 2, 4, 8, 16, 32 and 64) using default settings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "test setting TestSetting(batch_size=32, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=12, omp_wait_policy='ACTIVE', intra_op_num_threads=1, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=32 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=32,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 16.17 ms, Throughput = 1979.41 QPS\n",
      "test setting TestSetting(batch_size=1, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=12, omp_wait_policy='ACTIVE', intra_op_num_threads=1, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=1 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=1,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.00 ms, Throughput = 333.83 QPS\n",
      "test setting TestSetting(batch_size=2, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=12, omp_wait_policy='ACTIVE', intra_op_num_threads=1, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=2 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=2,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 3.59 ms, Throughput = 557.32 QPS\n",
      "test setting TestSetting(batch_size=64, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=12, omp_wait_policy='ACTIVE', intra_op_num_threads=1, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=64 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=64,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 29.26 ms, Throughput = 2187.15 QPS\n",
      "test setting TestSetting(batch_size=4, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=12, omp_wait_policy='ACTIVE', intra_op_num_threads=1, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=4 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=4,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 4.32 ms, Throughput = 926.92 QPS\n",
      "test setting TestSetting(batch_size=8, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=12, omp_wait_policy='ACTIVE', intra_op_num_threads=1, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=8 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=8,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 6.32 ms, Throughput = 1266.63 QPS\n",
      "test setting TestSetting(batch_size=16, sequence_length=128, test_cases=1000, test_times=1, contiguous=None, use_gpu=True, warmup=True, omp_num_threads=12, omp_wait_policy='ACTIVE', intra_op_num_threads=1, seed=3, verbose=False, inclusive=False, extra_latency=True)\n",
      "Generating 1000 samples for batch_size=16 sequence_length=128\n",
      "Running test: model=bert-base-cased-squad_opt_gpu_fp16.onnx,graph_optimization_level=ENABLE_ALL,intra_op_num_threads=1,OMP_NUM_THREADS=12,OMP_WAIT_POLICY=ACTIVE,batch_size=16,sequence_length=128,test_cases=1000,test_times=1,contiguous=None,use_gpu=True,warmup=True\n",
      "Average latency = 9.60 ms, Throughput = 1666.05 QPS\n",
      "Test summary is saved to onnx/perf_results_GPU_B1-2-4-8-16-32-64_S128_20200617-232401.txt\n"
     ]
    }
   ],
   "source": [
    "GPU_OPTION = '--use_gpu' if use_gpu else ''\n",
    "THREAD_SETTING = '--intra_op_num_threads 1 --omp_num_threads {} --omp_wait_policy ACTIVE'.format(psutil.cpu_count(logical=True))\n",
    "!python -m onnxruntime_tools.transformers.bert_perf_test --model $optimized_fp16_model_path --batch_size 1 2 4 8 16 32 64 --sequence_length 128 --samples 1000 --test_times 1 --inclusive $THREAD_SETTING $GPU_OPTION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Float16 model summary from ./onnx/perf_results_GPU_B1-2-4-8-16-32-64_S128_20200617-232401.txt\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Latency(ms)</th>\n",
       "      <th>Latency_P50</th>\n",
       "      <th>Latency_P75</th>\n",
       "      <th>Latency_P90</th>\n",
       "      <th>Latency_P95</th>\n",
       "      <th>Latency_P99</th>\n",
       "      <th>Throughput(QPS)</th>\n",
       "      <th>batch_size</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3.00</td>\n",
       "      <td>2.79</td>\n",
       "      <td>2.81</td>\n",
       "      <td>2.86</td>\n",
       "      <td>4.37</td>\n",
       "      <td>7.08</td>\n",
       "      <td>333.83</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3.59</td>\n",
       "      <td>3.33</td>\n",
       "      <td>3.35</td>\n",
       "      <td>3.42</td>\n",
       "      <td>6.60</td>\n",
       "      <td>7.54</td>\n",
       "      <td>557.32</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4.32</td>\n",
       "      <td>3.98</td>\n",
       "      <td>4.01</td>\n",
       "      <td>4.64</td>\n",
       "      <td>7.23</td>\n",
       "      <td>8.11</td>\n",
       "      <td>926.92</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>6.32</td>\n",
       "      <td>5.94</td>\n",
       "      <td>5.97</td>\n",
       "      <td>7.61</td>\n",
       "      <td>8.96</td>\n",
       "      <td>10.12</td>\n",
       "      <td>1266.63</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9.60</td>\n",
       "      <td>9.22</td>\n",
       "      <td>9.25</td>\n",
       "      <td>11.32</td>\n",
       "      <td>12.33</td>\n",
       "      <td>13.34</td>\n",
       "      <td>1666.05</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>16.17</td>\n",
       "      <td>15.80</td>\n",
       "      <td>15.90</td>\n",
       "      <td>17.38</td>\n",
       "      <td>18.80</td>\n",
       "      <td>19.93</td>\n",
       "      <td>1979.41</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>29.26</td>\n",
       "      <td>28.89</td>\n",
       "      <td>29.01</td>\n",
       "      <td>30.63</td>\n",
       "      <td>32.53</td>\n",
       "      <td>33.28</td>\n",
       "      <td>2187.15</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Latency(ms)  Latency_P50  Latency_P75  Latency_P90  Latency_P95  \\\n",
       "0         3.00         2.79         2.81         2.86         4.37   \n",
       "1         3.59         3.33         3.35         3.42         6.60   \n",
       "2         4.32         3.98         4.01         4.64         7.23   \n",
       "3         6.32         5.94         5.97         7.61         8.96   \n",
       "4         9.60         9.22         9.25        11.32        12.33   \n",
       "5        16.17        15.80        15.90        17.38        18.80   \n",
       "6        29.26        28.89        29.01        30.63        32.53   \n",
       "\n",
       "   Latency_P99  Throughput(QPS)  batch_size  \n",
       "0         7.08           333.83           1  \n",
       "1         7.54           557.32           2  \n",
       "2         8.11           926.92           4  \n",
       "3        10.12          1266.63           8  \n",
       "4        13.34          1666.05          16  \n",
       "5        19.93          1979.41          32  \n",
       "6        33.28          2187.15          64  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "import glob     \n",
    "import pandas\n",
    "latest_result_file = max(glob.glob(\"./onnx/perf_results_*.txt\"), key=os.path.getmtime)\n",
    "result_data = pandas.read_table(latest_result_file, converters={'OMP_NUM_THREADS': str, 'OMP_WAIT_POLICY':str})\n",
    "print(\"Float16 model summary from\", latest_result_file)\n",
    "columns_to_remove = ['model', 'graph_optimization_level', 'test_cases', 'test_times', 'use_gpu', 'warmup', 'sequence_length']\n",
    "columns_to_remove.extend(['intra_op_num_threads', 'OMP_NUM_THREADS', 'OMP_WAIT_POLICY', 'contiguous'])\n",
    "result_data.drop(columns_to_remove, axis=1, inplace=True)\n",
    "result_data"
   ]
  },
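  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small added illustration (not part of the original walkthrough): the summary above can be used to pick the largest-throughput batch size whose P90 latency stays within a budget. The 20 ms budget below is an arbitrary example value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Added sketch: best throughput under a latency budget (the budget value is illustrative).\n",
    "latency_budget_ms = 20.0\n",
    "within_budget = result_data[result_data['Latency_P90'] <= latency_budget_ms]\n",
    "if within_budget.empty:\n",
    "    print('No batch size meets the', latency_budget_ms, 'ms budget')\n",
    "else:\n",
    "    best = within_budget.sort_values('Throughput(QPS)', ascending=False).iloc[0]\n",
    "    print('Best batch size under budget:', int(best['batch_size']),\n",
    "          '(P90 latency', best['Latency_P90'], 'ms,', best['Throughput(QPS)'], 'QPS)')"
   ]
  },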
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Additional Info\n",
    "\n",
    "Note that running Jupyter Notebook has significant impact on performance result. You can close Jupyter Notebook and other applications, then run the performance test in a console to get more accurate performance numbers.\n",
    "\n",
    "We have a [benchmark script](https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/run_benchmark.sh). It is recommended to use it measure inference speed of OnnxRuntime.\n",
    "\n",
    "[OnnxRuntime C API](https://github.com/microsoft/onnxruntime/blob/master/docs/C_API.md) could get slightly better performance than python API. If you use C API in inference, you can use OnnxRuntime_Perf_Test.exe built from source to measure performance instead.\n",
    "\n",
    "Here is the machine configuration that generated the above results. You might get slower or faster result according to your hardware."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\r\n",
      "  \"gpu\": {\r\n",
      "    \"driver_version\": \"440.64.00\",\r\n",
      "    \"devices\": [\r\n",
      "      {\r\n",
      "        \"memory_total\": 16945512448,\r\n",
      "        \"memory_available\": 14110883840,\r\n",
      "        \"name\": \"Tesla V100-PCIE-16GB\"\r\n",
      "      },\r\n",
      "      {\r\n",
      "        \"memory_total\": 16945512448,\r\n",
      "        \"memory_available\": 16932601856,\r\n",
      "        \"name\": \"Tesla V100-PCIE-16GB\"\r\n",
      "      }\r\n",
      "    ]\r\n",
      "  },\r\n",
      "  \"cpu\": {\r\n",
      "    \"brand\": \"Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz\",\r\n",
      "    \"cores\": 12,\r\n",
      "    \"logical_cores\": 12,\r\n",
      "    \"hz\": \"2.5940 GHz\",\r\n",
      "    \"l2_cache\": \"256 KB\",\r\n",
      "    \"l3_cache\": \"35840 KB\",\r\n",
      "    \"processor\": \"x86_64\"\r\n",
      "  },\r\n",
      "  \"memory\": {\r\n",
      "    \"total\": 236645588992,\r\n",
      "    \"available\": 222567559168\r\n",
      "  },\r\n",
      "  \"python\": \"3.7.7.final.0 (64 bit)\",\r\n",
      "  \"os\": \"Linux-4.15.0-1089-azure-x86_64-with-debian-stretch-sid\",\r\n",
      "  \"onnxruntime\": {\r\n",
      "    \"version\": \"1.3.0\",\r\n",
      "    \"support_gpu\": true\r\n",
      "  },\r\n",
      "  \"pytorch\": {\r\n",
      "    \"version\": \"1.5.0\",\r\n",
      "    \"support_gpu\": true\r\n",
      "  },\r\n",
      "  \"tensorflow\": null\r\n",
      "}\r\n"
     ]
    }
   ],
   "source": [
    "!{sys.executable} -m onnxruntime_tools.transformers.machine_info --silent"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "PyCharm (ccks_ner-master)",
   "language": "python",
   "name": "pycharm-de4c0941"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: code/bert-base-count3/finetuning/Config.py
================================================
from transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \
    get_linear_schedule_with_warmup, XLNetModel, XLNetTokenizer, XLNetConfig, ElectraModel, ElectraConfig, ElectraTokenizer, \
    RobertaTokenizer, RobertaModel, RobertaConfig
from NEZHA.modeling_nezha import NeZhaModel
from NEZHA.configuration_nezha import NeZhaConfig


MODELS = {
    'BertForClass': BertModel,
    'BertForClass_MultiDropout': BertModel,
    'BertLastTwoCls': BertModel,
    'BertLastCls': BertModel,
    'BertLastTwoClsPooler': BertModel,
    'BertLastTwoEmbeddings': BertModel,
    'BertLastTwoEmbeddingsPooler': BertModel,
    'BertLastFourCls': BertModel,
    'BertLastFourClsPooler': BertModel,
    'BertLastFourEmbeddings': BertModel,
    'BertLastFourEmbeddingsPooler': BertModel,
    'BertDynCls': BertModel,
    'BertDynEmbeddings': BertModel,
    'BertRNN': BertModel,
    'BertCNN': BertModel,  # was XLNetModel; BertCNN's tokenizer and config below are BERT, so that looked like a copy-paste slip
    'BertRCNN': BertModel,
    'XLNet': XLNetModel,
    'Electra': ElectraModel,
    'NEZHA': NeZhaModel
    }

TOKENIZERS = {
    'BertForClass': BertTokenizer,
    'BertForClass_MultiDropout': BertTokenizer,
    'BertLastTwoCls': BertTokenizer,
    'BertLastCls': BertTokenizer,
    'BertLastTwoClsPooler': BertTokenizer,
    'BertLastTwoEmbeddings': BertTokenizer,
    'BertLastTwoEmbeddingsPooler': BertTokenizer,
    'BertLastFourCls': BertTokenizer,
    'BertLastFourClsPooler': BertTokenizer,
    'BertLastFourEmbeddings': BertTokenizer,
    'BertLastFourEmbeddingsPooler': BertTokenizer,
    'BertDynCls': BertTokenizer,
    'BertDynEmbeddings': BertTokenizer,
    'BertRNN': BertTokenizer,
    'BertCNN': BertTokenizer,
    'BertRCNN': BertTokenizer,
    'XLNet': XLNetTokenizer,
    'Electra': ElectraTokenizer,
    'NEZHA': BertTokenizer
    }

CONFIGS = {
    'BertForClass': BertConfig,
    'BertForClass_MultiDropout': BertConfig,
    'BertLastTwoCls': BertConfig,
    'BertLastCls': BertConfig,
    'BertLastTwoClsPooler': BertConfig,
    'BertLastTwoEmbeddings': BertConfig,
    'BertLastTwoEmbeddingsPooler': BertConfig,
    'BertLastFourCls': BertConfig,
    'BertLastFourClsPooler': BertConfig,
    'BertLastFourEmbeddings': BertConfig,
    'BertLastFourEmbeddingsPooler': BertConfig,
    'BertDynCls': BertConfig,
    'BertDynEmbeddings': BertConfig,
    'BertRNN': BertConfig,
    'BertCNN': BertConfig,
    'BertRCNN': BertConfig,
    'XLNet': XLNetConfig,
    'Electra': ElectraConfig,
    'NEZHA': NeZhaConfig

    }
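
# Minimal usage sketch (added for illustration): the three maps above are keyed by
# the same model-name strings, so the matching config / tokenizer / backbone classes
# for a given name can be looked up together. 'BertForClass' is just one of the keys.
if __name__ == "__main__":
    name = 'BertForClass'
    assert MODELS.keys() == TOKENIZERS.keys() == CONFIGS.keys()
    config_cls, tokenizer_cls, model_cls = CONFIGS[name], TOKENIZERS[name], MODELS[name]
    print(config_cls.__name__, tokenizer_cls.__name__, model_cls.__name__)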

================================================
FILE: code/bert-base-count3/finetuning/NEZHA/configuration_nezha.py
================================================

from transformers import PretrainedConfig

NEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP = {}

class NeZhaConfig(PretrainedConfig):
    r"""
        This is the configuration class to store the configuration of a :class:`NeZhaModel`.
        It is used to instantiate a NeZha model according to the specified arguments, defining the model
        architecture. The defaults below mirror the ALBERT `xxlarge <https://huggingface.co/albert-xxlarge-v2>`__
        configuration from which this class was adapted; pass explicit values for a NeZha-base style model.

        Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used
        to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig`
        for more information.


        Args:
            vocab_size (:obj:`int`, optional, defaults to 30000):
                Vocabulary size of the model. Defines the different tokens that
                can be represented by the `inputs_ids` passed to the forward method of :class:`NeZhaModel`.
            embedding_size (:obj:`int`, optional, defaults to 128):
                Dimensionality of vocabulary embeddings.
            hidden_size (:obj:`int`, optional, defaults to 4096):
                Dimensionality of the encoder layers and the pooler layer.
            num_hidden_layers (:obj:`int`, optional, defaults to 12):
                Number of hidden layers in the Transformer encoder.
            num_hidden_groups (:obj:`int`, optional, defaults to 1):
                Number of groups for the hidden layers; parameters in the same group are shared.
            num_attention_heads (:obj:`int`, optional, defaults to 64):
                Number of attention heads for each attention layer in the Transformer encoder.
            intermediate_size (:obj:`int`, optional, defaults to 16384):
                The dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
            inner_group_num (:obj:`int`, optional, defaults to 1):
                The number of inner repetitions of attention and FFN.
            hidden_act (:obj:`str` or :obj:`function`, optional, defaults to "gelu_new"):
                The non-linear activation function (function or string) in the encoder and pooler.
                If string, "gelu", "relu", "swish" and "gelu_new" are supported.
            hidden_dropout_prob (:obj:`float`, optional, defaults to 0):
                The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
            attention_probs_dropout_prob (:obj:`float`, optional, defaults to 0):
                The dropout ratio for the attention probabilities.
            max_position_embeddings (:obj:`int`, optional, defaults to 512):
                The maximum sequence length that this model might ever be used with. Typically set this to something
                large (e.g., 512 or 1024 or 2048).
            max_relative_position (:obj:`int`, optional, defaults to 64):
                Relative distances beyond this value are clipped in NeZha's relative position encoding.
            type_vocab_size (:obj:`int`, optional, defaults to 2):
                The vocabulary size of the `token_type_ids` passed into :class:`NeZhaModel`.
            initializer_range (:obj:`float`, optional, defaults to 0.02):
                The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
            layer_norm_eps (:obj:`float`, optional, defaults to 1e-12):
                The epsilon used by the layer normalization layers.
            classifier_dropout_prob (:obj:`float`, optional, defaults to 0.1):
                The dropout ratio for attached classifiers.
            use_relative_position (:obj:`bool`, optional, defaults to True):
                Whether self-attention uses NeZha's functional relative position encoding.

        Example::

            from NEZHA.configuration_nezha import NeZhaConfig
            from NEZHA.modeling_nezha import NeZhaModel

            # Initializing a NeZha-base style configuration
            nezha_base_configuration = NeZhaConfig(
                hidden_size=768,
                num_attention_heads=12,
                intermediate_size=3072,
            )

            # Initializing a model from the configuration
            model = NeZhaModel(nezha_base_configuration)

            # Accessing the model configuration
            configuration = model.config

        Attributes:
            pretrained_config_archive_map (Dict[str, str]):
                A dictionary containing all the available pre-trained checkpoints.
    """

    pretrained_config_archive_map = NEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP
    model_type = "nezha"

    def __init__(
        self,
        vocab_size=30000,
        embedding_size=128,
        hidden_size=4096,
        num_hidden_layers=12,
        num_hidden_groups=1,
        num_attention_heads=64,
        intermediate_size=16384,
        inner_group_num=1,
        hidden_act="gelu_new",
        hidden_dropout_prob=0,
        attention_probs_dropout_prob=0,
        max_position_embeddings=512,
        max_relative_position=64,
        type_vocab_size=2,
        initializer_range=0.02,
        layer_norm_eps=1e-12,
        classifier_dropout_prob=0.1,
        use_relative_position=True,
        pad_token_id=0,
        bos_token_id=2,
        eos_token_id=3,
        **kwargs
    ):
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)

        self.vocab_size = vocab_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.num_hidden_groups = num_hidden_groups
        self.num_attention_heads = num_attention_heads
        self.inner_group_num = inner_group_num
        self.hidden_act = hidden_act
        self.intermediate_size = intermediate_size
        self.hidden_dropout_prob = hidden_dropout_prob
        self.attention_probs_dropout_prob = attention_probs_dropout_prob
        self.max_position_embeddings = max_position_embeddings
        self.max_relative_position = max_relative_position
        self.type_vocab_size = type_vocab_size
        self.initializer_range = initializer_range
        self.layer_norm_eps = layer_norm_eps
        self.use_relative_position = use_relative_position
        self.classifier_dropout_prob = classifier_dropout_prob


================================================
FILE: code/bert-base-count3/finetuning/NEZHA/modeling_nezha.py
================================================
import math
import os
import logging
import torch

from torch import nn
from torch.nn import CrossEntropyLoss, MSELoss

from .configuration_nezha import NeZhaConfig
from transformers.file_utils import add_start_docstrings, add_start_docstrings_to_model_forward
from transformers.modeling_utils import PreTrainedModel, prune_linear_layer
from transformers.models.bert.modeling_bert import (
    BertOutput,
    BertPooler,
    BertSelfOutput,
    BertIntermediate,
    BertOnlyMLMHead,
    BertOnlyNSPHead,
    BertPreTrainingHeads,
    BERT_START_DOCSTRING,
    BERT_INPUTS_DOCSTRING,
)

logger = logging.getLogger(__name__)

_CONFIG_FOR_DOC = "NeZhaConfig"
_TOKENIZER_FOR_DOC = "NeZhaTokenizer"

NEZHA_PRETRAINED_MODEL_ARCHIVE_LIST = []
NEZHA_PRETRAINED_MODEL_ARCHIVE_MAP = {}


def load_tf_weights_in_nezha(model, config, tf_checkpoint_path):
    """Load tf checkpoints in a pytorch model."""
    try:
        import re
        import numpy as np
        import tensorflow as tf
    except ImportError:
        logger.error(
            "Loading a TensorFlow model in PyTorch, requires TensorFlow to be installed. Please see "
            "https://www.tensorflow.org/install/ for installation instructions."
        )
        raise

    tf_path = os.path.abspath(tf_checkpoint_path)
    logger.info("Converting TensorFlow checkpoint from {}".format(tf_path))
    # Load weights from TF model
    init_vars = tf.train.list_variables(tf_path)
    names = []
    arrays = []
    for name, shape in init_vars:
        # logger.info("Loading TF weight {} with shape {}".format(name, shape))
        array = tf.train.load_variable(tf_path, name)
        names.append(name)
        arrays.append(array)

    for name, array in zip(names, arrays):
        name = name.split("/")
        # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculate m and v,
        # which are not required for using the pretrained model
        if any(
                n in ["adam_v", "adam_m", "lamb_m", "lamb_v", "AdamWeightDecayOptimizer", "AdamWeightDecayOptimizer_1",
                      "global_step", "good_steps", "loss_scale", 'bad_steps']
                for n in name
        ):
            logger.info("Skipping {}".format("/".join(name)))
            continue
        pointer = model
        for m_name in name:
            if re.fullmatch(r"[A-Za-z]+_\d+", m_name):
                scope_names = re.split(r"_(\d+)", m_name)
            else:
                scope_names = [m_name]
            if scope_names[0] == "kernel" or scope_names[0] == "gamma":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "output_bias" or scope_names[0] == "beta":
                pointer = getattr(pointer, "bias")
            elif scope_names[0] == "output_weights":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "squad":
                pointer = getattr(pointer, "classifier")
            else:
                try:
                    pointer = getattr(pointer, scope_names[0])
                except AttributeError:
                    logger.info("Skipping {}".format("/".join(name)))
                    continue
            if len(scope_names) >= 2:
                num = int(scope_names[1])
                pointer = pointer[num]
        if m_name[-11:] == "_embeddings":
            pointer = getattr(pointer, "weight")
        elif m_name == "kernel":
            array = np.transpose(array)
        try:
            assert (
                    pointer.shape == array.shape
            ), f"Pointer shape {pointer.shape} and array shape {array.shape} mismatched"
        except AssertionError as e:
            e.args += (pointer.shape, array.shape)
            raise
        logger.info("Initialize PyTorch weight {}".format(name))
        pointer.data = torch.from_numpy(array)
    return model
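
# Usage sketch (added for illustration; the checkpoint path is a placeholder):
#
#     config = NeZhaConfig()
#     model = NeZhaModel(config)  # NeZhaModel is defined later in this module
#     load_tf_weights_in_nezha(model, config, "/path/to/tf_checkpoint/model.ckpt")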


class NeZhaEmbeddings(nn.Module):
    """
    Construct the embeddings from word, position and token_type embeddings.
    """

    def __init__(self, config):
        super().__init__()
        self.use_relative_position = config.use_relative_position
        self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)
        self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
        # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
        # any TensorFlow checkpoint file
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, input_ids=None, token_type_ids=None, inputs_embeds=None):
        if input_ids is not None:
            input_shape = input_ids.size()
        else:
            input_shape = inputs_embeds.size()[:-1]
        device = input_ids.device if input_ids is not None else inputs_embeds.device
        if token_type_ids is None:
            token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
        if inputs_embeds is None:
            inputs_embeds = self.word_embeddings(input_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)
        embeddings = inputs_embeds + token_type_embeddings
        embeddings = self.LayerNorm(embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings


def relative_position_encoding(depth, max_length=512, max_relative_position=127):
    """Precompute sinusoidal embeddings for clipped relative positions.

    Returns a tensor of shape (max_length, max_length, depth) whose entry [q, k] is the
    sinusoidal embedding of clamp(k - q, -max_relative_position, +max_relative_position).
    """
    vocab_size = max_relative_position * 2 + 1
    # distance_mat[i, j] = j - i; clamp and shift so it can index the embedding table
    range_vec = torch.arange(max_length)
    range_mat = range_vec.repeat(max_length).view(max_length, max_length)
    distance_mat = range_mat - torch.t(range_mat)
    distance_mat_clipped = torch.clamp(distance_mat, -max_relative_position, max_relative_position)
    final_mat = distance_mat_clipped + max_relative_position

    embeddings_table = torch.zeros(vocab_size, depth)
    position = torch.arange(0, vocab_size, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, depth, 2).float() * (-math.log(10000.0) / depth))
    embeddings_table[:, 0::2] = torch.sin(position * div_term)
    embeddings_table[:, 1::2] = torch.cos(position * div_term)
    embeddings_table = embeddings_table.unsqueeze(0).transpose(0, 1).squeeze(1)

    flat_relative_positions_matrix = final_mat.view(-1)
    one_hot_relative_positions_matrix = torch.nn.functional.one_hot(flat_relative_positions_matrix,
                                                                    num_classes=vocab_size).float()
    positions_encoding = torch.matmul(one_hot_relative_positions_matrix, embeddings_table)
    my_shape = list(final_mat.size())
    my_shape.append(depth)
    positions_encoding = positions_encoding.view(my_shape)
    return positions_encoding
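
# Shape sketch (added for illustration): the returned tensor is indexed as
# [query_position, key_position, depth], and relative distances beyond
# max_relative_position share the same clipped bucket, e.g.:
#
#     enc = relative_position_encoding(depth=64, max_length=512, max_relative_position=64)
#     assert enc.shape == (512, 512, 64)
#     assert torch.equal(enc[0, 100], enc[0, 200])  # both clipped to distance +64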


class NeZhaSelfAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
            raise ValueError(
                "The hidden size (%d) is not a multiple of the number of attention "
                "heads (%d)" % (config.hidden_size, config.num_attention_heads)
            )
        self.output_attentions = config.output_attentions

        self.num_attention_heads = config.num_attention_heads
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = nn.Linear(config.hidden_size, self.all_head_size)
        self.key = nn.Linear(config.hidden_size, self.all_head_size)
        self.value = nn.Linear(config.hidden_size, self.all_head_size)
        self.dropout = nn.Dropout(config.attention_probs_dropout_prob)

        # Precomputed sinusoidal relative-position table of shape
        # (max_position_embeddings, max_position_embeddings, attention_head_size).
        # Kept on CPU here and moved to the input device lazily in forward(), so the
        # module also works without CUDA.
        self.relative_positions_encoding = relative_position_encoding(max_length=config.max_position_embeddings,
                                                                      depth=self.attention_head_size,
                                                                      max_relative_position=config.max_relative_position)

    def transpose_for_scores(self, x):
        new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size)
        x = x.view(*new_x_shape)
        return x.permute(0, 2, 1, 3)

    def forward(
            self,
            hidden_states,
            attention_mask=None,
            head_mask=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
    ):
        mixed_query_layer = self.query(hidden_states)

        # If this is instantiated as a cross-attention module, the keys
        # and values come from an encoder; the attention mask needs to be
        # such that the encoder's padding tokens are not attended to.
        if encoder_hidden_states is not None:
            mixed_key_layer = self.key(encoder_hidden_states)
            mixed_value_layer = self.value(encoder_hidden_states)
            attention_mask = encoder_attention_mask
        else:
            mixed_key_layer = self.key(hidden_states)
            mixed_value_layer = self.value(hidden_states)

        query_layer = self.transpose_for_scores(mixed_query_layer)
        key_layer = self.transpose_for_scores(mixed_key_layer)
        value_layer = self.transpose_for_scores(mixed_value_layer)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))

        batch_size, num_attention_heads, from_seq_length, to_seq_length = attention_scores.size()

        # Move the precomputed relative-position table to the active device on first use.
        if self.relative_positions_encoding.device != attention_scores.device:
            self.relative_positions_encoding = self.relative_positions_encoding.to(attention_scores.device)

        # Query-key relative-position term: score each query vector against the
        # sinusoidal encoding of its (clipped) distance to every key position.
        relations_keys = self.relative_positions_encoding[:to_seq_length, :to_seq_length, :]
        query_layer_t = query_layer.permute(2, 0, 1, 3)

        query_layer_r = query_layer_t.contiguous().view(from_seq_length, batch_size * num_attention_heads,
                                                        self.attention_head_size)
        key_position_scores = torch.matmul(query_layer_r, relations_keys.permute(0, 2, 1))
        key_position_scores_r = key_position_scores.view(from_seq_length, batch_size,
                                                         num_attention_heads, from_seq_length)
        key_position_scores_r_t = key_position_scores_r.permute(1, 2, 0, 3)
        attention_scores = attention_scores + key_position_scores_r_t

        attention_scores = attention_scores / math.sqrt(self.attention_head_size)
        if attention_mask is not None:
            # Apply the attention mask is (precomputed for all layers in BertModel forward() function)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = nn.Softmax(dim=-1)(attention_scores)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = torch.matmul(attention_probs, value_layer)

        # Attention-probability/value relative-position term, mirroring the key term above.
        relations_values = self.relative_positions_encoding[:to_seq_length, :to_seq_length, :]
        attention_probs_t = attention_probs.permute(2, 0, 1, 3)
        attentions_probs_r = attention_probs_t.contiguous().view(from_seq_length, batch_size * num_attention_heads,
                                                                 to_seq_length)
        value_position_scores = torch.matmul(attentions_probs_r, relations_values)
        value_position_scores_r = value_position_scores.view(from_seq_length, batch_size,
                                                             num_attention_heads, self.attention_head_size)
        value_position_scores_r_t = value_position_scores_r.permute(1, 2, 0, 3)
        context_layer = context_layer + value_position_scores_r_t

        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
        new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
        context_layer = context_layer.view(*new_context_layer_shape)

        outputs = (context_layer, attention_probs) if self.output_attentions else (context_layer,)
        return outputs
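
    # Readability note (added): the permute/view/matmul sequences above for the two
    # relative-position terms are equivalent to these einsum forms
    # (b=batch, h=heads, q=query position, k=key position, d=head size):
    #
    #     key_position_scores_r_t   = torch.einsum('bhqd,qkd->bhqk', query_layer, relations_keys)
    #     value_position_scores_r_t = torch.einsum('bhqk,qkd->bhqd', attention_probs, relations_values)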


class NeZhaAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.self = NeZhaSelfAttention(config)
        self.output = BertSelfOutput(config)
        self.pruned_heads = set()

    def prune_heads(self, heads):
        if len(heads) == 0:
            return
        mask = torch.ones(self.self.num_attention_heads, self.self.attention_head_size)
        heads = set(heads) - self.pruned_heads  # Convert to set and remove already pruned heads
        for head in heads:
            # Compute how many pruned heads are before the head and move the index accordingly
            head = head - sum(1 if h < head else 0 for h in self.pruned_heads)
            mask[head] = 0
        mask = mask.view(-1).contiguous().eq(1)
        index = torch.arange(len(mask))[mask].long()
        # Prune linear layers
        self.self.query = prune_linear_layer(self.self.query, index)
        self.self.key = prune_linear_layer(self.self.key, index)
        self.self.value = prune_linear_layer(self.self.value, index)
        self.output.dense = prune_linear_layer(self.output.dense, index, dim=1)
        # Update hyper params and store pruned heads
        self.self.num_attention_heads = self.self.num_attention_heads - len(heads)
        self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads
        self.pruned_heads = self.pruned_heads.union(heads)

    def forward(
            self,
            hidden_states,
            attention_mask=None,
            head_mask=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
    ):
        self_outputs = self.self(
            hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
        )
        attention_output = self.output(self_outputs[0], hidden_states)
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs


class NeZhaLayer(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.attention = NeZhaAttention(config)
        self.is_decoder = config.is_decoder
        if self.is_decoder:
            self.crossattention = NeZhaAttention(config)
        self.intermediate = BertIntermediate(config)
        self.output = BertOutput(config)

    def forward(
            self,
            hidden_states,
            attention_mask=None,
            head_mask=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
    ):
        self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
        attention_output = self_attention_outputs[0]
        outputs = self_attention_outputs[1:]  # add self attentions if we output attention weights

        if self.is_decoder and encoder_hidden_states is not None:
            cross_attention_outputs = self.crossattention(
                attention_output, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
            )
            attention_output = cross_attention_outputs[0]
            outputs = outputs + cross_attention_outputs[1:]  # add cross attentions if we output attention weights

        intermediate_output = self.intermediate(attention_output)
        layer_output = self.output(intermediate_output, attention_output)
        outputs = (layer_output,) + outputs
        return outputs


class NeZhaEncoder(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.output_attentions = config.output_attentions
        self.output_hidden_states = config.output_hidden_states
        self.layer = nn.ModuleList([NeZhaLayer(config) for _ in range(config.num_hidden_layers)])


    def forward(
            self,
            hidden_states,
            attention_mask=None,
            head_mask=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
    ):
        all_hidden_states = ()
        all_attentions = ()
        for i, layer_module in enumerate(self.layer):
            if self.output_hidden_states:
                all_hidden_states = all_hidden_states + (hidden_states,)
            layer_outputs = layer_module(
                hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
            )
            hidden_states = layer_outputs[0]
            if self.output_attentions:
                all_attentions = all_attentions + (layer_outputs[1],)
        # Add last layer
        if self.output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        outputs = (hidden_states,)
        if self.output_hidden_states:
            outputs = outputs + (all_hidden_states,)
        if self.output_attentions:
            outputs = outputs + (all_attentions,)
        return outputs  # last-layer hidden state, (all hidden states), (all attentions)


class NeZhaPreTrainedModel(PreTrainedModel):
    """ An abstract class to handle weights initialization and
        a simple interface for downloading and loading pretrained models.
    """
    config_class = NeZhaConfig
    pretrained_model_archive_map = NEZHA_PRETRAINED_MODEL_ARCHIVE_MAP
    load_tf_weights = load_tf_weights_in_nezha
    base_model_prefix = "bert"

    def _init_weights(self, module):
        """ Initialize the weights """
        if isinstance(module, (nn.Linear, nn.Embedding)):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)
        if isinstance(module, nn.Linear) and module.bias is not None:
            module.bias.data.zero_()


@add_start_docstrings(
    "The bare Bert Model transformer outputting raw hidden-states without any specific head on top.",
    BERT_START_DOCSTRING,
)
class NeZhaModel(NeZhaPreTrainedModel):
    """
    The model can behave as an encoder (with only self-attention) as well
    as a decoder, in which case a layer of cross-attention is added between
    the self-attention layers, following the architecture described in `Attention is all you need`_ by Ashish Vaswani,
    Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin.

    To behave as a decoder, the model needs to be initialized with the
    :obj:`is_decoder` argument of the configuration set to :obj:`True`; an
    :obj:`encoder_hidden_states` is expected as an input to the forward pass.

    .. _`Attention is all you need`:
        https://arxiv.org/abs/1706.03762

    """

    def __init__(self, config):
        super().__init__(config)
        self.config = config
        self.embeddings = NeZhaEmbeddings(config)
        self.encoder = NeZhaEncoder(config)
        self.pooler = BertPooler(config)
        self.init_weights()

    def get_input_embeddings(self):
        return self.embeddings.word_embeddings

    def set_input_embeddings(self, value):
        self.embeddings.word_embeddings = value

    def _prune_heads(self, heads_to_prune):
        """ Prunes heads of the model.
            heads_to_prune: dict of {layer_num: list of heads to prune in this layer}
            See base class PreTrainedModel
        """
        for layer, heads in heads_to_prune.items():
            self.encoder.layer[layer].attention.prune_heads(heads)

    @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
    def forward(
            self,
            input_ids=None,
            attention_mask=None,
            token_type_ids=None,
            head_mask=None,
            position_ids=None,
            inputs_embeds=None,
            encoder_hidden_states=None,
            encoder_attention_mask=None,
    ):
        r"""
    Return:
        :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
        last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
            Sequence of hidden-states at the output of the last layer of the model.
        pooler_output (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, hidden_size)`):
            Last layer hidden-state of the first token of the sequence (classification token)
            further processed by a Linear layer and a Tanh activation function. The Linear
            layer weights are trained from the next sentence prediction (classification)
            objective during pre-training.

            This output is usually *not* a good summary
            of the semantic content of the input; you are often better off averaging or
            pooling the sequence of hidden-states over the whole input sequence.
        hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
            of shape :obj:`(batch_size, sequence_length, hidden_size)`.

            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.

            Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
            heads.

    Examples::

        import torch
        from transformers import BertTokenizer
        from NEZHA.modeling_nezha import NeZhaModel

        # assumes a local NEZHA checkpoint directory such as ./nezha-cn-base
        tokenizer = BertTokenizer.from_pretrained('./nezha-cn-base')
        model = NeZhaModel.from_pretrained('./nezha-cn-base')

        input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
        outputs = model(input_ids)

        last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple

        """

        if input_ids is not None and inputs_embeds is not None:
            raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
        elif input_ids is not None:
            input_shape = input_ids.size()
        elif inputs_embeds is not None:
            input_shape = inputs_embeds.size()[:-1]
        else:
            raise ValueError("You have to specify either input_ids or inputs_embeds")

        device = input_ids.device if input_ids is not None else inputs_embeds.device

        if attention_mask is None:
            attention_mask = torch.ones(input_shape, device=device)
        if token_type_ids is None:
            token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)

        # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
        # ourselves in which case we just need to make it broadcastable to all heads.
        extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(
            attention_mask, input_shape, self.device
        )

        # If a 2D or 3D attention mask is provided for the cross-attention
        # we need to make it broadcastable to [batch_size, num_heads, seq_length, seq_length]
        if self.config.is_decoder and encoder_hidden_states is not None:
            encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size()
            encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length)
            if encoder_attention_mask is None:
                encoder_attention_mask = torch.ones(encoder_hidden_shape, device=device)
            encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask)
        else:
            encoder_extended_attention_mask = None

        # Prepare head mask if needed
        # 1.0 in head_mask indicates we keep the head
        # attention_probs has shape bsz x n_heads x N x N
        # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
        # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
        head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)

        embedding_output = self.embeddings(
            input_ids=input_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
        )
        encoder_outputs = self.encoder(
            embedding_output,
            attention_mask=extended_attention_mask,
            head_mask=head_mask,
            encoder_hidden_states=encoder_hidden_states,
            encoder_attention_mask=encoder_extended_attention_mask,
        )
        sequence_output = encoder_outputs[0]
        pooled_output = self.pooler(sequence_output)

        # add hidden_states and attentions if they are here
        outputs = (sequence_output, pooled_output,) + encoder_outputs[1:]
        return outputs  # sequence_output, pooled_output, (hidden_states), (attentions)
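

# Hedged usage sketch (illustration only, not from the checkpoint code). It
# exercises the decoder behaviour described in the class docstring: with
# config.is_decoder=True, encoder_hidden_states is passed to forward() and
# routed through the cross-attention mask handling above, assuming the layer
# stack implements cross-attention as in the Transformers BERT code this file
# mirrors. The directory './nezha-cn-base' matches the config shipped with this
# repo but is otherwise an assumption; the model here is randomly initialized.
def _demo_decoder_forward():
    import torch
    from NEZHA.configuration_nezha import NeZhaConfig

    config = NeZhaConfig.from_pretrained('./nezha-cn-base')
    config.is_decoder = True
    model = NeZhaModel(config)

    input_ids = torch.randint(0, config.vocab_size, (1, 8))
    encoder_hidden_states = torch.randn(1, 8, config.hidden_size)
    sequence_output, pooled_output = model(
        input_ids=input_ids,
        encoder_hidden_states=encoder_hidden_states,
    )[:2]
    return sequence_output.shape, pooled_output.shape  # (1, 8, hidden), (1, hidden)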
SYMBOL INDEX (11552 symbols across 625 files)

FILE: code/NEZHA/configuration_nezha.py
  class NeZhaConfig (line 6) | class NeZhaConfig(PretrainedConfig):
    method __init__ (line 82) | def __init__(

FILE: code/NEZHA/modeling_nezha.py
  function load_tf_weights_in_bert (line 48) | def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
  class BertEmbeddings (line 122) | class BertEmbeddings(nn.Module):
    method __init__ (line 125) | def __init__(self, config):
    method forward (line 134) | def forward(self, input_ids=None, token_type_ids=None, inputs_embeds=N...
  function relative_position_encoding (line 151) | def relative_position_encoding(depth, max_length=512, max_relative_posit...
  class BertSelfAttention (line 175) | class BertSelfAttention(nn.Module):
    method __init__ (line 176) | def __init__(self, config):
    method transpose_for_scores (line 200) | def transpose_for_scores(self, x):
    method forward (line 205) | def forward(
  class BertSelfOutput (line 308) | class BertSelfOutput(nn.Module):
    method __init__ (line 309) | def __init__(self, config):
    method forward (line 315) | def forward(self, hidden_states, input_tensor):
  class BertAttention (line 322) | class BertAttention(nn.Module):
    method __init__ (line 323) | def __init__(self, config):
    method prune_heads (line 329) | def prune_heads(self, heads):
    method forward (line 347) | def forward(
  class BertIntermediate (line 373) | class BertIntermediate(nn.Module):
    method __init__ (line 374) | def __init__(self, config):
    method forward (line 382) | def forward(self, hidden_states):
  class BertOutput (line 388) | class BertOutput(nn.Module):
    method __init__ (line 389) | def __init__(self, config):
    method forward (line 395) | def forward(self, hidden_states, input_tensor):
  class BertLayer (line 402) | class BertLayer(nn.Module):
    method __init__ (line 403) | def __init__(self, config):
    method forward (line 416) | def forward(
    method feed_forward_chunk (line 481) | def feed_forward_chunk(self, attention_output):
  class NeZhaEncoder (line 487) | class NeZhaEncoder(nn.Module):
    method __init__ (line 488) | def __init__(self, config):
    method forward (line 495) | def forward(
  class BertPooler (line 588) | class BertPooler(nn.Module):
    method __init__ (line 589) | def __init__(self, config):
    method forward (line 594) | def forward(self, hidden_states):
  class BertPredictionHeadTransform (line 603) | class BertPredictionHeadTransform(nn.Module):
    method __init__ (line 604) | def __init__(self, config):
    method forward (line 613) | def forward(self, hidden_states):
  class BertLMPredictionHead (line 620) | class BertLMPredictionHead(nn.Module):
    method __init__ (line 621) | def __init__(self, config):
    method forward (line 634) | def forward(self, hidden_states):
  class BertOnlyMLMHead (line 640) | class BertOnlyMLMHead(nn.Module):
    method __init__ (line 641) | def __init__(self, config):
    method forward (line 645) | def forward(self, sequence_output):
  class BertOnlyNSPHead (line 650) | class BertOnlyNSPHead(nn.Module):
    method __init__ (line 651) | def __init__(self, config):
    method forward (line 655) | def forward(self, pooled_output):
  class BertPreTrainingHeads (line 660) | class BertPreTrainingHeads(nn.Module):
    method __init__ (line 661) | def __init__(self, config):
    method forward (line 666) | def forward(self, sequence_output, pooled_output):
  class BertPreTrainedModel (line 672) | class BertPreTrainedModel(PreTrainedModel):
    method _init_weights (line 682) | def _init_weights(self, module):
  class BertForPreTrainingOutput (line 700) | class BertForPreTrainingOutput(ModelOutput):
  class NeZhaModel (line 805) | class NeZhaModel(BertPreTrainedModel):
    method __init__ (line 819) | def __init__(self, config, add_pooling_layer=True):
    method get_input_embeddings (line 830) | def get_input_embeddings(self):
    method set_input_embeddings (line 833) | def set_input_embeddings(self, value):
    method _prune_heads (line 836) | def _prune_heads(self, heads_to_prune):
    method forward (line 851) | def forward(
  class BertForPreTraining (line 982) | class BertForPreTraining(BertPreTrainedModel):
    method __init__ (line 983) | def __init__(self, config):
    method get_output_embeddings (line 991) | def get_output_embeddings(self):
    method set_output_embeddings (line 994) | def set_output_embeddings(self, new_embeddings):
    method forward (line 999) | def forward(
  class BertLMHeadModel (line 1083) | class BertLMHeadModel(BertPreTrainedModel):
    method __init__ (line 1088) | def __init__(self, config):
    method get_output_embeddings (line 1099) | def get_output_embeddings(self):
    method set_output_embeddings (line 1102) | def set_output_embeddings(self, new_embeddings):
    method forward (line 1107) | def forward(
    method prepare_inputs_for_generation (line 1209) | def prepare_inputs_for_generation(self, input_ids, past=None, attentio...
    method _reorder_cache (line 1221) | def _reorder_cache(self, past, beam_idx):
  class NeZhaForMaskedLM (line 1229) | class NeZhaForMaskedLM(BertPreTrainedModel):
    method __init__ (line 1234) | def __init__(self, config):
    method get_output_embeddings (line 1248) | def get_output_embeddings(self):
    method set_output_embeddings (line 1251) | def set_output_embeddings(self, new_embeddings):
    method forward (line 1261) | def forward(
    method prepare_inputs_for_generation (line 1318) | def prepare_inputs_for_generation(self, input_ids, attention_mask=None...
  class BertForNextSentencePrediction (line 1337) | class BertForNextSentencePrediction(BertPreTrainedModel):
    method __init__ (line 1338) | def __init__(self, config):
    method forward (line 1348) | def forward(
  class BertForSequenceClassification (line 1438) | class BertForSequenceClassification(BertPreTrainedModel):
    method __init__ (line 1439) | def __init__(self, config):
    method forward (line 1456) | def forward(
  class BertForMultipleChoice (line 1523) | class BertForMultipleChoice(BertPreTrainedModel):
    method __init__ (line 1524) | def __init__(self, config):
    method forward (line 1540) | def forward(
  class BertForTokenClassification (line 1613) | class BertForTokenClassification(BertPreTrainedModel):
    method __init__ (line 1617) | def __init__(self, config):
    method forward (line 1634) | def forward(
  class BertForQuestionAnswering (line 1704) | class BertForQuestionAnswering(BertPreTrainedModel):
    method __init__ (line 1708) | def __init__(self, config):
    method forward (line 1724) | def forward(
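
  A minimal sketch of driving the task heads indexed above, using only names
  from this index; the local './nezha-cn-base' checkpoint directory and the
  binary label space are assumptions:

      import torch
      from NEZHA.configuration_nezha import NeZhaConfig
      from NEZHA.modeling_nezha import NeZhaForSequenceClassification

      config = NeZhaConfig.from_pretrained('./nezha-cn-base')
      config.num_labels = 2  # assumed binary query-similarity label
      model = NeZhaForSequenceClassification(config)

      input_ids = torch.randint(0, config.vocab_size, (2, 16))
      attention_mask = torch.ones_like(input_ids)
      outputs = model(input_ids=input_ids, attention_mask=attention_mask)
      logits = outputs[0]  # shape (2, 2) when no labels are supplied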

FILE: code/bert-base-count3-len100/finetuning/NEZHA/configuration_nezha.py
  class NeZhaConfig (line 6) | class NeZhaConfig(PretrainedConfig):
    method __init__ (line 82) | def __init__(

FILE: code/bert-base-count3-len100/finetuning/NEZHA/modeling_nezha.py
  function load_tf_weights_in_nezha (line 33) | def load_tf_weights_in_nezha(model, config, tf_checkpoint_path):
  class NeZhaEmbeddings (line 108) | class NeZhaEmbeddings(nn.Module):
    method __init__ (line 113) | def __init__(self, config):
    method forward (line 123) | def forward(self, input_ids=None, token_type_ids=None, inputs_embeds=N...
  function relative_position_encoding (line 140) | def relative_position_encoding(depth, max_length=512, max_relative_posit...
  class NeZhaSelfAttention (line 165) | class NeZhaSelfAttention(nn.Module):
    method __init__ (line 166) | def __init__(self, config):
    method transpose_for_scores (line 188) | def transpose_for_scores(self, x):
    method forward (line 193) | def forward(
  class NeZhaAttention (line 270) | class NeZhaAttention(nn.Module):
    method __init__ (line 271) | def __init__(self, config):
    method prune_heads (line 277) | def prune_heads(self, heads):
    method forward (line 298) | def forward(
  class NeZhaLayer (line 314) | class NeZhaLayer(nn.Module):
    method __init__ (line 315) | def __init__(self, config):
    method forward (line 324) | def forward(
  class NeZhaEncoder (line 349) | class NeZhaEncoder(nn.Module):
    method __init__ (line 350) | def __init__(self, config):
    method forward (line 357) | def forward(
  class NeZhaPreTrainedModel (line 388) | class NeZhaPreTrainedModel(PreTrainedModel):
    method _init_weights (line 397) | def _init_weights(self, module):
  class NeZhaModel (line 414) | class NeZhaModel(NeZhaPreTrainedModel):
    method __init__ (line 430) | def __init__(self, config):
    method get_input_embeddings (line 438) | def get_input_embeddings(self):
    method set_input_embeddings (line 441) | def set_input_embeddings(self, value):
    method _prune_heads (line 444) | def _prune_heads(self, heads_to_prune):
    method forward (line 453) | def forward(
  class NeZhaForPreTraining (line 569) | class NeZhaForPreTraining(NeZhaPreTrainedModel):
    method __init__ (line 570) | def __init__(self, config):
    method get_output_embeddings (line 576) | def get_output_embeddings(self):
    method forward (line 580) | def forward(
  class NeZhaForMaskedLM (line 664) | class NeZhaForMaskedLM(NeZhaPreTrainedModel):
    method __init__ (line 665) | def __init__(self, config):
    method get_output_embeddings (line 671) | def get_output_embeddings(self):
    method forward (line 675) | def forward(
    method prepare_inputs_for_generation (line 760) | def prepare_inputs_for_generation(self, input_ids, attention_mask=None...
  class NeZhaForNextSentencePrediction (line 786) | class NeZhaForNextSentencePrediction(NeZhaPreTrainedModel):
    method __init__ (line 787) | def __init__(self, config):
    method forward (line 794) | def forward(
  class NeZhaForSequenceClassification (line 868) | class NeZhaForSequenceClassification(NeZhaPreTrainedModel):
    method __init__ (line 869) | def __init__(self, config):
    method forward (line 878) | def forward(
  class NeZhaForMultipleChoice (line 962) | class NeZhaForMultipleChoice(NeZhaPreTrainedModel):
    method __init__ (line 963) | def __init__(self, config):
    method forward (line 971) | def forward(
  class NeZhaForTokenClassification (line 1058) | class NeZhaForTokenClassification(NeZhaPreTrainedModel):
    method __init__ (line 1059) | def __init__(self, config):
    method forward (line 1068) | def forward(
  class NeZhaForQuestionAnswering (line 1153) | class NeZhaForQuestionAnswering(NeZhaPreTrainedModel):
    method __init__ (line 1154) | def __init__(self, config):
    method forward (line 1162) | def forward(

FILE: code/bert-base-count3-len100/finetuning/model.py
  class BertForClass (line 11) | class BertForClass(nn.Module):
    method __init__ (line 12) | def __init__(self, config):
    method forward (line 24) | def forward(self, input_ids, input_masks, segment_ids):
  class BertForClass_MultiDropout (line 37) | class BertForClass_MultiDropout(nn.Module):
    method __init__ (line 38) | def __init__(self, config):
    method forward (line 50) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoCls (line 63) | class BertLastTwoCls(nn.Module):
    method __init__ (line 64) | def __init__(self, config):
    method forward (line 75) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastCls (line 83) | class BertLastCls(nn.Module):
    method __init__ (line 84) | def __init__(self, config):
    method forward (line 95) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoClsPooler (line 108) | class BertLastTwoClsPooler(nn.Module):
    method __init__ (line 109) | def __init__(self, config):
    method forward (line 120) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoEmbeddings (line 132) | class BertLastTwoEmbeddings(nn.Module):
    method __init__ (line 133) | def __init__(self, config):
    method forward (line 144) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoEmbeddingsPooler (line 160) | class BertLastTwoEmbeddingsPooler(nn.Module):
    method __init__ (line 161) | def __init__(self, config):
    method forward (line 172) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourCls (line 187) | class BertLastFourCls(nn.Module):
    method __init__ (line 188) | def __init__(self, config):
    method forward (line 199) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourClsPooler (line 215) | class BertLastFourClsPooler(nn.Module):
    method __init__ (line 216) | def __init__(self, config):
    method forward (line 227) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourEmbeddings (line 239) | class BertLastFourEmbeddings(nn.Module):
    method __init__ (line 240) | def __init__(self, config):
    method forward (line 251) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourEmbeddingsPooler (line 268) | class BertLastFourEmbeddingsPooler(nn.Module):
    method __init__ (line 269) | def __init__(self, config):
    method forward (line 280) | def forward(self, input_ids, input_masks, segment_ids):
  class BertDynCls (line 296) | class BertDynCls(nn.Module):
    method __init__ (line 297) | def __init__(self, config):
    method forward (line 311) | def forward(self, input_ids, input_masks, segment_ids):
  class BertDynEmbeddings (line 343) | class BertDynEmbeddings(nn.Module):
    method __init__ (line 344) | def __init__(self, config):
    method forward (line 358) | def forward(self, input_ids, input_masks, segment_ids):
  class BertRNN (line 392) | class BertRNN(nn.Module):
    method __init__ (line 394) | def __init__(self, config):
    method forward (line 434) | def forward(self, input_ids, input_masks, segment_ids):
  class BertCNN (line 459) | class BertCNN(nn.Module):
    method __init__ (line 461) | def __init__(self, config):
    method conv_and_pool (line 480) | def conv_and_pool(self, x, conv):
    method forward (line 485) | def forward(self, input_ids, input_masks, segment_ids):
  class BertRCNN (line 497) | class BertRCNN(nn.Module):
    method __init__ (line 498) | def __init__(self, config):
    method forward (line 540) | def forward(self, input_ids, input_masks, segment_ids):
  class XLNet (line 564) | class XLNet(nn.Module):
    method __init__ (line 566) | def __init__(self, config):
    method forward (line 574) | def forward(self, input_ids, input_masks, segment_ids):
  class ElectraClassificationHead (line 584) | class ElectraClassificationHead(nn.Module):
    method __init__ (line 587) | def __init__(self, config):
    method forward (line 593) | def forward(self, features, **kwargs):
  class Electra (line 602) | class Electra(nn.Module):
    method __init__ (line 604) | def __init__(self, config):
    method forward (line 613) | def forward(self, input_ids, input_masks, segment_ids):
  class NEZHA (line 621) | class NEZHA(nn.Module):
    method __init__ (line 622) | def __init__(self, config):
    method forward (line 637) | def forward(self, input_ids, input_masks, segment_ids):

FILE: code/bert-base-count3-len100/finetuning/multi_gpu_QA.py
  class Config (line 46) | class Config:
    method __init__ (line 47) | def __init__(self):

FILE: code/bert-base-count3-len100/finetuning/utils.py
  function paddingList (line 12) | def paddingList(ls:list,val,returnTensor=False):
  function fastTokenizer (line 19) | def fastTokenizer(a:str,b:str,maxLen,tk):
  class data_generator (line 39) | class data_generator:
    method __init__ (line 40) | def __init__(self, data, config, shuffle=False):
    method __len__ (line 53) | def __len__(self):
    method __iter__ (line 56) | def __iter__(self):
  class PGD (line 95) | class PGD():
    method __init__ (line 96) | def __init__(self, model):
    method attack (line 101) | def attack(self, epsilon=0.3, alpha=0.1, emb_name='word_embeddings', i...
    method restore (line 113) | def restore(self, emb_name='word_embeddings'):
    method project (line 121) | def project(self, param_name, param_data, epsilon):
    method backup_grad (line 127) | def backup_grad(self):
    method restore_grad (line 132) | def restore_grad(self):
  class FGM (line 139) | class FGM():
    method __init__ (line 140) | def __init__(self, model):
    method attack (line 144) | def attack(self, epsilon=0.25, emb_name='word_embeddings'):
    method restore (line 154) | def restore(self, emb_name='word_embeddings'):
  class FocalLoss (line 164) | class FocalLoss(nn.Module):
    method __init__ (line 180) | def __init__(self, num_class, alpha=None, gamma=2,
    method forward (line 201) | def forward(self, input, target):
  function f1_match (line 244) | def f1_match(y_true,y_pred):
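
  The PGD and FGM classes indexed above implement embedding-space adversarial
  training. A minimal sketch of the usual FGM loop they support, assuming
  caller-supplied model/optimizer/loss_fn/train_loader objects (only the
  attack and restore calls come from this index):

      from utils import FGM

      def train_epoch_fgm(model, optimizer, loss_fn, train_loader):
          fgm = FGM(model)
          for input_ids, input_masks, segment_ids, labels in train_loader:
              loss = loss_fn(model(input_ids, input_masks, segment_ids), labels)
              loss.backward()                   # gradients on the clean batch
              fgm.attack(epsilon=0.25)          # perturb word_embeddings along the gradient
              loss_adv = loss_fn(model(input_ids, input_masks, segment_ids), labels)
              loss_adv.backward()               # accumulate adversarial gradients
              fgm.restore()                     # undo the embedding perturbation
              optimizer.step()
              optimizer.zero_grad()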

FILE: code/bert-base-count3/finetuning/NEZHA/configuration_nezha.py
  class NeZhaConfig (line 6) | class NeZhaConfig(PretrainedConfig):
    method __init__ (line 82) | def __init__(

FILE: code/bert-base-count3/finetuning/NEZHA/modeling_nezha.py
  function load_tf_weights_in_nezha (line 33) | def load_tf_weights_in_nezha(model, config, tf_checkpoint_path):
  class NeZhaEmbeddings (line 108) | class NeZhaEmbeddings(nn.Module):
    method __init__ (line 113) | def __init__(self, config):
    method forward (line 123) | def forward(self, input_ids=None, token_type_ids=None, inputs_embeds=N...
  function relative_position_encoding (line 140) | def relative_position_encoding(depth, max_length=512, max_relative_posit...
  class NeZhaSelfAttention (line 165) | class NeZhaSelfAttention(nn.Module):
    method __init__ (line 166) | def __init__(self, config):
    method transpose_for_scores (line 188) | def transpose_for_scores(self, x):
    method forward (line 193) | def forward(
  class NeZhaAttention (line 270) | class NeZhaAttention(nn.Module):
    method __init__ (line 271) | def __init__(self, config):
    method prune_heads (line 277) | def prune_heads(self, heads):
    method forward (line 298) | def forward(
  class NeZhaLayer (line 314) | class NeZhaLayer(nn.Module):
    method __init__ (line 315) | def __init__(self, config):
    method forward (line 324) | def forward(
  class NeZhaEncoder (line 349) | class NeZhaEncoder(nn.Module):
    method __init__ (line 350) | def __init__(self, config):
    method forward (line 357) | def forward(
  class NeZhaPreTrainedModel (line 388) | class NeZhaPreTrainedModel(PreTrainedModel):
    method _init_weights (line 397) | def _init_weights(self, module):
  class NeZhaModel (line 414) | class NeZhaModel(NeZhaPreTrainedModel):
    method __init__ (line 430) | def __init__(self, config):
    method get_input_embeddings (line 438) | def get_input_embeddings(self):
    method set_input_embeddings (line 441) | def set_input_embeddings(self, value):
    method _prune_heads (line 444) | def _prune_heads(self, heads_to_prune):
    method forward (line 453) | def forward(
  class NeZhaForPreTraining (line 569) | class NeZhaForPreTraining(NeZhaPreTrainedModel):
    method __init__ (line 570) | def __init__(self, config):
    method get_output_embeddings (line 576) | def get_output_embeddings(self):
    method forward (line 580) | def forward(
  class NeZhaForMaskedLM (line 664) | class NeZhaForMaskedLM(NeZhaPreTrainedModel):
    method __init__ (line 665) | def __init__(self, config):
    method get_output_embeddings (line 671) | def get_output_embeddings(self):
    method forward (line 675) | def forward(
    method prepare_inputs_for_generation (line 760) | def prepare_inputs_for_generation(self, input_ids, attention_mask=None...
  class NeZhaForNextSentencePrediction (line 786) | class NeZhaForNextSentencePrediction(NeZhaPreTrainedModel):
    method __init__ (line 787) | def __init__(self, config):
    method forward (line 794) | def forward(
  class NeZhaForSequenceClassification (line 868) | class NeZhaForSequenceClassification(NeZhaPreTrainedModel):
    method __init__ (line 869) | def __init__(self, config):
    method forward (line 878) | def forward(
  class NeZhaForMultipleChoice (line 962) | class NeZhaForMultipleChoice(NeZhaPreTrainedModel):
    method __init__ (line 963) | def __init__(self, config):
    method forward (line 971) | def forward(
  class NeZhaForTokenClassification (line 1058) | class NeZhaForTokenClassification(NeZhaPreTrainedModel):
    method __init__ (line 1059) | def __init__(self, config):
    method forward (line 1068) | def forward(
  class NeZhaForQuestionAnswering (line 1153) | class NeZhaForQuestionAnswering(NeZhaPreTrainedModel):
    method __init__ (line 1154) | def __init__(self, config):
    method forward (line 1162) | def forward(
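
Note: NeZha replaces BERT's learned absolute position embeddings with functional relative position encodings, built by relative_position_encoding (line 140) and consumed inside NeZhaSelfAttention. A minimal sketch of the standard construction this function presumably follows (names are illustrative, not the repo's exact code):

    import math
    import torch

    def relative_position_table(depth, max_length=512, max_relative_position=127):
        # Pairwise distances j - i, clipped to [-k, k] and shifted to [0, 2k].
        rng = torch.arange(max_length)
        index = (rng[None, :] - rng[:, None]).clamp(
            -max_relative_position, max_relative_position) + max_relative_position

        # Sinusoidal embedding table over the 2k + 1 possible distances.
        vocab = 2 * max_relative_position + 1
        table = torch.zeros(vocab, depth)
        pos = torch.arange(vocab, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, depth, 2).float() * (-math.log(10000.0) / depth))
        table[:, 0::2] = torch.sin(pos * div)
        table[:, 1::2] = torch.cos(pos * div)
        return table[index]  # (max_length, max_length, depth)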

FILE: code/bert-base-count3/finetuning/model.py
  class BertForClass (line 11) | class BertForClass(nn.Module):
    method __init__ (line 12) | def __init__(self, config):
    method forward (line 24) | def forward(self, input_ids, input_masks, segment_ids):
  class BertForClass_MultiDropout (line 37) | class BertForClass_MultiDropout(nn.Module):
    method __init__ (line 38) | def __init__(self, config):
    method forward (line 50) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoCls (line 63) | class BertLastTwoCls(nn.Module):
    method __init__ (line 64) | def __init__(self, config):
    method forward (line 75) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastCls (line 83) | class BertLastCls(nn.Module):
    method __init__ (line 84) | def __init__(self, config):
    method forward (line 95) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoClsPooler (line 108) | class BertLastTwoClsPooler(nn.Module):
    method __init__ (line 109) | def __init__(self, config):
    method forward (line 120) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoEmbeddings (line 132) | class BertLastTwoEmbeddings(nn.Module):
    method __init__ (line 133) | def __init__(self, config):
    method forward (line 144) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastTwoEmbeddingsPooler (line 160) | class BertLastTwoEmbeddingsPooler(nn.Module):
    method __init__ (line 161) | def __init__(self, config):
    method forward (line 172) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourCls (line 187) | class BertLastFourCls(nn.Module):
    method __init__ (line 188) | def __init__(self, config):
    method forward (line 199) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourClsPooler (line 215) | class BertLastFourClsPooler(nn.Module):
    method __init__ (line 216) | def __init__(self, config):
    method forward (line 227) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourEmbeddings (line 239) | class BertLastFourEmbeddings(nn.Module):
    method __init__ (line 240) | def __init__(self, config):
    method forward (line 251) | def forward(self, input_ids, input_masks, segment_ids):
  class BertLastFourEmbeddingsPooler (line 268) | class BertLastFourEmbeddingsPooler(nn.Module):
    method __init__ (line 269) | def __init__(self, config):
    method forward (line 280) | def forward(self, input_ids, input_masks, segment_ids):
  class BertDynCls (line 296) | class BertDynCls(nn.Module):
    method __init__ (line 297) | def __init__(self, config):
    method forward (line 311) | def forward(self, input_ids, input_masks, segment_ids):
  class BertDynEmbeddings (line 343) | class BertDynEmbeddings(nn.Module):
    method __init__ (line 344) | def __init__(self, config):
    method forward (line 358) | def forward(self, input_ids, input_masks, segment_ids):
  class BertRNN (line 392) | class BertRNN(nn.Module):
    method __init__ (line 394) | def __init__(self, config):
    method forward (line 434) | def forward(self, input_ids, input_masks, segment_ids):
  class BertCNN (line 459) | class BertCNN(nn.Module):
    method __init__ (line 461) | def __init__(self, config):
    method conv_and_pool (line 480) | def conv_and_pool(self, x, conv):
    method forward (line 485) | def forward(self, input_ids, input_masks, segment_ids):
  class BertRCNN (line 497) | class BertRCNN(nn.Module):
    method __init__ (line 498) | def __init__(self, config):
    method forward (line 540) | def forward(self, input_ids, input_masks, segment_ids):
  class XLNet (line 564) | class XLNet(nn.Module):
    method __init__ (line 566) | def __init__(self, config):
    method forward (line 574) | def forward(self, input_ids, input_masks, segment_ids):
  class ElectraClassificationHead (line 584) | class ElectraClassificationHead(nn.Module):
    method __init__ (line 587) | def __init__(self, config):
    method forward (line 593) | def forward(self, features, **kwargs):
  class Electra (line 602) | class Electra(nn.Module):
    method __init__ (line 604) | def __init__(self, config):
    method forward (line 613) | def forward(self, input_ids, input_masks, segment_ids):
  class NEZHA (line 621) | class NEZHA(nn.Module):
    method __init__ (line 622) | def __init__(self, config):
    method forward (line 637) | def forward(self, input_ids, input_masks, segment_ids):
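
Note: model.py collects interchangeable classification heads over the same encoder: [CLS] pooling from the last one/two/four layers (with or without the pooler), dynamically weighted layer mixes (BertDynCls/BertDynEmbeddings), and RNN/CNN/RCNN heads. A minimal sketch of the BertLastFourCls idea, written against the current transformers API (class and variable names here are illustrative):

    import torch
    import torch.nn as nn
    from transformers import BertModel

    class LastFourCls(nn.Module):
        """Concatenate the [CLS] vector of the last four encoder layers."""
        def __init__(self, model_path, n_classes=2):
            super().__init__()
            self.bert = BertModel.from_pretrained(model_path, output_hidden_states=True)
            self.classifier = nn.Linear(self.bert.config.hidden_size * 4, n_classes)

        def forward(self, input_ids, input_masks, segment_ids):
            out = self.bert(input_ids, attention_mask=input_masks,
                            token_type_ids=segment_ids)
            # hidden_states: embeddings + one entry per layer; take last four [CLS].
            cls = [h[:, 0] for h in out.hidden_states[-4:]]
            return self.classifier(torch.cat(cls, dim=-1))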

FILE: code/bert-base-count3/finetuning/multi_gpu_QA.py
  class Config (line 46) | class Config:
    method __init__ (line 47) | def __init__(self):

FILE: code/bert-base-count3/finetuning/utils.py
  function paddingList (line 12) | def paddingList(ls:list,val,returnTensor=False):
  function fastTokenizer (line 19) | def fastTokenizer(a:str,b:str,maxLen,tk):
  class data_generator (line 39) | class data_generator:
    method __init__ (line 40) | def __init__(self, data, config, shuffle=False):
    method __len__ (line 53) | def __len__(self):
    method __iter__ (line 56) | def __iter__(self):
  class PGD (line 95) | class PGD():
    method __init__ (line 96) | def __init__(self, model):
    method attack (line 101) | def attack(self, epsilon=0.3, alpha=0.1, emb_name='word_embeddings', i...
    method restore (line 113) | def restore(self, emb_name='word_embeddings'):
    method project (line 121) | def project(self, param_name, param_data, epsilon):
    method backup_grad (line 127) | def backup_grad(self):
    method restore_grad (line 132) | def restore_grad(self):
  class FGM (line 139) | class FGM():
    method __init__ (line 140) | def __init__(self, model):
    method attack (line 144) | def attack(self, epsilon=0.25, emb_name='word_embeddings'):
    method restore (line 154) | def restore(self, emb_name='word_embeddings'):
  class FocalLoss (line 164) | class FocalLoss(nn.Module):
    method __init__ (line 180) | def __init__(self, num_class, alpha=None, gamma=2,
    method forward (line 201) | def forward(self, input, target):
  function f1_match (line 244) | def f1_match(y_true,y_pred):
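
Note: PGD and FGM implement embedding-space adversarial training (PGD takes several projected perturbation steps; FGM takes one), and FocalLoss down-weights easy examples. The FGM signatures above match the widely used recipe, sketched here with an illustrative training-step usage (the repo's exact loop may differ):

    import torch

    class FGM:
        """Fast Gradient Method: perturb word embeddings along the gradient."""
        def __init__(self, model):
            self.model = model
            self.backup = {}

        def attack(self, epsilon=0.25, emb_name='word_embeddings'):
            for name, param in self.model.named_parameters():
                if param.requires_grad and emb_name in name:
                    self.backup[name] = param.data.clone()
                    norm = torch.norm(param.grad)
                    if norm != 0 and not torch.isnan(norm):
                        param.data.add_(epsilon * param.grad / norm)

        def restore(self, emb_name='word_embeddings'):
            for name, param in self.model.named_parameters():
                if param.requires_grad and emb_name in name:
                    param.data = self.backup[name]
            self.backup = {}

    # Typical step: clean backward, perturb, adversarial backward, restore.
    #   loss.backward()
    #   fgm.attack()
    #   loss_adv = criterion(model(x), y)
    #   loss_adv.backward()
    #   fgm.restore()
    #   optimizer.step(); optimizer.zero_grad()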

FILE: code/bert-base-count3/pretrain/NLP_Utils.py
  function writeToJsonFile (line 10) | def writeToJsonFile(path: str, obj):
  function readFromJsonFile (line 13) | def readFromJsonFile(path: str):
  function loadData (line 17) | def loadData(path):
  function calNegPos (line 35) | def calNegPos(ls):  # compute the positive/negative label ratio
  function paddingList (line 54) | def paddingList(ls:list,val,returnTensor=False):
  function truncate (line 61) | def truncate(a:list,b:list,maxLen):
  class MLM_Data (line 77) | class MLM_Data(Dataset):
    method __init__ (line 79) | def __init__(self,textLs:list,maxLen:int,tk:BertTokenizer):
    method __len__ (line 87) | def __len__(self):
    method random_mask (line 90) | def random_mask(self,text_ids):
    method __getitem__ (line 128) | def __getitem__(self, item):
    method collate (line 143) | def collate(cls,batch):
  function blockShuffle (line 163) | def blockShuffle(data:list,bs:int,sortBsNum,key):
  class blockShuffleDataLoader (line 179) | class blockShuffleDataLoader(DataLoader):
    method __init__ (line 180) | def __init__(self, dataset: Dataset,sortBsNum,key,**kwargs):
    method __iter__ (line 186) | def __iter__(self):
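
Note: blockShuffle / blockShuffleDataLoader sort samples by length inside fixed-size blocks so each batch pads to a similar length, while shuffling block and batch order to keep training stochastic. An illustrative reconstruction (the exact implementation at line 163 may differ in detail):

    import random

    def block_shuffle(data, bs, sort_bs_num, key):
        # Global shuffle, then length-sort within blocks of sort_bs_num * bs samples.
        random.shuffle(data)
        block = bs * sort_bs_num
        blocks = [sorted(data[i:i + block], key=key) for i in range(0, len(data), block)]
        data = [x for chunk in blocks for x in chunk]
        # Chunk into batches and shuffle the batch order; batches stay length-homogeneous.
        batches = [data[i:i + bs] for i in range(0, len(data), bs)]
        random.shuffle(batches)
        return [x for batch in batches for x in batch]

    # e.g. block_shuffle(sentences, bs=32, sort_bs_num=16, key=len)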

FILE: code/bert-base-count3/pretrain/transformers1/__main__.py
  function main (line 2) | def main():

FILE: code/bert-base-count3/pretrain/transformers1/activations.py
  function swish (line 11) | def swish(x):
  function _gelu_python (line 15) | def _gelu_python(x):
  function gelu_new (line 25) | def gelu_new(x):
  function gelu_fast (line 38) | def gelu_fast(x):
  function get_activation (line 52) | def get_activation(activation_string):
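
Note: activations.py bundles the usual activation variants behind get_activation; swish is x * sigmoid(x), and gelu_new is the tanh approximation of GELU used by GPT-2:

    import math
    import torch

    def swish(x):
        return x * torch.sigmoid(x)

    def gelu_new(x):
        # Tanh approximation of the Gaussian error linear unit (GPT-2 variant).
        return 0.5 * x * (1.0 + torch.tanh(
            math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))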

FILE: code/bert-base-count3/pretrain/transformers1/benchmark/benchmark.py
  class PyTorchBenchmark (line 38) | class PyTorchBenchmark(Benchmark):
    method framework_version (line 45) | def framework_version(self):
    method train (line 48) | def train(self, model_name, batch_size, sequence_length, trace_memory=...
    method inference (line 100) | def inference(self, model_name, batch_size, sequence_length, trace_mem...

FILE: code/bert-base-count3/pretrain/transformers1/benchmark/benchmark_args.py
  function is_tpu_available (line 37) | def is_tpu_available():
  class PyTorchBenchmarkArguments (line 45) | class PyTorchBenchmarkArguments(BenchmarkArguments):
    method _setup_devices (line 52) | def _setup_devices(self) -> Tuple["torch.device", int]:
    method device_idx (line 67) | def device_idx(self) -> int:
    method device (line 72) | def device(self) -> "torch.device":
    method n_gpu (line 77) | def n_gpu(self):

FILE: code/bert-base-count3/pretrain/transformers1/benchmark/benchmark_args_utils.py
  function list_field (line 24) | def list_field(default=None, metadata=None):
  class BenchmarkArguments (line 29) | class BenchmarkArguments:
    method to_json_string (line 90) | def to_json_string(self):
    method model_names (line 97) | def model_names(self):

FILE: code/bert-base-count3/pretrain/transformers1/benchmark/benchmark_utils.py
  function is_memory_tracing_enabled (line 43) | def is_memory_tracing_enabled():
  class Frame (line 48) | class Frame(NamedTuple):
  class UsedMemoryState (line 65) | class UsedMemoryState(NamedTuple):
  class Memory (line 77) | class Memory(NamedTuple):
    method __repr__ (line 85) | def __repr__(self) -> str:
  class MemoryState (line 89) | class MemoryState(NamedTuple):
  class MemorySummary (line 103) | class MemorySummary(NamedTuple):
  function start_memory_tracing (line 123) | def start_memory_tracing(
  function stop_memory_tracing (line 273) | def stop_memory_tracing(
  function bytes_to_mega_bytes (line 370) | def bytes_to_mega_bytes(memory_amount: int) -> int:
  class Benchmark (line 376) | class Benchmark(ABC):
    method __init__ (line 386) | def __init__(self, args: BenchmarkArguments = None, configs: Pretraine...
    method print_fn (line 401) | def print_fn(self):
    method is_gpu (line 421) | def is_gpu(self):
    method framework_version (line 426) | def framework_version(self):
    method train (line 430) | def train(self, model_name, batch_size, sequence_length):
    method inference (line 434) | def inference(self, model_name, batch_size, sequence_length):
    method run (line 437) | def run(self):
    method environment_info (line 512) | def environment_info(self):
    method print_results (line 572) | def print_results(self, result_dict):
    method print_memory_trace_statistics (line 585) | def print_memory_trace_statistics(self, summary: MemorySummary):
    method save_to_csv (line 609) | def save_to_csv(self, result_dict, filename):
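
Note: the memory-tracing utilities hook sys.settrace to record CPU/GPU memory line by line for modules whose names match a filter. An illustrative usage, following the upstream transformers pattern these files were vendored from (assuming the vendored package imports as transformers1; field names per the NamedTuples above):

    from transformers1.benchmark.benchmark_utils import (
        start_memory_tracing, stop_memory_tracing)

    trace = start_memory_tracing("transformers1")  # trace lines in matching modules
    # ... run the code under measurement here ...
    summary = stop_memory_tracing(trace)
    print(summary.total)                    # total memory increase
    for state in summary.cumulative[:5]:    # heaviest lines first
        print(state.frame, state.cpu_gpu)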

FILE: code/bert-base-count3/pretrain/transformers1/benchmark_utils.py
  function is_memory_tracing_enabled (line 29) | def is_memory_tracing_enabled():
  class Frame (line 34) | class Frame(NamedTuple):
  class UsedMemoryState (line 51) | class UsedMemoryState(NamedTuple):
  class Memory (line 63) | class Memory(NamedTuple):
    method __repr__ (line 71) | def __repr__(self) -> str:
  class MemoryState (line 75) | class MemoryState(NamedTuple):
  class MemorySummary (line 89) | class MemorySummary(NamedTuple):
  function start_memory_tracing (line 108) | def start_memory_tracing(
  function stop_memory_tracing (line 256) | def stop_memory_tracing(
  function bytes_to_human_readable (line 334) | def bytes_to_human_readable(memory_amount: int) -> str:

FILE: code/bert-base-count3/pretrain/transformers1/commands/__init__.py
  class BaseTransformersCLICommand (line 5) | class BaseTransformersCLICommand(ABC):
    method register_subcommand (line 8) | def register_subcommand(parser: ArgumentParser):
    method run (line 12) | def run(self):
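
Note: every CLI subcommand follows the same contract: a static register_subcommand(parser) that wires arguments plus a factory, and a run() that executes. A self-contained sketch of the pattern (HelloCommand is a hypothetical example, not one of the repo's commands):

    from abc import ABC, abstractmethod
    from argparse import ArgumentParser

    class BaseCommand(ABC):
        @staticmethod
        @abstractmethod
        def register_subcommand(parser):
            ...

        @abstractmethod
        def run(self):
            ...

    class HelloCommand(BaseCommand):
        @staticmethod
        def register_subcommand(parser):
            sub = parser.add_parser("hello")
            sub.add_argument("--name", default="world")
            sub.set_defaults(func=lambda args: HelloCommand(args.name))

        def __init__(self, name):
            self.name = name

        def run(self):
            print(f"hello, {self.name}")

    parser = ArgumentParser("cli")
    commands = parser.add_subparsers()
    HelloCommand.register_subcommand(commands)
    args = parser.parse_args(["hello", "--name", "nezha"])
    args.func(args).run()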

FILE: code/bert-base-count3/pretrain/transformers1/commands/convert.py
  function convert_command_factory (line 7) | def convert_command_factory(args: Namespace):
  class ConvertCommand (line 17) | class ConvertCommand(BaseTransformersCLICommand):
    method register_subcommand (line 19) | def register_subcommand(parser: ArgumentParser):
    method __init__ (line 46) | def __init__(
    method run (line 64) | def run(self):

FILE: code/bert-base-count3/pretrain/transformers1/commands/download.py
  function download_command_factory (line 6) | def download_command_factory(args):
  class DownloadCommand (line 10) | class DownloadCommand(BaseTransformersCLICommand):
    method register_subcommand (line 12) | def register_subcommand(parser: ArgumentParser):
    method __init__ (line 23) | def __init__(self, model: str, cache: str, force: bool):
    method run (line 28) | def run(self):

FILE: code/bert-base-count3/pretrain/transformers1/commands/env.py
  function info_command_factory (line 9) | def info_command_factory(_):
  class EnvironmentCommand (line 13) | class EnvironmentCommand(BaseTransformersCLICommand):
    method register_subcommand (line 15) | def register_subcommand(parser: ArgumentParser):
    method run (line 19) | def run(self):
    method format_dict (line 57) | def format_dict(d):

FILE: code/bert-base-count3/pretrain/transformers1/commands/run.py
  function try_infer_format_from_ext (line 11) | def try_infer_format_from_ext(path: str):
  function run_command_factory (line 25) | def run_command_factory(args):
  class RunCommand (line 44) | class RunCommand(BaseTransformersCLICommand):
    method __init__ (line 45) | def __init__(self, nlp: Pipeline, reader: PipelineDataFormat):
    method register_subcommand (line 50) | def register_subcommand(parser: ArgumentParser):
    method run (line 81) | def run(self):

FILE: code/bert-base-count3/pretrain/transformers1/commands/serving.py
  function Body (line 21) | def Body(*x, **y):
  function serve_command_factory (line 30) | def serve_command_factory(args: Namespace):
  class ServeModelInfoResult (line 45) | class ServeModelInfoResult(BaseModel):
  class ServeTokenizeResult (line 53) | class ServeTokenizeResult(BaseModel):
  class ServeDeTokenizeResult (line 62) | class ServeDeTokenizeResult(BaseModel):
  class ServeForwardResult (line 70) | class ServeForwardResult(BaseModel):
  class ServeCommand (line 78) | class ServeCommand(BaseTransformersCLICommand):
    method register_subcommand (line 80) | def register_subcommand(parser: ArgumentParser):
    method __init__ (line 106) | def __init__(self, pipeline: Pipeline, host: str, port: int, workers: ...
    method run (line 156) | def run(self):
    method model_info (line 159) | def model_info(self):
    method tokenize (line 162) | def tokenize(self, text_input: str = Body(None, embed=True), return_id...
    method detokenize (line 180) | def detokenize(
    method forward (line 198) | async def forward(self, inputs=Body(None, embed=True)):

FILE: code/bert-base-count3/pretrain/transformers1/commands/train.py
  function train_command_factory (line 18) | def train_command_factory(args: Namespace):
  class TrainCommand (line 26) | class TrainCommand(BaseTransformersCLICommand):
    method register_subcommand (line 28) | def register_subcommand(parser: ArgumentParser):
    method __init__ (line 78) | def __init__(self, args: Namespace):
    method run (line 124) | def run(self):
    method run_torch (line 129) | def run_torch(self):
    method run_tf (line 132) | def run_tf(self):

FILE: code/bert-base-count3/pretrain/transformers1/commands/transformers_cli.py
  function main (line 12) | def main():

FILE: code/bert-base-count3/pretrain/transformers1/commands/user.py
  class UserCommands (line 16) | class UserCommands(BaseTransformersCLICommand):
    method register_subcommand (line 18) | def register_subcommand(parser: ArgumentParser):
  class ANSI (line 47) | class ANSI:
    method bold (line 57) | def bold(cls, s):
    method red (line 61) | def red(cls, s):
  class BaseUserCommand (line 65) | class BaseUserCommand:
    method __init__ (line 66) | def __init__(self, args):
  class LoginCommand (line 71) | class LoginCommand(BaseUserCommand):
    method run (line 72) | def run(self):
  class WhoamiCommand (line 98) | class WhoamiCommand(BaseUserCommand):
    method run (line 99) | def run(self):
  class LogoutCommand (line 115) | class LogoutCommand(BaseUserCommand):
    method run (line 116) | def run(self):
  class ListObjsCommand (line 126) | class ListObjsCommand(BaseUserCommand):
    method tabulate (line 127) | def tabulate(self, rows: List[List[Union[str, int]]], headers: List[st...
    method run (line 142) | def run(self):
  class DeleteObjCommand (line 160) | class DeleteObjCommand(BaseUserCommand):
    method run (line 161) | def run(self):
  class UploadCommand (line 175) | class UploadCommand(BaseUserCommand):
    method walk_dir (line 176) | def walk_dir(self, rel_path):
    method run (line 187) | def run(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_albert.py
  class AlbertConfig (line 33) | class AlbertConfig(PretrainedConfig):
    method __init__ (line 104) | def __init__(

FILE: code/bert-base-count3/pretrain/transformers1/configuration_auto.py
  class AutoConfig (line 98) | class AutoConfig:
    method __init__ (line 109) | def __init__(self):
    method for_model (line 116) | def for_model(cls, model_type: str, *args, **kwargs):
    method from_pretrained (line 127) | def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_bart.py
  class BartConfig (line 34) | class BartConfig(PretrainedConfig):
    method __init__ (line 40) | def __init__(
    method num_attention_heads (line 121) | def num_attention_heads(self) -> int:
    method hidden_size (line 125) | def hidden_size(self) -> int:
    method is_valid_mbart (line 128) | def is_valid_mbart(self) -> bool:

FILE: code/bert-base-count3/pretrain/transformers1/configuration_bert.py
  class BertConfig (line 53) | class BertConfig(PretrainedConfig):
    method __init__ (line 109) | def __init__(

FILE: code/bert-base-count3/pretrain/transformers1/configuration_camembert.py
  class CamembertConfig (line 33) | class CamembertConfig(RobertaConfig):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_ctrl.py
  class CTRLConfig (line 28) | class CTRLConfig(PretrainedConfig):
    method __init__ (line 83) | def __init__(
    method max_position_embeddings (line 125) | def max_position_embeddings(self):
    method hidden_size (line 129) | def hidden_size(self):
    method num_attention_heads (line 133) | def num_attention_heads(self):
    method num_hidden_layers (line 137) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_distilbert.py
  class DistilBertConfig (line 36) | class DistilBertConfig(PretrainedConfig):
    method __init__ (line 96) | def __init__(
    method hidden_size (line 130) | def hidden_size(self):
    method num_attention_heads (line 134) | def num_attention_heads(self):
    method num_hidden_layers (line 138) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_electra.py
  class ElectraConfig (line 36) | class ElectraConfig(PretrainedConfig):
    method __init__ (line 95) | def __init__(

FILE: code/bert-base-count3/pretrain/transformers1/configuration_encoder_decoder.py
  class EncoderDecoderConfig (line 26) | class EncoderDecoderConfig(PretrainedConfig):
    method __init__ (line 62) | def __init__(self, **kwargs):
    method from_encoder_decoder_configs (line 79) | def from_encoder_decoder_configs(
    method to_dict (line 90) | def to_dict(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_flaubert.py
  class FlaubertConfig (line 33) | class FlaubertConfig(XLMConfig):
    method __init__ (line 147) | def __init__(self, layerdrop=0.0, pre_norm=False, pad_token_id=2, bos_...

FILE: code/bert-base-count3/pretrain/transformers1/configuration_gpt2.py
  class GPT2Config (line 35) | class GPT2Config(PretrainedConfig):
    method __init__ (line 117) | def __init__(
    method max_position_embeddings (line 164) | def max_position_embeddings(self):
    method hidden_size (line 168) | def hidden_size(self):
    method num_attention_heads (line 172) | def num_attention_heads(self):
    method num_hidden_layers (line 176) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_longformer.py
  class LongformerConfig (line 34) | class LongformerConfig(RobertaConfig):
    method __init__ (line 65) | def __init__(self, attention_window: Union[List[int], int] = 512, sep_...

FILE: code/bert-base-count3/pretrain/transformers1/configuration_marian.py
  class MarianConfig (line 25) | class MarianConfig(BartConfig):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_mmbt.py
  class MMBTConfig (line 25) | class MMBTConfig(object):
    method __init__ (line 38) | def __init__(self, config, num_labels=None, modal_hidden_size=2048):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_openai.py
  class OpenAIGPTConfig (line 31) | class OpenAIGPTConfig(PretrainedConfig):
    method __init__ (line 115) | def __init__(
    method max_position_embeddings (line 159) | def max_position_embeddings(self):
    method hidden_size (line 163) | def hidden_size(self):
    method num_attention_heads (line 167) | def num_attention_heads(self):
    method num_hidden_layers (line 171) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_reformer.py
  class ReformerConfig (line 32) | class ReformerConfig(PretrainedConfig):
    method __init__ (line 141) | def __init__(

FILE: code/bert-base-count3/pretrain/transformers1/configuration_roberta.py
  class RobertaConfig (line 36) | class RobertaConfig(BertConfig):
    method __init__ (line 65) | def __init__(self, pad_token_id=1, bos_token_id=0, eos_token_id=2, **k...

FILE: code/bert-base-count3/pretrain/transformers1/configuration_t5.py
  class T5Config (line 34) | class T5Config(PretrainedConfig):
    method __init__ (line 64) | def __init__(
    method max_position_embeddings (line 98) | def max_position_embeddings(self):
    method hidden_size (line 102) | def hidden_size(self):
    method num_attention_heads (line 106) | def num_attention_heads(self):
    method num_hidden_layers (line 110) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_transfo_xl.py
  class TransfoXLConfig (line 31) | class TransfoXLConfig(PretrainedConfig):
    method __init__ (line 117) | def __init__(
    method max_position_embeddings (line 186) | def max_position_embeddings(self):
    method n_token (line 190) | def n_token(self):  # Backward compatibility
    method n_token (line 194) | def n_token(self, value):  # Backward compatibility
    method hidden_size (line 198) | def hidden_size(self):
    method num_attention_heads (line 202) | def num_attention_heads(self):
    method num_hidden_layers (line 206) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_utils.py
  class PretrainedConfig (line 31) | class PretrainedConfig(object):
    method __init__ (line 56) | def __init__(self, **kwargs):
    method num_labels (line 118) | def num_labels(self):
    method num_labels (line 122) | def num_labels(self, num_labels):
    method save_pretrained (line 126) | def save_pretrained(self, save_directory):
    method from_pretrained (line 146) | def from_pretrained(cls, pretrained_model_name_or_path, **kwargs) -> "...
    method get_config_dict (line 205) | def get_config_dict(cls, pretrained_model_name_or_path: str, **kwargs)...
    method from_dict (line 270) | def from_dict(cls, config_dict: Dict, **kwargs) -> "PretrainedConfig":
    method from_json_file (line 308) | def from_json_file(cls, json_file: str) -> "PretrainedConfig":
    method _dict_from_json_file (line 324) | def _dict_from_json_file(cls, json_file: str):
    method __eq__ (line 329) | def __eq__(self, other):
    method __repr__ (line 332) | def __repr__(self):
    method to_diff_dict (line 335) | def to_diff_dict(self):
    method to_dict (line 358) | def to_dict(self):
    method to_json_string (line 370) | def to_json_string(self, use_diff=True):
    method to_json_file (line 387) | def to_json_file(self, json_file_path, use_diff=True):
    method update (line 400) | def update(self, config_dict: Dict):
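
Note: PretrainedConfig is the JSON-backed base for every model config: save_pretrained writes config.json, from_pretrained resolves a name or path through get_config_dict, and to_diff_dict serializes only the fields that differ from the defaults. A minimal round trip with the public API:

    import os
    from transformers import BertConfig

    config = BertConfig(num_hidden_layers=6, num_attention_heads=8)
    os.makedirs("./tiny-bert", exist_ok=True)
    config.save_pretrained("./tiny-bert")             # writes ./tiny-bert/config.json
    reloaded = BertConfig.from_pretrained("./tiny-bert")
    assert reloaded.num_hidden_layers == 6
    print(reloaded.to_json_string())                  # non-default fields, as JSON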

FILE: code/bert-base-count3/pretrain/transformers1/configuration_xlm.py
  class XLMConfig (line 39) | class XLMConfig(PretrainedConfig):
    method __init__ (line 159) | def __init__(
    method n_words (line 235) | def n_words(self):  # For backward compatibility
    method n_words (line 239) | def n_words(self, value):  # For backward compatibility
    method hidden_size (line 243) | def hidden_size(self):
    method num_attention_heads (line 247) | def num_attention_heads(self):
    method num_hidden_layers (line 251) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_xlm_roberta.py
  class XLMRobertaConfig (line 36) | class XLMRobertaConfig(RobertaConfig):

FILE: code/bert-base-count3/pretrain/transformers1/configuration_xlnet.py
  class XLNetConfig (line 32) | class XLNetConfig(PretrainedConfig):
    method __init__ (line 129) | def __init__(
    method max_position_embeddings (line 194) | def max_position_embeddings(self):
    method n_token (line 198) | def n_token(self):  # Backward compatibility
    method n_token (line 202) | def n_token(self, value):  # Backward compatibility
    method hidden_size (line 206) | def hidden_size(self):
    method num_attention_heads (line 210) | def num_attention_heads(self):
    method num_hidden_layers (line 214) | def num_hidden_layers(self):

FILE: code/bert-base-count3/pretrain/transformers1/convert_albert_original_tf_checkpoint_to_pytorch.py
  function convert_tf_checkpoint_to_pytorch (line 29) | def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, albert_config_f...

FILE: code/bert-base-count3/pretrain/transformers1/convert_bart_original_pytorch_checkpoint_to_pytorch.py
  function remove_ignore_keys_ (line 56) | def remove_ignore_keys_(state_dict):
  function rename_key (line 68) | def rename_key(dct, old, new):
  function load_xsum_checkpoint (line 73) | def load_xsum_checkpoint(checkpoint_path):
  function convert_checkpoint_from_disk (line 81) | def convert_checkpoint_from_disk(checkpoint_path, **config_kwargs):
  function convert_bart_checkpoint (line 95) | def convert_bart_checkpoint(checkpoint_path, pytorch_dump_folder_path, h...

FILE: code/bert-base-count3/pretrain/transformers1/convert_bert_original_tf_checkpoint_to_pytorch.py
  function convert_tf_checkpoint_to_pytorch (line 29) | def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_fil...

FILE: code/bert-base-count3/pretrain/transformers1/convert_bert_pytorch_checkpoint_to_original_tf.py
  function convert_pytorch_checkpoint_to_tf (line 28) | def convert_pytorch_checkpoint_to_tf(model: BertModel, ckpt_dir: str, mo...
  function main (line 92) | def main(raw_args=None):

FILE: code/bert-base-count3/pretrain/transformers1/convert_dialogpt_original_pytorch_checkpoint_to_pytorch.py
  function convert_dialogpt_checkpoint (line 15) | def convert_dialogpt_checkpoint(checkpoint_path: str, pytorch_dump_folde...

FILE: code/bert-base-count3/pretrain/transformers1/convert_electra_original_tf_checkpoint_to_pytorch.py
  function convert_tf_checkpoint_to_pytorch (line 29) | def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, config_file, py...

FILE: code/bert-base-count3/pretrain/transformers1/convert_gpt2_original_tf_checkpoint_to_pytorch.py
  function convert_gpt2_checkpoint_to_pytorch (line 29) | def convert_gpt2_checkpoint_to_pytorch(gpt2_checkpoint_path, gpt2_config...

FILE: code/bert-base-count3/pretrain/transformers1/convert_graph_to_onnx.py
  class OnnxConverterArgumentParser (line 11) | class OnnxConverterArgumentParser(ArgumentParser):
    method __init__ (line 16) | def __init__(self):
  function ensure_valid_input (line 28) | def ensure_valid_input(model, tokens, input_names):
  function infer_shapes (line 53) | def infer_shapes(nlp: Pipeline, framework: str) -> Tuple[List[str], List...
  function load_graph_from_args (line 100) | def load_graph_from_args(framework: str, model: str, tokenizer: Optional...
  function convert_pytorch (line 111) | def convert_pytorch(nlp: Pipeline, opset: int, output: str, use_external...
  function convert_tensorflow (line 138) | def convert_tensorflow(nlp: Pipeline, opset: int, output: str):
  function convert (line 166) | def convert(
  function verify (line 193) | def verify(path: str):

FILE: code/bert-base-count3/pretrain/transformers1/convert_longformer_original_pytorch_lightning_to_pytorch.py
  class LightningModel (line 26) | class LightningModel(pl.LightningModule):
    method __init__ (line 27) | def __init__(self, model):
    method forward (line 34) | def forward(self):
  function convert_longformer_qa_checkpoint_to_pytorch (line 38) | def convert_longformer_qa_checkpoint_to_pytorch(

FILE: code/bert-base-count3/pretrain/transformers1/convert_marian_to_pytorch.py
  function remove_prefix (line 18) | def remove_prefix(text: str, prefix: str):
  function convert_encoder_layer (line 24) | def convert_encoder_layer(opus_dict, layer_prefix: str, converter: dict):
  function load_layers_ (line 35) | def load_layers_(layer_lst: torch.nn.ModuleList, opus_state: dict, conve...
  function find_pretrained_model (line 42) | def find_pretrained_model(src_lang: str, tgt_lang: str) -> List[str]:
  function add_emb_entries (line 55) | def add_emb_entries(wemb, final_bias, n_special_tokens=1):
  function _cast_yaml_str (line 64) | def _cast_yaml_str(v):
  function cast_marian_config (line 76) | def cast_marian_config(raw_cfg: Dict[str, str]) -> Dict:
  function load_config_from_state_dict (line 83) | def load_config_from_state_dict(opus_dict):
  function find_model_file (line 91) | def find_model_file(dest_dir):  # this one better
  function convert_opus_name_to_hf_name (line 136) | def convert_opus_name_to_hf_name(x):
  function convert_hf_name_to_opus_name (line 142) | def convert_hf_name_to_opus_name(hf_model_name):
  function write_model_card (line 152) | def write_model_card(
  function get_clean_model_id_mapping (line 185) | def get_clean_model_id_mapping(multiling_model_ids):
  function make_registry (line 189) | def make_registry(repo_path="Opus-MT-train/models"):
  function convert_all_sentencepiece_models (line 206) | def convert_all_sentencepiece_models(model_list=None, repo_path=None):
  function lmap (line 222) | def lmap(f, x) -> List:
  function fetch_test_set (line 226) | def fetch_test_set(test_set_url):
  function convert_whole_dir (line 239) | def convert_whole_dir(path=Path("marian_ckpt/")):
  function _parse_readme (line 247) | def _parse_readme(lns):
  function save_tokenizer_config (line 270) | def save_tokenizer_config(dest_dir: Path):
  function add_to_vocab_ (line 276) | def add_to_vocab_(vocab: Dict[str, int], special_tokens: List[str]):
  function find_vocab_file (line 287) | def find_vocab_file(model_dir):
  function add_special_tokens_to_vocab (line 291) | def add_special_tokens_to_vocab(model_dir: Path) -> None:
  function save_tokenizer (line 300) | def save_tokenizer(self, save_directory):
  function check_equal (line 309) | def check_equal(marian_cfg, k1, k2):
  function check_marian_cfg_assumptions (line 314) | def check_marian_cfg_assumptions(marian_cfg):
  class OpusState (line 371) | class OpusState:
    method __init__ (line 372) | def __init__(self, source_dir):
    method _check_layer_entries (line 420) | def _check_layer_entries(self):
    method extra_keys (line 432) | def extra_keys(self):
    method sub_keys (line 445) | def sub_keys(self, layer_prefix):
    method load_marian_model (line 448) | def load_marian_model(self) -> MarianMTModel:
  function download_and_unzip (line 483) | def download_and_unzip(url, dest_dir):
  function convert (line 494) | def convert(source_dir: Path, dest_dir):
  function load_yaml (line 525) | def load_yaml(path):
  function save_json (line 532) | def save_json(content: Union[Dict, List], path: str) -> None:
  function unzip (line 537) | def unzip(zip_path: str, dest_dir: str) -> None:

FILE: code/bert-base-count3/pretrain/transformers1/convert_openai_original_tf_checkpoint_to_pytorch.py
  function convert_openai_checkpoint_to_pytorch (line 29) | def convert_openai_checkpoint_to_pytorch(openai_checkpoint_folder_path, ...

FILE: code/bert-base-count3/pretrain/transformers1/convert_pytorch_checkpoint_to_tf2.py
  function convert_pt_checkpoint_to_tf (line 187) | def convert_pt_checkpoint_to_tf(
  function convert_all_pt_checkpoints_to_tf (line 233) | def convert_all_pt_checkpoints_to_tf(

FILE: code/bert-base-count3/pretrain/transformers1/convert_reformer_trax_checkpoint_to_pytorch.py
  function set_param (line 31) | def set_param(torch_layer, weight, bias=None):
  function set_layer_weights_in_torch_lsh (line 40) | def set_layer_weights_in_torch_lsh(weights, torch_layer, hidden_size):
  function set_layer_weights_in_torch_local (line 58) | def set_layer_weights_in_torch_local(weights, torch_layer, hidden_size):
  function set_block_weights_in_torch (line 79) | def set_block_weights_in_torch(weights, torch_block, hidden_size):
  function set_model_weights_in_torch (line 128) | def set_model_weights_in_torch(weights, torch_model, hidden_size):
  function convert_trax_checkpoint_to_pytorch (line 174) | def convert_trax_checkpoint_to_pytorch(trax_model_pkl_path, config_file,...

FILE: code/bert-base-count3/pretrain/transformers1/convert_roberta_original_pytorch_checkpoint_to_pytorch.py
  function convert_roberta_checkpoint_to_pytorch (line 42) | def convert_roberta_checkpoint_to_pytorch(

FILE: code/bert-base-count3/pretrain/transformers1/convert_t5_original_tf_checkpoint_to_pytorch.py
  function convert_tf_checkpoint_to_pytorch (line 29) | def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, config_file, py...

FILE: code/bert-base-count3/pretrain/transformers1/convert_transfo_xl_original_tf_checkpoint_to_pytorch.py
  function convert_transfo_xl_checkpoint_to_pytorch (line 47) | def convert_transfo_xl_checkpoint_to_pytorch(

FILE: code/bert-base-count3/pretrain/transformers1/convert_xlm_original_pytorch_checkpoint_to_pytorch.py
  function convert_xlm_checkpoint_to_pytorch (line 32) | def convert_xlm_checkpoint_to_pytorch(xlm_checkpoint_path, pytorch_dump_...

FILE: code/bert-base-count3/pretrain/transformers1/convert_xlnet_original_tf_checkpoint_to_pytorch.py
  function convert_xlnet_checkpoint_to_pytorch (line 51) | def convert_xlnet_checkpoint_to_pytorch(

FILE: code/bert-base-count3/pretrain/transformers1/data/data_collator.py
  class DataCollator (line 12) | class DataCollator(ABC):
    method collate_batch (line 19) | def collate_batch(self) -> Dict[str, torch.Tensor]:
  class DefaultDataCollator (line 33) | class DefaultDataCollator(DataCollator):
    method collate_batch (line 46) | def collate_batch(self, features: List[InputDataClass]) -> Dict[str, t...
  class DataCollatorForLanguageModeling (line 80) | class DataCollatorForLanguageModeling(DataCollator):
    method collate_batch (line 91) | def collate_batch(self, examples: List[torch.Tensor]) -> Dict[str, tor...
    method _tensorize_batch (line 99) | def _tensorize_batch(self, examples: List[torch.Tensor]) -> torch.Tensor:
    method mask_tokens (line 112) | def mask_tokens(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, tor...
    method mask_tokens2 (line 148) | def mask_tokens2(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, to...
    method mask_tokens3 (line 192) | def mask_tokens3(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, to...
    method mask_tokens4 (line 259) | def mask_tokens4(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, to...
    method mask_tokens5 (line 342) | def mask_tokens5(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, to...
    method mask_tokens6 (line 427) | def mask_tokens6(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, to...
    method mask_tokens7 (line 507) | def mask_tokens7(self, inputs: torch.Tensor) -> Tuple[torch.Tensor, to...
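
Note: mask_tokens implements the canonical BERT dynamic-masking recipe; mask_tokens2 through mask_tokens7 are presumably experiments with the selection strategy (e.g. n-gram or whole-word masking). The canonical recipe, for reference:

    import torch

    def mask_tokens(inputs, tokenizer, mlm_probability=0.15):
        # Select 15% of non-special tokens; of those: 80% -> [MASK],
        # 10% -> random token, 10% -> kept unchanged.
        labels = inputs.clone()
        prob = torch.full(labels.shape, mlm_probability)
        special = [tokenizer.get_special_tokens_mask(v, already_has_special_tokens=True)
                   for v in labels.tolist()]
        prob.masked_fill_(torch.tensor(special, dtype=torch.bool), value=0.0)
        masked = torch.bernoulli(prob).bool()
        labels[~masked] = -100  # loss is only computed on masked positions

        replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked
        inputs[replaced] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

        random_ = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked & ~replaced
        inputs[random_] = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)[random_]
        return inputs, labels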

FILE: code/bert-base-count3/pretrain/transformers1/data/datasets/glue.py
  class GlueDataTrainingArguments (line 23) | class GlueDataTrainingArguments:
    method __post_init__ (line 47) | def __post_init__(self):
  class Split (line 51) | class Split(Enum):
  class GlueDataset (line 57) | class GlueDataset(Dataset):
    method __init__ (line 67) | def __init__(
    method __len__ (line 135) | def __len__(self):
    method __getitem__ (line 138) | def __getitem__(self, i) -> InputFeatures:
    method get_labels (line 141) | def get_labels(self):

FILE: code/bert-base-count3/pretrain/transformers1/data/datasets/language_modeling.py
  class TextDataset (line 16) | class TextDataset(Dataset):
    method __init__ (line 22) | def __init__(
    method __len__ (line 71) | def __len__(self):
    method __getitem__ (line 74) | def __getitem__(self, i) -> torch.Tensor:
  class LineByLineTextDataset (line 78) | class LineByLineTextDataset(Dataset):
    method __init__ (line 84) | def __init__(self, tokenizer: PreTrainedTokenizer, file_path: str, blo...
    method __len__ (line 97) | def __len__(self):
    method __getitem__ (line 100) | def __getitem__(self, i) -> torch.Tensor:

FILE: code/bert-base-count3/pretrain/transformers1/data/metrics/__init__.py
  function is_sklearn_available (line 26) | def is_sklearn_available():
  function simple_accuracy (line 32) | def simple_accuracy(preds, labels):
  function acc_and_f1 (line 35) | def acc_and_f1(preds, labels):
  function pearson_and_spearman (line 44) | def pearson_and_spearman(preds, labels):
  function glue_compute_metrics (line 53) | def glue_compute_metrics(task_name, preds, labels):
  function xnli_compute_metrics (line 80) | def xnli_compute_metrics(task_name, preds, labels):

FILE: code/bert-base-count3/pretrain/transformers1/data/metrics/squad_metrics.py
  function normalize_answer (line 24) | def normalize_answer(s):
  function get_tokens (line 44) | def get_tokens(s):
  function compute_exact (line 50) | def compute_exact(a_gold, a_pred):
  function compute_f1 (line 54) | def compute_f1(a_gold, a_pred):
  function get_raw_scores (line 70) | def get_raw_scores(examples, preds):
  function apply_no_ans_threshold (line 96) | def apply_no_ans_threshold(scores, na_probs, qid_to_has_ans, na_prob_thr...
  function make_eval_dict (line 107) | def make_eval_dict(exact_scores, f1_scores, qid_list=None):
  function merge_eval (line 128) | def merge_eval(main_eval, new_eval, prefix):
  function find_best_thresh_v2 (line 133) | def find_best_thresh_v2(preds, scores, na_probs, qid_to_has_ans):
  function find_all_best_thresh_v2 (line 167) | def find_all_best_thresh_v2(main_eval, preds, exact_raw, f1_raw, na_prob...
  function find_best_thresh (line 178) | def find_best_thresh(preds, scores, na_probs, qid_to_has_ans):
  function find_all_best_thresh (line 201) | def find_all_best_thresh(main_eval, preds, exact_raw, f1_raw, na_probs, ...
  function squad_evaluate (line 211) | def squad_evaluate(examples, preds, no_answer_probs=None, no_answer_prob...
  function get_final_text (line 242) | def get_final_text(pred_text, orig_text, do_lower_case, verbose_logging=...
  function _get_best_indexes (line 336) | def _get_best_indexes(logits, n_best_size):
  function _compute_softmax (line 348) | def _compute_softmax(scores):
  function compute_predictions_logits (line 371) | def compute_predictions_logits(
  function compute_predictions_log_probs (line 576) | def compute_predictions_log_probs(
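
Note: SQuAD-style scoring first normalizes both strings (normalize_answer: lowercase, strip articles and punctuation) and then computes token-overlap F1. The core of compute_f1, with the normalization step omitted for brevity:

    import collections

    def compute_f1(a_gold, a_pred):
        # Token-overlap F1 between a gold answer and a prediction.
        gold_toks, pred_toks = a_gold.split(), a_pred.split()
        common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
        num_same = sum(common.values())
        if len(gold_toks) == 0 or len(pred_toks) == 0:
            return int(gold_toks == pred_toks)  # both empty counts as a match
        if num_same == 0:
            return 0
        precision = num_same / len(pred_toks)
        recall = num_same / len(gold_toks)
        return 2 * precision * recall / (precision + recall)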

FILE: code/bert-base-count3/pretrain/transformers1/data/processors/glue.py
  function glue_convert_examples_to_features (line 34) | def glue_convert_examples_to_features(
  function _tf_glue_convert_examples_to_features (line 70) | def _tf_glue_convert_examples_to_features(
  function _glue_convert_examples_to_features (line 107) | def _glue_convert_examples_to_features(
  class OutputMode (line 159) | class OutputMode(Enum):
  class MrpcProcessor (line 164) | class MrpcProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 167) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 176) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 181) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 185) | def get_test_examples(self, data_dir):
    method get_labels (line 189) | def get_labels(self):
    method _create_examples (line 193) | def _create_examples(self, lines, set_type):
  class MnliProcessor (line 207) | class MnliProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 210) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 219) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 223) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 227) | def get_test_examples(self, data_dir):
    method get_labels (line 231) | def get_labels(self):
    method _create_examples (line 235) | def _create_examples(self, lines, set_type):
  class MnliMismatchedProcessor (line 249) | class MnliMismatchedProcessor(MnliProcessor):
    method get_dev_examples (line 252) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 256) | def get_test_examples(self, data_dir):
  class ColaProcessor (line 261) | class ColaProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 264) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 273) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 277) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 281) | def get_test_examples(self, data_dir):
    method get_labels (line 285) | def get_labels(self):
    method _create_examples (line 289) | def _create_examples(self, lines, set_type):
  class Sst2Processor (line 304) | class Sst2Processor(DataProcessor):
    method get_example_from_tensor_dict (line 307) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 316) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 320) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 324) | def get_test_examples(self, data_dir):
    method get_labels (line 328) | def get_labels(self):
    method _create_examples (line 332) | def _create_examples(self, lines, set_type):
  class StsbProcessor (line 346) | class StsbProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 349) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 358) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 362) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 366) | def get_test_examples(self, data_dir):
    method get_labels (line 370) | def get_labels(self):
    method _create_examples (line 374) | def _create_examples(self, lines, set_type):
  class QqpProcessor (line 388) | class QqpProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 391) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 400) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 404) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 408) | def get_test_examples(self, data_dir):
    method get_labels (line 412) | def get_labels(self):
    method _create_examples (line 416) | def _create_examples(self, lines, set_type):
  class QnliProcessor (line 436) | class QnliProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 439) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 448) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 452) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 456) | def get_test_examples(self, data_dir):
    method get_labels (line 460) | def get_labels(self):
    method _create_examples (line 464) | def _create_examples(self, lines, set_type):
  class RteProcessor (line 478) | class RteProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 481) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 490) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 494) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 498) | def get_test_examples(self, data_dir):
    method get_labels (line 502) | def get_labels(self):
    method _create_examples (line 506) | def _create_examples(self, lines, set_type):
  class WnliProcessor (line 520) | class WnliProcessor(DataProcessor):
    method get_example_from_tensor_dict (line 523) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 532) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 536) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 540) | def get_test_examples(self, data_dir):
    method get_labels (line 544) | def get_labels(self):
    method _create_examples (line 548) | def _create_examples(self, lines, set_type):

FILE: code/bert-base-count3/pretrain/transformers1/data/processors/squad.py
  function _improve_answer_span (line 25) | def _improve_answer_span(doc_tokens, input_start, input_end, tokenizer, ...
  function _check_is_max_context (line 38) | def _check_is_max_context(doc_spans, cur_span_index, position):
  function _new_check_is_max_context (line 58) | def _new_check_is_max_context(doc_spans, cur_span_index, position):
  function _is_whitespace (line 80) | def _is_whitespace(c):
  function squad_convert_example_to_features (line 86) | def squad_convert_example_to_features(example, max_seq_length, doc_strid...
  function squad_convert_example_to_features_init (line 264) | def squad_convert_example_to_features_init(tokenizer_for_convert):
  function squad_convert_examples_to_features (line 269) | def squad_convert_examples_to_features(
  class SquadProcessor (line 445) | class SquadProcessor(DataProcessor):
    method _get_example_from_tensor_dict (line 454) | def _get_example_from_tensor_dict(self, tensor_dict, evaluate=False):
    method get_examples_from_dataset (line 478) | def get_examples_from_dataset(self, dataset, evaluate=False):
    method get_train_examples (line 509) | def get_train_examples(self, data_dir, filename=None):
    method get_dev_examples (line 531) | def get_dev_examples(self, data_dir, filename=None):
    method _create_examples (line 552) | def _create_examples(self, input_data, set_type):
  class SquadV1Processor (line 594) | class SquadV1Processor(SquadProcessor):
  class SquadV2Processor (line 599) | class SquadV2Processor(SquadProcessor):
  class SquadExample (line 604) | class SquadExample(object):
    method __init__ (line 619) | def __init__(
  class SquadFeatures (line 667) | class SquadFeatures(object):
    method __init__ (line 692) | def __init__(
  class SquadResult (line 729) | class SquadResult(object):
    method __init__ (line 739) | def __init__(self, unique_id, start_logits, end_logits, start_top_inde...

FILE: code/bert-base-count3/pretrain/transformers1/data/processors/utils.py
  class InputExample (line 31) | class InputExample:
    method to_json_string (line 50) | def to_json_string(self):
  class InputFeatures (line 56) | class InputFeatures:
    method to_json_string (line 77) | def to_json_string(self):
  class DataProcessor (line 82) | class DataProcessor:
    method get_example_from_tensor_dict (line 85) | def get_example_from_tensor_dict(self, tensor_dict):
    method get_train_examples (line 93) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 97) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 101) | def get_test_examples(self, data_dir):
    method get_labels (line 105) | def get_labels(self):
    method tfds_map (line 109) | def tfds_map(self, example):
    method _read_tsv (line 117) | def _read_tsv(cls, input_file, quotechar=None):
  class SingleSentenceClassificationProcessor (line 123) | class SingleSentenceClassificationProcessor(DataProcessor):
    method __init__ (line 126) | def __init__(self, labels=None, examples=None, mode="classification", ...
    method __len__ (line 132) | def __len__(self):
    method __getitem__ (line 135) | def __getitem__(self, idx):
    method create_from_csv (line 141) | def create_from_csv(
    method create_from_examples (line 158) | def create_from_examples(cls, texts_or_text_and_labels, labels=None, *...
    method add_examples_from_csv (line 163) | def add_examples_from_csv(
    method add_examples (line 193) | def add_examples(
    method get_features (line 226) | def get_features(

FILE: code/bert-base-count3/pretrain/transformers1/data/processors/xnli.py
  class XnliProcessor (line 28) | class XnliProcessor(DataProcessor):
    method __init__ (line 32) | def __init__(self, language, train_language=None):
    method get_train_examples (line 36) | def get_train_examples(self, data_dir):
    method get_test_examples (line 52) | def get_test_examples(self, data_dir):
    method get_labels (line 70) | def get_labels(self):

FILE: code/bert-base-count3/pretrain/transformers1/file_utils.py
  function is_torch_available (line 93) | def is_torch_available():
  function is_tf_available (line 97) | def is_tf_available():
  function add_start_docstrings (line 101) | def add_start_docstrings(*docstr):
  function add_start_docstrings_to_callable (line 109) | def add_start_docstrings_to_callable(*docstr):
  function add_end_docstrings (line 127) | def add_end_docstrings(*docstr):
  function is_remote_url (line 135) | def is_remote_url(url_or_filename):
  function hf_bucket_url (line 140) | def hf_bucket_url(model_id: str, filename: str, use_cdn=True) -> str:
  function url_to_filename (line 164) | def url_to_filename(url, etag=None):
  function filename_to_url (line 188) | def filename_to_url(filename, cache_dir=None):
  function cached_path (line 214) | def cached_path(
  function http_get (line 306) | def http_get(url, temp_file, proxies=None, resume_size=0, user_agent=None):
  function get_from_cache (line 339) | def get_from_cache(
  class cached_property (line 453) | class cached_property(property):
    method __get__ (line 462) | def __get__(self, obj, objtype=None):
  function torch_required (line 476) | def torch_required(func):
  function tf_required (line 488) | def tf_required(func):
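
Note: the download cache keys files by content-addressing the URL together with the server's ETag, so a changed remote file gets a fresh cache entry instead of overwriting the old one. url_to_filename reduces to roughly:

    from hashlib import sha256

    def url_to_filename(url, etag=None):
        # Deterministic cache filename: hash of the URL, plus hash of the
        # ETag so a changed remote file maps to a new cache entry.
        filename = sha256(url.encode("utf-8")).hexdigest()
        if etag:
            filename += "." + sha256(etag.encode("utf-8")).hexdigest()
        return filename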

FILE: code/bert-base-count3/pretrain/transformers1/hf_api.py
  class S3Obj (line 29) | class S3Obj:
    method __init__ (line 34) | def __init__(self, filename: str, LastModified: str, ETag: str, Size: ...
  class PresignedUrl (line 41) | class PresignedUrl:
    method __init__ (line 42) | def __init__(self, write: str, access: str, type: str, **kwargs):
  class S3Object (line 48) | class S3Object:
    method __init__ (line 53) | def __init__(
  class ModelInfo (line 69) | class ModelInfo:
    method __init__ (line 74) | def __init__(
  class HfApi (line 92) | class HfApi:
    method __init__ (line 93) | def __init__(self, endpoint=None):
    method login (line 96) | def login(self, username: str, password: str) -> str:
    method whoami (line 112) | def whoami(self, token: str) -> Tuple[str, List[str]]:
    method logout (line 122) | def logout(self, token: str) -> None:
    method presign (line 130) | def presign(self, token: str, filename: str, organization: Optional[st...
    method presign_and_upload (line 144) | def presign_and_upload(self, token: str, filename: str, filepath: str,...
    method list_objs (line 166) | def list_objs(self, token: str, organization: Optional[str] = None) ->...
    method delete_obj (line 177) | def delete_obj(self, token: str, filename: str, organization: Optional...
    method model_list (line 189) | def model_list(self) -> List[ModelInfo]:
  class TqdmProgressFileReader (line 200) | class TqdmProgressFileReader:
    method __init__ (line 209) | def __init__(self, f: io.BufferedReader):
    method _read (line 216) | def _read(self, n=-1):
    method close (line 220) | def close(self):
  class HfFolder (line 224) | class HfFolder:
    method save_token (line 228) | def save_token(cls, token):
    method get_token (line 237) | def get_token(cls):
    method delete_token (line 248) | def delete_token(cls):

FILE: code/bert-base-count3/pretrain/transformers1/hf_argparser.py
  class HfArgumentParser (line 14) | class HfArgumentParser(ArgumentParser):
    method __init__ (line 26) | def __init__(self, dataclass_types: Union[DataClassType, Iterable[Data...
    method _add_dataclass_arguments (line 42) | def _add_dataclass_arguments(self, dtype: DataClassType):
    method parse_args_into_dataclasses (line 88) | def parse_args_into_dataclasses(
    method parse_json_file (line 146) | def parse_json_file(self, json_file: str) -> Tuple[DataClass, ...]:
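
Note: HfArgumentParser turns dataclass fields into argparse arguments and parses the command line back into typed dataclass instances. A minimal usage (TrainArgs is a hypothetical argument group, not from this repo):

    from dataclasses import dataclass, field
    from transformers import HfArgumentParser

    @dataclass
    class TrainArgs:
        lr: float = field(default=5e-5, metadata={"help": "learning rate"})
        epochs: int = 3

    parser = HfArgumentParser(TrainArgs)
    (args,) = parser.parse_args_into_dataclasses(args=["--lr", "2e-5"])
    print(args.lr, args.epochs)  # 2e-05 3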

FILE: code/bert-base-count3/pretrain/transformers1/modelcard.py
  class ModelCard (line 38) | class ModelCard:
    method __init__ (line 55) | def __init__(self, **kwargs):
    method save_pretrained (line 75) | def save_pretrained(self, save_directory_or_file):
    method from_pretrained (line 88) | def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
    method from_dict (line 186) | def from_dict(cls, json_object):
    method from_json_file (line 191) | def from_json_file(cls, json_file):
    method __eq__ (line 198) | def __eq__(self, other):
    method __repr__ (line 201) | def __repr__(self):
    method to_dict (line 204) | def to_dict(self):
    method to_json_string (line 209) | def to_json_string(self):
    method to_json_file (line 213) | def to_json_file(self, json_file_path):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_albert.py
  function load_tf_weights_in_albert (line 47) | def load_tf_weights_in_albert(model, config, tf_checkpoint_path):
  class AlbertEmbeddings (line 171) | class AlbertEmbeddings(BertEmbeddings):
    method __init__ (line 176) | def __init__(self, config):
  class AlbertAttention (line 185) | class AlbertAttention(BertSelfAttention):
    method __init__ (line 186) | def __init__(self, config):
    method prune_heads (line 198) | def prune_heads(self, heads):
    method forward (line 221) | def forward(self, input_ids, attention_mask=None, head_mask=None):
  class AlbertLayer (line 266) | class AlbertLayer(nn.Module):
    method __init__ (line 267) | def __init__(self, config):
    method forward (line 277) | def forward(self, hidden_states, attention_mask=None, head_mask=None):
  class AlbertLayerGroup (line 287) | class AlbertLayerGroup(nn.Module):
    method __init__ (line 288) | def __init__(self, config):
    method forward (line 295) | def forward(self, hidden_states, attention_mask=None, head_mask=None):
  class AlbertTransformer (line 317) | class AlbertTransformer(nn.Module):
    method __init__ (line 318) | def __init__(self, config):
    method forward (line 327) | def forward(self, hidden_states, attention_mask=None, head_mask=None):
  class AlbertPreTrainedModel (line 363) | class AlbertPreTrainedModel(PreTrainedModel):
    method _init_weights (line 371) | def _init_weights(self, module):
  class AlbertModel (line 439) | class AlbertModel(AlbertPreTrainedModel):
    method __init__ (line 445) | def __init__(self, config):
    method get_input_embeddings (line 456) | def get_input_embeddings(self):
    method set_input_embeddings (line 459) | def set_input_embeddings(self, value):
    method _resize_token_embeddings (line 462) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 468) | def _prune_heads(self, heads_to_prune):
    method forward (line 487) | def forward(
  class AlbertForPreTraining (line 576) | class AlbertForPreTraining(AlbertPreTrainedModel):
    method __init__ (line 577) | def __init__(self, config):
    method tie_weights (line 587) | def tie_weights(self):
    method get_output_embeddings (line 590) | def get_output_embeddings(self):
    method forward (line 594) | def forward(
  class AlbertMLMHead (line 680) | class AlbertMLMHead(nn.Module):
    method __init__ (line 681) | def __init__(self, config):
    method forward (line 693) | def forward(self, hidden_states):
  class AlbertSOPHead (line 704) | class AlbertSOPHead(nn.Module):
    method __init__ (line 705) | def __init__(self, config):
    method forward (line 711) | def forward(self, pooled_output):
  class AlbertForMaskedLM (line 720) | class AlbertForMaskedLM(AlbertPreTrainedModel):
    method __init__ (line 721) | def __init__(self, config):
    method tie_weights (line 730) | def tie_weights(self):
    method get_output_embeddings (line 733) | def get_output_embeddings(self):
    method forward (line 737) | def forward(
  class AlbertForSequenceClassification (line 810) | class AlbertForSequenceClassification(AlbertPreTrainedModel):
    method __init__ (line 811) | def __init__(self, config):
    method forward (line 822) | def forward(
  class AlbertForTokenClassification (line 905) | class AlbertForTokenClassification(AlbertPreTrainedModel):
    method __init__ (line 906) | def __init__(self, config):
    method forward (line 917) | def forward(
  class AlbertForQuestionAnswering (line 1002) | class AlbertForQuestionAnswering(AlbertPreTrainedModel):
    method __init__ (line 1003) | def __init__(self, config):
    method forward (line 1013) | def forward(
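
  Usage sketch (toy config sizes, assuming torch and the vendored transformers1 package are importable): models in this vintage return plain tuples, with logits first when no labels are passed.

      import torch
      from transformers1.configuration_albert import AlbertConfig
      from transformers1.modeling_albert import AlbertForSequenceClassification

      config = AlbertConfig(vocab_size=1000, embedding_size=32, hidden_size=64,
                            num_hidden_layers=2, num_attention_heads=4,
                            intermediate_size=128, num_labels=2)  # illustrative sizes
      model = AlbertForSequenceClassification(config)
      input_ids = torch.randint(0, 1000, (2, 16))
      logits = model(input_ids=input_ids)[0]   # shape (2, 2)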

FILE: code/bert-base-count3/pretrain/transformers1/modeling_auto.py
  class AutoModel (line 269) | class AutoModel:
    method __init__ (line 279) | def __init__(self):
    method from_config (line 287) | def from_config(cls, config):
    method from_pretrained (line 329) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class AutoModelForPreTraining (line 424) | class AutoModelForPreTraining:
    method __init__ (line 433) | def __init__(self):
    method from_config (line 441) | def from_config(cls, config):
    method from_pretrained (line 483) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class AutoModelWithLMHead (line 570) | class AutoModelWithLMHead:
    method __init__ (line 580) | def __init__(self):
    method from_config (line 588) | def from_config(cls, config):
    method from_pretrained (line 630) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class AutoModelForSequenceClassification (line 718) | class AutoModelForSequenceClassification:
    method __init__ (line 728) | def __init__(self):
    method from_config (line 736) | def from_config(cls, config):
    method from_pretrained (line 778) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class AutoModelForQuestionAnswering (line 867) | class AutoModelForQuestionAnswering:
    method __init__ (line 877) | def __init__(self):
    method from_config (line 885) | def from_config(cls, config):
    method from_pretrained (line 924) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class AutoModelForTokenClassification (line 1009) | class AutoModelForTokenClassification:
    method __init__ (line 1019) | def __init__(self):
    method from_config (line 1027) | def from_config(cls, config):
    method from_pretrained (line 1069) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class AutoModelForMultipleChoice (line 1156) | class AutoModelForMultipleChoice:
    method __init__ (line 1166) | def __init__(self):
    method from_config (line 1174) | def from_config(cls, config):
    method from_pretrained (line 1189) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
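
  Usage sketch: the Auto* classes are not meant to be instantiated directly (their __init__ raises); from_config and from_pretrained dispatch on the config type. Toy BERT config for illustration.

      from transformers1.configuration_bert import BertConfig
      from transformers1.modeling_auto import AutoModel

      config = BertConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                          num_attention_heads=4, intermediate_size=128)
      model = AutoModel.from_config(config)    # dispatches to BertModel
      assert type(model).__name__ == "BertModel"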

FILE: code/bert-base-count3/pretrain/transformers1/modeling_bart.py
  function invert_mask (line 94) | def invert_mask(attention_mask):
  function _prepare_bart_decoder_inputs (line 99) | def _prepare_bart_decoder_inputs(
  class PretrainedBartModel (line 120) | class PretrainedBartModel(PreTrainedModel):
    method _init_weights (line 124) | def _init_weights(self, module):
    method dummy_inputs (line 138) | def dummy_inputs(self):
  function _make_linear_from_emb (line 148) | def _make_linear_from_emb(emb):
  function _check_shapes (line 156) | def _check_shapes(shape_1, shape2):
  function shift_tokens_right (line 161) | def shift_tokens_right(input_ids, pad_token_id):
  function make_padding_mask (line 170) | def make_padding_mask(input_ids, padding_idx=1):
  class EncoderLayer (line 181) | class EncoderLayer(nn.Module):
    method __init__ (line 182) | def __init__(self, config: BartConfig):
    method forward (line 198) | def forward(self, x, encoder_padding_mask):
  class BartEncoder (line 234) | class BartEncoder(nn.Module):
    method __init__ (line 243) | def __init__(self, config: BartConfig, embed_tokens):
    method forward (line 270) | def forward(
  class DecoderLayer (line 327) | class DecoderLayer(nn.Module):
    method __init__ (line 328) | def __init__(self, config: BartConfig):
    method forward (line 352) | def forward(
  class BartDecoder (line 416) | class BartDecoder(nn.Module):
    method __init__ (line 425) | def __init__(self, config: BartConfig, embed_tokens: nn.Embedding):
    method forward (line 449) | def forward(
  function _reorder_buffer (line 542) | def _reorder_buffer(attn_cache, new_order):
  class SelfAttention (line 549) | class SelfAttention(nn.Module):
    method __init__ (line 552) | def __init__(
    method _shape (line 575) | def _shape(self, tensor, dim_0, bsz):
    method forward (line 578) | def forward(
    method _use_saved_state (line 663) | def _use_saved_state(self, k, v, saved_state, key_padding_mask, static...
    method _cat_prev_key_padding_mask (line 691) | def _cat_prev_key_padding_mask(
  class BartClassificationHead (line 718) | class BartClassificationHead(nn.Module):
    method __init__ (line 723) | def __init__(
    method forward (line 731) | def forward(self, x):
  class LearnedPositionalEmbedding (line 740) | class LearnedPositionalEmbedding(nn.Embedding):
    method __init__ (line 748) | def __init__(
    method forward (line 757) | def forward(self, input, use_cache=False):
  function LayerNorm (line 767) | def LayerNorm(normalized_shape, eps=1e-5, elementwise_affine=True):
  function fill_with_neg_inf (line 778) | def fill_with_neg_inf(t):
  function _filter_out_falsey_values (line 783) | def _filter_out_falsey_values(tup) -> Tuple:
  function _get_shape (line 789) | def _get_shape(t):
  class BartModel (line 796) | class BartModel(PretrainedBartModel):
    method __init__ (line 797) | def __init__(self, config: BartConfig):
    method forward (line 811) | def forward(
    method get_input_embeddings (line 854) | def get_input_embeddings(self):
    method set_input_embeddings (line 857) | def set_input_embeddings(self, value):
    method get_output_embeddings (line 862) | def get_output_embeddings(self):
  class BartForConditionalGeneration (line 870) | class BartForConditionalGeneration(PretrainedBartModel):
    method __init__ (line 873) | def __init__(self, config: BartConfig):
    method resize_token_embeddings (line 879) | def resize_token_embeddings(self, new_num_tokens: int) -> nn.Embedding:
    method _resize_final_logits_bias (line 886) | def _resize_final_logits_bias(self, new_num_tokens: int, old_num_token...
    method forward (line 895) | def forward(
    method prepare_inputs_for_generation (line 967) | def prepare_inputs_for_generation(self, decoder_input_ids, past, atten...
    method prepare_logits_for_generation (line 984) | def prepare_logits_for_generation(self, logits, cur_len, max_length):
    method _force_token_ids_generation (line 991) | def _force_token_ids_generation(self, scores, token_ids) -> None:
    method _reorder_cache (line 1004) | def _reorder_cache(past, beam_idx):
    method get_encoder (line 1020) | def get_encoder(self):
    method get_output_embeddings (line 1023) | def get_output_embeddings(self):
  class BartForSequenceClassification (line 1031) | class BartForSequenceClassification(PretrainedBartModel):
    method __init__ (line 1032) | def __init__(self, config: BartConfig, **kwargs):
    method forward (line 1042) | def forward(
  class SinusoidalPositionalEmbedding (line 1109) | class SinusoidalPositionalEmbedding(nn.Embedding):
    method __init__ (line 1112) | def __init__(self, num_positions, embedding_dim, padding_idx=None):
    method _init_weight (line 1119) | def _init_weight(out: nn.Parameter):
    method forward (line 1134) | def forward(self, input_ids, use_cache=False):
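
  Behavior sketch for shift_tokens_right, which builds decoder inputs from labels: every token moves one slot right and the last non-pad token wraps into slot 0.

      import torch
      from transformers1.modeling_bart import shift_tokens_right

      pad = 1
      labels = torch.tensor([[0, 5, 6, 2, pad]])    # <s> a b </s> <pad>
      decoder_inputs = shift_tokens_right(labels, pad)
      print(decoder_inputs)                         # tensor([[2, 0, 5, 6, 2]])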

FILE: code/bert-base-count3/pretrain/transformers1/modeling_beam_search.py
  class TransformerBeamSearch (line 29) | class TransformerBeamSearch(nn.Module):
    method __init__ (line 30) | def __init__(
    method step (line 80) | def step(self, log_probabilities):
    method forward (line 177) | def forward(self, encoder_input_ids, **kwargs):
    method remove_repeating_trigrams (line 224) | def remove_repeating_trigrams(self, log_probabilities, _B):
    method enforce_min_length (line 233) | def enforce_min_length(self):
    method enforce_max_length (line 237) | def enforce_max_length(self):
    method length_penalty (line 241) | def length_penalty(self):
  function tile (line 245) | def tile(x, count, dim=0):
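
  Sketch of the tiling step (an assumption: if tile follows the usual OpenNMT-style helper, tiling a batch for beam search has the same effect as torch.repeat_interleave along the batch dimension).

      import torch

      x = torch.tensor([[1, 2], [3, 4]])
      beam_size = 3
      tiled = torch.repeat_interleave(x, beam_size, dim=0)
      # rows: [1,2] three times, then [3,4] three times: one copy per beam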

FILE: code/bert-base-count3/pretrain/transformers1/modeling_bert.py
  function load_tf_weights_in_bert (line 62) | def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
  function mish (line 134) | def mish(x):
  class BertEmbeddings (line 144) | class BertEmbeddings(nn.Module):
    method __init__ (line 148) | def __init__(self, config):
    method forward (line 159) | def forward(self, input_ids=None, token_type_ids=None, position_ids=No...
  class BertSelfAttention (line 184) | class BertSelfAttention(nn.Module):
    method __init__ (line 185) | def __init__(self, config):
    method transpose_for_scores (line 204) | def transpose_for_scores(self, x):
    method forward (line 209) | def forward(
  class BertSelfOutput (line 262) | class BertSelfOutput(nn.Module):
    method __init__ (line 263) | def __init__(self, config):
    method forward (line 269) | def forward(self, hidden_states, input_tensor):
  class BertAttention (line 276) | class BertAttention(nn.Module):
    method __init__ (line 277) | def __init__(self, config):
    method prune_heads (line 283) | def prune_heads(self, heads):
    method forward (line 306) | def forward(
  class BertIntermediate (line 322) | class BertIntermediate(nn.Module):
    method __init__ (line 323) | def __init__(self, config):
    method forward (line 331) | def forward(self, hidden_states):
  class BertOutput (line 337) | class BertOutput(nn.Module):
    method __init__ (line 338) | def __init__(self, config):
    method forward (line 344) | def forward(self, hidden_states, input_tensor):
  class BertLayer (line 351) | class BertLayer(nn.Module):
    method __init__ (line 352) | def __init__(self, config):
    method forward (line 361) | def forward(
  class BertEncoder (line 386) | class BertEncoder(nn.Module):
    method __init__ (line 387) | def __init__(self, config):
    method forward (line 393) | def forward(
  class BertPooler (line 427) | class BertPooler(nn.Module):
    method __init__ (line 428) | def __init__(self, config):
    method forward (line 433) | def forward(self, hidden_states):
  class BertPredictionHeadTransform (line 442) | class BertPredictionHeadTransform(nn.Module):
    method __init__ (line 443) | def __init__(self, config):
    method forward (line 452) | def forward(self, hidden_states):
  class BertLMPredictionHead (line 459) | class BertLMPredictionHead(nn.Module):
    method __init__ (line 460) | def __init__(self, config):
    method forward (line 473) | def forward(self, hidden_states):
  class BertOnlyMLMHead (line 479) | class BertOnlyMLMHead(nn.Module):
    method __init__ (line 480) | def __init__(self, config):
    method forward (line 484) | def forward(self, sequence_output):
  class BertOnlyNSPHead (line 489) | class BertOnlyNSPHead(nn.Module):
    method __init__ (line 490) | def __init__(self, config):
    method forward (line 494) | def forward(self, pooled_output):
  class BertPreTrainingHeads (line 499) | class BertPreTrainingHeads(nn.Module):
    method __init__ (line 500) | def __init__(self, config):
    method forward (line 505) | def forward(self, sequence_output, pooled_output):
  class BertPreTrainedModel (line 511) | class BertPreTrainedModel(PreTrainedModel):
    method _init_weights (line 520) | def _init_weights(self, module):
  class BertModel (line 594) | class BertModel(BertPreTrainedModel):
    method __init__ (line 611) | def __init__(self, config):
    method get_input_embeddings (line 621) | def get_input_embeddings(self):
    method set_input_embeddings (line 624) | def set_input_embeddings(self, value):
    method _prune_heads (line 627) | def _prune_heads(self, heads_to_prune):
    method forward (line 636) | def forward(
  class BertForPreTraining (line 750) | class BertForPreTraining(BertPreTrainedModel):
    method __init__ (line 751) | def __init__(self, config):
    method get_output_embeddings (line 759) | def get_output_embeddings(self):
    method forward (line 763) | def forward(
  class BertForMaskedLM (line 850) | class BertForMaskedLM(BertPreTrainedModel):
    method __init__ (line 851) | def __init__(self, config):
    method get_output_embeddings (line 859) | def get_output_embeddings(self):
    method forward (line 863) | def forward(
    method prepare_inputs_for_generation (line 960) | def prepare_inputs_for_generation(self, input_ids, attention_mask=None...
  class BertForNextSentencePrediction (line 986) | class BertForNextSentencePrediction(BertPreTrainedModel):
    method __init__ (line 987) | def __init__(self, config):
    method forward (line 996) | def forward(
  class BertForSequenceClassification (line 1074) | class BertForSequenceClassification(BertPreTrainedModel):
    method __init__ (line 1075) | def __init__(self, config):
    method forward (line 1086) | def forward(
  class BertForMultipleChoice (line 1171) | class BertForMultipleChoice(BertPreTrainedModel):
    method __init__ (line 1172) | def __init__(self, config):
    method forward (line 1182) | def forward(
  class BertForTokenClassification (line 1274) | class BertForTokenClassification(BertPreTrainedModel):
    method __init__ (line 1275) | def __init__(self, config):
    method forward (line 1286) | def forward(
  class BertForQuestionAnswering (line 1372) | class BertForQuestionAnswering(BertPreTrainedModel):
    method __init__ (line 1373) | def __init__(self, config):
    method forward (line 1383) | def forward(
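
  Usage sketch (same toy-config assumptions as above): BertModel.forward returns (sequence_output, pooled_output, ...) as a tuple.

      import torch
      from transformers1.configuration_bert import BertConfig
      from transformers1.modeling_bert import BertModel

      config = BertConfig(vocab_size=1000, hidden_size=64, num_hidden_layers=2,
                          num_attention_heads=4, intermediate_size=128)
      model = BertModel(config)
      input_ids = torch.randint(0, 1000, (2, 16))
      attention_mask = torch.ones_like(input_ids)
      sequence_output, pooled_output = model(
          input_ids=input_ids, attention_mask=attention_mask)[:2]
      print(sequence_output.shape, pooled_output.shape)  # (2, 16, 64) (2, 64)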

FILE: code/bert-base-count3/pretrain/transformers1/modeling_camembert.py
  class CamembertModel (line 59) | class CamembertModel(RobertaModel):
  class CamembertForMaskedLM (line 71) | class CamembertForMaskedLM(RobertaForMaskedLM):
  class CamembertForSequenceClassification (line 85) | class CamembertForSequenceClassification(RobertaForSequenceClassification):
  class CamembertForMultipleChoice (line 99) | class CamembertForMultipleChoice(RobertaForMultipleChoice):
  class CamembertForTokenClassification (line 113) | class CamembertForTokenClassification(RobertaForTokenClassification):
  class CamembertForQuestionAnswering (line 127) | class CamembertForQuestionAnswering(RobertaForQuestionAnswering):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_ctrl.py
  function angle_defn (line 39) | def angle_defn(pos, i, d_model_size):
  function positional_encoding (line 44) | def positional_encoding(position, d_model_size, dtype):
  function scaled_dot_product_attention (line 59) | def scaled_dot_product_attention(q, k, v, mask, attention_mask=None, hea...
  class MultiHeadAttention (line 85) | class MultiHeadAttention(torch.nn.Module):
    method __init__ (line 86) | def __init__(self, d_model_size, num_heads, output_attentions=False):
    method split_into_heads (line 100) | def split_into_heads(self, x, batch_size):
    method forward (line 104) | def forward(self, v, k, q, mask, layer_past=None, attention_mask=None,...
  function point_wise_feed_forward_network (line 136) | def point_wise_feed_forward_network(d_model_size, dff):
  class EncoderLayer (line 140) | class EncoderLayer(torch.nn.Module):
    method __init__ (line 141) | def __init__(self, d_model_size, num_heads, dff, rate=0.1, output_atte...
    method forward (line 153) | def forward(self, x, mask, layer_past=None, attention_mask=None, head_...
  class CTRLPreTrainedModel (line 178) | class CTRLPreTrainedModel(PreTrainedModel):
    method _init_weights (line 186) | def _init_weights(self, module):
  class CTRLModel (line 263) | class CTRLModel(CTRLPreTrainedModel):
    method __init__ (line 264) | def __init__(self, config):
    method get_input_embeddings (line 287) | def get_input_embeddings(self):
    method set_input_embeddings (line 290) | def set_input_embeddings(self, new_embeddings):
    method _prune_heads (line 293) | def _prune_heads(self, heads_to_prune):
    method forward (line 301) | def forward(
  class CTRLLMHeadModel (line 458) | class CTRLLMHeadModel(CTRLPreTrainedModel):
    method __init__ (line 459) | def __init__(self, config):
    method get_output_embeddings (line 466) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 469) | def prepare_inputs_for_generation(self, input_ids, past, **kwargs):
    method forward (line 477) | def forward(
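
  Math sketch of the sinusoidal table behind angle_defn/positional_encoding: position pos at channel i gets the angle pos / 10000^(2*(i//2)/d_model), with sine on even channels and cosine on odd ones. This numpy version interleaves sin/cos; the file's own implementation may arrange the two halves differently, but the angles match.

      import numpy as np

      def positional_encoding(max_pos, d_model):
          pos = np.arange(max_pos)[:, None]
          i = np.arange(d_model)[None, :]
          angles = pos / np.power(10000, 2 * (i // 2) / d_model)
          return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))  # (max_pos, d_model)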

FILE: code/bert-base-count3/pretrain/transformers1/modeling_distilbert.py
  function create_sinusoidal_embeddings (line 54) | def create_sinusoidal_embeddings(n_pos, dim, out):
  class Embeddings (line 62) | class Embeddings(nn.Module):
    method __init__ (line 63) | def __init__(self, config):
    method forward (line 75) | def forward(self, input_ids):
  class MultiHeadSelfAttention (line 100) | class MultiHeadSelfAttention(nn.Module):
    method __init__ (line 101) | def __init__(self, config):
    method prune_heads (line 118) | def prune_heads(self, heads):
    method forward (line 139) | def forward(self, query, key, value, mask, head_mask=None):
  class FFN (line 198) | class FFN(nn.Module):
    method __init__ (line 199) | def __init__(self, config):
    method forward (line 209) | def forward(self, input):
  class TransformerBlock (line 217) | class TransformerBlock(nn.Module):
    method __init__ (line 218) | def __init__(self, config):
    method forward (line 231) | def forward(self, x, attn_mask=None, head_mask=None):
  class Transformer (line 264) | class Transformer(nn.Module):
    method __init__ (line 265) | def __init__(self, config):
    method forward (line 274) | def forward(self, x, attn_mask=None, head_mask=None):
  class DistilBertPreTrainedModel (line 325) | class DistilBertPreTrainedModel(PreTrainedModel):
    method _init_weights (line 334) | def _init_weights(self, module):
  class DistilBertModel (line 392) | class DistilBertModel(DistilBertPreTrainedModel):
    method __init__ (line 393) | def __init__(self, config):
    method get_input_embeddings (line 401) | def get_input_embeddings(self):
    method set_input_embeddings (line 404) | def set_input_embeddings(self, new_embeddings):
    method _prune_heads (line 407) | def _prune_heads(self, heads_to_prune):
    method forward (line 416) | def forward(self, input_ids=None, attention_mask=None, head_mask=None,...
  class DistilBertForMaskedLM (line 477) | class DistilBertForMaskedLM(DistilBertPreTrainedModel):
    method __init__ (line 478) | def __init__(self, config):
    method get_output_embeddings (line 492) | def get_output_embeddings(self):
    method forward (line 496) | def forward(self, input_ids=None, attention_mask=None, head_mask=None,...
  class DistilBertForSequenceClassification (line 558) | class DistilBertForSequenceClassification(DistilBertPreTrainedModel):
    method __init__ (line 559) | def __init__(self, config):
    method forward (line 571) | def forward(self, input_ids=None, attention_mask=None, head_mask=None,...
  class DistilBertForQuestionAnswering (line 638) | class DistilBertForQuestionAnswering(DistilBertPreTrainedModel):
    method __init__ (line 639) | def __init__(self, config):
    method forward (line 650) | def forward(
  class DistilBertForTokenClassification (line 740) | class DistilBertForTokenClassification(DistilBertPreTrainedModel):
    method __init__ (line 741) | def __init__(self, config):
    method forward (line 752) | def forward(self, input_ids=None, attention_mask=None, head_mask=None,...

FILE: code/bert-base-count3/pretrain/transformers1/modeling_electra.py
  function load_tf_weights_in_electra (line 28) | def load_tf_weights_in_electra(model, config, tf_checkpoint_path, discri...
  class ElectraEmbeddings (line 109) | class ElectraEmbeddings(BertEmbeddings):
    method __init__ (line 112) | def __init__(self, config):
  class ElectraDiscriminatorPredictions (line 123) | class ElectraDiscriminatorPredictions(nn.Module):
    method __init__ (line 126) | def __init__(self, config):
    method forward (line 133) | def forward(self, discriminator_hidden_states, attention_mask):
  class ElectraGeneratorPredictions (line 141) | class ElectraGeneratorPredictions(nn.Module):
    method __init__ (line 144) | def __init__(self, config):
    method forward (line 150) | def forward(self, generator_hidden_states):
  class ElectraPreTrainedModel (line 158) | class ElectraPreTrainedModel(BertPreTrainedModel):
  class ElectraModel (line 233) | class ElectraModel(ElectraPreTrainedModel):
    method __init__ (line 237) | def __init__(self, config):
    method get_input_embeddings (line 248) | def get_input_embeddings(self):
    method set_input_embeddings (line 251) | def set_input_embeddings(self, value):
    method _prune_heads (line 254) | def _prune_heads(self, heads_to_prune):
    method forward (line 263) | def forward(
  class ElectraClassificationHead (line 334) | class ElectraClassificationHead(nn.Module):
    method __init__ (line 337) | def __init__(self, config):
    method forward (line 343) | def forward(self, features, **kwargs):
  class ElectraForSequenceClassification (line 358) | class ElectraForSequenceClassification(ElectraPreTrainedModel):
    method __init__ (line 359) | def __init__(self, config):
    method forward (line 368) | def forward(
  class ElectraForPreTraining (line 448) | class ElectraForPreTraining(ElectraPreTrainedModel):
    method __init__ (line 449) | def __init__(self, config):
    method forward (line 457) | def forward(
  class ElectraForMaskedLM (line 542) | class ElectraForMaskedLM(ElectraPreTrainedModel):
    method __init__ (line 543) | def __init__(self, config):
    method get_output_embeddings (line 552) | def get_output_embeddings(self):
    method forward (line 556) | def forward(
  class ElectraForTokenClassification (line 634) | class ElectraForTokenClassification(ElectraPreTrainedModel):
    method __init__ (line 635) | def __init__(self, config):
    method forward (line 644) | def forward(
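
  Usage sketch (toy config): ElectraForPreTraining is the replaced-token-detection head, emitting one logit per input position.

      import torch
      from transformers1.configuration_electra import ElectraConfig
      from transformers1.modeling_electra import ElectraForPreTraining

      config = ElectraConfig(vocab_size=1000, embedding_size=32, hidden_size=64,
                             num_hidden_layers=2, num_attention_heads=4,
                             intermediate_size=128)   # illustrative sizes
      model = ElectraForPreTraining(config)
      input_ids = torch.randint(0, 1000, (2, 16))
      scores = model(input_ids=input_ids)[0]   # shape (2, 16): "was this token replaced?"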

FILE: code/bert-base-count3/pretrain/transformers1/modeling_encoder_decoder.py
  class EncoderDecoderModel (line 29) | class EncoderDecoderModel(PreTrainedModel):
    method __init__ (line 40) | def __init__(
    method tie_weights (line 74) | def tie_weights(self):
    method get_encoder (line 78) | def get_encoder(self):
    method get_decoder (line 81) | def get_decoder(self):
    method get_input_embeddings (line 84) | def get_input_embeddings(self):
    method get_output_embeddings (line 87) | def get_output_embeddings(self):
    method from_encoder_decoder_pretrained (line 91) | def from_encoder_decoder_pretrained(
    method forward (line 183) | def forward(
    method prepare_inputs_for_generation (line 303) | def prepare_inputs_for_generation(self, input_ids, past, attention_mas...
    method _reorder_cache (line 321) | def _reorder_cache(self, past, beam_idx):
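
  Usage sketch (assumes the vendored package keeps the upstream download logic, so network access is needed): from_encoder_decoder_pretrained warm-starts a seq2seq model from two independently pretrained checkpoints.

      from transformers1.modeling_encoder_decoder import EncoderDecoderModel

      model = EncoderDecoderModel.from_encoder_decoder_pretrained(
          "bert-base-chinese", "bert-base-chinese")  # public checkpoint names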

FILE: code/bert-base-count3/pretrain/transformers1/modeling_flaubert.py
  class FlaubertModel (line 110) | class FlaubertModel(XLMModel):
    method __init__ (line 114) | def __init__(self, config):  # , dico, is_encoder, with_output):
    method forward (line 120) | def forward(
  class FlaubertWithLMHeadModel (line 300) | class FlaubertWithLMHeadModel(XLMWithLMHeadModel):
    method __init__ (line 308) | def __init__(self, config):
  class FlaubertForSequenceClassification (line 319) | class FlaubertForSequenceClassification(XLMForSequenceClassification):
    method __init__ (line 327) | def __init__(self, config):
  class FlaubertForQuestionAnsweringSimple (line 338) | class FlaubertForQuestionAnsweringSimple(XLMForQuestionAnsweringSimple):
    method __init__ (line 346) | def __init__(self, config):
  class FlaubertForQuestionAnswering (line 357) | class FlaubertForQuestionAnswering(XLMForQuestionAnswering):
    method __init__ (line 365) | def __init__(self, config):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_gpt2.py
  function load_tf_weights_in_gpt2 (line 44) | def load_tf_weights_in_gpt2(model, config, gpt2_checkpoint_path):
  class Attention (line 99) | class Attention(nn.Module):
    method __init__ (line 100) | def __init__(self, nx, n_ctx, config, scale=False):
    method prune_heads (line 121) | def prune_heads(self, heads):
    method _attn (line 143) | def _attn(self, q, k, v, attention_mask=None, head_mask=None):
    method merge_heads (line 167) | def merge_heads(self, x):
    method split_heads (line 172) | def split_heads(self, x, k=False):
    method forward (line 180) | def forward(self, x, layer_past=None, attention_mask=None, head_mask=N...
  class MLP (line 207) | class MLP(nn.Module):
    method __init__ (line 208) | def __init__(self, n_state, config):  # in MLP: n_state=3072 (4 * n_embd)
    method forward (line 216) | def forward(self, x):
  class Block (line 222) | class Block(nn.Module):
    method __init__ (line 223) | def __init__(self, n_ctx, config, scale=False):
    method forward (line 231) | def forward(self, x, layer_past=None, attention_mask=None, head_mask=N...
  class GPT2PreTrainedModel (line 249) | class GPT2PreTrainedModel(PreTrainedModel):
    method __init__ (line 258) | def __init__(self, *inputs, **kwargs):
    method _init_weights (line 261) | def _init_weights(self, module):
  class GPT2Model (line 339) | class GPT2Model(GPT2PreTrainedModel):
    method __init__ (line 340) | def __init__(self, config):
    method get_input_embeddings (line 353) | def get_input_embeddings(self):
    method set_input_embeddings (line 356) | def set_input_embeddings(self, new_embeddings):
    method _prune_heads (line 359) | def _prune_heads(self, heads_to_prune):
    method forward (line 367) | def forward(
  class GPT2LMHeadModel (line 523) | class GPT2LMHeadModel(GPT2PreTrainedModel):
    method __init__ (line 524) | def __init__(self, config):
    method get_output_embeddings (line 531) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 534) | def prepare_inputs_for_generation(self, input_ids, past, **kwargs):
    method forward (line 542) | def forward(
  class GPT2DoubleHeadsModel (line 631) | class GPT2DoubleHeadsModel(GPT2PreTrainedModel):
    method __init__ (line 632) | def __init__(self, config):
    method get_output_embeddings (line 641) | def get_output_embeddings(self):
    method forward (line 645) | def forward(
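
  Usage sketch (toy config): passing labels to GPT2LMHeadModel yields the LM loss first in the output tuple; the label shift happens inside the model.

      import torch
      from transformers1.configuration_gpt2 import GPT2Config
      from transformers1.modeling_gpt2 import GPT2LMHeadModel

      config = GPT2Config(vocab_size=1000, n_positions=64, n_ctx=64,
                          n_embd=64, n_layer=2, n_head=4)   # illustrative sizes
      model = GPT2LMHeadModel(config)
      ids = torch.randint(0, 1000, (2, 16))
      loss, logits = model(input_ids=ids, labels=ids)[:2]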

FILE: code/bert-base-count3/pretrain/transformers1/modeling_longformer.py
  function _get_question_end_index (line 43) | def _get_question_end_index(input_ids, sep_token_id):
  function _compute_global_attention_mask (line 59) | def _compute_global_attention_mask(input_ids, sep_token_id, before_sep_t...
  class LongformerSelfAttention (line 81) | class LongformerSelfAttention(nn.Module):
    method __init__ (line 82) | def __init__(self, config, layer_id):
    method _skew (line 117) | def _skew(x, direction):
    method _skew2 (line 124) | def _skew2(x):
    method _chunk (line 136) | def _chunk(x, w):
    method _mask_invalid_locations (line 150) | def _mask_invalid_locations(self, input_tensor, w) -> torch.Tensor:
    method _sliding_chunks_matmul_qk (line 163) | def _sliding_chunks_matmul_qk(self, q: torch.Tensor, k: torch.Tensor, ...
    method _sliding_chunks_matmul_pv (line 210) | def _sliding_chunks_matmul_pv(self, prob: torch.Tensor, v: torch.Tenso...
    method forward (line 238) | def forward(
  class LongformerModel (line 498) | class LongformerModel(RobertaModel):
    method __init__ (line 519) | def __init__(self, config):
    method _pad_to_window_size (line 538) | def _pad_to_window_size(
    method forward (line 582) | def forward(
  class LongformerForMaskedLM (line 686) | class LongformerForMaskedLM(BertPreTrainedModel):
    method __init__ (line 690) | def __init__(self, config):
    method forward (line 699) | def forward(
  class LongformerForSequenceClassification (line 776) | class LongformerForSequenceClassification(BertPreTrainedModel):
    method __init__ (line 780) | def __init__(self, config):
    method forward (line 788) | def forward(
  class LongformerClassificationHead (line 868) | class LongformerClassificationHead(nn.Module):
    method __init__ (line 871) | def __init__(self, config):
    method forward (line 877) | def forward(self, hidden_states, **kwargs):
  class LongformerForQuestionAnswering (line 892) | class LongformerForQuestionAnswering(BertPreTrainedModel):
    method __init__ (line 896) | def __init__(self, config):
    method forward (line 906) | def forward(
  class LongformerForTokenClassification (line 1016) | class LongformerForTokenClassification(BertPreTrainedModel):
    method __init__ (line 1020) | def __init__(self, config):
    method forward (line 1031) | def forward(
  class LongformerForMultipleChoice (line 1116) | class LongformerForMultipleChoice(BertPreTrainedModel):
    method __init__ (line 1120) | def __init__(self, config):
    method forward (line 1130) | def forward(
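
  Sketch of the attention-mask convention (an assumption for this Longformer vintage: 0 = masked, 1 = local windowed attention, 2 = global attention; _pad_to_window_size then pads inputs to a multiple of the attention window).

      import torch

      seq_len = 1024
      attention_mask = torch.ones(1, seq_len, dtype=torch.long)  # local attention everywhere
      attention_mask[:, 0] = 2   # e.g. give the first (<s>/[CLS]) token global attention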

FILE: code/bert-base-count3/pretrain/transformers1/modeling_marian.py
  class MarianMTModel (line 26) | class MarianMTModel(BartForConditionalGeneration):
    method prepare_logits_for_generation (line 49) | def prepare_logits_for_generation(self, logits, cur_len, max_length):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_mmbt.py
  class ModalEmbeddings (line 32) | class ModalEmbeddings(nn.Module):
    method __init__ (line 36) | def __init__(self, config, encoder, embeddings):
    method forward (line 47) | def forward(self, input_modal, start_token=None, end_token=None, posit...
  class MMBTModel (line 152) | class MMBTModel(nn.Module, ModuleUtilsMixin):
    method __init__ (line 180) | def __init__(self, config, transformer, encoder):
    method forward (line 186) | def forward(
    method get_input_embeddings (line 268) | def get_input_embeddings(self):
    method set_input_embeddings (line 271) | def set_input_embeddings(self, value):
  class MMBTForClassification (line 281) | class MMBTForClassification(nn.Module):
    method __init__ (line 312) | def __init__(self, config, transformer, encoder):
    method forward (line 320) | def forward(

FILE: code/bert-base-count3/pretrain/transformers1/modeling_openai.py
  function load_tf_weights_in_openai_gpt (line 42) | def load_tf_weights_in_openai_gpt(model, config, openai_checkpoint_folde...
  class Attention (line 122) | class Attention(nn.Module):
    method __init__ (line 123) | def __init__(self, nx, n_ctx, config, scale=False):
    method prune_heads (line 141) | def prune_heads(self, heads):
    method _attn (line 160) | def _attn(self, q, k, v, attention_mask=None, head_mask=None):
    method merge_heads (line 185) | def merge_heads(self, x):
    method split_heads (line 190) | def split_heads(self, x, k=False):
    method forward (line 198) | def forward(self, x, attention_mask=None, head_mask=None):
  class MLP (line 216) | class MLP(nn.Module):
    method __init__ (line 217) | def __init__(self, n_state, config):  # in MLP: n_state=3072 (4 * n_embd)
    method forward (line 225) | def forward(self, x):
  class Block (line 231) | class Block(nn.Module):
    method __init__ (line 232) | def __init__(self, n_ctx, config, scale=False):
    method forward (line 240) | def forward(self, x, attention_mask=None, head_mask=None):
  class OpenAIGPTPreTrainedModel (line 252) | class OpenAIGPTPreTrainedModel(PreTrainedModel):
    method _init_weights (line 261) | def _init_weights(self, module):
  class OpenAIGPTModel (line 329) | class OpenAIGPTModel(OpenAIGPTPreTrainedModel):
    method __init__ (line 330) | def __init__(self, config):
    method get_input_embeddings (line 342) | def get_input_embeddings(self):
    method set_input_embeddings (line 345) | def set_input_embeddings(self, new_embeddings):
    method _prune_heads (line 348) | def _prune_heads(self, heads_to_prune):
    method forward (line 356) | def forward(
  class OpenAIGPTLMHeadModel (line 471) | class OpenAIGPTLMHeadModel(OpenAIGPTPreTrainedModel):
    method __init__ (line 472) | def __init__(self, config):
    method get_output_embeddings (line 479) | def get_output_embeddings(self):
    method forward (line 483) | def forward(
  class OpenAIGPTDoubleHeadsModel (line 567) | class OpenAIGPTDoubleHeadsModel(OpenAIGPTPreTrainedModel):
    method __init__ (line 568) | def __init__(self, config):
    method get_output_embeddings (line 578) | def get_output_embeddings(self):
    method forward (line 582) | def forward(

FILE: code/bert-base-count3/pretrain/transformers1/modeling_reformer.py
  function mish (line 45) | def mish(x):
  function _get_least_common_mult_chunk_len (line 70) | def _get_least_common_mult_chunk_len(config):
  class AxialPositionEmbeddings (line 87) | class AxialPositionEmbeddings(nn.Module):
    method __init__ (line 92) | def __init__(self, config):
    method forward (line 117) | def forward(self, position_ids):
  class PositionEmbeddings (line 166) | class PositionEmbeddings(nn.Module):
    method __init__ (line 170) | def __init__(self, config):
    method forward (line 175) | def forward(self, position_ids):
  class ReformerEmbeddings (line 181) | class ReformerEmbeddings(nn.Module):
    method __init__ (line 185) | def __init__(self, config):
    method forward (line 195) | def forward(self, input_ids=None, position_ids=None, inputs_embeds=None):
  class EfficientAttentionMixin (line 226) | class EfficientAttentionMixin:
    method _look_adjacent (line 231) | def _look_adjacent(self, vectors, num_chunks_before, num_chunks_after):
    method _split_hidden_size_dim (line 254) | def _split_hidden_size_dim(self, x, num_attn_heads, attn_head_size):
    method _merge_hidden_size_dims (line 262) | def _merge_hidden_size_dims(self, x, num_attn_heads, attn_head_size):
    method _split_seq_length_dim_to (line 269) | def _split_seq_length_dim_to(self, vectors, dim_factor_1, dim_factor_2...
  class LSHSelfAttention (line 284) | class LSHSelfAttention(nn.Module, EfficientAttentionMixin):
    method __init__ (line 285) | def __init__(self, config):
    method forward (line 315) | def forward(
    method _hash_vectors (line 441) | def _hash_vectors(self, vectors, num_hashes):
    method _get_sorted_bucket_idx_and_undo_sorted_bucket_idx (line 506) | def _get_sorted_bucket_idx_and_undo_sorted_bucket_idx(self, sequence_l...
    method _set_num_buckets (line 537) | def _set_num_buckets(self, sequence_length):
    method _attend (line 556) | def _attend(
    method _compute_attn_mask (line 635) | def _compute_attn_mask(self, query_indices, key_indices, attention_mask):
    method _len_and_dim_norm (line 663) | def _len_and_dim_norm(self, vectors):
    method _len_norm (line 673) | def _len_norm(self, x, epsilon=1e-6):
    method _gather_by_expansion (line 681) | def _gather_by_expansion(self, vectors, idxs, num_hashes):
  class ReverseSort (line 690) | class ReverseSort(Function):
    method forward (line 700) | def forward(ctx, out_vectors, logits, sorted_bucket_idx, undo_sorted_b...
    method backward (line 713) | def backward(ctx, grad_out_vectors, grad_logits):
  class LocalSelfAttention (line 747) | class LocalSelfAttention(nn.Module, EfficientAttentionMixin):
    method __init__ (line 748) | def __init__(self, config):
    method forward (line 773) | def forward(self, hidden_states, attention_mask=None, head_mask=None, ...
    method _compute_attn_mask (line 888) | def _compute_attn_mask(self, query_indices, key_indices, attention_mas...
  class ReformerSelfOutput (line 913) | class ReformerSelfOutput(nn.Module):
    method __init__ (line 914) | def __init__(self, config):
    method forward (line 921) | def forward(self, hidden_states):
  class ReformerAttention (line 927) | class ReformerAttention(nn.Module):
    method __init__ (line 928) | def __init__(self, config, layer_id=0):
    method forward (line 953) | def forward(
  class ReformerFeedForwardDense (line 986) | class ReformerFeedForwardDense(nn.Module):
    method __init__ (line 987) | def __init__(self, config):
    method forward (line 998) | def forward(self, hidden_states):
  class ReformerFeedForwardOutput (line 1005) | class ReformerFeedForwardOutput(nn.Module):
    method __init__ (line 1006) | def __init__(self, config):
    method forward (line 1012) | def forward(self, hidden_states):
  class ChunkReformerFeedForward (line 1018) | class ChunkReformerFeedForward(nn.Module):
    method __init__ (line 1019) | def __init__(self, config):
    method forward (line 1028) | def forward(self, attention_output):
    method forward_chunk (line 1033) | def forward_chunk(self, hidden_states):
  class ReformerLayer (line 1039) | class ReformerLayer(nn.Module):
    method __init__ (line 1040) | def __init__(self, config, layer_id=0):
    method _init_attention_seed (line 1050) | def _init_attention_seed(self):
    method _init_feed_forward_seed (line 1070) | def _init_feed_forward_seed(self):
    method forward (line 1090) | def forward(
    method backward_pass (line 1134) | def backward_pass(
  class _ReversibleFunction (line 1195) | class _ReversibleFunction(Function):
    method forward (line 1205) | def forward(
    method backward (line 1256) | def backward(ctx, grad_hidden_states):
  class ReformerEncoder (line 1302) | class ReformerEncoder(nn.Module):
    method __init__ (line 1303) | def __init__(self, config):
    method forward (line 1312) | def forward(
  class ReformerOnlyLMHead (line 1350) | class ReformerOnlyLMHead(nn.Module):
    method __init__ (line 1351) | def __init__(self, config):
    method forward (line 1363) | def forward(self, hidden_states):
    method forward_chunk (line 1366) | def forward_chunk(self, hidden_states):
  class ReformerPreTrainedModel (line 1371) | class ReformerPreTrainedModel(PreTrainedModel):
    method dummy_inputs (line 1380) | def dummy_inputs(self):
    method _init_weights (line 1389) | def _init_weights(self, module):
  class ReformerModel (line 1470) | class ReformerModel(ReformerPreTrainedModel):
    method __init__ (line 1471) | def __init__(self, config):
    method get_input_embeddings (line 1483) | def get_input_embeddings(self):
    method set_input_embeddings (line 1486) | def set_input_embeddings(self, value):
    method _prune_heads (line 1489) | def _prune_heads(self, heads_to_prune):
    method forward (line 1498) | def forward(
    method _pad_to_mult_of_chunk_length (line 1615) | def _pad_to_mult_of_chunk_length(
  class ReformerModelWithLMHead (line 1674) | class ReformerModelWithLMHead(ReformerPreTrainedModel):
    method __init__ (line 1675) | def __init__(self, config):
    method get_output_embeddings (line 1682) | def get_output_embeddings(self):
    method tie_weights (line 1685) | def tie_weights(self):
    method forward (line 1690) | def forward(
    method prepare_inputs_for_generation (line 1766) | def prepare_inputs_for_generation(self, input_ids, past, **kwargs):
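
  Sketch of the chunk-length constraint behind _get_least_common_mult_chunk_len and _pad_to_mult_of_chunk_length: the sequence length must be a multiple of the least common multiple of the LSH and local attention chunk lengths, so the model pads the difference (numbers are illustrative).

      import math

      lsh_chunk, local_chunk = 64, 96   # illustrative config values
      lcm = lsh_chunk * local_chunk // math.gcd(lsh_chunk, local_chunk)  # 192
      seq_len = 1000
      padding = (lcm - seq_len % lcm) % lcm   # 152 positions of padding needed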

FILE: code/bert-base-count3/pretrain/transformers1/modeling_roberta.py
  class RobertaEmbeddings (line 44) | class RobertaEmbeddings(BertEmbeddings):
    method __init__ (line 49) | def __init__(self, config):
    method forward (line 57) | def forward(self, input_ids=None, token_type_ids=None, position_ids=No...
    method create_position_ids_from_inputs_embeds (line 69) | def create_position_ids_from_inputs_embeds(self, inputs_embeds):
  class RobertaModel (line 139) | class RobertaModel(BertModel):
    method __init__ (line 148) | def __init__(self, config):
    method get_input_embeddings (line 154) | def get_input_embeddings(self):
    method set_input_embeddings (line 157) | def set_input_embeddings(self, value):
  class RobertaForMaskedLM (line 162) | class RobertaForMaskedLM(BertPreTrainedModel):
    method __init__ (line 166) | def __init__(self, config):
    method get_output_embeddings (line 174) | def get_output_embeddings(self):
    method forward (line 178) | def forward(
  class RobertaLMHead (line 246) | class RobertaLMHead(nn.Module):
    method __init__ (line 249) | def __init__(self, config):
    method forward (line 260) | def forward(self, features, **kwargs):
  class RobertaForSequenceClassification (line 276) | class RobertaForSequenceClassification(BertPreTrainedModel):
    method __init__ (line 280) | def __init__(self, config):
    method forward (line 288) | def forward(
  class RobertaForMultipleChoice (line 366) | class RobertaForMultipleChoice(BertPreTrainedModel):
    method __init__ (line 370) | def __init__(self, config):
    method forward (line 380) | def forward(
  class RobertaForTokenClassification (line 464) | class RobertaForTokenClassification(BertPreTrainedModel):
    method __init__ (line 468) | def __init__(self, config):
    method forward (line 479) | def forward(
  class RobertaClassificationHead (line 559) | class RobertaClassificationHead(nn.Module):
    method __init__ (line 562) | def __init__(self, config):
    method forward (line 568) | def forward(self, features, **kwargs):
  class RobertaForQuestionAnswering (line 583) | class RobertaForQuestionAnswering(BertPreTrainedModel):
    method __init__ (line 587) | def __init__(self, config):
    method forward (line 597) | def forward(
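
  Behavior sketch of RoBERTa-style position ids (mirroring the input-ids counterpart of create_position_ids_from_inputs_embeds): ids start at padding_idx + 1 and padding positions keep padding_idx.

      import torch

      def position_ids_from_input_ids(input_ids, padding_idx=1):  # hypothetical helper name
          mask = input_ids.ne(padding_idx).long()
          return torch.cumsum(mask, dim=1) * mask + padding_idx

      print(position_ids_from_input_ids(torch.tensor([[0, 5, 6, 2, 1, 1]])))
      # tensor([[2, 3, 4, 5, 1, 1]])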

FILE: code/bert-base-count3/pretrain/transformers1/modeling_t5.py
  function load_tf_weights_in_t5 (line 53) | def load_tf_weights_in_t5(model, config, tf_checkpoint_path):
  class T5LayerNorm (line 143) | class T5LayerNorm(nn.Module):
    method __init__ (line 144) | def __init__(self, hidden_size, eps=1e-6):
    method forward (line 152) | def forward(self, x):
  class T5DenseReluDense (line 162) | class T5DenseReluDense(nn.Module):
    method __init__ (line 163) | def __init__(self, config):
    method forward (line 169) | def forward(self, hidden_states):
  class T5LayerFF (line 177) | class T5LayerFF(nn.Module):
    method __init__ (line 178) | def __init__(self, config):
    method forward (line 184) | def forward(self, hidden_states):
  class T5Attention (line 191) | class T5Attention(nn.Module):
    method __init__ (line 192) | def __init__(self, config: T5Config, has_relative_attention_bias=False):
    method prune_heads (line 215) | def prune_heads(self, heads):
    method _relative_position_bucket (line 236) | def _relative_position_bucket(relative_position, bidirectional=True, n...
    method compute_bias (line 283) | def compute_bias(self, qlen, klen):
    method forward (line 298) | def forward(
  class T5LayerSelfAttention (line 401) | class T5LayerSelfAttention(nn.Module):
    method __init__ (line 402) | def __init__(self, config, has_relative_attention_bias=False):
    method forward (line 408) | def forward(
  class T5LayerCrossAttention (line 432) | class T5LayerCrossAttention(nn.Module):
    method __init__ (line 433) | def __init__(self, config, has_relative_attention_bias=False):
    method forward (line 439) | def forward(
  class T5Block (line 467) | class T5Block(nn.Module):
    method __init__ (line 468) | def __init__(self, config, has_relative_attention_bias=False):
    method forward (line 478) | def forward(
  class T5PreTrainedModel (line 553) | class T5PreTrainedModel(PreTrainedModel):
    method dummy_inputs (line 563) | def dummy_inputs(self):
    method _init_weights (line 573) | def _init_weights(self, module):
    method _shift_right (line 605) | def _shift_right(self, input_ids):
  class T5Stack (line 627) | class T5Stack(T5PreTrainedModel):
    method __init__ (line 628) | def __init__(self, config, embed_tokens=None):
    method get_input_embeddings (line 644) | def get_input_embeddings(self):
    method get_output_embeddings (line 647) | def get_output_embeddings(self):
    method set_input_embeddings (line 650) | def set_input_embeddings(self, new_embeddings):
    method forward (line 653) | def forward(
  class T5Model (line 846) | class T5Model(T5PreTrainedModel):
    method __init__ (line 847) | def __init__(self, config):
    method get_input_embeddings (line 860) | def get_input_embeddings(self):
    method set_input_embeddings (line 863) | def set_input_embeddings(self, new_embeddings):
    method get_encoder (line 868) | def get_encoder(self):
    method get_decoder (line 871) | def get_decoder(self):
    method _prune_heads (line 874) | def _prune_heads(self, heads_to_prune):
    method forward (line 883) | def forward(
  class T5ForConditionalGeneration (line 966) | class T5ForConditionalGeneration(T5PreTrainedModel):
    method __init__ (line 967) | def __init__(self, config):
    method get_input_embeddings (line 984) | def get_input_embeddings(self):
    method set_input_embeddings (line 987) | def set_input_embeddings(self, new_embeddings):
    method get_output_embeddings (line 992) | def get_output_embeddings(self):
    method get_encoder (line 995) | def get_encoder(self):
    method get_decoder (line 998) | def get_decoder(self):
    method forward (line 1002) | def forward(
    method prepare_inputs_for_generation (line 1114) | def prepare_inputs_for_generation(self, input_ids, past, attention_mas...
    method _reorder_cache (line 1131) | def _reorder_cache(self, past, beam_idx):
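
  Behavior sketch of T5LayerNorm: it is an RMS norm, scaling by the root mean square with no mean subtraction and no bias term.

      import torch

      def t5_layer_norm(x, weight, eps=1e-6):
          variance = x.pow(2).mean(-1, keepdim=True)
          return weight * x / torch.sqrt(variance + eps)

      out = t5_layer_norm(torch.randn(2, 8), torch.ones(8))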

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_albert.py
  class TFAlbertEmbeddings (line 45) | class TFAlbertEmbeddings(tf.keras.layers.Layer):
    method __init__ (line 49) | def __init__(self, config, **kwargs):
    method build (line 71) | def build(self, input_shape):
    method call (line 83) | def call(self, inputs, mode="embedding", training=False):
    method _embedding (line 105) | def _embedding(self, inputs, training=False):
    method _linear (line 130) | def _linear(self, inputs):
  class TFAlbertSelfAttention (line 144) | class TFAlbertSelfAttention(tf.keras.layers.Layer):
    method __init__ (line 145) | def __init__(self, config, **kwargs):
    method transpose_for_scores (line 171) | def transpose_for_scores(self, x, batch_size):
    method call (line 175) | def call(self, inputs, training=False):
  class TFAlbertSelfOutput (line 220) | class TFAlbertSelfOutput(tf.keras.layers.Layer):
    method __init__ (line 221) | def __init__(self, config, **kwargs):
    method call (line 229) | def call(self, inputs, training=False):
  class TFAlbertAttention (line 238) | class TFAlbertAttention(TFBertSelfAttention):
    method __init__ (line 239) | def __init__(self, config, **kwargs):
    method prune_heads (line 249) | def prune_heads(self, heads):
    method call (line 252) | def call(self, inputs, training=False):
  class TFAlbertLayer (line 306) | class TFAlbertLayer(tf.keras.layers.Layer):
    method __init__ (line 307) | def __init__(self, config, **kwargs):
    method call (line 328) | def call(self, inputs, training=False):
  class TFAlbertLayerGroup (line 344) | class TFAlbertLayerGroup(tf.keras.layers.Layer):
    method __init__ (line 345) | def __init__(self, config, **kwargs):
    method call (line 354) | def call(self, inputs, training=False):
  class TFAlbertTransformer (line 379) | class TFAlbertTransformer(tf.keras.layers.Layer):
    method __init__ (line 380) | def __init__(self, config, **kwargs):
    method call (line 396) | def call(self, inputs, training=False):
  class TFAlbertPreTrainedModel (line 438) | class TFAlbertPreTrainedModel(TFPreTrainedModel):
  class TFAlbertMLMHead (line 447) | class TFAlbertMLMHead(tf.keras.layers.Layer):
    method __init__ (line 448) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 466) | def build(self, input_shape):
    method call (line 473) | def call(self, hidden_states):
  class TFAlbertMainLayer (line 482) | class TFAlbertMainLayer(tf.keras.layers.Layer):
    method __init__ (line 485) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 498) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 501) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 504) | def _prune_heads(self, heads_to_prune):
    method call (line 511) | def call(
  class TFAlbertModel (line 674) | class TFAlbertModel(TFAlbertPreTrainedModel):
    method __init__ (line 675) | def __init__(self, config, *inputs, **kwargs):
    method call (line 680) | def call(self, inputs, **kwargs):
  class TFAlbertForPreTraining (line 725) | class TFAlbertForPreTraining(TFAlbertPreTrainedModel):
    method __init__ (line 726) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 734) | def get_output_embeddings(self):
    method call (line 738) | def call(self, inputs, **kwargs):
  class TFAlbertSOPHead (line 772) | class TFAlbertSOPHead(tf.keras.layers.Layer):
    method __init__ (line 773) | def __init__(self, config, **kwargs):
    method call (line 781) | def call(self, pooled_output, training: bool):
  class TFAlbertForMaskedLM (line 788) | class TFAlbertForMaskedLM(TFAlbertPreTrainedModel):
    method __init__ (line 789) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 795) | def get_output_embeddings(self):
    method call (line 799) | def call(self, inputs, **kwargs):
  class TFAlbertForSequenceClassification (line 844) | class TFAlbertForSequenceClassification(TFAlbertPreTrainedModel):
    method __init__ (line 845) | def __init__(self, config, *inputs, **kwargs):
    method call (line 856) | def call(self, inputs, **kwargs):
  class TFAlbertForQuestionAnswering (line 901) | class TFAlbertForQuestionAnswering(TFAlbertPreTrainedModel):
    method __init__ (line 902) | def __init__(self, config, *inputs, **kwargs):
    method call (line 912) | def call(self, inputs, **kwargs):
  class TFAlbertForMultipleChoice (line 967) | class TFAlbertForMultipleChoice(TFAlbertPreTrainedModel):
    method __init__ (line 968) | def __init__(self, config, *inputs, **kwargs):
    method dummy_inputs (line 978) | def dummy_inputs(self):
    method call (line 987) | def call(

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_auto.py
  class TFAutoModel (line 174) | class TFAutoModel(object):
    method __init__ (line 198) | def __init__(self):
    method from_config (line 206) | def from_config(cls, config):
    method from_pretrained (line 244) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class TFAutoModelForPreTraining (line 336) | class TFAutoModelForPreTraining(object):
    method __init__ (line 345) | def __init__(self):
    method from_config (line 353) | def from_config(cls, config):
    method from_pretrained (line 392) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class TFAutoModelWithLMHead (line 486) | class TFAutoModelWithLMHead(object):
    method __init__ (line 510) | def __init__(self):
    method from_config (line 518) | def from_config(cls, config):
    method from_pretrained (line 556) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class TFAutoModelForMultipleChoice (line 649) | class TFAutoModelForMultipleChoice:
    method __init__ (line 665) | def __init__(self):
    method from_config (line 673) | def from_config(cls, config):
    method from_pretrained (line 706) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class TFAutoModelForSequenceClassification (line 796) | class TFAutoModelForSequenceClassification(object):
    method __init__ (line 815) | def __init__(self):
    method from_config (line 823) | def from_config(cls, config):
    method from_pretrained (line 859) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class TFAutoModelForQuestionAnswering (line 952) | class TFAutoModelForQuestionAnswering(object):
    method __init__ (line 972) | def __init__(self):
    method from_config (line 980) | def from_config(cls, config):
    method from_pretrained (line 1017) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
  class TFAutoModelForTokenClassification (line 1111) | class TFAutoModelForTokenClassification:
    method __init__ (line 1112) | def __init__(self):
    method from_config (line 1120) | def from_config(cls, config):
    method from_pretrained (line 1155) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_bert.py
  function gelu (line 58) | def gelu(x):
  function gelu_new (line 69) | def gelu_new(x):
  function swish (line 82) | def swish(x):
  class TFBertEmbeddings (line 94) | class TFBertEmbeddings(tf.keras.layers.Layer):
    method __init__ (line 98) | def __init__(self, config, **kwargs):
    method build (line 122) | def build(self, input_shape):
    method call (line 134) | def call(self, inputs, mode="embedding", training=False):
    method _embedding (line 156) | def _embedding(self, inputs, training=False):
    method _linear (line 181) | def _linear(self, inputs):
  class TFBertSelfAttention (line 197) | class TFBertSelfAttention(tf.keras.layers.Layer):
    method __init__ (line 198) | def __init__(self, config, **kwargs):
    method transpose_for_scores (line 224) | def transpose_for_scores(self, x, batch_size):
    method call (line 228) | def call(self, inputs, training=False):
  class TFBertSelfOutput (line 273) | class TFBertSelfOutput(tf.keras.layers.Layer):
    method __init__ (line 274) | def __init__(self, config, **kwargs):
    method call (line 282) | def call(self, inputs, training=False):
  class TFBertAttention (line 291) | class TFBertAttention(tf.keras.layers.Layer):
    method __init__ (line 292) | def __init__(self, config, **kwargs):
    method prune_heads (line 297) | def prune_heads(self, heads):
    method call (line 300) | def call(self, inputs, training=False):
  class TFBertIntermediate (line 309) | class TFBertIntermediate(tf.keras.layers.Layer):
    method __init__ (line 310) | def __init__(self, config, **kwargs):
    method call (line 320) | def call(self, hidden_states):
  class TFBertOutput (line 326) | class TFBertOutput(tf.keras.layers.Layer):
    method __init__ (line 327) | def __init__(self, config, **kwargs):
    method call (line 335) | def call(self, inputs, training=False):
  class TFBertLayer (line 344) | class TFBertLayer(tf.keras.layers.Layer):
    method __init__ (line 345) | def __init__(self, config, **kwargs):
    method call (line 351) | def call(self, inputs, training=False):
  class TFBertEncoder (line 362) | class TFBertEncoder(tf.keras.layers.Layer):
    method __init__ (line 363) | def __init__(self, config, **kwargs):
    method call (line 369) | def call(self, inputs, training=False):
  class TFBertPooler (line 396) | class TFBertPooler(tf.keras.layers.Layer):
    method __init__ (line 397) | def __init__(self, config, **kwargs):
    method call (line 406) | def call(self, hidden_states):
  class TFBertPredictionHeadTransform (line 414) | class TFBertPredictionHeadTransform(tf.keras.layers.Layer):
    method __init__ (line 415) | def __init__(self, config, **kwargs):
    method call (line 426) | def call(self, hidden_states):
  class TFBertLMPredictionHead (line 433) | class TFBertLMPredictionHead(tf.keras.layers.Layer):
    method __init__ (line 434) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 443) | def build(self, input_shape):
    method call (line 447) | def call(self, hidden_states):
  class TFBertMLMHead (line 454) | class TFBertMLMHead(tf.keras.layers.Layer):
    method __init__ (line 455) | def __init__(self, config, input_embeddings, **kwargs):
    method call (line 459) | def call(self, sequence_output):
  class TFBertNSPHead (line 464) | class TFBertNSPHead(tf.keras.layers.Layer):
    method __init__ (line 465) | def __init__(self, config, **kwargs):
    method call (line 471) | def call(self, pooled_output):
  class TFBertMainLayer (line 477) | class TFBertMainLayer(tf.keras.layers.Layer):
    method __init__ (line 480) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 488) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 491) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 494) | def _prune_heads(self, heads_to_prune):
    method call (line 501) | def call(
  class TFBertPreTrainedModel (line 583) | class TFBertPreTrainedModel(TFPreTrainedModel):
  class TFBertModel (line 667) | class TFBertModel(TFBertPreTrainedModel):
    method __init__ (line 668) | def __init__(self, config, *inputs, **kwargs):
    method call (line 673) | def call(self, inputs, **kwargs):
  class TFBertForPreTraining (line 718) | class TFBertForPreTraining(TFBertPreTrainedModel):
    method __init__ (line 719) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 726) | def get_output_embeddings(self):
    method call (line 730) | def call(self, inputs, **kwargs):
  class TFBertForMaskedLM (line 775) | class TFBertForMaskedLM(TFBertPreTrainedModel):
    method __init__ (line 776) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 782) | def get_output_embeddings(self):
    method call (line 786) | def call(self, inputs, **kwargs):
  class TFBertForNextSentencePrediction (line 828) | class TFBertForNextSentencePrediction(TFBertPreTrainedModel):
    method __init__ (line 829) | def __init__(self, config, *inputs, **kwargs):
    method call (line 836) | def call(self, inputs, **kwargs):
  class TFBertForSequenceClassification (line 883) | class TFBertForSequenceClassification(TFBertPreTrainedModel):
    method __init__ (line 884) | def __init__(self, config, *inputs, **kwargs):
    method call (line 895) | def call(self, inputs, **kwargs):
  class TFBertForMultipleChoice (line 941) | class TFBertForMultipleChoice(TFBertPreTrainedModel):
    method __init__ (line 942) | def __init__(self, config, *inputs, **kwargs):
    method dummy_inputs (line 952) | def dummy_inputs(self):
    method call (line 961) | def call(
  class TFBertForTokenClassification (line 1064) | class TFBertForTokenClassification(TFBertPreTrainedModel):
    method __init__ (line 1065) | def __init__(self, config, *inputs, **kwargs):
    method call (line 1076) | def call(self, inputs, **kwargs):
  class TFBertForQuestionAnswering (line 1122) | class TFBertForQuestionAnswering(TFBertPreTrainedModel):
    method __init__ (line 1123) | def __init__(self, config, *inputs, **kwargs):
    method call (line 1133) | def call(self, inputs, **kwargs):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_camembert.py
  class TFCamembertModel (line 70) | class TFCamembertModel(TFRobertaModel):
  class TFCamembertForMaskedLM (line 82) | class TFCamembertForMaskedLM(TFRobertaForMaskedLM):
  class TFCamembertForSequenceClassification (line 96) | class TFCamembertForSequenceClassification(TFRobertaForSequenceClassific...
  class TFCamembertForTokenClassification (line 110) | class TFCamembertForTokenClassification(TFRobertaForTokenClassification):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_ctrl.py
  function angle_defn (line 38) | def angle_defn(pos, i, d_model_size):
  function positional_encoding (line 43) | def positional_encoding(position, d_model_size):
  function scaled_dot_product_attention (line 55) | def scaled_dot_product_attention(q, k, v, mask, attention_mask=None, hea...
  class TFMultiHeadAttention (line 80) | class TFMultiHeadAttention(tf.keras.layers.Layer):
    method __init__ (line 81) | def __init__(self, d_model_size, num_heads, output_attentions=False, *...
    method split_into_heads (line 95) | def split_into_heads(self, x, batch_size):
    method call (line 99) | def call(self, inputs, training=False):
  function point_wise_feed_forward_network (line 142) | def point_wise_feed_forward_network(d_model_size, dff, name=""):
  class TFEncoderLayer (line 149) | class TFEncoderLayer(tf.keras.layers.Layer):
    method __init__ (line 150) | def __init__(
    method call (line 166) | def call(self, inputs, training=False):
  class TFCTRLMainLayer (line 186) | class TFCTRLMainLayer(tf.keras.layers.Layer):
    method __init__ (line 189) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 218) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 221) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 224) | def _prune_heads(self, heads_to_prune):
    method call (line 230) | def call(
  class TFCTRLPreTrainedModel (line 379) | class TFCTRLPreTrainedModel(TFPreTrainedModel):
  class TFCTRLModel (line 471) | class TFCTRLModel(TFCTRLPreTrainedModel):
    method __init__ (line 472) | def __init__(self, config, *inputs, **kwargs):
    method call (line 477) | def call(self, inputs, **kwargs):
  class TFCTRLLMHead (line 515) | class TFCTRLLMHead(tf.keras.layers.Layer):
    method __init__ (line 516) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 524) | def build(self, input_shape):
    method call (line 528) | def call(self, hidden_states):
  class TFCTRLLMHeadModel (line 539) | class TFCTRLLMHeadModel(TFCTRLPreTrainedModel):
    method __init__ (line 540) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 546) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 549) | def prepare_inputs_for_generation(self, inputs, past, **kwargs):
    method call (line 557) | def call(self, inputs, **kwargs):
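
The angle_defn and positional_encoding helpers listed above implement the standard sinusoidal position encoding. A minimal numpy sketch of the technique for reference (the file's own version builds the table in TensorFlow and concatenates the sin/cos halves rather than interleaving them, so treat this as illustrative, not a drop-in):

    import numpy as np

    def sinusoidal_positional_encoding(position, d_model):
        # angle rate for channel i is 1 / 10000^(2*(i//2)/d_model)
        pos = np.arange(position)[:, np.newaxis]
        i = np.arange(d_model)[np.newaxis, :]
        angle_rads = pos / np.power(10000.0, (2 * (i // 2)) / np.float32(d_model))
        # even channels take the sine, odd channels the cosine
        angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])
        angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])
        return angle_rads  # shape (position, d_model)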

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_distilbert.py
  function gelu (line 46) | def gelu(x):
  function gelu_new (line 57) | def gelu_new(x):
  class TFEmbeddings (line 70) | class TFEmbeddings(tf.keras.layers.Layer):
    method __init__ (line 71) | def __init__(self, config, **kwargs):
    method build (line 89) | def build(self, input_shape):
    method call (line 99) | def call(self, inputs, inputs_embeds=None, mode="embedding", training=...
    method _embedding (line 121) | def _embedding(self, inputs, inputs_embeds=None, training=False):
    method _linear (line 156) | def _linear(self, inputs):
  class TFMultiHeadSelfAttention (line 172) | class TFMultiHeadSelfAttention(tf.keras.layers.Layer):
    method __init__ (line 173) | def __init__(self, config, **kwargs):
    method prune_heads (line 198) | def prune_heads(self, heads):
    method call (line 201) | def call(self, inputs, training=False):
  class TFFFN (line 262) | class TFFFN(tf.keras.layers.Layer):
    method __init__ (line 263) | def __init__(self, config, **kwargs):
    method call (line 279) | def call(self, input, training=False):
  class TFTransformerBlock (line 287) | class TFTransformerBlock(tf.keras.layers.Layer):
    method __init__ (line 288) | def __init__(self, config, **kwargs):
    method call (line 306) | def call(self, inputs, training=False):  # removed: src_enc=None, src_...
  class TFTransformer (line 341) | class TFTransformer(tf.keras.layers.Layer):
    method __init__ (line 342) | def __init__(self, config, **kwargs):
    method call (line 350) | def call(self, inputs, training=False):
  class TFDistilBertMainLayer (line 402) | class TFDistilBertMainLayer(tf.keras.layers.Layer):
    method __init__ (line 403) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 410) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 413) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 416) | def _prune_heads(self, heads_to_prune):
    method call (line 419) | def call(self, inputs, attention_mask=None, head_mask=None, inputs_emb...
  class TFDistilBertPreTrainedModel (line 465) | class TFDistilBertPreTrainedModel(TFPreTrainedModel):
  class TFDistilBertModel (line 539) | class TFDistilBertModel(TFDistilBertPreTrainedModel):
    method __init__ (line 540) | def __init__(self, config, *inputs, **kwargs):
    method call (line 545) | def call(self, inputs, **kwargs):
  class TFDistilBertLMHead (line 577) | class TFDistilBertLMHead(tf.keras.layers.Layer):
    method __init__ (line 578) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 586) | def build(self, input_shape):
    method call (line 590) | def call(self, hidden_states):
  class TFDistilBertForMaskedLM (line 599) | class TFDistilBertForMaskedLM(TFDistilBertPreTrainedModel):
    method __init__ (line 600) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 614) | def get_output_embeddings(self):
    method call (line 618) | def call(self, inputs, **kwargs):
  class TFDistilBertForSequenceClassification (line 665) | class TFDistilBertForSequenceClassification(TFDistilBertPreTrainedModel):
    method __init__ (line 666) | def __init__(self, config, *inputs, **kwargs):
    method call (line 683) | def call(self, inputs, **kwargs):
  class TFDistilBertForTokenClassification (line 729) | class TFDistilBertForTokenClassification(TFDistilBertPreTrainedModel):
    method __init__ (line 730) | def __init__(self, config, *inputs, **kwargs):
    method call (line 741) | def call(self, inputs, **kwargs):
  class TFDistilBertForQuestionAnswering (line 786) | class TFDistilBertForQuestionAnswering(TFDistilBertPreTrainedModel):
    method __init__ (line 787) | def __init__(self, config, *inputs, **kwargs):
    method call (line 798) | def call(self, inputs, **kwargs):
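
The two gelu variants at the top of this file are the exact error-function form and the tanh approximation popularized by GPT-2. Scalar sketches of both formulas (names match the listing; the file's versions operate on TensorFlow tensors):

    import math

    def gelu(x):
        # exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
        return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

    def gelu_new(x):
        # tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
        return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))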

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_electra.py
  class TFElectraEmbeddings (line 27) | class TFElectraEmbeddings(tf.keras.layers.Layer):
    method __init__ (line 31) | def __init__(self, config, **kwargs):
    method build (line 55) | def build(self, input_shape):
    method call (line 67) | def call(self, inputs, mode="embedding", training=False):
    method _embedding (line 89) | def _embedding(self, inputs, training=False):
    method _linear (line 114) | def _linear(self, inputs):
  class TFElectraDiscriminatorPredictions (line 130) | class TFElectraDiscriminatorPredictions(tf.keras.layers.Layer):
    method __init__ (line 131) | def __init__(self, config, **kwargs):
    method call (line 138) | def call(self, discriminator_hidden_states, training=False):
  class TFElectraGeneratorPredictions (line 146) | class TFElectraGeneratorPredictions(tf.keras.layers.Layer):
    method __init__ (line 147) | def __init__(self, config, **kwargs):
    method call (line 153) | def call(self, generator_hidden_states, training=False):
  class TFElectraPreTrainedModel (line 161) | class TFElectraPreTrainedModel(TFBertPreTrainedModel):
    method get_extended_attention_mask (line 166) | def get_extended_attention_mask(self, attention_mask, input_shape):
    method get_head_mask (line 188) | def get_head_mask(self, head_mask):
  class TFElectraMainLayer (line 197) | class TFElectraMainLayer(TFElectraPreTrainedModel):
    method __init__ (line 201) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 210) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 213) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 216) | def _prune_heads(self, heads_to_prune):
    method call (line 223) | def call(
  class TFElectraModel (line 348) | class TFElectraModel(TFElectraPreTrainedModel):
    method __init__ (line 349) | def __init__(self, config, *inputs, **kwargs):
    method get_input_embeddings (line 353) | def get_input_embeddings(self):
    method call (line 357) | def call(self, inputs, **kwargs):
  class TFElectraForPreTraining (line 398) | class TFElectraForPreTraining(TFElectraPreTrainedModel):
    method __init__ (line 399) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 405) | def get_input_embeddings(self):
    method call (line 409) | def call(
  class TFElectraMaskedLMHead (line 458) | class TFElectraMaskedLMHead(tf.keras.layers.Layer):
    method __init__ (line 459) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 464) | def build(self, input_shape):
    method call (line 468) | def call(self, hidden_states, training=False):
  class TFElectraForMaskedLM (line 482) | class TFElectraForMaskedLM(TFElectraPreTrainedModel):
    method __init__ (line 483) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 495) | def get_input_embeddings(self):
    method get_output_embeddings (line 498) | def get_output_embeddings(self):
    method call (line 502) | def call(
  class TFElectraForTokenClassification (line 560) | class TFElectraForTokenClassification(TFElectraPreTrainedModel):
    method __init__ (line 561) | def __init__(self, config, **kwargs):
    method call (line 569) | def call(

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_flaubert.py
  class TFFlaubertModel (line 107) | class TFFlaubertModel(TFXLMModel):
    method __init__ (line 110) | def __init__(self, config, *inputs, **kwargs):
  class TFFlaubertMainLayer (line 115) | class TFFlaubertMainLayer(TFXLMMainLayer):
    method __init__ (line 116) | def __init__(self, config, *inputs, **kwargs):
    method call (line 121) | def call(
  class TFFlaubertWithLMHeadModel (line 311) | class TFFlaubertWithLMHeadModel(TFXLMWithLMHeadModel):
    method __init__ (line 314) | def __init__(self, config, *inputs, **kwargs):
  class TFFlaubertForSequenceClassification (line 324) | class TFFlaubertForSequenceClassification(TFXLMForSequenceClassification):
    method __init__ (line 327) | def __init__(self, config, *inputs, **kwargs):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_gpt2.py
  function gelu (line 50) | def gelu(x):
  class TFAttention (line 63) | class TFAttention(tf.keras.layers.Layer):
    method __init__ (line 64) | def __init__(self, nx, n_ctx, config, scale=False, **kwargs):
    method prune_heads (line 82) | def prune_heads(self, heads):
    method causal_attention_mask (line 86) | def causal_attention_mask(nd, ns, dtype):
    method _attn (line 95) | def _attn(self, inputs, training=False):
    method merge_heads (line 125) | def merge_heads(self, x):
    method split_heads (line 131) | def split_heads(self, x):
    method call (line 137) | def call(self, inputs, training=False):
  class TFMLP (line 175) | class TFMLP(tf.keras.layers.Layer):
    method __init__ (line 176) | def __init__(self, n_state, config, **kwargs):
    method call (line 184) | def call(self, x, training=False):
  class TFBlock (line 191) | class TFBlock(tf.keras.layers.Layer):
    method __init__ (line 192) | def __init__(self, n_ctx, config, scale=False, **kwargs):
    method call (line 200) | def call(self, inputs, training=False):
  class TFGPT2MainLayer (line 217) | class TFGPT2MainLayer(tf.keras.layers.Layer):
    method __init__ (line 220) | def __init__(self, config, *inputs, **kwargs):
    method get_input_embeddings (line 241) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 244) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 247) | def _prune_heads(self, heads_to_prune):
    method call (line 253) | def call(
  class TFGPT2PreTrainedModel (line 387) | class TFGPT2PreTrainedModel(TFPreTrainedModel):
  class TFGPT2Model (line 475) | class TFGPT2Model(TFGPT2PreTrainedModel):
    method __init__ (line 476) | def __init__(self, config, *inputs, **kwargs):
    method call (line 481) | def call(self, inputs, **kwargs):
  class TFGPT2LMHeadModel (line 524) | class TFGPT2LMHeadModel(TFGPT2PreTrainedModel):
    method __init__ (line 525) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 529) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 532) | def prepare_inputs_for_generation(self, inputs, past, **kwargs):
    method call (line 540) | def call(self, inputs, **kwargs):
  class TFGPT2DoubleHeadsModel (line 593) | class TFGPT2DoubleHeadsModel(TFGPT2PreTrainedModel):
    method __init__ (line 594) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 602) | def get_output_embeddings(self):
    method call (line 606) | def call(
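
TFAttention.causal_attention_mask builds the lower-triangular band that stops each position from attending to future tokens; the ns - nd offset aligns the band when cached past keys precede the current queries. A numpy sketch of the same computation (illustrative; the file's version works with tf.range and tf.cast):

    import numpy as np

    def causal_attention_mask(nd, ns, dtype=np.float32):
        # 1 where query i may attend to key j (i.e. j - ns + nd <= i), else 0
        i = np.arange(nd)[:, None]
        j = np.arange(ns)
        return (i >= j - ns + nd).astype(dtype)

    # causal_attention_mask(3, 3) -> the lower-triangular ones matrix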

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_openai.py
  function gelu (line 45) | def gelu(x):
  function swish (line 58) | def swish(x):
  class TFAttention (line 69) | class TFAttention(tf.keras.layers.Layer):
    method __init__ (line 70) | def __init__(self, nx, n_ctx, config, scale=False, **kwargs):
    method prune_heads (line 88) | def prune_heads(self, heads):
    method causal_attention_mask (line 92) | def causal_attention_mask(nd, ns, dtype):
    method _attn (line 101) | def _attn(self, inputs, training=False):
    method merge_heads (line 131) | def merge_heads(self, x):
    method split_heads (line 137) | def split_heads(self, x):
    method call (line 143) | def call(self, inputs, training=False):
  class TFMLP (line 163) | class TFMLP(tf.keras.layers.Layer):
    method __init__ (line 164) | def __init__(self, n_state, config, **kwargs):
    method call (line 172) | def call(self, x, training=False):
  class TFBlock (line 179) | class TFBlock(tf.keras.layers.Layer):
    method __init__ (line 180) | def __init__(self, n_ctx, config, scale=False, **kwargs):
    method call (line 188) | def call(self, inputs, training=False):
  class TFOpenAIGPTMainLayer (line 202) | class TFOpenAIGPTMainLayer(tf.keras.layers.Layer):
    method __init__ (line 203) | def __init__(self, config, *inputs, **kwargs):
    method get_input_embeddings (line 223) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 226) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 229) | def _prune_heads(self, heads_to_prune):
    method call (line 235) | def call(
  class TFOpenAIGPTPreTrainedModel (line 349) | class TFOpenAIGPTPreTrainedModel(TFPreTrainedModel):
  class TFOpenAIGPTModel (line 430) | class TFOpenAIGPTModel(TFOpenAIGPTPreTrainedModel):
    method __init__ (line 431) | def __init__(self, config, *inputs, **kwargs):
    method call (line 436) | def call(self, inputs, **kwargs):
  class TFOpenAIGPTLMHeadModel (line 475) | class TFOpenAIGPTLMHeadModel(TFOpenAIGPTPreTrainedModel):
    method __init__ (line 476) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 480) | def get_output_embeddings(self):
    method call (line 484) | def call(self, inputs, **kwargs):
  class TFOpenAIGPTDoubleHeadsModel (line 532) | class TFOpenAIGPTDoubleHeadsModel(TFOpenAIGPTPreTrainedModel):
    method __init__ (line 533) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 541) | def get_output_embeddings(self):
    method call (line 545) | def call(

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_pytorch_utils.py
  function convert_tf_weight_name_to_pt_weight_name (line 29) | def convert_tf_weight_name_to_pt_weight_name(tf_name, start_prefix_to_re...
  function load_pytorch_checkpoint_in_tf2_model (line 73) | def load_pytorch_checkpoint_in_tf2_model(tf_model, pytorch_checkpoint_pa...
  function load_pytorch_model_in_tf2_model (line 97) | def load_pytorch_model_in_tf2_model(tf_model, pt_model, tf_inputs=None, ...
  function load_pytorch_weights_in_tf2_model (line 107) | def load_pytorch_weights_in_tf2_model(tf_model, pt_state_dict, tf_inputs...
  function load_tf2_checkpoint_in_pytorch_model (line 205) | def load_tf2_checkpoint_in_pytorch_model(pt_model, tf_checkpoint_path, t...
  function load_tf2_model_in_pytorch_model (line 240) | def load_tf2_model_in_pytorch_model(pt_model, tf_model, allow_missing_ke...
  function load_tf2_weights_in_pytorch_model (line 248) | def load_tf2_weights_in_pytorch_model(pt_model, tf_weights, allow_missin...
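
These converters exist because tf.keras and PyTorch disagree on both naming and layout: TF variable paths use "/" and call dense weights "kernel" (shape (in, out)), while PyTorch modules use "." and "weight" (shape (out, in)). A hypothetical sketch of the two core rules (the real convert_tf_weight_name_to_pt_weight_name also strips scope indices and the model prefix, and decides per-weight whether to transpose):

    import numpy as np

    def tf_kernel_to_pt_weight(tf_kernel):
        # Dense kernels are transposed between the two frameworks
        return np.asarray(tf_kernel).T

    def tf_name_to_pt_name(tf_name):
        # illustrative name mapping only
        name = tf_name.replace(":0", "").replace("/", ".")
        return (name.replace("kernel", "weight")
                    .replace("gamma", "weight")
                    .replace("beta", "bias"))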

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_roberta.py
  class TFRobertaEmbeddings (line 40) | class TFRobertaEmbeddings(TFBertEmbeddings):
    method __init__ (line 45) | def __init__(self, config, **kwargs):
    method create_position_ids_from_input_ids (line 49) | def create_position_ids_from_input_ids(self, x):
    method create_position_ids_from_inputs_embeds (line 60) | def create_position_ids_from_inputs_embeds(self, inputs_embeds):
    method _embedding (line 71) | def _embedding(self, inputs, training=False):
  class TFRobertaMainLayer (line 85) | class TFRobertaMainLayer(TFBertMainLayer):
    method __init__ (line 90) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 94) | def get_input_embeddings(self):
  class TFRobertaPreTrainedModel (line 98) | class TFRobertaPreTrainedModel(TFPreTrainedModel):
  class TFRobertaModel (line 182) | class TFRobertaModel(TFRobertaPreTrainedModel):
    method __init__ (line 183) | def __init__(self, config, *inputs, **kwargs):
    method call (line 188) | def call(self, inputs, **kwargs):
  class TFRobertaLMHead (line 228) | class TFRobertaLMHead(tf.keras.layers.Layer):
    method __init__ (line 231) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 244) | def build(self, input_shape):
    method call (line 248) | def call(self, features):
  class TFRobertaForMaskedLM (line 260) | class TFRobertaForMaskedLM(TFRobertaPreTrainedModel):
    method __init__ (line 261) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 267) | def get_output_embeddings(self):
    method call (line 271) | def call(self, inputs, **kwargs):
  class TFRobertaClassificationHead (line 310) | class TFRobertaClassificationHead(tf.keras.layers.Layer):
    method __init__ (line 313) | def __init__(self, config, **kwargs):
    method call (line 326) | def call(self, features, training=False):
  class TFRobertaForSequenceClassification (line 340) | class TFRobertaForSequenceClassification(TFRobertaPreTrainedModel):
    method __init__ (line 341) | def __init__(self, config, *inputs, **kwargs):
    method call (line 349) | def call(self, inputs, **kwargs):
  class TFRobertaForTokenClassification (line 394) | class TFRobertaForTokenClassification(TFRobertaPreTrainedModel):
    method __init__ (line 395) | def __init__(self, config, *inputs, **kwargs):
    method call (line 406) | def call(self, inputs, **kwargs):
  class TFRobertaForQuestionAnswering (line 451) | class TFRobertaForQuestionAnswering(TFRobertaPreTrainedModel):
    method __init__ (line 452) | def __init__(self, config, *inputs, **kwargs):
    method call (line 462) | def call(self, inputs, **kwargs):
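
TFRobertaEmbeddings overrides position handling because RoBERTa inherits fairseq's convention: pad tokens keep position padding_idx and real tokens are numbered from padding_idx + 1. A numpy sketch of create_position_ids_from_input_ids (same logic, different tensor library):

    import numpy as np

    def create_position_ids_from_input_ids(input_ids, padding_idx):
        mask = (input_ids != padding_idx).astype(np.int64)
        # consecutive positions for real tokens, padding_idx for pads
        return np.cumsum(mask, axis=1) * mask + padding_idx

    # create_position_ids_from_input_ids(np.array([[5, 7, 1, 1]]), padding_idx=1)
    # -> array([[2, 3, 1, 1]])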

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_t5.py
  class TFT5LayerNorm (line 49) | class TFT5LayerNorm(tf.keras.layers.Layer):
    method __init__ (line 50) | def __init__(self, epsilon=1e-6, **kwargs):
    method build (line 57) | def build(self, input_shape):
    method call (line 62) | def call(self, x):
  class TFT5DenseReluDense (line 68) | class TFT5DenseReluDense(tf.keras.layers.Layer):
    method __init__ (line 69) | def __init__(self, config, **kwargs):
    method call (line 76) | def call(self, hidden_states, training=False):
  class TFT5LayerFF (line 84) | class TFT5LayerFF(tf.keras.layers.Layer):
    method __init__ (line 85) | def __init__(self, config, **kwargs):
    method call (line 91) | def call(self, hidden_states, training=False):
  class TFT5Attention (line 98) | class TFT5Attention(tf.keras.layers.Layer):
    method __init__ (line 101) | def __init__(self, config, has_relative_attention_bias=False, **kwargs):
    method prune_heads (line 127) | def prune_heads(self, heads):
    method _relative_position_bucket (line 131) | def _relative_position_bucket(relative_position, bidirectional=True, n...
    method compute_bias (line 176) | def compute_bias(self, qlen, klen):
    method call (line 188) | def call(
  class TFT5LayerSelfAttention (line 302) | class TFT5LayerSelfAttention(tf.keras.layers.Layer):
    method __init__ (line 303) | def __init__(self, config, has_relative_attention_bias=False, **kwargs):
    method call (line 311) | def call(
  class TFT5LayerCrossAttention (line 337) | class TFT5LayerCrossAttention(tf.keras.layers.Layer):
    method __init__ (line 338) | def __init__(self, config, has_relative_attention_bias=False, **kwargs):
    method call (line 346) | def call(
  class TFT5Block (line 376) | class TFT5Block(tf.keras.layers.Layer):
    method __init__ (line 377) | def __init__(self, config, has_relative_attention_bias=False, **kwargs):
    method call (line 393) | def call(
  class _NoLayerEmbedTokens (line 471) | class _NoLayerEmbedTokens(object):
    method __init__ (line 478) | def __init__(self, layer, abs_scope_name=None):
    method call (line 482) | def call(self, inputs, mode="embedding"):
    method __call__ (line 491) | def __call__(self, inputs, mode="embedding"):
  class TFT5MainLayer (line 505) | class TFT5MainLayer(tf.keras.layers.Layer):
    method __init__ (line 506) | def __init__(self, config, embed_tokens=None, **kwargs):
    method get_input_embeddings (line 524) | def get_input_embeddings(self):
    method get_output_embeddings (line 527) | def get_output_embeddings(self):
    method set_embed_tokens (line 530) | def set_embed_tokens(self, embed_tokens):
    method _resize_token_embeddings (line 533) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 536) | def _prune_heads(self, heads_to_prune):
    method call (line 539) | def call(
  class TFT5PreTrainedModel (line 718) | class TFT5PreTrainedModel(TFPreTrainedModel):
    method dummy_inputs (line 727) | def dummy_inputs(self):
  class TFT5Model (line 828) | class TFT5Model(TFT5PreTrainedModel):
    method __init__ (line 829) | def __init__(self, config, *inputs, **kwargs):
    method get_input_embeddings (line 846) | def get_input_embeddings(self):
    method get_output_embeddings (line 849) | def get_output_embeddings(self):
    method get_encoder (line 852) | def get_encoder(self):
    method get_decoder (line 855) | def get_decoder(self):
    method call (line 859) | def call(self, inputs, **kwargs):
  class TFT5ForConditionalGeneration (line 947) | class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
    method __init__ (line 948) | def __init__(self, config, *inputs, **kwargs):
    method get_input_embeddings (line 967) | def get_input_embeddings(self):
    method get_output_embeddings (line 970) | def get_output_embeddings(self):
    method get_encoder (line 973) | def get_encoder(self):
    method get_decoder (line 976) | def get_decoder(self):
    method call (line 980) | def call(self, inputs, **kwargs):
    method prepare_inputs_for_generation (line 1079) | def prepare_inputs_for_generation(self, inputs, past, attention_mask, ...
    method _reorder_cache (line 1097) | def _reorder_cache(self, past, beam_idx):
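
TFT5LayerNorm is not the usual LayerNorm: T5 rescales by the root mean square only, with no mean subtraction and no bias term. A numpy sketch of the math in its call method:

    import numpy as np

    def t5_layer_norm(x, weight, eps=1e-6):
        # RMS-style norm: x / sqrt(mean(x^2) + eps), then a learned per-channel scale
        variance = np.mean(np.square(x), axis=-1, keepdims=True)
        return weight * x / np.sqrt(variance + eps)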

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_transfo_xl.py
  class TFPositionalEmbedding (line 39) | class TFPositionalEmbedding(tf.keras.layers.Layer):
    method __init__ (line 40) | def __init__(self, demb, **kwargs):
    method call (line 45) | def call(self, pos_seq, bsz=None):
  class TFPositionwiseFF (line 55) | class TFPositionwiseFF(tf.keras.layers.Layer):
    method __init__ (line 56) | def __init__(self, d_model, d_inner, dropout, pre_lnorm=False, layer_n...
    method call (line 74) | def call(self, inp, training=False):
  class TFRelPartialLearnableMultiHeadAttn (line 98) | class TFRelPartialLearnableMultiHeadAttn(tf.keras.layers.Layer):
    method __init__ (line 99) | def __init__(
    method build (line 152) | def build(self, input_shape):
    method _rel_shift (line 162) | def _rel_shift(self, x):
    method call (line 172) | def call(self, inputs, training=False):
  class TFRelPartialLearnableDecoderLayer (line 252) | class TFRelPartialLearnableDecoderLayer(tf.keras.layers.Layer):
    method __init__ (line 253) | def __init__(
    method call (line 301) | def call(self, inputs, training=False):
  class TFAdaptiveEmbedding (line 311) | class TFAdaptiveEmbedding(tf.keras.layers.Layer):
    method __init__ (line 312) | def __init__(self, n_token, d_embed, d_proj, cutoffs, div_val=1, init_...
    method build (line 344) | def build(self, input_shape):
    method call (line 357) | def call(self, inp):
  class TFTransfoXLMainLayer (line 384) | class TFTransfoXLMainLayer(tf.keras.layers.Layer):
    method __init__ (line 387) | def __init__(self, config, **kwargs):
    method build (line 455) | def build(self, input_shape):
    method get_input_embeddings (line 465) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 468) | def _resize_token_embeddings(self, new_num_tokens):
    method backward_compatible (line 471) | def backward_compatible(self):
    method reset_length (line 474) | def reset_length(self, tgt_len, ext_len, mem_len):
    method _prune_heads (line 479) | def _prune_heads(self, heads):
    method init_mems (line 482) | def init_mems(self, bsz):
    method _update_mems (line 493) | def _update_mems(self, hids, mems, mlen, qlen):
    method call (line 517) | def call(self, inputs, mems=None, head_mask=None, inputs_embeds=None, ...
  class TFTransfoXLPreTrainedModel (line 628) | class TFTransfoXLPreTrainedModel(TFPreTrainedModel):
  class TFTransfoXLModel (line 693) | class TFTransfoXLModel(TFTransfoXLPreTrainedModel):
    method __init__ (line 694) | def __init__(self, config, *inputs, **kwargs):
    method call (line 699) | def call(self, inputs, **kwargs):
  class TFTransfoXLLMHead (line 737) | class TFTransfoXLLMHead(tf.keras.layers.Layer):
    method __init__ (line 738) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 746) | def build(self, input_shape):
    method call (line 750) | def call(self, hidden_states):
  class TFTransfoXLLMHeadModel (line 761) | class TFTransfoXLLMHeadModel(TFTransfoXLPreTrainedModel):
    method __init__ (line 762) | def __init__(self, config):
    method get_output_embeddings (line 774) | def get_output_embeddings(self):
    method reset_length (line 781) | def reset_length(self, tgt_len, ext_len, mem_len):
    method init_mems (line 784) | def init_mems(self, bsz):
    method call (line 788) | def call(self, inputs, mems=None, head_mask=None, inputs_embeds=None, ...
    method prepare_inputs_for_generation (line 855) | def prepare_inputs_for_generation(self, inputs, past, **model_kwargs):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_transfo_xl_utilities.py
  class TFAdaptiveSoftmaxMask (line 25) | class TFAdaptiveSoftmaxMask(tf.keras.layers.Layer):
    method __init__ (line 26) | def __init__(self, vocab_size, d_embed, d_proj, cutoffs, div_val=1, ke...
    method build (line 45) | def build(self, input_shape):
    method _logit (line 104) | def _logit(x, W, b, proj=None):
    method _gather_logprob (line 111) | def _gather_logprob(logprob, target):
    method call (line 117) | def call(self, inputs, return_mean=True, training=False):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_utils.py
  class TFModelUtilsMixin (line 34) | class TFModelUtilsMixin:
    method num_parameters (line 39) | def num_parameters(self, only_trainable: bool = False) -> int:
  function keras_serializable (line 49) | def keras_serializable(cls):
  class TFPreTrainedModel (line 107) | class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
    method dummy_inputs (line 127) | def dummy_inputs(self):
    method __init__ (line 135) | def __init__(self, config, *inputs, **kwargs):
    method get_input_embeddings (line 148) | def get_input_embeddings(self):
    method get_output_embeddings (line 162) | def get_output_embeddings(self):
    method _get_resized_embeddings (line 172) | def _get_resized_embeddings(self, old_embeddings, new_num_tokens=None):
    method resize_token_embeddings (line 206) | def resize_token_embeddings(self, new_num_tokens=None):
    method prune_heads (line 221) | def prune_heads(self, heads_to_prune):
    method save_pretrained (line 230) | def save_pretrained(self, save_directory):
    method from_pretrained (line 247) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
    method prepare_inputs_for_generation (line 438) | def prepare_inputs_for_generation(self, inputs, **kwargs):
    method _use_cache (line 441) | def _use_cache(self, outputs, use_cache):
    method generate (line 449) | def generate(
    method _generate_no_beam_search (line 810) | def _generate_no_beam_search(
    method _generate_beam_search (line 973) | def _generate_beam_search(
    method _reorder_cache (line 1294) | def _reorder_cache(past, beam_idx):
  function _create_next_token_logits_penalties (line 1298) | def _create_next_token_logits_penalties(input_ids, logits, repetition_pe...
  function calc_banned_ngram_tokens (line 1312) | def calc_banned_ngram_tokens(prev_input_ids, num_hypos, no_repeat_ngram_...
  function calc_banned_bad_words_ids (line 1335) | def calc_banned_bad_words_ids(prev_input_ids, bad_words_ids):
  function tf_top_k_top_p_filtering (line 1371) | def tf_top_k_top_p_filtering(logits, top_k=0, top_p=1.0, filter_value=-f...
  function scatter_values_on_batch_indices (line 1421) | def scatter_values_on_batch_indices(values, batch_indices):
  function set_tensor_by_indices_to_value (line 1431) | def set_tensor_by_indices_to_value(tensor, indices, value):
  class BeamHypotheses (line 1437) | class BeamHypotheses(object):
    method __init__ (line 1438) | def __init__(self, num_beams, max_length, length_penalty, early_stoppi...
    method __len__ (line 1449) | def __len__(self):
    method add (line 1455) | def add(self, hyp, sum_logprobs):
    method is_done (line 1469) | def is_done(self, best_sum_logprobs, cur_len=None):
  class TFConv1D (line 1487) | class TFConv1D(tf.keras.layers.Layer):
    method __init__ (line 1488) | def __init__(self, nf, nx, initializer_range=0.02, **kwargs):
    method build (line 1497) | def build(self, input_shape):
    method call (line 1503) | def call(self, x):
  class TFSharedEmbeddings (line 1514) | class TFSharedEmbeddings(tf.keras.layers.Layer):
    method __init__ (line 1518) | def __init__(self, vocab_size, hidden_size, initializer_range=None, **...
    method build (line 1524) | def build(self, input_shape):
    method call (line 1534) | def call(self, inputs, mode="embedding"):
    method _embedding (line 1556) | def _embedding(self, input_ids):
    method _linear (line 1560) | def _linear(self, inputs):
  class TFSequenceSummary (line 1575) | class TFSequenceSummary(tf.keras.layers.Layer):
    method __init__ (line 1591) | def __init__(self, config, initializer_range=0.02, **kwargs):
    method call (line 1623) | def call(self, inputs, training=False):
  function shape_list (line 1682) | def shape_list(x):
  function get_initializer (line 1689) | def get_initializer(initializer_range=0.02):
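
tf_top_k_top_p_filtering is the sampling filter used by generate(): logits outside the top-k set, or outside the smallest set whose cumulative probability exceeds top_p, are pushed to -inf so those tokens can never be sampled. A single-sequence numpy sketch of the algorithm (the file's version is batched and uses the scatter helpers listed above):

    import numpy as np

    def top_k_top_p_filtering(logits, top_k=0, top_p=1.0, filter_value=-np.inf):
        logits = logits.astype(np.float64).copy()
        if top_k > 0:
            kth = np.sort(logits)[-top_k]           # k-th largest logit
            logits[logits < kth] = filter_value
        if top_p < 1.0:
            order = np.argsort(logits)[::-1]        # descending by logit
            probs = np.exp(logits[order] - logits[order][0])
            probs /= probs.sum()
            cum = np.cumsum(probs)
            keep = np.searchsorted(cum, top_p) + 1  # always keep >= 1 token
            logits[order[keep:]] = filter_value
        return logits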

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_xlm.py
  function create_sinusoidal_embeddings (line 49) | def create_sinusoidal_embeddings(n_pos, dim, out):
  function gelu (line 55) | def gelu(x):
  function get_masks (line 66) | def get_masks(slen, lengths, causal, padding_mask=None, dtype=tf.float32):
  class TFMultiHeadAttention (line 97) | class TFMultiHeadAttention(tf.keras.layers.Layer):
    method __init__ (line 101) | def __init__(self, n_heads, dim, config, **kwargs):
    method prune_heads (line 116) | def prune_heads(self, heads):
    method call (line 119) | def call(self, inputs, training=False):
  class TFTransformerFFN (line 185) | class TFTransformerFFN(tf.keras.layers.Layer):
    method __init__ (line 186) | def __init__(self, in_dim, dim_hidden, out_dim, config, **kwargs):
    method call (line 193) | def call(self, input, training=False):
  class TFXLMMainLayer (line 201) | class TFXLMMainLayer(tf.keras.layers.Layer):
    method __init__ (line 202) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 292) | def get_input_embeddings(self):
    method _resize_token_embeddings (line 295) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 298) | def _prune_heads(self, heads_to_prune):
    method call (line 305) | def call(
  class TFXLMPreTrainedModel (line 468) | class TFXLMPreTrainedModel(TFPreTrainedModel):
    method dummy_inputs (line 477) | def dummy_inputs(self):
  class TFXLMModel (line 574) | class TFXLMModel(TFXLMPreTrainedModel):
    method __init__ (line 575) | def __init__(self, config, *inputs, **kwargs):
    method call (line 580) | def call(self, inputs, **kwargs):
  class TFXLMPredLayer (line 614) | class TFXLMPredLayer(tf.keras.layers.Layer):
    method __init__ (line 619) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 636) | def build(self, input_shape):
    method call (line 641) | def call(self, hidden_states):
  class TFXLMWithLMHeadModel (line 652) | class TFXLMWithLMHeadModel(TFXLMPreTrainedModel):
    method __init__ (line 653) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 658) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 661) | def prepare_inputs_for_generation(self, inputs, **kwargs):
    method call (line 676) | def call(self, inputs, **kwargs):
  class TFXLMForSequenceClassification (line 720) | class TFXLMForSequenceClassification(TFXLMPreTrainedModel):
    method __init__ (line 721) | def __init__(self, config, *inputs, **kwargs):
    method call (line 729) | def call(self, inputs, **kwargs):
  class TFXLMForQuestionAnsweringSimple (line 774) | class TFXLMForQuestionAnsweringSimple(TFXLMPreTrainedModel):
    method __init__ (line 775) | def __init__(self, config, *inputs, **kwargs):
    method call (line 783) | def call(self, inputs, **kwargs):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_xlm_roberta.py
  class TFXLMRobertaModel (line 70) | class TFXLMRobertaModel(TFRobertaModel):
  class TFXLMRobertaForMaskedLM (line 82) | class TFXLMRobertaForMaskedLM(TFRobertaForMaskedLM):
  class TFXLMRobertaForSequenceClassification (line 96) | class TFXLMRobertaForSequenceClassification(TFRobertaForSequenceClassifi...
  class TFXLMRobertaForTokenClassification (line 110) | class TFXLMRobertaForTokenClassification(TFRobertaForTokenClassification):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_tf_xlnet.py
  function gelu (line 47) | def gelu(x):
  function swish (line 56) | def swish(x):
  class TFXLNetRelativeAttention (line 67) | class TFXLNetRelativeAttention(tf.keras.layers.Layer):
    method __init__ (line 68) | def __init__(self, config, **kwargs):
    method build (line 87) | def build(self, input_shape):
    method prune_heads (line 118) | def prune_heads(self, heads):
    method rel_shift (line 121) | def rel_shift(self, x, klen=-1):
    method rel_attn_core (line 133) | def rel_attn_core(self, inputs, training=False):
    method post_attention (line 178) | def post_attention(self, inputs, residual=True, training=False):
    method call (line 193) | def call(self, inputs, training=False):
  class TFXLNetFeedForward (line 290) | class TFXLNetFeedForward(tf.keras.layers.Layer):
    method __init__ (line 291) | def __init__(self, config, **kwargs):
    method call (line 306) | def call(self, inp, training=False):
  class TFXLNetLayer (line 317) | class TFXLNetLayer(tf.keras.layers.Layer):
    method __init__ (line 318) | def __init__(self, config, **kwargs):
    method call (line 324) | def call(self, inputs, training=False):
  class TFXLNetLMHead (line 336) | class TFXLNetLMHead(tf.keras.layers.Layer):
    method __init__ (line 337) | def __init__(self, config, input_embeddings, **kwargs):
    method build (line 344) | def build(self, input_shape):
    method call (line 348) | def call(self, hidden_states):
  class TFXLNetMainLayer (line 355) | class TFXLNetMainLayer(tf.keras.layers.Layer):
    method __init__ (line 358) | def __init__(self, config, **kwargs):
    method get_input_embeddings (line 380) | def get_input_embeddings(self):
    method build (line 383) | def build(self, input_shape):
    method _resize_token_embeddings (line 389) | def _resize_token_embeddings(self, new_num_tokens):
    method _prune_heads (line 392) | def _prune_heads(self, heads_to_prune):
    method create_mask (line 395) | def create_mask(self, qlen, mlen, dtype=tf.float32):
    method cache_mem (line 424) | def cache_mem(self, curr_out, prev_mem):
    method positional_embedding (line 437) | def positional_embedding(pos_seq, inv_freq, bsz=None):
    method relative_positional_encoding (line 447) | def relative_positional_encoding(self, qlen, klen, bsz=None, dtype=None):
    method call (line 495) | def call(
  class TFXLNetPreTrainedModel (line 699) | class TFXLNetPreTrainedModel(TFPreTrainedModel):
  class TFXLNetModel (line 795) | class TFXLNetModel(TFXLNetPreTrainedModel):
    method __init__ (line 796) | def __init__(self, config, *inputs, **kwargs):
    method call (line 801) | def call(self, inputs, **kwargs):
  class TFXLNetLMHeadModel (line 844) | class TFXLNetLMHeadModel(TFXLNetPreTrainedModel):
    method __init__ (line 845) | def __init__(self, config, *inputs, **kwargs):
    method get_output_embeddings (line 850) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 853) | def prepare_inputs_for_generation(self, inputs, past, **kwargs):
    method call (line 885) | def call(self, inputs, **kwargs):
  class TFXLNetForSequenceClassification (line 941) | class TFXLNetForSequenceClassification(TFXLNetPreTrainedModel):
    method __init__ (line 942) | def __init__(self, config, *inputs, **kwargs):
    method call (line 955) | def call(self, inputs, **kwargs):
  class TFXLNetForTokenClassification (line 1005) | class TFXLNetForTokenClassification(TFXLNetPreTrainedModel):
    method __init__ (line 1006) | def __init__(self, config, *inputs, **kwargs):
    method call (line 1015) | def call(self, inputs, **kwargs):
  class TFXLNetForQuestionAnsweringSimple (line 1064) | class TFXLNetForQuestionAnsweringSimple(TFXLNetPreTrainedModel):
    method __init__ (line 1065) | def __init__(self, config, *inputs, **kwargs):
    method call (line 1073) | def call(self, inputs, **kwargs):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_transfo_xl.py
  function build_tf_to_pytorch_map (line 42) | def build_tf_to_pytorch_map(model, config):
  function load_tf_weights_in_transfo_xl (line 109) | def load_tf_weights_in_transfo_xl(model, config, tf_path):
  class PositionalEmbedding (line 167) | class PositionalEmbedding(nn.Module):
    method __init__ (line 168) | def __init__(self, demb):
    method forward (line 176) | def forward(self, pos_seq, bsz=None):
  class PositionwiseFF (line 186) | class PositionwiseFF(nn.Module):
    method __init__ (line 187) | def __init__(self, d_model, d_inner, dropout, pre_lnorm=False, layer_n...
    method forward (line 206) | def forward(self, inp):
  class RelPartialLearnableMultiHeadAttn (line 223) | class RelPartialLearnableMultiHeadAttn(nn.Module):
    method __init__ (line 224) | def __init__(
    method _rel_shift (line 269) | def _rel_shift(self, x):
    method forward (line 281) | def forward(self, w, r, attn_mask=None, mems=None, head_mask=None):
  class RelPartialLearnableDecoderLayer (line 370) | class RelPartialLearnableDecoderLayer(nn.Module):
    method __init__ (line 371) | def __init__(self, n_head, d_model, d_head, d_inner, dropout, layer_no...
    method forward (line 381) | def forward(self, dec_inp, r, dec_attn_mask=None, mems=None, head_mask...
  class AdaptiveEmbedding (line 391) | class AdaptiveEmbedding(nn.Module):
    method __init__ (line 392) | def __init__(self, n_token, d_embed, d_proj, cutoffs, div_val=1, sampl...
    method forward (line 419) | def forward(self, inp):
  class TransfoXLPreTrainedModel (line 451) | class TransfoXLPreTrainedModel(PreTrainedModel):
    method _init_weight (line 460) | def _init_weight(self, weight):
    method _init_bias (line 466) | def _init_bias(self, bias):
    method _init_weights (line 469) | def _init_weights(self, m):
  class TransfoXLModel (line 552) | class TransfoXLModel(TransfoXLPreTrainedModel):
    method __init__ (line 553) | def __init__(self, config):
    method get_input_embeddings (line 618) | def get_input_embeddings(self):
    method set_input_embeddings (line 621) | def set_input_embeddings(self, new_embeddings):
    method backward_compatible (line 624) | def backward_compatible(self):
    method reset_length (line 627) | def reset_length(self, tgt_len, ext_len, mem_len):
    method _prune_heads (line 632) | def _prune_heads(self, heads):
    method init_mems (line 636) | def init_mems(self, bsz):
    method _update_mems (line 648) | def _update_mems(self, hids, mems, mlen, qlen):
    method forward (line 673) | def forward(self, input_ids=None, mems=None, head_mask=None, inputs_em...
  class TransfoXLLMHeadModel (line 807) | class TransfoXLLMHeadModel(TransfoXLPreTrainedModel):
    method __init__ (line 808) | def __init__(self, config):
    method tie_weights (line 823) | def tie_weights(self):
    method reset_length (line 844) | def reset_length(self, tgt_len, ext_len, mem_len):
    method init_mems (line 847) | def init_mems(self, bsz):
    method forward (line 851) | def forward(self, input_ids=None, mems=None, head_mask=None, inputs_em...
    method get_output_embeddings (line 917) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 925) | def prepare_inputs_for_generation(self, input_ids, past, **model_kwargs):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_transfo_xl_utilities.py
  class ProjectedAdaptiveLogSoftmax (line 30) | class ProjectedAdaptiveLogSoftmax(nn.Module):
    method __init__ (line 31) | def __init__(self, n_token, d_embed, d_proj, cutoffs, div_val=1, keep_...
    method _compute_logit (line 72) | def _compute_logit(self, hidden, weight, bias, proj):
    method forward (line 86) | def forward(self, hidden, labels=None, keep_order=False):
    method log_prob (line 193) | def log_prob(self, hidden):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_utils.py
  class Identity (line 47) | class Identity(nn.Module):
    method __init__ (line 51) | def __init__(self, *args, **kwargs):
    method forward (line 54) | def forward(self, input):
  class ModuleUtilsMixin (line 58) | class ModuleUtilsMixin:
    method num_parameters (line 63) | def num_parameters(self, only_trainable: bool = False) -> int:
    method _hook_rss_memory_pre_forward (line 71) | def _hook_rss_memory_pre_forward(module, *args, **kwargs):
    method _hook_rss_memory_post_forward (line 83) | def _hook_rss_memory_post_forward(module, *args, **kwargs):
    method add_memory_hooks (line 96) | def add_memory_hooks(self):
    method reset_memory_hooks_state (line 105) | def reset_memory_hooks_state(self):
    method device (line 112) | def device(self) -> device:
    method dtype (line 130) | def dtype(self) -> dtype:
    method invert_attention_mask (line 147) | def invert_attention_mask(self, encoder_attention_mask: Tensor) -> Ten...
    method get_extended_attention_mask (line 173) | def get_extended_attention_mask(self, attention_mask: Tensor, input_sh...
    method get_head_mask (line 217) | def get_head_mask(self, head_mask: Tensor, num_hidden_layers: int, is_...
    method _convert_head_mask_to_5d (line 238) | def _convert_head_mask_to_5d(self, head_mask, num_hidden_layers):
  class PreTrainedModel (line 250) | class PreTrainedModel(nn.Module, ModuleUtilsMixin):
    method dummy_inputs (line 270) | def dummy_inputs(self):
    method __init__ (line 278) | def __init__(self, config, *inputs, **kwargs):
    method base_model (line 292) | def base_model(self):
    method get_input_embeddings (line 295) | def get_input_embeddings(self):
    method set_input_embeddings (line 309) | def set_input_embeddings(self, value: nn.Module):
    method get_output_embeddings (line 323) | def get_output_embeddings(self):
    method tie_weights (line 333) | def tie_weights(self):
    method _tie_or_clone_weights (line 343) | def _tie_or_clone_weights(self, output_embeddings, input_embeddings):
    method resize_token_embeddings (line 361) | def resize_token_embeddings(self, new_num_tokens: Optional[int] = None):
    method _resize_token_embeddings (line 388) | def _resize_token_embeddings(self, new_num_tokens):
    method _get_resized_embeddings (line 394) | def _get_resized_embeddings(
    method init_weights (line 432) | def init_weights(self):
    method prune_heads (line 444) | def prune_heads(self, heads_to_prune: Dict):
    method save_pretrained (line 459) | def save_pretrained(self, save_directory):
    method from_pretrained (line 494) | def from_pretrained(cls, pretrained_model_name_or_path, *model_args, *...
    method prepare_inputs_for_generation (line 777) | def prepare_inputs_for_generation(self, input_ids, **kwargs):
    method prepare_logits_for_generation (line 780) | def prepare_logits_for_generation(self, logits, **kwargs):
    method _use_cache (line 783) | def _use_cache(self, outputs, use_cache):
    method enforce_repetition_penalty_ (line 791) | def enforce_repetition_penalty_(self, lprobs, batch_size, num_beams, p...
    method generate (line 802) | def generate(
    method _generate_no_beam_search (line 1186) | def _generate_no_beam_search(
    method _generate_beam_search (line 1307) | def _generate_beam_search(
    method _reorder_cache (line 1582) | def _reorder_cache(past: Tuple, beam_idx: Tensor) -> Tuple[Tensor]:
  function calc_banned_ngram_tokens (line 1586) | def calc_banned_ngram_tokens(prev_input_ids: Tensor, num_hypos: int, no_...
  function calc_banned_bad_words_ids (line 1609) | def calc_banned_bad_words_ids(prev_input_ids: Iterable[int], bad_words_i...
  function top_k_top_p_filtering (line 1645) | def top_k_top_p_filtering(
  class BeamHypotheses (line 1686) | class BeamHypotheses(object):
    method __init__ (line 1687) | def __init__(self, num_beams, max_length, length_penalty, early_stoppi...
    method __len__ (line 1698) | def __len__(self):
    method add (line 1704) | def add(self, hyp, sum_logprobs):
    method is_done (line 1718) | def is_done(self, best_sum_logprobs, cur_len=None):
  class Conv1D (line 1736) | class Conv1D(nn.Module):
    method __init__ (line 1737) | def __init__(self, nf, nx):
    method forward (line 1748) | def forward(self, x):
  class PoolerStartLogits (line 1755) | class PoolerStartLogits(nn.Module):
    method __init__ (line 1758) | def __init__(self, config):
    method forward (line 1762) | def forward(self, hidden_states, p_mask=None):
  class PoolerEndLogits (line 1779) | class PoolerEndLogits(nn.Module):
    method __init__ (line 1783) | def __init__(self, config):
    method forward (line 1790) | def forward(self, hidden_states, start_states=None, start_positions=No...
  class PoolerAnswerClass (line 1826) | class PoolerAnswerClass(nn.Module):
    method __init__ (line 1829) | def __init__(self, config):
    method forward (line 1835) | def forward(self, hidden_states, start_states=None, start_positions=No...
  class SQuADHead (line 1873) | class SQuADHead(nn.Module):
    method __init__ (line 1914) | def __init__(self, config):
    method forward (line 1923) | def forward(
  class SequenceSummary (line 1990) | class SequenceSummary(nn.Module):
    method __init__ (line 2006) | def __init__(self, config: PretrainedConfig):
    method forward (line 2035) | def forward(self, hidden_states, cls_index=None):
  function create_position_ids_from_input_ids (line 2067) | def create_position_ids_from_input_ids(input_ids, padding_idx):
  function prune_linear_layer (line 2081) | def prune_linear_layer(layer, index, dim=0):
  function prune_conv1d_layer (line 2106) | def prune_conv1d_layer(layer, index, dim=1):
  function prune_layer (line 2130) | def prune_layer(layer, index, dim=None):
  function apply_chunking_to_forward (line 2143) | def apply_chunking_to_forward(
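
get_extended_attention_mask converts the usual (batch, seq_len) 0/1 padding mask into an additive bias that broadcasts over attention heads: allowed positions become 0.0 and masked positions a large negative number, so they vanish after the softmax. A minimal torch sketch of the non-decoder case:

    import torch

    def get_extended_attention_mask(attention_mask):
        # (batch, seq_len) -> (batch, 1, 1, seq_len), added to raw attention scores
        extended = attention_mask[:, None, None, :].float()
        return (1.0 - extended) * -10000.0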

FILE: code/bert-base-count3/pretrain/transformers1/modeling_xlm.py
  function create_sinusoidal_embeddings (line 52) | def create_sinusoidal_embeddings(n_pos, dim, out):
  function get_masks (line 60) | def get_masks(slen, lengths, causal, padding_mask=None):
  class MultiHeadAttention (line 85) | class MultiHeadAttention(nn.Module):
    method __init__ (line 89) | def __init__(self, n_heads, dim, config):
    method prune_heads (line 104) | def prune_heads(self, heads):
    method forward (line 125) | def forward(self, input, mask, kv=None, cache=None, head_mask=None):
  class TransformerFFN (line 189) | class TransformerFFN(nn.Module):
    method __init__ (line 190) | def __init__(self, in_dim, dim_hidden, out_dim, config):
    method forward (line 197) | def forward(self, input):
  class XLMPreTrainedModel (line 205) | class XLMPreTrainedModel(PreTrainedModel):
    method __init__ (line 214) | def __init__(self, *inputs, **kwargs):
    method dummy_inputs (line 218) | def dummy_inputs(self):
    method _init_weights (line 227) | def _init_weights(self, module):
  class XLMModel (line 313) | class XLMModel(XLMPreTrainedModel):
    method __init__ (line 314) | def __init__(self, config):  # , dico, is_encoder, with_output):
    method get_input_embeddings (line 384) | def get_input_embeddings(self):
    method set_input_embeddings (line 387) | def set_input_embeddings(self, new_embeddings):
    method _prune_heads (line 390) | def _prune_heads(self, heads_to_prune):
    method forward (line 399) | def forward(
  class XLMPredLayer (line 554) | class XLMPredLayer(nn.Module):
    method __init__ (line 559) | def __init__(self, config):
    method forward (line 577) | def forward(self, x, y=None):
  class XLMWithLMHeadModel (line 602) | class XLMWithLMHeadModel(XLMPreTrainedModel):
    method __init__ (line 603) | def __init__(self, config):
    method get_output_embeddings (line 610) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 613) | def prepare_inputs_for_generation(self, input_ids, **kwargs):
    method forward (line 627) | def forward(
  class XLMForSequenceClassification (line 702) | class XLMForSequenceClassification(XLMPreTrainedModel):
    method __init__ (line 703) | def __init__(self, config):
    method forward (line 713) | def forward(
  class XLMForQuestionAnsweringSimple (line 799) | class XLMForQuestionAnsweringSimple(XLMPreTrainedModel):
    method __init__ (line 800) | def __init__(self, config):
    method forward (line 809) | def forward(
  class XLMForQuestionAnswering (line 917) | class XLMForQuestionAnswering(XLMPreTrainedModel):
    method __init__ (line 918) | def __init__(self, config):
    method forward (line 927) | def forward(
  class XLMForTokenClassification (line 1034) | class XLMForTokenClassification(XLMPreTrainedModel):
    method __init__ (line 1035) | def __init__(self, config):
    method forward (line 1046) | def forward(

FILE: code/bert-base-count3/pretrain/transformers1/modeling_xlm_roberta.py
  class XLMRobertaModel (line 62) | class XLMRobertaModel(RobertaModel):
  class XLMRobertaForMaskedLM (line 74) | class XLMRobertaForMaskedLM(RobertaForMaskedLM):
  class XLMRobertaForSequenceClassification (line 88) | class XLMRobertaForSequenceClassification(RobertaForSequenceClassificati...
  class XLMRobertaForMultipleChoice (line 102) | class XLMRobertaForMultipleChoice(RobertaForMultipleChoice):
  class XLMRobertaForTokenClassification (line 116) | class XLMRobertaForTokenClassification(RobertaForTokenClassification):

FILE: code/bert-base-count3/pretrain/transformers1/modeling_xlnet.py
  function build_tf_xlnet_to_pytorch_map (line 42) | def build_tf_xlnet_to_pytorch_map(model, config, tf_weights=None):
  function load_tf_weights_in_xlnet (line 125) | def load_tf_weights_in_xlnet(model, config, tf_path):
  class XLNetRelativeAttention (line 193) | class XLNetRelativeAttention(nn.Module):
    method __init__ (line 194) | def __init__(self, config):
    method prune_heads (line 223) | def prune_heads(self, heads):
    method rel_shift (line 227) | def rel_shift(x, klen=-1):
    method rel_shift_bnij (line 240) | def rel_shift_bnij(x, klen=-1):
    method rel_attn_core (line 254) | def rel_attn_core(self, q_head, k_head_h, v_head_h, k_head_r, seg_mat=...
    method post_attention (line 296) | def post_attention(self, h, attn_vec, residual=True):
    method forward (line 308) | def forward(self, h, g, attn_mask_h, attn_mask_g, r, seg_mat, mems=Non...
  class XLNetFeedForward (line 403) | class XLNetFeedForward(nn.Module):
    method __init__ (line 404) | def __init__(self, config):
    method forward (line 415) | def forward(self, inp):
  class XLNetLayer (line 426) | class XLNetLayer(nn.Module):
    method __init__ (line 427) | def __init__(self, config):
    method forward (line 433) | def forward(
  class XLNetPreTrainedModel (line 457) | class XLNetPreTrainedModel(PreTrainedModel):
    method _init_weights (line 466) | def _init_weights(self, module):
  class XLNetModel (line 568) | class XLNetModel(XLNetPreTrainedModel):
    method __init__ (line 569) | def __init__(self, config):
    method get_input_embeddings (line 590) | def get_input_embeddings(self):
    method set_input_embeddings (line 593) | def set_input_embeddings(self, new_embeddings):
    method _prune_heads (line 596) | def _prune_heads(self, heads_to_prune):
    method create_mask (line 599) | def create_mask(self, qlen, mlen):
    method cache_mem (line 629) | def cache_mem(self, curr_out, prev_mem):
    method positional_embedding (line 642) | def positional_embedding(pos_seq, inv_freq, bsz=None):
    method relative_positional_encoding (line 652) | def relative_positional_encoding(self, qlen, klen, bsz=None):
    method forward (line 692) | def forward(
  class XLNetLMHeadModel (line 927) | class XLNetLMHeadModel(XLNetPreTrainedModel):
    method __init__ (line 928) | def __init__(self, config):
    method get_output_embeddings (line 938) | def get_output_embeddings(self):
    method prepare_inputs_for_generation (line 941) | def prepare_inputs_for_generation(self, input_ids, past, **kwargs):
    method forward (line 975) | def forward(
  class XLNetForSequenceClassification (line 1083) | class XLNetForSequenceClassification(XLNetPreTrainedModel):
    method __init__ (line 1084) | def __init__(self, config):
    method forward (line 1095) | def forward(
  class XLNetForTokenClassification (line 1189) | class XLNetForTokenClassification(XLNetPreTrainedModel):
    method __init__ (line 1190) | def __init__(self, config):
    method forward (line 1200) | def forward(
  class XLNetForMultipleChoice (line 1298) | class XLNetForMultipleChoice(XLNetPreTrainedModel):
    method __init__ (line 1299) | def __init__(self, config):
    method forward (line 1309) | def forward(
  class XLNetForQuestionAnsweringSimple (line 1411) | class XLNetForQuestionAnsweringSimple(XLNetPreTrainedModel):
    method __init__ (line 1412) | def __init__(self, config):
    method forward (line 1422) | def forward(
  class XLNetForQuestionAnswering (line 1534) | class XLNetForQuestionAnswering(XLNetPreTrainedModel):
    method __init__ (line 1535) | def __init__(self, config):
    method forward (line 1548) | def forward(

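Note: the XLNet entries above expose the model's segment-level cache (cache_mem and the mems argument), which carries hidden states across consecutive forward passes. A minimal sketch, assuming transformers1 re-exports these classes and assuming the public "xlnet-base-cased" checkpoint (not part of this repository):

    import torch
    from transformers1 import XLNetTokenizer, XLNetLMHeadModel  # assumed re-exports

    tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")  # assumed public checkpoint
    model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

    input_ids = torch.tensor([tokenizer.encode("Hello, my dog is very cute")])
    logits, mems = model(input_ids)[:2]  # logits: (batch, seq_len, vocab); mems: cached states
    # a later segment can reuse the cache: model(next_input_ids, mems=mems)
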
FILE: code/bert-base-count3/pretrain/transformers1/optimization.py
  function get_constant_schedule (line 28) | def get_constant_schedule(optimizer, last_epoch=-1):
  function get_constant_schedule_with_warmup (line 34) | def get_constant_schedule_with_warmup(optimizer, num_warmup_steps, last_...
  function get_linear_schedule_with_warmup (line 47) | def get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_tra...
  function get_cosine_schedule_with_warmup (line 62) | def get_cosine_schedule_with_warmup(optimizer, num_warmup_steps, num_tra...
  function get_cosine_with_hard_restarts_schedule_with_warmup (line 77) | def get_cosine_with_hard_restarts_schedule_with_warmup(
  class AdamW (line 96) | class AdamW(Optimizer):
    method __init__ (line 107) | def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weig...
    method step (line 119) | def step(self, closure=None):

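For reference, optimization.py holds the standard warmup learning-rate schedulers plus AdamW with decoupled weight decay. A minimal sketch of the usual pairing (the toy model and all hyperparameter values are illustrative assumptions, not this repository's settings):

    import torch
    from transformers1 import AdamW, get_linear_schedule_with_warmup  # assumed re-exports

    model = torch.nn.Linear(10, 2)  # stand-in for a real model
    num_training_steps = 1000

    optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=num_training_steps
    )

    for step in range(num_training_steps):
        loss = model(torch.randn(4, 10)).sum()  # dummy forward pass
        loss.backward()
        optimizer.step()
        scheduler.step()  # LR ramps up for 100 steps, then decays linearly toward 0
        optimizer.zero_grad()
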
FILE: code/bert-base-count3/pretrain/transformers1/optimization_tf.py
  class WarmUp (line 23) | class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule):
    method __init__ (line 26) | def __init__(
    method __call__ (line 36) | def __call__(self, step):
    method get_config (line 51) | def get_config(self):
  function create_optimizer (line 61) | def create_optimizer(init_lr, num_train_steps, num_warmup_steps, end_lr=...
  class AdamWeightDecay (line 84) | class AdamWeightDecay(tf.keras.optimizers.Adam):
    method __init__ (line 94) | def __init__(
    method from_config (line 113) | def from_config(cls, config):
    method _prepare_local (line 118) | def _prepare_local(self, var_device, var_dtype, apply_state):
    method _decay_weights_op (line 124) | def _decay_weights_op(self, var, learning_rate, apply_state):
    method apply_gradients (line 133) | def apply_gradients(self, grads_and_vars, name=None):
    method _get_lr (line 137) | def _get_lr(self, var_device, var_dtype, apply_state):
    method _resource_apply_dense (line 150) | def _resource_apply_dense(self, grad, var, apply_state=None):
    method _resource_apply_sparse (line 156) | def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
    method get_config (line 162) | def get_config(self):
    method _do_use_weight_decay (line 167) | def _do_use_weight_decay(self, param_name):
  class GradientAccumulator (line 185) | class GradientAccumulator(object):
    method __init__ (line 197) | def __init__(self):
    method step (line 203) | def step(self):
    method gradients (line 216) | def gradients(self):
    method __call__ (line 222) | def __call__(self, gradients):
    method reset (line 248) | def reset(self):

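Note: GradientAccumulator sums gradients across micro-batches so that a large effective batch can be applied in one optimizer step; per the interface above it is called with a list of gradients, exposes the running sums as .gradients, and is cleared with .reset(). A minimal TensorFlow sketch (the toy model, data, and accumulation factor are assumptions):

    import tensorflow as tf
    from transformers1.optimization_tf import GradientAccumulator  # vendored module path

    model = tf.keras.Sequential([tf.keras.layers.Dense(2)])  # stand-in model
    model.build((None, 10))
    optimizer = tf.keras.optimizers.Adam(1e-3)
    accumulator = GradientAccumulator()
    accum_steps = 4  # assumed accumulation factor

    for micro_step in range(accum_steps):
        x = tf.random.normal((8, 10))
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(model(x) ** 2)
        grads = tape.gradient(loss, model.trainable_variables)
        accumulator(grads)  # add this micro-batch's gradients to the running sums

    # the accumulator stores sums, not means; scale the loss by 1/accum_steps for an average
    optimizer.apply_gradients(zip(accumulator.gradients, model.trainable_variables))
    accumulator.reset()  # zero the sums before the next accumulation cycle
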
FILE: code/bert-base-count3/pretrain/transformers1/pipelines.py
  function get_framework (line 69) | def get_framework(model=None):
  class ArgumentHandler (line 89) | class ArgumentHandler(ABC):
    method __call__ (line 95) | def __call__(self, *args, **kwargs):
  class DefaultArgumentHandler (line 99) | class DefaultArgumentHandler(ArgumentHandler):
    method handle_kwargs (line 105) | def handle_kwargs(kwargs: Dict) -> List:
    method handle_args (line 114) | def handle_args(args: Sequence[Any]) -> List[str]:
    method __call__ (line 140) | def __call__(self, *args, **kwargs):
  class PipelineDataFormat (line 150) | class PipelineDataFormat:
    method __init__ (line 164) | def __init__(
    method __iter__ (line 184) | def __iter__(self):
    method save (line 188) | def save(self, data: dict):
    method save_binary (line 196) | def save_binary(self, data: Union[dict, List[dict]]) -> str:
    method from_str (line 211) | def from_str(
  class CsvPipelineDataFormat (line 224) | class CsvPipelineDataFormat(PipelineDataFormat):
    method __init__ (line 225) | def __init__(
    method __iter__ (line 230) | def __iter__(self):
    method save (line 239) | def save(self, data: List[dict]):
  class JsonPipelineDataFormat (line 247) | class JsonPipelineDataFormat(PipelineDataFormat):
    method __init__ (line 248) | def __init__(
    method __iter__ (line 256) | def __iter__(self):
    method save (line 263) | def save(self, data: dict):
  class PipedPipelineDataFormat (line 268) | class PipedPipelineDataFormat(PipelineDataFormat):
    method __iter__ (line 276) | def __iter__(self):
    method save (line 292) | def save(self, data: dict):
    method save_binary (line 295) | def save_binary(self, data: Union[dict, List[dict]]) -> str:
  class _ScikitCompat (line 305) | class _ScikitCompat(ABC):
    method transform (line 311) | def transform(self, X):
    method predict (line 315) | def predict(self, X):
  class Pipeline (line 319) | class Pipeline(_ScikitCompat):
    method __init__ (line 370) | def __init__(
    method save_pretrained (line 402) | def save_pretrained(self, save_directory):
    method transform (line 415) | def transform(self, X):
    method predict (line 421) | def predict(self, X):
    method device_placement (line 428) | def device_placement(self):
    method ensure_tensor_on_device (line 449) | def ensure_tensor_on_device(self, **inputs):
    method _parse_and_tokenize (line 457) | def _parse_and_tokenize(self, *args, pad_to_max_length=True, add_speci...
    method __call__ (line 472) | def __call__(self, *args, **kwargs):
    method _forward (line 476) | def _forward(self, inputs, return_tensors=False):
  class FeatureExtractionPipeline (line 501) | class FeatureExtractionPipeline(Pipeline):
    method __init__ (line 537) | def __init__(
    method __call__ (line 558) | def __call__(self, *args, **kwargs):
  class TextGenerationPipeline (line 562) | class TextGenerationPipeline(Pipeline):
    method __call__ (line 606) | def __call__(
  class TextClassificationPipeline (line 683) | class TextClassificationPipeline(Pipeline):
    method __call__ (line 720) | def __call__(self, *args, **kwargs):
  class FillMaskPipeline (line 726) | class FillMaskPipeline(Pipeline):
    method __init__ (line 764) | def __init__(
    method __call__ (line 788) | def __call__(self, *args, **kwargs):
  class NerPipeline (line 826) | class NerPipeline(Pipeline):
    method __init__ (line 865) | def __init__(
    method __call__ (line 893) | def __call__(self, *args, **kwargs):
    method group_entities (line 973) | def group_entities(self, entities):
  class QuestionAnsweringArgumentHandler (line 993) | class QuestionAnsweringArgumentHandler(ArgumentHandler):
    method __call__ (line 1002) | def __call__(self, *args, **kwargs):
  class QuestionAnsweringPipeline (line 1055) | class QuestionAnsweringPipeline(Pipeline):
    method __init__ (line 1094) | def __init__(
    method create_sample (line 1116) | def create_sample(
    method __call__ (line 1135) | def __call__(
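
Note: the Pipeline classes above wrap a model and tokenizer behind a scikit-learn-style transform/predict interface, and the task-specific subclasses (fill-mask, NER, QA, ...) specialize __call__. A minimal sketch, assuming the module's pipeline() factory is re-exported from transformers1 as in stock transformers, with an illustrative checkpoint:

    from transformers1 import pipeline  # assumed re-export of this module's factory

    fill_mask = pipeline("fill-mask", model="bert-base-chinese")  # a FillMaskPipeline underneath
    for candidate in fill_mask("北京是中国的首[MASK]。"):
        print(candidate["sequence"], candidate["score"])
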
Condensed preview — 701 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (13,312K chars).
[
  {
    "path": "README.md",
    "chars": 6446,
    "preview": "# 0.前言\n\n决赛答辩已经过去一段时间了,我们队伍ac milan最终获得了复赛第3,决赛第4的成绩。在此首先感谢一些队友的carry~\n\n经过2个多月的比赛,学习收获了很多,也认识了很多大佬,在这里记录一下自己的参赛体验和学习收获。\n\n"
  },
  {
    "path": "code/.gitignore",
    "chars": 96,
    "preview": "bert-base-chinese/pytorch_model.bin\nnezha-cn-base/pytorch_model.bin\n.idea\n.DS_Store\n__pycache__\n"
  },
  {
    "path": "code/Config.py",
    "chars": 2464,
    "preview": "from transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \\\n    get_linear_schedule_wit"
  },
  {
    "path": "code/Dockerfile",
    "chars": 1493,
    "preview": "# Base Images\n## 从天池基础镜像构建(from的base img 根据自己的需要更换,建议使用天池open list镜像链接:https://tianchi.aliyun.com/forum/postDetail?postI"
  },
  {
    "path": "code/NEZHA/configuration_nezha.py",
    "chars": 6316,
    "preview": "\nfrom transformers import PretrainedConfig\n\nNEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP = {}\n\nclass NeZhaConfig(PretrainedConfig"
  },
  {
    "path": "code/NEZHA/modeling_nezha.py",
    "chars": 75639,
    "preview": "import math\nimport os\nimport warnings\nfrom dataclasses import dataclass\nfrom typing import Optional, Tuple\n\nimport torch"
  },
  {
    "path": "code/bert-base-chinese/config.json",
    "chars": 624,
    "preview": "{\n  \"architectures\": [\n    \"BertForMaskedLM\"\n  ],\n  \"attention_probs_dropout_prob\": 0.1,\n  \"directionality\": \"bidi\",\n  \""
  },
  {
    "path": "code/bert-base-count3/finetuning/.ipynb_checkpoints/PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb",
    "chars": 71273,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Copyright (c) Microsoft Corporation"
  },
  {
    "path": "code/bert-base-count3/finetuning/Config.py",
    "chars": 2464,
    "preview": "from transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \\\n    get_linear_schedule_wit"
  },
  {
    "path": "code/bert-base-count3/finetuning/NEZHA/configuration_nezha.py",
    "chars": 6316,
    "preview": "\nfrom transformers import PretrainedConfig\n\nNEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP = {}\n\nclass NeZhaConfig(PretrainedConfig"
  },
  {
    "path": "code/bert-base-count3/finetuning/NEZHA/modeling_nezha.py",
    "chars": 59682,
    "preview": "import math\nimport os\nimport logging\nimport torch\n\nfrom torch import nn\nfrom torch.nn import CrossEntropyLoss, MSELoss\n\n"
  },
  {
    "path": "code/bert-base-count3/finetuning/model.py",
    "chars": 31235,
    "preview": "import torch\nimport random\nimport os\nfrom torch import nn, optim\nimport torch.nn.functional as F\nfrom transformers.activ"
  },
  {
    "path": "code/bert-base-count3/finetuning/models/gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count3/finetuning/multi_gpu_QA.py",
    "chars": 8941,
    "preview": "from tqdm import tqdm, trange\nimport numpy as np\nimport pandas as pd\nimport logging\nimport torch\nimport random\nimport os"
  },
  {
    "path": "code/bert-base-count3/finetuning/utils.py",
    "chars": 9440,
    "preview": "import torch\nfrom transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \\\n    get_linear"
  },
  {
    "path": "code/bert-base-count3/pretrain/NLP_Utils.py",
    "chars": 6894,
    "preview": "import random\nimport json\nimport transformers as _\nfrom transformers1 import BertTokenizer\nimport torch\nfrom torch.utils"
  },
  {
    "path": "code/bert-base-count3/pretrain/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count3/pretrain/bert_model/gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count3/pretrain/train_bert.py",
    "chars": 1677,
    "preview": "# coding:utf-8\nimport numpy as np\nimport random\nimport os\nrandom.seed(0)\nnp.random.seed(0)#seed应该在main里尽早设置,以防万一\nos.envi"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/__init__.py",
    "chars": 17850,
    "preview": "# flake8: noqa\n# There's no way to ignore \"F401 '...' imported but unused\" warnings in this\n# module, but to preserve ot"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/__main__.py",
    "chars": 7101,
    "preview": "# coding: utf8\ndef main():\n    import sys\n    if (len(sys.argv) < 4 or len(sys.argv) > 6) or sys.argv[1] not in [\"bert\","
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/activations.py",
    "chars": 1537,
    "preview": "import logging\nimport math\n\nimport torch\nimport torch.nn.functional as F\n\n\nlogger = logging.getLogger(__name__)\n\n\ndef sw"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/another_try.py",
    "chars": 440,
    "preview": "from transformers import TFBertModel, BertTokenizer, BertConfig\nimport tensorflow as tf\n\nconfig = BertConfig.from_pretra"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/benchmark/__init__.py",
    "chars": 344,
    "preview": "# flake8: noqa\n# There's no way to ignore \"F401 '...' imported but unused\" warnings in this\n# module, but to preserve ot"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/benchmark/benchmark.py",
    "chars": 6120,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserve"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/benchmark/benchmark_args.py",
    "chars": 2325,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserve"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/benchmark/benchmark_args_utils.py",
    "chars": 4119,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserve"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/benchmark/benchmark_utils.py",
    "chars": 27794,
    "preview": "\"\"\"\nUtilities for working with the local dataset cache.\nThis file is adapted from the AllenNLP library at https://github"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/benchmark_utils.py",
    "chars": 15355,
    "preview": "\"\"\"\nUtilities for working with the local dataset cache.\nThis file is adapted from the AllenNLP library at https://github"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/__init__.py",
    "chars": 316,
    "preview": "from abc import ABC, abstractmethod\nfrom argparse import ArgumentParser\n\n\nclass BaseTransformersCLICommand(ABC):\n    @st"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/convert.py",
    "chars": 7149,
    "preview": "from argparse import ArgumentParser, Namespace\nfrom logging import getLogger\n\nfrom transformers.commands import BaseTran"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/download.py",
    "chars": 1272,
    "preview": "from argparse import ArgumentParser\n\nfrom transformers.commands import BaseTransformersCLICommand\n\n\ndef download_command"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/env.py",
    "chars": 2028,
    "preview": "import platform\nfrom argparse import ArgumentParser\n\nfrom transformers import __version__ as version\nfrom transformers i"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/run.py",
    "chars": 3673,
    "preview": "import logging\nfrom argparse import ArgumentParser\n\nfrom transformers.commands import BaseTransformersCLICommand\nfrom tr"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/serving.py",
    "chars": 7462,
    "preview": "import logging\nfrom argparse import ArgumentParser, Namespace\nfrom typing import Any, List, Optional\n\nfrom transformers "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/train.py",
    "chars": 5831,
    "preview": "import os\nfrom argparse import ArgumentParser, Namespace\nfrom logging import getLogger\n\nfrom transformers import SingleS"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/transformers_cli.py",
    "chars": 1170,
    "preview": "#!/usr/bin/env python\nfrom argparse import ArgumentParser\n\nfrom transformers.commands.convert import ConvertCommand\nfrom"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/commands/user.py",
    "chars": 9149,
    "preview": "import os\nimport sys\nfrom argparse import ArgumentParser\nfrom getpass import getpass\nfrom typing import List, Union\n\nfro"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_albert.py",
    "chars": 7513,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_auto.py",
    "chars": 11554,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_bart.py",
    "chars": 5218,
    "preview": "# coding=utf-8\n# Copyright 2020 The Fairseq Authors and The HuggingFace Inc. team.\n#\n# Licensed under the Apache License"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_bert.py",
    "chars": 8565,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_camembert.py",
    "chars": 1526,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_ctrl.py",
    "chars": 5543,
    "preview": "# coding=utf-8\n# Copyright 2018 Salesforce and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rig"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_distilbert.py",
    "chars": 6790,
    "preview": "# coding=utf-8\n# Copyright 2019-present, the HuggingFace Inc. team, The Google AI Language Team and Facebook, Inc.\n#\n# L"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_electra.py",
    "chars": 6587,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_encoder_decoder.py",
    "chars": 4294,
    "preview": "# coding=utf-8\n# Copyright 2020 The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserve"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_flaubert.py",
    "chars": 9895,
    "preview": "# coding=utf-8\n# Copyright 2019-present CNRS, Facebook Inc. and the HuggingFace Inc. team.\n#\n# Licensed under the Apache"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_gpt2.py",
    "chars": 8237,
    "preview": "# coding=utf-8\n# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORAT"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_longformer.py",
    "chars": 3395,
    "preview": "# coding=utf-8\n# Copyright 2020 The Allen Institute for AI team and The HuggingFace Inc. team.\n#\n# Licensed under the Ap"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_marian.py",
    "chars": 946,
    "preview": "# coding=utf-8\n# Copyright 2020 The OPUS-NMT Team, Marian team, and The HuggingFace Inc. team.\n#\n# Licensed under the Ap"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_mmbt.py",
    "chars": 1531,
    "preview": "# coding=utf-8\n# Copyright (c) Facebook, Inc. and its affiliates.\n# Copyright (c) HuggingFace Inc. team.\n#\n# Licensed un"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_openai.py",
    "chars": 8062,
    "preview": "# coding=utf-8\n# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORAT"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_reformer.py",
    "chars": 13431,
    "preview": "# coding=utf-8\n# Copyright 2020 The Trax Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_roberta.py",
    "chars": 3152,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_t5.py",
    "chars": 4523,
    "preview": "# coding=utf-8\n# Copyright 2010, The T5 Authors and HuggingFace Inc.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_transfo_xl.py",
    "chars": 8472,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_utils.py",
    "chars": 18996,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_xlm.py",
    "chars": 12703,
    "preview": "# coding=utf-8\n# Copyright 2019-present, Facebook, Inc and the HuggingFace Inc. team.\n#\n# Licensed under the Apache Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_xlm_roberta.py",
    "chars": 2003,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/configuration_xlnet.py",
    "chars": 10275,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_albert_original_tf_checkpoint_to_pytorch.py",
    "chars": 2163,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_bart_original_pytorch_checkpoint_to_pytorch.py",
    "chars": 6080,
    "preview": "# coding=utf-8\n# Copyright 2020 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_bert_original_tf_checkpoint_to_pytorch.py",
    "chars": 2139,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_bert_pytorch_checkpoint_to_original_tf.py",
    "chars": 4115,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_dialogpt_original_pytorch_checkpoint_to_pytorch.py",
    "chars": 923,
    "preview": "import argparse\nimport os\n\nimport torch\n\nfrom transformers.file_utils import WEIGHTS_NAME\n\n\nDIALOGPT_MODELS = [\"small\", "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_electra_original_tf_checkpoint_to_pytorch.py",
    "chars": 2853,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_gpt2_original_tf_checkpoint_to_pytorch.py",
    "chars": 2507,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_graph_to_onnx.py",
    "chars": 8095,
    "preview": "from argparse import ArgumentParser\nfrom os import listdir, makedirs\nfrom os.path import abspath, dirname, exists\nfrom t"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_longformer_original_pytorch_lightning_to_pytorch.py",
    "chars": 3037,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_marian_to_pytorch.py",
    "chars": 20856,
    "preview": "import argparse\nimport json\nimport os\nimport shutil\nimport warnings\nfrom pathlib import Path\nfrom typing import Dict, Li"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_openai_original_tf_checkpoint_to_pytorch.py",
    "chars": 2641,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_pytorch_checkpoint_to_tf2.py",
    "chars": 14504,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_reformer_trax_checkpoint_to_pytorch.py",
    "chars": 7729,
    "preview": "# coding=utf-8\n# Copyright 2020 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_roberta_original_pytorch_checkpoint_to_pytorch.py",
    "chars": 7917,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_t5_original_tf_checkpoint_to_pytorch.py",
    "chars": 2100,
    "preview": "# coding=utf-8\n# Copyright 2018 The T5 authors and HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_transfo_xl_original_tf_checkpoint_to_pytorch.py",
    "chars": 4913,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_xlm_original_pytorch_checkpoint_to_pytorch.py",
    "chars": 2970,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/convert_xlnet_original_tf_checkpoint_to_pytorch.py",
    "chars": 3685,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/__init__.py",
    "chars": 739,
    "preview": "# flake8: noqa\n# There's no way to ignore \"F401 '...' imported but unused\" warnings in this\n# module, but to preserve ot"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/data_collator.py",
    "chars": 28404,
    "preview": "from abc import ABC, abstractmethod\nfrom dataclasses import dataclass\nfrom typing import Any, Dict, List, NewType, Tuple"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/datasets/__init__.py",
    "chars": 294,
    "preview": "# flake8: noqa\n# There's no way to ignore \"F401 '...' imported but unused\" warnings in this\n# module, but to preserve ot"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/datasets/glue.py",
    "chars": 5140,
    "preview": "import logging\nimport os\nimport time\nfrom dataclasses import dataclass, field\nfrom enum import Enum\nfrom typing import L"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/datasets/language_modeling.py",
    "chars": 3886,
    "preview": "import logging\nimport os\nimport pickle\nimport time\n\nimport torch\nfrom filelock import FileLock\nfrom torch.utils.data.dat"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/metrics/__init__.py",
    "chars": 3005,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/metrics/squad_metrics.py",
    "chars": 29002,
    "preview": "\"\"\" Very heavily inspired by the official evaluation script for SQuAD version 2.0 which was\nmodified by XLNet authors to"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/processors/__init__.py",
    "chars": 578,
    "preview": "# flake8: noqa\n# There's no way to ignore \"F401 '...' imported but unused\" warnings in this\n# module, but to preserve ot"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/processors/glue.py",
    "chars": 21508,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/processors/squad.py",
    "chars": 29147,
    "preview": "import json\nimport logging\nimport os\nfrom functools import partial\nfrom multiprocessing import Pool, cpu_count\n\nimport n"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/processors/utils.py",
    "chars": 13847,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/data/processors/xnli.py",
    "chars": 2971,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/file.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/file_utils.py",
    "chars": 18037,
    "preview": "\"\"\"\nUtilities for working with the local dataset cache.\nThis file is adapted from the AllenNLP library at https://github"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/filep.py",
    "chars": 1009,
    "preview": "from transformers import GPT2LMHeadModel, GPT2Tokenizer\nimport torch\n\ntokenizer = GPT2Tokenizer.from_pretrained(\"gpt2\")\n"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/hf_api.py",
    "chars": 7909,
    "preview": "# coding=utf-8\n# Copyright 2019-present, the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 ("
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/hf_argparser.py",
    "chars": 7125,
    "preview": "import dataclasses\nimport json\nimport sys\nfrom argparse import ArgumentParser\nfrom enum import Enum\nfrom pathlib import "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modelcard.py",
    "chars": 10123,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_albert.py",
    "chars": 49943,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and the HuggingFace Inc. team.\n#\n# Licensed under the Apache Lic"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_auto.py",
    "chars": 71001,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_bart.py",
    "chars": 48981,
    "preview": "# coding=utf-8\n# Copyright 2020 The Facebook AI Research Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_beam_search.py",
    "chars": 10385,
    "preview": "# coding=utf-8\n# Copyright (c) 2019 Yang Liu\n\n# Permission is hereby granted, free of charge, to any person obtaining a "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_bert.py",
    "chars": 67451,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_camembert.py",
    "chars": 4943,
    "preview": "# coding=utf-8\n# Copyright 2019 Inria, Facebook AI Research and the HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_ctrl.py",
    "chars": 24473,
    "preview": "# coding=utf-8\n# Copyright 2018 Salesforce and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rig"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_distilbert.py",
    "chars": 38330,
    "preview": "# coding=utf-8\n# Copyright 2019-present, the HuggingFace Inc. team, The Google AI Language Team and Facebook, Inc.\n#\n# L"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_electra.py",
    "chars": 32789,
    "preview": "import logging\nimport os\n\nimport torch\nimport torch.nn as nn\nfrom torch.nn import CrossEntropyLoss, MSELoss\n\nfrom .activ"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_encoder_decoder.py",
    "chars": 17750,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_flaubert.py",
    "chars": 16120,
    "preview": "# coding=utf-8\n# Copyright 2019-present CNRS, Facebook Inc. and the HuggingFace Inc. team.\n#\n# Licensed under the Apache"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_gpt2.py",
    "chars": 34640,
    "preview": "# coding=utf-8\n# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORAT"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_longformer.py",
    "chars": 62348,
    "preview": "# coding=utf-8\n# Copyright 2020 The Allen Institute for AI team and The HuggingFace Inc. team.\n#\n# Licensed under the Ap"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_marian.py",
    "chars": 2232,
    "preview": "# coding=utf-8\n# Copyright 2020 Marian Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under the Apache License"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_mmbt.py",
    "chars": 18339,
    "preview": "# coding=utf-8\n# Copyright (c) Facebook, Inc. and its affiliates.\n# Copyright (c) HuggingFace Inc. team.\n#\n# Licensed un"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_openai.py",
    "chars": 31665,
    "preview": "# coding=utf-8\n# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORAT"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_reformer.py",
    "chars": 74861,
    "preview": "# coding=utf-8\n# Copyright 2020 The Trax Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_roberta.py",
    "chars": 31410,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_t5.py",
    "chars": 53884,
    "preview": "# coding=utf-8\n# Copyright 2018 Mesh TensorFlow authors, T5 Authors and HuggingFace Inc. team.\n#\n# Licensed under the Ap"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_albert.py",
    "chars": 51018,
    "preview": "# coding=utf-8\n# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORAT"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_auto.py",
    "chars": 77465,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_bert.py",
    "chars": 55544,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_camembert.py",
    "chars": 4580,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_ctrl.py",
    "chars": 27239,
    "preview": "# coding=utf-8\n# Copyright 2018 Salesforce and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rig"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_distilbert.py",
    "chars": 38814,
    "preview": "# coding=utf-8\n# Copyright 2019-present, the HuggingFace Inc. team, The Google AI Language Team and Facebook, Inc.\n#\n# L"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_electra.py",
    "chars": 28235,
    "preview": "import logging\n\nimport tensorflow as tf\n\nfrom transformers import ElectraConfig\n\nfrom .file_utils import add_start_docst"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_flaubert.py",
    "chars": 15628,
    "preview": "# coding=utf-8\n# Copyright 2019-present, Facebook, Inc and the HuggingFace Inc. team.\n#\n# Licensed under the Apache Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_gpt2.py",
    "chars": 34194,
    "preview": "# coding=utf-8\n# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORAT"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_openai.py",
    "chars": 30606,
    "preview": "# coding=utf-8\n# Copyright 2018 The OpenAI Team Authors and HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORAT"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_pytorch_utils.py",
    "chars": 12952,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_roberta.py",
    "chars": 24964,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_t5.py",
    "chars": 53109,
    "preview": "# coding=utf-8\n# Copyright 2018 T5 Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_transfo_xl.py",
    "chars": 35690,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_transfo_xl_utilities.py",
    "chars": 7702,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_utils.py",
    "chars": 84368,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n# Copyright (c) 2018,"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_xlm.py",
    "chars": 38799,
    "preview": "# coding=utf-8\n# Copyright 2019-present, Facebook, Inc and the HuggingFace Inc. team.\n#\n# Licensed under the Apache Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_xlm_roberta.py",
    "chars": 4604,
    "preview": "# coding=utf-8\n# Copyright 2019 Facebook AI Research and the HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORA"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_tf_xlnet.py",
    "chars": 60484,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_transfo_xl.py",
    "chars": 39919,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_transfo_xl_utilities.py",
    "chars": 10697,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_utils.py",
    "chars": 108860,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors, Facebook AI Research authors and The HuggingFace In"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_xlm.py",
    "chars": 50898,
    "preview": "# coding=utf-8\n# Copyright 2019-present, Facebook, Inc and the HuggingFace Inc. team.\n#\n# Licensed under the Apache Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_xlm_roberta.py",
    "chars": 4498,
    "preview": "# coding=utf-8\n# Copyright 2019 Facebook AI Research and the HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORA"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/modeling_xlnet.py",
    "chars": 79879,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/optimization.py",
    "chars": 7691,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under th"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/optimization_tf.py",
    "chars": 10486,
    "preview": "# Copyright 2019 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/pipelines.py",
    "chars": 79703,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_albert.py",
    "chars": 14164,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and the HuggingFace Inc. team.\n#\n# Licensed under the Apache Lic"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_auto.py",
    "chars": 10665,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_bart.py",
    "chars": 1920,
    "preview": "# coding=utf-8\n# Copyright 2020 The Facebook AI Research Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_bert.py",
    "chars": 30615,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under th"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_bert_japanese.py",
    "chars": 10058,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under th"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_camembert.py",
    "chars": 12124,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_ctrl.py",
    "chars": 8553,
    "preview": "# coding=utf-8\n# Copyright 2018 Salesforce and The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_distilbert.py",
    "chars": 3815,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_electra.py",
    "chars": 3750,
    "preview": "# coding=utf-8\n# Copyright 2020 The Google AI Team, Stanford University and The HuggingFace Inc. team.\n#\n# Licensed unde"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_flaubert.py",
    "chars": 5615,
    "preview": "# coding=utf-8\n# Copyright 2019-present CNRS, Facebook Inc. and the HuggingFace Inc. team.\n#\n# Licensed under the Apache"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_gpt2.py",
    "chars": 13497,
    "preview": "# coding=utf-8\n# Copyright 2018 The Open AI Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under the Apache Li"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_longformer.py",
    "chars": 2260,
    "preview": "# coding=utf-8\n# Copyright 2020 The Allen Institute for AI team and The HuggingFace Inc. team.\n#\n# Licensed under the Ap"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_marian.py",
    "chars": 9766,
    "preview": "import json\nimport re\nimport warnings\nfrom pathlib import Path\nfrom shutil import copyfile\nfrom typing import Dict, List"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_openai.py",
    "chars": 9806,
    "preview": "# coding=utf-8\n# Copyright 2018 The Open AI Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under the Apache Li"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_reformer.py",
    "chars": 6754,
    "preview": "# coding=utf-8\n# Copyright 2020 The Trax Authors and The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, V"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_roberta.py",
    "chars": 15719,
    "preview": "# coding=utf-8\n# Copyright 2018 The Open AI Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under the Apache Li"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_t5.py",
    "chars": 8346,
    "preview": "# coding=utf-8\n# Copyright 2018 T5 Authors and HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_transfo_xl.py",
    "chars": 28414,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_utils.py",
    "chars": 133366,
    "preview": "# coding=utf-8\n# Copyright 2020 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_xlm.py",
    "chars": 35404,
    "preview": "# coding=utf-8\n# Copyright 2019 The Open AI Team Authors and The HuggingFace Inc. team.\n#\n# Licensed under the Apache Li"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_xlm_roberta.py",
    "chars": 13660,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/tokenization_xlnet.py",
    "chars": 13909,
    "preview": "# coding=utf-8\n# Copyright 2018 Google AI, Google Brain and Carnegie Mellon University Authors and the HuggingFace Inc. "
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/trainer.py",
    "chars": 32725,
    "preview": "import json\nimport logging\nimport math\nimport os\nimport random\nimport re\nimport shutil\nfrom contextlib import contextman"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/trainer_tf.py",
    "chars": 17414,
    "preview": "\"\"\"Tensorflow trainer class.\"\"\"\n\nimport logging\nimport math\nimport os\nfrom typing import Callable, Dict, Optional\n\nimpor"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/trainer_utils.py",
    "chars": 516,
    "preview": "from typing import Dict, NamedTuple, Optional\n\nimport numpy as np\n\n\nclass EvalPrediction(NamedTuple):\n    \"\"\"\n    Evalua"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/training_args.py",
    "chars": 7917,
    "preview": "import dataclasses\nimport json\nimport logging\nfrom dataclasses import dataclass, field\nfrom typing import Any, Dict, Opt"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/training_args_tf.py",
    "chars": 3080,
    "preview": "import logging\nfrom dataclasses import dataclass, field\nfrom typing import Tuple\n\nfrom .file_utils import cached_propert"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/try.py",
    "chars": 720,
    "preview": "from transformers import TFAlbertForMaskedLM, TFAlbertModel, TFAlbertForSequenceClassification, AlbertForMaskedLM\nimport"
  },
  {
    "path": "code/bert-base-count3/pretrain/transformers1/utils_encoder_decoder.py",
    "chars": 1862,
    "preview": "# coding=utf-8\n# Copyright 2020 The HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Lice"
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/.ipynb_checkpoints/PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb",
    "chars": 71273,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Copyright (c) Microsoft Corporation"
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/Config.py",
    "chars": 2464,
    "preview": "from transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \\\n    get_linear_schedule_wit"
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/NEZHA/configuration_nezha.py",
    "chars": 6316,
    "preview": "\nfrom transformers import PretrainedConfig\n\nNEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP = {}\n\nclass NeZhaConfig(PretrainedConfig"
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/NEZHA/modeling_nezha.py",
    "chars": 59682,
    "preview": "import math\nimport os\nimport logging\nimport torch\n\nfrom torch import nn\nfrom torch.nn import CrossEntropyLoss, MSELoss\n\n"
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/model.py",
    "chars": 31235,
    "preview": "import torch\nimport random\nimport os\nfrom torch import nn, optim\nimport torch.nn.functional as F\nfrom transformers.activ"
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/models/gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/multi_gpu_QA.py",
    "chars": 8961,
    "preview": "from tqdm import tqdm, trange\nimport numpy as np\nimport pandas as pd\nimport logging\nimport torch\nimport random\nimport os"
  },
  {
    "path": "code/bert-base-count3-len100/finetuning/utils.py",
    "chars": 9440,
    "preview": "import torch\nfrom transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \\\n    get_linear"
  },
  {
    "path": "code/bert-base-count5/finetuning/.ipynb_checkpoints/PyTorch_Bert-Squad_OnnxRuntime_GPU-checkpoint.ipynb",
    "chars": 71273,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"Copyright (c) Microsoft Corporation"
  },
  {
    "path": "code/bert-base-count5/finetuning/Config.py",
    "chars": 2464,
    "preview": "from transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \\\n    get_linear_schedule_wit"
  },
  {
    "path": "code/bert-base-count5/finetuning/NEZHA/configuration_nezha.py",
    "chars": 6316,
    "preview": "\nfrom transformers import PretrainedConfig\n\nNEZHA_PRETRAINED_CONFIG_ARCHIVE_MAP = {}\n\nclass NeZhaConfig(PretrainedConfig"
  },
  {
    "path": "code/bert-base-count5/finetuning/NEZHA/modeling_nezha.py",
    "chars": 59682,
    "preview": "import math\nimport os\nimport logging\nimport torch\n\nfrom torch import nn\nfrom torch.nn import CrossEntropyLoss, MSELoss\n\n"
  },
  {
    "path": "code/bert-base-count5/finetuning/model.py",
    "chars": 31235,
    "preview": "import torch\nimport random\nimport os\nfrom torch import nn, optim\nimport torch.nn.functional as F\nfrom transformers.activ"
  },
  {
    "path": "code/bert-base-count5/finetuning/models/gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count5/finetuning/multi_gpu_QA.py",
    "chars": 8942,
    "preview": "from tqdm import tqdm, trange\nimport numpy as np\nimport pandas as pd\nimport logging\nimport torch\nimport random\nimport os"
  },
  {
    "path": "code/bert-base-count5/finetuning/utils.py",
    "chars": 9440,
    "preview": "import torch\nfrom transformers import BertTokenizer, AdamW, BertModel, BertPreTrainedModel, BertConfig, \\\n    get_linear"
  },
  {
    "path": "code/bert-base-count5/pretrain/NLP_Utils.py",
    "chars": 6894,
    "preview": "import random\nimport json\nimport transformers as _\nfrom transformers1 import BertTokenizer\nimport torch\nfrom torch.utils"
  },
  {
    "path": "code/bert-base-count5/pretrain/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count5/pretrain/bert_model/gitkeep",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/bert-base-count5/pretrain/train_bert.py",
    "chars": 1678,
    "preview": "# coding:utf-8\nimport numpy as np\nimport random\nimport os\nrandom.seed(0)\nnp.random.seed(0)#seed应该在main里尽早设置,以防万一\nos.envi"
  },
  {
    "path": "code/bert-base-count5/pretrain/transformers1/__init__.py",
    "chars": 17850,
    "preview": "# flake8: noqa\n# There's no way to ignore \"F401 '...' imported but unused\" warnings in this\n# module, but to preserve ot"
  },
  {
    "path": "code/bert-base-count5/pretrain/transformers1/__main__.py",
    "chars": 7101,
    "preview": "# coding: utf8\ndef main():\n    import sys\n    if (len(sys.argv) < 4 or len(sys.argv) > 6) or sys.argv[1] not in [\"bert\","
  },
  {
    "path": "code/bert-base-count5/pretrain/transformers1/activations.py",
    "chars": 1537,
    "preview": "import logging\nimport math\n\nimport torch\nimport torch.nn.functional as F\n\n\nlogger = logging.getLogger(__name__)\n\n\ndef sw"
  },
  {
    "path": "code/bert-base-count5/pretrain/transformers1/another_try.py",
    "chars": 440,
    "preview": "from transformers import TFBertModel, BertTokenizer, BertConfig\nimport tensorflow as tf\n\nconfig = BertConfig.from_pretra"
  },
  {
    "path": "code/bert-base-count5/pretrain/transformers1/benchmark/__init__.py",
    "chars": 344,
    "preview": "# flake8: noqa\n# There's no way to ignore \"F401 '...' imported but unused\" warnings in this\n# module, but to preserve ot"
  },
  {
    "path": "code/bert-base-count5/pretrain/transformers1/benchmark/benchmark.py",
    "chars": 6120,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserve"
  },
  {
    "path": "code/bert-base-count5/pretrain/transformers1/benchmark/benchmark_args.py",
    "chars": 2325,
    "preview": "# coding=utf-8\n# Copyright 2018 The HuggingFace Inc. team.\n# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserve"
  }
]

// ... and 501 more files (download for full content)

About this extraction

This page contains the full source code of the daniellibin/gaiic2021_track3_querySim GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 701 files (12.2 MB), approximately 3.2M tokens, and a symbol index with 11552 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract, a free GitHub-repository-to-text converter for AI tools. Built by Nikandr Surkov.
