[
  {
    "path": "README.md",
    "content": "# albert_zh\n\nAn Implementation of <a href=\"https://arxiv.org/pdf/1909.11942.pdf\">A Lite Bert For Self-Supervised Learning Language Representations</a> with TensorFlow\n\nALBert is based on Bert, but with some improvements. It achieves state of the art performance on main benchmarks with 30% parameters less. \n\nFor albert_base_zh it only has ten percentage parameters compare of original bert model, and main accuracy is retained. \n\n\nDifferent version of ALBERT pre-trained model for Chinese, including TensorFlow, PyTorch and Keras, is available now.\n\n海量中文语料上预训练ALBERT模型：参数更少，效果更好。预训练小模型也能拿下13项NLP任务，ALBERT三大改造登顶GLUE基准\n\n<a href='https://www.cluebenchmarks.com/clueai.html'>clueai工具包: 三行代码，三分钟定制一个NLP的API（零样本学习）</a>\n\n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_tiny_compare_s.jpg\"  width=\"90%\" height=\"70%\" />\n\n一键运行10个数据集、9个基线模型、不同任务上模型效果的详细对比，见<a href=\"http://www.CLUEbenchmarks.com\">CLUE benchmark</a>\n\n一键运行CLUE中文任务：6个中文分类或句子对任务（新）\n---------------------------------------------------------------------\n    使用方式：\n    1、克隆项目\n       git clone https://github.com/brightmart/albert_zh.git\n    2、运行一键运行脚本(GPU方式): 会自动下载模型和所有任务数据并开始运行。\n       bash run_classifier_clue.sh\n       执行该一键运行脚本将会自动下载所有任务数据，并为所有任务找到最优模型，然后测试得到提交结果\n    \n\n模型下载 Download Pre-trained Models of Chinese\n-----------------------------------------------\n1、<a href=\"https://storage.googleapis.com/albert_zh/albert_tiny.zip\">albert_tiny_zh</a>, <a href=\"https://storage.googleapis.com/albert_zh/albert_tiny_489k.zip\">albert_tiny_zh(训练更久，累积学习20亿个样本)</a>，文件大小16M、参数为4M\n\n    训练和推理预测速度提升约10倍，精度基本保留，模型大小为bert的1/25；语义相似度数据集LCQMC测试集上达到85.4%，相比bert_base仅下降1.5个点。\n\n    lcqmc训练使用如下参数： --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4   --num_train_epochs=5 \n    \n    albert_tiny使用同样的大规模中文语料数据，层数仅为4层、hidden size等向量维度大幅减少; 尝试使用如下学习率来获得更好效果：{2e-5, 6e-5, 1e-4} \n    \n    
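Use cases: relatively simple tasks or tasks with strict real-time requirements, such as sentence-pair tasks like semantic similarity and classification tasks; for harder tasks such as reading comprehension, use one of the larger models.\n\n     For example, you can deploy it on mobile devices with [Tensorflow Lite](https://www.tensorflow.org/lite); this is covered [later in this document](#use_tflite), including how to convert the model to Tensorflow Lite format and how to benchmark its performance.\n     \n     One-command run of albert_tiny_zh (Linux, LCQMC task):\n     1) git clone https://github.com/brightmart/albert_zh\n     2) cd albert_zh\n     3) bash run_classifier_lcqmc.sh\n1.1、<a href=\"https://storage.googleapis.com/albert_zh/albert_tiny_zh_google.zip\">albert_tiny_google_zh (1 billion training examples seen in total, Google version)</a>, model size 16M, same performance as albert_tiny_zh\n\n1.2、<a href=\"https://storage.googleapis.com/albert_zh/albert_small_zh_google.zip\">albert_small_google_zh (1 billion training examples seen in total, Google version)</a>\n     \n     4x faster than bert_base; only 0.9 points below BERT on the LCQMC test set; model size is 18.5M after removing the Adam optimizer variables; for usage, see the section \"Fine-tuning on Downstream Task\"     \n     \n2、<a href=\"https://storage.googleapis.com/albert_zh/albert_large_zh.zip\">albert_large_zh</a>, 24 layers, file size 64M\n   \n    Parameter count and model size are one sixth of bert_base; 0.2 points above bert_base on the test set of LCQMC, a similarity dataset of colloquial descriptions\n\n3、<a href=\"https://storage.googleapis.com/albert_zh/albert_base_zh_additional_36k_steps.zip\">albert_base_zh (trained on an additional 150 million instances, i.e. 36k steps * batch_size 4096)</a>; <a href=\"https://storage.googleapis.com/albert_zh/albert_base_zh.zip\"> albert_base_zh (small-model trial version)</a>, 12M parameters, 12 layers, size 40M\n\n    Parameter count is one tenth of bert_base, and so is the model size; about 0.6~1 points below bert_base on the LCQMC test set;\n    compared with no pre-training, albert_base improves by 14 points\n\n4、<a href=\"https://storage.googleapis.com/albert_zh/albert_xlarge_zh_177k.zip\">albert_xlarge_zh_177k </a>; \n<a href=\"https://storage.googleapis.com/albert_zh/albert_xlarge_zh_183k.zip\">albert_xlarge_zh_183k (try this first)</a>, 24 layers, file size 230M\n   \n    Parameter count and model size are half of bert_base; requires a GPU with large memory; a full test comparison will be added later; batch_size should not be too small, otherwise accuracy may suffer\n\n### Quick Loading\nWith [Huggingface-Transformers 2.2.2](https://github.com/huggingface/transformers), the models above can be loaded easily.\n```\nfrom transformers import AutoModel, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"MODEL_NAME\")\nmodel = AutoModel.from_pretrained(\"MODEL_NAME\")\n```\n\nThe values of `MODEL_NAME` are listed below:\n\n| Model | MODEL_NAME |\n| - | - |\n| 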
albert_tiny_google_zh | voidful/albert_chinese_tiny |\n| albert_small_google_zh | voidful/albert_chinese_small  |\n| albert_base_zh (from google) | voidful/albert_chinese_base   |\n| albert_large_zh (from google) | voidful/albert_chinese_large   |\n| albert_xlarge_zh (from google) | voidful/albert_chinese_xlarge   |\n| albert_xxlarge_zh (from google) | voidful/albert_chinese_xxlarge   |\n\nMore <a href='https://huggingface.co/models?search=albert_chinese'>examples</a> of using albert through transformers\n\nPre-training\n-----------------------------------------------\n\n#### Generate tfrecords Files\n\nRun the following command. The project ships with a sample text file (data/news_zh_1.txt).\n   \n       bash create_pretrain_data.sh\n   \nIf you have many text files, you can pass them in as arguments to generate multiple tfrecords files.\n\n###### Support English and Other Non-Chinese Languages: \n    If you are pre-training for English or another non-Chinese language, \n    set the non_chinese hyperparameter to True in create_pretraining_data.py; \n    otherwise, by default, it pre-trains for Chinese using Chinese whole word masking.\n\n#### Run pre-training on GPU/TPU with the following commands\n    GPU (brightmart version, tiny model):\n    export BERT_BASE_DIR=./albert_tiny_zh\n    nohup python3 run_pretraining.py --input_file=./data/tf*.tfrecord  \\\n    --output_dir=./my_new_model_path --do_train=True --do_eval=True --bert_config_file=$BERT_BASE_DIR/albert_config_tiny.json \\\n    --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=51 \\\n    --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176    \\\n    --save_checkpoints_steps=2000  --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &\n    \n    GPU (Google version, small model):\n    export BERT_BASE_DIR=./albert_small_zh_google\n    nohup python3 run_pretraining_google.py --input_file=./data/tf*.tfrecord --eval_batch_size=64 \\\n    --output_dir=./my_new_model_path --do_train=True --do_eval=True 
--albert_config_file=$BERT_BASE_DIR/albert_config_small_google.json  --export_dir=./my_new_model_path_export \\\n    --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=20 \\\n    --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176   \\\n    --save_checkpoints_steps=2000 --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt\n    \n    TPU, add something like this:\n        --use_tpu=True  --tpu_name=grpc://10.240.1.66:8470 --tpu_zone=us-central1-a\n        \n    Note: if you train from scratch, you can leave init_checkpoint unset;\n    if you continue training from an existing model, set BERT_BASE_DIR and make sure that the bert_config_file and init_checkpoint arguments point to the corresponding files;\n    for domain-specific pre-training, depending on the size of your data, you may not need to train for very long.\n\nEnvironment\n-----------------------------------------------\nUse Python3 + Tensorflow 1.x \n\ne.g. Tensorflow 1.4 or 1.5\n\n\nFine-tuning on Downstream Task\n-----------------------------------------------\n##### Using TensorFlow:\n\nWe use albert_base on the LCQMC task as an example. LCQMC is a corpus of colloquial descriptions, used to train and predict the semantic similarity of a pair of sentences.\n\nDownload the <a href=\"https://drive.google.com/open?id=1HXYMqsXjmA5uIfu_SFqP7r_vZZG-m_H0\">LCQMC</a> dataset, which contains training, validation and test sets; the training set contains 240,000 colloquial Chinese sentence pairs labeled 1 or 0, where 1 means the sentences are semantically similar and 0 means they are not.\n\nRun the following commands to fine-tune on the LCQMC dataset:\n    \n    1. Clone this project:\n          \n          git clone https://github.com/brightmart/albert_zh.git\n          \n    2. 
Fine-tune by running the following command.\n        # brightmart version, tiny model\n        export BERT_BASE_DIR=./albert_tiny_zh\n        export TEXT_DIR=./lcqmc\n        nohup python3 run_classifier.py   --task_name=lcqmc_pair   --do_train=true   --do_eval=true   --data_dir=$TEXT_DIR   --vocab_file=./albert_config/vocab.txt  \\\n        --bert_config_file=./albert_config/albert_config_tiny.json --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4  --num_train_epochs=5 \\\n        --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &\n        \n        # Google version, small model\n        export BERT_BASE_DIR=./albert_small_zh\n        export TEXT_DIR=./lcqmc\n        nohup python3 run_classifier_sp_google.py --task_name=lcqmc_pair   --do_train=true   --do_eval=true   --data_dir=$TEXT_DIR   --vocab_file=./albert_config/vocab.txt  \\\n        --albert_config_file=$BERT_BASE_DIR/albert_config_small_google.json --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4   --num_train_epochs=5 \\\n        --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &\n\n    Notes:\n        1) You need to download a pre-trained Chinese ALBERT model and place it in the project directory (the directory name is assumed to be albert_tiny_zh); \n        you also need to download the LCQMC dataset and place it in the project directory\n        (the dataset directory name is assumed to be lcqmc).\n\n        2) For fine-tuning, you can try adding a small amount of dropout (e.g. 0.1) by changing the \n          attention_probs_dropout_prob & hidden_dropout_prob parameters in albert_config_xxx.json. By default, dropout is set to zero. 
\n        \n        3) You can try different learning rates {2e-5, 6e-5, 1e-4} for better performance. \n\n\nUpdates\n-----------------------------------------------\n**\\*\\*\\*\\*\\* 2019-11-03: added Google versions of albert_small and albert_tiny; added a method to deploy albert_tiny to mobile devices with only 0.1 second inference time for sequence length 128 and 60M memory \\*\\*\\*\\*\\***\n\n**\\*\\*\\*\\*\\* 2019-10-30: added a simple guide on converting the model to Tensorflow Lite for edge deployment \\*\\*\\*\\*\\***\n\n**\\*\\*\\*\\*\\* 2019-10-15: albert_tiny_zh, 10 times faster than bert_base for training and inference, with accuracy largely retained \\*\\*\\*\\*\\***\n\n**\\*\\*\\*\\*\\* 2019-10-07: more albert models \\*\\*\\*\\*\\***\n\nAdded albert_xlarge_zh and albert_base_zh_additional_steps, trained on more instances\n\n**\\*\\*\\*\\*\\* 2019-10-04: PyTorch and Keras versions of albert are supported \\*\\*\\*\\*\\***\n\na. Convert to the PyTorch version and run your tasks through <a href=\"https://github.com/lonePatient/albert_pytorch\">albert_pytorch</a>\n\nb. Load the pre-trained model with Keras in one line of code through <a href=\"https://github.com/bojone/bert4keras\">bert4keras</a>\n\nc. Use albert with TensorFlow 2.0: use or load the pre-trained model with tf2.0 through <a href=\"https://github.com/kpe/bert-for-tf2\">bert-for-tf2</a>\n\nReleasing albert_xlarge on 6th Oct\n\n**\\*\\*\\*\\*\\* 2019-10-02: albert_large_zh, albert_base_zh \\*\\*\\*\\*\\***\n\nReleased albert_base_zh with only 10% of the parameters of bert_base, a small model (40M) that can be trained very quickly. 
\n\nReleased albert_large_zh with only 16% of the parameters of bert_base (64M)\n\n**\\*\\*\\*\\*\\* 2019-09-28: code and test functions \\*\\*\\*\\*\\*** \n\nAdded code and test functions for the three main changes of ALBERT from BERT\n\nIntroduction of ALBERT\n-----------------------------------------------\nALBERT is an improved version of BERT. Unlike other recent state-of-the-art models, it is a small pre-trained model that achieves better results with fewer parameters.\n\nThree main changes of ALBERT from BERT:\n\n1）Factorized embedding parameterization\n   \n     O(V * H) to O(V * E + E * H)\n     \n     Taking ALBERT_xxlarge as an example, V=30000, H=4096, E=128\n       \n     The original parameter count is V * H = 30000 * 4096 ≈ 123M, whereas now it is V * E + E * H = 30000*128 + 128*4096 = 3.84M + 0.52M = 4.36M,\n       \n     so the embedding-related parameter count before the change is about 28 times that after the change.\n\n\n2）Cross-Layer Parameter Sharing\n\n     Parameter sharing significantly reduces the number of parameters. Sharing can be applied to the feed-forward layers and to the attention layers; sharing the attention-layer parameters hurts performance less.\n\n3）Inter-sentence coherence loss\n     \n     Uses a sentence-order prediction task. Positive examples are two consecutive text segments from the same document; negative examples are the same two consecutive segments with their order swapped.\n     \n     This avoids the original NSP task, which implicitly contains overly easy subtasks such as topic prediction.\n\n      We maintain that inter-sentence modeling is an important aspect of language understanding, but we propose a loss \n      based primarily on coherence. That is, for ALBERT, we use a sentence-order prediction (SOP) loss, which avoids topic \n      prediction and instead focuses on modeling inter-sentence coherence. The SOP loss uses as positive examples the \n      same technique as BERT (two consecutive segments from the same document), and as negative examples the same two \n      consecutive segments but with their order swapped. This forces the model to learn finer-grained distinctions about\n      discourse-level coherence properties. \n\nOther changes:\n\n    1）Remove dropout to enlarge the capacity of the model.\n        Even after training for 1M steps, the largest models still did not overfit the training data, which suggests that model capacity could be larger still, so dropout was removed\n        (dropout can be seen as randomly dropping part of the network, which also makes the network somewhat smaller).\n        We also note that, even after training for 1M steps, our largest models still do not overfit to their training data. 
\n        As a result, we decide to remove dropout to further increase our model capacity.\n        For the other model sizes, our implementation keeps the original dropout rate to prevent overfitting to the training data.\n        \n    2）Use LAMB as the optimizer, to speed up training with a large batch size\n      A large batch_size (4096) is used for training. The LAMB optimizer makes it possible to train with very large batch sizes, e.g. up to 60,000.\n    \n    3）Use n-gram (uni-gram, bi-gram, tri-gram) masking for the masked language model\n       That is, n-grams are masked with different probabilities: uni-gram has the highest probability, bi-gram the next, and tri-gram the lowest.\n       This project currently uses whole word masking for Chinese; a comparison with n-gram masking will be added later. n-gram masking comes from SpanBERT.\n\n\nTraining Data & Configuration\n-----------------------------------------------\n30G of Chinese text, more than 10 billion Chinese characters, drawn from several encyclopedias, news sources and interactive communities.\n\nThe pre-training sequence length (sequence_length) is set to 512 and the batch size (batch_size) to 4096; 350 million training instances were generated. Each model is trained for 125k steps by default, and albert_xxlarge will be trained longer.\n\nFor comparison, roberta_zh pre-training generated 250 million training instances with a sequence length of 256. Because albert_zh pre-training generates more training data and uses a longer sequence length, we expect albert_zh to outperform roberta_zh and to handle longer texts better.\n\nTraining uses a TPU v3 Pod; we use a v3-256, which consists of 32 v3-8 units. Each v3-8 machine has 128G of accelerator memory.\n\n\nPerformance and Comparison (English)\n-----------------------------------------------    \n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/state_of_the_art.jpg\"  width=\"80%\" height=\"40%\" />\n  \n   \n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_performance.jpg\"  width=\"80%\" height=\"40%\" />\n\n\n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/add_data_removing_dropout.jpg\"  width=\"80%\" height=\"40%\" />\n\n\nPerformance on Chinese datasets\n----------------------------------------------- \n\n###  Question Matching Task: LCQMC (Sentence Pair Matching)\n\n| Model | Dev | Test |\n| :------- | :---------: | :---------: |\n| BERT | 89.4(88.4) | 86.9(86.4) | \n| ERNIE | 89.8 (89.6) | 87.2 (87.0) | \n| BERT-wwm |89.4 (89.2) | 87.0 (86.8) | \n| BERT-wwm-ext | - |-  |\n| RoBERTa-zh-base | 88.7 | 87.0  |\n| RoBERTa-zh-Large | ***89.9(89.6)*** | 87.2(86.7) |\n| RoBERTa-zh-Large(20w_steps) | 89.7| 87.0 
|\n| ALBERT-zh-tiny | -- | 85.4 |\n| ALBERT-zh-small | -- | 86.0 |\n| ALBERT-zh-small(Pytorch) | -- | 86.8 |\n| ALBERT-zh-base-additional-36k-steps | 87.8 | 86.3 |\n| ALBERT-zh-base | 87.2 | 86.3 |\n| ALBERT-large | 88.7 | 87.1 |\n| ALBERT-xlarge | 87.3 | ***87.7*** |\n\nNote: ALBERT-xlarge was run only once, so its results may still improve\n\n### Natural Language Inference: XNLI of Chinese Version\n\n| Model | Dev | Test |\n| :------- | :---------: | :---------: |\n| BERT | 77.8 (77.4) | 77.8 (77.5) | \n| ERNIE | 79.7 (79.4) | 78.6 (78.2) | \n| BERT-wwm | 79.0 (78.4) | 78.2 (78.0) | \n| BERT-wwm-ext | 79.4 (78.6) | 78.7 (78.3) |\n| XLNet | 79.2  | 78.7 |\n| RoBERTa-zh-base | 79.8 |78.8  |\n| RoBERTa-zh-Large | 80.2 (80.0) | 79.9 (79.5) |\n| ALBERT-base | 77.0 | 77.1 |\n| ALBERT-large | 78.0 | 77.5 |\n| ALBERT-xlarge | ? | ? |\n\nNote: BERT-wwm-ext results are from <a href=\"https://github.com/ymcui/Chinese-BERT-wwm\">here</a>; XLNet results are from <a href=\"https://github.com/ymcui/Chinese-PreTrained-XLNet\">here</a>; RoBERTa-zh-base refers to the 12-layer Chinese RoBERTa model\n   \n\n###  Reading Comprehension Task: CMRC2018\n\n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/crmc2018_compare_s.jpg\"  width=\"90%\" height=\"70%\" />\n\n\n### Masked Language Model Accuracy, SOP Accuracy & Training Time\n\n| Model | MLM eval acc | SOP eval acc | Training(Hours) | Loss eval |\n| :------- | :---------: | :---------: | :---------: |:---------: |\n| albert_zh_base | 79.1% | 99.0% | 6h | 1.01|\n| albert_zh_large | 80.9% | 98.6% | 22.5h | 0.93|\n| albert_zh_xlarge | ? | ? | 53h (estimated) | ? |\n| albert_zh_xxlarge | ? | ? | 106h (estimated) | ? |\n\nNote: ? 
entries will be filled in soon\n\nConfiguration of Models\n-----------------------------------------------\n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_configuration.jpg\"  width=\"80%\" height=\"40%\" />\n\nImplementation and Code Testing\n-----------------------------------------------\nRun the following command to test the main changes, including but not limited to factorized embedding parameterization, cross-layer parameter sharing and the inter-sentence coherence task.\n\n    python test_changes.py\n\n##### <a name=\"use_tflite\"></a>Deploying on Mobile Devices with TensorFlow Lite (TFLite):\nThis section covers TFLite model format conversion and performance benchmarking. After converting to a TFLite model, see the [complete Android/iOS app examples and tutorials](https://www.tensorflow.org/lite/examples) provided by TFLite for how to use the model on mobile devices.\nThat page currently includes two Android examples: [text classification](https://github.com/tensorflow/examples/blob/master/lite/examples/text_classification/android) and\n[question answering](https://github.com/tensorflow/examples/blob/master/lite/examples/bert_qa/android).\n\nThe following uses <a href=\"https://storage.googleapis.com/albert_zh/albert_tiny.zip\">albert_tiny_zh</a>\nas an example to walk through TFLite model format conversion and performance benchmarking:\n\n1. Freeze graph from the checkpoint\n\nMake sure a TensorFlow 1.x version >= 1.14 is installed to use the freeze_graph tool, as it has been removed from the 2.x distribution\n\n    pip install tensorflow==1.15\n\n    freeze_graph --input_checkpoint=./albert_model.ckpt \\\n      --output_graph=/tmp/albert_tiny_zh.pb \\\n      --output_node_names=cls/predictions/truediv \\\n      --checkpoint_version=1 --input_meta_graph=./albert_model.ckpt.meta --input_binary=true\n\n2. Convert to TFLite format\n\nWe use the new experimental tf->tflite converter that is distributed with the Tensorflow nightly build.\n\n    pip install tf-nightly\n\n    tflite_convert --graph_def_file=/tmp/albert_tiny_zh.pb \\\n      --input_arrays='input_ids,input_mask,segment_ids,masked_lm_positions,masked_lm_ids,masked_lm_weights' \\\n      --output_arrays='cls/predictions/truediv' \\\n      --input_shapes=1,128:1,128:128:1,128:1,128:1,128 \\\n      --output_file=/tmp/albert_tiny_zh.tflite \\\n      --enable_v1_converter --experimental_new_converter\n\n3. 
Benchmark the performance of the TFLite model\n\nSee [here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark) \nfor details about the performance benchmark tools in TFLite. For example, after\nbuilding the benchmark tool binary for an Android phone, do the following to\nget an idea of how the TFLite model performs on the phone:\n\n    adb push /tmp/albert_tiny_zh.tflite /data/local/tmp/\n    adb shell /data/local/tmp/benchmark_model_performance_options --graph=/data/local/tmp/albert_tiny_zh.tflite --perf_options_list=cpu\n\nOn an Android phone with Qualcomm's SD845 SoC, using the above benchmark tool, as\nof 2019/11/01, the inference latency is ~120ms with this converted TFLite model\nusing 4 threads on CPU, and the memory usage is ~60MB for the model during\ninference. Note that performance will improve further with future TFLite\nimplementation optimizations.\n\n##### Using the PyTorch version:\n\n    download pre-trained model, and convert to PyTorch using:\n     \n      python convert_albert_tf_checkpoint_to_pytorch.py     \n     \n   using <a href=\"https://github.com/lonePatient/albert_pytorch\">albert_pytorch</a>\n   \n##### Loading with Keras:\n\n<a href=\"https://github.com/bojone/bert4keras\">bert4keras</a> supports albert and can load the albert_zh weights; just add albert=True to the load_pretrained_model function\n\n##### Loading with tf2.0:\n\n<a href=\"https://github.com/kpe/bert-for-tf2\">bert-for-tf2</a>\n\n\nUse Case: Text Similarity Based on User Input\n-------------------------------------------------\n\nDescription: this example shows how to load a trained model to judge the similarity of short texts entered by the user. The code can be flexibly extended into a backend service or into further examples such as text classification.\n\nCode involved: similarity.py, args.py\n\nSteps:\n\n1、Train a text-similarity model with this project and save the model files to the corresponding directory\n\n2、Adjust the parameters in args.py as needed; the parameters are described below:\n\n```python\nimport os\n\n# Assumption: file_path is the directory containing args.py\nfile_path = os.path.dirname(os.path.abspath(__file__))\n\n# Model directory, where the ckpt files are stored\nmodel_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')\n\n# Config file: the JSON file describing the model\nconfig_name = os.path.join(file_path, 'albert_config/albert_config_tiny.json')\n\n# ckpt file name\nckpt_name = os.path.join(model_dir, 
'model.ckpt')\n\n# Output directory: where the model is written during training\noutput_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')\n\n# Path to the vocab file\nvocab_file = os.path.join(file_path, 'albert_config/vocab.txt')\n\n# Data directory: where the training dataset is stored\ndata_dir = os.path.join(file_path, 'data/')\n```\n\nThe file layout in this example is:\n\n    |__args.py\n    \n    |__similarity.py\n    \n    |__data\n    \n    |__albert_config\n    \n    |__albert_lcqmc_checkpoints\n    \n    |__lcqmc\n\n3、Edit the input sentences\n\nOpen similarity.py; the code at the bottom is:\n\n```python\nif __name__ == '__main__':\n    sim = BertSim()\n    sim.start_model()\n    sim.predict_sentences([(\"我喜欢妈妈做的汤\", \"妈妈做的汤我很喜欢喝\")])\n```\n\nHere sim.start_model() loads the model, and sim.predict_sentences takes a list of tuples, each containing the two sentences whose similarity is to be judged.\n\n4、Run the Python file: similarity.py\n\n\nTrade-off between Batch Size and Sequence Length (12G GPU memory)\n-------------------------------------------------\n\nSystem       | Seq Length | Max Batch Size\n------------ | ---------- | --------------\n`albert-base`  | 64         | 64\n...          | 128        | 32\n...          | 256        | 16\n...          | 320        | 14\n...          | 384        | 12\n...          | 512        | 6\n`albert-large` | 64         | 12\n...          | 128        | 6\n...          | 256        | 2\n...          | 320        | 1\n...          | 384        | 0\n...          
| 512        | 0\n`albert-xlarge` | -         | -\n\nTraining Loss of xlarge of albert_zh\n-------------------------------------------------\n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/xlarge_loss.jpg\"  width=\"80%\" height=\"40%\" />\n\nParameters of albert_xlarge\n-------------------------------------------------\n<img src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_large_zh_parameters.jpg\"  width=\"80%\" height=\"40%\" />\n\n\n#### Join us on the QQ group for technical discussion and questions: 836811304\n\nIf you have any questions, you can raise an issue or send me an email: brightmart@hotmail.com.\n\nHow to use the PyTorch version of albert is not yet clear; if you know how to do it, please email us or open an issue.\n\nYou can also send a pull request to report your performance on your task, or to add methods for loading the models in PyTorch and so on.\n\nIf you have ideas for producing the best-performing pre-trained Chinese model, please also let me know.\n\n##### Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)\n\nCite Us\n-----------------------------------------------\nBright Liang Xu, albert_zh, (2019), GitHub repository, https://github.com/brightmart/albert_zh\n\nReference\n-----------------------------------------------\n1、<a href=\"https://arxiv.org/pdf/1909.11942.pdf\">ALBERT: A Lite BERT for Self-supervised Learning of Language Representations</a>\n\n2、<a href=\"https://arxiv.org/pdf/1810.04805.pdf\">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a>\n\n3、<a href=\"https://arxiv.org/abs/1907.10529\">SpanBERT: Improving Pre-training by Representing and Predicting Spans</a>\n\n4、<a href=\"https://arxiv.org/pdf/1907.11692.pdf\">RoBERTa: A Robustly Optimized BERT Pretraining Approach</a>\n\n5、<a href=\"https://arxiv.org/pdf/1904.00962.pdf\">Large Batch Optimization for Deep Learning: Training BERT in 76 minutes (LAMB)</a>\n\n6、<a 
href=\"https://github.com/ymcui/LAMB_Optimizer_TF\">LAMB Optimizer, TensorFlow version</a>\n\n7、<a href=\"http://baijiahao.baidu.com/s?id=1645712785366950083&wfr=spider&for=pc\">Small pre-trained models can also win 13 NLP tasks; ALBERT's three main changes took it to the top of the GLUE benchmark</a>\n\n8、<a href=\"https://github.com/lonePatient/albert_pytorch\">albert_pytorch</a>\n\n9、<a href=\"https://github.com/bojone/bert4keras\">load albert with keras</a>\n\n10、<a href=\"https://github.com/kpe/bert-for-tf2\">load albert with tf2.0</a>\n\n11、<a href=\"https://github.com/google-research/google-research/tree/master/albert\">repo of albert from google</a>\n\n12、<a href=\"https://github.com/chineseGLUE/chineseGLUE\">chineseGLUE, a benchmark for Chinese tasks: multiple publicly available tasks, baseline models, broad evaluation and performance comparison</a>\n\n\n\n\n"
  },
  {
    "path": "albert_config/albert_config_base.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": 0.0,\n  \"hidden_size\": 768,\n  \"embedding_size\": 128,\n  \"initializer_range\": 0.02, \n  \"intermediate_size\": 3072 ,\n  \"max_position_embeddings\": 512, \n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 12,\n\n  \"pooler_fc_size\": 768,\n  \"pooler_num_attention_heads\": 12,\n  \"pooler_num_fc_layers\": 3, \n  \"pooler_size_per_head\": 128, \n  \"pooler_type\": \"first_token_transform\", \n  \"type_vocab_size\": 2, \n  \"vocab_size\": 21128,\n   \"ln_type\":\"postln\"\n\n}\n"
  },
  {
    "path": "albert_config/albert_config_base_google_fast.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.1,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.1,\n  \"embedding_size\": 128,\n  \"hidden_size\": 768,\n  \"initializer_range\": 0.02,\n  \"intermediate_size\": 3072,\n  \"max_position_embeddings\": 512,\n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 12,\n  \"num_hidden_groups\": 12,\n  \"net_structure_type\": 0,\n  \"gap_size\": 0,\n  \"num_memory_blocks\": 0,\n  \"inner_group_num\": 1,\n  \"down_scale_factor\": 1,\n  \"type_vocab_size\": 2,\n  \"vocab_size\": 21128\n}"
  },
  {
    "path": "albert_config/albert_config_large.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": 0.0,\n  \"hidden_size\": 1024,\n  \"embedding_size\": 128,\n  \"initializer_range\": 0.02, \n  \"intermediate_size\": 4096,\n  \"max_position_embeddings\": 512, \n  \"num_attention_heads\": 16,\n  \"num_hidden_layers\": 24,\n\n  \"pooler_fc_size\": 768,\n  \"pooler_num_attention_heads\": 12,\n  \"pooler_num_fc_layers\": 3, \n  \"pooler_size_per_head\": 128, \n  \"pooler_type\": \"first_token_transform\", \n  \"type_vocab_size\": 2, \n  \"vocab_size\": 21128,\n   \"ln_type\":\"postln\"\n\n}\n"
  },
  {
    "path": "albert_config/albert_config_small_google.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.0,\n  \"embedding_size\": 128,\n  \"hidden_size\": 384,\n  \"initializer_range\": 0.02,\n  \"intermediate_size\": 1536,\n  \"max_position_embeddings\": 512,\n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 6,\n  \"num_hidden_groups\": 1,\n  \"net_structure_type\": 0,\n  \"gap_size\": 0,\n  \"num_memory_blocks\": 0,\n  \"inner_group_num\": 1,\n  \"down_scale_factor\": 1,\n  \"type_vocab_size\": 2,\n  \"vocab_size\": 21128\n}"
  },
  {
    "path": "albert_config/albert_config_tiny.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": 0.0,\n  \"hidden_size\": 312,\n  \"embedding_size\": 128,\n  \"initializer_range\": 0.02, \n  \"intermediate_size\": 1248 ,\n  \"max_position_embeddings\": 512, \n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 4,\n\n  \"pooler_fc_size\": 768,\n  \"pooler_num_attention_heads\": 12,\n  \"pooler_num_fc_layers\": 3, \n  \"pooler_size_per_head\": 128, \n  \"pooler_type\": \"first_token_transform\", \n  \"type_vocab_size\": 2, \n  \"vocab_size\": 21128,\n   \"ln_type\":\"postln\"\n\n}\n"
  },
  {
    "path": "albert_config/albert_config_tiny_google.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.0,\n  \"embedding_size\": 128,\n  \"hidden_size\": 312,\n  \"initializer_range\": 0.02,\n  \"intermediate_size\": 1248,\n  \"max_position_embeddings\": 512,\n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 4,\n  \"num_hidden_groups\": 1,\n  \"net_structure_type\": 0,\n  \"gap_size\": 0,\n  \"num_memory_blocks\": 0,\n  \"inner_group_num\": 1,\n  \"down_scale_factor\": 1,\n  \"type_vocab_size\": 2,\n  \"vocab_size\": 21128\n}\n"
  },
  {
    "path": "albert_config/albert_config_tiny_google_fast.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.1,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.1,\n  \"embedding_size\": 128,\n  \"hidden_size\": 336,\n  \"initializer_range\": 0.02,\n  \"intermediate_size\": 1344,\n  \"max_position_embeddings\": 512,\n  \"num_attention_heads\": 12,\n  \"num_hidden_layers\": 4,\n  \"num_hidden_groups\": 12,\n  \"net_structure_type\": 0,\n  \"gap_size\": 0,\n  \"num_memory_blocks\": 0,\n  \"inner_group_num\": 1,\n  \"down_scale_factor\": 1,\n  \"type_vocab_size\": 2,\n  \"vocab_size\": 21128\n}"
  },
  {
    "path": "albert_config/albert_config_xlarge.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": 0.0,\n  \"hidden_size\": 2048,\n  \"embedding_size\": 128,\n  \"initializer_range\": 0.02, \n  \"intermediate_size\": 8192,\n  \"max_position_embeddings\": 512, \n  \"num_attention_heads\": 32,\n  \"num_hidden_layers\": 24,\n\n  \"pooler_fc_size\": 1024,\n  \"pooler_num_attention_heads\": 64,\n  \"pooler_num_fc_layers\": 3, \n  \"pooler_size_per_head\": 128, \n  \"pooler_type\": \"first_token_transform\", \n  \"type_vocab_size\": 2, \n  \"vocab_size\": 21128,\n  \"ln_type\":\"postln\"\n\n}\n"
  },
  {
    "path": "albert_config/albert_config_xxlarge.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": 0.0,\n  \"hidden_size\": 4096,\n  \"embedding_size\": 128,\n  \"initializer_range\": 0.02, \n  \"intermediate_size\": 16384,\n  \"max_position_embeddings\": 512, \n  \"num_attention_heads\": 64,\n  \"num_hidden_layers\": 12,\n\n  \"pooler_fc_size\": 1024,\n  \"pooler_num_attention_heads\": 64,\n  \"pooler_num_fc_layers\": 3, \n  \"pooler_size_per_head\": 128, \n  \"pooler_type\": \"first_token_transform\", \n  \"type_vocab_size\": 2, \n  \"vocab_size\": 21128,\n   \"ln_type\":\"preln\"\n\n}\n"
  },
  {
    "path": "albert_config/bert_config.json",
    "content": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": 0.0,\n  \"hidden_size\": 768, \n  \"initializer_range\": 0.02, \n  \"intermediate_size\": 3072, \n  \"max_position_embeddings\": 512, \n  \"num_attention_heads\": 12, \n  \"num_hidden_layers\": 12, \n  \"pooler_fc_size\": 768, \n  \"pooler_num_attention_heads\": 12, \n  \"pooler_num_fc_layers\": 3, \n  \"pooler_size_per_head\": 128, \n  \"pooler_type\": \"first_token_transform\", \n  \"type_vocab_size\": 2, \n  \"vocab_size\": 21128\n}\n"
  },
  {
    "path": "albert_config/vocab.txt",
    "content": "[PAD]\n[unused1]\n[unused2]\n[unused3]\n[unused4]\n[unused5]\n[unused6]\n[unused7]\n[unused8]\n[unused9]\n[unused10]\n[unused11]\n[unused12]\n[unused13]\n[unused14]\n[unused15]\n[unused16]\n[unused17]\n[unused18]\n[unused19]\n[unused20]\n[unused21]\n[unused22]\n[unused23]\n[unused24]\n[unused25]\n[unused26]\n[unused27]\n[unused28]\n[unused29]\n[unused30]\n[unused31]\n[unused32]\n[unused33]\n[unused34]\n[unused35]\n[unused36]\n[unused37]\n[unused38]\n[unused39]\n[unused40]\n[unused41]\n[unused42]\n[unused43]\n[unused44]\n[unused45]\n[unused46]\n[unused47]\n[unused48]\n[unused49]\n[unused50]\n[unused51]\n[unused52]\n[unused53]\n[unused54]\n[unused55]\n[unused56]\n[unused57]\n[unused58]\n[unused59]\n[unused60]\n[unused61]\n[unused62]\n[unused63]\n[unused64]\n[unused65]\n[unused66]\n[unused67]\n[unused68]\n[unused69]\n[unused70]\n[unused71]\n[unused72]\n[unused73]\n[unused74]\n[unused75]\n[unused76]\n[unused77]\n[unused78]\n[unused79]\n[unused80]\n[unused81]\n[unused82]\n[unused83]\n[unused84]\n[unused85]\n[unused86]\n[unused87]\n[unused88]\n[unused89]\n[unused90]\n[unused91]\n[unused92]\n[unused93]\n[unused94]\n[unused95]\n[unused96]\n[unused97]\n[unused98]\n[unused99]\n[UNK]\n[CLS]\n[SEP]\n[MASK]\n<S>\n<T>\n!\n\"\n#\n$\n%\n&\n'\n(\n)\n*\n+\n,\n-\n.\n/\n0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n:\n;\n<\n=\n>\n?\n@\n[\n\\\n]\n^\n_\na\nb\nc\nd\ne\nf\ng\nh\ni\nj\nk\nl\nm\nn\no\np\nq\nr\ns\nt\nu\nv\nw\nx\ny\nz\n{\n|\n}\n~\n£\n¤\n¥\n§\n©\n«\n®\n°\n±\n²\n³\nµ\n·\n¹\nº\n»\n¼\n×\nß\næ\n÷\nø\nđ\nŋ\nɔ\nə\nɡ\nʰ\nˇ\nˈ\nˊ\nˋ\nˍ\nː\n˙\n˚\nˢ\nα\nβ\nγ\nδ\nε\nη\nθ\nι\nκ\nλ\nμ\nν\nο\nπ\nρ\nς\nσ\nτ\nυ\nφ\nχ\nψ\nω\nа\nб\nв\nг\nд\nе\nж\nз\nи\nк\nл\nм\nн\nо\nп\nр\nс\nт\nу\nф\nх\nц\nч\nш\nы\nь\nя\nі\nا\nب\nة\nت\nد\nر\nس\nع\nل\nم\nن\nه\nو\nي\n۩\nก\nง\nน\nม\nย\nร\nอ\nา\nเ\n๑\n་\nღ\nᄀ\nᄁ\nᄂ\nᄃ\nᄅ\nᄆ\nᄇ\nᄈ\nᄉ\nᄋ\nᄌ\nᄎ\nᄏ\nᄐ\nᄑ\nᄒ\nᅡ\nᅢ\nᅣ\nᅥ\nᅦ\nᅧ\nᅨ\nᅩ\nᅪ\nᅬ\nᅭ\nᅮ\nᅯ\nᅲ\nᅳ\nᅴ\nᅵ\nᆨ\nᆫ\nᆯ\nᆷ\nᆸ\nᆺ\nᆻ\nᆼ\nᗜ\nᵃ\nᵉ\nᵍ\nᵏ\nᵐ\nᵒ\nᵘ\n‖\n„\n†\n•\n‥\n‧\n 
\n‰\n′\n″\n‹\n›\n※\n‿\n⁄\nⁱ\n⁺\nⁿ\n₁\n₂\n₃\n₄\n€\n℃\n№\n™\nⅰ\nⅱ\nⅲ\nⅳ\nⅴ\n←\n↑\n→\n↓\n↔\n↗\n↘\n⇒\n∀\n−\n∕\n∙\n√\n∞\n∟\n∠\n∣\n∥\n∩\n∮\n∶\n∼\n∽\n≈\n≒\n≡\n≤\n≥\n≦\n≧\n≪\n≫\n⊙\n⋅\n⋈\n⋯\n⌒\n①\n②\n③\n④\n⑤\n⑥\n⑦\n⑧\n⑨\n⑩\n⑴\n⑵\n⑶\n⑷\n⑸\n⒈\n⒉\n⒊\n⒋\nⓒ\nⓔ\nⓘ\n─\n━\n│\n┃\n┅\n┆\n┊\n┌\n└\n├\n┣\n═\n║\n╚\n╞\n╠\n╭\n╮\n╯\n╰\n╱\n╳\n▂\n▃\n▅\n▇\n█\n▉\n▋\n▌\n▍\n▎\n■\n□\n▪\n▫\n▬\n▲\n△\n▶\n►\n▼\n▽\n◆\n◇\n○\n◎\n●\n◕\n◠\n◢\n◤\n☀\n★\n☆\n☕\n☞\n☺\n☼\n♀\n♂\n♠\n♡\n♣\n♥\n♦\n♪\n♫\n♬\n✈\n✔\n✕\n✖\n✦\n✨\n✪\n✰\n✿\n❀\n❤\n➜\n➤\n⦿\n、\n。\n〃\n々\n〇\n〈\n〉\n《\n》\n「\n」\n『\n』\n【\n】\n〓\n〔\n〕\n〖\n〗\n〜\n〝\n〞\nぁ\nあ\nぃ\nい\nう\nぇ\nえ\nお\nか\nき\nく\nけ\nこ\nさ\nし\nす\nせ\nそ\nた\nち\nっ\nつ\nて\nと\nな\nに\nぬ\nね\nの\nは\nひ\nふ\nへ\nほ\nま\nみ\nむ\nめ\nも\nゃ\nや\nゅ\nゆ\nょ\nよ\nら\nり\nる\nれ\nろ\nわ\nを\nん\n゜\nゝ\nァ\nア\nィ\nイ\nゥ\nウ\nェ\nエ\nォ\nオ\nカ\nキ\nク\nケ\nコ\nサ\nシ\nス\nセ\nソ\nタ\nチ\nッ\nツ\nテ\nト\nナ\nニ\nヌ\nネ\nノ\nハ\nヒ\nフ\nヘ\nホ\nマ\nミ\nム\nメ\nモ\nャ\nヤ\nュ\nユ\nョ\nヨ\nラ\nリ\nル\nレ\nロ\nワ\nヲ\nン\nヶ\n・\nー\nヽ\nㄅ\nㄆ\nㄇ\nㄉ\nㄋ\nㄌ\nㄍ\nㄎ\nㄏ\nㄒ\nㄚ\nㄛ\nㄞ\nㄟ\nㄢ\nㄤ\nㄥ\nㄧ\nㄨ\nㆍ\n㈦\n㊣\n㎡\n㗎\n一\n丁\n七\n万\n丈\n三\n上\n下\n不\n与\n丐\n丑\n专\n且\n丕\n世\n丘\n丙\n业\n丛\n东\n丝\n丞\n丟\n両\n丢\n两\n严\n並\n丧\n丨\n个\n丫\n中\n丰\n串\n临\n丶\n丸\n丹\n为\n主\n丼\n丽\n举\n丿\n乂\n乃\n久\n么\n义\n之\n乌\n乍\n乎\n乏\n乐\n乒\n乓\n乔\n乖\n乗\n乘\n乙\n乜\n九\n乞\n也\n习\n乡\n书\n乩\n买\n乱\n乳\n乾\n亀\n亂\n了\n予\n争\n事\n二\n于\n亏\n云\n互\n五\n井\n亘\n亙\n亚\n些\n亜\n亞\n亟\n亡\n亢\n交\n亥\n亦\n产\n亨\n亩\n享\n京\n亭\n亮\n亲\n亳\n亵\n人\n亿\n什\n仁\n仃\n仄\n仅\n仆\n仇\n今\n介\n仍\n从\n仏\n仑\n仓\n仔\n仕\n他\n仗\n付\n仙\n仝\n仞\n仟\n代\n令\n以\n仨\n仪\n们\n仮\n仰\n仲\n件\n价\n任\n份\n仿\n企\n伉\n伊\n伍\n伎\n伏\n伐\n休\n伕\n众\n优\n伙\n会\n伝\n伞\n伟\n传\n伢\n伤\n伦\n伪\n伫\n伯\n估\n伴\n伶\n伸\n伺\n似\n伽\n佃\n但\n佇\n佈\n位\n低\n住\n佐\n佑\n体\n佔\n何\n佗\n佘\n余\n佚\n佛\n作\n佝\n佞\n佟\n你\n佢\n佣\n佤\n佥\n佩\n佬\n佯\n佰\n佳\n併\n佶\n佻\n佼\n使\n侃\n侄\n來\n侈\n例\n侍\n侏\n侑\n侖\n侗\n供\n依\n侠\n価\n侣\n侥\n侦\n侧\n侨\n侬\n侮\n侯\n侵\n侶\n侷\n便\n係\n促\n俄\n俊\n俎\n俏\n俐\n俑\n俗\n俘\n俚\n保\n俞\n俟\n俠\n信\n俨\n俩\n俪\n俬\n俭\n修\n俯\n俱\n俳\n俸\n俺\n俾\n倆\n倉\n個\n倌\n倍\n倏\n們\n倒\n倔\n倖\n倘\n候\n倚\n倜\n借\n倡\n値\n倦\n倩\n倪\n倫\n倬\n倭\n倶\n债\n值\n倾\n偃\n假\n偈\n偉\n偌\n偎\n偏\n偕\n做\n停\n健\n側\n偵\n偶\n偷\n偻\n偽\n偿\n傀\n傅\n傍\n傑\n傘\n備\n傚\n傢\n傣\n傥\n储\n傩\n催\n傭\n傲\n傳\n債\n傷\n傻\n傾\n僅\n働\n像\n僑\n
僕\n僖\n僚\n僥\n僧\n僭\n僮\n僱\n僵\n價\n僻\n儀\n儂\n億\n儆\n儉\n儋\n儒\n儕\n儘\n償\n儡\n優\n儲\n儷\n儼\n儿\n兀\n允\n元\n兄\n充\n兆\n兇\n先\n光\n克\n兌\n免\n児\n兑\n兒\n兔\n兖\n党\n兜\n兢\n入\n內\n全\n兩\n八\n公\n六\n兮\n兰\n共\n兲\n关\n兴\n兵\n其\n具\n典\n兹\n养\n兼\n兽\n冀\n内\n円\n冇\n冈\n冉\n冊\n册\n再\n冏\n冒\n冕\n冗\n写\n军\n农\n冠\n冢\n冤\n冥\n冨\n冪\n冬\n冯\n冰\n冲\n决\n况\n冶\n冷\n冻\n冼\n冽\n冾\n净\n凄\n准\n凇\n凈\n凉\n凋\n凌\n凍\n减\n凑\n凛\n凜\n凝\n几\n凡\n凤\n処\n凪\n凭\n凯\n凰\n凱\n凳\n凶\n凸\n凹\n出\n击\n函\n凿\n刀\n刁\n刃\n分\n切\n刈\n刊\n刍\n刎\n刑\n划\n列\n刘\n则\n刚\n创\n初\n删\n判\n別\n刨\n利\n刪\n别\n刮\n到\n制\n刷\n券\n刹\n刺\n刻\n刽\n剁\n剂\n剃\n則\n剉\n削\n剋\n剌\n前\n剎\n剐\n剑\n剔\n剖\n剛\n剜\n剝\n剣\n剤\n剥\n剧\n剩\n剪\n副\n割\n創\n剷\n剽\n剿\n劃\n劇\n劈\n劉\n劊\n劍\n劏\n劑\n力\n劝\n办\n功\n加\n务\n劣\n动\n助\n努\n劫\n劭\n励\n劲\n劳\n労\n劵\n効\n劾\n势\n勁\n勃\n勇\n勉\n勋\n勐\n勒\n動\n勖\n勘\n務\n勛\n勝\n勞\n募\n勢\n勤\n勧\n勳\n勵\n勸\n勺\n勻\n勾\n勿\n匀\n包\n匆\n匈\n匍\n匐\n匕\n化\n北\n匙\n匝\n匠\n匡\n匣\n匪\n匮\n匯\n匱\n匹\n区\n医\n匾\n匿\n區\n十\n千\n卅\n升\n午\n卉\n半\n卍\n华\n协\n卑\n卒\n卓\n協\n单\n卖\n南\n単\n博\n卜\n卞\n卟\n占\n卡\n卢\n卤\n卦\n卧\n卫\n卮\n卯\n印\n危\n即\n却\n卵\n卷\n卸\n卻\n卿\n厂\n厄\n厅\n历\n厉\n压\n厌\n厕\n厘\n厚\n厝\n原\n厢\n厥\n厦\n厨\n厩\n厭\n厮\n厲\n厳\n去\n县\n叁\n参\n參\n又\n叉\n及\n友\n双\n反\n収\n发\n叔\n取\n受\n变\n叙\n叛\n叟\n叠\n叡\n叢\n口\n古\n句\n另\n叨\n叩\n只\n叫\n召\n叭\n叮\n可\n台\n叱\n史\n右\n叵\n叶\n号\n司\n叹\n叻\n叼\n叽\n吁\n吃\n各\n吆\n合\n吉\n吊\n吋\n同\n名\n后\n吏\n吐\n向\n吒\n吓\n吕\n吖\n吗\n君\n吝\n吞\n吟\n吠\n吡\n否\n吧\n吨\n吩\n含\n听\n吭\n吮\n启\n吱\n吳\n吴\n吵\n吶\n吸\n吹\n吻\n吼\n吽\n吾\n呀\n呂\n呃\n呆\n呈\n告\n呋\n呎\n呐\n呓\n呕\n呗\n员\n呛\n呜\n呢\n呤\n呦\n周\n呱\n呲\n味\n呵\n呷\n呸\n呻\n呼\n命\n咀\n咁\n咂\n咄\n咆\n咋\n和\n咎\n咏\n咐\n咒\n咔\n咕\n咖\n咗\n咘\n咙\n咚\n咛\n咣\n咤\n咦\n咧\n咨\n咩\n咪\n咫\n咬\n咭\n咯\n咱\n咲\n咳\n咸\n咻\n咽\n咿\n哀\n品\n哂\n哄\n哆\n哇\n哈\n哉\n哋\n哌\n响\n哎\n哏\n哐\n哑\n哒\n哔\n哗\n哟\n員\n哥\n哦\n哧\n哨\n哩\n哪\n哭\n哮\n哲\n哺\n哼\n哽\n唁\n唄\n唆\n唇\n唉\n唏\n唐\n唑\n唔\n唠\n唤\n唧\n唬\n售\n唯\n唰\n唱\n唳\n唷\n唸\n唾\n啃\n啄\n商\n啉\n啊\n問\n啓\n啕\n啖\n啜\n啞\n啟\n啡\n啤\n啥\n啦\n啧\n啪\n啫\n啬\n啮\n啰\n啱\n啲\n啵\n啶\n啷\n啸\n啻\n啼\n啾\n喀\n喂\n喃\n善\n喆\n喇\n喉\n喊\n喋\n喎\n喏\n喔\n喘\n喙\n喚\n喜\n喝\n喟\n喧\n喪\n喫\n喬\n單\n喰\n喱\n喲\n喳\n喵\n営\n喷\n喹\n喺\n喻\n喽\n嗅\n嗆\n嗇\n嗎\n嗑\n嗒\n嗓\n嗔\n嗖\n嗚\n嗜\n嗝\n嗟\n嗡\n嗣\n嗤\n嗦\n嗨\n嗪\n嗬\n嗯\n嗰\n嗲\n嗳\n嗶\n嗷\n嗽\n嘀\n嘅\n嘆\n嘈\n嘉\n嘌\n嘍\n嘎\n嘔\n嘖\n嘗\n嘘\n嘚\n嘛\n嘜\n嘞\n嘟\n嘢\n嘣\n嘤\n嘧\n嘩\n嘭\n嘮\n嘯\n嘰\n嘱\n嘲\n嘴\n嘶\n嘸\n嘹\
n嘻\n嘿\n噁\n噌\n噎\n噓\n噔\n噗\n噙\n噜\n噠\n噢\n噤\n器\n噩\n噪\n噬\n噱\n噴\n噶\n噸\n噹\n噻\n噼\n嚀\n嚇\n嚎\n嚏\n嚐\n嚓\n嚕\n嚟\n嚣\n嚥\n嚨\n嚮\n嚴\n嚷\n嚼\n囂\n囉\n囊\n囍\n囑\n囔\n囗\n囚\n四\n囝\n回\n囟\n因\n囡\n团\n団\n囤\n囧\n囪\n囫\n园\n困\n囱\n囲\n図\n围\n囹\n固\n国\n图\n囿\n圃\n圄\n圆\n圈\n國\n圍\n圏\n園\n圓\n圖\n團\n圜\n土\n圣\n圧\n在\n圩\n圭\n地\n圳\n场\n圻\n圾\n址\n坂\n均\n坊\n坍\n坎\n坏\n坐\n坑\n块\n坚\n坛\n坝\n坞\n坟\n坠\n坡\n坤\n坦\n坨\n坪\n坯\n坳\n坵\n坷\n垂\n垃\n垄\n型\n垒\n垚\n垛\n垠\n垢\n垣\n垦\n垩\n垫\n垭\n垮\n垵\n埂\n埃\n埋\n城\n埔\n埕\n埗\n域\n埠\n埤\n埵\n執\n埸\n培\n基\n埼\n堀\n堂\n堃\n堅\n堆\n堇\n堑\n堕\n堙\n堡\n堤\n堪\n堯\n堰\n報\n場\n堵\n堺\n堿\n塊\n塌\n塑\n塔\n塗\n塘\n塚\n塞\n塢\n塩\n填\n塬\n塭\n塵\n塾\n墀\n境\n墅\n墉\n墊\n墒\n墓\n増\n墘\n墙\n墜\n增\n墟\n墨\n墩\n墮\n墳\n墻\n墾\n壁\n壅\n壆\n壇\n壊\n壑\n壓\n壕\n壘\n壞\n壟\n壢\n壤\n壩\n士\n壬\n壮\n壯\n声\n売\n壳\n壶\n壹\n壺\n壽\n处\n备\n変\n复\n夏\n夔\n夕\n外\n夙\n多\n夜\n够\n夠\n夢\n夥\n大\n天\n太\n夫\n夭\n央\n夯\n失\n头\n夷\n夸\n夹\n夺\n夾\n奂\n奄\n奇\n奈\n奉\n奋\n奎\n奏\n奐\n契\n奔\n奕\n奖\n套\n奘\n奚\n奠\n奢\n奥\n奧\n奪\n奬\n奮\n女\n奴\n奶\n奸\n她\n好\n如\n妃\n妄\n妆\n妇\n妈\n妊\n妍\n妒\n妓\n妖\n妘\n妙\n妝\n妞\n妣\n妤\n妥\n妨\n妩\n妪\n妮\n妲\n妳\n妹\n妻\n妾\n姆\n姉\n姊\n始\n姍\n姐\n姑\n姒\n姓\n委\n姗\n姚\n姜\n姝\n姣\n姥\n姦\n姨\n姪\n姫\n姬\n姹\n姻\n姿\n威\n娃\n娄\n娅\n娆\n娇\n娉\n娑\n娓\n娘\n娛\n娜\n娟\n娠\n娣\n娥\n娩\n娱\n娲\n娴\n娶\n娼\n婀\n婁\n婆\n婉\n婊\n婕\n婚\n婢\n婦\n婧\n婪\n婭\n婴\n婵\n婶\n婷\n婺\n婿\n媒\n媚\n媛\n媞\n媧\n媲\n媳\n媽\n媾\n嫁\n嫂\n嫉\n嫌\n嫑\n嫔\n嫖\n嫘\n嫚\n嫡\n嫣\n嫦\n嫩\n嫲\n嫵\n嫻\n嬅\n嬉\n嬌\n嬗\n嬛\n嬢\n嬤\n嬪\n嬰\n嬴\n嬷\n嬸\n嬿\n孀\n孃\n子\n孑\n孔\n孕\n孖\n字\n存\n孙\n孚\n孛\n孜\n孝\n孟\n孢\n季\n孤\n学\n孩\n孪\n孫\n孬\n孰\n孱\n孳\n孵\n學\n孺\n孽\n孿\n宁\n它\n宅\n宇\n守\n安\n宋\n完\n宏\n宓\n宕\n宗\n官\n宙\n定\n宛\n宜\n宝\n实\n実\n宠\n审\n客\n宣\n室\n宥\n宦\n宪\n宫\n宮\n宰\n害\n宴\n宵\n家\n宸\n容\n宽\n宾\n宿\n寂\n寄\n寅\n密\n寇\n富\n寐\n寒\n寓\n寛\n寝\n寞\n察\n寡\n寢\n寥\n實\n寧\n寨\n審\n寫\n寬\n寮\n寰\n寵\n寶\n寸\n对\n寺\n寻\n导\n対\n寿\n封\n専\n射\n将\n將\n專\n尉\n尊\n尋\n對\n導\n小\n少\n尔\n尕\n尖\n尘\n尚\n尝\n尤\n尧\n尬\n就\n尴\n尷\n尸\n尹\n尺\n尻\n尼\n尽\n尾\n尿\n局\n屁\n层\n屄\n居\n屆\n屈\n屉\n届\n屋\n屌\n屍\n屎\n屏\n屐\n屑\n展\n屜\n属\n屠\n屡\n屢\n層\n履\n屬\n屯\n山\n屹\n屿\n岀\n岁\n岂\n岌\n岐\n岑\n岔\n岖\n岗\n岘\n岙\n岚\n岛\n岡\n岩\n岫\n岬\n岭\n岱\n岳\n岷\n岸\n峇\n峋\n峒\n峙\n峡\n峤\n峥\n峦\n峨\n峪\n峭\n峯\n峰\n峴\n島\n峻\n峽\n崁\n崂\n崆\n崇\n崎\n崑\n崔\n崖\n崗\n崙\n崛\n崧\n崩\n崭\n崴\n崽\n嵇\n嵊\n嵋\n嵌\n嵐\n嵘\n嵩\n嵬\n嵯\n嶂\n嶄\n嶇\n嶋\n嶙\n嶺\n嶼\n嶽\n巅\n巍\n巒\n巔\n巖\n川\n州\n巡\n巢\n工\n左\n巧\n巨\n巩
\n巫\n差\n己\n已\n巳\n巴\n巷\n巻\n巽\n巾\n巿\n币\n市\n布\n帅\n帆\n师\n希\n帐\n帑\n帕\n帖\n帘\n帚\n帛\n帜\n帝\n帥\n带\n帧\n師\n席\n帮\n帯\n帰\n帳\n帶\n帷\n常\n帼\n帽\n幀\n幂\n幄\n幅\n幌\n幔\n幕\n幟\n幡\n幢\n幣\n幫\n干\n平\n年\n并\n幸\n幹\n幺\n幻\n幼\n幽\n幾\n广\n庁\n広\n庄\n庆\n庇\n床\n序\n庐\n库\n应\n底\n庖\n店\n庙\n庚\n府\n庞\n废\n庠\n度\n座\n庫\n庭\n庵\n庶\n康\n庸\n庹\n庾\n廁\n廂\n廃\n廈\n廉\n廊\n廓\n廖\n廚\n廝\n廟\n廠\n廢\n廣\n廬\n廳\n延\n廷\n建\n廿\n开\n弁\n异\n弃\n弄\n弈\n弊\n弋\n式\n弑\n弒\n弓\n弔\n引\n弗\n弘\n弛\n弟\n张\n弥\n弦\n弧\n弩\n弭\n弯\n弱\n張\n強\n弹\n强\n弼\n弾\n彅\n彆\n彈\n彌\n彎\n归\n当\n录\n彗\n彙\n彝\n形\n彤\n彥\n彦\n彧\n彩\n彪\n彫\n彬\n彭\n彰\n影\n彷\n役\n彻\n彼\n彿\n往\n征\n径\n待\n徇\n很\n徉\n徊\n律\n後\n徐\n徑\n徒\n従\n徕\n得\n徘\n徙\n徜\n從\n徠\n御\n徨\n復\n循\n徬\n微\n徳\n徴\n徵\n德\n徹\n徼\n徽\n心\n必\n忆\n忌\n忍\n忏\n忐\n忑\n忒\n忖\n志\n忘\n忙\n応\n忠\n忡\n忤\n忧\n忪\n快\n忱\n念\n忻\n忽\n忿\n怀\n态\n怂\n怅\n怆\n怎\n怏\n怒\n怔\n怕\n怖\n怙\n怜\n思\n怠\n怡\n急\n怦\n性\n怨\n怪\n怯\n怵\n总\n怼\n恁\n恃\n恆\n恋\n恍\n恐\n恒\n恕\n恙\n恚\n恢\n恣\n恤\n恥\n恨\n恩\n恪\n恫\n恬\n恭\n息\n恰\n恳\n恵\n恶\n恸\n恺\n恻\n恼\n恿\n悄\n悅\n悉\n悌\n悍\n悔\n悖\n悚\n悟\n悠\n患\n悦\n您\n悩\n悪\n悬\n悯\n悱\n悲\n悴\n悵\n悶\n悸\n悻\n悼\n悽\n情\n惆\n惇\n惊\n惋\n惑\n惕\n惘\n惚\n惜\n惟\n惠\n惡\n惦\n惧\n惨\n惩\n惫\n惬\n惭\n惮\n惯\n惰\n惱\n想\n惴\n惶\n惹\n惺\n愁\n愆\n愈\n愉\n愍\n意\n愕\n愚\n愛\n愜\n感\n愣\n愤\n愧\n愫\n愷\n愿\n慄\n慈\n態\n慌\n慎\n慑\n慕\n慘\n慚\n慟\n慢\n慣\n慧\n慨\n慫\n慮\n慰\n慳\n慵\n慶\n慷\n慾\n憂\n憊\n憋\n憎\n憐\n憑\n憔\n憚\n憤\n憧\n憨\n憩\n憫\n憬\n憲\n憶\n憾\n懂\n懇\n懈\n應\n懊\n懋\n懑\n懒\n懦\n懲\n懵\n懶\n懷\n懸\n懺\n懼\n懾\n懿\n戀\n戈\n戊\n戌\n戍\n戎\n戏\n成\n我\n戒\n戕\n或\n战\n戚\n戛\n戟\n戡\n戦\n截\n戬\n戮\n戰\n戲\n戳\n戴\n戶\n户\n戸\n戻\n戾\n房\n所\n扁\n扇\n扈\n扉\n手\n才\n扎\n扑\n扒\n打\n扔\n払\n托\n扛\n扣\n扦\n执\n扩\n扪\n扫\n扬\n扭\n扮\n扯\n扰\n扱\n扳\n扶\n批\n扼\n找\n承\n技\n抄\n抉\n把\n抑\n抒\n抓\n投\n抖\n抗\n折\n抚\n抛\n抜\n択\n抟\n抠\n抡\n抢\n护\n报\n抨\n披\n抬\n抱\n抵\n抹\n押\n抽\n抿\n拂\n拄\n担\n拆\n拇\n拈\n拉\n拋\n拌\n拍\n拎\n拐\n拒\n拓\n拔\n拖\n拗\n拘\n拙\n拚\n招\n拜\n拟\n拡\n拢\n拣\n拥\n拦\n拧\n拨\n择\n括\n拭\n拮\n拯\n拱\n拳\n拴\n拷\n拼\n拽\n拾\n拿\n持\n挂\n指\n挈\n按\n挎\n挑\n挖\n挙\n挚\n挛\n挝\n挞\n挟\n挠\n挡\n挣\n挤\n挥\n挨\n挪\n挫\n振\n挲\n挹\n挺\n挽\n挾\n捂\n捅\n捆\n捉\n捋\n捌\n捍\n捎\n捏\n捐\n捕\n捞\n损\n捡\n换\n捣\n捧\n捨\n捩\n据\n捱\n捲\n捶\n捷\n捺\n捻\n掀\n掂\n掃\n掇\n授\n掉\n掌\n掏\n掐\n排\n掖\n掘\n掙\n掛\n掠\n採\n探\n掣\n接\n控\n推\n掩\n措\n掬\n掰\n掲\n掳\n掴\n掷\n掸\n掺\n揀\n揃\n揄\n揆\n揉\n揍\n描\n提\n插\n揖\n揚\n換\n握\n揣\n揩\n揪\n揭\n揮\n援\n揶\n揸\n揹\n揽\n搀\n搁\n搂\n搅\n
損\n搏\n搐\n搓\n搔\n搖\n搗\n搜\n搞\n搡\n搪\n搬\n搭\n搵\n搶\n携\n搽\n摀\n摁\n摄\n摆\n摇\n摈\n摊\n摒\n摔\n摘\n摞\n摟\n摧\n摩\n摯\n摳\n摸\n摹\n摺\n摻\n撂\n撃\n撅\n撇\n撈\n撐\n撑\n撒\n撓\n撕\n撚\n撞\n撤\n撥\n撩\n撫\n撬\n播\n撮\n撰\n撲\n撵\n撷\n撸\n撻\n撼\n撿\n擀\n擁\n擂\n擄\n擅\n擇\n擊\n擋\n操\n擎\n擒\n擔\n擘\n據\n擞\n擠\n擡\n擢\n擦\n擬\n擰\n擱\n擲\n擴\n擷\n擺\n擼\n擾\n攀\n攏\n攒\n攔\n攘\n攙\n攜\n攝\n攞\n攢\n攣\n攤\n攥\n攪\n攫\n攬\n支\n收\n攸\n改\n攻\n放\n政\n故\n效\n敌\n敍\n敎\n敏\n救\n敕\n敖\n敗\n敘\n教\n敛\n敝\n敞\n敢\n散\n敦\n敬\n数\n敲\n整\n敵\n敷\n數\n斂\n斃\n文\n斋\n斌\n斎\n斐\n斑\n斓\n斗\n料\n斛\n斜\n斟\n斡\n斤\n斥\n斧\n斩\n斫\n斬\n断\n斯\n新\n斷\n方\n於\n施\n旁\n旃\n旅\n旋\n旌\n旎\n族\n旖\n旗\n无\n既\n日\n旦\n旧\n旨\n早\n旬\n旭\n旮\n旱\n时\n旷\n旺\n旻\n昀\n昂\n昆\n昇\n昉\n昊\n昌\n明\n昏\n易\n昔\n昕\n昙\n星\n映\n春\n昧\n昨\n昭\n是\n昱\n昴\n昵\n昶\n昼\n显\n晁\n時\n晃\n晉\n晋\n晌\n晏\n晒\n晓\n晔\n晕\n晖\n晗\n晚\n晝\n晞\n晟\n晤\n晦\n晨\n晩\n普\n景\n晰\n晴\n晶\n晷\n智\n晾\n暂\n暄\n暇\n暈\n暉\n暌\n暐\n暑\n暖\n暗\n暝\n暢\n暧\n暨\n暫\n暮\n暱\n暴\n暸\n暹\n曄\n曆\n曇\n曉\n曖\n曙\n曜\n曝\n曠\n曦\n曬\n曰\n曲\n曳\n更\n書\n曹\n曼\n曾\n替\n最\n會\n月\n有\n朋\n服\n朐\n朔\n朕\n朗\n望\n朝\n期\n朦\n朧\n木\n未\n末\n本\n札\n朮\n术\n朱\n朴\n朵\n机\n朽\n杀\n杂\n权\n杆\n杈\n杉\n李\n杏\n材\n村\n杓\n杖\n杜\n杞\n束\n杠\n条\n来\n杨\n杭\n杯\n杰\n東\n杳\n杵\n杷\n杼\n松\n板\n极\n构\n枇\n枉\n枋\n析\n枕\n林\n枚\n果\n枝\n枢\n枣\n枪\n枫\n枭\n枯\n枰\n枱\n枳\n架\n枷\n枸\n柄\n柏\n某\n柑\n柒\n染\n柔\n柘\n柚\n柜\n柞\n柠\n柢\n查\n柩\n柬\n柯\n柱\n柳\n柴\n柵\n査\n柿\n栀\n栃\n栄\n栅\n标\n栈\n栉\n栋\n栎\n栏\n树\n栓\n栖\n栗\n校\n栩\n株\n样\n核\n根\n格\n栽\n栾\n桀\n桁\n桂\n桃\n桅\n框\n案\n桉\n桌\n桎\n桐\n桑\n桓\n桔\n桜\n桠\n桡\n桢\n档\n桥\n桦\n桧\n桨\n桩\n桶\n桿\n梁\n梅\n梆\n梏\n梓\n梗\n條\n梟\n梢\n梦\n梧\n梨\n梭\n梯\n械\n梳\n梵\n梶\n检\n棂\n棄\n棉\n棋\n棍\n棒\n棕\n棗\n棘\n棚\n棟\n棠\n棣\n棧\n森\n棱\n棲\n棵\n棹\n棺\n椁\n椅\n椋\n植\n椎\n椒\n検\n椪\n椭\n椰\n椹\n椽\n椿\n楂\n楊\n楓\n楔\n楚\n楝\n楞\n楠\n楣\n楨\n楫\n業\n楮\n極\n楷\n楸\n楹\n楼\n楽\n概\n榄\n榆\n榈\n榉\n榔\n榕\n榖\n榛\n榜\n榨\n榫\n榭\n榮\n榱\n榴\n榷\n榻\n槁\n槃\n構\n槌\n槍\n槎\n槐\n槓\n様\n槛\n槟\n槤\n槭\n槲\n槳\n槻\n槽\n槿\n樁\n樂\n樊\n樑\n樓\n標\n樞\n樟\n模\n樣\n権\n横\n樫\n樯\n樱\n樵\n樸\n樹\n樺\n樽\n樾\n橄\n橇\n橋\n橐\n橘\n橙\n機\n橡\n橢\n橫\n橱\n橹\n橼\n檀\n檄\n檎\n檐\n檔\n檗\n檜\n檢\n檬\n檯\n檳\n檸\n檻\n櫃\n櫚\n櫛\n櫥\n櫸\n櫻\n欄\n權\n欒\n欖\n欠\n次\n欢\n欣\n欧\n欲\n欸\n欺\n欽\n款\n歆\n歇\n歉\n歌\n歎\n歐\n歓\n歙\n歛\n歡\n止\n正\n此\n步\n武\n歧\n歩\n歪\n歯\n歲\n歳\n歴\n歷\n歸\n歹\n死\n歼\n殁\n殃\n殆\n殇\n殉\n殊\n残\n殒\n殓\n殖\n殘\n殞\n殡\n殤\n殭\n殯\n殲\n殴\n段\n殷\n殺\n殼\n殿\n毀\n毁\n毂\n毅\n毆\
n毋\n母\n毎\n每\n毒\n毓\n比\n毕\n毗\n毘\n毙\n毛\n毡\n毫\n毯\n毽\n氈\n氏\n氐\n民\n氓\n气\n氖\n気\n氙\n氛\n氟\n氡\n氢\n氣\n氤\n氦\n氧\n氨\n氪\n氫\n氮\n氯\n氰\n氲\n水\n氷\n永\n氹\n氾\n汀\n汁\n求\n汆\n汇\n汉\n汎\n汐\n汕\n汗\n汙\n汛\n汝\n汞\n江\n池\n污\n汤\n汨\n汩\n汪\n汰\n汲\n汴\n汶\n汹\n決\n汽\n汾\n沁\n沂\n沃\n沅\n沈\n沉\n沌\n沏\n沐\n沒\n沓\n沖\n沙\n沛\n沟\n没\n沢\n沣\n沥\n沦\n沧\n沪\n沫\n沭\n沮\n沱\n河\n沸\n油\n治\n沼\n沽\n沾\n沿\n況\n泄\n泉\n泊\n泌\n泓\n法\n泗\n泛\n泞\n泠\n泡\n波\n泣\n泥\n注\n泪\n泫\n泮\n泯\n泰\n泱\n泳\n泵\n泷\n泸\n泻\n泼\n泽\n泾\n洁\n洄\n洋\n洒\n洗\n洙\n洛\n洞\n津\n洩\n洪\n洮\n洱\n洲\n洵\n洶\n洸\n洹\n活\n洼\n洽\n派\n流\n浃\n浄\n浅\n浆\n浇\n浊\n测\n济\n浏\n浑\n浒\n浓\n浔\n浙\n浚\n浜\n浣\n浦\n浩\n浪\n浬\n浮\n浯\n浴\n海\n浸\n涂\n涅\n涇\n消\n涉\n涌\n涎\n涓\n涔\n涕\n涙\n涛\n涝\n涞\n涟\n涠\n涡\n涣\n涤\n润\n涧\n涨\n涩\n涪\n涮\n涯\n液\n涵\n涸\n涼\n涿\n淀\n淄\n淅\n淆\n淇\n淋\n淌\n淑\n淒\n淖\n淘\n淙\n淚\n淞\n淡\n淤\n淦\n淨\n淩\n淪\n淫\n淬\n淮\n深\n淳\n淵\n混\n淹\n淺\n添\n淼\n清\n済\n渉\n渊\n渋\n渍\n渎\n渐\n渔\n渗\n渙\n渚\n減\n渝\n渠\n渡\n渣\n渤\n渥\n渦\n温\n測\n渭\n港\n渲\n渴\n游\n渺\n渾\n湃\n湄\n湊\n湍\n湖\n湘\n湛\n湟\n湧\n湫\n湮\n湯\n湳\n湾\n湿\n満\n溃\n溅\n溉\n溏\n源\n準\n溜\n溝\n溟\n溢\n溥\n溧\n溪\n溫\n溯\n溱\n溴\n溶\n溺\n溼\n滁\n滂\n滄\n滅\n滇\n滋\n滌\n滑\n滓\n滔\n滕\n滙\n滚\n滝\n滞\n滟\n满\n滢\n滤\n滥\n滦\n滨\n滩\n滬\n滯\n滲\n滴\n滷\n滸\n滾\n滿\n漁\n漂\n漆\n漉\n漏\n漓\n演\n漕\n漠\n漢\n漣\n漩\n漪\n漫\n漬\n漯\n漱\n漲\n漳\n漸\n漾\n漿\n潆\n潇\n潋\n潍\n潑\n潔\n潘\n潛\n潜\n潞\n潟\n潢\n潤\n潦\n潧\n潭\n潮\n潰\n潴\n潸\n潺\n潼\n澀\n澄\n澆\n澈\n澍\n澎\n澗\n澜\n澡\n澤\n澧\n澱\n澳\n澹\n激\n濁\n濂\n濃\n濑\n濒\n濕\n濘\n濛\n濟\n濠\n濡\n濤\n濫\n濬\n濮\n濯\n濱\n濺\n濾\n瀅\n瀆\n瀉\n瀋\n瀏\n瀑\n瀕\n瀘\n瀚\n瀛\n瀝\n瀞\n瀟\n瀧\n瀨\n瀬\n瀰\n瀾\n灌\n灏\n灑\n灘\n灝\n灞\n灣\n火\n灬\n灭\n灯\n灰\n灵\n灶\n灸\n灼\n災\n灾\n灿\n炀\n炁\n炅\n炉\n炊\n炎\n炒\n炔\n炕\n炖\n炙\n炜\n炫\n炬\n炭\n炮\n炯\n炳\n炷\n炸\n点\n為\n炼\n炽\n烁\n烂\n烃\n烈\n烊\n烏\n烘\n烙\n烛\n烟\n烤\n烦\n烧\n烨\n烩\n烫\n烬\n热\n烯\n烷\n烹\n烽\n焉\n焊\n焕\n焖\n焗\n焘\n焙\n焚\n焜\n無\n焦\n焯\n焰\n焱\n然\n焼\n煅\n煉\n煊\n煌\n煎\n煒\n煖\n煙\n煜\n煞\n煤\n煥\n煦\n照\n煨\n煩\n煮\n煲\n煸\n煽\n熄\n熊\n熏\n熒\n熔\n熙\n熟\n熠\n熨\n熬\n熱\n熵\n熹\n熾\n燁\n燃\n燄\n燈\n燉\n燊\n燎\n燒\n燔\n燕\n燙\n燜\n營\n燥\n燦\n燧\n燭\n燮\n燴\n燻\n燼\n燿\n爆\n爍\n爐\n爛\n爪\n爬\n爭\n爰\n爱\n爲\n爵\n父\n爷\n爸\n爹\n爺\n爻\n爽\n爾\n牆\n片\n版\n牌\n牍\n牒\n牙\n牛\n牝\n牟\n牠\n牡\n牢\n牦\n牧\n物\n牯\n牲\n牴\n牵\n特\n牺\n牽\n犀\n犁\n犄\n犊\n犍\n犒\n犢\n犧\n犬\n犯\n状\n犷\n犸\n犹\n狀\n狂\n狄\n狈\n狎\n狐\n狒\n狗\n狙\n狞\n狠\n狡\n狩\n独\n狭\n狮\n狰\n狱\n狸\n狹\n狼\n狽\n猎\n猕\n猖\n猗\n猙\n猛\n猜\n猝\n猥\n猩\n猪
\n猫\n猬\n献\n猴\n猶\n猷\n猾\n猿\n獄\n獅\n獎\n獐\n獒\n獗\n獠\n獣\n獨\n獭\n獰\n獲\n獵\n獷\n獸\n獺\n獻\n獼\n獾\n玄\n率\n玉\n王\n玑\n玖\n玛\n玟\n玠\n玥\n玩\n玫\n玮\n环\n现\n玲\n玳\n玷\n玺\n玻\n珀\n珂\n珅\n珈\n珉\n珊\n珍\n珏\n珐\n珑\n珙\n珞\n珠\n珣\n珥\n珩\n珪\n班\n珮\n珲\n珺\n現\n球\n琅\n理\n琇\n琉\n琊\n琍\n琏\n琐\n琛\n琢\n琥\n琦\n琨\n琪\n琬\n琮\n琰\n琲\n琳\n琴\n琵\n琶\n琺\n琼\n瑀\n瑁\n瑄\n瑋\n瑕\n瑗\n瑙\n瑚\n瑛\n瑜\n瑞\n瑟\n瑠\n瑣\n瑤\n瑩\n瑪\n瑯\n瑰\n瑶\n瑾\n璀\n璁\n璃\n璇\n璉\n璋\n璎\n璐\n璜\n璞\n璟\n璧\n璨\n環\n璽\n璿\n瓊\n瓏\n瓒\n瓜\n瓢\n瓣\n瓤\n瓦\n瓮\n瓯\n瓴\n瓶\n瓷\n甄\n甌\n甕\n甘\n甙\n甚\n甜\n生\n產\n産\n甥\n甦\n用\n甩\n甫\n甬\n甭\n甯\n田\n由\n甲\n申\n电\n男\n甸\n町\n画\n甾\n畀\n畅\n界\n畏\n畑\n畔\n留\n畜\n畝\n畢\n略\n畦\n番\n畫\n異\n畲\n畳\n畴\n當\n畸\n畹\n畿\n疆\n疇\n疊\n疏\n疑\n疔\n疖\n疗\n疙\n疚\n疝\n疟\n疡\n疣\n疤\n疥\n疫\n疮\n疯\n疱\n疲\n疳\n疵\n疸\n疹\n疼\n疽\n疾\n痂\n病\n症\n痈\n痉\n痊\n痍\n痒\n痔\n痕\n痘\n痙\n痛\n痞\n痠\n痢\n痣\n痤\n痧\n痨\n痪\n痫\n痰\n痱\n痴\n痹\n痺\n痼\n痿\n瘀\n瘁\n瘋\n瘍\n瘓\n瘘\n瘙\n瘟\n瘠\n瘡\n瘢\n瘤\n瘦\n瘧\n瘩\n瘪\n瘫\n瘴\n瘸\n瘾\n療\n癇\n癌\n癒\n癖\n癜\n癞\n癡\n癢\n癣\n癥\n癫\n癬\n癮\n癱\n癲\n癸\n発\n登\n發\n白\n百\n皂\n的\n皆\n皇\n皈\n皋\n皎\n皑\n皓\n皖\n皙\n皚\n皮\n皰\n皱\n皴\n皺\n皿\n盂\n盃\n盅\n盆\n盈\n益\n盎\n盏\n盐\n监\n盒\n盔\n盖\n盗\n盘\n盛\n盜\n盞\n盟\n盡\n監\n盤\n盥\n盧\n盪\n目\n盯\n盱\n盲\n直\n相\n盹\n盼\n盾\n省\n眈\n眉\n看\n県\n眙\n眞\n真\n眠\n眦\n眨\n眩\n眯\n眶\n眷\n眸\n眺\n眼\n眾\n着\n睁\n睇\n睏\n睐\n睑\n睛\n睜\n睞\n睡\n睢\n督\n睥\n睦\n睨\n睪\n睫\n睬\n睹\n睽\n睾\n睿\n瞄\n瞅\n瞇\n瞋\n瞌\n瞎\n瞑\n瞒\n瞓\n瞞\n瞟\n瞠\n瞥\n瞧\n瞩\n瞪\n瞬\n瞭\n瞰\n瞳\n瞻\n瞼\n瞿\n矇\n矍\n矗\n矚\n矛\n矜\n矢\n矣\n知\n矩\n矫\n短\n矮\n矯\n石\n矶\n矽\n矾\n矿\n码\n砂\n砌\n砍\n砒\n研\n砖\n砗\n砚\n砝\n砣\n砥\n砧\n砭\n砰\n砲\n破\n砷\n砸\n砺\n砼\n砾\n础\n硅\n硐\n硒\n硕\n硝\n硫\n硬\n确\n硯\n硼\n碁\n碇\n碉\n碌\n碍\n碎\n碑\n碓\n碗\n碘\n碚\n碛\n碟\n碣\n碧\n碩\n碰\n碱\n碳\n碴\n確\n碼\n碾\n磁\n磅\n磊\n磋\n磐\n磕\n磚\n磡\n磨\n磬\n磯\n磲\n磷\n磺\n礁\n礎\n礙\n礡\n礦\n礪\n礫\n礴\n示\n礼\n社\n祀\n祁\n祂\n祇\n祈\n祉\n祎\n祐\n祕\n祖\n祗\n祚\n祛\n祜\n祝\n神\n祟\n祠\n祢\n祥\n票\n祭\n祯\n祷\n祸\n祺\n祿\n禀\n禁\n禄\n禅\n禍\n禎\n福\n禛\n禦\n禧\n禪\n禮\n禱\n禹\n禺\n离\n禽\n禾\n禿\n秀\n私\n秃\n秆\n秉\n秋\n种\n科\n秒\n秘\n租\n秣\n秤\n秦\n秧\n秩\n秭\n积\n称\n秸\n移\n秽\n稀\n稅\n程\n稍\n税\n稔\n稗\n稚\n稜\n稞\n稟\n稠\n稣\n種\n稱\n稲\n稳\n稷\n稹\n稻\n稼\n稽\n稿\n穀\n穂\n穆\n穌\n積\n穎\n穗\n穢\n穩\n穫\n穴\n究\n穷\n穹\n空\n穿\n突\n窃\n窄\n窈\n窍\n窑\n窒\n窓\n窕\n窖\n窗\n窘\n窜\n窝\n窟\n窠\n窥\n窦\n窨\n窩\n窪\n窮\n窯\n窺\n窿\n竄\n竅\n竇\n竊\n立\n竖\n站\n竜\n竞\n竟\n章\n竣\n童\n竭\n端\n競\n竹\n竺\n竽\n竿\n笃\n笆\n笈\n笋\n笏\n
笑\n笔\n笙\n笛\n笞\n笠\n符\n笨\n第\n笹\n笺\n笼\n筆\n等\n筊\n筋\n筍\n筏\n筐\n筑\n筒\n答\n策\n筛\n筝\n筠\n筱\n筲\n筵\n筷\n筹\n签\n简\n箇\n箋\n箍\n箏\n箐\n箔\n箕\n算\n箝\n管\n箩\n箫\n箭\n箱\n箴\n箸\n節\n篁\n範\n篆\n篇\n築\n篑\n篓\n篙\n篝\n篠\n篡\n篤\n篩\n篪\n篮\n篱\n篷\n簇\n簌\n簍\n簡\n簦\n簧\n簪\n簫\n簷\n簸\n簽\n簾\n簿\n籁\n籃\n籌\n籍\n籐\n籟\n籠\n籤\n籬\n籮\n籲\n米\n类\n籼\n籽\n粄\n粉\n粑\n粒\n粕\n粗\n粘\n粟\n粤\n粥\n粧\n粪\n粮\n粱\n粲\n粳\n粵\n粹\n粼\n粽\n精\n粿\n糅\n糊\n糍\n糕\n糖\n糗\n糙\n糜\n糞\n糟\n糠\n糧\n糬\n糯\n糰\n糸\n系\n糾\n紀\n紂\n約\n紅\n紉\n紊\n紋\n納\n紐\n紓\n純\n紗\n紘\n紙\n級\n紛\n紜\n素\n紡\n索\n紧\n紫\n紮\n累\n細\n紳\n紹\n紺\n終\n絃\n組\n絆\n経\n結\n絕\n絞\n絡\n絢\n給\n絨\n絮\n統\n絲\n絳\n絵\n絶\n絹\n綁\n綏\n綑\n經\n継\n続\n綜\n綠\n綢\n綦\n綫\n綬\n維\n綱\n網\n綴\n綵\n綸\n綺\n綻\n綽\n綾\n綿\n緊\n緋\n総\n緑\n緒\n緘\n線\n緝\n緞\n締\n緣\n編\n緩\n緬\n緯\n練\n緹\n緻\n縁\n縄\n縈\n縛\n縝\n縣\n縫\n縮\n縱\n縴\n縷\n總\n績\n繁\n繃\n繆\n繇\n繋\n織\n繕\n繚\n繞\n繡\n繩\n繪\n繫\n繭\n繳\n繹\n繼\n繽\n纂\n續\n纍\n纏\n纓\n纔\n纖\n纜\n纠\n红\n纣\n纤\n约\n级\n纨\n纪\n纫\n纬\n纭\n纯\n纰\n纱\n纲\n纳\n纵\n纶\n纷\n纸\n纹\n纺\n纽\n纾\n线\n绀\n练\n组\n绅\n细\n织\n终\n绊\n绍\n绎\n经\n绑\n绒\n结\n绔\n绕\n绘\n给\n绚\n绛\n络\n绝\n绞\n统\n绡\n绢\n绣\n绥\n绦\n继\n绩\n绪\n绫\n续\n绮\n绯\n绰\n绳\n维\n绵\n绶\n绷\n绸\n绻\n综\n绽\n绾\n绿\n缀\n缄\n缅\n缆\n缇\n缈\n缉\n缎\n缓\n缔\n缕\n编\n缘\n缙\n缚\n缜\n缝\n缠\n缢\n缤\n缥\n缨\n缩\n缪\n缭\n缮\n缰\n缱\n缴\n缸\n缺\n缽\n罂\n罄\n罌\n罐\n网\n罔\n罕\n罗\n罚\n罡\n罢\n罩\n罪\n置\n罰\n署\n罵\n罷\n罹\n羁\n羅\n羈\n羊\n羌\n美\n羔\n羚\n羞\n羟\n羡\n羣\n群\n羥\n羧\n羨\n義\n羯\n羲\n羸\n羹\n羽\n羿\n翁\n翅\n翊\n翌\n翎\n習\n翔\n翘\n翟\n翠\n翡\n翦\n翩\n翰\n翱\n翳\n翹\n翻\n翼\n耀\n老\n考\n耄\n者\n耆\n耋\n而\n耍\n耐\n耒\n耕\n耗\n耘\n耙\n耦\n耨\n耳\n耶\n耷\n耸\n耻\n耽\n耿\n聂\n聆\n聊\n聋\n职\n聒\n联\n聖\n聘\n聚\n聞\n聪\n聯\n聰\n聲\n聳\n聴\n聶\n職\n聽\n聾\n聿\n肃\n肄\n肅\n肆\n肇\n肉\n肋\n肌\n肏\n肓\n肖\n肘\n肚\n肛\n肝\n肠\n股\n肢\n肤\n肥\n肩\n肪\n肮\n肯\n肱\n育\n肴\n肺\n肽\n肾\n肿\n胀\n胁\n胃\n胄\n胆\n背\n胍\n胎\n胖\n胚\n胛\n胜\n胝\n胞\n胡\n胤\n胥\n胧\n胫\n胭\n胯\n胰\n胱\n胳\n胴\n胶\n胸\n胺\n能\n脂\n脅\n脆\n脇\n脈\n脉\n脊\n脍\n脏\n脐\n脑\n脓\n脖\n脘\n脚\n脛\n脣\n脩\n脫\n脯\n脱\n脲\n脳\n脸\n脹\n脾\n腆\n腈\n腊\n腋\n腌\n腎\n腐\n腑\n腓\n腔\n腕\n腥\n腦\n腩\n腫\n腭\n腮\n腰\n腱\n腳\n腴\n腸\n腹\n腺\n腻\n腼\n腾\n腿\n膀\n膈\n膊\n膏\n膑\n膘\n膚\n膛\n膜\n膝\n膠\n膦\n膨\n膩\n膳\n膺\n膻\n膽\n膾\n膿\n臀\n臂\n臃\n臆\n臉\n臊\n臍\n臓\n臘\n臟\n臣\n臥\n臧\n臨\n自\n臬\n臭\n至\n致\n臺\n臻\n臼\n臾\n舀\n舂\n舅\n舆\n與\n興\n舉\n舊\n舌\n舍\n舎\n舐\n舒\n舔\n舖\n舗\n舛\n舜\n舞\n舟\n航\n舫\n般\n舰\n舱\n舵\n舶\n舷\n舸\n船\n舺\n舾\n艇\n艋\n艘\n艙\
n艦\n艮\n良\n艰\n艱\n色\n艳\n艷\n艹\n艺\n艾\n节\n芃\n芈\n芊\n芋\n芍\n芎\n芒\n芙\n芜\n芝\n芡\n芥\n芦\n芩\n芪\n芫\n芬\n芭\n芮\n芯\n花\n芳\n芷\n芸\n芹\n芻\n芽\n芾\n苁\n苄\n苇\n苋\n苍\n苏\n苑\n苒\n苓\n苔\n苕\n苗\n苛\n苜\n苞\n苟\n苡\n苣\n若\n苦\n苫\n苯\n英\n苷\n苹\n苻\n茁\n茂\n范\n茄\n茅\n茉\n茎\n茏\n茗\n茜\n茧\n茨\n茫\n茬\n茭\n茯\n茱\n茲\n茴\n茵\n茶\n茸\n茹\n茼\n荀\n荃\n荆\n草\n荊\n荏\n荐\n荒\n荔\n荖\n荘\n荚\n荞\n荟\n荠\n荡\n荣\n荤\n荥\n荧\n荨\n荪\n荫\n药\n荳\n荷\n荸\n荻\n荼\n荽\n莅\n莆\n莉\n莊\n莎\n莒\n莓\n莖\n莘\n莞\n莠\n莢\n莧\n莪\n莫\n莱\n莲\n莴\n获\n莹\n莺\n莽\n莿\n菀\n菁\n菅\n菇\n菈\n菊\n菌\n菏\n菓\n菖\n菘\n菜\n菟\n菠\n菡\n菩\n華\n菱\n菲\n菸\n菽\n萁\n萃\n萄\n萊\n萋\n萌\n萍\n萎\n萘\n萝\n萤\n营\n萦\n萧\n萨\n萩\n萬\n萱\n萵\n萸\n萼\n落\n葆\n葉\n著\n葚\n葛\n葡\n董\n葦\n葩\n葫\n葬\n葭\n葯\n葱\n葳\n葵\n葷\n葺\n蒂\n蒋\n蒐\n蒔\n蒙\n蒜\n蒞\n蒟\n蒡\n蒨\n蒲\n蒸\n蒹\n蒻\n蒼\n蒿\n蓁\n蓄\n蓆\n蓉\n蓋\n蓑\n蓓\n蓖\n蓝\n蓟\n蓦\n蓬\n蓮\n蓼\n蓿\n蔑\n蔓\n蔔\n蔗\n蔘\n蔚\n蔡\n蔣\n蔥\n蔫\n蔬\n蔭\n蔵\n蔷\n蔺\n蔻\n蔼\n蔽\n蕁\n蕃\n蕈\n蕉\n蕊\n蕎\n蕙\n蕤\n蕨\n蕩\n蕪\n蕭\n蕲\n蕴\n蕻\n蕾\n薄\n薅\n薇\n薈\n薊\n薏\n薑\n薔\n薙\n薛\n薦\n薨\n薩\n薪\n薬\n薯\n薰\n薹\n藉\n藍\n藏\n藐\n藓\n藕\n藜\n藝\n藤\n藥\n藩\n藹\n藻\n藿\n蘆\n蘇\n蘊\n蘋\n蘑\n蘚\n蘭\n蘸\n蘼\n蘿\n虎\n虏\n虐\n虑\n虔\n處\n虚\n虛\n虜\n虞\n號\n虢\n虧\n虫\n虬\n虱\n虹\n虻\n虽\n虾\n蚀\n蚁\n蚂\n蚊\n蚌\n蚓\n蚕\n蚜\n蚝\n蚣\n蚤\n蚩\n蚪\n蚯\n蚱\n蚵\n蛀\n蛆\n蛇\n蛊\n蛋\n蛎\n蛐\n蛔\n蛙\n蛛\n蛟\n蛤\n蛭\n蛮\n蛰\n蛳\n蛹\n蛻\n蛾\n蜀\n蜂\n蜃\n蜆\n蜇\n蜈\n蜊\n蜍\n蜒\n蜓\n蜕\n蜗\n蜘\n蜚\n蜜\n蜡\n蜢\n蜥\n蜱\n蜴\n蜷\n蜻\n蜿\n蝇\n蝈\n蝉\n蝌\n蝎\n蝕\n蝗\n蝙\n蝟\n蝠\n蝦\n蝨\n蝴\n蝶\n蝸\n蝼\n螂\n螃\n融\n螞\n螢\n螨\n螯\n螳\n螺\n蟀\n蟄\n蟆\n蟋\n蟎\n蟑\n蟒\n蟠\n蟬\n蟲\n蟹\n蟻\n蟾\n蠅\n蠍\n蠔\n蠕\n蠛\n蠟\n蠡\n蠢\n蠣\n蠱\n蠶\n蠹\n蠻\n血\n衄\n衅\n衆\n行\n衍\n術\n衔\n街\n衙\n衛\n衝\n衞\n衡\n衢\n衣\n补\n表\n衩\n衫\n衬\n衮\n衰\n衲\n衷\n衹\n衾\n衿\n袁\n袂\n袄\n袅\n袈\n袋\n袍\n袒\n袖\n袜\n袞\n袤\n袪\n被\n袭\n袱\n裁\n裂\n装\n裆\n裊\n裏\n裔\n裕\n裘\n裙\n補\n裝\n裟\n裡\n裤\n裨\n裱\n裳\n裴\n裸\n裹\n製\n裾\n褂\n複\n褐\n褒\n褓\n褔\n褚\n褥\n褪\n褫\n褲\n褶\n褻\n襁\n襄\n襟\n襠\n襪\n襬\n襯\n襲\n西\n要\n覃\n覆\n覇\n見\n規\n覓\n視\n覚\n覦\n覧\n親\n覬\n観\n覷\n覺\n覽\n觀\n见\n观\n规\n觅\n视\n览\n觉\n觊\n觎\n觐\n觑\n角\n觞\n解\n觥\n触\n觸\n言\n訂\n計\n訊\n討\n訓\n訕\n訖\n託\n記\n訛\n訝\n訟\n訣\n訥\n訪\n設\n許\n訳\n訴\n訶\n診\n註\n証\n詆\n詐\n詔\n評\n詛\n詞\n詠\n詡\n詢\n詣\n試\n詩\n詫\n詬\n詭\n詮\n詰\n話\n該\n詳\n詹\n詼\n誅\n誇\n誉\n誌\n認\n誓\n誕\n誘\n語\n誠\n誡\n誣\n誤\n誥\n誦\n誨\n說\n説\n読\n誰\n課\n誹\n誼\n調\n諄\n談\n請\n諏\n諒\n論\n諗\n諜\n諡\n諦\n諧\n諫\n諭\n諮\n諱\n諳\n諷\n諸\n諺\n諾\n謀\n謁\n謂\n謄\n謊\n謎\n謐\n謔\n謗\n謙\n講\n謝\n謠
\n謨\n謬\n謹\n謾\n譁\n證\n譎\n譏\n識\n譙\n譚\n譜\n警\n譬\n譯\n議\n譲\n譴\n護\n譽\n讀\n變\n讓\n讚\n讞\n计\n订\n认\n讥\n讧\n讨\n让\n讪\n讫\n训\n议\n讯\n记\n讲\n讳\n讴\n讶\n讷\n许\n讹\n论\n讼\n讽\n设\n访\n诀\n证\n诃\n评\n诅\n识\n诈\n诉\n诊\n诋\n词\n诏\n译\n试\n诗\n诘\n诙\n诚\n诛\n话\n诞\n诟\n诠\n诡\n询\n诣\n诤\n该\n详\n诧\n诩\n诫\n诬\n语\n误\n诰\n诱\n诲\n说\n诵\n诶\n请\n诸\n诺\n读\n诽\n课\n诿\n谀\n谁\n调\n谄\n谅\n谆\n谈\n谊\n谋\n谌\n谍\n谎\n谏\n谐\n谑\n谒\n谓\n谔\n谕\n谗\n谘\n谙\n谚\n谛\n谜\n谟\n谢\n谣\n谤\n谥\n谦\n谧\n谨\n谩\n谪\n谬\n谭\n谯\n谱\n谲\n谴\n谶\n谷\n豁\n豆\n豇\n豈\n豉\n豊\n豌\n豎\n豐\n豔\n豚\n象\n豢\n豪\n豫\n豬\n豹\n豺\n貂\n貅\n貌\n貓\n貔\n貘\n貝\n貞\n負\n財\n貢\n貧\n貨\n販\n貪\n貫\n責\n貯\n貰\n貳\n貴\n貶\n買\n貸\n費\n貼\n貽\n貿\n賀\n賁\n賂\n賃\n賄\n資\n賈\n賊\n賑\n賓\n賜\n賞\n賠\n賡\n賢\n賣\n賤\n賦\n質\n賬\n賭\n賴\n賺\n購\n賽\n贅\n贈\n贊\n贍\n贏\n贓\n贖\n贛\n贝\n贞\n负\n贡\n财\n责\n贤\n败\n账\n货\n质\n贩\n贪\n贫\n贬\n购\n贮\n贯\n贰\n贱\n贲\n贴\n贵\n贷\n贸\n费\n贺\n贻\n贼\n贾\n贿\n赁\n赂\n赃\n资\n赅\n赈\n赊\n赋\n赌\n赎\n赏\n赐\n赓\n赔\n赖\n赘\n赚\n赛\n赝\n赞\n赠\n赡\n赢\n赣\n赤\n赦\n赧\n赫\n赭\n走\n赳\n赴\n赵\n赶\n起\n趁\n超\n越\n趋\n趕\n趙\n趟\n趣\n趨\n足\n趴\n趵\n趸\n趺\n趾\n跃\n跄\n跆\n跋\n跌\n跎\n跑\n跖\n跚\n跛\n距\n跟\n跡\n跤\n跨\n跩\n跪\n路\n跳\n践\n跷\n跹\n跺\n跻\n踉\n踊\n踌\n踏\n踐\n踝\n踞\n踟\n踢\n踩\n踪\n踮\n踱\n踴\n踵\n踹\n蹂\n蹄\n蹇\n蹈\n蹉\n蹊\n蹋\n蹑\n蹒\n蹙\n蹟\n蹣\n蹤\n蹦\n蹩\n蹬\n蹭\n蹲\n蹴\n蹶\n蹺\n蹼\n蹿\n躁\n躇\n躉\n躊\n躋\n躍\n躏\n躪\n身\n躬\n躯\n躲\n躺\n軀\n車\n軋\n軌\n軍\n軒\n軟\n転\n軸\n軼\n軽\n軾\n較\n載\n輒\n輓\n輔\n輕\n輛\n輝\n輟\n輩\n輪\n輯\n輸\n輻\n輾\n輿\n轄\n轅\n轆\n轉\n轍\n轎\n轟\n车\n轧\n轨\n轩\n转\n轭\n轮\n软\n轰\n轲\n轴\n轶\n轻\n轼\n载\n轿\n较\n辄\n辅\n辆\n辇\n辈\n辉\n辊\n辍\n辐\n辑\n输\n辕\n辖\n辗\n辘\n辙\n辛\n辜\n辞\n辟\n辣\n辦\n辨\n辩\n辫\n辭\n辮\n辯\n辰\n辱\n農\n边\n辺\n辻\n込\n辽\n达\n迁\n迂\n迄\n迅\n过\n迈\n迎\n运\n近\n返\n还\n这\n进\n远\n违\n连\n迟\n迢\n迤\n迥\n迦\n迩\n迪\n迫\n迭\n述\n迴\n迷\n迸\n迹\n迺\n追\n退\n送\n适\n逃\n逅\n逆\n选\n逊\n逍\n透\n逐\n递\n途\n逕\n逗\n這\n通\n逛\n逝\n逞\n速\n造\n逢\n連\n逮\n週\n進\n逵\n逶\n逸\n逻\n逼\n逾\n遁\n遂\n遅\n遇\n遊\n運\n遍\n過\n遏\n遐\n遑\n遒\n道\n達\n違\n遗\n遙\n遛\n遜\n遞\n遠\n遢\n遣\n遥\n遨\n適\n遭\n遮\n遲\n遴\n遵\n遶\n遷\n選\n遺\n遼\n遽\n避\n邀\n邁\n邂\n邃\n還\n邇\n邈\n邊\n邋\n邏\n邑\n邓\n邕\n邛\n邝\n邢\n那\n邦\n邨\n邪\n邬\n邮\n邯\n邰\n邱\n邳\n邵\n邸\n邹\n邺\n邻\n郁\n郅\n郊\n郎\n郑\n郜\n郝\n郡\n郢\n郤\n郦\n郧\n部\n郫\n郭\n郴\n郵\n郷\n郸\n都\n鄂\n鄉\n鄒\n鄔\n鄙\n鄞\n鄢\n鄧\n鄭\n鄰\n鄱\n鄲\n鄺\n酉\n酊\n酋\n酌\n配\n酐\n酒\n酗\n酚\n酝\n酢\n酣\n酥\n酩\n酪\n酬\n酮\n酯\n酰\n酱\n酵\n酶\n酷\n酸\n酿\n醃\n醇\n醉\n醋\n醍\n醐\n醒\n醚\n
醛\n醜\n醞\n醣\n醪\n醫\n醬\n醮\n醯\n醴\n醺\n釀\n釁\n采\n釉\n释\n釋\n里\n重\n野\n量\n釐\n金\n釗\n釘\n釜\n針\n釣\n釦\n釧\n釵\n鈀\n鈉\n鈍\n鈎\n鈔\n鈕\n鈞\n鈣\n鈦\n鈪\n鈴\n鈺\n鈾\n鉀\n鉄\n鉅\n鉉\n鉑\n鉗\n鉚\n鉛\n鉤\n鉴\n鉻\n銀\n銃\n銅\n銑\n銓\n銖\n銘\n銜\n銬\n銭\n銮\n銳\n銷\n銹\n鋁\n鋅\n鋒\n鋤\n鋪\n鋰\n鋸\n鋼\n錄\n錐\n錘\n錚\n錠\n錢\n錦\n錨\n錫\n錮\n錯\n録\n錳\n錶\n鍊\n鍋\n鍍\n鍛\n鍥\n鍰\n鍵\n鍺\n鍾\n鎂\n鎊\n鎌\n鎏\n鎔\n鎖\n鎗\n鎚\n鎧\n鎬\n鎮\n鎳\n鏈\n鏖\n鏗\n鏘\n鏞\n鏟\n鏡\n鏢\n鏤\n鏽\n鐘\n鐮\n鐲\n鐳\n鐵\n鐸\n鐺\n鑄\n鑊\n鑑\n鑒\n鑣\n鑫\n鑰\n鑲\n鑼\n鑽\n鑾\n鑿\n针\n钉\n钊\n钎\n钏\n钒\n钓\n钗\n钙\n钛\n钜\n钝\n钞\n钟\n钠\n钡\n钢\n钣\n钤\n钥\n钦\n钧\n钨\n钩\n钮\n钯\n钰\n钱\n钳\n钴\n钵\n钺\n钻\n钼\n钾\n钿\n铀\n铁\n铂\n铃\n铄\n铅\n铆\n铉\n铎\n铐\n铛\n铜\n铝\n铠\n铡\n铢\n铣\n铤\n铨\n铩\n铬\n铭\n铮\n铰\n铲\n铵\n银\n铸\n铺\n链\n铿\n销\n锁\n锂\n锄\n锅\n锆\n锈\n锉\n锋\n锌\n锏\n锐\n锑\n错\n锚\n锟\n锡\n锢\n锣\n锤\n锥\n锦\n锭\n键\n锯\n锰\n锲\n锵\n锹\n锺\n锻\n镀\n镁\n镂\n镇\n镉\n镌\n镍\n镐\n镑\n镕\n镖\n镗\n镛\n镜\n镣\n镭\n镯\n镰\n镳\n镶\n長\n长\n門\n閃\n閉\n開\n閎\n閏\n閑\n閒\n間\n閔\n閘\n閡\n関\n閣\n閥\n閨\n閩\n閱\n閲\n閹\n閻\n閾\n闆\n闇\n闊\n闌\n闍\n闔\n闕\n闖\n闘\n關\n闡\n闢\n门\n闪\n闫\n闭\n问\n闯\n闰\n闲\n间\n闵\n闷\n闸\n闹\n闺\n闻\n闽\n闾\n阀\n阁\n阂\n阅\n阆\n阇\n阈\n阉\n阎\n阐\n阑\n阔\n阕\n阖\n阙\n阚\n阜\n队\n阡\n阪\n阮\n阱\n防\n阳\n阴\n阵\n阶\n阻\n阿\n陀\n陂\n附\n际\n陆\n陇\n陈\n陋\n陌\n降\n限\n陕\n陛\n陝\n陞\n陟\n陡\n院\n陣\n除\n陨\n险\n陪\n陰\n陲\n陳\n陵\n陶\n陷\n陸\n険\n陽\n隅\n隆\n隈\n隊\n隋\n隍\n階\n随\n隐\n隔\n隕\n隘\n隙\n際\n障\n隠\n隣\n隧\n隨\n險\n隱\n隴\n隶\n隸\n隻\n隼\n隽\n难\n雀\n雁\n雄\n雅\n集\n雇\n雉\n雋\n雌\n雍\n雎\n雏\n雑\n雒\n雕\n雖\n雙\n雛\n雜\n雞\n離\n難\n雨\n雪\n雯\n雰\n雲\n雳\n零\n雷\n雹\n電\n雾\n需\n霁\n霄\n霆\n震\n霈\n霉\n霊\n霍\n霎\n霏\n霑\n霓\n霖\n霜\n霞\n霧\n霭\n霰\n露\n霸\n霹\n霽\n霾\n靂\n靄\n靈\n青\n靓\n靖\n静\n靚\n靛\n靜\n非\n靠\n靡\n面\n靥\n靦\n革\n靳\n靴\n靶\n靼\n鞅\n鞋\n鞍\n鞏\n鞑\n鞘\n鞠\n鞣\n鞦\n鞭\n韆\n韋\n韌\n韓\n韜\n韦\n韧\n韩\n韬\n韭\n音\n韵\n韶\n韻\n響\n頁\n頂\n頃\n項\n順\n須\n頌\n預\n頑\n頒\n頓\n頗\n領\n頜\n頡\n頤\n頫\n頭\n頰\n頷\n頸\n頹\n頻\n頼\n顆\n題\n額\n顎\n顏\n顔\n願\n顛\n類\n顧\n顫\n顯\n顱\n顴\n页\n顶\n顷\n项\n顺\n须\n顼\n顽\n顾\n顿\n颁\n颂\n预\n颅\n领\n颇\n颈\n颉\n颊\n颌\n颍\n颐\n频\n颓\n颔\n颖\n颗\n题\n颚\n颛\n颜\n额\n颞\n颠\n颡\n颢\n颤\n颦\n颧\n風\n颯\n颱\n颳\n颶\n颼\n飄\n飆\n风\n飒\n飓\n飕\n飘\n飙\n飚\n飛\n飞\n食\n飢\n飨\n飩\n飪\n飯\n飲\n飼\n飽\n飾\n餃\n餅\n餉\n養\n餌\n餐\n餒\n餓\n餘\n餚\n餛\n餞\n餡\n館\n餮\n餵\n餾\n饅\n饈\n饋\n饌\n饍\n饑\n饒\n饕\n饗\n饞\n饥\n饨\n饪\n饬\n饭\n饮\n饯\n饰\n饱\n饲\n饴\n饵\n饶\n饷\n饺\n饼\n饽\n饿\n馀\n馁\n馄\n馅\n馆\n馈\n馋\n馍\n馏\n馒\n馔\n首\n馗\n香\
n馥\n馨\n馬\n馭\n馮\n馳\n馴\n駁\n駄\n駅\n駆\n駐\n駒\n駕\n駛\n駝\n駭\n駱\n駿\n騁\n騎\n騏\n験\n騙\n騨\n騰\n騷\n驀\n驅\n驊\n驍\n驒\n驕\n驗\n驚\n驛\n驟\n驢\n驥\n马\n驭\n驮\n驯\n驰\n驱\n驳\n驴\n驶\n驷\n驸\n驹\n驻\n驼\n驾\n驿\n骁\n骂\n骄\n骅\n骆\n骇\n骈\n骊\n骋\n验\n骏\n骐\n骑\n骗\n骚\n骛\n骜\n骞\n骠\n骡\n骤\n骥\n骧\n骨\n骯\n骰\n骶\n骷\n骸\n骼\n髂\n髅\n髋\n髏\n髒\n髓\n體\n髖\n高\n髦\n髪\n髮\n髯\n髻\n鬃\n鬆\n鬍\n鬓\n鬚\n鬟\n鬢\n鬣\n鬥\n鬧\n鬱\n鬼\n魁\n魂\n魄\n魅\n魇\n魍\n魏\n魔\n魘\n魚\n魯\n魷\n鮑\n鮨\n鮪\n鮭\n鮮\n鯉\n鯊\n鯖\n鯛\n鯨\n鯰\n鯽\n鰍\n鰓\n鰭\n鰲\n鰻\n鰾\n鱈\n鱉\n鱔\n鱗\n鱷\n鱸\n鱼\n鱿\n鲁\n鲈\n鲍\n鲑\n鲛\n鲜\n鲟\n鲢\n鲤\n鲨\n鲫\n鲱\n鲲\n鲶\n鲷\n鲸\n鳃\n鳄\n鳅\n鳌\n鳍\n鳕\n鳖\n鳗\n鳝\n鳞\n鳥\n鳩\n鳳\n鳴\n鳶\n鴉\n鴕\n鴛\n鴦\n鴨\n鴻\n鴿\n鵑\n鵜\n鵝\n鵡\n鵬\n鵰\n鵲\n鶘\n鶩\n鶯\n鶴\n鷗\n鷲\n鷹\n鷺\n鸚\n鸞\n鸟\n鸠\n鸡\n鸢\n鸣\n鸥\n鸦\n鸨\n鸪\n鸭\n鸯\n鸳\n鸵\n鸽\n鸾\n鸿\n鹂\n鹃\n鹄\n鹅\n鹈\n鹉\n鹊\n鹌\n鹏\n鹑\n鹕\n鹘\n鹜\n鹞\n鹤\n鹦\n鹧\n鹫\n鹭\n鹰\n鹳\n鹵\n鹹\n鹼\n鹽\n鹿\n麂\n麋\n麒\n麓\n麗\n麝\n麟\n麥\n麦\n麩\n麴\n麵\n麸\n麺\n麻\n麼\n麽\n麾\n黃\n黄\n黍\n黎\n黏\n黑\n黒\n黔\n默\n黛\n黜\n黝\n點\n黠\n黨\n黯\n黴\n鼋\n鼎\n鼐\n鼓\n鼠\n鼬\n鼹\n鼻\n鼾\n齁\n齊\n齋\n齐\n齒\n齡\n齢\n齣\n齦\n齿\n龄\n龅\n龈\n龊\n龋\n龌\n龍\n龐\n龔\n龕\n龙\n龚\n龛\n龜\n龟\n︰\n︱\n︶\n︿\n﹁\n﹂\n﹍\n﹏\n﹐\n﹑\n﹒\n﹔\n﹕\n﹖\n﹗\n﹙\n﹚\n﹝\n﹞\n﹡\n﹣\n！\n＂\n＃\n＄\n％\n＆\n＇\n（\n）\n＊\n＋\n，\n－\n．\n／\n０\n１\n２\n３\n４\n５\n６\n７\n８\n９\n：\n；\n＜\n＝\n＞\n？\n＠\n［\n＼\n］\n＾\n＿\n｀\nａ\nｂ\nｃ\nｄ\nｅ\nｆ\nｇ\nｈ\nｉ\nｊ\nｋ\nｌ\nｍ\nｎ\nｏ\nｐ\nｑ\nｒ\nｓ\nｔ\nｕ\nｖ\nｗ\nｘ\nｙ\nｚ\n｛\n｜\n｝\n～\n｡\n｢\n｣\n､\n･\nｯ\nｰ\nｲ\nｸ\nｼ\nｽ\nﾄ\nﾉ\nﾌ\nﾗ\nﾙ\nﾝ\nﾞ\nﾟ\n￣\n￥\n👍\n🔥\n😂\n😎\n...\nyam\n10\n2017\n12\n11\n2016\n20\n30\n15\n06\nlofter\n##s\n2015\nby\n16\n14\n18\n13\n24\n17\n2014\n21\n##0\n22\n19\n25\n23\ncom\n100\n00\n05\n2013\n##a\n03\n09\n08\n28\n##2\n50\n01\n04\n##1\n27\n02\n2012\n##3\n26\n##e\n07\n##8\n##5\n##6\n##4\n##9\n##7\n29\n2011\n40\n##t\n2010\n##o\n##d\n##i\n2009\n##n\napp\nwww\nthe\n##m\n31\n##c\n##l\n##y\n##r\n##g\n2008\n60\nhttp\n200\nqq\n##p\n80\n##f\ngoogle\npixnet\n90\ncookies\ntripadvisor\n500\n##er\n##k\n35\n##h\nfacebook\n2007\n2000\n70\n##b\nof\n##x\n##u\n45\n300\niphone\n32\n1000\n2006\n48\nip\n36\nin\n38\n3d\n##w\n##ing\n55\nctrip\n##on\n##v\n33\n##の\nto\n34\n400\nid\n2005\nit\n37\nwindows\nllc\ntop\n99\n42\n39\n000\nled\nat\n##an\n41\n51\n52\n46\n49\n43
\n53\n44\n##z\nandroid\n58\nand\n59\n2004\n56\nvr\n##か\n5000\n2003\n47\nblogthis\ntwitter\n54\n##le\n150\nok\n2018\n57\n75\ncn\nno\nios\n##in\n##mm\n##00\n800\non\nte\n3000\n65\n2001\n360\n95\nig\nlv\n120\n##ng\n##を\n##us\n##に\npc\nてす\n──\n600\n##te\n85\n2002\n88\n##ed\nhtml\nncc\nwifi\nemail\n64\nblog\nis\n##10\n##て\nmail\nonline\n##al\ndvd\n##ic\nstudio\n##は\n##℃\n##ia\n##と\nline\nvip\n72\n##q\n98\n##ce\n##en\nfor\n##is\n##ra\n##es\n##j\nusb\nnet\ncp\n1999\nasia\n4g\n##cm\ndiy\nnew\n3c\n##お\nta\n66\nlanguage\nvs\napple\ntw\n86\nweb\n##ne\nipad\n62\nyou\n##re\n101\n68\n##tion\nps\nde\nbt\npony\natm\n##2017\n1998\n67\n##ch\nceo\n##or\ngo\n##na\nav\npro\ncafe\n96\npinterest\n97\n63\npixstyleme3c\n##ta\nmore\nsaid\n##2016\n1997\nmp3\n700\n##ll\nnba\njun\n##20\n92\ntv\n1995\npm\n61\n76\nnbsp\n250\n##ie\nlinux\n##ma\ncd\n110\nhd\n##17\n78\n##ion\n77\n6000\nam\n##th\n##st\n94\n##se\n##et\n69\n180\ngdp\nmy\n105\n81\nabc\n89\nflash\n79\none\n93\n1990\n1996\n##ck\ngps\n##も\n##ly\nweb885\n106\n2020\n91\n##ge\n4000\n1500\nxd\nboss\nisbn\n1994\norg\n##ry\nme\nlove\n##11\n0fork\n73\n##12\n3g\n##ter\n##ar\n71\n82\n##la\nhotel\n130\n1970\npk\n83\n87\n140\nie\n##os\n##30\n##el\n74\n##50\nseo\ncpu\n##ml\np2p\n84\nmay\n##る\nsun\ntue\ninternet\ncc\nposted\nyoutube\n##at\n##ン\n##man\nii\n##ル\n##15\nabs\nnt\npdf\nyahoo\nago\n1980\n##it\nnews\nmac\n104\n##てす\n##me\n##り\njava\n1992\nspa\n##de\n##nt\nhk\nall\nplus\nla\n1993\n##mb\n##16\n##ve\nwest\n##da\n160\nair\n##い\n##ps\nから\n##to\n1989\nlogo\nhtc\nphp\nhttps\nfi\nmomo\n##son\nsat\n##ke\n##80\nebd\nsuv\nwi\nday\napk\n##88\n##um\nmv\ngalaxy\nwiki\nor\nbrake\n##ス\n1200\nする\nthis\n1991\nmon\n##こ\n❤2017\npo\n##ない\njavascript\nlife\nhome\njune\n##ss\nsystem\n900\n##ー\n##０\npp\n1988\nworld\nfb\n4k\nbr\n##as\nic\nai\nleonardo\nsafari\n##60\nlive\nfree\nxx\nwed\nwin7\nkiehl\n##co\nlg\no2o\n##go\nus\n235\n1949\nmm\nしい\nvfm\nkanye\n##90\n##2015\n##id\njr\n##ey\n123\nrss\n##sa\n##ro\n##am\n##no\nthu\nfri\n350\n##sh\n##ki\n103\ncomments\nname\n##の
て\n##pe\n##ine\nmax\n1987\n8000\nuber\n##mi\n##ton\nwordpress\noffice\n1986\n1985\n##ment\n107\nbd\nwin10\n##ld\n##li\ngmail\nbb\ndior\n##rs\n##ri\n##rd\n##ます\nup\ncad\n##®\ndr\nして\nread\n##21\nをお\n##io\n##99\nurl\n1984\npvc\npaypal\nshow\npolicy\n##40\n##ty\n##18\nwith\n##★\n##01\ntxt\n102\n##ba\ndna\nfrom\npost\nmini\nar\ntaiwan\njohn\n##ga\nprivacy\nagoda\n##13\n##ny\nword\n##24\n##22\n##by\n##ur\n##hz\n1982\n##ang\n265\ncookie\nnetscape\n108\n##ka\n##～\n##ad\nhouse\nshare\nnote\nibm\ncode\nhello\nnike\nsim\nsurvey\n##016\n1979\n1950\nwikia\n##32\n##017\n5g\ncbc\n##tor\n##kg\n1983\n##rt\n##14\ncampaign\nstore\n2500\nos\n##ct\n##ts\n##°\n170\napi\n##ns\n365\nexcel\n##な\n##ao\n##ら\n##し\n～～\n##nd\nuniversity\n163\nには\n518\n##70\n##ya\n##il\n##25\npierre\nipo\n0020\n897\n##23\nhotels\n##ian\nのお\n125\nyears\n6606\n##ers\n##26\nhigh\n##day\ntime\n##ay\nbug\n##line\n##く\n##す\n##be\nxp\ntalk2yam\nyamservice\n10000\ncoco\n##dy\nsony\n##ies\n1978\nmicrosoft\ndavid\npeople\n##ha\n1960\ninstagram\nintel\nその\n##ot\niso\n1981\n##va\n115\n##mo\n##land\nxxx\nman\nco\nltxsw\n##ation\nbaby\n220\n##pa\n##ol\n1945\n7000\ntag\n450\n##ue\nmsn\n##31\noppo\n##ト\n##ca\ncontrol\n##om\nst\nchrome\n##ure\n##ん\nbe\n##き\nlol\n##19\nした\n##bo\n240\nlady\n##100\n##way\n##から\n4600\n##ko\n##do\n##un\n4s\ncorporation\n168\n##ni\nherme\n##28\nｃｐ\n978\n##up\n##06\nui\n##ds\nppt\nadmin\nthree\nします\nbbc\nre\n128\n##48\nca\n##015\n##35\nhp\n##ee\ntpp\n##た\n##ive\n××\nroot\n##cc\n##ました\n##ble\n##ity\nadobe\npark\n114\net\noled\ncity\n##ex\n##ler\n##ap\nchina\n##book\n20000\nview\n##ice\nglobal\n##km\nyour\nhong\n##mg\nout\n##ms\nng\nebay\n##29\nmenu\nubuntu\n##cy\nrom\n##view\nopen\nktv\ndo\nserver\n##lo\nif\nenglish\n##ね\n##５\n##oo\n1600\n##02\nstep1\nkong\nclub\n135\njuly\ninc\n1976\nmr\nhi\n##net\ntouch\n##ls\n##ii\nmichael\nlcd\n##05\n##33\nphone\njames\nstep2\n1300\nios9\n##box\ndc\n##２\n##ley\nsamsung\n111\n280\npokemon\ncss\n##ent\n##les\nいいえ\n##１\ns8\natom\nplay\nbmw\n##said\nsa\netf\nctrl\n♥yoyo
♥\n##55\n2025\n##2014\n##66\nadidas\namazon\n1958\n##ber\n##ner\nvisa\n##77\n##der\n1800\nconnectivity\n##hi\nfirefox\n109\n118\nhr\nso\nstyle\nmark\npop\nol\nskip\n1975\nas\n##27\n##ir\n##61\n190\nmba\n##う\n##ai\nle\n##ver\n1900\ncafe2017\nlte\nsuper\n113\n129\n##ron\namd\nlike\n##☆\nare\n##ster\nwe\n##sk\npaul\ndata\ninternational\n##ft\nlongchamp\nssd\ngood\n##ート\n##ti\nreply\n##my\n↓↓↓\napr\nstar\n##ker\nsource\n136\njs\n112\nget\nforce\nphoto\n##one\n126\n##2013\n##ow\nlink\nbbs\n1972\ngoods\n##lin\npython\n119\n##ip\ngame\n##ics\n##ません\nblue\n##●\n520\n##45\npage\nitunes\n##03\n1955\n260\n1968\ngt\ngif\n618\n##ff\n##47\ngroup\nくたさい\nabout\nbar\nganji\n##nce\nmusic\nlee\nnot\n1977\n1971\n1973\n##per\nan\nfaq\ncomment\n##って\ndays\n##ock\n116\n##bs\n1974\n1969\nv1\nplayer\n1956\nxbox\nsql\nfm\nf1\n139\n##ah\n210\n##lv\n##mp\n##000\nmelody\n1957\n##３\n550\n17life\n199\n1966\nxml\nmarket\n##au\n##71\n999\n##04\nwhat\ngl\n##95\n##age\ntips\n##68\nbook\n##ting\nmysql\ncan\n1959\n230\n##ung\nwonderland\nwatch\n10℃\n##ction\n9000\nmar\nmobile\n1946\n1962\narticle\n##db\npart\n▲top\nparty\nって\n1967\n1964\n1948\n##07\n##ore\n##op\nこの\ndj\n##78\n##38\n010\nmain\n225\n1965\n##ong\nart\n320\nad\n134\n020\n##73\n117\npm2\njapan\n228\n##08\nts\n1963\n##ica\nder\nsm\n##36\n2019\n##wa\nct\n##７\n##や\n##64\n1937\nhomemesh\nsearch\n##85\n##れは\n##tv\n##di\nmacbook\n##９\n##くたさい\nservice\n##♥\ntype\nった\n750\n##ier\n##si\n##75\n##います\n##ok\nbest\n##ット\ngoris\nlock\n##った\ncf\n3m\nbig\n##ut\nftp\ncarol\n##vi\n１０\n1961\nhappy\nsd\n##ac\n122\nanti\npe\ncnn\niii\n1920\n138\n##ラ\n1940\nesp\njan\ntags\n##98\n##51\naugust\nvol\n##86\n154\n##™\n##fs\n##れ\n##sion\ndesign\nac\n##ム\npress\njordan\nppp\nthat\nkey\ncheck\n##６\n##tt\n##㎡\n1080p\n##lt\npower\n##42\n1952\n##bc\nvivi\n##ック\nhe\n133\n121\njpg\n##rry\n201\n175\n3500\n1947\nnb\n##ted\n##rn\nしています\n1954\nusd\n##t00\nmaster\n##ンク\n001\nmodel\n##58\nal\n##09\n1953\n##34\nram\ngoo\nても\n##ui\n127\n1930\nred\n##ary\nrpg\nitem\n##pm\n##41\n270\n
##za\nproject\n##2012\nhot\ntd\nblogabstract\n##ger\n##62\n650\n##44\ngr2\n##します\n##ｍ\nblack\nelectronic\nnfc\nyear\nasus\nまた\nhtml5\ncindy\n##hd\nm3\n132\nesc\n##od\nbooking\n##53\nfed\ntvb\n##81\n##ina\nmit\n165\n##いる\nchan\n192\ndistribution\nnext\nになる\npeter\nbios\nsteam\ncm\n1941\nにも\npk10\n##ix\n##65\n##91\ndec\nnasa\n##ana\nicecat\n00z\nb1\nwill\n##46\nli\nse\n##ji\n##み\n##ard\noct\n##ain\njp\n##ze\n##bi\ncio\n##56\nsmart\nh5\n##39\n##port\ncurve\nvpn\n##nm\n##dia\nutc\n##あり\n12345678910\n##52\nrmvb\nchanel\na4\nmiss\n##and\n##im\nmedia\nwho\n##63\nshe\ngirl\n5s\n124\nvera\n##して\nclass\nvivo\nking\n##フ\n##ei\nnational\nab\n1951\n5cm\n888\n145\nipod\nap\n1100\n5mm\n211\nms\n2756\n##69\nmp4\nmsci\n##po\n##89\n131\nmg\nindex\n380\n##bit\n##out\n##zz\n##97\n##67\n158\napec\n##８\nphotoshop\nopec\n￥799\nては\n##96\n##tes\n##ast\n2g\n○○\n##ール\n￥2899\n##ling\n##よ\n##ory\n1938\n##ical\nkitty\ncontent\n##43\nstep3\n##cn\nwin8\n155\nvc\n1400\niphone7\nrobert\n##した\ntcl\n137\nbeauty\n##87\nen\ndollars\n##ys\n##oc\nstep\npay\nyy\na1\n##2011\n##lly\n##ks\n##♪\n1939\n188\ndownload\n1944\nsep\nexe\nph\nいます\nschool\ngb\ncenter\npr\nstreet\n##board\nuv\n##37\n##lan\nwinrar\n##que\n##ua\n##com\n1942\n1936\n480\ngpu\n##４\nettoday\nfu\ntom\n##54\n##ren\n##via\n149\n##72\nb2b\n144\n##79\n##tch\nrose\narm\nmb\n##49\n##ial\n##nn\nnvidia\nstep4\nmvp\n00㎡\nyork\n156\n##イ\nhow\ncpi\n591\n2765\ngov\nkg\njoe\n##xx\nmandy\npa\n##ser\ncopyright\nfashion\n1935\ndon\n##け\necu\n##ist\n##art\nerp\nwap\nhave\n##lm\ntalk\n##ek\n##ning\n##if\nch\n##ite\nvideo\n1943\ncs\nsan\niot\nlook\n##84\n##2010\n##ku\noctober\n##ux\ntrump\n##hs\n##ide\nbox\n141\nfirst\n##ins\napril\n##ight\n##83\n185\nangel\nprotected\naa\n151\n162\nx1\nm2\n##fe\n##×\n##ho\nsize\n143\nmin\nofo\nfun\ngomaji\nex\nhdmi\nfood\ndns\nmarch\nchris\nkevin\n##のか\n##lla\n##pp\n##ec\nag\nems\n6s\n720p\n##rm\n##ham\noff\n##92\nasp\nteam\nfandom\ned\n299\n▌♥\n##ell\ninfo\nされています\n##82\nsina\n4066\n161\n##able\n##ctor\n330\n399\n315\ndll\nri
ghts\nltd\nidc\njul\n3kg\n1927\n142\nma\nsurface\n##76\n##ク\n～～～\n304\nmall\neps\n146\ngreen\n##59\nmap\nspace\ndonald\nv2\nsodu\n##light\n1931\n148\n1700\nまて\n310\nreserved\nhtm\n##han\n##57\n2d\n178\nmod\n##ise\n##tions\n152\nti\n##shi\ndoc\n1933\nicp\n055\nwang\n##ram\nshopping\naug\n##pi\n##well\nnow\nwam\nb2\nからお\n##hu\n236\n1928\n##gb\n266\nf2\n##93\n153\nmix\n##ef\n##uan\nbwl\n##plus\n##res\ncore\n##ess\ntea\n5℃\nhktvmall\nnhk\n##ate\nlist\n##ese\n301\nfeb\n4m\ninn\nての\nnov\n159\n12345\ndaniel\n##ci\npass\n##bet\n##nk\ncoffee\n202\nssl\nairbnb\n##ute\nfbi\nwoshipm\nskype\nea\ncg\nsp\n##fc\n##www\nyes\nedge\nalt\n007\n##94\nfpga\n##ght\n##gs\niso9001\nさい\n##ile\n##wood\n##uo\nimage\nlin\nicon\namerican\n##em\n1932\nset\nsays\n##king\n##tive\nblogger\n##74\nなと\n256\n147\n##ox\n##zy\n##red\n##ium\n##lf\nnokia\nclaire\n##リ\n##ding\nnovember\nlohas\n##500\n##tic\n##マ\n##cs\n##ある\n##che\n##ire\n##gy\n##ult\ndb\njanuary\nwin\n##カ\n166\nroad\nptt\n##ま\n##つ\n198\n##fa\n##mer\nanna\npchome\nはい\nudn\nef\n420\n##time\n##tte\n2030\n##ア\ng20\nwhite\nかかります\n1929\n308\ngarden\neleven\ndi\n##おります\nchen\n309b\n777\n172\nyoung\ncosplay\nちてない\n4500\nbat\n##123\n##tra\n##ては\nkindle\nnpc\nsteve\netc\n##ern\n##｜\ncall\nxperia\nces\ntravel\nsk\ns7\n##ous\n1934\n##int\nみいたたけます\n183\nedu\nfile\ncho\nqr\n##car\n##our\n186\n##ant\n##ｄ\neric\n1914\nrends\n##jo\n##する\nmastercard\n##2000\nkb\n##min\n290\n##ino\nvista\n##ris\n##ud\njack\n2400\n##set\n169\npos\n1912\n##her\n##ou\ntaipei\nしく\n205\nbeta\n##ませんか\n232\n##fi\nexpress\n255\nbody\n##ill\naphojoy\nuser\ndecember\nmeiki\n##ick\ntweet\nrichard\n##av\n##ᆫ\niphone6\n##dd\nちてすか\nviews\n##mark\n321\npd\n##００\ntimes\n##▲\nlevel\n##ash\n10g\npoint\n5l\n##ome\n208\nkoreanmall\n##ak\ngeorge\nq2\n206\nwma\ntcp\n##200\nスタッフ\nfull\nmlb\n##lle\n##watch\ntm\nrun\n179\n911\nsmith\nbusiness\n##und\n1919\ncolor\n##tal\n222\n171\n##less\nmoon\n4399\n##rl\nupdate\npcb\nshop\n499\n157\nlittle\nなし\nend\n##mhz\nvan\ndsp\neasy\n660\n##house\n##key\nhistory
\n##ｏ\noh\n##001\n##hy\n##web\noem\nlet\nwas\n##2009\n##gg\nreview\n##wan\n182\n##°c\n203\nuc\ntitle\n##val\nunited\n233\n2021\n##ons\ndoi\ntrivago\noverdope\nsbs\n##ance\n##ち\ngrand\nspecial\n573032185\nimf\n216\nwx17house\n##so\n##ーム\naudi\n##he\nlondon\nwilliam\n##rp\n##ake\nscience\nbeach\ncfa\namp\nps4\n880\n##800\n##link\n##hp\ncrm\nferragamo\nbell\nmake\n##eng\n195\nunder\nzh\nphotos\n2300\n##style\n##ント\nvia\n176\nda\n##gi\ncompany\ni7\n##ray\nthomas\n370\nufo\ni5\n##max\nplc\nben\nback\nresearch\n8g\n173\nmike\n##pc\n##ッフ\nseptember\n189\n##ace\nvps\nfebruary\n167\npantos\nwp\nlisa\n1921\n★★\njquery\nnight\nlong\noffer\n##berg\n##news\n1911\n##いて\nray\nfks\nwto\nせます\nover\n164\n340\n##all\n##rus\n1924\n##888\n##works\nblogtitle\nloftpermalink\n##→\n187\nmartin\ntest\nling\nkm\n##め\n15000\nfda\nv3\n##ja\n##ロ\nｗedding\nかある\noutlet\nfamily\n##ea\nをこ\n##top\nstory\n##ness\nsalvatore\n##lu\n204\nswift\n215\nroom\nしている\noracle\n##ul\n1925\nsam\nb2c\nweek\npi\nrock\n##のは\n##ａ\n##けと\n##ean\n##300\n##gle\ncctv\nafter\nchinese\n##back\npowered\nx2\n##tan\n1918\n##nes\n##イン\ncanon\nonly\n181\n##zi\n##las\nsay\n##oe\n184\n##sd\n221\n##bot\n##world\n##zo\nsky\nmade\ntop100\njust\n1926\npmi\n802\n234\ngap\n##vr\n177\nles\n174\n▲topoct\nball\nvogue\nvi\ning\nofweek\ncos\n##list\n##ort\n▲topmay\n##なら\n##lon\nとして\nlast\n##tc\n##of\n##bus\n##gen\nreal\neva\n##コ\na3\nnas\n##lie\n##ria\n##coin\n##bt\n▲topapr\nhis\n212\ncat\nnata\nvive\nhealth\n⋯⋯\ndrive\nsir\n▲topmar\ndu\ncup\n##カー\n##ook\n##よう\n##sy\nalex\nmsg\ntour\nしました\n3ce\n##word\n193\nebooks\nr8\nblock\n318\n##より\n2200\nnice\npvp\n207\nmonths\n1905\nrewards\n##ther\n1917\n0800\n##xi\n##チ\n##sc\nmicro\n850\ngg\nblogfp\nop\n1922\ndaily\nm1\n264\ntrue\n##bb\nml\n##tar\n##のお\n##ky\nanthony\n196\n253\n##yo\nstate\n218\n##ara\n##aa\n##rc\n##tz\n##ston\nより\ngear\n##eo\n##ade\nge\nsee\n1923\n##win\n##ura\nss\nheart\n##den\n##ita\ndown\n##sm\nel\npng\n2100\n610\nrakuten\nwhatsapp\nbay\ndream\nadd\n##use\n680\n311\npad\ngucci\nmp
v\n##ode\n##fo\nisland\n▲topjun\n##▼\n223\njason\n214\nchicago\n##❤\nしの\n##hone\nio\n##れる\n##ことか\nsogo\nbe2\n##ology\n990\ncloud\nvcd\n##con\n2～3\n##ford\n##joy\n##kb\n##こさいます\n##rade\nbut\n##ach\ndocker\n##ful\nrfid\nul\n##ase\nhit\nford\n##star\n580\n##○\n１１\na2\nsdk\nreading\nedited\n##are\ncmos\n##mc\n238\nsiri\nlight\n##ella\n##ため\nbloomberg\n##read\npizza\n##ison\njimmy\n##vm\ncollege\nnode\njournal\nba\n18k\n##play\n245\n##cer\n２０\nmagic\n##yu\n191\njump\n288\ntt\n##ings\nasr\n##lia\n3200\nstep5\nnetwork\n##cd\nmc\nいします\n1234\npixstyleme\n273\n##600\n2800\nmoney\n★★★★★\n1280\n１２\n430\nbl\nみの\nact\n##tus\ntokyo\n##rial\n##life\nemba\n##ae\nsaas\ntcs\n##rk\n##wang\nsummer\n##sp\nko\n##ving\n390\npremium\n##その\nnetflix\n##ヒ\nuk\nmt\n##lton\nright\nfrank\ntwo\n209\nえる\n##ple\n##cal\n021\n##んな\n##sen\n##ville\nhold\nnexus\ndd\n##ius\nてお\n##mah\n##なく\ntila\nzero\n820\nce\n##tin\nresort\n##ws\ncharles\nold\np10\n5d\nreport\n##360\n##ru\n##には\nbus\nvans\nlt\n##est\npv\n##レ\nlinks\nrebecca\n##ツ\n##dm\nazure\n##365\nきな\nlimited\nbit\n4gb\n##mon\n1910\nmoto\n##eam\n213\n1913\nvar\neos\nなとの\n226\nblogspot\nされた\n699\ne3\ndos\ndm\nfc\n##ments\n##ik\n##kw\nboy\n##bin\n##ata\n960\ner\n##せ\n219\n##vin\n##tu\n##ula\n194\n##∥\nstation\n##ろ\n##ature\n835\nfiles\nzara\nhdr\ntop10\nnature\n950\nmagazine\ns6\nmarriott\n##シ\navira\ncase\n##っと\ntab\n##ran\ntony\n##home\noculus\nim\n##ral\njean\nsaint\ncry\n307\nrosie\n##force\n##ini\nice\n##bert\nのある\n##nder\n##mber\npet\n2600\n##◆\nplurk\n▲topdec\n##sis\n00kg\n▲topnov\n720\n##ence\ntim\n##ω\n##nc\n##ても\n##name\nlog\nips\ngreat\nikea\nmalaysia\nunix\n##イト\n3600\n##ncy\n##nie\n12000\nakb48\n##ye\n##oid\n404\n##chi\n##いた\noa\nxuehai\n##1000\n##orm\n##rf\n275\nさん\n##ware\n##リー\n980\nho\n##pro\ntext\n##era\n560\nbob\n227\n##ub\n##2008\n8891\nscp\navi\n##zen\n2022\nmi\nwu\nmuseum\nqvod\napache\nlake\njcb\n▲topaug\n★★★\nni\n##hr\nhill\n302\nne\nweibo\n490\nruby\n##ーシ\n##ヶ\n##row\n4d\n▲topjul\niv\n##ish\ngithub\n306\nmate\n312\n##スト\n##lot\
n##ane\nandrew\nのハイト\n##tina\nt1\nrf\ned2k\n##vel\n##900\nway\nfinal\nりの\nns\n5a\n705\n197\n##メ\nsweet\nbytes\n##ene\n▲topjan\n231\n##cker\n##2007\n##px\n100g\ntopapp\n229\nhelpapp\nrs\nlow\n14k\ng4g\ncare\n630\nldquo\nあり\n##fork\nleave\nrm\nedition\n##gan\n##zon\n##qq\n▲topsep\n##google\n##ism\ngold\n224\nexplorer\n##zer\ntoyota\ncategory\nselect\nvisual\n##labels\nrestaurant\n##md\nposts\ns1\n##ico\nもっと\nangelababy\n123456\n217\nsports\ns3\nmbc\n1915\nしてくたさい\nshell\nx86\ncandy\n##new\nkbs\nface\nxl\n470\n##here\n4a\nswissinfo\nv8\n▲topfeb\ndram\n##ual\n##vice\n3a\n##wer\nsport\nq1\nios10\npublic\nint\ncard\n##ｃ\nep\nau\nrt\n##れた\n1080\nbill\n##mll\nkim\n３０\n460\nwan\n##uk\n##ミ\nx3\n298\n0t\nscott\n##ming\n239\ne5\n##3d\nh7n9\nworldcat\nbrown\n##あります\n##vo\n##led\n##580\n##ax\n249\n410\n##ert\nparis\n##～6\npolo\n925\n##lr\n599\n##ナ\ncapital\n##hing\nbank\ncv\n1g\n##chat\n##ｓ\n##たい\nadc\n##ule\n2m\n##ｅ\ndigital\nhotmail\n268\n##pad\n870\nbbq\nquot\n##ring\nbefore\nwali\n##まて\nmcu\n2k\n2b\nという\ncostco\n316\nnorth\n333\nswitch\n##city\n##ｐ\nphilips\n##mann\nmanagement\npanasonic\n##cl\n##vd\n##ping\n##rge\nalice\n##lk\n##ましょう\ncss3\n##ney\nvision\nalpha\n##ular\n##400\n##tter\nlz\nにお\n##ありません\nmode\ngre\n1916\npci\n##tm\n237\n1～2\n##yan\n##そ\nについて\n##let\n##キ\nwork\nwar\ncoach\nah\nmary\n##ᅵ\nhuang\n##pt\na8\npt\nfollow\n##berry\n1895\n##ew\na5\nghost\n##ション\n##wn\n##og\nsouth\n##code\ngirls\n##rid\naction\nvilla\ngit\nr11\ntable\ngames\n##cket\nerror\n##anonymoussaid\n##ag\nhere\n##ame\n##gc\nqa\n##■\n##lis\ngmp\n##gin\nvmalife\n##cher\nyu\nwedding\n##tis\ndemo\ndragon\n530\nsoho\nsocial\nbye\n##rant\nriver\norz\nacer\n325\n##↑\n##ース\n##ats\n261\ndel\n##ven\n440\nups\n##ように\n##ター\n305\nvalue\nmacd\nyougou\n##dn\n661\n##ano\nll\n##urt\n##rent\ncontinue\nscript\n##wen\n##ect\npaper\n263\n319\nshift\n##chel\n##フト\n##cat\n258\nx5\nfox\n243\n##さん\ncar\naaa\n##blog\nloading\n##yn\n##tp\nkuso\n799\nsi\nsns\nイカせるテンマ\nヒンクテンマ3\nrmb\nvdc\nforest\ncentral\nprime\nhelp\nultra\n##
rmb\n##ような\n241\nsquare\n688\n##しい\nのないフロクに\n##field\n##reen\n##ors\n##ju\nc1\nstart\n510\n##air\n##map\ncdn\n##wo\ncba\nstephen\nm8\n100km\n##get\nopera\n##base\n##ood\nvsa\ncom™\n##aw\n##ail\n251\nなのて\ncount\nt2\n##ᅡ\n##een\n2700\nhop\n##gp\nvsc\ntree\n##eg\n##ose\n816\n285\n##ories\n##shop\nalphago\nv4\n1909\nsimon\n##ᆼ\nfluke62max\nzip\nスホンサー\n##sta\nlouis\ncr\nbas\n##～10\nbc\n##yer\nhadoop\n##ube\n##wi\n1906\n0755\nhola\n##low\nplace\ncentre\n5v\nd3\n##fer\n252\n##750\n##media\n281\n540\n0l\nexchange\n262\nseries\n##ハー\n##san\neb\n##bank\n##ｋ\nq3\n##nge\n##mail\ntake\n##lp\n259\n1888\nclient\neast\ncache\nevent\nvincent\n##ールを\nきを\n##nse\nsui\n855\nadchoice\n##и\n##stry\n##なたの\n246\n##zone\nga\napps\nsea\n##ab\n248\ncisco\n##タ\n##rner\nkymco\n##care\ndha\n##pu\n##yi\nminkoff\nroyal\np1\nへの\nannie\n269\ncollection\nkpi\nplaystation\n257\nになります\n866\nbh\n##bar\nqueen\n505\nradio\n1904\nandy\narmani\n##xy\nmanager\niherb\n##ery\n##share\nspring\nraid\njohnson\n1908\n##ob\nvolvo\nhall\n##ball\nv6\nour\ntaylor\n##hk\nbi\n242\n##cp\nkate\nbo\nwater\ntechnology\n##rie\nサイトは\n277\n##ona\n##sl\nhpv\n303\ngtx\nhip\nrdquo\njayz\nstone\n##lex\n##rum\nnamespace\n##やり\n620\n##ale\n##atic\ndes\n##erson\n##ql\n##ves\n##type\nenter\n##この\n##てきます\nd2\n##168\n##mix\n##bian\nとの\na9\njj\nky\n##lc\naccess\nmovie\n##hc\nリストに\ntower\n##ration\n##mit\nます\n##nch\nua\ntel\nprefix\n##o2\n1907\n##point\n1901\nott\n～10\n##http\n##ury\nbaidu\n##ink\nmember\n##logy\nbigbang\nnownews\n##js\n##shot\n##tb\n##こと\n247\neba\n##tics\n##lus\nける\nv5\nspark\n##ama\nthere\n##ions\ngod\n##lls\n##down\nhiv\n##ress\nburberry\nday2\n##kv\n◆◆\njeff\nrelated\nfilm\nedit\njoseph\n283\n##ark\ncx\n32gb\norder\ng9\n30000\n##ans\n##tty\ns5\n##bee\nかあります\nthread\nxr\nbuy\nsh\n005\nland\nspotify\nmx\n##ari\n276\n##verse\n×email\nsf\nwhy\n##ことて\n244\n7headlines\nnego\nsunny\ndom\nexo\n401\n666\npositioning\nfit\nrgb\n##tton\n278\nkiss\nalexa\nadam\nlp\nみリストを\n##ｇ\nmp\n##ties\n##llow\namy\n##du\nnp\n002\ninstitute\n27
1\n##rth\n##lar\n2345\n590\n##des\nsidebar\n１５\nimax\nsite\n##cky\n##kit\n##ime\n##009\nseason\n323\n##fun\n##ンター\n##ひ\ngogoro\na7\npu\nlily\nfire\ntwd600\n##ッセーシを\nいて\n##vis\n30ml\n##cture\n##をお\ninformation\n##オ\nclose\nfriday\n##くれる\nyi\nnick\nてすか\n##tta\n##tel\n6500\n##lock\ncbd\neconomy\n254\nかお\n267\ntinker\ndouble\n375\n8gb\nvoice\n##app\noops\nchannel\ntoday\n985\n##right\nraw\nxyz\n##＋\njim\nedm\n##cent\n7500\nsupreme\n814\nds\n##its\n##asia\ndropbox\n##てすか\n##tti\nbooks\n272\n100ml\n##tle\n##ller\n##ken\n##more\n##boy\nsex\n309\n##dom\nt3\n##ider\n##なります\n##unch\n1903\n810\nfeel\n5500\n##かった\n##put\nにより\ns2\nmo\n##gh\nmen\nka\namoled\ndiv\n##tr\n##n1\nport\nhoward\n##tags\nken\ndnf\n##nus\nadsense\n##а\nide\n##へ\nbuff\nthunder\n##town\n##ique\nhas\n##body\nauto\npin\n##erry\ntee\nてした\n295\nnumber\n##the\n##013\nobject\npsp\ncool\nudnbkk\n16gb\n##mic\nmiui\n##tro\nmost\nr2\n##alk\n##nity\n1880\n±0\n##いました\n428\ns4\nlaw\nversion\n##oa\nn1\nsgs\ndocomo\n##tf\n##ack\nhenry\nfc2\n##ded\n##sco\n##014\n##rite\n286\n0mm\nlinkedin\n##ada\n##now\nwii\n##ndy\nucbug\n##◎\nsputniknews\nlegalminer\n##ika\n##xp\n2gb\n##bu\nq10\noo\nb6\ncome\n##rman\ncheese\nming\nmaker\n##gm\nnikon\n##fig\nppi\nkelly\n##ります\njchere\nてきます\nted\nmd\n003\nfgo\ntech\n##tto\ndan\nsoc\n##gl\n##len\nhair\nearth\n640\n521\nimg\n##pper\n##a1\n##てきる\n##ロク\nacca\n##ition\n##ference\nsuite\n##ig\noutlook\n##mond\n##cation\n398\n##pr\n279\n101vip\n358\n##999\n282\n64gb\n3800\n345\nairport\n##over\n284\n##おり\njones\n##ith\nlab\n##su\n##いるのて\nco2\ntown\npiece\n##llo\nno1\nvmware\n24h\n##qi\nfocus\nreader\n##admin\n##ora\ntb\nfalse\n##log\n1898\nknow\nlan\n838\n##ces\nf4\n##ume\nmotel\nstop\n##oper\nna\nflickr\nnetcomponents\n##af\n##─\npose\nwilliams\nlocal\n##ound\n##cg\n##site\n##iko\nいお\n274\n5m\ngsm\ncon\n##ath\n1902\nfriends\n##hip\ncell\n317\n##rey\n780\ncream\n##cks\n012\n##dp\nfacebooktwitterpinterestgoogle\nsso\n324\nshtml\nsong\nswiss\n##mw\n##キンク\nlumia\nxdd\nstring\ntiffany\n522\nmarc\nられた\
ninsee\nrussell\nsc\ndell\n##ations\nｏｋ\ncamera\n289\n##vs\n##flow\n##late\nclassic\n287\n##nter\nstay\ng1\nmtv\n512\n##ever\n##lab\n##nger\nqe\nsata\nryan\nd1\n50ml\ncms\n##cing\nsu\n292\n3300\neditor\n296\n##nap\nsecurity\nsunday\nassociation\n##ens\n##700\n##bra\nacg\n##かり\nsofascore\nとは\nmkv\n##ign\njonathan\ngary\nbuild\nlabels\n##oto\ntesla\nmoba\nqi\ngohappy\ngeneral\najax\n1024\n##かる\nサイト\nsociety\n##test\n##urs\nwps\nfedora\n##ich\nmozilla\n328\n##480\n##dr\nusa\nurn\n##lina\n##ｒ\ngrace\n##die\n##try\n##ader\n1250\n##なり\nelle\n570\n##chen\n##ᆯ\nprice\n##ten\nuhz\n##ough\neq\n##hen\nstates\npush\nsession\nbalance\nwow\n506\n##cus\n##py\nwhen\n##ward\n##ep\n34e\nwong\nlibrary\nprada\n##サイト\n##cle\nrunning\n##ree\n313\nck\ndate\nq4\n##ctive\n##ool\n##＞\nmk\n##ira\n##163\n388\ndie\nsecret\nrq\ndota\nbuffet\nは１ヶ\ne6\n##ez\npan\n368\nha\n##card\n##cha\n2a\n##さ\nalan\nday3\neye\nf3\n##end\nfrance\nkeep\nadi\nrna\ntvbs\n##ala\nsolo\nnova\n##え\n##tail\n##ょう\nsupport\n##ries\n##なる\n##ved\nbase\ncopy\niis\nfps\n##ways\nhero\nhgih\nprofile\nfish\nmu\nssh\nentertainment\nchang\n##wd\nclick\ncake\n##ond\npre\n##tom\nkic\npixel\n##ov\n##fl\nproduct\n6a\n##pd\ndear\n##gate\nes\nyumi\naudio\n##²\n##sky\necho\nbin\nwhere\n##ture\n329\n##ape\nfind\nsap\nisis\n##なと\nnand\n##101\n##load\n##ream\nband\na6\n525\nnever\n##post\nfestival\n50cm\n##we\n555\nguide\n314\nzenfone\n##ike\n335\ngd\nforum\njessica\nstrong\nalexander\n##ould\nsoftware\nallen\n##ious\nprogram\n360°\nelse\nlohasthree\n##gar\nすることかてきます\nplease\n##れます\nrc\n##ggle\n##ric\nbim\n50000\n##own\neclipse\n355\nbrian\n3ds\n##side\n061\n361\n##other\n##ける\n##tech\n##ator\n485\nengine\n##ged\n##ｔ\nplaza\n##fit\ncia\nngo\nwestbrook\nshi\ntbs\n50mm\n##みませんか\nsci\n291\nreuters\n##ily\ncontextlink\n##hn\naf\n##cil\nbridge\nvery\n##cel\n1890\ncambridge\n##ize\n15g\n##aid\n##data\n790\nfrm\n##head\naward\nbutler\n##sun\nmeta\n##mar\namerica\nps3\npuma\npmid\n##すか\nlc\n670\nkitchen\n##lic\nオーフン5\nきなしソフトサーヒス\nそして\nday1\nfuture\n
★★★★\n##text\n##page\n##rris\npm1\n##ket\nfans\n##っています\n1001\nchristian\nbot\nkids\ntrackback\n##hai\nc3\ndisplay\n##hl\nn2\n1896\nidea\nさんも\n##sent\nairmail\n##ug\n##men\npwm\nけます\n028\n##lution\n369\n852\nawards\nschemas\n354\nasics\nwikipedia\nfont\n##tional\n##vy\nc2\n293\n##れている\n##dget\n##ein\nっている\ncontact\npepper\nスキル\n339\n##～5\n294\n##uel\n##ument\n730\n##hang\nみてす\nq5\n##sue\nrain\n##ndi\nwei\nswatch\n##cept\nわせ\n331\npopular\n##ste\n##tag\np2\n501\ntrc\n1899\n##west\n##live\njustin\nhonda\nping\nmessenger\n##rap\nv9\n543\n##とは\nunity\nappqq\nはすへて\n025\nleo\n##tone\n##テ\n##ass\nuniqlo\n##010\n502\nher\njane\nmemory\nmoneydj\n##tical\nhuman\n12306\nしていると\n##m2\ncoc\nmiacare\n##mn\ntmt\n##core\nvim\nkk\n##may\nfan\ntarget\nuse\ntoo\n338\n435\n2050\n867\n737\nfast\n##2c\nservices\n##ope\nomega\nenergy\n##わ\npinkoi\n1a\n##なから\n##rain\njackson\n##ement\n##シャンルの\n374\n366\nそんな\np9\nrd\n##ᆨ\n1111\n##tier\n##vic\nzone\n##│\n385\n690\ndl\nisofix\ncpa\nm4\n322\nkimi\nめて\ndavis\n##lay\nlulu\n##uck\n050\nweeks\nqs\n##hop\n920\n##ｎ\nae\n##ear\n～5\neia\n405\n##fly\nkorea\njpeg\nboost\n##ship\nsmall\n##リア\n1860\neur\n297\n425\nvalley\n##iel\nsimple\n##ude\nrn\nk2\n##ena\nされます\nnon\npatrick\nしているから\n##ナー\nfeed\n5757\n30g\nprocess\nwell\nqqmei\n##thing\nthey\naws\nlu\npink\n##ters\n##kin\nまたは\nboard\n##vertisement\nwine\n##ien\nunicode\n##dge\nr1\n359\n##tant\nいを\n##twitter\n##3c\ncool1\nされる\n##れて\n##ｌ\nisp\n##012\nstandard\n45㎡2\n402\n##150\nmatt\n##fu\n326\n##iner\ngooglemsn\npixnetfacebookyahoo\n##ラン\nx7\n886\n##uce\nメーカー\nsao\n##ev\n##きました\n##file\n9678\n403\nxddd\nshirt\n6l\n##rio\n##hat\n3mm\ngivenchy\nya\nbang\n##lio\nmonday\ncrystal\nロクイン\n##abc\n336\nhead\n890\nubuntuforumwikilinuxpastechat\n##vc\n##～20\n##rity\ncnc\n7866\nipv6\nnull\n1897\n##ost\nyang\nimsean\ntiger\n##fet\n##ンス\n352\n##＝\ndji\n327\nji\nmaria\n##come\n##んて\nfoundation\n3100\n##beth\n##なった\n1m\n601\nactive\n##aft\n##don\n3p\nsr\n349\nemma\n##khz\nliving\n415\n353\n1889\n341\n709\n457\nsas\nx6\n#
#face\npptv\nx4\n##mate\nhan\nsophie\n##jing\n337\nfifa\n##mand\nother\nsale\ninwedding\n##gn\nてきちゃいます\n##mmy\n##pmlast\nbad\nnana\nnbc\nしてみてくたさいね\nなとはお\n##wu\n##かあります\n##あ\nnote7\nsingle\n##340\nせからこ\nしてくたさい♪この\nしにはとんとんワークケートを\nするとあなたにもっとマッチした\nならワークケートへ\nもみつかっちゃうかも\nワークケートの\n##bel\nwindow\n##dio\n##ht\nunion\nage\n382\n１４\n##ivity\n##ｙ\nコメント\ndomain\nneo\n##isa\n##lter\n5k\nf5\nsteven\n##cts\npowerpoint\ntft\nself\ng2\nft\n##テル\nzol\n##act\nmwc\n381\n343\nもう\nnbapop\n408\nてある\neds\nace\n##room\nprevious\nauthor\ntomtom\nil\n##ets\nhu\nfinancial\n☆☆☆\nっています\nbp\n5t\nchi\n1gb\n##hg\nfairmont\ncross\n008\ngay\nh2\nfunction\n##けて\n356\nalso\n1b\n625\n##ータ\n##raph\n1894\n3～5\n##ils\ni3\n334\navenue\n##host\nによる\n##bon\n##tsu\nmessage\nnavigation\n50g\nfintech\nh6\n##ことを\n8cm\n##ject\n##vas\n##firm\ncredit\n##wf\nxxxx\nform\n##nor\n##space\nhuawei\nplan\njson\nsbl\n##dc\nmachine\n921\n392\nwish\n##120\n##sol\nwindows7\nedward\n##ために\ndevelopment\nwashington\n##nsis\nlo\n818\n##sio\n##ym\n##bor\nplanet\n##～8\n##wt\nieee\ngpa\n##めて\ncamp\nann\ngm\n##tw\n##oka\nconnect\n##rss\n##work\n##atus\nwall\nchicken\nsoul\n2mm\n##times\nfa\n##ather\n##cord\n009\n##eep\nhitachi\ngui\nharry\n##pan\ne1\ndisney\n##press\n##ーション\nwind\n386\nfrigidaire\n##tl\nliu\nhsu\n332\nbasic\nvon\nev\nいた\nてきる\nスホンサーサイト\nlearning\n##ull\nexpedia\narchives\nchange\n##wei\nsanta\ncut\nins\n6gb\nturbo\nbrand\ncf1\n508\n004\nreturn\n747\n##rip\nh1\n##nis\n##をこ\n128gb\n##にお\n3t\napplication\nしており\nemc\nrx\n##oon\n384\nquick\n412\n15058\nwilson\nwing\nchapter\n##bug\nbeyond\n##cms\n##dar\n##oh\nzoom\ne2\ntrip\nsb\n##nba\nrcep\n342\naspx\nci\n080\ngc\ngnu\nめる\n##count\nadvanced\ndance\ndv\n##url\n##ging\n367\n8591\nam09\nshadow\nbattle\n346\n##ｉ\n##cia\n##という\nemily\n##のてす\n##tation\nhost\nff\ntechorz\nsars\n##mini\n##mporary\n##ering\nnc\n4200\n798\n##next\ncma\n##mbps\n##gas\n##ift\n##dot\n##ィ\n455\n##～17\namana\n##りの\n426\n##ros\nir\n00㎡1\n##eet\n##ible\n##↓\n710\nˋ▽ˊ\n##aka\ndcs\niq\n##ｖ\nl1\n##lor\nmagg
ie\n##011\n##iu\n588\n##～1\n830\n##gt\n1tb\narticles\ncreate\n##burg\n##iki\ndatabase\nfantasy\n##rex\n##cam\ndlc\ndean\n##you\nhard\npath\ngaming\nvictoria\nmaps\ncb\n##lee\n##itor\noverchicstoretvhome\nsystems\n##xt\n416\np3\nsarah\n760\n##nan\n407\n486\nx9\ninstall\nsecond\n626\n##ann\n##ph\n##rcle\n##nic\n860\n##nar\nec\n##とう\n768\nmetro\nchocolate\n##rian\n～4\n##table\n##しています\nskin\n##sn\n395\nmountain\n##0mm\ninparadise\n6m\n7x24\nib\n4800\n##jia\neeworld\ncreative\ng5\ng3\n357\nparker\necfa\nvillage\nからの\n18000\nsylvia\nサーヒス\nhbl\n##ques\n##onsored\n##x2\n##きます\n##v4\n##tein\nie6\n383\n##stack\n389\nver\n##ads\n##baby\nsound\nbbe\n##110\n##lone\n##uid\nads\n022\ngundam\n351\nthinkpad\n006\nscrum\nmatch\n##ave\nmems\n##470\n##oy\n##なりました\n##talk\nglass\nlamigo\nspan\n##eme\njob\n##a5\njay\nwade\nkde\n498\n##lace\nocean\ntvg\n##covery\n##r3\n##ners\n##rea\njunior\nthink\n##aine\ncover\n##ision\n##sia\n↓↓\n##bow\nmsi\n413\n458\n406\n##love\n711\n801\nsoft\nz2\n##pl\n456\n1840\nmobil\nmind\n##uy\n427\nnginx\n##oi\nめた\n##rr\n6221\n##mple\n##sson\n##ーシてす\n371\n##nts\n91tv\ncomhd\ncrv3000\n##uard\n1868\n397\ndeep\nlost\nfield\ngallery\n##bia\nrate\nspf\nredis\ntraction\n930\nicloud\n011\nなら\nfe\njose\n372\n##tory\ninto\nsohu\nfx\n899\n379\nkicstart2\n##hia\nすく\n##～3\n##sit\nra\n２４\n##walk\n##xure\n500g\n##pact\npacific\nxa\nnatural\ncarlo\n##250\n##walker\n1850\n##can\ncto\ngigi\n516\n##サー\npen\n##hoo\nob\nmatlab\n##ｂ\n##yy\n13913459\n##iti\nmango\n##bbs\nsense\nc5\noxford\n##ニア\nwalker\njennifer\n##ola\ncourse\n##bre\n701\n##pus\n##rder\nlucky\n075\n##ぁ\nivy\nなお\n##nia\nsotheby\nside\n##ugh\njoy\n##orage\n##ush\n##bat\n##dt\n364\nr9\n##2d\n##gio\n511\ncountry\nwear\n##lax\n##～7\n##moon\n393\nseven\nstudy\n411\n348\nlonzo\n8k\n##ェ\nevolution\n##イフ\n##kk\ngs\nkd\n##レス\narduino\n344\nb12\n##lux\narpg\n##rdon\ncook\n##x5\ndark\nfive\n##als\n##ida\nとても\nsign\n362\n##ちの\nsomething\n20mm\n##nda\n387\n##posted\nfresh\ntf\n1870\n422\ncam\n##mine\n##skip\n##form\n##ssion\ne
ducation\n394\n##tee\ndyson\nstage\n##jie\nwant\n##night\nepson\npack\nあります\n##ppy\nテリヘル\n##█\nwd\n##eh\n##rence\nleft\n##lvin\ngolden\nmhz\ndiscovery\n##trix\n##n2\nloft\n##uch\n##dra\n##sse\nspeed\n～1\n1mdb\nsorry\nwelcome\n##urn\nwave\ngaga\n##lmer\nteddy\n##160\nトラックハック\nせよ\n611\n##f2016\n378\nrp\n##sha\nrar\n##あなたに\n##きた\n840\nholiday\n##ュー\n373\n074\n##vg\n##nos\n##rail\ngartner\ngi\n6p\n##dium\nkit\n488\nb3\neco\n##ろう\n20g\nsean\n##stone\nautocad\nnu\n##np\nf16\nwrite\n029\nm5\n##ias\nimages\natp\n##dk\nfsm\n504\n1350\nve\n52kb\n##xxx\n##のに\n##cake\n414\nunit\nlim\nru\n1v\n##ification\npublished\nangela\n16g\nanalytics\nak\n##ｑ\n##nel\ngmt\n##icon\nagain\n##₂\n##bby\nios11\n445\nかこさいます\nwaze\nいてす\n##ハ\n9985\n##ust\n##ティー\nframework\n##007\niptv\ndelete\n52sykb\ncl\nwwdc\n027\n30cm\n##fw\n##ての\n1389\n##xon\nbrandt\n##ses\n##dragon\ntc\nvetements\nanne\nmonte\nmodern\nofficial\n##へて\n##ere\n##nne\n##oud\nもちろん\n５０\netnews\n##a2\n##graphy\n421\n863\n##ちゃん\n444\n##rtex\n##てお\nl2\n##gma\nmount\nccd\nたと\narchive\nmorning\ntan\nddos\ne7\n##ホ\nday4\n##ウ\ngis\n453\nits\n495\nfactory\nbruce\npg\n##ito\nってくたさい\nguest\ncdma\n##lling\n536\nn3\nしかし\n3～4\nmega\neyes\nro\n１３\nwomen\ndac\nchurch\n##jun\nsingapore\n##facebook\n6991\nstarbucks\n##tos\n##stin\n##shine\nzen\n##mu\ntina\n20℃\n1893\n##たけて\n503\n465\nrequest\n##gence\nqt\n##っ\n1886\n347\n363\nq7\n##zzi\ndiary\n##tore\n409\n##ead\n468\ncst\n##osa\ncanada\nagent\nva\n##jiang\n##ちは\n##ーク\n##lam\nsg\n##nix\n##sday\n##よって\ng6\n##master\nbing\n##zl\ncharlie\n１６\n8mm\nnb40\n##ーン\nthai\n##ルフ\nln284ct\n##itz\n##2f\nbonnie\n##food\n##lent\noriginals\n##stro\n##lts\n418\n∟∣\n##bscribe\nchildren\nntd\nyesstyle\n##かも\nhmv\n##tment\nd5\n2cm\narts\nsms\n##pn\n##я\n##いい\ntopios9\n539\nlifestyle\nvirtual\n##ague\nxz\n##deo\nmuji\n024\nunt\n##nnis\n##ᅩ\nfaq1\n1884\n396\n##ette\nfly\n64㎡\nはしめまして\n441\ncurry\n##pop\nのこ\nrelease\n##←\n##◆◆\n##cast\n073\nありな\n500ml\n##ews\n5c\n##stle\nios7\n##ima\n787\ndog\nlenovo\n##r4\nroger\n013\ncbs\n
vornado\n100m\n417\n##desk\n##クok\n##ald\n1867\n9595\n2900\n##van\noil\n##ｘ\nsome\nbreak\ncommon\n##jy\n##lines\ng7\ntwice\n419\nella\nnano\nbelle\nにこ\n##mes\n##self\n##note\njb\n##ことかてきます\nbenz\n##との\n##ova\n451\nsave\n##wing\n##ますのて\nkai\nりは\n##hua\n##rect\nrainer\n##unge\n448\n##0m\nadsl\n##かな\nguestname\n##uma\n##kins\n##zu\ntokichoi\n##price\ncounty\n##med\n##mus\nrmk\n391\naddress\nvm\nえて\nopenload\n##group\n##hin\n##iginal\namg\nurban\n##oz\njobs\nemi\n##public\nbeautiful\n##sch\nalbum\n##dden\n##bell\njerry\nworks\nhostel\nmiller\n##drive\n##rmin\n##１０\n376\nboot\n828\n##370\n##fx\n##cm～\n1885\n##nome\n##ctionary\n##oman\n##lish\n##cr\n##hm\n433\n##how\n432\nfrancis\nxi\nc919\nb5\nevernote\n##uc\nvga\n##3000\ncoupe\n##urg\n##cca\n##uality\n019\n6g\nれる\nmulti\n##また\n##ett\nem\nhey\n##ani\n##tax\n##rma\ninside\nthan\n740\nleonnhurt\n##jin\nict\nれた\nbird\nnotes\n200mm\nくの\n##dical\n##lli\nresult\n442\niu\nee\n438\nsmap\ngopro\n##last\nyin\npure\n998\n32g\nけた\n5kg\n##dan\n##rame\nmama\n##oot\nbean\nmarketing\n##hur\n2l\nbella\nsync\nxuite\n##ground\n515\ndiscuz\n##getrelax\n##ince\n##bay\n##5s\ncj\n##イス\ngmat\napt\n##pass\njing\n##rix\nc4\nrich\n##とても\nniusnews\n##ello\nbag\n770\n##eting\n##mobile\n１８\nculture\n015\n##のてすか\n377\n1020\narea\n##ience\n616\ndetails\ngp\nuniversal\nsilver\ndit\nはお\nprivate\nddd\nu11\nkanshu\n##ified\nfung\n##nny\ndx\n##520\ntai\n475\n023\n##fr\n##lean\n3s\n##pin\n429\n##rin\n25000\nly\nrick\n##bility\nusb3\nbanner\n##baru\n##gion\nmetal\ndt\nvdf\n1871\nkarl\nqualcomm\nbear\n1010\noldid\nian\njo\n##tors\npopulation\n##ernel\n1882\nmmorpg\n##mv\n##bike\n603\n##©\nww\nfriend\n##ager\nexhibition\n##del\n##pods\nfpx\nstructure\n##free\n##tings\nkl\n##rley\n##copyright\n##mma\ncalifornia\n3400\norange\nyoga\n4l\ncanmake\nhoney\n##anda\n##コメント\n595\nnikkie\n##ルハイト\ndhl\npublishing\n##mall\n##gnet\n20cm\n513\n##クセス\n##┅\ne88\n970\n##dog\nfishbase\n##!\n##\"\n###\n##$\n##%\n##&\n##'\n##(\n##)\n##*\n##+\n##,\n##-\n##.\n##/\n##:\n##;\n##<\n##=
\n##>\n##?\n##@\n##[\n##\\\n##]\n##^\n##_\n##{\n##|\n##}\n##~\n##£\n##¤\n##¥\n##§\n##«\n##±\n##³\n##µ\n##·\n##¹\n##º\n##»\n##¼\n##ß\n##æ\n##÷\n##ø\n##đ\n##ŋ\n##ɔ\n##ə\n##ɡ\n##ʰ\n##ˇ\n##ˈ\n##ˊ\n##ˋ\n##ˍ\n##ː\n##˙\n##˚\n##ˢ\n##α\n##β\n##γ\n##δ\n##ε\n##η\n##θ\n##ι\n##κ\n##λ\n##μ\n##ν\n##ο\n##π\n##ρ\n##ς\n##σ\n##τ\n##υ\n##φ\n##χ\n##ψ\n##б\n##в\n##г\n##д\n##е\n##ж\n##з\n##к\n##л\n##м\n##н\n##о\n##п\n##р\n##с\n##т\n##у\n##ф\n##х\n##ц\n##ч\n##ш\n##ы\n##ь\n##і\n##ا\n##ب\n##ة\n##ت\n##د\n##ر\n##س\n##ع\n##ل\n##م\n##ن\n##ه\n##و\n##ي\n##۩\n##ก\n##ง\n##น\n##ม\n##ย\n##ร\n##อ\n##า\n##เ\n##๑\n##་\n##ღ\n##ᄀ\n##ᄁ\n##ᄂ\n##ᄃ\n##ᄅ\n##ᄆ\n##ᄇ\n##ᄈ\n##ᄉ\n##ᄋ\n##ᄌ\n##ᄎ\n##ᄏ\n##ᄐ\n##ᄑ\n##ᄒ\n##ᅢ\n##ᅣ\n##ᅥ\n##ᅦ\n##ᅧ\n##ᅨ\n##ᅪ\n##ᅬ\n##ᅭ\n##ᅮ\n##ᅯ\n##ᅲ\n##ᅳ\n##ᅴ\n##ᆷ\n##ᆸ\n##ᆺ\n##ᆻ\n##ᗜ\n##ᵃ\n##ᵉ\n##ᵍ\n##ᵏ\n##ᵐ\n##ᵒ\n##ᵘ\n##‖\n##„\n##†\n##•\n##‥\n##‧\n## \n##‰\n##′\n##″\n##‹\n##›\n##※\n##‿\n##⁄\n##ⁱ\n##⁺\n##ⁿ\n##₁\n##₃\n##₄\n##€\n##№\n##ⅰ\n##ⅱ\n##ⅲ\n##ⅳ\n##ⅴ\n##↔\n##↗\n##↘\n##⇒\n##∀\n##−\n##∕\n##∙\n##√\n##∞\n##∟\n##∠\n##∣\n##∩\n##∮\n##∶\n##∼\n##∽\n##≈\n##≒\n##≡\n##≤\n##≥\n##≦\n##≧\n##≪\n##≫\n##⊙\n##⋅\n##⋈\n##⋯\n##⌒\n##①\n##②\n##③\n##④\n##⑤\n##⑥\n##⑦\n##⑧\n##⑨\n##⑩\n##⑴\n##⑵\n##⑶\n##⑷\n##⑸\n##⒈\n##⒉\n##⒊\n##⒋\n##ⓒ\n##ⓔ\n##ⓘ\n##━\n##┃\n##┆\n##┊\n##┌\n##└\n##├\n##┣\n##═\n##║\n##╚\n##╞\n##╠\n##╭\n##╮\n##╯\n##╰\n##╱\n##╳\n##▂\n##▃\n##▅\n##▇\n##▉\n##▋\n##▌\n##▍\n##▎\n##□\n##▪\n##▫\n##▬\n##△\n##▶\n##►\n##▽\n##◇\n##◕\n##◠\n##◢\n##◤\n##☀\n##☕\n##☞\n##☺\n##☼\n##♀\n##♂\n##♠\n##♡\n##♣\n##♦\n##♫\n##♬\n##✈\n##✔\n##✕\n##✖\n##✦\n##✨\n##✪\n##✰\n##✿\n##❀\n##➜\n##➤\n##⦿\n##、\n##。\n##〃\n##々\n##〇\n##〈\n##〉\n##《\n##》\n##「\n##」\n##『\n##』\n##【\n##】\n##〓\n##〔\n##〕\n##〖\n##〗\n##〜\n##〝\n##〞\n##ぃ\n##ぇ\n##ぬ\n##ふ\n##ほ\n##む\n##ゃ\n##ゅ\n##ゆ\n##ょ\n##゜\n##ゝ\n##ァ\n##ゥ\n##エ\n##ォ\n##ケ\n##サ\n##セ\n##ソ\n##ッ\n##ニ\n##ヌ\n##ネ\n##ノ\n##ヘ\n##モ\n##ャ\n##ヤ\n##ュ\n##ユ\n##ョ\n##ヨ\n##ワ\n##ヲ\n##・\n##ヽ\n##ㄅ\n##ㄆ\n##ㄇ\n##ㄉ\n##ㄋ\n##ㄌ\n##ㄍ\n##ㄎ\n##ㄏ\n##ㄒ\n##ㄚ\n##ㄛ\n##ㄞ\n##ㄟ\n##ㄢ\n##ㄤ\n##ㄥ\n##ㄧ\n##ㄨ\n##ㆍ\n##㈦\n##㊣\n##㗎\n##一\n##丁\n##七\n##万\n##丈\n##三\n##上\n##
下\n##不\n##与\n##丐\n##丑\n##专\n##且\n##丕\n##世\n##丘\n##丙\n##业\n##丛\n##东\n##丝\n##丞\n##丟\n##両\n##丢\n##两\n##严\n##並\n##丧\n##丨\n##个\n##丫\n##中\n##丰\n##串\n##临\n##丶\n##丸\n##丹\n##为\n##主\n##丼\n##丽\n##举\n##丿\n##乂\n##乃\n##久\n##么\n##义\n##之\n##乌\n##乍\n##乎\n##乏\n##乐\n##乒\n##乓\n##乔\n##乖\n##乗\n##乘\n##乙\n##乜\n##九\n##乞\n##也\n##习\n##乡\n##书\n##乩\n##买\n##乱\n##乳\n##乾\n##亀\n##亂\n##了\n##予\n##争\n##事\n##二\n##于\n##亏\n##云\n##互\n##五\n##井\n##亘\n##亙\n##亚\n##些\n##亜\n##亞\n##亟\n##亡\n##亢\n##交\n##亥\n##亦\n##产\n##亨\n##亩\n##享\n##京\n##亭\n##亮\n##亲\n##亳\n##亵\n##人\n##亿\n##什\n##仁\n##仃\n##仄\n##仅\n##仆\n##仇\n##今\n##介\n##仍\n##从\n##仏\n##仑\n##仓\n##仔\n##仕\n##他\n##仗\n##付\n##仙\n##仝\n##仞\n##仟\n##代\n##令\n##以\n##仨\n##仪\n##们\n##仮\n##仰\n##仲\n##件\n##价\n##任\n##份\n##仿\n##企\n##伉\n##伊\n##伍\n##伎\n##伏\n##伐\n##休\n##伕\n##众\n##优\n##伙\n##会\n##伝\n##伞\n##伟\n##传\n##伢\n##伤\n##伦\n##伪\n##伫\n##伯\n##估\n##伴\n##伶\n##伸\n##伺\n##似\n##伽\n##佃\n##但\n##佇\n##佈\n##位\n##低\n##住\n##佐\n##佑\n##体\n##佔\n##何\n##佗\n##佘\n##余\n##佚\n##佛\n##作\n##佝\n##佞\n##佟\n##你\n##佢\n##佣\n##佤\n##佥\n##佩\n##佬\n##佯\n##佰\n##佳\n##併\n##佶\n##佻\n##佼\n##使\n##侃\n##侄\n##來\n##侈\n##例\n##侍\n##侏\n##侑\n##侖\n##侗\n##供\n##依\n##侠\n##価\n##侣\n##侥\n##侦\n##侧\n##侨\n##侬\n##侮\n##侯\n##侵\n##侶\n##侷\n##便\n##係\n##促\n##俄\n##俊\n##俎\n##俏\n##俐\n##俑\n##俗\n##俘\n##俚\n##保\n##俞\n##俟\n##俠\n##信\n##俨\n##俩\n##俪\n##俬\n##俭\n##修\n##俯\n##俱\n##俳\n##俸\n##俺\n##俾\n##倆\n##倉\n##個\n##倌\n##倍\n##倏\n##們\n##倒\n##倔\n##倖\n##倘\n##候\n##倚\n##倜\n##借\n##倡\n##値\n##倦\n##倩\n##倪\n##倫\n##倬\n##倭\n##倶\n##债\n##值\n##倾\n##偃\n##假\n##偈\n##偉\n##偌\n##偎\n##偏\n##偕\n##做\n##停\n##健\n##側\n##偵\n##偶\n##偷\n##偻\n##偽\n##偿\n##傀\n##傅\n##傍\n##傑\n##傘\n##備\n##傚\n##傢\n##傣\n##傥\n##储\n##傩\n##催\n##傭\n##傲\n##傳\n##債\n##傷\n##傻\n##傾\n##僅\n##働\n##像\n##僑\n##僕\n##僖\n##僚\n##僥\n##僧\n##僭\n##僮\n##僱\n##僵\n##價\n##僻\n##儀\n##儂\n##億\n##儆\n##儉\n##儋\n##儒\n##儕\n##儘\n##償\n##儡\n##優\n##儲\n##儷\n##儼\n##儿\n##兀\n##允\n##元\n##兄\n##充\n##兆\n##兇\n##先\n##光\n##克\n##兌\n##免\n##児\n##兑\n##兒\n##兔\n##兖\n##党\n##兜\n##兢\n##入\n##內\n##全\n##兩\n##八\n##公\n##六\n##兮\n##兰\n##共\n##兲\n##关\n##兴\n##兵\n##其\n##具\n##典\n##兹\n##养\n##兼\n##兽\n##
冀\n##内\n##円\n##冇\n##冈\n##冉\n##冊\n##册\n##再\n##冏\n##冒\n##冕\n##冗\n##写\n##军\n##农\n##冠\n##冢\n##冤\n##冥\n##冨\n##冪\n##冬\n##冯\n##冰\n##冲\n##决\n##况\n##冶\n##冷\n##冻\n##冼\n##冽\n##冾\n##净\n##凄\n##准\n##凇\n##凈\n##凉\n##凋\n##凌\n##凍\n##减\n##凑\n##凛\n##凜\n##凝\n##几\n##凡\n##凤\n##処\n##凪\n##凭\n##凯\n##凰\n##凱\n##凳\n##凶\n##凸\n##凹\n##出\n##击\n##函\n##凿\n##刀\n##刁\n##刃\n##分\n##切\n##刈\n##刊\n##刍\n##刎\n##刑\n##划\n##列\n##刘\n##则\n##刚\n##创\n##初\n##删\n##判\n##別\n##刨\n##利\n##刪\n##别\n##刮\n##到\n##制\n##刷\n##券\n##刹\n##刺\n##刻\n##刽\n##剁\n##剂\n##剃\n##則\n##剉\n##削\n##剋\n##剌\n##前\n##剎\n##剐\n##剑\n##剔\n##剖\n##剛\n##剜\n##剝\n##剣\n##剤\n##剥\n##剧\n##剩\n##剪\n##副\n##割\n##創\n##剷\n##剽\n##剿\n##劃\n##劇\n##劈\n##劉\n##劊\n##劍\n##劏\n##劑\n##力\n##劝\n##办\n##功\n##加\n##务\n##劣\n##动\n##助\n##努\n##劫\n##劭\n##励\n##劲\n##劳\n##労\n##劵\n##効\n##劾\n##势\n##勁\n##勃\n##勇\n##勉\n##勋\n##勐\n##勒\n##動\n##勖\n##勘\n##務\n##勛\n##勝\n##勞\n##募\n##勢\n##勤\n##勧\n##勳\n##勵\n##勸\n##勺\n##勻\n##勾\n##勿\n##匀\n##包\n##匆\n##匈\n##匍\n##匐\n##匕\n##化\n##北\n##匙\n##匝\n##匠\n##匡\n##匣\n##匪\n##匮\n##匯\n##匱\n##匹\n##区\n##医\n##匾\n##匿\n##區\n##十\n##千\n##卅\n##升\n##午\n##卉\n##半\n##卍\n##华\n##协\n##卑\n##卒\n##卓\n##協\n##单\n##卖\n##南\n##単\n##博\n##卜\n##卞\n##卟\n##占\n##卡\n##卢\n##卤\n##卦\n##卧\n##卫\n##卮\n##卯\n##印\n##危\n##即\n##却\n##卵\n##卷\n##卸\n##卻\n##卿\n##厂\n##厄\n##厅\n##历\n##厉\n##压\n##厌\n##厕\n##厘\n##厚\n##厝\n##原\n##厢\n##厥\n##厦\n##厨\n##厩\n##厭\n##厮\n##厲\n##厳\n##去\n##县\n##叁\n##参\n##參\n##又\n##叉\n##及\n##友\n##双\n##反\n##収\n##发\n##叔\n##取\n##受\n##变\n##叙\n##叛\n##叟\n##叠\n##叡\n##叢\n##口\n##古\n##句\n##另\n##叨\n##叩\n##只\n##叫\n##召\n##叭\n##叮\n##可\n##台\n##叱\n##史\n##右\n##叵\n##叶\n##号\n##司\n##叹\n##叻\n##叼\n##叽\n##吁\n##吃\n##各\n##吆\n##合\n##吉\n##吊\n##吋\n##同\n##名\n##后\n##吏\n##吐\n##向\n##吒\n##吓\n##吕\n##吖\n##吗\n##君\n##吝\n##吞\n##吟\n##吠\n##吡\n##否\n##吧\n##吨\n##吩\n##含\n##听\n##吭\n##吮\n##启\n##吱\n##吳\n##吴\n##吵\n##吶\n##吸\n##吹\n##吻\n##吼\n##吽\n##吾\n##呀\n##呂\n##呃\n##呆\n##呈\n##告\n##呋\n##呎\n##呐\n##呓\n##呕\n##呗\n##员\n##呛\n##呜\n##呢\n##呤\n##呦\n##周\n##呱\n##呲\n##味\n##呵\n##呷\n##呸\n##呻\n##呼\n##命\n##咀\n##咁\n##咂\n##咄\n##咆\n##咋\n##和\n##咎\n##咏\n##咐\n##咒\n##咔\n##咕\n##咖\n##咗\n##
咘\n##咙\n##咚\n##咛\n##咣\n##咤\n##咦\n##咧\n##咨\n##咩\n##咪\n##咫\n##咬\n##咭\n##咯\n##咱\n##咲\n##咳\n##咸\n##咻\n##咽\n##咿\n##哀\n##品\n##哂\n##哄\n##哆\n##哇\n##哈\n##哉\n##哋\n##哌\n##响\n##哎\n##哏\n##哐\n##哑\n##哒\n##哔\n##哗\n##哟\n##員\n##哥\n##哦\n##哧\n##哨\n##哩\n##哪\n##哭\n##哮\n##哲\n##哺\n##哼\n##哽\n##唁\n##唄\n##唆\n##唇\n##唉\n##唏\n##唐\n##唑\n##唔\n##唠\n##唤\n##唧\n##唬\n##售\n##唯\n##唰\n##唱\n##唳\n##唷\n##唸\n##唾\n##啃\n##啄\n##商\n##啉\n##啊\n##問\n##啓\n##啕\n##啖\n##啜\n##啞\n##啟\n##啡\n##啤\n##啥\n##啦\n##啧\n##啪\n##啫\n##啬\n##啮\n##啰\n##啱\n##啲\n##啵\n##啶\n##啷\n##啸\n##啻\n##啼\n##啾\n##喀\n##喂\n##喃\n##善\n##喆\n##喇\n##喉\n##喊\n##喋\n##喎\n##喏\n##喔\n##喘\n##喙\n##喚\n##喜\n##喝\n##喟\n##喧\n##喪\n##喫\n##喬\n##單\n##喰\n##喱\n##喲\n##喳\n##喵\n##営\n##喷\n##喹\n##喺\n##喻\n##喽\n##嗅\n##嗆\n##嗇\n##嗎\n##嗑\n##嗒\n##嗓\n##嗔\n##嗖\n##嗚\n##嗜\n##嗝\n##嗟\n##嗡\n##嗣\n##嗤\n##嗦\n##嗨\n##嗪\n##嗬\n##嗯\n##嗰\n##嗲\n##嗳\n##嗶\n##嗷\n##嗽\n##嘀\n##嘅\n##嘆\n##嘈\n##嘉\n##嘌\n##嘍\n##嘎\n##嘔\n##嘖\n##嘗\n##嘘\n##嘚\n##嘛\n##嘜\n##嘞\n##嘟\n##嘢\n##嘣\n##嘤\n##嘧\n##嘩\n##嘭\n##嘮\n##嘯\n##嘰\n##嘱\n##嘲\n##嘴\n##嘶\n##嘸\n##嘹\n##嘻\n##嘿\n##噁\n##噌\n##噎\n##噓\n##噔\n##噗\n##噙\n##噜\n##噠\n##噢\n##噤\n##器\n##噩\n##噪\n##噬\n##噱\n##噴\n##噶\n##噸\n##噹\n##噻\n##噼\n##嚀\n##嚇\n##嚎\n##嚏\n##嚐\n##嚓\n##嚕\n##嚟\n##嚣\n##嚥\n##嚨\n##嚮\n##嚴\n##嚷\n##嚼\n##囂\n##囉\n##囊\n##囍\n##囑\n##囔\n##囗\n##囚\n##四\n##囝\n##回\n##囟\n##因\n##囡\n##团\n##団\n##囤\n##囧\n##囪\n##囫\n##园\n##困\n##囱\n##囲\n##図\n##围\n##囹\n##固\n##国\n##图\n##囿\n##圃\n##圄\n##圆\n##圈\n##國\n##圍\n##圏\n##園\n##圓\n##圖\n##團\n##圜\n##土\n##圣\n##圧\n##在\n##圩\n##圭\n##地\n##圳\n##场\n##圻\n##圾\n##址\n##坂\n##均\n##坊\n##坍\n##坎\n##坏\n##坐\n##坑\n##块\n##坚\n##坛\n##坝\n##坞\n##坟\n##坠\n##坡\n##坤\n##坦\n##坨\n##坪\n##坯\n##坳\n##坵\n##坷\n##垂\n##垃\n##垄\n##型\n##垒\n##垚\n##垛\n##垠\n##垢\n##垣\n##垦\n##垩\n##垫\n##垭\n##垮\n##垵\n##埂\n##埃\n##埋\n##城\n##埔\n##埕\n##埗\n##域\n##埠\n##埤\n##埵\n##執\n##埸\n##培\n##基\n##埼\n##堀\n##堂\n##堃\n##堅\n##堆\n##堇\n##堑\n##堕\n##堙\n##堡\n##堤\n##堪\n##堯\n##堰\n##報\n##場\n##堵\n##堺\n##堿\n##塊\n##塌\n##塑\n##塔\n##塗\n##塘\n##塚\n##塞\n##塢\n##塩\n##填\n##塬\n##塭\n##塵\n##塾\n##墀\n##境\n##墅\n##墉\n##墊\n##墒\n##墓\n##増\n##墘\n##墙\n##墜\n##增\n##墟\n##墨\n##墩\n##墮\n##墳\n##
墻\n##墾\n##壁\n##壅\n##壆\n##壇\n##壊\n##壑\n##壓\n##壕\n##壘\n##壞\n##壟\n##壢\n##壤\n##壩\n##士\n##壬\n##壮\n##壯\n##声\n##売\n##壳\n##壶\n##壹\n##壺\n##壽\n##处\n##备\n##変\n##复\n##夏\n##夔\n##夕\n##外\n##夙\n##多\n##夜\n##够\n##夠\n##夢\n##夥\n##大\n##天\n##太\n##夫\n##夭\n##央\n##夯\n##失\n##头\n##夷\n##夸\n##夹\n##夺\n##夾\n##奂\n##奄\n##奇\n##奈\n##奉\n##奋\n##奎\n##奏\n##奐\n##契\n##奔\n##奕\n##奖\n##套\n##奘\n##奚\n##奠\n##奢\n##奥\n##奧\n##奪\n##奬\n##奮\n##女\n##奴\n##奶\n##奸\n##她\n##好\n##如\n##妃\n##妄\n##妆\n##妇\n##妈\n##妊\n##妍\n##妒\n##妓\n##妖\n##妘\n##妙\n##妝\n##妞\n##妣\n##妤\n##妥\n##妨\n##妩\n##妪\n##妮\n##妲\n##妳\n##妹\n##妻\n##妾\n##姆\n##姉\n##姊\n##始\n##姍\n##姐\n##姑\n##姒\n##姓\n##委\n##姗\n##姚\n##姜\n##姝\n##姣\n##姥\n##姦\n##姨\n##姪\n##姫\n##姬\n##姹\n##姻\n##姿\n##威\n##娃\n##娄\n##娅\n##娆\n##娇\n##娉\n##娑\n##娓\n##娘\n##娛\n##娜\n##娟\n##娠\n##娣\n##娥\n##娩\n##娱\n##娲\n##娴\n##娶\n##娼\n##婀\n##婁\n##婆\n##婉\n##婊\n##婕\n##婚\n##婢\n##婦\n##婧\n##婪\n##婭\n##婴\n##婵\n##婶\n##婷\n##婺\n##婿\n##媒\n##媚\n##媛\n##媞\n##媧\n##媲\n##媳\n##媽\n##媾\n##嫁\n##嫂\n##嫉\n##嫌\n##嫑\n##嫔\n##嫖\n##嫘\n##嫚\n##嫡\n##嫣\n##嫦\n##嫩\n##嫲\n##嫵\n##嫻\n##嬅\n##嬉\n##嬌\n##嬗\n##嬛\n##嬢\n##嬤\n##嬪\n##嬰\n##嬴\n##嬷\n##嬸\n##嬿\n##孀\n##孃\n##子\n##孑\n##孔\n##孕\n##孖\n##字\n##存\n##孙\n##孚\n##孛\n##孜\n##孝\n##孟\n##孢\n##季\n##孤\n##学\n##孩\n##孪\n##孫\n##孬\n##孰\n##孱\n##孳\n##孵\n##學\n##孺\n##孽\n##孿\n##宁\n##它\n##宅\n##宇\n##守\n##安\n##宋\n##完\n##宏\n##宓\n##宕\n##宗\n##官\n##宙\n##定\n##宛\n##宜\n##宝\n##实\n##実\n##宠\n##审\n##客\n##宣\n##室\n##宥\n##宦\n##宪\n##宫\n##宮\n##宰\n##害\n##宴\n##宵\n##家\n##宸\n##容\n##宽\n##宾\n##宿\n##寂\n##寄\n##寅\n##密\n##寇\n##富\n##寐\n##寒\n##寓\n##寛\n##寝\n##寞\n##察\n##寡\n##寢\n##寥\n##實\n##寧\n##寨\n##審\n##寫\n##寬\n##寮\n##寰\n##寵\n##寶\n##寸\n##对\n##寺\n##寻\n##导\n##対\n##寿\n##封\n##専\n##射\n##将\n##將\n##專\n##尉\n##尊\n##尋\n##對\n##導\n##小\n##少\n##尔\n##尕\n##尖\n##尘\n##尚\n##尝\n##尤\n##尧\n##尬\n##就\n##尴\n##尷\n##尸\n##尹\n##尺\n##尻\n##尼\n##尽\n##尾\n##尿\n##局\n##屁\n##层\n##屄\n##居\n##屆\n##屈\n##屉\n##届\n##屋\n##屌\n##屍\n##屎\n##屏\n##屐\n##屑\n##展\n##屜\n##属\n##屠\n##屡\n##屢\n##層\n##履\n##屬\n##屯\n##山\n##屹\n##屿\n##岀\n##岁\n##岂\n##岌\n##岐\n##岑\n##岔\n##岖\n##岗\n##岘\n##岙\n##岚\n##岛\n##岡\n##岩\n##岫\n##岬\n##岭\n##岱\n##岳\n##
岷\n##岸\n##峇\n##峋\n##峒\n##峙\n##峡\n##峤\n##峥\n##峦\n##峨\n##峪\n##峭\n##峯\n##峰\n##峴\n##島\n##峻\n##峽\n##崁\n##崂\n##崆\n##崇\n##崎\n##崑\n##崔\n##崖\n##崗\n##崙\n##崛\n##崧\n##崩\n##崭\n##崴\n##崽\n##嵇\n##嵊\n##嵋\n##嵌\n##嵐\n##嵘\n##嵩\n##嵬\n##嵯\n##嶂\n##嶄\n##嶇\n##嶋\n##嶙\n##嶺\n##嶼\n##嶽\n##巅\n##巍\n##巒\n##巔\n##巖\n##川\n##州\n##巡\n##巢\n##工\n##左\n##巧\n##巨\n##巩\n##巫\n##差\n##己\n##已\n##巳\n##巴\n##巷\n##巻\n##巽\n##巾\n##巿\n##币\n##市\n##布\n##帅\n##帆\n##师\n##希\n##帐\n##帑\n##帕\n##帖\n##帘\n##帚\n##帛\n##帜\n##帝\n##帥\n##带\n##帧\n##師\n##席\n##帮\n##帯\n##帰\n##帳\n##帶\n##帷\n##常\n##帼\n##帽\n##幀\n##幂\n##幄\n##幅\n##幌\n##幔\n##幕\n##幟\n##幡\n##幢\n##幣\n##幫\n##干\n##平\n##年\n##并\n##幸\n##幹\n##幺\n##幻\n##幼\n##幽\n##幾\n##广\n##庁\n##広\n##庄\n##庆\n##庇\n##床\n##序\n##庐\n##库\n##应\n##底\n##庖\n##店\n##庙\n##庚\n##府\n##庞\n##废\n##庠\n##度\n##座\n##庫\n##庭\n##庵\n##庶\n##康\n##庸\n##庹\n##庾\n##廁\n##廂\n##廃\n##廈\n##廉\n##廊\n##廓\n##廖\n##廚\n##廝\n##廟\n##廠\n##廢\n##廣\n##廬\n##廳\n##延\n##廷\n##建\n##廿\n##开\n##弁\n##异\n##弃\n##弄\n##弈\n##弊\n##弋\n##式\n##弑\n##弒\n##弓\n##弔\n##引\n##弗\n##弘\n##弛\n##弟\n##张\n##弥\n##弦\n##弧\n##弩\n##弭\n##弯\n##弱\n##張\n##強\n##弹\n##强\n##弼\n##弾\n##彅\n##彆\n##彈\n##彌\n##彎\n##归\n##当\n##录\n##彗\n##彙\n##彝\n##形\n##彤\n##彥\n##彦\n##彧\n##彩\n##彪\n##彫\n##彬\n##彭\n##彰\n##影\n##彷\n##役\n##彻\n##彼\n##彿\n##往\n##征\n##径\n##待\n##徇\n##很\n##徉\n##徊\n##律\n##後\n##徐\n##徑\n##徒\n##従\n##徕\n##得\n##徘\n##徙\n##徜\n##從\n##徠\n##御\n##徨\n##復\n##循\n##徬\n##微\n##徳\n##徴\n##徵\n##德\n##徹\n##徼\n##徽\n##心\n##必\n##忆\n##忌\n##忍\n##忏\n##忐\n##忑\n##忒\n##忖\n##志\n##忘\n##忙\n##応\n##忠\n##忡\n##忤\n##忧\n##忪\n##快\n##忱\n##念\n##忻\n##忽\n##忿\n##怀\n##态\n##怂\n##怅\n##怆\n##怎\n##怏\n##怒\n##怔\n##怕\n##怖\n##怙\n##怜\n##思\n##怠\n##怡\n##急\n##怦\n##性\n##怨\n##怪\n##怯\n##怵\n##总\n##怼\n##恁\n##恃\n##恆\n##恋\n##恍\n##恐\n##恒\n##恕\n##恙\n##恚\n##恢\n##恣\n##恤\n##恥\n##恨\n##恩\n##恪\n##恫\n##恬\n##恭\n##息\n##恰\n##恳\n##恵\n##恶\n##恸\n##恺\n##恻\n##恼\n##恿\n##悄\n##悅\n##悉\n##悌\n##悍\n##悔\n##悖\n##悚\n##悟\n##悠\n##患\n##悦\n##您\n##悩\n##悪\n##悬\n##悯\n##悱\n##悲\n##悴\n##悵\n##悶\n##悸\n##悻\n##悼\n##悽\n##情\n##惆\n##惇\n##惊\n##惋\n##惑\n##惕\n##惘\n##惚\n##惜\n##惟\n##惠\n##惡\n##惦\n##惧\n##惨\n##惩\n##惫\n##惬\n##惭\n##
惮\n##惯\n##惰\n##惱\n##想\n##惴\n##惶\n##惹\n##惺\n##愁\n##愆\n##愈\n##愉\n##愍\n##意\n##愕\n##愚\n##愛\n##愜\n##感\n##愣\n##愤\n##愧\n##愫\n##愷\n##愿\n##慄\n##慈\n##態\n##慌\n##慎\n##慑\n##慕\n##慘\n##慚\n##慟\n##慢\n##慣\n##慧\n##慨\n##慫\n##慮\n##慰\n##慳\n##慵\n##慶\n##慷\n##慾\n##憂\n##憊\n##憋\n##憎\n##憐\n##憑\n##憔\n##憚\n##憤\n##憧\n##憨\n##憩\n##憫\n##憬\n##憲\n##憶\n##憾\n##懂\n##懇\n##懈\n##應\n##懊\n##懋\n##懑\n##懒\n##懦\n##懲\n##懵\n##懶\n##懷\n##懸\n##懺\n##懼\n##懾\n##懿\n##戀\n##戈\n##戊\n##戌\n##戍\n##戎\n##戏\n##成\n##我\n##戒\n##戕\n##或\n##战\n##戚\n##戛\n##戟\n##戡\n##戦\n##截\n##戬\n##戮\n##戰\n##戲\n##戳\n##戴\n##戶\n##户\n##戸\n##戻\n##戾\n##房\n##所\n##扁\n##扇\n##扈\n##扉\n##手\n##才\n##扎\n##扑\n##扒\n##打\n##扔\n##払\n##托\n##扛\n##扣\n##扦\n##执\n##扩\n##扪\n##扫\n##扬\n##扭\n##扮\n##扯\n##扰\n##扱\n##扳\n##扶\n##批\n##扼\n##找\n##承\n##技\n##抄\n##抉\n##把\n##抑\n##抒\n##抓\n##投\n##抖\n##抗\n##折\n##抚\n##抛\n##抜\n##択\n##抟\n##抠\n##抡\n##抢\n##护\n##报\n##抨\n##披\n##抬\n##抱\n##抵\n##抹\n##押\n##抽\n##抿\n##拂\n##拄\n##担\n##拆\n##拇\n##拈\n##拉\n##拋\n##拌\n##拍\n##拎\n##拐\n##拒\n##拓\n##拔\n##拖\n##拗\n##拘\n##拙\n##拚\n##招\n##拜\n##拟\n##拡\n##拢\n##拣\n##拥\n##拦\n##拧\n##拨\n##择\n##括\n##拭\n##拮\n##拯\n##拱\n##拳\n##拴\n##拷\n##拼\n##拽\n##拾\n##拿\n##持\n##挂\n##指\n##挈\n##按\n##挎\n##挑\n##挖\n##挙\n##挚\n##挛\n##挝\n##挞\n##挟\n##挠\n##挡\n##挣\n##挤\n##挥\n##挨\n##挪\n##挫\n##振\n##挲\n##挹\n##挺\n##挽\n##挾\n##捂\n##捅\n##捆\n##捉\n##捋\n##捌\n##捍\n##捎\n##捏\n##捐\n##捕\n##捞\n##损\n##捡\n##换\n##捣\n##捧\n##捨\n##捩\n##据\n##捱\n##捲\n##捶\n##捷\n##捺\n##捻\n##掀\n##掂\n##掃\n##掇\n##授\n##掉\n##掌\n##掏\n##掐\n##排\n##掖\n##掘\n##掙\n##掛\n##掠\n##採\n##探\n##掣\n##接\n##控\n##推\n##掩\n##措\n##掬\n##掰\n##掲\n##掳\n##掴\n##掷\n##掸\n##掺\n##揀\n##揃\n##揄\n##揆\n##揉\n##揍\n##描\n##提\n##插\n##揖\n##揚\n##換\n##握\n##揣\n##揩\n##揪\n##揭\n##揮\n##援\n##揶\n##揸\n##揹\n##揽\n##搀\n##搁\n##搂\n##搅\n##損\n##搏\n##搐\n##搓\n##搔\n##搖\n##搗\n##搜\n##搞\n##搡\n##搪\n##搬\n##搭\n##搵\n##搶\n##携\n##搽\n##摀\n##摁\n##摄\n##摆\n##摇\n##摈\n##摊\n##摒\n##摔\n##摘\n##摞\n##摟\n##摧\n##摩\n##摯\n##摳\n##摸\n##摹\n##摺\n##摻\n##撂\n##撃\n##撅\n##撇\n##撈\n##撐\n##撑\n##撒\n##撓\n##撕\n##撚\n##撞\n##撤\n##撥\n##撩\n##撫\n##撬\n##播\n##撮\n##撰\n##撲\n##撵\n##撷\n##撸\n##撻\n##撼\n##撿\n##擀\n##擁\n##擂\n##擄\n##
擅\n##擇\n##擊\n##擋\n##操\n##擎\n##擒\n##擔\n##擘\n##據\n##擞\n##擠\n##擡\n##擢\n##擦\n##擬\n##擰\n##擱\n##擲\n##擴\n##擷\n##擺\n##擼\n##擾\n##攀\n##攏\n##攒\n##攔\n##攘\n##攙\n##攜\n##攝\n##攞\n##攢\n##攣\n##攤\n##攥\n##攪\n##攫\n##攬\n##支\n##收\n##攸\n##改\n##攻\n##放\n##政\n##故\n##效\n##敌\n##敍\n##敎\n##敏\n##救\n##敕\n##敖\n##敗\n##敘\n##教\n##敛\n##敝\n##敞\n##敢\n##散\n##敦\n##敬\n##数\n##敲\n##整\n##敵\n##敷\n##數\n##斂\n##斃\n##文\n##斋\n##斌\n##斎\n##斐\n##斑\n##斓\n##斗\n##料\n##斛\n##斜\n##斟\n##斡\n##斤\n##斥\n##斧\n##斩\n##斫\n##斬\n##断\n##斯\n##新\n##斷\n##方\n##於\n##施\n##旁\n##旃\n##旅\n##旋\n##旌\n##旎\n##族\n##旖\n##旗\n##无\n##既\n##日\n##旦\n##旧\n##旨\n##早\n##旬\n##旭\n##旮\n##旱\n##时\n##旷\n##旺\n##旻\n##昀\n##昂\n##昆\n##昇\n##昉\n##昊\n##昌\n##明\n##昏\n##易\n##昔\n##昕\n##昙\n##星\n##映\n##春\n##昧\n##昨\n##昭\n##是\n##昱\n##昴\n##昵\n##昶\n##昼\n##显\n##晁\n##時\n##晃\n##晉\n##晋\n##晌\n##晏\n##晒\n##晓\n##晔\n##晕\n##晖\n##晗\n##晚\n##晝\n##晞\n##晟\n##晤\n##晦\n##晨\n##晩\n##普\n##景\n##晰\n##晴\n##晶\n##晷\n##智\n##晾\n##暂\n##暄\n##暇\n##暈\n##暉\n##暌\n##暐\n##暑\n##暖\n##暗\n##暝\n##暢\n##暧\n##暨\n##暫\n##暮\n##暱\n##暴\n##暸\n##暹\n##曄\n##曆\n##曇\n##曉\n##曖\n##曙\n##曜\n##曝\n##曠\n##曦\n##曬\n##曰\n##曲\n##曳\n##更\n##書\n##曹\n##曼\n##曾\n##替\n##最\n##會\n##月\n##有\n##朋\n##服\n##朐\n##朔\n##朕\n##朗\n##望\n##朝\n##期\n##朦\n##朧\n##木\n##未\n##末\n##本\n##札\n##朮\n##术\n##朱\n##朴\n##朵\n##机\n##朽\n##杀\n##杂\n##权\n##杆\n##杈\n##杉\n##李\n##杏\n##材\n##村\n##杓\n##杖\n##杜\n##杞\n##束\n##杠\n##条\n##来\n##杨\n##杭\n##杯\n##杰\n##東\n##杳\n##杵\n##杷\n##杼\n##松\n##板\n##极\n##构\n##枇\n##枉\n##枋\n##析\n##枕\n##林\n##枚\n##果\n##枝\n##枢\n##枣\n##枪\n##枫\n##枭\n##枯\n##枰\n##枱\n##枳\n##架\n##枷\n##枸\n##柄\n##柏\n##某\n##柑\n##柒\n##染\n##柔\n##柘\n##柚\n##柜\n##柞\n##柠\n##柢\n##查\n##柩\n##柬\n##柯\n##柱\n##柳\n##柴\n##柵\n##査\n##柿\n##栀\n##栃\n##栄\n##栅\n##标\n##栈\n##栉\n##栋\n##栎\n##栏\n##树\n##栓\n##栖\n##栗\n##校\n##栩\n##株\n##样\n##核\n##根\n##格\n##栽\n##栾\n##桀\n##桁\n##桂\n##桃\n##桅\n##框\n##案\n##桉\n##桌\n##桎\n##桐\n##桑\n##桓\n##桔\n##桜\n##桠\n##桡\n##桢\n##档\n##桥\n##桦\n##桧\n##桨\n##桩\n##桶\n##桿\n##梁\n##梅\n##梆\n##梏\n##梓\n##梗\n##條\n##梟\n##梢\n##梦\n##梧\n##梨\n##梭\n##梯\n##械\n##梳\n##梵\n##梶\n##检\n##棂\n##棄\n##棉\n##棋\n##棍\n##棒\n##棕\n##棗\n##棘\n##棚\n##棟\n##
棠\n##棣\n##棧\n##森\n##棱\n##棲\n##棵\n##棹\n##棺\n##椁\n##椅\n##椋\n##植\n##椎\n##椒\n##検\n##椪\n##椭\n##椰\n##椹\n##椽\n##椿\n##楂\n##楊\n##楓\n##楔\n##楚\n##楝\n##楞\n##楠\n##楣\n##楨\n##楫\n##業\n##楮\n##極\n##楷\n##楸\n##楹\n##楼\n##楽\n##概\n##榄\n##榆\n##榈\n##榉\n##榔\n##榕\n##榖\n##榛\n##榜\n##榨\n##榫\n##榭\n##榮\n##榱\n##榴\n##榷\n##榻\n##槁\n##槃\n##構\n##槌\n##槍\n##槎\n##槐\n##槓\n##様\n##槛\n##槟\n##槤\n##槭\n##槲\n##槳\n##槻\n##槽\n##槿\n##樁\n##樂\n##樊\n##樑\n##樓\n##標\n##樞\n##樟\n##模\n##樣\n##権\n##横\n##樫\n##樯\n##樱\n##樵\n##樸\n##樹\n##樺\n##樽\n##樾\n##橄\n##橇\n##橋\n##橐\n##橘\n##橙\n##機\n##橡\n##橢\n##橫\n##橱\n##橹\n##橼\n##檀\n##檄\n##檎\n##檐\n##檔\n##檗\n##檜\n##檢\n##檬\n##檯\n##檳\n##檸\n##檻\n##櫃\n##櫚\n##櫛\n##櫥\n##櫸\n##櫻\n##欄\n##權\n##欒\n##欖\n##欠\n##次\n##欢\n##欣\n##欧\n##欲\n##欸\n##欺\n##欽\n##款\n##歆\n##歇\n##歉\n##歌\n##歎\n##歐\n##歓\n##歙\n##歛\n##歡\n##止\n##正\n##此\n##步\n##武\n##歧\n##歩\n##歪\n##歯\n##歲\n##歳\n##歴\n##歷\n##歸\n##歹\n##死\n##歼\n##殁\n##殃\n##殆\n##殇\n##殉\n##殊\n##残\n##殒\n##殓\n##殖\n##殘\n##殞\n##殡\n##殤\n##殭\n##殯\n##殲\n##殴\n##段\n##殷\n##殺\n##殼\n##殿\n##毀\n##毁\n##毂\n##毅\n##毆\n##毋\n##母\n##毎\n##每\n##毒\n##毓\n##比\n##毕\n##毗\n##毘\n##毙\n##毛\n##毡\n##毫\n##毯\n##毽\n##氈\n##氏\n##氐\n##民\n##氓\n##气\n##氖\n##気\n##氙\n##氛\n##氟\n##氡\n##氢\n##氣\n##氤\n##氦\n##氧\n##氨\n##氪\n##氫\n##氮\n##氯\n##氰\n##氲\n##水\n##氷\n##永\n##氹\n##氾\n##汀\n##汁\n##求\n##汆\n##汇\n##汉\n##汎\n##汐\n##汕\n##汗\n##汙\n##汛\n##汝\n##汞\n##江\n##池\n##污\n##汤\n##汨\n##汩\n##汪\n##汰\n##汲\n##汴\n##汶\n##汹\n##決\n##汽\n##汾\n##沁\n##沂\n##沃\n##沅\n##沈\n##沉\n##沌\n##沏\n##沐\n##沒\n##沓\n##沖\n##沙\n##沛\n##沟\n##没\n##沢\n##沣\n##沥\n##沦\n##沧\n##沪\n##沫\n##沭\n##沮\n##沱\n##河\n##沸\n##油\n##治\n##沼\n##沽\n##沾\n##沿\n##況\n##泄\n##泉\n##泊\n##泌\n##泓\n##法\n##泗\n##泛\n##泞\n##泠\n##泡\n##波\n##泣\n##泥\n##注\n##泪\n##泫\n##泮\n##泯\n##泰\n##泱\n##泳\n##泵\n##泷\n##泸\n##泻\n##泼\n##泽\n##泾\n##洁\n##洄\n##洋\n##洒\n##洗\n##洙\n##洛\n##洞\n##津\n##洩\n##洪\n##洮\n##洱\n##洲\n##洵\n##洶\n##洸\n##洹\n##活\n##洼\n##洽\n##派\n##流\n##浃\n##浄\n##浅\n##浆\n##浇\n##浊\n##测\n##济\n##浏\n##浑\n##浒\n##浓\n##浔\n##浙\n##浚\n##浜\n##浣\n##浦\n##浩\n##浪\n##浬\n##浮\n##浯\n##浴\n##海\n##浸\n##涂\n##涅\n##涇\n##消\n##涉\n##涌\n##涎\n##涓\n##涔\n##涕\n##涙\n##涛\n##涝\n##涞\n##
涟\n##涠\n##涡\n##涣\n##涤\n##润\n##涧\n##涨\n##涩\n##涪\n##涮\n##涯\n##液\n##涵\n##涸\n##涼\n##涿\n##淀\n##淄\n##淅\n##淆\n##淇\n##淋\n##淌\n##淑\n##淒\n##淖\n##淘\n##淙\n##淚\n##淞\n##淡\n##淤\n##淦\n##淨\n##淩\n##淪\n##淫\n##淬\n##淮\n##深\n##淳\n##淵\n##混\n##淹\n##淺\n##添\n##淼\n##清\n##済\n##渉\n##渊\n##渋\n##渍\n##渎\n##渐\n##渔\n##渗\n##渙\n##渚\n##減\n##渝\n##渠\n##渡\n##渣\n##渤\n##渥\n##渦\n##温\n##測\n##渭\n##港\n##渲\n##渴\n##游\n##渺\n##渾\n##湃\n##湄\n##湊\n##湍\n##湖\n##湘\n##湛\n##湟\n##湧\n##湫\n##湮\n##湯\n##湳\n##湾\n##湿\n##満\n##溃\n##溅\n##溉\n##溏\n##源\n##準\n##溜\n##溝\n##溟\n##溢\n##溥\n##溧\n##溪\n##溫\n##溯\n##溱\n##溴\n##溶\n##溺\n##溼\n##滁\n##滂\n##滄\n##滅\n##滇\n##滋\n##滌\n##滑\n##滓\n##滔\n##滕\n##滙\n##滚\n##滝\n##滞\n##滟\n##满\n##滢\n##滤\n##滥\n##滦\n##滨\n##滩\n##滬\n##滯\n##滲\n##滴\n##滷\n##滸\n##滾\n##滿\n##漁\n##漂\n##漆\n##漉\n##漏\n##漓\n##演\n##漕\n##漠\n##漢\n##漣\n##漩\n##漪\n##漫\n##漬\n##漯\n##漱\n##漲\n##漳\n##漸\n##漾\n##漿\n##潆\n##潇\n##潋\n##潍\n##潑\n##潔\n##潘\n##潛\n##潜\n##潞\n##潟\n##潢\n##潤\n##潦\n##潧\n##潭\n##潮\n##潰\n##潴\n##潸\n##潺\n##潼\n##澀\n##澄\n##澆\n##澈\n##澍\n##澎\n##澗\n##澜\n##澡\n##澤\n##澧\n##澱\n##澳\n##澹\n##激\n##濁\n##濂\n##濃\n##濑\n##濒\n##濕\n##濘\n##濛\n##濟\n##濠\n##濡\n##濤\n##濫\n##濬\n##濮\n##濯\n##濱\n##濺\n##濾\n##瀅\n##瀆\n##瀉\n##瀋\n##瀏\n##瀑\n##瀕\n##瀘\n##瀚\n##瀛\n##瀝\n##瀞\n##瀟\n##瀧\n##瀨\n##瀬\n##瀰\n##瀾\n##灌\n##灏\n##灑\n##灘\n##灝\n##灞\n##灣\n##火\n##灬\n##灭\n##灯\n##灰\n##灵\n##灶\n##灸\n##灼\n##災\n##灾\n##灿\n##炀\n##炁\n##炅\n##炉\n##炊\n##炎\n##炒\n##炔\n##炕\n##炖\n##炙\n##炜\n##炫\n##炬\n##炭\n##炮\n##炯\n##炳\n##炷\n##炸\n##点\n##為\n##炼\n##炽\n##烁\n##烂\n##烃\n##烈\n##烊\n##烏\n##烘\n##烙\n##烛\n##烟\n##烤\n##烦\n##烧\n##烨\n##烩\n##烫\n##烬\n##热\n##烯\n##烷\n##烹\n##烽\n##焉\n##焊\n##焕\n##焖\n##焗\n##焘\n##焙\n##焚\n##焜\n##無\n##焦\n##焯\n##焰\n##焱\n##然\n##焼\n##煅\n##煉\n##煊\n##煌\n##煎\n##煒\n##煖\n##煙\n##煜\n##煞\n##煤\n##煥\n##煦\n##照\n##煨\n##煩\n##煮\n##煲\n##煸\n##煽\n##熄\n##熊\n##熏\n##熒\n##熔\n##熙\n##熟\n##熠\n##熨\n##熬\n##熱\n##熵\n##熹\n##熾\n##燁\n##燃\n##燄\n##燈\n##燉\n##燊\n##燎\n##燒\n##燔\n##燕\n##燙\n##燜\n##營\n##燥\n##燦\n##燧\n##燭\n##燮\n##燴\n##燻\n##燼\n##燿\n##爆\n##爍\n##爐\n##爛\n##爪\n##爬\n##爭\n##爰\n##爱\n##爲\n##爵\n##父\n##爷\n##爸\n##爹\n##爺\n##爻\n##爽\n##爾\n##牆\n##片\n##版\n##牌\n##
牍\n##牒\n##牙\n##牛\n##牝\n##牟\n##牠\n##牡\n##牢\n##牦\n##牧\n##物\n##牯\n##牲\n##牴\n##牵\n##特\n##牺\n##牽\n##犀\n##犁\n##犄\n##犊\n##犍\n##犒\n##犢\n##犧\n##犬\n##犯\n##状\n##犷\n##犸\n##犹\n##狀\n##狂\n##狄\n##狈\n##狎\n##狐\n##狒\n##狗\n##狙\n##狞\n##狠\n##狡\n##狩\n##独\n##狭\n##狮\n##狰\n##狱\n##狸\n##狹\n##狼\n##狽\n##猎\n##猕\n##猖\n##猗\n##猙\n##猛\n##猜\n##猝\n##猥\n##猩\n##猪\n##猫\n##猬\n##献\n##猴\n##猶\n##猷\n##猾\n##猿\n##獄\n##獅\n##獎\n##獐\n##獒\n##獗\n##獠\n##獣\n##獨\n##獭\n##獰\n##獲\n##獵\n##獷\n##獸\n##獺\n##獻\n##獼\n##獾\n##玄\n##率\n##玉\n##王\n##玑\n##玖\n##玛\n##玟\n##玠\n##玥\n##玩\n##玫\n##玮\n##环\n##现\n##玲\n##玳\n##玷\n##玺\n##玻\n##珀\n##珂\n##珅\n##珈\n##珉\n##珊\n##珍\n##珏\n##珐\n##珑\n##珙\n##珞\n##珠\n##珣\n##珥\n##珩\n##珪\n##班\n##珮\n##珲\n##珺\n##現\n##球\n##琅\n##理\n##琇\n##琉\n##琊\n##琍\n##琏\n##琐\n##琛\n##琢\n##琥\n##琦\n##琨\n##琪\n##琬\n##琮\n##琰\n##琲\n##琳\n##琴\n##琵\n##琶\n##琺\n##琼\n##瑀\n##瑁\n##瑄\n##瑋\n##瑕\n##瑗\n##瑙\n##瑚\n##瑛\n##瑜\n##瑞\n##瑟\n##瑠\n##瑣\n##瑤\n##瑩\n##瑪\n##瑯\n##瑰\n##瑶\n##瑾\n##璀\n##璁\n##璃\n##璇\n##璉\n##璋\n##璎\n##璐\n##璜\n##璞\n##璟\n##璧\n##璨\n##環\n##璽\n##璿\n##瓊\n##瓏\n##瓒\n##瓜\n##瓢\n##瓣\n##瓤\n##瓦\n##瓮\n##瓯\n##瓴\n##瓶\n##瓷\n##甄\n##甌\n##甕\n##甘\n##甙\n##甚\n##甜\n##生\n##產\n##産\n##甥\n##甦\n##用\n##甩\n##甫\n##甬\n##甭\n##甯\n##田\n##由\n##甲\n##申\n##电\n##男\n##甸\n##町\n##画\n##甾\n##畀\n##畅\n##界\n##畏\n##畑\n##畔\n##留\n##畜\n##畝\n##畢\n##略\n##畦\n##番\n##畫\n##異\n##畲\n##畳\n##畴\n##當\n##畸\n##畹\n##畿\n##疆\n##疇\n##疊\n##疏\n##疑\n##疔\n##疖\n##疗\n##疙\n##疚\n##疝\n##疟\n##疡\n##疣\n##疤\n##疥\n##疫\n##疮\n##疯\n##疱\n##疲\n##疳\n##疵\n##疸\n##疹\n##疼\n##疽\n##疾\n##痂\n##病\n##症\n##痈\n##痉\n##痊\n##痍\n##痒\n##痔\n##痕\n##痘\n##痙\n##痛\n##痞\n##痠\n##痢\n##痣\n##痤\n##痧\n##痨\n##痪\n##痫\n##痰\n##痱\n##痴\n##痹\n##痺\n##痼\n##痿\n##瘀\n##瘁\n##瘋\n##瘍\n##瘓\n##瘘\n##瘙\n##瘟\n##瘠\n##瘡\n##瘢\n##瘤\n##瘦\n##瘧\n##瘩\n##瘪\n##瘫\n##瘴\n##瘸\n##瘾\n##療\n##癇\n##癌\n##癒\n##癖\n##癜\n##癞\n##癡\n##癢\n##癣\n##癥\n##癫\n##癬\n##癮\n##癱\n##癲\n##癸\n##発\n##登\n##發\n##白\n##百\n##皂\n##的\n##皆\n##皇\n##皈\n##皋\n##皎\n##皑\n##皓\n##皖\n##皙\n##皚\n##皮\n##皰\n##皱\n##皴\n##皺\n##皿\n##盂\n##盃\n##盅\n##盆\n##盈\n##益\n##盎\n##盏\n##盐\n##监\n##盒\n##盔\n##盖\n##盗\n##盘\n##盛\n##盜\n##盞\n##盟\n##盡\n##監\n##盤\n##盥\n##
盧\n##盪\n##目\n##盯\n##盱\n##盲\n##直\n##相\n##盹\n##盼\n##盾\n##省\n##眈\n##眉\n##看\n##県\n##眙\n##眞\n##真\n##眠\n##眦\n##眨\n##眩\n##眯\n##眶\n##眷\n##眸\n##眺\n##眼\n##眾\n##着\n##睁\n##睇\n##睏\n##睐\n##睑\n##睛\n##睜\n##睞\n##睡\n##睢\n##督\n##睥\n##睦\n##睨\n##睪\n##睫\n##睬\n##睹\n##睽\n##睾\n##睿\n##瞄\n##瞅\n##瞇\n##瞋\n##瞌\n##瞎\n##瞑\n##瞒\n##瞓\n##瞞\n##瞟\n##瞠\n##瞥\n##瞧\n##瞩\n##瞪\n##瞬\n##瞭\n##瞰\n##瞳\n##瞻\n##瞼\n##瞿\n##矇\n##矍\n##矗\n##矚\n##矛\n##矜\n##矢\n##矣\n##知\n##矩\n##矫\n##短\n##矮\n##矯\n##石\n##矶\n##矽\n##矾\n##矿\n##码\n##砂\n##砌\n##砍\n##砒\n##研\n##砖\n##砗\n##砚\n##砝\n##砣\n##砥\n##砧\n##砭\n##砰\n##砲\n##破\n##砷\n##砸\n##砺\n##砼\n##砾\n##础\n##硅\n##硐\n##硒\n##硕\n##硝\n##硫\n##硬\n##确\n##硯\n##硼\n##碁\n##碇\n##碉\n##碌\n##碍\n##碎\n##碑\n##碓\n##碗\n##碘\n##碚\n##碛\n##碟\n##碣\n##碧\n##碩\n##碰\n##碱\n##碳\n##碴\n##確\n##碼\n##碾\n##磁\n##磅\n##磊\n##磋\n##磐\n##磕\n##磚\n##磡\n##磨\n##磬\n##磯\n##磲\n##磷\n##磺\n##礁\n##礎\n##礙\n##礡\n##礦\n##礪\n##礫\n##礴\n##示\n##礼\n##社\n##祀\n##祁\n##祂\n##祇\n##祈\n##祉\n##祎\n##祐\n##祕\n##祖\n##祗\n##祚\n##祛\n##祜\n##祝\n##神\n##祟\n##祠\n##祢\n##祥\n##票\n##祭\n##祯\n##祷\n##祸\n##祺\n##祿\n##禀\n##禁\n##禄\n##禅\n##禍\n##禎\n##福\n##禛\n##禦\n##禧\n##禪\n##禮\n##禱\n##禹\n##禺\n##离\n##禽\n##禾\n##禿\n##秀\n##私\n##秃\n##秆\n##秉\n##秋\n##种\n##科\n##秒\n##秘\n##租\n##秣\n##秤\n##秦\n##秧\n##秩\n##秭\n##积\n##称\n##秸\n##移\n##秽\n##稀\n##稅\n##程\n##稍\n##税\n##稔\n##稗\n##稚\n##稜\n##稞\n##稟\n##稠\n##稣\n##種\n##稱\n##稲\n##稳\n##稷\n##稹\n##稻\n##稼\n##稽\n##稿\n##穀\n##穂\n##穆\n##穌\n##積\n##穎\n##穗\n##穢\n##穩\n##穫\n##穴\n##究\n##穷\n##穹\n##空\n##穿\n##突\n##窃\n##窄\n##窈\n##窍\n##窑\n##窒\n##窓\n##窕\n##窖\n##窗\n##窘\n##窜\n##窝\n##窟\n##窠\n##窥\n##窦\n##窨\n##窩\n##窪\n##窮\n##窯\n##窺\n##窿\n##竄\n##竅\n##竇\n##竊\n##立\n##竖\n##站\n##竜\n##竞\n##竟\n##章\n##竣\n##童\n##竭\n##端\n##競\n##竹\n##竺\n##竽\n##竿\n##笃\n##笆\n##笈\n##笋\n##笏\n##笑\n##笔\n##笙\n##笛\n##笞\n##笠\n##符\n##笨\n##第\n##笹\n##笺\n##笼\n##筆\n##等\n##筊\n##筋\n##筍\n##筏\n##筐\n##筑\n##筒\n##答\n##策\n##筛\n##筝\n##筠\n##筱\n##筲\n##筵\n##筷\n##筹\n##签\n##简\n##箇\n##箋\n##箍\n##箏\n##箐\n##箔\n##箕\n##算\n##箝\n##管\n##箩\n##箫\n##箭\n##箱\n##箴\n##箸\n##節\n##篁\n##範\n##篆\n##篇\n##築\n##篑\n##篓\n##篙\n##篝\n##篠\n##篡\n##篤\n##篩\n##篪\n##篮\n##篱\n##篷\n##簇\n##
簌\n##簍\n##簡\n##簦\n##簧\n##簪\n##簫\n##簷\n##簸\n##簽\n##簾\n##簿\n##籁\n##籃\n##籌\n##籍\n##籐\n##籟\n##籠\n##籤\n##籬\n##籮\n##籲\n##米\n##类\n##籼\n##籽\n##粄\n##粉\n##粑\n##粒\n##粕\n##粗\n##粘\n##粟\n##粤\n##粥\n##粧\n##粪\n##粮\n##粱\n##粲\n##粳\n##粵\n##粹\n##粼\n##粽\n##精\n##粿\n##糅\n##糊\n##糍\n##糕\n##糖\n##糗\n##糙\n##糜\n##糞\n##糟\n##糠\n##糧\n##糬\n##糯\n##糰\n##糸\n##系\n##糾\n##紀\n##紂\n##約\n##紅\n##紉\n##紊\n##紋\n##納\n##紐\n##紓\n##純\n##紗\n##紘\n##紙\n##級\n##紛\n##紜\n##素\n##紡\n##索\n##紧\n##紫\n##紮\n##累\n##細\n##紳\n##紹\n##紺\n##終\n##絃\n##組\n##絆\n##経\n##結\n##絕\n##絞\n##絡\n##絢\n##給\n##絨\n##絮\n##統\n##絲\n##絳\n##絵\n##絶\n##絹\n##綁\n##綏\n##綑\n##經\n##継\n##続\n##綜\n##綠\n##綢\n##綦\n##綫\n##綬\n##維\n##綱\n##網\n##綴\n##綵\n##綸\n##綺\n##綻\n##綽\n##綾\n##綿\n##緊\n##緋\n##総\n##緑\n##緒\n##緘\n##線\n##緝\n##緞\n##締\n##緣\n##編\n##緩\n##緬\n##緯\n##練\n##緹\n##緻\n##縁\n##縄\n##縈\n##縛\n##縝\n##縣\n##縫\n##縮\n##縱\n##縴\n##縷\n##總\n##績\n##繁\n##繃\n##繆\n##繇\n##繋\n##織\n##繕\n##繚\n##繞\n##繡\n##繩\n##繪\n##繫\n##繭\n##繳\n##繹\n##繼\n##繽\n##纂\n##續\n##纍\n##纏\n##纓\n##纔\n##纖\n##纜\n##纠\n##红\n##纣\n##纤\n##约\n##级\n##纨\n##纪\n##纫\n##纬\n##纭\n##纯\n##纰\n##纱\n##纲\n##纳\n##纵\n##纶\n##纷\n##纸\n##纹\n##纺\n##纽\n##纾\n##线\n##绀\n##练\n##组\n##绅\n##细\n##织\n##终\n##绊\n##绍\n##绎\n##经\n##绑\n##绒\n##结\n##绔\n##绕\n##绘\n##给\n##绚\n##绛\n##络\n##绝\n##绞\n##统\n##绡\n##绢\n##绣\n##绥\n##绦\n##继\n##绩\n##绪\n##绫\n##续\n##绮\n##绯\n##绰\n##绳\n##维\n##绵\n##绶\n##绷\n##绸\n##绻\n##综\n##绽\n##绾\n##绿\n##缀\n##缄\n##缅\n##缆\n##缇\n##缈\n##缉\n##缎\n##缓\n##缔\n##缕\n##编\n##缘\n##缙\n##缚\n##缜\n##缝\n##缠\n##缢\n##缤\n##缥\n##缨\n##缩\n##缪\n##缭\n##缮\n##缰\n##缱\n##缴\n##缸\n##缺\n##缽\n##罂\n##罄\n##罌\n##罐\n##网\n##罔\n##罕\n##罗\n##罚\n##罡\n##罢\n##罩\n##罪\n##置\n##罰\n##署\n##罵\n##罷\n##罹\n##羁\n##羅\n##羈\n##羊\n##羌\n##美\n##羔\n##羚\n##羞\n##羟\n##羡\n##羣\n##群\n##羥\n##羧\n##羨\n##義\n##羯\n##羲\n##羸\n##羹\n##羽\n##羿\n##翁\n##翅\n##翊\n##翌\n##翎\n##習\n##翔\n##翘\n##翟\n##翠\n##翡\n##翦\n##翩\n##翰\n##翱\n##翳\n##翹\n##翻\n##翼\n##耀\n##老\n##考\n##耄\n##者\n##耆\n##耋\n##而\n##耍\n##耐\n##耒\n##耕\n##耗\n##耘\n##耙\n##耦\n##耨\n##耳\n##耶\n##耷\n##耸\n##耻\n##耽\n##耿\n##聂\n##聆\n##聊\n##聋\n##职\n##聒\n##联\n##聖\n##聘\n##聚\n##聞\n##聪\n##聯\n##聰\n##聲\n##聳\n##
聴\n##聶\n##職\n##聽\n##聾\n##聿\n##肃\n##肄\n##肅\n##肆\n##肇\n##肉\n##肋\n##肌\n##肏\n##肓\n##肖\n##肘\n##肚\n##肛\n##肝\n##肠\n##股\n##肢\n##肤\n##肥\n##肩\n##肪\n##肮\n##肯\n##肱\n##育\n##肴\n##肺\n##肽\n##肾\n##肿\n##胀\n##胁\n##胃\n##胄\n##胆\n##背\n##胍\n##胎\n##胖\n##胚\n##胛\n##胜\n##胝\n##胞\n##胡\n##胤\n##胥\n##胧\n##胫\n##胭\n##胯\n##胰\n##胱\n##胳\n##胴\n##胶\n##胸\n##胺\n##能\n##脂\n##脅\n##脆\n##脇\n##脈\n##脉\n##脊\n##脍\n##脏\n##脐\n##脑\n##脓\n##脖\n##脘\n##脚\n##脛\n##脣\n##脩\n##脫\n##脯\n##脱\n##脲\n##脳\n##脸\n##脹\n##脾\n##腆\n##腈\n##腊\n##腋\n##腌\n##腎\n##腐\n##腑\n##腓\n##腔\n##腕\n##腥\n##腦\n##腩\n##腫\n##腭\n##腮\n##腰\n##腱\n##腳\n##腴\n##腸\n##腹\n##腺\n##腻\n##腼\n##腾\n##腿\n##膀\n##膈\n##膊\n##膏\n##膑\n##膘\n##膚\n##膛\n##膜\n##膝\n##膠\n##膦\n##膨\n##膩\n##膳\n##膺\n##膻\n##膽\n##膾\n##膿\n##臀\n##臂\n##臃\n##臆\n##臉\n##臊\n##臍\n##臓\n##臘\n##臟\n##臣\n##臥\n##臧\n##臨\n##自\n##臬\n##臭\n##至\n##致\n##臺\n##臻\n##臼\n##臾\n##舀\n##舂\n##舅\n##舆\n##與\n##興\n##舉\n##舊\n##舌\n##舍\n##舎\n##舐\n##舒\n##舔\n##舖\n##舗\n##舛\n##舜\n##舞\n##舟\n##航\n##舫\n##般\n##舰\n##舱\n##舵\n##舶\n##舷\n##舸\n##船\n##舺\n##舾\n##艇\n##艋\n##艘\n##艙\n##艦\n##艮\n##良\n##艰\n##艱\n##色\n##艳\n##艷\n##艹\n##艺\n##艾\n##节\n##芃\n##芈\n##芊\n##芋\n##芍\n##芎\n##芒\n##芙\n##芜\n##芝\n##芡\n##芥\n##芦\n##芩\n##芪\n##芫\n##芬\n##芭\n##芮\n##芯\n##花\n##芳\n##芷\n##芸\n##芹\n##芻\n##芽\n##芾\n##苁\n##苄\n##苇\n##苋\n##苍\n##苏\n##苑\n##苒\n##苓\n##苔\n##苕\n##苗\n##苛\n##苜\n##苞\n##苟\n##苡\n##苣\n##若\n##苦\n##苫\n##苯\n##英\n##苷\n##苹\n##苻\n##茁\n##茂\n##范\n##茄\n##茅\n##茉\n##茎\n##茏\n##茗\n##茜\n##茧\n##茨\n##茫\n##茬\n##茭\n##茯\n##茱\n##茲\n##茴\n##茵\n##茶\n##茸\n##茹\n##茼\n##荀\n##荃\n##荆\n##草\n##荊\n##荏\n##荐\n##荒\n##荔\n##荖\n##荘\n##荚\n##荞\n##荟\n##荠\n##荡\n##荣\n##荤\n##荥\n##荧\n##荨\n##荪\n##荫\n##药\n##荳\n##荷\n##荸\n##荻\n##荼\n##荽\n##莅\n##莆\n##莉\n##莊\n##莎\n##莒\n##莓\n##莖\n##莘\n##莞\n##莠\n##莢\n##莧\n##莪\n##莫\n##莱\n##莲\n##莴\n##获\n##莹\n##莺\n##莽\n##莿\n##菀\n##菁\n##菅\n##菇\n##菈\n##菊\n##菌\n##菏\n##菓\n##菖\n##菘\n##菜\n##菟\n##菠\n##菡\n##菩\n##華\n##菱\n##菲\n##菸\n##菽\n##萁\n##萃\n##萄\n##萊\n##萋\n##萌\n##萍\n##萎\n##萘\n##萝\n##萤\n##营\n##萦\n##萧\n##萨\n##萩\n##萬\n##萱\n##萵\n##萸\n##萼\n##落\n##葆\n##葉\n##著\n##葚\n##葛\n##葡\n##董\n##葦\n##葩\n##葫\n##葬\n##葭\n##葯\n##葱\n##葳\n##
葵\n##葷\n##葺\n##蒂\n##蒋\n##蒐\n##蒔\n##蒙\n##蒜\n##蒞\n##蒟\n##蒡\n##蒨\n##蒲\n##蒸\n##蒹\n##蒻\n##蒼\n##蒿\n##蓁\n##蓄\n##蓆\n##蓉\n##蓋\n##蓑\n##蓓\n##蓖\n##蓝\n##蓟\n##蓦\n##蓬\n##蓮\n##蓼\n##蓿\n##蔑\n##蔓\n##蔔\n##蔗\n##蔘\n##蔚\n##蔡\n##蔣\n##蔥\n##蔫\n##蔬\n##蔭\n##蔵\n##蔷\n##蔺\n##蔻\n##蔼\n##蔽\n##蕁\n##蕃\n##蕈\n##蕉\n##蕊\n##蕎\n##蕙\n##蕤\n##蕨\n##蕩\n##蕪\n##蕭\n##蕲\n##蕴\n##蕻\n##蕾\n##薄\n##薅\n##薇\n##薈\n##薊\n##薏\n##薑\n##薔\n##薙\n##薛\n##薦\n##薨\n##薩\n##薪\n##薬\n##薯\n##薰\n##薹\n##藉\n##藍\n##藏\n##藐\n##藓\n##藕\n##藜\n##藝\n##藤\n##藥\n##藩\n##藹\n##藻\n##藿\n##蘆\n##蘇\n##蘊\n##蘋\n##蘑\n##蘚\n##蘭\n##蘸\n##蘼\n##蘿\n##虎\n##虏\n##虐\n##虑\n##虔\n##處\n##虚\n##虛\n##虜\n##虞\n##號\n##虢\n##虧\n##虫\n##虬\n##虱\n##虹\n##虻\n##虽\n##虾\n##蚀\n##蚁\n##蚂\n##蚊\n##蚌\n##蚓\n##蚕\n##蚜\n##蚝\n##蚣\n##蚤\n##蚩\n##蚪\n##蚯\n##蚱\n##蚵\n##蛀\n##蛆\n##蛇\n##蛊\n##蛋\n##蛎\n##蛐\n##蛔\n##蛙\n##蛛\n##蛟\n##蛤\n##蛭\n##蛮\n##蛰\n##蛳\n##蛹\n##蛻\n##蛾\n##蜀\n##蜂\n##蜃\n##蜆\n##蜇\n##蜈\n##蜊\n##蜍\n##蜒\n##蜓\n##蜕\n##蜗\n##蜘\n##蜚\n##蜜\n##蜡\n##蜢\n##蜥\n##蜱\n##蜴\n##蜷\n##蜻\n##蜿\n##蝇\n##蝈\n##蝉\n##蝌\n##蝎\n##蝕\n##蝗\n##蝙\n##蝟\n##蝠\n##蝦\n##蝨\n##蝴\n##蝶\n##蝸\n##蝼\n##螂\n##螃\n##融\n##螞\n##螢\n##螨\n##螯\n##螳\n##螺\n##蟀\n##蟄\n##蟆\n##蟋\n##蟎\n##蟑\n##蟒\n##蟠\n##蟬\n##蟲\n##蟹\n##蟻\n##蟾\n##蠅\n##蠍\n##蠔\n##蠕\n##蠛\n##蠟\n##蠡\n##蠢\n##蠣\n##蠱\n##蠶\n##蠹\n##蠻\n##血\n##衄\n##衅\n##衆\n##行\n##衍\n##術\n##衔\n##街\n##衙\n##衛\n##衝\n##衞\n##衡\n##衢\n##衣\n##补\n##表\n##衩\n##衫\n##衬\n##衮\n##衰\n##衲\n##衷\n##衹\n##衾\n##衿\n##袁\n##袂\n##袄\n##袅\n##袈\n##袋\n##袍\n##袒\n##袖\n##袜\n##袞\n##袤\n##袪\n##被\n##袭\n##袱\n##裁\n##裂\n##装\n##裆\n##裊\n##裏\n##裔\n##裕\n##裘\n##裙\n##補\n##裝\n##裟\n##裡\n##裤\n##裨\n##裱\n##裳\n##裴\n##裸\n##裹\n##製\n##裾\n##褂\n##複\n##褐\n##褒\n##褓\n##褔\n##褚\n##褥\n##褪\n##褫\n##褲\n##褶\n##褻\n##襁\n##襄\n##襟\n##襠\n##襪\n##襬\n##襯\n##襲\n##西\n##要\n##覃\n##覆\n##覇\n##見\n##規\n##覓\n##視\n##覚\n##覦\n##覧\n##親\n##覬\n##観\n##覷\n##覺\n##覽\n##觀\n##见\n##观\n##规\n##觅\n##视\n##览\n##觉\n##觊\n##觎\n##觐\n##觑\n##角\n##觞\n##解\n##觥\n##触\n##觸\n##言\n##訂\n##計\n##訊\n##討\n##訓\n##訕\n##訖\n##託\n##記\n##訛\n##訝\n##訟\n##訣\n##訥\n##訪\n##設\n##許\n##訳\n##訴\n##訶\n##診\n##註\n##証\n##詆\n##詐\n##詔\n##評\n##詛\n##詞\n##詠\n##詡\n##詢\n##詣\n##試\n##詩\n##詫\n##
詬\n##詭\n##詮\n##詰\n##話\n##該\n##詳\n##詹\n##詼\n##誅\n##誇\n##誉\n##誌\n##認\n##誓\n##誕\n##誘\n##語\n##誠\n##誡\n##誣\n##誤\n##誥\n##誦\n##誨\n##說\n##説\n##読\n##誰\n##課\n##誹\n##誼\n##調\n##諄\n##談\n##請\n##諏\n##諒\n##論\n##諗\n##諜\n##諡\n##諦\n##諧\n##諫\n##諭\n##諮\n##諱\n##諳\n##諷\n##諸\n##諺\n##諾\n##謀\n##謁\n##謂\n##謄\n##謊\n##謎\n##謐\n##謔\n##謗\n##謙\n##講\n##謝\n##謠\n##謨\n##謬\n##謹\n##謾\n##譁\n##證\n##譎\n##譏\n##識\n##譙\n##譚\n##譜\n##警\n##譬\n##譯\n##議\n##譲\n##譴\n##護\n##譽\n##讀\n##變\n##讓\n##讚\n##讞\n##计\n##订\n##认\n##讥\n##讧\n##讨\n##让\n##讪\n##讫\n##训\n##议\n##讯\n##记\n##讲\n##讳\n##讴\n##讶\n##讷\n##许\n##讹\n##论\n##讼\n##讽\n##设\n##访\n##诀\n##证\n##诃\n##评\n##诅\n##识\n##诈\n##诉\n##诊\n##诋\n##词\n##诏\n##译\n##试\n##诗\n##诘\n##诙\n##诚\n##诛\n##话\n##诞\n##诟\n##诠\n##诡\n##询\n##诣\n##诤\n##该\n##详\n##诧\n##诩\n##诫\n##诬\n##语\n##误\n##诰\n##诱\n##诲\n##说\n##诵\n##诶\n##请\n##诸\n##诺\n##读\n##诽\n##课\n##诿\n##谀\n##谁\n##调\n##谄\n##谅\n##谆\n##谈\n##谊\n##谋\n##谌\n##谍\n##谎\n##谏\n##谐\n##谑\n##谒\n##谓\n##谔\n##谕\n##谗\n##谘\n##谙\n##谚\n##谛\n##谜\n##谟\n##谢\n##谣\n##谤\n##谥\n##谦\n##谧\n##谨\n##谩\n##谪\n##谬\n##谭\n##谯\n##谱\n##谲\n##谴\n##谶\n##谷\n##豁\n##豆\n##豇\n##豈\n##豉\n##豊\n##豌\n##豎\n##豐\n##豔\n##豚\n##象\n##豢\n##豪\n##豫\n##豬\n##豹\n##豺\n##貂\n##貅\n##貌\n##貓\n##貔\n##貘\n##貝\n##貞\n##負\n##財\n##貢\n##貧\n##貨\n##販\n##貪\n##貫\n##責\n##貯\n##貰\n##貳\n##貴\n##貶\n##買\n##貸\n##費\n##貼\n##貽\n##貿\n##賀\n##賁\n##賂\n##賃\n##賄\n##資\n##賈\n##賊\n##賑\n##賓\n##賜\n##賞\n##賠\n##賡\n##賢\n##賣\n##賤\n##賦\n##質\n##賬\n##賭\n##賴\n##賺\n##購\n##賽\n##贅\n##贈\n##贊\n##贍\n##贏\n##贓\n##贖\n##贛\n##贝\n##贞\n##负\n##贡\n##财\n##责\n##贤\n##败\n##账\n##货\n##质\n##贩\n##贪\n##贫\n##贬\n##购\n##贮\n##贯\n##贰\n##贱\n##贲\n##贴\n##贵\n##贷\n##贸\n##费\n##贺\n##贻\n##贼\n##贾\n##贿\n##赁\n##赂\n##赃\n##资\n##赅\n##赈\n##赊\n##赋\n##赌\n##赎\n##赏\n##赐\n##赓\n##赔\n##赖\n##赘\n##赚\n##赛\n##赝\n##赞\n##赠\n##赡\n##赢\n##赣\n##赤\n##赦\n##赧\n##赫\n##赭\n##走\n##赳\n##赴\n##赵\n##赶\n##起\n##趁\n##超\n##越\n##趋\n##趕\n##趙\n##趟\n##趣\n##趨\n##足\n##趴\n##趵\n##趸\n##趺\n##趾\n##跃\n##跄\n##跆\n##跋\n##跌\n##跎\n##跑\n##跖\n##跚\n##跛\n##距\n##跟\n##跡\n##跤\n##跨\n##跩\n##跪\n##路\n##跳\n##践\n##跷\n##跹\n##跺\n##跻\n##踉\n##踊\n##踌\n##踏\n##踐\n##踝\n##踞\n##踟\n##踢\n##
踩\n##踪\n##踮\n##踱\n##踴\n##踵\n##踹\n##蹂\n##蹄\n##蹇\n##蹈\n##蹉\n##蹊\n##蹋\n##蹑\n##蹒\n##蹙\n##蹟\n##蹣\n##蹤\n##蹦\n##蹩\n##蹬\n##蹭\n##蹲\n##蹴\n##蹶\n##蹺\n##蹼\n##蹿\n##躁\n##躇\n##躉\n##躊\n##躋\n##躍\n##躏\n##躪\n##身\n##躬\n##躯\n##躲\n##躺\n##軀\n##車\n##軋\n##軌\n##軍\n##軒\n##軟\n##転\n##軸\n##軼\n##軽\n##軾\n##較\n##載\n##輒\n##輓\n##輔\n##輕\n##輛\n##輝\n##輟\n##輩\n##輪\n##輯\n##輸\n##輻\n##輾\n##輿\n##轄\n##轅\n##轆\n##轉\n##轍\n##轎\n##轟\n##车\n##轧\n##轨\n##轩\n##转\n##轭\n##轮\n##软\n##轰\n##轲\n##轴\n##轶\n##轻\n##轼\n##载\n##轿\n##较\n##辄\n##辅\n##辆\n##辇\n##辈\n##辉\n##辊\n##辍\n##辐\n##辑\n##输\n##辕\n##辖\n##辗\n##辘\n##辙\n##辛\n##辜\n##辞\n##辟\n##辣\n##辦\n##辨\n##辩\n##辫\n##辭\n##辮\n##辯\n##辰\n##辱\n##農\n##边\n##辺\n##辻\n##込\n##辽\n##达\n##迁\n##迂\n##迄\n##迅\n##过\n##迈\n##迎\n##运\n##近\n##返\n##还\n##这\n##进\n##远\n##违\n##连\n##迟\n##迢\n##迤\n##迥\n##迦\n##迩\n##迪\n##迫\n##迭\n##述\n##迴\n##迷\n##迸\n##迹\n##迺\n##追\n##退\n##送\n##适\n##逃\n##逅\n##逆\n##选\n##逊\n##逍\n##透\n##逐\n##递\n##途\n##逕\n##逗\n##這\n##通\n##逛\n##逝\n##逞\n##速\n##造\n##逢\n##連\n##逮\n##週\n##進\n##逵\n##逶\n##逸\n##逻\n##逼\n##逾\n##遁\n##遂\n##遅\n##遇\n##遊\n##運\n##遍\n##過\n##遏\n##遐\n##遑\n##遒\n##道\n##達\n##違\n##遗\n##遙\n##遛\n##遜\n##遞\n##遠\n##遢\n##遣\n##遥\n##遨\n##適\n##遭\n##遮\n##遲\n##遴\n##遵\n##遶\n##遷\n##選\n##遺\n##遼\n##遽\n##避\n##邀\n##邁\n##邂\n##邃\n##還\n##邇\n##邈\n##邊\n##邋\n##邏\n##邑\n##邓\n##邕\n##邛\n##邝\n##邢\n##那\n##邦\n##邨\n##邪\n##邬\n##邮\n##邯\n##邰\n##邱\n##邳\n##邵\n##邸\n##邹\n##邺\n##邻\n##郁\n##郅\n##郊\n##郎\n##郑\n##郜\n##郝\n##郡\n##郢\n##郤\n##郦\n##郧\n##部\n##郫\n##郭\n##郴\n##郵\n##郷\n##郸\n##都\n##鄂\n##鄉\n##鄒\n##鄔\n##鄙\n##鄞\n##鄢\n##鄧\n##鄭\n##鄰\n##鄱\n##鄲\n##鄺\n##酉\n##酊\n##酋\n##酌\n##配\n##酐\n##酒\n##酗\n##酚\n##酝\n##酢\n##酣\n##酥\n##酩\n##酪\n##酬\n##酮\n##酯\n##酰\n##酱\n##酵\n##酶\n##酷\n##酸\n##酿\n##醃\n##醇\n##醉\n##醋\n##醍\n##醐\n##醒\n##醚\n##醛\n##醜\n##醞\n##醣\n##醪\n##醫\n##醬\n##醮\n##醯\n##醴\n##醺\n##釀\n##釁\n##采\n##釉\n##释\n##釋\n##里\n##重\n##野\n##量\n##釐\n##金\n##釗\n##釘\n##釜\n##針\n##釣\n##釦\n##釧\n##釵\n##鈀\n##鈉\n##鈍\n##鈎\n##鈔\n##鈕\n##鈞\n##鈣\n##鈦\n##鈪\n##鈴\n##鈺\n##鈾\n##鉀\n##鉄\n##鉅\n##鉉\n##鉑\n##鉗\n##鉚\n##鉛\n##鉤\n##鉴\n##鉻\n##銀\n##銃\n##銅\n##銑\n##銓\n##銖\n##銘\n##銜\n##銬\n##銭\n##銮\n##銳\n##銷\n##
銹\n##鋁\n##鋅\n##鋒\n##鋤\n##鋪\n##鋰\n##鋸\n##鋼\n##錄\n##錐\n##錘\n##錚\n##錠\n##錢\n##錦\n##錨\n##錫\n##錮\n##錯\n##録\n##錳\n##錶\n##鍊\n##鍋\n##鍍\n##鍛\n##鍥\n##鍰\n##鍵\n##鍺\n##鍾\n##鎂\n##鎊\n##鎌\n##鎏\n##鎔\n##鎖\n##鎗\n##鎚\n##鎧\n##鎬\n##鎮\n##鎳\n##鏈\n##鏖\n##鏗\n##鏘\n##鏞\n##鏟\n##鏡\n##鏢\n##鏤\n##鏽\n##鐘\n##鐮\n##鐲\n##鐳\n##鐵\n##鐸\n##鐺\n##鑄\n##鑊\n##鑑\n##鑒\n##鑣\n##鑫\n##鑰\n##鑲\n##鑼\n##鑽\n##鑾\n##鑿\n##针\n##钉\n##钊\n##钎\n##钏\n##钒\n##钓\n##钗\n##钙\n##钛\n##钜\n##钝\n##钞\n##钟\n##钠\n##钡\n##钢\n##钣\n##钤\n##钥\n##钦\n##钧\n##钨\n##钩\n##钮\n##钯\n##钰\n##钱\n##钳\n##钴\n##钵\n##钺\n##钻\n##钼\n##钾\n##钿\n##铀\n##铁\n##铂\n##铃\n##铄\n##铅\n##铆\n##铉\n##铎\n##铐\n##铛\n##铜\n##铝\n##铠\n##铡\n##铢\n##铣\n##铤\n##铨\n##铩\n##铬\n##铭\n##铮\n##铰\n##铲\n##铵\n##银\n##铸\n##铺\n##链\n##铿\n##销\n##锁\n##锂\n##锄\n##锅\n##锆\n##锈\n##锉\n##锋\n##锌\n##锏\n##锐\n##锑\n##错\n##锚\n##锟\n##锡\n##锢\n##锣\n##锤\n##锥\n##锦\n##锭\n##键\n##锯\n##锰\n##锲\n##锵\n##锹\n##锺\n##锻\n##镀\n##镁\n##镂\n##镇\n##镉\n##镌\n##镍\n##镐\n##镑\n##镕\n##镖\n##镗\n##镛\n##镜\n##镣\n##镭\n##镯\n##镰\n##镳\n##镶\n##長\n##长\n##門\n##閃\n##閉\n##開\n##閎\n##閏\n##閑\n##閒\n##間\n##閔\n##閘\n##閡\n##関\n##閣\n##閥\n##閨\n##閩\n##閱\n##閲\n##閹\n##閻\n##閾\n##闆\n##闇\n##闊\n##闌\n##闍\n##闔\n##闕\n##闖\n##闘\n##關\n##闡\n##闢\n##门\n##闪\n##闫\n##闭\n##问\n##闯\n##闰\n##闲\n##间\n##闵\n##闷\n##闸\n##闹\n##闺\n##闻\n##闽\n##闾\n##阀\n##阁\n##阂\n##阅\n##阆\n##阇\n##阈\n##阉\n##阎\n##阐\n##阑\n##阔\n##阕\n##阖\n##阙\n##阚\n##阜\n##队\n##阡\n##阪\n##阮\n##阱\n##防\n##阳\n##阴\n##阵\n##阶\n##阻\n##阿\n##陀\n##陂\n##附\n##际\n##陆\n##陇\n##陈\n##陋\n##陌\n##降\n##限\n##陕\n##陛\n##陝\n##陞\n##陟\n##陡\n##院\n##陣\n##除\n##陨\n##险\n##陪\n##陰\n##陲\n##陳\n##陵\n##陶\n##陷\n##陸\n##険\n##陽\n##隅\n##隆\n##隈\n##隊\n##隋\n##隍\n##階\n##随\n##隐\n##隔\n##隕\n##隘\n##隙\n##際\n##障\n##隠\n##隣\n##隧\n##隨\n##險\n##隱\n##隴\n##隶\n##隸\n##隻\n##隼\n##隽\n##难\n##雀\n##雁\n##雄\n##雅\n##集\n##雇\n##雉\n##雋\n##雌\n##雍\n##雎\n##雏\n##雑\n##雒\n##雕\n##雖\n##雙\n##雛\n##雜\n##雞\n##離\n##難\n##雨\n##雪\n##雯\n##雰\n##雲\n##雳\n##零\n##雷\n##雹\n##電\n##雾\n##需\n##霁\n##霄\n##霆\n##震\n##霈\n##霉\n##霊\n##霍\n##霎\n##霏\n##霑\n##霓\n##霖\n##霜\n##霞\n##霧\n##霭\n##霰\n##露\n##霸\n##霹\n##霽\n##霾\n##靂\n##靄\n##靈\n##青\n##靓\n##靖\n##静\n##靚\n##靛\n##靜\n##
非\n##靠\n##靡\n##面\n##靥\n##靦\n##革\n##靳\n##靴\n##靶\n##靼\n##鞅\n##鞋\n##鞍\n##鞏\n##鞑\n##鞘\n##鞠\n##鞣\n##鞦\n##鞭\n##韆\n##韋\n##韌\n##韓\n##韜\n##韦\n##韧\n##韩\n##韬\n##韭\n##音\n##韵\n##韶\n##韻\n##響\n##頁\n##頂\n##頃\n##項\n##順\n##須\n##頌\n##預\n##頑\n##頒\n##頓\n##頗\n##領\n##頜\n##頡\n##頤\n##頫\n##頭\n##頰\n##頷\n##頸\n##頹\n##頻\n##頼\n##顆\n##題\n##額\n##顎\n##顏\n##顔\n##願\n##顛\n##類\n##顧\n##顫\n##顯\n##顱\n##顴\n##页\n##顶\n##顷\n##项\n##顺\n##须\n##顼\n##顽\n##顾\n##顿\n##颁\n##颂\n##预\n##颅\n##领\n##颇\n##颈\n##颉\n##颊\n##颌\n##颍\n##颐\n##频\n##颓\n##颔\n##颖\n##颗\n##题\n##颚\n##颛\n##颜\n##额\n##颞\n##颠\n##颡\n##颢\n##颤\n##颦\n##颧\n##風\n##颯\n##颱\n##颳\n##颶\n##颼\n##飄\n##飆\n##风\n##飒\n##飓\n##飕\n##飘\n##飙\n##飚\n##飛\n##飞\n##食\n##飢\n##飨\n##飩\n##飪\n##飯\n##飲\n##飼\n##飽\n##飾\n##餃\n##餅\n##餉\n##養\n##餌\n##餐\n##餒\n##餓\n##餘\n##餚\n##餛\n##餞\n##餡\n##館\n##餮\n##餵\n##餾\n##饅\n##饈\n##饋\n##饌\n##饍\n##饑\n##饒\n##饕\n##饗\n##饞\n##饥\n##饨\n##饪\n##饬\n##饭\n##饮\n##饯\n##饰\n##饱\n##饲\n##饴\n##饵\n##饶\n##饷\n##饺\n##饼\n##饽\n##饿\n##馀\n##馁\n##馄\n##馅\n##馆\n##馈\n##馋\n##馍\n##馏\n##馒\n##馔\n##首\n##馗\n##香\n##馥\n##馨\n##馬\n##馭\n##馮\n##馳\n##馴\n##駁\n##駄\n##駅\n##駆\n##駐\n##駒\n##駕\n##駛\n##駝\n##駭\n##駱\n##駿\n##騁\n##騎\n##騏\n##験\n##騙\n##騨\n##騰\n##騷\n##驀\n##驅\n##驊\n##驍\n##驒\n##驕\n##驗\n##驚\n##驛\n##驟\n##驢\n##驥\n##马\n##驭\n##驮\n##驯\n##驰\n##驱\n##驳\n##驴\n##驶\n##驷\n##驸\n##驹\n##驻\n##驼\n##驾\n##驿\n##骁\n##骂\n##骄\n##骅\n##骆\n##骇\n##骈\n##骊\n##骋\n##验\n##骏\n##骐\n##骑\n##骗\n##骚\n##骛\n##骜\n##骞\n##骠\n##骡\n##骤\n##骥\n##骧\n##骨\n##骯\n##骰\n##骶\n##骷\n##骸\n##骼\n##髂\n##髅\n##髋\n##髏\n##髒\n##髓\n##體\n##髖\n##高\n##髦\n##髪\n##髮\n##髯\n##髻\n##鬃\n##鬆\n##鬍\n##鬓\n##鬚\n##鬟\n##鬢\n##鬣\n##鬥\n##鬧\n##鬱\n##鬼\n##魁\n##魂\n##魄\n##魅\n##魇\n##魍\n##魏\n##魔\n##魘\n##魚\n##魯\n##魷\n##鮑\n##鮨\n##鮪\n##鮭\n##鮮\n##鯉\n##鯊\n##鯖\n##鯛\n##鯨\n##鯰\n##鯽\n##鰍\n##鰓\n##鰭\n##鰲\n##鰻\n##鰾\n##鱈\n##鱉\n##鱔\n##鱗\n##鱷\n##鱸\n##鱼\n##鱿\n##鲁\n##鲈\n##鲍\n##鲑\n##鲛\n##鲜\n##鲟\n##鲢\n##鲤\n##鲨\n##鲫\n##鲱\n##鲲\n##鲶\n##鲷\n##鲸\n##鳃\n##鳄\n##鳅\n##鳌\n##鳍\n##鳕\n##鳖\n##鳗\n##鳝\n##鳞\n##鳥\n##鳩\n##鳳\n##鳴\n##鳶\n##鴉\n##鴕\n##鴛\n##鴦\n##鴨\n##鴻\n##鴿\n##鵑\n##鵜\n##鵝\n##鵡\n##鵬\n##鵰\n##鵲\n##鶘\n##鶩\n##鶯\n##鶴\n##鷗\n##鷲\n##鷹\n##
鷺\n##鸚\n##鸞\n##鸟\n##鸠\n##鸡\n##鸢\n##鸣\n##鸥\n##鸦\n##鸨\n##鸪\n##鸭\n##鸯\n##鸳\n##鸵\n##鸽\n##鸾\n##鸿\n##鹂\n##鹃\n##鹄\n##鹅\n##鹈\n##鹉\n##鹊\n##鹌\n##鹏\n##鹑\n##鹕\n##鹘\n##鹜\n##鹞\n##鹤\n##鹦\n##鹧\n##鹫\n##鹭\n##鹰\n##鹳\n##鹵\n##鹹\n##鹼\n##鹽\n##鹿\n##麂\n##麋\n##麒\n##麓\n##麗\n##麝\n##麟\n##麥\n##麦\n##麩\n##麴\n##麵\n##麸\n##麺\n##麻\n##麼\n##麽\n##麾\n##黃\n##黄\n##黍\n##黎\n##黏\n##黑\n##黒\n##黔\n##默\n##黛\n##黜\n##黝\n##點\n##黠\n##黨\n##黯\n##黴\n##鼋\n##鼎\n##鼐\n##鼓\n##鼠\n##鼬\n##鼹\n##鼻\n##鼾\n##齁\n##齊\n##齋\n##齐\n##齒\n##齡\n##齢\n##齣\n##齦\n##齿\n##龄\n##龅\n##龈\n##龊\n##龋\n##龌\n##龍\n##龐\n##龔\n##龕\n##龙\n##龚\n##龛\n##龜\n##龟\n##︰\n##︱\n##︶\n##︿\n##﹁\n##﹂\n##﹍\n##﹏\n##﹐\n##﹑\n##﹒\n##﹔\n##﹕\n##﹖\n##﹗\n##﹙\n##﹚\n##﹝\n##﹞\n##﹡\n##﹣\n##！\n##＂\n##＃\n##＄\n##％\n##＆\n##＇\n##（\n##）\n##＊\n##，\n##－\n##．\n##／\n##：\n##；\n##＜\n##？\n##＠\n##［\n##＼\n##］\n##＾\n##＿\n##｀\n##ｆ\n##ｈ\n##ｊ\n##ｕ\n##ｗ\n##ｚ\n##｛\n##｝\n##｡\n##｢\n##｣\n##､\n##･\n##ｯ\n##ｰ\n##ｲ\n##ｸ\n##ｼ\n##ｽ\n##ﾄ\n##ﾉ\n##ﾌ\n##ﾗ\n##ﾙ\n##ﾝ\n##ﾞ\n##ﾟ\n##￣\n##￥\n##👍\n##🔥\n##😂\n##😎\n"
  },
  {
    "path": "args.py",
    "content": "import os\nimport tensorflow as tf\n\ntf.logging.set_verbosity(tf.logging.INFO)\n\nfile_path = os.path.dirname(__file__)\n\n\n#模型目录\nmodel_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')\n\n#config文件\nconfig_name = os.path.join(file_path, 'albert_config/albert_config_tiny.json')\n#ckpt文件名称\nckpt_name = os.path.join(model_dir, 'model.ckpt')\n#输出文件目录\noutput_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')\n#vocab文件目录\nvocab_file = os.path.join(file_path, 'albert_config/vocab.txt')\n#数据目录\ndata_dir = os.path.join(file_path, 'data/')\n\nnum_train_epochs = 10\nbatch_size = 128\nlearning_rate = 0.00005\n\n# gpu使用率\ngpu_memory_fraction = 0.8\n\n# 默认取倒数第二层的输出值作为句向量\nlayer_indexes = [-2]\n\n# 序列的最大程度，单文本建议把该值调小\nmax_seq_len = 128\n\n# graph名字\ngraph_file = os.path.join(file_path, 'albert_lcqmc_checkpoints/graph')"
  },
  {
    "path": "bert_utils.py",
    "content": "from __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport copy\nimport json\nimport math\nimport re\nimport six\nimport tensorflow as tf\n\ndef get_shape_list(tensor, expected_rank=None, name=None):\n\t\"\"\"Returns a list of the shape of tensor, preferring static dimensions.\n\n\tArgs:\n\t\ttensor: A tf.Tensor object to find the shape of.\n\t\texpected_rank: (optional) int. The expected rank of `tensor`. If this is\n\t\t\tspecified and the `tensor` has a different rank, and exception will be\n\t\t\tthrown.\n\t\tname: Optional name of the tensor for the error message.\n\n\tReturns:\n\t\tA list of dimensions of the shape of tensor. All static dimensions will\n\t\tbe returned as python integers, and dynamic dimensions will be returned\n\t\tas tf.Tensor scalars.\n\t\"\"\"\n\tif name is None:\n\t\tname = tensor.name\n\n\tif expected_rank is not None:\n\t\tassert_rank(tensor, expected_rank, name)\n\n\tshape = tensor.shape.as_list()\n\n\tnon_static_indexes = []\n\tfor (index, dim) in enumerate(shape):\n\t\tif dim is None:\n\t\t\tnon_static_indexes.append(index)\n\n\tif not non_static_indexes:\n\t\treturn shape\n\n\tdyn_shape = tf.shape(tensor)\n\tfor index in non_static_indexes:\n\t\tshape[index] = dyn_shape[index]\n\treturn shape\n\ndef reshape_to_matrix(input_tensor):\n\t\"\"\"Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix).\"\"\"\n\tndims = input_tensor.shape.ndims\n\tif ndims < 2:\n\t\traise ValueError(\"Input tensor must have at least rank 2. 
Shape = %s\" %\n\t\t\t\t\t\t\t\t\t\t (input_tensor.shape))\n\tif ndims == 2:\n\t\treturn input_tensor\n\n\twidth = input_tensor.shape[-1]\n\toutput_tensor = tf.reshape(input_tensor, [-1, width])\n\treturn output_tensor\n\ndef reshape_from_matrix(output_tensor, orig_shape_list):\n\t\"\"\"Reshapes a rank 2 tensor back to its original rank >= 2 tensor.\"\"\"\n\tif len(orig_shape_list) == 2:\n\t\treturn output_tensor\n\n\toutput_shape = get_shape_list(output_tensor)\n\n\torig_dims = orig_shape_list[0:-1]\n\twidth = output_shape[-1]\n\n\treturn tf.reshape(output_tensor, orig_dims + [width])\n\ndef assert_rank(tensor, expected_rank, name=None):\n\t\"\"\"Raises an exception if the tensor rank is not of the expected rank.\n\n\tArgs:\n\t\ttensor: A tf.Tensor to check the rank of.\n\t\texpected_rank: Python integer or list of integers, expected rank.\n\t\tname: Optional name of the tensor for the error message.\n\n\tRaises:\n\t\tValueError: If the expected shape doesn't match the actual shape.\n\t\"\"\"\n\tif name is None:\n\t\tname = tensor.name\n\n\texpected_rank_dict = {}\n\tif isinstance(expected_rank, six.integer_types):\n\t\texpected_rank_dict[expected_rank] = True\n\telse:\n\t\tfor x in expected_rank:\n\t\t\texpected_rank_dict[x] = True\n\n\tactual_rank = tensor.shape.ndims\n\tif actual_rank not in expected_rank_dict:\n\t\tscope_name = tf.get_variable_scope().name\n\t\traise ValueError(\n\t\t\t\t\"For the tensor `%s` in scope `%s`, the actual rank \"\n\t\t\t\t\"`%d` (shape = %s) is not equal to the expected rank `%s`\" %\n\t\t\t\t(name, scope_name, actual_rank, str(tensor.shape), str(expected_rank)))\n\ndef gather_indexes(sequence_tensor, positions):\n\t\"\"\"Gathers the vectors at the specific positions over a minibatch.\"\"\"\n\tsequence_shape = get_shape_list(sequence_tensor, expected_rank=3)\n\tbatch_size = sequence_shape[0]\n\tseq_length = sequence_shape[1]\n\twidth = sequence_shape[2]\n\n\tflat_offsets = tf.reshape(\n\t\t\ttf.range(0, batch_size, dtype=tf.int32) 
* seq_length, [-1, 1])\n\tflat_positions = tf.reshape(positions + flat_offsets, [-1])\n\tflat_sequence_tensor = tf.reshape(sequence_tensor,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t[batch_size * seq_length, width])\n\toutput_tensor = tf.gather(flat_sequence_tensor, flat_positions)\n\treturn output_tensor\n\n# add sequence mask for:\n# 1. random shuffle lm modeling---xlnet with random shuffled input\n# 2. left2right and right2left language modeling\n# 3. conditional generation\ndef generate_seq2seq_mask(attention_mask, mask_sequence, seq_type, **kargs):\n\tif seq_type == 'seq2seq':\n\t\tif mask_sequence is not None:\n\t\t\tseq_shape = get_shape_list(mask_sequence, expected_rank=2)\n\t\t\tseq_len = seq_shape[1]\n\t\t\tones = tf.ones((1, seq_len, seq_len))\n\t\t\ta_mask = tf.matrix_band_part(ones, -1, 0)\n\t\t\ts_ex12 = tf.expand_dims(tf.expand_dims(mask_sequence, 1), 2)\n\t\t\ts_ex13 = tf.expand_dims(tf.expand_dims(mask_sequence, 1), 3)\n\t\t\ta_mask = (1 - s_ex13) * (1 - s_ex12) + s_ex13 * a_mask\n\t\t\t# generate mask of batch x seq_len x seq_len\n\t\t\ta_mask = tf.reshape(a_mask, (-1, seq_len, seq_len))\n\t\t\tout_mask = attention_mask * a_mask\n\t\telse:\n\t\t\tones = tf.ones_like(attention_mask[:1])\n\t\t\tmask = (tf.matrix_band_part(ones, -1, 0))\n\t\t\tout_mask = attention_mask * mask\n\telse:\n\t\tout_mask = attention_mask\n\n\treturn out_mask\n\n"
  },
  {
    "path": "classifier_utils.py",
    "content": "# -*- coding: utf-8 -*-\n# @Author: bo.shi\n# @Date:   2019-12-01 22:28:41\n# @Last Modified by:   bo.shi\n# @Last Modified time: 2019-12-02 18:36:50\n# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n\"\"\"Utility functions for GLUE classification tasks.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\n\nfrom __future__ import print_function\n\nimport json\nimport csv\nimport os\nimport six\n\nimport tensorflow as tf\n\n\ndef convert_to_unicode(text):\n  \"\"\"Converts `text` to Unicode (if it's not already), assuming utf-8 input.\"\"\"\n  if six.PY3:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, bytes):\n      return text.decode(\"utf-8\", \"ignore\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  elif six.PY2:\n    if isinstance(text, str):\n      return text.decode(\"utf-8\", \"ignore\")\n    elif isinstance(text, unicode):\n      return text\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  else:\n    raise ValueError(\"Not running on Python2 or Python 3?\")\n\n\nclass InputExample(object):\n  \"\"\"A single training/test example for simple sequence classification.\"\"\"\n\n  def __init__(self, guid, text_a, text_b=None, label=None):\n    \"\"\"Constructs a InputExample.\n    Args:\n      guid: Unique id for the example.\n      text_a: 
string. The untokenized text of the first sequence. For single\n        sequence tasks, only this sequence must be specified.\n      text_b: (Optional) string. The untokenized text of the second sequence.\n        Must be specified only for sequence pair tasks.\n      label: (Optional) string. The label of the example. This should be\n        specified for train and dev examples, but not for test examples.\n    \"\"\"\n    self.guid = guid\n    self.text_a = text_a\n    self.text_b = text_b\n    self.label = label\n\n\nclass PaddingInputExample(object):\n  \"\"\"Fake example so the number of input examples is a multiple of the batch size.\n  When running eval/predict on the TPU, we need to pad the number of examples\n  to be a multiple of the batch size, because the TPU requires a fixed batch\n  size. The alternative is to drop the last batch, which is bad because it means\n  the entire output data won't be generated.\n  We use this class instead of `None` because treating `None` as padding\n  batches could cause silent errors.\n  \"\"\"\n\n\nclass DataProcessor(object):\n  \"\"\"Base class for data converters for sequence classification data sets.\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for the train set.\"\"\"\n    raise NotImplementedError()\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for the dev set.\"\"\"\n    raise NotImplementedError()\n\n  def get_test_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for prediction.\"\"\"\n    raise NotImplementedError()\n\n  def get_labels(self):\n    \"\"\"Gets the list of labels for this data set.\"\"\"\n    raise NotImplementedError()\n\n  @classmethod\n  def _read_tsv(cls, input_file, delimiter=\"\\t\", quotechar=None):\n    \"\"\"Reads a tab separated value file.\"\"\"\n    with tf.gfile.Open(input_file, \"r\") as f:\n      reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)\n      
lines = []\n      for line in reader:\n        lines.append(line)\n      return lines\n\n  @classmethod\n  def _read_txt(cls, input_file):\n    \"\"\"Reads a text file whose fields are delimited by `_!_`.\"\"\"\n    with tf.gfile.Open(input_file, \"r\") as f:\n      reader = f.readlines()\n      lines = []\n      for line in reader:\n        lines.append(line.strip().split(\"_!_\"))\n      return lines\n\n  @classmethod\n  def _read_json(cls, input_file):\n    \"\"\"Reads a JSON Lines file (one JSON object per line).\"\"\"\n    with tf.gfile.Open(input_file, \"r\") as f:\n      reader = f.readlines()\n      lines = []\n      for line in reader:\n        lines.append(json.loads(line.strip()))\n      return lines\n\n\nclass XnliProcessor(DataProcessor):\n  \"\"\"Processor for the XNLI data set.\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"train.json\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"dev.json\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"test.json\")), \"test\")\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the train, dev, and test sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = convert_to_unicode(line['premise'])\n      text_b = convert_to_unicode(line['hypo'])\n      label = convert_to_unicode(line['label']) if set_type != 'test' else 'contradiction'\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"contradiction\", \"entailment\", \"neutral\"]\n\n\n# class TnewsProcessor(DataProcessor):\n#  
   \"\"\"Processor for the MRPC data set (GLUE version).\"\"\"\n#\n#     def get_train_examples(self, data_dir):\n#         \"\"\"See base class.\"\"\"\n#         return self._create_examples(\n#             self._read_txt(os.path.join(data_dir, \"toutiao_category_train.txt\")), \"train\")\n#\n#     def get_dev_examples(self, data_dir):\n#         \"\"\"See base class.\"\"\"\n#         return self._create_examples(\n#             self._read_txt(os.path.join(data_dir, \"toutiao_category_dev.txt\")), \"dev\")\n#\n#     def get_test_examples(self, data_dir):\n#         \"\"\"See base class.\"\"\"\n#         return self._create_examples(\n#             self._read_txt(os.path.join(data_dir, \"toutiao_category_test.txt\")), \"test\")\n#\n#     def get_labels(self):\n#         \"\"\"See base class.\"\"\"\n#         labels = []\n#         for i in range(17):\n#             if i == 5 or i == 11:\n#                 continue\n#             labels.append(str(100 + i))\n#         return labels\n#\n#     def _create_examples(self, lines, set_type):\n#         \"\"\"Creates examples for the training and dev sets.\"\"\"\n#         examples = []\n#         for (i, line) in enumerate(lines):\n#             if i == 0:\n#                 continue\n#             guid = \"%s-%s\" % (set_type, i)\n#             text_a = convert_to_unicode(line[3])\n#             text_b = None\n#             label = convert_to_unicode(line[1])\n#             examples.append(\n#                 InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#         return examples\n\n\nclass TnewsProcessor(DataProcessor):\n  \"\"\"Processor for the MRPC data set (GLUE version).\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"train.json\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        
self._read_json(os.path.join(data_dir, \"dev.json\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"test.json\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    labels = []\n    for i in range(17):\n      if i == 5 or i == 11:\n        continue\n      labels.append(str(100 + i))\n    return labels\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = convert_to_unicode(line['sentence'])\n      text_b = None\n      label = convert_to_unicode(line['label']) if set_type != 'test' else \"100\"\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n\n# class iFLYTEKDataProcessor(DataProcessor):\n#     \"\"\"Processor for the iFLYTEKData data set (GLUE version).\"\"\"\n#\n#     def get_train_examples(self, data_dir):\n#         \"\"\"See base class.\"\"\"\n#         return self._create_examples(\n#             self._read_txt(os.path.join(data_dir, \"train.txt\")), \"train\")\n#\n#     def get_dev_examples(self, data_dir):\n#         \"\"\"See base class.\"\"\"\n#         return self._create_examples(\n#             self._read_txt(os.path.join(data_dir, \"dev.txt\")), \"dev\")\n#\n#     def get_test_examples(self, data_dir):\n#         \"\"\"See base class.\"\"\"\n#         return self._create_examples(\n#             self._read_txt(os.path.join(data_dir, \"test.txt\")), \"test\")\n#\n#     def get_labels(self):\n#         \"\"\"See base class.\"\"\"\n#         labels = []\n#         for i in range(119):\n#             labels.append(str(i))\n#         return labels\n#\n#     def _create_examples(self, lines, set_type):\n#         \"\"\"Creates examples for the training and dev 
sets.\"\"\"\n#         examples = []\n#         for (i, line) in enumerate(lines):\n#             if i == 0:\n#                 continue\n#             guid = \"%s-%s\" % (set_type, i)\n#             text_a = convert_to_unicode(line[1])\n#             text_b = None\n#             label = convert_to_unicode(line[0])\n#             examples.append(\n#                 InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#         return examples\n\n\nclass iFLYTEKDataProcessor(DataProcessor):\n  \"\"\"Processor for the iFLYTEKData data set (GLUE version).\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"train.json\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"dev.json\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"test.json\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    labels = []\n    for i in range(119):\n      labels.append(str(i))\n    return labels\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = convert_to_unicode(line['sentence'])\n      text_b = None\n      label = convert_to_unicode(line['label']) if set_type != 'test' else \"0\"\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n\nclass AFQMCProcessor(DataProcessor):\n  \"\"\"Processor for the internal data set. 
sentence pair classification\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"train.json\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"dev.json\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"test.json\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = convert_to_unicode(line['sentence1'])\n      text_b = convert_to_unicode(line['sentence2'])\n      label = convert_to_unicode(line['label']) if set_type != 'test' else '0'\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n\nclass CMNLIProcessor(DataProcessor):\n  \"\"\"Processor for the CMNLI data set.\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples_json(os.path.join(data_dir, \"train.json\"), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples_json(os.path.join(data_dir, \"dev.json\"), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples_json(os.path.join(data_dir, \"test.json\"), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"contradiction\", \"entailment\", \"neutral\"]\n\n  def _create_examples_json(self, file_name, set_type):\n    \"\"\"Creates 
examples for the training and dev sets.\"\"\"\n    examples = []\n    lines = tf.gfile.Open(file_name, \"r\")\n    index = 0\n    for line in lines:\n      line_obj = json.loads(line)\n      index = index + 1\n      guid = \"%s-%s\" % (set_type, index)\n      text_a = convert_to_unicode(line_obj[\"sentence1\"])\n      text_b = convert_to_unicode(line_obj[\"sentence2\"])\n      label = convert_to_unicode(line_obj[\"label\"]) if set_type != 'test' else 'neutral'\n\n      if label != \"-\":\n        examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n\n    return examples\n\n\nclass CslProcessor(DataProcessor):\n  \"\"\"Processor for the CSL data set.\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"train.json\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"dev.json\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"test.json\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = convert_to_unicode(\" \".join(line['keyword']))\n      text_b = convert_to_unicode(line['abst'])\n      label = convert_to_unicode(line['label']) if set_type != 'test' else '0'\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n\n# class InewsProcessor(DataProcessor):\n#   \"\"\"Processor for the MRPC data set (GLUE version).\"\"\"\n#\n#   def 
get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_txt(os.path.join(data_dir, \"train.txt\")), \"train\")\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_txt(os.path.join(data_dir, \"dev.txt\")), \"dev\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_txt(os.path.join(data_dir, \"test.txt\")), \"test\")\n#\n#   def get_labels(self):\n#     \"\"\"See base class.\"\"\"\n#     labels = [\"0\", \"1\", \"2\"]\n#     return labels\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for the training and dev sets.\"\"\"\n#     examples = []\n#     for (i, line) in enumerate(lines):\n#       if i == 0:\n#         continue\n#       guid = \"%s-%s\" % (set_type, i)\n#       text_a = convert_to_unicode(line[2])\n#       text_b = convert_to_unicode(line[3])\n#       label = convert_to_unicode(line[0]) if set_type != \"test\" else '0'\n#       examples.append(\n#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#     return examples\n#\n#\n# class THUCNewsProcessor(DataProcessor):\n#   \"\"\"Processor for the THUCNews data set (GLUE version).\"\"\"\n#\n#   def get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_txt(os.path.join(data_dir, \"train.txt\")), \"train\")\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_txt(os.path.join(data_dir, \"dev.txt\")), \"dev\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_txt(os.path.join(data_dir, \"test.txt\")), \"test\")\n#\n#   def get_labels(self):\n#     
\"\"\"See base class.\"\"\"\n#     labels = []\n#     for i in range(14):\n#       labels.append(str(i))\n#     return labels\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for the training and dev sets.\"\"\"\n#     examples = []\n#     for (i, line) in enumerate(lines):\n#       if i == 0 or len(line) < 3:\n#         continue\n#       guid = \"%s-%s\" % (set_type, i)\n#       text_a = convert_to_unicode(line[3])\n#       text_b = None\n#       label = convert_to_unicode(line[0])\n#       examples.append(\n#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#     return examples\n#\n# class LCQMCProcessor(DataProcessor):\n#   \"\"\"Processor for the internal data set. sentence pair classification\"\"\"\n#\n#   def __init__(self):\n#     self.language = \"zh\"\n#\n#   def get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"train.txt\")), \"train\")\n#     # dev_0827.tsv\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"dev.txt\")), \"dev\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"test.txt\")), \"test\")\n#\n#   def get_labels(self):\n#     \"\"\"See base class.\"\"\"\n#     return [\"0\", \"1\"]\n#     # return [\"-1\",\"0\", \"1\"]\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for the training and dev sets.\"\"\"\n#     examples = []\n#     print(\"length of lines:\", len(lines))\n#     for (i, line) in enumerate(lines):\n#       # print('#i:',i,line)\n#       if i == 0:\n#         continue\n#       guid = \"%s-%s\" % (set_type, i)\n#       try:\n#         label = convert_to_unicode(line[2])\n#         text_a = 
convert_to_unicode(line[0])\n#         text_b = convert_to_unicode(line[1])\n#         examples.append(\n#             InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#       except Exception:\n#         print('###error.i:', i, line)\n#     return examples\n#\n#\n# class JDCOMMENTProcessor(DataProcessor):\n#   \"\"\"Processor for the internal data set. sentence pair classification\"\"\"\n#\n#   def __init__(self):\n#     self.language = \"zh\"\n#\n#   def get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"jd_train.csv\"), \",\", \"\\\"\"), \"train\")\n#     # dev_0827.tsv\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"jd_dev.csv\"), \",\", \"\\\"\"), \"dev\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"jd_test.csv\"), \",\", \"\\\"\"), \"test\")\n#\n#   def get_labels(self):\n#     \"\"\"See base class.\"\"\"\n#     return [\"1\", \"2\", \"3\", \"4\", \"5\"]\n#     # return [\"-1\",\"0\", \"1\"]\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for the training and dev sets.\"\"\"\n#     examples = []\n#     print(\"length of lines:\", len(lines))\n#     for (i, line) in enumerate(lines):\n#       # print('#i:',i,line)\n#       if i == 0:\n#         continue\n#       guid = \"%s-%s\" % (set_type, i)\n#       try:\n#         label = convert_to_unicode(line[0])\n#         text_a = convert_to_unicode(line[1])\n#         text_b = convert_to_unicode(line[2])\n#         examples.append(\n#             InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#       except Exception:\n#         print('###error.i:', i, line)\n#     return examples\n#\n#\n# 
class BQProcessor(DataProcessor):\n#   \"\"\"Processor for the internal data set. sentence pair classification\"\"\"\n#\n#   def __init__(self):\n#     self.language = \"zh\"\n#\n#   def get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"train.txt\")), \"train\")\n#     # dev_0827.tsv\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"dev.txt\")), \"dev\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"test.txt\")), \"test\")\n#\n#   def get_labels(self):\n#     \"\"\"See base class.\"\"\"\n#     return [\"0\", \"1\"]\n#     # return [\"-1\",\"0\", \"1\"]\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for the training and dev sets.\"\"\"\n#     examples = []\n#     print(\"length of lines:\", len(lines))\n#     for (i, line) in enumerate(lines):\n#       # print('#i:',i,line)\n#       if i == 0:\n#         continue\n#       guid = \"%s-%s\" % (set_type, i)\n#       try:\n#         label = convert_to_unicode(line[2])\n#         text_a = convert_to_unicode(line[0])\n#         text_b = convert_to_unicode(line[1])\n#         examples.append(\n#             InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#       except Exception:\n#         print('###error.i:', i, line)\n#     return examples\n#\n#\n# class MnliProcessor(DataProcessor):\n#   \"\"\"Processor for the MultiNLI data set (GLUE version).\"\"\"\n#\n#   def get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"train.tsv\")), \"train\")\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base 
class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"dev_matched.tsv\")),\n#         \"dev_matched\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"test_matched.tsv\")), \"test\")\n#\n#   def get_labels(self):\n#     \"\"\"See base class.\"\"\"\n#     return [\"contradiction\", \"entailment\", \"neutral\"]\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for the training and dev sets.\"\"\"\n#     examples = []\n#     for (i, line) in enumerate(lines):\n#       if i == 0:\n#         continue\n#       guid = \"%s-%s\" % (set_type, convert_to_unicode(line[0]))\n#       text_a = convert_to_unicode(line[8])\n#       text_b = convert_to_unicode(line[9])\n#       if set_type == \"test\":\n#         label = \"contradiction\"\n#       else:\n#         label = convert_to_unicode(line[-1])\n#       examples.append(\n#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#     return examples\n#\n#\n# class MrpcProcessor(DataProcessor):\n#   \"\"\"Processor for the MRPC data set (GLUE version).\"\"\"\n#\n#   def get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"train.tsv\")), \"train\")\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"dev.tsv\")), \"dev\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"test.tsv\")), \"test\")\n#\n#   def get_labels(self):\n#     \"\"\"See base class.\"\"\"\n#     return [\"0\", \"1\"]\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for 
the training and dev sets.\"\"\"\n#     examples = []\n#     for (i, line) in enumerate(lines):\n#       if i == 0:\n#         continue\n#       guid = \"%s-%s\" % (set_type, i)\n#       text_a = convert_to_unicode(line[3])\n#       text_b = convert_to_unicode(line[4])\n#       if set_type == \"test\":\n#         label = \"0\"\n#       else:\n#         label = convert_to_unicode(line[0])\n#       examples.append(\n#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n#     return examples\n#\n#\n# class ColaProcessor(DataProcessor):\n#   \"\"\"Processor for the CoLA data set (GLUE version).\"\"\"\n#\n#   def get_train_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"train.tsv\")), \"train\")\n#\n#   def get_dev_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"dev.tsv\")), \"dev\")\n#\n#   def get_test_examples(self, data_dir):\n#     \"\"\"See base class.\"\"\"\n#     return self._create_examples(\n#         self._read_tsv(os.path.join(data_dir, \"test.tsv\")), \"test\")\n#\n#   def get_labels(self):\n#     \"\"\"See base class.\"\"\"\n#     return [\"0\", \"1\"]\n#\n#   def _create_examples(self, lines, set_type):\n#     \"\"\"Creates examples for the training and dev sets.\"\"\"\n#     examples = []\n#     for (i, line) in enumerate(lines):\n#       # Only the test set has a header\n#       if set_type == \"test\" and i == 0:\n#         continue\n#       guid = \"%s-%s\" % (set_type, i)\n#       if set_type == \"test\":\n#         text_a = convert_to_unicode(line[1])\n#         label = \"0\"\n#       else:\n#         text_a = convert_to_unicode(line[3])\n#         label = convert_to_unicode(line[1])\n#       examples.append(\n#           InputExample(guid=guid, text_a=text_a, text_b=None, label=label))\n#     return examples\n\nclass 
WSCProcessor(DataProcessor):\n  \"\"\"Processor for the internal data set. sentence pair classification\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"train.json\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"dev.json\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"test.json\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"true\", \"false\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = convert_to_unicode(line['text'])\n      text_a_list = list(text_a)\n      target = line['target']\n      query = target['span1_text']\n      query_idx = target['span1_index']\n      pronoun = target['span2_text']\n      pronoun_idx = target['span2_index']\n\n      assert text_a[pronoun_idx: (pronoun_idx + len(pronoun))\n                    ] == pronoun, \"pronoun: {}\".format(pronoun)\n      assert text_a[query_idx: (query_idx + len(query))] == query, \"query: {}\".format(query)\n\n      if pronoun_idx > query_idx:\n        text_a_list.insert(query_idx, \"_\")\n        text_a_list.insert(query_idx + len(query) + 1, \"_\")\n        text_a_list.insert(pronoun_idx + 2, \"[\")\n        text_a_list.insert(pronoun_idx + len(pronoun) + 2 + 1, \"]\")\n      else:\n        text_a_list.insert(pronoun_idx, \"[\")\n        text_a_list.insert(pronoun_idx + len(pronoun) + 1, \"]\")\n        text_a_list.insert(query_idx + 2, \"_\")\n        text_a_list.insert(query_idx + len(query) + 2 + 1, \"_\")\n\n    
  text_a = \"\".join(text_a_list)\n\n      if set_type == \"test\":\n        label = \"true\"\n      else:\n        label = line['label']\n\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))\n    return examples\n\n\nclass COPAProcessor(DataProcessor):\n  \"\"\"Processor for the internal data set. sentence pair classification\"\"\"\n\n  def __init__(self):\n    self.language = \"zh\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"train.json\")), \"train\")\n    # dev_0827.tsv\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"dev.json\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_json(os.path.join(data_dir, \"test.json\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n\n  @classmethod\n  def _create_examples_one(self, lines, set_type):\n    examples = []\n    for (i, line) in enumerate(lines):\n      guid1 = \"%s-%s\" % (set_type, i)\n#         try:\n      if line['question'] == 'cause':\n        text_a = convert_to_unicode(line['premise'] + '原因是什么呢？' + line['choice0'])\n        text_b = convert_to_unicode(line['premise'] + '原因是什么呢？' + line['choice1'])\n      else:\n        text_a = convert_to_unicode(line['premise'] + '造成了什么影响呢？' + line['choice0'])\n        text_b = convert_to_unicode(line['premise'] + '造成了什么影响呢？' + line['choice1'])\n      label = convert_to_unicode(str(1 if line['label'] == 0 else 0)) if set_type != 'test' else '0'\n      examples.append(\n          InputExample(guid=guid1, text_a=text_a, text_b=text_b, label=label))\n#         except Exception as e:\n#             print('###error.i:',e, i, line)\n    return examples\n\n  
@classmethod\n  def _create_examples(self, lines, set_type):\n    examples = []\n    for (i, line) in enumerate(lines):\n      i = 2 * i\n      guid1 = \"%s-%s\" % (set_type, i)\n      guid2 = \"%s-%s\" % (set_type, i + 1)\n#         try:\n      premise = convert_to_unicode(line['premise'])\n      choice0 = convert_to_unicode(line['choice0'])\n      label = convert_to_unicode(str(1 if line['label'] == 0 else 0)) if set_type != 'test' else '0'\n      #text_a2 = convert_to_unicode(line['premise'])\n      choice1 = convert_to_unicode(line['choice1'])\n      label2 = convert_to_unicode(\n          str(0 if line['label'] == 0 else 1)) if set_type != 'test' else '0'\n      if line['question'] == 'effect':\n        text_a = premise\n        text_b = choice0\n        text_a2 = premise\n        text_b2 = choice1\n      elif line['question'] == 'cause':\n        text_a = choice0\n        text_b = premise\n        text_a2 = choice1\n        text_b2 = premise\n      else:\n        print('wrong format!!')\n        return None\n      examples.append(\n          InputExample(guid=guid1, text_a=text_a, text_b=text_b, label=label))\n      examples.append(\n          InputExample(guid=guid2, text_a=text_a2, text_b=text_b2, label=label2))\n#         except Exception as e:\n#             print('###error.i:',e, i, line)\n    return examples"
  },
  {
    "path": "create_pretrain_data.sh",
    "content": "#!/usr/bin/env bash\n\nBERT_BASE_DIR=./albert_config\npython3 create_pretraining_data.py --do_whole_word_mask=True --input_file=data/news_zh_1.txt \\\n--output_file=data/tf_news_2016_zh_raw_news2016zh_1.tfrecord --vocab_file=$BERT_BASE_DIR/vocab.txt --do_lower_case=True \\\n--max_seq_length=512 --max_predictions_per_seq=51 --masked_lm_prob=0.10"
  },
  {
    "path": "create_pretraining_data.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Create masked LM/next sentence masked_lm TF examples for BERT.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport random\nimport tokenization\nimport tensorflow as tf\nimport jieba\nimport re\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\nflags.DEFINE_string(\"input_file\", None,\n                    \"Input raw text file (or comma-separated list of files).\")\n\nflags.DEFINE_string(\n    \"output_file\", None,\n    \"Output TF example file (or comma-separated list of files).\")\n\nflags.DEFINE_string(\"vocab_file\", None,\n                    \"The vocabulary file that the BERT model was trained on.\")\n\nflags.DEFINE_bool(\n    \"do_lower_case\", True,\n    \"Whether to lower case the input text. 
Should be True for uncased \"\n    \"models and False for cased models.\")\n\nflags.DEFINE_bool(\n    \"do_whole_word_mask\", False,\n    \"Whether to use whole word masking rather than per-WordPiece masking.\")\n\nflags.DEFINE_integer(\"max_seq_length\", 128, \"Maximum sequence length.\")\n\nflags.DEFINE_integer(\"max_predictions_per_seq\", 20,\n                     \"Maximum number of masked LM predictions per sequence.\")\n\nflags.DEFINE_integer(\"random_seed\", 12345, \"Random seed for data generation.\")\n\nflags.DEFINE_integer(\n    \"dupe_factor\", 10,\n    \"Number of times to duplicate the input data (with different masks).\")\n\nflags.DEFINE_float(\"masked_lm_prob\", 0.15, \"Masked LM probability.\")\n\nflags.DEFINE_float(\n    \"short_seq_prob\", 0.1,\n    \"Probability of creating sequences which are shorter than the \"\n    \"maximum length.\")\n\nflags.DEFINE_bool(\"non_chinese\", False,\"manually set this to True if you are not doing chinese pre-train task.\")\n\n\nclass TrainingInstance(object):\n  \"\"\"A single training instance (sentence pair).\"\"\"\n\n  def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm_labels,\n               is_random_next):\n    self.tokens = tokens\n    self.segment_ids = segment_ids\n    self.is_random_next = is_random_next\n    self.masked_lm_positions = masked_lm_positions\n    self.masked_lm_labels = masked_lm_labels\n\n  def __str__(self):\n    s = \"\"\n    s += \"tokens: %s\\n\" % (\" \".join(\n        [tokenization.printable_text(x) for x in self.tokens]))\n    s += \"segment_ids: %s\\n\" % (\" \".join([str(x) for x in self.segment_ids]))\n    s += \"is_random_next: %s\\n\" % self.is_random_next\n    s += \"masked_lm_positions: %s\\n\" % (\" \".join(\n        [str(x) for x in self.masked_lm_positions]))\n    s += \"masked_lm_labels: %s\\n\" % (\" \".join(\n        [tokenization.printable_text(x) for x in self.masked_lm_labels]))\n    s += \"\\n\"\n    return s\n\n  def __repr__(self):\n    return 
self.__str__()\n\n\ndef write_instance_to_example_files(instances, tokenizer, max_seq_length,\n                                    max_predictions_per_seq, output_files):\n  \"\"\"Create TF example files from `TrainingInstance`s.\"\"\"\n  writers = []\n  for output_file in output_files:\n    writers.append(tf.python_io.TFRecordWriter(output_file))\n\n  writer_index = 0\n\n  total_written = 0\n  for (inst_index, instance) in enumerate(instances):\n    input_ids = tokenizer.convert_tokens_to_ids(instance.tokens)\n    input_mask = [1] * len(input_ids)\n    segment_ids = list(instance.segment_ids)\n    assert len(input_ids) <= max_seq_length\n\n    while len(input_ids) < max_seq_length:\n      input_ids.append(0)\n      input_mask.append(0)\n      segment_ids.append(0)\n\n    assert len(input_ids) == max_seq_length\n    assert len(input_mask) == max_seq_length\n    assert len(segment_ids) == max_seq_length\n\n    masked_lm_positions = list(instance.masked_lm_positions)\n    masked_lm_ids = tokenizer.convert_tokens_to_ids(instance.masked_lm_labels)\n    masked_lm_weights = [1.0] * len(masked_lm_ids)\n\n    while len(masked_lm_positions) < max_predictions_per_seq:\n      masked_lm_positions.append(0)\n      masked_lm_ids.append(0)\n      masked_lm_weights.append(0.0)\n\n    next_sentence_label = 1 if instance.is_random_next else 0\n\n    features = collections.OrderedDict()\n    features[\"input_ids\"] = create_int_feature(input_ids)\n    features[\"input_mask\"] = create_int_feature(input_mask)\n    features[\"segment_ids\"] = create_int_feature(segment_ids)\n    features[\"masked_lm_positions\"] = create_int_feature(masked_lm_positions)\n    features[\"masked_lm_ids\"] = create_int_feature(masked_lm_ids)\n    features[\"masked_lm_weights\"] = create_float_feature(masked_lm_weights)\n    features[\"next_sentence_labels\"] = create_int_feature([next_sentence_label])\n\n    tf_example = tf.train.Example(features=tf.train.Features(feature=features))\n\n    
writers[writer_index].write(tf_example.SerializeToString())\n    writer_index = (writer_index + 1) % len(writers)\n\n    total_written += 1\n\n    if inst_index < 20:\n      tf.logging.info(\"*** Example ***\")\n      tf.logging.info(\"tokens: %s\" % \" \".join(\n          [tokenization.printable_text(x) for x in instance.tokens]))\n\n      for feature_name in features.keys():\n        feature = features[feature_name]\n        values = []\n        if feature.int64_list.value:\n          values = feature.int64_list.value\n        elif feature.float_list.value:\n          values = feature.float_list.value\n        tf.logging.info(\n            \"%s: %s\" % (feature_name, \" \".join([str(x) for x in values])))\n\n  for writer in writers:\n    writer.close()\n\n  tf.logging.info(\"Wrote %d total instances\", total_written)\n\n\ndef create_int_feature(values):\n  feature = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))\n  return feature\n\n\ndef create_float_feature(values):\n  feature = tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))\n  return feature\n\n\ndef create_training_instances(input_files, tokenizer, max_seq_length,\n                              dupe_factor, short_seq_prob, masked_lm_prob,\n                              max_predictions_per_seq, rng):\n  \"\"\"Create `TrainingInstance`s from raw text.\"\"\"\n  all_documents = [[]]\n\n  # Input file format:\n  # (1) One sentence per line. These should ideally be actual sentences, not\n  # entire paragraphs or arbitrary spans of text. (Because we use the\n  # sentence boundaries for the \"next sentence prediction\" task).\n  # (2) Blank lines between documents. 
Document boundaries are needed so\n  # that the \"next sentence prediction\" task doesn't span between documents.\n  for input_file in input_files:\n    with tf.gfile.GFile(input_file, \"r\") as reader:\n      while True:\n        strings=reader.readline()\n        strings=strings.replace(\"   \",\" \").replace(\"  \",\" \") # collapse runs of two or three spaces into a single space\n        line = tokenization.convert_to_unicode(strings)\n        if not line:\n          break\n        line = line.strip()\n\n        # Empty lines are used as document delimiters\n        if not line:\n          all_documents.append([])\n        tokens = tokenizer.tokenize(line)\n        if tokens:\n          all_documents[-1].append(tokens)\n\n  # Remove empty documents\n  all_documents = [x for x in all_documents if x]\n  rng.shuffle(all_documents)\n\n  vocab_words = list(tokenizer.vocab.keys())\n  instances = []\n  for _ in range(dupe_factor):\n    for document_index in range(len(all_documents)):\n      instances.extend(\n        create_instances_from_document_albert( # change to albert style for sentence order prediction(SOP), 2019-08-28, brightmart\n              all_documents, document_index, max_seq_length, short_seq_prob,\n              masked_lm_prob, max_predictions_per_seq, vocab_words, rng))\n\n  rng.shuffle(instances)\n  return instances\n\ndef get_new_segment(segment):  # newly added helper ####\n    \"\"\"\n    Takes one sentence and returns a processed copy of it. To support Chinese whole word masking, every non-initial character of a segmented word is given a special prefix (\"##\"), so that later processing steps know which characters belong to the same word.\n    :param segment: one sentence, e.g.  ['悬', '灸', '技', '术', '培', '训', '专', '家', '教', '你', '艾', '灸', '降', '血', '糖', '，', '为', '爸', '妈', '收', '好', '了', '！']\n    :return: the processed sentence, e.g.    
['悬', '##灸', '技', '术', '培', '训', '专', '##家', '教', '你', '艾', '##灸', '降', '##血', '##糖', '，', '为', '爸', '##妈', '收', '##好', '了', '！']\n    \"\"\"\n    seq_cws = jieba.lcut(\"\".join(segment)) # word segmentation\n    seq_cws_dict = {x: 1 for x in seq_cws} # put the segmented words into a dict for fast lookup\n    new_segment = []\n    i = 0\n    while i < len(segment): # start at the first character and continue until the whole sentence is processed\n      if len(re.findall('[\\u4E00-\\u9FA5]', segment[i])) == 0:  # no Chinese character found: append the token unchanged, no special handling needed\n        new_segment.append(segment[i])\n        i += 1\n        continue\n\n      has_add = False\n      for length in range(3, 0, -1):\n        if i + length > len(segment):\n          continue\n        if ''.join(segment[i:i + length]) in seq_cws_dict:\n          new_segment.append(segment[i])\n          for l in range(1, length):\n            new_segment.append('##' + segment[i + l])\n          i += length\n          has_add = True\n          break\n      if not has_add:\n        new_segment.append(segment[i])\n        i += 1\n    # print(\"get_new_segment.wwm.get_new_segment:\",new_segment)\n    return new_segment\n\ndef create_instances_from_document_albert(\n    all_documents, document_index, max_seq_length, short_seq_prob,\n    masked_lm_prob, max_predictions_per_seq, vocab_words, rng):\n  \"\"\"Creates `TrainingInstance`s for a single document.\n     This method is changed to create sentence-order prediction (SOP) followed by idea from paper of ALBERT, 2019-08-28, brightmart\n  \"\"\"\n  document = all_documents[document_index] # fetch one document\n\n  # Account for [CLS], [SEP], [SEP]\n  max_num_tokens = max_seq_length - 3\n\n  # We *usually* want to fill up the entire sequence since we are padding\n  # to `max_seq_length` anyways, so short sequences are generally wasted\n  # computation. 
However, we *sometimes*\n  # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter\n  # sequences to minimize the mismatch between pre-training and fine-tuning.\n  # The `target_seq_length` is just a rough target however, whereas\n  # `max_seq_length` is a hard limit.\n  target_seq_length = max_num_tokens\n  if rng.random() < short_seq_prob: # with some probability (e.g. 10%) use a shorter sequence length, to reduce the mismatch between long pre-training sequences and (possibly) short fine-tuning sequences\n    target_seq_length = rng.randint(2, max_num_tokens)\n\n  # We DON'T just concatenate all of the tokens from a document into a long\n  # sequence and choose an arbitrary split point because this would make the\n  # next sentence prediction task too easy. Instead, we split the input into\n  # segments \"A\" and \"B\" based on the actual \"sentences\" provided by the user\n  # input.\n  # Use actual sentences rather than arbitrarily truncated spans, so that the sentence-coherence prediction task is better constructed\n  instances = []\n  current_chunk = [] # the text chunk currently being processed; holds multiple sentences\n  current_length = 0\n  i = 0\n  # print(\"###document:\",document) # a document can be a whole article, a news item, an encyclopedia entry, etc. 
document:[['是', '爷', '们', '，', '就', '得', '给', '媳', '妇', '幸', '福'], ['关', '注', '【', '晨', '曦', '教', '育', '】', '，', '获', '取', '育', '儿', '的', '智', '慧', '，', '与', '孩', '子', '一', '同', '成', '长', '！'], ['方', '法', ':', '打', '开', '微', '信', '→', '添', '加', '朋', '友', '→', '搜', '号', '→', '##he', '##bc', '##x', '##jy', '##→', '关', '注', '!', '我', '是', '一', '个', '爷', '们', '，', '孝', '顺', '是', '做', '人', '的', '第', '一', '准', '则', '。'], ['甭', '管', '小', '时', '候', '怎', '么', '跟', '家', '长', '犯', '混', '蛋', '，', '长', '大', '了', '，', '就', '底', '报', '答', '父', '母', '，', '以', '后', '我', '媳', '妇', '也', '必', '须', '孝', '顺', '。'], ['我', '是', '一', '个', '爷', '们', '，', '可', '以', '花', '心', '，', '可', '以', '好', '玩', '。'], ['但', '我', '一', '定', '会', '找', '一', '个', '管', '的', '住', '我', '的', '女', '人', '，', '和', '我', '一', '起', '生', '活', '。'], ['28', '岁', '以', '前', '在', '怎', '么', '玩', '都', '行', '，', '但', '我', '最', '后', '一', '定', '会', '找', '一', '个', '勤', '俭', '持', '家', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '我', '不', '会', '让', '自', '己', '的', '女', '人', '受', '一', '点', '委', '屈', '，', '每', '次', '把', '她', '抱', '在', '怀', '里', '，', '看', '她', '洋', '溢', '着', '幸', '福', '的', '脸', '，', '我', '都', '会', '引', '以', '为', '傲', '，', '这', '特', '么', '就', '是', '我', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '干', '什', '么', '也', '不', '能', '忘', '了', '自', '己', '媳', '妇', '，', '就', '算', '和', '哥', '们', '一', '起', '喝', '酒', '，', '喝', '到', '很', '晚', '，', '也', '要', '提', '前', '打', '电', '话', '告', '诉', '她', '，', '让', '她', '早', '点', '休', '息', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '绝', '对', '不', '能', '抽', '烟', '，', '喝', '酒', '还', '勉', '强', '过', '得', '去', '，', '不', '过', '该', '喝', '的', '时', '候', '喝', '，', '不', '该', '喝', '的', '时', '候', '，', '少', '扯', '纳', '极', '薄', '蛋', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '必', '须', '听', '我', '话', '，', '在', '人', '前', '一', '定', '要', '给', '我', '面', '子', '，', '回', '家', '了', '咱', '什', '么', '都', '好', '说', '。'], ['我', '是', '一', '爷', '们', '，', '就', '算', '难', '的', '吃', '不', '上', 
'饭', '了', '，', '都', '不', '张', '口', '跟', '媳', '妇', '要', '一', '分', '钱', '。'], ['我', '是', '一', '爷', '们', '，', '不', '管', '上', '学', '还', '是', '上', '班', '，', '我', '都', '会', '送', '媳', '妇', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '交', '往', '不', '到', '1', '年', '，', '绝', '对', '不', '会', '和', '媳', '妇', '提', '过', '分', '的', '要', '求', '，', '我', '会', '尊', '重', '她', '。'], ['我', '是', '一', '爷', '们', '，', '游', '戏', '永', '远', '比', '不', '上', '我', '媳', '妇', '重', '要', '，', '只', '要', '媳', '妇', '发', '话', '，', '我', '绝', '对', '唯', '命', '是', '从', '。'], ['我', '是', '一', '爷', '们', '，', '上', 'q', '绝', '对', '是', '为', '了', '等', '媳', '妇', '，', '所', '有', '暧', '昧', '的', '心', '情', '只', '为', '她', '一', '个', '女', '人', '而', '写', '，', '我', '不', '一', '定', '会', '经', '常', '写', '日', '志', '，', '可', '是', '我', '会', '告', '诉', '全', '世', '界', '，', '我', '很', '爱', '她', '。'], ['我', '是', '一', '爷', '们', '，', '不', '一', '定', '要', '经', '常', '制', '造', '浪', '漫', '、', '偶', '尔', '过', '个', '节', '日', '也', '要', '送', '束', '玫', '瑰', '花', '给', '媳', '妇', '抱', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '手', '机', '会', '24', '小', '时', '为', '她', '开', '机', '，', '让', '她', '半', '夜', '痛', '经', '的', '时', '候', '，', '做', '恶', '梦', '的', '时', '候', '，', '随', '时', '可', '以', '联', '系', '到', '我', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '经', '常', '带', '媳', '妇', '出', '去', '玩', '，', '她', '不', '一', '定', '要', '和', '我', '所', '有', '的', '哥', '们', '都', '认', '识', '，', '但', '见', '面', '能', '说', '的', '上', '话', '就', '行', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '和', '媳', '妇', '的', '姐', '妹', '哥', '们', '搞', '好', '关', '系', '，', '让', '她', '们', '相', '信', '我', '一', '定', '可', '以', '给', '我', '媳', '妇', '幸', '福', '。'], ['我', '是', '一', '爷', '们', '，', '吵', '架', '后', '、', '也', '要', '主', '动', '打', '电', '话', '关', '心', '她', '，', '咱', '是', '一', '爷', '们', '，', '给', '媳', '妇', '服', '个', '软', '，', '道', '个', '歉', '怎', '么', '了', '？'], ['我', '是', '一', '爷', '们', '，', '绝', '对', '不', '会', '嫌', '弃', '自', '己', '媳', '妇', '，', '拿', '她', '和', '别', '人', '比', '，', '说', '她', 
'这', '不', '如', '人', '家', '，', '纳', '不', '如', '人', '家', '的', '。'], ['我', '是', '一', '爷', '们', '，', '陪', '媳', '妇', '逛', '街', '时', '，', '碰', '见', '熟', '人', '，', '无', '论', '我', '媳', '妇', '长', '的', '好', '看', '与', '否', '，', '我', '都', '会', '大', '方', '的', '介', '绍', '。'], ['谁', '让', '咱', '爷', '们', '就', '好', '这', '口', '呢', '。'], ['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。'], ['【', '我', '们', '重', '在', '分', '享', '。'], ['所', '有', '文', '字', '和', '美', '图', '，', '来', '自', '网', '络', '，', '晨', '欣', '教', '育', '整', '理', '。'], ['对', '原', '文', '作', '者', '，', '表', '示', '敬', '意', '。'], ['】', '关', '注', '晨', '曦', '教', '育', '[UNK]', '[UNK]', '晨', '曦', '教', '育', '（', '微', '信', '号', '：', 'he', '##bc', '##x', '##jy', '）', '。'], ['打', '开', '微', '信', '，', '扫', '描', '二', '维', '码', '，', '关', '注', '[UNK]', '晨', '曦', '教', '育', '[UNK]', '，', '获', '取', '更', '多', '育', '儿', '资', '源', '。'], ['点', '击', '下', '面', '订', '阅', '按', '钮', '订', '阅', '，', '会', '有', '更', '多', '惊', '喜', '哦', '！']]\n  while i < len(document): # walk the document from its first sentence, one sentence at a time\n    segment = document[i] # segment is a list holding one complete sentence split into characters, e.g. segment=['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。']\n    if FLAGS.non_chinese==False: # non_chinese is False, i.e. the text is Chinese, so prepare it for Chinese whole word masking\n      segment = get_new_segment(segment)  # whole word mask for Chinese: use word segmentation to add the ## prefix where needed\n\n    current_chunk.append(segment) # add this sentence to the current chunk\n    current_length += len(segment) # running total length of the sentences seen so far\n    if i == len(document) - 1 or current_length >= target_seq_length:\n      # the accumulated length reached the target, or we are at the end of the document ==> build the A and B parts of A[SEP]B\n      if current_chunk: # if the current chunk is not empty\n        # `a_end` is how many segments from `current_chunk` go into the `A`\n        # (first) sentence.\n        a_end = 1\n        if len(current_chunk) >= 2: # if the chunk holds at least two sentences, take part of it as the A part of A[SEP]B\n          a_end = 
rng.randint(1, len(current_chunk) - 1)\n        # the first part selected from the current chunk becomes A, i.e. tokens_a\n        tokens_a = []\n        for j in range(a_end):\n          tokens_a.extend(current_chunk[j])\n\n        # build the B part of A[SEP]B: here it is always the remainder of the current chunk (in the original BERT implementation it was sometimes drawn at random from another document)\n        tokens_b = []\n        for j in range(a_end, len(current_chunk)):\n          tokens_b.extend(current_chunk[j])\n\n        # with 50% probability, swap the positions of tokens_a and tokens_b\n        # print(\"tokens_a length1:\",len(tokens_a))\n        # print(\"tokens_b length1:\",len(tokens_b)) # len(tokens_b) = 0\n\n        if len(tokens_a) == 0 or len(tokens_b) == 0: i += 1; continue\n        if rng.random() < 0.5: # swap tokens_a and tokens_b\n          is_random_next=True\n          temp=tokens_a\n          tokens_a=tokens_b\n          tokens_b=temp\n        else:\n          is_random_next=False\n\n        truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng)\n\n        assert len(tokens_a) >= 1\n        assert len(tokens_b) >= 1\n\n        # combine tokens_a and tokens_b in BERT style, i.e. [CLS]tokens_a[SEP]tokens_b[SEP], to form the final tokens; also build segment_ids, which are 0 for the first part and 1 for the second part.\n        tokens = []\n        segment_ids = []\n        tokens.append(\"[CLS]\")\n        segment_ids.append(0)\n        for token in tokens_a:\n          tokens.append(token)\n          segment_ids.append(0)\n\n        tokens.append(\"[SEP]\")\n        segment_ids.append(0)\n\n        for token in tokens_b:\n          tokens.append(token)\n          segment_ids.append(1)\n        tokens.append(\"[SEP]\")\n        segment_ids.append(1)\n\n        # Creates the predictions for the masked LM objective\n        (tokens, masked_lm_positions,\n         masked_lm_labels) = create_masked_lm_predictions(\n             tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)\n        instance = TrainingInstance( # create the training instance object\n            tokens=tokens,\n            segment_ids=segment_ids,\n            is_random_next=is_random_next,\n      
      masked_lm_positions=masked_lm_positions,\n            masked_lm_labels=masked_lm_labels)\n        instances.append(instance)\n      current_chunk = [] # clear the current chunk\n      current_length = 0 # reset the length of the current chunk\n    i += 1 # move on to the next sentence in the document\n\n  return instances\n\n\ndef create_instances_from_document_original( # THIS IS ORIGINAL BERT STYLE FOR CREATE DATA OF MLM AND NEXT SENTENCE PREDICTION TASK\n    all_documents, document_index, max_seq_length, short_seq_prob,\n    masked_lm_prob, max_predictions_per_seq, vocab_words, rng):\n  \"\"\"Creates `TrainingInstance`s for a single document.\"\"\"\n  document = all_documents[document_index] # fetch one document\n\n  # Account for [CLS], [SEP], [SEP]\n  max_num_tokens = max_seq_length - 3\n\n  # We *usually* want to fill up the entire sequence since we are padding\n  # to `max_seq_length` anyways, so short sequences are generally wasted\n  # computation. However, we *sometimes*\n  # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter\n  # sequences to minimize the mismatch between pre-training and fine-tuning.\n  # The `target_seq_length` is just a rough target however, whereas\n  # `max_seq_length` is a hard limit.\n  target_seq_length = max_num_tokens\n  if rng.random() < short_seq_prob: # with some probability (e.g. 10%) use a shorter sequence length, to reduce the mismatch between long pre-training sequences and (possibly) short fine-tuning sequences\n    target_seq_length = rng.randint(2, max_num_tokens)\n\n  # We DON'T just concatenate all of the tokens from a document into a long\n  # sequence and choose an arbitrary split point because this would make the\n  # next sentence prediction task too easy. Instead, we split the input into\n  # segments \"A\" and \"B\" based on the actual \"sentences\" provided by the user\n  # input.\n  # Use actual sentences rather than arbitrarily truncated spans, so that the sentence-coherence prediction task is better constructed\n  instances = []\n  current_chunk = [] # the text chunk currently being processed; holds multiple sentences\n  current_length = 0\n  i = 0\n  # print(\"###document:\",document) # a document can be a whole article, a news item, an encyclopedia entry, etc. 
document:[['是', '爷', '们', '，', '就', '得', '给', '媳', '妇', '幸', '福'], ['关', '注', '【', '晨', '曦', '教', '育', '】', '，', '获', '取', '育', '儿', '的', '智', '慧', '，', '与', '孩', '子', '一', '同', '成', '长', '！'], ['方', '法', ':', '打', '开', '微', '信', '→', '添', '加', '朋', '友', '→', '搜', '号', '→', '##he', '##bc', '##x', '##jy', '##→', '关', '注', '!', '我', '是', '一', '个', '爷', '们', '，', '孝', '顺', '是', '做', '人', '的', '第', '一', '准', '则', '。'], ['甭', '管', '小', '时', '候', '怎', '么', '跟', '家', '长', '犯', '混', '蛋', '，', '长', '大', '了', '，', '就', '底', '报', '答', '父', '母', '，', '以', '后', '我', '媳', '妇', '也', '必', '须', '孝', '顺', '。'], ['我', '是', '一', '个', '爷', '们', '，', '可', '以', '花', '心', '，', '可', '以', '好', '玩', '。'], ['但', '我', '一', '定', '会', '找', '一', '个', '管', '的', '住', '我', '的', '女', '人', '，', '和', '我', '一', '起', '生', '活', '。'], ['28', '岁', '以', '前', '在', '怎', '么', '玩', '都', '行', '，', '但', '我', '最', '后', '一', '定', '会', '找', '一', '个', '勤', '俭', '持', '家', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '我', '不', '会', '让', '自', '己', '的', '女', '人', '受', '一', '点', '委', '屈', '，', '每', '次', '把', '她', '抱', '在', '怀', '里', '，', '看', '她', '洋', '溢', '着', '幸', '福', '的', '脸', '，', '我', '都', '会', '引', '以', '为', '傲', '，', '这', '特', '么', '就', '是', '我', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '干', '什', '么', '也', '不', '能', '忘', '了', '自', '己', '媳', '妇', '，', '就', '算', '和', '哥', '们', '一', '起', '喝', '酒', '，', '喝', '到', '很', '晚', '，', '也', '要', '提', '前', '打', '电', '话', '告', '诉', '她', '，', '让', '她', '早', '点', '休', '息', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '绝', '对', '不', '能', '抽', '烟', '，', '喝', '酒', '还', '勉', '强', '过', '得', '去', '，', '不', '过', '该', '喝', '的', '时', '候', '喝', '，', '不', '该', '喝', '的', '时', '候', '，', '少', '扯', '纳', '极', '薄', '蛋', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '必', '须', '听', '我', '话', '，', '在', '人', '前', '一', '定', '要', '给', '我', '面', '子', '，', '回', '家', '了', '咱', '什', '么', '都', '好', '说', '。'], ['我', '是', '一', '爷', '们', '，', '就', '算', '难', '的', '吃', '不', '上', 
'饭', '了', '，', '都', '不', '张', '口', '跟', '媳', '妇', '要', '一', '分', '钱', '。'], ['我', '是', '一', '爷', '们', '，', '不', '管', '上', '学', '还', '是', '上', '班', '，', '我', '都', '会', '送', '媳', '妇', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '交', '往', '不', '到', '1', '年', '，', '绝', '对', '不', '会', '和', '媳', '妇', '提', '过', '分', '的', '要', '求', '，', '我', '会', '尊', '重', '她', '。'], ['我', '是', '一', '爷', '们', '，', '游', '戏', '永', '远', '比', '不', '上', '我', '媳', '妇', '重', '要', '，', '只', '要', '媳', '妇', '发', '话', '，', '我', '绝', '对', '唯', '命', '是', '从', '。'], ['我', '是', '一', '爷', '们', '，', '上', 'q', '绝', '对', '是', '为', '了', '等', '媳', '妇', '，', '所', '有', '暧', '昧', '的', '心', '情', '只', '为', '她', '一', '个', '女', '人', '而', '写', '，', '我', '不', '一', '定', '会', '经', '常', '写', '日', '志', '，', '可', '是', '我', '会', '告', '诉', '全', '世', '界', '，', '我', '很', '爱', '她', '。'], ['我', '是', '一', '爷', '们', '，', '不', '一', '定', '要', '经', '常', '制', '造', '浪', '漫', '、', '偶', '尔', '过', '个', '节', '日', '也', '要', '送', '束', '玫', '瑰', '花', '给', '媳', '妇', '抱', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '手', '机', '会', '24', '小', '时', '为', '她', '开', '机', '，', '让', '她', '半', '夜', '痛', '经', '的', '时', '候', '，', '做', '恶', '梦', '的', '时', '候', '，', '随', '时', '可', '以', '联', '系', '到', '我', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '经', '常', '带', '媳', '妇', '出', '去', '玩', '，', '她', '不', '一', '定', '要', '和', '我', '所', '有', '的', '哥', '们', '都', '认', '识', '，', '但', '见', '面', '能', '说', '的', '上', '话', '就', '行', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '和', '媳', '妇', '的', '姐', '妹', '哥', '们', '搞', '好', '关', '系', '，', '让', '她', '们', '相', '信', '我', '一', '定', '可', '以', '给', '我', '媳', '妇', '幸', '福', '。'], ['我', '是', '一', '爷', '们', '，', '吵', '架', '后', '、', '也', '要', '主', '动', '打', '电', '话', '关', '心', '她', '，', '咱', '是', '一', '爷', '们', '，', '给', '媳', '妇', '服', '个', '软', '，', '道', '个', '歉', '怎', '么', '了', '？'], ['我', '是', '一', '爷', '们', '，', '绝', '对', '不', '会', '嫌', '弃', '自', '己', '媳', '妇', '，', '拿', '她', '和', '别', '人', '比', '，', '说', '她', 
'这', '不', '如', '人', '家', '，', '纳', '不', '如', '人', '家', '的', '。'], ['我', '是', '一', '爷', '们', '，', '陪', '媳', '妇', '逛', '街', '时', '，', '碰', '见', '熟', '人', '，', '无', '论', '我', '媳', '妇', '长', '的', '好', '看', '与', '否', '，', '我', '都', '会', '大', '方', '的', '介', '绍', '。'], ['谁', '让', '咱', '爷', '们', '就', '好', '这', '口', '呢', '。'], ['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。'], ['【', '我', '们', '重', '在', '分', '享', '。'], ['所', '有', '文', '字', '和', '美', '图', '，', '来', '自', '网', '络', '，', '晨', '欣', '教', '育', '整', '理', '。'], ['对', '原', '文', '作', '者', '，', '表', '示', '敬', '意', '。'], ['】', '关', '注', '晨', '曦', '教', '育', '[UNK]', '[UNK]', '晨', '曦', '教', '育', '（', '微', '信', '号', '：', 'he', '##bc', '##x', '##jy', '）', '。'], ['打', '开', '微', '信', '，', '扫', '描', '二', '维', '码', '，', '关', '注', '[UNK]', '晨', '曦', '教', '育', '[UNK]', '，', '获', '取', '更', '多', '育', '儿', '资', '源', '。'], ['点', '击', '下', '面', '订', '阅', '按', '钮', '订', '阅', '，', '会', '有', '更', '多', '惊', '喜', '哦', '！']]\n  while i < len(document): # walk the document from its first sentence, one sentence at a time\n    segment = document[i] # segment is a list holding one complete sentence split into characters, e.g. segment=['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。']\n    # print(\"###i:\",i,\";segment:\",segment)\n    current_chunk.append(segment) # add this sentence to the current chunk\n    current_length += len(segment) # running total length of the sentences seen so far\n    if i == len(document) - 1 or current_length >= target_seq_length: # the accumulated length reached the target, or we are at the end of the document ==> build the A and B parts of A[SEP]B\n      if current_chunk: # if the current chunk is not empty\n        # `a_end` is how many segments from `current_chunk` go into the `A`\n        # (first) sentence.\n        a_end = 1\n        if len(current_chunk) >= 2: # if the chunk holds at least two sentences, take part of it as the A part of A[SEP]B\n          a_end = rng.randint(1, len(current_chunk) - 1)\n        # the first part selected from the current chunk becomes A, i.e. tokens_a\n        tokens_a = []\n        for j in range(a_end):\n          tokens_a.extend(current_chunk[j])\n\n        # 
build the B part of A[SEP]B (in original BERT, B is sometimes drawn at random from another document and sometimes is the remainder of the current chunk)\n        tokens_b = []\n        # Random next\n        is_random_next = False\n        if len(current_chunk) == 1 or rng.random() < 0.5: # with 50% probability, pick another document at random and use its tail, from a random start position, as B, i.e. tokens_b\n          is_random_next = True\n          target_b_length = target_seq_length - len(tokens_a)\n\n          # This should rarely go for more than one iteration for large\n          # corpora. However, just to be careful, we try to make sure that\n          # the random document is not the same as the document\n          # we're processing.\n          random_document_index=0\n          for _ in range(10): # randomly pick the index of a document different from the current one\n            random_document_index = rng.randint(0, len(all_documents) - 1)\n            if random_document_index != document_index:\n              break\n\n          random_document = all_documents[random_document_index] # fetch that document\n          random_start = rng.randint(0, len(random_document) - 1) # pick a random starting sentence within that document\n          for j in range(random_start, len(random_document)): # from that start position to the end of the document, fill the B part of A[SEP]B, i.e. tokens_b\n            tokens_b.extend(random_document[j])\n            if len(tokens_b) >= target_b_length:\n              break\n          # We didn't actually use these segments so we \"put them back\" so\n          # they don't go to waste. This is a small trick to avoid wasting text.\n          num_unused_segments = len(current_chunk) - a_end # e.g. 550-200=350\n          i -= num_unused_segments # i=i-num_unused_segments, e.g. 
i=400, num_unused_segments=350, then i=i-num_unused_segments=400-350=50\n        # Actual next\n        else: # the other 50% of the time, fill tokens_b (part B of 'A[SEP]B') from the remaining sentences of the current chunk (whose length is roughly max_sequence_length)\n          is_random_next = False\n          for j in range(a_end, len(current_chunk)):\n            tokens_b.extend(current_chunk[j])\n        truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng)\n\n        assert len(tokens_a) >= 1\n        assert len(tokens_b) >= 1\n\n        # Join tokens_a and tokens_b BERT-style as [CLS]tokens_a[SEP]tokens_b[SEP] to form the final tokens; also build segment_ids, which are 0 for the first part and 1 for the second.\n        tokens = []\n        segment_ids = []\n        tokens.append(\"[CLS]\")\n        segment_ids.append(0)\n        for token in tokens_a:\n          tokens.append(token)\n          segment_ids.append(0)\n\n        tokens.append(\"[SEP]\")\n        segment_ids.append(0)\n\n        for token in tokens_b:\n          tokens.append(token)\n          segment_ids.append(1)\n        tokens.append(\"[SEP]\")\n        segment_ids.append(1)\n\n        # Create the predictions for the masked LM objective\n        (tokens, masked_lm_positions,\n         masked_lm_labels) = create_masked_lm_predictions(\n             tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)\n        instance = TrainingInstance( # build the training instance object\n            tokens=tokens,\n            segment_ids=segment_ids,\n            is_random_next=is_random_next,\n            masked_lm_positions=masked_lm_positions,\n            masked_lm_labels=masked_lm_labels)\n        instances.append(instance)\n      current_chunk = [] # clear the current chunk\n      current_length = 0 # reset the current chunk length\n    i += 1 # move on to the next sentence in the document\n\n  return instances\n\n\nMaskedLmInstance = collections.namedtuple(\"MaskedLmInstance\",\n                                          [\"index\", \"label\"])\n\n\ndef create_masked_lm_predictions(tokens, masked_lm_prob,\n                                 max_predictions_per_seq, vocab_words, 
rng):\n  \"\"\"Creates the predictions for the masked LM objective.\"\"\"\n\n  cand_indexes = []\n  for (i, token) in enumerate(tokens):\n    if token == \"[CLS]\" or token == \"[SEP]\":\n      continue\n    # Whole Word Masking means that if we mask all of the wordpieces\n    # corresponding to an original word. When a word has been split into\n    # WordPieces, the first token does not have any marker and any subsequence\n    # tokens are prefixed with ##. So whenever we see the ## token, we\n    # append it to the previous set of word indexes.\n    #\n    # Note that Whole Word Masking does *not* change the training code\n    # at all -- we still predict each WordPiece independently, softmaxed\n    # over the entire vocabulary.\n    if (FLAGS.do_whole_word_mask and len(cand_indexes) >= 1 and\n            token.startswith(\"##\")):\n      cand_indexes[-1].append(i)\n    else:\n      cand_indexes.append([i])\n\n  rng.shuffle(cand_indexes)\n\n  if FLAGS.non_chinese==False: # if non chinese is False, that means it is chinese, then try to remove \"##\" which is added previously\n    output_tokens = [t[2:] if len(re.findall('##[\\u4E00-\\u9FA5]', t)) > 0 else t for t in tokens]  # 去掉\"##\"\n  else: # english and other language, which is not chinese\n    output_tokens = list(tokens)\n\n  num_to_predict = min(max_predictions_per_seq,\n                       max(1, int(round(len(tokens) * masked_lm_prob))))\n\n  masked_lms = []\n  covered_indexes = set()\n  for index_set in cand_indexes:\n    if len(masked_lms) >= num_to_predict:\n      break\n    # If adding a whole-word mask would exceed the maximum number of\n    # predictions, then just skip this candidate.\n    if len(masked_lms) + len(index_set) > num_to_predict:\n      continue\n    is_any_index_covered = False\n    for index in index_set:\n      if index in covered_indexes:\n        is_any_index_covered = True\n        break\n    if is_any_index_covered:\n      continue\n    for index in index_set:\n      
covered_indexes.add(index)\n\n      masked_token = None\n      # 80% of the time, replace with [MASK]\n      if rng.random() < 0.8:\n        masked_token = \"[MASK]\"\n      else:\n        # 10% of the time, keep original\n        if rng.random() < 0.5:\n          if FLAGS.non_chinese == False: # if non chinese is False, that means it is chinese, then try to remove \"##\" which is added previously\n            masked_token = tokens[index][2:] if len(re.findall('##[\\u4E00-\\u9FA5]', tokens[index])) > 0 else tokens[index]  # 去掉\"##\"\n          else:\n            masked_token = tokens[index]\n        # 10% of the time, replace with random word\n        else:\n          masked_token = vocab_words[rng.randint(0, len(vocab_words) - 1)]\n\n      output_tokens[index] = masked_token\n\n      masked_lms.append(MaskedLmInstance(index=index, label=tokens[index]))\n  assert len(masked_lms) <= num_to_predict\n  masked_lms = sorted(masked_lms, key=lambda x: x.index)\n\n  masked_lm_positions = []\n  masked_lm_labels = []\n  for p in masked_lms:\n    masked_lm_positions.append(p.index)\n    masked_lm_labels.append(p.label)\n\n  # tf.logging.info('%s' % (tokens))\n  # tf.logging.info('%s' % (output_tokens))\n  return (output_tokens, masked_lm_positions, masked_lm_labels)\n\ndef create_masked_lm_predictions_original(tokens, masked_lm_prob,\n                                 max_predictions_per_seq, vocab_words, rng):\n  \"\"\"Creates the predictions for the masked LM objective.\"\"\"\n\n  cand_indexes = []\n  for (i, token) in enumerate(tokens):\n    if token == \"[CLS]\" or token == \"[SEP]\":\n      continue\n    # Whole Word Masking means that if we mask all of the wordpieces\n    # corresponding to an original word. When a word has been split into\n    # WordPieces, the first token does not have any marker and any subsequence\n    # tokens are prefixed with ##. 
So whenever we see the ## token, we\n    # append it to the previous set of word indexes.\n    #\n    # Note that Whole Word Masking does *not* change the training code\n    # at all -- we still predict each WordPiece independently, softmaxed\n    # over the entire vocabulary.\n    if (FLAGS.do_whole_word_mask and len(cand_indexes) >= 1 and\n        token.startswith(\"##\")):\n      cand_indexes[-1].append(i)\n    else:\n      cand_indexes.append([i])\n\n  rng.shuffle(cand_indexes)\n\n  output_tokens = list(tokens)\n\n  num_to_predict = min(max_predictions_per_seq,\n                       max(1, int(round(len(tokens) * masked_lm_prob))))\n\n  masked_lms = []\n  covered_indexes = set()\n  for index_set in cand_indexes:\n    if len(masked_lms) >= num_to_predict:\n      break\n    # If adding a whole-word mask would exceed the maximum number of\n    # predictions, then just skip this candidate.\n    if len(masked_lms) + len(index_set) > num_to_predict:\n      continue\n    is_any_index_covered = False\n    for index in index_set:\n      if index in covered_indexes:\n        is_any_index_covered = True\n        break\n    if is_any_index_covered:\n      continue\n    for index in index_set:\n      covered_indexes.add(index)\n\n      masked_token = None\n      # 80% of the time, replace with [MASK]\n      if rng.random() < 0.8:\n        masked_token = \"[MASK]\"\n      else:\n        # 10% of the time, keep original\n        if rng.random() < 0.5:\n          masked_token = tokens[index]\n        # 10% of the time, replace with random word\n        else:\n          masked_token = vocab_words[rng.randint(0, len(vocab_words) - 1)]\n\n      output_tokens[index] = masked_token\n\n      masked_lms.append(MaskedLmInstance(index=index, label=tokens[index]))\n  assert len(masked_lms) <= num_to_predict\n  masked_lms = sorted(masked_lms, key=lambda x: x.index)\n\n  masked_lm_positions = []\n  masked_lm_labels = []\n  for p in masked_lms:\n    masked_lm_positions.append(p.index)\n  
  masked_lm_labels.append(p.label)\n\n  return (output_tokens, masked_lm_positions, masked_lm_labels)\n\n\ndef truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng):\n  \"\"\"Truncates a pair of sequences to a maximum sequence length.\"\"\"\n  while True:\n    total_length = len(tokens_a) + len(tokens_b)\n    if total_length <= max_num_tokens:\n      break\n\n    trunc_tokens = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b\n    assert len(trunc_tokens) >= 1\n\n    # We want to sometimes truncate from the front and sometimes from the\n    # back to add more randomness and avoid biases.\n    if rng.random() < 0.5:\n      del trunc_tokens[0]\n    else:\n      trunc_tokens.pop()\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  tokenizer = tokenization.FullTokenizer(\n      vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)\n\n  input_files = []\n  for input_pattern in FLAGS.input_file.split(\",\"):\n    input_files.extend(tf.gfile.Glob(input_pattern))\n\n  tf.logging.info(\"*** Reading from input files ***\")\n  for input_file in input_files:\n    tf.logging.info(\"  %s\", input_file)\n\n  rng = random.Random(FLAGS.random_seed)\n  instances = create_training_instances(\n      input_files, tokenizer, FLAGS.max_seq_length, FLAGS.dupe_factor,\n      FLAGS.short_seq_prob, FLAGS.masked_lm_prob, FLAGS.max_predictions_per_seq,\n      rng)\n\n  output_files = FLAGS.output_file.split(\",\")\n  tf.logging.info(\"*** Writing to output files ***\")\n  for output_file in output_files:\n    tf.logging.info(\"  %s\", output_file)\n\n  write_instance_to_example_files(instances, tokenizer, FLAGS.max_seq_length,\n                                  FLAGS.max_predictions_per_seq, output_files)\n\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"input_file\")\n  flags.mark_flag_as_required(\"output_file\")\n  flags.mark_flag_as_required(\"vocab_file\")\n  tf.app.run()"
  },
  {
    "path": "create_pretraining_data_google.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n# coding=utf-8\n\"\"\"Create masked LM/next sentence masked_lm TF examples for ALBERT.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport random\n\nimport numpy as np\nimport six\nfrom six.moves import range\nfrom six.moves import zip\nimport tensorflow as tf\n\nfrom albert import tokenization\n\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\nflags.DEFINE_string(\"input_file\", None,\n                    \"Input raw text file (or comma-separated list of files).\")\n\nflags.DEFINE_string(\n    \"output_file\", None,\n    \"Output TF example file (or comma-separated list of files).\")\n\nflags.DEFINE_string(\n    \"vocab_file\", None,\n    \"The vocabulary file that the ALBERT model was trained on.\")\n\nflags.DEFINE_string(\"spm_model_file\", None,\n                    \"The model file for sentence piece tokenization.\")\n\nflags.DEFINE_bool(\n    \"do_lower_case\", True,\n    \"Whether to lower case the input text. 
Should be True for uncased \"\n    \"models and False for cased models.\")\n\nflags.DEFINE_bool(\n    \"do_whole_word_mask\", True,\n    \"Whether to use whole word masking rather than per-WordPiece masking.\")\n\nflags.DEFINE_bool(\n    \"do_permutation\", False,\n    \"Whether to do the permutation training.\")\n\nflags.DEFINE_bool(\n    \"favor_shorter_ngram\", False,\n    \"Whether to set higher probabilities for sampling shorter ngrams.\")\n\nflags.DEFINE_bool(\n    \"random_next_sentence\", False,\n    \"Whether to use sentences from other random documents as the negative \"\n    \"sample for next sentence prediction, rather than swapping the order of \"\n    \"the two segments taken from the current document.\")\n\nflags.DEFINE_integer(\"max_seq_length\", 512, \"Maximum sequence length.\")\n\nflags.DEFINE_integer(\"ngram\", 3, \"Maximum number of ngrams to mask.\")\n\nflags.DEFINE_integer(\"max_predictions_per_seq\", 20,\n                     \"Maximum number of masked LM predictions per sequence.\")\n\nflags.DEFINE_integer(\"random_seed\", 12345, \"Random seed for data generation.\")\n\nflags.DEFINE_integer(\n    \"dupe_factor\", 10,\n    \"Number of times to duplicate the input data (with different masks).\")\n\nflags.DEFINE_float(\"masked_lm_prob\", 0.15, \"Masked LM probability.\")\n\nflags.DEFINE_float(\n    \"short_seq_prob\", 0.1,\n    \"Probability of creating sequences which are shorter than the \"\n    \"maximum length.\")\n\n\nclass TrainingInstance(object):\n  \"\"\"A single training instance (sentence pair).\"\"\"\n\n  def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm_labels,\n               is_random_next, token_boundary):\n    self.tokens = tokens\n    self.segment_ids = segment_ids\n    self.is_random_next = is_random_next\n    self.token_boundary = token_boundary\n    self.masked_lm_positions = masked_lm_positions\n    self.masked_lm_labels = masked_lm_labels\n\n  def __str__(self):\n    s = \"\"\n    s += \"tokens: %s\\n\" % (\" 
\".join(\n        [tokenization.printable_text(x) for x in self.tokens]))\n    s += \"segment_ids: %s\\n\" % (\" \".join([str(x) for x in self.segment_ids]))\n    s += \"token_boundary: %s\\n\" % (\" \".join(\n        [str(x) for x in self.token_boundary]))\n    s += \"is_random_next: %s\\n\" % self.is_random_next\n    s += \"masked_lm_positions: %s\\n\" % (\" \".join(\n        [str(x) for x in self.masked_lm_positions]))\n    s += \"masked_lm_labels: %s\\n\" % (\" \".join(\n        [tokenization.printable_text(x) for x in self.masked_lm_labels]))\n    s += \"\\n\"\n    return s\n\n  def __repr__(self):\n    return self.__str__()\n\n\ndef write_instance_to_example_files(instances, tokenizer, max_seq_length,\n                                    max_predictions_per_seq, output_files):\n  \"\"\"Create TF example files from `TrainingInstance`s.\"\"\"\n  writers = []\n  for output_file in output_files:\n    writers.append(tf.python_io.TFRecordWriter(output_file))\n\n  writer_index = 0\n\n  total_written = 0\n  for (inst_index, instance) in enumerate(instances):\n    input_ids = tokenizer.convert_tokens_to_ids(instance.tokens)\n    input_mask = [1] * len(input_ids)\n    segment_ids = list(instance.segment_ids)\n    token_boundary = list(instance.token_boundary)\n    assert len(input_ids) <= max_seq_length\n\n    while len(input_ids) < max_seq_length:\n      input_ids.append(0)\n      input_mask.append(0)\n      segment_ids.append(0)\n      token_boundary.append(0)\n\n    assert len(input_ids) == max_seq_length\n    assert len(input_mask) == max_seq_length\n    assert len(segment_ids) == max_seq_length\n\n    masked_lm_positions = list(instance.masked_lm_positions)\n    masked_lm_ids = tokenizer.convert_tokens_to_ids(instance.masked_lm_labels)\n    masked_lm_weights = [1.0] * len(masked_lm_ids)\n\n    multiplier = 1 + int(FLAGS.do_permutation)\n    while len(masked_lm_positions) < max_predictions_per_seq * multiplier:\n      masked_lm_positions.append(0)\n      
masked_lm_ids.append(0)\n      masked_lm_weights.append(0.0)\n\n    sentence_order_label = 1 if instance.is_random_next else 0\n\n    features = collections.OrderedDict()\n    features[\"input_ids\"] = create_int_feature(input_ids)\n    features[\"input_mask\"] = create_int_feature(input_mask)\n    features[\"segment_ids\"] = create_int_feature(segment_ids)\n    features[\"token_boundary\"] = create_int_feature(token_boundary)\n    features[\"masked_lm_positions\"] = create_int_feature(masked_lm_positions)\n    features[\"masked_lm_ids\"] = create_int_feature(masked_lm_ids)\n    features[\"masked_lm_weights\"] = create_float_feature(masked_lm_weights)\n    # Note: We keep this feature name `next_sentence_labels` to be compatible\n    # with the original data created by lanzhzh@. However, in the ALBERT case\n    # it does contain sentence_order_label.\n    features[\"next_sentence_labels\"] = create_int_feature(\n        [sentence_order_label])\n\n    tf_example = tf.train.Example(features=tf.train.Features(feature=features))\n\n    writers[writer_index].write(tf_example.SerializeToString())\n    writer_index = (writer_index + 1) % len(writers)\n\n    total_written += 1\n\n    if inst_index < 6:\n      tf.logging.info(\"*** Example ***\")\n      tf.logging.info(\"tokens: %s\" % \" \".join(\n          [tokenization.printable_text(x) for x in instance.tokens]))\n\n      for feature_name in features.keys():\n        feature = features[feature_name]\n        values = []\n        if feature.int64_list.value:\n          values = feature.int64_list.value\n        elif feature.float_list.value:\n          values = feature.float_list.value\n        tf.logging.info(\n            \"%s: %s\" % (feature_name, \" \".join([str(x) for x in values])))\n\n  for writer in writers:\n    writer.close()\n\n  tf.logging.info(\"Wrote %d total instances\", total_written)\n\n\ndef create_int_feature(values):\n  feature = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))\n  
return feature\n\n\ndef create_float_feature(values):\n  feature = tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))\n  return feature\n\n\ndef create_training_instances(input_files, tokenizer, max_seq_length,\n                              dupe_factor, short_seq_prob, masked_lm_prob,\n                              max_predictions_per_seq, rng):\n  \"\"\"Create `TrainingInstance`s from raw text.\"\"\"\n  all_documents = [[]]\n\n  # Input file format:\n  # (1) One sentence per line. These should ideally be actual sentences, not\n  # entire paragraphs or arbitrary spans of text. (Because we use the\n  # sentence boundaries for the \"next sentence prediction\" task).\n  # (2) Blank lines between documents. Document boundaries are needed so\n  # that the \"next sentence prediction\" task doesn't span between documents.\n  for input_file in input_files:\n    with tf.gfile.GFile(input_file, \"r\") as reader:\n      while True:\n        line = reader.readline()\n        if not FLAGS.spm_model_file:\n          line = tokenization.convert_to_unicode(line)\n        if not line:\n          break\n        if FLAGS.spm_model_file:\n          line = tokenization.preprocess_text(line, lower=FLAGS.do_lower_case)\n        else:\n          line = line.strip()\n\n        # Empty lines are used as document delimiters\n        if not line:\n          all_documents.append([])\n        tokens = tokenizer.tokenize(line)\n        if tokens:\n          all_documents[-1].append(tokens)\n\n  # Remove empty documents\n  all_documents = [x for x in all_documents if x]\n  rng.shuffle(all_documents)\n\n  vocab_words = list(tokenizer.vocab.keys())\n  instances = []\n  for _ in range(dupe_factor):\n    for document_index in range(len(all_documents)):\n      instances.extend(\n          create_instances_from_document(\n              all_documents, document_index, max_seq_length, short_seq_prob,\n              masked_lm_prob, max_predictions_per_seq, vocab_words, rng))\n\n  
rng.shuffle(instances)\n  return instances\n\n\ndef create_instances_from_document(\n    all_documents, document_index, max_seq_length, short_seq_prob,\n    masked_lm_prob, max_predictions_per_seq, vocab_words, rng):\n  \"\"\"Creates `TrainingInstance`s for a single document.\"\"\"\n  document = all_documents[document_index]\n\n  # Account for [CLS], [SEP], [SEP]\n  max_num_tokens = max_seq_length - 3\n\n  # We *usually* want to fill up the entire sequence since we are padding\n  # to `max_seq_length` anyways, so short sequences are generally wasted\n  # computation. However, we *sometimes*\n  # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter\n  # sequences to minimize the mismatch between pre-training and fine-tuning.\n  # The `target_seq_length` is just a rough target however, whereas\n  # `max_seq_length` is a hard limit.\n  target_seq_length = max_num_tokens\n  if rng.random() < short_seq_prob:\n    target_seq_length = rng.randint(2, max_num_tokens)\n\n  # We DON'T just concatenate all of the tokens from a document into a long\n  # sequence and choose an arbitrary split point because this would make the\n  # next sentence prediction task too easy. 
Instead, we split the input into\n  # segments \"A\" and \"B\" based on the actual \"sentences\" provided by the user\n  # input.\n  instances = []\n  current_chunk = []\n  current_length = 0\n  i = 0\n  while i < len(document):\n    segment = document[i]\n    current_chunk.append(segment)\n    current_length += len(segment)\n    if i == len(document) - 1 or current_length >= target_seq_length:\n      if current_chunk:\n        # `a_end` is how many segments from `current_chunk` go into the `A`\n        # (first) sentence.\n        a_end = 1\n        if len(current_chunk) >= 2:\n          a_end = rng.randint(1, len(current_chunk) - 1)\n\n        tokens_a = []\n        for j in range(a_end):\n          tokens_a.extend(current_chunk[j])\n\n        tokens_b = []\n        # Random next\n        is_random_next = False\n        if len(current_chunk) == 1 or \\\n            (FLAGS.random_next_sentence and rng.random() < 0.5):\n          is_random_next = True\n          target_b_length = target_seq_length - len(tokens_a)\n\n          # This should rarely go for more than one iteration for large\n          # corpora. 
However, just to be careful, we try to make sure that\n          # the random document is not the same as the document\n          # we're processing.\n          for _ in range(10):\n            random_document_index = rng.randint(0, len(all_documents) - 1)\n            if random_document_index != document_index:\n              break\n\n          random_document = all_documents[random_document_index]\n          random_start = rng.randint(0, len(random_document) - 1)\n          for j in range(random_start, len(random_document)):\n            tokens_b.extend(random_document[j])\n            if len(tokens_b) >= target_b_length:\n              break\n          # We didn't actually use these segments so we \"put them back\" so\n          # they don't go to waste.\n          num_unused_segments = len(current_chunk) - a_end\n          i -= num_unused_segments\n        elif not FLAGS.random_next_sentence and rng.random() < 0.5:\n          is_random_next = True\n          for j in range(a_end, len(current_chunk)):\n            tokens_b.extend(current_chunk[j])\n          # Note(mingdachen): in this case, we just swap tokens_a and tokens_b\n          tokens_a, tokens_b = tokens_b, tokens_a\n        # Actual next\n        else:\n          is_random_next = False\n          for j in range(a_end, len(current_chunk)):\n            tokens_b.extend(current_chunk[j])\n        truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng)\n\n        assert len(tokens_a) >= 1\n        assert len(tokens_b) >= 1\n\n        tokens = []\n        segment_ids = []\n        tokens.append(\"[CLS]\")\n        segment_ids.append(0)\n        for token in tokens_a:\n          tokens.append(token)\n          segment_ids.append(0)\n\n        tokens.append(\"[SEP]\")\n        segment_ids.append(0)\n\n        for token in tokens_b:\n          tokens.append(token)\n          segment_ids.append(1)\n        tokens.append(\"[SEP]\")\n        segment_ids.append(1)\n\n        (tokens, masked_lm_positions,\n     
    masked_lm_labels, token_boundary) = create_masked_lm_predictions(\n             tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)\n        instance = TrainingInstance(\n            tokens=tokens,\n            segment_ids=segment_ids,\n            is_random_next=is_random_next,\n            token_boundary=token_boundary,\n            masked_lm_positions=masked_lm_positions,\n            masked_lm_labels=masked_lm_labels)\n        instances.append(instance)\n      current_chunk = []\n      current_length = 0\n    i += 1\n\n  return instances\n\n\nMaskedLmInstance = collections.namedtuple(\"MaskedLmInstance\",\n                                          [\"index\", \"label\"])\n\n\ndef _is_start_piece_sp(piece):\n  \"\"\"Check if the current word piece is the starting piece (sentence piece).\"\"\"\n  special_pieces = set(list('!\"#$%&\\\"()*+,-./:;?@[\\\\]^_`{|}~'))\n  special_pieces.add(u\"€\".encode(\"utf-8\"))\n  special_pieces.add(u\"£\".encode(\"utf-8\"))\n  # Note(mingdachen):\n  # For foreign characters, we always treat them as a whole piece.\n  english_chars = set(list(\"abcdefghijklmnopqrstuvwxyz\"))\n  if (six.ensure_str(piece).startswith(\"▁\") or\n      six.ensure_str(piece).startswith(\"<\") or piece in special_pieces or\n      not all([i.lower() in english_chars.union(special_pieces)\n               for i in piece])):\n    return True\n  else:\n    return False\n\n\ndef _is_start_piece_bert(piece):\n  \"\"\"Check if the current word piece is the starting piece (BERT).\"\"\"\n  # When a word has been split into\n  # WordPieces, the first token does not have any marker and any subsequent\n  # tokens are prefixed with ##. 
So whenever we see the ## token, we\n  # append it to the previous set of word indexes.\n  return not six.ensure_str(piece).startswith(\"##\")\n\n\ndef is_start_piece(piece):\n  if FLAGS.spm_model_file:\n    return _is_start_piece_sp(piece)\n  else:\n    return _is_start_piece_bert(piece)\n\n\ndef create_masked_lm_predictions(tokens, masked_lm_prob,\n                                 max_predictions_per_seq, vocab_words, rng):\n  \"\"\"Creates the predictions for the masked LM objective.\"\"\"\n\n  cand_indexes = []\n  # Note(mingdachen): We create a list for recording if the piece is\n  # the starting piece of current token, where 1 means true, so that\n  # on-the-fly whole word masking is possible.\n  token_boundary = [0] * len(tokens)\n\n  for (i, token) in enumerate(tokens):\n    if token == \"[CLS]\" or token == \"[SEP]\":\n      token_boundary[i] = 1\n      continue\n    # Whole Word Masking means that we mask all of the wordpieces\n    # corresponding to an original word.\n    #\n    # Note that Whole Word Masking does *not* change the training code\n    # at all -- we still predict each WordPiece independently, softmaxed\n    # over the entire vocabulary.\n    if (FLAGS.do_whole_word_mask and len(cand_indexes) >= 1 and\n        not is_start_piece(token)):\n      cand_indexes[-1].append(i)\n    else:\n      cand_indexes.append([i])\n      if is_start_piece(token):\n        token_boundary[i] = 1\n\n  output_tokens = list(tokens)\n\n  masked_lm_positions = []\n  masked_lm_labels = []\n\n  if masked_lm_prob == 0:\n    return (output_tokens, masked_lm_positions,\n            masked_lm_labels, token_boundary)\n\n  num_to_predict = min(max_predictions_per_seq,\n                       max(1, int(round(len(tokens) * masked_lm_prob))))\n\n  # Note(mingdachen):\n  # pvals is proportional to 1/n, so by default shorter ngrams get higher\n  # sampling probabilities; FLAGS.favor_shorter_ngram reverses this order.\n  ngrams = np.arange(1, FLAGS.ngram + 1, dtype=np.int64)\n  pvals = 1. 
/ np.arange(1, FLAGS.ngram + 1)\n  pvals /= pvals.sum(keepdims=True)\n\n  if FLAGS.favor_shorter_ngram:\n    pvals = pvals[::-1]\n\n  ngram_indexes = []\n  for idx in range(len(cand_indexes)):\n    ngram_index = []\n    for n in ngrams:\n      ngram_index.append(cand_indexes[idx:idx+n])\n    ngram_indexes.append(ngram_index)\n\n  rng.shuffle(ngram_indexes)\n\n  masked_lms = []\n  covered_indexes = set()\n  for cand_index_set in ngram_indexes:\n    if len(masked_lms) >= num_to_predict:\n      break\n    if not cand_index_set:\n      continue\n    # Note(mingdachen):\n    # Skip current piece if they are covered in lm masking or previous ngrams.\n    for index_set in cand_index_set[0]:\n      for index in index_set:\n        if index in covered_indexes:\n          continue\n\n    n = np.random.choice(ngrams[:len(cand_index_set)],\n                         p=pvals[:len(cand_index_set)] /\n                         pvals[:len(cand_index_set)].sum(keepdims=True))\n    index_set = sum(cand_index_set[n - 1], [])\n    n -= 1\n    # Note(mingdachen):\n    # Repeatedly looking for a candidate that does not exceed the\n    # maximum number of predictions by trying shorter ngrams.\n    while len(masked_lms) + len(index_set) > num_to_predict:\n      if n == 0:\n        break\n      index_set = sum(cand_index_set[n - 1], [])\n      n -= 1\n    # If adding a whole-word mask would exceed the maximum number of\n    # predictions, then just skip this candidate.\n    if len(masked_lms) + len(index_set) > num_to_predict:\n      continue\n    is_any_index_covered = False\n    for index in index_set:\n      if index in covered_indexes:\n        is_any_index_covered = True\n        break\n    if is_any_index_covered:\n      continue\n    for index in index_set:\n      covered_indexes.add(index)\n\n      masked_token = None\n      # 80% of the time, replace with [MASK]\n      if rng.random() < 0.8:\n        masked_token = \"[MASK]\"\n      else:\n        # 10% of the time, keep original\n  
      if rng.random() < 0.5:\n          masked_token = tokens[index]\n        # 10% of the time, replace with random word\n        else:\n          masked_token = vocab_words[rng.randint(0, len(vocab_words) - 1)]\n\n      output_tokens[index] = masked_token\n\n      masked_lms.append(MaskedLmInstance(index=index, label=tokens[index]))\n  assert len(masked_lms) <= num_to_predict\n\n  rng.shuffle(ngram_indexes)\n\n  select_indexes = set()\n  if FLAGS.do_permutation:\n    for cand_index_set in ngram_indexes:\n      if len(select_indexes) >= num_to_predict:\n        break\n      if not cand_index_set:\n        continue\n      # Note(mingdachen):\n      # Skip current piece if they are covered in lm masking or previous ngrams.\n      for index_set in cand_index_set[0]:\n        for index in index_set:\n          if index in covered_indexes or index in select_indexes:\n            continue\n\n      n = np.random.choice(ngrams[:len(cand_index_set)],\n                           p=pvals[:len(cand_index_set)] /\n                           pvals[:len(cand_index_set)].sum(keepdims=True))\n      index_set = sum(cand_index_set[n - 1], [])\n      n -= 1\n\n      while len(select_indexes) + len(index_set) > num_to_predict:\n        if n == 0:\n          break\n        index_set = sum(cand_index_set[n - 1], [])\n        n -= 1\n      # If adding a whole-word mask would exceed the maximum number of\n      # predictions, then just skip this candidate.\n      if len(select_indexes) + len(index_set) > num_to_predict:\n        continue\n      is_any_index_covered = False\n      for index in index_set:\n        if index in covered_indexes or index in select_indexes:\n          is_any_index_covered = True\n          break\n      if is_any_index_covered:\n        continue\n      for index in index_set:\n        select_indexes.add(index)\n    assert len(select_indexes) <= num_to_predict\n\n    select_indexes = sorted(select_indexes)\n    permute_indexes = list(select_indexes)\n    
rng.shuffle(permute_indexes)\n    orig_token = list(output_tokens)\n\n    for src_i, tgt_i in zip(select_indexes, permute_indexes):\n      output_tokens[src_i] = orig_token[tgt_i]\n      masked_lms.append(MaskedLmInstance(index=src_i, label=orig_token[src_i]))\n\n  masked_lms = sorted(masked_lms, key=lambda x: x.index)\n\n  for p in masked_lms:\n    masked_lm_positions.append(p.index)\n    masked_lm_labels.append(p.label)\n  return (output_tokens, masked_lm_positions, masked_lm_labels, token_boundary)\n\n\ndef truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng):\n  \"\"\"Truncates a pair of sequences to a maximum sequence length.\"\"\"\n  while True:\n    total_length = len(tokens_a) + len(tokens_b)\n    if total_length <= max_num_tokens:\n      break\n\n    trunc_tokens = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b\n    assert len(trunc_tokens) >= 1\n\n    # We want to sometimes truncate from the front and sometimes from the\n    # back to add more randomness and avoid biases.\n    if rng.random() < 0.5:\n      del trunc_tokens[0]\n    else:\n      trunc_tokens.pop()\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  tokenizer = tokenization.FullTokenizer(\n      vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case,\n      spm_model_file=FLAGS.spm_model_file)\n\n  input_files = []\n  for input_pattern in FLAGS.input_file.split(\",\"):\n    input_files.extend(tf.gfile.Glob(input_pattern))\n\n  tf.logging.info(\"*** Reading from input files ***\")\n  for input_file in input_files:\n    tf.logging.info(\"  %s\", input_file)\n\n  rng = random.Random(FLAGS.random_seed)\n  instances = create_training_instances(\n      input_files, tokenizer, FLAGS.max_seq_length, FLAGS.dupe_factor,\n      FLAGS.short_seq_prob, FLAGS.masked_lm_prob, FLAGS.max_predictions_per_seq,\n      rng)\n\n  tf.logging.info(\"number of instances: %i\", len(instances))\n\n  output_files = FLAGS.output_file.split(\",\")\n  tf.logging.info(\"*** Writing to 
output files ***\")\n  for output_file in output_files:\n    tf.logging.info(\"  %s\", output_file)\n\n  write_instance_to_example_files(instances, tokenizer, FLAGS.max_seq_length,\n                                  FLAGS.max_predictions_per_seq, output_files)\n\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"input_file\")\n  flags.mark_flag_as_required(\"output_file\")\n  flags.mark_flag_as_required(\"vocab_file\")\n  tf.app.run()"
  },
  {
    "path": "data/news_zh_1.txt",
    "content": "最后的南京老城该往何处去 城市化时代呼唤文化自觉\n【概要】80后学者姚远出版《城市的自觉》一书 姚远出版《城市的自觉》 作者简介姚远，政治学博士，1981年出生于南京，1999年从金陵中学毕业后考入北京大学国际关系学院，负笈燕园十二载，获政治学博士学位。\n现任教于南京大学政府管理学院。\n在关系古都北京、南京等历史文化名城存废的历史关头，他锲而不舍地为抢救中华文明奔走呐喊。\n2010年，他被中国文物保护基金会评为“中国文化遗产保护年度十大杰出人物”，当时的获奖评语是：一支?土耳其诗人纳齐姆·希克梅特曾深情地说：“人的一生有两样东西不会忘记，那就是母亲的面孔和城市的面貌。\n”然而，前不久南京再次发生颜料坊地块市级文保单位两进建筑被毁的事件。\n故宫博物院院长、原国家文物局局长单霁翔近日在宁直言，南京城南再遭损毁令他心痛。\n南京老城“路在何方”？\n2010年被中国文物保护基金会评为“中国文化遗产保护年度十大杰出人物”的80后学者、南京大学姚远老师所著的《城市的自觉》近日正式出版。\n书中探索古城保护与复兴的建设性路径，值得南京的决策者们在颜料坊事件后再次深思。\n江南时报记者黄勇疑问:城市化，是否迷失了文化自觉“目睹一座座古建筑的消失，行走在古城的废墟，想到梁思成说过的‘拆掉北京的一座城楼，就像割掉我的一块肉；扒掉北京的一段城墙，就像扒掉我的一层皮’，真是感同身受，我流泪了。\n”这是姚远最让记者为之动容的一句话，也是《城市的自觉》一书中的“魂”。\n包括南京在内，中国大多数城市正处于大拆除的时代，成片的历史街区在“旧城改造”的大旗下被不断夷为平地。\n有专家称，这场“休克疗法式”的“改造”，对中华文脉的影响之深、之巨、之不可逆，堪称中国城市史上“三千年未有之大变局”。\n《城市的自觉》正是在这种背景下，由北京大学出版社于近日出版的。\n书中，姚远以情理交融的文字，辅之以背景、南京古城珍贵的最后影像，如实记录了在北京梁思成故居和宣南、东四八条、钟鼓楼等历史街区，南京颜料坊、南捕厅、门东、门西等历史街区的最后时刻，为阻挡推土机而屡败屡战的历程。\n同时，又理性剖析了与存续城市记忆密切相关的文化自觉、物权保护、民生改善、公众参与等议题，探索古城保护与复兴的建设性路径。\n为何要保老城？\n很多人认为陈旧的老街区、老房子应该为摩天大楼让位，造高速路、摩天楼是现代化，“保护老古董”是抱残守缺，姚远却不是这种看法：“一些决策者并不知城市遗产保护恰恰是‘后工业’、‘后现代’的思想，比前者的理念差不多领先了一个世纪。\n” 
在他眼里，南京这座千年古城曾是“活”着的，老城里有最纯正的方言、最鲜活的民俗、最地道的小吃，简直是一座巨大的民俗博物馆。\n“你可以在同老者的交谈中，听到一个个家族或老宅的兴衰故事。\n这里的城与人，就是一本厚重的大书，它们用最生动的语言向你讲述不一样的‘城南旧事’。\n”面对许多古城不断遭到大拆大建、拆真建假、拆旧建新的厄运，姚远痛心地说，“我们的城市化，是否迷失了自我认同，是否失去了文化自觉的能力？\n在城市化的文化自觉重建之前，我们还将继续付出多少代价？\n”现状:老城南仅剩不到1平方公里南京城曾有十九个别称，如秦淮、白下、建邺、江宁等，建城史更是长达两千五百年。\n但如今，除去明城墙以及一些重点文物以及七零八落的民国建筑之外，这个城市跟中国其他的城市看上去并无太多区别，鳞次栉比的高楼大厦，车水马龙的宽阔街道，川流不息的红男绿女……持续多年的旧城改造，已经让南京老城日益失去古朴的历史风貌。\n秦淮河畔的老城南，是南京文化的发源地，是南京的根。\n在2006年前，尽管南京诸多的“殿、庙、塔、桥”已在兵火和变乱中消失，但秦淮河畔的老城南依然保存了文物丰富、风貌完整的历史街区。\n然而，2006年，南京风云突起，突击对颜料坊、安品街等历史街区实施“危旧房改造”，拆毁大量文物建筑。\n2009年又是一轮“危改”，大大的“拆”字，再次涂上了门东、门西、南捕厅等多片老街区。\n2010年至今，南京先后出台了《南京市历史文化名城保护条例》《南京历史文化名城保护规划》《南京老城南历史城区保护规划与城市设计》，以法规的高度，回应了社会各界的诉求，明确要求对老城的整体保护。\n姚远和其他学者联名提出的建议，有40处被采纳进了最后的《条例》中。\n姚远告诉江南时报记者，南京的传统旧城区——老城南仅剩不到1平方公里，尚不及50平方公里老城总面积的2%，整体保护势在必行。\n但他并不认为整体保护意味着“冻结不动”，而是强调古民居、古街巷和宏伟的古建筑一样重要，它们是古都特有的城市肌理，低矮的民居衬托高大的城阙，形成轮廓丰富的城市格局。\n如果消灭了它们，名胜古迹就变成无法交融联络的“孤岛”，古都的整体风貌则无从谈起。\n“对于金陵古城濒危的最后这点种子，实行‘整体保护’已经没有任何讨价还价的余地。\n”《城市的自觉》一书中，姚远的声音振聋发聩。\n方案:探索保护与整治的最大合力可惜的是，在专家学者与推土机的拉锯战中，前者基本还是处于下风的，即便是中央领导的几次批示，旧城改造的推土机依然我行我素，将一面面古墙碾在轮下。\n颜料坊、牛市、门东等被“肢解”的老城南片区，如今多已竖起或正在建设房地产开发、商业项目。\n2002年8月，姚远在南京颜料坊开始了古城保护的第一次拍摄。\n如今牛市64号-颜料坊49号这座百年清代建筑却再遭破坏。\n单霁翔近日在南大演讲中也表示，颜料坊再遭损毁令人心痛。\n“我不认同南京老城南成片拆除，搬迁当地住户的改造方式。\n简单地认为它的居住形式落后了，这种态度是消极的，没有给予作为代表地域特色的传统建筑的居住形式有尊严的呵护。\n”《城市的自觉》一书中也多次提及南京老城不能“只见物，不见人”。\n姚远强调，南京历史文化名城的保护，离不开对传统社区的活态保护。\n老城南有丰富的民俗和古老的街区，是唇齿相依的一个整体。\n拆去了老宅，迁走了居民，文化自然就成了无源之水、无本之木。\n“国际上的成功经验表明，保护从来不是发展、民生、现代化的反义词。\n”姚远建议，老城区的整治，可以在政府的指导和协助下，以居民为主体，通过社区互助的“自我修缮”的方式来实施，将“旧城区改建”从拆迁模式下的行政关系转变为修缮模式下的民事关系，最大限度地调动各方面的积极性，形成保护与整治的最大合力。\n措施:用行动让法律“站起来”经历了两次保卫战，姚远对于文物保护方面的法律条文早已如数家珍。\n在他看来，“法治”和“参与”这两个关键词尤为重要。\n姚远认为，政府的很多失误是因为政策制定的封闭性，推土机开到门口时才告知公众。\n公民参与，就要求行政更加透明、公开。\n“几次保护后制定的政策或者法律法规，也很重要。\n因为未来只要有人参与去触动，政策或者法律法规就能‘站起来’，变成一套强有力的程序，约束政府行为。\n”“这些年古城保护的每一点进步，都离不开广泛的公众参与，都凝结着社会各界共同的努力。\n”姚远认为，在北京、南京等许多古城，一批志愿者、社会人士和民间团体，在古城命运的危急关头，已经显示出日益崛起的公众参与的巨大力量。\n“关键要有人能够站出来。\n第一个人站出来，就会有第二个人跟上，专家和媒体也会介入，事情就能在公开博弈中得到较为合理的解决。\n我国目前民间的文保力量正在逐渐成长，公民参与将成为构建良性社会机制的重要力量。\n”姚远强调。\n
单霁翔对文化遗产保护中的公众参与也做出了高度评价。\n他在《城市的自觉》的序中写道：“保护文化遗产绝不仅仅是各级政府和文物工作者的专利，只有广大民众真心地、持久地参与文化遗产保护，文化遗产才能得到最可靠的保障。\n以姚远博士为代表的一批志愿者和社会人士，在我国文化遗产保护事业中已经显示出不可低估、无可替代的力量。\n\n不是每一块石头，都能叫珠宝\n对于很多人来说，矿石是长成这样的石头： 上图：铁矿石 上图：石 上图：煤矿石 上图：锡矿石如你所想象的那样，很多矿石都是又黑又丑，即使在野外遇到，也不会多看一眼的那种石头。\n当然，也不是所有矿石都这么丑。\n我们再看看下面这些矿石： 上图：赤铜 上图：钼铅矿 上图：方硼石 上图：自然硫 上图：云母这些矿石，能否让你感慨大自然的造化神奇?小伙伴们可能会想，这些漂亮的矿石，打磨以后就是漂亮的宝石啊，为什么我们不把他们加工成珠宝呢?这个是个好问题。\n人类自古以来就没有停止过对美好事物的追求，凡漂亮的东西都可能被人们看上，成为制作饰品原料。\n珠宝就是大自然赐予的美好的东西中的一种。\n珠宝如果不美就不能成为珠宝，这种美或表现为绚丽的颜色，或表现为透明而洁净。\n物以稀为贵，鸽血红级别的红宝石、矢车菊蓝级别的蓝宝石，每克拉价值上万美元，而某些颇美丽又可耐久的宝石(如白水晶)，由于产量较多，开采较容易，其价格一直较低。\nso，大家能明白了吧，不是每一块石头都能成为珠宝。\n如果拥有珠宝，请务必珍惜。\n目前1000+人已关注加入我们您看此文用· 秒，转发只需1秒呦~\n\n北京市黄埔同学会接待“踏寻中山足迹学习之旅”台湾参访团\n光明网讯（通讯员苏民军记者任生心）日前，由台湾中国统一联盟桃竹分会成员组成的“踏寻中山足迹学习之旅”参访团一行21人来到北京参观访问。\n在北京市黄埔同学会的精心安排下，在京期间，参访团拜谒了中山先生衣冠冢，参观了卢沟桥、抗战纪念馆、抗战名将纪念馆和宋庆龄故居等；“踏寻中山足迹学习之旅”参访团还将赴南京中山堂等地参访。\n在抗战纪念馆，参访团成员们认真聆听讲解员的介绍，仔细观看每张图片资料，回顾国共两党团结抗战的往事，缅怀那些为民族独立而壮烈牺牲的英雄。\n而后，参访团一行来到位于京西香山深处的孙中山先生衣冠冢拜谒，参访团团长李尚贤（台湾中国统一联盟总会第一副主席兼秘书长）发表了简短的感言后，全体成员在孙中山雕像前三鞠躬，向孙中山先生致敬，缅怀孙中山先生以“三民主义”为宗旨的革命的一生。\n随后，参访团一行又来到2009年建成的北京香麓园抗战名将纪念馆，瞻仰了佟麟阁将军墓，他们还参观了宋庆龄故居。\n\n鼎丰(08056.HK)向客户借出5000万人币 月息1.75厘 为期一年\n鼎丰集团控股(08056.HK)+0.030(+1.345%)公布，同意将一笔5000万元人民币的款项委托予贷款银行，以供转借予客户，贷款期为十二个月，月息1.75厘。\n(报价延迟最少十五分钟。\n\n在青岛不买房，居然能拥有这么多东西！\n这段时间青岛房价扶摇直上闹得人心惶惶这不，青岛房市，又在国庆节火了一把 国庆5天内16城启动楼市限购一时之间楼市风云大转纵观9月份青岛一手房均价怎么也有一万三四了看完十三哥默默地回去工作了 按照一套房子100平米计算购买一套房子大概需要130万在青岛，买一套房子怎么也得需要130万如果这些钱不买房能在全世界各地买什么呢？\n今天，小编就带大家（bai）感（ri）受（meng）一下在西班牙能买3.4个村庄 一位英国人，名叫尼尔·克里斯蒂，在西班牙农村西北部一个田园地区买下了一处村庄（阿鲁纳达），只花费了4.5万欧元（约合35.6万人民币）。\n简直便宜到吐血，这点钱要是在青岛的豪宅区，恐怕厕所都买不了。\n如果选的地方靠近旅游景区，稍微装修一下，变成一个度假村……妥妥的壕啊，画面太美，不敢想象……在爱尔兰差不多能买个小岛 Inishdooney岛，位于北爱尔兰西北部，售价14万英镑（约合139万人民币）。\n约38万平方米的无人居住地有淡水池塘、天然溶洞和鹅卵石海滩，美翻了有木有！\n一个小岛的钱，和青岛一个水泥格子的价格差不多。\n不要拦着最懂妹，我要去爱尔兰做岛主！\n在巴厘岛能买2座别墅 巴厘岛，蓝天、碧水、白云，美的像梦一样，而你知道吗，这座世界著名旅游岛一个小镇的别墅只要10.7万美元，也就是不到70万人民币，青岛买房那点钱都够买两栋别墅了。\n在巴厘岛拥有两座别墅是什么概念？\n发完文章小编就去买机票！\n在美国能买1驾小飞机 
美国塞斯纳C172R型，最大航程可达1270公里，飞机上具备GPS导航定位系统、自动驾驶、盲降设备等，价格大概在17万美元左右，也就是104万人民币。\n在青岛买房的钱妥妥的够买一架飞机了。\n直接移民去西班牙 一个以阳光和沙滩吸引着无数游客的国家，有着激情的足球和斗牛文化、独特的海鲜美食、发达的时装行业、热情火辣的西班牙女郎...... 直接去西班牙？\n你以为我在搞笑？\n西班牙有个买房移民的政策，在西班牙的指定区域购买当地售价在170万人民币以上的房产就可以办理多次往返签证了，然后你待够10年，就可以入西班牙国籍了。\n买一大堆LV手袋 十三哥相信很多女孩应该都很喜欢LV手袋。\n这款极具魅力的CHAIN LOUISE手袋价格为2.04万人民币。\n随随便便买一堆！\n带着爱人环游世界 微博上那对香港80后小夫妻历时308天花费16万人民币走遍了37国，你们还记得吗？\n按照他们的行程，你几乎就能去环游世界了。\n什么也不用想，痛痛快快环游地球一圈！\n在澳大利亚当农场主 五卧室、三浴室的大房子，还有德尼利昆镇附近一块27英亩的农场。\n只需要美元价格14.4万美元（≈96万人民币），是不是惊呆了！\n哦，对了，澳大利亚还提供住房贷款业务哟！\n十三哥要挣钱去澳大利亚买牧场！\n在莫斯科买下1座别墅 莫斯科市中心双卧室、双浴室的豪华大别墅，你觉得多少钱？\n千万别吃惊，美元价格在15.2万美元左右（≈100.1万人民币）。\n虽然在这个城市生活总会有各种各样的压力我们必须十分努力才能看起来毫不费力但是我们永远保持一颗向上的心不气馁，好好加油！\n[海尔地产世纪公馆]新都心2期升级新品9月底推出 海尔地产世纪公馆二期规划8栋高层住宅，预计9月底推出，认筹中，交2.5万享99折优惠，预计均价17000-18000元/平。\n户型面积区间89-162平，主力120-140平品质改善产品。\n125-126平为套三，142-162平为套四。\n海尔地产世纪公馆一户一价，以上价格仅供参考，所有在售户型价格以售楼处公布为准。\n咨询电话：400-099-0099 转 27724[金隅和府]3大商圈环绕地铁房18000元 金隅和府一户一价，以下价格仅供参考，所有在售户型价格以售楼处公布为准。\n金隅和府预计9月20日加推6#楼（24F）楼王，3个单元，1梯2户，户型面积为90平套二，122平、138平套三，团购交1万团购金、10万认筹金可以享受97折优惠，预计均价18000-26000元/平。\n金隅和府位于镇江路12号，近邻山东路、延吉路、东西快速路等三横三纵交通网、未来享地铁M5之便利；CBD商圈、香港路商圈、台东商圈3大商圈环绕，居住生活便利。\n\n直播拐点来临：未来直播APP开发还有哪些趋势？\n趋势一：巨头收割直播价值，依赖巨头扶持的直播平台存活几率更高尽管一线垂直领域已经被巨头的直播平台占领，但创业者依然还有机会。\n未来在泛娱乐社交、游戏、美妆电商等核心领域必然会有几家直播平台具有突出优势，而这些具备突出优势的直播平台很可能会被BAT入股收购或者收编，因此如果能够获得巨头的资本输血与流量扶持，往往存活的几率会更大。\n趋势二：直播平台从争抢网红到争抢明星资源明星+粉丝经济+直播平台，很可能会衍生出新型的整合营销方式。\n即怎样通过可购买价值的内容设定，运营好与粉丝之间的感情沟通，让粉丝群体进行持续性参与并进行情感消费投入，直播平台与明星组合叠加的人气效应与非理性消费的频次也非常契合品牌商的需求。\n因此，直播的未来趋势将从争抢网红资源到争抢明星资源。\n这是直播平台孕育粉丝经济进而带来新型的情感消费与商业模式的要走的一条必要的路径。\n而未来可能会有越来越多的品牌商更愿意尝试这种直播互动带来的品牌曝光机会与商业变现模式。\n趋势三：从泛娱乐明星网红直播转入到二级垂直细分市场的专业直播泛娱乐直播内容属性上由于其单一、无聊的直播内容无法构成平台的核心竞争力，直播平台未来大趋势是从泛娱乐直播转入到内涵直播。\n目前部分视频直播平台已针对财经、育儿、时尚、体育、美食等垂直领域的自频道开放直播权限，内容的差异化与垂直化可以为直播平台带来新的商业模式，平台也可以通过优质的直播内容，产生付费、会员、打赏以及直播购物等盈利模式。\n因为目前缺乏真正有价值的直播，多数直播平台在内容供给侧是存在问题的，网红要提升自身与粉丝之间的黏性，显然需要差异化的内容，而从目前的欧美网红与直播内容的发展规律来看，更健康、更有价值与内涵的直播内容成为未来的发展趋势之一。\n趋势四：网红孵化器批量生产网红 将走向专业化由于在网红包装、传播、变现等方面具备专业的运营能力，网红孵化器未来须具备 
“经纪人+代运营+供应链+网红星探”等多重角色，向专业网红群聚捆绑者向提供专业化的服务与垂直领域专家型、特长型、个性型网红培养者与发现者这一定位转型。\n借助在用户洞察、网红运营、电商管理方面的精良团队，需要打通粉丝营销和电商运营，并将网红、粉丝，平台、内容，品牌、供应链，进行有效链接及整合。\n趋势五：C端直播洗牌 B端企业直播崛起带动专业的商务直播需求目前，各种企业的商务发布会、沙龙、座谈、讲座、渠道大会、教育培训等方面直播需求强烈，在企业进行移动视频直播的需求推动下，它们开始寻求低成本、快速的搭建属于自己的高清视频直播平台的模式，而企业搭建视频直播平台需要专业的技术能力的服务商来应对这种需求。\n用户可以通过微信直接观看企业直播参与互动，让直播突破空间场地的限制，某种程度也代表直播产业链的一个接入的发展方向。\n趋势六：解决直播用户体验与新媒体营销，移动直播服务商将迎来新的机会直播行业进入了各行各业均可参与，并将直播作为企业服务工具的直播+时代，而玩转直播+，从技术、营销、服务、内容，进而可以衍生出更多的直播服务盈利。\n而对于解决直播体验背后的移动直播服务商，也将迎来新的机会。\n趋势七：直播或成为企业的标配，可能为企业带来更多转化率当直播火爆起来的时候，人们要关注的不仅仅是行业能火爆多久，它的商业模式是否成熟，在洗牌节点来临与巨头羽翼覆盖下，自身还有没有机会，创业者与企业都应该从中寻找自己的机会与跨界领域的嫁接。\n它不仅仅是内容和流量的变现工具，更应该是一种营销与商业理念的转变。\n不久前，马化腾向青年创业者建议，要关注两个产业跨界的部分，因为将新技术用在两个产业跨界部分往往最有可能诞生创新的机会。\n而企业营销如果能从垂直细分领域的切入并借助直播技术与趋势为已所用，往往也能获得新的机会，尽管任何基于行业趋势的预测都意味着不确定性，但抓住不确定性的机会，才能最终在新一轮风口下，把握企业转型与商业、营销模式创新的机会，迎来属于自己的时代。\n欢迎互联网创业者加入杭州互联网创业QQ群：157936473直接加QQ或pc上点击加群项目开发咨询：0571-28030088\n\n邓伟根北美硅谷行“捎回”一个MBA授课点\n南都讯记者郭伟豪通讯员伍新宇6月7日至16日，佛山市委常委、南海区委书记、佛山高新区党工委书记兼管委会主任邓伟根率领由南海区和佛山高新区相关人员组成的经贸洽谈和友好交流代表团，对新加坡、美国和加拿大进行友好访问。\n由于新加坡裕廊、美国硅谷与有“加拿大高科技之都”美誉的万锦市均以发达的高科技产业著名，皆是所在国的硅谷，邓伟根更称此行为“三谷”之行。\n在新加坡，邓伟根一行与新加坡淡马锡控股公司相关负责人就双方进一步深化合作进行了深入的探讨。\n交流中，新加坡国立大学(N U S)商学院杨贤院长表示有意在南海设立N U S的海外M B A授课点，双方拟于6月下旬就有关意向在南海签订合作协议。\n6月9日，邓伟根一行前往硅谷拜会了硅谷美华科技商会(S V C A C A )和华美半导体协会(C A SPA )。\nSV C A C A和CA SPA将通过其广泛的会员和在硅谷等地的影响力，为佛高区、南高区在硅谷进行宣传推介，并积极把有意拓展中国市场的高科技项目推荐到南高区。\n代表团一行还到访了南海区政府与万锦市政府联合举办了“南海区与万锦市经贸交流会”。\n2012年12月，万锦市市长薛家平先生率团访问南海后，万锦市议会正式通过了为当地一道路命名“南海街”的议案，并于2013年9月举行道路命名仪式。\n在本次交流中，邓伟根提议未来也在南海选址命名一条“万锦路”，此举也立即得到薛家平市长的认同。\n对于“三谷”之行，邓伟根表示，南海将利用现有的南海乡亲和关系密切的协会等有利资源，计划在“三谷”建立南海和佛高区的海外联络处，学习和吸收海外高科技之都的先进经验，努力将已定位为“中国制造金谷”的佛高区南海核心园打造成为下一个“硅谷”，并争取早日实现佛高区挺进全国国家高新区20强的目标。\n\n内地高中生将通篇学习《道德经》\n摘要国内第一套自主研发的高中传统文化通识教材预计将于今年9月出版，四册分别为《论语》《孟子》《大学·中庸》和《道德经》。\n2016年高考改革方案中，全国25个省高考要统一命题，并且增加分数后的语文考试，正在研究增加“中华优秀传统文化”之相关内容。\n《道德经》成为高中传统文化教材。\n法制晚报讯(记者 李文姬 
)今天上午，记者从“十二五”教育部规划课题《传统文化与中小学生人格培养研究》总课题组了解到，国内第一套自主研发的高中传统文化通识教材预计将于今年9月出版，四册分别为《论语》《孟子》《大学·中庸》和《道德经》。\n至此，课题组已完成了幼儿园、小学、初中、高中各阶段标准化传统文化教材的研发工作，高中国学教材将在各地开展成规模的教材试用工作。\n中国国学文化艺术中心秘书长张健表示，目前各地高考改革的几个信号均指向国学，但考什么、怎么考又是一个难题。\n专家建议，不应以文言文字词解释等传统形式考查，应关注考生如何消化吸收传统文化中的哲学素养和思想韬略。\n教材各年级国学内容全覆盖据 “十二五”教育部规划课题《传统文化与中小学生人格培养研究》总课题组介绍，高中传统文化通识系列教材作为“十一五”、“十二五”两个阶段十年课题研究的重要成果之一，由中国国学文化艺术中心承担资源整合和编著。\n去年，教育部印发了《完善中华优秀传统文化教育指导纲要》，要求在课程建设和课程标准修订中强化中华优秀传统文化内容。\n在中小学德育、语文、历史等课程标准修订中，增加中华优秀传统文化的比重。\n课题组秘书长张健表示，幼儿园、小学、初中、高中各阶段标准化传统文化教材的均已研发完成，明确提出以“青少年完美人格”为传统文化教育目标，教材知识相互关联，自成体系，并通过高中教材实现最终教学评价。\n这是“十一五”“十二五”两个阶段十年课题研究的重要成果之一。\n今年5月份之前，《高等教育传统文化教材》(12册)《全国行政领导干部国学教材》(10册)两套教材也将研发完毕。\n内容高中教材含《论语》《道德经》此次即将出版的高中阶段传统文化通识教材共有4册，供高中一、二年级使用。\n高一学习《论语》《孟子》，高二学习《大学·中庸》和《道德经》。\n其中《道德经》为原文全本讲解，另外三册则是按主题归类讲解。\n如《大学·中庸》一册，分为“慎独”“齐家”“格物致知”“中和”“为政”等章节。\n据课题组专家介绍，这4册书并非孤立的高中教材，而是《中华优秀传统文化教育全国中小学实验教材》的高中部分。\n全套教材包含小学、初中和高中三个阶段，经专家组反复研讨、论证，制定了“儒学养正、兵学相佑、道法自然、文化浸润”的课程结构，各阶段教学内容和深度循序渐进、系统科学。\n事实上，小学高年级段已开始涉及《论语》《孟子》等儒学典籍，但仅以诵读和简单理解为主，到高中阶段，学生可在已有基础上更为深刻地领悟儒道经典的思想内涵，以达到融会贯通的程度。\n此外，每一章节在讲解儒道核心精神的同时，还为学生提供了大量中西文化比较等拓展阅读素材。\n针对公众关注的一个话题，即传统文化有望成为高考的新考点，课题组表示目前在研发高中传统文化教材的同时，就已开展了另一个重点子课题研究，即传统文化教学评价与考试模式研究。\n张健强调高考改革的几个信号均指向国学，例如北京、上海等地公布的高考改革方案中，英语降分后其所降分数分给了语文，而且还更进一步明确指出了就是将分数转移给所增加的“传统文化考试内容”部分。\n又如今年清华北大自主招生均招收国学特长生。\n此外，近期公布的2016年高考改革方案中，全国25个省高考要统一命题，并且增加分数后的语文考试，正在研究增加“中华优秀传统文化”之相关内容。\n张健表示，传统文化成为高考的又一创新考点指日可待，但考什么、怎么考又是一个重大难题。\n由于相关子课题研究还没有结束，课题组非行政机构只承担建议义务。\n张健坦言，能否在高考语文中出现一个新的形式——政论或申论形式的传统文化论述题，这一方向应该是研究和创新的改革方向之一。\n若2016年传统文化进入高考，最大的问题是很多高中生没有接触过传统文化课程，不具备相关知识储备和素养，国学文化是通过长期熏陶和涵养才能显现的，不是靠一朝一夕突击补课就能拥有的。\n\n悬灸技术培训专家教你艾灸降血糖，为爸妈收好了！\n近年来随着我国经济条件的改善和人们生活水平的提高，我国糖尿病的患病率也在逐年上升。\n悬灸技术培训的创始人艾灸专家刘全军先生对糖尿病深有研究，接下来，学一学他是怎么用艾灸降血压的吧！\n中医认为，糖尿病是气血、阴阳失调等多种原因引起的一种慢性疾病。\n虽然分为上消、中消、下消，但是无论何种糖尿病 
，治疗的原则都是荣养阴液，清热润燥。\n艾灸对控制血糖效果不错。\n艾灸功效：调升元阳降血糖艾灸可以修复受损胰岛细胞，激活再生，逐步实现胰岛素的自给自足。\n服药一天比一天少，身体一天比一天好，彻底摆脱终生服药！\n还可以双向调节血糖，使血糖老老实实地锁定在正常的恒定值范围。\n也可以改善组织供氧，对微血管病变导致的视物不清、眼底出血等视网膜病变及早期肾病病变及早期肾病病变有明显治疗与改善作用，改善病人消瘦无力、免疫力低下、低蛋白质血证及伤口不愈等现象。\n艾灸取穴糖尿病艾灸过的穴位有，承浆中脘足三里关元曲骨三阴交、期门太冲下脘天枢气海膈俞膻中、胃俞，这么多穴位可根据患者当时的症状进行选取。\n选取后艾灸，每10天为一个疗程，疗程间休息3-5天后继续第二轮的治疗，三个疗程基本可见到理想疗效。\n这几个穴位都是具有补充人体元阳功能的大穴和调节脏腑功能的腧穴，从根上调节人体的元阳使阴阳达到新的平衡，五脏六腑尤其是肺、脾肾的功能恢复正常，糖尿病自然也就不药而愈了。\n艾灸可以有效控制糖尿病 ，这在很多资料都有报导。\n艾灸使病人的营养能得到有效的吸收和利用，从而提高人体的自身免疫功能和抗病防病能力，防止了系列并发症的发生，真正做到综合治疗，标本兼治。\n艾灸对于常见病是具有广泛的适应性的。\n希望大家把艾灸推广出去，让艾灸这个疗法能够更完善，造福更多的人。\n\n熟食放在垃圾旁无照窝点被取缔\n本报讯（记者李涛）又黑又脏的墙面、随意堆放的加工原料、处处弥漫的刺鼻味道。\n昨天上午，东小口镇政府与城管、食药、公安等部门开展联合执法行动时，依法取缔了一个位于昌平区东小口镇半截塔村的非法熟食加工窝点。\n昨天上午，执法人员对东小口镇半截塔村进行环境整治时，一家挂着“久久鸭”招牌的小店的店主显得有点紧张，还“顺手”把通向后院的门关上了。\n执法人员觉得有些蹊跷，便要求到后院进行检查。\n一进院子，执法人员就发现大量的熟食加工原料被随意摆放在地上，旁边就堆放着垃圾。\n院内煤炉上的一口锅内正煮着的食物，发出刺鼻的味道。\n执法队员介绍，在炉子一旁的笸箩里盛着制作好的熟食制品，但却没有任何遮盖，一阵风起，煤灰混着尘土就落在上面。\n执法队员说：“走进院旁的小屋内，地上和墙上满是油污，脏乎乎的冰柜上堆放着一袋一袋的半成品，一个个用来盛放熟食制品的笸箩摞在生锈的铁架子上。\n”随后，执法人员仔细查找，没有发现任何消毒设施，调查得知从事加工的人员也没有取得加工熟食应需的健康证。\n执法人员随后对店主进行询问，当执法人员要求出示营业执照及卫生许可证时，店主嘟囔了半天才坦白自己不具备任何手续。\n执法人员当即对该非法生产窝点进行了取缔，对现场工作人员进行了宣传与教育，并依法没收了加工工具及食品。"
  },
  {
    "path": "lamb_optimizer_google.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n\"\"\"Functions and classes related to optimization (weight updates).\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport re\nimport six\nimport tensorflow as tf\n\n# pylint: disable=g-direct-tensorflow-import\nfrom tensorflow.python.ops import array_ops\nfrom tensorflow.python.ops import linalg_ops\nfrom tensorflow.python.ops import math_ops\n# pylint: enable=g-direct-tensorflow-import\n\n\nclass LAMBOptimizer(tf.train.Optimizer):\n  \"\"\"LAMB (Layer-wise Adaptive Moments optimizer for Batch training).\"\"\"\n  # A new optimizer that includes correct L2 weight decay, adaptive\n  # element-wise updating, and layer-wise justification. 
The LAMB optimizer\n  # was proposed by Yang You, Jing Li, Jonathan Hseu, Xiaodan Song,\n  # James Demmel, and Cho-Jui Hsieh in a paper titled as Reducing BERT\n  # Pre-Training Time from 3 Days to 76 Minutes (arxiv.org/abs/1904.00962)\n\n  def __init__(self,\n               learning_rate,\n               weight_decay_rate=0.0,\n               beta_1=0.9,\n               beta_2=0.999,\n               epsilon=1e-6,\n               exclude_from_weight_decay=None,\n               exclude_from_layer_adaptation=None,\n               name=\"LAMBOptimizer\"):\n    \"\"\"Constructs a LAMBOptimizer.\"\"\"\n    super(LAMBOptimizer, self).__init__(False, name)\n\n    self.learning_rate = learning_rate\n    self.weight_decay_rate = weight_decay_rate\n    self.beta_1 = beta_1\n    self.beta_2 = beta_2\n    self.epsilon = epsilon\n    self.exclude_from_weight_decay = exclude_from_weight_decay\n    # exclude_from_layer_adaptation is set to exclude_from_weight_decay if the\n    # arg is None.\n    # TODO(jingli): validate if exclude_from_layer_adaptation is necessary.\n    if exclude_from_layer_adaptation:\n      self.exclude_from_layer_adaptation = exclude_from_layer_adaptation\n    else:\n      self.exclude_from_layer_adaptation = exclude_from_weight_decay\n\n  def apply_gradients(self, grads_and_vars, global_step=None, name=None):\n    \"\"\"See base class.\"\"\"\n    assignments = []\n    for (grad, param) in grads_and_vars:\n      if grad is None or param is None:\n        continue\n\n      param_name = self._get_variable_name(param.name)\n\n      m = tf.get_variable(\n          name=six.ensure_str(param_name) + \"/adam_m\",\n          shape=param.shape.as_list(),\n          dtype=tf.float32,\n          trainable=False,\n          initializer=tf.zeros_initializer())\n      v = tf.get_variable(\n          name=six.ensure_str(param_name) + \"/adam_v\",\n          shape=param.shape.as_list(),\n          dtype=tf.float32,\n          trainable=False,\n          
initializer=tf.zeros_initializer())\n\n      # Standard Adam update.\n      next_m = (\n          tf.multiply(self.beta_1, m) + tf.multiply(1.0 - self.beta_1, grad))\n      next_v = (\n          tf.multiply(self.beta_2, v) + tf.multiply(1.0 - self.beta_2,\n                                                    tf.square(grad)))\n\n      update = next_m / (tf.sqrt(next_v) + self.epsilon)\n\n      # Just adding the square of the weights to the loss function is *not*\n      # the correct way of using L2 regularization/weight decay with Adam,\n      # since that will interact with the m and v parameters in strange ways.\n      #\n      # Instead we want to decay the weights in a manner that doesn't interact\n      # with the m/v parameters. This is equivalent to adding the square\n      # of the weights to the loss with plain (non-momentum) SGD.\n      if self._do_use_weight_decay(param_name):\n        update += self.weight_decay_rate * param\n\n      ratio = 1.0\n      if self._do_layer_adaptation(param_name):\n        w_norm = linalg_ops.norm(param, ord=2)\n        g_norm = linalg_ops.norm(update, ord=2)\n        ratio = array_ops.where(math_ops.greater(w_norm, 0), array_ops.where(\n            math_ops.greater(g_norm, 0), (w_norm / g_norm), 1.0), 1.0)\n\n      update_with_lr = ratio * self.learning_rate * update\n\n      next_param = param - update_with_lr\n\n      assignments.extend(\n          [param.assign(next_param),\n           m.assign(next_m),\n           v.assign(next_v)])\n    return tf.group(*assignments, name=name)\n\n  def _do_use_weight_decay(self, param_name):\n    \"\"\"Whether to use L2 weight decay for `param_name`.\"\"\"\n    if not self.weight_decay_rate:\n      return False\n    if self.exclude_from_weight_decay:\n      for r in self.exclude_from_weight_decay:\n        if re.search(r, param_name) is not None:\n          return False\n    return True\n\n  def _do_layer_adaptation(self, param_name):\n    \"\"\"Whether to do layer-wise learning rate 
adaptation for `param_name`.\"\"\"\n    if self.exclude_from_layer_adaptation:\n      for r in self.exclude_from_layer_adaptation:\n        if re.search(r, param_name) is not None:\n          return False\n    return True\n\n  def _get_variable_name(self, param_name):\n    \"\"\"Get the variable name from the tensor name.\"\"\"\n    m = re.match(\"^(.*):\\\\d+$\", six.ensure_str(param_name))\n    if m is not None:\n      param_name = m.group(1)\n    return param_name\n"
  },
  {
    "path": "modeling.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"The main BERT model and related functions.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport copy\nimport json\nimport math\nimport re\nimport numpy as np\nimport six\nimport tensorflow as tf\nimport bert_utils\n\nclass BertConfig(object):\n  \"\"\"Configuration for `BertModel`.\"\"\"\n\n  def __init__(self,\n               vocab_size,\n               hidden_size=768,\n               num_hidden_layers=12,\n               num_attention_heads=12,\n               intermediate_size=3072,\n               hidden_act=\"gelu\",\n               hidden_dropout_prob=0.1,\n               attention_probs_dropout_prob=0.1,\n               max_position_embeddings=512,\n               type_vocab_size=16,\n               initializer_range=0.02):\n    \"\"\"Constructs BertConfig.\n\n    Args:\n      vocab_size: Vocabulary size of `inputs_ids` in `BertModel`.\n      hidden_size: Size of the encoder layers and the pooler layer.\n      num_hidden_layers: Number of hidden layers in the Transformer encoder.\n      num_attention_heads: Number of attention heads for each attention layer in\n        the Transformer encoder.\n      intermediate_size: The size of the \"intermediate\" (i.e., feed-forward)\n        layer in the Transformer encoder.\n     
 hidden_act: The non-linear activation function (function or string) in the\n        encoder and pooler.\n      hidden_dropout_prob: The dropout probability for all fully connected\n        layers in the embeddings, encoder, and pooler.\n      attention_probs_dropout_prob: The dropout ratio for the attention\n        probabilities.\n      max_position_embeddings: The maximum sequence length that this model might\n        ever be used with. Typically set this to something large just in case\n        (e.g., 512 or 1024 or 2048).\n      type_vocab_size: The vocabulary size of the `token_type_ids` passed into\n        `BertModel`.\n      initializer_range: The stdev of the truncated_normal_initializer for\n        initializing all weight matrices.\n    \"\"\"\n    self.vocab_size = vocab_size\n    self.hidden_size = hidden_size\n    self.num_hidden_layers = num_hidden_layers\n    self.num_attention_heads = num_attention_heads\n    self.hidden_act = hidden_act\n    self.intermediate_size = intermediate_size\n    self.hidden_dropout_prob = hidden_dropout_prob\n    self.attention_probs_dropout_prob = attention_probs_dropout_prob\n    self.max_position_embeddings = max_position_embeddings\n    self.type_vocab_size = type_vocab_size\n    self.initializer_range = initializer_range\n\n  @classmethod\n  def from_dict(cls, json_object):\n    \"\"\"Constructs a `BertConfig` from a Python dictionary of parameters.\"\"\"\n    config = BertConfig(vocab_size=None)\n    for (key, value) in six.iteritems(json_object):\n      config.__dict__[key] = value\n    return config\n\n  @classmethod\n  def from_json_file(cls, json_file):\n    \"\"\"Constructs a `BertConfig` from a json file of parameters.\"\"\"\n    with tf.gfile.GFile(json_file, \"r\") as reader:\n      text = reader.read()\n    return cls.from_dict(json.loads(text))\n\n  def to_dict(self):\n    \"\"\"Serializes this instance to a Python dictionary.\"\"\"\n    output = copy.deepcopy(self.__dict__)\n    return output\n\n  def 
to_json_string(self):\n    \"\"\"Serializes this instance to a JSON string.\"\"\"\n    return json.dumps(self.to_dict(), indent=2, sort_keys=True) + \"\\n\"\n\n\nclass BertModel(object):\n  \"\"\"BERT model (\"Bidirectional Encoder Representations from Transformers\").\n\n  Example usage:\n\n  ```python\n  # Already converted into WordPiece token ids\n  input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])\n  input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])\n  token_type_ids = tf.constant([[0, 0, 1], [0, 2, 0]])\n\n  config = modeling.BertConfig(vocab_size=32000, hidden_size=512,\n    num_hidden_layers=8, num_attention_heads=8, intermediate_size=1024)\n\n  model = modeling.BertModel(config=config, is_training=True,\n    input_ids=input_ids, input_mask=input_mask, token_type_ids=token_type_ids)\n\n  label_embeddings = tf.get_variable(...)\n  pooled_output = model.get_pooled_output()\n  logits = tf.matmul(pooled_output, label_embeddings)\n  ...\n  ```\n  \"\"\"\n\n  def __init__(self,\n               config,\n               is_training,\n               input_ids,\n               input_mask=None,\n               token_type_ids=None,\n               use_one_hot_embeddings=False,\n               scope=None):\n    \"\"\"Constructor for BertModel.\n\n    Args:\n      config: `BertConfig` instance.\n      is_training: bool. true for training model, false for eval model. Controls\n        whether dropout will be applied.\n      input_ids: int32 Tensor of shape [batch_size, seq_length].\n      input_mask: (optional) int32 Tensor of shape [batch_size, seq_length].\n      token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].\n      use_one_hot_embeddings: (optional) bool. Whether to use one-hot word\n        embeddings or tf.embedding_lookup() for the word embeddings.\n      scope: (optional) variable scope. 
Defaults to \"bert\".\n\n    Raises:\n      ValueError: The config is invalid or one of the input tensor shapes\n        is invalid.\n    \"\"\"\n    config = copy.deepcopy(config)\n    if not is_training:\n      config.hidden_dropout_prob = 0.0\n      config.attention_probs_dropout_prob = 0.0\n\n    input_shape = get_shape_list(input_ids, expected_rank=2)\n    batch_size = input_shape[0]\n    seq_length = input_shape[1]\n\n    if input_mask is None:\n      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)\n\n    if token_type_ids is None:\n      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)\n\n    with tf.variable_scope(scope, default_name=\"bert\"):\n      with tf.variable_scope(\"embeddings\"):\n        # Perform embedding lookup on the word ids, using the factorized embedding parameterization from ALBERT. Added by brightmart, 2019-09-28\n        (self.embedding_output, self.embedding_table, self.embedding_table_2) = embedding_lookup_factorized(\n            input_ids=input_ids,\n            vocab_size=config.vocab_size,\n            hidden_size=config.hidden_size,\n            embedding_size=config.embedding_size,\n            initializer_range=config.initializer_range,\n            word_embedding_name=\"word_embeddings\",\n            use_one_hot_embeddings=use_one_hot_embeddings)\n\n        # Add positional embeddings and token type embeddings, then layer\n        # normalize and perform dropout.\n        self.embedding_output = embedding_postprocessor(\n            input_tensor=self.embedding_output,\n            use_token_type=True,\n            token_type_ids=token_type_ids,\n            token_type_vocab_size=config.type_vocab_size,\n            token_type_embedding_name=\"token_type_embeddings\",\n            use_position_embeddings=True,\n            position_embedding_name=\"position_embeddings\",\n            initializer_range=config.initializer_range,\n            
max_position_embeddings=config.max_position_embeddings,\n            dropout_prob=config.hidden_dropout_prob)\n\n      with tf.variable_scope(\"encoder\"):\n        # This converts a 2D mask of shape [batch_size, seq_length] to a 3D\n        # mask of shape [batch_size, seq_length, seq_length] which is used\n        # for the attention scores.\n        attention_mask = create_attention_mask_from_input_mask(\n            input_ids, input_mask)\n\n        # Run the stacked transformer.\n        # `sequence_output` shape = [batch_size, seq_length, hidden_size].\n        ln_type=config.ln_type\n        print(\"ln_type:\",ln_type)\n        if ln_type=='postln' or ln_type is None: # currently, base or large of albert used post-LN structure\n            print(\"old structure of transformer.use: transformer_model,which use post-LN\")\n            self.all_encoder_layers = transformer_model(\n                input_tensor=self.embedding_output,\n                attention_mask=attention_mask,\n                hidden_size=config.hidden_size,\n                num_hidden_layers=config.num_hidden_layers,\n                num_attention_heads=config.num_attention_heads,\n                intermediate_size=config.intermediate_size,\n                intermediate_act_fn=get_activation(config.hidden_act),\n                hidden_dropout_prob=config.hidden_dropout_prob,\n                attention_probs_dropout_prob=config.attention_probs_dropout_prob,\n                initializer_range=config.initializer_range,\n                do_return_all_layers=True)\n        else: # xlarge or xxlarge of albert, used pre-LN structure\n            print(\"new structure of transformer.use: prelln_transformer_model,which use pre-LN\")\n            self.all_encoder_layers = prelln_transformer_model( # change by brightmart, 4th, oct, 2019. pre-Layer Normalization can converge fast and better. 
check paper: ON LAYER NORMALIZATION IN THE TRANSFORMER ARCHITECTURE\n                input_tensor=self.embedding_output,\n                attention_mask=attention_mask,\n                hidden_size=config.hidden_size,\n                num_hidden_layers=config.num_hidden_layers,\n                num_attention_heads=config.num_attention_heads,\n                intermediate_size=config.intermediate_size,\n                intermediate_act_fn=get_activation(config.hidden_act),\n                hidden_dropout_prob=config.hidden_dropout_prob,\n                attention_probs_dropout_prob=config.attention_probs_dropout_prob,\n                initializer_range=config.initializer_range,\n                do_return_all_layers=True,\n                shared_type='all') #  do_return_all_layers=True\n\n      self.sequence_output = self.all_encoder_layers[-1] # [batch_size, seq_length, hidden_size]\n      # The \"pooler\" converts the encoded sequence tensor of shape\n      # [batch_size, seq_length, hidden_size] to a tensor of shape\n      # [batch_size, hidden_size]. This is necessary for segment-level\n      # (or segment-pair-level) classification tasks where we need a fixed\n      # dimensional representation of the segment.\n      with tf.variable_scope(\"pooler\"):\n        # We \"pool\" the model by simply taking the hidden state corresponding\n        # to the first token. 
We assume that this has been pre-trained.\n        first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)\n        self.pooled_output = tf.layers.dense(\n            first_token_tensor,\n            config.hidden_size,\n            activation=tf.tanh,\n            kernel_initializer=create_initializer(config.initializer_range))\n\n  def get_pooled_output(self):\n    return self.pooled_output\n\n  def get_sequence_output(self):\n    \"\"\"Gets final hidden layer of encoder.\n\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the final hidden layer of the transformer encoder.\n    \"\"\"\n    return self.sequence_output\n\n  def get_all_encoder_layers(self):\n    return self.all_encoder_layers\n\n  def get_embedding_output(self):\n    \"\"\"Gets output of the embedding lookup (i.e., input to the transformer).\n\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the output of the embedding layer, after summing the word\n      embeddings with the positional embeddings and the token type embeddings,\n      then performing layer normalization. 
This is the input to the transformer.\n    \"\"\"\n    return self.embedding_output\n\n  def get_embedding_table(self):\n    return self.embedding_table\n\n  def get_embedding_table_2(self):\n    return self.embedding_table_2\n\ndef gelu(x):\n  \"\"\"Gaussian Error Linear Unit.\n\n  This is a smoother version of the RELU.\n  Original paper: https://arxiv.org/abs/1606.08415\n  Args:\n    x: float Tensor to perform activation.\n\n  Returns:\n    `x` with the GELU activation applied.\n  \"\"\"\n  cdf = 0.5 * (1.0 + tf.tanh(\n      (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))\n  return x * cdf\n\n\ndef get_activation(activation_string):\n  \"\"\"Maps a string to a Python function, e.g., \"relu\" => `tf.nn.relu`.\n\n  Args:\n    activation_string: String name of the activation function.\n\n  Returns:\n    A Python function corresponding to the activation function. If\n    `activation_string` is None, empty, or \"linear\", this will return None.\n    If `activation_string` is not a string, it will return `activation_string`.\n\n  Raises:\n    ValueError: The `activation_string` does not correspond to a known\n      activation.\n  \"\"\"\n\n  # We assume that anything that\"s not a string is already an activation\n  # function, so we just return it.\n  if not isinstance(activation_string, six.string_types):\n    return activation_string\n\n  if not activation_string:\n    return None\n\n  act = activation_string.lower()\n  if act == \"linear\":\n    return None\n  elif act == \"relu\":\n    return tf.nn.relu\n  elif act == \"gelu\":\n    return gelu\n  elif act == \"tanh\":\n    return tf.tanh\n  else:\n    raise ValueError(\"Unsupported activation: %s\" % act)\n\n\ndef get_assignment_map_from_checkpoint(tvars, init_checkpoint):\n  \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n  assignment_map = {}\n  initialized_variable_names = {}\n\n  name_to_variable = collections.OrderedDict()\n  for var in tvars:\n    name = var.name\n   
 m = re.match(\"^(.*):\\\\d+$\", name)\n    if m is not None:\n      name = m.group(1)\n    name_to_variable[name] = var\n\n  init_vars = tf.train.list_variables(init_checkpoint)\n\n  assignment_map = collections.OrderedDict()\n  for x in init_vars:\n    (name, var) = (x[0], x[1])\n    if name not in name_to_variable:\n      continue\n    assignment_map[name] = name\n    initialized_variable_names[name] = 1\n    initialized_variable_names[name + \":0\"] = 1\n\n  return (assignment_map, initialized_variable_names)\n\n\ndef dropout(input_tensor, dropout_prob):\n  \"\"\"Perform dropout.\n\n  Args:\n    input_tensor: float Tensor.\n    dropout_prob: Python float. The probability of dropping out a value (NOT of\n      *keeping* a dimension as in `tf.nn.dropout`).\n\n  Returns:\n    A version of `input_tensor` with dropout applied.\n  \"\"\"\n  if dropout_prob is None or dropout_prob == 0.0:\n    return input_tensor\n\n  output = tf.nn.dropout(input_tensor, 1.0 - dropout_prob)\n  return output\n\n\ndef layer_norm(input_tensor, name=None):\n  \"\"\"Run layer normalization on the last dimension of the tensor.\"\"\"\n  return tf.contrib.layers.layer_norm(\n      inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)\n\n\ndef layer_norm_and_dropout(input_tensor, dropout_prob, name=None):\n  \"\"\"Runs layer normalization followed by dropout.\"\"\"\n  output_tensor = layer_norm(input_tensor, name)\n  output_tensor = dropout(output_tensor, dropout_prob)\n  return output_tensor\n\n\ndef create_initializer(initializer_range=0.02):\n  \"\"\"Creates a `truncated_normal_initializer` with the given range.\"\"\"\n  return tf.truncated_normal_initializer(stddev=initializer_range)\n\n\ndef embedding_lookup(input_ids,\n                     vocab_size,\n                     embedding_size=128,\n                     initializer_range=0.02,\n                     word_embedding_name=\"word_embeddings\",\n                     use_one_hot_embeddings=False):\n  \"\"\"Looks 
up words embeddings for id tensor.\n\n  Args:\n    input_ids: int32 Tensor of shape [batch_size, seq_length] containing word\n      ids.\n    vocab_size: int. Size of the embedding vocabulary.\n    embedding_size: int. Width of the word embeddings.\n    initializer_range: float. Embedding initialization range.\n    word_embedding_name: string. Name of the embedding table.\n    use_one_hot_embeddings: bool. If True, use one-hot method for word\n      embeddings. If False, use `tf.gather()`.\n\n  Returns:\n    float Tensor of shape [batch_size, seq_length, embedding_size].\n  \"\"\"\n  # This function assumes that the input is of shape [batch_size, seq_length,\n  # num_inputs].\n  #\n  # If the input is a 2D tensor of shape [batch_size, seq_length], we\n  # reshape to [batch_size, seq_length, 1].\n  if input_ids.shape.ndims == 2:\n    input_ids = tf.expand_dims(input_ids, axis=[-1]) # shape of input_ids is:[ batch_size, seq_length, 1]\n\n  embedding_table = tf.get_variable( # [vocab_size, embedding_size]\n      name=word_embedding_name,\n      shape=[vocab_size, embedding_size],\n      initializer=create_initializer(initializer_range))\n\n  flat_input_ids = tf.reshape(input_ids, [-1]) # one rank. 
shape as (batch_size * sequence_length,)\n  if use_one_hot_embeddings:\n    one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size) # one_hot_input_ids=[batch_size * sequence_length,vocab_size]\n    output = tf.matmul(one_hot_input_ids, embedding_table) # output=[batch_size * sequence_length,embedding_size]\n  else:\n    output = tf.gather(embedding_table, flat_input_ids) # [vocab_size, embedding_size]*[batch_size * sequence_length,]--->[batch_size * sequence_length,embedding_size]\n\n  input_shape = get_shape_list(input_ids) # input_shape=[ batch_size, seq_length, 1]\n\n  output = tf.reshape(output,input_shape[0:-1] + [input_shape[-1] * embedding_size]) # output=[batch_size,sequence_length,embedding_size]\n  return (output, embedding_table)\n\ndef embedding_lookup_factorized(input_ids, # Factorized embedding parameterization introduced by ALBERT\n                     vocab_size,\n                     hidden_size,\n                     embedding_size=128,\n                     initializer_range=0.02,\n                     word_embedding_name=\"word_embeddings\",\n                     use_one_hot_embeddings=False):\n    \"\"\"Looks up word embeddings for an id tensor using ALBERT's factorized embedding parameterization.\n\n       Ids are first embedded in a low-dimensional space of size `embedding_size` and then projected to `hidden_size`.\n       This reduces the embedding parameters from vocab_size * hidden_size to vocab_size * embedding_size + embedding_size * hidden_size.\n       See the \"Factorized embedding parameterization\" section of the ALBERT paper.\n\n     Args:\n       input_ids: int32 Tensor of shape [batch_size, seq_length] containing word\n         ids.\n       vocab_size: int. Size of the embedding vocabulary.\n       hidden_size: int. Size of the hidden space that the embeddings are projected to.\n       embedding_size: int. Width of the word embeddings.\n       initializer_range: float. Embedding initialization range.\n       word_embedding_name: string. Name of the embedding table.\n       use_one_hot_embeddings: bool. If True, use one-hot method for word\n         embeddings. 
If False, use `tf.gather()`.\n\n     Returns:\n       A tuple (output, embedding_table, project_variable), where `output` is a\n       float Tensor of shape [batch_size, seq_length, hidden_size].\n     \"\"\"\n    # This function assumes that the input is of shape [batch_size, seq_length,\n    # num_inputs].\n    #\n    # If the input is a 2D tensor of shape [batch_size, seq_length], we\n    # reshape to [batch_size, seq_length, 1].\n\n    # 1. First project the one-hot vectors into a lower-dimensional embedding space of size E.\n    print(\"embedding_lookup_factorized. factorized embedding parameterization is used.\")\n    if input_ids.shape.ndims == 2:\n        input_ids = tf.expand_dims(input_ids, axis=[-1])  # shape of input_ids is:[ batch_size, seq_length, 1]\n\n    embedding_table = tf.get_variable(  # [vocab_size, embedding_size]\n        name=word_embedding_name,\n        shape=[vocab_size, embedding_size],\n        initializer=create_initializer(initializer_range))\n\n    flat_input_ids = tf.reshape(input_ids, [-1])  # rank 1, shape (batch_size * sequence_length,)\n    if use_one_hot_embeddings:\n        one_hot_input_ids = tf.one_hot(flat_input_ids,depth=vocab_size)  # one_hot_input_ids=[batch_size * sequence_length,vocab_size]\n        output_middle = tf.matmul(one_hot_input_ids, embedding_table)  # output_middle=[batch_size * sequence_length,embedding_size]\n    else:\n        output_middle = tf.gather(embedding_table,flat_input_ids)  # [vocab_size, embedding_size]*[batch_size * sequence_length,]--->[batch_size * sequence_length,embedding_size]\n\n    # 2. 
project the vector (output_middle) into the hidden space\n    project_variable = tf.get_variable(  # [embedding_size, hidden_size]\n        name=word_embedding_name+\"_2\",\n        shape=[embedding_size, hidden_size],\n        initializer=create_initializer(initializer_range))\n    output = tf.matmul(output_middle, project_variable) # ([batch_size * sequence_length, embedding_size] * [embedding_size, hidden_size])--->[batch_size * sequence_length, hidden_size]\n    # reshape back to a rank-3 tensor\n    input_shape = get_shape_list(input_ids)  # input_shape=[ batch_size, seq_length, 1]\n    batch_size, sequence_length, _ = input_shape\n    output = tf.reshape(output, (batch_size,sequence_length,hidden_size))  # output=[batch_size, sequence_length, hidden_size]\n    return (output, embedding_table, project_variable)\n\n\ndef embedding_postprocessor(input_tensor,\n                            use_token_type=False,\n                            token_type_ids=None,\n                            token_type_vocab_size=16,\n                            token_type_embedding_name=\"token_type_embeddings\",\n                            use_position_embeddings=True,\n                            position_embedding_name=\"position_embeddings\",\n                            initializer_range=0.02,\n                            max_position_embeddings=512,\n                            dropout_prob=0.1):\n  \"\"\"Performs various post-processing on a word embedding tensor.\n\n  Args:\n    input_tensor: float Tensor of shape [batch_size, seq_length,\n      embedding_size].\n    use_token_type: bool. Whether to add embeddings for `token_type_ids`.\n    token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].\n      Must be specified if `use_token_type` is True.\n    token_type_vocab_size: int. The vocabulary size of `token_type_ids`.\n    token_type_embedding_name: string. The name of the embedding table variable\n      for token type ids.\n    use_position_embeddings: bool. 
Whether to add position embeddings for the\n      position of each token in the sequence.\n    position_embedding_name: string. The name of the embedding table variable\n      for positional embeddings.\n    initializer_range: float. Range of the weight initialization.\n    max_position_embeddings: int. Maximum sequence length that might ever be\n      used with this model. This can be longer than the sequence length of\n      input_tensor, but cannot be shorter.\n    dropout_prob: float. Dropout probability applied to the final output tensor.\n\n  Returns:\n    float tensor with same shape as `input_tensor`.\n\n  Raises:\n    ValueError: One of the tensor shapes or input values is invalid.\n  \"\"\"\n  input_shape = get_shape_list(input_tensor, expected_rank=3)\n  batch_size = input_shape[0]\n  seq_length = input_shape[1]\n  width = input_shape[2]\n\n  output = input_tensor\n\n  if use_token_type:\n    if token_type_ids is None:\n      raise ValueError(\"`token_type_ids` must be specified if \"\n                       \"`use_token_type` is True.\")\n    token_type_table = tf.get_variable(\n        name=token_type_embedding_name,\n        shape=[token_type_vocab_size, width],\n        initializer=create_initializer(initializer_range))\n    # This vocab will be small so we always do one-hot here, since it is always\n    # faster for a small vocabulary.\n    flat_token_type_ids = tf.reshape(token_type_ids, [-1])\n    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)\n    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)\n    token_type_embeddings = tf.reshape(token_type_embeddings,\n                                       [batch_size, seq_length, width])\n    output += token_type_embeddings\n\n  if use_position_embeddings:\n    assert_op = tf.assert_less_equal(seq_length, max_position_embeddings)\n    with tf.control_dependencies([assert_op]):\n      full_position_embeddings = tf.get_variable(\n          
name=position_embedding_name,\n          shape=[max_position_embeddings, width],\n          initializer=create_initializer(initializer_range))\n      # Since the position embedding table is a learned variable, we create it\n      # using a (long) sequence length `max_position_embeddings`. The actual\n      # sequence length might be shorter than this, for faster training of\n      # tasks that do not have long sequences.\n      #\n      # So `full_position_embeddings` is effectively an embedding table\n      # for position [0, 1, 2, ..., max_position_embeddings-1], and the current\n      # sequence has positions [0, 1, 2, ... seq_length-1], so we can just\n      # perform a slice.\n      position_embeddings = tf.slice(full_position_embeddings, [0, 0],\n                                     [seq_length, -1])\n      num_dims = len(output.shape.as_list())\n\n      # Only the last two dimensions are relevant (`seq_length` and `width`), so\n      # we broadcast among the first dimensions, which is typically just\n      # the batch size.\n      position_broadcast_shape = []\n      for _ in range(num_dims - 2):\n        position_broadcast_shape.append(1)\n      position_broadcast_shape.extend([seq_length, width])\n      position_embeddings = tf.reshape(position_embeddings,\n                                       position_broadcast_shape)\n      output += position_embeddings\n\n  output = layer_norm_and_dropout(output, dropout_prob)\n  return output\n\n\ndef create_attention_mask_from_input_mask(from_tensor, to_mask):\n  \"\"\"Create 3D attention mask from a 2D tensor mask.\n\n  Args:\n    from_tensor: 2D or 3D Tensor of shape [batch_size, from_seq_length, ...].\n    to_mask: int32 Tensor of shape [batch_size, to_seq_length].\n\n  Returns:\n    float Tensor of shape [batch_size, from_seq_length, to_seq_length].\n  \"\"\"\n  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])\n  batch_size = from_shape[0]\n  from_seq_length = from_shape[1]\n\n  to_shape = 
get_shape_list(to_mask, expected_rank=2)\n  to_seq_length = to_shape[1]\n\n  to_mask = tf.cast(\n      tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32)\n\n  # We don't assume that `from_tensor` is a mask (although it could be). We\n  # don't actually care if we attend *from* padding tokens (only *to* padding\n  # tokens), so we create a tensor of all ones.\n  #\n  # `broadcast_ones` = [batch_size, from_seq_length, 1]\n  broadcast_ones = tf.ones(\n      shape=[batch_size, from_seq_length, 1], dtype=tf.float32)\n\n  # Here we broadcast along two dimensions to create the mask.\n  mask = broadcast_ones * to_mask\n\n  return mask\n\n\ndef attention_layer(from_tensor,\n                    to_tensor,\n                    attention_mask=None,\n                    num_attention_heads=1,\n                    size_per_head=512,\n                    query_act=None,\n                    key_act=None,\n                    value_act=None,\n                    attention_probs_dropout_prob=0.0,\n                    initializer_range=0.02,\n                    do_return_2d_tensor=False,\n                    batch_size=None,\n                    from_seq_length=None,\n                    to_seq_length=None):\n  \"\"\"Performs multi-headed attention from `from_tensor` to `to_tensor`.\n\n  This is an implementation of multi-headed attention based on \"Attention\n  is All You Need\". If `from_tensor` and `to_tensor` are the same, then\n  this is self-attention. Each timestep in `from_tensor` attends to the\n  corresponding sequence in `to_tensor`, and returns a fixed-width vector.\n\n  This function first projects `from_tensor` into a \"query\" tensor and\n  `to_tensor` into \"key\" and \"value\" tensors. These are (effectively) a list\n  of tensors of length `num_attention_heads`, where each tensor is of shape\n  [batch_size, seq_length, size_per_head].\n\n  Then, the query and key tensors are dot-producted and scaled. 
These are\n  softmaxed to obtain attention probabilities. The value tensors are then\n  interpolated by these probabilities, then concatenated back to a single\n  tensor and returned.\n\n  In practice, the multi-headed attention is done with transposes and\n  reshapes rather than actual separate tensors.\n\n  Args:\n    from_tensor: float Tensor of shape [batch_size, from_seq_length,\n      from_width].\n    to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width].\n    attention_mask: (optional) int32 Tensor of shape [batch_size,\n      from_seq_length, to_seq_length]. The values should be 1 or 0. The\n      attention scores will effectively be set to -infinity for any positions in\n      the mask that are 0, and will be unchanged for positions that are 1.\n    num_attention_heads: int. Number of attention heads.\n    size_per_head: int. Size of each attention head.\n    query_act: (optional) Activation function for the query transform.\n    key_act: (optional) Activation function for the key transform.\n    value_act: (optional) Activation function for the value transform.\n    attention_probs_dropout_prob: (optional) float. Dropout probability of the\n      attention probabilities.\n    initializer_range: float. Range of the weight initializer.\n    do_return_2d_tensor: bool. If True, the output will be of shape [batch_size\n      * from_seq_length, num_attention_heads * size_per_head]. If False, the\n      output will be of shape [batch_size, from_seq_length, num_attention_heads\n      * size_per_head].\n    batch_size: (Optional) int. 
If the input is 2D, this might be the batch size\n      of the 3D version of the `from_tensor` and `to_tensor`.\n    from_seq_length: (Optional) If the input is 2D, this might be the seq length\n      of the 3D version of the `from_tensor`.\n    to_seq_length: (Optional) If the input is 2D, this might be the seq length\n      of the 3D version of the `to_tensor`.\n\n  Returns:\n    float Tensor of shape [batch_size, from_seq_length,\n      num_attention_heads * size_per_head]. (If `do_return_2d_tensor` is\n      true, this will be of shape [batch_size * from_seq_length,\n      num_attention_heads * size_per_head]).\n\n  Raises:\n    ValueError: Any of the arguments or tensor shapes are invalid.\n  \"\"\"\n\n  def transpose_for_scores(input_tensor, batch_size, num_attention_heads,\n                           seq_length, width):\n    output_tensor = tf.reshape(\n        input_tensor, [batch_size, seq_length, num_attention_heads, width])\n\n    output_tensor = tf.transpose(output_tensor, [0, 2, 1, 3])\n    return output_tensor\n\n  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])\n  to_shape = get_shape_list(to_tensor, expected_rank=[2, 3])\n\n  if len(from_shape) != len(to_shape):\n    raise ValueError(\n        \"The rank of `from_tensor` must match the rank of `to_tensor`.\")\n\n  if len(from_shape) == 3:\n    batch_size = from_shape[0]\n    from_seq_length = from_shape[1]\n    to_seq_length = to_shape[1]\n  elif len(from_shape) == 2:\n    if (batch_size is None or from_seq_length is None or to_seq_length is None):\n      raise ValueError(\n          \"When passing in rank 2 tensors to attention_layer, the values \"\n          \"for `batch_size`, `from_seq_length`, and `to_seq_length` \"\n          \"must all be specified.\")\n\n  # Scalar dimensions referenced here:\n  #   B = batch size (number of sequences)\n  #   F = `from_tensor` sequence length\n  #   T = `to_tensor` sequence length\n  #   N = `num_attention_heads`\n  #   H = `size_per_head`\n\n 
 from_tensor_2d = reshape_to_matrix(from_tensor)\n  to_tensor_2d = reshape_to_matrix(to_tensor)\n\n  # `query_layer` = [B*F, N*H]\n  query_layer = tf.layers.dense(\n      from_tensor_2d,\n      num_attention_heads * size_per_head,\n      activation=query_act,\n      name=\"query\",\n      kernel_initializer=create_initializer(initializer_range))\n\n  # `key_layer` = [B*T, N*H]\n  key_layer = tf.layers.dense(\n      to_tensor_2d,\n      num_attention_heads * size_per_head,\n      activation=key_act,\n      name=\"key\",\n      kernel_initializer=create_initializer(initializer_range))\n\n  # `value_layer` = [B*T, N*H]\n  value_layer = tf.layers.dense(\n      to_tensor_2d,\n      num_attention_heads * size_per_head,\n      activation=value_act,\n      name=\"value\",\n      kernel_initializer=create_initializer(initializer_range))\n\n  # `query_layer` = [B, N, F, H]\n  query_layer = transpose_for_scores(query_layer, batch_size,\n                                     num_attention_heads, from_seq_length,\n                                     size_per_head)\n\n  # `key_layer` = [B, N, T, H]\n  key_layer = transpose_for_scores(key_layer, batch_size, num_attention_heads,\n                                   to_seq_length, size_per_head)\n\n  # Take the dot product between \"query\" and \"key\" to get the raw\n  # attention scores.\n  # `attention_scores` = [B, N, F, T]\n  attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)\n  attention_scores = tf.multiply(attention_scores,\n                                 1.0 / math.sqrt(float(size_per_head)))\n\n  if attention_mask is not None:\n    # `attention_mask` = [B, 1, F, T]\n    attention_mask = tf.expand_dims(attention_mask, axis=[1])\n\n    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for\n    # masked positions, this operation will create a tensor which is 0.0 for\n    # positions we want to attend and -10000.0 for masked positions.\n    adder = (1.0 - tf.cast(attention_mask, 
tf.float32)) * -10000.0\n\n    # Since we are adding it to the raw scores before the softmax, this is\n    # effectively the same as removing these entirely.\n    attention_scores += adder\n\n  # Normalize the attention scores to probabilities.\n  # `attention_probs` = [B, N, F, T]\n  attention_probs = tf.nn.softmax(attention_scores)\n\n  # This is actually dropping out entire tokens to attend to, which might\n  # seem a bit unusual, but is taken from the original Transformer paper.\n  attention_probs = dropout(attention_probs, attention_probs_dropout_prob)\n\n  # `value_layer` = [B, T, N, H]\n  value_layer = tf.reshape(\n      value_layer,\n      [batch_size, to_seq_length, num_attention_heads, size_per_head])\n\n  # `value_layer` = [B, N, T, H]\n  value_layer = tf.transpose(value_layer, [0, 2, 1, 3])\n\n  # `context_layer` = [B, N, F, H]\n  context_layer = tf.matmul(attention_probs, value_layer)\n\n  # `context_layer` = [B, F, N, H]\n  context_layer = tf.transpose(context_layer, [0, 2, 1, 3])\n\n  if do_return_2d_tensor:\n    # `context_layer` = [B*F, N*H]\n    context_layer = tf.reshape(\n        context_layer,\n        [batch_size * from_seq_length, num_attention_heads * size_per_head])\n  else:\n    # `context_layer` = [B, F, N*H]\n    context_layer = tf.reshape(\n        context_layer,\n        [batch_size, from_seq_length, num_attention_heads * size_per_head])\n\n  return context_layer\n\n\ndef transformer_model(input_tensor,\n                      attention_mask=None,\n                      hidden_size=768,\n                      num_hidden_layers=12,\n                      num_attention_heads=12,\n                      intermediate_size=3072,\n                      intermediate_act_fn=gelu,\n                      hidden_dropout_prob=0.1,\n                      attention_probs_dropout_prob=0.1,\n                      initializer_range=0.02,\n                      do_return_all_layers=False,\n                      share_parameter_across_layers=True):\n  
\"\"\"Multi-headed, multi-layer Transformer from \"Attention is All You Need\".\n\n  This is almost an exact implementation of the original Transformer encoder.\n\n  See the original paper:\n  https://arxiv.org/abs/1706.03762\n\n  Also see:\n  https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py\n\n  Args:\n    input_tensor: float Tensor of shape [batch_size, seq_length, hidden_size].\n    attention_mask: (optional) int32 Tensor of shape [batch_size, seq_length,\n      seq_length], with 1 for positions that can be attended to and 0 in\n      positions that should not be.\n    hidden_size: int. Hidden size of the Transformer.\n    num_hidden_layers: int. Number of layers (blocks) in the Transformer.\n    num_attention_heads: int. Number of attention heads in the Transformer.\n    intermediate_size: int. The size of the \"intermediate\" (a.k.a., feed\n      forward) layer.\n    intermediate_act_fn: function. The non-linear activation function to apply\n      to the output of the intermediate/feed-forward layer.\n    hidden_dropout_prob: float. Dropout probability for the hidden layers.\n    attention_probs_dropout_prob: float. Dropout probability of the attention\n      probabilities.\n    initializer_range: float. 
Range of the initializer (stddev of truncated\n      normal).\n    do_return_all_layers: Whether to also return all layers or just the final\n      layer.\n    share_parameter_across_layers: bool. If True, all layers reuse the variables\n      of a single \"layer_shared\" scope; if False, each layer gets its own\n      \"layer_%d\" scope.\n\n  Returns:\n    float Tensor of shape [batch_size, seq_length, hidden_size], the final\n    hidden layer of the Transformer.\n\n  Raises:\n    ValueError: A Tensor shape or parameter is invalid.\n  \"\"\"\n  if hidden_size % num_attention_heads != 0:\n    raise ValueError(\n        \"The hidden size (%d) is not a multiple of the number of attention \"\n        \"heads (%d)\" % (hidden_size, num_attention_heads))\n\n  attention_head_size = int(hidden_size / num_attention_heads)\n  input_shape = get_shape_list(input_tensor, expected_rank=3)\n  batch_size = input_shape[0]\n  seq_length = input_shape[1]\n  input_width = input_shape[2]\n\n  # The Transformer performs sum residuals on all layers so the input needs\n  # to be the same as the hidden size.\n  if input_width != hidden_size:\n    raise ValueError(\"The width of the input tensor (%d) != hidden size (%d)\" %\n                     (input_width, hidden_size))\n\n  # We keep the representation as a 2D tensor to avoid re-shaping it back and\n  # forth from a 3D tensor to a 2D tensor. Re-shapes are normally free on\n  # the GPU/CPU but may not be free on the TPU, so we want to minimize them to\n  # help the optimizer.\n  prev_output = reshape_to_matrix(input_tensor)\n\n  all_layer_outputs = []\n  for layer_idx in range(num_hidden_layers):\n    if share_parameter_across_layers:\n        name_variable_scope=\"layer_shared\"\n    else:\n        name_variable_scope=\"layer_%d\" % layer_idx\n    # Share all parameters across layers. Added by brightmart, 2019-09-28. 
Previously the scope name was: \"layer_%d\" % layer_idx\n    with tf.variable_scope(name_variable_scope, reuse=(share_parameter_across_layers and layer_idx > 0)):\n\n      layer_input = prev_output\n\n      with tf.variable_scope(\"attention\"):\n        attention_heads = []\n        with tf.variable_scope(\"self\"):\n          attention_head = attention_layer(\n              from_tensor=layer_input,\n              to_tensor=layer_input,\n              attention_mask=attention_mask,\n              num_attention_heads=num_attention_heads,\n              size_per_head=attention_head_size,\n              attention_probs_dropout_prob=attention_probs_dropout_prob,\n              initializer_range=initializer_range,\n              do_return_2d_tensor=True,\n              batch_size=batch_size,\n              from_seq_length=seq_length,\n              to_seq_length=seq_length)\n          attention_heads.append(attention_head)\n\n        attention_output = None\n        if len(attention_heads) == 1:\n          attention_output = attention_heads[0]\n        else:\n          # In the case where we have other sequences, we just concatenate\n          # them to the self-attention head before the projection.\n          attention_output = tf.concat(attention_heads, axis=-1)\n\n        # Run a linear projection of `hidden_size` then add a residual\n        # with `layer_input`.\n        with tf.variable_scope(\"output\"):\n          attention_output = tf.layers.dense(\n              attention_output,\n              hidden_size,\n              kernel_initializer=create_initializer(initializer_range))\n          attention_output = dropout(attention_output, hidden_dropout_prob)\n          attention_output = layer_norm(attention_output + layer_input)\n\n      # The activation is only applied to the \"intermediate\" hidden layer.\n      with tf.variable_scope(\"intermediate\"):\n        intermediate_output = tf.layers.dense(\n            attention_output,\n            
intermediate_size,\n            activation=intermediate_act_fn,\n            kernel_initializer=create_initializer(initializer_range))\n\n      # Down-project back to `hidden_size` then add the residual.\n      with tf.variable_scope(\"output\"):\n        layer_output = tf.layers.dense(\n            intermediate_output,\n            hidden_size,\n            kernel_initializer=create_initializer(initializer_range))\n        layer_output = dropout(layer_output, hidden_dropout_prob)\n        layer_output = layer_norm(layer_output + attention_output)\n        prev_output = layer_output\n        all_layer_outputs.append(layer_output)\n\n  if do_return_all_layers:\n    final_outputs = []\n    for layer_output in all_layer_outputs:\n      final_output = reshape_from_matrix(layer_output, input_shape)\n      final_outputs.append(final_output)\n    return final_outputs\n  else:\n    final_output = reshape_from_matrix(prev_output, input_shape)\n    return final_output\n\n\ndef get_shape_list(tensor, expected_rank=None, name=None):\n  \"\"\"Returns a list of the shape of tensor, preferring static dimensions.\n\n  Args:\n    tensor: A tf.Tensor object to find the shape of.\n    expected_rank: (optional) int or list of ints. The expected rank of `tensor`.\n      If this is specified and the `tensor` has a different rank, an exception\n      will be thrown.\n    name: Optional name of the tensor for the error message.\n\n  Returns:\n    A list of dimensions of the shape of tensor. 
All static dimensions will\n    be returned as python integers, and dynamic dimensions will be returned\n    as tf.Tensor scalars.\n  \"\"\"\n  if name is None:\n    name = tensor.name\n\n  if expected_rank is not None:\n    assert_rank(tensor, expected_rank, name)\n\n  shape = tensor.shape.as_list()\n\n  non_static_indexes = []\n  for (index, dim) in enumerate(shape):\n    if dim is None:\n      non_static_indexes.append(index)\n\n  if not non_static_indexes:\n    return shape\n\n  dyn_shape = tf.shape(tensor)\n  for index in non_static_indexes:\n    shape[index] = dyn_shape[index]\n  return shape\n\n\ndef reshape_to_matrix(input_tensor):\n  \"\"\"Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix).\"\"\"\n  ndims = input_tensor.shape.ndims\n  if ndims < 2:\n    raise ValueError(\"Input tensor must have at least rank 2. Shape = %s\" %\n                     (input_tensor.shape))\n  if ndims == 2:\n    return input_tensor\n\n  width = input_tensor.shape[-1]\n  output_tensor = tf.reshape(input_tensor, [-1, width])\n  return output_tensor\n\n\ndef reshape_from_matrix(output_tensor, orig_shape_list):\n  \"\"\"Reshapes a rank 2 tensor back to its original rank >= 2 tensor.\"\"\"\n  if len(orig_shape_list) == 2:\n    return output_tensor\n\n  output_shape = get_shape_list(output_tensor)\n\n  orig_dims = orig_shape_list[0:-1]\n  width = output_shape[-1]\n\n  return tf.reshape(output_tensor, orig_dims + [width])\n\n\ndef assert_rank(tensor, expected_rank, name=None):\n  \"\"\"Raises an exception if the tensor rank is not of the expected rank.\n\n  Args:\n    tensor: A tf.Tensor to check the rank of.\n    expected_rank: Python integer or list of integers, expected rank.\n    name: Optional name of the tensor for the error message.\n\n  Raises:\n    ValueError: If the expected shape doesn't match the actual shape.\n  \"\"\"\n  if name is None:\n    name = tensor.name\n\n  expected_rank_dict = {}\n  if isinstance(expected_rank, six.integer_types):\n    
expected_rank_dict[expected_rank] = True\n  else:\n    for x in expected_rank:\n      expected_rank_dict[x] = True\n\n  actual_rank = tensor.shape.ndims\n  if actual_rank not in expected_rank_dict:\n    scope_name = tf.get_variable_scope().name\n    raise ValueError(\n        \"For the tensor `%s` in scope `%s`, the actual rank \"\n        \"`%d` (shape = %s) is not equal to the expected rank `%s`\" %\n        (name, scope_name, actual_rank, str(tensor.shape), str(expected_rank)))\n\ndef prelln_transformer_model(input_tensor,\n\t\t\t\t\t\tattention_mask=None,\n\t\t\t\t\t\thidden_size=768,\n\t\t\t\t\t\tnum_hidden_layers=12,\n\t\t\t\t\t\tnum_attention_heads=12,\n\t\t\t\t\t\tintermediate_size=3072,\n\t\t\t\t\t\tintermediate_act_fn=gelu,\n\t\t\t\t\t\thidden_dropout_prob=0.1,\n\t\t\t\t\t\tattention_probs_dropout_prob=0.1,\n\t\t\t\t\t\tinitializer_range=0.02,\n\t\t\t\t\t\tdo_return_all_layers=False,\n\t\t\t\t\t\tshared_type='all', # None,\n\t\t\t\t\t\tadapter_fn=None):\n\t\"\"\"Multi-headed, multi-layer Transformer from \"Attention is All You Need\".\n\n\tThis is almost an exact implementation of the original Transformer encoder.\n\n\tSee the original paper:\n\thttps://arxiv.org/abs/1706.03762\n\n\tAlso see:\n\thttps://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py\n\n\tArgs:\n\t\tinput_tensor: float Tensor of shape [batch_size, seq_length, hidden_size].\n\t\tattention_mask: (optional) int32 Tensor of shape [batch_size, seq_length,\n\t\t\tseq_length], with 1 for positions that can be attended to and 0 in\n\t\t\tpositions that should not be.\n\t\thidden_size: int. Hidden size of the Transformer.\n\t\tnum_hidden_layers: int. Number of layers (blocks) in the Transformer.\n\t\tnum_attention_heads: int. Number of attention heads in the Transformer.\n\t\tintermediate_size: int. The size of the \"intermediate\" (a.k.a., feed\n\t\t\tforward) layer.\n\t\tintermediate_act_fn: function. 
The non-linear activation function to apply\n\t\t\tto the output of the intermediate/feed-forward layer.\n\t\thidden_dropout_prob: float. Dropout probability for the hidden layers.\n\t\tattention_probs_dropout_prob: float. Dropout probability of the attention\n\t\t\tprobabilities.\n\t\tinitializer_range: float. Range of the initializer (stddev of truncated\n\t\t\tnormal).\n\t\tdo_return_all_layers: Whether to also return all layers or just the final\n\t\t\tlayer.\n\t\tshared_type: str. Which variables are shared across layers: 'all',\n\t\t\t'attention', or 'ffn'. Any other value gives each layer its own\n\t\t\tvariables.\n\t\tadapter_fn: Unused.\n\n\tReturns:\n\t\tfloat Tensor of shape [batch_size, seq_length, hidden_size], the final\n\t\thidden layer of the Transformer, or a list with the output of every\n\t\tlayer if `do_return_all_layers` is True.\n\n\tRaises:\n\t\tValueError: A Tensor shape or parameter is invalid.\n\t\"\"\"\n\tif hidden_size % num_attention_heads != 0:\n\t\traise ValueError(\n\t\t\t\t\"The hidden size (%d) is not a multiple of the number of attention \"\n\t\t\t\t\"heads (%d)\" % (hidden_size, num_attention_heads))\n\n\tattention_head_size = int(hidden_size / num_attention_heads)\n\n\tinput_shape = bert_utils.get_shape_list(input_tensor, expected_rank=3)\n\tbatch_size = input_shape[0]\n\tseq_length = input_shape[1]\n\tinput_width = input_shape[2]\n\n\t# The Transformer performs sum residuals on all layers so the input needs\n\t# to be the same as the hidden size.\n\tif input_width != hidden_size:\n\t\traise ValueError(\"The width of the input tensor (%d) != hidden size (%d)\" %\n\t\t\t\t\t\t\t\t\t\t (input_width, hidden_size))\n\n\t# We keep the representation as a 2D tensor to avoid re-shaping it back and\n\t# forth from a 3D tensor to a 2D tensor. 
Re-shapes are normally free on\n\t# the GPU/CPU but may not be free on the TPU, so we want to minimize them to\n\t# help the optimizer.\n\tprev_output = bert_utils.reshape_to_matrix(input_tensor)\n\n\tdef layer_scope(idx, shared_type):\n\t\t# Returns the variable-scope names for layer `idx`. Names without the\n\t\t# index are reused by every layer, which is how parameters are shared.\n\t\tif shared_type == 'all':\n\t\t\ttmp = {\n\t\t\t\t\"layer\":\"layer_shared\",\n\t\t\t\t'attention':'attention',\n\t\t\t\t'intermediate':'intermediate',\n\t\t\t\t'output':'output'\n\t\t\t}\n\t\telif shared_type == 'attention':\n\t\t\ttmp = {\n\t\t\t\t\"layer\":\"layer_shared\",\n\t\t\t\t'attention':'attention',\n\t\t\t\t'intermediate':'intermediate_{}'.format(idx),\n\t\t\t\t'output':'output_{}'.format(idx)\n\t\t\t}\n\t\telif shared_type == 'ffn':\n\t\t\ttmp = {\n\t\t\t\t\"layer\":\"layer_shared\",\n\t\t\t\t'attention':'attention_{}'.format(idx),\n\t\t\t\t'intermediate':'intermediate',\n\t\t\t\t'output':'output'\n\t\t\t}\n\t\telse:\n\t\t\ttmp = {\n\t\t\t\t\"layer\":\"layer_{}\".format(idx),\n\t\t\t\t'attention':'attention',\n\t\t\t\t'intermediate':'intermediate',\n\t\t\t\t'output':'output'\n\t\t\t}\n\n\t\treturn tmp\n\n\tall_layer_outputs = []\n\n\tfor layer_idx in range(num_hidden_layers):\n\n\t\tidx_scope = layer_scope(layer_idx, shared_type)\n\n\t\twith tf.variable_scope(idx_scope['layer'], reuse=tf.AUTO_REUSE):\n\t\t\tlayer_input = prev_output\n\n\t\t\twith tf.variable_scope(idx_scope['attention'], reuse=tf.AUTO_REUSE):\n\t\t\t\tattention_heads = []\n\n\t\t\t\twith tf.variable_scope(\"output\", reuse=tf.AUTO_REUSE):\n\t\t\t\t\tlayer_input_pre = layer_norm(layer_input)\n\n\t\t\t\twith tf.variable_scope(\"self\"):\n\t\t\t\t\tattention_head = 
attention_layer(\n\t\t\t\t\t\t\tfrom_tensor=layer_input_pre,\n\t\t\t\t\t\t\tto_tensor=layer_input_pre,\n\t\t\t\t\t\t\tattention_mask=attention_mask,\n\t\t\t\t\t\t\tnum_attention_heads=num_attention_heads,\n\t\t\t\t\t\t\tsize_per_head=attention_head_size,\n\t\t\t\t\t\t\tattention_probs_dropout_prob=attention_probs_dropout_prob,\n\t\t\t\t\t\t\tinitializer_range=initializer_range,\n\t\t\t\t\t\t\tdo_return_2d_tensor=True,\n\t\t\t\t\t\t\tbatch_size=batch_size,\n\t\t\t\t\t\t\tfrom_seq_length=seq_length,\n\t\t\t\t\t\t\tto_seq_length=seq_length)\n\t\t\t\t\tattention_heads.append(attention_head)\n\n\t\t\t\tattention_output = None\n\t\t\t\tif len(attention_heads) == 1:\n\t\t\t\t\tattention_output = attention_heads[0]\n\t\t\t\telse:\n\t\t\t\t\t# In the case where we have other sequences, we just concatenate\n\t\t\t\t\t# them to the self-attention head before the projection.\n\t\t\t\t\tattention_output = tf.concat(attention_heads, axis=-1)\n\n\t\t\t\t# Run a linear projection of `hidden_size` then add a residual\n\t\t\t\t# with `layer_input`.\n\t\t\t\twith tf.variable_scope(\"output\", reuse=tf.AUTO_REUSE):\n\t\t\t\t\tattention_output = tf.layers.dense(\n\t\t\t\t\t\t\tattention_output,\n\t\t\t\t\t\t\thidden_size,\n\t\t\t\t\t\t\tkernel_initializer=create_initializer(initializer_range))\n\t\t\t\t\tattention_output = dropout(attention_output, hidden_dropout_prob)\n\n\t\t\t\t\t# attention_output = layer_norm(attention_output + layer_input)\n\t\t\t\t\tattention_output = attention_output + layer_input\n\n\t\t\twith tf.variable_scope(idx_scope['output'], reuse=tf.AUTO_REUSE):\n\t\t\t\tattention_output_pre = layer_norm(attention_output)\n\n\t\t\t# The activation is only applied to the \"intermediate\" hidden layer.\n\t\t\twith tf.variable_scope(idx_scope['intermediate'], reuse=tf.AUTO_REUSE):\n\t\t\t\tintermediate_output = 
tf.layers.dense(\n\t\t\t\t\t\tattention_output_pre,\n\t\t\t\t\t\tintermediate_size,\n\t\t\t\t\t\tactivation=intermediate_act_fn,\n\t\t\t\t\t\tkernel_initializer=create_initializer(initializer_range))\n\n\t\t\t# Down-project back to `hidden_size` then add the residual.\n\t\t\twith tf.variable_scope(idx_scope['output'], reuse=tf.AUTO_REUSE):\n\t\t\t\tlayer_output = tf.layers.dense(\n\t\t\t\t\t\tintermediate_output,\n\t\t\t\t\t\thidden_size,\n\t\t\t\t\t\tkernel_initializer=create_initializer(initializer_range))\n\t\t\t\tlayer_output = dropout(layer_output, hidden_dropout_prob)\n\n\t\t\t\t# layer_output = layer_norm(layer_output + attention_output)\n\t\t\t\tlayer_output = layer_output + attention_output\n\t\t\t\tprev_output = layer_output\n\t\t\t\tall_layer_outputs.append(layer_output)\n\n\tif do_return_all_layers:\n\t\tfinal_outputs = []\n\t\tfor layer_output in all_layer_outputs:\n\t\t\tfinal_output = bert_utils.reshape_from_matrix(layer_output, input_shape)\n\t\t\tfinal_outputs.append(final_output)\n\t\treturn final_outputs\n\telse:\n\t\tfinal_output = bert_utils.reshape_from_matrix(prev_output, input_shape)\n\t\treturn final_output\n"
  },
  {
    "path": "modeling_google.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n\"\"\"The main ALBERT model and related functions.\nFor a description of the algorithm, see https://arxiv.org/abs/1909.11942.\n\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport copy\nimport json\nimport math\nimport re\nimport numpy as np\nimport six\nfrom six.moves import range\nimport tensorflow as tf\n\n\nclass AlbertConfig(object):\n  \"\"\"Configuration for `AlbertModel`.\n  The default settings match the configuration of model `albert_xxlarge`.\n  \"\"\"\n\n  def __init__(self,\n               vocab_size,\n               embedding_size=128,\n               hidden_size=4096,\n               num_hidden_layers=12,\n               num_hidden_groups=1,\n               num_attention_heads=64,\n               intermediate_size=16384,\n               inner_group_num=1,\n               down_scale_factor=1,\n               hidden_act=\"gelu\",\n               hidden_dropout_prob=0,\n               attention_probs_dropout_prob=0,\n               max_position_embeddings=512,\n               type_vocab_size=2,\n               initializer_range=0.02):\n    \"\"\"Constructs AlbertConfig.\n    Args:\n      vocab_size: Vocabulary size of `inputs_ids` in `AlbertModel`.\n      embedding_size: size of voc embeddings.\n   
   hidden_size: Size of the encoder layers and the pooler layer.\n      num_hidden_layers: Number of hidden layers in the Transformer encoder.\n      num_hidden_groups: Number of group for the hidden layers, parameters in\n        the same group are shared.\n      num_attention_heads: Number of attention heads for each attention layer in\n        the Transformer encoder.\n      intermediate_size: The size of the \"intermediate\" (i.e., feed-forward)\n        layer in the Transformer encoder.\n      inner_group_num: int, number of inner repetition of attention and ffn.\n      down_scale_factor: float, the scale to apply\n      hidden_act: The non-linear activation function (function or string) in the\n        encoder and pooler.\n      hidden_dropout_prob: The dropout probability for all fully connected\n        layers in the embeddings, encoder, and pooler.\n      attention_probs_dropout_prob: The dropout ratio for the attention\n        probabilities.\n      max_position_embeddings: The maximum sequence length that this model might\n        ever be used with. 
Typically set this to something large just in case\n        (e.g., 512 or 1024 or 2048).\n      type_vocab_size: The vocabulary size of the `token_type_ids` passed into\n        `AlbertModel`.\n      initializer_range: The stdev of the truncated_normal_initializer for\n        initializing all weight matrices.\n    \"\"\"\n    self.vocab_size = vocab_size\n    self.embedding_size = embedding_size\n    self.hidden_size = hidden_size\n    self.num_hidden_layers = num_hidden_layers\n    self.num_hidden_groups = num_hidden_groups\n    self.num_attention_heads = num_attention_heads\n    self.inner_group_num = inner_group_num\n    self.down_scale_factor = down_scale_factor\n    self.hidden_act = hidden_act\n    self.intermediate_size = intermediate_size\n    self.hidden_dropout_prob = hidden_dropout_prob\n    self.attention_probs_dropout_prob = attention_probs_dropout_prob\n    self.max_position_embeddings = max_position_embeddings\n    self.type_vocab_size = type_vocab_size\n    self.initializer_range = initializer_range\n\n  @classmethod\n  def from_dict(cls, json_object):\n    \"\"\"Constructs a `AlbertConfig` from a Python dictionary of parameters.\"\"\"\n    config = AlbertConfig(vocab_size=None)\n    for (key, value) in six.iteritems(json_object):\n      config.__dict__[key] = value\n    return config\n\n  @classmethod\n  def from_json_file(cls, json_file):\n    \"\"\"Constructs a `AlbertConfig` from a json file of parameters.\"\"\"\n    with tf.gfile.GFile(json_file, \"r\") as reader:\n      text = reader.read()\n    return cls.from_dict(json.loads(text))\n\n  def to_dict(self):\n    \"\"\"Serializes this instance to a Python dictionary.\"\"\"\n    output = copy.deepcopy(self.__dict__)\n    return output\n\n  def to_json_string(self):\n    \"\"\"Serializes this instance to a JSON string.\"\"\"\n    return json.dumps(self.to_dict(), indent=2, sort_keys=True) + \"\\n\"\n\n\nclass AlbertModel(object):\n  \"\"\"BERT model (\"Bidirectional Encoder Representations from 
Transformers\").\n  Example usage:\n  ```python\n  # Already converted from strings into ids\n  input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])\n  input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])\n  token_type_ids = tf.constant([[0, 0, 1], [0, 1, 0]])\n  config = modeling.AlbertConfig(vocab_size=32000, hidden_size=512,\n    num_hidden_layers=8, num_attention_heads=8, intermediate_size=1024)\n  model = modeling.AlbertModel(config=config, is_training=True,\n    input_ids=input_ids, input_mask=input_mask, token_type_ids=token_type_ids)\n  label_embeddings = tf.get_variable(...)\n  pooled_output = model.get_pooled_output()\n  logits = tf.matmul(pooled_output, label_embeddings)\n  ...\n  ```\n  \"\"\"\n\n  def __init__(self,\n               config,\n               is_training,\n               input_ids,\n               input_mask=None,\n               token_type_ids=None,\n               use_one_hot_embeddings=False,\n               scope=None):\n    \"\"\"Constructor for AlbertModel.\n    Args:\n      config: `AlbertConfig` instance.\n      is_training: bool. True for training, False for evaluation. Controls\n        whether dropout will be applied.\n      input_ids: int32 Tensor of shape [batch_size, seq_length].\n      input_mask: (optional) int32 Tensor of shape [batch_size, seq_length].\n      token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].\n      use_one_hot_embeddings: (optional) bool. Whether to use one-hot word\n        embeddings or tf.embedding_lookup() for the word embeddings.\n      scope: (optional) variable scope. 
Defaults to \"bert\".\n    Raises:\n      ValueError: The config is invalid or one of the input tensor shapes\n        is invalid.\n    \"\"\"\n    config = copy.deepcopy(config)\n    if not is_training:\n      config.hidden_dropout_prob = 0.0\n      config.attention_probs_dropout_prob = 0.0\n\n    input_shape = get_shape_list(input_ids, expected_rank=2)\n    batch_size = input_shape[0]\n    seq_length = input_shape[1]\n\n    if input_mask is None:\n      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)\n\n    if token_type_ids is None:\n      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)\n\n    with tf.variable_scope(scope, default_name=\"bert\"):\n      with tf.variable_scope(\"embeddings\"):\n        # Perform embedding lookup on the word ids.\n        (self.word_embedding_output,\n         self.output_embedding_table) = embedding_lookup(\n            input_ids=input_ids,\n            vocab_size=config.vocab_size,\n            embedding_size=config.embedding_size,\n            initializer_range=config.initializer_range,\n            word_embedding_name=\"word_embeddings\",\n            use_one_hot_embeddings=use_one_hot_embeddings)\n\n        # Add positional embeddings and token type embeddings, then layer\n        # normalize and perform dropout.\n        self.embedding_output = embedding_postprocessor(\n            input_tensor=self.word_embedding_output,\n            use_token_type=True,\n            token_type_ids=token_type_ids,\n            token_type_vocab_size=config.type_vocab_size,\n            token_type_embedding_name=\"token_type_embeddings\",\n            use_position_embeddings=True,\n            position_embedding_name=\"position_embeddings\",\n            initializer_range=config.initializer_range,\n            max_position_embeddings=config.max_position_embeddings,\n            dropout_prob=config.hidden_dropout_prob)\n\n      with tf.variable_scope(\"encoder\"):\n\n        # Run the stacked 
transformer.\n        # `sequence_output` shape = [batch_size, seq_length, hidden_size].\n        self.all_encoder_layers = transformer_model(\n            input_tensor=self.embedding_output,\n            attention_mask=input_mask,\n            hidden_size=config.hidden_size,\n            num_hidden_layers=config.num_hidden_layers,\n            num_hidden_groups=config.num_hidden_groups,\n            num_attention_heads=config.num_attention_heads,\n            intermediate_size=config.intermediate_size,\n            inner_group_num=config.inner_group_num,\n            intermediate_act_fn=get_activation(config.hidden_act),\n            hidden_dropout_prob=config.hidden_dropout_prob,\n            attention_probs_dropout_prob=config.attention_probs_dropout_prob,\n            initializer_range=config.initializer_range,\n            do_return_all_layers=True)\n\n      self.sequence_output = self.all_encoder_layers[-1]\n      # The \"pooler\" converts the encoded sequence tensor of shape\n      # [batch_size, seq_length, hidden_size] to a tensor of shape\n      # [batch_size, hidden_size]. This is necessary for segment-level\n      # (or segment-pair-level) classification tasks where we need a fixed\n      # dimensional representation of the segment.\n      with tf.variable_scope(\"pooler\"):\n        # We \"pool\" the model by simply taking the hidden state corresponding\n        # to the first token. 
We assume that this has been pre-trained\n        first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)\n        self.pooled_output = tf.layers.dense(\n            first_token_tensor,\n            config.hidden_size,\n            activation=tf.tanh,\n            kernel_initializer=create_initializer(config.initializer_range))\n\n  def get_pooled_output(self):\n    return self.pooled_output\n\n  def get_sequence_output(self):\n    \"\"\"Gets final hidden layer of encoder.\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the final hidden of the transformer encoder.\n    \"\"\"\n    return self.sequence_output\n\n  def get_all_encoder_layers(self):\n    return self.all_encoder_layers\n\n  def get_word_embedding_output(self):\n    \"\"\"Get output of the word(piece) embedding lookup.\n    This is BEFORE positional embeddings and token type embeddings have been\n    added.\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the output of the word(piece) embedding layer.\n    \"\"\"\n    return self.word_embedding_output\n\n  def get_embedding_output(self):\n    \"\"\"Gets output of the embedding lookup (i.e., input to the transformer).\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the output of the embedding layer, after summing the word\n      embeddings with the positional embeddings and the token type embeddings,\n      then performing layer normalization. 
This is the input to the transformer.\n    \"\"\"\n    return self.embedding_output\n\n  def get_embedding_table(self):\n    return self.output_embedding_table\n\n\ndef gelu(x):\n  \"\"\"Gaussian Error Linear Unit (tanh approximation).\n  This is a smoother version of the ReLU.\n  Original paper: https://arxiv.org/abs/1606.08415\n  Args:\n    x: float Tensor to apply the activation to.\n  Returns:\n    `x` with the GELU activation applied.\n  \"\"\"\n  cdf = 0.5 * (1.0 + tf.tanh(\n      (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))\n  return x * cdf\n\n\ndef get_activation(activation_string):\n  \"\"\"Maps a string to a Python function, e.g., \"relu\" => `tf.nn.relu`.\n  Args:\n    activation_string: String name of the activation function.\n  Returns:\n    A Python function corresponding to the activation function. If\n    `activation_string` is None, empty, or \"linear\", this will return None.\n    If `activation_string` is not a string, it will return `activation_string`.\n  Raises:\n    ValueError: The `activation_string` does not correspond to a known\n      activation.\n  \"\"\"\n\n  # We assume that anything that's not a string is already an activation\n  # function, so we just return it.\n  if not isinstance(activation_string, six.string_types):\n    return activation_string\n\n  if not activation_string:\n    return None\n\n  act = activation_string.lower()\n  if act == \"linear\":\n    return None\n  elif act == \"relu\":\n    return tf.nn.relu\n  elif act == \"gelu\":\n    return gelu\n  elif act == \"tanh\":\n    return tf.tanh\n  else:\n    raise ValueError(\"Unsupported activation: %s\" % act)\n\n\ndef get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_group=0):\n  \"\"\"Compute the intersection of the current variables and checkpoint variables.\"\"\"\n  assignment_map = {}\n  initialized_variable_names = {}\n\n  name_to_variable = collections.OrderedDict()\n  for var in tvars:\n    name = var.name\n    m = re.match(\"^(.*):\\\\d+$\", name)\n    if m is not 
None:\n      name = m.group(1)\n    name_to_variable[name] = var\n  init_vars = tf.train.list_variables(init_checkpoint)\n  init_vars_name = [name for (name, _) in init_vars]\n\n  if num_of_group > 0:\n    assignment_map = []\n    for gid in range(num_of_group):\n      assignment_map.append(collections.OrderedDict())\n  else:\n    assignment_map = collections.OrderedDict()\n\n  for name in name_to_variable:\n    if name in init_vars_name:\n      tvar_name = name\n    elif (re.sub(r\"/group_\\d+/\", \"/group_0/\",\n                 six.ensure_str(name)) in init_vars_name and\n          num_of_group > 1):\n      tvar_name = re.sub(r\"/group_\\d+/\", \"/group_0/\", six.ensure_str(name))\n    elif (re.sub(r\"/ffn_\\d+/\", \"/ffn_1/\", six.ensure_str(name))\n          in init_vars_name and num_of_group > 1):\n      tvar_name = re.sub(r\"/ffn_\\d+/\", \"/ffn_1/\", six.ensure_str(name))\n    elif (re.sub(r\"/attention_\\d+/\", \"/attention_1/\", six.ensure_str(name))\n          in init_vars_name and num_of_group > 1):\n      tvar_name = re.sub(r\"/attention_\\d+/\", \"/attention_1/\",\n                         six.ensure_str(name))\n    else:\n      tf.logging.info(\"name %s does not get matched\", name)\n      continue\n    tf.logging.info(\"name %s match to %s\", name, tvar_name)\n    if num_of_group > 0:\n      group_matched = False\n      for gid in range(1, num_of_group):\n        if ((\"/group_\" + str(gid) + \"/\" in name) or\n            (\"/ffn_\" + str(gid) + \"/\" in name) or\n            (\"/attention_\" + str(gid) + \"/\" in name)):\n          group_matched = True\n          tf.logging.info(\"%s belongs to %dth\", name, gid)\n          assignment_map[gid][tvar_name] = name\n      if not group_matched:\n        assignment_map[0][tvar_name] = name\n    else:\n      assignment_map[tvar_name] = name\n    initialized_variable_names[name] = 1\n    initialized_variable_names[six.ensure_str(name) + \":0\"] = 1\n\n  return (assignment_map, 
initialized_variable_names)\n\n\ndef dropout(input_tensor, dropout_prob):\n  \"\"\"Perform dropout.\n  Args:\n    input_tensor: float Tensor.\n    dropout_prob: Python float. The probability of dropping out a value (NOT of\n      *keeping* a dimension as in `tf.nn.dropout`).\n  Returns:\n    A version of `input_tensor` with dropout applied.\n  \"\"\"\n  if dropout_prob is None or dropout_prob == 0.0:\n    return input_tensor\n\n  output = tf.nn.dropout(input_tensor, rate=dropout_prob)\n  return output\n\n\ndef layer_norm(input_tensor, name=None):\n  \"\"\"Run layer normalization on the last dimension of the tensor.\"\"\"\n  return tf.contrib.layers.layer_norm(\n      inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)\n\n\ndef layer_norm_and_dropout(input_tensor, dropout_prob, name=None):\n  \"\"\"Runs layer normalization followed by dropout.\"\"\"\n  output_tensor = layer_norm(input_tensor, name)\n  output_tensor = dropout(output_tensor, dropout_prob)\n  return output_tensor\n\n\ndef create_initializer(initializer_range=0.02):\n  \"\"\"Creates a `truncated_normal_initializer` with the given range.\"\"\"\n  return tf.truncated_normal_initializer(stddev=initializer_range)\n\n\ndef get_timing_signal_1d_given_position(channels,\n                                        position,\n                                        min_timescale=1.0,\n                                        max_timescale=1.0e4):\n  \"\"\"Get sinusoids of diff frequencies, with timing position given.\n  Adapted from add_timing_signal_1d_given_position in\n  //third_party/py/tensor2tensor/layers/common_attention.py\n  Args:\n    channels: scalar, size of timing embeddings to create. 
The number of\n        different timescales is equal to channels / 2.\n    position: a Tensor with shape [batch, seq_len]\n    min_timescale: a float\n    max_timescale: a float\n  Returns:\n    a Tensor of timing signals [batch, seq_len, channels]\n  \"\"\"\n  num_timescales = channels // 2\n  log_timescale_increment = (\n      math.log(float(max_timescale) / float(min_timescale)) /\n      (tf.to_float(num_timescales) - 1))\n  inv_timescales = min_timescale * tf.exp(\n      tf.to_float(tf.range(num_timescales)) * -log_timescale_increment)\n  scaled_time = (\n      tf.expand_dims(tf.to_float(position), 2) * tf.expand_dims(\n          tf.expand_dims(inv_timescales, 0), 0))\n  signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=2)\n  signal = tf.pad(signal, [[0, 0], [0, 0], [0, tf.mod(channels, 2)]])\n  return signal\n\n\ndef embedding_lookup(input_ids,\n                     vocab_size,\n                     embedding_size=128,\n                     initializer_range=0.02,\n                     word_embedding_name=\"word_embeddings\",\n                     use_one_hot_embeddings=False):\n  \"\"\"Looks up words embeddings for id tensor.\n  Args:\n    input_ids: int32 Tensor of shape [batch_size, seq_length] containing word\n      ids.\n    vocab_size: int. Size of the embedding vocabulary.\n    embedding_size: int. Width of the word embeddings.\n    initializer_range: float. Embedding initialization range.\n    word_embedding_name: string. Name of the embedding table.\n    use_one_hot_embeddings: bool. If True, use one-hot method for word\n      embeddings. 
If False, use `tf.nn.embedding_lookup()`.\n  Returns:\n    float Tensor of shape [batch_size, seq_length, embedding_size].\n  \"\"\"\n  # This function assumes that the input is of shape [batch_size, seq_length,\n  # num_inputs].\n  #\n  # If the input is a 2D tensor of shape [batch_size, seq_length], we\n  # reshape to [batch_size, seq_length, 1].\n  if input_ids.shape.ndims == 2:\n    input_ids = tf.expand_dims(input_ids, axis=[-1])\n\n  embedding_table = tf.get_variable(\n      name=word_embedding_name,\n      shape=[vocab_size, embedding_size],\n      initializer=create_initializer(initializer_range))\n\n  if use_one_hot_embeddings:\n    flat_input_ids = tf.reshape(input_ids, [-1])\n    one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size)\n    output = tf.matmul(one_hot_input_ids, embedding_table)\n  else:\n    output = tf.nn.embedding_lookup(embedding_table, input_ids)\n\n  input_shape = get_shape_list(input_ids)\n\n  output = tf.reshape(output,\n                      input_shape[0:-1] + [input_shape[-1] * embedding_size])\n  return (output, embedding_table)\n\n\ndef embedding_postprocessor(input_tensor,\n                            use_token_type=False,\n                            token_type_ids=None,\n                            token_type_vocab_size=16,\n                            token_type_embedding_name=\"token_type_embeddings\",\n                            use_position_embeddings=True,\n                            position_embedding_name=\"position_embeddings\",\n                            initializer_range=0.02,\n                            max_position_embeddings=512,\n                            dropout_prob=0.1):\n  \"\"\"Performs various post-processing on a word embedding tensor.\n  Args:\n    input_tensor: float Tensor of shape [batch_size, seq_length,\n      embedding_size].\n    use_token_type: bool. 
Whether to add embeddings for `token_type_ids`.\n    token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].\n      Must be specified if `use_token_type` is True.\n    token_type_vocab_size: int. The vocabulary size of `token_type_ids`.\n    token_type_embedding_name: string. The name of the embedding table variable\n      for token type ids.\n    use_position_embeddings: bool. Whether to add position embeddings for the\n      position of each token in the sequence.\n    position_embedding_name: string. The name of the embedding table variable\n      for positional embeddings.\n    initializer_range: float. Range of the weight initialization.\n    max_position_embeddings: int. Maximum sequence length that might ever be\n      used with this model. This can be longer than the sequence length of\n      input_tensor, but cannot be shorter.\n    dropout_prob: float. Dropout probability applied to the final output tensor.\n  Returns:\n    float tensor with same shape as `input_tensor`.\n  Raises:\n    ValueError: One of the tensor shapes or input values is invalid.\n  \"\"\"\n  input_shape = get_shape_list(input_tensor, expected_rank=3)\n  batch_size = input_shape[0]\n  seq_length = input_shape[1]\n  width = input_shape[2]\n\n  output = input_tensor\n\n  if use_token_type:\n    if token_type_ids is None:\n      raise ValueError(\"`token_type_ids` must be specified if \"\n                       \"`use_token_type` is True.\")\n    token_type_table = tf.get_variable(\n        name=token_type_embedding_name,\n        shape=[token_type_vocab_size, width],\n        initializer=create_initializer(initializer_range))\n    # This vocab will be small so we always do one-hot here, since it is always\n    # faster for a small vocabulary.\n    flat_token_type_ids = tf.reshape(token_type_ids, [-1])\n    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)\n    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)\n    
token_type_embeddings = tf.reshape(token_type_embeddings,\n                                       [batch_size, seq_length, width])\n    output += token_type_embeddings\n\n  if use_position_embeddings:\n    assert_op = tf.assert_less_equal(seq_length, max_position_embeddings)\n    with tf.control_dependencies([assert_op]):\n      full_position_embeddings = tf.get_variable(\n          name=position_embedding_name,\n          shape=[max_position_embeddings, width],\n          initializer=create_initializer(initializer_range))\n      # Since the position embedding table is a learned variable, we create it\n      # using a (long) sequence length `max_position_embeddings`. The actual\n      # sequence length might be shorter than this, for faster training of\n      # tasks that do not have long sequences.\n      #\n      # So `full_position_embeddings` is effectively an embedding table\n      # for position [0, 1, 2, ..., max_position_embeddings-1], and the current\n      # sequence has positions [0, 1, 2, ... 
seq_length-1], so we can just\n      # perform a slice.\n      position_embeddings = tf.slice(full_position_embeddings, [0, 0],\n                                     [seq_length, -1])\n      num_dims = len(output.shape.as_list())\n\n      # Only the last two dimensions are relevant (`seq_length` and `width`), so\n      # we broadcast among the first dimensions, which is typically just\n      # the batch size.\n      position_broadcast_shape = []\n      for _ in range(num_dims - 2):\n        position_broadcast_shape.append(1)\n      position_broadcast_shape.extend([seq_length, width])\n      position_embeddings = tf.reshape(position_embeddings,\n                                       position_broadcast_shape)\n      output += position_embeddings\n\n  output = layer_norm_and_dropout(output, dropout_prob)\n  return output\n\n\ndef dense_layer_3d(input_tensor,\n                   num_attention_heads,\n                   head_size,\n                   initializer,\n                   activation,\n                   name=None):\n  \"\"\"A dense layer with 3D kernel.\n  Args:\n    input_tensor: float Tensor of shape [batch, seq_length, hidden_size].\n    num_attention_heads: Number of attention heads.\n    head_size: The size per attention head.\n    initializer: Kernel initializer.\n    activation: Activation function.\n    name: The name scope of this layer.\n  Returns:\n    float Tensor of shape [batch, seq_length, num_attention_heads, head_size].\n  \"\"\"\n\n  input_shape = get_shape_list(input_tensor)\n  hidden_size = input_shape[2]\n\n  with tf.variable_scope(name):\n    w = tf.get_variable(\n        name=\"kernel\",\n        shape=[hidden_size, num_attention_heads * head_size],\n        initializer=initializer)\n    w = tf.reshape(w, [hidden_size, num_attention_heads, head_size])\n    b = tf.get_variable(\n        name=\"bias\",\n        shape=[num_attention_heads * head_size],\n        initializer=tf.zeros_initializer)\n    b = tf.reshape(b, [num_attention_heads, head_size])\n    ret = tf.einsum(\"BFH,HND->BFND\", 
input_tensor, w)\n    ret += b\n  if activation is not None:\n    return activation(ret)\n  else:\n    return ret\n\n\ndef dense_layer_3d_proj(input_tensor,\n                        hidden_size,\n                        head_size,\n                        initializer,\n                        activation,\n                        name=None):\n  \"\"\"A dense layer with 3D kernel for projection.\n  Args:\n    input_tensor: float Tensor of shape [batch, from_seq_length,\n      num_attention_heads, size_per_head].\n    hidden_size: The size of the output (hidden) dimension.\n    head_size: The size per attention head.\n    initializer: Kernel initializer.\n    activation: Activation function.\n    name: The name scope of this layer.\n  Returns:\n    float Tensor of shape [batch, from_seq_length, hidden_size].\n  \"\"\"\n  input_shape = get_shape_list(input_tensor)\n  num_attention_heads = input_shape[2]\n  with tf.variable_scope(name):\n    w = tf.get_variable(\n        name=\"kernel\",\n        shape=[num_attention_heads * head_size, hidden_size],\n        initializer=initializer)\n    w = tf.reshape(w, [num_attention_heads, head_size, hidden_size])\n    b = tf.get_variable(\n        name=\"bias\", shape=[hidden_size], initializer=tf.zeros_initializer)\n    ret = tf.einsum(\"BFND,NDH->BFH\", input_tensor, w)\n    ret += b\n  if activation is not None:\n    return activation(ret)\n  else:\n    return ret\n\n\ndef dense_layer_2d(input_tensor,\n                   output_size,\n                   initializer,\n                   activation,\n                   num_attention_heads=1,\n                   name=None):\n  \"\"\"A dense layer with 2D kernel.\n  Args:\n    input_tensor: Float tensor with rank 3.\n    output_size: The size of output dimension.\n    initializer: Kernel initializer.\n    activation: Activation function.\n    num_attention_heads: Unused; kept for interface compatibility.\n    name: The name scope of this layer.\n  Returns:\n    float logits Tensor.\n  \"\"\"\n 
 del num_attention_heads  # unused\n  input_shape = get_shape_list(input_tensor)\n  hidden_size = input_shape[2]\n  with tf.variable_scope(name):\n    w = tf.get_variable(\n        name=\"kernel\",\n        shape=[hidden_size, output_size],\n        initializer=initializer)\n    b = tf.get_variable(\n        name=\"bias\", shape=[output_size], initializer=tf.zeros_initializer)\n    ret = tf.einsum(\"BFH,HO->BFO\", input_tensor, w)\n    ret += b\n  if activation is not None:\n    return activation(ret)\n  else:\n    return ret\n\n\ndef dot_product_attention(q, k, v, bias, dropout_rate=0.0):\n  \"\"\"Dot-product attention.\n  Args:\n    q: Tensor with shape [..., length_q, depth_k].\n    k: Tensor with shape [..., length_kv, depth_k]. Leading dimensions must\n      match with q.\n    v: Tensor with shape [..., length_kv, depth_v] Leading dimensions must\n      match with q.\n    bias: bias Tensor (see attention_bias())\n    dropout_rate: a float.\n  Returns:\n    Tensor with shape [..., length_q, depth_v].\n  \"\"\"\n  logits = tf.matmul(q, k, transpose_b=True)  # [..., length_q, length_kv]\n  logits = tf.multiply(logits, 1.0 / math.sqrt(float(get_shape_list(q)[-1])))\n  if bias is not None:\n    # `attention_mask` = [B, T]\n    from_shape = get_shape_list(q)\n    if len(from_shape) == 4:\n      broadcast_ones = tf.ones([from_shape[0], 1, from_shape[2], 1], tf.float32)\n    elif len(from_shape) == 5:\n      # from_shape = [B, N, Block_num, block_size, depth]#\n      broadcast_ones = tf.ones([from_shape[0], 1, from_shape[2], from_shape[3],\n                                1], tf.float32)\n\n    bias = tf.matmul(broadcast_ones,\n                     tf.cast(bias, tf.float32), transpose_b=True)\n\n    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for\n    # masked positions, this operation will create a tensor which is 0.0 for\n    # positions we want to attend and -10000.0 for masked positions.\n    adder = (1.0 - bias) * -10000.0\n\n    # Since 
we are adding it to the raw scores before the softmax, this is\n    # effectively the same as removing these entirely.\n    logits += adder\n  else:\n    adder = 0.0\n\n  attention_probs = tf.nn.softmax(logits, name=\"attention_probs\")\n  attention_probs = dropout(attention_probs, dropout_rate)\n  return tf.matmul(attention_probs, v)\n\n\ndef attention_layer(from_tensor,\n                    to_tensor,\n                    attention_mask=None,\n                    num_attention_heads=1,\n                    query_act=None,\n                    key_act=None,\n                    value_act=None,\n                    attention_probs_dropout_prob=0.0,\n                    initializer_range=0.02,\n                    batch_size=None,\n                    from_seq_length=None,\n                    to_seq_length=None):\n  \"\"\"Performs multi-headed attention from `from_tensor` to `to_tensor`.\n  Args:\n    from_tensor: float Tensor of shape [batch_size, from_seq_length,\n      from_width].\n    to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width].\n    attention_mask: (optional) int32 Tensor of shape [batch_size,\n      from_seq_length, to_seq_length]. The values should be 1 or 0. The\n      attention scores will effectively be set to -infinity for any positions in\n      the mask that are 0, and will be unchanged for positions that are 1.\n    num_attention_heads: int. Number of attention heads.\n    query_act: (optional) Activation function for the query transform.\n    key_act: (optional) Activation function for the key transform.\n    value_act: (optional) Activation function for the value transform.\n    attention_probs_dropout_prob: (optional) float. Dropout probability of the\n      attention probabilities.\n    initializer_range: float. Range of the weight initializer.\n    batch_size: (Optional) int. 
If the input is 2D, this might be the batch size\n      of the 3D version of the `from_tensor` and `to_tensor`.\n    from_seq_length: (Optional) If the input is 2D, this might be the seq length\n      of the 3D version of the `from_tensor`.\n    to_seq_length: (Optional) If the input is 2D, this might be the seq length\n      of the 3D version of the `to_tensor`.\n  Returns:\n    float Tensor of shape [batch_size, from_seq_length, num_attention_heads,\n      size_per_head].\n  Raises:\n    ValueError: Any of the arguments or tensor shapes are invalid.\n  \"\"\"\n  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])\n  to_shape = get_shape_list(to_tensor, expected_rank=[2, 3])\n  size_per_head = int(from_shape[2]/num_attention_heads)\n\n  if len(from_shape) != len(to_shape):\n    raise ValueError(\n        \"The rank of `from_tensor` must match the rank of `to_tensor`.\")\n\n  if len(from_shape) == 3:\n    batch_size = from_shape[0]\n    from_seq_length = from_shape[1]\n    to_seq_length = to_shape[1]\n  elif len(from_shape) == 2:\n    if (batch_size is None or from_seq_length is None or to_seq_length is None):\n      raise ValueError(\n          \"When passing in rank 2 tensors to attention_layer, the values \"\n          \"for `batch_size`, `from_seq_length`, and `to_seq_length` \"\n          \"must all be specified.\")\n\n  # Scalar dimensions referenced here:\n  #   B = batch size (number of sequences)\n  #   F = `from_tensor` sequence length\n  #   T = `to_tensor` sequence length\n  #   N = `num_attention_heads`\n  #   H = `size_per_head`\n\n  # `query_layer` = [B, F, N, H]\n  q = dense_layer_3d(from_tensor, num_attention_heads, size_per_head,\n                     create_initializer(initializer_range), query_act, \"query\")\n\n  # `key_layer` = [B, T, N, H]\n  k = dense_layer_3d(to_tensor, num_attention_heads, size_per_head,\n                     create_initializer(initializer_range), key_act, \"key\")\n  # `value_layer` = [B, T, N, H]\n  v = 
dense_layer_3d(to_tensor, num_attention_heads, size_per_head,\n                     create_initializer(initializer_range), value_act, \"value\")\n  q = tf.transpose(q, [0, 2, 1, 3])\n  k = tf.transpose(k, [0, 2, 1, 3])\n  v = tf.transpose(v, [0, 2, 1, 3])\n  if attention_mask is not None:\n    attention_mask = tf.reshape(\n        attention_mask, [batch_size, 1, to_seq_length, 1])\n    # 'new_embeddings = [B, N, F, H]'\n  new_embeddings = dot_product_attention(q, k, v, attention_mask,\n                                         attention_probs_dropout_prob)\n\n  return tf.transpose(new_embeddings, [0, 2, 1, 3])\n\n\ndef attention_ffn_block(layer_input,\n                        hidden_size=768,\n                        attention_mask=None,\n                        num_attention_heads=1,\n                        attention_head_size=64,\n                        attention_probs_dropout_prob=0.0,\n                        intermediate_size=3072,\n                        intermediate_act_fn=None,\n                        initializer_range=0.02,\n                        hidden_dropout_prob=0.0):\n  \"\"\"A network with attention-ffn as sub-block.\n  Args:\n    layer_input: float Tensor of shape [batch_size, from_seq_length,\n      from_width].\n    hidden_size: (optional) int, size of hidden layer.\n    attention_mask: (optional) int32 Tensor of shape [batch_size,\n      from_seq_length, to_seq_length]. The values should be 1 or 0. The\n      attention scores will effectively be set to -infinity for any positions in\n      the mask that are 0, and will be unchanged for positions that are 1.\n    num_attention_heads: int. Number of attention heads.\n    attention_head_size: int. Size of attention head.\n    attention_probs_dropout_prob: float. dropout probability for attention_layer\n    intermediate_size: int. Size of intermediate hidden layer.\n    intermediate_act_fn: (optional) Activation function for the intermediate\n      layer.\n    initializer_range: float. 
Range of the weight initializer.\n    hidden_dropout_prob: (optional) float. Dropout probability of the hidden\n      layer.\n  Returns:\n    layer output\n  \"\"\"\n\n  with tf.variable_scope(\"attention_1\"):\n    with tf.variable_scope(\"self\"):\n      attention_output = attention_layer(\n          from_tensor=layer_input,\n          to_tensor=layer_input,\n          attention_mask=attention_mask,\n          num_attention_heads=num_attention_heads,\n          attention_probs_dropout_prob=attention_probs_dropout_prob,\n          initializer_range=initializer_range)\n\n    # Run a linear projection of `hidden_size` then add a residual\n    # with `layer_input`.\n    with tf.variable_scope(\"output\"):\n      attention_output = dense_layer_3d_proj(\n          attention_output,\n          hidden_size,\n          attention_head_size,\n          create_initializer(initializer_range),\n          None,\n          name=\"dense\")\n      attention_output = dropout(attention_output, hidden_dropout_prob)\n  attention_output = layer_norm(attention_output + layer_input)\n  with tf.variable_scope(\"ffn_1\"):\n    with tf.variable_scope(\"intermediate\"):\n      intermediate_output = dense_layer_2d(\n          attention_output,\n          intermediate_size,\n          create_initializer(initializer_range),\n          intermediate_act_fn,\n          num_attention_heads=num_attention_heads,\n          name=\"dense\")\n      with tf.variable_scope(\"output\"):\n        ffn_output = dense_layer_2d(\n            intermediate_output,\n            hidden_size,\n            create_initializer(initializer_range),\n            None,\n            num_attention_heads=num_attention_heads,\n            name=\"dense\")\n      ffn_output = dropout(ffn_output, hidden_dropout_prob)\n  ffn_output = layer_norm(ffn_output + attention_output)\n  return ffn_output\n\n\ndef transformer_model(input_tensor,\n                      attention_mask=None,\n                      hidden_size=768,\n            
          num_hidden_layers=12,\n                      num_hidden_groups=12,\n                      num_attention_heads=12,\n                      intermediate_size=3072,\n                      inner_group_num=1,\n                      intermediate_act_fn=\"gelu\",\n                      hidden_dropout_prob=0.1,\n                      attention_probs_dropout_prob=0.1,\n                      initializer_range=0.02,\n                      do_return_all_layers=False):\n  \"\"\"Multi-headed, multi-layer Transformer from \"Attention is All You Need\".\n  This is almost an exact implementation of the original Transformer encoder.\n  See the original paper:\n  https://arxiv.org/abs/1706.03762\n  Also see:\n  https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py\n  Args:\n    input_tensor: float Tensor of shape [batch_size, seq_length, hidden_size].\n    attention_mask: (optional) int32 Tensor of shape [batch_size, seq_length,\n      seq_length], with 1 for positions that can be attended to and 0 in\n      positions that should not be.\n    hidden_size: int. Hidden size of the Transformer.\n    num_hidden_layers: int. Number of layers (blocks) in the Transformer.\n    num_hidden_groups: int. Number of group for the hidden layers, parameters\n      in the same group are shared.\n    num_attention_heads: int. Number of attention heads in the Transformer.\n    intermediate_size: int. The size of the \"intermediate\" (a.k.a., feed\n      forward) layer.\n    inner_group_num: int, number of inner repetition of attention and ffn.\n    intermediate_act_fn: function. The non-linear activation function to apply\n      to the output of the intermediate/feed-forward layer.\n    hidden_dropout_prob: float. Dropout probability for the hidden layers.\n    attention_probs_dropout_prob: float. Dropout probability of the attention\n      probabilities.\n    initializer_range: float. 
Range of the initializer (stddev of truncated\n      normal).\n    do_return_all_layers: Whether to also return all layers or just the final\n      layer.\n  Returns:\n    float Tensor of shape [batch_size, seq_length, hidden_size], the final\n    hidden layer of the Transformer.\n  Raises:\n    ValueError: A Tensor shape or parameter is invalid.\n  \"\"\"\n  if hidden_size % num_attention_heads != 0:\n    raise ValueError(\n        \"The hidden size (%d) is not a multiple of the number of attention \"\n        \"heads (%d)\" % (hidden_size, num_attention_heads))\n\n  attention_head_size = hidden_size // num_attention_heads\n  input_shape = get_shape_list(input_tensor, expected_rank=3)\n  input_width = input_shape[2]\n\n  all_layer_outputs = []\n  if input_width != hidden_size:\n    prev_output = dense_layer_2d(\n        input_tensor, hidden_size, create_initializer(initializer_range),\n        None, name=\"embedding_hidden_mapping_in\")\n  else:\n    prev_output = input_tensor\n  with tf.variable_scope(\"transformer\", reuse=tf.AUTO_REUSE):\n    for layer_idx in range(num_hidden_layers):\n      group_idx = int(layer_idx / num_hidden_layers * num_hidden_groups)\n      with tf.variable_scope(\"group_%d\" % group_idx):\n        with tf.name_scope(\"layer_%d\" % layer_idx):\n          layer_output = prev_output\n          for inner_group_idx in range(inner_group_num):\n            with tf.variable_scope(\"inner_group_%d\" % inner_group_idx):\n              layer_output = attention_ffn_block(\n                  layer_output, hidden_size, attention_mask,\n                  num_attention_heads, attention_head_size,\n                  attention_probs_dropout_prob, intermediate_size,\n                  intermediate_act_fn, initializer_range, hidden_dropout_prob)\n              prev_output = layer_output\n              all_layer_outputs.append(layer_output)\n  if do_return_all_layers:\n    return all_layer_outputs\n  else:\n    return all_layer_outputs[-1]\n\n\ndef 
get_shape_list(tensor, expected_rank=None, name=None):\n  \"\"\"Returns a list of the shape of tensor, preferring static dimensions.\n  Args:\n    tensor: A tf.Tensor object to find the shape of.\n    expected_rank: (optional) int or list of ints. The expected rank of\n      `tensor`. If this is specified and the `tensor` has a different rank,\n      an exception will be thrown.\n    name: Optional name of the tensor for the error message.\n  Returns:\n    A list of dimensions of the shape of tensor. All static dimensions will\n    be returned as Python integers, and dynamic dimensions will be returned\n    as tf.Tensor scalars.\n  \"\"\"\n  if name is None:\n    name = tensor.name\n\n  if expected_rank is not None:\n    assert_rank(tensor, expected_rank, name)\n\n  shape = tensor.shape.as_list()\n\n  non_static_indexes = []\n  for (index, dim) in enumerate(shape):\n    if dim is None:\n      non_static_indexes.append(index)\n\n  if not non_static_indexes:\n    return shape\n\n  dyn_shape = tf.shape(tensor)\n  for index in non_static_indexes:\n    shape[index] = dyn_shape[index]\n  return shape\n\n\ndef reshape_to_matrix(input_tensor):\n  \"\"\"Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix).\"\"\"\n  ndims = input_tensor.shape.ndims\n  if ndims < 2:\n    raise ValueError(\"Input tensor must have at least rank 2. 
Shape = %s\" %\n                     (input_tensor.shape))\n  if ndims == 2:\n    return input_tensor\n\n  width = input_tensor.shape[-1]\n  output_tensor = tf.reshape(input_tensor, [-1, width])\n  return output_tensor\n\n\ndef reshape_from_matrix(output_tensor, orig_shape_list):\n  \"\"\"Reshapes a rank 2 tensor back to its original rank >= 2 tensor.\"\"\"\n  if len(orig_shape_list) == 2:\n    return output_tensor\n\n  output_shape = get_shape_list(output_tensor)\n\n  orig_dims = orig_shape_list[0:-1]\n  width = output_shape[-1]\n\n  return tf.reshape(output_tensor, orig_dims + [width])\n\n\ndef assert_rank(tensor, expected_rank, name=None):\n  \"\"\"Raises an exception if the tensor rank is not of the expected rank.\n  Args:\n    tensor: A tf.Tensor to check the rank of.\n    expected_rank: Python integer or list of integers, expected rank.\n    name: Optional name of the tensor for the error message.\n  Raises:\n    ValueError: If the expected shape doesn't match the actual shape.\n  \"\"\"\n  if name is None:\n    name = tensor.name\n\n  expected_rank_dict = {}\n  if isinstance(expected_rank, six.integer_types):\n    expected_rank_dict[expected_rank] = True\n  else:\n    for x in expected_rank:\n      expected_rank_dict[x] = True\n\n  actual_rank = tensor.shape.ndims\n  if actual_rank not in expected_rank_dict:\n    scope_name = tf.get_variable_scope().name\n    raise ValueError(\n        \"For the tensor `%s` in scope `%s`, the actual rank \"\n        \"`%d` (shape = %s) is not equal to the expected rank `%s`\" %\n        (name, scope_name, actual_rank, str(tensor.shape), str(expected_rank)))"
  },
  {
    "path": "modeling_google_fast.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n\"\"\"The main ALBERT model and related functions.\nFor a description of the algorithm, see https://arxiv.org/abs/1909.11942.\n\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport copy\nimport json\nimport math\nimport re\nimport numpy as np\nimport six\nfrom six.moves import range\nimport tensorflow as tf\n\n\nclass AlbertConfig(object):\n  \"\"\"Configuration for `AlbertModel`.\n  The default settings match the configuration of model `albert_xxlarge`.\n  \"\"\"\n\n  def __init__(self,\n               vocab_size,\n               embedding_size=128,\n               hidden_size=4096,\n               num_hidden_layers=12,\n               num_hidden_groups=1,\n               num_attention_heads=64,\n               intermediate_size=16384,\n               inner_group_num=1,\n               down_scale_factor=1,\n               hidden_act=\"gelu\",\n               hidden_dropout_prob=0,\n               attention_probs_dropout_prob=0,\n               max_position_embeddings=512,\n               type_vocab_size=2,\n               initializer_range=0.02):\n    \"\"\"Constructs AlbertConfig.\n    Args:\n      vocab_size: Vocabulary size of `inputs_ids` in `AlbertModel`.\n      embedding_size: size of voc embeddings.\n   
   hidden_size: Size of the encoder layers and the pooler layer.\n      num_hidden_layers: Number of hidden layers in the Transformer encoder.\n      num_hidden_groups: Number of group for the hidden layers, parameters in\n        the same group are shared.\n      num_attention_heads: Number of attention heads for each attention layer in\n        the Transformer encoder.\n      intermediate_size: The size of the \"intermediate\" (i.e., feed-forward)\n        layer in the Transformer encoder.\n      inner_group_num: int, number of inner repetition of attention and ffn.\n      down_scale_factor: float, the scale to apply\n      hidden_act: The non-linear activation function (function or string) in the\n        encoder and pooler.\n      hidden_dropout_prob: The dropout probability for all fully connected\n        layers in the embeddings, encoder, and pooler.\n      attention_probs_dropout_prob: The dropout ratio for the attention\n        probabilities.\n      max_position_embeddings: The maximum sequence length that this model might\n        ever be used with. 
Typically set this to something large just in case\n        (e.g., 512 or 1024 or 2048).\n      type_vocab_size: The vocabulary size of the `token_type_ids` passed into\n        `AlbertModel`.\n      initializer_range: The stdev of the truncated_normal_initializer for\n        initializing all weight matrices.\n    \"\"\"\n    self.vocab_size = vocab_size\n    self.embedding_size = embedding_size\n    self.hidden_size = hidden_size\n    self.num_hidden_layers = num_hidden_layers\n    self.num_hidden_groups = num_hidden_groups\n    self.num_attention_heads = num_attention_heads\n    self.inner_group_num = inner_group_num\n    self.down_scale_factor = down_scale_factor\n    self.hidden_act = hidden_act\n    self.intermediate_size = intermediate_size\n    self.hidden_dropout_prob = hidden_dropout_prob\n    self.attention_probs_dropout_prob = attention_probs_dropout_prob\n    self.max_position_embeddings = max_position_embeddings\n    self.type_vocab_size = type_vocab_size\n    self.initializer_range = initializer_range\n\n  @classmethod\n  def from_dict(cls, json_object):\n    \"\"\"Constructs a `AlbertConfig` from a Python dictionary of parameters.\"\"\"\n    config = AlbertConfig(vocab_size=None)\n    for (key, value) in six.iteritems(json_object):\n      config.__dict__[key] = value\n    return config\n\n  @classmethod\n  def from_json_file(cls, json_file):\n    \"\"\"Constructs a `AlbertConfig` from a json file of parameters.\"\"\"\n    with tf.gfile.GFile(json_file, \"r\") as reader:\n      text = reader.read()\n    return cls.from_dict(json.loads(text))\n\n  def to_dict(self):\n    \"\"\"Serializes this instance to a Python dictionary.\"\"\"\n    output = copy.deepcopy(self.__dict__)\n    return output\n\n  def to_json_string(self):\n    \"\"\"Serializes this instance to a JSON string.\"\"\"\n    return json.dumps(self.to_dict(), indent=2, sort_keys=True) + \"\\n\"\n\n\nclass AlbertModel(object):\n  \"\"\"BERT model (\"Bidirectional Encoder Representations from 
Transformers\").\n  Example usage:\n  ```python\n  # Already converted from strings into ids\n  input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])\n  input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])\n  token_type_ids = tf.constant([[0, 0, 1], [0, 1, 0]])\n  config = modeling.AlbertConfig(vocab_size=32000, hidden_size=512,\n    num_hidden_layers=8, num_attention_heads=8, intermediate_size=1024)\n  model = modeling.AlbertModel(config=config, is_training=True,\n    input_ids=input_ids, input_mask=input_mask, token_type_ids=token_type_ids)\n  label_embeddings = tf.get_variable(...)\n  pooled_output = model.get_pooled_output()\n  logits = tf.matmul(pooled_output, label_embeddings)\n  ...\n  ```\n  \"\"\"\n\n  def __init__(self,\n               config,\n               is_training,\n               input_ids,\n               input_mask=None,\n               token_type_ids=None,\n               use_one_hot_embeddings=False,\n               scope=None):\n    \"\"\"Constructor for AlbertModel.\n    Args:\n      config: `AlbertConfig` instance.\n      is_training: bool. True for training model, False for eval model. Controls\n        whether dropout will be applied.\n      input_ids: int32 Tensor of shape [batch_size, seq_length].\n      input_mask: (optional) int32 Tensor of shape [batch_size, seq_length].\n      token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].\n      use_one_hot_embeddings: (optional) bool. Whether to use one-hot word\n        embeddings or tf.embedding_lookup() for the word embeddings.\n      scope: (optional) variable scope. 
Defaults to \"bert\".\n    Raises:\n      ValueError: The config is invalid or one of the input tensor shapes\n        is invalid.\n    \"\"\"\n    config = copy.deepcopy(config)\n    if not is_training:\n      config.hidden_dropout_prob = 0.0\n      config.attention_probs_dropout_prob = 0.0\n\n    input_shape = get_shape_list(input_ids, expected_rank=2)\n    batch_size = input_shape[0]\n    seq_length = input_shape[1]\n\n    if input_mask is None:\n      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)\n\n    if token_type_ids is None:\n      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)\n\n    with tf.variable_scope(scope, default_name=\"bert\"):\n      with tf.variable_scope(\"embeddings\"):\n        # Perform embedding lookup on the word ids.\n        (self.word_embedding_output,\n         self.output_embedding_table) = embedding_lookup(\n            input_ids=input_ids,\n            vocab_size=config.vocab_size,\n            embedding_size=config.embedding_size,\n            initializer_range=config.initializer_range,\n            word_embedding_name=\"word_embeddings\",\n            use_one_hot_embeddings=use_one_hot_embeddings)\n\n        # Add positional embeddings and token type embeddings, then layer\n        # normalize and perform dropout.\n        self.embedding_output = embedding_postprocessor(\n            input_tensor=self.word_embedding_output,\n            use_token_type=True,\n            token_type_ids=token_type_ids,\n            token_type_vocab_size=config.type_vocab_size,\n            token_type_embedding_name=\"token_type_embeddings\",\n            use_position_embeddings=True,\n            position_embedding_name=\"position_embeddings\",\n            initializer_range=config.initializer_range,\n            max_position_embeddings=config.max_position_embeddings,\n            dropout_prob=config.hidden_dropout_prob)\n\n      with tf.variable_scope(\"encoder\"):\n\n        # Run the stacked 
transformer.\n        # `sequence_output` shape = [batch_size, seq_length, hidden_size].\n        self.all_encoder_layers = transformer_model(\n            input_tensor=self.embedding_output,\n            attention_mask=input_mask,\n            hidden_size=config.hidden_size,\n            num_hidden_layers=config.num_hidden_layers,\n            num_hidden_groups=config.num_hidden_groups,\n            num_attention_heads=config.num_attention_heads,\n            intermediate_size=config.intermediate_size,\n            inner_group_num=config.inner_group_num,\n            intermediate_act_fn=get_activation(config.hidden_act),\n            hidden_dropout_prob=config.hidden_dropout_prob,\n            attention_probs_dropout_prob=config.attention_probs_dropout_prob,\n            initializer_range=config.initializer_range,\n            do_return_all_layers=True)\n\n      self.sequence_output = self.all_encoder_layers[-1]\n      # The \"pooler\" converts the encoded sequence tensor of shape\n      # [batch_size, seq_length, hidden_size] to a tensor of shape\n      # [batch_size, hidden_size]. This is necessary for segment-level\n      # (or segment-pair-level) classification tasks where we need a fixed\n      # dimensional representation of the segment.\n      with tf.variable_scope(\"pooler\"):\n        # We \"pool\" the model by simply taking the hidden state corresponding\n        # to the first token. 
We assume that this has been pre-trained\n        first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)\n        self.pooled_output = tf.layers.dense(\n            first_token_tensor,\n            config.hidden_size,\n            activation=tf.tanh,\n            kernel_initializer=create_initializer(config.initializer_range))\n\n  def get_pooled_output(self):\n    return self.pooled_output\n\n  def get_sequence_output(self):\n    \"\"\"Gets final hidden layer of encoder.\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the final hidden of the transformer encoder.\n    \"\"\"\n    return self.sequence_output\n\n  def get_all_encoder_layers(self):\n    return self.all_encoder_layers\n\n  def get_word_embedding_output(self):\n    \"\"\"Get output of the word(piece) embedding lookup.\n    This is BEFORE positional embeddings and token type embeddings have been\n    added.\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the output of the word(piece) embedding layer.\n    \"\"\"\n    return self.word_embedding_output\n\n  def get_embedding_output(self):\n    \"\"\"Gets output of the embedding lookup (i.e., input to the transformer).\n    Returns:\n      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding\n      to the output of the embedding layer, after summing the word\n      embeddings with the positional embeddings and the token type embeddings,\n      then performing layer normalization. 
This is the input to the transformer.\n    \"\"\"\n    return self.embedding_output\n\n  def get_embedding_table(self):\n    return self.output_embedding_table\n\n\ndef gelu(x):\n  \"\"\"Gaussian Error Linear Unit.\n  This is a smoother version of ReLU.\n  Original paper: https://arxiv.org/abs/1606.08415\n  Args:\n    x: float Tensor to apply the activation to.\n  Returns:\n    `x` with the GELU activation applied.\n  \"\"\"\n  cdf = 0.5 * (1.0 + tf.tanh(\n      (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))\n  return x * cdf\n\n\ndef get_activation(activation_string):\n  \"\"\"Maps a string to a Python function, e.g., \"relu\" => `tf.nn.relu`.\n  Args:\n    activation_string: String name of the activation function.\n  Returns:\n    A Python function corresponding to the activation function. If\n    `activation_string` is None, empty, or \"linear\", this will return None.\n    If `activation_string` is not a string, it will return `activation_string`.\n  Raises:\n    ValueError: The `activation_string` does not correspond to a known\n      activation.\n  \"\"\"\n\n  # We assume that anything that's not a string is already an activation\n  # function, so we just return it.\n  if not isinstance(activation_string, six.string_types):\n    return activation_string\n\n  if not activation_string:\n    return None\n\n  act = activation_string.lower()\n  if act == \"linear\":\n    return None\n  elif act == \"relu\":\n    return tf.nn.relu\n  elif act == \"gelu\":\n    return gelu\n  elif act == \"tanh\":\n    return tf.tanh\n  elif act == \"swish\":\n    return lambda x: x * tf.sigmoid(x)\n  else:\n    raise ValueError(\"Unsupported activation: %s\" % act)\n\n\ndef get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_group=0):\n  \"\"\"Compute the union of the current variables and checkpoint variables.\"\"\"\n  assignment_map = {}\n  initialized_variable_names = {}\n\n  name_to_variable = collections.OrderedDict()\n  for var in tvars:\n    name = 
var.name\n    m = re.match(\"^(.*):\\\\d+$\", name)\n    if m is not None:\n      name = m.group(1)\n    name_to_variable[name] = var\n  init_vars = tf.train.list_variables(init_checkpoint)\n  init_vars_name = [name for (name, _) in init_vars]\n\n  if num_of_group > 0:\n    assignment_map = []\n    for gid in range(num_of_group):\n      assignment_map.append(collections.OrderedDict())\n  else:\n    assignment_map = collections.OrderedDict()\n\n  for name in name_to_variable:\n    if name in init_vars_name:\n      tvar_name = name\n    elif (re.sub(r\"/group_\\d+/\", \"/group_0/\",\n                 six.ensure_str(name)) in init_vars_name and\n          num_of_group > 1):\n      tvar_name = re.sub(r\"/group_\\d+/\", \"/group_0/\", six.ensure_str(name))\n    elif (re.sub(r\"/ffn_\\d+/\", \"/ffn_1/\", six.ensure_str(name))\n          in init_vars_name and num_of_group > 1):\n      tvar_name = re.sub(r\"/ffn_\\d+/\", \"/ffn_1/\", six.ensure_str(name))\n    elif (re.sub(r\"/attention_\\d+/\", \"/attention_1/\", six.ensure_str(name))\n          in init_vars_name and num_of_group > 1):\n      tvar_name = re.sub(r\"/attention_\\d+/\", \"/attention_1/\",\n                         six.ensure_str(name))\n    else:\n      tf.logging.info(\"name %s does not get matched\", name)\n      continue\n    tf.logging.info(\"name %s match to %s\", name, tvar_name)\n    if num_of_group > 0:\n      group_matched = False\n      for gid in range(1, num_of_group):\n        if ((\"/group_\" + str(gid) + \"/\" in name) or\n            (\"/ffn_\" + str(gid) + \"/\" in name) or\n            (\"/attention_\" + str(gid) + \"/\" in name)):\n          group_matched = True\n          tf.logging.info(\"%s belongs to %dth\", name, gid)\n          assignment_map[gid][tvar_name] = name\n      if not group_matched:\n        assignment_map[0][tvar_name] = name\n    else:\n      assignment_map[tvar_name] = name\n    initialized_variable_names[name] = 1\n    initialized_variable_names[six.ensure_str(name) + 
\":0\"] = 1\n\n  return (assignment_map, initialized_variable_names)\n\n\ndef dropout(input_tensor, dropout_prob):\n  \"\"\"Perform dropout.\n  Args:\n    input_tensor: float Tensor.\n    dropout_prob: Python float. The probability of dropping out a value (NOT of\n      *keeping* a dimension as in `tf.nn.dropout`).\n  Returns:\n    A version of `input_tensor` with dropout applied.\n  \"\"\"\n  if dropout_prob is None or dropout_prob == 0.0:\n    return input_tensor\n\n  output = tf.nn.dropout(input_tensor, rate=dropout_prob)\n  return output\n\n\ndef layer_norm(input_tensor, name=None):\n  \"\"\"Run layer normalization on the last dimension of the tensor.\"\"\"\n  return tf.contrib.layers.layer_norm(\n      inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)\n\n\ndef layer_norm_and_dropout(input_tensor, dropout_prob, name=None):\n  \"\"\"Runs layer normalization followed by dropout.\"\"\"\n  output_tensor = layer_norm(input_tensor, name)\n  output_tensor = dropout(output_tensor, dropout_prob)\n  return output_tensor\n\n\ndef create_initializer(initializer_range=0.02):\n  \"\"\"Creates a `truncated_normal_initializer` with the given range.\"\"\"\n  return tf.truncated_normal_initializer(stddev=initializer_range)\n\n\ndef get_timing_signal_1d_given_position(channels,\n                                        position,\n                                        min_timescale=1.0,\n                                        max_timescale=1.0e4):\n  \"\"\"Get sinusoids of diff frequencies, with timing position given.\n  Adapted from add_timing_signal_1d_given_position in\n  //third_party/py/tensor2tensor/layers/common_attention.py\n  Args:\n    channels: scalar, size of timing embeddings to create. 
The number of\n        different timescales is equal to channels / 2.\n    position: a Tensor with shape [batch, seq_len]\n    min_timescale: a float\n    max_timescale: a float\n  Returns:\n    a Tensor of timing signals [batch, seq_len, channels]\n  \"\"\"\n  num_timescales = channels // 2\n  log_timescale_increment = (\n      math.log(float(max_timescale) / float(min_timescale)) /\n      (tf.to_float(num_timescales) - 1))\n  inv_timescales = min_timescale * tf.exp(\n      tf.to_float(tf.range(num_timescales)) * -log_timescale_increment)\n  scaled_time = (\n      tf.expand_dims(tf.to_float(position), 2) * tf.expand_dims(\n          tf.expand_dims(inv_timescales, 0), 0))\n  signal = tf.concat([tf.sin(scaled_time), tf.cos(scaled_time)], axis=2)\n  signal = tf.pad(signal, [[0, 0], [0, 0], [0, tf.mod(channels, 2)]])\n  return signal\n\n\ndef embedding_lookup(input_ids,\n                     vocab_size,\n                     embedding_size=128,\n                     initializer_range=0.02,\n                     word_embedding_name=\"word_embeddings\",\n                     use_one_hot_embeddings=False):\n  \"\"\"Looks up words embeddings for id tensor.\n  Args:\n    input_ids: int32 Tensor of shape [batch_size, seq_length] containing word\n      ids.\n    vocab_size: int. Size of the embedding vocabulary.\n    embedding_size: int. Width of the word embeddings.\n    initializer_range: float. Embedding initialization range.\n    word_embedding_name: string. Name of the embedding table.\n    use_one_hot_embeddings: bool. If True, use one-hot method for word\n      embeddings. 
If False, use `tf.nn.embedding_lookup()`.\n  Returns:\n    float Tensor of shape [batch_size, seq_length, embedding_size].\n  \"\"\"\n  # This function assumes that the input is of shape [batch_size, seq_length,\n  # num_inputs].\n  #\n  # If the input is a 2D tensor of shape [batch_size, seq_length], we\n  # reshape to [batch_size, seq_length, 1].\n  if input_ids.shape.ndims == 2:\n    input_ids = tf.expand_dims(input_ids, axis=[-1])\n\n  embedding_table = tf.get_variable(\n      name=word_embedding_name,\n      shape=[vocab_size, embedding_size],\n      initializer=create_initializer(initializer_range))\n\n  if use_one_hot_embeddings:\n    flat_input_ids = tf.reshape(input_ids, [-1])\n    one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size)\n    output = tf.matmul(one_hot_input_ids, embedding_table)\n  else:\n    output = tf.nn.embedding_lookup(embedding_table, input_ids)\n\n  input_shape = get_shape_list(input_ids)\n\n  output = tf.reshape(output,\n                      input_shape[0:-1] + [input_shape[-1] * embedding_size])\n  return (output, embedding_table)\n\n\ndef embedding_postprocessor(input_tensor,\n                            use_token_type=False,\n                            token_type_ids=None,\n                            token_type_vocab_size=16,\n                            token_type_embedding_name=\"token_type_embeddings\",\n                            use_position_embeddings=True,\n                            position_embedding_name=\"position_embeddings\",\n                            initializer_range=0.02,\n                            max_position_embeddings=512,\n                            dropout_prob=0.1):\n  \"\"\"Performs various post-processing on a word embedding tensor.\n  Args:\n    input_tensor: float Tensor of shape [batch_size, seq_length,\n      embedding_size].\n    use_token_type: bool. 
Whether to add embeddings for `token_type_ids`.\n    token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].\n      Must be specified if `use_token_type` is True.\n    token_type_vocab_size: int. The vocabulary size of `token_type_ids`.\n    token_type_embedding_name: string. The name of the embedding table variable\n      for token type ids.\n    use_position_embeddings: bool. Whether to add position embeddings for the\n      position of each token in the sequence.\n    position_embedding_name: string. The name of the embedding table variable\n      for positional embeddings.\n    initializer_range: float. Range of the weight initialization.\n    max_position_embeddings: int. Maximum sequence length that might ever be\n      used with this model. This can be longer than the sequence length of\n      input_tensor, but cannot be shorter.\n    dropout_prob: float. Dropout probability applied to the final output tensor.\n  Returns:\n    float tensor with same shape as `input_tensor`.\n  Raises:\n    ValueError: One of the tensor shapes or input values is invalid.\n  \"\"\"\n  input_shape = get_shape_list(input_tensor, expected_rank=3)\n  batch_size = input_shape[0]\n  seq_length = input_shape[1]\n  width = input_shape[2]\n\n  output = input_tensor\n\n  if use_token_type:\n    if token_type_ids is None:\n      raise ValueError(\"`token_type_ids` must be specified if\"\n                       \"`use_token_type` is True.\")\n    token_type_table = tf.get_variable(\n        name=token_type_embedding_name,\n        shape=[token_type_vocab_size, width],\n        initializer=create_initializer(initializer_range))\n    # This vocab will be small so we always do one-hot here, since it is always\n    # faster for a small vocabulary.\n    flat_token_type_ids = tf.reshape(token_type_ids, [-1])\n    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)\n    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)\n    
token_type_embeddings = tf.reshape(token_type_embeddings,\n                                       [batch_size, seq_length, width])\n    output += token_type_embeddings\n\n  if use_position_embeddings:\n    assert_op = tf.assert_less_equal(seq_length, max_position_embeddings)\n    with tf.control_dependencies([assert_op]):\n      full_position_embeddings = tf.get_variable(\n          name=position_embedding_name,\n          shape=[max_position_embeddings, width],\n          initializer=create_initializer(initializer_range))\n      # Since the position embedding table is a learned variable, we create it\n      # using a (long) sequence length `max_position_embeddings`. The actual\n      # sequence length might be shorter than this, for faster training of\n      # tasks that do not have long sequences.\n      #\n      # So `full_position_embeddings` is effectively an embedding table\n      # for position [0, 1, 2, ..., max_position_embeddings-1], and the current\n      # sequence has positions [0, 1, 2, ... 
seq_length-1], so we can just\n      # perform a slice.\n      position_embeddings = tf.slice(full_position_embeddings, [0, 0],\n                                     [seq_length, -1])\n      num_dims = len(output.shape.as_list())\n\n      # Only the last two dimensions are relevant (`seq_length` and `width`), so\n      # we broadcast among the first dimensions, which is typically just\n      # the batch size.\n      position_broadcast_shape = []\n      for _ in range(num_dims - 2):\n        position_broadcast_shape.append(1)\n      position_broadcast_shape.extend([seq_length, width])\n      position_embeddings = tf.reshape(position_embeddings,\n                                       position_broadcast_shape)\n      output += position_embeddings\n\n  output = layer_norm_and_dropout(output, dropout_prob)\n  return output\n\n\ndef dense_layer_3d(input_tensor,\n                   num_attention_heads,\n                   head_size,\n                   initializer,\n                   activation,\n                   name=None):\n  \"\"\"A dense layer with 3D kernel.\n  Args:\n    input_tensor: float Tensor of shape [batch, seq_length, hidden_size].\n    num_attention_heads: Number of attention heads.\n    head_size: The size per attention head.\n    initializer: Kernel initializer.\n    activation: Activation function.\n    name: The name scope of this layer.\n  Returns:\n    float Tensor of shape [batch, seq_length, num_attention_heads, head_size].\n  \"\"\"\n\n  input_shape = get_shape_list(input_tensor)\n  hidden_size = input_shape[2]\n\n  with tf.variable_scope(name):\n    w = tf.get_variable(\n        name=\"kernel\",\n        shape=[hidden_size, num_attention_heads * head_size],\n        initializer=initializer)\n    w = tf.reshape(w, [hidden_size, num_attention_heads, head_size])\n    b = tf.get_variable(\n        name=\"bias\",\n        shape=[num_attention_heads * head_size],\n        initializer=tf.zeros_initializer)\n    b = tf.reshape(b, [num_attention_heads, head_size])\n    ret = tf.einsum(\"BFH,HND->BFND\", 
input_tensor, w)\n    ret += b\n  if activation is not None:\n    return activation(ret)\n  else:\n    return ret\n\n\ndef dense_layer_3d_proj(input_tensor,\n                        hidden_size,\n                        head_size,\n                        initializer,\n                        activation,\n                        name=None):\n  \"\"\"A dense layer with 3D kernel for projection.\n  Args:\n    input_tensor: float Tensor of shape [batch, from_seq_length,\n      num_attention_heads, size_per_head].\n    hidden_size: The size of the output (hidden) dimension.\n    head_size: The size per attention head.\n    initializer: Kernel initializer.\n    activation: Activation function.\n    name: The name scope of this layer.\n  Returns:\n    float Tensor of shape [batch, from_seq_length, hidden_size].\n  \"\"\"\n  input_shape = get_shape_list(input_tensor)\n  # The number of heads is read from the input shape, not passed in.\n  num_attention_heads = input_shape[2]\n  with tf.variable_scope(name):\n    w = tf.get_variable(\n        name=\"kernel\",\n        shape=[num_attention_heads * head_size, hidden_size],\n        initializer=initializer)\n    w = tf.reshape(w, [num_attention_heads, head_size, hidden_size])\n    b = tf.get_variable(\n        name=\"bias\", shape=[hidden_size], initializer=tf.zeros_initializer)\n    ret = tf.einsum(\"BFND,NDH->BFH\", input_tensor, w)\n    ret += b\n  if activation is not None:\n    return activation(ret)\n  else:\n    return ret\n\n\ndef dense_layer_2d(input_tensor,\n                   output_size,\n                   initializer,\n                   activation,\n                   num_attention_heads=1,\n                   name=None,\n                   num_groups=1):\n  \"\"\"A dense layer with 2D kernel.\n  Args:\n    input_tensor: Float tensor with rank 3.\n    output_size: The size of output dimension.\n    initializer: Kernel initializer.\n    activation: Activation function.\n    num_groups: number of groups in the dense layer.\n    num_attention_heads: number of attention heads in the attention layer\n      (unused).\n    
name: The name scope of this layer.\n  Returns:\n    float logits Tensor.\n  \"\"\"\n  del num_attention_heads  # unused\n  input_shape = get_shape_list(input_tensor)\n  hidden_size = input_shape[2]\n  if num_groups == 1:\n    with tf.variable_scope(name):\n      w = tf.get_variable(\n          name=\"kernel\",\n          shape=[hidden_size, output_size],\n          initializer=initializer)\n      b = tf.get_variable(\n          name=\"bias\", shape=[output_size], initializer=tf.zeros_initializer)\n      ret = tf.einsum(\"BFH,HO->BFO\", input_tensor, w)\n      ret += b\n  else:\n    assert hidden_size % num_groups == 0\n    assert output_size % num_groups == 0\n    with tf.variable_scope(name):\n      w = tf.get_variable(\n          name=\"kernel\",\n          shape=[hidden_size//num_groups, output_size//num_groups, num_groups],\n          initializer=initializer)\n      b = tf.get_variable(\n          name=\"bias\", shape=[output_size], initializer=tf.zeros_initializer)\n      input_tensor = tf.reshape(input_tensor, input_shape[:2] + [hidden_size//num_groups, num_groups])\n      ret = tf.einsum(\"BFHG,HOG->BFGO\", input_tensor, w)\n      ret = tf.reshape(ret, input_shape[:2] + [output_size])\n      ret += b\n  if activation is not None:\n    return activation(ret)\n  else:\n    return ret\n\ndef dense_layer_2d_old(input_tensor,\n                   output_size,\n                   initializer,\n                   activation,\n                   num_attention_heads=1,\n                   name=None,\n                   num_groups=1):\n  \"\"\"A dense layer with 2D kernel. Adds a grouped fully connected variant.\n  Args:\n    input_tensor: Float tensor with rank 3. 
[ batch_size,sequence_length, hidden_size]\n    output_size: The size of output dimension.\n    initializer: Kernel initializer.\n    activation: Activation function.\n    num_groups: number of groups in dense layer\n    num_attention_heads: number of attention head in attention layer.\n    name: The name scope of this layer.\n  Returns:\n    float logits Tensor.\n  \"\"\"\n  del num_attention_heads  # unused\n  input_shape = get_shape_list(input_tensor)\n  # print(\"#dense_layer_2d.1.input_shape of input_tensor:\",input_shape)  # e.g. [2, 512, 768] = [ batch_size,sequence_length, hidden_size]\n  hidden_size = input_shape[2]\n  if num_groups == 1:\n    with tf.variable_scope(name):\n      w = tf.get_variable(\n          name=\"kernel\",\n          shape=[hidden_size, output_size],\n          initializer=initializer)\n      b = tf.get_variable(\n          name=\"bias\", shape=[output_size], initializer=tf.zeros_initializer)\n      ret = tf.einsum(\"BFH,HO->BFO\", input_tensor, w)\n      ret += b\n  else: # e.g. 
input_shape = [2, 512, 768] = [ batch_size,sequence_length, hidden_size]\n    assert hidden_size % num_groups == 0\n    assert output_size % num_groups == 0\n    # print(\"#dense_layer_2d.output_size:\",output_size,\";hidden_size:\",hidden_size) # output_size = 3072; hidden_size = 768\n    with tf.variable_scope(name):\n      w = tf.get_variable(\n          name=\"kernel\",\n          shape=[num_groups, hidden_size//num_groups, output_size//num_groups],\n          initializer=initializer)\n      # print(\"#dense_layer_2d.2'w:\",w.shape) # (16, 48, 192)\n      b = tf.get_variable(\n          name=\"bias\", shape=[num_groups, output_size//num_groups], initializer=tf.zeros_initializer)\n      # input_tensor = [ batch_size,sequence_length, hidden_size].\n      # input_shape[:2] + [hidden_size//num_groups, num_groups] = [batch_size, sequence_length, hidden_size/num_groups, num_groups]\n      input_tensor = tf.reshape(input_tensor, input_shape[:2] + [hidden_size//num_groups, num_groups])\n      # print(\"#dense_layer_2d.2.input_shape of input_tensor:\", input_tensor.shape)\n      input_tensor = tf.transpose(input_tensor, [3, 0, 1, 2]) # [num_groups, batch_size, sequence_length, hidden_size/num_groups]\n      # print(\"#dense_layer_2d.3.input_shape of input_tensor:\", input_tensor.shape) #  input_tensor=(16, 2, 512, 192)\n      # input_tensor=[num_groups, batch_size, sequence_length, hidden_size/num_groups], w=[num_groups, hidden_size/num_groups, output_size/num_groups]\n\n      ret = tf.einsum(\"GBFH,GHO->GBFO\", input_tensor, w)\n      # print(\"#dense_layer_2d.4. shape of ret:\", ret.shape) #  (16, 2, 512, 48) = [num_groups, batch_size, sequence_length ,output_size]\n      b = tf.expand_dims(b, 1)\n      b = tf.expand_dims(b, 1)\n      # print(\"#dense_layer_2d.4.2.b:\",b.shape) #  (16, 1, 1, 48)\n      ret += b\n      ret = tf.transpose(ret, [1, 2, 0, 3]) #  (2, 512, 16, 48)\n      # print(\"#dense_layer_2d.5. 
shape of ret:\", ret.shape)\n      ret = tf.reshape(ret, input_shape[:2] + [output_size]) # [2, 512, 768]\n  if activation is not None:\n    return activation(ret)\n  else:\n    return ret\n\n\ndef dot_product_attention(q, k, v, bias, dropout_rate=0.0):\n  \"\"\"Dot-product attention.\n  Args:\n    q: Tensor with shape [..., length_q, depth_k].\n    k: Tensor with shape [..., length_kv, depth_k]. Leading dimensions must\n      match with q.\n    v: Tensor with shape [..., length_kv, depth_v]. Leading dimensions must\n      match with q.\n    bias: (optional) mask Tensor that is 1.0 for positions to attend to and\n      0.0 for masked positions.\n    dropout_rate: a float.\n  Returns:\n    Tensor with shape [..., length_q, depth_v].\n  Raises:\n    ValueError: If `bias` is given and `q` is not of rank 4 or 5.\n  \"\"\"\n  logits = tf.matmul(q, k, transpose_b=True)  # [..., length_q, length_kv]\n  logits = tf.multiply(logits, 1.0 / math.sqrt(float(get_shape_list(q)[-1])))\n  if bias is not None:\n    # `attention_mask` = [B, T]\n    from_shape = get_shape_list(q)\n    if len(from_shape) == 4:\n      broadcast_ones = tf.ones([from_shape[0], 1, from_shape[2], 1], tf.float32)\n    elif len(from_shape) == 5:\n      # from_shape = [B, N, Block_num, block_size, depth]\n      broadcast_ones = tf.ones([from_shape[0], 1, from_shape[2], from_shape[3],\n                                1], tf.float32)\n    else:\n      raise ValueError(\n          \"`q` must have rank 4 or 5 when `bias` is given, got rank %d\" %\n          len(from_shape))\n\n    bias = tf.matmul(broadcast_ones,\n                     tf.cast(bias, tf.float32), transpose_b=True)\n\n    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for\n    # masked positions, this operation will create a tensor which is 0.0 for\n    # positions we want to attend and -10000.0 for masked positions.\n    adder = (1.0 - bias) * -10000.0\n\n    # Since we are adding it to the raw scores before the softmax, this is\n    # effectively the same as removing these entirely.\n    logits += adder\n\n  attention_probs = tf.nn.softmax(logits, name=\"attention_probs\")\n  attention_probs = dropout(attention_probs, dropout_rate)\n  return tf.matmul(attention_probs, 
v)\n\n\ndef attention_layer(from_tensor,\n                    to_tensor,\n                    attention_mask=None,\n                    num_attention_heads=1,\n                    query_act=None,\n                    key_act=None,\n                    value_act=None,\n                    attention_probs_dropout_prob=0.0,\n                    initializer_range=0.02,\n                    batch_size=None,\n                    from_seq_length=None,\n                    to_seq_length=None):\n  \"\"\"Performs multi-headed attention from `from_tensor` to `to_tensor`.\n  Args:\n    from_tensor: float Tensor of shape [batch_size, from_seq_length,\n      from_width].\n    to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width].\n    attention_mask: (optional) int32 Tensor of shape [batch_size,\n      from_seq_length, to_seq_length]. The values should be 1 or 0. The\n      attention scores will effectively be set to -infinity for any positions in\n      the mask that are 0, and will be unchanged for positions that are 1.\n    num_attention_heads: int. Number of attention heads.\n    query_act: (optional) Activation function for the query transform.\n    key_act: (optional) Activation function for the key transform.\n    value_act: (optional) Activation function for the value transform.\n    attention_probs_dropout_prob: (optional) float. Dropout probability of the\n      attention probabilities.\n    initializer_range: float. Range of the weight initializer.\n    batch_size: (Optional) int. 
If the input is 2D, this might be the batch size\n      of the 3D version of the `from_tensor` and `to_tensor`.\n    from_seq_length: (Optional) If the input is 2D, this might be the seq length\n      of the 3D version of the `from_tensor`.\n    to_seq_length: (Optional) If the input is 2D, this might be the seq length\n      of the 3D version of the `to_tensor`.\n  Returns:\n    float Tensor of shape [batch_size, from_seq_length, num_attention_heads,\n      size_per_head].\n  Raises:\n    ValueError: Any of the arguments or tensor shapes are invalid.\n  \"\"\"\n  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])\n  to_shape = get_shape_list(to_tensor, expected_rank=[2, 3])\n  size_per_head = int(from_shape[2]/num_attention_heads)\n\n  if len(from_shape) != len(to_shape):\n    raise ValueError(\n        \"The rank of `from_tensor` must match the rank of `to_tensor`.\")\n\n  if len(from_shape) == 3:\n    batch_size = from_shape[0]\n    from_seq_length = from_shape[1]\n    to_seq_length = to_shape[1]\n  elif len(from_shape) == 2:\n    if (batch_size is None or from_seq_length is None or to_seq_length is None):\n      raise ValueError(\n          \"When passing in rank 2 tensors to attention_layer, the values \"\n          \"for `batch_size`, `from_seq_length`, and `to_seq_length` \"\n          \"must all be specified.\")\n\n  # Scalar dimensions referenced here:\n  #   B = batch size (number of sequences)\n  #   F = `from_tensor` sequence length\n  #   T = `to_tensor` sequence length\n  #   N = `num_attention_heads`\n  #   H = `size_per_head`\n\n  # `query_layer` = [B, F, N, H]\n  q = dense_layer_3d(from_tensor, num_attention_heads, size_per_head,\n                     create_initializer(initializer_range), query_act, \"query\")\n\n  # `key_layer` = [B, T, N, H]\n  k = dense_layer_3d(to_tensor, num_attention_heads, size_per_head,\n                     create_initializer(initializer_range), key_act, \"key\")\n  # `value_layer` = [B, T, N, H]\n  v = 
dense_layer_3d(to_tensor, num_attention_heads, size_per_head,\n                     create_initializer(initializer_range), value_act, \"value\")\n  q = tf.transpose(q, [0, 2, 1, 3])\n  k = tf.transpose(k, [0, 2, 1, 3])\n  v = tf.transpose(v, [0, 2, 1, 3])\n  if attention_mask is not None:\n    attention_mask = tf.reshape(\n        attention_mask, [batch_size, 1, to_seq_length, 1])\n    # 'new_embeddings = [B, N, F, H]'\n  new_embeddings = dot_product_attention(q, k, v, attention_mask,\n                                         attention_probs_dropout_prob)\n\n  return tf.transpose(new_embeddings, [0, 2, 1, 3])\n\n\ndef attention_ffn_block(layer_input,\n                        hidden_size=768,\n                        attention_mask=None,\n                        num_attention_heads=1,\n                        attention_head_size=64,\n                        attention_probs_dropout_prob=0.0,\n                        intermediate_size=3072,\n                        intermediate_act_fn=None,\n                        initializer_range=0.02,\n                        hidden_dropout_prob=0.0):\n  \"\"\"A network with attention-ffn as sub-block.\n  Args:\n    layer_input: float Tensor of shape [batch_size, from_seq_length,\n      from_width].\n    hidden_size: (optional) int, size of hidden layer.\n    attention_mask: (optional) int32 Tensor of shape [batch_size,\n      from_seq_length, to_seq_length]. The values should be 1 or 0. The\n      attention scores will effectively be set to -infinity for any positions in\n      the mask that are 0, and will be unchanged for positions that are 1.\n    num_attention_heads: int. Number of attention heads.\n    attention_head_size: int. Size of attention head.\n    attention_probs_dropout_prob: float. dropout probability for attention_layer\n    intermediate_size: int. Size of intermediate hidden layer.\n    intermediate_act_fn: (optional) Activation function for the intermediate\n      layer.\n    initializer_range: float. 
Range of the weight initializer.\n    hidden_dropout_prob: (optional) float. Dropout probability of the hidden\n      layer.\n  Returns:\n    layer output\n  \"\"\"\n\n  with tf.variable_scope(\"attention_1\"):\n    with tf.variable_scope(\"self\"):\n      attention_output = attention_layer(\n          from_tensor=layer_input,\n          to_tensor=layer_input,\n          attention_mask=attention_mask,\n          num_attention_heads=num_attention_heads,\n          attention_probs_dropout_prob=attention_probs_dropout_prob,\n          initializer_range=initializer_range)\n\n    # Run a linear projection of `hidden_size` then add a residual\n    # with `layer_input`.\n    with tf.variable_scope(\"output\"):\n      attention_output = dense_layer_3d_proj(\n          attention_output,\n          hidden_size,\n          attention_head_size,\n          create_initializer(initializer_range),\n          None,\n          name=\"dense\")\n      attention_output = dropout(attention_output, hidden_dropout_prob)\n  attention_output = layer_norm(attention_output + layer_input)\n  with tf.variable_scope(\"ffn_1\"):\n    with tf.variable_scope(\"intermediate\"):\n      intermediate_output = dense_layer_2d(\n          attention_output,\n          intermediate_size,\n          create_initializer(initializer_range),\n          intermediate_act_fn,\n          num_attention_heads=num_attention_heads,\n          name=\"dense\",\n          num_groups=16)\n      with tf.variable_scope(\"output\"):\n        ffn_output = dense_layer_2d(\n            intermediate_output,\n            hidden_size,\n            create_initializer(initializer_range),\n            None,\n            num_attention_heads=num_attention_heads,\n            name=\"dense\",\n            num_groups=16)\n      ffn_output = dropout(ffn_output, hidden_dropout_prob)\n  ffn_output = layer_norm(ffn_output + attention_output)\n  return ffn_output\n\n\ndef transformer_model(input_tensor,\n                      
attention_mask=None,\n                      hidden_size=768,\n                      num_hidden_layers=12,\n                      num_hidden_groups=12,\n                      num_attention_heads=12,\n                      intermediate_size=3072,\n                      inner_group_num=1,\n                      intermediate_act_fn=\"gelu\",\n                      hidden_dropout_prob=0.1,\n                      attention_probs_dropout_prob=0.1,\n                      initializer_range=0.02,\n                      do_return_all_layers=False):\n  \"\"\"Multi-headed, multi-layer Transformer from \"Attention is All You Need\".\n  This is almost an exact implementation of the original Transformer encoder.\n  See the original paper:\n  https://arxiv.org/abs/1706.03762\n  Also see:\n  https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py\n  Args:\n    input_tensor: float Tensor of shape [batch_size, seq_length, hidden_size].\n    attention_mask: (optional) int32 Tensor of shape [batch_size, seq_length,\n      seq_length], with 1 for positions that can be attended to and 0 in\n      positions that should not be.\n    hidden_size: int. Hidden size of the Transformer.\n    num_hidden_layers: int. Number of layers (blocks) in the Transformer.\n    num_hidden_groups: int. Number of group for the hidden layers, parameters\n      in the same group are shared.\n    num_attention_heads: int. Number of attention heads in the Transformer.\n    intermediate_size: int. The size of the \"intermediate\" (a.k.a., feed\n      forward) layer.\n    inner_group_num: int, number of inner repetition of attention and ffn.\n    intermediate_act_fn: function. The non-linear activation function to apply\n      to the output of the intermediate/feed-forward layer.\n    hidden_dropout_prob: float. Dropout probability for the hidden layers.\n    attention_probs_dropout_prob: float. 
Dropout probability of the attention\n      probabilities.\n    initializer_range: float. Range of the initializer (stddev of truncated\n      normal).\n    do_return_all_layers: Whether to also return all layers or just the final\n      layer.\n  Returns:\n    float Tensor of shape [batch_size, seq_length, hidden_size], the final\n    hidden layer of the Transformer.\n  Raises:\n    ValueError: A Tensor shape or parameter is invalid.\n  \"\"\"\n  if hidden_size % num_attention_heads != 0:\n    raise ValueError(\n        \"The hidden size (%d) is not a multiple of the number of attention \"\n        \"heads (%d)\" % (hidden_size, num_attention_heads))\n\n  attention_head_size = hidden_size // num_attention_heads\n  input_shape = get_shape_list(input_tensor, expected_rank=3)\n  input_width = input_shape[2]\n\n  all_layer_outputs = []\n  if input_width != hidden_size:\n    prev_output = dense_layer_2d(\n        input_tensor, hidden_size, create_initializer(initializer_range),\n        None, name=\"embedding_hidden_mapping_in\")\n  else:\n    prev_output = input_tensor\n  with tf.variable_scope(\"transformer\", reuse=tf.AUTO_REUSE):\n    for layer_idx in range(num_hidden_layers):\n      group_idx = int(layer_idx / num_hidden_layers * num_hidden_groups)\n      with tf.variable_scope(\"group_%d\" % group_idx):\n        with tf.name_scope(\"layer_%d\" % layer_idx):\n          layer_output = prev_output\n          for inner_group_idx in range(inner_group_num):\n            with tf.variable_scope(\"inner_group_%d\" % inner_group_idx):\n              layer_output = attention_ffn_block(\n                  layer_output, hidden_size, attention_mask,\n                  num_attention_heads, attention_head_size,\n                  attention_probs_dropout_prob, intermediate_size,\n                  intermediate_act_fn, initializer_range, hidden_dropout_prob)\n              prev_output = layer_output\n              all_layer_outputs.append(layer_output)\n  if 
do_return_all_layers:\n    return all_layer_outputs\n  else:\n    return all_layer_outputs[-1]\n\n\ndef get_shape_list(tensor, expected_rank=None, name=None):\n  \"\"\"Returns a list of the shape of tensor, preferring static dimensions.\n  Args:\n    tensor: A tf.Tensor object to find the shape of.\n    expected_rank: (optional) int or list of ints. The expected rank of\n      `tensor`. If this is specified and the `tensor` has a different rank,\n      an exception will be thrown.\n    name: Optional name of the tensor for the error message.\n  Returns:\n    A list of dimensions of the shape of tensor. All static dimensions will\n    be returned as Python integers, and dynamic dimensions will be returned\n    as tf.Tensor scalars.\n  \"\"\"\n  if name is None:\n    name = tensor.name\n\n  if expected_rank is not None:\n    assert_rank(tensor, expected_rank, name)\n\n  shape = tensor.shape.as_list()\n\n  non_static_indexes = []\n  for (index, dim) in enumerate(shape):\n    if dim is None:\n      non_static_indexes.append(index)\n\n  if not non_static_indexes:\n    return shape\n\n  dyn_shape = tf.shape(tensor)\n  for index in non_static_indexes:\n    shape[index] = dyn_shape[index]\n  return shape\n\n\ndef reshape_to_matrix(input_tensor):\n  \"\"\"Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix).\"\"\"\n  ndims = input_tensor.shape.ndims\n  if ndims < 2:\n    raise ValueError(\"Input tensor must have at least rank 2. 
Shape = %s\" %\n                     (input_tensor.shape))\n  if ndims == 2:\n    return input_tensor\n\n  width = input_tensor.shape[-1]\n  output_tensor = tf.reshape(input_tensor, [-1, width])\n  return output_tensor\n\n\ndef reshape_from_matrix(output_tensor, orig_shape_list):\n  \"\"\"Reshapes a rank 2 tensor back to its original rank >= 2 tensor.\"\"\"\n  if len(orig_shape_list) == 2:\n    return output_tensor\n\n  output_shape = get_shape_list(output_tensor)\n\n  orig_dims = orig_shape_list[0:-1]\n  width = output_shape[-1]\n\n  return tf.reshape(output_tensor, orig_dims + [width])\n\n\ndef assert_rank(tensor, expected_rank, name=None):\n  \"\"\"Raises an exception if the tensor rank is not of the expected rank.\n  Args:\n    tensor: A tf.Tensor to check the rank of.\n    expected_rank: Python integer or list of integers, expected rank.\n    name: Optional name of the tensor for the error message.\n  Raises:\n    ValueError: If the expected shape doesn't match the actual shape.\n  \"\"\"\n  if name is None:\n    name = tensor.name\n\n  expected_rank_dict = {}\n  if isinstance(expected_rank, six.integer_types):\n    expected_rank_dict[expected_rank] = True\n  else:\n    for x in expected_rank:\n      expected_rank_dict[x] = True\n\n  actual_rank = tensor.shape.ndims\n  if actual_rank not in expected_rank_dict:\n    scope_name = tf.get_variable_scope().name\n    raise ValueError(\n        \"For the tensor `%s` in scope `%s`, the actual rank \"\n        \"`%d` (shape = %s) is not equal to the expected rank `%s`\" %\n        (name, scope_name, actual_rank, str(tensor.shape), str(expected_rank)))\n"
  },
  {
    "path": "optimization.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Functions and classes related to optimization (weight updates).\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport re\nimport tensorflow as tf\n\n\ndef create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, use_tpu):\n    \"\"\"Creates an optimizer training op.\"\"\"\n    global_step = tf.train.get_or_create_global_step()\n\n    learning_rate = tf.constant(value=init_lr, shape=[], dtype=tf.float32)\n\n    # Implements linear decay of the learning rate.\n    learning_rate = tf.train.polynomial_decay(\n        learning_rate,\n        global_step,\n        num_train_steps,\n        end_learning_rate=0.0,\n        power=1.0,\n        cycle=False)\n\n    # Implements linear warmup. 
I.e., if global_step < num_warmup_steps, the\n    # learning rate will be `global_step/num_warmup_steps * init_lr`.\n    if num_warmup_steps:\n        global_steps_int = tf.cast(global_step, tf.int32)\n        warmup_steps_int = tf.constant(num_warmup_steps, dtype=tf.int32)\n\n        global_steps_float = tf.cast(global_steps_int, tf.float32)\n        warmup_steps_float = tf.cast(warmup_steps_int, tf.float32)\n\n        warmup_percent_done = global_steps_float / warmup_steps_float\n        warmup_learning_rate = init_lr * warmup_percent_done\n\n        is_warmup = tf.cast(global_steps_int < warmup_steps_int, tf.float32)\n        learning_rate = (\n                (1.0 - is_warmup) * learning_rate + is_warmup * warmup_learning_rate)\n\n    # It is recommended that you use this optimizer for fine tuning, since this\n    # is how the model was trained (note that the Adam m/v variables are NOT\n    # loaded from init_checkpoint.)\n    optimizer = LAMBOptimizer(\n        learning_rate=learning_rate,\n        weight_decay_rate=0.01,\n        beta_1=0.9,\n        beta_2=0.999,\n        epsilon=1e-6,\n        exclude_from_weight_decay=[\"LayerNorm\", \"layer_norm\", \"bias\"])\n\n    if use_tpu:\n        optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)\n\n    tvars = tf.trainable_variables()\n    grads = tf.gradients(loss, tvars)\n\n    # This is how the model was pre-trained.\n    (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)\n\n    train_op = optimizer.apply_gradients(\n        zip(grads, tvars), global_step=global_step)\n\n    # Normally the global step update is done inside of `apply_gradients`.\n    # However, `AdamWeightDecayOptimizer` doesn't do this. 
But if you use\n    # a different optimizer, you should probably take this line out.\n    new_global_step = global_step + 1\n    train_op = tf.group(train_op, [global_step.assign(new_global_step)])\n    return train_op\n\n\nclass AdamWeightDecayOptimizer(tf.train.Optimizer):\n    \"\"\"A basic Adam optimizer that includes \"correct\" L2 weight decay.\"\"\"\n\n    def __init__(self,\n                 learning_rate,\n                 weight_decay_rate=0.0,\n                 beta_1=0.9,\n                 beta_2=0.999,\n                 epsilon=1e-6,\n                 exclude_from_weight_decay=None,\n                 name=\"AdamWeightDecayOptimizer\"):\n        \"\"\"Constructs a AdamWeightDecayOptimizer.\"\"\"\n        super(AdamWeightDecayOptimizer, self).__init__(False, name)\n\n        self.learning_rate = learning_rate\n        self.weight_decay_rate = weight_decay_rate\n        self.beta_1 = beta_1\n        self.beta_2 = beta_2\n        self.epsilon = epsilon\n        self.exclude_from_weight_decay = exclude_from_weight_decay\n\n    def apply_gradients(self, grads_and_vars, global_step=None, name=None):\n        \"\"\"See base class.\"\"\"\n        assignments = []\n        for (grad, param) in grads_and_vars:\n            if grad is None or param is None:\n                continue\n\n            param_name = self._get_variable_name(param.name)\n\n            m = tf.get_variable(\n                name=param_name + \"/adam_m\",\n                shape=param.shape.as_list(),\n                dtype=tf.float32,\n                trainable=False,\n                initializer=tf.zeros_initializer())\n            v = tf.get_variable(\n                name=param_name + \"/adam_v\",\n                shape=param.shape.as_list(),\n                dtype=tf.float32,\n                trainable=False,\n                initializer=tf.zeros_initializer())\n\n            # Standard Adam update.\n            next_m = (\n                    tf.multiply(self.beta_1, m) + 
tf.multiply(1.0 - self.beta_1, grad))\n            next_v = (\n                    tf.multiply(self.beta_2, v) + tf.multiply(1.0 - self.beta_2,\n                                                              tf.square(grad)))\n\n            update = next_m / (tf.sqrt(next_v) + self.epsilon)\n\n            # Just adding the square of the weights to the loss function is *not*\n            # the correct way of using L2 regularization/weight decay with Adam,\n            # since that will interact with the m and v parameters in strange ways.\n            #\n            # Instead we want to decay the weights in a manner that doesn't interact\n            # with the m/v parameters. This is equivalent to adding the square\n            # of the weights to the loss with plain (non-momentum) SGD.\n            if self._do_use_weight_decay(param_name):\n                update += self.weight_decay_rate * param\n\n            update_with_lr = self.learning_rate * update\n\n            next_param = param - update_with_lr\n\n            assignments.extend(\n                [param.assign(next_param),\n                 m.assign(next_m),\n                 v.assign(next_v)])\n        return tf.group(*assignments, name=name)\n\n    def _do_use_weight_decay(self, param_name):\n        \"\"\"Whether to use L2 weight decay for `param_name`.\"\"\"\n        if not self.weight_decay_rate:\n            return False\n        if self.exclude_from_weight_decay:\n            for r in self.exclude_from_weight_decay:\n                if re.search(r, param_name) is not None:\n                    return False\n        return True\n\n    def _get_variable_name(self, param_name):\n        \"\"\"Get the variable name from the tensor name.\"\"\"\n        m = re.match(\"^(.*):\\\\d+$\", param_name)\n        if m is not None:\n            param_name = m.group(1)\n        return param_name\n\n\n#\nclass LAMBOptimizer(tf.train.Optimizer):\n    \"\"\"\n    LAMBOptimizer optimizer.\n    
https://github.com/ymcui/LAMB_Optimizer_TF\n    # IMPORTANT NOTE\n    - This is NOT an official implementation.\n    - LAMB optimizer is changed from arXiv v1 ~ v3.\n    - We implement v3 version (which is the latest version on June, 2019.).\n    - Our implementation is based on `AdamWeightDecayOptimizer` in BERT (provided by Google).\n\n    # References\n    - Large Batch Optimization for Deep Learning: Training BERT in 76 minutes. https://arxiv.org/abs/1904.00962v3\n    - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805\n    # Parameters\n    - There is nothing special, just the same as `AdamWeightDecayOptimizer`.\n    \"\"\"\n\n    def __init__(self,\n                 learning_rate,\n                 weight_decay_rate=0.01,\n                 beta_1=0.9,\n                 beta_2=0.999,\n                 epsilon=1e-6,\n                 exclude_from_weight_decay=None,\n                 name=\"LAMBOptimizer\"):\n        \"\"\"Constructs a LAMBOptimizer.\"\"\"\n        super(LAMBOptimizer, self).__init__(False, name)\n\n        self.learning_rate = learning_rate\n        self.weight_decay_rate = weight_decay_rate\n        self.beta_1 = beta_1\n        self.beta_2 = beta_2\n        self.epsilon = epsilon\n        self.exclude_from_weight_decay = exclude_from_weight_decay\n\n    def apply_gradients(self, grads_and_vars, global_step=None, name=None):\n        \"\"\"See base class.\"\"\"\n        assignments = []\n        for (grad, param) in grads_and_vars:\n            if grad is None or param is None:\n                continue\n\n            param_name = self._get_variable_name(param.name)\n\n            m = tf.get_variable(\n                name=param_name + \"/lamb_m\",\n                shape=param.shape.as_list(),\n                dtype=tf.float32,\n                trainable=False,\n                initializer=tf.zeros_initializer())\n            v = tf.get_variable(\n                name=param_name 
+ \"/lamb_v\",\n                shape=param.shape.as_list(),\n                dtype=tf.float32,\n                trainable=False,\n                initializer=tf.zeros_initializer())\n\n            # Standard Adam update.\n            next_m = (\n                    tf.multiply(self.beta_1, m) + tf.multiply(1.0 - self.beta_1, grad))\n            next_v = (\n                    tf.multiply(self.beta_2, v) + tf.multiply(1.0 - self.beta_2,\n                                                              tf.square(grad)))\n\n            update = next_m / (tf.sqrt(next_v) + self.epsilon)\n\n            # Just adding the square of the weights to the loss function is *not*\n            # the correct way of using L2 regularization/weight decay with Adam,\n            # since that will interact with the m and v parameters in strange ways.\n            #\n            # Instead we want to decay the weights in a manner that doesn't interact\n            # with the m/v parameters. This is equivalent to adding the square\n            # of the weights to the loss with plain (non-momentum) SGD.\n            if self._do_use_weight_decay(param_name):\n                update += self.weight_decay_rate * param\n\n            ############## BELOW ARE THE SPECIFIC PARTS FOR LAMB ##############\n\n            # Note: There are two choices for the scaling function \\phi(z)\n            # minmax:   \\phi(z) = min(max(z, \\gamma_l), \\gamma_u)\n            # identity: \\phi(z) = z\n            # The authors do not mention what \\gamma_l and \\gamma_u are.\n            # UPDATE: when asked, the authors provided the code below.\n            # ratio = array_ops.where(math_ops.greater(w_norm, 0), array_ops.where(\n            #      math_ops.greater(g_norm, 0), (w_norm / g_norm), 1.0), 1.0)\n\n            r1 = tf.sqrt(tf.reduce_sum(tf.square(param)))\n            r2 = tf.sqrt(tf.reduce_sum(tf.square(update)))\n\n            r = tf.where(tf.greater(r1, 0.0),\n                         
tf.where(tf.greater(r2, 0.0),\n                                  r1 / r2,\n                                  1.0),\n                         1.0)\n\n            eta = self.learning_rate * r\n\n            update_with_lr = eta * update\n\n            next_param = param - update_with_lr\n\n            assignments.extend(\n                [param.assign(next_param),\n                 m.assign(next_m),\n                 v.assign(next_v)])\n        return tf.group(*assignments, name=name)\n\n    def _do_use_weight_decay(self, param_name):\n        \"\"\"Whether to use L2 weight decay for `param_name`.\"\"\"\n        if not self.weight_decay_rate:\n            return False\n        if self.exclude_from_weight_decay:\n            for r in self.exclude_from_weight_decay:\n                if re.search(r, param_name) is not None:\n                    return False\n        return True\n\n    def _get_variable_name(self, param_name):\n        \"\"\"Get the variable name from the tensor name.\"\"\"\n        m = re.match(\"^(.*):\\\\d+$\", param_name)\n        if m is not None:\n            param_name = m.group(1)\n        return param_name"
  },
  {
    "path": "optimization_finetuning.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Functions and classes related to optimization (weight updates).\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport re\nimport tensorflow as tf\n\n\ndef create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, use_tpu):\n  \"\"\"Creates an optimizer training op.\"\"\"\n  global_step = tf.train.get_or_create_global_step()\n\n  learning_rate = tf.constant(value=init_lr, shape=[], dtype=tf.float32)\n\n  # Implements linear decay of the learning rate.\n  learning_rate = tf.train.polynomial_decay(\n      learning_rate,\n      global_step,\n      num_train_steps,\n      end_learning_rate=0.0,\n      power=1.0,\n      cycle=False)\n\n  # Implements linear warmup. 
I.e., if global_step < num_warmup_steps, the\n  # learning rate will be `global_step/num_warmup_steps * init_lr`.\n  if num_warmup_steps:\n    global_steps_int = tf.cast(global_step, tf.int32)\n    warmup_steps_int = tf.constant(num_warmup_steps, dtype=tf.int32)\n\n    global_steps_float = tf.cast(global_steps_int, tf.float32)\n    warmup_steps_float = tf.cast(warmup_steps_int, tf.float32)\n\n    warmup_percent_done = global_steps_float / warmup_steps_float\n    warmup_learning_rate = init_lr * warmup_percent_done\n\n    is_warmup = tf.cast(global_steps_int < warmup_steps_int, tf.float32)\n    learning_rate = (\n        (1.0 - is_warmup) * learning_rate + is_warmup * warmup_learning_rate)\n\n  # It is recommended that you use this optimizer for fine tuning, since this\n  # is how the model was trained (note that the Adam m/v variables are NOT\n  # loaded from init_checkpoint.)\n  optimizer = AdamWeightDecayOptimizer(\n      learning_rate=learning_rate,\n      weight_decay_rate=0.01,\n      beta_1=0.9,\n      beta_2=0.999, # 0.98 ONLY USED FOR PRETRAIN. MUST CHANGE AT FINE-TUNING 0.999,\n      epsilon=1e-6,\n      exclude_from_weight_decay=[\"LayerNorm\", \"layer_norm\", \"bias\"])\n\n  if use_tpu:\n    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)\n\n  tvars = tf.trainable_variables()\n  grads = tf.gradients(loss, tvars)\n\n  # This is how the model was pre-trained.\n  (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)\n\n  train_op = optimizer.apply_gradients(\n      zip(grads, tvars), global_step=global_step)\n\n  # Normally the global step update is done inside of `apply_gradients`.\n  # However, `AdamWeightDecayOptimizer` doesn't do this. 
But if you use\n  # a different optimizer, you should probably take this line out.\n  new_global_step = global_step + 1\n  train_op = tf.group(train_op, [global_step.assign(new_global_step)])\n  return train_op\n\n\nclass AdamWeightDecayOptimizer(tf.train.Optimizer):\n  \"\"\"A basic Adam optimizer that includes \"correct\" L2 weight decay.\"\"\"\n\n  def __init__(self,\n               learning_rate,\n               weight_decay_rate=0.0,\n               beta_1=0.9,\n               beta_2=0.999,\n               epsilon=1e-6,\n               exclude_from_weight_decay=None,\n               name=\"AdamWeightDecayOptimizer\"):\n    \"\"\"Constructs a AdamWeightDecayOptimizer.\"\"\"\n    super(AdamWeightDecayOptimizer, self).__init__(False, name)\n\n    self.learning_rate = learning_rate\n    self.weight_decay_rate = weight_decay_rate\n    self.beta_1 = beta_1\n    self.beta_2 = beta_2\n    self.epsilon = epsilon\n    self.exclude_from_weight_decay = exclude_from_weight_decay\n\n  def apply_gradients(self, grads_and_vars, global_step=None, name=None):\n    \"\"\"See base class.\"\"\"\n    assignments = []\n    for (grad, param) in grads_and_vars:\n      if grad is None or param is None:\n        continue\n\n      param_name = self._get_variable_name(param.name)\n\n      m = tf.get_variable(\n          name=param_name + \"/adam_m\",\n          shape=param.shape.as_list(),\n          dtype=tf.float32,\n          trainable=False,\n          initializer=tf.zeros_initializer())\n      v = tf.get_variable(\n          name=param_name + \"/adam_v\",\n          shape=param.shape.as_list(),\n          dtype=tf.float32,\n          trainable=False,\n          initializer=tf.zeros_initializer())\n\n      # Standard Adam update.\n      next_m = (\n          tf.multiply(self.beta_1, m) + tf.multiply(1.0 - self.beta_1, grad))\n      next_v = (\n          tf.multiply(self.beta_2, v) + tf.multiply(1.0 - self.beta_2,\n                                                    
tf.square(grad)))\n\n      update = next_m / (tf.sqrt(next_v) + self.epsilon)\n\n      # Just adding the square of the weights to the loss function is *not*\n      # the correct way of using L2 regularization/weight decay with Adam,\n      # since that will interact with the m and v parameters in strange ways.\n      #\n      # Instead we want to decay the weights in a manner that doesn't interact\n      # with the m/v parameters. This is equivalent to adding the square\n      # of the weights to the loss with plain (non-momentum) SGD.\n      if self._do_use_weight_decay(param_name):\n        update += self.weight_decay_rate * param\n\n      update_with_lr = self.learning_rate * update\n\n      next_param = param - update_with_lr\n\n      assignments.extend(\n          [param.assign(next_param),\n           m.assign(next_m),\n           v.assign(next_v)])\n    return tf.group(*assignments, name=name)\n\n  def _do_use_weight_decay(self, param_name):\n    \"\"\"Whether to use L2 weight decay for `param_name`.\"\"\"\n    if not self.weight_decay_rate:\n      return False\n    if self.exclude_from_weight_decay:\n      for r in self.exclude_from_weight_decay:\n        if re.search(r, param_name) is not None:\n          return False\n    return True\n\n  def _get_variable_name(self, param_name):\n    \"\"\"Get the variable name from the tensor name.\"\"\"\n    m = re.match(\"^(.*):\\\\d+$\", param_name)\n    if m is not None:\n      param_name = m.group(1)\n    return param_name\n"
  },
  {
    "path": "optimization_google.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n\"\"\"Functions and classes related to optimization (weight updates).\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport re\n\nimport six\nfrom six.moves import zip\nimport tensorflow as tf\n\nimport lamb_optimizer_google as lamb_optimizer\n\n\ndef create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, use_tpu,\n                     optimizer=\"adamw\", poly_power=1.0, start_warmup_step=0):\n  \"\"\"Creates an optimizer training op.\"\"\"\n  global_step = tf.train.get_or_create_global_step()\n\n  learning_rate = tf.constant(value=init_lr, shape=[], dtype=tf.float32)\n\n  # Implements linear decay of the learning rate.\n  learning_rate = tf.train.polynomial_decay(\n      learning_rate,\n      global_step,\n      num_train_steps,\n      end_learning_rate=0.0,\n      power=poly_power,\n      cycle=False)\n\n  # Implements linear warmup. 
I.e., if global_step - start_warmup_step <\n  # num_warmup_steps, the learning rate will be\n  # `(global_step - start_warmup_step)/num_warmup_steps * init_lr`.\n  if num_warmup_steps:\n    tf.logging.info(\"++++++ warmup starts at step \" + str(start_warmup_step)\n                    + \", for \" + str(num_warmup_steps) + \" steps ++++++\")\n    global_steps_int = tf.cast(global_step, tf.int32)\n    start_warm_int = tf.constant(start_warmup_step, dtype=tf.int32)\n    global_steps_int = global_steps_int - start_warm_int\n    warmup_steps_int = tf.constant(num_warmup_steps, dtype=tf.int32)\n\n    global_steps_float = tf.cast(global_steps_int, tf.float32)\n    warmup_steps_float = tf.cast(warmup_steps_int, tf.float32)\n\n    warmup_percent_done = global_steps_float / warmup_steps_float\n    warmup_learning_rate = init_lr * warmup_percent_done\n\n    is_warmup = tf.cast(global_steps_int < warmup_steps_int, tf.float32)\n    learning_rate = (\n        (1.0 - is_warmup) * learning_rate + is_warmup * warmup_learning_rate)\n\n  # It is OK to use this optimizer for finetuning, since this\n  # is how the model was trained (note that the Adam m/v variables are NOT\n  # loaded from init_checkpoint.)\n  # It is OK to use AdamW for finetuning even if the model was trained with LAMB.\n  # As reported in the public BERT GitHub repo, the learning rate for SQuAD 1.1\n  # finetuning is 3e-5, 4e-5 or 5e-5. 
For LAMB, the users can use 3e-4, 4e-4,or 5e-4 for a\n  # batch size of 64 in the finetune.\n  if optimizer == \"adamw\":\n    tf.logging.info(\"using adamw\")\n    optimizer = AdamWeightDecayOptimizer(\n        learning_rate=learning_rate,\n        weight_decay_rate=0.01,\n        beta_1=0.9,\n        beta_2=0.999,\n        epsilon=1e-6,\n        exclude_from_weight_decay=[\"LayerNorm\", \"layer_norm\", \"bias\"])\n  elif optimizer == \"lamb\":\n    tf.logging.info(\"using lamb\")\n    optimizer = lamb_optimizer.LAMBOptimizer(\n        learning_rate=learning_rate,\n        weight_decay_rate=0.01,\n        beta_1=0.9,\n        beta_2=0.999,\n        epsilon=1e-6,\n        exclude_from_weight_decay=[\"LayerNorm\", \"layer_norm\", \"bias\"])\n  else:\n    raise ValueError(\"Not supported optimizer: \", optimizer)\n\n  if use_tpu:\n    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)\n\n  tvars = tf.trainable_variables()\n  grads = tf.gradients(loss, tvars)\n\n  # This is how the model was pre-trained.\n  (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)\n\n  train_op = optimizer.apply_gradients(\n      list(zip(grads, tvars)), global_step=global_step)\n\n  # Normally the global step update is done inside of `apply_gradients`.\n  # However, neither `AdamWeightDecayOptimizer` nor `LAMBOptimizer` do this.\n  # But if you use a different optimizer, you should probably take this line\n  # out.\n  new_global_step = global_step + 1\n  train_op = tf.group(train_op, [global_step.assign(new_global_step)])\n  return train_op\n\n\nclass AdamWeightDecayOptimizer(tf.train.Optimizer):\n  \"\"\"A basic Adam optimizer that includes \"correct\" L2 weight decay.\"\"\"\n\n  def __init__(self,\n               learning_rate,\n               weight_decay_rate=0.0,\n               beta_1=0.9,\n               beta_2=0.999,\n               epsilon=1e-6,\n               exclude_from_weight_decay=None,\n               name=\"AdamWeightDecayOptimizer\"):\n    \"\"\"Constructs a 
AdamWeightDecayOptimizer.\"\"\"\n    super(AdamWeightDecayOptimizer, self).__init__(False, name)\n\n    self.learning_rate = learning_rate\n    self.weight_decay_rate = weight_decay_rate\n    self.beta_1 = beta_1\n    self.beta_2 = beta_2\n    self.epsilon = epsilon\n    self.exclude_from_weight_decay = exclude_from_weight_decay\n\n  def apply_gradients(self, grads_and_vars, global_step=None, name=None):\n    \"\"\"See base class.\"\"\"\n    assignments = []\n    for (grad, param) in grads_and_vars:\n      if grad is None or param is None:\n        continue\n\n      param_name = self._get_variable_name(param.name)\n\n      m = tf.get_variable(\n          name=six.ensure_str(param_name) + \"/adam_m\",\n          shape=param.shape.as_list(),\n          dtype=tf.float32,\n          trainable=False,\n          initializer=tf.zeros_initializer())\n      v = tf.get_variable(\n          name=six.ensure_str(param_name) + \"/adam_v\",\n          shape=param.shape.as_list(),\n          dtype=tf.float32,\n          trainable=False,\n          initializer=tf.zeros_initializer())\n\n      # Standard Adam update.\n      next_m = (\n          tf.multiply(self.beta_1, m) + tf.multiply(1.0 - self.beta_1, grad))\n      next_v = (\n          tf.multiply(self.beta_2, v) + tf.multiply(1.0 - self.beta_2,\n                                                    tf.square(grad)))\n\n      update = next_m / (tf.sqrt(next_v) + self.epsilon)\n\n      # Just adding the square of the weights to the loss function is *not*\n      # the correct way of using L2 regularization/weight decay with Adam,\n      # since that will interact with the m and v parameters in strange ways.\n      #\n      # Instead we want to decay the weights in a manner that doesn't interact\n      # with the m/v parameters. 
This is equivalent to adding the square\n      # of the weights to the loss with plain (non-momentum) SGD.\n      if self._do_use_weight_decay(param_name):\n        update += self.weight_decay_rate * param\n\n      update_with_lr = self.learning_rate * update\n\n      next_param = param - update_with_lr\n\n      assignments.extend(\n          [param.assign(next_param),\n           m.assign(next_m),\n           v.assign(next_v)])\n    return tf.group(*assignments, name=name)\n\n  def _do_use_weight_decay(self, param_name):\n    \"\"\"Whether to use L2 weight decay for `param_name`.\"\"\"\n    if not self.weight_decay_rate:\n      return False\n    if self.exclude_from_weight_decay:\n      for r in self.exclude_from_weight_decay:\n        if re.search(r, param_name) is not None:\n          return False\n    return True\n\n  def _get_variable_name(self, param_name):\n    \"\"\"Get the variable name from the tensor name.\"\"\"\n    m = re.match(\"^(.*):\\\\d+$\", six.ensure_str(param_name))\n    if m is not None:\n      param_name = m.group(1)\n    return param_name\n"
  },
  {
    "path": "resources/create_pretraining_data_roberta.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Create masked LM/next sentence masked_lm TF examples for BERT.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport random\nimport re\nimport tokenization\nimport tensorflow as tf\nimport jieba\n\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\nflags.DEFINE_string(\"input_file\", None,\n                    \"Input raw text file (or comma-separated list of files).\")\n\nflags.DEFINE_string(\n    \"output_file\", None,\n    \"Output TF example file (or comma-separated list of files).\")\n\nflags.DEFINE_string(\"vocab_file\", None,\n                    \"The vocabulary file that the BERT model was trained on.\")\n\nflags.DEFINE_bool(\n    \"do_lower_case\", True,\n    \"Whether to lower case the input text. 
Should be True for uncased \"\n    \"models and False for cased models.\")\n\nflags.DEFINE_bool(\n    \"do_whole_word_mask\", False,\n    \"Whether to use whole word masking rather than per-WordPiece masking.\")\n\nflags.DEFINE_integer(\"max_seq_length\", 128, \"Maximum sequence length.\")\n\nflags.DEFINE_integer(\"max_predictions_per_seq\", 20,\n                     \"Maximum number of masked LM predictions per sequence.\")\n\nflags.DEFINE_integer(\"random_seed\", 12345, \"Random seed for data generation.\")\n\nflags.DEFINE_integer(\n    \"dupe_factor\", 10,\n    \"Number of times to duplicate the input data (with different masks).\")\n\nflags.DEFINE_float(\"masked_lm_prob\", 0.15, \"Masked LM probability.\")\n\nflags.DEFINE_float(\n    \"short_seq_prob\", 0.1,\n    \"Probability of creating sequences which are shorter than the \"\n    \"maximum length.\")\n\n\nclass TrainingInstance(object):\n    \"\"\"A single training instance (sentence pair).\"\"\"\n\n    def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm_labels,\n                 is_random_next):\n        self.tokens = tokens\n        self.segment_ids = segment_ids\n        self.is_random_next = is_random_next\n        self.masked_lm_positions = masked_lm_positions\n        self.masked_lm_labels = masked_lm_labels\n\n    def __str__(self):\n        s = \"\"\n        s += \"tokens: %s\\n\" % (\" \".join(\n            [tokenization.printable_text(x) for x in self.tokens]))\n        s += \"segment_ids: %s\\n\" % (\" \".join([str(x) for x in self.segment_ids]))\n        s += \"is_random_next: %s\\n\" % self.is_random_next\n        s += \"masked_lm_positions: %s\\n\" % (\" \".join(\n            [str(x) for x in self.masked_lm_positions]))\n        s += \"masked_lm_labels: %s\\n\" % (\" \".join(\n            [tokenization.printable_text(x) for x in self.masked_lm_labels]))\n        s += \"\\n\"\n        return s\n\n    def __repr__(self):\n        return self.__str__()\n\n\ndef 
write_instance_to_example_files(instances, tokenizer, max_seq_length,\n                                    max_predictions_per_seq, output_files):\n    \"\"\"Create TF example files from `TrainingInstance`s.\"\"\"\n    writers = []\n    for output_file in output_files:\n        writers.append(tf.python_io.TFRecordWriter(output_file))\n\n    writer_index = 0\n\n    total_written = 0\n    for (inst_index, instance) in enumerate(instances):\n        input_ids = tokenizer.convert_tokens_to_ids(instance.tokens)\n        input_mask = [1] * len(input_ids)\n        segment_ids = list(instance.segment_ids)\n        assert len(input_ids) <= max_seq_length\n\n        while len(input_ids) < max_seq_length:\n            input_ids.append(0)\n            input_mask.append(0)\n            segment_ids.append(0)\n\n        assert len(input_ids) == max_seq_length\n        assert len(input_mask) == max_seq_length\n        # print(\"length of segment_ids:\",len(segment_ids),\"max_seq_length:\", max_seq_length)\n        assert len(segment_ids) == max_seq_length\n\n        masked_lm_positions = list(instance.masked_lm_positions)\n        masked_lm_ids = tokenizer.convert_tokens_to_ids(instance.masked_lm_labels)\n        masked_lm_weights = [1.0] * len(masked_lm_ids)\n\n        while len(masked_lm_positions) < max_predictions_per_seq:\n            masked_lm_positions.append(0)\n            masked_lm_ids.append(0)\n            masked_lm_weights.append(0.0)\n\n        next_sentence_label = 1 if instance.is_random_next else 0\n\n        features = collections.OrderedDict()\n        features[\"input_ids\"] = create_int_feature(input_ids)\n        features[\"input_mask\"] = create_int_feature(input_mask)\n        features[\"segment_ids\"] = create_int_feature(segment_ids)\n        features[\"masked_lm_positions\"] = create_int_feature(masked_lm_positions)\n        features[\"masked_lm_ids\"] = create_int_feature(masked_lm_ids)\n        features[\"masked_lm_weights\"] = 
create_float_feature(masked_lm_weights)\n        features[\"next_sentence_labels\"] = create_int_feature([next_sentence_label])\n\n        tf_example = tf.train.Example(features=tf.train.Features(feature=features))\n\n        writers[writer_index].write(tf_example.SerializeToString())\n        writer_index = (writer_index + 1) % len(writers)\n\n        total_written += 1\n\n        if inst_index < 20:\n            tf.logging.info(\"*** Example ***\")\n            tf.logging.info(\"tokens: %s\" % \" \".join(\n                [tokenization.printable_text(x) for x in instance.tokens]))\n\n            for feature_name in features.keys():\n                feature = features[feature_name]\n                values = []\n                if feature.int64_list.value:\n                    values = feature.int64_list.value\n                elif feature.float_list.value:\n                    values = feature.float_list.value\n                tf.logging.info(\n                    \"%s: %s\" % (feature_name, \" \".join([str(x) for x in values])))\n\n    for writer in writers:\n        writer.close()\n\n    tf.logging.info(\"Wrote %d total instances\", total_written)\n\n\ndef create_int_feature(values):\n    feature = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))\n    return feature\n\n\ndef create_float_feature(values):\n    feature = tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))\n    return feature\n\n\ndef create_training_instances(input_files, tokenizer, max_seq_length,\n                              dupe_factor, short_seq_prob, masked_lm_prob,\n                              max_predictions_per_seq, rng):\n    \"\"\"Create `TrainingInstance`s from raw text.\"\"\"\n    all_documents = [[]]\n\n    # Input file format:\n    # (1) One sentence per line. These should ideally be actual sentences, not\n    # entire paragraphs or arbitrary spans of text. 
(Because we use the\n    # sentence boundaries for the \"next sentence prediction\" task).\n    # (2) Blank lines between documents. Document boundaries are needed so\n    # that the \"next sentence prediction\" task doesn't span between documents.\n    print(\"create_training_instances.started...\")\n    for input_file in input_files:\n        with tf.gfile.GFile(input_file, \"r\") as reader:\n            while True:\n                line = tokenization.convert_to_unicode(reader.readline().replace(\"<eop>\",\"\"))# .replace(\"”\",\"\")) # remove <eop> (the ” replacement is disabled)\n                if not line:\n                    break\n                line = line.strip()\n\n                # Empty lines are used as document delimiters\n                if not line:\n                    all_documents.append([])\n                tokens = tokenizer.tokenize(line)\n                if tokens:\n                    all_documents[-1].append(tokens)\n\n    # Remove empty documents\n    all_documents = [x for x in all_documents if x]\n    rng.shuffle(all_documents)\n\n    vocab_words = list(tokenizer.vocab.keys())\n    instances = []\n    for _ in range(dupe_factor):\n        for document_index in range(len(all_documents)):\n            instances.extend(\n                create_instances_from_document(\n                    all_documents, document_index, max_seq_length, short_seq_prob,\n                    masked_lm_prob, max_predictions_per_seq, vocab_words, rng))\n\n    rng.shuffle(instances)\n    print(\"create_training_instances.ended...\")\n\n    return instances\n\n\ndef _is_chinese_char(cp):\n    \"\"\"Checks whether CP is the codepoint of a CJK character.\"\"\"\n    # This defines a \"chinese character\" as anything in the CJK Unicode block:\n    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)\n    #\n    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,\n    # despite its name. 
The modern Korean Hangul alphabet is a different block,\n    # as are Japanese Hiragana and Katakana. Those alphabets are used to write\n    # space-separated words, so they are not treated specially and handled\n    # like all of the other languages.\n    if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #\n        (cp >= 0x3400 and cp <= 0x4DBF) or  #\n        (cp >= 0x20000 and cp <= 0x2A6DF) or  #\n        (cp >= 0x2A700 and cp <= 0x2B73F) or  #\n        (cp >= 0x2B740 and cp <= 0x2B81F) or  #\n        (cp >= 0x2B820 and cp <= 0x2CEAF) or\n        (cp >= 0xF900 and cp <= 0xFAFF) or  #\n            (cp >= 0x2F800 and cp <= 0x2FA1F)):  #\n        return True\n\n\ndef get_new_segment(segment): # newly added method\n    \"\"\"\n    Take a sentence and return a processed sentence. To support Chinese whole-word masking, the non-initial characters of each segmented word are given a special marker (\"##\") so that later processing steps know which characters belong to the same word.\n    :param segment: a sentence, as a list of tokens\n    :return: the processed sentence\n    \"\"\"\n    seq_cws = jieba.lcut(\"\".join(segment))\n    seq_cws_dict = {x: 1 for x in seq_cws}\n    new_segment = []\n    i = 0\n    while i < len(segment):\n        if len(re.findall('[\\u4E00-\\u9FA5]', segment[i]))==0: # not Chinese: append the token unchanged\n            new_segment.append(segment[i])\n            i += 1\n            continue\n\n        has_add = False\n        for length in range(3,0,-1):\n            if i+length>len(segment):\n                continue\n            if ''.join(segment[i:i+length]) in seq_cws_dict:\n                new_segment.append(segment[i])\n                for l in range(1, length):\n                    new_segment.append('##' + segment[i+l])\n                i += length\n                has_add = True\n                break\n        if not has_add:\n            new_segment.append(segment[i])\n            i += 1\n    return new_segment\n\ndef get_raw_instance(document,max_sequence_length): # newly added method; TODO: check again to ensure full use of the data\n    \"\"\"\n    Build preliminary training instances: split the whole passage into several parts according to max_sequence_length and return them as a list of processed instances.\n    :param document: a whole passage\n    :param max_sequence_length:\n    :return: a list. 
each element is a sequence of text\n    \"\"\"\n    max_sequence_length_allowed=max_sequence_length-2\n    document = [seq for seq in document if len(seq)<max_sequence_length_allowed]\n    sizes = [len(seq) for seq in document]\n\n    result_list = []\n    curr_seq = [] # sequence currently being built\n    sz_idx = 0\n    while sz_idx < len(sizes):\n        # If the current sequence plus the next sentence fits within the limit, merge them; otherwise the limit would be exceeded, so append the current sequence to the result list and start a new one\n        if len(curr_seq) + sizes[sz_idx] <= max_sequence_length_allowed: # or len(curr_seq)==0:\n            curr_seq += document[sz_idx]\n            sz_idx += 1\n        else:\n            result_list.append(curr_seq)\n            curr_seq = []\n    # Handle the last sequence: discard it if it is too short.\n    if len(curr_seq)>max_sequence_length_allowed/2: # /2\n        result_list.append(curr_seq)\n\n    # # compute how many parts we can get in total\n    # num_instance=int(len(big_list)/max_sequence_length_allowed)+1\n    # print(\"num_instance:\",num_instance)\n    # # split into parts and add them to the list\n    # result_list=[]\n    # for j in range(num_instance):\n    #     index=j*max_sequence_length_allowed\n    #     end_index=index+max_sequence_length_allowed if j!=num_instance-1 else -1\n    #     result_list.append(big_list[index:end_index])\n    return result_list\n\ndef create_instances_from_document( # newly added method\n    # Goal: follow RoBERTa's approach and use DOC-SENTENCES, dropping the NSP task: take text contiguously from one document until the maximum length is reached. If text is taken from the next document, add a separator.\n    # A document is a whole passage containing multiple sentences; each sentence is called a segment.\n    # Given a document (a whole passage), generate some instances.\n        all_documents, document_index, max_seq_length, short_seq_prob,\n        masked_lm_prob, max_predictions_per_seq, vocab_words, rng):\n    \"\"\"Creates `TrainingInstance`s for a single document.\"\"\"\n    document = all_documents[document_index]\n\n    # Account for [CLS], [SEP], [SEP]\n    max_num_tokens = max_seq_length - 3\n\n    # We *usually* want to fill up the entire sequence since we are padding\n    # to `max_seq_length` anyways, so short sequences are generally wasted\n    # computation. 
However, we *sometimes*\n    # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter\n    # sequences to minimize the mismatch between pre-training and fine-tuning.\n    # The `target_seq_length` is just a rough target however, whereas\n    # `max_seq_length` is a hard limit.\n\n    #target_seq_length = max_num_tokens\n    #if rng.random() < short_seq_prob:\n    #    target_seq_length = rng.randint(2, max_num_tokens)\n\n    instances = []\n    raw_text_list_list=get_raw_instance(document, max_seq_length) # a document is a whole passage containing multiple sentences; each sentence is called a segment.\n    for j, raw_text_list in enumerate(raw_text_list_list):\n        ####################################################################################################################\n        raw_text_list = get_new_segment(raw_text_list) # Chinese whole-word masking based on word segmentation: add '##' where needed\n        # 1. Set up tokens and segment_ids\n        is_random_next=True # this will not be used, so its value doesn't matter\n        tokens = []\n        segment_ids = []\n        tokens.append(\"[CLS]\")\n        segment_ids.append(0)\n        for token in raw_text_list:\n            tokens.append(token)\n            segment_ids.append(0)\n        tokens.append(\"[SEP]\")\n        segment_ids.append(0)\n        ################################################################################################################\n        # 2. Call the original method\n        (tokens, masked_lm_positions,\n         masked_lm_labels) = create_masked_lm_predictions(\n            tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)\n        instance = TrainingInstance(\n            tokens=tokens,\n            segment_ids=segment_ids,\n            is_random_next=is_random_next,\n            masked_lm_positions=masked_lm_positions,\n            masked_lm_labels=masked_lm_labels)\n        instances.append(instance)\n\n    return instances\n\n\n\ndef create_instances_from_document_original(\n        all_documents, document_index, max_seq_length, short_seq_prob,\n  
      masked_lm_prob, max_predictions_per_seq, vocab_words, rng):\n    \"\"\"Creates `TrainingInstance`s for a single document.\"\"\"\n    document = all_documents[document_index]\n\n    # Account for [CLS], [SEP], [SEP]\n    max_num_tokens = max_seq_length - 3\n\n    # We *usually* want to fill up the entire sequence since we are padding\n    # to `max_seq_length` anyways, so short sequences are generally wasted\n    # computation. However, we *sometimes*\n    # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter\n    # sequences to minimize the mismatch between pre-training and fine-tuning.\n    # The `target_seq_length` is just a rough target however, whereas\n    # `max_seq_length` is a hard limit.\n    target_seq_length = max_num_tokens\n    if rng.random() < short_seq_prob:\n        target_seq_length = rng.randint(2, max_num_tokens)\n\n    # We DON'T just concatenate all of the tokens from a document into a long\n    # sequence and choose an arbitrary split point because this would make the\n    # next sentence prediction task too easy. 
Instead, we split the input into\n    # segments \"A\" and \"B\" based on the actual \"sentences\" provided by the user\n    # input.\n    instances = []\n    current_chunk = []\n    current_length = 0\n    i = 0\n    print(\"document_index:\",document_index,\"document:\",type(document),\" ;document:\",document) # a document is a whole passage containing multiple sentences; each sentence is called a segment.\n    while i < len(document):\n        segment = document[i] # take one segment (possibly a whole passage)\n        print(\"i:\",i,\" ;segment:\",segment)\n        ####################################################################################################################\n        segment = get_new_segment(segment) # Chinese whole-word masking based on word segmentation: add '##' where needed\n        ###################################################################################################################\n        current_chunk.append(segment)\n        current_length += len(segment)\n        print(\"#####condition:\",i == len(document) - 1 or current_length >= target_seq_length)\n        if i == len(document) - 1 or current_length >= target_seq_length:\n            if current_chunk:\n                # `a_end` is how many segments from `current_chunk` go into the `A`\n                # (first) sentence.\n                a_end = 1\n                if len(current_chunk) >= 2:\n                    a_end = rng.randint(1, len(current_chunk) - 1)\n\n                tokens_a = []\n                for j in range(a_end):\n                    tokens_a.extend(current_chunk[j])\n\n                tokens_b = []\n                # Random next\n                is_random_next = False\n                if len(current_chunk) == 1 or rng.random() < 0.5:\n                    is_random_next = True\n                    target_b_length = target_seq_length - len(tokens_a)\n\n                    # This should rarely go for more than one iteration for large\n                    # corpora. 
However, just to be careful, we try to make sure that\n                    # the random document is not the same as the document\n                    # we're processing.\n                    for _ in range(10):\n                        random_document_index = rng.randint(0, len(all_documents) - 1)\n                        if random_document_index != document_index:\n                            break\n\n                    random_document = all_documents[random_document_index]\n                    random_start = rng.randint(0, len(random_document) - 1)\n                    for j in range(random_start, len(random_document)):\n                        tokens_b.extend(random_document[j])\n                        if len(tokens_b) >= target_b_length:\n                            break\n                    # We didn't actually use these segments so we \"put them back\" so\n                    # they don't go to waste.\n                    num_unused_segments = len(current_chunk) - a_end\n                    i -= num_unused_segments\n                # Actual next\n                else:\n                    is_random_next = False\n                    for j in range(a_end, len(current_chunk)):\n                        tokens_b.extend(current_chunk[j])\n                truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng)\n\n                assert len(tokens_a) >= 1\n                assert len(tokens_b) >= 1\n\n                tokens = []\n                segment_ids = []\n                tokens.append(\"[CLS]\")\n                segment_ids.append(0)\n                for token in tokens_a:\n                    tokens.append(token)\n                    segment_ids.append(0)\n\n                tokens.append(\"[SEP]\")\n                segment_ids.append(0)\n\n                for token in tokens_b:\n                    tokens.append(token)\n                    segment_ids.append(1)\n                tokens.append(\"[SEP]\")\n                segment_ids.append(1)\n\n          
      (tokens, masked_lm_positions,\n                 masked_lm_labels) = create_masked_lm_predictions(\n                     tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)\n                instance = TrainingInstance(\n                    tokens=tokens,\n                    segment_ids=segment_ids,\n                    is_random_next=is_random_next,\n                    masked_lm_positions=masked_lm_positions,\n                    masked_lm_labels=masked_lm_labels)\n                instances.append(instance)\n            current_chunk = []\n            current_length = 0\n        i += 1\n\n    return instances\n\n\nMaskedLmInstance = collections.namedtuple(\"MaskedLmInstance\",\n                                          [\"index\", \"label\"])\n\n\ndef create_masked_lm_predictions(tokens, masked_lm_prob,\n                                 max_predictions_per_seq, vocab_words, rng):\n    \"\"\"Creates the predictions for the masked LM objective.\"\"\"\n\n    cand_indexes = []\n    for (i, token) in enumerate(tokens):\n        if token == \"[CLS]\" or token == \"[SEP]\":\n            continue\n        # Whole Word Masking means that if we mask all of the wordpieces\n        # corresponding to an original word. When a word has been split into\n        # WordPieces, the first token does not have any marker and any subsequence\n        # tokens are prefixed with ##. 
So whenever we see the ## token, we\n        # append it to the previous set of word indexes.\n        #\n        # Note that Whole Word Masking does *not* change the training code\n        # at all -- we still predict each WordPiece independently, softmaxed\n        # over the entire vocabulary.\n        if (FLAGS.do_whole_word_mask and len(cand_indexes) >= 1 and\n                token.startswith(\"##\")):\n            cand_indexes[-1].append(i)\n        else:\n            cand_indexes.append([i])\n\n    rng.shuffle(cand_indexes)\n\n    output_tokens = [t[2:] if len(re.findall('##[\\u4E00-\\u9FA5]', t))>0 else t for t in tokens] # strip the \"##\" prefix\n\n    num_to_predict = min(max_predictions_per_seq,\n                         max(1, int(round(len(tokens) * masked_lm_prob))))\n\n    masked_lms = []\n    covered_indexes = set()\n    for index_set in cand_indexes:\n        if len(masked_lms) >= num_to_predict:\n            break\n        # If adding a whole-word mask would exceed the maximum number of\n        # predictions, then just skip this candidate.\n        if len(masked_lms) + len(index_set) > num_to_predict:\n            continue\n        is_any_index_covered = False\n        for index in index_set:\n            if index in covered_indexes:\n                is_any_index_covered = True\n                break\n        if is_any_index_covered:\n            continue\n        for index in index_set:\n            covered_indexes.add(index)\n\n            masked_token = None\n            # 80% of the time, replace with [MASK]\n            if rng.random() < 0.8:\n                masked_token = \"[MASK]\"\n            else:\n                # 10% of the time, keep original\n                if rng.random() < 0.5:\n                    masked_token = tokens[index][2:] if len(re.findall('##[\\u4E00-\\u9FA5]', tokens[index]))>0 else tokens[index] # strip the \"##\" prefix\n                # 10% of the time, replace with random word\n                else:\n                    masked_token = 
vocab_words[rng.randint(0, len(vocab_words) - 1)]\n\n            output_tokens[index] = masked_token\n\n            masked_lms.append(MaskedLmInstance(index=index, label=tokens[index]))\n    assert len(masked_lms) <= num_to_predict\n    masked_lms = sorted(masked_lms, key=lambda x: x.index)\n\n    masked_lm_positions = []\n    masked_lm_labels = []\n    for p in masked_lms:\n        masked_lm_positions.append(p.index)\n        masked_lm_labels.append(p.label)\n\n    # tf.logging.info('%s' % (tokens))\n    # tf.logging.info('%s' % (output_tokens))\n    return (output_tokens, masked_lm_positions, masked_lm_labels)\n\n\ndef truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng):\n    \"\"\"Truncates a pair of sequences to a maximum sequence length.\"\"\"\n    while True:\n        total_length = len(tokens_a) + len(tokens_b)\n        if total_length <= max_num_tokens:\n            break\n\n        trunc_tokens = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b\n        assert len(trunc_tokens) >= 1\n\n        # We want to sometimes truncate from the front and sometimes from the\n        # back to add more randomness and avoid biases.\n        if rng.random() < 0.5:\n            del trunc_tokens[0]\n        else:\n            trunc_tokens.pop()\n\n\ndef main(_):\n    tf.logging.set_verbosity(tf.logging.INFO)\n\n    tokenizer = tokenization.FullTokenizer(\n        vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)\n\n    input_files = []\n    for input_pattern in FLAGS.input_file.split(\",\"):\n        input_files.extend(tf.gfile.Glob(input_pattern))\n\n    tf.logging.info(\"*** Reading from input files ***\")\n    for input_file in input_files:\n        tf.logging.info(\"  %s\", input_file)\n\n    rng = random.Random(FLAGS.random_seed)\n    instances = create_training_instances(\n        input_files, tokenizer, FLAGS.max_seq_length, FLAGS.dupe_factor,\n        FLAGS.short_seq_prob, FLAGS.masked_lm_prob, FLAGS.max_predictions_per_seq,\n        
rng)\n\n    output_files = FLAGS.output_file.split(\",\")\n    tf.logging.info(\"*** Writing to output files ***\")\n    for output_file in output_files:\n        tf.logging.info(\"  %s\", output_file)\n\n    write_instance_to_example_files(instances, tokenizer, FLAGS.max_seq_length,\n                                    FLAGS.max_predictions_per_seq, output_files)\n\n\nif __name__ == \"__main__\":\n    flags.mark_flag_as_required(\"input_file\")\n    flags.mark_flag_as_required(\"output_file\")\n    flags.mark_flag_as_required(\"vocab_file\")\n    tf.app.run()"
  },
  {
    "path": "resources/shell_scripts/create_pretrain_data_batch_webtext.sh",
    "content": "#!/usr/bin/env bash\necho $1,$2\n\nBERT_BASE_DIR=./bert_config\nfor((i=$1;i<=$2;i++));\ndo\npython3 create_pretraining_data.py --do_whole_word_mask=True --input_file=gs://raw_text/web_text_zh_raw/web_text_zh_$i.txt \\\n--output_file=gs://albert_zh/tf_records/tf_web_text_zh_$i.tfrecord --vocab_file=$BERT_BASE_DIR/vocab.txt --do_lower_case=True \\\n--max_seq_length=512 --max_predictions_per_seq=76 --masked_lm_prob=0.15\ndone\n"
  },
  {
    "path": "run_classifier.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"BERT finetuning runner.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport csv\nimport os\nimport modeling\nimport optimization_finetuning as optimization\nimport tokenization\nimport tensorflow as tf\n# from loss import bi_tempered_logistic_loss\n\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\n## Required parameters\nflags.DEFINE_string(\n    \"data_dir\", None,\n    \"The input data dir. Should contain the .tsv files (or other data files) \"\n    \"for the task.\")\n\nflags.DEFINE_string(\n    \"bert_config_file\", None,\n    \"The config json file corresponding to the pre-trained BERT model. \"\n    \"This specifies the model architecture.\")\n\nflags.DEFINE_string(\"task_name\", None, \"The name of the task to train.\")\n\nflags.DEFINE_string(\"vocab_file\", None,\n                    \"The vocabulary file that the BERT model was trained on.\")\n\nflags.DEFINE_string(\n    \"output_dir\", None,\n    \"The output directory where the model checkpoints will be written.\")\n\n## Other parameters\n\nflags.DEFINE_string(\n    \"init_checkpoint\", None,\n    \"Initial checkpoint (usually from a pre-trained BERT model).\")\n\nflags.DEFINE_bool(\n    \"do_lower_case\", True,\n    \"Whether to lower case the input text. 
Should be True for uncased \"\n    \"models and False for cased models.\")\n\nflags.DEFINE_integer(\n    \"max_seq_length\", 128,\n    \"The maximum total input sequence length after WordPiece tokenization. \"\n    \"Sequences longer than this will be truncated, and sequences shorter \"\n    \"than this will be padded.\")\n\nflags.DEFINE_bool(\"do_train\", False, \"Whether to run training.\")\n\nflags.DEFINE_bool(\"do_eval\", False, \"Whether to run eval on the dev set.\")\n\nflags.DEFINE_bool(\n    \"do_predict\", False,\n    \"Whether to run the model in inference mode on the test set.\")\n\nflags.DEFINE_integer(\"train_batch_size\", 32, \"Total batch size for training.\")\n\nflags.DEFINE_integer(\"eval_batch_size\", 8, \"Total batch size for eval.\")\n\nflags.DEFINE_integer(\"predict_batch_size\", 8, \"Total batch size for predict.\")\n\nflags.DEFINE_float(\"learning_rate\", 5e-5, \"The initial learning rate for Adam.\")\n\nflags.DEFINE_float(\"num_train_epochs\", 3.0,\n                   \"Total number of training epochs to perform.\")\n\nflags.DEFINE_float(\n    \"warmup_proportion\", 0.1,\n    \"Proportion of training to perform linear learning rate warmup for. \"\n    \"E.g., 0.1 = 10% of training.\")\n\nflags.DEFINE_integer(\"save_checkpoints_steps\", 1000,\n                     \"How often to save the model checkpoint.\")\n\nflags.DEFINE_integer(\"iterations_per_loop\", 1000,\n                     \"How many steps to make in each estimator call.\")\n\nflags.DEFINE_bool(\"use_tpu\", False, \"Whether to use TPU or GPU/CPU.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_name\", None,\n    \"The Cloud TPU to use for training. This should be either the name \"\n    \"used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 \"\n    \"url.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_zone\", None,\n    \"[Optional] GCE zone where the Cloud TPU is located in. 
If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\n    \"gcp_project\", None,\n    \"[Optional] Project name for the Cloud TPU-enabled project. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\"master\", None, \"[Optional] TensorFlow master URL.\")\n\nflags.DEFINE_integer(\n    \"num_tpu_cores\", 8,\n    \"Only used if `use_tpu` is True. Total number of TPU cores to use.\")\n\n\nclass InputExample(object):\n  \"\"\"A single training/test example for simple sequence classification.\"\"\"\n\n  def __init__(self, guid, text_a, text_b=None, label=None):\n    \"\"\"Constructs a InputExample.\n    Args:\n      guid: Unique id for the example.\n      text_a: string. The untokenized text of the first sequence. For single\n        sequence tasks, only this sequence must be specified.\n      text_b: (Optional) string. The untokenized text of the second sequence.\n        Only must be specified for sequence pair tasks.\n      label: (Optional) string. The label of the example. This should be\n        specified for train and dev examples, but not for test examples.\n    \"\"\"\n    self.guid = guid\n    self.text_a = text_a\n    self.text_b = text_b\n    self.label = label\n\n\nclass PaddingInputExample(object):\n  \"\"\"Fake example so the num input examples is a multiple of the batch size.\n  When running eval/predict on the TPU, we need to pad the number of examples\n  to be a multiple of the batch size, because the TPU requires a fixed batch\n  size. 
The alternative is to drop the last batch, which is bad because it means\n  the entire output data won't be generated.\n  We use this class instead of `None` because treating `None` as padding\n  batches could cause silent errors.\n  \"\"\"\n\n\nclass InputFeatures(object):\n  \"\"\"A single set of features of data.\"\"\"\n\n  def __init__(self,\n               input_ids,\n               input_mask,\n               segment_ids,\n               label_id,\n               is_real_example=True):\n    self.input_ids = input_ids\n    self.input_mask = input_mask\n    self.segment_ids = segment_ids\n    self.label_id = label_id\n    self.is_real_example = is_real_example\n\n\nclass DataProcessor(object):\n  \"\"\"Base class for data converters for sequence classification data sets.\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for the train set.\"\"\"\n    raise NotImplementedError()\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for the dev set.\"\"\"\n    raise NotImplementedError()\n\n  def get_test_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for prediction.\"\"\"\n    raise NotImplementedError()\n\n  def get_labels(self):\n    \"\"\"Gets the list of labels for this data set.\"\"\"\n    raise NotImplementedError()\n\n  @classmethod\n  def _read_tsv(cls, input_file, quotechar=None):\n    \"\"\"Reads a tab separated value file.\"\"\"\n    with tf.gfile.Open(input_file, \"r\") as f:\n      reader = csv.reader(f, delimiter=\"\\t\", quotechar=quotechar)\n      lines = []\n      for line in reader:\n        lines.append(line)\n      return lines\n\ndef convert_single_example(ex_index, example, label_list, max_seq_length,\n                           tokenizer):\n  \"\"\"Converts a single `InputExample` into a single `InputFeatures`.\"\"\"\n\n  if isinstance(example, PaddingInputExample):\n    return InputFeatures(\n        input_ids=[0] * 
max_seq_length,\n        input_mask=[0] * max_seq_length,\n        segment_ids=[0] * max_seq_length,\n        label_id=0,\n        is_real_example=False)\n\n  label_map = {}\n  for (i, label) in enumerate(label_list):\n    label_map[label] = i\n\n  tokens_a = tokenizer.tokenize(example.text_a)\n  tokens_b = None\n  if example.text_b:\n    tokens_b = tokenizer.tokenize(example.text_b)\n\n  if tokens_b:\n    # Modifies `tokens_a` and `tokens_b` in place so that the total\n    # length is less than the specified length.\n    # Account for [CLS], [SEP], [SEP] with \"- 3\"\n    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)\n  else:\n    # Account for [CLS] and [SEP] with \"- 2\"\n    if len(tokens_a) > max_seq_length - 2:\n      tokens_a = tokens_a[0:(max_seq_length - 2)]\n\n  # The convention in BERT is:\n  # (a) For sequence pairs:\n  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]\n  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1\n  # (b) For single sequences:\n  #  tokens:   [CLS] the dog is hairy . [SEP]\n  #  type_ids: 0     0   0   0  0     0 0\n  #\n  # Where \"type_ids\" are used to indicate whether this is the first\n  # sequence or the second sequence. The embedding vectors for `type=0` and\n  # `type=1` were learned during pre-training and are added to the wordpiece\n  # embedding vector (and position vector). This is not *strictly* necessary\n  # since the [SEP] token unambiguously separates the sequences, but it makes\n  # it easier for the model to learn the concept of sequences.\n  #\n  # For classification tasks, the first vector (corresponding to [CLS]) is\n  # used as the \"sentence vector\". 
Note that this only makes sense because\n  # the entire model is fine-tuned.\n  tokens = []\n  segment_ids = []\n  tokens.append(\"[CLS]\")\n  segment_ids.append(0)\n  for token in tokens_a:\n    tokens.append(token)\n    segment_ids.append(0)\n  tokens.append(\"[SEP]\")\n  segment_ids.append(0)\n\n  if tokens_b:\n    for token in tokens_b:\n      tokens.append(token)\n      segment_ids.append(1)\n    tokens.append(\"[SEP]\")\n    segment_ids.append(1)\n\n  input_ids = tokenizer.convert_tokens_to_ids(tokens)\n\n  # The mask has 1 for real tokens and 0 for padding tokens. Only real\n  # tokens are attended to.\n  input_mask = [1] * len(input_ids)\n\n  # Zero-pad up to the sequence length.\n  while len(input_ids) < max_seq_length:\n    input_ids.append(0)\n    input_mask.append(0)\n    segment_ids.append(0)\n\n  assert len(input_ids) == max_seq_length\n  assert len(input_mask) == max_seq_length\n  assert len(segment_ids) == max_seq_length\n\n  label_id = label_map[example.label]\n  if ex_index < 5:\n    tf.logging.info(\"*** Example ***\")\n    tf.logging.info(\"guid: %s\" % (example.guid))\n    tf.logging.info(\"tokens: %s\" % \" \".join(\n        [tokenization.printable_text(x) for x in tokens]))\n    tf.logging.info(\"input_ids: %s\" % \" \".join([str(x) for x in input_ids]))\n    tf.logging.info(\"input_mask: %s\" % \" \".join([str(x) for x in input_mask]))\n    tf.logging.info(\"segment_ids: %s\" % \" \".join([str(x) for x in segment_ids]))\n    tf.logging.info(\"label: %s (id = %d)\" % (example.label, label_id))\n\n  feature = InputFeatures(\n      input_ids=input_ids,\n      input_mask=input_mask,\n      segment_ids=segment_ids,\n      label_id=label_id,\n      is_real_example=True)\n  return feature\n\n\ndef file_based_convert_examples_to_features(\n    examples, label_list, max_seq_length, tokenizer, output_file):\n  \"\"\"Convert a set of `InputExample`s to a TFRecord file.\"\"\"\n\n  writer = tf.python_io.TFRecordWriter(output_file)\n\n  for (ex_index, 
example) in enumerate(examples):\n    if ex_index % 10000 == 0:\n      tf.logging.info(\"Writing example %d of %d\" % (ex_index, len(examples)))\n\n    feature = convert_single_example(ex_index, example, label_list,\n                                     max_seq_length, tokenizer)\n\n    def create_int_feature(values):\n      f = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))\n      return f\n\n    features = collections.OrderedDict()\n    features[\"input_ids\"] = create_int_feature(feature.input_ids)\n    features[\"input_mask\"] = create_int_feature(feature.input_mask)\n    features[\"segment_ids\"] = create_int_feature(feature.segment_ids)\n    features[\"label_ids\"] = create_int_feature([feature.label_id])\n    features[\"is_real_example\"] = create_int_feature(\n        [int(feature.is_real_example)])\n\n    tf_example = tf.train.Example(features=tf.train.Features(feature=features))\n    writer.write(tf_example.SerializeToString())\n  writer.close()\n\n\ndef file_based_input_fn_builder(input_file, seq_length, is_training,\n                                drop_remainder):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  name_to_features = {\n      \"input_ids\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"input_mask\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"segment_ids\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"label_ids\": tf.FixedLenFeature([], tf.int64),\n      \"is_real_example\": tf.FixedLenFeature([], tf.int64),\n  }\n\n  def _decode_record(record, name_to_features):\n    \"\"\"Decodes a record to a TensorFlow example.\"\"\"\n    example = tf.parse_single_example(record, name_to_features)\n\n    # tf.Example only supports tf.int64, but the TPU only supports tf.int32.\n    # So cast all int64 to int32.\n    for name in list(example.keys()):\n      t = example[name]\n      if t.dtype == tf.int64:\n        t = tf.to_int32(t)\n      example[name] = t\n\n    return 
example\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    # For training, we want a lot of parallel reading and shuffling.\n    # For eval, we want no shuffling and parallel reading doesn't matter.\n    d = tf.data.TFRecordDataset(input_file)\n    if is_training:\n      d = d.repeat()\n      d = d.shuffle(buffer_size=100)\n\n    d = d.apply(\n        tf.contrib.data.map_and_batch(\n            lambda record: _decode_record(record, name_to_features),\n            batch_size=batch_size,\n            drop_remainder=drop_remainder))\n\n    return d\n\n  return input_fn\n\n\ndef _truncate_seq_pair(tokens_a, tokens_b, max_length):\n  \"\"\"Truncates a sequence pair in place to the maximum length.\"\"\"\n\n  # This is a simple heuristic which will always truncate the longer sequence\n  # one token at a time. This makes more sense than truncating an equal percent\n  # of tokens from each, since if one sequence is very short then each token\n  # that's truncated likely contains more information than a longer sequence.\n  while True:\n    total_length = len(tokens_a) + len(tokens_b)\n    if total_length <= max_length:\n      break\n    if len(tokens_a) > len(tokens_b):\n      tokens_a.pop()\n    else:\n      tokens_b.pop()\n\n\ndef create_model(bert_config, is_training, input_ids, input_mask, segment_ids,\n                 labels, num_labels, use_one_hot_embeddings):\n  \"\"\"Creates a classification model.\"\"\"\n  model = modeling.BertModel(\n      config=bert_config,\n      is_training=is_training,\n      input_ids=input_ids,\n      input_mask=input_mask,\n      token_type_ids=segment_ids,\n      use_one_hot_embeddings=use_one_hot_embeddings)\n\n  # In the demo, we are doing a simple classification task on the entire\n  # segment.\n  #\n  # If you want to use the token-level output, use model.get_sequence_output()\n  # instead.\n  output_layer = model.get_pooled_output()\n\n  hidden_size = 
output_layer.shape[-1].value\n\n  output_weights = tf.get_variable(\n      \"output_weights\", [num_labels, hidden_size],\n      initializer=tf.truncated_normal_initializer(stddev=0.02))\n\n  output_bias = tf.get_variable(\n      \"output_bias\", [num_labels], initializer=tf.zeros_initializer())\n\n  with tf.variable_scope(\"loss\"):\n    ln_type = bert_config.ln_type\n    if ln_type == 'preln': # add by brightmart, 10-06. if it is preln, we need to an additonal layer: layer normalization as suggested in paper \"ON LAYER NORMALIZATION IN THE TRANSFORMER ARCHITECTURE\"\n        print(\"ln_type is preln. add LN layer.\")\n        output_layer=layer_norm(output_layer)\n    else:\n        print(\"ln_type is postln or other,do nothing.\")\n\n    if is_training:\n      # I.e., 0.1 dropout\n      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)\n\n    logits = tf.matmul(output_layer, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    probabilities = tf.nn.softmax(logits, axis=-1)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n\n    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)\n\n    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1) # todo 08-29 try temp-loss\n    ###############bi_tempered_logistic_loss############################################################################\n    # print(\"##cross entropy loss is used....\"); tf.logging.info(\"##cross entropy loss is used....\")\n    # t1=0.9 #t1=0.90\n    # t2=1.05 #t2=1.05\n    # per_example_loss=bi_tempered_logistic_loss(log_probs,one_hot_labels,t1,t2,label_smoothing=0.1,num_iters=5) # TODO label_smoothing=0.0\n    #tf.logging.info(\"per_example_loss:\"+str(per_example_loss.shape))\n    ##############bi_tempered_logistic_loss#############################################################################\n\n    loss = tf.reduce_mean(per_example_loss)\n\n    return (loss, per_example_loss, logits, probabilities)\n\ndef 
layer_norm(input_tensor, name=None):\n  \"\"\"Run layer normalization on the last dimension of the tensor.\"\"\"\n  return tf.contrib.layers.layer_norm(\n      inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)\n\ndef model_fn_builder(bert_config, num_labels, init_checkpoint, learning_rate,\n                     num_train_steps, num_warmup_steps, use_tpu,\n                     use_one_hot_embeddings):\n  \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n\n  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\n    \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n\n    tf.logging.info(\"*** Features ***\")\n    for name in sorted(features.keys()):\n      tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\n\n    input_ids = features[\"input_ids\"]\n    input_mask = features[\"input_mask\"]\n    segment_ids = features[\"segment_ids\"]\n    label_ids = features[\"label_ids\"]\n    is_real_example = None\n    if \"is_real_example\" in features:\n      is_real_example = tf.cast(features[\"is_real_example\"], dtype=tf.float32)\n    else:\n      is_real_example = tf.ones(tf.shape(label_ids), dtype=tf.float32)\n\n    is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n\n    (total_loss, per_example_loss, logits, probabilities) = create_model(\n        bert_config, is_training, input_ids, input_mask, segment_ids, label_ids,\n        num_labels, use_one_hot_embeddings)\n\n    tvars = tf.trainable_variables()\n    initialized_variable_names = {}\n    scaffold_fn = None\n    if init_checkpoint:\n      (assignment_map, initialized_variable_names\n      ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)\n      if use_tpu:\n\n        def tpu_scaffold():\n          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n          return tf.train.Scaffold()\n\n        scaffold_fn = tpu_scaffold\n      else:\n        tf.train.init_from_checkpoint(init_checkpoint, 
assignment_map)\n\n    tf.logging.info(\"**** Trainable Variables ****\")\n    for var in tvars:\n      init_string = \"\"\n      if var.name in initialized_variable_names:\n        init_string = \", *INIT_FROM_CKPT*\"\n      tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\n                      init_string)\n\n    output_spec = None\n    if mode == tf.estimator.ModeKeys.TRAIN:\n\n      train_op = optimization.create_optimizer(\n          total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          train_op=train_op,\n          scaffold_fn=scaffold_fn)\n    elif mode == tf.estimator.ModeKeys.EVAL:\n\n      def metric_fn(per_example_loss, label_ids, logits, is_real_example):\n        predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)\n        accuracy = tf.metrics.accuracy(\n            labels=label_ids, predictions=predictions, weights=is_real_example)\n        loss = tf.metrics.mean(values=per_example_loss, weights=is_real_example)\n        return {\n            \"eval_accuracy\": accuracy,\n            \"eval_loss\": loss,\n        }\n\n      eval_metrics = (metric_fn,\n                      [per_example_loss, label_ids, logits, is_real_example])\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          eval_metrics=eval_metrics,\n          scaffold_fn=scaffold_fn)\n    else:\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          predictions={\"probabilities\": probabilities},\n          scaffold_fn=scaffold_fn)\n    return output_spec\n\n  return model_fn\n\n\n# This function is not used by this file but is still used by the Colab and\n# people who depend on it.\ndef input_fn_builder(features, seq_length, is_training, drop_remainder):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  
all_input_ids = []\n  all_input_mask = []\n  all_segment_ids = []\n  all_label_ids = []\n\n  for feature in features:\n    all_input_ids.append(feature.input_ids)\n    all_input_mask.append(feature.input_mask)\n    all_segment_ids.append(feature.segment_ids)\n    all_label_ids.append(feature.label_id)\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    num_examples = len(features)\n\n    # This is for demo purposes and does NOT scale to large data sets. We do\n    # not use Dataset.from_generator() because that uses tf.py_func which is\n    # not TPU compatible. The right way to load data is with TFRecordReader.\n    d = tf.data.Dataset.from_tensor_slices({\n        \"input_ids\":\n            tf.constant(\n                all_input_ids, shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"input_mask\":\n            tf.constant(\n                all_input_mask,\n                shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"segment_ids\":\n            tf.constant(\n                all_segment_ids,\n                shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"label_ids\":\n            tf.constant(all_label_ids, shape=[num_examples], dtype=tf.int32),\n    })\n\n    if is_training:\n      d = d.repeat()\n      d = d.shuffle(buffer_size=100)\n\n    d = d.batch(batch_size=batch_size, drop_remainder=drop_remainder)\n    return d\n\n  return input_fn\n\nclass LCQMCPairClassificationProcessor(DataProcessor): # TODO NEED CHANGE2\n  \"\"\"Processor for the internal data set. 
sentence pair classification\"\"\"\n  def __init__(self):\n    self.language = \"zh\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"train.txt\")), \"train\")\n    # dev_0827.tsv\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"dev.txt\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"test.txt\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n    #return [\"-1\",\"0\", \"1\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    print(\"length of lines:\",len(lines))\n    for (i, line) in enumerate(lines):\n      #print('#i:',i,line)\n      if i == 0:\n        continue\n      guid = \"%s-%s\" % (set_type, i)\n      try:\n          label = tokenization.convert_to_unicode(line[2])\n          text_a = tokenization.convert_to_unicode(line[0])\n          text_b = tokenization.convert_to_unicode(line[1])\n          examples.append(\n              InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n      except Exception:\n          print('###error.i:', i, line)\n    return examples\n\nclass SentencePairClassificationProcessor(DataProcessor):\n  \"\"\"Processor for the internal data set. 
sentence pair classification\"\"\"\n  def __init__(self):\n    self.language = \"zh\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"train_0827.tsv\")), \"train\")\n    # dev_0827.tsv\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"dev_0827.tsv\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"test_0827.tsv\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n    #return [\"-1\",\"0\", \"1\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    print(\"length of lines:\",len(lines))\n    for (i, line) in enumerate(lines):\n      #print('#i:',i,line)\n      if i == 0:\n        continue\n      guid = \"%s-%s\" % (set_type, i)\n      try:\n          label = tokenization.convert_to_unicode(line[0])\n          text_a = tokenization.convert_to_unicode(line[1])\n          text_b = tokenization.convert_to_unicode(line[2])\n          examples.append(\n              InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n      except Exception:\n          print('###error.i:', i, line)\n    return examples\n\n# This function is not used by this file but is still used by the Colab and\n# people who depend on it.\ndef convert_examples_to_features(examples, label_list, max_seq_length,\n                                 tokenizer):\n  \"\"\"Convert a set of `InputExample`s to a list of `InputFeatures`.\"\"\"\n\n  features = []\n  for (ex_index, example) in enumerate(examples):\n    if ex_index % 10000 == 0:\n      tf.logging.info(\"Writing example %d of %d\" % (ex_index, 
len(examples)))\n\n    feature = convert_single_example(ex_index, example, label_list,\n                                     max_seq_length, tokenizer)\n\n    features.append(feature)\n  return features\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  processors = {\n      \"sentence_pair\": SentencePairClassificationProcessor,\n      \"lcqmc_pair\": LCQMCPairClassificationProcessor,\n      \"lcqmc\": LCQMCPairClassificationProcessor\n\n  }\n\n  tokenization.validate_case_matches_checkpoint(FLAGS.do_lower_case,\n                                                FLAGS.init_checkpoint)\n\n  if not FLAGS.do_train and not FLAGS.do_eval and not FLAGS.do_predict:\n    raise ValueError(\n        \"At least one of `do_train`, `do_eval` or `do_predict` must be True.\")\n\n  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)\n\n  if FLAGS.max_seq_length > bert_config.max_position_embeddings:\n    raise ValueError(\n        \"Cannot use sequence length %d because the BERT model \"\n        \"was only trained up to sequence length %d\" %\n        (FLAGS.max_seq_length, bert_config.max_position_embeddings))\n\n  tf.gfile.MakeDirs(FLAGS.output_dir)\n\n  task_name = FLAGS.task_name.lower()\n\n  if task_name not in processors:\n    raise ValueError(\"Task not found: %s\" % (task_name))\n\n  processor = processors[task_name]()\n\n  label_list = processor.get_labels()\n\n  tokenizer = tokenization.FullTokenizer(\n      vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)\n\n  tpu_cluster_resolver = None\n  if FLAGS.use_tpu and FLAGS.tpu_name:\n    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(\n        FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)\n\n  is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n  # Cloud TPU: Invalid TPU configuration, ensure ClusterResolver is passed to tpu.\n  print(\"###tpu_cluster_resolver:\",tpu_cluster_resolver)\n  run_config = 
tf.contrib.tpu.RunConfig(\n      cluster=tpu_cluster_resolver,\n      master=FLAGS.master,\n      model_dir=FLAGS.output_dir,\n      save_checkpoints_steps=FLAGS.save_checkpoints_steps,\n      tpu_config=tf.contrib.tpu.TPUConfig(\n          iterations_per_loop=FLAGS.iterations_per_loop,\n          num_shards=FLAGS.num_tpu_cores,\n          per_host_input_for_training=is_per_host))\n\n  train_examples = None\n  num_train_steps = None\n  num_warmup_steps = None\n  if FLAGS.do_train:\n    train_examples = processor.get_train_examples(FLAGS.data_dir)  # TODO\n    print(\"###length of total train_examples:\",len(train_examples))\n    num_train_steps = int(len(train_examples) / FLAGS.train_batch_size * FLAGS.num_train_epochs)\n    num_warmup_steps = int(num_train_steps * FLAGS.warmup_proportion)\n\n  model_fn = model_fn_builder(\n      bert_config=bert_config,\n      num_labels=len(label_list),\n      init_checkpoint=FLAGS.init_checkpoint,\n      learning_rate=FLAGS.learning_rate,\n      num_train_steps=num_train_steps,\n      num_warmup_steps=num_warmup_steps,\n      use_tpu=FLAGS.use_tpu,\n      use_one_hot_embeddings=FLAGS.use_tpu)\n\n  # If TPU is not available, this will fall back to normal Estimator on CPU\n  # or GPU.\n  estimator = tf.contrib.tpu.TPUEstimator(\n      use_tpu=FLAGS.use_tpu,\n      model_fn=model_fn,\n      config=run_config,\n      train_batch_size=FLAGS.train_batch_size,\n      eval_batch_size=FLAGS.eval_batch_size,\n      predict_batch_size=FLAGS.predict_batch_size)\n\n  if FLAGS.do_train:\n    train_file = os.path.join(FLAGS.output_dir, \"train.tf_record\")\n    train_file_exists = os.path.exists(train_file)\n    print(\"###train_file_exists:\", train_file_exists,\" ;train_file:\",train_file)\n    if not train_file_exists:  # If the TFRecord file does not exist yet, build it from the raw text file. 
# TODO\n        file_based_convert_examples_to_features(train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)\n    tf.logging.info(\"***** Running training *****\")\n    tf.logging.info(\"  Num examples = %d\", len(train_examples))\n    tf.logging.info(\"  Batch size = %d\", FLAGS.train_batch_size)\n    tf.logging.info(\"  Num steps = %d\", num_train_steps)\n    train_input_fn = file_based_input_fn_builder(\n        input_file=train_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=True,\n        drop_remainder=True)\n    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)\n\n  if FLAGS.do_eval:\n    eval_examples = processor.get_dev_examples(FLAGS.data_dir)\n    num_actual_eval_examples = len(eval_examples)\n    if FLAGS.use_tpu:\n      # TPU requires a fixed batch size for all batches, therefore the number\n      # of examples must be a multiple of the batch size, or else examples\n      # will get dropped. So we pad with fake examples which are ignored\n      # later on. 
These do NOT count towards the metric (all tf.metrics\n      # support a per-instance weight, and these get a weight of 0.0).\n      while len(eval_examples) % FLAGS.eval_batch_size != 0:\n        eval_examples.append(PaddingInputExample())\n\n    eval_file = os.path.join(FLAGS.output_dir, \"eval.tf_record\")\n    file_based_convert_examples_to_features(\n        eval_examples, label_list, FLAGS.max_seq_length, tokenizer, eval_file)\n\n    tf.logging.info(\"***** Running evaluation *****\")\n    tf.logging.info(\"  Num examples = %d (%d actual, %d padding)\",\n                    len(eval_examples), num_actual_eval_examples,\n                    len(eval_examples) - num_actual_eval_examples)\n    tf.logging.info(\"  Batch size = %d\", FLAGS.eval_batch_size)\n\n    # This tells the estimator to run through the entire set.\n    eval_steps = None\n    # However, if running eval on the TPU, you will need to specify the\n    # number of steps.\n    if FLAGS.use_tpu:\n      assert len(eval_examples) % FLAGS.eval_batch_size == 0\n      eval_steps = int(len(eval_examples) // FLAGS.eval_batch_size)\n\n    eval_drop_remainder = True if FLAGS.use_tpu else False\n    eval_input_fn = file_based_input_fn_builder(\n        input_file=eval_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=False,\n        drop_remainder=eval_drop_remainder)\n\n    #######################################################################################################################\n    # evaluate all checkpoints; you can use the checkpoint with the best dev accuarcy\n    steps_and_files = []\n    filenames = tf.gfile.ListDirectory(FLAGS.output_dir)\n    for filename in filenames:\n        if filename.endswith(\".index\"):\n            ckpt_name = filename[:-6]\n            cur_filename = os.path.join(FLAGS.output_dir, ckpt_name)\n            global_step = int(cur_filename.split(\"-\")[-1])\n            tf.logging.info(\"Add {} to eval list.\".format(cur_filename))\n            
steps_and_files.append([global_step, cur_filename])\n    steps_and_files = sorted(steps_and_files, key=lambda x: x[0])\n\n    output_eval_file = os.path.join(FLAGS.data_dir, \"eval_results_albert_zh.txt\")\n    print(\"output_eval_file:\",output_eval_file)\n    tf.logging.info(\"output_eval_file:\"+output_eval_file)\n    with tf.gfile.GFile(output_eval_file, \"w\") as writer:\n        for global_step, filename in sorted(steps_and_files, key=lambda x: x[0]):\n            result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps, checkpoint_path=filename)\n\n            tf.logging.info(\"***** Eval results %s *****\" % (filename))\n            writer.write(\"***** Eval results %s *****\\n\" % (filename))\n            for key in sorted(result.keys()):\n                tf.logging.info(\"  %s = %s\", key, str(result[key]))\n                writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n    #######################################################################################################################\n\n    #result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)\n    #\n    #output_eval_file = os.path.join(FLAGS.output_dir, \"eval_results.txt\")\n    #with tf.gfile.GFile(output_eval_file, \"w\") as writer:\n    #  tf.logging.info(\"***** Eval results *****\")\n    #  for key in sorted(result.keys()):\n    #    tf.logging.info(\"  %s = %s\", key, str(result[key]))\n    #    writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n\n  if FLAGS.do_predict:\n    predict_examples = processor.get_test_examples(FLAGS.data_dir)\n    num_actual_predict_examples = len(predict_examples)\n    if FLAGS.use_tpu:\n      # TPU requires a fixed batch size for all batches, therefore the number\n      # of examples must be a multiple of the batch size, or else examples\n      # will get dropped. 
So we pad with fake examples which are ignored\n      # later on.\n      while len(predict_examples) % FLAGS.predict_batch_size != 0:\n        predict_examples.append(PaddingInputExample())\n\n    predict_file = os.path.join(FLAGS.output_dir, \"predict.tf_record\")\n    file_based_convert_examples_to_features(predict_examples, label_list,\n                                            FLAGS.max_seq_length, tokenizer,\n                                            predict_file)\n\n    tf.logging.info(\"***** Running prediction*****\")\n    tf.logging.info(\"  Num examples = %d (%d actual, %d padding)\",\n                    len(predict_examples), num_actual_predict_examples,\n                    len(predict_examples) - num_actual_predict_examples)\n    tf.logging.info(\"  Batch size = %d\", FLAGS.predict_batch_size)\n\n    predict_drop_remainder = True if FLAGS.use_tpu else False\n    predict_input_fn = file_based_input_fn_builder(\n        input_file=predict_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=False,\n        drop_remainder=predict_drop_remainder)\n\n    result = estimator.predict(input_fn=predict_input_fn)\n\n    output_predict_file = os.path.join(FLAGS.output_dir, \"test_results.tsv\")\n    with tf.gfile.GFile(output_predict_file, \"w\") as writer:\n      num_written_lines = 0\n      tf.logging.info(\"***** Predict results *****\")\n      for (i, prediction) in enumerate(result):\n        probabilities = prediction[\"probabilities\"]\n        if i >= num_actual_predict_examples:\n          break\n        output_line = \"\\t\".join(\n            str(class_probability)\n            for class_probability in probabilities) + \"\\n\"\n        writer.write(output_line)\n        num_written_lines += 1\n    assert num_written_lines == num_actual_predict_examples\n\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"data_dir\")\n  flags.mark_flag_as_required(\"task_name\")\n  flags.mark_flag_as_required(\"vocab_file\")\n  
flags.mark_flag_as_required(\"bert_config_file\")\n  flags.mark_flag_as_required(\"output_dir\")\n  tf.app.run()"
  },
  {
    "path": "run_classifier_clue.py",
    "content": "# -*- coding: utf-8 -*-\n# @Author: bo.shi\n# @Date:   2019-11-04 09:56:36\n# @Last Modified by:   bo.shi\n# @Last Modified time: 2019-12-04 14:29:04\n# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"BERT finetuning runner.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport modeling\nimport optimization_finetuning as optimization\nimport tokenization\nimport tensorflow as tf\n# from loss import bi_tempered_logistic_loss\nimport sys\nsys.path.append('..')\nfrom classifier_utils import *\n\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\n# Required parameters\nflags.DEFINE_string(\n    \"data_dir\", None,\n    \"The input data dir. Should contain the .tsv files (or other data files) \"\n    \"for the task.\")\n\nflags.DEFINE_string(\n    \"bert_config_file\", None,\n    \"The config json file corresponding to the pre-trained BERT model. 
\"\n    \"This specifies the model architecture.\")\n\nflags.DEFINE_string(\"task_name\", None, \"The name of the task to train.\")\n\nflags.DEFINE_string(\"vocab_file\", None,\n                    \"The vocabulary file that the BERT model was trained on.\")\n\nflags.DEFINE_string(\n    \"output_dir\", None,\n    \"The output directory where the model checkpoints will be written.\")\n\n# Other parameters\n\nflags.DEFINE_string(\n    \"init_checkpoint\", None,\n    \"Initial checkpoint (usually from a pre-trained BERT model).\")\n\nflags.DEFINE_bool(\n    \"do_lower_case\", True,\n    \"Whether to lower case the input text. Should be True for uncased \"\n    \"models and False for cased models.\")\n\nflags.DEFINE_integer(\n    \"max_seq_length\", 128,\n    \"The maximum total input sequence length after WordPiece tokenization. \"\n    \"Sequences longer than this will be truncated, and sequences shorter \"\n    \"than this will be padded.\")\n\nflags.DEFINE_bool(\"do_train\", False, \"Whether to run training.\")\n\nflags.DEFINE_bool(\"do_eval\", False, \"Whether to run eval on the dev set.\")\n\nflags.DEFINE_bool(\n    \"do_predict\", False,\n    \"Whether to run the model in inference mode on the test set.\")\n\nflags.DEFINE_integer(\"train_batch_size\", 32, \"Total batch size for training.\")\n\nflags.DEFINE_integer(\"eval_batch_size\", 8, \"Total batch size for eval.\")\n\nflags.DEFINE_integer(\"predict_batch_size\", 8, \"Total batch size for predict.\")\n\nflags.DEFINE_float(\"learning_rate\", 5e-5, \"The initial learning rate for Adam.\")\n\nflags.DEFINE_float(\"num_train_epochs\", 3.0,\n                   \"Total number of training epochs to perform.\")\n\nflags.DEFINE_float(\n    \"warmup_proportion\", 0.1,\n    \"Proportion of training to perform linear learning rate warmup for. 
\"\n    \"E.g., 0.1 = 10% of training.\")\n\nflags.DEFINE_integer(\"save_checkpoints_steps\", 1000,\n                     \"How often to save the model checkpoint.\")\n\nflags.DEFINE_integer(\"iterations_per_loop\", 1000,\n                     \"How many steps to make in each estimator call.\")\n\nflags.DEFINE_bool(\"use_tpu\", False, \"Whether to use TPU or GPU/CPU.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_name\", None,\n    \"The Cloud TPU to use for training. This should be either the name \"\n    \"used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 \"\n    \"url.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_zone\", None,\n    \"[Optional] GCE zone where the Cloud TPU is located in. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\n    \"gcp_project\", None,\n    \"[Optional] Project name for the Cloud TPU-enabled project. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\"master\", None, \"[Optional] TensorFlow master URL.\")\n\nflags.DEFINE_integer(\n    \"num_tpu_cores\", 8,\n    \"Only used if `use_tpu` is True. 
Total number of TPU cores to use.\")\n\n\nclass InputFeatures(object):\n  \"\"\"A single set of features of data.\"\"\"\n\n  def __init__(self,\n               input_ids,\n               input_mask,\n               segment_ids,\n               label_id,\n               is_real_example=True):\n    self.input_ids = input_ids\n    self.input_mask = input_mask\n    self.segment_ids = segment_ids\n    self.label_id = label_id\n    self.is_real_example = is_real_example\n\n\ndef convert_single_example_for_inews(ex_index, tokens_a, tokens_b, label_map, max_seq_length,\n                                     tokenizer, example):\n  if tokens_b:\n    # Modifies `tokens_a` and `tokens_b` in place so that the total\n    # length is less than the specified length.\n    # Account for [CLS], [SEP], [SEP] with \"- 3\"\n    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)\n  else:\n    # Account for [CLS] and [SEP] with \"- 2\"\n    if len(tokens_a) > max_seq_length - 2:\n      tokens_a = tokens_a[0:(max_seq_length - 2)]\n\n  # The convention in BERT is:\n  # (a) For sequence pairs:\n  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]\n  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1\n  # (b) For single sequences:\n  #  tokens:   [CLS] the dog is hairy . [SEP]\n  #  type_ids: 0     0   0   0  0     0 0\n  #\n  # Where \"type_ids\" are used to indicate whether this is the first\n  # sequence or the second sequence. The embedding vectors for `type=0` and\n  # `type=1` were learned during pre-training and are added to the wordpiece\n  # embedding vector (and position vector). This is not *strictly* necessary\n  # since the [SEP] token unambiguously separates the sequences, but it makes\n  # it easier for the model to learn the concept of sequences.\n  #\n  # For classification tasks, the first vector (corresponding to [CLS]) is\n  # used as the \"sentence vector\". 
Note that this only makes sense because\n  # the entire model is fine-tuned.\n  tokens = []\n  segment_ids = []\n  tokens.append(\"[CLS]\")\n  segment_ids.append(0)\n  for token in tokens_a:\n    tokens.append(token)\n    segment_ids.append(0)\n  tokens.append(\"[SEP]\")\n  segment_ids.append(0)\n\n  if tokens_b:\n    for token in tokens_b:\n      tokens.append(token)\n      segment_ids.append(1)\n    tokens.append(\"[SEP]\")\n    segment_ids.append(1)\n\n  input_ids = tokenizer.convert_tokens_to_ids(tokens)\n\n  # The mask has 1 for real tokens and 0 for padding tokens. Only real\n  # tokens are attended to.\n  input_mask = [1] * len(input_ids)\n\n  # Zero-pad up to the sequence length.\n  while len(input_ids) < max_seq_length:\n    input_ids.append(0)\n    input_mask.append(0)\n    segment_ids.append(0)\n\n  assert len(input_ids) == max_seq_length\n  assert len(input_mask) == max_seq_length\n  assert len(segment_ids) == max_seq_length\n\n  label_id = label_map[example.label]\n  if ex_index < 5:\n    tf.logging.info(\"*** Example ***\")\n    tf.logging.info(\"guid: %s\" % (example.guid))\n    tf.logging.info(\"tokens: %s\" % \" \".join(\n        [tokenization.printable_text(x) for x in tokens]))\n    tf.logging.info(\"input_ids: %s\" % \" \".join([str(x) for x in input_ids]))\n    tf.logging.info(\"input_mask: %s\" % \" \".join([str(x) for x in input_mask]))\n    tf.logging.info(\"segment_ids: %s\" % \" \".join([str(x) for x in segment_ids]))\n    tf.logging.info(\"label: %s (id = %d)\" % (example.label, label_id))\n\n  feature = InputFeatures(\n      input_ids=input_ids,\n      input_mask=input_mask,\n      segment_ids=segment_ids,\n      label_id=label_id,\n      is_real_example=True)\n\n  return feature\n\n\ndef convert_example_list_for_inews(ex_index, example, label_list, max_seq_length,\n                                   tokenizer):\n  \"\"\"Converts a single `InputExample` into a list of `InputFeatures`.\"\"\"\n\n  if isinstance(example, 
PaddingInputExample):\n    return [InputFeatures(\n        input_ids=[0] * max_seq_length,\n        input_mask=[0] * max_seq_length,\n        segment_ids=[0] * max_seq_length,\n        label_id=0,\n        is_real_example=False)]\n\n  label_map = {}\n  for (i, label) in enumerate(label_list):\n    label_map[label] = i\n\n  tokens_a = tokenizer.tokenize(example.text_a)\n  tokens_b = None\n  if example.text_b:\n    tokens_b = tokenizer.tokenize(example.text_b)\n    must_len = len(tokens_a) + 3\n    extra_len = max_seq_length - must_len\n  feature_list = []\n  if example.text_b and extra_len > 0:\n    extra_num = int((len(tokens_b) - 1) / extra_len) + 1\n    for num in range(extra_num):\n      max_len = min((num + 1) * extra_len, len(tokens_b))\n      tokens_b_sub = tokens_b[num * extra_len: max_len]\n      feature = convert_single_example_for_inews(\n          ex_index, tokens_a, tokens_b_sub, label_map, max_seq_length, tokenizer, example)\n      feature_list.append(feature)\n  else:\n    feature = convert_single_example_for_inews(\n        ex_index, tokens_a, tokens_b, label_map, max_seq_length, tokenizer, example)\n    feature_list.append(feature)\n  return feature_list\n\n\ndef file_based_convert_examples_to_features_for_inews(\n        examples, label_list, max_seq_length, tokenizer, output_file):\n  \"\"\"Convert a set of `InputExample`s to a TFRecord file.\"\"\"\n\n  writer = tf.python_io.TFRecordWriter(output_file)\n  num_example = 0\n  for (ex_index, example) in enumerate(examples):\n    if ex_index % 1000 == 0:\n      tf.logging.info(\"Writing example %d of %d\" % (ex_index, len(examples)))\n\n    feature_list = convert_example_list_for_inews(ex_index, example, label_list,\n                                                  max_seq_length, tokenizer)\n    num_example += len(feature_list)\n\n    def create_int_feature(values):\n      f = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))\n      return f\n\n    features = 
collections.OrderedDict()\n    for feature in feature_list:\n      features[\"input_ids\"] = create_int_feature(feature.input_ids)\n      features[\"input_mask\"] = create_int_feature(feature.input_mask)\n      features[\"segment_ids\"] = create_int_feature(feature.segment_ids)\n      features[\"label_ids\"] = create_int_feature([feature.label_id])\n      features[\"is_real_example\"] = create_int_feature(\n          [int(feature.is_real_example)])\n\n      tf_example = tf.train.Example(features=tf.train.Features(feature=features))\n      writer.write(tf_example.SerializeToString())\n  tf.logging.info(\"feature num: %s\", num_example)\n  writer.close()\n\n\ndef convert_single_example(ex_index, example, label_list, max_seq_length,\n                           tokenizer):\n  \"\"\"Converts a single `InputExample` into a single `InputFeatures`.\"\"\"\n\n  if isinstance(example, PaddingInputExample):\n    return InputFeatures(\n        input_ids=[0] * max_seq_length,\n        input_mask=[0] * max_seq_length,\n        segment_ids=[0] * max_seq_length,\n        label_id=0,\n        is_real_example=False)\n\n  label_map = {}\n  for (i, label) in enumerate(label_list):\n    label_map[label] = i\n\n  tokens_a = tokenizer.tokenize(example.text_a)\n  tokens_b = None\n  if example.text_b:\n    tokens_b = tokenizer.tokenize(example.text_b)\n\n  if tokens_b:\n    # Modifies `tokens_a` and `tokens_b` in place so that the total\n    # length is less than the specified length.\n    # Account for [CLS], [SEP], [SEP] with \"- 3\"\n    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)\n  else:\n    # Account for [CLS] and [SEP] with \"- 2\"\n    if len(tokens_a) > max_seq_length - 2:\n      tokens_a = tokens_a[0:(max_seq_length - 2)]\n\n  # The convention in BERT is:\n  # (a) For sequence pairs:\n  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . 
[SEP]\n  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1\n  # (b) For single sequences:\n  #  tokens:   [CLS] the dog is hairy . [SEP]\n  #  type_ids: 0     0   0   0  0     0 0\n  #\n  # Where \"type_ids\" are used to indicate whether this is the first\n  # sequence or the second sequence. The embedding vectors for `type=0` and\n  # `type=1` were learned during pre-training and are added to the wordpiece\n  # embedding vector (and position vector). This is not *strictly* necessary\n  # since the [SEP] token unambiguously separates the sequences, but it makes\n  # it easier for the model to learn the concept of sequences.\n  #\n  # For classification tasks, the first vector (corresponding to [CLS]) is\n  # used as the \"sentence vector\". Note that this only makes sense because\n  # the entire model is fine-tuned.\n  tokens = []\n  segment_ids = []\n  tokens.append(\"[CLS]\")\n  segment_ids.append(0)\n  for token in tokens_a:\n    tokens.append(token)\n    segment_ids.append(0)\n  tokens.append(\"[SEP]\")\n  segment_ids.append(0)\n\n  if tokens_b:\n    for token in tokens_b:\n      tokens.append(token)\n      segment_ids.append(1)\n    tokens.append(\"[SEP]\")\n    segment_ids.append(1)\n\n  input_ids = tokenizer.convert_tokens_to_ids(tokens)\n\n  # The mask has 1 for real tokens and 0 for padding tokens. 
Only real\n  # tokens are attended to.\n  input_mask = [1] * len(input_ids)\n\n  # Zero-pad up to the sequence length.\n  while len(input_ids) < max_seq_length:\n    input_ids.append(0)\n    input_mask.append(0)\n    segment_ids.append(0)\n\n  assert len(input_ids) == max_seq_length\n  assert len(input_mask) == max_seq_length\n  assert len(segment_ids) == max_seq_length\n\n  label_id = label_map[example.label]\n  if ex_index < 5:\n    tf.logging.info(\"*** Example ***\")\n    tf.logging.info(\"guid: %s\" % (example.guid))\n    tf.logging.info(\"tokens: %s\" % \" \".join(\n        [tokenization.printable_text(x) for x in tokens]))\n    tf.logging.info(\"input_ids: %s\" % \" \".join([str(x) for x in input_ids]))\n    tf.logging.info(\"input_mask: %s\" % \" \".join([str(x) for x in input_mask]))\n    tf.logging.info(\"segment_ids: %s\" % \" \".join([str(x) for x in segment_ids]))\n    tf.logging.info(\"label: %s (id = %d)\" % (example.label, label_id))\n\n  feature = InputFeatures(\n      input_ids=input_ids,\n      input_mask=input_mask,\n      segment_ids=segment_ids,\n      label_id=label_id,\n      is_real_example=True)\n  return feature\n\n\ndef file_based_convert_examples_to_features(\n        examples, label_list, max_seq_length, tokenizer, output_file):\n  \"\"\"Convert a set of `InputExample`s to a TFRecord file.\"\"\"\n\n  writer = tf.python_io.TFRecordWriter(output_file)\n\n  for (ex_index, example) in enumerate(examples):\n    if ex_index % 10000 == 0:\n      tf.logging.info(\"Writing example %d of %d\" % (ex_index, len(examples)))\n\n    feature = convert_single_example(ex_index, example, label_list,\n                                     max_seq_length, tokenizer)\n\n    def create_int_feature(values):\n      f = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))\n      return f\n\n    features = collections.OrderedDict()\n    features[\"input_ids\"] = create_int_feature(feature.input_ids)\n    features[\"input_mask\"] = 
create_int_feature(feature.input_mask)\n    features[\"segment_ids\"] = create_int_feature(feature.segment_ids)\n    features[\"label_ids\"] = create_int_feature([feature.label_id])\n    features[\"is_real_example\"] = create_int_feature(\n        [int(feature.is_real_example)])\n\n    tf_example = tf.train.Example(features=tf.train.Features(feature=features))\n    writer.write(tf_example.SerializeToString())\n  writer.close()\n\n\ndef file_based_input_fn_builder(input_file, seq_length, is_training,\n                                drop_remainder):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  name_to_features = {\n      \"input_ids\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"input_mask\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"segment_ids\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"label_ids\": tf.FixedLenFeature([], tf.int64),\n      \"is_real_example\": tf.FixedLenFeature([], tf.int64),\n  }\n\n  def _decode_record(record, name_to_features):\n    \"\"\"Decodes a record to a TensorFlow example.\"\"\"\n    example = tf.parse_single_example(record, name_to_features)\n\n    # tf.Example only supports tf.int64, but the TPU only supports tf.int32.\n    # So cast all int64 to int32.\n    for name in list(example.keys()):\n      t = example[name]\n      if t.dtype == tf.int64:\n        t = tf.to_int32(t)\n      example[name] = t\n\n    return example\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    # For training, we want a lot of parallel reading and shuffling.\n    # For eval, we want no shuffling and parallel reading doesn't matter.\n    d = tf.data.TFRecordDataset(input_file)\n    if is_training:\n      d = d.repeat()\n      d = d.shuffle(buffer_size=100)\n\n    d = d.apply(\n        tf.contrib.data.map_and_batch(\n            lambda record: _decode_record(record, name_to_features),\n            batch_size=batch_size,\n        
    drop_remainder=drop_remainder))\n\n    return d\n\n  return input_fn\n\n\ndef _truncate_seq_pair(tokens_a, tokens_b, max_length):\n  \"\"\"Truncates a sequence pair in place to the maximum length.\"\"\"\n\n  # This is a simple heuristic which will always truncate the longer sequence\n  # one token at a time. This makes more sense than truncating an equal percent\n  # of tokens from each, since if one sequence is very short then each token\n  # that's truncated likely contains more information than a longer sequence.\n  while True:\n    total_length = len(tokens_a) + len(tokens_b)\n    if total_length <= max_length:\n      break\n    if len(tokens_a) > len(tokens_b):\n      tokens_a.pop()\n    else:\n      tokens_b.pop()\n\n\ndef create_model(bert_config, is_training, input_ids, input_mask, segment_ids,\n                 labels, num_labels, use_one_hot_embeddings):\n  \"\"\"Creates a classification model.\"\"\"\n  model = modeling.BertModel(\n      config=bert_config,\n      is_training=is_training,\n      input_ids=input_ids,\n      input_mask=input_mask,\n      token_type_ids=segment_ids,\n      use_one_hot_embeddings=use_one_hot_embeddings)\n\n  # In the demo, we are doing a simple classification task on the entire\n  # segment.\n  #\n  # If you want to use the token-level output, use model.get_sequence_output()\n  # instead.\n  output_layer = model.get_pooled_output()\n\n  hidden_size = output_layer.shape[-1].value\n\n  output_weights = tf.get_variable(\n      \"output_weights\", [num_labels, hidden_size],\n      initializer=tf.truncated_normal_initializer(stddev=0.02))\n\n  output_bias = tf.get_variable(\n      \"output_bias\", [num_labels], initializer=tf.zeros_initializer())\n\n  with tf.variable_scope(\"loss\"):\n    ln_type = bert_config.ln_type\n    if ln_type == 'preln':  # add by brightmart, 10-06. 
if it is preln, we need to add an additional layer: layer normalization, as suggested in the paper \"ON LAYER NORMALIZATION IN THE TRANSFORMER ARCHITECTURE\"\n      print(\"ln_type is preln. add LN layer.\")\n      output_layer = layer_norm(output_layer)\n    else:\n      print(\"ln_type is postln or other,do nothing.\")\n\n    if is_training:\n      # I.e., 0.1 dropout\n      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)\n\n    logits = tf.matmul(output_layer, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    probabilities = tf.nn.softmax(logits, axis=-1)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n\n    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)\n\n    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs,\n                                      axis=-1)  # todo 08-29 try temp-loss\n    ###############bi_tempered_logistic_loss############################################################################\n    # print(\"##cross entropy loss is used....\"); tf.logging.info(\"##cross entropy loss is used....\")\n    # t1=0.9 #t1=0.90\n    # t2=1.05 #t2=1.05\n    # per_example_loss=bi_tempered_logistic_loss(log_probs,one_hot_labels,t1,t2,label_smoothing=0.1,num_iters=5) # TODO label_smoothing=0.0\n    # tf.logging.info(\"per_example_loss:\"+str(per_example_loss.shape))\n    ##############bi_tempered_logistic_loss#############################################################################\n\n    loss = tf.reduce_mean(per_example_loss)\n\n    return (loss, per_example_loss, logits, probabilities)\n\n\ndef layer_norm(input_tensor, name=None):\n  \"\"\"Run layer normalization on the last dimension of the tensor.\"\"\"\n  return tf.contrib.layers.layer_norm(\n      inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)\n\n\ndef model_fn_builder(bert_config, num_labels, init_checkpoint, learning_rate,\n                     num_train_steps, num_warmup_steps, use_tpu,\n     
                use_one_hot_embeddings):\n  \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n\n  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\n    \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n\n    tf.logging.info(\"*** Features ***\")\n    for name in sorted(features.keys()):\n      tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\n\n    input_ids = features[\"input_ids\"]\n    input_mask = features[\"input_mask\"]\n    segment_ids = features[\"segment_ids\"]\n    label_ids = features[\"label_ids\"]\n    is_real_example = None\n    if \"is_real_example\" in features:\n      is_real_example = tf.cast(features[\"is_real_example\"], dtype=tf.float32)\n    else:\n      is_real_example = tf.ones(tf.shape(label_ids), dtype=tf.float32)\n\n    is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n\n    (total_loss, per_example_loss, logits, probabilities) = create_model(\n        bert_config, is_training, input_ids, input_mask, segment_ids, label_ids,\n        num_labels, use_one_hot_embeddings)\n\n    tvars = tf.trainable_variables()\n    initialized_variable_names = {}\n    scaffold_fn = None\n    if init_checkpoint:\n      (assignment_map, initialized_variable_names\n       ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)\n      if use_tpu:\n\n        def tpu_scaffold():\n          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n          return tf.train.Scaffold()\n\n        scaffold_fn = tpu_scaffold\n      else:\n        tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n\n    tf.logging.info(\"**** Trainable Variables ****\")\n    for var in tvars:\n      init_string = \"\"\n      if var.name in initialized_variable_names:\n        init_string = \", *INIT_FROM_CKPT*\"\n      tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\n                      init_string)\n\n    output_spec = None\n    if mode == 
tf.estimator.ModeKeys.TRAIN:\n\n      train_op = optimization.create_optimizer(\n          total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          train_op=train_op,\n          scaffold_fn=scaffold_fn)\n    elif mode == tf.estimator.ModeKeys.EVAL:\n\n      def metric_fn(per_example_loss, label_ids, logits, is_real_example):\n        predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)\n        accuracy = tf.metrics.accuracy(\n            labels=label_ids, predictions=predictions, weights=is_real_example)\n        loss = tf.metrics.mean(values=per_example_loss, weights=is_real_example)\n        return {\n            \"eval_accuracy\": accuracy,\n            \"eval_loss\": loss,\n        }\n\n      eval_metrics = (metric_fn,\n                      [per_example_loss, label_ids, logits, is_real_example])\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          eval_metrics=eval_metrics,\n          scaffold_fn=scaffold_fn)\n    else:\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          predictions={\"probabilities\": probabilities},\n          scaffold_fn=scaffold_fn)\n    return output_spec\n\n  return model_fn\n\n\n# This function is not used by this file but is still used by the Colab and\n# people who depend on it.\ndef input_fn_builder(features, seq_length, is_training, drop_remainder):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  all_input_ids = []\n  all_input_mask = []\n  all_segment_ids = []\n  all_label_ids = []\n\n  for feature in features:\n    all_input_ids.append(feature.input_ids)\n    all_input_mask.append(feature.input_mask)\n    all_segment_ids.append(feature.segment_ids)\n    all_label_ids.append(feature.label_id)\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n   
 batch_size = params[\"batch_size\"]\n\n    num_examples = len(features)\n\n    # This is for demo purposes and does NOT scale to large data sets. We do\n    # not use Dataset.from_generator() because that uses tf.py_func which is\n    # not TPU compatible. The right way to load data is with TFRecordReader.\n    d = tf.data.Dataset.from_tensor_slices({\n        \"input_ids\":\n            tf.constant(\n                all_input_ids, shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"input_mask\":\n            tf.constant(\n                all_input_mask,\n                shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"segment_ids\":\n            tf.constant(\n                all_segment_ids,\n                shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"label_ids\":\n            tf.constant(all_label_ids, shape=[num_examples], dtype=tf.int32),\n    })\n\n    if is_training:\n      d = d.repeat()\n      d = d.shuffle(buffer_size=100)\n\n    d = d.batch(batch_size=batch_size, drop_remainder=drop_remainder)\n    return d\n\n  return input_fn\n\n\n# This function is not used by this file but is still used by the Colab and\n# people who depend on it.\ndef convert_examples_to_features(examples, label_list, max_seq_length,\n                                 tokenizer):\n  \"\"\"Convert a set of `InputExample`s to a list of `InputFeatures`.\"\"\"\n\n  features = []\n  for (ex_index, example) in enumerate(examples):\n    if ex_index % 10000 == 0:\n      tf.logging.info(\"Writing example %d of %d\" % (ex_index, len(examples)))\n\n    feature = convert_single_example(ex_index, example, label_list,\n                                     max_seq_length, tokenizer)\n\n    features.append(feature)\n  return features\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  processors = {\n      \"xnli\": XnliProcessor,\n      \"tnews\": TnewsProcessor,\n      \"afqmc\": 
AFQMCProcessor,\n      \"iflytek\": iFLYTEKDataProcessor,\n      \"copa\": COPAProcessor,\n      \"cmnli\": CMNLIProcessor,\n      \"wsc\": WSCProcessor,\n      \"csl\": CslProcessor,\n  }\n\n  tokenization.validate_case_matches_checkpoint(FLAGS.do_lower_case,\n                                                FLAGS.init_checkpoint)\n\n  if not FLAGS.do_train and not FLAGS.do_eval and not FLAGS.do_predict:\n    raise ValueError(\n        \"At least one of `do_train`, `do_eval` or `do_predict` must be True.\")\n\n  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)\n\n  if FLAGS.max_seq_length > bert_config.max_position_embeddings:\n    raise ValueError(\n        \"Cannot use sequence length %d because the BERT model \"\n        \"was only trained up to sequence length %d\" %\n        (FLAGS.max_seq_length, bert_config.max_position_embeddings))\n\n  tf.gfile.MakeDirs(FLAGS.output_dir)\n\n  task_name = FLAGS.task_name.lower()\n\n  if task_name not in processors:\n    raise ValueError(\"Task not found: %s\" % (task_name))\n\n  processor = processors[task_name]()\n\n  label_list = processor.get_labels()\n\n  tokenizer = tokenization.FullTokenizer(\n      vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)\n\n  tpu_cluster_resolver = None\n  if FLAGS.use_tpu and FLAGS.tpu_name:\n    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(\n        FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)\n\n  is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n  # Cloud TPU: Invalid TPU configuration, ensure ClusterResolver is passed to tpu.\n  print(\"###tpu_cluster_resolver:\", tpu_cluster_resolver)\n  run_config = tf.contrib.tpu.RunConfig(\n      cluster=tpu_cluster_resolver,\n      master=FLAGS.master,\n      model_dir=FLAGS.output_dir,\n      save_checkpoints_steps=FLAGS.save_checkpoints_steps,\n      tpu_config=tf.contrib.tpu.TPUConfig(\n          
iterations_per_loop=FLAGS.iterations_per_loop,\n          num_shards=FLAGS.num_tpu_cores,\n          per_host_input_for_training=is_per_host))\n\n  train_examples = None\n  num_train_steps = None\n  num_warmup_steps = None\n  if FLAGS.do_train:\n    train_examples = processor.get_train_examples(FLAGS.data_dir)  # TODO\n    print(\"###length of total train_examples:\", len(train_examples))\n    num_train_steps = int(len(train_examples) / FLAGS.train_batch_size * FLAGS.num_train_epochs)\n    num_warmup_steps = int(num_train_steps * FLAGS.warmup_proportion)\n\n  model_fn = model_fn_builder(\n      bert_config=bert_config,\n      num_labels=len(label_list),\n      init_checkpoint=FLAGS.init_checkpoint,\n      learning_rate=FLAGS.learning_rate,\n      num_train_steps=num_train_steps,\n      num_warmup_steps=num_warmup_steps,\n      use_tpu=FLAGS.use_tpu,\n      use_one_hot_embeddings=FLAGS.use_tpu)\n\n  # If TPU is not available, this will fall back to normal Estimator on CPU\n  # or GPU.\n  estimator = tf.contrib.tpu.TPUEstimator(\n      use_tpu=FLAGS.use_tpu,\n      model_fn=model_fn,\n      config=run_config,\n      train_batch_size=FLAGS.train_batch_size,\n      eval_batch_size=FLAGS.eval_batch_size,\n      predict_batch_size=FLAGS.predict_batch_size)\n\n  if FLAGS.do_train:\n    train_file = os.path.join(FLAGS.output_dir, \"train.tf_record\")\n    train_file_exists = os.path.exists(train_file)\n    print(\"###train_file_exists:\", train_file_exists, \" ;train_file:\", train_file)\n    if not train_file_exists:  # if tf_record file not exist, convert from raw text file. 
# TODO\n      if task_name == \"inews\":\n        file_based_convert_examples_to_features_for_inews(\n            train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)\n      else:\n        file_based_convert_examples_to_features(\n            train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)\n    tf.logging.info(\"***** Running training *****\")\n    tf.logging.info(\"  Num examples = %d\", len(train_examples))\n    tf.logging.info(\"  Batch size = %d\", FLAGS.train_batch_size)\n    tf.logging.info(\"  Num steps = %d\", num_train_steps)\n    train_input_fn = file_based_input_fn_builder(\n        input_file=train_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=True,\n        drop_remainder=True)\n    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)\n\n  if FLAGS.do_eval:\n    # dev dataset\n    eval_examples = processor.get_dev_examples(FLAGS.data_dir)\n    num_actual_eval_examples = len(eval_examples)\n    if FLAGS.use_tpu:\n      # TPU requires a fixed batch size for all batches, therefore the number\n      # of examples must be a multiple of the batch size, or else examples\n      # will get dropped. So we pad with fake examples which are ignored\n      # later on. 
These do NOT count towards the metric (all tf.metrics\n      # support a per-instance weight, and these get a weight of 0.0).\n      while len(eval_examples) % FLAGS.eval_batch_size != 0:\n        eval_examples.append(PaddingInputExample())\n\n    eval_file = os.path.join(FLAGS.output_dir, \"dev.tf_record\")\n    if task_name == \"inews\":\n      file_based_convert_examples_to_features_for_inews(\n          eval_examples, label_list, FLAGS.max_seq_length, tokenizer, eval_file)\n    else:\n      file_based_convert_examples_to_features(\n          eval_examples, label_list, FLAGS.max_seq_length, tokenizer, eval_file)\n\n    tf.logging.info(\"***** Running evaluation *****\")\n    tf.logging.info(\"  Num examples = %d (%d actual, %d padding)\",\n                    len(eval_examples), num_actual_eval_examples,\n                    len(eval_examples) - num_actual_eval_examples)\n    tf.logging.info(\"  Batch size = %d\", FLAGS.eval_batch_size)\n\n    # This tells the estimator to run through the entire set.\n    eval_steps = None\n    # However, if running eval on the TPU, you will need to specify the\n    # number of steps.\n    if FLAGS.use_tpu:\n      assert len(eval_examples) % FLAGS.eval_batch_size == 0\n      eval_steps = int(len(eval_examples) // FLAGS.eval_batch_size)\n\n    eval_drop_remainder = True if FLAGS.use_tpu else False\n    eval_input_fn = file_based_input_fn_builder(\n        input_file=eval_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=False,\n        drop_remainder=eval_drop_remainder)\n\n    #######################################################################################################################\n    # Evaluate all checkpoints; you can then use the checkpoint with the best dev accuracy.\n    steps_and_files = []\n    filenames = tf.gfile.ListDirectory(FLAGS.output_dir)\n    for filename in filenames:\n      if filename.endswith(\".index\"):\n        ckpt_name = filename[:-6]\n        cur_filename = 
os.path.join(FLAGS.output_dir, ckpt_name)\n        global_step = int(cur_filename.split(\"-\")[-1])\n        tf.logging.info(\"Add {} to eval list.\".format(cur_filename))\n        steps_and_files.append([global_step, cur_filename])\n    steps_and_files = sorted(steps_and_files, key=lambda x: x[0])\n\n    output_eval_file = os.path.join(FLAGS.data_dir, \"dev_results_albert_zh.txt\")\n    print(\"output_eval_file:\", output_eval_file)\n    tf.logging.info(\"output_eval_file:\" + output_eval_file)\n    with tf.gfile.GFile(output_eval_file, \"w\") as writer:\n      for global_step, filename in sorted(steps_and_files, key=lambda x: x[0]):\n        result = estimator.evaluate(input_fn=eval_input_fn,\n                                    steps=eval_steps, checkpoint_path=filename)\n\n        tf.logging.info(\"***** Eval results %s *****\" % (filename))\n        writer.write(\"***** Eval results %s *****\\n\" % (filename))\n        for key in sorted(result.keys()):\n          tf.logging.info(\"  %s = %s\", key, str(result[key]))\n          writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n    #######################################################################################################################\n\n    # result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)\n    #\n    # output_eval_file = os.path.join(FLAGS.output_dir, \"dev_results_albert_zh.txt\")\n    # with tf.gfile.GFile(output_eval_file, \"w\") as writer:\n    #  tf.logging.info(\"***** Eval results *****\")\n    #  for key in sorted(result.keys()):\n    #    tf.logging.info(\"  %s = %s\", key, str(result[key]))\n    #    writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n\n  if FLAGS.do_predict:\n    predict_examples = processor.get_test_examples(FLAGS.data_dir)\n    num_actual_predict_examples = len(predict_examples)\n    if FLAGS.use_tpu:\n      # TPU requires a fixed batch size for all batches, therefore the number\n      # of examples must be a multiple of the batch 
size, or else examples\n      # will get dropped. So we pad with fake examples which are ignored\n      # later on.\n      while len(predict_examples) % FLAGS.predict_batch_size != 0:\n        predict_examples.append(PaddingInputExample())\n\n    predict_file = os.path.join(FLAGS.output_dir, \"predict.tf_record\")\n    if task_name == \"inews\":\n      file_based_convert_examples_to_features_for_inews(predict_examples, label_list,\n                                                        FLAGS.max_seq_length, tokenizer,\n                                                        predict_file)\n    else:\n      file_based_convert_examples_to_features(predict_examples, label_list,\n                                              FLAGS.max_seq_length, tokenizer,\n                                              predict_file)\n\n    tf.logging.info(\"***** Running prediction*****\")\n    tf.logging.info(\"  Num examples = %d (%d actual, %d padding)\",\n                    len(predict_examples), num_actual_predict_examples,\n                    len(predict_examples) - num_actual_predict_examples)\n    tf.logging.info(\"  Batch size = %d\", FLAGS.predict_batch_size)\n\n    predict_drop_remainder = True if FLAGS.use_tpu else False\n    predict_input_fn = file_based_input_fn_builder(\n        input_file=predict_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=False,\n        drop_remainder=predict_drop_remainder)\n\n    result = estimator.predict(input_fn=predict_input_fn)\n    index2label_map = {}\n    for (i, label) in enumerate(label_list):\n      index2label_map[i] = label\n    output_predict_file_label_name = task_name + \"_predict.json\"\n    output_predict_file_label = os.path.join(FLAGS.output_dir, output_predict_file_label_name)\n    output_predict_file = os.path.join(FLAGS.output_dir, \"test_results.tsv\")\n    with tf.gfile.GFile(output_predict_file_label, \"w\") as writer_label:\n      with tf.gfile.GFile(output_predict_file, \"w\") as writer:\n      
  num_written_lines = 0\n        tf.logging.info(\"***** Predict results *****\")\n        for (i, prediction) in enumerate(result):\n          probabilities = prediction[\"probabilities\"]\n          label_index = probabilities.argmax(0)\n          if i >= num_actual_predict_examples:\n            break\n          output_line = \"\\t\".join(\n              str(class_probability)\n              for class_probability in probabilities) + \"\\n\"\n          test_label_dict = {}\n          test_label_dict[\"id\"] = i\n          test_label_dict[\"label\"] = str(index2label_map[label_index])\n          if task_name == \"tnews\":\n            test_label_dict[\"label_desc\"] = \"\"\n          writer.write(output_line)\n          json.dump(test_label_dict, writer_label)\n          writer_label.write(\"\\n\")\n          num_written_lines += 1\n    assert num_written_lines == num_actual_predict_examples\n\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"data_dir\")\n  flags.mark_flag_as_required(\"task_name\")\n  flags.mark_flag_as_required(\"vocab_file\")\n  flags.mark_flag_as_required(\"bert_config_file\")\n  flags.mark_flag_as_required(\"output_dir\")\n  tf.app.run()"
  },
  {
    "path": "run_classifier_clue.sh",
    "content": "# @Author: bo.shi\n# @Date:   2020-03-15 16:11:00\n# @Last Modified by:   bo.shi\n# @Last Modified time: 2020-04-02 17:54:05\n#!/usr/bin/env bash\n\nexport CUDA_VISIBLE_DEVICES=\"0\"\nCURRENT_DIR=$(cd -P -- \"$(dirname -- \"$0\")\" && pwd -P)\nCLUE_DATA_DIR=$CURRENT_DIR/CLUEdataset\nALBERT_TINY_DIR=$CURRENT_DIR/albert_tiny\n\ndownload_data(){\n  TASK_NAME=$1\n  if [ ! -d $CLUE_DATA_DIR ]; then\n    mkdir -p $CLUE_DATA_DIR\n    echo \"makedir $CLUE_DATA_DIR\"\n  fi\n  cd $CLUE_DATA_DIR\n  if [ ! -d ${TASK_NAME} ]; then\n    mkdir $TASK_NAME\n    echo \"make dataset dir $CLUE_DATA_DIR/$TASK_NAME\"\n  fi\n  cd $TASK_NAME\n  if [ ! -f \"train.json\" ] || [ ! -f \"dev.json\" ] || [ ! -f \"test.json\" ]; then\n    rm *\n    wget https://storage.googleapis.com/cluebenchmark/tasks/${TASK_NAME}_public.zip\n    unzip ${TASK_NAME}_public.zip\n    rm ${TASK_NAME}_public.zip\n  else\n    echo \"data exists\"\n  fi\n  echo \"Finish download dataset.\"\n}\n\ndownload_model(){\n  if [ ! -d $ALBERT_TINY_DIR ]; then\n    mkdir -p $ALBERT_TINY_DIR\n    echo \"makedir $ALBERT_TINY_DIR\"\n  fi\n  cd $ALBERT_TINY_DIR\n  if [ ! -f \"albert_config_tiny.json\" ] || [ ! -f \"vocab.txt\" ] || [ ! -f \"checkpoint\" ] || [ ! -f \"albert_model.ckpt.index\" ] || [ ! -f \"albert_model.ckpt.meta\" ] || [ ! 
-f \"albert_model.ckpt.data-00000-of-00001\" ]; then\n    rm *\n    wget -c https://storage.googleapis.com/albert_zh/albert_tiny_489k.zip\n    unzip albert_tiny_489k.zip\n    rm albert_tiny_489k.zip\n  else\n    echo \"model exists\"\n  fi\n  echo \"Finish download model.\"\n}\n\nrun_task() {\n  TASK_NAME=$1\n  download_data $TASK_NAME\n  # download_model takes no arguments; it always fetches albert_tiny_489k.\n  download_model\n  DATA_DIR=$CLUE_DATA_DIR/${TASK_NAME}\n  PREV_TRAINED_MODEL_DIR=$ALBERT_TINY_DIR\n  MAX_SEQ_LENGTH=$2\n  TRAIN_BATCH_SIZE=$3\n  LEARNING_RATE=$4\n  NUM_TRAIN_EPOCHS=$5\n  SAVE_CHECKPOINTS_STEPS=$6\n  OUTPUT_DIR=$CURRENT_DIR/${TASK_NAME}_output/\n  COMMON_ARGS=\"\n        --task_name=$TASK_NAME \\\n        --data_dir=$DATA_DIR \\\n        --vocab_file=$PREV_TRAINED_MODEL_DIR/vocab.txt \\\n        --bert_config_file=$PREV_TRAINED_MODEL_DIR/albert_config_tiny.json \\\n        --init_checkpoint=$PREV_TRAINED_MODEL_DIR/albert_model.ckpt \\\n        --max_seq_length=$MAX_SEQ_LENGTH \\\n        --train_batch_size=$TRAIN_BATCH_SIZE \\\n        --learning_rate=$LEARNING_RATE \\\n        --num_train_epochs=$NUM_TRAIN_EPOCHS \\\n        --save_checkpoints_steps=$SAVE_CHECKPOINTS_STEPS \\\n        --output_dir=$OUTPUT_DIR \\\n        --keep_checkpoint_max=0 \\\n  \"\n  cd $CURRENT_DIR\n  echo \"Start running...\"\n  python run_classifier_clue.py \\\n        $COMMON_ARGS \\\n        --do_train=true \\\n        --do_eval=false \\\n        --do_predict=false\n\n  echo \"Start predict...\"\n  python run_classifier_clue.py \\\n        $COMMON_ARGS \\\n        --do_train=false \\\n        --do_eval=true \\\n        --do_predict=true\n}\n\n# Usage: run_task <task_name> <max_seq_length> <train_batch_size> <learning_rate> <num_train_epochs> <save_checkpoints_steps>\nrun_task afqmc 128 16 2e-5 3 300\nrun_task cmnli 128 64 3e-5 2 300\nrun_task csl 128 16 1e-5 5 100\nrun_task iflytek 128 32 2e-5 3 300\nrun_task tnews 128 16 2e-5 3 300\nrun_task wsc 128 16 1e-5 10 10"
  },
  {
    "path": "run_classifier_lcqmc.sh",
    "content": "#!/usr/bin/env bash\n# @Author: bo.shi, https://github.com/chineseGLUE/chineseGLUE\n# @Date:   2019-11-04 09:56:36\n# @Last Modified by:   bright\n# @Last Modified time: 2019-11-10 09:00:00\n\nTASK_NAME=\"lcqmc\"\nMODEL_NAME=\"albert_tiny_zh\"\nCURRENT_DIR=$(cd -P -- \"$(dirname -- \"$0\")\" && pwd -P)\n\nexport CUDA_VISIBLE_DEVICES=\"0\"\nexport ALBERT_CONFIG_DIR=$CURRENT_DIR/albert_config\nexport ALBERT_PRETRAINED_MODELS_DIR=$CURRENT_DIR/prev_trained_model\nexport ALBERT_TINY_DIR=$ALBERT_PRETRAINED_MODELS_DIR/$MODEL_NAME\n#mkdir chineseGLUEdatasets\nexport GLUE_DATA_DIR=$CURRENT_DIR/chineseGLUEdatasets\n\n# download and unzip dataset\nif [ ! -d $GLUE_DATA_DIR ]; then\n  mkdir -p $GLUE_DATA_DIR\n  echo \"makedir $GLUE_DATA_DIR\"\nfi\ncd $GLUE_DATA_DIR\nif [ ! -d $TASK_NAME ]; then\n  mkdir $TASK_NAME\n  echo \"makedir $GLUE_DATA_DIR/$TASK_NAME\"\nfi\ncd $TASK_NAME\necho \"Please try again if the data is not downloaded successfully.\"\nwget -c https://raw.githubusercontent.com/pengming617/text_matching/master/data/train.txt\nwget -c https://raw.githubusercontent.com/pengming617/text_matching/master/data/dev.txt\nwget -c https://raw.githubusercontent.com/pengming617/text_matching/master/data/test.txt\necho \"Finish download dataset.\"\n\n# download model\nif [ ! -d $ALBERT_TINY_DIR ]; then\n  mkdir -p $ALBERT_TINY_DIR\n  echo \"makedir $ALBERT_TINY_DIR\"\nfi\ncd $ALBERT_TINY_DIR\nif [ ! -f \"albert_config_tiny.json\" ] || [ ! -f \"vocab.txt\" ] || [ ! -f \"checkpoint\" ] || [ ! -f \"albert_model.ckpt.index\" ] || [ ! -f \"albert_model.ckpt.meta\" ] || [ ! 
-f \"albert_model.ckpt.data-00000-of-00001\" ]; then\n  rm *\n  wget https://storage.googleapis.com/albert_zh/albert_tiny_489k.zip\n  unzip albert_tiny_489k.zip\n  rm albert_tiny_489k.zip\nelse\n  echo \"model exists\"\nfi\necho \"Finish download model.\"\n\n# run task\ncd $CURRENT_DIR\necho \"Start running...\"\npython run_classifier.py \\\n  --task_name=$TASK_NAME \\\n  --do_train=true \\\n  --do_eval=true \\\n  --data_dir=$GLUE_DATA_DIR/$TASK_NAME \\\n  --vocab_file=$ALBERT_CONFIG_DIR/vocab.txt \\\n  --bert_config_file=$ALBERT_CONFIG_DIR/albert_config_tiny.json \\\n  --init_checkpoint=$ALBERT_TINY_DIR/albert_model.ckpt \\\n  --max_seq_length=128 \\\n  --train_batch_size=64 \\\n  --learning_rate=1e-4 \\\n  --num_train_epochs=5.0 \\\n  --output_dir=$CURRENT_DIR/${TASK_NAME}_output/\n"
  },
  {
    "path": "run_classifier_sp_google.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n\"\"\"BERT finetuning runner with sentence piece tokenization.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport csv\nimport os\n\nimport six\nfrom six.moves import zip\nimport tensorflow as tf\n\nimport modeling_google as modeling\nimport optimization_google as optimization\nimport tokenization_google as tokenization\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\n## Required parameters\nflags.DEFINE_string(\n    \"data_dir\", None,\n    \"The input data dir. Should contain the .tsv files (or other data files) \"\n    \"for the task.\")\n\nflags.DEFINE_string(\n    \"albert_config_file\", None,\n    \"The config json file corresponding to the pre-trained ALBERT model. 
\"\n    \"This specifies the model architecture.\")\n\nflags.DEFINE_string(\"task_name\", None, \"The name of the task to train.\")\n\nflags.DEFINE_string(\n    \"vocab_file\", None,\n    \"The vocabulary file that the ALBERT model was trained on.\")\n\nflags.DEFINE_string(\"spm_model_file\", None,\n                    \"The model file for sentence piece tokenization.\")\n\nflags.DEFINE_string(\n    \"output_dir\", None,\n    \"The output directory where the model checkpoints will be written.\")\n\n## Other parameters\n\nflags.DEFINE_string(\n    \"init_checkpoint\", None,\n    \"Initial checkpoint (usually from a pre-trained ALBERT model).\")\n\nflags.DEFINE_bool(\n    \"use_pooled_output\", True, \"Whether to use the CLS token outputs\")\n\nflags.DEFINE_bool(\n    \"do_lower_case\", True,\n    \"Whether to lower case the input text. Should be True for uncased \"\n    \"models and False for cased models.\")\n\nflags.DEFINE_integer(\n    \"max_seq_length\", 512,\n    \"The maximum total input sequence length after WordPiece tokenization. 
\"\n    \"Sequences longer than this will be truncated, and sequences shorter \"\n    \"than this will be padded.\")\n\nflags.DEFINE_bool(\"do_train\", False, \"Whether to run training.\")\n\nflags.DEFINE_bool(\"do_eval\", False, \"Whether to run eval on the dev set.\")\n\nflags.DEFINE_bool(\n    \"do_predict\", False,\n    \"Whether to run the model in inference mode on the test set.\")\n\nflags.DEFINE_integer(\"train_batch_size\", 32, \"Total batch size for training.\")\n\nflags.DEFINE_integer(\"eval_batch_size\", 8, \"Total batch size for eval.\")\n\nflags.DEFINE_integer(\"predict_batch_size\", 8, \"Total batch size for predict.\")\n\nflags.DEFINE_float(\"learning_rate\", 5e-5, \"The initial learning rate for Adam.\")\n\nflags.DEFINE_float(\"num_train_epochs\", 3.0,\n                   \"Total number of training epochs to perform.\")\n\nflags.DEFINE_float(\n    \"warmup_proportion\", 0.1,\n    \"Proportion of training to perform linear learning rate warmup for. \"\n    \"E.g., 0.1 = 10% of training.\")\n\nflags.DEFINE_integer(\"save_checkpoints_steps\", 1000,\n                     \"How often to save the model checkpoint.\")\n\nflags.DEFINE_integer(\"iterations_per_loop\", 1000,\n                     \"How many steps to make in each estimator call.\")\n\nflags.DEFINE_bool(\"use_tpu\", False, \"Whether to use TPU or GPU/CPU.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_name\", None,\n    \"The Cloud TPU to use for training. This should be either the name \"\n    \"used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 \"\n    \"url.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_zone\", None,\n    \"[Optional] GCE zone where the Cloud TPU is located in. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\n    \"gcp_project\", None,\n    \"[Optional] Project name for the Cloud TPU-enabled project. 
If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\"master\", None, \"[Optional] TensorFlow master URL.\")\n\nflags.DEFINE_integer(\n    \"num_tpu_cores\", 8,\n    \"Only used if `use_tpu` is True. Total number of TPU cores to use.\")\n\n\nclass InputExample(object):\n  \"\"\"A single training/test example for simple sequence classification.\"\"\"\n\n  def __init__(self, guid, text_a, text_b=None, label=None):\n    \"\"\"Constructs an InputExample.\n    Args:\n      guid: Unique id for the example.\n      text_a: string. The untokenized text of the first sequence. For single\n        sequence tasks, only this sequence must be specified.\n      text_b: (Optional) string. The untokenized text of the second sequence.\n        Must be specified only for sequence pair tasks.\n      label: (Optional) string. The label of the example. This should be\n        specified for train and dev examples, but not for test examples.\n    \"\"\"\n    self.guid = guid\n    self.text_a = text_a\n    self.text_b = text_b\n    self.label = label\n\n\nclass PaddingInputExample(object):\n  \"\"\"Fake example so the number of input examples is a multiple of the batch size.\n  When running eval/predict on the TPU, we need to pad the number of examples\n  to be a multiple of the batch size, because the TPU requires a fixed batch\n  size. 
The alternative is to drop the last batch, which is bad because it means\n  the entire output data won't be generated.\n  We use this class instead of `None` because treating `None` as padding\n  batches could cause silent errors.\n  \"\"\"\n\n\nclass InputFeatures(object):\n  \"\"\"A single set of features of data.\"\"\"\n\n  def __init__(self,\n               input_ids,\n               input_mask,\n               segment_ids,\n               label_id,\n               is_real_example=True):\n    self.input_ids = input_ids\n    self.input_mask = input_mask\n    self.segment_ids = segment_ids\n    self.label_id = label_id\n    self.is_real_example = is_real_example\n\n\nclass DataProcessor(object):\n  \"\"\"Base class for data converters for sequence classification data sets.\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for the train set.\"\"\"\n    raise NotImplementedError()\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for the dev set.\"\"\"\n    raise NotImplementedError()\n\n  def get_test_examples(self, data_dir):\n    \"\"\"Gets a collection of `InputExample`s for prediction.\"\"\"\n    raise NotImplementedError()\n\n  def get_labels(self):\n    \"\"\"Gets the list of labels for this data set.\"\"\"\n    raise NotImplementedError()\n\n  @classmethod\n  def _read_tsv(cls, input_file, quotechar=None):\n    \"\"\"Reads a tab separated value file.\"\"\"\n    with tf.gfile.Open(input_file, \"r\") as f:\n      reader = csv.reader(f, delimiter=\"\\t\", quotechar=quotechar)\n      lines = []\n      for line in reader:\n        lines.append(line)\n      return lines\n\n\nclass XnliProcessor(DataProcessor):\n  \"\"\"Processor for the XNLI data set.\"\"\"\n\n  def __init__(self):\n    self.language = \"zh\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    lines = self._read_tsv(\n        os.path.join(data_dir, \"multinli\",\n              
       \"multinli.train.%s.tsv\" % self.language))\n    examples = []\n    for (i, line) in enumerate(lines):\n      if i == 0:\n        continue\n      guid = \"train-%d\" % (i)\n      text_a = tokenization.convert_to_unicode(line[0])\n      text_b = tokenization.convert_to_unicode(line[1])\n      label = tokenization.convert_to_unicode(line[2])\n      if label == tokenization.convert_to_unicode(\"contradictory\"):\n        label = tokenization.convert_to_unicode(\"contradiction\")\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    lines = self._read_tsv(os.path.join(data_dir, \"xnli.dev.tsv\"))\n    examples = []\n    for (i, line) in enumerate(lines):\n      if i == 0:\n        continue\n      guid = \"dev-%d\" % (i)\n      language = tokenization.convert_to_unicode(line[0])\n      if language != tokenization.convert_to_unicode(self.language):\n        continue\n      text_a = tokenization.convert_to_unicode(line[6])\n      text_b = tokenization.convert_to_unicode(line[7])\n      label = tokenization.convert_to_unicode(line[1])\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"contradiction\", \"entailment\", \"neutral\"]\n\n\nclass MnliProcessor(DataProcessor):\n  \"\"\"Processor for the MultiNLI data set (GLUE version).\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"train.tsv\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"dev_matched.tsv\")),\n        \"dev_matched\")\n\n  def get_test_examples(self, data_dir):\n    
\"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"test_matched.tsv\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"contradiction\", \"entailment\", \"neutral\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      if i == 0:\n        continue\n    # Note(mingdachen): We will rely on this guid for GLUE submission.\n      guid = tokenization.preprocess_text(line[0], lower=FLAGS.do_lower_case)\n      text_a = tokenization.preprocess_text(line[8], lower=FLAGS.do_lower_case)\n      text_b = tokenization.preprocess_text(line[9], lower=FLAGS.do_lower_case)\n      if set_type == \"test\":\n        label = \"contradiction\"\n      else:\n        label = tokenization.preprocess_text(line[-1])\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n\nclass LCQMCPairClassificationProcessor(DataProcessor):\n  \"\"\"Processor for the internal data set. 
sentence pair classification\"\"\"\n  def __init__(self):\n    self.language = \"zh\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"train.txt\")), \"train\")\n    # dev_0827.tsv\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"test.txt\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"test.txt\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    print(\"length of lines:\",len(lines))\n    for (i, line) in enumerate(lines):\n      if i == 0:\n        continue\n      guid = \"%s-%s\" % (set_type, i)\n      try:\n          label = tokenization.convert_to_unicode(line[2])\n          text_a = tokenization.convert_to_unicode(line[0])\n          text_b = tokenization.convert_to_unicode(line[1])\n          examples.append(\n              InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n      except Exception:\n          print('###error.i:', i, line)\n    return examples\n\nclass MrpcProcessor(DataProcessor):\n  \"\"\"Processor for the MRPC data set (GLUE version).\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"train.tsv\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"dev.tsv\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return 
self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"test.tsv\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      if i == 0:\n        continue\n      guid = \"%s-%s\" % (set_type, i)\n      text_a = tokenization.preprocess_text(line[3], lower=FLAGS.do_lower_case)\n      text_b = tokenization.preprocess_text(line[4], lower=FLAGS.do_lower_case)\n      if set_type == \"test\":\n        guid = line[0]\n        label = \"0\"\n      else:\n        label = tokenization.preprocess_text(line[0])\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\n    return examples\n\n\nclass ColaProcessor(DataProcessor):\n  \"\"\"Processor for the CoLA data set (GLUE version).\"\"\"\n\n  def get_train_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"train.tsv\")), \"train\")\n\n  def get_dev_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"dev.tsv\")), \"dev\")\n\n  def get_test_examples(self, data_dir):\n    \"\"\"See base class.\"\"\"\n    return self._create_examples(\n        self._read_tsv(os.path.join(data_dir, \"test.tsv\")), \"test\")\n\n  def get_labels(self):\n    \"\"\"See base class.\"\"\"\n    return [\"0\", \"1\"]\n\n  def _create_examples(self, lines, set_type):\n    \"\"\"Creates examples for the training and dev sets.\"\"\"\n    examples = []\n    for (i, line) in enumerate(lines):\n      # Only the test set has a header\n      if set_type == \"test\" and i == 0:\n        continue\n      guid = \"%s-%s\" % (set_type, i)\n      if set_type == \"test\":\n        guid = line[0]\n        text_a 
= tokenization.preprocess_text(\n            line[1], lower=FLAGS.do_lower_case)\n        label = \"0\"\n      else:\n        text_a = tokenization.preprocess_text(\n            line[3], lower=FLAGS.do_lower_case)\n        label = tokenization.preprocess_text(line[1])\n      examples.append(\n          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))\n    return examples\n\n\ndef convert_single_example(ex_index, example, label_list, max_seq_length,\n                           tokenizer):\n  \"\"\"Converts a single `InputExample` into a single `InputFeatures`.\"\"\"\n\n  if isinstance(example, PaddingInputExample):\n    return InputFeatures(\n        input_ids=[0] * max_seq_length,\n        input_mask=[0] * max_seq_length,\n        segment_ids=[0] * max_seq_length,\n        label_id=0,\n        is_real_example=False)\n\n  label_map = {}\n  for (i, label) in enumerate(label_list):\n    label_map[label] = i\n\n  tokens_a = tokenizer.tokenize(example.text_a)\n  tokens_b = None\n  if example.text_b:\n    tokens_b = tokenizer.tokenize(example.text_b)\n\n  if tokens_b:\n    # Modifies `tokens_a` and `tokens_b` in place so that the total\n    # length is less than the specified length.\n    # Account for [CLS], [SEP], [SEP] with \"- 3\"\n    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)\n  else:\n    # Account for [CLS] and [SEP] with \"- 2\"\n    if len(tokens_a) > max_seq_length - 2:\n      tokens_a = tokens_a[0:(max_seq_length - 2)]\n\n  # The convention in ALBERT is:\n  # (a) For sequence pairs:\n  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]\n  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1\n  # (b) For single sequences:\n  #  tokens:   [CLS] the dog is hairy . [SEP]\n  #  type_ids: 0     0   0   0  0     0 0\n  #\n  # Where \"type_ids\" are used to indicate whether this is the first\n  # sequence or the second sequence. 
The embedding vectors for `type=0` and\n  # `type=1` were learned during pre-training and are added to the wordpiece\n  # embedding vector (and position vector). This is not *strictly* necessary\n  # since the [SEP] token unambiguously separates the sequences, but it makes\n  # it easier for the model to learn the concept of sequences.\n  #\n  # For classification tasks, the first vector (corresponding to [CLS]) is\n  # used as the \"sentence vector\". Note that this only makes sense because\n  # the entire model is fine-tuned.\n  tokens = []\n  segment_ids = []\n  tokens.append(\"[CLS]\")\n  segment_ids.append(0)\n  for token in tokens_a:\n    tokens.append(token)\n    segment_ids.append(0)\n  tokens.append(\"[SEP]\")\n  segment_ids.append(0)\n\n  if tokens_b:\n    for token in tokens_b:\n      tokens.append(token)\n      segment_ids.append(1)\n    tokens.append(\"[SEP]\")\n    segment_ids.append(1)\n\n  input_ids = tokenizer.convert_tokens_to_ids(tokens)\n\n  # The mask has 1 for real tokens and 0 for padding tokens. 
Only real\n  # tokens are attended to.\n  input_mask = [1] * len(input_ids)\n\n  # Zero-pad up to the sequence length.\n  while len(input_ids) < max_seq_length:\n    input_ids.append(0)\n    input_mask.append(0)\n    segment_ids.append(0)\n\n  assert len(input_ids) == max_seq_length\n  assert len(input_mask) == max_seq_length\n  assert len(segment_ids) == max_seq_length\n\n  label_id = label_map[example.label]\n  if ex_index < 5:\n    tf.logging.info(\"*** Example ***\")\n    tf.logging.info(\"guid: %s\" % (example.guid))\n    tf.logging.info(\"tokens: %s\" % \" \".join(\n        [tokenization.printable_text(x) for x in tokens]))\n    tf.logging.info(\"input_ids: %s\" % \" \".join([str(x) for x in input_ids]))\n    tf.logging.info(\"input_mask: %s\" % \" \".join([str(x) for x in input_mask]))\n    tf.logging.info(\"segment_ids: %s\" % \" \".join([str(x) for x in segment_ids]))\n    tf.logging.info(\"label: %s (id = %d)\" % (example.label, label_id))\n\n  feature = InputFeatures(\n      input_ids=input_ids,\n      input_mask=input_mask,\n      segment_ids=segment_ids,\n      label_id=label_id,\n      is_real_example=True)\n  return feature\n\n\ndef file_based_convert_examples_to_features(\n    examples, label_list, max_seq_length, tokenizer, output_file):\n  \"\"\"Convert a set of `InputExample`s to a TFRecord file.\"\"\"\n\n  writer = tf.python_io.TFRecordWriter(output_file)\n\n  for (ex_index, example) in enumerate(examples):\n    if ex_index % 10000 == 0:\n      tf.logging.info(\"Writing example %d of %d\" % (ex_index, len(examples)))\n\n    feature = convert_single_example(ex_index, example, label_list,\n                                     max_seq_length, tokenizer)\n\n    def create_int_feature(values):\n      f = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))\n      return f\n\n    features = collections.OrderedDict()\n    features[\"input_ids\"] = create_int_feature(feature.input_ids)\n    features[\"input_mask\"] = 
create_int_feature(feature.input_mask)\n    features[\"segment_ids\"] = create_int_feature(feature.segment_ids)\n    features[\"label_ids\"] = create_int_feature([feature.label_id])\n    features[\"is_real_example\"] = create_int_feature(\n        [int(feature.is_real_example)])\n\n    tf_example = tf.train.Example(features=tf.train.Features(feature=features))\n    writer.write(tf_example.SerializeToString())\n  writer.close()\n\n\ndef file_based_input_fn_builder(input_file, seq_length, is_training,\n                                drop_remainder):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  name_to_features = {\n      \"input_ids\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"input_mask\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"segment_ids\": tf.FixedLenFeature([seq_length], tf.int64),\n      \"label_ids\": tf.FixedLenFeature([], tf.int64),\n      \"is_real_example\": tf.FixedLenFeature([], tf.int64),\n  }\n\n  def _decode_record(record, name_to_features):\n    \"\"\"Decodes a record to a TensorFlow example.\"\"\"\n    example = tf.parse_single_example(record, name_to_features)\n\n    # tf.Example only supports tf.int64, but the TPU only supports tf.int32.\n    # So cast all int64 to int32.\n    for name in list(example.keys()):\n      t = example[name]\n      if t.dtype == tf.int64:\n        t = tf.to_int32(t)\n      example[name] = t\n\n    return example\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    # For training, we want a lot of parallel reading and shuffling.\n    # For eval, we want no shuffling and parallel reading doesn't matter.\n    d = tf.data.TFRecordDataset(input_file)\n    if is_training:\n      d = d.repeat()\n      d = d.shuffle(buffer_size=100)\n\n    d = d.apply(\n        tf.contrib.data.map_and_batch(\n            lambda record: _decode_record(record, name_to_features),\n            batch_size=batch_size,\n        
    drop_remainder=drop_remainder))\n\n    return d\n\n  return input_fn\n\n\ndef _truncate_seq_pair(tokens_a, tokens_b, max_length):\n  \"\"\"Truncates a sequence pair in place to the maximum length.\"\"\"\n\n  # This is a simple heuristic which will always truncate the longer sequence\n  # one token at a time. This makes more sense than truncating an equal percent\n  # of tokens from each, since if one sequence is very short then each token\n  # that's truncated likely contains more information than a longer sequence.\n  while True:\n    total_length = len(tokens_a) + len(tokens_b)\n    if total_length <= max_length:\n      break\n    if len(tokens_a) > len(tokens_b):\n      tokens_a.pop()\n    else:\n      tokens_b.pop()\n\n\ndef create_model(albert_config, is_training, input_ids, input_mask, segment_ids,\n                 labels, num_labels, use_one_hot_embeddings):\n  \"\"\"Creates a classification model.\"\"\"\n  model = modeling.AlbertModel(\n      config=albert_config,\n      is_training=is_training,\n      input_ids=input_ids,\n      input_mask=input_mask,\n      token_type_ids=segment_ids,\n      use_one_hot_embeddings=use_one_hot_embeddings)\n\n  # In the demo, we are doing a simple classification task on the entire\n  # segment.\n  #\n  # If you want to use the token-level output, use model.get_sequence_output()\n  # instead.\n  if FLAGS.use_pooled_output:\n    tf.logging.info(\"using pooled output\")\n    output_layer = model.get_pooled_output()\n  else:\n    tf.logging.info(\"using meaned output\")\n    output_layer = tf.reduce_mean(model.get_sequence_output(), axis=1)\n\n  hidden_size = output_layer.shape[-1].value\n\n  output_weights = tf.get_variable(\n      \"output_weights\", [num_labels, hidden_size],\n      initializer=tf.truncated_normal_initializer(stddev=0.02))\n\n  output_bias = tf.get_variable(\n      \"output_bias\", [num_labels], initializer=tf.zeros_initializer())\n\n  with tf.variable_scope(\"loss\"):\n    if is_training:\n      # 
I.e., 0.1 dropout\n      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)\n\n    logits = tf.matmul(output_layer, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    predictions = tf.argmax(logits, axis=-1, output_type=tf.int32)\n    probabilities = tf.nn.softmax(logits, axis=-1)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n\n    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)\n\n    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)\n    loss = tf.reduce_mean(per_example_loss)\n\n    return (loss, per_example_loss, probabilities, predictions)\n\n\ndef model_fn_builder(albert_config, num_labels, init_checkpoint, learning_rate,\n                     num_train_steps, num_warmup_steps, use_tpu,\n                     use_one_hot_embeddings):\n  \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n\n  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\n    \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n\n    tf.logging.info(\"*** Features ***\")\n    for name in sorted(features.keys()):\n      tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\n\n    input_ids = features[\"input_ids\"]\n    input_mask = features[\"input_mask\"]\n    segment_ids = features[\"segment_ids\"]\n    label_ids = features[\"label_ids\"]\n    is_real_example = None\n    if \"is_real_example\" in features:\n      is_real_example = tf.cast(features[\"is_real_example\"], dtype=tf.float32)\n    else:\n      is_real_example = tf.ones(tf.shape(label_ids), dtype=tf.float32)\n\n    is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n\n    (total_loss, per_example_loss, probabilities, predictions) = \\\n        create_model(albert_config, is_training, input_ids, input_mask,\n                     segment_ids, label_ids, num_labels, use_one_hot_embeddings)\n\n    tvars = tf.trainable_variables()\n    initialized_variable_names = {}\n    scaffold_fn = 
None\n    if init_checkpoint:\n      (assignment_map, initialized_variable_names\n      ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)\n      if use_tpu:\n\n        def tpu_scaffold():\n          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n          return tf.train.Scaffold()\n\n        scaffold_fn = tpu_scaffold\n      else:\n        tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n\n    tf.logging.info(\"**** Trainable Variables ****\")\n    for var in tvars:\n      init_string = \"\"\n      if var.name in initialized_variable_names:\n        init_string = \", *INIT_FROM_CKPT*\"\n      tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\n                      init_string)\n\n    output_spec = None\n    if mode == tf.estimator.ModeKeys.TRAIN:\n\n      train_op = optimization.create_optimizer(\n          total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          train_op=train_op,\n          scaffold_fn=scaffold_fn)\n    elif mode == tf.estimator.ModeKeys.EVAL:\n\n      def metric_fn(per_example_loss, label_ids, predictions, is_real_example):\n        accuracy = tf.metrics.accuracy(\n            labels=label_ids, predictions=predictions, weights=is_real_example)\n        loss = tf.metrics.mean(values=per_example_loss, weights=is_real_example)\n        return {\n            \"eval_accuracy\": accuracy,\n            \"eval_loss\": loss,\n        }\n\n      eval_metrics = (metric_fn,\n                      [per_example_loss, label_ids,\n                       predictions, is_real_example])\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          eval_metrics=eval_metrics,\n          scaffold_fn=scaffold_fn)\n    else:\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          
predictions={\"probabilities\": probabilities,\n                       \"predictions\": predictions},\n          scaffold_fn=scaffold_fn)\n    return output_spec\n\n  return model_fn\n\n\n# This function is not used by this file but is still used by the Colab and\n# people who depend on it.\ndef input_fn_builder(features, seq_length, is_training, drop_remainder):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  all_input_ids = []\n  all_input_mask = []\n  all_segment_ids = []\n  all_label_ids = []\n\n  for feature in features:\n    all_input_ids.append(feature.input_ids)\n    all_input_mask.append(feature.input_mask)\n    all_segment_ids.append(feature.segment_ids)\n    all_label_ids.append(feature.label_id)\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    num_examples = len(features)\n\n    # This is for demo purposes and does NOT scale to large data sets. We do\n    # not use Dataset.from_generator() because that uses tf.py_func which is\n    # not TPU compatible. 
The right way to load data is with TFRecordReader.\n    d = tf.data.Dataset.from_tensor_slices({\n        \"input_ids\":\n            tf.constant(\n                all_input_ids, shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"input_mask\":\n            tf.constant(\n                all_input_mask,\n                shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"segment_ids\":\n            tf.constant(\n                all_segment_ids,\n                shape=[num_examples, seq_length],\n                dtype=tf.int32),\n        \"label_ids\":\n            tf.constant(all_label_ids, shape=[num_examples], dtype=tf.int32),\n    })\n\n    if is_training:\n      d = d.repeat()\n      d = d.shuffle(buffer_size=100)\n\n    d = d.batch(batch_size=batch_size, drop_remainder=drop_remainder)\n    return d\n\n  return input_fn\n\n\n# This function is not used by this file but is still used by the Colab and\n# people who depend on it.\ndef convert_examples_to_features(examples, label_list, max_seq_length,\n                                 tokenizer):\n  \"\"\"Convert a set of `InputExample`s to a list of `InputFeatures`.\"\"\"\n\n  features = []\n  for (ex_index, example) in enumerate(examples):\n    if ex_index % 10000 == 0:\n      tf.logging.info(\"Writing example %d of %d\" % (ex_index, len(examples)))\n\n    feature = convert_single_example(ex_index, example, label_list,\n                                     max_seq_length, tokenizer)\n\n    features.append(feature)\n  return features\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  processors = {\n      \"cola\": ColaProcessor,\n      \"mnli\": MnliProcessor,\n      \"mrpc\": MrpcProcessor,\n      \"xnli\": XnliProcessor,\n      \"lcqmc_pair\": LCQMCPairClassificationProcessor\n\n  }\n\n  tokenization.validate_case_matches_checkpoint(FLAGS.do_lower_case,\n                                                FLAGS.init_checkpoint)\n\n  if not 
FLAGS.do_train and not FLAGS.do_eval and not FLAGS.do_predict:\n    raise ValueError(\n        \"At least one of `do_train`, `do_eval` or `do_predict` must be True.\")\n\n  albert_config = modeling.AlbertConfig.from_json_file(FLAGS.albert_config_file)\n\n  if FLAGS.max_seq_length > albert_config.max_position_embeddings:\n    raise ValueError(\n        \"Cannot use sequence length %d because the ALBERT model \"\n        \"was only trained up to sequence length %d\" %\n        (FLAGS.max_seq_length, albert_config.max_position_embeddings))\n\n  tf.gfile.MakeDirs(FLAGS.output_dir)\n\n  task_name = FLAGS.task_name.lower()\n\n  if task_name not in processors:\n    raise ValueError(\"Task not found: %s\" % (task_name))\n\n  processor = processors[task_name]()\n\n  label_list = processor.get_labels()\n\n  tokenizer = tokenization.FullTokenizer(\n      vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case,\n      spm_model_file=FLAGS.spm_model_file)\n\n  tpu_cluster_resolver = None\n  if FLAGS.use_tpu and FLAGS.tpu_name:\n    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(\n        FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)\n\n  is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n  run_config = tf.contrib.tpu.RunConfig(\n      cluster=tpu_cluster_resolver,\n      master=FLAGS.master,\n      model_dir=FLAGS.output_dir,\n      save_checkpoints_steps=FLAGS.save_checkpoints_steps,\n      tpu_config=tf.contrib.tpu.TPUConfig(\n          iterations_per_loop=FLAGS.iterations_per_loop,\n          num_shards=FLAGS.num_tpu_cores,\n          per_host_input_for_training=is_per_host))\n\n  train_examples = None\n  num_train_steps = None\n  num_warmup_steps = None\n  if FLAGS.do_train:\n    train_examples = processor.get_train_examples(FLAGS.data_dir)\n    num_train_steps = int(\n        len(train_examples) / FLAGS.train_batch_size * FLAGS.num_train_epochs)\n    num_warmup_steps = int(num_train_steps * 
FLAGS.warmup_proportion)\n\n  model_fn = model_fn_builder(\n      albert_config=albert_config,\n      num_labels=len(label_list),\n      init_checkpoint=FLAGS.init_checkpoint,\n      learning_rate=FLAGS.learning_rate,\n      num_train_steps=num_train_steps,\n      num_warmup_steps=num_warmup_steps,\n      use_tpu=FLAGS.use_tpu,\n      use_one_hot_embeddings=FLAGS.use_tpu)\n\n  # If TPU is not available, this will fall back to normal Estimator on CPU\n  # or GPU.\n  estimator = tf.contrib.tpu.TPUEstimator(\n      use_tpu=FLAGS.use_tpu,\n      model_fn=model_fn,\n      config=run_config,\n      train_batch_size=FLAGS.train_batch_size,\n      eval_batch_size=FLAGS.eval_batch_size,\n      predict_batch_size=FLAGS.predict_batch_size)\n\n  if FLAGS.do_train:\n    train_file = os.path.join(FLAGS.output_dir, \"train.tf_record\")\n    file_based_convert_examples_to_features(\n        train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)\n    tf.logging.info(\"***** Running training *****\")\n    tf.logging.info(\"  Num examples = %d\", len(train_examples))\n    tf.logging.info(\"  Batch size = %d\", FLAGS.train_batch_size)\n    tf.logging.info(\"  Num steps = %d\", num_train_steps)\n    train_input_fn = file_based_input_fn_builder(\n        input_file=train_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=True,\n        drop_remainder=True)\n    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)\n\n  if FLAGS.do_eval:\n    eval_examples = processor.get_dev_examples(FLAGS.data_dir)\n    num_actual_eval_examples = len(eval_examples)\n    if FLAGS.use_tpu:\n      # TPU requires a fixed batch size for all batches, therefore the number\n      # of examples must be a multiple of the batch size, or else examples\n      # will get dropped. So we pad with fake examples which are ignored\n      # later on. 
These do NOT count towards the metric (all tf.metrics\n      # support a per-instance weight, and these get a weight of 0.0).\n      while len(eval_examples) % FLAGS.eval_batch_size != 0:\n        eval_examples.append(PaddingInputExample())\n\n    eval_file = os.path.join(FLAGS.output_dir, \"eval.tf_record\")\n    file_based_convert_examples_to_features(\n        eval_examples, label_list, FLAGS.max_seq_length, tokenizer, eval_file)\n\n    tf.logging.info(\"***** Running evaluation *****\")\n    tf.logging.info(\"  Num examples = %d (%d actual, %d padding)\",\n                    len(eval_examples), num_actual_eval_examples,\n                    len(eval_examples) - num_actual_eval_examples)\n    tf.logging.info(\"  Batch size = %d\", FLAGS.eval_batch_size)\n\n    # This tells the estimator to run through the entire set.\n    eval_steps = None\n    # However, if running eval on the TPU, you will need to specify the\n    # number of steps.\n    if FLAGS.use_tpu:\n      assert len(eval_examples) % FLAGS.eval_batch_size == 0\n      eval_steps = int(len(eval_examples) // FLAGS.eval_batch_size)\n\n    eval_drop_remainder = True if FLAGS.use_tpu else False\n    eval_input_fn = file_based_input_fn_builder(\n        input_file=eval_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=False,\n        drop_remainder=eval_drop_remainder)\n\n    #######################################################################################################################\n    # Evaluate all checkpoints; you can then use the checkpoint with the best dev accuracy.\n    steps_and_files = []\n    filenames = tf.gfile.ListDirectory(FLAGS.output_dir)\n    for filename in filenames:\n        if filename.endswith(\".index\"):\n            ckpt_name = filename[:-6]\n            cur_filename = os.path.join(FLAGS.output_dir, ckpt_name)\n            global_step = int(cur_filename.split(\"-\")[-1])\n            tf.logging.info(\"Add {} to eval list.\".format(cur_filename))\n            
steps_and_files.append([global_step, cur_filename])\n    steps_and_files = sorted(steps_and_files, key=lambda x: x[0])\n\n    output_eval_file = os.path.join(FLAGS.data_dir, \"eval_results_albert_zh.txt\")\n    print(\"output_eval_file:\",output_eval_file)\n    tf.logging.info(\"output_eval_file:\"+output_eval_file)\n    with tf.gfile.GFile(output_eval_file, \"w\") as writer:\n        for global_step, filename in sorted(steps_and_files, key=lambda x: x[0]):\n            result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps, checkpoint_path=filename)\n\n            tf.logging.info(\"***** Eval results %s *****\" % (filename))\n            writer.write(\"***** Eval results %s *****\\n\" % (filename))\n            for key in sorted(result.keys()):\n                tf.logging.info(\"  %s = %s\", key, str(result[key]))\n                writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n    #######################################################################################################################\n    # result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)\n    # output_eval_file = os.path.join(FLAGS.output_dir, \"eval_results.txt\")\n    # with tf.gfile.GFile(output_eval_file, \"w\") as writer:\n    #  tf.logging.info(\"***** Eval results *****\")\n    #  for key in sorted(result.keys()):\n    #    tf.logging.info(\"  %s = %s\", key, str(result[key]))\n    #    writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n\n  if FLAGS.do_predict:\n    predict_examples = processor.get_test_examples(FLAGS.data_dir)\n    num_actual_predict_examples = len(predict_examples)\n    if FLAGS.use_tpu:\n      # TPU requires a fixed batch size for all batches, therefore the number\n      # of examples must be a multiple of the batch size, or else examples\n      # will get dropped. 
So we pad with fake examples which are ignored\n      # later on.\n      while len(predict_examples) % FLAGS.predict_batch_size != 0:\n        predict_examples.append(PaddingInputExample())\n\n    predict_file = os.path.join(FLAGS.output_dir, \"predict.tf_record\")\n    file_based_convert_examples_to_features(predict_examples, label_list,\n                                            FLAGS.max_seq_length, tokenizer,\n                                            predict_file)\n\n    tf.logging.info(\"***** Running prediction *****\")\n    tf.logging.info(\"  Num examples = %d (%d actual, %d padding)\",\n                    len(predict_examples), num_actual_predict_examples,\n                    len(predict_examples) - num_actual_predict_examples)\n    tf.logging.info(\"  Batch size = %d\", FLAGS.predict_batch_size)\n\n    predict_drop_remainder = True if FLAGS.use_tpu else False\n    predict_input_fn = file_based_input_fn_builder(\n        input_file=predict_file,\n        seq_length=FLAGS.max_seq_length,\n        is_training=False,\n        drop_remainder=predict_drop_remainder)\n\n    result = estimator.predict(input_fn=predict_input_fn)\n\n    output_predict_file = os.path.join(FLAGS.output_dir, \"test_results.tsv\")\n    output_submit_file = os.path.join(FLAGS.output_dir, \"submit_results.tsv\")\n    with tf.gfile.GFile(output_predict_file, \"w\") as pred_writer,\\\n        tf.gfile.GFile(output_submit_file, \"w\") as sub_writer:\n      num_written_lines = 0\n      tf.logging.info(\"***** Predict results *****\")\n      for (i, (example, prediction)) in\\\n          enumerate(zip(predict_examples, result)):\n        probabilities = prediction[\"probabilities\"]\n        if i >= num_actual_predict_examples:\n          break\n        output_line = \"\\t\".join(\n            str(class_probability)\n            for class_probability in probabilities) + \"\\n\"\n        pred_writer.write(output_line)\n\n        actual_label = 
label_list[int(prediction[\"predictions\"])]\n        sub_writer.write(\n            six.ensure_str(example.guid) + \"\\t\" + actual_label + \"\\n\")\n        num_written_lines += 1\n    assert num_written_lines == num_actual_predict_examples\n\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"data_dir\")\n  flags.mark_flag_as_required(\"task_name\")\n  flags.mark_flag_as_required(\"vocab_file\")\n  flags.mark_flag_as_required(\"albert_config_file\")\n  flags.mark_flag_as_required(\"output_dir\")\n  tf.app.run()"
  },
  {
    "path": "run_pretraining.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Run masked LM/next sentence masked_lm pre-training for BERT.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport os\nimport modeling\nimport optimization\nimport tensorflow as tf\n\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\n## Required parameters\nflags.DEFINE_string(\n    \"bert_config_file\", None,\n    \"The config json file corresponding to the pre-trained BERT model. \"\n    \"This specifies the model architecture.\")\n\nflags.DEFINE_string(\n    \"input_file\", None,\n    \"Input TF example files (can be a glob or comma separated).\")\n\nflags.DEFINE_string(\n    \"output_dir\", None,\n    \"The output directory where the model checkpoints will be written.\")\n\n## Other parameters\nflags.DEFINE_string(\n    \"init_checkpoint\", None,\n    \"Initial checkpoint (usually from a pre-trained BERT model).\")\n\nflags.DEFINE_integer(\n    \"max_seq_length\", 128,\n    \"The maximum total input sequence length after WordPiece tokenization. \"\n    \"Sequences longer than this will be truncated, and sequences shorter \"\n    \"than this will be padded. Must match data generation.\")\n\nflags.DEFINE_integer(\n    \"max_predictions_per_seq\", 20,\n    \"Maximum number of masked LM predictions per sequence. 
\"\n    \"Must match data generation.\")\n\nflags.DEFINE_bool(\"do_train\", False, \"Whether to run training.\")\n\nflags.DEFINE_bool(\"do_eval\", False, \"Whether to run eval on the dev set.\")\n\nflags.DEFINE_integer(\"train_batch_size\", 32, \"Total batch size for training.\")\n\nflags.DEFINE_integer(\"eval_batch_size\", 8, \"Total batch size for eval.\")\n\nflags.DEFINE_float(\"learning_rate\", 5e-5, \"The initial learning rate for Adam.\")\n\nflags.DEFINE_integer(\"num_train_steps\", 100000, \"Number of training steps.\")\n\nflags.DEFINE_integer(\"num_warmup_steps\", 10000, \"Number of warmup steps.\")\n\nflags.DEFINE_integer(\"save_checkpoints_steps\", 1000,\n                     \"How often to save the model checkpoint.\")\n\nflags.DEFINE_integer(\"iterations_per_loop\", 1000,\n                     \"How many steps to make in each estimator call.\")\n\nflags.DEFINE_integer(\"max_eval_steps\", 100, \"Maximum number of eval steps.\")\n\nflags.DEFINE_bool(\"use_tpu\", False, \"Whether to use TPU or GPU/CPU.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_name\", None,\n    \"The Cloud TPU to use for training. This should be either the name \"\n    \"used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 \"\n    \"url.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_zone\", None,\n    \"[Optional] GCE zone where the Cloud TPU is located in. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\n    \"gcp_project\", None,\n    \"[Optional] Project name for the Cloud TPU-enabled project. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\"master\", None, \"[Optional] TensorFlow master URL.\")\n\nflags.DEFINE_integer(\n    \"num_tpu_cores\", 8,\n    \"Only used if `use_tpu` is True. 
Total number of TPU cores to use.\")\n\n\ndef model_fn_builder(bert_config, init_checkpoint, learning_rate,\n                     num_train_steps, num_warmup_steps, use_tpu,\n                     use_one_hot_embeddings):\n  \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n\n  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\n    \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n\n    tf.logging.info(\"*** Features ***\")\n    for name in sorted(features.keys()):\n      tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\n\n    input_ids = features[\"input_ids\"]\n    input_mask = features[\"input_mask\"]\n    segment_ids = features[\"segment_ids\"]\n    masked_lm_positions = features[\"masked_lm_positions\"]\n    masked_lm_ids = features[\"masked_lm_ids\"]\n    masked_lm_weights = features[\"masked_lm_weights\"]\n    next_sentence_labels = features[\"next_sentence_labels\"]\n\n    is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n\n    model = modeling.BertModel(\n        config=bert_config,\n        is_training=is_training,\n        input_ids=input_ids,\n        input_mask=input_mask,\n        token_type_ids=segment_ids,\n        use_one_hot_embeddings=use_one_hot_embeddings)\n\n    (masked_lm_loss,\n     masked_lm_example_loss, masked_lm_log_probs) = get_masked_lm_output(\n         bert_config, model.get_sequence_output(), model.get_embedding_table(),model.get_embedding_table_2(),\n         masked_lm_positions, masked_lm_ids, masked_lm_weights)\n\n    (next_sentence_loss, next_sentence_example_loss,\n     next_sentence_log_probs) = get_next_sentence_output(\n         bert_config, model.get_pooled_output(), next_sentence_labels)\n\n    total_loss = masked_lm_loss + next_sentence_loss\n\n    tvars = tf.trainable_variables()\n\n    initialized_variable_names = {}\n    print(\"init_checkpoint:\",init_checkpoint)\n    scaffold_fn = None\n    if init_checkpoint:\n      (assignment_map, 
initialized_variable_names\n      ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)\n      if use_tpu:\n\n        def tpu_scaffold():\n          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n          return tf.train.Scaffold()\n\n        scaffold_fn = tpu_scaffold\n      else:\n        tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n\n    tf.logging.info(\"**** Trainable Variables ****\")\n    for var in tvars:\n      init_string = \"\"\n      if var.name in initialized_variable_names:\n        init_string = \", *INIT_FROM_CKPT*\"\n      tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\n                      init_string)\n\n    output_spec = None\n    if mode == tf.estimator.ModeKeys.TRAIN:\n      train_op = optimization.create_optimizer(\n          total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          train_op=train_op,\n          scaffold_fn=scaffold_fn)\n    elif mode == tf.estimator.ModeKeys.EVAL:\n\n      def metric_fn(masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n                    masked_lm_weights, next_sentence_example_loss,\n                    next_sentence_log_probs, next_sentence_labels):\n        \"\"\"Computes the loss and accuracy of the model.\"\"\"\n        masked_lm_log_probs = tf.reshape(masked_lm_log_probs,[-1, masked_lm_log_probs.shape[-1]])\n        masked_lm_predictions = tf.argmax(masked_lm_log_probs, axis=-1, output_type=tf.int32)\n        masked_lm_example_loss = tf.reshape(masked_lm_example_loss, [-1])\n        masked_lm_ids = tf.reshape(masked_lm_ids, [-1])\n        masked_lm_weights = tf.reshape(masked_lm_weights, [-1])\n        masked_lm_accuracy = tf.metrics.accuracy(\n            labels=masked_lm_ids,\n            predictions=masked_lm_predictions,\n            weights=masked_lm_weights)\n        
masked_lm_mean_loss = tf.metrics.mean(\n            values=masked_lm_example_loss, weights=masked_lm_weights)\n\n        next_sentence_log_probs = tf.reshape(\n            next_sentence_log_probs, [-1, next_sentence_log_probs.shape[-1]])\n        next_sentence_predictions = tf.argmax(\n            next_sentence_log_probs, axis=-1, output_type=tf.int32)\n        next_sentence_labels = tf.reshape(next_sentence_labels, [-1])\n        next_sentence_accuracy = tf.metrics.accuracy(\n            labels=next_sentence_labels, predictions=next_sentence_predictions)\n        next_sentence_mean_loss = tf.metrics.mean(\n            values=next_sentence_example_loss)\n\n        return {\n            \"masked_lm_accuracy\": masked_lm_accuracy,\n            \"masked_lm_loss\": masked_lm_mean_loss,\n            \"next_sentence_accuracy\": next_sentence_accuracy,\n            \"next_sentence_loss\": next_sentence_mean_loss,\n        }\n\n      # next_sentence_example_loss=0.0 TODO\n      # next_sentence_log_probs=0.0 # TODO\n      eval_metrics = (metric_fn, [\n          masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n          masked_lm_weights, next_sentence_example_loss,\n          next_sentence_log_probs, next_sentence_labels\n      ])\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          eval_metrics=eval_metrics,\n          scaffold_fn=scaffold_fn)\n    else:\n      raise ValueError(\"Only TRAIN and EVAL modes are supported: %s\" % (mode))\n\n    return output_spec\n\n  return model_fn\n\n\ndef get_masked_lm_output(bert_config, input_tensor, output_weights,project_weights, positions,\n                         label_ids, label_weights):\n  \"\"\"Get loss and log probs for the masked LM.\"\"\"\n  input_tensor = gather_indexes(input_tensor, positions)\n\n  with tf.variable_scope(\"cls/predictions\"):\n    # We apply one more non-linear transformation before the output layer.\n    # This matrix is not used 
after pre-training.\n    with tf.variable_scope(\"transform\"):\n      input_tensor = tf.layers.dense(\n          input_tensor,\n          units=bert_config.hidden_size,\n          activation=modeling.get_activation(bert_config.hidden_act),\n          kernel_initializer=modeling.create_initializer(\n              bert_config.initializer_range))\n      input_tensor = modeling.layer_norm(input_tensor)\n\n    # The output weights are the same as the input embeddings, but there is\n    # an output-only bias for each token.\n    output_bias = tf.get_variable(\n        \"output_bias\",\n        shape=[bert_config.vocab_size],\n        initializer=tf.zeros_initializer())\n    # logits = tf.matmul(input_tensor, output_weights, transpose_b=True)\n    # input_tensor=[-1,hidden_size], project_weights=[embedding_size, hidden_size], project_weights_transpose=[hidden_size, embedding_size]--->[-1, embedding_size]\n    input_project = tf.matmul(input_tensor, project_weights, transpose_b=True)\n    logits = tf.matmul(input_project, output_weights, transpose_b=True)\n    #  # input_project=[-1, embedding_size], output_weights=[vocab_size, embedding_size], output_weights_transpose=[embedding_size, vocab_size] ---> [-1, vocab_size]\n\n    logits = tf.nn.bias_add(logits, output_bias)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n\n    label_ids = tf.reshape(label_ids, [-1])\n    label_weights = tf.reshape(label_weights, [-1])\n\n    one_hot_labels = tf.one_hot(label_ids, depth=bert_config.vocab_size, dtype=tf.float32)\n\n    # The `positions` tensor might be zero-padded (if the sequence is too\n    # short to have the maximum number of predictions). 
The `label_weights`\n    # tensor has a value of 1.0 for every real prediction and 0.0 for the\n    # padding predictions.\n    per_example_loss = -tf.reduce_sum(log_probs * one_hot_labels, axis=[-1])\n    numerator = tf.reduce_sum(label_weights * per_example_loss)\n    denominator = tf.reduce_sum(label_weights) + 1e-5\n    loss = numerator / denominator\n\n  return (loss, per_example_loss, log_probs)\n\n\ndef get_next_sentence_output(bert_config, input_tensor, labels):\n  \"\"\"Get loss and log probs for the next sentence prediction.\"\"\"\n\n  # Simple binary classification. Note that 0 is \"next sentence\" and 1 is\n  # \"random sentence\". This weight matrix is not used after pre-training.\n  with tf.variable_scope(\"cls/seq_relationship\"):\n    output_weights = tf.get_variable(\n        \"output_weights\",\n        shape=[2, bert_config.hidden_size],\n        initializer=modeling.create_initializer(bert_config.initializer_range))\n    output_bias = tf.get_variable(\n        \"output_bias\", shape=[2], initializer=tf.zeros_initializer())\n\n    logits = tf.matmul(input_tensor, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n    labels = tf.reshape(labels, [-1])\n    one_hot_labels = tf.one_hot(labels, depth=2, dtype=tf.float32)\n    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)\n    loss = tf.reduce_mean(per_example_loss)\n    return (loss, per_example_loss, log_probs)\n\n\ndef gather_indexes(sequence_tensor, positions):\n  \"\"\"Gathers the vectors at the specific positions over a minibatch.\"\"\"\n  sequence_shape = modeling.get_shape_list(sequence_tensor, expected_rank=3)\n  batch_size = sequence_shape[0]\n  seq_length = sequence_shape[1]\n  width = sequence_shape[2]\n\n  flat_offsets = tf.reshape(\n      tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1])\n  flat_positions = tf.reshape(positions + flat_offsets, [-1])\n  
flat_sequence_tensor = tf.reshape(sequence_tensor,\n                                    [batch_size * seq_length, width])\n  output_tensor = tf.gather(flat_sequence_tensor, flat_positions)\n  return output_tensor\n\n\ndef input_fn_builder(input_files,\n                     max_seq_length,\n                     max_predictions_per_seq,\n                     is_training,\n                     num_cpu_threads=4):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    name_to_features = {\n        \"input_ids\":\n            tf.FixedLenFeature([max_seq_length], tf.int64),\n        \"input_mask\":\n            tf.FixedLenFeature([max_seq_length], tf.int64),\n        \"segment_ids\":\n            tf.FixedLenFeature([max_seq_length], tf.int64),\n        \"masked_lm_positions\":\n            tf.FixedLenFeature([max_predictions_per_seq], tf.int64),\n        \"masked_lm_ids\":\n            tf.FixedLenFeature([max_predictions_per_seq], tf.int64),\n        \"masked_lm_weights\":\n            tf.FixedLenFeature([max_predictions_per_seq], tf.float32),\n        \"next_sentence_labels\":\n            tf.FixedLenFeature([1], tf.int64),\n    }\n\n    # For training, we want a lot of parallel reading and shuffling.\n    # For eval, we want no shuffling and parallel reading doesn't matter.\n    if is_training:\n      d = tf.data.Dataset.from_tensor_slices(tf.constant(input_files))\n      d = d.repeat()\n      d = d.shuffle(buffer_size=len(input_files))\n\n      # `cycle_length` is the number of parallel files that get read.\n      cycle_length = min(num_cpu_threads, len(input_files))\n\n      # `sloppy` mode means that the interleaving is not exact. 
This adds\n      # even more randomness to the training pipeline.\n      d = d.apply(\n          tf.contrib.data.parallel_interleave(\n              tf.data.TFRecordDataset,\n              sloppy=is_training,\n              cycle_length=cycle_length))\n      d = d.shuffle(buffer_size=100)\n    else:\n      d = tf.data.TFRecordDataset(input_files)\n      # Since we evaluate for a fixed number of steps we don't want to encounter\n      # out-of-range exceptions.\n      d = d.repeat()\n\n    # We must `drop_remainder` on training because the TPU requires fixed\n    # size dimensions. For eval, we assume we are evaluating on the CPU or GPU\n    # and we *don't* want to drop the remainder, otherwise we won't cover\n    # every sample.\n    d = d.apply(\n        tf.contrib.data.map_and_batch(\n            lambda record: _decode_record(record, name_to_features),\n            batch_size=batch_size,\n            num_parallel_batches=num_cpu_threads,\n            drop_remainder=True))\n    return d\n\n  return input_fn\n\n\ndef _decode_record(record, name_to_features):\n  \"\"\"Decodes a record to a TensorFlow example.\"\"\"\n  example = tf.parse_single_example(record, name_to_features)\n\n  # tf.Example only supports tf.int64, but the TPU only supports tf.int32.\n  # So cast all int64 to int32.\n  for name in list(example.keys()):\n    t = example[name]\n    if t.dtype == tf.int64:\n      t = tf.to_int32(t)\n    example[name] = t\n\n  return example\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  if not FLAGS.do_train and not FLAGS.do_eval: # At least one of training or evaluation must be enabled\n    raise ValueError(\"At least one of `do_train` or `do_eval` must be True.\")\n\n  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file) # Load the configuration from the JSON file\n\n  tf.gfile.MakeDirs(FLAGS.output_dir)\n\n  input_files = [] # Input may be several comma-separated files or a glob pattern such as 'input_x*'\n  for input_pattern in FLAGS.input_file.split(\",\"):\n    input_files.extend(tf.gfile.Glob(input_pattern))\n\n  
tf.logging.info(\"*** Input Files ***\")\n  for input_file in input_files:\n    tf.logging.info(\"  %s\" % input_file)\n\n  tpu_cluster_resolver = None\n  if FLAGS.use_tpu and FLAGS.tpu_name:\n      tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver( # TODO\n            tpu=FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)\n\n  print(\"###tpu_cluster_resolver:\",tpu_cluster_resolver,\";FLAGS.use_tpu:\",FLAGS.use_tpu,\";FLAGS.tpu_name:\",FLAGS.tpu_name,\";FLAGS.tpu_zone:\",FLAGS.tpu_zone)\n  # ###tpu_cluster_resolver: <tensorflow.python.distribute.cluster_resolver.tpu_cluster_resolver.TPUClusterResolver object at 0x7f4b387b06a0> ;FLAGS.use_tpu: True ;FLAGS.tpu_name: grpc://10.240.1.83:8470\n\n  is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n  run_config = tf.contrib.tpu.RunConfig(\n      keep_checkpoint_max=20, # 10\n      cluster=tpu_cluster_resolver,\n      master=FLAGS.master,\n      model_dir=FLAGS.output_dir,\n      save_checkpoints_steps=FLAGS.save_checkpoints_steps,\n      tpu_config=tf.contrib.tpu.TPUConfig(\n          iterations_per_loop=FLAGS.iterations_per_loop,\n          num_shards=FLAGS.num_tpu_cores,\n          per_host_input_for_training=is_per_host))\n\n  model_fn = model_fn_builder(\n      bert_config=bert_config,\n      init_checkpoint=FLAGS.init_checkpoint,\n      learning_rate=FLAGS.learning_rate,\n      num_train_steps=FLAGS.num_train_steps,\n      num_warmup_steps=FLAGS.num_warmup_steps,\n      use_tpu=FLAGS.use_tpu,\n      use_one_hot_embeddings=FLAGS.use_tpu)\n\n  # If TPU is not available, this will fall back to normal Estimator on CPU\n  # or GPU.\n  estimator = tf.contrib.tpu.TPUEstimator(\n      use_tpu=FLAGS.use_tpu,\n      model_fn=model_fn,\n      config=run_config,\n      train_batch_size=FLAGS.train_batch_size,\n      eval_batch_size=FLAGS.eval_batch_size)\n\n  if FLAGS.do_train:\n    tf.logging.info(\"***** Running training *****\")\n    tf.logging.info(\"  Batch size = %d\", 
FLAGS.train_batch_size)\n    train_input_fn = input_fn_builder(\n        input_files=input_files,\n        max_seq_length=FLAGS.max_seq_length,\n        max_predictions_per_seq=FLAGS.max_predictions_per_seq,\n        is_training=True)\n    estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)\n\n  if FLAGS.do_eval:\n    tf.logging.info(\"***** Running evaluation *****\")\n    tf.logging.info(\"  Batch size = %d\", FLAGS.eval_batch_size)\n\n    eval_input_fn = input_fn_builder(\n        input_files=input_files,\n        max_seq_length=FLAGS.max_seq_length,\n        max_predictions_per_seq=FLAGS.max_predictions_per_seq,\n        is_training=False)\n\n    result = estimator.evaluate(input_fn=eval_input_fn, steps=FLAGS.max_eval_steps)\n\n    output_eval_file = os.path.join(FLAGS.output_dir, \"eval_results.txt\")\n    with tf.gfile.GFile(output_eval_file, \"w\") as writer:\n      tf.logging.info(\"***** Eval results *****\")\n      for key in sorted(result.keys()):\n        tf.logging.info(\"  %s = %s\", key, str(result[key]))\n        writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"input_file\")\n  flags.mark_flag_as_required(\"bert_config_file\")\n  flags.mark_flag_as_required(\"output_dir\")\n  tf.app.run()\n"
  },
  {
    "path": "run_pretraining_google.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n\"\"\"Run masked LM/next sentence masked_lm pre-training for ALBERT.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport os\nimport time\n\nfrom six.moves import range\nimport tensorflow as tf\n\nimport modeling_google as modeling\nimport optimization_google as optimization\n\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\n## Required parameters\nflags.DEFINE_string(\n    \"albert_config_file\", None,\n    \"The config json file corresponding to the pre-trained ALBERT model. \"\n    \"This specifies the model architecture.\")\n\nflags.DEFINE_string(\n    \"input_file\", None,\n    \"Input TF example files (can be a glob or comma separated).\")\n\nflags.DEFINE_string(\n    \"output_dir\", None,\n    \"The output directory where the model checkpoints will be written.\")\n\nflags.DEFINE_string(\n    \"export_dir\", None,\n    \"The output directory where the saved models will be written.\")\n## Other parameters\nflags.DEFINE_string(\n    \"init_checkpoint\", None,\n    \"Initial checkpoint (usually from a pre-trained ALBERT model).\")\n\nflags.DEFINE_integer(\n    \"max_seq_length\", 512,\n    \"The maximum total input sequence length after WordPiece tokenization. 
\"\n    \"Sequences longer than this will be truncated, and sequences shorter \"\n    \"than this will be padded. Must match data generation.\")\n\nflags.DEFINE_integer(\n    \"max_predictions_per_seq\", 20,\n    \"Maximum number of masked LM predictions per sequence. \"\n    \"Must match data generation.\")\n\nflags.DEFINE_bool(\"do_train\", True, \"Whether to run training.\")\n\nflags.DEFINE_bool(\"do_eval\", False, \"Whether to run eval on the dev set.\")\n\nflags.DEFINE_integer(\"train_batch_size\", 4096, \"Total batch size for training.\")\n\nflags.DEFINE_integer(\"eval_batch_size\", 64, \"Total batch size for eval.\")\n\nflags.DEFINE_enum(\"optimizer\", \"lamb\", [\"adamw\", \"lamb\"],\n                  \"The optimizer for training.\")\n\nflags.DEFINE_float(\"learning_rate\", 0.00176, \"The initial learning rate.\")\n\nflags.DEFINE_float(\"poly_power\", 1.0, \"The power of poly decay.\")\n\nflags.DEFINE_integer(\"num_train_steps\", 125000, \"Number of training steps.\")\n\nflags.DEFINE_integer(\"num_warmup_steps\", 3125, \"Number of warmup steps.\")\n\nflags.DEFINE_integer(\"start_warmup_step\", 0, \"The starting step of warmup.\")\n\nflags.DEFINE_integer(\"save_checkpoints_steps\", 5000,\n                     \"How often to save the model checkpoint.\")\n\nflags.DEFINE_integer(\"iterations_per_loop\", 1000,\n                     \"How many steps to make in each estimator call.\")\n\nflags.DEFINE_integer(\"max_eval_steps\", 100, \"Maximum number of eval steps.\")\n\nflags.DEFINE_bool(\"use_tpu\", False, \"Whether to use TPU or GPU/CPU.\")\n\nflags.DEFINE_bool(\"init_from_group0\", False, \"Whether to initialize \"\n                  \"parameters of other groups from group 0.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_name\", None,\n    \"The Cloud TPU to use for training. 
This should be either the name \"\n    \"used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 \"\n    \"url.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_zone\", None,\n    \"[Optional] GCE zone where the Cloud TPU is located. If not \"\n    \"specified, we will attempt to automatically detect the GCE zone from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\n    \"gcp_project\", None,\n    \"[Optional] Project name for the Cloud TPU-enabled project. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\"master\", None, \"[Optional] TensorFlow master URL.\")\n\nflags.DEFINE_integer(\n    \"num_tpu_cores\", 8,\n    \"Only used if `use_tpu` is True. Total number of TPU cores to use.\")\n\nflags.DEFINE_float(\n    \"masked_lm_budget\", 0,\n    \"If >0, the ratio of masked ngrams to unmasked ngrams. Default 0, \"\n    \"for offline masking.\")\n\n\ndef model_fn_builder(albert_config, init_checkpoint, learning_rate,\n                     num_train_steps, num_warmup_steps, use_tpu,\n                     use_one_hot_embeddings, optimizer, poly_power,\n                     start_warmup_step):\n  \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n\n  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\n    \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n\n    tf.logging.info(\"*** Features ***\")\n    for name in sorted(features.keys()):\n      tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\n\n    input_ids = features[\"input_ids\"]\n    input_mask = features[\"input_mask\"]\n    segment_ids = features[\"segment_ids\"]\n    masked_lm_positions = features[\"masked_lm_positions\"]\n    masked_lm_ids = features[\"masked_lm_ids\"]\n    masked_lm_weights = features[\"masked_lm_weights\"]\n    # Note: We keep this feature name `next_sentence_labels` to be compatible\n    # with the original data created by lanzhzh@. 
However, in the ALBERT case\n    # it does represent sentence_order_labels.\n    sentence_order_labels = features[\"next_sentence_labels\"]\n\n    is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n\n    model = modeling.AlbertModel(\n        config=albert_config,\n        is_training=is_training,\n        input_ids=input_ids,\n        input_mask=input_mask,\n        token_type_ids=segment_ids,\n        use_one_hot_embeddings=use_one_hot_embeddings)\n\n    (masked_lm_loss, masked_lm_example_loss,\n     masked_lm_log_probs) = get_masked_lm_output(albert_config,\n                                                 model.get_sequence_output(),\n                                                 model.get_embedding_table(),\n                                                 masked_lm_positions,\n                                                 masked_lm_ids,\n                                                 masked_lm_weights)\n\n    (sentence_order_loss, sentence_order_example_loss,\n     sentence_order_log_probs) = get_sentence_order_output(\n         albert_config, model.get_pooled_output(), sentence_order_labels)\n\n    total_loss = masked_lm_loss + sentence_order_loss\n\n    tvars = tf.trainable_variables()\n\n    initialized_variable_names = {}\n    scaffold_fn = None\n    if init_checkpoint:\n      tf.logging.info(\"number of hidden group %d to initialize\",\n                      albert_config.num_hidden_groups)\n      num_of_initialize_group = 1\n      if FLAGS.init_from_group0:\n        num_of_initialize_group = albert_config.num_hidden_groups\n        if albert_config.net_structure_type > 0:\n          num_of_initialize_group = albert_config.num_hidden_layers\n      (assignment_map, initialized_variable_names\n      ) = modeling.get_assignment_map_from_checkpoint(\n              tvars, init_checkpoint, num_of_initialize_group)\n      if use_tpu:\n\n        def tpu_scaffold():\n          for gid in range(num_of_initialize_group):\n            
tf.logging.info(\"initialize the %dth layer\", gid)\n            tf.logging.info(assignment_map[gid])\n            tf.train.init_from_checkpoint(init_checkpoint, assignment_map[gid])\n          return tf.train.Scaffold()\n\n        scaffold_fn = tpu_scaffold\n      else:\n        for gid in range(num_of_initialize_group):\n          tf.logging.info(\"initialize the %dth layer\", gid)\n          tf.logging.info(assignment_map[gid])\n          tf.train.init_from_checkpoint(init_checkpoint, assignment_map[gid])\n\n    tf.logging.info(\"**** Trainable Variables ****\")\n    for var in tvars:\n      init_string = \"\"\n      if var.name in initialized_variable_names:\n        init_string = \", *INIT_FROM_CKPT*\"\n      tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\n                      init_string)\n\n    output_spec = None\n    if mode == tf.estimator.ModeKeys.TRAIN:\n      train_op = optimization.create_optimizer(\n          total_loss, learning_rate, num_train_steps, num_warmup_steps,\n          use_tpu, optimizer, poly_power, start_warmup_step)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          train_op=train_op,\n          scaffold_fn=scaffold_fn)\n    elif mode == tf.estimator.ModeKeys.EVAL:\n\n      def metric_fn(*args):\n        \"\"\"Computes the loss and accuracy of the model.\"\"\"\n        (masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n         masked_lm_weights, sentence_order_example_loss,\n         sentence_order_log_probs, sentence_order_labels) = args[:7]\n\n\n        masked_lm_log_probs = tf.reshape(masked_lm_log_probs,\n                                         [-1, masked_lm_log_probs.shape[-1]])\n        masked_lm_predictions = tf.argmax(\n            masked_lm_log_probs, axis=-1, output_type=tf.int32)\n        masked_lm_example_loss = tf.reshape(masked_lm_example_loss, [-1])\n        masked_lm_ids = tf.reshape(masked_lm_ids, [-1])\n        
masked_lm_weights = tf.reshape(masked_lm_weights, [-1])\n        masked_lm_accuracy = tf.metrics.accuracy(\n            labels=masked_lm_ids,\n            predictions=masked_lm_predictions,\n            weights=masked_lm_weights)\n        masked_lm_mean_loss = tf.metrics.mean(\n            values=masked_lm_example_loss, weights=masked_lm_weights)\n\n        metrics = {\n            \"masked_lm_accuracy\": masked_lm_accuracy,\n            \"masked_lm_loss\": masked_lm_mean_loss,\n        }\n\n        sentence_order_log_probs = tf.reshape(\n            sentence_order_log_probs, [-1, sentence_order_log_probs.shape[-1]])\n        sentence_order_predictions = tf.argmax(\n            sentence_order_log_probs, axis=-1, output_type=tf.int32)\n        sentence_order_labels = tf.reshape(sentence_order_labels, [-1])\n        sentence_order_accuracy = tf.metrics.accuracy(\n            labels=sentence_order_labels,\n            predictions=sentence_order_predictions)\n        sentence_order_mean_loss = tf.metrics.mean(\n            values=sentence_order_example_loss)\n        metrics.update({\n            \"sentence_order_accuracy\": sentence_order_accuracy,\n            \"sentence_order_loss\": sentence_order_mean_loss\n        })\n        return metrics\n\n      metric_values = [\n          masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n          masked_lm_weights, sentence_order_example_loss,\n          sentence_order_log_probs, sentence_order_labels\n      ]\n\n      eval_metrics = (metric_fn, metric_values)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          eval_metrics=eval_metrics,\n          scaffold_fn=scaffold_fn)\n    else:\n      raise ValueError(\"Only TRAIN and EVAL modes are supported: %s\" % (mode))\n\n    return output_spec\n\n  return model_fn\n\n\ndef get_masked_lm_output(albert_config, input_tensor, output_weights, positions,\n                         label_ids, label_weights):\n 
 \"\"\"Get loss and log probs for the masked LM.\"\"\"\n  input_tensor = gather_indexes(input_tensor, positions)\n\n\n  with tf.variable_scope(\"cls/predictions\"):\n    # We apply one more non-linear transformation before the output layer.\n    # This matrix is not used after pre-training.\n    with tf.variable_scope(\"transform\"):\n      input_tensor = tf.layers.dense(\n          input_tensor,\n          units=albert_config.embedding_size,\n          activation=modeling.get_activation(albert_config.hidden_act),\n          kernel_initializer=modeling.create_initializer(\n              albert_config.initializer_range))\n      input_tensor = modeling.layer_norm(input_tensor)\n\n    # The output weights are the same as the input embeddings, but there is\n    # an output-only bias for each token.\n    output_bias = tf.get_variable(\n        \"output_bias\",\n        shape=[albert_config.vocab_size],\n        initializer=tf.zeros_initializer())\n    logits = tf.matmul(input_tensor, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n\n    label_ids = tf.reshape(label_ids, [-1])\n    label_weights = tf.reshape(label_weights, [-1])\n\n    one_hot_labels = tf.one_hot(\n        label_ids, depth=albert_config.vocab_size, dtype=tf.float32)\n\n    # The `positions` tensor might be zero-padded (if the sequence is too\n    # short to have the maximum number of predictions). 
The `label_weights`\n    # tensor has a value of 1.0 for every real prediction and 0.0 for the\n    # padding predictions.\n    per_example_loss = -tf.reduce_sum(log_probs * one_hot_labels, axis=[-1])\n    numerator = tf.reduce_sum(label_weights * per_example_loss)\n    denominator = tf.reduce_sum(label_weights) + 1e-5\n    loss = numerator / denominator\n\n  return (loss, per_example_loss, log_probs)\n\n\ndef get_sentence_order_output(albert_config, input_tensor, labels):\n  \"\"\"Get loss and log probs for the sentence order prediction.\"\"\"\n\n  # Simple binary classification. Note that 0 is \"next sentence\" and 1 is\n  # \"random sentence\". This weight matrix is not used after pre-training.\n  with tf.variable_scope(\"cls/seq_relationship\"):\n    output_weights = tf.get_variable(\n        \"output_weights\",\n        shape=[2, albert_config.hidden_size],\n        initializer=modeling.create_initializer(\n            albert_config.initializer_range))\n    output_bias = tf.get_variable(\n        \"output_bias\", shape=[2], initializer=tf.zeros_initializer())\n\n    logits = tf.matmul(input_tensor, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n    labels = tf.reshape(labels, [-1])\n    one_hot_labels = tf.one_hot(labels, depth=2, dtype=tf.float32)\n    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)\n    loss = tf.reduce_mean(per_example_loss)\n    return (loss, per_example_loss, log_probs)\n\n\ndef gather_indexes(sequence_tensor, positions):\n  \"\"\"Gathers the vectors at the specific positions over a minibatch.\"\"\"\n  sequence_shape = modeling.get_shape_list(sequence_tensor, expected_rank=3)\n  batch_size = sequence_shape[0]\n  seq_length = sequence_shape[1]\n  width = sequence_shape[2]\n\n  flat_offsets = tf.reshape(\n      tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1])\n  flat_positions = tf.reshape(positions + flat_offsets, 
[-1])\n  flat_sequence_tensor = tf.reshape(sequence_tensor,\n                                    [batch_size * seq_length, width])\n  output_tensor = tf.gather(flat_sequence_tensor, flat_positions)\n  return output_tensor\n\n\ndef input_fn_builder(input_files,\n                     max_seq_length,\n                     max_predictions_per_seq,\n                     is_training,\n                     num_cpu_threads=4):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    name_to_features = {\n        \"input_ids\": tf.FixedLenFeature([max_seq_length], tf.int64),\n        \"input_mask\": tf.FixedLenFeature([max_seq_length], tf.int64),\n        \"segment_ids\": tf.FixedLenFeature([max_seq_length], tf.int64),\n        # Note: We keep this feature name `next_sentence_labels` to be\n        # compatible with the original data created by lanzhzh@. However, in\n        # the ALBERT case it does represent sentence_order_labels.\n        \"next_sentence_labels\": tf.FixedLenFeature([1], tf.int64),\n    }\n\n    if FLAGS.masked_lm_budget:\n      name_to_features.update({\n          \"token_boundary\":\n              tf.FixedLenFeature([max_seq_length], tf.int64)})\n    else:\n      name_to_features.update({\n          \"masked_lm_positions\":\n              tf.FixedLenFeature([max_predictions_per_seq], tf.int64),\n          \"masked_lm_ids\":\n              tf.FixedLenFeature([max_predictions_per_seq], tf.int64),\n          \"masked_lm_weights\":\n              tf.FixedLenFeature([max_predictions_per_seq], tf.float32)})\n\n    # For training, we want a lot of parallel reading and shuffling.\n    # For eval, we want no shuffling and parallel reading doesn't matter.\n    if is_training:\n      d = tf.data.Dataset.from_tensor_slices(tf.constant(input_files))\n      d = d.repeat()\n      d = d.shuffle(buffer_size=len(input_files))\n\n      
# `cycle_length` is the number of parallel files that get read.\n      cycle_length = min(num_cpu_threads, len(input_files))\n\n      # `sloppy` mode means that the interleaving is not exact. This adds\n      # even more randomness to the training pipeline.\n      d = d.apply(\n          tf.contrib.data.parallel_interleave(\n              tf.data.TFRecordDataset,\n              sloppy=is_training,\n              cycle_length=cycle_length))\n      d = d.shuffle(buffer_size=100)\n    else:\n      d = tf.data.TFRecordDataset(input_files)\n      # Since we evaluate for a fixed number of steps we don't want to encounter\n      # out-of-range exceptions.\n      d = d.repeat()\n\n    # `drop_remainder` is always True because the TPU requires fixed size\n    # dimensions. Both datasets are repeated indefinitely, so no partial final\n    # batch is ever produced and no sample is skipped by dropping it.\n    d = d.apply(\n        tf.data.experimental.map_and_batch_with_legacy_function(\n            lambda record: _decode_record(record, name_to_features),\n            batch_size=batch_size,\n            num_parallel_batches=num_cpu_threads,\n            drop_remainder=True))\n    tf.logging.info(d)\n    return d\n\n  return input_fn\n\n\ndef _decode_record(record, name_to_features):\n  \"\"\"Decodes a record to a TensorFlow example.\"\"\"\n  example = tf.parse_single_example(record, name_to_features)\n\n  # tf.Example only supports tf.int64, but the TPU only supports tf.int32.\n  # So cast all int64 to int32.\n  for name in list(example.keys()):\n    t = example[name]\n    if t.dtype == tf.int64:\n      t = tf.to_int32(t)\n    example[name] = t\n\n  return example\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  if not FLAGS.do_train and not FLAGS.do_eval:\n    raise ValueError(\"At least one of `do_train` or `do_eval` must be True.\")\n\n  albert_config = modeling.AlbertConfig.from_json_file(FLAGS.albert_config_file)\n\n  
tf.gfile.MakeDirs(FLAGS.output_dir)\n\n  input_files = []\n  for input_pattern in FLAGS.input_file.split(\",\"):\n    input_files.extend(tf.gfile.Glob(input_pattern))\n\n  tf.logging.info(\"*** Input Files ***\")\n  for input_file in input_files:\n    tf.logging.info(\"  %s\" % input_file)\n\n  tpu_cluster_resolver = None\n  if FLAGS.use_tpu and FLAGS.tpu_name:\n    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(\n        FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)\n\n  is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n  run_config = tf.contrib.tpu.RunConfig(\n      cluster=tpu_cluster_resolver,\n      master=FLAGS.master,\n      model_dir=FLAGS.output_dir,\n      save_checkpoints_steps=FLAGS.save_checkpoints_steps,\n      tpu_config=tf.contrib.tpu.TPUConfig(\n          iterations_per_loop=FLAGS.iterations_per_loop,\n          num_shards=FLAGS.num_tpu_cores,\n          per_host_input_for_training=is_per_host))\n\n  model_fn = model_fn_builder(\n      albert_config=albert_config,\n      init_checkpoint=FLAGS.init_checkpoint,\n      learning_rate=FLAGS.learning_rate,\n      num_train_steps=FLAGS.num_train_steps,\n      num_warmup_steps=FLAGS.num_warmup_steps,\n      use_tpu=FLAGS.use_tpu,\n      use_one_hot_embeddings=FLAGS.use_tpu,\n      optimizer=FLAGS.optimizer,\n      poly_power=FLAGS.poly_power,\n      start_warmup_step=FLAGS.start_warmup_step)\n\n  # If TPU is not available, this will fall back to normal Estimator on CPU\n  # or GPU.\n  estimator = tf.contrib.tpu.TPUEstimator(\n      use_tpu=FLAGS.use_tpu,\n      model_fn=model_fn,\n      config=run_config,\n      train_batch_size=FLAGS.train_batch_size,\n      eval_batch_size=FLAGS.eval_batch_size)\n\n  if FLAGS.do_train:\n    tf.logging.info(\"***** Running training *****\")\n    tf.logging.info(\"  Batch size = %d\", FLAGS.train_batch_size)\n    train_input_fn = input_fn_builder(\n        input_files=input_files,\n        
max_seq_length=FLAGS.max_seq_length,\n        max_predictions_per_seq=FLAGS.max_predictions_per_seq,\n        is_training=True)\n    estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)\n\n  if FLAGS.do_eval:\n    tf.logging.info(\"***** Running evaluation *****\")\n    tf.logging.info(\"  Batch size = %d\", FLAGS.eval_batch_size)\n    global_step = -1\n    output_eval_file = os.path.join(FLAGS.output_dir, \"eval_results.txt\")\n    writer = tf.gfile.GFile(output_eval_file, \"w\")\n    # `export_dir` is optional and defaults to None, so only create it when set.\n    if FLAGS.export_dir:\n      tf.gfile.MakeDirs(FLAGS.export_dir)\n    eval_input_fn = input_fn_builder(\n        input_files=input_files,\n        max_seq_length=FLAGS.max_seq_length,\n        max_predictions_per_seq=FLAGS.max_predictions_per_seq,\n        is_training=False)\n    while global_step < FLAGS.num_train_steps:\n      if estimator.latest_checkpoint() is None:\n        tf.logging.info(\"No checkpoint found yet. Sleeping.\")\n        time.sleep(1)\n      else:\n        result = estimator.evaluate(\n            input_fn=eval_input_fn, steps=FLAGS.max_eval_steps)\n        global_step = result[\"global_step\"]\n        tf.logging.info(\"***** Eval results *****\")\n        for key in sorted(result.keys()):\n          tf.logging.info(\"  %s = %s\", key, str(result[key]))\n          writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n    # Close the results file so that buffered eval output is flushed.\n    writer.close()\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"input_file\")\n  flags.mark_flag_as_required(\"albert_config_file\")\n  flags.mark_flag_as_required(\"output_dir\")\n  tf.app.run()"
  },
  {
    "path": "run_pretraining_google_fast.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n\"\"\"Run masked LM/next sentence masked_lm pre-training for ALBERT.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport os\nimport time\n\nfrom six.moves import range\nimport tensorflow as tf\n\nimport modeling_google_fast as modeling\nimport optimization_google as optimization\n\nflags = tf.flags\n\nFLAGS = flags.FLAGS\n\n## Required parameters\nflags.DEFINE_string(\n    \"albert_config_file\", None,\n    \"The config json file corresponding to the pre-trained ALBERT model. \"\n    \"This specifies the model architecture.\")\n\nflags.DEFINE_string(\n    \"input_file\", None,\n    \"Input TF example files (can be a glob or comma separated).\")\n\nflags.DEFINE_string(\n    \"output_dir\", None,\n    \"The output directory where the model checkpoints will be written.\")\n\nflags.DEFINE_string(\n    \"export_dir\", None,\n    \"The output directory where the saved models will be written.\")\n## Other parameters\nflags.DEFINE_string(\n    \"init_checkpoint\", None,\n    \"Initial checkpoint (usually from a pre-trained ALBERT model).\")\n\nflags.DEFINE_integer(\n    \"max_seq_length\", 512,\n    \"The maximum total input sequence length after WordPiece tokenization. 
\"\n    \"Sequences longer than this will be truncated, and sequences shorter \"\n    \"than this will be padded. Must match data generation.\")\n\nflags.DEFINE_integer(\n    \"max_predictions_per_seq\", 20,\n    \"Maximum number of masked LM predictions per sequence. \"\n    \"Must match data generation.\")\n\nflags.DEFINE_bool(\"do_train\", True, \"Whether to run training.\")\n\nflags.DEFINE_bool(\"do_eval\", False, \"Whether to run eval on the dev set.\")\n\nflags.DEFINE_integer(\"train_batch_size\", 4096, \"Total batch size for training.\")\n\nflags.DEFINE_integer(\"eval_batch_size\", 64, \"Total batch size for eval.\")\n\nflags.DEFINE_enum(\"optimizer\", \"lamb\", [\"adamw\", \"lamb\"],\n                  \"The optimizer for training.\")\n\nflags.DEFINE_float(\"learning_rate\", 0.00176, \"The initial learning rate.\")\n\nflags.DEFINE_float(\"poly_power\", 1.0, \"The power of poly decay.\")\n\nflags.DEFINE_integer(\"num_train_steps\", 125000, \"Number of training steps.\")\n\nflags.DEFINE_integer(\"num_warmup_steps\", 3125, \"Number of warmup steps.\")\n\nflags.DEFINE_integer(\"start_warmup_step\", 0, \"The starting step of warmup.\")\n\nflags.DEFINE_integer(\"save_checkpoints_steps\", 5000,\n                     \"How often to save the model checkpoint.\")\n\nflags.DEFINE_integer(\"iterations_per_loop\", 1000,\n                     \"How many steps to make in each estimator call.\")\n\nflags.DEFINE_integer(\"max_eval_steps\", 100, \"Maximum number of eval steps.\")\n\nflags.DEFINE_bool(\"use_tpu\", False, \"Whether to use TPU or GPU/CPU.\")\n\nflags.DEFINE_bool(\"init_from_group0\", False, \"Whether to initialize \"\n                  \"parameters of other groups from group 0.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_name\", None,\n    \"The Cloud TPU to use for training. 
This should be either the name \"\n    \"used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 \"\n    \"url.\")\n\ntf.flags.DEFINE_string(\n    \"tpu_zone\", None,\n    \"[Optional] GCE zone where the Cloud TPU is located. If not \"\n    \"specified, we will attempt to automatically detect the GCE zone from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\n    \"gcp_project\", None,\n    \"[Optional] Project name for the Cloud TPU-enabled project. If not \"\n    \"specified, we will attempt to automatically detect the GCE project from \"\n    \"metadata.\")\n\ntf.flags.DEFINE_string(\"master\", None, \"[Optional] TensorFlow master URL.\")\n\nflags.DEFINE_integer(\n    \"num_tpu_cores\", 8,\n    \"Only used if `use_tpu` is True. Total number of TPU cores to use.\")\n\nflags.DEFINE_float(\n    \"masked_lm_budget\", 0,\n    \"If >0, the ratio of masked ngrams to unmasked ngrams. Default 0, \"\n    \"for offline masking.\")\n\n\ndef model_fn_builder(albert_config, init_checkpoint, learning_rate,\n                     num_train_steps, num_warmup_steps, use_tpu,\n                     use_one_hot_embeddings, optimizer, poly_power,\n                     start_warmup_step):\n  \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n\n  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\n    \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n\n    tf.logging.info(\"*** Features ***\")\n    for name in sorted(features.keys()):\n      tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\n\n    input_ids = features[\"input_ids\"]\n    input_mask = features[\"input_mask\"]\n    segment_ids = features[\"segment_ids\"]\n    masked_lm_positions = features[\"masked_lm_positions\"]\n    masked_lm_ids = features[\"masked_lm_ids\"]\n    masked_lm_weights = features[\"masked_lm_weights\"]\n    # Note: We keep this feature name `next_sentence_labels` to be compatible\n    # with the original data created by lanzhzh@. 
However, in the ALBERT case\n    # it does represent sentence_order_labels.\n    sentence_order_labels = features[\"next_sentence_labels\"]\n\n    is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n\n    model = modeling.AlbertModel(\n        config=albert_config,\n        is_training=is_training,\n        input_ids=input_ids,\n        input_mask=input_mask,\n        token_type_ids=segment_ids,\n        use_one_hot_embeddings=use_one_hot_embeddings)\n\n    (masked_lm_loss, masked_lm_example_loss,\n     masked_lm_log_probs) = get_masked_lm_output(albert_config,\n                                                 model.get_sequence_output(),\n                                                 model.get_embedding_table(),\n                                                 masked_lm_positions,\n                                                 masked_lm_ids,\n                                                 masked_lm_weights)\n\n    (sentence_order_loss, sentence_order_example_loss,\n     sentence_order_log_probs) = get_sentence_order_output(\n         albert_config, model.get_pooled_output(), sentence_order_labels)\n\n    total_loss = masked_lm_loss + sentence_order_loss\n\n    tvars = tf.trainable_variables()\n\n    initialized_variable_names = {}\n    scaffold_fn = None\n    if init_checkpoint:\n      tf.logging.info(\"number of hidden group %d to initialize\",\n                      albert_config.num_hidden_groups)\n      num_of_initialize_group = 1\n      if FLAGS.init_from_group0:\n        num_of_initialize_group = albert_config.num_hidden_groups\n        if albert_config.net_structure_type > 0:\n          num_of_initialize_group = albert_config.num_hidden_layers\n      (assignment_map, initialized_variable_names\n      ) = modeling.get_assignment_map_from_checkpoint(\n              tvars, init_checkpoint, num_of_initialize_group)\n      if use_tpu:\n\n        def tpu_scaffold():\n          for gid in range(num_of_initialize_group):\n            
tf.logging.info(\"initialize the %dth layer\", gid)\n            tf.logging.info(assignment_map[gid])\n            tf.train.init_from_checkpoint(init_checkpoint, assignment_map[gid])\n          return tf.train.Scaffold()\n\n        scaffold_fn = tpu_scaffold\n      else:\n        for gid in range(num_of_initialize_group):\n          tf.logging.info(\"initialize the %dth layer\", gid)\n          tf.logging.info(assignment_map[gid])\n          tf.train.init_from_checkpoint(init_checkpoint, assignment_map[gid])\n\n    tf.logging.info(\"**** Trainable Variables ****\")\n    for var in tvars:\n      init_string = \"\"\n      if var.name in initialized_variable_names:\n        init_string = \", *INIT_FROM_CKPT*\"\n      tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\n                      init_string)\n\n    output_spec = None\n    if mode == tf.estimator.ModeKeys.TRAIN:\n      train_op = optimization.create_optimizer(\n          total_loss, learning_rate, num_train_steps, num_warmup_steps,\n          use_tpu, optimizer, poly_power, start_warmup_step)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          train_op=train_op,\n          scaffold_fn=scaffold_fn)\n    elif mode == tf.estimator.ModeKeys.EVAL:\n\n      def metric_fn(*args):\n        \"\"\"Computes the loss and accuracy of the model.\"\"\"\n        (masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n         masked_lm_weights, sentence_order_example_loss,\n         sentence_order_log_probs, sentence_order_labels) = args[:7]\n\n\n        masked_lm_log_probs = tf.reshape(masked_lm_log_probs,\n                                         [-1, masked_lm_log_probs.shape[-1]])\n        masked_lm_predictions = tf.argmax(\n            masked_lm_log_probs, axis=-1, output_type=tf.int32)\n        masked_lm_example_loss = tf.reshape(masked_lm_example_loss, [-1])\n        masked_lm_ids = tf.reshape(masked_lm_ids, [-1])\n        
masked_lm_weights = tf.reshape(masked_lm_weights, [-1])\n        masked_lm_accuracy = tf.metrics.accuracy(\n            labels=masked_lm_ids,\n            predictions=masked_lm_predictions,\n            weights=masked_lm_weights)\n        masked_lm_mean_loss = tf.metrics.mean(\n            values=masked_lm_example_loss, weights=masked_lm_weights)\n\n        metrics = {\n            \"masked_lm_accuracy\": masked_lm_accuracy,\n            \"masked_lm_loss\": masked_lm_mean_loss,\n        }\n\n        sentence_order_log_probs = tf.reshape(\n            sentence_order_log_probs, [-1, sentence_order_log_probs.shape[-1]])\n        sentence_order_predictions = tf.argmax(\n            sentence_order_log_probs, axis=-1, output_type=tf.int32)\n        sentence_order_labels = tf.reshape(sentence_order_labels, [-1])\n        sentence_order_accuracy = tf.metrics.accuracy(\n            labels=sentence_order_labels,\n            predictions=sentence_order_predictions)\n        sentence_order_mean_loss = tf.metrics.mean(\n            values=sentence_order_example_loss)\n        metrics.update({\n            \"sentence_order_accuracy\": sentence_order_accuracy,\n            \"sentence_order_loss\": sentence_order_mean_loss\n        })\n        return metrics\n\n      metric_values = [\n          masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n          masked_lm_weights, sentence_order_example_loss,\n          sentence_order_log_probs, sentence_order_labels\n      ]\n\n      eval_metrics = (metric_fn, metric_values)\n\n      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n          mode=mode,\n          loss=total_loss,\n          eval_metrics=eval_metrics,\n          scaffold_fn=scaffold_fn)\n    else:\n      raise ValueError(\"Only TRAIN and EVAL modes are supported: %s\" % (mode))\n\n    return output_spec\n\n  return model_fn\n\n\ndef get_masked_lm_output(albert_config, input_tensor, output_weights, positions,\n                         label_ids, label_weights):\n 
 \"\"\"Get loss and log probs for the masked LM.\"\"\"\n  input_tensor = gather_indexes(input_tensor, positions)\n\n\n  with tf.variable_scope(\"cls/predictions\"):\n    # We apply one more non-linear transformation before the output layer.\n    # This matrix is not used after pre-training.\n    with tf.variable_scope(\"transform\"):\n      input_tensor = tf.layers.dense(\n          input_tensor,\n          units=albert_config.embedding_size,\n          activation=modeling.get_activation(albert_config.hidden_act),\n          kernel_initializer=modeling.create_initializer(\n              albert_config.initializer_range))\n      input_tensor = modeling.layer_norm(input_tensor)\n\n    # The output weights are the same as the input embeddings, but there is\n    # an output-only bias for each token.\n    output_bias = tf.get_variable(\n        \"output_bias\",\n        shape=[albert_config.vocab_size],\n        initializer=tf.zeros_initializer())\n    logits = tf.matmul(input_tensor, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n\n    label_ids = tf.reshape(label_ids, [-1])\n    label_weights = tf.reshape(label_weights, [-1])\n\n    one_hot_labels = tf.one_hot(\n        label_ids, depth=albert_config.vocab_size, dtype=tf.float32)\n\n    # The `positions` tensor might be zero-padded (if the sequence is too\n    # short to have the maximum number of predictions). 
The `label_weights`\n    # tensor has a value of 1.0 for every real prediction and 0.0 for the\n    # padding predictions.\n    per_example_loss = -tf.reduce_sum(log_probs * one_hot_labels, axis=[-1])\n    numerator = tf.reduce_sum(label_weights * per_example_loss)\n    denominator = tf.reduce_sum(label_weights) + 1e-5\n    loss = numerator / denominator\n\n  return (loss, per_example_loss, log_probs)\n\n\ndef get_sentence_order_output(albert_config, input_tensor, labels):\n  \"\"\"Get loss and log probs for the sentence order prediction (SOP).\"\"\"\n\n  # Simple binary classification. Note that 0 is \"next sentence\" and 1 is\n  # \"random sentence\". This weight matrix is not used after pre-training.\n  with tf.variable_scope(\"cls/seq_relationship\"):\n    output_weights = tf.get_variable(\n        \"output_weights\",\n        shape=[2, albert_config.hidden_size],\n        initializer=modeling.create_initializer(\n            albert_config.initializer_range))\n    output_bias = tf.get_variable(\n        \"output_bias\", shape=[2], initializer=tf.zeros_initializer())\n\n    logits = tf.matmul(input_tensor, output_weights, transpose_b=True)\n    logits = tf.nn.bias_add(logits, output_bias)\n    log_probs = tf.nn.log_softmax(logits, axis=-1)\n    labels = tf.reshape(labels, [-1])\n    one_hot_labels = tf.one_hot(labels, depth=2, dtype=tf.float32)\n    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)\n    loss = tf.reduce_mean(per_example_loss)\n    return (loss, per_example_loss, log_probs)\n\n\ndef gather_indexes(sequence_tensor, positions):\n  \"\"\"Gathers the vectors at the specific positions over a minibatch.\"\"\"\n  sequence_shape = modeling.get_shape_list(sequence_tensor, expected_rank=3)\n  batch_size = sequence_shape[0]\n  seq_length = sequence_shape[1]\n  width = sequence_shape[2]\n\n  flat_offsets = tf.reshape(\n      tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1])\n  flat_positions = tf.reshape(positions + flat_offsets, 
[-1])\n  flat_sequence_tensor = tf.reshape(sequence_tensor,\n                                    [batch_size * seq_length, width])\n  output_tensor = tf.gather(flat_sequence_tensor, flat_positions)\n  return output_tensor\n\n\ndef input_fn_builder(input_files,\n                     max_seq_length,\n                     max_predictions_per_seq,\n                     is_training,\n                     num_cpu_threads=4):\n  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n\n  def input_fn(params):\n    \"\"\"The actual input function.\"\"\"\n    batch_size = params[\"batch_size\"]\n\n    name_to_features = {\n        \"input_ids\": tf.FixedLenFeature([max_seq_length], tf.int64),\n        \"input_mask\": tf.FixedLenFeature([max_seq_length], tf.int64),\n        \"segment_ids\": tf.FixedLenFeature([max_seq_length], tf.int64),\n        # Note: We keep this feature name `next_sentence_labels` to be\n        # compatible with the original data created by lanzhzh@. However, in\n        # the ALBERT case it does represent sentence_order_labels.\n        \"next_sentence_labels\": tf.FixedLenFeature([1], tf.int64),\n    }\n\n    if FLAGS.masked_lm_budget:\n      name_to_features.update({\n          \"token_boundary\":\n              tf.FixedLenFeature([max_seq_length], tf.int64)})\n    else:\n      name_to_features.update({\n          \"masked_lm_positions\":\n              tf.FixedLenFeature([max_predictions_per_seq], tf.int64),\n          \"masked_lm_ids\":\n              tf.FixedLenFeature([max_predictions_per_seq], tf.int64),\n          \"masked_lm_weights\":\n              tf.FixedLenFeature([max_predictions_per_seq], tf.float32)})\n\n    # For training, we want a lot of parallel reading and shuffling.\n    # For eval, we want no shuffling and parallel reading doesn't matter.\n    if is_training:\n      d = tf.data.Dataset.from_tensor_slices(tf.constant(input_files))\n      d = d.repeat()\n      d = d.shuffle(buffer_size=len(input_files))\n\n      
# `cycle_length` is the number of parallel files that get read.\n      cycle_length = min(num_cpu_threads, len(input_files))\n\n      # `sloppy` mode means that the interleaving is not exact. This adds\n      # even more randomness to the training pipeline.\n      d = d.apply(\n          tf.contrib.data.parallel_interleave(\n              tf.data.TFRecordDataset,\n              sloppy=is_training,\n              cycle_length=cycle_length))\n      d = d.shuffle(buffer_size=100)\n    else:\n      d = tf.data.TFRecordDataset(input_files)\n      # Since we evaluate for a fixed number of steps we don't want to encounter\n      # out-of-range exceptions.\n      d = d.repeat()\n\n    # We must `drop_remainder` on training because the TPU requires fixed\n    # size dimensions. For eval, we assume we are evaluating on the CPU or GPU\n    # and we *don't* want to drop the remainder, otherwise we wont cover\n    # every sample.\n    d = d.apply(\n        tf.data.experimental.map_and_batch_with_legacy_function(\n            lambda record: _decode_record(record, name_to_features),\n            batch_size=batch_size,\n            num_parallel_batches=num_cpu_threads,\n            drop_remainder=True))\n    tf.logging.info(d)\n    return d\n\n  return input_fn\n\n\ndef _decode_record(record, name_to_features):\n  \"\"\"Decodes a record to a TensorFlow example.\"\"\"\n  example = tf.parse_single_example(record, name_to_features)\n\n  # tf.Example only supports tf.int64, but the TPU only supports tf.int32.\n  # So cast all int64 to int32.\n  for name in list(example.keys()):\n    t = example[name]\n    if t.dtype == tf.int64:\n      t = tf.to_int32(t)\n    example[name] = t\n\n  return example\n\n\ndef main(_):\n  tf.logging.set_verbosity(tf.logging.INFO)\n\n  if not FLAGS.do_train and not FLAGS.do_eval:\n    raise ValueError(\"At least one of `do_train` or `do_eval` must be True.\")\n\n  albert_config = modeling.AlbertConfig.from_json_file(FLAGS.albert_config_file)\n\n  
tf.gfile.MakeDirs(FLAGS.output_dir)\n\n  input_files = []\n  for input_pattern in FLAGS.input_file.split(\",\"):\n    input_files.extend(tf.gfile.Glob(input_pattern))\n\n  tf.logging.info(\"*** Input Files ***\")\n  for input_file in input_files:\n    tf.logging.info(\"  %s\" % input_file)\n\n  tpu_cluster_resolver = None\n  if FLAGS.use_tpu and FLAGS.tpu_name:\n    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(\n        FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)\n\n  is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n  run_config = tf.contrib.tpu.RunConfig(\n      cluster=tpu_cluster_resolver,\n      master=FLAGS.master,\n      model_dir=FLAGS.output_dir,\n      save_checkpoints_steps=FLAGS.save_checkpoints_steps,\n      tpu_config=tf.contrib.tpu.TPUConfig(\n          iterations_per_loop=FLAGS.iterations_per_loop,\n          num_shards=FLAGS.num_tpu_cores,\n          per_host_input_for_training=is_per_host))\n\n  model_fn = model_fn_builder(\n      albert_config=albert_config,\n      init_checkpoint=FLAGS.init_checkpoint,\n      learning_rate=FLAGS.learning_rate,\n      num_train_steps=FLAGS.num_train_steps,\n      num_warmup_steps=FLAGS.num_warmup_steps,\n      use_tpu=FLAGS.use_tpu,\n      use_one_hot_embeddings=FLAGS.use_tpu,\n      optimizer=FLAGS.optimizer,\n      poly_power=FLAGS.poly_power,\n      start_warmup_step=FLAGS.start_warmup_step)\n\n  # If TPU is not available, this will fall back to normal Estimator on CPU\n  # or GPU.\n  estimator = tf.contrib.tpu.TPUEstimator(\n      use_tpu=FLAGS.use_tpu,\n      model_fn=model_fn,\n      config=run_config,\n      train_batch_size=FLAGS.train_batch_size,\n      eval_batch_size=FLAGS.eval_batch_size)\n\n  if FLAGS.do_train:\n    tf.logging.info(\"***** Running training *****\")\n    tf.logging.info(\"  Batch size = %d\", FLAGS.train_batch_size)\n    train_input_fn = input_fn_builder(\n        input_files=input_files,\n        
max_seq_length=FLAGS.max_seq_length,\n        max_predictions_per_seq=FLAGS.max_predictions_per_seq,\n        is_training=True)\n    estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)\n\n  if FLAGS.do_eval:\n    tf.logging.info(\"***** Running evaluation *****\")\n    tf.logging.info(\"  Batch size = %d\", FLAGS.eval_batch_size)\n    global_step = -1\n    output_eval_file = os.path.join(FLAGS.output_dir, \"eval_results.txt\")\n    writer = tf.gfile.GFile(output_eval_file, \"w\")\n    tf.gfile.MakeDirs(FLAGS.export_dir)\n    eval_input_fn = input_fn_builder(\n        input_files=input_files,\n        max_seq_length=FLAGS.max_seq_length,\n        max_predictions_per_seq=FLAGS.max_predictions_per_seq,\n        is_training=False)\n    while global_step < FLAGS.num_train_steps:\n      if estimator.latest_checkpoint() is None:\n        tf.logging.info(\"No checkpoint found yet. Sleeping.\")\n        time.sleep(1)\n      else:\n        result = estimator.evaluate(\n            input_fn=eval_input_fn, steps=FLAGS.max_eval_steps)\n        global_step = result[\"global_step\"]\n        tf.logging.info(\"***** Eval results *****\")\n        for key in sorted(result.keys()):\n          tf.logging.info(\"  %s = %s\", key, str(result[key]))\n          writer.write(\"%s = %s\\n\" % (key, str(result[key])))\n    # Close the writer so that buffered results are flushed to eval_results.txt.\n    writer.close()\n\nif __name__ == \"__main__\":\n  flags.mark_flag_as_required(\"input_file\")\n  flags.mark_flag_as_required(\"albert_config_file\")\n  flags.mark_flag_as_required(\"output_dir\")\n  tf.app.run()"
  },
  {
    "path": "similarity.py",
    "content": "\"\"\"\r\n进行文本相似度预测的示例。可以直接运行进行预测。\r\n参考了项目：https://github.com/chdd/bert-utils\r\n\r\n\"\"\"\r\n\r\n\r\nimport tensorflow as tf\r\nimport args\r\nimport tokenization\r\nimport modeling\r\nfrom run_classifier import InputFeatures, InputExample, DataProcessor, create_model, convert_examples_to_features\r\n\r\n\r\n# os.environ['CUDA_VISIBLE_DEVICES'] = '1'\r\n\r\n\r\nclass SimProcessor(DataProcessor):\r\n    def get_sentence_examples(self, questions):\r\n        examples = []\r\n        for index, data in enumerate(questions):\r\n            guid = 'test-%d' % index\r\n            text_a = tokenization.convert_to_unicode(str(data[0]))\r\n            text_b = tokenization.convert_to_unicode(str(data[1]))\r\n            label = str(0)\r\n            examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))\r\n        return examples\r\n\r\n    def get_labels(self):\r\n        return ['0', '1']\r\n\r\n\r\n\"\"\"\r\n模型类，负责载入checkpoint初始化模型\r\n\"\"\"\r\nclass BertSim:\r\n    def __init__(self, batch_size=args.batch_size):\r\n        self.mode = None\r\n        self.max_seq_length = args.max_seq_len\r\n        self.tokenizer = tokenization.FullTokenizer(vocab_file=args.vocab_file, do_lower_case=True)\r\n        self.batch_size = batch_size\r\n        self.estimator = None\r\n        self.processor = SimProcessor()\r\n        tf.logging.set_verbosity(tf.logging.INFO)\r\n\r\n\r\n\r\n    #载入estimator,构造模型\r\n    def start_model(self):\r\n        self.estimator = self.get_estimator()\r\n\r\n\r\n    def model_fn_builder(self, bert_config, num_labels, init_checkpoint, learning_rate,\r\n                         num_train_steps, num_warmup_steps,\r\n                         use_one_hot_embeddings):\r\n        \"\"\"Returns `model_fn` closurimport_tfe for TPUEstimator.\"\"\"\r\n\r\n        def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\r\n            from tensorflow.python.estimator.model_fn import 
EstimatorSpec\r\n\r\n            tf.logging.info(\"*** Features ***\")\r\n            for name in sorted(features.keys()):\r\n                tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\r\n\r\n            input_ids = features[\"input_ids\"]\r\n            input_mask = features[\"input_mask\"]\r\n            segment_ids = features[\"segment_ids\"]\r\n            label_ids = features[\"label_ids\"]\r\n\r\n            is_training = (mode == tf.estimator.ModeKeys.TRAIN)\r\n\r\n            (total_loss, per_example_loss, logits, probabilities) = create_model(\r\n                bert_config, is_training, input_ids, input_mask, segment_ids, label_ids,\r\n                num_labels, use_one_hot_embeddings)\r\n\r\n            tvars = tf.trainable_variables()\r\n            initialized_variable_names = {}\r\n\r\n            if init_checkpoint:\r\n                (assignment_map, initialized_variable_names) \\\r\n                    = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)\r\n                tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\r\n\r\n            tf.logging.info(\"**** Trainable Variables ****\")\r\n            for var in tvars:\r\n                init_string = \"\"\r\n                if var.name in initialized_variable_names:\r\n                    init_string = \", *INIT_FROM_CKPT*\"\r\n                tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\r\n                                init_string)\r\n            output_spec = EstimatorSpec(mode=mode, predictions=probabilities)\r\n\r\n            return output_spec\r\n\r\n        return model_fn\r\n\r\n    def get_estimator(self):\r\n\r\n        from tensorflow.python.estimator.estimator import Estimator\r\n        from tensorflow.python.estimator.run_config import RunConfig\r\n\r\n        bert_config = modeling.BertConfig.from_json_file(args.config_name)\r\n        label_list = self.processor.get_labels()\r\n        if 
self.mode == tf.estimator.ModeKeys.TRAIN:\r\n            init_checkpoint = args.ckpt_name\r\n        else:\r\n            init_checkpoint = args.output_dir\r\n\r\n        model_fn = self.model_fn_builder(\r\n            bert_config=bert_config,\r\n            num_labels=len(label_list),\r\n            init_checkpoint=init_checkpoint,\r\n            learning_rate=args.learning_rate,\r\n            num_train_steps=None,\r\n            num_warmup_steps=None,\r\n            use_one_hot_embeddings=False)\r\n\r\n        config = tf.ConfigProto()\r\n        config.gpu_options.allow_growth = True\r\n        config.gpu_options.per_process_gpu_memory_fraction = args.gpu_memory_fraction\r\n        config.log_device_placement = False\r\n\r\n        return Estimator(model_fn=model_fn, config=RunConfig(session_config=config), model_dir=args.output_dir,\r\n                         params={'batch_size': self.batch_size})\r\n\r\n    def predict_sentences(self, sentences):\r\n        results = self.estimator.predict(input_fn=input_fn_builder(self, sentences), yield_single_examples=False)\r\n        # Print the prediction results.\r\n        for i in results:\r\n            print(i)\r\n\r\n    def _truncate_seq_pair(self, tokens_a, tokens_b, max_length):\r\n        \"\"\"Truncates a sequence pair in place to the maximum length.\"\"\"\r\n\r\n        # This is a simple heuristic which will always truncate the longer sequence\r\n        # one token at a time. 
This makes more sense than truncating an equal percent\r\n        # of tokens from each, since if one sequence is very short then each token\r\n        # that's truncated likely contains more information than a longer sequence.\r\n        while True:\r\n            total_length = len(tokens_a) + len(tokens_b)\r\n            if total_length <= max_length:\r\n                break\r\n            if len(tokens_a) > len(tokens_b):\r\n                tokens_a.pop()\r\n            else:\r\n                tokens_b.pop()\r\n\r\n    def convert_single_example(self, ex_index, example, label_list, max_seq_length, tokenizer):\r\n        \"\"\"Converts a single `InputExample` into a single `InputFeatures`.\"\"\"\r\n        label_map = {}\r\n        for (i, label) in enumerate(label_list):\r\n            label_map[label] = i\r\n\r\n        tokens_a = tokenizer.tokenize(example.text_a)\r\n        tokens_b = None\r\n        if example.text_b:\r\n            tokens_b = tokenizer.tokenize(example.text_b)\r\n\r\n        if tokens_b:\r\n            # Modifies `tokens_a` and `tokens_b` in place so that the total\r\n            # length is less than the specified length.\r\n            # Account for [CLS], [SEP], [SEP] with \"- 3\"\r\n            self._truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)\r\n        else:\r\n            # Account for [CLS] and [SEP] with \"- 2\"\r\n            if len(tokens_a) > max_seq_length - 2:\r\n                tokens_a = tokens_a[0:(max_seq_length - 2)]\r\n\r\n        # The convention in BERT is:\r\n        # (a) For sequence pairs:\r\n        #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]\r\n        #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1\r\n        # (b) For single sequences:\r\n        #  tokens:   [CLS] the dog is hairy . 
[SEP]\r\n        #  type_ids: 0     0   0   0  0     0 0\r\n        #\r\n        # Where \"type_ids\" are used to indicate whether this is the first\r\n        # sequence or the second sequence. The embedding vectors for `type=0` and\r\n        # `type=1` were learned during pre-training and are added to the wordpiece\r\n        # embedding vector (and position vector). This is not *strictly* necessary\r\n        # since the [SEP] token unambiguously separates the sequences, but it makes\r\n        # it easier for the model to learn the concept of sequences.\r\n        #\r\n        # For classification tasks, the first vector (corresponding to [CLS]) is\r\n        # used as the \"sentence vector\". Note that this only makes sense because\r\n        # the entire model is fine-tuned.\r\n        tokens = []\r\n        segment_ids = []\r\n        tokens.append(\"[CLS]\")\r\n        segment_ids.append(0)\r\n        for token in tokens_a:\r\n            tokens.append(token)\r\n            segment_ids.append(0)\r\n        tokens.append(\"[SEP]\")\r\n        segment_ids.append(0)\r\n\r\n        if tokens_b:\r\n            for token in tokens_b:\r\n                tokens.append(token)\r\n                segment_ids.append(1)\r\n            tokens.append(\"[SEP]\")\r\n            segment_ids.append(1)\r\n\r\n        input_ids = tokenizer.convert_tokens_to_ids(tokens)\r\n\r\n        # The mask has 1 for real tokens and 0 for padding tokens. 
Only real\r\n        # tokens are attended to.\r\n        input_mask = [1] * len(input_ids)\r\n\r\n        # Zero-pad up to the sequence length.\r\n        while len(input_ids) < max_seq_length:\r\n            input_ids.append(0)\r\n            input_mask.append(0)\r\n            segment_ids.append(0)\r\n\r\n        assert len(input_ids) == max_seq_length\r\n        assert len(input_mask) == max_seq_length\r\n        assert len(segment_ids) == max_seq_length\r\n\r\n        label_id = label_map[example.label]\r\n        if ex_index < 5:\r\n            tf.logging.info(\"*** Example ***\")\r\n            tf.logging.info(\"guid: %s\" % (example.guid))\r\n            tf.logging.info(\"tokens: %s\" % \" \".join(\r\n                [tokenization.printable_text(x) for x in tokens]))\r\n            tf.logging.info(\"input_ids: %s\" % \" \".join([str(x) for x in input_ids]))\r\n            tf.logging.info(\"input_mask: %s\" % \" \".join([str(x) for x in input_mask]))\r\n            tf.logging.info(\"segment_ids: %s\" % \" \".join([str(x) for x in segment_ids]))\r\n            tf.logging.info(\"label: %s (id = %d)\" % (example.label, label_id))\r\n\r\n        feature = InputFeatures(\r\n            input_ids=input_ids,\r\n            input_mask=input_mask,\r\n            segment_ids=segment_ids,\r\n            label_id=label_id)\r\n        return feature\r\n\r\n\r\n\r\n\r\ndef input_fn_builder(bertSim,sentences):\r\n    def predict_input_fn():\r\n        return (tf.data.Dataset.from_generator(\r\n            generate_from_input,\r\n            output_types={\r\n                'input_ids': tf.int32,\r\n                'input_mask': tf.int32,\r\n                'segment_ids': tf.int32,\r\n                'label_ids': tf.int32},\r\n            output_shapes={\r\n                'input_ids': (None, bertSim.max_seq_length),\r\n                'input_mask': (None, bertSim.max_seq_length),\r\n                'segment_ids': (None, bertSim.max_seq_length),\r\n                
'label_ids': (1,)}).prefetch(10))\r\n\r\n    def generate_from_input():\r\n        processor = bertSim.processor\r\n        predict_examples = processor.get_sentence_examples(sentences)\r\n        features = convert_examples_to_features(predict_examples, processor.get_labels(), args.max_seq_len,\r\n                                                bertSim.tokenizer)\r\n        yield {\r\n            'input_ids': [f.input_ids for f in features],\r\n            'input_mask': [f.input_mask for f in features],\r\n            'segment_ids': [f.segment_ids for f in features],\r\n            'label_ids': [f.label_id for f in features]\r\n        }\r\n\r\n    return predict_input_fn\r\n\r\n\r\nif __name__ == '__main__':\r\n    sim = BertSim()\r\n    sim.start_model()\r\n    sim.predict_sentences([(\"我喜欢妈妈做的汤\", \"妈妈做的汤我很喜欢喝\")])\r\n"
  },
  {
    "path": "test_changes.py",
    "content": "# coding=utf-8\nimport tensorflow as tf\nfrom modeling import embedding_lookup_factorized,transformer_model\nimport os\n\n\"\"\"\n测试albert主要的改进点：词嵌入的因式分解、层间参数共享、段落间连贯性\ntest main change of albert from bert\n\"\"\"\nbatch_size = 2048\nsequence_length = 512\nvocab_size = 30000\nhidden_size = 1024\nnum_attention_heads = int(hidden_size / 64)\n\ndef get_total_parameters():\n    \"\"\"\n    get total parameters of a graph\n    :return:\n    \"\"\"\n    total_parameters = 0\n    for variable in tf.trainable_variables():\n        # shape is an array of tf.Dimension\n        shape = variable.get_shape()\n        # print(shape)\n        # print(len(shape))\n        variable_parameters = 1\n        for dim in shape:\n            # print(dim)\n            variable_parameters *= dim.value\n        # print(variable_parameters)\n        total_parameters += variable_parameters\n    return total_parameters\n\ndef test_factorized_embedding():\n    \"\"\"\n    test of Factorized embedding parameterization\n    :return:\n    \"\"\"\n    input_ids=tf.zeros((batch_size, sequence_length),dtype=tf.int32)\n    output, embedding_table, embedding_table_2=embedding_lookup_factorized(input_ids,vocab_size,hidden_size)\n    print(\"output:\",output)\n\ndef test_share_parameters():\n    \"\"\"\n    test of share parameters across all layers: how many parameter after share parameter across layers of transformer.\n    :return:\n    \"\"\"\n    def total_parameters_transformer(share_parameter_across_layers):\n        input_tensor=tf.zeros((batch_size, sequence_length, hidden_size),dtype=tf.float32)\n        print(\"transformer_model. 
input:\",input_tensor)\n        transformer_result=transformer_model(input_tensor,hidden_size=hidden_size,num_attention_heads=num_attention_heads,share_parameter_across_layers=share_parameter_across_layers)\n        print(\"transformer_result:\",transformer_result)\n        total_parameters=get_total_parameters()\n        # Label the count with the sharing mode actually used for this graph.\n        print('total_parameters(share=%s):' % share_parameter_across_layers, total_parameters)\n\n    share_parameter_across_layers=False\n    total_parameters_transformer(share_parameter_across_layers) # total parameters without sharing: 125,976,576 (about 126 million)\n\n    tf.reset_default_graph() # Clears the default graph stack and resets the global default graph\n    share_parameter_across_layers=True\n    total_parameters_transformer(share_parameter_across_layers) # total parameters with sharing: 10,498,048 (about 10.5 million)\n\ndef test_sentence_order_prediction():\n    \"\"\"\n    sentence order prediction.\n\n    check method of create_instances_from_document_albert from create_pretraining_data.py\n\n    :return:\n    \"\"\"\n    # Make the script executable.\n    os.system(\"chmod +x create_pretrain_data.sh\")\n\n    os.system(\"./create_pretrain_data.sh\")\n\n\n# 1. test of factorized embedding parameterization\n#test_factorized_embedding()\n\n# 2. test of sharing parameters across all layers: how many parameters remain after sharing across transformer layers.\n# before sharing: 125,976,576; after sharing: 10,498,048\n#test_share_parameters()\n\n# 3. test of sentence order prediction (SOP)\ntest_sentence_order_prediction()\n\n"
  },
  {
    "path": "tokenization.py",
    "content": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"Tokenization classes.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport re\nimport unicodedata\nimport six\nimport tensorflow as tf\n\n\ndef validate_case_matches_checkpoint(do_lower_case, init_checkpoint):\n  \"\"\"Checks whether the casing config is consistent with the checkpoint name.\"\"\"\n\n  # The casing has to be passed in by the user and there is no explicit check\n  # as to whether it matches the checkpoint. 
The casing information probably\n  # should have been stored in the bert_config.json file, but it's not, so\n  # we have to heuristically detect it to validate.\n\n  if not init_checkpoint:\n    return\n\n  m = re.match(\"^.*?([A-Za-z0-9_-]+)/bert_model.ckpt\", init_checkpoint)\n  if m is None:\n    return\n\n  model_name = m.group(1)\n\n  lower_models = [\n      \"uncased_L-24_H-1024_A-16\", \"uncased_L-12_H-768_A-12\",\n      \"multilingual_L-12_H-768_A-12\", \"chinese_L-12_H-768_A-12\"\n  ]\n\n  cased_models = [\n      \"cased_L-12_H-768_A-12\", \"cased_L-24_H-1024_A-16\",\n      \"multi_cased_L-12_H-768_A-12\"\n  ]\n\n  is_bad_config = False\n  if model_name in lower_models and not do_lower_case:\n    is_bad_config = True\n    actual_flag = \"False\"\n    case_name = \"lowercased\"\n    opposite_flag = \"True\"\n\n  if model_name in cased_models and do_lower_case:\n    is_bad_config = True\n    actual_flag = \"True\"\n    case_name = \"cased\"\n    opposite_flag = \"False\"\n\n  if is_bad_config:\n    raise ValueError(\n        \"You passed in `--do_lower_case=%s` with `--init_checkpoint=%s`. \"\n        \"However, `%s` seems to be a %s model, so you \"\n        \"should pass in `--do_lower_case=%s` so that the fine-tuning matches \"\n        \"how the model was pre-training. 
If this error is wrong, please \"\n        \"just comment out this check.\" % (actual_flag, init_checkpoint,\n                                          model_name, case_name, opposite_flag))\n\n\ndef convert_to_unicode(text):\n  \"\"\"Converts `text` to Unicode (if it's not already), assuming utf-8 input.\"\"\"\n  if six.PY3:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, bytes):\n      return text.decode(\"utf-8\", \"ignore\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  elif six.PY2:\n    if isinstance(text, str):\n      return text.decode(\"utf-8\", \"ignore\")\n    elif isinstance(text, unicode):\n      return text\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  else:\n    raise ValueError(\"Not running on Python2 or Python 3?\")\n\n\ndef printable_text(text):\n  \"\"\"Returns text encoded in a way suitable for print or `tf.logging`.\"\"\"\n\n  # These functions want `str` for both Python2 and Python3, but in one case\n  # it's a Unicode string and in the other it's a byte string.\n  if six.PY3:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, bytes):\n      return text.decode(\"utf-8\", \"ignore\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  elif six.PY2:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, unicode):\n      return text.encode(\"utf-8\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  else:\n    raise ValueError(\"Not running on Python2 or Python 3?\")\n\n\ndef load_vocab(vocab_file):\n  \"\"\"Loads a vocabulary file into a dictionary.\"\"\"\n  vocab = collections.OrderedDict()\n  index = 0\n  with tf.gfile.GFile(vocab_file, \"r\") as reader:\n    while True:\n      token = convert_to_unicode(reader.readline())\n      if not token:\n        break\n      token = token.strip()\n      vocab[token] = index\n     
 index += 1\n  return vocab\n\n\ndef convert_by_vocab(vocab, items):\n  \"\"\"Converts a sequence of [tokens|ids] using the vocab.\"\"\"\n  output = []\n  #print(\"items:\",items) #['[CLS]', '日', '##期', '，', '但', '被', '##告', '金', '##东', '##福', '载', '##明', '[MASK]', 'U', '##N', '##K', ']', '保', '##证', '本', '##月', '1', '##4', '[MASK]', '到', '##位', '，', '2', '##0', '##1', '##5', '年', '6', '[MASK]', '1', '##1', '日', '[', 'U', '##N', '##K', ']', '，', '原', '##告', '[MASK]', '认', '##可', '于', '2', '##0', '##1', '##5', '[MASK]', '6', '月', '[MASK]', '[MASK]', '日', '##向', '被', '##告', '主', '##张', '权', '##利', '。', '而', '[MASK]', '[MASK]', '自', '[MASK]', '[MASK]', '[MASK]', '[MASK]', '年', '6', '月', '1', '##1', '日', '[SEP]', '原', '##告', '于', '2', '##0', '##1', '##6', '[MASK]', '6', '[MASK]', '2', '##4', '日', '起', '##诉', '，', '主', '##张', '保', '##证', '责', '##任', '，', '已', '超', '##过', '保', '##证', '期', '##限', '[MASK]', '保', '##证', '人', '依', '##法', '不', '##再', '承', '##担', '保', '##证', '[MASK]', '[MASK]', '[MASK]', '[SEP]']\n  for i,item in enumerate(items):\n    #print(i,\"item:\",item) #  ##期\n    output.append(vocab[item])\n  return output\n\n\ndef convert_tokens_to_ids(vocab, tokens):\n  return convert_by_vocab(vocab, tokens)\n\n\ndef convert_ids_to_tokens(inv_vocab, ids):\n  return convert_by_vocab(inv_vocab, ids)\n\n\ndef whitespace_tokenize(text):\n  \"\"\"Runs basic whitespace cleaning and splitting on a piece of text.\"\"\"\n  text = text.strip()\n  if not text:\n    return []\n  tokens = text.split()\n  return tokens\n\n\nclass FullTokenizer(object):\n  \"\"\"Runs end-to-end tokenziation.\"\"\"\n\n  def __init__(self, vocab_file, do_lower_case=True):\n    self.vocab = load_vocab(vocab_file)\n    self.inv_vocab = {v: k for k, v in self.vocab.items()}\n    self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)\n    self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)\n\n  def tokenize(self, text):\n    split_tokens = []\n    for token in 
self.basic_tokenizer.tokenize(text):\n      for sub_token in self.wordpiece_tokenizer.tokenize(token):\n        split_tokens.append(sub_token)\n\n    return split_tokens\n\n  def convert_tokens_to_ids(self, tokens):\n    return convert_by_vocab(self.vocab, tokens)\n\n  def convert_ids_to_tokens(self, ids):\n    return convert_by_vocab(self.inv_vocab, ids)\n\n\nclass BasicTokenizer(object):\n  \"\"\"Runs basic tokenization (punctuation splitting, lower casing, etc.).\"\"\"\n\n  def __init__(self, do_lower_case=True):\n    \"\"\"Constructs a BasicTokenizer.\n\n    Args:\n      do_lower_case: Whether to lower case the input.\n    \"\"\"\n    self.do_lower_case = do_lower_case\n\n  def tokenize(self, text):\n    \"\"\"Tokenizes a piece of text.\"\"\"\n    text = convert_to_unicode(text)\n    text = self._clean_text(text)\n\n    # This was added on November 1st, 2018 for the multilingual and Chinese\n    # models. This is also applied to the English models now, but it doesn't\n    # matter since the English models were not trained on any Chinese data\n    # and generally don't have any Chinese data in them (there are Chinese\n    # characters in the vocabulary because Wikipedia does have some Chinese\n    # words in the English Wikipedia.).\n    text = self._tokenize_chinese_chars(text)\n\n    orig_tokens = whitespace_tokenize(text)\n    split_tokens = []\n    for token in orig_tokens:\n      if self.do_lower_case:\n        token = token.lower()\n        token = self._run_strip_accents(token)\n      split_tokens.extend(self._run_split_on_punc(token))\n\n    output_tokens = whitespace_tokenize(\" \".join(split_tokens))\n    return output_tokens\n\n  def _run_strip_accents(self, text):\n    \"\"\"Strips accents from a piece of text.\"\"\"\n    text = unicodedata.normalize(\"NFD\", text)\n    output = []\n    for char in text:\n      cat = unicodedata.category(char)\n      if cat == \"Mn\":\n        continue\n      output.append(char)\n    return \"\".join(output)\n\n  def 
_run_split_on_punc(self, text):\n    \"\"\"Splits punctuation on a piece of text.\"\"\"\n    chars = list(text)\n    i = 0\n    start_new_word = True\n    output = []\n    while i < len(chars):\n      char = chars[i]\n      if _is_punctuation(char):\n        output.append([char])\n        start_new_word = True\n      else:\n        if start_new_word:\n          output.append([])\n        start_new_word = False\n        output[-1].append(char)\n      i += 1\n\n    return [\"\".join(x) for x in output]\n\n  def _tokenize_chinese_chars(self, text):\n    \"\"\"Adds whitespace around any CJK character.\"\"\"\n    output = []\n    for char in text:\n      cp = ord(char)\n      if self._is_chinese_char(cp):\n        output.append(\" \")\n        output.append(char)\n        output.append(\" \")\n      else:\n        output.append(char)\n    return \"\".join(output)\n\n  def _is_chinese_char(self, cp):\n    \"\"\"Checks whether CP is the codepoint of a CJK character.\"\"\"\n    # This defines a \"chinese character\" as anything in the CJK Unicode block:\n    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)\n    #\n    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,\n    # despite its name. The modern Korean Hangul alphabet is a different block,\n    # as is Japanese Hiragana and Katakana. 
Those alphabets are used to write\n    # space-separated words, so they are not treated specially and handled\n    # like all of the other languages.\n    if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #\n        (cp >= 0x3400 and cp <= 0x4DBF) or  #\n        (cp >= 0x20000 and cp <= 0x2A6DF) or  #\n        (cp >= 0x2A700 and cp <= 0x2B73F) or  #\n        (cp >= 0x2B740 and cp <= 0x2B81F) or  #\n        (cp >= 0x2B820 and cp <= 0x2CEAF) or\n        (cp >= 0xF900 and cp <= 0xFAFF) or  #\n        (cp >= 0x2F800 and cp <= 0x2FA1F)):  #\n      return True\n\n    return False\n\n  def _clean_text(self, text):\n    \"\"\"Performs invalid character removal and whitespace cleanup on text.\"\"\"\n    output = []\n    for char in text:\n      cp = ord(char)\n      if cp == 0 or cp == 0xfffd or _is_control(char):\n        continue\n      if _is_whitespace(char):\n        output.append(\" \")\n      else:\n        output.append(char)\n    return \"\".join(output)\n\n\nclass WordpieceTokenizer(object):\n  \"\"\"Runs WordPiece tokenization.\"\"\"\n\n  def __init__(self, vocab, unk_token=\"[UNK]\", max_input_chars_per_word=200):\n    self.vocab = vocab\n    self.unk_token = unk_token\n    self.max_input_chars_per_word = max_input_chars_per_word\n\n  def tokenize(self, text):\n    \"\"\"Tokenizes a piece of text into its word pieces.\n\n    This uses a greedy longest-match-first algorithm to perform tokenization\n    using the given vocabulary.\n\n    For example:\n      input = \"unaffable\"\n      output = [\"un\", \"##aff\", \"##able\"]\n\n    Args:\n      text: A single token or whitespace separated tokens. 
This should have\n        already been passed through `BasicTokenizer`.\n\n    Returns:\n      A list of wordpiece tokens.\n    \"\"\"\n\n    text = convert_to_unicode(text)\n\n    output_tokens = []\n    for token in whitespace_tokenize(text):\n      chars = list(token)\n      if len(chars) > self.max_input_chars_per_word:\n        output_tokens.append(self.unk_token)\n        continue\n\n      is_bad = False\n      start = 0\n      sub_tokens = []\n      while start < len(chars):\n        end = len(chars)\n        cur_substr = None\n        while start < end:\n          substr = \"\".join(chars[start:end])\n          if start > 0:\n            substr = \"##\" + substr\n          if substr in self.vocab:\n            cur_substr = substr\n            break\n          end -= 1\n        if cur_substr is None:\n          is_bad = True\n          break\n        sub_tokens.append(cur_substr)\n        start = end\n\n      if is_bad:\n        output_tokens.append(self.unk_token)\n      else:\n        output_tokens.extend(sub_tokens)\n    return output_tokens\n\n\ndef _is_whitespace(char):\n  \"\"\"Checks whether `char` is a whitespace character.\"\"\"\n  # \\t, \\n, and \\r are technically control characters but we treat them\n  # as whitespace since they are generally considered as such.\n  if char == \" \" or char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n    return True\n  cat = unicodedata.category(char)\n  if cat == \"Zs\":\n    return True\n  return False\n\n\ndef _is_control(char):\n  \"\"\"Checks whether `char` is a control character.\"\"\"\n  # These are technically control characters but we count them as whitespace\n  # characters.\n  if char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n    return False\n  cat = unicodedata.category(char)\n  if cat in (\"Cc\", \"Cf\"):\n    return True\n  return False\n\n\ndef _is_punctuation(char):\n  \"\"\"Checks whether `char` is a punctuation character.\"\"\"\n  cp = ord(char)\n  # We treat all 
non-letter/number ASCII as punctuation.\n  # Characters such as \"^\", \"$\", and \"`\" are not in the Unicode\n  # Punctuation class but we treat them as punctuation anyways, for\n  # consistency.\n  if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or\n      (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):\n    return True\n  cat = unicodedata.category(char)\n  if cat.startswith(\"P\"):\n    return True\n  return False\n"
  },
  {
    "path": "tokenization_google.py",
    "content": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\n# Lint as: python2, python3\n# coding=utf-8\n\"\"\"Tokenization classes.\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport collections\nimport re\nimport unicodedata\nimport six\nfrom six.moves import range\nimport tensorflow as tf\nimport sentencepiece as spm\n\nSPIECE_UNDERLINE = u\"▁\".encode(\"utf-8\")\n\n\ndef validate_case_matches_checkpoint(do_lower_case, init_checkpoint):\n  \"\"\"Checks whether the casing config is consistent with the checkpoint name.\"\"\"\n\n  # The casing has to be passed in by the user and there is no explicit check\n  # as to whether it matches the checkpoint. 
The casing information probably\n  # should have been stored in the bert_config.json file, but it's not, so\n  # we have to heuristically detect it to validate.\n\n  if not init_checkpoint:\n    return\n\n  m = re.match(\"^.*?([A-Za-z0-9_-]+)/bert_model.ckpt\",\n               six.ensure_str(init_checkpoint))\n  if m is None:\n    return\n\n  model_name = m.group(1)\n\n  lower_models = [\n      \"uncased_L-24_H-1024_A-16\", \"uncased_L-12_H-768_A-12\",\n      \"multilingual_L-12_H-768_A-12\", \"chinese_L-12_H-768_A-12\"\n  ]\n\n  cased_models = [\n      \"cased_L-12_H-768_A-12\", \"cased_L-24_H-1024_A-16\",\n      \"multi_cased_L-12_H-768_A-12\"\n  ]\n\n  is_bad_config = False\n  if model_name in lower_models and not do_lower_case:\n    is_bad_config = True\n    actual_flag = \"False\"\n    case_name = \"lowercased\"\n    opposite_flag = \"True\"\n\n  if model_name in cased_models and do_lower_case:\n    is_bad_config = True\n    actual_flag = \"True\"\n    case_name = \"cased\"\n    opposite_flag = \"False\"\n\n  if is_bad_config:\n    raise ValueError(\n        \"You passed in `--do_lower_case=%s` with `--init_checkpoint=%s`. \"\n        \"However, `%s` seems to be a %s model, so you \"\n        \"should pass in `--do_lower_case=%s` so that the fine-tuning matches \"\n        \"how the model was pre-training. 
If this error is wrong, please \"\n        \"just comment out this check.\" % (actual_flag, init_checkpoint,\n                                          model_name, case_name, opposite_flag))\n\n\ndef preprocess_text(inputs, remove_space=True, lower=False):\n  \"\"\"preprocess data by removing extra space and normalize data.\"\"\"\n  outputs = inputs\n  if remove_space:\n    outputs = \" \".join(inputs.strip().split())\n\n  if six.PY2 and isinstance(outputs, str):\n    try:\n      outputs = six.ensure_text(outputs, \"utf-8\")\n    except UnicodeDecodeError:\n      outputs = six.ensure_text(outputs, \"latin-1\")\n\n  outputs = unicodedata.normalize(\"NFKD\", outputs)\n  outputs = \"\".join([c for c in outputs if not unicodedata.combining(c)])\n  if lower:\n    outputs = outputs.lower()\n\n  return outputs\n\n\ndef encode_pieces(sp_model, text, return_unicode=True, sample=False):\n  \"\"\"turn sentences into word pieces.\"\"\"\n\n  if six.PY2 and isinstance(text, six.text_type):\n    text = six.ensure_binary(text, \"utf-8\")\n\n  if not sample:\n    pieces = sp_model.EncodeAsPieces(text)\n  else:\n    pieces = sp_model.SampleEncodeAsPieces(text, 64, 0.1)\n  new_pieces = []\n  for piece in pieces:\n    piece = printable_text(piece)\n    if len(piece) > 1 and piece[-1] == \",\" and piece[-2].isdigit():\n      cur_pieces = sp_model.EncodeAsPieces(\n          six.ensure_binary(piece[:-1]).replace(SPIECE_UNDERLINE, b\"\"))\n      if piece[0] != SPIECE_UNDERLINE and cur_pieces[0][0] == SPIECE_UNDERLINE:\n        if len(cur_pieces[0]) == 1:\n          cur_pieces = cur_pieces[1:]\n        else:\n          cur_pieces[0] = cur_pieces[0][1:]\n      cur_pieces.append(piece[-1])\n      new_pieces.extend(cur_pieces)\n    else:\n      new_pieces.append(piece)\n\n  # note(zhiliny): convert back to unicode for py2\n  if six.PY2 and return_unicode:\n    ret_pieces = []\n    for piece in new_pieces:\n      if isinstance(piece, str):\n        piece = six.ensure_text(piece, \"utf-8\")\n   
   ret_pieces.append(piece)\n    new_pieces = ret_pieces\n\n  return new_pieces\n\n\ndef encode_ids(sp_model, text, sample=False):\n  pieces = encode_pieces(sp_model, text, return_unicode=False, sample=sample)\n  ids = [sp_model.PieceToId(piece) for piece in pieces]\n  return ids\n\n\ndef convert_to_unicode(text):\n  \"\"\"Converts `text` to Unicode (if it's not already), assuming utf-8 input.\"\"\"\n  if six.PY3:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, bytes):\n      return six.ensure_text(text, \"utf-8\", \"ignore\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  elif six.PY2:\n    if isinstance(text, str):\n      return six.ensure_text(text, \"utf-8\", \"ignore\")\n    elif isinstance(text, six.text_type):\n      return text\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  else:\n    raise ValueError(\"Not running on Python2 or Python 3?\")\n\n\ndef printable_text(text):\n  \"\"\"Returns text encoded in a way suitable for print or `tf.logging`.\"\"\"\n\n  # These functions want `str` for both Python2 and Python3, but in one case\n  # it's a Unicode string and in the other it's a byte string.\n  if six.PY3:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, bytes):\n      return six.ensure_text(text, \"utf-8\", \"ignore\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  elif six.PY2:\n    if isinstance(text, str):\n      return text\n    elif isinstance(text, six.text_type):\n      return six.ensure_binary(text, \"utf-8\")\n    else:\n      raise ValueError(\"Unsupported string type: %s\" % (type(text)))\n  else:\n    raise ValueError(\"Not running on Python2 or Python 3?\")\n\n\ndef load_vocab(vocab_file):\n  \"\"\"Loads a vocabulary file into a dictionary.\"\"\"\n  vocab = collections.OrderedDict()\n  with tf.gfile.GFile(vocab_file, \"r\") as reader:\n    while True:\n      token = 
convert_to_unicode(reader.readline())\n      if not token:\n        break\n      token = token.strip() # previous: token.strip().split()[0]\n      if token not in vocab:\n        vocab[token] = len(vocab)\n  return vocab\n\n\ndef convert_by_vocab(vocab, items):\n  \"\"\"Converts a sequence of [tokens|ids] using the vocab.\"\"\"\n  output = []\n  for item in items:\n    output.append(vocab[item])\n  return output\n\n\ndef convert_tokens_to_ids(vocab, tokens):\n  return convert_by_vocab(vocab, tokens)\n\n\ndef convert_ids_to_tokens(inv_vocab, ids):\n  return convert_by_vocab(inv_vocab, ids)\n\n\ndef whitespace_tokenize(text):\n  \"\"\"Runs basic whitespace cleaning and splitting on a piece of text.\"\"\"\n  text = text.strip()\n  if not text:\n    return []\n  tokens = text.split()\n  return tokens\n\n\nclass FullTokenizer(object):\n  \"\"\"Runs end-to-end tokenziation.\"\"\"\n\n  def __init__(self, vocab_file, do_lower_case=True, spm_model_file=None):\n    self.vocab = None\n    self.sp_model = None\n    print(\"spm_model_file:\",spm_model_file,\";vocab_file:\",vocab_file)\n    if spm_model_file:\n      print(\"#Use spm_model_file\")\n      self.sp_model = spm.SentencePieceProcessor()\n      tf.logging.info(\"loading sentence piece model\")\n      self.sp_model.Load(spm_model_file)\n      # Note(mingdachen): For the purpose of consisent API, we are\n      # generating a vocabulary for the sentence piece tokenizer.\n      self.vocab = {self.sp_model.IdToPiece(i): i for i\n                    in range(self.sp_model.GetPieceSize())}\n    else:\n      print(\"#Use vocab_file\")\n      self.vocab = load_vocab(vocab_file)\n      self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)\n      self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)\n    self.inv_vocab = {v: k for k, v in self.vocab.items()}\n\n  def tokenize(self, text):\n    if self.sp_model:\n      split_tokens = encode_pieces(self.sp_model, text, return_unicode=False)\n    else:\n      
split_tokens = []\n      for token in self.basic_tokenizer.tokenize(text):\n        for sub_token in self.wordpiece_tokenizer.tokenize(token):\n          split_tokens.append(sub_token)\n\n    return split_tokens\n\n  def convert_tokens_to_ids(self, tokens):\n    if self.sp_model:\n      tf.logging.info(\"using sentence piece tokenizer.\")\n      return [self.sp_model.PieceToId(\n          printable_text(token)) for token in tokens]\n    else:\n      return convert_by_vocab(self.vocab, tokens)\n\n  def convert_ids_to_tokens(self, ids):\n    if self.sp_model:\n      tf.logging.info(\"using sentence piece tokenizer.\")\n      return [self.sp_model.IdToPiece(id_) for id_ in ids]\n    else:\n      return convert_by_vocab(self.inv_vocab, ids)\n\n\nclass BasicTokenizer(object):\n  \"\"\"Runs basic tokenization (punctuation splitting, lower casing, etc.).\"\"\"\n\n  def __init__(self, do_lower_case=True):\n    \"\"\"Constructs a BasicTokenizer.\n\n    Args:\n      do_lower_case: Whether to lower case the input.\n    \"\"\"\n    self.do_lower_case = do_lower_case\n\n  def tokenize(self, text):\n    \"\"\"Tokenizes a piece of text.\"\"\"\n    text = convert_to_unicode(text)\n    text = self._clean_text(text)\n\n    # This was added on November 1st, 2018 for the multilingual and Chinese\n    # models. 
This is also applied to the English models now, but it doesn't\n    # matter since the English models were not trained on any Chinese data\n    # and generally don't have any Chinese data in them (there are Chinese\n    # characters in the vocabulary because Wikipedia does have some Chinese\n    # words in the English Wikipedia.).\n    text = self._tokenize_chinese_chars(text)\n\n    orig_tokens = whitespace_tokenize(text)\n    split_tokens = []\n    for token in orig_tokens:\n      if self.do_lower_case:\n        token = token.lower()\n        token = self._run_strip_accents(token)\n      split_tokens.extend(self._run_split_on_punc(token))\n\n    output_tokens = whitespace_tokenize(\" \".join(split_tokens))\n    return output_tokens\n\n  def _run_strip_accents(self, text):\n    \"\"\"Strips accents from a piece of text.\"\"\"\n    text = unicodedata.normalize(\"NFD\", text)\n    output = []\n    for char in text:\n      cat = unicodedata.category(char)\n      if cat == \"Mn\":\n        continue\n      output.append(char)\n    return \"\".join(output)\n\n  def _run_split_on_punc(self, text):\n    \"\"\"Splits punctuation on a piece of text.\"\"\"\n    chars = list(text)\n    i = 0\n    start_new_word = True\n    output = []\n    while i < len(chars):\n      char = chars[i]\n      if _is_punctuation(char):\n        output.append([char])\n        start_new_word = True\n      else:\n        if start_new_word:\n          output.append([])\n        start_new_word = False\n        output[-1].append(char)\n      i += 1\n\n    return [\"\".join(x) for x in output]\n\n  def _tokenize_chinese_chars(self, text):\n    \"\"\"Adds whitespace around any CJK character.\"\"\"\n    output = []\n    for char in text:\n      cp = ord(char)\n      if self._is_chinese_char(cp):\n        output.append(\" \")\n        output.append(char)\n        output.append(\" \")\n      else:\n        output.append(char)\n    return \"\".join(output)\n\n  def _is_chinese_char(self, cp):\n    
\"\"\"Checks whether CP is the codepoint of a CJK character.\"\"\"\n    # This defines a \"chinese character\" as anything in the CJK Unicode block:\n    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)\n    #\n    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,\n    # despite its name. The modern Korean Hangul alphabet is a different block,\n    # as is Japanese Hiragana and Katakana. Those alphabets are used to write\n    # space-separated words, so they are not treated specially and handled\n    # like the all of the other languages.\n    if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #\n        (cp >= 0x3400 and cp <= 0x4DBF) or  #\n        (cp >= 0x20000 and cp <= 0x2A6DF) or  #\n        (cp >= 0x2A700 and cp <= 0x2B73F) or  #\n        (cp >= 0x2B740 and cp <= 0x2B81F) or  #\n        (cp >= 0x2B820 and cp <= 0x2CEAF) or\n        (cp >= 0xF900 and cp <= 0xFAFF) or  #\n        (cp >= 0x2F800 and cp <= 0x2FA1F)):  #\n      return True\n\n    return False\n\n  def _clean_text(self, text):\n    \"\"\"Performs invalid character removal and whitespace cleanup on text.\"\"\"\n    output = []\n    for char in text:\n      cp = ord(char)\n      if cp == 0 or cp == 0xfffd or _is_control(char):\n        continue\n      if _is_whitespace(char):\n        output.append(\" \")\n      else:\n        output.append(char)\n    return \"\".join(output)\n\n\nclass WordpieceTokenizer(object):\n  \"\"\"Runs WordPiece tokenziation.\"\"\"\n\n  def __init__(self, vocab, unk_token=\"[UNK]\", max_input_chars_per_word=200):\n    self.vocab = vocab\n    self.unk_token = unk_token\n    self.max_input_chars_per_word = max_input_chars_per_word\n\n  def tokenize(self, text):\n    \"\"\"Tokenizes a piece of text into its word pieces.\n\n    This uses a greedy longest-match-first algorithm to perform tokenization\n    using the given vocabulary.\n\n    For example:\n      input = \"unaffable\"\n      output = [\"un\", \"##aff\", \"##able\"]\n\n    
Args:\n      text: A single token or whitespace separated tokens. This should have\n        already been passed through `BasicTokenizer`.\n\n    Returns:\n      A list of wordpiece tokens.\n    \"\"\"\n\n    text = convert_to_unicode(text)\n\n    output_tokens = []\n    for token in whitespace_tokenize(text):\n      chars = list(token)\n      if len(chars) > self.max_input_chars_per_word:\n        output_tokens.append(self.unk_token)\n        continue\n\n      is_bad = False\n      start = 0\n      sub_tokens = []\n      while start < len(chars):\n        end = len(chars)\n        cur_substr = None\n        while start < end:\n          substr = \"\".join(chars[start:end])\n          if start > 0:\n            substr = \"##\" + six.ensure_str(substr)\n          if substr in self.vocab:\n            cur_substr = substr\n            break\n          end -= 1\n        if cur_substr is None:\n          is_bad = True\n          break\n        sub_tokens.append(cur_substr)\n        start = end\n\n      if is_bad:\n        output_tokens.append(self.unk_token)\n      else:\n        output_tokens.extend(sub_tokens)\n    return output_tokens\n\n\ndef _is_whitespace(char):\n  \"\"\"Checks whether `char` is a whitespace character.\"\"\"\n  # \\t, \\n, and \\r are technically control characters but we treat them\n  # as whitespace since they are generally considered as such.\n  if char == \" \" or char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n    return True\n  cat = unicodedata.category(char)\n  if cat == \"Zs\":\n    return True\n  return False\n\n\ndef _is_control(char):\n  \"\"\"Checks whether `char` is a control character.\"\"\"\n  # These are technically control characters but we count them as whitespace\n  # characters.\n  if char == \"\\t\" or char == \"\\n\" or char == \"\\r\":\n    return False\n  cat = unicodedata.category(char)\n  if cat in (\"Cc\", \"Cf\"):\n    return True\n  return False\n\n\ndef _is_punctuation(char):\n  \"\"\"Checks whether `char` 
is a punctuation character.\"\"\"\n  cp = ord(char)\n  # We treat all non-letter/number ASCII as punctuation.\n  # Characters such as \"^\", \"$\", and \"`\" are not in the Unicode\n  # Punctuation class but we treat them as punctuation anyways, for\n  # consistency.\n  if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or\n      (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):\n    return True\n  cat = unicodedata.category(char)\n  if cat.startswith(\"P\"):\n    return True\n  return False\n"
  }
]