Full Code of brightmart/albert_zh for AI

master 52149e82faf3 cached

40 files

620.3 KB

210.9k tokens

440 symbols

1 requests

Download .txt

Showing preview only (642K chars total). Download the full file or copy to clipboard to get everything.

Repository: brightmart/albert_zh
Branch: master
Commit: 52149e82faf3
Files: 40
Total size: 620.3 KB

Directory structure:
gitextract_fwt0rbxl/

├── README.md
├── albert_config/
│   ├── albert_config_base.json
│   ├── albert_config_base_google_fast.json
│   ├── albert_config_large.json
│   ├── albert_config_small_google.json
│   ├── albert_config_tiny.json
│   ├── albert_config_tiny_google.json
│   ├── albert_config_tiny_google_fast.json
│   ├── albert_config_xlarge.json
│   ├── albert_config_xxlarge.json
│   ├── bert_config.json
│   └── vocab.txt
├── args.py
├── bert_utils.py
├── classifier_utils.py
├── create_pretrain_data.sh
├── create_pretraining_data.py
├── create_pretraining_data_google.py
├── data/
│   └── news_zh_1.txt
├── lamb_optimizer_google.py
├── modeling.py
├── modeling_google.py
├── modeling_google_fast.py
├── optimization.py
├── optimization_finetuning.py
├── optimization_google.py
├── resources/
│   ├── create_pretraining_data_roberta.py
│   └── shell_scripts/
│       └── create_pretrain_data_batch_webtext.sh
├── run_classifier.py
├── run_classifier_clue.py
├── run_classifier_clue.sh
├── run_classifier_lcqmc.sh
├── run_classifier_sp_google.py
├── run_pretraining.py
├── run_pretraining_google.py
├── run_pretraining_google_fast.py
├── similarity.py
├── test_changes.py
├── tokenization.py
└── tokenization_google.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# albert_zh

An Implementation of <a href="https://arxiv.org/pdf/1909.11942.pdf">A Lite Bert For Self-Supervised Learning Language Representations</a> with TensorFlow

ALBert is based on Bert, but with some improvements. It achieves state of the art performance on main benchmarks with 30% parameters less. 

For albert_base_zh it only has ten percentage parameters compare of original bert model, and main accuracy is retained. 


Different version of ALBERT pre-trained model for Chinese, including TensorFlow, PyTorch and Keras, is available now.

海量中文语料上预训练ALBERT模型：参数更少，效果更好。预训练小模型也能拿下13项NLP任务，ALBERT三大改造登顶GLUE基准

<a href='https://www.cluebenchmarks.com/clueai.html'>clueai工具包: 三行代码，三分钟定制一个NLP的API（零样本学习）</a>

<img src="https://github.com/brightmart/albert_zh/blob/master/resources/albert_tiny_compare_s.jpg"  width="90%" height="70%" />

一键运行10个数据集、9个基线模型、不同任务上模型效果的详细对比，见<a href="http://www.CLUEbenchmarks.com">CLUE benchmark</a>

一键运行CLUE中文任务：6个中文分类或句子对任务（新）
---------------------------------------------------------------------
    使用方式：
    1、克隆项目
       git clone https://github.com/brightmart/albert_zh.git
    2、运行一键运行脚本(GPU方式): 会自动下载模型和所有任务数据并开始运行。
       bash run_classifier_clue.sh
       执行该一键运行脚本将会自动下载所有任务数据，并为所有任务找到最优模型，然后测试得到提交结果
    

模型下载 Download Pre-trained Models of Chinese
-----------------------------------------------
1、<a href="https://storage.googleapis.com/albert_zh/albert_tiny.zip">albert_tiny_zh</a>, <a href="https://storage.googleapis.com/albert_zh/albert_tiny_489k.zip">albert_tiny_zh(训练更久，累积学习20亿个样本)</a>，文件大小16M、参数为4M

    训练和推理预测速度提升约10倍，精度基本保留，模型大小为bert的1/25；语义相似度数据集LCQMC测试集上达到85.4%，相比bert_base仅下降1.5个点。

    lcqmc训练使用如下参数： --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4   --num_train_epochs=5 
    
    albert_tiny使用同样的大规模中文语料数据，层数仅为4层、hidden size等向量维度大幅减少; 尝试使用如下学习率来获得更好效果：{2e-5, 6e-5, 1e-4} 
    
    【使用场景】任务相对比较简单一些或实时性要求高的任务，如语义相似度等句子对任务、分类任务；比较难的任务如阅读理解等，可以使用其他大模型。

     例如，可以使用[Tensorflow Lite](https://www.tensorflow.org/lite)在移动端进行部署，本文[随后](#use_tflite)针对这一点进行了介绍，包括如何把模型转换成Tensorflow Lite格式和对其进行性能测试等。
     
     一键运行albert_tiny_zh(linux,lcqmc任务)：
     1) git clone https://github.com/brightmart/albert_zh
     2) cd albert_zh
     3) bash run_classifier_lcqmc.sh
1.1、<a href="https://storage.googleapis.com/albert_zh/albert_tiny_zh_google.zip">albert_tiny_google_zh(累积学习10亿个样本,google版本)</a>，模型大小16M、性能与albert_tiny_zh一致

1.2、<a href="https://storage.googleapis.com/albert_zh/albert_small_zh_google.zip">albert_small_google_zh(累积学习10亿个样本,google版本)</a>，
     
     速度比bert_base快4倍；LCQMC测试集上比Bert下降仅0.9个点；去掉adam后模型大小18.5M；使用方法，见 #下游任务 Fine-tuning on Downstream Task     
     
2、<a href="https://storage.googleapis.com/albert_zh/albert_large_zh.zip">albert_large_zh</a>,参数量，层数24，文件大小为64M
   
    参数量和模型大小为bert_base的六分之一；在口语化描述相似性数据集LCQMC的测试集上相比bert_base上升0.2个点

3、<a href="https://storage.googleapis.com/albert_zh/albert_base_zh_additional_36k_steps.zip">albert_base_zh(额外训练了1.5亿个实例即 36k steps * batch_size 4096)</a>; <a href="https://storage.googleapis.com/albert_zh/albert_base_zh.zip"> albert_base_zh(小模型体验版)</a>, 参数量12M, 层数12，大小为40M

    参数量为bert_base的十分之一，模型大小也十分之一；在口语化描述相似性数据集LCQMC的测试集上相比bert_base下降约0.6~1个点；
    相比未预训练，albert_base提升14个点

4、<a href="https://storage.googleapis.com/albert_zh/albert_xlarge_zh_177k.zip">albert_xlarge_zh_177k </a>; 
<a href="https://storage.googleapis.com/albert_zh/albert_xlarge_zh_183k.zip">albert_xlarge_zh_183k(优先尝试)</a>参数量，层数24，文件大小为230M
   
    参数量和模型大小为bert_base的二分之一；需要一张大的显卡；完整测试对比将后续添加；batch_size不能太小，否则可能影响精度

### 快速加载
依托于[Huggingface-Transformers 2.2.2](https://github.com/huggingface/transformers)，可轻松调用以上模型。
```
tokenizer = AutoTokenizer.from_pretrained("MODEL_NAME")
model = AutoModel.from_pretrained("MODEL_NAME")
```

其中`MODEL_NAME`对应列表如下：

| 模型名 | MODEL_NAME |
| - | - |
| albert_tiny_google_zh | voidful/albert_chinese_tiny |
| albert_small_google_zh | voidful/albert_chinese_small  |
| albert_base_zh (from google) | voidful/albert_chinese_base   |
| albert_large_zh (from google) | voidful/albert_chinese_large   |
| albert_xlarge_zh (from google) | voidful/albert_chinese_xlarge   |
| albert_xxlarge_zh (from google) | voidful/albert_chinese_xxlarge   |

更多通过transformers使用albert的<a href='https://huggingface.co/models?search=albert_chinese'>示例</a>

预训练 Pre-training
-----------------------------------------------

#### 生成特定格式的文件(tfrecords) Generate tfrecords Files

Run following command 运行以下命令即可。项目自动了一个示例的文本文件(data/news_zh_1.txt)
   
       bash create_pretrain_data.sh
   
如果你有很多文本文件，可以通过传入参数的方式，生成多个特定格式的文件(tfrecords）

###### Support English and Other Non-Chinese Language: 
    If you are doing pre-train for english or other language,which is not chinese, 
    you should set hyperparameter of non_chinese to True on create_pretraining_data.py; 
    otherwise, by default it is doing chinese pre-train using whole word mask of chinese.

#### 执行预训练 pre-training on GPU/TPU using the command
    GPU(brightmart版, tiny模型):
    export BERT_BASE_DIR=./albert_tiny_zh
    nohup python3 run_pretraining.py --input_file=./data/tf*.tfrecord  \
    --output_dir=./my_new_model_path --do_train=True --do_eval=True --bert_config_file=$BERT_BASE_DIR/albert_config_tiny.json \
    --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=51 \
    --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176    \
    --save_checkpoints_steps=2000  --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &
    
    GPU(Google版本, small模型):
    export BERT_BASE_DIR=./albert_small_zh_google
    nohup python3 run_pretraining_google.py --input_file=./data/tf*.tfrecord --eval_batch_size=64 \
    --output_dir=./my_new_model_path --do_train=True --do_eval=True --albert_config_file=$BERT_BASE_DIR/albert_config_small_google.json  --export_dir=./my_new_model_path_export \
    --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=20 \
    --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176   \
    --save_checkpoints_steps=2000 --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt
    
    TPU, add something like this:
        --use_tpu=True  --tpu_name=grpc://10.240.1.66:8470 --tpu_zone=us-central1-a
        
    注：如果你重头开始训练，可以不指定init_checkpoint；
    如果你从现有的模型基础上训练，指定一下BERT_BASE_DIR的路径，并确保bert_config_file和init_checkpoint两个参数的值能对应到相应的文件上；
    领域上的预训练，根据数据的大小，可以不用训练特别久。

环境 Environment
-----------------------------------------------
Use Python3 + Tensorflow 1.x 

e.g. Tensorflow 1.4 or 1.5


下游任务 Fine-tuning on Downstream Task
-----------------------------------------------
##### 使用TensorFlow:

以使用albert_base做LCQMC任务为例。LCQMC任务是在口语化描述的数据集上做文本的相似性预测。

We will use LCQMC dataset for fine-tuning, it is oral language corpus, it is used to train and predict semantic similarity of a pair of sentences.

下载<a href="https://drive.google.com/open?id=1HXYMqsXjmA5uIfu_SFqP7r_vZZG-m_H0">LCQMC</a>数据集，包含训练、验证和测试集，训练集包含24万口语化描述的中文句子对，标签为1或0。1为句子语义相似，0为语义不相似。

通过运行下列命令做LCQMC数据集上的fine-tuning:
    
    1. Clone this project:
          
          git clone https://github.com/brightmart/albert_zh.git
          
    2. Fine-tuning by running the following command.
        brightmart版本的tiny模型
        export BERT_BASE_DIR=./albert_tiny_zh
        export TEXT_DIR=./lcqmc
        nohup python3 run_classifier.py   --task_name=lcqmc_pair   --do_train=true   --do_eval=true   --data_dir=$TEXT_DIR   --vocab_file=./albert_config/vocab.txt  \
        --bert_config_file=./albert_config/albert_config_tiny.json --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4  --num_train_epochs=5 \
        --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &
        
        google版本的small模型
        export BERT_BASE_DIR=./albert_small_zh
        export TEXT_DIR=./lcqmc
        nohup python3 run_classifier_sp_google.py --task_name=lcqmc_pair   --do_train=true   --do_eval=true   --data_dir=$TEXT_DIR   --vocab_file=./albert_config/vocab.txt  \
        --albert_config_file=./$BERT_BASE_DIR/albert_config_small_google.json --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4   --num_train_epochs=5 \
        --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt &

    Notice/注：
        1) you need to download pre-trained chinese albert model, and also download LCQMC dataset 
        你需要下载预训练的模型，并放入到项目当前项目，假设目录名称为albert_tiny_zh; 需要下载LCQMC数据集，并放入到当前项目，
        假设数据集目录名称为lcqmc

        2) for Fine-tuning, you can try to add small percentage of dropout(e.g. 0.1) by changing parameters of 
          attention_probs_dropout_prob & hidden_dropout_prob on albert_config_xxx.json. By default, we set dropout as zero. 
        
        3) you can try different learning rate {2e-5, 6e-5, 1e-4} for better performance 


Updates
-----------------------------------------------
**\*\*\*\*\* 2019-11-03: add google version of albert_small, albert_tiny; 

add method to deploy ablert_tiny to mobile devices with only 0.1 second inference time for sequence length 128, 60M memory \*\*\*\*\***

**\*\*\*\*\* 2019-10-30: add a simple guide about converting the model to Tensorflow Lite for edge deployment \*\*\*\*\***

**\*\*\*\*\* 2019-10-15: albert_tiny_zh, 10 times fast than bert base for training and inference, accuracy remains \*\*\*\*\***

**\*\*\*\*\* 2019-10-07: more models of albert \*\*\*\*\***

add albert_xlarge_zh; albert_base_zh_additional_steps, training with more instances

**\*\*\*\*\* 2019-10-04: PyTorch and Keras versions of albert were supported \*\*\*\*\***

a.Convert to PyTorch version and do your tasks through <a href="https://github.com/lonePatient/albert_pytorch">albert_pytorch</a>

b.Load pre-trained model with keras using one line of codes through <a href="https://github.com/bojone/bert4keras">bert4keras</a>

c.Use albert with TensorFlow 2.0: Use or load pre-trained model with tf2.0 through <a href="https://github.com/kpe/bert-for-tf2">bert-for-tf2</a>

Releasing albert_xlarge on 6th Oct

**\*\*\*\*\* 2019-10-02: albert_large_zh,albert_base_zh \*\*\*\*\***

Relesed albert_base_zh with only 10% parameters of bert_base, a small model(40M) & training can be very fast. 

Relased albert_large_zh with only 16% parameters of bert_base(64M)

**\*\*\*\*\* 2019-09-28: codes and test functions \*\*\*\*\*** 

Add codes and test functions for three main changes of albert from bert

ALBERT模型介绍 Introduction of ALBERT
-----------------------------------------------
ALBERT模型是BERT的改进版，与最近其他State of the art的模型不同的是，这次是预训练小模型，效果更好、参数更少。

它对BERT进行了三个改造 Three main changes of ALBert from Bert：

1）词嵌入向量参数的因式分解 Factorized embedding parameterization
   
     O(V * H) to O(V * E + E * H)
     
     如以ALBert_xxlarge为例，V=30000, H=4096, E=128
       
     那么原先参数为V * H= 30000 * 4096 = 1.23亿个参数，现在则为V * E + E * H = 30000*128+128*4096 = 384万 + 52万 = 436万，
       
     词嵌入相关的参数变化前是变换后的28倍。


2）跨层参数共享 Cross-Layer Parameter Sharing

     参数共享能显著减少参数。共享可以分为全连接层、注意力层的参数共享；注意力层的参数对效果的减弱影响小一点。

3）段落连续性任务 Inter-sentence coherence loss.
     
     使用段落连续性任务。正例，使用从一个文档中连续的两个文本段落；负例，使用从一个文档中连续的两个文本段落，但位置调换了。
     
     避免使用原有的NSP任务，原有的任务包含隐含了预测主题这类过于简单的任务。

      We maintain that inter-sentence modeling is an important aspect of language understanding, but we propose a loss 
      based primarily on coherence. That is, for ALBERT, we use a sentence-order prediction (SOP) loss, which avoids topic 
      prediction and instead focuses on modeling inter-sentence coherence. The SOP loss uses as positive examples the 
      same technique as BERT (two consecutive segments from the same document), and as negative examples the same two 
      consecutive segments but with their order swapped. This forces the model to learn finer-grained distinctions about
      discourse-level coherence properties. 

其他变化，还有 Other changes：

    1）去掉了dropout  Remove dropout to enlarge capacity of model.
        最大的模型，训练了1百万步后，还是没有过拟合训练数据。说明模型的容量还可以更大，就移除了dropout
        （dropout可以认为是随机的去掉网络中的一部分，同时使网络变小一些）
        We also note that, even after training for 1M steps, our largest models still do not overfit to their training data. 
        As a result, we decide to remove dropout to further increase our model capacity.
        其他型号的模型，在我们的实现中我们还是会保留原始的dropout的比例，防止模型对训练数据的过拟合。
        
    2）为加快训练速度，使用LAMB做为优化器 Use LAMB as optimizer, to train with big batch size
      使用了大的batch_size来训练(4096)。 LAMB优化器使得我们可以训练，特别大的批次batch_size，如高达6万。
    
    3）使用n-gram(uni-gram,bi-gram, tri-gram）来做遮蔽语言模型 Use n-gram as make language model
       即以不同的概率使用n-gram,uni-gram的概率最大，bi-gram其次，tri-gram概率最小。
       本项目中目前使用的是在中文上做whole word mask，稍后会更新一下与n-gram mask的效果对比。n-gram从spanBERT中来。


训练语料/训练配置 Training Data & Configuration
-----------------------------------------------
30g中文语料，超过100亿汉字，包括多个百科、新闻、互动社区。

预训练序列长度sequence_length设置为512，批次batch_size为4096，训练产生了3.5亿个训练数据(instance)；每一个模型默认会训练125k步，albert_xxlarge将训练更久。

作为比较，roberta_zh预训练产生了2.5亿个训练数据、序列长度为256。由于albert_zh预训练生成的训练数据更多、使用的序列长度更长，
 
    我们预计albert_zh会有比roberta_zh更好的性能表现，并且能更好处理较长的文本。

训练使用TPU v3 Pod，我们使用的是v3-256，它包含32个v3-8。每个v3-8机器，含有128G的显存。


模型性能与对比(英文) Performance and Comparision
-----------------------------------------------    
<img src="https://github.com/brightmart/albert_zh/blob/master/resources/state_of_the_art.jpg"  width="80%" height="40%" />
  
   
<img src="https://github.com/brightmart/albert_zh/blob/master/resources/albert_performance.jpg"  width="80%" height="40%" />


<img src="https://github.com/brightmart/albert_zh/blob/master/resources/add_data_removing_dropout.jpg"  width="80%" height="40%" />


中文任务集上效果对比测试 Performance on Chinese datasets
----------------------------------------------- 

###  问题匹配语任务：LCQMC(Sentence Pair Matching)

| 模型 | 开发集(Dev) | 测试集(Test) |
| :------- | :---------: | :---------: |
| BERT | 89.4(88.4) | 86.9(86.4) | 
| ERNIE | 89.8 (89.6) | 87.2 (87.0) | 
| BERT-wwm |89.4 (89.2) | 87.0 (86.8) | 
| BERT-wwm-ext | - |-  |
| RoBERTa-zh-base | 88.7 | 87.0  |
| RoBERTa-zh-Large | ***89.9(89.6)*** | 87.2(86.7) |
| RoBERTa-zh-Large(20w_steps) | 89.7| 87.0 |
| ALBERT-zh-tiny | -- | 85.4 |
| ALBERT-zh-small | -- | 86.0 |
| ALBERT-zh-small(Pytorch) | -- | 86.8 |
| ALBERT-zh-base-additional-36k-steps | 87.8 | 86.3 |
| ALBERT-zh-base | 87.2 | 86.3 |
| ALBERT-large | 88.7 | 87.1 |
| ALBERT-xlarge | 87.3 | ***87.7*** |

注：只跑了一次ALBERT-xlarge，效果还可能提升

### 自然语言推断：XNLI of Chinese Version

| 模型 | 开发集 | 测试集 |
| :------- | :---------: | :---------: |
| BERT | 77.8 (77.4) | 77.8 (77.5) | 
| ERNIE | 79.7 (79.4) | 78.6 (78.2) | 
| BERT-wwm | 79.0 (78.4) | 78.2 (78.0) | 
| BERT-wwm-ext | 79.4 (78.6) | 78.7 (78.3) |
| XLNet | 79.2  | 78.7 |
| RoBERTa-zh-base | 79.8 |78.8  |
| RoBERTa-zh-Large | 80.2 (80.0) | 79.9 (79.5) |
| ALBERT-base | 77.0 | 77.1 |
| ALBERT-large | 78.0 | 77.5 |
| ALBERT-xlarge | ? | ? |

注：BERT-wwm-ext来自于<a href="https://github.com/ymcui/Chinese-BERT-wwm">这里</a>；XLNet来自于<a href="https://github.com/ymcui/Chinese-PreTrained-XLNet">这里</a>; RoBERTa-zh-base，指12层RoBERTa中文模型
   

###  阅读理解任务：CRMC2018

<img src="https://github.com/brightmart/albert_zh/blob/master/resources/crmc2018_compare_s.jpg"  width="90%" height="70%" />


### 语言模型、文本段预测准确性、训练时间 Mask Language Model Accuarcy & Training Time

| Model | MLM eval acc | SOP eval acc | Training(Hours) | Loss eval |
| :------- | :---------: | :---------: | :---------: |:---------: |
| albert_zh_base | 79.1% | 99.0% | 6h | 1.01|
| albert_zh_large | 80.9% | 98.6% | 22.5h | 0.93|
| albert_zh_xlarge | ? | ? | 53h(预估) | ? |
| albert_zh_xxlarge | ? | ? | 106h(预估) | ? |

注：? 将很快替换

模型参数和配置 Configuration of Models
-----------------------------------------------
<img src="https://github.com/brightmart/albert_zh/blob/master/resources/albert_configuration.jpg"  width="80%" height="40%" />

代码实现和测试 Implementation and Code Testing
-----------------------------------------------
通过运行以下命令测试主要的改进点，包括但不限于词嵌入向量参数的因式分解、跨层参数共享、段落连续性任务等。

    python test_changes.py

##### <a name="use_tflite"></a>使用TensorFlow Lite(TFLite)在移动端进行部署:
这里我们主要介绍TFLite模型格式转换和性能测试。转换成TFLite模型后，对于如何在移
动端使用该模型，可以参考TFLite提供的[Android/iOS应用完整开发案例教程页面](https://www.tensorflow.org/lite/examples)。
该页面目前已经包含了[文本分类](https://github.com/tensorflow/examples/blob/master/lite/examples/text_classification/android)，
[文本问答](https://github.com/tensorflow/examples/blob/master/lite/examples/bert_qa/android)两个Android案例。

下面以<a href="https://storage.googleapis.com/albert_zh/albert_tiny.zip">albert_tiny_zh</a>
为例来介绍TFLite模型格式转换和性能测试：

1. Freeze graph from the checkpoint

Ensure to have >=1.14 1.x installed to use the freeze_graph tool as it is removed from 2.x distribution

    pip install tensorflow==1.15

    freeze_graph --input_checkpoint=./albert_model.ckpt \
      --output_graph=/tmp/albert_tiny_zh.pb \
      --output_node_names=cls/predictions/truediv \
      --checkpoint_version=1 --input_meta_graph=./albert_model.ckpt.meta --input_binary=true

2. Convert to TFLite format

We are going to use the new experimental tf->tflite converter that's distributed with the Tensorflow nightly build.

    pip install tf-nightly

    tflite_convert --graph_def_file=/tmp/albert_tiny_zh.pb \
      --input_arrays='input_ids,input_mask,segment_ids,masked_lm_positions,masked_lm_ids,masked_lm_weights' \
      --output_arrays='cls/predictions/truediv' \
      --input_shapes=1,128:1,128:128:1,128:1,128:1,128 \
      --output_file=/tmp/albert_tiny_zh.tflite \
      --enable_v1_converter --experimental_new_converter

3. Benchmark the performance of the TFLite model

See [here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark) 
for details about the performance benchmark tools in TFLite. For example: after
building the benchmark tool binary for an Android phone, do the following to
get an idea of how the TFLite model performs on the phone

    adb push /tmp/albert_tiny_zh.tflite /data/local/tmp/
    adb shell /data/local/tmp/benchmark_model_performance_options --graph=/data/local/tmp/albert_tiny_zh.tflite --perf_options_list=cpu

On an Android phone w/ Qualcomm's SD845 SoC, via the above benchmark tool, as
of 2019/11/01, the inference latency is ~120ms w/ this converted TFLite model
using 4 threads on CPU, and the memory usage is ~60MB for the model during
inference. Note the performance will improve further with future TFLite
implementation optimizations.

##### 使用PyTorch版本:

    download pre-trained model, and convert to PyTorch using:
     
      python convert_albert_tf_checkpoint_to_pytorch.py     
     
   using <a href="https://github.com/lonePatient/albert_pytorch">albert_pytorch
   
##### 使用Keras加载:

<a href="https://github.com/bojone/bert4keras">bert4keras</a> 适配albert，能成功加载albert_zh的权重，只需要在load_pretrained_model函数里加上albert=True

load pre-trained model with bert4keras

##### 使用tf2.0加载:

<a href="https://github.com/kpe/bert-for-tf2">bert-for-tf2</a>


使用案例-基于用户输入预测文本相似性 Use Case-Text Similarity Based on User Input
-------------------------------------------------

功能说明：用户可以通过本例了解如何加载训训练集实现基于用户输入的短文本相似度判断。可以基于该代码将程序灵活地拓展为后台服务或增加文本分类等示例。

涉及代码：similarity.py、args.py

步骤：

1、使用本模型进行文本相似性训练，保存模型文件至相应目录下

2、根据实际情况，修改args.py中的参数，参数说明如下：

```python
#模型目录，存放ckpt文件
model_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')

#config文件，存放模型的json文件
config_name = os.path.join(file_path, 'albert_config/albert_config_tiny.json')

#ckpt文件名称
ckpt_name = os.path.join(model_dir, 'model.ckpt')

#输出文件目录，训练时的模型输出目录
output_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')

#vocab文件目录
vocab_file = os.path.join(file_path, 'albert_config/vocab.txt')

#数据目录，训练使用的数据集存放目录
data_dir = os.path.join(file_path, 'data/')
```

本例中的文件结构为：

    |__args.py
    
    |__similarity.py
    
    |__data
    
    |__albert_config
    
    |__albert_lcqmc_checkpoints
    
    |__lcqmc

3、修改用户输入单词

打开similarity.py，最底部如下代码：

```python
if __name__ == '__main__':
    sim = BertSim()
    sim.start_model()
    sim.predict_sentences([("我喜欢妈妈做的汤", "妈妈做的汤我很喜欢喝")])
```

其中sim.start_model()表示加载模型，sim.predict_sentences的输入为一个元组数组，元组中包含两个元素分别为需要判定相似的句子。

4、运行python文件：similarity.py


支持的序列长度与批次大小的关系,12G显存 Trade off between batch Size and sequence length
-------------------------------------------------

System       | Seq Length | Max Batch Size
------------ | ---------- | --------------
`albert-base`  | 64         | 64
...          | 128        | 32
...          | 256        | 16
...          | 320        | 14
...          | 384        | 12
...          | 512        | 6
`albert-large` | 64         | 12
...          | 128        | 6
...          | 256        | 2
...          | 320        | 1
...          | 384        | 0
...          | 512        | 0
`albert-xlarge` | -         | -

学习曲线 Training Loss of xlarge of albert_zh
-------------------------------------------------
<img src="https://github.com/brightmart/albert_zh/blob/master/resources/xlarge_loss.jpg"  width="80%" height="40%" />

所有的参数 Parameters of albert_xlarge
-------------------------------------------------
<img src="https://github.com/brightmart/albert_zh/blob/master/resources/albert_large_zh_parameters.jpg"  width="80%" height="40%" />


#### 技术交流与问题讨论QQ群: 836811304 Join us on QQ group

If you have any question, you can raise an issue, or send me an email: brightmart@hotmail.com;

Currently how to use PyTorch version of albert is not clear yet, if you know how to do that, just email us or open an issue.

You can also send pull request to report you performance on your task or add methods on how to load models for PyTorch and so on.

If you have ideas for generate best performance pre-training Chinese model, please also let me know.

##### Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)

Cite Us
-----------------------------------------------
Bright Liang Xu, albert_zh, (2019), GitHub repository, https://github.com/brightmart/albert_zh

Reference
-----------------------------------------------
1、<a href="https://arxiv.org/pdf/1909.11942.pdf">ALBERT: A Lite BERT For Self-Supervised Learning Of Language Representations</a>

2、<a href="https://arxiv.org/pdf/1810.04805.pdf">BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</a>

3、<a href="https://arxiv.org/abs/1907.10529">SpanBERT: Improving Pre-training by Representing and Predicting Spans</a>

4、<a href="https://arxiv.org/pdf/1907.11692.pdf">RoBERTa: A Robustly Optimized BERT Pretraining Approach</a>

5、<a href="https://arxiv.org/pdf/1904.00962.pdf">Large Batch Optimization for Deep Learning: Training BERT in 76 minutes(LAMB)</a>

6、<a href="https://github.com/ymcui/LAMB_Optimizer_TF">LAMB Optimizer,TensorFlow version</a>

7、<a href="http://baijiahao.baidu.com/s?id=1645712785366950083&wfr=spider&for=pc">预训练小模型也能拿下13项NLP任务，ALBERT三大改造登顶GLUE基准</a>

8、 <a href="https://github.com/lonePatient/albert_pytorch">albert_pytorch</a>

9、<a href="https://github.com/bojone/bert4keras">load albert with keras</a>

10、<a href="https://github.com/kpe/bert-for-tf2">load albert with tf2.0</a>

11、<a href="https://github.com/google-research/google-research/tree/master/albert">repo of albert from google</a>

12、<a href="https://github.com/chineseGLUE/chineseGLUE">chineseGLUE-中文任务基准测评：公开可用多个任务、基线模型、广泛测评与效果对比</a>






================================================
FILE: albert_config/albert_config_base.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768,
  "embedding_size": 128,
  "initializer_range": 0.02, 
  "intermediate_size": 3072 ,
  "max_position_embeddings": 512, 
  "num_attention_heads": 12,
  "num_hidden_layers": 12,

  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128,
   "ln_type":"postln"

}


================================================
FILE: albert_config/albert_config_base_google_fast.json
================================================
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "embedding_size": 128,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_hidden_groups": 12,
  "net_structure_type": 0,
  "gap_size": 0,
  "num_memory_blocks": 0,
  "inner_group_num": 1,
  "down_scale_factor": 1,
  "type_vocab_size": 2,
  "vocab_size": 21128
}

================================================
FILE: albert_config/albert_config_large.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.0,
  "hidden_size": 1024,
  "embedding_size": 128,
  "initializer_range": 0.02, 
  "intermediate_size": 4096,
  "max_position_embeddings": 512, 
  "num_attention_heads": 16,
  "num_hidden_layers": 24,

  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128,
   "ln_type":"postln"

}


================================================
FILE: albert_config/albert_config_small_google.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "embedding_size": 128,
  "hidden_size": 384,
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 6,
  "num_hidden_groups": 1,
  "net_structure_type": 0,
  "gap_size": 0,
  "num_memory_blocks": 0,
  "inner_group_num": 1,
  "down_scale_factor": 1,
  "type_vocab_size": 2,
  "vocab_size": 21128
}

================================================
FILE: albert_config/albert_config_tiny.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.0,
  "hidden_size": 312,
  "embedding_size": 128,
  "initializer_range": 0.02, 
  "intermediate_size": 1248 ,
  "max_position_embeddings": 512, 
  "num_attention_heads": 12,
  "num_hidden_layers": 4,

  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128,
   "ln_type":"postln"

}


================================================
FILE: albert_config/albert_config_tiny_google.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "embedding_size": 128,
  "hidden_size": 312,
  "initializer_range": 0.02,
  "intermediate_size": 1248,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 4,
  "num_hidden_groups": 1,
  "net_structure_type": 0,
  "gap_size": 0,
  "num_memory_blocks": 0,
  "inner_group_num": 1,
  "down_scale_factor": 1,
  "type_vocab_size": 2,
  "vocab_size": 21128
}


================================================
FILE: albert_config/albert_config_tiny_google_fast.json
================================================
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "embedding_size": 128,
  "hidden_size": 336,
  "initializer_range": 0.02,
  "intermediate_size": 1344,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 4,
  "num_hidden_groups": 12,
  "net_structure_type": 0,
  "gap_size": 0,
  "num_memory_blocks": 0,
  "inner_group_num": 1,
  "down_scale_factor": 1,
  "type_vocab_size": 2,
  "vocab_size": 21128
}

================================================
FILE: albert_config/albert_config_xlarge.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.0,
  "hidden_size": 2048,
  "embedding_size": 128,
  "initializer_range": 0.02, 
  "intermediate_size": 8192,
  "max_position_embeddings": 512, 
  "num_attention_heads": 32,
  "num_hidden_layers": 24,

  "pooler_fc_size": 1024,
  "pooler_num_attention_heads": 64,
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128,
  "ln_type":"postln"

}


================================================
FILE: albert_config/albert_config_xxlarge.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.0,
  "hidden_size": 4096,
  "embedding_size": 128,
  "initializer_range": 0.02, 
  "intermediate_size": 16384,
  "max_position_embeddings": 512, 
  "num_attention_heads": 64,
  "num_hidden_layers": 12,

  "pooler_fc_size": 1024,
  "pooler_num_attention_heads": 64,
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128,
   "ln_type":"preln"

}


================================================
FILE: albert_config/bert_config.json
================================================
{
  "attention_probs_dropout_prob": 0.0,
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.0,
  "hidden_size": 768, 
  "initializer_range": 0.02, 
  "intermediate_size": 3072, 
  "max_position_embeddings": 512, 
  "num_attention_heads": 12, 
  "num_hidden_layers": 12, 
  "pooler_fc_size": 768, 
  "pooler_num_attention_heads": 12, 
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128
}


================================================
FILE: albert_config/vocab.txt
================================================
[PAD]
[unused1]
[unused2]
[unused3]
[unused4]
[unused5]
[unused6]
[unused7]
[unused8]
[unused9]
[unused10]
[unused11]
[unused12]
[unused13]
[unused14]
[unused15]
[unused16]
[unused17]
[unused18]
[unused19]
[unused20]
[unused21]
[unused22]
[unused23]
[unused24]
[unused25]
[unused26]
[unused27]
[unused28]
[unused29]
[unused30]
[unused31]
[unused32]
[unused33]
[unused34]
[unused35]
[unused36]
[unused37]
[unused38]
[unused39]
[unused40]
[unused41]
[unused42]
[unused43]
[unused44]
[unused45]
[unused46]
[unused47]
[unused48]
[unused49]
[unused50]
[unused51]
[unused52]
[unused53]
[unused54]
[unused55]
[unused56]
[unused57]
[unused58]
[unused59]
[unused60]
[unused61]
[unused62]
[unused63]
[unused64]
[unused65]
[unused66]
[unused67]
[unused68]
[unused69]
[unused70]
[unused71]
[unused72]
[unused73]
[unused74]
[unused75]
[unused76]
[unused77]
[unused78]
[unused79]
[unused80]
[unused81]
[unused82]
[unused83]
[unused84]
[unused85]
[unused86]
[unused87]
[unused88]
[unused89]
[unused90]
[unused91]
[unused92]
[unused93]
[unused94]
[unused95]
[unused96]
[unused97]
[unused98]
[unused99]
[UNK]
[CLS]
[SEP]
[MASK]
<S>
<T>
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
@
[
\
]
^
_
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
{
|
}
~
£
¤
¥
§
©
«
®
°
±
²
³
µ
·
¹
º
»
¼
×
ß
æ
÷
ø
đ
ŋ
ɔ
ə
ɡ
ʰ
ˇ
ˈ
ˊ
ˋ
ˍ
ː
˙
˚
ˢ
α
β
γ
δ
ε
η
θ
ι
κ
λ
μ
ν
ο
π
ρ
ς
σ
τ
υ
φ
χ
ψ
ω
а
б
в
г
д
е
ж
з
и
к
л
м
н
о
п
р
с
т
у
ф
х
ц
ч
ш
ы
ь
я
і
ا
ب
ة
ت
د
ر
س
ع
ل
م
ن
ه
و
ي
۩
ก
ง
น
ม
ย
ร
อ
า
เ
๑
་
ღ
ᄀ
ᄁ
ᄂ
ᄃ
ᄅ
ᄆ
ᄇ
ᄈ
ᄉ
ᄋ
ᄌ
ᄎ
ᄏ
ᄐ
ᄑ
ᄒ
ᅡ
ᅢ
ᅣ
ᅥ
ᅦ
ᅧ
ᅨ
ᅩ
ᅪ
ᅬ
ᅭ
ᅮ
ᅯ
ᅲ
ᅳ
ᅴ
ᅵ
ᆨ
ᆫ
ᆯ
ᆷ
ᆸ
ᆺ
ᆻ
ᆼ
ᗜ
ᵃ
ᵉ
ᵍ
ᵏ
ᵐ
ᵒ
ᵘ
‖
„
†
•
‥
‧
 
‰
′
″
‹
›
※
‿
⁄
ⁱ
⁺
ⁿ
₁
₂
₃
₄
€
℃
№
™
ⅰ
ⅱ
ⅲ
ⅳ
ⅴ
←
↑
→
↓
↔
↗
↘
⇒
∀
−
∕
∙
√
∞
∟
∠
∣
∥
∩
∮
∶
∼
∽
≈
≒
≡
≤
≥
≦
≧
≪
≫
⊙
⋅
⋈
⋯
⌒
①
②
③
④
⑤
⑥
⑦
⑧
⑨
⑩
⑴
⑵
⑶
⑷
⑸
⒈
⒉
⒊
⒋
ⓒ
ⓔ
ⓘ
─
━
│
┃
┅
┆
┊
┌
└
├
┣
═
║
╚
╞
╠
╭
╮
╯
╰
╱
╳
▂
▃
▅
▇
█
▉
▋
▌
▍
▎
■
□
▪
▫
▬
▲
△
▶
►
▼
▽
◆
◇
○
◎
●
◕
◠
◢
◤
☀
★
☆
☕
☞
☺
☼
♀
♂
♠
♡
♣
♥
♦
♪
♫
♬
✈
✔
✕
✖
✦
✨
✪
✰
✿
❀
❤
➜
➤
⦿
、
。
〃
々
〇
〈
〉
《
》
「
」
『
』
【
】
〓
〔
〕
〖
〗
〜
〝
〞
ぁ
あ
ぃ
い
う
ぇ
え
お
か
き
く
け
こ
さ
し
す
せ
そ
た
ち
っ
つ
て
と
な
に
ぬ
ね
の
は
ひ
ふ
へ
ほ
ま
み
む
め
も
ゃ
や
ゅ
ゆ
ょ
よ
ら
り
る
れ
ろ
わ
を
ん
゜
ゝ
ァ
ア
ィ
イ
ゥ
ウ
ェ
エ
ォ
オ
カ
キ
ク
ケ
コ
サ
シ
ス
セ
ソ
タ
チ
ッ
ツ
テ
ト
ナ
ニ
ヌ
ネ
ノ
ハ
ヒ
フ
ヘ
ホ
マ
ミ
ム
メ
モ
ャ
ヤ
ュ
ユ
ョ
ヨ
ラ
リ
ル
レ
ロ
ワ
ヲ
ン
ヶ
・
ー
ヽ
ㄅ
ㄆ
ㄇ
ㄉ
ㄋ
ㄌ
ㄍ
ㄎ
ㄏ
ㄒ
ㄚ
ㄛ
ㄞ
ㄟ
ㄢ
ㄤ
ㄥ
ㄧ
ㄨ
ㆍ
㈦
㊣
㎡
㗎
一
丁
七
万
丈
三
上
下
不
与
丐
丑
专
且
丕
世
丘
丙
业
丛
东
丝
丞
丟
両
丢
两
严
並
丧
丨
个
丫
中
丰
串
临
丶
丸
丹
为
主
丼
丽
举
丿
乂
乃
久
么
义
之
乌
乍
乎
乏
乐
乒
乓
乔
乖
乗
乘
乙
乜
九
乞
也
习
乡
书
乩
买
乱
乳
乾
亀
亂
了
予
争
事
二
于
亏
云
互
五
井
亘
亙
亚
些
亜
亞
亟
亡
亢
交
亥
亦
产
亨
亩
享
京
亭
亮
亲
亳
亵
人
亿
什
仁
仃
仄
仅
仆
仇
今
介
仍
从
仏
仑
仓
仔
仕
他
仗
付
仙
仝
仞
仟
代
令
以
仨
仪
们
仮
仰
仲
件
价
任
份
仿
企
伉
伊
伍
伎
伏
伐
休
伕
众
优
伙
会
伝
伞
伟
传
伢
伤
伦
伪
伫
伯
估
伴
伶
伸
伺
似
伽
佃
但
佇
佈
位
低
住
佐
佑
体
佔
何
佗
佘
余
佚
佛
作
佝
佞
佟
你
佢
佣
佤
佥
佩
佬
佯
佰
佳
併
佶
佻
佼
使
侃
侄
來
侈
例
侍
侏
侑
侖
侗
供
依
侠
価
侣
侥
侦
侧
侨
侬
侮
侯
侵
侶
侷
便
係
促
俄
俊
俎
俏
俐
俑
俗
俘
俚
保
俞
俟
俠
信
俨
俩
俪
俬
俭
修
俯
俱
俳
俸
俺
俾
倆
倉
個
倌
倍
倏
們
倒
倔
倖
倘
候
倚
倜
借
倡
値
倦
倩
倪
倫
倬
倭
倶
债
值
倾
偃
假
偈
偉
偌
偎
偏
偕
做
停
健
側
偵
偶
偷
偻
偽
偿
傀
傅
傍
傑
傘
備
傚
傢
傣
傥
储
傩
催
傭
傲
傳
債
傷
傻
傾
僅
働
像
僑
僕
僖
僚
僥
僧
僭
僮
僱
僵
價
僻
儀
儂
億
儆
儉
儋
儒
儕
儘
償
儡
優
儲
儷
儼
儿
兀
允
元
兄
充
兆
兇
先
光
克
兌
免
児
兑
兒
兔
兖
党
兜
兢
入
內
全
兩
八
公
六
兮
兰
共
兲
关
兴
兵
其
具
典
兹
养
兼
兽
冀
内
円
冇
冈
冉
冊
册
再
冏
冒
冕
冗
写
军
农
冠
冢
冤
冥
冨
冪
冬
冯
冰
冲
决
况
冶
冷
冻
冼
冽
冾
净
凄
准
凇
凈
凉
凋
凌
凍
减
凑
凛
凜
凝
几
凡
凤
処
凪
凭
凯
凰
凱
凳
凶
凸
凹
出
击
函
凿
刀
刁
刃
分
切
刈
刊
刍
刎
刑
划
列
刘
则
刚
创
初
删
判
別
刨
利
刪
别
刮
到
制
刷
券
刹
刺
刻
刽
剁
剂
剃
則
剉
削
剋
剌
前
剎
剐
剑
剔
剖
剛
剜
剝
剣
剤
剥
剧
剩
剪
副
割
創
剷
剽
剿
劃
劇
劈
劉
劊
劍
劏
劑
力
劝
办
功
加
务
劣
动
助
努
劫
劭
励
劲
劳
労
劵
効
劾
势
勁
勃
勇
勉
勋
勐
勒
動
勖
勘
務
勛
勝
勞
募
勢
勤
勧
勳
勵
勸
勺
勻
勾
勿
匀
包
匆
匈
匍
匐
匕
化
北
匙
匝
匠
匡
匣
匪
匮
匯
匱
匹
区
医
匾
匿
區
十
千
卅
升
午
卉
半
卍
华
协
卑
卒
卓
協
单
卖
南
単
博
卜
卞
卟
占
卡
卢
卤
卦
卧
卫
卮
卯
印
危
即
却
卵
卷
卸
卻
卿
厂
厄
厅
历
厉
压
厌
厕
厘
厚
厝
原
厢
厥
厦
厨
厩
厭
厮
厲
厳
去
县
叁
参
參
又
叉
及
友
双
反
収
发
叔
取
受
变
叙
叛
叟
叠
叡
叢
口
古
句
另
叨
叩
只
叫
召
叭
叮
可
台
叱
史
右
叵
叶
号
司
叹
叻
叼
叽
吁
吃
各
吆
合
吉
吊
吋
同
名
后
吏
吐
向
吒
吓
吕
吖
吗
君
吝
吞
吟
吠
吡
否
吧
吨
吩
含
听
吭
吮
启
吱
吳
吴
吵
吶
吸
吹
吻
吼
吽
吾
呀
呂
呃
呆
呈
告
呋
呎
呐
呓
呕
呗
员
呛
呜
呢
呤
呦
周
呱
呲
味
呵
呷
呸
呻
呼
命
咀
咁
咂
咄
咆
咋
和
咎
咏
咐
咒
咔
咕
咖
咗
咘
咙
咚
咛
咣
咤
咦
咧
咨
咩
咪
咫
咬
咭
咯
咱
咲
咳
咸
咻
咽
咿
哀
品
哂
哄
哆
哇
哈
哉
哋
哌
响
哎
哏
哐
哑
哒
哔
哗
哟
員
哥
哦
哧
哨
哩
哪
哭
哮
哲
哺
哼
哽
唁
唄
唆
唇
唉
唏
唐
唑
唔
唠
唤
唧
唬
售
唯
唰
唱
唳
唷
唸
唾
啃
啄
商
啉
啊
問
啓
啕
啖
啜
啞
啟
啡
啤
啥
啦
啧
啪
啫
啬
啮
啰
啱
啲
啵
啶
啷
啸
啻
啼
啾
喀
喂
喃
善
喆
喇
喉
喊
喋
喎
喏
喔
喘
喙
喚
喜
喝
喟
喧
喪
喫
喬
單
喰
喱
喲
喳
喵
営
喷
喹
喺
喻
喽
嗅
嗆
嗇
嗎
嗑
嗒
嗓
嗔
嗖
嗚
嗜
嗝
嗟
嗡
嗣
嗤
嗦
嗨
嗪
嗬
嗯
嗰
嗲
嗳
嗶
嗷
嗽
嘀
嘅
嘆
嘈
嘉
嘌
嘍
嘎
嘔
嘖
嘗
嘘
嘚
嘛
嘜
嘞
嘟
嘢
嘣
嘤
嘧
嘩
嘭
嘮
嘯
嘰
嘱
嘲
嘴
嘶
嘸
嘹
嘻
嘿
噁
噌
噎
噓
噔
噗
噙
噜
噠
噢
噤
器
噩
噪
噬
噱
噴
噶
噸
噹
噻
噼
嚀
嚇
嚎
嚏
嚐
嚓
嚕
嚟
嚣
嚥
嚨
嚮
嚴
嚷
嚼
囂
囉
囊
囍
囑
囔
囗
囚
四
囝
回
囟
因
囡
团
団
囤
囧
囪
囫
园
困
囱
囲
図
围
囹
固
国
图
囿
圃
圄
圆
圈
國
圍
圏
園
圓
圖
團
圜
土
圣
圧
在
圩
圭
地
圳
场
圻
圾
址
坂
均
坊
坍
坎
坏
坐
坑
块
坚
坛
坝
坞
坟
坠
坡
坤
坦
坨
坪
坯
坳
坵
坷
垂
垃
垄
型
垒
垚
垛
垠
垢
垣
垦
垩
垫
垭
垮
垵
埂
埃
埋
城
埔
埕
埗
域
埠
埤
埵
執
埸
培
基
埼
堀
堂
堃
堅
堆
堇
堑
堕
堙
堡
堤
堪
堯
堰
報
場
堵
堺
堿
塊
塌
塑
塔
塗
塘
塚
塞
塢
塩
填
塬
塭
塵
塾
墀
境
墅
墉
墊
墒
墓
増
墘
墙
墜
增
墟
墨
墩
墮
墳
墻
墾
壁
壅
壆
壇
壊
壑
壓
壕
壘
壞
壟
壢
壤
壩
士
壬
壮
壯
声
売
壳
壶
壹
壺
壽
处
备
変
复
夏
夔
夕
外
夙
多
夜
够
夠
夢
夥
大
天
太
夫
夭
央
夯
失
头
夷
夸
夹
夺
夾
奂
奄
奇
奈
奉
奋
奎
奏
奐
契
奔
奕
奖
套
奘
奚
奠
奢
奥
奧
奪
奬
奮
女
奴
奶
奸
她
好
如
妃
妄
妆
妇
妈
妊
妍
妒
妓
妖
妘
妙
妝
妞
妣
妤
妥
妨
妩
妪
妮
妲
妳
妹
妻
妾
姆
姉
姊
始
姍
姐
姑
姒
姓
委
姗
姚
姜
姝
姣
姥
姦
姨
姪
姫
姬
姹
姻
姿
威
娃
娄
娅
娆
娇
娉
娑
娓
娘
娛
娜
娟
娠
娣
娥
娩
娱
娲
娴
娶
娼
婀
婁
婆
婉
婊
婕
婚
婢
婦
婧
婪
婭
婴
婵
婶
婷
婺
婿
媒
媚
媛
媞
媧
媲
媳
媽
媾
嫁
嫂
嫉
嫌
嫑
嫔
嫖
嫘
嫚
嫡
嫣
嫦
嫩
嫲
嫵
嫻
嬅
嬉
嬌
嬗
嬛
嬢
嬤
嬪
嬰
嬴
嬷
嬸
嬿
孀
孃
子
孑
孔
孕
孖
字
存
孙
孚
孛
孜
孝
孟
孢
季
孤
学
孩
孪
孫
孬
孰
孱
孳
孵
學
孺
孽
孿
宁
它
宅
宇
守
安
宋
完
宏
宓
宕
宗
官
宙
定
宛
宜
宝
实
実
宠
审
客
宣
室
宥
宦
宪
宫
宮
宰
害
宴
宵
家
宸
容
宽
宾
宿
寂
寄
寅
密
寇
富
寐
寒
寓
寛
寝
寞
察
寡
寢
寥
實
寧
寨
審
寫
寬
寮
寰
寵
寶
寸
对
寺
寻
导
対
寿
封
専
射
将
將
專
尉
尊
尋
對
導
小
少
尔
尕
尖
尘
尚
尝
尤
尧
尬
就
尴
尷
尸
尹
尺
尻
尼
尽
尾
尿
局
屁
层
屄
居
屆
屈
屉
届
屋
屌
屍
屎
屏
屐
屑
展
屜
属
屠
屡
屢
層
履
屬
屯
山
屹
屿
岀
岁
岂
岌
岐
岑
岔
岖
岗
岘
岙
岚
岛
岡
岩
岫
岬
岭
岱
岳
岷
岸
峇
峋
峒
峙
峡
峤
峥
峦
峨
峪
峭
峯
峰
峴
島
峻
峽
崁
崂
崆
崇
崎
崑
崔
崖
崗
崙
崛
崧
崩
崭
崴
崽
嵇
嵊
嵋
嵌
嵐
嵘
嵩
嵬
嵯
嶂
嶄
嶇
嶋
嶙
嶺
嶼
嶽
巅
巍
巒
巔
巖
川
州
巡
巢
工
左
巧
巨
巩
巫
差
己
已
巳
巴
巷
巻
巽
巾
巿
币
市
布
帅
帆
师
希
帐
帑
帕
帖
帘
帚
帛
帜
帝
帥
带
帧
師
席
帮
帯
帰
帳
帶
帷
常
帼
帽
幀
幂
幄
幅
幌
幔
幕
幟
幡
幢
幣
幫
干
平
年
并
幸
幹
幺
幻
幼
幽
幾
广
庁
広
庄
庆
庇
床
序
庐
库
应
底
庖
店
庙
庚
府
庞
废
庠
度
座
庫
庭
庵
庶
康
庸
庹
庾
廁
廂
廃
廈
廉
廊
廓
廖
廚
廝
廟
廠
廢
廣
廬
廳
延
廷
建
廿
开
弁
异
弃
弄
弈
弊
弋
式
弑
弒
弓
弔
引
弗
弘
弛
弟
张
弥
弦
弧
弩
弭
弯
弱
張
強
弹
强
弼
弾
彅
彆
彈
彌
彎
归
当
录
彗
彙
彝
形
彤
彥
彦
彧
彩
彪
彫
彬
彭
彰
影
彷
役
彻
彼
彿
往
征
径
待
徇
很
徉
徊
律
後
徐
徑
徒
従
徕
得
徘
徙
徜
從
徠
御
徨
復
循
徬
微
徳
徴
徵
德
徹
徼
徽
心
必
忆
忌
忍
忏
忐
忑
忒
忖
志
忘
忙
応
忠
忡
忤
忧
忪
快
忱
念
忻
忽
忿
怀
态
怂
怅
怆
怎
怏
怒
怔
怕
怖
怙
怜
思
怠
怡
急
怦
性
怨
怪
怯
怵
总
怼
恁
恃
恆
恋
恍
恐
恒
恕
恙
恚
恢
恣
恤
恥
恨
恩
恪
恫
恬
恭
息
恰
恳
恵
恶
恸
恺
恻
恼
恿
悄
悅
悉
悌
悍
悔
悖
悚
悟
悠
患
悦
您
悩
悪
悬
悯
悱
悲
悴
悵
悶
悸
悻
悼
悽
情
惆
惇
惊
惋
惑
惕
惘
惚
惜
惟
惠
惡
惦
惧
惨
惩
惫
惬
惭
惮
惯
惰
惱
想
惴
惶
惹
惺
愁
愆
愈
愉
愍
意
愕
愚
愛
愜
感
愣
愤
愧
愫
愷
愿
慄
慈
態
慌
慎
慑
慕
慘
慚
慟
慢
慣
慧
慨
慫
慮
慰
慳
慵
慶
慷
慾
憂
憊
憋
憎
憐
憑
憔
憚
憤
憧
憨
憩
憫
憬
憲
憶
憾
懂
懇
懈
應
懊
懋
懑
懒
懦
懲
懵
懶
懷
懸
懺
懼
懾
懿
戀
戈
戊
戌
戍
戎
戏
成
我
戒
戕
或
战
戚
戛
戟
戡
戦
截
戬
戮
戰
戲
戳
戴
戶
户
戸
戻
戾
房
所
扁
扇
扈
扉
手
才
扎
扑
扒
打
扔
払
托
扛
扣
扦
执
扩
扪
扫
扬
扭
扮
扯
扰
扱
扳
扶
批
扼
找
承
技
抄
抉
把
抑
抒
抓
投
抖
抗
折
抚
抛
抜
択
抟
抠
抡
抢
护
报
抨
披
抬
抱
抵
抹
押
抽
抿
拂
拄
担
拆
拇
拈
拉
拋
拌
拍
拎
拐
拒
拓
拔
拖
拗
拘
拙
拚
招
拜
拟
拡
拢
拣
拥
拦
拧
拨
择
括
拭
拮
拯
拱
拳
拴
拷
拼
拽
拾
拿
持
挂
指
挈
按
挎
挑
挖
挙
挚
挛
挝
挞
挟
挠
挡
挣
挤
挥
挨
挪
挫
振
挲
挹
挺
挽
挾
捂
捅
捆
捉
捋
捌
捍
捎
捏
捐
捕
捞
损
捡
换
捣
捧
捨
捩
据
捱
捲
捶
捷
捺
捻
掀
掂
掃
掇
授
掉
掌
掏
掐
排
掖
掘
掙
掛
掠
採
探
掣
接
控
推
掩
措
掬
掰
掲
掳
掴
掷
掸
掺
揀
揃
揄
揆
揉
揍
描
提
插
揖
揚
換
握
揣
揩
揪
揭
揮
援
揶
揸
揹
揽
搀
搁
搂
搅
損
搏
搐
搓
搔
搖
搗
搜
搞
搡
搪
搬
搭
搵
搶
携
搽
摀
摁
摄
摆
摇
摈
摊
摒
摔
摘
摞
摟
摧
摩
摯
摳
摸
摹
摺
摻
撂
撃
撅
撇
撈
撐
撑
撒
撓
撕
撚
撞
撤
撥
撩
撫
撬
播
撮
撰
撲
撵
撷
撸
撻
撼
撿
擀
擁
擂
擄
擅
擇
擊
擋
操
擎
擒
擔
擘
據
擞
擠
擡
擢
擦
擬
擰
擱
擲
擴
擷
擺
擼
擾
攀
攏
攒
攔
攘
攙
攜
攝
攞
攢
攣
攤
攥
攪
攫
攬
支
收
攸
改
攻
放
政
故
效
敌
敍
敎
敏
救
敕
敖
敗
敘
教
敛
敝
敞
敢
散
敦
敬
数
敲
整
敵
敷
數
斂
斃
文
斋
斌
斎
斐
斑
斓
斗
料
斛
斜
斟
斡
斤
斥
斧
斩
斫
斬
断
斯
新
斷
方
於
施
旁
旃
旅
旋
旌
旎
族
旖
旗
无
既
日
旦
旧
旨
早
旬
旭
旮
旱
时
旷
旺
旻
昀
昂
昆
昇
昉
昊
昌
明
昏
易
昔
昕
昙
星
映
春
昧
昨
昭
是
昱
昴
昵
昶
昼
显
晁
時
晃
晉
晋
晌
晏
晒
晓
晔
晕
晖
晗
晚
晝
晞
晟
晤
晦
晨
晩
普
景
晰
晴
晶
晷
智
晾
暂
暄
暇
暈
暉
暌
暐
暑
暖
暗
暝
暢
暧
暨
暫
暮
暱
暴
暸
暹
曄
曆
曇
曉
曖
曙
曜
曝
曠
曦
曬
曰
曲
曳
更
書
曹
曼
曾
替
最
會
月
有
朋
服
朐
朔
朕
朗
望
朝
期
朦
朧
木
未
末
本
札
朮
术
朱
朴
朵
机
朽
杀
杂
权
杆
杈
杉
李
杏
材
村
杓
杖
杜
杞
束
杠
条
来
杨
杭
杯
杰
東
杳
杵
杷
杼
松
板
极
构
枇
枉
枋
析
枕
林
枚
果
枝
枢
枣
枪
枫
枭
枯
枰
枱
枳
架
枷
枸
柄
柏
某
柑
柒
染
柔
柘
柚
柜
柞
柠
柢
查
柩
柬
柯
柱
柳
柴
柵
査
柿
栀
栃
栄
栅
标
栈
栉
栋
栎
栏
树
栓
栖
栗
校
栩
株
样
核
根
格
栽
栾
桀
桁
桂
桃
桅
框
案
桉
桌
桎
桐
桑
桓
桔
桜
桠
桡
桢
档
桥
桦
桧
桨
桩
桶
桿
梁
梅
梆
梏
梓
梗
條
梟
梢
梦
梧
梨
梭
梯
械
梳
梵
梶
检
棂
棄
棉
棋
棍
棒
棕
棗
棘
棚
棟
棠
棣
棧
森
棱
棲
棵
棹
棺
椁
椅
椋
植
椎
椒
検
椪
椭
椰
椹
椽
椿
楂
楊
楓
楔
楚
楝
楞
楠
楣
楨
楫
業
楮
極
楷
楸
楹
楼
楽
概
榄
榆
榈
榉
榔
榕
榖
榛
榜
榨
榫
榭
榮
榱
榴
榷
榻
槁
槃
構
槌
槍
槎
槐
槓
様
槛
槟
槤
槭
槲
槳
槻
槽
槿
樁
樂
樊
樑
樓
標
樞
樟
模
樣
権
横
樫
樯
樱
樵
樸
樹
樺
樽
樾
橄
橇
橋
橐
橘
橙
機
橡
橢
橫
橱
橹
橼
檀
檄
檎
檐
檔
檗
檜
檢
檬
檯
檳
檸
檻
櫃
櫚
櫛
櫥
櫸
櫻
欄
權
欒
欖
欠
次
欢
欣
欧
欲
欸
欺
欽
款
歆
歇
歉
歌
歎
歐
歓
歙
歛
歡
止
正
此
步
武
歧
歩
歪
歯
歲
歳
歴
歷
歸
歹
死
歼
殁
殃
殆
殇
殉
殊
残
殒
殓
殖
殘
殞
殡
殤
殭
殯
殲
殴
段
殷
殺
殼
殿
毀
毁
毂
毅
毆
毋
母
毎
每
毒
毓
比
毕
毗
毘
毙
毛
毡
毫
毯
毽
氈
氏
氐
民
氓
气
氖
気
氙
氛
氟
氡
氢
氣
氤
氦
氧
氨
氪
氫
氮
氯
氰
氲
水
氷
永
氹
氾
汀
汁
求
汆
汇
汉
汎
汐
汕
汗
汙
汛
汝
汞
江
池
污
汤
汨
汩
汪
汰
汲
汴
汶
汹
決
汽
汾
沁
沂
沃
沅
沈
沉
沌
沏
沐
沒
沓
沖
沙
沛
沟
没
沢
沣
沥
沦
沧
沪
沫
沭
沮
沱
河
沸
油
治
沼
沽
沾
沿
況
泄
泉
泊
泌
泓
法
泗
泛
泞
泠
泡
波
泣
泥
注
泪
泫
泮
泯
泰
泱
泳
泵
泷
泸
泻
泼
泽
泾
洁
洄
洋
洒
洗
洙
洛
洞
津
洩
洪
洮
洱
洲
洵
洶
洸
洹
活
洼
洽
派
流
浃
浄
浅
浆
浇
浊
测
济
浏
浑
浒
浓
浔
浙
浚
浜
浣
浦
浩
浪
浬
浮
浯
浴
海
浸
涂
涅
涇
消
涉
涌
涎
涓
涔
涕
涙
涛
涝
涞
涟
涠
涡
涣
涤
润
涧
涨
涩
涪
涮
涯
液
涵
涸
涼
涿
淀
淄
淅
淆
淇
淋
淌
淑
淒
淖
淘
淙
淚
淞
淡
淤
淦
淨
淩
淪
淫
淬
淮
深
淳
淵
混
淹
淺
添
淼
清
済
渉
渊
渋
渍
渎
渐
渔
渗
渙
渚
減
渝
渠
渡
渣
渤
渥
渦
温
測
渭
港
渲
渴
游
渺
渾
湃
湄
湊
湍
湖
湘
湛
湟
湧
湫
湮
湯
湳
湾
湿
満
溃
溅
溉
溏
源
準
溜
溝
溟
溢
溥
溧
溪
溫
溯
溱
溴
溶
溺
溼
滁
滂
滄
滅
滇
滋
滌
滑
滓
滔
滕
滙
滚
滝
滞
滟
满
滢
滤
滥
滦
滨
滩
滬
滯
滲
滴
滷
滸
滾
滿
漁
漂
漆
漉
漏
漓
演
漕
漠
漢
漣
漩
漪
漫
漬
漯
漱
漲
漳
漸
漾
漿
潆
潇
潋
潍
潑
潔
潘
潛
潜
潞
潟
潢
潤
潦
潧
潭
潮
潰
潴
潸
潺
潼
澀
澄
澆
澈
澍
澎
澗
澜
澡
澤
澧
澱
澳
澹
激
濁
濂
濃
濑
濒
濕
濘
濛
濟
濠
濡
濤
濫
濬
濮
濯
濱
濺
濾
瀅
瀆
瀉
瀋
瀏
瀑
瀕
瀘
瀚
瀛
瀝
瀞
瀟
瀧
瀨
瀬
瀰
瀾
灌
灏
灑
灘
灝
灞
灣
火
灬
灭
灯
灰
灵
灶
灸
灼
災
灾
灿
炀
炁
炅
炉
炊
炎
炒
炔
炕
炖
炙
炜
炫
炬
炭
炮
炯
炳
炷
炸
点
為
炼
炽
烁
烂
烃
烈
烊
烏
烘
烙
烛
烟
烤
烦
烧
烨
烩
烫
烬
热
烯
烷
烹
烽
焉
焊
焕
焖
焗
焘
焙
焚
焜
無
焦
焯
焰
焱
然
焼
煅
煉
煊
煌
煎
煒
煖
煙
煜
煞
煤
煥
煦
照
煨
煩
煮
煲
煸
煽
熄
熊
熏
熒
熔
熙
熟
熠
熨
熬
熱
熵
熹
熾
燁
燃
燄
燈
燉
燊
燎
燒
燔
燕
燙
燜
營
燥
燦
燧
燭
燮
燴
燻
燼
燿
爆
爍
爐
爛
爪
爬
爭
爰
爱
爲
爵
父
爷
爸
爹
爺
爻
爽
爾
牆
片
版
牌
牍
牒
牙
牛
牝
牟
牠
牡
牢
牦
牧
物
牯
牲
牴
牵
特
牺
牽
犀
犁
犄
犊
犍
犒
犢
犧
犬
犯
状
犷
犸
犹
狀
狂
狄
狈
狎
狐
狒
狗
狙
狞
狠
狡
狩
独
狭
狮
狰
狱
狸
狹
狼
狽
猎
猕
猖
猗
猙
猛
猜
猝
猥
猩
猪
猫
猬
献
猴
猶
猷
猾
猿
獄
獅
獎
獐
獒
獗
獠
獣
獨
獭
獰
獲
獵
獷
獸
獺
獻
獼
獾
玄
率
玉
王
玑
玖
玛
玟
玠
玥
玩
玫
玮
环
现
玲
玳
玷
玺
玻
珀
珂
珅
珈
珉
珊
珍
珏
珐
珑
珙
珞
珠
珣
珥
珩
珪
班
珮
珲
珺
現
球
琅
理
琇
琉
琊
琍
琏
琐
琛
琢
琥
琦
琨
琪
琬
琮
琰
琲
琳
琴
琵
琶
琺
琼
瑀
瑁
瑄
瑋
瑕
瑗
瑙
瑚
瑛
瑜
瑞
瑟
瑠
瑣
瑤
瑩
瑪
瑯
瑰
瑶
瑾
璀
璁
璃
璇
璉
璋
璎
璐
璜
璞
璟
璧
璨
環
璽
璿
瓊
瓏
瓒
瓜
瓢
瓣
瓤
瓦
瓮
瓯
瓴
瓶
瓷
甄
甌
甕
甘
甙
甚
甜
生
產
産
甥
甦
用
甩
甫
甬
甭
甯
田
由
甲
申
电
男
甸
町
画
甾
畀
畅
界
畏
畑
畔
留
畜
畝
畢
略
畦
番
畫
異
畲
畳
畴
當
畸
畹
畿
疆
疇
疊
疏
疑
疔
疖
疗
疙
疚
疝
疟
疡
疣
疤
疥
疫
疮
疯
疱
疲
疳
疵
疸
疹
疼
疽
疾
痂
病
症
痈
痉
痊
痍
痒
痔
痕
痘
痙
痛
痞
痠
痢
痣
痤
痧
痨
痪
痫
痰
痱
痴
痹
痺
痼
痿
瘀
瘁
瘋
瘍
瘓
瘘
瘙
瘟
瘠
瘡
瘢
瘤
瘦
瘧
瘩
瘪
瘫
瘴
瘸
瘾
療
癇
癌
癒
癖
癜
癞
癡
癢
癣
癥
癫
癬
癮
癱
癲
癸
発
登
發
白
百
皂
的
皆
皇
皈
皋
皎
皑
皓
皖
皙
皚
皮
皰
皱
皴
皺
皿
盂
盃
盅
盆
盈
益
盎
盏
盐
监
盒
盔
盖
盗
盘
盛
盜
盞
盟
盡
監
盤
盥
盧
盪
目
盯
盱
盲
直
相
盹
盼
盾
省
眈
眉
看
県
眙
眞
真
眠
眦
眨
眩
眯
眶
眷
眸
眺
眼
眾
着
睁
睇
睏
睐
睑
睛
睜
睞
睡
睢
督
睥
睦
睨
睪
睫
睬
睹
睽
睾
睿
瞄
瞅
瞇
瞋
瞌
瞎
瞑
瞒
瞓
瞞
瞟
瞠
瞥
瞧
瞩
瞪
瞬
瞭
瞰
瞳
瞻
瞼
瞿
矇
矍
矗
矚
矛
矜
矢
矣
知
矩
矫
短
矮
矯
石
矶
矽
矾
矿
码
砂
砌
砍
砒
研
砖
砗
砚
砝
砣
砥
砧
砭
砰
砲
破
砷
砸
砺
砼
砾
础
硅
硐
硒
硕
硝
硫
硬
确
硯
硼
碁
碇
碉
碌
碍
碎
碑
碓
碗
碘
碚
碛
碟
碣
碧
碩
碰
碱
碳
碴
確
碼
碾
磁
磅
磊
磋
磐
磕
磚
磡
磨
磬
磯
磲
磷
磺
礁
礎
礙
礡
礦
礪
礫
礴
示
礼
社
祀
祁
祂
祇
祈
祉
祎
祐
祕
祖
祗
祚
祛
祜
祝
神
祟
祠
祢
祥
票
祭
祯
祷
祸
祺
祿
禀
禁
禄
禅
禍
禎
福
禛
禦
禧
禪
禮
禱
禹
禺
离
禽
禾
禿
秀
私
秃
秆
秉
秋
种
科
秒
秘
租
秣
秤
秦
秧
秩
秭
积
称
秸
移
秽
稀
稅
程
稍
税
稔
稗
稚
稜
稞
稟
稠
稣
種
稱
稲
稳
稷
稹
稻
稼
稽
稿
穀
穂
穆
穌
積
穎
穗
穢
穩
穫
穴
究
穷
穹
空
穿
突
窃
窄
窈
窍
窑
窒
窓
窕
窖
窗
窘
窜
窝
窟
窠
窥
窦
窨
窩
窪
窮
窯
窺
窿
竄
竅
竇
竊
立
竖
站
竜
竞
竟
章
竣
童
竭
端
競
竹
竺
竽
竿
笃
笆
笈
笋
笏
笑
笔
笙
笛
笞
笠
符
笨
第
笹
笺
笼
筆
等
筊
筋
筍
筏
筐
筑
筒
答
策
筛
筝
筠
筱
筲
筵
筷
筹
签
简
箇
箋
箍
箏
箐
箔
箕
算
箝
管
箩
箫
箭
箱
箴
箸
節
篁
範
篆
篇
築
篑
篓
篙
篝
篠
篡
篤
篩
篪
篮
篱
篷
簇
簌
簍
簡
簦
簧
簪
簫
簷
簸
簽
簾
簿
籁
籃
籌
籍
籐
籟
籠
籤
籬
籮
籲
米
类
籼
籽
粄
粉
粑
粒
粕
粗
粘
粟
粤
粥
粧
粪
粮
粱
粲
粳
粵
粹
粼
粽
精
粿
糅
糊
糍
糕
糖
糗
糙
糜
糞
糟
糠
糧
糬
糯
糰
糸
系
糾
紀
紂
約
紅
紉
紊
紋
納
紐
紓
純
紗
紘
紙
級
紛
紜
素
紡
索
紧
紫
紮
累
細
紳
紹
紺
終
絃
組
絆
経
結
絕
絞
絡
絢
給
絨
絮
統
絲
絳
絵
絶
絹
綁
綏
綑
經
継
続
綜
綠
綢
綦
綫
綬
維
綱
網
綴
綵
綸
綺
綻
綽
綾
綿
緊
緋
総
緑
緒
緘
線
緝
緞
締
緣
編
緩
緬
緯
練
緹
緻
縁
縄
縈
縛
縝
縣
縫
縮
縱
縴
縷
總
績
繁
繃
繆
繇
繋
織
繕
繚
繞
繡
繩
繪
繫
繭
繳
繹
繼
繽
纂
續
纍
纏
纓
纔
纖
纜
纠
红
纣
纤
约
级
纨
纪
纫
纬
纭
纯
纰
纱
纲
纳
纵
纶
纷
纸
纹
纺
纽
纾
线
绀
练
组
绅
细
织
终
绊
绍
绎
经
绑
绒
结
绔
绕
绘
给
绚
绛
络
绝
绞
统
绡
绢
绣
绥
绦
继
绩
绪
绫
续
绮
绯
绰
绳
维
绵
绶
绷
绸
绻
综
绽
绾
绿
缀
缄
缅
缆
缇
缈
缉
缎
缓
缔
缕
编
缘
缙
缚
缜
缝
缠
缢
缤
缥
缨
缩
缪
缭
缮
缰
缱
缴
缸
缺
缽
罂
罄
罌
罐
网
罔
罕
罗
罚
罡
罢
罩
罪
置
罰
署
罵
罷
罹
羁
羅
羈
羊
羌
美
羔
羚
羞
羟
羡
羣
群
羥
羧
羨
義
羯
羲
羸
羹
羽
羿
翁
翅
翊
翌
翎
習
翔
翘
翟
翠
翡
翦
翩
翰
翱
翳
翹
翻
翼
耀
老
考
耄
者
耆
耋
而
耍
耐
耒
耕
耗
耘
耙
耦
耨
耳
耶
耷
耸
耻
耽
耿
聂
聆
聊
聋
职
聒
联
聖
聘
聚
聞
聪
聯
聰
聲
聳
聴
聶
職
聽
聾
聿
肃
肄
肅
肆
肇
肉
肋
肌
肏
肓
肖
肘
肚
肛
肝
肠
股
肢
肤
肥
肩
肪
肮
肯
肱
育
肴
肺
肽
肾
肿
胀
胁
胃
胄
胆
背
胍
胎
胖
胚
胛
胜
胝
胞
胡
胤
胥
胧
胫
胭
胯
胰
胱
胳
胴
胶
胸
胺
能
脂
脅
脆
脇
脈
脉
脊
脍
脏
脐
脑
脓
脖
脘
脚
脛
脣
脩
脫
脯
脱
脲
脳
脸
脹
脾
腆
腈
腊
腋
腌
腎
腐
腑
腓
腔
腕
腥
腦
腩
腫
腭
腮
腰
腱
腳
腴
腸
腹
腺
腻
腼
腾
腿
膀
膈
膊
膏
膑
膘
膚
膛
膜
膝
膠
膦
膨
膩
膳
膺
膻
膽
膾
膿
臀
臂
臃
臆
臉
臊
臍
臓
臘
臟
臣
臥
臧
臨
自
臬
臭
至
致
臺
臻
臼
臾
舀
舂
舅
舆
與
興
舉
舊
舌
舍
舎
舐
舒
舔
舖
舗
舛
舜
舞
舟
航
舫
般
舰
舱
舵
舶
舷
舸
船
舺
舾
艇
艋
艘
艙
艦
艮
良
艰
艱
色
艳
艷
艹
艺
艾
节
芃
芈
芊
芋
芍
芎
芒
芙
芜
芝
芡
芥
芦
芩
芪
芫
芬
芭
芮
芯
花
芳
芷
芸
芹
芻
芽
芾
苁
苄
苇
苋
苍
苏
苑
苒
苓
苔
苕
苗
苛
苜
苞
苟
苡
苣
若
苦
苫
苯
英
苷
苹
苻
茁
茂
范
茄
茅
茉
茎
茏
茗
茜
茧
茨
茫
茬
茭
茯
茱
茲
茴
茵
茶
茸
茹
茼
荀
荃
荆
草
荊
荏
荐
荒
荔
荖
荘
荚
荞
荟
荠
荡
荣
荤
荥
荧
荨
荪
荫
药
荳
荷
荸
荻
荼
荽
莅
莆
莉
莊
莎
莒
莓
莖
莘
莞
莠
莢
莧
莪
莫
莱
莲
莴
获
莹
莺
莽
莿
菀
菁
菅
菇
菈
菊
菌
菏
菓
菖
菘
菜
菟
菠
菡
菩
華
菱
菲
菸
菽
萁
萃
萄
萊
萋
萌
萍
萎
萘
萝
萤
营
萦
萧
萨
萩
萬
萱
萵
萸
萼
落
葆
葉
著
葚
葛
葡
董
葦
葩
葫
葬
葭
葯
葱
葳
葵
葷
葺
蒂
蒋
蒐
蒔
蒙
蒜
蒞
蒟
蒡
蒨
蒲
蒸
蒹
蒻
蒼
蒿
蓁
蓄
蓆
蓉
蓋
蓑
蓓
蓖
蓝
蓟
蓦
蓬
蓮
蓼
蓿
蔑
蔓
蔔
蔗
蔘
蔚
蔡
蔣
蔥
蔫
蔬
蔭
蔵
蔷
蔺
蔻
蔼
蔽
蕁
蕃
蕈
蕉
蕊
蕎
蕙
蕤
蕨
蕩
蕪
蕭
蕲
蕴
蕻
蕾
薄
薅
薇
薈
薊
薏
薑
薔
薙
薛
薦
薨
薩
薪
薬
薯
薰
薹
藉
藍
藏
藐
藓
藕
藜
藝
藤
藥
藩
藹
藻
藿
蘆
蘇
蘊
蘋
蘑
蘚
蘭
蘸
蘼
蘿
虎
虏
虐
虑
虔
處
虚
虛
虜
虞
號
虢
虧
虫
虬
虱
虹
虻
虽
虾
蚀
蚁
蚂
蚊
蚌
蚓
蚕
蚜
蚝
蚣
蚤
蚩
蚪
蚯
蚱
蚵
蛀
蛆
蛇
蛊
蛋
蛎
蛐
蛔
蛙
蛛
蛟
蛤
蛭
蛮
蛰
蛳
蛹
蛻
蛾
蜀
蜂
蜃
蜆
蜇
蜈
蜊
蜍
蜒
蜓
蜕
蜗
蜘
蜚
蜜
蜡
蜢
蜥
蜱
蜴
蜷
蜻
蜿
蝇
蝈
蝉
蝌
蝎
蝕
蝗
蝙
蝟
蝠
蝦
蝨
蝴
蝶
蝸
蝼
螂
螃
融
螞
螢
螨
螯
螳
螺
蟀
蟄
蟆
蟋
蟎
蟑
蟒
蟠
蟬
蟲
蟹
蟻
蟾
蠅
蠍
蠔
蠕
蠛
蠟
蠡
蠢
蠣
蠱
蠶
蠹
蠻
血
衄
衅
衆
行
衍
術
衔
街
衙
衛
衝
衞
衡
衢
衣
补
表
衩
衫
衬
衮
衰
衲
衷
衹
衾
衿
袁
袂
袄
袅
袈
袋
袍
袒
袖
袜
袞
袤
袪
被
袭
袱
裁
裂
装
裆
裊
裏
裔
裕
裘
裙
補
裝
裟
裡
裤
裨
裱
裳
裴
裸
裹
製
裾
褂
複
褐
褒
褓
褔
褚
褥
褪
褫
褲
褶
褻
襁
襄
襟
襠
襪
襬
襯
襲
西
要
覃
覆
覇
見
規
覓
視
覚
覦
覧
親
覬
観
覷
覺
覽
觀
见
观
规
觅
视
览
觉
觊
觎
觐
觑
角
觞
解
觥
触
觸
言
訂
計
訊
討
訓
訕
訖
託
記
訛
訝
訟
訣
訥
訪
設
許
訳
訴
訶
診
註
証
詆
詐
詔
評
詛
詞
詠
詡
詢
詣
試
詩
詫
詬
詭
詮
詰
話
該
詳
詹
詼
誅
誇
誉
誌
認
誓
誕
誘
語
誠
誡
誣
誤
誥
誦
誨
說
説
読
誰
課
誹
誼
調
諄
談
請
諏
諒
論
諗
諜
諡
諦
諧
諫
諭
諮
諱
諳
諷
諸
諺
諾
謀
謁
謂
謄
謊
謎
謐
謔
謗
謙
講
謝
謠
謨
謬
謹
謾
譁
證
譎
譏
識
譙
譚
譜
警
譬
譯
議
譲
譴
護
譽
讀
變
讓
讚
讞
计
订
认
讥
讧
讨
让
讪
讫
训
议
讯
记
讲
讳
讴
讶
讷
许
讹
论
讼
讽
设
访
诀
证
诃
评
诅
识
诈
诉
诊
诋
词
诏
译
试
诗
诘
诙
诚
诛
话
诞
诟
诠
诡
询
诣
诤
该
详
诧
诩
诫
诬
语
误
诰
诱
诲
说
诵
诶
请
诸
诺
读
诽
课
诿
谀
谁
调
谄
谅
谆
谈
谊
谋
谌
谍
谎
谏
谐
谑
谒
谓
谔
谕
谗
谘
谙
谚
谛
谜
谟
谢
谣
谤
谥
谦
谧
谨
谩
谪
谬
谭
谯
谱
谲
谴
谶
谷
豁
豆
豇
豈
豉
豊
豌
豎
豐
豔
豚
象
豢
豪
豫
豬
豹
豺
貂
貅
貌
貓
貔
貘
貝
貞
負
財
貢
貧
貨
販
貪
貫
責
貯
貰
貳
貴
貶
買
貸
費
貼
貽
貿
賀
賁
賂
賃
賄
資
賈
賊
賑
賓
賜
賞
賠
賡
賢
賣
賤
賦
質
賬
賭
賴
賺
購
賽
贅
贈
贊
贍
贏
贓
贖
贛
贝
贞
负
贡
财
责
贤
败
账
货
质
贩
贪
贫
贬
购
贮
贯
贰
贱
贲
贴
贵
贷
贸
费
贺
贻
贼
贾
贿
赁
赂
赃
资
赅
赈
赊
赋
赌
赎
赏
赐
赓
赔
赖
赘
赚
赛
赝
赞
赠
赡
赢
赣
赤
赦
赧
赫
赭
走
赳
赴
赵
赶
起
趁
超
越
趋
趕
趙
趟
趣
趨
足
趴
趵
趸
趺
趾
跃
跄
跆
跋
跌
跎
跑
跖
跚
跛
距
跟
跡
跤
跨
跩
跪
路
跳
践
跷
跹
跺
跻
踉
踊
踌
踏
踐
踝
踞
踟
踢
踩
踪
踮
踱
踴
踵
踹
蹂
蹄
蹇
蹈
蹉
蹊
蹋
蹑
蹒
蹙
蹟
蹣
蹤
蹦
蹩
蹬
蹭
蹲
蹴
蹶
蹺
蹼
蹿
躁
躇
躉
躊
躋
躍
躏
躪
身
躬
躯
躲
躺
軀
車
軋
軌
軍
軒
軟
転
軸
軼
軽
軾
較
載
輒
輓
輔
輕
輛
輝
輟
輩
輪
輯
輸
輻
輾
輿
轄
轅
轆
轉
轍
轎
轟
车
轧
轨
轩
转
轭
轮
软
轰
轲
轴
轶
轻
轼
载
轿
较
辄
辅
辆
辇
辈
辉
辊
辍
辐
辑
输
辕
辖
辗
辘
辙
辛
辜
辞
辟
辣
辦
辨
辩
辫
辭
辮
辯
辰
辱
農
边
辺
辻
込
辽
达
迁
迂
迄
迅
过
迈
迎
运
近
返
还
这
进
远
违
连
迟
迢
迤
迥
迦
迩
迪
迫
迭
述
迴
迷
迸
迹
迺
追
退
送
适
逃
逅
逆
选
逊
逍
透
逐
递
途
逕
逗
這
通
逛
逝
逞
速
造
逢
連
逮
週
進
逵
逶
逸
逻
逼
逾
遁
遂
遅
遇
遊
運
遍
過
遏
遐
遑
遒
道
達
違
遗
遙
遛
遜
遞
遠
遢
遣
遥
遨
適
遭
遮
遲
遴
遵
遶
遷
選
遺
遼
遽
避
邀
邁
邂
邃
還
邇
邈
邊
邋
邏
邑
邓
邕
邛
邝
邢
那
邦
邨
邪
邬
邮
邯
邰
邱
邳
邵
邸
邹
邺
邻
郁
郅
郊
郎
郑
郜
郝
郡
郢
郤
郦
郧
部
郫
郭
郴
郵
郷
郸
都
鄂
鄉
鄒
鄔
鄙
鄞
鄢
鄧
鄭
鄰
鄱
鄲
鄺
酉
酊
酋
酌
配
酐
酒
酗
酚
酝
酢
酣
酥
酩
酪
酬
酮
酯
酰
酱
酵
酶
酷
酸
酿
醃
醇
醉
醋
醍
醐
醒
醚
醛
醜
醞
醣
醪
醫
醬
醮
醯
醴
醺
釀
釁
采
釉
释
釋
里
重
野
量
釐
金
釗
釘
釜
針
釣
釦
釧
釵
鈀
鈉
鈍
鈎
鈔
鈕
鈞
鈣
鈦
鈪
鈴
鈺
鈾
鉀
鉄
鉅
鉉
鉑
鉗
鉚
鉛
鉤
鉴
鉻
銀
銃
銅
銑
銓
銖
銘
銜
銬
銭
銮
銳
銷
銹
鋁
鋅
鋒
鋤
鋪
鋰
鋸
鋼
錄
錐
錘
錚
錠
錢
錦
錨
錫
錮
錯
録
錳
錶
鍊
鍋
鍍
鍛
鍥
鍰
鍵
鍺
鍾
鎂
鎊
鎌
鎏
鎔
鎖
鎗
鎚
鎧
鎬
鎮
鎳
鏈
鏖
鏗
鏘
鏞
鏟
鏡
鏢
鏤
鏽
鐘
鐮
鐲
鐳
鐵
鐸
鐺
鑄
鑊
鑑
鑒
鑣
鑫
鑰
鑲
鑼
鑽
鑾
鑿
针
钉
钊
钎
钏
钒
钓
钗
钙
钛
钜
钝
钞
钟
钠
钡
钢
钣
钤
钥
钦
钧
钨
钩
钮
钯
钰
钱
钳
钴
钵
钺
钻
钼
钾
钿
铀
铁
铂
铃
铄
铅
铆
铉
铎
铐
铛
铜
铝
铠
铡
铢
铣
铤
铨
铩
铬
铭
铮
铰
铲
铵
银
铸
铺
链
铿
销
锁
锂
锄
锅
锆
锈
锉
锋
锌
锏
锐
锑
错
锚
锟
锡
锢
锣
锤
锥
锦
锭
键
锯
锰
锲
锵
锹
锺
锻
镀
镁
镂
镇
镉
镌
镍
镐
镑
镕
镖
镗
镛
镜
镣
镭
镯
镰
镳
镶
長
长
門
閃
閉
開
閎
閏
閑
閒
間
閔
閘
閡
関
閣
閥
閨
閩
閱
閲
閹
閻
閾
闆
闇
闊
闌
闍
闔
闕
闖
闘
關
闡
闢
门
闪
闫
闭
问
闯
闰
闲
间
闵
闷
闸
闹
闺
闻
闽
闾
阀
阁
阂
阅
阆
阇
阈
阉
阎
阐
阑
阔
阕
阖
阙
阚
阜
队
阡
阪
阮
阱
防
阳
阴
阵
阶
阻
阿
陀
陂
附
际
陆
陇
陈
陋
陌
降
限
陕
陛
陝
陞
陟
陡
院
陣
除
陨
险
陪
陰
陲
陳
陵
陶
陷
陸
険
陽
隅
隆
隈
隊
隋
隍
階
随
隐
隔
隕
隘
隙
際
障
隠
隣
隧
隨
險
隱
隴
隶
隸
隻
隼
隽
难
雀
雁
雄
雅
集
雇
雉
雋
雌
雍
雎
雏
雑
雒
雕
雖
雙
雛
雜
雞
離
難
雨
雪
雯
雰
雲
雳
零
雷
雹
電
雾
需
霁
霄
霆
震
霈
霉
霊
霍
霎
霏
霑
霓
霖
霜
霞
霧
霭
霰
露
霸
霹
霽
霾
靂
靄
靈
青
靓
靖
静
靚
靛
靜
非
靠
靡
面
靥
靦
革
靳
靴
靶
靼
鞅
鞋
鞍
鞏
鞑
鞘
鞠
鞣
鞦
鞭
韆
韋
韌
韓
韜
韦
韧
韩
韬
韭
音
韵
韶
韻
響
頁
頂
頃
項
順
須
頌
預
頑
頒
頓
頗
領
頜
頡
頤
頫
頭
頰
頷
頸
頹
頻
頼
顆
題
額
顎
顏
顔
願
顛
類
顧
顫
顯
顱
顴
页
顶
顷
项
顺
须
顼
顽
顾
顿
颁
颂
预
颅
领
颇
颈
颉
颊
颌
颍
颐
频
颓
颔
颖
颗
题
颚
颛
颜
额
颞
颠
颡
颢
颤
颦
颧
風
颯
颱
颳
颶
颼
飄
飆
风
飒
飓
飕
飘
飙
飚
飛
飞
食
飢
飨
飩
飪
飯
飲
飼
飽
飾
餃
餅
餉
養
餌
餐
餒
餓
餘
餚
餛
餞
餡
館
餮
餵
餾
饅
饈
饋
饌
饍
饑
饒
饕
饗
饞
饥
饨
饪
饬
饭
饮
饯
饰
饱
饲
饴
饵
饶
饷
饺
饼
饽
饿
馀
馁
馄
馅
馆
馈
馋
馍
馏
馒
馔
首
馗
香
馥
馨
馬
馭
馮
馳
馴
駁
駄
駅
駆
駐
駒
駕
駛
駝
駭
駱
駿
騁
騎
騏
験
騙
騨
騰
騷
驀
驅
驊
驍
驒
驕
驗
驚
驛
驟
驢
驥
马
驭
驮
驯
驰
驱
驳
驴
驶
驷
驸
驹
驻
驼
驾
驿
骁
骂
骄
骅
骆
骇
骈
骊
骋
验
骏
骐
骑
骗
骚
骛
骜
骞
骠
骡
骤
骥
骧
骨
骯
骰
骶
骷
骸
骼
髂
髅
髋
髏
髒
髓
體
髖
高
髦
髪
髮
髯
髻
鬃
鬆
鬍
鬓
鬚
鬟
鬢
鬣
鬥
鬧
鬱
鬼
魁
魂
魄
魅
魇
魍
魏
魔
魘
魚
魯
魷
鮑
鮨
鮪
鮭
鮮
鯉
鯊
鯖
鯛
鯨
鯰
鯽
鰍
鰓
鰭
鰲
鰻
鰾
鱈
鱉
鱔
鱗
鱷
鱸
鱼
鱿
鲁
鲈
鲍
鲑
鲛
鲜
鲟
鲢
鲤
鲨
鲫
鲱
鲲
鲶
鲷
鲸
鳃
鳄
鳅
鳌
鳍
鳕
鳖
鳗
鳝
鳞
鳥
鳩
鳳
鳴
鳶
鴉
鴕
鴛
鴦
鴨
鴻
鴿
鵑
鵜
鵝
鵡
鵬
鵰
鵲
鶘
鶩
鶯
鶴
鷗
鷲
鷹
鷺
鸚
鸞
鸟
鸠
鸡
鸢
鸣
鸥
鸦
鸨
鸪
鸭
鸯
鸳
鸵
鸽
鸾
鸿
鹂
鹃
鹄
鹅
鹈
鹉
鹊
鹌
鹏
鹑
鹕
鹘
鹜
鹞
鹤
鹦
鹧
鹫
鹭
鹰
鹳
鹵
鹹
鹼
鹽
鹿
麂
麋
麒
麓
麗
麝
麟
麥
麦
麩
麴
麵
麸
麺
麻
麼
麽
麾
黃
黄
黍
黎
黏
黑
黒
黔
默
黛
黜
黝
點
黠
黨
黯
黴
鼋
鼎
鼐
鼓
鼠
鼬
鼹
鼻
鼾
齁
齊
齋
齐
齒
齡
齢
齣
齦
齿
龄
龅
龈
龊
龋
龌
龍
龐
龔
龕
龙
龚
龛
龜
龟
︰
︱
︶
︿
﹁
﹂
﹍
﹏
﹐
﹑
﹒
﹔
﹕
﹖
﹗
﹙
﹚
﹝
﹞
﹡
﹣
！
＂
＃
＄
％
＆
＇
（
）
＊
＋
，
－
．
／
０
１
２
３
４
５
６
７
８
９
：
；
＜
＝
＞
？
＠
［
＼
］
＾
＿
｀
ａ
ｂ
ｃ
ｄ
ｅ
ｆ
ｇ
ｈ
ｉ
ｊ
ｋ
ｌ
ｍ
ｎ
ｏ
ｐ
ｑ
ｒ
ｓ
ｔ
ｕ
ｖ
ｗ
ｘ
ｙ
ｚ
｛
｜
｝
～
｡
｢
｣
､
･
ｯ
ｰ
ｲ
ｸ
ｼ
ｽ
ﾄ
ﾉ
ﾌ
ﾗ
ﾙ
ﾝ
ﾞ
ﾟ
￣
￥
👍
🔥
😂
😎
...
yam
10
2017
12
11
2016
20
30
15
06
lofter
##s
2015
by
16
14
18
13
24
17
2014
21
##0
22
19
25
23
com
100
00
05
2013
##a
03
09
08
28
##2
50
01
04
##1
27
02
2012
##3
26
##e
07
##8
##5
##6
##4
##9
##7
29
2011
40
##t
2010
##o
##d
##i
2009
##n
app
www
the
##m
31
##c
##l
##y
##r
##g
2008
60
http
200
qq
##p
80
##f
google
pixnet
90
cookies
tripadvisor
500
##er
##k
35
##h
facebook
2007
2000
70
##b
of
##x
##u
45
300
iphone
32
1000
2006
48
ip
36
in
38
3d
##w
##ing
55
ctrip
##on
##v
33
##の
to
34
400
id
2005
it
37
windows
llc
top
99
42
39
000
led
at
##an
41
51
52
46
49
43
53
44
##z
android
58
and
59
2004
56
vr
##か
5000
2003
47
blogthis
twitter
54
##le
150
ok
2018
57
75
cn
no
ios
##in
##mm
##00
800
on
te
3000
65
2001
360
95
ig
lv
120
##ng
##を
##us
##に
pc
てす
──
600
##te
85
2002
88
##ed
html
ncc
wifi
email
64
blog
is
##10
##て
mail
online
##al
dvd
##ic
studio
##は
##℃
##ia
##と
line
vip
72
##q
98
##ce
##en
for
##is
##ra
##es
##j
usb
net
cp
1999
asia
4g
##cm
diy
new
3c
##お
ta
66
language
vs
apple
tw
86
web
##ne
ipad
62
you
##re
101
68
##tion
ps
de
bt
pony
atm
##2017
1998
67
##ch
ceo
##or
go
##na
av
pro
cafe
96
pinterest
97
63
pixstyleme3c
##ta
more
said
##2016
1997
mp3
700
##ll
nba
jun
##20
92
tv
1995
pm
61
76
nbsp
250
##ie
linux
##ma
cd
110
hd
##17
78
##ion
77
6000
am
##th
##st
94
##se
##et
69
180
gdp
my
105
81
abc
89
flash
79
one
93
1990
1996
##ck
gps
##も
##ly
web885
106
2020
91
##ge
4000
1500
xd
boss
isbn
1994
org
##ry
me
love
##11
0fork
73
##12
3g
##ter
##ar
71
82
##la
hotel
130
1970
pk
83
87
140
ie
##os
##30
##el
74
##50
seo
cpu
##ml
p2p
84
may
##る
sun
tue
internet
cc
posted
youtube
##at
##ン
##man
ii
##ル
##15
abs
nt
pdf
yahoo
ago
1980
##it
news
mac
104
##てす
##me
##り
java
1992
spa
##de
##nt
hk
all
plus
la
1993
##mb
##16
##ve
west
##da
160
air
##い
##ps
から
##to
1989
logo
htc
php
https
fi
momo
##son
sat
##ke
##80
ebd
suv
wi
day
apk
##88
##um
mv
galaxy
wiki
or
brake
##ス
1200
する
this
1991
mon
##こ
❤2017
po
##ない
javascript
life
home
june
##ss
system
900
##ー
##０
pp
1988
world
fb
4k
br
##as
ic
ai
leonardo
safari
##60
live
free
xx
wed
win7
kiehl
##co
lg
o2o
##go
us
235
1949
mm
しい
vfm
kanye
##90
##2015
##id
jr
##ey
123
rss
##sa
##ro
##am
##no
thu
fri
350
##sh
##ki
103
comments
name
##のて
##pe
##ine
max
1987
8000
uber
##mi
##ton
wordpress
office
1986
1985
##ment
107
bd
win10
##ld
##li
gmail
bb
dior
##rs
##ri
##rd
##ます
up
cad
##®
dr
して
read
##21
をお
##io
##99
url
1984
pvc
paypal
show
policy
##40
##ty
##18
with
##★
##01
txt
102
##ba
dna
from
post
mini
ar
taiwan
john
##ga
privacy
agoda
##13
##ny
word
##24
##22
##by
##ur
##hz
1982
##ang
265
cookie
netscape
108
##ka
##～
##ad
house
share
note
ibm
code
hello
nike
sim
survey
##016
1979
1950
wikia
##32
##017
5g
cbc
##tor
##kg
1983
##rt
##14
campaign
store
2500
os
##ct
##ts
##°
170
api
##ns
365
excel
##な
##ao
##ら
##し
～～
##nd
university
163
には
518
##70
##ya
##il
##25
pierre
ipo
0020
897
##23
hotels
##ian
のお
125
years
6606
##ers
##26
high
##day
time
##ay
bug
##line
##く
##す
##be
xp
talk2yam
yamservice
10000
coco
##dy
sony
##ies
1978
microsoft
david
people
##ha
1960
instagram
intel
その
##ot
iso
1981
##va
115
##mo
##land
xxx
man
co
ltxsw
##ation
baby
220
##pa
##ol
1945
7000
tag
450
##ue
msn
##31
oppo
##ト
##ca
control
##om
st
chrome
##ure
##ん
be
##き
lol
##19
した
##bo
240
lady
##100
##way
##から
4600
##ko
##do
##un
4s
corporation
168
##ni
herme
##28
ｃｐ
978
##up
##06
ui
##ds
ppt
admin
three
します
bbc
re
128
##48
ca
##015
##35
hp
##ee
tpp
##た
##ive
××
root
##cc
##ました
##ble
##ity
adobe
park
114
et
oled
city
##ex
##ler
##ap
china
##book
20000
view
##ice
global
##km
your
hong
##mg
out
##ms
ng
ebay
##29
menu
ubuntu
##cy
rom
##view
open
ktv
do
server
##lo
if
english
##ね
##５
##oo
1600
##02
step1
kong
club
135
july
inc
1976
mr
hi
##net
touch
##ls
##ii
michael
lcd
##05
##33
phone
james
step2
1300
ios9
##box
dc
##２
##ley
samsung
111
280
pokemon
css
##ent
##les
いいえ
##１
s8
atom
play
bmw
##said
sa
etf
ctrl
♥yoyo♥
##55
2025
##2014
##66
adidas
amazon
1958
##ber
##ner
visa
##77
##der
1800
connectivity
##hi
firefox
109
118
hr
so
style
mark
pop
ol
skip
1975
as
##27
##ir
##61
190
mba
##う
##ai
le
##ver
1900
cafe2017
lte
super
113
129
##ron
amd
like
##☆
are
##ster
we
##sk
paul
data
international
##ft
longchamp
ssd
good
##ート
##ti
reply
##my
↓↓↓
apr
star
##ker
source
136
js
112
get
force
photo
##one
126
##2013
##ow
link
bbs
1972
goods
##lin
python
119
##ip
game
##ics
##ません
blue
##●
520
##45
page
itunes
##03
1955
260
1968
gt
gif
618
##ff
##47
group
くたさい
about
bar
ganji
##nce
music
lee
not
1977
1971
1973
##per
an
faq
comment
##って
days
##ock
116
##bs
1974
1969
v1
player
1956
xbox
sql
fm
f1
139
##ah
210
##lv
##mp
##000
melody
1957
##３
550
17life
199
1966
xml
market
##au
##71
999
##04
what
gl
##95
##age
tips
##68
book
##ting
mysql
can
1959
230
##ung
wonderland
watch
10℃
##ction
9000
mar
mobile
1946
1962
article
##db
part
▲top
party
って
1967
1964
1948
##07
##ore
##op
この
dj
##78
##38
010
main
225
1965
##ong
art
320
ad
134
020
##73
117
pm2
japan
228
##08
ts
1963
##ica
der
sm
##36
2019
##wa
ct
##７
##や
##64
1937
homemesh
search
##85
##れは
##tv
##di
macbook
##９
##くたさい
service
##♥
type
った
750
##ier
##si
##75
##います
##ok
best
##ット
goris
lock
##った
cf
3m
big
##ut
ftp
carol
##vi
１０
1961
happy
sd
##ac
122
anti
pe
cnn
iii
1920
138
##ラ
1940
esp
jan
tags
##98
##51
august
vol
##86
154
##™
##fs
##れ
##sion
design
ac
##ム
press
jordan
ppp
that
key
check
##６
##tt
##㎡
1080p
##lt
power
##42
1952
##bc
vivi
##ック
he
133
121
jpg
##rry
201
175
3500
1947
nb
##ted
##rn
しています
1954
usd
##t00
master
##ンク
001
model
##58
al
##09
1953
##34
ram
goo
ても
##ui
127
1930
red
##ary
rpg
item
##pm
##41
270
##za
project
##2012
hot
td
blogabstract
##ger
##62
650
##44
gr2
##します
##ｍ
black
electronic
nfc
year
asus
また
html5
cindy
##hd
m3
132
esc
##od
booking
##53
fed
tvb
##81
##ina
mit
165
##いる
chan
192
distribution
next
になる
peter
bios
steam
cm
1941
にも
pk10
##ix
##65
##91
dec
nasa
##ana
icecat
00z
b1
will
##46
li
se
##ji
##み
##ard
oct
##ain
jp
##ze
##bi
cio
##56
smart
h5
##39
##port
curve
vpn
##nm
##dia
utc
##あり
12345678910
##52
rmvb
chanel
a4
miss
##and
##im
media
who
##63
she
girl
5s
124
vera
##して
class
vivo
king
##フ
##ei
national
ab
1951
5cm
888
145
ipod
ap
1100
5mm
211
ms
2756
##69
mp4
msci
##po
##89
131
mg
index
380
##bit
##out
##zz
##97
##67
158
apec
##８
photoshop
opec
￥799
ては
##96
##tes
##ast
2g
○○
##ール
￥2899
##ling
##よ
##ory
1938
##ical
kitty
content
##43
step3
##cn
win8
155
vc
1400
iphone7
robert
##した
tcl
137
beauty
##87
en
dollars
##ys
##oc
step
pay
yy
a1
##2011
##lly
##ks
##♪
1939
188
download
1944
sep
exe
ph
います
school
gb
center
pr
street
##board
uv
##37
##lan
winrar
##que
##ua
##com
1942
1936
480
gpu
##４
ettoday
fu
tom
##54
##ren
##via
149
##72
b2b
144
##79
##tch
rose
arm
mb
##49
##ial
##nn
nvidia
step4
mvp
00㎡
york
156
##イ
how
cpi
591
2765
gov
kg
joe
##xx
mandy
pa
##ser
copyright
fashion
1935
don
##け
ecu
##ist
##art
erp
wap
have
##lm
talk
##ek
##ning
##if
ch
##ite
video
1943
cs
san
iot
look
##84
##2010
##ku
october
##ux
trump
##hs
##ide
box
141
first
##ins
april
##ight
##83
185
angel
protected
aa
151
162
x1
m2
##fe
##×
##ho
size
143
min
ofo
fun
gomaji
ex
hdmi
food
dns
march
chris
kevin
##のか
##lla
##pp
##ec
ag
ems
6s
720p
##rm
##ham
off
##92
asp
team
fandom
ed
299
▌♥
##ell
info
されています
##82
sina
4066
161
##able
##ctor
330
399
315
dll
rights
ltd
idc
jul
3kg
1927
142
ma
surface
##76
##ク
～～～
304
mall
eps
146
green
##59
map
space
donald
v2
sodu
##light
1931
148
1700
まて
310
reserved
htm
##han
##57
2d
178
mod
##ise
##tions
152
ti
##shi
doc
1933
icp
055
wang
##ram
shopping
aug
##pi
##well
now
wam
b2
からお
##hu
236
1928
##gb
266
f2
##93
153
mix
##ef
##uan
bwl
##plus
##res
core
##ess
tea
5℃
hktvmall
nhk
##ate
list
##ese
301
feb
4m
inn
ての
nov
159
12345
daniel
##ci
pass
##bet
##nk
coffee
202
ssl
airbnb
##ute
fbi
woshipm
skype
ea
cg
sp
##fc
##www
yes
edge
alt
007
##94
fpga
##ght
##gs
iso9001
さい
##ile
##wood
##uo
image
lin
icon
american
##em
1932
set
says
##king
##tive
blogger
##74
なと
256
147
##ox
##zy
##red
##ium
##lf
nokia
claire
##リ
##ding
november
lohas
##500
##tic
##マ
##cs
##ある
##che
##ire
##gy
##ult
db
january
win
##カ
166
road
ptt
##ま
##つ
198
##fa
##mer
anna
pchome
はい
udn
ef
420
##time
##tte
2030
##ア
g20
white
かかります
1929
308
garden
eleven
di
##おります
chen
309b
777
172
young
cosplay
ちてない
4500
bat
##123
##tra
##ては
kindle
npc
steve
etc
##ern
##｜
call
xperia
ces
travel
sk
s7
##ous
1934
##int
みいたたけます
183
edu
file
cho
qr
##car
##our
186
##ant
##ｄ
eric
1914
rends
##jo
##する
mastercard
##2000
kb
##min
290
##ino
vista
##ris
##ud
jack
2400
##set
169
pos
1912
##her
##ou
taipei
しく
205
beta
##ませんか
232
##fi
express
255
body
##ill
aphojoy
user
december
meiki
##ick
tweet
richard
##av
##ᆫ
iphone6
##dd
ちてすか
views
##mark
321
pd
##００
times
##▲
level
##ash
10g
point
5l
##ome
208
koreanmall
##ak
george
q2
206
wma
tcp
##200
スタッフ
full
mlb
##lle
##watch
tm
run
179
911
smith
business
##und
1919
color
##tal
222
171
##less
moon
4399
##rl
update
pcb
shop
499
157
little
なし
end
##mhz
van
dsp
easy
660
##house
##key
history
##ｏ
oh
##001
##hy
##web
oem
let
was
##2009
##gg
review
##wan
182
##°c
203
uc
title
##val
united
233
2021
##ons
doi
trivago
overdope
sbs
##ance
##ち
grand
special
573032185
imf
216
wx17house
##so
##ーム
audi
##he
london
william
##rp
##ake
science
beach
cfa
amp
ps4
880
##800
##link
##hp
crm
ferragamo
bell
make
##eng
195
under
zh
photos
2300
##style
##ント
via
176
da
##gi
company
i7
##ray
thomas
370
ufo
i5
##max
plc
ben
back
research
8g
173
mike
##pc
##ッフ
september
189
##ace
vps
february
167
pantos
wp
lisa
1921
★★
jquery
night
long
offer
##berg
##news
1911
##いて
ray
fks
wto
せます
over
164
340
##all
##rus
1924
##888
##works
blogtitle
loftpermalink
##→
187
martin
test
ling
km
##め
15000
fda
v3
##ja
##ロ
ｗedding
かある
outlet
family
##ea
をこ
##top
story
##ness
salvatore
##lu
204
swift
215
room
している
oracle
##ul
1925
sam
b2c
week
pi
rock
##のは
##ａ
##けと
##ean
##300
##gle
cctv
after
chinese
##back
powered
x2
##tan
1918
##nes
##イン
canon
only
181
##zi
##las
say
##oe
184
##sd
221
##bot
##world
##zo
sky
made
top100
just
1926
pmi
802
234
gap
##vr
177
les
174
▲topoct
ball
vogue
vi
ing
ofweek
cos
##list
##ort
▲topmay
##なら
##lon
として
last
##tc
##of
##bus
##gen
real
eva
##コ
a3
nas
##lie
##ria
##coin
##bt
▲topapr
his
212
cat
nata
vive
health
⋯⋯
drive
sir
▲topmar
du
cup
##カー
##ook
##よう
##sy
alex
msg
tour
しました
3ce
##word
193
ebooks
r8
block
318
##より
2200
nice
pvp
207
months
1905
rewards
##ther
1917
0800
##xi
##チ
##sc
micro
850
gg
blogfp
op
1922
daily
m1
264
true
##bb
ml
##tar
##のお
##ky
anthony
196
253
##yo
state
218
##ara
##aa
##rc
##tz
##ston
より
gear
##eo
##ade
ge
see
1923
##win
##ura
ss
heart
##den
##ita
down
##sm
el
png
2100
610
rakuten
whatsapp
bay
dream
add
##use
680
311
pad
gucci
mpv
##ode
##fo
island
▲topjun
##▼
223
jason
214
chicago
##❤
しの
##hone
io
##れる
##ことか
sogo
be2
##ology
990
cloud
vcd
##con
2～3
##ford
##joy
##kb
##こさいます
##rade
but
##ach
docker
##ful
rfid
ul
##ase
hit
ford
##star
580
##○
１１
a2
sdk
reading
edited
##are
cmos
##mc
238
siri
light
##ella
##ため
bloomberg
##read
pizza
##ison
jimmy
##vm
college
node
journal
ba
18k
##play
245
##cer
２０
magic
##yu
191
jump
288
tt
##ings
asr
##lia
3200
step5
network
##cd
mc
いします
1234
pixstyleme
273
##600
2800
money
★★★★★
1280
１２
430
bl
みの
act
##tus
tokyo
##rial
##life
emba
##ae
saas
tcs
##rk
##wang
summer
##sp
ko
##ving
390
premium
##その
netflix
##ヒ
uk
mt
##lton
right
frank
two
209
える
##ple
##cal
021
##んな
##sen
##ville
hold
nexus
dd
##ius
てお
##mah
##なく
tila
zero
820
ce
##tin
resort
##ws
charles
old
p10
5d
report
##360
##ru
##には
bus
vans
lt
##est
pv
##レ
links
rebecca
##ツ
##dm
azure
##365
きな
limited
bit
4gb
##mon
1910
moto
##eam
213
1913
var
eos
なとの
226
blogspot
された
699
e3
dos
dm
fc
##ments
##ik
##kw
boy
##bin
##ata
960
er
##せ
219
##vin
##tu
##ula
194
##∥
station
##ろ
##ature
835
files
zara
hdr
top10
nature
950
magazine
s6
marriott
##シ
avira
case
##っと
tab
##ran
tony
##home
oculus
im
##ral
jean
saint
cry
307
rosie
##force
##ini
ice
##bert
のある
##nder
##mber
pet
2600
##◆
plurk
▲topdec
##sis
00kg
▲topnov
720
##ence
tim
##ω
##nc
##ても
##name
log
ips
great
ikea
malaysia
unix
##イト
3600
##ncy
##nie
12000
akb48
##ye
##oid
404
##chi
##いた
oa
xuehai
##1000
##orm
##rf
275
さん
##ware
##リー
980
ho
##pro
text
##era
560
bob
227
##ub
##2008
8891
scp
avi
##zen
2022
mi
wu
museum
qvod
apache
lake
jcb
▲topaug
★★★
ni
##hr
hill
302
ne
weibo
490
ruby
##ーシ
##ヶ
##row
4d
▲topjul
iv
##ish
github
306
mate
312
##スト
##lot
##ane
andrew
のハイト
##tina
t1
rf
ed2k
##vel
##900
way
final
りの
ns
5a
705
197
##メ
sweet
bytes
##ene
▲topjan
231
##cker
##2007
##px
100g
topapp
229
helpapp
rs
low
14k
g4g
care
630
ldquo
あり
##fork
leave
rm
edition
##gan
##zon
##qq
▲topsep
##google
##ism
gold
224
explorer
##zer
toyota
category
select
visual
##labels
restaurant
##md
posts
s1
##ico
もっと
angelababy
123456
217
sports
s3
mbc
1915
してくたさい
shell
x86
candy
##new
kbs
face
xl
470
##here
4a
swissinfo
v8
▲topfeb
dram
##ual
##vice
3a
##wer
sport
q1
ios10
public
int
card
##ｃ
ep
au
rt
##れた
1080
bill
##mll
kim
３０
460
wan
##uk
##ミ
x3
298
0t
scott
##ming
239
e5
##3d
h7n9
worldcat
brown
##あります
##vo
##led
##580
##ax
249
410
##ert
paris
##～6
polo
925
##lr
599
##ナ
capital
##hing
bank
cv
1g
##chat
##ｓ
##たい
adc
##ule
2m
##ｅ
digital
hotmail
268
##pad
870
bbq
quot
##ring
before
wali
##まて
mcu
2k
2b
という
costco
316
north
333
switch
##city
##ｐ
philips
##mann
management
panasonic
##cl
##vd
##ping
##rge
alice
##lk
##ましょう
css3
##ney
vision
alpha
##ular
##400
##tter
lz
にお
##ありません
mode
gre
1916
pci
##tm
237
1～2
##yan
##そ
について
##let
##キ
work
war
coach
ah
mary
##ᅵ
huang
##pt
a8
pt
follow
##berry
1895
##ew
a5
ghost
##ション
##wn
##og
south
##code
girls
##rid
action
villa
git
r11
table
games
##cket
error
##anonymoussaid
##ag
here
##ame
##gc
qa
##■
##lis
gmp
##gin
vmalife
##cher
yu
wedding
##tis
demo
dragon
530
soho
social
bye
##rant
river
orz
acer
325
##↑
##ース
##ats
261
del
##ven
440
ups
##ように
##ター
305
value
macd
yougou
##dn
661
##ano
ll
##urt
##rent
continue
script
##wen
##ect
paper
263
319
shift
##chel
##フト
##cat
258
x5
fox
243
##さん
car
aaa
##blog
loading
##yn
##tp
kuso
799
si
sns
イカせるテンマ
ヒンクテンマ3
rmb
vdc
forest
central
prime
help
ultra
##rmb
##ような
241
square
688
##しい
のないフロクに
##field
##reen
##ors
##ju
c1
start
510
##air
##map
cdn
##wo
cba
stephen
m8
100km
##get
opera
##base
##ood
vsa
com™
##aw
##ail
251
なのて
count
t2
##ᅡ
##een
2700
hop
##gp
vsc
tree
##eg
##ose
816
285
##ories
##shop
alphago
v4
1909
simon
##ᆼ
fluke62max
zip
スホンサー
##sta
louis
cr
bas
##～10
bc
##yer
hadoop
##ube
##wi
1906
0755
hola
##low
place
centre
5v
d3
##fer
252
##750
##media
281
540
0l
exchange
262
series
##ハー
##san
eb
##bank
##ｋ
q3
##nge
##mail
take
##lp
259
1888
client
east
cache
event
vincent
##ールを
きを
##nse
sui
855
adchoice
##и
##stry
##なたの
246
##zone
ga
apps
sea
##ab
248
cisco
##タ
##rner
kymco
##care
dha
##pu
##yi
minkoff
royal
p1
への
annie
269
collection
kpi
playstation
257
になります
866
bh
##bar
queen
505
radio
1904
andy
armani
##xy
manager
iherb
##ery
##share
spring
raid
johnson
1908
##ob
volvo
hall
##ball
v6
our
taylor
##hk
bi
242
##cp
kate
bo
water
technology
##rie
サイトは
277
##ona
##sl
hpv
303
gtx
hip
rdquo
jayz
stone
##lex
##rum
namespace
##やり
620
##ale
##atic
des
##erson
##ql
##ves
##type
enter
##この
##てきます
d2
##168
##mix
##bian
との
a9
jj
ky
##lc
access
movie
##hc
リストに
tower
##ration
##mit
ます
##nch
ua
tel
prefix
##o2
1907
##point
1901
ott
～10
##http
##ury
baidu
##ink
member
##logy
bigbang
nownews
##js
##shot
##tb
##こと
247
eba
##tics
##lus
ける
v5
spark
##ama
there
##ions
god
##lls
##down
hiv
##ress
burberry
day2
##kv
◆◆
jeff
related
film
edit
joseph
283
##ark
cx
32gb
order
g9
30000
##ans
##tty
s5
##bee
かあります
thread
xr
buy
sh
005
land
spotify
mx
##ari
276
##verse
×email
sf
why
##ことて
244
7headlines
nego
sunny
dom
exo
401
666
positioning
fit
rgb
##tton
278
kiss
alexa
adam
lp
みリストを
##ｇ
mp
##ties
##llow
amy
##du
np
002
institute
271
##rth
##lar
2345
590
##des
sidebar
１５
imax
site
##cky
##kit
##ime
##009
season
323
##fun
##ンター
##ひ
gogoro
a7
pu
lily
fire
twd600
##ッセーシを
いて
##vis
30ml
##cture
##をお
information
##オ
close
friday
##くれる
yi
nick
てすか
##tta
##tel
6500
##lock
cbd
economy
254
かお
267
tinker
double
375
8gb
voice
##app
oops
channel
today
985
##right
raw
xyz
##＋
jim
edm
##cent
7500
supreme
814
ds
##its
##asia
dropbox
##てすか
##tti
books
272
100ml
##tle
##ller
##ken
##more
##boy
sex
309
##dom
t3
##ider
##なります
##unch
1903
810
feel
5500
##かった
##put
により
s2
mo
##gh
men
ka
amoled
div
##tr
##n1
port
howard
##tags
ken
dnf
##nus
adsense
##а
ide
##へ
buff
thunder
##town
##ique
has
##body
auto
pin
##erry
tee
てした
295
number
##the
##013
object
psp
cool
udnbkk
16gb
##mic
miui
##tro
most
r2
##alk
##nity
1880
±0
##いました
428
s4
law
version
##oa
n1
sgs
docomo
##tf
##ack
henry
fc2
##ded
##sco
##014
##rite
286
0mm
linkedin
##ada
##now
wii
##ndy
ucbug
##◎
sputniknews
legalminer
##ika
##xp
2gb
##bu
q10
oo
b6
come
##rman
cheese
ming
maker
##gm
nikon
##fig
ppi
kelly
##ります
jchere
てきます
ted
md
003
fgo
tech
##tto
dan
soc
##gl
##len
hair
earth
640
521
img
##pper
##a1
##てきる
##ロク
acca
##ition
##ference
suite
##ig
outlook
##mond
##cation
398
##pr
279
101vip
358
##999
282
64gb
3800
345
airport
##over
284
##おり
jones
##ith
lab
##su
##いるのて
co2
town
piece
##llo
no1
vmware
24h
##qi
focus
reader
##admin
##ora
tb
false
##log
1898
know
lan
838
##ces
f4
##ume
motel
stop
##oper
na
flickr
netcomponents
##af
##─
pose
williams
local
##ound
##cg
##site
##iko
いお
274
5m
gsm
con
##ath
1902
friends
##hip
cell
317
##rey
780
cream
##cks
012
##dp
facebooktwitterpinterestgoogle
sso
324
shtml
song
swiss
##mw
##キンク
lumia
xdd
string
tiffany
522
marc
られた
insee
russell
sc
dell
##ations
ｏｋ
camera
289
##vs
##flow
##late
classic
287
##nter
stay
g1
mtv
512
##ever
##lab
##nger
qe
sata
ryan
d1
50ml
cms
##cing
su
292
3300
editor
296
##nap
security
sunday
association
##ens
##700
##bra
acg
##かり
sofascore
とは
mkv
##ign
jonathan
gary
build
labels
##oto
tesla
moba
qi
gohappy
general
ajax
1024
##かる
サイト
society
##test
##urs
wps
fedora
##ich
mozilla
328
##480
##dr
usa
urn
##lina
##ｒ
grace
##die
##try
##ader
1250
##なり
elle
570
##chen
##ᆯ
price
##ten
uhz
##ough
eq
##hen
states
push
session
balance
wow
506
##cus
##py
when
##ward
##ep
34e
wong
library
prada
##サイト
##cle
running
##ree
313
ck
date
q4
##ctive
##ool
##＞
mk
##ira
##163
388
die
secret
rq
dota
buffet
は１ヶ
e6
##ez
pan
368
ha
##card
##cha
2a
##さ
alan
day3
eye
f3
##end
france
keep
adi
rna
tvbs
##ala
solo
nova
##え
##tail
##ょう
support
##ries
##なる
##ved
base
copy
iis
fps
##ways
hero
hgih
profile
fish
mu
ssh
entertainment
chang
##wd
click
cake
##ond
pre
##tom
kic
pixel
##ov
##fl
product
6a
##pd
dear
##gate
es
yumi
audio
##²
##sky
echo
bin
where
##ture
329
##ape
find
sap
isis
##なと
nand
##101
##load
##ream
band
a6
525
never
##post
festival
50cm
##we
555
guide
314
zenfone
##ike
335
gd
forum
jessica
strong
alexander
##ould
software
allen
##ious
program
360°
else
lohasthree
##gar
することかてきます
please
##れます
rc
##ggle
##ric
bim
50000
##own
eclipse
355
brian
3ds
##side
061
361
##other
##ける
##tech
##ator
485
engine
##ged
##ｔ
plaza
##fit
cia
ngo
westbrook
shi
tbs
50mm
##みませんか
sci
291
reuters
##ily
contextlink
##hn
af
##cil
bridge
very
##cel
1890
cambridge
##ize
15g
##aid
##data
790
frm
##head
award
butler
##sun
meta
##mar
america
ps3
puma
pmid
##すか
lc
670
kitchen
##lic
オーフン5
きなしソフトサーヒス
そして
day1
future
★★★★
##text
##page
##rris
pm1
##ket
fans
##っています
1001
christian
bot
kids
trackback
##hai
c3
display
##hl
n2
1896
idea
さんも
##sent
airmail
##ug
##men
pwm
けます
028
##lution
369
852
awards
schemas
354
asics
wikipedia
font
##tional
##vy
c2
293
##れている
##dget
##ein
っている
contact
pepper
スキル
339
##～5
294
##uel
##ument
730
##hang
みてす
q5
##sue
rain
##ndi
wei
swatch
##cept
わせ
331
popular
##ste
##tag
p2
501
trc
1899
##west
##live
justin
honda
ping
messenger
##rap
v9
543
##とは
unity
appqq
はすへて
025
leo
##tone
##テ
##ass
uniqlo
##010
502
her
jane
memory
moneydj
##tical
human
12306
していると
##m2
coc
miacare
##mn
tmt
##core
vim
kk
##may
fan
target
use
too
338
435
2050
867
737
fast
##2c
services
##ope
omega
energy
##わ
pinkoi
1a
##なから
##rain
jackson
##ement
##シャンルの
374
366
そんな
p9
rd
##ᆨ
1111
##tier
##vic
zone
##│
385
690
dl
isofix
cpa
m4
322
kimi
めて
davis
##lay
lulu
##uck
050
weeks
qs
##hop
920
##ｎ
ae
##ear
～5
eia
405
##fly
korea
jpeg
boost
##ship
small
##リア
1860
eur
297
425
valley
##iel
simple
##ude
rn
k2
##ena
されます
non
patrick
しているから
##ナー
feed
5757
30g
process
well
qqmei
##thing
they
aws
lu
pink
##ters
##kin
または
board
##vertisement
wine
##ien
unicode
##dge
r1
359
##tant
いを
##twitter
##3c
cool1
される
##れて
##ｌ
isp
##012
standard
45㎡2
402
##150
matt
##fu
326
##iner
googlemsn
pixnetfacebookyahoo
##ラン
x7
886
##uce
メーカー
sao
##ev
##きました
##file
9678
403
xddd
shirt
6l
##rio
##hat
3mm
givenchy
ya
bang
##lio
monday
crystal
ロクイン
##abc
336
head
890
ubuntuforumwikilinuxpastechat
##vc
##～20
##rity
cnc
7866
ipv6
null
1897
##ost
yang
imsean
tiger
##fet
##ンス
352
##＝
dji
327
ji
maria
##come
##んて
foundation
3100
##beth
##なった
1m
601
active
##aft
##don
3p
sr
349
emma
##khz
living
415
353
1889
341
709
457
sas
x6
##face
pptv
x4
##mate
han
sophie
##jing
337
fifa
##mand
other
sale
inwedding
##gn
てきちゃいます
##mmy
##pmlast
bad
nana
nbc
してみてくたさいね
なとはお
##wu
##かあります
##あ
note7
single
##340
せからこ
してくたさい♪この
しにはとんとんワークケートを
するとあなたにもっとマッチした
ならワークケートへ
もみつかっちゃうかも
ワークケートの
##bel
window
##dio
##ht
union
age
382
１４
##ivity
##ｙ
コメント
domain
neo
##isa
##lter
5k
f5
steven
##cts
powerpoint
tft
self
g2
ft
##テル
zol
##act
mwc
381
343
もう
nbapop
408
てある
eds
ace
##room
previous
author
tomtom
il
##ets
hu
financial
☆☆☆
っています
bp
5t
chi
1gb
##hg
fairmont
cross
008
gay
h2
function
##けて
356
also
1b
625
##ータ
##raph
1894
3～5
##ils
i3
334
avenue
##host
による
##bon
##tsu
message
navigation
50g
fintech
h6
##ことを
8cm
##ject
##vas
##firm
credit
##wf
xxxx
form
##nor
##space
huawei
plan
json
sbl
##dc
machine
921
392
wish
##120
##sol
windows7
edward
##ために
development
washington
##nsis
lo
818
##sio
##ym
##bor
planet
##～8
##wt
ieee
gpa
##めて
camp
ann
gm
##tw
##oka
connect
##rss
##work
##atus
wall
chicken
soul
2mm
##times
fa
##ather
##cord
009
##eep
hitachi
gui
harry
##pan
e1
disney
##press
##ーション
wind
386
frigidaire
##tl
liu
hsu
332
basic
von
ev
いた
てきる
スホンサーサイト
learning
##ull
expedia
archives
change
##wei
santa
cut
ins
6gb
turbo
brand
cf1
508
004
return
747
##rip
h1
##nis
##をこ
128gb
##にお
3t
application
しており
emc
rx
##oon
384
quick
412
15058
wilson
wing
chapter
##bug
beyond
##cms
##dar
##oh
zoom
e2
trip
sb
##nba
rcep
342
aspx
ci
080
gc
gnu
める
##count
advanced
dance
dv
##url
##ging
367
8591
am09
shadow
battle
346
##ｉ
##cia
##という
emily
##のてす
##tation
host
ff
techorz
sars
##mini
##mporary
##ering
nc
4200
798
##next
cma
##mbps
##gas
##ift
##dot
##ィ
455
##～17
amana
##りの
426
##ros
ir
00㎡1
##eet
##ible
##↓
710
ˋ▽ˊ
##aka
dcs
iq
##ｖ
l1
##lor
maggie
##011
##iu
588
##～1
830
##gt
1tb
articles
create
##burg
##iki
database
fantasy
##rex
##cam
dlc
dean
##you
hard
path
gaming
victoria
maps
cb
##lee
##itor
overchicstoretvhome
systems
##xt
416
p3
sarah
760
##nan
407
486
x9
install
second
626
##ann
##ph
##rcle
##nic
860
##nar
ec
##とう
768
metro
chocolate
##rian
～4
##table
##しています
skin
##sn
395
mountain
##0mm
inparadise
6m
7x24
ib
4800
##jia
eeworld
creative
g5
g3
357
parker
ecfa
village
からの
18000
sylvia
サーヒス
hbl
##ques
##onsored
##x2
##きます
##v4
##tein
ie6
383
##stack
389
ver
##ads
##baby
sound
bbe
##110
##lone
##uid
ads
022
gundam
351
thinkpad
006
scrum
match
##ave
mems
##470
##oy
##なりました
##talk
glass
lamigo
span
##eme
job
##a5
jay
wade
kde
498
##lace
ocean
tvg
##covery
##r3
##ners
##rea
junior
think
##aine
cover
##ision
##sia
↓↓
##bow
msi
413
458
406
##love
711
801
soft
z2
##pl
456
1840
mobil
mind
##uy
427
nginx
##oi
めた
##rr
6221
##mple
##sson
##ーシてす
371
##nts
91tv
comhd
crv3000
##uard
1868
397
deep
lost
field
gallery
##bia
rate
spf
redis
traction
930
icloud
011
なら
fe
jose
372
##tory
into
sohu
fx
899
379
kicstart2
##hia
すく
##～3
##sit
ra
２４
##walk
##xure
500g
##pact
pacific
xa
natural
carlo
##250
##walker
1850
##can
cto
gigi
516
##サー
pen
##hoo
ob
matlab
##ｂ
##yy
13913459
##iti
mango
##bbs
sense
c5
oxford
##ニア
walker
jennifer
##ola
course
##bre
701
##pus
##rder
lucky
075
##ぁ
ivy
なお
##nia
sotheby
side
##ugh
joy
##orage
##ush
##bat
##dt
364
r9
##2d
##gio
511
country
wear
##lax
##～7
##moon
393
seven
study
411
348
lonzo
8k
##ェ
evolution
##イフ
##kk
gs
kd
##レス
arduino
344
b12
##lux
arpg
##rdon
cook
##x5
dark
five
##als
##ida
とても
sign
362
##ちの
something
20mm
##nda
387
##posted
fresh
tf
1870
422
cam
##mine
##skip
##form
##ssion
education
394
##tee
dyson
stage
##jie
want
##night
epson
pack
あります
##ppy
テリヘル
##█
wd
##eh
##rence
left
##lvin
golden
mhz
discovery
##trix
##n2
loft
##uch
##dra
##sse
speed
～1
1mdb
sorry
welcome
##urn
wave
gaga
##lmer
teddy
##160
トラックハック
せよ
611
##f2016
378
rp
##sha
rar
##あなたに
##きた
840
holiday
##ュー
373
074
##vg
##nos
##rail
gartner
gi
6p
##dium
kit
488
b3
eco
##ろう
20g
sean
##stone
autocad
nu
##np
f16
write
029
m5
##ias
images
atp
##dk
fsm
504
1350
ve
52kb
##xxx
##のに
##cake
414
unit
lim
ru
1v
##ification
published
angela
16g
analytics
ak
##ｑ
##nel
gmt
##icon
again
##₂
##bby
ios11
445
かこさいます
waze
いてす
##ハ
9985
##ust
##ティー
framework
##007
iptv
delete
52sykb
cl
wwdc
027
30cm
##fw
##ての
1389
##xon
brandt
##ses
##dragon
tc
vetements
anne
monte
modern
official
##へて
##ere
##nne
##oud
もちろん
５０
etnews
##a2
##graphy
421
863
##ちゃん
444
##rtex
##てお
l2
##gma
mount
ccd
たと
archive
morning
tan
ddos
e7
##ホ
day4
##ウ
gis
453
its
495
factory
bruce
pg
##ito
ってくたさい
guest
cdma
##lling
536
n3
しかし
3～4
mega
eyes
ro
１３
women
dac
church
##jun
singapore
##facebook
6991
starbucks
##tos
##stin
##shine
zen
##mu
tina
20℃
1893
##たけて
503
465
request
##gence
qt
##っ
1886
347
363
q7
##zzi
diary
##tore
409
##ead
468
cst
##osa
canada
agent
va
##jiang
##ちは
##ーク
##lam
sg
##nix
##sday
##よって
g6
##master
bing
##zl
charlie
１６
8mm
nb40
##ーン
thai
##ルフ
ln284ct
##itz
##2f
bonnie
##food
##lent
originals
##stro
##lts
418
∟∣
##bscribe
children
ntd
yesstyle
##かも
hmv
##tment
d5
2cm
arts
sms
##pn
##я
##いい
topios9
539
lifestyle
virtual
##ague
xz
##deo
muji
024
unt
##nnis
##ᅩ
faq1
1884
396
##ette
fly
64㎡
はしめまして
441
curry
##pop
のこ
release
##←
##◆◆
##cast
073
ありな
500ml
##ews
5c
##stle
ios7
##ima
787
dog
lenovo
##r4
roger
013
cbs
vornado
100m
417
##desk
##クok
##ald
1867
9595
2900
##van
oil
##ｘ
some
break
common
##jy
##lines
g7
twice
419
ella
nano
belle
にこ
##mes
##self
##note
jb
##ことかてきます
benz
##との
##ova
451
save
##wing
##ますのて
kai
りは
##hua
##rect
rainer
##unge
448
##0m
adsl
##かな
guestname
##uma
##kins
##zu
tokichoi
##price
county
##med
##mus
rmk
391
address
vm
えて
openload
##group
##hin
##iginal
amg
urban
##oz
jobs
emi
##public
beautiful
##sch
album
##dden
##bell
jerry
works
hostel
miller
##drive
##rmin
##１０
376
boot
828
##370
##fx
##cm～
1885
##nome
##ctionary
##oman
##lish
##cr
##hm
433
##how
432
francis
xi
c919
b5
evernote
##uc
vga
##3000
coupe
##urg
##cca
##uality
019
6g
れる
multi
##また
##ett
em
hey
##ani
##tax
##rma
inside
than
740
leonnhurt
##jin
ict
れた
bird
notes
200mm
くの
##dical
##lli
result
442
iu
ee
438
smap
gopro
##last
yin
pure
998
32g
けた
5kg
##dan
##rame
mama
##oot
bean
marketing
##hur
2l
bella
sync
xuite
##ground
515
discuz
##getrelax
##ince
##bay
##5s
cj
##イス
gmat
apt
##pass
jing
##rix
c4
rich
##とても
niusnews
##ello
bag
770
##eting
##mobile
１８
culture
015
##のてすか
377
1020
area
##ience
616
details
gp
universal
silver
dit
はお
private
ddd
u11
kanshu
##ified
fung
##nny
dx
##520
tai
475
023
##fr
##lean
3s
##pin
429
##rin
25000
ly
rick
##bility
usb3
banner
##baru
##gion
metal
dt
vdf
1871
karl
qualcomm
bear
1010
oldid
ian
jo
##tors
population
##ernel
1882
mmorpg
##mv
##bike
603
##©
ww
friend
##ager
exhibition
##del
##pods
fpx
structure
##free
##tings
kl
##rley
##copyright
##mma
california
3400
orange
yoga
4l
canmake
honey
##anda
##コメント
595
nikkie
##ルハイト
dhl
publishing
##mall
##gnet
20cm
513
##クセス
##┅
e88
970
##dog
fishbase
##!
##"
###
##$
##%
##&
##'
##(
##)
##*
##+
##,
##-
##.
##/
##:
##;
##<
##=
##>
##?
##@
##[
##\
##]
##^
##_
##{
##|
##}
##~
##£
##¤
##¥
##§
##«
##±
##³
##µ
##·
##¹
##º
##»
##¼
##ß
##æ
##÷
##ø
##đ
##ŋ
##ɔ
##ə
##ɡ
##ʰ
##ˇ
##ˈ
##ˊ
##ˋ
##ˍ
##ː
##˙
##˚
##ˢ
##α
##β
##γ
##δ
##ε
##η
##θ
##ι
##κ
##λ
##μ
##ν
##ο
##π
##ρ
##ς
##σ
##τ
##υ
##φ
##χ
##ψ
##б
##в
##г
##д
##е
##ж
##з
##к
##л
##м
##н
##о
##п
##р
##с
##т
##у
##ф
##х
##ц
##ч
##ш
##ы
##ь
##і
##ا
##ب
##ة
##ت
##د
##ر
##س
##ع
##ل
##م
##ن
##ه
##و
##ي
##۩
##ก
##ง
##น
##ม
##ย
##ร
##อ
##า
##เ
##๑
##་
##ღ
##ᄀ
##ᄁ
##ᄂ
##ᄃ
##ᄅ
##ᄆ
##ᄇ
##ᄈ
##ᄉ
##ᄋ
##ᄌ
##ᄎ
##ᄏ
##ᄐ
##ᄑ
##ᄒ
##ᅢ
##ᅣ
##ᅥ
##ᅦ
##ᅧ
##ᅨ
##ᅪ
##ᅬ
##ᅭ
##ᅮ
##ᅯ
##ᅲ
##ᅳ
##ᅴ
##ᆷ
##ᆸ
##ᆺ
##ᆻ
##ᗜ
##ᵃ
##ᵉ
##ᵍ
##ᵏ
##ᵐ
##ᵒ
##ᵘ
##‖
##„
##†
##•
##‥
##‧
## 
##‰
##′
##″
##‹
##›
##※
##‿
##⁄
##ⁱ
##⁺
##ⁿ
##₁
##₃
##₄
##€
##№
##ⅰ
##ⅱ
##ⅲ
##ⅳ
##ⅴ
##↔
##↗
##↘
##⇒
##∀
##−
##∕
##∙
##√
##∞
##∟
##∠
##∣
##∩
##∮
##∶
##∼
##∽
##≈
##≒
##≡
##≤
##≥
##≦
##≧
##≪
##≫
##⊙
##⋅
##⋈
##⋯
##⌒
##①
##②
##③
##④
##⑤
##⑥
##⑦
##⑧
##⑨
##⑩
##⑴
##⑵
##⑶
##⑷
##⑸
##⒈
##⒉
##⒊
##⒋
##ⓒ
##ⓔ
##ⓘ
##━
##┃
##┆
##┊
##┌
##└
##├
##┣
##═
##║
##╚
##╞
##╠
##╭
##╮
##╯
##╰
##╱
##╳
##▂
##▃
##▅
##▇
##▉
##▋
##▌
##▍
##▎
##□
##▪
##▫
##▬
##△
##▶
##►
##▽
##◇
##◕
##◠
##◢
##◤
##☀
##☕
##☞
##☺
##☼
##♀
##♂
##♠
##♡
##♣
##♦
##♫
##♬
##✈
##✔
##✕
##✖
##✦
##✨
##✪
##✰
##✿
##❀
##➜
##➤
##⦿
##、
##。
##〃
##々
##〇
##〈
##〉
##《
##》
##「
##」
##『
##』
##【
##】
##〓
##〔
##〕
##〖
##〗
##〜
##〝
##〞
##ぃ
##ぇ
##ぬ
##ふ
##ほ
##む
##ゃ
##ゅ
##ゆ
##ょ
##゜
##ゝ
##ァ
##ゥ
##エ
##ォ
##ケ
##サ
##セ
##ソ
##ッ
##ニ
##ヌ
##ネ
##ノ
##ヘ
##モ
##ャ
##ヤ
##ュ
##ユ
##ョ
##ヨ
##ワ
##ヲ
##・
##ヽ
##ㄅ
##ㄆ
##ㄇ
##ㄉ
##ㄋ
##ㄌ
##ㄍ
##ㄎ
##ㄏ
##ㄒ
##ㄚ
##ㄛ
##ㄞ
##ㄟ
##ㄢ
##ㄤ
##ㄥ
##ㄧ
##ㄨ
##ㆍ
##㈦
##㊣
##㗎
##一
##丁
##七
##万
##丈
##三
##上
##下
##不
##与
##丐
##丑
##专
##且
##丕
##世
##丘
##丙
##业
##丛
##东
##丝
##丞
##丟
##両
##丢
##两
##严
##並
##丧
##丨
##个
##丫
##中
##丰
##串
##临
##丶
##丸
##丹
##为
##主
##丼
##丽
##举
##丿
##乂
##乃
##久
##么
##义
##之
##乌
##乍
##乎
##乏
##乐
##乒
##乓
##乔
##乖
##乗
##乘
##乙
##乜
##九
##乞
##也
##习
##乡
##书
##乩
##买
##乱
##乳
##乾
##亀
##亂
##了
##予
##争
##事
##二
##于
##亏
##云
##互
##五
##井
##亘
##亙
##亚
##些
##亜
##亞
##亟
##亡
##亢
##交
##亥
##亦
##产
##亨
##亩
##享
##京
##亭
##亮
##亲
##亳
##亵
##人
##亿
##什
##仁
##仃
##仄
##仅
##仆
##仇
##今
##介
##仍
##从
##仏
##仑
##仓
##仔
##仕
##他
##仗
##付
##仙
##仝
##仞
##仟
##代
##令
##以
##仨
##仪
##们
##仮
##仰
##仲
##件
##价
##任
##份
##仿
##企
##伉
##伊
##伍
##伎
##伏
##伐
##休
##伕
##众
##优
##伙
##会
##伝
##伞
##伟
##传
##伢
##伤
##伦
##伪
##伫
##伯
##估
##伴
##伶
##伸
##伺
##似
##伽
##佃
##但
##佇
##佈
##位
##低
##住
##佐
##佑
##体
##佔
##何
##佗
##佘
##余
##佚
##佛
##作
##佝
##佞
##佟
##你
##佢
##佣
##佤
##佥
##佩
##佬
##佯
##佰
##佳
##併
##佶
##佻
##佼
##使
##侃
##侄
##來
##侈
##例
##侍
##侏
##侑
##侖
##侗
##供
##依
##侠
##価
##侣
##侥
##侦
##侧
##侨
##侬
##侮
##侯
##侵
##侶
##侷
##便
##係
##促
##俄
##俊
##俎
##俏
##俐
##俑
##俗
##俘
##俚
##保
##俞
##俟
##俠
##信
##俨
##俩
##俪
##俬
##俭
##修
##俯
##俱
##俳
##俸
##俺
##俾
##倆
##倉
##個
##倌
##倍
##倏
##們
##倒
##倔
##倖
##倘
##候
##倚
##倜
##借
##倡
##値
##倦
##倩
##倪
##倫
##倬
##倭
##倶
##债
##值
##倾
##偃
##假
##偈
##偉
##偌
##偎
##偏
##偕
##做
##停
##健
##側
##偵
##偶
##偷
##偻
##偽
##偿
##傀
##傅
##傍
##傑
##傘
##備
##傚
##傢
##傣
##傥
##储
##傩
##催
##傭
##傲
##傳
##債
##傷
##傻
##傾
##僅
##働
##像
##僑
##僕
##僖
##僚
##僥
##僧
##僭
##僮
##僱
##僵
##價
##僻
##儀
##儂
##億
##儆
##儉
##儋
##儒
##儕
##儘
##償
##儡
##優
##儲
##儷
##儼
##儿
##兀
##允
##元
##兄
##充
##兆
##兇
##先
##光
##克
##兌
##免
##児
##兑
##兒
##兔
##兖
##党
##兜
##兢
##入
##內
##全
##兩
##八
##公
##六
##兮
##兰
##共
##兲
##关
##兴
##兵
##其
##具
##典
##兹
##养
##兼
##兽
##冀
##内
##円
##冇
##冈
##冉
##冊
##册
##再
##冏
##冒
##冕
##冗
##写
##军
##农
##冠
##冢
##冤
##冥
##冨
##冪
##冬
##冯
##冰
##冲
##决
##况
##冶
##冷
##冻
##冼
##冽
##冾
##净
##凄
##准
##凇
##凈
##凉
##凋
##凌
##凍
##减
##凑
##凛
##凜
##凝
##几
##凡
##凤
##処
##凪
##凭
##凯
##凰
##凱
##凳
##凶
##凸
##凹
##出
##击
##函
##凿
##刀
##刁
##刃
##分
##切
##刈
##刊
##刍
##刎
##刑
##划
##列
##刘
##则
##刚
##创
##初
##删
##判
##別
##刨
##利
##刪
##别
##刮
##到
##制
##刷
##券
##刹
##刺
##刻
##刽
##剁
##剂
##剃
##則
##剉
##削
##剋
##剌
##前
##剎
##剐
##剑
##剔
##剖
##剛
##剜
##剝
##剣
##剤
##剥
##剧
##剩
##剪
##副
##割
##創
##剷
##剽
##剿
##劃
##劇
##劈
##劉
##劊
##劍
##劏
##劑
##力
##劝
##办
##功
##加
##务
##劣
##动
##助
##努
##劫
##劭
##励
##劲
##劳
##労
##劵
##効
##劾
##势
##勁
##勃
##勇
##勉
##勋
##勐
##勒
##動
##勖
##勘
##務
##勛
##勝
##勞
##募
##勢
##勤
##勧
##勳
##勵
##勸
##勺
##勻
##勾
##勿
##匀
##包
##匆
##匈
##匍
##匐
##匕
##化
##北
##匙
##匝
##匠
##匡
##匣
##匪
##匮
##匯
##匱
##匹
##区
##医
##匾
##匿
##區
##十
##千
##卅
##升
##午
##卉
##半
##卍
##华
##协
##卑
##卒
##卓
##協
##单
##卖
##南
##単
##博
##卜
##卞
##卟
##占
##卡
##卢
##卤
##卦
##卧
##卫
##卮
##卯
##印
##危
##即
##却
##卵
##卷
##卸
##卻
##卿
##厂
##厄
##厅
##历
##厉
##压
##厌
##厕
##厘
##厚
##厝
##原
##厢
##厥
##厦
##厨
##厩
##厭
##厮
##厲
##厳
##去
##县
##叁
##参
##參
##又
##叉
##及
##友
##双
##反
##収
##发
##叔
##取
##受
##变
##叙
##叛
##叟
##叠
##叡
##叢
##口
##古
##句
##另
##叨
##叩
##只
##叫
##召
##叭
##叮
##可
##台
##叱
##史
##右
##叵
##叶
##号
##司
##叹
##叻
##叼
##叽
##吁
##吃
##各
##吆
##合
##吉
##吊
##吋
##同
##名
##后
##吏
##吐
##向
##吒
##吓
##吕
##吖
##吗
##君
##吝
##吞
##吟
##吠
##吡
##否
##吧
##吨
##吩
##含
##听
##吭
##吮
##启
##吱
##吳
##吴
##吵
##吶
##吸
##吹
##吻
##吼
##吽
##吾
##呀
##呂
##呃
##呆
##呈
##告
##呋
##呎
##呐
##呓
##呕
##呗
##员
##呛
##呜
##呢
##呤
##呦
##周
##呱
##呲
##味
##呵
##呷
##呸
##呻
##呼
##命
##咀
##咁
##咂
##咄
##咆
##咋
##和
##咎
##咏
##咐
##咒
##咔
##咕
##咖
##咗
##咘
##咙
##咚
##咛
##咣
##咤
##咦
##咧
##咨
##咩
##咪
##咫
##咬
##咭
##咯
##咱
##咲
##咳
##咸
##咻
##咽
##咿
##哀
##品
##哂
##哄
##哆
##哇
##哈
##哉
##哋
##哌
##响
##哎
##哏
##哐
##哑
##哒
##哔
##哗
##哟
##員
##哥
##哦
##哧
##哨
##哩
##哪
##哭
##哮
##哲
##哺
##哼
##哽
##唁
##唄
##唆
##唇
##唉
##唏
##唐
##唑
##唔
##唠
##唤
##唧
##唬
##售
##唯
##唰
##唱
##唳
##唷
##唸
##唾
##啃
##啄
##商
##啉
##啊
##問
##啓
##啕
##啖
##啜
##啞
##啟
##啡
##啤
##啥
##啦
##啧
##啪
##啫
##啬
##啮
##啰
##啱
##啲
##啵
##啶
##啷
##啸
##啻
##啼
##啾
##喀
##喂
##喃
##善
##喆
##喇
##喉
##喊
##喋
##喎
##喏
##喔
##喘
##喙
##喚
##喜
##喝
##喟
##喧
##喪
##喫
##喬
##單
##喰
##喱
##喲
##喳
##喵
##営
##喷
##喹
##喺
##喻
##喽
##嗅
##嗆
##嗇
##嗎
##嗑
##嗒
##嗓
##嗔
##嗖
##嗚
##嗜
##嗝
##嗟
##嗡
##嗣
##嗤
##嗦
##嗨
##嗪
##嗬
##嗯
##嗰
##嗲
##嗳
##嗶
##嗷
##嗽
##嘀
##嘅
##嘆
##嘈
##嘉
##嘌
##嘍
##嘎
##嘔
##嘖
##嘗
##嘘
##嘚
##嘛
##嘜
##嘞
##嘟
##嘢
##嘣
##嘤
##嘧
##嘩
##嘭
##嘮
##嘯
##嘰
##嘱
##嘲
##嘴
##嘶
##嘸
##嘹
##嘻
##嘿
##噁
##噌
##噎
##噓
##噔
##噗
##噙
##噜
##噠
##噢
##噤
##器
##噩
##噪
##噬
##噱
##噴
##噶
##噸
##噹
##噻
##噼
##嚀
##嚇
##嚎
##嚏
##嚐
##嚓
##嚕
##嚟
##嚣
##嚥
##嚨
##嚮
##嚴
##嚷
##嚼
##囂
##囉
##囊
##囍
##囑
##囔
##囗
##囚
##四
##囝
##回
##囟
##因
##囡
##团
##団
##囤
##囧
##囪
##囫
##园
##困
##囱
##囲
##図
##围
##囹
##固
##国
##图
##囿
##圃
##圄
##圆
##圈
##國
##圍
##圏
##園
##圓
##圖
##團
##圜
##土
##圣
##圧
##在
##圩
##圭
##地
##圳
##场
##圻
##圾
##址
##坂
##均
##坊
##坍
##坎
##坏
##坐
##坑
##块
##坚
##坛
##坝
##坞
##坟
##坠
##坡
##坤
##坦
##坨
##坪
##坯
##坳
##坵
##坷
##垂
##垃
##垄
##型
##垒
##垚
##垛
##垠
##垢
##垣
##垦
##垩
##垫
##垭
##垮
##垵
##埂
##埃
##埋
##城
##埔
##埕
##埗
##域
##埠
##埤
##埵
##執
##埸
##培
##基
##埼
##堀
##堂
##堃
##堅
##堆
##堇
##堑
##堕
##堙
##堡
##堤
##堪
##堯
##堰
##報
##場
##堵
##堺
##堿
##塊
##塌
##塑
##塔
##塗
##塘
##塚
##塞
##塢
##塩
##填
##塬
##塭
##塵
##塾
##墀
##境
##墅
##墉
##墊
##墒
##墓
##増
##墘
##墙
##墜
##增
##墟
##墨
##墩
##墮
##墳
##墻
##墾
##壁
##壅
##壆
##壇
##壊
##壑
##壓
##壕
##壘
##壞
##壟
##壢
##壤
##壩
##士
##壬
##壮
##壯
##声
##売
##壳
##壶
##壹
##壺
##壽
##处
##备
##変
##复
##夏
##夔
##夕
##外
##夙
##多
##夜
##够
##夠
##夢
##夥
##大
##天
##太
##夫
##夭
##央
##夯
##失
##头
##夷
##夸
##夹
##夺
##夾
##奂
##奄
##奇
##奈
##奉
##奋
##奎
##奏
##奐
##契
##奔
##奕
##奖
##套
##奘
##奚
##奠
##奢
##奥
##奧
##奪
##奬
##奮
##女
##奴
##奶
##奸
##她
##好
##如
##妃
##妄
##妆
##妇
##妈
##妊
##妍
##妒
##妓
##妖
##妘
##妙
##妝
##妞
##妣
##妤
##妥
##妨
##妩
##妪
##妮
##妲
##妳
##妹
##妻
##妾
##姆
##姉
##姊
##始
##姍
##姐
##姑
##姒
##姓
##委
##姗
##姚
##姜
##姝
##姣
##姥
##姦
##姨
##姪
##姫
##姬
##姹
##姻
##姿
##威
##娃
##娄
##娅
##娆
##娇
##娉
##娑
##娓
##娘
##娛
##娜
##娟
##娠
##娣
##娥
##娩
##娱
##娲
##娴
##娶
##娼
##婀
##婁
##婆
##婉
##婊
##婕
##婚
##婢
##婦
##婧
##婪
##婭
##婴
##婵
##婶
##婷
##婺
##婿
##媒
##媚
##媛
##媞
##媧
##媲
##媳
##媽
##媾
##嫁
##嫂
##嫉
##嫌
##嫑
##嫔
##嫖
##嫘
##嫚
##嫡
##嫣
##嫦
##嫩
##嫲
##嫵
##嫻
##嬅
##嬉
##嬌
##嬗
##嬛
##嬢
##嬤
##嬪
##嬰
##嬴
##嬷
##嬸
##嬿
##孀
##孃
##子
##孑
##孔
##孕
##孖
##字
##存
##孙
##孚
##孛
##孜
##孝
##孟
##孢
##季
##孤
##学
##孩
##孪
##孫
##孬
##孰
##孱
##孳
##孵
##學
##孺
##孽
##孿
##宁
##它
##宅
##宇
##守
##安
##宋
##完
##宏
##宓
##宕
##宗
##官
##宙
##定
##宛
##宜
##宝
##实
##実
##宠
##审
##客
##宣
##室
##宥
##宦
##宪
##宫
##宮
##宰
##害
##宴
##宵
##家
##宸
##容
##宽
##宾
##宿
##寂
##寄
##寅
##密
##寇
##富
##寐
##寒
##寓
##寛
##寝
##寞
##察
##寡
##寢
##寥
##實
##寧
##寨
##審
##寫
##寬
##寮
##寰
##寵
##寶
##寸
##对
##寺
##寻
##导
##対
##寿
##封
##専
##射
##将
##將
##專
##尉
##尊
##尋
##對
##導
##小
##少
##尔
##尕
##尖
##尘
##尚
##尝
##尤
##尧
##尬
##就
##尴
##尷
##尸
##尹
##尺
##尻
##尼
##尽
##尾
##尿
##局
##屁
##层
##屄
##居
##屆
##屈
##屉
##届
##屋
##屌
##屍
##屎
##屏
##屐
##屑
##展
##屜
##属
##屠
##屡
##屢
##層
##履
##屬
##屯
##山
##屹
##屿
##岀
##岁
##岂
##岌
##岐
##岑
##岔
##岖
##岗
##岘
##岙
##岚
##岛
##岡
##岩
##岫
##岬
##岭
##岱
##岳
##岷
##岸
##峇
##峋
##峒
##峙
##峡
##峤
##峥
##峦
##峨
##峪
##峭
##峯
##峰
##峴
##島
##峻
##峽
##崁
##崂
##崆
##崇
##崎
##崑
##崔
##崖
##崗
##崙
##崛
##崧
##崩
##崭
##崴
##崽
##嵇
##嵊
##嵋
##嵌
##嵐
##嵘
##嵩
##嵬
##嵯
##嶂
##嶄
##嶇
##嶋
##嶙
##嶺
##嶼
##嶽
##巅
##巍
##巒
##巔
##巖
##川
##州
##巡
##巢
##工
##左
##巧
##巨
##巩
##巫
##差
##己
##已
##巳
##巴
##巷
##巻
##巽
##巾
##巿
##币
##市
##布
##帅
##帆
##师
##希
##帐
##帑
##帕
##帖
##帘
##帚
##帛
##帜
##帝
##帥
##带
##帧
##師
##席
##帮
##帯
##帰
##帳
##帶
##帷
##常
##帼
##帽
##幀
##幂
##幄
##幅
##幌
##幔
##幕
##幟
##幡
##幢
##幣
##幫
##干
##平
##年
##并
##幸
##幹
##幺
##幻
##幼
##幽
##幾
##广
##庁
##広
##庄
##庆
##庇
##床
##序
##庐
##库
##应
##底
##庖
##店
##庙
##庚
##府
##庞
##废
##庠
##度
##座
##庫
##庭
##庵
##庶
##康
##庸
##庹
##庾
##廁
##廂
##廃
##廈
##廉
##廊
##廓
##廖
##廚
##廝
##廟
##廠
##廢
##廣
##廬
##廳
##延
##廷
##建
##廿
##开
##弁
##异
##弃
##弄
##弈
##弊
##弋
##式
##弑
##弒
##弓
##弔
##引
##弗
##弘
##弛
##弟
##张
##弥
##弦
##弧
##弩
##弭
##弯
##弱
##張
##強
##弹
##强
##弼
##弾
##彅
##彆
##彈
##彌
##彎
##归
##当
##录
##彗
##彙
##彝
##形
##彤
##彥
##彦
##彧
##彩
##彪
##彫
##彬
##彭
##彰
##影
##彷
##役
##彻
##彼
##彿
##往
##征
##径
##待
##徇
##很
##徉
##徊
##律
##後
##徐
##徑
##徒
##従
##徕
##得
##徘
##徙
##徜
##從
##徠
##御
##徨
##復
##循
##徬
##微
##徳
##徴
##徵
##德
##徹
##徼
##徽
##心
##必
##忆
##忌
##忍
##忏
##忐
##忑
##忒
##忖
##志
##忘
##忙
##応
##忠
##忡
##忤
##忧
##忪
##快
##忱
##念
##忻
##忽
##忿
##怀
##态
##怂
##怅
##怆
##怎
##怏
##怒
##怔
##怕
##怖
##怙
##怜
##思
##怠
##怡
##急
##怦
##性
##怨
##怪
##怯
##怵
##总
##怼
##恁
##恃
##恆
##恋
##恍
##恐
##恒
##恕
##恙
##恚
##恢
##恣
##恤
##恥
##恨
##恩
##恪
##恫
##恬
##恭
##息
##恰
##恳
##恵
##恶
##恸
##恺
##恻
##恼
##恿
##悄
##悅
##悉
##悌
##悍
##悔
##悖
##悚
##悟
##悠
##患
##悦
##您
##悩
##悪
##悬
##悯
##悱
##悲
##悴
##悵
##悶
##悸
##悻
##悼
##悽
##情
##惆
##惇
##惊
##惋
##惑
##惕
##惘
##惚
##惜
##惟
##惠
##惡
##惦
##惧
##惨
##惩
##惫
##惬
##惭
##惮
##惯
##惰
##惱
##想
##惴
##惶
##惹
##惺
##愁
##愆
##愈
##愉
##愍
##意
##愕
##愚
##愛
##愜
##感
##愣
##愤
##愧
##愫
##愷
##愿
##慄
##慈
##態
##慌
##慎
##慑
##慕
##慘
##慚
##慟
##慢
##慣
##慧
##慨
##慫
##慮
##慰
##慳
##慵
##慶
##慷
##慾
##憂
##憊
##憋
##憎
##憐
##憑
##憔
##憚
##憤
##憧
##憨
##憩
##憫
##憬
##憲
##憶
##憾
##懂
##懇
##懈
##應
##懊
##懋
##懑
##懒
##懦
##懲
##懵
##懶
##懷
##懸
##懺
##懼
##懾
##懿
##戀
##戈
##戊
##戌
##戍
##戎
##戏
##成
##我
##戒
##戕
##或
##战
##戚
##戛
##戟
##戡
##戦
##截
##戬
##戮
##戰
##戲
##戳
##戴
##戶
##户
##戸
##戻
##戾
##房
##所
##扁
##扇
##扈
##扉
##手
##才
##扎
##扑
##扒
##打
##扔
##払
##托
##扛
##扣
##扦
##执
##扩
##扪
##扫
##扬
##扭
##扮
##扯
##扰
##扱
##扳
##扶
##批
##扼
##找
##承
##技
##抄
##抉
##把
##抑
##抒
##抓
##投
##抖
##抗
##折
##抚
##抛
##抜
##択
##抟
##抠
##抡
##抢
##护
##报
##抨
##披
##抬
##抱
##抵
##抹
##押
##抽
##抿
##拂
##拄
##担
##拆
##拇
##拈
##拉
##拋
##拌
##拍
##拎
##拐
##拒
##拓
##拔
##拖
##拗
##拘
##拙
##拚
##招
##拜
##拟
##拡
##拢
##拣
##拥
##拦
##拧
##拨
##择
##括
##拭
##拮
##拯
##拱
##拳
##拴
##拷
##拼
##拽
##拾
##拿
##持
##挂
##指
##挈
##按
##挎
##挑
##挖
##挙
##挚
##挛
##挝
##挞
##挟
##挠
##挡
##挣
##挤
##挥
##挨
##挪
##挫
##振
##挲
##挹
##挺
##挽
##挾
##捂
##捅
##捆
##捉
##捋
##捌
##捍
##捎
##捏
##捐
##捕
##捞
##损
##捡
##换
##捣
##捧
##捨
##捩
##据
##捱
##捲
##捶
##捷
##捺
##捻
##掀
##掂
##掃
##掇
##授
##掉
##掌
##掏
##掐
##排
##掖
##掘
##掙
##掛
##掠
##採
##探
##掣
##接
##控
##推
##掩
##措
##掬
##掰
##掲
##掳
##掴
##掷
##掸
##掺
##揀
##揃
##揄
##揆
##揉
##揍
##描
##提
##插
##揖
##揚
##換
##握
##揣
##揩
##揪
##揭
##揮
##援
##揶
##揸
##揹
##揽
##搀
##搁
##搂
##搅
##損
##搏
##搐
##搓
##搔
##搖
##搗
##搜
##搞
##搡
##搪
##搬
##搭
##搵
##搶
##携
##搽
##摀
##摁
##摄
##摆
##摇
##摈
##摊
##摒
##摔
##摘
##摞
##摟
##摧
##摩
##摯
##摳
##摸
##摹
##摺
##摻
##撂
##撃
##撅
##撇
##撈
##撐
##撑
##撒
##撓
##撕
##撚
##撞
##撤
##撥
##撩
##撫
##撬
##播
##撮
##撰
##撲
##撵
##撷
##撸
##撻
##撼
##撿
##擀
##擁
##擂
##擄
##擅
##擇
##擊
##擋
##操
##擎
##擒
##擔
##擘
##據
##擞
##擠
##擡
##擢
##擦
##擬
##擰
##擱
##擲
##擴
##擷
##擺
##擼
##擾
##攀
##攏
##攒
##攔
##攘
##攙
##攜
##攝
##攞
##攢
##攣
##攤
##攥
##攪
##攫
##攬
##支
##收
##攸
##改
##攻
##放
##政
##故
##效
##敌
##敍
##敎
##敏
##救
##敕
##敖
##敗
##敘
##教
##敛
##敝
##敞
##敢
##散
##敦
##敬
##数
##敲
##整
##敵
##敷
##數
##斂
##斃
##文
##斋
##斌
##斎
##斐
##斑
##斓
##斗
##料
##斛
##斜
##斟
##斡
##斤
##斥
##斧
##斩
##斫
##斬
##断
##斯
##新
##斷
##方
##於
##施
##旁
##旃
##旅
##旋
##旌
##旎
##族
##旖
##旗
##无
##既
##日
##旦
##旧
##旨
##早
##旬
##旭
##旮
##旱
##时
##旷
##旺
##旻
##昀
##昂
##昆
##昇
##昉
##昊
##昌
##明
##昏
##易
##昔
##昕
##昙
##星
##映
##春
##昧
##昨
##昭
##是
##昱
##昴
##昵
##昶
##昼
##显
##晁
##時
##晃
##晉
##晋
##晌
##晏
##晒
##晓
##晔
##晕
##晖
##晗
##晚
##晝
##晞
##晟
##晤
##晦
##晨
##晩
##普
##景
##晰
##晴
##晶
##晷
##智
##晾
##暂
##暄
##暇
##暈
##暉
##暌
##暐
##暑
##暖
##暗
##暝
##暢
##暧
##暨
##暫
##暮
##暱
##暴
##暸
##暹
##曄
##曆
##曇
##曉
##曖
##曙
##曜
##曝
##曠
##曦
##曬
##曰
##曲
##曳
##更
##書
##曹
##曼
##曾
##替
##最
##會
##月
##有
##朋
##服
##朐
##朔
##朕
##朗
##望
##朝
##期
##朦
##朧
##木
##未
##末
##本
##札
##朮
##术
##朱
##朴
##朵
##机
##朽
##杀
##杂
##权
##杆
##杈
##杉
##李
##杏
##材
##村
##杓
##杖
##杜
##杞
##束
##杠
##条
##来
##杨
##杭
##杯
##杰
##東
##杳
##杵
##杷
##杼
##松
##板
##极
##构
##枇
##枉
##枋
##析
##枕
##林
##枚
##果
##枝
##枢
##枣
##枪
##枫
##枭
##枯
##枰
##枱
##枳
##架
##枷
##枸
##柄
##柏
##某
##柑
##柒
##染
##柔
##柘
##柚
##柜
##柞
##柠
##柢
##查
##柩
##柬
##柯
##柱
##柳
##柴
##柵
##査
##柿
##栀
##栃
##栄
##栅
##标
##栈
##栉
##栋
##栎
##栏
##树
##栓
##栖
##栗
##校
##栩
##株
##样
##核
##根
##格
##栽
##栾
##桀
##桁
##桂
##桃
##桅
##框
##案
##桉
##桌
##桎
##桐
##桑
##桓
##桔
##桜
##桠
##桡
##桢
##档
##桥
##桦
##桧
##桨
##桩
##桶
##桿
##梁
##梅
##梆
##梏
##梓
##梗
##條
##梟
##梢
##梦
##梧
##梨
##梭
##梯
##械
##梳
##梵
##梶
##检
##棂
##棄
##棉
##棋
##棍
##棒
##棕
##棗
##棘
##棚
##棟
##棠
##棣
##棧
##森
##棱
##棲
##棵
##棹
##棺
##椁
##椅
##椋
##植
##椎
##椒
##検
##椪
##椭
##椰
##椹
##椽
##椿
##楂
##楊
##楓
##楔
##楚
##楝
##楞
##楠
##楣
##楨
##楫
##業
##楮
##極
##楷
##楸
##楹
##楼
##楽
##概
##榄
##榆
##榈
##榉
##榔
##榕
##榖
##榛
##榜
##榨
##榫
##榭
##榮
##榱
##榴
##榷
##榻
##槁
##槃
##構
##槌
##槍
##槎
##槐
##槓
##様
##槛
##槟
##槤
##槭
##槲
##槳
##槻
##槽
##槿
##樁
##樂
##樊
##樑
##樓
##標
##樞
##樟
##模
##樣
##権
##横
##樫
##樯
##樱
##樵
##樸
##樹
##樺
##樽
##樾
##橄
##橇
##橋
##橐
##橘
##橙
##機
##橡
##橢
##橫
##橱
##橹
##橼
##檀
##檄
##檎
##檐
##檔
##檗
##檜
##檢
##檬
##檯
##檳
##檸
##檻
##櫃
##櫚
##櫛
##櫥
##櫸
##櫻
##欄
##權
##欒
##欖
##欠
##次
##欢
##欣
##欧
##欲
##欸
##欺
##欽
##款
##歆
##歇
##歉
##歌
##歎
##歐
##歓
##歙
##歛
##歡
##止
##正
##此
##步
##武
##歧
##歩
##歪
##歯
##歲
##歳
##歴
##歷
##歸
##歹
##死
##歼
##殁
##殃
##殆
##殇
##殉
##殊
##残
##殒
##殓
##殖
##殘
##殞
##殡
##殤
##殭
##殯
##殲
##殴
##段
##殷
##殺
##殼
##殿
##毀
##毁
##毂
##毅
##毆
##毋
##母
##毎
##每
##毒
##毓
##比
##毕
##毗
##毘
##毙
##毛
##毡
##毫
##毯
##毽
##氈
##氏
##氐
##民
##氓
##气
##氖
##気
##氙
##氛
##氟
##氡
##氢
##氣
##氤
##氦
##氧
##氨
##氪
##氫
##氮
##氯
##氰
##氲
##水
##氷
##永
##氹
##氾
##汀
##汁
##求
##汆
##汇
##汉
##汎
##汐
##汕
##汗
##汙
##汛
##汝
##汞
##江
##池
##污
##汤
##汨
##汩
##汪
##汰
##汲
##汴
##汶
##汹
##決
##汽
##汾
##沁
##沂
##沃
##沅
##沈
##沉
##沌
##沏
##沐
##沒
##沓
##沖
##沙
##沛
##沟
##没
##沢
##沣
##沥
##沦
##沧
##沪
##沫
##沭
##沮
##沱
##河
##沸
##油
##治
##沼
##沽
##沾
##沿
##況
##泄
##泉
##泊
##泌
##泓
##法
##泗
##泛
##泞
##泠
##泡
##波
##泣
##泥
##注
##泪
##泫
##泮
##泯
##泰
##泱
##泳
##泵
##泷
##泸
##泻
##泼
##泽
##泾
##洁
##洄
##洋
##洒
##洗
##洙
##洛
##洞
##津
##洩
##洪
##洮
##洱
##洲
##洵
##洶
##洸
##洹
##活
##洼
##洽
##派
##流
##浃
##浄
##浅
##浆
##浇
##浊
##测
##济
##浏
##浑
##浒
##浓
##浔
##浙
##浚
##浜
##浣
##浦
##浩
##浪
##浬
##浮
##浯
##浴
##海
##浸
##涂
##涅
##涇
##消
##涉
##涌
##涎
##涓
##涔
##涕
##涙
##涛
##涝
##涞
##涟
##涠
##涡
##涣
##涤
##润
##涧
##涨
##涩
##涪
##涮
##涯
##液
##涵
##涸
##涼
##涿
##淀
##淄
##淅
##淆
##淇
##淋
##淌
##淑
##淒
##淖
##淘
##淙
##淚
##淞
##淡
##淤
##淦
##淨
##淩
##淪
##淫
##淬
##淮
##深
##淳
##淵
##混
##淹
##淺
##添
##淼
##清
##済
##渉
##渊
##渋
##渍
##渎
##渐
##渔
##渗
##渙
##渚
##減
##渝
##渠
##渡
##渣
##渤
##渥
##渦
##温
##測
##渭
##港
##渲
##渴
##游
##渺
##渾
##湃
##湄
##湊
##湍
##湖
##湘
##湛
##湟
##湧
##湫
##湮
##湯
##湳
##湾
##湿
##満
##溃
##溅
##溉
##溏
##源
##準
##溜
##溝
##溟
##溢
##溥
##溧
##溪
##溫
##溯
##溱
##溴
##溶
##溺
##溼
##滁
##滂
##滄
##滅
##滇
##滋
##滌
##滑
##滓
##滔
##滕
##滙
##滚
##滝
##滞
##滟
##满
##滢
##滤
##滥
##滦
##滨
##滩
##滬
##滯
##滲
##滴
##滷
##滸
##滾
##滿
##漁
##漂
##漆
##漉
##漏
##漓
##演
##漕
##漠
##漢
##漣
##漩
##漪
##漫
##漬
##漯
##漱
##漲
##漳
##漸
##漾
##漿
##潆
##潇
##潋
##潍
##潑
##潔
##潘
##潛
##潜
##潞
##潟
##潢
##潤
##潦
##潧
##潭
##潮
##潰
##潴
##潸
##潺
##潼
##澀
##澄
##澆
##澈
##澍
##澎
##澗
##澜
##澡
##澤
##澧
##澱
##澳
##澹
##激
##濁
##濂
##濃
##濑
##濒
##濕
##濘
##濛
##濟
##濠
##濡
##濤
##濫
##濬
##濮
##濯
##濱
##濺
##濾
##瀅
##瀆
##瀉
##瀋
##瀏
##瀑
##瀕
##瀘
##瀚
##瀛
##瀝
##瀞
##瀟
##瀧
##瀨
##瀬
##瀰
##瀾
##灌
##灏
##灑
##灘
##灝
##灞
##灣
##火
##灬
##灭
##灯
##灰
##灵
##灶
##灸
##灼
##災
##灾
##灿
##炀
##炁
##炅
##炉
##炊
##炎
##炒
##炔
##炕
##炖
##炙
##炜
##炫
##炬
##炭
##炮
##炯
##炳
##炷
##炸
##点
##為
##炼
##炽
##烁
##烂
##烃
##烈
##烊
##烏
##烘
##烙
##烛
##烟
##烤
##烦
##烧
##烨
##烩
##烫
##烬
##热
##烯
##烷
##烹
##烽
##焉
##焊
##焕
##焖
##焗
##焘
##焙
##焚
##焜
##無
##焦
##焯
##焰
##焱
##然
##焼
##煅
##煉
##煊
##煌
##煎
##煒
##煖
##煙
##煜
##煞
##煤
##煥
##煦
##照
##煨
##煩
##煮
##煲
##煸
##煽
##熄
##熊
##熏
##熒
##熔
##熙
##熟
##熠
##熨
##熬
##熱
##熵
##熹
##熾
##燁
##燃
##燄
##燈
##燉
##燊
##燎
##燒
##燔
##燕
##燙
##燜
##營
##燥
##燦
##燧
##燭
##燮
##燴
##燻
##燼
##燿
##爆
##爍
##爐
##爛
##爪
##爬
##爭
##爰
##爱
##爲
##爵
##父
##爷
##爸
##爹
##爺
##爻
##爽
##爾
##牆
##片
##版
##牌
##牍
##牒
##牙
##牛
##牝
##牟
##牠
##牡
##牢
##牦
##牧
##物
##牯
##牲
##牴
##牵
##特
##牺
##牽
##犀
##犁
##犄
##犊
##犍
##犒
##犢
##犧
##犬
##犯
##状
##犷
##犸
##犹
##狀
##狂
##狄
##狈
##狎
##狐
##狒
##狗
##狙
##狞
##狠
##狡
##狩
##独
##狭
##狮
##狰
##狱
##狸
##狹
##狼
##狽
##猎
##猕
##猖
##猗
##猙
##猛
##猜
##猝
##猥
##猩
##猪
##猫
##猬
##献
##猴
##猶
##猷
##猾
##猿
##獄
##獅
##獎
##獐
##獒
##獗
##獠
##獣
##獨
##獭
##獰
##獲
##獵
##獷
##獸
##獺
##獻
##獼
##獾
##玄
##率
##玉
##王
##玑
##玖
##玛
##玟
##玠
##玥
##玩
##玫
##玮
##环
##现
##玲
##玳
##玷
##玺
##玻
##珀
##珂
##珅
##珈
##珉
##珊
##珍
##珏
##珐
##珑
##珙
##珞
##珠
##珣
##珥
##珩
##珪
##班
##珮
##珲
##珺
##現
##球
##琅
##理
##琇
##琉
##琊
##琍
##琏
##琐
##琛
##琢
##琥
##琦
##琨
##琪
##琬
##琮
##琰
##琲
##琳
##琴
##琵
##琶
##琺
##琼
##瑀
##瑁
##瑄
##瑋
##瑕
##瑗
##瑙
##瑚
##瑛
##瑜
##瑞
##瑟
##瑠
##瑣
##瑤
##瑩
##瑪
##瑯
##瑰
##瑶
##瑾
##璀
##璁
##璃
##璇
##璉
##璋
##璎
##璐
##璜
##璞
##璟
##璧
##璨
##環
##璽
##璿
##瓊
##瓏
##瓒
##瓜
##瓢
##瓣
##瓤
##瓦
##瓮
##瓯
##瓴
##瓶
##瓷
##甄
##甌
##甕
##甘
##甙
##甚
##甜
##生
##產
##産
##甥
##甦
##用
##甩
##甫
##甬
##甭
##甯
##田
##由
##甲
##申
##电
##男
##甸
##町
##画
##甾
##畀
##畅
##界
##畏
##畑
##畔
##留
##畜
##畝
##畢
##略
##畦
##番
##畫
##異
##畲
##畳
##畴
##當
##畸
##畹
##畿
##疆
##疇
##疊
##疏
##疑
##疔
##疖
##疗
##疙
##疚
##疝
##疟
##疡
##疣
##疤
##疥
##疫
##疮
##疯
##疱
##疲
##疳
##疵
##疸
##疹
##疼
##疽
##疾
##痂
##病
##症
##痈
##痉
##痊
##痍
##痒
##痔
##痕
##痘
##痙
##痛
##痞
##痠
##痢
##痣
##痤
##痧
##痨
##痪
##痫
##痰
##痱
##痴
##痹
##痺
##痼
##痿
##瘀
##瘁
##瘋
##瘍
##瘓
##瘘
##瘙
##瘟
##瘠
##瘡
##瘢
##瘤
##瘦
##瘧
##瘩
##瘪
##瘫
##瘴
##瘸
##瘾
##療
##癇
##癌
##癒
##癖
##癜
##癞
##癡
##癢
##癣
##癥
##癫
##癬
##癮
##癱
##癲
##癸
##発
##登
##發
##白
##百
##皂
##的
##皆
##皇
##皈
##皋
##皎
##皑
##皓
##皖
##皙
##皚
##皮
##皰
##皱
##皴
##皺
##皿
##盂
##盃
##盅
##盆
##盈
##益
##盎
##盏
##盐
##监
##盒
##盔
##盖
##盗
##盘
##盛
##盜
##盞
##盟
##盡
##監
##盤
##盥
##盧
##盪
##目
##盯
##盱
##盲
##直
##相
##盹
##盼
##盾
##省
##眈
##眉
##看
##県
##眙
##眞
##真
##眠
##眦
##眨
##眩
##眯
##眶
##眷
##眸
##眺
##眼
##眾
##着
##睁
##睇
##睏
##睐
##睑
##睛
##睜
##睞
##睡
##睢
##督
##睥
##睦
##睨
##睪
##睫
##睬
##睹
##睽
##睾
##睿
##瞄
##瞅
##瞇
##瞋
##瞌
##瞎
##瞑
##瞒
##瞓
##瞞
##瞟
##瞠
##瞥
##瞧
##瞩
##瞪
##瞬
##瞭
##瞰
##瞳
##瞻
##瞼
##瞿
##矇
##矍
##矗
##矚
##矛
##矜
##矢
##矣
##知
##矩
##矫
##短
##矮
##矯
##石
##矶
##矽
##矾
##矿
##码
##砂
##砌
##砍
##砒
##研
##砖
##砗
##砚
##砝
##砣
##砥
##砧
##砭
##砰
##砲
##破
##砷
##砸
##砺
##砼
##砾
##础
##硅
##硐
##硒
##硕
##硝
##硫
##硬
##确
##硯
##硼
##碁
##碇
##碉
##碌
##碍
##碎
##碑
##碓
##碗
##碘
##碚
##碛
##碟
##碣
##碧
##碩
##碰
##碱
##碳
##碴
##確
##碼
##碾
##磁
##磅
##磊
##磋
##磐
##磕
##磚
##磡
##磨
##磬
##磯
##磲
##磷
##磺
##礁
##礎
##礙
##礡
##礦
##礪
##礫
##礴
##示
##礼
##社
##祀
##祁
##祂
##祇
##祈
##祉
##祎
##祐
##祕
##祖
##祗
##祚
##祛
##祜
##祝
##神
##祟
##祠
##祢
##祥
##票
##祭
##祯
##祷
##祸
##祺
##祿
##禀
##禁
##禄
##禅
##禍
##禎
##福
##禛
##禦
##禧
##禪
##禮
##禱
##禹
##禺
##离
##禽
##禾
##禿
##秀
##私
##秃
##秆
##秉
##秋
##种
##科
##秒
##秘
##租
##秣
##秤
##秦
##秧
##秩
##秭
##积
##称
##秸
##移
##秽
##稀
##稅
##程
##稍
##税
##稔
##稗
##稚
##稜
##稞
##稟
##稠
##稣
##種
##稱
##稲
##稳
##稷
##稹
##稻
##稼
##稽
##稿
##穀
##穂
##穆
##穌
##積
##穎
##穗
##穢
##穩
##穫
##穴
##究
##穷
##穹
##空
##穿
##突
##窃
##窄
##窈
##窍
##窑
##窒
##窓
##窕
##窖
##窗
##窘
##窜
##窝
##窟
##窠
##窥
##窦
##窨
##窩
##窪
##窮
##窯
##窺
##窿
##竄
##竅
##竇
##竊
##立
##竖
##站
##竜
##竞
##竟
##章
##竣
##童
##竭
##端
##競
##竹
##竺
##竽
##竿
##笃
##笆
##笈
##笋
##笏
##笑
##笔
##笙
##笛
##笞
##笠
##符
##笨
##第
##笹
##笺
##笼
##筆
##等
##筊
##筋
##筍
##筏
##筐
##筑
##筒
##答
##策
##筛
##筝
##筠
##筱
##筲
##筵
##筷
##筹
##签
##简
##箇
##箋
##箍
##箏
##箐
##箔
##箕
##算
##箝
##管
##箩
##箫
##箭
##箱
##箴
##箸
##節
##篁
##範
##篆
##篇
##築
##篑
##篓
##篙
##篝
##篠
##篡
##篤
##篩
##篪
##篮
##篱
##篷
##簇
##簌
##簍
##簡
##簦
##簧
##簪
##簫
##簷
##簸
##簽
##簾
##簿
##籁
##籃
##籌
##籍
##籐
##籟
##籠
##籤
##籬
##籮
##籲
##米
##类
##籼
##籽
##粄
##粉
##粑
##粒
##粕
##粗
##粘
##粟
##粤
##粥
##粧
##粪
##粮
##粱
##粲
##粳
##粵
##粹
##粼
##粽
##精
##粿
##糅
##糊
##糍
##糕
##糖
##糗
##糙
##糜
##糞
##糟
##糠
##糧
##糬
##糯
##糰
##糸
##系
##糾
##紀
##紂
##約
##紅
##紉
##紊
##紋
##納
##紐
##紓
##純
##紗
##紘
##紙
##級
##紛
##紜
##素
##紡
##索
##紧
##紫
##紮
##累
##細
##紳
##紹
##紺
##終
##絃
##組
##絆
##経
##結
##絕
##絞
##絡
##絢
##給
##絨
##絮
##統
##絲
##絳
##絵
##絶
##絹
##綁
##綏
##綑
##經
##継
##続
##綜
##綠
##綢
##綦
##綫
##綬
##維
##綱
##網
##綴
##綵
##綸
##綺
##綻
##綽
##綾
##綿
##緊
##緋
##総
##緑
##緒
##緘
##線
##緝
##緞
##締
##緣
##編
##緩
##緬
##緯
##練
##緹
##緻
##縁
##縄
##縈
##縛
##縝
##縣
##縫
##縮
##縱
##縴
##縷
##總
##績
##繁
##繃
##繆
##繇
##繋
##織
##繕
##繚
##繞
##繡
##繩
##繪
##繫
##繭
##繳
##繹
##繼
##繽
##纂
##續
##纍
##纏
##纓
##纔
##纖
##纜
##纠
##红
##纣
##纤
##约
##级
##纨
##纪
##纫
##纬
##纭
##纯
##纰
##纱
##纲
##纳
##纵
##纶
##纷
##纸
##纹
##纺
##纽
##纾
##线
##绀
##练
##组
##绅
##细
##织
##终
##绊
##绍
##绎
##经
##绑
##绒
##结
##绔
##绕
##绘
##给
##绚
##绛
##络
##绝
##绞
##统
##绡
##绢
##绣
##绥
##绦
##继
##绩
##绪
##绫
##续
##绮
##绯
##绰
##绳
##维
##绵
##绶
##绷
##绸
##绻
##综
##绽
##绾
##绿
##缀
##缄
##缅
##缆
##缇
##缈
##缉
##缎
##缓
##缔
##缕
##编
##缘
##缙
##缚
##缜
##缝
##缠
##缢
##缤
##缥
##缨
##缩
##缪
##缭
##缮
##缰
##缱
##缴
##缸
##缺
##缽
##罂
##罄
##罌
##罐
##网
##罔
##罕
##罗
##罚
##罡
##罢
##罩
##罪
##置
##罰
##署
##罵
##罷
##罹
##羁
##羅
##羈
##羊
##羌
##美
##羔
##羚
##羞
##羟
##羡
##羣
##群
##羥
##羧
##羨
##義
##羯
##羲
##羸
##羹
##羽
##羿
##翁
##翅
##翊
##翌
##翎
##習
##翔
##翘
##翟
##翠
##翡
##翦
##翩
##翰
##翱
##翳
##翹
##翻
##翼
##耀
##老
##考
##耄
##者
##耆
##耋
##而
##耍
##耐
##耒
##耕
##耗
##耘
##耙
##耦
##耨
##耳
##耶
##耷
##耸
##耻
##耽
##耿
##聂
##聆
##聊
##聋
##职
##聒
##联
##聖
##聘
##聚
##聞
##聪
##聯
##聰
##聲
##聳
##聴
##聶
##職
##聽
##聾
##聿
##肃
##肄
##肅
##肆
##肇
##肉
##肋
##肌
##肏
##肓
##肖
##肘
##肚
##肛
##肝
##肠
##股
##肢
##肤
##肥
##肩
##肪
##肮
##肯
##肱
##育
##肴
##肺
##肽
##肾
##肿
##胀
##胁
##胃
##胄
##胆
##背
##胍
##胎
##胖
##胚
##胛
##胜
##胝
##胞
##胡
##胤
##胥
##胧
##胫
##胭
##胯
##胰
##胱
##胳
##胴
##胶
##胸
##胺
##能
##脂
##脅
##脆
##脇
##脈
##脉
##脊
##脍
##脏
##脐
##脑
##脓
##脖
##脘
##脚
##脛
##脣
##脩
##脫
##脯
##脱
##脲
##脳
##脸
##脹
##脾
##腆
##腈
##腊
##腋
##腌
##腎
##腐
##腑
##腓
##腔
##腕
##腥
##腦
##腩
##腫
##腭
##腮
##腰
##腱
##腳
##腴
##腸
##腹
##腺
##腻
##腼
##腾
##腿
##膀
##膈
##膊
##膏
##膑
##膘
##膚
##膛
##膜
##膝
##膠
##膦
##膨
##膩
##膳
##膺
##膻
##膽
##膾
##膿
##臀
##臂
##臃
##臆
##臉
##臊
##臍
##臓
##臘
##臟
##臣
##臥
##臧
##臨
##自
##臬
##臭
##至
##致
##臺
##臻
##臼
##臾
##舀
##舂
##舅
##舆
##與
##興
##舉
##舊
##舌
##舍
##舎
##舐
##舒
##舔
##舖
##舗
##舛
##舜
##舞
##舟
##航
##舫
##般
##舰
##舱
##舵
##舶
##舷
##舸
##船
##舺
##舾
##艇
##艋
##艘
##艙
##艦
##艮
##良
##艰
##艱
##色
##艳
##艷
##艹
##艺
##艾
##节
##芃
##芈
##芊
##芋
##芍
##芎
##芒
##芙
##芜
##芝
##芡
##芥
##芦
##芩
##芪
##芫
##芬
##芭
##芮
##芯
##花
##芳
##芷
##芸
##芹
##芻
##芽
##芾
##苁
##苄
##苇
##苋
##苍
##苏
##苑
##苒
##苓
##苔
##苕
##苗
##苛
##苜
##苞
##苟
##苡
##苣
##若
##苦
##苫
##苯
##英
##苷
##苹
##苻
##茁
##茂
##范
##茄
##茅
##茉
##茎
##茏
##茗
##茜
##茧
##茨
##茫
##茬
##茭
##茯
##茱
##茲
##茴
##茵
##茶
##茸
##茹
##茼
##荀
##荃
##荆
##草
##荊
##荏
##荐
##荒
##荔
##荖
##荘
##荚
##荞
##荟
##荠
##荡
##荣
##荤
##荥
##荧
##荨
##荪
##荫
##药
##荳
##荷
##荸
##荻
##荼
##荽
##莅
##莆
##莉
##莊
##莎
##莒
##莓
##莖
##莘
##莞
##莠
##莢
##莧
##莪
##莫
##莱
##莲
##莴
##获
##莹
##莺
##莽
##莿
##菀
##菁
##菅
##菇
##菈
##菊
##菌
##菏
##菓
##菖
##菘
##菜
##菟
##菠
##菡
##菩
##華
##菱
##菲
##菸
##菽
##萁
##萃
##萄
##萊
##萋
##萌
##萍
##萎
##萘
##萝
##萤
##营
##萦
##萧
##萨
##萩
##萬
##萱
##萵
##萸
##萼
##落
##葆
##葉
##著
##葚
##葛
##葡
##董
##葦
##葩
##葫
##葬
##葭
##葯
##葱
##葳
##葵
##葷
##葺
##蒂
##蒋
##蒐
##蒔
##蒙
##蒜
##蒞
##蒟
##蒡
##蒨
##蒲
##蒸
##蒹
##蒻
##蒼
##蒿
##蓁
##蓄
##蓆
##蓉
##蓋
##蓑
##蓓
##蓖
##蓝
##蓟
##蓦
##蓬
##蓮
##蓼
##蓿
##蔑
##蔓
##蔔
##蔗
##蔘
##蔚
##蔡
##蔣
##蔥
##蔫
##蔬
##蔭
##蔵
##蔷
##蔺
##蔻
##蔼
##蔽
##蕁
##蕃
##蕈
##蕉
##蕊
##蕎
##蕙
##蕤
##蕨
##蕩
##蕪
##蕭
##蕲
##蕴
##蕻
##蕾
##薄
##薅
##薇
##薈
##薊
##薏
##薑
##薔
##薙
##薛
##薦
##薨
##薩
##薪
##薬
##薯
##薰
##薹
##藉
##藍
##藏
##藐
##藓
##藕
##藜
##藝
##藤
##藥
##藩
##藹
##藻
##藿
##蘆
##蘇
##蘊
##蘋
##蘑
##蘚
##蘭
##蘸
##蘼
##蘿
##虎
##虏
##虐
##虑
##虔
##處
##虚
##虛
##虜
##虞
##號
##虢
##虧
##虫
##虬
##虱
##虹
##虻
##虽
##虾
##蚀
##蚁
##蚂
##蚊
##蚌
##蚓
##蚕
##蚜
##蚝
##蚣
##蚤
##蚩
##蚪
##蚯
##蚱
##蚵
##蛀
##蛆
##蛇
##蛊
##蛋
##蛎
##蛐
##蛔
##蛙
##蛛
##蛟
##蛤
##蛭
##蛮
##蛰
##蛳
##蛹
##蛻
##蛾
##蜀
##蜂
##蜃
##蜆
##蜇
##蜈
##蜊
##蜍
##蜒
##蜓
##蜕
##蜗
##蜘
##蜚
##蜜
##蜡
##蜢
##蜥
##蜱
##蜴
##蜷
##蜻
##蜿
##蝇
##蝈
##蝉
##蝌
##蝎
##蝕
##蝗
##蝙
##蝟
##蝠
##蝦
##蝨
##蝴
##蝶
##蝸
##蝼
##螂
##螃
##融
##螞
##螢
##螨
##螯
##螳
##螺
##蟀
##蟄
##蟆
##蟋
##蟎
##蟑
##蟒
##蟠
##蟬
##蟲
##蟹
##蟻
##蟾
##蠅
##蠍
##蠔
##蠕
##蠛
##蠟
##蠡
##蠢
##蠣
##蠱
##蠶
##蠹
##蠻
##血
##衄
##衅
##衆
##行
##衍
##術
##衔
##街
##衙
##衛
##衝
##衞
##衡
##衢
##衣
##补
##表
##衩
##衫
##衬
##衮
##衰
##衲
##衷
##衹
##衾
##衿
##袁
##袂
##袄
##袅
##袈
##袋
##袍
##袒
##袖
##袜
##袞
##袤
##袪
##被
##袭
##袱
##裁
##裂
##装
##裆
##裊
##裏
##裔
##裕
##裘
##裙
##補
##裝
##裟
##裡
##裤
##裨
##裱
##裳
##裴
##裸
##裹
##製
##裾
##褂
##複
##褐
##褒
##褓
##褔
##褚
##褥
##褪
##褫
##褲
##褶
##褻
##襁
##襄
##襟
##襠
##襪
##襬
##襯
##襲
##西
##要
##覃
##覆
##覇
##見
##規
##覓
##視
##覚
##覦
##覧
##親
##覬
##観
##覷
##覺
##覽
##觀
##见
##观
##规
##觅
##视
##览
##觉
##觊
##觎
##觐
##觑
##角
##觞
##解
##觥
##触
##觸
##言
##訂
##計
##訊
##討
##訓
##訕
##訖
##託
##記
##訛
##訝
##訟
##訣
##訥
##訪
##設
##許
##訳
##訴
##訶
##診
##註
##証
##詆
##詐
##詔
##評
##詛
##詞
##詠
##詡
##詢
##詣
##試
##詩
##詫
##詬
##詭
##詮
##詰
##話
##該
##詳
##詹
##詼
##誅
##誇
##誉
##誌
##認
##誓
##誕
##誘
##語
##誠
##誡
##誣
##誤
##誥
##誦
##誨
##說
##説
##読
##誰
##課
##誹
##誼
##調
##諄
##談
##請
##諏
##諒
##論
##諗
##諜
##諡
##諦
##諧
##諫
##諭
##諮
##諱
##諳
##諷
##諸
##諺
##諾
##謀
##謁
##謂
##謄
##謊
##謎
##謐
##謔
##謗
##謙
##講
##謝
##謠
##謨
##謬
##謹
##謾
##譁
##證
##譎
##譏
##識
##譙
##譚
##譜
##警
##譬
##譯
##議
##譲
##譴
##護
##譽
##讀
##變
##讓
##讚
##讞
##计
##订
##认
##讥
##讧
##讨
##让
##讪
##讫
##训
##议
##讯
##记
##讲
##讳
##讴
##讶
##讷
##许
##讹
##论
##讼
##讽
##设
##访
##诀
##证
##诃
##评
##诅
##识
##诈
##诉
##诊
##诋
##词
##诏
##译
##试
##诗
##诘
##诙
##诚
##诛
##话
##诞
##诟
##诠
##诡
##询
##诣
##诤
##该
##详
##诧
##诩
##诫
##诬
##语
##误
##诰
##诱
##诲
##说
##诵
##诶
##请
##诸
##诺
##读
##诽
##课
##诿
##谀
##谁
##调
##谄
##谅
##谆
##谈
##谊
##谋
##谌
##谍
##谎
##谏
##谐
##谑
##谒
##谓
##谔
##谕
##谗
##谘
##谙
##谚
##谛
##谜
##谟
##谢
##谣
##谤
##谥
##谦
##谧
##谨
##谩
##谪
##谬
##谭
##谯
##谱
##谲
##谴
##谶
##谷
##豁
##豆
##豇
##豈
##豉
##豊
##豌
##豎
##豐
##豔
##豚
##象
##豢
##豪
##豫
##豬
##豹
##豺
##貂
##貅
##貌
##貓
##貔
##貘
##貝
##貞
##負
##財
##貢
##貧
##貨
##販
##貪
##貫
##責
##貯
##貰
##貳
##貴
##貶
##買
##貸
##費
##貼
##貽
##貿
##賀
##賁
##賂
##賃
##賄
##資
##賈
##賊
##賑
##賓
##賜
##賞
##賠
##賡
##賢
##賣
##賤
##賦
##質
##賬
##賭
##賴
##賺
##購
##賽
##贅
##贈
##贊
##贍
##贏
##贓
##贖
##贛
##贝
##贞
##负
##贡
##财
##责
##贤
##败
##账
##货
##质
##贩
##贪
##贫
##贬
##购
##贮
##贯
##贰
##贱
##贲
##贴
##贵
##贷
##贸
##费
##贺
##贻
##贼
##贾
##贿
##赁
##赂
##赃
##资
##赅
##赈
##赊
##赋
##赌
##赎
##赏
##赐
##赓
##赔
##赖
##赘
##赚
##赛
##赝
##赞
##赠
##赡
##赢
##赣
##赤
##赦
##赧
##赫
##赭
##走
##赳
##赴
##赵
##赶
##起
##趁
##超
##越
##趋
##趕
##趙
##趟
##趣
##趨
##足
##趴
##趵
##趸
##趺
##趾
##跃
##跄
##跆
##跋
##跌
##跎
##跑
##跖
##跚
##跛
##距
##跟
##跡
##跤
##跨
##跩
##跪
##路
##跳
##践
##跷
##跹
##跺
##跻
##踉
##踊
##踌
##踏
##踐
##踝
##踞
##踟
##踢
##踩
##踪
##踮
##踱
##踴
##踵
##踹
##蹂
##蹄
##蹇
##蹈
##蹉
##蹊
##蹋
##蹑
##蹒
##蹙
##蹟
##蹣
##蹤
##蹦
##蹩
##蹬
##蹭
##蹲
##蹴
##蹶
##蹺
##蹼
##蹿
##躁
##躇
##躉
##躊
##躋
##躍
##躏
##躪
##身
##躬
##躯
##躲
##躺
##軀
##車
##軋
##軌
##軍
##軒
##軟
##転
##軸
##軼
##軽
##軾
##較
##載
##輒
##輓
##輔
##輕
##輛
##輝
##輟
##輩
##輪
##輯
##輸
##輻
##輾
##輿
##轄
##轅
##轆
##轉
##轍
##轎
##轟
##车
##轧
##轨
##轩
##转
##轭
##轮
##软
##轰
##轲
##轴
##轶
##轻
##轼
##载
##轿
##较
##辄
##辅
##辆
##辇
##辈
##辉
##辊
##辍
##辐
##辑
##输
##辕
##辖
##辗
##辘
##辙
##辛
##辜
##辞
##辟
##辣
##辦
##辨
##辩
##辫
##辭
##辮
##辯
##辰
##辱
##農
##边
##辺
##辻
##込
##辽
##达
##迁
##迂
##迄
##迅
##过
##迈
##迎
##运
##近
##返
##还
##这
##进
##远
##违
##连
##迟
##迢
##迤
##迥
##迦
##迩
##迪
##迫
##迭
##述
##迴
##迷
##迸
##迹
##迺
##追
##退
##送
##适
##逃
##逅
##逆
##选
##逊
##逍
##透
##逐
##递
##途
##逕
##逗
##這
##通
##逛
##逝
##逞
##速
##造
##逢
##連
##逮
##週
##進
##逵
##逶
##逸
##逻
##逼
##逾
##遁
##遂
##遅
##遇
##遊
##運
##遍
##過
##遏
##遐
##遑
##遒
##道
##達
##違
##遗
##遙
##遛
##遜
##遞
##遠
##遢
##遣
##遥
##遨
##適
##遭
##遮
##遲
##遴
##遵
##遶
##遷
##選
##遺
##遼
##遽
##避
##邀
##邁
##邂
##邃
##還
##邇
##邈
##邊
##邋
##邏
##邑
##邓
##邕
##邛
##邝
##邢
##那
##邦
##邨
##邪
##邬
##邮
##邯
##邰
##邱
##邳
##邵
##邸
##邹
##邺
##邻
##郁
##郅
##郊
##郎
##郑
##郜
##郝
##郡
##郢
##郤
##郦
##郧
##部
##郫
##郭
##郴
##郵
##郷
##郸
##都
##鄂
##鄉
##鄒
##鄔
##鄙
##鄞
##鄢
##鄧
##鄭
##鄰
##鄱
##鄲
##鄺
##酉
##酊
##酋
##酌
##配
##酐
##酒
##酗
##酚
##酝
##酢
##酣
##酥
##酩
##酪
##酬
##酮
##酯
##酰
##酱
##酵
##酶
##酷
##酸
##酿
##醃
##醇
##醉
##醋
##醍
##醐
##醒
##醚
##醛
##醜
##醞
##醣
##醪
##醫
##醬
##醮
##醯
##醴
##醺
##釀
##釁
##采
##釉
##释
##釋
##里
##重
##野
##量
##釐
##金
##釗
##釘
##釜
##針
##釣
##釦
##釧
##釵
##鈀
##鈉
##鈍
##鈎
##鈔
##鈕
##鈞
##鈣
##鈦
##鈪
##鈴
##鈺
##鈾
##鉀
##鉄
##鉅
##鉉
##鉑
##鉗
##鉚
##鉛
##鉤
##鉴
##鉻
##銀
##銃
##銅
##銑
##銓
##銖
##銘
##銜
##銬
##銭
##銮
##銳
##銷
##銹
##鋁
##鋅
##鋒
##鋤
##鋪
##鋰
##鋸
##鋼
##錄
##錐
##錘
##錚
##錠
##錢
##錦
##錨
##錫
##錮
##錯
##録
##錳
##錶
##鍊
##鍋
##鍍
##鍛
##鍥
##鍰
##鍵
##鍺
##鍾
##鎂
##鎊
##鎌
##鎏
##鎔
##鎖
##鎗
##鎚
##鎧
##鎬
##鎮
##鎳
##鏈
##鏖
##鏗
##鏘
##鏞
##鏟
##鏡
##鏢
##鏤
##鏽
##鐘
##鐮
##鐲
##鐳
##鐵
##鐸
##鐺
##鑄
##鑊
##鑑
##鑒
##鑣
##鑫
##鑰
##鑲
##鑼
##鑽
##鑾
##鑿
##针
##钉
##钊
##钎
##钏
##钒
##钓
##钗
##钙
##钛
##钜
##钝
##钞
##钟
##钠
##钡
##钢
##钣
##钤
##钥
##钦
##钧
##钨
##钩
##钮
##钯
##钰
##钱
##钳
##钴
##钵
##钺
##钻
##钼
##钾
##钿
##铀
##铁
##铂
##铃
##铄
##铅
##铆
##铉
##铎
##铐
##铛
##铜
##铝
##铠
##铡
##铢
##铣
##铤
##铨
##铩
##铬
##铭
##铮
##铰
##铲
##铵
##银
##铸
##铺
##链
##铿
##销
##锁
##锂
##锄
##锅
##锆
##锈
##锉
##锋
##锌
##锏
##锐
##锑
##错
##锚
##锟
##锡
##锢
##锣
##锤
##锥
##锦
##锭
##键
##锯
##锰
##锲
##锵
##锹
##锺
##锻
##镀
##镁
##镂
##镇
##镉
##镌
##镍
##镐
##镑
##镕
##镖
##镗
##镛
##镜
##镣
##镭
##镯
##镰
##镳
##镶
##長
##长
##門
##閃
##閉
##開
##閎
##閏
##閑
##閒
##間
##閔
##閘
##閡
##関
##閣
##閥
##閨
##閩
##閱
##閲
##閹
##閻
##閾
##闆
##闇
##闊
##闌
##闍
##闔
##闕
##闖
##闘
##關
##闡
##闢
##门
##闪
##闫
##闭
##问
##闯
##闰
##闲
##间
##闵
##闷
##闸
##闹
##闺
##闻
##闽
##闾
##阀
##阁
##阂
##阅
##阆
##阇
##阈
##阉
##阎
##阐
##阑
##阔
##阕
##阖
##阙
##阚
##阜
##队
##阡
##阪
##阮
##阱
##防
##阳
##阴
##阵
##阶
##阻
##阿
##陀
##陂
##附
##际
##陆
##陇
##陈
##陋
##陌
##降
##限
##陕
##陛
##陝
##陞
##陟
##陡
##院
##陣
##除
##陨
##险
##陪
##陰
##陲
##陳
##陵
##陶
##陷
##陸
##険
##陽
##隅
##隆
##隈
##隊
##隋
##隍
##階
##随
##隐
##隔
##隕
##隘
##隙
##際
##障
##隠
##隣
##隧
##隨
##險
##隱
##隴
##隶
##隸
##隻
##隼
##隽
##难
##雀
##雁
##雄
##雅
##集
##雇
##雉
##雋
##雌
##雍
##雎
##雏
##雑
##雒
##雕
##雖
##雙
##雛
##雜
##雞
##離
##難
##雨
##雪
##雯
##雰
##雲
##雳
##零
##雷
##雹
##電
##雾
##需
##霁
##霄
##霆
##震
##霈
##霉
##霊
##霍
##霎
##霏
##霑
##霓
##霖
##霜
##霞
##霧
##霭
##霰
##露
##霸
##霹
##霽
##霾
##靂
##靄
##靈
##青
##靓
##靖
##静
##靚
##靛
##靜
##非
##靠
##靡
##面
##靥
##靦
##革
##靳
##靴
##靶
##靼
##鞅
##鞋
##鞍
##鞏
##鞑
##鞘
##鞠
##鞣
##鞦
##鞭
##韆
##韋
##韌
##韓
##韜
##韦
##韧
##韩
##韬
##韭
##音
##韵
##韶
##韻
##響
##頁
##頂
##頃
##項
##順
##須
##頌
##預
##頑
##頒
##頓
##頗
##領
##頜
##頡
##頤
##頫
##頭
##頰
##頷
##頸
##頹
##頻
##頼
##顆
##題
##額
##顎
##顏
##顔
##願
##顛
##類
##顧
##顫
##顯
##顱
##顴
##页
##顶
##顷
##项
##顺
##须
##顼
##顽
##顾
##顿
##颁
##颂
##预
##颅
##领
##颇
##颈
##颉
##颊
##颌
##颍
##颐
##频
##颓
##颔
##颖
##颗
##题
##颚
##颛
##颜
##额
##颞
##颠
##颡
##颢
##颤
##颦
##颧
##風
##颯
##颱
##颳
##颶
##颼
##飄
##飆
##风
##飒
##飓
##飕
##飘
##飙
##飚
##飛
##飞
##食
##飢
##飨
##飩
##飪
##飯
##飲
##飼
##飽
##飾
##餃
##餅
##餉
##養
##餌
##餐
##餒
##餓
##餘
##餚
##餛
##餞
##餡
##館
##餮
##餵
##餾
##饅
##饈
##饋
##饌
##饍
##饑
##饒
##饕
##饗
##饞
##饥
##饨
##饪
##饬
##饭
##饮
##饯
##饰
##饱
##饲
##饴
##饵
##饶
##饷
##饺
##饼
##饽
##饿
##馀
##馁
##馄
##馅
##馆
##馈
##馋
##馍
##馏
##馒
##馔
##首
##馗
##香
##馥
##馨
##馬
##馭
##馮
##馳
##馴
##駁
##駄
##駅
##駆
##駐
##駒
##駕
##駛
##駝
##駭
##駱
##駿
##騁
##騎
##騏
##験
##騙
##騨
##騰
##騷
##驀
##驅
##驊
##驍
##驒
##驕
##驗
##驚
##驛
##驟
##驢
##驥
##马
##驭
##驮
##驯
##驰
##驱
##驳
##驴
##驶
##驷
##驸
##驹
##驻
##驼
##驾
##驿
##骁
##骂
##骄
##骅
##骆
##骇
##骈
##骊
##骋
##验
##骏
##骐
##骑
##骗
##骚
##骛
##骜
##骞
##骠
##骡
##骤
##骥
##骧
##骨
##骯
##骰
##骶
##骷
##骸
##骼
##髂
##髅
##髋
##髏
##髒
##髓
##體
##髖
##高
##髦
##髪
##髮
##髯
##髻
##鬃
##鬆
##鬍
##鬓
##鬚
##鬟
##鬢
##鬣
##鬥
##鬧
##鬱
##鬼
##魁
##魂
##魄
##魅
##魇
##魍
##魏
##魔
##魘
##魚
##魯
##魷
##鮑
##鮨
##鮪
##鮭
##鮮
##鯉
##鯊
##鯖
##鯛
##鯨
##鯰
##鯽
##鰍
##鰓
##鰭
##鰲
##鰻
##鰾
##鱈
##鱉
##鱔
##鱗
##鱷
##鱸
##鱼
##鱿
##鲁
##鲈
##鲍
##鲑
##鲛
##鲜
##鲟
##鲢
##鲤
##鲨
##鲫
##鲱
##鲲
##鲶
##鲷
##鲸
##鳃
##鳄
##鳅
##鳌
##鳍
##鳕
##鳖
##鳗
##鳝
##鳞
##鳥
##鳩
##鳳
##鳴
##鳶
##鴉
##鴕
##鴛
##鴦
##鴨
##鴻
##鴿
##鵑
##鵜
##鵝
##鵡
##鵬
##鵰
##鵲
##鶘
##鶩
##鶯
##鶴
##鷗
##鷲
##鷹
##鷺
##鸚
##鸞
##鸟
##鸠
##鸡
##鸢
##鸣
##鸥
##鸦
##鸨
##鸪
##鸭
##鸯
##鸳
##鸵
##鸽
##鸾
##鸿
##鹂
##鹃
##鹄
##鹅
##鹈
##鹉
##鹊
##鹌
##鹏
##鹑
##鹕
##鹘
##鹜
##鹞
##鹤
##鹦
##鹧
##鹫
##鹭
##鹰
##鹳
##鹵
##鹹
##鹼
##鹽
##鹿
##麂
##麋
##麒
##麓
##麗
##麝
##麟
##麥
##麦
##麩
##麴
##麵
##麸
##麺
##麻
##麼
##麽
##麾
##黃
##黄
##黍
##黎
##黏
##黑
##黒
##黔
##默
##黛
##黜
##黝
##點
##黠
##黨
##黯
##黴
##鼋
##鼎
##鼐
##鼓
##鼠
##鼬
##鼹
##鼻
##鼾
##齁
##齊
##齋
##齐
##齒
##齡
##齢
##齣
##齦
##齿
##龄
##龅
##龈
##龊
##龋
##龌
##龍
##龐
##龔
##龕
##龙
##龚
##龛
##龜
##龟
##︰
##︱
##︶
##︿
##﹁
##﹂
##﹍
##﹏
##﹐
##﹑
##﹒
##﹔
##﹕
##﹖
##﹗
##﹙
##﹚
##﹝
##﹞
##﹡
##﹣
##！
##＂
##＃
##＄
##％
##＆
##＇
##（
##）
##＊
##，
##－
##．
##／
##：
##；
##＜
##？
##＠
##［
##＼
##］
##＾
##＿
##｀
##ｆ
##ｈ
##ｊ
##ｕ
##ｗ
##ｚ
##｛
##｝
##｡
##｢
##｣
##､
##･
##ｯ
##ｰ
##ｲ
##ｸ
##ｼ
##ｽ
##ﾄ
##ﾉ
##ﾌ
##ﾗ
##ﾙ
##ﾝ
##ﾞ
##ﾟ
##￣
##￥
##👍
##🔥
##😂
##😎


================================================
FILE: args.py
================================================
import os
import tensorflow as tf

tf.logging.set_verbosity(tf.logging.INFO)

file_path = os.path.dirname(__file__)


#模型目录
model_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')

#config文件
config_name = os.path.join(file_path, 'albert_config/albert_config_tiny.json')
#ckpt文件名称
ckpt_name = os.path.join(model_dir, 'model.ckpt')
#输出文件目录
output_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')
#vocab文件目录
vocab_file = os.path.join(file_path, 'albert_config/vocab.txt')
#数据目录
data_dir = os.path.join(file_path, 'data/')

num_train_epochs = 10
batch_size = 128
learning_rate = 0.00005

# gpu使用率
gpu_memory_fraction = 0.8

# 默认取倒数第二层的输出值作为句向量
layer_indexes = [-2]

# 序列的最大程度，单文本建议把该值调小
max_seq_len = 128

# graph名字
graph_file = os.path.join(file_path, 'albert_lcqmc_checkpoints/graph')

================================================
FILE: bert_utils.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import copy
import json
import math
import re
import six
import tensorflow as tf

def get_shape_list(tensor, expected_rank=None, name=None):
	"""Returns a list of the shape of tensor, preferring static dimensions.

	Args:
		tensor: A tf.Tensor object to find the shape of.
		expected_rank: (optional) int. The expected rank of `tensor`. If this is
			specified and the `tensor` has a different rank, and exception will be
			thrown.
		name: Optional name of the tensor for the error message.

	Returns:
		A list of dimensions of the shape of tensor. All static dimensions will
		be returned as python integers, and dynamic dimensions will be returned
		as tf.Tensor scalars.
	"""
	if name is None:
		name = tensor.name

	if expected_rank is not None:
		assert_rank(tensor, expected_rank, name)

	shape = tensor.shape.as_list()

	non_static_indexes = []
	for (index, dim) in enumerate(shape):
		if dim is None:
			non_static_indexes.append(index)

	if not non_static_indexes:
		return shape

	dyn_shape = tf.shape(tensor)
	for index in non_static_indexes:
		shape[index] = dyn_shape[index]
	return shape

def reshape_to_matrix(input_tensor):
	"""Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix)."""
	ndims = input_tensor.shape.ndims
	if ndims < 2:
		raise ValueError("Input tensor must have at least rank 2. Shape = %s" %
										 (input_tensor.shape))
	if ndims == 2:
		return input_tensor

	width = input_tensor.shape[-1]
	output_tensor = tf.reshape(input_tensor, [-1, width])
	return output_tensor

def reshape_from_matrix(output_tensor, orig_shape_list):
	"""Reshapes a rank 2 tensor back to its original rank >= 2 tensor."""
	if len(orig_shape_list) == 2:
		return output_tensor

	output_shape = get_shape_list(output_tensor)

	orig_dims = orig_shape_list[0:-1]
	width = output_shape[-1]

	return tf.reshape(output_tensor, orig_dims + [width])

def assert_rank(tensor, expected_rank, name=None):
	"""Raises an exception if the tensor rank is not of the expected rank.

	Args:
		tensor: A tf.Tensor to check the rank of.
		expected_rank: Python integer or list of integers, expected rank.
		name: Optional name of the tensor for the error message.

	Raises:
		ValueError: If the expected shape doesn't match the actual shape.
	"""
	if name is None:
		name = tensor.name

	expected_rank_dict = {}
	if isinstance(expected_rank, six.integer_types):
		expected_rank_dict[expected_rank] = True
	else:
		for x in expected_rank:
			expected_rank_dict[x] = True

	actual_rank = tensor.shape.ndims
	if actual_rank not in expected_rank_dict:
		scope_name = tf.get_variable_scope().name
		raise ValueError(
				"For the tensor `%s` in scope `%s`, the actual rank "
				"`%d` (shape = %s) is not equal to the expected rank `%s`" %
				(name, scope_name, actual_rank, str(tensor.shape), str(expected_rank)))

def gather_indexes(sequence_tensor, positions):
	"""Gathers the vectors at the specific positions over a minibatch."""
	sequence_shape = get_shape_list(sequence_tensor, expected_rank=3)
	batch_size = sequence_shape[0]
	seq_length = sequence_shape[1]
	width = sequence_shape[2]

	flat_offsets = tf.reshape(
			tf.range(0, batch_size, dtype=tf.int32) * seq_length, [-1, 1])
	flat_positions = tf.reshape(positions + flat_offsets, [-1])
	flat_sequence_tensor = tf.reshape(sequence_tensor,
																		[batch_size * seq_length, width])
	output_tensor = tf.gather(flat_sequence_tensor, flat_positions)
	return output_tensor

# add sequence mask for:
# 1. random shuffle lm modeling---xlnet with random shuffled input
# 2. left2right and right2left language modeling
# 3. conditional generation
def generate_seq2seq_mask(attention_mask, mask_sequence, seq_type, **kargs):
	if seq_type == 'seq2seq':
		if mask_sequence is not None:
			seq_shape = get_shape_list(mask_sequence, expected_rank=2)
			seq_len = seq_shape[1]
			ones = tf.ones((1, seq_len, seq_len))
			a_mask = tf.matrix_band_part(ones, -1, 0)
			s_ex12 = tf.expand_dims(tf.expand_dims(mask_sequence, 1), 2)
			s_ex13 = tf.expand_dims(tf.expand_dims(mask_sequence, 1), 3)
			a_mask = (1 - s_ex13) * (1 - s_ex12) + s_ex13 * a_mask
			# generate mask of batch x seq_len x seq_len
			a_mask = tf.reshape(a_mask, (-1, seq_len, seq_len))
			out_mask = attention_mask * a_mask
		else:
			ones = tf.ones_like(attention_mask[:1])
			mask = (tf.matrix_band_part(ones, -1, 0))
			out_mask = attention_mask * mask
	else:
		out_mask = attention_mask

	return out_mask



================================================
FILE: classifier_utils.py
================================================
# -*- coding: utf-8 -*-
# @Author: bo.shi
# @Date:   2019-12-01 22:28:41
# @Last Modified by:   bo.shi
# @Last Modified time: 2019-12-02 18:36:50
# coding=utf-8
# Copyright 2019 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Utility functions for GLUE classification tasks."""

from __future__ import absolute_import
from __future__ import division

from __future__ import print_function

import json
import csv
import os
import six

import tensorflow as tf


def convert_to_unicode(text):
  """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
  if six.PY3:
    if isinstance(text, str):
      return text
    elif isinstance(text, bytes):
      return text.decode("utf-8", "ignore")
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  elif six.PY2:
    if isinstance(text, str):
      return text.decode("utf-8", "ignore")
    elif isinstance(text, unicode):
      return text
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  else:
    raise ValueError("Not running on Python2 or Python 3?")


class InputExample(object):
  """A single training/test example for simple sequence classification."""

  def __init__(self, guid, text_a, text_b=None, label=None):
    """Constructs a InputExample.
    Args:
      guid: Unique id for the example.
      text_a: string. The untokenized text of the first sequence. For single
        sequence tasks, only this sequence must be specified.
      text_b: (Optional) string. The untokenized text of the second sequence.
        Only must be specified for sequence pair tasks.
      label: (Optional) string. The label of the example. This should be
        specified for train and dev examples, but not for test examples.
    """
    self.guid = guid
    self.text_a = text_a
    self.text_b = text_b
    self.label = label


class PaddingInputExample(object):
  """Fake example so the num input examples is a multiple of the batch size.
  When running eval/predict on the TPU, we need to pad the number of examples
  to be a multiple of the batch size, because the TPU requires a fixed batch
  size. The alternative is to drop the last batch, which is bad because it means
  the entire output data won't be generated.
  We use this class instead of `None` because treating `None` as padding
  battches could cause silent errors.
  """


class DataProcessor(object):
  """Base class for data converters for sequence classification data sets."""

  def get_train_examples(self, data_dir):
    """Gets a collection of `InputExample`s for the train set."""
    raise NotImplementedError()

  def get_dev_examples(self, data_dir):
    """Gets a collection of `InputExample`s for the dev set."""
    raise NotImplementedError()

  def get_test_examples(self, data_dir):
    """Gets a collection of `InputExample`s for prediction."""
    raise NotImplementedError()

  def get_labels(self):
    """Gets the list of labels for this data set."""
    raise NotImplementedError()

  @classmethod
  def _read_tsv(cls, input_file, delimiter="\t", quotechar=None):
    """Reads a tab separated value file."""
    with tf.gfile.Open(input_file, "r") as f:
      reader = csv.reader(f, delimiter=delimiter, quotechar=quotechar)
      lines = []
      for line in reader:
        lines.append(line)
      return lines

  @classmethod
  def _read_txt(cls, input_file):
    """Reads a tab separated value file."""
    with tf.gfile.Open(input_file, "r") as f:
      reader = f.readlines()
      lines = []
      for line in reader:
        lines.append(line.strip().split("_!_"))
      return lines

  @classmethod
  def _read_json(cls, input_file):
    """Reads a tab separated value file."""
    with tf.gfile.Open(input_file, "r") as f:
      reader = f.readlines()
      lines = []
      for line in reader:
        lines.append(json.loads(line.strip()))
      return lines


class XnliProcessor(DataProcessor):
  """Processor for the XNLI data set."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "train.json")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "dev.json")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "test.json")), "test")

  def _create_examples(self, lines, set_type):
    """See base class."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      text_a = convert_to_unicode(line['premise'])
      text_b = convert_to_unicode(line['hypo'])
      label = convert_to_unicode(line['label']) if set_type != 'test' else 'contradiction'
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples

  def get_labels(self):
    """See base class."""
    return ["contradiction", "entailment", "neutral"]


# class TnewsProcessor(DataProcessor):
#     """Processor for the MRPC data set (GLUE version)."""
#
#     def get_train_examples(self, data_dir):
#         """See base class."""
#         return self._create_examples(
#             self._read_txt(os.path.join(data_dir, "toutiao_category_train.txt")), "train")
#
#     def get_dev_examples(self, data_dir):
#         """See base class."""
#         return self._create_examples(
#             self._read_txt(os.path.join(data_dir, "toutiao_category_dev.txt")), "dev")
#
#     def get_test_examples(self, data_dir):
#         """See base class."""
#         return self._create_examples(
#             self._read_txt(os.path.join(data_dir, "toutiao_category_test.txt")), "test")
#
#     def get_labels(self):
#         """See base class."""
#         labels = []
#         for i in range(17):
#             if i == 5 or i == 11:
#                 continue
#             labels.append(str(100 + i))
#         return labels
#
#     def _create_examples(self, lines, set_type):
#         """Creates examples for the training and dev sets."""
#         examples = []
#         for (i, line) in enumerate(lines):
#             if i == 0:
#                 continue
#             guid = "%s-%s" % (set_type, i)
#             text_a = convert_to_unicode(line[3])
#             text_b = None
#             label = convert_to_unicode(line[1])
#             examples.append(
#                 InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#         return examples


class TnewsProcessor(DataProcessor):
  """Processor for the MRPC data set (GLUE version)."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "train.json")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "dev.json")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "test.json")), "test")

  def get_labels(self):
    """See base class."""
    labels = []
    for i in range(17):
      if i == 5 or i == 11:
        continue
      labels.append(str(100 + i))
    return labels

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      text_a = convert_to_unicode(line['sentence'])
      text_b = None
      label = convert_to_unicode(line['label']) if set_type != 'test' else "100"
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


# class iFLYTEKDataProcessor(DataProcessor):
#     """Processor for the iFLYTEKData data set (GLUE version)."""
#
#     def get_train_examples(self, data_dir):
#         """See base class."""
#         return self._create_examples(
#             self._read_txt(os.path.join(data_dir, "train.txt")), "train")
#
#     def get_dev_examples(self, data_dir):
#         """See base class."""
#         return self._create_examples(
#             self._read_txt(os.path.join(data_dir, "dev.txt")), "dev")
#
#     def get_test_examples(self, data_dir):
#         """See base class."""
#         return self._create_examples(
#             self._read_txt(os.path.join(data_dir, "test.txt")), "test")
#
#     def get_labels(self):
#         """See base class."""
#         labels = []
#         for i in range(119):
#             labels.append(str(i))
#         return labels
#
#     def _create_examples(self, lines, set_type):
#         """Creates examples for the training and dev sets."""
#         examples = []
#         for (i, line) in enumerate(lines):
#             if i == 0:
#                 continue
#             guid = "%s-%s" % (set_type, i)
#             text_a = convert_to_unicode(line[1])
#             text_b = None
#             label = convert_to_unicode(line[0])
#             examples.append(
#                 InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#         return examples


class iFLYTEKDataProcessor(DataProcessor):
  """Processor for the iFLYTEKData data set (GLUE version)."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "train.json")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "dev.json")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "test.json")), "test")

  def get_labels(self):
    """See base class."""
    labels = []
    for i in range(119):
      labels.append(str(i))
    return labels

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      text_a = convert_to_unicode(line['sentence'])
      text_b = None
      label = convert_to_unicode(line['label']) if set_type != 'test' else "0"
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


class AFQMCProcessor(DataProcessor):
  """Processor for the internal data set. sentence pair classification"""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "train.json")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "dev.json")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "test.json")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      text_a = convert_to_unicode(line['sentence1'])
      text_b = convert_to_unicode(line['sentence2'])
      label = convert_to_unicode(line['label']) if set_type != 'test' else '0'
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


class CMNLIProcessor(DataProcessor):
  """Processor for the CMNLI data set."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples_json(os.path.join(data_dir, "train.json"), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples_json(os.path.join(data_dir, "dev.json"), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples_json(os.path.join(data_dir, "test.json"), "test")

  def get_labels(self):
    """See base class."""
    return ["contradiction", "entailment", "neutral"]

  def _create_examples_json(self, file_name, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    lines = tf.gfile.Open(file_name, "r")
    index = 0
    for line in lines:
      line_obj = json.loads(line)
      index = index + 1
      guid = "%s-%s" % (set_type, index)
      text_a = convert_to_unicode(line_obj["sentence1"])
      text_b = convert_to_unicode(line_obj["sentence2"])
      label = convert_to_unicode(line_obj["label"]) if set_type != 'test' else 'neutral'

      if label != "-":
        examples.append(InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))

    return examples


class CslProcessor(DataProcessor):
  """Processor for the CSL data set."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "train.json")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "dev.json")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "test.json")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      text_a = convert_to_unicode(" ".join(line['keyword']))
      text_b = convert_to_unicode(line['abst'])
      label = convert_to_unicode(line['label']) if set_type != 'test' else '0'
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


# class InewsProcessor(DataProcessor):
#   """Processor for the MRPC data set (GLUE version)."""
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_txt(os.path.join(data_dir, "train.txt")), "train")
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_txt(os.path.join(data_dir, "dev.txt")), "dev")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_txt(os.path.join(data_dir, "test.txt")), "test")
#
#   def get_labels(self):
#     """See base class."""
#     labels = ["0", "1", "2"]
#     return labels
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     for (i, line) in enumerate(lines):
#       if i == 0:
#         continue
#       guid = "%s-%s" % (set_type, i)
#       text_a = convert_to_unicode(line[2])
#       text_b = convert_to_unicode(line[3])
#       label = convert_to_unicode(line[0]) if set_type != "test" else '0'
#       examples.append(
#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#     return examples
#
#
# class THUCNewsProcessor(DataProcessor):
#   """Processor for the THUCNews data set (GLUE version)."""
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_txt(os.path.join(data_dir, "train.txt")), "train")
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_txt(os.path.join(data_dir, "dev.txt")), "dev")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_txt(os.path.join(data_dir, "test.txt")), "test")
#
#   def get_labels(self):
#     """See base class."""
#     labels = []
#     for i in range(14):
#       labels.append(str(i))
#     return labels
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     for (i, line) in enumerate(lines):
#       if i == 0 or len(line) < 3:
#         continue
#       guid = "%s-%s" % (set_type, i)
#       text_a = convert_to_unicode(line[3])
#       text_b = None
#       label = convert_to_unicode(line[0])
#       examples.append(
#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#     return examples
#
# class LCQMCProcessor(DataProcessor):
#   """Processor for the internal data set. sentence pair classification"""
#
#   def __init__(self):
#     self.language = "zh"
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "train.txt")), "train")
#     # dev_0827.tsv
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "dev.txt")), "dev")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "test.txt")), "test")
#
#   def get_labels(self):
#     """See base class."""
#     return ["0", "1"]
#     # return ["-1","0", "1"]
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     print("length of lines:", len(lines))
#     for (i, line) in enumerate(lines):
#       # print('#i:',i,line)
#       if i == 0:
#         continue
#       guid = "%s-%s" % (set_type, i)
#       try:
#         label = convert_to_unicode(line[2])
#         text_a = convert_to_unicode(line[0])
#         text_b = convert_to_unicode(line[1])
#         examples.append(
#             InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#       except Exception:
#         print('###error.i:', i, line)
#     return examples
#
#
# class JDCOMMENTProcessor(DataProcessor):
#   """Processor for the internal data set. sentence pair classification"""
#
#   def __init__(self):
#     self.language = "zh"
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "jd_train.csv"), ",", "\""), "train")
#     # dev_0827.tsv
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "jd_dev.csv"), ",", "\""), "dev")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "jd_test.csv"), ",", "\""), "test")
#
#   def get_labels(self):
#     """See base class."""
#     return ["1", "2", "3", "4", "5"]
#     # return ["-1","0", "1"]
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     print("length of lines:", len(lines))
#     for (i, line) in enumerate(lines):
#       # print('#i:',i,line)
#       if i == 0:
#         continue
#       guid = "%s-%s" % (set_type, i)
#       try:
#         label = convert_to_unicode(line[0])
#         text_a = convert_to_unicode(line[1])
#         text_b = convert_to_unicode(line[2])
#         examples.append(
#             InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#       except Exception:
#         print('###error.i:', i, line)
#     return examples
#
#
# class BQProcessor(DataProcessor):
#   """Processor for the internal data set. sentence pair classification"""
#
#   def __init__(self):
#     self.language = "zh"
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "train.txt")), "train")
#     # dev_0827.tsv
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "dev.txt")), "dev")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "test.txt")), "test")
#
#   def get_labels(self):
#     """See base class."""
#     return ["0", "1"]
#     # return ["-1","0", "1"]
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     print("length of lines:", len(lines))
#     for (i, line) in enumerate(lines):
#       # print('#i:',i,line)
#       if i == 0:
#         continue
#       guid = "%s-%s" % (set_type, i)
#       try:
#         label = convert_to_unicode(line[2])
#         text_a = convert_to_unicode(line[0])
#         text_b = convert_to_unicode(line[1])
#         examples.append(
#             InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#       except Exception:
#         print('###error.i:', i, line)
#     return examples
#
#
# class MnliProcessor(DataProcessor):
#   """Processor for the MultiNLI data set (GLUE version)."""
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "dev_matched.tsv")),
#         "dev_matched")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "test_matched.tsv")), "test")
#
#   def get_labels(self):
#     """See base class."""
#     return ["contradiction", "entailment", "neutral"]
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     for (i, line) in enumerate(lines):
#       if i == 0:
#         continue
#       guid = "%s-%s" % (set_type, convert_to_unicode(line[0]))
#       text_a = convert_to_unicode(line[8])
#       text_b = convert_to_unicode(line[9])
#       if set_type == "test":
#         label = "contradiction"
#       else:
#         label = convert_to_unicode(line[-1])
#       examples.append(
#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#     return examples
#
#
# class MrpcProcessor(DataProcessor):
#   """Processor for the MRPC data set (GLUE version)."""
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
#
#   def get_labels(self):
#     """See base class."""
#     return ["0", "1"]
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     for (i, line) in enumerate(lines):
#       if i == 0:
#         continue
#       guid = "%s-%s" % (set_type, i)
#       text_a = convert_to_unicode(line[3])
#       text_b = convert_to_unicode(line[4])
#       if set_type == "test":
#         label = "0"
#       else:
#         label = convert_to_unicode(line[0])
#       examples.append(
#           InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
#     return examples
#
#
# class ColaProcessor(DataProcessor):
#   """Processor for the CoLA data set (GLUE version)."""
#
#   def get_train_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")
#
#   def get_dev_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")
#
#   def get_test_examples(self, data_dir):
#     """See base class."""
#     return self._create_examples(
#         self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")
#
#   def get_labels(self):
#     """See base class."""
#     return ["0", "1"]
#
#   def _create_examples(self, lines, set_type):
#     """Creates examples for the training and dev sets."""
#     examples = []
#     for (i, line) in enumerate(lines):
#       # Only the test set has a header
#       if set_type == "test" and i == 0:
#         continue
#       guid = "%s-%s" % (set_type, i)
#       if set_type == "test":
#         text_a = convert_to_unicode(line[1])
#         label = "0"
#       else:
#         text_a = convert_to_unicode(line[3])
#         label = convert_to_unicode(line[1])
#       examples.append(
#           InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
#     return examples

class WSCProcessor(DataProcessor):
  """Processor for the internal data set. sentence pair classification"""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "train.json")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "dev.json")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "test.json")), "test")

  def get_labels(self):
    """See base class."""
    return ["true", "false"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      guid = "%s-%s" % (set_type, i)
      text_a = convert_to_unicode(line['text'])
      text_a_list = list(text_a)
      target = line['target']
      query = target['span1_text']
      query_idx = target['span1_index']
      pronoun = target['span2_text']
      pronoun_idx = target['span2_index']

      assert text_a[pronoun_idx: (pronoun_idx + len(pronoun))
                    ] == pronoun, "pronoun: {}".format(pronoun)
      assert text_a[query_idx: (query_idx + len(query))] == query, "query: {}".format(query)

      if pronoun_idx > query_idx:
        text_a_list.insert(query_idx, "_")
        text_a_list.insert(query_idx + len(query) + 1, "_")
        text_a_list.insert(pronoun_idx + 2, "[")
        text_a_list.insert(pronoun_idx + len(pronoun) + 2 + 1, "]")
      else:
        text_a_list.insert(pronoun_idx, "[")
        text_a_list.insert(pronoun_idx + len(pronoun) + 1, "]")
        text_a_list.insert(query_idx + 2, "_")
        text_a_list.insert(query_idx + len(query) + 2 + 1, "_")

      text_a = "".join(text_a_list)

      if set_type == "test":
        label = "true"
      else:
        label = line['label']

      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples


class COPAProcessor(DataProcessor):
  """Processor for the internal data set. sentence pair classification"""

  def __init__(self):
    self.language = "zh"

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "train.json")), "train")
    # dev_0827.tsv

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "dev.json")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_json(os.path.join(data_dir, "test.json")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  @classmethod
  def _create_examples_one(self, lines, set_type):
    examples = []
    for (i, line) in enumerate(lines):
      guid1 = "%s-%s" % (set_type, i)
#         try:
      if line['question'] == 'cause':
        text_a = convert_to_unicode(line['premise'] + '原因是什么呢？' + line['choice0'])
        text_b = convert_to_unicode(line['premise'] + '原因是什么呢？' + line['choice1'])
      else:
        text_a = convert_to_unicode(line['premise'] + '造成了什么影响呢？' + line['choice0'])
        text_b = convert_to_unicode(line['premise'] + '造成了什么影响呢？' + line['choice1'])
      label = convert_to_unicode(str(1 if line['label'] == 0 else 0)) if set_type != 'test' else '0'
      examples.append(
          InputExample(guid=guid1, text_a=text_a, text_b=text_b, label=label))
#         except Exception as e:
#             print('###error.i:',e, i, line)
    return examples

  @classmethod
  def _create_examples(self, lines, set_type):
    examples = []
    for (i, line) in enumerate(lines):
      i = 2 * i
      guid1 = "%s-%s" % (set_type, i)
      guid2 = "%s-%s" % (set_type, i + 1)
#         try:
      premise = convert_to_unicode(line['premise'])
      choice0 = convert_to_unicode(line['choice0'])
      label = convert_to_unicode(str(1 if line['label'] == 0 else 0)) if set_type != 'test' else '0'
      #text_a2 = convert_to_unicode(line['premise'])
      choice1 = convert_to_unicode(line['choice1'])
      label2 = convert_to_unicode(
          str(0 if line['label'] == 0 else 1)) if set_type != 'test' else '0'
      if line['question'] == 'effect':
        text_a = premise
        text_b = choice0
        text_a2 = premise
        text_b2 = choice1
      elif line['question'] == 'cause':
        text_a = choice0
        text_b = premise
        text_a2 = choice1
        text_b2 = premise
      else:
        print('wrong format!!')
        return None
      examples.append(
          InputExample(guid=guid1, text_a=text_a, text_b=text_b, label=label))
      examples.append(
          InputExample(guid=guid2, text_a=text_a2, text_b=text_b2, label=label2))
#         except Exception as e:
#             print('###error.i:',e, i, line)
    return examples

================================================
FILE: create_pretrain_data.sh
================================================
#!/usr/bin/env bash

BERT_BASE_DIR=./albert_config
python3 create_pretraining_data.py --do_whole_word_mask=True --input_file=data/news_zh_1.txt \
--output_file=data/tf_news_2016_zh_raw_news2016zh_1.tfrecord --vocab_file=$BERT_BASE_DIR/vocab.txt --do_lower_case=True \
--max_seq_length=512 --max_predictions_per_seq=51 --masked_lm_prob=0.10

================================================
FILE: create_pretraining_data.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Create masked LM/next sentence masked_lm TF examples for BERT."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import random
import tokenization
import tensorflow as tf
import jieba
import re
flags = tf.flags

FLAGS = flags.FLAGS

flags.DEFINE_string("input_file", None,
                    "Input raw text file (or comma-separated list of files).")

flags.DEFINE_string(
    "output_file", None,
    "Output TF example file (or comma-separated list of files).")

flags.DEFINE_string("vocab_file", None,
                    "The vocabulary file that the BERT model was trained on.")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

flags.DEFINE_bool(
    "do_whole_word_mask", False,
    "Whether to use whole word masking rather than per-WordPiece masking.")

flags.DEFINE_integer("max_seq_length", 128, "Maximum sequence length.")

flags.DEFINE_integer("max_predictions_per_seq", 20,
                     "Maximum number of masked LM predictions per sequence.")

flags.DEFINE_integer("random_seed", 12345, "Random seed for data generation.")

flags.DEFINE_integer(
    "dupe_factor", 10,
    "Number of times to duplicate the input data (with different masks).")

flags.DEFINE_float("masked_lm_prob", 0.15, "Masked LM probability.")

flags.DEFINE_float(
    "short_seq_prob", 0.1,
    "Probability of creating sequences which are shorter than the "
    "maximum length.")

flags.DEFINE_bool("non_chinese", False,"manually set this to True if you are not doing chinese pre-train task.")


class TrainingInstance(object):
  """A single training instance (sentence pair)."""

  def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm_labels,
               is_random_next):
    self.tokens = tokens
    self.segment_ids = segment_ids
    self.is_random_next = is_random_next
    self.masked_lm_positions = masked_lm_positions
    self.masked_lm_labels = masked_lm_labels

  def __str__(self):
    s = ""
    s += "tokens: %s\n" % (" ".join(
        [tokenization.printable_text(x) for x in self.tokens]))
    s += "segment_ids: %s\n" % (" ".join([str(x) for x in self.segment_ids]))
    s += "is_random_next: %s\n" % self.is_random_next
    s += "masked_lm_positions: %s\n" % (" ".join(
        [str(x) for x in self.masked_lm_positions]))
    s += "masked_lm_labels: %s\n" % (" ".join(
        [tokenization.printable_text(x) for x in self.masked_lm_labels]))
    s += "\n"
    return s

  def __repr__(self):
    return self.__str__()


def write_instance_to_example_files(instances, tokenizer, max_seq_length,
                                    max_predictions_per_seq, output_files):
  """Create TF example files from `TrainingInstance`s."""
  writers = []
  for output_file in output_files:
    writers.append(tf.python_io.TFRecordWriter(output_file))

  writer_index = 0

  total_written = 0
  for (inst_index, instance) in enumerate(instances):
    input_ids = tokenizer.convert_tokens_to_ids(instance.tokens)
    input_mask = [1] * len(input_ids)
    segment_ids = list(instance.segment_ids)
    assert len(input_ids) <= max_seq_length

    while len(input_ids) < max_seq_length:
      input_ids.append(0)
      input_mask.append(0)
      segment_ids.append(0)

    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    masked_lm_positions = list(instance.masked_lm_positions)
    masked_lm_ids = tokenizer.convert_tokens_to_ids(instance.masked_lm_labels)
    masked_lm_weights = [1.0] * len(masked_lm_ids)

    while len(masked_lm_positions) < max_predictions_per_seq:
      masked_lm_positions.append(0)
      masked_lm_ids.append(0)
      masked_lm_weights.append(0.0)

    next_sentence_label = 1 if instance.is_random_next else 0

    features = collections.OrderedDict()
    features["input_ids"] = create_int_feature(input_ids)
    features["input_mask"] = create_int_feature(input_mask)
    features["segment_ids"] = create_int_feature(segment_ids)
    features["masked_lm_positions"] = create_int_feature(masked_lm_positions)
    features["masked_lm_ids"] = create_int_feature(masked_lm_ids)
    features["masked_lm_weights"] = create_float_feature(masked_lm_weights)
    features["next_sentence_labels"] = create_int_feature([next_sentence_label])

    tf_example = tf.train.Example(features=tf.train.Features(feature=features))

    writers[writer_index].write(tf_example.SerializeToString())
    writer_index = (writer_index + 1) % len(writers)

    total_written += 1

    if inst_index < 20:
      tf.logging.info("*** Example ***")
      tf.logging.info("tokens: %s" % " ".join(
          [tokenization.printable_text(x) for x in instance.tokens]))

      for feature_name in features.keys():
        feature = features[feature_name]
        values = []
        if feature.int64_list.value:
          values = feature.int64_list.value
        elif feature.float_list.value:
          values = feature.float_list.value
        tf.logging.info(
            "%s: %s" % (feature_name, " ".join([str(x) for x in values])))

  for writer in writers:
    writer.close()

  tf.logging.info("Wrote %d total instances", total_written)


def create_int_feature(values):
  feature = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))
  return feature


def create_float_feature(values):
  feature = tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))
  return feature


def create_training_instances(input_files, tokenizer, max_seq_length,
                              dupe_factor, short_seq_prob, masked_lm_prob,
                              max_predictions_per_seq, rng):
  """Create `TrainingInstance`s from raw text."""
  all_documents = [[]]

  # Input file format:
  # (1) One sentence per line. These should ideally be actual sentences, not
  # entire paragraphs or arbitrary spans of text. (Because we use the
  # sentence boundaries for the "next sentence prediction" task).
  # (2) Blank lines between documents. Document boundaries are needed so
  # that the "next sentence prediction" task doesn't span between documents.
  for input_file in input_files:
    with tf.gfile.GFile(input_file, "r") as reader:
      while True:
        strings=reader.readline()
        strings=strings.replace("   "," ").replace("  "," ") # 如果有两个或三个空格，替换为一个空格
        line = tokenization.convert_to_unicode(strings)
        if not line:
          break
        line = line.strip()

        # Empty lines are used as document delimiters
        if not line:
          all_documents.append([])
        tokens = tokenizer.tokenize(line)
        if tokens:
          all_documents[-1].append(tokens)

  # Remove empty documents
  all_documents = [x for x in all_documents if x]
  rng.shuffle(all_documents)

  vocab_words = list(tokenizer.vocab.keys())
  instances = []
  for _ in range(dupe_factor):
    for document_index in range(len(all_documents)):
      instances.extend(
        create_instances_from_document_albert( # change to albert style for sentence order prediction(SOP), 2019-08-28, brightmart
              all_documents, document_index, max_seq_length, short_seq_prob,
              masked_lm_prob, max_predictions_per_seq, vocab_words, rng))

  rng.shuffle(instances)
  return instances

def get_new_segment(segment):  # 新增的方法 ####
    """
    输入一句话，返回一句经过处理的话: 为了支持中文全称mask，将被分开的词，将上特殊标记("#")，使得后续处理模块，能够知道哪些字是属于同一个词的。
    :param segment: 一句话. e.g.  ['悬', '灸', '技', '术', '培', '训', '专', '家', '教', '你', '艾', '灸', '降', '血', '糖', '，', '为', '爸', '妈', '收', '好', '了', '！']
    :return: 一句处理过的话 e.g.    ['悬', '##灸', '技', '术', '培', '训', '专', '##家', '教', '你', '艾', '##灸', '降', '##血', '##糖', '，', '为', '爸', '##妈', '收', '##好', '了', '！']
    """
    seq_cws = jieba.lcut("".join(segment)) # 分词
    seq_cws_dict = {x: 1 for x in seq_cws} # 分词后的词加入到词典dict
    new_segment = []
    i = 0
    while i < len(segment): # 从句子的第一个字开始处理，知道处理完整个句子
      if len(re.findall('[\u4E00-\u9FA5]', segment[i])) == 0:  # 如果找不到中文的，原文加进去即不用特殊处理。
        new_segment.append(segment[i])
        i += 1
        continue

      has_add = False
      for length in range(3, 0, -1):
        if i + length > len(segment):
          continue
        if ''.join(segment[i:i + length]) in seq_cws_dict:
          new_segment.append(segment[i])
          for l in range(1, length):
            new_segment.append('##' + segment[i + l])
          i += length
          has_add = True
          break
      if not has_add:
        new_segment.append(segment[i])
        i += 1
    # print("get_new_segment.wwm.get_new_segment:",new_segment)
    return new_segment

def create_instances_from_document_albert(
    all_documents, document_index, max_seq_length, short_seq_prob,
    masked_lm_prob, max_predictions_per_seq, vocab_words, rng):
  """Creates `TrainingInstance`s for a single document.
     This method is changed to create sentence-order prediction (SOP) followed by idea from paper of ALBERT, 2019-08-28, brightmart
  """
  document = all_documents[document_index] # 得到一个文档

  # Account for [CLS], [SEP], [SEP]
  max_num_tokens = max_seq_length - 3

  # We *usually* want to fill up the entire sequence since we are padding
  # to `max_seq_length` anyways, so short sequences are generally wasted
  # computation. However, we *sometimes*
  # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter
  # sequences to minimize the mismatch between pre-training and fine-tuning.
  # The `target_seq_length` is just a rough target however, whereas
  # `max_seq_length` is a hard limit.
  target_seq_length = max_num_tokens
  if rng.random() < short_seq_prob: # 有一定的比例，如10%的概率，我们使用比较短的序列长度，以缓解预训练的长序列和调优阶段（可能的）短序列的不一致情况
    target_seq_length = rng.randint(2, max_num_tokens)

  # We DON'T just concatenate all of the tokens from a document into a long
  # sequence and choose an arbitrary split point because this would make the
  # next sentence prediction task too easy. Instead, we split the input into
  # segments "A" and "B" based on the actual "sentences" provided by the user
  # input.
  # 设法使用实际的句子，而不是任意的截断句子，从而更好的构造句子连贯性预测的任务
  instances = []
  current_chunk = [] # 当前处理的文本段，包含多个句子
  current_length = 0
  i = 0
  # print("###document:",document) # 一个document可以是一整篇文章、新闻、词条等. document:[['是', '爷', '们', '，', '就', '得', '给', '媳', '妇', '幸', '福'], ['关', '注', '【', '晨', '曦', '教', '育', '】', '，', '获', '取', '育', '儿', '的', '智', '慧', '，', '与', '孩', '子', '一', '同', '成', '长', '！'], ['方', '法', ':', '打', '开', '微', '信', '→', '添', '加', '朋', '友', '→', '搜', '号', '→', '##he', '##bc', '##x', '##jy', '##→', '关', '注', '!', '我', '是', '一', '个', '爷', '们', '，', '孝', '顺', '是', '做', '人', '的', '第', '一', '准', '则', '。'], ['甭', '管', '小', '时', '候', '怎', '么', '跟', '家', '长', '犯', '混', '蛋', '，', '长', '大', '了', '，', '就', '底', '报', '答', '父', '母', '，', '以', '后', '我', '媳', '妇', '也', '必', '须', '孝', '顺', '。'], ['我', '是', '一', '个', '爷', '们', '，', '可', '以', '花', '心', '，', '可', '以', '好', '玩', '。'], ['但', '我', '一', '定', '会', '找', '一', '个', '管', '的', '住', '我', '的', '女', '人', '，', '和', '我', '一', '起', '生', '活', '。'], ['28', '岁', '以', '前', '在', '怎', '么', '玩', '都', '行', '，', '但', '我', '最', '后', '一', '定', '会', '找', '一', '个', '勤', '俭', '持', '家', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '我', '不', '会', '让', '自', '己', '的', '女', '人', '受', '一', '点', '委', '屈', '，', '每', '次', '把', '她', '抱', '在', '怀', '里', '，', '看', '她', '洋', '溢', '着', '幸', '福', '的', '脸', '，', '我', '都', '会', '引', '以', '为', '傲', '，', '这', '特', '么', '就', '是', '我', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '干', '什', '么', '也', '不', '能', '忘', '了', '自', '己', '媳', '妇', '，', '就', '算', '和', '哥', '们', '一', '起', '喝', '酒', '，', '喝', '到', '很', '晚', '，', '也', '要', '提', '前', '打', '电', '话', '告', '诉', '她', '，', '让', '她', '早', '点', '休', '息', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '绝', '对', '不', '能', '抽', '烟', '，', '喝', '酒', '还', '勉', '强', '过', '得', '去', '，', '不', '过', '该', '喝', '的', '时', '候', '喝', '，', '不', '该', '喝', '的', '时', '候', '，', '少', '扯', '纳', '极', '薄', '蛋', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '必', '须', '听', '我', '话', '，', '在', '人', '前', '一', '定', '要', '给', '我', '面', '子', '，', '回', '家', '了', '咱', '什', '么', '都', '好', '说', '。'], ['我', '是', '一', '爷', '们', '，', '就', '算', '难', '的', '吃', '不', '上', '饭', '了', '，', '都', '不', '张', '口', '跟', '媳', '妇', '要', '一', '分', '钱', '。'], ['我', '是', '一', '爷', '们', '，', '不', '管', '上', '学', '还', '是', '上', '班', '，', '我', '都', '会', '送', '媳', '妇', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '交', '往', '不', '到', '1', '年', '，', '绝', '对', '不', '会', '和', '媳', '妇', '提', '过', '分', '的', '要', '求', '，', '我', '会', '尊', '重', '她', '。'], ['我', '是', '一', '爷', '们', '，', '游', '戏', '永', '远', '比', '不', '上', '我', '媳', '妇', '重', '要', '，', '只', '要', '媳', '妇', '发', '话', '，', '我', '绝', '对', '唯', '命', '是', '从', '。'], ['我', '是', '一', '爷', '们', '，', '上', 'q', '绝', '对', '是', '为', '了', '等', '媳', '妇', '，', '所', '有', '暧', '昧', '的', '心', '情', '只', '为', '她', '一', '个', '女', '人', '而', '写', '，', '我', '不', '一', '定', '会', '经', '常', '写', '日', '志', '，', '可', '是', '我', '会', '告', '诉', '全', '世', '界', '，', '我', '很', '爱', '她', '。'], ['我', '是', '一', '爷', '们', '，', '不', '一', '定', '要', '经', '常', '制', '造', '浪', '漫', '、', '偶', '尔', '过', '个', '节', '日', '也', '要', '送', '束', '玫', '瑰', '花', '给', '媳', '妇', '抱', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '手', '机', '会', '24', '小', '时', '为', '她', '开', '机', '，', '让', '她', '半', '夜', '痛', '经', '的', '时', '候', '，', '做', '恶', '梦', '的', '时', '候', '，', '随', '时', '可', '以', '联', '系', '到', '我', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '经', '常', '带', '媳', '妇', '出', '去', '玩', '，', '她', '不', '一', '定', '要', '和', '我', '所', '有', '的', '哥', '们', '都', '认', '识', '，', '但', '见', '面', '能', '说', '的', '上', '话', '就', '行', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '和', '媳', '妇', '的', '姐', '妹', '哥', '们', '搞', '好', '关', '系', '，', '让', '她', '们', '相', '信', '我', '一', '定', '可', '以', '给', '我', '媳', '妇', '幸', '福', '。'], ['我', '是', '一', '爷', '们', '，', '吵', '架', '后', '、', '也', '要', '主', '动', '打', '电', '话', '关', '心', '她', '，', '咱', '是', '一', '爷', '们', '，', '给', '媳', '妇', '服', '个', '软', '，', '道', '个', '歉', '怎', '么', '了', '？'], ['我', '是', '一', '爷', '们', '，', '绝', '对', '不', '会', '嫌', '弃', '自', '己', '媳', '妇', '，', '拿', '她', '和', '别', '人', '比', '，', '说', '她', '这', '不', '如', '人', '家', '，', '纳', '不', '如', '人', '家', '的', '。'], ['我', '是', '一', '爷', '们', '，', '陪', '媳', '妇', '逛', '街', '时', '，', '碰', '见', '熟', '人', '，', '无', '论', '我', '媳', '妇', '长', '的', '好', '看', '与', '否', '，', '我', '都', '会', '大', '方', '的', '介', '绍', '。'], ['谁', '让', '咱', '爷', '们', '就', '好', '这', '口', '呢', '。'], ['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。'], ['【', '我', '们', '重', '在', '分', '享', '。'], ['所', '有', '文', '字', '和', '美', '图', '，', '来', '自', '网', '络', '，', '晨', '欣', '教', '育', '整', '理', '。'], ['对', '原', '文', '作', '者', '，', '表', '示', '敬', '意', '。'], ['】', '关', '注', '晨', '曦', '教', '育', '[UNK]', '[UNK]', '晨', '曦', '教', '育', '（', '微', '信', '号', '：', 'he', '##bc', '##x', '##jy', '）', '。'], ['打', '开', '微', '信', '，', '扫', '描', '二', '维', '码', '，', '关', '注', '[UNK]', '晨', '曦', '教', '育', '[UNK]', '，', '获', '取', '更', '多', '育', '儿', '资', '源', '。'], ['点', '击', '下', '面', '订', '阅', '按', '钮', '订', '阅', '，', '会', '有', '更', '多', '惊', '喜', '哦', '！']]
  while i < len(document): # 从文档的第一个位置开始，按个往下看
    segment = document[i] # segment是列表，代表的是按字分开的一个完整句子，如 segment=['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。']
    if FLAGS.non_chinese==False: # if non chinese is False, that means it is chinese, then do something to make chinese whole word mask works.
      segment = get_new_segment(segment)  # whole word mask for chinese: 结合分词的中文的whole mask设置即在需要的地方加上“##”

    current_chunk.append(segment) # 将一个独立的句子加入到当前的文本块中
    current_length += len(segment) # 累计到为止位置接触到句子的总长度
    if i == len(document) - 1 or current_length >= target_seq_length:
      # 如果累计的序列长度达到了目标的长度，或当前走到了文档结尾==>构造并添加到“A[SEP]B“中的A和B中；
      if current_chunk: # 如果当前块不为空
        # `a_end` is how many segments from `current_chunk` go into the `A`
        # (first) sentence.
        a_end = 1
        if len(current_chunk) >= 2: # 当前块，如果包含超过两个句子，取当前块的一部分作为“A[SEP]B“中的A部分
          a_end = rng.randint(1, len(current_chunk) - 1)
        # 将当前文本段中选取出来的前半部分，赋值给A即tokens_a
        tokens_a = []
        for j in range(a_end):
          tokens_a.extend(current_chunk[j])

        # 构造“A[SEP]B“中的B部分(有一部分是正常的当前文档中的后半部;在原BERT的实现中一部分是随机的从另一个文档中选取的，）
        tokens_b = []
        for j in range(a_end, len(current_chunk)):
          tokens_b.extend(current_chunk[j])

        # 有百分之50%的概率交换一下tokens_a和tokens_b的位置
        # print("tokens_a length1:",len(tokens_a))
        # print("tokens_b length1:",len(tokens_b)) # len(tokens_b) = 0

        if len(tokens_a) == 0 or len(tokens_b) == 0: i += 1; continue
        if rng.random() < 0.5: # 交换一下tokens_a和tokens_b
          is_random_next=True
          temp=tokens_a
          tokens_a=tokens_b
          tokens_b=temp
        else:
          is_random_next=False

        truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng)

        assert len(tokens_a) >= 1
        assert len(tokens_b) >= 1

        # 把tokens_a & tokens_b加入到按照bert的风格，即以[CLS]tokens_a[SEP]tokens_b[SEP]的形式，结合到一起，作为最终的tokens; 也带上segment_ids，前面部分segment_ids的值是0，后面部分的值是1.
        tokens = []
        segment_ids = []
        tokens.append("[CLS]")
        segment_ids.append(0)
        for token in tokens_a:
          tokens.append(token)
          segment_ids.append(0)

        tokens.append("[SEP]")
        segment_ids.append(0)

        for token in tokens_b:
          tokens.append(token)
          segment_ids.append(1)
        tokens.append("[SEP]")
        segment_ids.append(1)

        # 创建masked LM的任务的数据 Creates the predictions for the masked LM objective
        (tokens, masked_lm_positions,
         masked_lm_labels) = create_masked_lm_predictions(
             tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)
        instance = TrainingInstance( # 创建训练实例的对象
            tokens=tokens,
            segment_ids=segment_ids,
            is_random_next=is_random_next,
            masked_lm_positions=masked_lm_positions,
            masked_lm_labels=masked_lm_labels)
        instances.append(instance)
      current_chunk = [] # 清空当前块
      current_length = 0 # 重置当前文本块的长度
    i += 1 # 接着文档中的内容往后看

  return instances


def create_instances_from_document_original( # THIS IS ORIGINAL BERT STYLE FOR CREATE DATA OF MLM AND NEXT SENTENCE PREDICTION TASK
    all_documents, document_index, max_seq_length, short_seq_prob,
    masked_lm_prob, max_predictions_per_seq, vocab_words, rng):
  """Creates `TrainingInstance`s for a single document."""
  document = all_documents[document_index] # 得到一个文档

  # Account for [CLS], [SEP], [SEP]
  max_num_tokens = max_seq_length - 3

  # We *usually* want to fill up the entire sequence since we are padding
  # to `max_seq_length` anyways, so short sequences are generally wasted
  # computation. However, we *sometimes*
  # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter
  # sequences to minimize the mismatch between pre-training and fine-tuning.
  # The `target_seq_length` is just a rough target however, whereas
  # `max_seq_length` is a hard limit.
  target_seq_length = max_num_tokens
  if rng.random() < short_seq_prob: # 有一定的比例，如10%的概率，我们使用比较短的序列长度，以缓解预训练的长序列和调优阶段（可能的）短序列的不一致情况
    target_seq_length = rng.randint(2, max_num_tokens)

  # We DON'T just concatenate all of the tokens from a document into a long
  # sequence and choose an arbitrary split point because this would make the
  # next sentence prediction task too easy. Instead, we split the input into
  # segments "A" and "B" based on the actual "sentences" provided by the user
  # input.
  # 设法使用实际的句子，而不是任意的截断句子，从而更好的构造句子连贯性预测的任务
  instances = []
  current_chunk = [] # 当前处理的文本段，包含多个句子
  current_length = 0
  i = 0
  # print("###document:",document) # 一个document可以是一整篇文章、新闻、一个词条等. document:[['是', '爷', '们', '，', '就', '得', '给', '媳', '妇', '幸', '福'], ['关', '注', '【', '晨', '曦', '教', '育', '】', '，', '获', '取', '育', '儿', '的', '智', '慧', '，', '与', '孩', '子', '一', '同', '成', '长', '！'], ['方', '法', ':', '打', '开', '微', '信', '→', '添', '加', '朋', '友', '→', '搜', '号', '→', '##he', '##bc', '##x', '##jy', '##→', '关', '注', '!', '我', '是', '一', '个', '爷', '们', '，', '孝', '顺', '是', '做', '人', '的', '第', '一', '准', '则', '。'], ['甭', '管', '小', '时', '候', '怎', '么', '跟', '家', '长', '犯', '混', '蛋', '，', '长', '大', '了', '，', '就', '底', '报', '答', '父', '母', '，', '以', '后', '我', '媳', '妇', '也', '必', '须', '孝', '顺', '。'], ['我', '是', '一', '个', '爷', '们', '，', '可', '以', '花', '心', '，', '可', '以', '好', '玩', '。'], ['但', '我', '一', '定', '会', '找', '一', '个', '管', '的', '住', '我', '的', '女', '人', '，', '和', '我', '一', '起', '生', '活', '。'], ['28', '岁', '以', '前', '在', '怎', '么', '玩', '都', '行', '，', '但', '我', '最', '后', '一', '定', '会', '找', '一', '个', '勤', '俭', '持', '家', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '我', '不', '会', '让', '自', '己', '的', '女', '人', '受', '一', '点', '委', '屈', '，', '每', '次', '把', '她', '抱', '在', '怀', '里', '，', '看', '她', '洋', '溢', '着', '幸', '福', '的', '脸', '，', '我', '都', '会', '引', '以', '为', '傲', '，', '这', '特', '么', '就', '是', '我', '的', '女', '人', '。'], ['我', '是', '一', '爷', '们', '，', '干', '什', '么', '也', '不', '能', '忘', '了', '自', '己', '媳', '妇', '，', '就', '算', '和', '哥', '们', '一', '起', '喝', '酒', '，', '喝', '到', '很', '晚', '，', '也', '要', '提', '前', '打', '电', '话', '告', '诉', '她', '，', '让', '她', '早', '点', '休', '息', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '绝', '对', '不', '能', '抽', '烟', '，', '喝', '酒', '还', '勉', '强', '过', '得', '去', '，', '不', '过', '该', '喝', '的', '时', '候', '喝', '，', '不', '该', '喝', '的', '时', '候', '，', '少', '扯', '纳', '极', '薄', '蛋', '。'], ['我', '是', '一', '爷', '们', '，', '我', '媳', '妇', '必', '须', '听', '我', '话', '，', '在', '人', '前', '一', '定', '要', '给', '我', '面', '子', '，', '回', '家', '了', '咱', '什', '么', '都', '好', '说', '。'], ['我', '是', '一', '爷', '们', '，', '就', '算', '难', '的', '吃', '不', '上', '饭', '了', '，', '都', '不', '张', '口', '跟', '媳', '妇', '要', '一', '分', '钱', '。'], ['我', '是', '一', '爷', '们', '，', '不', '管', '上', '学', '还', '是', '上', '班', '，', '我', '都', '会', '送', '媳', '妇', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '交', '往', '不', '到', '1', '年', '，', '绝', '对', '不', '会', '和', '媳', '妇', '提', '过', '分', '的', '要', '求', '，', '我', '会', '尊', '重', '她', '。'], ['我', '是', '一', '爷', '们', '，', '游', '戏', '永', '远', '比', '不', '上', '我', '媳', '妇', '重', '要', '，', '只', '要', '媳', '妇', '发', '话', '，', '我', '绝', '对', '唯', '命', '是', '从', '。'], ['我', '是', '一', '爷', '们', '，', '上', 'q', '绝', '对', '是', '为', '了', '等', '媳', '妇', '，', '所', '有', '暧', '昧', '的', '心', '情', '只', '为', '她', '一', '个', '女', '人', '而', '写', '，', '我', '不', '一', '定', '会', '经', '常', '写', '日', '志', '，', '可', '是', '我', '会', '告', '诉', '全', '世', '界', '，', '我', '很', '爱', '她', '。'], ['我', '是', '一', '爷', '们', '，', '不', '一', '定', '要', '经', '常', '制', '造', '浪', '漫', '、', '偶', '尔', '过', '个', '节', '日', '也', '要', '送', '束', '玫', '瑰', '花', '给', '媳', '妇', '抱', '回', '家', '。'], ['我', '是', '一', '爷', '们', '，', '手', '机', '会', '24', '小', '时', '为', '她', '开', '机', '，', '让', '她', '半', '夜', '痛', '经', '的', '时', '候', '，', '做', '恶', '梦', '的', '时', '候', '，', '随', '时', '可', '以', '联', '系', '到', '我', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '经', '常', '带', '媳', '妇', '出', '去', '玩', '，', '她', '不', '一', '定', '要', '和', '我', '所', '有', '的', '哥', '们', '都', '认', '识', '，', '但', '见', '面', '能', '说', '的', '上', '话', '就', '行', '。'], ['我', '是', '一', '爷', '们', '，', '我', '会', '和', '媳', '妇', '的', '姐', '妹', '哥', '们', '搞', '好', '关', '系', '，', '让', '她', '们', '相', '信', '我', '一', '定', '可', '以', '给', '我', '媳', '妇', '幸', '福', '。'], ['我', '是', '一', '爷', '们', '，', '吵', '架', '后', '、', '也', '要', '主', '动', '打', '电', '话', '关', '心', '她', '，', '咱', '是', '一', '爷', '们', '，', '给', '媳', '妇', '服', '个', '软', '，', '道', '个', '歉', '怎', '么', '了', '？'], ['我', '是', '一', '爷', '们', '，', '绝', '对', '不', '会', '嫌', '弃', '自', '己', '媳', '妇', '，', '拿', '她', '和', '别', '人', '比', '，', '说', '她', '这', '不', '如', '人', '家', '，', '纳', '不', '如', '人', '家', '的', '。'], ['我', '是', '一', '爷', '们', '，', '陪', '媳', '妇', '逛', '街', '时', '，', '碰', '见', '熟', '人', '，', '无', '论', '我', '媳', '妇', '长', '的', '好', '看', '与', '否', '，', '我', '都', '会', '大', '方', '的', '介', '绍', '。'], ['谁', '让', '咱', '爷', '们', '就', '好', '这', '口', '呢', '。'], ['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。'], ['【', '我', '们', '重', '在', '分', '享', '。'], ['所', '有', '文', '字', '和', '美', '图', '，', '来', '自', '网', '络', '，', '晨', '欣', '教', '育', '整', '理', '。'], ['对', '原', '文', '作', '者', '，', '表', '示', '敬', '意', '。'], ['】', '关', '注', '晨', '曦', '教', '育', '[UNK]', '[UNK]', '晨', '曦', '教', '育', '（', '微', '信', '号', '：', 'he', '##bc', '##x', '##jy', '）', '。'], ['打', '开', '微', '信', '，', '扫', '描', '二', '维', '码', '，', '关', '注', '[UNK]', '晨', '曦', '教', '育', '[UNK]', '，', '获', '取', '更', '多', '育', '儿', '资', '源', '。'], ['点', '击', '下', '面', '订', '阅', '按', '钮', '订', '阅', '，', '会', '有', '更', '多', '惊', '喜', '哦', '！']]
  while i < len(document): # 从文档的第一个位置开始，按个往下看
    segment = document[i] # segment是列表，代表的是按字分开的一个完整句子，如 segment=['我', '是', '一', '爷', '们', '，', '我', '想', '我', '会', '给', '我', '媳', '妇', '最', '好', '的', '幸', '福', '。']
    # print("###i:",i,";segment:",segment)
    current_chunk.append(segment) # 将一个独立的句子加入到当前的文本块中
    current_length += len(segment) # 累计到为止位置接触到句子的总长度
    if i == len(document) - 1 or current_length >= target_seq_length: # 如果累计的序列长度达到了目标的长度==>构造并添加到“A[SEP]B“中的A和B中。
      if current_chunk: # 如果当前块不为空
        # `a_end` is how many segments from `current_chunk` go into the `A`
        # (first) sentence.
        a_end = 1
        if len(current_chunk) >= 2: # 当前块，如果包含超过两个句子，怎取当前块的一部分作为“A[SEP]B“中的A部分
          a_end = rng.randint(1, len(current_chunk) - 1)
        # 将当前文本段中选取出来的前半部分，赋值给A即tokens_a
        tokens_a = []
        for j in range(a_end):
          tokens_a.extend(current_chunk[j])

        # 构造“A[SEP]B“中的B部分(原本的B有一部分是随机的从另一个文档中选取的，有一部分是正常的当前文档中的后半部）
        tokens_b = []
        # Random next
        is_random_next = False
        if len(current_chunk) == 1 or rng.random() < 0.5: # 有50%的概率，是从其他文档中随机的选取一个文档，并得到这个文档的后半版本作为B即tokens_b
          is_random_next = True
          target_b_length = target_seq_length - len(tokens_a)

          # This should rarely go for more than one iteration for large
          # corpora. However, just to be careful, we try to make sure that
          # the random document is not the same as the document
          # we're processing.
          random_document_index=0
          for _ in range(10): # 随机的选出一个与当前的文档不一样的文档的索引
            random_document_index = rng.randint(0, len(all_documents) - 1)
            if random_document_index != document_index:
              break

          random_document = all_documents[random_document_index] # 选出这个文档
          random_start = rng.randint(0, len(random_document) - 1) # 从这个文档选出一个段落的开始位置
          for j in range(random_start, len(random_document)): # 从这个文档的开始位置到结束，作为我们的“A[SEP]B“中的B即tokens_b
            tokens_b.extend(random_document[j])
            if len(tokens_b) >= target_b_length:
              break
          # We didn't actually use these segments so we "put them back" so
          # they don't go to waste. 这里是为了防止文本的浪费的一个小技巧
          num_unused_segments = len(current_chunk) - a_end # e.g. 550-200=350
          i -= num_unused_segments # i=i-num_unused_segments, e.g. i=400, num_unused_segments=350, 那么 i=i-num_unused_segments=400-350=50
        # Actual next
        else: # 有另外50%的几乎，从当前文本块（长度为max_sequence_length）中的后段中填充到tokens_b即“A[SEP]B“中的B。
          is_random_next = False
          for j in range(a_end, len(current_chunk)):
            tokens_b.extend(current_chunk[j])
        truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng)

        assert len(tokens_a) >= 1
        assert len(tokens_b) >= 1

        # 把tokens_a & tokens_b加入到按照bert的风格，即以[CLS]tokens_a[SEP]tokens_b[SEP]的形式，结合到一起，作为最终的tokens; 也带上segment_ids，前面部分segment_ids的值是0，后面部分的值是1.
        tokens = []
        segment_ids = []
        tokens.append("[CLS]")
        segment_ids.append(0)
        for token in tokens_a:
          tokens.append(token)
          segment_ids.append(0)

        tokens.append("[SEP]")
        segment_ids.append(0)

        for token in tokens_b:
          tokens.append(token)
          segment_ids.append(1)
        tokens.append("[SEP]")
        segment_ids.append(1)

        # 创建masked LM的任务的数据 Creates the predictions for the masked LM objective
        (tokens, masked_lm_positions,
         masked_lm_labels) = create_masked_lm_predictions(
             tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)
        instance = TrainingInstance( # 创建训练实例的对象
            tokens=tokens,
            segment_ids=segment_ids,
            is_random_next=is_random_next,
            masked_lm_positions=masked_lm_positions,
            masked_lm_labels=masked_lm_labels)
        instances.append(instance)
      current_chunk = [] # 清空当前块
      current_length = 0 # 重置当前文本块的长度
    i += 1 # 接着文档中的内容往后看

  return instances


MaskedLmInstance = collections.namedtuple("MaskedLmInstance",
                                          ["index", "label"])


def create_masked_lm_predictions(tokens, masked_lm_prob,
                                 max_predictions_per_seq, vocab_words, rng):
  """Creates the predictions for the masked LM objective."""

  cand_indexes = []
  for (i, token) in enumerate(tokens):
    if token == "[CLS]" or token == "[SEP]":
      continue
    # Whole Word Masking means that if we mask all of the wordpieces
    # corresponding to an original word. When a word has been split into
    # WordPieces, the first token does not have any marker and any subsequence
    # tokens are prefixed with ##. So whenever we see the ## token, we
    # append it to the previous set of word indexes.
    #
    # Note that Whole Word Masking does *not* change the training code
    # at all -- we still predict each WordPiece independently, softmaxed
    # over the entire vocabulary.
    if (FLAGS.do_whole_word_mask and len(cand_indexes) >= 1 and
            token.startswith("##")):
      cand_indexes[-1].append(i)
    else:
      cand_indexes.append([i])

  rng.shuffle(cand_indexes)

  if FLAGS.non_chinese==False: # if non chinese is False, that means it is chinese, then try to remove "##" which is added previously
    output_tokens = [t[2:] if len(re.findall('##[\u4E00-\u9FA5]', t)) > 0 else t for t in tokens]  # 去掉"##"
  else: # english and other language, which is not chinese
    output_tokens = list(tokens)

  num_to_predict = min(max_predictions_per_seq,
                       max(1, int(round(len(tokens) * masked_lm_prob))))

  masked_lms = []
  covered_indexes = set()
  for index_set in cand_indexes:
    if len(masked_lms) >= num_to_predict:
      break
    # If adding a whole-word mask would exceed the maximum number of
    # predictions, then just skip this candidate.
    if len(masked_lms) + len(index_set) > num_to_predict:
      continue
    is_any_index_covered = False
    for index in index_set:
      if index in covered_indexes:
        is_any_index_covered = True
        break
    if is_any_index_covered:
      continue
    for index in index_set:
      covered_indexes.add(index)

      masked_token = None
      # 80% of the time, replace with [MASK]
      if rng.random() < 0.8:
        masked_token = "[MASK]"
      else:
        # 10% of the time, keep original
        if rng.random() < 0.5:
          if FLAGS.non_chinese == False: # if non chinese is False, that means it is chinese, then try to remove "##" which is added previously
            masked_token = tokens[index][2:] if len(re.findall('##[\u4E00-\u9FA5]', tokens[index])) > 0 else tokens[index]  # 去掉"##"
          else:
            masked_token = tokens[index]
        # 10% of the time, replace with random word
        else:
          masked_token = vocab_words[rng.randint(0, len(vocab_words) - 1)]

      output_tokens[index] = masked_token

      masked_lms.append(MaskedLmInstance(index=index, label=tokens[index]))
  assert len(masked_lms) <= num_to_predict
  masked_lms = sorted(masked_lms, key=lambda x: x.index)

  masked_lm_positions = []
  masked_lm_labels = []
  for p in masked_lms:
    masked_lm_positions.append(p.index)
    masked_lm_labels.append(p.label)

  # tf.logging.info('%s' % (tokens))
  # tf.logging.info('%s' % (output_tokens))
  return (output_tokens, masked_lm_positions, masked_lm_labels)

def create_masked_lm_predictions_original(tokens, masked_lm_prob,
                                 max_predictions_per_seq, vocab_words, rng):
  """Creates the predictions for the masked LM objective."""

  cand_indexes = []
  for (i, token) in enumerate(tokens):
    if token == "[CLS]" or token == "[SEP]":
      continue
    # Whole Word Masking means that if we mask all of the wordpieces
    # corresponding to an original word. When a word has been split into
    # WordPieces, the first token does not have any marker and any subsequence
    # tokens are prefixed with ##. So whenever we see the ## token, we
    # append it to the previous set of word indexes.
    #
    # Note that Whole Word Masking does *not* change the training code
    # at all -- we still predict each WordPiece independently, softmaxed
    # over the entire vocabulary.
    if (FLAGS.do_whole_word_mask and len(cand_indexes) >= 1 and
        token.startswith("##")):
      cand_indexes[-1].append(i)
    else:
      cand_indexes.append([i])

  rng.shuffle(cand_indexes)

  output_tokens = list(tokens)

  num_to_predict = min(max_predictions_per_seq,
                       max(1, int(round(len(tokens) * masked_lm_prob))))

  masked_lms = []
  covered_indexes = set()
  for index_set in cand_indexes:
    if len(masked_lms) >= num_to_predict:
      break
    # If adding a whole-word mask would exceed the maximum number of
    # predictions, then just skip this candidate.
    if len(masked_lms) + len(index_set) > num_to_predict:
      continue
    is_any_index_covered = False
    for index in index_set:
      if index in covered_indexes:
        is_any_index_covered = True
        break
    if is_any_index_covered:
      continue
    for index in index_set:
      covered_indexes.add(index)

      masked_token = None
      # 80% of the time, replace with [MASK]
      if rng.random() < 0.8:
        masked_token = "[MASK]"
      else:
        # 10% of the time, keep original
        if rng.random() < 0.5:
          masked_token = tokens[index]
        # 10% of the time, replace with random word
        else:
          masked_token = vocab_words[rng.randint(0, len(vocab_words) - 1)]

      output_tokens[index] = masked_token

      masked_lms.append(MaskedLmInstance(index=index, label=tokens[index]))
  assert len(masked_lms) <= num_to_predict
  masked_lms = sorted(masked_lms, key=lambda x: x.index)

  masked_lm_positions = []
  masked_lm_labels = []
  for p in masked_lms:
    masked_lm_positions.append(p.index)
    masked_lm_labels.append(p.label)

  return (output_tokens, masked_lm_positions, masked_lm_labels)


def truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng):
  """Truncates a pair of sequences to a maximum sequence length."""
  while True:
    total_length = len(tokens_a) + len(tokens_b)
    if total_length <= max_num_tokens:
      break

    trunc_tokens = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
    assert len(trunc_tokens) >= 1

    # We want to sometimes truncate from the front and sometimes from the
    # back to add more randomness and avoid biases.
    if rng.random() < 0.5:
      del trunc_tokens[0]
    else:
      trunc_tokens.pop()


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)

  tokenizer = tokenization.FullTokenizer(
      vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)

  input_files = []
  for input_pattern in FLAGS.input_file.split(","):
    input_files.extend(tf.gfile.Glob(input_pattern))

  tf.logging.info("*** Reading from input files ***")
  for input_file in input_files:
    tf.logging.info("  %s", input_file)

  rng = random.Random(FLAGS.random_seed)
  instances = create_training_instances(
      input_files, tokenizer, FLAGS.max_seq_length, FLAGS.dupe_factor,
      FLAGS.short_seq_prob, FLAGS.masked_lm_prob, FLAGS.max_predictions_per_seq,
      rng)

  output_files = FLAGS.output_file.split(",")
  tf.logging.info("*** Writing to output files ***")
  for output_file in output_files:
    tf.logging.info("  %s", output_file)

  write_instance_to_example_files(instances, tokenizer, FLAGS.max_seq_length,
                                  FLAGS.max_predictions_per_seq, output_files)


if __name__ == "__main__":
  flags.mark_flag_as_required("input_file")
  flags.mark_flag_as_required("output_file")
  flags.mark_flag_as_required("vocab_file")
  tf.app.run()

================================================
FILE: create_pretraining_data_google.py
================================================
# coding=utf-8
# Copyright 2019 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Lint as: python2, python3
# coding=utf-8
"""Create masked LM/next sentence masked_lm TF examples for ALBERT."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import random

import numpy as np
import six
from six.moves import range
from six.moves import zip
import tensorflow as tf

from albert import tokenization

flags = tf.flags

FLAGS = flags.FLAGS

flags.DEFINE_string("input_file", None,
                    "Input raw text file (or comma-separated list of files).")

flags.DEFINE_string(
    "output_file", None,
    "Output TF example file (or comma-separated list of files).")

flags.DEFINE_string(
    "vocab_file", None,
    "The vocabulary file that the ALBERT model was trained on.")

flags.DEFINE_string("spm_model_file", None,
                    "The model file for sentence piece tokenization.")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

flags.DEFINE_bool(
    "do_whole_word_mask", True,
    "Whether to use whole word masking rather than per-xWordPiece masking.")

flags.DEFINE_bool(
    "do_permutation", False,
    "Whether to do the permutation training.")

flags.DEFINE_bool(
    "favor_shorter_ngram", False,
    "Whether to set higher probabilities for sampling shorter ngrams.")

flags.DEFINE_bool(
    "random_next_sentence", False,
    "Whether to use the sentence that's right before the current sentence "
    "as the negative sample for next sentence prection, rather than using "
    "sentences from other random documents.")

flags.DEFINE_integer("max_seq_length", 512, "Maximum sequence length.")

flags.DEFINE_integer("ngram", 3, "Maximum number of ngrams to mask.")

flags.DEFINE_integer("max_predictions_per_seq", 20,
                     "Maximum number of masked LM predictions per sequence.")

flags.DEFINE_integer("random_seed", 12345, "Random seed for data generation.")

flags.DEFINE_integer(
    "dupe_factor", 10,
    "Number of times to duplicate the input data (with different masks).")

flags.DEFINE_float("masked_lm_prob", 0.15, "Masked LM probability.")

flags.DEFINE_float(
    "short_seq_prob", 0.1,
    "Probability of creating sequences which are shorter than the "
    "maximum length.")


class TrainingInstance(object):
  """A single training instance (sentence pair)."""

  def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm_labels,
               is_random_next, token_boundary):
    self.tokens = tokens
    self.segment_ids = segment_ids
    self.is_random_next = is_random_next
    self.token_boundary = token_boundary
    self.masked_lm_positions = masked_lm_positions
    self.masked_lm_labels = masked_lm_labels

  def __str__(self):
    s = ""
    s += "tokens: %s\n" % (" ".join(
        [tokenization.printable_text(x) for x in self.tokens]))
    s += "segment_ids: %s\n" % (" ".join([str(x) for x in self.segment_ids]))
    s += "token_boundary: %s\n" % (" ".join(
        [str(x) for x in self.token_boundary]))
    s += "is_random_next: %s\n" % self.is_random_next
    s += "masked_lm_positions: %s\n" % (" ".join(
        [str(x) for x in self.masked_lm_positions]))
    s += "masked_lm_labels: %s\n" % (" ".join(
        [tokenization.printable_text(x) for x in self.masked_lm_labels]))
    s += "\n"
    return s

  def __repr__(self):
    return self.__str__()


def write_instance_to_example_files(instances, tokenizer, max_seq_length,
                                    max_predictions_per_seq, output_files):
  """Create TF example files from `TrainingInstance`s."""
  writers = []
  for output_file in output_files:
    writers.append(tf.python_io.TFRecordWriter(output_file))

  writer_index = 0

  total_written = 0
  for (inst_index, instance) in enumerate(instances):
    input_ids = tokenizer.convert_tokens_to_ids(instance.tokens)
    input_mask = [1] * len(input_ids)
    segment_ids = list(instance.segment_ids)
    token_boundary = list(instance.token_boundary)
    assert len(input_ids) <= max_seq_length

    while len(input_ids) < max_seq_length:
      input_ids.append(0)
      input_mask.append(0)
      segment_ids.append(0)
      token_boundary.append(0)

    assert len(input_ids) == max_seq_length
    assert len(input_mask) == max_seq_length
    assert len(segment_ids) == max_seq_length

    masked_lm_positions = list(instance.masked_lm_positions)
    masked_lm_ids = tokenizer.convert_tokens_to_ids(instance.masked_lm_labels)
    masked_lm_weights = [1.0] * len(masked_lm_ids)

    multiplier = 1 + int(FLAGS.do_permutation)
    while len(masked_lm_positions) < max_predictions_per_seq * multiplier:
      masked_lm_positions.append(0)
      masked_lm_ids.append(0)
      masked_lm_weights.append(0.0)

    sentence_order_label = 1 if instance.is_random_next else 0

    features = collections.OrderedDict()
    features["input_ids"] = create_int_feature(input_ids)
    features["input_mask"] = create_int_feature(input_mask)
    features["segment_ids"] = create_int_feature(segment_ids)
    features["token_boundary"] = create_int_feature(token_boundary)
    features["masked_lm_positions"] = create_int_feature(masked_lm_positions)
    features["masked_lm_ids"] = create_int_feature(masked_lm_ids)
    features["masked_lm_weights"] = create_float_feature(masked_lm_weights)
    # Note: We keep this feature name `next_sentence_labels` to be compatible
    # with the original data created by lanzhzh@. However, in the ALBERT case
    # it does contain sentence_order_label.
    features["next_sentence_labels"] = create_int_feature(
        [sentence_order_label])

    tf_example = tf.train.Example(features=tf.train.Features(feature=features))

    writers[writer_index].write(tf_example.SerializeToString())
    writer_index = (writer_index + 1) % len(writers)

    total_written += 1

    if inst_index < 6:
      tf.logging.info("*** Example ***")
      tf.logging.info("tokens: %s" % " ".join(
          [tokenization.printable_text(x) for x in instance.tokens]))

      for feature_name in features.keys():
        feature = features[feature_name]
        values = []
        if feature.int64_list.value:
          values = feature.int64_list.value
        elif feature.float_list.value:
          values = feature.float_list.value
        tf.logging.info(
            "%s: %s" % (feature_name, " ".join([str(x) for x in values])))

  for writer in writers:
    writer.close()

  tf.logging.info("Wrote %d total instances", total_written)


def create_int_feature(values):
  feature = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))
  return feature


def create_float_feature(values):
  feature = tf.train.Feature(float_list=tf.train.FloatList(value=list(values)))
  return feature


def create_training_instances(input_files, tokenizer, max_seq_length,
                              dupe_factor, short_seq_prob, masked_lm_prob,
                              max_predictions_per_seq, rng):
  """Create `TrainingInstance`s from raw text."""
  all_documents = [[]]

  # Input file format:
  # (1) One sentence per line. These should ideally be actual sentences, not
  # entire paragraphs or arbitrary spans of text. (Because we use the
  # sentence boundaries for the "next sentence prediction" task).
  # (2) Blank lines between documents. Document boundaries are needed so
  # that the "next sentence prediction" task doesn't span between documents.
  for input_file in input_files:
    with tf.gfile.GFile(input_file, "r") as reader:
      while True:
        line = reader.readline()
        if not FLAGS.spm_model_file:
          line = tokenization.convert_to_unicode(line)
        if not line:
          break
        if FLAGS.spm_model_file:
          line = tokenization.preprocess_text(line, lower=FLAGS.do_lower_case)
        else:
          line = line.strip()

        # Empty lines are used as document delimiters
        if not line:
          all_documents.append([])
        tokens = tokenizer.tokenize(line)
        if tokens:
          all_documents[-1].append(tokens)

  # Remove empty documents
  all_documents = [x for x in all_documents if x]
  rng.shuffle(all_documents)

  vocab_words = list(tokenizer.vocab.keys())
  instances = []
  for _ in range(dupe_factor):
    for document_index in range(len(all_documents)):
      instances.extend(
          create_instances_from_document(
              all_documents, document_index, max_seq_length, short_seq_prob,
              masked_lm_prob, max_predictions_per_seq, vocab_words, rng))

  rng.shuffle(instances)
  return instances


def create_instances_from_document(
    all_documents, document_index, max_seq_length, short_seq_prob,
    masked_lm_prob, max_predictions_per_seq, vocab_words, rng):
  """Creates `TrainingInstance`s for a single document."""
  document = all_documents[document_index]

  # Account for [CLS], [SEP], [SEP]
  max_num_tokens = max_seq_length - 3

  # We *usually* want to fill up the entire sequence since we are padding
  # to `max_seq_length` anyways, so short sequences are generally wasted
  # computation. However, we *sometimes*
  # (i.e., short_seq_prob == 0.1 == 10% of the time) want to use shorter
  # sequences to minimize the mismatch between pre-training and fine-tuning.
  # The `target_seq_length` is just a rough target however, whereas
  # `max_seq_length` is a hard limit.
  target_seq_length = max_num_tokens
  if rng.random() < short_seq_prob:
    target_seq_length = rng.randint(2, max_num_tokens)

  # We DON'T just concatenate all of the tokens from a document into a long
  # sequence and choose an arbitrary split point because this would make the
  # next sentence prediction task too easy. Instead, we split the input into
  # segments "A" and "B" based on the actual "sentences" provided by the user
  # input.
  instances = []
  current_chunk = []
  current_length = 0
  i = 0
  while i < len(document):
    segment = document[i]
    current_chunk.append(segment)
    current_length += len(segment)
    if i == len(document) - 1 or current_length >= target_seq_length:
      if current_chunk:
        # `a_end` is how many segments from `current_chunk` go into the `A`
        # (first) sentence.
        a_end = 1
        if len(current_chunk) >= 2:
          a_end = rng.randint(1, len(current_chunk) - 1)

        tokens_a = []
        for j in range(a_end):
          tokens_a.extend(current_chunk[j])

        tokens_b = []
        # Random next
        is_random_next = False
        if len(current_chunk) == 1 or \
            (FLAGS.random_next_sentence and rng.random() < 0.5):
          is_random_next = True
          target_b_length = target_seq_length - len(tokens_a)

          # This should rarely go for more than one iteration for large
          # corpora. However, just to be careful, we try to make sure that
          # the random document is not the same as the document
          # we're processing.
          for _ in range(10):
            random_document_index = rng.randint(0, len(all_documents) - 1)
            if random_document_index != document_index:
              break

          random_document = all_documents[random_document_index]
          random_start = rng.randint(0, len(random_document) - 1)
          for j in range(random_start, len(random_document)):
            tokens_b.extend(random_document[j])
            if len(tokens_b) >= target_b_length:
              break
          # We didn't actually use these segments so we "put them back" so
          # they don't go to waste.
          num_unused_segments = len(current_chunk) - a_end
          i -= num_unused_segments
        elif not FLAGS.random_next_sentence and rng.random() < 0.5:
          is_random_next = True
          for j in range(a_end, len(current_chunk)):
            tokens_b.extend(current_chunk[j])
          # Note(mingdachen): in this case, we just swap tokens_a and tokens_b
          tokens_a, tokens_b = tokens_b, tokens_a
        # Actual next
        else:
          is_random_next = False
          for j in range(a_end, len(current_chunk)):
            tokens_b.extend(current_chunk[j])
        truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng)

        assert len(tokens_a) >= 1
        assert len(tokens_b) >= 1

        tokens = []
        segment_ids = []
        tokens.append("[CLS]")
        segment_ids.append(0)
        for token in tokens_a:
          tokens.append(token)
          segment_ids.append(0)

        tokens.append("[SEP]")
        segment_ids.append(0)

        for token in tokens_b:
          tokens.append(token)
          segment_ids.append(1)
        tokens.append("[SEP]")
        segment_ids.append(1)

        (tokens, masked_lm_positions,
         masked_lm_labels, token_boundary) = create_masked_lm_predictions(
             tokens, masked_lm_prob, max_predictions_per_seq, vocab_words, rng)
        instance = TrainingInstance(
            tokens=tokens,
            segment_ids=segment_ids,
            is_random_next=is_random_next,
            token_boundary=token_boundary,
            masked_lm_positions=masked_lm_positions,
            masked_lm_labels=masked_lm_labels)
        instances.append(instance)
      current_chunk = []
      current_length = 0
    i += 1

  return instances


MaskedLmInstance = collections.namedtuple("MaskedLmInstance",
                                          ["index", "label"])


def _is_start_piece_sp(piece):
  """Check if the current word piece is the starting piece (sentence piece)."""
  special_pieces = set(list('!"#$%&\"()*+,-./:;?@[\\]^_`{|}~'))
  special_pieces.add(u"€".encode("utf-8"))
  special_pieces.add(u"£".encode("utf-8"))
  # Note(mingdachen):
  # For foreign characters, we always treat them as a whole piece.
  english_chars = set(list("abcdefghijklmnopqrstuvwhyz"))
  if (six.ensure_str(piece).startswith("▁") or
      six.ensure_str(piece).startswith("<") or piece in special_pieces or
      not all([i.lower() in english_chars.union(special_pieces)
               for i in piece])):
    return True
  else:
    return False


def _is_start_piece_bert(piece):
  """Check if the current word piece is the starting piece (BERT)."""
  # When a word has been split into
  # WordPieces, the first token does not have any marker and any subsequence
  # tokens are prefixed with ##. So whenever we see the ## token, we
  # append it to the previous set of word indexes.
  return not six.ensure_str(piece).startswith("##")


def is_start_piece(piece):
  if FLAGS.spm_model_file:
    return _is_start_piece_sp(piece)
  else:
    return _is_start_piece_bert(piece)


def create_masked_lm_predictions(tokens, masked_lm_prob,
                                 max_predictions_per_seq, vocab_words, rng):
  """Creates the predictions for the masked LM objective."""

  cand_indexes = []
  # Note(mingdachen): We create a list for recording if the piece is
  # the starting piece of current token, where 1 means true, so that
  # on-the-fly whole word masking is possible.
  token_boundary = [0] * len(tokens)

  for (i, token) in enumerate(tokens):
    if token == "[CLS]" or token == "[SEP]":
      token_boundary[i] = 1
      continue
    # Whole Word Masking means that if we mask all of the wordpieces
    # corresponding to an original word.
    #
    # Note that Whole Word Masking does *not* change the training code
    # at all -- we still predict each WordPiece independently, softmaxed
    # over the entire vocabulary.
    if (FLAGS.do_whole_word_mask and len(cand_indexes) >= 1 and
        not is_start_piece(token)):
      cand_indexes[-1].append(i)
    else:
      cand_indexes.append([i])
      if is_start_piece(token):
        token_boundary[i] = 1

  output_tokens = list(tokens)

  masked_lm_positions = []
  masked_lm_labels = []

  if masked_lm_prob == 0:
    return (output_tokens, masked_lm_positions,
            masked_lm_labels, token_boundary)

  num_to_predict = min(max_predictions_per_seq,
                       max(1, int(round(len(tokens) * masked_lm_prob))))

  # Note(mingdachen):
  # By default, we set the probilities to favor longer ngram sequences.
  ngrams = np.arange(1, FLAGS.ngram + 1, dtype=np.int64)
  pvals = 1. / np.arange(1, FLAGS.ngram + 1)
  pvals /= pvals.sum(keepdims=True)

  if FLAGS.favor_shorter_ngram:
    pvals = pvals[::-1]

  ngram_indexes = []
  for idx in range(len(cand_indexes)):
    ngram_index = []
    for n in ngrams:
      ngram_index.append(cand_indexes[idx:idx+n])
    ngram_indexes.append(ngram_index)

  rng.shuffle(ngram_indexes)

  masked_lms = []
  covered_indexes = set()
  for cand_index_set in ngram_indexes:
    if len(masked_lms) >= num_to_predict:
      break
    if not cand_index_set:
      continue
    # Note(mingdachen):
    # Skip current piece if they are covered in lm masking or previous ngrams.
    for index_set in cand_index_set[0]:
      for index in index_set:
        if index in covered_indexes:
          continue

    n = np.random.choice(ngrams[:len(cand_index_set)],
                         p=pvals[:len(cand_index_set)] /
                         pvals[:len(cand_index_set)].sum(keepdims=True))
    index_set = sum(cand_index_set[n - 1], [])
    n -= 1
    # Note(mingdachen):
    # Repeatedly looking for a candidate that does not exceed the
    # maximum number of predictions by trying shorter ngrams.
    while len(masked_lms) + len(index_set) > num_to_predict:
      if n == 0:
        break
      index_set = sum(cand_index_set[n - 1], [])
      n -= 1
    # If adding a whole-word mask would exceed the maximum number of
    # predictions, then just skip this candidate.
    if len(masked_lms) + len(index_set) > num_to_predict:
      continue
    is_any_index_covered = False
    for index in index_set:
      if index in covered_indexes:
        is_any_index_covered = True
        break
    if is_any_index_covered:
      continue
    for index in index_set:
      covered_indexes.add(index)

      masked_token = None
      # 80% of the time, replace with [MASK]
      if rng.random() < 0.8:
        masked_token = "[MASK]"
      else:
        # 10% of the time, keep original
        if rng.random() < 0.5:
          masked_token = tokens[index]
        # 10% of the time, rep

Download .txt

gitextract_fwt0rbxl/

├── README.md
├── albert_config/
│   ├── albert_config_base.json
│   ├── albert_config_base_google_fast.json
│   ├── albert_config_large.json
│   ├── albert_config_small_google.json
│   ├── albert_config_tiny.json
│   ├── albert_config_tiny_google.json
│   ├── albert_config_tiny_google_fast.json
│   ├── albert_config_xlarge.json
│   ├── albert_config_xxlarge.json
│   ├── bert_config.json
│   └── vocab.txt
├── args.py
├── bert_utils.py
├── classifier_utils.py
├── create_pretrain_data.sh
├── create_pretraining_data.py
├── create_pretraining_data_google.py
├── data/
│   └── news_zh_1.txt
├── lamb_optimizer_google.py
├── modeling.py
├── modeling_google.py
├── modeling_google_fast.py
├── optimization.py
├── optimization_finetuning.py
├── optimization_google.py
├── resources/
│   ├── create_pretraining_data_roberta.py
│   └── shell_scripts/
│       └── create_pretrain_data_batch_webtext.sh
├── run_classifier.py
├── run_classifier_clue.py
├── run_classifier_clue.sh
├── run_classifier_lcqmc.sh
├── run_classifier_sp_google.py
├── run_pretraining.py
├── run_pretraining_google.py
├── run_pretraining_google_fast.py
├── similarity.py
├── test_changes.py
├── tokenization.py
└── tokenization_google.py

Download .txt

SYMBOL INDEX (440 symbols across 22 files)

FILE: bert_utils.py
  function get_shape_list (line 13) | def get_shape_list(tensor, expected_rank=None, name=None):
  function reshape_to_matrix (line 49) | def reshape_to_matrix(input_tensor):
  function reshape_from_matrix (line 62) | def reshape_from_matrix(output_tensor, orig_shape_list):
  function assert_rank (line 74) | def assert_rank(tensor, expected_rank, name=None):
  function gather_indexes (line 103) | def gather_indexes(sequence_tensor, positions):
  function generate_seq2seq_mask (line 122) | def generate_seq2seq_mask(attention_mask, mask_sequence, seq_type, **kar...

FILE: classifier_utils.py
  function convert_to_unicode (line 36) | def convert_to_unicode(text):
  class InputExample (line 56) | class InputExample(object):
    method __init__ (line 59) | def __init__(self, guid, text_a, text_b=None, label=None):
  class PaddingInputExample (line 76) | class PaddingInputExample(object):
  class DataProcessor (line 87) | class DataProcessor(object):
    method get_train_examples (line 90) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 94) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 98) | def get_test_examples(self, data_dir):
    method get_labels (line 102) | def get_labels(self):
    method _read_tsv (line 107) | def _read_tsv(cls, input_file, delimiter="\t", quotechar=None):
    method _read_txt (line 117) | def _read_txt(cls, input_file):
    method _read_json (line 127) | def _read_json(cls, input_file):
  class XnliProcessor (line 137) | class XnliProcessor(DataProcessor):
    method get_train_examples (line 140) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 145) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 150) | def get_test_examples(self, data_dir):
    method _create_examples (line 155) | def _create_examples(self, lines, set_type):
    method get_labels (line 167) | def get_labels(self):
  class TnewsProcessor (line 214) | class TnewsProcessor(DataProcessor):
    method get_train_examples (line 217) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 222) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 227) | def get_test_examples(self, data_dir):
    method get_labels (line 232) | def get_labels(self):
    method _create_examples (line 241) | def _create_examples(self, lines, set_type):
  class iFLYTEKDataProcessor (line 294) | class iFLYTEKDataProcessor(DataProcessor):
    method get_train_examples (line 297) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 302) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 307) | def get_test_examples(self, data_dir):
    method get_labels (line 312) | def get_labels(self):
    method _create_examples (line 319) | def _create_examples(self, lines, set_type):
  class AFQMCProcessor (line 332) | class AFQMCProcessor(DataProcessor):
    method get_train_examples (line 335) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 340) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 345) | def get_test_examples(self, data_dir):
    method get_labels (line 350) | def get_labels(self):
    method _create_examples (line 354) | def _create_examples(self, lines, set_type):
  class CMNLIProcessor (line 367) | class CMNLIProcessor(DataProcessor):
    method get_train_examples (line 370) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 374) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 378) | def get_test_examples(self, data_dir):
    method get_labels (line 382) | def get_labels(self):
    method _create_examples_json (line 386) | def _create_examples_json(self, file_name, set_type):
  class CslProcessor (line 405) | class CslProcessor(DataProcessor):
    method get_train_examples (line 408) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 413) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 418) | def get_test_examples(self, data_dir):
    method get_labels (line 423) | def get_labels(self):
    method _create_examples (line 427) | def _create_examples(self, lines, set_type):
  class WSCProcessor (line 779) | class WSCProcessor(DataProcessor):
    method get_train_examples (line 782) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 787) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 792) | def get_test_examples(self, data_dir):
    method get_labels (line 797) | def get_labels(self):
    method _create_examples (line 801) | def _create_examples(self, lines, set_type):
  class COPAProcessor (line 841) | class COPAProcessor(DataProcessor):
    method __init__ (line 844) | def __init__(self):
    method get_train_examples (line 847) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 853) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 858) | def get_test_examples(self, data_dir):
    method get_labels (line 863) | def get_labels(self):
    method _create_examples_one (line 868) | def _create_examples_one(self, lines, set_type):
    method _create_examples (line 887) | def _create_examples(self, lines, set_type):

FILE: create_pretraining_data.py
  class TrainingInstance (line 71) | class TrainingInstance(object):
    method __init__ (line 74) | def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm...
    method __str__ (line 82) | def __str__(self):
    method __repr__ (line 95) | def __repr__(self):
  function write_instance_to_example_files (line 99) | def write_instance_to_example_files(instances, tokenizer, max_seq_length,
  function create_int_feature (line 172) | def create_int_feature(values):
  function create_float_feature (line 177) | def create_float_feature(values):
  function create_training_instances (line 182) | def create_training_instances(input_files, tokenizer, max_seq_length,
  function get_new_segment (line 227) | def get_new_segment(segment):  # 新增的方法 ####
  function create_instances_from_document_albert (line 260) | def create_instances_from_document_albert(
  function create_instances_from_document_original (line 372) | def create_instances_from_document_original( # THIS IS ORIGINAL BERT STY...
  function create_masked_lm_predictions (line 498) | def create_masked_lm_predictions(tokens, masked_lm_prob,
  function create_masked_lm_predictions_original (line 581) | def create_masked_lm_predictions_original(tokens, masked_lm_prob,
  function truncate_seq_pair (line 657) | def truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng):
  function main (line 675) | def main(_):

FILE: create_pretraining_data_google.py
  class TrainingInstance (line 97) | class TrainingInstance(object):
    method __init__ (line 100) | def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm...
    method __str__ (line 109) | def __str__(self):
    method __repr__ (line 124) | def __repr__(self):
  function write_instance_to_example_files (line 128) | def write_instance_to_example_files(instances, tokenizer, max_seq_length,
  function create_int_feature (line 209) | def create_int_feature(values):
  function create_float_feature (line 214) | def create_float_feature(values):
  function create_training_instances (line 219) | def create_training_instances(input_files, tokenizer, max_seq_length,
  function create_instances_from_document (line 268) | def create_instances_from_document(
  function _is_start_piece_sp (line 395) | def _is_start_piece_sp(piece):
  function _is_start_piece_bert (line 412) | def _is_start_piece_bert(piece):
  function is_start_piece (line 421) | def is_start_piece(piece):
  function create_masked_lm_predictions (line 428) | def create_masked_lm_predictions(tokens, masked_lm_prob,
  function truncate_seq_pair (line 603) | def truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng):
  function main (line 621) | def main(_):

FILE: lamb_optimizer_google.py
  class LAMBOptimizer (line 34) | class LAMBOptimizer(tf.train.Optimizer):
    method __init__ (line 42) | def __init__(self,
    method apply_gradients (line 68) | def apply_gradients(self, grads_and_vars, global_step=None, name=None):
    method _do_use_weight_decay (line 126) | def _do_use_weight_decay(self, param_name):
    method _do_layer_adaptation (line 136) | def _do_layer_adaptation(self, param_name):
    method _get_variable_name (line 144) | def _get_variable_name(self, param_name):

FILE: modeling.py
  class BertConfig (line 31) | class BertConfig(object):
    method __init__ (line 34) | def __init__(self,
    method from_dict (line 83) | def from_dict(cls, json_object):
    method from_json_file (line 91) | def from_json_file(cls, json_file):
    method to_dict (line 97) | def to_dict(self):
    method to_json_string (line 102) | def to_json_string(self):
  class BertModel (line 107) | class BertModel(object):
    method __init__ (line 131) | def __init__(self,
    method get_pooled_output (line 254) | def get_pooled_output(self):
    method get_sequence_output (line 257) | def get_sequence_output(self):
    method get_all_encoder_layers (line 266) | def get_all_encoder_layers(self):
    method get_embedding_output (line 269) | def get_embedding_output(self):
    method get_embedding_table (line 280) | def get_embedding_table(self):
    method get_embedding_table_2 (line 283) | def get_embedding_table_2(self):
  function gelu (line 286) | def gelu(x):
  function get_activation (line 302) | def get_activation(activation_string):
  function get_assignment_map_from_checkpoint (line 339) | def get_assignment_map_from_checkpoint(tvars, init_checkpoint):
  function dropout (line 366) | def dropout(input_tensor, dropout_prob):
  function layer_norm (line 384) | def layer_norm(input_tensor, name=None):
  function layer_norm_and_dropout (line 390) | def layer_norm_and_dropout(input_tensor, dropout_prob, name=None):
  function create_initializer (line 397) | def create_initializer(initializer_range=0.02):
  function embedding_lookup (line 402) | def embedding_lookup(input_ids,
  function embedding_lookup_factorized (line 448) | def embedding_lookup_factorized(input_ids, # Factorized embedding parame...
  function embedding_postprocessor (line 507) | def embedding_postprocessor(input_tensor,
  function create_attention_mask_from_input_mask (line 603) | def create_attention_mask_from_input_mask(from_tensor, to_mask):
  function attention_layer (line 637) | def attention_layer(from_tensor,
  function transformer_model (line 833) | def transformer_model(input_tensor,
  function get_shape_list (line 981) | def get_shape_list(tensor, expected_rank=None, name=None):
  function reshape_to_matrix (line 1018) | def reshape_to_matrix(input_tensor):
  function reshape_from_matrix (line 1032) | def reshape_from_matrix(output_tensor, orig_shape_list):
  function assert_rank (line 1045) | def assert_rank(tensor, expected_rank, name=None):
  function prelln_transformer_model (line 1074) | def prelln_transformer_model(input_tensor,

FILE: modeling_google.py
  class AlbertConfig (line 36) | class AlbertConfig(object):
    method __init__ (line 41) | def __init__(self,
    method from_dict (line 102) | def from_dict(cls, json_object):
    method from_json_file (line 110) | def from_json_file(cls, json_file):
    method to_dict (line 116) | def to_dict(self):
    method to_json_string (line 121) | def to_json_string(self):
  class AlbertModel (line 126) | class AlbertModel(object):
    method __init__ (line 145) | def __init__(self,
    method get_pooled_output (line 244) | def get_pooled_output(self):
    method get_sequence_output (line 247) | def get_sequence_output(self):
    method get_all_encoder_layers (line 255) | def get_all_encoder_layers(self):
    method get_word_embedding_output (line 258) | def get_word_embedding_output(self):
    method get_embedding_output (line 268) | def get_embedding_output(self):
    method get_embedding_table (line 278) | def get_embedding_table(self):
  function gelu (line 282) | def gelu(x):
  function get_activation (line 296) | def get_activation(activation_string):
  function get_assignment_map_from_checkpoint (line 330) | def get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_gr...
  function dropout (line 389) | def dropout(input_tensor, dropout_prob):
  function layer_norm (line 405) | def layer_norm(input_tensor, name=None):
  function layer_norm_and_dropout (line 411) | def layer_norm_and_dropout(input_tensor, dropout_prob, name=None):
  function create_initializer (line 418) | def create_initializer(initializer_range=0.02):
  function get_timing_signal_1d_given_position (line 423) | def get_timing_signal_1d_given_position(channels,
  function embedding_lookup (line 453) | def embedding_lookup(input_ids,
  function embedding_postprocessor (line 499) | def embedding_postprocessor(input_tensor,
  function dense_layer_3d (line 592) | def dense_layer_3d(input_tensor,
  function dense_layer_3d_proj (line 632) | def dense_layer_3d_proj(input_tensor,
  function dense_layer_2d (line 669) | def dense_layer_2d(input_tensor,
  function dot_product_attention (line 704) | def dot_product_attention(q, k, v, bias, dropout_rate=0.0):
  function attention_layer (line 748) | def attention_layer(from_tensor,
  function attention_ffn_block (line 837) | def attention_ffn_block(layer_input,
  function transformer_model (line 913) | def transformer_model(input_tensor,
  function get_shape_list (line 997) | def get_shape_list(tensor, expected_rank=None, name=None):
  function reshape_to_matrix (line 1032) | def reshape_to_matrix(input_tensor):
  function reshape_from_matrix (line 1046) | def reshape_from_matrix(output_tensor, orig_shape_list):
  function assert_rank (line 1059) | def assert_rank(tensor, expected_rank, name=None):

FILE: modeling_google_fast.py
  class AlbertConfig (line 36) | class AlbertConfig(object):
    method __init__ (line 41) | def __init__(self,
    method from_dict (line 102) | def from_dict(cls, json_object):
    method from_json_file (line 110) | def from_json_file(cls, json_file):
    method to_dict (line 116) | def to_dict(self):
    method to_json_string (line 121) | def to_json_string(self):
  class AlbertModel (line 126) | class AlbertModel(object):
    method __init__ (line 145) | def __init__(self,
    method get_pooled_output (line 244) | def get_pooled_output(self):
    method get_sequence_output (line 247) | def get_sequence_output(self):
    method get_all_encoder_layers (line 255) | def get_all_encoder_layers(self):
    method get_word_embedding_output (line 258) | def get_word_embedding_output(self):
    method get_embedding_output (line 268) | def get_embedding_output(self):
    method get_embedding_table (line 278) | def get_embedding_table(self):
  function gelu (line 282) | def gelu(x):
  function get_activation (line 296) | def get_activation(activation_string):
  function get_assignment_map_from_checkpoint (line 332) | def get_assignment_map_from_checkpoint(tvars, init_checkpoint, num_of_gr...
  function dropout (line 391) | def dropout(input_tensor, dropout_prob):
  function layer_norm (line 407) | def layer_norm(input_tensor, name=None):
  function layer_norm_and_dropout (line 413) | def layer_norm_and_dropout(input_tensor, dropout_prob, name=None):
  function create_initializer (line 420) | def create_initializer(initializer_range=0.02):
  function get_timing_signal_1d_given_position (line 425) | def get_timing_signal_1d_given_position(channels,
  function embedding_lookup (line 455) | def embedding_lookup(input_ids,
  function embedding_postprocessor (line 501) | def embedding_postprocessor(input_tensor,
  function dense_layer_3d (line 594) | def dense_layer_3d(input_tensor,
  function dense_layer_3d_proj (line 634) | def dense_layer_3d_proj(input_tensor,
  function dense_layer_2d (line 670) | def dense_layer_2d(input_tensor,
  function dense_layer_2d_old (line 721) | def dense_layer_2d_old(input_tensor,
  function dot_product_attention (line 789) | def dot_product_attention(q, k, v, bias, dropout_rate=0.0):
  function attention_layer (line 833) | def attention_layer(from_tensor,
  function attention_ffn_block (line 922) | def attention_ffn_block(layer_input,
  function transformer_model (line 1000) | def transformer_model(input_tensor,
  function get_shape_list (line 1084) | def get_shape_list(tensor, expected_rank=None, name=None):
  function reshape_to_matrix (line 1119) | def reshape_to_matrix(input_tensor):
  function reshape_from_matrix (line 1133) | def reshape_from_matrix(output_tensor, orig_shape_list):
  function assert_rank (line 1146) | def assert_rank(tensor, expected_rank, name=None):

FILE: optimization.py
  function create_optimizer (line 25) | def create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, u...
  class AdamWeightDecayOptimizer (line 87) | class AdamWeightDecayOptimizer(tf.train.Optimizer):
    method __init__ (line 90) | def __init__(self,
    method apply_gradients (line 108) | def apply_gradients(self, grads_and_vars, global_step=None, name=None):
    method _do_use_weight_decay (line 159) | def _do_use_weight_decay(self, param_name):
    method _get_variable_name (line 169) | def _get_variable_name(self, param_name):
  class LAMBOptimizer (line 178) | class LAMBOptimizer(tf.train.Optimizer):
    method __init__ (line 195) | def __init__(self,
    method apply_gradients (line 213) | def apply_gradients(self, grads_and_vars, global_step=None, name=None):
    method _do_use_weight_decay (line 285) | def _do_use_weight_decay(self, param_name):
    method _get_variable_name (line 295) | def _get_variable_name(self, param_name):

FILE: optimization_finetuning.py
  function create_optimizer (line 25) | def create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, u...
  class AdamWeightDecayOptimizer (line 87) | class AdamWeightDecayOptimizer(tf.train.Optimizer):
    method __init__ (line 90) | def __init__(self,
    method apply_gradients (line 108) | def apply_gradients(self, grads_and_vars, global_step=None, name=None):
    method _do_use_weight_decay (line 159) | def _do_use_weight_decay(self, param_name):
    method _get_variable_name (line 169) | def _get_variable_name(self, param_name):

FILE: optimization_google.py
  function create_optimizer (line 32) | def create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, u...
  class AdamWeightDecayOptimizer (line 118) | class AdamWeightDecayOptimizer(tf.train.Optimizer):
    method __init__ (line 121) | def __init__(self,
    method apply_gradients (line 139) | def apply_gradients(self, grads_and_vars, global_step=None, name=None):
    method _do_use_weight_decay (line 190) | def _do_use_weight_decay(self, param_name):
    method _get_variable_name (line 200) | def _get_variable_name(self, param_name):

FILE: resources/create_pretraining_data_roberta.py
  class TrainingInstance (line 70) | class TrainingInstance(object):
    method __init__ (line 73) | def __init__(self, tokens, segment_ids, masked_lm_positions, masked_lm...
    method __str__ (line 81) | def __str__(self):
    method __repr__ (line 94) | def __repr__(self):
  function write_instance_to_example_files (line 98) | def write_instance_to_example_files(instances, tokenizer, max_seq_length,
  function create_int_feature (line 172) | def create_int_feature(values):
  function create_float_feature (line 177) | def create_float_feature(values):
  function create_training_instances (line 182) | def create_training_instances(input_files, tokenizer, max_seq_length,
  function _is_chinese_char (line 229) | def _is_chinese_char(cp):
  function get_new_segment (line 250) | def get_new_segment(segment): #  新增的方法 ####
  function get_raw_instance (line 282) | def get_raw_instance(document,max_sequence_length): # 新增的方法 TODO need ch...
  function create_instances_from_document (line 319) | def create_instances_from_document( # 新增的方法
  function create_instances_from_document_original (line 376) | def create_instances_from_document_original(
  function create_masked_lm_predictions (line 501) | def create_masked_lm_predictions(tokens, masked_lm_prob,
  function truncate_seq_pair (line 579) | def truncate_seq_pair(tokens_a, tokens_b, max_num_tokens, rng):
  function main (line 597) | def main(_):

FILE: run_classifier.py
  class InputExample (line 128) | class InputExample(object):
    method __init__ (line 131) | def __init__(self, guid, text_a, text_b=None, label=None):
  class PaddingInputExample (line 148) | class PaddingInputExample(object):
  class InputFeatures (line 159) | class InputFeatures(object):
    method __init__ (line 162) | def __init__(self,
  class DataProcessor (line 175) | class DataProcessor(object):
    method get_train_examples (line 178) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 182) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 186) | def get_test_examples(self, data_dir):
    method get_labels (line 190) | def get_labels(self):
    method _read_tsv (line 195) | def _read_tsv(cls, input_file, quotechar=None):
  function convert_single_example (line 204) | def convert_single_example(ex_index, example, label_list, max_seq_length,
  function file_based_convert_examples_to_features (line 306) | def file_based_convert_examples_to_features(
  function file_based_input_fn_builder (line 336) | def file_based_input_fn_builder(input_file, seq_length, is_training,
  function _truncate_seq_pair (line 384) | def _truncate_seq_pair(tokens_a, tokens_b, max_length):
  function create_model (line 401) | def create_model(bert_config, is_training, input_ids, input_mask, segmen...
  function layer_norm (line 460) | def layer_norm(input_tensor, name=None):
  function model_fn_builder (line 465) | def model_fn_builder(bert_config, num_labels, init_checkpoint, learning_...
  function input_fn_builder (line 559) | def input_fn_builder(features, seq_length, is_training, drop_remainder):
  class LCQMCPairClassificationProcessor (line 610) | class LCQMCPairClassificationProcessor(DataProcessor): # TODO NEED CHANGE2
    method __init__ (line 612) | def __init__(self):
    method get_train_examples (line 615) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 621) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 626) | def get_test_examples(self, data_dir):
    method get_labels (line 631) | def get_labels(self):
    method _create_examples (line 636) | def _create_examples(self, lines, set_type):
  class SentencePairClassificationProcessor (line 655) | class SentencePairClassificationProcessor(DataProcessor):
    method __init__ (line 657) | def __init__(self):
    method get_train_examples (line 660) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 666) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 671) | def get_test_examples(self, data_dir):
    method get_labels (line 676) | def get_labels(self):
    method _create_examples (line 681) | def _create_examples(self, lines, set_type):
  function convert_examples_to_features (line 702) | def convert_examples_to_features(examples, label_list, max_seq_length,
  function main (line 718) | def main(_):

FILE: run_classifier_clue.py
  class InputFeatures (line 134) | class InputFeatures(object):
    method __init__ (line 137) | def __init__(self,
  function convert_single_example_for_inews (line 150) | def convert_single_example_for_inews(ex_index, tokens_a, tokens_b, label...
  function convert_example_list_for_inews (line 234) | def convert_example_list_for_inews(ex_index, example, label_list, max_se...
  function file_based_convert_examples_to_features_for_inews (line 272) | def file_based_convert_examples_to_features_for_inews(
  function convert_single_example (line 305) | def convert_single_example(ex_index, example, label_list, max_seq_length,
  function file_based_convert_examples_to_features (line 407) | def file_based_convert_examples_to_features(
  function file_based_input_fn_builder (line 437) | def file_based_input_fn_builder(input_file, seq_length, is_training,
  function _truncate_seq_pair (line 485) | def _truncate_seq_pair(tokens_a, tokens_b, max_length):
  function create_model (line 502) | def create_model(bert_config, is_training, input_ids, input_mask, segmen...
  function layer_norm (line 563) | def layer_norm(input_tensor, name=None):
  function model_fn_builder (line 569) | def model_fn_builder(bert_config, num_labels, init_checkpoint, learning_...
  function input_fn_builder (line 663) | def input_fn_builder(features, seq_length, is_training, drop_remainder):
  function convert_examples_to_features (line 717) | def convert_examples_to_features(examples, label_list, max_seq_length,
  function main (line 733) | def main(_):

FILE: run_classifier_sp_google.py
  class InputExample (line 139) | class InputExample(object):
    method __init__ (line 142) | def __init__(self, guid, text_a, text_b=None, label=None):
  class PaddingInputExample (line 159) | class PaddingInputExample(object):
  class InputFeatures (line 170) | class InputFeatures(object):
    method __init__ (line 173) | def __init__(self,
  class DataProcessor (line 186) | class DataProcessor(object):
    method get_train_examples (line 189) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 193) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 197) | def get_test_examples(self, data_dir):
    method get_labels (line 201) | def get_labels(self):
    method _read_tsv (line 206) | def _read_tsv(cls, input_file, quotechar=None):
  class XnliProcessor (line 216) | class XnliProcessor(DataProcessor):
    method __init__ (line 219) | def __init__(self):
    method get_train_examples (line 222) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 241) | def get_dev_examples(self, data_dir):
    method get_labels (line 259) | def get_labels(self):
  class MnliProcessor (line 264) | class MnliProcessor(DataProcessor):
    method get_train_examples (line 267) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 272) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 278) | def get_test_examples(self, data_dir):
    method get_labels (line 283) | def get_labels(self):
    method _create_examples (line 287) | def _create_examples(self, lines, set_type):
  class LCQMCPairClassificationProcessor (line 306) | class LCQMCPairClassificationProcessor(DataProcessor):
    method __init__ (line 308) | def __init__(self):
    method get_train_examples (line 311) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 317) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 322) | def get_test_examples(self, data_dir):
    method get_labels (line 327) | def get_labels(self):
    method _create_examples (line 331) | def _create_examples(self, lines, set_type):
  class MrpcProcessor (line 349) | class MrpcProcessor(DataProcessor):
    method get_train_examples (line 352) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 357) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 362) | def get_test_examples(self, data_dir):
    method get_labels (line 367) | def get_labels(self):
    method _create_examples (line 371) | def _create_examples(self, lines, set_type):
  class ColaProcessor (line 390) | class ColaProcessor(DataProcessor):
    method get_train_examples (line 393) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 398) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 403) | def get_test_examples(self, data_dir):
    method get_labels (line 408) | def get_labels(self):
    method _create_examples (line 412) | def _create_examples(self, lines, set_type):
  function convert_single_example (line 434) | def convert_single_example(ex_index, example, label_list, max_seq_length,
  function file_based_convert_examples_to_features (line 536) | def file_based_convert_examples_to_features(
  function file_based_input_fn_builder (line 566) | def file_based_input_fn_builder(input_file, seq_length, is_training,
  function _truncate_seq_pair (line 614) | def _truncate_seq_pair(tokens_a, tokens_b, max_length):
  function create_model (line 631) | def create_model(albert_config, is_training, input_ids, input_mask, segm...
  function model_fn_builder (line 682) | def model_fn_builder(albert_config, num_labels, init_checkpoint, learnin...
  function input_fn_builder (line 777) | def input_fn_builder(features, seq_length, is_training, drop_remainder):
  function convert_examples_to_features (line 831) | def convert_examples_to_features(examples, label_list, max_seq_length,
  function main (line 847) | def main(_):

FILE: run_pretraining.py
  function model_fn_builder (line 109) | def model_fn_builder(bert_config, init_checkpoint, learning_rate,
  function get_masked_lm_output (line 241) | def get_masked_lm_output(bert_config, input_tensor, output_weights,proje...
  function get_next_sentence_output (line 290) | def get_next_sentence_output(bert_config, input_tensor, labels):
  function gather_indexes (line 313) | def gather_indexes(sequence_tensor, positions):
  function input_fn_builder (line 329) | def input_fn_builder(input_files,
  function _decode_record (line 396) | def _decode_record(record, name_to_features):
  function main (line 411) | def main(_):

FILE: run_pretraining_google.py
  function model_fn_builder (line 133) | def model_fn_builder(albert_config, init_checkpoint, learning_rate,
  function get_masked_lm_output (line 296) | def get_masked_lm_output(albert_config, input_tensor, output_weights, po...
  function get_sentence_order_output (line 342) | def get_sentence_order_output(albert_config, input_tensor, labels):
  function gather_indexes (line 366) | def gather_indexes(sequence_tensor, positions):
  function input_fn_builder (line 382) | def input_fn_builder(input_files,
  function _decode_record (line 456) | def _decode_record(record, name_to_features):
  function main (line 471) | def main(_):

FILE: run_pretraining_google_fast.py
  function model_fn_builder (line 133) | def model_fn_builder(albert_config, init_checkpoint, learning_rate,
  function get_masked_lm_output (line 296) | def get_masked_lm_output(albert_config, input_tensor, output_weights, po...
  function get_sentence_order_output (line 342) | def get_sentence_order_output(albert_config, input_tensor, labels):
  function gather_indexes (line 366) | def gather_indexes(sequence_tensor, positions):
  function input_fn_builder (line 382) | def input_fn_builder(input_files,
  function _decode_record (line 456) | def _decode_record(record, name_to_features):
  function main (line 471) | def main(_):

FILE: similarity.py
  class SimProcessor (line 18) | class SimProcessor(DataProcessor):
    method get_sentence_examples (line 19) | def get_sentence_examples(self, questions):
    method get_labels (line 29) | def get_labels(self):
  class BertSim (line 36) | class BertSim:
    method __init__ (line 37) | def __init__(self, batch_size=args.batch_size):
    method start_model (line 49) | def start_model(self):
    method model_fn_builder (line 53) | def model_fn_builder(self, bert_config, num_labels, init_checkpoint, l...
    method get_estimator (line 97) | def get_estimator(self):
    method predict_sentences (line 126) | def predict_sentences(self,sentences):
    method _truncate_seq_pair (line 132) | def _truncate_seq_pair(self, tokens_a, tokens_b, max_length):
    method convert_single_example (line 148) | def convert_single_example(self, ex_index, example, label_list, max_se...
  function input_fn_builder (line 241) | def input_fn_builder(bertSim,sentences):

FILE: test_changes.py
  function get_total_parameters (line 16) | def get_total_parameters():
  function test_factorized_embedding (line 35) | def test_factorized_embedding():
  function test_share_parameters (line 44) | def test_share_parameters():
  function test_sentence_order_prediction (line 64) | def test_sentence_order_prediction():

FILE: tokenization.py
  function validate_case_matches_checkpoint (line 28) | def validate_case_matches_checkpoint(do_lower_case, init_checkpoint):
  function convert_to_unicode (line 78) | def convert_to_unicode(text):
  function printable_text (line 98) | def printable_text(text):
  function load_vocab (line 121) | def load_vocab(vocab_file):
  function convert_by_vocab (line 136) | def convert_by_vocab(vocab, items):
  function convert_tokens_to_ids (line 146) | def convert_tokens_to_ids(vocab, tokens):
  function convert_ids_to_tokens (line 150) | def convert_ids_to_tokens(inv_vocab, ids):
  function whitespace_tokenize (line 154) | def whitespace_tokenize(text):
  class FullTokenizer (line 163) | class FullTokenizer(object):
    method __init__ (line 166) | def __init__(self, vocab_file, do_lower_case=True):
    method tokenize (line 172) | def tokenize(self, text):
    method convert_tokens_to_ids (line 180) | def convert_tokens_to_ids(self, tokens):
    method convert_ids_to_tokens (line 183) | def convert_ids_to_tokens(self, ids):
  class BasicTokenizer (line 187) | class BasicTokenizer(object):
    method __init__ (line 190) | def __init__(self, do_lower_case=True):
    method tokenize (line 198) | def tokenize(self, text):
    method _run_strip_accents (line 222) | def _run_strip_accents(self, text):
    method _run_split_on_punc (line 233) | def _run_split_on_punc(self, text):
    method _tokenize_chinese_chars (line 253) | def _tokenize_chinese_chars(self, text):
    method _is_chinese_char (line 266) | def _is_chinese_char(self, cp):
    method _clean_text (line 288) | def _clean_text(self, text):
  class WordpieceTokenizer (line 302) | class WordpieceTokenizer(object):
    method __init__ (line 305) | def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=...
    method tokenize (line 310) | def tokenize(self, text):
  function _is_whitespace (line 364) | def _is_whitespace(char):
  function _is_control (line 376) | def _is_control(char):
  function _is_punctuation (line 388) | def _is_punctuation(char):

FILE: tokenization_google.py
  function validate_case_matches_checkpoint (line 35) | def validate_case_matches_checkpoint(do_lower_case, init_checkpoint):
  function preprocess_text (line 86) | def preprocess_text(inputs, remove_space=True, lower=False):
  function encode_pieces (line 106) | def encode_pieces(sp_model, text, return_unicode=True, sample=False):
  function encode_ids (line 144) | def encode_ids(sp_model, text, sample=False):
  function convert_to_unicode (line 150) | def convert_to_unicode(text):
  function printable_text (line 170) | def printable_text(text):
  function load_vocab (line 193) | def load_vocab(vocab_file):
  function convert_by_vocab (line 207) | def convert_by_vocab(vocab, items):
  function convert_tokens_to_ids (line 215) | def convert_tokens_to_ids(vocab, tokens):
  function convert_ids_to_tokens (line 219) | def convert_ids_to_tokens(inv_vocab, ids):
  function whitespace_tokenize (line 223) | def whitespace_tokenize(text):
  class FullTokenizer (line 232) | class FullTokenizer(object):
    method __init__ (line 235) | def __init__(self, vocab_file, do_lower_case=True, spm_model_file=None):
    method tokenize (line 255) | def tokenize(self, text):
    method convert_tokens_to_ids (line 266) | def convert_tokens_to_ids(self, tokens):
    method convert_ids_to_tokens (line 274) | def convert_ids_to_tokens(self, ids):
  class BasicTokenizer (line 282) | class BasicTokenizer(object):
    method __init__ (line 285) | def __init__(self, do_lower_case=True):
    method tokenize (line 293) | def tokenize(self, text):
    method _run_strip_accents (line 317) | def _run_strip_accents(self, text):
    method _run_split_on_punc (line 328) | def _run_split_on_punc(self, text):
    method _tokenize_chinese_chars (line 348) | def _tokenize_chinese_chars(self, text):
    method _is_chinese_char (line 361) | def _is_chinese_char(self, cp):
    method _clean_text (line 383) | def _clean_text(self, text):
  class WordpieceTokenizer (line 397) | class WordpieceTokenizer(object):
    method __init__ (line 400) | def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=...
    method tokenize (line 405) | def tokenize(self, text):
  function _is_whitespace (line 459) | def _is_whitespace(char):
  function _is_control (line 471) | def _is_control(char):
  function _is_punctuation (line 483) | def _is_punctuation(char):

Download .json

Condensed preview — 40 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (682K chars).

[
  {
    "path": "README.md",
    "chars": 23181,
    "preview": "# albert_zh\n\nAn Implementation of <a href=\"https://arxiv.org/pdf/1909.11942.pdf\">A Lite Bert For Self-Supervised Learnin"
  },
  {
    "path": "albert_config/albert_config_base.json",
    "chars": 563,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": "
  },
  {
    "path": "albert_config/albert_config_base_google_fast.json",
    "chars": 484,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.1,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.1,\n  \"embedding_size\": 128,\n"
  },
  {
    "path": "albert_config/albert_config_large.json",
    "chars": 563,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": "
  },
  {
    "path": "albert_config/albert_config_small_google.json",
    "chars": 482,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.0,\n  \"embedding_size\": 128,\n"
  },
  {
    "path": "albert_config/albert_config_tiny.json",
    "chars": 562,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": "
  },
  {
    "path": "albert_config/albert_config_tiny_google.json",
    "chars": 483,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.0,\n  \"embedding_size\": 128,\n"
  },
  {
    "path": "albert_config/albert_config_tiny_google_fast.json",
    "chars": 483,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.1,\n  \"hidden_act\": \"gelu\",\n  \"hidden_dropout_prob\": 0.1,\n  \"embedding_size\": 128,\n"
  },
  {
    "path": "albert_config/albert_config_xlarge.json",
    "chars": 563,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": "
  },
  {
    "path": "albert_config/albert_config_xxlarge.json",
    "chars": 564,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": "
  },
  {
    "path": "albert_config/bert_config.json",
    "chars": 518,
    "preview": "{\n  \"attention_probs_dropout_prob\": 0.0,\n  \"directionality\": \"bidi\", \n  \"hidden_act\": \"gelu\", \n  \"hidden_dropout_prob\": "
  },
  {
    "path": "albert_config/vocab.txt",
    "chars": 75770,
    "preview": "[PAD]\n[unused1]\n[unused2]\n[unused3]\n[unused4]\n[unused5]\n[unused6]\n[unused7]\n[unused8]\n[unused9]\n[unused10]\n[unused11]\n[u"
  },
  {
    "path": "args.py",
    "chars": 801,
    "preview": "import os\nimport tensorflow as tf\n\ntf.logging.set_verbosity(tf.logging.INFO)\n\nfile_path = os.path.dirname(__file__)\n\n\n#模"
  },
  {
    "path": "bert_utils.py",
    "chars": 4562,
    "preview": "from __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport col"
  },
  {
    "path": "classifier_utils.py",
    "chars": 31043,
    "preview": "# -*- coding: utf-8 -*-\n# @Author: bo.shi\n# @Date:   2019-12-01 22:28:41\n# @Last Modified by:   bo.shi\n# @Last Modified "
  },
  {
    "path": "create_pretrain_data.sh",
    "chars": 339,
    "preview": "#!/usr/bin/env bash\n\nBERT_BASE_DIR=./albert_config\npython3 create_pretraining_data.py --do_whole_word_mask=True --input_"
  },
  {
    "path": "create_pretraining_data.py",
    "chars": 37639,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "create_pretraining_data_google.py",
    "chars": 23106,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "data/news_zh_1.txt",
    "chars": 11510,
    "preview": "最后的南京老城该往何处去 城市化时代呼唤文化自觉\n【概要】80后学者姚远出版《城市的自觉》一书 姚远出版《城市的自觉》 作者简介姚远，政治学博士，1981年出生于南京，1999年从金陵中学毕业后考入北京大学国际关系学院，负笈燕园十二载，获政"
  },
  {
    "path": "lamb_optimizer_google.py",
    "chars": 5623,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "modeling.py",
    "chars": 50600,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "modeling_google.py",
    "chars": 43011,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "modeling_google_fast.py",
    "chars": 47282,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "optimization.py",
    "chars": 11793,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "optimization_finetuning.py",
    "chars": 6323,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "optimization_google.py",
    "chars": 7574,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "resources/create_pretraining_data_roberta.py",
    "chars": 25201,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "resources/shell_scripts/create_pretrain_data_batch_webtext.sh",
    "chars": 416,
    "preview": "#!/usr/bin/env bash\necho $1,$2\n\nBERT_BASE_DIR=./bert_config\nfor((i=$1;i<=$2;i++));\ndo\npython3 create_pretraining_data.py"
  },
  {
    "path": "run_classifier.py",
    "chars": 35595,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "run_classifier_clue.py",
    "chars": 37965,
    "preview": "# -*- coding: utf-8 -*-\n# @Author: bo.shi\n# @Date:   2019-11-04 09:56:36\n# @Last Modified by:   bo.shi\n# @Last Modified "
  },
  {
    "path": "run_classifier_clue.sh",
    "chars": 3076,
    "preview": "# @Author: bo.shi\n# @Date:   2020-03-15 16:11:00\n# @Last Modified by:   bo.shi\n# @Last Modified time: 2020-04-02 17:54:0"
  },
  {
    "path": "run_classifier_lcqmc.sh",
    "chars": 2260,
    "preview": "#!/usr/bin/env bash\n# @Author: bo.shi, https://github.com/chineseGLUE/chineseGLUE\n# @Date:   2019-11-04 09:56:36\n# @Last"
  },
  {
    "path": "run_classifier_sp_google.py",
    "chars": 39275,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "run_pretraining.py",
    "chars": 19745,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "run_pretraining_google.py",
    "chars": 21756,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "run_pretraining_google_fast.py",
    "chars": 21761,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  },
  {
    "path": "similarity.py",
    "chars": 10889,
    "preview": "\"\"\"\r\n进行文本相似度预测的示例。可以直接运行进行预测。\r\n参考了项目：https://github.com/chdd/bert-utils\r\n\r\n\"\"\"\r\n\r\n\r\nimport tensorflow as tf\r\nimport args"
  },
  {
    "path": "test_changes.py",
    "chars": 3008,
    "preview": "# coding=utf-8\nimport tensorflow as tf\nfrom modeling import embedding_lookup_factorized,transformer_model\nimport os\n\n\"\"\""
  },
  {
    "path": "tokenization.py",
    "chars": 13166,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "tokenization_google.py",
    "chars": 15621,
    "preview": "# coding=utf-8\n# Copyright 2019 The Google Research Authors.\n#\n# Licensed under the Apache License, Version 2.0 (the \"Li"
  }
]

About this extraction

This page contains the full source code of the brightmart/albert_zh GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 40 files (620.3 KB), approximately 210.9k tokens, and a symbol index with 440 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo