Repository: brightmart/text_classification
Branch: master
Commit: 091ff9910839
Files: 108
Total size: 22.5 MB

Directory structure:
gitextract_51obnlez/

├── .travis.yml
├── LICENSE.md
├── README.md
├── a00_Bert/
│   ├── README_bert.md
│   ├── __init__.py
│   ├── bert_modeling.py
│   ├── optimization.py
│   ├── run_classifier_predict_online.py
│   ├── tokenization.py
│   ├── train_bert_multi-label.py
│   ├── train_bert_toy_task.py
│   ├── unused/
│   │   ├── run_classifier_multi_labels_bert.py
│   │   └── train_bert_multi-label_old.py
│   └── utils.py
├── a00_boosting/
│   └── a08_boosting.py
├── a01_FastText/
│   ├── old_single_label/
│   │   ├── p5_fastTextB_model.py
│   │   ├── p5_fastTextB_predict.py
│   │   └── p5_fastTextB_train.py
│   ├── p5_fastTextB_predict_multilabel.py
│   ├── p6_fastTextB_model_multilabel.py
│   └── p6_fastTextB_train_multilabel.py
├── a02_TextCNN/
│   ├── __init__.py
│   ├── data_util.py
│   ├── other_experiement/
│   │   ├── __init__.py
│   │   ├── data_util_zhihu.py
│   │   ├── p7_TextCNN_model_multilayers.py
│   │   ├── p7_TextCNN_predict_ensemble.py
│   │   ├── p7_TextCNN_predict_exp.py
│   │   ├── p7_TextCNN_predict_exp512.py
│   │   ├── p7_TextCNN_predict_exp512_0609.py
│   │   ├── p7_TextCNN_predict_exp512_simple.py
│   │   ├── p7_TextCNN_train_exp.py
│   │   ├── p7_TextCNN_train_exp512.py
│   │   ├── p7_TextCNN_train_exp_512_0609.py
│   │   └── p8_TextCNN_predict_exp.py
│   ├── p7_TextCNN_model.py
│   ├── p7_TextCNN_predict.py
│   ├── p7_TextCNN_train.py
│   └── p7_temp.py
├── a03_TextRNN/
│   ├── p8_TextRNN_model.py
│   ├── p8_TextRNN_model_multi_layers.py
│   ├── p8_TextRNN_predict.py
│   └── p8_TextRNN_train.py
├── a04_TextRCNN/
│   ├── p71_TextRCNN_mode2.py
│   ├── p71_TextRCNN_model.py
│   ├── p71_TextRCNN_predict.py
│   └── p71_TextRCNN_train.py
├── a05_HierarchicalAttentionNetwork/
│   ├── HAN_model.py
│   ├── p1_HierarchicalAttention_model.py
│   ├── p1_HierarchicalAttention_model_transformer.py
│   ├── p1_HierarchicalAttention_predict.py
│   ├── p1_HierarchicalAttention_train.py
│   └── p1_seq2seq.py
├── a06_Seq2seqWithAttention/
│   ├── a1_seq2seq.py
│   ├── a1_seq2seq_attention_model.py
│   ├── a1_seq2seq_attention_predict.py
│   └── a1_seq2seq_attention_train.py
├── a07_Transformer/
│   ├── a2_attention_between_enc_dec.py
│   ├── a2_base_model.py
│   ├── a2_decoder.py
│   ├── a2_encoder.py
│   ├── a2_layer_norm_residual_conn.py
│   ├── a2_multi_head_attention.py
│   ├── a2_poistion_wise_feed_forward.py
│   ├── a2_predict.py
│   ├── a2_predict_classification.py
│   ├── a2_split_traning_data.py
│   ├── a2_train.py
│   ├── a2_train_classification.py
│   ├── a2_transformer.py
│   ├── a2_transformer_classification.py
│   └── data_util_zhihu.py
├── a08_EntityNetwork/
│   ├── a3_entity_network.py
│   ├── a3_predict.py
│   ├── a3_train.py
│   └── data_util_zhihu.py
├── a08_predict_ensemble.py
├── a09_DynamicMemoryNet/
│   ├── a8_dynamic_memory_network.py
│   ├── a8_predict.py
│   └── a8_train.py
├── aa1_data_util/
│   ├── 1_process_zhihu.py
│   ├── 2_predict_zhihu_get_question_representation.py
│   ├── 3_process_zhihu_question_topic_relation.py
│   ├── data_multi_label.txt
│   ├── data_single_label.txt
│   └── data_util_zhihu.py
├── aa2_ClassificationTflearn/
│   ├── p2_classification_tflearn.py
│   └── p2_classification_tflearn_demo.py
├── aa3_CNNSentenceClassificationTflearn/
│   ├── p4_cnn_sentence_classification.py
│   ├── p4_cnn_sentence_classification_zhihu.py
│   ├── p4_cnn_sentence_classification_zhihu2.py
│   ├── p4_cnn_sentence_classification_zhihu2_predict.py
│   └── p4_conv_classification_tflearn.py
├── aa4_TextCNN_with_RCNN/
│   ├── p72_TextCNN_with_RCNN_model.py
│   └── p72_TextCNN_with_RCNN_train.py
├── aa5_BiLstmTextRelation/
│   ├── p9_BiLstmTextRelation_model.py
│   └── p9_BiLstmTextRelation_train.py
├── aa6_TwoCNNTextRelation/
│   ├── p9_twoCNNTextRelation_model.py
│   └── p9_twoCNNTextRelation_train.py
├── data/
│   ├── __init__.py
│   ├── ieee_zhihu_cup/
│   │   ├── label_set.txt
│   │   └── vocab.txt
│   ├── old/
│   │   ├── __init__.py
│   │   └── sample_multiple_label.txt
│   ├── sample_multiple_label.txt
│   └── sample_single_label.txt
├── images/
│   └── xx
└── pre-processing.ipynb

================================================
FILE CONTENTS
================================================

================================================
FILE: .travis.yml
================================================
language: python
python:
    - 2.7.13
    - 3.6.2
install:
    - pip install flake8==3.3.0  # pytest  # add another testing frameworks later
before_script:
    # stop the build if there are Python syntax errors or undefined names
    - time flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
    # exit-zero treats all errors as warnings.  The GitHub editor is 127 chars wide
    - time flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
script:
    - true  # add other tests here
notifications:
    on_success: change
    on_failure: always


================================================
FILE: LICENSE.md
================================================
MIT License

Copyright (c) [year] [fullname]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
Text Classification
-------------------------------------------------------------------------
The purpose of this repository is to explore text classification methods in NLP with deep learning.

#### Update: 

Customize an NLP API in three minutes, for free: <a href='https://www.cluebenchmarks.com/clueai.html'>NLP API Demo</a>

Language Understanding Evaluation benchmark for Chinese (<a href='https://www.CLUEbenchmarks.com'>CLUE benchmark</a>): run 10 tasks & 9 baselines with one line of code, with a detailed performance comparison.

Released pre-trained models of <a href="https://github.com/brightmart/albert_zh">ALBERT_Chinese</a>, trained on a 30G+ raw Chinese corpus (xxlarge, xlarge and more), targeting state-of-the-art performance in Chinese. 2019-Oct-7, during the National Day of China!
 
<a href='https://github.com/brightmart/nlp_chinese_corpus'>Large Amount of Chinese Corpus for NLP Available!</a>

Google's BERT achieved new state-of-the-art results on more than 10 NLP tasks by pre-training a language model and then fine-tuning it. <a href='https://github.com/brightmart/bert_language_understanding'>Pre-train TextCNN: idea from BERT for language understanding with running code and data set</a>


#### Introduction
This repository has all kinds of baseline models for text classification.

It also supports multi-label classification, where multiple labels are associated with a sentence or document.

Many of these models are simple and may not get you to the top level of a task, but some of them are classics, so they serve well as baseline models. Each model has a test function under the model class; you can run it to perform a toy task first. The models are independent of the data set.

<a href='https://github.com/brightmart/text_classification/blob/master/multi-label-classification.pdf'>Check here for the formal report on large-scale multi-label text classification with deep learning</a>

Several models here can also be used for modelling question answering (with or without context), or for sequence generation.

We explore two seq2seq models (seq2seq with attention, and the Transformer from Attention Is All You Need) for text classification. These two models can also be used for sequence generation and other tasks. If your task is multi-label classification, you can cast the problem as sequence generation.

We implement two memory networks. One is the Dynamic Memory Network, which previously reached state of the art in question answering, sentiment analysis and sequence generation tasks. It is a so-called single model for several different tasks that still reaches high performance. It has four modules. The key component is the episodic memory module: it uses a gate mechanism to perform attention and a gated GRU to update the episode memory, then another GRU (in a vertical direction) to perform the hidden state update. It is able to do transitive inference.

The second memory network we implemented is the Recurrent Entity Network (tracking the state of the world). Its memory consists of blocks of key-value pairs that run in parallel, and it achieved a new state of the art. It can be used for modelling question answering with contexts (or history). For example, you can let the model read some sentences (as context), ask a question (as query), and then ask the model to predict an answer; if you feed the story in as the query as well, it can do a classification task.

To discuss ML/DL/NLP problems and get tech support from each other, you can join QQ group: 836811304

Models:
-------------------------------------------------------------------------

1) fastText
2) TextCNN 
3) Bert:Pre-training of Deep Bidirectional Transformers for Language Understanding  
4) TextRNN    
5) RCNN     
6) Hierarchical Attention Network    
7) seq2seq with attention   
8) Transformer("Attention Is All You Need")
9) Dynamic Memory Network
10) EntityNetwork:tracking state of the world
11) Ensemble models
12) Boosting: 

    For a single model, stack identical models together; each layer is a model, and the final result is based on the logits of all layers added together. The only connection between layers is the label weights: the front layer's prediction error rate for each label becomes the weight for the next layer. Labels with a high error rate get a big weight, so later layers pay more attention to the mis-predicted labels and try to fix the mistakes of the former layer. As a result, we get a much stronger model.
    Check a00_boosting/a08_boosting.py
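
The re-weighting idea described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code in the boosting script: the helper names `label_error_rates` and `label_weights`, the `eps` smoothing and the normalisation to mean 1 are all hypothetical choices.

```python
import numpy as np

def label_error_rates(pred, target):
    """Per-label error rate; pred and target are 0/1 arrays of shape [num_examples, num_labels]."""
    return np.mean(pred != target, axis=0)

def label_weights(error_rates, eps=1e-6):
    """Turn error rates into weights with mean 1 (assumed scheme): higher error -> larger weight."""
    w = error_rates + eps
    return w / w.mean()

# toy predictions of the "front" layer on 4 examples and 3 labels
target = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])
pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
rates = label_error_rates(pred, target)
weights = label_weights(rates)  # would weight the per-label loss of the next layer

# the final prediction is based on the logits of all layers added together
logits_per_layer = [np.array([[2.0, -1.0, 0.5]]), np.array([[0.5, 0.2, 1.5]])]
combined_logits = np.sum(logits_per_layer, axis=0)
```

In this toy run the label that the front layer gets wrong most often receives the largest weight, so a later layer trained with a weighted loss would focus on it.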

and other models:

1) BiLstmTextRelation;

2) twoCNNTextRelation;

3) BiLstmTextRelationTwoRNN

Performance
-------------------------------------------------------------------------

(multi-label prediction task: predict the top 5 labels; 3 million training examples; full score: 0.5)

Model    | fastText | TextCNN | TextRNN | RCNN  | HierAtteNet | Seq2seqAttn | EntityNet | DynamicMemory | Transformer
---      | ---      | ---     | ---     | ---   | ---         | ---         | ---       | ---           | ---
Score    | 0.362    | 0.405   | 0.358   | 0.395 | 0.398       | 0.322       | 0.400     | 0.392         | 0.322
Training | 10m      | 2h      | 10h     | 2h    | 2h          | 3h          | 3h        | 5h            | 7h

--------------------------------------------------------------------------------------------------
 
 The BERT model achieves 0.368 on the validation set after the first 9 epochs.
 
 Ensemble of TextCNN,EntityNet,DynamicMemory: 0.411
 
 Ensemble EntityNet,DynamicMemory: 0.403
 

 
 --------------------------------------------------------------------------------------------------
 
 Notice: 
 
`m` stands for **minutes**; `h` stands for **hours**;

`HierAtteNet` means Hierarchical Attention Network;

`Seq2seqAttn` means Seq2seq with attention;

`DynamicMemory` means DynamicMemoryNetwork;

`Transformer` stands for the model from 'Attention Is All You Need'.

Usage:
-------------------------------------------------------------------------------------------------------
1) model is in `xxx_model.py`
2) run python `xxx_train.py` to train the model
3) run python `xxx_predict.py` to do inference(test).

Each model has a test method under the model class. You can run the test method first to check whether the model works properly.

-------------------------------------------------------------------------

Environment:
-------------------------------------------------------------------------------------------------------
python 2.7+ tensorflow 1.8 

(tensorflow 1.1 to 1.13 should also work; most of the models should also work fine in other tensorflow versions, since we use very few features bound to a certain version.)

If you use python3, it will be fine as long as you change the print/try-catch statements in case you meet any error.

The TextCNN model has already been converted to python 3.6.


Sample data: <a href='https://pan.baidu.com/s/1yWZf2eAPxq15-r2hHk2M-Q'>cached file of baidu</a> or <a href="https://drive.google.com/drive/folders/0AKEuT4gza2AlUk9PVA">Google Drive:</a>send me an email
-------------------------------------------------------------------------------------------------------
To help you run this repository, we have re-generated the training/validation/test data and vocabulary/labels, and saved them as cache files using h5py. We suggest you download them from the link above.

It contains everything you need to run this repository: the data is pre-processed, so you can start training a model in a minute.

It's a zip file of about 1.8G that contains 3 million training examples. Although it is quite big after unzipping, with the help of hdf5 it only needs a normal amount of memory (e.g. 8G or less) during training.

We use a jupyter notebook, <a href='https://github.com/brightmart/text_classification/blob/master/pre-processing.ipynb'>pre-processing.ipynb</a>, to pre-process the data. You can get a better understanding of this task and the data by taking a look at it. You can also generate data yourself in the way you want by changing a few lines of code in this jupyter notebook.

If you want to try a model now, you can download the cached file from above, then go to folder 'a02_TextCNN' and run 
        
     python  p7_TextCNN_train.py 
   
It will use data from the cached files to train the model, and print loss and F1 score periodically.

Old sample data source:
if you need some sample data and word embeddings pre-trained with word2vec, you can find them in closed issues, such as: <a href="https://github.com/brightmart/text_classification/issues/3">issue 3</a>. 

You can also find some sample data in the folder "data". It contains two files: 'sample_single_label.txt', with 50k examples with a single label, and 'sample_multiple_label.txt', with 20k examples with multiple labels. Input and label are separated by "   __label__".

If you want to know more about the text classification data set or the tasks these models can be used for, one option is below:

https://biendata.com/competition/zhihu/

Road Map
-------------------------------------------------------------------------------------------------------
One way you can use this repository:
 
step 1: read through this article. You will get a general idea of the various classic models used for text classification.

step 2: pre-process the data and/or download the cached file.

      a. take a look at the jupyter notebook ('pre-processing.ipynb'), where you can get familiar with this text classification task and data set. You will also see how we pre-process the data and generate the training/validation/test sets. There is a list of things you can try at the end of the notebook.

      b. download the zip file that contains the cached files, so you will have all the necessary data and can start to train models.

step 3: run some of the models listed here, and change the code and configuration as you want to get good performance.

      Record the performance, the things you did that worked, and the things that did not.

      For example, you can explore in this sequence: 
      
      1) fasttext---> 2)TextCNN---> 3)Transformer---> 4)BERT

Additionally, write your own article about this topic; you can follow the style of a paper. You may need to read some papers on the way, many of which are listed in the # Reference section at the end of this article. Or join a machine learning competition and apply what you've learned.
       
       
Use Your Own Data:
-------------------------------------------------------------------------------------------------------
Replace the data in 'data/sample_multiple_label.txt', and make sure the format is as below:

'word1 word2 word3 __label__l1 __label__l2 __label__l3'
 
where part 1, 'word1 word2 word3', is the input (X), and part 2, '__label__l1 __label__l2 __label__l3', represents three labels: [l1,l2,l3]. Between part 1 and part 2 there should be a single space: ' '.

For example, each line (with multiple labels) looks like: 

'w5466 w138990 w1638 w4301 w6 w470 w202 c1834 c1400 c134 c57 c73 c699 c317 c184 __label__5626661657638885119 __label__4921793805334628695 __label__8904735555009151318'

where '5626661657638885119', '4921793805334628695' and '8904735555009151318' are the three labels associated with the input string 'w5466 w138990...c699 c317 c184'.
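
As a rough illustration of this format, the sketch below splits one line into input tokens and labels. It is a minimal standalone example, not the repository's `load_data_multilabel()`; the function name `parse_labeled_line` is hypothetical.

```python
def parse_labeled_line(line, label_prefix="__label__"):
    """Split one whitespace-separated line into (input tokens, label strings)."""
    tokens, labels = [], []
    for item in line.strip().split():
        if item.startswith(label_prefix):
            labels.append(item[len(label_prefix):])  # drop the '__label__' prefix
        else:
            tokens.append(item)
    return tokens, labels

line = "w5466 w138990 w1638 __label__5626661657638885119 __label__4921793805334628695"
tokens, labels = parse_labeled_line(line)
```

Because the split is on any whitespace, the same helper would also accept lines where more than one space separates the input from the first label.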

Notice:


Some util functions are in data_util.py; check load_data_multilabel() of data_util to see how inputs and labels are processed from raw data.

There is a function to load pretrained word embeddings and assign them to the model, where the word embeddings are pretrained with word2vec or fastText. 

Pretrained Word Embedding:
-------------------------------------------------------------------------------------------------------
If word2vec.load does not work, you can load pretrained word embeddings (especially Chinese word embeddings) with the following lines:

    import gensim
    from gensim.models import KeyedVectors

    word2vec_model = KeyedVectors.load_word2vec_format(word2vec_model_path, binary=True, unicode_errors='ignore')

Or you can set the flag for using pretrained word embeddings to false to disable loading word embeddings.


Models Detail:
-------------------------------------------------------------------------

1.fastText:  
-------------
Implementation of <a href="https://arxiv.org/abs/1607.01759">Bag of Tricks for Efficient Text Classification</a>

After embedding each word in the sentence, the word representations are averaged into a text representation, which is in turn fed to a linear classifier. It uses the softmax function to compute the probability distribution over the predefined classes, and cross entropy to compute the loss. A bag-of-words representation does not consider word order, so n-gram features are used to capture some partial information about the local word order. When the number of classes is large, computing the linear classifier is computationally expensive, so the paper uses hierarchical softmax to speed up training. In this implementation we:
1) use bi-grams and/or tri-grams
2) use NCE loss to speed up the softmax computation (instead of the hierarchical softmax of the original paper)

Result: performance is as good as in the paper, and it is also very fast.

Check: p5_fastTextB_model.py
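
To make the "average the embeddings, then classify linearly" idea concrete, here is a small NumPy sketch. It is an assumption-laden illustration and not the repository's model: it uses a plain full softmax (no NCE loss or hierarchical softmax), random weights, and hypothetical helper names `fasttext_forward` and `add_bigrams`.

```python
import numpy as np

def fasttext_forward(token_ids, embedding, W, b):
    """Average the token embeddings, apply a linear classifier, then softmax."""
    text_vec = embedding[token_ids].mean(axis=0)  # [embed_size]
    logits = text_vec @ W + b                     # [num_classes]
    exp = np.exp(logits - logits.max())           # numerically stable softmax
    return exp / exp.sum()

def add_bigrams(tokens):
    """Append bi-gram features so that some local word order is kept."""
    return tokens + [a + "_" + b for a, b in zip(tokens, tokens[1:])]

rng = np.random.default_rng(0)
vocab_size, embed_size, num_classes = 50, 8, 4
embedding = rng.normal(size=(vocab_size, embed_size))
W = rng.normal(size=(embed_size, num_classes))
b = np.zeros(num_classes)

probs = fasttext_forward(np.array([3, 17, 42]), embedding, W, b)
shuffled = fasttext_forward(np.array([42, 3, 17]), embedding, W, b)
```

`probs` and `shuffled` are the same, which shows why a pure bag of words ignores word order; features such as those from `add_bigrams` are what bring part of the order back.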

![alt text](https://github.com/brightmart/text_classification/blob/master/images/fastText.JPG)
-------------------------------------------------------------------------

2.TextCNN:
-------------
Implementation of <a href="http://www.aclweb.org/anthology/D14-1181"> Convolutional Neural Networks for Sentence Classification </a>

Structure:embedding--->conv--->max pooling--->fully connected layer-------->softmax

Check: p7_TextCNN_model.py

To get very good results with TextCNN, you should also carefully read this paper: <a href="https://arxiv.org/abs/1510.03820">A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification</a>. It gives you some insight into the things that can affect performance, although you will need to change some settings according to your specific task.

The convolutional neural network is the main building block for solving computer vision problems. Here we show how a CNN can be used for NLP, in particular for text classification. Sentence length varies from one sentence to another, so we pad to a fixed length, n. For each token in the sentence, we use a word embedding to get a fixed-dimension vector of size d. So our input is a 2-dimensional matrix: (n,d). This is similar to an image for a CNN. 

Firstly, we apply a convolution to our input: an element-wise multiplication between a filter and a part of the input, followed by a sum. We use k filters, each of which is a 2-dimensional matrix (f,d). The output is then k lists, each of length n-f+1, where each element is a scalar. Notice that the second dimension of a filter is always the dimension of the word embedding. We use filters of different sizes to get rich features from the text input, which is somewhat similar to n-gram features. 

Secondly, we apply max pooling to the output of the convolution. For k lists, we get k scalars. 

Thirdly, we concatenate the scalars to form the final features. This is a fixed-size vector, and its size is independent of the filter sizes we use.

Finally, we use a linear layer to project these features onto the pre-defined labels.
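
The four steps above can be traced with a short NumPy sketch. It is a simplified illustration rather than the model in this repository: there is no bias, activation or dropout, the weights are random, and the name `textcnn_features` is hypothetical.

```python
import numpy as np

def textcnn_features(x, filter_banks):
    """x: [n, d] padded sentence matrix; filter_banks: list of arrays shaped [k, f, d]."""
    n, d = x.shape
    pooled = []
    for bank in filter_banks:
        k, f, _ = bank.shape
        # every window of f consecutive tokens: [n - f + 1, f, d]
        windows = np.stack([x[i:i + f] for i in range(n - f + 1)])
        # element-wise multiply and sum over (f, d): k lists of length n - f + 1
        conv = np.einsum("pfd,kfd->kp", windows, bank)
        # max pooling: one scalar per filter
        pooled.append(conv.max(axis=1))
    return np.concatenate(pooled)  # fixed size: k * number of filter sizes

rng = np.random.default_rng(0)
n, d, k, num_labels = 10, 5, 6, 3
x = rng.normal(size=(n, d))
filter_banks = [rng.normal(size=(k, f, d)) for f in (2, 3, 4)]

features = textcnn_features(x, filter_banks)
W_proj = rng.normal(size=(features.shape[0], num_labels))
logits = features @ W_proj  # linear projection onto the labels
```

With 6 filters for each of the 3 filter sizes, the feature vector has 18 entries whatever the sentence length n is, which is what lets a single linear layer sit on top.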

![alt text](https://github.com/brightmart/text_classification/blob/master/images/TextCNN.JPG)

-------------------------------------------------------------------------


3.BERT: 
-------------------------------------------------------------------------
#### Pre-training of Deep Bidirectional Transformers for Language Understanding 

BERT currently achieves state-of-the-art results on more than 10 NLP tasks. The key idea behind this model is that we can pre-train it with a kind of language model on a huge amount of raw data, which is easy to find.

As most of the model's parameters are pre-trained, only the last layer for the classifier needs to be adapted for different tasks.

As a result, this model is generic and very powerful. You can just fine-tune the pre-trained model within a short period of time.

However, this model is quite big. With sequence length 128, you may only be able to train with a batch size of 32; for long documents, such as sequence length 512, you can only train with a batch size of 4 on a normal GPU (with 11G). Very few people can pre-train this model from scratch, as it takes many days or weeks to train, and a normal GPU's memory is too small for this model.

Specifically, the backbone model is the Transformer, which you can find in Attention Is All You Need. It uses two kinds of tasks to pre-train the model.

#### Masked Language Model
Generally speaking, given a sentence, some percentage of the words are masked, and you need to predict the masked words based on this masked sentence. The masked words are chosen randomly.

We feed the input through a deep Transformer encoder and then use the final hidden states corresponding to the masked positions to predict what word was masked, exactly like we would train a language model.

    source_file: each line is a sequence of tokens, which can be a sentence.
    
    Input Sequence  : The man went to [MASK] store with [MASK] dog
    Target Sequence :                  the                his
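
A minimal sketch of how such masked input/target pairs could be generated is shown below. It is a simplification: the function name `mask_tokens` is hypothetical, the 15% default follows the BERT paper, and the paper's extra step of sometimes keeping or randomly replacing a selected token is left out.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=None):
    """Randomly mask tokens.

    Returns (masked_tokens, targets), where targets[i] is the original token
    at a masked position and None everywhere else.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

tokens = "the man went to the store with his dog".split()
masked, targets = mask_tokens(tokens, mask_prob=0.5, seed=0)
```

Only the positions where `targets` is not None contribute to the loss; the model sees `masked` as its input.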
         

#### Next Sentence Prediction
Many language understanding tasks, like question answering and inference, need an understanding of the relationship between sentences. However, a language model only learns to understand within a sentence. Next sentence prediction is a simple task that helps the model do better in these kinds of tasks.

With a 50% chance the second sentence is the next sentence of the first one, and with a 50% chance it is not.

Given two sentences, the model is asked to predict whether the second sentence is the real next sentence of the first one.
  
    Input : [CLS] the man went to the store [SEP] he bought a gallon of milk [SEP]
    Label : IsNext

    Input = [CLS] the man heading to the store [SEP] penguin [MASK] are flight ##less birds [SEP]
    Label = NotNext
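
The 50/50 sampling can be sketched as follows. This is an illustrative simplification: negatives are drawn from the same small list of sentences rather than from a large corpus, and the names `make_nsp_pairs` and `format_pair` are hypothetical.

```python
import random

def make_nsp_pairs(sentences, seed=None):
    """Build (first, second, label) triples; needs at least 3 distinct sentences."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # positive example: the real next sentence
            pairs.append((sentences[i], sentences[i + 1], "IsNext"))
        else:
            # negative example: any sentence except the current and the real next one
            candidates = [j for j in range(len(sentences)) if j not in (i, i + 1)]
            pairs.append((sentences[i], sentences[rng.choice(candidates)], "NotNext"))
    return pairs

def format_pair(first, second):
    """Join two sentences with the special tokens used in the examples above."""
    return "[CLS] " + first + " [SEP] " + second + " [SEP]"

sentences = [
    "the man went to the store",
    "he bought a gallon of milk",
    "penguins are flightless birds",
    "they live in the south",
]
pairs = make_nsp_pairs(sentences, seed=0)
example_input = format_pair(sentences[0], sentences[1])
```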
    
<img src="https://github.com/brightmart/text_classification/blob/master/images/bert_1.jpeg"  width="65%" height="65%" />

<img src="https://github.com/brightmart/text_classification/blob/master/images/bert_2.jpeg"  width="65%" height="65%" />


#### How to use BERT?

Basically, you can download a pre-trained model and just fine-tune it on your task with your own data.

For a classification task, you can add a processor to define the format in which inputs and labels are read from the source data.

#### Use BERT for multi-label classification?

run the following command under folder a00_Bert:
 
      python  train_bert_multi-label.py
   
It achieves 0.368 after 9 epochs.
Or you can run multi-label classification with downloadable data using BERT from 

<a href='https://github.com/brightmart/sentiment_analysis_fine_grain'>sentiment_analysis_fine_grain with BERT</a>
 
#### Use BERT for online prediction 

You can use the session-and-feed style to restore the model and feed data, then get logits to make an online prediction.

<a href='https://github.com/brightmart/sentiment_analysis_fine_grain'>online prediction with BERT</a>

Originally, it trains or evaluates the model based on files, not online.

#### How to get better model for BERT?

Firstly, you can use the pre-trained model downloaded from Google. Run a few epochs on your dataset, and find a suitable sequence length.

Secondly, you can pre-train the base model on your own data, as long as you can find a dataset that is related to your task, and then fine-tune on your specific task.

Thirdly, you can change the loss function and the last layer to better suit your task.

Additionally, you can define some pre-training tasks that will help the model understand your task much better.

From our experiments, we found that the pre-training task is independent of the model, and pre-training is not limited to the tasks above.

-------------------------------------------------------------------------


4.TextRNN
-------------
Structure v1:embedding--->bi-directional lstm--->concat output--->average----->softmax layer

check: p8_TextRNN_model.py

![alt text](https://github.com/brightmart/text_classification/blob/master/images/bi-directionalRNN.JPG)

Structure v2:embedding-->bi-directional lstm---->dropout-->concat output--->lstm--->dropout-->FC layer-->softmax layer

check: p8_TextRNN_model_multi_layers.py

![alt text](https://github.com/brightmart/text_classification/blob/master/images/emojifier-v2.png)


-------------------------------------------------------------------------


5.BiLstmTextRelation
-------------
The structure is the same as TextRNN, but the input is specially designed, e.g. input: "how much is the computer? EOS price of laptop", where 'EOS' is a special token that splits question1 and question2.

check:p9_BiLstmTextRelation_model.py


-------------------------------------------------------------------------


6.twoCNNTextRelation
-------------
Structure: first use two different convolutions to extract features from the two sentences, then concatenate the two features. Use a linear transform layer to project the output to the target labels, then apply softmax.

check: p9_twoCNNTextRelation_model.py


-------------------------------------------------------------------------


7.BiLstmTextRelationTwoRNN
-------------
Structure: one bi-directional lstm for one sentence (giving output1), and another bi-directional lstm for the other sentence (giving output2). Then:
softmax(output1*M*output2)

check:p9_BiLstmTextRelationTwoRNN_model.py

For more detail you can go to: <a href="http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow">Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow</a>
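
The expression softmax(output1*M*output2) can be read as a bilinear score between the two sentence encodings. The sketch below shows one possible reading, in which a single first sentence is scored against several candidate second sentences and the softmax is taken over the candidates. That normalisation, the function name `bilinear_match_probs` and the toy values are assumptions, not taken from the repository.

```python
import numpy as np

def bilinear_match_probs(output1, candidates, M):
    """Score output1^T M c for every candidate c, then softmax over the candidates.

    output1: [h], candidates: [num_candidates, h], M: [h, h]
    """
    scores = candidates @ (output1 @ M)  # [num_candidates]
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

output1 = np.array([1.0, 0.0])
candidates = np.array([[1.0, 0.0], [0.0, 1.0]])
M = np.eye(2)  # with the identity, the score is a plain dot product
match_probs = bilinear_match_probs(output1, candidates, M)
```

In a trained model M is learned, so it can map the encoding of one sentence into the space where it is compared with the other.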


-------------------------------------------------------------------------


8.RCNN:
-------------
Recurrent convolutional neural network for text classification

implementation of <a href="https://scholar.google.com.hk/scholar?q=Recurrent+Convolutional+Neural+Networks+for+Text+Classification&hl=zh-CN&as_sdt=0&as_vis=1&oi=scholart&sa=X&ved=0ahUKEwjpx82cvqTUAhWHspQKHUbDBDYQgQMIITAA"> Recurrent Convolutional Neural Network for Text Classification </a>
 
Structure: 1) recurrent structure (convolutional layer) 2) max pooling 3) fully connected layer + softmax

It learns a representation of each word in the sentence or document together with its left-side context and right-side context:

representation of current word = [left_side_context_vector, current_word_embedding, right_side_context_vector].

For the left-side context, it uses a recurrent structure: a non-linear transform of the previous word and the previous left-side context. The right-side context is computed similarly.

check: p71_TextRCNN_model.py
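
The left and right context recurrences can be sketched in NumPy as below. This is an illustration under assumptions: the boundary contexts start from zeros, tanh is used as the non-linearity, the weights are random, max pooling is applied directly to the concatenated representation, and the name `rcnn_word_representations` is hypothetical.

```python
import numpy as np

def rcnn_word_representations(E, W_l, W_sl, W_r, W_sr):
    """E: [n, d] word embeddings. Returns [n, c + d + c] word representations."""
    n, d = E.shape
    c = W_l.shape[0]
    left = np.zeros((n, c))
    right = np.zeros((n, c))
    # left context of word i depends on the previous word and its left context
    for i in range(1, n):
        left[i] = np.tanh(W_l @ left[i - 1] + W_sl @ E[i - 1])
    # right context of word i depends on the next word and its right context
    for i in range(n - 2, -1, -1):
        right[i] = np.tanh(W_r @ right[i + 1] + W_sr @ E[i + 1])
    return np.concatenate([left, E, right], axis=1)

rng = np.random.default_rng(0)
n, d, c = 5, 4, 3
E = rng.normal(size=(n, d))
W_l, W_r = rng.normal(size=(c, c)), rng.normal(size=(c, c))
W_sl, W_sr = rng.normal(size=(c, d)), rng.normal(size=(c, d))

reps = rcnn_word_representations(E, W_l, W_sl, W_r, W_sr)
pooled = reps.max(axis=0)  # max pooling over the words
```

`pooled` is the fixed-size vector that would go into the fully connected layer and softmax.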

![alt text](https://github.com/brightmart/text_classification/blob/master/images/RCNN.JPG)

-------------------------------------------------------------------------

9.Hierarchical Attention Network:
-------------
Implementation of <a href="https://www.cs.cmu.edu/~diyiy/docs/naacl16.pdf">Hierarchical Attention Networks for Document Classification</a>

Structure:

1) embedding 

2) Word Encoder: word level bi-directional GRU to get rich representation of words

3) Word Attention:word level attention to get important information in a sentence

4) Sentence Encoder: sentence level bi-directional GRU to get rich representation of sentences

5) Sentence Attention: sentence level attention to get important sentence among sentences

6) FC+Softmax

![alt text](https://github.com/brightmart/text_classification/blob/master/images/HAN.JPG)

In NLP, text classification can be done for a single sentence, but it can also be done for multiple sentences, which we may call document classification. Words form sentences, and sentences form a document, so there may be an intrinsic structure. How can we model this kind of task? Are all parts of a document equally relevant? And how do we determine which parts are more important than others?

It has two unique features: 

1) it has a hierarchical structure that reflects the hierarchical structure of documents; 

2) it has two levels of attention mechanisms, applied at the word level and the sentence level. This enables the model to capture important information at different levels.

Word Encoder:
Each word in a sentence is embedded into a word vector. A bidirectional GRU is used to encode the sentence. By concatenating the vectors from the two directions, it forms a representation of each word that also captures contextual information.

Word Attention:
Some words are more important than others for the sentence, so an attention mechanism is used. It first uses a one-layer MLP to get u_it, a hidden representation of each word, then measures the importance of the word as the similarity of u_it with a word-level context vector u_w, and gets a normalized importance through a softmax function. 

Sentence Encoder: 
For sentence vectors, a bidirectional GRU is used to encode them, similarly to the word encoder.

Sentence Attention: 
A sentence-level context vector is used to measure importance among sentences, similarly to word attention.
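
The attention step used at both levels (one-layer MLP, similarity with a context vector, softmax, weighted sum) can be sketched as follows. It is a minimal NumPy illustration with random weights; the function name `attention_pool` and the variable names are hypothetical, not those of the repository's model.

```python
import numpy as np

def attention_pool(H, W, b, u_context):
    """H: [T, h] encoder outputs (words or sentences).

    Returns (pooled vector [h], attention weights [T]).
    """
    U = np.tanh(H @ W + b)          # one-layer MLP: hidden representation u_it, [T, a]
    scores = U @ u_context          # similarity with the context vector, [T]
    exp = np.exp(scores - scores.max())
    alpha = exp / exp.sum()         # normalized importance via softmax
    return alpha @ H, alpha         # weighted sum of the encoder outputs

rng = np.random.default_rng(0)
T, h, a = 5, 4, 3
H = rng.normal(size=(T, h))
W, b = rng.normal(size=(h, a)), np.zeros(a)
u_w = rng.normal(size=a)

sentence_vec, alpha = attention_pool(H, W, b, u_w)
uniform_vec, uniform_alpha = attention_pool(H, W, b, np.zeros(a))
```

With a zero context vector every position gets the same weight and the result is a plain average; a learned context vector is what lets some words or sentences count more than others.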

Input of data: 

Generally speaking, the input of this model should have several sentences instead of a single sentence. The shape is [None,sentence_length], where None means the batch_size.

In my training data, each example has four parts, and each part has the same length. I concatenate the four parts to form one single sentence. The model will split the sentence into four parts, to form a tensor with shape [None,num_sentence,sentence_length], where num_sentence is the number of sentences (equal to 4 in my setting).
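
The split from one long sequence into equal-length sentences amounts to a reshape. The sketch below uses NumPy on toy token ids; the helper name `split_into_sentences` is hypothetical and the sizes are made up for the example.

```python
import numpy as np

def split_into_sentences(x, num_sentence):
    """x: [batch_size, num_sentence * sentence_length] -> [batch_size, num_sentence, sentence_length]."""
    batch_size, total_length = x.shape
    if total_length % num_sentence != 0:
        raise ValueError("total length must be divisible by num_sentence")
    return x.reshape(batch_size, num_sentence, total_length // num_sentence)

# 2 examples, each a concatenation of 4 parts of length 3
batch = np.arange(2 * 12).reshape(2, 12)
doc = split_into_sentences(batch, num_sentence=4)
```

Here `doc[0, 1]` holds tokens 3 to 5 of the first example, i.e. its second part.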

check:p1_HierarchicalAttention_model.py

for attentive attention you can check <a href='https://github.com/brightmart/text_classification/issues/55'>attentive attention</a>

-------------------------------------------------------------------------

10.Seq2seq with attention
-------------
An implementation of seq2seq with attention, derived from <a href="https://arxiv.org/pdf/1409.0473.pdf">NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE</a>

I.Structure:

1) embedding 2) bi-GRU to get a rich representation of the source sentence (forward & backward) 3) decoder with attention.

![alt text](https://github.com/brightmart/text_classification/blob/master/images/seq2seqAttention.JPG)

II.Input of data:

There are three kinds of inputs: 1) encoder inputs, which is a sentence; 2) decoder inputs, which is a list of labels with fixed length; 3) target labels, which is also a list of labels.

For example, if the labels are "L1 L2 L3 L4", then the decoder inputs will be [_GO,L1,L2,L3,L4,_PAD] and the target labels will be [L1,L2,L3,L4,_END,_PAD]. The length is fixed to 6; any excess labels will be truncated, and padding is added if there are not enough labels to fill the list.
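
A minimal sketch of how such fixed-length sequences could be built, assuming each label is copied once, in order, and that excess labels are dropped from the end. The helper name `build_decoder_sequences` and the exact truncation policy are assumptions, not the repository's code.

```python
GO, END, PAD = "_GO", "_END", "_PAD"


def build_decoder_sequences(labels, length=6):
    """Return (decoder_inputs, target_labels), each with exactly `length` items."""
    # Leave room for one special token (_GO or _END); drop any excess labels.
    kept = labels[:length - 1]
    decoder_inputs = [GO] + kept
    target_labels = kept + [END]
    # Pad on the right when there are too few labels.
    pad = [PAD] * (length - len(decoder_inputs))
    return decoder_inputs + pad, target_labels + pad


decoder_inputs, target_labels = build_decoder_sequences(["L1", "L2", "L3", "L4"])
```

The target sequence is the decoder input shifted left by one position, so at each step the decoder is trained to predict the next label.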

III.Attention Mechanism:

1) take the list of encoder inputs and the hidden state of the decoder

2) calculate the similarity of the hidden state with each encoder input, to get a probability distribution over the encoder inputs.

3) compute the weighted sum of the encoder inputs based on the probability distribution.

   go through the RNN cell using this weighted sum together with the decoder input to get the new hidden state
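
The first three steps can be sketched as follows. This sketch assumes a plain dot product as the similarity function, which is a simplification: the referenced paper scores with a small feed-forward network. The function name and toy values are hypothetical.

```python
import math


def attention_context(decoder_hidden, encoder_outputs):
    """Return (weights, context) for one decoder step."""
    # Step 2: similarity of the decoder hidden state with each encoder input
    # (dot product is an assumption of this sketch).
    scores = [sum(h * e for h, e in zip(decoder_hidden, enc))
              for enc in encoder_outputs]
    # Softmax turns the scores into a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Step 3: weighted sum of the encoder inputs.
    dim = len(encoder_outputs[0])
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_outputs))
               for k in range(dim)]
    return weights, context


encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_hidden = [2.0, 0.0]
weights, context = attention_context(decoder_hidden, encoder_outputs)
```

The resulting `context` vector is what gets fed to the RNN cell together with the decoder input.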

IV.How Vanilla Encoder Decoder Works:

The source sentence is encoded by an RNN into a fixed-size vector ("thought vector"). Then, during decoding:

1) at training time, another RNN tries to produce a word using this "thought vector" as its initial state, taking its input from the decoder input at each timestep. The decoder starts from the special token "_GO". 
After one step is performed, a new hidden state is obtained, and together with the new input we continue this process until we reach the special token "_END". 
We calculate the loss as the cross entropy between the logits and the target labels. The logits are obtained by applying a projection layer to the output of each decoder step (in a GRU we can just use the decoder's hidden states as the output).

2) at test time, there are no labels, so we feed the output from the previous timestep as the next input, and continue the process until we reach the "_END" token.
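
The test-time loop in 2) can be sketched as a greedy decoder. Here `step_fn` is a hypothetical stand-in for one decoder step (RNN cell, projection layer and argmax); the toy version below just replays a fixed script, so only the control flow is meaningful.

```python
def greedy_decode(step_fn, init_state, max_steps=6, go="_GO", end="_END"):
    """Feed each predicted token back in as the next input until _END."""
    state, token, output = init_state, go, []
    for _ in range(max_steps):
        state, token = step_fn(state, token)
        if token == end:
            break
        output.append(token)
    return output


def toy_step(state, token):
    """Hypothetical decoder step: returns (new_state, predicted_token)."""
    script = ["L1", "L2", "_END"]
    return state + 1, script[state]


predicted = greedy_decode(toy_step, init_state=0)
```

The `max_steps` bound guards against a decoder that never emits "_END".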

V.Notices:

1) here I use two kinds of vocabularies: one for words, used by the encoder; another for labels, used by the decoder

2) for the vocabulary of labels, I insert three special tokens: "_GO", "_END", "_PAD"; "_UNK" is not used, since all labels are pre-defined.

-------------------------------------------------------------------------

11.Transformer("Attention Is All You Need")
-------------
Status: it is able to do the classification task, and it is able to generate the reverse order of its input sequence in a toy task. You can check this by running the test function in the model. Check: a2_train_classification.py (train) or a2_transformer_classification.py (model)

We do it in a parallel style. Layer normalization, residual connections, and masking are also used in the model. 

For every building block, we include a test function in each file below, and we have tested each small piece successfully.

Sequence to sequence with attention is a typical model for sequence generation problems, such as translation and dialogue systems. Most of the time, it uses RNNs as the building block for these tasks. More recently, people have also applied convolutional neural networks to sequence to sequence problems. The Transformer, however, performs these tasks relying solely on attention mechanisms. It is fast and achieved new state-of-the-art results.

![alt text](https://github.com/brightmart/text_classification/blob/master/images/attention_is_all_you_need.JPG)

It also has two main parts: encoder and decoder. Below is the description from the paper:

Encoder:

6 layers. Each layer has two sub-layers.
The first is a multi-head self-attention mechanism;
the second is a position-wise fully connected feed-forward network.
Each sub-layer uses LayerNorm(x+Sublayer(x)). All sub-layers produce outputs of dimension 512.
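
As an illustration of LayerNorm(x+Sublayer(x)) on a single position vector, here is a small sketch. It omits the learned gain and bias of layer normalization, and `toy_feed_forward` is a hypothetical stand-in for a real sub-layer.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer_connection(x, sublayer):
    """Residual connection followed by layer normalization."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_feed_forward(x):
    """Hypothetical sub-layer used only for this example."""
    return [max(0.0, 2.0 * v) for v in x]


out = sublayer_connection([1.0, -1.0, 0.5, 2.0], toy_feed_forward)
```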

Decoder:

1. The decoder is composed of a stack of N= 6 identical layers.
2. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head
attention over the output of the encoder stack.
3. Similar to the encoder, we employ residual connections
around each of the sub-layers, followed by layer normalization. We also modify the self-attention
sub-layer in the decoder stack to prevent positions from attending to subsequent positions.  This
masking, combined with fact that the output embeddings are offset by one position, ensures that the
predictions for position i can depend only on the known outputs at positions less than i.

Main takeaways from this model:

1) multi-head self-attention: apply several different linear transforms to get projections of the queries, keys and values, then do ordinary attention on each projection; 2) some tricks to improve performance (residual connections, positional encoding, position-wise feed-forward, label smoothing, masking to ignore things we want to ignore).
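
A minimal pure-Python sketch of multi-head self-attention, assuming scaled dot-product attention as in the paper. The helper names, the toy projection matrices, and the list-based matrix code are illustrative only and are unrelated to the repository's TensorFlow implementation.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def multi_head_self_attention(x, heads, w_o):
    """x: [seq_len, d_model]; heads: list of (w_q, w_k, w_v) projection triples."""
    head_outputs = [
        scaled_dot_product_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        for (w_q, w_k, w_v) in heads
    ]
    # Concatenate the heads along the feature axis, then project back to d_model.
    concat = [sum((head[t] for head in head_outputs), [])
              for t in range(len(x))]
    return matmul(concat, w_o)


# Toy setup: 3 positions, d_model = 2, two heads of size 1 (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
first = [[1.0], [0.0]]   # projection that picks feature 0
second = [[0.0], [1.0]]  # projection that picks feature 1
heads = [(first, first, first), (second, second, second)]
w_o = [[1.0, 0.0], [0.0, 1.0]]
out = multi_head_self_attention(x, heads, w_o)
```

Because queries, keys and values are all projections of the same `x`, every position attends over every position of its own sequence, and each head can do so in a different subspace.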

Use this model for the classification task:

Here we only use the encoder part for classification, remove the residual connections, and use only 1 layer. There is no need to use a mask. We use multi-head attention and the position-wise feed-forward network to extract features of the input sentence, then use a linear layer to project them to logits.

For details of the model, please check: a2_transformer_classification.py

-------------------------------------------------------------------------

12.Recurrent Entity Network
-------------------------------------------------------------------------
Input: 1. story: multiple sentences, used as context. 2. query: a sentence, which is a question. 3. answer: a single label.

Model Structure:

1) Input encoding: use bag of words to encode the story (context) and query (question); take position into account by using a position mask

   by using a bi-directional rnn to encode the story and query instead, performance improves from 0.392 to 0.398, an increase of 1.5%.

2) Dynamic memory: 

a. compute the gate by using the 'similarity' of the keys and values with the input of the story. 

b. get the candidate hidden state by transforming each key, value and the input.

c. combine the gate and the candidate hidden state to update the current hidden state.

3) Output module (uses an attention mechanism):
a. get a probability distribution by computing the 'similarity' of the query and the hidden states

b. get the weighted sum of the hidden states using the probability distribution.

c. apply a non-linear transform to the query and hidden state to get the predicted label.
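
The dynamic memory update in step 2 can be sketched for one memory block as below. The update rule follows the formulation in the Recurrent Entity Network paper (gate, candidate, gated update, normalization), with tanh assumed as the non-linearity; the function names and toy values are hypothetical and are not taken from a3_entity_network.py.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_memory_block(h, w, s, U, V, W):
    """Update one memory block.

    h: current hidden state (value) of the block
    w: key of the block
    s: encoded input sentence of the story
    U, V, W: square weight matrices, shared across blocks
    """
    # a. gate from the 'similarity' of the input with the value and the key
    gate = sigmoid(dot(s, h) + dot(s, w))
    # b. candidate hidden state from value, key and input (tanh is an assumption)
    candidate = [math.tanh(dot(U[i], h) + dot(V[i], w) + dot(W[i], s))
                 for i in range(len(h))]
    # c. gated update of the hidden state, followed by normalization
    updated = [h_i + gate * c_i for h_i, c_i in zip(h, candidate)]
    norm = math.sqrt(dot(updated, updated)) or 1.0
    return [u / norm for u in updated]


# Toy setup with two blocks (hypothetical values).
keys = [[1.0, 0.0], [0.0, 1.0]]
memories = [[0.1, 0.2], [0.3, -0.1]]
s = [0.5, 0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]
new_memories = [update_memory_block(h, w, s, identity, identity, identity)
                for h, w in zip(memories, keys)]
```

Each block is updated from its own key and value only, which is why the blocks can be processed in parallel.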

![alt text](https://github.com/brightmart/text_classification/blob/master/images/EntityNet.JPG)

Main takeaways from this model:

1) it uses blocks of keys and values, which are independent of each other, so they can be run in parallel.

2) it models context and question together: it uses memory to track the state of the world, and uses a non-linear transform of the hidden state and question (query) to make a prediction.

3) a simple model can also achieve very good performance, even with an encoding as simple as bag of words.

For details of the model, please check: a3_entity_network.py

This model has a test function, which asks the model to count numbers in both the story (context) and the query (question), but the weight of the story is smaller than that of the query.

-------------------------------------------------------------------------

13.Dynamic Memory Network
-------------------------------------------------------------------------
Overview of the model:

1.Input Module: encode raw texts into a vector representation

2.Question Module: encode the question into a vector representation

3.Episodic Memory Module: given the inputs, it chooses which parts of the inputs to focus on through the attention mechanism, taking into account the question and the previous memory ====> it produces a 'memory' vector.

4.Answer Module: generate an answer from the final memory vector.

![alt text](https://github.com/brightmart/text_classification/blob/master/images/DMN.JPG)

Detail:

1.Input Module:

  a.single sentence: use a gru to get the hidden state
  b.list of sentences: use a gru to get the hidden state for each sentence. e.g. [hidden state 1,hidden state 2,...,hidden state n]
  
2.Question Module:
  use a gru to get the hidden state
  
3.Episodic Memory Module:

  use an attention mechanism and a recurrent network to update its memory. 
     
  a. gate as attention mechanism:
  
     a two-layer feed-forward neural network. the input is the candidate fact c, the previous memory m and the question q. the features are obtained by taking the element-wise product, the matmul, and the absolute distance of q with c, and of q with m.
     
  b.memory update mechanism: given the candidate sentence, the gate and the previous hidden state, it uses a gated gru to update the hidden state, like: h=f(c,h_previous,g). the final hidden state is the input for the answer module.
  
  c.need for multiple episodes===>transitive inference. 
  
  e.g. when asked "where is the football?", it will first attend to the sentence "john put down the football"; then, in a second pass, it needs to attend to the location of john.

4.Answer Module:
given the final episodic memory and the question, it updates the hidden state of the answer module.
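
The memory update h=f(c,h_previous,g) from the Episodic Memory Module can be sketched as a gate that interpolates between a GRU step and the previous state, which is the form used in the Dynamic Memory Network paper. `toy_gru_step` below is a hypothetical stand-in for a real GRU cell, so only the gating logic is meaningful.

```python
def gated_update(candidate_fact, h_previous, gate, gru_step):
    """h = g * GRU(c, h_previous) + (1 - g) * h_previous"""
    h_candidate = gru_step(candidate_fact, h_previous)
    return [gate * hc + (1.0 - gate) * hp
            for hc, hp in zip(h_candidate, h_previous)]


def run_episode(facts, gates, h_init, gru_step):
    """Read all facts once; the final hidden state summarizes the episode."""
    h = h_init
    for c, g in zip(facts, gates):
        h = gated_update(c, h, g, gru_step)
    return h


def toy_gru_step(c, h):
    """Hypothetical stand-in for a GRU cell: average of input and state."""
    return [0.5 * (ci + hi) for ci, hi in zip(c, h)]


facts = [[1.0, 0.0], [0.0, 1.0]]
gates = [0.0, 1.0]  # attend only to the second fact
episode = run_episode(facts, gates, [0.0, 0.0], toy_gru_step)
```

With a gate of 0 a fact leaves the state untouched, and with a gate of 1 the state is fully replaced by the GRU output, so the gates act as attention over the facts.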


TODO 
-------------------------------------------------------------------------------------------------------
1.Character-level Convolutional Networks for Text Classification

2.Convolutional Neural Networks for Text Categorization:Shallow Word-level vs. Deep Character-level

3.Very Deep Convolutional Networks for Text Classification

4.Adversarial Training Methods For Semi-supervised Text Classification

5.Ensemble Models


Conclusion:
-------------------------------------------------------------------------
While doing large-scale multi-label classification, several lessons were learned. Some are listed below:

1) What is the most important thing for reaching high accuracy? 
It depends on the task you are doing. From the task we conducted here, we believe that ensembling models trained on multiple features, including words and characters for title and description, can help reach very high accuracy. However, in some cases, as AlphaGo Zero demonstrated, the algorithm is more important than data or computational power; in fact, AlphaGo Zero did not use any human data. 

2) Is there a ceiling for any specific model or algorithm?
The answer is yes. Many different models were used here, and we found that many of them have similar performance, even though they are quite different in structure. To some extent, the difference in performance is not so big.

3) Is a case study of errors useful?
I think it is quite useful, especially when you have tried many different things but reached a limit. For example, by doing a case study, you can find the labels for which the models make correct predictions and the labels where they make mistakes, and then try to improve performance by increasing the weights of the wrongly predicted labels or by finding potential errors in the data.

4) How can we become an expert in a specific area of machine learning?
In my opinion, joining a machine learning competition or starting a task with lots of data, then reading papers and implementing some of them, is a good starting point. This gives us real experience and ideas for handling a specific task, and shows us its challenges.
But what's more important is that we should not only follow ideas from papers, but also explore new ideas that we think may help solve the problem. For example, by changing the structure of classic models, or even inventing new structures, we may be able to tackle the problem in a much better way, since the result may be more suitable for the task at hand.

Reference:
-------------------------------------------------------------------------------------------------------
1.Bag of Tricks for Efficient Text Classification

2.Convolutional Neural Networks for Sentence Classification

3.A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

4.Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow, from www.wildml.com

5.Recurrent Convolutional Neural Network for Text Classification

6.Hierarchical Attention Networks for Document Classification

7.Neural Machine Translation by Jointly Learning to Align and Translate

8.Attention Is All You Need

9.Ask Me Anything:Dynamic Memory Networks for Natural Language Processing

10.Tracking the World State with Recurrent Entity Networks

11.Ensemble Selection from Libraries of Models

12.<a href='https://arxiv.org/abs/1810.04805'>BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding</a>

13.<a href='https://github.com/google-research/bert'>google-research/bert</a>

-------------------------------------------------------------------------

To be continued. For any problem, contact brightmart@hotmail.com


================================================
FILE: a00_Bert/README_bert.md
================================================

1. train bert for multi-label classification:

    python train_bert_multi-label.py
    
2. to run bert without real data: a toy task to test bert

    python train_bert_toy_task.py

================================================
FILE: a00_Bert/__init__.py
================================================


================================================
FILE: a00_Bert/bert_modeling.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The main BERT model and related functions."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import copy
import json
import math
import re
import six
import tensorflow as tf


class BertConfig(object):
  """Configuration for `BertModel`."""

  def __init__(self,
               vocab_size,
               hidden_size=768,
               num_hidden_layers=12,
               num_attention_heads=12,
               intermediate_size=3072,
               hidden_act="gelu",
               hidden_dropout_prob=0.1,
               attention_probs_dropout_prob=0.1,
               max_position_embeddings=512,
               type_vocab_size=16,
               initializer_range=0.02):
    """Constructs BertConfig.

    Args:
      vocab_size: Vocabulary size of `inputs_ids` in `BertModel`.
      hidden_size: Size of the encoder layers and the pooler layer.
      num_hidden_layers: Number of hidden layers in the Transformer encoder.
      num_attention_heads: Number of attention heads for each attention layer in
        the Transformer encoder.
      intermediate_size: The size of the "intermediate" (i.e., feed-forward)
        layer in the Transformer encoder.
      hidden_act: The non-linear activation function (function or string) in the
        encoder and pooler.
      hidden_dropout_prob: The dropout probability for all fully connected
        layers in the embeddings, encoder, and pooler.
      attention_probs_dropout_prob: The dropout ratio for the attention
        probabilities.
      max_position_embeddings: The maximum sequence length that this model might
        ever be used with. Typically set this to something large just in case
        (e.g., 512 or 1024 or 2048).
      type_vocab_size: The vocabulary size of the `token_type_ids` passed into
        `BertModel`.
      initializer_range: The stdev of the truncated_normal_initializer for
        initializing all weight matrices.
    """
    self.vocab_size = vocab_size
    self.hidden_size = hidden_size
    self.num_hidden_layers = num_hidden_layers
    self.num_attention_heads = num_attention_heads
    self.hidden_act = hidden_act
    self.intermediate_size = intermediate_size
    self.hidden_dropout_prob = hidden_dropout_prob
    self.attention_probs_dropout_prob = attention_probs_dropout_prob
    self.max_position_embeddings = max_position_embeddings
    self.type_vocab_size = type_vocab_size
    self.initializer_range = initializer_range

  @classmethod
  def from_dict(cls, json_object):
    """Constructs a `BertConfig` from a Python dictionary of parameters."""
    config = BertConfig(vocab_size=None)
    for (key, value) in six.iteritems(json_object):
      config.__dict__[key] = value
    return config

  @classmethod
  def from_json_file(cls, json_file):
    """Constructs a `BertConfig` from a json file of parameters."""
    with tf.gfile.GFile(json_file, "r") as reader:
      text = reader.read()
    return cls.from_dict(json.loads(text))

  def to_dict(self):
    """Serializes this instance to a Python dictionary."""
    output = copy.deepcopy(self.__dict__)
    return output

  def to_json_string(self):
    """Serializes this instance to a JSON string."""
    return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"


class BertModel(object):
  """BERT model ("Bidirectional Encoder Representations from Transformers").

  Example usage:

  ```python
  # Already been converted into WordPiece token ids
  input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])
  input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])
  token_type_ids = tf.constant([[0, 0, 1], [0, 2, 0]])

  config = modeling.BertConfig(vocab_size=32000, hidden_size=512,
    num_hidden_layers=8, num_attention_heads=6, intermediate_size=1024)

  model = modeling.BertModel(config=config, is_training=True,
    input_ids=input_ids, input_mask=input_mask, token_type_ids=token_type_ids)

  label_embeddings = tf.get_variable(...)
  pooled_output = model.get_pooled_output()
  logits = tf.matmul(pooled_output, label_embeddings)
  ...
  ```
  """

  def __init__(self,
               config,
               is_training,
               input_ids,
               input_mask=None,
               token_type_ids=None,
               use_one_hot_embeddings=True,
               scope=None):
    """Constructor for BertModel.

    Args:
      config: `BertConfig` instance.
      is_training: bool. true for training model, false for eval model. Controls
        whether dropout will be applied.
      input_ids: int32 Tensor of shape [batch_size, seq_length].
      input_mask: (optional) int32 Tensor of shape [batch_size, seq_length].
      token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].
      use_one_hot_embeddings: (optional) bool. Whether to use one-hot word
        embeddings or tf.embedding_lookup() for the word embeddings. On the TPU,
        it is much faster if this is True, on the CPU or GPU, it is faster if
        this is False.
      scope: (optional) variable scope. Defaults to "bert".

    Raises:
      ValueError: The config is invalid or one of the input tensor shapes
        is invalid.
    """
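    # change 11.24 is_training is changed from bool to placeholder.
    # NOTE: in graph mode tf.cond calls both branch functions once while the
    # graph is being built, so the Python side effects in not_apply_dropout()
    # run regardless of the runtime value of is_training; both dropout
    # probabilities in this copied config therefore end up as 0.0.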

    config = copy.deepcopy(config)
    # if not is_training:
    #     config.hidden_dropout_prob = 0.0
    #     config.attention_probs_dropout_prob = 0.0
    def not_apply_dropout():
        config.hidden_dropout_prob = 0.0
        config.attention_probs_dropout_prob = 0.0
        return 1

    def apply_dropout():
        return 1

    tf.cond(is_training,apply_dropout,not_apply_dropout)

    input_shape = get_shape_list(input_ids, expected_rank=2)
    batch_size = input_shape[0]
    seq_length = input_shape[1]

    if input_mask is None:
      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

    if token_type_ids is None:
      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

    with tf.variable_scope(scope, default_name="bert"):
      with tf.variable_scope("embeddings"):
        # Perform embedding lookup on the word ids.
        (self.embedding_output, self.embedding_table) = embedding_lookup(
            input_ids=input_ids,
            vocab_size=config.vocab_size,
            embedding_size=config.hidden_size,
            initializer_range=config.initializer_range,
            word_embedding_name="word_embeddings",
            use_one_hot_embeddings=use_one_hot_embeddings)

        # Add positional embeddings and token type embeddings, then layer
        # normalize and perform dropout.
        self.embedding_output = embedding_postprocessor(
            input_tensor=self.embedding_output,
            use_token_type=True,
            token_type_ids=token_type_ids,
            token_type_vocab_size=config.type_vocab_size,
            token_type_embedding_name="token_type_embeddings",
            use_position_embeddings=True,
            position_embedding_name="position_embeddings",
            initializer_range=config.initializer_range,
            max_position_embeddings=config.max_position_embeddings,
            dropout_prob=config.hidden_dropout_prob)

      with tf.variable_scope("encoder"):
        # This converts a 2D mask of shape [batch_size, seq_length] to a 3D
        # mask of shape [batch_size, seq_length, seq_length] which is used
        # for the attention scores.
        attention_mask = create_attention_mask_from_input_mask(
            input_ids, input_mask)

        # Run the stacked transformer.
        # `sequence_output` shape = [batch_size, seq_length, hidden_size].
        self.all_encoder_layers = transformer_model(
            input_tensor=self.embedding_output,
            attention_mask=attention_mask,
            hidden_size=config.hidden_size,
            num_hidden_layers=config.num_hidden_layers,
            num_attention_heads=config.num_attention_heads,
            intermediate_size=config.intermediate_size,
            intermediate_act_fn=get_activation(config.hidden_act),
            hidden_dropout_prob=config.hidden_dropout_prob,
            attention_probs_dropout_prob=config.attention_probs_dropout_prob,
            initializer_range=config.initializer_range,
            do_return_all_layers=True)

      self.sequence_output = self.all_encoder_layers[-1] # [batch_size, seq_length, hidden_size]
      # The "pooler" converts the encoded sequence tensor of shape
      # [batch_size, seq_length, hidden_size] to a tensor of shape
      # [batch_size, hidden_size]. This is necessary for segment-level
      # (or segment-pair-level) classification tasks where we need a fixed
      # dimensional representation of the segment.
      with tf.variable_scope("pooler"):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token. We assume that this has been pre-trained
        first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)
        self.pooled_output = tf.layers.dense(
            first_token_tensor,
            config.hidden_size,
            activation=tf.tanh,
            kernel_initializer=create_initializer(config.initializer_range))

  def get_pooled_output(self):
    return self.pooled_output

  def get_sequence_output(self):
    """Gets final hidden layer of encoder.

    Returns:
      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
      to the final hidden of the transformer encoder.
    """
    return self.sequence_output

  def get_all_encoder_layers(self):
    return self.all_encoder_layers

  def get_embedding_output(self):
    """Gets output of the embedding lookup (i.e., input to the transformer).

    Returns:
      float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
      to the output of the embedding layer, after summing the word
      embeddings with the positional embeddings and the token type embeddings,
      then performing layer normalization. This is the input to the transformer.
    """
    return self.embedding_output

  def get_embedding_table(self):
    return self.embedding_table


def gelu(input_tensor):
  """Gaussian Error Linear Unit.

  This is a smoother version of the RELU.
  Original paper: https://arxiv.org/abs/1606.08415

  Args:
    input_tensor: float Tensor to perform activation.

  Returns:
    `input_tensor` with the GELU activation applied.
  """
  cdf = 0.5 * (1.0 + tf.erf(input_tensor / tf.sqrt(2.0)))
  return input_tensor * cdf


def get_activation(activation_string):
  """Maps a string to a Python function, e.g., "relu" => `tf.nn.relu`.

  Args:
    activation_string: String name of the activation function.

  Returns:
    A Python function corresponding to the activation function. If
    `activation_string` is None, empty, or "linear", this will return None.
    If `activation_string` is not a string, it will return `activation_string`.

  Raises:
    ValueError: The `activation_string` does not correspond to a known
      activation.
  """

  # We assume that anything that's not a string is already an activation
  # function, so we just return it.
  if not isinstance(activation_string, six.string_types):
    return activation_string

  if not activation_string:
    return None

  act = activation_string.lower()
  if act == "linear":
    return None
  elif act == "relu":
    return tf.nn.relu
  elif act == "gelu":
    return gelu
  elif act == "tanh":
    return tf.tanh
  else:
    raise ValueError("Unsupported activation: %s" % act)


def get_assignment_map_from_checkpoint(tvars, init_checkpoint):
  """Compute the intersection of the current variables and checkpoint variables."""
  assignment_map = {}
  initialized_variable_names = {}

  name_to_variable = collections.OrderedDict()
  for var in tvars:
    name = var.name
    m = re.match("^(.*):\\d+$", name)
    if m is not None:
      name = m.group(1)
    name_to_variable[name] = var

  init_vars = tf.train.list_variables(init_checkpoint)

  assignment_map = collections.OrderedDict()
  for x in init_vars:
    (name, var) = (x[0], x[1])
    if name not in name_to_variable:
      continue
    assignment_map[name] = name
    initialized_variable_names[name] = 1
    initialized_variable_names[name + ":0"] = 1

  return (assignment_map, initialized_variable_names)


def dropout(input_tensor, dropout_prob):
  """Perform dropout.

  Args:
    input_tensor: float Tensor.
    dropout_prob: Python float. The probability of dropping out a value (NOT of
      *keeping* a dimension as in `tf.nn.dropout`).

  Returns:
    A version of `input_tensor` with dropout applied.
  """
  if dropout_prob is None or dropout_prob == 0.0:
    return input_tensor

  output = tf.nn.dropout(input_tensor, 1.0 - dropout_prob)
  return output


def layer_norm(input_tensor, name=None):
  """Run layer normalization on the last dimension of the tensor."""
  return tf.contrib.layers.layer_norm(
      inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name)


def layer_norm_and_dropout(input_tensor, dropout_prob, name=None):
  """Runs layer normalization followed by dropout."""
  output_tensor = layer_norm(input_tensor, name)
  output_tensor = dropout(output_tensor, dropout_prob)
  return output_tensor


def create_initializer(initializer_range=0.02):
  """Creates a `truncated_normal_initializer` with the given range."""
  return tf.truncated_normal_initializer(stddev=initializer_range)


def embedding_lookup(input_ids,
                     vocab_size,
                     embedding_size=128,
                     initializer_range=0.02,
                     word_embedding_name="word_embeddings",
                     use_one_hot_embeddings=False):
  """Looks up words embeddings for id tensor.

  Args:
    input_ids: int32 Tensor of shape [batch_size, seq_length] containing word
      ids.
    vocab_size: int. Size of the embedding vocabulary.
    embedding_size: int. Width of the word embeddings.
    initializer_range: float. Embedding initialization range.
    word_embedding_name: string. Name of the embedding table.
    use_one_hot_embeddings: bool. If True, use one-hot method for word
      embeddings. If False, use `tf.nn.embedding_lookup()`. One hot is better
      for TPUs.

  Returns:
    float Tensor of shape [batch_size, seq_length, embedding_size].
  """
  # This function assumes that the input is of shape [batch_size, seq_length,
  # num_inputs].
  #
  # If the input is a 2D tensor of shape [batch_size, seq_length], we
  # reshape to [batch_size, seq_length, 1].
  if input_ids.shape.ndims == 2:
    input_ids = tf.expand_dims(input_ids, axis=[-1])

  embedding_table = tf.get_variable(
      name=word_embedding_name,
      shape=[vocab_size, embedding_size],
      initializer=create_initializer(initializer_range))

  if use_one_hot_embeddings:
    flat_input_ids = tf.reshape(input_ids, [-1])
    one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size)
    output = tf.matmul(one_hot_input_ids, embedding_table)
  else:
    output = tf.nn.embedding_lookup(embedding_table, input_ids)

  input_shape = get_shape_list(input_ids)

  output = tf.reshape(output,
                      input_shape[0:-1] + [input_shape[-1] * embedding_size])
  return (output, embedding_table)


def embedding_postprocessor(input_tensor,
                            use_token_type=False,
                            token_type_ids=None,
                            token_type_vocab_size=16,
                            token_type_embedding_name="token_type_embeddings",
                            use_position_embeddings=True,
                            position_embedding_name="position_embeddings",
                            initializer_range=0.02,
                            max_position_embeddings=512,
                            dropout_prob=0.1):
  """Performs various post-processing on a word embedding tensor.

  Args:
    input_tensor: float Tensor of shape [batch_size, seq_length,
      embedding_size].
    use_token_type: bool. Whether to add embeddings for `token_type_ids`.
    token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].
      Must be specified if `use_token_type` is True.
    token_type_vocab_size: int. The vocabulary size of `token_type_ids`.
    token_type_embedding_name: string. The name of the embedding table variable
      for token type ids.
    use_position_embeddings: bool. Whether to add position embeddings for the
      position of each token in the sequence.
    position_embedding_name: string. The name of the embedding table variable
      for positional embeddings.
    initializer_range: float. Range of the weight initialization.
    max_position_embeddings: int. Maximum sequence length that might ever be
      used with this model. This can be longer than the sequence length of
      input_tensor, but cannot be shorter.
    dropout_prob: float. Dropout probability applied to the final output tensor.

  Returns:
    float tensor with same shape as `input_tensor`.

  Raises:
    ValueError: One of the tensor shapes or input values is invalid.
  """
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  width = input_shape[2]

  if seq_length > max_position_embeddings:
    raise ValueError("The seq length (%d) cannot be greater than "
                     "`max_position_embeddings` (%d)" %
                     (seq_length, max_position_embeddings))

  output = input_tensor

  if use_token_type:
    if token_type_ids is None:
      raise ValueError("`token_type_ids` must be specified if "
                       "`use_token_type` is True.")
    token_type_table = tf.get_variable(
        name=token_type_embedding_name,
        shape=[token_type_vocab_size, width],
        initializer=create_initializer(initializer_range))
    # This vocab will be small so we always do one-hot here, since it is always
    # faster for a small vocabulary.
    flat_token_type_ids = tf.reshape(token_type_ids, [-1])
    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)
    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)
    token_type_embeddings = tf.reshape(token_type_embeddings,
                                       [batch_size, seq_length, width])
    output += token_type_embeddings

  if use_position_embeddings:
    full_position_embeddings = tf.get_variable(
        name=position_embedding_name,
        shape=[max_position_embeddings, width],
        initializer=create_initializer(initializer_range))
    # Since the position embedding table is a learned variable, we create it
    # using a (long) sequence length `max_position_embeddings`. The actual
    # sequence length might be shorter than this, for faster training of
    # tasks that do not have long sequences.
    #
    # So `full_position_embeddings` is effectively an embedding table
    # for position [0, 1, 2, ..., max_position_embeddings-1], and the current
    # sequence has positions [0, 1, 2, ... seq_length-1], so we can just
    # perform a slice.
    if seq_length < max_position_embeddings:
      position_embeddings = tf.slice(full_position_embeddings, [0, 0],
                                     [seq_length, -1])
    else:
      position_embeddings = full_position_embeddings

    num_dims = len(output.shape.as_list())

    # Only the last two dimensions are relevant (`seq_length` and `width`), so
    # we broadcast among the first dimensions, which is typically just
    # the batch size.
    position_broadcast_shape = []
    for _ in range(num_dims - 2):
      position_broadcast_shape.append(1)
    position_broadcast_shape.extend([seq_length, width])
    position_embeddings = tf.reshape(position_embeddings,
                                     position_broadcast_shape)
    output += position_embeddings

  output = layer_norm_and_dropout(output, dropout_prob)
  return output


def create_attention_mask_from_input_mask(from_tensor, to_mask):
  """Create 3D attention mask from a 2D tensor mask.

  Args:
    from_tensor: 2D or 3D Tensor of shape [batch_size, from_seq_length, ...].
    to_mask: int32 Tensor of shape [batch_size, to_seq_length].

  Returns:
    float Tensor of shape [batch_size, from_seq_length, to_seq_length].
  """
  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
  batch_size = from_shape[0]
  from_seq_length = from_shape[1]

  to_shape = get_shape_list(to_mask, expected_rank=2)
  to_seq_length = to_shape[1]

  to_mask = tf.cast(
      tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32)

  # We don't assume that `from_tensor` is a mask (although it could be). We
  # don't actually care if we attend *from* padding tokens (only *to* padding
  # tokens), so we create a tensor of all ones.
  #
  # `broadcast_ones` = [batch_size, from_seq_length, 1]
  broadcast_ones = tf.ones(
      shape=[batch_size, from_seq_length, 1], dtype=tf.float32)

  # Here we broadcast along two dimensions to create the mask.
  mask = broadcast_ones * to_mask

  return mask
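

# Illustrative sketch (not part of the original BERT code): a pure-Python view
# of the broadcast performed above, assuming `to_mask` is a nested list of 0/1
# values of shape [batch_size, to_seq_length]. The helper name is hypothetical
# and nothing in this module calls it.
def _sketch_attention_mask(to_mask, from_seq_length):
  """Repeats each row of `to_mask` `from_seq_length` times, as floats."""
  # Result shape: [batch_size, from_seq_length, to_seq_length].
  return [[[float(v) for v in row] for _ in range(from_seq_length)]
          for row in to_mask]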


def attention_layer(from_tensor,
                    to_tensor,
                    attention_mask=None,
                    num_attention_heads=1,
                    size_per_head=512,
                    query_act=None,
                    key_act=None,
                    value_act=None,
                    attention_probs_dropout_prob=0.0,
                    initializer_range=0.02,
                    do_return_2d_tensor=False,
                    batch_size=None,
                    from_seq_length=None,
                    to_seq_length=None):
  """Performs multi-headed attention from `from_tensor` to `to_tensor`.

  This is an implementation of multi-headed attention based on "Attention
  is all you Need". If `from_tensor` and `to_tensor` are the same, then
  this is self-attention. Each timestep in `from_tensor` attends to the
  corresponding sequence in `to_tensor`, and returns a fixed-width vector.

  This function first projects `from_tensor` into a "query" tensor and
  `to_tensor` into "key" and "value" tensors. These are (effectively) a list
  of tensors of length `num_attention_heads`, where each tensor is of shape
  [batch_size, seq_length, size_per_head].

  Then, the query and key tensors are dot-producted and scaled. These are
  softmaxed to obtain attention probabilities. The value tensors are then
  interpolated by these probabilities, then concatenated back to a single
  tensor and returned.

  In practice, the multi-headed attention is done with transposes and
  reshapes rather than actual separate tensors.

  Args:
    from_tensor: float Tensor of shape [batch_size, from_seq_length,
      from_width].
    to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width].
    attention_mask: (optional) int32 Tensor of shape [batch_size,
      from_seq_length, to_seq_length]. The values should be 1 or 0. The
      attention scores will effectively be set to -infinity for any positions in
      the mask that are 0, and will be unchanged for positions that are 1.
    num_attention_heads: int. Number of attention heads.
    size_per_head: int. Size of each attention head.
    query_act: (optional) Activation function for the query transform.
    key_act: (optional) Activation function for the key transform.
    value_act: (optional) Activation function for the value transform.
    attention_probs_dropout_prob: (optional) float. Dropout probability of the
      attention probabilities.
    initializer_range: float. Range of the weight initializer.
    do_return_2d_tensor: bool. If True, the output will be of shape [batch_size
      * from_seq_length, num_attention_heads * size_per_head]. If False, the
      output will be of shape [batch_size, from_seq_length, num_attention_heads
      * size_per_head].
    batch_size: (Optional) int. Required if the input is 2D: the batch size
      of the 3D version of the `from_tensor` and `to_tensor`.
    from_seq_length: (Optional) int. Required if the input is 2D: the seq
      length of the 3D version of the `from_tensor`.
    to_seq_length: (Optional) int. Required if the input is 2D: the seq length
      of the 3D version of the `to_tensor`.

  Returns:
    float Tensor of shape [batch_size, from_seq_length,
      num_attention_heads * size_per_head]. (If `do_return_2d_tensor` is
      true, this will be of shape [batch_size * from_seq_length,
      num_attention_heads * size_per_head]).

  Raises:
    ValueError: Any of the arguments or tensor shapes are invalid.
  """

  def transpose_for_scores(input_tensor, batch_size, num_attention_heads,
                           seq_length, width):
    output_tensor = tf.reshape(
        input_tensor, [batch_size, seq_length, num_attention_heads, width])

    output_tensor = tf.transpose(output_tensor, [0, 2, 1, 3])
    return output_tensor

  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
  to_shape = get_shape_list(to_tensor, expected_rank=[2, 3])

  if len(from_shape) != len(to_shape):
    raise ValueError(
        "The rank of `from_tensor` must match the rank of `to_tensor`.")

  if len(from_shape) == 3:
    batch_size = from_shape[0]
    from_seq_length = from_shape[1]
    to_seq_length = to_shape[1]
  elif len(from_shape) == 2:
    if (batch_size is None or from_seq_length is None or to_seq_length is None):
      raise ValueError(
          "When passing in rank 2 tensors to attention_layer, the values "
          "for `batch_size`, `from_seq_length`, and `to_seq_length` "
          "must all be specified.")

  # Scalar dimensions referenced here:
  #   B = batch size (number of sequences)
  #   F = `from_tensor` sequence length
  #   T = `to_tensor` sequence length
  #   N = `num_attention_heads`
  #   H = `size_per_head`

  from_tensor_2d = reshape_to_matrix(from_tensor)
  to_tensor_2d = reshape_to_matrix(to_tensor)

  # `query_layer` = [B*F, N*H]
  query_layer = tf.layers.dense(
      from_tensor_2d,
      num_attention_heads * size_per_head,
      activation=query_act,
      name="query",
      kernel_initializer=create_initializer(initializer_range))

  # `key_layer` = [B*T, N*H]
  key_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=key_act,
      name="key",
      kernel_initializer=create_initializer(initializer_range))

  # `value_layer` = [B*T, N*H]
  value_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=value_act,
      name="value",
      kernel_initializer=create_initializer(initializer_range))

  # `query_layer` = [B, N, F, H]
  query_layer = transpose_for_scores(query_layer, batch_size,
                                     num_attention_heads, from_seq_length,
                                     size_per_head)

  # `key_layer` = [B, N, T, H]
  key_layer = transpose_for_scores(key_layer, batch_size, num_attention_heads,
                                   to_seq_length, size_per_head)

  # Take the dot product between "query" and "key" to get the raw
  # attention scores.
  # `attention_scores` = [B, N, F, T]
  attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)
  attention_scores = tf.multiply(attention_scores,
                                 1.0 / math.sqrt(float(size_per_head)))

  if attention_mask is not None:
    # `attention_mask` = [B, 1, F, T]
    attention_mask = tf.expand_dims(attention_mask, axis=[1])

    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
    # masked positions, this operation will create a tensor which is 0.0 for
    # positions we want to attend and -10000.0 for masked positions.
    adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0

    # Since we are adding it to the raw scores before the softmax, this is
    # effectively the same as removing these entirely.
    attention_scores += adder

  # Normalize the attention scores to probabilities.
  # `attention_probs` = [B, N, F, T]
  attention_probs = tf.nn.softmax(attention_scores)

  # This is actually dropping out entire tokens to attend to, which might
  # seem a bit unusual, but is taken from the original Transformer paper.
  attention_probs = dropout(attention_probs, attention_probs_dropout_prob)

  # `value_layer` = [B, T, N, H]
  value_layer = tf.reshape(
      value_layer,
      [batch_size, to_seq_length, num_attention_heads, size_per_head])

  # `value_layer` = [B, N, T, H]
  value_layer = tf.transpose(value_layer, [0, 2, 1, 3])

  # `context_layer` = [B, N, F, H]
  context_layer = tf.matmul(attention_probs, value_layer)

  # `context_layer` = [B, F, N, H]
  context_layer = tf.transpose(context_layer, [0, 2, 1, 3])

  if do_return_2d_tensor:
    # `context_layer` = [B*F, N*H]
    context_layer = tf.reshape(
        context_layer,
        [batch_size * from_seq_length, num_attention_heads * size_per_head])
  else:
    # `context_layer` = [B, F, N*H]
    context_layer = tf.reshape(
        context_layer,
        [batch_size, from_seq_length, num_attention_heads * size_per_head])

  return context_layer
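

# Illustrative sketch (not part of the original BERT code): the scale, mask,
# and softmax steps of `attention_layer` for a single query position, in pure
# Python. `scores` and `mask` are assumed to be equal-length lists covering
# the `to` positions; the helper name is hypothetical.
def _sketch_masked_attention_probs(scores, mask, size_per_head):
  """Scales one row of raw scores, masks it, and applies a softmax."""
  import math  # Local import keeps the sketch self-contained.
  scale = 1.0 / math.sqrt(float(size_per_head))
  # Masked positions (mask == 0) receive a large negative bias.
  logits = [s * scale + (1.0 - m) * -10000.0 for s, m in zip(scores, mask)]
  max_logit = max(logits)  # Subtract the max for numerical stability.
  exps = [math.exp(logit - max_logit) for logit in logits]
  total = sum(exps)
  return [e / total for e in exps]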


def transformer_model(input_tensor,
                      attention_mask=None,
                      hidden_size=768,
                      num_hidden_layers=12,
                      num_attention_heads=12,
                      intermediate_size=3072,
                      intermediate_act_fn=gelu,
                      hidden_dropout_prob=0.1,
                      attention_probs_dropout_prob=0.1,
                      initializer_range=0.02,
                      do_return_all_layers=False):
  """Multi-headed, multi-layer Transformer from "Attention is All You Need".

  This is almost an exact implementation of the original Transformer encoder.

  See the original paper:
  https://arxiv.org/abs/1706.03762

  Also see:
  https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py

  Args:
    input_tensor: float Tensor of shape [batch_size, seq_length, hidden_size].
    attention_mask: (optional) int32 Tensor of shape [batch_size, seq_length,
      seq_length], with 1 for positions that can be attended to and 0 in
      positions that should not be.
    hidden_size: int. Hidden size of the Transformer.
    num_hidden_layers: int. Number of layers (blocks) in the Transformer.
    num_attention_heads: int. Number of attention heads in the Transformer.
    intermediate_size: int. The size of the "intermediate" (a.k.a., feed
      forward) layer.
    intermediate_act_fn: function. The non-linear activation function to apply
      to the output of the intermediate/feed-forward layer.
    hidden_dropout_prob: float. Dropout probability for the hidden layers.
    attention_probs_dropout_prob: float. Dropout probability of the attention
      probabilities.
    initializer_range: float. Range of the initializer (stddev of truncated
      normal).
    do_return_all_layers: Whether to also return all layers or just the final
      layer.

  Returns:
    float Tensor of shape [batch_size, seq_length, hidden_size], the final
    hidden layer of the Transformer. If `do_return_all_layers` is True, a list
    of such Tensors, one per layer, is returned instead.

  Raises:
    ValueError: A Tensor shape or parameter is invalid.
  """
  if hidden_size % num_attention_heads != 0:
    raise ValueError(
        "The hidden size (%d) is not a multiple of the number of attention "
        "heads (%d)" % (hidden_size, num_attention_heads))

  attention_head_size = int(hidden_size / num_attention_heads)
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  input_width = input_shape[2]

  # The Transformer performs sum residuals on all layers so the input needs
  # to be the same as the hidden size.
  if input_width != hidden_size:
    raise ValueError("The width of the input tensor (%d) != hidden size (%d)" %
                     (input_width, hidden_size))

  # We keep the representation as a 2D tensor to avoid re-shaping it back and
  # forth from a 3D tensor to a 2D tensor. Re-shapes are normally free on
  # the GPU/CPU but may not be free on the TPU, so we want to minimize them to
  # help the optimizer.
  prev_output = reshape_to_matrix(input_tensor)

  all_layer_outputs = []
  for layer_idx in range(num_hidden_layers):
    with tf.variable_scope("layer_%d" % layer_idx):
      layer_input = prev_output

      with tf.variable_scope("attention"):
        attention_heads = []
        with tf.variable_scope("self"):
          attention_head = attention_layer(
              from_tensor=layer_input,
              to_tensor=layer_input,
              attention_mask=attention_mask,
              num_attention_heads=num_attention_heads,
              size_per_head=attention_head_size,
              attention_probs_dropout_prob=attention_probs_dropout_prob,
              initializer_range=initializer_range,
              do_return_2d_tensor=True,
              batch_size=batch_size,
              from_seq_length=seq_length,
              to_seq_length=seq_length)
          attention_heads.append(attention_head)

        attention_output = None
        if len(attention_heads) == 1:
          attention_output = attention_heads[0]
        else:
          # In the case where we have other sequences, we just concatenate
          # them to the self-attention head before the projection.
          attention_output = tf.concat(attention_heads, axis=-1)

        # Run a linear projection of `hidden_size` then add a residual
        # with `layer_input`.
        with tf.variable_scope("output"):
          attention_output = tf.layers.dense(
              attention_output,
              hidden_size,
              kernel_initializer=create_initializer(initializer_range))
          attention_output = dropout(attention_output, hidden_dropout_prob)
          attention_output = layer_norm(attention_output + layer_input)

      # The activation is only applied to the "intermediate" hidden layer.
      with tf.variable_scope("intermediate"):
        intermediate_output = tf.layers.dense(
            attention_output,
            intermediate_size,
            activation=intermediate_act_fn,
            kernel_initializer=create_initializer(initializer_range))

      # Down-project back to `hidden_size` then add the residual.
      with tf.variable_scope("output"):
        layer_output = tf.layers.dense(
            intermediate_output,
            hidden_size,
            kernel_initializer=create_initializer(initializer_range))
        layer_output = dropout(layer_output, hidden_dropout_prob)
        layer_output = layer_norm(layer_output + attention_output)
        prev_output = layer_output
        all_layer_outputs.append(layer_output)

  if do_return_all_layers:
    final_outputs = []
    for layer_output in all_layer_outputs:
      final_output = reshape_from_matrix(layer_output, input_shape)
      final_outputs.append(final_output)
    return final_outputs
  else:
    final_output = reshape_from_matrix(prev_output, input_shape)
    return final_output


def get_shape_list(tensor, expected_rank=None, name=None):
  """Returns a list of the shape of tensor, preferring static dimensions.

  Args:
    tensor: A tf.Tensor object to find the shape of.
    expected_rank: (optional) int or list of ints. The expected rank(s) of
      `tensor`. If this is specified and the `tensor` has a different rank, an
      exception will be raised.
    name: Optional name of the tensor for the error message.

  Returns:
    A list of dimensions of the shape of tensor. All static dimensions will
    be returned as python integers, and dynamic dimensions will be returned
    as tf.Tensor scalars.
  """
  if name is None:
    name = tensor.name

  if expected_rank is not None:
    assert_rank(tensor, expected_rank, name)

  shape = tensor.shape.as_list()

  non_static_indexes = []
  for (index, dim) in enumerate(shape):
    if dim is None:
      non_static_indexes.append(index)

  if not non_static_indexes:
    return shape

  dyn_shape = tf.shape(tensor)
  for index in non_static_indexes:
    shape[index] = dyn_shape[index]
  return shape


def reshape_to_matrix(input_tensor):
  """Reshapes a >= rank 2 tensor to a rank 2 tensor (i.e., a matrix)."""
  ndims = input_tensor.shape.ndims
  if ndims < 2:
    raise ValueError("Input tensor must have at least rank 2. Shape = %s" %
                     (input_tensor.shape))
  if ndims == 2:
    return input_tensor

  width = input_tensor.shape[-1]
  output_tensor = tf.reshape(input_tensor, [-1, width])
  return output_tensor


def reshape_from_matrix(output_tensor, orig_shape_list):
  """Reshapes a rank 2 tensor back to its original rank >= 2 tensor."""
  if len(orig_shape_list) == 2:
    return output_tensor

  output_shape = get_shape_list(output_tensor)

  orig_dims = orig_shape_list[0:-1]
  width = output_shape[-1]

  return tf.reshape(output_tensor, orig_dims + [width])


def assert_rank(tensor, expected_rank, name=None):
  """Raises an exception if the tensor rank is not of the expected rank.

  Args:
    tensor: A tf.Tensor to check the rank of.
    expected_rank: Python integer or list of integers, expected rank.
    name: Optional name of the tensor for the error message.

  Raises:
    ValueError: If the expected rank doesn't match the actual rank.
  """
  if name is None:
    name = tensor.name

  expected_rank_dict = {}
  if isinstance(expected_rank, six.integer_types):
    expected_rank_dict[expected_rank] = True
  else:
    for x in expected_rank:
      expected_rank_dict[x] = True

  actual_rank = tensor.shape.ndims
  if actual_rank not in expected_rank_dict:
    scope_name = tf.get_variable_scope().name
    raise ValueError(
        "For the tensor `%s` in scope `%s`, the actual rank "
        "`%d` (shape = %s) is not equal to the expected rank `%s`" %
        (name, scope_name, actual_rank, str(tensor.shape), str(expected_rank)))


================================================
FILE: a00_Bert/optimization.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Functions and classes related to optimization (weight updates)."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import re
import tensorflow as tf


def create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, use_tpu):
  """Creates an optimizer training op."""
  global_step = tf.train.get_or_create_global_step()

  learning_rate = tf.constant(value=init_lr, shape=[], dtype=tf.float32)

  # Implements linear decay of the learning rate.
  learning_rate = tf.train.polynomial_decay(
      learning_rate,
      global_step,
      num_train_steps,
      end_learning_rate=0.0,
      power=1.0,
      cycle=False)

  # Implements linear warmup. I.e., if global_step < num_warmup_steps, the
  # learning rate will be `global_step/num_warmup_steps * init_lr`.
  if num_warmup_steps:
    global_steps_int = tf.cast(global_step, tf.int32)
    warmup_steps_int = tf.constant(num_warmup_steps, dtype=tf.int32)

    global_steps_float = tf.cast(global_steps_int, tf.float32)
    warmup_steps_float = tf.cast(warmup_steps_int, tf.float32)

    warmup_percent_done = global_steps_float / warmup_steps_float
    warmup_learning_rate = init_lr * warmup_percent_done

    is_warmup = tf.cast(global_steps_int < warmup_steps_int, tf.float32)
    learning_rate = (
        (1.0 - is_warmup) * learning_rate + is_warmup * warmup_learning_rate)

  # It is recommended that you use this optimizer for fine tuning, since this
  # is how the model was trained (note that the Adam m/v variables are NOT
  # loaded from init_checkpoint.)
  optimizer = AdamWeightDecayOptimizer(
      learning_rate=learning_rate,
      weight_decay_rate=0.01,
      beta_1=0.9,
      beta_2=0.999,
      epsilon=1e-6,
      exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"])

  if use_tpu:
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)

  tvars = tf.trainable_variables()
  grads = tf.gradients(loss, tvars)

  # This is how the model was pre-trained.
  (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0)

  train_op = optimizer.apply_gradients(
      zip(grads, tvars), global_step=global_step)

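  # `AdamWeightDecayOptimizer.apply_gradients` does not update `global_step`
  # itself, so the global step is incremented explicitly here.
  new_global_step = global_step + 1
  train_op = tf.group(train_op, [global_step.assign(new_global_step)])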
  return train_op
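

# Illustrative sketch (not part of the original BERT code): the learning-rate
# schedule built in `create_optimizer`, as a plain Python function of the
# step. It assumes the linear (power=1.0, end rate 0.0, non-cycling) decay
# configured above; the helper name is hypothetical.
def _sketch_learning_rate(step, init_lr, num_train_steps, num_warmup_steps):
  """Returns the learning rate at `step`: linear warmup, then linear decay."""
  if num_warmup_steps and step < num_warmup_steps:
    return init_lr * float(step) / float(num_warmup_steps)
  # Past `num_train_steps` the decayed rate stays at zero.
  clipped_step = min(step, num_train_steps)
  return init_lr * (1.0 - float(clipped_step) / float(num_train_steps))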


class AdamWeightDecayOptimizer(tf.train.Optimizer):
  """A basic Adam optimizer that includes "correct" L2 weight decay."""

  def __init__(self,
               learning_rate,
               weight_decay_rate=0.0,
               beta_1=0.9,
               beta_2=0.999,
               epsilon=1e-6,
               exclude_from_weight_decay=None,
               name="AdamWeightDecayOptimizer"):
    """Constructs an AdamWeightDecayOptimizer."""
    super(AdamWeightDecayOptimizer, self).__init__(False, name)

    self.learning_rate = learning_rate
    self.weight_decay_rate = weight_decay_rate
    self.beta_1 = beta_1
    self.beta_2 = beta_2
    self.epsilon = epsilon
    self.exclude_from_weight_decay = exclude_from_weight_decay

  def apply_gradients(self, grads_and_vars, global_step=None, name=None):
    """See base class."""
    assignments = []
    for (grad, param) in grads_and_vars:
      if grad is None or param is None:
        continue

      param_name = self._get_variable_name(param.name)

      m = tf.get_variable(
          name=param_name + "/adam_m",
          shape=param.shape.as_list(),
          dtype=tf.float32,
          trainable=False,
          initializer=tf.zeros_initializer())
      v = tf.get_variable(
          name=param_name + "/adam_v",
          shape=param.shape.as_list(),
          dtype=tf.float32,
          trainable=False,
          initializer=tf.zeros_initializer())

      # Standard Adam update.
      next_m = (
          tf.multiply(self.beta_1, m) + tf.multiply(1.0 - self.beta_1, grad))
      next_v = (
          tf.multiply(self.beta_2, v) + tf.multiply(1.0 - self.beta_2,
                                                    tf.square(grad)))

      update = next_m / (tf.sqrt(next_v) + self.epsilon)

      # Just adding the square of the weights to the loss function is *not*
      # the correct way of using L2 regularization/weight decay with Adam,
      # since that will interact with the m and v parameters in strange ways.
      #
      # Instead we want to decay the weights in a manner that doesn't interact
      # with the m/v parameters. This is equivalent to adding the square
      # of the weights to the loss with plain (non-momentum) SGD.
      if self._do_use_weight_decay(param_name):
        update += self.weight_decay_rate * param

      update_with_lr = self.learning_rate * update

      next_param = param - update_with_lr

      assignments.extend(
          [param.assign(next_param),
           m.assign(next_m),
           v.assign(next_v)])
    return tf.group(*assignments, name=name)

  def _do_use_weight_decay(self, param_name):
    """Whether to use L2 weight decay for `param_name`."""
    if not self.weight_decay_rate:
      return False
    if self.exclude_from_weight_decay:
      for r in self.exclude_from_weight_decay:
        if re.search(r, param_name) is not None:
          return False
    return True

  def _get_variable_name(self, param_name):
    """Get the variable name from the tensor name."""
    m = re.match("^(.*):\\d+$", param_name)
    if m is not None:
      param_name = m.group(1)
    return param_name
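

# Illustrative sketch (not part of the original BERT code): one step of the
# update in `AdamWeightDecayOptimizer.apply_gradients` for a single scalar
# parameter. Like the code above, it applies no Adam bias correction. The
# helper name and default values are assumptions made for the example.
def _sketch_adam_weight_decay_step(param, grad, m, v, learning_rate,
                                   weight_decay_rate=0.01, beta_1=0.9,
                                   beta_2=0.999, epsilon=1e-6):
  """Applies one update to a scalar `param`; returns (param, m, v)."""
  next_m = beta_1 * m + (1.0 - beta_1) * grad
  next_v = beta_2 * v + (1.0 - beta_2) * grad * grad
  update = next_m / (next_v ** 0.5 + epsilon)
  # Decoupled weight decay: added after the Adam ratio, so it never enters m/v.
  update += weight_decay_rate * param
  return param - learning_rate * update, next_m, next_v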


================================================
FILE: a00_Bert/run_classifier_predict_online.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT fine-tuning runner for online classification prediction.

Input is a list; output is a label.
"""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import csv
import os
import modeling
import tokenization
import tensorflow as tf
import numpy as np

flags = tf.flags

FLAGS = flags.FLAGS

## Required parameters
BERT_BASE_DIR="./checkpoint_finetuing_law512/"
flags.DEFINE_string("bert_config_file", BERT_BASE_DIR+"bert_config.json",
    "The config json file corresponding to the pre-trained BERT model. "
    "This specifies the model architecture.")

flags.DEFINE_string("task_name", "sentence_pair", "The name of the task to train.")

flags.DEFINE_string("vocab_file", BERT_BASE_DIR+"vocab.txt",
                    "The vocabulary file that the BERT model was trained on.")

flags.DEFINE_string("init_checkpoint", BERT_BASE_DIR, # model.ckpt-66870--> /model.ckpt-66870
    "Initial checkpoint (usually from a pre-trained BERT model).")

flags.DEFINE_integer("max_seq_length", 512,
    "The maximum total input sequence length after WordPiece tokenization. "
    "Sequences longer than this will be truncated, and sequences shorter "
    "than this will be padded.")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

class InputExample(object):
  """A single training/test example for simple sequence classification."""

  def __init__(self, guid, text_a, text_b=None, label=None):
    """Constructs an InputExample.
    Args:
      guid: Unique id for the example.
      text_a: string. The untokenized text of the first sequence. For single
        sequence tasks, only this sequence must be specified.
      text_b: (Optional) string. The untokenized text of the second sequence.
        Only must be specified for sequence pair tasks.
      label: (Optional) string. The label of the example. This should be
        specified for train and dev examples, but not for test examples.
    """
    self.guid = guid
    self.text_a = text_a
    self.text_b = text_b
    self.label = label


class InputFeatures(object):
  """A single set of features of data."""

  def __init__(self, input_ids, input_mask, segment_ids, label_id):
    self.input_ids = input_ids
    self.input_mask = input_mask
    self.segment_ids = segment_ids
    self.label_id = label_id


class DataProcessor(object):
  """Base class for data converters for sequence classification data sets."""

  def get_train_examples(self, data_dir):
    """Gets a collection of `InputExample`s for the train set."""
    raise NotImplementedError()

  def get_dev_examples(self, data_dir):
    """Gets a collection of `InputExample`s for the dev set."""
    raise NotImplementedError()

  def get_test_examples(self, data_dir):
    """Gets a collection of `InputExample`s for prediction."""
    raise NotImplementedError()

  def get_labels(self):
    """Gets the list of labels for this data set."""
    raise NotImplementedError()

  @classmethod
  def _read_tsv(cls, input_file, quotechar=None):
    """Reads a tab separated value file."""
    with tf.gfile.Open(input_file, "r") as f:
      reader = csv.reader(f, delimiter="\t", quotechar=quotechar)
      lines = []
      for line in reader:
        lines.append(line)
      return lines

class SentencePairClassificationProcessor(DataProcessor):
  """Processor for the internal data set (sentence pair classification)."""
  def __init__(self):
    self.language = "zh"

  #def get_train_examples(self, data_dir):
  #  """See base class."""
  #  return self._create_examples(
  #      self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  #def get_dev_examples(self, data_dir):
  #  """See base class."""
  #  return self._create_examples(
  #      self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  #def get_test_examples(self, data_dir):
  #  """See base class."""
  #  return self._create_examples(
  #      self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  #def _create_examples(self, lines, set_type):
  #  """Creates examples for the training and dev sets."""
  #  examples = []
  #  for (i, line) in enumerate(lines):
  #    if i == 0:
  #      continue
  #    guid = "%s-%s" % (set_type, i)
  #    label = tokenization.convert_to_unicode(line[0])
  #    text_a = tokenization.convert_to_unicode(line[1])
  #    text_b = tokenization.convert_to_unicode(line[2])
  #    examples.append(
  #        InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
  #  return examples

def convert_single_example(ex_index, example, label_list, max_seq_length,
                           tokenizer):
  """Converts a single `InputExample` into a single `InputFeatures`."""
  label_map = {}
  for (i, label) in enumerate(label_list):
    label_map[label] = i

  tokens_a = tokenizer.tokenize(example.text_a)
  tokens_b = None
  if example.text_b:
    tokens_b = tokenizer.tokenize(example.text_b)

  if tokens_b:
    # Modifies `tokens_a` and `tokens_b` in place so that the total
    # length is at most the specified length.
    # Account for [CLS], [SEP], [SEP] with "- 3"
    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
  else:
    # Account for [CLS] and [SEP] with "- 2"
    if len(tokens_a) > max_seq_length - 2:
      tokens_a = tokens_a[0:(max_seq_length - 2)]

  # The convention in BERT is:
  # (a) For sequence pairs:
  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1
  # (b) For single sequences:
  #  tokens:   [CLS] the dog is hairy . [SEP]
  #  type_ids: 0     0   0   0  0     0 0
  #
  # Where "type_ids" are used to indicate whether this is the first
  # sequence or the second sequence. The embedding vectors for `type=0` and
  # `type=1` were learned during pre-training and are added to the wordpiece
  # embedding vector (and position vector). This is not *strictly* necessary
  # since the [SEP] token unambiguously separates the sequences, but it makes
  # it easier for the model to learn the concept of sequences.
  #
  # For classification tasks, the first vector (corresponding to [CLS]) is
  # used as the "sentence vector". Note that this only makes sense because
  # the entire model is fine-tuned.
  tokens = []
  segment_ids = []
  tokens.append("[CLS]")
  segment_ids.append(0)
  for token in tokens_a:
    tokens.append(token)
    segment_ids.append(0)
  tokens.append("[SEP]")
  segment_ids.append(0)

  if tokens_b:
    for token in tokens_b:
      tokens.append(token)
      segment_ids.append(1)
    tokens.append("[SEP]")
    segment_ids.append(1)

  input_ids = tokenizer.convert_tokens_to_ids(tokens)

  # The mask has 1 for real tokens and 0 for padding tokens. Only real
  # tokens are attended to.
  input_mask = [1] * len(input_ids)

  # Zero-pad up to the sequence length.
  while len(input_ids) < max_seq_length:
    input_ids.append(0)
    input_mask.append(0)
    segment_ids.append(0)

  assert len(input_ids) == max_seq_length
  assert len(input_mask) == max_seq_length
  assert len(segment_ids) == max_seq_length

  label_id = label_map[example.label]
  if ex_index < 5:
    tf.logging.info("*** Example ***")
    tf.logging.info("guid: %s" % (example.guid))
    tf.logging.info("tokens: %s" % " ".join(
        [tokenization.printable_text(x) for x in tokens]))
    tf.logging.info("input_ids: %s" % " ".join([str(x) for x in input_ids]))
    tf.logging.info("input_mask: %s" % " ".join([str(x) for x in input_mask]))
    tf.logging.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids]))
    tf.logging.info("label: %s (id = %d)" % (example.label, label_id))

  feature = InputFeatures(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids,
      label_id=label_id)
  return feature
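
# Illustrative sketch, not part of the original BERT code: the packing steps of
# `convert_single_example` restated on plain token lists (no tokenizer, no ids).
# `_sketch_pack_pair` is a hypothetical helper and assumes its inputs were
# already truncated so that everything fits into `max_seq_length`.
def _sketch_pack_pair(tokens_a, tokens_b, max_seq_length):
  """Returns (tokens, segment_ids, input_mask) for one or two token lists."""
  # First sequence: [CLS] + tokens_a + [SEP], all with segment id 0.
  tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
  segment_ids = [0] * len(tokens)
  if tokens_b:
    # Second sequence: tokens_b + [SEP], all with segment id 1.
    tokens += list(tokens_b) + ["[SEP]"]
    segment_ids += [1] * (len(tokens_b) + 1)
  # 1 marks a real token; padding positions get 0 in both lists.
  input_mask = [1] * len(tokens)
  padding = max_seq_length - len(tokens)
  segment_ids += [0] * padding
  input_mask += [0] * padding
  return tokens, segment_ids, input_mask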

def _truncate_seq_pair(tokens_a, tokens_b, max_length):
  """Truncates a sequence pair in place to the maximum length."""

  # This is a simple heuristic which will always truncate the longer sequence
  # one token at a time. This makes more sense than truncating an equal percent
  # of tokens from each, since if one sequence is very short then each token
  # truncated from it likely carries more information than a token truncated
  # from the longer sequence.
  while True:
    total_length = len(tokens_a) + len(tokens_b)
    if total_length <= max_length:
      break
    if len(tokens_a) > len(tokens_b):
      tokens_a.pop()
    else:
      tokens_b.pop()
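
# Illustrative sketch, not part of the original BERT code: the longest-first
# truncation heuristic restated so it can be tried in isolation.
# `_sketch_truncate_pair` is a hypothetical helper; unlike `_truncate_seq_pair`
# it returns trimmed copies instead of mutating its arguments, and it clamps a
# negative budget to zero (an added assumption, not behavior of the original).
def _sketch_truncate_pair(tokens_a, tokens_b, max_length):
  """Returns copies of both lists whose combined length is <= max_length."""
  a, b = list(tokens_a), list(tokens_b)
  max_length = max(max_length, 0)
  while len(a) + len(b) > max_length:
    # Drop one token from the end of whichever list is currently longer;
    # on a tie the second list is shortened, as in the function above.
    if len(a) > len(b):
      a.pop()
    else:
      b.pop()
  return a, b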

def create_int_feature(values):
  f = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))
  return f

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  # In the demo, we are doing a simple classification task on the entire
  # segment.
  #
  # If you want to use the token-level output, use model.get_sequence_output()
  # instead.
  output_layer = model.get_pooled_output()

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities,model)
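
# Illustrative sketch, not part of the original file: the loss that
# `create_model` builds as a TensorFlow graph (softmax, one-hot labels, negative
# log-likelihood, mean over the batch), written with NumPy so the arithmetic
# can be checked by hand. `_sketch_softmax_xent` is a hypothetical helper.
import numpy as np

def _sketch_softmax_xent(logits, labels):
  """Returns (mean loss, per-example loss, probabilities)."""
  logits = np.asarray(logits, dtype=np.float64)   # [batch, num_labels]
  labels = np.asarray(labels, dtype=np.int64)     # [batch]
  # Subtract the row maximum before exponentiating for numerical stability.
  shifted = logits - logits.max(axis=-1, keepdims=True)
  log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
  one_hot = np.eye(logits.shape[-1])[labels]
  per_example_loss = -(one_hot * log_probs).sum(axis=-1)
  return per_example_loss.mean(), per_example_loss, np.exp(log_probs)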


tf.logging.set_verbosity(tf.logging.INFO)
processors = {
  "sentence_pair":SentencePairClassificationProcessor,
}
bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
task_name = FLAGS.task_name.lower()
print("task_name:",task_name)
processor = processors[task_name]()
label_list = processor.get_labels()
#lines_dev=processor.get_dev_examples("./TEXT_DIR")
index2label={i:label_list[i] for i in range(len(label_list))}
tokenizer = tokenization.FullTokenizer(vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)


def main(_):
    pass

# Initialize the model and the session.
# This code lives at module level, outside any function, so that it runs only once (at import time) rather than on every call to predict_online.
is_training=False
use_one_hot_embeddings=False
batch_size=1
num_labels=len(label_list)
gpu_config = tf.ConfigProto()
gpu_config.gpu_options.allow_growth = True
sess=tf.Session(config=gpu_config)
model=None
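input_ids_p,input_mask_p,label_ids_p,segment_ids_p=None,None,None,None
# FLAGS.init_checkpoint is treated as a directory; os.path.join works with or without a trailing slash.
if not os.path.exists(os.path.join(FLAGS.init_checkpoint, "checkpoint")):
    raise Exception("no checkpoint found under FLAGS.init_checkpoint: %s" % FLAGS.init_checkpoint)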

graph = tf.get_default_graph()
with graph.as_default():
    print("going to restore checkpoint")
    #sess.run(tf.global_variables_initializer())
    input_ids_p = tf.placeholder(tf.int32, [batch_size, FLAGS.max_seq_length], name="input_ids")
    input_mask_p = tf.placeholder(tf.int32, [batch_size, FLAGS.max_seq_length], name="input_mask")
    label_ids_p = tf.placeholder(tf.int32, [batch_size], name="label_ids")
    segment_ids_p = tf.placeholder(tf.int32, [FLAGS.max_seq_length], name="segment_ids")
    total_loss, per_example_loss, logits, probabilities,model = create_model(
        bert_config, is_training, input_ids_p, input_mask_p, segment_ids_p,
        label_ids_p, num_labels, use_one_hot_embeddings)
    saver = tf.train.Saver()
    saver.restore(sess, tf.train.latest_checkpoint(FLAGS.init_checkpoint))

def predict_online(line):
    """
    Run online prediction for a single instance.
    To predict a batch instead, change batch_size and the placeholder shapes accordingly.
    :param line: a list of the form [dummy_label, text_a, text_b]
    :return: (label_predict, possibility): the predicted label and the probability of each label
    """
    label = line[0] #tokenization.convert_to_unicode(line[0]) # must be one of the labels returned by the processor's get_labels().
    text_a = line[1] #tokenization.convert_to_unicode(line[1])
    text_b = line[2] #tokenization.convert_to_unicode(line[2])
    example= InputExample(guid=0, text_a=text_a, text_b=text_b, label=label)
    feature = convert_single_example(0, example, label_list,FLAGS.max_seq_length, tokenizer)
    input_ids = np.reshape([feature.input_ids],(1,FLAGS.max_seq_length))
    input_mask = np.reshape([feature.input_mask],(1,FLAGS.max_seq_length))
    segment_ids =  np.reshape([feature.segment_ids],(FLAGS.max_seq_length))
    label_ids =[feature.label_id]

    global graph
    with graph.as_default():
        feed_dict = {input_ids_p: input_ids, input_mask_p: input_mask,segment_ids_p:segment_ids,label_ids_p:label_ids}
        possibility = sess.run([probabilities], feed_dict)
        possibility=possibility[0][0] # probabilities of the first (and only) example in the batch
        label_index=np.argmax(possibility)
        label_predict=index2label[label_index]
        #print("label_predict:",label_predict,";possibility:",possibility)
    return label_predict,possibility

if __name__ == "__main__":
  example=['0','\u5165\u804c\u4e00\u5e74\u534a\u672a\u7b7e\u52b3\u52a8\u5408\u540c\u5c0f\u83f2\u6bd5\u4e1a\u4e8e\u67d0\u62a4\u6821\uff0c\u548c\u5176\u4ed6\u7684\u9ad8\u6821\u6bd5\u4e1a\u751f\u4e00\u6837\uff0c\u5979\u4e5f\u5f00\u59cb\u7740\u624b\u627e\u5de5\u4f5c\u3002\u5f88\u5feb\uff0c\u4e00\u5bb6\u6c11\u529e\u533b\u9662\u901a\u8fc7\u67d0\u62db\u8058\u7f51\u7ad9\u627e\u5230\u5c0f\u83f2\uff0c\u901a\u8fc7\u9762\u8bd5\u540e\uff0c\u5c0f\u83f2\u4fbf\u5f00\u59cb\u4e86\u81ea\u5df1\u7684\u804c\u573a\u751f\u6daf\u3002\u8f6c\u773c\u6bd5\u4e1a\u5de5\u4f5c\u8fd1\u4e00\u5e74\uff0c\u533b\u9662\u4ecd\u8fdf\u8fdf\u4e0d\u4e0e\u5176\u7b7e\u8ba2\u52b3\u52a8\u5408\u540c\uff0c\u5c0f\u83f2\u4e0e\u5355\u4f4d\u591a\u6b21\u6c9f\u901a\u534f\u5546\u672a\u679c\uff0c\u65e0\u5948\u5c06\u533b\u9662\u8bc9\u81f3\u6cd5\u9662\u000d\u000a','\u652f\u4ed8\u5de5\u8d44']
  result=predict_online(example)
  print("result:",result)



================================================
FILE: a00_Bert/tokenization.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Tokenization classes."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import unicodedata
import six
import tensorflow as tf


def convert_to_unicode(text):
  """Converts `text` to Unicode (if it's not already), assuming utf-8 input."""
  if six.PY3:
    if isinstance(text, str):
      return text
    elif isinstance(text, bytes):
      return text.decode("utf-8", "ignore")
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  elif six.PY2:
    if isinstance(text, str):
      return text.decode("utf-8", "ignore")
    elif isinstance(text, unicode):
      return text
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  else:
    raise ValueError("Not running on Python2 or Python 3?")


def printable_text(text):
  """Returns text encoded in a way suitable for print or `tf.logging`."""

  # These functions want `str` for both Python2 and Python3, but in one case
  # it's a Unicode string and in the other it's a byte string.
  if six.PY3:
    if isinstance(text, str):
      return text
    elif isinstance(text, bytes):
      return text.decode("utf-8", "ignore")
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  elif six.PY2:
    if isinstance(text, str):
      return text
    elif isinstance(text, unicode):
      return text.encode("utf-8")
    else:
      raise ValueError("Unsupported string type: %s" % (type(text)))
  else:
    raise ValueError("Not running on Python2 or Python 3?")


def load_vocab(vocab_file):
  """Loads a vocabulary file into a dictionary."""
  vocab = collections.OrderedDict()
  index = 0
  with tf.gfile.GFile(vocab_file, "r") as reader:
    while True:
      token = convert_to_unicode(reader.readline())
      if not token:
        break
      token = token.strip()
      vocab[token] = index
      index += 1
  return vocab


def convert_by_vocab(vocab, items):
  """Converts a sequence of [tokens|ids] using the vocab."""
  output = []
  for item in items:
    output.append(vocab[item])
  return output


def convert_tokens_to_ids(vocab, tokens):
  return convert_by_vocab(vocab, tokens)


def convert_ids_to_tokens(inv_vocab, ids):
  return convert_by_vocab(inv_vocab, ids)


def whitespace_tokenize(text):
  """Runs basic whitespace cleaning and splitting on a piece of text."""
  text = text.strip()
  if not text:
    return []
  tokens = text.split()
  return tokens


class FullTokenizer(object):
  """Runs end-to-end tokenization."""

  def __init__(self, vocab_file, do_lower_case=True):
    self.vocab = load_vocab(vocab_file)
    self.inv_vocab = {v: k for k, v in self.vocab.items()}
    self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case)
    self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab)

  def tokenize(self, text):
    split_tokens = []
    for token in self.basic_tokenizer.tokenize(text):
      for sub_token in self.wordpiece_tokenizer.tokenize(token):
        split_tokens.append(sub_token)

    return split_tokens

  def convert_tokens_to_ids(self, tokens):
    return convert_by_vocab(self.vocab, tokens)

  def convert_ids_to_tokens(self, ids):
    return convert_by_vocab(self.inv_vocab, ids)


class BasicTokenizer(object):
  """Runs basic tokenization (punctuation splitting, lower casing, etc.)."""

  def __init__(self, do_lower_case=True):
    """Constructs a BasicTokenizer.

    Args:
      do_lower_case: Whether to lower case the input.
    """
    self.do_lower_case = do_lower_case

  def tokenize(self, text):
    """Tokenizes a piece of text."""
    text = convert_to_unicode(text)
    text = self._clean_text(text)

    # This was added on November 1st, 2018 for the multilingual and Chinese
    # models. This is also applied to the English models now, but it doesn't
    # matter since the English models were not trained on any Chinese data
    # and generally don't have any Chinese data in them (there are Chinese
    # characters in the vocabulary because Wikipedia does have some Chinese
    # words in the English Wikipedia.).
    text = self._tokenize_chinese_chars(text)

    orig_tokens = whitespace_tokenize(text)
    split_tokens = []
    for token in orig_tokens:
      if self.do_lower_case:
        token = token.lower()
        token = self._run_strip_accents(token)
      split_tokens.extend(self._run_split_on_punc(token))

    output_tokens = whitespace_tokenize(" ".join(split_tokens))
    return output_tokens

  def _run_strip_accents(self, text):
    """Strips accents from a piece of text."""
    text = unicodedata.normalize("NFD", text)
    output = []
    for char in text:
      cat = unicodedata.category(char)
      if cat == "Mn":
        continue
      output.append(char)
    return "".join(output)

  def _run_split_on_punc(self, text):
    """Splits punctuation on a piece of text."""
    chars = list(text)
    i = 0
    start_new_word = True
    output = []
    while i < len(chars):
      char = chars[i]
      if _is_punctuation(char):
        output.append([char])
        start_new_word = True
      else:
        if start_new_word:
          output.append([])
        start_new_word = False
        output[-1].append(char)
      i += 1

    return ["".join(x) for x in output]

  def _tokenize_chinese_chars(self, text):
    """Adds whitespace around any CJK character."""
    output = []
    for char in text:
      cp = ord(char)
      if self._is_chinese_char(cp):
        output.append(" ")
        output.append(char)
        output.append(" ")
      else:
        output.append(char)
    return "".join(output)

  def _is_chinese_char(self, cp):
    """Checks whether CP is the codepoint of a CJK character."""
    # This defines a "chinese character" as anything in the CJK Unicode block:
    #   https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block)
    #
    # Note that the CJK Unicode block is NOT all Japanese and Korean characters,
    # despite its name. The modern Korean Hangul alphabet is a different block,
    # as is Japanese Hiragana and Katakana. Those alphabets are used to write
    # space-separated words, so they are not treated specially and are handled
    # like all of the other languages.
    if ((cp >= 0x4E00 and cp <= 0x9FFF) or  #
        (cp >= 0x3400 and cp <= 0x4DBF) or  #
        (cp >= 0x20000 and cp <= 0x2A6DF) or  #
        (cp >= 0x2A700 and cp <= 0x2B73F) or  #
        (cp >= 0x2B740 and cp <= 0x2B81F) or  #
        (cp >= 0x2B820 and cp <= 0x2CEAF) or
        (cp >= 0xF900 and cp <= 0xFAFF) or  #
        (cp >= 0x2F800 and cp <= 0x2FA1F)):  #
      return True

    return False

  def _clean_text(self, text):
    """Performs invalid character removal and whitespace cleanup on text."""
    output = []
    for char in text:
      cp = ord(char)
      if cp == 0 or cp == 0xfffd or _is_control(char):
        continue
      if _is_whitespace(char):
        output.append(" ")
      else:
        output.append(char)
    return "".join(output)


class WordpieceTokenizer(object):
  """Runs WordPiece tokenization."""

  def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=100):
    self.vocab = vocab
    self.unk_token = unk_token
    self.max_input_chars_per_word = max_input_chars_per_word

  def tokenize(self, text):
    """Tokenizes a piece of text into its word pieces.

    This uses a greedy longest-match-first algorithm to perform tokenization
    using the given vocabulary.

    For example:
      input = "unaffable"
      output = ["un", "##aff", "##able"]

    Args:
      text: A single token or whitespace separated tokens. This should have
        already been passed through `BasicTokenizer`.

    Returns:
      A list of wordpiece tokens.
    """

    text = convert_to_unicode(text)

    output_tokens = []
    for token in whitespace_tokenize(text):
      chars = list(token)
      if len(chars) > self.max_input_chars_per_word:
        output_tokens.append(self.unk_token)
        continue

      is_bad = False
      start = 0
      sub_tokens = []
      while start < len(chars):
        end = len(chars)
        cur_substr = None
        while start < end:
          substr = "".join(chars[start:end])
          if start > 0:
            substr = "##" + substr
          if substr in self.vocab:
            cur_substr = substr
            break
          end -= 1
        if cur_substr is None:
          is_bad = True
          break
        sub_tokens.append(cur_substr)
        start = end

      if is_bad:
        output_tokens.append(self.unk_token)
      else:
        output_tokens.extend(sub_tokens)
    return output_tokens
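

# Illustrative sketch, not part of the original BERT code: the greedy
# longest-match-first loop of `WordpieceTokenizer.tokenize`, reduced to a single
# word and a toy vocabulary. `_sketch_wordpiece` is a hypothetical helper; it
# skips the whitespace splitting and the max_input_chars_per_word check.
def _sketch_wordpiece(word, vocab, unk_token="[UNK]"):
  """Splits one word into word pieces, or returns [unk_token] if it cannot."""
  pieces = []
  start = 0
  while start < len(word):
    end = len(word)
    match = None
    # Try the longest remaining substring first and shrink it from the right.
    while start < end:
      candidate = word[start:end]
      if start > 0:
        candidate = "##" + candidate  # continuation pieces carry a "##" prefix
      if candidate in vocab:
        match = candidate
        break
      end -= 1
    if match is None:
      # No prefix of the remainder is in the vocabulary: the whole word is unknown.
      return [unk_token]
    pieces.append(match)
    start = end
  return pieces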


def _is_whitespace(char):
  """Checks whether `char` is a whitespace character."""
  # \t, \n, and \r are technically control characters but we treat them
  # as whitespace since they are generally considered as such.
  if char == " " or char == "\t" or char == "\n" or char == "\r":
    return True
  cat = unicodedata.category(char)
  if cat == "Zs":
    return True
  return False


def _is_control(char):
  """Checks whether `char` is a control character."""
  # These are technically control characters but we count them as whitespace
  # characters.
  if char == "\t" or char == "\n" or char == "\r":
    return False
  cat = unicodedata.category(char)
  if cat.startswith("C"):
    return True
  return False


def _is_punctuation(char):
  """Checks whether `char` is a punctuation character."""
  cp = ord(char)
  # We treat all non-letter/number ASCII as punctuation.
  # Characters such as "^", "$", and "`" are not in the Unicode
  # Punctuation class but we treat them as punctuation anyways, for
  # consistency.
  if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
      (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
    return True
  cat = unicodedata.category(char)
  if cat.startswith("P"):
    return True
  return False


================================================
FILE: a00_Bert/train_bert_multi-label.py
================================================
# coding=utf-8
"""
Train a BERT model for multi-label classification.

1. Load the training data, the vocabulary and the label dict.
2. Create the model.
3. Train the model and report the F1 score.
"""
import bert_modeling as modeling
import tensorflow as tf
import os
import numpy as np

from utils import load_data,init_label_dict,get_target_label_short,compute_confuse_matrix,compute_micro_macro,compute_confuse_matrix_batch,get_label_using_logits_batch,get_target_label_short_batch

FLAGS=tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("cache_file_h5py","../data/ieee_zhihu_cup/data.h5","path of training/validation/test data.") #../data/sample_multiple_label.txt
tf.app.flags.DEFINE_string("cache_file_pickle","../data/ieee_zhihu_cup/vocab_label.pik","path of vocabulary and label files") #../data/sample_multiple_label.txt

tf.app.flags.DEFINE_float("learning_rate",0.0001,"learning rate")
tf.app.flags.DEFINE_integer("batch_size", 256, "Batch size for training/evaluating.") # batch size 32-->128
tf.app.flags.DEFINE_string("ckpt_dir","checkpoint/","checkpoint location for the model")
tf.app.flags.DEFINE_boolean("is_training",True,"is training. true: training, false: testing/inference")
tf.app.flags.DEFINE_integer("num_epochs",15,"number of epochs to run.")

# the hyper-parameters below are for the bert model
# to train a big model,                          use hidden_size=768, num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072
# to train a middle-size model that trains fast, use hidden_size=128, num_hidden_layers=4, num_attention_heads=8, intermediate_size=1024
tf.app.flags.DEFINE_integer("hidden_size",128,"hidden size") # 768
tf.app.flags.DEFINE_integer("num_hidden_layers",2,"number of hidden layers") # 12--->4
tf.app.flags.DEFINE_integer("num_attention_heads",4,"number of attention heads") # 12
tf.app.flags.DEFINE_integer("intermediate_size",256,"intermediate size of hidden layer") # 3072-->512
tf.app.flags.DEFINE_integer("max_seq_length",200,"max sequence length")

def main(_):
    # 1. get training data and vocabulary & labels dict
    word2index, label2index, trainX, trainY, vaildX, vaildY, testX, testY = load_data(FLAGS.cache_file_h5py,FLAGS.cache_file_pickle)
    vocab_size = len(word2index)
    print("bert model.vocab_size:", vocab_size)
    num_labels = len(label2index)
    print("num_labels:", num_labels)
    cls_id = word2index['CLS']
    print("id of 'CLS':", cls_id)
    num_examples, FLAGS.max_seq_length = trainX.shape
    print("num_examples of training:", num_examples, ";max_seq_length:", FLAGS.max_seq_length)

    # 2. create model, define train operation
    bert_config = modeling.BertConfig(vocab_size=len(word2index), hidden_size=FLAGS.hidden_size, num_hidden_layers=FLAGS.num_hidden_layers,
                                      num_attention_heads=FLAGS.num_attention_heads,intermediate_size=FLAGS.intermediate_size)
    input_ids = tf.placeholder(tf.int32, [None, FLAGS.max_seq_length], name="input_ids") # FLAGS.batch_size
    input_mask = tf.placeholder(tf.int32, [None, FLAGS.max_seq_length], name="input_mask")
    segment_ids = tf.placeholder(tf.int32, [None,FLAGS.max_seq_length],name="segment_ids")
    label_ids = tf.placeholder(tf.float32, [None,num_labels], name="label_ids")
    is_training = tf.placeholder(tf.bool, name="is_training") # FLAGS.is_training

    use_one_hot_embeddings = False
    loss, per_example_loss, logits, probabilities, model = create_model(bert_config, is_training, input_ids, input_mask,
                                                            segment_ids, label_ids, num_labels,use_one_hot_embeddings)
    # define train operation
    #num_train_steps = int(float(num_examples) / float(FLAGS.batch_size * FLAGS.num_epochs)); use_tpu=False; num_warmup_steps = int(num_train_steps * 0.1)
    #train_op = optimization.create_optimizer(loss, FLAGS.learning_rate, num_train_steps, num_warmup_steps, use_tpu)
    global_step = tf.Variable(0, trainable=False, name="Global_Step")
    train_op = tf.contrib.layers.optimize_loss(loss, global_step=global_step, learning_rate=FLAGS.learning_rate,optimizer="Adam", clip_gradients=3.0)

    # 3. train the model: create a session, restore a checkpoint if one exists, then run the training loop
    gpu_config = tf.ConfigProto()
    gpu_config.gpu_options.allow_growth = True
    sess = tf.Session(config=gpu_config)
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    if os.path.exists(FLAGS.ckpt_dir + "checkpoint"):
        print("Checkpoint Exists. Restoring Variables from Checkpoint.")
        saver.restore(sess, tf.train.latest_checkpoint(FLAGS.ckpt_dir))
    number_of_training_data = len(trainX)
    iteration = 0
    curr_epoch = 0 #sess.run(textCNN.epoch_step)
    batch_size = FLAGS.batch_size
    for epoch in range(curr_epoch, FLAGS.num_epochs):
        loss_total, counter = 0.0, 0
        for start, end in zip(range(0, number_of_training_data, batch_size),range(batch_size, number_of_training_data, batch_size)):
            iteration = iteration + 1 ###
            input_mask_, segment_ids_, input_ids_=get_input_mask_segment_ids(trainX[start:end],cls_id) # input_ids_,input_mask_,segment_ids_
            feed_dict = {input_ids: input_ids_, input_mask: input_mask_, segment_ids:segment_ids_,
                         label_ids:trainY[start:end],is_training:True}
            curr_loss,_ = sess.run([loss,train_op], feed_dict)
            loss_total, counter = loss_total + curr_loss, counter + 1
            if counter % 30 == 0:
                print(epoch,"\t",iteration,"\tloss:",loss_total/float(counter),"\tcurrent_loss:",curr_loss)
            if counter % 300==0:
                print("input_ids[",start,"]:",input_ids_[0]);#print("trainY[start:end]:",trainY[start:end])
                try:
                    target_labels = get_target_label_short_batch(trainY[start:end]);#print("target_labels:",target_labels)
                    print("trainY[",start,"]:",target_labels[0])
                except Exception: # printing labels is for debugging only; it must not interrupt training
                    pass
            # evaluation
            if start!=0 and start % (1000 * FLAGS.batch_size) == 0:
                eval_loss, f1_score, f1_micro, f1_macro = do_eval(sess,input_ids,input_mask,segment_ids,label_ids,is_training,loss,
                                                                  probabilities,vaildX, vaildY, num_labels,batch_size,cls_id)
                print("Epoch %d Validation Loss:%.3f\tF1 Score:%.3f\tF1_micro:%.3f\tF1_macro:%.3f" % (
                    epoch, eval_loss, f1_score, f1_micro, f1_macro))
                # save model to checkpoint
                #if start % (4000 * FLAGS.batch_size)==0:
                save_path = FLAGS.ckpt_dir + "model.ckpt"
                print("Going to save model..")
                saver.save(sess, save_path, global_step=epoch)

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,labels, num_labels, use_one_hot_embeddings,reuse_flag=False):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  output_layer = model.get_pooled_output()
  hidden_size = output_layer.shape[-1].value
  with tf.variable_scope("weights",reuse=reuse_flag):
      output_weights = tf.get_variable("output_weights", [num_labels, hidden_size],initializer=tf.truncated_normal_initializer(stddev=0.02))
      output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    #if is_training:
    #    print("###create_model.is_training:",is_training)
    #    output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
    def apply_dropout_last_layer(output_layer):
        output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
        return output_layer

    def not_apply_dropout(output_layer):
        return output_layer

    output_layer=tf.cond(is_training, lambda: apply_dropout_last_layer(output_layer), lambda:not_apply_dropout(output_layer))
    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    print("output_layer:",output_layer.shape,";output_weights:",output_weights.shape,";logits:",logits.shape) # shape=(?, 1999)

    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.sigmoid(logits) #tf.nn.softmax(logits, axis=-1)
    per_example_loss=tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits) # shape=(?, 1999)
    loss_batch = tf.reduce_sum(per_example_loss,axis=1)  #  (?,)
    loss=tf.reduce_mean(loss_batch) #  scalar

    return loss, per_example_loss, logits, probabilities,model
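

# Illustrative sketch, not part of the original file: the multi-label loss built
# in `create_model` (element-wise sigmoid cross-entropy, summed over labels,
# averaged over the batch), written with NumPy. `_sketch_multilabel_loss` is a
# hypothetical helper; it uses the numerically stable form
# max(x, 0) - x * z + log(1 + exp(-|x|)) documented for
# tf.nn.sigmoid_cross_entropy_with_logits.
import numpy as np

def _sketch_multilabel_loss(logits, labels):
    """Returns (scalar loss, per-label loss of shape [batch, num_labels])."""
    x = np.asarray(logits, dtype=np.float64)
    z = np.asarray(labels, dtype=np.float64)   # multi-hot targets, same shape as x
    per_label = np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))
    per_example = per_label.sum(axis=1)        # sum over labels   -> [batch]
    return per_example.mean(), per_label       # mean over batch   -> scalar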


def do_eval(sess,input_ids,input_mask,segment_ids,label_ids,is_training,loss,probabilities,vaildX, vaildY, num_labels,batch_size,cls_id):
    """
    Evaluate the model on the first num_eval examples of the validation data.
    """
    num_eval=1000
    vaildX = vaildX[0:num_eval]
    vaildY = vaildY[0:num_eval]
    number_examples = len(vaildX)
    eval_loss, eval_counter, eval_f1_score, eval_p, eval_r = 0.0, 0, 0.0, 0.0, 0.0
    label_dict = init_label_dict(num_labels)
    print("do_eval.number_examples:",number_examples)
    f1_score_micro_sklearn_total=0.0
    # batch_size=1 # TODO
    for start, end in zip(range(0, number_examples, batch_size), range(batch_size, number_examples, batch_size)):
        input_mask_, segment_ids_, input_ids_ = get_input_mask_segment_ids(vaildX[start:end],cls_id)
        feed_dict = {input_ids: input_ids_,input_mask:input_mask_,segment_ids:segment_ids_,
                     label_ids:vaildY[start:end],is_training:False}
        curr_eval_loss, prob = sess.run([loss, probabilities],feed_dict)
        target_labels=get_target_label_short_batch(vaildY[start:end])
        predict_labels=get_label_using_logits_batch(prob)
        if start%100==0:
            print("prob.shape:",prob.shape,";prob:",prob)
            print("predict_labels:",predict_labels)

        #print("predict_labels:",predict_labels)
        label_dict=compute_confuse_matrix_batch(target_labels,predict_labels,label_dict,name='bert')
        eval_loss, eval_counter = eval_loss + curr_eval_loss, eval_counter + 1

    f1_micro, f1_macro = compute_micro_macro(label_dict)  # label_dict is a dict: key is the label, value is (TP,FP,FN), where TP is the number of true positives
    f1_score_result = (f1_micro + f1_macro) / 2.0
    return eval_loss / float(eval_counter+0.00001), f1_score_result, f1_micro, f1_macro
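

# Illustrative sketch, not part of the original file: one common way to turn
# per-label (TP, FP, FN) counts into micro and macro F1. `compute_micro_macro`
# is imported from utils and is not shown here, so `_sketch_micro_macro_f1` is a
# hypothetical stand-in whose details (the eps smoothing in particular) are
# assumptions and may differ from the real implementation.
def _sketch_micro_macro_f1(label_dict, eps=1e-8):
    """label_dict maps label -> (TP, FP, FN). Returns (f1_micro, f1_macro)."""
    def f1(tp, fp, fn):
        precision = tp / (tp + fp + eps)
        recall = tp / (tp + fn + eps)
        return 2 * precision * recall / (precision + recall + eps)

    if not label_dict:
        return 0.0, 0.0
    counts = list(label_dict.values())
    # Micro: pool the counts of all labels, then compute a single F1.
    micro = f1(sum(c[0] for c in counts), sum(c[1] for c in counts), sum(c[2] for c in counts))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / float(len(counts))
    return micro, macro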

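def get_input_mask_segment_ids(train_x_batch,cls_id):
    """
    Build input_mask, segment_ids and input_ids for a batch of input x.
    If the sequence length of input x is max_sequence_length, all three outputs have shape
    [batch_size, max_sequence_length]. input_mask is 0 for padding tokens and 1 everywhere else.
    NOTE: the mask is computed on the rows before the CLS token is inserted, while input_ids is shifted
    right by one, so the last real token of a padded row ends up with mask 0.
    :param train_x_batch: 2-D array of token ids, padded with 0 at the end of each row
    :param cls_id: id of the CLS token that is inserted at position 0 of every row
    :return: input_mask, segment_ids, input_ids
    """
    batch_size,max_sequence_length=train_x_batch.shape
    input_mask=np.ones((batch_size,max_sequence_length),dtype=np.int32)
    # set 0 for tokens in padding positions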
    for i in range(batch_size):
        input_x_=train_x_batch[i] # a list, length is max_sequence_length
        input_x=list(input_x_)
        for j in range(len(input_x)):
            if input_x[j]==0:
                input_mask[i][j:]=0
                break
    # insert CLS token for classification
    input_ids=np.zeros((batch_size,max_sequence_length),dtype=np.int32)
    #print("input_ids.shape1:",input_ids.shape)
    for k in range(batch_size):
        input_id_list=list(train_x_batch[k])
        input_id_list.insert(0,cls_id)
        del input_id_list[-1]
        input_ids[k]=input_id_list
    #print("input_ids.shape2:",input_ids.shape)

    segment_ids=np.ones((batch_size,max_sequence_length),dtype=np.int32)
    return input_mask, segment_ids,input_ids
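
# A pure-Python sketch of what get_input_mask_segment_ids computes for one row, useful
# for checking shapes by hand. sketch_mask_and_cls_row is a hypothetical helper that
# mirrors the loops above; it assumes 0 is the padding id, as the function does.
def sketch_mask_and_cls_row(row, cls_id):
    """Return (input_mask, segment_ids, input_ids) for a single padded row (a list)."""
    row = list(row)
    # mask: 1 up to (not including) the first padding id, 0 afterwards
    first_pad = row.index(0) if 0 in row else len(row)
    input_mask = [1] * first_pad + [0] * (len(row) - first_pad)
    # input ids: prepend cls_id and drop the last position to keep the length fixed
    input_ids = [cls_id] + row[:-1]
    segment_ids = [1] * len(row)
    return input_mask, segment_ids, input_ids

# e.g. sketch_mask_and_cls_row([7, 8, 0, 0], cls_id=2) gives
# input_mask [1, 1, 0, 0], segment_ids [1, 1, 1, 1], input_ids [2, 7, 8, 0];
# the mask is built from the row before the CLS token shifts it right by one.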

#train_x_batch=np.ones((3,5))
#train_x_batch[0,4]=0
#train_x_batch[1,3]=0
#train_x_batch[1,4]=0
#cls_id=2
#print("train_x_batch:",train_x_batch)
#input_mask, segment_ids,input_ids=get_input_mask_segment_ids(train_x_batch,cls_id)
#print("input_mask:",input_mask, "segment_ids:",segment_ids,"input_ids:",input_ids)

if __name__ == "__main__":
    tf.app.run()

================================================
FILE: a00_Bert/train_bert_toy_task.py
================================================
# coding=utf-8
"""
train bert model
"""
import modeling
import tensorflow as tf
import numpy as np
import argparse

parser = argparse.ArgumentParser(description='Train a BERT classifier on a toy task with dummy data')
parser.add_argument('-batch_size', '--batch_size', type=int,default=128)
args = parser.parse_args()
batch_size=args.batch_size
print("batch_size:",batch_size)
def bert_train_fn():
    is_training=True
    hidden_size = 768
    num_labels = 10
    #batch_size=128
    max_seq_length=512
    use_one_hot_embeddings = False
    bert_config = modeling.BertConfig(vocab_size=21128, hidden_size=hidden_size, num_hidden_layers=12,
                                      num_attention_heads=12,intermediate_size=3072)

    input_ids = tf.placeholder(tf.int32, [batch_size, max_seq_length], name="input_ids")
    input_mask = tf.placeholder(tf.int32, [batch_size, max_seq_length], name="input_mask")
    segment_ids = tf.placeholder(tf.int32, [batch_size,max_seq_length],name="segment_ids")
    label_ids = tf.placeholder(tf.float32, [batch_size,num_labels], name="label_ids")
    loss, per_example_loss, logits, probabilities, model = create_model(bert_config, is_training, input_ids, input_mask,
                                                                        segment_ids, label_ids, num_labels,
                                                                        use_one_hot_embeddings)
    # 1. generate or load training/validation/test data. e.g. train:(X,y). X is input_ids,y is labels.

    # 2. train the model by calling create model, get loss
    gpu_config = tf.ConfigProto()
    gpu_config.gpu_options.allow_growth = True
    sess = tf.Session(config=gpu_config)
    sess.run(tf.global_variables_initializer())
    for i in range(1000):
        input_ids_=np.ones((batch_size,max_seq_length),dtype=np.int32)
        input_mask_=np.ones((batch_size,max_seq_length),dtype=np.int32)
        segment_ids_=np.ones((batch_size,max_seq_length),dtype=np.int32)
        label_ids_=np.ones((batch_size,num_labels),dtype=np.float32)
        feed_dict = {input_ids: input_ids_, input_mask: input_mask_,segment_ids:segment_ids_,label_ids:label_ids_}
        loss_ = sess.run([loss], feed_dict)
        print("loss:",loss_)
    # 3. eval the model from time to time

def bert_predict_fn():
    # 1. predict based on
    pass

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  output_layer = model.get_pooled_output()
  hidden_size = output_layer.shape[-1].value
  output_weights = tf.get_variable("output_weights", [num_labels, hidden_size],initializer=tf.truncated_normal_initializer(stddev=0.02))
  output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:  # if training, add dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    print("output_layer:",output_layer.shape,";output_weights:",output_weights.shape,";logits:",logits.shape)

    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.sigmoid(logits)  # multi-label: independent per-label probabilities, consistent with the sigmoid loss below
    per_example_loss=tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    loss = tf.reduce_mean(per_example_loss)

    return loss, per_example_loss, logits, probabilities,model
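
# A sketch of the element-wise loss that tf.nn.sigmoid_cross_entropy_with_logits is
# documented to compute, max(x, 0) - x * z + log(1 + exp(-abs(x))), written in plain
# Python so it can be checked by hand. sketch_sigmoid_cross_entropy is a hypothetical name.
import math

def sketch_sigmoid_cross_entropy(logit, label):
    """Numerically stable binary cross-entropy for one logit x and one target z in [0, 1]."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# With logit 0.0 the loss is log(2) for either target; averaging over all
# [batch_size, num_labels] entries corresponds to the tf.reduce_mean call above.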

bert_train_fn()


================================================
FILE: a00_Bert/unused/run_classifier_multi_labels_bert.py
================================================
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT finetuning runner."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import csv
import os
from . import bert_modeling as modeling
from . import optimization
from . import tokenization
import tensorflow as tf

flags = tf.flags

FLAGS = flags.FLAGS

## Required parameters
flags.DEFINE_string(
    "data_dir", None,
    "The input data dir. Should contain the .tsv files (or other data files) "
    "for the task.")

flags.DEFINE_string(
    "bert_config_file", None,
    "The config json file corresponding to the pre-trained BERT model. "
    "This specifies the model architecture.")

flags.DEFINE_string("task_name", None, "The name of the task to train.")

flags.DEFINE_string("vocab_file", None,
                    "The vocabulary file that the BERT model was trained on.")

flags.DEFINE_string(
    "output_dir", None,
    "The output directory where the model checkpoints will be written.")

## Other parameters

flags.DEFINE_string(
    "init_checkpoint", None,
    "Initial checkpoint (usually from a pre-trained BERT model).")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

flags.DEFINE_integer(
    "max_seq_length", 128,
    "The maximum total input sequence length after WordPiece tokenization. "
    "Sequences longer than this will be truncated, and sequences shorter "
    "than this will be padded.")

flags.DEFINE_bool("do_train", False, "Whether to run training.")

flags.DEFINE_bool("do_eval", False, "Whether to run eval on the dev set.")

flags.DEFINE_bool(
    "do_predict", False,
    "Whether to run the model in inference mode on the test set.")

flags.DEFINE_integer("train_batch_size", 32, "Total batch size for training.")

flags.DEFINE_integer("eval_batch_size", 8, "Total batch size for eval.")

flags.DEFINE_integer("predict_batch_size", 8, "Total batch size for predict.")

flags.DEFINE_float("learning_rate", 5e-5, "The initial learning rate for Adam.")

flags.DEFINE_float("num_train_epochs", 3.0,
                   "Total number of training epochs to perform.")

flags.DEFINE_float(
    "warmup_proportion", 0.1,
    "Proportion of training to perform linear learning rate warmup for. "
    "E.g., 0.1 = 10% of training.")

flags.DEFINE_integer("save_checkpoints_steps", 1000,
                     "How often to save the model checkpoint.")

flags.DEFINE_integer("iterations_per_loop", 1000,
                     "How many steps to make in each estimator call.")

flags.DEFINE_bool("use_tpu", False, "Whether to use TPU or GPU/CPU.")

tf.flags.DEFINE_string(
    "tpu_name", None,
    "The Cloud TPU to use for training. This should be either the name "
    "used when creating the Cloud TPU, or a grpc://ip.address.of.tpu:8470 "
    "url.")

tf.flags.DEFINE_string(
    "tpu_zone", None,
    "[Optional] GCE zone where the Cloud TPU is located in. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string(
    "gcp_project", None,
    "[Optional] Project name for the Cloud TPU-enabled project. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

tf.flags.DEFINE_string("master", None, "[Optional] TensorFlow master URL.")

flags.DEFINE_integer(
    "num_tpu_cores",    8,
    "Only used if `use_tpu` is True. Total number of TPU cores to use.")


# task-specific parameters (sentiment analysis)
flags.DEFINE_integer("num_classes", 80, "Total number of labels for sentiment analysis")
flags.DEFINE_integer("num_aspects", 20, "Total number of aspects")

flags.DEFINE_list("aspect_value_list", [-2,-1,0,1], "Values that an aspect can have")
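
# A sketch of how `warmup_proportion` and `num_train_epochs` might translate into step
# counts and a warmed-up learning rate. The linear-warmup-only schedule and the names
# sketch_warmup_steps / sketch_warmup_lr are assumptions for illustration; the actual
# schedule lives in optimization.create_optimizer.
def sketch_warmup_steps(num_examples, batch_size, num_epochs, warmup_proportion):
  """Return (num_train_steps, num_warmup_steps) as integers."""
  num_train_steps = int(float(num_examples) / batch_size * num_epochs)
  num_warmup_steps = int(num_train_steps * warmup_proportion)
  return num_train_steps, num_warmup_steps


def sketch_warmup_lr(step, base_lr, num_warmup_steps):
  """Scale base_lr linearly from 0 to base_lr over the first num_warmup_steps steps."""
  if num_warmup_steps > 0 and step < num_warmup_steps:
    return base_lr * float(step) / float(num_warmup_steps)
  return base_lr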



class InputExample(object):
  """A single training/test example for simple sequence classification."""

  def __init__(self, guid, text_a, text_b=None, label=None):
    """Constructs a InputExample.

    Args:
      guid: Unique id for the example.
      text_a: string. The untokenized text of the first sequence. For single
        sequence tasks, only this sequence must be specified.
      text_b: (Optional) string. The untokenized text of the second sequence.
        Only must be specified for sequence pair tasks.
      label: (Optional) string. The label of the example. This should be
        specified for train and dev examples, but not for test examples.
    """
    self.guid = guid
    self.text_a = text_a
    self.text_b = text_b
    self.label = label


class InputFeatures(object):
  """A single set of features of data."""

  def __init__(self, input_ids, input_mask, segment_ids, label_id):
    self.input_ids = input_ids
    self.input_mask = input_mask
    self.segment_ids = segment_ids
    self.label_id = label_id


class DataProcessor(object):
  """Base class for data converters for sequence classification data sets."""

  def get_train_examples(self, data_dir):
    """Gets a collection of `InputExample`s for the train set."""
    raise NotImplementedError()

  def get_dev_examples(self, data_dir):
    """Gets a collection of `InputExample`s for the dev set."""
    raise NotImplementedError()

  def get_test_examples(self, data_dir):
    """Gets a collection of `InputExample`s for prediction."""
    raise NotImplementedError()

  def get_labels(self):
    """Gets the list of labels for this data set."""
    raise NotImplementedError()

  @classmethod
  def _read_tsv(cls, input_file, quotechar=None):
    """Reads a tab separated value file."""
    with tf.gfile.Open(input_file, "r") as f:
      reader = csv.reader(f, delimiter="\t", quotechar=quotechar)
      lines = []
      for line in reader:
        lines.append(line)
      return lines
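
# A small sketch of what `_read_tsv` returns, using an in-memory buffer instead of
# tf.gfile so it runs without TensorFlow. sketch_read_tsv and the sample rows are
# hypothetical; the csv.reader arguments match the call above.
import csv
import io


def sketch_read_tsv(text, quotechar=None):
  """Parse tab-separated text into a list of rows (each row a list of strings)."""
  reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=quotechar)
  return [row for row in reader]

# e.g. sketch_read_tsv(u"label\ttext\n1\tgood movie\n") gives
# [["label", "text"], ["1", "good movie"]]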


class XnliProcessor(DataProcessor):
  """Processor for the XNLI data set."""

  def __init__(self):
    self.language = "zh"

  def get_train_examples(self, data_dir):
    """See base class."""
    lines = self._read_tsv(
        os.path.join(data_dir, "multinli",
                     "multinli.train.%s.tsv" % self.language))
    examples = []
    for (i, line) in enumerate(lines):
      if i == 0:
        continue
      guid = "train-%d" % (i)
      text_a = tokenization.convert_to_unicode(line[0])
      text_b = tokenization.convert_to_unicode(line[1])
      label = tokenization.convert_to_unicode(line[2])
      if label == tokenization.convert_to_unicode("contradictory"):
        label = tokenization.convert_to_unicode("contradiction")
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples

  def get_dev_examples(self, data_dir):
    """See base class."""
    lines = self._read_tsv(os.path.join(data_dir, "xnli.dev.tsv"))
    examples = []
    for (i, line) in enumerate(lines):
      if i == 0:
        continue
      guid = "dev-%d" % (i)
      language = tokenization.convert_to_unicode(line[0])
      if language != tokenization.convert_to_unicode(self.language):
        continue
      text_a = tokenization.convert_to_unicode(line[6])
      text_b = tokenization.convert_to_unicode(line[7])
      label = tokenization.convert_to_unicode(line[1])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples

  def get_labels(self):
    """See base class."""
    return ["contradiction", "entailment", "neutral"]


class MnliProcessor(DataProcessor):
  """Processor for the MultiNLI data set (GLUE version)."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev_matched.tsv")),
        "dev_matched")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test_matched.tsv")), "test")

  def get_labels(self):
    """See base class."""
    return ["contradiction", "entailment", "neutral"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      if i == 0:
        continue
      guid = "%s-%s" % (set_type, tokenization.convert_to_unicode(line[0]))
      text_a = tokenization.convert_to_unicode(line[8])
      text_b = tokenization.convert_to_unicode(line[9])
      if set_type == "test":
        label = "contradiction"
      else:
        label = tokenization.convert_to_unicode(line[-1])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


class MrpcProcessor(DataProcessor):
  """Processor for the MRPC data set (GLUE version)."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      if i == 0:
        continue
      guid = "%s-%s" % (set_type, i)
      text_a = tokenization.convert_to_unicode(line[3])
      text_b = tokenization.convert_to_unicode(line[4])
      if set_type == "test":
        label = "0"
      else:
        label = tokenization.convert_to_unicode(line[0])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples


class ColaProcessor(DataProcessor):
  """Processor for the CoLA data set (GLUE version)."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      # Only the test set has a header
      if set_type == "test" and i == 0:
        continue
      guid = "%s-%s" % (set_type, i)
      if set_type == "test":
        text_a = tokenization.convert_to_unicode(line[1])
        label = "0"
      else:
        text_a = tokenization.convert_to_unicode(line[3])
        label = tokenization.convert_to_unicode(line[1])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples

class SentimentAnalysisFineGrainProcessor(DataProcessor):
  """Processor for the fine-grained (aspect-level) sentiment analysis data set; multi-label."""

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

  def get_labels(self):
    """See base class."""
    label_list=[]
    #num_aspect=FLAGS.num_aspects
    aspect_value_list=FLAGS.aspect_value_list #[-2,-1,0,1]
    for i in range(20):
        for value in aspect_value_list:
            label_list.append(str(i) + "_" + str(value))
    return label_list # ['0_-2', '0_-1', '0_0', '0_1', ..., '19_-2', '19_-1', '19_0', '19_1']; the position in the list is the label id, e.g. '0_-2' -> 0 and '19_1' -> 79

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      # Only the test set has a header
      if set_type == "test" and i == 0:
        continue
      guid = "%s-%s" % (set_type, i)
      #if set_type == "test":
      #  text_a = tokenization.convert_to_unicode(line[1])
      #  label = "0"
      #else:
      #  text_a = tokenization.convert_to_unicode(line[3])
      #  label = tokenization.convert_to_unicode(line[1])
      label = tokenization.convert_to_unicode(line[0])
      text_a = tokenization.convert_to_unicode(line[1])

      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples
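
# A sketch of the label space built by SentimentAnalysisFineGrainProcessor.get_labels and
# of the multi-hot target that a comma-separated label string maps to. The function names
# are hypothetical, and the default arguments (20 aspects, values -2..1) are assumptions
# taken from the flag defaults in this file.
def sketch_aspect_labels(num_aspects=20, aspect_values=(-2, -1, 0, 1)):
  """Return labels such as '0_-2', '0_-1', ..., '19_1' in index order."""
  return [str(i) + "_" + str(v) for i in range(num_aspects) for v in aspect_values]


def sketch_multi_hot(label_string, label_list):
  """Turn a comma-separated label string into a 0/1 vector over label_list."""
  label_map = {label: i for (i, label) in enumerate(label_list)}
  vector = [0] * len(label_list)
  for label in label_string.split(","):
    vector[label_map[label]] = 1
  return vector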

class SentencePairClassificationProcessor(DataProcessor):
  """Processor for the internal data set. sentence pair classification"""
  def __init__(self):
    self.language = "zh"

  def get_train_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

  def get_dev_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

  def get_test_examples(self, data_dir):
    """See base class."""
    return self._create_examples(
        self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

  def get_labels(self):
    """See base class."""
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      if i == 0:
        continue
      guid = "%s-%s" % (set_type, i)
      label = tokenization.convert_to_unicode(line[0])
      text_a = tokenization.convert_to_unicode(line[1])
      text_b = tokenization.convert_to_unicode(line[2])
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
    return examples

def convert_single_example(ex_index, example, label_list, max_seq_length,
                           tokenizer):
  """Converts a single `InputExample` into a single `InputFeatures`."""
  label_map = {}
  for (i, label) in enumerate(label_list):
    label_map[label] = i

  tokens_a = tokenizer.tokenize(example.text_a)
  tokens_b = None
  if example.text_b:
    tokens_b = tokenizer.tokenize(example.text_b)

  if tokens_b:
    # Modifies `tokens_a` and `tokens_b` in place so that the total
    # length is less than the specified length.
    # Account for [CLS], [SEP], [SEP] with "- 3"
    _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
  else:
    # Account for [CLS] and [SEP] with "- 2"
    if len(tokens_a) > max_seq_length - 2:
      tokens_a = tokens_a[0:(max_seq_length - 2)]

  # The convention in BERT is:
  # (a) For sequence pairs:
  #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
  #  type_ids: 0     0  0    0    0     0       0 0     1  1  1  1   1 1
  # (b) For single sequences:
  #  tokens:   [CLS] the dog is hairy . [SEP]
  #  type_ids: 0     0   0   0  0     0 0
  #
  # Where "type_ids" are used to indicate whether this is the first
  # sequence or the second sequence. The embedding vectors for `type=0` and
  # `type=1` were learned during pre-training and are added to the wordpiece
  # embedding vector (and position vector). This is not *strictly* necessary
  # since the [SEP] token unambiguously separates the sequences, but it makes
  # it easier for the model to learn the concept of sequences.
  #
  # For classification tasks, the first vector (corresponding to [CLS]) is
  # used as as the "sentence vector". Note that this only makes sense because
  # the entire model is fine-tuned.
  tokens = []
  segment_ids = []
  tokens.append("[CLS]")
  segment_ids.append(0)
  for token in tokens_a:
    tokens.append(token)
    segment_ids.append(0)
  tokens.append("[SEP]")
  segment_ids.append(0)

  if tokens_b:
    for token in tokens_b:
      tokens.append(token)
      segment_ids.append(1)
    tokens.append("[SEP]")
    segment_ids.append(1)

  input_ids = tokenizer.convert_tokens_to_ids(tokens)

  # The mask has 1 for real tokens and 0 for padding tokens. Only real
  # tokens are attended to.
  input_mask = [1] * len(input_ids)

  # Zero-pad up to the sequence length.
  while len(input_ids) < max_seq_length:
    input_ids.append(0)
    input_mask.append(0)
    segment_ids.append(0)

  assert len(input_ids) == max_seq_length
  assert len(input_mask) == max_seq_length
  assert len(segment_ids) == max_seq_length
  #print("label_map:",label_map,";length of label_map:",len(label_map))
  label_id=None
  if "," in example.label: # multiple labels
      # get the list of label ids
      label_id_list=[]
      labels_of_example=example.label.split(",") # separate name, so the label_list argument is not shadowed
      for label_ in labels_of_example:
          label_id_list.append(label_map[label_])
      #print("label_id_list:",label_id_list)
      # convert to multi-hot style
      label_id=[0 for _ in range(len(label_map))]
      for label_index in label_id_list:
          label_id[label_index]=1
  else: # single label
      label_id = label_map[example.label]
  if ex_index < 5:
    tf.logging.info("*** Example ***")
    tf.logging.info("guid: %s" % (example.guid))
    tf.logging.info("tokens: %s" % " ".join(
        [tokenization.printable_text(x) for x in tokens]))
    tf.logging.info("input_ids: %s" % " ".join([str(x) for x in input_ids]))
    tf.logging.info("input_mask: %s" % " ".join([str(x) for x in input_mask]))
    tf.logging.info("segment_ids: %s" % " ".join([str(x) for x in segment_ids]))
    if "," in example.label: tf.logging.info("label: %s (id_list = %s)" % (str(example.label), str(label_id_list))) # if label_id is a list, try print multi-hot value: label_id_list
    tf.logging.info("label: %s (id = %s)" % (str(example.label), str(label_id))) # %d

  feature = InputFeatures(
      input_ids=input_ids,
      input_mask=input_mask,
      segment_ids=segment_ids,
      label_id=label_id)
  return feature
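
# A sketch of the sequence layout built in convert_single_example, with plain Python
# lists standing in for the tokenizer output. sketch_build_inputs is a hypothetical
# helper; it returns tokens rather than vocabulary ids, and pads with "[PAD]" where the
# code above pads input_ids with 0.
def sketch_build_inputs(tokens_a, tokens_b, max_seq_length):
  """Return (tokens, input_mask, segment_ids), each of length max_seq_length."""
  tokens = ["[CLS]"] + list(tokens_a) + ["[SEP]"]
  segment_ids = [0] * len(tokens)
  if tokens_b:
    tokens += list(tokens_b) + ["[SEP]"]
    segment_ids += [1] * (len(tokens_b) + 1)
  input_mask = [1] * len(tokens)
  # pad all three lists up to max_seq_length; assumes the inputs were already truncated
  pad = max_seq_length - len(tokens)
  tokens += ["[PAD]"] * pad
  input_mask += [0] * pad
  segment_ids += [0] * pad
  return tokens, input_mask, segment_ids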


def file_based_convert_examples_to_features(
    examples, label_list, max_seq_length, tokenizer, output_file):
  """Convert a set of `InputExample`s to a TFRecord file."""

  writer = tf.python_io.TFRecordWriter(output_file)

  for (ex_index, example) in enumerate(examples):
    if ex_index % 10000 == 0:
      tf.logging.info("Writing example %d of %d" % (ex_index, len(examples)))

    feature = convert_single_example(ex_index, example, label_list,
                                     max_seq_length, tokenizer)

    def create_int_feature(values):
      f = tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))
      return f

    features = collections.OrderedDict()
    features["input_ids"] = create_int_feature(feature.input_ids)
    features["input_mask"] = create_int_feature(feature.input_mask)
    features["segment_ids"] = create_int_feature(feature.segment_ids)

    # if feature.label_id is already a list, then no need to add [].
    if isinstance(feature.label_id, list):
        label_ids=feature.label_id
    else:
        label_ids = [feature.label_id]
    features["label_ids"] = create_int_feature(label_ids)

    tf_example = tf.train.Example(features=tf.train.Features(feature=features))
    writer.write(tf_example.SerializeToString())

  # flush and close the TFRecord file once all examples have been written
  writer.close()


def file_based_input_fn_builder(input_file, seq_length, is_training,
                                drop_remainder):
  """Creates an `input_fn` closure to be passed to TPUEstimator."""
  # task-specific parameter: label_ids is a multi-hot vector of fixed length num_classes
  name_to_features = {
      "input_ids": tf.FixedLenFeature([seq_length], tf.int64),
      "input_mask": tf.FixedLenFeature([seq_length], tf.int64),
      "segment_ids": tf.FixedLenFeature([seq_length], tf.int64),
      "label_ids": tf.FixedLenFeature([FLAGS.num_classes], tf.int64), # fixed length: one entry per class
  }

  def _decode_record(record, name_to_features):
    """Decodes a record to a TensorFlow example."""
    example = tf.parse_single_example(record, name_to_features)

    # tf.Example only supports tf.int64, but the TPU only supports tf.int32.
    # So cast all int64 to int32.
    for name in list(example.keys()):
      t = example[name]
      if t.dtype == tf.int64:
        t = tf.to_int32(t)
      example[name] = t

    return example

  def input_fn(params):
    """The actual input function."""
    batch_size = params["batch_size"]

    # For training, we want a lot of parallel reading and shuffling.
    # For eval, we want no shuffling and parallel reading doesn't matter.
    d = tf.data.TFRecordDataset(input_file)
    if is_training:
      d = d.repeat()
      d = d.shuffle(buffer_size=100)

    d = d.apply(
        tf.contrib.data.map_and_batch(
            lambda record: _decode_record(record, name_to_features),
            batch_size=batch_size,
            drop_remainder=drop_remainder))

    return d

  return input_fn


def _truncate_seq_pair(tokens_a, tokens_b, max_length):
  """Truncates a sequence pair in place to the maximum length."""

  # This is a simple heuristic which will always truncate the longer sequence
  # one token at a time. This makes more sense than truncating an equal percent
  # of tokens from each, since if one sequence is very short then each token
  # that's truncated likely contains more information than a longer sequence.
  while True:
    total_length = len(tokens_a) + len(tokens_b)
    if total_length <= max_length:
      break
    if len(tokens_a) > len(tokens_b):
      tokens_a.pop()
    else:
      tokens_b.pop()
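
# A worked example of the longer-first heuristic in _truncate_seq_pair, written as a
# non-mutating variant so it is easy to test. sketch_truncate_pair is a hypothetical name.
def sketch_truncate_pair(tokens_a, tokens_b, max_length):
  """Return truncated copies of both lists whose combined length is <= max_length."""
  a, b = list(tokens_a), list(tokens_b)
  while len(a) + len(b) > max_length:
    # drop one token from the end of whichever list is currently longer (b on ties)
    if len(a) > len(b):
      a.pop()
    else:
      b.pop()
  return a, b

# e.g. with five tokens in the first list, two in the second and max_length=4, only the
# first list is shortened: the result keeps two tokens from each.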


def create_model_original(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  # In the demo, we are doing a simple classification task on the entire
  # segment.
  #
  # If you want to use the token-level output, use model.get_sequence_output()
  # instead.
  output_layer = model.get_pooled_output() # get the pooled output from the backbone (BERT) model

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable( # weights of the classification layer, specific to the classification model
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable( # bias specific to the classification model
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True) # classification layer specific to the classification model
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1) # cross-entropy, summed over classes
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)
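
# A plain-Python sketch of the single-label loss in create_model_original:
# -sum(one_hot(label) * log_softmax(logits)), which reduces to minus the log-probability
# of the true class. sketch_softmax_cross_entropy is a hypothetical name.
import math


def sketch_softmax_cross_entropy(logits, label):
  """Cross-entropy for one example given a list of logits and an integer class id."""
  shift = max(logits)  # subtract the max for numerical stability
  log_norm = shift + math.log(sum(math.exp(x - shift) for x in logits))
  log_probs = [x - log_norm for x in logits]
  one_hot = [1.0 if i == label else 0.0 for i in range(len(logits))]
  return -sum(o * lp for (o, lp) in zip(one_hot, log_probs))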

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  # In the demo, we are doing a simple classification task on the entire
  # segment.
  #
  # If you want to use the token-level output, use model.get_sequence_output()
  # instead.
  output_layer = model.get_pooled_output() # get the pooled output from the backbone (BERT) model

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable( # weights of the classification layer, specific to the classification model
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable( # bias specific to the classification model
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    logits = tf.matmul(output_layer, output_weights, transpose_b=True) # classification layer specific to the classification model
    logits = tf.nn.bias_add(logits, output_bias)

    #print("labels:",labels,";logits:",logits,"isinstance(labels,list):",isinstance(labels,list))
    # multi-label classification: labels are multi-hot, so use sigmoid to turn each logit into an independent probability
    probabilities=tf.nn.sigmoid(logits)
    #log_probs=tf.log(probabilities)
    labels=tf.cast(labels,tf.float32)
    #  below is for single label classification
    #  one-hot for single label classification
    #  probabilities = tf.nn.softmax(logits, axis=-1)
    #log_probs = tf.nn.log_softmax(logits, axis=-1)
    #  one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    print("num_labels:",num_labels,";logits:",logits,";labels:",labels)
    #print("log_probs:",log_probs)
    #per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1) # cross-entropy, summed over classes
    per_example_loss=tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    loss = tf.reduce_mean(per_example_loss)

    return (loss, per_example_loss, logits, probabilities)
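
# A sketch contrasting the two ways of turning logits into probabilities that appear in
# this file: softmax (one label per example, as in create_model_original) and element-wise
# sigmoid (multi-label, as in create_model). The function names are hypothetical.
import math


def sketch_softmax(logits):
  """Probabilities over mutually exclusive classes; they sum to 1."""
  shift = max(logits)
  exps = [math.exp(x - shift) for x in logits]
  total = sum(exps)
  return [e / total for e in exps]


def sketch_sigmoid(logits):
  """Independent per-label probabilities; they need not sum to 1."""
  def one(x):
    # branch on the sign so math.exp never receives a large positive argument
    if x >= 0:
      return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
  return [one(x) for x in logits]

# e.g. for logits [2.0, 2.0, -2.0] softmax splits the mass between the first two labels
# (each below 0.5), while sigmoid gives both about 0.88, so both labels can be predicted.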


def model_fn_builder(bert_config, num_labels, init_checkpoint, learning_rate,
                     num_train_steps, num_warmup_steps, use_tpu,
                     use_one_hot_embeddings):
  """Returns `model_fn` closure for TPUEstimator."""

  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument
    """The `model_fn` for TPUEstimator."""

    tf.logging.info("*** Features ***")
    for name in sorted(features.keys()):
      tf.logging.info("  name = %s, shape = %s" % (name, features[name].shape))

    input_ids = features["input_ids"]
    input_mask = features["input_mask"]
    segment_ids = features["segment_ids"]
    label_ids = features["label_ids"]

    is_training = (mode == tf.estimator.ModeKeys.TRAIN)

    (total_loss, per_example_loss, logits, probabilities) = create_model(
        bert_config, is_training, input_ids, input_mask, segment_ids, label_ids,
        num_labels, use_one_hot_embeddings)

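    tvars = tf.trainable_variables()

    # default to empty so the logging loop below also works when no init_checkpoint is given
    initialized_variable_names = {}
    scaffold_fn = None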
    if init_checkpoint:
      (assignment_map, initialized_variable_names
      ) = modeling.get_assignment_map_from_checkpoint(tvars, init_checkpoint)
      if use_tpu:

        def tpu_scaffold():
          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
          return tf.train.Scaffold()

        scaffold_fn = tpu_scaffold
      else:
        tf.train.init_from_checkpoint(init_checkpoint, assignment_map)

    tf.logging.info("**** Trainable Variables ****")
    for var in tvars:
      init_string = ""
      if var.name in initialized_variable_names:
        init_string = ", *INIT_FROM_CKPT*"
      tf.logging.info("  name = %s, shape = %s%s", var.name, var.shape,
                      init_string)

    output_spec = None
    if mode == tf.estimator.ModeKeys.TRAIN:

      train_op = optimization.create_optimizer(
          total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)

      logging_hook = tf.train.LoggingTensorHook({"loss": total_loss}, every_n_iter=300)
      output_spec = tf.contrib.tpu.TPUEstimatorSpec(
          mode=mode,
          loss=total_loss,
          train_op=train_op,
          training_hooks=[logging_hook],
          scaffold_fn=scaffold_fn)
    elif mode == tf.estimator.ModeKeys.EVAL:

      def metric_fn(per_example_loss, label_ids, logits):
        # Split logits and labels per aspect; each result is a list of length num_aspects.
        logits_split=tf.split(logits,FLAGS.num_aspects,axis=-1)
        label_ids_split=tf.split(label_ids,FLAGS.num_aspects,axis=-1)
        accuracy=tf.constant(0.0,dtype=tf.float64)
        update_ops=[] # one streaming-accuracy update op per aspect
        for j,aspect_logits in enumerate(logits_split):
            predictions=tf.argmax(aspect_logits, axis=-1, output_type=tf.int32) # should be [batch_size,]
            label_id_=tf.cast(tf.argmax(label_ids_split[j],axis=-1),dtype=tf.int32)
            print("label_ids_split_sub:",label_ids_split[j],";predictions:",predictions,";label_id_:",label_id_)
            current_accuracy,update_op_accuracy=tf.metrics.accuracy(label_id_,predictions)
            update_ops.append(update_op_accuracy)
            accuracy+=tf.cast(current_accuracy,dtype=tf.float64)
        # Average the per-aspect accuracies; group the update ops so every aspect is updated.
        accuracy=accuracy/tf.constant(FLAGS.num_aspects,dtype=tf.float64)
        loss = tf.metrics.mean(per_example_loss)
        return {
            "eval_accuracy": (accuracy,tf.group(*update_ops)),
            "eval_loss": loss,
        }

      eval_metrics = (metric_fn, [per_example_loss, label_ids, logits])
      output_spec = tf.contrib.tpu.TPUEstimatorSpec(
          mode=mode,
          loss=total_loss,
          eval_metrics=eval_metrics,
          scaffold_fn=scaffold_fn)
    else:
      output_spec = tf.contrib.tpu.TPUEstimatorSpec(
          mode=mode, predictions=probabilities, scaffold_fn=scaffold_fn)
    return output_spec

  return model_fn
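

# Illustrative sketch (not part of the original code): what the EVAL metric_fn
# above is meant to compute, written in NumPy for a single batch. Logits and
# labels of shape [batch_size, num_aspects * classes_per_aspect] are split into
# num_aspects equal chunks, the argmax of each chunk is compared, and the
# per-aspect accuracies are averaged. `_sketch_mean_aspect_accuracy` is a
# hypothetical name, and the assumption that each aspect is one-hot encoded is
# inferred from the argmax taken over the label chunks.
import numpy as np


def _sketch_mean_aspect_accuracy(logits, label_ids, num_aspects):
  logits_split = np.split(np.asarray(logits), num_aspects, axis=-1)
  labels_split = np.split(np.asarray(label_ids), num_aspects, axis=-1)
  accuracies = []
  for aspect_logits, aspect_labels in zip(logits_split, labels_split):
    predictions = np.argmax(aspect_logits, axis=-1)
    targets = np.argmax(aspect_labels, axis=-1)
    accuracies.append(np.mean(predictions == targets))
  return float(np.mean(accuracies))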


# This function is not used by this file but is still used by the Colab and
# people who depend on it.
def input_fn_builder(features, seq_length, is_training, drop_remainder):
  """Creates an `input_fn` closure to be passed to TPUEstimator."""

  all_input_ids = []
  all_input_mask = []
  all_segment_ids = []
  all_label_ids = []

  for feature in features:
    all_input_ids.append(feature.input_ids)
    all_input_mask.append(feature.input_mask)
    all_segment_ids.append(feature.segment_ids)
    all_label_ids.append(feature.label_id)

  def input_fn(params):
    """The actual input function."""
    batch_size = params["batch_size"]

    num_examples = len(features)

    # This is for demo purposes and does NOT scale to large data sets. We do
    # not use Dataset.from_generator() because that uses tf.py_func which is
    # not TPU compatible. The right way to load data is with TFRecordReader.
    d = tf.data.Dataset.from_tensor_slices({
        "input_ids":
            tf.constant(
                all_input_ids, shape=[num_examples, seq_length],
                dtype=tf.int32),
        "input_mask":
            tf.constant(
                all_input_mask,
                shape=[num_examples, seq_length],
                dtype=tf.int32),
        "segment_ids":
            tf.constant(
                all_segment_ids,
                shape=[num_examples, seq_length],
                dtype=tf.int32),
        "label_ids":
            tf.constant(all_label_ids, shape=[num_examples], dtype=tf.int32),
    })

    if is_training:
      d = d.repeat()
      d = d.shuffle(buffer_size=100)

    d = d.batch(batch_size=batch_size, drop_remainder=drop_remainder)
    return d

  return input_fn


# This function is not used by this file but is still used by the Colab and
# people who depend on it.
def convert_examples_to_features(examples, label_list, max_seq_length,
                                 tokenizer):
  """Convert a set of `InputExample`s to a list of `InputFeatures`."""

  features = []
  for (ex_index, example) in enumerate(examples):
    if ex_index % 10000 == 0:
      tf.logging.info("Writing example %d of %d" % (ex_index, len(examples)))

    feature = convert_single_example(ex_index, example, label_list,
                                     max_seq_length, tokenizer)

    features.append(feature)
  return features
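

# Illustrative sketch (not part of the original code): the step arithmetic that
# main() below uses to size the training schedule, assuming Python 3 division.
# The helper name `_sketch_schedule_steps` and the example numbers are
# hypothetical.
def _sketch_schedule_steps(num_examples, train_batch_size, num_train_epochs,
                           warmup_proportion):
  num_train_steps = int(num_examples / train_batch_size * num_train_epochs)
  num_warmup_steps = int(num_train_steps * warmup_proportion)
  return num_train_steps, num_warmup_steps

# For example, 10000 examples with batch size 32 for 3 epochs gives
# int(10000 / 32 * 3) = 937 training steps; with warmup_proportion=0.1 the
# first int(93.7) = 93 of them are warmup steps.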


def main(_):
  tf.logging.set_verbosity(tf.logging.INFO)

  processors = {
      "cola": ColaProcessor,
      "mnli": MnliProcessor,
      "mrpc": MrpcProcessor,
      "xnli": XnliProcessor,
      "sentence_pair":SentencePairClassificationProcessor,
      "sentiment_analysis":SentimentAnalysisFineGrainProcessor,
  }

  if not FLAGS.do_train and not FLAGS.do_eval and not FLAGS.do_predict:
    raise ValueError(
        "At least one of `do_train`, `do_eval` or `do_predict` must be True.")

  bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)

  if FLAGS.max_seq_length > bert_config.max_position_embeddings:
    raise ValueError(
        "Cannot use sequence length %d because the BERT model "
        "was only trained up to sequence length %d" %
        (FLAGS.max_seq_length, bert_config.max_position_embeddings))

  tf.gfile.MakeDirs(FLAGS.output_dir)

  task_name = FLAGS.task_name.lower()

  if task_name not in processors:
    raise ValueError("Task not found: %s" % (task_name))

  processor = processors[task_name]()

  label_list = processor.get_labels()

  tokenizer = tokenization.FullTokenizer(
      vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)

  tpu_cluster_resolver = None
  if FLAGS.use_tpu and FLAGS.tpu_name:
    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(
        FLAGS.tpu_name, zone=FLAGS.tpu_zone, project=FLAGS.gcp_project)

  is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2
  run_config = tf.contrib.tpu.RunConfig(
      cluster=tpu_cluster_resolver,
      master=FLAGS.master,
      model_dir=FLAGS.output_dir,
      save_checkpoints_steps=FLAGS.save_checkpoints_steps,
      tpu_config=tf.contrib.tpu.TPUConfig(
          iterations_per_loop=FLAGS.iterations_per_loop,
          num_shards=FLAGS.num_tpu_cores,
          per_host_input_for_training=is_per_host))

  train_examples = None
  num_train_steps = None
  num_warmup_steps = None
  if FLAGS.do_train:
    train_examples = processor.get_train_examples(FLAGS.data_dir)
    num_train_steps = int(
        len(train_examples) / FLAGS.train_batch_size * FLAGS.num_train_epochs)
    num_warmup_steps = int(num_train_steps * FLAGS.warmup_proportion)

  model_fn = model_fn_builder(
      bert_config=bert_config,
      num_labels=len(label_list),
      init_checkpoint=FLAGS.init_checkpoint,
      learning_rate=FLAGS.learning_rate,
      num_train_steps=num_train_steps,
      num_warmup_steps=num_warmup_steps,
      use_tpu=FLAGS.use_tpu,
      use_one_hot_embeddings=FLAGS.use_tpu)

  # If TPU is not available, this will fall back to normal Estimator on CPU
  # or GPU.
  estimator = tf.contrib.tpu.TPUEstimator(
      use_tpu=FLAGS.use_tpu,
      model_fn=model_fn,
      config=run_config,
      train_batch_size=FLAGS.train_batch_size,
      eval_batch_size=FLAGS.eval_batch_size,
      predict_batch_size=FLAGS.predict_batch_size)

  if FLAGS.do_train:
    train_file = os.path.join(FLAGS.output_dir, "train.tf_record")
    file_based_convert_examples_to_features(
        train_examples, label_list, FLAGS.max_seq_length, tokenizer, train_file)
    tf.logging.info("***** Running training *****")
    tf.logging.info("  Num examples = %d", len(train_examples))
    tf.logging.info("  Batch size = %d", FLAGS.train_batch_size)
    tf.logging.info("  Num steps = %d", num_train_steps)
    train_input_fn = file_based_input_fn_builder(
        input_file=train_file,
        seq_length=FLAGS.max_seq_length,
        is_training=True,
        drop_remainder=True)
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)

  if FLAGS.do_eval:
    eval_examples = processor.get_dev_examples(FLAGS.data_dir)
    eval_file = os.path.join(FLAGS.output_dir, "eval.tf_record")
    file_based_convert_examples_to_features(
        eval_examples, label_list, FLAGS.max_seq_length, tokenizer, eval_file)

    tf.logging.info("***** Running evaluation *****")
    tf.logging.info("  Num examples = %d", len(eval_examples))
    tf.logging.info("  Batch size = %d", FLAGS.eval_batch_size)

    # This tells the estimator to run through the entire set.
    eval_steps = None
    # However, if running eval on the TPU, you will need to specify the
    # number of steps.
    if FLAGS.use_tpu:
      # Eval will be slightly WRONG on the TPU because it will truncate
      # the last batch.
      eval_steps = int(len(eval_examples) / FLAGS.eval_batch_size)

    eval_drop_remainder = True if FLAGS.use_tpu else False
    eval_input_fn = file_based_input_fn_builder(
        input_file=eval_file,
        seq_length=FLAGS.max_seq_length,
        is_training=False,
        drop_remainder=eval_drop_remainder)

    result = estimator.evaluate(input_fn=eval_input_fn, steps=eval_steps)

    output_eval_file = os.path.join(FLAGS.output_dir, "eval_results.txt")
    with tf.gfile.GFile(output_eval_file, "w") as writer:
      tf.logging.info("***** Eval results *****")
      for key in sorted(result.keys()):
        tf.logging.info("  %s = %s", key, str(result[key]))
        writer.write("%s = %s\n" % (key, str(result[key])))

  if FLAGS.do_predict:
    predict_examples = processor.get_test_examples(FLAGS.data_dir)
    predict_file = os.path.join(FLAGS.output_dir, "predict.tf_record")
    file_based_convert_examples_to_features(predict_examples, label_list,
                                            FLAGS.max_seq_length, tokenizer,
                                            predict_file)

    tf.logging.info("***** Running prediction *****")
    tf.logging.info("  Num examples = %d", len(predict_examples))
    tf.logging.info("  Batch size = %d", FLAGS.predict_batch_size)

    if FLAGS.use_tpu:
      # Warning: According to tpu_estimator.py Prediction on TPU is an
      # experimental feature and hence not supported here
      raise ValueError("Prediction in TPU not supported")

    predict_drop_remainder = True if FLAGS.use_tpu else False
    predict_input_fn = file_based_input_fn_builder(
        input_file=predict_file,
        seq_length=FLAGS.max_seq_length,
        is_training=False,
        drop_remainder=predict_drop_remainder)

    result = estimator.predict(input_fn=predict_input_fn)

    output_predict_file = os.path.join(FLAGS.output_dir, "test_results.tsv")
    with tf.gfile.GFile(output_predict_file, "w") as writer:
      tf.logging.info("***** Predict results *****")
      for prediction in result:
        output_line = "\t".join(
            str(class_probability) for class_probability in prediction) + "\n"
        writer.write(output_line)


if __name__ == "__main__":
  flags.mark_flag_as_required("data_dir")
  flags.mark_flag_as_required("task_name")
  flags.mark_flag_as_required("vocab_file")
  flags.mark_flag_as_required("bert_config_file")
  flags.mark_flag_as_required("output_dir")
  tf.app.run()


================================================
FILE: a00_Bert/unused/train_bert_multi-label_old.py
================================================
# coding=utf-8
"""
train a BERT model for multi-label classification

1. get training data and vocabulary & labels dict
2. create model
3. train the model and report f1 score
"""
import bert_modeling as modeling
import tensorflow as tf
import os
import numpy as np

from utils import load_data,init_label_dict,get_label_using_logits,get_target_label_short,compute_confuse_matrix,\
    compute_micro_macro,compute_confuse_matrix_batch,get_label_using_logits_batch,get_target_label_short_batch

FLAGS=tf.app.flags.FLAGS
tf.app.flags.DEFINE_string("cache_file_h5py","../data/ieee_zhihu_cup/data.h5","path of training/validation/test data.") #../data/sample_multiple_label.txt
tf.app.flags.DEFINE_string("cache_file_pickle","../data/ieee_zhihu_cup/vocab_label.pik","path of vocabulary and label files") #../data/sample_multiple_label.txt

tf.app.flags.DEFINE_float("learning_rate",0.0001,"learning rate")
tf.app.flags.DEFINE_integer("batch_size", 32, "Batch size for training/evaluating.") # batch size: 32-->128
tf.app.flags.DEFINE_string("ckpt_dir","checkpoint/","checkpoint location for the model")
tf.app.flags.DEFINE_boolean("is_training",True,"is training. true: training, false: testing/inference")
tf.app.flags.DEFINE_integer("num_epochs",15,"number of epochs to run.")

# below hyper-parameter is for bert model
# for a medium-sized model that trains fast, use hidden_size=128, num_hidden_layers=4, num_attention_heads=8, intermediate_size=1024
tf.app.flags.DEFINE_integer("hidden_size",768,"hidden size")
tf.app.flags.DEFINE_integer("num_hidden_layers",12,"number of hidden layers")
tf.app.flags.DEFINE_integer("num_attention_heads",12,"number of attention heads")
tf.app.flags.DEFINE_integer("intermediate_size",3072,"intermediate size of hidden layer")
tf.app.flags.DEFINE_integer("max_seq_length",200,"max sequence length")

def main(_):
    # 1. get training data and vocabulary & labels dict
    word2index, label2index, trainX, trainY, vaildX, vaildY, testX, testY = load_data(FLAGS.cache_file_h5py,FLAGS.cache_file_pickle)
    vocab_size = len(word2index); print("bert model.vocab_size:", vocab_size);
    num_labels = len(label2index); print("num_labels:", num_labels); cls_id=word2index['CLS'];print("id of 'CLS':",word2index['CLS'])
    num_examples, FLAGS.max_seq_length = trainX.shape;print("num_examples of training:", num_examples, ";max_seq_length:", FLAGS.max_seq_length)

    # 2. create model, define train operation
    bert_config = modeling.BertConfig(vocab_size=len(word2index), hidden_size=FLAGS.hidden_size, num_hidden_layers=FLAGS.num_hidden_layers,
                                      num_attention_heads=FLAGS.num_attention_heads,intermediate_size=FLAGS.intermediate_size)
    input_ids = tf.placeholder(tf.int32, [None, FLAGS.max_seq_length], name="input_ids") # FLAGS.batch_size
    input_mask = tf.placeholder(tf.int32, [None, FLAGS.max_seq_length], name="input_mask")
    segment_ids = tf.placeholder(tf.int32, [None,FLAGS.max_seq_length],name="segment_ids")
    label_ids = tf.placeholder(tf.float32, [None,num_labels], name="label_ids")
    is_training = FLAGS.is_training #tf.placeholder(tf.bool, name="is_training")

    use_one_hot_embeddings = False
    loss, per_example_loss, logits, probabilities, model = create_model(bert_config, is_training, input_ids, input_mask,
                                                            segment_ids, label_ids, num_labels,use_one_hot_embeddings)
    # define train operation
    #num_train_steps = int(float(num_examples) / float(FLAGS.batch_size * FLAGS.num_epochs)); use_tpu=False; num_warmup_steps = int(num_train_steps * 0.1)
    #train_op = optimization.create_optimizer(loss, FLAGS.learning_rate, num_train_steps, num_warmup_steps, use_tpu)
    global_step = tf.Variable(0, trainable=False, name="Global_Step")
    train_op = tf.contrib.layers.optimize_loss(loss, global_step=global_step, learning_rate=FLAGS.learning_rate,optimizer="Adam", clip_gradients=3.0)

    is_training_eval=False
    # 3. train the model by calling create model, get loss
    gpu_config = tf.ConfigProto()
    gpu_config.gpu_options.allow_growth = True
    sess = tf.Session(config=gpu_config)
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    if os.path.exists(FLAGS.ckpt_dir + "checkpoint"):
        print("Checkpoint Exists. Restoring Variables from Checkpoint.")
        saver.restore(sess, tf.train.latest_checkpoint(FLAGS.ckpt_dir))
    number_of_training_data = len(trainX)
    iteration = 0
    curr_epoch = 0 #sess.run(textCNN.epoch_step)
    batch_size = FLAGS.batch_size
    for epoch in range(curr_epoch, FLAGS.num_epochs):
        loss_total, counter = 0.0, 0
        for start, end in zip(range(0, number_of_training_data, batch_size),range(batch_size, number_of_training_data, batch_size)):
            iteration = iteration + 1
            input_ids_,input_mask_,segment_ids_=get_input_mask_segment_ids(trainX[start:end],cls_id)
            feed_dict = {input_ids: input_ids_, input_mask: input_mask_, segment_ids:segment_ids_,
                         label_ids:trainY[start:end]}
            curr_loss,_ = sess.run([loss,train_op], feed_dict)
            loss_total, counter = loss_total + curr_loss, counter + 1
            if counter % 30 == 0:
                print(epoch,"\t",iteration,"\tloss:",loss_total/float(counter),"\tcurrent_loss:",curr_loss)
            if counter % 1000==0:
                print("trainX[",start,"]:",trainX[start]);#print("trainY[start:end]:",trainY[start:end])
                try:
                    target_labels = get_target_label_short_batch(trainY[start:end]);#print("target_labels:",target_labels)
                    print("trainY[",start,"]:",target_labels[0])
                except Exception: # debug output only; never interrupt training
                    pass
            # evaluation
            if start!=0 and start % (3000 * FLAGS.batch_size) == 0:
                eval_loss, f1_score, f1_micro, f1_macro = do_eval(sess,input_ids,input_mask,segment_ids,label_ids,is_training_eval,loss,
                                                                  probabilities,vaildX, vaildY, num_labels,batch_size,cls_id)
                print("Epoch %d Validation Loss:%.3f\tF1 Score:%.3f\tF1_micro:%.3f\tF1_macro:%.3f" % (
                    epoch, eval_loss, f1_score, f1_micro, f1_macro))
                # save model to checkpoint
                #if start % (4000 * FLAGS.batch_size)==0:
                save_path = FLAGS.ckpt_dir + "model.ckpt"
                print("Going to save model..")
                saver.save(sess, save_path, global_step=epoch)

def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,labels, num_labels, use_one_hot_embeddings,reuse_flag=False):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  output_layer = model.get_pooled_output()
  hidden_size = output_layer.shape[-1].value
  with tf.variable_scope("weights",reuse=reuse_flag):
      output_weights = tf.get_variable("output_weights", [num_labels, hidden_size],initializer=tf.truncated_normal_initializer(stddev=0.02))
      output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
        print("###create_model.is_training:",is_training)
        output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    print("output_layer:",output_layer.shape,";output_weights:",output_weights.shape,";logits:",logits.shape)

    logits = tf.nn.bias_add(logits, output_bias)
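    probabilities = tf.nn.sigmoid(logits) # independent per-label probabilities, consistent with the sigmoid loss and the 0.5 threshold used at eval time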
    per_example_loss=tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
    loss = tf.reduce_mean(per_example_loss)

    return loss, per_example_loss, logits, probabilities,model


def do_eval(sess,input_ids,input_mask,segment_ids,label_ids,is_training,loss,probabilities,vaildX, vaildY, num_labels,batch_size,cls_id):
    """
    evaluation of the model using validation data
    :param sess:
    :param input_ids:
    :param input_mask:
    :param segment_ids:
    :param label_ids:
    :param is_training:
    :param loss:
    :param probabilities:
    :param vaildX:
    :param vaildY:
    :param num_labels:
    :param batch_size:
    :return:
    """
    num_eval=1000
    vaildX = vaildX[0:num_eval]
    vaildY = vaildY[0:num_eval]
    number_examples = len(vaildX)
    eval_loss, eval_counter, eval_f1_score, eval_p, eval_r = 0.0, 0, 0.0, 0.0, 0.0
    label_dict = init_label_dict(num_labels)
    f1_score_micro_sklearn_total=0.0
    # batch_size=1 # TODO
    for start, end in zip(range(0, number_examples, batch_size), range(batch_size, number_examples, batch_size)):
        input_ids_,input_mask_, segment_ids_ = get_input_mask_segment_ids(vaildX[start:end],cls_id)
        feed_dict = {input_ids: input_ids_,input_mask:input_mask_,segment_ids:segment_ids_,
                     label_ids:vaildY[start:end]}
        curr_eval_loss, prob = sess.run([loss, probabilities],feed_dict)
        target_labels=get_target_label_short_batch(vaildY[start:end])
        predict_labels=get_label_using_logits_batch(prob)
        #print("predict_labels:",predict_labels)
        label_dict=compute_confuse_matrix_batch(target_labels,predict_labels,label_dict,name='bert')
        eval_loss, eval_counter = eval_loss + curr_eval_loss, eval_counter + 1

    f1_micro, f1_macro = compute_micro_macro(label_dict)  # label_dict is a dict: key is the label index, value is (TP,FP,FN), where TP is the number of true positives
    f1_score_result = (f1_micro + f1_macro) / 2.0
    return eval_loss / float(eval_counter), f1_score_result, f1_micro, f1_macro
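
# Illustrative sketch (not part of the original code): how the
# zip(range(0, n, batch_size), range(batch_size, n, batch_size)) idiom used in
# main() and do_eval() slices the data. The helper name `_sketch_batch_ranges`
# is hypothetical. Because the second range stops before n, the trailing
# examples (a partial batch, or a full final batch when n is a multiple of
# batch_size) are never visited.
def _sketch_batch_ranges(number_examples, batch_size):
    return list(zip(range(0, number_examples, batch_size),
                    range(batch_size, number_examples, batch_size)))

# _sketch_batch_ranges(10, 4) -> [(0, 4), (4, 8)]; examples 8 and 9 are skipped.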

def get_input_mask_segment_ids(train_x_batch,cls_id):
    """
    get input ids, input mask and segment ids given a batch of input x.
    a CLS token is inserted at the front of each sequence and the last token is dropped, so the sequence length is
    unchanged. shape of input_ids, input_mask and segment_ids is [batch_size, max_sequence_length].
    for padding tokens (id 0), input_mask is zero; value for all other places is one.
    :param train_x_batch: a 2-d array of token ids. [batch_size, max_sequence_length]
    :param cls_id: id of the 'CLS' token
    :return: input_ids,input_mask,segment_ids (the order the callers unpack)
    """
    batch_size,max_sequence_length=train_x_batch.shape
    # insert CLS token for classification: shift each sequence right by one and drop its last token
    input_ids=np.zeros((batch_size,max_sequence_length),dtype=np.int32)
    for k in range(batch_size):
        input_id_list=list(train_x_batch[k])
        input_id_list.insert(0,cls_id)
        del input_id_list[-1]
        input_ids[k]=input_id_list

    # set 0 for tokens in padding position; computed on the shifted sequence so the mask lines up with input_ids
    input_mask=np.ones((batch_size,max_sequence_length),dtype=np.int32)
    for i in range(batch_size):
        for j in range(1,max_sequence_length):
            if input_ids[i][j]==0:
                input_mask[i][j:]=0
                break

    segment_ids=np.ones((batch_size,max_sequence_length),dtype=np.int32)
    return input_ids,input_mask,segment_ids

#train_x_batch=np.ones((3,5))
#train_x_batch[0,4]=0
#train_x_batch[1,3]=0
#train_x_batch[1,4]=0
#cls_id=2
#print("train_x_batch:",train_x_batch)
#input_ids,input_mask,segment_ids=get_input_mask_segment_ids(train_x_batch,cls_id)
#print("input_mask:",input_mask, "segment_ids:",segment_ids,"input_ids:",input_ids)

if __name__ == "__main__":
    tf.app.run()

================================================
FILE: a00_Bert/utils.py
================================================
# -*- coding: utf-8 -*-

import pickle
import h5py
import os
import numpy as np
import random

random_number=300

def load_data(cache_file_h5py,cache_file_pickle):
    """
    load data from h5py and pickle cache files, which are generated by running pre-processing.ipynb step by step
    :param cache_file_h5py:
    :param cache_file_pickle:
    :return:
    """
    if not os.path.exists(cache_file_h5py) or not os.path.exists(cache_file_pickle):
        raise RuntimeError("############################ERROR##############################\n"
                           "please download the cache files; they include training data and vocabulary & labels. "
                           "the link can be found in README.md.\n download the zip file, unzip it, then put the cache files at "
                           "the locations given by FLAGS.cache_file_h5py and FLAGS.cache_file_pickle.")
    print("INFO. cache file exists. going to load cache file")
    f_data = h5py.File(cache_file_h5py, 'r')
    print("f_data.keys:",list(f_data.keys()))
    train_X=f_data['train_X'] # np.array(
    print("train_X.shape:",train_X.shape)
    train_Y=f_data['train_Y'] # np.array(
    print("train_Y.shape:",train_Y.shape,";")
    vaild_X=f_data['vaild_X'] # np.array(
    valid_Y=f_data['valid_Y'] # np.array(
    test_X=f_data['test_X'] # np.array(
    test_Y=f_data['test_Y'] # np.array(
    #f_data.close()

    word2index, label2index=None,None
    with open(cache_file_pickle, 'rb') as data_f_pickle:
        word2index, label2index=pickle.load(data_f_pickle)
    print("INFO. cache file load successful...")
    return word2index, label2index,train_X,train_Y,vaild_X,valid_Y,test_X,test_Y

#######################################
def compute_f1_score(predict_y,eval_y):
    """
    compute f1_score. placeholder: always returns zeros.
    :param predict_y: [batch_size,label_size]
    :param eval_y: [batch_size,label_size]
    :return:
    """
    f1_score=0.0
    p_5=0.0
    r_5=0.0
    return f1_score,p_5,r_5

def compute_f1_score_removed(label_list_top5,eval_y):
    """
    compute f1_score, precision@5 and recall@5 for a single example.
    :param label_list_top5: a list of predicted label indices
    :param eval_y: a multi-hot vector. [label_size]
    :return: f1_score,p_5,r_5
    """
    num_correct_label=0
    eval_y_short=get_target_label_short(eval_y)
    for label_predict in label_list_top5:
        if label_predict in eval_y_short:
            num_correct_label=num_correct_label+1
    #P@5=Precision@5
    num_labels_predicted=len(label_list_top5)
    all_real_labels=len(eval_y_short)
    p_5=num_correct_label/float(num_labels_predicted) # float() avoids integer division under Python 2
    #R@5=Recall@5
    r_5=num_correct_label/float(all_real_labels)
    f1_score=2.0*p_5*r_5/(p_5+r_5+0.000001)
    return f1_score,p_5,r_5

def compute_confuse_matrix(target_y,predict_y,label_dict,name='default'):
    """
    compute true positive(TP), false positive(FP), false negative(FN) given target labels and predicted labels
    :param target_y: a list of target label indices
    :param predict_y: a list of predicted label indices
    :param label_dict: {label:(TP,FP,FN)}
    :param name: a string for debug purpose
    :return: label_dict: {label:(TP,FP,FN)}, updated with the counts of this example
    """
    #1.get target label and predict label
    if random.randrange(random_number) ==1: # print about one example in every random_number
        print(name+".target_y:",target_y,";predict_y:",predict_y) #debug purpose

    #2.count number of TP,FP,FN for each class
    y_labels_unique=[]
    y_labels_unique.extend(target_y)
    y_labels_unique.extend(predict_y)
    y_labels_unique=list(set(y_labels_unique))
    for i,label in enumerate(y_labels_unique): #e.g. label=2
        TP, FP, FN = label_dict[label]
        if label in predict_y and label in target_y:#predict=1,truth=1 (TP)
            TP=TP+1
        elif label in predict_y and label not in target_y:#predict=1,truth=0(FP)
            FP=FP+1
        elif label not in predict_y and label in target_y:#predict=0,truth=1(FN)
            FN=FN+1
        label_dict[label] = (TP, FP, FN)
    return label_dict
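
# Illustrative sketch (not part of the original code): the per-example rule
# applied by compute_confuse_matrix, restated with sets. For one example, a
# label in both lists counts as TP, a label only in the prediction as FP, and a
# label only in the target as FN. `_sketch_count_tp_fp_fn` is a hypothetical
# name; it returns the labels in each bucket instead of updating a label_dict.
def _sketch_count_tp_fp_fn(target_y, predict_y):
    target, predicted = set(target_y), set(predict_y)
    return {"TP": sorted(target & predicted),
            "FP": sorted(predicted - target),
            "FN": sorted(target - predicted)}

# _sketch_count_tp_fp_fn([2, 5], [2, 7]) -> {"TP": [2], "FP": [7], "FN": [5]}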

def compute_micro_macro(label_dict):
    """
    compute f1 of micro and macro
    :param label_dict:
    :return: f1_micro,f1_macro: scalar, scalar
    """
    f1_micro = compute_f1_micro_use_TFFPFN(label_dict)
    f1_macro= compute_f1_macro_use_TFFPFN(label_dict)
    return f1_micro,f1_macro

def compute_TF_FP_FN_micro(label_dict):
    """
    compute micro TP,FP,FN
    :param label_dict: a dict. {label:(TP, FP, FN)}
    :return:TP_micro,FP_micro,FN_micro
    """
    TP_micro,FP_micro,FN_micro=0.0,0.0,0.0
    for label,tuplee in label_dict.items():
        TP,FP,FN=tuplee
        TP_micro=TP_micro+TP
        FP_micro=FP_micro+FP
        FN_micro=FN_micro+FN
    return TP_micro,FP_micro,FN_micro
def compute_f1_micro_use_TFFPFN(label_dict):
    """
    compute f1_micro
    :param label_dict: {label:(TP,FP,FN)}
    :return: f1_micro: a scalar
    """
    TF_micro_accusation, FP_micro_accusation, FN_micro_accusation =compute_TF_FP_FN_micro(label_dict)
    f1_micro_accusation = compute_f1(TF_micro_accusation, FP_micro_accusation, FN_micro_accusation,'micro')
    return f1_micro_accusation

def compute_f1_macro_use_TFFPFN(label_dict):
    """
    compute f1_macro
    :param label_dict: {label:(TP,FP,FN)}
    :return: f1_macro
    """
    f1_dict= {}
    num_classes=len(label_dict)
    for label, tuplee in label_dict.items():
        TP,FP,FN=tuplee
        f1_score_onelabel=compute_f1(TP,FP,FN,'macro')
        f1_dict[label]=f1_score_onelabel
    f1_score_sum=sum(f1_dict.values()) # built-in sum: np.sum does not iterate a dict view in Python 3
    f1_score=f1_score_sum/float(num_classes)
    return f1_score

small_value=0.00001
def compute_f1(TP,FP,FN,compute_type):
    """
    compute f1
    :param TP: number of true positives. e.g. 200
    :param FP: number of false positives. e.g. 200
    :param FN: number of false negatives. e.g. 200
    :param compute_type: a string used in debug output, e.g. 'micro' or 'macro'
    :return: f1_score: a scalar
    """
    precision=TP/(TP+FP+small_value)
    recall=TP/(TP+FN+small_value)
    f1_score=(2*precision*recall)/(precision+recall+small_value)

    if random.randrange(500) == 1:print(compute_type,"precision:",str(precision),";recall:",str(recall),";f1_score:",f1_score)

    return f1_score
def init_label_dict(num_classes):
    """
    init label dict. this dict will be used to save TP,FP,FN
    :param num_classes:
    :return: label_dict: a dict. {label_index:(0,0,0)}
    """
    label_dict={}
    for i in range(num_classes):
        label_dict[i]=(0,0,0)
    return label_dict
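
# Illustrative sketch (not part of the original code): micro vs. macro F1 on a
# tiny hand-made label_dict. Micro-F1 pools TP/FP/FN over all labels before
# computing F1; macro-F1 computes F1 per label and averages. The helpers
# `_sketch_f1` and `_sketch_micro_macro` are hypothetical and omit the
# small_value smoothing used by compute_f1, so their results differ from the
# functions above in the last few decimal places.
def _sketch_f1(tp, fp, fn):
    if tp == 0:
        return 0.0
    precision = tp / float(tp + fp)
    recall = tp / float(tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def _sketch_micro_macro(label_dict):
    tp = sum(v[0] for v in label_dict.values())
    fp = sum(v[1] for v in label_dict.values())
    fn = sum(v[2] for v in label_dict.values())
    f1_micro = _sketch_f1(tp, fp, fn)
    f1_macro = sum(_sketch_f1(*v) for v in label_dict.values()) / float(len(label_dict))
    return f1_micro, f1_macro

# With {0: (8, 2, 0), 1: (1, 0, 3)} the pooled TP/FP/FN is (9, 2, 3), so
# micro-F1 = 18/23 (about 0.783). Per-label F1 is 8/9 and 0.4, so
# macro-F1 = 58/90 (about 0.644): the rare, poorly predicted label pulls the
# macro average down more than the micro average.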

def get_target_label_short(eval_y):
    eval_y_short=[] #will be like:[22,642,1391]
    for index,label in enumerate(eval_y):
        if label>0:
            eval_y_short.append(index)
    return eval_y_short

def get_target_label_short_batch(eval_y_big): # tested.
    eval_y_short_big=[] #will be like:[22,642,1391]
    for ind, eval_y in enumerate(eval_y_big):
        eval_y_short=[]
        for index,label in enumerate(eval_y):
            if label>0:
                eval_y_short.append(index)
        eval_y_short_big.append(eval_y_short)
    return eval_y_short_big

#eval_y_big=np.zeros((3,6))
#eval_y_big[0,5]=1
#eval_y_big[0,0]=1
#eval_y_big[1,0]=1
#eval_y_big[1,1]=1
#print("eval_y_big:",eval_y_big)
#result=get_target_label_short_batch(eval_y_big)
#print("result:",result)

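#get predicted labels: every label whose probability is at least 0.5, or the argmax label if none qualifies (top_number is unused)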
def get_label_using_prob(prob,top_number=5):
    y_predict_labels = [i for i in range(len(prob)) if prob[i] >= 0.50]  # TODO 0.5PW e.g.[2,12,13,10]
    if len(y_predict_labels) < 1:
        y_predict_labels = [np.argmax(prob)]
    return y_predict_labels
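
# Illustrative sketch (not part of the original code): the same selection rule
# as get_label_using_prob, vectorized with NumPy. Labels at or above the
# threshold are kept; if none qualifies, the single most probable label is
# returned so that every example gets at least one prediction. The name
# `_sketch_labels_from_prob` and the `threshold` parameter are hypothetical.
import numpy as np


def _sketch_labels_from_prob(prob, threshold=0.5):
    prob = np.asarray(prob)
    labels = np.flatnonzero(prob >= threshold).tolist()
    return labels if labels else [int(np.argmax(prob))]

# _sketch_labels_from_prob([0.1, 0.7, 0.6]) -> [1, 2]
# _sketch_labels_from_prob([0.1, 0.3, 0.2]) -> [1]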

def get_label_using_logits_batch(prob,top_number=5): # tested.
    result_labels=[]
    for i in range(len(prob)):
        single_prob=prob[i]
        labels=get_label_using_prob(single_prob)
        result_labels.append(labels)
    return result_labels

# compute prediction accuracy
def calculate_accuracy(labels_predicted, labels,eval_counter):
    label_nozero=[]
    #print("labels:",labels)
    labels=list(labels)
    for index,label in enumerate(labels):
        if label>0:
            label_nozero.append(index)
    if eval_counter<2:
        print("labels_predicted:",labels_predicted," ;labels_nozero:",label_nozero)
    count = 0
    label_dict = {x: x for x in label_nozero}
    for label_predict in labels_predicted:
        flag = label_dict.get(label_predict, None)
        if flag is not None: # count every predicted label that is a true label
            count = count + 1
    return count / float(len(labels))

def compute_confuse_matrix_batch(y_targetlabel_list,y_logits_array,label_dict,name='default'):
    """
    compute the confusion counts (TP, FP, FN per label) for a batch
    :param y_targetlabel_list: a list; each element is a multi-hot vector, e.g. [1,0,0,1,...]
    :param y_logits_array: a 2-d array. [batch_size,num_class]
    :param label_dict:{label:(TP, FP, FN)}
    :param name: a string for debug purpose
    :return:label_dict:{label:(TP, FP, FN)}
    """
    for i,y_targetlabel_list_single in enumerate(y_targetlabel_list):
        label_dict=compute_confuse_matrix(y_targetlabel_list_single,y_logits_array[i],label_dict,name=name)
    return label_dict


================================================
FILE: a00_boosting/a08_boosting.py
================================================
# -*- coding: utf-8 -*-
import sys
if sys.version_info[0] < 3:  # Python 2 only: setdefaultencoding was removed in Python 3
    reload(sys)
    sys.setdefaultencoding('utf8')
import numpy as np
import tensorflow as tf

#main process for boosting:
#1.compute label weight after each epoch using validation data.
#2.get weights for each batch during the training process
#3.compute loss using cross entropy with weights

#1.compute label weight after each epoch using validation data.
def compute_labels_weights(weights_label,logits,labels):
    """
    compute per-label prediction statistics for the current batch, and update weights_label (a dict)
    :param weights_label: a dict, {label:(count, correct)}
    :param logits: [None,Vocabulary_size]
    :param labels: [None,]
    :return: the updated weights_label
    """
    labels_predict=np.argmax(logits,axis=1) # logits:(256,108,754)
    for i in range(len(labels)):
        label=labels[i]
        label_predict=labels_predict[i]
        weight=weights_label.get(label,None)
        if weight is None:
            if label_predict == label:
                weights_label[label]=(1,1)
            else:
                weights_label[label]=(1,0)
        else:
            number=weight[0]
            correct=weight[1]
            number=number+1
            if label_predict==label:
                correct=correct+1
            weights_label[label]=(number,correct)
    return weights_label

#2.get weights for each batch during the training process
def get_weights_for_current_batch(answer_list,weights_dict):
    """
    get weights for the current batch
    :param  answer_list: a numpy array containing the labels for a batch
    :param  weights_dict: a dict mapping each label to its accuracy (see get_weights_label_as_standard_dict)
    :return: a list with one weight per example; its length equals the batch size.
    """
    weights_list_batch=list(np.ones((len(answer_list))))
    answer_list=list(answer_list)
    for i,label in enumerate(answer_list):
        acc=weights_dict[label]
        weights_list_batch[i]=min(1.5,1.0/(acc+0.001))
    #if np.random.choice(200)==0: #print something from time to time
    #    print("weights_list_batch:",weights_list_batch)
    return weights_list_batch

#3.compute loss using cross entropy with weights
def loss(logits,labels,weights):
    loss= tf.losses.sparse_softmax_cross_entropy(labels, logits,weights=weights)
    return loss

#######################################################################
#util function
def get_weights_label_as_standard_dict(weights_label):
    weights_dict = {}
    for k,v in weights_label.items():
        count,correct=v
        weights_dict[k]=float(correct)/float(count)
    return weights_dict
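
# Illustrative sketch (not part of the original module): the boosting steps above on toy
# data, in plain Python. boosting_weights_sketch is a hypothetical name, and the predicted
# labels are passed in directly instead of being derived from logits with np.argmax.
def boosting_weights_sketch(predicted, labels, batch_labels):
    # step 1: accumulate (count, correct) per label, as compute_labels_weights does
    stats = {}
    for pred, label in zip(predicted, labels):
        count, correct = stats.get(label, (0, 0))
        stats[label] = (count + 1, correct + (1 if pred == label else 0))
    # util step: turn (count, correct) into a per-label accuracy
    accuracy = {k: float(c) / float(n) for k, (n, c) in stats.items()}
    # step 2: labels with low accuracy get a larger weight, capped at 1.5
    weights = [min(1.5, 1.0 / (accuracy[label] + 0.001)) for label in batch_labels]
    return accuracy, weights

# The resulting list is what would be passed as `weights` to loss() in step 3.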


================================================
FILE: a01_FastText/old_single_label/p5_fastTextB_model.py
================================================
# fast text. using: very simple model;n-gram to capture location information;h-softmax to speed up training/inference
# for the n-gram you can use data_util to generate. see method process_one_sentence_to_get_ui_bi_tri_gram under aa1_data_util/data_util_zhihu.py
print("started...")
import tensorflow as tf
import numpy as np

class fastTextB:
    def __init__(self, label_size, learning_rate, batch_size, decay_steps, decay_rate,num_sampled,sentence_len,vocab_size,embed_size,is_training):
        """init all hyperparameters here"""
        # set hyperparameters
        self.label_size = label_size
        self.batch_size = batch_size
        self.num_sampled = num_sampled
        self.sentence_len=sentence_len
        self.vocab_size=vocab_size
        self.embed_size=embed_size
        self.is_training=is_training
        self.learning_rate=learning_rate

        # add placeholder (X,label)
        self.sentence = tf.placeholder(tf.int32, [None, self.sentence_len], name="sentence")  # X
        self.labels = tf.placeholder(tf.int32, [None], name="Labels")  # y

        self.global_step = tf.Variable(0, trainable=False, name="Global_Step")
        self.epoch_step=tf.Variable(0,trainable=False,name="Epoch_Step")
        self.epoch_increment=tf.assign(self.epoch_step,tf.add(self.epoch_step,tf.constant(1)))
        self.decay_steps, self.decay_rate = decay_steps, decay_rate

        self.instantiate_weights()
        self.logits = self.inference() #[None, self.label_size]
        if not is_training:
            return
        self.loss_val = self.loss()
        self.train_op = self.train()
        self.predictions = tf.argmax(self.logits, axis=1, name="predictions")  # shape:[None,]
        correct_prediction = tf.equal(tf.cast(self.predictions,tf.int32), self.labels) #tf.argmax(self.logits, 1)-->[batch_size]
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="Accuracy") # shape=()

    def instantiate_weights(self):
        """define all weights here"""
        # embedding matrix
        self.Embedding = tf.get_variable("Embedding", [self.vocab_size, self.embed_size])
        self.W = tf.get_variable("W", [self.embed_size, self.label_size])
        self.b = tf.get_variable("b", [self.label_size])

    def inference(self):
        """main computation graph here: 1.embedding-->2.average-->3.linear classifier"""
        # 1.get embedding of words in the sentence
        sentence_embeddings = tf.nn.embedding_lookup(self.Embedding,self.sentence)  # [None,self.sentence_len,self.embed_size]

        # 2.average vectors, to get representation of the sentence
        self.sentence_embeddings = tf.reduce_mean(sentence_embeddings, axis=1)  # [None,self.embed_size]

        # 3.linear classifier layer
        logits = tf.matmul(self.sentence_embeddings, self.W) + self.b #[None, self.label_size]==tf.matmul([None,self.embed_size],[self.embed_size,self.label_size])
        return logits

    def loss(self,l2_lambda=0.01): #0.0001-->0.001
        """calculate loss using (NCE)cross entropy here"""
        # Compute the average NCE loss for the batch.
        # tf.nce_loss automatically draws a new sample of the negative labels each
        # time we evaluate the loss.
        if self.is_training: #training
            labels=tf.reshape(self.labels,[-1])               #[batch_size,1]------>[batch_size,]
            labels=tf.expand_dims(labels,1)                   #[batch_size,]----->[batch_size,1]
            loss = tf.reduce_mean( #inputs: A `Tensor` of shape `[batch_size, dim]`.  The forward activations of the input network.
                tf.nn.nce_loss(weights=tf.transpose(self.W),  #[embed_size, label_size]--->[label_size,embed_size]. nce_weights:A `Tensor` of shape `[num_classes, dim].O.K.
                               biases=self.b,                 #[label_size]. nce_biases:A `Tensor` of shape `[num_classes]`.
                               labels=labels,                 #[batch_size,1]. train_labels, # A `Tensor` of type `int64` and shape `[batch_size,num_true]`. The target classes.
                               inputs=self.sentence_embeddings,# [None,self.embed_size] #A `Tensor` of shape `[batch_size, dim]`.  The forward activations of the input network.
                               num_sampled=self.num_sampled,  #scalar. 100
                               num_classes=self.label_size,partition_strategy="div"))  #scalar. 1999
        else:#eval/inference
            #logits = tf.matmul(self.sentence_embeddings, tf.transpose(self.W)) #matmul([None,self.embed_size])--->
            #logits = tf.nn.bias_add(logits, self.b)
            labels_one_hot = tf.one_hot(self.labels, self.label_size) #[batch_size]---->[batch_size,label_size]
            #sigmoid_cross_entropy_with_logits:Computes sigmoid cross entropy given `logits`.Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive.  For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
            loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels_one_hot,logits=self.logits) #labels:[batch_size,label_size];logits:[batch, label_size]
            print("loss0:", loss) #shape=(?, 1999)
            loss = tf.reduce_sum(loss, axis=1)
            print("loss1:",loss)  #shape=(?,)
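        # NOTE: l2_losses is computed but not added to the returned loss, so no L2 regularization is applied.
        l2_losses = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables() if 'bias' not in v.name]) * l2_lambda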
        return loss

    def train(self):
        """based on the loss, use Adam with an exponentially decayed learning rate to update parameters"""
        learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step, self.decay_steps,self.decay_rate, staircase=True)
        train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,learning_rate=learning_rate, optimizer="Adam")
        return train_op
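
# Illustrative sketch (not part of the original module): the learning-rate schedule that
# train() requests from tf.train.exponential_decay with staircase=True, in plain Python.
# staircase_decay_sketch is a hypothetical helper name introduced for this example.
def staircase_decay_sketch(learning_rate, global_step, decay_steps, decay_rate):
    # with staircase=True the exponent is the integer number of completed decay periods,
    # so the rate stays constant within each block of decay_steps steps
    return learning_rate * decay_rate ** (global_step // decay_steps)

# With the values used in test() below (learning_rate=0.01, decay_steps=1000, decay_rate=0.9),
# the rate stays at 0.01 for steps 0..999 and is multiplied by 0.9 at step 1000.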

#test started
def test():
    #below is a function test; if you use this for text classification, you need to transform each sentence to indices of the vocabulary first, then feed the data to the graph.
    num_classes=19
    learning_rate=0.01
    batch_size=8
    decay_steps=1000
    decay_rate=0.9
    sequence_length=5
    vocab_size=10000
    embed_size=100
    is_training=True
    dropout_keep_prob=1
    fastText=fastTextB(num_classes, learning_rate, batch_size, decay_steps, decay_rate,5,sequence_length,vocab_size,embed_size,is_training)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(100):
            input_x=np.zeros((batch_size,sequence_length),dtype=np.int32) #[None, self.sequence_length]
            input_y=np.array([1,0,1,1,1,2,1,1],dtype=np.int32) #[batch_size,]: one class id per example
            loss,acc,predict,_=sess.run([fastText.loss_val,fastText.accuracy,fastText.predictions,fastText.train_op],
                                        feed_dict={fastText.sentence:input_x,fastText.labels:input_y})
            print("loss:",loss,"acc:",acc,"label:",input_y,"prediction:",predict)
#test()
print("ended...")


================================================
FILE: a01_FastText/old_single_label/p5_fastTextB_predict.py
================================================
# -*- coding: utf-8 -*-
#prediction using model.
#process--->1.load data(X:list of int,y:int). 2.create session. 3.feed data. 4.predict
import sys
if sys.version_info[0] < 3:  # Python 2 only: setdefaultencoding was removed in Python 3
    reload(sys)
    sys.setdefaultencoding('utf8')
import tensorflow as tf
import numpy as np
from p5_fastTextB_model import fastTextB as fastText
from data_util_zhihu import load_data_predict,load_final_test_data,create_voabulary,create_voabulary_label
from tflearn.data_utils import to_categorical, pad_sequences
import os
import codecs

#configuration
FLAGS=tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer("label_size",1999,"number of label")
tf.app.flags.DEFINE_float("learning_rate",0.01,"learning rate")
tf.app.flags.DEFINE_integer("batch_size", 512, "Batch size for training/evaluating.") #batch size 32-->128
tf.app.flags.DEFINE_integer("decay_steps", 5000, "how many steps before decay learning rate.")
tf.app.flags.DEFINE_float("decay_rate", 0.9, "Rate of decay for learning rate.") #0.5; how much to decay each time
tf.app.flags.DEFINE_integer("num_sampled",100,"number of noise sampling")
tf.app.flags.DEFINE_string("ckpt_dir","fast_text_checkpoint/","checkpoint location for the model")
tf.app.flags.DEFINE_integer("sentence_len",300,"max sentence length")
tf.app.flags.DEFINE_integer("embed_size",100,"embedding size")
tf.app.flags.DEFINE_boolean("is_training",False,"is traning.true:tranining,false:testing/inferen
SYMBOL INDEX (679 symbols across 83 files)

FILE: a00_Bert/bert_modeling.py
  class BertConfig (line 30) | class BertConfig(object):
    method __init__ (line 33) | def __init__(self,
    method from_dict (line 82) | def from_dict(cls, json_object):
    method from_json_file (line 90) | def from_json_file(cls, json_file):
    method to_dict (line 96) | def to_dict(self):
    method to_json_string (line 101) | def to_json_string(self):
  class BertModel (line 106) | class BertModel(object):
    method __init__ (line 130) | def __init__(self,
    method get_pooled_output (line 246) | def get_pooled_output(self):
    method get_sequence_output (line 249) | def get_sequence_output(self):
    method get_all_encoder_layers (line 258) | def get_all_encoder_layers(self):
    method get_embedding_output (line 261) | def get_embedding_output(self):
    method get_embedding_table (line 272) | def get_embedding_table(self):
  function gelu (line 276) | def gelu(input_tensor):
  function get_activation (line 292) | def get_activation(activation_string):
  function get_assignment_map_from_checkpoint (line 329) | def get_assignment_map_from_checkpoint(tvars, init_checkpoint):
  function dropout (line 356) | def dropout(input_tensor, dropout_prob):
  function layer_norm (line 374) | def layer_norm(input_tensor, name=None):
  function layer_norm_and_dropout (line 380) | def layer_norm_and_dropout(input_tensor, dropout_prob, name=None):
  function create_initializer (line 387) | def create_initializer(initializer_range=0.02):
  function embedding_lookup (line 392) | def embedding_lookup(input_ids,
  function embedding_postprocessor (line 441) | def embedding_postprocessor(input_tensor,
  function create_attention_mask_from_input_mask (line 544) | def create_attention_mask_from_input_mask(from_tensor, to_mask):
  function attention_layer (line 578) | def attention_layer(from_tensor,
  function transformer_model (line 774) | def transformer_model(input_tensor,
  function get_shape_list (line 915) | def get_shape_list(tensor, expected_rank=None, name=None):
  function reshape_to_matrix (line 952) | def reshape_to_matrix(input_tensor):
  function reshape_from_matrix (line 966) | def reshape_from_matrix(output_tensor, orig_shape_list):
  function assert_rank (line 979) | def assert_rank(tensor, expected_rank, name=None):

FILE: a00_Bert/optimization.py
  function create_optimizer (line 25) | def create_optimizer(loss, init_lr, num_train_steps, num_warmup_steps, u...
  class AdamWeightDecayOptimizer (line 84) | class AdamWeightDecayOptimizer(tf.train.Optimizer):
    method __init__ (line 87) | def __init__(self,
    method apply_gradients (line 105) | def apply_gradients(self, grads_and_vars, global_step=None, name=None):
    method _do_use_weight_decay (line 156) | def _do_use_weight_decay(self, param_name):
    method _get_variable_name (line 166) | def _get_variable_name(self, param_name):

FILE: a00_Bert/run_classifier_predict_online.py
  class InputExample (line 56) | class InputExample(object):
    method __init__ (line 59) | def __init__(self, guid, text_a, text_b=None, label=None):
  class InputFeatures (line 76) | class InputFeatures(object):
    method __init__ (line 79) | def __init__(self, input_ids, input_mask, segment_ids, label_id):
  class DataProcessor (line 86) | class DataProcessor(object):
    method get_train_examples (line 89) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 93) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 97) | def get_test_examples(self, data_dir):
    method get_labels (line 101) | def get_labels(self):
    method _read_tsv (line 106) | def _read_tsv(cls, input_file, quotechar=None):
  class SentencePairClassificationProcessor (line 115) | class SentencePairClassificationProcessor(DataProcessor):
    method __init__ (line 117) | def __init__(self):
    method get_labels (line 135) | def get_labels(self):
  function convert_single_example (line 153) | def convert_single_example(ex_index, example, label_list, max_seq_length...
  function _truncate_seq_pair (line 243) | def _truncate_seq_pair(tokens_a, tokens_b, max_length):
  function create_int_feature (line 259) | def create_int_feature(values):
  function create_model (line 263) | def create_model(bert_config, is_training, input_ids, input_mask, segmen...
  function main (line 322) | def main(_):
  function predict_online (line 354) | def predict_online(line):

FILE: a00_Bert/tokenization.py
  function convert_to_unicode (line 27) | def convert_to_unicode(text):
  function printable_text (line 47) | def printable_text(text):
  function load_vocab (line 70) | def load_vocab(vocab_file):
  function convert_by_vocab (line 85) | def convert_by_vocab(vocab, items):
  function convert_tokens_to_ids (line 93) | def convert_tokens_to_ids(vocab, tokens):
  function convert_ids_to_tokens (line 97) | def convert_ids_to_tokens(inv_vocab, ids):
  function whitespace_tokenize (line 101) | def whitespace_tokenize(text):
  class FullTokenizer (line 110) | class FullTokenizer(object):
    method __init__ (line 113) | def __init__(self, vocab_file, do_lower_case=True):
    method tokenize (line 119) | def tokenize(self, text):
    method convert_tokens_to_ids (line 127) | def convert_tokens_to_ids(self, tokens):
    method convert_ids_to_tokens (line 130) | def convert_ids_to_tokens(self, ids):
  class BasicTokenizer (line 134) | class BasicTokenizer(object):
    method __init__ (line 137) | def __init__(self, do_lower_case=True):
    method tokenize (line 145) | def tokenize(self, text):
    method _run_strip_accents (line 169) | def _run_strip_accents(self, text):
    method _run_split_on_punc (line 180) | def _run_split_on_punc(self, text):
    method _tokenize_chinese_chars (line 200) | def _tokenize_chinese_chars(self, text):
    method _is_chinese_char (line 213) | def _is_chinese_char(self, cp):
    method _clean_text (line 235) | def _clean_text(self, text):
  class WordpieceTokenizer (line 249) | class WordpieceTokenizer(object):
    method __init__ (line 252) | def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=...
    method tokenize (line 257) | def tokenize(self, text):
  function _is_whitespace (line 311) | def _is_whitespace(char):
  function _is_control (line 323) | def _is_control(char):
  function _is_punctuation (line 335) | def _is_punctuation(char):

FILE: a00_Bert/train_bert_multi-label.py
  function main (line 35) | def main(_):
  function create_model (line 103) | def create_model(bert_config, is_training, input_ids, input_mask, segmen...
  function do_eval (line 143) | def do_eval(sess,input_ids,input_mask,segment_ids,label_ids,is_training,...
  function get_input_mask_segment_ids (line 175) | def get_input_mask_segment_ids(train_x_batch,cls_id):

FILE: a00_Bert/train_bert_toy_task.py
  function bert_train_fn (line 15) | def bert_train_fn():
  function bert_predict_fn (line 49) | def bert_predict_fn():
  function create_model (line 53) | def create_model(bert_config, is_training, input_ids, input_mask, segmen...

FILE: a00_Bert/unused/run_classifier_multi_labels_bert.py
  class InputExample (line 135) | class InputExample(object):
    method __init__ (line 138) | def __init__(self, guid, text_a, text_b=None, label=None):
  class InputFeatures (line 156) | class InputFeatures(object):
    method __init__ (line 159) | def __init__(self, input_ids, input_mask, segment_ids, label_id):
  class DataProcessor (line 166) | class DataProcessor(object):
    method get_train_examples (line 169) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 173) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 177) | def get_test_examples(self, data_dir):
    method get_labels (line 181) | def get_labels(self):
    method _read_tsv (line 186) | def _read_tsv(cls, input_file, quotechar=None):
  class XnliProcessor (line 196) | class XnliProcessor(DataProcessor):
    method __init__ (line 199) | def __init__(self):
    method get_train_examples (line 202) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 221) | def get_dev_examples(self, data_dir):
    method get_labels (line 239) | def get_labels(self):
  class MnliProcessor (line 244) | class MnliProcessor(DataProcessor):
    method get_train_examples (line 247) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 252) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 258) | def get_test_examples(self, data_dir):
    method get_labels (line 263) | def get_labels(self):
    method _create_examples (line 267) | def _create_examples(self, lines, set_type):
  class MrpcProcessor (line 285) | class MrpcProcessor(DataProcessor):
    method get_train_examples (line 288) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 293) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 298) | def get_test_examples(self, data_dir):
    method get_labels (line 303) | def get_labels(self):
    method _create_examples (line 307) | def _create_examples(self, lines, set_type):
  class ColaProcessor (line 325) | class ColaProcessor(DataProcessor):
    method get_train_examples (line 328) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 333) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 338) | def get_test_examples(self, data_dir):
    method get_labels (line 343) | def get_labels(self):
    method _create_examples (line 347) | def _create_examples(self, lines, set_type):
  class SentimentAnalysisFineGrainProcessor (line 365) | class SentimentAnalysisFineGrainProcessor(DataProcessor):
    method get_train_examples (line 368) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 373) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 378) | def get_test_examples(self, data_dir):
    method get_labels (line 383) | def get_labels(self):
    method _create_examples (line 393) | def _create_examples(self, lines, set_type):
  class SentencePairClassificationProcessor (line 414) | class SentencePairClassificationProcessor(DataProcessor):
    method __init__ (line 416) | def __init__(self):
    method get_train_examples (line 419) | def get_train_examples(self, data_dir):
    method get_dev_examples (line 424) | def get_dev_examples(self, data_dir):
    method get_test_examples (line 429) | def get_test_examples(self, data_dir):
    method get_labels (line 434) | def get_labels(self):
    method _create_examples (line 438) | def _create_examples(self, lines, set_type):
  function convert_single_example (line 452) | def convert_single_example(ex_index, example, label_list, max_seq_length,
  function file_based_convert_examples_to_features (line 558) | def file_based_convert_examples_to_features(
  function file_based_input_fn_builder (line 591) | def file_based_input_fn_builder(input_file, seq_length, is_training,
  function _truncate_seq_pair (line 638) | def _truncate_seq_pair(tokens_a, tokens_b, max_length):
  function create_model_original (line 655) | def create_model_original(bert_config, is_training, input_ids, input_mas...
  function create_model (line 699) | def create_model(bert_config, is_training, input_ids, input_mask, segmen...
  function model_fn_builder (line 754) | def model_fn_builder(bert_config, num_labels, init_checkpoint, learning_...
  function input_fn_builder (line 851) | def input_fn_builder(features, seq_length, is_training, drop_remainder):
  function convert_examples_to_features (line 905) | def convert_examples_to_features(examples, label_list, max_seq_length,
  function main (line 921) | def main(_):

FILE: a00_Bert/unused/train_bert_multi-label_old.py
  function main (line 35) | def main(_):
  function create_model (line 104) | def create_model(bert_config, is_training, input_ids, input_mask, segmen...
  function do_eval (line 135) | def do_eval(sess,input_ids,input_mask,segment_ids,label_ids,is_training,...
  function get_input_mask_segment_ids (line 175) | def get_input_mask_segment_ids(train_x_batch,cls_id):

FILE: a00_Bert/utils.py
  function load_data (line 11) | def load_data(cache_file_h5py,cache_file_pickle):
  function compute_f1_score (line 43) | def compute_f1_score(predict_y,eval_y):
  function compute_f1_score_removed (line 55) | def compute_f1_score_removed(label_list_top5,eval_y):
  function compute_confuse_matrix (line 76) | def compute_confuse_matrix(target_y,predict_y,label_dict,name='default'):
  function compute_micro_macro (line 104) | def compute_micro_macro(label_dict):
  function compute_TF_FP_FN_micro (line 114) | def compute_TF_FP_FN_micro(label_dict):
  function compute_f1_micro_use_TFFPFN (line 127) | def compute_f1_micro_use_TFFPFN(label_dict):
  function compute_f1_macro_use_TFFPFN (line 137) | def compute_f1_macro_use_TFFPFN(label_dict):
  function compute_f1 (line 154) | def compute_f1(TP,FP,FN,compute_type):
  function init_label_dict (line 169) | def init_label_dict(num_classes):
  function get_target_label_short (line 180) | def get_target_label_short(eval_y):
  function get_target_label_short_batch (line 187) | def get_target_label_short_batch(eval_y_big): # tested.
  function get_label_using_prob (line 207) | def get_label_using_prob(prob,top_number=5):
  function get_label_using_logits_batch (line 213) | def get_label_using_logits_batch(prob,top_number=5): # tested.
  function calculate_accuracy (line 222) | def calculate_accuracy(labels_predicted, labels,eval_counter):
  function compute_confuse_matrix_batch (line 239) | def compute_confuse_matrix_batch(y_targetlabel_list,y_logits_array,label...

FILE: a00_boosting/a08_boosting.py
  function compute_labels_weights (line 14) | def compute_labels_weights(weights_label,logits,labels):
  function get_weights_for_current_batch (line 42) | def get_weights_for_current_batch(answer_list,weights_dict):
  function loss (line 59) | def loss(logits,labels,weights):
  function get_weights_label_as_standard_dict (line 65) | def get_weights_label_as_standard_dict(weights_label):

FILE: a01_FastText/old_single_label/p5_fastTextB_model.py
  class fastTextB (line 7) | class fastTextB:
    method __init__ (line 8) | def __init__(self, label_size, learning_rate, batch_size, decay_steps,...
    method instantiate_weights (line 40) | def instantiate_weights(self):
    method inference (line 47) | def inference(self):
    method loss (line 59) | def loss(self,l2_lambda=0.01): #0.0001-->0.001
    method train (line 86) | def train(self):
  function test (line 93) | def test():

FILE: a01_FastText/old_single_label/p5_fastTextB_predict.py
  function main (line 37) | def main(_):
  function get_label_using_logits (line 84) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function write_question_id_with_labels (line 96) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a01_FastText/old_single_label/p5_fastTextB_train.py
  function main (line 34) | def main(_):
  function assign_pretrained_word_embedding (line 118) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 150) | def do_eval(sess,fast_text,evalX,evalY,batch_size):
  function load_data (line 159) | def load_data(cache_file_h5py,cache_file_pickle):

FILE: a01_FastText/p5_fastTextB_predict_multilabel.py
  function main (line 33) | def main(_):
  function get_label_using_logits (line 83) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function write_question_id_with_labels (line 93) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a01_FastText/p6_fastTextB_model_multilabel.py
  class fastTextB (line 6) | class fastTextB:
    method __init__ (line 7) | def __init__(self, label_size, learning_rate, batch_size, decay_steps,...
    method instantiate_weights (line 45) | def instantiate_weights(self):
    method inference (line 52) | def inference(self):
    method loss (line 65) | def loss(self,l2_lambda=0.0001):
    method train (line 95) | def train(self):

FILE: a01_FastText/p6_fastTextB_train_multilabel.py
  function main (line 39) | def main(_):
  function do_eval (line 124) | def do_eval(sess,fast_text,evalX,evalY,batch_size,vocabulary_index2word_...
  function assign_pretrained_word_embedding (line 142) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function get_label_using_logits (line 176) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 186) | def calculate_accuracy(labels_predicted, labels,eval_counter):
  function load_data (line 197) | def load_data(cache_file_h5py,cache_file_pickle):
  function process_labels (line 228) | def process_labels(trainY_batch,require_size=5,number=None):
  function proces_label_to_algin (line 253) | def proces_label_to_algin(ys_list,require_size=5):

FILE: a02_TextCNN/data_util.py
  function load_data_multilabel (line 16) | def load_data_multilabel(traning_data_path,vocab_word2index, vocab_label...
  function transform_multilabel_as_multihot (line 52) | def transform_multilabel_as_multihot(label_list,label_size):
  function create_vocabulary (line 65) | def create_vocabulary(training_data_path,vocab_size,name_scope='cnn'):
  function load_data (line 130) | def load_data(cache_file_h5py,cache_file_pickle):

FILE: a02_TextCNN/other_experiement/data_util_zhihu.py
  function create_voabulary (line 15) | def create_voabulary(simple=None,word2vec_model_path='zhihu-word2vec-tit...
  function create_voabulary_label (line 47) | def create_voabulary_label(voabulary_label='train-zhihu4-only-title-all....
  function sort_by_value (line 97) | def sort_by_value(d):
  function create_voabulary_labelO (line 103) | def create_voabulary_labelO():
  function load_data_multilabel_new (line 119) | def load_data_multilabel_new(vocabulary_word2index,vocabulary_word2index...
  function load_data_multilabel_new_twoCNN (line 203) | def load_data_multilabel_new_twoCNN(vocabulary_word2index,vocabulary_wor...
  function load_data (line 265) | def load_data(vocabulary_word2index,vocabulary_word2index_label,valid_po...
  function process_one_sentence_to_get_ui_bi_tri_gram (line 306) | def process_one_sentence_to_get_ui_bi_tri_gram(sentence,n_gram=3):
  function load_data_with_multilabels (line 336) | def load_data_with_multilabels(vocabulary_word2index,vocabulary_word2ind...
  function transform_multilabel_as_multihot (line 388) | def transform_multilabel_as_multihot(label_list,label_size=1999): #1999l...
  function transform_multilabel_as_multihotO (line 400) | def transform_multilabel_as_multihotO(label_list,label_size=1999): #1999...
  function load_final_test_data (line 407) | def load_final_test_data(file_path):
  function load_data_predict (line 418) | def load_data_predict(vocabulary_word2index,vocabulary_word2index_label,...
  function proces_label_to_algin (line 436) | def proces_label_to_algin(ys_list,require_size=5):
  function write_uigram_to_trigram (line 455) | def write_uigram_to_trigram():
  function test_pad (line 461) | def test_pad():
  function read_topic_info (line 468) | def read_topic_info():
  function stat_training_data_length (line 485) | def stat_training_data_length():

FILE: a02_TextCNN/other_experiement/p7_TextCNN_model_multilayers.py
  class TextCNNMultilayers (line 7) | class TextCNNMultilayers:
    method __init__ (line 8) | def __init__(self, filter_sizes,num_filters,num_classes, learning_rate...
    method instantiate_weights (line 52) | def instantiate_weights(self):
    method inference (line 59) | def inference(self):
    method loss_multilabel (line 116) | def loss_multilabel(self,l2_lambda=0.0001): #0.0001#this loss function...
    method loss (line 131) | def loss(self,l2_lambda=0.0001):#0.001
    method train_old (line 142) | def train_old(self):
  function test (line 152) | def test():
  function get_label_y (line 180) | def get_label_y(input_x):
  function compute_single_label (line 189) | def compute_single_label(listt):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_predict_ensemble.py
  function main (line 4) | def main(_):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_predict_exp.py
  function get_logits_with_value_by_input_exp (line 81) | def get_logits_with_value_by_input_exp(start,end):
  function main (line 91) | def main(_):
  function get_label_using_logits (line 136) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 146) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 158) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512.py
  function main (line 40) | def main(_):
  function get_label_using_logits (line 85) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 95) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 107) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512_0609.py
  function main (line 40) | def main(_):
  function get_label_using_logits (line 85) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 95) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 107) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512_simple.py
  function main (line 40) | def main(_):
  function get_label_using_logits (line 85) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 95) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 107) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_train_exp.py
  function main (line 40) | def main(_):
  function assign_pretrained_word_embedding (line 120) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 153) | def do_eval(sess,textCNN,evalX,evalY,batch_size,vocabulary_index2word_la...
  function get_label_using_logits (line 169) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 180) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_train_exp512.py
  function main (line 40) | def main(_):
  function assign_pretrained_word_embedding (line 123) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 156) | def do_eval(sess,textCNN,evalX,evalY,batch_size,vocabulary_index2word_la...
  function get_label_using_logits (line 172) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 183) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a02_TextCNN/other_experiement/p7_TextCNN_train_exp_512_0609.py
  function main (line 43) | def main(_):
  function assign_pretrained_word_embedding (line 138) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 171) | def do_eval(sess,textCNN,evalX,evalY,batch_size,vocabulary_index2word_la...
  function get_label_using_logits (line 187) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 198) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a02_TextCNN/other_experiement/p8_TextCNN_predict_exp.py
  function get_logits_by_input_exp (line 77) | def get_logits_by_input_exp(start,end):
  function main (line 85) | def main(_):
  function get_label_using_logits (line 130) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 140) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 152) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a02_TextCNN/p7_TextCNN_model.py
  class TextCNN (line 7) | class TextCNN:
    method __init__ (line 8) | def __init__(self, filter_sizes,num_filters,num_classes, learning_rate...
    method instantiate_weights (line 57) | def instantiate_weights(self):
    method inference (line 64) | def inference(self):
    method cnn_single_layer (line 83) | def cnn_single_layer(self):
    method cnn_multiple_layers (line 120) | def cnn_multiple_layers(self):
    method loss_multilabel (line 159) | def loss_multilabel(self,l2_lambda=0.0001): #0.0001#this loss function...
    method loss (line 174) | def loss(self,l2_lambda=0.0001):#0.001
    method train_old (line 185) | def train_old(self):
    method train (line 191) | def train(self):
  function test (line 206) | def test():
  function get_label_y (line 235) | def get_label_y(input_x):
  function compute_single_label (line 244) | def compute_single_label(listt):

FILE: a02_TextCNN/p7_TextCNN_predict.py
  function get_logits_with_value_by_input (line 84) | def get_logits_with_value_by_input(start,end):
  function main (line 94) | def main(_):
  function get_label_using_logits (line 139) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 149) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 161) | def write_question_id_with_labels(question_id,labels_list,f):
  function load_data (line 165) | def load_data(cache_file_h5py,cache_file_pickle):

FILE: a02_TextCNN/p7_TextCNN_train.py
  function main (line 46) | def main(_):
  function do_eval (line 131) | def do_eval(sess, textCNN, evalX, evalY, num_classes):
  function fastF1 (line 159) | def fastF1(result: list, predict: list, num_classes: int):
  function assign_pretrained_word_embedding (line 190) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function load_data (line 222) | def load_data(cache_file_h5py,cache_file_pickle):

FILE: a02_TextCNN/p7_temp.py
  function read_write (line 3) | def read_write(source_file_path,target_file_path):

FILE: a03_TextRNN/p8_TextRNN_model.py
  class TextRNN (line 7) | class TextRNN:
    method __init__ (line 8) | def __init__(self,num_classes, learning_rate, batch_size, decay_steps,...
    method instantiate_weights (line 42) | def instantiate_weights(self):
    method inference (line 49) | def inference(self):
    method loss (line 75) | def loss(self,l2_lambda=0.0001):
    method loss_nce (line 86) | def loss_nce(self,l2_lambda=0.0001): #0.0001-->0.001
    method train (line 105) | def train(self):
  function test (line 112) | def test():

FILE: a03_TextRNN/p8_TextRNN_model_multi_layers.py
  class TextRNN (line 7) | class TextRNN:
    method __init__ (line 8) | def __init__(self,num_classes, learning_rate, batch_size, decay_steps,...
    method instantiate_weights (line 42) | def instantiate_weights(self):
    method inference (line 49) | def inference(self):
    method loss (line 83) | def loss(self,l2_lambda=0.0001):
    method loss_nce (line 94) | def loss_nce(self,l2_lambda=0.0001): #0.0001-->0.001
    method train (line 113) | def train(self):
  function test (line 120) | def test():

FILE: a03_TextRNN/p8_TextRNN_predict.py
  function main (line 32) | def main(_):
  function get_label_using_logits (line 83) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_batch (line 96) | def get_label_using_logits_batch(question_id_sublist,logits_batch,vocabu...
  function write_question_id_with_labels (line 110) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a03_TextRNN/p8_TextRNN_train.py
  function main (line 33) | def main(_):
  function assign_pretrained_word_embedding (line 108) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 141) | def do_eval(sess,textRNN,evalX,evalY,batch_size,vocabulary_index2word_la...
  function get_label_using_logits (line 154) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 165) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a04_TextRCNN/p71_TextRCNN_mode2.py
  class TextRCNN (line 6) | class TextRCNN:
    method __init__ (line 7) | def __init__(self,num_classes, learning_rate, decay_steps, decay_rate,...
    method instantiate_weights (line 52) | def instantiate_weights(self):
    method get_context_left (line 74) | def get_context_left(self,context_left,embedding_previous):
    method get_context_right (line 87) | def get_context_right(self,context_right,embedding_afterward):
    method conv_layer_with_recurrent_structure (line 100) | def conv_layer_with_recurrent_structure(self):
    method inference (line 142) | def inference(self):
    method loss (line 162) | def loss(self,l2_lambda=0.0001):#0.001
    method loss_multilabel (line 173) | def loss_multilabel(self,l2_lambda=0.00001): #0.0001#this loss functio...
    method train (line 188) | def train(self):
  function test (line 195) | def test():

FILE: a04_TextRCNN/p71_TextRCNN_model.py
  class TextRCNN (line 6) | class TextRCNN:
    method __init__ (line 7) | def __init__(self,num_classes, learning_rate, batch_size, decay_steps,...
    method instantiate_weights (line 51) | def instantiate_weights(self):
    method get_context_left (line 66) | def get_context_left(self,context_left,embedding_previous):
    method get_context_right (line 78) | def get_context_right(self,context_right,embedding_afterward):
    method conv_layer_with_recurrent_structure (line 90) | def conv_layer_with_recurrent_structure(self):
    method inference (line 131) | def inference(self):
    method loss (line 149) | def loss(self,l2_lambda=0.0001):#0.001
    method loss_multilabel (line 160) | def loss_multilabel(self,l2_lambda=0.00001): #0.0001#this loss functio...
    method train (line 175) | def train(self):
  function test (line 182) | def test():

FILE: a04_TextRCNN/p71_TextRCNN_predict.py
  function main (line 35) | def main(_):
  function get_label_using_logits (line 84) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 94) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 106) | def write_question_id_with_labels(question_id,labels_list,f):
  function get_label_using_logits_batch (line 111) | def get_label_using_logits_batch(question_id_sublist,logits_batch,vocabu...
  function write_question_id_with_labels (line 125) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a04_TextRCNN/p71_TextRCNN_train.py
  function main (line 36) | def main(_):
  function assign_pretrained_word_embedding (line 116) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 149) | def do_eval(sess,textCNN,evalX,evalY,batch_size,vocabulary_index2word_la...
  function get_label_using_logits (line 165) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 176) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a05_HierarchicalAttentionNetwork/HAN_model.py
  class HierarchicalAttention (line 8) | class HierarchicalAttention:
    method __init__ (line 9) | def __init__(self,  accusation_num_classes,article_num_classes, deathp...
    method inference (line 55) | def inference(self):
    method attention (line 101) | def attention(self,input_sequences,attention_level,reuse_flag=False):
    method bi_lstm (line 119) | def bi_lstm(self, input_sequences, level,num_units, reuse_flag=False):
    method loss (line 135) | def loss(self,l2_lambda=0.0001):
    method train (line 169) | def train(self):
    method instantiate_weights (line 179) | def instantiate_weights(self):
  function test (line 186) | def test():

FILE: a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_model.py
  class HierarchicalAttention (line 7) | class HierarchicalAttention:
    method __init__ (line 8) | def __init__(self, num_classes, learning_rate, batch_size, decay_steps...
    method attention_word_level (line 65) | def attention_word_level(self, hidden_state):
    method attention_sentence_level (line 101) | def attention_sentence_level(self, hidden_state_sentence):
    method inference (line 133) | def inference(self):
    method loss (line 174) | def loss(self, l2_lambda=0.0001):  # 0.001
    method loss_multilabel (line 187) | def loss_multilabel(self, l2_lambda=0.00001*10): #*3#0.00001 #TODO 0.0...
    method train (line 203) | def train(self):
    method gru_single_step_word_level (line 213) | def gru_single_step_word_level(self, Xt, h_t_minus_1):
    method gru_single_step_sentence_level (line 232) | def gru_single_step_sentence_level(self, Xt,
    method gru_forward_word_level (line 254) | def gru_forward_word_level(self, embedded_words):
    method gru_backward_word_level (line 273) | def gru_backward_word_level(self, embedded_words):
    method gru_forward_sentence_level (line 294) | def gru_forward_sentence_level(self, sentence_representation):
    method gru_backward_sentence_level (line 314) | def gru_backward_sentence_level(self, sentence_representation):
    method instantiate_weights (line 334) | def instantiate_weights(self):
  function test (line 393) | def test():

FILE: a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_model_transformer.py
  class HierarchicalAttention (line 8) | class HierarchicalAttention:
    method __init__ (line 9) | def __init__(self, num_classes, learning_rate, batch_size, decay_steps...
    method attention_word_level (line 66) | def attention_word_level(self, hidden_state):
    method attention_sentence_level (line 93) | def attention_sentence_level(self, hidden_state_sentence):
    method inference (line 125) | def inference(self):
    method loss (line 192) | def loss(self, l2_lambda=0.0001):  # 0.001
    method loss_multilabel (line 205) | def loss_multilabel(self, l2_lambda=0.00001*10): #*3#0.00001 #TODO 0.0...
    method train (line 221) | def train(self):
    method gru_single_step_word_level (line 231) | def gru_single_step_word_level(self, Xt, h_t_minus_1):
    method gru_single_step_sentence_level (line 250) | def gru_single_step_sentence_level(self, Xt,
    method gru_forward_word_level (line 272) | def gru_forward_word_level(self, embedded_words):
    method gru_backward_word_level (line 292) | def gru_backward_word_level(self, embedded_words):
    method gru_forward_sentence_level (line 313) | def gru_forward_sentence_level(self, sentence_representation):
    method gru_backward_sentence_level (line 333) | def gru_backward_sentence_level(self, sentence_representation):
    method layer_normalization (line 353) | def layer_normalization(self,x,scope):
    method instantiate_weights (line 369) | def instantiate_weights(self):
  function test (line 429) | def test():
  function get_input_y (line 461) | def get_input_y(i,input_x,batch_size):

FILE: a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_predict.py
  function main (line 43) | def main(_):
  function get_label_using_logits (line 93) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 103) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 115) | def write_question_id_with_labels(question_id,labels_list,f):
  function get_label_using_logits_batch (line 120) | def get_label_using_logits_batch(question_id_sublist,logits_batch,vocabu...
  function write_question_id_with_labels (line 134) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_train.py
  function main (line 42) | def main(_):
  function assign_pretrained_word_embedding (line 149) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 182) | def do_eval(sess,textCNN,evalX,evalY,batch_size,vocabulary_index2word_la...
  function get_label_using_logits (line 198) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 209) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a05_HierarchicalAttentionNetwork/p1_seq2seq.py
  function extract_argmax_and_embed (line 5) | def extract_argmax_and_embed(embedding, output_projection=None):
  function rnn_decoder_with_attention (line 23) | def rnn_decoder_with_attention(decoder_inputs, initial_state, cell, loop...

FILE: a06_Seq2seqWithAttention/a1_seq2seq.py
  function extract_argmax_and_embed (line 5) | def extract_argmax_and_embed(embedding, output_projection=None):
  function rnn_decoder_with_attention (line 23) | def rnn_decoder_with_attention(decoder_inputs, initial_state, cell, loop...

FILE: a06_Seq2seqWithAttention/a1_seq2seq_attention_model.py
  class seq2seq_attention_model (line 10) | class seq2seq_attention_model:
    method __init__ (line 11) | def __init__(self, num_classes, learning_rate, batch_size, decay_steps...
    method inference (line 50) | def inference(self):
    method loss_seq2seq (line 95) | def loss_seq2seq(self):
    method train (line 106) | def train(self):
    method gru_cell (line 115) | def gru_cell(self, Xt, h_t_minus_1):
    method gru_cell_decoder (line 132) | def gru_cell_decoder(self, Xt, h_t_minus_1,context_vector):
    method gru_forward (line 151) | def gru_forward(self, embedded_words,gru_cell, reverse=False):
    method instantiate_weights (line 170) | def instantiate_weights(self):
  function test (line 217) | def test():
  function get_unique_labels (line 249) | def get_unique_labels():

FILE: a06_Seq2seqWithAttention/a1_seq2seq_attention_predict.py
  function main (line 50) | def main(_):
  function get_label_using_logits (line 96) | def get_label_using_logits(logits, predictions,vocabulary_index2word_lab...
  function process_each_row_get_lable (line 106) | def process_each_row_get_lable(row,vocabulary_index2word_label,vocabular...
  function get_label_using_logitsO (line 125) | def get_label_using_logitsO(pred_list, vocabulary_index2word_label,vocab...
  function write_question_id_with_labels (line 138) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a06_Seq2seqWithAttention/a1_seq2seq_attention_train.py
  function main (line 41) | def main(_):
  function assign_pretrained_word_embedding (line 142) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 175) | def do_eval(sess,model,evalX,evalY,batch_size,vocabulary_index2word_labe...
  function get_label_using_logits (line 196) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 207) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a07_Transformer/a2_attention_between_enc_dec.py
  class AttentionEncoderDecoder (line 13) | class AttentionEncoderDecoder(BaseClass):
    method __init__ (line 14) | def __init__(self,d_model, d_k, d_v, sequence_length, h, batch_size,Q,...
    method attention_encoder_decoder_fn (line 35) | def attention_encoder_decoder_fn(self):
  function test (line 41) | def test():

FILE: a07_Transformer/a2_base_model.py
  class BaseClass (line 6) | class BaseClass(object):
    method __init__ (line 10) | def __init__(self,d_model,d_k,d_v,sequence_length,h,batch_size,num_lay...
    method sub_layer_postion_wise_feed_forward (line 30) | def sub_layer_postion_wise_feed_forward(self ,x ,layer_index,type)  :#...
    method sub_layer_multi_head_attention (line 42) | def sub_layer_multi_head_attention(self ,layer_index ,Q ,K_s,type,mask...
    method sub_layer_layer_norm_residual_connection (line 65) | def sub_layer_layer_norm_residual_connection(self,layer_input ,layer_o...

FILE: a07_Transformer/a2_decoder.py
  class Decoder (line 18) | class Decoder(BaseClass):
    method __init__ (line 19) | def __init__(self,d_model,d_k,d_v,sequence_length,h,batch_size,Q,K_s,K...
    method decoder_fn (line 43) | def decoder_fn(self):
    method decoder_single_layer (line 54) | def decoder_single_layer(self,Q,K_s,layer_index):
  function get_mask (line 94) | def get_mask(sequence_length):
  function init (line 100) | def init():
  function test_decoder_single_layer (line 130) | def test_decoder_single_layer():
  function test_decoder (line 137) | def test_decoder():

FILE: a07_Transformer/a2_encoder.py
  class Encoder (line 13) | class Encoder(BaseClass):
    method __init__ (line 14) | def __init__(self,d_model,d_k,d_v,sequence_length,h,batch_size,num_lay...
    method encoder_fn (line 33) | def encoder_fn(self):
    method encoder_single_layer (line 45) | def encoder_single_layer(self,Q,K_s,layer_index):
  function init (line 66) | def init():
  function get_mask (line 91) | def get_mask(batch_size,sequence_length):
  function test_sub_layer_multi_head_attention (line 97) | def test_sub_layer_multi_head_attention(encoder_class,index_layer,Q,K_s):
  function test_postion_wise_feed_forward (line 101) | def test_postion_wise_feed_forward(encoder_class,x,layer_index):

FILE: a07_Transformer/a2_layer_norm_residual_conn.py
  class LayerNormResidualConnection (line 6) | class LayerNormResidualConnection(object):
    method __init__ (line 7) | def __init__(self,x,y,layer_index,type,residual_dropout=0.1,use_residu...
    method layer_norm_residual_connection (line 16) | def layer_norm_residual_connection(self):
    method residual_connection (line 25) | def residual_connection(self):
    method layer_normalization (line 30) | def layer_normalization(self,x):
  function test (line 48) | def test():

FILE: a07_Transformer/a2_multi_head_attention.py
  class MultiHeadAttention (line 17) | class MultiHeadAttention(object):
    method __init__ (line 19) | def __init__(self,Q,K_s,V_s,d_model,d_k,d_v,sequence_length,h,type=Non...
    method multi_head_attention_fn (line 34) | def multi_head_attention_fn(self):
    method scaled_dot_product_attention_batch_mine (line 58) | def scaled_dot_product_attention_batch_mine(self,Q,K_s,V_s): #my own i...
    method scaled_dot_product_attention_batch (line 86) | def scaled_dot_product_attention_batch(self, Q, K_s, V_s):# scaled dot...
  function multi_head_attention_for_sentence_vectorized (line 119) | def multi_head_attention_for_sentence_vectorized(layer_number):
  function get_mask (line 151) | def get_mask(batch_size,sequence_length):

FILE: a07_Transformer/a2_poistion_wise_feed_forward.py
  class PositionWiseFeedFoward (line 16) | class PositionWiseFeedFoward(object): #TODO make it parallel
    method __init__ (line 21) | def __init__(self,x,layer_index,d_model=512,d_ff=2048):
    method position_wise_feed_forward_fn (line 35) | def position_wise_feed_forward_fn(self):
  function test_position_wise_feed_forward_fn (line 62) | def test_position_wise_feed_forward_fn():
  function test (line 72) | def test():

FILE: a07_Transformer/a2_predict.py
  function main (line 47) | def main(_):
  function get_label_using_logits (line 112) | def get_label_using_logits(logits, predictions,vocabulary_index2word_lab...
  function process_each_row_get_lable (line 122) | def process_each_row_get_lable(row,vocabulary_index2word_label,vocabular...
  function write_question_id_with_labels (line 142) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a07_Transformer/a2_predict_classification.py
  function main (line 44) | def main(_):
  function get_label_using_logits_batch (line 98) | def get_label_using_logits_batch(question_id_sublist, logits_batch, voca...
  function get_label_using_logits (line 115) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function process_each_row_get_lable (line 124) | def process_each_row_get_lable(row,vocabulary_index2word_label,vocabular...
  function write_question_id_with_labels (line 144) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a07_Transformer/a2_train.py
  function main (line 46) | def main(_):
  function assign_pretrained_word_embedding (line 147) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 180) | def do_eval(sess,model,evalX,evalY,batch_size,vocabulary_index2word_labe...

FILE: a07_Transformer/a2_train_classification.py
  function main (line 45) | def main(_):
  function assign_pretrained_word_embedding (line 140) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 173) | def do_eval(sess,model,evalX,evalY,batch_size,vocabulary_index2word_labe...

FILE: a07_Transformer/a2_transformer.py
  class Transformer (line 33) | class Transformer(BaseClass):
    method __init__ (line 34) | def __init__(self, num_classes, learning_rate, batch_size, decay_steps...
    method inference (line 72) | def inference(self):
    method loss_seq2seq (line 110) | def loss_seq2seq(self):
    method train (line 119) | def train(self):
    method instantiate_weights (line 128) | def instantiate_weights(self):
    method get_mask (line 136) | def get_mask(self,sequence_length):
  function test_training (line 142) | def test_training():
  function test_predict (line 187) | def test_predict():
  function test_training_batch (line 231) | def test_training_batch():
  function get_unique_labels (line 282) | def get_unique_labels(length=5):
  function get_unique_labels_batch (line 290) | def get_unique_labels_batch(batch_size,length=None):

FILE: a07_Transformer/a2_transformer_classification.py
  class Transformer (line 32) | class Transformer(BaseClass):
    method __init__ (line 33) | def __init__(self, num_classes, learning_rate, batch_size, decay_steps...
    method inference (line 71) | def inference(self):
    method loss (line 95) | def loss(self, l2_lambda=0.0001):  # 0.001
    method train (line 115) | def train(self):
    method instantiate_weights (line 124) | def instantiate_weights(self):
    method get_mask (line 132) | def get_mask(self,sequence_length):
  function test_training (line 138) | def test_training():
  function test_predict (line 181) | def test_predict():
  function get_unique_labels (line 219) | def get_unique_labels(length=5):
  function get_unique_labels_batch (line 227) | def get_unique_labels_batch(batch_size,length=None):

FILE: a07_Transformer/data_util_zhihu.py
  function create_voabulary (line 13) | def create_voabulary(simple=None,word2vec_model_path='../zhihu-word2vec-...
  function create_voabulary_label (line 45) | def create_voabulary_label(voabulary_label='../train-zhihu4-only-title-a...
  function sort_by_value (line 95) | def sort_by_value(d):
  function create_voabulary_labelO (line 101) | def create_voabulary_labelO():
  function load_data_multilabel_new (line 117) | def load_data_multilabel_new(vocabulary_word2index,vocabulary_word2index...
  function load_data_multilabel_new_twoCNN (line 201) | def load_data_multilabel_new_twoCNN(vocabulary_word2index,vocabulary_wor...
  function load_data (line 263) | def load_data(vocabulary_word2index,vocabulary_word2index_label,valid_po...
  function process_one_sentence_to_get_ui_bi_tri_gram (line 304) | def process_one_sentence_to_get_ui_bi_tri_gram(sentence,n_gram=3):
  function load_data_with_multilabels (line 334) | def load_data_with_multilabels(vocabulary_word2index,vocabulary_word2ind...
  function transform_multilabel_as_multihot (line 386) | def transform_multilabel_as_multihot(label_list,label_size=1999): #1999l...
  function transform_multilabel_as_multihotO (line 398) | def transform_multilabel_as_multihotO(label_list,label_size=1999): #1999...
  function load_final_test_data (line 405) | def load_final_test_data(file_path):
  function load_data_predict (line 416) | def load_data_predict(vocabulary_word2index,vocabulary_word2index_label,...
  function proces_label_to_algin (line 434) | def proces_label_to_algin(ys_list,require_size=5):
  function write_uigram_to_trigram (line 453) | def write_uigram_to_trigram():
  function test_pad (line 459) | def test_pad():
  function read_topic_info (line 466) | def read_topic_info():
  function stat_training_data_length (line 483) | def stat_training_data_length():

FILE: a08_EntityNetwork/a3_entity_network.py
  class EntityNetwork (line 10) | class EntityNetwork:
    method __init__ (line 11) | def __init__(self, num_classes, learning_rate, batch_size, decay_steps...
    method inference (line 68) | def inference(self):
    method embedding_with_mask (line 83) | def embedding_with_mask(self):
    method input_encoder_bow (line 94) | def input_encoder_bow(self):
    method input_encoder_bi_lstm (line 99) | def input_encoder_bi_lstm(self):
    method activation (line 128) | def activation(self,features, scope=None):  # scope=None
    method output_module (line 135) | def output_module(self):
    method rnn_story (line 154) | def rnn_story(self):
    method cell (line 174) | def cell(self,s_t,h_all,w_all,i):
    method loss (line 201) | def loss(self, l2_lambda=0.0001):  # 0.001
    method loss_multilabel (line 212) | def loss_multilabel(self, l2_lambda=0.0001): #this loss function is fo...
    method smoothing_cross_entropy (line 224) | def smoothing_cross_entropy(self,logits, labels, vocab_size, confidenc...
    method train (line 242) | def train(self):
    method instantiate_weights (line 255) | def instantiate_weights(self):
  function test (line 283) | def test():
  function predict (line 325) | def predict():

FILE: a08_EntityNetwork/a3_predict.py
  function main (line 43) | def main(_):
  function get_label_using_logits (line 91) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 101) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 113) | def write_question_id_with_labels(question_id,labels_list,f):
  function get_label_using_logits_batch (line 118) | def get_label_using_logits_batch(question_id_sublist,logits_batch,vocabu...
  function write_question_id_with_labels (line 132) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a08_EntityNetwork/a3_train.py
  function main (line 44) | def main(_):
  function assign_pretrained_word_embedding (line 146) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 179) | def do_eval(sess,model,evalX,evalY,batch_size,vocabulary_index2word_labe...
  function get_label_using_logits (line 199) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 210) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: a08_EntityNetwork/data_util_zhihu.py
  function create_voabulary (line 13) | def create_voabulary(simple=None,word2vec_model_path='../zhihu-word2vec-...
  function create_voabulary_label (line 45) | def create_voabulary_label(voabulary_label='train-zhihu4-only-title-all....
  function sort_by_value (line 95) | def sort_by_value(d):
  function create_voabulary_labelO (line 101) | def create_voabulary_labelO():
  function load_data_multilabel_new (line 117) | def load_data_multilabel_new(vocabulary_word2index,vocabulary_word2index...
  function load_data_multilabel_new_twoCNN (line 201) | def load_data_multilabel_new_twoCNN(vocabulary_word2index,vocabulary_wor...
  function load_data (line 263) | def load_data(vocabulary_word2index,vocabulary_word2index_label,valid_po...
  function process_one_sentence_to_get_ui_bi_tri_gram (line 304) | def process_one_sentence_to_get_ui_bi_tri_gram(sentence,n_gram=3):
  function load_data_with_multilabels (line 334) | def load_data_with_multilabels(vocabulary_word2index,vocabulary_word2ind...
  function transform_multilabel_as_multihot (line 386) | def transform_multilabel_as_multihot(label_list,label_size=1999): #1999l...
  function transform_multilabel_as_multihotO (line 398) | def transform_multilabel_as_multihotO(label_list,label_size=1999): #1999...
  function load_final_test_data (line 405) | def load_final_test_data(file_path):
  function load_data_predict (line 416) | def load_data_predict(vocabulary_word2index,vocabulary_word2index_label,...
  function proces_label_to_algin (line 434) | def proces_label_to_algin(ys_list,require_size=5):
  function write_uigram_to_trigram (line 453) | def write_uigram_to_trigram():
  function test_pad (line 459) | def test_pad():
  function read_topic_info (line 466) | def read_topic_info():
  function stat_training_data_length (line 483) | def stat_training_data_length():

FILE: a08_predict_ensemble.py
  function main (line 69) | def main(_):
  function get_label_using_logits (line 186) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 196) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 208) | def write_question_id_with_labels(question_id,labels_list,f):
  function get_label_using_logits_batch (line 213) | def get_label_using_logits_batch(question_id_sublist,logits_batch,vocabu...
  function write_question_id_with_labels (line 227) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a09_DynamicMemoryNet/a8_dynamic_memory_network.py
  class DynamicMemoryNetwork (line 16) | class DynamicMemoryNetwork:
    method __init__ (line 17) | def __init__(self, num_classes, learning_rate, batch_size, decay_steps...
    method inference (line 74) | def inference(self):
    method input_module (line 86) | def input_module(self):
    method question_module (line 94) | def question_module(self):
    method episodic_memory_module (line 103) | def episodic_memory_module(self):#input(story):[batch_size,story_lengt...
    method answer_module (line 140) | def answer_module(self):
    method gated_gru (line 163) | def gated_gru(self,c_current,h_previous,g_current):
    method attention_mechanism_parallel (line 177) | def attention_mechanism_parallel(self,c_full,m,q,i):
    method x1Wx2_parallel (line 206) | def x1Wx2_parallel(self,x1,x2,scope):
    method gru_cell (line 222) | def gru_cell(self, Xt, h_t_minus_1,variable_scope):
    method loss (line 240) | def loss(self, l2_lambda=0.0001):  # 0.001
    method loss_multilabel (line 251) | def loss_multilabel(self, l2_lambda=0.0001): #0.0001 this loss functio...
    method smoothing_cross_entropy (line 263) | def smoothing_cross_entropy(self,logits, labels, vocab_size, confidenc...
    method train (line 281) | def train(self):
    method instantiate_weights (line 293) | def instantiate_weights(self):
  function train (line 315) | def train():
  function predict (line 356) | def predict():

FILE: a09_DynamicMemoryNet/a8_predict.py
  function main (line 46) | def main(_):
  function get_label_using_logits (line 96) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function get_label_using_logits_with_value (line 106) | def get_label_using_logits_with_value(logits,vocabulary_index2word_label...
  function write_question_id_with_labels (line 118) | def write_question_id_with_labels(question_id,labels_list,f):
  function get_label_using_logits_batch (line 123) | def get_label_using_logits_batch(question_id_sublist,logits_batch,vocabu...
  function write_question_id_with_labels (line 137) | def write_question_id_with_labels(question_id,labels_list,f):

FILE: a09_DynamicMemoryNet/a8_train.py
  function main (line 45) | def main(_):
  function assign_pretrained_word_embedding (line 144) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 177) | def do_eval(sess,model,evalX,evalY,batch_size,vocabulary_index2word_labe...
  function get_label_using_logits (line 192) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 203) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: aa1_data_util/1_process_zhihu.py
  function split_list (line 83) | def split_list(listt):
  function write_data_to_file_system (line 113) | def write_data_to_file_system(file_name, data):
  function write_data_to_file_system_multilabel (line 122) | def write_data_to_file_system_multilabel(file_name, data):

FILE: aa1_data_util/3_process_zhihu_question_topic_relation.py
  function read_topic_info (line 62) | def read_topic_info():
  function split_list (line 98) | def split_list(listt):
  function write_data_to_file_system (line 134) | def write_data_to_file_system(file_name, data):

FILE: aa1_data_util/data_util_zhihu.py
  function create_voabulary (line 13) | def create_voabulary(simple=None,word2vec_model_path='zhihu-word2vec-tit...
  function create_voabulary_label (line 45) | def create_voabulary_label(voabulary_label='train-zhihu4-only-title-all....
  function sort_by_value (line 95) | def sort_by_value(d):
  function create_voabulary_labelO (line 101) | def create_voabulary_labelO():
  function load_data_multilabel_new (line 117) | def load_data_multilabel_new(vocabulary_word2index,vocabulary_word2index...
  function load_data_multilabel_new_twoCNN (line 201) | def load_data_multilabel_new_twoCNN(vocabulary_word2index,vocabulary_wor...
  function load_data (line 263) | def load_data(vocabulary_word2index,vocabulary_word2index_label,valid_po...
  function process_one_sentence_to_get_ui_bi_tri_gram (line 304) | def process_one_sentence_to_get_ui_bi_tri_gram(sentence,n_gram=3):
  function load_data_with_multilabels (line 334) | def load_data_with_multilabels(vocabulary_word2index,vocabulary_word2ind...
  function transform_multilabel_as_multihot (line 386) | def transform_multilabel_as_multihot(label_list,label_size=1999): #1999l...
  function transform_multilabel_as_multihotO (line 398) | def transform_multilabel_as_multihotO(label_list,label_size=1999): #1999...
  function load_final_test_data (line 405) | def load_final_test_data(file_path):
  function load_data_predict (line 416) | def load_data_predict(vocabulary_word2index,vocabulary_word2index_label,...
  function proces_label_to_algin (line 434) | def proces_label_to_algin(ys_list,require_size=5):
  function write_uigram_to_trigram (line 454) | def write_uigram_to_trigram():
  function test_pad (line 460) | def test_pad():
  function read_topic_info (line 467) | def read_topic_info():
  function stat_training_data_length (line 484) | def stat_training_data_length():

FILE: aa2_ClassificationTflearn/p2_classification_tflearn.py
  function convert_int_to_one_hot (line 17) | def convert_int_to_one_hot(number,label_size):

FILE: aa2_ClassificationTflearn/p2_classification_tflearn_demo.py
  function convert_int_to_one_hot (line 17) | def convert_int_to_one_hot(number,label_size):

FILE: aa4_TextCNN_with_RCNN/p72_TextCNN_with_RCNN_model.py
  class TextCNN_with_RCNN (line 8) | class TextCNN_with_RCNN:
    method __init__ (line 9) | def __init__(self, filter_sizes,num_filters,num_classes, learning_rate...
    method instantiate_weights_cnn (line 59) | def instantiate_weights_cnn(self):
    method instantiate_weights_rcnn (line 66) | def instantiate_weights_rcnn(self):
    method inference1 (line 82) | def inference1(self):
    method get_context_left (line 126) | def get_context_left(self,context_left,embedding_previous):
    method get_context_right (line 138) | def get_context_right(self,context_right,embedding_afterward):
    method conv_layer_with_recurrent_structure (line 150) | def conv_layer_with_recurrent_structure(self):
    method inference2 (line 191) | def inference2(self):
    method inference (line 209) | def inference(self):
    method loss (line 219) | def loss(self,l2_lambda=0.0001):#0.001
    method loss_multilabel (line 230) | def loss_multilabel(self,l2_lambda=0.00001): #0.0001#this loss functio...
    method train (line 245) | def train(self):
  function test (line 252) | def test():

FILE: aa4_TextCNN_with_RCNN/p72_TextCNN_with_RCNN_train.py
  function main (line 39) | def main(_):
  function assign_pretrained_word_embedding (line 119) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 152) | def do_eval(sess,textCNN,evalX,evalY,batch_size,vocabulary_index2word_la...
  function get_label_using_logits (line 168) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 179) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: aa5_BiLstmTextRelation/p9_BiLstmTextRelation_model.py
  class BiLstmTextRelation (line 12) | class BiLstmTextRelation:
    method __init__ (line 13) | def __init__(self,num_classes, learning_rate, batch_size, decay_steps,...
    method instantiate_weights (line 47) | def instantiate_weights(self):
    method inference (line 54) | def inference(self):
    method loss (line 79) | def loss(self,l2_lambda=0.0001):
    method train (line 90) | def train(self):

FILE: aa5_BiLstmTextRelation/p9_BiLstmTextRelation_train.py
  function main (line 33) | def main(_):
  function assign_pretrained_word_embedding (line 117) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 150) | def do_eval(sess,biLstmTR,evalX,evalY,batch_size,vocabulary_index2word_l...
  function get_label_using_logits (line 163) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 174) | def calculate_accuracy(labels_predicted, labels,eval_counter):

FILE: aa6_TwoCNNTextRelation/p9_twoCNNTextRelation_model.py
  class TwoCNNTextRelation (line 9) | class TwoCNNTextRelation:
    method __init__ (line 10) | def __init__(self, filter_sizes,num_filters,num_classes, learning_rate...
    method instantiate_weights (line 47) | def instantiate_weights(self):
    method inference (line 54) | def inference(self):
    method conv_relu_pool_dropout (line 72) | def conv_relu_pool_dropout(self,sentence_embeddings_expanded, name_sco...
    method loss (line 113) | def loss(self,l2_lambda=0.0001):#0.001
    method loss_multilabel (line 124) | def loss_multilabel(self,l2_lambda=0.001): #this loss function is for ...
    method train (line 138) | def train(self):

FILE: aa6_TwoCNNTextRelation/p9_twoCNNTextRelation_train.py
  function main (line 36) | def main(_):
  function assign_pretrained_word_embedding (line 117) | def assign_pretrained_word_embedding(sess,vocabulary_index2word,vocab_si...
  function do_eval (line 150) | def do_eval(sess,twoCNNTR,evalX,evalX2,evalY,batch_size,vocabulary_index...
  function get_label_using_logits (line 163) | def get_label_using_logits(logits,vocabulary_index2word_label,top_number...
  function calculate_accuracy (line 174) | def calculate_accuracy(labels_predicted, labels,eval_counter):
Condensed preview — 108 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (23,753K chars).
[
  {
    "path": ".travis.yml",
    "chars": 602,
    "preview": "language: python\npython:\n    - 2.7.13\n    - 3.6.2\ninstall:\n    - pip install flake8==3.3.0  # pytest  # add another test"
  },
  {
    "path": "LICENSE.md",
    "chars": 1069,
    "preview": "MIT License\n\nCopyright (c) [year] [fullname]\n\nPermission is hereby granted, free of charge, to any person obtaining a co"
  },
  {
    "path": "README.md",
    "chars": 38372,
    "preview": "Text Classification\n-------------------------------------------------------------------------\nThe purpose of this reposi"
  },
  {
    "path": "a00_Bert/README_bert.md",
    "chars": 186,
    "preview": "\n1. train bert for multi-label classification:\n\n    python try train_bert_multi-label.py\n    \n2. to run bert without rea"
  },
  {
    "path": "a00_Bert/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "a00_Bert/bert_modeling.py",
    "chars": 38611,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "a00_Bert/optimization.py",
    "chars": 6046,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "a00_Bert/run_classifier_predict_online.py",
    "chars": 15219,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "a00_Bert/tokenization.py",
    "chars": 10559,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "a00_Bert/train_bert_multi-label.py",
    "chars": 12150,
    "preview": "# coding=utf-8\n\"\"\"\ntrain bert model\n\n1.get training data and vocabulary & labels dict\n2. create model\n3. train the model"
  },
  {
    "path": "a00_Bert/train_bert_toy_task.py",
    "chars": 3673,
    "preview": "# coding=utf-8\n\"\"\"\ntrain bert model\n\"\"\"\nimport modeling\nimport tensorflow as tf\nimport numpy as np\nimport argparse\n\npars"
  },
  {
    "path": "a00_Bert/unused/run_classifier_multi_labels_bert.py",
    "chars": 39638,
    "preview": "# coding=utf-8\n# Copyright 2018 The Google AI Language Team Authors.\n#\n# Licensed under the Apache License, Version 2.0 "
  },
  {
    "path": "a00_Bert/unused/train_bert_multi-label_old.py",
    "chars": 11565,
    "preview": "# coding=utf-8\n\"\"\"\ntrain bert model\n\n1.get training data and vocabulary & labels dict\n2. create model\n3. train the model"
  },
  {
    "path": "a00_Bert/utils.py",
    "chars": 8755,
    "preview": "# -*- coding: utf-8 -*-\n\nimport pickle\nimport h5py\nimport os\nimport numpy as np\nimport random\n\nrandom_number=300\n\ndef lo"
  },
  {
    "path": "a00_boosting/a08_boosting.py",
    "chars": 2549,
    "preview": "# -*- coding: utf-8 -*-\r\nimport sys\r\nreload(sys)\r\nsys.setdefaultencoding('utf8')\r\nimport numpy as np\r\nimport tensorflow "
  },
  {
    "path": "a01_FastText/old_single_label/p5_fastTextB_model.py",
    "chars": 7221,
    "preview": "# fast text. using: very simple model;n-gram to captrue location information;h-softmax to speed up training/inference\n# "
  },
  {
    "path": "a01_FastText/old_single_label/p5_fastTextB_predict.py",
    "chars": 5242,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a01_FastText/old_single_label/p5_fastTextB_train.py",
    "chars": 10912,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a01_FastText/p5_fastTextB_predict_multilabel.py",
    "chars": 5155,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a01_FastText/p6_fastTextB_model_multilabel.py",
    "chars": 6595,
    "preview": "# autor:xul\n# fast text. using: very simple model;n-gram to captrue location information;h-softmax to speed up training/"
  },
  {
    "path": "a01_FastText/p6_fastTextB_train_multilabel.py",
    "chars": 15622,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\ntraining the model.\nprocess--->1.load data(X:list of lint,y:int). 2.create session. 3.feed d"
  },
  {
    "path": "a02_TextCNN/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "a02_TextCNN/data_util.py",
    "chars": 6514,
    "preview": "# -*- coding: utf-8 -*-\nimport codecs\nimport random\nimport numpy as np\nfrom tflearn.data_utils import pad_sequences\nfrom"
  },
  {
    "path": "a02_TextCNN/other_experiement/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "a02_TextCNN/other_experiement/data_util_zhihu.py",
    "chars": 26699,
    "preview": "# -*- coding: utf-8 -*-\nimport codecs\nimport numpy as np\n#load data of zhihu\nimport word2vec\nimport os\nimport pickle\nPAD"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_model_multilayers.py",
    "chars": 14176,
    "preview": "# -*- coding: utf-8 -*-\n#TextCNN: 1. embeddding layers, 2.convolutional layer, 3.max-pooling, 4.softmax layer.\n# print(\""
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_predict_ensemble.py",
    "chars": 693,
    "preview": "from  p7_TextCNN_predict import get_logits_with_value_by_input\nfrom p7_TextCNN_predict_exp import get_logits_with_value_"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_predict_exp.py",
    "chars": 9145,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512.py",
    "chars": 6648,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512_0609.py",
    "chars": 6620,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512_simple.py",
    "chars": 6677,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_train_exp.py",
    "chars": 11961,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_train_exp512.py",
    "chars": 12110,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a02_TextCNN/other_experiement/p7_TextCNN_train_exp_512_0609.py",
    "chars": 14173,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a02_TextCNN/other_experiement/p8_TextCNN_predict_exp.py",
    "chars": 8953,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a02_TextCNN/p7_TextCNN_model.py",
    "chars": 17954,
    "preview": "# -*- coding: utf-8 -*-\n#TextCNN: 1. embeddding layers, 2.convolutional layer, 3.max-pooling, 4.softmax layer.\n# print(\""
  },
  {
    "path": "a02_TextCNN/p7_TextCNN_predict.py",
    "chars": 10784,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a02_TextCNN/p7_TextCNN_train.py",
    "chars": 14075,
    "preview": "# -*- coding: utf-8 -*-\n#import sys\n#reload(sys)\n#sys.setdefaultencoding('utf-8') #gb2312\n#training the model.\n#process-"
  },
  {
    "path": "a02_TextCNN/p7_temp.py",
    "chars": 1531,
    "preview": "# -*- coding: utf-8 -*-\nimport random\ndef read_write(source_file_path,target_file_path):\n    # 1.read file\n    source_fi"
  },
  {
    "path": "a03_TextRNN/p8_TextRNN_model.py",
    "chars": 9061,
    "preview": "# -*- coding: utf-8 -*-\n#TextRNN: 1. embeddding layer, 2.Bi-LSTM layer, 3.concat output, 4.FC layer, 5.softmax\nimport te"
  },
  {
    "path": "a03_TextRNN/p8_TextRNN_model_multi_layers.py",
    "chars": 9226,
    "preview": "# -*- coding: utf-8 -*-\n#TextRNN: 1. embeddding layer, 2.Bi-LSTM layer, 3.concat output, 4.FC layer, 5.softmax\nimport te"
  },
  {
    "path": "a03_TextRNN/p8_TextRNN_predict.py",
    "chars": 7151,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a03_TextRNN/p8_TextRNN_train.py",
    "chars": 11187,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a04_TextRCNN/p71_TextRCNN_mode2.py",
    "chars": 14149,
    "preview": "# -*- coding: utf-8 -*-\n#TextRNN: 1. embeddding layer, 2.Bi-LSTM layer, 3.concat output, 4.FC layer, 5.softmax\nimport te"
  },
  {
    "path": "a04_TextRCNN/p71_TextRCNN_model.py",
    "chars": 12371,
    "preview": "# -*- coding: utf-8 -*-\n#TextRNN: 1. embeddding layer, 2.Bi-LSTM layer, 3.concat output, 4.FC layer, 5.softmax\nimport te"
  },
  {
    "path": "a04_TextRCNN/p71_TextRCNN_predict.py",
    "chars": 7342,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a04_TextRCNN/p71_TextRCNN_train.py",
    "chars": 11741,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a05_HierarchicalAttentionNetwork/HAN_model.py",
    "chars": 15334,
    "preview": "# -*- coding: utf-8 -*-\n# HierarchicalAttention: 1.Word Encoder. 2.Word Attention. 3.Sentence Encoder 4.Sentence Attenti"
  },
  {
    "path": "a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_model.py",
    "chars": 31654,
    "preview": "# -*- coding: utf-8 -*-\n# HierarchicalAttention: 1.Word Encoder. 2.Word Attention. 3.Sentence Encoder 4.Sentence Attenti"
  },
  {
    "path": "a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_model_transformer.py",
    "chars": 35659,
    "preview": "# -*- coding: utf-8 -*-\n# HierarchicalAttention: 1.Word Encoder. 2.Word Attention. 3.Sentence Encoder 4.Sentence Attenti"
  },
  {
    "path": "a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_predict.py",
    "chars": 8467,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_train.py",
    "chars": 14543,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a05_HierarchicalAttentionNetwork/p1_seq2seq.py",
    "chars": 6571,
    "preview": "# -*- coding: utf-8 -*-\nimport tensorflow as tf\n\n# 【该方法测试的时候使用】返回一个方法。这个方法根据输入的值,得到对应的索引,再得到这个词的embedding.\ndef extract_a"
  },
  {
    "path": "a06_Seq2seqWithAttention/a1_seq2seq.py",
    "chars": 7650,
    "preview": "# -*- coding: utf-8 -*-\nimport tensorflow as tf\n\n# 【该方法测试的时候使用】返回一个方法。这个方法根据输入的值,得到对应的索引,再得到这个词的embedding.\ndef extract_a"
  },
  {
    "path": "a06_Seq2seqWithAttention/a1_seq2seq_attention_model.py",
    "chars": 18866,
    "preview": "# -*- coding: utf-8 -*-\n# seq2seq_attention: 1.word embedding 2.encoder 3.decoder(optional with attention). for more det"
  },
  {
    "path": "a06_Seq2seqWithAttention/a1_seq2seq_attention_predict.py",
    "chars": 8635,
    "preview": "# -*- coding: utf-8 -*-\n#prediction using model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed"
  },
  {
    "path": "a06_Seq2seqWithAttention/a1_seq2seq_attention_train.py",
    "chars": 14657,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "a07_Transformer/a2_attention_between_enc_dec.py",
    "chars": 2977,
    "preview": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nattention connect encoder and decoder\r\nIn \"encoder-decoder attention\" layers, the queries "
  },
  {
    "path": "a07_Transformer/a2_base_model.py",
    "chars": 4533,
    "preview": "# -*- coding: utf-8 -*-\r\nimport tensorflow as tf\r\nfrom  a2_multi_head_attention import MultiHeadAttention\r\nfrom a2_poist"
  },
  {
    "path": "a07_Transformer/a2_decoder.py",
    "chars": 7994,
    "preview": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nDecoder:\r\n1. The decoder is composed of a stack of N= 6 identical layers.\r\n2. In addition "
  },
  {
    "path": "a07_Transformer/a2_encoder.py",
    "chars": 6307,
    "preview": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nencoder for the transformer:\r\n6 layers.each layers has two sub-layers.\r\nthe first is multi"
  },
  {
    "path": "a07_Transformer/a2_layer_norm_residual_conn.py",
    "chars": 2904,
    "preview": "import tensorflow as tf\r\nimport time\r\n\"\"\"\r\nWe employ a residual connection around each of the two sub-layers, followed b"
  },
  {
    "path": "a07_Transformer/a2_multi_head_attention.py",
    "chars": 9148,
    "preview": "# -*- coding: utf-8 -*-\r\n#test self-attention\r\nimport tensorflow as tf\r\nimport time\r\n\"\"\"\r\nmulti head attention.\r\n1.linea"
  },
  {
    "path": "a07_Transformer/a2_poistion_wise_feed_forward.py",
    "chars": 3327,
    "preview": "# -*- coding: utf-8 -*-\r\nimport tensorflow as tf\r\nimport time\r\n\"\"\"\r\nPosition-wise Feed-Forward Networks\r\nIn addition to "
  },
  {
    "path": "a07_Transformer/a2_predict.py",
    "chars": 9174,
    "preview": "# -*- coding: utf-8 -*-\r\n#prediction using model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.fe"
  },
  {
    "path": "a07_Transformer/a2_predict_classification.py",
    "chars": 8661,
    "preview": "# -*- coding: utf-8 -*-\r\n#prediction using model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.fe"
  },
  {
    "path": "a07_Transformer/a2_split_traning_data.py",
    "chars": 597,
    "preview": "# -*- coding: utf-8 -*-\r\nimport codecs\r\n\r\nfile='training-data/test-zhihu6-title-desc.txt'\r\nfile_x='training-data/test_x."
  },
  {
    "path": "a07_Transformer/a2_train.py",
    "chars": 13771,
    "preview": "# -*- coding: utf-8 -*-\r\n#training the model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed d"
  },
  {
    "path": "a07_Transformer/a2_train_classification.py",
    "chars": 12921,
    "preview": "# -*- coding: utf-8 -*-\r\n#training the model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed d"
  },
  {
    "path": "a07_Transformer/a2_transformer.py",
    "chars": 17846,
    "preview": "# -*- coding: utf-8 -*-\r\nimport tensorflow as tf\r\nimport numpy as np\r\nimport random\r\nimport copy\r\nfrom a2_base_model imp"
  },
  {
    "path": "a07_Transformer/a2_transformer_classification.py",
    "chars": 13826,
    "preview": "# -*- coding: utf-8 -*-\r\nimport tensorflow as tf\r\nimport numpy as np\r\nimport random\r\nimport copy\r\nfrom a2_base_model imp"
  },
  {
    "path": "a07_Transformer/data_util_zhihu.py",
    "chars": 27184,
    "preview": "# -*- coding: utf-8 -*-\r\nimport codecs\r\nimport numpy as np\r\n#load data of zhihu\r\nimport word2vec\r\nimport os\r\nimport pick"
  },
  {
    "path": "a08_EntityNetwork/a3_entity_network.py",
    "chars": 24181,
    "preview": "# -*- coding: utf-8 -*-\r\n# EntityNet:1.input encoder  2. dynamic emeory 3.output layer\r\nimport tensorflow as tf\r\nimport "
  },
  {
    "path": "a08_EntityNetwork/a3_predict.py",
    "chars": 8486,
    "preview": "# -*- coding: utf-8 -*-\r\n#prediction using model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.fe"
  },
  {
    "path": "a08_EntityNetwork/a3_train.py",
    "chars": 14887,
    "preview": "# -*- coding: utf-8 -*-\r\n#training the model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed d"
  },
  {
    "path": "a08_EntityNetwork/data_util_zhihu.py",
    "chars": 27181,
    "preview": "# -*- coding: utf-8 -*-\r\nimport codecs\r\nimport numpy as np\r\n#load data of zhihu\r\nimport word2vec\r\nimport os\r\nimport pick"
  },
  {
    "path": "a08_predict_ensemble.py",
    "chars": 16737,
    "preview": "# -*- coding: utf-8 -*-\r\n#prediction using multi-models. take out: create multiple graphs. each graph associate with a s"
  },
  {
    "path": "a09_DynamicMemoryNet/a8_dynamic_memory_network.py",
    "chars": 24774,
    "preview": "# -*- coding: utf-8 -*-\r\n\"\"\"\r\nDynamic Memory Network: a.Input Module,b.Question Module,c.Episodic Memory Module,d.Answer"
  },
  {
    "path": "a09_DynamicMemoryNet/a8_predict.py",
    "chars": 9176,
    "preview": "# -*- coding: utf-8 -*-\r\n#prediction using model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.fe"
  },
  {
    "path": "a09_DynamicMemoryNet/a8_train.py",
    "chars": 15085,
    "preview": "# -*- coding: utf-8 -*-\r\n#training the model.\r\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed d"
  },
  {
    "path": "aa1_data_util/1_process_zhihu.py",
    "chars": 5919,
    "preview": "# -*- coding: utf-8 -*-\nimport sys\n#reload(sys)\n#sys.setdefaultencoding('utf8')\n#1.将问题ID和TOPIC对应关系保持到字典里:process questio"
  },
  {
    "path": "aa1_data_util/2_predict_zhihu_get_question_representation.py",
    "chars": 1325,
    "preview": "# -*- coding: utf-8 -*-\nimport sys\nreload(sys)\nsys.setdefaultencoding('utf8')\n\n#准备预测需要的数据.每一行作为问题的表示,写到文件中.\n#prepreing p"
  },
  {
    "path": "aa1_data_util/3_process_zhihu_question_topic_relation.py",
    "chars": 6216,
    "preview": "# -*- coding: utf-8 -*-\nimport sys\nreload(sys)\nsys.setdefaultencoding('utf8')\n#最终输出:x1=question_representation,x2=topic_"
  },
  {
    "path": "aa1_data_util/data_multi_label.txt",
    "chars": 182,
    "preview": "xxx1 xxx2 xxx3 xxx4 xxx5 __label__L11 L09 L03\nxxx2 xxx2 xxx3 xxx4 xxx6 __label__L20 L11 L21 L24\nxxx0 xxx2 xxx3 xxx4 xxx2"
  },
  {
    "path": "aa1_data_util/data_single_label.txt",
    "chars": 113,
    "preview": "xxx1 xxx2 xxx3 xxx4 xxx5 __label__L11\nxxx2 xxx2 xxx3 xxx4 xxx6 __label__L20\nxxx0 xxx2 xxx3 xxx4 xxx2 __label__L1\n"
  },
  {
    "path": "aa1_data_util/data_util_zhihu.py",
    "chars": 26678,
    "preview": "# -*- coding: utf-8 -*-\nimport codecs\nimport numpy as np\n#load data of zhihu\nimport word2vec\nimport os\nimport pickle\nPAD"
  },
  {
    "path": "aa2_ClassificationTflearn/p2_classification_tflearn.py",
    "chars": 1345,
    "preview": "print(\"started...\")\nimport tflearn\nimport numpy as np\nimport tensorflow as tf\nclass_number=3 #10\ntflearn.init_graph(num_"
  },
  {
    "path": "aa2_ClassificationTflearn/p2_classification_tflearn_demo.py",
    "chars": 1345,
    "preview": "print(\"started...\")\nimport tflearn\nimport numpy as np\nimport tensorflow as tf\nclass_number=3 #10\ntflearn.init_graph(num_"
  },
  {
    "path": "aa3_CNNSentenceClassificationTflearn/p4_cnn_sentence_classification.py",
    "chars": 3758,
    "preview": "# -*- coding: utf-8 -*-\nfrom __future__ import division, print_function, absolute_import\n\n\"\"\"\nSimple example using convo"
  },
  {
    "path": "aa3_CNNSentenceClassificationTflearn/p4_cnn_sentence_classification_zhihu.py",
    "chars": 5000,
    "preview": "# -*- coding: utf-8 -*-\nfrom __future__ import division, print_function, absolute_import\n\n\"\"\"\nSimple example using convo"
  },
  {
    "path": "aa3_CNNSentenceClassificationTflearn/p4_cnn_sentence_classification_zhihu2.py",
    "chars": 6588,
    "preview": "# -*- coding: utf-8 -*-\nfrom __future__ import division, print_function, absolute_import\n\n\"\"\"\nSimple example using convo"
  },
  {
    "path": "aa3_CNNSentenceClassificationTflearn/p4_cnn_sentence_classification_zhihu2_predict.py",
    "chars": 5834,
    "preview": "# -*- coding: utf-8 -*-\nfrom __future__ import division, print_function, absolute_import\n\n\"\"\"\nSimple example using convo"
  },
  {
    "path": "aa3_CNNSentenceClassificationTflearn/p4_conv_classification_tflearn.py",
    "chars": 2323,
    "preview": "from __future__ import division, print_function, absolute_import\n\nimport tensorflow as tf\n\n# -*- coding: utf-8 -*-\n\n\"\"\" "
  },
  {
    "path": "aa4_TextCNN_with_RCNN/p72_TextCNN_with_RCNN_model.py",
    "chars": 18780,
    "preview": "# -*- coding: utf-8 -*-\n#TextCNN: 1. embeddding layers, 2.convolutional layer, 3.max-pooling, 4.softmax layer.\n# print(\""
  },
  {
    "path": "aa4_TextCNN_with_RCNN/p72_TextCNN_with_RCNN_train.py",
    "chars": 12046,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "aa5_BiLstmTextRelation/p9_BiLstmTextRelation_model.py",
    "chars": 8188,
    "preview": "# -*- coding: utf-8 -*-\n\"\"\"\nBiLstmTextRelation: check reationship of two questions(Qi,Qj),result(0 or 1). 1 means relate"
  },
  {
    "path": "aa5_BiLstmTextRelation/p9_BiLstmTextRelation_train.py",
    "chars": 12016,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "aa6_TwoCNNTextRelation/p9_twoCNNTextRelation_model.py",
    "chars": 12898,
    "preview": "# -*- coding: utf-8 -*-\n#TextCNN: for each of two sentences,do(1. embeddding layers, 2.convolutional layer, 3.max-poolin"
  },
  {
    "path": "aa6_TwoCNNTextRelation/p9_twoCNNTextRelation_train.py",
    "chars": 11967,
    "preview": "# -*- coding: utf-8 -*-\n#training the model.\n#process--->1.load data(X:list of lint,y:int). 2.create session. 3.feed dat"
  },
  {
    "path": "data/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/ieee_zhihu_cup/label_set.txt",
    "chars": 40791,
    "preview": "7476760589625268543\n4697014490911193675\n-4653836020042332281\n-8175048003539471998\n-8377411942628634656\n-7046289575185911"
  },
  {
    "path": "data/ieee_zhihu_cup/vocab.txt",
    "chars": 73300,
    "preview": "PAD\nUNK\nCLS\nSEP\nunused1\nunused2\nunused3\nunused4\nunused5\n</s>\nc17\nc101\nc11\nc4\nc147\nc85\nc184\nc855\nc2\nc57\nc152\nc148\nc38\nc15"
  },
  {
    "path": "data/old/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "data/old/sample_multiple_label.txt",
    "chars": 8469843,
    "preview": "w10253 w1723 w5240 w72 w13047 w111 c520 c1427 c407 c1451 c72 c131 c931 c769 c267 c184 w1617 c229 __label__-6522242102892"
  },
  {
    "path": "data/sample_multiple_label.txt",
    "chars": 8696185,
    "preview": "w10253 w1723 w5240 w72 w13047 w111 c520 c1427 c407 c1451 c72 c131 c931 c769 c267 c184 w1617 c229 __label__65222421028925"
  },
  {
    "path": "data/sample_single_label.txt",
    "chars": 5180669,
    "preview": "w91874 w2300 w6 w25363 w6332 w11 w767 w297441 w12480 w256 w23270 w13482 w22236 w259 w11 w26959 w25 w1613 w25363 w111 __l"
  },
  {
    "path": "images/xx",
    "chars": 3,
    "preview": "xx\n"
  },
  {
    "path": "pre-processing.ipynb",
    "chars": 63239,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"###                                "
  }
]

About this extraction

This page contains the full source code of the brightmart/text_classification GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 108 files (22.5 MB), approximately 5.9M tokens, and a symbol index with 679 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
